Abstract 17713: Data Driven Modeling of Electronic Health Record Data to Detect Pre-diagnostic Heart Failure in Primary Care
Introduction: Electronic health records (EHRs) offer an opportunity for real-time early risk prediction. EHR data were used to determine the extent to which machine learning tools and a purely data-driven approach to modeling (DDM) could detect heart failure (HF) subtypes, i.e., HF with preserved ejection fraction (HFpEF) and HF with reduced ejection fraction (HFrEF), 12 months before a clinical diagnosis.
Methods: Incident HF cases were identified among Geisinger Clinic primary care patients aged 50-85 years who were diagnosed between 2001 and 2010. Cases were further classified as HFrEF if left ventricular ejection fraction (LVEF) was ≤40% and as HFpEF if LVEF was >50%. Controls were matched to HF cases by age, gender, location, and primary care physician. EHR data were extracted on demographics, ICD-9 codes, medication orders, and clinical and behavioral measures. Models were built to detect HF using data from a 24-month observation window ending 12 months before the HF diagnosis. Patient feature vectors were generated by summarizing the data in this window with one or more aggregation functions (e.g., counts, means). For the HF endpoint, modeling was done with and without patients who had an acute coronary syndrome (ACS) event within 12 months of the diagnosis. Regularized logistic regression was applied with information gain feature selection and 10-fold cross-validation. Model performance was assessed by the area under the receiver operating characteristic (ROC) curve (AUC), and model complexity by the number of selected features.
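The feature construction step can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the window geometry (a 24-month window ending 12 months before diagnosis, approximated in days), the event tuple layout, the feature naming scheme, and the sample record are all hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean

# Assumed window geometry: the observation window ends 12 months before the
# diagnosis date and spans the 24 months before that point (day-based
# approximation chosen for this sketch).
PREDICTION_GAP_DAYS = 365
OBSERVATION_DAYS = 730


def observation_window(diagnosis_date):
    """Return (start, end) of the observation window; end is exclusive."""
    end = diagnosis_date - timedelta(days=PREDICTION_GAP_DAYS)
    start = end - timedelta(days=OBSERVATION_DAYS)
    return start, end


def build_feature_vector(events, diagnosis_date):
    """Aggregate (event_date, name, value) tuples into a flat feature dict.

    Events whose value is None (e.g., diagnosis codes) are counted; events
    with a numeric value (e.g., blood pressure readings) are averaged.
    """
    start, end = observation_window(diagnosis_date)
    counts = defaultdict(int)
    values = defaultdict(list)
    for event_date, name, value in events:
        if not (start <= event_date < end):
            continue  # skip events outside the observation window
        if value is None:
            counts[name] += 1
        else:
            values[name].append(value)
    features = {f"count:{name}": n for name, n in counts.items()}
    features.update({f"mean:{name}": mean(v) for name, v in values.items()})
    return features


# Hypothetical patient record; codes and values are invented for illustration.
events = [
    (date(2007, 3, 1), "icd9:401.9", None),
    (date(2007, 9, 15), "icd9:401.9", None),
    (date(2008, 1, 10), "systolic_bp", 150.0),
    (date(2008, 5, 20), "systolic_bp", 140.0),
    (date(2008, 11, 2), "icd9:428.0", None),  # after the window ends: ignored
]
features = build_feature_vector(events, diagnosis_date=date(2009, 6, 1))
print(features)
```

In this sketch each patient becomes one dictionary of named features. Vectors of this kind could then be passed to a feature selection step and a regularized classifier, as the Methods describe.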
Results: Performance was better for HFpEF than for HFrEF. The HFpEF model was also more complex than the HFrEF model, as more EHR information was needed to discriminate HFpEF cases from controls. Performance with and without ACS cases was similar, although models including ACS cases were more complex than models excluding them (Table).
Conclusions: Purely data-driven approaches to modeling can be used to detect HF 12 months before clinical diagnosis. Model performance and complexity vary by HF subtype, indicating that the subtypes differ in how complex they are to model.
Author Disclosures: K. Ng: None. W.F. Stewart: None. C. deFilippi: None. Y. Wang: None. R.J. Byrd: None. S.R. Steinhubl: None. Z. Daar: None. H. Law: None. J. Hu: None.
© 2015 by American Heart Association, Inc.