Abstract 18245: Identifying Novel Predictors for Incident Heart Failure Using Statistical Learning Techniques in the Women’s Health Initiative (WHI) Cohort
Introduction: Machine learning (statistical learning) techniques applied to high-dimensional health data may help identify novel predictors of heart failure (HF). We used machine learning to identify novel predictors of HF in post-menopausal women and compared the resulting models with an established HF risk model.
Methods: After minimum-necessary data cleaning of all baseline WHI variables, 1227 variables remained. Without a priori input, we separately applied the Least Absolute Shrinkage and Selection Operator (LASSO) with cross-validation and Classification and Regression Trees (CART) to select sets of variables predictive of our primary outcome, incident heart failure hospitalization. We built Cox proportional hazards models using each set of selected predictors, and another using published predictors from the Atherosclerosis Risk in Communities (ARIC) cohort. Model discrimination was compared using receiver operating characteristic (ROC) analysis.
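The abstract does not state which LASSO variant, penalty-selection rule, or software was used. As a rough illustration of how LASSO selects variables, here is a minimal sketch that assumes a plain squared-error LASSO fitted by coordinate descent on a continuous outcome, with the penalty chosen by minimum k-fold cross-validated error. A binary or time-to-event outcome such as HF hospitalization would typically use a penalized logistic or Cox likelihood instead. All function names are hypothetical. The key mechanism is soft-thresholding: any coefficient whose correlation with the partial residual falls below the penalty is set exactly to zero, and that variable drops out of the selected set.

```python
import random


def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; values in [-gamma, gamma] become exactly 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X, y, lam, n_iter=100, tol=1e-6):
    """Coordinate descent for (1/2n)*||y - Xb||^2 + lam*||b||_1 (X, y already centered)."""
    n, p = len(X), len(X[0])
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    b = [0.0] * p
    resid = list(y)  # current residual y - Xb (b starts at zero)
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # a constant column carries no information
            # Correlation of feature j with the partial residual that excludes feature j.
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * b[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                b[j] = new_bj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return b


def fit(X, y, lam):
    """Center on the training rows, then fit; returns (x_means, y_mean, coefficients)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    return x_mean, y_mean, lasso_cd(Xc, yc, lam)


def predict(model, X):
    """Linear predictions using the training means stored in the model."""
    x_mean, y_mean, b = model
    return [y_mean + sum((row[j] - x_mean[j]) * b[j] for j in range(len(b))) for row in X]


def cv_select_lambda(X, y, lambdas, k=5, seed=0):
    """Pick the penalty with the lowest k-fold cross-validated mean squared error."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    best_lam, best_mse = None, float("inf")
    for lam in lambdas:
        sq_err = 0.0
        for fold in folds:
            held_out = set(fold)
            train = [i for i in idx if i not in held_out]
            model = fit([X[i] for i in train], [y[i] for i in train], lam)
            preds = predict(model, [X[i] for i in fold])
            sq_err += sum((y[i] - yhat) ** 2 for i, yhat in zip(fold, preds))
        mse = sq_err / len(y)
        if mse < best_mse:
            best_lam, best_mse = lam, mse
    return best_lam


def selected_features(X, y, lam):
    """Indices of predictors whose coefficient is not shrunk exactly to zero."""
    return [j for j, bj in enumerate(fit(X, y, lam)[2]) if bj != 0.0]
```

On synthetic data where only a few columns drive the outcome, `selected_features(X, y, cv_select_lambda(X, y, lambdas))` should retain those columns, while a sufficiently large penalty drops every variable. The cross-validated penalty may still admit some noise variables, which is one reason selected sets are treated as hypothesis-generating.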
Results: The total sample size was 44,173; there were 2,355 outcomes, and the median time to event was 8.7 years. LASSO and CART identified 12 and 9 significant predictors, respectively. The highest correlation between any two selected variables was 0.56. Selected novel predictors included physical function, Hodgkin’s lymphoma, prior pulmonary embolism, Medicare insurance, and use of cardiotonic, antiarrhythmic, and mineral supplement medications, among others. Other selected variables reflected known risk factors for HF (Table). In ROC analysis, the model using CART-selected variables had the highest C-statistic (0.794), followed by the LASSO model (0.775) and the ARIC model (0.712).
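For a binary outcome, the C-statistic equals the area under the ROC curve: the probability that a randomly chosen person who had the outcome received a higher predicted risk than a randomly chosen person who did not. The sketch below computes it directly from all case/non-case pairs, counting ties as one half. It assumes a simple 0/1 outcome indicator and ignores censoring; the abstract does not say how follow-up time entered the ROC analysis, and time-to-event variants such as Harrell's C treat censored pairs differently. `c_statistic` is a hypothetical helper name.

```python
def c_statistic(scores, outcomes):
    """ROC AUC as a concordance probability over all (case, non-case) pairs.

    scores   -- predicted risk for each person (higher means higher risk)
    outcomes -- 1 if the person had the outcome, 0 otherwise
    """
    cases = [s for s, o in zip(scores, outcomes) if o == 1]
    controls = [s for s, o in zip(scores, outcomes) if o == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for c in cases:
        for d in controls:
            if c > d:
                concordant += 1.0  # case ranked above non-case
            elif c == d:
                concordant += 0.5  # tied scores count as half-concordant
    return concordant / (len(cases) * len(controls))
```

A value of 0.5 corresponds to ranking no better than chance and 1.0 to perfect separation, so comparing models means scoring the same individuals with each model and comparing the resulting values. The pairwise loop is O(cases × non-cases); a rank-based formula gives the same result faster on large cohorts.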
Conclusions: When applied in the WHI, machine learning techniques can identify, without a priori input, novel sets of variables that yield risk prediction models with better discrimination than an established HF risk model. Both the models and the novel predictors provide a basis for hypothesis generation and future investigation.
Author Disclosures: G.H. Tison: None. G. Nah: None. J.E. Olgin: None. E. Vittinghoff: None. B.V. Howard: None. R. Foraker: None. M.A. Allison: None. R.L. Casanova: None. R.H. Blair: None. K.K. Breathett: None. L. Klein: None. N.I. Parikh: None.
© 2016 by American Heart Association, Inc.