Abstract P203: Building Cardiovascular Risk Prediction Models by Applying Machine Learning Methods to Right-Censored Electronic Health Data
Background: Cardiovascular (CV) risk prediction is useful for prioritizing resources and guiding treatment decisions. Most risk prediction equations are constructed using data from cohort studies, but these cohorts often lack diversity or generalizability. There is therefore increasing interest in using electronic health data (EHD) to construct CV risk models. Given the size and complexity of EHD, machine learning (ML) techniques are an appealing alternative to time-to-event regression. However, subjects in EHD have varying durations of follow-up, so some event times are right-censored, and many ML techniques do not accommodate censoring.
Methods: We propose a simple and universal method that allows any ML technique to accommodate right-censored data by averaging predictions over a set of weighted bootstrap samples, where the sampling weights are computed using inverse probability of censoring weighting (IPCW). IPCW reweights the observed data to represent the underlying population by assigning more weight to subjects whose outcomes were observed even though they were likely to be censored. In each iteration, we resample from the subjects with fully observed outcomes, fit a model, and produce predictions. Lastly, we offset the computational cost by using an ensemble. We consider the problem of predicting 5-year CV risk using EHD from a large health system (N=87,348), where 50% of subjects are censored. We compare our method with other ways of handling censoring.
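The weighting-and-resampling procedure described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the censoring distribution is estimated with a Kaplan-Meier estimator that ignores covariates (the abstract does not say how the censoring probabilities are obtained), and it handles tied times naively. All function names (`censoring_survival`, `ipcw_weights`, `ipcw_bagged_predict`, `fit_event_rate`) are hypothetical, and `fit_event_rate` is only a placeholder for whatever ML technique would actually be used.

```python
import random


def censoring_survival(times, events):
    """Kaplan-Meier estimate of G(t-) = P(C >= t), treating censoring as the 'event'.

    Assumption: censoring does not depend on covariates. Returns a function of t.
    """
    steps = []  # (time, G just after that time)
    g = 1.0
    for t in sorted(set(times)):
        at_risk = sum(1 for x in times if x >= t)
        censored = sum(1 for x, e in zip(times, events) if x == t and e == 0)
        g *= 1.0 - censored / at_risk
        steps.append((t, g))

    def g_before(t):
        value = 1.0
        for s, g_s in steps:
            if s >= t:
                break
            value = g_s
        return value

    return g_before


def ipcw_weights(times, events, horizon):
    """Binary labels at the horizon plus IPCW weights (0 if status is unknown)."""
    g_before = censoring_survival(times, events)
    labels, weights = [], []
    for t, e in zip(times, events):
        if e == 1 and t <= horizon:      # event observed by the horizon
            labels.append(1)
            weights.append(1.0 / g_before(t))
        elif t >= horizon:               # followed event-free through the horizon
            labels.append(0)
            weights.append(1.0 / g_before(horizon))
        else:                            # censored before the horizon
            labels.append(None)
            weights.append(0.0)
    return labels, weights


def fit_event_rate(rows, labels):
    """Placeholder learner: predicts the training event rate, ignoring features."""
    rate = sum(labels) / len(labels)
    return lambda row: rate


def ipcw_bagged_predict(rows, times, events, horizon, fit, new_rows,
                        n_boot=50, seed=0):
    """Average predictions over bootstrap samples drawn with IPCW probabilities."""
    rng = random.Random(seed)
    labels, weights = ipcw_weights(times, events, horizon)
    known = [i for i, w in enumerate(weights) if w > 0]
    known_w = [weights[i] for i in known]
    totals = [0.0] * len(new_rows)
    for _ in range(n_boot):
        # Weighted resampling with replacement from fully observed subjects.
        sample = rng.choices(known, weights=known_w, k=len(known))
        model = fit([rows[i] for i in sample], [labels[i] for i in sample])
        for j, row in enumerate(new_rows):
            totals[j] += model(row)
    return [total / n_boot for total in totals]
```

Any learner that accepts feature rows with binary labels and returns a callable predictor could be passed as `fit` in place of the placeholder. Because the censoring adjustment lives entirely in the resampling step, the learner itself needs no knowledge of censoring.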
Results: Our technique consistently improved calibration by about 70-90% relative to approaches that ignored censoring (see below). However, there were no differences in discrimination. In conclusion, we show that miscalibration due to censoring can affect real-world treatment decisions, as demonstrated using the 2013 ACC/AHA treatment recommendation for statins. In such a scenario, the aim is not to identify, for example, the 10% of patients at highest risk, but rather to estimate the actual magnitude of risk for an individual patient.
Author Disclosures: A. Kotalik: None. D. Vock: None. J. Wolfson: None. P. O’Connor: None.
© 2017 by American Heart Association, Inc.