Circulation. 2002;106:685-690
Published online before print July 1, 2002,
doi: 10.1161/01.CIR.0000024410.15081.FD
© 2002 American Heart Association, Inc.
Clinical Investigation and Reports
Use of the Logical Analysis of Data Method for Assessing Long-Term Mortality Risk After Exercise Electrocardiography
Michael S. Lauer, MD;
Sorin Alexe, MS;
Claire E. Pothier Snader, MA;
Eugene H. Blackstone, MD;
Hemant Ishwaran, ScD;
Peter L. Hammer, PhD
From the Departments of Cardiology (M.S.L, C.E.S.P.), Cardiothoracic Surgery (E.H.B.), and Epidemiology and Biostatistics (E.H.B., H.I.) of the Cleveland Clinic Foundation, Cleveland, Ohio; and the Center for Operations Research (RUTCOR) (S.A., P.L.H.), Rutgers University, Piscataway, NJ.
Correspondence to Michael S. Lauer, MD, FACC, Director of Clinical Research, Department of Cardiology, Cleveland Clinic Foundation, Desk F25, 9500 Euclid Ave, Cleveland, OH 44195. E-mail Lauerm{at}ccf.org
Abstract
Background Logical Analysis of Data is a methodology
of mathematical optimization on the basis of the systematic
identification of patterns or "syndromes." In this study, we
used Logical Analysis of Data for risk stratification and compared
it to regression techniques.
Methods and Results Using a cohort of 9454 patients referred for exercise testing, Logical Analysis of Data was applied to identify syndromes based on 20 variables. High-risk syndromes were patterns of up to 3 findings associated with >5-fold increase in risk of death, whereas low-risk syndromes were associated with >5-fold decrease. Syndromes were derived on a randomly selected training set of 4722 patients and validated in 4732 others. There were 15 high-risk and 26 low-risk syndromes. A risk score was derived based on the proportion of possible high-risk and low-risk syndromes present. A value ≥0, meaning the same or a greater proportion of high-risk syndromes, was noted in 979 patients (21%) in the validation set and was predictive of 5-year death (11% versus 1%, hazard ratio 8.3, 95% CI 5.9 to 11.6, P<0.0001), accounting for 67% of events. Calibration of expected versus observed death rates based on Logical Analysis of Data and Cox regression showed that both methods performed very well.
Conclusion Using the Logical Analysis of Data method, we identified subsets of patients who had an increased risk and who also accounted for the majority of deaths. Future research is needed to determine how best to use this technique for risk stratification.
Key Words: risk factors; mortality; statistics
Introduction
The Logical Analysis of Data (LAD) is a mathematical methodology based on techniques of optimization and logic.1,2 Designed to identify patterns of findings, or syndromes, that predict outcomes, this method has been applied to problems in economics, seismology, and oil exploration,3 but not to medicine. Cardiovascular risk stratification may be an appropriate application for the LAD, as it relies on collections of different data elements.4 In this study, we applied the LAD to a cohort of patients referred for exercise electrocardiography.5 We assessed this method's ability to predict mortality and compared it to Cox regression.6
Methods
Patient Population
The sample has been described in detail.5 Consecutive adults referred for symptom-limited exercise electrocardiography between September 1990 and March 1998 were eligible. Patients with heart failure, valvular disease, left bundle branch block, digoxin use, and resting ST segment depression were excluded. The Cleveland Clinic Foundation's Institutional Review Board approved research study of the exercise database.
Clinical Data
All patients provided a structured history for prospective recording of symptoms, risk factors, cardiac procedures, co-morbidities, and medication use. Detailed explanations and definitions of these have been published elsewhere.5
Exercise Testing
Exercise testing was symptom-limited according to standard protocols.7 All data were collected prospectively, including heart rates before, during, and after exercise, symptoms, arrhythmias, blood pressure, exercise capacity, electrocardiographic changes, and calculated Duke treadmill scores.8
End Points
The primary end point was all-cause mortality9 obtained from the Social Security Death Index.10
LAD and Derivation of Syndromes
The LAD method focuses on systematic evaluation of combinations of findings, or syndromes, which we based on 20 variables considered as predictors of death. These variables were age, sex, current or recent smoking, hypertension, diabetes, chronic lung disease, peripheral vascular disease, prior coronary heart disease, referral because of chest discomfort, right bundle branch block, resting non-specific ST abnormalities, use of aspirin, β-blockers, non-dihydropyridine calcium channel blockers, vasodilators, and/or lipid-lowering drugs, resting heart rate, Duke treadmill score,8 chronotropic index,11 and 1-minute heart rate recovery.5
Two types of syndromes were searched for, high-risk and low-risk. High-risk syndromes were collections of ≤3 findings that were associated with a mortality rate at least 5 times the average and were present in at least 15% of deceased patients. For example, in a training data set of 4722 patients, the pattern of normal ST segments at rest, heart rate recovery <12 beats per minute, and Duke exercise treadmill score <5 was found in 162, 27 of whom died. This pattern accounted for 17% of all deaths and was associated with a mortality rate that was 5.1 times the average.
Conversely, low-risk syndromes were collections of up to 3 findings that were associated with mortality rates of no more than 20% of the average death risk and were found in at least 30% of survivors. For example, in the same training set, there were 2687 patients who had the pattern of age <58 years and normal resting ST segments. Only 16 died. Thus, this syndrome accounted for 59% of the survivors and was associated with a mortality rate of 0.60 deaths per 100 person-years, which was under 1/20 of the average mortality rate.
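The two selection criteria can be expressed as a simple check on counts. The sketch below is illustrative only: the function name `classify_pattern` and the toy cohort are assumptions rather than study data, and mortality is treated as a simple proportion instead of a person-time rate.

```python
def classify_pattern(n_with, deaths_with, n_total, deaths_total):
    """Return 'high-risk', 'low-risk', or None for a candidate pattern.

    Thresholds follow the text: a high-risk pattern needs a mortality rate
    of at least 5 times the average and must be present in at least 15% of
    deceased patients; a low-risk pattern needs a mortality rate of no more
    than 20% of the average and must be present in at least 30% of survivors.
    """
    avg_rate = deaths_total / n_total
    rate = deaths_with / n_with
    death_coverage = deaths_with / deaths_total
    survivor_coverage = (n_with - deaths_with) / (n_total - deaths_total)

    if rate >= 5 * avg_rate and death_coverage >= 0.15:
        return "high-risk"
    if rate <= 0.2 * avg_rate and survivor_coverage >= 0.30:
        return "low-risk"
    return None


# Hypothetical cohort: 1000 patients, 40 deaths (average mortality 4%).
print(classify_pattern(50, 12, 1000, 40))   # 24% mortality, 30% of deaths -> high-risk
print(classify_pattern(600, 3, 1000, 40))   # 0.5% mortality, 62% of survivors -> low-risk
print(classify_pattern(300, 12, 1000, 40))  # 4% mortality -> None
```

In practice such a check would be applied to every combination of up to 3 findings, which is what produces the large number of candidate syndromes.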
We identified 502 candidate high-risk syndromes and 1098 candidate low-risk syndromes. As expected, there was much redundancy between syndromes; by applying a standard "set covering" algorithm of discrete optimization, this set was collapsed into 15 high-risk syndromes (Table 1) and 26 low-risk syndromes (Table 2). Further technical details regarding the derivation of high- and low-risk syndromes are provided elsewhere.12 Of note, the process was performed mathematically with no manual intervention, meaning that aside from the variables chosen for study, investigator bias had no role in syndrome derivation.
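The text does not specify which set covering algorithm was used. As one possible illustration, the sketch below uses the common greedy heuristic, which is an assumption and may differ from the actual procedure; the function name, pattern names, and patient ids are hypothetical.

```python
def greedy_set_cover(universe, candidates):
    """Pick candidate patterns until every coverable patient is covered.

    `candidates` maps a pattern name to the set of patient ids it covers.
    At each step the pattern covering the most still-uncovered patients is
    chosen (ties are broken by name so the result is deterministic).
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = max(sorted(candidates),
                   key=lambda name: len(candidates[name] & uncovered))
        gain = candidates[best] & uncovered
        if not gain:  # the remaining patients are covered by no pattern
            break
        chosen.append(best)
        uncovered -= gain
    return chosen, uncovered


# Hypothetical example: five deceased patients, four overlapping patterns.
patients = {1, 2, 3, 4, 5}
patterns = {
    "P_a": {1, 2, 3},
    "P_b": {2, 3},
    "P_c": {4, 5},
    "P_d": {3, 4},
}
selected, left_over = greedy_set_cover(patients, patterns)
print(selected, left_over)  # ['P_a', 'P_c'] set()
```

Here the redundant patterns `P_b` and `P_d` are dropped because the patients they cover are already accounted for by the two selected patterns.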
Statistical Analyses
The study sample of 9454 patients was randomly divided into training and validation sets. The training set was used to derive syndromes. Missing data were uncommon (<3% of patients had any missing data); mean and mode imputations were performed as appropriate.
All remaining analyses were performed on the validation data set. Patients were divided into those who had only high-risk syndromes, only low-risk syndromes, both high- and low-risk syndromes, and neither type of syndrome. Survival curves were generated using the Kaplan-Meier method. After confirmation of the proportional hazards assumption by Schoenfeld residuals, Cox modeling6 was used to assess the association of syndromes with mortality.
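For readers unfamiliar with the Kaplan-Meier method, a minimal product-limit estimator can be sketched as follows. The study analyses were run in SAS; this Python function, its name, and the toy follow-up data are illustrative assumptions only.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct death time.

    `times` are follow-up times; `events` are 1 for death, 0 for censoring.
    Patients censored at time t are treated as at risk at t.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical follow-up (years) for five patients; 0 marks censoring.
km_curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 0, 1])
for t, s in km_curve:
    print(t, round(s, 2))
```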
A risk score was derived from LAD-derived syndromes by taking into account the total number of high-risk and low-risk syndromes each patient had. The score of patient X was calculated as a linear discriminant equation

$$\mathrm{Score}(X) = \sum_{i \in I} \alpha_i P_i(X) - \sum_{j \in J} \beta_j N_j(X)$$

where the $P_i$ ($i \in I$) and $N_j$ ($j \in J$) represent the high-risk and low-risk syndromes, $P_i(X)$ (respectively, $N_j(X)$) takes the value 1 if patient X displays the high-risk syndrome $P_i$ (respectively, low-risk syndrome $N_j$) and takes the value 0 otherwise, and where the $\alpha_i$ and $\beta_j$ are non-negative normalized "weights." In our example, we have chosen all 15 $\alpha_i$, corresponding to the high-risk syndromes, to be equal to 100/15, and all 26 $\beta_j$ to be equal to 100/26.
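With equal weights, the score reduces to the percentage of high-risk syndromes present minus the percentage of low-risk syndromes present, so it ranges from -100 to 100. The sketch below assumes this weighted-difference form, which is consistent with the description that a value of 0 or more means the same or a greater proportion of high-risk syndromes. The function name `lad_score` and the example patient are hypothetical.

```python
def lad_score(high_flags, low_flags):
    """Equal-weight score: weighted high-risk count minus weighted low-risk count.

    `high_flags` and `low_flags` are sequences of 0/1 indicators, one per
    high-risk and low-risk syndrome respectively.
    """
    alpha = 100 / len(high_flags)  # 100/15 in the study
    beta = 100 / len(low_flags)    # 100/26 in the study
    return alpha * sum(high_flags) - beta * sum(low_flags)


# Hypothetical patient with 3 of 15 high-risk and 13 of 26 low-risk syndromes.
high = [1, 1, 1] + [0] * 12
low = [1] * 13 + [0] * 13
example_score = lad_score(high, low)
print(round(example_score, 1))  # 20.0 - 50.0 = -30.0
```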
To understand the theoretical basis for this score, consider all the positive patterns P1,..., Ph and all the negative patterns N1,..., Nk, listed only "in theory," without us actually having produced them. Associate to a patient C the 0,1 vector of h+k components, which indicate for each one of the positive or negative patterns whether the patient does or does not display that pattern. Further, let C* be an "ideal" (obviously non-existent) patient, who displays every positive pattern and none of the negative ones. Then, his/her associated vector is (1,1,...,1,1,0,0,...,0,0). Let cor(C) be the correlation of the 0,1 vector associated to patient C and the 0,1 vector associated to patient C*. The risk score Score(C) developed in LAD for patient C has the property that it has the same sign as cor(C). Therefore, when we base our classification on the sign of Score(C), we are in fact basing it on the sign of cor(C).
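The sign property can be checked numerically. Because a correlation has the same sign as the corresponding covariance whenever it is defined, the sketch below compares the sign of the equal-weight score with the sign of the covariance between a patient's 0,1 vector and the ideal vector, for every possible count of positive patterns (0 to 15) and negative patterns (0 to 26). The function names are illustrative, and the equal weights of 100/15 and 100/26 are taken from the text.

```python
from fractions import Fraction


def score(p, n, h=15, k=26):
    """Equal-weight score for a patient showing p positive and n negative patterns."""
    return Fraction(100 * p, h) - Fraction(100 * n, k)


def covariance_with_ideal(p, n, h=15, k=26):
    """Covariance between the patient's 0/1 vector and the ideal vector (1,...,1,0,...,0)."""
    m = h + k
    mean_product = Fraction(p, m)  # the product vector is 1 only on displayed positive patterns
    return mean_product - Fraction(p + n, m) * Fraction(h, m)


def sign(x):
    return (x > 0) - (x < 0)


agree = all(
    sign(score(p, n)) == sign(covariance_with_ideal(p, n))
    for p in range(16)
    for n in range(27)
)
print(agree)  # True
```

Algebraically, both quantities are positive multiples of pk - nh, where p and n are the numbers of positive and negative patterns displayed and h and k are the totals, which is why the signs always agree.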
Model Comparisons and Validation
To compare the predictive capabilities of the LAD with traditional statistical techniques, we developed a Cox proportional hazards model6 in the training data set, paying attention to variable transformations and possible interactions. On the basis of deciles predicted in the training set, calibration plots were constructed for patients in the validation data set examining predicted versus observed mortality. Differences in correlations between predicted and observed mortality rates were compared using the transformed zr statistic of Fisher.13 To compare model discrimination, c-statistics were calculated according to the method of Harrell.14 All analyses were performed using the SAS system, version 8.1.
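As a rough illustration of what a c-statistic measures for survival data, the sketch below uses a commonly taught pairwise definition of concordance. It is a simplified stand-in and not necessarily the exact procedure of the cited method; the function name and toy data are assumptions.

```python
def harrell_c(times, events, risk):
    """Pairwise concordance for right-censored data.

    A pair is usable when the patient with the shorter follow-up time died.
    It is concordant when that patient also has the higher predicted risk;
    ties in predicted risk count as 0.5.
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] and times[i] < times[j]:
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / usable


# Hypothetical data: four patients, two deaths, one mis-ranked pair.
c_value = harrell_c(times=[2, 4, 5, 6], events=[1, 0, 1, 0],
                    risk=[0.9, 0.2, 0.05, 0.1])
print(c_value)  # 0.75
```

A value of 0.5 corresponds to chance ranking and 1.0 to perfect ranking of patients by risk.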
Results
Derivation of Syndromes
Syndromes were derived from the training set of 4722 patients.
Tables 1 and 2 present the representative 15 high-risk and 26
low-risk syndromes.
Syndromes in the Validation Set
Among the 4732 patients in the validation set, there were 723 (15%) who had ≥1 high-risk syndrome and no low-risk syndromes. Analogously, there were 3501 patients (74%) who had at least 1 low-risk syndrome but had none of the 15 high-risk syndromes. There were 415 patients (9%) who had high-risk and low-risk syndromes, and there were 93 (2%) who had neither. Baseline and exercise characteristics according to syndromes are summarized in Table 3.
Table 3. Baseline and Exercise Characteristics According to Presence or Absence of High-Risk or Low-Risk Syndromes
High-Risk Syndromes, Low-Risk Syndromes, and Mortality
During 5 years of follow-up, there were 156 deaths in the validation set. Patients with only high-risk syndromes were at increased risk for death (Figure 1; 12% versus 2%, hazard ratio 8.0, 95% CI 5.8 to 10.9, P<0.0001). Although these patients made up only 15% of the population, they accounted for 58% of the deaths.

Figure 1. Kaplan-Meier plot relating pattern of syndromes derived from LAD in the training data set to mortality in the validation data set. Patients were divided into those who only had high-risk syndromes, only low-risk syndromes, both high- and low-risk syndromes, and neither high- nor low-risk syndromes.
Patients with only low-risk syndromes were at low risk (1.3%; hazard ratio compared with all others 0.15, 95% CI 0.10 to 0.20, P<0.0001). Compared with patients with only low-risk syndromes, patients who had both high-risk and low-risk syndromes were at somewhat increased risk (4%, hazard ratio 3.1, 95% CI 1.8 to 5.4, P<0.0001). Patients who had neither high-risk nor low-risk syndromes could not be shown to be at increased risk (2%, hazard ratio 1.8, 95% CI 0.4 to 7.4, P=0.42).
Prognostic Score and Risk
A prognostic score based on the number of high-risk and low-risk syndromes present in each patient was derived (median -31, 25th and 75th percentile values -54 and 8). A value ≥0, representing the highest quintile, was noted in 979 patients (21%) who had a markedly increased risk of death (Figure 2; 11% versus 1%, hazard ratio 8.3, 95% CI 5.9 to 11.6, P<0.0001), accounting for 67% of events. The score was predictive of death when considered as a continuous variable (for a 1 standard deviation increase of risk score: hazard ratio 2.8, 95% CI 2.4 to 3.2, P<0.0001).

Figure 2. Kaplan-Meier plot relating quintiles of prognostic risk score with mortality in the validation data set.
LAD and Cox Model
We analyzed a model in which we entered the LAD risk score and a risk score derived from the parameter coefficients of a Cox model; both scores were derived in the training set and tested in the validation data set. The variables that entered the final Cox model were age, sex, chronotropic response, heart rate recovery, resting heart rate squared, and a history of chronic lung disease. The 2 interactions entered were age with chronotropic response and resting heart rate with heart rate recovery.
The LAD risk score provided additional prognostic information over that provided by the Cox model (P=0.037). Specifically, the likelihood ratio χ² value for the Cox model alone was 226.9 and increased to 231.3 when adding the LAD risk score. When high risk was defined as the highest quintile, there were 3652 patients (77%) who were considered low risk by both methods, 210 (4%) considered high risk by the Cox model only, 101 (2%) considered high risk by the LAD only, and 769 (16%) considered high risk by both methods. The Kaplan-Meier 5-year death rates in these groups were 1.3%, 2.5%, 3.9%, and 12.4%, respectively.
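The reported gain in the likelihood ratio statistic can be related to the reported P value with a short calculation. The sketch assumes that the LAD score enters the model as a single term, so the difference is referred to a chi-square distribution with 1 degree of freedom; the helper name `chi2_sf_1df` is hypothetical.

```python
import math


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))


lr_cox_only = 226.9      # likelihood ratio statistic, Cox model alone
lr_cox_plus_lad = 231.3  # after adding the LAD risk score
delta = lr_cox_plus_lad - lr_cox_only
p_value = chi2_sf_1df(delta)  # assumes 1 degree of freedom
print(round(delta, 1), round(p_value, 3))
```

The result is close to the reported P value; a small discrepancy is expected because the published statistics are rounded to one decimal place.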
Model Calibrations
Figure 3 shows the predicted and actual mortality outcomes in the validation data set according to deciles of predicted risk, as defined by the training set using the LAD method. There was excellent calibration (F=1063, r²=0.992). Figure 4 shows predicted and actual mortality outcomes in the validation set according to deciles of risk based on the Cox model. Calibration was quite good, but tended to be not as good as with the LAD method (F=238, r²=0.963, P for difference in r²=0.14). Model discrimination was similar with both approaches. The c-statistics for Cox alone, LAD alone, and both together were 0.82, 0.81, and 0.82, respectively.
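The comparison of the two calibration correlations can be approximated with Fisher's z transformation. The sketch below makes two assumptions that are not stated in the text: each correlation is based on 10 points (one per decile), and the two correlations are treated as independent. The function name is hypothetical.

```python
import math


def fisher_z_test(r1, n1, r2, n2):
    """Two-sided test for a difference between two independent correlations."""
    z1, z2 = math.atanh(r1), math.atanh(r2)  # Fisher z transformation
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2))     # two-sided normal P value
    return z, p


r_lad = math.sqrt(0.992)  # from the reported LAD calibration fit
r_cox = math.sqrt(0.963)  # from the reported Cox calibration fit
z_stat, p_diff = fisher_z_test(r_lad, 10, r_cox, 10)  # 10 deciles assumed
print(round(z_stat, 2), round(p_diff, 2))
```

Under these assumptions the P value is in the neighborhood of the reported 0.14, which supports the reading that the difference in calibration was not statistically significant.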

Figure 3. Predicted versus actual mortality rates among patients in the validation data set with predicted values based on LAD on the derivation data set. Each point refers to a decile of risk, whereas the line reflects the least squares fit.

Figure 4. Predicted versus actual mortality rates among patients in the validation data set with predicted values based on Cox regression modeling on the derivation data set. Each point refers to a decile of risk, whereas the line reflects the least squares fit.
Discussion
We used the LAD method to derive a risk stratification scheme for patients referred for exercise electrocardiography. In a training data set of 4722 patients, we identified 15 unique high-risk syndromes and 26 unique low-risk syndromes. When applying these syndromes to a validation set of 4732 patients, a greater preponderance of high-risk syndromes predicted increased death risk. Thus, we were able to designate a group of patients who made up 21% of the validation cohort, yet accounted for 67% of the deaths. This is in contrast to most risk markers, which, although they identify high-risk populations, can only identify a minority of patients who experience events.15 Using LAD, it was possible to accurately predict death rates across a wide spectrum of risk (Figure 3) to at least the same degree that a Cox proportional hazards model did (Figure 4).
Previous groups have incorporated clinical and exercise test findings into standard multivariable modeling techniques to derive risk scores for similar patient populations.8,16 Although these risk scores have been shown to identify high- and low-risk patients, they are limited in that patients labeled as high-risk tend to be very few and therefore account for a small minority of events.17 For example, the Duke treadmill score8 has been used to identify intermediate-risk patients. Patients with either intermediate- or high-risk scores were shown to comprise the majority of those who experienced events, but they also accounted for 45% to 55% of all patients studied.17 Thus, a clinician referring a patient for a stress test would be faced with an even chance that the test result would require further evaluation. In contrast, by classifying patients according to whether or not they had high-risk or low-risk syndromes, it was possible to label 15% of the population as high-risk and 74% as very low-risk. This left some degree of uncertainty among the remaining 11%.
The LAD method may represent a systematic means to explore the importance and nature of interactions in prognostic models. Not only is it possible to test very large numbers of interactions in an efficient way, but one can also test 3-way and even more complex patterns. LAD allows for determination of predictive variables via a rigorous, systematic, and unbiased examination of combinations. Continuous variables need not be constrained to arbitrary cut-points. Finally, another advantage of LAD is that it does not require confirmation of any assumptions about the distributions of data or times to events, unlike, for example, the Cox model, which requires constant proportional hazards over time.
This was an observational study limited to 1 center and is therefore subject to the inherent limitations of such studies, including unobserved confounders and biased sampling variation. Although we performed validation studies, both in deriving the syndromes and in assessing how well they predicted risk, the results of this analysis will need to be confirmed in other data sets derived from other centers. We also did not have formal measures of left ventricular function or non-electrocardiographic measures of myocardial ischemia available.
By systematically considering patterns of findings, we have demonstrated a potentially useful means of prediction of death after exercise testing. On the basis of this analysis alone, the role of LAD in cardiovascular medicine in relation to standard methodologies has not been established, only introduced. If the prognostic utility can be confirmed in other cohorts, it may function well not only for risk stratification but also for systematic determination of clinically important interactions. Much future research will be needed to determine whether or not and how best to incorporate the LAD method into routine clinical risk stratification.
Acknowledgments
Drs Lauer, Blackstone, and Ishwaran and C. Snader receive support
from the National Heart, Lung, and Blood Institute (Grant RO1
HL-66004-1). Drs Lauer and Blackstone and C. Snader receive
additional support from the American Heart Association (Established
Investigator Grant 0040244N). Dr Hammer and S. Alexe receive
support from the National Science Foundation (Grant NSF-DMS-9806389)
and the Office of Naval Research (Grant N00014-92-J-1375).
Received April 17, 2002;
revision received May 20, 2002;
accepted May 20, 2002.
References
1. Crama Y, Hammer PL, Ibaraki T. Cause-effect relationships and partially defined Boolean functions. Ann Oper Res. 1988; 16: 299–326.
2. Boros E, Hammer PL, Ibaraki T, et al. Logical analysis of numerical data. Math Programming. 1997; 79: 163–190.
3. Boros E, Hammer PL, Ibaraki T, et al. An implementation of logical analysis of data. IEEE Trans Knowl Data Eng. 2000; 12: 292–306.
4. Califf RM, Armstrong PW, Carver JR, et al. 27th Bethesda Conference: matching the intensity of risk factor management with the hazard for coronary disease events. Task Force 5. Stratification of patients into high, medium and low risk subgroups for purposes of risk factor management. J Am Coll Cardiol. 1996; 27: 1007–1019.
5. Nishime EO, Cole CR, Blackstone EH, et al. Heart rate recovery and treadmill exercise score as predictors of mortality in patients referred for exercise ECG. JAMA. 2000; 284: 1392–1398.
6. Cox D. Regression models and life tables [with discussion]. J R Stat Soc B. 1972; 34: 187–220.
7. Gibbons RJ, Balady GJ, Beasley JW, et al. ACC/AHA Guidelines for Exercise Testing. A report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines (Committee on Exercise Testing). J Am Coll Cardiol. 1997; 30: 260–311.
8. Mark DB, Shaw L, Harrell FE Jr, et al. Prognostic value of a treadmill exercise score in outpatients with suspected coronary artery disease [see comments]. N Engl J Med. 1991; 325: 849–853.
9. Lauer MS, Blackstone EH, Young JB, et al. Cause of death in clinical research: time for a reassessment? J Am Coll Cardiol. 1999; 34: 618–620.
10. Boyle CA, Decoufle P. National sources of vital status information: extent of coverage and possible selectivity in reporting. Am J Epidemiol. 1990; 131: 160–168.
11. Lauer MS, Francis GS, Okin PM, et al. Impaired chronotropic response to exercise stress testing as a predictor of mortality. JAMA. 1999; 281: 524–529.
12. Alexe S, Blackstone EH, Hammer PL, et al. Coronary risk prediction by Logical Analysis of Data. Ann Oper Res. In press.
13. Fisher RA. On the "probable error" of a coefficient of correlation deduced from a small sample. Metron. 1921; 1: 1–32.
14. Harrell FE Jr, Califf RM, Pryor DB, et al. Evaluating the yield of medical tests. JAMA. 1982; 247: 2543–2546.
15. Rose G. The Strategy of Preventive Medicine. New York, NY: Oxford University Press; 1992.
16. Morrow K, Morris CK, Froelicher VF, et al. Prediction of cardiovascular death in men undergoing noninvasive evaluation for coronary artery disease. Ann Intern Med. 1993; 118: 689–695.
17. Kwok JM, Miller TD, Christian TF, et al. Prognostic value of a treadmill exercise score in symptomatic patients with nonspecific ST-T abnormalities on resting ECG. JAMA. 1999; 282: 1047–1053.