Comparing AMI Mortality Among Hospitals in Patients 65 Years of Age and Older
Evaluating Methods of Risk Adjustment
Background—Interest in the reporting of risk-adjusted outcomes for patients with acute myocardial infarction is growing. A useful risk-adjustment model must balance parsimony and ease of data collection with predictive ability.
Methods and Results—From our analysis of 82 359 patients ≥65 years of age admitted with acute myocardial infarction to 2401 hospitals, we derived a parsimonious model that predicts 30-day mortality. The model was validated on a similar group of 78 699 patients from 2386 hospitals. Of the 73 candidate predictor variables examined, 7 variables describing patient characteristics on arrival were selected for inclusion in the final model: age, cardiac arrest, anterior or lateral location of myocardial infarction, systolic blood pressure, white blood cell count, serum creatinine, and congestive heart failure. The area under the receiver-operating characteristic curve for the final model was 0.77 in the derivation cohort and 0.77 in the validation cohort. The rankings of hospitals by performance (in deciles) with this model were most similar to a comprehensive 27-variable model based on medical chart review and least similar to models based on administrative billing codes.
Conclusions—A simple 7-variable risk model performs as well as more complex models in comparing hospital outcomes for acute myocardial infarction. Although there is a continuing need to improve methods of risk adjustment, our results provide a basis for hospitals to develop a simple approach to compare outcomes.
Appropriate treatment of acute myocardial infarction (AMI) can substantially reduce 30-day mortality.1 However, guideline-based therapies are not uniformly applied,2 and there is much variation in mortality among hospitals. For hospitals to be accountable for the care they provide to patients with AMI, they need to be able to compare their performance. In an increasingly competitive medical environment, information about outcomes holds substantial interest for patients, healthcare providers, employers, and healthcare plans.
An impediment to the reporting of outcomes is the challenge of comparing institutions with patients who have different risk profiles. Without adjustment for these baseline differences, comparisons of crude mortality rates favor hospitals that admit the lowest-risk patients. Meaningful evaluations of hospital performance need to consider baseline differences in patient characteristics that could confound comparisons among them.
One approach to facilitate the comparison of hospitals is to use a mathematical model based on patient characteristics to predict mortality and calculate a standardized mortality ratio (SMR), the ratio of a hospital's observed mortality to its predicted mortality. Hospitals can be compared more meaningfully by use of SMRs because these ratios take into account differences in baseline patient characteristics. There is, however, a paucity of information to guide the choice of risk-adjustment model to predict mortality. Many studies have identified prognostic factors for patients with AMI,3 and some studies have promoted specific predictive models.3 4 5 6 7 Complex risk-adjustment models may be preferred by some clinicians because of the breadth of information they include, but extensive data collection efforts can be costly. An ideal model would balance parsimony and ease of data collection with predictive ability.
The objective of this study was to develop a model based on a small number of easily abstracted variables that would accurately predict short-term mortality among patients with AMI. In addition, we sought to compare our model with other published models with respect to discriminant ability, calibration, and hospital ranking. To address these objectives, this study was conducted as part of the Cooperative Cardiovascular Project (CCP), a Health Care Financing Administration initiative to improve the quality of care for Medicare beneficiaries with AMI.2
The CCP database has been described previously.2 In brief, it includes >200 000 patients hospitalized across the country with a principal discharge diagnosis of AMI (International Classification of Diseases, Ninth Revision, Clinical Modification [ICD-9-CM] 410) from 1994 to 1995. Trained technicians abstracted predefined demographic, clinical, and treatment variables from copies of the hospital records and entered them directly into a computer database using interactive software. For all CCP samples, >3000 records were reabstracted, with overall variable agreement of ≈95%.
Medicare Enrollment Database
The Medicare Enrollment Database contains accurate records of the vital status of Medicare beneficiaries,8 but entries from the Social Security records include unverified dates of death recorded as the last day of the month when the exact date from a death certificate was unavailable. We eliminated cases with unverified days of death from the mortality analysis if mortality could not be classified with certainty at the time of evaluation, as described in an earlier report.7 We found unverified days of death for 325 patients in our sample (≈0.2% of such patients or 0.8% of deaths).
The overall study sample was restricted to patients ≥65 years of age who had confirmed AMI, as previously reported,2 and who were not received in transfer from another institution. To avoid counting patients more than once, we included only a patient’s first confirmed AMI hospitalization in the CCP.
Derivation and Validation of Predictive Model
Candidate Predictor Variables
From our review of the medical literature and clinical experience, we selected candidate predictor variables that described demographic and clinical characteristics of the patients. These variable domains (and the specific variables) included the following: demographic characteristics (age, sex, and race), medical history (angina, hypertension, diabetes, active ulcer disease, bleeding disorder, internal bleeding, bypass surgery, heart failure or pulmonary edema, chronic obstructive pulmonary disease, cigarette smoker, stroke, AMI, angioplasty, and trauma in the past month), functional status (mobility, urinary continence, and dementia), clinical presentation and severity variables (systolic and diastolic blood pressures, pulse, respiratory rate, temperature, presence of chest pain, time since chest pain started, hemorrhage, cardiac arrest, gallop rhythm or S3, rales, heart failure or pulmonary edema, cardiomegaly, height, and weight), initial laboratory results (albumin, serum urea nitrogen, creatinine, hematocrit, sodium, and white blood count), and first ECG (left bundle-branch block, pacemaker rhythm, right bundle-branch block, ST-segment elevation, transmural MI, ventricular tachycardia, atrial fibrillation [AF]/flutter, second- or third-degree heart block, evidence of old infarction, and location of AMI). We did not include shock because of concerns that it would be susceptible to intentional manipulation.
Model Development and Validation
We defined a derivation sample that randomly included half of the hospitals in the study sample. In this derivation set, we performed iterations of logistic regression models with 30-day mortality as the dependent variable, gradually reducing the number of independent predictors. We began with all 73 candidate predictors with their associated dummy variables. When variables with missing observations were included in or removed from multivariate models, dummy variables indicating the presence of missing values (yes/no) were also added or removed. We then selected 40 variables with a significance level of P<0.001 in the logistic regression. To identify the most influential variables, the model was further restricted to 23 variables with a Wald χ2 value >50. At this point, we created composite variables in which related variables had similar ORs (eg, we combined anterior MI location with lateral MI location). We repeated the logistic regression, selecting 7 variables with a Wald χ2 value >300. Although this threshold is arbitrary, it allowed selection of variables with strong clinical associations to 30-day mortality.
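The threshold-based selection described above can be illustrated with a minimal sketch. It assumes the single-coefficient form of the Wald statistic, (coefficient / standard error) squared, which applies to predictors entered as one term; predictors carried with dummy variables would need a multi-degree-of-freedom test that is not shown. The function names and the example coefficients are hypothetical and are not values from the study.

```python
from typing import Dict, List, Tuple


def wald_chi_square(coef: float, std_err: float) -> float:
    """Wald chi-square (1 degree of freedom) for a single regression coefficient."""
    return (coef / std_err) ** 2


def select_by_wald(fit: Dict[str, Tuple[float, float]], threshold: float) -> List[str]:
    """Return the predictors whose Wald chi-square exceeds the threshold."""
    return sorted(
        name
        for name, (coef, std_err) in fit.items()
        if wald_chi_square(coef, std_err) > threshold
    )


# Hypothetical (coefficient, standard error) pairs, for illustration only.
example_fit = {
    "age": (0.06, 0.002),
    "cardiac_arrest": (1.5, 0.08),
    "sex": (0.05, 0.02),
    "pulse": (0.004, 0.0005),
}

# Thresholds of 50 and 300 mirror the two Wald cut points named in the text.
intermediate = select_by_wald(example_fit, 50.0)
final = select_by_wald(example_fit, 300.0)
```

In practice the model would be refit after each reduction, because the coefficients and standard errors change when predictors are dropped.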
Missing Data and Extreme Values
Missing observations exceeded 5% for the following candidate predictor variables: angina, time since chest pain started, evidence of heart failure or pulmonary edema on chest x-ray, location of AMI, ventricular tachycardia, height, weight, albumin, AF/flutter, heart block on ECG, left or right bundle-branch block, and paced rhythm. Values for continuous variables outside the following ranges were considered implausible and set to missing: respiratory rate >80 breaths per minute, systolic blood pressure >300 mm Hg, diastolic blood pressure >150 mm Hg, serum urea nitrogen >200 mg/dL, creatinine >25 mg/dL, and albumin >20 mg/dL. We replaced values outside the following ranges with either minimum or maximum values: systolic blood pressure (70 to 300 mm Hg) and creatinine (0.6 to 2.5 mg/dL).
Approximately 2.7% of the sample had missing values for creatinine and white blood cell count, and 0.5% had missing values for systolic blood pressure (Appendix A). Because missing systolic blood pressure was associated with higher mortality (P<0.001), possibly representing situations in which patients were unstable, we replaced missing values with minimum values (70 mm Hg). Missing observations for creatinine and white blood cell count were replaced with median values. Observations with missing values for MI location and radiographic evidence of heart failure were set to null. Alternative methods for controlling missing values, such as including dummy variables indicating missing observations or restricting the analysis to observations without any missing values, did not substantially affect model estimates, calibration, or our conclusions.
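The handling of extreme and missing values for two of the model variables might be sketched as follows. This is an illustration under stated assumptions: the rules are applied in the order the text lists them (implausible values set to missing, then truncation to the stated range, then imputation), and the creatinine median is taken over the truncated, non-missing values. Neither ordering detail is confirmed by the text, and the function names are hypothetical.

```python
from statistics import median
from typing import List, Optional, Sequence


def clean_systolic_bp(values: Sequence[Optional[float]]) -> List[float]:
    """Apply the systolic blood pressure rules (mm Hg) described in the text."""
    cleaned = []
    for v in values:
        if v is not None and v > 300:
            v = None  # implausible value is set to missing
        if v is None:
            v = 70.0  # missing value is replaced with the minimum
        cleaned.append(min(max(v, 70.0), 300.0))  # truncate to 70-300
    return cleaned


def clean_creatinine(values: Sequence[Optional[float]]) -> List[float]:
    """Apply the creatinine rules (mg/dL) described in the text."""
    # Values above 25 mg/dL are considered implausible and set to missing.
    plausible = [None if (v is not None and v > 25) else v for v in values]
    # Remaining values are truncated to the range 0.6-2.5 mg/dL.
    truncated = [None if v is None else min(max(v, 0.6), 2.5) for v in plausible]
    # Assumption: the median is computed from the truncated observed values.
    # statistics.median raises StatisticsError if no observed values remain.
    fill = median(v for v in truncated if v is not None)
    return [fill if v is None else v for v in truncated]
```

Under the assumed ordering, a systolic pressure above 300 mm Hg ends up at 70 mm Hg, because it is first set to missing and missing values are then replaced with the minimum.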
We compared the new model with the following published AMI-specific models of 30-day mortality: the CCP-pilot model,7 the Global Utilization of Streptokinase and Tissue Plasminogen Activator for Occluded Coronary Arteries Trial (GUSTO-I) model,4 the Medicare Mortality Predictor System (MMPS) model,5 an ICD-9 code model,6 and 2 models from the California Hospital Outcomes Project9 (Table 1). The CCP-pilot model and the ICD-9 model were not modified. The GUSTO-I model included all the demographic and clinical variables, but the type of thrombolytic therapy was not included because this sample was not restricted to patients who received it. Our version of the MMPS did not include values for serum potassium in the APACHE II10 score, which were not abstracted for the CCP. The California risk-adjustment models included 2 ICD-9–based models: model A (CA-A), which included risk factors most likely present only at admission, and model B (CA-B), which included additional characteristics believed to be present only at admission but that may have occurred during hospitalization. Our versions of CA-A and CA-B did not include source of payment (because all CCP patients were enrolled in Medicare) or year of admission (because CCP admissions were within 1.5 years of each other). All models were recalibrated by use of the validation set.
We used 4 approaches to compare models. First, in a patient-level analysis, we assessed the discriminative ability of each model using analysis of area under the receiver-operating characteristic (AROC) curves.11 Second, we compared model calibration for each of the models using the Hosmer-Lemeshow χ2 statistic. Calibration is a measure of how well a particular model fits the data across a range of patient characteristics.12 Models with smaller χ2 values are less likely to suffer from systematic lack of fit. Third, we determined the correlation of the SMR for each hospital calculated by the different models.
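The first two measures can be sketched in a few lines. This is a minimal illustration, not the software used in the study: the AROC is computed as the proportion of death/survivor pairs in which the patient who died had the higher predicted risk (ties count one half), and the Hosmer-Lemeshow statistic uses the common grouping by ranked predicted risk, with 10 groups assumed as the default. The function names are hypothetical.

```python
from typing import List, Sequence


def aroc(died: Sequence[int], risk: Sequence[float]) -> float:
    """Area under the ROC curve as the concordance of predicted risk with outcome."""
    deaths = [r for d, r in zip(died, risk) if d == 1]
    survivors = [r for d, r in zip(died, risk) if d == 0]
    score = 0.0
    for r_death in deaths:
        for r_surv in survivors:
            if r_death > r_surv:
                score += 1.0
            elif r_death == r_surv:
                score += 0.5
    return score / (len(deaths) * len(survivors))


def hosmer_lemeshow(died: Sequence[int], risk: Sequence[float], groups: int = 10) -> float:
    """Hosmer-Lemeshow chi-square over groups of patients ranked by predicted risk."""
    order: List[int] = sorted(range(len(risk)), key=lambda i: risk[i])
    n = len(order)
    chi2 = 0.0
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        if not idx:
            continue
        observed = sum(died[i] for i in idx)
        expected = sum(risk[i] for i in idx)
        mean_risk = expected / len(idx)
        variance = len(idx) * mean_risk * (1.0 - mean_risk)
        if variance == 0:
            continue  # group contributes nothing if every predicted risk is 0 or 1
        chi2 += (observed - expected) ** 2 / variance
    return chi2
```

The pairwise AROC loop is quadratic in the number of patients; for cohorts of the size studied here a rank-based computation would be preferable, but the pairwise form makes the definition explicit.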
Finally, for each hospital with >50 cases, we evaluated the degree to which hospital rankings would change when different models were used. We calculated risk-adjusted 30-day mortality rates for each hospital on the basis of each of the models (Appendix B). We assigned each hospital a performance rank on the basis of the decile of risk-adjusted mortality (lowest to highest) for each model. We then determined the agreement in the ranking among the models by classifying the percentage of hospitals that were in a similar decile (defined as the same decile or 1 decile different) by each pair of models. For example, if 1 model classified the hospital in the fifth decile and the other in the sixth or fourth decile, they would be considered to agree. We also assessed the similarity of rankings by comparing each of the models with a ranking based on crude (unadjusted) mortality rates.
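The "similar decile" agreement between 2 models can be sketched as below. This is an illustration under assumptions: hospitals are already restricted to those with >50 cases, deciles are assigned by rank position (so tied rates are split by input order), and the function names are hypothetical.

```python
from typing import List, Sequence


def decile_ranks(rates: Sequence[float]) -> List[int]:
    """Assign each hospital a decile from 1 (lowest rate) to 10 (highest rate)."""
    order = sorted(range(len(rates)), key=lambda i: rates[i])
    ranks = [0] * len(rates)
    for position, hospital in enumerate(order):
        ranks[hospital] = position * 10 // len(rates) + 1
    return ranks


def similar_decile_agreement(rates_a: Sequence[float], rates_b: Sequence[float]) -> float:
    """Fraction of hospitals placed in the same or an adjacent decile by both models."""
    deciles_a = decile_ranks(rates_a)
    deciles_b = decile_ranks(rates_b)
    similar = sum(1 for a, b in zip(deciles_a, deciles_b) if abs(a - b) <= 1)
    return similar / len(deciles_a)


# Hypothetical risk-adjusted rates for 20 hospitals under 2 models.
model_a = [float(i) for i in range(20)]
model_b = [float(19 - i) for i in range(20)]  # exactly reversed ordering
agreement_same = similar_decile_agreement(model_a, model_a)
agreement_reversed = similar_decile_agreement(model_a, model_b)
```

Even a fully reversed ordering leaves the middle deciles in agreement, which is one reason agreement with crude rankings can look substantial without implying that the rankings are interchangeable.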
Table 2 reports the development of the study sample of 161 058 patients. This sample was randomly split by hospital into a derivation set of 82 359 patients and a validation set of 78 699 patients.
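Splitting by hospital rather than by patient keeps all patients from a given hospital in the same set. A minimal sketch of such a split follows; the seed, the function name, and the use of an exact half are assumptions for illustration, and the study's own randomization procedure is not described in this detail.

```python
import random
from typing import List, Sequence, Tuple


def split_by_hospital(hospital_ids: Sequence[str], seed: int = 0) -> Tuple[List[int], List[int]]:
    """Return patient indexes for derivation and validation sets, split at the hospital level."""
    hospitals = sorted(set(hospital_ids))
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(hospitals)
    derivation_hospitals = set(hospitals[: len(hospitals) // 2])
    derivation = [i for i, h in enumerate(hospital_ids) if h in derivation_hospitals]
    validation = [i for i, h in enumerate(hospital_ids) if h not in derivation_hospitals]
    return derivation, validation


# Hypothetical hospital identifier for each patient record.
patients = ["A", "A", "B", "C", "C", "D"]
derivation_idx, validation_idx = split_by_hospital(patients)
```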
New Model Characteristics
In a model with all 73 of the candidate predictor variables, the AROC was 0.80. The variables selected for the final model were age, cardiac arrest, anterior or lateral location of myocardial infarction, systolic blood pressure, white blood cell count, serum creatinine, and congestive heart failure (Table 3). This model had an AROC of 0.77. In the validation cohort, the AROC was also 0.77, indicating good model discrimination (Table 4).
There were 7 variables in our new model, 27 variables in the CCP-pilot model, 19 variables in the GUSTO-I model, 31 variables in the MMPS model, 45 variables in the ICD-9 model, 22 variables in the CA-A model for patients with no prior admissions and 16 for patients with prior admissions, and 58 variables in the CA-B model for patients with no prior admission and 45 for patients with prior admissions (Table 1). The AROCs for the models were similar, ranging from 0.70 in the ICD-9 model to 0.78 in the CCP-pilot model (Table 4). The new model was among the 3 models with the lowest Hosmer-Lemeshow χ2 values, performing similarly to the GUSTO-I and CA-A models. The Figure compares the SMR between the different models and the new model. The correlation was highest among the clinically based models.
The agreement in similar rankings by decile for the new model based on risk-adjusted mortality was also highest among the clinically based models (Table 5). Compared with a ranking based on crude mortality rates, only 38.3% of the hospitals were classified similarly with a ranking based on the new model, 40.6% for the CCP-pilot model, 38.3% for the GUSTO-I model, 45.2% for the MMPS model, 36.0% for the ICD-9 code model, 37.5% for the CA-A model, and 39.5% for the CA-B model.
In this study, we developed and validated a simple 7-variable risk model for 30-day mortality after admission for AMI. The model is based on data from medical records obtained from acute-care hospitals throughout the United States. The variables are easy to collect and available at admission. This parsimonious model yields a predictive performance for 30-day mortality that is comparable to that of other published clinically based risk indexes that rely on larger numbers of variables. Moreover, the assessment of hospital performance with this model is similar to that obtained with other published models based on clinical information.
The clinically based models produced rankings of hospitals that were more similar to each other than to the administrative code–based models. Among the models tested, the new model had the worst agreement in the ranking of hospitals with the 3 models derived exclusively from administrative data. These models were based on billing codes and may have included information from events that occurred during hospitalization. As a result, the secondary codes may have represented either comorbidities or complications. Previous work has shown that these codes may not commonly agree with documentation in the medical charts.13 The advantage of the code-based models is that administrative data are readily available on all patients without further data collection required.
The study of risk-adjustment models presents an important challenge.14 There is no gold standard to compare model performance. We sought to compare published models for their ability to classify hospitals and to determine the SMR. Nevertheless, the fact that the models produce similar results does not ensure that they accurately indicate similar levels of performance. Differences in the characteristics of patients admitted to the various hospitals indicate the need for risk adjustment, as demonstrated by the difference in agreement of the risk-adjustment models with crude mortality rates. However, critics of risk adjustment may be concerned that even the best models cannot explain most of the variation in patient outcome.
None of the models in this study demonstrated perfect agreement in the ranking of hospitals by similar decile. The best agreement achieved between models was 80.3%. Iezzoni and others15 have expressed concern that the use of different risk-adjustment models can result in different rankings of hospitals. The failing of these models may result from random variation (a particular problem with a small number of cases), imprecision in the measurement of the variables, unmeasured differences among patients, and other uncertainties pertaining to the human condition. Our model is also based on baseline patient characteristics and does not consider treatments that may modify outcome. Although these models provide the best current approach to comparing performance, there is a continuing need to develop better methods of comparison.
When developing a risk-adjustment model, we must consider how variable selection would impact data quality. Because hospitals have an incentive to overstate illness severity, a risk-adjustment model would ideally select clinical measures that could not easily be manipulated. Most of the variables in our new model were not subject to clinical interpretation. Congestive heart failure may have been subject to some variability in clinical interpretation but included radiographic evidence for heart failure in its definition.
This study has several limitations. First, we focused on 30-day mortality. Although short-term mortality is only a single domain with which to evaluate hospitals in terms of performance, it is an outcome that is important to patients and can be measured reliably. Our focus on this outcome is not meant to diminish the importance of other domains that include functioning, satisfaction, and cost. Future studies may address the best approach to evaluating performance across hospitals in these other domains.
Second, the study population included patients who were ≥65 years of age, and the generalizability of the results to younger populations was not explored. However, most patients with AMI are in this older age group. In addition, given the competing risks of older patients, these disease-specific models would be expected to perform less well in a group of older compared with younger patients.
In conclusion, we demonstrate that a simple 7-variable risk model can perform as well as more complex models in comparing hospital mortality rates for AMI. These results can provide a basis for hospitals to develop a simple approach to comparing outcomes for this important diagnosis.
Appendix A
Age (years): Date of birth from Medicare UB-92 claims form.
Cardiac arrest (yes/no): Ventricular fibrillation, ventricular tachycardia, or some other cardiac disturbance within 6 hours before arrival to the hospital that required cardiopulmonary resuscitation, defibrillation, or chemical cardioversion.
Location of MI (anterior/septal, lateral, posterior, inferior, subendocardial, other): As determined from ECG.
Systolic blood pressure (mm Hg): First value documented within 48 hours of admission.
White blood cell count (thousands): First value documented within 24 hours after admission. If none, closest value within 24 hours before admission.
Creatinine (mg/dL): First value documented within 24 hours after admission. If none, closest value within 24 hours before admission. Divide metric (SI) values recorded in μmol/L by 88.4 to convert to mg/dL.16
Congestive heart failure (yes/no): Congestive heart failure and/or pulmonary edema present at time of arrival on the basis of clinical or radiographic evidence.
See Table 6 for additional data.
Appendix B: Calculation of Risk-Adjusted Mortality Rate
The risk-adjusted mortality rate (or indirectly standardized rate) represents an estimate of the mortality rate that would have been observed for a particular hospital if its patients were similar to those of the overall population with respect to the risk-adjustment variables of interest. The following steps should be used to calculate risk-adjusted mortality:
1. Collect data on observed 30-day outcomes for each patient and the 7 independent variables in Appendix A.
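2. Estimate the predicted risk of mortality for each patient (P) from a logistic regression model for 30-day outcomes on the 7 independent variables using the following equation: P = 1/(1 + exp[−(β0 + β1X1 + … + β7X7)]), where β0 is the intercept and β1 through β7 are the regression coefficients for the 7 independent variables X1 through X7. The mean of the predicted risk of mortality for all patients at a particular hospital represents the expected mortality rate for that hospital.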
3. Calculate the SMR for a particular hospital by dividing its observed mortality rate by its expected mortality rate. An SMR <1 indicates that a hospital was observed to have a lower mortality rate than predicted by the risk-adjustment model; an SMR >1 indicates that a hospital was observed to have a higher mortality rate than predicted by the risk-adjustment model.
4. A risk-adjusted mortality rate for a particular hospital can be calculated by multiplying the SMR for a hospital by the overall population mortality rate.
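The 4 steps above can be sketched as follows. This is a minimal illustration that assumes the standard logistic form for the predicted risk; the function names are hypothetical, and no regression coefficients are supplied here because they would have to come from the fitted model.

```python
import math
from typing import Sequence


def predicted_risk(intercept: float, coefs: Sequence[float], x: Sequence[float]) -> float:
    """Step 2: predicted 30-day mortality risk for one patient from a logistic model."""
    z = intercept + sum(b * v for b, v in zip(coefs, x))
    return 1.0 / (1.0 + math.exp(-z))


def hospital_smr(died: Sequence[int], risks: Sequence[float]) -> float:
    """Step 3: observed mortality rate divided by expected (mean predicted) rate."""
    observed_rate = sum(died) / len(died)
    expected_rate = sum(risks) / len(risks)
    return observed_rate / expected_rate


def risk_adjusted_rate(smr: float, overall_rate: float) -> float:
    """Step 4: SMR multiplied by the overall population mortality rate."""
    return smr * overall_rate
```

For example, a hospital with an SMR of 1.2 in a population with an overall 30-day mortality rate of 18% would have a risk-adjusted rate of 21.6%; these figures are illustrative and not taken from the study.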
Dr Krumholz is a Paul Beeson Faculty Scholar. Jersey Chen is a Merck/American Federation for Aging Research Scholar in Geriatric Pharmacology and an American Heart Association Student Scholar in Cardiovascular Disease and Stroke.
Reprint requests to Harlan M. Krumholz, MD, Yale University School of Medicine, 333 Cedar St, PO Box 208025, New Haven, CT 06520-8025.
The analyses on which this article is based were performed under contract No. 500–96-P549, entitled “Utilization and Quality Control Peer Review Organization for the State of Connecticut,” sponsored by the Health Care Financing Administration, Department of Health and Human Services. The contents of this article do not necessarily reflect the views or policies of the Department of Health and Human Services, nor does mention of trade names, commercial products, or organizations imply endorsement by the US government. The authors assume full responsibility for the accuracy and completeness of the ideas presented. This article is a direct result of the Health Care Quality Improvement Program initiated by the Health Care Financing Administration, which has encouraged identification of quality improvement projects derived from analysis of patterns of care and therefore required no special funding on the part of this contractor. Ideas and contributions to the author concerning experience in engaging with issues presented are welcomed.
- Received December 17, 1998.
- Revision received March 25, 1999.
- Accepted March 26, 1999.
- Copyright © 1999 by American Heart Association
Ryan TJ, Anderson JL, Antman EM, Braniff BA, Brooks NH, Califf RM, Hillis LD, Hiratzka LF, Rapaport E, Riegel BJ, Russell RO, Smith EE, Weaver WD. ACC/AHA guidelines for the management of patients with acute myocardial infarction: a report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines (Committee on Management of Acute Myocardial Infarction). J Am Coll Cardiol. 1996;28:1328–1428.
Lee KL, Woodlief LH, Topol EJ, Weaver WD, Betriu A, Col J, Simoons M, Aylward P, Van de Werf F, Califf RM, for the GUSTO-I Investigators. Predictors of 30-day mortality in the era of reperfusion for acute myocardial infarction: results from an international trial of 41,021 patients. Circulation. 1995;91:1659–1668.
Romano PS, Luft HS, Rainwater JA, Zach AP. Report on Heart Attack 1991–1993, Volume 2: Technical Guide. Sacramento, Calif: California Office of Statewide Health Planning and Development; 1997.
Hosmer D, Lemeshow S. Applied Logistic Regression. New York, NY: John Wiley & Sons; 1989.
Young DS. Implementation of SI units for clinical laboratory data: style specifications and conversion tables. Ann Intern Med. 1987;106:114–129.
We analyzed data on 82 359 patients ≥65 years of age who were admitted with acute myocardial infarction to 2401 hospitals and derived a model predicting 30-day mortality. It included 7 variables: age, cardiac arrest, anterior or lateral location of myocardial infarction, systolic blood pressure, white blood cell count, serum creatinine, and congestive heart failure. We validated the model in 78 699 patients from 2386 hospitals and compared it with 5 other published models. The 7-variable risk model performed as well as more complex models in comparing hospital outcomes.