(Circulation. 2007;115:1518-1527.)
© 2007 American Heart Association, Inc.
Cardiovascular Surgery
From Tufts University School of Medicine (D.M.S.), Harvard Medical School (T.S., A.F.L., R.E.W., S.-L.T.N.), and Harvard School of Public Health (S.T.N.), Boston, Mass.
Correspondence to Sharon-Lise Normand, Department of Health Care Policy, Harvard Medical School, 180 Longwood Ave, Boston, MA 02115. E-mail Sharon{at}hcp.med.harvard.edu
Received April 10, 2006; accepted December 22, 2006.
Abstract
Methods and Results Fiscal year 2003 isolated coronary artery bypass grafting (CABG) surgery results based on an audited and validated Massachusetts clinical registry were compared with those derived from a contemporaneous state administrative database, the latter using the inclusion/exclusion criteria and risk model of the Agency for Healthcare Research and Quality. There was a 27.4% disparity in isolated CABG surgery volume (4440 clinical, 5657 administrative), a 0.83-percentage-point difference in observed in-hospital mortality (2.05% versus 2.88%), corresponding differences in risk-adjusted mortality calculated by various statistical methodologies, and 1 hospital classified as an outlier only with the administrative data-based approach. The discrepancies in volumes and risk-adjusted mortality were most notable for higher-volume programs, which presumably perform a higher proportion of combined procedures; these combined procedures were misclassified as isolated CABG surgery in the administrative cohort. Subsequent analyses of a patient cohort common to both databases revealed the smoothing effect of hierarchical models, a 9% relative difference in mortality (2.21% versus 2.03%) resulting from nonstandardized mortality end points, and 1 hospital classified as an outlier using logistic regression but not using hierarchical regression.
Conclusions Cardiac surgery report cards using administrative data are problematic compared with those derived from audited and validated clinical data, primarily because of case misclassification and nonstandardized end points.
Key Words: coronary artery bypass; surgery; databases; health policy; outcome assessment (health care); statistics
Introduction
Editorial p 1508
Clinical Perspective p 1527
Determining the optimal statistical methodology for risk adjustment and provider profiling has been controversial.1,3 No statistical technique, however, regardless of its sophistication, can compensate for flawed data, and it is this concern that motivates the present study. Healthcare data may be derived from administrative or clinical sources,4,5 the latter including both retrospective chart abstraction and prospectively maintained databases such as the Society of Thoracic Surgeons National Adult Cardiac Database. Administrative data, typically derived from discharge billing forms, are the least expensive and most readily available source of information regarding acute care hospitalizations. Although not originally intended for this purpose, such data also have been used to assess healthcare provider performance and are currently the basis for some public "report cards."
Using administrative data to evaluate the quality of provider performance is plausible when they are the sole available source of information and when it has been demonstrated that their use results in conclusions similar to those derived from clinical data.6 However, are public reports based on hospital discharge data an acceptable alternative when more accurate, albeit more costly and resource-intensive, options are available, as in the case of coronary artery bypass grafting (CABG) surgery?
The present study, prompted by recent experience in Massachusetts, investigates the usefulness of a national algorithm based on administrative data, compared with results derived from prospectively maintained clinical data, in assessing hospital CABG outcomes. Beginning in 2002, Massachusetts law mandated that data on all cardiac surgical procedures be collected using the Society of Thoracic Surgeons National Adult Cardiac Database. These data are submitted to the Massachusetts Data Analysis Center (Mass-DAC)7 based at the Department of Health Care Policy at Harvard Medical School, where they are subjected to internal and external audit and validation.8,9 Analyses are then performed using hierarchical logistic regression in which random intercepts are included for each hospital.1,9 The first CABG report card was published in October 2004 and was based on calendar year 2002 data.
Because it wished to bring information to the public more quickly than was possible with audited clinical data, the Massachusetts Executive Office of Health and Human Services published its own fiscal year 2004 CABG results on its Internet site in 2006. These results were based on hospital discharge billing data collected by the Massachusetts Division of Health Care Finance and Policy and analyzed with a national model proposed by the Agency for Healthcare Research and Quality (AHRQ).10 The Massachusetts Executive Office of Health and Human Services also published the calendar year 2003 Mass-DAC isolated CABG results. This afforded a side-by-side, although not strictly contemporaneous, comparison.
Notably, the CABG cohort based on AHRQ CABG criteria as applied to Massachusetts administrative data consisted of 5315 procedures for fiscal year 2004, whereas Mass-DAC recorded 4393 confirmed isolated CABG cases for calendar year 2003. This 21% discrepancy in volume (as high as 40% at 1 hospital) cast doubt on the accuracy of the AHRQ algorithm, particularly because the volume of isolated CABG surgery declined rather than increased in Massachusetts between 2003 and 2004.
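The 21% figure above (and the 27.4% figure in the Abstract) read as relative differences that use the Mass-DAC count as the denominator. That reading is our assumption, but a minimal Python check shows it reproduces both published numbers:

```python
def relative_discrepancy(admin_volume: int, clinical_volume: int) -> float:
    """Percentage by which the administrative (AHRQ) volume exceeds the
    clinical (Mass-DAC) volume, taking the clinical count as denominator."""
    return 100.0 * (admin_volume - clinical_volume) / clinical_volume


# FY2004 AHRQ cohort versus calendar-year 2003 Mass-DAC isolated CABG cases
print(round(relative_discrepancy(5315, 4393), 1))  # 21.0
# Contemporaneous fiscal year 2003 cohorts reported in the Abstract
print(round(relative_discrepancy(5657, 4440), 1))  # 27.4
```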
Because we had access to both audited clinical data and administrative data for fiscal year 2003, we conducted a comprehensive, contemporaneous comparison of these 2 algorithms and the hospital quality profiles derived from them.
Methods
Clinical Cohort (Mass-DAC)
Hospitals submit clinical information to Mass-DAC for all adult patients (age ≥18 years) having cardiac surgery in a Massachusetts non-US governmental hospital. Isolated CABG cases are identified by excluding cases that combine CABG with other significant procedures such as valve replacement or carotid endarterectomy. For certain procedures with problematic classification, additional exclusion criteria were used on the basis of clinical data submitted by the hospital.7 Cases coded by hospitals as "CABG+other" are reviewed by an adjudication committee of senior cardiac surgeons, who determine whether the case legitimately belongs in the "other" category. This review seeks to mitigate gaming of the system, in which cases with unfavorable outcomes are shifted into an unreported category.1
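The classification logic described above can be sketched as follows. This is a minimal illustration only: the procedure labels, the `coded_cabg_plus_other` flag, and the two-item exclusion set are hypothetical placeholders, not the actual Mass-DAC exclusion list or data fields.

```python
from __future__ import annotations

from enum import Enum


class Category(Enum):
    ISOLATED_CABG = "isolated CABG"
    EXCLUDED = "CABG combined with a significant procedure"
    ADJUDICATE = "CABG+other, referred to the surgeon adjudication committee"


# Hypothetical, illustrative subset of significant concomitant procedures.
SIGNIFICANT_CONCOMITANT = {"valve replacement", "carotid endarterectomy"}


def classify_case(procedures: set[str], coded_cabg_plus_other: bool) -> Category:
    """Assign a CABG case to a reporting category (illustrative only)."""
    if "CABG" not in procedures:
        raise ValueError("not a CABG case")
    if procedures & SIGNIFICANT_CONCOMITANT:
        # Combined procedures are excluded from isolated CABG reporting.
        return Category.EXCLUDED
    if coded_cabg_plus_other:
        # Hospital-coded "CABG+other" cases are reviewed rather than accepted,
        # so unfavorable outcomes cannot simply be moved out of the report.
        return Category.ADJUDICATE
    return Category.ISOLATED_CABG


print(classify_case({"CABG"}, coded_cabg_plus_other=False).value)
print(classify_case({"CABG", "valve replacement"}, coded_cabg_plus_other=False).value)
print(classify_case({"CABG"}, coded_cabg_plus_other=True).value)
```

The key design point is that a hospital's own "CABG+other" label never removes a case by itself; it only routes the case to human review.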
Administrative Cohort (AHRQ)
Patients ≥40 years of age discharged with International Classification of Diseases, 9th revision, clinical modification (ICD-9-CM) procedure codes 36.10 to 36.19 are selected. Because no other procedure codes are considered in selecting the cohort, the total number of discharges invariably includes cases that are not isolated CABG, such as CABG plus cardiac valve replacement. Discharges with missing discharge disposition or those involving transfer to a short-term general hospital are excluded.
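A minimal sketch of this administrative selection rule, assuming the age criterion means 40 years or older. The `Discharge` record and the disposition label are hypothetical stand-ins for real billing-file fields. Note that the filter never inspects the other procedure codes on the discharge, which is exactly how combined procedures enter the cohort:

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# ICD-9-CM CABG procedure codes 36.10 through 36.19.
CABG_CODES = {f"36.1{d}" for d in range(10)}

# Hypothetical disposition label; real billing files use coded values.
TRANSFER_SHORT_TERM = "transfer to short-term general hospital"


@dataclass
class Discharge:
    age: int
    procedure_codes: list[str]
    disposition: Optional[str]  # None when the disposition is missing


def in_ahrq_cabg_cohort(d: Discharge) -> bool:
    """Illustrative AHRQ-style selection: age >= 40 (assumed) plus any CABG code.
    Other procedure codes are ignored, so CABG-plus-valve cases are kept."""
    if d.age < 40:
        return False
    if not CABG_CODES.intersection(d.procedure_codes):
        return False
    if d.disposition is None or d.disposition == TRANSFER_SHORT_TERM:
        return False
    return True


# CABG (36.12) plus aortic valve replacement with tissue graft (35.21) still qualifies.
print(in_ahrq_cabg_cohort(Discharge(67, ["36.12", "35.21"], "home")))  # True
print(in_ahrq_cabg_cohort(Discharge(39, ["36.11"], "home")))           # False
```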
The gold standard for this comparison is the audited database and the analytical methodology used by Mass-DAC.7–9
Definition of Outcomes
Mass-DAC uses 30-day all-cause mortality as the primary patient end point. Hospital-reported 30-day status is verified by linking all reported cardiac surgery cases to the Massachusetts Registry of Vital Records and Statistics. The AHRQ model uses all-cause in-hospital mortality as reported in hospital billing data.
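The two end points can classify the same patient differently. In the sketch below, in-hospital death counts any death during the index admission regardless of timing, which matches the Society of Thoracic Surgeons definition noted in the Results. Counting 30 days from the date of surgery is our assumption, and the `Patient` fields are hypothetical:

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Patient:
    surgery_date: date
    death_date: Optional[date]   # from vital-records linkage; None if alive
    died_during_admission: bool  # from the hospital record


def in_hospital_death(p: Patient) -> bool:
    """Any death during the index hospitalization, regardless of timing."""
    return p.died_during_admission


def thirty_day_death(p: Patient) -> bool:
    """All-cause death within 30 days of surgery (counting rule assumed)."""
    return p.death_date is not None and (p.death_date - p.surgery_date).days <= 30


# Dies in hospital on postoperative day 45: an in-hospital death only.
late = Patient(date(2003, 1, 10), date(2003, 2, 24), True)
# Discharged, then dies at home on postoperative day 20: a 30-day death only.
early = Patient(date(2003, 1, 10), date(2003, 1, 30), False)

print(in_hospital_death(late), thirty_day_death(late))    # True False
print(in_hospital_death(early), thirty_day_death(early))  # False True
```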
The primary hospital end point is the standardized mortality incidence rate (SMIR), interpreted as the mortality at each hospital standardized to the overall mortality experience in the population of hospitals under study. Confidence intervals are constructed when estimating logistic regression models, and probability intervals are constructed when estimating hierarchical logistic regression models. For ease of exposition, we use the terms "interval estimate" and "confidence interval" interchangeably.
Risk Adjustment
Mass-DAC uses the Society of Thoracic Surgeons National Adult Cardiac Database instrument and definitions, as well as a hierarchical logistic regression model, to determine standardized mortality rates. Risk factors for the model were selected by expert consensus from the best existing clinical models. A hospital-specific random intercept was included for each hospital in the sample.
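In generic notation (ours; the exact specification, including the normal distribution assumed here for the random intercepts, is an illustrative assumption), a random-intercept logistic model of this kind for patient $i$ treated at hospital $j$ can be written as

$$
\operatorname{logit}\Pr(Y_{ij}=1 \mid \mathbf{x}_{ij}) = \beta_{0j} + \boldsymbol{\beta}^{\top}\mathbf{x}_{ij},
\qquad \beta_{0j} \sim \mathcal{N}(\mu, \tau^{2}),
$$

where $Y_{ij}$ indicates death, $\mathbf{x}_{ij}$ holds the patient's risk factors, and $\tau^{2}$ measures between-hospital variation. Standard logistic regression, as in the AHRQ model, replaces every $\beta_{0j}$ with a single common intercept $\beta_0$, so hospitals borrow no strength from one another.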
The AHRQ model uses standard logistic regression, and its risk factors include demographic information and factors based on All Patient Refined Diagnosis-Related Group codes.10
Statistical Analyses
Comparison of Cardiac Surgery Volumes With CABG Volumes and With Observed CABG Mortality Rates
CABG volumes and observed mortality rates were examined overall and by hospital using graphical methods. We plotted ratios of CABG volumes in the original AHRQ and Mass-DAC cohorts as a function of total cardiac surgery procedures to determine whether the disparities between cohorts varied by hospital program size. Total cardiac surgery volume was obtained from the Mass-DAC database. A similar plot was constructed to examine the difference in observed in-hospital mortality between the original AHRQ and Mass-DAC cohorts as a function of total cardiac surgery volume. Scatterplot smoothers11 were constructed to illustrate these relationships.
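The smoother of reference 11 is not specified here, so the sketch below substitutes the simplest scatterplot smoother, a running mean, to show the shape of the analysis: compute each hospital's AHRQ-to-Mass-DAC volume ratio, order hospitals by total cardiac surgery volume, and smooth. The three hospitals and their counts are hypothetical:

```python
from __future__ import annotations


def ratios_by_hospital(ahrq: dict[str, int], massdac: dict[str, int]) -> dict[str, float]:
    """AHRQ CABG volume divided by Mass-DAC isolated CABG volume, per hospital."""
    return {h: ahrq[h] / massdac[h] for h in massdac}


def running_mean(xs: list[float], ys: list[float], half_width: int = 1):
    """Running-mean smoother: replace each y (in x order) by the mean of itself
    and up to `half_width` neighbors on each side; windows shrink at the ends."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    y_sorted = [ys[i] for i in order]
    smooth = []
    for k in range(len(order)):
        window = y_sorted[max(0, k - half_width): k + half_width + 1]
        smooth.append(sum(window) / len(window))
    return [xs[i] for i in order], smooth


# Hypothetical hospitals: total cardiac surgery volume and CABG counts.
total = {"A": 300, "B": 600, "C": 1200}
ahrq = {"A": 110, "B": 260, "C": 560}
massdac = {"A": 100, "B": 210, "C": 400}

hospitals = sorted(total)
ratios = ratios_by_hospital(ahrq, massdac)
x, smooth = running_mean([total[h] for h in hospitals], [ratios[h] for h in hospitals])
print(x)                              # [300, 600, 1200]
print([round(s, 3) for s in smooth])  # [1.169, 1.246, 1.319]
```

A rising smoothed ratio with program size is the pattern the Abstract describes for higher-volume programs; with real data, a locally weighted smoother would usually replace the running mean.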
Comparison of Discharges Included in the AHRQ and Mass-DAC Models
We identified the number of discharges meeting inclusion criteria of both approaches, the number included in the AHRQ but not the Mass-DAC model, and the number included in the Mass-DAC but not in the AHRQ model. The administrative data and Mass-DAC data were merged using a combination of medical record number, discharge date, admission date, and date of birth. When possible, we determined the specific reason that discharges were included in 1 model but not the other. Because we had in-hospital mortality for all patients, we also calculated the observed inpatient mortality rate by exclusion category.
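The 3-way partition described above can be sketched with set operations on a composite key. Exact, deterministic matching on all 4 fields is our assumption (the text says only that a combination of them was used), and the records below are hypothetical:

```python
from __future__ import annotations

from datetime import date
from typing import NamedTuple


class LinkKey(NamedTuple):
    """Composite linkage key; exact matching on all fields is assumed."""
    medical_record_number: str
    admission_date: date
    discharge_date: date
    birth_date: date


def partition(ahrq: set[LinkKey], massdac: set[LinkKey]) -> dict[str, set[LinkKey]]:
    """Split linked discharges into both / AHRQ-only / Mass-DAC-only groups."""
    return {
        "both": ahrq & massdac,
        "ahrq_only": ahrq - massdac,
        "massdac_only": massdac - ahrq,
    }


a = LinkKey("MRN001", date(2003, 3, 1), date(2003, 3, 8), date(1940, 5, 2))
b = LinkKey("MRN002", date(2003, 4, 2), date(2003, 4, 9), date(1950, 1, 9))
c = LinkKey("MRN003", date(2003, 6, 5), date(2003, 6, 12), date(1966, 7, 21))

groups = partition({a, b}, {a, c})
print({name: len(keys) for name, keys in groups.items()})
# {'both': 1, 'ahrq_only': 1, 'massdac_only': 1}
```

Applied to the study data, these 3 groups correspond to the 4393 common, 1264 AHRQ-only, and 47 Mass-DAC-only patients reported in the Results.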
Comparison of Hospital Standardized Mortality and Outlier Determination
In addition to misclassification errors, we hypothesized other potential reasons for differences in the AHRQ and Mass-DAC approaches, including risk models, mortality end points, and statistical methodology. To isolate differences not attributable to classification errors, additional comparative analyses were performed using a common cohort of patients who met both AHRQ and Mass-DAC inclusion criteria.
First, for this common cohort, we estimated the predicted risk of in-hospital mortality for each patient using standard (nonhierarchical) logistic regression models with each approach's risk predictors and determined the areas under the receiver operating characteristic curves. Second, we computed point and 95% CI estimates of in-hospital SMIRs using AHRQ and Mass-DAC predictors, estimated by both standard and hierarchical logistic regressions. The standardized rate is obtained by risk adjusting for the case mix of the hospital, indirectly standardizing its risk to an expected risk, and then multiplying by the state average. If the interval estimate for the hospital lay entirely below the state average, we concluded that mortality was lower than expected. If the interval estimate included the state average, then the mortality was as expected. If the interval estimate lay entirely above the state average, then mortality was higher than expected. Finally, we determined 30-day SMIRs using Mass-DAC predictors and both standard and hierarchical logistic regression. Models were fitted using the WinBUGS (Medical Research Council Biostatistics Unit, Cambridge, United Kingdom), SAS (SAS Institute, Cary, NC), and S-Plus (Insightful Corp, Seattle, Wash) software systems.
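A minimal sketch of indirect standardization and the 3-way interval rule, in the simplest form typically used with standard logistic regression: observed deaths divided by expected deaths (the sum of predicted risks), times the state average. Hierarchical versions typically replace the observed count with a model-based one, which is not shown. The interval bounds must come from the fitted model, so they are passed in as inputs; the hospital, its risks, and the 2.05% state rate are illustrative:

```python
from __future__ import annotations

import math


def expected_deaths(predicted_risks: list[float]) -> float:
    """Expected deaths = sum of model-predicted probabilities of death."""
    return math.fsum(predicted_risks)


def smir(observed_deaths: int, expected: float, state_rate: float) -> float:
    """Indirectly standardized rate: (observed / expected) x state average."""
    return observed_deaths / expected * state_rate


def classify(lower: float, upper: float, state_rate: float) -> str:
    """Compare a hospital's interval estimate with the state average."""
    if upper < state_rate:
        return "lower than expected"
    if lower > state_rate:
        return "higher than expected"
    return "as expected"


# Hypothetical hospital: 120 patients with risks from some fitted model.
risks = [0.01] * 100 + [0.05] * 20  # expected deaths = 1 + 1 = 2
rate = smir(observed_deaths=3, expected=expected_deaths(risks), state_rate=0.0205)
print(round(rate, 5))                     # 0.03075
print(classify(0.012, 0.058, 0.0205))     # as expected
```

Here 3 observed deaths against 2 expected yields an SMIR 1.5 times the state average, yet a wide interval that still covers 2.05% would leave the hospital classified as "as expected."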
The authors had full access to and take full responsibility for the integrity of the data. All authors have read and agree to the manuscript as written.
Results
Observed CABG Mortality (Original Cohorts)
Overall observed in-hospital and 30-day mortality rates in the Mass-DAC cohort were 2.05% (91 of 4440) and 2.27% (101 of 4440), respectively. The in-hospital mortality observed in the AHRQ cohort was 2.88%, exceeding the corresponding Mass-DAC value by 0.83 percentage points (hospital range, 0 to 1.63 percentage points). In-hospital mortality in the AHRQ-defined cohort is never lower than that in the Mass-DAC cohort, and the absolute difference increases with hospital total cardiac surgery volume (Figure 1, right).
Risk-Adjusted and Standardized Mortality (Original Cohorts and Methodologies)
Figure 2 depicts hospital-specific 30-day SMIRs (95% probability intervals) using hierarchical logistic regression for the original Mass-DAC cohort (n=4440) and hospital-specific SMIR point estimates (95% CIs) using logistic regression for the original administrative cohort (n=5657). The AHRQ cohort has generally higher mortality rates (overall, 2.88% versus 2.27%). The smoothing effect of the Mass-DAC hierarchical model reduces interhospital variability, as demonstrated by its narrower 95% intervals compared with the AHRQ model. Hospital 14 had no in-hospital deaths, and the AHRQ model-estimated SMIR is 0% (95% CI undefined), whereas the hierarchical model estimate is 2.30% (95% interval, 1.24% to 3.71%). No hospital is identified as a statistical outlier based on the Mass-DAC algorithm, whereas hospital 4 is identified as having statistically higher-than-expected 30-day mortality using the AHRQ algorithm.
Comparison of AHRQ and Mass-DAC Cohorts
Figure 3 depicts 3 groups of patients: (1) 1264 patients found only in the AHRQ cohort, with an observed in-hospital mortality of 5.93%; (2) 47 patients found only in the Mass-DAC cohort, with an observed mortality of 4.25%; and (3) 4393 patients found in both cohorts, with 2.03% mortality in the Mass-DAC cohort (confirmed as accurate) and 2.00% in the AHRQ cohort. Most of the patients in group 2 (42 of 47) met the age criteria for Mass-DAC but not for AHRQ, had missing data, or had been transferred. Among the 1264 patients in group 1, about half (n=596) had CABG combined with valve procedures. This misclassification could be corrected with a more refined administrative algorithm. However, an additional 663 cases, more than 10% of the original AHRQ cohort, were coded as CABG plus other. Even with a clinical database and well-defined exclusion criteria, adjudication of such cases by an expert panel typically is required. It would be difficult, if not impossible, to correctly categorize such procedures (isolated versus nonisolated CABG) from administrative data using only discharge codes. This subset of CABG plus other cases was associated with high mortality (7.39%) and clearly had a substantial impact on the overall AHRQ mortality estimate.
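The group sizes and mortality rates reported above reconcile with the overall figures in the Abstract. In the sketch below, death counts are back-calculated by rounding n × rate; these counts are our inference, not numbers stated in the text:

```python
from __future__ import annotations

# (n, reported in-hospital mortality) for each group, taken from the Results.
AHRQ_GROUPS = {"common": (4393, 0.0200), "ahrq_only": (1264, 0.0593)}
MASSDAC_GROUPS = {"common": (4393, 0.0203), "massdac_only": (47, 0.0425)}


def pooled_rate(groups: dict[str, tuple[int, float]]) -> tuple[int, int, float]:
    """Back-calculate deaths per group (rounded), then pool across groups."""
    deaths = sum(round(n * rate) for n, rate in groups.values())
    total = sum(n for n, _ in groups.values())
    return deaths, total, deaths / total


for name, groups in (("AHRQ", AHRQ_GROUPS), ("Mass-DAC", MASSDAC_GROUPS)):
    deaths, total, rate = pooled_rate(groups)
    print(f"{name}: {deaths}/{total} = {100 * rate:.2f}%")
# AHRQ: 163/5657 = 2.88%
# Mass-DAC: 91/4440 = 2.05%

# Back-calculated deaths among the 663 CABG-plus-other cases (7.39% mortality).
print(round(663 * 0.0739))  # 49
```

On this reconstruction, roughly 49 of the 75 deaths in the AHRQ-only group come from the CABG-plus-other subset, which is why it moves the overall AHRQ estimate so much.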
Risk Model Differences (Common Cohort, N=4393)
Tables 1 and 2 list risk factors, frequencies, and associations with in-hospital or 30-day mortality for 4393 patients meeting criteria for both the AHRQ and Mass-DAC models. For example, 2.0% of patients in the Mass-DAC cohort presented with cardiogenic shock, of whom 23% died in-hospital and 19.5% died within 30 days. In-hospital mortality exceeds 30-day mortality in this and several other instances in Tables 1, 2, and 3 because the Society of Thoracic Surgeons definition of in-hospital mortality includes all in-hospital deaths, regardless of timing.
Comparisons of Hospital Risk-Standardized Mortalities (Common Cohort, N=4393)
Table 3 lists the individual hospital volumes and observed mortality rates for the 4393 patients common to both the AHRQ and Mass-DAC cohorts.
Figure 4 depicts hospital-specific SMIRs (with 95% posterior intervals) for the common cohort estimated by hierarchical logistic regression using Mass-DAC and AHRQ predictors. The 95% intervals based on Mass-DAC predictors are generally wider than those based on AHRQ predictors. Figure 5 depicts a similar comparison using logistic regression. For the 3 hospitals with no in-hospital deaths, the logistic model produces SMIRs of 0%.
Figure 6 depicts the results for 30-day mortality using Mass-DAC predictors estimated using both hierarchical and standard logistic regression models. Overall, average mortality differed by an absolute 0.18 percentage points (2.21% versus 2.03%), depending on whether 30-day (Figure 6) or in-hospital (Figure 5) mortality was used as the end point. This 9% relative difference in mortality suggests that the choice of an appropriate, standardized end point is not a trivial concern. The smoothing effect of the Mass-DAC hierarchical model shrinks the estimates toward the state mean (especially for low-volume providers), reduces interhospital variability, and narrows the 95% intervals.
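The shrinkage mechanism can be illustrated with a deliberately simpler stand-in than the hierarchical logistic regression fitted in WinBUGS: a beta-binomial posterior mean, which pulls each hospital's raw rate toward the state mean m with weight w = n / (n + k). The prior strength k, the hospital counts, and the use of 2.05% as m are hypothetical choices made only for illustration:

```python
def shrunken_rate(deaths: int, n: int, state_rate: float, prior_strength: float) -> float:
    """Beta-binomial posterior mean under a Beta(m*k, (1-m)*k) prior.
    Equivalent to w * (deaths / n) + (1 - w) * m with w = n / (n + k)."""
    return (deaths + state_rate * prior_strength) / (n + prior_strength)


k = 200.0   # hypothetical prior strength (pseudo-patients)
m = 0.0205  # state mean used for illustration

small = shrunken_rate(0, 150, m, k)   # low-volume hospital, raw rate 0%
large = shrunken_rate(20, 800, m, k)  # high-volume hospital, raw rate 2.5%
print(f"{100 * small:.2f}% {100 * large:.2f}%")  # 1.17% 2.41%
```

The low-volume hospital with no deaths moves more than halfway toward the state mean, whereas the high-volume hospital barely moves. This is the same qualitative behavior that gives a zero-death hospital a nonzero hierarchical estimate.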
Overall, these results demonstrate both a substantially higher average mortality at 30 days than in-hospital and the shrinkage effect of the hierarchical logistic regression model on point and 95% interval estimates. Notably, with in-hospital mortality as the end point, hospital 5 was clearly an outlier when estimated via logistic regression with either the AHRQ or the Mass-DAC predictors, but it was not an outlier using hierarchical regression with either set of predictors.
Comparative Risk Model Discrimination (Common Cohort, N=4393)
The Mass-DAC model using in-hospital mortality as an outcome had an area under the receiver operating characteristic curve of 0.84, and the observed mortalities in the highest and lowest risk deciles were 11.7% and 0.2%, respectively. The AHRQ model had an area under the curve of 0.89 (not significantly different from the Mass-DAC model; P=0.05), with observed mortalities in the highest and lowest risk deciles of 15.6% and 0.4%. Finally, the Mass-DAC model assigned 0.2% of the sample a predicted risk of dying >0.50, whereas the AHRQ model assigned no one to this high-risk group.
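The 3 discrimination summaries used above can be sketched in a few lines. The area under the curve is computed as the Mann-Whitney probability that a randomly chosen death had a higher predicted risk than a randomly chosen survivor. Risk groups are equal-sized bins of ascending predicted risk, which are deciles when `groups=10`. The toy risks and outcomes are hypothetical, and testing whether 2 correlated areas differ requires an additional procedure that is not shown:

```python
from __future__ import annotations


def auc(risks: list[float], died: list[int]) -> float:
    """ROC area as P(risk of a death > risk of a survivor); ties count 1/2.
    O(n^2) pairwise comparison, fine for illustration."""
    pos = [r for r, y in zip(risks, died) if y]
    neg = [r for r, y in zip(risks, died) if not y]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def observed_rate_by_risk_group(risks: list[float], died: list[int], groups: int = 10) -> list[float]:
    """Observed mortality in equal-sized groups of ascending predicted risk.
    Requires len(risks) >= groups so that no group is empty."""
    order = sorted(range(len(risks)), key=lambda i: risks[i])
    n = len(order)
    rates = []
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        rates.append(sum(died[i] for i in idx) / len(idx))
    return rates


def share_above(risks: list[float], threshold: float = 0.50) -> float:
    """Fraction of patients whose predicted risk exceeds the threshold."""
    return sum(r > threshold for r in risks) / len(risks)


risks = [0.02, 0.10, 0.40, 0.35, 0.80, 0.05]
died = [0, 0, 0, 1, 1, 0]
print(auc(risks, died))                                   # 0.875
print(observed_rate_by_risk_group(risks, died, groups=2))
print(share_above(risks))
```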
Discussion
When used as the basis for public report cards,1,3 outcomes data must be of the highest possible reliability and validity, a requirement emphasized by the Institute of Medicine12,13 and the American Heart Association.6 Criteria have been proposed for profiling initiatives that evaluate risk-adjusted outcomes in cardiac surgery.5 Clinical data registries generally incorporate these desirable features, whereas administrative databases do not.
Because clinical data are unavailable for many specialties and because of the emphasis currently being placed on outcomes transparency and public accountability, there has been renewed interest in using administrative data for these purposes. Administrative data originally were intended primarily for reimbursement, although there has been a resurgence of interest in their use for healthcare research (despite their having been abandoned by the federal government in 1993 as the basis for publicly reporting Medicare hospital mortality4,14,15). Recent reports suggest that experienced researchers can, with careful design and sophisticated statistical methodology, develop models based on administrative data for certain conditions that may be appropriate for profiling,16,17 but experts have generally cautioned against this practice.4,12,18–29 Administrative data should be used to profile hospitals only after rigorous verification of their validity for such purposes, as was recently demonstrated for mortality after acute myocardial infarction16 and heart failure.17
Concerns regarding administrative data sources for profiling include the following:
As a consequence of failing to differentiate complications from comorbidities, the performance of risk models based on administrative data may be exaggerated. This results from including predictors in the risk model that are actually late-hospitalization, preterminal events and thus highly predictive of subsequent mortality.4,14,19,29
Using an indicator for conditions present at admission or date stamping of secondary diagnoses49 may improve differentiation of complications from comorbidities in administrative databases. This is currently the practice in at least 2 states, New York and California.4,52
Study Limitations
In the present study, we used the actual AHRQ algorithm used by the Massachusetts Executive Office of Health and Human Services to generate CABG profiles for their Web site in 2006. Better administrative algorithms could certainly be developed, especially with regard to the correct identification of isolated CABG cases. The results from administrative-based profiling also could be enhanced by adding a few critical clinical variables and using more appropriate statistical techniques such as hierarchical models.
Conclusions
Government agencies and payers will be increasingly tempted to use administrative data for provider profiling because such data are inexpensive and available within a short time frame. Given the inaccuracies demonstrated in our Massachusetts CABG analysis, this practice should be discouraged. Efforts should be made, as they are currently in Massachusetts, to reduce the lag time for producing more accurate reports based on clinical data.
Acknowledgments
This work was supported by a contract from the Massachusetts Department of Public Health (No. 620022A4PRE).
Disclosures
Dr Shahian receives an honorarium for providing clinical advice to Mass-DAC. T. Silverstein, A. Lovett, R. Wolf, and Dr Normand receive salary support from a Massachusetts Department of Public Health contract.
Related Article:
Circulation 2007 115: 1503.