Circulation. 2008;117:1955-1963
Published online before print April 7, 2008, doi: 10.1161/CIRCULATIONAHA.107.747873
© 2008 American Heart Association, Inc.


Health Services and Outcomes Research

Comparison of "Risk-Adjusted" Hospital Outcomes

David M. Shahian, MD; Sharon-Lise T. Normand, PhD

From the Center for Quality and Safety, Department of Surgery, and Institute for Health Policy, Massachusetts General Hospital, and Harvard Medical School (D.M.S.), and Department of Health Care Policy, Harvard Medical School, and the Department of Biostatistics, Harvard School of Public Health (S.T.N.), Boston, Mass.

Correspondence to Sharon-Lise T. Normand, Department of Health Care Policy, Harvard Medical School, 180 Longwood Ave, Boston, MA 02115. E-mail Sharon@hcp.med.harvard.edu

Received November 9, 2007; accepted February 13, 2008.


Abstract
Background— A frequent challenge in outcomes research is the comparison of rates from different populations. One common example with substantial health policy implications involves the determination and comparison of hospital outcomes. The concept of "risk-adjusted" outcomes is frequently misunderstood, particularly when it is used to justify the direct comparison of performance at 2 specific institutions.

Methods and Results— Data from 14 Massachusetts hospitals were analyzed for 4393 adults undergoing isolated coronary artery bypass graft surgery in 2003. Mortality estimates were adjusted using clinical data prospectively collected by hospital personnel and submitted to a data coordinating center designated by the state. The primary outcome was hospital-specific, risk-standardized, 30-day all-cause mortality after surgery. Propensity scores were used to assess the comparability of case mix (covariate balance) for each Massachusetts hospital relative to the pool of patients undergoing coronary artery bypass grafting surgery at the remaining hospitals and for selected pairwise comparisons. Using hierarchical logistic regression, we indirectly standardized the mortality rate of each hospital using its expected rate. Predictive cross-validation was used to avoid underidentification of true outlying hospitals. Overall, there was sufficient overlap between the case mix of each hospital and that of all other Massachusetts hospitals to justify comparison of individual hospital performance with that of the remaining hospitals. As expected, some pairwise hospital comparisons indicated lack of comparability. This finding illustrates the fallacy of assuming that risk adjustment per se is sufficient to permit direct side-by-side comparison of healthcare providers. In some instances, such analyses may be facilitated by the use of propensity scores to improve covariate balance between institutions and to justify such comparisons.

Conclusions— Risk-adjusted outcomes, commonly the focus of public report cards, have a specific interpretation. Using indirect standardization, these outcomes reflect a provider’s performance for its specific case mix relative to the expected performance of an average provider for that same case mix. Unless study design or post hoc adjustments have resulted in reasonable overlap of case-mix distributions, such risk-adjusted outcomes should not be used to directly compare one institution with another.


Key Words: health care quality assessment • outcomes research • risk • statistics


Introduction
Outcomes research "seeks to understand the end results of particular health care practices and interventions."1 This may involve investigation of a new drug or procedure compared with standard therapy through the use of either a randomized trial or an observational study. Because of the current health policy emphasis on measuring and improving provider performance,2,3 interest has also been increasing in another type of outcomes research referred to as provider profiling.4,5 This research focuses on the collection and analysis of outcomes data to evaluate the performance of a physician or a hospital.

Clinical Perspective p 1963

Provider profiling has a number of features that distinguish it from other types of outcomes research. First, unlike trials of new medications or treatment regimens, randomization of patients to hospitals or physicians would often be both impractical and unethical. Thus, profiling studies are almost always observational in nature, relying on data from usual practice settings. In further contrast to drug trials that involve direct comparisons of outcomes for only a few treatments, profiling studies typically assess outcomes for many providers, usually with regard to some population reference standard. Finally, when profiling is based on outcomes measures such as mortality or morbidity, risk adjustment is necessary to account for preexisting conditions that may confound their assessment.

Despite their increasingly widespread use, considerable confusion exists among consumers, the media, payers, and providers as to the correct meaning and interpretation of risk-adjusted outcomes. For example, many incorrectly interpret such outcomes as having "leveled the playing field" to permit direct comparison of one provider with another. Direct comparability may sometimes be justified in an observational study, but this would be fortuitous and is not an inherent characteristic of the study design.

Correct interpretation of the concept of risk-adjusted outcomes is neither a trivial nor a strictly academic concern. Such outcomes are used to designate centers of excellence, to determine reimbursement levels in pay-for-performance programs, to rank institutions, and to classify providers as "outliers." These determinations may have profound effects on patient access, hospital reputation, referrals, and financial survival.

The goal of this article is to systematically review the fundamental concepts from which the deceptively simple term "risk-adjusted outcome" is derived. We develop the concept of risk-adjusted outcomes in the context of causal inference theory and illustrate the derivation of indirectly standardized mortality ratios, often referred to as O/E (observed/expected) ratios. Key methodological concepts (eg, outlier determination and direct comparison of hospitals) are illustrated through the example of coronary artery bypass grafting surgery (CABG) mortality profiling, in which the difference in outcomes of a hospital compared with the reference standard is generally regarded as a reflection of quality of care.5


Methods
Background
It is useful to consider risk adjustment and standardization as specific applications of causal inference theory, a broad discipline with historical roots in philosophy, mathematical logic, and statistics.6–19 This is the foundation for understanding causal effects in health care,16,20–24 which can be thought of as the difference between the outcome for a patient when exposed to one treatment (or provider) and the outcome when exposed to another.

A fundamental precept of causality is that only one of a series of potential outcomes can be experienced at any one time.7,17,20,23,24 In CABG hospital profiling, a patient can undergo CABG at only one hospital on a given day. Therefore, some method must be used to estimate what would hypothetically have occurred to that patient had he or she undergone surgery at a different hospital. The observed result is referred to as the actual outcome, and the unobservable estimated outcome is the counterfactual.7,17,20,23,24 Estimation of this counterfactual outcome, the hypothetical result if treated under a different set of circumstances, is the primary motivator for risk model development. Several approaches have been developed to estimate these potential outcomes for individual patients and subsequently to assess the overall performance of a hospital.

Estimation of Counterfactuals for Risk Adjustment and Standardization
The simplest estimator of a counterfactual would be the average result of treating a similar condition (eg, a CABG procedure) in the overall population or at another specific institution. However, this estimator is likely to be both inaccurate and misleading. Patients are nonrandomly allocated among institutions, and use of crude mortality rates from other hospitals as the counterfactual outcomes would ignore systematic differences among patients such as acuity status. At the other end of the spectrum, the counterfactual outcomes could be determined through randomization,15,18,19,25 the most internally valid design. Both measured and unmeasured confounders would be balanced, so the mortality experience of patients undergoing CABG at one hospital could serve as the counterfactual outcome for patients treated at another hospital. However, it is implausible to think that most patients would consent to randomization for anything but truly experimental care; for this reason, almost all profiling studies are conducted with observational data. Matching and stratification are other methods sometimes used to derive counterfactuals, but they quickly become impractical when more than a few predictor variables are considered, the typical case in mortality profiling.

Most profiling studies have relied on regression modeling to derive counterfactual outcomes, and it is the method used here. Risk adjustment, the term commonly used for this approach, refers to the results of statistical regression models that relate the outcome for a specific patient to his or her observed characteristics.4,26–29 Then, because the main focus of profiling is to determine how the overall experience of a particular hospital compares to what would be "expected," the next step is to standardize the results of an institution to the reference population.

Indirect standardization is used for almost all profiling and public report cards. With this method, the expected rate represents what the mortality rate would have been at a hospital given its actual distribution of patients but replacing its observed mortality rates with rates estimated from the entire group of providers. The indirectly standardized mortality ratio, often referred to as the ratio of observed to expected outcomes (O/E ratio), compares the outcomes for the specific distribution of patients at a hospital with their expected results had they been treated by an average provider in the reference population.

Indirect standardization is accomplished by first summing the individual risk probabilities for each patient within a given hospital using the coefficients estimated from the regression model and the patient’s specific distribution of confounders. This yields the expected total number of deaths for that hospital. This counterfactual hospital mortality often is used as the denominator of the ratio of observed to expected mortality (O/E ratio), a form of causal estimand. This O/E ratio is favorable if <1 and unfavorable if >1. As a final step, the O/E ratio may be multiplied by the unadjusted population mortality rate for the procedure to obtain what is often called the risk-adjusted mortality rate but which is more correctly designated the risk-standardized mortality rate (RSMR) or standardized mortality incidence rate (SMIR).30–34
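The indirect standardization steps above can be sketched in a few lines of Python. This is a minimal illustration, assuming a fixed-effects logistic model whose intercept and coefficients (`INTERCEPT`, `COEFS`) have already been estimated from the reference population; the function names, the 2 binary risk factors, and all numbers are hypothetical and are not taken from the Massachusetts data.

```python
import math

def inv_logit(x: float) -> float:
    """Map a linear predictor (log-odds) onto the probability scale."""
    return 1.0 / (1.0 + math.exp(-x))

def expected_deaths(patients, intercept, coefs):
    """Sum each patient's predicted risk under the reference-population model."""
    return sum(
        inv_logit(intercept + sum(b * x for b, x in zip(coefs, covariates)))
        for covariates in patients
    )

def indirect_standardization(observed_deaths, patients, intercept, coefs, crude_rate):
    """Return (expected deaths, O/E ratio, risk-standardized mortality rate)."""
    expected = expected_deaths(patients, intercept, coefs)
    oe_ratio = observed_deaths / expected
    return expected, oe_ratio, oe_ratio * crude_rate

# Hypothetical reference model: intercept plus 2 binary risk factors.
INTERCEPT = -4.0
COEFS = [1.0, 1.5]
# Hypothetical hospital: 100 patients, 25 in each risk-factor combination.
hospital_patients = [[0, 0], [1, 0], [0, 1], [1, 1]] * 25

expected, oe, rsmr = indirect_standardization(5, hospital_patients, INTERCEPT, COEFS, 0.0225)
print(f"E = {expected:.2f}, O/E = {oe:.2f}, RSMR = {rsmr:.4f}")
```

With these made-up inputs the hospital has about 8.1 expected deaths, so 5 observed deaths give an O/E ratio below 1 (favorable), and the RSMR is that ratio scaled by the crude population rate.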

Outlier Determination and the Direct Comparison of Hospitals
Outliers
The main goal of outcomes profiling is to identify differences in hospital quality. Because the risk-standardized rates for each hospital are derived from the reference population, it is most appropriate to determine whether these rates are statistically different from the population average. If so, the hospital is regarded as a statistical outlier. Most commonly, this is determined by checking whether the 95% interval for a hospital's risk-standardized mortality estimate includes the overall state average mortality (or, equivalently, whether the interval around its O/E ratio includes 1). If the interval excludes this reference value, the hospital typically is classified as an outlier. An important but often overlooked aspect of outlier determination is the effect on expected outcomes when truly outlying programs are included in the development of the statistical model. This problem and a potential solution (cross-validated P values) are described further in the Illustration.
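The interval-based outlier rule described above reduces to a simple comparison. The sketch below assumes the 95% interval bounds for each hospital's risk-standardized rate have already been computed; the `HospitalInterval` type, the hospital names, and the example values are hypothetical. Passing an O/E interval with a reference value of 1 applies the same rule on the ratio scale.

```python
from dataclasses import dataclass

@dataclass
class HospitalInterval:
    name: str
    lower: float  # lower bound of the 95% interval
    upper: float  # upper bound of the 95% interval

def classify_outlier(interval: HospitalInterval, reference: float) -> str:
    """Flag a hospital only when its interval excludes the reference value."""
    if interval.lower > reference:
        return "high outlier"
    if interval.upper < reference:
        return "low outlier"
    return "not an outlier"

STATE_RATE = 0.0225  # hypothetical state average mortality
for h in [HospitalInterval("X", 0.010, 0.040),
          HospitalInterval("Y", 0.025, 0.060),
          HospitalInterval("Z", 0.002, 0.020)]:
    print(h.name, classify_outlier(h, STATE_RATE))
```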

Risk Factor Distribution and Direct Comparability
In addition to comparing individual hospitals with the reference population to determine outlier status, some consumers also seek to directly compare individual hospitals with one another. A problem with direct comparisons that has been widely recognized by statisticians, and that was the motivation for the development of balancing methods such as propensity scores,14–16,18,19,35–41 is that of covariate imbalance. Absent randomization, the patient cohorts from 2 hospitals may be unbalanced with regard to the frequency of confounders. The implications of such imbalance have received little attention in the context of risk-adjusted outcomes profiling, which in turn has led to both misunderstanding and misuse.

In general, only the results for those patients with comparable risk profiles (ie, those whose profiles fall within the region where the risk distributions of the 2 providers overlap) should be directly compared. Consider the extreme but not uncommon example of a state or region with many small community hospitals and 1 or 2 tertiary/quaternary hospitals. As a general principle, direct comparison of a community hospital with a tertiary hospital would be appropriate only for the relatively small proportion of patients who overlap between the 2 hospitals. Although the results for the overlap group can be used to estimate expected outcomes for patients not in common between the 2 institutions, this form of extrapolation depends heavily on assumptions that are typically unverifiable. For example, the indirectly risk-standardized results at a community hospital apply to its specific type of patients, who might be relatively low risk compared with those at a tertiary center. It cannot be assumed that a favorable risk-standardized mortality at the community hospital, based on its lower-risk case mix, could necessarily be achieved if it were confronted with the higher-risk case mix of the tertiary center, including some types of patients that it rarely, if ever, encounters.

Propensity scores are a useful method to construct treatment and control groups that may differ in number of subjects but that, like randomized groups, have a balanced distribution of all measured confounders.14–16,18,19,35–41 The propensity score is the probability of receiving one type of treatment rather than another (or, in the case of profiling, of exposure to one specific provider rather than another) given a patient's set of observed characteristics. It provides a convenient scalar (1-number) summary of the information contained in all of the patient's measured covariates. The propensity score may then be used for matching, stratification, blocking, or weighting in regression modeling.
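As a minimal sketch of how a propensity score is obtained, the code below fits a binary logistic regression for the probability that a patient had surgery at hypothetical hospital 1 rather than hospital 2, using plain gradient ascent so that no external libraries are needed. The single binary covariate, the 20 patients, and the function names are invented for illustration; a real analysis would use a statistical package and the full set of measured confounders.

```python
import math

def _score(beta, x):
    """Linear predictor with an intercept term prepended."""
    z = [1.0] + list(x)
    return sum(b * v for b, v in zip(beta, z))

def fit_propensity(X, treated, lr=1.0, iters=5000):
    """Fit P(treated = 1 | x) by logistic regression via gradient ascent.
    X: list of covariate lists; treated: 1 = surgery at hospital 1, 0 = hospital 2."""
    p = len(X[0]) + 1  # +1 for the intercept
    beta = [0.0] * p
    n = len(X)
    for _ in range(iters):
        grad = [0.0] * p
        for xi, ti in zip(X, treated):
            z = [1.0] + list(xi)
            prob = 1.0 / (1.0 + math.exp(-_score(beta, xi)))
            for j in range(p):
                grad[j] += (ti - prob) * z[j]  # gradient of the log-likelihood
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

def propensity(beta, x):
    """Estimated probability of surgery at hospital 1 for covariates x."""
    return 1.0 / (1.0 + math.exp(-_score(beta, x)))

# Hypothetical data: one binary risk factor (1 = present).
X = [[1]] * 6 + [[0]] * 4 + [[1]] * 2 + [[0]] * 8
treated = [1] * 10 + [0] * 10
beta = fit_propensity(X, treated)
print(round(propensity(beta, [1]), 3), round(propensity(beta, [0]), 3))
```

With a single binary covariate the fitted scores simply reproduce the observed proportions: 6 of the 8 patients with the risk factor were treated at hospital 1 (score 0.75), versus 4 of the 12 without it (score about 0.33).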

As noted above, the problem of covariate imbalance has received little attention in provider profiling studies.42–45 Because the propensity score provides a convenient summary of each patient's measured covariates, each provider will have a specific distribution of propensity scores that characterizes its "case mix." To assess whether 2 providers are comparable, the area of overlap in their respective propensity score distributions should be identified. As shown in Figure 1A, 2 hypothetical hospitals (hospitals 1 and 2) might by chance (or as a result of randomization) have substantial overlap in their propensity score distributions. The shaded area of overlap in Figure 1A indicates that a majority of patients treated at hospital 2 had a similar propensity to have been treated at hospital 1. For almost every patient who underwent CABG at hospital 1, we can find a "similar" patient from among those having CABG at hospital 2.


Figure 1. Covariate balance (shaded area) between patients treated at 2 fictitious hospitals. The x axis represents the log-odds of the probability that a patient has surgery at hospital 1 vs hospital 2; the y axis represents the density of patients. Substantial overlap is present in log-odds in A, and less overlap is present in B.

Figure 1B depicts a different pair of hospitals with significant imbalance in their case mix as measured by their propensity score distributions. Only a small percentage of patients at the 2 institutions have comparable risk profiles, and relative performance inferences should be drawn only from this overlapping group.

Illustration
Study Population
We examined data from all adults (≥18 years of age) undergoing isolated CABG at all acute-care, nonfederal hospitals in Massachusetts between January 1, 2003, and December 31, 2003. Data collection is mandated by the Massachusetts Department of Public Health.

Data Sources
We used clinical data submitted to a data coordinating center (Mass-DAC) located in the Harvard Medical School Department of Health Care Policy. Data are collected by trained hospital personnel using the Society of Thoracic Surgeons National Adult Cardiac Database instrument.46 Supplemental patient and surgeon identifying information also is collected using additional data forms developed by Mass-DAC. The data are sent electronically to Mass-DAC, where they are cleaned, audited, and verified using internal and external procedures.

End Points
The primary end point is hospital-specific, risk-standardized, all-cause, 30-day mortality rate. Mortality data are obtained 2 ways. First, hospital personnel are responsible for collecting 30-day mortality for all patients undergoing cardiac surgery. Second, patient identifying information is linked to this registry from the Massachusetts Registry of Vital Records and Statistics to verify date of death. The registry includes mortality information for Massachusetts residents and all records of deaths that occur within the Commonwealth regardless of the state of residence. Because Mass-DAC has access to Social Security numbers, the Social Security Index Web site47 also is searched to identify deaths, including those reported to the Social Security Administration by funeral homes or by relatives.

Statistical Analyses
Distributions of clinical and demographic variables are computed and stratified by hospital to identify unusual or extreme values. Because of data collection protocols and auditing procedures, no data are missing in the clinical variables or outcomes for the mortality models.

Risk Adjustment
We first estimated a propensity score model in which the dependent variable was multinomial, assuming 14 distinct values corresponding to the 14 hospitals (1 hospital served as the reference group). The specific clinical variables included in the model were selected from a literature review of existing models and expert opinion from a panel of senior cardiac surgeons. A multinomial logistic regression model was estimated, and predicted probabilities were subsequently obtained for each patient in the sample. Thus, each patient had 14 estimated probabilities, each reflecting the likelihood that the patient would undergo CABG at that specific hospital rather than at any of the remaining 13 hospitals. Because every patient underwent surgery at exactly 1 of the 14 hospitals, the 14 estimated probabilities for each patient sum to 1.
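The multinomial model assigns each patient one probability per hospital. A minimal sketch, assuming the linear predictors for the 13 non-reference hospitals have already been computed from the fitted coefficients (the values and the function name below are hypothetical), shows why the probabilities sum to 1: the reference hospital's linear predictor is fixed at 0 and all 14 scores are normalized together.

```python
import math

def multinomial_probs(linear_predictors):
    """Convert non-reference linear predictors into probabilities for all hospitals.
    Index 0 is the reference hospital, whose linear predictor is fixed at 0."""
    scores = [0.0] + list(linear_predictors)
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical linear predictors for one patient at the 13 non-reference hospitals.
eta = [0.2, -0.5, 1.1, 0.0, -1.3, 0.4, 0.7, -0.2, 0.3, -0.8, 0.5, 0.1, -0.4]
probs = multinomial_probs(eta)
print(len(probs), round(sum(probs), 6))
```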

To compare the performance of each hospital with that of its peers, it is necessary to assess whether the population of patients undergoing surgery at a particular hospital is comparable, on the basis of observed characteristics, to that of all other Massachusetts hospitals. To accomplish this, we examined the overlap between the distribution of propensity scores for patients undergoing surgery at each hospital and the distribution of propensity scores for patients not undergoing surgery at that hospital. Ideally, the estimated propensity scores of the latter group would cover the entire range of estimated propensity scores at the particular hospital being studied. Such coverage would support the assumption that the 2 groups of patients (those treated at a particular hospital versus all others) were similar in terms of observed demographic characteristics and comorbidities.

We next estimated a regression model for the mortality outcomes. The dependent variable was binary, assuming a value of 1 if the patient died of any cause within 30 days of surgery and 0 otherwise. We included the same set of confounders used in the propensity score model. We included a random hospital-specific intercept that represented the underlying quality of the hospital and accounted for within-hospital correlation of patients. We calculated odds ratios (ORs) conditional on the hospital random effects that apply to comparisons of patients belonging to the same hospital (see Larsen and Merlo48 for a discussion of differences between conditional and unconditional ORs).

The size of between-hospital variation was summarized by the median OR (MOR).49 The MOR considers 2 CABG patients with the same set of observed risk factors but selected randomly from 2 different hospitals. For each such pair, one can compute the OR comparing the patient with the higher probability of dying with the patient with the lower probability of dying; the MOR is the median of these ORs. A MOR value >1 supports the hypothesis that between-hospital variation in mortality exists after adjustment for patient characteristics. If the between-hospital variation were 0 (MOR=1), differences in hospital outcomes, after adjustment for patient characteristics, would be due only to random sampling variability. Although the estimated between-hospital variation will always be >0 in practice, some have suggested that small values can be effectively ignored by setting the between-hospital variance component to 0. We see no reason to assume that between-hospital variation is 0 given that this value can be estimated.
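The MOR is commonly computed from the variance of the random hospital intercepts on the log-odds scale, σ², as MOR = exp(√(2σ²) × Φ⁻¹(0.75)), where Φ⁻¹(0.75) ≈ 0.6745 is the 75th percentile of the standard normal distribution. The text above does not spell out this formula, so the sketch below rests on that assumption; assuming the between-hospital variances reported in the Results (0.0939 and 0.048) are random-intercept variances, it reproduces the reported MOR values of 1.34 and 1.23.

```python
import math
from statistics import NormalDist

def median_odds_ratio(between_hospital_variance: float) -> float:
    """MOR = exp(sqrt(2 * sigma^2) * z_0.75), with sigma^2 the random-intercept variance."""
    z75 = NormalDist().inv_cdf(0.75)  # about 0.6745
    return math.exp(math.sqrt(2.0 * between_hospital_variance) * z75)

for variance in (0.0939, 0.048):
    print(variance, round(median_odds_ratio(variance), 2))
```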

We calculated the mortality risk for each patient using the observed values of his or her confounding variables. The individual risk factors were multiplied by the estimated coefficients from the regression model, and the resulting linear predictors were transformed onto the probability scale and summed to obtain the expected number of deaths at each hospital.

Hospital RSMRs
We next estimated a standardized mortality ratio for each hospital by computing the ratio of the "observed" number of deaths to the expected number of deaths. However, rather than use the actual number of deaths at a hospital, we used an adjusted number (called a shrinkage estimate) that avoids several statistical problems associated with the observed number, including small sample sizes and clustering.28,34,50,51 We then multiplied the standardized mortality ratio by the crude state mortality rate to obtain hospital-specific RSMRs. Ninety-five percent posterior intervals were computed for each RSMR.
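One common way to implement a shrinkage-based ratio with a hierarchical model is to compare predicted deaths, computed with the hospital's own estimated (shrunken) random intercept, with expected deaths, computed with the average intercept, for the same patients. The sketch below assumes that formulation and that the intercepts and coefficients have already been estimated; the exact formulas are not stated in the text above, and all names and numbers are hypothetical.

```python
import math

def inv_logit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def total_risk(patients, intercept, coefs):
    """Sum of predicted death probabilities for a set of patients."""
    return sum(
        inv_logit(intercept + sum(b * x for b, x in zip(coefs, covariates)))
        for covariates in patients
    )

def shrinkage_rsmr(patients, hospital_intercept, average_intercept, coefs, crude_state_rate):
    """Return (standardized ratio, RSMR) using predicted / expected deaths."""
    predicted = total_risk(patients, hospital_intercept, coefs)  # hospital-specific (shrunken) intercept
    expected = total_risk(patients, average_intercept, coefs)    # average-hospital intercept
    ratio = predicted / expected
    return ratio, ratio * crude_state_rate

# Hypothetical estimates: this hospital's intercept sits 0.3 above the average.
COEFS = [1.0, 1.5]
patients = [[0, 0], [1, 0], [0, 1], [1, 1]] * 25
ratio, rsmr = shrinkage_rsmr(patients, -3.7, -4.0, COEFS, 0.0225)
print(f"ratio = {ratio:.3f}, RSMR = {rsmr:.4f}")
```

Because a hierarchical model pulls the intercept of a small hospital toward the average, the numerator in this formulation is less volatile than the raw count of observed deaths.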

Cross-Validation
Because all hospitals contribute to the model used to estimate the expected number of deaths, each hospital helps to define its own expected behavior.50,51 If one hospital is truly "outlying," with an unusually high or low mortality rate, it may "inflate" the estimated between-hospital variance component because the regression model adapts to incorporate the results of the unusual hospital. Consequently, this hospital will be less likely to be identified as an outlier. With a very large number of hospitals, the results of one institution are unlikely to distort the model substantially. However, with a smaller number of cardiac surgery hospitals, as in Massachusetts or other individual states, one aberrant hospital could substantially influence the counterfactual outcome and make the performance of that hospital less likely to be identified as an outlier.

We addressed this problem through cross-validation. In a second set of analyses, the data from each hospital were sequentially deleted from the determination of the counterfactual distribution for its particular patients. With this approach, the expected number of deaths for a hospital represents how well the rest of the hospitals in the state would fare with the patients from that specific hospital. We computed the difference between the observed numbers of deaths in each hospital and the number of deaths predicted using its case mix and the regression coefficients from a model based on all other hospitals. Posterior predictive probability values, which reflect the similarity of the mortality experience of a particular hospital to that of its peers, also were computed.50 Extreme predictive P values (P≤0.01 or P≥0.99) indicate a discrepancy between the observed data and what is predicted by the model developed from the remaining hospitals.
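A minimal sketch of the leave-one-out check, assuming each patient's predicted risk has already been computed from a model refit without that patient's hospital. It computes the exact distribution of the number of deaths (a Poisson-binomial distribution) and the probability of seeing at least the observed count. This is a simplification of the posterior predictive P value used in the study: it treats the leave-one-out risks as fixed and patients as independent, rather than averaging over posterior uncertainty in the model parameters. The function names and numbers are hypothetical.

```python
def death_count_distribution(risks):
    """Exact distribution of the number of deaths when each patient dies
    independently with the given probability, computed by dynamic programming."""
    dist = [1.0]  # dist[k] = P(k deaths among the patients processed so far)
    for p in risks:
        new = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            new[k] += mass * (1.0 - p)  # this patient survives
            new[k + 1] += mass * p      # this patient dies
        dist = new
    return dist

def predictive_p_value(observed_deaths, loo_risks):
    """P(replicated deaths >= observed) under risks from a model fit without this hospital."""
    return sum(death_count_distribution(loo_risks)[observed_deaths:])

def is_extreme(p_value, lower=0.01, upper=0.99):
    """Apply the P<=0.01 or P>=0.99 rule described in the text."""
    return p_value <= lower or p_value >= upper

# Hypothetical hospital: 100 patients, each with a 2% leave-one-out risk, 8 observed deaths.
p = predictive_p_value(8, [0.02] * 100)
print(round(p, 4), is_extreme(p))
```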

The authors had full access to and take full responsibility for the integrity of the data. All authors have read and agree to the manuscript as written.


Results
The crude 30-day mortality rate is 2.25%, corresponding to 99 deaths among 4393 isolated CABG admissions. The number of isolated CABG admissions per hospital ranged from a low of 44 to a high of 650. Not surprisingly, substantial differences were found in patient risk factors among hospitals (Table 1). For example, the percentage of admissions in which ejection fraction was <30% ranged from 1.8% to 15.0%, renal failure from 1.8% to 13.0%, preoperative intraaortic balloon pump use from 2.3% to 29.0%, and emergent or salvage procedures from 0% to 7.2%. Visual inspection of the covariate frequencies for hospitals B and F suggests that they represent, on average, quite different populations. For example, 7.2% of the patients at hospital B were emergent or salvage, the highest-acuity group, whereas only 0.9% of patients at hospital F were in that category.

This imbalance is illustrated more formally in Figure 2B, which depicts the density of estimated propensity scores for hospital B compared with hospital F. This analysis is restricted to patients who underwent surgery at those 2 hospitals. The propensity scores in Figure 2B were obtained by estimating a binary logistic regression model in which the response was an indicator assuming a value of 1 if the patient underwent CABG at hospital B and 0 if the patient underwent surgery at hospital F. The density estimates indicate that for 13% of the patients who underwent CABG at hospital B (solid line), no "similar" patient underwent the procedure at hospital F (dashed line). This percentage was calculated as the fraction of hospital B patients with estimated log-odds of propensity scores >5, because this threshold defined the area of nonoverlap (ie, no hospital F patient had an estimated log-odds of propensity score >5). The lack of overlap implies that a direct comparison of all patients treated at hospital B with those at hospital F may not be statistically valid.
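The nonoverlap calculation for hospitals B and F can be sketched as follows. The Results use a fixed cutoff (log-odds >5) read from the estimated densities; the sketch instead uses a simpler, assumed rule in which a patient is unmatched if his or her log-odds lies outside the range (minimum to maximum) observed at the comparison hospital. The function name and values below are made up.

```python
def nonoverlap_fraction(logodds_a, logodds_b):
    """Fraction of hospital A patients whose propensity log-odds lie outside the
    range observed among hospital B patients, that is, with no comparable B patient."""
    lo, hi = min(logodds_b), max(logodds_b)
    outside = sum(1 for v in logodds_a if v < lo or v > hi)
    return outside / len(logodds_a)

# Hypothetical log-odds of the propensity for surgery at hospital A vs hospital B.
hospital_a = [-1.2, 0.4, 1.8, 3.1, 5.6, 6.3]
hospital_b = [-2.5, -0.7, 0.9, 2.2, 4.8]
print(nonoverlap_fraction(hospital_a, hospital_b))
```

Here the 2 hospital A patients with log-odds above 4.8 have no counterpart at hospital B, so one-third of hospital A's patients fall outside the region of overlap.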


Table 1. Selected Patient Characteristics Stratified by Hospital: Massachusetts Adults Undergoing Isolated CABG Surgery During 2003


Figure 2. Covariate balance for 2 comparisons using Massachusetts cardiac surgery programs. A, Substantial overlap is present in the log-odds of the probability of surgery at hospital B vs the remaining 13 cardiac surgery programs. B, The covariate balance for the direct comparison of hospital B to hospital F is much less.

Table 2 illustrates the prevalence of the individual covariates from which these propensity score density distributions were derived. Column 1 shows the characteristics of the subset of patients at hospital B who do not overlap with hospital F (ie, for whom the log-odds of their propensity scores are >5). The prevalence of individual high-risk characteristics is quite elevated in this patient subset (eg, 24% renal failure, 17% reoperation, 10% cardiogenic shock, 52% emergent or salvage), and hospital F has no experience with patients having this overall level of acuity. The last 2 columns demonstrate the balancing properties of propensity scores in the area of overlap, in which patients are found from both hospitals with comparable log-odds of propensity score. For many of the most important covariates (eg, prior CABG, cardiogenic shock, recent myocardial infarction, urgent or emergent/salvage status), the prevalence was comparable for hospital B and F patients in the overlap region.


Table 2. Comparison of Prevalence of Risk Factors Between Hospitals B and F

Although direct hospital-to-hospital covariate balance was poor, the overlap of estimated propensity score distributions for each hospital compared with the propensity score distribution for patients at most of the remaining hospitals was excellent. For example, Figure 2A displays the overlap for hospital B and all remaining hospitals based on the predictions obtained from the multinomial logistic regression model. This suggests that a comparison of the performance of hospital B relative to the overall group of other Massachusetts CABG providers is statistically valid.

The prevalence of the confounders and their relationship to 30-day mortality are presented in Table 3. Between-hospital variation measured by the MOR, after accounting for patient risk factors, is 1.34. This implies that for 2 patients with the same observed risk factors treated at 2 randomly selected hospitals, the median OR comparing the patient at the higher-risk hospital with the patient at the lower-risk hospital is 1.34; that is, in the median pairing, the former has 1.34 times the odds of dying within 30 days of isolated CABG.


Table 3. Prevalence of Risk Factors and Conditional and Unconditional (Population-Averaged) Odds Ratios of 30-Day Mortality After Isolated CABG Surgery in Massachusetts (2003)

The last column of Table 4 depicts the typical profiling results that would be obtained with the entire state experience (all 14 hospitals) as the counterfactual. The 95% posterior interval for each hospital's RSMR includes the state crude rate of 2.25%. This would imply that no hospital had a higher- or lower-than-expected mortality rate given its case mix. In most public report cards, this finding would be regarded as sufficient evidence for the absence of statistical outliers, but as noted previously, this conclusion may be misleading. The 3 columns on the left show the results of analyses performed with cross-validation, sequentially deleting the results of each hospital from the determination of its own counterfactual. In this analysis, the cross-validated predictive P value for hospital D was 0.01 (left side of Table 4), meeting the threshold for an extreme value. Supporting this concern, the between-hospital variance in risk-adjusted mortality is reduced by approximately 50% when hospital D is excluded from the model (from 0.0939 to 0.048; data not shown), and the MOR decreases from 1.34 to 1.23. Finally, compared with its peers, hospital D has an excess mortality rate of 2.26%. These findings all suggest that hospital D is in fact a statistical outlier.


Table 4. Cross-Validation Results


Discussion
The study of variations in the provision of healthcare services has been a central activity of outcomes research for more than 2 decades. This variability involves both the utilization of services and outcomes. Initial publication of hospital mortality rates in 1986 by the Health Care Financing Administration (now known as the Centers for Medicare and Medicaid Services, or CMS) was widely criticized for failing to adjust for patient risk.52 This criticism motivated the development of numerous statistical risk models, particularly in cardiac surgery, to account for preoperative patient characteristics. It also prompted CMS to look more closely at its own risk models. CMS has since released new mortality models for acute myocardial infarction and heart failure that address many of the risk-adjustment issues and statistical deficiencies identified in its earlier releases.32,33 Nevertheless, risk adjustment corrects for the case severity at a given institution by using risk estimates derived from the entire population, but it does not guarantee statistically valid direct hospital-to-hospital comparisons. When analyzing outcomes data, interested stakeholders should always consider these additional questions: To what type of patients can inferences about risk-standardized hospital outcomes be applied? What reference population was used to determine the counterfactual? If direct hospital-to-hospital comparison is the goal, is there sufficient covariate balance (overlap) to justify such a comparison? A widely held view is that risk adjustment levels the playing field so that hospitals can be compared directly with one another over the broad spectrum of patient risk. We argue that this assumption is often invalid and that this common misinterpretation has profound health policy implications in today’s performance-centric environment.
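The third question, whether 2 hospitals have sufficient covariate balance, is often checked covariate by covariate with a standardized difference. This is the difference in means, or in prevalences for 0/1 risk factors, divided by the pooled standard deviation. The sketch below is a minimal illustration with hypothetical data. The threshold mentioned afterward is a commonly used rule of thumb, not a criterion taken from this analysis.

```python
import math
from statistics import mean, variance


def standardized_difference(x_a: list[float], x_b: list[float]) -> float:
    """Standardized difference in one covariate between two hospitals' patients.

    d = (mean_a - mean_b) / sqrt((var_a + var_b) / 2); works for continuous
    covariates and for 0/1 indicators (where the mean is the prevalence).
    Each list needs at least 2 patients for the sample variance.
    """
    pooled_sd = math.sqrt((variance(x_a) + variance(x_b)) / 2.0)
    diff = mean(x_a) - mean(x_b)
    if pooled_sd == 0.0:
        # both groups constant: balanced only if the constants agree
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / pooled_sd


# Hypothetical indicator for one risk factor (1 = present) at two hospitals
hosp_1 = [1, 1, 1, 0]
hosp_2 = [1, 0, 0, 0]
print(standardized_difference(hosp_1, hosp_2))  # 1.0
```

Absolute values above about 0.1 are often taken to indicate meaningful imbalance. When many risk factors exceed such a threshold for a pair of hospitals, a direct comparison of their risk-standardized rates rests on extrapolation rather than on patients they have in common.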

Are current report cards useful? Yes, provided they are interpreted in the correct context. Most outcomes report cards use indirect standardization. In this context, the RSMR of a hospital may be interpreted as a measure of quality for the type of patient it treats. Properly constructed and interpreted, report cards allow each hospital to be compared with the entire experience of a larger population of providers (eg, a state or region). Such a comparison group typically will be rich enough to support a valid assessment of each hospital's quality of care, and it provides meaningful information to payers, regulators, and healthcare consumers.
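Indirect standardization compares each hospital with what the reference population would have achieved for that hospital's own patients. The minimal sketch below shows the classical observed-to-expected form, with the 2.25% state crude rate as the reference rate. Model-based profiling methods, such as the hierarchical models used for the CMS measures, typically replace the observed death count with a model-based predicted count, and this sketch does not attempt that step. The function name and the hypothetical hospital are illustrative.

```python
def indirectly_standardized_rate(observed_deaths: int,
                                 expected_probs: list[float],
                                 reference_rate: float) -> float:
    """Classical indirect standardization: (O / E) x reference rate.

    expected_probs are each patient's predicted mortality under the reference
    population's risk model; E is their sum, i.e. the deaths expected had these
    same patients been treated by the "average" reference provider.
    """
    expected = sum(expected_probs)
    if expected <= 0:
        raise ValueError("expected deaths must be positive")
    return observed_deaths / expected * reference_rate


# Hypothetical hospital: 200 patients, each with 2.5% predicted risk, 10 deaths
rate = indirectly_standardized_rate(10, [0.025] * 200, reference_rate=0.0225)
print(f"{rate:.2%}")  # 4.50%
```

Because E is computed from the hospital's own patients, each hospital's rate answers a question about its own case mix. Two such rates are not automatically comparable with each other.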


Conclusions
Outcomes research typically involves nonrandomized studies to assess the results of patient experience with the healthcare system. Virtually always, some form of adjustment is required. Although risk-standardized outcomes have been an important advance in adjusting provider results for differences in case mix, such results often have been misapplied. Assessing the performance of a hospital for its case mix compared with the expected performance of a reference group of providers for a similar case mix usually is justified. However, because of substantial differences in the distribution of risk factors, it may often be inappropriate to directly compare 2 hospitals using the results available in most public report cards.


Acknowledgments
Sources of Funding

Dr Normand is contracted by the Massachusetts Department of Public Health to monitor hospital cardiac quality and also receives funding from Yale University to develop risk models for CMS.

Disclosures

None.


References
1. Agency for Healthcare Research and Quality. Outcomes Research: Fact Sheet. Available at: http://www.ahrq.gov/clinic/outfact.htm. Accessed September 5, 2007.

2. Institute of Medicine. Crossing the Quality Chasm: A New Health System for the 21st Century. Washington, DC: National Academies Press; 2001.

3. Institute of Medicine. Performance Measurement: Accelerating Improvement. Washington, DC: National Academies Press; 2006.

4. Gatsonis CA. Profiling providers of medical care. In: Armitage P, Colton T, eds. Encyclopedia of Biostatistics, Volume 6. 2nd ed. Chichester, UK: John Wiley & Sons Ltd; 2005: 4252–4254.

5. Normand S-LT. Quality of care. In: Armitage P, Colton T, eds. Encyclopedia of Biostatistics, Volume 6. 2nd ed. Chichester, UK: John Wiley & Sons Ltd; 2005: 4348–4352.

6. Rubin DB. Comment: Neyman (1923) and causal inference in experiments and observational studies. Stat Sci. 1990; 5: 472–480.

7. Holland PW. Statistics and causal inference. J Am Stat Assoc. 1986; 81: 945–960.

8. Holland PW, Rubin DB. Causal inference in retrospective studies. Eval Rev. 1988; 12: 203–231.

9. Rothman KJ, Greenland S. Causation and causal inference in epidemiology. Am J Public Health. 2005; 95: S144–S150.

10. Rothman KJ, Greenland S. Modern Epidemiology. Philadelphia, Pa: Lippincott-Raven; 1998.

11. Pearl J. Causality: Models, Reasoning, and Inference. Cambridge, UK: Cambridge University Press; 2000.

12. Robins JM, Greenland S. The role of model selection in causal inference from nonexperimental data. Am J Epidemiol. 1986; 123: 392–402.

13. Rosenbaum PR, Rubin DB. Estimating the effects caused by treatments: comment. J Am Stat Assoc. 1984; 79: 26–28.

14. Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika. 1983; 70: 41–55.

15. Rosenbaum PR. Observational Studies. New York, NY: Springer; 2002.

16. Little RJ, Rubin DB. Causal effects in clinical and epidemiological studies via potential outcomes: concepts and analytical approaches. Annu Rev Public Health. 2000; 21: 121–145.

17. Rubin DB. Causal inference using potential outcomes: design, modeling, decisions. J Am Stat Assoc. 2005; 100: 322–331.

18. Gelman A. Applied Bayesian Modeling and Causal Inference From Incomplete-Data Perspectives. Chichester, UK: Wiley; 2004.

19. Gelman A, Hill J. Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge, UK: Cambridge University Press; 2007.

20. Maldonado G, Greenland S. Estimating causal effects. Int J Epidemiol. 2002; 31: 422–429.

21. Rubin DB. The design versus the analysis of observational studies for causal effects: parallels with the design of randomized trials. Stat Med. 2007; 26: 20–36.

22. Rubin DB. Direct and indirect causal effects via potential outcomes. Scand J Stat. 2004; 31: 161–170.

23. Rubin DB. Bayesian inference for causal effects: the role of randomization. Ann Stat. 1978; 6: 34–58.

24. Rubin DB. Estimating causal effects of treatments in randomized and nonrandomized studies. J Educ Psychol. 1974; 66: 688–701.

25. Fleiss JL, Levin BA, Paik MC. Statistical Methods for Rates and Proportions. Hoboken, NJ: John Wiley & Sons; 2003.

26. Shahian DM, Blackstone EH, Edwards FH, Grover FL, Grunkemeier GL, Naftel DC, Nashef SA, Nugent WC, Peterson ED. Cardiac surgery risk models: a position article. Ann Thorac Surg. 2004; 78: 1868–1877.

27. Shahian DM, Normand SL, Torchiana DF, Lewis SM, Pastore JO, Kuntz RE, Dreyer PI. Cardiac surgery report cards: comprehensive review and statistical critique. Ann Thorac Surg. 2001; 72: 2155–2168.

28. Normand S-LT, Glickman ME, Gatsonis CA. Statistical methods for profiling providers of medical care: issues and applications. J Am Stat Assoc. 1997; 92: 803–814.

29. McNeil BJ, Pedersen SH, Gatsonis C. Current issues in profiling quality of care. Inquiry. 1992; 29: 298–307.

30. Hannan EL, Wu C, Ryan TJ, Bennett E, Culliford AT, Gold JP, Hartman A, Isom OW, Jones RH, McNeil B, Rose EA, Subramanian VA. Do hospitals and surgeons with higher coronary artery bypass graft surgery volumes still have lower risk-adjusted mortality rates? Circulation. 2003; 108: 795–801.

31. Hannan EL, Kumar D, Racz M, Siu AL, Chassin MR. New York State’s Cardiac Surgery Reporting System: four years later. Ann Thorac Surg. 1994; 58: 1852–1857.

32. Krumholz HM, Wang Y, Mattera JA, Wang Y, Han LF, Ingber MJ, Roman S, Normand SL. An administrative claims model suitable for profiling hospital performance based on 30-day mortality rates among patients with an acute myocardial infarction. Circulation. 2006; 113: 1683–1692.

33. Krumholz HM, Wang Y, Mattera JA, Wang Y, Han LF, Ingber MJ, Roman S, Normand SL. An administrative claims model suitable for profiling hospital performance based on 30-day mortality rates among patients with heart failure. Circulation. 2006; 113: 1693–1701.

34. Shahian DM, Torchiana DF, Shemin RJ, Rawn JD, Normand SL. Massachusetts cardiac surgery report card: implications of statistical methodology. Ann Thorac Surg. 2005; 80: 2106–2113.

35. Rosenbaum PR, Rubin DB. Constructing a control group using multivariate matched sampling methods that incorporate the propensity score. Am Stat. 1985; 39: 33–38.

36. Rosenbaum PR, Rubin DB. Reducing bias in observational studies using subclassification on the propensity score. J Am Stat Assoc. 1984; 79: 516–524.

37. Rubin DB. The design versus the analysis of observational studies for causal effects: parallels with the design of randomized trials. Stat Med. 2007; 26: 20–36.

38. D’Agostino RB Jr. Propensity scores in cardiovascular research. Circulation. 2007; 115: 2340–2343.

39. Braitman LE, Rosenbaum PR. Rare outcomes, common treatments: analytic strategies using propensity scores. Ann Intern Med. 2002; 137: 693–695.

40. D’Agostino RB Jr. Propensity score methods for bias reduction in the comparison of a treatment to a non-randomized control group. Stat Med. 1998; 17: 2265–2281.

41. Joffe MM, Rosenbaum PR. Invited commentary: propensity scores. Am J Epidemiol. 1999; 150: 327–333.

42. Glance LG, Osler TM, Mukamel DB, Dick AW. Use of a matching algorithm to evaluate hospital coronary artery bypass grafting performance as an alternative to conventional risk adjustment. Med Care. 2007; 45: 292–299.

43. Huang IC, Frangakis C, Dominici F, Diette GB, Wu AW. Application of a propensity score approach for risk adjustment in profiling multiple physician groups on asthma care. Health Serv Res. 2005; 40: 253–278.

44. Dehejia RH, Wahba S. Causal effects in nonexperimental studies: reevaluating the evaluation of training programs. J Am Stat Assoc. 1999; 94: 1053–1062.

45. Tchernis R, Horvitz-Lennon M, Normand SL. On the use of discrete choice models for causal inference. Stat Med. 2005; 24: 2197–2212.

46. Society of Thoracic Surgeons. STS National Database. Available at: http://www.sts.org/sections/stsnationaldatabase/. Accessed September 5, 2007.

47. Social Security Death Index interactive search. Available at: http://ssdi.rootsweb.com/cgi-bin/ssdi.cgi. Accessed September 5, 2007.

48. Larsen K, Merlo J. Appropriate assessment of neighborhood effects on individual health: integrating random and fixed effects in multilevel logistic regression. Am J Epidemiol. 2005; 161: 81–88.

49. Larsen K, Petersen JH, Budtz J, Endahl L. Interpreting parameters in the logistic regression model with random effects. Biometrics. 2000; 56: 909–914.

50. Normand S-LT, Shahian DM. Statistical and clinical aspects of hospital outcomes profiling. Stat Sci. 2007; 22: 206–226.

51. Draper D, Gittoes M. Statistical analysis of performance indicators in UK higher education. J R Stat Soc Ser A Stat Soc. 2004; 167: 449–474.

52. Iezzoni LI. Risk Adjustment for Measuring Health Care Outcomes. 3rd ed. Chicago, Ill: Health Administration Press; 2003.


 

CLINICAL PERSPECTIVE

Risk-standardized outcomes are increasingly used by various stakeholders to assess the quality of care delivered by healthcare providers. Although adjusted outcomes represent a substantial improvement over unadjusted results, they are commonly misinterpreted and misused, which can have important consequences for providers and for the healthcare system. Risk-standardized outcomes, as most commonly constructed, characterize a provider’s performance for a specific group of patients compared with what would have been expected had the same patients been treated by an average provider in the reference population (typically a state or a country). These indirectly standardized outcomes are based on a provider’s actual case mix, so they cannot necessarily be extrapolated to predict its performance with a different (eg, more complex) group of patients. Moreover, if the reference population contains only a small number of providers, including a truly outlying program in the development of the risk model may decrease the sensitivity of the resulting algorithm for detecting true outliers. In Massachusetts, this problem is mitigated by cross-validation: each hospital is sequentially removed from risk model development, and its performance is then assessed with a model derived from the remaining hospitals. Finally, although risk-standardized outcomes are useful for comparing an individual provider’s performance with that of the overall reference population, this does not imply that the outcomes of 2 providers can be compared directly with one another. Such a comparison is justified only for patients whose risk profiles fall within the range shared by both providers, because these are the only types of patients the 2 providers have in common.
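One crude way to restrict a 2-provider comparison to the patients they have in common is to keep only the patients whose propensity scores lie in the range covered by both hospitals, using the min-max common-support rule. The sketch below assumes a hypothetical score, P(treated at hospital A | risk factors), estimated among the patients of hospitals A and B only. It reports the crude mortality rates within that range. Those crude rates would still need adjustment within the overlap region, for example by matching or by subclassification on the propensity score (references 35 and 36). The data layout and function names are illustrative.

```python
# Each patient is (two-hospital propensity score, died 0/1): a hypothetical layout.
Patient = tuple[float, int]


def common_support(a: list[Patient], b: list[Patient]) -> tuple[float, float]:
    """Score range covered by both hospitals (min-max rule)."""
    lo = max(min(s for s, _ in a), min(s for s, _ in b))
    hi = min(max(s for s, _ in a), max(s for s, _ in b))
    return lo, hi


def compare_on_overlap(a: list[Patient], b: list[Patient]) -> dict:
    """Crude mortality of each hospital among patients inside the common support."""
    lo, hi = common_support(a, b)
    a_in = [d for s, d in a if lo <= s <= hi]
    b_in = [d for s, d in b if lo <= s <= hi]
    if lo > hi or not a_in or not b_in:
        raise ValueError("no overlapping patients; a direct comparison is not supported")
    return {"range": (lo, hi), "n_a": len(a_in), "n_b": len(b_in),
            "crude_a": sum(a_in) / len(a_in), "crude_b": sum(b_in) / len(b_in)}


# Hypothetical patients at two hospitals
hosp_a = [(0.1, 0), (0.3, 1), (0.5, 0)]
hosp_b = [(0.2, 0), (0.4, 0), (0.9, 1)]
print(compare_on_overlap(hosp_a, hosp_b))
```

Patients outside the shared range, such as hospital B's patient with a score of 0.9, are not discarded as unimportant. They are excluded because neither hospital's data say anything about how the other hospital would have done with them.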


Footnotes
Guest Editor for this article was Harlan M. Krumholz, MD, SM.


Related Article:

Clinical Summaries
Circulation. 2008;117:1909.




