(Circulation. 1999;99:2098-2104.)
© 1999 American Heart Association, Inc.
Clinical Investigation and Reports
From the Division of Cardiovascular Surgery at The Toronto Hospital (J.I.), Toronto, Ontario, Canada; The Institute for Clinical Evaluative Sciences (J.V.T., C.D.N.), North York, Ontario, Canada; the Faculty of Medicine (J.I., J.V.T., C.D.N.), University of Toronto, Toronto, Ontario, Canada; and the Department of Medicine and the Clinical Epidemiology and Health Services Research Program (J.V.T., C.D.N.), Sunnybrook Health Science Centre, North York, Ontario, Canada.
Correspondence to Dr C. David Naylor, Institute for Clinical Evaluative Sciences, G106-2075 Bayview Ave, North York, Ontario, M4N 3M5, Canada. E-mail cdn@ices.on.ca
Abstract
Methods and Results: Drawing on 7491 consecutive patients who underwent isolated CABG at 2 Toronto teaching hospitals between 1993 and 1996, we compared 3 strategies: (1) using a ready-made model originally derived and validated in our jurisdiction; (2) recalibrating the ready-made model to better fit the population; and (3) deriving a new model with additional risk factors. We assessed statistical accuracy, ie, area under the receiver operating characteristic (ROC) curve; precision, ie, statistical goodness-of-fit; and actual impact on both risk-adjusted operative mortalities (RAOM) and performance rankings for 14 surgeons. The new model was slightly more accurate than the ready-made model (ROC, 0.78 versus 0.76; P<0.05), although it did not differ from the recalibrated model (ROC, 0.77). The ready-made model showed poor fit between predicted and observed results (P<0.001), leading to significant underestimation of RAOM (1.6±0.2%) compared with the other strategies (2.5±0.2%; P=0.048). Remodeling also changed the performance rankings among the half of surgeons with higher RAOM.
Conclusions: Poorly calibrated risk algorithms can bias the calculation of RAOM and alter the results of surgeon-specific profiles. Any existing index used for risk assessment in cardiac surgery should be episodically recalibrated or compared with new models derived from local subjects to ensure that its performance remains optimal.
Key Words: bypass; risk factors; mortality; prognosis
Introduction
Patient populations and the risk factors associated with adverse outcomes change over time and also differ between centers. Thus, risk models derived and validated in 1 locale usually perform less well when applied in other settings or even to more contemporary patients in the same setting.6 11 12 15 16 Clinicians and managers considering the use of a risk index accordingly have 3 basic options: (1) apply a ready-made index as published; (2) recalibrate an existing index to fit their own population; or (3) derive a new model from their own data.
In this article, we explore the implications of these options for assessment of operative mortality using a detailed data set with information on consecutive patients undergoing isolated CABG at 2 Toronto teaching hospitals. Specifically, we have compared the analytical strategies of recalibrating and remodeling against a ready-made rule developed by the authors for the Ontario Cardiac Care Network (CCN).1
Methods
Table 1 lists the data elements used in the analyses.
Analysis
General Issues
Data were collected and managed in dBASE IV data sets. The SAS for PC18 and BMDP/DYN LR19 programs were used for statistical analysis. Because we were not seeking to recalibrate or rederive an index for external and general application, we forwent split-sample methods (ie, separate derivation and validation steps) for both the recalibrating and remodeling strategies described below.
Generating the Models
Ready-made: We used a 6-variable risk index developed for the Ontario CCN from data on all patients undergoing cardiac surgery in the province between April 1991 and March 1993. For isolated CABG outcomes, 1 variable (type of surgery) was set to its null value, leaving only 5 variables and their associated risk scores. The original regression coefficients for each variable were used to calculate each patient's predicted probability (P) of operative mortality (OM) from the formula:
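$$P = \frac{e^{\beta_0 + \sum_{i=1}^{k} \beta_i x_i}}{1 + e^{\beta_0 + \sum_{i=1}^{k} \beta_i x_i}}$$
where $\beta_0$ is the model constant, $\beta_i$ is the regression coefficient for risk factor $i$, $x_i$ is the patient's value for that factor, and $k = 5$ for isolated CABG.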
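The predicted probability is the logistic transform of the linear predictor, P = exp(β0 + Σβixi)/[1 + exp(β0 + Σβixi)]. The Python function below is a minimal sketch of that calculation for one patient. The function name is ours. The coefficients in the example are hypothetical placeholders and are not the published CCN values.

```python
import math
from typing import Sequence


def predicted_probability(intercept: float, coefs: Sequence[float], x: Sequence[float]) -> float:
    """Logistic predicted probability of OM for one patient.

    intercept, coefs: model constant and regression coefficients (hypothetical
    values in the example below; the CCN coefficients are not reproduced here).
    x: the patient's values for the same risk factors, in the same order.
    """
    if len(coefs) != len(x):
        raise ValueError("coefs and x must have the same length")
    # Linear predictor: constant plus the sum of coefficient * risk-factor value
    lp = intercept + sum(b * xi for b, xi in zip(coefs, x))
    # exp(lp) / (1 + exp(lp)), split into two branches to avoid overflow
    if lp >= 0:
        return 1.0 / (1.0 + math.exp(-lp))
    z = math.exp(lp)
    return z / (1.0 + z)


# Hypothetical example: 5 binary risk factors, 2 of them present
p = predicted_probability(-4.0, [0.5, 1.0, 0.7, 0.9, 1.2], [1, 0, 0, 1, 0])
```

The two-branch form gives the same value as the textbook ratio. It avoids overflow only for extreme linear predictors, which are unlikely with a 5-variable index.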
Recalibrated: The 5 explanatory variables from the CCN index were included in logistic regression analyses of the 1993 to 1996 data set to reestimate the mortality-specific regression coefficients and related risk scores. The predicted probability of OM and the total risk score for each patient were then calculated as for the CCN model.1 20
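Recalibration, as described here, means reestimating the coefficients of the same 5 variables on the newer data. The sketch below assumes the risk factors are already coded as a NumPy array with one column per variable. `refit_logistic` is a hypothetical name. Newton-Raphson (iteratively reweighted least squares) is our choice of fitting routine and is not necessarily what the SAS or BMDP programs used.

```python
import numpy as np


def refit_logistic(X: np.ndarray, y: np.ndarray, max_iter: int = 50, tol: float = 1e-10) -> np.ndarray:
    """Reestimate logistic regression coefficients by Newton-Raphson (IRLS).

    X: (n_patients, k) array of the k existing risk factors (no constant column).
    y: (n_patients,) array of 0/1 operative deaths.
    Returns k + 1 coefficients, constant first.
    """
    Xd = np.column_stack([np.ones(X.shape[0]), X])  # add the constant term
    beta = np.zeros(Xd.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(Xd @ beta)))  # current fitted probabilities
        w = p * (1.0 - p)                         # binomial variance weights
        score = Xd.T @ (y - p)                    # gradient of the log-likelihood
        info = Xd.T @ (Xd * w[:, None])           # Fisher information matrix
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:            # stop once the update is negligible
            break
    return beta
```

The recalibrated odds ratios are `np.exp(beta[1:])`. Because the fitted model includes a constant, the mean predicted probability equals the observed mortality in the data used for the refit. This is one reason recalibration removes the systematic overestimation seen with the ready-made coefficients.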
Remodeled: The University of Toronto cardiac surgery
registry covers a wide variety of potential risk
factors.17 All explanatory variables with a
univariate P value <0.25, as well as those
found commonly in other major risk indexes but failing to meet the
critical α-level, were submitted to logistic regression
analyses that used forward selection combined with backward
elimination.21 22 The best logistic regression model
was determined by 2 diagnostic criteria: the
Hosmer-Lemeshow goodness-of-fit statistic21 and the area
under the receiver operating characteristic (ROC)
curve.15 23 24 As in the CCN index, odds ratios were
rounded to the nearest integer, and an additive risk index was created.
Because of small numbers, risk scores >18 were collapsed (ie, score
18 for OM).
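Building the additive index from the odds ratios can be sketched as follows. The sketch assumes each level of a multilevel factor (for example, each ventricular grade) is its own indicator with its own odds ratio. The factor names in the example are hypothetical. The text does not say how ties at .5 were rounded, so round-half-up is an assumption.

```python
import math
from typing import Dict, Iterable

MAX_SCORE = 18  # scores above 18 are collapsed into the top category, as in the text


def weights_from_odds_ratios(odds_ratios: Dict[str, float]) -> Dict[str, int]:
    """Round each odds ratio to the nearest integer to obtain its additive weight.

    Uses round-half-up via math.floor(x + 0.5) (an assumption); Python's
    built-in round() would instead round 2.5 down to 2.
    """
    return {factor: math.floor(value + 0.5) for factor, value in odds_ratios.items()}


def total_risk_score(weights: Dict[str, int], present: Iterable[str]) -> int:
    """Sum the weights of the risk factors a patient has, capped at MAX_SCORE."""
    return min(sum(weights[f] for f in present), MAX_SCORE)


# Hypothetical odds ratios, not those reported in Table 2
weights = weights_from_odds_ratios({"repeat_operation": 2.6, "female_sex": 1.4, "lv_grade_4": 3.5})
score = total_risk_score(weights, ["repeat_operation", "lv_grade_4"])  # 3 + 4 = 7
```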
Statistical Comparisons of Index Performance
Statistical accuracy, or model discrimination, was assessed with
the area under the ROC curve23 24 for each model, with
comparisons between models as described by Hanley and
McNeil.24
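The ROC area can be read as the probability that a randomly chosen patient who died had a higher predicted risk than a randomly chosen survivor. The sketch below computes the area that way and adds the Hanley-McNeil standard error and a z statistic for comparing 2 areas estimated on the same patients. Several assumptions apply. The function names are ours, and this is not the software used in the study. The correlation r between the 2 models' areas must be supplied; Hanley and McNeil obtain it from a lookup table.

```python
import math
from typing import Sequence


def roc_area(p_deaths: Sequence[float], p_survivors: Sequence[float]) -> float:
    """ROC area as the fraction of (death, survivor) pairs ranked correctly; ties count 1/2."""
    wins = 0.0
    for pd in p_deaths:
        for ps in p_survivors:
            if pd > ps:
                wins += 1.0
            elif pd == ps:
                wins += 0.5
    return wins / (len(p_deaths) * len(p_survivors))


def hanley_mcneil_se(auc: float, n_deaths: int, n_survivors: int) -> float:
    """Hanley-McNeil standard error of a single ROC area."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    var = (auc * (1.0 - auc)
           + (n_deaths - 1) * (q1 - auc ** 2)
           + (n_survivors - 1) * (q2 - auc ** 2)) / (n_deaths * n_survivors)
    return math.sqrt(var)


def compare_correlated_aucs(a1: float, se1: float, a2: float, se2: float, r: float) -> float:
    """z statistic for 2 ROC areas from the same patients; r is their correlation (user-supplied)."""
    return (a1 - a2) / math.sqrt(se1 ** 2 + se2 ** 2 - 2.0 * r * se1 * se2)
```

The pairwise loop is quadratic, but with roughly 170 deaths and 7300 survivors it involves only about a million comparisons.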
Statistical precision, or model calibration, was evaluated with the Hosmer-Lemeshow goodness-of-fit statistic.21 We also plotted the mean predicted probability of OM against observed OM for each total risk score15 22 and performed a weighted linear regression to determine whether each model overestimated or underestimated mortality. A slope of 1 and an intercept of 0 would indicate a perfect fit of predicted to observed outcomes.22 Differences in slopes and intercepts among the 3 regressions were evaluated by ANCOVA, with pairwise comparisons as appropriate.
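Both calibration checks can be sketched in a few lines, under the following assumptions.

- **Hosmer-Lemeshow grouping.** The sketch groups patients into deciles of predicted risk, the conventional choice. The text does not say how patients were grouped.
- **Weighted regression.** The sketch uses one point per total risk score: mean predicted OM as x, observed OM as y, and the number of patients at that score as the weight. This axis and weight assignment is ours.

```python
from typing import Sequence, Tuple


def hosmer_lemeshow(pred: Sequence[float], obs: Sequence[int], groups: int = 10) -> Tuple[float, int]:
    """Hosmer-Lemeshow chi-square over groups of sorted predicted risk.

    Patients are sorted by predicted probability and split into `groups`
    near-equal groups. Returns (statistic, df = groups - 2); a P value would
    come from the chi-square distribution with df degrees of freedom.
    """
    pairs = sorted(zip(pred, obs))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        n_g = len(chunk)
        expected = sum(p for p, _ in chunk)  # expected deaths in the group
        observed = sum(y for _, y in chunk)  # observed deaths in the group
        p_bar = expected / n_g
        stat += (observed - expected) ** 2 / (n_g * p_bar * (1.0 - p_bar))
    return stat, groups - 2


def weighted_line(x: Sequence[float], y: Sequence[float], w: Sequence[float]) -> Tuple[float, float]:
    """Weighted least-squares (intercept, slope) of y on x."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return my - slope * mx, slope
```

With this orientation, an intercept below 0 or a slope below 1 signals that predicted mortality exceeds observed mortality.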
Clinically Salient Comparisons of Index Performance
Expected mortality for each surgeon for each model was
calculated as the mean predicted probability of OM based on the
prevalence of risk factors in his or her caseload. We calculated
risk-adjusted OM (RAOM) by dividing the observed mortality by the
expected mortality and then multiplying that ratio by the overall
mortality rate in the study population (0.0226). This result can be
interpreted as the mortality rate a surgeon would have if his or her
case mix were similar to the average case mix in the
study.25
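The RAOM calculation is simple arithmetic. The sketch below applies it to a hypothetical surgeon with 4 deaths among 100 patients whose mean predicted risk is 2%. The observed-to-expected ratio is 0.04/0.02 = 2.0, so RAOM = 2.0 × 0.0226 = 4.52%. The function name is ours.

```python
from typing import Sequence

OVERALL_OM_RATE = 0.0226  # overall mortality rate in the study population (from the text)


def raom(deaths: int, predicted: Sequence[float], overall_rate: float = OVERALL_OM_RATE) -> float:
    """Risk-adjusted OM = (observed rate / expected rate) * overall rate."""
    n = len(predicted)
    observed = deaths / n
    expected = sum(predicted) / n  # mean predicted probability over the caseload
    return observed / expected * overall_rate


# Hypothetical surgeon: 4 deaths in 100 patients, each with predicted risk 0.02
example = raom(4, [0.02] * 100)  # approximately 0.0452
```

The formula also shows the bias reported in the Results: if a model inflates every expected probability, the denominator grows and RAOM falls, even when observed mortality is unchanged.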
Within each model, the mean observed mortality minus the mean expected mortality was evaluated by paired t test against the null hypothesis (H0) that the difference equaled zero.15 Differences in RAOM across the 14 surgeons and 3 models were evaluated by ANOVA.
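A paired t statistic for observed versus expected mortality can be sketched with the standard library. The text does not say whether the pairs were patients or surgeons, so this sketch simply treats each (observed, expected) pair as one unit. The example values are hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple


def paired_t(observed: Sequence[float], expected: Sequence[float]) -> Tuple[float, int]:
    """Paired t statistic for H0: mean(observed - expected) == 0; returns (t, df)."""
    diffs = [o - e for o, e in zip(observed, expected)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)  # sample SD of the differences / sqrt(n)
    return mean_diff / se, n - 1


# Hypothetical pairs of observed and expected mortality rates
t, df = paired_t([0.03, 0.02, 0.04], [0.02, 0.01, 0.02])
```

If SciPy is available, `scipy.stats.ttest_rel(observed, expected)` computes the same statistic and also returns a P value.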
Surgeons were numbered according to their rank in the original CCN model, from 1 (lowest RAOM) to 14 (highest RAOM), and were then ranked by RAOM within each model. We examined qualitatively whether the ranking of surgeons changed and also calculated Spearman rank correlation coefficients (rs) across models.
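Ranking surgeons by RAOM and comparing 2 rankings can be sketched as follows. The sketch assumes no tied RAOM values, which is when the short Spearman formula 1 − 6Σd²/[n(n² − 1)] is exact. Surgeon labels and RAOM values in the example are hypothetical. If surgeons 8 and 9 swap places among 14, then Σd² = 2 and rs = 1 − 12/2730 ≈ 0.996.

```python
from typing import Dict


def rank_by_raom(raom_by_surgeon: Dict[str, float]) -> Dict[str, int]:
    """Rank surgeons from 1 (lowest RAOM) upward; assumes no tied RAOM values."""
    ordered = sorted(raom_by_surgeon, key=lambda s: raom_by_surgeon[s])
    return {surgeon: position for position, surgeon in enumerate(ordered, start=1)}


def spearman_rs(rank_a: Dict[str, int], rank_b: Dict[str, int]) -> float:
    """Spearman rank correlation, 1 - 6*sum(d^2) / (n*(n^2 - 1)); valid without ties."""
    n = len(rank_a)
    d_squared = sum((rank_a[s] - rank_b[s]) ** 2 for s in rank_a)
    return 1.0 - 6.0 * d_squared / (n * (n * n - 1))


# Hypothetical: CCN ranks 1..14, and a second model in which surgeons 8 and 9 swap
ccn = {str(i): i for i in range(1, 15)}
other = dict(ccn)
other["8"], other["9"] = 9, 8
rs = spearman_rs(ccn, other)
```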
Results
The independent predictors of operative mortality for the new internal model were left ventricular grade, age group, previous bypass surgery, the timing of surgery, sex, triple-vessel disease, left main coronary artery disease, peripheral vascular disease, recent myocardial infarction, acute coronary insufficiency, and a history of hypertension. Given its recurrence as a risk factor in other published indexes, we forced preoperative renal insufficiency into the model, but its inclusion unfavorably affected both model discrimination and precision, possibly because of its low prevalence in our database or collinearity with other important predictors.
Table 2 contains the original odds ratios (ORs) for OM and the risk weights for the ready-made CCN model, as well as the ORs, their 95% CIs, and the risk weights from the recalibrated and remodeled indexes. When the CCN and remodeled indexes are compared, all 5 original risk factors recur, but their weights differ, most notably for repeat operation and grade IV ventricular function.
Table 3 shows the number of patients at each risk score for each model and the observed OM at that score. The addition of 4 explanatory variables to the remodeled index redefined "risk" in 87% of patients who had at least 1 of the 4 conditions and resulted in a lower prevalence of both patients and OM at the lower risk scores.
Statistical Parameters of Index Performance
As shown in Table 4, the longer, remodeled index showed a small increment in accuracy over the original CCN index (ROC, 0.78 versus 0.76; P<0.05), with the recalibrated index between them (ROC, 0.77). The ready-made CCN model showed a significant lack of fit between predicted and observed results (P<0.001), whereas the other models had acceptable calibration.
Another method of assessing model fit is shown in Figure 1, which depicts the mean predicted probability of OM versus observed OM for each total risk score. The original CCN model overestimated OM (top panel). In contrast, for the recalibrated and new models, slopes appeared closer to unity and intercepts closer to zero (data available on request). ANCOVA confirmed a significant difference in intercepts among models. Pairwise comparisons showed that, as with the ROC curve area, the significant difference arose only when the original CCN model and the newly remodeled index were compared (P<0.0001).
OM: Observed, Expected, and Adjusted
We compared observed and expected mortality for each model. The observed minus expected mortality rate differed significantly from zero for the CCN model (-1.04±0.4%; P=0.011) but not for the recalibrated (0.17±0.3%; P=0.62) or remodeled (0.21±0.3%; P=0.55) indexes. Intermodel comparisons by ANOVA confirmed that the CCN model had a significantly greater disparity between observed and predicted results than both the recalibrated (P=0.017 versus CCN) and remodeled (P=0.014 versus CCN) indexes.
As shown in Figure 2, this higher expected OM in the ready-made CCN model resulted in an underestimation of risk-adjusted OM (P=0.048 compared with the remodeled index). RAOM was similar to unadjusted OM for the recalibrated and remodeled indexes.
Surgeon-Specific Rankings
Table 5 depicts the relative ranking of surgeons for each of the 3 models, from lowest RAOM (rank=1) to highest (rank=14). The only change with recalibration was that surgeons 8 and 9 exchanged positions. Remodeling, however, moved surgeon 13 up by 3 positions and moved surgeons 10 and 11 down 2 ranks each. Despite these latter shifts, the overall Spearman correlation coefficient showed a significant association between ranks for the CCN and new models (rs=0.982, P=0.012), because the positions of the first 7 surgeons were stable across models. Among the 7 surgeons with the highest RAOM (ranks 8 to 14), however, the correlation was weaker (rs=0.857, P=0.07).
Discussion
Our rationale was to offer some guidance to providers who must respond to the burgeoning literature on risk indexes. We accordingly discuss the implications of our findings in 3 areas.
Implications for Benchmarking Improvements Over Time
The original CCN model was derived and validated in earlier years. As such, it tended to overestimate the chances of postoperative death for high-risk patients, with the result that risk-adjusted outcomes improved. Experience with the Society of Thoracic Surgeons5 26 risk-adjustment algorithm has been similar. Its original 23-variable model was derived with Bayesian methods from data on tens of thousands of subjects at scores of centers. Nonetheless, in recent years the model has predicted a rising probability of operative mortality owing to an increasing prevalence of high-risk patients, even as observed operative mortality has decreased.
If the goal of an outcomes analysis is to determine trends in mortality over time, then arguably a risk model derived and validated from an earlier period can be used, because it anchors practice historically and controls for the evolution of case mix. One limitation is that improved reportage of risk factors ("upcoding") in contemporary groups of patients may lead to a spurious impression that risk-adjusted outcomes are improving. For example, critics have charged that this phenomenon, rather than the impact of public report cards, explains the improved outcomes of cardiac surgery in New York State.27
Implications for Contemporaneous Quality Management or Patient Counseling
Setting aside temporal benchmarking, the usual goals of an outcomes analysis are contemporary quality management, which compares the risk-adjusted outcomes of surgeons or centers, and risk prediction, which identifies patients in high-risk subgroups. For these purposes, our findings suggest that practitioners and managers should consider recalibrating an existing index or developing a new model with the data at hand.
Recall first that our ready-made CCN index was originally derived and validated in Ontario with data from 9 centers, including the 2 hospitals that contributed subsequent patients to the present study. Thus, it is perhaps not surprising that the CCN index still showed good performance in this study, with an ROC area of 0.76. However, the CCN index showed poor calibration associated with overestimation of expected operative mortality, a feature of model performance that is undesirable for both prospective risk prediction and post hoc risk adjustment. Poor calibration presumably occurred not only because of temporal shifts in case mix but also because the CCN index was developed in a data set that combined ischemic and valvular heart disease, and we were applying it to an isolated CABG series. Parsonnet et al,11 for example, observed a deterioration in model performance when their predictive rule, developed in a data set that combined CABG with valve surgery, was subsequently tested for CABG and valve procedures separately. Similarly, the CCN index was derived to cover both OM and length-of-stay outcomes, and the recalibration here was outcome specific.
Recalibration was therefore a promising strategy in this context. More generally, it allows a group of practitioners to remain efficient in data collection, restricting their efforts to careful documentation of a limited number of prespecified variables. By reweighting these variables and fine-tuning the risk index, analysts may sometimes mitigate shifts in case mix and outcomes that occur either over time or as the index is applied to centers other than those from which it was derived.
Indeed, recalibration did lead to some improvements in model performance in this test case. Whereas the original CCN index showed a significant lack of fit with data from this new series of almost 7500 patients, the recalibrated index fit the data well, and we avoided overestimating operative risk. The recalibrated model also showed discrimination similar to that of a new and more complex model and yielded similar relative ranks for most surgeons. However, for the surgeons with higher RAOM, the new model did alter outcome rankings. This observation underscores the potential practical importance of improvements in model accuracy that are small from a statistical standpoint.
In sum, for clinicians and managers who have developed their own index in the past or found an index that shows acceptable performance in their patient populations, episodic recalibration of that index may suffice. However, in those instances in which there are profound differences in case mix or event rates, it will be prudent to derive a new model with the data at hand.
How Many Risk Factors Are Enough?
Recently, the Society of Thoracic Surgeons published its updated risk model for 1995,14 developed from a database of >138 000 patients who underwent surgery at 374 hospitals throughout the United States and Canada. The model shows excellent accuracy but now requires 33 predictor variables. Apart from the increased costs of data collection and the increased risk of data "gaming" or random errors, a large number of explanatory variables also increases the chances of statistical overfitting and of model instability when the model is applied to specific centers.
In contrast, the original CCN model was designed to be parsimonious and robust for multicenter comparisons.1 The new model adds only 4 variables, bringing the total to 9 for isolated CABG. These factors are similar to those reported previously by our group17 and others1 2 3 4 5 6 7 8 9 10 11 12 13 and include those highlighted in recent guidelines from the Working Group Panel on the Cooperative CABG Database Project.28 Despite minor differences in surgeon rankings, this new model had performance characteristics similar to those of the recalibrated CCN model with only 5 variables. This latter result is consistent with our earlier findings on the limited marginal improvements in model performance with increasing numbers of predictor variables.25 Accurate and complete data collection on a constrained set of important variables appears to be a prudent strategy.
Conclusions
Our findings illustrate that temporal and intercenter differences
in case mix make it difficult to achieve optimal predictive
performance with ready-made risk indexes. This observation
argues against the proliferation of published risk indexes in the
clinical literature that either affirm well-known prognostic factors or
add new variables with minimal marginal impact. We have also
demonstrated that recalibration of existing indexes may sometimes be
sufficient to ensure adequate risk prediction even when models are
parsimonious. As a precaution, however, we suggest that centers collect
data fastidiously on a modest-sized set of key variables such as
those suggested by the Working Group Panel28 and undertake
intermittent remodeling to ensure that emerging risk factors are not
inadvertently overlooked.
Footnotes
Received July 31, 1998; revision received January 22, 1999; accepted January 26, 1999.