Generic, Simple Risk Stratification Model for Heart Valve Surgery
Background— Heart valve surgery has an associated in-hospital mortality rate of 4% to 8%. This study aims to develop a simple risk model to predict the risk of in-hospital mortality for patients undergoing heart valve surgery to provide information to patients and clinicians and to facilitate institutional comparisons.
Methods and Results— Data on 32 839 patients were obtained from the Society of Cardiothoracic Surgeons of Great Britain and Ireland on patients who underwent heart valve surgery between April 1995 and March 2003. Data from the first 5 years (n=16 679) were used to develop the model; its performance was evaluated on the remaining data (n=16 160). The risk model presented here is based on the combined data. The overall in-hospital mortality was 6.4%. The risk model included, in order of importance (all P<0.01), operative priority, age, renal failure, operation sequence, ejection fraction, concomitant tricuspid valve surgery, type of valve operation, concomitant CABG surgery, body mass index, preoperative arrhythmias, diabetes, gender, and hypertension. The risk model exhibited good predictive ability (Hosmer-Lemeshow test, P=0.78) and discriminated between high- and low-risk patients reasonably well (receiver-operating characteristics curve area, 0.77).
Conclusions— This is the first risk model that predicts in-hospital mortality for aortic and/or mitral heart valve patients with or without concomitant CABG. Based on a large national database of heart valve patients, this model has been evaluated successfully on patients who had valve surgery during a subsequent time period. It is simple to use, includes routinely collected variables, and provides a useful tool for patient advice and institutional comparisons.
Received October 19, 2004; revision received February 25, 2005; accepted March 28, 2005.
The profile of cardiac surgical practice is changing. The number of patients undergoing CABG surgery is static or falling in the United Kingdom, whereas the proportion and number of patients requiring surgery for valvular heart disease are increasing.1 Approximately 275 000 heart valve operations are carried out annually worldwide,2 with >9000 carried out in the United Kingdom alone.1 Heart valve surgery has an associated short-term mortality of 4% to 8%, which is at least twice that of CABG surgery in the United Kingdom, United States, and Europe.1,3–6 The growth in valve surgery, higher operative mortality, and increased public, political, and professional interest in comparative outcome measures have encouraged us to explore a model based on UK data.
Over the last 2 decades, several risk models have been proposed to predict the risk of short-term mortality after cardiac surgery on the basis of the patients’ preoperative characteristics.7–10 However, most of these models have been developed for CABG surgery. Although some studies have investigated potential predictors of short-term mortality after heart valve surgery,3,11–13 there are few risk models specifically for heart valve patients.4,6,14 We propose a simple, generic risk model that may be used to predict in-hospital mortality for patients undergoing heart valve surgery with or without concomitant CABG surgery. The aims of the risk model are to provide information to clinicians and patients about the risk of in-hospital mortality after surgery and to facilitate a fairer comparison of institutional performance.
Before starting the modeling process, we prepared a protocol that outlined the various steps to be undertaken in developing and validating the model. The protocol specified the clinical aims of the risk model, a list of potential predictors for in-hospital mortality, exclusion criteria for patients, and all statistical methods to be used.
All patients in the database of the Society of Cardiothoracic Surgeons of Great Britain and Ireland (SCTS) who underwent aortic and/or mitral heart valve surgery, both repair and replacement, from April 1995 to March 2003 were considered for inclusion. Because valvular disorders more commonly affect the aortic and mitral valves,15 we did not consider patients who had only pulmonary and/or tricuspid valve surgery to avoid computational problems associated with small numbers. Salvage patients also were not considered. The clinical outcome considered was in-hospital mortality, defined as the patient’s status at discharge after the operation.
The procedures used to collect these data are described in SCTS reports.1 Briefly, these data are supplied by participating centers and are subject to internal data consistency checks. Additionally, 10% of case notes for patients undergoing coronary surgery were subject to independent scrutiny.16 This scrutiny revealed accurate reporting of in-hospital deaths and good correlation between case note risk factors and those recorded in the database, with both completeness and accuracy improving over time. More recently, the vital status of 16 000 patients undergoing valve and/or coronary surgery was obtained from the Office of National Statistics. We found that the outcome status (dead or alive) of only 2 patients was coded inaccurately.
The following preoperative patient characteristics were chosen in advance as candidate predictors from clinical knowledge and previous research3,4,11–13: age, gender, body mass index (BMI), number and position of implanted heart valves, hypertension (no/yes; treated or blood pressure >140/90 mm Hg), diabetes (no/yes; managed by diet, oral therapy, or insulin), renal failure (none or functioning transplant/creatinine >200 μmol/L/dialysis dependency), respiratory disease (no/chronic obstructive pulmonary disease), ejection fraction (<30%/30% to 50%/>50%), arrhythmias (no/atrial fibrillation or heart block/ventricular tachycardia or fibrillation), active endocarditis (no/yes), operative priority (elective/urgent/emergency), operation sequence (previous sternotomy; first/second/third or more), concomitant CABG surgery (no/yes), concomitant tricuspid surgery (no/yes), valve regurgitation or stenosis (no/yes), left ventricular end-diastolic pressure and pulmonary artery wedge pressure, and aortic valve gradient.
The categorizations used for ejection fraction, creatinine level, and blood pressure were based on the coding used by the SCTS.
We split the data into development (training) and validation (test) data sets. The development data included all operations within the first 5 years; the validation data included the rest. To ensure reliability of data, we excluded patients who had missing information on key predictors: age, gender, operation sequence, and number and position of implanted heart valves. In addition, patients were excluded from the development data if they were missing information on >3 of the remaining predictors. Any predictor recorded for <50% of patients in the development data was not included in the modeling process, resulting in the exclusion of left ventricular end-diastolic pressure, pulmonary artery wedge pressure, aortic valve gradient, and active endocarditis. Patients were excluded from the validation data if they had missing information on any of the predictors in the risk model.
We initially developed a single risk model for both aortic and mitral valve surgery because it would be easier to apply in practice. To investigate whether the importance of particular predictors varied by type of operation, we compared this model with separate valve-specific risk models. The latter also included the valve-specific factors of regurgitation and stenosis.
To investigate whether exclusions of patients as a result of missing data had introduced any bias, we compared the key preoperative characteristics of patients excluded from the study with those included. Any remaining missing predictor values in the development data were imputed by use of multiple imputation techniques.17,18 Five different imputed data sets were created.
We developed the risk model using a logistic regression model fitted with generalized estimating equations (GEE) methodology19 to adjust for clustering of patients within institutions.20 An exchangeable correlation structure was used to apply the GEE methodology. This methodology assumes that pairwise correlations between patients within the same institution are equal, whereas patients from different institutions are independent. The logistic model was fitted to each of the 5 imputed data sets, and the 5 sets of results were combined to give overall regression coefficients and confidence intervals.17
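The combination step can be illustrated with Rubin's rules, a standard way of pooling estimates across imputed data sets. The following is a minimal sketch that assumes this is the combination method intended; the function name and the example coefficients are hypothetical and are not taken from the study.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Pool one regression coefficient across m imputed data sets (Rubin's rules)."""
    m = len(estimates)
    pooled = mean(estimates)                      # pooled point estimate
    within = mean(se ** 2 for se in std_errors)   # mean within-imputation variance
    between = variance(estimates)                 # between-imputation variance
    total = within + (1 + 1 / m) * between        # total variance
    return pooled, math.sqrt(total)


# Hypothetical log odds ratios for one predictor from 5 imputed data sets.
coef, se = pool_rubin([0.50, 0.54, 0.47, 0.52, 0.49], [0.10] * 5)
```

The pooled standard error is larger than any single-imputation standard error because it also reflects the uncertainty caused by the missing values. A confidence interval would then be built from the pooled estimate and standard error, usually with a t reference distribution.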
We selected the predictors for the risk model using a backwards elimination strategy21 with a statistical significance level of 5%. We included year of surgery in the model to adjust for changes that may occur in in-hospital mortality over the 5-year period in the development data. Fractional polynomials were used to explore presence of nonlinear relationships of the continuous predictors of age, BMI, and year to outcome and to suggest possible categorization of these predictors.22
We converted the regression coefficients to integer scores to make the risk model easier to use in practice. The scale factor required for this transformation was found via a grid search so that as much as possible of the predictive accuracy of the original logistic model was retained. We assessed this by calculating the correlation between the predicted log odds from the original and simplified models (risk score). We performed the goodness of fit and validation exercises using the risk scores, with the year effect adjusted for the latest year in the development data (2000).
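The grid search can be sketched as follows. This is an illustration only: the coefficients, the patient patterns, and the candidate grid are hypothetical, and the selection rule (highest correlation, smallest scale factor on ties) is an assumption about how such a search might be run.

```python
import math
from itertools import product


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def linear_predictor(weights, rows):
    """Weighted sum of predictor values for each patient."""
    return [sum(w * x for w, x in zip(weights, row)) for row in rows]


def best_scale(coefs, rows, candidates):
    """Return (scale, correlation, integer scores) maximising agreement
    between the original log odds and the integer-score version."""
    original = linear_predictor(coefs, rows)
    best = None
    for scale in candidates:
        scores = [round(b * scale) for b in coefs]
        r = pearson(original, linear_predictor(scores, rows))
        if best is None or r > best[1]:
            best = (scale, r, scores)
    return best


coefs = [0.35, 0.72, 1.10]                  # hypothetical log odds ratios
rows = list(product([0, 1], repeat=3))      # hypothetical binary risk profiles
scale, r, scores = best_scale(coefs, rows, [k / 2 for k in range(2, 11)])
```

Bounding the candidate grid keeps the integer scores small, which is what makes the resulting score practical to add up by hand.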
The goodness of fit of the model in the development data was assessed with the Hosmer-Lemeshow (H-L) test.23 This assesses the agreement between observed and predicted mortality typically within 10 equal-sized groups (deciles) based on the predicted risk of mortality. We plotted the observed and predicted log odds in these groups to ascertain whether the risk model predicted mortality correctly.
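A minimal sketch of the H-L calculation is given below. It assumes the conventional reference distribution of chi-square with (groups − 2) degrees of freedom and that every predicted risk lies strictly between 0 and 1; the function names are hypothetical, and the closed-form tail probability used here is valid only for an even number of degrees of freedom.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper tail probability of a chi-square variable (even df only)."""
    if df % 2:
        raise ValueError("closed form requires an even number of degrees of freedom")
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))


def hosmer_lemeshow(outcomes, risks, groups=10):
    """Return (statistic, df, p value) comparing observed and expected deaths
    within equal-sized groups ordered by predicted risk."""
    order = sorted(range(len(risks)), key=lambda i: risks[i])
    n = len(order)
    stat = 0.0
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        size = len(idx)
        observed = sum(outcomes[i] for i in idx)
        expected = sum(risks[i] for i in idx)
        stat += (observed - expected) ** 2 / (expected * (1 - expected / size))
    df = groups - 2
    return stat, df, chi2_sf_even_df(stat, df)


# Hypothetical data: 100 patients, predicted risk 10%, one death per group of 10.
risks = [0.1] * 100
outcomes = [1 if i % 10 == 0 else 0 for i in range(100)]
stat, df, p_value = hosmer_lemeshow(outcomes, risks)
```

A large P value, as in this contrived example, indicates no evidence of disagreement between observed and predicted mortality; a small P value indicates poor calibration.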
We adopted an external validation approach21 to evaluate the performance of the model and used the H-L test to assess the agreement between the observed and predicted mortality in the validation data. We used clinically relevant risk groups based on predicted mortality defined by the cut points of 2.5%, 5%, 10%, and 20% used in other studies.3,24 Accurate predictions of mortality within each of these risk groups would suggest that the risk model is suitable for use for patient advice for all (low- to high-risk) patients.
We also investigated the agreement between the total observed and predicted mortality within each institution in the validation data. For each institution, we predicted the number of deaths from the risk model, and using the binomial distribution, we constructed an interval within which we would expect the observed number of deaths to lie 95% of the time. We then considered whether the observed number of deaths lay within this interval.
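The interval construction can be sketched as below. The text does not state how the binomial distribution was parameterized, so this sketch assumes a Binomial(n, p) model in which n is the number of patients at the institution and p is their mean predicted risk; the function name and the example figures are hypothetical.

```python
def binomial_interval(n, p, level=0.95):
    """Central interval (lo, hi) for the number of deaths under Binomial(n, p)."""
    tail = (1 - level) / 2
    pmf = (1 - p) ** n            # P(0 deaths)
    cumulative = 0.0
    lo, hi = None, n
    for k in range(n + 1):
        cumulative += pmf
        if lo is None and cumulative >= tail:
            lo = k                # smallest k with P(X <= k) >= tail
        if cumulative >= 1 - tail:
            hi = k                # smallest k with P(X <= k) >= 1 - tail
            break
        # Recurrence: P(k + 1) = P(k) * (n - k) / (k + 1) * p / (1 - p)
        pmf *= (n - k) / (k + 1) * p / (1 - p)
    return lo, hi


# Hypothetical institution: 500 patients with a mean predicted risk of 6%.
lo, hi = binomial_interval(500, 0.06)
observed_deaths = 33
within_interval = lo <= observed_deaths <= hi
```

Because patients within an institution have different predicted risks, a single common p is a simplification; it gives an interval that is, if anything, slightly wider than one based on the individual risks.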
The receiver-operating characteristics (ROC) curve area was used as a measure of how well the model discriminated between patients with high and low risks of mortality.25 A reasonably high ROC area suggests that the model may be used to rank patients into treatment groups to facilitate treatment management.
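The ROC area has a direct interpretation: it is the probability that a randomly chosen patient who died was assigned a higher predicted risk than a randomly chosen survivor. The sketch below computes it that way, counting ties as one half; the function name and the 4-patient example are hypothetical, and the pairwise loop is written for clarity rather than speed.

```python
def roc_area(outcomes, risks):
    """Area under the ROC curve via pairwise comparison of deaths and survivors."""
    deaths = [r for y, r in zip(outcomes, risks) if y == 1]
    survivors = [r for y, r in zip(outcomes, risks) if y == 0]
    wins = 0.0
    for d in deaths:
        for s in survivors:
            if d > s:
                wins += 1.0       # death ranked above survivor
            elif d == s:
                wins += 0.5       # tie counts as half
    return wins / (len(deaths) * len(survivors))


# Hypothetical example: 2 survivors and 2 deaths.
area = roc_area([0, 0, 1, 1], [0.10, 0.40, 0.35, 0.80])  # 3 of 4 pairs ordered correctly
```

A value of 0.5 corresponds to no discrimination and 1.0 to perfect discrimination.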
The final step, after a successful validation exercise, was to combine the development and validation data, refit the logistic risk model using all the data, and derive the final risk score.21 The risk score is presented for the latest year in the combined data (2003); no adjustment should be necessary for patients who had surgery at subsequent time periods.
All statistical analyses were carried out with the STATA statistical software.26
Initially, our data set consisted of 42 052 patients. However, 2724 patients were excluded from the development data because they were missing ≥1 key predictor, with another 1476 patients excluded because they had >3 predictors missing. In total, 5013 patients were excluded from the validation data because they were missing information on predictors in the proposed risk model. The final study sample comprised 32 839 patients from 30 institutions. The excluded patients had an overall in-hospital mortality of 6.9% and a mean age of 65.5 years (SD, 12.5 years), 40.7% were female, and 63.9% and 29.6% had isolated aortic and isolated mitral valve operations, respectively. The included patients had 6.4% mortality and a mean age of 64.8 years (SD, 12.3 years), 41.6% were female, and 64.4% and 29.4% had isolated aortic and isolated mitral valve operations. These values suggest that our exclusion criteria have not caused any important clinical bias.
The proportion of patients by valve operation and the associated mortality are presented in Table 1. A total of 2089 deaths (6.4%) were observed. Aortic valve surgery was more common and was associated with a lower crude mortality than mitral valve surgery. About 2% of the patients had a concomitant tricuspid valve procedure at the time of aortic and/or mitral valve surgery, and a very small percentage (0.5%) of patients had concomitant pulmonary valve surgery. The development and validation data have a similar breakdown by types of valve surgery, but there is lower overall mortality in the validation data. This could be due partly to improvements in surgical procedure and postoperative care over the time period studied.
The preoperative characteristics (predictors) of the patients are summarized in Table 2. The patients in the 2 data sets are similar, although hypertension and high BMI are more prevalent in the validation data and high creatinine levels are more prevalent in the development data. Table 2 also shows the percentage missing for each predictor. The 3 predictors that required the most imputation in the development data were arrhythmias (32.3% missing) and renal (14.1%) and respiratory (11.8%) disease.
Respiratory disease was the only predictor dropped by the backward elimination algorithm (P=0.35). The odds ratios (ORs) for the remaining predictors are shown in Table 3. Rounding the regression coefficients to integer scores resulted in little loss of predictive accuracy; the correlation between the predicted log odds before and after the simplification exceeded 0.99. However, the H-L test suggested that the risk model did not fit the development data well (P<0.001; Figure 1a); it overpredicted risk of mortality for the low-risk patients and underpredicted mortality for the higher-risk group. Therefore, the model was recalibrated by considering possible transformations for the predicted log odds; an exponential function was found to be appropriate. This recalibration greatly improved the fit of the model (H-L test, P=0.59; Figure 1b).
A more stringent validation exercise was performed by examining the ability of the model to make accurate predictions for a subsequent group of patients not used for its development. The risk model predicted mortality in the validation data accurately (Table 4), despite the lower observed mortality (Table 1). Additionally, the model predicted mortality well within each clinical risk group (H-L test, P=0.78; Table 4).
A plot of the total observed and predicted mortality is shown for the institutions in Figure 2, with 95% intervals placed around the predicted values. There are fewer institutions because the risk model could not be applied to 5 of the institutions: One did not contribute validation data and 4 had missing information on arrhythmias. The plot shows that observed mortality lies within the range predicted by the model for 18 of the 25 institutions. This type of plot may be used to identify institutions with unexpectedly low or high mortality, perhaps caused by unmeasured patient or institutional characteristics.
The area under the ROC curve was 0.77 (95% CI, 0.76 to 0.79), which suggests that the risk model has reasonable discriminatory ability and may be used to stratify patients into risk groups for treatment management.
To examine whether a single risk model is appropriate for patients undergoing different types of valve operations listed in Table 1, we examined the performance of the model for patients divided into subgroups according to their operation type. The aortic and mitral patients, with and without CABG, were combined to avoid problems with small numbers. We used the same clinical risk groups as before but combined the 2 higher-risk groups to avoid problems with small numbers. Figure 3 demonstrates good agreement between the observed and predicted mortality for patients for all operation types, suggesting that a single risk model is appropriate. The H-L test results and ROC areas (Table 5) also suggest good calibration and reasonable discrimination.
We also investigated whether the model was suitable for both patients who had had previous cardiac surgery and those who had not. The plots of observed and predicted mortality (Figure 4) and the corresponding H-L test values (Table 5) suggest that the model works very well for both groups of patients.
We then refitted the risk model after combining the development and validation data, derived risk scores (Table 6), and recalibrated the risk model. The highest risk scores were associated with emergency priority, followed by age >79 years, renal failure with dialysis, and ≥2 previous cardiac operations.
The total risk score for a patient can be related to the probability of death through the use of a lookup table (Table 7) or the following relationships: log odds=1.36−1.75×exp(1.45−0.0716×S), and risk of in-hospital death (%)=100/[1+exp(−log odds)], where S is the sum of the risk scores for an individual patient.
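For readers without access to the lookup table, the two relationships above can be evaluated directly. The sketch below transcribes them as written; only the function name is an invention, and the individual scores that make up S still have to be taken from Table 6.

```python
import math


def risk_of_death_percent(total_score):
    """Predicted risk of in-hospital death (%) for a total risk score S."""
    log_odds = 1.36 - 1.75 * math.exp(1.45 - 0.0716 * total_score)
    return 100 / (1 + math.exp(-log_odds))


# Predicted risk for a range of illustrative total scores.
risks = {s: round(risk_of_death_percent(s), 1) for s in (0, 5, 10, 15, 20)}
```

The predicted risk rises steadily with the total score, from well under 1% at a score of 0 to roughly 40% at a score of 20.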
Investigation of Additional Predictors
Active endocarditis has been observed to be an important risk factor of mortality in some studies.4,11 Following our protocol, we excluded active endocarditis because information on this factor was missing for >50% of patients. We did investigate, however, whether it had important prognostic value by developing a risk model using just those patients for whom it had been measured (n=9257). The predictor was statistically significant (P<0.001), but the OR of 1.73 (95% CI, 1.45 to 2.07) does not classify active endocarditis as one of the stronger predictors (see Table 3). Furthermore, this risk model does not provide improved performance in the validation data as assessed by the ROC area and H-L test.
We also investigated whether the presence of stenosis or regurgitation is associated with mortality using valve-specific models. We developed 2 separate models for isolated aortic and isolated mitral valve patients for whom regurgitation and stenosis information was available. The predictor representing stenosis and/or regurgitation was not statistically significant for either aortic patients (P=0.15) or mitral patients (P=0.48). These results agree with the findings of other authors.6 Additionally, the valve-specific models did not offer any improvement in performance compared with the single model.
Heart valve surgery is the second-most-common type of cardiac surgery, accounting for 20% to 35% of all cardiac surgical procedures, with an in-hospital mortality of 4% to 8%.1,3–6 Although several studies have investigated potential predictors of short-term mortality after heart valve surgery,3,11–13 there are few risk models specifically for heart valve patients that present all the information necessary for use in health institutions.4,6,14
Nowicki and colleagues6 used 8943 patients from 8 New England medical centers to derive separate risk models for in-hospital mortality after isolated aortic valve and isolated mitral valve surgery and provided simplified risk scores complete with a lookup table. These models cannot be applied to patients undergoing multiple valve surgery. The models were validated on data used for their development; thus, it is unclear how well the risk models will perform in independent data or how accurate the simplified risk scores are compared with the original logistic models.
Edwards and colleagues4 derived 2 risk models, one for isolated valve surgery and one for valve surgery plus CABG. Their outcome included in-hospital mortality and 30-day mortality if the patient had been discharged. Single imputation methods were used to handle missing data. This risk model is not in the public domain because the authors do not present all the information necessary to calculate the risk of death for a patient. Again, these models cannot be applied to patients undergoing multiple valve surgery.
Florath and colleagues,14 using 1400 patients from one institution in Germany, have developed a risk model for 30-day mortality after aortic valve surgery. The model also allows for concomitant CABG and mitral valve surgery. However, the risk model was developed on a small data set, which may lead to overfitting.21 Because the authors performed internal validation only, it is unclear how well the risk model will perform in independent data.
The aim of our work was to develop a single, simple risk model for heart valve patients that can be used to predict in-hospital mortality and facilitate better informed consent and fairer comparisons of institutional performance. We used a large national database consisting of 30 institutions (SCTS) and 32 839 patients to develop and validate the risk model.
We took into account the hierarchical nature of the data—ie, patients clustered within institutions—by using logistic models based on GEE. The results proved to be somewhat different from those obtained with ordinary logistic regression, indicating that the correlation between patients within an institution should not be ignored in the modeling process. Shahian and colleagues20 have previously suggested that risk modeling should incorporate the appropriate hierarchical nature to obtain correct statistical inferences.
Missing data were handled by use of multiple imputation techniques.17,18 This method allows the use of far more data than would be permitted through a complete case analysis and is superior to single imputation techniques.18
Our single risk model can be applied to patients undergoing aortic valve surgery, mitral valve surgery, or both, with or without concomitant CABG surgery. A single, generic model for all valves is arguably more useful because many institutions perform relatively small numbers of valve procedures. The validation results suggest that our single model is suitable for different types of operation.
Our proposed risk model identified age, BMI, gender, site of valve surgery, concomitant CABG and tricuspid valve surgery, renal failure, diabetes, hypertension, poor ejection fraction, arrhythmias, number of previous cardiac operations, and priority of surgery as important predictors of in-hospital mortality. The strongest predictors were operative priority, followed by age, renal failure, and operation sequence. These findings are supported by previous studies.3,4,6,11–13
Our risk model contains relatively few preoperative predictors (13), all of which are routinely collected in most institutions in a reliable manner. Including more predictors may limit the applicability of a risk model, particularly in institutions in which resources to collect such data are limited. Some approaches could be adopted even if these predictors were missing for some patients. One would be to calculate a range containing the minimum and maximum risks for a patient by assuming the 2 extreme possibilities for a missing predictor value. For example, if a patient is missing information on priority, we could obtain lower and higher risk estimates by assuming “elective” and “emergency,” respectively. If we simply assume that the missing value is the most popular category, we will typically underestimate the risk.
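The range approach for a missing predictor can be sketched as follows. The priority scores used here are hypothetical placeholders, not the values in Table 6, and the function names are inventions for illustration; the risk relationship is the one given in the Results.

```python
import math

# Hypothetical scores for operative priority; the actual values are in Table 6.
PRIORITY_SCORES = {"elective": 0, "urgent": 2, "emergency": 6}


def risk_percent(total_score):
    """Predicted risk of in-hospital death (%) for a total risk score."""
    log_odds = 1.36 - 1.75 * math.exp(1.45 - 0.0716 * total_score)
    return 100 / (1 + math.exp(-log_odds))


def risk_range(known_score, candidate_scores):
    """Lowest and highest predicted risk over the possible values of a
    missing predictor, given the score from the known predictors."""
    totals = [known_score + s for s in candidate_scores]
    return risk_percent(min(totals)), risk_percent(max(totals))


# Patient with a score of 8 from known predictors and priority not recorded.
low, high = risk_range(8, PRIORITY_SCORES.values())
```

Reporting both ends of the range makes the uncertainty explicit, rather than hiding it behind a single figure based on the most common category.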
Active endocarditis was found to have a statistically significant association with in-hospital mortality. It was excluded from the model because this predictor is collected in <50% of institutions in the United Kingdom. Inclusion of this predictor would immediately limit wider application of the model. Furthermore, active infective endocarditis is a relatively uncommon pathology, responsible for only 4% of valve operations.5 Some elements of the increased risk are likely to be picked up by the weightings for surgical priority, arrhythmias, and renal failure. Our investigation with this predictor suggests that, despite its statistical significance, it may not provide much improvement in model performance.
The data consisted of operations performed over an 8-year period. Consequently, we adjusted for year of operation in our model to allow for possible differences in in-hospital mortality resulting from changes in surgical procedures and patient care over time. This is an approach taken by others.6 The risk score and lookup table (Tables 6 and 7) are presented for the latest year in our data (2003) and should be appropriate for operations performed after this date, as suggested by our validation exercise. However, we suggest that all proposed risk models have a “use-by date” and perhaps be updated every few years, depending on clinical issues and statistical performance.
We have used data from the Great Britain and Ireland national cardiac surgical database, which is the best source of information on predictors for heart valve patients from the United Kingdom. However, the database was not designed specifically for the derivation of heart valve risk models, and the data were supplied voluntarily by institutions. Consequently, we are missing information on a few valve-specific predictors. We had to rely on the internal data collecting procedures used by the individual institutions; however, by excluding patients with missing values for key predictors and those with several predictors missing, we have attempted to provide some quality control for the data.
We performed temporal validation to assess the performance of the risk model, developing the risk model on early data and validating it on later data. This type of validation exercise, described as external by Harrell,21 is more stringent than randomly splitting the data into development and validation data sets.21 However, a more stringent test is to perform external validation using completely new data from other institutions to further assess the generalizability of the proposed model. We encourage other researchers to do this.
The risk model proposed here provides a simple, useful tool for risk stratification for most patients undergoing valve surgery.
This work was supported by a grant from the Garfield Weston Trust.
Keogh B, Kinsman R. Fifth National Adult Cardiac Surgical Database Report. Society of Cardiothoracic Surgeons of Great Britain and Ireland; 2003. Available at: http://www.scts.org. Accessed June 21, 2005.
Nowicki ER, Birkmeyer NJO, Weintraub RW, Leavitt BJ, Sanders JH, Dacey LJ, Clough RA, Quinn RD, Charlesworth DC, Sisto DA, Uhlig PN, Olmstead EM, O’Connor GT. Multivariable prediction of in-hospital mortality associated with aortic and mitral valve surgery in northern New England. Ann Thorac Surg. 2004; 77: 1966–1977.
Nashef SAM, Roques F, Michel P, Gauducheau E, Lemeshow S, Salamon R, for the EuroSCORE Study Group. European system for cardiac operative risk evaluation (EuroSCORE). Eur J Cardiothorac Surg. 1999; 16: 9–13.
Eagle KA, Guyton RA, Davidoff R, Edwards FH, Ewy GA, Gardner TJ, Hart JC, Herrmann HC, Hillis LD, Hutter AM Jr, Lytle BW, Marlow RA, Nugent WC, Orszulak TA. ACC/AHA 2004 guideline update for coronary artery bypass graft surgery: a report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines (Committee to Update the 1999 Guidelines for Coronary Artery Bypass Graft Surgery). Circulation. 2004; 110: e340–e438. Available at: http://circ.ahajournals.org/cgi/reprint/110/14/e340. Accessed June 10, 2005.
Potapov EV, Loebe M, Anker S, Stein J, Bondy S, Nasseri BA, Sodian R, Hausmann H, Hetzer R. Impact of body mass index on outcome in patients after coronary artery bypass grafting with and without valve surgery. Eur Heart J. 2003; 24: 1933–1941.
Bender J. Heart valve disease. In: Zaret BL, Moser M, Cohen LS, eds. Yale University School of Medicine Heart Book. New York, NY: Harper Collins; 2002: chap 13.
Fine L, Keogh B, Cretin S, Orlando M, Gould M. How to evaluate and improve the quality and credibility of an outcomes database: validation and feedback study on the UK Cardiac Surgery Experience. BMJ. 2003; 326: 25–28.
Hu FB, Goldberg J, Hedeker D, Flay BR, Pentz MA. Comparison of population-averaged and subject-specific approaches for analyzing repeated binary outcomes. Am J Epidemiol. 1998; 147: 694–703.
Harrell FE. Regression Modeling Strategies. New York, NY: Springer-Verlag; 2001.
Royston P, Ambler G, Sauerbrei W. The use of fractional polynomials to model continuous risk factors in epidemiology. Int J Epidemiol. 1999; 28: 964–974.
Hosmer DW, Lemeshow S. Applied Logistic Regression. New York, NY: John Wiley and Sons; 2000.
Al-Ruzzeh S, Asimakopoulos G, Ambler G, Omar R, Hasan R, Fabri B, El-Gamel A, DeSouza A, Zamvar V, Griffin S, Keenan D, Trivedi U, Pullan M, Cale A, Cowen M, Taylor K, Amrani M. Validation of four different risk stratification systems in patients undergoing off-pump coronary bypass graft surgery. Heart. 2003; 89: 432–435.
StataCorp. Stata Statistical Software, release 8. College Station, Tex: StataCorp; 2003.