Agreement Is Poor Among Current Criteria Used to Define Response to Cardiac Resynchronization Therapy
Background— Numerous criteria believed to define a positive response to cardiac resynchronization therapy have been used in the literature. No study has investigated agreement among these response criteria. We hypothesized that the agreement among the various response criteria would be poor.
Methods and Results— A literature search was conducted with the keywords “cardiac resynchronization” and “response.” The 50 publications with the most citations were reviewed. After the exclusion of editorials and reviews, 17 different primary response criteria were identified from 26 relevant articles. The agreement among 15 of these 17 response criteria was assessed in 426 patients from the Predictors of Response to Cardiac Resynchronization Therapy (PROSPECT) study with Cohen’s κ-coefficient (2 response criteria were not calculable from PROSPECT data). The overall response rate ranged from 32% to 91% for the 15 response criteria. Ninety-nine percent of patients showed a positive response according to at least 1 of the 15 criteria, whereas 94% were classified as a nonresponder by at least 1 criterion. κ-Values were calculated for all 105 possible comparisons among the 15 response criteria and classified into standard ranges: Poor agreement (κ≤0.4), moderate agreement (0.4<κ<0.75), and strong agreement (κ≥0.75). Seventy-five percent of the comparisons showed poor agreement, 21% showed moderate agreement, and only 4% showed strong agreement.
Conclusions— The 26 most-cited publications on predicting response to cardiac resynchronization therapy define response using 17 different criteria. Agreement between different methods to define response to cardiac resynchronization therapy is poor 75% of the time and strong only 4% of the time, which severely limits the ability to generalize results over multiple studies.
Received September 19, 2009; accepted February 17, 2010.
Predicting whether a patient will benefit, or “respond,” to cardiac resynchronization therapy (CRT) has been the focus of more than 500 publications during the last 5 years; however, the definition of response to CRT varies widely between studies, and numerous criteria to define a positive response to CRT exist in the literature. “Echocardiographic” response is typically assessed by quantifying the change in left ventricular ejection fraction1–4 or left ventricular end-systolic volume (LVESV)2,5–10 3 to 6 months after CRT implantation. “Clinical” response is assessed with the increase in the distance walked in 6 minutes11 or improvement in New York Heart Association functional class2,12–14 3 to 6 months after CRT implantation. Some studies have defined response to CRT as a combination of several clinical measures15–17 or as a combination of both clinical and echocardiographic measures.18
The heterogeneous approach to defining response to CRT is a potential barrier to progress in this field. No study has addressed this issue by investigating the agreement among the numerous published CRT response criteria. If these different response criteria show poor agreement, then the ability to generalize results from multiple studies is severely impaired, and a standard needs to be developed. We hypothesized that the agreement between the various published response criteria would be poor. We tested this hypothesis by identifying response criteria from a literature search and then assessing the statistical agreement among the different criteria in the 426 patients enrolled in the Predictors of Response to Cardiac Resynchronization Therapy (PROSPECT) study.
To identify commonly used criteria to define response to CRT, a literature search was conducted with the Web of Science “Science Citation Index Expanded” database19 using the topics “cardiac resynchronization” and “response.” The 50 publications with the most citations were reviewed for relevance (Figure 1). Four review articles and 20 publications that did not report individual response criteria were excluded.
Seventeen different primary response criteria were identified from the 26 remaining publications (Table 1).1–18,20–27 Eight of these 17 response criteria were based on echocardiography, 8 were based on clinical measures, and 1 criterion was based on a combination of both echocardiographic and clinical measures. Six of the 17 response criteria included either all-cause or heart failure mortality as a criterion to define a nonresponse, whereas the other 11 did not.
Agreement between response criteria was assessed with information from the baseline and 6-month follow-up visits for the 426 patients in the PROSPECT study.10 Briefly, PROSPECT was a prospective, multicenter study that was designed to test the ability of 12 different echocardiographic dyssynchrony parameters to predict response to CRT. Four hundred fifty-seven patients with standard CRT indications (New York Heart Association class III/IV heart failure, left ventricular ejection fraction ≤35%, QRS ≥130 ms, and stable medical regimen) were enrolled in PROSPECT at 53 centers worldwide. After the exclusion of 31 patients who exited the study early and did not receive an implant, 426 patients remained and were followed up for 6 months after CRT implantation.
Statistics and Quantification of Agreement
The Cohen κ-coefficient was used to assess agreement between the different response criteria. The κ-coefficient is an accepted statistical coefficient that is used to assess agreement between methodologies.28,29 The κ-coefficient ranges from −1 (perfect disagreement) to +1 (perfect agreement), and a κ-coefficient of 0 indicates that the amount of agreement was exactly that expected by chance.28 A κ-coefficient ≥0.75 was defined as strong agreement, 0.4<κ<0.75 was defined as moderate agreement, and κ≤0.4 was defined as poor agreement (Table 2).29
An example of a κ-value calculation is shown in Table 3 and presented in the Results section.
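As a minimal sketch of the statistic described above (not the authors' code), Cohen's κ for two binary response criteria can be computed from paired patient-level classifications and then binned with the thresholds stated in the text. The function names `cohen_kappa` and `classify_agreement` are illustrative assumptions.

```python
from typing import Sequence


def cohen_kappa(a: Sequence[bool], b: Sequence[bool]) -> float:
    """Cohen's kappa for two binary classifications of the same patients."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: fraction of patients labeled identically by both criteria.
    p_obs = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement, from each criterion's marginal responder rate.
    rate_a = sum(a) / n
    rate_b = sum(b) / n
    p_exp = rate_a * rate_b + (1 - rate_a) * (1 - rate_b)
    if p_exp == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_obs - p_exp) / (1 - p_exp)


def classify_agreement(kappa: float) -> str:
    """Bin kappa with the thresholds used in the text (after Fleiss)."""
    if kappa >= 0.75:
        return "strong"
    if kappa > 0.4:
        return "moderate"
    return "poor"
```

With these definitions, two criteria that label every patient identically give κ=1, two that always disagree at a 50% responder rate give κ=−1, and two that agree only as often as chance predicts give κ=0.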
Two of the 17 response criteria24,25 could not be calculated from PROSPECT data because (1) oxygen consumption at peak exercise was not measured in PROSPECT and (2) height and weight at 6 months were not measured in PROSPECT, which precludes the ability to calculate the LVESV index. Agreement among the 15 remaining response criteria was assessed by calculating a κ-value for all possible pairs of criteria. Thus, 105 κ-values were calculated. Group κ-values were then quantified (mean, median, and range) to summarize the agreement among criteria from 4 different groups: (1) all 105 comparisons, (2) comparisons between 2 echocardiographic criteria, (3) comparisons between 2 clinical criteria, and (4) comparisons between an echocardiographic criterion and a clinical criterion. Mean group κ-values were compared with a permutation test, and a Bonferroni correction was applied to account for multiple comparisons. To check that the sample size was large enough to estimate κ-values, a bootstrap resampling procedure was used, and the normality of the resulting distribution was assessed with a Kolmogorov-Smirnov test. P<0.05 was defined as statistically significant.
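The count of 105 comparisons follows from choosing 2 of 15 criteria. The short sketch below enumerates the pairs and tallies them by type, assuming the breakdown reported in the Results (7 echocardiographic, 7 clinical, and 1 combined criterion); the variable names are illustrative only.

```python
from collections import Counter
from itertools import combinations

# Assumed breakdown of the 15 calculable criteria, taken from the Results text.
kinds = ["echo"] * 7 + ["clinical"] * 7 + ["combined"]

# Every unordered pair of distinct criteria gets one kappa value.
pairs = list(combinations(range(len(kinds)), 2))

# Tally pairs by the types of the two criteria being compared.
group_counts = Counter(tuple(sorted((kinds[i], kinds[j]))) for i, j in pairs)

print(len(pairs))  # 105
for group, count in sorted(group_counts.items()):
    print(group, count)
```

The tally gives 21 echocardiographic pairs, 21 clinical pairs, 49 echocardiographic-versus-clinical pairs, and 14 pairs involving the combined criterion.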
Subgroup Analysis Excluding Response Criteria Quantified Short Term and at 3 Months
One response criterion (increase in stroke volume ≥15%4,21,22) was used in the literature as a short-term response measure that was quantified within 2 days of CRT implantation. In addition, 2 response criteria from the literature were only used in studies that assessed response at 3 months (criteria 15 and 17 in Table 1). Short-term (<2 days) and 3-month data for calculating all 15 response measures were not collected in PROSPECT, so all criteria were assessed with data from the 6-month visit. To ensure that the present results were not confounded by calculating these short-term and 3-month measures from 6-month follow-up data, we performed a subgroup analysis in which we excluded them and recalculated the group κ-values.
The percentage of patients defined as having a positive “response” to CRT ranged from 32% to 91% for the 15 response criteria (Table 4). All 15 criteria could be calculated in 250 of the 426 patients in PROSPECT. Of these 250 patients, 99% showed a positive response according to at least 1 of the 15 criteria, whereas 94% were classified as a nonresponder by at least 1 criterion. Similarly, 95% of patients showed a positive response by at least 2 of the 15 criteria, whereas 87% were classified as nonresponders by at least 2 criteria.
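The "at least 1 criterion" figures can both be high at once, because a single patient may be a responder by some criteria and a nonresponder by others. The sketch below shows how such fractions could be tallied from a patient-by-criterion table. The function name and the 4-patient, 3-criterion `toy` table are invented purely for illustration and are not PROSPECT data.

```python
from typing import List


def fraction_with_at_least(table: List[List[bool]], k: int, responder: bool = True) -> float:
    """Fraction of patients given the label `responder` by at least k criteria."""
    hits = sum(
        1 for row in table
        if sum(1 for value in row if value == responder) >= k
    )
    return hits / len(table)


# Toy table (NOT study data): rows are patients, columns are response criteria.
toy = [
    [True, True, False],
    [True, False, False],
    [False, False, False],
    [True, True, True],
]

print(fraction_with_at_least(toy, 1, responder=True))   # 0.75
print(fraction_with_at_least(toy, 1, responder=False))  # 0.75
print(fraction_with_at_least(toy, 2, responder=True))   # 0.5
```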
Example Calculation of κ-Value
An example of the κ-value calculation between response criterion 3 (decrease in LVESV ≥10%, no death due to heart failure) and response criterion 13 (increase in 6-minute walk distance ≥10%, no heart failure death, no transplant) is given in Table 3. Response criterion 3 identified 62% of patients who received CRT as responders (and thus, 38% of the patients as nonresponders), and criterion 13 also identified 62% of patients as responders. The expected agreement due to chance alone was therefore 0.62×0.62+0.38×0.38=53%. The observed agreement was 0.39+0.15=54%. Equation 1, κ=(observed agreement−expected agreement)/(1−expected agreement), then gives κ=(0.54−0.53)/(1−0.53)≈0.02 for Table 3, which suggests poor agreement after accounting for the level of agreement expected due to chance.
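The arithmetic of this example can be reproduced with a few lines of Python. This is a sketch using only the proportions quoted in the paragraph above; the helper name `kappa_from_proportions` is an assumption, not part of the study.

```python
from typing import Tuple


def kappa_from_proportions(p_observed: float, rate_a: float, rate_b: float) -> Tuple[float, float]:
    """Return (kappa, expected agreement) from observed agreement and responder rates."""
    # Chance agreement: both say responder, or both say nonresponder.
    p_expected = rate_a * rate_b + (1 - rate_a) * (1 - rate_b)
    kappa = (p_observed - p_expected) / (1 - p_expected)
    return kappa, p_expected


# Values quoted in the text for criteria 3 and 13 (Table 3).
kappa, p_expected = kappa_from_proportions(0.39 + 0.15, 0.62, 0.62)
print(f"expected={p_expected:.2f} kappa={kappa:.2f}")  # expected=0.53 kappa=0.02
```

Carrying the unrounded expected agreement (0.5288) through the formula changes only the third decimal place, so κ still rounds to 0.02.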
Agreement Among the 15 Response Criteria
The 15 response criteria showed poor agreement as a group (Figures 2 and 3; mean κ=0.22±0.24, median=0.14, range=−0.2 to 0.97). Seventy-nine (75%) of the 105 κ-values were classified as having poor agreement, whereas 22 κ-values (21%) were classified as having moderate agreement (Figure 3). Only 4 pairs of response criteria of the 105 total pairs (4%) showed strong agreement, and 2 of these 4 pairs were comparisons between a response criterion that excluded mortality and the same criterion with death defined as a nonresponse. The 7 echocardiographic response criteria also showed poor agreement with one another (Figure 4; mean κ=0.35±0.28, median=0.29, range=−0.2 to 0.88). The 7 clinical response criteria showed moderate agreement (Figure 4; mean κ=0.44±0.23, median=0.43, range=0.14 to 0.97). Agreement between echocardiographic and clinical criteria was poor (Figure 4; mean κ=0.05±0.05, median=0.04, range=−0.03 to 0.17), with all 49 κ-values showing poor agreement. The agreement among the echocardiographic parameters was not significantly different from the agreement among the clinical parameters (uncorrected P=0.35). The response criterion based on a combination of both echocardiographic and clinical measures showed significantly better agreement (P=0.003) with clinical response criteria (0.44±0.10) than with echocardiographic response criteria (0.21±0.12). Bootstrap resampling of the κ-statistic comparing clinical composite response and a 15% reduction in LVESV indicated that the sample size was large enough to estimate the κ-value (Kolmogorov-Smirnov test for normality P>0.15).
Subgroup Analysis Excluding Response Criteria Quantified Short Term and at 3 Months
Exclusion of the short-term and 3-month response criteria did not significantly affect the results. After exclusion of 1 short-term (criterion 8 in Table 1) and two 3-month (criteria 15 and 17 in Table 1) response measures, the agreement among the 12 remaining response criteria was poor as a group (mean κ=0.22±0.26, median=0.14, range=−0.03 to 0.97). Agreement among the 6 remaining echocardiographic response criteria was moderate (mean κ=0.44±0.24, median=0.47, range=0.16 to 0.88). Agreement among the 6 remaining clinical response criteria was also moderate (mean κ=0.42±0.27, median=0.32, range=0.14 to 0.97). Finally, agreement between echocardiographic and clinical criteria remained poor (mean κ=0.05±0.05, median=0.04, range=−0.03 to 0.17).
The major findings of this study are as follows: (1) The 26 most-cited publications on predicting response to CRT used 17 different primary response criteria, and the level of agreement not due to chance among 15 of these response criteria was poor 75% of the time and strong only 4% of the time in the 426 patients enrolled in the PROSPECT study; (2) agreement between echocardiographic and clinical response criteria was poor and nearly equal to the level of agreement expected by chance; (3) the percentage of patients defined as having a positive response to CRT ranged from 32% to 91% for the 15 response criteria; and (4) 99% of patients were classified as a responder by at least 1 of the 15 criteria, whereas 94% were classified as a nonresponder by at least 1 criterion.
Comparison to the Literature
To the best of our knowledge, no study has quantified agreement among response criteria with a κ-coefficient; however, a recent study by Bleeker et al2 aimed to quantify the agreement between echocardiographic and clinical measures of response to CRT. The authors compared a decline in New York Heart Association class (clinical response) with a 15% decrease in LVESV (echocardiographic response) in 144 consecutive patients undergoing CRT. The authors concluded that “the agreement between [clinical response and echocardiographic response] was good” based on the observed agreement of 76%. However, their data show that clinical and echocardiographic responses would be expected to agree 52% of the time on the basis of chance alone. The study did not calculate κ-values to account for this expected level of agreement due to chance. We estimated the κ-value to be 0.50 from their data ((0.76−0.52)/(1−0.52)=0.50), which is higher than the value of 0.17 observed in the present study. However, the main conclusion that should be drawn from both studies is similar: The agreement between echocardiographic and clinical criteria for defining a positive response to CRT is only slightly better than that expected by chance alone.
In the MIRACLE trial (Multicenter InSync Randomized Clinical Evaluation), correlation between the change in left ventricular end-diastolic volume and change in New York Heart Association class after 6 months of CRT was weak (r=0.13).30 In addition, the correlation between the change in distance walked in 6 minutes and change in left ventricular ejection fraction was weak (r=0.15).30 These data are consistent with the present results, which show poor agreement between clinical and echocardiographic response criteria.
Previous studies have reported different rates of response to CRT when different definitions of response are used within the same population. For example, the PROSPECT study reported that 56% of patients were echocardiographic responders (defined by a reduction in LVESV of at least 15%), whereas 69% of patients were clinical responders (defined by an improvement in the clinical composite score).10 Thus, one would expect these measures to show poor agreement because of the different response rates. However, the actual response rate does not tell the entire story: Table 3 shows 2 different response criteria with identical response rates of 62%, and despite the identical response rates, the criteria show very poor agreement (κ=0.02). Thus, assessment of agreement with the κ-statistic provides valuable information in addition to the overall response rate of the population.
Other Inconsistencies in Defining Response to CRT
Length of Follow-Up
Another area of inconsistency in defining response to CRT is the length of the follow-up period after which a patient is deemed either a responder or a nonresponder. Some studies focused on short-term (1 to 2 days) response,1,4,21,22 whereas most focused on 3-month5–9,18 or 6-month2–4,10–17,20,23–27 response. CRT has been shown to have persistent, increasing benefits with a longer mean follow-up period of 29.5 months.31 We defined response at 6 months because this was the prespecified follow-up period for the PROSPECT study. We also performed a subgroup analysis after excluding criteria that were assessed in the literature at short-term and 3-month follow-up only, and this did not change the present results. Future studies will be needed to address agreement among the different lengths of follow-up.
Whether death should be considered a nonresponse to CRT is another area in which there is inconsistency. There are at least 3 different methods that authors have used to incorporate death into their response criteria: (1) Death due to worsening heart failure is included in the nonresponder group,11,16,17,20,23,27 (2) death due to any cause is included in the nonresponder group,24 and (3) deaths are excluded from analysis.3,5,6,8,9,26 Moreover, numerous publications fail to specify how death was incorporated into response criteria despite enrolling consecutive patients and following them for a 3- to 6-month period.2,12–14,18 Although inclusion of all-cause mortality as a criterion for nonresponse may not be appropriate, a patient who dies of progressive heart failure should, objectively, be classified as a nonresponder. Regardless, there is no consistent method for incorporating mortality into the definition of response to CRT, and this needs to be standardized.
A Consensus Definition of “Response to CRT”
Because heart failure is a debilitating, life-threatening disease, an effective heart failure therapy should improve symptoms, quality of life, and duration of life.32 Thus, measures of “response” to CRT should either directly measure outcomes or have a surrogate relationship with benefits in heart failure symptoms, quality of life, and duration of life. The clinical composite score33 is a measure of response that accounts for all of these factors and may be the best overall choice for defining response in future CRT trials.
The results of the present study are limited by the fact that we used data from a single study (PROSPECT). However, PROSPECT was a multicenter study that enrolled 457 wide-QRS patients from 53 different centers across Europe, Hong Kong, and the United States. We would expect similar results from other large, multicenter databases.
The present results show that many different methods to define a positive response to CRT are being used in the literature and that these methods show poor agreement with one another. This raises the question of which method should be used in the future to determine whether a patient benefited from CRT. The present study did not attempt to address this question, and future studies will need to explore this important issue.
The definition of clinically acceptable agreement based on the κ-coefficient is not standardized. Fleiss29 proposed a threshold of ≥0.75 to define strong evidence for agreement that is not due to chance, which is what we used; however, this threshold is somewhat arbitrary. Landis and Koch34 proposed that any value of κ above 0.60 suggests substantial agreement, and κ>0.80 implies “almost perfect” agreement. However, the use of a different threshold, such as 0.6, to define strong agreement would not significantly alter the present results; 4 of the 105 κ-statistics that we calculated were greater than 0.8, and only 8 were ≥0.6. Thus, regardless of the threshold used, we observed mostly poor agreement among the 15 different response criteria.
The 26 most-cited publications on predicting response to CRT define response using 17 different criteria. Agreement between these different published methods to define response to CRT is poor 75% of the time and strong only 4% of the time. This inconsistency in the definition of response to CRT severely limits the ability to generalize results over multiple studies and hinders progress in the field.
Sources of Funding
This work was supported by grants from the American Heart Association (Grant-in-Aid No. 0855386E) and the National Institutes of Health (HL089160) to Dr Oshinski. Dr Fornwalt was supported in part by National Institutes of Health training grant No. 5 T32 GM008169.
Dr Gerritse is an employee of Medtronic Inc and owns company stock. The remaining authors report no conflicts.
Bax JJ, Marwick TH, Molhoek SG, Bleeker GB, van Erven L, Boersma E, Steendijk P, van der Wall EE, Schalij MJ. Left ventricular dyssynchrony predicts benefit of cardiac resynchronization therapy in patients with end-stage heart failure before pacemaker implantation. Am J Cardiol. 2003; 92: 1238–1240.
Suffoletto MS, Dohi K, Cannesson M, Saba S, Gorcsan J. Novel speckle-tracking radial strain from routine black-and-white echocardiographic images to quantify dyssynchrony and predict response to cardiac resynchronization therapy. Circulation. 2006; 113: 960–968.
Yu CM, Chan YS, Zhang Q, Yip GWK, Chan CK, Kum LCC, Wu L, Lee APW, Lam YY, Fung JWH. Benefits of cardiac resynchronization therapy for heart failure patients with narrow QRS complexes and coexisting systolic asynchrony by echocardiography. J Am Coll Cardiol. 2006; 48: 2251–2257.
Yu CM, Fung JWH, Chan CK, Chan YS, Zhang Q, Lin H, Yip GWK, Kum LCC, Kong SL, Zhang Y, Sanderson JE. Comparison of efficacy of reverse remodeling and clinical improvement for relatively narrow and wide QRS complexes after cardiac resynchronization therapy for heart failure. J Cardiovasc Electrophysiol. 2004; 15: 1058–1065.
Yu CM, Zhang Q, Chan YS, Chan CK, Yip GWK, Kum LCC, Wu EB, Lee PW, Lam YY, Chan S, Fung JWH. Tissue Doppler velocity is superior to displacement and strain mapping in predicting left ventricular reverse remodelling response after cardiac resynchronisation therapy. Heart. 2006; 92: 1452–1456.
Chung ES, Leon AR, Tavazzi L, Sun JP, Nihoyannopoulos P, Merlino J, Abraham WT, Ghio S, Leclercq C, Bax JJ, Yu CM, Gorcsan J, Sutton MS, De Sutter J, Murillo J. Results of the Predictors of Response to CRT (PROSPECT) trial. Circulation. 2008; 117: 2608–2616.
Diaz-Infante E, Mont L, Leal J, Garcia-Bolao I, Fernandez-Lozano I, Hernandez-Madrid A, Perez-Castellano N, Sitges M, Pavon-Jimenez R, Barba J, Cavero MA, Moya JL, Perez-Isla L, Brugada J; SCARS Investigators. Predictors of lack of response to resynchronization therapy. Am J Cardiol. 2005; 95: 1436–1440.
Molhoek SG, Bax JJ, Boersma E, Van Erven L, Bootsma M, Steendijk P, Van Der Wall EE, Schalij MJ. QRS duration and shortening to predict clinical response to cardiac resynchronization therapy in patients with end-stage heart failure. Pacing Clin Electrophysiol. 2004; 27: 308–313.
Bleeker GB, Kaandorp TAM, Lamb HJ, Boersma E, Steendijk P, de Roos A, van der Wall EE, Schalij MJ, Bax JJ. Effect of posterolateral scar tissue on clinical and echocardiographic improvement after cardiac resynchronization therapy. Circulation. 2006; 113: 969–976.
Ypenburg C, Schalij MJ, Bleeker GB, Steendijk P, Boersma E, Dibbets-Schneider P, Stokkel MPM, van der Wall EE, Bax JJ. Impact of viability and scar tissue on response to cardiac resynchronization therapy in ischaemic heart failure patients. Eur Heart J. 2007; 28: 33–41.
Thomson Reuters. Web of Science, Science Citation Index Expanded database; ISI Web of Knowledge. Available at: http://apps.isiknowledge.com. Accessed June 15, 2009.
Bleeker GB, Mollema SA, Holman ER, Van De Veire N, Ypenburg C, Boersma E, van der Wall EE, Schalij MJ, Bax JJ. Left ventricular resynchronization is mandatory for response to cardiac resynchronization therapy: analysis in patients with echocardiographic evidence of left ventricular dyssynchrony at baseline. Circulation. 2007; 116: 1440–1448.
Henneman MM, Chen J, Dibbets-Schneider P, Stokkel MR, Bleeker GB, Ypenburg C, van der Wall EE, Schalij MJ, Garcia EV, Bax JJ. Can LV dyssynchrony as assessed with phase analysis on gated myocardial perfusion SPECT predict response to CRT? J Nucl Med. 2007; 48: 1104–1111.
Lecoq G, Leclercq C, Leray E, Crocq C, Alonso C, de Place C, Mabo P, Daubert C. Clinical and electrocardiographic predictors of a positive response to cardiac resynchronization therapy in advanced heart failure. Eur Heart J. 2005; 26: 1094–1100.
Marcus GM, Rose E, Viloria EM, Schafer J, De Marco T, Saxon LA, Foster E; VENTAK CHF/CONTAK-CD Biventricular Pacing Study Investigators. Septal to posterior wall motion delay fails to predict reverse remodeling or clinical improvement in patients undergoing cardiac resynchronization therapy. J Am Coll Cardiol. 2005; 46: 2208–2214.
Stellbrink C, Breithardt OA, Franke A, Sack S, Bakker P, Auricchio A, Pochet T, Salo R, Kramer A, Spinelli J; CPI Guidant Congestive Heart Failure Research Group. Impact of cardiac resynchronization therapy using hemodynamically optimized pacing on left ventricular remodeling in patients with congestive heart failure and ventricular conduction disturbances. J Am Coll Cardiol. 2001; 38: 1957–1965.
Hulley SB, Cummings SR, Browner WS, Grady D, Hearst N, Newman TB. Designing Clinical Research: An Epidemiologic Approach. 2nd ed. Philadelphia, Pa: Lippincott Williams & Wilkins; 2001.
Fleiss JL. Statistical Methods for Rates and Proportions. 2nd ed. New York, NY: Wiley; 1981.
St John Sutton MG, Plappert T, Abraham WT, Smith AL, DeLurgio DB, Leon AR, Loh E, Kocovic DZ, Fisher WG, Ellestad M, Messenger J, Kruger K, Hilpisch KE, Hill MR. Effect of cardiac resynchronization therapy on left ventricular size and function in chronic heart failure. Circulation. 2003; 107: 1985–1990.
Cleland JG, Daubert JC, Erdmann E, Freemantle N, Gras D, Kappenberger L, Tavazzi L. Longer-term effects of cardiac resynchronization therapy on mortality in heart failure [the CArdiac REsynchronization-Heart Failure (CARE-HF) trial extension phase]. Eur Heart J. 2006; 27: 1928–1932.
A literature search revealed that the 26 most-cited publications on predicting response to cardiac resynchronization therapy defined response using 17 different criteria. No study has investigated agreement among these various response criteria, and we hypothesized that this agreement would be poor. The agreement among 15 of the 17 response criteria was assessed in 426 patients from the PROSPECT study using the Cohen κ-coefficient (2 of the 17 response criteria were not calculable from PROSPECT data). Response rates for the entire population were highly varied and ranged from 32% to 91% for the 15 criteria. Ninety-nine percent of patients showed a positive response by at least 1 of the 15 criteria, whereas 94% were classified as a nonresponder by at least 1 criterion. κ-Values were calculated for all 105 possible comparisons among the 15 response criteria and classified into standard ranges: Poor agreement (κ≤0.4), moderate agreement (0.4<κ<0.75), and strong agreement (κ≥0.75). Seventy-five percent of the comparisons showed poor agreement, 21% showed moderate agreement, and only 4% showed strong agreement. Thus, agreement between different methods to define response to cardiac resynchronization therapy is poor 75% of the time and strong only 4% of the time, which severely impairs the ability to generalize results over multiple studies. This lack of standardization hinders progress in cardiac resynchronization therapy research and needs to be resolved.