Measurement of Clinical Efficacy in Studies of Heart Failure
To the Editor:
A recent clinical investigation reported by Packer et al1 used multiple end points to assess the efficacy of carvedilol for heart failure because the most appropriate measure(s) have not been firmly established. Statistically significant favorable effects of carvedilol compared with placebo were observed for measures commonly used in clinical practice, such as NYHA classification, physician’s assessment of changes in clinical status, and patients’ reports of whether they felt better compared with baseline. These congruent results do not represent independent assessments. Other measures, including a previously validated quality-of-life questionnaire, did not demonstrate significant differences.2,3 These discrepant results may have been due to differences in the content of the measures, the timing of assessments, and the statistical methods. Nevertheless, the authors concluded that these results have important implications for future clinical trials, implying that simple symptom assessments were adequate to assess clinical efficacy because they correspond with clinical practice and demonstrated differences compared with placebo. Are the simple symptom assessments adequate measures of therapeutic efficacy?
Reliable clinical measurements result from methods that can be applied consistently at different times and by different investigators.4 The NYHA classification has been shown to have poor interobserver agreement, in part because asking about “ordinary” physical activities is imprecise.5 Using standard activities such as walking a specific distance or climbing a flight of stairs can enhance this measure’s reliability,5 but concerns persist about what investigators take into consideration. For example, 60% of the patients in the carvedilol study were classified as NYHA class III or IV. The protocol specified that class III patients should be symptomatic walking 200 yards (about 183 m) or up a flight of stairs. However, the mean distance walked in a corridor was >345 m, and all patients had to walk at least 150 m. Perhaps these patients developed symptoms at much shorter distances and continued to walk. Another possibility is that protocol criteria for NYHA classification were not applied consistently.
Similarly, asking patients if they “feel better or worse” does not specify what the patient or investigator should focus on each time this measurement is made. Indeed, one does not know what is actually being measured when patients say they feel better. Is the response based on changes in symptoms of heart failure or on some other aspect of their care? The large placebo response seen in the carvedilol and other studies raises serious concerns about what was measured by these so-called “global” questions.6 A single question is not a global measure, because patients do not consider the many aspects of their heart failure when asked simply whether they feel better. These potential measurement problems can be minimized by carefully designed written questionnaires that ask the same specific questions each time they are administered.
Measures of quality of life extend well beyond simple assessments of symptoms. They focus on how changes in symptoms affect the individual’s activities and sense of well-being.7 Statistically significant changes in simple measurements of symptoms may be insufficient to alter lifestyle. Furthermore, quality of life can be affected by more than symptoms of heart failure. For example, side effects may adversely affect quality of life. The more frequent dizziness reported by patients receiving carvedilol compared with those receiving placebo may have reduced their quality of life even though dyspnea, but not fatigue, was less frequent in the carvedilol group. Clearly, commonly used symptom assessments cannot serve as adequate measures of quality of life.
Symptoms are certainly an important component of clinical efficacy. Selection of measurements of symptoms for clinical trials should not be unduly influenced by probability values in studies of investigational therapies. Rather, we should be explicit about which symptoms are important to measure to determine whether a treatment has value to patients. We should then develop unambiguous measures that comprehensively reflect what is judged to be important (ie, valid measures) and that provide for consistent applications (ie, reliable measures) in both clinical trials and practice. On the basis of these criteria, assessments such as the NYHA classification and so-called global questions are not the best possible measures of therapeutic efficacy. More recent attempts to develop comprehensive, reliable, and valid written questionnaires should not be discarded because they do not demonstrate statistically significant effects in some clinical trials. Perhaps they are providing more meaningful data. We must continue to improve our measures of symptoms, adverse effects, and quality of life to understand the true value of treatments for heart failure.
Copyright © 1998 by American Heart Association
1. Packer M, Colucci WS, Sackner-Bernstein JD, Liang CS, Goldscher DA, Freeman I, Kukin ML, Kinhal V, Udelson JE, Klapholz M, Gottlieb SS, Pearle D, Cody RJ, Gregory JJ, Kantrowitz NE, LeJemtel TH, Young ST, Lukas MA, Shusterman NH. Double-blind, placebo-controlled study of the effects of carvedilol in patients with moderate to severe heart failure: the PRECISE trial. Circulation. 1996;94:2793–2799.
2. Rector TS, Cohn JN. Assessment of patient outcome with the Minnesota Living with Heart Failure questionnaire: reliability and validity during a randomized, double-blind, placebo-controlled trial of pimobendan. Am Heart J. 1992;124:1017–1025.
3. Rector TS, Kubo SH, Cohn JN. Validity of the Minnesota Living with Heart Failure questionnaire as a measure of therapeutic response to enalapril or placebo. Am J Cardiol. 1993;71:1106–1107.
4. Feinstein AR. An additional basic science for clinical medicine, the development of clinimetrics. Ann Intern Med. 1983;99:843–848.
5. Goldman L, Hashimoto B, Cook EF, Loscalzo A. Comparative reproducibility and validity of systems for assessing cardiovascular functional class: advantages of a new specific activity scale. Circulation. 1981;64:1227–1234.
6. Rector TS, Johnson G, Dunkman B, Daniels G, Farrell L, Henrick A, Smith B, Cohn JN. Evaluation by patients with heart failure of the effects of enalapril compared with hydralazine plus isosorbide dinitrate on quality of life: V-HeFT II. Circulation. 1993;87(suppl VI):VI-71–VI-77.
7. Testa MA, Simonson DC. Assessment of quality of life outcomes. N Engl J Med. 1996;334:835–840.