Subgroup Interactions in the Heart and Estrogen/Progestin Replacement Study
Background— The Heart and Estrogen/Progestin Replacement Study (HERS) showed no overall benefit of postmenopausal hormone treatment in women with coronary heart disease (CHD). We analyzed the HERS data to determine whether there were specific subgroups of women who responded differently to treatment, either during the first year or in the overall study.
Methods and Results— In the search for significant treatment interactions, we analyzed a total of 86 subgroups defined by baseline characteristics. These included demographics and lifestyle factors, laboratory and physical examination variables, medical history and symptoms by self-report, medication use, and prior CHD history by chart review. We examined within-subgroup treatment effects for baseline variables that significantly interacted with treatment assignment. Under the null hypothesis, 4 (5%) of the 86 interactions would be expected to be nominally significant (P<0.05) by chance alone at each time point. Six of the interaction values were P<0.05 at 1 year, and 3 were P<0.05 at trial completion. The findings are discussed in the context of known mechanisms of action and prior scientific knowledge. Use of digitalis and history of myocardial infarction emerged as 2 possible modifiers of the effect of hormone therapy during the first year, and lipoprotein(a) emerged as a possible modifier during the overall study.
Conclusions— Extensive post hoc analyses did not identify any subgroup of HERS participants in which postmenopausal hormone treatment was clearly beneficial or harmful, but several possibilities emerged for testing in future trials.
Received October 4, 2001; revision received December 11, 2001; accepted December 21, 2001.
Investigators in most randomized clinical trials include intervention-control comparisons within multiple subgroups in their analyses. Such subgroup analyses provide important information regarding internal consistency. Similar findings in major subgroups tend to strengthen the validity of the overall trial results. Subgroup analyses are also conducted to address the following question: Among which subgroup of patients is the intervention especially beneficial or harmful?1 The search for subgroups classified by prerandomization variables in which the effect of the intervention is modified (an interaction) is more controversial because of the increased likelihood of spurious results. Such analyses are important, but they must be interpreted cautiously.2
The Heart and Estrogen/Progestin Replacement Study (HERS) was the first randomized placebo-controlled clinical trial of sufficient scope to evaluate the clinical effect of hormone therapy on coronary events in postmenopausal women with coronary heart disease (CHD).3 Hormone treatment did not reduce the risk of CHD events during an average follow-up of 4.1 years.4 The lack of clinical benefit was puzzling because of the net 11% reduction in LDL cholesterol and the 10% increase in HDL cholesterol in the hormone group compared with the placebo group and also because of the wealth of prior epidemiological and pathophysiological evidence that hormone treatment would prevent CHD.5–7 Thus, there were good reasons to conduct exploratory subgroup analyses in HERS.
The overall treatment effect in HERS was null, but there were unexpected time trends. Post hoc analysis revealed a pattern of early increase and later decrease in CHD risk when the hormone and placebo groups were compared.4 If real, this would represent a “time-dependent” interaction. Such an analysis differs from the evaluation of subgroup interactions classified by baseline variables (the main topic of the present report) in that the effect modification is observed in the whole cohort over the entire period of follow-up. Chance remains a reasonable explanation for the pattern of early risk in light of the unusually low rate of early coronary events in the placebo group.4 However, the finding could also occur if hormones first enhance thrombosis and/or plaque instability and then retard atherosclerosis.
The objective of the present report was to summarize the experience with the large number of subgroup analyses conducted by the HERS investigators and to discuss the efforts to distinguish between spurious and real subgroup findings. The possible time trends in HERS led us to look separately for subgroup interactions in the first year of the trial before examining the 4 years of the trial as a whole.
HERS was a randomized placebo-controlled trial of a daily dose of 0.625 mg conjugated equine estrogens plus 2.5 mg medroxyprogesterone acetate in 2763 postmenopausal women with an intact uterus and CHD.3 The prespecified primary outcome was the occurrence of nonfatal myocardial infarction (MI) or CHD death, referred to as coronary events. The prognostic baseline characteristics were evenly balanced between the 2 groups. During the average follow-up of 4.1 years, 179 women in the hormone group and 182 women in the placebo group suffered a coronary event, for an overall relative hazard (RH) of 0.99 (95% CI 0.81 to 1.22). The RHs for years 1, 2, 3, and 4/5 were 1.52 (95% CI 1.01 to 2.29), 0.98 (95% CI 0.66 to 1.46), 0.85 (95% CI 0.54 to 1.33), and 0.75 (95% CI 0.50 to 1.13), respectively. A post hoc analysis after complete adjudication of all primary events revealed a borderline statistically significant decline in the RH over time (P=0.03). Further details of the design, baseline findings, and main results have been published elsewhere.3,4
A total of 86 post hoc subgroup variables were examined during the course of HERS. All subgroups were defined by baseline characteristics. These included demographics and lifestyle factors, laboratory and physical examination variables, medical history and symptoms by self-report, baseline medication use, and prior CHD history by chart review. Information on the 86 subgroups is detailed in the Appendix. The analyses were restricted to the primary CHD outcome and to 2 follow-up periods (the first year and the entire study duration) and adhered to the intention-to-treat principle.
For each of the 86 subgroups, a Cox proportional hazards model (PHM) including only subgroup participants and with treatment as the single independent variable was used to estimate the RH for treatment in the subgroup. An analogous PHM was used to estimate the RH for treatment in the complementary group. Finally, a Wald test for interaction between treatment and subgroup was carried out by using a PHM including all participants and with 3 independent indicator variables: treatment, subgroup, and the treatment×subgroup interaction. The Wald test determines whether these 2 RHs for treatment are different. For example, we tested for a statistically significant difference between 4.94 (the RH for treatment during the first year of the trial among women reporting use of digitalis at baseline) and 1.26 (the first year RH in the complementary group of women who did not take digitalis at baseline) (Table 1). For subgroup classifications for which the interaction was statistically significant, we then assessed the magnitude, direction, and statistical significance of treatment effect estimates within the subgroup and its complement. In the example, this assessment focused on women who used digitalis. Because the subgroups were defined by baseline covariates, the within-subgroup treatment effect estimates are based on randomized comparisons even in subgroups with small numbers and thus do not require adjustment for other baseline covariates.
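In symbols, the interaction model described above can be sketched as follows. The notation is introduced here for illustration only and is an assumption, not taken from the HERS analysis: $T$ is the treatment indicator, $S$ the subgroup indicator, $h_0(t)$ the baseline hazard, and $\beta_1$ to $\beta_3$ the log relative hazards.

```latex
h(t \mid T, S) = h_0(t)\,\exp\!\left(\beta_1 T + \beta_2 S + \beta_3\, T S\right)

\mathrm{RH}_{\text{complement}} = e^{\beta_1}, \qquad
\mathrm{RH}_{\text{subgroup}} = e^{\beta_1 + \beta_3}, \qquad
\frac{\mathrm{RH}_{\text{subgroup}}}{\mathrm{RH}_{\text{complement}}} = e^{\beta_3}

z = \frac{\hat{\beta}_3}{\widehat{\mathrm{SE}}(\hat{\beta}_3)}, \qquad
H_0\colon\ \beta_3 = 0
```

Under this formulation the Wald test asks whether the ratio of the 2 RHs differs from 1. In the digitalis example, that ratio is about 4.94/1.26 ≈ 3.9. Because the separately fitted subgroup models each have their own baseline hazard, their RH estimates need not match the joint-model values exactly.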
Our examination of subgroups in the first year of follow-up was motivated by a nominally significant interaction between treatment assignment and time since randomization. In addition to the continuous time trend in the log RH previously reported,4 we compared the RH for treatment in the first year to the estimate for the remainder of the trial.
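The logic of comparing 2 RHs can be illustrated with the yearly estimates quoted earlier. The sketch below is not the authors' analysis: it back-calculates standard errors from the published 95% CIs, treats the year 1 and year 4/5 estimates as independent (an assumption), and compares them on the log scale. The function names `se_log_rh` and `compare_rh` are hypothetical.

```python
import math

Z_95 = 1.959964  # standard normal quantile behind a two-sided 95% CI


def se_log_rh(ci_low, ci_high):
    """Back-calculate the SE of log(RH) from a 95% confidence interval."""
    return (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95)


def compare_rh(rh_a, ci_a, rh_b, ci_b):
    """Return (z, two-sided P) for H0: RH_a == RH_b, assuming independence."""
    diff = math.log(rh_a) - math.log(rh_b)
    se = math.hypot(se_log_rh(*ci_a), se_log_rh(*ci_b))
    z = diff / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return z, p


# Year 1 versus years 4/5, using the RHs and CIs quoted in the text
z, p = compare_rh(1.52, (1.01, 2.29), 0.75, (0.50, 1.13))
print(f"ratio of RHs = {1.52 / 0.75:.2f}, z = {z:.2f}, P = {p:.3f}")
```

This two-estimate contrast is cruder than the continuous time-trend test reported for HERS, so its P value should not be expected to match the published P=0.03.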
Inflation of the type I error rate is a well-recognized problem in post hoc subgroup analyses1 and may be particularly problematic in trials with an overall null finding.8 We conducted a total of 172 subgroup analyses to determine the effect of treatment in 86 subgroups after 1 year and after the entire follow-up. Under the null hypothesis that the treatment effect does not differ by subgroup, we would expect ≈4 (5%) of 86 interactions to be nominally significant at P<0.05 by chance alone, both for the 1-year and overall follow-up periods. After Bonferroni correction for 172 comparisons, the adjusted significance level would be ≈0.0003. To facilitate the full discussion of the unexpected HERS results, we present subgroups with interactions nominally significant at P<0.10, while being aware of the potential for type I error.
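The multiplicity arithmetic in this paragraph can be reproduced in a few lines. This is a minimal sketch using only the counts stated in the text. The last quantity additionally assumes that the 86 tests are independent, which is not true of overlapping subgroups such as the smoking variables, so it is a rough guide only.

```python
N_SUBGROUPS = 86   # baseline-defined subgroups examined
N_PERIODS = 2      # first year and entire follow-up
ALPHA = 0.05       # nominal significance level

n_tests = N_SUBGROUPS * N_PERIODS

# Expected number of nominally significant interactions per period under the null
expected_per_period = ALPHA * N_SUBGROUPS

# Bonferroni-adjusted per-test significance level across all analyses
bonferroni_alpha = ALPHA / n_tests

# Chance of at least one nominally significant result in one period,
# ASSUMING independent tests (an idealization)
p_any_false_positive = 1 - (1 - ALPHA) ** N_SUBGROUPS

print(f"tests: {n_tests}")
print(f"expected false positives per period: {expected_per_period:.1f}")
print(f"Bonferroni threshold: {bonferroni_alpha:.5f}")
print(f"P(at least one false positive per period): {p_any_false_positive:.3f}")
```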
First-Year CHD Events
Table 1 presents subgroup analyses in which the probability value for interaction was P<0.10. The interaction value was P≤0.05 for 6 of the 86 subgroups (Table 1). For 3 of the 86 subgroups (history of smoking [yes versus no], current smoking [yes versus no], and ≥3 live births [yes versus no]), the RH was near or <1.0, indicating that the risk of coronary events was similar or less in the hormone group compared with the placebo group. The nominally significant interaction probability values were explained by the ≈2 to 3-fold increase in event rates in the hormone-treated women in the complementary subgroups, ie, those with no history of smoking, current nonsmokers, and those with <3 live births.
The other 3 subgroups in which there was a significant interaction (digitalis use, living alone, and prior MI) showed 2- to 5-fold excess rates of coronary events among the hormone-treated women compared with the control group. In the complementary subgroup (no digitalis use, not living alone, and no prior MI), the RHs were ≈1.0, indicating similar event rates in those treated with hormones and in those taking placebo. Four other subgroups (β-blocker use, age ≥70 years, exercise, and prior PTCA) had an interaction value of P>0.05 but P≤0.10 (Table 1).
Overall CHD Events
Only 3 of the 86 subgroups had an interaction value of P≤0.05 when all CHD events occurring during the average follow-up of 4.1 years were considered. For all 3 subgroups (prior PTCA, lipoprotein[a] >25 mg/dL, and “other serious medical conditions”), the RHs were ≈0.7 to 0.8 among the hormone-treated patients compared with the control group, whereas the RHs were ≈1.2 in the complementary subgroups (Table 2).
None of the subgroups showing a nominal statistically significant interaction P value at 1 year demonstrated statistical significance when the entire follow-up period was considered. However, 3 subgroups (prior PTCA, digitalis use, and history of smoking) had an interaction value of P≤0.10 for both time periods. The overall and 1-year findings were qualitatively similar but less pronounced in the overall data.
Cumulative Event Curves
Cumulative event curves for the 6 subgroups of interest are displayed in the Figure, A through F. For the prior-MI subgroup (Figure, A), the event curve increased in a linear manner over the entire follow-up in the placebo group. In the hormone group, the slope of the cumulative event curve increased markedly over the first 7 to 8 months after randomization and was followed by a slower increase. The absolute difference in the 1-year CHD event rate between women assigned to hormone treatment and those assigned to placebo was 3.1% (5.8% versus 2.7%, P=0.01). The event curves merged at 3.5 years. In the complementary subgroup (no prior MI), the 2 event curves overlapped.
In women aged ≥70 years (Figure, B), the cumulative event rate was higher in the hormone group throughout follow-up; however, the hazard rate declined over time in the treated group compared with the placebo group. The difference between the hormone and placebo groups at 1 year (P=0.01) may be partially explained by a low placebo event rate for women aged ≥70 years (only 19 of 1000 in year 1 compared with 36 of 1000, 40 of 1000, and 48 of 1000 in the ensuing years). In the complementary group, the 2 event curves overlapped.
In the small use-of-digitalis subgroup (Figure, C), the coronary event rate in the hormone group was much higher than the rate in the placebo group during the entire follow-up period. At 1 year, the difference reached statistical significance (P<0.01). In the complementary subgroup (no use of digitalis), the event rates in the 2 study groups were the same.
Among those with a history of smoking (Figure, D), the cumulative event curves were similar in the hormone and placebo groups. In contrast, among those with no history of smoking, the hormone group had more events than did the placebo group during the entire follow-up. At 1 year, the 3.5% absolute difference in coronary events was statistically significant (P<0.001).
In the subgroup with lipoprotein(a) above the median (Figure, E), the hormone group tended to have fewer events than did the placebo group after a lag time of 1.5 years. In contrast, for those with lipoprotein(a) levels below the median, the placebo group seemed to fare better for 2 to 3 years, and the 1-year value was P=0.03, which suggests an unfavorable early hormone effect.
The cumulative event curves in patients with prior PTCA (Figure, F) overlapped for almost 3 years. The curves subsequently diverged, with a smaller increase in the hormone group (overall P=0.04). In the subgroup with no prior PTCA, the placebo group fared better. The 1-year interaction value was P<0.04.
Under the assumption of no effect of hormone treatment and given the 86 subgroup analyses, by chance, 4 comparisons would be expected to have a value of P≤0.05 in the 1-year analyses, and another 4 would be expected to have a value of P≤0.05 in the overall analyses. Thus, the fact that we observed 6 nominally significant interactions in the first year and 3 in the overall trial is almost exactly what would have been expected by chance if none of the 172 subgroups tested represented real interactions. This is not surprising, inasmuch as true subgroup interactions are rarely observed in clinical trials.9 It is important to underscore that most clinical trials have limited power for discovering statistically significant interactions. Thus, only large interactions can be detected. Therefore, it is possible that many true interactions of small or moderate size are not detected because of low power.
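The claim that 6 and 3 nominally significant interactions are in line with chance can be checked against a binomial model. The sketch below assumes 86 independent tests per period, each with a 5% false-positive probability. Independence is an idealization, because many of the subgroups are correlated. The helper names are hypothetical.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_at_least(k, n, p):
    """Upper tail: P(X >= k)."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))


def prob_at_most(k, n, p):
    """Lower tail: P(X <= k)."""
    return sum(binom_pmf(i, n, p) for i in range(0, k + 1))


N, ALPHA = 86, 0.05

# First year: 6 nominally significant interactions were observed
p_ge6 = prob_at_least(6, N, ALPHA)

# Overall follow-up: 3 nominally significant interactions were observed
p_le3 = prob_at_most(3, N, ALPHA)

print(f"P(6 or more of {N} by chance) = {p_ge6:.2f}")
print(f"P(3 or fewer of {N} by chance) = {p_le3:.2f}")
```

Under these assumptions, 6 or more chance findings would occur in roughly one of every four such sets of analyses, so neither observed count is unusual.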
Even though the frequency of observed interactions was almost exactly what would be expected by chance alone, it remains possible that ≥1 of these are real effects. The challenge is to distinguish between real and chance findings. In general, the likelihood of an observed treatment group difference being real increases if there is other scientific evidence for an underlying mechanism of action attributable to the study intervention. The only way to be sure that an observed interaction is real is to repeat the observation in other clinical trials.
Underlying Mechanism(s) of Action
Among the 6 subgroups with interaction P<0.05 at 1 year, it is difficult to explain why hormone therapy would increase the risk of coronary events in women with <3 live births and in those living alone. The observed lower event rate in smokers compared with nonsmokers in the hormone group is intriguing. It is known that smoking reduces estrogen levels in plasma. If the estrogen dose in HERS was too high, one could speculate that smoking has a favorable dose-attenuating effect. However, the higher event rate among users of digitalis in the hormone group compared with the placebo group might be due to an adverse drug-drug interaction. Digitalis is eliminated primarily by renal tubular secretion, which involves P-glycoprotein. Progestin appears to inhibit P-glycoprotein,10 which could result in higher serum concentrations of digitalis and digitalis toxicity. There are no reports suggesting an interaction between digitalis and estrogen. Because use of digitalis is a marker of congestive heart failure, the association could also reflect a hormone–congestive heart failure interaction.
High levels of C-reactive protein (CRP) are associated with increased risk of coronary events.11 Because hormone therapy has been reported to increase CRP levels, one could speculate that such CRP increases could be unfavorable,12 especially in those with an elevated CRP level at baseline, such as women with a history of MI.
Of the 3 significant subgroup findings during the entire trial duration, 2 subgroups (prior PTCA and other serious medical conditions) are puzzling. The findings in the third subgroup, lipoprotein(a) >25 mg/dL, were further analyzed in a recent publication by Shlipak et al.13 Analyses of the HERS data showed that high levels of lipoprotein(a) were an independent risk factor for CHD events in the placebo group, that hormone treatment lowered lipoprotein(a) levels, and that large reductions in lipoprotein(a) levels were associated with a lower risk of coronary events. The authors observed a significant interaction (P=0.03), with hormone treatment having a more favorable effect (relative to placebo) in women with high initial lipoprotein(a) levels than in women with low levels. This interaction has a plausible biological basis because lipoprotein(a) resembles plasminogen and thus may, in the presence of estrogen, predispose an individual to thrombosis. However, the authors pointed out that this apparent interaction requires confirmation by others.
Genetic markers are strong candidates for interaction because they may represent a genotype that has a categorically distinct phenotypic expression. Psaty et al14 recently observed from a case-control study that the association between hormone therapy and risk of nonfatal MI differed between postmenopausal hypertensive women with and without the prothrombin 20210 G→A variant. This prothrombin variant, present in 1.8% of women, was associated with an 11-fold increase in the risk of nonfatal MI. If operational in HERS, this factor could explain part of the pattern of early increase in CHD risk. Other susceptibility factors not yet examined in HERS may also be important, and genetic studies in HERS are in progress.
Distinguishing Chance Findings From Real Interactions
Observed interactions that are most likely due to chance are the ones without plausible biological explanation and with lack of support in the literature. In HERS, we considered 2 subgroups (≥3 live births and living alone) to be in this category. With some ingenuity, however, one can often construct a possible mechanism through which an interaction could occur. Drugs have multiple mechanisms of action, and it is very difficult to determine which have important biological effects. The medical literature is full of proposed mechanisms of action, many of which remain unconfirmed. From this rich literature, it is easy to find supportive evidence for unexpected findings in post hoc defined subgroups. Caution is always advised, particularly when such an observation lacks considerable and consistent support in the literature. Even an interaction finding for which there is strong evidence from other studies supporting biological plausibility may represent a chance finding. In the final analysis, the only way to be sure that an interaction observed in a sample represents real effect modification in the population is to replicate the subgroup finding in other clinical trials.
Because none of the subgroup analyses were prespecified in the HERS protocol or strongly supported by other evidence and because the number of observed interactions with a nominal interaction value of P≤0.05 was almost exactly the number expected by chance, we cannot be certain that any of these observed interactions are real. The post hoc nature of the analyses supports a cautious interpretation, particularly in the absence of supporting evidence from other clinical trials. The more plausible findings of these subgroup analyses from HERS ought to be considered as hypothesis-generating and should be tested in ongoing and future trials.
We conclude that extensive post hoc analyses did not identify any subgroup of HERS participants in which postmenopausal hormone treatment was clearly beneficial or harmful. The unfavorable treatment trend within 12 months of randomization is not explained by our extensive subgroup analyses. However, the power to detect subgroup differences was limited, and small to moderate interactions could have gone undetected. Also, other factors that have not yet been measured, such as genetic markers, may identify subgroups of women susceptible to adverse effects or greater benefit from hormone replacement treatment.
Appendix
The following categories were used for demographics: age (<60, 60 to 69, and ≥70 years; <70 versus ≥70 years), race (white), marital status (4 levels), current marriage, education (high school or more), and living condition (alone).
The following categories were used for reproductive history: any pregnancy, birth of first child at age ≥22 years, ≥3 pregnancies, ≥3 live births, oophorectomy, ≥18 years since menopause, age ≥53 years at menopause, any estrogen use since menopause, ≥1 year of estrogen use, use of estrogen vaginal cream since menopause, any use of estrogen plus progestin, maternal history of breast cancer, and full sister with breast cancer.
Lifestyle categories included the following: smoking status (current, former, and never), current smoking, any history of smoking, ≥30 pack-years of smoking, any alcohol consumption, ≥1 drink per day, ≥1 drink per week, relative physical activity, regular exercise, and ≥75 minutes of exercise per week.
Medical Conditions by Self-Report
The following categories were used for medical conditions: prior myocardial infarction (MI), CABG, PTCA, angiography; age at prior MI, CABG, PTCA, and/or angiogram (<60, 60 to 69, and ≥70 years); time elapsed since prior MI, CABG, PTCA, and/or angiogram (0.5 to 1, 1 to 3, and >3 years); cholecystectomy; history of diabetes, gallbladder disease, fracture since menopause, and/or other serious medical conditions; poor or fair overall health; chest pain in previous 4 weeks; at least 2 years of depression; and depression in past year.
Laboratory results were categorized as follows: serum glutamic oxaloacetic transaminase ≥21 mg/dL, glucose ≥99 mg/dL, triglycerides ≥157 mg/dL, triglycerides by quartiles, total cholesterol ≥224 mg/dL, HDL cholesterol ≥49 mg/dL, LDL cholesterol ≥141 mg/dL, and lipoprotein(a) ≥25.3 mg/dL.
Physical Examination Findings
The following physical examination findings were included: hypertension by self-report or physical examination, systolic BP ≥134 mm Hg, diastolic BP ≥72 mm Hg, waist/hip ratio ≥0.87, body mass index ≥27.75, rales in the lungs, jugular venous distension, heart murmur, S3 heart sounds, peripheral edema (absent, trace, and pitting), history of congestive heart failure, and New York Heart Association classification.
ECG findings included were Q-wave MI, atrial fibrillation, and left bundle branch block.
Chart Review Findings
The following chart review findings were included: prerandomization MI, CABG, PTCA, and/or angiogram; time elapsed since prior MI (<2, 2 to 5, and >5 years); left ventricular ejection fraction <40%; and ≥2 severely obstructed coronary arteries.
Study Entry Criteria
The following were taken as study entry criteria: age ≥55 years and no menses for 5 years; follicle-stimulating hormone >40 mIU/mL and no menses for 1 year; and documented MI, CABG, PTCA, and/or coronary artery narrowing.
Medication Use by Inventory
Medication use was categorized as follows: aspirin, ACE inhibitors, β-blockers, calcium antagonists, digitalis, diuretics, l-thyroxine, vitamin K antagonists, any lipid-lowering medication, and any coronary heart disease medication.
Those with a composite risk score greater than or equal to the median (overall and placebo group scores) were included.
HERS was sponsored by Wyeth-Ayerst, and all coauthors received salary support from this company. In addition, Dr Herrington has received research support and occasional honoraria and consulting fees from Wyeth-Ayerst, Pfizer, Lilly, Solvay, and Organon.
- Friedman LM, Furberg CD, DeMets DL. Fundamentals of Clinical Trials. 3rd ed. New York, NY: Springer-Verlag; 1998.
- Furberg CD, Byington RP, for the Beta-Blocker Heart Attack Trial Research Group. What do subgroup analyses reveal about differential response to beta-blocker therapy?: the Beta-Blocker Heart Attack Trial Experience. Circulation. 1983; 67 (suppl I): I-98–I-101.
- Grady D, Applegate W, Bush T, et al. Heart and Estrogen/Progestin Replacement Study (HERS): design, methods and baseline characteristics. Control Clin Trials. 1998; 19: 314–335.
- Hulley S, Grady D, Bush T, et al, for the Heart and Estrogen/Progestin Replacement Study (HERS) Research Group. Randomized trial of estrogen plus progestin for secondary prevention of coronary heart disease in postmenopausal women. JAMA. 1998; 280: 605–613.
- Barrett-Connor E, Grady D. Hormone replacement therapy, heart disease, and other considerations. Annu Rev Public Health. 1998; 19: 55–72.
- Manson JE, Martin KA. Postmenopausal hormone-replacement therapy. N Engl J Med. 2001; 345: 34–40.
- Mosca L, Collins P, Herrington DM, et al. Hormone replacement therapy and cardiovascular disease: a statement for healthcare professionals from the American Heart Association. Circulation. 2001; 104: 499–503.
- Lee KL, McNeer JF, Starmer CF, et al. Clinical judgment and statistics: lessons from a simulated randomized trial in coronary artery disease. Circulation. 1980; 61: 508–515.
- Browner WS, Hulley SB. Effect of risk status on treatment criteria: implications of hypertension trials. Hypertension. 1989; 13 (suppl I): I-51–I-56.
- Aebi S, Schnider TW, Los G, et al. A phase II/pharmacokinetic trial of high-dose progesterone in combination with paclitaxel. Cancer Chemother Pharmacol. 1999; 44: 259–265.
- Haverkate F, Thompson SG, Pyke SDM, et al, for the European Concerted Action on Thrombosis and Disabilities Angina Pectoris Study Group. Production of C-reactive protein and risk of coronary events in stable and unstable angina. Lancet. 1997; 349: 462–466.
- Cushman M, Legault C, Barrett-Connor E, et al. Effect of postmenopausal hormones on inflammation-sensitive proteins: the Postmenopausal Estrogen/Progestin Interventions (PEPI) Study. Circulation. 1999; 100: 717–722.
- Shlipak MG, Simon JA, Vittinghoff E, et al. Estrogen and progestin, lipoprotein(a), and the risk of recurrent coronary heart disease events after menopause. JAMA. 2000; 283: 1845–1852.
- Psaty BM, Smith NL, Lemaitre RN, et al. Hormone replacement therapy, prothrombotic mutations, and the risk of incident nonfatal myocardial infarction in postmenopausal women. JAMA. 2001; 285: 906–913.