(Circulation. 2007;115:1164-1169.)
© 2007 American Heart Association, Inc.
Statistical Primer for Cardiovascular Research
From the Department of Biostatistics, Harvard School of Public Health, Boston, Mass.
Correspondence to Kenneth Stanley, PhD, Department of Biostatistics, Harvard School of Public Health, 651 Huntington Ave, Boston, MA 02115. E-mail kstanley{at}sdac.harvard.edu
Key Words: randomized controlled trials; statistics; trials
| Introduction |
|---|
|
|
|---|
Nothing more clearly indicates the key role of an RCT in modern clinical research than the placement of this specific research method at the top of the list of levels of evidence in evidence-based medicine.1 According to this classification, significant results of an RCT are more definitive than any other type of clinical research information.
The purpose of this article is to present an overview of the design of RCTs. Some of the principles of a high-quality study, such as the use of randomization, placebos, and double-blind designs, are well known. Other principles, such as stratification, use of a decision-making structure, and statistical power, are known by many investigators but are not universally recognized or fully understood. These features, plus others that characterize the design of a high-quality RCT, are discussed. A companion article on the conduct and evaluation of RCTs will appear in a future issue of this journal.
| Clarity of Study Objective |
|---|
|
|
|---|
| Classification by Study Design |
|---|
|
|
|---|
| Classification by Study Objective and Phase |
|---|
|
|
|---|
|
Treatment assignment for phase III trials nearly always uses a randomization mechanism. Although nearly all phase III trials are RCTs, not all randomized trials are phase III trials. The frequency with which randomization is used decreases for phase I and II trials. In addition to ensuring that groups are alike as much as possible, randomization in phase I and II studies is sometimes seen as a fair mechanism to provide patient access to a promising new drug of limited supply.
Although the concept of progression of a drug/intervention through phase I, II, and III trials has served its purpose well for many years, often the progression is not clearly demarcated. For example, phase I/II and phase II/III studies are quite common and may fit clinical needs better than strict adherence to the phase I, II, III progression. Furthermore, with a typical clinical trial gestation period of 1 year, investigators often adopt a multiphase study design to speed the pace of research.
| Equipoise |
|---|
|
|
|---|
| Common Phase III Designs |
|---|
|
|
|---|
|
In the clinical setting in which no prior drug (or intervention) has been established as the standard therapy, the study design for the initial phase III studies would compare a new experimental therapy group to a "no therapy" (eg, placebo control) group (design A). After a drug has been found to be effective and identified as the "standard," subsequent phase III study designs would either compare a "new drug" to the standard (design B) or compare the standard to combination therapy that involves the standard plus the "new drug" (design C). Often the decision to design the study as a head-to-head comparison of the "new drug" with the standard (design B) depends on how promising the new drug appeared to be at the phase II level. New drugs that look promising but are not as potent as the current standard often end up being added to the standard in a combination therapy arm (design C). A sequence of promising but not spectacular drugs that enter a particular disease setting over a period of time often leads to a sequence of 2-, then 3-, then 4-drug combination regimen RCTs.
Other common phase III designs consider issues of timing and switching. The "testing of timing" study design depicts a situation in which the optimal time to initiate therapy is unknown (design D). The study team has selected 2 points in the clinical course of the disease to investigate. Patient entry and randomization are set at the earlier of these points, and patients are randomized to the standard therapy or a "delay" arm. The subsequent trigger point (most often a clinical or laboratory event) on the delay arm would determine the initiation of the standard therapy for that group of patients. A comparison of results for these 2 groups would clarify the advantages of a delay in therapy initiation, if any.
Similarly, the "testing of switching" study design gives a strategy for evaluation of a switch from the standard therapy to a new experimental therapy (design E). All patients would be treated on standard therapy up to a specific point, often a chronological time or a clinical or laboratory event, at which point the patients would enter the study and be randomly assigned to continue the current standard therapy or switch to the new experimental therapy. The value of the switch could then be evaluated by a direct comparison of the 2 randomized groups.
| Randomization |
|---|
|
|
|---|
Although the majority of clinical investigators today are convinced of the benefits of randomization, some disadvantages exist. Many investigators feel that the action of randomization interferes with the doctor-patient relationship. In order to participate in an RCT, clinicians must admit to a patient that it is not known which of the therapies would be best for the patient, which thereby potentially erodes their relationship with that patient. Furthermore, from an ethical perspective, a clinician should believe that these therapies are equivalent with respect to potential patient benefit, a situation many clinicians find uncomfortable.
| Stratified Randomization |
|---|
|
|
|---|
Additional protection against a possible imbalance is easily obtained by the use of a stratified randomization. In stratification, patients are formed into risk groups (strata) based on 1 or more prognostic factors, and a separate randomization is conducted for each stratum. When the treatment assignment groups are then summed over the various strata, the end result is a forced balance of these overall treatment groups according to the factors used to form the strata. Use of stratified randomization should be viewed as an insurance policy against a potential imbalance, and, because it has virtually no cost (ie, no increase in number of patients needed or additional administrative complexity), it should be routinely used in RCTs.
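As an illustration of a separate randomization within each stratum, the following minimal Python sketch uses permuted blocks, which is one common way to force balance. The block size, the arm labels, the stratum names, and the class name `StratifiedRandomizer` are assumptions made for this example and are not taken from the article.

```python
import random
from collections import defaultdict


class StratifiedRandomizer:
    """Permuted-block randomization carried out separately in each stratum.

    Illustrative sketch only: block size and arm labels are assumptions.
    """

    def __init__(self, arms=("A", "B"), block_size=4, seed=None):
        if block_size % len(arms) != 0:
            raise ValueError("block_size must be a multiple of the number of arms")
        self.arms = tuple(arms)
        self.block_size = block_size
        self._rng = random.Random(seed)
        # One queue of not-yet-used assignments per stratum.
        self._pending = defaultdict(list)

    def assign(self, stratum):
        """Return the treatment arm for the next patient entering `stratum`."""
        queue = self._pending[stratum]
        if not queue:
            # Start a new block with equal numbers of each arm, in random order.
            block = list(self.arms) * (self.block_size // len(self.arms))
            self._rng.shuffle(block)
            queue.extend(block)
        return queue.pop()


if __name__ == "__main__":
    randomizer = StratifiedRandomizer(seed=2007)
    for stratum in ["high risk", "low risk", "high risk", "high risk", "high risk"]:
        print(stratum, "->", randomizer.assign(stratum))
```

Because each completed block contains every arm equally often, the arms are balanced within each stratum after every full block, and therefore also balanced when the strata are summed.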
| Selecting the Treatments to be Compared |
|---|
|
|
|---|
| Selection of the Patient Population |
|---|
|
|
|---|
A related contrast is the investigators' option to carefully select a set of patients who are motivated and more likely to adhere to the treatment regimen. Some patient groups are not able to adhere to even a moderately complex treatment program, which thereby dilutes the study. One of the best ways to ensure an efficient clinical trial is to establish a run-in period, and then restrict subsequent patient entry onto the main study to only those who demonstrated that they could adhere to the run-in regimen. This strategy is also effective in identification of patients who will be the least likely to be lost to follow-up.
Nearly all patient populations are a blend of different risk groups. When the primary end point is a time-to-failure type end point (eg, survival), the statistical power depends on the number of observed failures rather than on the number of patients enrolled. For example, consider a completed study of patients with congestive heart failure that used patient survival as its primary end point and had a patient population that could be clearly divided into 2 groups with different risks; call these groups A and B. Assume group A (a high-risk group) comprised 100 patients and that this group experienced 50 deaths. Assume group B (a low-risk group) comprised 200 patients and that this group experienced 10 deaths. Because group A experienced 50 deaths, it provided 5 times (50/10) as much statistical information on mortality as group B. This means that the study results were mainly driven by group A. Even though group B had twice as many patients, its contribution to the study survival results was minimal. Inclusion of low-risk patients in a clinical trial population may not be a good investment of resources.
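The arithmetic of this example can be laid out in a few lines of Python. This is only a sketch: it treats the number of deaths as a rough measure of statistical information, as the paragraph above does, and the function name `shares` is introduced here for illustration.

```python
# Patient and death counts from the congestive heart failure example above.
groups = {
    "A (high risk)": {"patients": 100, "deaths": 50},
    "B (low risk)": {"patients": 200, "deaths": 10},
}


def shares(groups):
    """Return each group's share of enrolled patients and of observed deaths."""
    total_patients = sum(g["patients"] for g in groups.values())
    total_deaths = sum(g["deaths"] for g in groups.values())
    return {
        name: (g["patients"] / total_patients, g["deaths"] / total_deaths)
        for name, g in groups.items()
    }


if __name__ == "__main__":
    for name, (patient_share, death_share) in shares(groups).items():
        print(f"{name}: {patient_share:.0%} of patients, {death_share:.0%} of deaths")
```

Group A makes up one third of the patients but contributes five sixths of the deaths, which is why it dominates the survival comparison.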
| Placebos and Double-Blind Designs |
|---|
|
|
|---|
| Primary End Point |
|---|
|
|
|---|
Use of composite end points is common in high-quality cardiovascular RCTs. Composite end points are necessary because a number of clinical events, such as a nonfatal myocardial infarction or stroke, may indicate a clinical failure, whereas the selection of only 1 type of clinical event as the end point will not present a comprehensive clinical picture. However, care must be taken when a composite end point is defined to ensure that the clinical failure events include the events of interest as well as "anything worse." For example, consider an RCT that compares 2 treatments for patients with congestive heart failure and the composite end point "nonfatal myocardial infarction or stroke." If there were more deaths on 1 of the 2 treatment arms, then deaths may have prevented the observation of either a nonfatal myocardial infarction or stroke and thus artificially made the arm with more deaths appear better. In this example, one can avoid such interpretation difficulties by inclusion of "death" in the definition of the composite end point.
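The pitfall described above can be made concrete with a small Python sketch. The patient counts below are entirely hypothetical and chosen only to show the effect; the arm labels "X" and "Y" and the names `Patient` and `failure_rate` are likewise assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    arm: str
    nonfatal_mi: bool = False
    stroke: bool = False
    died: bool = False


def failure_rate(patients, arm, include_death):
    """Proportion of patients on `arm` who reached the composite end point."""
    group = [p for p in patients if p.arm == arm]
    failures = sum(
        1 for p in group
        if p.nonfatal_mi or p.stroke or (include_death and p.died)
    )
    return failures / len(group)


# Hypothetical cohort, 10 patients per arm. Arm X has 4 deaths; arm Y has none.
cohort = (
    [Patient("X", nonfatal_mi=True)] * 2
    + [Patient("X", died=True)] * 4
    + [Patient("X")] * 4
    + [Patient("Y", nonfatal_mi=True)] * 3
    + [Patient("Y", stroke=True)] * 1
    + [Patient("Y")] * 6
)

if __name__ == "__main__":
    for include_death in (False, True):
        label = "with death" if include_death else "without death"
        x = failure_rate(cohort, "X", include_death)
        y = failure_rate(cohort, "Y", include_death)
        print(f"Composite {label}: arm X {x:.0%}, arm Y {y:.0%}")
```

With death excluded from the composite, arm X looks better even though it has more deaths; once death is included, the ordering reverses.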
| Sample Size and Statistical Power |
|---|
|
|
|---|
A definitive study result is achieved by placement of a mathematical decision-making structure on the clinical trial in the study development phase. For RCTs, the basic mathematical structure involves (1) identification of the primary end point and the main objective of the trial, (2) formulation of the trial objective as a hypothesis to be tested, (3) specification of the medically important difference the study is designed to detect, (4) identification of the magnitude of the errors that are acceptable (ie, the desired precision of the trial), and (5) calculation of the sample size necessary to achieve this desired precision. As noted in the Table, the typical size of a phase III RCT is 100 to 1000 patients. The sample size determination method outlined below is appropriate for RCTs. Different approaches are used for phase I and II trials.
As an example that uses the most common type of end point seen in RCTs, consider a trial that compares 2 treatments, A and B, with respect to the proportion of successes observed in each treatment group, denoted $P_A$ and $P_B$. In a randomized trial that compares these 2 treatments, we test the null hypothesis ($H_0$: $P_A - P_B = 0$) that the 2 treatments yield equivalent results versus the alternative hypothesis ($H_A$: $P_A - P_B \neq 0$) that the treatments yield different results.
The study is conducted in an attempt to gather sufficient evidence to show that the null hypothesis is incorrect. Samples of patients are selected and the estimated difference in proportions is calculated. The key question is: How far from 0 does this estimate of $P_A - P_B$ need to be before we have sufficient evidence to say the treatments are different? To answer this question, we formulate the problem in statistical terms, with $\alpha$ as the probability of a conclusion that the treatments are different when in fact they are really equivalent (type I error), and with $\beta$ as the probability of a conclusion that the treatments are not different when in fact they are different (type II error). For RCTs, traditionally the $\alpha$ level is set to be 0.05. The $\beta$ level is most often set to 0.20 or 0.10 and is often stated as the power level ($1 - \beta$) for the study.
Let $\Delta$ be the difference in the primary end point between the 2 treatment groups that the study is designed to detect, that is, the medically important difference. Therefore, $\Delta$ is the difference, $P_A - P_B$, considered to be both medically significant and biologically plausible. Any smaller difference is considered to be too small to be worth detection and not medically important. Any larger difference is considered to be biologically implausible; it is quite unlikely that there will be such a large difference between these 2 treatments. With $\alpha$, $\beta$, and $\Delta$ specified, statistical methods can be used to calculate the sample size necessary to provide the desired precision. Numerous Web sites are available for these calculations, depending on the type of primary end point.2–4 Although the premature loss of cases to follow-up weakens the quality of a clinical trial, it is a fact of life for nearly all long-term studies. Sample size for clinical trials should be adjusted to take into account the anticipated proportion of cases lost to follow-up.
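To make the calculation concrete, the sketch below computes a sample size for comparing 2 proportions with a 2-sided test. It assumes the widely used normal-approximation formula, which may differ from the methods implemented by the Web sites cited above. The function names `n_per_group` and `inflate_for_loss`, the simple inflation rule for loss to follow-up, and the example rates are assumptions introduced for illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group(p_a, p_b, alpha=0.05, power=0.80):
    """Patients needed per arm to detect p_a versus p_b (2-sided test).

    Normal-approximation formula, assumed here for illustration:
    n = (z_{1-alpha/2} * sqrt(2*p_bar*(1-p_bar))
         + z_{power} * sqrt(p_a*(1-p_a) + p_b*(1-p_b)))**2 / delta**2
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # type I error, 2-sided
    z_beta = NormalDist().inv_cdf(power)           # power = 1 - beta
    p_bar = (p_a + p_b) / 2
    delta = abs(p_a - p_b)                         # medically important difference
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p_a * (1 - p_a) + p_b * (1 - p_b))
    ) ** 2
    return ceil(numerator / delta ** 2)


def inflate_for_loss(n, loss_fraction):
    """Enlarge n so that about n patients remain after the anticipated loss."""
    if not 0 <= loss_fraction < 1:
        raise ValueError("loss_fraction must be in [0, 1)")
    return ceil(n / (1 - loss_fraction))


if __name__ == "__main__":
    # Hypothetical planning values: 20% versus 10%, alpha 0.05, power 80%.
    n = n_per_group(0.20, 0.10, alpha=0.05, power=0.80)
    print("Per arm:", n)
    print("Per arm allowing for 10% loss to follow-up:", inflate_for_loss(n, 0.10))
```

Other formulas (for example, with a continuity correction) and other allowances for loss to follow-up give somewhat different totals, so a result from this sketch should not be expected to match a published power statement exactly.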
It is useful to review the "power statement" in the published report of an RCT. This statement, most often found in the statistical methods paragraph of the methods section, will specify (1) the original primary end point, (2) the medically important difference $\Delta$ the study was designed to detect, (3) the size of type I error $\alpha$ (usually 0.05), (4) the power (usually 0.80 or 0.90) or $\beta$, and (5) the sample size necessary to achieve this desired precision.
For example, consider the RCT by Dawkins et al that appeared in a recent issue of Circulation.5 On page 3307 one finds that these authors have identified the rate of ischemia-driven target-vessel revascularization at 9 months to be their primary end point, their $\Delta$ to be the change from a 20% control rate to a 10% rate in the treatment group, their $\alpha$ to be 0.05, their power to be 80%, and their sample size to be N=448. Comparison of the power statement with the observed results from this article allows one to see that the prior planning for this study was well done. The abstract reports an observed control rate of 19.4% and an observed treatment group rate of 9.1%.
| Need for Rapid Enrollment |
|---|
|
|
|---|
| Difference Versus Equivalence Trials |
|---|
|
|
|---|
With the common difference trial, the investigators conclude a difference has been demonstrated if they observe a P value <0.05. A series of successful difference trials will thus move medical science forward with a series of improvements in the standard therapy.
An equivalence trial tries to demonstrate similarity between a new treatment and standard therapy. This is most often done to show that a less expensive or less toxic new treatment has clinical benefit very similar to that of the standard therapy. Equivalence trials are sometimes used by a pharmaceutical manufacturer when attempts are made to license a drug in a disease setting that already has 1 or more licensed drugs.
Many researchers who have planned a noninferiority trial, however, do not correctly present their results. The noninferiority design concept is a 1-directional concept. Either the new treatment is inferior to the standard therapy or it is not: a yes versus no type of decision. Statistical procedures for an equivalence trial should focus on that unidirectional decision with 1-sided tests, P values, and confidence intervals. Readers are referred to the COBALT (Continuous Infusion vs Double-Bolus Administration of Alteplase) study6 and the accompanying editorial7 for an example of how an equivalence trial should be reported.
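The 1-sided decision can be sketched in Python as follows. This is an illustration under stated assumptions, not the method used in the COBALT report: it uses a simple Wald (normal-approximation) lower confidence bound for the difference in success proportions, a 1-sided level of 0.05, and a prespecified noninferiority margin, and the counts in the example are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def lower_bound_difference(x_new, n_new, x_std, n_std, alpha=0.05):
    """1-sided lower confidence bound for p_new - p_std (Wald approximation)."""
    p_new = x_new / n_new
    p_std = x_std / n_std
    se = sqrt(p_new * (1 - p_new) / n_new + p_std * (1 - p_std) / n_std)
    z = NormalDist().inv_cdf(1 - alpha)  # 1-sided critical value
    return (p_new - p_std) - z * se


def is_noninferior(x_new, n_new, x_std, n_std, margin, alpha=0.05):
    """Yes/no decision: the new treatment is declared noninferior if the
    lower bound for p_new - p_std lies above -margin."""
    return lower_bound_difference(x_new, n_new, x_std, n_std, alpha) > -margin


if __name__ == "__main__":
    # Hypothetical data: 170/200 successes on new treatment, 172/200 on standard.
    bound = lower_bound_difference(170, 200, 172, 200)
    print(f"1-sided lower bound for the difference: {bound:.3f}")
    print("Noninferior with margin 0.10:", is_noninferior(170, 200, 172, 200, 0.10))
    print("Noninferior with margin 0.05:", is_noninferior(170, 200, 172, 200, 0.05))
```

Only the lower bound enters the decision, which reflects the 1-directional nature of the question; the choice of margin is a clinical judgment that must be fixed before the data are examined.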
| Study Monitoring |
|---|
|
|
|---|
A second mechanism is a Data and Safety Monitoring Board (DSMB), also known as a Data Monitoring Committee (DMC). This is an independent committee established to assess at regularly scheduled intervals the progress of an RCT, regarding enrollment, safety data, data quality, and the critical efficacy end points, as well as the continuing validity and scientific merit of the trial.8 Because the DSMB/DMC is entirely independent of the clinicians who are participating in the study, it can ensure patient safety and study validity without compromise or bias of the study. Good study design and periodic monitoring also help the investigation maintain appropriate ethical standards. The ability of investigators to monitor and evaluate ongoing clinical trials has improved markedly with the recent initiative by many medical journals to require the registration of a clinical trial in a public trials registry as a condition for consideration of publication.9
| Acknowledgments |
|---|
None.
| References |
|---|
|
|
|---|
2. Brant R. Inference for Proportions: Comparing Two Independent Samples. Available at: http://newton.stat.ubc.ca/~rollin/stats/ssize/b2.html. Accessed January 2, 2006.
3. Brant R. Inference for Means: Comparing Two Independent Samples. Available at: http://newton.stat.ubc.ca/~rollin/stats/ssize/n2.html. Accessed January 2, 2006.
4. Schoenfeld D. Find Statistical Considerations for a Study Where the Outcome Is a Time to Failure. Available at: http://hedwig.mgh.harvard.edu/sample_size/quan_measur/para_time.html. Accessed January 2, 2006.
5. Dawkins KD, Grube E, Guagliumi G, Banning AP, Zmudka K, Colombo A, Thuesen L, Hauptman K, Marco J, Wijns W, Popma JJ, Koglin J, Russell ME; TAXUS VI Investigators. Clinical efficacy of polymer-based paclitaxel-eluting stents in the treatment of complex, long coronary artery lesions from a multicenter, randomized trial: support for the use of drug-eluting stents in contemporary clinical practice. Circulation. 2005; 112: 3306–3313.
6. The Continuous Infusion versus Double-Bolus Administration of Alteplase (COBALT) Investigators. A comparison of continuous infusion of alteplase with double-bolus administration for acute myocardial infarction. N Engl J Med. 1997; 337: 1124–1130.
7. Ware JH, Antman EM. Equivalence trials [editorial]. N Engl J Med. 1997; 337: 1159–1161.
8. Food and Drug Administration. Guidance for Clinical Trial Sponsors on the Establishment and Operation of Clinical Trial Data Monitoring Committees. Available at: http://www.fda.gov/cber/gdlns/clindatmon.htm. Accessed January 2, 2006.
9. DeAngelis C, Drazen JM, Frizelle FA, Haug C, Hoey J, Horton R, Kotzin S, Laine C, Marusic A, Overbeke AJPM, Schroeder TV, Sox HC, Van Der Weyden MB. Clinical trial registration: a statement from the International Committee of Medical Journal Editors [editorial]. N Engl J Med. 2004; 351: 1250–1251.