(Circulation. 2008;117:1238-1243.)
© 2008 American Heart Association, Inc.
Statistical Primer for Cardiovascular Research
From the Department of Biostatistics, Boston University School of Public Health, Boston, Mass.
Correspondence to Lisa M. Sullivan, PhD, Boston University School of Public Health, Department of Biostatistics, 715 Albany St, Boston, MA 02118. E-mail lsull@bu.edu
Key Words: biostatistics; measurement; research design; statistical data analysis
| Introduction |
|---|
Repeated-measures analysis encompasses a spectrum of applications, the simplest of which is a generalization of the paired t test.1 A repeated-measures within-subjects design can be thought of as an extension of the paired t test that involves ≥3 assessments in the same experimental unit. Repeated-measures analysis can also handle more complex, higher-order designs with within-subject components and multifactor between-subjects components. The focus here is on within-subjects designs.
| Design Issues |
|---|
The goal of the analysis is to compare responses among the 4 treatments. If the dependent or outcome variable is continuous, this test is performed with ANOVA.3 If the outcome variable is categorical, this test is performed with a χ² test.4 These tests are based on the assumption that the measurements within and across treatments are independent or unrelated. If the experimental units are unrelated (ie, not family members or littermates), and 1 measurement has been made per unit, then this assumption is reasonable.
In contrast, in a repeated-measures design, multiple measurements are taken on each experimental unit. Consider again the application described above in which the goal of the analysis is to compare the 4 competing treatments. A repeated-measures design could involve 5 animals, each measured 4 times, once under each experimental condition. The repeated-measures design involves a smaller number of animals, which is both efficient and ethically appealing.
If 5 animals are measured under each of 4 different experimental conditions, a total of 20 measurements will again be available for analysis. The 20 measurements, however, are not independent but are related within the subjects. Because the measurements might be affected by within-subject characteristics (eg, age or genetic factors), statistical tests that properly account for within-subject correlation are needed. If we assume that measurements taken in the same individual are correlated, the test for a difference in treatments will involve a smaller residual or error variance than that based on a completely randomized design, thereby increasing precision in the analysis.
A randomized block design is one in which a set of experimental units are organized into homogeneous groups or blocks on the basis of a characteristic assumed to affect the outcome. The goal is to have r replicates of each of k treatments in each of b blocks, with the total sample size n=kbr. Consider again the study comparing 4 competing treatments (k=4). Suppose the outcome of interest is known to be affected by age. With n=20 independent experimental units, these might be organized into 5 age groups (eg, quintiles of age) with 1 replication per group (k=4, b=5, r=1). In a randomized block design, experimental units within each block are randomly assigned to treatments, and this technique reduces variation due to differences in age. The design can be thought of as replications of a completely randomized experiment in which there are as many replications as there are blocks.
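The allocation step of a randomized block design can be sketched in a few lines of Python. This is only an illustration, assuming k=4 treatments, b=5 blocks, and r=1 replicate as in the text; the treatment labels, the function name `randomize_blocks`, and the seed are hypothetical and do not come from the study.

```python
import random


def randomize_blocks(treatments, n_blocks, replicates=1, seed=None):
    """Randomly assign treatments to experimental units within each block.

    Returns one list per block; each list holds k * r treatment labels in
    random order, one label per experimental unit in that block.
    """
    rng = random.Random(seed)
    design = []
    for _ in range(n_blocks):
        # Every block receives each treatment exactly `replicates` times.
        units = list(treatments) * replicates
        # Randomization is restricted to the units inside this block.
        rng.shuffle(units)
        design.append(units)
    return design


treatments = ["A", "B", "C", "D"]  # k = 4, hypothetical labels
design = randomize_blocks(treatments, n_blocks=5, replicates=1, seed=2008)

n_total = sum(len(block) for block in design)  # n = k * b * r
for i, block in enumerate(design, start=1):
    print(f"block {i} (eg, age quintile {i}): {block}")
print("total experimental units:", n_total)
```

Because each block contains every treatment once, differences between blocks (here, age groups) cannot be confounded with differences between treatments.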
Some repeated-measures designs can be viewed as a special case of the randomized block design in which the block is the individual experimental unit (eg, person or animal). The randomized block design is often used with siblings or littermates. The family unit is the block, and assessments are repeated on each member of the family. The assessments within a family or litter are related. Accounting for the dependencies within the block results in a more precise test of treatment differences.
| Repeated-Measures Analysis |
|---|
Example 1
An animal study is performed to assess transgene activation in the heart. The primary outcome is percent fractional shortening, which is measured at baseline and again after 2, 4, and 6 weeks of treatment. A final assessment is made after 6 weeks of treatment and 2 weeks off treatment. At baseline, the mice were 12 weeks of age; thus, at subsequent assessments, they were 14, 16, 18, and 20 weeks of age. Three mice completed the protocol, and the data on percent fractional shortening measured at each time point are shown in Table 1. The research hypothesis is that the mean percent fractional shortening scores are different over time. The means at each time point are shown in the bottom row of Table 1 and decrease over time. To test for a significant difference in means over time, a repeated-measures ANOVA is used. The results of the repeated-measures ANOVA are contained in Table 2. The test statistic for equality of means over time is F=95.4 (df=4,8), which is highly statistically significant at P<0.0001. Thus, a highly statistically significant difference exists in the mean percent fractional shortening over time.
Suppose that these same data were incorrectly analyzed as if they were derived from a completely randomized design. The results of the ANOVA, testing for a difference in means over time, are contained in Table 3. If the data are treated incorrectly as 15 independent observations and analyzed with ANOVA, the F statistic is F=45.1 (df=4,10), which is still highly statistically significant. Notice the difference in the error or residual variation between methods. The denominator of the F statistic for testing differences in means over time is the mean square error. In the repeated-measures ANOVA, the mean square error is 6.1 compared with 12.9 in the ANOVA that assumed independence. In this particular example, the incorrect analysis still produced a significant result. In other applications, failure to account for the dependencies among observations could result in a nonsignificant finding. The repeated-measures ANOVA that appropriately accounts for dependencies in the data produces a more precise test.
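The difference between the two analyses comes down to how the total variation is partitioned. The following Python sketch, offered as an illustration rather than a reproduction of the published tables, computes both F statistics for a one-factor within-subjects layout. The data values are hypothetical (they are not the values in Table 1), and the function name `anova_tables` is an assumption made for this example.

```python
def anova_tables(data):
    """Compare a repeated-measures ANOVA with an ANOVA that assumes independence.

    data: list of subjects, each a list of k repeated measurements.
    """
    n = len(data)      # number of subjects
    k = len(data[0])   # number of time points (or conditions)
    grand = sum(sum(row) for row in data) / (n * k)
    time_means = [sum(row[j] for row in data) / n for j in range(k)]
    subj_means = [sum(row) / k for row in data]

    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_time = n * sum((m - grand) ** 2 for m in time_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)

    # Repeated measures: between-subject variation is removed from the error.
    ss_err_rm = ss_total - ss_time - ss_subj
    # Independence assumed: between-subject variation stays in the error.
    ss_err_cr = ss_total - ss_time

    df_time = k - 1
    df_rm = (n - 1) * (k - 1)
    df_cr = n * k - k
    ms_time = ss_time / df_time
    mse_rm = ss_err_rm / df_rm
    mse_cr = ss_err_cr / df_cr
    return {
        "repeated_measures": {"df": (df_time, df_rm), "mse": mse_rm, "F": ms_time / mse_rm},
        "completely_randomized": {"df": (df_time, df_cr), "mse": mse_cr, "F": ms_time / mse_cr},
    }


# Hypothetical percent fractional shortening: 3 animals x 5 time points.
hypothetical_fs = [
    [50, 44, 36, 28, 25],
    [56, 49, 42, 33, 29],
    [47, 42, 33, 27, 22],
]

results = anova_tables(hypothetical_fs)
for name, res in results.items():
    print(f"{name}: df={res['df']}, MSE={res['mse']:.2f}, F={res['F']:.1f}")
```

With 3 subjects and 5 time points, the sketch yields the same degrees of freedom as those reported above (4,8 and 4,10). Both F statistics share the same numerator, the mean square for time; this is consistent with the reported values, because 95.4 × 6.1 and 45.1 × 12.9 are both approximately 582.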
If a significant difference is found, it may be of interest to test for differences between pairs of treatments or, in example 1, between specific time points. These tests should be handled with a multiple comparison procedure that again appropriately handles the correlation in the data and also controls the type I error rate (see Larson3 and Cabral5 for details).
| Repeated-Measures Analysis With Repeated Measures on 1 Factor |
|---|
These data can be analyzed in several different ways. An important issue is the appropriate specification of the correlations between measurements in the same person, called the covariance structure. Most statistical computing packages offer a variety of covariance structures for these types of analysis, and the covariances must be modeled correctly. Three structures are very popular and fit many applications.
The first is called "compound symmetry" and assumes that the correlations between all pairs of measures are the same. This may be reasonable for a repeated-measures study in which each subject is measured under k different experimental conditions. The second is called "autoregressive of order 1," or AR(1), and assumes that the correlations between adjacent pairs are greater than the correlations between more distant pairs. This may be reasonable for data measured serially in time, whereby more proximal measures are more highly correlated than measures taken more distantly in time. For this structure, the time points should be approximately equally spaced in time. A third popular structure is called "unstructured," and as the name implies, it assumes that each pair of measurements has its own correlation. Although this last structure might seem appealing, it actually produces a less powerful analysis because the data must be used first to estimate the correlation structure and then to perform the primary analyses.
Some statistical computing packages (eg, SAS, SAS Institute, Cary, NC) offer metrics to determine which structure best fits the data. One such measure is the Akaike information criterion, for which smaller values indicate a better fit. As in all statistical analyses, it is important to plan and implement parsimonious models that are biologically sensible. An example of a two-factor ANOVA with repeated measures on 1 factor is contained in example 2.
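To make the three covariance structures concrete, the following Python sketch builds the compound symmetry and AR(1) matrices for k=4 equally spaced measurements and counts the parameters each structure requires. The variance (1.0) and correlation (0.6) are arbitrary illustrative values, not estimates from any data set, and the function names are assumptions made for this example.

```python
def compound_symmetry(k, sigma2, rho):
    """Equal variance on the diagonal, one common correlation everywhere else."""
    return [[sigma2 * (1.0 if i == j else rho) for j in range(k)] for i in range(k)]


def ar1(k, sigma2, rho):
    """Correlation decays as rho**lag, where lag is the distance between time points."""
    return [[sigma2 * rho ** abs(i - j) for j in range(k)] for i in range(k)]


def n_params_unstructured(k):
    """Unstructured: one variance per time point plus one covariance per pair."""
    return k * (k + 1) // 2


k = 4
cs = compound_symmetry(k, sigma2=1.0, rho=0.6)  # illustrative values only
ar = ar1(k, sigma2=1.0, rho=0.6)

print("Compound symmetry (2 parameters):")
for row in cs:
    print(["%.3f" % v for v in row])

print("AR(1) (2 parameters):")
for row in ar:
    print(["%.3f" % v for v in row])

print("Unstructured parameters for k=4:", n_params_unstructured(k))
```

Compound symmetry and AR(1) each require only a variance and a single correlation parameter, whereas the unstructured form requires k(k+1)/2 parameters. With small samples, estimating that many quantities from the same data is one way to understand the loss of power described above.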
Example 2
A randomized, placebo-controlled study is performed to estimate the short-term effects of an antihypertensive medication on systolic blood pressure. Subjects are randomly assigned to receive either the treatment or a placebo. Systolic blood pressures are measured before the first dose of treatment is administered (baseline) and again at 2, 4, and 6 weeks after the initiation of treatment (or placebo). The study involves 6 participants, 3 of whom are randomly assigned to each treatment arm; the data on systolic blood pressure measured at each time point are shown in Table 4. The research hypothesis is that the mean systolic blood pressures are different between treatments.
Figure 3 displays the mean systolic blood pressures over time for participants undergoing treatment and given a placebo. The mean systolic blood pressures decreased over time in both groups, with a sharper decrease in the treatment group. The results of the two-factor ANOVA with repeated measures are contained in Table 5. The test statistic for equality of treatment means over time is F=36.1 (df=1,4), which is highly statistically significant at P=0.0039. Thus, a highly statistically significant difference is present in mean systolic blood pressures between patients given the antihypertensive medication and those given placebo. The test for a difference in mean systolic blood pressures over time is also highly statistically significant [F=27.1 (df=3,12), P=0.0001]. The test for the interaction between treatment and time is marginally significant [F=3.2 (df=3,12), P=0.0626]. This test assesses the homogeneity of the difference in mean blood pressures between the treatment and placebo groups over time. Figure 3 shows that the difference in means is widening over time, which is driving the test for interaction to approach statistical significance.
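The degrees of freedom quoted for the three tests follow directly from the layout of the study. The short Python sketch below shows the bookkeeping, assuming a balanced design with one between-subjects factor and one within-subjects factor; the function name `two_factor_rm_df` is hypothetical.

```python
def two_factor_rm_df(n_groups, n_per_group, n_times):
    """Degrees of freedom for a balanced two-factor ANOVA with repeated measures on 1 factor.

    Each value is a (numerator df, denominator df) pair for the F test.
    """
    df_group = n_groups - 1                           # treatment vs placebo
    df_subj_within = n_groups * (n_per_group - 1)     # error term for the treatment test
    df_time = n_times - 1                             # baseline, 2, 4, and 6 weeks
    df_interaction = df_group * df_time
    df_within_error = df_subj_within * df_time        # error term for time and interaction
    return {
        "treatment": (df_group, df_subj_within),
        "time": (df_time, df_within_error),
        "treatment_x_time": (df_interaction, df_within_error),
    }


# Example 2: 2 treatment arms, 3 participants per arm, 4 time points.
df = two_factor_rm_df(n_groups=2, n_per_group=3, n_times=4)
for effect, (num, den) in df.items():
    print(f"{effect}: df={num},{den}")
```

For example 2, this gives df=1,4 for treatment and df=3,12 for both time and the treatment-by-time interaction, matching the values reported above. The treatment effect is tested against variation between subjects, so it has far fewer error degrees of freedom than the within-subject effects.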
The analyses reported in Table 5 assume equal correlations between measurements (ie, compound symmetry). An alternative analysis for these data would assume an autoregressive correlation structure, in which correlations between measures taken closer together in time are higher than those between measures taken further apart. If we assume an AR(1) covariance structure, the test statistic for equality of treatment means over time is F=12.7, which is significant at P=0.0235. The test for a difference in mean systolic blood pressures over time is highly statistically significant (F=30.4, P=0.0001), and the test for the interaction between treatment and time is significant (F=3.7, P=0.0423). The Akaike information criterion is 102.7 for the compound symmetry model and 98.0 for the AR(1) model. Because smaller values indicate better fit, the AR(1) model is a better choice for these data.
Estimates of treatment effect are provided in Table 6 for both the model that assumes compound symmetry and the model that assumes an AR(1) covariance structure. Notice that the estimates of the treatment effect are the same; however, the standard errors are different, which affects the significance of the difference.
| Alternative Approaches to Analysis of Repeated-Measures Data |
|---|
| Assumptions and Analytic Details |
|---|
A number of statistical computing packages are available that offer procedures for repeated-measures ANOVA. Within these packages, several options are available for conducting the tests. SAS, for example, offers several procedures that can handle repeated-measures data. Careful attention must be paid to the data layout, the specification of factors (eg, as fixed or repeated), the appropriate error terms for test statistics, and the nature of the correlations between observations measured in the same individual (ie, the covariance structure). Littell et al7,8 provide a detailed approach to using SAS for repeated-measures analysis.
| Acknowledgments |
|---|
None.
| References |
|---|
2. Verberk WJ, Kroon AA, Kessels AGH, Nelemans PJ, VanRee JW, Lenders JWM, Thien T, Bakx JC, VanMontfrans GA, Smit AJ, Beltman FW, DeLeeuw PW. Comparison of randomization techniques for clinical trials with data from the HOMERUS-trial. Blood Pressure. 2005; 14: 306–314.
3. Larson MG. Analysis of variance. Circulation. 2008; 117: 115–121.
4. D'Agostino RB, Sullivan LM, Beiser AS. Introductory Applied Biostatistics. Belmont, Calif: Brooks/Cole; 2004.
5. Cabral HJ. Multiple comparisons procedures. Circulation. 2008; 117: 698–705.
6. Kleinbaum DG, Kupper LL, Muller KE. Applied Regression Analysis and Other Multivariable Methods. 2nd ed. Boston, Mass: PWS-Kent; 1988.
7. Littell RC, Henry PR, Ammerman CB. Statistical analysis of repeated measures data using SAS procedures. J Anim Sci. 1998; 76: 1216–1231.
8. Littell RC, Milliken GA, Stroup WW, Wolfinger RD. SAS System for Mixed Models. Cary, NC: SAS Institute Inc; 1996.