Circulation. 2008;117:1238-1243
doi: 10.1161/CIRCULATIONAHA.107.654350
© 2008 American Heart Association, Inc.
Statistical Primer for Cardiovascular Research
Repeated Measures
Lisa M. Sullivan, PhD
From the Department of Biostatistics, Boston University School of Public Health, Boston, Mass.
Correspondence to Lisa M. Sullivan, PhD, Boston University School of Public Health, Department of Biostatistics, 715 Albany St, Boston, MA 02118. E-mail lsull@bu.edu
Key Words: biostatistics measurement research design statistical data analysis
Introduction
A repeated-measures design is one in which multiple, or repeated,
measurements are made on each experimental unit. The experimental
unit could be a person or an animal, and repeated measurements
might be taken serially in time, such as in weekly systolic
blood pressures or monthly weights. The repeated assessments
might be measured under different experimental conditions. Repeated
measurements on the same experimental unit can also be taken
at a point in time. For example, it might be of interest to
measure the diameter of each of several lesions within each
person or animal in a study. The dependency, or correlation,
among responses measured in the same individual is the defining
feature of a repeated-measures design. This correlation necessitates
a statistical analysis that appropriately accounts for the dependency
among measurements within the same experimental unit, which
results in a more precise and powerful statistical analysis.
Repeated-measures analysis encompasses a spectrum of applications, which in the simplest case is a generalization of the paired t test.1 A repeated-measures within-subjects design can be thought of as an extension of the paired t test that involves 3 or more assessments in the same experimental unit. Repeated-measures analysis can also handle more complex, higher-order designs with within-subject components and multifactor between-subjects components. The focus here is on within-subjects designs.
Design Issues
A completely randomized design is one in which each experimental
unit (eg, person or animal) is assigned randomly to 1 of several
competing treatments. For example, a study is proposed to compare
4 treatments (eg, a control and 3 distinct active treatments
or a control and 3 different doses of the same treatment), and
a sample of 20 animals is randomized to the 4 treatments. The
randomization could be implemented by 1 of a number of possible
techniques ranging from a simple randomization (in which a single
sequence of the numbers 1 through 4 is produced, and animals
are assigned according to the sequence) to a more involved randomization
that uses stratification or permuted blocks.2 The permuted blocks
strategy is used to ensure balance in the randomization process
such that equal numbers of animals are assigned to each treatment.
This strategy can be designed to ensure balance at specified
enrollment points, for example, balance among the 4 treatments
after randomization of 8 (2 per treatment) or 12 (3 per treatment)
experimental units. This strategy is generally used when enrollment
into a study occurs over time. With the permuted blocks strategy,
5 animals would be randomly assigned to each treatment in the
present example.
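The permuted blocks strategy described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used in any particular study: the function name `permuted_block_schedule`, the treatment labels, and the fixed seed are assumptions made for the example, and the block size is taken to equal the number of treatments.

```python
import random

def permuted_block_schedule(n_units, treatments, seed=None):
    """Build a treatment assignment list from permuted blocks.

    Each block contains every treatment exactly once, in random order,
    so the allocation is balanced after every len(treatments) units.
    """
    block_size = len(treatments)
    if n_units % block_size != 0:
        raise ValueError("n_units must be a multiple of the number of treatments")
    rng = random.Random(seed)  # seeded only to make the sketch reproducible
    schedule = []
    for _ in range(n_units // block_size):
        block = list(treatments)
        rng.shuffle(block)      # random order within the block
        schedule.extend(block)
    return schedule

# Hypothetical labels: a control and 3 active treatments, 20 animals.
arms = ["control", "A", "B", "C"]
schedule = permuted_block_schedule(20, arms, seed=1)
print({arm: schedule.count(arm) for arm in arms})          # 5 animals per arm
print({arm: schedule[:8].count(arm) for arm in arms})      # 2 per arm after 8
```

With blocks of 4, balance holds at every multiple of 4 enrolled units, which covers the 8- and 12-unit checkpoints mentioned in the text.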
The goal of the analysis is to compare responses among the 4 treatments. If the dependent or outcome variable is continuous, this test is performed with ANOVA.3 If the outcome variable is categorical, this test is performed with a chi-square (χ²) test.4 These tests are based on the assumption that the measurements within and across treatments are independent or unrelated. If the experimental units are unrelated (ie, not family members or littermates), and 1 measurement has been made per unit, then this assumption is reasonable.
In contrast, in a repeated-measures design, multiple measurements are taken on each experimental unit. Consider again the application described above in which the goal of the analysis is to compare the 4 competing treatments. A repeated-measures design could involve 5 animals, each measured 4 times, once under each experimental condition. The repeated-measures design involves a smaller number of animals, which is both efficient and ethically appealing.
If 5 animals are measured under each of 4 different experimental conditions, a total of 20 measurements will again be available for analysis. The 20 measurements, however, are not independent but are related within the subjects. Because the measurements might be affected by within-subject characteristics (eg, age or genetic factors), statistical tests that properly account for within-subject correlation are needed. If we assume that measurements taken in the same individual are correlated, the test for a difference in treatments will involve a smaller residual or error variance than that based on a completely randomized design, thereby increasing precision in the analysis.
A randomized block design is one in which a set of experimental units is organized into homogeneous groups or blocks on the basis of a characteristic assumed to affect the outcome. The goal is to have r replicates of each of k treatments in each of b blocks, with the total sample size n=kbr. Consider again the study comparing 4 competing treatments (k=4). Suppose the outcome of interest is known to be affected by age. With n=20 independent experimental units, these might be organized into 5 age groups (eg, quintiles of age) with 1 replication per group (k=4, b=5, r=1). In a randomized block design, experimental units within each block are randomly assigned to treatments, and this technique reduces variation due to differences in age. The design can be thought of as replications of a completely randomized experiment in which there are as many replications as there are blocks.
Some repeated-measures designs can be viewed as a special case of the randomized block design in which the block is the individual experimental unit (eg, person or animal). The randomized block design is often used with siblings or littermates. The family unit is the block, and assessments are repeated on each member of the family. The assessments within a family or litter are related. Accounting for the dependencies within the block results in a more precise test of treatment differences.
Repeated-Measures Analysis
Repeated-measures analysis can be used to assess changes over
time in an outcome measured serially or to test for differences
in 1 or more treatments based on repeated assessments in the
same subjects. The simplest application has 1 within-subjects
factor (eg, each of n subjects is measured under k distinct
experimental treatments), and the goal of the analysis is to
test for a difference in experimental treatments. This is achieved
by constructing a test statistic as the ratio of the variance
due to the treatments to the residual or error variance. In
repeated-measures analysis, the total variance can be partitioned
into variance between subjects and variance within subjects.
Variance between subjects reflects individual subject differences.
Variance within subjects consists of 2 components, differences
between treatments and error or residual variation. The test
statistic for testing the null hypothesis of equality of means
is the ratio of the variation due to treatments to the residual
variation, after between-subject variation has been removed.
The components of variance and the test statistic are illustrated
in Figure 1. The details of the computations are illustrated
in example 1.
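The partition of variance described above can be sketched in Python. This is a minimal illustration assuming complete, balanced data (every subject measured under every condition); the function name `rm_anova_oneway` and the scores are hypothetical and are not the values in Table 1.

```python
def rm_anova_oneway(data):
    """One-way repeated-measures ANOVA for a subjects x conditions table.

    data: list of rows, one per subject, each holding k measurements.
    Returns the sums of squares, degrees of freedom, and F statistic.
    """
    n = len(data)        # subjects
    k = len(data[0])     # conditions or time points
    grand = sum(sum(row) for row in data) / (n * k)
    subject_means = [sum(row) / k for row in data]
    condition_means = [sum(row[j] for row in data) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    # Between-subjects variation: individual subject differences.
    ss_subjects = k * sum((m - grand) ** 2 for m in subject_means)
    # Within-subjects variation splits into treatment and residual parts.
    ss_treatment = n * sum((m - grand) ** 2 for m in condition_means)
    ss_error = ss_total - ss_subjects - ss_treatment

    df_treatment = k - 1
    df_error = (n - 1) * (k - 1)
    f_stat = (ss_treatment / df_treatment) / (ss_error / df_error)
    return {
        "ss_total": ss_total, "ss_subjects": ss_subjects,
        "ss_treatment": ss_treatment, "ss_error": ss_error,
        "df_treatment": df_treatment, "df_error": df_error, "F": f_stat,
    }

# Hypothetical scores: 3 subjects, each measured at 5 time points.
scores = [
    [50, 44, 38, 30, 42],
    [54, 47, 40, 35, 46],
    [47, 43, 35, 29, 40],
]
result = rm_anova_oneway(scores)
print(result["df_treatment"], result["df_error"], round(result["F"], 1))
```

Because the between-subjects sum of squares is removed before the residual is formed, the denominator of F reflects only variation that neither subjects nor treatments explain.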
Example 1
An animal study is performed to assess transgene activation in the heart. The primary outcome is percent fractional shortening, which is measured at baseline and again after 2, 4, and 6 weeks of treatment. A final assessment is made after 6 weeks of treatment and 2 weeks off treatment. At baseline, the mice were 12 weeks of age; thus, at subsequent assessments, they were 14, 16, 18, and 20 weeks of age. Three mice completed the protocol, and the data on percent fractional shortening measured at each time point are shown in Table 1. The research hypothesis is that the mean percent fractional shortening scores are different over time. The means at each time point are shown in the bottom row of Table 1 and decrease over time. To test for a significant difference in means over time, a repeated-measures ANOVA is used. The results of the repeated-measures ANOVA are contained in Table 2. The test statistic for equality of means over time is F=95.4 (df=4,8), which is highly statistically significant at P<0.0001. Thus, a highly statistically significant difference exists in the mean percent fractional shortening over time.
Suppose that these same data were incorrectly analyzed as if they were derived from a completely randomized design. The results of the ANOVA, testing for a difference in means over time, are contained in Table 3. If the data are treated incorrectly as 15 independent observations and analyzed with ANOVA, the F statistic is F=45.1 (df=4,10), which is still highly statistically significant. Notice the difference in the error or residual variation between methods. The denominator of the F statistic for testing differences in means over time is the mean square error. In the repeated-measures ANOVA, the mean square error is 6.1 compared with 12.9 in the ANOVA that assumed independence. In this particular example, the incorrect analysis still produced a significant result. In other applications, failure to account for the dependencies among observations could result in a nonsignificant finding. The repeated-measures ANOVA that appropriately accounts for dependencies in the data produces a more precise test.
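The relationship between the two analyses can be checked from the values reported above. The arithmetic below is a sketch based on those rounded figures only (Tables 2 and 3 are not reproduced here), so the results are approximate. The symbols n=3 mice, k=5 time points, and N=15 observations follow the description of example 1.

```latex
\[
\begin{aligned}
F &= \frac{MS_{\text{time}}}{MS_{\text{error}}} \\
\text{Repeated measures:}\quad MS_{\text{time}} &\approx 95.4 \times 6.1 \approx 582,
  \qquad df_{\text{error}} = (n-1)(k-1) = 2 \times 4 = 8 \\
\text{Independence assumed:}\quad MS_{\text{time}} &\approx 45.1 \times 12.9 \approx 582,
  \qquad df_{\text{error}} = N - k = 15 - 5 = 10 \\
SS_{\text{subjects}} &\approx 12.9 \times 10 - 6.1 \times 8 = 129.0 - 48.8 = 80.2
  \qquad (df = n - 1 = 2)
\end{aligned}
\]
```

The numerator is essentially the same in both analyses. The F statistics differ because the repeated-measures analysis removes roughly 80 units of between-subject variation, on 2 degrees of freedom, from the error term.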
If a significant difference is found, it may be of interest to test for differences between pairs of treatments or, in example 1, between specific time points. These tests should be handled with a multiple comparison procedure that again appropriately handles the correlation in the data and also controls the type I error rate (see Larson3 and Cabral5 for details).
Repeated-Measures Analysis With Repeated Measures on 1 Factor
A popular extension of the one-way repeated-measures ANOVA is
the two-factor ANOVA with repeated measures on 1 factor. In
this application, a treatment group (eg, medical versus surgical
treatment, treatment versus placebo, or challenged versus unchallenged)
is often used, and different subjects are assigned to each treatment
group, but the outcome is again measured repeatedly over time.
The goal is to compare the treatments with respect to differences
in the outcome. The treatment factor is a between-subjects factor
and has no repeated measures. However, repeated assessments
are taken on each subject within each treatment over time, and
thus, the time factor must be handled appropriately in the analysis.
The procedure again partitions the variation to produce F statistics
to test the hypotheses of equality of outcomes between treatments
and equality of outcomes over time. The variance is partitioned
as shown in Figure 2, and the following tests of hypothesis
are performed. The first test is a test for treatment effect.
This is done by constructing an F statistic as the ratio of
the treatment variation to the error variation due to subjects
within treatments (Figure 2). The second test is for differences
in outcomes over time, the repeated factor. This is again performed
by constructing an F statistic. The F statistic for differences
over time is based on the ratio of time variation to error or
residual variation. Because this is a two-factor design, a possibility
of an interaction between the treatment and time factors (ie,
a different effect of treatment over time) may also exist, and
this is tested by constructing an F statistic as the ratio of
the treatment-by-time variation to the error or residual variation
(Figure 2). Some investigators first test the treatment and
time effects and then perform a test for interaction, whereas
others first test for an interaction and then test for treatment
and time effects. If a statistically significant interaction
exists, the treatment effect is different over time, and therefore,
the tests for an overall treatment effect and an overall time
effect do not completely explain differences in outcome (see
Kleinbaum et al6 for more details).
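The three F statistics described above can be sketched in Python for the balanced case. This is an illustration under stated assumptions: equal numbers of subjects per treatment group, no missing measurements, and the equal-correlation error terms of the classical analysis. The function name `split_plot_anova`, the group labels, and the blood pressure values are hypothetical and are not the Table 4 data.

```python
def split_plot_anova(groups):
    """Two-factor ANOVA with repeated measures on 1 factor (balanced data).

    groups: dict mapping a treatment label to a list of subject rows,
    each row holding one measurement per time point.
    """
    labels = list(groups)
    a = len(labels)                  # treatment groups
    n = len(groups[labels[0]])       # subjects per group
    t = len(groups[labels[0]][0])    # time points
    rows = [row for g in labels for row in groups[g]]

    grand = sum(map(sum, rows)) / (a * n * t)
    group_mean = {g: sum(map(sum, groups[g])) / (n * t) for g in labels}
    time_mean = [sum(r[j] for r in rows) / (a * n) for j in range(t)]
    cell_mean = {g: [sum(r[j] for r in groups[g]) / n for j in range(t)]
                 for g in labels}

    ss_treat = n * t * sum((group_mean[g] - grand) ** 2 for g in labels)
    # Error term for the treatment test: subjects within treatments.
    ss_subj = t * sum((sum(r) / t - group_mean[g]) ** 2
                      for g in labels for r in groups[g])
    ss_time = a * n * sum((m - grand) ** 2 for m in time_mean)
    ss_inter = n * sum((cell_mean[g][j] - group_mean[g] - time_mean[j] + grand) ** 2
                       for g in labels for j in range(t))
    ss_total = sum((x - grand) ** 2 for r in rows for x in r)
    # Residual: error term for the time and interaction tests.
    ss_error = ss_total - ss_treat - ss_subj - ss_time - ss_inter

    df = {"treatment": a - 1, "subjects_within": a * (n - 1), "time": t - 1,
          "interaction": (a - 1) * (t - 1), "error": a * (n - 1) * (t - 1)}
    ms_error = ss_error / df["error"]
    return {
        "df": df,
        "F_treatment": (ss_treat / df["treatment"]) / (ss_subj / df["subjects_within"]),
        "F_time": (ss_time / df["time"]) / ms_error,
        "F_interaction": (ss_inter / df["interaction"]) / ms_error,
    }

# Hypothetical systolic blood pressures: 3 subjects per arm, 4 time points.
bp = {
    "treatment": [[150, 142, 134, 128], [156, 146, 140, 130], [148, 141, 133, 126]],
    "placebo":   [[152, 150, 147, 146], [158, 154, 153, 150], [149, 148, 144, 145]],
}
anova = split_plot_anova(bp)
print(anova["df"])
print({key: round(val, 1) for key, val in anova.items() if key != "df"})
```

With 2 groups of 3 subjects and 4 time points, the degrees of freedom are (1, 4) for treatment and (3, 12) for both time and the interaction, which is the layout used in example 2.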
These data can be analyzed in several different ways. An important issue is the appropriate specification of the nature of the correlations between measurements in the same person, called the covariance structure. Most statistical computing packages offer a variety of covariance structures for these types of analysis, and the covariances must be modeled correctly. Three structures are very popular and fit many applications. The first is called "compound symmetry" and assumes that the correlations between all pairs of measures are the same. This may be reasonable for a repeated-measures study in which each subject is measured under k different experimental conditions. The second is called "autoregressive of order 1," or AR(1), and assumes that the correlations between adjacent pairs are greater than the correlations between more distant pairs. This may be reasonable for data measured serially in time, whereby more proximal measures are more highly correlated than measures taken more distantly in time. For this structure, the time points should be approximately equally spaced in time. A third popular structure is called "unstructured," and as the name implies, it assumes that each pair of measurements has its own correlation. Although the latter might seem appealing, it actually produces a less powerful analysis because the data first must be used to assess the correlation structure and then to perform the primary analyses. Some statistical computing packages (eg, SAS, SAS Institute, Cary, NC) offer metrics to determine which structure best fits the data. One such measure is the Akaike information criterion, with which smaller values indicate a better fit. As in all statistical analyses, it is important to plan and implement parsimonious models that are biologically sensible. An example of a two-factor ANOVA with repeated measures on 1 factor is contained in example 2.
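To make the three structures concrete, the sketch below builds the correlation matrices implied by compound symmetry and AR(1) and counts the correlation parameters each structure must estimate. The function names and the correlation values 0.5 and 0.6 are illustrative assumptions, not quantities from the examples in this article.

```python
def compound_symmetry(k, rho):
    """Correlation matrix in which every pair of measures shares rho."""
    return [[1.0 if i == j else rho for j in range(k)] for i in range(k)]

def ar1(k, rho):
    """AR(1) correlation matrix: correlation rho**lag decays with distance."""
    return [[rho ** abs(i - j) for j in range(k)] for i in range(k)]

def n_correlation_parameters(structure, k):
    """Number of distinct correlations to estimate for k repeated measures."""
    if structure in ("compound_symmetry", "ar1"):
        return 1
    if structure == "unstructured":
        return k * (k - 1) // 2   # one correlation per pair of measures
    raise ValueError("unknown structure: " + structure)

k = 4  # eg, baseline and 3 follow-up assessments
for row in compound_symmetry(k, 0.5):
    print(row)
for row in ar1(k, 0.6):
    print([round(r, 3) for r in row])
for name in ("compound_symmetry", "ar1", "unstructured"):
    print(name, n_correlation_parameters(name, k))
```

With 4 repeated measures, the unstructured form requires 6 separate correlations compared with 1 for either of the other two structures, which illustrates why it uses more of the information in the data before the primary analysis is performed.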
Example 2
A randomized, placebo-controlled study is performed to estimate the short-term effects of an antihypertensive medication on systolic blood pressure. Subjects are randomly assigned to receive either the treatment or a placebo. Systolic blood pressures are measured before the first dose of treatment is administered (baseline) and again at 2, 4, and 6 weeks after the initiation of treatment (or placebo). The study involves 6 participants, 3 of whom are randomly assigned to each treatment arm; the data on systolic blood pressure measured at each time point are shown in Table 4. The research hypothesis is that the mean systolic blood pressures are different between treatments.
Figure 3 displays the mean systolic blood pressures over time for participants undergoing treatment and given a placebo. The mean systolic blood pressures decreased over time in both groups, with a sharper decrease in the treatment group. The results of the two-factor ANOVA with repeated measures are contained in Table 5. The test statistic for equality of treatment means over time is F=36.1 (df=1,4), which is highly statistically significant at P=0.0039. Thus, a highly statistically significant difference is present in mean systolic blood pressures between patients given the antihypertensive medication and those given placebo. The test for a difference in mean systolic blood pressures over time is also highly statistically significant [F=27.1 (df=3,12), P=0.0001]. The test for the interaction between treatment and time is marginally significant [F=3.2 (df=3,12), P=0.0626]. This test assesses the homogeneity of the difference in mean blood pressures between the treatment and placebo groups over time. Figure 3 shows that the difference in means is widening over time, which is driving the test for interaction to approach statistical significance.
The analyses reported in Table 5 assume equal correlations between measurements (ie, compound symmetry). An alternative analysis for these data would be an autoregressive correlation structure in which correlations between measures taken closer together in time are higher than those measured more distantly. If we assume an AR(1) covariance structure, the test statistic for equality of treatment means over time is F=12.7, which is significant at P=0.0235. The test for a difference in mean systolic blood pressures over time is highly statistically significant (F=30.4, P=0.0001), and the test for the interaction between treatment and time is significant (F=3.7, P=0.0423). The Akaike information criterion is 102.7 for the compound symmetry model and 98.0 for the AR(1) model. Because smaller values indicate better fit, the AR(1) model is a better choice for these data.
Estimates of treatment effect are provided in Table 6 for both the model that assumes compound symmetry and the model that assumes an AR(1) covariance structure. Notice that the estimates of the treatment effect are the same; however, the standard errors are different, which affects the significance of the difference.
Alternative Approaches to Analysis of Repeated-Measures Data
When repeated measures have been taken on each experimental
unit, several approaches to the statistical analysis are possible.
Thinking again of the two-factor ANOVA with repeated measures
on 1 factor, a simple approach to handling the correlation among
repeated measures in the same person involves computing mean
scores for each person over time. In example 2, this would reduce
the sample sizes to n1=3 and n2=3, and the test for treatment
differences could be performed with the unpaired t test. Using
the data in example 2, this would produce t=6.0, P=0.0039, which
indicates that a significant difference is present in mean systolic
blood pressures between groups. This t test is based on only
3 observations per group and 1 observation per participant (the
mean systolic blood pressure over time). This approach is analytically
correct but does not take full advantage of the data. This approach
is much less powerful than the repeated-measures approach. A
second alternative is to assess treatment differences at each
time point. In example 2, this translates to conducting 4 unpaired
t tests, 1 at each observation point. This approach is again
inefficient, because it does not allow for any assessment of
trend over time. In addition, this approach increases the likelihood
of a false-positive result due to multiple statistical testing.5,7
The most efficient approach is to explicitly account for the
dependency in the data by use of repeated-measures techniques,
and this can be done in many different ways.
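The first alternative, reducing each participant to a mean score and applying an unpaired t test, can be sketched as follows. The function name `summary_measure_t` and the blood pressure values are hypothetical (they are not the Table 4 data), and the sketch assumes the pooled-variance form of the t test.

```python
import math
from statistics import mean, variance

def summary_measure_t(group1, group2):
    """Unpaired, pooled-variance t test on per-subject mean scores.

    group1, group2: lists of subject rows, one measurement per time point.
    Returns the t statistic and its degrees of freedom.
    """
    m1 = [mean(row) for row in group1]   # 1 summary value per subject
    m2 = [mean(row) for row in group2]
    n1, n2 = len(m1), len(m2)
    pooled = ((n1 - 1) * variance(m1) + (n2 - 1) * variance(m2)) / (n1 + n2 - 2)
    t_stat = (mean(m1) - mean(m2)) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    return t_stat, n1 + n2 - 2

# Hypothetical systolic blood pressures: 3 subjects per arm, 4 time points.
treated = [[150, 142, 134, 128], [156, 146, 140, 130], [148, 141, 133, 126]]
placebo = [[152, 150, 147, 146], [158, 154, 153, 150], [149, 148, 144, 145]]
t_stat, df = summary_measure_t(treated, placebo)
print(round(t_stat, 2), df)   # negative t: treated mean is lower than placebo
```

The test uses only 1 value per participant, so it has n1+n2-2=4 degrees of freedom here. For balanced, complete data, the square of this t statistic equals the F statistic for the treatment effect in the two-factor analysis, which is consistent with the values reported in example 2 (6.0 squared is 36.0, compared with F=36.1, allowing for rounding). The summary approach provides no test of the time effect or of the treatment-by-time interaction.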
Assumptions and Analytic Details
An important assumption in repeated-measures analysis is sphericity,
or homogeneity of variances over time (more precisely, equal variances
of the differences between each pair of repeated measurements). Most
statistical computing packages offer tests for sphericity. If the
assumption is violated, then mixed models can be used to explicitly
address differences.8
A number of statistical computing packages are available that offer procedures for repeated-measures ANOVA. Within these packages, several options are available for conducting the tests. SAS, for example, offers several procedures that can handle repeated-measures data. Careful attention must be paid to the data layout, the specification of factors (eg, as fixed or repeated), the appropriate error terms for test statistics, and the nature of the correlations between observations measured in the same individual (ie, the covariance structure). Littell et al7,8 provide a detailed approach to using SAS for repeated-measures analysis.
Disclosures
None.
References
1. Davis RB, Mukamal KJ. Hypothesis testing: means: statistical primer for cardiovascular research. Circulation. 2006; 114: 1078–1082.
2. Verberk WJ, Kroon AA, Kessels AGH, Nelemans PJ, VanRee JW, Lenders JWM, Thien T, Bakx JC, VanMontfrans GA, Smit AJ, Beltman FW, DeLeeuw PW. Comparison of randomization techniques for clinical trials with data from the HOMERUS-trial. Blood Pressure. 2005; 14: 306–314.
3. Larson MG. Analysis of variance. Circulation. 2008; 117: 115–121.
4. D'Agostino RB, Sullivan LM, Beiser AS. Introductory Applied Biostatistics. Belmont, Calif: Brooks/Cole; 2004.
5. Cabral HJ. Multiple comparisons procedures. Circulation. 2008; 117: 698–705.
6. Kleinbaum DG, Kupper LL, Muller KE. Applied Regression Analysis and Other Multivariable Methods. 2nd ed. Boston, Mass: PWS-Kent; 1988.
7. Littell RC, Henry PR, Ammerman CB. Statistical analysis of repeated measures data using SAS procedures. J Anim Sci. 1998; 76: 1216–1231.
8. Littell RC, Milliken GA, Stroup WW, Wolfinger RD. SAS System for Mixed Models. Cary, NC: SAS Institute Inc; 1996.