(Circulation. 2006;114:2528-2533.)
© 2006 American Heart Association, Inc.
Statistical Primer for Cardiovascular Research
From the Department of Biostatistics, School of Public Health, University of North Carolina, Chapel Hill.
Correspondence to Lisa M. LaVange, PhD, Collaborative Studies Coordinating Center, Department of Biostatistics, CB 8030, School of Public Health, University of North Carolina, Chapel Hill, NC 27514-4145. E-mail lisa lavange{at}unc.edu
Key Words: statistics, nonparametric probability variable distributions
| Introduction |
|---|
The Wilcoxon signed rank test, the Spearman rank correlation coefficient, and the Wilcoxon rank sum test are among the most commonly used nonparametric tests and cover a variety of research questions. These tests are described here. Although the focus is on hypothesis testing, related methods for estimation of confidence intervals are also presented. Extensions of nonparametric methods to handle stratification and covariate adjustment are also described. Scenarios in which nonparametric methods may be most useful and the power they can be expected to yield are discussed. The methods are illustrated with data from a clinical trial assessing the impact of exposure to low levels of carbon monoxide on exercise capacity in patients with ischemic heart disease.
| Wilcoxon Signed Rank Test |
|---|
If $\theta$ represents the median of the distribution of difference values $\{d_i\}$, then the null hypothesis is that $\theta=0$. That is, the null hypothesis is the hypothesis that the true, underlying difference between the 2 conditions is zero. The following steps describe the calculation of the test statistic to assess the null hypothesis against the alternative of a shift in location associated with one of the conditions or time points:
The null hypothesis that the median difference is zero is assessed by comparing the test statistic U/S to critical values of the standard normal distribution for large sample sizes (eg, n≥20) or by tabulating the exact critical region for small sample sizes. Computation of P values for both the normal approximation and the exact distribution is available through commercial statistical software packages (eg, SAS Proc Univariate3 and StatXact4).
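The calculation of U, S, and U/S can be sketched in Python as follows. This is a minimal sketch rather than the procedure used by SAS or StatXact: it assumes that U is the sum of the signed midranks of the nonzero differences and that S is the square root of the sum of their squares, as in the worked example later in this article. The function names and the data are hypothetical, and only the large-sample normal approximation is computed.

```python
import math


def midranks(values):
    """Return 1-based ranks, averaging the ranks within groups of ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg = (start + end) / 2 + 1  # mean of the 1-based positions start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = avg
        start = end + 1
    return ranks


def signed_rank_test(differences):
    """Return (U, S, U/S, two-sided normal-approximation P value)."""
    nonzero = [d for d in differences if d != 0]   # zero differences are ignored
    ranks = midranks([abs(d) for d in nonzero])    # rank the absolute differences
    signed = [r if d > 0 else -r for r, d in zip(ranks, nonzero)]
    u = sum(signed)                                # sum of the signed ranks
    s = math.sqrt(sum(r * r for r in signed))      # standard deviation of U under the null
    z = u / s
    p = math.erfc(abs(z) / math.sqrt(2))           # two-sided tail area of N(0, 1)
    return u, s, z, p


# Hypothetical paired differences (not the article's Table 1 data).
diffs = [12, -5, 30, 0, 8, -5, 21, 17, -3, 40]
u, s, z, p = signed_rank_test(diffs)
```

For small samples, the exact distribution should be used in place of the normal approximation shown here.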
When there are no ties among the observed differences and no differences equal to zero, then the signed rank test statistic simplifies to a commonly used form. If T denotes the sum of the positive ranks, then $U = 2T - n(n+1)/2$ and

$$S = \sqrt{\frac{n(n+1)(2n+1)}{6}}.$$
The significance of U/S is determined by comparison with a standard normal distribution or by computation of the exact critical region, as above.
If it is assumed that the distribution of the $\{d_i\}$ is continuous and symmetrical, a point estimate of the median difference $\theta$ is given by

$$\hat{\theta} = \operatorname{median}\left\{ \frac{d_i + d_j}{2},\; 1 \le i \le j \le n \right\}.$$

The n(n+1)/2 quantities involved here are the n differences and their n(n-1)/2 pairwise averages. Furthermore, a confidence interval can be constructed about $\theta$ via the methods of Hodges and Lehmann on the basis of the exact distribution of T.1,5,6
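The point estimate can be illustrated with a short Python sketch that forms the n differences together with their pairwise averages and takes the median of all n(n+1)/2 quantities. The function name and the data are hypothetical, and the confidence interval is not computed here.

```python
from statistics import median


def hodges_lehmann_paired(differences):
    """Median of the n(n+1)/2 averages (d_i + d_j)/2 taken over all i <= j."""
    n = len(differences)
    averages = [
        (differences[i] + differences[j]) / 2
        for i in range(n)
        for j in range(i, n)  # j == i contributes the difference itself
    ]
    return median(averages)


# Hypothetical paired differences (not the article's Table 1 data).
diffs = [12, -5, 30, 8, 21]
estimate = hodges_lehmann_paired(diffs)
```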
The asymptotic relative efficiency (ARE) is a useful way to compare a nonparametric test with its parametric counterpart. Briefly, the ARE can be defined as the ratio of sample sizes required by the 2 statistics to achieve the same power under a certain distributional assumption.7 For a test of paired samples or paired conditions on a sample of subjects, the paired t test would be the parametric test of choice. The ARE of the Wilcoxon signed rank test relative to the paired t test is at least 0.864 in the entire class of continuous symmetrical distributions and at least 0.955 when the differences {di} follow a normal distribution.5
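As a rough reading of these bounds, suppose the asymptotic ratio is applied at finite sample sizes (an approximation). Let $n_W$ and $n_t$ denote the sample sizes needed by the signed rank test and the paired t test to reach the same power; these symbols are introduced here only for illustration. Then

$$\frac{n_W}{n_t} \approx \frac{1}{\text{ARE}}, \qquad \frac{1}{0.864} \approx 1.157, \qquad \frac{1}{0.955} \approx 1.047.$$

Under this reading, the signed rank test would need roughly 16% more subjects than the paired t test in the least favorable symmetric case and roughly 5% more when the differences are normally distributed.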
| Example Dataset |
|---|
A total of 30 patients (8 women and 22 men) successfully completed training and were randomized to exposure sequence. The outcome variable used for illustration here is duration of exercise (seconds) after the exposure condition, provided in Table 1 for each patient. This outcome variable typically has a somewhat skewed distribution and can be subject to outliers, and therefore the use of nonparametric methods for hypothesis testing is particularly appealing. The order variable groups the patients according to order of exposure (1=CO first and 2=Air first). Therefore, 16 patients were exposed to CO on the first day and exposed to Air on the second day, and 14 patients were exposed to Air on the first day, followed by CO on the second day. The baseline measure corresponds to the duration of exercise recorded on the training day before randomization.
The Wilcoxon signed rank test can be used to test for differences between duration of exercise under the 2 exposures. First, differences are formed between the 2 exercise times, and the absolute values of the nonzero differences are ranked across the 30 patients. Differences of zero are ignored, and midranks are used in the case of ties. Signs are then applied to indicate which differences are <0 or >0, corresponding to a decrease and an increase in exercise time, respectively. Table 2 provides the rank matrix for the example data. In this example, the sum of the positive ranks is T=166.5, the sum of the signed ranks is U=123, the square root of the sum of squares of the signed ranks (the standard deviation of U) is S=53.56, and the test statistic is U/S=2.296. The exact P=0.0198, and the approximate P=0.0217, both indicating that the null hypothesis of a zero median for the difference between exposure conditions in exercise times is rejected in favor of a significant difference. Subjects were able to exercise for significantly longer periods of time after exposure to Air than after exposure to CO. The Hodges-Lehmann point estimate for the median difference is 54.0 seconds with a 95% confidence interval of 15.5 to 110.0.
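The arithmetic behind the reported test statistic and approximate P value can be checked with a few lines of Python. The sketch takes the reported values of U and S as given (the patient-level data are not reproduced here) and assumes that the approximate P value is the two-sided tail area of the standard normal distribution.

```python
import math

U = 123.0  # sum of the signed ranks, as reported
S = 53.56  # standard deviation of U, as reported

z = U / S                                   # test statistic U/S
p_approx = math.erfc(abs(z) / math.sqrt(2)) # two-sided normal-approximation P value
```

The resulting values agree with the reported U/S=2.296 and approximate P=0.0217 to within rounding of S.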
| Spearman Rank Correlation Coefficient |
|---|
A test of significance for the association between the 2 variables of interest (X and Y) is given by $(n-1)r_S^2$, where $r_S$ denotes the Spearman rank correlation coefficient. This statistic is approximately $\chi^2$ distributed with 1 degree of freedom when the 2 variables are independent and thereby have no association (ie, the null hypothesis is true).9
The Spearman rank correlation coefficient is appropriate for both ordered categorical and continuous variables. The computations are valid with the use of midranks, and therefore ties with respect to either variable can be accommodated. Critical values of the test statistic can be computed with the large-sample $\chi^2$ approximation when sample sizes are large (eg, n≥40) and through tabulation of the critical regions of the exact distribution when sample sizes are small. The statistical procedures SAS Proc FREQ3 and StatXact4 both provide exact probability levels for the Spearman rank correlation test.
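A minimal Python sketch of the large-sample test follows. It assumes that the coefficient is computed as the Pearson correlation of the midranks of X and Y, which accommodates ties, and that the test statistic is $(n-1)r_S^2$ referred to a $\chi^2$ distribution with 1 degree of freedom. The function names and the data are hypothetical, and the exact test is not implemented.

```python
import math


def midranks(values):
    """Return 1-based ranks, averaging the ranks within groups of ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = avg
        start = end + 1
    return ranks


def spearman_test(x, y):
    """Return (r_S, (n-1)*r_S**2, large-sample chi-square P value)."""
    rx, ry = midranks(x), midranks(y)
    n = len(x)
    mean_rank = (n + 1) / 2  # mean of midranks is always (n+1)/2
    sxy = sum((a - mean_rank) * (b - mean_rank) for a, b in zip(rx, ry))
    sxx = sum((a - mean_rank) ** 2 for a in rx)
    syy = sum((b - mean_rank) ** 2 for b in ry)
    r_s = sxy / math.sqrt(sxx * syy)
    q = (n - 1) * r_s ** 2
    p = math.erfc(math.sqrt(q / 2))  # upper tail area of chi-square with 1 df
    return r_s, q, p


# Hypothetical data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
r_s, q, p = spearman_test(x, y)
```

With only 5 observations, the exact distribution would be preferred in practice; the small dataset is used here only to keep the arithmetic easy to follow.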
| Example Dataset, Continued |
|---|
The Spearman rank correlation coefficient for baseline by differences in exercise times is 0.1093 with P=0.5564, indicating no significant association between these 2 variables. A logical follow-up question is whether baseline is correlated with exercise times after either exposure condition. The Spearman rank correlation coefficient between the baseline value and exercise time after exposure to Air is 0.7843 (P<0.0001) and between baseline and exercise time after exposure to CO is 0.8234 (P<0.0001). Baseline values are therefore strongly associated with postexposure exercise times, regardless of the condition. The difference between conditions with respect to exercise duration does not, however, appear to vary with baseline.
| Wilcoxon Rank Sum Test |
|---|
Let $n_1$ denote the sample size in the first group and $n_2$ denote the sample size in the second group. The total sample size is $n = n_1 + n_2$. The Wilcoxon rank sum test is computed as follows:
The significance of the test statistic is assessed by comparison with the $\chi^2$ distribution with 1 degree of freedom when sample sizes are large (eg, ≥20 per group). When sample sizes are small and subjects are randomly allocated to groups (either by the design of the study or as implied by the null hypothesis), the significance level is calculated by comparing the test statistic to the critical region of the exact distribution.10
If there are no ties among the ranks, then the test statistic simplifies to a commonly used form. Let T be the sum of the ranks in group 1. Then the rank sum test statistic is given by

$$\frac{\left[T - n_1(n+1)/2\right]^2}{n_1 n_2 (n+1)/12}.$$
The statistical procedures SAS Proc NPAR1WAY3 and StatXact4 both provide exact probability levels for the Wilcoxon rank sum test.
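The large-sample version of the test can be sketched in Python as below. This is one common formulation and is an assumption here, not a transcription of the SAS or StatXact computations: observations from both groups are ranked together with midranks, T is the rank sum in group 1, and the squared deviation of T from its null expectation is divided by its randomization variance and referred to a $\chi^2$ distribution with 1 degree of freedom. The function names and the data are hypothetical.

```python
import math


def midranks(values):
    """Return 1-based ranks, averaging the ranks within groups of ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = avg
        start = end + 1
    return ranks


def rank_sum_test(group1, group2):
    """Return (T, chi-square statistic, large-sample P value)."""
    n1, n2 = len(group1), len(group2)
    n = n1 + n2
    ranks = midranks(list(group1) + list(group2))  # rank the pooled sample
    t = sum(ranks[:n1])                            # rank sum in group 1
    mean_rank = (n + 1) / 2
    expected = n1 * mean_rank                      # E(T) under the null hypothesis
    ss = sum((r - mean_rank) ** 2 for r in ranks)
    variance = n1 * n2 * ss / (n * (n - 1))        # equals n1*n2*(n+1)/12 without ties
    q = (t - expected) ** 2 / variance
    p = math.erfc(math.sqrt(q / 2))                # upper tail area of chi-square with 1 df
    return t, q, p


# Hypothetical data for illustration only.
group1 = [1.1, 2.3, 3.0, 4.8]
group2 = [2.9, 5.5, 6.1, 7.4, 8.0]
t, q, p = rank_sum_test(group1, group2)
```

Because the variance is computed from the observed midranks, the same function handles tied observations without a separate correction.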
If it is assumed that metric distributions for the 2 groups have the same shape and scale, Hodges-Lehmann estimates for the difference in medians between the 2 groups of patients, $\Delta$, and confidence limits about $\Delta$ are available. The point estimate corresponds to the median of all pairwise differences between observations in one group versus those in the other group. There are $n_1 n_2$ such differences.
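The point estimate described here is simple to compute directly, as the following Python sketch shows. The function name and the data are hypothetical, and confidence limits are not computed.

```python
from statistics import median


def hodges_lehmann_shift(group1, group2):
    """Median of all pairwise differences (group 1 value minus group 2 value)."""
    differences = [a - b for a in group1 for b in group2]
    return median(differences)


# Hypothetical data for illustration only.
group1 = [5, 7, 9]
group2 = [1, 2, 6]
shift = hodges_lehmann_shift(group1, group2)
```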
The ARE for the Wilcoxon rank sum test relative to the t test for comparing 2 independent samples is at least 0.864 when the alternative hypothesis is a location shift in the distributions of the 2 samples and all continuous distributions are considered. When the distributions are normal, the ARE is at least 0.955. Note that when the distributions of the response variables are highly skewed, with long tails at either end, then the ARE can exceed 1.0, indicating that the Wilcoxon rank sum test will be more powerful than a t test in this instance.11
| Example Dataset, Continued |
|---|
For this example dataset, the fact that the order of exposure conditions did not appear to be related to the response variable validates the use of a signed ranks analysis of exercise times under the 2 conditions, ignoring the order of exposure. Had order been related to response, then a proper crossover analysis of the study data would be required that accounted for the order of exposure in assessing the impact of CO versus air on exercise times.12
| Extensions of the Rank Sum Test |
|---|
The method in step 4 for comparing the residuals in the 2 groups is an extension of the Wilcoxon rank sum test to provide covariate adjustment. Rank ANCOVA can provide additional power through the variance reduction typically associated with a baseline covariate adjustment, even when the response variable does not follow a normal distribution.14
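One way to carry out a rank analysis of covariance with a single covariate is sketched below in Python. The sequence used is an assumption about the general approach and not a transcription of the authors' steps: the response and the covariate are each ranked across all subjects, the response ranks are regressed on the covariate ranks ignoring group, and the residuals are compared between groups with a randomization-based mean score statistic referred to a $\chi^2$ distribution with 1 degree of freedom. The function names and the data are hypothetical.

```python
import math


def midranks(values):
    """Return 1-based ranks, averaging the ranks within groups of ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = avg
        start = end + 1
    return ranks


def rank_ancova(y1, x1, y2, x2):
    """Return (chi-square statistic, P value) for group 1 vs group 2, adjusted for x."""
    n1, n2 = len(y1), len(y2)
    n = n1 + n2
    ry = midranks(list(y1) + list(y2))   # ranks of the response, groups pooled
    rx = midranks(list(x1) + list(x2))   # ranks of the covariate, groups pooled
    mean_rank = (n + 1) / 2
    sxy = sum((a - mean_rank) * (b - mean_rank) for a, b in zip(ry, rx))
    sxx = sum((b - mean_rank) ** 2 for b in rx)
    slope = sxy / sxx                    # least-squares slope, ignoring group
    resid = [(a - mean_rank) - slope * (b - mean_rank) for a, b in zip(ry, rx)]
    t = sum(resid[:n1])                  # residual sum in group 1; null expectation is 0
    ss = sum(e * e for e in resid)       # residuals have mean zero
    variance = n1 * n2 * ss / (n * (n - 1))
    q = t * t / variance
    p = math.erfc(math.sqrt(q / 2))      # upper tail area of chi-square with 1 df
    return q, p


# Hypothetical responses (y) and baseline covariate values (x) for 2 groups.
q, p = rank_ancova([3, 5, 8], [1, 2, 4], [4, 9, 10], [3, 5, 6])
```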
Stratification may also be an important aspect of the study design, resulting from patients being sorted into subsets before the conduct of the study (eg, male and female strata or strata consisting of patients from different clinical centers in a multicenter study). Extensions for the Wilcoxon rank sum test and the test for the Spearman rank correlation coefficient that account for stratification are available.9,10 The stratified extension for the Wilcoxon rank sum test is referred to as the van Elteren statistic.15 Computations for this method are shown in the technical appendix in the online-only Data Supplement and are available through SAS Proc FREQ.3
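A stratified rank sum statistic of the van Elteren type can be sketched in Python as follows. The sketch assumes the usual formulation in which observations are ranked separately within each stratum, each stratum's rank sum deviation is weighted by the reciprocal of the stratum size plus 1, and the combined statistic is referred to a $\chi^2$ distribution with 1 degree of freedom. It is not a transcription of the technical appendix, and the function names and the data are hypothetical.

```python
import math


def midranks(values):
    """Return 1-based ranks, averaging the ranks within groups of ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = avg
        start = end + 1
    return ranks


def van_elteren_test(strata):
    """strata: list of (group1_values, group2_values) pairs, one pair per stratum."""
    numerator = 0.0
    variance = 0.0
    for g1, g2 in strata:
        n1, n2 = len(g1), len(g2)
        n = n1 + n2
        ranks = midranks(list(g1) + list(g2))  # ranks within this stratum only
        weight = 1.0 / (n + 1)
        mean_rank = (n + 1) / 2
        t = sum(ranks[:n1])                    # rank sum for group 1 in this stratum
        ss = sum((r - mean_rank) ** 2 for r in ranks)
        numerator += weight * (t - n1 * mean_rank)
        variance += weight ** 2 * n1 * n2 * ss / (n * (n - 1))
    q = numerator ** 2 / variance
    p = math.erfc(math.sqrt(q / 2))            # upper tail area of chi-square with 1 df
    return q, p


# Hypothetical data: 2 strata, each with a group 1 sample and a group 2 sample.
strata = [([1, 2], [3, 4, 5]), ([10, 12, 11], [13, 14])]
q, p = van_elteren_test(strata)
```

With a single stratum the weight cancels and the statistic reduces to the unstratified rank sum statistic.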
| Example Dataset, Continued |
|---|
To illustrate the extension of these methods for stratification, subjects were stratified according to whether their baseline value was below the median of 540 seconds versus equal to or above the median. A stratified Wilcoxon rank sum test (ie, van Elteren test) was then performed. The probability value for testing the null hypothesis of no association in all strata is 0.8810 and therefore is compatible with no association between order of exposure and differences in exercise duration after stratifying on baseline value (below versus above the median).
| Discussion |
|---|
The methods described here all address a shift in location as the alternative hypothesis, in which location corresponds to the median of the response variable distribution. If transformed data are expected to be normally distributed (eg, the response variable follows a log-normal distribution), then nonparametric methods will be at least 95% efficient, and their use removes the need to identify the optimal transformation.
When the response variable is so highly skewed that the distribution appears to have an "L" or "J" shape, the Wilcoxon tests will not have good power, and Savage (or log-rank) tests will be better.16 With highly skewed distributions, both groups of subjects will tend to have ranked values on one side of the median, but only one group will have ranked values on the other or tail side of the distribution. In this case, only the rank values on the tail side are informative, and against this alternative, the Wilcoxon rank sum test will not be the most appropriate test. Because the Wilcoxon tests address shifts in location only, ranked values from both groups are expected to occur to the left and to the right of the median, and both are informative. Under the alternative hypothesis of a shift in location, one group will tend to have ranks on one side of the overall median, whereas the other will tend to have ranks on the opposite side. This is precisely the setting in which the Wilcoxon tests are most useful and nearly as powerful as parametric methods applied when all assumptions hold.
| Acknowledgments |
|---|
None.
| References |
|---|
2. Conover WJ. Practical Nonparametric Statistics. New York, NY: John Wiley & Sons; 1971.
3. SAS Institute Inc. SAS/STAT User's Guide, Version 9. Cary, NC: SAS Institute Inc; 2004.
4. Cytel Software Corporation. StatXact 7 On-line User Manual. Cambridge, Mass: Cytel Software Corporation; 2005.
5. Woolson RF. Wilcoxon signed-rank test. In: Armitage P, Colton T, eds. Encyclopedia of Biostatistics. Vol 6. West Sussex, England: John Wiley & Sons Ltd; 1998: 4739-4740.
6. Hodges JL, Lehmann EL. Rank methods for combination of independent experiments in analysis of variance. Ann Math Stat. 1962; 33: 482-497.
7. DasGupta A. Encyclopedia of Biostatistics. Vol 1. Armitage P, Colton T, eds. West Sussex, England: John Wiley & Sons Ltd; 1998: 210-215.
8. Adams KF, Koch GG, Chaterjee B, Goldstein GM, O'Neil JJ, Bromberg PA, Sheps DS, McAllister S, Price CJ, Bissette J. Acute elevation of blood carboxyhemoglobin to 6% impairs exercise performance and aggravates symptoms in patients with ischemic heart disease. J Am Coll Cardiol. 1988; 12: 900-909.
9. Stokes ME, Davis CS, Koch GG. Categorical Data Analysis Using the SAS System. 2nd ed. Cary, NC: SAS Institute Inc; 2000.
10. Landis RJ, Sharp TJ, Kuritz SJ, Koch GG. Mantel-Haenszel methods. In: Armitage P, Colton T, eds. Encyclopedia of Biostatistics. Vol 3. West Sussex, England: John Wiley & Sons Ltd; 1998: 2378-2391.
11. Moses L. Wilcoxon-Mann-Whitney test. In: Armitage P, Colton T, eds. Encyclopedia of Biostatistics. Vol 6. West Sussex, England: John Wiley & Sons Ltd; 1998: 4742-4745.
12. Tudor G, Koch GG. Review of non-parametric methods for the analysis of crossover studies. Stat Methods Med Res. 1994; 3: 345-381.
13. Koch GG, Carr GJ, Amara IA, Stokes ME, Uryniak TJ. Categorical data analysis. In: Berry DA, ed. Statistical Methodology in the Pharmaceutical Sciences. New York, NY: Marcel Dekker; 1990: 389-473.
14. LaVange LM, Durham TA, Koch GG. Randomization-based nonparametric methods for the analysis of multicentre trials. Stat Methods Med Res. 2005; 14: 281-301.
15. Lehmann EL. Nonparametrics: Statistical Methods Based on Ranks. San Francisco, Calif: Holden-Day; 1975.
16. Koch GG, Sen PK, Amara I. Log-rank scores, statistics, and tests. In: Kotz S, Johnson NL, eds. Encyclopedia of Statistical Sciences. Vol 5. New York, NY: John Wiley & Sons; 1985: 136-142.