(Circulation. 2007;115:2870-2875.)
© 2007 American Heart Association, Inc.
Statistical Primer for Cardiovascular Research
From the European College of Pharmaceutical Medicine, Lyon, France (T.J.C., A.H.Z.), and Academic Medical Center, Amsterdam, the Netherlands (A.H.Z.).
Correspondence to Ton J. Cleophas, MD, PhD, Department of Medicine, Albert Schweitzer Hospital, Dordrecht, The Netherlands. E-mail ajm.cleophas{at}wxs.nl
Key Words: population characteristics; meta-analysis; publication bias
| Introduction |
|---|
Meta-analyses can be defined as systematic reviews with pooled data. Traditionally, they are post hoc analyses. However, probability statements may be more valid than they usually are with post hoc studies, particularly if performed on outcomes that were primary outcomes in the original trials. Problems with pooling are frequent: Correlations are often nonlinear3; effects are often multifactorial rather than unifactorial4; continuous data frequently have to be transformed into binary data for the purpose of comparability5; poor studies may be included, and coverage may be limited6; and data may not be homogeneous and may fail to relate to hypotheses.7 Despite these problems, the methods of meta-analysis are an invaluable scientific activity: they establish whether scientific findings are consistent8 and can be generalized across populations and treatment variations9 and whether findings vary between subgroups.10 The methods also limit bias, improve reliability and accuracy of conclusions,11 and increase the power and precision of treatment effects and risk exposures.6
The objective of this article is to review statistical procedures for the meta-analysis of cardiovascular research. The Google database system provides 659 000 references on the methods of meta-analysis and refers to hundreds of books of up to 600 pages,12 illustrating the complexity of this subject. The basic statistical analysis of meta-analyses, however, is not complex if the basic scientific methods are met.13 We first will review the scientific methods and then introduce the statistical analysis, including the analysis of potential pitfalls. Finally, we will cover some new developments.
| Four Scientific Rules |
|---|
Clearly Defined Hypothesis
Clinical trials address efficacy and safety of new drugs or interventions. The main outcome variables and the manner in which they should be tested are specified in advance. A meta-analysis is similar to a single trial, and, as in a single trial, it tests a very small number of primary hypotheses, primarily that the new compound or intervention is more efficacious and safe than the reference compound or intervention.
Thorough Search of Trials
The activity of thoroughly searching published research requires a systematic procedure and must be learned. One may pick up a checklist for this purpose, similar to the checklist used by aircraft staff before takeoff, a simile used by Oxman and Guyatt.14 A faulty review of trials is as perilous as a faulty aircraft, and both of them are equally deadly, particularly so if we are going to use it for making decisions about healthcare. For a systematic review, MEDLINE15 is not enough, and other databases have to be searched, eg, EMBASE-Excerpta Medica16 and the Cochrane Library.17
Strict Inclusion Criteria
Inclusion criteria are concerned with the levels of validity, otherwise termed quality criteria, of the trials to be included. Having strict inclusion criteria means that we will subsequently include only the valid studies. Some factors have been shown empirically to beneficially influence validity. These factors include blinding the study, random assignment of patients, explicit description of methods, accurate statistics, and accurate ethics, including written informed consent. We should add that the inclusion of unpublished studies may reduce the magnitude of publication bias, an issue that will be discussed in Pitfalls of Data Analysis.
Uniform Data Analysis
Statistical analysis is a tool that helps us to derive meaningful conclusions from the data and to avoid analytical errors. Statistics should be simple and test primary hypotheses in the first place. Before any analysis or data plots, we must decide what kind of data we have.
| General Framework of Meta-Analysis |
|---|
A meta-analysis of k studies pools their summary statistics x1, x2, ..., xk into a weighted average effect

$$\bar{x} = \frac{\sum_{i=1}^{k} w_i x_i}{\sum_{i=1}^{k} w_i},$$

and its SE is

$$SE(\bar{x}) = \frac{1}{\sqrt{\sum_{i=1}^{k} w_i}}.$$

The weights wi are a function of the SE of xi, denoted as SE(xi), and of the variance $\sigma^2$ of the true effects of the compound between k different studies:

$$w_i = \frac{1}{SE^2(x_i) + \sigma^2}.$$

If all k studies have the same true quantitative effect, $\sigma^2 = 0$, and the weighted average effect is called a fixed-effect estimate. If the true effects of the compound vary between studies, $\sigma^2 > 0$, and the weighted average effect is called a random-effects estimate. For the fixed-effect estimate (ie, $\sigma^2 = 0$), the calculations are quite simple; for the random-effects estimate, the calculations are more complex but are available in computer packages.18-22
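The weighted average described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the usual inverse-variance weights (1 divided by the sum of the squared SE and the between-study variance); the function name `pooled_estimate`, the parameter `between_var`, and the three trial results are hypothetical and not taken from the article.

```python
import math

def pooled_estimate(x, se, between_var=0.0):
    """Inverse-variance weighted average of study effects and its SE.

    between_var = 0 gives a fixed-effect estimate;
    between_var > 0 gives a random-effects estimate.
    """
    weights = [1.0 / (s ** 2 + between_var) for s in se]
    total = sum(weights)
    mean = sum(w * xi for w, xi in zip(weights, x)) / total
    return mean, 1.0 / math.sqrt(total)

# Hypothetical log odds ratios and SEs of three trials (illustrative only)
x = [-0.5, -0.2, -0.8]
se = [0.2, 0.1, 0.4]

fixed = pooled_estimate(x, se)                      # assumes no between-study variance
random_ = pooled_estimate(x, se, between_var=0.05)  # assumed between-study variance
print("fixed-effect:   mean=%.3f SE=%.3f" % fixed)
print("random-effects: mean=%.3f SE=%.3f" % random_)
```

Adding a positive between-study variance shrinks every weight, so the random-effects SE in this sketch is larger than the fixed-effect SE.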
With dependence on the type of outcome variable, the summary statistics x1, x2, ..., xk have different forms.
Continuous Data
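Continuous data are summarized with means and SDs: mean1i and SD1i in the placebo group and mean2i and SD2i in the active treatment group of trial i. The summary statistic equals xi = mean1i - mean2i, and its SE is

$$SE(x_i) = \sqrt{\frac{SD_{1i}^2}{n_{1i}} + \frac{SD_{2i}^2}{n_{2i}}},$$

where n1i and n2i are the sample sizes of the 2 groups of trial i.

If a trial compares 2 treatments in the same patients, the summary statistic is xi = mean1i - mean2i, where mean1i and mean2i are the means of the 2 treatments, and

$$SE(x_i) = \sqrt{\frac{SD_{1i}^2 + SD_{2i}^2 - 2r\,SD_{1i}\,SD_{2i}}{n_i}},$$

where r is the correlation between the outcomes in the 2 treatments and ni is the number of patients in trial i.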
If the distribution of the outcomes is very skewed, it is more useful to summarize the outcomes with medians than means.
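The two summary statistics for continuous outcomes can be computed as in the sketch below. It assumes the standard formulas for the SE of a difference between 2 independent means and between 2 paired means; the function names and the numbers in the usage lines are hypothetical.

```python
import math

def mean_difference_parallel(mean1, sd1, n1, mean2, sd2, n2):
    """Difference in means and its SE for 2 independent groups."""
    x = mean1 - mean2
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return x, se

def mean_difference_paired(mean1, sd1, mean2, sd2, n, r):
    """Difference in means and its SE when the same n patients
    receive both treatments; r is the within-patient correlation."""
    x = mean1 - mean2
    se = math.sqrt((sd1 ** 2 + sd2 ** 2 - 2 * r * sd1 * sd2) / n)
    return x, se

# Hypothetical trial: same means and SDs, analysed as parallel or paired
parallel = mean_difference_parallel(10.0, 2.0, 50, 8.0, 2.0, 50)
paired = mean_difference_paired(10.0, 2.0, 8.0, 2.0, 50, r=0.5)
print(parallel, paired)
```

With a positive correlation the paired SE is smaller than the parallel-group SE, which is why the paired nature of crossover data should be taken into account.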
Binary Data
Binary data are summarized as proportions of patients with a positive outcome in the treatment arms, denoted by p1i and p2i. Three different summary statistics are used, as follows.
Risk Difference
The summary statistic of trial i equals xi = p1i - p2i, and the SE equals

$$SE(x_i) = \sqrt{\frac{p_{1i}(1 - p_{1i})}{n_{1i}} + \frac{p_{2i}(1 - p_{2i})}{n_{2i}}},$$

where n1i and n2i are the sample sizes of the 2 treatments of trial i.
Relative Risk
The summary statistic of trial i equals the ratio of the 2 proportions, but its distribution is often very skewed. Therefore, we prefer to analyze the natural logarithm of the relative risk, ln(RR). The summary statistic thus equals xi = ln(p1i/p2i), and the SE equals

$$SE(x_i) = \sqrt{\frac{1 - p_{1i}}{n_{1i}\,p_{1i}} + \frac{1 - p_{2i}}{n_{2i}\,p_{2i}}}.$$
Odds Ratio
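The summary statistic of trial i equals the ratio of the odds, but because the odds ratio is strictly positive, we again prefer to analyze the natural logarithm of the odds ratio. Thus, the summary statistic equals

$$x_i = \ln\left(\frac{p_{1i}/(1 - p_{1i})}{p_{2i}/(1 - p_{2i})}\right),$$

and the SE equals

$$SE(x_i) = \sqrt{\frac{1}{n_{1i}\,p_{1i}} + \frac{1}{n_{1i}(1 - p_{1i})} + \frac{1}{n_{2i}\,p_{2i}} + \frac{1}{n_{2i}(1 - p_{2i})}}.$$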
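The three binary summary statistics can be computed from the event counts of one trial as sketched below. The SE formulas are the standard large-sample ones and are an assumption here, since the article's own equations did not survive extraction; the function name and the counts are hypothetical. The sketch does not handle arms with 0 or 100% events, where the logarithms and SEs are undefined.

```python
import math

def binary_summaries(events1, n1, events2, n2):
    """Risk difference, ln(relative risk), ln(odds ratio) and their SEs."""
    p1, p2 = events1 / n1, events2 / n2
    rd = p1 - p2
    se_rd = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    ln_rr = math.log(p1 / p2)
    se_ln_rr = math.sqrt((1 - p1) / (n1 * p1) + (1 - p2) / (n2 * p2))
    ln_or = math.log((p1 / (1 - p1)) / (p2 / (1 - p2)))
    se_ln_or = math.sqrt(1 / (n1 * p1) + 1 / (n1 * (1 - p1))
                         + 1 / (n2 * p2) + 1 / (n2 * (1 - p2)))
    return {"rd": (rd, se_rd), "ln_rr": (ln_rr, se_ln_rr), "ln_or": (ln_or, se_ln_or)}

# Hypothetical trial: 20/100 events versus 10/100 events
result = binary_summaries(20, 100, 10, 100)
for name, (estimate, se) in result.items():
    print(f"{name}: estimate={estimate:.4f} SE={se:.4f}")
```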
Other Methods
The Mantel-Haenszel method has been developed for the stratified analysis of odds ratios and has been extended to the stratified analysis of risk ratios and risk differences.23 As in the general model, a weighted average effect is calculated. For the calculation of combined odds ratios, Peto's method is also often used.24 It applies a way to calculate odds ratios that may cause underestimation or overestimation of extreme values like odds ratios <0.2 or >5.0.
Sometimes valuable information can be obtained from crossover studies, and, if the paired nature of the data is taken into account, such data can be included in a meta-analysis. The Cochrane Library CD-ROM provides the generic inverse variance method for that purpose.17
Survival Data
Survival trials are summarized with Kaplan-Meier curves, and the difference between the survival in 2 treatment arms is quantified with the log(hazard ratio) calculated from the Cox regression model.
To test whether the weighted average is significantly different from 0.0, a $\chi^2$ test is used, as follows:

$$\chi^2 = \left(\frac{\bar{x}}{SE(\bar{x})}\right)^2 = \frac{\left(\sum_{i=1}^{k} w_i x_i\right)^2}{\sum_{i=1}^{k} w_i},$$

with 1 df, where $\bar{x}$ is the weighted average effect and $SE(\bar{x})$ is its SE. A calculated $\chi^2$ value >3.841 indicates that the pooled average is significantly different from 0.0 at P<0.05 and thus that a significant difference exists between the test and reference treatments. The generic inverse variance method is also possible for the analysis of hazard ratios.17
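The test of the pooled average can be sketched as follows, assuming fixed-effect weights (1 divided by the squared SE of each study). The P value for 1 df is obtained from the complementary error function, which is an exact identity for the $\chi^2$ distribution with 1 df; the function name and the data are hypothetical.

```python
import math

def pooled_chi_square(x, se):
    """Chi-square statistic (1 df) and P value for the pooled effect."""
    weights = [1.0 / s ** 2 for s in se]
    weighted_sum = sum(w * xi for w, xi in zip(weights, x))
    chi2 = weighted_sum ** 2 / sum(weights)
    # Upper tail of chi-square with 1 df: P(|Z| > sqrt(chi2))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical log odds ratios and SEs of three trials
chi2, p_value = pooled_chi_square([-0.5, -0.2, -0.8], [0.2, 0.1, 0.4])
print(f"chi2={chi2:.2f}, P={p_value:.4f}, significant={chi2 > 3.841}")
```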
| Pitfalls of Data Analysis |
|---|
Publication Bias
A good starting point with any statistical analysis is plotting the data (Figure 1, left). A Christmas tree13 or upside-down funnel pattern of distribution of the results of 100 published trials shows on the x axis the mean result of each trial and on the y axis the sample size of the trials. The smaller the trial, the wider is the distribution of results. The right panel of Figure 1 gives a simulated pattern, suggestive of publication bias: the negative trials are not published and thus are missing. This cut Christmas tree can help one to suspect that there is publication bias in the meta-analysis. Publication bias can be tested by calculating the shift of odds ratios caused by the addition of unpublished trials from abstract reports or proceedings.27
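The effect of unpublished negative trials on the funnel pattern can be illustrated with a small simulation. This is only a sketch under stated assumptions: the true effect is set to 0, the SE of a trial is assumed to equal 1 divided by the square root of its sample size, and every trial with a negative result is assumed to stay unpublished. All names and numbers are hypothetical.

```python
import random
import statistics

def simulate_trials(n_trials=100, true_effect=0.0, seed=1):
    """Return (sample_size, observed_effect) pairs for simulated trials."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        n = rng.randint(10, 500)       # hypothetical trial size
        se = 1.0 / n ** 0.5            # assumed: SE shrinks with sqrt(n)
        trials.append((n, rng.gauss(true_effect, se)))
    return trials

all_trials = simulate_trials()
# Crude publication filter: trials with a negative result are not published
published = [(n, x) for n, x in all_trials if x > 0]

mean_all = statistics.mean(x for _, x in all_trials)
mean_published = statistics.mean(x for _, x in published)
print(f"all trials: {mean_all:.3f}  published only: {mean_published:.3f}")
```

Plotting the observed effect on the x axis against the sample size on the y axis for `published` would give a cut funnel of the kind described above, and the mean of the published results overestimates the true effect.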
Heterogeneity
To visually assess heterogeneity between studies, several types of plots are proposed, including forest plots, radial plots, and L'Abbé plots.28 The forest plot of Figure 2 gives an example used by Thompson29 of a meta-analysis with odds ratios and 95% CIs, revealing information about heterogeneity. On the x axis are the results, and on the y axis are the trials. We see the results of 19 trials of endoscopic intervention versus no intervention for upper intestinal bleeding: odds ratios <1 represent a beneficial effect. These trials were considerably different in patient selection, baseline severity of condition, endoscopic techniques, management of bleeding, and duration of follow-up. Therefore, this is a meta-analysis that is, clinically, very heterogeneous. Is it also statistically heterogeneous? For that purpose, we may use a fixed-effect model, which tests whether there is a greater variation between the results of the trials than is compatible with the play of chance, using a $\chi^2$ test. The null hypothesis is that all studies have the same true odds ratio and that the observed odds ratios vary only because of sampling variation in each study. The alternative hypothesis is that the variation of the observed odds ratio is also due to systematic differences in true odds ratios between studies. The following test statistic is used to test the aforementioned null hypothesis with summary statistics xi and weights wi:

$$Q = \sum_{i=1}^{k} w_i \left(x_i - \bar{x}\right)^2,$$

with k - 1 df, where $\bar{x}$ is the fixed-effect weighted average of the xi.
We find Q=43 for 19 - 1 = 18 df for the example of the endoscopic intervention. The probability value is <0.001, providing substantial evidence for statistical heterogeneity. For the interpretation, it is useful to know that, when the null hypothesis is true, a Q statistic has on average a value close to the df and increases with increasing df. Therefore, a result of Q=18 with 18 df would give no evidence for heterogeneity, and the opposite is true for much larger values.
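The heterogeneity test can be sketched as follows, assuming fixed-effect weights (1 divided by the squared SE). To stay within the Python standard library, the P value uses the closed-form upper tail of the $\chi^2$ distribution that holds for an even number of df, which covers the article's example of 18 df; a statistics package would be needed for odd df. Function names and the three-trial data are hypothetical.

```python
import math

def cochran_q(x, se):
    """Q statistic and its df for study effects x with standard errors se."""
    weights = [1.0 / s ** 2 for s in se]
    x_bar = sum(w * xi for w, xi in zip(weights, x)) / sum(weights)
    q = sum(w * (xi - x_bar) ** 2 for w, xi in zip(weights, x))
    return q, len(x) - 1

def chi2_upper_tail_even_df(value, df):
    """P(chi-square with df > value); closed form valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive, even df")
    half = value / 2.0
    term = total = 1.0
    for j in range(1, df // 2):
        term *= half / j          # builds half**j / j!
        total += term
    return math.exp(-half) * total

# Hypothetical three-trial example
q, df = cochran_q([-0.5, -0.2, -0.8], [0.2, 0.1, 0.4])
print(f"Q={q:.2f} on {df} df, P={chi2_upper_tail_even_df(q, df):.3f}")

# The article's example: Q=43 with 18 df gives P below 0.001
p_example = chi2_upper_tail_even_df(43.0, 18)
print(p_example < 0.001)
```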
If the aforementioned test is positive, it is common to also calculate a random-effects estimate of the weighted average, as suggested by DerSimonian and Laird.30 We should add that, in most situations, the use of the random-effects model will lead to wider CIs and a lower chance of calling a difference statistically significant. A disadvantage of the random-effects analysis is that small and large studies are given almost similar weights.31 Complementary to the Q statistic, the amount of heterogeneity between studies is often quantified with the $I^2$ statistic,32 as follows:

$$I^2 = 100\% \times \frac{Q - (k - 1)}{Q},$$

which is interpreted as the proportion of total variation in study estimates due to heterogeneity rather than sampling error. Fifty percent is often used as a cutoff for heterogeneity.
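A short sketch of the two quantities discussed above. The percentage of variation due to heterogeneity is computed from Q and its df and truncated at 0, and the between-study variance uses the standard DerSimonian-Laird moment estimator; the article cites that method but does not print the formula, so its form here is an assumption, as are the function names and the weights in the usage lines.

```python
def i_squared(q, df):
    """Percentage of total variation attributed to heterogeneity."""
    if q <= 0:
        return 0.0
    return max(0.0, 100.0 * (q - df) / q)

def dersimonian_laird_variance(q, df, fixed_weights):
    """Moment estimate of the between-study variance (truncated at 0)."""
    total = sum(fixed_weights)
    scale = total - sum(w ** 2 for w in fixed_weights) / total
    return max(0.0, (q - df) / scale)

# The article's endoscopy example: Q=43 with 18 df
print(f"I2 = {i_squared(43.0, 18):.1f}%")

# Hypothetical three-trial example with fixed-effect weights 1/SE^2
weights = [25.0, 100.0, 6.25]
tau2 = dersimonian_laird_variance(14.25 - 1406.25 / 131.25, 2, weights)
print(f"between-study variance = {tau2:.4f}")
```

The estimated between-study variance can then be added to each squared SE to obtain random-effects weights.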
Investigating the Cause for Heterogeneity
When there is heterogeneity, careful investigation of the potential cause must be accomplished. The main focus should be trying to understand any sources of heterogeneity in the data. In practice, it may be less hard to assess because clinical differences have already been noticed, and it therefore becomes easy to test the data accordingly. The general approach is to quantify the association between the outcomes and characteristics of the different trials. Not only patient characteristics but also trial quality characteristics such as the use of blinding, randomization, and placebo controls have to be considered. Scatterplots are helpful to investigate the association between outcome and a covariate, but these must be inspected carefully because differences in trial sample sizes may distort the existence of association, and meta-regression techniques may be needed to investigate associations.
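One simple form of meta-regression is a weighted least-squares line of the trial outcome on a trial-level covariate, with each trial weighted by 1 divided by its squared SE so that large trials count more than small ones. The sketch below assumes this fixed-effect form; it is not necessarily the technique the authors have in mind, and the function name and data are hypothetical.

```python
def weighted_regression(covariate, effect, se):
    """Weighted least-squares intercept and slope of effect on covariate."""
    weights = [1.0 / s ** 2 for s in se]
    total = sum(weights)
    z_bar = sum(w * z for w, z in zip(weights, covariate)) / total
    x_bar = sum(w * x for w, x in zip(weights, effect)) / total
    s_zx = sum(w * (z - z_bar) * (x - x_bar)
               for w, z, x in zip(weights, covariate, effect))
    s_zz = sum(w * (z - z_bar) ** 2 for w, z in zip(weights, covariate))
    slope = s_zx / s_zz
    return x_bar - slope * z_bar, slope

# Hypothetical trials whose effect rises exactly linearly with the covariate
intercept, slope = weighted_regression([1, 2, 3, 4], [3, 5, 7, 9],
                                       [0.1, 0.2, 0.3, 0.4])
print(f"intercept={intercept:.2f}, slope={slope:.2f}")
```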
Outliers may also provide a clue about the cause of heterogeneity. Figure 3 shows the relation between cholesterol and coronary heart disease.33 The 2 outliers on top were the main cause of heterogeneity in the data.
Still other causes for heterogeneity may be involved. As an example, 33 studies of cholesterol and the risk of carcinomas showed that heterogeneity was huge.34 When the trials were divided according to social class, the effect in the lowest class was 4 to 5 times the effects in the middle and upper classes, explaining this heterogeneous result.
There is some danger of overinterpretation of heterogeneity. Heterogeneity may occur by chance and will almost certainly be found with large meta-analyses involving many and large studies. This is particularly an important possibility when no clinical explanation is found or when the heterogeneity is clinically irrelevant. In addition, we should warn that a great deal of uniformity among the results of independently performed studies is not necessarily good; it can indicate consistency in bias rather than consistency in real effects, as suggested by Riegelman.35
Lack of Robustness
Sensitivity or robustness of a meta-analysis is one last aspect to be addressed. When we described strict inclusion criteria, we discussed studies with lower levels of validity. It may be worthwhile not to completely reject the studies of lower methodological quality.34 They can be used for assessing sensitivity.
The left panel of Figure 4 gives an example of how the pooled data of 3 high-quality studies provide a smaller result than do 4 studies of borderline quality. The summary result is determined mainly by the borderline-quality studies. When studies are ordered according to use of blinding, as shown in the right panel of Figure 4, differences may or may not be large. In studies in which objective variables are used, eg, blood pressures or heart rates, blinding is not as important as it is in studies in which subjective variables (eg, pain scores) are used. In this particular example, differences were negligible. When examining the influence of various inclusion criteria on the overall odds ratios, we must conclude that the criteria themselves are an important factor in determining the summary result. In that case, the meta-analysis lacks robustness. Interpretation must be cautious, and pooling may have to be omitted altogether. Just omitting trials at this stage of the meta-analysis is inappropriate because it would introduce either bias similar to publication bias or bias introduced by not complying with the intention-to-treat principle.
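A sensitivity analysis of this kind can be sketched by pooling the studies separately per quality level and by repeating the pooling with each study left out in turn. The sketch assumes fixed-effect inverse-variance pooling; the 7 studies, their quality labels, and their results are hypothetical and do not reproduce Figure 4.

```python
def pool(x, se):
    """Fixed-effect inverse-variance weighted average."""
    weights = [1.0 / s ** 2 for s in se]
    return sum(w * xi for w, xi in zip(weights, x)) / sum(weights)

def leave_one_out(x, se):
    """Pooled estimate with each study omitted in turn."""
    return [pool(x[:i] + x[i + 1:], se[:i] + se[i + 1:]) for i in range(len(x))]

# Hypothetical studies: (quality, effect, SE)
studies = [("high", -0.10, 0.1), ("high", -0.15, 0.1), ("high", -0.05, 0.1),
           ("borderline", -0.50, 0.1), ("borderline", -0.60, 0.1),
           ("borderline", -0.40, 0.1), ("borderline", -0.55, 0.1)]

def pool_by_quality(quality):
    subset = [(x, s) for q, x, s in studies if q == quality]
    return pool([x for x, _ in subset], [s for _, s in subset])

pooled_high = pool_by_quality("high")
pooled_borderline = pool_by_quality("borderline")
pooled_all = pool([x for _, x, _ in studies], [s for _, _, s in studies])
omitted = leave_one_out([x for _, x, _ in studies], [s for _, _, s in studies])
print(pooled_high, pooled_borderline, pooled_all)
```

If the pooled result shifts markedly between the quality strata, as it does in these made-up data, the summary result depends on the inclusion criteria and the meta-analysis lacks robustness.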
| Discussion |
|---|
New statistical methods are being developed. Boekholdt et al45 showed that observational studies and clinical trials can be included simultaneously in a meta-analysis. Van Houwelingen et al46 assessed heterogeneity with multivariate methods for bivariate and multivariate outcome parameters. If trials directly comparing the treatments under study are not available, indirect comparisons with a common comparator may be used.47 A method like leave-1-out cross-validation is a standard sensitivity technique for such purpose. Lumley48 developed network meta-analysis to compare competing treatments not directly compared in trials. Terrin et al49 and Tang and Liu50 recently demonstrated that an asymmetrical Christmas tree is only related to publication bias if the trials included are homogeneous and that registries are a good alternative approach. In recent years, the method of meta-regression brought new insights.51,52 For example, it showed that group-level instead of patient-level analyses easily fail to detect heterogeneities between individual patients, otherwise termed ecological biases. Robustness is hard to assess if low-quality studies are lacking. Casas et al53 showed that it can be assessed by evaluating the extent to which different variables contribute to the variability between the studies. It can also be assessed with the use of cumulative meta-analysis,54 whereas quality measures can be adjusted for in meta-regression.
Meta-analyses including few studies, eg, 3 or 4, have little power to test the pitfalls. In contrast, meta-analyses including many studies may have so much power that they demonstrate small pitfalls that are not clinically relevant. For example, a meta-analysis of 43 angiotensin blocker studies55 found that the 95% CIs of the heterogeneity and publication bias effects were not wider than 5% of the treatment effects. Another reason why the pitfalls receive less attention today than 5 years ago is that an increasing share of current meta-analyses is performed in the form of working papers of an explorative nature, in which the primary question is not a result representative of the entire population but rather the estimates of the treatment effects in subgroups and interactions. These meta-analyses contain many details and look a bit like working papers of technological evaluations produced by physicists. The trend to increasingly publish detailed data, rather than study reports as allowed by journals, is enhanced by the Internet, which enables registration of many more data than do medical journals.
Meta-analyses were invented in the early 1970s by psychologists, but pooling study results extends back to the early 1900s by statisticians such as Karl Pearson and Ronald Fisher. In the first years, pooling of the data was often impossible because of heterogeneity of the studies. However, after 1995, trials became more homogeneous. In the late 1990s, several publications concluded that meta-analyses did not accurately predict treatment56,57 and adverse effects.58 The pitfalls were held responsible. Initiatives against them include (1) the Consolidated Standards of Reporting Trials Movement (CONSORT), (2) the Unpublished Paper Amnesty Movement of the English journals, and (3) the World Association of Medical Editors initiative to standardize the peer review system. Guidelines and checklists for reporting meta-analyses were published, such as QUOROM (Quality of Reporting of Meta-analyses) and MOOSE (Meta-analysis Of Observational Studies in Epidemiology).
| Conclusions |
|---|
| Acknowledgments |
|---|
None.
| References |
|---|
steel/procrastinus/meta/meta.html. Accessed May 24, 2007.