# Meta-Analysis

In 1982, thrombolytic therapy for acute coronary syndromes was controversial. In a meta-analysis of 7 trials, Stampfer et al^{1} found a reduced risk of mortality (pooled estimate, 0.80; 95% CI, 0.68 to 0.95). These findings were not accepted by cardiologists until 1986, when a large clinical trial confirmed the conclusions,^{2} and streptokinase became widely used.

Meta-analyses can be defined as systematic reviews with pooled data. Traditionally, they are post hoc analyses. However, probability statements may be more valid than they usually are with post hoc studies, particularly if performed on outcomes that were primary outcomes in the original trials. Problems with pooling are frequent: Correlations are often nonlinear^{3}; effects are often multifactorial rather than unifactorial^{4}; continuous data frequently have to be transformed into binary data for the purpose of comparability^{5}; poor studies may be included, and coverage may be limited^{6}; and data may not be homogeneous and may fail to relate to hypotheses.^{7} Despite these problems, meta-analysis is an invaluable scientific activity: it establishes whether scientific findings are consistent^{8} and can be generalized across populations and treatment variations^{9} and whether findings vary between subgroups.^{10} Its methods also limit bias, improve the reliability and accuracy of conclusions,^{11} and increase the power and precision of estimates of treatment effects and risk exposures.^{6}

The objective of this article is to review statistical procedures for the meta-analysis of cardiovascular research. The Google database system provides 659 000 references on the methods of meta-analysis and refers to hundreds of books of up to 600 pages,^{12} illustrating the complexity of this subject. The basic statistical analysis of meta-analyses, however, is not complex if the basic scientific methods are followed.^{13} We will first review the scientific methods and then introduce the statistical analysis, including the analysis of potential pitfalls. Finally, we will cover some new developments.

## Four Scientific Rules

The logic behind meta-analyses is simple and straightforward. It requires adherence to scientific methods, largely similar to those required for clinical trials. These scientific methods can be summarized as follows: (1) a clearly defined prior hypothesis, (2) thorough search of trials, (3) strict inclusion criteria, and (4) uniform data analysis.^{13}

### Clearly Defined Hypothesis

Clinical trials address efficacy and safety of new drugs or interventions. The main outcome variables and the manner in which they should be tested are specified in advance. A meta-analysis is similar to a single trial, and, as in a single trial, it tests a very small number of primary hypotheses, primarily that the new compound or intervention is more efficacious and safe than the reference compound or intervention.

### Thorough Search of Trials

The activity of thoroughly searching published research requires a systematic procedure and must be learned. One may use a checklist for this purpose, similar to the checklist used by aircraft staff before takeoff, a simile used by Oxman and Guyatt.^{14} A faulty review of trials is as perilous as a faulty aircraft, particularly if it is used to make decisions about healthcare. For a systematic review, MEDLINE^{15} is not enough, and other databases have to be searched, eg, EMBASE–Excerpta Medica^{16} and the Cochrane Library.^{17}

### Strict Inclusion Criteria

Inclusion criteria are concerned with the levels of validity, otherwise termed *quality criteria*, of the trials to be included. Having strict inclusion criteria means that we will subsequently include only the valid studies. Some factors have been shown empirically to beneficially influence validity. These factors include blinding the study, random assignment of patients, explicit description of methods, accurate statistics, and accurate ethics, including written informed consent. We should add that the inclusion of unpublished studies may reduce the magnitude of publication bias, an issue that will be discussed in Pitfalls of Data Analysis.

### Uniform Data Analysis

Statistical analysis is a tool that helps us to derive meaningful conclusions from the data and to avoid analytical errors. Statistics should be simple and test primary hypotheses in the first place. Before any analysis or data plots, we must decide what kind of data we have.

## General Framework of Meta-Analysis

In general, meta-analysis refers to statistical analysis of the results of different studies. The simplest analysis is to calculate an average, and in a meta-analysis a weighted average is computed. Consider a meta-analysis of k different clinical trials, and let x_{1}, x_{2}, …, x_{k} be the summary statistics. The weighted average effect is then calculated as

$$\bar{x} = \frac{\sum_{i=1}^{k} w_i x_i}{\sum_{i=1}^{k} w_i}$$

and its SE is

$$\mathrm{SE}(\bar{x}) = \sqrt{\frac{1}{\sum_{i=1}^{k} w_i}}$$

The weights w_{i} are a function of the SE of x_{i}, denoted as SE(x_{i}), and of the variance ς^{2} of the true effects of the compound between the k different studies:

$$w_i = \frac{1}{\mathrm{SE}(x_i)^2 + \varsigma^2}$$

If all k studies have the same true quantitative effect, ς^{2}=0, and the weighted average effect is called a *fixed-effect estimate*. If the true effects of the compound vary between studies, ς^{2} >0, and the weighted average effect is called a *random-effects estimate*. For the fixed-effect estimate (ie, ς^{2}=0), the calculations are quite simple; for the random-effects estimate, the calculations are more complex but are available in computer packages.^{18–22}
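The weighted average and its SE can be sketched in a few lines of Python. This is an illustration only; the three trial results below are hypothetical log odds ratios, not data from any real meta-analysis:

```python
import math

def pooled_estimate(x, se, var_between=0.0):
    """Weighted average of study effects x with standard errors se.

    var_between is the between-study variance (the text's sigma^2):
    0 gives the fixed-effect estimate, >0 a random-effects estimate.
    """
    w = [1.0 / (s ** 2 + var_between) for s in se]            # w_i = 1/(SE_i^2 + sigma^2)
    pooled = sum(wi * xi for wi, xi in zip(w, x)) / sum(w)    # weighted mean
    pooled_se = math.sqrt(1.0 / sum(w))                       # SE of the weighted mean
    return pooled, pooled_se

# three hypothetical trials: log odds ratios and their SEs
x = [-0.35, -0.20, -0.50]
se = [0.10, 0.15, 0.25]
est, est_se = pooled_estimate(x, se)   # fixed-effect pooling (var_between = 0)
```

Setting `var_between` to a positive value reproduces the random-effects weighting, in which small studies gain relative weight because the common ς² term dominates their larger SEs less.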

Depending on the type of outcome variable, the summary statistics x_{1}, x_{2}, …, x_{k} take different forms.

### Continuous Data

Continuous data are summarized with means and SDs: mean_{1i} and SD_{1i} in the placebo group and mean_{2i} and SD_{2i} in the active treatment group of trial i. The summary statistic equals x_{i}=mean_{1i}−mean_{2i} and

$$\mathrm{SE}(x_i) = \sqrt{\frac{\mathrm{SD}_{1i}^2}{n_{1i}} + \frac{\mathrm{SD}_{2i}^2}{n_{2i}}}$$

If a trial compares 2 treatments in the same patients, the summary statistic is x_{i}=mean_{1i}−mean_{2i}, where mean_{1i} and mean_{2i} are the means of the 2 treatments, and

$$\mathrm{SE}(x_i) = \sqrt{\frac{\mathrm{SD}_{1i}^2 + \mathrm{SD}_{2i}^2 - 2r\,\mathrm{SD}_{1i}\mathrm{SD}_{2i}}{n_i}}$$

where *r* is the correlation between the outcomes in the 2 treatments.

If the distribution of the outcomes is very skewed, it is more useful to summarize the outcomes with medians than means.
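The parallel-group and paired SE formulas above translate directly into code. A minimal sketch (the function names are ours, and the tolerances of any example inputs are hypothetical):

```python
import math

def parallel_se(sd1, n1, sd2, n2):
    """SE of a difference in means between two independent groups."""
    return math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)

def paired_se(sd1, sd2, n, r):
    """SE of a difference in means when both treatments are given to the
    same n patients; r is the within-patient correlation of the outcomes."""
    return math.sqrt((sd1 ** 2 + sd2 ** 2 - 2 * r * sd1 * sd2) / n)
```

Note that with r = 0 the paired SE reduces to the parallel-group SE with equal group sizes, and with r close to 1 the paired design becomes much more precise.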

### Binary Data

Binary data are summarized as proportions of patients with a positive outcome in the treatment arms, denoted by p_{1i} and p_{2i}. Three different summary statistics are used, as follows.

#### Risk Difference

The summary statistic of trial i equals x_{i}=p_{1i}−p_{2i}, and the SE equals

$$\mathrm{SE}(x_i) = \sqrt{\frac{p_{1i}(1-p_{1i})}{n_{1i}} + \frac{p_{2i}(1-p_{2i})}{n_{2i}}}$$

where n_{1i} and n_{2i} are the sample sizes of the 2 treatments of trial i.

#### Relative Risk

The summary statistic of trial i equals the ratio of the 2 proportions, but its distribution is often very skewed. Therefore, we prefer to analyze the natural logarithm of the relative risk, ln(RR). The summary statistic thus equals x_{i}=ln(p_{1i}/p_{2i}), and the SE equals

$$\mathrm{SE}(x_i) = \sqrt{\frac{1-p_{1i}}{n_{1i}p_{1i}} + \frac{1-p_{2i}}{n_{2i}p_{2i}}}$$

#### Odds Ratio

The summary statistic of trial i equals the ratio of the odds, but because the odds ratio is strictly positive, we again prefer to analyze the natural logarithm of the odds ratio. Thus, the summary statistic equals

$$x_i = \ln\!\left(\frac{p_{1i}/(1-p_{1i})}{p_{2i}/(1-p_{2i})}\right)$$

and the SE equals

$$\mathrm{SE}(x_i) = \sqrt{\frac{1}{n_{1i}p_{1i}} + \frac{1}{n_{1i}(1-p_{1i})} + \frac{1}{n_{2i}p_{2i}} + \frac{1}{n_{2i}(1-p_{2i})}}$$
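The three binary summary statistics and their SEs can be sketched together. A minimal illustration in Python (function names ours; the proportions and sample sizes in any example call are hypothetical):

```python
import math

def risk_difference(p1, n1, p2, n2):
    """Risk difference p1 - p2 and its SE."""
    x = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return x, se

def log_relative_risk(p1, n1, p2, n2):
    """Natural log of the relative risk, ln(p1/p2), and its SE."""
    x = math.log(p1 / p2)
    se = math.sqrt((1 - p1) / (n1 * p1) + (1 - p2) / (n2 * p2))
    return x, se

def log_odds_ratio(p1, n1, p2, n2):
    """Natural log of the odds ratio and its SE (sum of the four
    reciprocal cell counts under the square root)."""
    x = math.log((p1 / (1 - p1)) / (p2 / (1 - p2)))
    se = math.sqrt(1 / (n1 * p1) + 1 / (n1 * (1 - p1))
                   + 1 / (n2 * p2) + 1 / (n2 * (1 - p2)))
    return x, se
```

These per-trial statistics and SEs are exactly what feeds the weighted average of the general framework above.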

#### Other Methods

The Mantel-Haenszel method has been developed for the stratified analysis of odds ratios and has been extended to the stratified analysis of risk ratios and risk differences.^{23} Like the general model, it calculates a weighted average effect. For the calculation of combined odds ratios, Peto’s method is also often used.^{24} Its approximate calculation of the odds ratio may, however, underestimate or overestimate extreme values, such as odds ratios <0.2 or >5.0.

Sometimes valuable information can be obtained from crossover studies, and, if the paired nature of the data is taken into account, such data can be included in a meta-analysis. The Cochrane Library CD-ROM provides the generic inverse variance method for that purpose.^{17}

### Survival Data

Survival trials are summarized with Kaplan-Meier curves, and the difference between the survival in 2 treatment arms is quantified with the log(hazard ratio) calculated from the Cox regression model. To test whether the weighted average is significantly different from 0.0, a χ^{2} test is used, as follows:

$$\chi^2 = \left(\frac{\bar{x}}{\mathrm{SE}(\bar{x})}\right)^2$$

with 1 *df*. A calculated χ^{2} value >3.841 indicates that the pooled average is significantly different from 0.0 at *P*<0.05 and thus that a significant difference exists between the test and reference treatments. The generic inverse variance method is also possible for the analysis of hazard ratios.^{17}
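In code, this significance test is simply the square of the pooled estimate divided by its SE. A small sketch (the trial results are hypothetical, reusing inverse-variance fixed-effect weights):

```python
import math

def pooled_chi2(x, se):
    """Chi-square statistic (1 df) testing whether the fixed-effect
    weighted average differs from 0; compare against 3.841 for P < .05."""
    w = [1.0 / s ** 2 for s in se]                            # fixed-effect weights
    pooled = sum(wi * xi for wi, xi in zip(w, x)) / sum(w)    # weighted mean
    pooled_se = math.sqrt(1.0 / sum(w))                       # its SE
    return (pooled / pooled_se) ** 2

chi2 = pooled_chi2([-0.35, -0.20, -0.50], [0.10, 0.15, 0.25])
```

For these made-up log hazard ratios, the statistic is well above 3.841, so the pooled effect would be declared significant at *P*<0.05.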

## Pitfalls of Data Analysis

Meta-analyses will suffer from any bias that the individual studies included suffer from, including incorrect and incomplete data. Two publications emphasize these problems: (1) Of 49 recently published studies, 83% of the nonrandomized and 25% of the randomized studies were partly refuted soon after publication^{25}; and (2) of 519 recently published trials, 20% selectively reported positive results and reported negative results incompletely.^{26} Three common pitfalls of meta-analyses are listed below.

### Publication Bias

A good starting point with any statistical analysis is plotting the data (Figure 1, left). A Christmas tree^{13} or upside-down funnel pattern of distribution of the results of 100 published trials shows on the *x* axis the mean result of each trial and on the *y* axis the sample size of the trials. The smaller the trial, the wider is the distribution of results. The right panel of Figure 1 gives a simulated pattern, suggestive of publication bias: the negative trials are not published and thus are missing. This cut Christmas tree can help one to suspect that there is publication bias in the meta-analysis. Publication bias can be tested by calculating the shift of odds ratios caused by the addition of unpublished trials from abstract reports or proceedings.^{27}

### Heterogeneity

To visually assess heterogeneity between studies, several types of plots are proposed, including forest plots, radial plots, and L’Abbé plots.^{28} The forest plot of Figure 2 gives an example used by Thompson^{29} of a meta-analysis with odds ratios and 95% CIs, revealing information about heterogeneity. On the *x* axis are the results, and on the *y* axis are the trials. We see the results of 19 trials of endoscopic intervention versus no intervention for upper intestinal bleeding: odds ratios <1 represent a beneficial effect. These trials were considerably different in patient selection, baseline severity of condition, endoscopic techniques, management of bleeding, and duration of follow-up. Therefore, this is a meta-analysis that is, clinically, very heterogeneous. Is it also statistically heterogeneous? For that purpose, we may use a fixed-effect model, which tests whether there is a greater variation between the results of the trials than is compatible with the play of chance, using a χ^{2} test. The null hypothesis is that all studies have the same true odds ratio and that the observed odds ratios vary only because of sampling variation in each study. The alternative hypothesis is that the variation of the observed odds ratio is also due to systematic differences in true odds ratios between studies. The following test statistic is used to test the aforementioned null hypothesis with summary statistics x_{i} and weights w_{i}:

$$Q = \sum_{i=1}^{k} w_i (x_i - \bar{x})^2$$

with k−1 df.

We find Q=43 for 19−1=18 *df* for the example of the endoscopic intervention. The probability value is <0.001, providing substantial evidence for statistical heterogeneity. For the interpretation, it is useful to know that, when the null hypothesis is true, a Q statistic has on average a value close to the *df* and increases with increasing *df*. Therefore, a result of Q=18 with 18 *df* would give no evidence for heterogeneity, and the opposite is true for much larger values.

If the aforementioned test is positive, it is common to also calculate a random-effects estimate of the weighted average, as suggested by DerSimonian and Laird.^{30} We should add that, in most situations, the use of the random-effects model will lead to wider CIs and a lower chance to call a difference statistically significant. A disadvantage of the random-effects analysis is that small and large studies are given almost similar weights.^{31} Complementary to the Q statistic, the amount of heterogeneity between studies is often quantified with the I^{2} statistic,^{32} as follows:

$$I^2 = \frac{Q - (k-1)}{Q} \times 100\%$$

which is interpreted as the proportion of total variation in study estimates due to heterogeneity rather than sampling error. Fifty percent is often used as a cutoff for heterogeneity.
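Cochran's Q and I² are short computations. A minimal sketch in Python (illustrative only; negative I² values are truncated to 0, the usual convention):

```python
import math

def heterogeneity(x, se):
    """Cochran's Q (k-1 df) and the I^2 statistic for study effects x
    with standard errors se, using inverse-variance weights."""
    w = [1.0 / s ** 2 for s in se]
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sum(w)      # fixed-effect mean
    q = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))    # Q statistic
    df = len(x) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0       # I^2 in percent
    return q, i2
```

With the endoscopy example quoted above (Q=43 with 18 *df*), the same formula gives I² = (43−18)/43 ≈ 58%, above the common 50% cutoff and thus consistent with substantial heterogeneity.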

### Investigating the Cause for Heterogeneity

When there is heterogeneity, careful investigation of its potential cause is required. The main focus should be on understanding any sources of heterogeneity in the data. In practice, this may be less difficult than it seems because clinical differences between the trials have often already been noticed, and the data can then be tested accordingly. The general approach is to quantify the association between the outcomes and characteristics of the different trials. Not only patient characteristics but also trial quality characteristics such as the use of blinding, randomization, and placebo controls have to be considered. Scatterplots are helpful to investigate the association between outcome and a covariate, but they must be inspected carefully because differences in trial sample sizes may distort an apparent association, and meta-regression techniques may be needed to investigate associations.
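In its simplest form, meta-regression is an inverse-variance-weighted regression of the study effects on a trial-level covariate. A bare-bones sketch (illustrative only; dedicated packages additionally model residual between-study variance, which this sketch omits):

```python
def weighted_metaregression(x, se, z):
    """Weighted least-squares fit of study effects x (with SEs se) on a
    trial-level covariate z, weighting each trial by 1/SE^2.
    Returns (intercept, slope)."""
    w = [1.0 / s ** 2 for s in se]
    sw = sum(w)
    zbar = sum(wi * zi for wi, zi in zip(w, z)) / sw          # weighted mean of covariate
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw          # weighted mean effect
    num = sum(wi * (zi - zbar) * (xi - xbar) for wi, zi, xi in zip(w, z, x))
    den = sum(wi * (zi - zbar) ** 2 for wi, zi in zip(w, z))
    slope = num / den
    intercept = xbar - slope * zbar
    return intercept, slope
```

A slope credibly different from 0 suggests that the covariate explains part of the between-trial variation in effects.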

Outliers may also provide a clue about the cause of heterogeneity. Figure 3 shows the relation between cholesterol and coronary heart disease.^{33} The 2 outliers on top were the main cause of heterogeneity in the data.

Still other causes for heterogeneity may be involved. As an example, 33 studies of cholesterol and the risk of carcinomas showed that heterogeneity was huge.^{34} When the trials were divided according to social class, the effect in the lowest class was 4 to 5 times the effects in the middle and upper classes, explaining this heterogeneous result.

There is some danger of overinterpretation of heterogeneity. Heterogeneity may occur by chance and will almost certainly be found with large meta-analyses involving many and large studies. This is a particularly important possibility when no clinical explanation is found or when the heterogeneity is clinically irrelevant. In addition, we should warn that a great deal of uniformity among the results of independently performed studies is not necessarily good; it can indicate consistency in bias rather than consistency in real effects, as suggested by Riegelman.^{35}

### Lack of Robustness

Sensitivity or robustness of a meta-analysis is one last aspect to be addressed. When we described strict inclusion criteria, we discussed studies with lower levels of validity. It may be worthwhile not to completely reject the studies of lower methodological quality.^{34} They can be used for assessing sensitivity.

The left panel of Figure 4 gives an example of how the pooled data of 3 high-quality studies provide a smaller result than do 4 studies of borderline quality. The summary result is determined mainly by the borderline-quality studies. When studies are ordered according to use of blinding, as shown in the right panel of Figure 4, differences may or may not be large. In studies in which objective variables are used, eg, blood pressures or heart rates, blinding is not as important as it is in studies in which subjective variables (eg, pain scores) are used. In this particular example, differences were negligible. When examining the influence of various inclusion criteria on the overall odds ratios, we must conclude that the criteria themselves are an important factor in determining the summary result. In that case, the meta-analysis lacks robustness. Interpretation must be cautious, and pooling may have to be omitted altogether. Just omitting trials at this stage of the meta-analysis is inappropriate because it would introduce either bias similar to publication bias or bias introduced by not complying with the intention-to-treat principle.

## Discussion

Software programs for meta-analysis are provided by SAS,^{18} the Cochrane Revman,^{20} S-plus,^{36} StatsDirect,^{37} StatXact,^{38} and True Epistat.^{39} Most of these programs are expensive, but common procedures are available through Microsoft’s Excel and in Excel add-ons,^{40} and free software such as BUGS^{41} and R^{42} offers statistical analyses at no cost. Leandro’s software program^{43} visualizes heterogeneity directly from a computer graph based on Galbraith^{44} plots.

New statistical methods are being developed. Boekholdt et al^{45} showed that observational studies and clinical trials can be included simultaneously in a meta-analysis. Van Houwelingen et al^{46} assessed heterogeneity with multivariate methods for bivariate and multivariate outcome parameters. If trials directly comparing the treatments under study are not available, indirect comparisons with a common comparator may be used.^{47} A method like leave-1-out cross-validation is a standard sensitivity technique for such purpose. Lumley^{48} developed network meta-analysis to compare competing treatments not directly compared in trials. Terrin et al^{49} and Tang and Liu^{50} recently demonstrated that an asymmetrical Christmas tree is only related to publication bias if the trials included are homogeneous and that registries are a good alternative approach. In recent years, the method of meta-regression brought new insights.^{51,52} For example, it showed that group-level instead of patient-level analyses easily fail to detect heterogeneities between individual patients, otherwise termed *ecological biases*. Robustness is hard to assess if low-quality studies are lacking. Casas et al^{53} showed that it can be assessed by evaluating the extent to which different variables contribute to the variability between the studies. It can also be assessed with the use of cumulative meta-analysis,^{54} whereas quality measures can be adjusted for in meta-regression.

Meta-analyses including few studies, eg, 3 or 4, have little power to test the pitfalls. In contrast, meta-analyses including many studies may have so much power that they demonstrate small pitfalls that are not clinically relevant. For example, a meta-analysis of 43 angiotensin blocker studies^{55} found that the 95% CIs of the heterogeneity and publication bias effects were not wider than 5% of the treatment effects. Another reason why the pitfalls receive less attention today than 5 years ago is that an increasing proportion of current meta-analyses are performed in the form of working papers of an explorative nature, in which the primary question is not a result representative of the entire population but rather the estimates of the treatment effects in subgroups and interactions. These meta-analyses contain many details and look a bit like the working papers of technological evaluations produced by physicists. The trend to publish increasingly detailed data, rather than study reports as allowed by journals, is enhanced by the Internet, which enables registration of many more data than medical journals do.

Meta-analyses were invented in the early 1970s by psychologists, but the pooling of study results extends back to the early 1900s and statisticians such as Karl Pearson and Ronald Fisher. In the first years, pooling of the data was often impossible because of heterogeneity of the studies. However, after 1995, trials became more homogeneous. In the late 1990s, several publications concluded that meta-analyses did not accurately predict treatment effects^{56,57} and adverse effects.^{58} The pitfalls were held responsible. Initiatives to address them include (1) the Consolidated Standards of Reporting Trials (CONSORT) movement, (2) the Unpublished Paper Amnesty Movement of the English journals, and (3) the World Association of Medical Editors’ initiative to standardize the peer review system. Guidelines and checklists for reporting meta-analyses, such as QUOROM (Quality of Reporting of Meta-analyses) and MOOSE (Meta-analysis Of Observational Studies in Epidemiology), have also been published.

## Conclusions

Meta-analysis is important in cardiovascular research because it establishes whether scientific findings are consistent and can be generalized across populations. The statistical analysis consists of the computation of weighted averages of study characteristics and their SEs. Common pitfalls of data analysis are (1) publication bias, (2) heterogeneity, and (3) lack of robustness. New developments in the statistical analysis include (1) new software that is easy to use, (2) new arithmetical methods that facilitate the assessment of heterogeneity and comparability of studies, and (3) a current trend toward more extensive data reporting, including multiple subgroup and interaction analyses. Meta-analyses are governed by the traditional rules for scientific research; the pitfalls are particularly relevant to hypothesis-driven meta-analyses but less so to current working papers with emphasis on entire data coverage.

## Acknowledgments

**Disclosures**

None.

## References

1. 
2. 
3. 
4. 
5. Stein RA. Meta-analysis from one FDA reviewer’s perspective. Proc Biopharmaceut Sect Am Statist Assoc. 1988;2:34–38.
6. Zhou X, Fang J, Yu C, Xu Z, Lu Y. Meta-analysis. In: Lu Y, Fang J, eds. Advanced Medical Statistics. River Edge, NJ: World Scientific; 2003:233–316.
7. 
8. Cook DJ, Mulrow CD, Haynes RB. Systematic reviews: synthesis of the best evidence for clinical decisions. Ann Intern Med. 1997;126:376–380.
9. Straus SE, Sackett DL. Using research findings in clinical practice. BMJ. 1998;317:339–342.
10. Bero LA, Grilli R, Grimshaw JM, Harvey E, Oxman AD, Thomson MA. Closing the gap between research and practice: an overview of systematic reviews of interventions to promote the implementation of research findings. BMJ. 1998;317:465–468.
11. 
12. Hunter JE, Schmidt FL. Methods of Meta-Analysis. 2nd ed. Thousand Oaks, Calif: Sage Publications; 2004.
13. Cleophas TJ. Meta-analysis. In: Cleophas TJ, Zwinderman AH, Cleophas AF, eds. Statistics Applied to Clinical Trials. 3rd ed. New York, NY: Springer; 2006:205–218.
14. 
15. Greenhalgh T. How to read a paper: the Medline database. BMJ. 1997;315:180–183.
16. Lefebvre C, McDonald S. Development of a sensitive search strategy for reports of randomized trials in EMBASE. Paper presented at: Fourth International Cochrane Colloquium; October 20–24, 1996; Adelaide, Australia.
17. Cochrane Library. Available at: http://www.cochrane.org/resources/handbook/index.htm. Accessed May 24, 2007.
18. SAS. Available at: http://www.prw.le.ac.uk/epidemiol/personal/ajs22/meta/macros.sas. Accessed May 24, 2007.
19. SPSS Statistical Software. Available at: http://www.spss.com. Accessed May 24, 2007.
20. Cochrane Revman. Available at: http://www.cochrane.org/cochrane/revman.htm. Accessed May 24, 2007.
21. Stata, statistical software for professionals. Available at: http://www.stat.com. Accessed May 24, 2007.
22. Comprehensive meta-analysis, by Biostat. Available at: http://www.meta-analysis.com. Accessed May 24, 2007.
23. 
24. 
25. Ioannidis JP. Contradicted and initially stronger effects in highly cited clinical research. JAMA. 2005;294:218–228.
26. Chan AW, Altman DG. Identifying outcome reporting bias in randomised trials on PubMed: review of publications and survey of authors. BMJ. 2005;330:753–756.
27. Chalmers I, Altman DG. Systematic Reviews. London, UK: BMJ Publishing Group; 1995.
28. NCSS statistical and power analysis software. Available at: http://www.ncss.com/metaanal.html. Accessed May 24, 2007.
29. Thompson SG. Why sources of heterogeneity should be investigated. In: Chalmers I, Altman DG, eds. Systematic Reviews. London, UK: BMJ Publishing Group; 1995:48–63.
30. 
31. 
32. 
33. Shipley MJ, Pocock SJ, Marmot MG. Does plasma cholesterol concentration predict mortality from coronary heart disease in elderly people? 18 year follow-up in Whitehall study. BMJ. 1991;303:89–92.
34. 
35. Riegelman RK. Meta-analysis. In: Riegelman RK, ed. Studying a Study & Testing a Test. Philadelphia, Pa: Lippincott Williams & Wilkins; 2005:99–115.
36. S-plus. Available at: http://www.insightful.com. Accessed May 24, 2007.
37. StatsDirect. Available at: http://www.camcode.com. Accessed May 24, 2007.
38. StatXact. Available at: http://www.cytel.com/products/statxact/statact1.html. Accessed May 24, 2007.
39. True Epistat. Available at: http://www.true-epistat.com. Accessed May 24, 2007.
40. Meta-Analysis Mark X Microsoft Excel. Available at: http://www.ucalgary.ca/∼steel/procrastinus/meta/meta.html. Accessed May 24, 2007.
41. BUGS and WinBUGS. Available at: http://www.mrc-bsu.cam.ac.uk/bugs. Accessed May 24, 2007.
42. R. Available at: http://cran.r-project.org. Accessed May 24, 2007.
43. Leandro G. Meta-Analysis in Medical Research. London, UK: BMJ Books; 2005.
44. 
45. Boekholdt SM, Sacks FM, Jukema JW, Shepherd J, Freeman DJ, McMahon AD, Cambien F, Nicaud V, de Grooth GJ, Talmud PJ, Humphries SE, Miller GJ, Eiriksdottir G, Gudnason V, Kauma H, Kakko S, Savolainen MJ, Arca M, Montasli A, Liu S, Lanz HJ, Zwinderman AH, Kuivenhoven JA, Kastelein JJ. Cholesterol ester transfer protein TaqIB variant, high density lipoprotein cholesterol levels, cardiovascular risk, and efficacy of pravastatin treatment. Circulation. 2005;111:278–287.
46. 
47. 
48. 
49. 
50. 
51. 
52. Higgins JPT, Thompson SG. Controlling the risk of spurious findings from meta-regression. Stat Med. 2004;23:1662–1682.
53. Casas JP, Bautista LE, Humphries SE. Endothelial NO synthase genotype and ischemic heart disease. Circulation. 2004;109:1359–1365.
54. 
55. 
56. 
57. 
58. 

Cleophas TJ, Zwinderman AH. Meta-Analysis. *Circulation*. 2007;115:2870–2875. Originally published June 4, 2007. https://doi.org/10.1161/CIRCULATIONAHA.105.594960