1 Introduction and background

Meta-analysis is a statistical methodology for the analysis and integration of results from individual, independent studies. Over the last decades, meta-analysis has come to play a crucial role in many fields of science, such as medicine and pharmacy, the health sciences, psychology, and the social sciences (Petitti 1994; Schulze et al. 2003; Böhning et al. 2008; Sutton et al. 2000; Egger et al. 2001). Consider the typical set-up of a meta-analysis: effect measure estimates \({\hat{\theta }}_{1}, \ldots , {\hat{\theta }}_{k}\) are available from k studies with associated variances \(\sigma _1^2, \ldots , \sigma _k^2\), which are assumed to be known and equal to their sampling counterparts. Typically, the random effects model

$$\begin{aligned} {\hat{\theta }}_i=\theta +\delta _i + \epsilon _i \end{aligned}$$

is employed where \(\delta _i\sim N(0,\tau ^2)\) is a normal random effect and \( \epsilon _i \sim N(0,\sigma _i^2)\) is a normal random error, all random effects and errors being independent, and \(\tau ^2 \ge 0\). Furthermore, let \(w_i=1/\sigma _i^2\) and \(W_i=1/(\sigma _i^2+\tau ^2)\). The heterogeneity statistic Q is defined as

$$\begin{aligned} Q = \sum _{i=1}^k w_i({\hat{\theta }}_{i} - {\bar{\theta }})^{2}, \end{aligned}$$

where \({\bar{\theta }}= \sum _{i=1}^{k} w_{i} {\hat{\theta }}_{i}/\sum _{i=1}^{k} w_i\). The distribution of Q has been investigated, for example by Biggerstaff and Jackson (2008), and a critical appraisal of Q is given by Hoaglin (2016). More importantly for this work, Q is also the basis of the DerSimonian–Laird estimator of the heterogeneity variance \(\tau ^2\), which is given, in its untruncated form, by

$$\begin{aligned} \hat{\tau }^2=\frac{Q-(k-1)}{\sum _{i=1}^k w_i -\sum _{i=1}^k w_i^2/[\sum _{i=1}^k w_i]}. \end{aligned}$$

The statistic Q is also the foundation of Higgins’ \(I^2\), defined as

$$\begin{aligned} I^2=\frac{Q-(k-1)}{Q} \end{aligned}$$
(1)

designed to quantify the magnitude of heterogeneity involved in the meta-analysis (Higgins and Thompson 2002; Borenstein et al. 2009). Higgins’ \(I^2\) is very popular and has been discussed intensively, including critical appraisals of its dependence on the study-specific precision given by Rücker et al. (2008) or, more recently, by Borenstein et al. (2017). \(I^2\) has also recently been generalized to the multivariate context, see Jackson et al. (2012). \(I^2\) is indeed a proportion and, if multiplied by 100, a percentage. In addition, \(I^2\) has a variance component interpretation. The variance of the effect measure, \(Var({\hat{\theta }}_{i})= \tau ^2+\sigma _i^2\), can be partitioned into the within-study variance and the variance between studies. If all studies had the same within-study variance \(\sigma _i^2=\sigma ^2\), then \(\tau ^2/(\tau ^2+\sigma ^2)\) would readily define the proportion of variance due to across-study variation, or simply due to heterogeneity. With study-specific variances, an average variance needs to be used and, if a specific average is selected, then \(I^2\) can be interpreted as the proportion of variation due to heterogeneity. This might not be obvious from the definition provided in (1) but becomes more evident from the identity (although this can be found elsewhere, a proof of this identity is given in the Appendix for completeness)

$$\begin{aligned} I^2=\frac{{\hat{\tau }}^{2}}{{\hat{\tau }^{2}+s^{2}}}, \end{aligned}$$
(2)

where \(s^2=(k-1)\sum _{i=1}^k w_i/[(\sum _{i=1}^k w_i)^2 - \sum _{i=1}^k w_i^2]\). If \(s^2\) could be viewed as some sort of average of the study-specific variances \( \sigma _1^2, \ldots , \sigma _k^2\), then \(I^2\) could be validly interpreted, as is typically done in variance component models, as the proportion of the total variance (variance due to heterogeneity plus within-study variance) that is due to heterogeneity.
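Identity (2) is easy to verify numerically. The following is a minimal sketch in Python; the effect estimates and within-study variances are hypothetical values chosen only for illustration.

```python
import numpy as np

# Hypothetical effect estimates and within-study variances for k = 5 studies
theta_hat = np.array([0.10, 0.35, -0.05, 0.20, 0.50])
sigma2 = np.array([0.04, 0.09, 0.06, 0.02, 0.12])

k = len(theta_hat)
w = 1.0 / sigma2                                  # fixed-effect weights w_i
theta_bar = np.sum(w * theta_hat) / np.sum(w)     # weighted mean
Q = np.sum(w * (theta_hat - theta_bar) ** 2)      # heterogeneity statistic Q

# Untruncated DerSimonian-Laird estimate of tau^2
tau2 = (Q - (k - 1)) / (np.sum(w) - np.sum(w ** 2) / np.sum(w))

# s^2 as defined below identity (2)
s2 = (k - 1) * np.sum(w) / (np.sum(w) ** 2 - np.sum(w ** 2))

I2_from_Q = (Q - (k - 1)) / Q        # definition (1)
I2_from_tau2 = tau2 / (tau2 + s2)    # right-hand side of (2)
print(I2_from_Q, I2_from_tau2)       # identical up to floating-point error
```

Note that the untruncated estimator is used here; with the usual truncation of \({\hat{\tau }}^2\) at zero, the identity holds only when \(Q \ge k-1\).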

This short note serves two purposes:

  • we will show that, under mild regularity assumptions, \(s^2\) is asymptotically identical to the harmonic mean \({ {\bar{\sigma }^2}} =\left[ \frac{1}{k} \sum _{i=1}^k w_i\right] ^{-1}\) of the study-specific variances.

  • we will show that \(s^2 \ge {{\bar{\sigma }^2}}\), with the difference \(s^2 - {{\bar{\sigma }^2}}\) being zero if all study-specific variances are identical and, in the more general case of non-identical study-specific variances, approaching zero as k becomes large.

2 Main results

2.1 The harmonic mean result

We have the following result:

Theorem 1

If there exist positive constants b and B such that for all \(i=1,2,\ldots ,k\)

$$\begin{aligned} 0< b \le \sigma _i^2 \le B < \infty , \end{aligned}$$

then

$$\begin{aligned} |s^2 - {{\bar{\sigma }^2}}| \rightarrow 0 \text{ for } k \rightarrow \infty . \end{aligned}$$

Proof

We can write \(s^2\) as

$$\begin{aligned} s^2 = \frac{(k-1)/k}{\frac{1}{k}\sum _i w_i -\left[ \frac{1}{k} \sum _i w_i^2\right] /\left[ \sum _i w_i\right] }. \end{aligned}$$

As \((k-1)/k \rightarrow 1\) for \(k \rightarrow \infty \), and as \(\frac{1}{k}\sum _i w_i = 1/{\bar{\sigma }^2}\) remains bounded between \(1/B\) and \(1/b\), it is sufficient to show that

$$\begin{aligned} \frac{\frac{1}{k} \sum _i w_i^2}{\sum _i w_i} \rightarrow 0 \end{aligned}$$

as \(k \rightarrow \infty \). As \(\sigma _i^2\) is bounded below by b for all i, we have that

$$\begin{aligned} w_i^2 \le 1/b^2 \end{aligned}$$

for all i, so that

$$\begin{aligned} \frac{1}{k} \sum _i w_i^2 \le 1/b^2. \end{aligned}$$
(3)

In addition, as \(\sigma _i^2\) is bounded above by B for all i, we have that

$$\begin{aligned} w_i \ge 1/B \end{aligned}$$

so that

$$\begin{aligned} \frac{1}{\sum _i w_i} \le B/k. \end{aligned}$$
(4)

Taking (3) and (4) together yields

$$\begin{aligned} \frac{\frac{1}{k} \sum _i w_i^2}{\sum _i w_i} \le \frac{1}{k} \frac{B}{b^2} \rightarrow 0 \end{aligned}$$

for \(k \rightarrow \infty \). This ends the proof. \(\square \)
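The rate at which the difference vanishes can be explored with a small numerical experiment; the following is a minimal Python sketch, with the study-specific variances drawn from an arbitrary bounded range chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def s2_and_harmonic_mean(sigma2):
    """Return s^2 and the harmonic mean of the study-specific variances."""
    k = len(sigma2)
    w = 1.0 / sigma2
    s2 = (k - 1) * np.sum(w) / (np.sum(w) ** 2 - np.sum(w ** 2))
    harmonic = 1.0 / np.mean(w)
    return s2, harmonic

# Bounded variances: b = 0.01 <= sigma_i^2 <= B = 0.25
for k in (5, 20, 100, 1000):
    sigma2 = rng.uniform(0.01, 0.25, size=k)
    s2, harmonic = s2_and_harmonic_mean(sigma2)
    print(k, abs(s2 - harmonic))    # the difference shrinks as k grows
```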

2.2 The inequality

Further clarification on the relation between \(s^2\) and \({{\bar{\sigma }^2}}\) is given by the following inequality. Note that this result does not require any assumption on the variances \( \sigma _1^2\), ..., \(\sigma _k^2\).

Theorem 2

$$\begin{aligned} s^2 \ge {{\bar{\sigma }^2}}, \end{aligned}$$

with equality if

$$\begin{aligned} \sigma _1^2 = \sigma _2^2= \cdots = \sigma _k^2. \end{aligned}$$

Proof

We need to show that

$$\begin{aligned} s^2=(k-1)\sum _{i=1}^k w_i/\left[ \left( \sum _{i=1}^k w_i\right) ^2 - \sum _{i=1}^k w_i^2\right] \ge k/\sum _i w_i. \end{aligned}$$

This is equivalent to showing that

$$\begin{aligned} \frac{k-1}{1- \frac{\sum _i w_i^2}{\left( \sum _i w_i\right) ^2} } \ge k, \end{aligned}$$

or

$$\begin{aligned} \frac{\sum _i w_i^2}{\left( \sum _i w_i\right) ^2} \ge 1/k. \end{aligned}$$

This is equivalent to

$$\begin{aligned} E(V^2)/[E(V)^2] \ge 1, \end{aligned}$$

for a random variable V giving equal weights to \(w_1,\ldots , w_k\). This inequality holds as

$$\begin{aligned} Var(V)= E(V^2)-E(V)^2 \ge 0, \end{aligned}$$

with equality if \(w_1=w_2=\cdots =w_k\), and this ends the proof. \(\square \)
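As a quick numerical illustration, with hypothetical weights chosen only for this purpose: for \(k=2\), \(w_1=1\) and \(w_2=4\) (that is, \(\sigma _1^2=1\) and \(\sigma _2^2=1/4\)), we obtain \(\sum _i w_i^2/(\sum _i w_i)^2 = 17/25 \ge 1/2\) and, correspondingly, \(s^2 = 1\cdot 5/(25-17)=0.625\) whereas \({\bar{\sigma }^2}=[\frac{1}{2}\cdot 5]^{-1}=0.4\), in line with the inequality; if instead \(w_1=w_2\), the two quantities coincide.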

3 Empirical illustrations

Here, we illustrate these relationships on the basis of 15 meta-analyses; details are given in Table 1. These meta-analyses were not selected in any particular way; they were simply collected from the literature while teaching a course on statistical methods for meta-analysis.

Table 1 Details of the 15 meta-analyses used to illustrate the asymptotic result of Theorem 1

It is clear from Theorem 1 that the difference between \(s^2\) and \({{\bar{\sigma }^2}}\) should become smaller with an increasing number of studies k. In Fig. 1, we examine \(\log (s^2 /{{\bar{\sigma }^2}})\) as a function of k. We take the log-ratio of \(s^2\) and \( {{\bar{\sigma }^2}}\) to remove scaling differences across the different meta-analyses. There is a clear decreasing trend of \(\log (s^2 /{{\bar{\sigma }^2}})\) with increasing k.

Fig. 1 Scatter plot with regression line of \(\log (s^2 /{\bar{\sigma }^2})\) versus k for 15 different meta-analyses; the correlation coefficient is −0.29

In Fig. 2, we examine \(\log (s^2 /{\bar{\sigma }^2})\) versus \(\sqrt{\frac{1}{k-1}\sum _i(\sigma _i^2-{\tilde{\sigma }^2})^2}/{\tilde{\sigma }^2}\), where \({\tilde{\sigma }^2}=\frac{1}{k}\sum _i \sigma _i^2\). Here the coefficient of variation is used as a measure of variability of the within-study variances; again, the coefficient of variation (in contrast to the standard deviation) is needed to remove scale variation across the different meta-analyses. There is a clear increasing trend of \(\log (s^2 /{\bar{\sigma }^2})\) with increasing variability of the study-specific variances involved in the meta-analysis.

Fig. 2 Scatter plot with regression line of \(\log (s^2 /{\bar{\sigma }^2})\) versus the standard deviation of \( \sigma _1^2, \ldots , \sigma _k^2\) divided by their mean for 15 different meta-analyses; the correlation coefficient is 0.33

4 A simulation study

To further investigate these findings, we have undertaken the following simulation study. We assume that the study-specific variances differ only through the study sizes, so that \( \sigma _i^2 = \sigma ^2/n_i\), where \(n_i\) is the sample size of study i. We take \(\sigma ^2=1\) so that \(w_i= \frac{1}{\sigma _i^2}= n_i\). We consider three settings in which \(w_i=n_i\) is sampled from a uniform distribution with common mean 55 but different ranges:

  • setting 1: \(w_i=n_i \sim Uniform(10,100)\) for \(i=1,\ldots ,k\), that is, \(n_i\) is drawn from a uniform distribution on [10, 100];

  • setting 2: \(w_i=n_i \sim Uniform(30,80)\) for \(i=1,\ldots ,k\), that is, \(n_i\) is drawn from a uniform distribution on [30, 80];

  • setting 3: \(w_i=n_i \sim Uniform(45,65)\) for \(i=1,\ldots ,k\), that is, \(n_i\) is drawn from a uniform distribution on [45, 65].

Setting 3 has the smallest range, setting 1 the largest, and setting 2 is in between. Based on the simulated \(w_1, \ldots , w_k\), the following measures are calculated:

Fig. 3 Scatter plot of \(\log (s^2 /{\bar{\sigma }^2})\) versus the number of studies k involved in the meta-analysis for the three settings of the simulation study as described in Sect. 4

  1. the arithmetic mean \({\tilde{\sigma }^2}=\frac{1}{k}\sum _i \sigma _i^2= \frac{1}{k} \sum _{i=1}^k w_i^{-1}\) of the study-specific variances;

  2. the harmonic mean \({\bar{\sigma }^2} =[\frac{1}{k} \sum _{i=1}^k w_i]^{-1}\) of the study-specific variances;

  3. \(s^2=(k-1)\sum _{i=1}^k w_i/[(\sum _{i=1}^k w_i)^2 - \sum _{i=1}^k w_i^2]\);

  4. \(\log (s^2/{\bar{\sigma }^2})\);

  5. the coefficient of variation \(CV = \sqrt{\frac{1}{k-1}\sum _i(\sigma _i^2-{\tilde{\sigma }^2})^2}/{\tilde{\sigma }^2}\), where \({\tilde{\sigma }^2}=\frac{1}{k}\sum _i \sigma _i^2\).

This process has been repeated 10,000 times and the means of the above measures calculated. Figure 3 shows a scatter plot of \(\log (s^2 /{\bar{\sigma }^2})\) versus the number of studies k involved in the meta-analysis. It can be seen that for k larger than 20 the difference between \(s^2\) and the harmonic mean becomes rather small. Of course, this occurs much earlier for the small-range setting 3 and the moderate-range setting 2. This illustrates the result of Theorem 1 and also indicates when the limit is approached with acceptable accuracy. Figure 4 shows that \(\log (s^2 /{\bar{\sigma }^2})\) also depends strongly on the variability of the study-specific variances involved in the meta-analysis. Clearly, the smaller the coefficient of variation, the closer the ratio of \(s^2\) to \({\bar{\sigma }^2}\) is to one. This illustrates Theorem 2.

Fig. 4 Scatter plot of \(\log (s^2 /{\bar{\sigma }^2})\) versus the coefficient of variation (CV) of the study-specific variances involved in the meta-analysis for the three settings of the simulation study as described in Sect. 4; the numbers next to the symbols in the graph indicate the number of studies involved in the meta-analysis
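The simulation itself is straightforward to reproduce. Below is a minimal Python sketch under the assumptions just described; the random number generator, seed, and grid of values for k are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(2024)
settings = {1: (10, 100), 2: (30, 80), 3: (45, 65)}   # uniform ranges for n_i
reps = 10_000                                          # number of replications

for setting, (low, high) in settings.items():
    for k in (5, 10, 20, 50, 100):
        log_ratio = np.empty(reps)
        cv = np.empty(reps)
        for r in range(reps):
            n = rng.uniform(low, high, size=k)   # w_i = n_i, sigma_i^2 = 1/n_i
            w = n
            sigma2 = 1.0 / n
            s2 = (k - 1) * np.sum(w) / (np.sum(w) ** 2 - np.sum(w ** 2))
            harmonic = 1.0 / np.mean(w)          # harmonic mean of sigma_i^2
            log_ratio[r] = np.log(s2 / harmonic)
            cv[r] = np.std(sigma2, ddof=1) / np.mean(sigma2)
        print(setting, k, log_ratio.mean(), cv.mean())
```

Plotting the averaged \(\log (s^2 /{\bar{\sigma }^2})\) against k and against the averaged CV should reproduce the qualitative patterns of Figs. 3 and 4.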

5 Discussion and conclusion

It was seen that \(s^2\) approximates the harmonic mean \([\frac{1}{k} \sum _{i=1}^k w_i]^{-1}\). This asymptotic result has been achieved under fairly mild assumptions (the study-specific variances need to be bounded away from zero and from infinity). As a referee kindly points out, Jackson and Bowden (2009, Sect. 3.1) also mention this result, without proof, in a simulation study, and it remains unclear in their work how generally the result is meant to hold. Here, in Theorem 1, some general conditions are given under which the asymptotic result is valid.

The harmonic mean appears to be a more reasonable summary measure of the study-specific variances than the arithmetic mean. To make this point clearer, consider the situation where \(\sigma _i^2=\sigma ^2/n_i\), with \(\sigma ^2=1\) for simplicity. Let us further assume that \(\sigma _i^2\) is estimated by \(s_i^2/n_i\), where \(s_i^2\) is the sample variance of study i. Then, under normality, \(Var(s_i^2)\) is proportional to \(1/n_i\); in other words, \(Var(s_i^2/n_i)\) is proportional to \(1/n_i^3\). As an optimal meta-analytic weighting scheme uses inverse variances, the optimal summary measure would be

$$\begin{aligned} \frac{\sum _i n_i^3 s_i^2/n_i}{\sum _i n_i^3}\approx \frac{\sum _i n_i^2 }{\sum _i n_i^3}. \end{aligned}$$
(5)

The harmonic mean \([\frac{1}{k} \sum _{i=1}^k n_i]^{-1}\) is an upper bound for the right-hand side of (5), since \((\sum _i n_i)(\sum _i n_i^2) \le k \sum _i n_i^3\) by Chebyshev's sum inequality; the bound is attained if all sample sizes agree. Hence we can understand the harmonic mean as an approximation of the optimal weighted summary of the study-specific variances.
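This bound is again easy to check numerically; a small Python sketch with hypothetical sample sizes chosen only for illustration:

```python
import numpy as np

n = np.array([20.0, 50.0, 80.0, 200.0])   # hypothetical sample sizes n_i
rhs = np.sum(n ** 2) / np.sum(n ** 3)     # right-hand side of (5)
harmonic = 1.0 / np.mean(n)               # harmonic mean of sigma_i^2 = 1/n_i
print(rhs, harmonic, harmonic >= rhs)     # the harmonic mean bounds (5) from above
```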

Higgins’ \(I^2\) can be validly interpreted as the proportion of variance due to heterogeneity if the variability of the study-specific variances is small and/or the number of component studies is moderately large. This provides a valuable way of interpreting any value of \(I^2\). If, however, the number of component studies is small and there is considerable variation in the variances of the component studies, it may be advisable to consider a more direct measure of the proportion due to heterogeneity.

Evidently, there is interest in developing a measure of heterogeneity which has an interpretation as a ‘proportion of total variation due to heterogeneity’. In this vein, Wetterslev et al. (2009) suggest a measure of heterogeneity, denoted by \(D^2\), involving, besides \(\tau ^2\), a new measure of average sampling error. The basis of \(D^2\) is the ratio of the harmonic mean of the study-specific variances increased by the heterogeneity variance to the harmonic mean of the study-specific variances. This new measure of heterogeneity seems to have a lot of rationale in its construction; however, it remains to be seen whether it establishes itself as an alternative to Higgins’ \(I^2\).

In conclusion, a measure of heterogeneity of the form \(\tau ^2/(\tau ^2+ {\bar{\sigma }^2})\), with \({\bar{\sigma }^2}\) the harmonic mean of the study-specific variances, seems to be a reasonably good choice. It also seems possible to extend this most straightforward definition of \(I^2\) to meta-regression contexts. Clearly, estimating \(I^2\) as \(\frac{Q-(k-1)}{Q}\) (and not as \({\hat{\tau }^{2}}/({\hat{\tau }^{2}}+ {\bar{\sigma }^2})\)) has considerable benefits, in particular as it is easily carried forward to meta-analytic regression approaches. The results given here justify that \(\frac{Q-(k-1)}{Q}\) can still be interpreted as the proportion of variance due to heterogeneity. In fact, it is this Q-based way of defining \(I^2\) which allows Jackson et al. (2012) to extend \(I^2\) to a multivariate meta-analytic context. Here, multivariate meta-analysis is understood in the sense of having several outcome measures or endpoints simultaneously available for meta-analysis. In the approach of Jackson et al. (2012), Q is first extended to a quadratic form incorporating all outcome measures and, in a second step, \(I^2\) is defined as \(\frac{Q-\nu }{Q}\), where \(\nu \) corresponds to the degrees of freedom of the quadratic form. The concept of ‘explained variance due to heterogeneity’ is evidently more difficult to generalize, as within-study and between-study variation involve covariance matrices due to the multivariate nature of the meta-analysis. This area is clearly of great interest for future work.