Introduction

Mediation models are widely used in the social and behavioral sciences, as demonstrated in recent books by Hayes (2013) and MacKinnon (2008). They are useful because they can be used to investigate the mechanism through which an input variable influences an output variable. To avoid underpowered research, a statistical power analysis should be conducted before data collection. We are aware of only a few studies that have discussed how to conduct power analysis for mediation models. Most literature on power analysis for mediation models has focused on a simple mediation model (Beasley, 2012; Fritz & MacKinnon, 2007; Vittinghoff, Sen, & McCulloch, 2009; Wang & Xue, 2012). Software is available in the form of R code (Kenny & Judd, 2013) and an R package (Qiu, 2013) for conducting power analysis for certain types of mediation models. Thoemmes, MacKinnon, and Reiser (2010) proposed a general framework for power analysis for complex mediation models using Monte Carlo simulation in Mplus (Muthén & Muthén, 1998–2011). However, their method assumes that data are normally distributed and uses the Sobel test, although it can be extended to nonnormal data analysis.

Data collected in practice are often nonnormal. For example, Micceri (1989) reported that among 440 large-sample achievement and psychometric measures taken from journal articles, research projects, and tests, all were significantly nonnormally distributed. Consequently, statistical tests developed for normal data often give inaccurate power estimates in the presence of nonnormal data (e.g., Wang & Zhang, 2011; Zhang & Wang, 2013b; Zu & Yuan, 2010). Mediation analysis adds extra complexity to power analysis. For example, different methods are available for testing the mediation effect, and they can have different power for the same mediation effect (e.g., Cheung, 2007; Fritz & MacKinnon, 2007). Among the many methods developed for detecting mediation in the literature, the bootstrap method has generally shown the highest power in these studies.

This study extends Thoemmes et al. (2010) in several ways. First, it proposes a general method for conducting power analysis for mediation models based on the bootstrap method. The method is still based on Monte Carlo simulation but uses the bootstrap method to test mediation effects. Second, the method allows the specification of nonnormal data in the Monte Carlo simulation and can, thereby, reflect more closely practical data collection. Third, a free, open-source R package, bmem, is developed to ease power analysis for mediation models using the proposed method.

In the following sections, we first present the proposed method for power analysis of mediation models. Then we illustrate the use of the R package bmem for conducting power analysis. After that, we demonstrate the use of the proposed method through four examples including a simple mediation model, a multiple-mediator model with latent variables, a multiple-group mediation model, and a longitudinal mediation model. Complete R code for the four examples is provided in the Appendices.

Monte Carlo based statistical power analysis

In this section, we first present the proposed method. For better illustration, we focus our discussion on a simple mediation model, even though the method applies to more complex models, as shown in our examples. Figure 1 displays the path diagram of the simple mediation model. In the figure, x, m, and y represent the independent or input variable, the mediation variable, and the dependent or outcome variable, respectively. In this model, the total effect of x on y, c′ + a * b, consists of the direct effect c′ and the mediation effect θ = a * b, the product of the direct effect of x on m and the direct effect of m on y. The mediation effect is also called the indirect effect because it is the effect of x on y indirectly through m.
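The decomposition of the total effect can be checked numerically. The following is a minimal Python sketch, not part of bmem: it simulates data from the simple mediation model and verifies that the least-squares total effect equals the estimated direct effect plus the product of the estimated a and b paths. The unit error variances, the direct effect of 0.2, and all function names are illustrative assumptions.

```python
import random

def simulate(n, a, b, cp, rng):
    """Draw n cases from m = a*x + e_m and y = cp*x + b*m + e_y.

    x and both error terms are standard normal (an assumption of this sketch).
    """
    rows = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        m = a * x + rng.gauss(0, 1)
        y = cp * x + b * m + rng.gauss(0, 1)
        rows.append((x, m, y))
    return rows

def cov(u, v):
    """Sample covariance of two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((p - mu) * (q - mv) for p, q in zip(u, v)) / (len(u) - 1)

def path_estimates(rows):
    """Least-squares estimates of a, b, the direct effect, and the total effect."""
    x, m, y = zip(*rows)
    sxx, smm, sxm = cov(x, x), cov(m, m), cov(x, m)
    sxy, smy = cov(x, y), cov(m, y)
    det = sxx * smm - sxm ** 2
    a_hat = sxm / sxx                          # slope of m on x
    b_hat = (sxx * smy - sxm * sxy) / det      # slope of y on m, controlling for x
    cp_hat = (smm * sxy - sxm * smy) / det     # slope of y on x, controlling for m
    total_hat = sxy / sxx                      # slope of y on x alone
    return a_hat, b_hat, cp_hat, total_hat

rng = random.Random(1)
a_hat, b_hat, cp_hat, total_hat = path_estimates(simulate(200, 0.39, 0.39, 0.2, rng))
print(total_hat, cp_hat + a_hat * b_hat)  # the two values agree up to rounding error
```

For least-squares estimates computed on the same sample, the identity holds exactly, not just on average, which is why the indirect effect can be read as the part of the total effect that passes through m.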

Fig. 1 Path diagram of a simple mediation model

Statistical power analysis for mediation can be viewed as concerning a test of whether the mediation effect (θ) is significantly different from 0. More specifically, we have the null and alternative hypotheses

$$ {H}_0:\theta =0\kern0.5em \mathrm{vs}.\kern0.5em {H}_1:\theta ={\theta}_1, $$

where \( {\theta}_1 \) represents a given effect size. By definition, the statistical power (π) is

$$ \pi = \Pr \left(\mathrm{reject}\kern0.5em {H}_0|{H}_1\right). $$
(1)

In addition to null hypothesis testing, the power can be calculated using confidence intervals. This is based on the equivalence of confidence intervals and hypothesis testing (e.g., Hoenig & Heisey, 2001; Meehl, 1997). That is, if a 1 − α confidence interval does not include the null hypothesis value, one can infer a statistically significant result at the significance level α (e.g., Daly, 1991). More specifically, let [l,u] denote the confidence interval of the mediation effect θ. The power is then

$$ \pi = \Pr \left(0\notin \left[l,u\right]|{H}_1\right). $$
(2)

In practice, the power π can be difficult to calculate analytically, especially for complex mediation models. However, it can be estimated using the relative frequency of rejecting the null hypothesis in Monte Carlo simulation following Algorithm 1. The algorithm has been widely applied in the literature on statistical power analysis for both mediation analysis and other analyses (e.g., Cheung, 2007; Fritz & MacKinnon, 2007; Fritz, Taylor, & MacKinnon, 2012; Hayes & Scharkow, 2013; MacKinnon, Lockwood, & Williams, 2004; Muthén & Muthén, 2002; Thoemmes et al., 2010; Zhang & Wang, 2009, 2013b).

Algorithm 1

Monte Carlo simulation algorithm for statistical power

  1. Form a mediation model based on the hypothesized theory and set up the population parameters for the mediation model. The parameter values can be decided from previous studies in the literature or a pilot study.

  2. Generate a data set with sample size n based on the model and its population parameter values.

  3. Test the significance of a mediation effect by forming a confidence interval using the generated data.

  4. Repeat steps 2 and 3 for R times, where R is the number of Monte Carlo replications.

  5. Suppose among the R replications, the mediation effect is significant for r times. Then the power for detecting the mediation effect given the sample size n is r/R.
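The steps of Algorithm 1 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the implementation in bmem: the model is the simple mediation model with standard normal x and unit error variances, parameters are estimated by least squares, and step 3 uses a normal-theory interval that omits the covariance between the two path estimates (a common simplification for this model). All function names are hypothetical, and any other interval function could be passed in its place.

```python
import math
import random
from statistics import NormalDist

def simulate(n, a, b, cp, rng):
    """Step 2: generate one data set of size n from the population model."""
    rows = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        m = a * x + rng.gauss(0, 1)
        y = cp * x + b * m + rng.gauss(0, 1)
        rows.append((x, m, y))
    return rows

def cov(u, v):
    """Sample covariance of two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((p - mu) * (q - mv) for p, q in zip(u, v)) / (len(u) - 1)

def normal_interval(rows, alpha=0.05):
    """Step 3: normal-theory interval for a*b from least-squares estimates."""
    n = len(rows)
    x, m, y = zip(*rows)
    sxx, smm, syy = cov(x, x), cov(m, m), cov(y, y)
    sxm, sxy, smy = cov(x, m), cov(x, y), cov(m, y)
    det = sxx * smm - sxm ** 2
    a_hat = sxm / sxx
    b_hat = (sxx * smy - sxm * sxy) / det
    cp_hat = (smm * sxy - sxm * smy) / det
    # Residual variances of the two regressions (m on x; y on x and m).
    res_m = (n - 1) * (smm - a_hat * sxm) / (n - 2)
    res_y = (n - 1) * (syy - cp_hat * sxy - b_hat * smy) / (n - 3)
    var_a = res_m / ((n - 1) * sxx)
    var_b = res_y * sxx / ((n - 1) * det)
    se = math.sqrt(b_hat ** 2 * var_a + a_hat ** 2 * var_b)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    estimate = a_hat * b_hat
    return estimate - z * se, estimate + z * se

def estimate_power(n, a, b, cp, reps, interval=normal_interval, seed=0):
    """Steps 4 and 5: repeat, count rejections r, and return r/R."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        lower, upper = interval(simulate(n, a, b, cp, rng))
        if lower > 0 or upper < 0:  # the interval excludes 0
            rejections += 1
    return rejections / reps

print(estimate_power(n=100, a=0.39, b=0.39, cp=0.2, reps=500))
```

Passing the interval as an argument keeps the Monte Carlo loop independent of the test, so the same loop can be reused with a robust or bootstrap interval.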

A critical component of such a Monte Carlo algorithm is the choice of the method for constructing the confidence interval of the mediation effect. In this study, we consider three types of confidence intervals: the normal confidence interval, the robust confidence interval, and the bootstrap confidence interval, although we recommend the use of the bootstrap confidence interval.

Normal confidence interval

In mediation analysis, model parameters and their covariance can be estimated using the maximum likelihood method. Under the normal data assumption, the estimated model parameters follow a multivariate normal distribution asymptotically. For example, for the simple mediation model, \( \widehat{a} \) and \( \widehat{b} \), estimates of a and b, have a bivariate normal distribution with the covariance matrix \( \left(\begin{array}{cc}\hfill {\displaystyle {\widehat{\sigma}}_a^2}\hfill & \hfill {\displaystyle {\widehat{\sigma}}_{ab}}\hfill \\ {}\hfill {\displaystyle {\widehat{\sigma}}_{ab}}\hfill & \hfill {\displaystyle {\widehat{\sigma}}_b^2}\hfill \end{array}\right) \), where \( {\displaystyle {\widehat{\sigma}}_a^2} \), \( {\displaystyle {\widehat{\sigma}}_b^2} \), and \( {\displaystyle {\widehat{\sigma}}_{ab}} \) are the estimated variances and covariance of \( \widehat{a} \) and \( \widehat{b} \). Using the delta method, \( \widehat{\theta}=\widehat{a}\widehat{b} \) is normally distributed with mean θ = ab and variance \( {\displaystyle {\widehat{b}}^2}{\displaystyle {\widehat{\sigma}}_a^2}+2\widehat{a}\widehat{b}{\displaystyle {\widehat{\sigma}}_{ab}}+{\displaystyle {\widehat{a}}^2}{\displaystyle {\widehat{\sigma}}_b^2} \) (Sobel, 1982, p. 298). The 1 − α confidence interval for ab can be constructed as

$$ \left[\widehat{a}\widehat{b}+{\varPhi}^{-1}\left(\alpha /2\right)\times \widehat{ se}\left(\widehat{a}\widehat{b}\right),\widehat{a}\widehat{b}+{\varPhi}^{-1}\left(1-\alpha /2\right)\times \widehat{ se}\left(\widehat{a}\widehat{b}\right)\right], $$
(3)

where Φ is the standard normal cumulative distribution function and, therefore, \( {\varPhi}^{-1}\left(\alpha \right) \) gives the 100αth percentile of the standard normal distribution. For example, for the 95% confidence interval, \( {\varPhi}^{-1}\left(\alpha /2\right)={\varPhi}^{-1}\left(.025\right)\approx -1.96 \) and \( {\varPhi}^{-1}\left(1-\alpha /2\right)={\varPhi}^{-1}\left(.975\right)\approx 1.96 \). Here, \( \widehat{ se}\left(\widehat{a}\widehat{b}\right)=\sqrt{{\widehat{b}}^2{\widehat{\sigma}}_a^2+2\widehat{a}\widehat{b}{\widehat{\sigma}}_{ab}+{\widehat{a}}^2{\widehat{\sigma}}_b^2} \) is the standard error of \( \widehat{a}\widehat{b} \). We refer to this interval as the normal confidence interval. Note that a power analysis based on the normal confidence interval is equivalent to one based on the Sobel test.
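Equation 3 translates directly into code. The Python sketch below computes the normal confidence interval from the two path estimates and their estimated variances and covariance. The function name and the input values in the usage line are hypothetical and chosen only for illustration.

```python
import math
from statistics import NormalDist

def normal_ci(a_hat, b_hat, var_a, var_b, cov_ab=0.0, alpha=0.05):
    """Normal (delta-method) confidence interval for the product a*b."""
    estimate = a_hat * b_hat
    # Delta-method variance: b^2*var(a) + 2*a*b*cov(a, b) + a^2*var(b).
    variance = b_hat ** 2 * var_a + 2 * a_hat * b_hat * cov_ab + a_hat ** 2 * var_b
    se = math.sqrt(variance)
    # By symmetry, the lower critical value is the negative of the upper one.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return estimate - z * se, estimate + z * se

# Hypothetical estimates: a = b = .39, each with variance .01 and zero covariance.
lower, upper = normal_ci(0.39, 0.39, 0.01, 0.01)
print(lower, upper)
```

The mediation effect is declared significant at level α when the returned interval excludes 0.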

Robust confidence interval

When data are not normally distributed, the standard error estimates of the parameter estimates of the mediation models are not consistent. Therefore, the confidence interval in Equation 3 is problematic. However, if the fourth moments (or kurtosis) of the nonnormal data still exist, the robust sandwich-type standard errors are consistent and can be used (Zu & Yuan, 2010). Therefore, replacing the normal standard error with the sandwich-type standard error in Equation 3, we obtain a robust confidence interval for the mediation effect.

Bootstrap confidence interval

Both the normal and robust confidence intervals are based on asymptotic theory, and they might not perform well in finite sample experiments (e.g., MacKinnon et al., 2004; Zu & Yuan, 2010). In the literature, confidence intervals constructed using the bootstrap method have been shown to perform better under many studied conditions (e.g., Cheung, 2007; Fritz & MacKinnon, 2007; Fritz et al., 2012; Hayes & Scharkow, 2013; MacKinnon et al., 2004; Preacher & Hayes, 2004; Shrout & Bolger, 2002). Algorithm 2 can be followed to construct a bootstrap confidence interval.

Algorithm 2

Bootstrap confidence interval algorithm

  1. Using the original data set (sample size = n) as a population, draw a bootstrap sample of n persons randomly with replacement.

  2. With the bootstrap sample, estimate model parameters and compute estimated mediation effects.

  3. Repeat steps 1 and 2 for a total of B times. B is the number of bootstrap samples.

  4. The bootstrap confidence intervals of model parameters and mediation effects are constructed.
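Algorithm 2 can be sketched as follows for the simplest case, a percentile interval for the mediation effect in the simple mediation model. This Python sketch is an illustration, not the bmem implementation: it estimates the paths by least squares, uses a plain order-statistic rule for the percentiles (one of several conventions), and generates toy data with assumed unit error variances. All names are hypothetical.

```python
import random

def cov(u, v):
    """Sample covariance of two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((p - mu) * (q - mv) for p, q in zip(u, v)) / (len(u) - 1)

def indirect_effect(rows):
    """Least-squares estimate of a*b from rows of (x, m, y)."""
    x, m, y = zip(*rows)
    sxx, smm, sxm = cov(x, x), cov(m, m), cov(x, m)
    sxy, smy = cov(x, y), cov(m, y)
    a_hat = sxm / sxx
    b_hat = (sxx * smy - sxm * sxy) / (sxx * smm - sxm ** 2)
    return a_hat * b_hat

def percentile_bootstrap_ci(rows, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mediation effect."""
    rng = random.Random(seed)
    n = len(rows)
    boot = []
    for _ in range(n_boot):
        # Step 1: resample n cases with replacement from the original data.
        resample = [rows[rng.randrange(n)] for _ in range(n)]
        # Step 2: re-estimate the mediation effect on the bootstrap sample.
        boot.append(indirect_effect(resample))
    # Step 4: read the interval limits off the sorted bootstrap estimates.
    boot.sort()
    lo_index = max(0, round(alpha / 2 * n_boot))
    hi_index = min(n_boot - 1, round((1 - alpha / 2) * n_boot) - 1)
    return boot[lo_index], boot[hi_index]

# Toy data from a simple mediation model (parameter values are assumptions).
data_rng = random.Random(42)
rows = []
for _ in range(100):
    x = data_rng.gauss(0, 1)
    m = 0.39 * x + data_rng.gauss(0, 1)
    y = 0.2 * x + 0.39 * m + data_rng.gauss(0, 1)
    rows.append((x, m, y))

lower, upper = percentile_bootstrap_ci(rows, n_boot=500)
print(lower, upper)
```

Resampling whole cases, rather than individual variables, preserves the dependence among x, m, and y within each bootstrap sample.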

Different bootstrap confidence intervals have been used for the bootstrap method in the literature of mediation analysis (e.g., Cheung, 2007; Fritz & MacKinnon, 2007; Fritz et al., 2012; Hayes & Scharkow, 2013; MacKinnon et al., 2004). Let θ denote a population mediation effect, \( \widehat{\theta} \) denote the estimate of θ from the original data, and \( {\displaystyle {\widehat{\theta}}^b},b=1,\dots, B \) denote its estimate for the bth bootstrap sample. A 100(1 − α)% bootstrap confidence interval is formed in the following ways. First, the percentile bootstrap confidence interval can be constructed by \( \left[{\displaystyle {\widehat{\theta}}^b}\left(\alpha /2\right),{\displaystyle {\widehat{\theta}}^b}\left(1-\alpha /2\right)\right] \) for a parameter with \( {\displaystyle {\widehat{\theta}}^b}\left(\alpha \right) \) denoting the 100αth percentile of the B bootstrap estimates. Second, the bias-corrected bootstrap confidence interval can be constructed as \( \left[{\displaystyle {\widehat{\theta}}^b}\left({\displaystyle {\tilde{\alpha}}_l}\right),{\displaystyle {\widehat{\theta}}^b}\left({\displaystyle {\tilde{\alpha}}_u}\right)\right] \), where \( {\displaystyle {\tilde{\alpha}}_l} \) and \( {\displaystyle {\tilde{\alpha}}_u} \) are used to get the quantiles and are calculated by

$$ {\displaystyle {\tilde{\alpha}}_l}=\varPhi \left[2{z}_0+{\varPhi}^{-1}\left(\alpha /2\right)\right] $$
(4)

and

$$ {\displaystyle {\tilde{\alpha}}_u}=\varPhi \left[2{z}_0+{\varPhi}^{-1}\left(1-\alpha /2\right)\right], $$
(5)

with

$$ {z}_0={\varPhi}^{-1}\left[\frac{\mathrm{number}\kern0.5em \mathrm{of}\kern0.5em \mathrm{times}\kern0.5em \mathrm{that}\kern0.5em {\displaystyle {\widehat{\theta}}^b}<\widehat{\theta}}{B}\right]. $$
(6)
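Equations 4 to 6 can be sketched in Python as below. The function returns the two adjusted percentile levels that replace α/2 and 1 − α/2 when the limits are read off the sorted bootstrap estimates. The function name is hypothetical, and the clamping of the proportion away from 0 and 1 is an assumption of this sketch, added only because the inverse normal distribution function is undefined at those values.

```python
from statistics import NormalDist

def bias_corrected_levels(boot_estimates, estimate, alpha=0.05):
    """Adjusted lower and upper percentile levels for the bias-corrected interval."""
    std_normal = NormalDist()
    n_boot = len(boot_estimates)
    below = sum(1 for t in boot_estimates if t < estimate)
    # Keep the proportion strictly inside (0, 1) so that inv_cdf is defined
    # (an assumption of this sketch, not part of Equation 6).
    proportion = min(max(below / n_boot, 0.5 / n_boot), 1 - 0.5 / n_boot)
    z0 = std_normal.inv_cdf(proportion)                                  # Equation 6
    level_l = std_normal.cdf(2 * z0 + std_normal.inv_cdf(alpha / 2))     # Equation 4
    level_u = std_normal.cdf(2 * z0 + std_normal.inv_cdf(1 - alpha / 2)) # Equation 5
    return level_l, level_u

# When exactly half of the bootstrap estimates fall below the original estimate,
# z0 = 0 and the levels reduce to alpha/2 and 1 - alpha/2 (the percentile interval).
print(bias_corrected_levels([1.0, 2.0, 3.0, 4.0], 2.5))
```

When more than half of the bootstrap estimates fall below the original estimate, z0 is positive and both levels shift upward, which moves the interval toward larger values.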

Remarks

Choice of confidence intervals

Simulation studies have been conducted to evaluate the normal, robust, and bootstrap confidence intervals (e.g., Cheung, 2007; Fritz et al., 2012; Hayes & Scharkow, 2013; MacKinnon et al., 2004; Zu & Yuan, 2010). Overall, the bootstrap confidence intervals, including the percentile and the bias-corrected ones, perform better than the nonbootstrap ones. It is found that the percentile bootstrap confidence interval has greater power than the normal one and, at the same time, maintains type I error near its nominal level. The bias-corrected bootstrap confidence interval has even greater power than the percentile bootstrap confidence interval but at the cost of more liberal type I error. In a recent study, Hayes and Scharkow (2013) recommended that if power is at the forefront of concerns, the bias-corrected bootstrap confidence interval can be used and, in general, one can use the percentile bootstrap confidence interval as a good compromise test. In this study, by default, we adopt the percentile bootstrap confidence interval in our analysis. Our software allows the use of normal, robust, percentile bootstrap, and bias-corrected bootstrap confidence intervals.

Nonnormal data in power analysis

Typically, in the Monte Carlo based power analysis, data are generated from a multivariate normal distribution assuming that data collected in the future study will be normally distributed (e.g., Zhang & Wang, 2009). In order to deal with nonnormal data, one should allow for a power analysis based on nonnormal data. In this study, continuous nonnormal data with target skewness and kurtosis can be used. Specifically, the method developed by Vale and Maurelli (1983) is used to generate nonnormal data with the same mean and variance as the normal data but with target skewness and kurtosis provided by a user. If the literature shows, or a researcher has reason to believe, that nonnormality is a concern after data collection, power analysis should be conducted with simulated nonnormal data. For nonnormal data with excessive skewness and kurtosis, power based on the robust and bootstrap confidence intervals should be trusted more than the normal confidence interval. For studies with small sample size, the bootstrap method is expected to perform best.

Controlling type I error

The literature on mediation analysis has shown that the type I error for mediation tests is generally not well controlled (e.g., Fritz et al., 2012; MacKinnon et al., 2004; Zhang & Wang, 2013b). Through simulation, MacKinnon et al. showed that the normal method had an overly conservative empirical type I error rate and that, on average, the bias-corrected bootstrap method had an acceptable empirical type I error rate. Recent studies further found that the bias-corrected bootstrap method had an inflated type I error rate when one of the direct effects, a or b, was not zero (e.g., Fritz et al., 2012; Hayes & Scharkow, 2013). The percentile bootstrap method, on the other hand, was found to have better controlled type I error. Therefore, if the power is calculated on the basis of the percentile bootstrap confidence interval, the type I error should not be a big concern.

If controlling type I error is the foremost concern in power analysis, we recommend the following strategy. Our Monte Carlo method can be conducted under the null hypothesis, where the null values are used as the population parameters. In this case, the estimated power is the empirical type I error rate. Suppose a researcher wants to control type I error at the .05 level. The researcher can start with a significance level of .05 and obtain the empirical type I error rate. If the empirical type I error rate is smaller than .05, the researcher can increase the significance level. Otherwise, the researcher can decrease the significance level. By trial and error, the researcher can decide on a significance level that controls the type I error rate at the desired value. Then, in power analysis, this significance level can be used to construct confidence intervals to test mediation effects.

Relevant statistics from the Monte Carlo method

Using the Monte Carlo method, the statistical power is estimated by r/R. The standard error of the power can be estimated by \( \sqrt{r\left(R-r\right)/{R}^3} \), which equals \( \sqrt{\widehat{\pi}\left(1-\widehat{\pi}\right)/R} \) with \( \widehat{\pi}=r/R \). Note that with the increase of R, the power estimate becomes more accurate. In practice, we recommend R ≥ 1,000 because, in the literature, 1,000 replications are often used in evaluating power (e.g., Cheung, 2007; Thoemmes et al., 2010; Zhang & Wang, 2009).
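Because each replication either rejects the null hypothesis or does not, r follows a binomial distribution, and the standard error of the estimated power p = r/R is the binomial standard error, the square root of p(1 − p)/R. A short Python sketch, with a hypothetical function name and illustrative counts:

```python
import math

def power_standard_error(r, n_rep):
    """Binomial standard error of the power estimate r / n_rep."""
    p = r / n_rep
    return math.sqrt(p * (1 - p) / n_rep)

# Illustrative counts: 935 rejections in 1,000 replications.
print(power_standard_error(935, 1000))
# Quadrupling the number of replications halves the standard error.
print(power_standard_error(3740, 4000))
```

The second line of output illustrates why larger R gives a more accurate power estimate: the standard error shrinks in proportion to the square root of R.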

In addition to power and its standard error, other statistics can be calculated. An important one is the empirical coverage probability, the proportion of replications in which the constructed confidence interval covers the population value. For a well-performing confidence interval, the empirical coverage probability should be close to the confidence level 1 − α. For example, Hayes and Scharkow (2013) showed that the percentile bootstrap confidence interval has better coverage than the bias-corrected one. With the R sets of parameter estimates and their standard errors, one can also calculate the mean and standard deviation of the parameter estimates and the mean of the standard errors.
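The empirical coverage probability is a simple proportion over the Monte Carlo replications. A minimal Python sketch follows; the function name and the four toy intervals are hypothetical and serve only to show the calculation.

```python
def coverage_probability(intervals, true_value):
    """Share of confidence intervals that contain the population value."""
    hits = sum(1 for lower, upper in intervals if lower <= true_value <= upper)
    return hits / len(intervals)

# Toy intervals from four imaginary replications; population effect .39 * .39.
toy_intervals = [(0.05, 0.30), (0.10, 0.28), (0.16, 0.35), (0.02, 0.14)]
print(coverage_probability(toy_intervals, 0.39 * 0.39))
```

In a real power analysis, the list would hold one interval per replication, and the result would be compared with the nominal level 1 − α.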

R package bmem

The proposed method in the above section is implemented in the free, open-source R package bmem (Zhang & Wang, 2013a). The package bmem uses the R package lavaan (Rosseel, 2012) for model estimation. The package can conduct power analysis based on the normal, robust, and bootstrap confidence intervals. We now illustrate the use of the package through a simple mediation model, shown in Fig. 2. The values in the figure are population parameters that can be decided from a pilot study or previous literature. In this example, we choose the parameter values to approximate a medium mediation effect. Some of the values are labeled using a, b, and cp. For demonstration, suppose we are interested in the power of the mediation effect ab = a * b and the total effect abc = a * b + cp.

Fig. 2 An example mediation model with population parameters

To use bmem, one needs to specify the mediation model and the mediation effect. The package uses the lavaan model specification method, but with some specific requirements. For example, for the simple mediation model, it is specified as follows:

figure a

First, the name of the model is demo in R. Everything about the model is given in a pair of quotation marks. Each path in the model is described using a line of statement. For example, m ~ a*x + start(.39)*x means that m regresses on x with the coefficient .39 as in start(.39). Because the coefficient has a label a, it is also specified in the equation. The statement x ~~ start(1)*x means that the variance for x is 1. More generally, the regression relationships are specified using ~, and variance and covariance are specified using ~~. More about model specification can be found in Rosseel (2012). The use of bmem does not require knowledge of R beyond what is discussed in this article. However, users who are interested in learning R are directed to the webpage at http://www.r-project.org/doc/bib/R-books.html for a list of useful references.

With the model, we also need to tell the package the mediation effect to conduct power analysis on. In this example, the mediation effect ab and the total effect abc are of interest to us. They can be specified as

figure b

The notation := means to calculate the indirect effect ab as the product of parameter a and b, where the labels on the right hand of “:=” should be consistent with those used in the model statement demo. Similarly, the total effect is calculated.

Only the labels for the parameters that will appear in the calculation of the mediation effect are necessary to use in the model specification part. For example, for the variance parameters, no labels are used. By default, the variance parameters will be set at 1. Therefore, in this example, the specifications of the three variance parameters are not required.

The package bmem conducts power analysis on the basis of the percentile bootstrap method through the function power.boot. The code below presents an example to calculate power for the model in Fig. 2 with a sample size 100:

figure c

The function power.boot takes many arguments, but only the model argument is required. A model is provided using the argument model; for this example, model=demo. If a mediation effect is of interest, it should also be provided, as in this example with indirect=mediation. By default, the power is calculated for a sample size of 100, but one can change it by providing a different number to nobs. One can specify the number of Monte Carlo replications using the option nrep=, with a default of 1,000, and the number of bootstrap samples using nboot=, with a default of 1,000. To take advantage of the multicore processors of modern computers, the package allows parallel computing by setting parallel='snow', which uses the R package snowfall (Knaus, 2013) for automatic parallelization. By default, all cores available on a computer are used to speed up calculation. If one suspects the data will be nonnormal, the skewness and kurtosis for the observed variables can be provided. When specifying nonnormal data, the observed variable names (ovnames) should also be provided to match the order of the skewness and kurtosis statistics.

The results of the power analysis can be summarized into a table using the function summary(power.result). The results table (Output 1) shows several columns. First, the column True lists the population parameter values. Second, the column Estimate presents the average parameter estimates across all replications. Third, the column MSE is the average bootstrap standard error, and the column SD is the standard deviation of the parameter estimates across all replications. Fourth, the column Power gives the power to detect whether a parameter is significant, and the column Power.se provides the standard error of the estimated statistical power. Finally, the column Coverage presents the empirical coverage probability of the bootstrap confidence interval used (the percentile interval by default). The power for the mediation effect is listed at the end of the table, under the heading “Indirect/Mediation Effects.” The power to detect the mediation effect with a sample size of 100 is about .935 using the percentile bootstrap confidence interval for the present example. During sample size planning, a researcher who targets a power of .8 can reduce the current sample size and repeat the calculation.

Output 1 Output of bmem

figure d

In addition to the power for the mediation effect and the total effect, the results also include power for all parameters in the model. For variance parameters, since they are always larger than 0, their power is obtained on the basis of the bootstrap standard error instead of the percentile confidence interval. Furthermore, if one adds the argument ci="BC" in the power.boot function, the power based on the bias-corrected confidence interval can be obtained. Although this study focuses on power using the bootstrap method, the package bmem also provides a function power.basic to conduct power analysis for mediation models using the normal and robust confidence intervals.

In estimating the power for the mediation effect and the total effect, we assume that the type I error is well controlled. If a researcher is concerned about the type I error, he or she can investigate it using bmem. For example, the code below specifies a possible model under the null hypothesis that assumes that a = b = 0. Replacing the model demo with demo.alpha, we get an empirical type I error rate of .003 for the mediation effect and .064 for the total effect. Furthermore, through trial and error, one can find that at the significance level .3, the empirical type I error rate is approximately .05 for a = b = 0, and at the same time, the power for a = b = 0.39 rises to .996. However, Fritz et al. (2012) showed that the type I error is related to the magnitude of a and b under the null hypothesis (see also Example 1). Therefore, a more thorough investigation would evaluate the type I error for different combinations of values of a and b.

figure e

Examples

In this section, we present four examples to demonstrate how the proposed method can be applied in different scenarios. The first example is about a simple mediation analysis. The second example is on a multiple-mediator mediation model with a latent mediator. The third example involves mediation analysis in a multiple-group analysis setting. The fourth example shows the power analysis for longitudinal mediation models.

Example 1: Simple mediation analysis

In this example, the model with its population parameter values in Fig. 2 is used to explore whether the relationship between mothers’ education (ME) and children’s mathematical achievement (math) is mediated by home environment (HE; Zhang & Wang, 2013b). Through this example, we demonstrate the difference in power for normal and nonnormal data. In generating the nonnormal data, the skewness is set at −0.3, −0.7, and 1.3, and the kurtosis is set at 1.5, 0, and 5 for ME, HE, and math, respectively. The skewness and kurtosis statistics are determined according to real data used in Zhang and Wang (2013b). Sample sizes of 50 and 100 are investigated. The focus is the mediation effect ab. Complete R code for the analysis can be found in Appendix 1.

The statistical power for detecting the mediation effect at the significance level α = .05 in this example is given in Table 1. Like a typical power analysis, power increased with sample size regardless of the normality of the data. It should be noted that nonnormality may not necessarily reduce the power to detect significance of mediation effect. In this example, when data are nonnormal, the power actually increased. The results are consistent with the previous literature on nonnormal data with excessive kurtosis (e.g., Yuan, Bentler, & Zhang, 2005).

Table 1 Power (a = b = .39) and type I error (ab = 0) in detecting the mediation effect in Example 1

Type I error is also investigated in this example. Because ab = 0 can arise from different values of a and b, we evaluate the influence of different combinations of them. The results show that when the magnitude of a or b is small (for example, < .14), the empirical type I error is smaller than .05. On the other hand, when either a or b is large, the method based on the percentile bootstrap confidence interval tends to reject the null hypothesis more often. The results here are consistent with those in Fritz et al. (2012).

Example 2: Mediation analysis with a latent mediator (power curve)

A power curve is useful to graphically display how power changes with sample size (e.g., Zhang & Wang, 2009). Using the model shown in Fig. 3, we show how to generate a power curve. The substantive idea of the model in Fig. 3 is that the relationship between age and education and performance on the everyday problem solving test (ept) is mediated by the memory ability measured by the Hopkins Verbal Learning Test (hvltt) and the reasoning ability measured by three reasoning tests, including word series (ws), letter sets (lt), and letter series (ls) tests (see Zhang & Wang, 2013b). The population model parameters are also displayed in the figure. The R code in Appendix 2 generates the power curve in Fig. 4. The power curve displays the power in detecting the effect of age and education on ept that is mediated by hvltt (a * b + c * b) for sample size from 100 to 1,900 with an interval of 200. The plot shows that to get a power of .8, a sample size of about 1,500 is needed. Note that a power curve can be used to obtain power for a given sample size through interpolation, although the results might not be as accurate.

Fig. 3 A multiple-mediator mediation model with population parameter values used in Example 2

Fig. 4 Power curve for testing the mediation effect in Example 2. To get a power of .8, a sample size of around 1,500 is needed based on the power curve

Example 3: Multiple-group mediation analysis (moderated mediation)

Thoemmes et al. (2010) considered a multiple-group mediation model shown in Fig. 5. Different from the simple mediation model in Fig. 2, the mediator m is measured as a latent variable by three observed variables, m1, m2, and m3. Furthermore, two groups are considered with varying mediation effects. Specifically, the mediation effect for the first group is a1*b1 = 0.26 and, for the second group, is a2*b2 = 0.10. This implies a moderated mediation, because the mediation effects are different for the two groups. The moderated mediation can be evaluated using a1*b1 - a2*b2. The sample size for the first group is 400 and, for the second group, 200. The R code for this analysis is given in Appendix 3.

Fig. 5 The path diagram of a multiple-group mediation model with population parameter values

Power for this example is given in Table 2. Compared with the power reported by Thoemmes et al. (2010), the power for med2 = a2*b2 increased, while the power for diffmed = a1*b1 - a2*b2 decreased. The following reasons might explain the difference. First, x is binary in Thoemmes et al. but is continuous in the present study. Second, a close look at the bootstrap distributions revealed that the bootstrap distribution of med2 was right-skewed and the bootstrap distribution of diffmed was left-skewed. Thoemmes et al. used the Sobel test, which assumes that the distribution of the indirect effects is normal, while the bootstrap method does not require such an assumption.

Table 2 Power from Thoemmes, MacKinnon, and Reiser (2010) and the present analysis

Example 4: A longitudinal mediation model

Maxwell and Cole (2007) have recommended the use of longitudinal mediation models in mediation analysis because of the involvement of causal process in mediation. Figure 6 is a longitudinal mediation model derived from Fig. 3 of Maxwell and Cole, with population parameter values calculated from Table 2 of Maxwell and Cole. In this example, each variable in the mediation model is measured three times repeatedly. The idea of longitudinal mediation is that the input variable at time 1 influences the mediator at time 2, which, in turn, affects the outcome variable at time 3. The mediation effect is then measured by a*b as in the cross-sectional mediation models. The power for the mediation effect a*b is calculated with the code in Appendix 4. The power is .860 when the bootstrap method is utilized for a sample size of 50.

Fig. 6 Path diagram for a longitudinal mediation model with population parameter values

Discussion

In this study, we proposed to conduct power analysis for mediation models based on the bootstrap method. Specifically, the significance of the mediation effect is evaluated using the percentile bootstrap confidence interval. The proposed method is implemented in the free, open-source R package bmem. The use of the method is illustrated through four examples that cover a large variety of mediation models. The bootstrap method is recommended for use especially when data are not normally distributed—for example, with excessive skewness and kurtosis.

The proposed method is computationally intensive because of the involvement of the Monte Carlo simulation and bootstrap. For example, for a power analysis with 1,000 replications of Monte Carlo simulation and 1,000 times of bootstrap, a total of one million models have to be estimated and evaluated. In order to take advantage of modern hardware such as multicore processors, the package bmem implements automatic parallelization algorithms. Figure 7 displays the computing time along with the number of cores used on our desktop. Clearly, the computing time can be significantly reduced when multiple cores are used. Furthermore, the parallel method is very efficient, because the computing time reduces almost linearly with the increase of the number of cores.

Fig. 7 Computing time along with the number of CPU cores utilized for the mediation model in Example 1

The bootstrap method requires about B times the computing time of the normal or robust method, where B is the number of bootstrap samples. Furthermore, with the same sample size, the bootstrap method often has greater power. Therefore, in practice, one can first calculate power using the normal method to determine a rough sample size. Then the bootstrap method can be carried out starting from a sample size smaller than the one suggested by the normal method. In this way, one can save a significant amount of computing time.

Although we have focused our discussion on the mediation models, the method and software in this study can be used to conduct power analysis for structural equation models as well. The calculation of the power using the Monte Carlo method needs the estimation of a mediation model. Therefore, after data collection, the same model can be estimated in R without the need of additional statistical software.

In the future, we will improve our method and software in the following ways. First, a better model estimation algorithm will be utilized to save computing time. For example, von Oertzen and Brick (2013) developed an efficient method that can significantly reduce the time of the power analysis proposed in this study. Second, missing data are always a problem in practical power analysis (Zhang & Wang, 2009). The current method assumes the data collected will be complete. In the future, we will incorporate missing data in power calculation. Third, the algorithm for generating nonnormal data will be improved. One limitation of the method developed by Vale and Maurelli (1983) is that it cannot generate nonnormal data with all possible combinations of skewness and kurtosis. Fourth, the current study focuses on one type of nonnormal data, namely, continuous data with excessive skewness and kurtosis. In the future, other types of nonnormal data, for example, categorical data, count data, and survival data, will be investigated.