
Methods for Mediation Analysis with Missing Data

Psychometrika

Abstract

Despite wide applications of both mediation models and missing data techniques, formal discussion of mediation analysis with missing data is still rare. We introduce and compare four approaches to dealing with missing data in mediation analysis including listwise deletion, pairwise deletion, multiple imputation (MI), and a two-stage maximum likelihood (TS-ML) method. An R package bmem is developed to implement the four methods for mediation analysis with missing data in the structural equation modeling framework, and two real examples are used to illustrate the application of the four methods. The four methods are evaluated and compared under MCAR, MAR, and MNAR missing data mechanisms through simulation studies. Both MI and TS-ML perform well for MCAR and MAR data regardless of the inclusion of auxiliary variables and for AV-MNAR data with auxiliary variables. Although listwise deletion and pairwise deletion have low power and large parameter estimation bias in many studied conditions, they may provide useful information for exploring missing mechanisms.


Figure 1.
Figure 2.
Figure 3.


Notes

  1. To install the package for first-time use, issue the command install.packages("bmem") in the R console.

  2. The model fitted the data well according to the chi-square test using the Bollen-Stine bootstrap method implemented in bmem.

  3. We have also conducted simulations where c′=0.5 with no (a=b=0), medium (a=b=0.39), and large (a=b=0.59) effect sizes. For the condition of no mediation effects, we also considered a=0.39 & b=0 and a=0 & b=0.39 and no different patterns were observed. The results from these simulations can be seen on the authors’ website at http://rpackages.psychstat.org/examples/bmem/Supplenment%20results.pdf.

  4. We only considered MAR and MNAR data because for MCAR the covariance matrices are expected to be the same with and without missing data and, thus, biased parameter estimates are not expected. Numerical results from MI cannot be easily obtained but it is expected that results from MI would be similar to those from TS-ML.

  5. This result is due to the MAR missing data manipulation method of this study. It does not imply that listwise deletion works the same way for all MAR data. However, for some specific MAR data such as the one we generate here, listwise deletion does not produce biased results.

References

  • Azen, S., & Van Guilder, M. (1981). Conclusions regarding algorithms for handling incomplete data. In Proceedings of the survey research methods section (pp. 53–56).


  • Bauer, D.J., Preacher, K.J., & Gil, K.M. (2006). Conceptualizing and testing random indirect effects and moderated mediation in multilevel models: new procedures and recommendations. Psychological Methods, 11(2), 142–163.


  • Bentler, P.M., & Weeks, D.G. (1980). Linear structural equations with latent variables. Psychometrika, 45, 289–308.


  • Best, N.G., Spiegelhalter, D.J., Thomas, A., & Brayne, C.E. (1996). Bayesian analysis of realistically complex models. Journal of the Royal Statistical Society. Series A, 159, 323–342.


  • Bollen, K.A., & Stine, R.A. (1990). Direct and indirect effects: classical and bootstrap estimates of variability. Sociological Methodology, 20, 115–140.


  • Brandt, J. (1991). The Hopkins verbal learning test: development of a new memory test with six equivalent forms. The Clinical Neuropsychologist, 5, 125–142.


  • Chen, Z.X., Aryee, S., & Lee, C. (2005). Test of a mediation model of perceived organizational support. Journal of Vocational Behavior, 66(3), 457–470.


  • Center for Human Resource Research (2006). NLSY79 child & young adult data users guide: a guide to the 1986–2004 child data (Computer software manual). Columbus.

  • Caldwell, B.M., & Bradley, R.H. (1979). Home observation for measurement of the environment. Little Rock: University of Arkansas.


  • Cole, D.A., & Maxwell, S.E. (2003). Testing mediational models with longitudinal data: questions and tips in the use of structural equation modeling. Journal of Abnormal Psychology, 112, 558–577.


  • Davis-Kean, P.E. (2005). The influence of parent education and family income on child achievement: the indirect role of parental expectations and the home environment. Journal of Family Psychology, 19, 294–304.


  • Efron, B. (1979). Bootstrap methods: another look at the jackknife. The Annals of Statistics, 7(1), 1–26.


  • Efron, B. (1987). Better bootstrap confidence intervals. Journal of the American Statistical Association, 82(397), 171–185.


  • Efron, B. (1994). Missing data, imputation, and the bootstrap. Journal of the American Statistical Association, 89(426), 463–478.


  • Efron, B., & Tibshirani, R. (1993). An introduction to the bootstrap. New York: CRC Press.


  • Ekstrom, R.B., French, J.W., Harman, H.H., & Derman, D. (1976). Kit of factor-referenced cognitive tests. Princeton: Educational Testing Service.


  • Enders, C.K. (2003). Using the expectation maximization algorithm to estimate coefficient alpha for scales with item-level missing data. Psychological Methods, 8, 322–337.


  • Fox, J. (2006). Structural equation modeling with the sem package in R. Structural Equation Modeling, 13, 465–486.


  • Gonda, J., & Schaie, K.W. (1985). Schaie-Thurstone mental abilities test: word series test. Palo Alto: Consulting Psychologists Press.


  • Grimm, K.J. (2008). Longitudinal associations between reading and mathematics. Developmental Neuropsychology, 33, 410–426.


  • Jelicic, H., Phelps, E., & Lerner, R.M. (2009). Use of missing data methods in longitudinal studies: the persistence of bad practices in developmental psychology. Developmental Psychology, 45, 1195–1199.


  • Jobe, J.B., Smith, D.M., Ball, K., Tennstedt, S.L., Marsiske, M., Willis, S.L., & Kleinman, K. (2001). ACTIVE: a cognitive intervention trial to promote independence in older adults. Controlled Clinical Trials, 22(4), 453–479.


  • Leppard, P., & Tallis, G.M. (1989). Evaluation of the mean and covariance of the truncated multinormal. Applied Statistics, 38, 543–553.


  • Little, R.J.A., & Rubin, D.B. (2002). Statistical analysis with missing data (2nd ed.). New York: Wiley-Interscience.


  • Lu, Z., Zhang, Z., & Lubke, G. (2011). Bayesian inference for growth mixture models with non-ignorable missing data. Multivariate Behavioral Research, 46, 567–597.


  • MacKinnon, D.P. (2008). Introduction to statistical mediation analysis. London: Taylor & Francis.


  • MacKinnon, D.P., Lockwood, C.M., Hoffman, J.M., West, S.G., & Sheets, V. (2002). A comparison of methods to test mediation and other intervening variable effects. Psychological Methods, 7, 83–104.


  • MacKinnon, D.P., Lockwood, C.M., & Williams, J. (2004). Confidence limits for the indirect effect: distribution of the product and resampling methods. Multivariate Behavioral Research, 39(1), 99–128.


  • McArdle, J.J., & Boker, S.M. (1990). Rampath. Hillsdale: Lawrence Erlbaum.


  • Preacher, K.J., & Hayes, A.F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36, 717–731.


  • Preacher, K.J., & Hayes, A.F. (2008). Asymptotic and resampling strategies for assessing and comparing indirect effects in multiple mediator models. Behavior Research Methods, 40, 879–891.


  • Rubin, D.B. (1976). Inference and missing data. Biometrika, 63(3), 581–592.


  • Rubin, D.B. (1996). Multiple imputation after 18+ years. Journal of the American Statistical Association, 91, 473–489.


  • Savalei, V., & Bentler, P.M. (2009). A two-stage approach to missing data: theory and application to auxiliary variables. Structural Equation Modeling, 16, 477–497.


  • Savalei, V., & Falk, C. (in press). Robust two-stage approach outperforms robust FIML with incomplete nonnormal data. Structural Equation Modeling.

  • Schafer, J.L. (1997). Analysis of incomplete multivariate data. London: Chapman & Hall/CRC.


  • Shrout, P.E., & Bolger, N. (2002). Mediation in experimental and nonexperimental studies: new procedures and recommendations. Psychological Methods, 7, 422–445.


  • Sobel, M.E. (1982). Asymptotic confidence intervals for indirect effects in structural equation models. In S. Leinhardt (Ed.), Sociological methodology (pp. 290–312). San Francisco: Jossey-Bass.


  • Tallis, G.M. (1961). The moment generating function of the truncated multinormal distribution. Journal of the Royal Statistical Society. Series B, 23, 223–229.


  • Tang, M.-L., & Bentler, P.M. (1997). Maximum likelihood estimation in covariance structure analysis with truncated data. British Journal of Mathematical & Statistical Psychology, 50(2), 339–349. doi:10.1111/j.2044-8317.1997.tb01149.x. Available from http://dx.doi.org/10.1111/j.2044-8317.1997.tb01149.x.


  • Thurstone, L.L., & Thurstone, T.G. (1949). Examiner manual for the SRA primary mental abilities test (form 10–14). Chicago: Science Research Associates.


  • Wilhelm, S., & Manjunath, B.G. (2010). tmvtnorm: truncated multivariate normal and Student t distribution [Computer software manual]. Available from http://CRAN.R-project.org/package=tmvtnorm (R package version 1.2-3).

  • Willis, S.L., & Marsiske, M. (1993). Manual for the everyday problems test. University Park: Pennsylvania State University.


  • Yuan, K.-H. (2009). Identifying variables responsible for data not missing at random. Psychometrika, 74, 233–256.


  • Yuan, K.-H., & Bentler, P.M. (2000). Three likelihood-based methods for mean and covariance structure analysis with nonnormal missing data. Sociological Methodology, 30, 165–200.


  • Yung, Y.-F. (1996). Bootstrapping techniques in analysis of mean and covariance structures. In G.A. Marcoulides & R.E. Schumacker (Eds.), Advanced structural equation modeling: issues and techniques (pp. 195–226). Mahwah: Erlbaum.


  • Zhang, Z., & Yuan, K.-H. (2012). WebSEM manual [Computer software manual]. Available from https://websem.psychstat.org.


Author information


Corresponding author

Correspondence to Zhiyong Zhang.

Appendices

Appendix A. Expectation for the EM Algorithm

The E-step of the EM algorithm is to fill in the missing data using their expectations

$$ E\bigl(z_{ij}|z_{\mathit{obs}},U^{(t)},S^{(t)} \bigr)=z_{ij}^{(t)};\quad i=1,\ldots,N, \ j=1,2,\ldots,p+3, $$
(A.1)

and

$$ E\bigl(z_{ij}z_{ik}|z_{\mathit{obs}},U^{(t)},S^{(t)} \bigr)=z_{ij}^{(t)}z_{ik}^{(t)}+c_{ijk}^{(t)} $$
(A.2)

where

$$ z_{ij}^{(t)}=\left\{ \begin{array}{l@{\quad}l} z_{ij} & \text{if }z_{ij}\text{ is observed}\\ E(z_{ij}|z_{\mathit{obs}},U^{(t)},S^{(t)}) & \text{if }z_{ij}\text{ is missing} \end{array} \right. $$
(A.3)

and

$$ c_{ijk}^{(t)}=\left\{ \begin{array}{l@{\quad}l} \operatorname{Cov}(z_{ij},z_{ik}|z_{\mathit{obs}},U^{(t)},S^{(t)}) & \text{if both }z_{ij}\text{ and }z_{ik}\text{ are missing}\\ 0 & \text{otherwise} \end{array} \right. $$
(A.4)

with \(j,k=1,2,\ldots,p+3\) and \(z_{\mathit{obs}}\) denoting the observed data. The expectation \(E(z_{ij}|z_{\mathit{obs}},U^{(t)},S^{(t)})\) and covariance \(\operatorname{Cov}(z_{ij},z_{ik}|z_{\mathit{obs}},U^{(t)},S^{(t)})\) are readily available from the conditional distribution of the multivariate normal distribution with mean \(U^{(t)}\) and covariance \(S^{(t)}\).
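As a rough illustration of (A.1) to (A.4), the sketch below computes the E-step quantities for a single case using the standard conditional mean and covariance of a multivariate normal distribution. It is not the bmem implementation: the helper name e_step_case, the three-variable example, and the values of U and S are assumptions made for illustration only.

```r
# Minimal sketch of the E-step for one case (row) z with missing entries.
# e_step_case is a hypothetical helper, not a function from bmem.
e_step_case <- function(z, U, S) {
  mis <- is.na(z)
  p <- length(z)
  z_hat <- z
  C <- matrix(0, p, p)  # c_ijk is nonzero only when both entries are missing
  if (all(mis)) {
    # Nothing observed: fall back to the current mean and covariance
    z_hat <- U
    C <- S
  } else if (any(mis)) {
    obs <- !mis
    # Regression weights of the missing components on the observed ones
    B <- S[mis, obs, drop = FALSE] %*% solve(S[obs, obs, drop = FALSE])
    # Conditional mean fills in the missing entries, as in (A.3)
    z_hat[mis] <- U[mis] + as.vector(B %*% (z[obs] - U[obs]))
    # Conditional covariance of the missing entries, as in (A.4)
    C[mis, mis] <- S[mis, mis, drop = FALSE] - B %*% S[obs, mis, drop = FALSE]
  }
  list(z_hat = z_hat, C = C)
}

# Illustrative current estimates U^(t) and S^(t) (assumed values)
U <- c(0, 0, 0)
S <- matrix(c(1.0, 0.5, 0.3,
              0.5, 1.0, 0.4,
              0.3, 0.4, 1.0), nrow = 3, byrow = TRUE)
z <- c(1, NA, NA)  # first variable observed, the other two missing

res <- e_step_case(z, U, S)
# Expected cross-products for this case, as in (A.2)
second_moment <- outer(res$z_hat, res$z_hat) + res$C
```

In a full EM run, these per-case quantities would be accumulated over all N cases and used in the M-step to update the mean and covariance estimates.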

Appendix B. R Code for Example 1

The R code lines below were used to obtain the results for Example 1. The statements following “#” are annotations for the R code lines. Line 2 loads our R library, and Line 7 reads the data for the first example into R. Lines 9 to 15 specify the path model in Figure 2. The three-term phrase start -> end, parname, st denotes a single-headed path from the variable start to the variable end. The parameter on this path is represented by parname, and its starting value for estimation is st. The starting value st can be set to NA to ask the program to choose a starting value. Similarly, <-> represents a double-headed arrow denoting a variance or covariance in the path diagram. For more information on how to specify a path model, see Fox (2006). Line 18 specifies the mediation effect, or indirect effect, to be estimated. More than one indirect effect can be given, as shown in Appendix C for Example 2. On Line 21, the model parameters are estimated using the bmem function with the listwise deletion method by setting method='list'. Changing the method argument to 'pair', 'tsml', or 'mi' selects one of the other missing data handling methods. In the bmem function, the first argument is the data set to be used, ex1 in this example. The second argument is the model to be estimated. The third argument supplies the indirect effects. The fourth argument selects the variables to be used in the mediation model, which distinguishes them from the auxiliary variables.

figure a
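To make the difference between the 'list' and 'pair' options more concrete, the following base R sketch estimates a simple mediation effect from a listwise-deletion covariance matrix and from a pairwise-deletion covariance matrix. It does not call bmem and is not the authors' code: the simulated data, the variable names x, m, and y, the MCAR missingness, and the helper indirect_from_cov are all assumptions for illustration.

```r
# Simulated simple mediation data: x -> m -> y (assumed values)
set.seed(1)
n <- 500
x <- rnorm(n)
m <- 0.39 * x + rnorm(n)
y <- 0.39 * m + rnorm(n)
dat <- data.frame(x = x, m = m, y = y)

# Impose completely-at-random missingness on m and y
dat$m[sample(n, 75)] <- NA
dat$y[sample(n, 75)] <- NA

# Hypothetical helper: path estimates from a covariance matrix
indirect_from_cov <- function(S) {
  # a: slope of m on x
  a <- S["x", "m"] / S["x", "x"]
  # b: partial slope of y on m, controlling for x
  coefs <- solve(S[c("m", "x"), c("m", "x")], S[c("m", "x"), "y"])
  b <- as.numeric(coefs)[1]
  c(a = a, b = b, ab = a * b)
}

# Listwise deletion: only cases complete on all variables are used
S_list <- cov(dat, use = "complete.obs")
# Pairwise deletion: each covariance uses all cases complete on that pair
S_pair <- cov(dat, use = "pairwise.complete.obs")

est_list <- indirect_from_cov(S_list)
est_pair <- indirect_from_cov(S_pair)
```

Because pairwise deletion builds each covariance from a different subset of cases, the resulting matrix is not guaranteed to be positive definite in general, which is one reason the two sets of estimates can differ.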

Appendix C. R Code for Example 2

In this example, multiple mediation effects are specified on Line 27. For example, a*b is the indirect effect from age to EPT through HVLT and d*h is the indirect effect from age to EPT through R. Note that a*b+d*h is the total indirect effect from age to EPT.

figure b
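The relation between the specific and total indirect effects can be sketched in base R as follows. This is only an illustration under stated assumptions: the data are simulated, the coefficient values are arbitrary, only the two mediators named above are included (the model in Example 2 may contain other variables), and the helper indirect_effects is hypothetical rather than part of bmem.

```r
# Simulated data with two mediators between age and EPT (assumed values)
set.seed(2)
n <- 400
age  <- rnorm(n)
HVLT <- -0.3 * age + rnorm(n)
R    <- -0.4 * age + rnorm(n)
EPT  <- 0.3 * HVLT + 0.5 * R - 0.1 * age + rnorm(n)
sim <- data.frame(age, HVLT, R, EPT)

# Hypothetical helper: specific and total indirect effects via OLS
indirect_effects <- function(dat) {
  a      <- coef(lm(HVLT ~ age, data = dat))[["age"]]  # age -> HVLT
  d_coef <- coef(lm(R ~ age, data = dat))[["age"]]     # age -> R
  out    <- coef(lm(EPT ~ HVLT + R + age, data = dat))
  b <- out[["HVLT"]]                                   # HVLT -> EPT
  h <- out[["R"]]                                      # R -> EPT
  c(ab = a * b, dh = d_coef * h, total = a * b + d_coef * h)
}

est <- indirect_effects(sim)
```

The element est[["total"]] is the sum of the two specific indirect effects, mirroring the expression a*b+d*h.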

About this article

Cite this article

Zhang, Z., Wang, L. Methods for Mediation Analysis with Missing Data. Psychometrika 78, 154–184 (2013). https://doi.org/10.1007/s11336-012-9301-5


  • DOI: https://doi.org/10.1007/s11336-012-9301-5
