Volume 78, Issue 1, pp 154–184

Methods for Mediation Analysis with Missing Data

  • Zhiyong Zhang
  • Lijuan Wang


Despite wide applications of both mediation models and missing data techniques, formal discussion of mediation analysis with missing data is still rare. We introduce and compare four approaches to dealing with missing data in mediation analysis including listwise deletion, pairwise deletion, multiple imputation (MI), and a two-stage maximum likelihood (TS-ML) method. An R package bmem is developed to implement the four methods for mediation analysis with missing data in the structural equation modeling framework, and two real examples are used to illustrate the application of the four methods. The four methods are evaluated and compared under MCAR, MAR, and MNAR missing data mechanisms through simulation studies. Both MI and TS-ML perform well for MCAR and MAR data regardless of the inclusion of auxiliary variables and for AV-MNAR data with auxiliary variables. Although listwise deletion and pairwise deletion have low power and large parameter estimation bias in many studied conditions, they may provide useful information for exploring missing mechanisms.
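As a rough illustration of the simplest of the four approaches, the sketch below estimates the indirect effect a*b in a single-mediator model after listwise deletion and attaches a percentile bootstrap interval. It is written in plain Python and is not the bmem API. The function names, the use of `None` to mark a missing value, the simulated MCAR data, and the choice to re-apply listwise deletion inside each bootstrap resample are all assumptions made for this example.

```python
import random


def listwise_delete(rows):
    """Keep only cases where X, M, and Y are all observed (None = missing)."""
    return [r for r in rows if None not in r]


def indirect_effect(rows):
    """Return a*b for M = a*X + e1 and Y = c'*X + b*M + e2 (OLS, complete rows)."""
    n = len(rows)
    mean_x = sum(r[0] for r in rows) / n
    mean_m = sum(r[1] for r in rows) / n
    mean_y = sum(r[2] for r in rows) / n
    # Centered sums of squares and cross-products.
    sxx = sum((r[0] - mean_x) ** 2 for r in rows)
    smm = sum((r[1] - mean_m) ** 2 for r in rows)
    sxm = sum((r[0] - mean_x) * (r[1] - mean_m) for r in rows)
    sxy = sum((r[0] - mean_x) * (r[2] - mean_y) for r in rows)
    smy = sum((r[1] - mean_m) * (r[2] - mean_y) for r in rows)
    a = sxm / sxx  # slope of M on X
    # Slope of Y on M, controlling for X (two-predictor normal equations).
    b = (smy * sxx - sxm * sxy) / (smm * sxx - sxm ** 2)
    return a * b


def bootstrap_ci(rows, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval; deletion is redone in every resample."""
    rng = random.Random(seed)
    n = len(rows)
    estimates = []
    for _ in range(n_boot):
        resample = [rows[rng.randrange(n)] for _ in range(n)]
        estimates.append(indirect_effect(listwise_delete(resample)))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


def simulate(n=500, a=0.5, b=0.5, p_missing=0.2, seed=1):
    """Hypothetical data: X fully observed, M and Y each missing completely at random."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        m = a * x + rng.gauss(0, 1)
        y = b * m + rng.gauss(0, 1)
        m_obs = None if rng.random() < p_missing else m
        y_obs = None if rng.random() < p_missing else y
        rows.append((x, m_obs, y_obs))
    return rows


if __name__ == "__main__":
    data = simulate()
    complete = listwise_delete(data)
    print("complete cases:", len(complete), "of", len(data))
    print("indirect effect (a*b):", indirect_effect(complete))
    print("95% percentile bootstrap CI:", bootstrap_ci(data))
```

Under MCAR, as simulated here, listwise deletion mainly costs sample size. The abstract's finding is that under other mechanisms it can also produce large estimation bias, which is where MI and TS-ML are reported to do better.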

Key words

mediation analysis, missing data, MI, TS-ML, bootstrap, auxiliary variables



Copyright information

© The Psychometric Society 2012

Authors and Affiliations

  1. University of Notre Dame, Notre Dame, USA
