, Volume 77, Issue 4, pp 803–826

Robust Structural Equation Modeling with Missing Data and Auxiliary Variables

  • Ke-Hai Yuan
  • Zhiyong Zhang


The paper develops a two-stage robust procedure for structural equation modeling (SEM) and an R package rsem to facilitate the use of the procedure by applied researchers. In the first stage, M-estimates of the saturated mean vector and covariance matrix of all variables are obtained. The estimates corresponding to the substantive variables are then fitted to the structural model in the second stage. A sandwich-type covariance matrix is used to obtain consistent standard errors (SE) of the structural parameter estimates. Rescaled, adjusted, corrected, and F-statistics are proposed for overall model evaluation. Using R and EQS, the R package rsem combines the two stages and generates all the test statistics and consistent SEs. Following the robust analysis, multiple model fit indices and standardized solutions are provided in the corresponding output of EQS. An example with open/closed book examination data illustrates the proper use of the package. The method is further applied to the analysis of a data set from the National Longitudinal Survey of Youth 1997 cohort, and results show that the developed procedure not only gives a better endorsement of the substantive models but also yields estimates with uniformly smaller standard errors than normal-distribution-based maximum likelihood does.
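To make the first stage more concrete, here is a minimal sketch of Huber-type M-estimation of a saturated mean vector and covariance matrix. It assumes complete data: the paper's procedure additionally handles missing values (and auxiliary variables) and computes sandwich-type SEs, none of which is shown here. The weight functions u1(d) = min(1, rho/d) and u2 = u1^2/kappa, the tuning proportion `varphi` (default 0.05), and all function names are illustrative assumptions, not the internals or API of rsem.

```python
import math

import numpy as np


def chi2_cdf(x: float, df: int) -> float:
    """Chi-square CDF as the regularized lower incomplete gamma P(df/2, x/2), via its power series."""
    if x <= 0.0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 0
    while term > 1e-15 * total:
        n += 1
        term *= z / (a + n)
        total += term
    return math.exp(-z + a * math.log(z) - math.lgamma(a)) * total


def chi2_quantile(prob: float, df: int) -> float:
    """Invert chi2_cdf by bisection (adequate for a one-off tuning constant)."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < prob:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, df) < prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def huber_m_estimate(x, varphi=0.05, tol=1e-8, max_iter=500):
    """Hypothetical helper: Huber-type M-estimates of mean and covariance (complete data only)."""
    x = np.asarray(x, dtype=float)
    n, p = x.shape
    # rho^2 is the (1 - varphi) quantile of chi^2_p, so about a fraction varphi
    # of normally distributed cases gets downweighted.
    rho2 = chi2_quantile(1.0 - varphi, p)
    rho = math.sqrt(rho2)
    # kappa = E[min(chi^2_p, rho^2)] / p makes the covariance estimate
    # consistent for Sigma when the data are normal.
    kappa = chi2_cdf(rho2, p + 2) + rho2 * varphi / p
    mu = x.mean(axis=0)
    sigma = np.cov(x, rowvar=False, bias=True)
    for _ in range(max_iter):
        c = x - mu
        # Mahalanobis distances under the current estimates.
        q = np.einsum("ij,jk,ik->i", c, np.linalg.inv(sigma), c)
        d = np.sqrt(np.maximum(q, 0.0))
        u1 = np.where(d <= rho, 1.0, rho / np.maximum(d, 1e-12))  # weights for the mean
        u2 = u1 ** 2 / kappa  # weights for the covariance
        mu_new = u1 @ x / u1.sum()
        c_new = x - mu_new
        sigma_new = (u2[:, None] * c_new).T @ c_new / n
        done = (np.max(np.abs(mu_new - mu)) < tol
                and np.max(np.abs(sigma_new - sigma)) < tol)
        mu, sigma = mu_new, sigma_new
        if done:
            break
    return mu, sigma
```

As a usage check, appending a tight cluster of gross outliers to otherwise normal data should shift the robust mean far less than the ordinary sample mean, because those cases receive weights well below 1. In the two-stage logic described above, the substantive-variable block of such estimates would then be passed to the structural model fit.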

Key words

auxiliary variables, estimating equation, missing at random, R package rsem, sandwich-type covariance matrix



We would like to thank Dr. Alberto Maydeu-Olivares and two reviewers for their very constructive comments on an earlier version of the paper.

Copyright information

© The Psychometric Society 2012

Authors and Affiliations

  1. Department of Psychology, University of Notre Dame, Notre Dame, USA
