Abstract
The paper develops a two-stage robust procedure for structural equation modeling (SEM) and an R package, rsem, to facilitate the use of the procedure by applied researchers. In the first stage, M-estimates of the saturated mean vector and covariance matrix of all variables are obtained. Those corresponding to the substantive variables are then fitted to the structural model in the second stage. A sandwich-type covariance matrix is used to obtain consistent standard errors (SEs) of the structural parameter estimates. Rescaled, adjusted, corrected, and F-statistics are proposed for overall model evaluation. Using R and EQS, the R package rsem combines the two stages and generates all the test statistics and consistent SEs. Following the robust analysis, multiple model fit indices and standardized solutions are provided in the corresponding EQS output. An example with open/closed book examination data illustrates the proper use of the package. The method is further applied to a data set from the National Longitudinal Survey of Youth 1997 cohort; the results show that the developed procedure not only gives better endorsement of the substantive models but also yields estimates with uniformly smaller standard errors than normal-distribution-based maximum likelihood.
Notes
Without missing values, NML is uniquely defined. With missing values, there are direct NML and two-stage NML (see Yuan & Bentler, 2000). Unless explicitly stated otherwise, our discussion applies equally to both.
The R package for robust SEM does not work with earlier versions of EQS that cannot communicate with R (Mair, Wu, & Bentler, 2010).
When Algebra is included, the means and covariances of the five variables cannot be fitted well by a two-factor model, as implied by a highly significant \(T_{\mathrm{ML}}\).
Any name can be used here and mardiamv25 is used for convenience.
The input file is also available at http://rpackages.psychstat.org/examples/rsem/mcov.eqs.
Because EQS uses a different order from \(\operatorname{vech}(\boldsymbol{\Sigma})\) when vectorizing the covariance matrix, the matrix in the file weight.txt is a permutation of \(\hat{\boldsymbol{\Gamma}}\); it also has an extra row and column of zeros. To print the matrix in the R console, use ex1$sem.
For the adjusted statistic \(T_{\mathrm{AML}}\), EQS approximates \(\hat{m}_{2}\) by the nearest integer and obtains the p-value using the approximated degrees of freedom.
The R code for the analysis is rsem(mardiamv25, c(1,2,4,5), "mcov.eqs", varphi=0).
The data can be obtained at http://rpackages.psychstat.org/examples/rsem/mardiamv25_contaminated.dat.
The SEs of \(\hat {\boldsymbol{\mu}}\) and \(\hat{\boldsymbol{\Sigma}}\) according to (8a) and (8b) will be in the default output of R. The matrix \(\hat{\boldsymbol{\Gamma}}\) according to the order of β in (8a) and (8b) will be saved into the object ex1, which is useful when SEM software other than EQS is used.
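The note above on the adjusted statistic \(T_{\mathrm{AML}}\) describes rounding the estimated degrees of freedom \(\hat{m}_{2}\) to the nearest integer before computing the p-value. A minimal Python sketch of that calculation follows. It is not EQS or rsem code: the function names are hypothetical, the reference distribution is assumed to be chi-square, and Python's built-in rounding of exact halves is an assumption because the source does not say how ties are handled.

```python
import math

def chi2_sf_int_df(x, df):
    """Upper-tail probability P(chi2_df > x) for a positive integer df."""
    if df < 1 or x < 0:
        raise ValueError("need df >= 1 and x >= 0")
    h = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-h) times a finite sum of h^k / k!.
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= h / k
            total += term
        return math.exp(-h) * total
    # Odd df: erfc term plus a finite sum over half-integer powers of h.
    total = math.erfc(math.sqrt(h))
    for k in range(1, (df - 1) // 2 + 1):
        total += math.exp(-h) * h ** (k - 0.5) / math.gamma(k + 0.5)
    return total

def p_value_rounded_df(t_stat, m2_hat):
    """Round the estimated df to an integer (assumed tie rule: Python's
    round), then return the integer df and the chi-square p-value."""
    df = max(1, round(m2_hat))
    return df, chi2_sf_int_df(t_stat, df)

# Hypothetical values, for illustration only.
df, p = p_value_rounded_df(t_stat=9.4, m2_hat=3.2)
print(df, p)
```

Because the rounded degrees of freedom can differ from \(\hat{m}_{2}\) by up to one half, a p-value obtained this way may differ slightly from one based on the unrounded value.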
References
Arminger, G., & Sobel, M.E. (1990). Pseudo-maximum likelihood estimation of mean and covariance structures with missing data. Journal of the American Statistical Association, 85, 195–203.
Bentler, P.M. (2008). EQS 6 structural equations program manual. Encino: Multivariate Software.
Bentler, P.M., & Yuan, K.-H. (1999). Structural equation modeling with small samples: test statistics. Multivariate Behavioral Research, 34, 181–197.
Browne, M.W. (1984). Asymptotic distribution-free methods for the analysis of covariance structures. British Journal of Mathematical & Statistical Psychology, 37, 62–83.
Cheng, T.-C., & Victoria-Feser, M.-P. (2002). High-breakdown estimation of multivariate mean and covariance with missing observations. British Journal of Mathematical & Statistical Psychology, 55, 317–335.
D’Agostino, R.B., Belanger, A., & D’Agostino, R.B. Jr. (1990). A suggestion for using powerful and informative tests of normality. American Statistician, 44, 316–321.
Enders, C.K. (2010). Applied missing data analysis. New York: Guilford.
Enders, C.K., & Bandalos, D.L. (2001). The relative performance of full information maximum likelihood estimation for missing data in structural equation models. Structural Equation Modeling, 8, 430–457.
Hampel, F.R., Ronchetti, E.M., Rousseeuw, P.J., & Stahel, W.A. (1986). Robust statistics: the approach based on influence functions. New York: Wiley.
Hu, L., Bentler, P.M., & Kano, Y. (1992). Can test statistics in covariance structure analysis be trusted? Psychological Bulletin, 112, 351–362.
Huber, P.J. (1981). Robust statistics. New York: Wiley.
Lee, S.Y., & Xia, Y.M. (2006). Maximum likelihood methods in treating outliers and symmetrically heavy-tailed distributions for nonlinear structural equation models with missing data. Psychometrika, 71, 565–585.
Lee, S.Y., & Xia, Y.M. (2008). A robust Bayesian approach for structural equation models with missing data. Psychometrika, 73, 343–364.
Little, R.J.A. (1988). Robust estimation of the mean and covariance matrix from data with missing values. Applied Statistics, 37, 23–38.
Liu, C. (1997). ML estimation of the multivariate t distribution and the EM algorithm. Journal of Multivariate Analysis, 63, 296–312.
Lopuhaä, H.P. (1989). On the relation between S-estimators and M-estimators of multivariate location and covariances. Annals of Statistics, 17, 1662–1683.
Mair, P., Wu, E., & Bentler, P.M. (2010). EQS goes R: simulations for SEM using the package REQS. Structural Equation Modeling, 17, 333–349.
Mardia, K.V. (1970). Measures of multivariate skewness and kurtosis with applications. Biometrika, 57, 519–530.
Mardia, K.V., Kent, J.T., & Bibby, J.M. (1979). Multivariate analysis. New York: Academic Press.
Micceri, T. (1989). The unicorn, the normal curve, and other improbable creatures. Psychological Bulletin, 105, 156–166.
Preacher, K.J., Wichman, A.L., MacCallum, R.C., & Briggs, N.E. (2008). Latent growth curve modeling. Thousand Oaks: Sage.
Raykov, T. (2005). Analysis of longitudinal studies with missing data using covariance structure modeling with full-information maximum likelihood. Structural Equation Modeling, 12, 493–505.
Rocke, D.M. (1996). Robustness properties of S-estimators of multivariate location and shape in high dimension. Annals of Statistics, 24, 1327–1345.
Rubin, D.B. (1976). Inference and missing data (with discussions). Biometrika, 63, 581–592.
Satorra, A., & Bentler, P.M. (1994). Corrections to test statistics and standard errors in covariance structure analysis. In A. von Eye & C.C. Clogg (Eds.), Latent variables analysis: applications for developmental research (pp. 399–419). Newbury Park: Sage.
Savalei, V., & Bentler, P.M. (2009). A two-stage ML approach to missing data: theory and application to auxiliary variables. Structural Equation Modeling, 16, 477–497.
Savalei, V., & Falk, C. (in press). Robust two-stage approach outperforms robust FIML with incomplete non-normal data. Structural Equation Modeling.
Schott, J. (2005). Matrix analysis for statistics (2nd ed.). New York: Wiley.
Tong, X., Zhang, Z., & Yuan, K.-H. (2011, October). Evaluation of test statistics for robust structural equation modeling with non-normal missing data. Paper presented at the graduate student pre-conference of the annual meeting of the Society of Multivariate Experimental Psychology, Norman, OK.
Yuan, K.-H. (2011). Expectation-robust algorithm and estimating equation for means and covariances with missing data. Manuscript under review.
Yuan, K.-H., & Bentler, P.M. (1997). Improving parameter tests in covariance structure analysis. Computational Statistics & Data Analysis, 26, 177–198.
Yuan, K.-H., & Bentler, P.M. (1998). Normal theory based test statistics in structural equation modeling. British Journal of Mathematical & Statistical Psychology, 51, 289–309.
Yuan, K.-H., & Bentler, P.M. (2000). Three likelihood-based methods for mean and covariance structure analysis with non-normal missing data. Sociological Methodology, 30, 167–202.
Yuan, K.-H., & Bentler, P.M. (2001). A unified approach to multigroup structural equation modeling with nonstandard samples. In G.A. Marcoulides & R.E. Schumacker (Eds.), Advanced structural equation modeling: new developments and techniques (pp. 35–56). Mahwah: Lawrence Erlbaum Associates.
Yuan, K.-H., & Bentler, P.M. (2010). Two simple approximations to the distributions of quadratic forms. British Journal of Mathematical & Statistical Psychology, 63, 273–291.
Yuan, K.-H., Bentler, P.M., & Chan, W. (2004a). Structural equation modeling with heavy tailed distributions. Psychometrika, 69, 421–436.
Yuan, K.-H., & Jennrich, R.I. (1998). Asymptotics of estimating equations under natural conditions. Journal of Multivariate Analysis, 65, 245–260.
Yuan, K.-H., Lambert, P.L., & Fouladi, R.T. (2004b). Mardia’s multivariate kurtosis with missing data. Multivariate Behavioral Research, 39, 413–437.
Yuan, K.-H., & Lu, L. (2008). SEM with missing data and unknown population using two-stage ML: theory and its application. Multivariate Behavioral Research, 43, 621–652.
Yuan, K.-H., Marshall, L.L., & Bentler, P.M. (2002). A unified approach to exploratory factor analysis with missing data, non-normal data, and in the presence of outliers. Psychometrika, 67, 95–122.
Yuan, K.-H., Wallentin, F., & Bentler, P.M. (in press). ML versus MI for missing data with violation of distribution conditions. Sociological Methods & Research.
Zhong, X., & Yuan, K.-H. (2011). Bias and efficiency in structural equation modeling: maximum likelihood versus robust methods. Multivariate Behavioral Research, 46, 229–265.
Zu, J., & Yuan, K.-H. (2010). Local influence and robust procedures for mediation analysis. Multivariate Behavioral Research, 45, 1–44.
Acknowledgements
We would like to thank Dr. Alberto Maydeu-Olivares and two reviewers for their very constructive comments on an earlier version of the paper.
Additional information
The research was supported by Grants DA00017 and DA01070 from the National Institute on Drug Abuse.
Appendices
Appendix A. Mathematical Details for Evaluating the Matrix \(\hat{\boldsymbol{\Upsilon}}\)
This appendix provides the development and formulas for evaluating the \(\hat{\boldsymbol{\Upsilon}}\) in (8b) with Huber-type weight. The formulas are programmed in the R package introduced in Section 4.
With the Huber-type weight, \(w_{i3}(d_{i})=1\). Then the estimating equations in (1) and (2) are derived from
and
where d is for differentials. It follows from (A.1) and (A.2) that
and
Noting that both \(w_{i1}\) and \(w_{i2}\) are functions of \(d_{i}=[(\mathbf{x}_{i}-\boldsymbol{\nu}_{i})'\mathbf{V}_{i}^{-1}(\mathbf{x}_{i}-\boldsymbol{\nu}_{i})]^{1/2}\), we have
and
Thus, when \(d_{i}>\rho_{i}\), we have
and
Notice that, for matrices A, B, C, and D of proper orders, there exists
Let
Using (A.9), it follows from (A.3) to (A.8) that, when \(d_{i}\le\rho_{i}\),
and when \(d_{i}>\rho_{i}\),
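The case distinction between \(d_{i}\le\rho_{i}\) and \(d_{i}>\rho_{i}\) used throughout this appendix can be illustrated with a minimal Python sketch. It is not the rsem implementation. It assumes that \(\boldsymbol{\nu}_{i}\) and \(\mathbf{V}_{i}\) are the sub-vector and sub-matrix of the saturated mean and covariance for the variables observed in case \(i\), and that \(w_{i1}\) takes the conventional Huber form (1 when \(d_{i}\le\rho_{i}\), \(\rho_{i}/d_{i}\) otherwise). The function names and the tuning value rho are hypothetical, and \(w_{i2}\) is not covered.

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def mahalanobis(x, nu, v):
    """d_i from the observed entries of one case (None marks missing)."""
    obs = [j for j, val in enumerate(x) if val is not None]
    resid = [x[j] - nu[j] for j in obs]            # x_i - nu_i
    v_i = [[v[j][k] for k in obs] for j in obs]    # V_i (assumed sub-matrix)
    z = solve(v_i, resid)                          # V_i^{-1} (x_i - nu_i)
    return math.sqrt(sum(r * zz for r, zz in zip(resid, z)))

def huber_w1(d, rho):
    """Assumed conventional Huber-type weight: 1 inside rho, rho/d outside."""
    return 1.0 if d <= rho else rho / d

# Hypothetical case with the second variable missing.
nu = [0.0, 0.0, 0.0]
v = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
d = mahalanobis([3.0, None, 4.0], nu, v)
print(d, huber_w1(d, rho=2.0))
```

Cases whose distance does not exceed rho keep full weight, while cases farther out are down-weighted in proportion to their distance, which is why the derivatives in this appendix take different forms in the two regions.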
Appendix B. R Code for Robust SEM and Its Output
Appendix C. EQS Code for the Model in Equations (12) and (13)
Appendix D. EQS Code for Confirmatory Factor Analysis with Four Variables
Appendix E. EQS Code for the Unconditional Latent Growth Curve Model in Equation (14)
Appendix F. EQS Code for the Conditional Latent Growth Curve Model in Equations (15) and (16)
Cite this article
Yuan, KH., Zhang, Z. Robust Structural Equation Modeling with Missing Data and Auxiliary Variables. Psychometrika 77, 803–826 (2012). https://doi.org/10.1007/s11336-012-9282-4