Robust Structural Equation Modeling with Missing Data and Auxiliary Variables

Psychometrika

Abstract

The paper develops a two-stage robust procedure for structural equation modeling (SEM) and an R package, rsem, to facilitate the use of the procedure by applied researchers. In the first stage, M-estimates of the saturated mean vector and covariance matrix of all variables are obtained. Those corresponding to the substantive variables are then fitted to the structural model in the second stage. A sandwich-type covariance matrix is used to obtain consistent standard errors (SEs) of the structural parameter estimates. Rescaled and adjusted statistics, as well as corrected and F-statistics, are proposed for overall model evaluation. Using R and EQS, the package rsem combines the two stages and generates all the test statistics and consistent SEs. Following the robust analysis, multiple model fit indices and standardized solutions are provided in the corresponding EQS output. An example with open/closed book examination data illustrates the proper use of the package. The method is further applied to a data set from the National Longitudinal Survey of Youth 1997 cohort, and the results show that the developed procedure not only gives a better endorsement of the substantive models but also yields estimates with uniformly smaller standard errors than normal-distribution-based maximum likelihood (NML).

Figure 1.

Notes

  1. Without missing values, NML is uniquely defined. With missing values, there are direct NML and two-stage NML (see Yuan & Bentler, 2000). Unless explicitly stated otherwise, our discussion applies equally to both.

  2. The R package for robust SEM does not work with earlier versions of EQS that do not have the capability of talking with R (Mair, Wu, & Bentler, 2010).

  3. When Algebra is included, the means and covariances of the five variables cannot be well fitted by a two-factor model, as implied by a highly significant \(T_{\mathrm{ML}}\).

  4. The line numbers on the right margin of Appendix B are for the convenience of explaining the code, not part of R input. The same is true for the EQS input files in Appendices C to F.

  5. Any name can be used here and mardiamv25 is used for convenience.

  6. The input file is also available at http://rpackages.psychstat.org/examples/rsem/mcov.eqs.

  7. Because EQS uses a different order from \(\operatorname{vech}(\boldsymbol{\Sigma})\) when vectorizing the covariance matrix, the matrix in the file weight.txt is a permutation of \(\hat{\boldsymbol{\Gamma}}\); it also has an extra row and column of zeros. To print the matrix in the R console, use ex1$sem.

  8. For the adjusted statistic \(T_{\mathrm{AML}}\), EQS rounds \(\hat{m}_{2}\) to the nearest integer and obtains the p-value using the rounded degrees of freedom.

  9. The R code for the analysis is rsem(mardiamv25, c(1,2,4,5), "mcov.eqs", varphi=0).

  10. The data can be obtained at http://rpackages.psychstat.org/examples/rsem/mardiamv25_contaminated.dat.

  11. The SEs of \(\hat {\boldsymbol{\mu}}\) and \(\hat{\boldsymbol{\Sigma}}\) according to (8a) and (8b) will be in the default output of R. The matrix \(\hat{\boldsymbol{\Gamma}}\) according to the order of β in (8a) and (8b) will be saved into the object ex1, which is useful when SEM software other than EQS is used.
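
As an illustration of Note 8, the following Python sketch rounds a fractional degrees-of-freedom estimate to the nearest integer and evaluates the upper-tail chi-square probability at the rounded value. It is only a sketch under stated assumptions: the function names and the numerical inputs are hypothetical, and rounding halves upward is an assumption, since the note does not say how EQS treats ties.

```python
import math

def chi2_sf_int_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df = 2m: exp(-x/2) * sum_{j=0}^{m-1} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df = 2m + 1:
    # erfc(sqrt(x/2)) + exp(-x/2) * sum_{j=1}^{m} (x/2)^(j - 1/2) / Gamma(j + 1/2)
    total = 0.0
    for j in range(1, (df - 1) // 2 + 1):
        total += half ** (j - 0.5) / math.gamma(j + 0.5)
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total

def rounded_df_p_value(statistic, m2_hat):
    """Round m2_hat to the nearest integer (halves upward, an assumption) and return (df, p)."""
    df = max(1, int(math.floor(m2_hat + 0.5)))
    return df, chi2_sf_int_df(statistic, df)

# Hypothetical values, chosen only to show the mechanics.
df, p = rounded_df_p_value(statistic=7.2, m2_hat=3.6)
print(df, p)
```

Because the reference distribution changes with the rounded degrees of freedom, a p-value obtained this way can differ slightly from one based on the fractional \(\hat{m}_{2}\).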

References

  • Arminger, G., & Sobel, M.E. (1990). Pseudo-maximum likelihood estimation of mean and covariance structures with missing data. Journal of the American Statistical Association, 85, 195–203.

  • Bentler, P.M. (2008). EQS 6 structural equations program manual. Encino: Multivariate Software.

  • Bentler, P.M., & Yuan, K.-H. (1999). Structural equation modeling with small samples: test statistics. Multivariate Behavioral Research, 34, 181–197.

  • Browne, M.W. (1984). Asymptotic distribution-free methods for the analysis of covariance structures. British Journal of Mathematical & Statistical Psychology, 37, 62–83.

  • Cheng, T.-C., & Victoria-Feser, M.-P. (2002). High-breakdown estimation of multivariate mean and covariance with missing observations. British Journal of Mathematical & Statistical Psychology, 55, 317–335.

  • D’Agostino, R.B., Belanger, A., & D’Agostino, R.B. Jr. (1990). A suggestion for using powerful and informative tests of normality. American Statistician, 44, 316–321.

  • Enders, C.K. (2010). Applied missing data analysis. New York: Guilford.

  • Enders, C.K., & Bandalos, D.L. (2001). The relative performance of full information maximum likelihood estimation for missing data in structural equation models. Structural Equation Modeling, 8, 430–457.

  • Hampel, F.R., Ronchetti, E.M., Rousseeuw, P.J., & Stahel, W.A. (1986). Robust statistics: the approach based on influence functions. New York: Wiley.

  • Hu, L., Bentler, P.M., & Kano, Y. (1992). Can test statistics in covariance structure analysis be trusted? Psychological Bulletin, 112, 351–362.

  • Huber, P.J. (1981). Robust statistics. New York: Wiley.

  • Lee, S.Y., & Xia, Y.M. (2006). Maximum likelihood methods in treating outliers and symmetrically heavy-tailed distributions for nonlinear structural equation models with missing data. Psychometrika, 71, 565–585.

  • Lee, S.Y., & Xia, Y.M. (2008). A robust Bayesian approach for structural equation models with missing data. Psychometrika, 73, 343–364.

  • Little, R.J.A. (1988). Robust estimation of the mean and covariance matrix from data with missing values. Applied Statistics, 37, 23–38.

  • Liu, C. (1997). ML estimation of the multivariate t distribution and the EM algorithm. Journal of Multivariate Analysis, 63, 296–312.

  • Lopuhaä, H.P. (1989). On the relation between S-estimators and M-estimators of multivariate location and covariances. Annals of Statistics, 17, 1662–1683.

  • Mair, P., Wu, E., & Bentler, P.M. (2010). EQS goes R: simulations for SEM using the package REQS. Structural Equation Modeling, 17, 333–349.

  • Mardia, K.V. (1970). Measures of multivariate skewness and kurtosis with applications. Biometrika, 57, 519–530.

  • Mardia, K.V., Kent, J.T., & Bibby, J.M. (1979). Multivariate analysis. New York: Academic Press.

  • Micceri, T. (1989). The unicorn, the normal curve, and other improbable creatures. Psychological Bulletin, 105, 156–166.

  • Preacher, K.J., Wichman, A.L., MacCallum, R.C., & Briggs, N.E. (2008). Latent growth curve modeling. Thousand Oaks: Sage.

  • Raykov, T. (2005). Analysis of longitudinal studies with missing data using covariance structure modeling with full-information maximum likelihood. Structural Equation Modeling, 12, 493–505.

  • Rocke, D.M. (1996). Robustness properties of S-estimators of multivariate location and shape in high dimension. Annals of Statistics, 24, 1327–1345.

  • Rubin, D.B. (1976). Inference and missing data (with discussions). Biometrika, 63, 581–592.

  • Satorra, A., & Bentler, P.M. (1994). Corrections to test statistics and standard errors in covariance structure analysis. In A. von Eye & C.C. Clogg (Eds.), Latent variables analysis: applications for developmental research (pp. 399–419). Newbury Park: Sage.

  • Savalei, V., & Bentler, P.M. (2009). A two-stage ML approach to missing data: theory and application to auxiliary variables. Structural Equation Modeling, 16, 477–497.

  • Savalei, V., & Falk, C. (in press). Robust two-stage approach outperforms robust FIML with incomplete non-normal data. Structural Equation Modeling.

  • Schott, J. (2005). Matrix analysis for statistics (2nd ed.). New York: Wiley.

  • Tong, X., Zhang, Z., & Yuan, K.-H. (2011, October). Evaluation of test statistics for robust structural equation modeling with non-normal missing data. Paper presented at the graduate student pre-conference of the annual meeting of the society of multivariate experimental psychology, Norman, OK.

  • Yuan, K.-H. (2011). Expectation-robust algorithm and estimating equation for means and covariances with missing data. Manuscript under review.

  • Yuan, K.-H., & Bentler, P.M. (1997). Improving parameter tests in covariance structure analysis. Computational Statistics & Data Analysis, 26, 177–198.

  • Yuan, K.-H., & Bentler, P.M. (1998). Normal theory based test statistics in structural equation modeling. British Journal of Mathematical & Statistical Psychology, 51, 289–309.

  • Yuan, K.-H., & Bentler, P.M. (2000). Three likelihood-based methods for mean and covariance structure analysis with non-normal missing data. Sociological Methodology, 30, 167–202.

  • Yuan, K.-H., & Bentler, P.M. (2001). A unified approach to multigroup structural equation modeling with nonstandard samples. In G.A. Marcoulides & R.E. Schumacker (Eds.), Advanced structural equation modeling: new developments and techniques (pp. 35–56). Mahwah: Lawrence Erlbaum Associates.

  • Yuan, K.-H., & Bentler, P.M. (2010). Two simple approximations to the distributions of quadratic forms. British Journal of Mathematical & Statistical Psychology, 63, 273–291.

  • Yuan, K.-H., Bentler, P.M., & Chan, W. (2004a). Structural equation modeling with heavy tailed distributions. Psychometrika, 69, 421–436.

  • Yuan, K.-H., & Jennrich, R.I. (1998). Asymptotics of estimating equations under natural conditions. Journal of Multivariate Analysis, 65, 245–260.

  • Yuan, K.-H., Lambert, P.L., & Fouladi, R.T. (2004b). Mardia’s multivariate kurtosis with missing data. Multivariate Behavioral Research, 39, 413–437.

  • Yuan, K.-H., & Lu, L. (2008). SEM with missing data and unknown population using two-stage ML: theory and its application. Multivariate Behavioral Research, 43, 621–652.

  • Yuan, K.-H., Marshall, L.L., & Bentler, P.M. (2002). A unified approach to exploratory factor analysis with missing data, non-normal data, and in the presence of outliers. Psychometrika, 67, 95–122.

  • Yuan, K.-H., Wallentin, F., & Bentler, P.M. (in press). ML versus MI for missing data with violation of distribution conditions. Sociological Methods & Research.

  • Zhong, X., & Yuan, K.-H. (2011). Bias and efficiency in structural equation modeling: maximum likelihood versus robust methods. Multivariate Behavioral Research, 46, 229–265.

  • Zu, J., & Yuan, K.-H. (2010). Local influence and robust procedures for mediation analysis. Multivariate Behavioral Research, 45, 1–44.

Acknowledgements

We would like to thank Dr. Alberto Maydeu-Olivares and two reviewers for their very constructive comments on an earlier version of the paper.

Corresponding author

Correspondence to Ke-Hai Yuan.

Additional information

The research was supported by Grants DA00017 and DA01070 from the National Institute on Drug Abuse.

Appendices

Appendix A. Mathematical Details for Evaluating the Matrix \(\hat{\boldsymbol{\Upsilon}}\)

This appendix provides the development and formulas for evaluating the \(\hat{\boldsymbol{\Upsilon}}\) in (8b) with Huber-type weight. The formulas are programmed in the R package introduced in Section 4.
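
For orientation, the short Python sketch below shows Huber-type weights in the form they are commonly given. The weight definitions are not reproduced in this appendix, so the form used here (\(w_{i1}=1\) for \(d_i\le\rho_i\) and \(\rho_i/d_i\) otherwise, \(w_{i2}=w_{i1}^2/\kappa_i\), \(w_{i3}=1\)) is an assumption, and the function name `huber_weights` is hypothetical. The constants `rho` and `kappa` are treated as given; how they are calibrated is not shown here.

```python
def huber_weights(d, rho, kappa):
    """Return (w1, w2, w3) for a Mahalanobis distance d >= 0.

    Assumed Huber-type form (not taken from this appendix):
      w1 = 1 if d <= rho, else rho / d
      w2 = w1**2 / kappa
      w3 = 1
    """
    if d < 0 or rho <= 0 or kappa <= 0:
        raise ValueError("d must be non-negative; rho and kappa must be positive")
    w1 = 1.0 if d <= rho else rho / d
    w2 = w1 * w1 / kappa
    return w1, w2, 1.0

# Hypothetical tuning constants, used only to show how cases are down-weighted.
for d in (0.5, 2.0, 4.0):
    print(d, huber_weights(d, rho=2.0, kappa=0.9))
```

Under this assumed form the weights are constant whenever \(d_i\le\rho_i\), which is consistent with the appendix treating the cases \(d_i\le\rho_i\) and \(d_i>\rho_i\) separately.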

With the Huber-type weight, \(w_{i3}(d_i)=1\). Then the estimating equations in (1) and (2) are derived from

$$ g_{i1}(\boldsymbol{\alpha})=w_{i1}(d\boldsymbol{ \nu}_i)'\mathbf {V}_i^{-1}( \mathbf{x}_i-\boldsymbol{\nu}_i), $$
(A.1)

and

$$ g_{i2}(\boldsymbol{\alpha})=\frac{1}{2}\operatorname{tr} \bigl \{ \mathbf{V}_i^{-1}(d\mathbf{V}_i) \mathbf{V}_i^{-1}\bigl[w_{i2}( \mathbf{x}_i-\boldsymbol{\nu}_i) (\mathbf {x}_i-\boldsymbol{\nu}_i)'- \mathbf{V}_i\bigr] \bigr\}, $$
(A.2)

where d is for differentials. It follows from (A.1) and (A.2) that

(A.3)

and

(A.4)

Noting that both \(w_{i1}\) and \(w_{i2}\) are functions of \(d_{i}=[(\mathbf{x}_{i}-\boldsymbol{\nu}_{i})'\mathbf{V}_{i}^{-1}(\mathbf{x}_{i}-\boldsymbol{\nu}_{i})]^{1/2}\), we have

(A.5)

and

(A.6)
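
As a hedged sketch of the step behind such differentials (a reconstruction under stated assumptions, not necessarily the displayed form of the paper's equations), differentiating \(d_i^2=(\mathbf{x}_i-\boldsymbol{\nu}_i)'\mathbf{V}_i^{-1}(\mathbf{x}_i-\boldsymbol{\nu}_i)\) and using \(d(\mathbf{V}_i^{-1})=-\mathbf{V}_i^{-1}(d\mathbf{V}_i)\mathbf{V}_i^{-1}\) gives

$$ d\bigl(d_i^2\bigr)=-2(d\boldsymbol{\nu}_i)'\mathbf{V}_i^{-1}(\mathbf{x}_i-\boldsymbol{\nu}_i)-(\mathbf{x}_i-\boldsymbol{\nu}_i)'\mathbf{V}_i^{-1}(d\mathbf{V}_i)\mathbf{V}_i^{-1}(\mathbf{x}_i-\boldsymbol{\nu}_i). $$

If the Huber-type weights take the commonly used form \(w_{i1}=\rho_i/d_i\) and \(w_{i2}=w_{i1}^2/\kappa_i\) for \(d_i>\rho_i\) (an assumption, since the weight definitions are not reproduced in this appendix), the chain rule then yields

$$ dw_{i1}=-\frac{\rho_i}{2d_i^{3}}\,d\bigl(d_i^2\bigr),\qquad dw_{i2}=-\frac{\rho_i^{2}}{\kappa_i d_i^{4}}\,d\bigl(d_i^2\bigr), $$

whereas both differentials vanish for \(d_i<\rho_i\), where the assumed weights are constant.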

Thus, when \(d_i>\rho_i\), we have

(A.7)

and

(A.8)

Notice that, for matrices \(\mathbf{A}\), \(\mathbf{B}\), \(\mathbf{C}\), and \(\mathbf{D}\) of conformable orders, the following identity holds:

$$ \operatorname{tr}(\mathbf{A}\mathbf{B}\mathbf{C}\mathbf {D})= \operatorname{vec}'(\mathbf{D}) \bigl(\mathbf{A}\otimes\mathbf {C}'\bigr)\operatorname{vec}\bigl(\mathbf{B}'\bigr)= \operatorname{vec}'\bigl(\mathbf{D}'\bigr) \bigl( \mathbf{C}'\otimes\mathbf{A}\bigr)\operatorname {vec}(\mathbf{B}). $$
(A.9)
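
The identity in (A.9) can be checked numerically. The sketch below uses only the Python standard library and small random matrices; the helper names (`matmul`, `vec`, `kron`, and so on) are hypothetical, `vec` stacks columns, and the dimensions are arbitrary choices made so that the product \(\mathbf{A}\mathbf{B}\mathbf{C}\mathbf{D}\) is square.

```python
import random

def matmul(A, B):
    """Matrix product of two nested-list matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def vec(A):
    """Stack the columns of A into one list."""
    return [A[i][j] for j in range(len(A[0])) for i in range(len(A))]

def kron(A, B):
    """Kronecker product A (x) B."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def bilinear(u, M, v):
    """Compute u' M v."""
    return sum(u[i] * M[i][j] * v[j]
               for i in range(len(u)) for j in range(len(v)))

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
A = rand_matrix(2, 3, rng)   # m x n
B = rand_matrix(3, 4, rng)   # n x p
C = rand_matrix(4, 5, rng)   # p x q
D = rand_matrix(5, 2, rng)   # q x m

lhs = trace(matmul(matmul(matmul(A, B), C), D))
mid = bilinear(vec(D), kron(A, transpose(C)), vec(transpose(B)))
rhs = bilinear(vec(transpose(D)), kron(transpose(C), A), vec(B))
print(lhs, mid, rhs)  # the three values agree up to rounding error
```

Using rectangular matrices of different sizes makes the check sensitive to the ordering of the Kronecker factors and to which arguments are transposed.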

Let

$$\mathbf{E}_i=\frac{\partial\boldsymbol{\nu}_i}{\partial \boldsymbol{\nu}'}, \quad \mbox{and} \quad \mathbf{F}_i=\frac{\partial\operatorname{vec}(\mathbf {V}_i)}{\partial\boldsymbol{v}'}. $$

Using (A.9), it follows from (A.3) to (A.8) that, when \(d_i\le\rho_i\),

$$\frac{\partial\mathbf{g}_{i1}(\boldsymbol{\alpha})}{\partial \boldsymbol{\nu}'}=-\mathbf{E}_i'\mathbf{V}_i^{-1} \mathbf{E}_i, \qquad \frac{\partial\mathbf{g}_{i1}(\boldsymbol{\alpha})}{\partial \boldsymbol{v}'}=-\mathbf{E}_i' \bigl(\mathbf{V}_i^{-1}\otimes\mathbf{b}_i' \bigr)\mathbf{F}_i, $$
$$\frac{\partial\mathbf{g}_{i2}(\boldsymbol{\alpha})}{\partial \boldsymbol{\nu}'}=-\frac{1}{\kappa_i}\mathbf{F}_i' \bigl(\mathbf{b}_i\otimes\mathbf{V}_i^{-1} \bigr)\mathbf{E}_i, \qquad \frac{\partial\mathbf{g}_{i2}(\boldsymbol{\alpha})}{\partial \boldsymbol{v}'}=- \mathbf{F}_i'\biggl[\frac {1}{\kappa_i}\bigl( \mathbf{H}_i\otimes\mathbf{V}_i^{-1}\bigr)- \mathbf {W}_i\biggr]\mathbf{F}_i; $$

and when \(d_i>\rho_i\),

Appendix B. R Code for Robust SEM and Its Output

figure e

Appendix C. EQS Code for the Model in Equations (12) and (13)

figure f

Appendix D. EQS Code for Confirmatory Factor Analysis with Four Variables

figure g

Appendix E. EQS Code for the Unconditional Latent Growth Curve Model in Equation (14)

figure h

Appendix F. EQS Code for the Conditional Latent Growth Curve Model in Equations (15) and (16)

figure i

About this article

Cite this article

Yuan, KH., Zhang, Z. Robust Structural Equation Modeling with Missing Data and Auxiliary Variables. Psychometrika 77, 803–826 (2012). https://doi.org/10.1007/s11336-012-9282-4
