Robust Structural Equation Modeling with Missing Data and Auxiliary Variables

Yuan, Ke-Hai; Zhang, Zhiyong

doi:10.1007/s11336-012-9282-4

Published: 22 August 2012

Psychometrika, Volume 77, pages 803–826 (2012)

Ke-Hai Yuan¹ & Zhiyong Zhang¹

Abstract

The paper develops a two-stage robust procedure for structural equation modeling (SEM) and an R package, rsem, to facilitate the use of the procedure by applied researchers. In the first stage, M-estimates of the saturated mean vector and covariance matrix of all variables are obtained. Those corresponding to the substantive variables are then fitted to the structural model in the second stage. A sandwich-type covariance matrix is used to obtain consistent standard errors (SEs) of the structural parameter estimates. Rescaled, adjusted, corrected, and F-statistics are proposed for overall model evaluation. Using R and EQS, the R package rsem combines the two stages and generates all the test statistics and consistent SEs. Following the robust analysis, multiple model fit indices and standardized solutions are provided in the corresponding output of EQS. An example with open/closed book examination data illustrates the proper use of the package. The method is further applied to the analysis of a data set from the National Longitudinal Survey of Youth 1997 cohort, and results show that the developed procedure not only gives a better endorsement of the substantive models but also yields estimates with uniformly smaller standard errors than normal-distribution-based maximum likelihood.

Notes

Without missing values, NML is uniquely defined. With missing values, there are direct NML and two-stage NML (see Yuan & Bentler, 2000). Unless stated otherwise, our discussion applies equally to both.
The R package for robust SEM does not work with earlier versions of EQS that cannot communicate with R (Mair, Wu, & Bentler, 2010).
When including Algebra, the means and covariances of the five variables cannot be well fitted by a two-factor model, as implied by a highly significant $T_{\mathrm{ML}}$.
The line numbers on the right margin of Appendix B are for the convenience of explaining the code, not part of R input. The same is true for the EQS input files in Appendices C to F.
Any name can be used here and mardiamv25 is used for convenience.
The input file is also available at http://rpackages.psychstat.org/examples/rsem/mcov.eqs.
Because EQS uses a different order from $\operatorname{vech}(\boldsymbol{\Sigma})$ when vectorizing the covariance matrix, the matrix in the file weight.txt is a permutation of $\hat{\boldsymbol{\Gamma}}$; it also has an extra row and column of zeros. To print the matrix in the R console, use ex1$sem.
For the adjusted statistic $T_{\mathrm{AML}}$, EQS approximates $\hat{m}_{2}$ using the nearest integer and obtains the p-value using the approximated degrees of freedom.
The R code for the analysis is rsem(mardiamv25, c(1,2,4,5), "mcov.eqs", varphi=0).
The data can be obtained at http://rpackages.psychstat.org/examples/rsem/mardiamv25_contaminated.dat.
The SEs of $\hat{\boldsymbol{\mu}}$ and $\hat{\boldsymbol{\Sigma}}$ according to (8a) and (8b) will be in the default output of R. The matrix $\hat{\boldsymbol{\Gamma}}$ according to the order of $\boldsymbol{\beta}$ in (8a) and (8b) will be saved into the object ex1, which is useful when SEM software other than EQS is used.

References

Arminger, G., & Sobel, M.E. (1990). Pseudo-maximum likelihood estimation of mean and covariance structures with missing data. Journal of the American Statistical Association, 85, 195–203.
Bentler, P.M. (2008). EQS 6 structural equations program manual. Encino: Multivariate Software.
Bentler, P.M., & Yuan, K.-H. (1999). Structural equation modeling with small samples: test statistics. Multivariate Behavioral Research, 34, 181–197.
Browne, M.W. (1984). Asymptotic distribution-free methods for the analysis of covariance structures. British Journal of Mathematical & Statistical Psychology, 37, 62–83.
Cheng, T.-C., & Victoria-Feser, M.-P. (2002). High-breakdown estimation of multivariate mean and covariance with missing observations. British Journal of Mathematical & Statistical Psychology, 55, 317–335.
D’Agostino, R.B., Belanger, A., & D’Agostino, R.B. Jr. (1990). A suggestion for using powerful and informative tests of normality. American Statistician, 44, 316–321.
Enders, C.K. (2010). Applied missing data analysis. New York: Guilford.
Enders, C.K., & Bandalos, D.L. (2001). The relative performance of full information maximum likelihood estimation for missing data in structural equation models. Structural Equation Modeling, 8, 430–457.
Hampel, F.R., Ronchetti, E.M., Rousseeuw, P.J., & Stahel, W.A. (1986). Robust statistics: the approach based on influence functions. New York: Wiley.
Hu, L., Bentler, P.M., & Kano, Y. (1992). Can test statistics in covariance structure analysis be trusted? Psychological Bulletin, 112, 351–362.
Huber, P.J. (1981). Robust statistics. New York: Wiley.
Lee, S.Y., & Xia, Y.M. (2006). Maximum likelihood methods in treating outliers and symmetrically heavy-tailed distributions for nonlinear structural equation models with missing data. Psychometrika, 71, 565–585.
Lee, S.Y., & Xia, Y.M. (2008). A robust Bayesian approach for structural equation models with missing data. Psychometrika, 73, 343–364.
Little, R.J.A. (1988). Robust estimation of the mean and covariance matrix from data with missing values. Applied Statistics, 37, 23–38.
Liu, C. (1997). ML estimation of the multivariate t distribution and the EM algorithm. Journal of Multivariate Analysis, 63, 296–312.
Lopuhaä, H.P. (1989). On the relation between S-estimators and M-estimators of multivariate location and covariances. Annals of Statistics, 17, 1662–1683.
Mair, P., Wu, E., & Bentler, P.M. (2010). EQS goes R: simulations for SEM using the package REQS. Structural Equation Modeling, 17, 333–349.
Mardia, K.V. (1970). Measures of multivariate skewness and kurtosis with applications. Biometrika, 57, 519–530.
Mardia, K.V., Kent, J.T., & Bibby, J.M. (1979). Multivariate analysis. New York: Academic Press.
Micceri, T. (1989). The unicorn, the normal curve, and other improbable creatures. Psychological Bulletin, 105, 156–166.
Preacher, K.J., Wichman, A.L., MacCallum, R.C., & Briggs, N.E. (2008). Latent growth curve modeling. Thousand Oaks: Sage.
Raykov, T. (2005). Analysis of longitudinal studies with missing data using covariance structure modeling with full-information maximum likelihood. Structural Equation Modeling, 12, 493–505.
Rocke, D.M. (1996). Robustness properties of S-estimators of multivariate location and shape in high dimension. Annals of Statistics, 24, 1327–1345.
Rubin, D.B. (1976). Inference and missing data (with discussions). Biometrika, 63, 581–592.
Satorra, A., & Bentler, P.M. (1994). Corrections to test statistics and standard errors in covariance structure analysis. In A. von Eye & C.C. Clogg (Eds.), Latent variables analysis: applications for developmental research (pp. 399–419). Newbury Park: Sage.
Savalei, V., & Bentler, P.M. (2009). A two-stage ML approach to missing data: theory and application to auxiliary variables. Structural Equation Modeling, 16, 477–497.
Savalei, V., & Falk, C. (in press). Robust two-stage approach outperforms robust FIML with incomplete non-normal data. Structural Equation Modeling.
Schott, J. (2005). Matrix analysis for statistics (2nd ed.). New York: Wiley.
Tong, X., Zhang, Z., & Yuan, K.-H. (2011, October). Evaluation of test statistics for robust structural equation modeling with non-normal missing data. Paper presented at the graduate student pre-conference of the annual meeting of the Society of Multivariate Experimental Psychology, Norman, OK.
Yuan, K.-H. (2011). Expectation-robust algorithm and estimating equation for means and covariances with missing data. Manuscript under review.
Yuan, K.-H., & Bentler, P.M. (1997). Improving parameter tests in covariance structure analysis. Computational Statistics & Data Analysis, 26, 177–198.
Yuan, K.-H., & Bentler, P.M. (1998). Normal theory based test statistics in structural equation modeling. British Journal of Mathematical & Statistical Psychology, 51, 289–309.
Yuan, K.-H., & Bentler, P.M. (2000). Three likelihood-based methods for mean and covariance structure analysis with non-normal missing data. Sociological Methodology, 30, 167–202.
Yuan, K.-H., & Bentler, P.M. (2001). A unified approach to multigroup structural equation modeling with nonstandard samples. In G.A. Marcoulides & R.E. Schumacker (Eds.), Advanced structural equation modeling: new developments and techniques (pp. 35–56). Mahwah: Lawrence Erlbaum Associates.
Yuan, K.-H., & Bentler, P.M. (2010). Two simple approximations to the distributions of quadratic forms. British Journal of Mathematical & Statistical Psychology, 63, 273–291.
Yuan, K.-H., Bentler, P.M., & Chan, W. (2004a). Structural equation modeling with heavy tailed distributions. Psychometrika, 69, 421–436.
Yuan, K.-H., & Jennrich, R.I. (1998). Asymptotics of estimating equations under natural conditions. Journal of Multivariate Analysis, 65, 245–260.
Yuan, K.-H., Lambert, P.L., & Fouladi, R.T. (2004b). Mardia’s multivariate kurtosis with missing data. Multivariate Behavioral Research, 39, 413–437.
Yuan, K.-H., & Lu, L. (2008). SEM with missing data and unknown population distributions using two-stage ML: theory and its application. Multivariate Behavioral Research, 43, 621–652.
Yuan, K.-H., Marshall, L.L., & Bentler, P.M. (2002). A unified approach to exploratory factor analysis with missing data, non-normal data, and in the presence of outliers. Psychometrika, 67, 95–122.
Yuan, K.-H., Wallentin, F., & Bentler, P.M. (in press). ML versus MI for missing data with violation of distribution conditions. Sociological Methods & Research.
Zhong, X., & Yuan, K.-H. (2011). Bias and efficiency in structural equation modeling: maximum likelihood versus robust methods. Multivariate Behavioral Research, 46, 229–265.
Zu, J., & Yuan, K.-H. (2010). Local influence and robust procedures for mediation analysis. Multivariate Behavioral Research, 45, 1–44.

Acknowledgements

We would like to thank Dr. Alberto Maydeu-Olivares and two reviewers for their very constructive comments on an earlier version of the paper.

Author information

Authors and Affiliations

Department of Psychology, University of Notre Dame, Notre Dame, IN, 46556, USA
Ke-Hai Yuan & Zhiyong Zhang

Corresponding author

Correspondence to Ke-Hai Yuan.

Additional information

The research was supported by Grants DA00017 and DA01070 from the National Institute on Drug Abuse.

Appendices

Appendix A. Mathematical Details for Evaluating the Matrix $\hat{\boldsymbol{\Upsilon}}$

This appendix provides the development and formulas for evaluating the $\hat{\boldsymbol{\Upsilon}}$ in (8b) with Huber-type weights. The formulas are programmed in the R package introduced in Section 4.

With the Huber-type weights, $w_{i3}(d_i)=1$. Then the estimating equations in (1) and (2) are derived from

$$ g_{i1}(\boldsymbol{\alpha})=w_{i1}(d\boldsymbol{ \nu}_i)'\mathbf {V}_i^{-1}( \mathbf{x}_i-\boldsymbol{\nu}_i), $$

(A.1)

and

$$ g_{i2}(\boldsymbol{\alpha})=\frac{1}{2}\operatorname{tr} \bigl \{ \mathbf{V}_i^{-1}(d\mathbf{V}_i) \mathbf{V}_i^{-1}\bigl[w_{i2}( \mathbf{x}_i-\boldsymbol{\nu}_i) (\mathbf {x}_i-\boldsymbol{\nu}_i)'- \mathbf{V}_i\bigr] \bigr\}, $$

(A.2)

where $d$ denotes the differential. It follows from (A.1) and (A.2) that

(A.3)

and

(A.4)

Noting that both $w_{i1}$ and $w_{i2}$ are functions of $d_{i}=[(\mathbf{x}_{i}-\boldsymbol{\nu}_{i})'\mathbf{V}_{i}^{-1}(\mathbf{x}_{i}-\boldsymbol{\nu}_{i})]^{1/2}$, we have

(A.5)

and

(A.6)

Thus, when $d_i>\rho_i$, we have

(A.7)

and

(A.8)
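The case distinction between $d_i\le\rho_i$ and $d_i>\rho_i$ reflects the piecewise form of the Huber-type weights. The following Python sketch is illustrative only. It assumes the usual Huber form, $w_{i1}(d_i)=1$ for $d_i\le\rho_i$ and $\rho_i/d_i$ otherwise, with $w_{i2}(d_i)=w_{i1}^2(d_i)/\kappa_i$, and it takes $\rho_i$, $\kappa_i$, and $\mathbf{V}_i^{-1}$ as given inputs. The function names are hypothetical and are not part of the rsem package.

```python
import math


def mahalanobis(x, nu, v_inv):
    """Return d = sqrt((x - nu)' V^{-1} (x - nu)).

    x and nu are lists of equal length; v_inv is V^{-1} as a nested list.
    """
    r = [xi - ni for xi, ni in zip(x, nu)]
    n = len(r)
    q = sum(r[j] * v_inv[j][k] * r[k] for j in range(n) for k in range(n))
    return math.sqrt(q)


def huber_weights(d, rho, kappa):
    """Assumed Huber-type weights (w1, w2) for a case at distance d.

    w1 = 1 when d <= rho and rho / d otherwise; w2 = w1**2 / kappa.
    rho and kappa are tuning constants supplied by the caller.
    """
    w1 = 1.0 if d <= rho else rho / d
    w2 = w1 * w1 / kappa
    return w1, w2


# Illustrative call with made-up numbers: a case at distance 5 with rho = 2.
d = mahalanobis([3.0, 4.0], [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
w1, w2 = huber_weights(d, rho=2.0, kappa=0.5)
```

Under this assumed form, cases with $d_i\le\rho_i$ receive $w_{i1}=1$, while more distant cases are downweighted by the factor $\rho_i/d_i$.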

Notice that, for matrices $\mathbf{A}$, $\mathbf{B}$, $\mathbf{C}$, and $\mathbf{D}$ of conformable orders, the following identity holds:

$$ \operatorname{tr}(\mathbf{A}\mathbf{B}\mathbf{C}\mathbf {D})= \operatorname{vec}'(\mathbf{D}) \bigl(\mathbf{A}\otimes\mathbf {C}'\bigr)\operatorname{vec}\bigl(\mathbf{B}'\bigr)= \operatorname{vec}'\bigl(\mathbf{D}'\bigr) \bigl( \mathbf{C}'\otimes\mathbf{A}\bigr)\operatorname {vec}(\mathbf{B}). $$

(A.9)
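Identity (A.9) can be checked numerically. The sketch below, in plain Python with small integer matrices chosen only for illustration, evaluates the trace and both Kronecker-product forms, with $\operatorname{vec}$ stacking columns. The helper names are hypothetical.

```python
def matmul(a, b):
    """Matrix product of nested lists a (m x n) and b (n x p)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def vec(a):
    """Stack the columns of a into one flat list (column-major order)."""
    return [a[i][j] for j in range(len(a[0])) for i in range(len(a))]


def kron(a, b):
    """Kronecker product of nested lists a and b."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]


def quad(u, m, v):
    """Bilinear form u' M v for flat lists u, v and nested list m."""
    return sum(u[i] * m[i][j] * v[j]
               for i in range(len(u)) for j in range(len(v)))


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


# Arbitrary conformable matrices: A is 2x3, B is 3x2, C and D are 2x2.
A = [[1, 2, 0], [0, 1, 3]]
B = [[1, 0], [2, 1], [0, 4]]
C = [[2, 1], [1, 0]]
D = [[1, 3], [0, 2]]

lhs = trace(matmul(matmul(matmul(A, B), C), D))
form1 = quad(vec(D), kron(A, transpose(C)), vec(transpose(B)))
form2 = quad(vec(transpose(D)), kron(transpose(C), A), vec(B))
print(lhs == form1 == form2)
```

Integer entries keep the arithmetic exact, so the three quantities can be compared with `==` rather than with a tolerance.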

Let

$$\mathbf{E}_i=\frac{\partial\boldsymbol{\nu}_i}{\partial \boldsymbol{\nu}'}, \quad \mbox{and} \quad \mathbf{F}_i=\frac{\partial\operatorname{vec}(\mathbf {V}_i)}{\partial\boldsymbol{v}'}. $$
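To illustrate what $\mathbf{E}_i$ and $\mathbf{F}_i$ look like, the sketch below assumes that $\boldsymbol{\nu}_i$ and $\mathbf{V}_i$ collect the entries of the full mean vector $\boldsymbol{\nu}$ and covariance matrix $\mathbf{V}$ for the variables observed in case $i$, and that $\boldsymbol{v}=\operatorname{vech}(\mathbf{V})$ with the columns of the lower triangle stacked. Under these assumptions both matrices contain only zeros and ones. The function names and the example missing-data pattern are hypothetical.

```python
def selection_E(observed, p):
    """p_i x p matrix E with E nu = nu_i, for 0-based observed indices."""
    return [[1 if j == obs else 0 for j in range(p)] for obs in observed]


def vech_index(p):
    """Map (row, col), row >= col, to its position in vech of a p x p matrix."""
    idx, k = {}, 0
    for c in range(p):
        for r in range(c, p):
            idx[(r, c)] = k
            k += 1
    return idx


def jacobian_F(observed, p):
    """p_i^2 x p(p+1)/2 matrix F with F vech(V) = vec(V_i)."""
    idx = vech_index(p)
    q = p * (p + 1) // 2
    rows = []
    for b in observed:        # column of V_i (vec is column-major)
        for a in observed:    # row of V_i
            row = [0] * q
            row[idx[(max(a, b), min(a, b))]] = 1
            rows.append(row)
    return rows


def apply(m, v):
    """Matrix-vector product for a nested list m and flat list v."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


# Example: p = 3 variables, case i observes the first and third.
nu = [10, 20, 30]
vech_V = [4, 1, 2, 5, 3, 6]   # vech of [[4, 1, 2], [1, 5, 3], [2, 3, 6]]
E_i = selection_E([0, 2], 3)
F_i = jacobian_F([0, 2], 3)
nu_i = apply(E_i, nu)         # observed part of nu
vec_V_i = apply(F_i, vech_V)  # vec of the observed 2 x 2 block
```

Because $\mathbf{E}_i$ and $\mathbf{F}_i$ depend only on the missing-data pattern under these assumptions, they would need to be built once per pattern rather than once per case.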

Using (A.9), it follows from (A.3) to (A.8) that, when $d_i\le\rho_i$,

$$\frac{\partial\mathbf{g}_{i1}(\boldsymbol{\alpha})}{\partial \boldsymbol{\nu}'}=-\mathbf{E}_i'\mathbf{V}_i^{-1} \mathbf{E}_i, \qquad \frac{\partial\mathbf{g}_{i1}(\boldsymbol{\alpha})}{\partial \boldsymbol{v}'}=-\mathbf{E}_i' \bigl(\mathbf{V}_i^{-1}\otimes\mathbf{b}_i' \bigr)\mathbf{F}_i, $$

$$\frac{\partial\mathbf{g}_{i2}(\boldsymbol{\alpha})}{\partial \boldsymbol{\nu}'}=-\frac{1}{\kappa_i}\mathbf{F}_i' \bigl(\mathbf{b}_i\otimes\mathbf{V}_i^{-1} \bigr)\mathbf{E}_i, \qquad \frac{\partial\mathbf{g}_{i2}(\boldsymbol{\alpha})}{\partial \boldsymbol{v}'}=- \mathbf{F}_i'\biggl[\frac {1}{\kappa_i}\bigl( \mathbf{H}_i\otimes\mathbf{V}_i^{-1}\bigr)- \mathbf {W}_i\biggr]\mathbf{F}_i; $$

and when $d_i>\rho_i$,

Appendix B. R Code for Robust SEM and Its Output

Appendix C. EQS Code for the Model in Equations (12) and (13)

Appendix D. EQS Code for Confirmatory Factor Analysis with Four Variables

Appendix E. EQS Code for the Unconditional Latent Growth Curve Model in Equation (14)

Appendix F. EQS Code for the Conditional Latent Growth Curve Model in Equations (15) and (16)

About this article

Cite this article

Yuan, KH., Zhang, Z. Robust Structural Equation Modeling with Missing Data and Auxiliary Variables. Psychometrika 77, 803–826 (2012). https://doi.org/10.1007/s11336-012-9282-4

Received: 05 October 2011
Revised: 12 December 2011
Published: 22 August 2012
Issue Date: October 2012
DOI: https://doi.org/10.1007/s11336-012-9282-4

Key words
