Abstract
The common maximum likelihood (ML) estimator for structural equation models (SEMs) has optimal asymptotic properties under ideal conditions (e.g., correct structure, no excess kurtosis, etc.) that are rarely met in practice. This paper proposes model-implied instrumental variable – generalized method of moments (MIIV-GMM) estimators for latent variable SEMs that are more robust than ML to violations of both the model structure and distributional assumptions. Under less demanding assumptions, the MIIV-GMM estimators are consistent, asymptotically unbiased, asymptotically normal, and have an asymptotic covariance matrix. They are “distribution-free,” robust to heteroscedasticity, and have overidentification goodness-of-fit J-tests with asymptotic chi-square distributions. In addition, MIIV-GMM estimators are “scalable” in that they can estimate and test the full model or any subset of equations, and hence allow better pinpointing of those parts of the model that fit and do not fit the data. An empirical example illustrates MIIV-GMM estimators. Two simulation studies explore their finite-sample properties and find that they perform well across a range of sample sizes.
References
Anderson, J.C., & Gerbing, D. (1984). The effect of sampling error on convergence, improper solutions, and goodness-of-fit indices for maximum likelihood confirmatory factor analysis. Psychometrika, 49, 155–173.
Anderson, T.W., & Amemiya, Y. (1988). The asymptotic normal distribution of estimators in factor analysis under general conditions. The Annals of Statistics, 16, 759–771.
Angrist, J.D., & Pischke, J. (2009). Mostly harmless econometrics: an empiricist’s companion. Princeton: Princeton University Press.
Bauldry, S. (forthcoming). miivfind: a program for identifying model-implied instrumental variables (MIIVs) for structural equation models in Stata. Stata Journal.
Bentler, P.M. (1982). Confirmatory factor analysis via noniterative estimation: a fast, inexpensive method. Journal of Marketing Research, 19, 417–424.
Bentler, P.M., & Yuan, K. (1999). Structural equation modeling with small samples: test statistics. Multivariate Behavioral Research, 34, 181–197.
Bollen, K.A. (1989). Structural equations with latent variables. New York: Wiley.
Bollen, K.A. (1996a). An alternative Two Stage Least Squares (2SLS) estimator for latent variable equations. Psychometrika, 61, 109–121.
Bollen, K.A. (1996b). A limited information estimator for LISREL models with and without heteroscedasticity. In G.A. Marcoulides & R.E. Schumacker (Eds.), Advanced structural equation modeling (pp. 227–241). Mahwah: Erlbaum.
Bollen, K.A. (2001). Two-stage least squares and latent variable models: simultaneous estimation and robustness to misspecifications. In R. Cudeck, S. du Toit, & D. Sörbom (Eds.), Structural equation modeling: present and future, a festschrift in honor of Karl Jöreskog (pp. 119–138). Lincolnwood: Scientific Software International.
Bollen, K.A. (2012). Instrumental variables in sociology and the social sciences. Annual Review of Sociology, 38, 37–72.
Bollen, K.A., & Bauer, D.J. (2004). Automating the selection of modelimplied instrumental variables. Sociological Methods & Research, 32, 425–452.
Bollen, K.A., Kirby, J.B., Curran, P.J., Paxton, P.M., & Chen, F. (2007). Latent variable models under misspecification: two-stage least squares (2SLS) and maximum likelihood (ML) estimators. Sociological Methods & Research, 36, 48–86.
Bollen, K.A., & Pearl, J. (2013). Eight myths about causality and structural equation models. In S. Morgan (Ed.), Handbook of causal analysis for social research, New York: Springer.
Bollen, K.A., & Stine, R. (1990). Direct and indirect effects: classical and bootstrap estimates of variability. Sociological Methodology, 20, 115–140.
Bollen, K.A., & Stine, R. (1992). Bootstrapping goodness-of-fit measures in structural equation models. Sociological Methods & Research, 21, 205–229.
Boomsma, A., & Hoogland, J.J. (2001). The robustness of LISREL modeling revisited. In R. Cudeck, S. du Toit, & D. Sörbom (Eds.), Structural equation modeling: present and future, a festschrift in honor of Karl Jöreskog (pp. 139–168). Lincolnwood: Scientific Software International.
Browne, M.W. (1984). Asymptotically distribution-free methods for the analysis of covariance structures. British Journal of Mathematical & Statistical Psychology, 37, 62–83.
Browne, M.W., & Cudeck, R. (1993). Alternative ways of assessing model fit. In K.A. Bollen & J.S. Long (Eds.), Testing structural equation models (pp. 136–162). Newbury Park: Sage.
Chausse, P. (2012). gmm: generalized method of moments and generalized empirical likelihood (R package). http://cran.r-project.org/web/packages/gmm/index.html.
Cragg, J.G. (1968). Some effects of incorrect specification on the small sample properties of several simultaneous equation estimators. International Economic Review, 9, 63–86.
Davidson, R., & MacKinnon, J.G. (1993). Estimation and inference in econometrics. New York: Oxford University Press.
Foster, E.M. (1997). Instrumental variables for logistic regression: an illustration. Social Science Research, 26, 487–504.
Glanville, J.L., & Paxton, P. (2007). How do we learn to trust? A confirmatory tetrad analysis of the sources of generalized trust. Social Psychology Quarterly, 70, 230–242.
Godambe, V.P., & Thompson, M. (1978). Some aspects of the theory of estimating equations. Journal of Statistical Planning and Inference, 2, 95–104.
Hall, A.R. (2005). Generalized method of moments. Oxford: Oxford University Press.
Hägglund, G. (1982). Factor analysis by instrumental variables. Psychometrika, 47, 209–222.
Hansen, L.P. (1982). Large sample properties of generalized method of moments estimators. Econometrica, 50, 1029–1054.
Hu, L.T., Bentler, P.M., & Kano, Y. (1992). Can test statistics in covariance structure analysis be trusted? Psychological Bulletin, 112, 351–362.
Ihara, M., & Kano, Y. (1986). A new estimator of the uniqueness in factor analysis. Psychometrika, 51, 563–566.
Jöreskog, K.G. (1969). A general approach to confirmatory maximum likelihood factor analysis. Psychometrika, 34, 183–202.
Jöreskog, K.G. (1973). A general method for estimating a linear structural equation system. In A.S. Goldberger & O.D. Duncan (Eds.), Structural equation models in the social sciences (pp. 85–112). New York: Academic Press.
Jöreskog, K.G. (1977). Structural equation models in the social sciences: specification, estimation, and testing. In P.R. Krishnaiah (Ed.), Applications of statistics (pp. 265–287). Amsterdam: North-Holland.
Jöreskog, K.G. (1983). Factor analysis as an error-in-variables model. In H. Wainer & S. Messick (Eds.), Principles of modern psychological measurement (pp. 185–196). Hillsdale: Erlbaum.
Kirby, J.B., & Bollen, K.A. (2009). Using instrumental variable tests to evaluate model specification in latent variable structural equation models. Sociological Methodology, 39, 327–355.
Kolenikov, S. (2011). Biases of parameter estimates in misspecified structural equation models. Sociological Methodology, 41, 119–157.
Kolenikov, S., & Bollen, K.A. (2012). Testing negative error variances: is a Heywood case a symptom of misspecification? Sociological Methods & Research, 41, 124–167.
Lawley, D.N. (1940). The estimation of factor loadings by the method of maximum likelihood. Proceedings of the Royal Society of Edinburgh, 60, 64–82.
Madansky, A. (1964). Instrumental variables in factor analysis. Psychometrika, 29, 105–113.
Mardia, K.V. (1970). Measures of multivariate skewness and kurtosis with applications. Biometrika, 57, 519–530.
Mátyás, L. (Ed.) (1999). Generalized method of moments estimation. Cambridge: Cambridge University Press.
Micceri, T. (1989). The unicorn, the normal curve, and other improbable creatures. Psychological Bulletin, 105, 156–166.
Muthén, L.K., & Muthén, B. (1998–2010). Mplus user’s guide. Los Angeles: Muthén & Muthén.
Nevitt, J., & Hancock, G.R. (2004). Evaluating small sample approaches for model test statistics in structural equation modeling. Multivariate Behavioral Research, 39, 439–478.
Newey, W.K., & McFadden, D. (1986). Large sample estimation and hypothesis testing. In R.F. Engle & D. McFadden (Eds.), Handbook of Econometrics (Vol. 4, 1st ed., pp. 2111–2245). Amsterdam: Elsevier.
Paxton, P.M., Curran, P., Bollen, K.A., Kirby, J., & Chen, F. (2001). Monte Carlo simulations in structural equation models. Structural Equation Modeling, 8, 287–312.
Pew Research Center (1998). Trust and citizen engagement in metropolitan Philadelphia: a case study. Washington: The Pew Research Center for the People and the Press.
Sargan, J.D. (1958). The estimation of economic relationships using instrumental variables. Econometrica, 26, 393–415.
Satorra, A. (1990). Robustness issues in structural equation modeling: a review of recent developments. Quality and Quantity, 24, 367–386.
Satorra, A., & Bentler, P.M. (1994). Corrections to test statistics and standard errors in covariance structure analysis. In A. von Eye & C.C. Clogg (Eds.), Latent variable analysis (pp. 399–419). Thousand Oaks: Sage.
Searle, S.R. (1982). Matrix algebra useful for statistics (1st ed.). New York: Wiley.
Skrondal, A., & Rabe-Hesketh, S. (2004). Generalized latent variable modeling. Boca Raton: Chapman & Hall/CRC.
Staiger, D., & Stock, J.H. (1997). Instrumental variables regression with weak instruments. Econometrica, 65, 557–586.
StataCorp (2011). Stata statistical software: release 12. College Station: StataCorp.
Stock, J.H., & Yogo, M. (2005). Testing for weak instruments in linear IV regression. In D.W.K. Andrews (Ed.), Identification and Inference for Econometric Models (pp. 80–108). New York: Cambridge University Press.
Stock, J.H., Wright, J.H., & Yogo, M. (2002). A survey of weak instruments and weak identification in generalized method of moments. Journal of Business & Economic Statistics, 20, 518–529.
van der Vaart, A.W. (1998). Asymptotic statistics. Cambridge: Cambridge University Press.
Wooldridge, J.M. (2010). Econometric analysis of cross section and panel data. Cambridge: MIT Press.
Yuan, K., & Hayashi, K. (2006). Standard errors in covariance structure models: asymptotic versus bootstrap. British Journal of Mathematical & Statistical Psychology, 59, 397–417.
Acknowledgements
We gratefully acknowledge the support of NSF SES-0617276 and SES-0617193.
Appendices
Appendix A. Notation Example
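In the section on “From Latent to Observed Variables,” we introduced the notation \(\mathbf{y}^{*}=\mathbf{Z}\mathbf{A}+\mathbf{u}\). To illustrate this notation, suppose that the first latent variable equation for the ith case is
\[ \eta_{1i} = \alpha_{\eta_1} + \beta_{12}\eta_{2i} + \gamma_{11}\xi_{1i} + \gamma_{12}\xi_{2i} + \zeta_{1i}, \]
with \(y_{1i}\), \(y_{2i}\), \(x_{1i}\), and \(x_{2i}\) as the scaling indicators for \(\eta_{1i}\), \(\eta_{2i}\), \(\xi_{1i}\), and \(\xi_{2i}\), respectively. By replacing each latent variable by its scaling indicator minus its error [e.g., \(\eta_{1i}=y_{1i}-\epsilon_{1i}\)], the observed variable counterpart to this first latent variable equation is
\[ y_{1i} = \alpha_{\eta_1} + \beta_{12}y_{2i} + \gamma_{11}x_{1i} + \gamma_{12}x_{2i} + \epsilon_{1i} - \beta_{12}\epsilon_{2i} - \gamma_{11}\delta_{1i} - \gamma_{12}\delta_{2i} + \zeta_{1i}. \]
The \(\mathbf{Z}_1\) for this first equation is
\[ \mathbf{Z}_1 = [\,\mathbf{1}\;\; \mathbf{y}_2\;\; \mathbf{x}_1\;\; \mathbf{x}_2\,], \]
where \(\mathbf{1}\) is a column of ones, and \(\mathbf{u}_1\) is the vector
\[ \mathbf{u}_1 = \boldsymbol{\epsilon}_1 - \beta_{12}\boldsymbol{\epsilon}_2 - \gamma_{11}\boldsymbol{\delta}_1 - \gamma_{12}\boldsymbol{\delta}_2 + \boldsymbol{\zeta}_1. \]
The \(\mathbf{Z}_2,\ldots,\mathbf{Z}_{p+q-n}\) are constructed in an analogous fashion.
In the full system of equations, the coefficient vector \(\mathbf{A}\) contains all of the intercepts, factor loadings, and other coefficients in the model. It is the partitioned vector
\[ \mathbf{A} = [\,\mathbf{A}_1'\;\; \mathbf{A}_2'\;\; \cdots\;\; \mathbf{A}_{p+q-n}'\,]', \]
where \(\mathbf{A}_j\) contains the intercept and coefficients for the jth equation in the model.
Continuing with the previous example of the equation for \(y_{1i}\), where \(y_{1i}\) depends on \(y_{2i}\), \(x_{1i}\), and \(x_{2i}\),
\[ \mathbf{A}_1 = [\,\alpha_{\eta_1}\;\; \beta_{12}\;\; \gamma_{11}\;\; \gamma_{12}\,]'. \]
The other \(\mathbf{A}_j\) vectors are formed in a similar way.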
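The substitution step can be checked numerically. The sketch below assumes the first latent equation takes the form η1 = α + β12 η2 + γ11 ξ1 + γ12 ξ2 + ζ1 (the parameter names and values are our own hypothetical labels), simulates latent variables and errors, builds Z1 and A1, and confirms that y1 − Z1 A1 reproduces the composite disturbance u1 exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical parameters for eta1 = alpha + b12*eta2 + g11*xi1 + g12*xi2 + zeta1
alpha, b12, g11, g12 = 0.5, 0.4, 0.7, -0.3

# Latent variables and equation disturbance (distributions chosen only for illustration)
xi1, xi2 = rng.normal(size=n), rng.normal(size=n)
eta2 = 0.6 * xi1 + rng.normal(size=n)
zeta1 = rng.normal(size=n)
eta1 = alpha + b12 * eta2 + g11 * xi1 + g12 * xi2 + zeta1

# Scaling indicators: each equals its latent variable plus a measurement error
eps1, eps2, del1, del2 = (rng.normal(scale=0.5, size=n) for _ in range(4))
y1, y2 = eta1 + eps1, eta2 + eps2
x1, x2 = xi1 + del1, xi2 + del2

# Z1 = [1, y2, x1, x2] and A1 = [alpha, b12, g11, g12]'
Z1 = np.column_stack([np.ones(n), y2, x1, x2])
A1 = np.array([alpha, b12, g11, g12])

# Composite disturbance implied by replacing each latent with indicator minus error
u1 = eps1 - b12 * eps2 - g11 * del1 - g12 * del2 + zeta1

# The observed-variable equation y1 = Z1 A1 + u1 holds up to floating-point rounding
print(np.allclose(y1 - Z1 @ A1, u1))  # True
```

Note that u1 contains eps2, del1, and del2, which also enter y2, x1, and x2; this is why the columns of Z1 are correlated with u1 and ordinary least squares on this equation is inconsistent.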
Each of the regression coefficients in the original LISREL-type model appears only once in A. Variance and covariance parameters are not estimated by MIIV-GMM. Also, A has a lot of structure, with zeroes corresponding to the lack of a direct effect of latent or observed variables. Thus, there is a one-to-one correspondence between the entries of A, on the one hand, and the entries of the collection B, Γ, Λ_x, and Λ_y, on the other.
Appendix B. Selection of MIIVs
The basic process for selecting MIIVs starts with all observed variables in the model and eliminates as potential MIIVs any variables that are directly or indirectly influenced by the errors or unique factors that are part of or that are correlated with the composite disturbance for a given equation. The remaining variables are the MIIVs for the given equation. More specifically, finding the MIIVs involves the following steps:

1. Make a list of all observed variables in the model since these are the potential MIIVs;
2. Make a list of all errors or unique factors (ϵs, δs, or ζs) that are included in the composite disturbance term that is part of the equation of interest;
3. Eliminate any observed variable that is directly or indirectly influenced by the errors or unique factors noted in Step 2;
4. Eliminate any observed variable that is directly or indirectly influenced by an error or unique factor that is correlated with the errors or unique factors noted in Step 2;
5. The remaining observed variables are the MIIVs for the given equation.
This procedure can be implemented in several ways. One is by visual inspection of the path diagram of the model. Another is by examining the reduced-form equation for each observed variable and determining whether the disturbances or errors in question affect it. Finally, Bollen and Bauer (2004) provide a SAS macro that implements this check, and Bauldry (forthcoming) provides an implementation of the same algorithm in Stata. In virtually all identified SEMs that we have examined, there are sufficient MIIVs to estimate all equations in the model. There is no need to search for additional observed variables once a researcher starts with an (over)identified model. This differs from the usual IV approach, in which a researcher searches for auxiliary IVs that were not part of the original structure (Bollen 2012).
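The five elimination steps can be sketched as a graph search. This is an illustrative Python sketch, not the SAS macro or Stata program cited above; it assumes the model is encoded as a hypothetical dictionary of directed edges (cause to effect), with correlated errors listed as pairs. The example is a hypothetical model in which ξ1 has indicators x1 to x3, η1 has indicators y1 to y3, and η1 depends on ξ1, so the equation for y1 with scaling indicator x1 has composite disturbance ε1 − γδ1 + ζ1.

```python
from collections import deque

def descendants(graph, start):
    """All nodes reachable from `start` by following directed edges."""
    seen, queue = set(), deque([start])
    while queue:
        for child in graph.get(queue.popleft(), ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen

def find_miivs(graph, observed, disturbance_terms, correlated_pairs):
    """Apply the five elimination steps to one equation (illustrative sketch)."""
    candidates = set(observed)                      # Step 1
    in_disturbance = set(disturbance_terms)         # Step 2
    correlated = {b for a, b in correlated_pairs if a in in_disturbance}
    correlated |= {a for a, b in correlated_pairs if b in in_disturbance}
    for term in in_disturbance:                     # Step 3
        candidates -= descendants(graph, term)
    for term in correlated:                         # Step 4
        candidates -= descendants(graph, term)
    return sorted(candidates)                       # Step 5

# Hypothetical model: xi1 -> x1, x2, x3 and eta1; eta1 -> y1, y2, y3
graph = {
    "xi1": ["x1", "x2", "x3", "eta1"],
    "eta1": ["y1", "y2", "y3"],
    "zeta1": ["eta1"],
    "delta1": ["x1"], "delta2": ["x2"], "delta3": ["x3"],
    "eps1": ["y1"], "eps2": ["y2"], "eps3": ["y3"],
}
observed = ["x1", "x2", "x3", "y1", "y2", "y3"]

# Equation y1 = alpha + gamma*x1 + (eps1 - gamma*delta1 + zeta1)
terms = ["eps1", "delta1", "zeta1"]
print(find_miivs(graph, observed, terms, []))                     # ['x2', 'x3']
print(find_miivs(graph, observed, terms, [("eps1", "delta3")]))   # ['x2']
```

The second call shows Step 4 at work: once ε1 is allowed to correlate with δ3, the indicator x3 no longer qualifies as an MIIV.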
Appendix C. GMM Theory and Technical Aspects
This Appendix outlines the general theory of generalized method of moments estimators. It restates the results given in Hansen's (1982) original development, as well as in comprehensive reviews such as Hall (2005) and Newey and McFadden (1986).
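Given the p-variate data vector z and model parameters θ, the generalized method of moments works with a q-variate vector of functions g(z,θ) that combine the data and parameters in such a way that, in the population,
\[ \mathrm{E}[g(\mathbf{z},\boldsymbol{\theta})] = \mathbf{0} \tag{C.1} \]
for the unique “true” value \(\boldsymbol{\theta}_{0}\). Equations (C.1) are typically referred to as “moment conditions” in economics, or “estimating equations” in statistics (van der Vaart 1998; Godambe & Thompson 1978).
The GMM proceeds as follows. First, the sample analogues to the estimating equations are formed:
\[ \bar{g}_N(\boldsymbol{\theta}) = \frac{1}{N}\sum_{i=1}^{N} g(\mathbf{z}_i,\boldsymbol{\theta}). \tag{C.2} \]
Second, these estimating equations are collected together into a quadratic form:
\[ Q_N(\boldsymbol{\theta}) = \bar{g}_N(\boldsymbol{\theta})'\,\mathbf{W}_N\,\bar{g}_N(\boldsymbol{\theta}), \tag{C.3} \]
where \(\mathbf{W}_N\) is a conforming q×q weight matrix, possibly obtained from the data. Third, this quadratic form is minimized with respect to θ to obtain the parameter estimates:
\[ \widehat{\boldsymbol{\theta}}_{N} = \arg\min_{\boldsymbol{\theta}\in\boldsymbol{\Theta}} Q_N(\boldsymbol{\theta}). \tag{C.4} \]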
Thus, a GMM estimator \(\widehat{\boldsymbol{\theta}}_{N}\) is defined by a combination of the estimating equations g(z,θ) and the weight matrix W. Conceptually, both are at the researcher's discretion. In practice, however, the estimating equations are largely determined by the model of interest (including our case of the model-implied instrumental variables), and some choices of the weight matrix W are clearly better than others, as explained below.
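When the moment conditions are linear in the coefficients, the minimization of the quadratic form has a closed-form solution. The sketch below is our own illustration, not the authors' code. It assumes a single equation y = Za + u with an N×k matrix V of MIIVs (plus a constant) and moments V′(y − Za)/N, and uses a hypothetical model in which ξ has indicators x1 (scaling), x2, x3 and η has scaling indicator y1. With the weight W = (V′V/N)^{-1}, the GMM estimate should coincide with two-stage least squares.

```python
import numpy as np

def linear_gmm(y, Z, V, W):
    """Minimize gbar(a)' W gbar(a) where gbar(a) = V'(y - Z a) / N.

    The moments are linear in a, so the first-order condition gives
    a_hat = (Z'V W V'Z)^{-1} Z'V W V'y (the 1/N factors cancel).
    """
    n = len(y)
    ZV = Z.T @ V / n          # p x k
    Vy = V.T @ y / n          # length k
    return np.linalg.solve(ZV @ W @ ZV.T, ZV @ W @ Vy)

# Hypothetical data: eta = 0.5 + 0.8*xi + zeta; x1..x3 indicate xi, y1 indicates eta
rng = np.random.default_rng(1)
n = 2000
xi = rng.normal(size=n)
eta = 0.5 + 0.8 * xi + rng.normal(scale=0.6, size=n)
x1, x2, x3 = (lam * xi + rng.normal(scale=0.5, size=n) for lam in (1.0, 0.9, 1.1))
y1 = eta + rng.normal(scale=0.5, size=n)

Z = np.column_stack([np.ones(n), x1])        # intercept and scaling indicator x1
V = np.column_stack([np.ones(n), x2, x3])    # constant plus MIIVs x2, x3
W = np.linalg.inv(V.T @ V / n)               # 2SLS-type weight
a_hat = linear_gmm(y1, Z, V, W)

# Cross-check against the textbook 2SLS formula with P = V (V'V)^{-1} V'
PZ = V @ np.linalg.solve(V.T @ V, V.T @ Z)
a_2sls = np.linalg.solve(PZ.T @ Z, PZ.T @ y1)
print(a_hat, np.allclose(a_hat, a_2sls))     # estimates near [0.5, 0.8], True
```

Other choices of W give different, still consistent, estimates; the weight only matters for efficiency.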
In the MIIV-GMM methodology discussed in this paper, the estimating equations are those given by Equation (16), which pair each equation's MIIVs with that equation's composite disturbance.
After algebraic simplification, these estimating equations reduce to linear combinations of the cross-products \(\xi_i\zeta_i\), \(\xi_i\epsilon_i\), \(\xi_i\delta_i\), \(\zeta_i\epsilon_i\), \(\zeta_i\delta_i\), and \(\epsilon_i\delta_i\). Since these pairs of variables are assumed to be uncorrelated, the estimating equations indeed have zero expectation, as required by (C.1).
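The zero-expectation argument can be illustrated by simulation. The sketch assumes a hypothetical model in which ξ drives the indicators x1 (scaling), x2, x3 and the latent η = γξ + ζ, and η has indicators y1 (scaling) and y2. For the equation y1 = γx1 + u, the composite disturbance is u = ε1 − γδ1 + ζ. Sample averages of products of u with the MIIVs x2 and x3 should be near zero, whereas x1 (which shares δ1 with u) and y2 (which shares ζ) should not be.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
gamma = 0.8

# Hypothetical latent structure; all error terms are mutually independent
xi = rng.normal(size=n)
zeta = rng.normal(scale=0.6, size=n)
eta = gamma * xi + zeta
d1, d2, d3, e1, e2 = (rng.normal(scale=0.5, size=n) for _ in range(5))

# Indicators; x1 and y1 are the scaling indicators (loadings fixed at 1)
x1, x2, x3 = xi + d1, 0.9 * xi + d2, 1.1 * xi + d3
y1, y2 = eta + e1, 0.7 * eta + e2

# Composite disturbance of y1 = gamma * x1 + u
u = e1 - gamma * d1 + zeta
assert np.allclose(y1 - gamma * x1, u)

moments = {name: float(np.mean(v * u)) for name, v in
           [("x2", x2), ("x3", x3), ("x1", x1), ("y2", y2)]}
print(moments)
# Population values: 0 for x2 and x3 (valid MIIVs),
# -gamma * Var(d1) = -0.2 for x1, and 0.7 * Var(zeta) = 0.252 for y2.
```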
The desirable properties of the GMM estimates include consistency, asymptotic normality and, with an optimal choice of the weight matrix W _{ N }, asymptotic efficiency. Also, specification tests of whether the assumptions (C.1) are supported by the data are available. All of these results are asymptotic, and their justification requires certain regularity conditions.
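One such specification test is Hansen's overidentification J statistic, N times the minimized quadratic form evaluated with an efficient weight matrix, which is asymptotically chi-square with degrees of freedom equal to the number of moments minus the number of parameters. The sketch below is an illustrative implementation under our own assumptions: a single linear equation, a two-step estimator whose second-step weight is the inverse of a heteroscedasticity-robust moment covariance computed from first-step residuals, and hypothetical simulated data. With one overidentifying restriction, the chi-square(1) tail probability is computed with `math.erfc`, so no statistics library is needed. Replacing a valid MIIV by y2, which shares ζ with the composite disturbance, should yield a large J.

```python
import math
import numpy as np

def two_step_gmm_j(y, Z, V):
    """Two-step GMM for E[v (y - z'a)] = 0; returns (a_hat, J, degrees of freedom)."""
    n, k = V.shape
    ZV, Vy = Z.T @ V / n, V.T @ y / n

    def solve(W):
        return np.linalg.solve(ZV @ W @ ZV.T, ZV @ W @ Vy)

    a1 = solve(np.linalg.inv(V.T @ V / n))           # step 1: 2SLS-type weight
    g_i = V * (y - Z @ a1)[:, None]                  # N x k per-case moments
    W2 = np.linalg.inv(g_i.T @ g_i / n)              # step 2: inverse moment covariance
    a2 = solve(W2)
    gbar = V.T @ (y - Z @ a2) / n
    J = float(n * gbar @ W2 @ gbar)
    return a2, J, k - Z.shape[1]

def chi2_sf_df1(x):
    """Upper-tail chi-square(1) probability, P(X > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0))

# Hypothetical data with a heteroscedastic latent disturbance (mean zero given xi)
rng = np.random.default_rng(3)
n = 5000
xi = rng.normal(size=n)
zeta = rng.normal(size=n) * 0.6 * np.sqrt((1.0 + xi**2) / 2.0)
eta = 0.8 * xi + zeta
x1, x2, x3 = (lam * xi + rng.normal(scale=0.5, size=n) for lam in (1.0, 0.9, 1.1))
y1 = eta + rng.normal(scale=0.5, size=n)
y2 = 0.7 * eta + rng.normal(scale=0.5, size=n)

Z = np.column_stack([np.ones(n), x1])
for label, V in [("valid MIIVs x2, x3", np.column_stack([np.ones(n), x2, x3])),
                 ("invalid instrument y2", np.column_stack([np.ones(n), x2, y2]))]:
    a_hat, J, df = two_step_gmm_j(y1, Z, V)
    print(label, np.round(a_hat, 3), round(J, 2), df, chi2_sf_df1(J))
```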
Consistency of the GMM estimates (Newey & McFadden, 1986, Theorem 2.6, p. 2132; Hall 2005, Theorem 3.1, p. 68) is obtained under the following conditions:

1. \(\mathbf{z}_i\) are i.i.d.;
2. \({\mathbf{W}}_{N} \stackrel{p}{\rightarrow} \mathbf{W}\) (Hall 2005, Assumption 3.7);
3. W is positive semidefinite (Hall 2005, Assumption 3.7);
4. \(\mathbf{W}\,\mathrm{E}[g(\mathbf{z},\boldsymbol{\theta})]=\mathbf{0}\) if and only if \(\boldsymbol{\theta}=\boldsymbol{\theta}_{0}\) (Hall 2005, Assumptions 3.3 and 3.4);
5. \(\boldsymbol{\theta}_{0} \in \operatorname{int}\boldsymbol{\Theta}\), where \(\boldsymbol{\Theta} \subset \mathbb{R}^{p}\) (Hall 2005, Assumption 3.5);
6. Θ is compact (Hall 2005, Assumption 3.8);
7. g(z,θ) is continuous at each θ with probability 1 (Hall 2005, Assumption 3.2);
8. \(\mathrm{E}[\sup_{\boldsymbol{\theta}}\|g(\mathbf{z},\boldsymbol{\theta})\|]<\infty\) (Hall 2005, Assumptions 3.2 and 3.10).
Instead of Condition 1, Hall (2005) uses the weaker conditions of strict stationarity and ergodicity of the data (Assumption 3.1), in which case i is a time index. Hall's (2005) Assumption 3.1 also allows for heteroscedasticity of the measurement errors and unique variances.
Let us apply these conditions to the MIIV-GMM framework. The conditions on the weight matrix are satisfied for all the matrices we consider in this paper. The fourth condition is satisfied when the model is identified. Continuity of the estimating equations is trivial, as they are linear in the parameters. Finally, the last condition on \(\mathrm{E}[\sup_{\boldsymbol{\theta}}\|g(\mathbf{z},\boldsymbol{\theta})\|]\) is satisfied under the fourth-order cross-moments condition given in the section on the model and assumptions.
Asymptotic normality additionally requires the following conditions (Newey & McFadden, 1986, Theorem 3.4, p. 2148; Hall 2005, Theorem 3.2, p. 71):

9. \(g(\mathbf{z},\boldsymbol{\theta})\) is continuously differentiable in a neighborhood of \(\boldsymbol{\theta}_{0}\) with probability approaching 1 (Hall 2005, Assumptions 3.5 and 3.12);
10. \(\mathrm{E}[g(\mathbf{z},\boldsymbol{\theta}_{0})]=\mathbf{0}\) and \(\mathrm{E}[\|g(\mathbf{z},\boldsymbol{\theta}_{0})\|^{2}]<\infty\) (Hall 2005, Assumption 3.11);
11. \(\mathrm{E}[\sup_{\boldsymbol{\theta}}\|\nabla_{\boldsymbol{\theta}} g(\mathbf{z},\boldsymbol{\theta})\|]<\infty\) (Hall 2005, Assumption 3.2);
12. \(\operatorname{rank}\mathrm{E}[ \nabla_{\boldsymbol{\theta}} g(\mathbf{z},\boldsymbol{\theta}_{0}) ] = p = \dim\boldsymbol{\Theta}\) (Hall 2005, Assumption 3.6);
13. \(\mathbf{G}'\mathbf{W}\mathbf{G}\) is nonsingular for \(\mathbf{G}=\mathrm{E}[\nabla_{\boldsymbol{\theta}} g(\mathbf{z},\boldsymbol{\theta})]\);
14. \(\sup_{\boldsymbol{\theta}} \bigl\| \frac{1}{n} \sum_{i} \nabla_{\boldsymbol{\theta}} g(\mathbf{z}_{i},\boldsymbol{\theta}) - \mathrm{E}[\nabla_{\boldsymbol{\theta}} g(\mathbf{z},\boldsymbol{\theta})] \bigr\| \stackrel{p}{\rightarrow} 0\) (Hall 2005, Assumption 2.13).
Let us apply these conditions to the MIIV-GMM framework. Smoothness of g(z,θ) is trivial since g(z,θ) is linear in θ. A finite second moment of \(g(\mathbf{z},\boldsymbol{\theta}_{0})\) is ensured by the fourth-order cross-moments condition given in the section on the model and assumptions. For the estimating equations given by (16), the gradients with respect to the parameters A (Equations C.5) are, up to sign, cross-products of the MIIVs with the explanatory variables in Z, so they do not depend on A.
Finiteness of their expected norm follows from the finiteness of the second moments of the data. The condition on the matrix G′WG is one of the conditions for the estimator (18) to be properly defined. The condition on the rank of the moment derivative matrix is similar to the condition of a nondegenerate Jacobian in the likelihood context, and is satisfied whenever there are no perfectly collinear dependent variables in the model. The last condition is satisfied for i.i.d. data: because the gradients do not depend on θ, the supremum reduces to the deviation of a sample mean of cross-products from its expectation, which is of order \(O_p(n^{-1/2})\) by the central limit theorem (CLT).
Under the same set of conditions, the asymptotic variance estimator with GMM estimates plugged in for the population parameters is consistent for the target variance (Newey & McFadden, 1986, Theorem 4.5, p. 2160; Hall 2005, Section 3.5.1) when the data are i.i.d. For dependent or heteroscedastic data, one additionally needs (Hall 2005, Section 3.5.3):

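15. \(\sup_{\boldsymbol{\theta}}\mathrm{E}\|\partial^{2} g(\mathbf{z}_{i},\boldsymbol{\theta})/\partial\theta_{j}\,\partial\theta_{k}\|<\infty\) for all j,k in a neighborhood of \(\boldsymbol{\theta}_{0}\).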
Since the gradients (C.5) do not involve the parameters explicitly or implicitly, the second derivatives are identically zero, and this condition is trivially satisfied for MIIV-GMM. Alternatively, if heteroscedasticity is a function of observed or unobserved variables present in the model, as is the case in the simulations, the expectation operators in the population conditions (C.1) and (15) will include integration over the variables that cause heteroscedasticity. After this integration is performed, the data follow a skewed and kurtotic distribution. In our first simulation, this is demonstrated by the highly significant results of Mardia's test.
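The plug-in covariance estimator discussed above can be sketched for linear moment conditions E[v(y − z′a)] = 0, whose gradient −E[vz′] (estimated by −V′Z/N) does not depend on the parameters. The function names, the heteroscedastic data-generating process, and the 2SLS-type weight are our own illustrative assumptions. The moment covariance S is estimated from per-case outer products, which keeps the resulting standard errors robust to heteroscedasticity.

```python
import numpy as np

def linear_gmm(y, Z, V, W):
    """Closed-form minimizer of gbar(a)' W gbar(a), gbar(a) = V'(y - Z a) / N."""
    n = len(y)
    ZV, Vy = Z.T @ V / n, V.T @ y / n
    return np.linalg.solve(ZV @ W @ ZV.T, ZV @ W @ Vy)

def gmm_sandwich_cov(y, Z, V, W, a_hat):
    """Plug-in estimate of Var(a_hat): (G'WG)^{-1} G'W S W G (G'WG)^{-1} / N."""
    n = len(y)
    G = -(V.T @ Z) / n                          # k x p gradient; does not involve a
    g_i = V * (y - Z @ a_hat)[:, None]          # per-case moments at the estimate
    S = g_i.T @ g_i / n                         # heteroscedasticity-robust
    bread = np.linalg.inv(G.T @ W @ G)
    return bread @ G.T @ W @ S @ W @ G @ bread / n

# Hypothetical data: the disturbance variance grows with |xi|
rng = np.random.default_rng(4)
n = 3000
xi = rng.normal(size=n)
zeta = rng.normal(size=n) * 0.6 * np.abs(xi)
y1 = 0.5 + 0.8 * xi + zeta + rng.normal(scale=0.5, size=n)
x1, x2, x3 = (lam * xi + rng.normal(scale=0.5, size=n) for lam in (1.0, 0.9, 1.1))

Z = np.column_stack([np.ones(n), x1])
V = np.column_stack([np.ones(n), x2, x3])
W = np.linalg.inv(V.T @ V / n)
a_hat = linear_gmm(y1, Z, V, W)
cov = gmm_sandwich_cov(y1, Z, V, W, a_hat)
print(a_hat, np.sqrt(np.diag(cov)))              # estimates and robust standard errors
```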
Asymptotic efficiency is achieved with an optimal choice of the weight matrix \(\mathbf{W}_N\): \(\mathbf{W}_N\) needs to converge to the inverse of the asymptotic variance of the estimating equations \(g(\mathbf{z},\boldsymbol{\theta}_{0})\). The result is given in Theorem 5.2 of Newey and McFadden (1986, p. 2165) and Theorem 3.4 of Hall (2005, p. 88), and does not require any additional assumptions beyond those necessary for asymptotic normality of the estimates.
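The efficiency result can be checked numerically for fixed matrices. For a gradient G and a positive definite moment covariance S, the sandwich covariance (G′WG)^{-1}G′WSWG(G′WG)^{-1} under any positive definite W should exceed the efficient covariance (G′S^{-1}G)^{-1}, attained at W = S^{-1}, by a positive semidefinite matrix. The G and S below are randomly generated placeholders, not quantities from the paper.

```python
import numpy as np

def sandwich(G, S, W):
    """Asymptotic covariance (G'WG)^{-1} G'WSWG (G'WG)^{-1} for weight W."""
    bread = np.linalg.inv(G.T @ W @ G)
    return bread @ G.T @ W @ S @ W @ G @ bread

rng = np.random.default_rng(5)
k, p = 5, 2                                   # 5 moment conditions, 2 parameters
G = rng.normal(size=(k, p))                   # hypothetical gradient matrix
L = rng.normal(size=(k, k))
S = L @ L.T + k * np.eye(k)                   # hypothetical positive definite moment covariance

efficient = np.linalg.inv(G.T @ np.linalg.inv(S) @ G)   # covariance under W = S^{-1}
for W in (np.eye(k), np.diag(rng.uniform(0.5, 2.0, size=k))):
    gap = sandwich(G, S, W) - efficient
    # The smallest eigenvalue of the gap is >= 0 up to rounding: no W beats S^{-1}
    print(np.linalg.eigvalsh((gap + gap.T) / 2).min())

# Plugging in W = S^{-1} reproduces the efficient covariance
print(np.allclose(sandwich(G, S, np.linalg.inv(S)), efficient))  # True
```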
A distinction should be drawn among the uses of the term “moments” in the three literatures related to the current paper. In the statistics literature, a “moment” is universally understood as the expected value of a power (most typically, a positive integer power) of a random variable X, possibly centered, i.e., \(\mathrm{E}[X^{k}]\) or \(\mathrm{E}[(X-\mu)^{k}]\) where \(\mu=\mathrm{E}[X]\), or their sample analogues. In the covariance structure approach to structural equation modeling, “moments” refer to covariances \(\sigma_{jk}=\mathrm{E}[(X_{j}-\mu_{j})(X_{k}-\mu_{k})]\); a further distinction is made among sample, population, and implied moments. In the econometrics literature, the term “moment” is used more loosely to denote any relation between (vector-valued) data X and a (vector-valued) parameter θ such that \(\mathrm{E}[g(X,\boldsymbol{\theta})]=0\). This generalization covers the standard uses: (i) \(\mathrm{E}[X-\mu]=0\) for the population mean; (ii) \(\mathrm{E}[X^{2}-\mu^{2}-\sigma^{2}]=0\) for the population variance; (iii) \(\mathrm{E}[(X_{j}-\mu_{j})(X_{k}-\mu_{k})-\sigma_{jk}(\boldsymbol{\theta})]=0\) for covariance structure models. It also allows for other uses, such as the normal equations in regression, \(\mathrm{E}[x_{j}(y-\mathbf{x}'\boldsymbol{\beta})]=0\), or the instrumental variable orthogonality conditions, \(\mathrm{E}[z_{k}(y-\mathbf{x}'\boldsymbol{\beta})]=0\). At this level of generality, the econometric “moments” have the same meaning as “estimating equations” in statistics, the point we made in Sections 2 and 5.1. As the impetus for this paper comes from bringing econometric ideas into latent variable modeling, we use the term “moment” in the latter, econometric, sense to denote functions of data and parameters. For the MIIV-GMM application, the moments we use in estimation are given by Equation (15), in which the parameters are implicitly present in the composite error u.
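The econometric usage can be made concrete: each estimator below is written as the solution of sample moment conditions, where the average of g(x_i, θ) over the cases is set to zero, covering the mean and variance, the regression normal equations, and the just-identified instrumental variable conditions listed in this paragraph. The data and the instrument are simulated and hypothetical.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 1000
x = rng.normal(loc=2.0, scale=1.5, size=n)

# (i)-(ii) Mean and variance: g(x; mu, s2) = [x - mu, x^2 - mu^2 - s2]
mu_hat = x.mean()
s2_hat = np.mean(x**2) - mu_hat**2
g_mean_var = np.array([np.mean(x - mu_hat), np.mean(x**2 - mu_hat**2 - s2_hat)])

# Regression normal equations: g = x_j (y - x'beta)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 0.5 * x + rng.normal(size=n)
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
g_ols = X.T @ (y - X @ beta_ols) / n

# IV orthogonality conditions (exactly identified): g = z_k (y - x'beta)
z = x + rng.normal(size=n)                    # hypothetical instrument
Zi = np.column_stack([np.ones(n), z])
beta_iv = np.linalg.solve(Zi.T @ X, Zi.T @ y)
g_iv = Zi.T @ (y - X @ beta_iv) / n

for name, g in [("mean/variance", g_mean_var), ("OLS", g_ols), ("IV", g_iv)]:
    print(name, np.allclose(g, 0.0))          # True: sample moments vanish at the estimates
```

In each exactly identified case the number of moment conditions equals the number of parameters, so the sample moments can be set exactly to zero; overidentified MIIV equations instead require the weighted quadratic form of GMM.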
Another way to look at these terminological distinctions is to observe that covariance structure methods, such as ML, ADF, and other least squares methods, proceed through multiple steps: (i) setting up the model, as is done in Section 2; (ii) deriving the implied second moments; (iii) minimizing the discrepancy between the sample and the implied moments; (iv) forming the variances of the sample moments; and (v) using the delta method to derive the standard errors of the parameter estimates. MIIV-GMM is more direct: it uses the equations from the latent variable and measurement models with some minimal transformations, and it obtains the standard errors explicitly from analytically available formulae that do not involve any derivatives. Given this greater simplicity, it is not surprising that the method works quite well in small samples, even with severe nonnormality of the data.
Cite this article
Bollen, K.A., Kolenikov, S. & Bauldry, S. Model-Implied Instrumental Variable—Generalized Method of Moments (MIIV-GMM) Estimators for Latent Variable Models. Psychometrika 79, 20–50 (2014). https://doi.org/10.1007/s11336-013-9335-3