A Model Implied Instrumental Variable Approach to Exploratory Factor Analysis (MIIV-EFA)

  • Theory & Methods
Psychometrika

Abstract

Spearman (Am J Psychol 15(1):201–293, 1904. https://doi.org/10.2307/1412107) marks the birth of factor analysis. Many articles and books have extended his landmark paper by permitting multiple factors and determining the number of factors, developing ideas about simple structure and factor rotation, and distinguishing between confirmatory and exploratory factor analysis (CFA and EFA). We propose a new model implied instrumental variable (MIIV) approach to EFA that allows intercepts for the measurement equations, correlated common factors, correlated errors, standard errors of factor loadings and measurement intercepts, overidentification tests of equations, and a procedure for determining the number of factors. We also permit simpler structures by removing nonsignificant loadings. Simulations of factor analysis models with and without cross-loadings demonstrate the impressive performance of the MIIV-EFA procedure in recovering the correct number of factors and in recovering the primary and secondary loadings. For example, in nearly all replications MIIV-EFA finds the correct number of factors when N is 100 or more. Even the primary and secondary loadings of the most complex models were recovered when the sample sizes were at least 500. We discuss limitations and future research areas. Two appendices describe alternative MIIV-EFA algorithms and the sensitivity of the algorithm to cross-loadings.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7

Notes

  1. We use the terms “measure,” “indicator,” and “observed variable” interchangeably to refer to the variables in the data set that are factor analyzed and that “load” on the resulting factors.

  2. Asparouhov and Muthén (2009) proposed an exploratory structural equation modeling (ESEM) approach that researchers can apply to factor analysis and that permits correlated errors and cross-loadings. We contrast their ESEM with MIIV-EFA in the conclusion.

  3. There are other noniterative estimators for EFA that do not use instrumental variables. See Albert (1944a, 1944b), Bentler (1982), Ihara and Kano (1986), and Kano (1990) for details.

  4. The factor analysis literature sometimes refers to the error or disturbance as the uniqueness term that is composed of measurement error and specific variance. Other literature simply uses error term. We use the term error even though uniqueness term is more in keeping with traditional factor analysis terminology.

  5. Because our interest lies in factor analysis, we consider only the measurement model (i.e., \(Z = \boldsymbol{\alpha} + \Lambda L + \boldsymbol{\varepsilon}\)), whereas Bollen (1996) also includes the latent variable (“structural”) model.

  6. Another way of stating this is that we have more MIIVs than the number of scaling indicators on the right-hand side of the equation.

  7. Strictly speaking, the degrees of freedom match when both factor loadings to the indicators with correlated errors are set to one. In an EFA procedure like ours, we do not introduce such constraints as part of the search procedure.

  8. The full data set available in lavaan also includes demographic data. We only use the nine measures of test scores.

  9. The factor loadings for the new factor should be set to 1.

References

  • Albert, A. A. (1944a). The matrices of factor analysis. Proceedings of the National Academy of Sciences of the United States of America, 30(4), 90–95. http://www.jstor.org/stable/87882

  • Albert, A. A. (1944b). The minimum rank of a correlation matrix. Proceedings of the National Academy of Sciences of the United States of America, 30, 144–146.

  • Angrist, J. D., & Krueger, A. B. (2001). Instrumental variables and the search for identification: From supply and demand to natural experiments. Journal of Economic Perspectives, 15, 69–85.

  • Asparouhov, T., & Muthén, B. (2009). Exploratory structural equation modeling. Structural Equation Modeling, 16, 397–438. https://doi.org/10.1080/10705510903008204

  • Bentler, P. M. (1982). Confirmatory factor analysis via noniterative estimation: A fast, inexpensive method. Journal of Marketing Research, 19(4), 417–424. https://doi.org/10.2307/3151715

  • Bollen, K. A. (1987). Structural equation approaches to subjective air quality. In H. S. Koelega (Ed.), Environmental annoyance: Characterization, measurement, and control (pp. 57–72). Elsevier Science Publishers.

  • Bollen, K. A. (1989). Structural equations with latent variables (Vol. 210). Wiley.

  • Bollen, K. A. (1996). An alternative two stage least squares (2SLS) estimator for latent variable equations. Psychometrika, 61(1), 109–121.

  • Bollen, K. A. (2012). Instrumental variables in sociology and the social sciences. Annual Review of Sociology, 38, 37–72.

  • Bollen, K. A. (2019). Model implied instrumental variables (MIIVs): An alternative orientation to structural equation modeling. Multivariate Behavioral Research, 54(1), 31–46.

  • Bollen, K. A. (2020). When good loadings go bad: Robustness in factor analysis. Structural Equation Modeling: A Multidisciplinary Journal, 27(4), 515–524.

  • Bollen, K. A., & Arminger, G. (1991). Observational residuals in factor analysis and structural equation models. Sociological Methodology, 21, 235–262.

  • Bollen, K. A., & Bauer, D. J. (2004). Automating the selection of model-implied instrumental variables. Sociological Methods & Research, 32(4), 425–452.

  • Bollen, K. A., Fisher, Z. F., Giordano, M. L., Lilly, A. G., Luo, L., & Ye, A. (2021). An introduction to model implied instrumental variables using two stage least squares (MIIV-2SLS) in structural equation models (SEMs). Psychological Methods, 27(5), 752–772.

  • Bollen, K. A., Gates, K. M., & Fisher, Z. (2018). Robustness conditions for MIIV-2SLS when the latent variable or measurement model is structurally misspecified. Structural Equation Modeling: A Multidisciplinary Journal, 25(2), 848–859.

  • Bollen, K. A., Kirby, J. B., Curran, P. J., Paxton, P. M., & Chen, F. (2007). Latent variable models under misspecification: Two-stage least squares (2SLS) and maximum likelihood (ML) estimators. Sociological Methods & Research, 36(1), 48–86.

  • Bollen, K. A., Kolenikov, S., & Bauldry, S. (2014). Model-implied instrumental variable–generalized method of moments (MIIV-GMM) estimators for latent variable models. Psychometrika, 79, 20–50.

  • Bollen, K. A., & Maydeu-Olivares, A. (2007). A polychoric instrumental variable (PIV) estimator for structural equation models with categorical variables. Psychometrika, 72, 309–326.

  • Bowden, R. J., & Turkington, D. A. (1990). Instrumental variables (Vol. 8). Cambridge University Press.

  • Burt, C. (1917). The distribution and relations of educational abilities.

  • Carey, N. (1916). Factors in the mental processes of school children. III. Factors concerned in the school subjects. British Journal of Psychology, 8, 170.

  • Cattell, R. (Ed.). (2012). The scientific use of factor analysis in behavioral and life sciences. Springer.

  • Cudeck, R. (1991). Noniterative factor analysis estimators, with algorithms for subset and instrumental variable selection. Journal of Educational Statistics, 16(1), 35–52.

  • Darton, R. A. (1980). Rotation in factor analysis. Journal of the Royal Statistical Society. Series D (The Statistician), 29(3), 167–194.

  • Didelez, V., Meng, S., & Sheehan, N. A. (2010). Assumptions of IV methods for observational epidemiology. Statistical Science, 25, 22–40.

  • Du Toit, A. (1986). The development of a non-iterative method of exploratory factor analysis. Unpublished B.Sc. honors thesis, University of South Africa, Pretoria, South Africa.

  • Fisher, A. J., & Boswell, J. F. (2016). Enhancing the personalization of psychotherapy with dynamic assessment and modeling. Assessment, 23(4), 496–506.

  • Fisher, Z., & Bollen, K. A. (2020). An instrumental variable estimator for mixed indicators: Analytic derivatives and alternative parameterizations. Psychometrika, 85(3), 660–683. https://doi.org/10.1007/s11336-020-09721-6

  • Fisher, Z., Bollen, K., Gates, K., & Rönkkö, M. (2021). MIIVsem: Model implied instrumental variable (MIIV) estimation of structural equation models. R package version 0.5.8. https://CRAN.R-project.org/package=MIIVsem

  • Gorsuch, R. L. (1983). Factor analysis (2nd ed.). Erlbaum.

  • Hägglund, G. (1982). Factor analysis by instrumental variables methods. Psychometrika, 47(1), 209–222.

  • Harman, H. H. (1976). Modern factor analysis (3rd ed.). University of Chicago Press.

  • Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6, 65–70.

  • Holzinger, K. J., & Swineford, F. (1939). A study in factor analysis: The stability of a bi-factor solution. Supplementary Educational Monographs, 48, xi + 91.

  • Ihara, M., & Kano, Y. (1986). A new estimator of the uniqueness in factor analysis. Psychometrika, 51(4), 563–566. https://doi.org/10.1007/BF02295595

  • Jennrich, R. I. (1987). Tableau algorithms for factor analysis by instrumental variable methods. Psychometrika, 52(3), 469–476.

  • Jin, S., & Cao, C. (2018). Selecting polychoric instrumental variables in confirmatory factor analysis: An alternative specification test and effects of instrumental variables. British Journal of Mathematical and Statistical Psychology, 71, 387–413. https://doi.org/10.1111/bmsp.12128

  • Jin, S., Yang-Wallentin, F., & Bollen, K. A. (2021). A unified model-implied instrumental variable approach for structural equation modeling with mixed variables. Psychometrika, 86(2), 564–595. https://doi.org/10.1007/s11336-021-09771-4

  • Jöreskog, K. G. (1969). A general approach to confirmatory maximum likelihood factor analysis. Psychometrika, 34(1), 183–202.

  • Kano, Y. (1990). Noniterative estimation and the choice of the number of factors in exploratory factor analysis. Psychometrika, 55(2), 277–291. https://doi.org/10.1007/BF02295288

  • Kirby, J. B., & Bollen, K. A. (2009). Using instrumental variable tests to evaluate model specification in latent variable structural equation models. Sociological Methodology, 39(1), 327–355.

  • Kyriazos, T. A. (2018). Applied psychometrics: Sample size and sample power considerations in factor analysis (EFA, CFA) and SEM in general. Psychology, 9(08), 2207.

  • Lawley, D. N., & Maxwell, A. E. (1963). Factor analysis as a statistical method. Butterworths.

  • Madansky, A. (1964). Instrumental variables in factor analysis. Psychometrika, 29(1), 105–113.

  • Molenaar, P. C., & Campbell, C. G. (2009). The new person-specific paradigm in psychology. Current Directions in Psychological Science, 18(1), 112–117.

  • Mroz, T. A., Bollen, K. A., Speizer, I. S., & Mancini, D. J. (1999). Quality, accessibility, and contraceptive use in rural Tanzania. Demography, 36(1), 23–40.

  • Mulaik, S. A. (2009). Foundations of factor analysis (2nd ed.). Chapman & Hall/CRC.

  • Nestler, S. (2014). How the 2SLS/IV estimator can handle equality constraints in structural equation models: A system-of-equations approach. British Journal of Mathematical and Statistical Psychology, 67(1), 353–369.

  • Nunnally, J. C. (1978). Psychometric theory (2nd ed.). McGraw-Hill.

  • Preacher, K. J., Zhang, G., Kim, C., & Mels, G. (2013). Choosing the optimal number of factors in exploratory factor analysis: A model selection perspective. Multivariate Behavioral Research, 48(1), 28–56.

  • Rosseel, Y. (2012). lavaan: An R package for structural equation modeling. Journal of Statistical Software, 48(1), 1–36. https://doi.org/10.18637/jss.v048.i02

  • Sargan, J. D. (1958). The estimation of economic relationships using instrumental variables. Econometrica: Journal of the Econometric Society, 26, 393–415.

  • Sovey, A. J., & Green, D. P. (2011). Instrumental variables estimation in political science: A reader’s guide. American Journal of Political Science, 55, 188–200.

  • Spearman, C. (1904). ‘General intelligence’, objectively determined and measured. The American Journal of Psychology, 15(1), 201–293. https://doi.org/10.2307/1412107

  • Thurstone, L. L. (1947). Multiple factor analysis. University of Chicago Press.

  • Toyoda, H. (1997). A noniterative estimation in confirmatory factor analysis by an instrumental variable method. Behaviormetrika, 24, 147–158.

  • Urban, C., & Bauer, D. (2021). A deep learning algorithm for high-dimensional exploratory item factor analysis. Psychometrika, 86(1), 1–29. https://doi.org/10.1007/s11336-021-09748-3

  • van Zyl, L. E., & ten Klooster, P. M. (2022). Exploratory structural equation modeling: Practical guidelines and tutorial with a convenient online tool for Mplus. Frontiers in Psychiatry, 12, 795672. https://doi.org/10.3389/fpsyt.2021.795672

Acknowledgements

We gratefully acknowledge grant support for this research from NIH [1R21MH119572-01]. Data for this research are available from the cited sources and from the authors.

Author information

Correspondence to Kenneth A. Bollen.

Appendices

Appendix 1: Alternative Algorithms

The algorithm we report in the text was not the first one we developed. Because it may be instructive to others who seek to develop instrumental variable EFA algorithms, we give a brief description of two other algorithms that we tried before settling on the one in the main text. We also discuss their shortcomings.

Version 1

Our earliest version of the algorithm has several steps that we summarize below:

  1. Regress each variable on all remaining variables and calculate the R² values.

  2. Start with a one-factor model, using the variable with the highest R² value as the scaling indicator.

  3. Estimate the model with the MIIV-2SLS estimator (MIIVsem package).

  4. Find the variables with significant Sargan test statistics. If more than one variable has a significant test statistic, create a new factor. If no variable or only one variable is significant, the algorithm stops and prints the current model as the final model.

  5. Load the variables with significant Sargan test statistics exclusively on the new factor, again using the variable with the highest R² as the scaling indicator. Repeat steps 3–5 until no more than one variable has a significant Sargan test statistic.
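A minimal Python sketch of steps 1–4 may make the mechanics concrete. It assumes a one-factor model with uncorrelated errors, continuous indicators, at least four indicators, and a significance level of 0.05. It is not the MIIVsem implementation (an R package); the function names `squared_multiple_correlations`, `chi2_sf`, `miiv_2sls_sargan`, and `one_factor_step` and the simulated data are hypothetical. Step 1 uses the identity R²ⱼ = 1 − 1/(R⁻¹)ⱼⱼ, where R is the correlation matrix. The Sargan statistic is N times the R² from regressing the 2SLS residuals on the instruments, referred to a χ² distribution whose degrees of freedom equal the number of MIIVs minus the one scaling indicator on the right-hand side.

```python
import math

import numpy as np


def squared_multiple_correlations(data):
    """Step 1: R^2 of each column regressed on all others, via 1 - 1/diag(R^-1)."""
    corr = np.corrcoef(data, rowvar=False)
    return 1.0 - 1.0 / np.diag(np.linalg.inv(corr))


def chi2_sf(x, df):
    """Upper-tail chi-square probability for an integer df (closed-form series)."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:  # even df: exp(-y) * sum_{i < df/2} y^i / i!
        term = total = math.exp(-y)
        for i in range(1, df // 2):
            term *= y / i
            total += term
        return total
    # Odd df: erfc(sqrt(y)) + exp(-y) * sum_i y^(i + 1/2) / Gamma(i + 3/2)
    total = math.erfc(math.sqrt(y))
    term = math.sqrt(y) * math.exp(-y) / math.gamma(1.5)
    for i in range((df - 1) // 2):
        total += term
        term *= y / (i + 1.5)
    return total


def miiv_2sls_sargan(data, dep, scaling, instruments):
    """2SLS for y_dep = a + lambda * y_scaling + u, plus the Sargan test."""
    n = data.shape[0]
    ones = np.ones((n, 1))
    y = data[:, dep]
    x = np.column_stack([ones, data[:, scaling]])      # intercept + scaling indicator
    z = np.column_stack([ones, data[:, instruments]])  # intercept + MIIVs
    # Stage 1: project the regressors onto the instruments.
    x_hat = z @ np.linalg.lstsq(z, x, rcond=None)[0]
    # Stage 2: regress the dependent indicator on the projected regressors.
    intercept, loading = np.linalg.lstsq(x_hat, y, rcond=None)[0]
    resid = y - intercept - loading * data[:, scaling]
    # Sargan statistic: N * R^2 from regressing the 2SLS residuals on the instruments.
    aux = resid - z @ np.linalg.lstsq(z, resid, rcond=None)[0]
    stat = n * (1.0 - (aux @ aux) / (resid @ resid))
    df = len(instruments) - 1  # overidentifying restrictions: MIIVs minus RHS scaling indicator
    return loading, stat, df, chi2_sf(stat, df)


def one_factor_step(data, alpha=0.05):
    """Steps 1-4 of Version 1 for a one-factor model with uncorrelated errors."""
    p = data.shape[1]  # needs p >= 4 so that every equation is overidentified
    scaling = int(np.argmax(squared_multiple_correlations(data)))
    flagged = []
    for j in range(p):
        if j == scaling:
            continue
        # With one factor and uncorrelated errors, every indicator other than
        # y_j and the scaling indicator is a model-implied instrument for y_j.
        ivs = [k for k in range(p) if k not in (j, scaling)]
        _, _, _, p_value = miiv_2sls_sargan(data, j, scaling, ivs)
        if p_value < alpha:
            flagged.append(j)
    return scaling, flagged


# Hypothetical two-factor data (factor correlation 0.3) analyzed with one factor.
rng = np.random.default_rng(1)
n = 2000
factors = rng.multivariate_normal([0, 0], [[1, 0.3], [0.3, 1]], size=n)
loadings = np.array([[1, 0], [0.8, 0], [0.8, 0], [0, 1], [0, 0.8], [0, 0.8]])
data = factors @ loadings.T + rng.normal(scale=0.7, size=(n, 6))
scaling, flagged = one_factor_step(data)
print("scaling indicator:", scaling, "flagged:", flagged)
```

In this setup the equations for the three indicators of the factor that does not contain the scaling indicator violate their overidentifying restrictions, so they should typically be flagged; because more than one variable is flagged, Version 1 would then move to step 5 and place them on a new factor. Indicators that share the scaling indicator's factor remain valid equations and are flagged only by chance.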

Fig. 8

Sensitivity for cross-loading variables.

This version is clearly much simpler than the current algorithm, but its simplicity comes with limitations. First, the variable with the highest R² is not always the most appropriate scaling indicator. For example, if the data-generating model has multiple factors and one variable cross-loads on all of them, that variable is likely to have a very large R² value yet is a questionable scaling indicator. Second, the algorithm checks only Sargan test statistics, not factor loadings, when looking for problematic variables, so it never removes nonsignificant factor loadings. Finally, the algorithm assumes a factor complexity of one, so no cross-loadings are allowed, which is unrealistic for many empirical applications. For these reasons we did not retain this algorithm.
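The first limitation can be seen in a small hypothetical simulation. The setup below is an assumption chosen for illustration, not one of the paper's simulation designs: two uncorrelated factors, six indicators that each load 0.7 on a single factor, a seventh indicator that loads 0.7 on both, and error variance 0.51 throughout. Because the cross-loading indicator draws predictive information from both factors, its R² (population value about 0.49) exceeds that of every single-factor indicator (about 0.36), so step 2 of Version 1 would choose it as the scaling indicator.

```python
import numpy as np


def squared_multiple_correlations(data):
    """R^2 of each column regressed on all other columns: 1 - 1/diag(R^-1)."""
    corr = np.corrcoef(data, rowvar=False)
    return 1.0 - 1.0 / np.diag(np.linalg.inv(corr))


rng = np.random.default_rng(7)
n = 5000
f = rng.normal(size=(n, 2))   # two uncorrelated factors
lam = np.zeros((7, 2))
lam[0:3, 0] = 0.7             # y1-y3 load only on factor 1
lam[3:6, 1] = 0.7             # y4-y6 load only on factor 2
lam[6, :] = 0.7               # y7 cross-loads on both factors
data = f @ lam.T + rng.normal(scale=np.sqrt(0.51), size=(n, 7))

r2 = squared_multiple_correlations(data)
print(np.round(r2, 2))
print("highest R^2 at index:", int(np.argmax(r2)))  # expected: 6, the cross-loading y7
```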

Version 2

The second version of the algorithm improved on Version 1 and is very close to the algorithm reported in the main text. It shares almost all of its steps, except that it does not allow cross-loadings. As a result, this version usually creates extra factors to account for the omitted cross-loading relations.

Appendix 2: Sensitivity of Cross-Loading Variables

Figure A1 (Fig. 8) shows the sensitivity for cross-loading variables: the first row shows the sensitivity for loadings on the primary factor, the second row the sensitivity for secondary loadings on the cross-loading factor(s), and the third row the overall sensitivity.

As Fig. 5 shows, sensitivity for cross-loading variables on their primary factors remains high at all three significance levels at a sample size of 100. However, overall sensitivity drops at N = 100 in all four simulations because sensitivity for cross-loading variables on their secondary factor(s) is lower. In other words, MIIVefa can recover cross-loadings at small sample sizes, but it does not always recover all of them. With empirical data, substantive background knowledge, which simulations lack, can help identify cross-loadings. When the sample size is large, MIIVefa recovers cross-loadings as well as primary loadings very well.

The apparent decrease in sensitivity for secondary loadings at N = 100 reflects a spuriously high sensitivity at N = 60. As Figs. 2, 3, and 4 show, at N = 60 both the data structure recovery rates and the overall sensitivity are very low. Sensitivity for a specific path, however, does not account for how accurately the remaining paths were recovered. For example, at N = 60 in simulation 5, the recovered model could be very far from the true data-generating model (DGM) while loading the cross-loading variable on all factors, which would yield 100% sensitivity for that specific cross-loading variable. When the sample size increases to 100 (again see Figs. 2, 3, and 4), the data structure recovery rates and sensitivity increase, especially for primary loadings, but sensitivity for secondary loadings on cross-loaded factors increases only slightly. Thus at N = 100 the algorithm is more likely to recover a model close to the true DGM, although it is not uncommon for cross-loading variables to be recovered only on their primary factor.
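Why a path-specific sensitivity can be misleadingly high is easy to check with a hypothetical Python sketch. It assumes sensitivity is the share of true nonzero loadings that the recovered model also estimates as nonzero, computed over primary loadings, secondary loadings, or all loadings; the paper's exact scoring rules may differ, and the patterns below are invented, not taken from the simulations. A "far" model that loads every indicator on every factor scores 1.0 on all three sensitivities, while a "close" model that misses only the cross-loading scores 0.0 on the secondary loading; neither reproduces the true structure exactly.

```python
import numpy as np


def sensitivity(true_pattern, est_pattern, mask=None):
    """Share of true nonzero loadings (optionally within mask) estimated as nonzero."""
    targets = true_pattern != 0
    if mask is not None:
        targets = targets & mask
    hits = targets & (est_pattern != 0)
    return float(hits.sum() / targets.sum())


# Hypothetical 7-indicator, 2-factor pattern: y7 (row 6) cross-loads on F2.
true = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1], [1, 1]])
primary = np.zeros(true.shape, dtype=bool)
primary[[0, 1, 2, 6], 0] = True
primary[[3, 4, 5], 1] = True
secondary = (true != 0) & ~primary  # only y7's loading on F2

est_far = np.ones_like(true)        # every indicator loads on every factor
est_close = true.copy()
est_close[6, 1] = 0                 # correct except that the cross-loading is missed

for name, est in [("far", est_far), ("close", est_close)]:
    print(name,
          "primary:", sensitivity(true, est, primary),
          "secondary:", sensitivity(true, est, secondary),
          "overall:", sensitivity(true, est),
          "exact structure:", bool(np.array_equal(true != 0, est != 0)))
```

Pairing sensitivity with an exact-structure check (or a false-positive measure) exposes the far model, which sensitivity alone rewards.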

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Bollen, K.A., Gates, K.M. & Luo, L. A Model Implied Instrumental Variable Approach to Exploratory Factor Analysis (MIIV-EFA). Psychometrika (2024). https://doi.org/10.1007/s11336-024-09949-6

  • DOI: https://doi.org/10.1007/s11336-024-09949-6
