
A Deep Learning Algorithm for High-Dimensional Exploratory Item Factor Analysis


Marginal maximum likelihood (MML) estimation is the preferred approach to fitting item response theory models in psychometrics because the MML estimator is consistent, asymptotically normal, and efficient as the sample size tends to infinity. However, state-of-the-art MML estimation procedures such as the Metropolis–Hastings Robbins–Monro (MH-RM) algorithm, as well as approximate MML estimation procedures such as variational inference (VI), are computationally time-consuming when the sample size and the number of latent factors are very large. In this work, we investigate a deep learning-based VI algorithm for exploratory item factor analysis (IFA) that is computationally fast even in large data sets with many latent factors. The proposed approach applies a deep artificial neural network model called an importance-weighted autoencoder (IWAE) to exploratory IFA. The IWAE approximates the MML estimator using an importance sampling technique wherein increasing the number of importance-weighted (IW) samples drawn during fitting improves the approximation, typically at the cost of decreased computational efficiency. We provide a real data application that recovers results aligning with psychological theory across random starts. Via simulation studies, we show that the IWAE yields more accurate estimates as either the sample size or the number of IW samples increases (although factor correlation and intercept estimates exhibit some bias) and obtains similar results to MH-RM in less time. Our simulations also suggest that the proposed approach performs similarly to and is potentially faster than constrained joint maximum likelihood estimation, a fast procedure that is consistent when the sample size and the number of items simultaneously tend to infinity.
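
As a rough illustration of the importance-weighting idea described above, and not the authors' implementation, the sketch below estimates the log marginal likelihood of a single binary response under a hypothetical one-item, one-factor logistic model with a standard normal latent variable. The prior serves as the proposal distribution, and all function names and parameter values (slope 1.5, intercept -0.5) are assumptions chosen for illustration. Averaging the bound over many replications shows it moving toward the quadrature-based log marginal likelihood as the number of IW samples grows.

```python
import math
import random


def log_sigmoid(z):
    # Numerically stable log(1 / (1 + exp(-z))).
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def log_normal_pdf(x, mean=0.0, sd=1.0):
    return -0.5 * math.log(2 * math.pi) - math.log(sd) - 0.5 * ((x - mean) / sd) ** 2


def log_joint(y, x, slope, intercept):
    # log p(y | x) + log p(x) for a logistic item with a N(0, 1) latent factor.
    z = slope * x + intercept
    log_lik = log_sigmoid(z) if y == 1 else log_sigmoid(-z)
    return log_lik + log_normal_pdf(x)


def iw_bound(y, slope, intercept, n_iw, rng, q_mean=0.0, q_sd=1.0):
    # Single-draw estimate of the importance-weighted bound:
    # log of the average of n_iw importance weights p(x, y) / q(x).
    log_w = []
    for _ in range(n_iw):
        x = rng.gauss(q_mean, q_sd)
        log_w.append(log_joint(y, x, slope, intercept) - log_normal_pdf(x, q_mean, q_sd))
    m = max(log_w)  # log-sum-exp trick for stability
    return m + math.log(sum(math.exp(lw - m) for lw in log_w) / n_iw)


def exact_log_marginal(y, slope, intercept, lo=-8.0, hi=8.0, n_grid=4000):
    # Midpoint-rule quadrature over the latent variable (reference value).
    step = (hi - lo) / n_grid
    total = 0.0
    for i in range(n_grid):
        x = lo + (i + 0.5) * step
        total += math.exp(log_joint(y, x, slope, intercept)) * step
    return math.log(total)


def mean_iw_bound(n_iw, n_reps=2000, seed=0):
    # Average the bound over replications for y = 1 with illustrative parameters.
    rng = random.Random(seed)
    return sum(iw_bound(1, 1.5, -0.5, n_iw, rng) for _ in range(n_reps)) / n_reps


if __name__ == "__main__":
    exact = exact_log_marginal(1, 1.5, -0.5)
    for n_iw in (1, 5, 50):
        print(n_iw, mean_iw_bound(n_iw), exact)
```

With one IW sample the bound reduces to the usual evidence lower bound; larger values tighten it at the cost of more draws per observation, which mirrors the accuracy versus computation trade-off noted in the abstract.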




  1. For distributions q and p, the KL divergence is defined as \(D_{\mathrm{KL}}\big[q \,\Vert\, p\big] = \mathbb{E}_{q}\big[\log q\big] - \mathbb{E}_{q}\big[\log p\big]\). It can be shown that \(D_{\mathrm{KL}}\big[q \,\Vert\, p\big] \ge 0\) with equality if and only if \(p = q\) almost everywhere w.r.t. q.

  2. We move the gradient inside the expectation in line 25 using the fact that \(q_{\boldsymbol{\psi}}(\mathbf{x} \mid \mathbf{y})\), \(\log q_{\boldsymbol{\psi}}(\mathbf{x} \mid \mathbf{y})\), and \(\log p_{\boldsymbol{\theta}}(\mathbf{x}, \mathbf{y})\) satisfy certain regularity conditions. For details, see Lehmann and Casella (1998).

  3. M is typically set to a power of 2 to reduce fitting times by facilitating GPU (or CPU) memory allocation (Goodfellow et al., 2016).
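
The KL divergence definition in note 1 can be checked numerically. The following minimal sketch assumes two univariate normal distributions (an illustrative choice, not a setting taken from the article) and compares the well-known closed-form KL divergence between normals with a Monte Carlo estimate of \(\mathbb{E}_{q}[\log q] - \mathbb{E}_{q}[\log p]\). The function names are hypothetical.

```python
import math
import random


def log_normal_pdf(x, mean, sd):
    return -0.5 * math.log(2 * math.pi) - math.log(sd) - 0.5 * ((x - mean) / sd) ** 2


def kl_normal_closed_form(m_q, s_q, m_p, s_p):
    # KL[q || p] for q = N(m_q, s_q^2) and p = N(m_p, s_p^2).
    return math.log(s_p / s_q) + (s_q ** 2 + (m_q - m_p) ** 2) / (2 * s_p ** 2) - 0.5


def kl_normal_monte_carlo(m_q, s_q, m_p, s_p, n_draws=20000, seed=0):
    # Estimate E_q[log q(x) - log p(x)] by sampling x from q.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        x = rng.gauss(m_q, s_q)
        total += log_normal_pdf(x, m_q, s_q) - log_normal_pdf(x, m_p, s_p)
    return total / n_draws


if __name__ == "__main__":
    print(kl_normal_closed_form(0.0, 1.0, 0.0, 1.0))   # identical distributions
    print(kl_normal_closed_form(0.0, 1.0, 1.0, 2.0))   # distinct distributions
    print(kl_normal_monte_carlo(0.0, 1.0, 1.0, 2.0))   # sampling-based estimate
```

The closed form is zero when the two distributions coincide and positive otherwise, in line with the inequality stated in note 1.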


  • Anderson, T. W., & Rubin, H. (1957). Statistical inference in factor analysis. In J. Neyman (Ed.), Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability (pp. 111–150). Berkeley: University of California Press.


  • Asparouhov, T., & Muthén, B. (2009). Exploratory structural equation modeling. Structural Equation Modeling: A Multidisciplinary Journal, 16(3), 397–438.


  • Béguin, A. A., & Glas, C. A. W. (2001). MCMC estimation and some model-fit analysis of multidimensional IRT models. Psychometrika, 66(4), 541–562.


  • Bengio, Y. (2012). Practical recommendations for gradient-based training of deep architectures. In G. Montavon, G. Orr, & K.-R. Müller (Eds.), Neural Networks: Tricks of the Trade (pp. 437–478). Berlin: Springer.


  • Biesanz, J. C., & West, S. G. (2004). Towards understanding assessments of the Big Five: Multitrait-multimethod analyses of convergent and discriminant validity across measurement occasion and type of observer. Journal of Personality, 72(4), 845–876.


  • Blei, D. M., Kucukelbir, A., & McAuliffe, J. D. (2017). Variational inference: A review for statisticians. Journal of the American Statistical Association, 112(518), 859–877.


  • Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46(4), 443–459.


  • Bock, R. D., Gibbons, R., & Muraki, E. (1988). Full-information item factor analysis. Applied Psychological Measurement, 12(3), 261–280.


  • Bolt, D. M. (2005). Limited- and full-information estimation of item response theory models. In A. Maydeu-Olivares & J. J. McArdle (Eds.), Contemporary Psychometrics, Chap. 2 (pp. 27–72). New Jersey: Lawrence Erlbaum Associates, Inc.


  • Bottou, L., Curtis, F. E., & Nocedal, J. (2018). Optimization methods for large-scale machine learning. SIAM Review, 60(2), 223–311.


  • Bowman, S. R., Vilnis, L., Vinyals, O., Dai, A. M., Jozefowicz, R., & Bengio, S. (2016). Generating sentences from a continuous space. In Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning (pp. 10–21). Association for Computational Linguistics. Retrieved from arXiv:1511.06349.

  • Burda, Y., Grosse, R., & Salakhutdinov, R. (2016). Importance weighted autoencoders. In 4th International Conference on Learning Representations. ICLR. Retrieved from arXiv:1509.00519.

  • Cai, L. (2010a). High-dimensional exploratory item factor analysis by a Metropolis-Hastings Robbins-Monro algorithm. Psychometrika, 75(1), 33–57.


  • Cai, L. (2010b). Metropolis-Hastings Robbins-Monro algorithm for confirmatory item factor analysis. Journal of Educational and Behavioral Statistics, 35(3), 307–335.


  • Chalmers, R. P. (2012). Mirt: A multidimensional item response theory package for the R environment. Journal of Statistical Software, 48(6), 1–29.


  • Chen, Y., Filho, T. S., Prudêncio, R. B. C., Diethe, T., & Flach, P. (2019). \(\beta^3\)-IRT: A new item response model and its applications. In Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (pp. 1013–1021). Retrieved from

  • Chen, Y., Li, X., & Zhang, S. (2019). Joint maximum likelihood estimation for high-dimensional exploratory item factor analysis. Psychometrika, 84(1), 124–146.


  • Chen, X., Liu, S., Sun, R., & Hong, M. (2019). On the convergence of a class of ADAM-type algorithms for non-convex optimization. In 7th International Conference on Learning Representations. ICLR. Retrieved from arXiv:1808.02941.

  • Cho, A. E. (2020). Gaussian variational estimation for multidimensional item response theory. [Doctoral dissertation, University of Michigan]. Deep Blue Data. Retrieved from

  • Choi, J., Oehlert, G., & Zou, H. (2010). A penalized maximum likelihood approach to sparse factor analysis. Statistics and Its Interface, 3(4), 429–436.


  • Christensen, R. H. B. (2019). Cumulative link models for ordinal regression with the R package ordinal. Retrieved from

  • Clevert, D. A., Unterthiner, T., & Hochreiter, S. (2016). Fast and accurate deep network learning by exponential linear units (ELUs). In 4th International Conference on Learning Representations. ICLR. Retrieved from arXiv:1511.07289.

  • Cremer, C., Li, X., & Duvenaud, D. (2018). Inference suboptimality in variational autoencoders. In Proceedings of the 35th International Conference on Machine Learning (pp. 1078–1086). JMLR, Inc. and Microtome Publishing. Retrieved from

  • Cremer, C., Morris, Q., & Duvenaud, D. (2017). Reinterpreting importance-weighted autoencoders. In 5th International Conference on Learning Representations. ICLR. Retrieved from arXiv:1704.02916.

  • Curi, M., Converse, G. A., Hajewski, J., & Oliveira, S. (2019). Interpretable variational autoencoders for cognitive models. 2019 International Joint Conference on Neural Networks.

  • Cybenko, G. (1989). Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals, and Systems, 2(1), 303–314.


  • Domke, J., & Sheldon, D. (2018). Importance weighting and variational inference. In Advances in Neural Information Processing Systems 31 (pp. 4470–4479). Curran Associates, Inc. Retrieved from

  • Duchi, J. C., Hazan, E., & Singer, Y. (2011). Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12(1), 2121–2159.


  • Edwards, M. (2010). A Markov chain Monte Carlo approach to confirmatory item factor analysis. Psychometrika, 75(3), 474–497.


  • Erosheva, E. A., Fienberg, S. E., & Joutard, C. (2007). Describing disability through individual-level mixture models for multivariate binary data. The Annals of Applied Statistics, 1(2), 502–537.


  • Gershman, S., & Goodman, N. (2014). Amortized inference in probabilistic reasoning. In Proceedings of the 36th Annual Conference of the Cognitive Science Society (Vol. 1, pp. 517–522). Retrieved from

  • Ghosh, R. P., Mallick, B., & Pourahmadi, M. (2020). Bayesian estimation of correlation matrices of longitudinal data. Bayesian Analysis, 1–20.

  • Glorot, X., & Bengio, Y. (2010). Understanding the difficulty of training deep feedforward neural networks. Journal of Machine Learning Research, 9(1), 249–256.


  • Goldberg, L. R. (1992). The development of markers for the Big-Five factor structure. Psychological Assessment, 4(1), 26–42.


  • Goldberg, L. R., Johnson, J. A., Eber, H. W., Hogan, R., Ashton, M. C., Cloninger, C. R., et al. (2006). The international personality item pool and the future of public-domain personality measures. Journal of Research in Personality, 40(1), 84–96.


  • Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. Cambridge: MIT Press.


  • He, K., Zhang, X., Ren, S., & Sun, J. (2015). Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In 2015 IEEE International Conference on Computer Vision (pp. 1026–1034).

  • Heaton, J. (2008). Introduction to Neural Networks for Java (2nd ed.). Washington, DC: Heaton Research, Inc.


  • Hirose, K., & Konishi, S. (2012). Variable selection via the weighted group lasso for factor analysis models. The Canadian Journal of Statistics, 40(2), 345–361.


  • Huang, C. W., Krueger, D., Lacoste, A., & Courville, A. (2018). Neural autoregressive flows. In Proceedings of the 35th International Conference on Machine Learning (pp. 2078–2087). Retrieved from

  • Huber, P., Ronchetti, E., & Victoria-Feser, M.-P. (2004). Estimation of generalized linear latent variable models. Journal of the Royal Statistical Society - Series B, 66(4), 893–908.


  • Hui, F. K. C., Tanaka, E., & Warton, D. I. (2018). Order selection and sparsity in latent variable models via the ordered factor LASSO. Biometrics, 74(4), 1311–1319.


  • Hui, F. K. C., Warton, D. I., Ormerod, J. T., Haapaniemi, V., & Taskinen, S. (2017). Variational approximations for generalized linear latent variable models. Journal of Computational and Graphical Statistics, 26(1), 35–43.


  • Jordan, M. I., Ghahramani, Z., Jaakkola, T. S., & Saul, L. K. (1998). Learning in Graphical Models., 37(1), 183–233.

  • Jöreskog, K. G., & Moustaki, I. (2001). Factor analysis of ordinal variables: A comparison of three approaches. Multivariate Behavioral Research, 36(3), 347–387.


  • Keskar, N. S., Mudigere, D., Nocedal, J., Smelyanskiy, M., & Tang, P. T. P. (2017). On large-batch training for deep learning: Generalization gap and sharp minima. In 5th International Conference on Learning Representations. ICLR. Retrieved from arXiv:1609.04836.

  • Kingma, D. P., Salimans, T., Jozefowicz, R., Chen, X., Sutskever, I., & Welling, M. (2016). Improved variational inference with inverse autoregressive flow. In Advances in Neural Information Processing Systems 29 (pp. 4743–4751). Curran Associates, Inc. Retrieved from

  • Kingma, D. P., & Welling, M. (2014). Auto-encoding variational Bayes. In 2nd International Conference on Learning Representations. ICLR. Retrieved from arXiv:1312.6114.

  • LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444.


  • Lehmann, E. L., & Casella, G. (1998). Theory of Point Estimation. Berlin: Springer.


  • Linnainmaa, S. (1970). The representation of the cumulative rounding error of an algorithm as a Taylor expansion of the local rounding errors. [Unpublished master’s thesis (in Finnish)]. University of Helsinki.

  • Lorenzo-Seva, U., & ten Berge, J. M. (2006). Tucker’s congruence coefficient as a meaningful index of factor similarity. Methodology: European Journal of Research Methods for The Behavioral and Social Sciences, 2(2), 57–64.


  • MacCallum, R. C., Widaman, K. F., Zhang, S., & Hong, S. (1999). Sample size in factor analysis. Psychological Methods, 4(1), 84–99.


  • Mattei, P.-A., & Frellsen, J. (2019). MIWAE: Deep generative modelling and imputation of incomplete data. In Proceedings of the 36th International Conference on Machine Learning (pp. 4413–4423). Retrieved from

  • McKinley, R., & Reckase, M. (1983). An extension of the two-parameter logistic model to the multidimensional latent space (Research Report ONR83-2). The American College Testing Program.

  • McMahan, H. B., & Streeter, M. (2010). Adaptive bound optimization for online convex optimization. In A. T. Kalai & M. Mohr (Eds.), The 23rd Conference on Learning Theory (pp. 244–256). Retrieved from

  • Meng, X.-L., & Schilling, S. (1996). Fitting full-information item factor models and an empirical investigation of bridge sampling. Journal of the American Statistical Association, 91(435), 1254–1267.


  • Monroe, S. L. (2014). Multidimensional item factor analysis with semi-nonparametric latent densities. [Unpublished doctoral dissertation]. University of California.

  • Muthén, B. (1978). Contributions to factor analysis of dichotomous variables. Psychometrika, 43(4), 551–560.


  • Muthén, B. (1984). A general structural equation model with dichotomous, ordered categorical, and continuous latent variable indicators. Psychometrika, 49(1), 115–132.


  • Natesan, P., Nandakumar, R., Minka, T., & Rubright, J. D. (2016). Bayesian prior choice in IRT estimation using MCMC and variational Bayes. Frontiers in Psychology, 7(1), 1.


  • Nemirovski, A., Juditsky, A., Lan, G., & Shapiro, A. (2009). Robust stochastic approximation approach to stochastic programming. SIAM Journal on Optimization, 19(4), 1574–1609.


  • Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., Demaison, A., Köpf, A., Yang, E., DeVito, Z., Raison, M., Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., Chintala, S. (2019). PyTorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems 32 (pp. 8024-8035). Curran Associates, Inc. Retrieved from

  • Pinheiro, J. C., & Bates, D. M. (1996). Unconstrained parametrizations for variance-covariance matrices. Statistics and Computing, 6(3), 289–296.


  • Rabe-Hesketh, S., Skrondal, A., & Pickles, A. (2005). Maximum likelihood estimation of limited and discrete dependent variable models with nested random effects. Journal of Econometrics, 128(2), 301–323.


  • Rainforth, T., Kosiorek, A. R., Le, T. A., Maddison, C. J., Igl, M., Wood, F., & Teh, Y. W. (2018). Tighter variational bounds are not necessarily better. In Proceedings of the 35th International Conference on Machine Learning (Vol. 80, pp. 4277–4285). Retrieved from

  • Rapisarda, F., Brigo, D., & Mercurio, F. (2007). Parameterizing correlations: A geometric interpretation. IMA Journal of Management Mathematics, 18(1), 55–73.


  • Reckase, M. D. (2009). Multidimensional Item Response Theory. Berlin: Springer.


  • Reddi, S. J., Kale, S., & Kumar, S. (2018). On the convergence of ADAM and beyond. In 6th International Conference on Learning Representations. ICLR. Retrieved from arXiv:1904.09237.

  • Rezende, D. J., Mohamed, S., & Wierstra, D. (2014). Stochastic backpropagation and approximate inference in deep generative models. In Proceedings of the 31st International Conference on Machine Learning (pp. 1278–1286). Retrieved from

  • Rezende, D. J., & Mohamed, S. (2015). Variational inference with normalizing flows. In Proceedings of the 32nd International Conference on Machine Learning (pp. 1530–1538). Retrieved from

  • Robbins, H., & Monro, S. (1951). A stochastic approximation method. The Annals of Mathematical Statistics, 22(3), 400–407.


  • Samejima, F. (1969). Estimation of latent ability using a response pattern of graded scores. Psychometrika, 35(1), 139.


  • Schilling, R., & Bock, D. (2005). High-dimensional maximum marginal likelihood item factor analysis by adaptive quadrature. Psychometrika, 70(3), 533–555.


  • Sønderby, C. K., Raiko, T., Maaløe, L., Sønderby, S. K., & Winther, O. (2016). Ladder variational autoencoders. In Advances in Neural Information Processing Systems (pp. 3745–3753). Curran Associates, Inc. Retrieved from

  • Song, X., & Lee, S. (2005). A multivariate probit latent variable model for analyzing dichotomous responses. Statistica Sinica, 15(3), 45–64.


  • Spall, J. C. (2003). Introduction to Stochastic Search and Optimization: Estimation, Simulation, and Control. Hoboken: Wiley.


  • Staib, M., Reddi, S., Kale, S., Kumar, S., & Sra, S. (2019). Escaping saddle points with adaptive gradient methods. In Proceedings of the 36th International Conference on Machine Learning (pp. 5956–5965). Retrieved from

  • Sun, J., Chen, Y., Liu, J., Ying, Z., & Xin, T. (2016). Latent variable selection for multidimensional item response theory models via L1 regularization. Psychometrika, 81(4), 921–939.


  • Tabak, E. G., & Turner, C. V. (2012). A family of nonparametric density estimation algorithms. Communications on Pure and Applied Mathematics, 66(2), 145–164.


  • Tabak, E. G., & Vanden-Eijnden, E. (2010). Density estimation by dual ascent of the log-likelihood. Communications in Mathematical Sciences, 8(1), 217–233.


  • Tsay, R. S., & Pourahmadi, M. (2017). Modelling structured correlation matrices. Biometrika, 104(1), 237–242.


  • Tucker, G., Lawson, D., Gu, S., & Maddison, C. J. (2019). Doubly reparameterized gradient estimators for Monte Carlo objectives. In 7th International Conference on Learning Representations. ICLR. Retrieved from arXiv:1810.04152.

  • Wainwright, M. J., & Jordan, M. I. (2008). Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning, 1(1–2), 1–305.


  • Wirth, R. J., & Edwards, M. C. (2007). Item factor analysis: Current approaches and future directions. Psychological Methods, 12(1), 58–79.


  • Woods, C. M., & Thissen, D. (2006). Item response theory with estimation of the latent population distribution using spline-based densities. Psychometrika, 71(2), 281–301.


  • Wu, M., Davis, R. L., Domingue, B. W., Piech, C., & Goodman, N. (2020). Variational item response theory: Fast, accurate, and expressive. In A. N. Rafferty, J. Whitehill, C. Romero, & V. Cavalli-Sforza (Eds.), Proceedings of the 13th International Conference on Educational Data Mining 2020 (pp. 257–268). Retrieved from

  • Yalcin, I., & Amemiya, Y. (2001). Nonlinear factor analysis as a statistical method. Statistical Science, 16(3), 275–294.


  • Yates, A. (1988). Multivariate Exploratory Data Analysis: A Perspective on Exploratory Factor Analysis. Albany: State University of New York Press.


  • Yun, J., Lozano, A. C., & Yang, E. (2020). A general family of stochastic proximal gradient methods for deep learning. arXiv preprint. Retrieved from arXiv:2007.07484.

  • Zhang, C., Butepage, J., Kjellstrom, H., & Mandt, S. (2019). Advances in variational inference. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(8), 2008–2026.


  • Zhang, S., Chen, Y., & Li, X. (2019). mirtjml [Computer software]. Retrieved from

  • Zhang, H., Chen, Y., & Li, X. (2020). A note on exploratory item factor analysis by singular value decomposition. Psychometrika, pp. 1–15.

  • Zhang, S., Chen, Y., & Liu, Y. (2020). An improved stochastic EM algorithm for large-scale full-information item factor analysis. British Journal of Mathematical and Statistical Psychology, 73(1), 44–71.


  • Zhou, D., Tang, Y., Yang, Z., Cao, Y., & Gu, Q. (2018). On the convergence of adaptive gradient methods for nonconvex optimization. arXiv preprint. Retrieved from arXiv:1808.05671.


Author information



Corresponding author

Correspondence to Christopher J. Urban.

Additional information


This material is based upon work supported by the National Science Foundation Graduate Research Fellowship under Grant No. DGE-1650116.

We would like to thank the Editor, the Associate Editor, and the reviewers for their many constructive comments. We are also grateful to Dr. David Thissen for his extensive suggestions, feedback, and support.



Cite this article

Urban, C.J., Bauer, D.J. A Deep Learning Algorithm for High-Dimensional Exploratory Item Factor Analysis. Psychometrika 86, 1–29 (2021).




  • Deep learning
  • artificial neural network
  • variational inference
  • variational autoencoder
  • importance sampling
  • importance weighted autoencoder
  • item response theory
  • categorical factor analysis
  • latent variable modeling