
A Note on Improving Variational Estimation for Multidimensional Item Response Theory

Theory & Methods

Abstract

Survey instruments and assessments are frequently used in many domains of social science. When the constructs these assessments aim to measure are multifaceted, multidimensional item response theory (MIRT) provides a unified framework and convenient statistical tools for item analysis, calibration, and scoring. However, the computational challenge of estimating MIRT models prohibits their wide use, because many extant methods can hardly deliver results in a realistic time frame when the number of dimensions, the sample size, and the test length are large. Variational estimation methods, such as the Gaussian variational expectation–maximization (GVEM) algorithm, have recently been proposed to address this estimation challenge with a fast and accurate solution. However, results have shown that variational estimation methods may produce bias in the discrimination parameters during confirmatory model estimation, and this note proposes an importance-weighted version of GVEM (i.e., IW-GVEM) to correct for such bias under MIRT models. We also use the adaptive moment estimation (Adam) method to update the learning rate for gradient descent automatically. Our simulations show that IW-GVEM can effectively correct the bias with a modest increase in computation time compared with GVEM. The proposed method may also shed light on improving variational estimation for other psychometric models.
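The adaptive moment estimation (Adam) update mentioned in the abstract is the standard one of Kingma and Ba (2014). The following is a minimal sketch of that update; the `adam_init`/`adam_step` helpers and the state dictionary are our illustrative conventions, not the authors' implementation, and how such updates enter the IW-GVEM algorithm is described in the paper itself.

```python
import numpy as np

def adam_init(shape):
    """Fresh Adam state: first- and second-moment estimates plus a step counter."""
    return {"m": np.zeros(shape), "v": np.zeros(shape), "t": 0}

def adam_step(param, grad, state, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update (Kingma & Ba, 2014).

    Exponential moving averages of the gradient (m) and its square (v)
    give every coordinate its own effective step size, so the learning
    rate needs no hand-tuned schedule. Written as descent; to maximize
    an objective such as the ELBO, pass the gradient of its negative.
    """
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad ** 2
    m_hat = state["m"] / (1 - beta1 ** state["t"])  # bias-corrected moments
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return param - lr * m_hat / (np.sqrt(v_hat) + eps)
```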


Notes

  1. Limited-information methods, such as weighted least squares, are not reviewed here because they handle high-dimensional models very differently and do not handle missing data well.

References

  • Albert, J. H. (1992). Bayesian estimation of normal ogive item response curves using Gibbs sampling. Journal of Educational Statistics, 17(3), 251–269.

  • Bates, D., Mächler, M., Bolker, B., & Walker, S. (2014). Fitting linear mixed-effects models using lme4. arXiv preprint arXiv:1406.5823.

  • Bishop, C. M. (2006). Pattern recognition and machine learning. Springer.

  • Blei, D. M., Kucukelbir, A., & McAuliffe, J. D. (2017). Variational inference: A review for statisticians. Journal of the American Statistical Association, 112(518), 859–877.

  • Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46(4), 443–459.

  • Briggs, D. C., & Wilson, M. (2003). An introduction to multidimensional measurement using Rasch models.

  • Burda, Y., Grosse, R., & Salakhutdinov, R. (2015). Importance weighted autoencoders. arXiv preprint arXiv:1509.00519.

  • Cai, L. (2008). SEM of another flavor: Two new applications of the supplemented EM algorithm. British Journal of Mathematical and Statistical Psychology, 61, 309–329.

  • Cai, L. (2010). Metropolis–Hastings Robbins–Monro algorithm for confirmatory item factor analysis. Journal of Educational and Behavioral Statistics, 35(3), 307–335.

  • Cai, L. (2010). High-dimensional exploratory item factor analysis by a Metropolis–Hastings Robbins–Monro algorithm. Psychometrika, 75(1), 33–57.

  • Cai, L., & Hansen, M. (2018). Improving educational assessment: Multivariate statistical methods. Policy Insights from the Behavioral and Brain Sciences, 5(1), 19–24.

  • Cai, L., Yang, J. S., & Hansen, M. (2011). Generalized full-information item bifactor analysis. Psychological Methods, 16(3), 221.

  • Chen, Y., Li, X., & Zhang, S. (2019). Joint maximum likelihood estimation for high-dimensional exploratory item factor analysis. Psychometrika, 84(1), 124–146.

  • Chen, P., & Wang, C. (2021). Using EM algorithm for finite mixtures and reformed supplemented EM for MIRT calibration. Psychometrika, 86, 299–326.

  • Cho, A. E., Xiao, J., Wang, C., & Xu, G. (2022). Regularized variational estimation for exploratory item response theory. Psychometrika, 1–29.

  • Cho, A. E., Wang, C., Zhang, X., & Xu, G. (2021). Gaussian variational estimation for multidimensional item response theory. British Journal of Mathematical and Statistical Psychology, 74, 52–85.

  • CRESST (2017). English language proficiency assessment for the 21st century: Item analysis and calibration.

  • Curi, M., Converse, G. A., Hajewski, J., & Oliveira, S. (2019). Interpretable variational autoencoders for cognitive models. In 2019 International Joint Conference on Neural Networks (IJCNN) (pp. 1–8). IEEE.

  • Domke, J., & Sheldon, D. R. (2018). Importance weighting and variational inference. Advances in Neural Information Processing Systems, 31.

  • Gibbons, R. D., & Hedeker, D. R. (1992). Full-information item bi-factor analysis. Psychometrika, 57(3), 423–436.

  • Hamilton, L. S., Nussbaum, E. M., Kupermintz, H., Kerkhoven, J. I., & Snow, R. E. (1995). Enhancing the validity and usefulness of large-scale educational assessments: II. NELS:88 science achievement. American Educational Research Journal, 32(3), 555–581.

  • Hartig, J., & Höhler, J. (2009). Multidimensional IRT models for the assessment of competencies. Studies in Educational Evaluation, 35(2–3), 57–63.

  • Hui, F. K., Warton, D. I., Ormerod, J. T., Haapaniemi, V., & Taskinen, S. (2017). Variational approximations for generalized linear latent variable models. Journal of Computational and Graphical Statistics, 26(1), 35–43.

  • Jeon, M., Rijmen, F., & Rabe-Hesketh, S. (2017). A variational maximization-maximization algorithm for generalized linear mixed models with crossed random effects. Psychometrika, 82(3), 693–716.

  • Jordan, M. I. (2004). Graphical models. Statistical Science, 19(1), 140–155.

  • Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.

  • Kupermintz, H., Ennis, M. M., Hamilton, L. S., Talbert, J. E., & Snow, R. E. (1995). In dedication: Leigh Burstein: Enhancing the validity and usefulness of large-scale educational assessments: I. NELS:88 mathematics achievement. American Educational Research Journal, 32(3), 525–554.

  • Lindstrom, M. J., & Bates, D. M. (1988). Newton–Raphson and EM algorithms for linear mixed-effects models for repeated-measures data. Journal of the American Statistical Association, 83(404), 1014–1022.

  • Liu, T., Wang, C., & Xu, G. (2022). Estimating three- and four-parameter MIRT models with importance-weighted sampling enhanced variational auto-encoder. Frontiers in Psychology, 13.

  • McCulloch, C. E. (1997). Maximum likelihood algorithms for generalized linear mixed models. Journal of the American Statistical Association, 92(437), 162–170.

  • Natesan, P., Nandakumar, R., Minka, T., & Rubright, J. D. (2016). Bayesian prior choice in IRT estimation using MCMC and variational Bayes. Frontiers in Psychology, 7, 1422.

  • Neyman, J., & Scott, E. L. (1948). Consistent estimates based on partially consistent observations. Econometrica, 16, 1–32.

  • OECD (2003). The PISA 2003 assessment framework: Mathematics, reading, science and problem solving knowledge and skills.

  • Ormerod, J. T., & Wand, M. P. (2010). Explaining variational approximations. The American Statistician, 64(2), 140–153.

  • Patz, R. J., & Junker, B. W. (1999). Applications and extensions of MCMC in IRT: Multiple item types, missing data, and rated responses. Journal of Educational and Behavioral Statistics, 24(4), 342–366.

  • Pinheiro, J. C., & Bates, D. M. (1995). Approximations to the log-likelihood function in the nonlinear mixed-effects model. Journal of Computational and Graphical Statistics, 4(1), 12–35.

  • Reckase, M. D. (2009). Multidimensional item response theory models. In Multidimensional item response theory (pp. 79–112). Springer.

  • Rijmen, F., & Jeon, M. (2013). Fitting an item response theory model with random item effects across groups by a variational approximation method. Annals of Operations Research, 206(1), 647–662.

  • Rijmen, F., Vansteelandt, K., & De Boeck, P. (2008). Latent class models for diary method data: Parameter estimation by local computations. Psychometrika, 73(2), 167–182.

  • Thissen, D. (2013). Using the testlet response model as a shortcut to multidimensional item response theory subscore computation. In New developments in quantitative psychology (pp. 29–40). Springer.

  • Urban, C. J., & Bauer, D. J. (2021). A deep learning algorithm for high-dimensional exploratory item factor analysis. Psychometrika, 86(1), 1–29.

  • von Davier, M., & Sinharay, S. (2010). Stochastic approximation methods for latent regression item response models. Journal of Educational and Behavioral Statistics, 35(2), 174–193.

  • Wainer, H., Bradlow, E. T., & Wang, X. (2007). Testlet response theory and its applications. Cambridge University Press.

  • Wang, C., & Xu, G. (2015). A mixture hierarchical model for response times and response accuracy. British Journal of Mathematical and Statistical Psychology, 68(3), 456–477.

  • Wu, M., Davis, R. L., Domingue, B. W., Piech, C., & Goodman, N. (2020). Variational item response theory: Fast, accurate, and expressive. arXiv preprint arXiv:2002.00276.

  • Yamaguchi, K., & Okada, K. (2020). Variational Bayes inference algorithm for the saturated diagnostic classification model. Psychometrika, 85(4), 973–995.

  • Yamaguchi, K., & Okada, K. (2020). Variational Bayes inference for the DINA model. Journal of Educational and Behavioral Statistics, 45(5), 569–597.

  • Zhang, H., Chen, Y., & Li, X. (2020). A note on exploratory item factor analysis by singular value decomposition. Psychometrika, 85, 358–372.

  • Zhang, S., Chen, Y., & Liu, Y. (2020). An improved stochastic EM algorithm for large-scale full-information item factor analysis. British Journal of Mathematical and Statistical Psychology, 73(1), 44–71.


Acknowledgements

We are grateful to the editor, an associate editor, and three anonymous referees for their helpful comments and suggestions. This work is partially supported by IES Grant R305D200015 and NSF grants SES-1846747 and SES-2150601.

Author information


Corresponding authors

Correspondence to Chun Wang or Gongjun Xu.

Ethics declarations

Data Availability

The simulation code and datasets generated during the current study are available at https://github.com/jingoystat/A-Note-on-Improving-Variational-Estimation-for-Multidimensional-Item-Response-Theory.

Conflict of interest

The authors declare that they have no conflict of interest.


Appendices

Appendix A: Additional Comparative Studies

1.1 A.1: Comparing IW-GVEM with Importance-Weighted Variational Bayesian Method

Researchers have recently proposed importance-weighted variational Bayesian (IW-VB) methods for estimating MIRT models. In particular, Urban and Bauer (2021) and Liu et al. (2022) used an importance-weighted variational autoencoder (IW-VAE) for exploratory factor analysis. This deep learning-based variational method is computationally fast on large data sets. However, although IW-VB methods handle large-scale data with high computational efficiency, their performance on small and medium-sized data sets is less competitive. While MCMC could serve as an alternative for small samples, our variational method is faster than MCMC for small to medium sample sizes.

Fig. 9: Bias for \(K=2\), between-item, under exploratory analysis.

Fig. 10: Bias for \(K=2\), within-item, under exploratory analysis.

Fig. 11: RMSE for \(K=2\), between-item, under exploratory analysis.

Fig. 12: RMSE for \(K=2\), within-item, under exploratory analysis.

Fig. 13: Bias for \(K=5\), between-item, under exploratory analysis.

Fig. 14: Bias for \(K=5\), within-item, under exploratory analysis.

Fig. 15: RMSE for \(K=5\), between-item, under exploratory analysis.

Fig. 16: RMSE for \(K=5\), within-item, under exploratory analysis.

In this section, we provide additional finite-sample simulation results showing that our method outperforms IW-VB methods in small to medium samples. To this end, we compare our proposed IW-GVEM method with the IW-VB method of Liu et al. (2022) at \(N = 200\), \(N = 500\), and \(N = 1000\). Because their method focuses only on exploratory MIRT, we compare the performance of our method (denoted "IS" in the figures) with IW-VB for exploratory analysis. The simulation settings follow those in Sect. 2.1, and the results are presented in Figures 9, 10, 11, 12, 13, 14, 15, and 16. The biases of IW-GVEM are closer to 0 than those of the IW-VB method under all simulation settings, and the RMSEs of our proposed method are substantially smaller than those of the IW-VB method of Liu et al. (2022).

1.2 A.2: Comparing IW-GVEM with Joint Maximum Likelihood Method

The joint maximum likelihood (JML) estimator is a computationally efficient estimator with established theoretical guarantees. Chen et al. (2019) proved that the JML estimator is consistent in high-dimensional settings and outperforms marginal maximum likelihood approaches in terms of computational cost. However, unlike our IW-GVEM method, JML treats the latent abilities as fixed-effect parameters rather than random variables, which may constrain its performance in settings where the latent factors are correlated. JML estimation is also inconsistent when the number of items is fixed and the sample size grows to infinity: because the number of parameters in the joint likelihood grows with the sample size, standard maximum likelihood theory does not apply directly, and consistent point estimation of the item parameters cannot be attained. This is known as the Neyman–Scott phenomenon (Neyman & Scott, 1948).
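To make this distinction concrete, the minimal sketch below performs one joint-likelihood update in which the ability matrix Theta is optimized as fixed parameters alongside the item parameters, so the parameter count grows with N. It assumes a logistic M2PL model; the function names and the plain gradient-ascent update are illustrative and are not the algorithm of Chen et al. (2019).

```python
import numpy as np
from scipy.special import expit  # numerically stable logistic function

def joint_log_likelihood(A, b, Theta, Y):
    """Joint log-likelihood of a logistic M2PL model.

    A: (J, K) discriminations; b: (J,) intercepts; Y: (N, J) binary responses.
    Theta: (N, K) latent abilities, treated as FIXED parameters to be
    optimized jointly with (A, b), which is the defining feature of JML.
    """
    logits = Theta @ A.T + b
    return np.sum(Y * logits - np.logaddexp(0.0, logits))

def jml_gradient_step(A, b, Theta, Y, lr=0.01):
    """One plain gradient-ascent step on ALL parameters, including Theta.

    Theta contributes N*K parameters, so the parameter count grows with
    the sample size N; with J fixed this is exactly the Neyman-Scott
    setting in which item-parameter estimates fail to be consistent.
    """
    R = Y - expit(Theta @ A.T + b)      # (N, J) residuals y - P(y = 1)
    A_new = A + lr * (R.T @ Theta)      # gradient w.r.t. A
    b_new = b + lr * R.sum(axis=0)      # gradient w.r.t. b
    Theta_new = Theta + lr * (R @ A)    # gradient w.r.t. Theta (fixed effects)
    return A_new, b_new, Theta_new
```

By contrast, a random-effects treatment integrates Theta out of the likelihood (or, in GVEM, approximates that integral variationally), so the item parameters are estimated with a fixed number of parameters regardless of N.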

Extensive simulation studies in Cho et al. (2021) compared GVEM with the JML method under the same simulation settings (sample sizes, within- or between-item multidimensional structures, factor correlations, etc.) and the same evaluation criteria (bias and RMSE) as in Sect. 2.1. Specifically, Figures 3 and 4 of Cho et al. (2021) show that GVEM has much lower bias and RMSE than JML across all settings. In certain challenging cases, such as the within-item structure with high factor correlation, the JML estimator performs even worse. This can be explained by the fact that the latent factors are fixed effects in JML, whereas GVEM treats them as random effects with a multivariate Gaussian distribution that accounts for the correlations among factors.

As an improvement of the GVEM method, our IW-GVEM method outperforms GVEM in confirmatory factor analysis and performs comparably to GVEM in exploratory factor analysis across all simulation settings; see Sect. 2.2 for a detailed comparison of the two. Since IW-GVEM is comparable to, if not better than, GVEM, it also outperforms JML under our simulation settings.

Appendix B: Additional Simulation Study

In this section, we present finite-sample simulation studies showing that our proposed IW-GVEM greatly improves the ELBO over GVEM. For illustration, we consider four settings with \(N = 200\) and \(J = 30\): (1) within-item and low factor correlation; (2) between-item and low factor correlation; (3) within-item and high factor correlation; (4) between-item and high factor correlation. For each setting, we compute the ELBO from the GVEM algorithm and the importance-weighted ELBOs with \(M = 5, 10, 50\), and 100 draws at the importance sampling step, over 100 replications. The calculated ELBOs are presented in Fig. 17, which shows that the importance sampling step yields a tighter importance-weighted ELBO (\(M = 5, 10, 50, 100\)) than that of GVEM. As the number of draws M at the importance sampling step increases, the ELBOs converge, consistent with the theoretical results in Proposition 1.

Fig. 17: Importance-weighted ELBO at \(N = 200\), \(J = 30\).
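To make the quantity plotted in Fig. 17 concrete, the sketch below estimates the importance-weighted ELBO for a single respondent under a logistic MIRT model with Gaussian variational posterior \(q(\theta) = N(\mu, \Sigma)\) and a standard normal prior. It is a minimal illustration under these assumptions; the function name and arguments are ours, not the paper's implementation.

```python
import numpy as np

def iw_elbo_one_person(y, a, b, mu, Sigma, M=10, rng=None):
    """Monte Carlo estimate of the importance-weighted ELBO for one
    respondent under a logistic M2PL model, with variational posterior
    q(theta) = N(mu, Sigma) and a standard normal prior on theta.

    IW-ELBO_M = E_q[ log (1/M) sum_m p(y, theta_m) / q(theta_m) ];
    it is tighter than the ordinary ELBO (the M = 1 case) and approaches
    the marginal log-likelihood as M grows.
    """
    rng = np.random.default_rng(rng)
    K = len(mu)
    L = np.linalg.cholesky(Sigma)
    theta = mu + rng.standard_normal((M, K)) @ L.T        # M draws from q

    logits = theta @ a.T + b                              # (M, J)
    log_lik = np.sum(y * logits - np.logaddexp(0.0, logits), axis=1)

    log_prior = -0.5 * np.sum(theta ** 2, axis=1) - 0.5 * K * np.log(2 * np.pi)
    z = np.linalg.solve(L, (theta - mu).T)                # whitened residuals
    log_q = (-0.5 * np.sum(z ** 2, axis=0)
             - np.sum(np.log(np.diag(L)))
             - 0.5 * K * np.log(2 * np.pi))

    log_w = log_lik + log_prior - log_q                   # log importance weights
    m = log_w.max()                                       # stable log-mean-exp
    return m + np.log(np.mean(np.exp(log_w - m)))
```

Averaging this quantity over respondents and replications produces curves like those in Fig. 17; setting M = 1 yields a one-draw Monte Carlo estimate of the ordinary ELBO for comparison.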


Cite this article

Ma, C., Ouyang, J., Wang, C. et al. A Note on Improving Variational Estimation for Multidimensional Item Response Theory. Psychometrika (2023). https://doi.org/10.1007/s11336-023-09939-0
