Abstract
Survey instruments and assessments are widely used across the social sciences. When the constructs these assessments aim to measure are multifaceted, multidimensional item response theory (MIRT) provides a unified framework and a convenient statistical tool for item analysis, calibration, and scoring. However, the computational challenge of estimating MIRT models limits their wide use: many extant methods cannot deliver results in a realistic time frame when the number of dimensions, the sample size, and the test length are large. Variational estimation methods, such as the Gaussian variational expectation–maximization (GVEM) algorithm, have recently been proposed to address this estimation challenge by providing a fast and accurate solution. However, results have shown that variational estimation methods may produce some bias in the discrimination parameters during confirmatory model estimation, and this note proposes an importance-weighted version of GVEM (i.e., IW-GVEM) to correct for such bias under MIRT models. We also use the adaptive moment estimation method to update the learning rate for gradient descent automatically. Our simulations show that IW-GVEM can effectively correct the bias with a modest increase in computation time compared with GVEM. The proposed method may also shed light on improving variational estimation for other psychometric models.
Notes
Limited-information methods such as weighted least squares are not reviewed here because they handle high-dimensional models very differently, and they cannot handle missing data well.
References
Albert, J. H. (1992). Bayesian estimation of normal ogive item response curves using Gibbs sampling. Journal of Educational Statistics, 17(3), 251–269.
Bates, D., Mächler, M., Bolker, B., & Walker, S. (2014). Fitting linear mixed-effects models using lme4. arXiv preprint arXiv:1406.5823.
Bishop, C. M. (2006). Pattern recognition and machine learning. Springer.
Blei, D. M., Kucukelbir, A., & McAuliffe, J. D. (2017). Variational inference: A review for statisticians. Journal of the American Statistical Association, 112(518), 859–877.
Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46(4), 443–459.
Briggs, D. C., & Wilson, M. (2003). An introduction to multidimensional measurement using Rasch models.
Burda, Y., Grosse, R., & Salakhutdinov, R. (2015). Importance weighted autoencoders. arXiv preprint arXiv:1509.00519.
Cai, L. (2008). SEM of another flavor: Two new applications of the supplemented EM algorithm. British Journal of Mathematical and Statistical Psychology, 61, 309–329.
Cai, L. (2010). Metropolis–Hastings Robbins–Monro algorithm for confirmatory item factor analysis. Journal of Educational and Behavioral Statistics, 35(3), 307–335.
Cai, L. (2010). High-dimensional exploratory item factor analysis by a Metropolis–Hastings Robbins–Monro algorithm. Psychometrika, 75(1), 33–57.
Cai, L., & Hansen, M. (2018). Improving educational assessment: Multivariate statistical methods. Policy Insights from the Behavioral and Brain Sciences, 5(1), 19–24.
Cai, L., Yang, J. S., & Hansen, M. (2011). Generalized full-information item bifactor analysis. Psychological Methods, 16(3), 221.
Chen, Y., Li, X., & Zhang, S. (2019). Joint maximum likelihood estimation for high-dimensional exploratory item factor analysis. Psychometrika, 84(1), 124–146.
Chen, P., & Wang, C. (2021). Using EM algorithm for finite mixtures and reformed supplemented EM for MIRT calibration. Psychometrika, 86, 299–326.
Cho, A. E., Xiao, J., Wang, C., & Xu, G. (2022). Regularized variational estimation for exploratory item response theory. Psychometrika, pp. 1–29.
Cho, A. E., Wang, C., Zhang, X., & Xu, G. (2021). Gaussian variational estimation for multidimensional item response theory. British Journal of Mathematical and Statistical Psychology, 74, 52–85.
CRESST (2017). English language proficiency assessment for the 21st century: Item analysis and calibration.
Curi, M., Converse, G. A., Hajewski, J., & Oliveira, S. (2019). Interpretable variational autoencoders for cognitive models. In 2019 international joint conference on neural networks (IJCNN), pp. 1–8. IEEE.
Domke, J., & Sheldon, D. R. (2018). Importance weighting and variational inference. Advances in Neural Information Processing Systems, 31.
Gibbons, R. D., & Hedeker, D. R. (1992). Full-information item bi-factor analysis. Psychometrika, 57(3), 423–436.
Hamilton, L. S., Nussbaum, E. M., Kupermintz, H., Kerkhoven, J. I., & Snow, R. E. (1995). Enhancing the validity and usefulness of large-scale educational assessments: II. NELS:88 science achievement. American Educational Research Journal, 32(3), 555–581.
Hartig, J., & Höhler, J. (2009). Multidimensional IRT models for the assessment of competencies. Studies in Educational Evaluation, 35(2–3), 57–63.
Hui, F. K., Warton, D. I., Ormerod, J. T., Haapaniemi, V., & Taskinen, S. (2017). Variational approximations for generalized linear latent variable models. Journal of Computational and Graphical Statistics, 26(1), 35–43.
Jeon, M., Rijmen, F., & Rabe-Hesketh, S. (2017). A variational maximization-maximization algorithm for generalized linear mixed models with crossed random effects. Psychometrika, 82(3), 693–716.
Jordan, M. I. (2004). Graphical models. Statistical Science, 19(1), 140–155.
Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
Kupermintz, H., Ennis, M. M., Hamilton, L. S., Talbert, J. E., & Snow, R. E. (1995). In dedication: Leigh Burstein: Enhancing the validity and usefulness of large-scale educational assessments: I. NELS:88 mathematics achievement. American Educational Research Journal, 32(3), 525–554.
Lindstrom, M. J., & Bates, D. M. (1988). Newton–Raphson and EM algorithms for linear mixed-effects models for repeated-measures data. Journal of the American Statistical Association, 83(404), 1014–1022.
Liu, T., Wang, C., & Xu, G. (2022). Estimating three- and four-parameter MIRT models with importance-weighted sampling enhanced variational auto-encoder. Frontiers in Psychology, 13.
McCulloch, C. E. (1997). Maximum likelihood algorithms for generalized linear mixed models. Journal of the American Statistical Association, 92(437), 162–170.
Natesan, P., Nandakumar, R., Minka, T., & Rubright, J. D. (2016). Bayesian prior choice in IRT estimation using MCMC and variational bayes. Frontiers in Psychology, 7, 1422.
Neyman, J., & Scott, E. L. (1948). Consistent estimates based on partially consistent observations. Econometrica: Journal of the Econometric Society, 1–32.
OECD (2003). The PISA 2003 assessment framework: Mathematics, reading, science and problem solving knowledge and skills.
Ormerod, J. T., & Wand, M. P. (2010). Explaining variational approximations. The American Statistician, 64(2), 140–153.
Patz, R. J., & Junker, B. W. (1999). Applications and extensions of MCMC in IRT: Multiple item types, missing data, and rated responses. Journal of Educational and Behavioral Statistics, 24(4), 342–366.
Pinheiro, J. C., & Bates, D. M. (1995). Approximations to the log-likelihood function in the nonlinear mixed-effects model. Journal of Computational and Graphical Statistics, 4(1), 12–35.
Reckase, M. D. (2009). Multidimensional item response theory models. In Multidimensional item response theory, pp. 79–112. Springer.
Rijmen, F., & Jeon, M. (2013). Fitting an item response theory model with random item effects across groups by a variational approximation method. Annals of Operations Research, 206(1), 647–662.
Rijmen, F., Vansteelandt, K., & De Boeck, P. (2008). Latent class models for diary method data: Parameter estimation by local computations. Psychometrika, 73(2), 167–182.
Thissen, D. (2013). Using the testlet response model as a shortcut to multidimensional item response theory subscore computation. In New developments in quantitative psychology, pp. 29–40. Springer.
Urban, C. J., & Bauer, D. J. (2021). A deep learning algorithm for high-dimensional exploratory item factor analysis. Psychometrika, 86(1), 1–29.
von Davier, M., & Sinharay, S. (2010). Stochastic approximation methods for latent regression item response models. Journal of Educational and Behavioral Statistics, 35(2), 174–193.
Wainer, H., Bradlow, E. T., & Wang, X. (2007). Testlet response theory and its applications. Cambridge University Press.
Wang, C., & Xu, G. (2015). A mixture hierarchical model for response times and response accuracy. British Journal of Mathematical and Statistical Psychology, 68(3), 456–477.
Wu, M., Davis, R. L., Domingue, B. W., Piech, C., & Goodman, N. (2020). Variational item response theory: Fast, accurate, and expressive. arXiv preprint arXiv:2002.00276.
Yamaguchi, K., & Okada, K. (2020). Variational Bayes inference algorithm for the saturated diagnostic classification model. Psychometrika, 85(4), 973–995.
Yamaguchi, K., & Okada, K. (2020). Variational Bayes inference for the DINA model. Journal of Educational and Behavioral Statistics, 45(5), 569–597.
Zhang, H., Chen, Y., & Li, X. (2020). A note on exploratory item factor analysis by singular value decomposition. Psychometrika, 85, 358–372.
Zhang, S., Chen, Y., & Liu, Y. (2020). An improved stochastic EM algorithm for large-scale full-information item factor analysis. British Journal of Mathematical and Statistical Psychology, 73(1), 44–71.
Acknowledgements
We are grateful to the editor, an associate editor, and three anonymous referees for their helpful comments and suggestions. This work is partially supported by IES Grant R305D200015 and NSF grants SES-1846747 and SES-2150601.
Author information
Authors and Affiliations
Corresponding authors
Ethics declarations
Data Availability
The simulation code and datasets generated during the current study are available at https://github.com/jingoystat/A-Note-on-Improving-Variational-Estimation-for-Multidimensional-Item-Response-Theory.
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A: Additional Comparative Studies
1.1 A.1: Comparing IW-GVEM with Importance-Weighted Variational Bayesian Method
In the recent literature, researchers have also proposed importance-weighted variational Bayesian (IW-VB) methods for the estimation of MIRT models. In particular, Urban and Bauer (2021) and Liu et al. (2022) proposed importance-weighted variational autoencoders (IW-VAE) for exploratory factor analysis. These deep learning-based variational methods are computationally fast on large data sets. Although IW-VB methods handle large-scale data with high computational efficiency, their performance on small and medium-sized data sets is less competitive. MCMC could be an alternative method for small samples, but for small to medium sample sizes our variational method is faster and more competitive than MCMC.
In this section, we provide additional finite-sample simulation results to show that our method outperforms the IW-VB methods in small to medium samples. To illustrate this, we compare our proposed IW-GVEM method with the IW-VB method of Liu et al. (2022) at \(N = 200\), \(N = 500\), and \(N = 1000\). Because their method focuses only on exploratory MIRT, we compare the performance of our method (denoted "IS" in the figures) with IW-VB for exploratory analysis. The simulation settings are the same as in Sect. 2.1. The results are presented in Figures 9, 10, 11, 12, 13, 14, 15, and 16. The biases of IW-GVEM are closer to 0 than those of the IW-VB method under all simulation settings, and the RMSEs of our proposed method are substantially smaller than those of the IW-VB method of Liu et al. (2022).
1.2 A.2: Comparing IW-GVEM with Joint Maximum Likelihood Method
The joint maximum likelihood (JML) estimator is a computationally efficient estimator with established theoretical consistency. Chen et al. (2019) proved that the JML estimator is consistent under high-dimensional settings and that it outperforms marginal maximum likelihood approaches in terms of computational cost. However, unlike our IW-GVEM method, the JML method treats the latent abilities as fixed-effect parameters rather than random variables, which may constrain its performance in settings where the latent factors are correlated. JML estimation is also inconsistent when the number of items is fixed and the sample size grows to infinity: because the number of parameters in the joint likelihood function then grows to infinity, the standard theory for maximum likelihood estimation does not directly apply and consistent point estimation for each item cannot be attained, which is known as the Neyman–Scott phenomenon (Neyman & Scott, 1948).
Extensive simulation studies were conducted in Cho et al. (2021) to compare GVEM with the JML method under the same simulation settings (sample sizes, within- or between-item multidimensional structures, factor correlations, etc.) and using the same evaluation criteria (bias and RMSE) as in Sect. 2.1. Specifically, Figures 3 and 4 of Cho et al. (2021) compared the bias and RMSE of GVEM and JML and showed that GVEM has much lower bias and RMSE than JML across all settings. In challenging cases, such as the within-item structure with high factor correlation, the JML estimator performs even worse. This may be explained by the fact that the latent factors are fixed effects in JML, whereas GVEM treats them as random effects with a multivariate Gaussian distribution that accounts for the correlations among factors.
As an improvement of the GVEM method, our IW-GVEM method outperforms GVEM in confirmatory factor analysis and has overall comparable performance to GVEM in exploratory factor analysis, across all simulation settings. For a detailed comparison of the simulation results of IW-GVEM and GVEM, please refer to Sect. 2.2. Because our IW-GVEM is comparable to, if not better than, GVEM, its performance is also better than that of JML under our simulation settings.
Appendix B: Additional Simulation Study
In this section, we present finite-sample simulation studies to show that our proposed IW-GVEM substantially improves the ELBO over GVEM. For illustration, we consider four settings under \(N = 200\) and \(J = 30\): (1) within-item and low factor correlation; (2) between-item and low factor correlation; (3) within-item and high factor correlation; (4) between-item and high factor correlation. For each setting, we compute the ELBOs from the GVEM algorithm and the importance-weighted ELBOs for sample sizes \(M = 5, 10, 50\), and 100 at the importance sampling step over 100 replications. The resulting ELBOs are presented in Fig. 17, which shows that the importance sampling step leads to a tighter importance-weighted ELBO (\(M = 5, 10, 50, 100\)) than that of GVEM. As the number of importance samples \(M\) increases, the ELBOs converge, which is consistent with the theoretical results in Proposition 1.
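The tightening of the importance-weighted ELBO as \(M\) grows can be illustrated with a minimal sketch on a toy conjugate model rather than the MIRT model of this note: a latent variable \(z \sim N(0,1)\), an observation \(y \mid z \sim N(z,1)\), and a deliberately mismatched variational distribution \(q(z) = N(0,1)\). The function and variable names below are illustrative, not part of the authors' implementation; the sketch only demonstrates the general fact that the log-mean-exp of \(M\) importance weights gives a bound that lies between the standard ELBO (\(M = 1\)) and the true log marginal likelihood.

```python
import numpy as np

def norm_logpdf(x, mu, sd):
    """Log density of N(mu, sd^2), vectorized over x."""
    return -0.5 * np.log(2.0 * np.pi * sd**2) - (x - mu) ** 2 / (2.0 * sd**2)

def iw_elbo(y, M, n_rep=4000, seed=0):
    """Monte Carlo estimate of the M-sample importance-weighted ELBO for the
    toy model z ~ N(0, 1), y | z ~ N(z, 1), with variational distribution
    q(z) = N(0, 1). Setting M = 1 recovers the standard ELBO."""
    rng = np.random.default_rng(seed)
    z = rng.normal(0.0, 1.0, size=(n_rep, M))   # n_rep batches of M draws from q
    log_w = (norm_logpdf(y, z, 1.0)             # log p(y | z)
             + norm_logpdf(z, 0.0, 1.0)         # + log p(z)
             - norm_logpdf(z, 0.0, 1.0))        # - log q(z)
    # log-mean-exp over the M importance weights, averaged over replications
    return np.mean(np.logaddexp.reduce(log_w, axis=1) - np.log(M))

y = 1.0
true_logml = norm_logpdf(y, 0.0, np.sqrt(2.0))  # exact log marginal likelihood
elbo = iw_elbo(y, M=1)                          # standard ELBO (looser bound)
iw50 = iw_elbo(y, M=50)                         # importance-weighted, M = 50
```

Here `iw50` lies much closer to `true_logml` than `elbo` does, and the gap shrinks as \(M\) increases, mirroring the behavior of the importance-weighted ELBOs in Fig. 17 and Proposition 1.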
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Ma, C., Ouyang, J., Wang, C. et al. A Note on Improving Variational Estimation for Multidimensional Item Response Theory. Psychometrika (2023). https://doi.org/10.1007/s11336-023-09939-0
DOI: https://doi.org/10.1007/s11336-023-09939-0