Abstract
We develop a latent variable selection method for multidimensional item response theory models. The proposed method identifies the latent traits probed by the items of a multidimensional test. Its basic strategy is to impose an \(L_{1}\) penalty term on the log-likelihood. The computation is carried out by the expectation–maximization algorithm combined with the coordinate descent algorithm. Simulation studies show that the resulting estimator is effective in correctly identifying the latent structures. The method is applied to a real dataset involving the Eysenck Personality Questionnaire.
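The expectation step of such an EM scheme turns the observed responses into expected counts, which then define the item-level objective \(\hat{Q}_j\) that the coordinate descent in the Appendix maximizes. The minimal Python sketch below is an illustration under our own assumptions, not the paper's specification: the item model \(P(Y_{ij}=1\mid\boldsymbol\theta)=1/(1+\exp\{-(\mathbf{a}_j^{\top}\boldsymbol\theta+b_j)\})\), independent standard normal traits, and a fixed product quadrature grid. Under those assumptions the counts \(n_g\) (expected examinees at grid point \(\theta_g\)) and \(r_{jg}\) (expected correct responses to item j there) give \(\hat{Q}_j=\sum_g [r_{jg}\log p_j(\theta_g)+(n_g-r_{jg})\log(1-p_j(\theta_g))]\). All function names and the toy data are hypothetical.

```python
import itertools
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def normal_grid(points, K):
    """Product grid over K traits, weighted by an independent standard
    normal density (an assumed prior), normalized to sum to 1."""
    grid = [list(t) for t in itertools.product(points, repeat=K)]
    w = [math.exp(-0.5 * sum(x * x for x in t)) for t in grid]
    total = sum(w)
    return grid, [wi / total for wi in w]

def e_step(Y, A, b, grid, prior):
    """Expected counts n_g and r_jg on a quadrature grid (hypothetical helper).

    Y: N x J responses in {0, 1}; A: J x K discriminations; b: J intercepts.
    Assumed item model: P(Y_ij = 1 | theta) = sigmoid(a_j . theta + b_j).
    """
    G, J = len(grid), len(b)
    P = [[sigmoid(sum(a * t for a, t in zip(A[j], grid[g])) + b[j])
          for g in range(G)] for j in range(J)]
    n = [0.0] * G
    r = [[0.0] * G for _ in range(J)]
    for y in Y:
        # Prior times likelihood of this response pattern at each grid point
        w = []
        for g in range(G):
            lik = prior[g]
            for j in range(J):
                lik *= P[j][g] if y[j] == 1 else 1.0 - P[j][g]
            w.append(lik)
        total = sum(w)
        # Accumulate posterior weights into the expected counts
        for g in range(G):
            post = w[g] / total
            n[g] += post
            for j in range(J):
                r[j][g] += post * y[j]
    return n, r

grid, prior = normal_grid([-2.0, -1.0, 0.0, 1.0, 2.0], K=2)
Y = [[1, 0, 1], [0, 0, 1], [1, 1, 1], [0, 1, 0]]
A = [[1.0, 0.0], [0.0, 1.2], [0.8, 0.5]]
b = [0.0, -0.5, 0.3]
n, r = e_step(Y, A, b, grid, prior)
print(round(sum(n), 6), [round(sum(row), 6) for row in r])  # → 4.0 [2.0, 2.0, 3.0]
```

The counts sum to the number of examinees and, per item, to its number of correct responses. For long tests the product of probabilities underflows, so a practical version would accumulate log-likelihoods instead.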
References
Ackerman, T. A. (1989). Unidimensional IRT calibration of compensatory and noncompensatory multidimensional items. Applied Psychological Measurement, 13, 113–127.
Ackerman, T. A. (1994). Using multidimensional item response theory to understand what items and tests are measuring. Applied Measurement in Education, 7, 255–278.
Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19, 716–723.
Ansley, T. N., & Forsyth, R. A. (1985). An examination of the characteristics of unidimensional IRT parameter estimates derived from two-dimensional data. Applied Psychological Measurement, 9, 37–48.
Béguin, A. A., & Glas, C. A. (2001). MCMC estimation and some model-fit analysis of multidimensional IRT models. Psychometrika, 66, 541–561.
Bock, R. D., Gibbons, R., & Muraki, E. (1988). Full-information item factor analysis. Applied Psychological Measurement, 12, 261–280.
Bock, R. D., Gibbons, R., Schilling, S., Muraki, E., Wilson, D., & Wood, R. (2003). TESTFACT 4.0 [Computer software and manual]. Lincolnwood, IL: Scientific Software International.
Bolt, D. M., & Lall, V. F. (2003). Estimation of compensatory and noncompensatory multidimensional item response models using Markov chain Monte Carlo. Applied Psychological Measurement, 27, 395–414.
Cai, L. (2010). High-dimensional exploratory item factor analysis by a Metropolis–Hastings Robbins–Monro algorithm. Psychometrika, 75, 33–57.
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B (Methodological), 39, 1–38.
Donoho, D. L., & Johnstone, I. M. (1995). Adapting to unknown smoothness via wavelet shrinkage. Journal of the American Statistical Association, 90, 1200–1224.
Embretson, S. E. (1984). A general latent trait model for response processes. Psychometrika, 49, 175–186.
Embretson, S. E., & Reise, S. P. (2000). Psychometric methods: Item response theory for psychologists. Mahwah, NJ: Lawrence Erlbaum Associates.
Eysenck, S., & Barrett, P. (2013). Re-introduction to cross-cultural studies of the EPQ. Personality and Individual Differences, 54(4), 485–489.
Fraser, C., & McDonald, R. P. (1988). NOHARM: Least squares item factor analysis. Multivariate Behavioral Research, 23, 267–269.
Friedman, J., Hastie, T., Höfling, H., & Tibshirani, R. (2007). Pathwise coordinate optimization. The Annals of Applied Statistics, 1, 302–332.
Friedman, J., Hastie, T., & Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1), 1–22.
Jöreskog, K. G. (1969). A general approach to confirmatory maximum likelihood factor analysis. Psychometrika, 34, 183–202.
Kang, T. (2006). Model selection methods for unidimensional and multidimensional IRT models. Madison, WI: University of Wisconsin.
Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.
Mallows, C. L. (1973). Some comments on Cp. Technometrics, 15, 661–675.
Maydeu-Olivares, A., & Liu, Y. (2015). Item diagnostics in multivariate discrete data. Psychological Methods, 20, 276–292.
McDonald, R. P. (1967). Nonlinear factor analysis. Psychometric Monographs, No. 15. Richmond, VA: Psychometric Corporation.
McDonald, R. P. (1982). Linear versus nonlinear models in item response theory. Applied Psychological Measurement, 6, 379–396.
McKinley, R. L. (1989). Confirmatory analysis of test structure using multidimensional item response theory. Technical Report No. RR-89-31. Princeton, NJ: Educational Testing Service.
McKinley, R. L., & Reckase, M. D. (1982). The use of the general Rasch model with multidimensional item response data. Technical Report No. ONR-82-1. Iowa City, IA: American College Testing Program.
Reckase, M. D. (1972). Development and application of a multivariate logistic latent trait model. Unpublished Doctoral Dissertation, Syracuse University, Syracuse, NY.
Reckase, M. D. (1997). The past and future of multidimensional item response theory. Applied Psychological Measurement, 21, 25–36.
Reckase, M. D. (2009). Multidimensional item response theory. New York: Springer.
Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6, 461–464.
Spiegelhalter, D. J., Best, N. G., Carlin, B. P., & Van Der Linde, A. (2002). Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 64, 583–639.
Svetina, D., & Levy, R. (2012). An overview of software for conducting dimensionality assessment in multidimensional models. Applied Psychological Measurement, 36, 659–669.
Sympson, J. B. (1978). A model for testing with multidimensional items. In D. J. Weiss (Ed.), Proceedings of the 1977 computerized adaptive testing conference (pp. 82–98).
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58, 267–288.
Way, W. D., Ansley, T. N., & Forsyth, R. A. (1988). The comparative effects of compensatory and noncompensatory two-dimensional data on unidimensional IRT estimates. Applied Psychological Measurement, 12, 239–252.
Acknowledgments
This research was funded by the Fundamental Research Funds for the Central Universities (No. BLX2014-31), NSF grant SES-1323977, NSF grant IIS-1633360, Army Research Office grant W911NF-15-1-0159, NIH grant R01GM047845, and the National Natural Science Foundation of China (31371047; 11171029). We would also like to thank Dr. Paul Barrett for letting us use the EPQ-R data.
Appendix
The cyclical coordinate descent algorithm for solving the optimization problem (12) is as follows. For each item j, there is one difficulty parameter \(b_j\) and there are K discrimination parameters \(\mathbf {a}_j=(a_{j1},\ldots ,a_{jK})\). The algorithm updates each of the \(K+1\) parameters in turn according to the following rules. The difficulty parameter carries no \(L_1\) penalty and is updated by

\[ b_j \leftarrow b_j - \frac{\partial_{b_j}\hat{Q}_{j}}{\partial^{2}_{b_j}\hat{Q}_{j}}, \]
where \(\partial_{b_j}\hat{Q}_{j}\) and \(\partial^{2}_{b_j}\hat{Q}_{j}\) denote the first and second derivatives of \(\hat{Q}_{j}(\mathbf {a}_{j},\,b_{j}\mid \mathbf {a}_{j}^{(t)},\,b_{j}^{(t)})\) with respect to \(b_{j}\), evaluated at the current parameter values; derivatives with respect to \(a_{jk}\) are labeled by the subscript in the same way. During this update, the discrimination vector \(\mathbf {a}_j\) takes its most up-to-date value. The update maximizes a local quadratic approximation of \(\hat{Q}_{j}\) as a function of \(b_j\), with all other parameters fixed. Each discrimination parameter \(a_{jk}\) carries an \(L_1\) penalty and is updated by

\[ a_{jk} \leftarrow -\frac{S\big(\partial_{a_{jk}}\hat{Q}_{j} - \partial^{2}_{a_{jk}}\hat{Q}_{j}\, a_{jk},\ \lambda\big)}{\partial^{2}_{a_{jk}}\hat{Q}_{j}}, \]

where \(\lambda > 0\) is the tuning parameter of the \(L_1\) penalty and the right-hand side is evaluated at the current parameter values.
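The difficulty update \(b_j \leftarrow b_j - \partial_{b_j}\hat{Q}_j/\partial^{2}_{b_j}\hat{Q}_j\) and the soft-thresholded \(a_{jk}\) updates together form one cyclic sweep over the K+1 parameters of item j. The minimal Python sketch below assumes, as its own illustration rather than the paper's specification, the item model \(p_g = 1/(1+\exp\{-(\mathbf{a}_j^{\top}\theta_g+b_j)\})\) and the expected-count form \(\hat{Q}_j=\sum_g [r_g\log p_g+(n_g-r_g)\log(1-p_g)]\) over quadrature points \(\theta_g\). Under that assumption \(\partial_{b_j}\hat{Q}_j=\sum_g (r_g-n_g p_g)\) and \(\partial^{2}_{b_j}\hat{Q}_j=-\sum_g n_g p_g(1-p_g)\), while the \(a_{jk}\) derivatives carry the extra factors \(\theta_{gk}\) and \(\theta_{gk}^{2}\). The argument `lam` plays the role of \(\lambda\); function names and the counts in the usage lines are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def soft_threshold(z, lam):
    # S(z, lam) = sign(z) * max(|z| - lam, 0); exact zero when |z| <= lam
    if abs(z) <= lam:
        return 0.0
    return z - lam if z > 0 else z + lam

def derivs(a, b, grid, n, r, k=None):
    """First and second derivatives of the assumed Q_j with respect to
    b (k is None) or to a[k], evaluated at the current (a, b)."""
    d1 = d2 = 0.0
    for theta, n_g, r_g in zip(grid, n, r):
        p = sigmoid(sum(ai * ti for ai, ti in zip(a, theta)) + b)
        x = 1.0 if k is None else theta[k]  # d(a.theta + b) / d(parameter)
        d1 += (r_g - n_g * p) * x
        d2 -= n_g * p * (1.0 - p) * x * x
    return d1, d2

def coordinate_sweeps(a, b, grid, n, r, lam, n_sweeps=50):
    """Cyclic coordinate descent over (b, a_1, ..., a_K) for one item."""
    a = list(a)
    for _ in range(n_sweeps):
        # b: unpenalized maximizer of the local quadratic approximation
        d1, d2 = derivs(a, b, grid, n, r)
        b = b - d1 / d2
        # a_k: penalized update via soft thresholding, using the newest values
        for k in range(len(a)):
            d1, d2 = derivs(a, b, grid, n, r, k)
            a[k] = -soft_threshold(d1 - d2 * a[k], lam) / d2
    return a, b

# Hypothetical expected counts generated from a = (1.5, 0.0), b = 0.2
grid = [[t1, t2] for t1 in (-2, -1, 0, 1, 2) for t2 in (-2, -1, 0, 1, 2)]
n = [10.0] * len(grid)
r = [n_g * sigmoid(1.5 * th[0] + 0.2) for th, n_g in zip(grid, n)]
a_hat, b_hat = coordinate_sweeps([0.0, 0.0], 0.0, grid, n, r, lam=1.0)
print(a_hat, b_hat)
```

With `lam=1.0` the second discrimination stays exactly zero while the first is shrunk slightly below 1.5. A production version would likely safeguard the Newton steps and stop on a convergence criterion rather than after a fixed number of sweeps.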
The function S is the soft-threshold operator (Donoho & Johnstone, 1995):

\[ S(z, \lambda) = \operatorname{sign}(z)\,\max(|z| - \lambda,\, 0). \]
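Because S returns an exact zero whenever \(|z| \le \lambda\) and otherwise shrinks \(z\) toward zero by \(\lambda\), the penalized update can set a discrimination parameter \(a_{jk}\) exactly to zero, which is what allows the method to conclude that item j does not load on trait k. Equivalently, \(S(z,\lambda)\) is the minimizer of \(\tfrac{1}{2}(x-z)^2+\lambda|x|\). A minimal Python sketch (the function name is ours):

```python
def soft_threshold(z, lam):
    """Soft-threshold operator S(z, lam) = sign(z) * max(|z| - lam, 0), lam >= 0."""
    if abs(z) <= lam:
        return 0.0  # exact zero: small signals are removed, not merely shrunk
    return z - lam if z > 0 else z + lam

values = [-3.0, -0.4, 0.0, 0.7, 2.5]
print([soft_threshold(z, 1.0) for z in values])  # → [-2.0, 0.0, 0.0, 0.0, 1.5]
```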
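To obtain the above updating rules, we approximate a generic univariate function f(x) around a point \(x_0\) by the quadratic function

\[ f(x) \approx f(x_0) + f^{\prime }(x_0)(x - x_0) + \frac{1}{2} f^{\prime \prime }(x_0)(x - x_0)^{2}, \]

where \(f^{\prime \prime }(x_0)\) is negative. Without a penalty, this quadratic is maximized at \(x = x_0 - f^{\prime }(x_0)/f^{\prime \prime }(x_0)\), which yields the update for \(b_j\). Furthermore, for a penalty weight \(\lambda > 0\), the \(L_1\)-penalized maximization with the approximated function,

\[ \max_{x} \Big\{ f(x_0) + f^{\prime }(x_0)(x - x_0) + \frac{1}{2} f^{\prime \prime }(x_0)(x - x_0)^{2} - \lambda |x| \Big\}, \]

is solved at

\[ x^{*} = -\frac{S\big(f^{\prime }(x_0) - f^{\prime \prime }(x_0)\,x_0,\ \lambda\big)}{f^{\prime \prime }(x_0)}. \]

Indeed, up to an additive constant the objective equals \(-\tfrac{c}{2}x^{2} + zx - \lambda |x|\) with \(c = -f^{\prime \prime }(x_0) > 0\) and \(z = f^{\prime }(x_0) - f^{\prime \prime }(x_0)\,x_0\), whose maximizer is \(S(z, \lambda)/c\). This yields the update for \(a_{jk}\) when \(f = \hat{Q}_j\) is viewed as a function of \(a_{jk}\) and \(x_0\) is its current value.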
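The closed-form solution \(x^{*} = -S(f^{\prime }(x_0) - f^{\prime \prime }(x_0)x_0, \lambda)/f^{\prime \prime }(x_0)\) of the penalized quadratic maximization can be checked numerically. The Python sketch below (all names are illustrative) compares it with a brute-force search over a fine grid for one concave quadratic; with \(\lambda = 0\) it reduces to the Newton step \(x_0 - f^{\prime }(x_0)/f^{\prime \prime }(x_0)\).

```python
def soft_threshold(z, lam):
    # S(z, lam) = sign(z) * max(|z| - lam, 0)
    if abs(z) <= lam:
        return 0.0
    return z - lam if z > 0 else z + lam

def penalized_quadratic_argmax(fp, fpp, x0, lam):
    """argmax_x of f(x0) + fp*(x - x0) + 0.5*fpp*(x - x0)**2 - lam*|x|.

    fp and fpp stand for f'(x0) and f''(x0); fpp < 0 keeps the problem concave.
    """
    if fpp >= 0:
        raise ValueError("f''(x0) must be negative")
    return -soft_threshold(fp - fpp * x0, lam) / fpp

def objective(x, fp, fpp, x0, lam):
    # The constant f(x0) is dropped; it does not move the maximizer.
    return fp * (x - x0) + 0.5 * fpp * (x - x0) ** 2 - lam * abs(x)

def grid_argmax(fp, fpp, x0, lam, lo=-3.0, hi=3.0, steps=6000):
    # Brute-force check on an evenly spaced grid (spacing 0.001 by default)
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return max(xs, key=lambda x: objective(x, fp, fpp, x0, lam))

for lam in (0.0, 0.5, 2.0):
    closed = penalized_quadratic_argmax(0.3, -2.0, 0.8, lam)
    print(lam, closed, grid_argmax(0.3, -2.0, 0.8, lam))
```

Passing `fp` \(= \partial_{a_{jk}}\hat{Q}_j\), `fpp` \(= \partial^{2}_{a_{jk}}\hat{Q}_j\) and `x0` equal to the current \(a_{jk}\) gives the discrimination update; the grid search is only a check and would not be part of an actual fit.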
Cite this article
Sun, J., Chen, Y., Liu, J. et al. Latent Variable Selection for Multidimensional Item Response Theory Models via \(L_{1}\) Regularization. Psychometrika 81, 921–939 (2016). https://doi.org/10.1007/s11336-016-9529-6