Additive Multilevel Item Structure Models with Random Residuals: Item Modeling for Explanation and Item Generation

Cho, Sun-Joo; De Boeck, Paul; Embretson, Susan; Rabe-Hesketh, Sophia

doi:10.1007/s11336-013-9360-2

Published: 12 December 2013

Psychometrika, Volume 79, pages 84–104 (2014)


Abstract

An additive multilevel item structure (AMIS) model with random residuals is proposed. The model includes multilevel latent regressions of item discrimination and item difficulty parameters on covariates at both item and item category levels with random residuals at both levels. The AMIS model is useful for explanation purposes and also for prediction purposes as in an item generation context. The parameters can be estimated with an alternating imputation posterior algorithm that makes use of adaptive quadrature, and the performance of this algorithm is evaluated in a simulation study.
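As a rough illustration of the structure the abstract describes, the sketch below writes out one possible additive two-level specification. It is not taken from the article: the two-parameter logistic form, the symbols (theta_p, alpha_i, beta_i, the covariate vectors x_i and z_k, the coefficient vectors gamma and delta, and the residuals epsilon and u), and the normality of the residuals are all assumptions made here for illustration, and the article's own parameterization may differ.

```latex
% Illustrative sketch only; notation and distributional choices are assumptions.
% Person p responds to item i, which belongs to item category k(i).
\begin{align}
  \operatorname{logit} \Pr(y_{pi} = 1 \mid \theta_p)
    &= \alpha_i \theta_p - \beta_i, \\
  % Difficulty: item-level and category-level covariate effects, plus residuals at both levels
  \beta_i
    &= \mathbf{x}_i^{\top} \boldsymbol{\gamma}^{(1)}
     + \mathbf{z}_{k(i)}^{\top} \boldsymbol{\gamma}^{(2)}
     + \epsilon^{(\beta)}_{i} + u^{(\beta)}_{k(i)}, \\
  % Discrimination: the same additive structure with its own coefficients and residuals
  \alpha_i
    &= \mathbf{x}_i^{\top} \boldsymbol{\delta}^{(1)}
     + \mathbf{z}_{k(i)}^{\top} \boldsymbol{\delta}^{(2)}
     + \epsilon^{(\alpha)}_{i} + u^{(\alpha)}_{k(i)}, \\
  \epsilon^{(\cdot)}_{i} &\sim N\bigl(0, \sigma^{2}_{\epsilon(\cdot)}\bigr), \qquad
  u^{(\cdot)}_{k} \sim N\bigl(0, \sigma^{2}_{u(\cdot)}\bigr).
\end{align}
```

Under these assumptions, the covariate terms carry the explanatory part of the model, while the residual variances indicate how much a newly generated item from a known category might deviate from its predicted parameters.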



Author information

Authors and Affiliations

Vanderbilt University, Nashville, USA
Sun-Joo Cho
Ohio State University, Columbus, USA
Paul De Boeck
KU Leuven, Leuven, Belgium
Paul De Boeck
Georgia Institute of Technology, Atlanta, USA
Susan Embretson
University of California, Berkeley, USA
Sophia Rabe-Hesketh
Institute of Education, University of London, London, UK
Sophia Rabe-Hesketh

Corresponding author

Correspondence to Sun-Joo Cho.

About this article

Cite this article

Cho, SJ., De Boeck, P., Embretson, S. et al. Additive Multilevel Item Structure Models with Random Residuals: Item Modeling for Explanation and Item Generation. Psychometrika 79, 84–104 (2014). https://doi.org/10.1007/s11336-013-9360-2

Received: 10 June 2011
Published: 12 December 2013
Issue Date: January 2014
DOI: https://doi.org/10.1007/s11336-013-9360-2
