Additive Multilevel Item Structure Models with Random Residuals: Item Modeling for Explanation and Item Generation

Psychometrika

Abstract

An additive multilevel item structure (AMIS) model with random residuals is proposed. The model includes multilevel latent regressions of the item discrimination and item difficulty parameters on covariates at both the item and the item-category levels, with random residuals at both levels. The AMIS model is useful both for explanation and for prediction, as in an item generation context. The parameters can be estimated with an alternating imputation posterior algorithm that makes use of adaptive quadrature, and the performance of this algorithm is evaluated in a simulation study.
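To make the additive two-level structure concrete, the sketch below simulates responses from a model of this general kind and shows how the residual variances enter the prediction of a newly generated item's difficulty. It is an illustration under stated assumptions, not the paper's exact specification: the 2PL response function, the single item-level covariate `x` and category-level covariate `w`, the log scale for discrimination, all coefficient values, and the helper names `simulate_amis` and `predict_difficulty` are hypothetical choices made here. Estimation (the alternating imputation posterior algorithm with adaptive quadrature) is not shown.

```python
import numpy as np


def simulate_amis(n_persons=500, n_categories=10, items_per_category=5, seed=0):
    """Simulate 0/1 responses from a hypothetical two-level item structure.

    Assumptions (illustrative only):
    - items are nested in categories (e.g., item families);
    - one item-level covariate x_i and one category-level covariate w_c;
    - difficulty b_i and log-discrimination log(a_i) are additive in the
      covariate effects plus a normal category residual and a normal item residual;
    - responses follow a 2PL: P(y = 1) = logistic(a_i * (theta_p - b_i)).
    """
    rng = np.random.default_rng(seed)
    n_items = n_categories * items_per_category
    category = np.repeat(np.arange(n_categories), items_per_category)

    # Hypothetical covariates at the item-category and item levels.
    w = rng.binomial(1, 0.5, size=n_categories).astype(float)
    x = rng.normal(size=n_items)

    # Difficulty: fixed covariate effects + category residual + item residual.
    u_b = rng.normal(0.0, 0.5, size=n_categories)
    e_b = rng.normal(0.0, 0.3, size=n_items)
    b = 0.0 + 0.8 * w[category] + 0.5 * x + u_b[category] + e_b

    # Discrimination: same additive structure on the log scale (keeps a_i > 0).
    u_a = rng.normal(0.0, 0.2, size=n_categories)
    e_a = rng.normal(0.0, 0.1, size=n_items)
    a = np.exp(0.0 + 0.2 * w[category] - 0.1 * x + u_a[category] + e_a)

    # Person abilities and 2PL response probabilities (persons x items).
    theta = rng.normal(size=n_persons)
    eta = a[None, :] * (theta[:, None] - b[None, :])
    p = 1.0 / (1.0 + np.exp(-eta))
    y = rng.binomial(1, p)
    return y, a, b, category


def predict_difficulty(x_new, w_new, coef=(0.0, 0.8, 0.5),
                       sd_cat=0.5, sd_item=0.3, u_cat=None):
    """Predictive mean and variance of b for a newly generated item.

    Treats coefficients and residual SDs as known. For an item from a new
    category both residuals are unknown; if an estimated category residual
    u_cat is supplied, only the item-level residual variance remains.
    """
    b0, b_w, b_x = coef
    mean = b0 + b_w * w_new + b_x * x_new
    if u_cat is None:
        return mean, sd_cat ** 2 + sd_item ** 2
    return mean + u_cat, sd_item ** 2


if __name__ == "__main__":
    y, a, b, category = simulate_amis()
    print(y.shape, y.mean().round(3))
    print(predict_difficulty(x_new=1.0, w_new=1.0))
```

Under these assumptions, the residual variances measure how far the covariates fall short of determining the item parameters: the smaller they are, the more precisely the parameters of a newly generated item can be predicted from its design features alone.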




Author information

Correspondence to Sun-Joo Cho.


About this article

Cite this article

Cho, SJ., De Boeck, P., Embretson, S. et al. Additive Multilevel Item Structure Models with Random Residuals: Item Modeling for Explanation and Item Generation. Psychometrika 79, 84–104 (2014). https://doi.org/10.1007/s11336-013-9360-2

  • DOI: https://doi.org/10.1007/s11336-013-9360-2