Abstract
An additive multilevel item structure (AMIS) model with random residuals is proposed. The model includes multilevel latent regressions of item discrimination and item difficulty parameters on covariates at both the item and item category levels, with random residuals at both levels. The AMIS model is useful for explanation and for prediction, as in an item generation context. The parameters can be estimated with an alternating imputation posterior algorithm that makes use of adaptive quadrature; the performance of this algorithm is evaluated in a simulation study.
Cho, SJ., De Boeck, P., Embretson, S. et al. Additive Multilevel Item Structure Models with Random Residuals: Item Modeling for Explanation and Item Generation. Psychometrika 79, 84–104 (2014). https://doi.org/10.1007/s11336-013-9360-2