Abstract
Hooker, Finkelman, and Schwartzman (Psychometrika, 2009) defined a paradoxical result as the attainment of a higher test score by changing answers from correct to incorrect, and demonstrated that such results are unavoidable for maximum likelihood estimates in multidimensional item response theory. The potential for these results leads to the undesirable possibility that a subject's best answer may be detrimental to them. This paper considers the existence of paradoxical results in tests composed of item bundles when compensatory models are used. We demonstrate that paradoxical results can occur when bundle effects are modeled as nuisance parameters for each subject. However, when these nuisance parameters are modeled as random effects, or used in a Bayesian analysis, it is possible to design tests composed of many short bundles that avoid paradoxical results, and we provide an algorithm for doing so. We also examine alternative models for handling dependence between item bundles and show that using fixed dependency effects is guaranteed to avoid paradoxical results.
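The definition of a paradoxical result can be made concrete with a small numerical sketch. It is not the item-bundle analysis of this paper: it assumes a plain two-dimensional compensatory logistic (M2PL) model, uses hypothetical item parameters and a hypothetical response pattern, and computes the maximum likelihood estimate by Newton-Raphson before and after one correct answer is changed to incorrect.

```python
import math

# Hypothetical 2-D compensatory logistic (M2PL) test, for illustration only.
# Each item is (loadings a = (a1, a2), intercept d), with
# P(correct) = logistic(a1 * theta1 + a2 * theta2 + d).
ITEMS = [
    ((1.0, 0.0), 0.0), ((1.0, 0.0), 0.0),  # load only on dimension 1
    ((0.0, 1.0), 0.0), ((0.0, 1.0), 0.0),  # load only on dimension 2
    ((1.0, 1.0), 0.0), ((1.0, 1.0), 0.0),  # load on both dimensions
]


def mle(items, y, iters=100, tol=1e-10):
    """Newton-Raphson MLE of theta = (theta1, theta2).

    Assumes the MLE is finite, i.e. the response pattern is not extreme.
    """
    t1, t2 = 0.0, 0.0
    for _ in range(iters):
        g1 = g2 = i11 = i12 = i22 = 0.0
        for ((a1, a2), d), yj in zip(items, y):
            p = 1.0 / (1.0 + math.exp(-(a1 * t1 + a2 * t2 + d)))
            w = p * (1.0 - p)
            # Score (gradient of the log-likelihood) and Fisher information.
            g1 += a1 * (yj - p)
            g2 += a2 * (yj - p)
            i11 += w * a1 * a1
            i12 += w * a1 * a2
            i22 += w * a2 * a2
        det = i11 * i22 - i12 * i12
        # Newton step: theta <- theta + I^{-1} g, with the 2x2 inverse written out.
        s1 = (i22 * g1 - i12 * g2) / det
        s2 = (i11 * g2 - i12 * g1) / det
        t1, t2 = t1 + s1, t2 + s2
        if abs(s1) + abs(s2) < tol:
            break
    return t1, t2


y_before = [1, 0, 1, 1, 1, 0]
y_after = [1, 0, 0, 1, 1, 0]  # third item (dimension 2 only) changed to incorrect

before = mle(ITEMS, y_before)  # roughly (-0.76, 1.51)
after = mle(ITEMS, y_after)    # (0.0, 0.0)
print("theta1 before:", round(before[0], 3), "after:", round(after[0], 3))
print("paradoxical for theta1:", after[0] > before[0])
```

In this sketch the estimate of the first ability rises after a correct answer is taken away, which is exactly the paradox defined above. The sign can be read from the Fisher information I(θ) = Σ_j P_j(1 − P_j) a_j a_jᵀ: with nonnegative loadings, and provided the MLE exists for both response patterns, removing a correct answer on an item with loadings (0, 1) moves the estimate in the direction −I(θ)⁻¹(0, 1)ᵀ, whose first component is I₁₂ / det I > 0 whenever some item loads on both dimensions. So θ̂₁ increases while θ̂₂ decreases.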
References
Ackerman, T. (1996). Graphical representation of multidimensional item response theory analyses. Applied Psychological Measurement, 20(4), 311–329.
Bock, R., Gibbons, R., & Muraki, E. (1988). Full-information item factor analysis. Applied Psychological Measurement, 12, 261–280.
Craven, B.D. (1988). Fractional programming. Berlin: Heldermann.
Douglas, J.A., Roussos, L.A., & Stout, W. (1996). Item-bundle DIF hypothesis testing: Identifying suspect bundles and assessing their differential functioning. Journal of Educational Measurement, 33, 465–484.
Finkelman, M., Hooker, G., & Wang, J. (2009). Technical Report BU-1768-M, Department of Biological Statistics and Computational Biology, Cornell University.
Hooker, G., Finkelman, M., & Schwartzman, A. (2009). Paradoxical results in multidimensional item response theory. Psychometrika, 74(3), 419–442.
Hoskens, M., & de Boeck, P. (1997). A parametric model for local dependence among test items. Psychological Methods, 2, 261–277.
Kelderman, H. (1984). Loglinear Rasch model tests. Psychometrika, 49, 223–245.
Li, Y., Bolt, D.M., & Fu, J. (2006). A comparison of alternative models for testlets. Applied Psychological Measurement, 30(1), 3–21.
McCullagh, P., & Nelder, J.A. (1989). Generalized linear models. London: Chapman and Hall/CRC.
Reckase, M. (1985). The difficulty of test items that measure more than one ability. Applied Psychological Measurement, 9, 401–412.
Rijmen, F., Tuerlinckx, F., de Boeck, P., & Kuppens, P. (2003). A nonlinear mixed model framework for item response theory. Psychological Methods, 8(2), 185–205.
Rosenbaum, P.R. (1988). Item bundles. Psychometrika, 53, 349–359.
Veldkamp, B.P. (2002). Multidimensional constrained test assembly. Applied Psychological Measurement, 26(2), 133–146.
Wang, W., & Wilson, M. (2005). The Rasch testlet model. Applied Psychological Measurement, 29(2), 126–149.
Wang, X., Bradlow, E.T., & Wainer, H. (2002). A general Bayesian model for testlets: Theory and applications. Applied Psychological Measurement, 26(1), 109–128.
Wilson, M., & Adams, R.J. (1995). Rasch models for item bundles. Psychometrika, 60, 181–198.
Additional information
The authors would like to thank an anonymous referee of Hooker et al. (2009) for suggesting the problem of item bundles.
Cite this article
Hooker, G., & Finkelman, M. (2010). Paradoxical results and item bundles. Psychometrika, 75, 249–271. https://doi.org/10.1007/s11336-009-9143-y