Abstract
Item cloning is increasingly used to generate tasks that differ slightly from one another for use in psychological experiments and educational assessments. This paper investigates the psychometric issues that arise when item cloning introduces variation into the difficulty parameters of the item clones. Four models are proposed and evaluated in simulation studies whose conditions represent the types of variation that item cloning may introduce. Depending on the model specified, unaccounted-for variance in the item clone difficulties propagates to other parameters in the model, causing specific and predictable patterns of bias. Person parameters are largely unaffected by the choice of model, but for inferences about the item parameters the choice is critical, and it can even be leveraged to identify problematic item cloning.
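The following sketch illustrates one way that ignored clone variation can leak into other item parameters. It assumes a normal-ogive (probit) two-parameter model in which every clone in a family shares the discrimination a, and each clone's difficulty is the family difficulty b plus a normal deviation with standard deviation sigma_clone. Averaging over clones yields a curve that is again normal-ogive with the same location b but a flatter slope, a / sqrt(1 + (a * sigma_clone)^2). A single curve fitted per family would therefore tend to understate discrimination. The function names and parameter values are hypothetical. This is a standard textbook mechanism and not the four models or the simulation results of this paper. Under a logistic link the same relation holds only approximately.

```python
import math
import random
from statistics import NormalDist

# Standard normal CDF: the normal-ogive (probit) link assumed in this sketch.
PHI = NormalDist().cdf


def clone_response_prob(theta, a, b, sigma_clone, n_draws=200_000, seed=1):
    """Monte Carlo estimate of P(correct) for a randomly drawn clone.

    Assumption (hypothetical): each clone's difficulty is b + e with
    e ~ N(0, sigma_clone**2); the discrimination a is shared by all clones.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        b_clone = b + rng.gauss(0.0, sigma_clone)  # draw one clone's difficulty
        total += PHI(a * (theta - b_clone))
    return total / n_draws


def attenuated_slope(a, sigma_clone):
    """Slope of the family-level (marginal) curve: a / sqrt(1 + (a * sigma)^2)."""
    return a / math.sqrt(1.0 + (a * sigma_clone) ** 2)


def marginal_prob(theta, a, b, sigma_clone):
    """Closed form of the same probability: a probit curve with location b
    and the attenuated slope (exact for the normal-ogive link)."""
    return PHI(attenuated_slope(a, sigma_clone) * (theta - b))


if __name__ == "__main__":
    a, b, sigma = 1.5, 0.0, 0.5
    print(attenuated_slope(a, sigma))  # 1.2
    for theta in (-1.0, 0.0, 0.8, 2.0):
        mc = clone_response_prob(theta, a, b, sigma)
        exact = marginal_prob(theta, a, b, sigma)
        naive = PHI(a * (theta - b))  # clone-level curve that ignores clone variation
        print(f"theta={theta:+.1f}  MC={mc:.4f}  marginal={exact:.4f}  ignoring={naive:.4f}")
```

At theta = b all three columns are at or near 0.5. Away from b, the Monte Carlo and closed-form columns should agree closely, while the curve that ignores clone variation is steeper, which is the sense in which unmodeled difficulty variance shows up as a biased discrimination parameter in this simplified setting.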
Additional information
The views and opinions expressed in this article are those of the authors and do not necessarily reflect those of their institutions.
Appendix
Stan Code
Below is the Stan code for the 2P-F and 2P-RX models. The code for the 2P-FX and 2P-R models can be inferred from it. The code is not optimized for speed; it is written to be human readable.
Cite this article
Lathrop, Q.N., Cheng, Y. Item Cloning Variation and the Impact on the Parameters of Response Models. Psychometrika 82, 245–263 (2017). https://doi.org/10.1007/s11336-016-9513-1