Item Cloning Variation and the Impact on the Parameters of Response Models

Lathrop, Quinn N.; Cheng, Ying

doi:10.1007/s11336-016-9513-1

Published: 03 October 2016

Volume 82, pages 245–263 (2017)

Psychometrika

Quinn N. Lathrop¹ &
Ying Cheng²

Abstract

Item cloning is increasingly used to generate slight differences in tasks for use in psychological experiments and educational assessments. This paper investigates the psychometric issues that arise when item cloning introduces variation into the difficulty parameters of the item clones. Four models are proposed and evaluated in simulation studies with conditions representing possible types of variation due to item cloning. Depending on the model specified, unaccounted-for variance in the item clone difficulties propagates to other parameters in the model, causing specific and predictable patterns of bias. Person parameters are largely unaffected by the choice of model, but for inferences related to the item parameters, the choice is critical and can even be leveraged to identify problematic item cloning.
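One way to picture how clone variation can "propagate" is sketched below under assumptions of our own, not taken from the paper: a two-parameter logistic (2PL) item with discrimination `a` and parent difficulty `b`, whose clones have difficulties `b + e` with `e ~ N(0, sigma^2)`. A model that treats every clone as the same item effectively sees the clone-averaged response curve. A standard probit approximation to the logistic predicts that this averaged curve is flatter, with effective discrimination roughly `a / sqrt(1 + pi * a^2 * sigma^2 / 8)`. The function names and parameter values are hypothetical and for illustration only.

```python
import numpy as np

def icc_2pl(theta, a, b):
    """Two-parameter logistic (2PL) probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def clone_averaged_icc(theta, a, b, sigma, n_clones=200_000, seed=0):
    """Monte Carlo average of the 2PL curve over clone difficulties b + e, e ~ N(0, sigma^2)."""
    rng = np.random.default_rng(seed)
    b_clones = b + rng.normal(0.0, sigma, size=n_clones)
    # Rows are ability values, columns are simulated clones; average over clones.
    return icc_2pl(theta[:, None], a, b_clones[None, :]).mean(axis=1)

def attenuated_discrimination(a, sigma):
    """Probit-approximation prediction of the slope of the clone-averaged curve."""
    return a / np.sqrt(1.0 + np.pi * a**2 * sigma**2 / 8.0)

theta = np.linspace(-3.0, 3.0, 13)
a, b, sigma = 1.5, 0.0, 0.8                  # hypothetical item and clone-variation values
averaged = clone_averaged_icc(theta, a, b, sigma)
approx = icc_2pl(theta, attenuated_discrimination(a, sigma), b)

print("effective discrimination:", round(float(attenuated_discrimination(a, sigma)), 3))
print("max |averaged - approx|:", float(np.max(np.abs(averaged - approx))))
```

With these values the predicted slope drops from 1.5 to about 1.2, so in this toy setting an analyst who ignores clone variation would see a less discriminating item than any single clone actually is.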

Similar content being viewed by others

Identified and unidentified cases of the fixed-effects 3- and 4-parameter models in item response theory

Article 01 July 2017

Haruhiko Ogasawara

The Use of an Identifiability-Based Strategy for the Interpretation of Parameters in the 1PL-G and Rasch Models

Article 23 January 2019

Paula Fariña, Jorge González & Ernesto San Martín

Seeking the real item difficulty: bias-corrected item difficulty and some consequences in Rasch and IRT modeling

Article Open access 17 June 2022

Jari Metsämuuronen

References

Almond, R. G., Kim, Y. J., Velasquez, G., & Shute, V. J. (2014). How task features impact evidence from assessments embedded in simulations and games. Measurement: Interdisciplinary Research & Perspectives, 12(1–2), 1–33.
Almond, R. G., Steinberg, L. S., & Mislevy, R. J. (2002). Enhancing the design and delivery of assessment systems: A four-process architecture. Journal of Technology, Learning, and Assessment, 1(5), 1–6.
Arendasy, M. E., & Sommer, M. (2012). Using automatic item generation to meet the increasing item demands of high-stakes educational and occupational assessment. Learning and Individual Differences, 22, 112–117.
Arendasy, M. E., Sommer, M., & Mayr, F. (2012). Using automatic item generation to simultaneously construct German and English versions of a word fluency test. Journal of Cross-Cultural Psychology, 43, 464–479.
Bejar, I. I., Lawless, R., Morley, M., Wagner, M., Bennett, R., & Revuelta, J. (2003). A feasibility study of on-the-fly item generation in adaptive testing. Journal of Technology, Learning, and Assessment, 2(3).
Bradlow, E. T., Wainer, H., & Wang, X. (1999). A Bayesian random effects model for testlets. Psychometrika, 64, 153–168.
Buchanan, T. (2002). Online assessment: Desirable or dangerous? Professional Psychology: Research and Practice, 33, 148–154.
Cho, S.-J., de Boeck, P., Embretson, S., & Rabe-Hesketh, S. (2014). Additive multilevel item structure models with random residuals: Item modeling for explanation and item generation. Psychometrika, 79, 84–104.
de Boeck, P. (2008). Random item IRT models. Psychometrika, 73(4), 533–559.
de Boeck, P., & Wilson, M. (2004). Explanatory item response models. New York: Springer.
DiCerbo, K. E., & Behrens, J. T. (2014). Impacts of the digital ocean. London: Pearson.
Embretson, S., & Yang, X. (2007). Automatic item generation and cognitive psychology. In C. Rao & S. Sinharay (Eds.), Handbook of statistics: Psychometrics (pp. 747–768). Amsterdam: Elsevier.
Enright, M. K., & Sheehan, K. M. (2002). Modeling the difficulty of quantitative reasoning items: Implications for item generation. In S. H. Irvine & P. C. Kyllonen (Eds.), Item generation for test development. New York: Psychology Press.
Fischer, G. H. (1973). The linear logistic test model as an instrument in educational research. Acta Psychologica, 37(6), 359–374.
Gatti, G. G. (2013). Digits 2012 efficacy study (Tech. Rep.). London: Pearson.
Geerlings, H., Glas, C. A. W., & van der Linden, W. J. (2011). Modeling rule-based item generation. Psychometrika, 76(2), 337–359.
Gelman, A. (2006). Prior distributions for variance parameters in hierarchical models. Bayesian Analysis, 1(3), 515–533.
Gelman, A., & Hill, J. (2007). Data analysis using regression and multilevel/hierarchical models. Cambridge: Cambridge University Press.
Gierl, M. J., & Lai, H. (2012). The role of item models in automatic item generation. International Journal of Testing, 12, 273–298.
Gierl, M. J., & Lai, H. (2013). Using automated processes to generate test items. Educational Measurement: Issues and Practice, 32, 36–50.
Glas, C. A. W., & van der Linden, W. J. (2003). Computerized adaptive testing with item cloning. Applied Psychological Measurement, 27, 247–261.
Graf, E. A. (2014). Connecting lines of research on task model variables, automatic item generation, and learning progressions in game-based assessment. Measurement: Interdisciplinary Research & Perspectives, 12(1–2), 42–46.
Han, K. T. (2012). Fixing the c parameter in the three-parameter logistic model. Practical Assessment, Research & Evaluation, 17(1), 1–24.
Irvine, S. H., & Kyllonen, P. C. (Eds.). (2002). Item generation for test development. Mahwah: Lawrence Erlbaum Associates.
Johnson, M. S., & Sinharay, S. (2005). Calibration of polytomous item families using Bayesian hierarchical modeling. Applied Psychological Measurement, 29, 369–400.
Matzen, L. E., Benz, Z. O., Dixon, K. R., Posey, J., Kroger, J. K., & Speed, A. E. (2010). Recreating Raven's: Software for systematically generating large numbers of Raven-like matrix problems with normed properties. Behavior Research Methods, 42, 525–541.
McCulloch, C. E., & Neuhaus, J. M. (2011). Misspecifying the shape of a random effects distribution: Why getting it wrong may not matter. Statistical Science, 26(3), 388–402.
Mislevy, R. J., Oranje, A., Bauer, M. I., von Davier, A., Hao, J., Corrigan, S., . . . John, M. (2014). Psychometric considerations in game-based assessment (Tech. Rep.). GlassLab Research.
Neuhaus, J. M., & McCulloch, C. E. (2006). Separating between- and within-cluster covariate effects by using conditional and partitioning methods. Journal of the Royal Statistical Society Series B (Statistical Methodology), 68(5), 859–872.
R Development Core Team (2011). R: A language and environment for statistical computing [Computer software manual]. Vienna, Austria.
Sinharay, S., & Johnson, M. (2005). Analysis of data from an admissions test with item models (Tech. Rep. No. RR-05-06). Educational Testing Service.
Sinharay, S., Johnson, M., & Williamson, D. (2003). Calibrating item families and summarizing the results using family expected response functions. Journal of Educational and Behavioral Statistics, 28(4), 295–313.
Stan Development Team (2013). Stan: A C++ library for probability and sampling, version 1.3.

Author information

Authors and Affiliations

Pearson, Portland, USA
Quinn N. Lathrop
University of Notre Dame, Notre Dame, USA
Ying Cheng

Corresponding author

Correspondence to Quinn N. Lathrop.

Additional information

The views and opinions expressed in this article are those of the authors and do not necessarily reflect those of their institutions.

Appendix

Stan Code

Below is the Stan code for the 2P-F and 2P-RX models; the 2P-FX and 2P-R models can be inferred from it. The code is not optimized for speed but is written to be human-readable.
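The Stan listing itself is not reproduced here, and the exact 2P-RX specification (its priors, identification constraints and any covariates) is not stated in this text. As a rough, hypothetical stand-in only, the Python sketch below writes out the kind of log joint density a Stan program for a two-parameter model with random clone difficulties might encode: clone k in family j has difficulty `b[j] + e[k]` with `e[k] ~ N(0, sigma^2)`, and the logit of a correct response by person i is `a[j] * (theta[i] - b[j] - e[k])`. All names are our own, priors on `a`, `b` and `sigma` are omitted (implicitly flat), and this should not be read as the authors' model.

```python
import numpy as np

def normal_logpdf(x, sd):
    """Elementwise log density of N(0, sd^2)."""
    return -0.5 * np.log(2.0 * np.pi) - np.log(sd) - 0.5 * (x / sd) ** 2

def log_joint(theta, a, b, e, sigma, family, person, clone, y):
    """Hypothetical 2PL with random clone difficulties b[family[k]] + e[k].

    theta: person abilities; a, b: per-family discrimination and difficulty;
    e: per-clone difficulty deviations with SD sigma; family maps clone -> family;
    person, clone, y: one entry per observed 0/1 response.
    """
    j = family[clone]                                     # family of each response's clone
    eta = a[j] * (theta[person] - (b[j] + e[clone]))      # linear predictor on the logit scale
    loglik = np.sum(y * eta - np.logaddexp(0.0, eta))     # Bernoulli-logit log likelihood
    log_prior = normal_logpdf(theta, 1.0).sum() + normal_logpdf(e, sigma).sum()
    return loglik + log_prior

# Tiny illustrative data: 2 persons, 4 clones in 2 families (all values hypothetical).
family = np.array([0, 0, 1, 1])
person = np.array([0, 0, 1, 1, 0, 1])
clone = np.array([0, 1, 2, 3, 2, 0])
y = np.array([1, 0, 1, 1, 0, 1])
print(log_joint(theta=np.array([0.3, -0.4]), a=np.array([1.2, 0.8]),
                b=np.array([0.0, 0.5]), e=np.array([0.1, -0.1, 0.2, -0.2]),
                sigma=0.3, family=family, person=person, clone=clone, y=y))
```

Fixing every `e[k]` at zero and dropping its prior term collapses this sketch to a model in which all clones of a family share one difficulty, which is the simplest way to see what "not accounting for" clone variation means in code.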

About this article

Cite this article

Lathrop, Q.N., Cheng, Y. Item Cloning Variation and the Impact on the Parameters of Response Models. Psychometrika 82, 245–263 (2017). https://doi.org/10.1007/s11336-016-9513-1

Received: 15 September 2014
Revised: 31 March 2016
Published: 03 October 2016
Issue Date: March 2017
DOI: https://doi.org/10.1007/s11336-016-9513-1
