
Item Cloning Variation and the Impact on the Parameters of Response Models

Published in Psychometrika.

Abstract

Item cloning is increasingly used to generate slight differences in tasks for use in psychological experiments and educational assessments. This paper investigates the psychometric issues that arise when item cloning introduces variation into the difficulty parameters of the item clones. Four models are proposed and evaluated in simulation studies with conditions representing possible types of variation due to item cloning. Depending on the model specified, unaccounted variance in the item clone difficulties propagates to other parameters in the model, causing specific and predictable patterns of bias. Person parameters are largely unaffected by the choice of model, but for inferences related to the item parameters, the choice is critical and can even be leveraged to identify problematic item cloning.
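To make the setting concrete, one illustrative formalization (the notation here is an assumption made for exposition, not necessarily the parameterization used in the paper) is a two-parameter logistic model in which each administered item j is a clone belonging to an item family f(j), and its difficulty is the family difficulty plus a clone-specific deviation:

$$
\Pr(y_{ij} = 1 \mid \theta_i) = \operatorname{logit}^{-1}\!\bigl(a_{f(j)}\,(\theta_i - b_j)\bigr),
\qquad
b_j = \beta_{f(j)} + \epsilon_j,
\qquad
\epsilon_j \sim N(0, \sigma^2).
$$

When the clone deviations $\epsilon_j$ are ignored and all clones of a family are treated as a single item, the variance $\sigma^2$ must be absorbed elsewhere in the model, which is the mechanism behind the bias patterns summarized above.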



Author information

Correspondence to Quinn N. Lathrop.

Additional information

The views and opinions expressed in this article are those of the authors and do not necessarily reflect those of their institutions.

Appendix

Stan Code

Below is the Stan code for the 2P-F and 2P-RX models; the 2P-FX and 2P-R models can be inferred from it. Note that the code is not optimized for speed but is presented in a more human-readable form.

(The Stan code listings appear in the published article as figures a and b.)
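Because those listings are not reproduced here, the following is a minimal, illustrative sketch of a two-parameter logistic model with family-level parameters and clone-level random difficulty deviations, written in current Stan syntax. The data layout, variable names, and priors are assumptions made for this sketch; it is not the authors' published code and does not correspond exactly to any of the four named models.

// Illustrative sketch only: a 2PL with family-level discriminations and
// difficulties plus clone-level random difficulty deviations. Variable
// names, priors, and data layout are assumptions, not the published code.
data {
  int<lower=1> N;                        // number of observed responses
  int<lower=1> I;                        // number of persons
  int<lower=1> J;                        // number of item clones
  int<lower=1> F;                        // number of item families
  array[N] int<lower=1, upper=I> person; // person index of each response
  array[N] int<lower=1, upper=J> item;   // clone index of each response
  array[J] int<lower=1, upper=F> family; // family membership of each clone
  array[N] int<lower=0, upper=1> y;      // scored responses
}
parameters {
  vector[I] theta;            // person abilities
  vector<lower=0>[F] a;       // family discriminations
  vector[F] b;                // family mean difficulties
  vector[J] eps;              // clone-level difficulty deviations
  real<lower=0> sigma;        // SD of the clone deviations
}
model {
  theta ~ normal(0, 1);       // persons identify the latent scale
  a ~ lognormal(0, 0.5);
  b ~ normal(0, 2);
  sigma ~ cauchy(0, 2);       // weakly informative half-Cauchy on the clone SD
  eps ~ normal(0, sigma);
  for (n in 1:N) {
    int j = item[n];
    int f = family[j];
    y[n] ~ bernoulli_logit(a[f] * (theta[person[n]] - (b[f] + eps[j])));
  }
}

Fixing sigma to zero collapses the clone deviations and recovers a purely family-level specification, which is the kind of contrast the four models compared in the paper are designed to expose.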

About this article


Cite this article

Lathrop, Q.N., Cheng, Y. Item Cloning Variation and the Impact on the Parameters of Response Models. Psychometrika 82, 245–263 (2017). https://doi.org/10.1007/s11336-016-9513-1

