Abstract
According to the recent Nation’s Report Card, 12th-graders failed to produce gains on the 2005 National Assessment of Educational Progress (NAEP) despite earning better grades on average. One possible explanation is that 12th-graders were not motivated when taking the NAEP, which is a low-stakes test. We develop three Bayesian IRT mixture models to describe responses from a group of examinees that includes both nonguessers and partial guessers. The first model assumes that the guesser answers questions based on his or her knowledge up to a certain test item and guesses thereafter. The second model assumes that the guesser answers relatively easy questions based on his or her knowledge and guesses randomly on the remaining items. The third model is constructed to describe more general low-motivation behavior: it assumes that the guesser exerts less and less effort as he or she proceeds through the test. The models provide not only consistent estimates of IRT parameters but also estimates of each examinee’s nonguesser/guesser status and degree of guessing behavior. We report a simulation study comparing the performance of the three guessing models with that of the 2PL-IRT model. Finally, we present an analysis of real data from a low-stakes test administered to university students.
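To make the first model's assumption concrete, the sketch below computes response probabilities for an examinee who answers from ability (via the standard 2PL item response function) up to a change point and guesses at random afterwards. It is only an illustration under stated assumptions: the function names, the `change_point` argument, the example item parameters, and the fixed guessing probability of 0.25 (as for four-option multiple-choice items) are all hypothetical, and the abstract does not specify the paper's actual parameterization or priors.

```python
import math
import random


def p_2pl(theta, a, b):
    """Standard 2PL probability of a correct response for ability theta,
    item discrimination a, and item difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def response_probs(theta, items, change_point, guess_prob=0.25):
    """Per-item probabilities of a correct response.

    Items with index < change_point are answered from ability (2PL);
    the remaining items are answered by random guessing.
    guess_prob is an assumed constant, not a value taken from the paper.
    """
    return [
        p_2pl(theta, a, b) if j < change_point else guess_prob
        for j, (a, b) in enumerate(items)
    ]


def simulate_responses(theta, items, change_point, guess_prob=0.25, rng=None):
    """Draw 0/1 responses from the probabilities above."""
    if rng is None:
        rng = random.Random()
    probs = response_probs(theta, items, change_point, guess_prob)
    return [1 if rng.random() < p else 0 for p in probs]


# Hypothetical (discrimination, difficulty) pairs for four items.
example_items = [(1.0, -1.0), (1.2, 0.0), (0.8, 0.5), (1.5, 1.0)]

# A partial guesser who stops trying after the second item.
guesser = simulate_responses(0.0, example_items, change_point=2,
                             rng=random.Random(1))

# A nonguesser is the special case where the change point is the test length.
nonguesser = simulate_responses(0.0, example_items,
                                change_point=len(example_items),
                                rng=random.Random(1))
```

In this sketch a nonguesser and a partial guesser differ only in the change point, which loosely mirrors the mixture idea of treating the examinee's status and the extent of guessing as quantities to be estimated alongside the item parameters.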
Cite this article
Cao, J., Stokes, S.L. Bayesian IRT Guessing Models for Partial Guessing Behaviors. Psychometrika 73, 209–230 (2008). https://doi.org/10.1007/s11336-007-9045-9
DOI: https://doi.org/10.1007/s11336-007-9045-9