Response time-based treatment of omitted responses in computer-based testing

Abstract

A new response time-based method for coding omitted item responses in computer-based testing is introduced and illustrated with empirical data. The method is derived from the theory of missing data of Rubin and colleagues and embedded in an item response theory framework. Its basic idea is to use item response times to test statistically, for each individual item, whether omitted responses are missing completely at random (MCAR) or missing due to a lack of ability and thus not at random (MNAR), with fixed Type I and Type II error levels. If the MCAR hypothesis is maintained, omitted responses are coded as not administered (NA); otherwise, they are coded as incorrect (0). The empirical illustration draws on the responses given by N = 766 students to 70 items of a computer-based ICT skills test. The new method is compared with the two common deterministic methods of scoring all omitted responses as 0 or all as NA. Response time thresholds between 18 and 58 s were identified, and more omitted responses were recoded as incorrect (61%) than as not administered (39%). The differences in item difficulty were larger when the new method was compared with deterministically scoring omitted responses as NA than when it was compared with deterministically scoring them as 0. The variances and reliabilities obtained under the three methods differed only slightly. The paper concludes with a discussion of the practical relevance of the observed effect sizes and with recommendations for applying the new method in the early stages of data processing.
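The recoding rule described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the per-item MCAR decision, which the paper derives from a response time-based statistical test with fixed Type I and Type II error levels, is taken here as a given boolean per item, and the function and variable names are hypothetical.

```python
import numpy as np

def recode_omissions(responses, mcar_maintained):
    """Apply the abstract's recoding rule to an item response matrix.

    responses: 2D float array (persons x items) with np.nan marking
        omitted responses.
    mcar_maintained: boolean per item; True if the item-level MCAR
        hypothesis was maintained (decision assumed to come from the
        paper's response time-based test, not computed here).

    Omissions on MCAR items stay missing, i.e. are treated as not
    administered (NA); omissions on MNAR items are scored incorrect (0).
    """
    out = np.array(responses, dtype=float, copy=True)
    for j, mcar in enumerate(mcar_maintained):
        omitted = np.isnan(out[:, j])
        if not mcar:
            # Omission attributed to lack of ability: score as incorrect.
            out[omitted, j] = 0.0
    return out
```

For example, with two persons and two items, an omission on an item whose MCAR hypothesis was rejected becomes 0, while an omission on an MCAR item remains missing and is simply not used in estimation.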

Notes

  1. At this point, we discuss missing responses at a general level. A distinction between the types “missing by design”, “not reached”, and “omitted” is introduced later in the text in the section “Types of Missing Item Responses”.

References

  1. Arbuckle JL (1996) Full information estimation in the presence of incomplete data. In: Marcoulides GA, Schumacker RE (eds) Advanced structural equation modeling. Lawrence Erlbaum, Mahwah, pp 243–277

  2. Champely S (2017) pwr: basic functions for power analysis [software]. R package version 1.2-1

  3. Chen HY, Little RJA (1999) A test of missing completely at random for generalised estimating equations with missing data. Biometrika 86:1–13

  4. R Core Team (2017) R: a language and environment for statistical computing [software]. R Foundation for Statistical Computing, Vienna

  5. De Ayala RJ (2009) The theory and practice of item response theory. The Guilford Press, New York

  6. De Ayala RJ, Plake BS, Impara JC (2001) The impact of omitted responses on the accuracy of ability estimation in item response theory. J Educ Meas 38:213–234

  7. Diggle PJ (1989) Testing for random dropouts in repeated measurement data. Biometrics 45:1255–1258

  8. Enders CK (2010) Applied missing data analysis. Guilford Press, New York

  9. Enders CK, Bandalos DL (2001) The relative performance of full information maximum likelihood estimation for missing data in structural equation models. Struct Equ Model 8:430–457

  10. Engelhardt L, Naumann J, Goldhammer F, Frey A, Wenzel SFC, Hartig K, Horz H (2018) Convergent evidence for validity of a performance-based ICT skills test. Eur J Psychol Assess

  11. Finch H (2008) Estimation of item response theory parameters in the presence of missing data. J Educ Meas 45:225–245

  12. Frey A, Hartig J, Rupp A (2009) Booklet designs in large-scale assessments of student achievement: theory and practice. Educ Meas Issues Pract 28:39–53

  13. Glas CAW, Pimentel J (2008) Modeling nonignorable missing data in speeded tests. Educ Psychol Meas 68:907–922

  14. Goldhammer F, Kroehne U (2014) Controlling individuals’ time spent on task in speeded performance measures: experimental time limits, posterior time limits, and response time modeling. Appl Psychol Meas 38:255–267

  15. Goldhammer F, Martens T, Lüdtke O (2017) Conditioning factors of test-taking engagement in PIAAC: an exploratory IRT modelling approach considering person and item characteristics. Large-scale Assess Educ 5(1):18

  16. Hedges LV (1981) Distribution theory for Glass’s estimator of effect size and related estimators. J Educ Stat 6:107–128

  17. Holman R, Glas CAW (2005) Modelling non-ignorable missing data mechanisms with item response theory models. Br J Math Stat Psychol 58:1–17

  18. International ICT Literacy Panel (2002) Digital transformation: a framework for ICT literacy. Educational Testing Service, Princeton. http://www.ets.org/research/policy_research_reports/publications/report/2002/cjik

  19. Kim KH, Bentler PM (2002) Tests of homogeneity of means and covariance matrices for multivariate incomplete data. Psychometrika 67:609–624

  20. Köhler C, Pohl S, Carstensen CH (2015) Taking the missing propensity into account when estimating competence scores: evaluation of item response theory models for nonignorable omissions. Educ Psychol Meas 75:850–874

  21. Kong XJ, Wise SL, Bhola DS (2007) Setting the response time threshold parameter to differentiate solution behavior from rapid-guessing behavior. Educ Psychol Meas 67:606–619

  22. Lee Y-H, Chen H (2011) A review of recent response-time analyses in educational testing. Psychol Test Assess Model 53:359–379

  23. Little RJA (1988) A test of missing completely at random for multivariate data with missing values. J Am Stat Assoc 83:1198–1202

  24. Little RJA, Rubin DB (2002) Statistical analysis with missing data, 2nd edn. Wiley, Hoboken

  25. Lord FM (1974) Estimation of latent ability and item parameters when there are omitted responses. Psychometrika 39(2):247–264

  26. Ludlow LH, O’Leary M (1999) Scoring omitted and not-reached items: practical data analysis implications. Educ Psychol Meas 59:615–630

  27. Mislevy RJ, Wu PK (1996) Missing responses and IRT ability estimation: omits, choice, time limits, and adaptive testing (RR-96-30-ONR). Educational Testing Service, Princeton

  28. Neyman J, Pearson ES (1933) The testing of statistical hypotheses in relation to probabilities a priori. Proc Camb Philos Soc 29:492–510

  29. O’Muircheartaigh C, Moustaki I (1999) Symmetric pattern models: a latent variable approach to item non-response in attitude scales. J R Stat Soc Ser A 162:177–194

  30. OECD (2016) Technical report of the survey of adult skills (PIAAC), 2nd edn. OECD, Paris

  31. Robitzsch A (2016) Zu nichtignorierbaren Konsequenzen des (partiellen) Ignorierens fehlender Item Responses im Large-Scale Assessment [On non-negligible consequences of (partially) ignoring missing item responses in large-scale assessments]. In: Suchań B, Wallner-Paschon C, Schreiner C (eds) PIRLS & TIMSS 2011—die Kompetenzen in Lesen, Mathematik und Naturwissenschaft am Ende der Volksschule: Österreichischer Expertenbericht. Leykam, Graz, pp 55–64

  32. Robitzsch A, Kiefer T, Wu M (2018) TAM: test analysis modules [software]. R package version 2.9-35

  33. Rose N, von Davier M, Xu X (2010) Modeling non-ignorable missing data with IRT (ETS Research Report No. RR-10-11). Educational Testing Service, Princeton

  34. Rubin DB (1976) Inference and missing data. Biometrika 63:581–592

  35. Rubin DB (1987) Multiple imputation for nonresponse in surveys. Wiley, New York

  36. Rutkowski L, Gonzalez E, von Davier M, Zhou Y (2014) Assessment design for international large-scale assessments. In: Rutkowski L, von Davier M, Rutkowski D (eds) Handbook of international large-scale assessment: background, technical issues, and methods of data analysis. CRC Press, Boca Raton, pp 75–95

  37. Satterthwaite FE (1946) An approximate distribution of estimates of variance components. Biom Bull 2:110–114. https://doi.org/10.2307/3002019

  38. van der Linden WJ (2009) Conceptual issues in response-time modeling. J Educ Meas 46:247–272

  39. Warm TA (1989) Weighted likelihood estimation of ability in item response theory. Psychometrika 54:427–450

  40. Weeks JP, von Davier M, Yamamoto K (2016) Using response time data to inform the coding of omitted responses. Psychol Test Assess Model 58:671–701

  41. Welch BL (1947) The generalization of “Student’s” problem when several different population variances are involved. Biometrika 34:28–35. https://doi.org/10.1093/biomet/34.1-2.28

  42. Wenzel SFC, Engelhardt L, Hartig K, Kuchta K, Frey A, Goldhammer F, Naumann J, Horz H (2016) Computergestützte, adaptive und verhaltensnahe Erfassung Informations- und Kommunikationstechnologie-bezogener Fertigkeiten (ICT-Skills) [Computerized, adaptive, and behaviorally oriented measurement of information and communication technology-related skills (ICT skills)]. In: Bundesministerium für Bildung und Forschung (BMBF), Referat Bildungsforschung (ed) Forschungsvorhaben in Ankopplung an Large-Scale Assessments. Silber Druck, Niestetal, pp 161–180

  43. Wilkinson L, Task Force on Statistical Inference, American Psychological Association, Science Directorate (1999) Statistical methods in psychology journals: guidelines and explanations. Am Psychol 54:594–604. https://doi.org/10.1037/0003-066X.54.8.594

  44. Wise SL, Kingsbury GG (2016) Modeling student test-taking motivation in the context of an adaptive achievement test. J Educ Meas 53:86–105

  45. Wise SL, Ma L (2012) Setting response time thresholds for a CAT item pool: the normative threshold method. Paper presented at the annual meeting of the National Council on Measurement in Education, Vancouver, Canada
Author information

Corresponding author

Correspondence to Andreas Frey.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

About this article

Cite this article

Frey, A., Spoden, C., Goldhammer, F. et al. Response time-based treatment of omitted responses in computer-based testing. Behaviormetrika 45, 505–526 (2018). https://doi.org/10.1007/s41237-018-0073-9

Keywords

  • Testing
  • Computer-based testing
  • Missing data
  • Response time
  • Item response theory