
Behaviormetrika, Volume 45, Issue 2, pp 505–526

Response time-based treatment of omitted responses in computer-based testing

  • Andreas Frey
  • Christian Spoden
  • Frank Goldhammer
  • S. Franziska C. Wenzel
Original Paper

Abstract

A new response time-based method for coding omitted item responses in computer-based testing is introduced and illustrated with empirical data. The method is derived from the theory of missing data problems developed by Rubin and colleagues and is embedded in an item response theory framework. Its basic idea is to use item response times to test statistically, for each individual item, whether omitted responses are missing completely at random (MCAR) or missing due to a lack of ability and thus not at random (MNAR), with fixed Type I and Type II error levels. If the MCAR hypothesis is maintained, omitted responses are coded as not administered (NA); otherwise, they are coded as incorrect (0). The empirical illustration draws on the responses of N = 766 students to 70 items of a computer-based ICT skills test. The new method is compared with the two common deterministic methods of scoring all omitted responses as 0 or as NA. Response time thresholds ranging from 18 to 58 s were identified. More omitted responses were recoded as 0 (61%) than as NA (39%). Differences in item difficulty were larger between the new method and deterministic NA scoring than between the new method and deterministic 0 scoring. The variances and reliabilities obtained under the three methods differed only slightly. The paper concludes with a discussion of the practical relevance of the observed effect sizes and with recommendations for applying the new method at an early stage of data processing.
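To make the item-level decision rule concrete, the following R sketch shows one way such a recoding step could be operationalized. It is an illustration only, not the authors' exact procedure: it assumes the per-item MCAR test can be approximated by a one-sided Welch t-test comparing the response times of omitted and answered responses, and the function name recode_omissions and its arguments are hypothetical.

    # Illustrative sketch only; NOT the published algorithm.
    # Assumption: omissions with systematically longer response times indicate
    # engagement without success (MNAR -> score 0); otherwise omissions are
    # treated as not administered (kept as NA). The full method additionally
    # controls the Type II error level (e.g., via power analysis), which is
    # omitted here for brevity.
    recode_omissions <- function(resp, rt, alpha = 0.05) {
      # resp: scored item responses (0/1), with NA marking omissions
      # rt:   response times in seconds for the same item
      omitted  <- is.na(resp)
      answered <- !omitted
      if (sum(omitted) < 2 || sum(answered) < 2) return(resp)  # too few cases to test
      # H0 (consistent with MCAR): omitted responses do not take longer than answered ones
      test <- t.test(rt[omitted], rt[answered],
                     alternative = "greater", var.equal = FALSE)  # Welch correction
      if (test$p.value < alpha) resp[omitted] <- 0  # reject H0: treat omissions as incorrect
      resp                                          # otherwise: omissions stay NA
    }

    # Hypothetical usage for a single item:
    # resp_item <- c(1, 0, NA, 1, NA, 0)     # scored responses, NA = omitted
    # rt_item   <- c(35, 12, 64, 48, 9, 22)  # response times in seconds
    # recode_omissions(resp_item, rt_item)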

Keywords

Testing · Computer-based testing · Missing data · Response time · Item response theory

Notes

Compliance with ethical standards

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

References

  1. Arbuckle JL (1996) Full information estimation in the presence of incomplete data. In: Marcoulides GA, Schumacker RE (eds) Advanced structural equation modeling. Lawrence Erlbaum Publishers, Mahwah, pp 243–277
  2. Champely S (2017) pwr: basic functions for power analysis [software]. R package version 1.2-1
  3. Chen HY, Little RJA (1999) A test of missing completely at random for generalised estimating equations with missing data. Biometrika 86:1–13
  4. R Core Team (2017) R: a language and environment for statistical computing [software]. R Foundation for Statistical Computing, Vienna
  5. De Ayala RJ (2009) The theory and practice of item response theory. The Guilford Press, New York
  6. De Ayala RJ, Plake BS, Impara JC (2001) The impact of omitted responses on the accuracy of ability estimation in item response theory. J Educ Meas 38:213–234
  7. Diggle PJ (1989) Testing for random dropouts in repeated measurement data. Biometrics 45:1255–1258
  8. Enders CK (2010) Applied missing data analysis. Guilford Press, New York
  9. Enders CK, Bandalos DL (2001) The relative performance of full information maximum likelihood estimation for missing data in structural equation models. Struct Equ Model 8:430–457
  10. Engelhardt L, Naumann J, Goldhammer F, Frey A, Wenzel SFC, Hartig K, Horz H (2018) Convergent evidence for validity of a performance-based ICT skills test. Eur J Psychol Assess
  11. Finch H (2008) Estimation of item response theory parameters in the presence of missing data. J Educ Meas 45:225–245
  12. Frey A, Hartig J, Rupp A (2009) Booklet designs in large-scale assessments of student achievement: theory and practice. Educ Meas Issues Pract 28:39–53
  13. Glas CAW, Pimentel J (2008) Modeling nonignorable missing data in speeded tests. Educ Psychol Meas 68:907–922
  14. Goldhammer F, Kroehne U (2014) Controlling individuals’ time spent on task in speeded performance measures: experimental time limits, posterior time limits, and response time modeling. Appl Psychol Meas 38:255–267
  15. Goldhammer F, Martens T, Lüdtke O (2017) Conditioning factors of test-taking engagement in PIAAC: an exploratory IRT modelling approach considering person and item characteristics. Large-scale Assess Educ 5(1):18
  16. Hedges LV (1981) Distribution theory for Glass’s estimator of effect size and related estimators. J Educ Stat 6:107–128
  17. Holman R, Glas CAW (2005) Modelling non-ignorable missing data mechanisms with item response theory models. Br J Math Stat Psychol 58:1–17
  18. International ICT Literacy Panel (2002) Digital transformation: a framework for ICT literacy. Educational Testing Service, Princeton. http://www.ets.org/research/policy_research_reports/publications/report/2002/cjik
  19. Kim KH, Bentler PM (2002) Tests of homogeneity of means and covariance matrices for multivariate incomplete data. Psychometrika 67:609–624
  20. Köhler C, Pohl S, Carstensen CH (2015) Taking the missing propensity into account when estimating competence scores: evaluation of item response theory models for nonignorable omissions. Educ Psychol Meas 75:850–874
  21. Kong XJ, Wise SL, Bhola DS (2007) Setting the response time threshold parameter to differentiate solution behavior from rapid-guessing behavior. Educ Psychol Meas 67:606–619
  22. Lee Y-H, Chen H (2011) A review of recent response-time analyses in educational testing. Psychol Test Assess Model 53:359–379
  23. Little RJA (1988) A test of missing completely at random for multivariate data with missing values. J Am Stat Assoc 83:1198–1202
  24. Little RJA, Rubin DB (2002) Statistical analysis with missing data, 2nd edn. Wiley, Hoboken
  25. Lord FM (1974) Estimation of latent ability and item parameters when there are omitted responses. Psychometrika 39(2):247–264
  26. Ludlow LH, O’Leary M (1999) Scoring omitted and not-reached items: practical data analysis implications. Educ Psychol Meas 59:615–630
  27. Mislevy RJ, Wu PK (1996) Missing responses and IRT ability estimation: omits, choice, time limits, and adaptive testing (RR-96-30-ONR). Educational Testing Service, Princeton
  28. Neyman J, Pearson ES (1933) On the testing of statistical hypotheses in relation to probability a priori. Proc Camb Philos Soc 29:492–510
  29. O’Muircheartaigh C, Moustaki I (1999) Symmetric pattern models: a latent variable approach to item non-response in attitude scales. J R Stat Soc Ser A 162:177–194
  30. OECD (2016) Technical report of the survey of adult skills (PIAAC), 2nd edn. OECD, Paris
  31. Robitzsch A (2016) Zu nichtignorierbaren Konsequenzen des (partiellen) Ignorierens fehlender Item Responses im Large-Scale Assessment [On non-negligible consequences of (partially) ignoring missing item responses in large-scale assessments]. In: Suchań B, Wallner-Paschon C, Schreiner C (eds) PIRLS & TIMSS 2011—die Kompetenzen in Lesen, Mathematik und Naturwissenschaft am Ende der Volksschule: Österreichischer Expertenbericht. Leykam, Graz, pp 55–64
  32. Robitzsch A, Kiefer T, Wu M (2018) TAM: test analysis modules [software]. R package version 2.9-35
  33. Rose N, von Davier M, Xu X (2010) Modeling non-ignorable missing data with IRT (ETS Research Report No. RR-10-11). Educational Testing Service, Princeton
  34. Rubin DB (1976) Inference and missing data. Biometrika 63:581–592
  35. Rubin DB (1987) Multiple imputation for nonresponse in surveys. Wiley, New York
  36. Rutkowski L, Gonzales E, von Davier M, Zhou Y (2014) Assessment design for international large-scale assessments. In: Rutkowski L, von Davier M, Rutkowski D (eds) Handbook of international large-scale assessment: background, technical issues, and methods of data analysis. CRC Press, Boca Raton, pp 75–95
  37. Satterthwaite FE (1946) An approximate distribution of estimates of variance components. Biometr Bull 2:110–114. https://doi.org/10.2307/3002019
  38. van der Linden WJ (2009) Conceptual issues in response-time modeling. J Educ Meas 46:247–272
  39. Warm TA (1989) Weighted likelihood estimation of ability in item response theory. Psychometrika 54:427–450
  40. Weeks JP, von Davier M, Yamamoto K (2016) Using response time data to inform the coding of omitted responses. Psychol Test Assess Model 58:671–701
  41. Welch BL (1947) The generalization of “Student’s” problem when several different population variances are involved. Biometrika 34:28–35. https://doi.org/10.1093/biomet/34.1-2.28
  42. Wenzel SFC, Engelhardt L, Hartig K, Kuchta K, Frey A, Goldhammer F, Naumann J, Horz H (2016) Computergestützte, adaptive und verhaltensnahe Erfassung Informations- und Kommunikationstechnologie-bezogener Fertigkeiten (ICT-Skills) [Computerized, adaptive and behaviorally oriented measurement of information and communication technology-related skills (ICT skills)]. In: Bundesministerium für Bildung und Forschung (BMBF) Referat Bildungsforschung (ed) Forschungsvorhaben in Ankopplung an Large-Scale Assessments. Silber Druck, Niestetal, pp 161–180
  43. Wilkinson L, Task Force on Statistical Inference, American Psychological Association, Science Directorate (1999) Statistical methods in psychology journals: guidelines and explanations. Am Psychol 54:594–604. https://doi.org/10.1037/0003-066X.54.8.594
  44. Wise SL, Kingsbury GG (2016) Modeling student test-taking motivation in the context of an adaptive achievement test. J Educ Meas 53:86–105
  45. Wise SL, Ma L (2012) Setting response time thresholds for a CAT item pool: the normative threshold method. Paper presented at the annual meeting of the National Council on Measurement in Education, Vancouver, Canada

Copyright information

© The Behaviormetric Society 2018

Authors and Affiliations

  1. Friedrich Schiller University Jena, Jena, Germany
  2. Centre for Educational Measurement (CEMO) at the University of Oslo, Oslo, Norway
  3. DIPF | Leibniz Institute for Research and Information in Education, Frankfurt, Germany
  4. Centre for International Student Assessment (ZIB), Frankfurt, Germany
  5. Goethe University, Frankfurt, Germany
  6. German Institute for Adult Education, Leibniz Centre for Lifelong Learning, Bonn, Germany
