
11 Computer-based competence tests in the National Educational Panel Study: The challenge of mode effects

  • Ulf Kröhne
  • Thomas Martens

Abstract

Computerized competence tests promise a variety of advantages over paper-and-pencil tests, for instance, increased test security, more information about test takers and the test-taking process, instant scoring, and immediate feedback. Moreover, innovative new item types can be administered to broaden the test content. Three benefits should be emphasized in particular for the assessment of cognitive competencies in the German National Educational Panel Study. First, test time can be reduced through the higher measurement efficiency of adaptive tests. Second, computerized testing is expected to enhance standardization and to increase test takers’ interest in completing the test. Third, Internet-based assessment offers the opportunity to deliver tests to geographically dispersed test takers. However, before we can exploit these opportunities, we have to study the equivalence of the different modes of test administration in order to maintain the comparability of test scores and to ensure the validity of score interpretations. In this chapter, we shall describe a theoretical framework of mode effects and discuss various properties of test administrations. We shall relate the resulting equivalence criteria to the specific settings of the National Educational Panel Study, in which (a) the use of computerized competence tests is being prepared for upcoming assessments, and (b) tests for different grades and age groups are being designed to assess competence development over the life span.
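The efficiency claim above can be made concrete with a small simulation. The sketch below is not taken from the chapter: the Rasch model, the EAP estimator, the bank of 100 uniformly spread items, and the test length of 15 items are illustrative assumptions. It compares the posterior standard deviation of the ability estimate when items are administered in an arbitrary (linear) order with the case in which each item is selected adaptively to match the current ability estimate.

```python
import numpy as np

rng = np.random.default_rng(42)

def p_correct(theta, b):
    """Rasch model: probability of a correct response given ability theta and difficulty b."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

def eap_estimate(responses, difficulties, grid=np.linspace(-4, 4, 81)):
    """EAP ability estimate and posterior SD under a standard normal prior."""
    posterior = np.exp(-0.5 * grid ** 2)          # unnormalized N(0, 1) prior
    for x, b in zip(responses, difficulties):
        p = p_correct(grid, b)
        posterior *= p if x == 1 else (1.0 - p)   # multiply in the item likelihood
    posterior /= posterior.sum()
    mean = (grid * posterior).sum()
    sd = np.sqrt(((grid - mean) ** 2 * posterior).sum())
    return mean, sd

def administer(theta, bank, n_items, adaptive):
    """Give n_items from the bank and return the final posterior SD of the estimate."""
    available = list(range(len(bank)))
    responses, used = [], []
    est, sd = 0.0, 1.0
    for _ in range(n_items):
        if adaptive:
            # Rasch item information peaks where the difficulty equals the current estimate
            idx = min(available, key=lambda i: abs(bank[i] - est))
        else:
            idx = int(rng.choice(available))       # linear test: arbitrary item order
        available.remove(idx)
        responses.append(int(rng.random() < p_correct(theta, bank[idx])))
        used.append(bank[idx])
        est, sd = eap_estimate(responses, used)
    return sd

bank = rng.uniform(-3.0, 3.0, size=100)   # hypothetical bank of 100 calibrated items
thetas = rng.normal(size=200)             # simulated test takers
for adaptive in (False, True):
    sds = [administer(t, bank, n_items=15, adaptive=adaptive) for t in thetas]
    label = "adaptive" if adaptive else "linear (random)"
    print(f"{label:16s} mean posterior SD after 15 items: {np.mean(sds):.3f}")
```

Under these assumptions the adaptive condition yields a noticeably smaller posterior standard deviation for the same number of items, which is the efficiency gain referred to above; adding a mode-specific shift to the item difficulties in such a simulation is one simple way to explore how uncorrected mode effects would distort score comparisons.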

Keywords

Education · Panel study · Computer-based competence test · Adaptive testing · Mode effects

Computerbasierte Kompetenztests im Nationalen Bildungspanel: Herausforderung durch Mode Effects

Zusammenfassung

Im Vergleich zu Papier- und Bleistifttests versprechen computerisierte Kompetenztests eine Vielzahl von Vorteilen, beispielsweise eine erhöhte Testsicherheit, mehr Informationen über die Testteilnehmer, sofortiges Scoring und unmittelbares Feedback. Zudem können neue, innovative Aufgabenformate angewendet werden, um die Testinhalte zu erweitern. Drei Vorteile sind für die Messung kognitiver Kompetenzen im Rahmen des Nationalen Bildungspanels besonders hervorzuheben: Erstens kann eine Reduktion der Testzeit durch die höhere Messeffizienz von adaptiven Tests erzielt werden. Zweitens ist zu erwarten, dass computerisiertes Testen die Standardisierung erhöht und das Interesse der Testteilnehmer an der Testdurchführung steigert. Drittens ermöglicht die internetbasierte Testdurchführung die Auslieferung von Tests an räumlich entfernte Testteilnehmer. Bevor diese Vorteile jedoch genutzt werden können, muss die Äquivalenz zwischen den verschiedenen Formen der Testadministration untersucht werden, um die Vergleichbarkeit der Testergebnisse und die Validität der Ergebnisinterpretationen sicherzustellen. In diesem Kapitel wird ein theoretisches Bezugssystem für Mode Effects beschrieben und spezifische Eigenschaften der verschiedenen Administrationsformen werden diskutiert. Darüber hinaus werden abgeleitete Äquivalenzkriterien im Hinblick auf die Gegebenheiten der Kompetenzdiagnostik im Nationalen Bildungspanel betrachtet, unter denen a) die Nutzung computerisierter Kompetenztests für nachfolgende Testdurchführungen vorbereitet wird und unter denen b) die Kompetenzentwicklung über die Lebensspanne mit Tests für verschiedene Klassenstufen und Altersgruppen gemessen wird.

Schlüsselwörter

Bildung · Panelstudie · Computerbasierte Kompetenztests · Adaptives Testen · Mode Effects


Copyright information

© VS Verlag für Sozialwissenschaften 2011

Authors and Affiliations

  1. Department of Educational Quality and Evaluation, German Institute for International Educational Research (DIPF), Frankfurt a. M., Germany
