Setting Standards to a Scientific Literacy Test for Adults Using the Item-Descriptor (ID) Matching Method

  • Linda I. Haschke
  • Nele Kampa
  • Inga Hahn
  • Olaf Köller
Part of the Methodology of Educational Measurement and Assessment book series (MEMA)


Common standard setting methods such as the Angoff or the Bookmark method require panellists to imagine minimally competent persons or to estimate response probabilities in order to define cut scores. Imagining such persons and how they would perform is criticised as cognitively demanding. These already challenging judgemental tasks become even more difficult when experts have to deal with very heterogeneous or insufficiently studied populations, such as adults. The Item-Descriptor (ID) Matching method can reduce the arbitrariness of such subjective evaluations by focusing on comparatively objective judgements about the content of tests. At our standard setting workshop, seven experts matched the demands of 22 items of a scientific literacy test for adults to the abilities described in the performance level descriptions (PLDs) of the two proficiency levels Basic and Advanced. Since the ID Matching method has hardly been used in European standard settings, it has not been evaluated comprehensively. Information about the validity of standard setting methods is essential in order to evaluate the appropriateness and correct interpretation of cut scores. In this chapter, we aim to provide procedural and internal evidence for the use and interpretation of the cut scores and PLDs derived with the ID Matching method. With regard to procedural validity, we report high and consensual agreement among the experts regarding explicitness, practicability, implementation, and feedback, which we assessed with detailed questionnaires. The inter-rater reliability of the panellists' classification of items was low at first but increased over subsequent rounds (κ = .38 to κ = .63). These values are consistent with findings of earlier studies, which supports internal validity. We argue that the cut scores and PLDs derived from the application of the ID Matching method are appropriate for categorising adults as scientifically illiterate, literate, and advanced literate.
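The κ statistics reported above summarise how consistently multiple panellists assign items to the same proficiency level. As a minimal sketch of how such an agreement coefficient can be computed for several raters, the following Python function implements Fleiss' kappa from scratch. The rating counts used here are purely hypothetical (they are not the study's data); the setup of seven panellists and three levels merely mirrors the design described in the abstract.

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for multiple raters and categorical ratings.

    ratings: one row per item; each row holds the number of raters
    who assigned that item to each category. Assumes every item was
    rated by the same number of raters.
    """
    n_items = len(ratings)
    n_raters = sum(ratings[0])
    # Observed per-item agreement P_i, averaged over items
    p_items = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in ratings
    ]
    p_bar = sum(p_items) / n_items
    # Expected chance agreement P_e from marginal category proportions
    totals = [sum(col) for col in zip(*ratings)]
    p_e = sum((t / (n_items * n_raters)) ** 2 for t in totals)
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical counts: 6 items, 7 panellists, three levels
# (Below Basic, Basic, Advanced) -- illustrative only.
counts = [
    [5, 2, 0],
    [1, 5, 1],
    [0, 2, 5],
    [4, 3, 0],
    [0, 6, 1],
    [0, 1, 6],
]
print(round(fleiss_kappa(counts), 2))
```

Values near 0 indicate chance-level agreement and values near 1 near-perfect agreement, which is how ranges such as κ = .38 to κ = .63 are conventionally read.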


Keywords: Item-Descriptor Matching method · Internal validity · External validity · Science abilities · Standard setting



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Linda I. Haschke (1)
  • Nele Kampa (1)
  • Inga Hahn (1)
  • Olaf Köller (1)

  1. Leibniz Institute for Science and Mathematics Education (IPN), Kiel, Germany
