Extending participation in standard setting: an online judging proposal



For standard setting to retain public confidence, it is argued, two requirements must be met. First, the judges’ allocation of students to performance bands should yield results broadly consistent with the expectations of the wider educational community. Second, in the absence of any change in educational performance, the percentages of students in the corresponding bands should be stable over time. It is argued that the use of a small team of judges makes these conditions harder to satisfy, yet the cost and logistics of organizing a larger number of judges in the time-pressured atmosphere of public examining can lead to sub-optimal standard setting. Two parallel systems of awarding performance bands are compared empirically: one based on teams of six judges, the other on a population of teachers. The latter system is shown to give more stable results over time for the same large student population. A proposal is outlined for extending participation in standard setting through the web-based presentation of materials and the capture of cutscores from a population of teachers.
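The statistical intuition behind the argument is that a cutscore set by averaging a small panel's judgements carries a larger standard error, and hence more year-to-year drift, than one averaged over a large pool. A minimal sketch, assuming (as a simplification of the real judging process) that judges' cutscores are independent draws from a common distribution, with purely hypothetical cutscore values:

```python
import statistics


def cutscore_se(judge_cutscores):
    """Standard error of the panel-mean cutscore: the sample
    standard deviation of the judges' cutscores divided by the
    square root of the panel size."""
    n = len(judge_cutscores)
    return statistics.stdev(judge_cutscores) / n ** 0.5


# Hypothetical raw cutscores (out of 100) from a six-judge panel.
small_panel = [62, 58, 65, 60, 57, 63]

# A hypothetical pool of 30 teachers with the same spread of opinion:
# five copies of the same judgements, so the stdev is unchanged
# but the panel is five times larger.
teacher_pool = small_panel * 5

print(f"six judges:      SE = {cutscore_se(small_panel):.2f}")
print(f"thirty teachers: SE = {cutscore_se(teacher_pool):.2f}")
```

Under these assumptions the standard error shrinks in proportion to the square root of the number of judges, which is the sense in which a teacher-population scheme can be expected to yield more stable band percentages over time.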


Keywords: Standard setting · Cutscores · Angoff method · Bookmark method · Standard error · Online judging



Copyright information

© Springer Science + Business Media, LLC 2010

Authors and Affiliations

Oxford University Centre for Educational Assessment, Oxford University, Oxford, UK