Abstract
It is argued that for standard setting to retain public confidence, two requirements must be met. First, the judges' allocation of students to performance bands should yield results broadly consistent with the expectations of the wider educational community. Second, in the absence of any change in educational performance, the percentages of students in corresponding bands should remain stable over time. The use of a small team of judges makes these conditions harder to satisfy, yet the cost and logistics of organizing a larger number of judges in the time-pressured atmosphere of public examining can lead to sub-optimal standard setting. Two parallel systems of awarding performance bands are compared empirically: one based on teams of six judges, the other on a population of teachers. The latter system is shown to give more stable results over time for the same large student population. A proposal is outlined for extending participation in standard setting through the web-based presentation of judging materials and the capture of cutscores from a population of teachers.
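To make the comparison concrete, the sketch below illustrates how band percentages could be derived from judge-supplied cutscores, and how year-to-year stability might then be checked by recomputing those percentages for successive cohorts. It is a minimal Python illustration only: the median aggregation rule, the function names, and all data are assumptions for exposition, not the procedure used in the study.

```python
import statistics

def aggregate_cutscores(judge_cutscores):
    """Combine per-judge cutscores into one operational cutscore per band
    boundary, using the median as a robust summary. (The median rule is an
    illustrative assumption, not necessarily the study's aggregation rule.)"""
    n_boundaries = len(judge_cutscores[0])
    return [statistics.median(j[b] for j in judge_cutscores)
            for b in range(n_boundaries)]

def band_percentages(scores, cutscores):
    """Allocate each student score to a performance band defined by
    ascending cutscores; return the percentage of students in each band."""
    counts = [0] * (len(cutscores) + 1)
    for s in scores:
        band = sum(s >= c for c in cutscores)  # number of boundaries cleared
        counts[band] += 1
    return [round(100 * c / len(scores), 1) for c in counts]

# Hypothetical data: the same cohort banded under the two judging systems.
scores = [34, 41, 48, 55, 61, 67, 72, 78, 83, 90, 95]
small_team = [[52, 68, 88], [48, 72, 84], [55, 65, 90],
              [50, 70, 85], [47, 74, 82], [53, 69, 87]]   # six judges
teacher_population = [[50, 70, 85]] * 40  # stand-in for many teacher judges

for system in (small_team, teacher_population):
    cuts = aggregate_cutscores(system)
    print(cuts, band_percentages(scores, cuts))
```

Stability over time would then be assessed by comparing the band percentages produced in successive years for cohorts of comparable ability; the paper reports that the teacher-population system yields the more stable percentages.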


Cite this article
MacCann, R. G., & Stanley, G. (2010). Extending participation in standard setting: an online judging proposal. Educational Assessment, Evaluation and Accountability, 22, 139–157. https://doi.org/10.1007/s11092-010-9094-y

