
Extending participation in standard setting: an online judging proposal

Educational Assessment, Evaluation and Accountability

Abstract

For standard setting to retain public confidence, it is argued that two requirements must be met. First, the judges’ allocation of students to performance bands should yield results broadly consistent with the expectations of the wider educational community. Second, in the absence of any change in educational performance, the percentages of students in the corresponding bands should remain stable over time. It is argued that the use of a small team of judges makes these conditions more difficult to satisfy, yet the cost and logistics of organizing a larger number of judges in the time-pressured atmosphere of public examining can lead to sub-optimal standard setting. Two parallel systems of awarding performance bands are empirically compared: one based on teams of six judges, the other on a population of teachers. The latter system is shown to give more stable results over time for the same large student population. A proposal is outlined for extending participation in standard setting through the web-based presentation of materials and the capture of cutscores from a population of teachers.
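To make the stability criterion concrete, the Python sketch below compares year-to-year drift in band percentages under two cutscore-setting schemes. It is a minimal illustration only, with entirely invented scores and cutscores; it does not reproduce the authors’ data or procedure, only the idea that when cohort performance is unchanged, the band percentages should barely move.

```python
# Hypothetical sketch: when student performance is unchanged across years,
# band percentages should be stable. A cutscore aggregated over a large
# population of judges is assumed here to drift less than one set by a
# small panel. All numbers below are invented for illustration.

def band_percentages(scores, cutscores):
    """Percent of students in each band, given ascending band cutscores."""
    counts = [0] * (len(cutscores) + 1)
    for s in scores:
        counts[sum(s >= c for c in cutscores)] += 1  # number of cuts cleared
    return [100 * c / len(scores) for c in counts]

# Invented: identical cohort performance in two successive years.
scores = list(range(30, 100))

# Invented cutscores: the six-judge panel's cuts drift between years,
# while the median over a large teacher population barely moves.
panel_cuts = {2008: [50, 70, 85], 2009: [47, 66, 88]}
population_cuts = {2008: [50, 70, 85], 2009: [50, 70, 85]}

for label, cuts in (("six-judge panel", panel_cuts),
                    ("teacher population", population_cuts)):
    drift = max(abs(a - b) for a, b in
                zip(band_percentages(scores, cuts[2008]),
                    band_percentages(scores, cuts[2009])))
    print(f"{label}: max band-percentage drift = {drift:.1f} points")
```

Under these assumed numbers the panel shows a drift of 10.0 percentage points in one band while the population-based cuts show none, mirroring the stability contrast the abstract reports. In practice the population cutscore would be an aggregate (for example, a median) over many submitted judgments, which is what dampens the drift of any individual judge.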



Author information

Corresponding author

Correspondence to Robert G. MacCann.


Cite this article

MacCann, R.G., Stanley, G. Extending participation in standard setting: an online judging proposal. Educ Asse Eval Acc 22, 139–157 (2010). https://doi.org/10.1007/s11092-010-9094-y
