Abstract
Absolute, test-centred standard-setting procedures are recommended for assessment in medical education and were introduced for written assessments in the Liverpool MBChB in 2001. The modified Angoff and Ebel methods have been used for the short-answer-question-based and extended-matching-question-based papers, respectively. The data collected have been analysed to investigate whether reliable standards can be achieved for small-scale, medical-school-based assessments, to establish the minimum number of judges required, and to examine the effect of a discussion phase on reliability. The root mean squared error (RMSE) was used as the measure of reliability: it was used to compute 95% confidence intervals for comparison with the examination statistics and to calculate the minimum number of judges required to achieve a predetermined level of reliability, and the effects of the number of judges and the number of items were examined. Values of the RMSE obtained range from 0.9% to 2.2%. Using average variances across each paper type, the minimum number of judges needed to obtain an RMSE of less than 2% is 10 or more before discussion, or 6 or more after discussion. The results indicate that including a discussion phase improves reliability and reduces the minimum number of judges required. Decision studies indicate that increasing the number of questions included in the assessments would not significantly improve the reliability of the standard setting.
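The calculations summarised above can be sketched in a few lines. This is a minimal illustration, assuming the RMSE is treated as the standard error of the panel's mean cut score, √(σ²/n); the inter-judge variances used below are hypothetical, chosen only to show the arithmetic, and are not the study's values:

```python
import math


def rmse(judge_variance: float, n_judges: int) -> float:
    """Standard error of the panel's mean cut score: sqrt(variance / n)."""
    return math.sqrt(judge_variance / n_judges)


def ci95(cut_score: float, judge_variance: float, n_judges: int) -> tuple:
    """95% confidence interval around the panel's mean cut score."""
    half_width = 1.96 * rmse(judge_variance, n_judges)
    return (cut_score - half_width, cut_score + half_width)


def min_judges(judge_variance: float, target_rmse: float) -> int:
    """Smallest panel size whose RMSE falls below the target,
    from sqrt(variance / n) < target  =>  n > variance / target**2."""
    return math.ceil(judge_variance / target_rmse ** 2)
```

For example, with illustrative inter-judge variances of 38 (%²) before discussion and 22 (%²) after, `min_judges(38.0, 2.0)` and `min_judges(22.0, 2.0)` give panel sizes of 10 and 6, the same pattern as the results reported above.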
References
Angoff W.H. (1971). Scales, norms, and equivalent scores. In: Thorndike R.L. (Ed.), Educational Measurement. American Council on Education, Washington, D.C., pp. 508–600.
Berk R.A. (1986). A consumer’s guide to setting performance standards on criterion-referenced tests. Review of Educational Research 56: 137–172.
Berk R.A. (1996). Standard setting: The next generation: Where few psychometricians have gone before. Applied Measurement in Education 9: 215–235.
Brandon P.R. (2004). Conclusions about frequently studied modified Angoff standard-setting topics. Applied Measurement in Education 17: 59–88.
Brennan R.L. (2001). Generalizability Theory. Springer-Verlag, New York.
Brennan R.L., Gao X., Colton D. (1995). Generalizability analyses of Work Keys listening and writing tests. Educational and Psychological Measurement 55: 157–176.
Brennan R.L., Lockwood R.E. (1980). A comparison of the Nedelsky and Angoff cutting score procedures using generalizability theory. Applied Psychological Measurement 4: 219–240.
Case S.M., Swanson D.B. (1996). Constructing Written Test Questions for the Basic and Clinical Sciences. National Board of Medical Examiners.
Cizek G.J. (1996). Standard setting guidelines. Educational Measurement: Issues and Practice 15: 13–21.
Cusimano M.D. (1996). Standard setting in medical education. Academic Medicine 71: S112–S120.
Downing S.M., Lieska N.G., Raible M.D. (2003). Establishing passing standards for classroom achievement tests in medical education: A comparative study of four methods. Academic Medicine 78: S85–S87.
Ebel R.L. (1979). Determination of the passing score. In: Ebel R.L. (Ed.), Essentials of Educational Measurement (3rd ed.). Prentice-Hall, Englewood Cliffs, New Jersey, pp. 337–342.
Hambleton R.K., Powell S. (1983). A framework for viewing the process of standard setting. Evaluation and the Health Professions 6: 3–24.
Hurtz G.M., Hertz N.R. (1999). How many raters should be used for establishing cutoff scores with the Angoff method? A generalizability theory study. Educational and Psychological Measurement 59: 885–897.
Kaufman D.M., Mann K.V., Muijtjens A.M.M., van der Vleuten C.P.M. (2000). A comparison of standard-setting procedures for an OSCE in undergraduate medical education. Academic Medicine 75: 267–271.
Lowry S. (1993). Assessment of students. British Medical Journal 306: 51–54.
Maurer T.J., Alexander R.A., Callahan C.M., Bailey J.J., Dambrot F.H. (1991). Methodological and psychometric issues in setting cutoff scores using the Angoff method. Personnel Psychology 44: 235–262.
Morrison H., McNally H., Wylie C., McFaul P., Thompson W. (1996). The passing score in the objective structured clinical examination. Medical Education 30: 345–348.
Muijtjens A.M.M., Hoogenboom R.J.I., Verwijnen G.M., van der Vleuten C.P. (1998). Relative or absolute standards in assessing medical knowledge using progress tests. Advances in Health Sciences Education 3: 81–87.
Norcini J.J. (2003). Setting standards on educational tests. Medical Education 37: 464–469.
Norcini J.J., Lipner R.S., Langdon L.O., Strecker C.A. (1987). A comparison of three variations on a standard-setting method. Journal of Educational Measurement 24: 56–64.
Searle J. (2000). Defining competency – the role of standard setting. Medical Education 34: 363–366.
Streiner D.L., Norman G.R. (1995). Reliability. In: Health Measurement Scales: A Practical Guide to their Development and Use (2nd ed.). Oxford University Press, pp. 104–127.
Verhoeven B.H., van der Steeg A.F., Scherpbier A.J., Muijtjens A.M., Verwijnen G.M., van der Vleuten C.P. (1999). Reliability and credibility of an Angoff standard setting procedure in progress testing using recent graduates as judges. Medical Education 33: 832–837.
Verhoeven B.H., Verwijnen G.M., Muijtjens A.M.M., Scherpbier A.J., van der Vleuten C.P.M. (2002). Panel expertise for an Angoff standard setting procedure in progress testing: Item writers compared to recently graduated students. Medical Education 36: 860–867.
Wilkinson T.J., Newble D.I., Frampton C.M. (2001). Standard setting in an objective structured clinical examination: Use of global ratings of borderline performance to determine the passing score. Medical Education 35: 1043–1049.
Acknowledgements
We would like to thank all our colleagues associated with the Liverpool curriculum who gave their time to attend standard setting sessions.
Cite this article
Fowell, S.L., Fewtrell, R. & McLaughlin, P.J. Estimating the Minimum Number of Judges Required for Test-centred Standard Setting on Written Assessments. Do Discussion and Iteration have an Influence?. Adv in Health Sci Educ 13, 11–24 (2008). https://doi.org/10.1007/s10459-006-9027-1