Examiners and content and site: Oh My! A national organization’s investigation of score variation in large-scale performance assessments

Advances in Health Sciences Education

Abstract

Examiner effects and content specificity are two well-known sources of construct-irrelevant variance that pose substantial challenges in performance-based assessments. National medical organizations responsible for large-scale performance-based assessments face an additional challenge: they must administer qualification examinations to physician candidates across multiple sites and institutions. This study explores site location as a source of score variation in a large-scale national assessment used to measure the readiness of internationally educated physician candidates for residency programs. Data from the Medical Council of Canada’s National Assessment Collaboration were analyzed using hierarchical linear modeling and Rasch analyses. Consistent with previous research, problematic variance due to examiner effects and content specificity was found. Site location was also identified as a potential source of construct-irrelevant variance in examination scores.
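The hierarchical-linear-modeling approach described here rests on partitioning score variance across nested levels, for example candidates nested within sites. As a minimal sketch of that idea (not the authors' actual model), the Python snippet below fits a random-intercept model with site as the grouping factor and reports the intraclass correlation, i.e., the share of score variance attributable to site; the data file and column names (score, site) are hypothetical.

```python
# Minimal sketch: estimate the proportion of examination-score variance
# attributable to test site with a random-intercept mixed model.
# Assumes a hypothetical file with one row per candidate score and
# columns "score" (numeric) and "site" (site identifier).
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("nac_osce_scores.csv")  # hypothetical data file

# Intercept-only model with a random intercept for site: total variance
# decomposes into a between-site component and a residual component.
model = smf.mixedlm("score ~ 1", df, groups=df["site"])
result = model.fit(reml=True)
print(result.summary())

# Intraclass correlation: between-site variance over total variance.
# A nontrivial ICC suggests site is contributing construct-irrelevant
# variance, assuming candidates are comparable across sites.
site_var = float(result.cov_re.iloc[0, 0])   # variance of site intercepts
resid_var = float(result.scale)              # residual (within-site) variance
icc = site_var / (site_var + resid_var)
print(f"Proportion of variance attributable to site (ICC): {icc:.3f}")
```

A fuller analysis along these lines would add examiner and station (content) levels as additional random effects, mirroring the examiner-effects and content-specificity sources of variance the study examines.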

Author information

Correspondence to Stefanie S. Sebok.

About this article

Cite this article

Sebok, S.S., Roy, M., Klinger, D.A. et al. Examiners and content and site: Oh My! A national organization’s investigation of score variation in large-scale performance assessments. Adv in Health Sci Educ 20, 581–594 (2015). https://doi.org/10.1007/s10459-014-9547-z
