Abstract
The authors report final-year ward simulation data from the University of Dundee Medical School. The faculty who designed this assessment intend the final score to represent an individual senior medical student's level of clinical performance; the results are included in each student's portfolio as one source of evidence of the student's capability as a practitioner, professional, and scholar. Our purpose in conducting this study was to illustrate how designers of clinical performance assessments might develop propositions and then collect and examine various sources of evidence to construct and evaluate a validity argument. The data came from all 154 medical students in their final year of study at the University of Dundee Medical School during the 2010–2011 academic year. To the best of our knowledge, this is the first reported analysis of senior medical students' clinical performance while they took responsibility for managing a simulated ward. Using multi-facet Rasch measurement and a generalizability theory approach, we examined the validity evidence that the medical school faculty have gathered for a set of six propositions needed to support their use of scores as measures of students' clinical ability. Based on our analysis, the propositions appear, by and large, to be sound, and the body of evidence collected thus far supports the intended score interpretation, making that interpretation defensible.
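As context for the two measurement models named above, here is a minimal sketch in generic notation; it illustrates the general form of these models, not the study's exact specification. In a many-facet Rasch analysis of rater-mediated scores, the probability that student n receives rating k rather than k−1 from rater j on assessment item i is typically modelled as

\[
\log\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = \theta_n - \beta_i - \alpha_j - \tau_k,
\]

where \(\theta_n\) is the student's ability, \(\beta_i\) the item's difficulty, \(\alpha_j\) the rater's severity, and \(\tau_k\) the threshold between adjacent rating-scale categories. A generalizability theory analysis of the same scores decomposes the observed-score variance into components for persons and for measurement facets (e.g., raters and items) and summarises the dependability of relative decisions with the generalizability coefficient

\[
E\rho^{2} = \frac{\sigma^{2}_{p}}{\sigma^{2}_{p} + \sigma^{2}_{\delta}},
\]

where \(\sigma^{2}_{p}\) is the universe-score (person) variance and \(\sigma^{2}_{\delta}\) is the relative error variance.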

Ethical standard
The study was submitted to the University of Dundee Research Ethics Committee (UREC), which determined that it met ethical standards and did not require formal ethical approval.
Cite this article
Till, H., Ker, J., Myford, C. et al. Constructing and evaluating a validity argument for the final-year ward simulation exercise. Adv in Health Sci Educ 20, 1263–1289 (2015). https://doi.org/10.1007/s10459-015-9601-5


