
Constructing and evaluating a validity argument for the final-year ward simulation exercise

Advances in Health Sciences Education

Abstract

We report final-year ward simulation data from the University of Dundee Medical School. The faculty who designed this assessment intend the final score to represent an individual senior medical student’s level of clinical performance. The results are included in each student’s portfolio as one source of evidence of the student’s capability as a practitioner, professional, and scholar. Our purpose in conducting this study was to illustrate how designers of clinical performance assessments might develop propositions and then collect and examine various sources of evidence to construct and evaluate a validity argument. The data came from all 154 medical students in their final year of study at the University of Dundee Medical School in the 2010–2011 academic year. To the best of our knowledge, this is the first report of an analysis of senior medical students’ clinical performance while they were taking responsibility for the management of a simulated ward. Using multi-facet Rasch measurement and a generalizability theory approach, we examined the sources of validity evidence that the medical school faculty have gathered for a set of six propositions needed to support their use of scores as measures of students’ clinical ability. Based on our analysis, we conclude that, by and large, the propositions appear to be sound, and the evidence collected thus far supports the intended score interpretation as defensible.
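For readers unfamiliar with the two analytic frameworks named above, the sketch below gives the standard rating-scale form of the many-facet Rasch model and a conventional generalizability (G) coefficient, as these methods are usually written. The facet structure shown (students, items, assessors) and the fully crossed design are illustrative assumptions, not the authors’ exact specification.

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}

% Many-facet Rasch model (rating-scale form): log-odds that student n,
% rated by assessor j on item i, receives category k rather than k-1.
% The facets shown here (student, item, assessor) are illustrative.
\[
\ln\frac{P_{nijk}}{P_{nij(k-1)}} = \theta_n - \beta_i - \alpha_j - \tau_k
\]
% \theta_n: student ability;   \beta_i: item difficulty;
% \alpha_j: assessor severity; \tau_k: threshold between categories k-1 and k.

% Generalizability coefficient for relative decisions, assuming a fully
% crossed student (p) x item (i) x assessor (j) design, with n_i items
% and n_j assessors contributing to each student's score:
\[
E\rho^2 = \frac{\sigma^2_p}
  {\sigma^2_p + \frac{\sigma^2_{pi}}{n_i} + \frac{\sigma^2_{pj}}{n_j}
   + \frac{\sigma^2_{pij,e}}{n_i\, n_j}}
\]

\end{document}
```

In both frameworks the student term is separated from assessor severity and item difficulty, which is what licenses interpreting the resulting measure as evidence of clinical ability rather than of who happened to rate whom.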



Ethical standard

The study was submitted to the University of Dundee Research Ethics Committee (UREC), which determined that it met ethical standards and did not require formal ethical approval.

Author information


Corresponding author

Correspondence to Hettie Till.


About this article


Cite this article

Till, H., Ker, J., Myford, C. et al. Constructing and evaluating a validity argument for the final-year ward simulation exercise. Adv in Health Sci Educ 20, 1263–1289 (2015). https://doi.org/10.1007/s10459-015-9601-5

