Constructing a validity argument for the Objective Structured Assessment of Technical Skills (OSATS): a systematic review of validity evidence

Advances in Health Sciences Education

An Erratum to this article was published on 15 September 2015

Abstract

To construct and evaluate the validity argument for the Objective Structured Assessment of Technical Skills (OSATS), based on Kane’s framework, we conducted a systematic review. We searched MEDLINE, EMBASE, CINAHL, PsycINFO, ERIC, Web of Science, Scopus, and selected reference lists through February 2013. Working in duplicate, we selected original research articles in any language evaluating the OSATS as an assessment tool for any health professional. We iteratively and collaboratively extracted validity evidence from included articles to construct and evaluate the validity argument for varied uses of the OSATS. Twenty-nine articles met the inclusion criteria, all focussed on surgical technical skills assessment. We identified three intended uses for the OSATS: formative feedback, high-stakes assessment, and program evaluation. Following Kane’s framework, we examined four inferences in the validity argument: scoring, generalization, extrapolation, and decision. For formative feedback and high-stakes assessment, there was reasonable evidence for scoring and extrapolation. However, for high-stakes assessment there was a dearth of evidence for generalization aside from inter-rater reliability data, and no evidence linking multi-station OSATS scores to performance in real clinical settings. For program evaluation, the OSATS validity argument was supported by reasonable generalization and extrapolation evidence. There was a complete lack of evidence regarding implications and decisions based on OSATS scores. In general, validity evidence supported the use of the OSATS for formative feedback. Research to provide support for decisions based on OSATS scores is required if the OSATS is to be used for higher-stakes decisions and program evaluation.
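The abstract notes that, for high-stakes use, generalization evidence consisted largely of inter-rater reliability data. Inter-rater reliability is often summarized with an intraclass correlation coefficient (ICC), as described by Shrout and Fleiss (1979) in the reference list. The sketch below is illustrative only and is not the review's method. It assumes a fully crossed design in which every rater scores every trainee, and it computes ICC(2,1), the two-way random effects, absolute agreement, single-rater form. The function name `icc_2_1` is hypothetical. Which ICC form the included OSATS studies reported is not stated here. The test matrix is the six-target, four-judge worked example from Shrout and Fleiss, not OSATS data.

```python
"""Sketch: ICC(2,1) for a fully crossed trainees-by-raters score matrix."""
from __future__ import annotations

# Six targets (rows) scored by four judges (columns): the worked example
# from Shrout and Fleiss (1979). Not OSATS data.
SHROUT_FLEISS_EXAMPLE = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]


def icc_2_1(scores: list[list[float]]) -> float:
    """Two-way random effects, absolute agreement, single-rater ICC.

    `scores` has one row per trainee (target) and one column per rater.
    Assumes every rater scored every trainee (fully crossed, no missing data).
    """
    n = len(scores)                  # number of trainees
    k = len(scores[0]) if n else 0   # number of raters
    if n < 2 or k < 2:
        raise ValueError("need at least 2 trainees and 2 raters")
    if any(len(row) != k for row in scores):
        raise ValueError("every trainee must be scored by every rater")

    grand = sum(x for row in scores for x in row) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between trainees
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between raters
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_err = ss_total - ss_rows - ss_cols                   # residual

    bms = ss_rows / (n - 1)              # between-targets mean square
    jms = ss_cols / (k - 1)              # between-judges mean square
    ems = ss_err / ((n - 1) * (k - 1))   # residual mean square

    # Systematic rater leniency (JMS) lowers absolute agreement.
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)


if __name__ == "__main__":
    print(round(icc_2_1(SHROUT_FLEISS_EXAMPLE), 2))  # prints 0.29
```

The absolute-agreement form fits a setting where scores from different examiners must be interchangeable, as in a high-stakes station. The consistency form ICC(3,1) drops the k(JMS - EMS)/n term and ignores systematic differences in rater leniency. On the same matrix it gives about 0.71, which shows how much the choice of form can change a reported reliability value.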

Fig. 1
Fig. 2
Fig. 3

References

  • Aggarwal, R., Moorthy, K., & Darzi, A. (2004). Laparoscopic skills training and assessment. British Journal of Surgery, 91(12), 1549–1558.

  • American Educational Research Association, American Psychological Association, National Council on Measurement in Education, & Joint Committee on Standards for Educational and Psychological Testing US. (2014). Standards for Educational and Psychological Testing. Washington: AERA Publications.

  • Ault, G., Reznick, R., MacRae, H., Leadbetter, W., DaRosa, D., Joehl, R., et al. (2001). Exporting a technical skills evaluation technology to other sites. American Journal of Surgery, 182(3), 254–256.

  • Banks, E. H., Chudnoff, S., Karmin, I., Wang, C., & Pardanani, S. (2007). Does a surgical simulator improve resident operative performance of laparoscopic tubal ligation? American Journal of Obstetrics and Gynecology, 197(5), 541.e1–541.e5.

  • Bann, S., Davis, I. M., Moorthy, K., Munz, Y., Hernandez, J., Khan, M., et al. (2005). The reliability of multiple objective measures of surgery and the role of human performance. The American Journal of Surgery, 189(6), 747–752.

  • Bann, S., Kwok, K. F., Lo, C. Y., Darzi, A., & Wong, J. (2003). Objective assessment of technical skills of surgical trainees in Hong Kong. British Journal of Surgery, 90(10), 1294–1299.

  • Black, S. A., Nestel, D. F., Kneebone, R. L., & Wolfe, J. H. N. (2010). Assessment of surgical competence at carotid endarterectomy under local anaesthesia in a simulated operating theatre. British Journal of Surgery, 97(4), 511–516.

  • Broe, D., Ridgway, P. F., Johnson, S., Tierney, S., & Conlon, K. C. (2006). Construct validation of a novel hybrid surgical simulator. Surgical Endoscopy, 20(6), 900–904.

  • Brydges, R., Hatala, R., Zendejas, B., Erwin, P. J., & Cook, D. A. (2015). Linking simulation-based educational assessments and patient-related outcomes: A systematic review and meta-analysis. Academic Medicine, 90(2), 246–256.

  • Clauser, B. E., Margolis, M. J., Holtman, M. C., Katsufrakis, P. J., & Hawkins, R. E. (2010). Validity considerations in the assessment of professionalism. Advances in Health Sciences Education, 17(2), 165–181.

  • Cook, D. A. (2014). Much ado about differences: Why expert-novice comparisons add little to the validity argument. Advances in Health Sciences Education. doi:10.1007/s10459-014-9551-3.

  • Cook, D. A., Brydges, R., Ginsburg, G., & Hatala, R. (2014). A contemporary approach to validity arguments: A practical guide to Kane’s framework. Medical Education (in press).

  • Cook, D. A., Brydges, R., Zendejas, B., Hamstra, S. J., & Hatala, R. (2013). Technology-enhanced simulation to assess health professionals: A systematic review of validity evidence, research methods, and reporting quality. Academic Medicine, 88(6), 872–883.

  • Cook, D. A., Hatala, R., Brydges, R., Zendejas, B., Szostek, J. H., Wang, A. T., et al. (2011). Technology-enhanced simulation for health professions education: A systematic review and meta-analysis. JAMA, 306(9), 978–988.

  • Crossley, J., Davies, H., Humphris, G., & Jolly, B. (2002). Generalisability: a key to unlock professional assessment. Medical Education, 36(10), 972–978.

  • Dath, D., Regehr, G., Birch, D., Schlachta, C., Poulin, E., Mamazza, J., et al. (2004). Toward reliable operative assessment: The reliability and feasibility of videotaped assessment of laparoscopic technical skills. Surgical Endoscopy, 18(12), 1800–1804.

  • Datta, V., Bann, S., Beard, J., Mandalia, M., & Darzi, A. (2004). Comparison of bench test evaluations of surgical skill with live operating performance assessments. Journal of the American College of Surgeons, 199(4), 603–606.

  • Datta, V., Bann, S., Mandalia, M., & Darzi, A. (2006). The surgical efficiency score: A feasible, reliable, and valid method of skills assessment. The American Journal of Surgery, 192(3), 372–378.

  • Faulkner, H., Regehr, G., Martin, J., & Reznick, R. (1996). Validation of an objective structured assessment of technical skill for surgical residents. Academic Medicine, 71(12), 1363–1365.

  • Fialkow, M., Mandel, L., VanBlaricom, A., Chinn, M., Lentz, G., & Goff, B. (2007). A curriculum for Burch colposuspension and diagnostic cystoscopy evaluated by an objective structured assessment of technical skills. American Journal of Obstetrics and Gynecology, 197(5), 544.e1–544.e6.

  • Friedlich, M., MacRae, H., Oandasan, I., Tannenbaum, D., Batty, H., Reznick, R., & Regehr, G. (2001). Structured assessment of minor surgical skills (SAMSS) for family medicine residents. Academic Medicine, 76(12), 1241–1246.

  • Goff, B. A., Lentz, G. M., Lee, D., Fenner, D., Morris, J., & Mandel, L. S. (2001). Development of a bench station objective structured assessment of technical skills. Obstetrics and Gynecology, 98(3), 412–416.

  • Goff, B., Mandel, L., Lentz, G., VanBlaricom, A., Oelschlager, A.-M. A., Lee, D., et al. (2005). Assessment of resident surgical skills: Is testing feasible? American Journal of Obstetrics and Gynecology, 192(4), 1331–1338.

  • Goff, B. A., Nielsen, P. E., Lentz, G. M., Chow, G. E., Chalmers, R. W., Fenner, D., & Mandel, L. S. (2002). Surgical skills assessment: A blinded examination of obstetrics and gynecology residents. American Journal of Obstetrics and Gynecology, 186(4), 613–617.

  • Goff, B. A., VanBlaricom, A., Mandel, L., Chinn, M., & Nielsen, P. (2007). Comparison of objective, structured assessment of technical skills with a virtual reality hysteroscopy trainer and standard latex hysteroscopy model. The Journal of Reproductive Medicine, 52(5), 407–412.

  • Hance, J., Aggarwal, R., Stanbridge, R., Blauth, C., Munz, Y., Darzi, A., & Pepper, J. (2005). Objective assessment of technical skills in cardiac surgery. European Journal of Cardio-Thoracic Surgery, 28(1), 157–162.

  • Harden, R. M., & Gleeson, F. A. (1979). Assessment of clinical competence using an objective structured clinical examination (OSCE). Medical Education, 13(1), 41–54.

  • Hawkins, R. E., Margolis, M. J., Durning, S. J., & Norcini, J. J. (2010). Constructing a validity argument for the mini-clinical evaluation exercise: A review of the research. Academic Medicine, 85(9), 1453–1461.

  • Hislop, S. J., Hsu, J. H., Narins, C. R., Gillespie, B. T., Jain, R. A., Schippert, D. W., et al. (2006). Simulator assessment of innate endovascular aptitude versus empirically correct performance. Journal of Vascular Surgery, 43(1), 47–55.

  • Hodges, B., & McIlroy, J. H. (2003). Analytic global OSCE ratings are sensitive to level of training. Medical Education, 37(11), 1012–1016.

  • Hodges, B., Regehr, G., McNaughton, N., Tiberius, R., & Hanson, M. (1999). OSCE checklists do not capture increasing levels of expertise. Academic Medicine, 74(10), 1129–1134.

  • Holmboe, E. S., Hawkins, R. E., & Huot, S. J. (2004). Effects of training in direct observation of medical residents’ clinical competence: A randomized trial. Annals of Internal Medicine, 140(11), 874–881.

  • Ilgen, J. S., Ma, I. W. Y., Hatala, R., & Cook, D. A. (2015). Checklists and global rating scales to assess health professionals: A systematic review and meta-analysis of reliability and validity evidence in simulation-based education. Medical Education, 49(2), 161–173.

  • Jelovsek, J. E., Kow, N., & Diwadkar, G. B. (2013). Tools for the direct observation and assessment of psychomotor skills in medical trainees: a systematic review. Medical Education, 47(7), 650–673.

  • Kane, M. (2006). Validation. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 17–63). Washington: Rowman and Littlefield Publishers Inc.

  • Kane, M. T. (2013). Validating the interpretations and uses of test scores. Journal of Educational Measurement, 50(1), 1–73.

  • Kassab, E., Tun, J. K., Arora, S., King, D., Ahmed, K., Miskovic, D., et al. (2011). “Blowing up the Barriers” in Surgical Training. Annals of Surgery, 254(6), 1059–1065.

  • Khan, M. S., Bann, S. D., Darzi, A. W., & Butler, P. E. M. (2007). Assessing surgical skill using bench station models. Plastic and Reconstructive Surgery, 120(3), 793–800.

  • Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33(1), 159–174.

  • LeBlanc, V. R., Tabak, D., Kneebone, R., Nestel, D., MacRae, H., & Moulton, C. A. (2009). Psychometric properties of an integrated assessment of technical and communication skills. American Journal of Surgery, 197(1), 96–101.

  • Leong, J. J. H., Leff, D. R., Das, A., Aggarwal, R., Reilly, P., Atkinson, H. D. E., et al. (2008). Validation of orthopaedic bench models for trauma surgery. The Journal of Bone and Joint Surgery, British Volume, 90(7), 958–965.

  • Martin, J., Regehr, G., Reznick, R., MacRae, H., Brown, M., Murnaghan, J., et al. (1995). An objective structured assessment of technical skills (OSATS) for surgical residents. Gastroenterology, 108(Suppl), A1231.

  • Martin, J. A., Regehr, G., Reznick, R., MacRae, H., Murnaghan, J., Hutchison, C., & Brown, M. (1997). Objective structured assessment of technical skill (OSATS) for surgical residents. British Journal of Surgery, 84(2), 273–278.

  • Moorthy, K. (2003). Objective assessment of technical skills in surgery. BMJ, 327(7422), 1032–1037.

  • Norman, G. R., van der Vleuten, C. P., & De Graaf, E. (1991). Pitfalls in the pursuit of objectivity: Issues of validity, efficiency and acceptability. Medical Education, 25(2), 119–126.

  • Pandey, V. A., Wolfe, J. H. N., Liapis, C. D., Bergqvist, D., & on behalf of the European Board of Vascular Surgery. (2006). The examination assessment of technical competence in vascular surgery. British Journal of Surgery, 93(9), 1132–1138.

  • Pandey, V. A., Wolfe, J. H. N., Lindahl, A. K., Rauwerda, J. A., & Bergqvist, D. (2004). Validity of an exam assessment in surgical skill: EBSQ-VASC pilot study. European Journal of Vascular and Endovascular Surgery, 27(4), 341–348.

  • Ponton-Carss, A., Hutchison, C., & Violato, C. (2011). Assessment of communication, professionalism, and surgical skills in an objective structured performance-related examination (OSPRE): A psychometric study. American Journal of Surgery, 202(4), 433–440.

  • Regehr, G., MacRae, H., Reznick, R. K., & Szalay, D. (1998). Comparing the psychometric properties of checklists and global rating scales for assessing performance on an OSCE-format examination. Academic Medicine, 73(9), 993–997.

  • Reznick, R., Regehr, G., MacRae, H., Martin, J., & McCulloch, W. (1997). Testing technical skill via an innovative “bench station” examination. American Journal of Surgery, 173(3), 226–230.

  • Shrout, P. E., & Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2), 420–428.

  • Streiner, D. L., & Norman, G. R. (2008). Health Measurement Scales: A practical guide to their development and use. Oxford: Oxford University Press.

  • Swanson, D. B., & van der Vleuten, C. P. M. (2013). Assessment of clinical skills with standardized patients: State of the art revisited. Teaching and Learning in Medicine, 25(sup 1), S17–S25.

  • van Hove, P. D., Tuijthof, G. J. M., Verdaasdonk, E. G. G., Stassen, L. P. S., & Dankelman, J. (2010). Objective assessment of technical surgical skills. British Journal of Surgery, 97(7), 972–987.

  • VanBlaricom, A. L., Goff, B. A., Chinn, M., Icasiano, M. M., Nielsen, P., & Mandel, L. (2005). A new curriculum for hysteroscopy training as demonstrated by an objective structured assessment of technical skills (OSATS). American Journal of Obstetrics and Gynecology, 193(5), 1856–1865.

  • VanHeest, A., Kuzel, B., Agel, J., Putnam, M., Kalliainen, L., & Fletcher, J. (2012). Objective structured assessment of technical skill in upper extremity surgery. Journal of Hand Surgery, 37(2), 332–337.e4.

  • Willems, M. C. M., van der Vliet, J. A., Williams, V., Kool, L. J. S., Bergqvist, D., & Blankensteijn, J. D. (2009). Assessing endovascular skills using the simulator for testing and rating endovascular skills (STRESS) machine. European Journal of Vascular and Endovascular Surgery, 37(4), 431–436.

  • Winckel, C. P., Reznick, R. K., Cohen, R., & Taylor, B. (1994). Reliability and construct validity of a structured technical skills assessment form. American Journal of Surgery, 167(4), 423–427.

Acknowledgments

This article may not represent the views or opinions of the American Medical Association.

Author information

Corresponding author

Correspondence to Rose Hatala.

About this article

Cite this article

Hatala, R., Cook, D.A., Brydges, R. et al. Constructing a validity argument for the Objective Structured Assessment of Technical Skills (OSATS): a systematic review of validity evidence. Adv in Health Sci Educ 20, 1149–1175 (2015). https://doi.org/10.1007/s10459-015-9593-1

  • DOI: https://doi.org/10.1007/s10459-015-9593-1
