Abstract
To construct and evaluate the validity argument for the Objective Structured Assessment of Technical Skills (OSATS) using Kane’s framework, we conducted a systematic review. We searched MEDLINE, EMBASE, CINAHL, PsycINFO, ERIC, Web of Science, Scopus, and selected reference lists through February 2013. Working in duplicate, we selected original research articles in any language evaluating the OSATS as an assessment tool for any health professional. We iteratively and collaboratively extracted validity evidence from included articles to construct and evaluate the validity argument for varied uses of the OSATS. Twenty-nine articles met the inclusion criteria, all focussed on surgical technical skills assessment. We identified three intended uses for the OSATS: formative feedback, high-stakes assessment and program evaluation. Following Kane’s framework, we examined four inferences in the validity argument (scoring, generalization, extrapolation, decision). For formative feedback and high-stakes assessment, there was reasonable evidence for scoring and extrapolation. However, for high-stakes assessment there was a dearth of evidence for generalization aside from inter-rater reliability data and an absence of evidence linking multi-station OSATS scores to performance in real clinical settings. For program evaluation, the OSATS validity argument was supported by reasonable generalization and extrapolation evidence. There was a complete lack of evidence regarding implications and decisions based on OSATS scores. In general, validity evidence supported the use of the OSATS for formative feedback. Research to provide support for decisions based on OSATS scores is required if the OSATS is to be used for higher-stakes decisions and program evaluation.
References
Aggarwal, R., Moorthy, K., & Darzi, A. (2004). Laparoscopic skills training and assessment. British Journal of Surgery, 91(12), 1549–1558.
American Educational Research Association, American Psychological Association, National Council on Measurement in Education, & Joint Committee on Standards for Educational and Psychological Testing US. (2014). Standards for Educational and Psychological Testing. Washington: AERA Publications.
Ault, G., Reznick, R., MacRae, H., Leadbetter, W., DaRosa, D., Joehl, R., et al. (2001). Exporting a technical skills evaluation technology to other sites. American Journal of Surgery, 182(3), 254–256.
Banks, E. H., Chudnoff, S., Karmin, I., Wang, C., & Pardanani, S. (2007). Does a surgical simulator improve resident operative performance of laparoscopic tubal ligation? American Journal of Obstetrics and Gynecology, 197(5), 541.e1–541.e5.
Bann, S., Davis, I. M., Moorthy, K., Munz, Y., Hernandez, J., Khan, M., et al. (2005). The reliability of multiple objective measures of surgery and the role of human performance. The American Journal of Surgery, 189(6), 747–752.
Bann, S., Kwok, K. F., Lo, C. Y., Darzi, A., & Wong, J. (2003). Objective assessment of technical skills of surgical trainees in Hong Kong. British Journal of Surgery, 90(10), 1294–1299.
Black, S. A., Nestel, D. F., Kneebone, R. L., & Wolfe, J. H. N. (2010). Assessment of surgical competence at carotid endarterectomy under local anaesthesia in a simulated operating theatre. British Journal of Surgery, 97(4), 511–516.
Broe, D., Ridgway, P. F., Johnson, S., Tierney, S., & Conlon, K. C. (2006). Construct validation of a novel hybrid surgical simulator. Surgical Endoscopy, 20(6), 900–904.
Brydges, R., Hatala, R., Zendejas, B., Erwin, P. J., & Cook, D. A. (2015). Linking simulation-based educational assessments and patient-related outcomes: A systematic review and meta-analysis. Academic Medicine, 90(2), 246–256.
Clauser, B. E., Margolis, M. J., Holtman, M. C., Katsufrakis, P. J., & Hawkins, R. E. (2010). Validity considerations in the assessment of professionalism. Advances in Health Sciences Education, 17(2), 165–181.
Cook, D. A. (2014). Much ado about differences: Why expert-novice comparisons add little to the validity argument. Advances in Health Sciences Education. doi:10.1007/s10459-014-9551-3.
Cook, D. A., Brydges, R., Ginsburg, G., & Hatala, R. (2014). A contemporary approach to validity arguments: A practical guide to Kane’s framework. Medical Education (in press).
Cook, D. A., Brydges, R., Zendejas, B., Hamstra, S. J., & Hatala, R. (2013). Technology-enhanced simulation to assess health professionals: A systematic review of validity evidence, research methods, and reporting quality. Academic Medicine, 88(6), 872–883.
Cook, D. A., Hatala, R., Brydges, R., Zendejas, B., Szostek, J. H., Wang, A. T., et al. (2011). Technology-enhanced simulation for health professions education: A systematic review and meta-analysis. JAMA, 306(9), 978–988.
Crossley, J., Davies, H., Humphris, G., & Jolly, B. (2002). Generalisability: a key to unlock professional assessment. Medical Education, 36(10), 972–978.
Dath, D., Regehr, G., Birch, D., Schlachta, C., Poulin, E., Mamazza, J., et al. (2004). Toward reliable operative assessment: The reliability and feasibility of videotaped assessment of laparoscopic technical skills. Surgical Endoscopy, 18(12), 1800–1804.
Datta, V., Bann, S., Beard, J., Mandalia, M., & Darzi, A. (2004). Comparison of bench test evaluations of surgical skill with live operating performance assessments. Journal of the American College of Surgeons, 199(4), 603–606.
Datta, V., Bann, S., Mandalia, M., & Darzi, A. (2006). The surgical efficiency score: A feasible, reliable, and valid method of skills assessment. The American Journal of Surgery, 192(3), 372–378.
Faulkner, H., Regehr, G., Martin, J., & Reznick, R. (1996). Validation of an objective structured assessment of technical skill for surgical residents. Academic Medicine, 71(12), 1363–1365.
Fialkow, M., Mandel, L., VanBlaricom, A., Chinn, M., Lentz, G., & Goff, B. (2007). A curriculum for Burch colposuspension and diagnostic cystoscopy evaluated by an objective structured assessment of technical skills. American Journal of Obstetrics and Gynecology, 197(5), 544.e1–544.e6.
Friedlich, M., MacRae, H., Oandasan, I., Tannenbaum, D., Batty, H., Reznick, R., & Regehr, G. (2001). Structured assessment of minor surgical skills (SAMSS) for family medicine residents. Academic Medicine, 76(12), 1241–1246.
Goff, B. A., Lentz, G. M., Lee, D., Fenner, D., Morris, J., & Mandel, L. S. (2001). Development of a bench station objective structured assessment of technical skills. Obstetrics and Gynecology, 98(3), 412–416.
Goff, B., Mandel, L., Lentz, G., VanBlaricom, A., Oelschlager, A.-M. A., Lee, D., et al. (2005). Assessment of resident surgical skills: Is testing feasible? American Journal of Obstetrics and Gynecology, 192(4), 1331–1338.
Goff, B. A., Nielsen, P. E., Lentz, G. M., Chow, G. E., Chalmers, R. W., Fenner, D., & Mandel, L. S. (2002). Surgical skills assessment: A blinded examination of obstetrics and gynecology residents. American Journal of Obstetrics and Gynecology, 186(4), 613–617.
Goff, B. A., VanBlaricom, A., Mandel, L., Chinn, M., & Nielsen, P. (2007). Comparison of objective, structured assessment of technical skills with a virtual reality hysteroscopy trainer and standard latex hysteroscopy model. The Journal of Reproductive Medicine, 52(5), 407–412.
Hance, J., Aggarwal, R., Stanbridge, R., Blauth, C., Munz, Y., Darzi, A., & Pepper, J. (2005). Objective assessment of technical skills in cardiac surgery. European Journal of Cardio-Thoracic Surgery, 28(1), 157–162.
Harden, R. M., & Gleeson, F. A. (1979). Assessment of clinical competence using an objective structured clinical examination (OSCE). Medical Education, 13(1), 41–54.
Hawkins, R. E., Margolis, M. J., Durning, S. J., & Norcini, J. J. (2010). Constructing a validity argument for the mini-clinical evaluation exercise: A review of the research. Academic Medicine, 85(9), 1453–1461.
Hislop, S. J., Hsu, J. H., Narins, C. R., Gillespie, B. T., Jain, R. A., Schippert, D. W., et al. (2006). Simulator assessment of innate endovascular aptitude versus empirically correct performance. Journal of Vascular Surgery, 43(1), 47–55.
Hodges, B., & McIlroy, J. H. (2003). Analytic global OSCE ratings are sensitive to level of training. Medical Education, 37(11), 1012–1016.
Hodges, B., Regehr, G., McNaughton, N., Tiberius, R., & Hanson, M. (1999). OSCE checklists do not capture increasing levels of expertise. Academic Medicine, 74(10), 1129–1134.
Holmboe, E. S., Hawkins, R. E., & Huot, S. J. (2004). Effects of training in direct observation of medical residents’ clinical competence: A randomized trial. Annals of Internal Medicine, 140(11), 874–881.
Ilgen, J. S., Ma, I. W. Y., Hatala, R., & Cook, D. A. (2015). Checklists and global rating scales to assess health professionals: A systematic review and meta-analysis of reliability and validity evidence in simulation-based education. Medical Education, 49(2), 161–173.
Jelovsek, J. E., Kow, N., & Diwadkar, G. B. (2013). Tools for the direct observation and assessment of psychomotor skills in medical trainees: a systematic review. Medical Education, 47(7), 650–673.
Kane, M. (2006). Validation. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 17–63). Washington: Rowman and Littlefield Publishers Inc.
Kane, M. T. (2013). Validating the interpretations and uses of test scores. Journal of Educational Measurement, 50(1), 1–73.
Kassab, E., Tun, J. K., Arora, S., King, D., Ahmed, K., Miskovic, D., et al. (2011). “Blowing up the Barriers” in Surgical Training. Annals of Surgery, 254(6), 1059–1065.
Khan, M. S., Bann, S. D., Darzi, A. W., & Butler, P. E. M. (2007). Assessing surgical skill using bench station models. Plastic and Reconstructive Surgery, 120(3), 793–800.
Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33(1), 159–174.
LeBlanc, V. R., Tabak, D., Kneebone, R., Nestel, D., MacRae, H., & Moulton, C. A. (2009). Psychometric properties of an integrated assessment of technical and communication skills. American Journal of Surgery, 197(1), 96–101.
Leong, J. J. H., Leff, D. R., Das, A., Aggarwal, R., Reilly, P., Atkinson, H. D. E., et al. (2008). Validation of orthopaedic bench models for trauma surgery. The Journal of Bone and Joint Surgery, British Volume, 90(7), 958–965.
Martin, J., Regehr, G., Reznick, R., MacRae, H., Brown, M., Murnaghan, J., et al. (1995). An objective structured assessment of technical skills (OSATS) for surgical residents. Gastroenterology, 108(Suppl), A1231.
Martin, J. A., Regehr, G., Reznick, R., MacRae, H., Murnaghan, J., Hutchison, C., & Brown, M. (1997). Objective structured assessment of technical skill (OSATS) for surgical residents. British Journal of Surgery, 84(2), 273–278.
Moorthy, K. (2003). Objective assessment of technical skills in surgery. BMJ, 327(7422), 1032–1037.
Norman, G. R., van der Vleuten, C. P., & De Graaf, E. (1991). Pitfalls in the pursuit of objectivity: Issues of validity, efficiency and acceptability. Medical Education, 25(2), 119–126.
Pandey, V. A., Wolfe, J. H. N., Liapis, C. D., Bergqvist, D., & on behalf of the European Board of Vascular Surgery. (2006). The examination assessment of technical competence in vascular surgery. British Journal of Surgery, 93(9), 1132–1138.
Pandey, V. A., Wolfe, J. H. N., Lindahl, A. K., Rauwerda, J. A., & Bergqvist, D. (2004). Validity of an exam assessment in surgical skill: EBSQ-VASC pilot study. European Journal of Vascular and Endovascular Surgery, 27(4), 341–348.
Ponton-Carss, A., Hutchison, C., & Violato, C. (2011). Assessment of communication, professionalism, and surgical skills in an objective structured performance-related examination (OSPRE): A psychometric study. American Journal of Surgery, 202(4), 433–440.
Regehr, G., MacRae, H., Reznick, R. K., & Szalay, D. (1998). Comparing the psychometric properties of checklists and global rating scales for assessing performance on an OSCE-format examination. Academic Medicine, 73(9), 993–997.
Reznick, R., Regehr, G., MacRae, H., Martin, J., & McCulloch, W. (1997). Testing technical skill via an innovative “bench station” examination. American Journal of Surgery, 173(3), 226–230.
Shrout, P. E., & Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2), 420–428.
Streiner, D. L., & Norman, G. R. (2008). Health Measurement Scales: A practical guide to their development and use. Oxford: Oxford University Press.
Swanson, D. B., & van der Vleuten, C. P. M. (2013). Assessment of clinical skills with standardized patients: State of the art revisited. Teaching and Learning in Medicine, 25(sup 1), S17–S25.
van Hove, P. D., Tuijthof, G. J. M., Verdaasdonk, E. G. G., Stassen, L. P. S., & Dankelman, J. (2010). Objective assessment of technical surgical skills. British Journal of Surgery, 97(7), 972–987.
VanBlaricom, A. L., Goff, B. A., Chinn, M., Icasiano, M. M., Nielsen, P., & Mandel, L. (2005). A new curriculum for hysteroscopy training as demonstrated by an objective structured assessment of technical skills (OSATS). American Journal of Obstetrics and Gynecology, 193(5), 1856–1865.
VanHeest, A., Kuzel, B., Agel, J., Putnam, M., Kalliainen, L., & Fletcher, J. (2012). Objective structured assessment of technical skill in upper extremity surgery. Journal of Hand Surgery, 37(2), 332–337.e4.
Willems, M. C. M., van der Vliet, J. A., Williams, V., Kool, L. J. S., Bergqvist, D., & Blankensteijn, J. D. (2009). Assessing endovascular skills using the simulator for testing and rating endovascular skills (STRESS) machine. European Journal of Vascular and Endovascular Surgery, 37(4), 431–436.
Winckel, C. P., Reznick, R. K., Cohen, R., & Taylor, B. (1994). Reliability and construct validity of a structured technical skills assessment form. American Journal of Surgery, 167(4), 423–427.
Acknowledgments
This article may not represent the views or opinions of the American Medical Association.
About this article
Cite this article
Hatala, R., Cook, D.A., Brydges, R. et al. Constructing a validity argument for the Objective Structured Assessment of Technical Skills (OSATS): a systematic review of validity evidence. Adv in Health Sci Educ 20, 1149–1175 (2015). https://doi.org/10.1007/s10459-015-9593-1
DOI: https://doi.org/10.1007/s10459-015-9593-1