In the minds of OSCE examiners: uncovering hidden assumptions

Abstract

The Objective Structured Clinical Exam (OSCE) is a widely used method of assessment in medical education. Rater cognition has become an important area of inquiry in the medical education assessment literature generally, and in the OSCE literature specifically, because of concerns about potential compromises of validity. In this study, a novel mixed-methods approach that combined Ordinal Logistic Hierarchical Linear Modeling and cognitive interviews was used to gain insights into what examiners were thinking during an OSCE. This study is based on data from the 2010 to 2014 administrations of the Clinician Assessment for Practice Program OSCE for International Medical Graduates (IMGs) in Nova Scotia. An IMG is a physician trained outside of Canada who was a licensed practitioner in a different country. The quantitative data were examined alongside four follow-up cognitive interviews of examiners conducted after the 2014 administration. The quantitative results show that the competencies of (1) Investigation and Management and (2) Counseling were highly predictive of the Overall Global score. These competencies were also described in the cognitive interviews as the most salient parts of the OSCE. Examiners also found Communication Skills and Professional Behavior to be relevant, but the quantitative results revealed these to be less predictive of the Overall Global score. The interviews also reveal a tacit sequence by which IMGs are expected to proceed in an OSCE, starting with more basic competencies such as History Taking and building up to Investigation and Management and Counseling. The combined results confirm that a hidden pattern exists in how examiners rate candidates. This study has potential implications for research into rater cognition and for the design and scoring of practice-ready OSCEs.


Acknowledgments

The authors wish to acknowledge the support of the College of Physicians & Surgeons of Nova Scotia in conducting the research study, and to thank Susan Elgie, Dr. Lorelei Lingard, and Dr. Tomoko Arimura for their support in reviewing this manuscript.

Author information


Corresponding author

Correspondence to Saad Chahine.

Appendix

Ordinal Logistic Hierarchical Linear Modeling (OLHLM) is slightly more complex than standard HLM because the outcome variable consists of ordered categories rather than continuous data. OLHLM is very similar to Logistic Hierarchical Linear Modeling, where the outcome is binary and the model estimates the likelihood of receiving, say, a very good or poor rating. Since there are four categories (poor, borderline, satisfactory, very good), three functions (Eqs. 1–3) are used, and the propensity of being in each category is expressed as a change in log odds. The model below represents a three-level OLHLM with a four-category outcome; three of the four categories are estimated directly, as the probability of the fourth category is assumed to be 1 minus the sum of the probabilities of the other three.

Station level model

$$ {\text{Category}}\,1:\;\;\log \left[ {\frac{{\phi_{ijm\left( 1 \right)}^{\prime } }}{{1 - \phi_{ijm\left( 1 \right)}^{\prime } }}} \right] = \pi_{0jm} + \pi_{1jm} X_{1jm} + \pi_{2jm} X_{2jm} + \cdots + \pi_{kjm} X_{kjm} $$
(1)
$$ {\text{Category}}\,2:\;\;\log \left[ {\frac{{\phi_{ijm\left( 2 \right)}^{\prime } }}{{1 - \phi_{ijm\left( 2 \right)}^{\prime } }}} \right] = \pi_{0jm} + \pi_{1jm} X_{1jm} + \pi_{2jm} X_{2jm} + \cdots + \pi_{kjm} X_{kjm} + \delta_{\left( 2 \right)} $$
(2)
$$ {\text{Category}}\,3:\;\;\log \left[ {\frac{{\phi_{ijm\left( 3 \right)}^{\prime } }}{{1 - \phi_{ijm\left( 3 \right)}^{\prime } }}} \right] = \pi_{0jm} + \pi_{1jm} X_{1jm} + \pi_{2jm} X_{2jm} + \cdots + \pi_{kjm} X_{kjm} + \delta_{\left( 3 \right)} $$
(3)

Candidate level

$$ \begin{aligned} &\pi_{0jm} = \beta_{00m} + r_{0jm} \hfill \\ &\pi_{1jm} = \beta_{10m} \hfill \\ &\pi_{2jm} = \beta_{20m} \hfill \\&\qquad \vdots \hfill \\ &\pi_{kjm} = \beta_{k0m} \hfill \\ \end{aligned} $$
(4)

Year level

$$ \begin{aligned} &\beta_{00m} = \gamma_{000} + u_{00m} \hfill \\ &\beta_{10m} = \gamma_{100} \hfill \\ &\beta_{20m} = \gamma_{200} \hfill \\ &\qquad \vdots \hfill \\ & \beta_{k0m} = \gamma_{k00} \hfill \\ \end{aligned} $$
(5)

It is important, when reading an HLM model, to look at the different levels. In our case, since there are no person characteristics at level 2 and no year characteristics at level 3, we are estimating the error variance at each of those levels along with the competency coefficients. In this model:

  • \( \phi_{ijm(1)}^{\prime } \) is the probability that person j in year m scores a 1 on station i;
  • \( \phi_{ijm(2)}^{\prime } \) is the probability that person j in year m scores a 1 or 2 on station i;
  • \( \phi_{ijm(3)}^{\prime } \) is the probability that person j in year m scores a 1, 2, or 3 on station i;
  • \( \pi_{0jm} \) is the intercept term;
  • \( \pi_{kjm} \) is the coefficient for the kth competency for person j in year m;
  • \( X_{kjm} \) is the kth competency score for person j in year m;
  • \( \delta_{(2)} \) is the threshold value between categories 2 and 1;
  • \( \delta_{(3)} \) is the threshold value between categories 3 and 2;
  • \( r_{0jm} \) is the random component of \( \pi_{0jm} \);
  • \( \beta_{00m} \) is the effect of the reference competency in year m;
  • \( \beta_{k0m} \) is the effect of the kth competency in year m;
  • \( u_{00m} \) is the random component of \( \beta_{00m} \);
  • \( \gamma_{000} \) is the overall effect of competencies;
  • \( \gamma_{k00} \) is the coefficient for competency k.
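Substituting the candidate-level equations (Eq. 4) and the year-level equations (Eq. 5) into the station-level model collapses the three levels into a single combined equation. Shown here for Category 1 (this form is implied by, though not printed in, the model above), it makes explicit that only the intercept carries random components:

$$ \log \left[ {\frac{{\phi_{ijm\left( 1 \right)}^{\prime } }}{{1 - \phi_{ijm\left( 1 \right)}^{\prime } }}} \right] = \gamma_{000} + \gamma_{100} X_{1jm} + \gamma_{200} X_{2jm} + \cdots + \gamma_{k00} X_{kjm} + r_{0jm} + u_{00m} $$

Setting the random components \( r_{0jm} \) and \( u_{00m} \) to zero, i.e., considering a typical candidate in a typical year, leaves the fixed part that appears in the probability formulas below (Eqs. 6–9).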

The HLM program uses the above model and provides estimates for each of the quantities defined above. However, it does not provide the probability estimates of a candidate belonging to the poor, borderline, satisfactory, or very good category at each station; these must be calculated using the following formulas.

$$ {\text{Category}}\,1:\;\;\phi_{ijm\left( 1 \right)}^{\prime } = \frac{{e^{{(\gamma_{000} + \gamma_{100} X_{1jm} + \gamma_{200} X_{2jm} + \cdots + \gamma_{k00} X_{kjm} )}} }}{{1 + e^{{(\gamma_{000} + \gamma_{100} X_{1jm} + \gamma_{200} X_{2jm} + \cdots + \gamma_{k00} X_{kjm} )}} }} $$
(6)
$$ {\text{Category}}\;2:\;\;\phi_{ijm\left( 2 \right)}^{\prime } = \left( {\frac{{e^{{(\gamma_{000} + \gamma_{100} X_{1jm} + \gamma_{200} X_{2jm} + \cdots + \gamma_{k00} X_{kjm} + \delta_{\left( 2 \right)} )}} }}{{1 + e^{{(\gamma_{000} + \gamma_{100} X_{1jm} + \gamma_{200} X_{2jm} + \cdots + \gamma_{k00} X_{kjm} + \delta_{\left( 2 \right)} )}} }}} \right) - \phi_{ijm(1)}^{\prime } $$
(7)
$$ {\text{Category}}\;3:\;\;\phi_{ijm(3)}^{\prime } = \left( {\frac{{e^{{(\gamma_{000} + \gamma_{100} X_{1jm} + \gamma_{200} X_{2jm} + \cdots + \gamma_{k00} X_{kjm} + \delta_{\left( 3 \right)} )}} }}{{1 + e^{{(\gamma_{000} + \gamma_{100} X_{1jm} + \gamma_{200} X_{2jm} + \cdots + \gamma_{k00} X_{kjm} + \delta_{\left( 3 \right)} )}} }}} \right) - \phi_{ijm(2)}^{\prime } - \phi_{ijm(1)}^{\prime } $$
(8)
$$ {\text{Category}}\;4:\;\;\phi_{ijm(4)}^{\prime } = 1 - \phi_{ijm(3)}^{\prime } - \phi_{ijm(2)}^{\prime } - \phi_{ijm(1)}^{\prime } $$
(9)

\( \phi '_{ijm\left( 1 \right)} \) denotes the probability of being in the poor category when the \( X_{kjm} \) (i.e., competency scores) are at the average (i.e., set to zero). To calculate \( \phi '_{ijm\left( 4 \right)} \), the probability of being in the very good category, we subtract the probabilities of being in the poor, borderline, and satisfactory categories from 1.
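To make the calculation concrete, here is a minimal sketch that computes the four category probabilities following Eqs. 6–9. The numeric values of the fixed effects (the \( \gamma \)'s) and thresholds (the \( \delta \)'s) are hypothetical placeholders chosen for illustration; they are not the estimates reported in this study.

```python
import numpy as np

# Hypothetical fixed-effect estimates and thresholds -- illustrative values
# only, not the estimates obtained in this study.
gamma_000 = -2.0                        # intercept, gamma_000
gamma_k00 = np.array([0.8, 0.5, 0.3])   # competency coefficients, gamma_k00
delta_2, delta_3 = 1.5, 3.0             # thresholds, delta_(2) and delta_(3)

def category_probabilities(x):
    """Return P(poor), P(borderline), P(satisfactory), P(very good) for a
    candidate whose centered competency scores are x, following Eqs. 6-9."""
    eta = gamma_000 + gamma_k00 @ x              # fixed part of the linear predictor
    expit = lambda z: 1.0 / (1.0 + np.exp(-z))   # inverse logit
    cum1 = expit(eta)                            # P(category 1)           (Eq. 6)
    cum2 = expit(eta + delta_2)                  # P(category 1 or 2)
    cum3 = expit(eta + delta_3)                  # P(category 1, 2, or 3)
    return (cum1,             # poor
            cum2 - cum1,      # borderline                                (Eq. 7)
            cum3 - cum2,      # satisfactory                              (Eq. 8)
            1.0 - cum3)       # very good                                 (Eq. 9)

# At average competency scores (x = 0), as discussed above:
print(category_probabilities(np.zeros(3)))  # four probabilities summing to 1
```

Because the thresholds are positive and increasing, the cumulative probabilities are nondecreasing, so each difference is a valid probability and the four values sum to one.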

About this article

Cite this article

Chahine, S., Holmes, B. & Kowalewski, Z. In the minds of OSCE examiners: uncovering hidden assumptions. Adv in Health Sci Educ 21, 609–625 (2016). https://doi.org/10.1007/s10459-015-9655-4

Keywords

  • Assessment
  • Cognitive interview
  • Hierarchical Linear Modeling (HLM)
  • International Medical Graduate (IMG)
  • Objective Structured Clinical Exam (OSCE)
  • Practice-ready
  • Rater cognition