Background

Administration of formative and summative objective structured clinical examinations (OSCEs) in teaching programs has been shown to improve final-year medical students' examination performance [1]. Although the correlation between clinical competence and high-stakes OSCEs is widely accepted [2, 3], it remains unclear whether serial administration of formative OSCEs in clinical teaching programs leads to consistent improvement in clinical performance. Therefore, in this study, we investigated whether serially administered formative OSCEs help medical students achieve high scores in the national OSCE.

OSCEs are well-established as effective assessment tools for clinical competence. As such, they have become an important part of the medical licensing process in many countries [4, 5]. In Taiwan, the Taiwan Association of Medical Education, jointly commissioned by the Examination Yuan, the Ministry of Education and the Ministry of Health and Welfare, established an OSCE office to initiate a national medical licensure OSCE program. Passing the national OSCE has been a requirement for advancement to the second-stage national medical licensing examination in Taiwan since 2013 [6]. The Taiwanese national OSCE is held at more than 20 certified examination units located in teaching hospitals or medical schools.

OSCE-related policies also influence various aspects of medical education systems that apply OSCEs as a formative educational mode [7]. All students are repeatedly instructed about the goals of OSCEs. During formative OSCEs, students are given the opportunity to interact with standardized patients (SPs), and raters (their teachers) provide immediate feedback at each station. Serially administered OSCEs improve the performance reliability of borderline students through increased practice time [8]. Such programs can narrow the clinical skills gap among students with different training backgrounds; they also enhance the evaluation and teaching abilities of teachers, helping to create a positive atmosphere in education systems. Hence, investigating the responses of students completing serial mock OSCEs at a training hospital should help determine the examinations' influence on students' performance of clinical skills. Our study analyzed the annual performance of 82 final-year medical students at Cathay General Hospital (CGH) who completed five OSCEs, the fifth of which was the national OSCE held at CGH, and 125 final-year medical students from Fu-Jen University (FJU) who completed the national OSCE at CGH or at one of three other teaching hospitals (herein referred to as hospitals F, S and K) during 2013–2015.

Methods

This retrospective study was approved by the Institutional Review Board (IRB) of Cathay General Hospital (CGHIRB No: CGH-P104084). The IRB granted a waiver of informed consent; participants' records were anonymized and de-identified prior to analysis. Eighty-two students who received clinical training at CGH between 2013 and 2015 were categorized into three groups according to the medical school attended: group A comprised students from FJU (n = 31), group B comprised students from medical school B of Taiwan (n = 30), and group C comprised graduates of foreign medical schools (n = 21). Regarding previous (i.e., pre-internship) OSCE experience, participants in group A had completed OSCEs six times on average, those in group B had completed one OSCE, and those in group C had no OSCE experience.

The students in all groups completed a set of five OSCEs within one academic year between 2013 and 2015; the first four were formative OSCEs in which station raters provided students with direct feedback, and the final examination was the national OSCE. Individual student scores on each OSCE were recorded, and the differences in mean scores among groups A, B and C were analyzed. The first to fourth mock OSCEs (OSCE1–OSCE4) were held in October, December, February, and April, respectively, and the national OSCE (OSCE5) was held in the last week of April. Three OSCEs (OSCE1, OSCE2, and OSCE4) contained six stations each (four SP stations and two procedure operation stations), and two OSCEs (OSCE3 and OSCE5) comprised 12 stations each (eight SP stations and four procedure operation stations). The allocated fields for the stations covered five departments (internal medicine, surgery, gynecology, pediatrics, and emergency medicine), corresponding to five categories of clinical skills: history taking, physical examination, explanation and management, communication and counseling, and procedural skills.

To pass, students were required to achieve the passing score at a given minimum number of stations (seven stations) as well as the overall passing score. The passing score for each station was calculated using the borderline-group and borderline-regression methods. Each station in OSCE1–OSCE4 involved 8 min of observation followed by 2 min of immediate verbal feedback from the station rater. Students received formal individualized performance analysis reports and participated in a 90-min discussion meeting one week after each mock OSCE to review the correct procedures for each station, with the aim of improving performance at the next examination. The national OSCE (OSCE5) was a summative examination in which students were given 8 min at each station. All raters were clinical physicians with scoring experience certified by the OSCE office, and all SPs were certified as having performance experience. The discrimination, difficulty and passing score of each station were calculated and compared according to the guidelines set out by the OSCE office.
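For illustration, the two standard-setting approaches named above can be summarized as follows: the borderline-group method takes the station pass mark as the mean checklist score of examinees who received a "borderline" global rating, whereas the borderline-regression method regresses checklist scores on global ratings and takes the pass mark as the score predicted at the borderline rating. The minimal sketch below shows both calculations on hypothetical station data; the rating scale and score values are assumptions for illustration only and are not taken from the study.

```python
import numpy as np

# Hypothetical station data: one checklist score and one global rating per examinee.
# Assumed global rating scale: 1 = fail, 2 = borderline, 3 = pass, 4 = good.
checklist = np.array([55.0, 62.0, 58.0, 70.0, 74.0, 81.0, 86.0, 90.0])
global_rating = np.array([1, 2, 2, 2, 3, 3, 4, 4])
BORDERLINE = 2  # rating treated as "borderline" on this assumed scale

# Borderline-group method: pass mark = mean checklist score of borderline examinees.
bg_pass_mark = checklist[global_rating == BORDERLINE].mean()

# Borderline-regression method: regress checklist score on global rating,
# then evaluate the fitted line at the borderline rating.
slope, intercept = np.polyfit(global_rating, checklist, deg=1)
br_pass_mark = slope * BORDERLINE + intercept

print(f"Borderline-group pass mark:      {bg_pass_mark:.1f}")
print(f"Borderline-regression pass mark: {br_pass_mark:.1f}")
```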

The 125 final-year FJU medical students received clinical training at four teaching hospitals: F hospital (FH), S hospital (SH), K hospital (KH) and CGH. Each of these students participated in the national OSCE (OSCE5 in the CGH sequence) between 2013 and 2015. The mean scores of the students trained at each teaching hospital were compared. CGH held four mock OSCEs (30 stations in total) annually, whereas the remaining teaching hospitals each held one or two mock OSCEs (6–12 stations) annually.

Statistical methods

An analysis of variance (ANOVA) was performed to compare the differences among the study groups. A post-hoc analysis was subsequently conducted for groups showing significant differences; because the group sizes differed, the Scheffé method was used to control the type I error rate.
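The following is a minimal sketch of this analysis on placeholder data (the scores and group sizes shown are illustrative, not study data): a one-way ANOVA across the groups, followed by pairwise Scheffé comparisons in which each contrast is tested against (k − 1) times the critical F value, a criterion that remains valid with unequal group sizes.

```python
import numpy as np
from scipy import stats

# Illustrative OSCE scores for three groups of unequal size (placeholder values).
groups = {
    "A": np.array([78.0, 82.0, 85.0, 90.0, 76.0, 88.0]),
    "B": np.array([72.0, 75.0, 80.0, 70.0, 78.0]),
    "C": np.array([65.0, 70.0, 68.0, 62.0]),
}

# One-way ANOVA across the three groups.
f_stat, p_value = stats.f_oneway(*groups.values())
print(f"ANOVA: F = {f_stat:.2f}, p = {p_value:.4f}")

# Scheffe post-hoc comparisons, computed from first principles.
k = len(groups)                                   # number of groups
n_total = sum(len(g) for g in groups.values())    # total sample size
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups.values())
ms_within = ss_within / (n_total - k)             # within-group mean square

alpha = 0.05
f_crit = stats.f.ppf(1 - alpha, k - 1, n_total - k)

names = list(groups)
for i in range(k):
    for j in range(i + 1, k):
        gi, gj = groups[names[i]], groups[names[j]]
        diff = gi.mean() - gj.mean()
        # Scheffe statistic for the pairwise contrast, compared with (k-1) * F_crit.
        f_pair = diff ** 2 / (ms_within * (1 / len(gi) + 1 / len(gj)))
        significant = f_pair > (k - 1) * f_crit
        print(f"{names[i]} vs {names[j]}: diff = {diff:.2f}, significant = {significant}")
```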

Results

Student performance

For the 82 students trained at CGH from 2013 to 2015, the cumulative mean scores from OSCE1 through OSCE5 showed a rising curve; the scores for OSCE3 and OSCE5 were adjusted because these examinations each contained 12 stations, whereas the others contained only six (Fig. 1). Among the three groups, group A had a higher mean score than group C across the five CGH OSCEs. ANOVA showed significant differences for OSCE2 and OSCE3 but no significant differences for OSCE1, OSCE4, and OSCE5 (Table 1). Post-hoc analysis for groups A, B and C revealed significant differences between groups A and C in OSCE2 and OSCE3 scores (OSCE2, P = 0.022; OSCE3, P = 0.027) (Table 2).
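The article does not report the adjustment formula; one simple possibility, shown purely for illustration with hypothetical totals, is to rescale each examination to a per-station mean so that 6-station and 12-station OSCEs can be plotted on a single learning curve.

```python
# Purely illustrative rescaling (the study's actual adjustment formula is not stated):
# convert each OSCE total to a mean per-station score so that 6- and 12-station
# examinations are directly comparable.
station_counts = {"OSCE1": 6, "OSCE2": 6, "OSCE3": 12, "OSCE4": 6, "OSCE5": 12}
raw_totals = {"OSCE1": 480.0, "OSCE2": 500.0, "OSCE3": 1020.0,
              "OSCE4": 530.0, "OSCE5": 1080.0}  # hypothetical totals

per_station = {name: raw_totals[name] / station_counts[name] for name in station_counts}
print(per_station)  # e.g. {'OSCE1': 80.0, 'OSCE2': 83.3, 'OSCE3': 85.0, ...}
```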

Fig. 1

OSCE1–OSCE5 mean scores for group A, B and C students (N = 82) trained at CGH between 2013 and 2015; OSCE3 and OSCE5 mean scores were adjusted

Table 1 ANOVA test results of differences in mean scores from OSCE1–OSCE5 for CGH student groups A, B and C, 2013–2015
Table 2 Post-hoc test results of differences in mean scores from OSCE1–OSCE5 for CGH student groups A, B and C, 2013–2015

Student performance over time

The mean scores of the 125 FJU students on the national OSCEs of 2013–2015 were analyzed using ANOVA. The 2013 results revealed no significant differences among the hospitals; FH-trained students registered the highest mean score, followed by SH, KH, and CGH. The differences for both 2014 and 2015 were significant (Table 3). For the 2014 test, CGH-trained students demonstrated the highest mean score, followed by FH, SH, and KH. Post-hoc analysis revealed statistically significant differences between CGH and FH (P = 0.018), FH and KH (P = 0.003), CGH and SH (P < 0.001), and CGH and KH (P < 0.001) (Table 4). For the 2015 test, CGH-trained students again registered the highest mean score, followed by KH, FH and SH; post-hoc analysis showed a significant difference between the mean scores of CGH and SH (P = 0.002) (Table 4).

Table 3 ANOVA test results of differences in mean scores from the national OSCE for final-year FJU medical students at FH, SH, KH and CGH, 2013–2015
Table 4 Post-hoc test results of differences in mean scores from the national OSCE for final-year FJU medical students at FH, SH, KH and CGH, 2013–2015

Discussion

The differences in the mean scores for the five OSCEs among CGH student groups A, B and C were significant for OSCE2 and OSCE3 but not for OSCE1, OSCE4 and OSCE5. There was a gradual but observable increase in the mean learning curve, consistent with previously reported results [9]. Medical school attended and pre-internship OSCE experience did not affect student performance in OSCE1; however, these factors did affect performance in OSCE2 and OSCE3. Unfamiliarity with the format of the OSCE and/or its related clinical competencies was a major factor affecting performance in OSCE1. Medical school attended (group C) and pre-internship OSCE experience (group A) affected students' performance in OSCE2 and OSCE3, once the influence of unfamiliarity had diminished. There was no significant difference in performance among the three groups in OSCE4 and OSCE5. Taken together, these findings suggest that, after three mock OSCEs, the influence of the abovementioned factors (i.e., unfamiliarity with the OSCE, medical school attended, and pre-internship OSCE experience) is attenuated, enabling students to show consistent performance levels.

Unfamiliarity with the OSCE format interferes not only with the performance of students but also with that of staff at the training hospitals where OSCEs are held. To reduce this risk, high-stakes national OSCEs were administered in Taiwan in 2011 and 2012, before the examination became one of the formal criteria for medical licensure in 2013. The differences in the mean scores on the 2013 national OSCE among FJU students training at the four hospitals were not significant; all raters, SPs and station developers had to become familiar with the test through repeated practice. The differences in the mean scores on the national OSCE among the four teaching hospitals were significant in 2014 and 2015. In these tests, hospitals incorporating more mock OSCEs, such as CGH, registered higher mean scores. Students at CGH gained more experience from the four mock OSCEs (30 stations) than did students at the other hospitals. The stations in these mock examinations featured SPs in well-designed clinical scenarios and raters who provided immediate feedback. Such a training style is highly beneficial for faculty development and student training [10]. The mock OSCE training program builds various domains of medical education in an integrated, coherent, and longitudinal fashion, and provides students with frequent and constructive feedback [11]. Because OSCEs are becoming more widely incorporated, future work should examine the potential of applying OSCEs at different stages of medical training to predict and improve clinical performance [12]. In line with Ericsson's concept of deliberate practice, the repeated practice afforded by the mock OSCEs and the corrective feedback received from tutors likely contributed to students' better performance [13, 14].

Conclusions

A limitation of this study was the small sample size. However, OSCE training programs are easier to manage when conducted with small groups of approximately 30 students. The 2015 CGH OSCE training program, for example, required two workshops (one for raters and one for SPs), 122 raters, 75 SPs and 54 OSCE stations to train only 33 students. All procedures for rater and SP training, OSCE station development, and OSCE administration complied with the standards set by the OSCE office. Fine-tuning of this training program is essential for its application in teaching hospitals of different sizes. The administration of mock OSCEs by teaching hospitals enhances students' performance on the summative OSCE, as well as the teaching and assessment abilities of teachers and program directors.