Ethnic and social disparities in different types of examinations in undergraduate pre-clinical training

Medical schools are increasingly faced with a more diverse student population. Generally, ethnic minority students are reported to underperform compared with those from the ethnic majority. However, there are inconsistencies in findings in different types of examinations. Additionally, little is known about the performance of first-generation university students and about performance differences across ethnic minority groups. This study aimed to investigate underperformance across ethnic minority groups and by first-generation university students in different types of written tests and clinical skills examinations during pre-clinical training. A longitudinal prospective cohort study of progress on a 3-year Dutch Bachelor of Medicine course was conducted. Participants included 2432 students who entered the course over a consecutive 6-year period (2008–2013). Compared with Dutch students, the three non-Western ethnic minority groups (Turkish/Moroccan/African, Surinamese/Antillean and Asian) underperformed in the clinical problem solving tests, the language test and the OSCEs. Findings on the theoretical end-of-block tests and writing skills tests, and results for Western minority students were less consistent. Age, gender, pre-university grade point average and additional socio-demographic variables (including first-generation university student, first language, and medical doctor parent) could explain the ethnicity-related differences in theoretical examinations, but not in language, clinical and writing skills examinations. First-generation university students only underperformed in the language test. Apparently, underperformance differs both across ethnic subgroups and between different types of written and clinical examinations. Medical schools should ensure their assessment strategies create a level playing field for all students and explore reasons for underperformance in the clinical and writing skills examinations.


Introduction
The past few decades have shown an increase in the number of students from nontraditional backgrounds, such as ethnic minority and first-generation university students, who enter medical school in Western countries (Arulampalam et al. 2004;Bedi and Gilthorpe 2000;Howe et al. 2004;Klimidis et al. 1997). The increase in non-traditional students is not only a result of changing demographics, but is also caused by widening access policies that medical schools have adopted to achieve social equality and to ensure that the population of matriculating medical doctors is more representative of society (Cleland et al. 2012). In general, students from ethnic minorities are reported to underperform compared with those from the ethnic majority. However, there are inconsistencies in findings in different types of examinations. In addition, little is known about the performance of first-generation university students and about performance differences across ethnic minority groups. In this study, we investigate underperformance across ethnic minority groups and by first-generation university students in different types of written and clinical skills examinations during pre-clinical training.
Ethnic minority students have been shown to underperform compared with those from the ethnic majority at different stages of medical school. Studies from Australia and the Netherlands show that they underperform in the first year of medical school (Kay-Lambkin et al. 2002;Stegers-Jager et al. 2012) and also later in the course, in final year assessments in Australia, the UK and the USA (Liddell and Koritsas 2004;McManus et al. 1996;Xu et al. 1993). A systematic review and meta-analysis showed that ethnic minority students in the UK academically underperform compared with white students throughout medical school and across different types of examinations (Woolf et al. 2011).
In particular differences in clinical performance between white students and ethnic minority students appear to be consistent: research from numerous Western medical schools has shown that ethnic minority students underperform compared with their white counterparts in Objective Structured Clinical Examinations (OSCEs) and other clinical assessments (Dewhurst et al. 2007;Haq et al. 2005;Lumb and Vail 2004;Woolf et al. 2008Woolf et al. , 2013Yates and James 2007). Others found that they received lower grades in their clerkships (Lee et al. 2007;Stegers-Jager et al. 2012). The underperformance in clinical assessments of ethnic minority students was also found among physiotherapy students (Naylor et al. 2014).
However, ethnic differences on performance in written examinations are less clear. Several authors have reported that ethnic disparities in the pre-clinical course were less profound than in the clinical course (Stegers-Jager et al. 2012;Yates and James 2007). Woolf et al. (2008) found ethnic differences on both written and OSCE assessments, but the difference on the written assessments disappeared when it was adjusted for OSCE performance, while it remained on the OSCE when adjusted for written performance. Haq et al. (2005) found ethnic differences on all OSCEs, but only on half of the written examinations.
In addition, to our knowledge, little is known about differences in performance on various types of written examinations, such as theoretical end-of-block tests, clinical problem solving tests and writing skills tests. Despite the fact that studies on ethnicity and medical school performance have used several types of written outcomes (Woolf et al. 2011), we are not aware of studies that have specifically looked at differences on performance in different types of written examinations.
Although traditionally medical students come from the highest socio-economic groups (Seyan et al. 2004), the anticipated effect of the widening access policies is an increase of so-called first-generation university medical students. The evidence with respect to the relationship between social background and medical school performance is inconclusive: some studies report that social-class background and parental education are not associated with performance of medical students in the UK and the USA (Arulampalam et al. 2004;Fernandez et al. 2007;Lumb and Vail 2004), whereas other studies report that students of lower social-class and first-generation university students underperform in medical school in the Netherlands and the UK, in particular in clinical examinations (Stegers-Jager et al. 2012;Woolf et al. 2013). Two studies report that students with a medical doctor as parent are less likely to drop out of medical school (Arulampalam et al. 2004(Arulampalam et al. , 2007, whereas this predictor was not confirmed in another study (Stegers-Jager et al. 2012). Irrespective of these reports there appears to be a relative paucity in the number of studies that have focused on social background as predictor of performance in medical school (O'Neill et al. 2011).
In sum, there is accumulating evidence of ethnic disparities and some of social disparities in preclinical training. However, it is still not clear why it occurs. The aim of this study was to gain more insight in reasons for underperformance by looking at performance in various types of written and clinical examinations. In addition, we examined whether the ethnic and social disparities in examination performance could be explained by a combination of socio-demographic and academic factors that have previously been found to influence medical school performance, including gender (Haq et al. 2005;James and Chilvers 2001;Lumb and Vail 2004;Yates and James 2007), age (James and Chilvers 2001;Lumb and Vail 2004;Stegers-Jager et al. 2012), first language (Arulampalam et al. 2007;Ferguson et al. 2002;McManus et al. 1996), and pre-university grades (Arulampalam et al. 2004;Huff and Fang 1999;Yates and James 2007). Finally, in our analysis we have also taken into account the reported variation in academic performance among different ethnic minority groups (Hofman and van den Berg 2003;McManus et al. 1996;Stegers-Jager et al. 2012).

Course structure and examination
This study was conducted at the Erasmus MC Medical School, Rotterdam, the Netherlands. Compared with other Dutch medical schools this school has a relatively large number of ethnic minority students. The medical course consists of a 3-year Bachelor degree course followed by a 3-year Masters degree course. The integrated and theme-oriented Bachelor curriculum is divided into three thematic blocks per year. Each thematic block consists of 2-3 sub-blocks. The Bachelor course includes two types of examinations: written examinations and clinical skills examinations. There are three types of written examinations: (1) theoretical knowledge, (2) language skills, and (3) writing skills. The theoretical knowledge examinations are further divided into block tests at the end of each thematic sub-block and a clinical problem solving test (CPST) at the end of each year of the Bachelor degree course. The theoretical examinations are largely machine-marked and consist of multiple choice questions (MCQs), extended matching questions (EMQs), comprehensive integrative puzzles (CIPs) and/or short answer questions. The language skills test consists of four parts: basic Dutch, spelling, grammar and style. The writing skills tests include writing an abstract (Year 1), an argument paper (Year 2) and an essay (Year 3).

Participants and procedure
The new Erasmus MC Bachelor curriculum was implemented in 2008. All 2432 students who entered the Erasmus MC Medical School during 2008-2013 were included in this study. Data on ethnicity, gender, age and pre-university Grade Point Average (pu-GPA) for these cohorts were available from a national database of students in higher education in the Netherlands (1cijferHO).
Additional data on social background were collected by online questionnaire for firstyear students in 2012 (n = 331; 83 %) and in 2013 (n = 392; 95 %) and for third-year students in 2012 (n = 340; 92 %). This part of the study was designed with the help and approval of the Dutch Data Protection Authority. Students were informed about the study, participation was voluntary, confidential processing was guaranteed, and individual consent was sought. Data on examination performance were obtained from the university student administration system. Because these data were collected as part of regular academic activities, individual consent was not necessary.

Variables and measures
The socio-demographic variables included in this study are ethnicity, first-generation immigrant, urban background, first-generation university student, first language, and medical doctor as parent.
According to Statistics Netherlands (CBS; www.cbs.nl), an individual belongs to an ethnic minority group if at least one of his or her parents was born outside the Netherlands. Based on the countries of birth of their parents, ethnic minority students were classified into one of five ethnic subgroups: Turkish/Moroccan/African; Surinamese/Antillean; Asian (mainly Middle East: Afghanistan, Iran, Iraq, Pakistan); Western, and 'Other' (Hofman and van den Berg 2003;Stegers-Jager et al. 2012). Because of the small number of students in the category 'Other' (n = 19), their data were excluded from the analyses. On the questionnaire, students self-categorized whether they had an urban background and whether their first language was Dutch or non-Dutch. Parental education and parental profession as provided by the students was used to determine whether or not they were first-generation university students and whether or not they had at least one parent who was a medical doctor.
As confounders we included gender, age and pu-GPA. The age at entry of medical school was split into three categories:\19 years; 19-21 years;[21 years. pu-GPA was included in the analyses as a continuous variable. As pu-GPA was not available for students with a foreign or a non-standard Dutch pre-university education, a categorical variable-'missing pu-GPA'-was added to the analyses. Missing values for pu-GPA were substituted with the mean in an analysis with 'missing pu-GPA' and continuous pu-GPA included.
Outcome measures were 'pass' or 'fail' on the written and clinical examinations. The cut-off pass/fail mark was 5.5 on a 10-point scale (1 = poor, 10 = excellent). Two firstyear end-of-block tests and the three end-of-year CPSTs were included in this study. We also included the first-year language skills test and the three writing skills tests (one each year). Finally, the clinical skills examinations in year 2 and 3 of the bachelor degree course were included in this study. The second-year OSCE consists of three stations: history taking, physical examination and communication. The third-year OSCE consists of seven stations: history taking (29), physical examination (29), communication (29) and neurology (19). Measures of reliability of the written and clinical examinations were generally [0.7. Only data of the first attempt on the examinations were included. The number of participants per examination per cohort differs because of absence at examinations and, in particularly in later years, of attrition, either voluntarily or due to dismissal (see Stegers-Jager et al. 2011 for more details).

Statistical analysis
We assessed associations between ethnicity and the other independent variables using Chi squared tests for categorical data and analysis of variance (ANOVA) for pu-GPA. We used logistic regression to calculate odds ratios (ORs) for the effect of ethnicity on the outcome measures (Step 1). Subsequently, we adjusted for key confounders (gender, age, and pu-GPA) (Step 2). Finally, we adjusted for confounders and socio-demographic characteristics (first-generation immigrant, urban background, first language, medical doctor as parent and first-generation university student) (Step 3).
Similar analyses were used to calculate the effect of being a first-generation university student.
In order to obtain an indication of the effect of ethnicity per type of exam we took a meta-analytic approach. For each of the ethnic minority subgroups, we pooled odds ratios using mixed effects logistic regression for each type of examination and for all examinations combined. These meta-analyses were performed on both the unadjusted (Step 1) and the adjusted (Step 3) odds ratios. The glmer command in R was used to estimate the logistic regression models with ethnicity (unadjusted and adjusted), confounders and sociodemographic characteristics as predictors and pass/fail on the various examinations as outcomes.
Missing values on 'first-generation university student', 'medical doctor as parent', 'first language' and 'urban background' were statistically imputed based on their correlation with the other variables, including the outcome variables (Appendix 2). We used the multiple imputations procedure in SPSS, and chose to replace each missing value five times using five independent draws from the imputation model. The multiple imputation for categorical variables was restricted so that only categorical imputations were produced. Pooled estimates over the imputed data sets were used. As recommended by Sterne et al. (2009) odds ratios were compared between analyses of the imputed dataset (multiple imputed) and the unimputed dataset (complete cases) (see Appendix 2 for details). As the absence of data on the four imputed variables was systematically related to cohort, we considered the missing-at-random assumption to be reasonable.
Meta-analyses were performed using R statistical software, version 3.1.0 (R Foundation for Statistical Computing, Vienna, Austria), all other statistical analyses were performed using IBM SPSS Statistics for Windows Version 21.0 (IBM Corp., Armonk, NY, USA). We present 95 % confidence intervals (CIs) for adjusted ORs, which indicate statistical significance if they do not include a value of 1.0.

Student characteristics
Non-Dutch students on average were older, more often had a missing pu-GPA and an urban background (Table 1). Turkish/Moroccan/African students were more often firstgeneration university students and had less often a medical doctor as parent. Asian students were more often male. Non-Dutch as a first language was more often spoken among Turkish/Moroccan/African and Asian students. The mean pu-GPA of Turkish/Moroccan/ African students and Asian students was significantly lower than of Dutch and Western students. There was no statistically significant difference in the numbers of students in each ethnic category between the six cohorts.

Written examinations: theoretical knowledge
Dutch students were more likely to pass the first-year CPST (74 %) compared with Turkish/Moroccan/African students (60 %; unadjusted OR 0.52), Surinamese/Antillean students (57 %; unadjusted OR 0.46) and Asian students (57 %; unadjusted OR 0.46; Table 2 and Appendix 1, Fig. 1). Similar results were found for the second-year CPST (see Table 2 and Appendix 1; Fig. 1). On the third-year CPST Dutch students (85 %) were more likely to pass compared with Turkish/Moroccan/African students (71 %; unadjusted OR 0.43) and Asian students (62 %; unadjusted OR 0.29). Dutch students were also more likely to pass first-year block test A (68 %) compared with Surinamese/Antillean students (55 %; unadjusted OR 0.59). Turkish/Moroccan/African and Surinamese/Antillean students were less likely to pass first-year block test B compared with Dutch students (65 and 67 %, respectively vs 76 %; unadjusted ORs 0.59 and 0.63), while Western students were more likely than Dutch students to pass this test (86 %; unadjusted OR 1.92). All these disparities were to a large extent explained by confounders and socio-demographic characteristics ( Fig. 1). Details of the regression analyses, with both complete cases and multiple imputations, are presented in Appendix 2.
The pass/fail rates between first-generation university students and non-first-generation university students were not significantly different on any of the theoretical knowledge tests (Table 3).

Written examinations: language skills
The percentage of Dutch students that passed the language skills test (56 %) was significantly higher than for all other ethnic subgroups (ranging from 28 to 48 %; Table 2). The differences in percentages correspond to unadjusted ORs ranging from 0.30 for Asian students to 0.48 for Western students. These disparities were only partly explained by confounders and socio-demographic characteristics ( Fig. 1; Appendix 1).
First-generation university students less often passed the language skills examination than non-first-generation university students (44 vs 53 %;   Written examinations: writing skills Dutch students were more likely to pass the first-year writing skills test compared with Turkish/Moroccan/African students (77 vs 65 %; unadjusted OR 0.56) and Asian students (69 %; unadjusted OR 0.67). Similar results were found for the second and third-year writing skills tests (Table 3, Fig. 1). Confounders and socio-demographic characteristics could only partly explain these differences (Fig. 1, Appendix 1). The pass/fail rate between first-generation university students and non-first-generation university students was not significantly different on the writing skills tests (Table 3).
Dutch students were also more likely to pass the third-year OSCE (79 %) than Turkish/ Moroccan/African students (67 %; unadjusted OR 0.54) and Asian students (54 %; unadjusted OR 0.30; Table 2, Appendix 1). Confounders and socio-demographic characteristics could only explain the difference found for the Turkish/Moroccan/African students (Fig. 1, Appendix 1  There were no significant differences in the pass/fail rates between first-generation university students and non-first-generation university students on the OSCEs (Table 3).

Discussion
Whereas the present study confirms earlier findings of ethnic disparities in preclinical training, it clearly shows that there are differences both across ethnic subgroups and between different types of written and clinical examinations. While all three non-Western ethnic minority groups underperformed on the CPSTs, the language skills test and the OSCEs, findings on the theoretical end-of-block tests and writing skills tests, and results for Western minority students were less consistent. Age, gender, pu-GPA and sociodemographic variables (including parental education and first language) could largely explain the ethnicity-related disparities in theoretical examinations, but not in language, writing and clinical skills examinations. First-generation university students only underperformed on the language skills test.

Explanations of the findings
One of the most surprising outcomes is the difference in findings on the CPSTs and the end-of-block tests, which are both written theoretical knowledge tests. A possible explanation lies in the nature of, and the required preparation for these examinations. While the end-of-block tests are written tests with series of mostly multiple choice clinical theme-related questions, the CPSTs consist of a number of clinical cases that students have to prepare for in the week preceding the examination. The description of the clinical cases, their elaboration during the preparation for the examination and their subsequent testing in the CPST requires a thorough command of the Dutch language, rendering it probable that these students underperform as a result of a lower level of Dutch language skills. This explanation is confirmed by the fact that the ethnicity-related disparities were further explained after adjusting for the socio-demographic factors including first language.
However, for the preparation of the CPST the involvement of a group of fellow students is essential as well, since the sheer number of differential diagnostic possibilities is much too large for a single student to manage. It might be that the negative effects on learning of ethnic homophily (the tendency to interact with others in the same group) which has been reported in medical students (Vaughan et al. 2015;Woolf et al. 2012) are more profound for this type of examinations. Ethnic homophily may cut off minority students from resources that facilitate learning (Vaughan et al. 2015), which might be particularly important for an examination that requires a high level of self-organised, informal learning.
As previous research showed that all ethnic minority groups, including Western minority students, underperformed in clinical training (Stegers-Jager et al. 2012), it is remarkable that in the current study no underperformance was found in the two OSCEs for Western students. Similarly, the current study also found no indications for a lower level of clinical skills that could explain the previously reported lower clerkship grades for first-generation university students (Stegers-Jager et al. 2012). Apparently, there is a difference in what is measured by the preclinical OSCE stations and the examinations in clinical training. As we have suggested previously (Stegers-Jager et al. 2015), it might be that, due to the more subjective examination methods in clinical training than in preclinical training (Kassebaum and Eaglen 1999), the role of cultural capital (i.e. ''knowledge of the norms, styles, conventions and tastes that pervade specific social settings and allow individuals to navigate them in ways that increase their odds of success'' (see Massey et al. 2002, p. 6) is more prominent during clinical than during preclinical training.
For the three non-Western ethnic minority groups indications were found for a lower level of clinical skills, and interestingly the different groups underperformed on different parts of the clinical examinations. Our findings were not in line with the study of Fernandez et al. (2007) who found that Asian and Black student only scored lower on the communication part, not on history of physical examination scores. The discrepancy in underperformance of Surinamese/Antillean students on the OSCEs in year 2 and year 3 may be explained by a higher drop-out rate after 2 years at medical school for Surinamese/ Antillean students, resulting in a relative loss of the lower performing students from this group (unpublished observation).
Another remarkable finding is that the confounders and socio-demographic factors could largely explain the ethnic related disparities on the theoretical knowledge tests, but to a much lesser extent those on the language, writing and clinical skills examinations. This suggests that the underperformance in the latter examinations is due to other factors, such as cultural differences in communication styles. Hauer et al. (2010) found that lower scores on the communication part of a clinical performance examination for ethnic minority students could partially be explained by a less patient-centred approach. The ethnic minority students scored higher on impersonal attitude, suggesting that they integrate less of the patients' background and the patients' perspective into history taking. Another study found that non-native English speakers in Australia scored significantly lower than native English speaking students on appropriate content and appropriate use of the English language for a writing skills assessment (Chur-Hansen and Vernon-Roberts 2000). Nevertheless, further research is required into explanations for the underperformance of ethnic minority students in the clinical and writing skills examinations.

Comparisons with other studies
Our study confirms that ethnic minority students underperform on written and clinical examinations (Woolf et al. 2011), but also reveals differences in performance among ethnic minority groups and between different types of examinations. In this study we systematically adjusted for a combination of confounders and additional socio-demographic factors, whereas most studies on ethnicity and medical school performance only adjust for gender and sometimes for age, pre-university grades, first language or socioeconomic group (Woolf et al. 2011). Our analyses confirmed the expected associations of the confounders with performance at medical school (Appendix 2). The most consistent predictor for underperformance was a missing or lower pu-GPA (Arulampalam et al. 2007;Ferguson et al. 2002;James and Chilvers 2001;Stegers-Jager et al. 2012;Yates and James 2007). In line with other studies (Haq et al. 2005;James and Chilvers 2001;Lumb and Vail 2004;Yates and James 2007), male gender was associated with underperformance in several examinations, but in the end-of-block examination covering biochemistry the male students outperformed the female students. On both types of theoretical knowledge tests the students aged [21 years performed relatively well after adjustment for the other variables (James and Chilvers 2001;Lumb and Vail 2004;Stegers-Jager et al. 2012). The remaining socio-demographic characteristics were less important as predictors of performance in medical school during the pre-clinical years.

Strengths and limitations
A major strength of this study was its large sample size and hence the large number of non-Dutch students which allowed us to extend our analysis beyond a white/non-white comparison, to which most studies on ethnicity and medical school performance are restricted (Woolf et al. 2011). The use of a longitudinal design, which is also uncommon in studies on factors associated with medical school performance (Ferguson et al. 2002), enabled us to note differences among ethnic groups on several types of pre-clinical examinations. Additionally, unlike previous studies, we were not compelled to use less reliable methods such as self-report or to use names or photographs to gather students' ethnicity (Haq et al. 2005;Woolf et al. 2011).
There are some limitations to our study. Firstly, our data reveal ethnic disparities in preclinical examinations, but do not shed light on whether these disparities resulted from bias or from actual differences in skills levels on the constructs being assessed. A first strategy to find out whether bias is present is to evaluate the differential predictive validity of examinations or tests. This comprises comparing predicted performance and actual performance, which can either show ''underprediction'' or ''overprediction'' (predicted performance lower, respectively higher than actual performance) (Koenig et al. 1998). However, the main challenge for such differential predictive validity studies remains to find a valid, unbiased criterion of medical school performance. A second strategy would be to use differential item functioning procedures to examine for statistical evidence of bias in items (Koenig et al. 1998). Our current study forms a good start for these kinds of followup studies.
Secondly, the additional social background data was not collected for all cohorts of students. Therefore we had limited data on the additional socio-demographic factors (urban background, first language, first-generation university student and medical doctor as parent), which were replaced using the multiple imputation technique, a generally accepted and suitable method for dealing with missing values (Donders et al. 2006;Steyerberg 2009). Since they allow the use of data that are available for other predictors that would otherwise be lost, imputation methods, especially multiple imputations, are superior to complete case analysis (Altman and Bland 1995;Donders et al. 2006;Steyerberg 2009). The ORs calculated in the imputed dataset in our study were similar and, if different, generally more conservative than the ORs in the unimputed dataset (Appendix 2), supporting the validity of our use of multiple imputations.
Thirdly, all cohorts of students came from a single medical school. However, there are no reasons to presume that-apart from the relatively large amount of ethnic minority students due to our geographical position in the Netherlands-students at our institution are different from other Dutch medical students with regard to entrance variables (Cohen-Schotanus 1999). Still, replication studies are needed to establish whether the results can be generalised to other populations. We would like to encourage others to also examine ethnic and social disparities in different types of written and clinical examinations.

Implications for practice and future research
This study has several practical implications for medical schools that are confronted with increasingly diverse student populations. A first practical implication is that medical schools should take care in designing assessment strategies to avoid possible unintended effects of certain types of examinations for certain groups of students. In analogy to the ''validity-diversity dilemma'' (Kravitz 2008) in selecting for a diverse medical school population, as recently described by Lievens (2015), medical schools face the challenge of balancing assessment strategies that not only fulfil the goal of assessing the required standards of competency, but also retain a diverse student population. In order to ensure that non-traditional medical students are not disadvantaged, diversity should be considered both in test construction and implementation (Wass et al. 2003). Future research should be focused on helping medical schools to design valid assessment strategies that enable nontraditional students to show their merit. As mentioned above, evaluating the differential predictive validity and using differential item functioning procedures may be helpful here.
A second practical implication is that additional support focused on specific examinations for specific groups of students might be appropriate. As an example, the additional support for the CPSTs might take the form of planning formal meetings for studentspreferably in randomly allocated tutor groups (Woolf et al. 2012)-to prepare for the examinations. This might lead to 'meaningful social and academic interactions among students who differ in their experiences, views and traits' and prevent student from sorting into homogeneous niches (Tienda 2013) which might in particular be disadvantageous for ethnic minority students (Vaughan et al. 2015;Woolf et al. 2012). As stated by Cleland et al. (2013), there is a need for rigorous approaches to developing and evaluating additional support for specific groups, focused on what works and why.
Two additional areas of research within the field of ethnic and social disparities in medical school performance emerge from our findings. Firstly, the previously reported lower clerkship grades for Western minority or first-generation university students (Stegers-Jager et al. 2012) appear not be due to a worse preparation during the pre-clinical years, as both groups of students did not underperform in either the written or the clinical examinations. This was not in line with the findings by Woolf et al. (2008) who found ethnic differences in practical clinical knowledge and skills, but not in theoretical medical knowledge. So, further research is required to explore other causes of the lower grades of Western and first-generation university students in clinical training. Secondly, although the combination of confounders and socio-demographic factors could largely explain the differences in the theoretical examinations, we were still not able to explain the differences in the clinical and writing skills examinations. Despite our own efforts and those of others (Vaughan et al. 2015;Wass et al. 2003;Woolf et al. 2008Woolf et al. , 2013, still more research is needed to find explanations for ethnic disparities in the clinical and writing skills examinations.

Conclusion
Ethnic minority students underperform in pre-clinical training, but there are differences both across ethnic subgroups and between different types of written and clinical examinations. Age, gender and pu-GPA, and socio-demographic variables could largely explain the ethnicity-related disparities in theoretical examinations, but not in language, writing and clinical skills examinations. In order to retain non-traditional students in the medical education pipeline (Lievens 2015), medical schools must design assessment strategies and, if necessary, additional targeted support programmes that create a level playing field for a diverse student population.
Acknowledgments We thank Ewout Steyerberg, Ph.D. for his statistical advice and his critical comments on the manuscript, and Caspar Looman, Ph.D., Dimitris Rizopoulos, Ph.D., and Lidia Arends, Ph.D. for their assistance with the meta-analyses. We also thank Ada van der Velden, M.Sc. for her assistance with the OSCE data and her critical comments on the manuscript.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Appendix 2
See Tables 5 and 6. Step 2: ethnicity effect adjusted for age, gender and pre-university grade point average (confounders) and cohort c Step 3 ethnicity effect adjusted for confounders, cohort and first-generation immigrant, urban background, first language non-Dutch, medical doctor as parent and first-generation university (socio-demographic characteristics)  Missing values for pre-university GPA were substituted to the mean with Missing pu-GPA and continuous pu-GPA included