Introduction

There are a number of issues of importance in selection for admission to medical school (Prideaux et al. 2011; Girotti et al. 2015). One of these is assessing the predictive validity and reliability of any selection tool, to ensure it measures what it claims to measure, does so fairly and consistently and can be employed rationally (e.g., Cleland et al. 2012; Norman 2015). A second is ensuring that selection tools assess the range of attributes considered important by key stakeholders. Medical schools must select applicants who will not only excel academically but also possess personality traits befitting a career in medicine such as compassion, team working skills and integrity (e.g., Albanese et al. 2003; General Medical Council 2009; Frank and Snell 2015; Accreditation Council for Graduate Medical Education 2014).

This increasing recognition that there is more to being a capable medical student or doctor than academic performance follows a similar direction of travel in education where, according to a large body of research, a number of non-cognitive skills are associated with positive academic and work-related outcomes for young people (see Gutman and Schoon 2013, for a recent review). Given this, the assessment of non-academic factors, or personality traits, is of increasing importance in medical school selection (Patterson 2013). However, “traditional” methods of assessing such personality traits, including unstructured interviews, personal references and autobiographical statements, are now known to have weak predictive validity (Cleland et al. 2012; Patterson et al. 2016). There is therefore a drive to identify better ways to assess non-academic attributes such as values and personality traits in medical school selection.

Various ways to do so have been proposed. These can be grouped into “paper and pencil” assessments of personality traits (e.g., Adams et al. 2012, 2015; Bore et al. 2005a, b; Dowell et al. 2011; Fukui et al. 2014; James et al. 2013; Lumsden et al. 2005; Manuel et al. 2005; Nedjat et al. 2013), structured multiple interview approaches (Dore et al. 2010; Eva et al. 2004a, b, 2009; Hofmeister et al. 2008, 2009; O’Brien et al. 2011; Reiter et al. 2007; Roberts et al. 2008; Rosenfeld et al. 2008), selection centres (Gafni et al. 2012; ten Cate and Smal 2002; Ziv et al. 2008; Gale et al. 2010; Randall et al. 2006a, b) and, as the “new kid on the block”, situational judgement tests (SJTs) (Christian et al. 2010; Koczwara et al. 2012; Lievens 2013; Lievens et al. 2008; Patterson et al. 2009).

However, there are relatively few studies examining the predictive validity of the “paper and pencil” tests which aim to assess personality traits in medical school applicants. Those which have been published are often concerned with feasibility of use across cultural settings (e.g., Fukui et al. 2014; Nedjat et al. 2013) and/or are descriptive in terms of cross-sectional comparisons across different groups of students (e.g., graduate entrants versus school-leavers: Bore et al. 2005a, b; James et al. 2013; Lumsden et al. 2005; Nedjat et al. 2013). The few studies of predictive validity to date tend to be small scale, usually single site (Adams et al. 2012, 2015; Manuel et al. 2005) and/or use local assessments as outcome measures (Adams et al. 2012, 2015; Dowell et al. 2011; Manuel et al. 2005), which limits their generalizability. Large-scale, independent studies of the predictive validity of approaches to assessing personality traits, or non-academic factors, in medical selection are lacking, partly because appropriate non-academic outcome markers are not easily available.

Moreover, there is much debate about the promise of personality traits for predicting success generally, the different approaches being advocated to measure these, and a clear need for more evidence (Norman 2015; Powis 2015). Drawing on the wider educational literature again, it is clear that non-cognitive attributes include a very broad range of characteristics. These can be separated into those considered to be modifiable, such as motivation, resilience, perseverance, and social and communication skills, and those considered more stable, namely the personality traits of Openness to Experience, Conscientiousness, Extraversion, Agreeableness, and Neuroticism (the inverse of which is called Emotional Stability) (Gutman and Schoon 2013). There is a wealth of evidence indicating that the latter, the “Big Five” personality traits, correlate highly with job performance over a range of occupational groups (e.g., Barrick and Mount 1991; Rothmann and Coetzer 2003; Salgado 1997; Dudley et al. 2006) and with performance at medical school (e.g., Lievens et al. 2002). It is this apparently stable group of traits which has been used as the theoretical basis of most “paper and pencil” assessments of personality traits designed specifically for use in medical school selection (see earlier for references). However, the more recent approaches to measuring personal characteristics in medical school selection have a slightly different conceptual basis. For example, rather than being based directly on the “Big Five” theory of personality traits, SJTs are based on implicit trait policy (ITP) theory and, depending on the job level, specific job knowledge (e.g., Motowidlo and Beier 2010a, b; Patterson et al. 2015a, b). SJTs measure the expression of personality traits in hypothetical situations which are designed on the basis of what is expected in the job for which the individual is being assessed (Motowidlo et al. 2006). They encompass measurement of personal choice (e.g., what is the best way to respond in this particular situation?) rather than just unfiltered (our word) trait expression, which is arguably what is measured in traditional personality tests. There is also a pragmatic difference between “paper and pencil” tests and the SJTs. The latter are based on thorough job analysis (Patterson et al. 2012b; Motowidlo et al. 1990) of what is expected of doctors in particular roles (e.g., junior doctor (resident) or doctor working in a particular specialty) and take the stance that “one size does not fit all”, whereas the former are typically more general measures of traits which are considered generically important to being a doctor. We return to the implications of these different positions and theoretical underpinnings for assessing personality traits in medical school selection in the discussion section of this paper.

Several major changes in selection for medical school and medical training after graduation in the UK now enable large-scale multi-site studies examining the predictive validity of selection processes, including those proposing to measure personality traits. The first of these is greater consistency across UK medical schools in terms of their selection approaches (Cleland et al. 2014), with, for example, the vast majority of UK medical schools using the same aptitude test, the UK Clinical Aptitude Test (UKCAT), as part of their selection matrix. While the focus of the UKCAT is assessment of cognitive ability, “non-cognitive” or personality trait tests were included, on a trial basis, in 2007–2009. The second is the introduction of a standardised, national process for selection into the next stage of medical training after medical school in the UK, via the Foundation Programme Office (UKFPO). Those entering the selection process for the UKFPO obtain two indicators of performance: an Educational Performance Measure (EPM) and the score they achieve for a Situational Judgement Test (SJT). We present details of these indicators later in this paper. Finally, there is a move within the UK for organisations such as UKCAT and the UKFPO to work together in terms of data linkage, to enable large-scale, high-quality, longitudinal research projects.

Together, these innovations finally provide the opportunity to address a gap in the literature highlighted many years ago (see Schuwirth and Cantillon 2005). Our aim in this paper is therefore to examine the predictive power of tests purporting to assess personality traits in relation to two national performance indicators on exit from medical school: an academic progress measure and a measurement of personality traits determined, through a job analysis, to be associated with successful performance as a Foundation Programme doctor. We do so with data from a large number of medical schools.

Methods

Design

This was a quantitative study grounded in post-positivist research philosophy (Savin Baden and Major 2013). We examined the predictive validity of the personality traits, or “non-cognitive”, component of the UKCAT admissions test (http://www.ukcat.ac.uk/) against the UK Foundation Programme (UKFPO: http://www.foundationprogramme.nhs.uk/pages/home) performance indicators in one graduating student cohort.

Study population

Our sample was the 2013 graduating cohort of UK medical students from the 30 UKCAT medical schools. This was the first cohort for whom both UKCAT and UKFPO indicators were available.

Data description

With appropriate permissions in place, working within a data safe haven (to ensure adherence to the highest standards of security, governance, and confidentiality when storing, handling and analysing identifiable data), routine data held by UKCAT and UKFPO were matched and linked.

The following demographic and pre-entry scores were collected: age on admission to medical school; gender; ethnicity; type of secondary school attended (fee-paying or non fee-paying); indicators of socio-economic status or classification (SEC), including Index of Multiple Deprivation (IMD) which is based on postcode, and the National Statistics Socio-economic Classification (NSSEC) which is based on parental occupation; domicile (UK, European Union [EU] or overseas); and academic achievement prior to admission, out of a maximum of 360 points (the UCAS tariff). UKCAT cognitive scores were not included given the research question focused on the predictive validity of UKCAT non-cognitive scores. Those taking the test in 2007 and 2008 were randomly allocated to sit one of four non-cognitive tests (see below). These were:

  1. The Interpersonal Values Questionnaire (IVQ) measures the extent to which the respondent favours individual freedoms (versus societal rules) as a basis for making moral decisions (Bore et al. 2005a, b; Powis et al. 2005). The dimension of moral orientation assessed is the extent to which the individual will ‘act in own best interests’ (Libertarian) versus ‘act in interests of society’ (Communitarian). The test has one domain, running from libertarian (low score) to communitarian (high score). Candidates are presented with a number of situations where people have to decide what to do according to their opinions or values, and respond on a 4 point Likert scale to indicate where their own values best sit.

  2. The Interpersonal Traits Questionnaire (ITQ), or NACE, measures narcissism, aloofness, confidence (in dealing with people) and empathy (Munro et al. 2005; Powis et al. 2005). It claims to assess specific aspects of the wider domain of empathy; a high degree of empathy is linked to convivial interpersonal relationships and is generally seen as a positive thing in care-givers, although it is argued that too high a degree of empathy could lead to over-involvement and burnout. The ITQ produces a summary score for INVOLVEMENT, calculated as C + E − (N + A); some totals may therefore be negative overall, representing ‘detachment’. Overall, confidence and empathy are deemed positive, and narcissism and aloofness negative. The candidates who receive this test are presented with 100 statements about people and the way in which they might think or behave in certain situations. They are then given a 4 point Likert scale and asked to decide which statements most relate to them.

  3. The Managing Emotions and Resiliency Scale (MEARS) (Childs et al. 2008) was designed to reflect the cognitive, behavioural and emotional elements of resilience and to describe coping styles in terms of attitudes, beliefs and typical behaviour, in six domains: self-esteem, optimism, self-discipline, faking, emotional non-defensiveness, and control (Childs 2012). In each, a high score reflects a high perceived self-value in that domain. It is reported as three scores: cognitive (the self-esteem and optimism scales), behavioural (control and self-discipline), and emotional non-defensiveness. Candidates receive a set of paired statements that represent opposing viewpoints and must decide their level of agreement within a six point range.

  4. Tests 1 and 2 above combined, both in an abridged format.

Note that UKCAT introduced the IVQ, ITQ and MEARS assessments on a pilot basis and the scores were not made available to selectors, i.e. they were not used in the actual selection to medical school. See Appendix: UKCAT non-cognitive test example questions.
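The ITQ summary score described in item 2 can be illustrated with a minimal sketch. Only the formula C + E − (N + A) comes from the text; the function name and the example domain scores are hypothetical and chosen purely for illustration.

```python
def involvement_score(narcissism: float, aloofness: float,
                      confidence: float, empathy: float) -> float:
    """ITQ (NACE) summary score: INVOLVEMENT = C + E - (N + A)."""
    return confidence + empathy - (narcissism + aloofness)


# Hypothetical domain scores (not taken from the study data).
involved = involvement_score(narcissism=20, aloofness=18, confidence=35, empathy=40)
detached = involvement_score(narcissism=38, aloofness=41, confidence=22, empathy=25)

print(involved)  # positive total, read as 'involvement'
print(detached)  # negative total, read as 'detachment'
```

A negative total arises whenever the combined narcissism and aloofness scores exceed the combined confidence and empathy scores.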

The four outcome measures were the UKFPO selection SJT and EPM (decile and total) and total UKFPO. The EPM is a decile ranking (within each medical school) of an individual student’s academic performance across all years of medical school except final year, plus additional points for extra degrees, publications etc. The total EPM score is based on three components, with a combined score of up to 50 points:

  • Medical school performance by decile (presented as 34–43 points).

  • Additional degrees, including intercalation (up to 5 points).

  • Publications (up to 2 points).

We chose the EPM as an outcome measure as the wider education literature strongly indicates that personality traits relate to performance on academic outcomes (Gutman and Schoon 2013).

The UKFPO SJT is also scored out of 50 points. The SJT focuses on key non-academic criteria deemed important for junior doctors on the basis of a detailed job analysis (Commitment to Professionalism, Coping with Pressure, Communication, Patient Focus, Effective Teamwork; see e.g., Patterson and Ashworth 2011; Patterson et al. 2015a). It presents candidates with hypothetical and challenging situations that they might encounter at work, which may involve working with others as part of a team, interacting with others, and dealing with workplace problems. In response to each situation, candidates are presented with several possible actions (in multiple choice format) that could be taken when dealing with the problem described. It is administered to all final year medical students in the UK as part of the foundation programme application process, is taken in exam conditions, and consists of 70 questions in 2 h 20 min (http://www.foundationprogramme.nhs.uk/pages/medical-students/SJT-EPM). It is a relatively new assessment but a preliminary validation study (Patterson et al. 2015a, b) has identified that higher SJT scores were associated with higher ratings of the in-practice performance of Foundation Year 1 doctors (FY1s: those in their first year post-graduation), as measured by supervisor ratings and routine measures of other key performance outcomes; that the two selection tools (SJT and EPM) were complementary in providing prediction of performance; and that FY1 doctors in the low scoring SJT category were almost five times more likely to receive remediation than those who were in the high scoring category.

We chose the SJT as an outcome measure because, given the emerging consensus that the SJT is essentially a measurement technique that targets non-cognitive attributes (Motowidlo and Beier 2010a, b), it offers a meaningful interim outcome marker for non-academic measures used within medical school selection processes.

The EPM and SJT are summed to give the UKFPO score out of 100.
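The scoring arithmetic for the outcome measures can be sketched as follows. This is an illustration under stated assumptions rather than the UKFPO's own procedure: the text gives only the 34–43 point range for the decile component, so the one-point-per-decile mapping (with decile 10 as the highest performing) is an assumption, as are the function names and the example values.

```python
def epm_total(decile: int, degree_points: int = 0, publication_points: int = 0) -> int:
    """Total EPM (maximum 50) from its three components."""
    if not 1 <= decile <= 10:
        raise ValueError("decile must be between 1 and 10")
    if not 0 <= degree_points <= 5:
        raise ValueError("additional degree points range from 0 to 5")
    if not 0 <= publication_points <= 2:
        raise ValueError("publication points range from 0 to 2")
    # ASSUMPTION: linear mapping, decile 1 -> 34 points, decile 10 -> 43 points.
    decile_points = 33 + decile
    return decile_points + degree_points + publication_points


def ukfpo_total(epm: float, sjt: float) -> float:
    """UKFPO score out of 100: EPM (out of 50) plus SJT (out of 50)."""
    return epm + sjt


# Hypothetical student: sixth decile, 3 degree points, 1 publication point.
epm = epm_total(decile=6, degree_points=3, publication_points=1)
print(epm)                     # 43
print(ukfpo_total(epm, 41.3))  # EPM plus a hypothetical SJT score of 41.3
```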

Statistical analysis

All data were analysed using SPSS 22.0. Pearson or Spearman’s rank correlation coefficients were used to examine the relationship between each of the SJT score and EPM and continuous factors such as UKCAT scores, pre-admission academic scores and age. For practical interpretation of the magnitude of a correlation coefficient, we defined a priori a low/weak correlation as r = 0.10–0.29, a moderate correlation as r = 0.30–0.49 and a strong correlation as r ≥ 0.50. Two-sample t-tests, ANOVA, Kruskal–Wallis or Mann–Whitney U tests were used to compare UKFPO indices across levels of categorical factors as appropriate.
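The a priori bands for interpreting correlation coefficients can be expressed as a small helper. This is a sketch only: the function name is hypothetical, and the label used for coefficients below 0.10 is an assumption, since the text does not name that range. The example values are coefficients reported in the Results.

```python
def correlation_strength(r: float) -> str:
    """Classify a correlation coefficient using the bands defined a priori."""
    size = abs(r)  # the sign gives the direction, not the strength
    if size >= 0.50:
        return "strong"
    if size >= 0.30:
        return "moderate"
    if size >= 0.10:
        return "weak"
    # ASSUMPTION: the paper does not label coefficients below 0.10.
    return "below the weak threshold"


for r in (0.126, -0.054, 0.570):
    print(r, correlation_strength(r))
```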

Multilevel linear models were constructed to assess the relationship between the independent variables of interest (UKCAT non-cognitive test totals and individual domains) and each of the four outcomes (SJT, EPM decile, EPM total and UKFPO total). Fixed effects models were fitted first and then random intercepts and slopes were introduced using maximum likelihood methods. Intercepts and slopes for the medical schools were allowed to vary for the non-cognitive test variables only. Models were adjusted for identified confounders (based on pre-hoc testing showing a correlation coefficient of >0.2 or <−0.2) such as gender, age at admission, IMD quintiles, year UKCAT exam was taken and whether or not the student attended a fee-paying school (NS-SEC and ethnicity had to be dropped from the models due to issues with non-convergence). Interactions between our primary variables and year of UKCAT exam were tested using Wald statistics and were dropped from the models if not significant at the 5 % level. Nested models were compared using fit statistics such as the log-likelihood, Akaike’s information criterion and Schwarz’s Bayesian information criterion. The best fitting models are presented.
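For readers unfamiliar with the information criteria used to compare nested models, the following sketch shows the standard definitions, AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L, where k is the number of estimated parameters, n the number of observations and ln L the maximised log-likelihood. The analysis itself was run in SPSS; the log-likelihoods, parameter counts and sample size below are hypothetical and serve only to show that lower values indicate the preferred model.

```python
import math


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike's information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Schwarz's Bayesian information criterion: k ln n - 2 ln L."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# HYPOTHETICAL fits: a fixed effects model and one adding a random intercept.
n_obs = 800
fixed_only = {"log_likelihood": -2500.0, "n_params": 8}
random_intercept = {"log_likelihood": -2490.0, "n_params": 9}

for name, fit in (("fixed effects", fixed_only), ("random intercept", random_intercept)):
    print(name,
          round(aic(fit["log_likelihood"], fit["n_params"]), 1),
          round(bic(fit["log_likelihood"], fit["n_params"], n_obs), 1))
```

Both criteria penalise the extra parameter, so the richer model is preferred only when its gain in log-likelihood outweighs that penalty, as it does in this made-up example.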

Results

There were 6294 students from 30 medical schools in the graduating 2013 cohort. UKCAT non-cognitive and UKFPO results were available for the 3343 students who sat the UKCAT in 2007 (n = 2714) and 2008 (n = 629), but not for those who sat the test in 2006, as the non-cognitive tests were not part of UKCAT in that year.

Table 1 shows the demographic profile of the cohort. Most students were from the UK (n = 2958, 90.3 %). The majority (58 %) were female and Caucasian (73.6 %). Just under a quarter of students had attended a fee paying school (23.9 %). The majority of graduating medical students were from higher SEC groups.

Table 1 Descriptive statistics of the demographic variables of the sample

In terms of outcome measures, as would be expected in a decile system such as the EPM, the percentage of graduating students within each decile per school was relatively constant (varying between 9.7 and 11.2), with only the lowest decile as an outlier (7.6). EPM, SJT and total UKFPO scores are shown in Table 1. Almost one half (47.8 %) of the sample had no additional EPM points, while 34.9 % (n = 1168) gained three or more further degree points, which indicates they had either intercalated or entered medicine as an Honours graduate. Most (75.3 %) did not gain any points for publications, while 18.4 % gained 1 point, and 6.3 % 2 points.

Table 2 provides an overview of candidate performance on the UKCAT non-cognitive tests. Note that each candidate sat only one of the four tests. The table shows the possible score for each domain and the range achieved by candidates (important for contextualising the multivariate analysis below) as well as the mean score and standard deviation or median and interquartile range, depending on distribution.

Table 2 UKCAT non-cognitive domain scores

Table 3 shows the relationship between demographic characteristics and outcomes. Being older at the time of admission to medical school had a weak positive correlation with EPM (r = 0.126, p < 0.001) and a weak negative correlation with SJT (r = −0.054, p < 0.001). Females performed significantly better than males in their EPM decile [median (IQR)]: females 6 (4, 8) versus males 5 (3, 8), p < 0.001. Females also had higher marks in the SJT: females 41.3 (39.1, 43.3) versus males 40.4 (38.3, 42.3), p < 0.001. Females outperformed males in their UKFPO scores: 82.4 (78.3, 86.4) versus males 81.1 (76.9, 85.2), p < 0.001. Caucasian students performed better than non-Caucasians in all outcomes (p < 0.001). In terms of type of secondary school attended, students who had attended independent secondary schools had a poorer EPM decile median of 6 (IQR 3, 8) than students from non-fee-paying schools (median 6, IQR 3, 8: p < 0.001). No statistically significant difference was seen in the other outcome measures. Spearman’s rho identified a weak correlation between pre-admission academic scores and each of EPM decile (r = 0.198, p < 0.001), total EPM (r = 0.224, p < 0.001), SJT (r = 0.104, p < 0.001) and total UKFPO (r = 0.212, p < 0.001).

Table 3 Univariate analysis: relationship between demographic variables and outcomes

Linear regression showed that there was no significant association between EPM decile or total EPM and any of the individual domains in the non-cognitive tests 1, 2 and 4. In test 3, however, there was a modest correlation between total EPM and each of the individual MEARS domains (r = 0.255–0.449, p < 0.001) and a weak correlation between the MEARS domains and EPM decile (r = 0.085–0.211). There was no significant correlation between any of the non-cognitive tests and the SJT score. Total UKFPO had a weak correlation with the MEARS domains (r = 0.209 to 0.318). Of note, there was a strong correlation between student age and MEARS total (Spearman’s r = 0.570, p < 0.001; not shown in tabular form).

A large number of multivariate tests were performed and, where significant results were obtained, the effects were quite small.

The multilevel analysis (see Table 4) shows that tests 1, 2 and 4 (libertarian-communitarian, NACE total and the abridged test 4) are not significantly associated with any of the four outcomes. END, part of MEARS, is significantly associated with all four outcomes. Self-esteem is significantly associated with EPM decile and EPM total but the coefficients are very small. Aloofness and empathy domains in the NACE test are negatively associated with both SJT score and EPM decile.

Table 4 Multilevel analysis—non-cognitive test coefficients adjusted for year of UKCAT exam, gender, age at admission, school type attended and IMD quintile for the four outcomes

In the MEARS domains, the emotional non-defensiveness (END: how one feels and reacts to people and situations) sub-test stood out as predicting all measures positively, with a cumulative effect such that a modest and achievable 7.5 extra marks (out of a valid range of 24–144) would improve total UKFPO score by 1 mark out of 100. Interestingly, increased self-esteem (out of 126) was related to a decrease in EPM decile and this filtered through to EPM total. One extra mark in aloofness (out of 50) was associated with a decrease in SJT score of 0.066 points; in other words, 15 extra aloofness marks corresponded to a decrease in SJT of one point. Similarly, 14 extra points in empathy (out of 50) on average predicted one fewer SJT point.
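The conversions in this paragraph follow from treating each coefficient as the change in outcome per extra test mark, so the number of marks needed to shift the outcome by one point is the reciprocal of the coefficient's magnitude. A minimal sketch, in which the function names are hypothetical and only the aloofness coefficient (0.066) is quoted directly from the text; the other two coefficients are back-calculated from the marks reported (7.5 and 14) and are therefore approximations.

```python
def marks_per_outcome_point(coefficient: float) -> float:
    """Test marks needed to change the outcome by one point."""
    if coefficient == 0:
        raise ValueError("a zero coefficient implies no change in the outcome")
    return 1.0 / abs(coefficient)


def implied_coefficient(marks_for_one_point: float) -> float:
    """Back-calculate the size of a coefficient from the marks per outcome point."""
    return 1.0 / marks_for_one_point


# Aloofness and SJT: coefficient magnitude reported in the text.
print(round(marks_per_outcome_point(-0.066), 1))  # 15.2, reported as 15

# END and total UKFPO, empathy and SJT: magnitudes implied by the reported marks.
print(round(implied_coefficient(7.5), 3))   # 0.133
print(round(implied_coefficient(14), 3))    # 0.071
```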

Discussion

This is the first study examining the predictive validity of paper and pencil tests of personality traits on admission to medical school against academic and non-academic outcomes on exit, in relation to both school-based and national performance indicators. We found some significant correlations but all with low effect sizes and an overall inconsistent picture. For example, aloofness and empathy scores on the NACE negatively predicted performance on the SJT but not the EPM decile or EPM total. Moreover, the actual patterns seem conflicting: higher empathy (representing emotional involvement) and higher aloofness (representing emotional detachment) both predicted performance in the same direction. Similarly, scores on the MEARS instrument generally lacked correlation although, first, it seemed the most sensitive test in that modest differences in scores could influence performance on the outcome measures, and, second, two scales appeared of interest. The emotional non-defensiveness (END: how one feels and reacts to people and situations) sub-test stood out as predicting all outcome measures positively, while higher self-esteem was associated with lower EPM decile and EPM total scores. EPM is an indicator of academic achievement, mostly test performance, both written and clinical, but this does fit with the wider, non-medical literature which highlights that non-cognitive attributes can influence cognitive test performance (e.g., Gutman and Schoon 2013). However, these tests are not primarily being employed to predict academic performance and the small effect size with the SJT does not, on its own, seem sufficient to justify the use of such a test (although there may be an argument to explore the utility of the END sub-test further).

Where do our findings sit in comparison to previous literature? Powis and colleagues developed the Personal Qualities Assessment (PQA: which includes the IVQ and ITQ) and tested it in a number of centres. However, few of the reported studies have examined the predictive validity of the PQA, and those which have been carried out are limited in their methodology (e.g., small scale, local outcome measures: Adams et al. 2012, 2015; Dowell et al. 2011; Manuel et al. 2005) and, at best, find only modest correlations (Adams et al. 2012, 2015). We would argue that, given the evidence to date as to the utility of SJTs in a variety of professional groups (see earlier, and Patterson et al. 2012a, b for a review), the use of a validated SJT as an outcome measure is more robust than the comparators used by other authors, and hence the weak relationship we found is probably a more accurate assessment of the power of the IVQ and ITQ to predict outcomes at the end of medical school.

It has been argued that the non-academic attributes the PQA measures are desirable in clinicians until the extremes are reached, as too much or little of any may be problematic. Indeed, Powis (2015) has gone as far as suggesting that the minority at these extremes might be excluded from the selection process. This view is not widely supported (e.g., Norman 2015) and indeed, given the low effect sizes and inconsistent picture we found with the NACE, we elected not to assess the ‘extremes’ as advocated by Powis and colleagues (e.g., Bore et al. 2009; Munro et al. 2008) as there seemed no justification for doing so. Certainly, on our evidence, the PQA cannot be justified as a tool or filter for excluding individual candidates.

Should we have expected there to be an association between performance on the various non-cognitive tests included in the UKCAT, and the EPM and the SJT? It could be argued that we compared apples and pears by expecting tests of personality traits to predict academic performance and the expression of job-specific personality traits in hypothetical situations. On the other hand, there is evidence that the “Big Five” personality factors correlate with academic performance at medical school (e.g., Lievens et al. 2002) and with implicit trait policies (ITPs) (Motowidlo et al. 2006a, b). However, what about the additional influence of other personality traits such as motivation, resilience, perseverance, and social and communication skills (Gutman and Schoon 2013)? It was made clear to applicants that the non-cognitive tests within the UKCAT would not be used in selection decisions, so it would not be unreasonable to assume that those sitting this part of the UKCAT were less motivated to do well on these tests compared to the “high stakes” cognitive UKCAT tests. Conversely, the Foundation Programme application process is competitive so motivation to do one’s best will be high.

There is also the issue of beliefs about the costs and benefits associated with expressing certain traits in particular situations. While ITPs are proposed to be related to individuals’ inherent tendencies or traits, individuals must make judgements about how and when to express certain traits. Thus, SJTs are designed to draw on an applicant’s knowledge of how they should respond in a given situation, rather than how they would respond. Although this seems a conceptual gap to us, there is some evidence that SJTs predict performance in one medical training context, that of UK general practice training (Lievens and Patterson 2011), and the wider literature also suggests that the way an individual responds to an SJT question does predict actual behaviour and performance once in a role (e.g., McDaniel et al. 2001). Validity studies have also shown that SJTs add incremental validity when used in combination with other predictors of job performance such as structured interviews, tests of IQ and personality questionnaires (O’Connell et al. 2007; McDaniel et al. 2007; Koczwara et al. 2012). While the focus of this paper is not to analyse the conceptual and theoretical frameworks of personality tools, it is essential that these are critically examined in order to develop, evaluate and compare medical selection tools and how these are used in admissions/selection processes.

This study is unusual in its scale, allowing for accurate estimates of correlations, subgroup analysis and multilevel modelling to more accurately estimate effect sizes. However, the range of outcome markers available was limited. The EPM is an indication of overall course academic achievement as judged against peers within each medical school: without a common exit exam it is not clear how much variation there is between schools and we are unable to estimate this effect, or correct for this. It is also a complex and varied measure as it includes other degrees and publications that will be confounded by age and other factors such as previous degrees. However, there are currently no comprehensive, standardized assessments across the UK akin to say the Canadian or US licensing examination, and so we had to be pragmatic and use what outcome measures were available to us. The SJT predictive validity remains to be determined but there is good reason to expect this, based on related previous work (McManus et al. 2013; Patterson et al. 2012a, 2015a). However, although we did not have access to the full dataset of test-takers (i.e. including those who either were not admitted to medical school or who did not graduate in 2013), mean scores and ranges across the non-cognitive tests were very similar between the full dataset summary (UKCAT technical reports 2007 and 2008) and the results from this graduating cohort (data not shown). In other words, those who graduated did not have significantly different non-cognitive scores from those who did not, implying no range restriction due to subset selection. The non-cognitive tests were included in the 2007 and 2008 UKCAT test battery on a trial basis, and it was made clear that these data would not be used in decision making: this “low stakes” situation may have influenced candidate test behaviour, as discussed earlier (e.g., Abdelfattah 2010).

Norman (2015) argues that, without a clear negative relationship between academic achievement and desirable non-academic attributes, selection for medical school can and should seek students with attributes in both domains. Doing so requires valid, reliable and affordable measurement techniques if we are to avoid an overly large initial filter on purely academic grounds. We must conclude that none of the non-cognitive tests evaluated in this study has been shown to have sufficient utility to be used in medical student selection in its current form. Newer non-cognitive tests, such as the UKCAT entry level SJT (http://www.ukcat.ac.uk/about-the-test/situational-judgement/), will hopefully prove to be more useful in our context, when scrutinised in due course. We intend to follow up this cohort of doctors to examine the predictive validity of the cognitive and non-cognitive tests used at admission to medical school against post-graduate outcome measures.

Ethical permission

The Chair of the local ethics committee ruled that formal ethical approval was not required for this study, given that the fully anonymised data were held in a safe haven and all students who sit UKCAT are informed that their data and results will be used in educational research. All students applying for the UKFPO also sign a statement confirming that their data may be used anonymously for research purposes.