Background

According to the World Health Organization (WHO), Interprofessional Education (IPE) occurs when “students from two or more professions learn about, from, and with each other to enable effective collaboration and improve health outcomes” [1]. Safe, high-quality, accessible, patient-centred care requires continuous development of interprofessional competencies [2], and IPE has repeatedly been called for, so that healthcare students can enter the workforce as effective collaborators [3,4,5].

A growing body of empirical work shows that IPE can have a beneficial impact on learners’ attitudes, knowledge, skills, and behaviours (the so-called collaborative competencies) [6, 7], and can positively affect professional practice and patient outcomes [8, 9].

IPE may enhance attitudes toward collaboration and teamwork during training, leading to improved attitudes towards interprofessional practice upon graduation. Nevertheless, the complexity of teaching different healthcare disciplines simultaneously, as well as logistical problems and busy timetables, raises issues concerning the introduction of IPE interventions. The optimal timing for introducing IPE, and whether immersion (i.e. continuous collaborative learning) or exposure (periodic collaborative activities) should be adopted [10], are still subject to debate. Gilbert [11] suggests exposure during the early years and immersion in the graduation year, reasoning that students’ professional identity should be allowed to develop optimally before they are expected to work collaboratively with others. Furthermore, introducing IPE later in the curriculum may be hindered by the students’ focus on profession-specific clinical practice and by immersion in vocation-specific stereotypes or negative attitudes [10]. Current undergraduate literature shows a tendency to introduce IPE earlier, even in the first year of studies [11, 12], but the most effective timing for IPE interventions in the medical curriculum remains to be determined.

We undertook a systematic literature review to determine the most effective time to introduce IPE to pre-registration medical students. Additionally, we were interested in exploring the nature of the training, the assessment methods and the study outcomes. Our systematic review was guided by the research question: “What is the optimal time to institute interprofessional education interventions in the medical school curriculum?”

Methods

Study design

We performed a systematic review of the literature focusing on interprofessional learning interventions in pre-registration medical students and applied a review protocol based on the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) statement [13]. We also aimed to perform a meta-analysis with studies grouped by type of assessment. This systematic review was registered in PROSPERO (www.crd.york.ac.uk) under the number CRD42020160964.

Data sources and selection criteria

The systematic literature search was performed on December 12, 2019, using the databases PubMed, PsycINFO, EThOS, EMBASE, PEDro and SCOPUS. The following keywords and subject headings were used as search terms: interprofession*, interprofessional education, inter professional, inter professionally, IPE, and medical student. We included all peer-reviewed articles in English and German that reported on evaluative studies of IPE interventions including medical students and were published after the 2011 Interprofessional Education Collaborative (IPEC) report [2]. The full search strategy is available in an additional Word file [see Additional file 1]. In addition, we included relevant articles found in the reference lists of previous reviews on IPE identified through our search [4, 6, 9, 14,15,16,17,18,19,20,21,22].

Inclusion criteria

We included studies that reported on the assessment of knowledge, skills or attitudes (KSA) following an IPE intervention and that reported quantitative results obtained with a validated IPE instrument. We included only studies using instruments whose validity had previously been comprehensively established through psychometric testing. Validated questionnaires provide reliable and valid results, can be used to benchmark or compare results internationally [23], and allow statistical comparisons, thereby increasing rigour and enabling a meta-analysis. One limitation of using validated questionnaires is the lack of further piloting or cultural adaptation, which may introduce bias. We also restricted our selection to groups of at least 35 medical students in the same year of their medical education programme, to ensure an adequate sample size for statistical validity. To avoid interventions spanning overlapping years of education, we selected studies reporting on interventions lasting at most 6 months (regardless of the type of intervention, the study programme, and the educational year of other participating students). Although we encountered qualitative IPE studies, we chose a positivist approach because it better aligned with our intention to perform a meta-analysis.

Exclusion criteria

We excluded conference contributions and abstracts without a related peer-reviewed published article. We also excluded studies using non-validated questionnaires and articles without an available full text in English or German.

Identification of potentially eligible studies

After the primary search, all titles and abstracts were screened, and duplicates or non-relevant articles were excluded. The full text of the remaining articles was read by two authors (JBE and AF) to identify the articles eligible for this review. All potentially eligible articles were imported into a software platform for systematic reviews (http://rayyan.qcri.org) [24] to expedite the screening of titles and abstracts and to determine the final selection of eligible studies. The two authors initially performed the selection blinded to each other’s decisions, with three options: “include”, “exclude” and “maybe”. After this first independent assessment, results were unblinded and disagreements were resolved by discussing individual papers to reach consensus. The study selection process is outlined in the PRISMA Flow Diagram (Fig. 1).

Fig. 1 PRISMA Study Flow diagram

Data extraction and synthesis

The data extraction form was developed by two reviewers, informed by the form of Reeves et al. [9] but modified to include important aspects specific to this review, such as the ratio of the study year to the total duration of studies and the classification of interventions as “early” or “late”, depending on whether the IPE intervention occurred in the first or second half of medical studies. The reviewers extracted additional data regarding the context of the study, recruitment, description of participants, study design, results and conclusions. The analysis of the risk of bias was performed independently, at a later stage. RG moderated in case of disagreement.
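
As a minimal illustration of this timing rule, the sketch below (in R; the study rows and variable names are hypothetical, not taken from our extraction form) computes the ratio and the resulting “early”/“late” label:

```r
# Hypothetical extraction rows (study names, years and programme lengths
# are invented for illustration).
d <- data.frame(
  study             = c("Study A", "Study B", "Study C"),
  intervention_year = c(1, 3, 4),  # year in which the IPE intervention took place
  programme_length  = c(6, 6, 4)   # total duration of medical studies, in years
)

# Ratio of study year to total duration; "early" = first half of the programme.
d$ratio  <- d$intervention_year / d$programme_length
d$timing <- ifelse(d$ratio <= 0.5, "early", "late")
d
```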

Upon completion of data extraction, data were analysed using the Statistical Package for the Social Sciences (SPSS), version 23.0 (IBM Corp., Armonk, NY, USA). We report descriptive statistics for quantitative data (median, IQR). Extracted data were synthesised narratively, using an integrative and aggregative approach [25].

Quality assessment and risk of bias

The quality of the included studies was also evaluated by JBE and HC using a standardised critical appraisal tool, the McMaster Critical Review Form for Quantitative Studies [26]. Articles received a score of “one” for each criterion of the appraisal guidelines they met, and “zero” otherwise. Item scores were then summed to a maximum of 16, with 16 indicating excellent methodological rigour. Quality was defined as poor for an overall score of 8 or less, fair for 9–10, good for 11–12, very good for 13–14 and excellent for 15–16 [27]. This tool was chosen for this systematic review because it is published, freely available, has been used extensively, and can be applied to a range of research designs [28]. Differences in judgment were resolved through discussion.
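
For illustration, this banding can be expressed as follows (a sketch in R with hypothetical checklist totals; the cut-offs are those stated above):

```r
# Hypothetical McMaster checklist totals (16 binary items summed per study).
scores  <- c(7, 9, 12, 13, 15)

# Quality bands as defined above: <=8 poor, 9-10 fair, 11-12 good,
# 13-14 very good, 15-16 excellent.
quality <- cut(scores,
               breaks = c(-Inf, 8, 10, 12, 14, 16),
               labels = c("Poor", "Fair", "Good", "Very good", "Excellent"))
data.frame(scores, quality)
```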

Statistics

A meta-analysis of the studies using the Readiness for Interprofessional Learning Scale (RIPLS) [10, 29,30,31,32,33] was attempted with the R meta package [34], as this scale was the most frequently used. Otherwise, descriptive analyses were conducted, including frequencies. Where applicable, scales were reversed by subtracting the mean from the maximum score of the scale to ensure a consistent direction of effects across studies. Weighted means of subscales were calculated for each study using the number of participants as weights. Pooling of estimates at the single-item level was not possible, as Sheu et al. [30] reported only at the subscale level. Estimates of weighted means of subscales are reported with 95% confidence intervals (CIs). A random effects model with the inverse variance method was used for pooling estimates across the remaining studies using the RIPLS. Standard deviations of mean changes were not given and had to be calculated according to Cochrane’s Handbook [35], which introduced further uncertainty through the need to choose an essentially arbitrary correlation coefficient for the standard deviations. The meta-analysis was conducted using R version 3.5.0 (R Foundation for Statistical Computing, Vienna, Austria) after the relevant content was extracted; all remaining analyses were conducted with SPSS v.23 (IBM Corp., Armonk, NY, USA).
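
The sketch below illustrates this procedure in R with hypothetical numbers (study labels, means and standard deviations are invented; metagen from the meta package is one way to obtain an inverse variance random effects pooling, though other functions of the package could equally be used):

```r
library(meta)  # R meta package [34]

# Hypothetical per-study summaries for one RIPLS subscale.
d <- data.frame(
  study   = c("Study A", "Study B", "Study C"),
  n       = c(120, 80, 150),
  mean_ch = c(0.30, 0.10, 0.25),  # mean pre-post change in score
  sd_pre  = c(0.60, 0.70, 0.55),
  sd_post = c(0.65, 0.75, 0.60)
)

# Impute the SD of the change score per Cochrane's Handbook (16.1.3.2),
# assuming a pre-post correlation r (0.4 in our analysis):
r <- 0.4
d$sd_ch <- sqrt(d$sd_pre^2 + d$sd_post^2 - 2 * r * d$sd_pre * d$sd_post)

# Inverse variance pooling; the meta package also reports a random effects model.
m <- metagen(TE = mean_ch, seTE = sd_ch / sqrt(n),
             studlab = study, data = d, sm = "MD")
summary(m)
```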

Results

Trial flow

The literature search retrieved 3995 articles. After applying the inclusion and exclusion criteria and removing duplicates, 23 articles were included in the review [10, 29,30,31,32,33, 36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52] (see PRISMA Flow diagram, Fig. 1). All studies had a pre-test-post-test design. Basic characteristics of educational interventions are presented in Table 1. We present an overview of characteristics of the included studies in Table 2.

Table 1 Categorised description and characteristics of the 23 included studies (Findings of individual studies could belong to more than one category)
Table 2 Extraction grid for selected studies

Participants

In total, 5231 students, of whom 62% (n = 3229) were medical students, experienced an IPE intervention. The median number of medical students in the IPE interventions was 100 [35–464]. Nine studies (39%) reported data for first-year medical students [10, 29,30,31, 36,37,38,39,40], five (22%) for second-year students [41,42,43,44,45], six (26%) for third-year students [32, 46,47,48,49,50], two (9%) for fourth-year students [33, 51] and one for sixth-year medical students [52]. No study reported interventions occurring in the fifth year. Most studies (65%) [10, 29,30,31, 36,37,38,39,40,41, 43,44,45,46, 48] were performed in the first half of the medical curriculum. Three studies [10, 45, 50] (13%) involved only medical students. Across the studies, the other professional groups in the IPE interventions included nursing, pharmacy, dental medicine, physical therapy, biomedical science, occupational therapy, physician assistant, radiotherapy and dietetics students (Table 2).

Study designs and locations

The study design was mainly cross-sectional (n = 16). Only two studies (9%) were randomised [39, 40]. Most studies took place in the USA (n = 14) [30,31,32, 37, 38, 40,41,42,43,44, 47, 49,50,51,52] and in Europe (n = 5, Germany, Italy, Spain, Sweden and the United Kingdom) [36, 39, 45, 46, 48].

Interventions

Interventions varied in type and topic. Most frequently, faculty chose IPE interventions on the topic of chronic care (n = 8; e.g., Alzheimer’s disease [42], end-of-life issues [49], geriatric care [44], long-term conditions [10, 33, 36, 41, 52]) or acute care (n = 4) [30, 32, 43, 51]. Other topics were communication (n = 2) [37, 46], medication plans and errors (n = 3) [38, 44, 47] and teaching aimed at influencing interprofessional knowledge, attitudes and skills [29, 31, 39, 40, 45, 48, 53]. The duration of interventions varied from 25 min [50] to 6 months [37], and interprofessional group size ranged from 2 [42, 48] to 25 [49] students. The main educational strategies were small group discussions (n = 7) [30, 31, 36,37,38, 47, 48], simulations (n = 6) [32, 41, 43, 49,50,51] and workshops (n = 5) [38,39,40, 44, 47]. The majority of the reported interventions (48%, n = 11) were held only once, and 39% (n = 9) lasted less than 6 h.

Assessment measures and outcomes

All studies reported learning outcomes. We could identify 49 different outcome measurements with 46 different assessment methods, but the majority (76%, n = 35) were questionnaires. The most frequent outcomes were attitudes towards IPE and/or other professions (78%, n = 38) and satisfaction (16%, n = 8). Eight studies (35%) used more than one validated instrument to evaluate the experience; four studies [30, 40, 42, 51] used two instruments, and the other four [32, 33, 39, 49] used three. The most commonly used method for assessing attitudes towards IPE was the RIPLS, used in six studies (26%) [10, 29,30,31,32,33], but a total of 22 different scales were used:

  • Attitudes to Health Professionals Questionnaire (AHPQ) [36]

  • Common Ground Instrument (CGI) [36]

  • Scale of Attitudes toward Physician-Pharmacist Collaboration (SATP2C) [38, 40, 44]

  • Sociocultural Attitudes in Medicine Inventory (SAMI) [30]

  • Jefferson Scale of Empathy (JSE) [39, 40]

  • Jefferson Scale of Attitudes toward Physician-Nurse Collaboration (JSAPNC) [39, 48, 49]

  • Jefferson Scale of Physician Lifelong Learning (JeffSPLL) [39]

  • Interprofessional Collaborative Competency Attainment Scale (ICCAS) [41]

  • Attitudes Toward Collaboration Scale (ATCS) [42]

  • Attitudes Toward Interdisciplinary Teams Scale (ATITS) [42]

  • Interprofessional Education Collaborative Competency Self-Assessment Instrument (IPEC CSI) [43]

  • Interdisciplinary Education Perception Scale (IEPS) [45]

  • University of the West of England Interprofessional Questionnaire (UWE-IP-D) [46]

  • Attitudes Towards Health Care Teams Scale (ATHCTS) [33, 42, 47, 49]

  • Self-Efficacy for Interprofessional Experiential Learning (SEIEL) [50]

  • Teamwork Assessment Scale (TAS) [32]

  • Team Strategies and Tools to Enhance Performance and Patient Safety (TeamSTEPPS) Teamwork Attitude Questionnaire (T-TAQ) [32]

  • Team Skills Scale (TSS) [33]

  • Student Perceptions of Interprofessional Clinical Education (SPICE-R2) [51]

  • Healthcare Stereotypes Scale (HSS) [51]

  • Interprofessional Socialization and Valuing Scale (ISVS) [52]

Findings

Over half of the studies (n = 13) [29, 32, 33, 36,37,38,39, 41, 43, 45, 49, 51, 52] showed a significant increase in positive attitudes towards IP after the interventions. Nine studies (39%) showed no significant changes in medical students’ attitudes towards IPE [30, 31, 40, 42, 44, 46,47,48, 50], while one demonstrated an increase in negative attitudes towards IPE after the intervention [10]. In years 1 and 2, IPE interventions appear longer in duration. Late IPE interventions show a trend towards being longer and more often statistically significant (Fig. 2). The sample size is too small for further comparisons.

Fig. 2 Bar chart: Outcome and duration of IPE interventions in selected articles, according to early (first half) or late (second half) timing within medical school. White bars: statistically significant positive change of attitudes; grey bars: non-significant positive change of attitudes; solid line: continuous IPE intervention; dotted line: intermittent IPE intervention

Methodological rigour

There was 91% agreement (kappa = 0.772) between the reviewers on the scores elicited by the McMaster Critical Review Form for Quantitative Studies [26], which represents good inter-rater reliability [54]. Consensus on the disagreements was reached after discussion. Methodological rigour scores ranged from 7 to 15 out of a maximum of 16. An additional Word file shows the scoring in more detail [see Additional file 2]. Most studies (n = 18) were rated as “Good” [10, 31, 36,37,38, 44, 47, 49, 51, 52], “Very Good” [29, 30, 39, 41, 45, 48] or “Excellent” [33].
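
For reference, agreement statistics of this kind can be reproduced as follows (a sketch with hypothetical item-level ratings, using the kappa2 function of the R irr package; we do not imply that this is the software used for the original calculation):

```r
library(irr)  # provides kappa2() for Cohen's kappa

# Hypothetical binary item scores (0/1) from the two reviewers.
ratings <- data.frame(
  rater1 = c(1, 0, 1, 1, 0, 1, 1, 0, 1, 1),
  rater2 = c(1, 0, 1, 0, 0, 1, 1, 0, 1, 1)
)

kappa2(ratings)                          # Cohen's kappa for two raters
mean(ratings$rater1 == ratings$rater2)   # simple percent agreement
```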

Meta-analysis

Initially, we planned to undertake a meta-analysis of all studies included in the review. However, the broad range of instruments, covering many different constructs, made this unfeasible. Instead, we performed the analysis with the RIPLS, as it was the most frequently used instrument, in the knowledge that this would represent only 26% of the articles in this review.

The heterogeneity in the reporting of RIPLS results hampered a sound estimation of summary scores across studies. Whereas Darlow et al. [33] and Hudson et al. [10] used altered instruments with more than 19 items, Chua et al. [29], Paige et al. [32], Sheu et al. [30] and Sytsma et al. [31] used the original 19-item RIPLS. However, in the article by Paige et al. [32], the item “For small group learning to work, students need to trust and respect each other.” is missing, and the author did not respond to an email requesting further information. Combined with the extensive heterogeneity in reporting, which was also confirmed statistically (p < 0.01 for Cochran’s Q in the meta-analysis of Chua et al. [29], Paige et al. [32], Sheu et al. [30] and Sytsma et al. [31] for the subscales team, identity and role; see Additional file 3/Table 3: Original RIPLS scores for Chua et al., Paige et al., Sytsma et al. and Sheu et al., supplemental_material_IPE_RIPLS_original_data.xls), combining the single-study data into a summary measure seemed prone to error. Additionally, the original articles reported means and standard deviations, which are not the appropriate summary measures for Likert-scaled items. As Sheu et al. [30] reported only the means and standard deviations of the RIPLS subscales, merging information for the meta-analysis was possible only at that level and not at the single-item level. Furthermore, the standard deviations of the mean changes (pre-test-post-test score differences) were not given and had to be estimated according to Cochrane’s Handbook (16.1.3.2, Imputing standard deviations for changes from baseline), which introduced further uncertainty through the need to choose an essentially arbitrary correlation coefficient for the standard deviations (0.4 in our case). Given the pragmatic heterogeneity of the interventions across studies, an ordinary pre-test-post-test score difference is too simple a way to capture the information generated by the original studies. All in all, a meta-analysis could not be performed because of the high heterogeneity of the instruments used and the inconsistent data reporting.

Discussion

In this systematic review, we analysed IPE interventions based on 23 studies published between 2011 and 2019. Our findings show that medical students were exposed to IPE interventions at various points in their training, and we could establish evidence of the effectiveness of IPE. Three studies involved only medical students and therefore did not meet the WHO definition of IPE; however, as they reported on interprofessional interventions, they were not excluded from this systematic review.

All years except the fifth study year were represented, so no preference for pre-clinical or clinical years could be observed. However, studies in the first four years of medical education were more frequent. This may reflect variation in the length of pre-registration medical education programmes worldwide: in the USA, medical school mostly consists of 4 years of training (generally preceded by a 3–4-year Bachelor’s degree), while in Europe it averages 6 years (without a preceding programme) [55].

In Europe, most medical university programmes are public and educate rather large cohorts of students (e.g., Germany has 36 public and only two private medical schools, and almost 10,000 new medical students per educational year, leading to an average class size of over 260 students) [56], while in the USA (141 fully accredited medical schools), more than one third of schools are private (n = 56) and classes are much smaller, with an average of 146 students per educational year [56, 57]. This may also explain the higher frequency of studies from the USA, as implementing IPE elements could be more feasible with smaller classes, and private medical schools may face more pressure to evaluate their programmes.

The optimal timing for introducing IPE is still subject to debate [10]. Introducing it in the clinical years may seem reasonable, as this allows students’ professional identities to develop optimally and gives them experience in working collaboratively with students from different health professions [11]. However, introducing IPE so late in the medical curriculum may be complicated by the students’ focus on profession-specific clinical practice [10]. On the other hand, introducing IPE early in pre-registration healthcare courses may be useful in breaking down negative attitudes and avoiding stereotypes [58,59,60].

From our analysis, we could not determine the best time to introduce IPE, as both pre-clinical and clinical IPE interventions showed some degree of success. Late IPE interventions showed a trend towards being longer and more often statistically significant. It seems reasonable to conclude that interventions should be introduced in the early years and continue throughout the curriculum. More well-designed studies are needed to address this gap in knowledge.

The published IPE interventions had a pre-test-post-test design, and most studies were cross-sectional. Interventions varied in type and topic, group sizes were small, and most activities were performed only once. There was also a paucity of studies reporting medium- and long-term outcomes. Most studies (78%) were of good or very good quality, although a small proportion still scored poorly. This is consistent with previous reviews [4, 6, 15, 18]. The scarcity of longer-term data limits the development of strategies targeting long-term behaviour change and the potential to positively impact patient outcomes. Longer interventions and longitudinal follow-up of learning outcomes are key to identifying robust outcomes that lead to changes in practice. An increasing number of studies now report mid- and long-term outcomes, but, as our own sample shows, these are still a minority. More studies of models for pre-licensure IPE interventions (including adequate evaluation of their effectiveness) are needed, particularly regarding long-term outcomes [9, 31, 61]. Where prolonged IPE training is not feasible due to organisational limitations, intermittent interventions may be a good strategy [47]. The heterogeneity of most outcome measures may also limit the ability to draw conclusions about best practices and, in our case, prevented a meta-analysis.

Attitudes were most frequently assessed with the RIPLS. The Readiness for Interprofessional Learning Scale, developed in 1999, was among the first scales designed to measure attitudes towards interprofessional learning [62]. It has been translated and culturally adapted into several languages [63]. The scale is very popular, but it has not been updated, it fails to embody all the dimensions of the Core Competencies for Interprofessional Collaborative Practice [2], and its conceptual framework has recently been questioned [63]. Additionally, concerns about its low internal consistency at the item and subscale levels, raised by the RIPLS authors themselves, perpetuate the debate about what exactly the RIPLS measures [64], and there have even been recommendations to abandon the scale altogether [23, 65]. Finally, some newer scales, more aligned with the IPEC dimensions, have been successfully tested and culturally adapted [66, 67]. While educators, curriculum planners and policy makers continue to struggle to identify methods of interprofessional education that lead to better practice [9], clearer measures of interprofessional competency are needed to assess the outcomes of health professional degree programmes and to determine which approaches to interprofessional education benefit patients and communities.

The results from this review and from individual studies should be interpreted with caution: students’ educational backgrounds, as well as attitudes, expectations and stereotypes, may vary considerably between institutions and countries and may influence how the IPE interventions are experienced. This probably accounts for many differences in effectiveness of IPE activities in different settings [15]. Additionally, a few studies described a “package” of interprofessional activities, and medical curricula differ significantly, which may introduce more bias. University IPE programmes should agree on a comparable methodology that aligns with research in IPE (e.g., larger cohorts, multi-centre studies) and should focus on fewer instruments to measure IPE, adequately assessed for validity, responsiveness, reliability, and interpretability [45].

There is broad variation in the length of the medical curriculum between continents and countries, and most of the studies did not explain their specific curriculum to the reader. For many articles, we were unable to determine the total length of the medical programme and therefore whether the IPE intervention took place in the final year, which would have been relevant to this literature review. To bridge this gap, we propose that future studies briefly describe the medical curriculum in which the intervention is embedded.

Our methodology also has limitations. We decided a priori to include only papers with at least 35 medical students, to ensure sufficiently powered studies in the sample. However, this may have introduced some selection bias or left out potentially relevant interventions. Because we were interested in IPE effects on medical students, we also excluded all studies that did not report specific results for medical students, which limited the number of available studies. Like other systematic reviews, our work excluded “lower quality” studies (i.e., non-randomised, non-experimental, qualitative studies) [9, 16, 20]. Reflecting on our methods, we question whether they are adequate for social or educational research, as there are repeated appeals for more qualitative reviews in IPE [61].

Unfortunately, several further issues made a meta-analysis impossible. First, as the RIPLS uses a Likert scale (an ordinal scale), central tendency should be reported as the median value; however, most studies in this sample reported the mean. This is acceptable only if one assumes equal distances between response categories, which is unrealistic. Additionally, students responding to the pre- and post-intervention questionnaires were pooled cohorts, and items differed in wording (questionnaires were slightly modified). In some studies, individual items were not reported; in others, items were scored in reverse (negative attitudes), and some studies did not report the change in score, which is the outcome of interest for a meta-analysis.

Conclusions

This systematic review showed some evidence of a post-intervention change of attitudes towards IPE across different medical years studied. IPE was successfully introduced both in pre-clinical and clinical years of the medical curriculum. However, we found great variability in the scales chosen to evaluate changes in knowledge, behaviours and attitudes linked with participation in IPE. There was a paucity of studies reporting medium and long-term outcomes. The heterogeneity of results prevents further comparisons or the performance of a rigorous meta-analysis.