Supervision matters in health and human services. While definitions of supervision vary across the literature (Martin et al. 2017), Proctor’s popular model outlines three purposes of supervision: facilitating consistent and quality practice in supervisees (managerial function), helping the development of supervisees’ knowledge, skills, attitudes and practices (educational function), plus providing supervisees with support and validation (restorative function) (Proctor 1987; Brunero and Stein-Parbury 2008; Dilworth et al. 2013; Gonsalvez and Milne 2010). See Box 1 for example definitions of supervision. An effective supervisor skilfully provides feedback, teaches, fosters collaborative learning, understands the expectations of their supervisees and is organized (Gibson et al. 2018). While supervision training is thought to enhance supervision effectiveness (Martin et al. 2014; Fitzpatrick et al. 2012; Dilworth et al. 2013; Chu et al. 2016), supervisors in health and human services consistently lack such training (MacDonald 2002; Spence et al. 2001; Hoge et al. 2011; Butterworth et al. 2008). Despite a plethora of evidence on the quality and effectiveness of supervision over the last 20 years (Spence et al. 2001; Hill et al. 2014; Newton et al. 2016), scant evidence exists evaluating the impact of supervision training. Only three reviews in the literature focus on supervision training (Milne et al. 2011; Gonsalvez and Milne 2010; Tsutsumi 2011). While these reviews begin to offer insights into the effectiveness of supervision training, they largely focus on the positive outcomes of supervision training without explicitly discussing the complexities around how training interventions influence the quality or effectiveness of supervision.
To address this gap in the supervision training literature, we conducted a realist synthesis to explore the extent to which supervision training works (or does not work), for whom and under what circumstances, how and why.

Box 1 Example definitions of supervision

Diversity and complexity of supervision training interventions

While many argue for the importance of training to enhance supervision effectiveness (Kilminster and Jolly 2000), supervisors often carry out their supervision roles without any specific training (Hoge et al. 2011). Supervisors who do receive supervision training can experience wide diversity in its content, mode, pedagogical strategies and duration. Supervision training content often focuses on the development of supervisor knowledge (e.g. definitions, models, methods, responsibilities, legal/ethical aspects) (Kilminster and Jolly 2000; Spence et al. 2001; Newton et al. 2016), and/or skills (teaching, assessment, feedback, counselling, leadership, interpersonal) (Kilminster and Jolly 2000; Hill et al. 2014; McKellar and Graham 2017). Supervision training modes include face-to-face, online or blended approaches. Pedagogical strategies also vary, including didactic (e.g. presentations, videotaped demonstrations), active (e.g. small group discussions) and/or experiential learning (e.g. role-play, feedback) (Spence et al. 2001; Hoge et al. 2011; Pollock et al. 2017). The duration of supervision training ranges from one-off, short-term interventions (such as a 2-day workshop) to extended-duration interventions over many months that are punctuated by mini-interventions such as monthly supervision sessions (Spence et al. 2001). Although competency frameworks for supervision have been developed to guide supervision training (Health Workforce Australia 2014), some scholars argue that a lack of specificity still exists in terms of what supervision training should cover and how it should be conducted (Reiser and Milne 2014; Alfonsson et al. 2018). Furthermore, supervision training interventions have been criticised for lacking theoretical and evidence-based foundations (Kilminster and Jolly 2000). Therefore, we designed this realist synthesis to address these criticisms and gaps in the current literature.

A realist approach to supervision training

Given the diversity and complexity of supervision training interventions, a realist synthesis was used to better understand how and why supervision training interventions produce their effects. A realist approach is theory-driven, and so facilitates the development and modification of program theories accounting for how and why interventions work (or do not work) and for whom and when (Wong et al. 2012, 2016). This approach is underpinned by scientific realism, which asserts that it is not interventions that create change; rather, it is people who create change (Pawson and Tilley 1997). Interventions are thought to lead to outcomes through the operation of mechanisms, that is, the resources proffered by an intervention and the ways in which these influence participants’ reasoning (Dalkin et al. 2015). Furthermore, there is an appreciation that this complex relationship is context-dependent (Sholl et al. 2017; Ajjawi et al. 2018). Outcomes of any intervention can be affected by the range of conditions within any given setting, which are often sociocultural (Jolly and Jolly 2014). The basic premise is: what works for one person might not work for another; and what works in one circumstance might fail to work in another (Wong et al. 2016). While the context–mechanism–outcome (CMO) relationship is not necessarily straightforward or linear, contextual aspects are thought to trigger particular mechanisms in response to an intervention, leading to particular outcomes (Jolly and Jolly 2014). See Box 2 for a glossary of key realist terms.

Box 2 Glossary of realist terms

Developing an initial program theory from non-realist supervision training reviews

Three supervision training reviews, one narrative and two systematic, have so far been published in the literature (Milne et al. 2011; Gonsalvez and Milne 2010; Tsutsumi 2011). While none of these employed realist approaches or included middle-range theory (MRT) specific to education, apart from mentioning Kolb’s (1984) experiential learning (Milne et al. 2011), we applied realist logic in our reading of these reviews to develop an initial program theory (IPT: Fig. 1). We developed this IPT based on our identification of contexts, mechanisms, outcomes and context–mechanism–outcome configurations (CMOCs) for the supervision training interventions across the three papers. First, studies included in these three reviews had several different contexts including health and human services (e.g. psychology, mental health, allied health), plus commercial contexts such as sales and insurance (Milne et al. 2011; Gonsalvez and Milne 2010; Tsutsumi 2011). Second, the supervision training interventions outlined within these three reviews were complex and diverse in terms of the: (a) content such as knowledge, skills and attitudes; (b) mode of delivery including face-to-face and online learning; (c) pedagogical strategies employed including theoretical and experiential learning; and (d) duration of the interventions including short (e.g. half-day) and extended durations (e.g. a year) (Milne et al. 2011; Gonsalvez and Milne 2010; Tsutsumi 2011). Third, a variety of (mostly) positive outcomes were identified within the three reviews and related to supervisors (e.g. improved satisfaction, confidence, knowledge, skills) and supervisees (e.g. improved satisfaction and mental health). Fourth, we were also able to identify some mechanisms in the reviews to explain why interventions produced their effects, e.g. supervision training interventions having an appropriate balance between didactic and experiential learning methods, and extended durations enhancing engagement.
Finally, we were able to identify two distinct CMOCs in two of the reviews:

Fig. 1 Initial program theory

  • Within mental health supervision [C] clinical supervision training [I] leads to supervisor and supervisee development [O] through having a blend of pedagogic methods such as feedback, educational role-play and modelling [M] (Milne et al. 2011).

  • Within the workplace [C], supervision training [I] leads to improved supervisor knowledge and behaviour, plus enhanced supervisee mental health [O] through improved knowledge and behavioural modification [M] (Tsutsumi 2011).

Study aim and research questions

Although we have been able to identify contexts, mechanisms and outcomes and two CMOCs for supervision training interventions from previous reviews using realist logic, and develop an IPT, this current realist synthesis aimed to extend current literature reviews to develop a modified program theory (MPT). It aimed to review the published supervision training literature within health and human services to answer the novel research question: To what extent do supervision training interventions work (or not), for whom and in what circumstances, and why?


The review protocol, registered on PROSPERO (CRD42018094186) and published (Lee et al. 2019), was underpinned by Pawson’s five stages of realist review: (1) clarifying scope; (2) searching for evidence; (3) study selection; (4) data extraction; and (5) data synthesis (Pawson et al. 2005). Although presented in a linear fashion, stages were conducted iteratively with some overlap. The review methods and findings follow the RAMESES publication standards for realist syntheses (Wong et al. 2013).

Clarifying the scope

A matrix identifying existing primary literature/empirical studies, literature reviews, search terms and their synonyms was created, generating numerous search terms. With the help of a medical librarian (see acknowledgements), pilot searches were conducted using several databases to test search terms (those identified as keywords in other published supervision training outputs, plus synonyms familiar to our multidisciplinary team), Boolean operators and proximity searching. Note that our original scope for this realist synthesis was broad including health (e.g. medicine, nursing, allied health etc.) and human services (e.g. housing, disability, children services, youth and family services, alcohol and drug services, out of home care etc.), consistent with our funding (Victorian Department of Health and Human Services). Furthermore, our scope was also broad in terms of interventions (e.g. workshops, online education, lectures), professions (e.g. nursing, physiotherapy, pharmacology, mental health), contexts (e.g. hospitals, universities, training centers, community services) and levels of learner (e.g. undergraduate students, postgraduate trainees, peers and colleagues).

Searching for empirical evidence

A final and comprehensive search of the literature was conducted in May 2018 by SLL, with input from the medical librarian and co-authors. Note that none of these final searches were limited by date. Key search terms and phrases included supervisor terms (e.g. supervisor, practice educator, clinical educator, preceptor) and training terms only (e.g. education, professional development, train-the-trainer). Given the breadth of our search, we did not include search terms relating to interventions, professions, contexts or levels of learner, as advised by our medical librarian. For a full list of search terms see the protocol (Lee et al. 2019). Key terms were combined with proximity searching, Boolean operators, truncations and asterisks. Furthermore, searches were adapted to meet the operative requirements of each database including: Educational Resources Information Center (ERIC, ProQuest); Australian Public Affairs Information Service (APAIS, Informit); Social Services Abstracts (ProQuest); Scopus; PsycINFO (Ovid); MEDLINE (Ovid); and Cumulative Index to Nursing and Allied Health Literature (CINAHL Plus, Ebsco). An example of a CINAHL search strategy is included in Box 3. Citations and reference lists of included studies were hand searched to identify additional relevant studies.

Box 3 Search strategy example of CINAHL search

The first search elicited 15,676 outputs across all databases. Once duplicate results were removed, 11,764 outputs remained. All outputs were exported to Covidence software (© Covidence 2019) for management. The searching and selection process is summarised in the PRISMA diagram (see Fig. 2). Inclusion and exclusion criteria are shown in Table 1. Given the number of outputs identified, non-peer-reviewed outputs were excluded.

Fig. 2 PRISMA diagram of the searching and selection process


Table 1 Inclusion and exclusion criteria

Study selection and appraisal

All authors (except EH) conducted initial assessments of outputs’ relevance using Covidence. Each analyst first participated in a calibration exercise of ten titles and abstracts using the inclusion criteria, with subsequent team-based discussions, before analyzing their own set of titles and abstracts. Each author (except EH) then screened a roughly equal portion of the titles and abstracts of studies retrieved using the search strategy (and those retrieved from hand searched references) against the inclusion criteria (see Table 1). Any ambiguities at this stage (i.e. outputs selected as ‘maybe’ in Covidence) were checked by a second independent researcher and resolved through discussion (SLL). Five percent of the 11,764 outputs examined for relevance were therefore double-checked at this stage (SLL) (Brennan et al. 2014).

Following this initial assessment of relevant titles and abstracts, 77 outputs remained. The full texts of these outputs were retrieved and all authors assessed a roughly equal portion of outputs for rigour, after first participating in a second calibration exercise involving two full-text outputs, with consequent team-based discussions. Rigour was assessed to understand whether the methods used to generate data were credible and trustworthy (Abrams et al. 2018). All authors checked rigour using either the Critical Appraisal Skills Programme (CASP) qualitative checklist (for qualitative or mixed methods studies) (Critical Appraisal Skills Programme 2018), or the Medical Education Research Study Quality Instrument (MERSQI) (for quantitative studies) (Cook and Reed 2015; Reed et al. 2009). At the same time as assessing rigour, we also re-examined relevance based on the full-text outputs, a process which we termed ‘realist relevance’. This meant that the outputs were judged in terms of whether they could contribute to the development of our IPT (Wong et al. 2013; Abrams et al. 2018). Assessment of ‘realist relevance’ was based on a 0–3 scale where 0 = the article lacked richness to enable the identification of contexts (C), mechanisms (M), outcomes (O) or context–mechanism–outcome configurations (CMOCs) and could therefore not help in the development of our IPT. At the other end of the scale, 3 = the article was sufficiently rich to identify CMOCs and could help develop our program theory. Finally, each paper was given an overall judgement (include, exclude, borderline) for rigour and realist relevance combined. Any outputs assessed as borderline for rigour and relevance (approximately 57%) were checked by a second author and any disagreements resolved through discussion (SLL and EH) (Brennan et al. 2014). The final sample of included outputs was 29.
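The counts reported across the search and selection stages can be cross-checked with a short calculation. The sketch below uses only the figures stated in the text; the function and variable names are illustrative, and it assumes that no outputs were added between stages (for example from hand searching), which the text does not quantify.

```python
# Counts reported in the text (search and selection stages).
RETRIEVED = 15_676    # outputs from the first search, all databases
AFTER_DEDUP = 11_764  # outputs remaining once duplicates were removed
FULL_TEXT = 77        # outputs remaining after title/abstract screening
INCLUDED = 29         # final sample after rigour and realist relevance


def screening_funnel(retrieved, after_dedup, full_text, included):
    """Return how many outputs dropped out at each stage.

    Assumption: each stage is a strict subset of the previous one.
    """
    return {
        "duplicates_removed": retrieved - after_dedup,
        "excluded_on_title_abstract": after_dedup - full_text,
        "excluded_on_full_text": full_text - included,
    }


stages = screening_funnel(RETRIEVED, AFTER_DEDUP, FULL_TEXT, INCLUDED)

# "Five percent" is the reported proportion; the rounded count is an estimate.
approx_double_checked = round(0.05 * AFTER_DEDUP)

for stage, n in stages.items():
    print(f"{stage}: {n}")
print(f"approx. double-checked at title/abstract stage: {approx_double_checked}")
```

Under these assumptions, the differences between successive stages give the number of outputs removed at each step, which can be compared against the PRISMA diagram.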

Data extraction

All authors (except VE and BW) extracted data after a third calibration exercise involving analysts’ extraction of two full-text outputs, with subsequent team-based discussions. The data extraction of the 29 outputs included: study characteristics (e.g. publication year, study methodology); contexts (e.g. study setting, profession, level of supervisor experience, country); intervention characteristics (e.g. content, mode, pedagogical strategies, duration); types of participants (e.g. clinical teachers); mechanisms and outcomes (note that outcomes and/or mechanisms could be positive or negative and pertain to supervisors or supervisees); CMOCs; and MRT. Contexts (C), Mechanisms (M) and Outcomes (O) and CMOCs for each supervision training intervention were highlighted on the 29 outputs and notes added by the data extractors. These highlights and notes were then transferred to tables using Microsoft Word (Microsoft, Windows 10) collating C, M, and Os and CMOCs both within and across our final sample. Note that during this process we labelled outcomes and mechanisms underpinning those outcomes as either positive (+) or negative (-). Inspired by other realist syntheses (Abrams et al. 2018), in order to elicit this information we made interpretations of meaning (e.g. does the relevant text provide sufficient data that could be interpreted as operating as contexts, mechanisms and/or outcomes?). Seven outputs (24%) were double-checked by a second extractor (mostly EH) at this data extraction stage, with any discrepancies resolved through discussion (SLL).

Data synthesis

To synthesise the large amount of extracted data, we first divided our data into two categories based on the duration of interventions (either short or extended durations), given that intervention duration was flagged as an important intervention component and a mechanism underpinning positive outcomes in our IPT (Fig. 1). Note that short durations were defined as one-off interventions or interventions with multiple sessions but within a restricted time period (e.g. less than 1 week). Conversely, interventions with extended durations were defined as those conducted over many months (and sometimes years), with extended time periods between multiple sessions (e.g. monthly). Microsoft Word tables including CMOCs with supporting illustrative quotes from the outputs were used for this data synthesis stage. Led by CER, four authors (CER, SLL, EH and CP) examined the data in these tables to identify demi-regularities (i.e. recurrent CMOCs) (Lee et al. 2019) across the 29 outputs, with 139 original CMOCs identified across interventions with short (87 CMOCs) and extended durations (52 CMOCs). Inspired by other realist syntheses (Abrams et al. 2018), we asked questions like: is this CMOC found elsewhere in the same or other documents? How does this CMOC interplay with our IPT? How might this CMOC develop our program theory? Note that at this stage, CMOCs that were considered tangential to these demi-regularities or did not contribute to our MPT were removed from the final tables presented in this paper, leaving 74 final CMOCs (with 42 CMOCs for short-duration interventions, and 32 CMOCs for extended-duration interventions).
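The idea of a demi-regularity as a recurrent CMOC can be illustrated with a minimal sketch. This is not the procedure used in the synthesis, which relied on interpretive team discussion of Word tables; it simply assumes that each extracted CMOC has been coded as a mechanism and outcome pair, and flags pairs that recur across distinct outputs. The records, labels and the `min_outputs` threshold are all hypothetical.

```python
from collections import Counter

# Hypothetical coded CMOCs: (output identifier, mechanism, outcome).
# These records are invented for illustration, not taken from the review.
cmocs = [
    ("output_A", "mixed pedagogies", "improved knowledge"),
    ("output_B", "mixed pedagogies", "improved knowledge"),
    ("output_B", "insufficient protected time", "poor engagement"),
    ("output_C", "insufficient protected time", "poor engagement"),
    ("output_C", "positive facilitator style", "improved engagement"),
]


def demi_regularities(records, min_outputs=2):
    """Return (mechanism, outcome) pairs found in at least `min_outputs` outputs."""
    unique = set(records)  # count each output at most once per pair
    counts = Counter((mech, outcome) for _, mech, outcome in unique)
    return {pair: n for pair, n in counts.items() if n >= min_outputs}


recurrent = demi_regularities(cmocs)
for (mechanism, outcome), n in sorted(recurrent.items()):
    print(f"{mechanism} -> {outcome}: {n} outputs")

# Totals reported in the text for original and final CMOCs.
original_total = 87 + 52  # short + extended durations
final_total = 42 + 32     # short + extended durations
```

A frequency count of this kind could only ever be a first pass: the questions listed above (how a CMOC interplays with the IPT, and whether it develops the program theory) require judgement that a tally does not capture.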


Following the assessment of rigour and realist relevance, 29 outputs remained in the final synthesis based on 28 studies; one study being presented across two outputs (Sandau et al. 2011; Sandau and Halm 2011). The final sample of outputs consisted of eight qualitative, eleven quantitative and ten mixed methods studies. Studies were conducted in various countries including the USA (n = 10), Australia (n = 5), UK (n = 3), Jordan (n = 2), Sweden (n = 3), Canada (n = 2), Netherlands (n = 1), Taiwan (n = 1) and Pakistan (n = 1), with one paper conducted across multiple countries (Myrick et al. 2011). Study interventions included face-to-face only (n = 24), online only (n = 4) and blended approaches including face-to-face and online components (n = 1). Interventions were either of short (n = 19) or extended durations (n = 10). There was a vast array of disciplines involved in the final sample including nursing (n = 9), medicine (n = 2) and allied health professions (n = 14), with some outputs including multiple disciplines (n = 4) (e.g. Carlson and Bengtsson 2015). In keeping with our IPT, data extraction and synthesis are presented separately for short (Table 2) and extended-duration interventions (Table 3).

Table 2 Data extraction for short-duration interventions
Table 3 Data extraction for extended-duration interventions

Short-duration supervision training interventions

Short-duration supervision training interventions typically focused on learning outcomes relating to supervisory knowledge and skills (content), were delivered face-to-face (mode) and employed multiple approaches such as didactic (e.g. presentations, videos), active (e.g. group discussions, case studies, reflection activities) and experiential learning (e.g. role plays, feedback). Although MRTs underpinning short interventions were often absent or not specified in the outputs, a range of theories were identified, the most common of which were adult learning theories (Knowles 1972), experiential learning (Kolb 1984) and the novice-to-expert model (Benner 1982).

Ten demi-regularities pertinent to our developing program theory were identified from the wide-ranging CMOCs identified in the extraction phase, with eight demi-regularities highlighting interventions’ positive outcomes and two demi-regularities illustrating interventions’ negative outcomes (see Table 4). In terms of the positive outcomes, all but one of the identified demi-regularities related to supervisor outcomes:

Table 4 CMOCs for short-duration interventions
  • Healthcare supervisors [C] undergoing short-duration supervision training [I] experienced improved satisfaction with training [+ O] (Hook and Lawson-Porter 2003; McChesney and Euster 2000; Murphy 2014); improved supervisory confidence [+ O] (Carlson and Bengtsson 2015; Ford et al. 2013; Taylor et al. 2007); improved supervisory engagement [+ O] (Cox and Araoz 2009; McChesney and Euster 2000; Taylor et al. 2007) and improved supervisory knowledge and practices [+ O] (C Cox et al. 2017; Ford et al. 2013; Gillieatt et al. 2014; Henderson et al. 2006; Hook and Lawson-Porter 2003; Lee et al. 2017; Methot et al. 1996; Murphy 2014) through mixed pedagogical strategies including active and/or experiential learning [+ M].

  • Healthcare supervisors [C] undergoing short-duration supervision training [I] experienced improved supervisory practices [+ O] through improved knowledge, skills and/or attitudes [+ M] (Carlson and Bengtsson 2015; Taylor et al. 2007).

  • Healthcare supervisors [C] undergoing short-duration supervision training [I] experienced improved supervisory practices [+ O] through improved confidence and/or self-efficacy [+ M] (Carlson and Bengtsson 2015; Eckstrom et al. 2006).

  • Healthcare supervisors [C] undergoing short-duration supervision training [I] experienced improved supervisory satisfaction, knowledge, practices [+ O] through positive social relationships [+ M] (Gillieatt et al. 2014; Henderson et al. 2006; Hook and Lawson-Porter 2003; McChesney and Euster 2000).

The remaining demi-regularity relating to positive outcomes spoke to supervisee outcomes:

  • Healthcare supervisors [C] undergoing short-duration supervision training [I] helped improve supervisee development and well-being (e.g. retention) [+ O] through structured training [+ M] (Clipper and Cherry 2015; Sandau et al. 2011).

The demi-regularities that resulted in negative outcomes pertained only to supervisor-related outcomes:

  • Healthcare supervisors [C] undergoing short-duration supervision training [I] experienced no improvements in supervisory skills [− O] through lack of engagement in training or reinforcement of training [− M] (Busari et al. 2006; Eckstrom et al. 2006; Quirk et al. 1998).

  • Healthcare supervisors [C] undergoing short-duration supervision training [I] experienced poor supervisor engagement in training [− O] through insufficient protected time [− M] (Hook and Lawson-Porter 2003; Sandau and Halm 2011; Sayani et al. 2017).

Based on these demi-regularities we developed a modified program theory (MPT) for short-duration interventions (Fig. 3).

Fig. 3 Modified program theory: short-duration intervention

Extended-duration supervision training interventions

Extended-duration supervision training interventions also typically focused on participants developing their supervisory knowledge and skills (content), were delivered face-to-face (mode) and employed multiple approaches such as didactic (e.g. presentation, videos, reading), active (e.g. group discussions, reflective activities) and experiential learning (e.g. group supervision). Indeed, differences between short and extended-duration interventions (other than their longevity) were subtle, including: (1) some short-duration interventions being delivered online, and (2) more extended-duration interventions employing experiential pedagogical strategies. Although middle-range theories underpinning extended-duration interventions were sometimes absent or not specified in the outputs (e.g. ‘learning theory’, ‘psychodynamic theory’), various theories were identified. The most commonly identified were experiential learning (Kolb 1984), reflective practice (Schön 1987), and social learning theories (Proctor and Inskipp 2001; Shulman 1991, 1993, 2005).

We were able to identify fewer demi-regularities across our wide-ranging CMOCs for extended-duration interventions (Table 5). First, we found five demi-regularities consistent with those already identified above for short-duration interventions, but these were sometimes expressed in the reverse way (e.g. negative outcomes for extended but positive outcomes for short-duration interventions):

Table 5 CMOCs for extended-duration interventions
  • Healthcare supervisors [C] undergoing extended-duration supervision training [I] experienced improved supervisory knowledge and practices [+ O] through mixed pedagogical strategies emphasizing active and/or experiential learning [+ M] (Halabi et al. 2012; Myrick et al. 2011; Ögren et al. 2008; Rogers and McDonald 1992; Seo and Engelhard 2014).

  • Healthcare supervisors [C] undergoing extended-duration supervision training [I] experienced modest outcomes only [− O] through a lack of systematic training involving mixed pedagogical strategies [− M] (Milne and Westerman 2001; Rogers and McDonald 1992).

  • Healthcare supervisors [C] undergoing extended-duration supervision training [I] experienced improved supervisory knowledge and practices [+ O] through supervisor engagement [+ M] (Tebes et al. 2011).

  • Healthcare supervisors [C] undergoing extended-duration supervision training [I] experienced poor supervisor engagement in training [− O] through insufficient protected time [− M] (Ögren et al. 2008).

  • While healthcare supervisors [C] undergoing extended-duration supervision training [I] experienced improved supervisory satisfaction, knowledge, practices and/or attitudes [+ O] through positive social relationships [+ M] (Myrick et al. 2011; Ögren et al. 2008; Sundin et al. 2008), they experienced modest outcomes only [− O] through challenging social relationships [− M] (Milne and Westerman 2001; Ögren et al. 2008).

We identified only one additional demi-regularity for the extended-duration interventions, not prominent for short-duration interventions:

  • While healthcare supervisors [C] undergoing extended-duration supervision training [I] experienced improved supervisory engagement, knowledge and/or practices [+ O] through positive facilitator styles [+ M] (Myrick et al. 2011; Ögren et al. 2008; Sundin et al. 2008; Sevenhuysen et al. 2013), they experienced modest outcomes only [− O] through negative facilitator styles [− M] (Sundin et al. 2008).

Based on these demi-regularities we developed an MPT for extended-duration interventions (Fig. 4).

Fig. 4 Modified program theory: extended-duration intervention


This synthesis set out to address the research question: to what extent do supervision training interventions work (or not), for whom and in what circumstances, and why? Through our realist synthesis of 29 research outputs, we were able to develop two novel program theories, grounded in that evidence, about the positive and negative outcomes of short and extended-duration supervision training interventions, the mechanisms underpinning those outcomes and the extent to which those relationships were context-dependent, thereby developing existing knowledge on supervision training interventions.

Summary of key findings

The developed program theories demonstrate that both short and extended-duration supervision training interventions have a multiplicity of positive supervisor outcomes including improved satisfaction, knowledge, skills, and engagement through a combination of mechanisms including mixed pedagogical approaches involving active and/or experiential learning, plus privileging social relationships (e.g. teacher–learner, peer–peer). Furthermore, both modified program theories illustrate that short and extended-duration supervision training interventions can lead to poor supervisor engagement in training when insufficient protected time exists for supervisor learning. Additionally, while most of the literature reviewed originated from health professions rather than human services contexts, we did not find that variations in disciplinary or organisational contexts were especially relevant to our program theories for short or extended-duration interventions. However, when comparing the mechanisms underpinning short and extended-duration training interventions, we found that supervisor characteristics (i.e. confidence, knowledge, skills and attitudes) were key mechanisms triggering positive outcomes for short-duration interventions, whereas facilitator characteristics were key mechanisms triggering either positive or negative outcomes for extended-duration interventions.

In summary, our findings are novel in two key ways: (1) that short and extended-duration interventions have numerous positive outcomes through mixed pedagogical approaches, social learning, and protected time for supervisors; and (2) that interventions of different durations may work in slightly different ways, with the success of short interventions relying on supervisor characteristics, and extended-duration interventions instead relying on facilitator characteristics.

Comparison with existing literature

That mixed pedagogies involving active and/or experiential learning were important for the success of supervision training interventions is consistent with educational theories e.g. reflective practice (Schön 1987), experiential learning (Kolb 1984), plus our IPT based on three non-realist reviews of supervision training (Milne et al. 2011; Gonsalvez and Milne 2010; Tsutsumi 2011). Furthermore, that social relationships were also important for positive supervision training program outcomes in our modified program theories is also consistent with social learning theories (Shulman 1991, 1993, 2005; Proctor and Inskipp 2001). Finally, that negative outcomes occurred when supervisors were provided with insufficient protected time for learning, is consistent with literature emphasising the tensions between training and service delivery (Sholl et al. 2017). However, the findings of our realist synthesis not only extend our IPT but also add considerable new knowledge to the supervision training literature (Milne et al. 2011; Gonsalvez and Milne 2010; Tsutsumi 2011).

Firstly, our findings illustrate a wider range of outcomes (including negative outcomes) than has been previously identified in the supervision training literature including our IPT (Milne et al. 2011; Gonsalvez and Milne 2010; Tsutsumi 2011). Furthermore, aligned with our IPT based on previous non-realist reviews (Milne et al. 2011; Gonsalvez and Milne 2010; Tsutsumi 2011), we expected extended-duration interventions to have enhanced positive outcomes compared with short interventions, but we did not find this to be the case based on our realist synthesis of 29 outputs. It is worth noting that our synthesis included nineteen short and ten extended-duration studies from which to draw our conclusions, consistent with previous literature suggesting that short-duration supervision training interventions were more commonly delivered (Gonsalvez and Milne 2010). We did not identify additional positive outcomes for extended-duration interventions, plus we identified fewer demi-regularities across our identified CMOCs for extended-duration interventions. While this may reflect the fewer outputs reviewed in our study employing extended durations, our findings may reflect a genuine lack of added benefit from extending the duration of supervision training interventions. Indeed, healthcare workers may only require short interventions in order to realize key positive outcomes from training (as long as those short interventions include mixed pedagogies, social learning, protected time, and supervisor characteristics like confidence).

Secondly, our findings provide novel insights into the causal pathways through which both short and extended-duration supervision training interventions work (or not). Indeed, our realist lens has enabled us to identify the multiplicity of mechanisms triggered within supervision training interventions, leading to various positive supervisor outcomes. While interventions of any duration seemed to work through mixed pedagogies, social relationships and protected time (consistent with previous research and educational theories as described above), short interventions seemed to work through supervisor characteristics, whereas extended-duration interventions seemed to work (or not) based on facilitator characteristics. That learner characteristics seemed central in short interventions mirrors previous research flagging the importance of supervisors’ personal qualities and skills as key contributors to supervision effectiveness (Wearne et al. 2012; Gibson et al. 2018). It also mirrors the learning theories associated with the short-duration interventions, which were exclusively individualist and constructivist in nature, such as adult learning theories (Knowles 1990), experiential learning (Kolb 1984) and the novice-to-expert model (Benner 1982). That extended-duration interventions seemed dependent on facilitator characteristics probably relates to the increased importance of facilitator–supervisor relationships in enduring associations (sometimes several years long). This also mirrors our finding that extended-duration interventions were associated with middle-range social educational theories.

Methodological strengths and challenges

Our synthesis was strengthened through the use of a large multidisciplinary team and a rigorous process aligned with the RAMESES guidelines (Wong et al. 2013, 2017). However, we acknowledge several potential challenges concerning this synthesis. First, although we worked closely with a medical librarian and piloted our search terms, due to the voluminous nature of the supervision training literature, plus the extensive range of contexts included in our searches, we recognize that we inevitably omitted terms associated with supervision and/or training (e.g. coaching, facilitation etc.) (Lee et al. 2019). Therefore, we may not have identified all potentially key studies. Second, although our search strategy and inclusion criteria did include human services, this literature was either absent or excluded because of its poor quality and/or low realist relevance, meaning that our findings speak to health rather than human services contexts. Third, while we decided to include only peer-reviewed outputs due to the vast supervision training literature, we acknowledge that we may have excluded potentially important non-peer-reviewed grey literature, which may have been beneficial in the development of our program theories, and could have accounted for human services contexts. Fourth, none of the outputs included in our synthesis employed realist evaluation methods and as such, we struggled to tease out how context influenced the program theories. Fifth, like others who have identified a lack of evidence pertaining to the outcomes of supervision training on supervisees (e.g. students) and healthcare consumers (Gibson et al. 2018), the outcomes of studies included in our synthesis are somewhat limited to supervisor outcomes (and often based on self-report). Finally, the papers included in our realist synthesis often lacked explicit articulation of middle-range educational theories on which to base the development and refinement of our program theory. Furthermore, when theories were drawn on they were typically older individualist theories, rather than more sophisticated contemporary social educational theories.

Implications for further research

Our study findings and our methodological challenges have a number of implications for further research. Firstly, given that our realist synthesis focuses primarily on health contexts, further literature reviews are now needed to explore supervision training in human services, perhaps employing different types of review (e.g. narrative review) to describe the types and outcomes of supervision training interventions for human services workers. Secondly, given that our realist synthesis has presented somewhat contradictory and unexpected findings about intervention duration, further research is now needed to explore more fully the similarities and differences between short and extended-duration supervision training interventions in terms of how they work (or do not work), for whom and under what circumstances, drawing on more contemporary social educational theories. The next stage of our supervision training study will employ realist evaluation (Wong et al. 2012) to explore the outcomes of short (i.e. half-day workshops) and extended-duration (i.e. half-day workshops plus 3-month longitudinal audio diaries) supervision training interventions for health and human services workers, their underlying mechanisms and associated contextual nuances. Indeed, through employing realist evaluation we hope to better tease out how contextual variations influence mechanisms generating outcomes. Thirdly, similar to others reporting limitations in how the effectiveness of supervision training has traditionally been measured (Milne et al. 2011), further research is now needed that extends the ‘measurement’ of outcomes beyond supervisor outcomes to include outcomes for supervisees and healthcare consumers. Indeed, realist evaluation could help to flesh out the multiplicity of outcomes for supervisors, supervisees and healthcare consumers, as well as identifying the multiple causal pathways.

Implications for educational practice

Investment in supervision training has been proposed as having greater positive impact than resourcing supervision alone (Hill et al. 2014). In the quest to develop healthcare workers’ supervisory practices, we have found that supervisor training interventions of any duration can work to enhance supervisors’ confidence, knowledge, skills, and engagement through mixed pedagogical approaches involving active and/or experiential learning, privileging social relationships, and protected time. Supervision training that extends over longer periods of time showed no evidence of additional benefits in our realist synthesis. Our review therefore implies that only a modest investment may be required to produce significant outcomes for supervisory practices. These findings are important for resource-sensitive healthcare systems that fund the supervision training of healthcare workers. For short-duration interventions, supervisor characteristics become important mechanisms triggering positive outcomes, whereas facilitator characteristics become central mechanisms triggering outcomes for extended-duration interventions. Therefore, we encourage healthcare educators involved in the design and facilitation of supervision training interventions to pay close attention to the key mechanisms highlighted in this realist synthesis in order to maximise the positive outcomes of supervision training interventions for supervisors. Finally, from an organizational perspective, supervision training programs need to be situated within organizational workplace cultures that enable supervisors to apply their new-found supervisory knowledge and skills to supervisory practices. Ultimately, healthcare organisations need to operate as positive learning organisations in order to maximise supervisory outcomes from training programs.