Supervision training in healthcare: a realist synthesis

Supervision matters: it serves educational, supportive and management functions. Despite a plethora of evidence on the effectiveness of supervision, scant evidence for the impact of supervision training exists. While three previous literature reviews have begun to examine the effectiveness of supervision training, they fail to explore the extent to which supervision training works, for whom, and why. We adopted a realist approach to answer the question: to what extent do supervision training interventions work (or not), for whom and in what circumstances, and why? We conducted a team-based realist synthesis of the supervision training literature focusing on Pawson's five stages: (1) clarifying the scope; (2) determining the search strategy; (3) study selection; (4) data extraction; and (5) data synthesis. We extracted contexts (C), mechanisms (M) and outcomes (O) and CMO configurations from 29 outputs including short (n = 19) and extended-duration (n = 10) supervision training interventions. Irrespective of duration, interventions including mixed pedagogies involving active and/or experiential learning, social learning and protected time served as mechanisms triggering multiple positive supervisor outcomes. Short-duration interventions also led to positive outcomes through mechanisms such as supervisor characteristics, whereas facilitator characteristics were a key mechanism triggering positive and negative outcomes for extended-duration interventions. Disciplinary and organisational contexts were not especially influential. While our realist synthesis builds on previous non-realist literature reviews, our findings extend previous work considerably. Our realist synthesis presents a broader array of outcomes and mechanisms than have been previously identified, and provides novel insights into the causal pathways through which short and extended-duration supervision training interventions produce their effects.
Future realist evaluation should explore further any differences between short and extended-duration interventions. Educators are encouraged to prioritize mixed pedagogies, social learning and protected time to maximize the positive supervisor outcomes from training.


Introduction
Supervision matters in health and human services. While definitions of supervision vary across the literature (Martin et al. 2017), Proctor's popular model outlines three purposes of supervision: facilitating consistent and quality practice in supervisees (managerial function), helping the development of supervisees' knowledge, skills, attitudes and practices (educational function), and providing supervisees with support and validation (restorative function) (Proctor 1987; Brunero and Stein-Parbury 2008; Dilworth et al. 2013; Gonsalvez and Milne 2010). See Box 1 for example definitions of supervision. An effective supervisor skilfully provides feedback, teaches, fosters collaborative learning, understands the expectations of their supervisees and is organized (Gibson et al. 2018). While supervision training is thought to enhance supervision effectiveness (Fitzpatrick et al. 2012; Dilworth et al. 2013; Chu et al. 2016), supervisors in health and human services consistently lack such training (MacDonald 2002; Spence et al. 2001; Hoge et al. 2011; Butterworth et al. 2008). Despite a plethora of evidence on the quality and effectiveness of supervision over the last 20 years (Spence et al. 2001; Hill et al. 2014; Newton et al. 2016), scant evidence evaluating the impact of supervision training exists. Only three reviews in the literature focus on supervision training (Milne et al. 2011; Gonsalvez and Milne 2010; Tsutsumi 2011). While these reviews begin to offer insights into the effectiveness of supervision training, they largely focus on its positive outcomes without explicitly discussing the complexities of how training interventions influence the quality or effectiveness of supervision. To address this gap in the supervision training literature, we conducted a realist synthesis to explore the extent to which supervision training works (or does not work), for whom and under what circumstances, how and why.

Diversity and complexity of supervision training interventions
While many argue for the importance of training to enhance supervision effectiveness (Kilminster and Jolly 2000), supervisors often carry out their supervision roles without any specific training (Hoge et al. 2011). Supervisors who do receive supervision training can experience wide diversity in that training with respect to content, mode, pedagogical strategies and duration. Supervision training content often focuses on the development of supervisor knowledge (e.g. definitions, models, methods, responsibilities, legal/ethical aspects) (Kilminster and Jolly 2000; Spence et al. 2001; Newton et al. 2016) and/or skills (teaching, assessment, feedback, counselling, leadership, interpersonal) (Kilminster and Jolly 2000; Hill et al. 2014; McKellar and Graham 2017). Supervision training modes include face-to-face, online or blended approaches. Pedagogical strategies also vary, including didactic (e.g. presentations, videotaped demonstrations), active (e.g. small group discussions) and/or experiential learning (e.g. role-play, feedback) (Spence et al. 2001; Hoge et al. 2011; Pollock et al. 2017). The duration of supervision training ranges from one-off, short-term interventions (such as a 2-day workshop) to extended-duration interventions over many months that are punctuated by mini-interventions such as monthly supervision sessions (Spence et al. 2001).

Box 1 Example definitions of supervision
"Clinical supervision is a process of professional support and learning in which nurses are assisted in developing their practice through regular discussion time with experienced and knowledgeable colleagues…" (Brunero and Stein-Parbury 2008, p. 87)
"Supervision is any activity where more experienced health professionals provide less experienced health professionals with opportunities that enable these health professionals to achieve learning, to receive support, and to improve the quality and safety of their practice" (Fitzpatrick et al. 2012, p. 462)
"Supervision is a forum where supervisees review and reflect on their work in order to do better. Practitioners bring their actual work-practice to another person (individual supervision), or to a group (small group or team supervision), and with their help review what happened in their practice in order to learn from that experience" (Caroll 2007, p. 36)
"The formal provision, by approved supervisors, of a relationship-based education and training that is work-focused and which manages, supports, develops and evaluates the work of colleagues…" (Milne 2007, p. 439)
"The term clinical supervision is defined as a formal process of professional support and learning which enables individual practitioners to develop knowledge and competence, and is acknowledged to be a life-long process…" (Martin et al. 2014, p. 201)
Although competency frameworks for supervision have been developed to guide supervision training (Health Workforce Australia 2014), some scholars argue that a lack of specificity remains about what supervision training should cover and how it should be conducted (Reiser and Milne 2014; Alfonsson et al. 2018). Furthermore, supervision training interventions have been criticised for lacking theoretical and evidence-based foundations (Kilminster and Jolly 2000). We therefore designed this realist synthesis to address these criticisms and gaps in the current literature.

A realist approach to supervision training
Given the diversity and complexity of supervision training interventions, we used a realist synthesis to better understand how and why supervision training interventions produce their effects. A realist approach is theory-driven and so facilitates the development and modification of program theories accounting for how and why interventions work (or do not work), for whom and when (Wong et al. 2012). This approach is underpinned by scientific realism, which asserts that it is not interventions that create change; rather, it is people who create change (Pawson and Tilley 1997). Interventions are thought to lead to outcomes through the operation of mechanisms, that is, the resources proffered by an intervention and the ways in which these influence participants' reasoning (Dalkin et al. 2015). Furthermore, this complex relationship is understood to be context-dependent (Sholl et al. 2017; Ajjawi et al. 2018): the outcomes of any intervention can be affected by the range of conditions, often sociocultural, within a given setting (Jolly and Jolly 2014). The basic premise is that what works for one person might not work for another, and what works in one circumstance might fail to work in another. While the context-mechanism-outcome (CMO) relationship is not necessarily straightforward or linear, contextual aspects are thought to trigger particular mechanisms in response to an intervention, leading to particular outcomes (Jolly and Jolly 2014). See Box 2 for a glossary of key realist terms.

Developing an initial program theory from non-realist supervision training reviews
Three supervision training reviews, one narrative and two systematic, have so far been published (Milne et al. 2011; Gonsalvez and Milne 2010; Tsutsumi 2011). None of these employed realist approaches or drew on middle-range theory (MRT) specific to education, apart from Milne et al. (2011) mentioning Kolb's (1984) experiential learning. Nevertheless, we applied realist logic in our reading of these reviews to develop an initial program theory (IPT: Fig. 1), based on our identification of contexts, mechanisms, outcomes and context-mechanism-outcome configurations (CMOCs) for the supervision training interventions across the three papers. First, studies included in these three reviews had several different contexts, including health and human services (e.g. psychology, mental health, allied health) plus commercial contexts such as sales and insurance (Milne et al. 2011; Gonsalvez and Milne 2010; Tsutsumi 2011). Second, the supervision training interventions outlined within these three reviews were complex and diverse in terms of: (a) content, such as knowledge, skills and attitudes; (b) mode of delivery, including face-to-face and online learning; (c) pedagogical strategies, including theoretical and experiential learning; and (d) duration, including short (e.g. half-day) and extended (e.g. year-long) interventions (Milne et al. 2011; Gonsalvez and Milne 2010; Tsutsumi 2011). Third, a variety of (mostly) positive outcomes were identified within the three reviews, relating to supervisors (e.g. improved satisfaction, confidence, knowledge, skills) and supervisees (e.g. improved satisfaction and mental health). Fourth, we were also able to identify some mechanisms in the reviews that explain why interventions produced their effects, e.g. supervision training interventions having an appropriate balance between didactic and experiential learning methods, and extended durations enhancing engagement. Finally, we were able to identify two distinct CMOCs in two of the reviews: •

Box 2 Glossary of key realist terms
Contexts can be described as: "the conditions that an intervention operates in (often but not exclusively sociocultural)" (Taylor et al. 2007, p. 28). Context can refer to individuals participating in programs, stakeholder interrelationships, institutional arrangements in which programs sit and/or wider cultural, economic and/or societal settings for programs (Pawson 2018).
Mechanisms can be described as: "underlying entities, processes or structures which operate in particular contexts to generate outcomes of interest" (Astbury and Leeuw 2010, p. 368). Mechanisms are typically hidden, sensitive to contextual variations and generative of outcomes (Astbury and Leeuw 2010).
Outcomes can be described as the desired products of a program and/or the program's observed products (Yardley et al. 2015; Jolly and Jolly 2014).
Context-mechanism-outcome configurations (CMOCs) can be described as heuristics employed "by some realists during analysis to identify the causal links between context, mechanism and outcomes" (Marchal et al. 2018, p. 83).
Demi-regularities can be described as: "prominent recurrent patterns of contexts and outcomes… in the data" (Wong et al. 2013, p. 9).
Program Theory can be described as: "a plausible and sensible model of how a program is supposed to work" (Bickman 1987, p. 5). A program theory therefore is an explanatory account of how a program works, under what circumstances and for whom (Astbury and Leeuw 2010). Such a theory-driven approach should include both the development of, and testing and refinement of, program theory (Astbury and Leeuw 2010).
Middle-range theory (MRT) can be described as theory situated: "between the minor but necessary working hypothesis… and the all-inclusive systematic efforts to develop a unified theory that will explain all the observed uniformities of social behavior, social organization and social change" (Merton 1968, p. 83). MRT can be considered formal theory providing a bridge to existing knowledge about a topic (Marchal et al. 2018).

Study aim and research questions
Although we were able to use realist logic to identify contexts, mechanisms, outcomes and two CMOCs for supervision training interventions from previous reviews, and to develop an IPT, this realist synthesis aimed to extend those reviews by developing a modified program theory (MPT). It reviewed the published supervision training literature within health and human services to answer the novel research question: To what extent do supervision training interventions work (or not), for whom and in what circumstances, and why?

Methods
The review protocol, registered on PROSPERO (CRD42018094186) and published (Lee et al. 2019), was underpinned by Pawson's five stages of realist review: (1) clarifying scope; (2) searching for evidence; (3) study selection; (4) data extraction; and (5) data synthesis (Pawson et al. 2005). Although presented in a linear fashion, the stages were conducted iteratively, with some overlap. The review methods and findings follow the RAMESES publication standards for realist syntheses (Wong et al. 2013).

Clarifying the scope
A matrix identifying existing primary literature/empirical studies, literature reviews, search terms and their synonyms was created, generating numerous search terms. With the help of a medical librarian (see acknowledgements), pilot searches were conducted using several databases to test search terms (those identified as keywords in other published supervision training outputs, plus synonyms familiar to our multidisciplinary team), Boolean operators and proximity searching. Note that our original scope for this realist synthesis was broad, including health (e.g. medicine, nursing, allied health) and human services (e.g. housing, disability, children's services, youth and family services, alcohol and drug services, out-of-home care), consistent with our funding (Victorian Department of Health and Human Services). Our scope was also broad in terms of interventions (e.g. workshops, online education, lectures), professions (e.g. nursing, physiotherapy, pharmacology, mental health), contexts (e.g. hospitals, universities, training centers, community services) and levels of learner (e.g. undergraduate students, postgraduate trainees, peers and colleagues). Citations and reference lists of included studies were hand-searched to identify additional relevant studies. The first search elicited 15,676 outputs across all databases. Once duplicate results were removed, 11,764 outputs remained. All outputs were exported to Covidence software (© Covidence 2019) for management. The searching and selection process is summarised in the PRISMA diagram (see Fig. 2). Inclusion and exclusion criteria are shown in Table 1. Given the number of outputs identified, non-peer-reviewed outputs were excluded.
Box 3 Search strategy example of CINAHL search
(supervisor* OR mentors OR mentor OR mentoring OR instructor* OR "placement educator*" OR "practice educator*" OR trainer* OR preceptor OR preceptors OR "clinical teacher*" OR "clinical educator*" OR "fieldwork educator*") N2 (training* OR education OR educating OR workshop*)
Supervision N1 (training OR education OR educating OR workshop*)
"train the trainer*"
("professional development" OR "faculty development" OR "personal development" OR CPD) N2 (supervisor* OR mentors OR mentor OR mentoring OR instructor* OR "placement educator*" OR "practice educator*" OR trainer* OR preceptor OR preceptors OR "clinical teacher*" OR "clinical educator*" OR "fieldwork educator*")
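The Box 3 strings follow a regular pattern: synonyms are OR-joined inside parentheses, multi-word phrases are quoted, and two groups are linked by a proximity operator Nn (in EBSCOhost databases such as CINAHL, this matches terms within n words of each other in either order). The Python sketch below rebuilds the Box 3 strings from synonym lists to make that structure explicit. It is illustrative only: the helper names `or_group` and `near` are hypothetical, and the code builds text rather than querying any database.

```python
# Hypothetical helpers that compose CINAHL-style Boolean strings from
# synonym lists; they only build text and do not query any database.

SUPERVISOR_TERMS = [
    "supervisor*", "mentors", "mentor", "mentoring", "instructor*",
    "placement educator*", "practice educator*", "trainer*", "preceptor",
    "preceptors", "clinical teacher*", "clinical educator*",
    "fieldwork educator*",
]
TRAINING_TERMS = ["training*", "education", "educating", "workshop*"]
DEVELOPMENT_TERMS = [
    "professional development", "faculty development",
    "personal development", "CPD",
]


def or_group(terms: list[str]) -> str:
    """OR-join synonyms in parentheses, quoting multi-word phrases."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def near(left: str, right: str, n: int) -> str:
    """Link two expressions with the proximity operator Nn (any word order)."""
    return f"{left} N{n} {right}"


queries = [
    near(or_group(SUPERVISOR_TERMS), or_group(TRAINING_TERMS), 2),
    # The second Box 3 string uses 'training' without a wildcard.
    near("Supervision", or_group(["training", "education", "educating", "workshop*"]), 1),
    '"train the trainer*"',
    near(or_group(DEVELOPMENT_TERMS), or_group(SUPERVISOR_TERMS), 2),
]

for q in queries:
    print(q)
```

Keeping each synonym list in one place means a term added during piloting propagates to every string that uses it, which is one way to keep related search lines consistent.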

Study selection and appraisal
All authors (except EH) conducted initial assessments of outputs' relevance using Covidence. Each analyst first participated in a calibration exercise of ten titles and abstracts using the inclusion criteria, with subsequent team-based discussions, before analyzing their own set of titles and abstracts. Each author (except EH) then screened a roughly equal portion of the titles and abstracts of studies retrieved using the search strategy (and those retrieved from hand-searched references) against the inclusion criteria (see Table 1). Any ambiguities at this stage (i.e. outputs marked as 'maybe' in Covidence) were checked by a second independent researcher (SLL) and resolved through discussion. Consequently, 5% of the 11,764 outputs examined for relevance were double-checked at this stage (SLL) (Brennan et al. 2014).
Following this initial assessment of titles and abstracts, 77 outputs remained. The full text of these outputs was retrieved, and all authors assessed a roughly equal portion of outputs for rigour after first participating in a second calibration exercise involving two full-text outputs, with subsequent team-based discussions. Rigour was assessed to determine whether the methods used to generate data were credible and trustworthy (Abrams et al. 2018). All authors checked rigour using either the Critical Appraisal Skills Programme (CASP) qualitative checklist (for qualitative or mixed methods studies) (Critical Appraisal Skills Programme 2018) or the Medical Education Research Study Quality Instrument (MERSQI) (for quantitative studies) (Cook and Reed 2015; Reed et al. 2009). At the same time as assessing rigour, we also re-examined relevance based on the full-text outputs, a process we termed 'realist relevance': outputs were judged on whether they could contribute to the development of our IPT (Wong et al. 2013; Abrams et al. 2018). 'Realist relevance' was rated on a 0-3 scale, where 0 indicated that the article lacked the richness needed to identify contexts (C), mechanisms (M), outcomes (O) or context-mechanism-outcome configurations (CMOCs) and therefore could not help develop our IPT, and 3 indicated that the article was sufficiently rich to identify CMOCs and could help develop our program theory. Finally, each paper was given an overall judgement (include, exclude, borderline) for rigour and realist relevance combined. Any outputs assessed as borderline (approximately 57%) were checked by a second author, and any disagreements were resolved through discussion (SLL and EH) (Brennan et al. 2014). The final sample of included outputs was 29.

Data extraction
All authors (except VE and BW) extracted data after a third calibration exercise in which analysts extracted data from two full-text outputs, with subsequent team-based discussions. The data extracted from the 29 outputs included: study characteristics (e.g. publication year, study methodology); contexts (e.g. study setting, profession, level of supervisor experience, country); intervention characteristics (e.g. content, mode, pedagogical strategies, duration); types of participants (e.g. clinical teachers); mechanisms and outcomes (note that outcomes and/or mechanisms could be positive or negative and pertain to supervisors or supervisees); CMOCs; and MRT. Contexts (C), mechanisms (M), outcomes (O) and CMOCs for each supervision training intervention were highlighted on the 29 outputs, and notes were added by the data extractors. These highlights and notes were then transferred to tables in Microsoft Word (Microsoft, Windows 10), collating Cs, Ms, Os and CMOCs both within and across our final sample. During this process we labelled outcomes, and the mechanisms underpinning them, as either positive (+) or negative (-). Inspired by other realist syntheses (Abrams et al. 2018), we elicited this information by making interpretations of meaning (e.g. does the relevant text provide sufficient data to be interpreted as operating as contexts, mechanisms and/or outcomes?). Seven outputs (24%) were double-checked by a second extractor (mostly EH) at this stage, with any discrepancies resolved through discussion (SLL).

Data synthesis
To synthesise the large amount of extracted data, we first divided our data into two categories based on the duration of interventions (short or extended), given that intervention duration was flagged as an important intervention component and a mechanism underpinning positive outcomes in our IPT (Fig. 1). Short durations were defined as one-off interventions or interventions with multiple sessions within a restricted time period (e.g. less than 1 week). Conversely, extended durations were defined as interventions conducted over many months (and sometimes years), with extended time periods between multiple sessions (e.g. monthly). Microsoft Word tables including CMOCs with supporting illustrative quotes from the outputs were used for this data synthesis stage. Led by CER, four authors (CER, SLL, EH and CP) examined the data in these tables to identify demi-regularities (i.e. recurrent CMOCs) (Lee et al. 2019) across the 29 outputs, drawing on 139 original CMOCs identified across interventions with short (87 CMOCs) and extended durations (52 CMOCs). Inspired by other realist syntheses (Abrams et al. 2018), we asked questions such as: Is this CMOC found elsewhere in the same or other documents? How does this CMOC interplay with our IPT? How might this CMOC develop our program theory? At this stage, CMOCs that were tangential to these demi-regularities or did not contribute to our MPT were removed from the final tables presented in this paper, leaving 74 final CMOCs (42 for short-duration and 32 for extended-duration interventions).
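The two-way split and the CMOC counts described above can be expressed as a short Python sketch. The review gives only examples for the boundaries (less than 1 week for short durations; many months, with sessions e.g. monthly, for extended durations), so the numeric thresholds below are illustrative assumptions, and spans falling between them are deliberately left unclassified rather than forced into a category. The `Intervention` type and its fields are hypothetical; the CMOC counts are those reported in the text.

```python
from dataclasses import dataclass

# Illustrative thresholds (assumptions): the review gives "less than 1 week"
# as an example for short durations and "many months" for extended ones.
SHORT_MAX_DAYS = 7
EXTENDED_MIN_DAYS = 60  # hypothetical stand-in for "many months"


@dataclass
class Intervention:
    label: str      # hypothetical identifier, not taken from the review
    span_days: int  # days from first to last training session


def duration_category(iv: Intervention) -> str:
    """Assign an intervention to the short or extended synthesis stream."""
    if iv.span_days < SHORT_MAX_DAYS:
        return "short"
    if iv.span_days >= EXTENDED_MIN_DAYS:
        return "extended"
    return "unclassified"  # the source defines no rule for this range


# CMOC counts reported in the text, by duration category.
original_cmocs = {"short": 87, "extended": 52}
final_cmocs = {"short": 42, "extended": 32}

print(sum(original_cmocs.values()))  # 139
print(sum(final_cmocs.values()))     # 74
for cat in original_cmocs:
    share = final_cmocs[cat] / original_cmocs[cat]
    print(f"{cat}: {share:.0%} of original CMOCs kept in the final tables")
```

Returning "unclassified" instead of defaulting to one stream makes any intervention that falls outside both definitions visible, so it can be discussed by the team rather than silently assigned.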

Results
Following the assessment of rigour and realist relevance, 29 outputs remained in the final synthesis, based on 28 studies; one study was presented across two outputs (Sandau and Halm 2011). The final sample of outputs consisted of eight qualitative, eleven quantitative and ten mixed methods studies. Studies were conducted in various countries including the USA (n = 10), Australia (n = 5), UK (n = 3), Jordan (n = 2), Sweden (n = 3), Canada (n = 2), Netherlands (n = 1), Taiwan (n = 1) and Pakistan (n = 1), with one paper conducted across multiple countries (Myrick et al. 2011). Study interventions included face-to-face only (n = 24), online only (n = 4) and blended approaches with face-to-face and online components (n = 1). Interventions were either of short (n = 19) or extended duration (n = 10). A wide range of disciplines was represented in the final sample, including nursing (n = 9), medicine (n = 2) and allied health professions (n = 14), with some outputs including multiple disciplines (n = 4) (e.g. Carlson and Bengtsson 2015). In keeping with our IPT, data extraction and synthesis are presented separately for short (Table 2) and extended-duration interventions (Table 3).
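Each descriptive breakdown above should account for all 29 outputs. The short Python check below tallies the reported counts. It assumes that the one multi-country paper (Myrick et al. 2011) is counted separately from the single-country figures, and that the outputs including multiple disciplines are counted separately from nursing, medicine and allied health; under those assumptions every breakdown sums to 29.

```python
# Counts as reported in the Results; categories within each breakdown are
# assumed to be mutually exclusive.
N_OUTPUTS = 29

breakdowns = {
    "study design": {"qualitative": 8, "quantitative": 11, "mixed methods": 10},
    "country": {
        "USA": 10, "Australia": 5, "UK": 3, "Jordan": 2, "Sweden": 3,
        "Canada": 2, "Netherlands": 1, "Taiwan": 1, "Pakistan": 1,
        "multiple countries": 1,
    },
    "mode": {"face-to-face": 24, "online": 4, "blended": 1},
    "duration": {"short": 19, "extended": 10},
    "discipline": {
        "nursing": 9, "medicine": 2, "allied health": 14, "multiple": 4,
    },
}

totals = {name: sum(counts.values()) for name, counts in breakdowns.items()}
for name, total in totals.items():
    status = "ok" if total == N_OUTPUTS else "mismatch"
    print(f"{name}: {total} ({status})")
```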

Short-duration supervision training interventions
Short-duration supervision training interventions typically focused on learning outcomes relating to supervisory knowledge and skills (content), were delivered face-to-face (mode) and employed multiple approaches such as didactic (e.g. presentations, videos), active (e.g. group discussions, case studies, reflection activities) and experiential learning (e.g. role plays, feedback). Although MRTs underpinning short interventions were often absent or not specified in the outputs, a range of theories was identified, the most common of which were adult learning theories (Knowles 1972), experiential learning (Kolb 1984) and the novice-to-expert model (Benner 1982). From the wide-ranging CMOCs identified in the extraction phase, we identified ten demi-regularities pertinent to our developing program theory: eight highlighting interventions' positive outcomes and two illustrating negative outcomes (see Table 4). In terms of the positive outcomes, all but one of the identified demi-regularities related to supervisor outcomes:

Based on these demi-regularities we developed a modified program theory (MPT) for short-duration interventions (Fig. 3).

"The challenges associated with the simultaneous application of the three functions were explored through practice exercises using scenarios and through feedback from peers and trainers" (p. 4)

Health professionals [C] participating in a 1-day training program [I] reported that their supervisory knowledge and skills had improved [+ O] through increased confidence [+ M]
"… reported that their skills had definitely changed (41%) or mostly changed (42%) post-training… feeling empowered, confident and enthusiastic; being more comfortable in the role of supervisor and having increased knowledge and skills." (pp. 5-6) "The variability in behavior levels for most subjects is apparent and should be qualified in terms of the amount and types of other duties involved in managing, supervising and delivering direct care… Because of these additional duties… one would not expect to find stable data on delivery of contingent consequences across observations." (p. 21)

Extended-duration supervision training interventions
Extended-duration supervision training interventions also typically focused on participants developing their supervisory knowledge and skills (content), were delivered face-to-face (mode) and employed multiple approaches such as didactic (e.g. presentations, videos, reading), active (e.g. group discussions, reflective activities) and experiential learning (e.g. group supervision). Indeed, differences between short and extended-duration interventions (other than their length) were subtle, including: (1) some short-duration interventions being delivered online, and (2) more extended-duration interventions employing experiential pedagogical strategies. Although middle-range theories underpinning extended-duration interventions were sometimes absent or not specified in the outputs (e.g. 'learning theory', 'psychodynamic theory'), various theories were identified. The most commonly identified were experiential learning (Kolb 1984), reflective practice (Schön 1987) and social learning theories (Proctor and Inskipp 2001; Shulman 1991, 1993, 2005). We identified fewer demi-regularities across our wide-ranging CMOCs for extended-duration interventions (Table 5). First, we found five demi-regularities consistent with those already identified above for short-duration interventions, although these were sometimes expressed in reverse (e.g. negative outcomes for extended-duration but positive outcomes for short-duration interventions):

Based on these demi-regularities we developed an MPT for extended-duration interventions (Fig. 4).

Discussion
This synthesis set out to address the research questions: to what extent do supervision training interventions work (or not), for whom and in what circumstances, and why? Through our realist synthesis of 29 research outputs, we were able to develop two novel program theories, grounded in that evidence, about the positive and negative outcomes of short and extended-duration supervision training interventions, the mechanisms underpinning those outcomes and the extent to which those relationships were context-dependent, thereby developing existing knowledge on supervision training interventions.

Summary of key findings
The developed program theories demonstrate that both short and extended-duration supervision training interventions have a multiplicity of positive supervisor outcomes, including improved satisfaction, knowledge, skills and engagement, through a combination of mechanisms including mixed pedagogical approaches involving active and/or experiential learning, plus privileging social relationships (e.g. teacher-learner, peer-peer). Furthermore, both modified program theories illustrate that short and extended-duration supervision training interventions can lead to poor supervisor engagement in training when insufficient protected time exists for supervisor learning. Additionally, while most of the literature reviewed originated from health professions rather than human services contexts, we did not find variations in disciplinary or organisational contexts to be especially relevant to our program theories for short or extended-duration interventions. However, when comparing the mechanisms underpinning short and extended-duration training interventions, we found that supervisor characteristics (i.e. confidence, knowledge, skills and attitudes) were key mechanisms triggering positive outcomes for short-duration interventions, whereas facilitator characteristics were key mechanisms triggering either positive or negative outcomes for extended-duration interventions.

"… supervisors believed that it was important that they assumed a humble attitude when they did not understand something that was expressed during the supervision. It was considered essential to wait for the supervisees and to let each of them find their own pace of understanding what was happening in the interplay." (p. 12)
"The supervisees generally experienced that the program supervisors had actively sought to create space for the supervisees to reflect and ponder. Supervisees were given the opportunity to find their own paths to solutions" (p. 13)
"Something that contributed to security and quality in supervision was that the supervisor was direct and expressed him or herself clearly without being offensive" (p. 14)
"… this level of engagement was achieved by responding to the continual critical review of stakeholder feedback and adjusting the content of the workshops, and the model itself, based on the feedback. It was also achieved by allowing "space" for participants to raise concerns and discuss potential solutions for these concerns" (p. 42)
"… increases in supervisor competencies are associated with increased supervisor satisfaction… managing supervisory relationships and managing job performance significantly predict increases in supervisor satisfaction" (p. 195)
In summary, our findings are novel in two key ways: (1) short and extended-duration interventions have numerous positive outcomes through mixed pedagogical approaches, social learning and protected time for supervisors; and (2) interventions of different durations may work in slightly different ways, with the success of short interventions relying on supervisor characteristics and that of extended-duration interventions relying instead on facilitator characteristics.

Comparison with existing literature
That mixed pedagogies involving active and/or experiential learning were important for the success of supervision training interventions is consistent with educational theories, e.g. reflective practice (Schön 1987) and experiential learning (Kolb 1984), and with our IPT based on three non-realist reviews of supervision training (Milne et al. 2011; Gonsalvez and Milne 2010; Tsutsumi 2011). Furthermore, that social relationships were also important for positive supervision training program outcomes in our modified program theories is consistent with social learning theories (Shulman 1991, 1993, 2005; Proctor and Inskipp 2001). Finally, that negative outcomes occurred when supervisors were provided with insufficient protected time for learning is consistent with literature emphasising the tensions between training and service delivery (Sholl et al. 2017). However, the findings of our realist synthesis not only extend our IPT but also add considerable new knowledge to the supervision training literature (Milne et al. 2011; Gonsalvez and Milne 2010; Tsutsumi 2011). Firstly, our findings illustrate a wider range of outcomes (including negative outcomes) than has previously been identified in the supervision training literature, including our IPT (Milne et al. 2011; Gonsalvez and Milne 2010; Tsutsumi 2011). Furthermore, in line with our IPT based on previous non-realist reviews (Milne et al. 2011; Gonsalvez and Milne 2010; Tsutsumi 2011), we expected extended-duration interventions to have enhanced positive outcomes compared with short interventions, but our realist synthesis of 29 outputs did not find this to be the case.
Fig. 4 Modified program theory: extended-duration intervention
It is worth noting that our synthesis drew its conclusions from nineteen short and ten extended-duration studies, consistent with previous literature suggesting that short-duration supervision training interventions are more commonly delivered (Gonsalvez and Milne 2010). We did not identify additional positive outcomes for extended-duration interventions, and we identified fewer demi-regularities across our CMOCs for extended-duration interventions. While this may reflect the smaller number of reviewed outputs that employed extended durations, our findings may also reflect a genuine lack of added benefit from extending the duration of supervision training interventions. Indeed, healthcare workers may only require short interventions in order to realise key positive outcomes from training, as long as those short interventions include mixed pedagogies, social learning, protected time, and supervisor characteristics such as confidence.
Secondly, our findings provide novel insights into the causal pathways through which both short and extended-duration supervision training interventions work (or not). Indeed, our realist lens has enabled us to identify the multiplicity of mechanisms triggered within supervision training interventions, leading to various positive supervisor outcomes. While interventions of any duration seemed to work through mixed pedagogies, social relationships and protected time (consistent with previous research and educational theories as described above), short interventions seemed to work through supervisor characteristics, whereas extended-duration interventions seemed to work (or not) based on facilitator characteristics. That learner (i.e. supervisor) characteristics seemed central to short interventions mirrors previous research flagging supervisors' personal qualities and skills as key contributors to supervision effectiveness (Wearne et al. 2012; Gibson et al. 2018). It also mirrors the learning theories associated with the short-duration interventions, which were exclusively individualist and constructivist in nature, such as adult learning theories (Knowles 1990), experiential learning (Kolb 1984) and the novice-to-expert model (Benner 1982). That extended-duration interventions seemed dependent on facilitator characteristics probably relates to the increased importance of facilitator-supervisor relationships over enduring associations (sometimes several years long). This also mirrors our finding that extended-duration interventions were associated with middle-range social educational theories.

Methodological strengths and challenges
Our synthesis was strengthened through the use of a large multidisciplinary team and a rigorous process aligned with the RAMESES guidelines (Wong et al. 2013, 2017). However, we acknowledge several potential challenges concerning this synthesis. First, although we worked closely with a medical librarian and piloted our search terms, the voluminous supervision training literature and the extensive range of contexts included in our searches mean that we inevitably omitted some terms associated with supervision and/or training (e.g. coaching, facilitation) (Lee et al. 2019). Therefore, we may not have identified all potentially key studies. Second, although our search strategy and inclusion criteria did include human services, this literature was either absent or excluded because of its poor quality and/or low realist relevance, meaning that our findings speak to health rather than human services contexts. Third, while we decided to include only peer-reviewed outputs because of the vast supervision training literature, we acknowledge that we may have excluded potentially important non-peer-reviewed grey literature, which may have been beneficial in the development of our program theories and could have accounted for human services contexts. Fourth, none of the outputs included in our synthesis employed realist evaluation methods; as such, we struggled to tease out how context influenced the program theories. Fifth, like others who have identified a lack of evidence pertaining to the outcomes of supervision training on supervisees (e.g. students) and healthcare consumers (Gibson et al. 2018), we found that the outcomes of studies included in our synthesis were largely limited to supervisor outcomes (and often based on self-report). Finally, the papers included in our realist synthesis often lacked explicit articulation of the middle-range educational theories on which to base the development and refinement of our program theory. Furthermore, when theories were drawn on, they were typically older individualist theories rather than more sophisticated contemporary social educational theories.

Implications for further research
Our study findings and our methodological challenges have a number of implications for further research. Firstly, given that our realist synthesis focuses primarily on health contexts, further literature reviews are now needed to explore supervision training in human services, perhaps employing different types of review (e.g. narrative review) to describe the types and outcomes of supervision training interventions for human services workers. Secondly, given that our realist synthesis has presented somewhat contradictory and unexpected findings about intervention duration, further research is now needed to explore more fully the similarities and differences between short and extended-duration supervision training interventions in terms of how they work (or do not work), for whom and under what circumstances, drawing on more contemporary social educational theories. The next stage of our supervision training study will employ realist evaluation (Wong et al. 2012) to explore the outcomes of short (i.e. half-day workshops) and extended-duration (i.e. half-day workshops plus 3-month longitudinal audio diaries) supervision training interventions for health and human services workers, their underlying mechanisms and associated contextual nuances. Indeed, through employing realist evaluation we hope to better tease out how contextual variations influence the mechanisms generating outcomes. Thirdly, similar to others reporting limitations in how the effectiveness of supervision training has traditionally been measured (Milne et al. 2011), further research is now needed that extends the 'measurement' of outcomes beyond supervisor outcomes to include outcomes for supervisees and healthcare consumers. Indeed, realist evaluation could help to flesh out the multiplicity of outcomes for supervisors, supervisees and healthcare consumers, as well as identify the multiple causal pathways.

Implications for educational practice
Investment in supervision training has been proposed as having greater positive impact than resourcing supervision alone (Hill et al. 2014). In the quest to develop healthcare workers' supervisory practices, we have found that supervision training interventions of any duration can work to enhance supervisors' confidence, knowledge, skills, and engagement through mixed pedagogical approaches involving active and/or experiential learning, privileging social relationships, and protected time. Supervision training that extends over longer periods of time showed no evidence of additional benefits in our realist synthesis. Our review therefore implies that only a modest investment may be required to produce significant outcomes for supervisory practices. These findings are important for resource-sensitive healthcare systems that fund the supervision training of healthcare workers. When short-duration interventions are offered, supervisor characteristics become important mechanisms triggering positive outcomes, whereas facilitator characteristics become central mechanisms triggering outcomes for extended-duration interventions. Therefore, we encourage healthcare educators involved in the design and facilitation of supervision training interventions to pay close attention to the key mechanisms highlighted in this realist synthesis in order to maximise the positive outcomes of supervision training interventions for supervisors. Finally, from an organisational perspective, supervision training programs need to be situated within workplace cultures that enable supervisors to apply their new-found supervisory knowledge and skills to supervisory practices. Ultimately, healthcare organisations need to operate as positive learning organisations in order to maximise supervisory outcomes from training programs.