Introduction

Faculty Development Programs (FDPs) in Health Professions Education (HPE) encompass an array of programs and activities that are designed to enhance the expertise of educators in various domains including, but not limited to, teaching, assessment, educational research, curriculum design, mentorship, leadership, and accreditation [1, 2].

Steinert et al. [3] found that, for an FDP to be effective, it should be based on experiential learning, effective feedback, peer-reviewed concepts, collaborative learning, useful interventions, successful models, and diverse educational strategies.

Moreover, an FDP in HPE is a well-recognized tool to promote Continuous Professional Development (CPD). CPD is a wider paradigm, encompassing all the core elements of HPE, including knowledge, professionalism, and skills such as medical, social, personal, leadership, and managerial skills [4].

A necessary part of implementing FDPs is regular evaluation. In the literature, the effectiveness of most FDPs is evaluated with quantitative questionnaires and self-reporting tools [5]. Other evaluation techniques include hierarchical models such as Kirkpatrick’s and various qualitative methodologies such as interviews [6, 7]. Several studies report on the effectiveness of individual FDP components, but the literature on comprehensive evaluation of whole FDPs is scarce [8].

The World Federation for Medical Education recommends a set of global standards to monitor the design, development, implementation, and evaluation of CPD [4]. These standards comprise nine areas, namely “Mission & outcomes, Educational Program, Assessment & Documentation, Individual Doctor, CPD Provision, Educational Resources, Evaluation, Organization and Continuous Renewal”, which are further divided into 32 sub-areas [4]. All the identified components contain intricate elements with dynamic links of communication between them. These standards not only enable the identification of the strengths and weaknesses of an FDP but also foster quality enhancement.

However, the World Federation for Medical Education advises that a regulatory body in each country or institution examine the applicable standards and build a fitting version that suits the local context. Moreover, standards for CPD programs essentially focus on the processes and procedures of training rather than on the core of the training itself. FDPs based on such robust models are considered a solid prerequisite for providing effective training to health professionals, including doctors and nurses [9].

FDPs need to be geared toward improving the whole institutional atmosphere, including student and faculty skills, growth, organizational development, leadership, and change-management capacity [10]. To accomplish all this, a linear approach may fall short, as it relies on a rigid model with fixed initiation and termination dates and very limited room for iteration. Similarly, a single method of evaluation is insufficient to judge all aspects of a multi-faceted program such as an FDP [10]. Therefore, there is a pressing need for outcome measures and well-designed studies that rigorously evaluate FDPs, justifying the time and resources requested by departments and institutions.

Several models have been put forth for faculty development (FD). O’Sullivan et al. [11] highlighted four fundamental components of an FDP, namely the facilitators, participants, context, and program, along with their associated practices, while Dittmar and McCracken [12] proposed the META model (Mentoring, Engagement, Technology, and Assessment), which centres on personalized mentoring, constant engagement, the integration of technologies, and systematic assessment. This was complemented by regular objective evaluations by all the stakeholders involved in the educational process, including self, students, and peers [12]. Furthermore, Lancaster (2014) identified “centres, committees, and communities” as the three core areas of his FD evaluation model [13].

Most of these programs were designed and structured around specific criteria and objectives, primarily geared towards strengthening teaching skills, leadership, and learner satisfaction [7]. Nevertheless, longitudinal FDPs have been recommended by many authors for their long-lasting benefits in terms of institutional accreditation and better patient care [14,15,16,17,18,19].

In 2020, Ahmed SA et al. took note of this trend of linear FDP approaches and, in response to the need for a more inclusive model, devised a model based on the “Backward Planning Approach”. This model reinforces the view that FD should be considered a series of cyclical processes rather than a single endpoint with no future revisiting or evaluation of the implemented changes [20].

By “cyclical” we mean a continuous methodology that assesses the program at different points of its progression and then revisits those areas to reinforce and re-evaluate issues, in the form of a “circle”. This differs from traditional linear models of evaluation such as the Kirkpatrick model, which addresses FDP evaluation in a linear, ascending fashion through levels of evaluation. In contrast, the “5x2 D Model” consists of five dynamic steps, “Decide, Define, Design, Direct, Dissect”, which are flexible and interchangeable as parts of a cycle [20]. What sets this model apart from the others reported in the literature is its flexibility and adaptability.

The 5X2 D-model envisions the FDP as an ongoing process of continual renewal and refreshment of skills, performance indicators, and competencies. It comprises flexible domains that are revisited continuously; this reiteration, together with their interchangeability, makes the cycle a dynamic model for FDPs [20].

With the development of the ‘5x2 D Model’, it became necessary to create an evaluation tool suited to FDPs that utilize this model. Such a tool offers the additional benefit of being objective and inclusive of all the domains of the FDP as a whole rather than of its individual aspects.

Evaluation of such a holistic, longitudinal FDP model needs to be rooted in rigorous methodology and must ensure achievement of internationally recognized quality standards. Therefore, the purpose of our study is to develop and face-validate an evaluation guide that health professions schools can use to assess the progress of longitudinal FDPs based on the “5X2 D-model.”

Methodology

The authors followed a deductive qualitative grounded theory approach aimed at generating descriptors for the evaluation of FDPs. This work utilized a qualitative multistage approach, starting with the generation of the evaluation questions, followed by a Delphi technique with an expert consensus session, and then focus group discussions (FGDs), as outlined below:

Step 1: generation of evaluation questions

The researchers generated the evaluation questions by reviewing preceding similar appraisal work in the literature and adopting the 5 × 2 D Model (Fig. 1) [20] to analyze the data thematically and identify appropriate evaluation questions for the FDP. This was done by the authors, and saturation was confirmed in a series of two virtual meetings, each lasting two hours.

Fig. 1 5X2 D cycle Backward Planning Model

Step 2: Delphi technique

To reach expert consensus on the developed evaluation questions for the FDP, the authors developed a survey and pilot-tested it on a group of five respondents.

The Delphi approach was deployed over two online rounds conducted from May 2021 to June 2021. The Delphi panel consisted of 20 medical educators, purposively chosen based on their experience in FD and in managing quality standards. Nineteen educators participated in round one and eighteen in round two.

A consensus threshold of 100% was chosen as the stopping criterion, i.e., if 100% of the evaluation questions reached consensus by round 2, the study would be considered complete. This decision was based on common practice in Delphi studies [21, 22].

Consensus rules

Pre-determined consensus rules were used by the authors to guide decisions on whether an evaluation question was to be accepted or excluded. These rules were applied in rounds 1 and 2 and, as illustrated in the sketch after the list, were as follows:

  • Consensus: mean score ≥ 4 on the 5-point Likert scale, or agreement percentage greater than 75%.

  • Non-consensus: mean score < 4 on the 5-point Likert scale.
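For illustration only, the consensus rules can be expressed as a short computational check. The Python sketch below assumes that each question’s ratings are available as a list of integer Likert scores; the function name is ours, and interpreting the 75% criterion as the share of experts rating a question 4 or 5 is our assumption rather than part of the published protocol.

```python
from statistics import mean

def reaches_consensus(ratings, mean_cutoff=4.0, agreement_cutoff=0.75):
    """Apply the pre-determined consensus rules to one evaluation question.

    ratings: Likert scores (1-5) given by the expert panel for the question.
    Consensus: mean score >= 4, or (assumed interpretation) more than 75%
    of experts rating the question 4 or 5. Anything else is non-consensus.
    """
    if not ratings:
        return False
    agreement_share = sum(1 for r in ratings if r >= 4) / len(ratings)
    return mean(ratings) >= mean_cutoff or agreement_share > agreement_cutoff

# Hypothetical ratings from a 19-member round 1 panel for one question
example_ratings = [5, 4, 4, 3, 5, 4, 4, 5, 3, 4, 4, 5, 4, 4, 3, 5, 4, 4, 5]
print(reaches_consensus(example_ratings))  # True: mean ~4.16, ~84% agreement
```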

The experts were anonymous to each other throughout the study; however, the Delphi study was not completely anonymous, as the authors were aware of the experts’ identities. Each participant was assigned an alphanumeric identifier that was attached to their contributions.

Rounds 1 and 2 involved rating the questions on a 5-point Likert scale, which allowed the experts to indicate their level of agreement with each question.

The round 1 survey consisted of 59 evaluation questions categorized into 11 domains and was distributed via personal emails. Experts were asked to rate their level of agreement with each statement on the 5-point Likert scale. They could also provide written comments on each question, suggest modifications, and/or justify their scores. Where comments were provided, keywords and ideas were extracted, and the comments were critically evaluated to determine whether, and which, revisions were indicated. Not all respondents provided comments to support their scoring decisions. According to the experts’ comments, seven domains did not reach consensus; therefore, the round 2 survey consisted of 36 questions categorized into 7 domains. Finally, 56 evaluation questions were taken forward to the FGDs.

The authors analyzed the responses, extracted recommendations from the participants’ responses, and then devised a list of adaptations that was subsequently approved by all the authors. A second set of evaluation questions was generated at a second consensus meeting held by the researchers (SA, AK, NN).

Step 3: virtual focus group discussions

Two virtual FGDs were conducted with medical educators who were formally invited using a convenience (non-probability) sampling method.

First virtual FGD

A total of 30 members participated. They varied in gender, specialty, academic rank, and affiliation. Precautions were taken to guarantee both the anonymity of the participants and the confidentiality of their contributions to the discussions (e.g., their identities were concealed during data analysis).

Participants were divided into five groups, each moderated by one of the authors. The FGD lasted 90 minutes, during which each moderator used a question guide to explore participants’ views on indicators for the already developed evaluation questions.

Second virtual FGD

The methodology of the second FGD was very similar to that of the first. However, its purpose was to elicit participants’ views on the data sources for the previously agreed-upon indicators, based on their personal experience with FDPs. This was done to ascertain what is currently being used in real practice.

The questions in the focus group guide covered five major themes concerning FDP based on the 5 × 2 D model: Decide (context and selection of trainees), Define (needs assessment and objectives), Design (materials and methods), Direct (communities of practice (CoP) and learning) and Dissect (key performance indicators (KPIs) and feedback).
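As a purely illustrative aid, the 5 × 2 pairing of steps and focus areas in the guide can be written as a simple mapping. The Python sketch below merely restates the themes listed above; the variable name and output format are hypothetical and not part of the study materials.

```python
# The five "D" steps of the 5x2 D model, each paired with the two focus
# areas explored in the focus group guide (restated from the text above).
FOCUS_GROUP_THEMES = {
    "Decide": ("context", "selection of trainees"),
    "Define": ("needs assessment", "objectives"),
    "Design": ("materials", "methods"),
    "Direct": ("communities of practice (CoP)", "learning"),
    "Dissect": ("key performance indicators (KPIs)", "feedback"),
}

for step, (first_focus, second_focus) in FOCUS_GROUP_THEMES.items():
    print(f"{step}: explore indicators for {first_focus} and {second_focus}")
```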

The FGDs were kicked off with leading sentences and questions, which are summarized in Textbox 1.

Results

Delphi results

The experts proposed a total of 42 modifications to the original 11 domains, ranging from 1 to 5 modifications per domain. Some modifications consisted of minor wording changes (e.g., “mechanism” instead of “structure” in domain G), while other suggestions were more extensive (e.g., merging, discarding, or adding more detail to enhance comprehension). Round 1 of the Delphi process began with 11 domains (59 questions). The 19 experts accepted 4 of the proposed domains and modified the remaining 7. Overall, the experts directed the most suggestions to domains B and G (9 modifications) and the fewest to domain E (3 modifications). Some domains received no comments and reached consensus in round 1; they were therefore not included in Delphi round 2. The second round included 7 domains (36 questions). Eighteen experts responded to our invitation and agreed to participate in round 2. All domains reached consensus by the end of round 2, as shown in Table 1. In summary, consensus in round 1 was 88.3%, while all questions reached 100% consensus by the end of round 2 (Table 1).

Table 1 Delphi Scores in Round 1 and 2

FGD results

The final version of the evaluation questions after Delphi round 2 (56 questions) was used for discussion and for generating the indicators and data sources, as shown in Table 2.

Table 2 Evaluation guide for faculty development program in educational effectiveness

Discussion

The main focus of this work was to develop a guide for evaluating longitudinal faculty development programs. To do so, expert opinions were taken into account; reliance on expert consensus has previously been used by Minas and Jorm, and by Kern [23, 24].

Recent trends in training proficient HPE educators for their newer roles and responsibilities demand a shift to longitudinal FDPs (LFDPs) [14, 25, 26]. LFDPs developed from robust models have been shown to steadily establish and strengthen the desired competencies of participants [27].

Even though several linear models were proposed in the past [11,12,13, 28,29,30,31,32,33], there was an explicit need for a flexible cyclical model that is more appropriate for LFDPs [9, 20, 34].

To achieve this objective, multi-level analysis, a widely used scientific method, was employed [35,36,37]. This qualitative method built upon input from individuals with vast experience in planning and implementing FDPs, grounded in a series of trials and errors encountered in the past [23, 24].

Community of Practice (CoP)

In this study, there was an inclination to identify indicators that test the continuity of the community of practice. A multitude of facets is used, ranging from the availability of information, to the methods and platforms for communication, to the impact on product development arising from ongoing collaborations. The use of similar indicators to evaluate the development and sustainability of CoPs has been described in previous work [38, 39].

Evaluating the CoP practice requires a longitudinal approach that allows for visiting and revisiting preset indicators [40]. This requires a communication strategy with alumni communities and a methodology to keep them engaged throughout the testing period.

CoPs develop over five stages, according to Etienne and Beverly Wenger-Trayner (2015) [41].

Each of these stages requires an evaluation strategy and a set of indicators to identify the success of the process [38, 39]. In this study, indicators are stratified across all five CoP stages.

Data collection methods

In this study, there are three sets of data collection methods for evaluation: 1) observation; 2) interviews, surveys, or focus groups; and 3) document or media review. According to Peersman (2014), data collection tools comprise data collected by direct observation, data reported by stakeholders through interviews, surveys, or focus groups, and data extracted from evidence such as documents or media analysis. This is in concordance with our proposed data sources [42].

Selection of faculty

Selection of faculty for the training program received semi-consensus, with a tendency to identify indicators that test the homogeneity of knowledge and interest among the faculty recruited for the program. Effective training design reduces the evaluation and categorization effort for participants by building on pre-existing sector knowledge and expertise [43]. Therefore, many programs set a few salient requirements that faculty must meet in order to join the program.

In terms of the training alliance, selecting faculty with homogeneous knowledge and interests decreases the knowledge and power gaps between participants who are focused on a common goal of improvement and development. Although candidates are expected to possess several relevant qualities, the literature has not shed light on the indicators required to assess them. Some authors attribute this to the fact that faculty development is embedded within the training system, with systematic, dynamic trainee evaluation [44, 45].

However, heterogeneous groups can outperform homogeneous groups in terms of the range of decision options and consequences of decisions that they consider [46, 47]. Thus, a degree of heterogeneity is allowed, depending on the goal and outcomes of the training program.

Quantification

When experts were asked to contemplate the standards, it became evident that quantification was a prerequisite for agreeing on benchmarks. Similar views have been voiced by other researchers [48,49,50,51,52]. Recognition of this fact strengthens the need for regional standards that cater seamlessly to the requirements of institutions in diverse areas. Thus, the identified set of standards and indicators is meant as a guide for LFDPs, with due adaptation to suit local needs [53, 54].

Limitations of the study

This work did not cover aspects of validation of the tool that can only be performed longitudinally over a period of time. It would benefit from further study and from application of the evaluation guide in real-life situations, which is a future direction for research. The recommended next step is to implement the evaluation model on a pilot basis, taking into account its utility in various contexts. A study comparing the novel model with existing models, such as the Kirkpatrick model, in terms of process and outcome is also recommended.

Conclusion

Conducting faculty development is an art that requires a degree of flexibility while ensuring a continual process of improvement and ongoing learning. The best-practice guide for faculty development can serve as a self-evaluation tool as well as a quality assurance tool for external auditors. Together with the evaluation process, the guide is a universal technique that can be adopted worldwide, with indicators quantified according to the local context once it has been tested for applicability, usability, and utility.

Recommendations

This work offers direction for schools that need to conduct and evaluate FDPs. The checklist in Table 2 can serve as a useful guide for schools in the evaluation and continuous quality assurance cycle. It is recommended that a structured evaluation strategy be incorporated as early as possible when planning FDPs.