Delphi Study as a Research Method
Delphi studies strive for consensus on a specific topic with a panel of experts over multiple rounds by means of questionnaires interspersed with feedback (Dalkey and Helmer 1963). Experts remain anonymous throughout the entire study to avoid bias resulting from direct confrontation or from the defense of preconceived notions (Okoli and Pawlowski 2004; Skinner et al. 2015). In each round, experts share opinions and feedback, which are anonymized, consolidated by the researchers, and shared with the panel until stable results are achieved or predefined termination criteria are met (Paré et al. 2013). Depending on the setup, rounds can focus on brainstorming, validation, narrowing-down, or ranking (Paré et al. 2013). Over the years, many rigor criteria and good practices for Delphi studies have been proposed, which we followed (Keeney et al. 2006; Okoli and Pawlowski 2004; Paré et al. 2013; Schmidt 1997).
Central Design Decisions
In line with our research question, we strived for an updated BPM capability framework. Before outlining the preparatory activities and the Delphi procedure, we share central design decisions. We communicated these design decisions repeatedly to the experts before and during the study, and the experts could comment on them at any time. Acknowledging that these design decisions affect our results, we also address the precautions taken to offset potential bias and validity threats in Sect. 3.4 and include related limitations in Sect. 6.
First, to support the compilation of an updated BPM capability framework, we chose a two-phase approach. While the first phase focused on challenges and opportunities that BPM will face in the next 5–10 years, the second phase aimed at deriving related capability areas. This established a common ground across the panel, which included experts with diverse backgrounds. It also facilitated the derivation of BPM capability areas in response to challenges and opportunities. Accordingly, the first phase included brainstorming, validation, and narrowing-down rounds, whereas the second phase encompassed brainstorming and validation rounds. We decided against narrowing down the results of the second phase (e.g., by focusing on the most important capability areas) because this would have compromised the framework’s conceptual completeness. By contrast, several validation rounds ensured convergence toward stable results without losing content.
Second, to assess the novelty of the identified BPM capability areas, we planned to compare them to established ones. To that end, we adopted de Bruin and Rosemann’s (2007) capability framework. On the one hand, we used the core elements of BPM to group the challenges, opportunities, and capability areas, acting on the assumption that the core elements have remained constant over time. The core elements helped account for the comprehensive scope and interdisciplinary nature of BPM. For the same reason, we did not require capability areas to be BPM-exclusive but rather to have a BPM-specific interpretation or impact. On the other hand, we used de Bruin and Rosemann’s (2007) capability areas to assess which identified capability areas are new, which are enhanced versions of existing ones, and which are included as-is. In line with our goal of proposing an updated BPM capability framework, we did not require capability areas to be new. Finally, to facilitate communication and adoption in research and practice, we aimed for a parsimonious (in terms of the overall number of capability areas) and balanced (in terms of the number of capability areas per core element) capability framework, analogous to de Bruin and Rosemann’s (2007) framework.
Third, we intended to judge the quality and convergence of our results quantitatively and qualitatively. Hence, we followed the common practice of measuring the experts’ satisfaction with the coding of challenges, opportunities, and capability areas (coding satisfaction) and their overall satisfaction (König et al. 2018; Schmiedel et al. 2013). To that end, we used the following 7-point Likert scale: 1 (fully dissatisfied), 2 (strongly dissatisfied), 3 (unsatisfied), 4 (neutral), 5 (satisfied), 6 (strongly satisfied), and 7 (fully satisfied). This enabled us to judge the development and level of convergence as well as to check for selection bias, that is, to ensure that satisfaction had risen because the remaining experts had become more satisfied with the results and not because dissatisfied experts had dropped out (Heckman 2010). Overall, we strived for a positive development throughout the study and a high level of satisfaction, accompanied by supportive expert feedback and marginal changes between subsequent rounds (Paré et al. 2013).
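To illustrate, the following minimal Python sketch shows how per-round means and standard deviations on this scale could be tracked to judge convergence; the ratings are hypothetical and do not represent the study data.

```python
# Minimal sketch with hypothetical satisfaction ratings per round
# (values are illustrative, not the study data).
from statistics import mean, stdev

overall_satisfaction = {  # round -> ratings on the 7-point scale
    4: [5, 5, 4, 6, 5, 6, 5],
    5: [6, 5, 6, 6, 5, 6, 6],
    6: [6, 6, 6, 7, 6, 6, 6],
}

for rnd, ratings in sorted(overall_satisfaction.items()):
    print(f"Round {rnd}: mean={mean(ratings):.2f}, sd={stdev(ratings):.2f}")
# Rising means, shrinking standard deviations, and only marginal changes
# between subsequent rounds would indicate convergence.
```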
Finally, we decided to invite experts from academia and industry as well as experts with a management and technology background to accommodate the diversity of the BPM field (Okoli and Pawlowski 2004). To ensure broad coverage, we invited experts from different countries, backgrounds, and sub-communities (Schmiedel et al. 2013). We specifically invited researchers who had already published on the future of BPM and included practitioners to complement the view of academics with first-hand experience. Formally, we required academic experts to have held a Ph.D. for at least 5 years, and industry experts to have at least 5 years of experience in a key role representing their organization’s BPM function or as BPM consultants (König et al. 2018).
Preparatory Activities
Prior to the main study, we conducted a pilot study (König et al. 2018; Paré et al. 2013; Skinner et al. 2015). As we had already decided to use the core elements of BPM for structuring the shortlisted challenges and opportunities as well as BPM capability areas, and had communicated this, the pilot study aimed to determine a suitable format for brainstorming in round 1. We investigated two options. The first was a greenfield approach where experts had to come up with challenges and opportunities without further guidance. The second involved asking experts to identify challenges and opportunities per core element (Kasiri et al. 2012). We assessed both options using two groups of three Ph.D. students, with the first group receiving the unstructured and the second the structured questionnaire. While the first group had no issues with the open questions, the second group argued that the presence of core elements constrained their creativity. Accordingly, we decided to use the greenfield approach in round 1.
Simultaneously, we invited experts to participate in the Delphi study in line with the selection criteria mentioned above (Okoli and Pawlowski 2004). Given the required commitment and experience, we primarily recruited experts from our networks. Initially, we identified 60 experts from 20 countries. By asking them to nominate further experts, we increased the pool of potential experts to 62, 34 of whom agreed to participate in the study. This amounts to a response rate of 55%. Judging by the experts’ backgrounds, the panel was balanced in terms of technology- and business-oriented experts as well as in terms of researchers and practitioners. As for the geographical distribution, the panel covered 14 countries on five continents. Academic experts who participated in round 1 had held their Ph.D. for 17 years on average, while practitioners had 27 years of work experience on average. More background information on the panel can be found in Online Appendix A.
We also agreed on guidelines for coding the experts’ responses in the brainstorming and validation rounds. Methodologically, we used iterative coding (Krippendorff 2013; Schmidt 1997). In each round, one co-author anonymized all responses, whereupon two other co-authors coded the experts’ responses independently before the codes were consolidated in joint workshops (Okoli and Pawlowski 2004; Schmidt et al. 2001). After each workshop, we checked that every result could be traced back to at least one expert input, ensuring that the results reflected the experts’ ideas rather than our own. Our guidelines also covered the formulation of challenges, opportunities, and capability areas (Schmidt et al. 2001). We strived for short denominations and single-sentence descriptions while abstracting from domain-specific and technology-centric vocabulary. Finally, we decided to avoid references to de Bruin and Rosemann’s (2007) work wherever possible, except for the core elements of BPM. This ensured that our framework could evolve as independently as possible, which was an important prerequisite for comparing it to de Bruin and Rosemann’s (2007) framework.
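For illustration, a minimal sketch of such a traceability check could look as follows; the result and input identifiers are hypothetical placeholders, not our actual coding artifacts.

```python
# Illustrative sketch of a traceability check; identifiers are hypothetical
# placeholders, not actual coding artifacts from the study.
consolidated_results = {
    # consolidated result -> anonymized expert inputs it was derived from
    "Capability area X": ["expert-07/round-1/item-3", "expert-12/round-1/item-1"],
    "Capability area Y": [],
}

untraceable = [result for result, inputs in consolidated_results.items() if not inputs]
if untraceable:
    print("Results lacking a link to expert input:", untraceable)
else:
    print("All results trace back to at least one expert input.")
```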
Delphi Study Procedure
The Delphi study took 4 months. In each round, the experts had 1 week to provide feedback via email or an online questionnaire. In addition to open-ended feedback on the current round, experts could comment on the study in general. At the start of each round, we provided instructions and definitions, the responses from the previous round, and a change log (Keeney et al. 2006; Paré et al. 2013; Skinner et al. 2015). Table 2 provides an overview of the Delphi study and relevant key figures. Insights into the experts’ participation and satisfaction follow. Details about each round and the precautions we took to offset potential biases are compiled in Online Appendix B.
Table 2 Overview of the Delphi study and important key figures

Between 23 and 29 experts participated per round, a number complying with recommendations in the literature (Paré et al. 2013). With 29 experts participating in round 1 and 23 in round 6, we had an end-to-end dropout of 21%. In round 1, we invited all 34 experts who had agreed to participate in the study; 29 of them took part, amounting to an initial no-show rate of 15%. In all subsequent rounds of the first phase, we invited those 29 experts who had participated in round 1. This ensured a high diversity of input while guaranteeing that all experts were familiar with the information shared before and during the study (e.g., related to the design decisions). As the results of the second phase built on those of the first, we invited the 28 experts who had participated in rounds 1 and 3 to the second phase. Despite the dropout that typically occurs in Delphi studies, the panel remained balanced in terms of industry and academic experts as well as in terms of the experts’ backgrounds.
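For transparency, the reported rates follow directly from the panel figures; the brief worked check below uses only the numbers reported above.

```python
# Worked check of the reported panel figures (numbers taken from the text).
invited, agreed = 62, 34
round1_participants, round6_participants = 29, 23

response_rate = agreed / invited                        # 34/62 -> ~55%
no_show_rate = (agreed - round1_participants) / agreed  # 5/34  -> ~15%
dropout_rate = (round1_participants - round6_participants) / round1_participants  # 6/29 -> ~21%

print(f"response rate {response_rate:.0%}, no-show rate {no_show_rate:.0%}, "
      f"end-to-end dropout {dropout_rate:.0%}")
```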
In terms of quality and convergence, satisfaction increased during the study. The only exceptions were the overall satisfaction in round 4 and the standard deviation of the coding satisfaction in round 6, on which we comment in Online Appendix B. Together with the expert feedback and the fact that almost no changes had occurred between rounds 5 and 6, the development and level of satisfaction gave us confidence that the study had converged after six rounds. Upon completion, we also checked for selection bias by analyzing the last satisfaction values of all experts who had dropped out (Online Appendix C). A mean overall satisfaction of 5.00 (out of 7.00) and a mean coding satisfaction of 5.17 before dropout suggest that experts did not leave due to dissatisfaction.
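A minimal sketch of this selection-bias check is shown below, assuming hypothetical per-expert values rather than the data reported in Online Appendix C.

```python
# Hedged sketch of the selection-bias check with hypothetical values;
# the actual per-expert data are reported in Online Appendix C.
from statistics import mean

last_scores_before_dropout = {
    "overall satisfaction": [5, 4, 6, 5, 5, 5],
    "coding satisfaction":  [5, 5, 6, 5, 5, 5],
}

for dimension, scores in last_scores_before_dropout.items():
    print(f"{dimension}: mean before dropout = {mean(scores):.2f} (scale 1-7)")
# Means well above the neutral value of 4 indicate that experts did not
# leave the panel because of dissatisfaction.
```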