1 Introduction

A business process is a set of activities executed in a given setting to achieve predefined business objectives [25]. Since business processes constitute the operational foundation of organizations, companies seek to manage and improve them. Business processes are commonly supported by information systems that record data on the executions of the processes. These records are referred to as event logs. An event log consists of traces that capture the execution of a business process instance (a.k.a. a case). A case consists of a sequence of time-stamped events, each representing the execution of an activity. Process mining is a family of methods that analyze business processes based on their observed behavior recorded in event logs.
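To make the terminology concrete, the following minimal sketch shows one possible in-memory representation of an event log. The class name `Event`, the helper `build_traces`, and the toy insurance-claim events are assumptions made for illustration only; they do not come from any of the reviewed studies.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    case_id: str         # identifier of the process instance (case)
    activity: str        # name of the executed activity
    timestamp: datetime  # when the activity was executed


def build_traces(events):
    """Group events by case and order each case's activities by timestamp."""
    by_case = defaultdict(list)
    for event in events:
        by_case[event.case_id].append(event)
    return {
        case_id: [e.activity for e in sorted(case_events, key=lambda e: e.timestamp)]
        for case_id, case_events in by_case.items()
    }


# Hypothetical toy log, invented for illustration.
toy_log = [
    Event("c1", "Register claim", datetime(2020, 2, 3, 9, 0)),
    Event("c2", "Register claim", datetime(2020, 2, 3, 9, 5)),
    Event("c1", "Assess claim", datetime(2020, 2, 3, 10, 30)),
    Event("c2", "Reject claim", datetime(2020, 2, 3, 11, 0)),
    Event("c1", "Pay claim", datetime(2020, 2, 4, 14, 0)),
]

traces = build_traces(toy_log)
for case_id, trace in traces.items():
    print(case_id, trace)
```

Each entry of `traces` corresponds to one case, i.e., a time-ordered sequence of activities, which is the input assumed by most process mining methods discussed in this paper.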

In the past two decades, research on process mining has made advances resulting in the generation of a large corpus of academic literature [63]. However, the number of studies on process mining methods might be disconcerting when companies seek to understand how process mining can be applied for improving business processes. More specifically, companies might find it challenging to understand (1) which prominent process mining use cases are available and (2) what business-oriented questions such use cases answer.

As such, the objective of this study is to develop, based on a Systematic Literature Review (SLR), a business-oriented framework that classifies existing process mining use cases and relates them to specific methods and to the business-oriented questions they can answer. The framework can therefore aid practitioners who seek to use data-driven approaches to manage their business processes in exploring how process mining methods can add value to their business and what process-related analyses such methods can support. Thus, the main research questions we seek to examine in this paper are What are the main use cases for existing process mining methods? and What business-oriented questions do existing process mining use cases answer? We conducted the SLR following the guidelines proposed in [45], retrieved 2293 papers using a keyword-based search from electronic libraries, and filtered them according to predefined inclusion and exclusion criteria. We finally identified a corpus of 839 relevant papers that we reviewed. Then, we used the review and the extracted data to develop a business-oriented framework that can guide companies in understanding how process mining can support their business.

The remainder of the paper is organized as follows. In Sect. 2, we position our work against related work. Section 3 presents the SLR protocol. Section 4 summarizes the results, and Sect. 5 presents and discusses the framework. Finally, we conclude the paper in Sect. 6.

2 Related Work

In this section, we position our work against existing reviews within the process mining field. More specifically, we consider systematic mapping studies, process mining reviews in specific industrial sectors, and reviews on how process mining is applied in industry.

In [63], dos Santos Garcia et al. present a systematic mapping study providing an overview of the main process mining branches, algorithms, and application domains. A similar study is discussed by Maita et al. in [54]. These studies highlight that most process mining publications can be associated with process model discovery. This is in line with our findings. However, while these mapping studies focus on the current state of process mining research, we examined empirically validated process mining methods to elicit how they might deliver value to companies. Thus, we take a business-oriented perspective that allows practitioners to link the everyday issues they have to deal with in their organizations with the process mining techniques that can help them solve these issues. Furthermore, Maita et al. classified process mining techniques into three main branches, i.e., process model discovery, conformance checking, and process model enhancement, as proposed in [1]. This classification is also applied in other studies, such as [18]. However, the application of process mining has evolved in recent years. Therefore, our framework extends this classification by incorporating more recent process mining use cases.

Several reviews have also been conducted within specific industry sectors. For instance, literature reviews have been conducted with focus on healthcare [6, 29, 33, 36, 61], educational processes [34], and supply chain [41]. While these reviews focus on a specific application domain, we consider business-oriented questions that are answered at a domain-agnostic level.

Finally, reviews have also been conducted on how organizations use process mining. For instance, Thiede et al. [70] show with their study that process mining is not sufficiently leveraged in the context of cross-system and cross-organizational processes. Corallo et al. [16] primarily provide an overview of software tools that support process mining analysis in industry. Eggers and Hein [27], instead, examine capabilities and practices required to enable the realization of value using process mining in an organization. These studies provide valuable insights on different aspects of how process mining is applied in industry, but they do not guide practitioners in selecting use cases and methods to answer common business-oriented questions.

3 Systematic Literature Review

Our research objective is to develop a framework for classifying process mining use cases and the business issues they might address. This objective is achieved by answering two research questions (RQ). The first research question (RQ1) aims at identifying and categorizing process mining use cases, such as conformance checking and predictive monitoring. Therefore, RQ1 is formulated as What are the main use cases for existing process mining methods? The second research question (RQ2) aims at eliciting the business-oriented questions that the outputs of process mining methods can answer. Therefore, RQ2 is formulated as What business-oriented questions do existing process mining use cases answer? To answer these questions, we employ the SLR method as it allows us to collect relevant studies and, based on the review of existing research, to develop a framework for classifying them [15]. We followed the guidelines proposed by Kitchenham [45] according to which an SLR has three consecutive phases: planning, execution, and reporting.

Our SLR is composed of two parts. The first one (SLR review) aims at identifying other SLR studies on specific process mining use cases (e.g., [5] for process model discovery and [26] for conformance checking). The list of final papers in each of these SLR studies was extracted. However, as not all process mining use cases have a dedicated SLR study, we conducted a second review (PM review) targeting papers that have applied process mining techniques to real-life event logs. The lists of final papers retrieved from the SLR studies identified in the SLR review were combined with the hits obtained in the PM review. The merged list of candidate papers was subjected to content screening. We followed the same guidelines for both parts, i.e., we developed search strings, identified search sources, filtered the results according to predefined criteria, identified additional relevant papers through backward and forward referencing, and extracted data according to a predefined form. Below, we provide a summary of these steps. The review protocol and the list of final papers are available online.

The planning phase of our SLR includes developing search strings, identifying search sources, defining selection criteria, and defining the data extraction strategy. We derived the search strings from our research questions as suggested in [45]. For the SLR review, the aim was to capture SLR studies on process mining. Therefore, we used the search string “process mining” AND “systematic literature review”. The search strings for the PM review were, instead, derived from the research questions and included the terms “process mining”, “workflow mining” (as this term is sometimes used interchangeably with “process mining”), “real-life”, “real-world”, and “case study”. Then, we applied the search strings to electronic databases. We selected Scopus and Web of Science for both parts as they index the venues where most research on process mining is published.

We then defined the selection criteria to identify relevant studies. These criteria, expressed as exclusion (EC) and inclusion (IC) criteria, allowed us to filter the initial list of papers and keep only those that are relevant to answering the research questions. For the SLR review, duplicate papers (EC1), papers not written in English (EC2), and papers inaccessible via the digital libraries subscribed to by the University of Tartu, or otherwise unavailable (EC3), were excluded. In addition, the first inclusion criterion (IC1) filtered out studies that were not specifically about process mining, and the second one (IC2) served to identify studies that applied an SLR to identify relevant papers for specific process mining use cases. Thus, studies focusing on, for instance, evaluating process mining algorithms, such as [57], were excluded.

We also applied the three exclusion criteria above to the PM review. In addition, papers with fewer than five pages were discarded (EC4). The list of candidate papers was then filtered based on the inclusion criteria. We first excluded papers not within the domain of process mining (IC1). Then, we identified papers that apply process mining to real-life event logs (IC2). This criterion was included for two reasons. The first one is to identify methods that are applicable to real-life, often challenging, business settings. The second one is to identify papers that address business process aspects existing in real business contexts. The third inclusion criterion (IC3) was aimed at excluding papers that consider process mining for applications unrelated to business, such as [59], where discovery algorithms for managing noisy event logs are compared. Finally, the fourth inclusion criterion (IC4) filters out papers that do not provide sufficient information to elicit business-oriented questions.

The final step of planning an SLR is data extraction. The objective of this step is to define the data extraction form to reduce the opportunity for bias. We developed the data extraction form according to the suggestions provided in [31, 60]. The data extraction form consists of two parts. The first part was used to extract the metadata of the papers, i.e., paper id, title, authors, and publication year. In the second part, the data extracted concerned process mining use cases (such as process model discovery or performance analysis) and the questions being answered by the process mining methods.

We conducted both searches in February 2020. A summary of the procedure applied is depicted in Fig. 1. For the SLR review, the application of the search string to the electronic databases returned 132 hits from Scopus and 61 from Web of Science, for a total of 193 candidate papers. After the exclusion criteria were applied, 60 papers remained. Of these, 15 were removed as they did not meet IC1, resulting in 45 papers. The application of IC2 filtered out an additional 26 papers, leaving 19 papers. Finally, backward and forward referencing identified two additional papers, resulting in a final list of 21 relevant studies. The data extraction for this part consisted of exporting all studies included in the final lists of the 21 relevant SLR publications. In total, 702 papers were extracted from the 21 SLR studies. These were merged with the hits resulting from the search conducted for the PM review.

Fig. 1. Application of the SLR protocol.

For the PM review, applying the search strings resulted in 1021 hits from Scopus and 570 from Web of Science, i.e., 1591 hits in total. With the 702 papers added from the first part, the total number was 2293. We discarded 91 papers that were unavailable, 889 duplicates, and 109 short papers, leaving 1204 papers. The first inclusion criterion (IC1) resulted in the removal of 80 papers. In the second filtering, where IC2 to IC4 were considered, 316 papers were removed, leaving 808 papers. An additional 31 papers were identified through backward and forward referencing. Thus, the final list consists of 839 relevant papers.
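The counts reported for the two parts of the review can be re-derived step by step. The short sketch below only replays the arithmetic with the numbers stated in the text; the variable names are ours.

```python
# SLR review: counts as reported in the text.
slr_hits = 132 + 61                  # Scopus + Web of Science
slr_after_exclusion = 60             # remaining after EC1-EC3
slr_after_ic1 = slr_after_exclusion - 15
slr_after_ic2 = slr_after_ic1 - 26
slr_final = slr_after_ic2 + 2        # backward and forward referencing

# PM review: counts as reported in the text.
pm_hits = 1021 + 570                 # Scopus + Web of Science
candidates = pm_hits + 702           # plus papers extracted from the SLR studies
after_exclusion = candidates - 91 - 889 - 109  # unavailable, duplicates, short
after_ic1 = after_exclusion - 80
after_ic2_to_ic4 = after_ic1 - 316
final_corpus = after_ic2_to_ic4 + 31  # backward and forward referencing

print("SLR review:", slr_hits, "->", slr_final)
print("PM review:", candidates, "->", after_exclusion, "->", final_corpus)
```

The intermediate values match the figures given above (193, 45, 19, and 21 for the SLR review; 1591, 2293, 1204, 808, and 839 for the PM review).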

4 Results

In this section, we first present the identified use cases of empirically validated process mining methods. Then, we relate them to the business-oriented questions they answer.

4.1 Process Mining Use Cases

The most common use case identified in the process mining literature is process model discovery (see Fig. 2). Business processes are commonly supported by information systems that log information on the process executions. When such logs are available, they can be used to discover process models automatically. Process model discovery takes such event logs as input and produces a process model. Process model discovery is, therefore, used to build procedural process models (e.g., using BPMN or Petri nets) [43], declarative process models (e.g., using the Declare language) [22, 51], or hybrid [53] process models containing both a procedural and a declarative part. Use cases related to social network, goal, and rule mining focus on discovering other aspects of the process executions. Social network mining analyzes processes from an organizational perspective, i.e., discovers the performers involved in a case and their relations [2]. Goal mining, on the other hand, focuses on the process goals [74]. While process model discovery is activity-oriented, goal modeling seeks to discover the process actors’ intentions related to the execution of these activities [20]. Finally, rule mining [62], also referred to as decision mining, examines the data attributes in an event log to elicit the rules behind the choices made in the process.
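As a rough illustration of what discovery from an event log involves, the sketch below counts directly-follows relations between activities. A directly-follows graph is a deliberately simple stand-in chosen for brevity; it is not the algorithm of any cited study, and the function name and toy traces are assumptions.

```python
from collections import Counter


def directly_follows(traces):
    """Count how often activity a is immediately followed by activity b."""
    counts = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts


# Hypothetical traces, one list of activities per case.
toy_traces = [
    ["Register", "Assess", "Pay"],
    ["Register", "Assess", "Reject"],
    ["Register", "Assess", "Pay"],
]

dfg = directly_follows(toy_traces)
for (a, b), frequency in sorted(dfg.items()):
    print(f"{a} -> {b}: {frequency}")
```

Discovery algorithms that produce procedural or declarative models typically go well beyond such counts, for example by identifying concurrency and choice constructs.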

Event logs hold information on the actual process executions, which is not necessarily aligned with process models. Therefore, models can be enhanced using process model enhancement techniques. In particular, process models can be repaired to better represent the process executions [30, 50] or extended with additional data recorded in the event logs [64].

Concept drift identifies changes in the process behavior. The discovery of process models from event logs implicitly assumes that the process model remains stable throughout the time period recorded in the event log. However, this is not always true, as the process might change during the recorded period. Thus, the concept drift use case focuses on detecting changes in the process behavior over time [49].

Fig. 2. Process mining use cases.

The second most common use case is predictive monitoring. This use case aims at predicting the outcome of active cases, i.e., cases that are not yet completed and, therefore, still ongoing [24]. Learning from an event log of historical cases, predictive monitoring techniques are able to predict the remaining time of an ongoing case [73], delays [65], next activities [66], waiting times [7], outcomes [68], risks [14], costs [72], or performance indicators [17]. Since hyperparameter configuration in predictive process monitoring is crucial and often difficult for users, some works provide methods to support hyperparameter optimization [23]. Prescriptive monitoring can be viewed as an extension of predictive monitoring. While predictive monitoring forecasts the likelihood of a case ending up with a desirable or undesirable outcome, it does not suggest the interventions that can increase or reduce the probability of such an outcome. Prescriptive monitoring, on the other hand, seeks to identify specific interventions that improve the likelihood of a favourable outcome, such as next activities to be executed [71], resource allocation [75], or resource selection [44], or to determine when an intervention is needed based on the cost-gain trade-off [69].

The third most common use case we identified is conformance checking. Conformance checking aims at examining if the behavior of a process execution, as derived from an event log, conforms with the expected behavior (represented as a process model) [13]. This can be done by simply showing to the user where the process execution deviates from the process model [12, 42], or by providing a way to align the deviant process execution with the closest compliant case [3, 19, 46]. Compliance monitoring follows the same principle as conformance checking. However, while conformance checking is applied to completed process cases, compliance monitoring checks whether the behavior of active cases is compliant with predefined rules and constraints [48].
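The first flavour of conformance checking, i.e., showing where an execution deviates from the model, can be sketched as follows. We assume a strongly simplified model given as a start activity, a set of allowed end activities, and a set of allowed directly-follows pairs; this representation, the function name, and the toy data are hypothetical, and the sketch does not implement the alignment-based techniques cited above.

```python
def find_deviations(trace, allowed_pairs, start, ends):
    """List the points at which a trace departs from the simplified model."""
    if not trace:
        return ["trace is empty"]
    deviations = []
    if trace[0] != start:
        deviations.append(f"starts with '{trace[0]}' instead of '{start}'")
    for a, b in zip(trace, trace[1:]):
        if (a, b) not in allowed_pairs:
            deviations.append(f"'{a}' is directly followed by '{b}'")
    if trace[-1] not in ends:
        deviations.append(f"ends with '{trace[-1]}'")
    return deviations


# Hypothetical model: every claim is registered, assessed, then paid or rejected.
allowed = {("Register", "Assess"), ("Assess", "Pay"), ("Assess", "Reject")}

conforming = find_deviations(["Register", "Assess", "Pay"], allowed, "Register", {"Pay", "Reject"})
deviating = find_deviations(["Register", "Pay"], allowed, "Register", {"Pay", "Reject"})

print(conforming)
print(deviating)
```

Here the second case skips the assessment, so the check reports the disallowed step from registration directly to payment.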

The fourth most common use case we found is variant analysis. Executions of a business process commonly include variants, i.e., groups of cases that follow the same path (characterized by the same sequence of activities) [67]. Variant analysis enables identifying these variants in an event log. Variant analysis can also be applied to identify differences and similarities between variants [10, 38]. Deviance mining, instead, aims at explaining why a certain variant deviates from the most frequently taken path [8, 55].
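Following the definition of a variant as a set of cases sharing the same sequence of activities, identifying variants can be sketched as a grouping step. The function name and the toy cases below are assumptions made for illustration.

```python
from collections import defaultdict


def group_variants(traces_by_case):
    """Group case identifiers by their activity sequence, most frequent first."""
    variants = defaultdict(list)
    for case_id, trace in traces_by_case.items():
        variants[tuple(trace)].append(case_id)
    return sorted(variants.items(), key=lambda item: len(item[1]), reverse=True)


# Hypothetical cases, invented for illustration.
toy_cases = {
    "c1": ["Register", "Assess", "Pay"],
    "c2": ["Register", "Assess", "Reject"],
    "c3": ["Register", "Assess", "Pay"],
}

variants = group_variants(toy_cases)
for path, case_ids in variants:
    print(" -> ".join(path), case_ids)
```

The first element of `variants` is the most frequently taken path, which is the reference point that deviance mining uses when explaining why other variants deviate.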

A last use case concerns assessing process performance [56]. The performance measured can be the duration of a process execution [40], the resource utilization [39], or the quality of the products/services provided [4]. The performance of several connected processes can also be assessed [28], as can process performance trends over time [21].
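Of the performance measures mentioned above, the duration of a process execution is the simplest to derive from time-stamped events. The sketch below assumes that the duration of a case is the time between its first and last recorded event; this definition, the function name, and the toy timestamps are our assumptions.

```python
from datetime import datetime, timedelta


def case_durations(timestamps_by_case):
    """Return, per case, the time between its first and last recorded event."""
    return {
        case_id: max(timestamps) - min(timestamps)
        for case_id, timestamps in timestamps_by_case.items()
    }


# Hypothetical event timestamps per case.
toy_timestamps = {
    "c1": [
        datetime(2020, 2, 3, 9, 0),
        datetime(2020, 2, 3, 10, 30),
        datetime(2020, 2, 4, 14, 0),
    ],
    "c2": [
        datetime(2020, 2, 3, 9, 5),
        datetime(2020, 2, 3, 11, 0),
    ],
}

durations = case_durations(toy_timestamps)
mean_duration = sum(durations.values(), timedelta()) / len(durations)

for case_id, duration in durations.items():
    print(case_id, duration)
print("mean:", mean_duration)
```

Resource utilization and quality measures would require additional attributes, such as the performer of each event, which this sketch does not model.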

4.2 Business-Oriented Questions

Some process mining methods answer questions that are descriptive. For instance, process model discovery answers the question How are the cases of a procedural, a declarative, or a hybrid process executed? The answer is expressed as a process model representing the process behavior as recorded in the event log. Process cases can commonly be grouped into variants. Therefore, some process mining methods answer the question What are the main variants of a process? Other process mining methods, instead, answer questions that describe processes quantitatively. For instance, process mining can answer questions such as What is the duration-, resource-, or quality-related performance of a case?

There are also methods that answer comparative questions, i.e., questions that compare two or more process cases. For instance, variant analysis methods might be used to compare different variants of a process, thus answering the questions What are the similarities between two or more variants of a process? and What are the differences between two or more variants of a process? Similarly, conformance checking seeks to compare the prescribed behavior of a process with the observed behavior, i.e., comparing what a process model stipulates and how the process is executed in reality. Therefore, conformance checking answers questions such as Where does a case differ from a process model? Another type of comparative question concerns how the process behavior changes over time, such as How has the process behavior, or its performance, changed over time?

Process mining also answers questions that seek explanatory answers, i.e., answers providing information that explains relations among different entities. For instance, some process mining methods answer questions such as How is the performance of a case affected by other factors? Likewise, deviance mining provides explanatory information by answering the question Why do some cases deviate from the normal flow? Methods that compare a model with historical (conformance checking) or live (compliance monitoring) cases could provide explanatory information by answering questions such as Given a non-compliant case, what is the closest compliant case? Predictive monitoring methods answer (forecasting) descriptive questions such as What are the predicted remaining times, delays, next activities, waiting times, outcomes, risks, costs, or performance indicators of an ongoing case?

Finally, there are process mining methods aiming at providing suggestions on how to redesign a process model to improve understandability, or optimize the likelihood of a favourable outcome of ongoing process executions. For instance, model repair techniques provide input for improving a process model by answering the recommendatory questions How can a process model be repaired to better reflect the actual execution of the process? and How can the understandability of the mined process models be improved? On the other hand, prescriptive monitoring methods provide recommendations on how an ongoing process case should be executed to reach a positive outcome. For instance, recommendations can be given on which variant to follow thus answering the question What is the recommended execution path of an ongoing case? Recommendations also extend to resources and their allocation by answering questions such as What is the recommended resource allocation? and Who is the recommended process performer?

5 Framework

This section introduces our business-oriented framework that categorizes the identified process mining use cases. Practitioners can use the framework to select the process mining methods that are most suitable for their needs. It consists of two parts: a categorization of the main process mining use cases (RQ1) and the elicitation of the business-oriented questions that these use cases can answer (RQ2). Our categorization draws on the value-driven business process management (VBPM) approach proposed in [32]. According to VBPM, organizations need BPM techniques to realize at least one of six values: efficiency, quality, compliance, agility, integration, and networking. In order to realize these values, transparency is required. Transparency means creating visibility into how processes are executed. Commonly, this is achieved with business process models. Therefore, transparency lies at the core of VBPM.

Organizations that engage in BPM to gain efficiency take an internal organizational viewpoint and focus on improving the performance of their processes. Efficiency gains are achieved by, for instance, eliminating waste in the processes, reducing redundancies, and removing rework. On the other hand, organizations can focus on the outputs of the processes and on improving their quality. Organizations that hold quality as a core value engage in BPM to explore the correlation between process characteristics and product/service quality.

Compliance as an expected value of BPM emphasizes reducing variability and increasing standardization. Financial institutions, for instance, are subject to regulatory requirements. Therefore, such organizations gain value from designing and executing processes that comply with predefined standards. However, organizations might also gain value from having agile processes, i.e., flexible and adaptable processes. For instance, an insurance company experiencing a sharp increase in claims during severe weather conditions would switch to a different process execution that caters to the increased volumes. Other values of BPM are integration and networking. Integration concerns the creation of business value by increasing awareness and accessibility of process models to internal stakeholders. Conversely, networking focuses on involving external stakeholders in the processes.

We use VBPM as the basis for categorizing process mining use cases for two reasons. Firstly, VBPM is derived from surveys and interviews with companies and, therefore, captures the main reasons why companies engage with BPM. Our framework is business-oriented, i.e., categorizes use cases and questions that, when answered, aim at delivering value to organizations. Therefore, using VBPM allows us to categorize process mining methods while aligning them with the business values companies seek from BPM. Secondly, process mining is applicable in different BPM lifecycle phases such as process discovery, monitoring and analysis [25]. Likewise, process mining can be applied to support the execution of different methodologies such as Six Sigma [35]. Since VBPM focuses on business value rather than on specific BPM methodologies, using VBPM as a basis, our business-oriented framework can be used to show how process mining techniques can contribute to generating value to organizations instead of framing these techniques in the context of specific BPM lifecycle phases or methodologies.

Table 1. Framework instantiation.

5.1 Framework Instantiation

Our framework categorizes the main process mining use cases into the categories transparency, efficiency, quality, compliance, and agility. Transparency encompasses use cases that aim at discovering process models (process model discovery), discovering the interaction between resources in a process (social network mining), adjusting process models to capture the process executions better (process model repair), enriching the process models with additional data (process model extension), detecting the decision rules embedded in the process decision points (rule mining), and identifying the process objectives (goal mining). Efficiency includes process mining use cases concerning the analysis of the performance of a process (process performance). Quality encompasses use cases for the identification and comparison of process variants (variant analysis) and for the analysis of the reasons for deviations in a process case (deviance mining). Compliance includes use cases comparing a process model with some observed behavior (conformance checking), or predefined rules or constraints with an ongoing case (compliance monitoring). Agility encompasses use cases about predicting how ongoing process executions will unfold in the future (predictive monitoring), describing how the process behavior changes over time (concept drift), and prescribing the actions to take for an ongoing case to achieve a certain desired outcome (prescriptive monitoring).
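The categorization described above can be written down as a simple lookup structure. The sketch below encodes only the category-to-use-case assignment stated in this paragraph; the dictionary layout and the helper `category_of` are assumptions about how a practitioner might make the framework machine-readable, not part of the framework itself.

```python
# Category -> use cases, as listed in the paragraph above.
FRAMEWORK = {
    "transparency": [
        "process model discovery",
        "social network mining",
        "process model repair",
        "process model extension",
        "rule mining",
        "goal mining",
    ],
    "efficiency": ["process performance"],
    "quality": ["variant analysis", "deviance mining"],
    "compliance": ["conformance checking", "compliance monitoring"],
    "agility": ["predictive monitoring", "concept drift", "prescriptive monitoring"],
}


def category_of(use_case):
    """Return the framework category of a use case, or None if it is not listed."""
    for category, use_cases in FRAMEWORK.items():
        if use_case in use_cases:
            return category
    return None


print(category_of("concept drift"))
print(category_of("deviance mining"))
```

A fuller encoding would attach to each use case the business-oriented questions it answers, as given in Table 1.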

We have excluded integration and networking in our framework. Integration focuses on improving the availability and accessibility of process models to internal resources, for instance, to raise their engagement in the processes. However, although process mining methods can discover process models, the distribution and accessibility of such models are beyond the scope of this field. Networking, instead, focuses on incorporating external parties within the scope of a process. Although event logs of several parties can be merged together, process mining methods treat such logs in the same way as a single internal event log. Therefore, we also excluded networking from our framework.

At the highest level, our framework categorizes process mining use cases into transparency, efficiency, quality, compliance, and agility. Each of these categories is then organized into sub-categories. For instance, transparency consists of process model discovery, repair, extension, social network mining, goal mining, and rule mining (see first column of Table 1). Then, we define the questions that each use case of a certain sub-category can answer, and provide a sample reference (see second column of Table 1). For instance, for concept drift under agility, we have defined the question How has the process execution changed over time?

The questions specified in the transparency category are, as expected, descriptive, while questions specified in the efficiency category are quantitative or comparative. The questions specified in the quality and in the compliance categories are, instead, descriptive, comparative, or explanatory. Finally, the questions defined in the agility category are descriptive, recommendatory, comparative, or explanatory. Thus, we can observe that descriptive questions lie at the foundation of other questions, just as transparency constitutes the foundation of the other categories.

5.2 Limitations

The main limitations of SLR studies are selection bias and data extraction inaccuracies. These threats, although not eliminated, were reduced by adhering to the guidelines proposed by [45]. More specifically, we used well-known databases to find papers, performed backwards referencing to avoid excluding potentially relevant papers, and ensured replicability by providing access to the SLR protocol. Another limitation concerns the fact that we relied on the results reported in the literature and we did not empirically verify or assess the extent to which the use cases impact the business processes, or if they led to effective process improvements (we considered methods using real-life event logs but not necessarily tested in industrial contexts). Although this could represent a limitation for the generalizability of the results, the proposed framework still provides practitioners with valuable insights on how process mining might be applied in industry, and represents an easy-to-use instrument to understand what types of analysis can be conducted with the existing process mining methods.

6 Conclusion

Process mining methods have proliferated over the last two decades. While such methods help manage business processes, it can be challenging for practitioners to readily understand how they can deliver value, or what business-oriented questions they can answer. To fill this research gap, we propose a framework that classifies existing process mining use cases into the categories transparency, efficiency, quality, compliance, and agility. Furthermore, within each of the above categories, process mining use cases can answer descriptive, comparative, explanatory, or recommendatory questions.

The SLR we conducted also highlights that a large number of studies in the process mining literature support the discovery of process models (transparency), predictive monitoring (agility), the analysis of process performance and variants (efficiency and quality), and conformance checking (compliance). In this respect, the framework also represents an instrument that allows researchers and process mining companies to understand which use cases in the process mining field have already been largely explored and which ones, instead, need further investigation.