1 Introduction

Process mining is a leading paradigm in process-oriented data science, best known for extracting insights from event logs captured by information systems. It aims at analyzing and visualizing event logs to reveal process patterns. Process mining comprises three main activities: automatic process discovery, conformance checking, and enhancement.

Over the past decade, more than 80 papers have addressed automatic process discovery by proposing new algorithms or variations of existing ones [1]. These methods typically extract models that differ in how they balance the trade-off among the precision, fitness, and complexity criteria. The decision of which quality dimension to favor is generally made arbitrarily. Hence, it is difficult to detect a reference model using conventional process discovery methods.

For instance, heuristic-based process discovery methods have attracted considerable attention in the process mining community. They have been identified as a suitable solution for discovering less structured process models, owing to their ability to handle noise in an event log [2, 3]. The work in [4] was one of the pioneering efforts in developing heuristic-based process discovery algorithms, and the resulting algorithm was named the heuristic miner. The main objective of such an algorithm was to give users the flexibility to detect the target model from an event log, an objective preserved in improved versions of the algorithm [5].

The classic heuristic-based methods follow five general steps: (i) identifying the footprint matrix, (ii) calculating the dependency measure, (iii) devising the graph, (iv) discovering the splits and joins, and (v) adjusting the loops. In this article, we elaborate on the second step of this algorithm. This decision is founded upon one of the scientific gaps that we have identified and discuss in the subsequent section.

1.1 The Scientific Gap and Its Risks

Extracting the footprint matrix reveals the direct relations among activities. The advantage of the classic heuristic miner algorithm lies in its second step, where the algorithm calculates the dependency measure among activities. By defining this metric, the algorithm gives users the aforementioned flexibility to select a target model [5].

This so-called flexibility is offered to users through manually adjustable thresholds, a common approach in most process discovery algorithms. These thresholds allow modification of the graph by adding or removing activities (vertices) and connections (edges) among them. This modification is mainly carried out arbitrarily, and it is highly dependent on the experience and knowledge of the user. This is unsuitable when the user wants to understand how a process is normally executed in reality, and it makes the detection of drifts impossible. This shortcoming has been raised in other works as well [6]. Additionally, in data-driven simulation, one of the main challenges is to extract the reference behavior recorded in an event log in order to eventually identify a simulation model.

Moreover, the extracted results are likely to illustrate a non-optimal graph. Increasing the value of these thresholds could also lead to detecting uncommon behaviors, higher entropy, and an impractically large graph.

This gap can be observed in other studies too [6,7,8,9]. For instance, the researchers in [9] note that the quality of the discovered model depends on trial-and-error, and that detecting a target model can be time-consuming. Therefore, considering the second step of these algorithms, the research question for heuristic-based process discovery algorithms is: “how to calculate the optimal values for thresholds?”

The primary version of the stable heuristic miner algorithm was proposed to address this research question [10,11,12]. To do so, the stable heuristic miner evaluates the statistical stability in an event log. As a result, it discovers a process model that represents the descriptive reference process model. This algorithm was developed to detect the reference behavior of patients, with the objective of diagnosing deviations and drifts from it [11]. This was seen as an important requirement in the analysis of patients’ pathways, because healthcare processes are highly complex and users need a reference model, which is difficult to obtain, for diagnostic actions.

However, the previous version of the stable heuristic miner addressed the evaluation of statistical stability by focusing only on activities (vertices). As a result, the discovered process model could not effectively consider the statistical stability of the whole behavior registered in an event log (i.e., activities and relationships among them). This was one of the concerns highlighted in [10, 12]. In this article, we aim to address this issue.

1.2 Hypothesis and Contributions

In complex systems with emergent properties, such as a hospital or any environment where humans carry out the majority of tasks, it is challenging to capture a model that illustrates the reference behavior of a statistical population. To address this issue, we define statistical stability in process discovery.

We establish that statistical stability in process discovery is manifested through a meta-analysis that evaluates the consistency of samples’ behaviors against the information recorded in an event log, which we consider as a statistical population.

Each sample is identified by an activity or a connection between two activities. By evaluating the statistical stability, one can obtain a snapshot that serves as the main, reference behavior of the population. This evaluation of statistical stability is carried out by assessing the frequency of mass events, and also by taking the stability of the averages, variances, and standard deviations of the samples into consideration [13, 14]. Accordingly, the contributions of this article are:

  • The new stable heuristic miner 2 algorithm, which redefines the descriptive reference process model by assessing the statistical stability of both activities and edges extracted from an event log.

  • A real-world event log, accessible in [15], which is used to assess the capability of the novel algorithm.

1.3 Article’s Structure

The remainder of the article is organized as follows: Sect. 2 reviews the state of the art on heuristic-based mining methods for process discovery. Section 3 describes the developed algorithm and its definitions, and provides a running example for a better illustration of the introduced method. Section 4 reports two experiments, based on two real-world event logs, that validate the applicability of the new algorithm and compare it with previous methods. Section 5 summarizes the main conclusions of this work, its limitations, and potential future works.

2 Background

2.1 A Review of Heuristic-Based Process Discovery Methods

As mentioned in Sect. 1.1, calculating the dependency measure is a principal step in several heuristic-based process discovery algorithms. Optimizing the output of this step can lead to the extraction of an optimal dependency graph, or, as we call it here, the descriptive reference process model.

Previously in [4, 5, 16], the dependency graph was defined as:

$$\begin{aligned}&Dependency \, Graph \nonumber \\&\quad = \left\{ (a,b) \mid (a\in E \wedge b\in a\square ) \vee (b \in E \wedge a \in \square b) \right\} \end{aligned}$$
(1)

This definition expresses ‘E’ as a finite set of activities. Each event can represent an activity. ‘\(\square b\)’ stands for the activities that come before ‘b’, and ‘\(a \square\)’ denotes the activities that come after ‘a’. Accordingly, a dependency relation is represented by (a, b), expressing the \(input-output\) sequence of activities.

Therefore, the dependency measure [5] could be expressed by equation 2:

$$\begin{aligned} Dependency \, Measure: a \Rightarrow _wb = \frac{\mid a>_wb \mid -\mid b>_wa \mid }{\mid a>_wb \mid + \mid b>_wa \mid +1} \end{aligned}$$
(2)

Here, ‘w’ stands for the event log, which contains ‘n’ activities. \(\mid a>_wb\mid\) denotes the number of times activity ‘a’ is directly followed by ‘b’. Raising or lowering the dependency thresholds applied to this measure leads the user to a different set of vertices and edges. This selection is arbitrary and does not guarantee an optimal graph.
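As a concrete illustration, the directly-follows counts of equation 1 and the dependency measure of equation 2 can be computed in a few lines of Python. This is a minimal sketch, not the authors’ implementation; the toy log and the function names are our own.

```python
from collections import Counter

def directly_follows(log):
    """Count |a >_w b|: how often activity a is directly followed by b."""
    counts = Counter()
    for trace, cases in log:
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += cases
    return counts

def dependency_measure(counts, a, b):
    """Equation 2: (|a >_w b| - |b >_w a|) / (|a >_w b| + |b >_w a| + 1)."""
    ab, ba = counts[(a, b)], counts[(b, a)]
    return (ab - ba) / (ab + ba + 1)

# A small, made-up log: each entry is (trace, number of cases).
log = [(("a", "b", "c", "d"), 12), (("a", "c", "b", "c", "d"), 5)]
counts = directly_follows(log)
print(counts[("b", "c")], counts[("c", "b")])            # 17 5
print(round(dependency_measure(counts, "b", "c"), 3))    # 0.522
```

A dependency measure close to 1 indicates a strong causal ordering from ‘a’ to ‘b’; in the classic heuristic miner, a user-chosen threshold on this value decides which edges enter the graph.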

The primary heuristic miner algorithm [4] discovered the dependency graph based on the minimum thresholds for evaluating the dependency among activities.

The authors of [17] presented another version of the heuristic miner focusing on the mentioned scientific gap. They proposed a new approach to calculating the dependency measures. One advantage of this version was its ability to extract dependency graphs from event streams.

Heuristic miner ++ was introduced in 2015 [18]. This version also modified the approach to calculating the dependency measures, by considering the time intervals in an event log.

Another version of the heuristic miner was proposed as the “flexible heuristic miner” in [5]. The output of this version is constructed similarly to that of the classic heuristic miner; however, it improves the results by also considering long-distance dependencies.

The Fodina algorithm [19] is another improved version of the heuristic miner, capable of detecting duplicate tasks and providing more flexibility for the user to extract a dependency graph. The authors of [8] mentioned that this algorithm can lead to disconnected graphs, which can be seen as irrelevant.

These works aimed at improving the functionality of the heuristic miner. However, their aim differs from the focus of the stable heuristic miner 2. In this new algorithm, we focus on the extraction of one target model from the data for specific analytical purposes such as diagnosis and concept drift detection.

Several researchers [8, 20,21,22,23,24] have applied mathematical programming to the question of “how to discover an optimal process”. For instance, the authors of [21] used integer linear programming and optimized their objective functions under an interesting constraint: ensuring that every modeled activity lies on a path from the initial activity to the end activity. The Proximity miner algorithm [22] is a notable example of these works, integrating domain knowledge into the discovery of the dependency graph. Consequently, the same scientific question arises again: the mining procedure remains dependent on the expert’s knowledge.

We identified two promising methods with objectives approximately similar to ours. The first is the Inductive Miner algorithm [25], one of the most widely applied process discovery algorithms. It aims to detect the most significant splits in an event log and determine the operators that characterize each split. The method produces block-structured process models, which are visually appealing. However, it can generate a flower-structured process model, leading to low fitness and to behaviors that were never seen in the event log. Moreover, this method identifies various process structures and suggests thresholds to detect linear models. Given this approach, it did not match our plan to propose a reference model based on mathematical logic. Both the Inductive Miner and the classic Heuristic Miner have been shown to be ineffective in detecting the relevant/reference process model [6]. The authors of [6] compared the effectiveness of the classic heuristic miner and the inductive miner on the sepsis data set, also used in this paper, and highlighted the need for methods that advance towards extracting a reference process model.

The other method is the Split Miner algorithm [26]. The objective of this promising method is to extract simple process models with low branching complexity while keeping a high fitness value. This approach does not generally lead to a reference model, since it considers neither the statistical distribution of the events nor the nature of the observed system. Applying this method to our study could yield a simple model, but not necessarily the descriptive reference model.

2.2 The Previous Version of the Stable Heuristic Miner Algorithm

As mentioned earlier, in the first version of the stable heuristic miner [10, 12], we aimed to find the frequency of occurrence of all activities in an event log. We then obtained three main thresholds to determine how important each activity is according to the statistical stability phenomenon.

The reason statistical stability was found to be an appropriate approach to evaluate and discover a process model from an event log was based on the nature of the studied system.

According to the hypothesis of statistical stability [13, 14], to understand the behavior of a complex system, it is required to evaluate the behavior of each member of the system based on its impact on the whole system. To the best of our knowledge, this notion is missing from the core of most process discovery algorithms.

The statistical stability evaluation can be driven by the Shewhart control charts approach [10, 27]. These control charts include a center line (CL) that represents the average value of a measured characteristic in the in-control state. The Upper Control Limit (UCL) and Lower Control Limit (LCL) are calculated from the standard deviations and averages of the samples. These two limits, UCL and LCL, are the borders of a statistically stable state. As long as the analyzed data points fall between these two thresholds, the outcomes of the process are in control. If a data point falls outside these limits, it is considered a variation in the process outcomes, and the process is no longer considered stable.
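To make the control-chart mechanics concrete, the following sketch computes CL, UCL, and LCL from a series of individual observations using moving ranges. The function and the demo data are illustrative, not part of the algorithm; the \(d_2\) bias-correction constant is passed in explicitly, since its value depends on how the samples are formed (1.128 is the textbook value for moving ranges of width two).

```python
def control_limits(observations, d2):
    """Shewhart-style limits: CL is the mean; UCL/LCL sit three estimated
    standard deviations away, with sigma estimated as mean moving range / d2."""
    mean = sum(observations) / len(observations)
    moving_ranges = [abs(b - a) for a, b in zip(observations, observations[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    sigma = mr_bar / d2
    return mean - 3 * sigma, mean, mean + 3 * sigma  # LCL, CL, UCL

lcl, cl, ucl = control_limits([10, 12, 11, 13, 12], d2=1.128)
print(round(lcl, 2), cl, round(ucl, 2))   # 7.61 11.6 15.59
```

Any observation falling between `lcl` and `ucl` would be considered in control; points outside signal instability.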

As a result, the previous version of the stable heuristic miner algorithm proposed to replace the second step of the classic heuristic miner with ten new steps.

Fig. 1
figure 1

Illustration of the steps of the first version of the stable heuristic miner to evaluate the statistical stability for activities [10, 12]

These steps were defined as the sequence of actions to identify the stable activities and the activities that impose instability on the process. These steps are shown in Fig. 1. By identifying these two types of activities, a new definition was proposed in equation 3. This definition represents a descriptive reference process model.

According to equation 3, in order to acquire the descriptive reference process model, we need to detect a state of the event log in which, for each activity (\(\mathcal {A}\)), a vector ‘s’ represents the frequency of its relations with other activities. The ‘s’ vector is defined as a sample of the population (the footprint matrix). Additionally, for each sample ‘s’, there exists an \(\bar{x}_s\), which is the average of its direct-relation frequencies. Therefore, the activity corresponding to the sample (s) appears in the descriptive reference process model (\(\mathcal {P}\)) if the average of its direct-relation frequencies lies between the two thresholds ‘UCL’ and ‘LCL’. Such a sample (activity) is considered a stable behavior. If the average value is greater than or equal to the UCL value, the activity is considered a hot zone that imposes instability on the process.

$$\begin{aligned}&[\forall \quad \mathcal {A} \quad \exists \quad s] \wedge [\forall \quad s \subseteq \mathcal {S} \quad \exists \quad \bar{x}_{s}] \nonumber \\&\therefore \nonumber \\&(\mathcal {A} \quad \in \quad \mathcal {P}) \rightarrow [ LCL< \bar{x}_{s} < UCL] \cup [ UCL \le \bar{x}_{s} ] \end{aligned}$$
(3)

This definition—equation 3—evaluated the statistical stability criteria to detect only stable and unstable activity behaviors. However, each edge can also affect the nature of a graph, and this was ignored.
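Operationally, equation 3 can be read as a three-way classification of each activity’s average behavior. The sketch below is our own illustrative reading, with names of our own choosing:

```python
def classify_activity(x_bar_s, lcl, ucl):
    """Equation 3, read operationally: an activity stays in the reference
    model if its average direct-relation frequency is stable (between LCL
    and UCL) or a 'hot zone' (at or above UCL); below LCL it is dropped."""
    if x_bar_s >= ucl:
        return "hot zone"   # kept, but flagged as imposing instability
    if x_bar_s > lcl:
        return "stable"     # kept as part of the reference behavior
    return "removed"        # too infrequent to be reference behavior

# Illustrative thresholds only.
print(classify_activity(10, 7, 22))   # stable
print(classify_activity(30, 7, 22))   # hot zone
print(classify_activity(2, 7, 22))    # removed
```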

Therefore, we propose an improvement in the following. The second version of the algorithm considers not only the behavior of activities in a dependency graph but also the edges that can be discovered from an event log. The following section presents the new modifications to the stable heuristic miner algorithm.

3 Theory and Methods

This novel version considers both activities (vertices) and the relationships among them (edges) as samples of a larger statistical population, which is the event log.

3.1 Presenting the Steps of the Novel Algorithm

To better illustrate this new algorithm, we are going to present each step by using a running example. Figure 2 demonstrates the procedure of the second version of the algorithm.

Fig. 2
figure 2

Stable heuristic miner 2 algorithm: the sequence of actions that lead to the output of the algorithm

To avoid redundancy, we will not focus on the steps related to the previous version of the algorithm; instead, we present, through a running example, the definitions regarding the new improvements.

3.1.1 The Running Example

‘L’ represents a series of traces that are supposedly extracted from an event log.

figure a

Inside ‘L’, each group represents a trace. A trace consists of events corresponding to activities. For instance, the first trace shows that 12 cases followed the same sequence of activities. Figure 2 shows the new steps we take to discover the descriptive reference process model. We describe these steps in the following definitions.

After extracting the footprint matrix (shown below), we will consider the matrix as the population and each value as a sample (\(s_e\)).

$$\begin{aligned}
\mathcal {S}= \left\{ \begin{array}{cccccccccccccc}
  & a & b  & c  & d  & e  & f & g  & h & i  & j & k & l  & m \\
a & 0 & 26 & 34 & 0  & 0  & 0 & 0  & 0 & 0  & 0 & 0 & 0  & 0 \\
b & 0 & 0  & 46 & 0  & 0  & 2 & 0  & 0 & 5  & 0 & 4 & 0  & 0 \\
c & 0 & 31 & 0  & 48 & 5  & 9 & 0  & 4 & 10 & 6 & 0 & 0  & 0 \\
d & 0 & 0  & 0  & 0  & 14 & 0 & 17 & 0 & 0  & 0 & 0 & 23 & 0 \\
e & 0 & 0  & 5  & 4  & 0  & 0 & 0  & 0 & 0  & 0 & 0 & 20 & 0 \\
f & 0 & 0  & 15 & 2  & 0  & 0 & 0  & 0 & 0  & 0 & 0 & 0  & 0 \\
g & 0 & 0  & 0  & 0  & 1  & 0 & 0  & 5 & 0  & 0 & 0 & 11 & 0 \\
h & 0 & 0  & 4  & 0  & 5  & 0 & 0  & 0 & 0  & 0 & 0 & 0  & 0 \\
i & 0 & 0  & 9  & 0  & 0  & 6 & 0  & 0 & 0  & 0 & 0 & 0  & 0 \\
j & 0 & 0  & 0  & 0  & 0  & 0 & 0  & 0 & 0  & 0 & 0 & 6  & 0 \\
k & 0 & 0  & 0  & 0  & 4  & 0 & 0  & 0 & 0  & 0 & 0 & 0  & 0 \\
l & 0 & 0  & 0  & 0  & 0  & 0 & 0  & 0 & 0  & 0 & 0 & 0  & 60 \\
m & 0 & 0  & 0  & 0  & 0  & 0 & 0  & 0 & 0  & 0 & 0 & 0  & 0 \\
\end{array}\right\}
\end{aligned}$$

3.1.2 Preliminaries and Definitions

Definition 1

Sample-edge (\(s_e\)): Each connection (edge) between two activities with a value greater than 0 is considered a sample. All samples have a uniform size of 1.

While it might seem unusual to select samples of size 1, statistical constraints require such a selection [28]. Accordingly, in the example above, there are ‘\(30+1\)’ samples (edges). The ‘\(+1\)’ accounts for the connection to the end activity.

Definition 2

Observation-edge (\(x_{ij}\)): Each edge or sample within the population has a value. This value is called the Observation value.

As an example, in the previous footprint matrix, the value of \(a \rightarrow b\) is equal to 26. Therefore, \(a \rightarrow b\) is a sample of the population. The value of this sample is considered as an observation.

The value of each observation will be used in the extraction of a statistically stable state. Needless to say, there are 31 edges in this example and accordingly 31 observation values.

To understand the statistical behavior of each sample, we need to evaluate it by considering other samples’ behaviors. To do so, we define the Moving range (MR).

Definition 3

Moving Range (MR): MR represents the difference from one observation (\(x_{ij}\)) to the next.

For instance, \(x_{ab} = 26\) and \(x_{ac} = 34\); therefore, the MR from \(x_{ab}\) to \(x_{ac}\) is equal to ‘8’. The order is based on the sequence in which events are recorded.

The value of MR helps to consider the variations in samples’ behaviors.

Definition 4

Average behavior (\({\bar{x}}\)): As expected, this value represents the average of all the samples’ (edges’) values.

$$\begin{aligned} \bar{x}= \frac{x_1+ x_2+x_3+...+x_n}{n} \end{aligned}$$
(4)

Definition 5

Average of Moving Range (\({\bar{MR}}\)): While the value of each edge changes from one to another, \(\bar{MR}\) gives a value representing the average of the variations.

$$\begin{aligned} \bar{MR}= \frac{MR_{1}+MR_{2}+MR_{3}+ \dots + MR_{n}}{n} \end{aligned}$$
(5)

To calculate the value of \(\bar{MR}\) for the running example (L), we first sorted the observation values. Then, the average behavior \(\bar{x}\) was calculated. Finally, the MR values, i.e., the changes between the values of edges, were calculated, leading to the average of the moving ranges (\(\bar{MR}\)). The results of these actions are shown in Table 1.

Table 1 Presenting the value of observations and the average Moving Range (\(\bar{MR}\))

Accordingly, the average value of variations is \(\bar{MR} = 10.51\). The average value for the recorded behavior of edges is \(\bar{x} = 14.225\).
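The value of \(\bar{x}\) can be reproduced directly from the non-zero entries of the footprint matrix above. The sketch below (ours, for illustration) lists the 31 observation values and averages them; \(\bar{MR}\) additionally depends on the order in which the edges are taken, which Table 1 fixes, so it is not recomputed here.

```python
# Non-zero entries of the running example's footprint matrix, row by row.
observations = [
    26, 34,                   # a -> b, c
    46, 2, 5, 4,              # b -> c, f, i, k
    31, 48, 5, 9, 4, 10, 6,   # c -> b, d, e, f, h, i, j
    14, 17, 23,               # d -> e, g, l
    5, 4, 20,                 # e -> c, d, l
    15, 2,                    # f -> c, d
    1, 5, 11,                 # g -> e, h, l
    4, 5,                     # h -> c, e
    9, 6,                     # i -> c, f
    6,                        # j -> l
    4,                        # k -> e
    60,                       # l -> m
]
x_bar = sum(observations) / len(observations)
print(len(observations), round(x_bar, 3))   # 31 14.226 (reported as 14.225)
```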

Definition 6

Central Line-edges (CL.edges): This central threshold determines the most statistically stable edges (samples). The closer the value of an edge is to this line, the more likely the edge is to be seen in future behaviors.

As shown in equation 6, the most stable behaviors are close to the average behavior.

$$\begin{aligned} CL.edges = \bar{x} = \frac{x_1 + x_2 + x_3 + ...+ x_n}{n} \end{aligned}$$
(6)

Definition 7

Upper Control Limit-edges (UCL.edges): This threshold determines the limit up to which an edge’s behavior is considered stable. If an edge value passes this threshold, the edge is considered to have a high variation ratio in its behavior.

Equation 7 presents the mathematical model of UCL. A standard constant, ‘\(d_2\)’, appears here. The value of ‘\(d_2\)’ is derived through a series of calculus operations that lead to fixed constant values; \(d_2\) depends on the number of monitored samples in a population, and its values are tabulated in most statistics handbooks [28].

$$\begin{aligned} UCL.edges = \bar{x}+ 3 \left(\frac{\bar{MR}}{d_2}\right) \end{aligned}$$
(7)

Definition 8

Lower Control Limit-edges (LCL.edges): The LCL value ascertains the unstable behaviors. Edges with values below the LCL are not expressed in the discovered process model. This threshold automatically removes the so-called “dirt roads” from a process model.

Equation 8 refers to the mathematical model for extracting LCL value.

$$\begin{aligned} LCL.edges = \bar{x}- 3\left( \frac{\bar{MR}}{d_2}\right) \end{aligned}$$
(8)

For 31 samples in the population, \(d_2\) is equal to 4.113 [27]. Finally, the state of each edge will be determined according to the previous equations 6, 7, and 8. This procedure is also shown in algorithm 1.

  • \(UCL = 14.225 + (3)\left(\frac{10.51}{4.113}\right) \approx 22\)

  • \(CL = 14.225\)

  • \(LCL = 14.225 - (3)\left(\frac{10.51}{4.113}\right) \approx 7\)
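These three values follow directly from equations 6, 7, and 8. The short check below (ours) plugs in the reported \(\bar{x}\), \(\bar{MR}\), and \(d_2\):

```python
x_bar, mr_bar, d2 = 14.225, 10.51, 4.113   # values reported for the running example

three_sigma = 3 * (mr_bar / d2)
ucl = x_bar + three_sigma   # equation 7
lcl = x_bar - three_sigma   # equation 8
print(round(ucl, 2), round(lcl, 2))   # 21.89 6.56, i.e., roughly 22 and 7
```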

After detecting the statistical stability thresholds, we redefine the descriptive reference process model in the following.

Definition 9

Descriptive reference process model V2 (\(\mathcal {P}\)): The updated definition of the descriptive reference process model, or the common behaviors, contains the activities (\(\mathcal {A}\)) and edges (\(\mathcal {E}\)) that respect the following conditions:

$$\begin{aligned}&[[\forall \quad \mathcal {A} \quad \exists \quad s_a] \wedge [\forall \quad s_a \subseteq \mathcal {S}_a \quad \exists \quad \bar{x}_{s_a}]] \wedge \nonumber \\&[[\forall \quad \mathcal {E} \quad \exists \quad s_e ] \wedge [\forall \quad s_e \subseteq \mathcal {S}_e \quad \exists \quad x_{s_e}]] \nonumber \\&\therefore \nonumber \\&((\mathcal {A}, \mathcal {E}) \quad \in \quad \mathcal {P}) \rightarrow [ LCL< \bar{x}_{s_a}< UCL] \cup [ UCL \le \bar{x}_{s_a} ] \wedge \nonumber \\&[ LCL.edge< x_{s_e} < UCL.edge] \cup [ UCL.edge \le x_{s_e} ] \end{aligned}$$
(9)

This definition—equation 9—expresses the conditions for the sets of activities (\(\mathcal {A}\)) and edges (\(\mathcal {E}\)) to be included in the descriptive reference process model (\(\mathcal {P}\)).

As shown, for every activity (\(\mathcal {A}\)), there exists a sample (\(s_a\)) that represents the behavior of that activity. The value \(\bar{x}_{s_a}\) gives the average behavior of each activity.

Additionally, the behavior of each edge (\(\mathcal {E}\)) is represented by a sample with the value \(x_{s_e}\). Therefore, a set consisting of an activity and its edges is within the descriptive reference process model (\(\mathcal {P}\)) if the average activity behavior (\(\bar{x}_{s_a}\)) is between the UCL and LCL. If \(\bar{x}_{s_a}\) is greater than the UCL, the activity is considered an activity with high instability.

Simultaneously, edges (and their linked activities) are within \(\mathcal {P}\) if their values (\(x_{s_e}\)) fall between the two thresholds UCL.edge and LCL.edge. If \(x_{s_e}\) is greater than UCL.edge, the edge is likewise considered an edge with high instability.

Note that the new evaluation of UCL.edges, LCL.edges, and CL.edges runs in parallel with the calculations of the first version of the algorithm. Essentially, this branch of calculations was added to remedy a previous shortcoming of the algorithm. The result of applying the new algorithm is presented in Fig. 8. Needless to say, the behavior of activities was discovered using the first version of the algorithm [10, 12], which focused on activities’ behaviors; to keep the explanation simple, we do not re-explain that work here. In the following, we compare the results of applying the stable heuristic miner 1 & 2 and the classic heuristic miner to the running example. This procedure is also shown in algorithm 2.

Algorithm 1
figure b

Stable Heuristic Miner V2

Algorithm 2
figure c

Stable Heuristic Miner V2
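To illustrate what the algorithms compute at the edge level, the sketch below classifies a few edges of the running example against the thresholds obtained earlier (LCL ≈ 7 and UCL ≈ 22; 6.56 and 21.89 before rounding). Function and variable names are our own, and the real algorithm additionally combines this with the activity-level classification.

```python
def classify_edges(footprint, lcl, ucl):
    """Split edges into stable, unstable (>= UCL), and removed (<= LCL)."""
    stable, unstable, removed = [], [], []
    for (a, b), value in footprint.items():
        if value >= ucl:
            unstable.append((a, b))   # kept in the model, flagged (red)
        elif value > lcl:
            stable.append((a, b))     # the reference ("common") behavior
        else:
            removed.append((a, b))    # "dirt roads", dropped automatically
    return stable, unstable, removed

# A subset of the running example's edges and their observation values.
footprint = {("a", "b"): 26, ("a", "c"): 34, ("b", "c"): 46, ("b", "f"): 2,
             ("c", "d"): 48, ("c", "f"): 9, ("d", "g"): 17, ("l", "m"): 60}
stable, unstable, removed = classify_edges(footprint, lcl=6.56, ucl=21.89)
print(stable)     # [('c', 'f'), ('d', 'g')]
print(removed)    # [('b', 'f')]
```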

3.2 Evaluating the Method on the Running Example

Figures 3, 4, 5, and 6 present the results of the classic heuristic miner on the running example. The thresholds determining the level of the dependency measures for the activities in the event log are set in an arbitrary manner, which is considered a disadvantage of this algorithm [7, 8].

Fig. 3
figure 3

The process model extracted by the classic heuristic miner approach with a manual threshold set at 20%. Accordingly, the model presents activities that have a dependency measure higher than 20%. Only 67% of the recorded behaviors respect this threshold

Fig. 4
figure 4

The process model extracted by the classic heuristic miner approach with a manual threshold set at 80%. Accordingly, the model presents activities that have a dependency measure higher than 80%. Only 45% of the recorded behaviors respect this threshold

First, we applied well-known quality criteria for evaluating process discovery: precision, fitness, and complexity, in both the size and the behavior of the model [29, 30]. Previous research provides concrete definitions of these criteria [31]. Precision evaluates the ability of the discovered model to reproduce exactly what has been found in the original event log. This metric ranges between 0 and 1, where the maximum value of 1 means that the model can only generate sequences already seen in the event log. Fitness measures the degree to which the behavior recorded in the event log is represented by the process model.

Complexity evaluates how difficult it is for humans to analyze a process model; simplicity is simply its opposite. One could argue that the complexity and simplicity criteria are subjective, making it difficult to quantify a quality metric in this way. In this regard, the new algorithm effectively stabilizes the metrics across the four quality dimensions.

The extracted results were evaluated with these metrics, as depicted in Table 2. Looking at this evaluation and the corresponding criteria, choosing one value to represent the reference behavior would be arbitrary. Therefore, selecting the descriptive reference process model solely on the basis of the traditional process discovery criteria poses a challenge.

Accordingly, the new application of statistical stability ensures the detection of a stable amount of information from an event log. This task is carried out by reactive thresholds, determined using statistical stability methods. Therefore, with the stable heuristic miner algorithm, it is no longer necessary to select threshold values arbitrarily. As a result, activities and edges with insignificant behaviors are removed. A common path presents the stable behaviors in an event log, and the most variant behaviors are detected as well. As stated before, these behaviors, shown in red, could indicate issues in the process.

The descriptive reference process model extracted by the new algorithm is more explanatory in comparison with the classic heuristic miner and the previous stable heuristic miner. This is due to the automatic detection of stable and unstable behaviors for both edges and activities.

Fig. 5
figure 5

The process model extracted by the classic heuristic miner approach with a manual threshold set at 90%. Accordingly, the model presents activities that have a dependency measure higher than 90%. Only 25% of the recorded behaviors respect this threshold

Fig. 6
figure 6

The process model extracted by the classic heuristic miner approach with manual configuration of thresholds. The value for the threshold is set at 0, therefore, the model shows all the registered behaviors

Table 2 Presents the evaluation of process discovery results of the running example considering the quality criteria mentioned in the literature
Fig. 7
figure 7

The process model extracted by the first version of the stable heuristic miner. Red activities have shown a high level of variation in their behavior according to the algorithm

Fig. 8
figure 8

The model extracted by the new version of the stable heuristic miner. Red edges and activities have shown a high level of variations in their behavior according to the algorithm

4 Case Studies: Discussion and Evaluation

A pragmatic approach is selected to assess the capability of the new algorithm, based on the following real-world scenarios:

  • LivingLabHospital_Interpreted Location event logs [15]: This location event log is published with this article for the first time. It contains the events related to the movements of patients inside the urology department of the Toulouse hospital facility in the south of France. The data were recorded by the authors to monitor and improve patients’ pathways using location data and process mining. Sharing these data enables a novel analysis that can compare the results of both versions of the algorithm. We focus on analyzing patients in the urology department. To better understand how the data were interpreted and prepared for process discovery actions, we refer to the works in [32, 33].

  • Sepsis Data set [34]: This event log consists of sepsis cases in a hospital: 1000 cases with a total of 15,000 events covering 16 different activities.

4.1 Comparing the Results of Different Versions of the Algorithm Based on Sepsis Data

At first, we analyzed this dataset with the classic heuristic miner. Figures 9, 10, 11 present the results. As depicted there, the classic heuristic miner requires an arbitrary selection of thresholds, and it is not efficient in extracting the reference process model.

On the other hand, Fig. 12 presents the outcome obtained by applying the initial version of the stable heuristic miner algorithm to the sepsis data set. As mentioned, this data set contains the behaviors of 16 different activities. Nonetheless, the process discovery result shows that all of these activities exhibit stable behaviors.

Fig. 9

Visualizing the sepsis process model by using the classic heuristic miner and applying a dependency threshold of 15%. The threshold value is chosen arbitrarily, as the approach of the algorithm offers no principled way to set it

Fig. 10

Visualizing the sepsis process model by using the classic heuristic miner and applying a dependency threshold of 50%. The threshold value is chosen arbitrarily, as the approach of the algorithm offers no principled way to set it

Fig. 11

Visualizing the sepsis process model by using the classic heuristic miner and applying a dependency threshold of 80%. The threshold value is chosen arbitrarily, as the approach of the algorithm offers no principled way to set it

Within this assessment, Fig. 13 depicts the outcome of the new version of the stable heuristic miner algorithm. Based on this result, the edges related to four activities exhibited insignificant behaviors in the statistical population. These activities were ‘Release B’, ‘Release C’, ‘Release D’, and ‘Release E’; consequently, they have been removed. No activities show high instability.

Moreover, according to Fig. 13, several edges show high instability in their behaviors. Such instability could be the cause of one or several inefficiencies. For instance, consider the ER Triage activity; its outgoing edges are:

  • ER Triage \(\rightarrow\) CRP: 52 (frequency)

  • ER Triage \(\rightarrow\) Leucocytes: 29 (frequency)

  • ER Triage \(\rightarrow\) LacticAcid: 48 (frequency)

  • ER Triage \(\rightarrow\) ER Sepsis Triage: 905 (frequency)

One edge (ER Triage \(\rightarrow\) ER Sepsis Triage) shows higher instability and more abnormal behavior than the others; it is marked with a ‘red’ color code in Fig. 13. Unstable behavior may signal underlying issues, so with this updated algorithm we can pinpoint potential bottlenecks and inefficiencies in the process, an analysis beyond the reach of the classic heuristic miner and, to our knowledge, of conventional methods in general. Assessing event log stability yields a more informative process model, useful for the automatic diagnosis of process deviations and for concept drift detection. Note that the incoming and outgoing frequencies of edges differ; this is due to the removal of insignificant edges (edges with values lower than the LCL). Furthermore, the stable heuristic miner algorithm efficiently handles large event logs, such as the sepsis data, thanks to statistical methods suited to large datasets.
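The flagging step for edges such as ER Triage \(\rightarrow\) ER Sepsis Triage can be sketched as follows. This is a minimal illustration, assuming a c-chart-style upper control limit (mean + 3·√mean) over an activity's outgoing edge frequencies; the concrete statistic and limits used by the stable heuristic miner are defined by the algorithm itself, so the function name and the 3-sigma rule here are assumptions for this sketch (the removal of insignificant edges below the LCL is analogous and not shown).

```python
from math import sqrt

def flag_unstable_edges(edge_freqs, sigma=3.0):
    """Flag edges whose frequency exceeds a c-chart-style upper
    control limit, UCL = mean + sigma * sqrt(mean).
    `edge_freqs` maps a target activity to its observed frequency."""
    mean = sum(edge_freqs.values()) / len(edge_freqs)
    ucl = mean + sigma * sqrt(mean)
    return [target for target, freq in edge_freqs.items() if freq > ucl]

# Outgoing edges of ER Triage in the Sepsis data (frequencies as listed above)
er_triage = {"CRP": 52, "Leucocytes": 29, "LacticAcid": 48,
             "ER Sepsis Triage": 905}
print(flag_unstable_edges(er_triage))  # ['ER Sepsis Triage']
```

With these frequencies the mean is 258.5 and the limit about 306.7, so only the ER Sepsis Triage edge is flagged, matching the red edge in Fig. 13.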

Fig. 12

Result of the stable heuristic miner v1 on the Sepsis data set

Fig. 13

Result of the new stable heuristic miner v2 algorithm on the Sepsis data set

4.2 Comparing the Results of Different Versions of the Algorithm Based on Urology Data

As indicated earlier, this case study was designed to detect inefficiencies in patients’ pathways. Indoor localization systems were used to collect this data.

Similarly, in this case study we first evaluated the results of the classic heuristic miner. Figures 14, 15, 16 depict the results. As expected, the results failed to provide a reference process model grounded in solid mathematical logic.

On the contrary, Fig. 17 presents the effort to detect a reference model by the first version of the stable heuristic miner algorithm. Out of the 14 activities recorded in the event log, 10 appear in the result of the first version of the algorithm. Table 3 summarizes this observation.

Fig. 14

Visualizing the urology process model by using the classic heuristic miner and applying a dependency threshold of 15%. The threshold value is chosen arbitrarily, as the approach of the algorithm offers no principled way to set it

Fig. 15

Visualizing the urology process model by using the classic heuristic miner and applying a dependency threshold of 50%. The threshold value is chosen arbitrarily, as the approach of the algorithm offers no principled way to set it

Fig. 16

Visualizing the urology process model by using the classic heuristic miner and applying a dependency threshold of 85%. The threshold value is chosen arbitrarily, as the approach of the algorithm offers no principled way to set it

Table 3 Comparing the number of observed behaviors in the event log with the result of the stable heuristic miner V1
Fig. 17

The discovered model of the urology department by the stable heuristic miner V1. The statistical stability method is applied only to activities’ behaviors

Fig. 18

The discovered model of the urology department by the stable heuristic miner V2. The statistical stability methods are applied to both activity and edge behaviors

Eight of the ten activities expressed stable behaviors, and two showed high instability in comparison with the total number of recorded behaviors. However, the statistical stability of edges is not addressed here.

Figure 18 illustrates the result of the second version of the stable heuristic miner algorithm by using the urology patients’ data.

By means of this algorithm, not only unstable activities but also unstable edges are discovered. Now, by considering the “Registration” activity, it is possible to recognize the source of instability in this department.

The high level of fluctuation caused by this activity is seen in the edge between “Registration” and “Reception_ Waiting_ room”. The value of this edge is 57, which is higher than the normal value for all the other registered behaviors. As a result, it puts the process in an unstable state.

The same behavior causes instability for the edge between the “Waiting_ Room5” and “Box Consultation” activities.

Detection of these deviating behaviors could help the domain experts in highlighting the causes of inefficiencies in patients’ pathways.

It is important to stress that acquiring such a diagnosis is only feasible if experts are assured of a mathematical logic behind the process discovery, a logic that can lead to the extraction of the descriptive reference process model of patients’ pathways. Based on our extensive experiments monitoring patients’ activities, it was evident that this outcome was not attainable using the classic heuristic miner algorithm.

Moreover, the novel approach of the stable heuristic miner algorithm helped experts to detect deviating behaviors automatically and to capture an image of what patients normally do, even when the domain experts did not have complete knowledge of the process.

In the past, domain experts were required to possess an in-depth understanding of the process, as the discovered model had to be manually filtered. However, given the intricate nature of complex healthcare processes, this task posed significant challenges.

5 Conclusion

Heuristic-based process discovery methods showed promising results in the literature on process mining. For instance, based on the literature review in [1, 3], the classic heuristic miner algorithm has been identified as the most applied process discovery algorithm in the healthcare sector. The heuristic-based methods discover an initial set of activities and edges according to dependency measures and then prune the set according to arbitrarily defined thresholds. This has been identified, in this article and in the literature, as a scientific challenge leading to the extraction of a non-optimal set of activities and edges.
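To make the criticized step concrete, the classic heuristic miner computes the dependency measure between activities \(a\) and \(b\) from the direct-succession counts \(|a>b|\) and \(|b>a|\) as \((|a>b| - |b>a|)/(|a>b| + |b>a| + 1)\), and then keeps or prunes each edge against a user-chosen threshold. The sketch below illustrates this; the threshold parameter is exactly the arbitrary value the stable heuristic miner seeks to eliminate, and the function names are for illustration only.

```python
def dependency(a_to_b: int, b_to_a: int) -> float:
    """Classic heuristic miner dependency measure:
    (|a>b| - |b>a|) / (|a>b| + |b>a| + 1), in the open interval (-1, 1)."""
    return (a_to_b - b_to_a) / (a_to_b + b_to_a + 1)

def keep_edge(a_to_b: int, b_to_a: int, threshold: float) -> bool:
    # The threshold is user-chosen -- the arbitrary filtering step
    # that makes the discovered model depend on manual configuration.
    return dependency(a_to_b, b_to_a) >= threshold

# 905 direct successions a -> b and none in reverse give a value near 1
print(round(dependency(905, 0), 4))  # 0.9989
print(keep_edge(905, 0, 0.90))       # True
print(keep_edge(10, 8, 0.90))        # False (dependency ~ 0.105)
```

A frequent but weakly directed relation (10 vs. 8 successions) scores low and is pruned at a 90% threshold, while a slightly different threshold would yield a different model: the source of the arbitrariness discussed above.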

Based on the hypothesis of this article, evaluating the statistical stability in an event log could lead to the discovery of a descriptive reference process model. Eventually, this evaluation could address the mentioned scientific gap. This hypothesis is mainly driven by the definition of the statistical stability phenomenon, which addresses the question of “how one can represent the reference behavior of a complex system with emerging properties”. Such a workflow could be used as one of the building blocks of data-driven simulation models.

The first version of the stable heuristic miner algorithm introduced the evaluation of statistical stability in an event log. However, the assessment was carried out only by considering activities’ behaviors. This has been mentioned as a limitation of the previous version of the algorithm.

In this article, we presented a new version of the stable heuristic miner. We redefined the descriptive reference process model and presented an assessment of this algorithm by using two real-life event logs. Thanks to this algorithm, we were able to extract a more comprehensive and informative descriptive reference process model. The discovered results showed the detection of unstable and insignificant behaviors for both activities and the edges among them. Accordingly, users can obtain a stable amount of information from an event log without the need to manually and arbitrarily modify a filtering value. The obtained model can also reach a stable state among the four traditional process discovery quality dimensions (i.e., fitness, precision, generalization, and simplicity/complexity).

5.1 Limitations and Challenges

It is a demanding task to evaluate the results of a process discovery algorithm. Some researchers used conformance-checking-based methods to evaluate criteria such as fitness, simplicity, and generality [1]. These criteria have been applied in this article for the running example; they evaluate the correlation between the information recorded in the event log and the discovered model. However, they are not the most effective metrics for measuring statistical stability results. The nature of this work is the assessment of statistical stability criteria, and methods and definitions are presented accordingly to examine and eventually discover the behavior registered in event logs. Assessing statistical stability through conformance-checking-based criteria is therefore not best practice. Similar issues and concerns have been raised in other research works as well [7, 31, 35]. We aim to address this challenge in our future research, with the objective of developing a generally applicable evaluation approach.

5.2 Future Perspective

We plan to focus on devising a set of quantitative evaluation criteria for this algorithm rather than a qualitative approach. Moreover, this algorithm has potential for the automatic diagnosis of, and reasoning about, business processes; accordingly, we aim to improve the task of process discovery by developing a knowledge-driven approach. It would also be beneficial to integrate procedural semantics into the representation of process discovery results rather than declarative semantics. Another avenue of research is the application of this algorithm to the discovery of a data-driven simulation model.