
1 Introduction

The execution of business processes on information systems is recorded in event logs. Process mining involves the discovery, conformance checking and enhancement of processes starting from event logs. A process discovery algorithm is a function that maps an event log onto a process model, such that the model is “representative” for the behavior seen in the event log [1]. Several algorithms have been developed for discovering process models that reflect the actual execution of processes. These models allow business analysts to perform performance evaluation, anomaly identification and compliance checking, among other analyses.

Noise, duplicate tasks, hidden tasks, non-free choice constructs and loops are typical problems for discovery algorithms [2]. These problems are related to unstructured processes, commonly present in real environments [3]. Thus, algorithm performance depends on the characteristics of the event log and of its associated process.

Several executions of different discovery algorithms may be required to obtain a quality model, which makes model discovery a time-consuming and error-prone task. Selecting the right algorithm is hard due to the variety of variables involved. Several techniques execute different discovery algorithms on an event log and evaluate the resulting models using quality metrics [4]. Nevertheless, this empirical evaluation approach is computationally expensive and time-consuming.

On the other hand, a recommendation technique based on regression has been proposed, but it requires reference models [5]. Reference models are not commonly available in contexts where process discovery is required, and even when they exist, it is unwise to assume that they reflect the actual execution of the processes.

Studies that attempt to establish which algorithms perform better under certain conditions have been published using the aforementioned empirical evaluation techniques [6]. However, the impact of each condition on model quality is not yet clearly defined. Therefore, the practical use of these studies remains limited.

The aim of this paper is to evaluate the usefulness of an approach based on classification to recommend discovery algorithms [7]. A knowledge base is constructed considering event log features such as control-flow patterns, invisible tasks and infrequent behavior (noise). The recommendation procedure based on classification is tested over the knowledge base with different classifiers.

The paper is structured as follows: in the next section, concepts and approaches related to process discovery and the recommendation of discovery algorithms are presented. In Sect. 3, the phases of the recommendation procedure are presented, followed by a description of how the knowledge base was created. In Sect. 4, the experimental results of the classification-based recommendation and their analysis are provided. Current techniques and approaches to assess and recommend discovery algorithms are discussed in Sect. 5. Finally, the last section is devoted to conclusions and outlines future work.

2 Recommendation of Process Discovery Algorithms

A process discovery algorithm constructs a process model starting from an event log. An event is the occurrence of an activity of a process, and a trace is a non-empty finite sequence of events recorded during one execution of such a process. Thus, an event log is a multiset of traces belonging to different executions of the same process.
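As a minimal illustration of these definitions (our sketch, not part of the original formalization), an event log can be represented as a multiset of traces, where each trace is a sequence of activity names:

```python
from collections import Counter

# A trace is a non-empty finite sequence of events (here, activity names).
# An event log is a multiset of traces: the same trace may occur several times.
traces = [
    ("register", "check", "approve", "archive"),
    ("register", "check", "reject", "archive"),
    ("register", "check", "approve", "archive"),  # repeated execution of the same variant
]

event_log = Counter(traces)  # multiset: trace -> number of occurrences
for trace, frequency in event_log.items():
    print(f"{frequency}x {' -> '.join(trace)}")
```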

Obtaining a quality model is the main goal of a process discovery algorithm. There are various metrics and approaches for estimating process model quality, though there is a consensus on the following quality criteria [1]:

  • Fitness: The model should allow the behavior present in the event log.

  • Precision: The model should not allow behavior unrelated to that recorded in the log.

  • Generalization: The model should generalize the behavior present in the log.

  • Simplicity: The model should be as simple as possible. Also referred to as structure, this criterion is influenced by the vocabulary of the modeling language.

These criteria compete with one another due to the inverse relationship between generalization and precision. A model that is too general allows much more behavior than that present in the log; such a model is known as an underfitting model. Conversely, a model that is too precise, i.e., an overfitting model, is also undesirable. The right balance between overfitting and underfitting is called behavioral appropriateness. The structural appropriateness of a model refers to its ability to clearly reflect the recorded behavior with the minimal possible structure [8]. A quality model requires both behavioral and structural appropriateness [9].
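To make the fitness/precision trade-off concrete, the following simplified, set-based sketch (ours; not the token-replay fitness or ETC precision metrics used later in this paper) compares an underfitting and an overfitting model against a small log:

```python
# Observed behavior: the set of distinct traces in the log.
log_traces = {("a", "b", "c"), ("a", "c", "b")}

# Behavior allowed by two hypothetical models, expressed as sets of traces.
underfitting_model = {("a", "b", "c"), ("a", "c", "b"), ("a", "b", "b"), ("c", "a", "b")}
overfitting_model = {("a", "b", "c")}

def naive_fitness(log, model):
    """Fraction of observed traces that the model allows."""
    return len(log & model) / len(log)

def naive_precision(log, model):
    """Fraction of allowed traces that were actually observed."""
    return len(log & model) / len(model)

for name, model in [("underfitting", underfitting_model), ("overfitting", overfitting_model)]:
    print(name, naive_fitness(log_traces, model), naive_precision(log_traces, model))
# underfitting: fitness 1.0, precision 0.5 -> allows too much behavior
# overfitting:  fitness 0.5, precision 1.0 -> forbids observed behavior
```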

In order to obtain a quality model, a discovery algorithm should tackle several challenges related to event log characteristics. The heterogeneity of data sources in real environments can lead to difficult cases for discovery algorithms [10]. Infrequent traces and data recorded incompletely and/or incorrectly can induce wrong interpretations of process behavior. Moreover, data produced by parallel branches and ad-hoc changed instances generate complex sequences in event logs, creating traces that are harder to mine.

Process structure is another source of challenges for discovery algorithms. The presence of control-flow patterns such as non-free choice, loops and parallelism affects discovery algorithms. For example, algorithms such as \(\alpha\), \(\alpha^{+}\), \(\alpha^{\#}\) and \(\alpha^{*}\) do not support non-free choice constructs [11]. On the other hand, DWS Mining and \(\alpha^{++}\) can deal with non-free choice but cannot handle loops [2].

In order to identify which discovery algorithm allows obtaining suitable models in particular situations, a set of techniques for algorithm evaluation has been developed. The performance of these algorithms is determined by evaluating the quality of the obtained models. The defined quality metrics are grouped into two main methods [12]. One method, called model-log, compares the discovered model against the event log. The other method, called model-model, assesses the similarity between the discovered model and a reference model of the process.

Evaluation frameworks allow end users to compare the performance of discovery algorithms through empirical evaluation with quality metrics [4, 13]. However, recommending a discovery algorithm for a given event log based on empirical evaluation involves time and resource consumption for each of the algorithms chosen as a possible solution. Therefore, alternative approaches based on classification have been proposed to recommend process discovery algorithms [5, 7].

3 Classification of Event Logs for Recommendation of Process Discovery Algorithms

In this section we evaluate the usefulness of an approach based on classification to recommend discovery algorithms [7]. Classification is the problem of constructing a procedure that is applied to a continuing sequence of cases, in which each new case must be assigned to one of a set of pre-defined classes on the basis of observed attributes or features [14]. An event log for which a discovery algorithm must be recommended is considered a new case to be classified. The recommended algorithm is the pre-defined class to be assigned to the event log based on its observed features.

Taking into account the challenges for discovery algorithms, the classification mechanism for recommending discovery algorithms considers the following factors:

  1. The event log is the main information source available in all environments for process characterization.

  2. The peculiarities of the event log must be considered in addition to process characteristics.

  3. The results obtained by quality metrics on discovered models provide information about the performance of process discovery algorithms in the face of event log and process characteristics.

Fig. 1. Recommendation of discovery algorithms through event log classification

The stages for classifying event logs in order to recommend process discovery algorithms can be observed in Fig. 1. It shows that the starting point is the new case to be classified. This new case is composed of the event log and process features that affect discovery algorithms, together with the desired values of the quality metrics for each quality criterion. The class is the discovery algorithm that could discover a model for that log with the desired values of the quality metrics. The classifiers are trained on a knowledge base composed of cases with the same structure as the aforementioned case, but labeled with the corresponding discovery algorithm. The discovery algorithm selected as the class for the new case is recommended to be applied on the new event log in order to obtain a process model with the specified quality values.

3.1 Building the Knowledge Base

A major challenge for this classification problem is building the knowledge base. In order to construct the knowledge base, the following features were selected to build the cases:

  1. The event log features.

  2. The process characteristics.

  3. The discovery algorithm used to obtain the process model.

  4. The values of the quality metrics obtained on the discovered model.

The sequence of phases followed to generate the artificial cases for the knowledge base is presented in Fig. 2. The outcome of each phase is used as input to the following.

Fig. 2. Generation of artificial cases for the knowledge base

The goal of the first and second phases is to obtain the event log and process features that affect discovery algorithms. The Process Log Generator tool [15] was used in these two phases. Several process models were randomly generated using this tool, and 67 of them, combining loops, non-free choice and invisible tasks, were selected for the second phase. These process features were considered due to their impact on process discovery algorithms [6].

In order to generate the event logs, a complete factorial experimental design was performed. For this factorial design five factors were considered: noise (\(C_1\)), noise interval (\(C_2\)), loops (\(C_3\)), parallelism (\(C_4\)) and invisible tasks (\(C_5\)). Noise can appear in different proportions of the traces of an event log; in this case five proportions were used: 0, 25, 50, 75 and 100. The same holds for the noise interval, where the values 0, 25, 50, 75 and 100 were used. Control-flow patterns (\(C_3\), \(C_4\)) and invisible tasks (\(C_5\)) were considered in a boolean manner. Formula 1 was used to calculate the number of event logs required to combine the aforementioned criteria.

$$\begin{aligned} F&= C_1 \cdot C_2 \cdot C_3 \cdot C_4 \cdot C_5 \\ &= 5 \cdot 5 \cdot 2 \cdot 2 \cdot 2 = 200 \end{aligned}$$
(1)

Considering the result of Formula 1, 201 event logs were generated combining the five features already stated. One third of these event logs have 500 traces each, the second third have 1000 traces each, and the last third have 1500 traces each.
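The factor combinations of this design can be enumerated directly; the following sketch (ours, with illustrative variable names) reproduces the count of Formula 1:

```python
from itertools import product

# Factor levels of the complete factorial design (Formula 1).
noise_proportion = [0, 25, 50, 75, 100]   # C1
noise_interval   = [0, 25, 50, 75, 100]   # C2
loops            = [False, True]          # C3
parallelism      = [False, True]          # C4
invisible_tasks  = [False, True]          # C5

combinations = list(product(noise_proportion, noise_interval,
                            loops, parallelism, invisible_tasks))
print(len(combinations))  # 5 * 5 * 2 * 2 * 2 = 200 factor combinations
```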

In the third phase, discovery algorithms are applied to the generated event logs. Although there are several discovery algorithms, only five were selected for this knowledge base. The first selection criterion was that the algorithm could discover a model in Petri net notation, or in another notation that could be translated into a Petri net. This requirement is related to the available quality metrics, which can only be applied to Petri net models. The performance of discovery algorithms on real and artificial event logs was also considered.

Therefore, published results about the assessment of discovery algorithms [2, 4, 6, 11] led us to select the Heuristic Miner [16], ILP Miner [17], Inductive Miner [18], Genetic Miner [19] and Alpha Miner [20]. All these algorithms are available as plug-ins in the process mining framework ProM [21]. Thus, ProM 6.3 was used to obtain five process models from each event log, one per selected algorithm.

The main goal of the last phase in the generation of artificial cases for the knowledge base is to evaluate the performance of the discovery algorithms. One quality metric was selected for each quality dimension or criterion: fitness [22], ETC [23], ARC Average [22] and Behavioral Generalization [24]. All these metrics follow the model-log method and are implemented in CoBeFra [25], a benchmarking tool. Therefore, using the generated event logs and the discovered models, the values of these quality metrics were obtained with CoBeFra.

Once all the phases were executed, the information obtained was used to create the cases. One case was created for each discovered model. Each case is represented as a vector \(c_i=\{at_i,aa_i,and_i,xor_i,l_i,it_i,nd_i,ni_i,f_i,p_i,g_i,s_i, DA\}\). In this vector, i refers to the ordinal number of the discovered model. The \(at_i\) and \(aa_i\) variables stand for the number of traces and the number of activities in the event log used to discover the i-th model. Moreover, \(and_i\), \(xor_i\), \(l_i\) and \(it_i\) refer to the number of parallel constructs, exclusive choices, loops and invisible tasks, respectively, in the process related to the i-th model. The noise distribution and the noise interval of the event log used to discover the i-th model are represented as \(nd_i\) and \(ni_i\). Variables \(f_i\), \(p_i\), \(g_i\) and \(s_i\) express the values of the quality metrics obtained on the i-th model, related to fitness, precision, generalization and simplicity, respectively. Last but not least, DA is the discovery algorithm used to create the i-th model, and it is the class that labels the case.
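A single case can be sketched as a plain record, as below (our illustration; the field values are hypothetical and the quality fields are simply named after their criteria):

```python
from dataclasses import dataclass, astuple

@dataclass
class Case:
    at: int    # number of traces in the event log
    aa: int    # number of activities in the event log
    and_: int  # parallel (AND) constructs in the process
    xor: int   # exclusive (XOR) choices in the process
    l: int     # loops in the process
    it: int    # invisible tasks in the process
    nd: float  # noise distribution (proportion of noisy traces)
    ni: float  # noise interval
    f: float   # fitness of the discovered model
    p: float   # precision of the discovered model
    g: float   # generalization of the discovered model
    s: float   # simplicity of the discovered model
    da: str    # class label: discovery algorithm used to obtain the model

example = Case(1000, 20, 2, 3, 1, 0, 25, 25, 0.97, 0.85, 0.80, 0.92, "Heuristic Miner")
features, label = astuple(example)[:-1], example.da  # feature vector and class label
```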

Following the aforementioned description, a knowledge base was constructed with 795 cases. A ProM plug-in was developed to visualize and manage the knowledge base. This plug-in allows integration with other techniques in ProM.

4 Testing Classifiers

In order to find suitable classifiers for the knowledge base built, a set of well-known classifiers was trained and assessed. Before training, the data set was normalized to values between 0 and 1. The training results for each classifier are presented in Table 1, expressed in terms of Incorrectly Classified Instances (ICI) and Mean Absolute Error (MAE). The WEKA [26] implementations of the classifiers were used in all cases, with default configuration values.
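The following sketch mirrors this training setup with scikit-learn instead of WEKA (an assumption on our part; the toy cases, the decision tree standing in for J48 and the cross-validation split are illustrative, not the authors' ProM plug-in):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier      # rough analogue of WEKA's J48
from sklearn.model_selection import cross_val_score

# Toy stand-in for the knowledge base: one row per case
# (at, aa, and, xor, l, it, nd, ni, f, p, g, s), labeled with the algorithm used.
X = np.array([
    [500, 12, 1, 2, 0, 0, 0, 0, 0.99, 0.90, 0.85, 0.95],
    [1000, 20, 2, 3, 1, 1, 25, 25, 0.95, 0.70, 0.80, 0.90],
    [1500, 18, 0, 4, 1, 0, 50, 50, 0.80, 0.60, 0.75, 0.85],
    [500, 15, 2, 2, 0, 1, 75, 75, 0.60, 0.40, 0.70, 0.80],
] * 10)  # duplicated only so cross-validation has enough rows
y = np.array(["Alpha Miner", "Heuristic Miner", "Inductive Miner", "ILP Miner"] * 10)

X_scaled = MinMaxScaler().fit_transform(X)            # normalize features to [0, 1]
accuracy = cross_val_score(DecisionTreeClassifier(), X_scaled, y, cv=5)
print("ICI:", 1.0 - accuracy.mean())                  # incorrectly classified instances
```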

Table 1. Results of classifiers training

Based on the results presented in Table 1, seven classifiers were selected: Classification Via Regression, Multilayer Perceptron, Simple Logistic, Logistic, J48, Filtered Classifier and MultiClass Classifier. A ProM plug-in was developed to integrate the WEKA implementation of these classifiers into ProM. Using this classification plug-in, the classifiers can be trained in ProM with the previously mentioned knowledge base. With the trained classifiers, the plug-in enables the recommendation of discovery algorithms through the classification of new cases.

Five new event logs were generated to assess the recommendations provided by the developed classification plug-in. The empirical evaluation of the discovery algorithms on these event logs was used as a reference for this assessment. Starting from the features of these event logs, five new cases were prepared (Table 2). Each case has the structure \(c_i=\{at_i,aa_i,and_i,xor_i,l_i,it_i,nd_i,ni_i,f_i,p_i,g_i,s_i, DA\}\).
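Continuing the earlier training sketch (and reusing its toy X, y and scaling, all illustrative), classifying such a new case yields the recommended algorithm:

```python
# Fit the scaler and a classifier on the (toy) knowledge base from the previous sketch.
scaler = MinMaxScaler().fit(X)
clf = DecisionTreeClassifier().fit(scaler.transform(X), y)

# New, unlabeled case: event log / process features plus the desired quality values.
new_case = [[1000, 22, 1, 3, 1, 0, 25, 25, 0.95, 0.80, 0.80, 0.90]]
recommended = clf.predict(scaler.transform(new_case))[0]
print("Recommended discovery algorithm:", recommended)
```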

Table 2. Execution times for empirical evaluation and classification of new event logs

Recommendation through classification yields a significant time improvement with respect to empirical evaluation, as can be seen in Table 2. Besides, for each event log (with the exception of event log 4) the class proposed by six of the seven classifiers matches the discovery algorithm with the best results in the empirical evaluation.

Heuristic Miner was the only algorithm with quality values distinct from 0 in the empirical evaluation of event log 4. This result matches the class obtained by only three classifiers; the remaining classifiers proposed Alpha Miner or Inductive Miner. The discovery algorithms were impacted by the high level of noise distribution in this event log (\(nd_4=100\)). Models obtained from this kind of log are incomprehensible and have very low values on the quality metrics. This situation could explain the mismatch of the classifiers on event log 4. Nevertheless, further experimentation with highly noisy event logs is required to test this hypothesis.

5 Related Work

Evaluation frameworks allow end users to compare the performance of discovery algorithms through empirical evaluation [13]. However, using such a framework as a recommendation mechanism is not suitable due to the cost involved in the empirical assessment of discovery algorithms. Following the model-log method, another evaluation framework that includes a parameter optimization step has been proposed [4]. Nevertheless, its negative-example generation caused serious performance problems in experiments with complex event logs [4].

Another proposed solution is based on selecting reference models of high quality and building from them a regression model to estimate the similarity of other process models [5]. However, this approach needs reference models for evaluation and prediction, a requirement that severely limits its application. In many real-world environments where discovery algorithms need to be applied, the process models are not described, or are inconsistent and/or incomplete. Besides, this solution assumes that the actual execution of the processes keeps a close relationship with their reference models, but inexact results can be expected in contexts where the features of the actual logs differ from logs artificially generated from the reference models. Furthermore, the construction of a regression model from process model features disregards issues such as noise and lack of information in event logs. These issues have a significant impact on the performance of discovery algorithms.

6 Conclusions and Future Work

Event log features and process characteristics affect the performance of process discovery algorithms. Classical approaches that select discovery algorithms based on empirical assessments are computationally expensive and time-consuming.

This paper evaluates the recommendation of discovery algorithms as a classification problem. For this purpose, a knowledge base with artificially generated cases was built. The cases combine features of event logs and process characteristics that impact the performance of discovery algorithms. Besides, each case contains the value of one quality metric for each quality criterion.

Two ProM plug-ins were developed that allow training seven well-known classifiers over the built knowledge base. The recommendations of these classifiers fully match, for four out of five event logs, the discovery algorithm with the best quality values in the empirical evaluation. In all cases, the recommendation through classification was obtained in significantly less time than through empirical evaluation.

Experimentation with highly noisy logs and multiple classifier systems is suggested as future work. Besides, research is required to apply the proposed approach to event logs from real environments. In this context, low-level patterns such as indirect successions and repeated events could be used to extract process characteristics from real event logs, as sketched below.
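As a rough, hypothetical sketch of such low-level pattern extraction (directly-follows relations and activities repeated within a trace; indirect succession would extend the same idea):

```python
from collections import Counter

traces = [("a", "b", "c", "b", "d"), ("a", "c", "b", "d")]

# Directly-follows relations: how often one activity immediately follows another.
directly_follows = Counter(
    (trace[i], trace[i + 1]) for trace in traces for i in range(len(trace) - 1)
)

# Activities repeated within a single trace hint at loop behavior.
repeated_events = {
    activity
    for trace in traces
    for activity, count in Counter(trace).items() if count > 1
}

print(directly_follows.most_common(3))
print(repeated_events)  # {'b'} -> candidate loop over activity 'b'
```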