Specification-driven predictive business process monitoring
Abstract
Predictive analysis in business process monitoring aims at forecasting future information about a running business process. The prediction is typically made based on a model extracted from historical process execution logs (event logs). In practice, different business domains might require different kinds of predictions. Hence, it is important to have a means for properly specifying the desired prediction tasks, and a mechanism to deal with these various prediction tasks. Although there have been many studies in this area, they mostly focus on a specific prediction task. This work introduces a language for specifying the desired prediction tasks, which allows us to express various kinds of prediction tasks. It also presents a mechanism for automatically creating the corresponding prediction model based on the given specification. Thus, differently from previous studies that focus on a particular prediction task, we present an approach that deals with various prediction tasks based on the given specification of the desired tasks. We also provide an implementation of the approach, which is used to conduct experiments using real-life event logs.
Keywords
Predictive business process monitoring · Prediction task specification language · Automatic prediction model creation · Machine learning-based prediction

1 Introduction
Process mining [66, 67] provides a collection of techniques for extracting process-related information from the logs of business process executions (event logs). One important area in this field is predictive business process monitoring, which aims at forecasting the future information of a running process based on models extracted from event logs. Through predictive analysis, potential future problems can be detected and preventive actions can be taken in order to avoid unexpected situations, e.g., processing delays and service-level agreement (SLA) violations. Many studies have been conducted in order to deal with various prediction tasks such as predicting the remaining processing time [52, 53, 54, 63, 69], predicting the outcomes of a process [18, 35, 50, 72], and predicting future events [19, 24, 63] (cf. [11, 15, 41, 42, 49, 58]). An overview of various works in the area of predictive business process monitoring can be found in [20, 36].
In practice, different business areas might need different kinds of prediction tasks. For instance, an online retail company might be interested in predicting the processing time until an order can be delivered to the customer, while for an insurance company, predicting the outcome of an insurance claim process would be interesting. On the other hand, both of them might be interested in predicting whether their processes comply with some business constraints (e.g., the processing time must be less than a certain amount of time).
When it comes to predicting the outcome of a process, business constraint satisfaction, or the existence of an unexpected behaviour, it is important to specify the desired outcomes, the business constraint, and the unexpected behaviour precisely. For instance, in the area of customer problem management, to increase customer satisfaction as well as to promote efficiency, we might be interested in predicting the possibility of "ping-pong behaviour" among the customer service (CS) officers while handling customer problems. However, the definition of ping-pong behaviour could vary. For instance, when a CS officer transfers a customer problem to another CS officer who belongs to the same group, this could already be considered ping-pong behaviour, since both of them should be able to handle the same problem. Another possible definition would consider ping-pong behaviour to be a situation in which a CS officer transfers a problem to another CS officer who has the same expertise, and the problem is transferred back to the original CS officer.
To have a suitable prediction service for our domain, we need to be able to specify the desired prediction tasks properly. Thus, we need a means to express the specification. Once we have characterized the prediction objectives and are able to express them properly, we need a mechanism to create the corresponding prediction model. To automate the prediction model creation, the specification should be unambiguous and machine processable. Such a specification mechanism should also allow us to specify constraints over the data and to compare data values at different time points. For example, to characterize the ping-pong behaviour, one possibility is to specify the behaviour as follows: "there is an event at a certain time point in which the CS officer (who handles the problem) is different from the CS officer in the event at the next time point, but both of them belong to the same group". Note that here we need to compare information about the CS officer names and groups at different time points. In other cases, we might even need to involve arithmetic expressions. For instance, consider a business constraint requiring the length of the customer order processing time to be less than 3 hours, where the length of the processing time is the time difference between the timestamps of the first and the last activity within the process. To express this constraint, we need to be able to specify that "the time difference between the timestamp of the first activity and the last activity within the process is less than 3 hours".
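To make the ping-pong characterization above concrete, the following minimal Python sketch (an illustration only, not the formal language introduced later in this paper) checks the stated condition over a trace. Events are assumed, for this sketch, to be dictionaries keyed by the standard attribute names org:resource and org:group; this encoding is a hypothetical stand-in.

```python
def has_ping_pong(trace):
    """True if some event's CS officer differs from the officer of the
    next event while both belong to the same (defined) group."""
    for cur, nxt in zip(trace, trace[1:]):
        same_group = (cur.get("org:group") is not None
                      and cur.get("org:group") == nxt.get("org:group"))
        diff_officer = cur.get("org:resource") != nxt.get("org:resource")
        if same_group and diff_officer:
            return True
    return False

# Toy trace: a transfer between two officers of the same group.
trace = [
    {"org:resource": "Alice", "org:group": "G1"},
    {"org:resource": "Bob",   "org:group": "G1"},
]
```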
The language should also enable us to specify the target information to be predicted. For instance, in the prediction of remaining processing time, we need to be able to define that the remaining processing time is the time difference between the timestamp of the last activity and that of the current activity. We might also need to aggregate some data values, for instance, in the prediction of the total processing cost, where the total cost is the sum over the costs of all activities/events. In other cases, we might even need to specify an expression that counts the occurrences of a certain activity. For example, in the prediction of the amount of work to be done (workload), we might be interested in predicting the number of remaining validation activities that need to be performed for processing a client application.
Based on these needs, this work addresses the following research questions:

RQ1: What would a specification-driven mechanism for building prediction models in predictive process monitoring look like?

RQ2: What would an expressive specification language look like that allows us to express various desired prediction tasks and, at the same time, enables the automatic creation of the corresponding prediction model from a given specification? Additionally, can such a language express complex expressions involving data, arithmetic operations, and aggregate functions?

RQ3: Once we are able to specify the desired prediction tasks properly, how can the corresponding prediction model be built automatically based on the given specification?
In answering these questions, this work makes the following contributions:

1. We introduce a rich language that allows us to specify various desired prediction tasks. In some sense, this language allows us to specify how to create the desired prediction models based on the event logs. We also provide a formal semantics for the language in order to ensure a uniform understanding and avoid ambiguity.
2. We devise a mechanism for building the corresponding prediction model based on the given specification. This includes a mechanism for automatically processing the specification. Once created, the prediction model can be used to provide predictive analysis services in business process monitoring.
3. To give a general idea of the capabilities of our language, we exhibit how our proposal can be used for specifying various prediction tasks (cf. Sect. 5).
4. We provide an implementation of our approach, which enables the automatic creation of prediction models based on the specified prediction objective.
5. To demonstrate the applicability of our approach, we carry out experiments using real-life event logs that were provided for the Business Process Intelligence Challenge (BPIC) 2012, 2013, and 2015.
Roughly speaking, we specify the desired prediction task by specifying how to map each (partial) business process execution into the expected predicted information. Based on this specification, we train either a classification or a regression model that serves as the prediction model. By specifying a set of desired prediction tasks, we can obtain multi-perspective prediction services that enable us to focus on different aspects and to predict various information of interest. Our approach is independent of the particular classification/regression model that is used. In our implementation, to obtain the expected prediction quality, users can choose the desired classification/regression model as well as the feature encoding mechanism (so as to allow some degree of feature engineering).
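As a toy illustration of this trace-to-target mapping (not the paper's implementation), the sketch below turns every prefix of every trace into a training pair. The encoding (prefix length only) and the remaining-time target function are hypothetical stand-ins for the pluggable feature encodings and target expressions discussed in this paper.

```python
def build_training_data(log, target_fn):
    """For every trace and every proper prefix length k, pair an encoding
    of the k-prefix with the target value computed from the full trace."""
    X, y = [], []
    for trace in log:
        for k in range(1, len(trace)):
            X.append([k])                  # toy encoding of the k-prefix
            y.append(target_fn(trace, k))  # e.g., remaining time after k events
    return X, y

# Toy log: each trace is a list of event timestamps (in hours).
log = [[0, 1, 3, 6], [0, 2, 4]]
remaining_time = lambda trace, k: trace[-1] - trace[k - 1]

X, y = build_training_data(log, remaining_time)
# X, y can now be fed to any off-the-shelf classifier/regressor
# (the approach is independent of the particular model used).
```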
This paper extends [57] in several ways. First, we extend the specification language to incorporate various aggregate functions such as Max, Min, Average, Sum, Count, and Concat. Importantly, our aggregate functions allow us not only to perform aggregation over some values but also to select the values to be aggregated. This extension increases the expressiveness of the language and allows us to specify many more interesting prediction tasks. Next, we add various new showcases that exhibit the capabilities of our language in specifying prediction tasks, and we extend the implementation of our prototype to incorporate these extensions. To demonstrate the applicability of our approach, more experiments on different prediction tasks are conducted and presented. Apart from the real-life event log that was provided for BPIC 2013 [62], we also use other real-life event logs, namely those provided for BPIC 2012 [70] and BPIC 2015 [71]. Notably, our experiments also exhibit the use of a deep learning approach [30] in predictive process monitoring; in particular, we use deep feedforward neural networks. Though there have been some works that exhibit the use of deep learning in predictive process monitoring (cf. [19, 23, 24, 38, 63]), here we consider prediction tasks that differ from the tasks studied in those works. We also add a more thorough explanation of several concepts and ideas of our approach so as to provide a better understanding, and we extend the discussion of related work. Last but not least, several examples are added in order to support the explanation of various technical concepts as well as to ease the understanding of the ideas.
The remainder of this paper is structured as follows: In Sect. 2, we provide the required background on the concepts that are needed for the rest of the paper. Having laid the foundation, in Sect. 3, we present the language that we introduce for specifying the desired prediction tasks. In Sect. 4, we present a mechanism for building the corresponding prediction model based on the given specification. In Sect. 5, we continue the explanation by providing numerous showcases that exhibit the capability of our language in specifying various prediction tasks. In Sect. 6, we present the implementation of our approach as well as the experiments that we have conducted. Related work is presented in Sect. 7. In Sect. 8, we present various discussions concerning this work, including a discussion on some potential limitations, which pave the way towards our future direction. Finally, Sect. 9 concludes this work.
2 Preliminaries
As we will see later, we build the prediction models using machine learning classification/regression techniques, based on the data in event logs. To provide the necessary background, this section briefly explains the typical structure of event logs as well as the notions of classification and regression in machine learning.
2.1 Trace, event, and event log
We follow the usual notion of event logs as in process mining [67]. Essentially, an event log captures historical information about business process executions. Within an event log, the execution of a business process instance (a case) is represented as a trace. In the following, we may use the terms trace and case interchangeably. Each trace consists of several events, and each event in a trace captures information about a particular activity occurrence during the process execution. Events are characterized by various attributes, e.g., timestamp (the time at which the event occurred).
We now proceed to formally define the notion of event logs as well as their components. Let \({\mathcal {E}}\) be the event universe (i.e., the set of all event identifiers), and \(\mathcal {A}\) be the set of attribute names. For any event \(e \in {\mathcal {E}}\) and attribute name \(n \in \mathcal {A}\), \(\#_{\text{n}}(e)\) denotes the value of attribute n of \(e\). For example, \(\#_{\text{timestamp}}(e)\) denotes the timestamp of the event \(e\). If an event \(e\) does not have an attribute named n, then \(\#_{\text{n}}(e) = \bot\) (where \(\bot\) is the undefined value). A finite sequence over \({\mathcal {E}}\) of length n is a mapping \(\sigma : \{1, \ldots , n\} \rightarrow {\mathcal {E}}\), and we represent such a sequence as a tuple of elements of \({\mathcal {E}}\), i.e., \(\sigma = \langle e_1, e_2, \ldots , e_n\rangle\) where \(e_i = \sigma(i)\) for \(i \in \{1, \ldots , n\}\). The set of all finite sequences over \({\mathcal {E}}\) is denoted by \({\mathcal {E}}^*\). The length of a sequence \(\sigma\) is denoted by \(|\sigma|\).
A trace \(\tau\) is a finite sequence over \({\mathcal {E}}\) such that each event \(e \in {\mathcal {E}}\) occurs at most once in \(\tau\), i.e., \(\tau \in {\mathcal {E}}^{*}\) and for \(1 \le i < j \le |\tau|\), we have \(\tau(i) \ne \tau(j)\), where \(\tau(i)\) refers to the event of the trace \(\tau\) at index i. Let \(\tau = \langle e_1, e_2, \ldots , e_n\rangle\) be a trace; then \(\tau^{k} = \langle e_1, e_2, \ldots , e_{k}\rangle\) denotes the k-length prefix of \(\tau\) (for \(1 \le k < n\)).
Example 1
For example, let \(\{e_1, e_2, e_3, e_4, e_5, e_6, e_7\} \subset {\mathcal {E}}\) be some event identifiers; then the sequence \(\tau = \langle e_3, e_7, e_6, e_4, e_5\rangle \in {\mathcal {E}}^{*}\) is an example of a trace. In this case, we have \(|\tau| = 5\), and \(\tau(3)\) refers to the event of the trace \(\tau\) at index 3, i.e., \(\tau(3) = e_6\). Moreover, \(\tau^{2}\) is the prefix of length 2 of the trace \(\tau\), i.e., \(\tau^{2} = \langle e_3, e_7\rangle\). \(\blacksquare\)
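The notions in Example 1 can be transcribed directly in code. The sketch below assumes events are represented as plain string identifiers; \(\tau(i)\) is 1-indexed and \(\tau^{k}\) is the k-length prefix.

```python
def at(tau, i):
    """Event of trace tau at index i (1-indexed), i.e., tau(i)."""
    return tau[i - 1]

def prefix(tau, k):
    """The k-length prefix tau^k of trace tau."""
    return tau[:k]

# The trace of Example 1: tau = <e3, e7, e6, e4, e5>.
tau = ["e3", "e7", "e6", "e4", "e5"]
assert len(tau) == 5          # |tau| = 5
assert at(tau, 3) == "e6"     # tau(3) = e6
assert prefix(tau, 2) == ["e3", "e7"]  # tau^2 = <e3, e7>
```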
Finally, an event log \(L\) is a set of traces such that each event occurs at most once in the entire log, i.e., for each \(\tau_1, \tau_2 \in L\) such that \(\tau_1 \ne \tau_2\), we have \(\tau_1 \cap \tau_2 = \emptyset\), where \(\tau_1 \cap \tau_2 = \{e \in {\mathcal {E}} \mid \exists i, j \in \mathbb {Z}^+ .\ \tau_1(i) = \tau_2(j) = e\}\).
Typical standard attributes of an event include:

1. concept:name, which stores the name of the event/trace;
2. org:resource, which stores the name/identifier of the resource that triggered the event (e.g., a person's name);
3. org:group, which stores the group name of the resource that triggered the event.
2.2 Classification and regression
In machine learning, classification and regression models can be seen as functions \(f: \vec {X} \rightarrow Y\) that take some input features/variables \(\vec {x} \in \vec {X}\) and predict the corresponding target value/output \(y \in Y\). The key difference is that the output range of a classification task is a finite set of discrete categories (qualitative outputs), while the output range of a regression task is continuous (quantitative outputs) [28, 31]. Both are supervised machine learning techniques in which the models are trained with labelled data. That is, the inputs for training are pairs of input variables \(\vec {x}\) and (expected) target values y. This way, the models learn how to map certain inputs \(\vec {x}\) to the expected target value y.
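The distinction can be illustrated with a toy 1-nearest-neighbour predictor in plain Python (a deliberately simplistic stand-in for the off-the-shelf models used later): the same mechanism yields a discrete category or a continuous value depending solely on the training labels.

```python
def nn_predict(train, x):
    """Return the label of the training pair whose input is closest to x
    (a minimal 1-nearest-neighbour predictor)."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]

# Classification: the labels are discrete categories (qualitative outputs).
clf_data = [(1.0, "fast"), (8.0, "slow")]
# Regression: the labels are continuous values (quantitative outputs),
# e.g., hypothetical remaining hours.
reg_data = [(1.0, 0.5), (8.0, 6.0)]

assert nn_predict(clf_data, 2.0) == "fast"
assert nn_predict(reg_data, 7.0) == 6.0
```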
3 Specifying the desired prediction tasks
This section elaborates our mechanism for specifying the desired prediction tasks. In predictive business process monitoring, we are interested in predicting the future information of a running process. Thus, the input of the prediction is information about the process that is currently running, e.g., the sequence of activities that have been executed so far, who executed a certain activity, etc. This information is often referred to as (partial) business process execution information or a (partial) trace, which consists of the sequence of events that have occurred during the process execution. The output of the prediction, in turn, is future information about the running process, e.g., how long the remaining processing time is, whether the process will comply with a certain constraint, whether an unexpected behaviour will occur, etc. Based on these facts, we introduce a language that captures the desired prediction task in terms of a specification for mapping each (partial) trace in the event log into the desired prediction result. In some sense, this specification language enables us to specify the desired prediction function that maps each input (partial) trace into an output that gives the corresponding predicted information. Such a specification can be used to train a classification/regression model that is then used as the prediction model.
Section 3.1 provides an informal intuition on our language for specifying prediction tasks. It also gives some intuitive examples that illustrate the motivation of some of our language requirements. In Sect. 3.2, we elaborate several requirements that guide the development of our specification language. Throughout Sects. 3.3 and 3.4, we introduce the language for specifying the condition and target expressions in analytic rules. Specifically, Sect. 3.4 introduces a language called First-Order Event Expression (FOE), while Sect. 3.3 elaborates several components that are needed to define this language. We will see later that FOE can be used to formally specify condition expressions and that a fragment of FOE can be used to specify target expressions. Finally, the formalization of analytic rules is provided in Sect. 3.5.
3.1 Overview: prediction task specification language
We will see later that a target expression either specifies the desired prediction result directly or expresses the way to obtain the desired prediction result from a trace. Thus, an analytic rule \(R\) can be seen as a means to map each (partial) trace into the corresponding expected prediction result.
3.2 Specification language requirements
Driven by various typical prediction tasks that are studied in the area of predictive process monitoring (see [20, 36] for an extensive overview), in the following, we elaborate several requirements for our language as well as the motivation for each requirement. These requirements guide the development of our language.
Req. 1: The language should support the specification of prediction tasks in which the outputs can be either numerical or non-numerical values/information.

Req. 2: The language should allow us to automatically build the corresponding prediction model from the given specification.

Req. 3: The language should allow us to express complex constraints/properties over sequences of events, also involving the corresponding event data (attribute values). This includes the capability to universally/existentially quantify over event time points and to compare event attribute values at different time points.

Req. 4: The language should allow us to specify arithmetic expressions as well as aggregate functions involving the data. Additionally, the language should support selective aggregation operations (i.e., selecting the values to be aggregated).

Req. 5: The language should allow us to specify the target information to be predicted, possibly including the way to obtain a certain value (which might involve arithmetic expressions).
3.3 Towards formalizing the condition and target expressions
This section introduces several components that are needed to define the language for specifying condition and target expressions in Sect. 3.4.
As an example of an event attribute accessor, the expression \(\text {\texttt {e}}[i]\mathbf{. }\,{\text {{org:resource}}}\) refers to the value of the attribute org:resource of the event at position i.
Example 2
Consider the trace \(\tau = \langle e_1, e_2, e_3, e_4, e_5\rangle \), let “Bob” be the value of the attribute org:resource of the event \(e_3\) in \(\tau \), i.e., \(\#_{\text {{org:resource}}}(e_3) = \hbox {``Bob''}\), and \(e_3\) does not have any attributes named org:group, i.e., \(\#_{\text {{org:group}}}(e_3) = \bot \). In this example, we have that \((\text {\texttt {e}}[3]\mathbf{. }\,{\text {{org:resource}}})^{\tau ,k}_{\nu }~=~\hbox {``Bob''}\), and \((\text {\texttt {e}}[3]\mathbf{. }\,{\text {{org:group}}})^{\tau ,k}_{\nu }~=~\bot \). \(\blacksquare \)
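The accessor semantics of Example 2 can be sketched as follows, assuming events are dictionaries and using a sentinel object as the undefined value \(\bot\); the encoding is an illustration, not part of the formal language.

```python
UNDEFINED = object()  # stands for the undefined value (bottom)

def attr(event, name):
    """The attribute accessor #_name(event): the attribute's value,
    or UNDEFINED when the event has no such attribute."""
    return event.get(name, UNDEFINED)

# Event e3 of Example 2: org:resource is "Bob", org:group is absent.
e3 = {"concept:name": "validate", "org:resource": "Bob"}
assert attr(e3, "org:resource") == "Bob"
assert attr(e3, "org:group") is UNDEFINED
```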
Example 3
We interpret each logical/arithmetic comparison operator (i.e., \(==\), \(\ne\), <, >, etc.) in event expressions as usual. For instance, the expression \(26 \ge 3\) is interpreted as \(\mathsf {true}\), while the expression "receivedOrder" \(==\) "sendOrder" is interpreted as \(\mathsf {false}\). Additionally, any comparison involving the undefined value (\(\bot\)) is interpreted as \(\mathsf{false}\). It is easy to see how to extend the formal definition of our interpretation function \((\cdot )^{\tau ,k}_{\nu }\) to event expressions; therefore, we omit the details.
3.3.1 Adding aggregate functions
We now extend the notions of numeric and non-numeric expressions by adding several numeric and non-numeric aggregate functions. A numeric (resp. non-numeric) aggregate function performs an aggregation operation over some values and returns a numeric (resp. non-numeric) value. Before providing the formal syntax and semantics of our aggregate functions, in the following we illustrate the need for aggregate functions and provide some intuition on their shape.
where:

(i) \(\mathsf {number}\), \(\mathsf {idx}\), \(\text {\texttt {e}}[\mathsf {idx} ]\mathbf{. }\,{\text {{NumericAttribute}}}\), \(\text {\texttt {e}}[\mathsf {idx} ]\mathbf{. }\,{\text {{NonNumericAttribute}}}\), \(\mathsf {numExp}_1 + \mathsf {numExp}_2\), and \(\mathsf {numExp}_1 - \mathsf {numExp}_2\) are as before;
(ii) \(\mathsf {st}\) and \(\mathsf {ed}\) are either positive integers (i.e., \(\mathsf {st}\in \mathbb {Z}^+\) and \(\mathsf {ed}\in \mathbb {Z}^+\)) or special indices (i.e., \(\mathsf {last}\) or \(\mathsf {curr}\)), and \(\mathsf {st}\le \mathsf {ed}\);
(iii) x is a variable called the aggregation variable, whose value ranges between \(\mathsf {st}\) and \(\mathsf {ed}\) (i.e., \(\mathsf {st}\le x \le \mathsf {ed}\)). The expressions \(\mathbf {\mathsf{where}}~x=\mathsf {st}:\mathsf {ed}\) and \(\mathbf {\mathsf{within}}~\mathsf {st}:\mathsf {ed}\) are called the aggregation variable range;
(iv) \(\mathsf {numSrc}\) and \(\mathsf {nonNumSrc}\) specify the source of the values to be aggregated; \(\mathsf {numSrc}\) is specified as a numeric expression, while \(\mathsf {nonNumSrc}\) is specified as a non-numeric expression. Both may only use the corresponding aggregation variable x, and neither may contain any aggregate functions;
(v) \(\mathsf {aggCond}\) is an aggregation condition over the corresponding aggregation variable x; no other variables are allowed to occur in \(\mathsf {aggCond}\);
(vi) \(\text {attName}\) is an attribute name;
(vii) as their names suggest, sum stands for summation, avg for average, min for minimum, max for maximum, count for counting, countVal for counting values, and concat for concatenation. The behaviour of these aggregate functions is quite intuitive; some intuition has been given previously, and we explain their detailed behaviour while providing their formal semantics below.
The aggregate functions sum, avg, min, max, concat that have aggregation conditions \(\mathsf {aggCond}\) are also called conditional aggregate functions.
Notice that a numeric aggregate function is itself a numeric expression, and a numeric expression is in turn a component of a numeric aggregate function (either in the source value or in the aggregation condition). Hence, nested aggregate functions could in principle be formed. To simplify the presentation, we do not allow nested aggregate functions in this work, although technically this would be possible with some care in the usage of the variables (similarly for non-numeric aggregate functions).
Example 4
Having this machinery in hand, we are now ready to formally define the semantics of aggregate functions. The formal semantics of the conditional aggregate functions \({\mathbf {\mathsf{{sum}}}}\), \({\mathbf {\mathsf{{avg}}}}\), \({\mathbf {\mathsf{{max}}}}\), \({\mathbf {\mathsf{{min}}}}\) is provided in Fig. 2. Intuitively, the aggregate function \({\mathbf {\mathsf{{sum}}}}\) computes the sum of the values obtained from evaluating the specified numeric expression \(\mathsf {numSrc}\) over the specified aggregation range (i.e., between \(\mathsf {st}\) and \(\mathsf {ed}\)). The computation ignores undefined values and considers only those indices within the aggregation range at which the aggregation condition evaluates to true. The intuition for \({\mathbf {\mathsf{{avg}}}}\), \({\mathbf {\mathsf{{max}}}}\), \({\mathbf {\mathsf{{min}}}}\) is similar, except that \({\mathbf {\mathsf{{avg}}}}\) computes the average, \({\mathbf {\mathsf{{max}}}}\) the maximum, and \({\mathbf {\mathsf{{min}}}}\) the minimum of the values.
Example 5
The aggregate function \({\mathbf {\mathsf{{max}}}} (\mathsf {numExp}_1,\mathsf {numExp}_2)\) computes the maximum of the two values obtained by evaluating the two specified numeric expressions \(\mathsf {numExp}_1\) and \(\mathsf {numExp}_2\). It yields the undefined value \(\bot\) if either of them evaluates to \(\bot\) (similarly for \({\mathbf {\mathsf{{min}}}} (\mathsf {numExp}_1,\mathsf {numExp}_2)\), except that it computes the minimum). The detailed formal semantics of these functions is provided in Appendix A.

The aggregate function \({\mathbf {\mathsf{{concat}}}}\) concatenates the values obtained from evaluating the given non-numeric expression over the valid aggregation range (i.e., we only consider the values within the given aggregation range at which the aggregation condition is satisfied). Moreover, the concatenation ignores undefined values and treats them as the empty string. The detailed formal semantics of \({\mathbf {\mathsf{{concat}}}}\) is provided in Appendix A.

\({\mathbf {\mathsf{{sum}}}} (\mathsf {numSrc} ;~{\mathbf {\mathsf{where}}}~x=\mathsf {st}:\mathsf {ed})\)
\({\mathbf {\mathsf{{avg}}}} (\mathsf {numSrc} ;~{\mathbf {\mathsf{where}}}~x=\mathsf {st}:\mathsf {ed})\)
\({\mathbf {\mathsf{{min}}}} (\mathsf {numSrc} ;~{\mathbf {\mathsf{where}}}~x=\mathsf {st}:\mathsf {ed})\)
\({\mathbf {\mathsf{{max}}}} (\mathsf {numSrc} ;~{\mathbf {\mathsf{where}}}~x=\mathsf {st}:\mathsf {ed})\)
\({\mathbf {\mathsf{{concat}}}} (\mathsf {nonNumSrc} ;~{\mathbf {\mathsf{where}}}~x=\mathsf {st}:\mathsf {ed})\)
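The conditional aggregate semantics described above can be sketched in plain Python (an illustration under assumed encodings, not the formal semantics of Fig. 2): values are collected over the range st..ed, indices at which the aggregation condition fails or the source is undefined (None here) are skipped, and concat treats undefined values as the empty string.

```python
def agg_values(source, st, ed, cond=lambda x: True):
    """Collect source(x) for x in st..ed where cond holds and the
    value is defined (None models the undefined value)."""
    return [source(x) for x in range(st, ed + 1)
            if cond(x) and source(x) is not None]

def agg_sum(source, st, ed, cond=lambda x: True):
    return sum(agg_values(source, st, ed, cond))

def agg_concat(source, st, ed, cond=lambda x: True):
    # Undefined values are treated as the empty string.
    return "".join(source(x) or "" for x in range(st, ed + 1) if cond(x))

# Hypothetical per-event costs; index 3 has no cost attribute (undefined).
cost = {1: 10, 2: 5, 3: None, 4: 20}
assert agg_sum(cost.get, 1, 4) == 35
assert agg_sum(cost.get, 1, 4, cond=lambda x: x != 4) == 15
```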
3.4 First-Order Event Expression (FOE)
Finally, we are ready to define the language for specifying condition expressions, namely First-Order Event Expression (FOE). A fragment of this language is also used to specify target expressions.
Example 6
In general, FOE has the following main features: (i) it allows us to specify constraints over the data (attribute values); (ii) it allows us to (universally/existentially) quantify over event time points and to compare event attribute values at different time points; (iii) it allows us to specify arithmetic expressions/operations involving the data as well as aggregate functions; (iv) it allows us to perform selective aggregation operations (i.e., selecting the values to be aggregated); and (v) its fragments, namely the numeric and non-numeric expressions, allow us to specify the way to compute a certain value.
3.4.1 Checking whether a closed FOE formula is satisfied
We now introduce several properties of FOE formulas that are useful for checking whether a trace \(\tau\) and a prefix length k satisfy a closed FOE formula \(\varphi\), i.e., whether \(\tau ,k \models \varphi\). This check is needed when we create the prediction model based on the specification of the prediction task provided by an analytic rule.
Let \(\varphi\) be an FOE formula; we write \(\varphi [i \mapsto c]\) to denote the formula obtained by substituting each occurrence of the variable i in \(\varphi\) by c. In the following, Theorems 1 and 2 show that, while checking whether a trace \(\tau\) and a prefix length k satisfy a closed FOE formula \(\varphi\), we can eliminate the presence of existential and universal quantifiers.
Theorem 1

Given a trace \(\tau\), a prefix length k, and an FOE formula \(\exists i.\varphi\) in which i is the only free variable of \(\varphi\), we have \(\tau ,k \models \exists i.\varphi\) iff \(\tau ,k \models \bigvee \nolimits _{c \in \{1, \ldots , |\tau |\}} \varphi [i \mapsto c]\).
Proof
By the definition of the semantics of FOE, we have that \(\tau\) and k satisfy \(\exists i.\varphi\) (i.e., \(\tau ,k~\models ~\exists i.\varphi\)) iff there exists an index \(c \in \{1, \ldots , |\tau |\}\) such that \(\tau\) and k satisfy the formula \(\psi\) obtained from \(\varphi\) by substituting each occurrence of the variable i in \(\varphi\) with c (i.e., \(\tau ,k~\models ~\psi\), where \(\psi\) is \(\varphi [i \mapsto c]\)). Thus, it is the same as satisfying the disjunction of the formulas obtained by considering all possible substitutions of the variable i in \(\varphi\) by all possible values of c (i.e., \(\bigvee \nolimits _{c \in \{1, \ldots , |\tau |\}} \varphi [i \mapsto c]\)). This is the case because such a disjunction is satisfied by \(\tau\) and k if and only if at least one of its disjuncts is satisfied by \(\tau\) and k. \(\square\)
Theorem 2

Given a trace \(\tau\), a prefix length k, and an FOE formula \(\forall i.\varphi\) in which i is the only free variable of \(\varphi\), we have \(\tau ,k \models \forall i.\varphi\) iff \(\tau ,k \models \bigwedge \nolimits _{c \in \{1, \ldots , |\tau |\}} \varphi [i \mapsto c]\).
Proof
The proof is quite similar to that of Theorem 1, except that we use a conjunction of formulas. Basically, we have that \(\tau\) and k satisfy \(\forall i.\varphi\) (i.e., \(\tau ,k~\models ~\forall i.\varphi\)) iff for every \(c \in \{1, \ldots , |\tau |\}\) we have \(\tau ,k~\models ~\psi\), where \(\psi\) is obtained from \(\varphi\) by substituting each occurrence of the variable i in \(\varphi\) with c. In other words, \(\tau\) and k satisfy each formula obtained from \(\varphi\) by considering all possible substitutions of the variable i with all possible values of c. Hence, it is the same as satisfying the conjunction of those formulas (i.e., \(\bigwedge _{c \in \{1, \ldots , |\tau |\}} \varphi [i \mapsto c]\)). This is the case because such a conjunction is satisfied by \(\tau\) and k if and only if each of its conjuncts is satisfied by \(\tau\) and k. \(\square\)
 1.
First, we eliminate all quantifiers. This can be done easily by applying Theorems 1 and 2. As a result, each quantified variable will be instantiated with a concrete value;
 2.
Evaluate all aggregate functions as well as all event attribute accessor expressions based on the event attributes in \(\tau \) so as to get the actual values of the corresponding event attributes. After this step, we have a formula that is constituted by only concrete values composed by either arithmetic operators (i.e., \(+\) or −), logical comparison operators (i.e., \(==\) or \(\ne \)), or arithmetic comparison operators (i.e., <, >, \(\le \), \(\ge \), \(==\) or \(\ne \));
 3.
Last, we evaluate all arithmetic expressions as well as all expressions involving logical and arithmetic comparison operators. If the whole evaluation gives us \(\mathsf {true}\) (i.e., \((\varphi )^{\tau ,k}_{} = \mathsf {true}\)), then we have that \(\tau ,k \models \varphi \), otherwise \(\tau ,k \not \models \varphi \) (i.e., \(\tau \) and k do not satisfy \(\varphi \)).
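The three steps above can be sketched as a small recursive evaluator. This is a minimal illustration of ours, not the authors' implementation: formulas are represented as nested tuples, events as dictionaries of attributes, and the tuple tags (`"exists"`, `"attr"`, `"const"`, etc.) are assumed names.

```python
# Minimal sketch of the model-checking procedure for a closed FOE formula.
# Step 1 eliminates quantifiers (Theorems 1 and 2), step 2 resolves event
# attribute accessors against the trace, step 3 evaluates the remaining
# ground arithmetic/comparison expressions.
import operator

OPS = {"+": operator.add, "-": operator.sub,
       "<": operator.lt, ">": operator.gt, "<=": operator.le,
       ">=": operator.ge, "==": operator.eq, "!=": operator.ne,
       "and": lambda a, b: a and b, "or": lambda a, b: a or b}

def subst(formula, var, value):
    """Replace every occurrence of index variable `var` by the constant `value`."""
    if isinstance(formula, tuple):
        return tuple(subst(part, var, value) for part in formula)
    return value if formula == var else formula

def check(formula, trace, k):
    """Return True iff trace, k |= formula (formula as a nested tuple AST)."""
    kind = formula[0]
    if kind == "exists":                       # step 1: disjunctive expansion
        _, var, body = formula
        return any(check(subst(body, var, c), trace, k)
                   for c in range(1, len(trace) + 1))
    if kind == "forall":                       # step 1: conjunctive expansion
        _, var, body = formula
        return all(check(subst(body, var, c), trace, k)
                   for c in range(1, len(trace) + 1))
    if kind == "attr":                         # step 2: accessor e[c].attribute
        _, idx, name = formula
        return trace[idx - 1].get(name)
    if kind in OPS:                            # step 3: ground (sub)expressions
        _, lhs, rhs = formula
        return OPS[kind](check(lhs, trace, k), check(rhs, trace, k))
    return formula[1]                          # ("const", value)
```

For instance, on a trace whose events carry a `concept:name` attribute, `check(("exists", "i", ("==", ("attr", "i", "concept:name"), ("const", "b"))), trace, k)` reduces to the finite disjunction over all positions, exactly as in the proof of Theorem 1.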
Theorem 3
Given a closed FOE formula \(\varphi \), a trace \(\tau \), and a prefix length k, checking whether \(\tau ,k \models \varphi \) is decidable.
This procedure is implemented in our prototype as part of the mechanism that processes the prediction task specification while constructing the prediction model.
3.5 Formalizing the analytic rule
With this machinery at hand, we can formally define how condition and target expressions in analytic rules are specified: condition expressions are specified as closed FOE formulas, while target expressions are specified as either numeric or non-numeric expressions, with the restriction that target expressions must not contain index variables (hence they do not need a variable valuation). We require an analytic rule to be coherent, i.e., all target expressions of an analytic rule must be either all numeric or all non-numeric expressions. An analytic rule whose target expressions are all numeric expressions is called a numeric analytic rule, while an analytic rule whose target expressions are all non-numeric expressions is called a non-numeric analytic rule.
4 Building the prediction model
Once we are able to properly specify the desired prediction tasks, how can we build the corresponding prediction model from the given specification? We need a mechanism that fulfils the following requirement: for a given specification of the desired prediction task expressed in our language, it should create a corresponding reliable prediction function that maps each input partial trace to the most probable predicted output.
Given an analytic rule \(R \) and an event log \(L \), our aim is to create a prediction function that takes a (partial) trace as input and predicts the most probable output value for it. To this aim, we train a classification/regression model whose input features are obtained by encoding all possible trace prefixes in the event log \(L \) (the training data). If \(R \) is a numeric analytic rule, we build a regression model; if \(R \) is a non-numeric analytic rule, we build a classification model. There are several ways to encode (partial) traces into input features for training a machine learning model. For instance, [33, 60] study various encoding techniques such as index-based encoding and boolean encoding. In [63], the authors use the so-called one-hot encoding of event names and also add some time-related features (e.g., the time increase with respect to the previous event). Some works consider feature encodings that incorporate the information of the last n events. There are also several choices regarding the information to be incorporated: one can incorporate only the names of the events/activities, or also other information (provided by the available event attributes) such as the (human) resource who is in charge of the activity.
In general, an encoding technique can be seen as a function \(\mathsf {enc}\) that takes a trace \(\tau \) as input and produces a set \(\{x_1,\ldots , x_m\}\) of features, i.e., \(\mathsf {enc}(\tau ) = \{x_1,\ldots , x_m\}\). Furthermore, since a trace \(\tau \) might have arbitrary length (i.e., an arbitrary number of events), the encoding function must be able to transform this arbitrary amount of trace information into a fixed number of features. This can be done, for example, by considering the last n events of the given trace \(\tau \), or by aggregating the information within the trace itself. In the encoding that incorporates the last n events, if the number of events within the trace \(\tau \) is less than n, we can typically add 0 for all missing information in order to obtain a fixed number of features.
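A minimal sketch of such an encoding function, assuming events are dictionaries with a `concept:name` attribute (the function name and signature are ours, for illustration): it one-hot encodes the activity names of the last n events and zero-pads short traces so that every trace yields the same number of features.

```python
# Sketch of enc(trace) -> fixed-length feature vector via last-n-events
# one-hot encoding with zero padding (names and signature are illustrative).
def last_n_encoding(trace, n, activities):
    """One-hot encode the activity names of the last n events; short traces
    are padded with all-zero vectors, so the result always has
    n * len(activities) features."""
    features = []
    last_n = trace[-n:]
    for _ in range(n - len(last_n)):          # pad missing (older) events
        features.extend([0] * len(activities))
    for event in last_n:                      # one-hot vector per event
        one_hot = [0] * len(activities)
        one_hot[activities.index(event["concept:name"])] = 1
        features.extend(one_hot)
    return features
```

With `activities = ["a", "b", "c"]` and `n = 2`, a one-event trace yields six features: an all-zero padding vector followed by the one-hot vector of its single event.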
Algorithm 1 illustrates our procedure for building the prediction model based on the given inputs, namely: (i) an analytic rule \(R \), (ii) an event log \(L \), and (iii) a set \(\mathsf {Enc}= \{\mathsf {enc}_1, \ldots , \mathsf {enc}_n\}\) of encoding functions. The algorithm works as follows: for each k-length trace prefix \(\tau ^{k}\) of each trace \(\tau \) in the event log \(L \) (where \(1< k < |\tau |\)), we do the following. In line 3, we apply each encoding function \(\mathsf {enc}_i \in \mathsf {Enc}\) to \(\tau ^{k}\) and combine all obtained features; this step gives us the encoded trace prefix. In line 4, we compute the expected prediction result (target value) by applying the analytic rule \(R \) to \(\tau ^{k}\). In line 5, we add a new training instance specifying that the prediction function \({\mathcal {P}}\) maps the encoded trace prefix \(\tau ^{k}_{\text {encoded}}\) to the target value computed in the previous step. Finally, we train the prediction function \({\mathcal {P}}\) and obtain the desired prediction function.
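The prefix-enumeration loop of Algorithm 1 can be sketched as follows. This is an illustration under stated assumptions: `rule` stands for the evaluation of the analytic rule \(R \) over the full trace at prefix length k, `encoders` plays the role of \(\mathsf {Enc}\), and the resulting `X, y` pairs would then be fed to any classifier/regressor.

```python
# Sketch of Algorithm 1's training-set construction (names are illustrative):
# for every prefix tau^k with 1 < k < |tau|, combine all encodings of the
# prefix (line 3), evaluate the analytic rule for the target value (line 4),
# and record the training instance (line 5).
def build_training_data(log, rule, encoders):
    X, y = [], []
    for trace in log:
        for k in range(2, len(trace)):                  # 1 < k < |trace|
            prefix = trace[:k]
            features = [f for enc in encoders for f in enc(prefix)]
            X.append(features)                          # encoded trace prefix
            y.append(rule(trace, k))                    # expected target value
    return X, y
```

The returned `X, y` can be passed unchanged to, e.g., a scikit-learn regressor or classifier, which reflects the model-independence of the procedure discussed next.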
Observe that the procedure above is independent of the classification/regression model and the trace encoding technique that are used. One can plug in a different machine learning classification/regression technique, as well as a different trace encoding technique, in order to obtain the desired prediction quality.
We elaborate why our approach fulfils the requirement described at the beginning of this section as follows. Our approach mainly relies on supervised machine learning to learn the desired prediction function from the historical business process execution log (event log). Recall that a supervised machine learning approach learns and approximates a function that maps inputs to outputs based on training data containing examples of input/output pairs. Additionally, recall that a prediction task specification essentially specifies the way a partial trace (input) is mapped to the corresponding desired prediction result (output). As can be seen from our approach, using the given prediction task specification and the given event log, we create training data containing various examples of inputs and outputs, where each example is generated based on the given prediction task specification. Since the prediction function is built from this training data, the way the learned prediction function maps inputs to outputs is influenced by the given specification as well. Thus, the generated prediction function is built based on the given prediction task specification. As for the reliability of the generated prediction functions: when we apply our whole approach in our experiments (cf. Sect. 6), the reliability of the created prediction functions is measured by several metrics, e.g., accuracy, AUC, and precision.
5 Showcases and multi-perspective prediction service
An analytic rule \(R \) specifies a particular prediction task of interest. To specify several desired prediction tasks, we simply specify several analytic rules, i.e., \(R _1, R _2, \ldots , R _n\). Given a set \({\mathcal {R}} = \{R _1, R _2, \ldots , R _n\}\) of analytic rules, our approach allows us to construct a prediction model for each analytic rule \(R _i \in {\mathcal {R}} \). By having all of the constructed prediction models, each focusing on a particular prediction objective, we obtain a multi-perspective prediction analysis service.
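The idea of a multi-perspective service amounts to training and querying one model per analytic rule. A minimal sketch of ours, assuming a `build_model(rule, log)` helper in the spirit of Sect. 4 (the class and parameter names are illustrative, not the prototype's API):

```python
# Sketch of a multi-perspective prediction service: one prediction model per
# analytic rule R_i, all queried on the same running (partial) trace.
class MultiPerspectiveService:
    def __init__(self, rules, log, build_model):
        # train one prediction model per analytic rule (name -> model)
        self.models = {name: build_model(rule, log)
                       for name, rule in rules.items()}

    def predict(self, prefix):
        # ask every perspective for its prediction on the given prefix
        return {name: model(prefix) for name, model in self.models.items()}
```

Each entry of the returned dictionary corresponds to one prediction objective, mirroring the set \({\mathcal {R}}\) of analytic rules.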
In Sect. 3, we have seen some examples of prediction task specifications for predicting the ping-pong behaviour and the remaining processing time. In this section, we present numerous other showcases of prediction task specification using our language. More showcases can be found in Appendix B.
5.1 Predicting unexpected behaviour/situation
We can specify the task of predicting unexpected behaviour by first expressing the characteristics of that behaviour.
5.2 Predicting SLA/business constraints compliance
Using FOE, we can easily specify numerous expressive SLA conditions as well as business constraints. Furthermore, using the approach presented in Sect. 4, we can create the corresponding prediction model, which predicts the compliance of the corresponding SLA/business constraints.
5.3 Predicting time-related information
In Sect. 3.1, we have seen how to specify the task of predicting the remaining processing time (by specifying a target expression that computes the time difference between the timestamps of the last and the current events). In the following, we provide other examples of predicting time-related information.
5.4 Predicting workload-related information
5.5 Predicting resource-related information
Since the expression

\({\mathbf {\mathsf{{{countVal}}}}} (\text {org:resource};~{\mathbf {\mathsf{{{{within}}}}}}~1:{\mathsf {last}})\)

is evaluated to the number of different values of the attribute \(\text {org:resource}\) within the corresponding trace, \(R _{16}\) maps each trace prefix \(\tau ^{k}\) to the number of different resources.
5.6 Predicting value-added related information
Value-added analysis in BPM aims at identifying unnecessary steps within a process, with the purpose of eliminating those steps [22]. The steps within a process can be classified as value-added, business value-added, or non-value-added. Value-added steps produce/add value to the customer or to the final outcome/product of the process. Business value-added steps might not directly add value to the customer, but may be necessary or useful for the business. Non-value-added steps are those that are neither value-added nor business value-added. Based on this analysis, we would typically like to retain the value-added steps and eliminate (or reduce) the non-value-added steps, which are often associated with waste. An example of waste is over-processing waste, which occurs when an activity is performed unnecessarily with respect to the outcome of a process (i.e., it is performed in a way that does not add value to the final outcome/product of the process); it includes tasks that are performed but later found to be unnecessary. In this light, the work by [73] proposes an approach for minimizing over-processing waste by employing prediction techniques.
Using our language, we can characterize prediction tasks that could assist in minimizing over-processing waste. For instance, consider a process for handling personal loan applications. Suppose that several checking activities need to be performed on each application, and that these checks can be performed in any order (each check is independent of the others). Failure of any of these checks makes the application rejected, and the process stops immediately; only cases that successfully pass all checks are accepted. This kind of process is often called a knockout process [65]. For instance, let these checking activities be (i) Applicant's Identity Check (IDC); (ii) Loan Worthiness Assessment (LWA); and (iii) Applicant's Documents Verification (DV). In this scenario, if we first perform the Identity Check activity and then the Documents Verification activity, but the Documents Verification gives a negative outcome (i.e., the application is rejected), then the effort spent on performing the Identity Check activity is wasted. Thus, it might be desirable to perform the Documents Verification step first, so that we could reject the application immediately, before spending effort on any other activities.
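The intuition above corresponds to the classic knockout-reordering heuristic: perform first the check with the highest rejection-probability-to-effort ratio, so that effort on later checks is least likely to be wasted. A small sketch of ours (the probabilities and effort figures below are hypothetical, not taken from any log):

```python
# Sketch of the knockout-reordering heuristic: order checks by decreasing
# rejection-probability / effort ratio. The input figures are hypothetical.
def knockout_order(checks):
    """checks: {name: (rejection_probability, effort)}; returns the check
    names ordered so that expected wasted effort is minimized."""
    return sorted(checks, key=lambda c: checks[c][0] / checks[c][1], reverse=True)

# Hypothetical example for the IDC / LWA / DV scenario above:
example_checks = {"IDC": (0.05, 2.0), "LWA": (0.30, 5.0), "DV": (0.20, 1.0)}
```

With these (made-up) figures, DV would come first: it is cheap and fairly likely to reject, so performing it early avoids wasting effort on the other checks.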
6 Implementation and experiments
As a proof of concept, we develop a prototype that implements our approach.^{3} This prototype includes a parser for our language and a program for automatically processing the given prediction task specification, as well as for building the corresponding prediction model following the approach explained in Sects. 3 and 4. We also build a ProM^{4} plug-in that wraps these functionalities. Several selectable feature encoding functions are provided, e.g., one-hot encoding of event attributes, time since the previous event, and plain attribute value encoding. We can also choose the desired machine learning model to be built. Our implementation uses Java and Python; for the interaction between the two, we use Jep (Java Embedded Python).^{5} In general, we use Java for the program that processes the specification, and Python for dealing with the machine learning models. Our experiments aim at answering the following questions:
 1.
Is the whole proposed approach applicable in practice (in a real-life setting)?
 2.
In practice, can we generate reliable prediction models by following our approach (starting from specifying the desired prediction task)?
 3.
What are the factors that influence the quality of the generated prediction models?
For the experiments, we follow the standard holdout method [31]. Specifically, we partition the data into two sets: the first two-thirds of the log are used as training data and the last one-third as testing data. For each prediction task specification, we apply our approach to generate the corresponding prediction model, and we then evaluate its prediction quality by considering each k-length trace prefix \(\tau ^{k}\) of each trace \(\tau \) in the testing set (for \(1< k < |\tau |\)). As a baseline, we use a statistical prediction technique often called Zero Rule (ZeroR). For a classification task, \(\text {ZeroR}\) predicts the most common target value in the training set; for a regression task, it predicts the mean of the target values in the training data. Although \(\text {ZeroR}\) seems quite naive, depending on the data it can in some cases outperform advanced machine learning models. The datasets and our source code are available online so as to enable the replication of the experiments.\(^3\)
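The holdout split and the ZeroR baseline described above are simple enough to write out directly. A minimal sketch of ours (function names are illustrative):

```python
# Sketch of the experimental setup: first two-thirds of the traces for
# training, last third for testing, and the ZeroR baselines.
from collections import Counter

def holdout_split(log):
    """First two-thirds of the traces for training, the rest for testing."""
    cut = (2 * len(log)) // 3
    return log[:cut], log[cut:]

def zero_r_classify(train_targets):
    """ZeroR for classification: always predict the most common target value."""
    most_common = Counter(train_targets).most_common(1)[0][0]
    return lambda _prefix: most_common

def zero_r_regress(train_targets):
    """ZeroR for regression: always predict the mean of the training targets."""
    mean = sum(train_targets) / len(train_targets)
    return lambda _prefix: mean
```

Both baselines ignore their input entirely, which is exactly why any model that genuinely uses the encoded prefix should beat them.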
Recall that our approach for building the prediction model presented in Sect. 4 is independent of the supervised machine learning model that is used. In these experiments, we consider several machine learning models, namely (i) Logistic Regression, (ii) Linear Regression, (iii) Naive Bayes, (iv) Decision Tree [10], (v) Random Forest [9], (vi) AdaBoost [27] with Decision Tree as the base estimator, (vii) Extra Trees [29], and (viii) a Voting Classifier composed of Decision Tree, Random Forest, AdaBoost, and Extra Trees. Among these, Logistic Regression, Naive Bayes, and the Voting Classifier are used only for classification tasks, Linear Regression only for regression tasks, and the rest for both. Notably, we also use a deep learning approach [30]: a deep feed-forward neural network, for which we consider various network sizes by varying the depth and width of the network. We consider numbers of hidden layers ranging from 2 to 6 and three variants of the number of neurons, namely 75, 100, and 150. For compactness of presentation, we do not show the evaluation results of all of these network variations but only the best result; the detailed results are available at http://bit.ly/sdprom2 as supplementary material. In the implementation, we use the machine learning libraries provided by scikit-learn [46]; for the neural network, we use Keras^{6} with a Theano [64] backend.
To assess the prediction quality, we use the standard metrics for evaluating classification and regression models that are generally used in the machine learning literature. These metrics are also widely used in many works in this research area (e.g., [33, 35, 37, 63, 69, 72]). For classification tasks, we use accuracy, area under the ROC curve (AUC), precision, recall, and F-measure. For regression tasks, we use mean absolute error (MAE) and root mean square error (RMSE). In the following, we briefly explain these metrics; a more elaborate explanation can be found in typical machine learning and data mining literature, e.g., [28, 31, 43].
Accuracy is the fraction of predictions that are correct. It is computed by dividing the number of correct predictions by the total number of predictions, and ranges between 0 and 1, where 1 indicates the best model and 0 the worst.

An ROC (receiver operating characteristic) curve visualizes the prediction quality of a classifier. For a good classifier, the curve should be as close to the top-left corner as possible, while random guessing is depicted as a straight diagonal line; thus, the closer the curve to the diagonal, the worse the classifier. The area under the ROC curve (AUC) summarizes this: an AUC value of 1 indicates a perfect classifier, while an AUC value of 0.5 indicates the worst classifier, one that is no better than random guessing. Hence, the closer the value to 1, the better, and the closer the value to 0.5, the worse.

Precision measures the exactness of the prediction: among all cases that are classified into a particular class, it measures the fraction that are correctly classified. Intuitively, when a classifier predicts a certain output for a certain case, the precision value indicates how likely that prediction is to be correct. Recall, on the other hand, measures the completeness of the prediction: among all cases that should be classified into a particular class, it measures the fraction that are classified correctly. Intuitively, given a particular class, the recall value indicates the ability of the model to correctly classify all cases that belong to that class. The best precision and recall value is 1.

F-measure is the harmonic mean of precision and recall, combining both values with equal weight. Formally, it is computed as \(\text {F-measure} = (2 \times P \times R) / (P + R)\), where P is precision and R is recall. The best F-measure value is 1; the closer the value to 1, the better.
MAE computes the average of the absolute errors of all predictions over the whole testing data, where each error is the difference between the expected and the predicted value. Formally, given n testing instances, \(\text {MAE}~=~\left( \sum _{i=1}^n |y_i - \hat{y_i}|\right) / n\), where \(\hat{y_i}\) (resp. \(y_i\)) is the predicted value (resp. the expected/actual value) for the testing instance i. RMSE is computed as \(\text {RMSE}~=~\sqrt{\left( \sum _{i=1}^n (y_i - \hat{y_i})^2\right) / n}\). Compared to MAE, RMSE is more sensitive to errors, since the squaring gives a larger penalty to larger errors. For both MAE and RMSE, the lower the score, the better the model.
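The two regression metrics above, written out directly as a minimal sketch:

```python
# MAE and RMSE exactly as defined above: MAE averages absolute errors,
# RMSE squares the errors before averaging, penalizing large errors more.
import math

def mae(y_true, y_pred):
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true))
```

For instance, a single large error inflates RMSE much more than MAE, which is why both are reported side by side in the tables below.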
In our experiments, we use the trace encoding that incorporates the information of the last n events, where n is the maximal trace length in the event log under consideration. Furthermore, for each experiment we consider two types of encoding, each considering different available event attributes (one encoding incorporates more event attributes than the other). The details of the event attributes that are considered are explained in each experiment below.
6.1 Experiment on BPIC 2013 event log
The event log from BPIC 2013^{7} [62] contains data from the Volvo IT incident management system called VINST. It stores information concerning the incident handling process: for each incident, a solution should be found as quickly as possible so as to restore the service with minimal interruption to the business. The log contains 7554 traces (process instances) and 65533 events. Each event carries several attributes containing various information, such as the problem status, the support team (group) involved in handling the problem, and the person working on the problem.
Table 1 The results of the experiments on the BPIC 2013 event log using prediction tasks \(R _{\text {E1}}\) (change of group while the concept:name is not 'queued'), \(R _{\text {E2}}\) (change of people/group and change back to the original person/group), and \(R _{\text {E3}}\) (involves at least three different groups). In each row, the first five metric columns use the first encoding (more features) and the last five the second encoding (fewer features); "W. Prec"/"W. Rec" denote weighted precision/recall.

\(R _{\text {E1}}\):

| Model | AUC | Accuracy | W. Prec | W. Rec | F-measure | AUC | Accuracy | W. Prec | W. Rec | F-measure |
|---|---|---|---|---|---|---|---|---|---|---|
| ZeroR | 0.50 | 0.82 | 0.68 | 0.82 | 0.75 | 0.50 | 0.82 | 0.68 | 0.82 | 0.75 |
| Logistic Reg. | 0.64 | 0.81 | 0.75 | 0.81 | 0.76 | 0.55 | 0.82 | 0.68 | 0.82 | 0.75 |
| Naive Bayes | 0.51 | 0.21 | 0.80 | 0.21 | 0.12 | 0.54 | 0.19 | 0.79 | 0.19 | 0.09 |
| Decision Tree | 0.67 | 0.78 | 0.80 | 0.78 | 0.79 | 0.68 | 0.82 | 0.76 | 0.82 | 0.77 |
| Random Forest | 0.83 | 0.84 | 0.83 | 0.84 | 0.83 | 0.68 | 0.82 | 0.76 | 0.82 | 0.77 |
| AdaBoost | 0.73 | 0.81 | 0.77 | 0.81 | 0.78 | 0.66 | 0.82 | 0.75 | 0.82 | 0.75 |
| Extra Trees | 0.81 | 0.83 | 0.81 | 0.83 | 0.82 | 0.68 | 0.82 | 0.76 | 0.82 | 0.77 |
| Voting | 0.81 | 0.81 | 0.81 | 0.81 | 0.81 | 0.68 | 0.82 | 0.76 | 0.82 | 0.77 |
| Deep Neural Net. | 0.73 | 0.83 | 0.81 | 0.83 | 0.81 | 0.68 | 0.83 | 0.78 | 0.83 | 0.75 |

\(R _{\text {E2}}\):

| Model | AUC | Accuracy | W. Prec | W. Rec | F-measure | AUC | Accuracy | W. Prec | W. Rec | F-measure |
|---|---|---|---|---|---|---|---|---|---|---|
| ZeroR | 0.50 | 0.79 | 0.63 | 0.79 | 0.70 | 0.50 | 0.79 | 0.63 | 0.79 | 0.70 |
| Logistic Reg. | 0.77 | 0.82 | 0.80 | 0.82 | 0.80 | 0.62 | 0.81 | 0.78 | 0.81 | 0.76 |
| Naive Bayes | 0.69 | 0.79 | 0.75 | 0.79 | 0.75 | 0.63 | 0.80 | 0.77 | 0.80 | 0.76 |
| Decision Tree | 0.73 | 0.82 | 0.82 | 0.82 | 0.82 | 0.76 | 0.82 | 0.80 | 0.82 | 0.80 |
| Random Forest | 0.85 | 0.86 | 0.85 | 0.86 | 0.85 | 0.78 | 0.82 | 0.80 | 0.82 | 0.80 |
| AdaBoost | 0.81 | 0.84 | 0.83 | 0.84 | 0.83 | 0.68 | 0.81 | 0.79 | 0.81 | 0.77 |
| Extra Trees | 0.85 | 0.86 | 0.85 | 0.86 | 0.86 | 0.78 | 0.82 | 0.80 | 0.82 | 0.80 |
| Voting | 0.85 | 0.86 | 0.85 | 0.86 | 0.85 | 0.77 | 0.82 | 0.81 | 0.82 | 0.81 |
| Deep Neural Net. | 0.77 | 0.86 | 0.86 | 0.86 | 0.85 | 0.78 | 0.83 | 0.82 | 0.83 | 0.80 |

\(R _{\text {E3}}\):

| Model | AUC | Accuracy | W. Prec | W. Rec | F-measure | AUC | Accuracy | W. Prec | W. Rec | F-measure |
|---|---|---|---|---|---|---|---|---|---|---|
| ZeroR | 0.50 | 0.74 | 0.54 | 0.74 | 0.63 | 0.50 | 0.74 | 0.54 | 0.74 | 0.63 |
| Logistic Reg. | 0.78 | 0.78 | 0.76 | 0.78 | 0.76 | 0.77 | 0.79 | 0.77 | 0.79 | 0.77 |
| Naive Bayes | 0.75 | 0.76 | 0.73 | 0.76 | 0.70 | 0.76 | 0.77 | 0.75 | 0.77 | 0.73 |
| Decision Tree | 0.79 | 0.82 | 0.83 | 0.82 | 0.83 | 0.81 | 0.82 | 0.82 | 0.82 | 0.82 |
| Random Forest | 0.92 | 0.87 | 0.87 | 0.87 | 0.87 | 0.83 | 0.82 | 0.82 | 0.82 | 0.82 |
| AdaBoost | 0.89 | 0.86 | 0.86 | 0.86 | 0.86 | 0.83 | 0.81 | 0.80 | 0.81 | 0.80 |
| Extra Trees | 0.91 | 0.87 | 0.87 | 0.87 | 0.87 | 0.82 | 0.82 | 0.82 | 0.82 | 0.82 |
| Voting | 0.91 | 0.85 | 0.85 | 0.85 | 0.85 | 0.82 | 0.82 | 0.81 | 0.82 | 0.82 |
| Deep Neural Net. | 0.85 | 0.85 | 0.84 | 0.85 | 0.84 | 0.83 | 0.83 | 0.82 | 0.83 | 0.82 |
Table 2 The results of the experiments on the BPIC 2013 event log using prediction tasks \(R _{\text {E4}}\) (the remaining duration of all waiting-related events) and \(R _{\text {E5}}\) (the remaining duration of all events in which the status is "wait"). The first MAE/RMSE pair uses the first encoding (more features), the second pair the second encoding (fewer features).

\(R _{\text {E4}}\):

| Model | MAE (days) | RMSE (days) | MAE (days) | RMSE (days) |
|---|---|---|---|---|
| ZeroR | 5.977 | 6.173 | 5.977 | 6.173 |
| Linear Reg. | 5.946 | 6.901 | 6.16 | 6.462 |
| Decision Tree | 5.431 | 17.147 | 5.8 | 7.227 |
| Random Forest | 4.808 | 8.624 | 5.81 | 7.114 |
| AdaBoost | 14.011 | 18.349 | 14.181 | 15.164 |
| Extra Trees | 4.756 | 8.612 | 5.799 | 7.132 |
| Deep Neural Net. | 2.205 | 4.702 | 4.064 | 4.596 |

\(R _{\text {E5}}\):

| Model | MAE (days) | RMSE (days) | MAE (days) | RMSE (days) |
|---|---|---|---|---|
| ZeroR | 1.061 | 1.164 | 1.061 | 1.164 |
| Linear Reg. | 1.436 | 1.974 | 1.099 | 1.233 |
| Decision Tree | 0.685 | 5.165 | 1.003 | 1.66 |
| Random Forest | 0.713 | 3.396 | 1.016 | 1.683 |
| AdaBoost | 1.507 | 3.89 | 1.044 | 1.537 |
| Extra Trees | 0.843 | 3.719 | 1.005 | 1.649 |
| Deep Neural Net. | 0.37 | 2.037 | 0.683 | 0.927 |
6.2 Experiment on BPIC 2012 event log
 1.
One type of activity within this process is named W_Completeren aanvraag, which stands for "filling in information for the application". The task of predicting the total duration of all remaining activities of this type is formulated as follows: where \(\textsf {RemTimeFillingInfo}\) is as follows: i.e., it computes the sum of the durations of all remaining W_Completeren aanvraag activities.
 2.
At the end of the process, an application can be declined. The task of predicting whether an application will eventually be declined is specified as follows: where \(\mathsf {Cond}_{\text {E8}}\) is as follows: i.e., \(\mathsf {Cond}_{\text {E8}}\) says that eventually there will be an event in which the application is declined.
Table 3 The results of the experiments on the BPIC 2012 event log using the prediction task \(R _{\text {E6}}\) (total duration of all remaining activities named 'W_Completeren aanvraag'). The first MAE/RMSE pair uses the first encoding (more features), the second pair the second encoding (fewer features).

| Model | MAE (days) | RMSE (days) | MAE (days) | RMSE (days) |
|---|---|---|---|---|
| ZeroR | 3.963 | 5.916 | 3.963 | 5.916 |
| Linear Reg. | 3.613 | 5.518 | 3.677 | 5.669 |
| Decision Tree | 2.865 | 5.221 | 2.876 | 5.228 |
| Random Forest | 2.863 | 5.198 | 2.877 | 5.213 |
| AdaBoost | 3.484 | 5.655 | 3.484 | 5.655 |
| Extra Trees | 2.857 | 5.185 | 2.868 | 5.191 |
| Deep Neural Net. | 2.487 | 5.683 | 2.523 | 5.667 |
Table 4 The results of the experiments on the BPIC 2012 event log using the prediction task \(R _{\text {E7}}\) (predict whether an application will eventually be 'DECLINED'). The first five metric columns use the first encoding (more features), the last five the second encoding (fewer features); "W. Prec"/"W. Rec" denote weighted precision/recall.

| Model | AUC | Accuracy | W. Prec | W. Rec | F-measure | AUC | Accuracy | W. Prec | W. Rec | F-measure |
|---|---|---|---|---|---|---|---|---|---|---|
| ZeroR | 0.50 | 0.78 | 0.61 | 0.78 | 0.68 | 0.50 | 0.78 | 0.61 | 0.78 | 0.68 |
| Logistic Reg. | 0.69 | 0.78 | 0.75 | 0.78 | 0.76 | 0.69 | 0.77 | 0.71 | 0.77 | 0.71 |
| Naive Bayes | 0.67 | 0.33 | 0.74 | 0.33 | 0.30 | 0.67 | 0.33 | 0.73 | 0.33 | 0.30 |
| Decision Tree | 0.70 | 0.78 | 0.76 | 0.78 | 0.77 | 0.70 | 0.78 | 0.76 | 0.78 | 0.77 |
| Random Forest | 0.71 | 0.79 | 0.77 | 0.79 | 0.78 | 0.71 | 0.79 | 0.77 | 0.79 | 0.78 |
| AdaBoost | 0.71 | 0.81 | 0.78 | 0.81 | 0.78 | 0.71 | 0.80 | 0.78 | 0.80 | 0.78 |
| Extra Trees | 0.71 | 0.79 | 0.77 | 0.79 | 0.78 | 0.71 | 0.79 | 0.77 | 0.79 | 0.78 |
| Voting | 0.71 | 0.79 | 0.77 | 0.79 | 0.78 | 0.71 | 0.79 | 0.77 | 0.79 | 0.77 |
| Deep Neural Net. | 0.71 | 0.80 | 0.77 | 0.80 | 0.78 | 0.71 | 0.80 | 0.78 | 0.80 | 0.78 |
6.3 Experiment on BPIC 2015 event log
In BPIC 2015^{9} [71], five event logs from five Dutch municipalities are provided. They contain the data of the processes for handling building permit applications. Since the processes in these five municipalities are in general similar, in this experiment we consider only one of the logs. Several pieces of information are available, such as the activity name and the resource/person that carried out a certain task/activity. The log that we consider has 1409 traces (process instances) and 59681 events.
Table 5 The results of the experiments on the BPIC 2015 event log using the prediction task \(R _{\text {E8}}\) (predicting whether a process is complex). The first five metric columns use the first encoding (more features), the last five the second encoding (fewer features); "W. Prec"/"W. Rec" denote weighted precision/recall.

| Model | AUC | Accuracy | W. Prec | W. Rec | F-measure | AUC | Accuracy | W. Prec | W. Rec | F-measure |
|---|---|---|---|---|---|---|---|---|---|---|
| ZeroR | 0.50 | 0.57 | 0.32 | 0.57 | 0.41 | 0.50 | 0.57 | 0.32 | 0.57 | 0.41 |
| Logistic Reg. | 0.92 | 0.83 | 0.85 | 0.83 | 0.83 | 0.90 | 0.84 | 0.84 | 0.84 | 0.83 |
| Naive Bayes | 0.81 | 0.72 | 0.82 | 0.72 | 0.71 | 0.93 | 0.68 | 0.81 | 0.68 | 0.66 |
| Decision Tree | 0.80 | 0.79 | 0.80 | 0.79 | 0.80 | 0.84 | 0.85 | 0.85 | 0.85 | 0.85 |
| Random Forest | 0.95 | 0.89 | 0.89 | 0.89 | 0.89 | 0.95 | 0.90 | 0.90 | 0.90 | 0.90 |
| AdaBoost | 0.92 | 0.87 | 0.87 | 0.87 | 0.87 | 0.93 | 0.88 | 0.88 | 0.88 | 0.88 |
| Extra Trees | 0.95 | 0.88 | 0.88 | 0.88 | 0.88 | 0.95 | 0.88 | 0.89 | 0.88 | 0.88 |
| Voting | 0.94 | 0.85 | 0.86 | 0.85 | 0.86 | 0.95 | 0.88 | 0.88 | 0.88 | 0.88 |
| Deep Neural Net. | 0.89 | 0.84 | 0.84 | 0.84 | 0.84 | 0.92 | 0.84 | 0.84 | 0.84 | 0.84 |
Table 6 The results of the experiments on the BPIC 2015 event log using the prediction task \(R _{\text {E9}}\) (the number of the remaining events/activities). The first MAE/RMSE pair uses the first encoding (more features), the second pair the second encoding (fewer features).

| Model | MAE | RMSE | MAE | RMSE |
|---|---|---|---|---|
| ZeroR | 11.21 | 13.274 | 11.21 | 13.274 |
| Linear Reg. | 6.003 | 7.748 | 14.143 | 18.447 |
| Decision Tree | 6.972 | 9.296 | 6.752 | 9.167 |
| Random Forest | 4.965 | 6.884 | 4.948 | 6.993 |
| AdaBoost | 4.971 | 6.737 | 4.879 | 6.714 |
| Extra Trees | 4.684 | 6.567 | 4.703 | 6.627 |
| Deep Neural Net. | 6.325 | 8.185 | 5.929 | 7.835 |
6.4 Discussion on the experiments
In total, our experiments involve 9 different prediction tasks over 3 different real-life event logs from 3 different domains (one event log each from BPIC 2012, BPIC 2013, and BPIC 2015).
Overall, these experiments show the capability of our language to capture and specify the desired prediction tasks over event logs coming from real-life situations. They also exhibit the applicability of our approach in automatically constructing reliable prediction models based on the given specification. This is supported by the following facts. First, for all prediction tasks that we have considered, by considering different input features and machine learning models, we are able to obtain prediction models that beat the baseline. Moreover, for all prediction tasks that predict categorical values, we are always able to obtain a prediction model with an AUC value greater than 0.5. Recall that AUC = 0.5 indicates the worst classifier, which is no better than random guessing. Thus, since we have AUC > 0.5, the generated prediction models certainly take the given input into account and predict the most probable output based on it, instead of randomly guessing the output regardless of the input. In fact, in many cases we even obtain very high AUC values, ranging between 0.8 and 0.9 (see Tables 1, 5), which is close to the AUC value of the best predictor (recall that AUC = 1 indicates a perfect classifier).
As can be seen from the experiments, the choice of the input features and the machine learning models influence the quality of the prediction model. The result of our experiments also shows that there is no single machine learning model that always outperforms other models on every task. Since our approach does not rely on a particular machine learning model, it justifies that we can simply plug in different supervised machine learning models in order to get different/better performance. In fact, in our experiments, by considering different models we could get different/better prediction quality. Concerning the input features, for each task in our experiments, we intentionally consider two different input encodings. The first one includes many attributes (hence it incorporates many information), and the second one includes only a certain attribute (i.e., it incorporates less information). In general, our common sense would expect that the more information, the better the prediction quality would be. This is simply because we thought that, by having more information, we have a more holistic view over the situation. Although many of our experiment results show this fact, there are several cases where considering less features could give us a better result, e.g., the RMSE score in the experiment with several models on the tasks \(R _{\text {E5}}\), and the scores of several metrics in the experiment \(R _{\text {E8}}\) show this fact (see Tables 2, 5). In fact, this aligns with the typical observation in machine learning. Irrelevant features could decrease the prediction performance because they might introduce noise in the prediction. Although in the learning process a good model should (or will try to) give a very low weight into irrelevant features, the absence of these unrelated features might improve the quality of the prediction. 
Additionally, in some situations, too many features might cause overfitting, i.e., the model fits the training data very well but fails to generalize to new input data.
Based on the experience from these experiments, time constraints can also be a crucial factor in choosing the model when applying this approach in practice. Some models require a lot of tuning in order to achieve good performance (e.g., neural networks), while others need little adjustment and still achieve relatively good performance (e.g., Extra Trees, Random Forest).
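The idea of plugging in different supervised models and comparing them can be sketched with scikit-learn (our choice for illustration only; the synthetic data and model settings below are placeholders, not the paper's actual tool, features, or experimental setup):

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for encoded traces: rows = trace prefixes, cols = features.
rng = np.random.default_rng(0)
X = rng.random((200, 8))
y = X[:, 0] * 10 + rng.normal(0, 0.5, 200)  # e.g., a remaining-time target

# Tree ensembles need little tuning; the neural network usually needs more.
models = {
    "extra_trees": ExtraTreesRegressor(n_estimators=100, random_state=0),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "mlp": MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=500,
                        random_state=0),
}
for name, model in models.items():
    # Cross-validated RMSE: lower is better; compare against a baseline.
    rmse = -cross_val_score(model, X, y, cv=3,
                            scoring="neg_root_mean_squared_error").mean()
    print(f"{name}: RMSE = {rmse:.3f}")
```

Because every model implements the same `fit`/`predict` interface, swapping one for another requires no change to the rest of the pipeline, which is what makes the approach model-agnostic.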
Looking at another perspective, our experiments complement various studies in the area of predictive process monitoring in several ways. First, besides the machine learning models typically used in many studies within this area, such as Random Forest and Decision Tree (cf. [17, 18, 35, 72]), we also consider machine learning models that, to the best of our knowledge, are not typically used, for instance Extra Trees, AdaBoost, and the Voting Classifier. Thus, we provide fresh insight into the performance of these machine learning models in predictive process monitoring by applying them to various prediction tasks (e.g., predicting (fine-grained) time-related information or unexpected behaviour). Although this work does not aim at comparing machine learning models, we see from the experiments that, in several cases, Extra Trees exhibits accuracy similar to Random Forest, and in some cases it outperforms Random Forest (e.g., see the experiment with the task \(R _{\text {E9}}\) in Table 6). In the experiment with the task \(R _{\text {E7}}\), AdaBoost outperforms all other models. Regarding the type of the prediction tasks, we also look into tasks that are not yet highly explored in the literature within the area of predictive process monitoring. For instance, while there are numerous works on predicting the remaining processing time, to the best of our knowledge there is no literature exploring a more fine-grained task such as predicting the remaining duration of a particular type of event (e.g., the duration of all remaining waiting events). We also consider several workload-related prediction tasks, which are rarely explored in the area of predictive process monitoring.
Concerning the deep learning approach, several studies explore the usage of deep neural networks for predictive process monitoring (cf. [19, 23, 24, 38, 63]). However, they focus on predicting the name of future activities/events, the next timestamp, and the remaining processing time. In this light, our experiments contribute new insights by exhibiting the usage of the deep learning approach on prediction tasks other than those. Although the deep neural network does not always give the best result in all tasks in our experiments, there are several interesting cases where it shows very good performance. Specifically, in the experiments with the tasks \(R _{\text {E4}}\) and \(R _{\text {E5}}\) (cf. Table 2), where no other model can beat the RMSE score of the baseline, the deep neural network is the only model that beats it.
To sum up, concerning the three questions described at the beginning of this section, the first two are positively answered by our experiments: by applying our approach to three different real-life event logs and nine different prediction tasks, we have successfully obtained reliable prediction models and demonstrated the applicability of our approach. Regarding the third question, our experiments show that the choice of the machine learning model and the information incorporated in the trace encoding influence the quality of the prediction.
7 Related work
This work is tightly related to the area of predictive analysis in business process management. In the literature, several works focus on predicting time-related properties of running processes. The works by [52, 53, 54, 55, 68, 69] focus on predicting the remaining processing time. In [68, 69], the authors present an approach for predicting the remaining processing time based on an annotated transition system that contains time information extracted from event logs. The work by [54, 55] proposes a technique for predicting the remaining processing time using stochastic Petri nets. The works by [41, 49, 58, 59] focus on predicting delays in process execution. In [58, 59], the authors use queueing theory to address the problem of delay prediction, while [41] explores delay prediction in the domain of transport and logistics processes. In [25], the authors present an ad hoc predictive clustering approach for predicting process performance. The authors of [63] present a deep learning approach (using an LSTM neural network) for predicting the timestamp of the next event and use it to predict the remaining cycle time by repeatedly predicting the timestamp of the next event.
Looking at another perspective, the works by [18, 35, 72] focus on predicting the outcomes of a running process. The work by [35] introduces a framework for predicting the business constraint compliance of a running process. In [35], the business constraints are formulated in propositional Linear Temporal Logic (LTL), where the atomic propositions are all possible events during the process executions. The work by [18] improves the performance of [35] by using a clustering preprocessing step. Another work on outcome prediction is presented by [50], which proposes an approach for predicting aggregate process outcomes by taking into account the information about overall process risk. Related to process risks, [14, 15] propose an approach for risk prediction. The work by [37] presents an approach based on evolutionary algorithms for predicting business process indicators of a running process instance, where a business process indicator is a quantifiable metric that can be measured by data generated by the processes. The authors of [40] present a work on predicting business constraint satisfaction. Particularly, [40] studies the impact of considering the estimation of prediction reliability on the costs of the processes.
Another major stream of works tackles the problem of predicting the future activities/events of a running process (cf. [11, 19, 23, 24, 38, 53, 63]). The works by [19, 23, 24, 38, 63] use deep learning approaches for predicting future events, e.g., the next event of the current running process. Specifically, [19, 23, 24, 63] use LSTM neural networks, while [38] uses a deep feed-forward neural network. In [19, 53, 63], the authors also tackle the problem of predicting the whole sequence of future events (the suffix of the current running process).
A key difference between many of those works and ours is that, instead of focusing on a particular prediction task (e.g., predicting the remaining processing time or the next event), this work introduces a specification language that enables us to specify various desired prediction tasks for predicting various kinds of future information of a running business process. To deal with these prediction tasks, we present a mechanism to automatically process the given specification of a prediction task and to build the corresponding prediction model. From another point of view, several works in this area describe the prediction tasks under study simply by using a (possibly ambiguous) natural language. In this light, our language complements this area by providing a means to formally and unambiguously specify/describe the desired prediction tasks. Consequently, it could ease the definition of the tasks and the comparison among different works that propose a particular prediction technique for a particular prediction task.
Regarding the specification language, unlike propositional LTL [51], which is the basis of the Declare language [47, 48] and is often used for specifying business constraints over a sequence of events (cf. [35]), our FOE language (which is part of our rule-based specification language) allows us not only to specify properties over sequences of events but also to specify properties over the data (attribute values) of the events, i.e., it is data-aware. Concerning data-aware specification languages, the work by [1] introduces a data-aware specification language by combining data querying mechanisms and temporal logic. Such a language has been used in several works on the verification of data-aware process systems (cf. [2, 12, 13, 56]). The works by [16, 34] provide a data-aware extension of the Declare language based on First-Order LTL (LTL-FO). Although those languages are data-aware, they do not support arithmetic expressions/operations over the data, which is absolutely necessary for our purpose, e.g., for expressing the time difference between the timestamps of the first and the last event. Another interesting data-aware language is S-FEEL, which is part of the Decision Model and Notation (DMN) standard [45] by OMG. Though S-FEEL supports arithmetic expressions over the data, it does not allow us to universally/existentially quantify over different event time points and to compare event attribute values at different event time points, which is important for our needs, e.g., in specifying the ping-pong behaviour.
Concerning aggregation, several formal languages incorporate such a feature (cf. [5, 8, 21]), and many of them have been used in system monitoring. The work by [21] extends the temporal logic Past Time LTL with a counting quantifier. This extension allows us to express a constraint on the number of occurrences of events (similar to our \({\mathbf {\mathsf{{{count}}}}} \) function). In [8], a language called SOLOIST is introduced; it supports several aggregate functions on the number of event occurrences within a certain time window. Differently from ours, both [21] and [8] do not consider aggregation over data (attribute values). The works by [5, 6] extend the temporal logic introduced in [4, 7] with several aggregate functions. Such a language allows us to select the values to be aggregated. However, due to the interplay between the set and bag semantics in their language, as they have illustrated, some values might be lost while computing the aggregation: they first collect the set of tuples of values that satisfy the specified condition, and only then collect the bag of values to be aggregated from that set of tuples. To avoid this situation, they need to make sure that each tuple of values has a sort of unique identifier. This situation does not happen in our aggregation because, in some sense, we directly use the bag semantics while collecting the values to be aggregated.
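The set-versus-bag pitfall can be illustrated with a small example of our own (not taken from [5, 6]): when two distinct events carry the same attribute value, collecting the values into a set before aggregating silently drops one occurrence and changes the aggregate.

```python
# Two distinct events happen to carry the same duration value.
events = [
    {"id": 1, "duration": 5},
    {"id": 2, "duration": 5},
    {"id": 3, "duration": 7},
]

# Set semantics: the duplicate value 5 collapses into one element.
set_based = sum({e["duration"] for e in events})   # 5 + 7 = 12
# Bag semantics: every occurrence contributes, as in our aggregation.
bag_based = sum(e["duration"] for e in events)     # 5 + 5 + 7 = 17

print(set_based, bag_based)  # → 12 17
```

Collecting values directly as a bag, keyed by the event occurrence rather than by the value itself, is exactly what makes the unique-identifier workaround unnecessary.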
Importantly, unlike the languages above, apart from allowing us to specify a complex constraint/pattern, a fragment of our FOE language also allows us to specify how to compute/obtain certain values from a trace, which is needed for specifying the desired information/values to be predicted, e.g., the remaining processing time or the remaining number of occurrences of a certain activity/event. Our language is also specifically tuned for expressing data-aware properties based on the typical structure of business process execution logs (cf. [32]), and its design is highly driven by the typical prediction tasks in business process management. From another point of view, our work complements the works on predicting SLA/business constraint compliance by providing an expressive language to specify complex data-aware constraints that may involve arithmetic expressions and data aggregation.
8 Discussion
This work should be beneficial for researchers in the area of predictive process monitoring. As we have seen, several works in this area describe the prediction tasks under study simply by using a (possibly ambiguous) natural language. In this light, our language complements this area by providing a means to formally and unambiguously specify/describe the desired prediction tasks. Consequently, it could ease the definition of the tasks and the comparison among different works that propose a particular prediction technique for a particular prediction task. Similarly, for business process analysts, our language should help them precisely specify and communicate the desired prediction tasks so as to obtain suitable prediction services. The presence of our mechanism for creating the corresponding prediction model would also help practitioners in automatically obtaining the corresponding prediction model.
In the following, we discuss the research questions described in Sect. 1 as well as the language requirements in Sect. 3. Furthermore, we also discuss the usability aspect. Finally, this section discusses potential limitations of this work, which might pave the way towards future directions.
8.1 Discussion on the research questions
Concerning RQ1, i.e., “How can a specification-driven mechanism for building prediction models for predictive process monitoring look like?”, as introduced in this work, our specification-driven approach for building prediction models for predictive process monitoring essentially consists of several steps: (i) first, we specify the desired prediction tasks by using the specification language introduced in this work; (ii) second, we create the corresponding prediction models based on the given specification by using the approach explained in Sect. 4; (iii) finally, once the prediction models are created, we use them for predicting the future information of a running process. Notice that this approach requires a mechanism to express various desired prediction tasks as well as a mechanism to process the given specification so as to build the corresponding prediction models. In this work, we provide both, and we have applied this approach to several case studies based on real-life data (cf. Sect. 6).
Regarding RQ2, i.e., “How can an expressive specification language that allows us to express various desired prediction tasks, and at the same time enables us to automatically create the corresponding prediction model from the given specification, look like? Additionally, can that language allow us to specify complex expressions involving data, arithmetic operations and aggregate functions?”, as explained in Sect. 3, the specification of a prediction task can be expressed by using an analytic rule, which is introduced in this work. Essentially, an analytic rule consists of several conditional-target expressions and allows us to specify how we want to map each partial business process execution into the expected predicted information. As can be seen in Sects. 4 and 6, a specification provided in this language can be used to train a classification/regression model that serves as the prediction model. Additionally, as part of analytic rules, we introduce an FOL-based language called FOE. As can be seen from Sects. 3 and 5, FOE allows us to specify complex expressions involving data, arithmetic operations and aggregate functions.
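The shape of an analytic rule as a list of conditional-target expressions can be sketched as follows (a hypothetical rendering in Python, not the concrete FOE syntax; the attribute and activity names are invented for illustration):

```python
# An analytic rule as ordered (condition, target) pairs: the first
# condition satisfied by the trace prefix determines the value to predict.
rule = [
    (lambda p: p[-1]["activity"] == "send_invoice",
     lambda p: "await_payment"),
    (lambda p: True,                      # default conditional-target pair
     lambda p: "other"),
]

def apply_rule(rule, prefix):
    for condition, target in rule:
        if condition(prefix):
            return target(prefix)

prefix = [{"activity": "register"}, {"activity": "send_invoice"}]
print(apply_rule(rule, prefix))  # → await_payment
```

The target side need not be a constant label: as discussed in the text, it can also compute a value from the trace (e.g., a remaining time), which is what makes the same rule shape usable for both classification and regression tasks.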
Concerning RQ3, i.e., “How can a mechanism to automatically build the corresponding prediction model based on the given specification look like?”, as explained in Sect. 4, our mechanism for building the corresponding prediction model from a specification in our language roughly consists of two core steps: (i) first, we build the training data based on the given specification; (ii) second, we use a supervised machine learning technique to learn the corresponding prediction model from the training data generated in the first step.
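These two core steps can be sketched as follows (an illustrative skeleton under our own assumptions: the toy log, the trivial `encode` function, and the hard-coded remaining-time target stand in for the specification-driven machinery described in Sect. 4):

```python
from sklearn.ensemble import RandomForestRegressor

# Toy event log: each trace is a list of (activity, timestamp) events.
log = [
    [("A", 0), ("B", 4), ("C", 9)],
    [("A", 0), ("C", 3), ("C", 8)],
]

def encode(prefix):
    # Trivial trace encoding: prefix length and last timestamp.
    return [len(prefix), prefix[-1][1]]

def target(trace, k):
    # Toy specification: remaining processing time after the k-th event.
    return trace[-1][1] - trace[k - 1][1]

# Step (i): generate one training instance per non-empty proper prefix.
X, y = [], []
for trace in log:
    for k in range(1, len(trace)):
        X.append(encode(trace[:k]))
        y.append(target(trace, k))

# Step (ii): learn the prediction model from the generated training data.
model = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y)
print(model.predict([encode([("A", 0), ("B", 4)])]))
```

In the actual approach, the labels in step (i) come from evaluating the analytic rule over each prefix rather than from a hard-coded function, but the prefix-to-instance structure is the same.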
8.2 Discussion on the language requirements
We now elaborate on why our language fulfils all requirements in Sect. 3.2. For the first requirement, since the target expression in an analytic rule can be either a numeric or a non-numeric expression, it is easy to see that the first requirement is fulfilled. The various examples of prediction task specifications in Sect. 5 also confirm this fact. Our experiments in Sect. 6 also exhibit several case studies where the predicted information is either a numeric or a non-numeric value.
Concerning the second requirement, our procedure in Sect. 4 and our experiments in Sect. 6 confirm that the corresponding prediction model can be built automatically from a specification given in our language.
Finally, regarding requirements 3, 4, and 5, our language formalism in Sect. 3 and our extensive showcases in Sect. 5 show that our language fulfils them. Specifically, we are able to specify complex expressions over sequences of events that also involve the event data, arithmetic expressions, and aggregate functions. Additionally, our various showcases demonstrate that the language allows us to specify the target information to be predicted, which may require specifying how a certain value is computed, possibly involving arithmetic expressions.
8.3 Discussion on usability
The usability of the language can only be measured by the outcomes of user studies, which are beyond the scope of this work; we therefore consider conducting a user study as one of our future directions. Although this is one of the limitations of our current work, some insights regarding this aspect can be drawn from previous studies. Taking the success story from the works on domain-specific languages (DSLs) [39]: by providing a language whose development is driven by or tailored to a particular domain/area, a DSL offers substantial gains in expressiveness and ease of use compared to a general-purpose language in its domain of application [39]. In general, DSLs were developed because they can offer domain-specificity in better ways. In this light, our language is in a similar situation, in the sense that its development is highly driven by a particular area, namely predictive process monitoring. We therefore expect similar benefits concerning this aspect.
Another aspect of usability concerns learnability. According to [39], one way to design a DSL is to build it based on an existing language, with the possible benefit of familiarity: users who already know the underlying language can easily learn and use the DSL. Our FOE language follows this approach; it is based on a well-known logic-based language, namely First-Order Logic (FOL). Furthermore, choosing FOL as the basis of the language gives us another advantage, since FOL is better known than more advanced logic-based languages such as Linear Temporal Logic (LTL): FOL is typically part of many foundational courses and is hence taught to a wider audience. In general, since our language is logic-based, a user with prior training in logic should be able to use it. Looking at another perspective, the development of our language is also highly driven by the typical structure of XES event logs. Since XES is a standard format for event logs and widely known in the area of process mining, we expect this to improve familiarity and to ease adoption by users who know the XES standard (which should be the typical users in this area). A possible direction for improving usability would be to study advanced visual techniques and explore the possibility of visual support (e.g., graphical notations) for creating specifications. Visual support has been observed in the area of DSLs as a way to improve usability [26, 44], and this kind of approach has also been seen in the literature (cf. [47, 48]).
According to [44], the effectiveness of communication is measured by the match between the intended message and the received message (information transmitted). In this light, the formal semantics of our language prevents ambiguous interpretations that could cause a mismatch between the intended and the understood meaning. In the end, this language could be useful for BP analysts to precisely specify the desired prediction tasks, which is important in order to obtain suitable prediction services. It could also be an effective means for BP analysts to communicate the desired prediction tasks unambiguously.
8.4 Limitation and future work
This work focuses on the problem of predicting the future information of a single running process based on the current information of that running process. In practice, several processes may run concurrently. Hence, it would be interesting to extend the work so as to consider prediction problems over concurrently running processes. This extension would involve extending the language itself: for instance, the language should be able to specify patterns over multiple running processes, and it should be able to express the desired predicted information, or the way to compute it, possibly involving the aggregation of information over multiple running processes. Consequently, the mechanism for building the corresponding prediction model would need to be adjusted as well.
Our experiments (cf. Sect. 6) show a possible instantiation of our generic approach for creating prediction services. In this case, we predict the future information of a running process by considering only the information from that single running process. However, in practice, processes that run concurrently might affect each other's execution. For instance, if many processes are running together and there are not enough employees to handle all of them simultaneously, some processes might need to wait. Hence, when we predict the remaining duration of waiting events, the current workload might be a factor that needs to be considered, and ideally this information should be incorporated in the prediction. One possibility to overcome this limitation is to use a trace encoding function that incorporates information related to the concurrently running processes. For instance, we can make an encoding function that extracts relevant information from all processes that are concurrently running and uses it as input features: the number of employees actively handling some process, the number of available resources/employees, the number of currently running processes of a certain type, etc.
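Such an inter-case encoding function could look as follows (a hypothetical sketch; the feature choices, names, and data layout are our own illustration and not part of the implemented tool):

```python
# Encoding that augments single-trace features with inter-case
# (workload) features, as suggested in the text.
def encode_with_workload(prefix, running_cases, active_employees):
    # Single-trace part: prefix length and timestamp of the last event.
    features = [len(prefix), prefix[-1]["timestamp"]]
    # Inter-case part: current workload of the whole system.
    features += [
        len(running_cases),   # number of concurrently running cases
        active_employees,     # employees currently handling cases
    ]
    return features

prefix = [{"activity": "register", "timestamp": 100}]
vec = encode_with_workload(prefix, running_cases=["c1", "c2", "c3"],
                           active_employees=2)
print(vec)  # → [1, 100, 3, 2]
```

Since the learning step only sees feature vectors, adding workload features in this way requires no change to the model-building mechanism itself.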
This kind of machine learning-based technique performs the prediction based on observable information. Thus, if the information to be predicted depends on some unobservable factors, the quality of the prediction might decrease. Therefore, in practice, all factors that highly influence the information to be predicted should be incorporated as much as possible. Furthermore, the prediction model is built only from the historical information about previously running processes and neglects possible domain knowledge (e.g., organizational rules) that might influence the prediction. In some sense, it (implicitly) assumes that the domain knowledge is already incorporated in the historical data that capture past process executions. It would thus be interesting to develop the technique further so as to incorporate existing domain knowledge in the creation of the prediction model, with the aim of enhancing the prediction quality. Looking at another perspective, since the prediction model is built only from historical data of past process executions, this approach is particularly suitable for situations in which an (explicit) process model is unavailable or hard to obtain.
As also observed by other works in this area (e.g., [69]), in practice, by predicting the future information of a running process, we might affect the future of the process itself and hence reduce the preciseness of the prediction. For instance, when it is predicted that a particular process will exhibit an unexpected behaviour, we might be eager to prevent it by closely watching the process. In the end, the unexpected behaviour might not happen due to our preventive actions, and hence the prediction does not come true. On the other hand, if we predict that a particular process will run normally, we might pay less attention to that process than we should, and the unexpected behaviour might then occur. Therefore, knowing the (prediction of the) future might not always be good in this case, and a certain care needs to be taken when using the predicted information.
9 Conclusion
We have introduced an approach for obtaining predictive process monitoring services based on the specification of the desired prediction tasks. Specifically, we proposed a novel rule-based language for specifying the desired prediction tasks, and we devised a mechanism for automatically building the corresponding prediction models based on the given specification. Establishing such a language is a non-trivial task: the language should be able to capture various prediction tasks while at the same time allowing a procedure for building/deriving the corresponding prediction model. Our language is a logic-based language fully equipped with a well-defined formal semantics. Therefore, it allows us to do formal reasoning over the specification, and it is machine processable, which enables us to automate the creation of the prediction model. The language allows us to express complex properties involving data and arithmetic expressions, and to specify how to compute certain values. Notably, our language supports several aggregate functions. A prototype that implements our approach has been developed, and several experiments using real-life event logs confirmed the applicability of our approach. Remarkably, our experiments also involve a deep learning model, in particular a deep feed-forward neural network.
Apart from the directions discussed in Sect. 8, future work includes extending the tool and the language. One possible extension would be to incorporate a trace attribute accessor that allows us to specify properties involving trace attribute values. As our FOE language is logic-based, there is a possibility to exploit existing logic-based tools such as satisfiability modulo theories (SMT) solvers [3] for performing reasoning tasks related to the language. Experimenting with other supervised machine learning techniques would be a next step as well, for instance using another deep learning approach (i.e., another type of neural network, such as a recurrent neural network) with the aim of improving the prediction quality.
Footnotes
 1.
Note that, as usual, a timestamp can be represented as milliseconds since the Unix epoch (i.e., the number of milliseconds that have elapsed since Jan 1, 1970 00:00:00 UTC).
 2.
We assume that variables are standardized apart, i.e., no two quantifiers bind the same variable (e.g., \(\forall i . \exists i . (i > 3)\)), and no variable occurs both free and bound (e.g., \((i> 5) ~\wedge ~\exists i . (i > 3)\)). As usual in FOL, every FOE formula can be transformed into a semantically equivalent formula where the variables are standardized apart by applying some variable renaming [61].
 3.
More information about the implementation architecture, the code, the tool, and the screencast can be found at http://bit.ly/sdprom2.
 4.
ProM is a widely used extendable framework for process mining (http://www.promtools.org).
 5.
 6.
 7.
More information on BPIC 2013 can be found in http://www.win.tue.nl/bpi/doku.php?id=2013:challenge.
 8.
More information on BPIC 2012 can be found in http://www.win.tue.nl/bpi/doku.php?id=2012:challenge.
 9.
More information on BPIC 2015 can be found in http://www.win.tue.nl/bpi/doku.php?id=2015:challenge.
Notes
Acknowledgements
Open access funding provided by University of Innsbruck and Medical University of Innsbruck. The authors thank Tri Kurniawan Wijaya for various suggestions related to this work, and Yasmin Khairina for the implementation of several prototype components.
References
 1.Bagheri Hariri, B., Calvanese, D., De Giacomo, G., Deutsch, A., Montali, M.: Verification of relational datacentric dynamic systems with external services. In: The 32nd ACM SIGACT SIGMOD SIGAI Symposium on Principles of Database Systems (PODS), pp 163–174 (2013)Google Scholar
 2.Bagheri Hariri, B., Calvanese, D., Montali, M., Santoso, A., Solomakhin, D.: Verification of semanticallyenhanced artifact systems. In: Proceedings of the 11th International Joint Conference on Service Oriented Computing (ICSOC), LNCS, vol. 8274, pp. 600–607. Springer (2013). https://doi.org/10.1007/9783642450051_51 CrossRefGoogle Scholar
 3.Barrett, C.W., Sebastiani, R., Seshia, S.A., Tinelli, C.: Satisfiability modulo theories. In: Biere, A., Heule, M., van Maaren, H. (eds.) Handbook of Satisfiability. IOS Press, Amsterdam (2009)Google Scholar
 4.Basin, D., Klaedtke, F., Müller, S., Pfitzmann, B.: Runtime monitoring of metric firstorder temporal properties. In: IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science, Leibniz International Proceedings in Informatics (LIPIcs), vol. 2, pp 49–60 (2008)Google Scholar
 5.Basin, D., Klaedtke, F., Marinovic, S., Zălinescu, E.: Monitoring of temporal firstorder properties with aggregations. In: Runtime Verification (RV) 2013, LNCS, vol. 8174, pp. 40–58. Springer (2013)Google Scholar
 6.Basin, D., Klaedtke, F., Marinovic, S., Zălinescu, E.: Monitoring of temporal firstorder properties with aggregations. Formal Methods Syst. Des. 46(3), 262–285 (2015)CrossRefGoogle Scholar
 7.Basin, D., Klaedtke, F., Müller, S., Zălinescu, E.: Monitoring metric firstorder temporal properties. J. ACM 62(2), 15:1–15:45 (2015)MathSciNetCrossRefGoogle Scholar
 8.Bianculli, D., Ghezzi, C., San Pietro, P.: The tale of SOLOIST: a specification language for service compositions interactions. In: Formal Aspects of Component Software (FACS) 2012, LNCS, vol. 7684, pp. 55–72. Springer (2013)Google Scholar
 9.Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)CrossRefGoogle Scholar
 10.Breiman, L., Friedman, J., Stone, C., Olshen, R.: Classification and Regression Trees The Wadsworth and BrooksCole Statisticsprobability Series. Taylor & Francis, Milton Park (1984)Google Scholar
 11.Breuker, D., Matzner, M., Delfmann, P., Becker, J.: Comprehensible predictive models for business processes. MIS Q. 40(4), 1009–1034 (2016)CrossRefGoogle Scholar
12.Calvanese, D., Ceylan, İ.İ., Montali, M., Santoso, A.: Verification of context-sensitive knowledge and action bases. In: Proceedings of the 14th European Conference on Logics in Artificial Intelligence (JELIA), LNCS, vol. 8761, pp. 514–528. Springer (2014). https://doi.org/10.1007/978-3-319-11558-0_36
13.Calvanese, D., Montali, M., Santoso, A.: Verification of generalized inconsistency-aware knowledge and action bases. In: Proceedings of the 24th International Joint Conference on Artificial Intelligence (IJCAI), pp. 2847–2853. AAAI Press (2015)
14.Conforti, R., de Leoni, M., La Rosa, M., van der Aalst, W.M.P.: Supporting Risk-Informed Decisions During Business Process Execution, pp. 116–132. Springer, Berlin (2013)
15.Conforti, R., de Leoni, M., La Rosa, M., van der Aalst, W.M., ter Hofstede, A.H.: A recommendation system for predicting risks across multiple business process instances. Decis. Support Syst. 69, 1–19 (2015)
16.De Masellis, R., Maggi, F.M., Montali, M.: Monitoring data-aware business constraints with finite state automata. In: Proceedings of the 2014 International Conference on Software and System Process, pp. 134–143. ACM (2014)
17.Di Francescomarino, C., Dumas, M., Federici, M., Ghidini, C., Maggi, F.M., Rizzi, W.: Predictive business process monitoring framework with hyperparameter optimization. In: Proceedings of the 28th International Conference on Advanced Information Systems Engineering (CAiSE), LNCS, vol. 9694, pp. 361–376. Springer (2016)
18.Di Francescomarino, C., Dumas, M., Maggi, F.M., Teinemaa, I.: Clustering-based predictive process monitoring. IEEE Trans. Serv. Comput. PP(99), 1–18 (2016)
19.Di Francescomarino, C., Ghidini, C., Maggi, F.M., Petrucci, G., Yeshchenko, A.: An eye into the future: leveraging a-priori knowledge in predictive business process monitoring. In: Proceedings of the 15th International Conference on Business Process Management (BPM), LNCS (2017)
20.Di Francescomarino, C., Ghidini, C., Maggi, F.M., Milani, F.: Predictive process monitoring methods: which one suits me best? In: Proceedings of the 16th International Conference on Business Process Management (BPM), pp. 462–479. Springer (2018)
21.Du, X., Liu, Y., Tiu, A.: Trace-length independent runtime monitoring of quantitative policies in LTL. In: Formal Methods (FM) 2015, LNCS, vol. 9109, pp. 231–247. Springer (2015)
22.Dumas, M., La Rosa, M., Mendling, J., Reijers, H.A.: Fundamentals of Business Process Management, 2nd edn. Springer, Berlin (2018)
23.Evermann, J., Rehse, J.R., Fettke, P.: A deep learning approach for predicting process behaviour at runtime. In: BPM Workshops 2016, pp. 327–338. Springer (2017)
24.Evermann, J., Rehse, J.R., Fettke, P.: Predicting process behaviour using deep learning. Decis. Support Syst. 100, 129–140 (2017)
25.Folino, F., Guarascio, M., Pontieri, L.: Discovering context-aware models for predicting business process performances. In: On the Move to Meaningful Internet Systems: OTM Conference 2012, pp. 287–304. Springer (2012)
26.Frank, U.: Domain-Specific Modeling Languages: Requirements Analysis and Design Guidelines, pp. 133–157. Springer, Berlin (2013)
27.Freund, Y., Schapire, R.E.: A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 55(1), 119–139 (1997)
28.Friedman, J., Hastie, T., Tibshirani, R.: The Elements of Statistical Learning. Springer, Berlin (2001)
29.Geurts, P., Ernst, D., Wehenkel, L.: Extremely randomized trees. Mach. Learn. 63(1), 3–42 (2006)
30.Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, Cambridge (2016)
31.Han, J., Pei, J., Kamber, M.: Data Mining: Concepts and Techniques. Elsevier, Amsterdam (2011)
32.IEEE Computational Intelligence Society: IEEE Standard for eXtensible Event Stream (XES) for achieving interoperability in event logs and event streams. IEEE Std 1849-2016 (2016)
33.Leontjeva, A., Conforti, R., Di Francescomarino, C., Dumas, M., Maggi, F.M.: Complex symbolic sequence encodings for predictive monitoring of business processes. In: Proceedings of the 13th International Conference on Business Process Management (BPM), LNCS. Springer (2015)
34.Maggi, F.M., Dumas, M., García-Bañuelos, L., Montali, M.: Discovering data-aware declarative process models from event logs. In: Proceedings of the 11th International Conference on Business Process Management (BPM), pp. 81–96. Springer (2013)
35.Maggi, F.M., Di Francescomarino, C., Dumas, M., Ghidini, C.: Predictive monitoring of business processes. In: Proceedings of the 26th International Conference on Advanced Information Systems Engineering (CAiSE), LNCS, vol. 8484, pp. 457–472. Springer (2014)
36.Márquez-Chamorro, A.E., Resinas, M., Ruiz-Cortés, A.: Predictive monitoring of business processes: a survey. IEEE Trans. Serv. Comput. (2017)
37.Márquez-Chamorro, A.E., Resinas, M., Ruiz-Cortés, A., Toro, M.: Runtime prediction of business process indicators using evolutionary decision rules. Expert Syst. Appl. 87, 1–14 (2017)
38.Mehdiyev, N., Evermann, J., Fettke, P.: A multi-stage deep learning approach for business process event prediction. In: 2017 IEEE 19th Conference on Business Informatics (CBI), vol. 01, pp. 119–128 (2017)
39.Mernik, M., Heering, J., Sloane, A.M.: When and how to develop domain-specific languages. ACM Comput. Surv. 37(4), 316–344 (2005)
40.Metzger, A., Föcker, F.: Predictive business process monitoring considering reliability estimates. In: Proceedings of the 29th International Conference on Advanced Information Systems Engineering (CAiSE), pp. 445–460. Springer (2017)
41.Metzger, A., Franklin, R., Engel, Y.: Predictive monitoring of heterogeneous service-oriented business networks: the transport and logistics case. In: Annual SRII Global Conference (2012)
42.Metzger, A., Leitner, P., Ivanović, D., Schmieders, E., Franklin, R., Carro, M., Dustdar, S., Pohl, K.: Comparing and combining predictive business process monitoring techniques. IEEE Trans. Syst. Man Cybern. Syst. 45(2), 276–290 (2015)
43.Mohri, M., Rostamizadeh, A., Talwalkar, A.: Foundations of Machine Learning. MIT Press, Cambridge (2012)
44.Moody, D.: The “physics” of notations: toward a scientific basis for constructing visual notations in software engineering. IEEE Trans. Softw. Eng. 35(6), 756–779 (2009)
45.Object Management Group: Decision Model and Notation (DMN) 1.0. http://www.omg.org/spec/DMN/1.0/ (2015)
46.Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
47.Pesic, M., van der Aalst, W.M.P.: A declarative approach for flexible business processes management. In: BPM Workshops 2006, pp. 169–180. Springer (2006)
48.Pesic, M., Schonenberg, H., van der Aalst, W.M.P.: DECLARE: full support for loosely-structured processes. In: 11th IEEE International Enterprise Distributed Object Computing Conference (EDOC 2007), pp. 287–298 (2007)
49.Pika, A., van der Aalst, W.M.P., Fidge, C.J., ter Hofstede, A.H.M., Wynn, M.T.: Predicting deadline transgressions using event logs. In: BPM Workshops 2012, LNBIP. Springer (2012)
50.Pika, A., van der Aalst, W., Wynn, M., Fidge, C., ter Hofstede, A.: Evaluating and predicting overall process risk using event logs. Inf. Sci. 352–353, 98–120 (2016)
51.Pnueli, A.: The temporal logic of programs. In: Proceedings of the 18th Annual Symposium on the Foundations of Computer Science (FOCS), pp. 46–57 (1977)
52.Polato, M., Sperduti, A., Burattin, A., de Leoni, M.: Data-aware remaining time prediction of business process instances. In: 2014 International Joint Conference on Neural Networks (IJCNN) (2014)
53.Polato, M., Sperduti, A., Burattin, A., de Leoni, M.: Time and activity sequence prediction of business process instances. Computing 100(9), 1005–1031 (2018)
54.Rogge-Solti, A., Weske, M.: Prediction of remaining service execution time using stochastic Petri nets with arbitrary firing delays. In: Proceedings of the 11th International Joint Conference on Service Oriented Computing (ICSOC), LNCS, vol. 8274, pp. 389–403. Springer (2013)
55.Rogge-Solti, A., Weske, M.: Prediction of business process durations using non-Markovian stochastic Petri nets. Inf. Syst. 54, 1–14 (2015)
56.Santoso, A.: Verification of data-aware business processes in the presence of ontologies. Ph.D. thesis, Free University of Bozen-Bolzano, Technische Universität Dresden. http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-213372 (2016)
57.Santoso, A.: Specification-driven multi-perspective predictive business process monitoring. In: Enterprise, Business-Process and Information Systems Modeling, BPMDS 2018, EMMSAD 2018, LNBIP, vol. 318, pp. 97–113. Springer (2018). https://doi.org/10.1007/978-3-319-91704-7_7
58.Senderovich, A., Weidlich, M., Gal, A., Mandelbaum, A.: Queue mining – predicting delays in service processes. In: Proceedings of the 26th International Conference on Advanced Information Systems Engineering (CAiSE), LNCS, vol. 8484, pp. 42–57. Springer (2014)
59.Senderovich, A., Weidlich, M., Gal, A., Mandelbaum, A.: Queue mining for delay prediction in multi-class service processes. Inf. Syst. 53, 278–295 (2015)
60.Senderovich, A., Di Francescomarino, C., Ghidini, C., Jorbina, K., Maggi, F.M.: Intra and inter-case features in predictive process monitoring: a tale of two dimensions. In: Proceedings of the 15th International Conference on Business Process Management (BPM), pp. 306–323. Springer, Cham (2017)
61.Smullyan, R.M.: First-Order Logic. Springer, Berlin (1968)
62.Steeman, W.: BPI Challenge (2013). https://doi.org/10.4121/uuid:a7ce5c55-03a7-4583-b855-98b86e1a2b07
63.Tax, N., Verenich, I., La Rosa, M., Dumas, M.: Predictive business process monitoring with LSTM neural networks. In: Proceedings of the 29th International Conference on Advanced Information Systems Engineering (CAiSE), LNCS, vol. 10253, pp. 477–492. Springer (2017)
64.Theano Development Team: Theano: A Python Framework for Fast Computation of Mathematical Expressions. arXiv:1605.02688 (2016)
65.van der Aalst, W.: Reengineering knockout processes. Decis. Support Syst. 30(4), 451–468 (2001)
66.van der Aalst, W., et al.: Process mining manifesto. In: BPM Workshops 2012, LNBIP, vol. 99, pp. 169–194. Springer (2012)
67.van der Aalst, W.M.P.: Process Mining: Data Science in Action, 2nd edn. Springer, Berlin (2016)
68.van der Aalst, W.M.P., Pesic, M., Song, M.: Beyond process mining: from the past to present and future. In: Proceedings of the 22nd International Conference on Advanced Information Systems Engineering (CAiSE), LNCS, vol. 6051, pp. 38–52. Springer (2010)
69.van der Aalst, W.M.P., Schonenberg, M., Song, M.: Time prediction based on process mining. Inf. Syst. 36(2), 450–475 (2011)
70.Van Dongen, B.: BPI Challenge (2012). https://doi.org/10.4121/uuid:3926db30-f712-4394-aebc-75976070e91f
71.Van Dongen, B.: BPI Challenge (2015). https://doi.org/10.4121/uuid:ed445cdd-27d5-4d77-a1f7-59fe7360cfbe
72.Verenich, I., Dumas, M., La Rosa, M., Maggi, F.M., Di Francescomarino, C.: Complex symbolic sequence clustering and multiple classifiers for predictive process monitoring. In: BPM Workshops 2015, LNBIP, vol. 256, pp. 218–229. Springer (2015)
73.Verenich, I., Dumas, M., La Rosa, M., Maggi, F.M., Di Francescomarino, C.: Minimizing over-processing waste in business processes via predictive activity ordering. In: Proceedings of the 28th International Conference on Advanced Information Systems Engineering (CAiSE), pp. 186–202. Springer (2016)
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.