1 Introduction

Predictive Process Monitoring [29] is a branch of process mining that aims at predicting the future of an ongoing (uncompleted) process execution. Typical examples of predictions of the future of an execution trace relate to the outcome of a process execution, to its completion time, or to the sequence of its future activities.

Being able to predict in advance the outcome of a process execution, the time that a process instance will require to complete, or the activities that will be executed next can be extremely valuable in several domains and scenarios, e.g., for production processes, allowing organizations to prevent undesired outcomes, issues and delays. Indeed, unlike monitoring business processes in a reactive way [28], i.e., so that the violation or the delay is identified only after its occurrence, predicting the violation or the issue before it occurs allows users and organizations to prevent it by taking appropriate countermeasures. Fueled also by the wave of technical developments in Data Science, Predictive Analytics, and data-driven Artificial Intelligence, the development of predictive techniques tailored to the field of Process Mining has rapidly established itself both as a vibrant research topic and as an impactful functionality with direct application in innovative organizational contexts and process mining tools, which often go hand in hand. Examples are the development of new Predictive Process Monitoring pipelines for specific organizations (such as hospitals) [3] and the investigation of explainable Predictive Process Monitoring techniques performed together with leading Process Mining companies such as myInvenio [18] with the aim of incorporating the features within their Process Mining tools (see also [36]).

Predictive Process Monitoring approaches usually leverage historical complete executions in order to provide predictions about the future of an ongoing (incomplete) case. They usually have two phases: a training or learning phase, in which a predictive model is learned from historical (complete) execution traces, and a runtime or prediction phase, in which the predictive model is queried to predict the future of an ongoing case.

The chapter is structured as follows: after an introduction of a simple explanatory example (Sect. 2) and of the main dimensions characterizing the family of the Predictive Process Monitoring approaches for business processes (Sect. 3), the next three sections describe the typical encodings and approaches used for the prediction of outcomes (Sect. 4), numeric values (Sect. 5), and sequences of activities - and related payloads - (Sect. 6), respectively. Finally, Sect. 7 presents new relevant trends in the context of operational support techniques based on Machine Learning and Sect. 8 introduces the main available open source tools supporting Predictive Process Monitoring tasks. We assume as a prerequisite for the next sections that the reader has some machine/deep learning knowledge, especially on classification and regression algorithms, as well as on recurrent neural networks. The interested reader can refer to [5, 19, 20].

2 Running Example

During the execution of a business process, process participants cooperate to satisfy certain business constraints. At any stage of the process enactment, decisions are taken with the aim of satisfying these constraints. Being able to predict in advance certain aspects of a process execution allows organizations to take advantage of, or adapt to, a desirable future unfolding, or to react to and prevent an undesirable scenario by taking the appropriate countermeasures.

In this chapter we will illustrate the potential and characteristics of Predictive Process Monitoring by means of a running example in a healthcare scenario. The example describes the process of a patient going to a hospital to undergo a radiology exam and related medical checks. The process covers both the clinical aspects, such as the visit(s) and the radiology exam(s), and administrative issues, such as the admission to the radiology department, the computation of the medical bill and its payment. During the process execution, the doctor has to make decisions on whether further exams are required and, if possible, issued. Depending on the examinations, visits can precede and/or follow the radiology exam, which can range from Ultrasound to X-ray, PET, MRI, Breast Imaging, and so on. The process typically starts with the admission, followed by the execution of the medical activities (exams and visits) and the computation and payment of the bills. Different executions are nonetheless possible, such as a payment in advance, before the visit.

In this scenario, historical information about past executions of the process, and in particular data related to the clinical history of other patients with similar characteristics, could be used to support the hospital in predicting the unfolding of a certain execution. As an example, at a certain time during the process execution, one could predict whether a certain patient will require ultrasounds and/or at what time. This may be used by the hospital staff to improve or adapt the scheduling of their facilities.

Fig. 1. Predictive Process Monitoring along three dimensions.

3 The Family of Predictive Process Monitoring Approaches

Although Predictive business Process Monitoring is a relatively young field, it has been growing fast in recent years, as witnessed by recent surveys on the topic [13, 31]. As depicted in Fig. 1, the literature on predictive business process monitoring can be roughly classified along three main dimensions:

  • type of prediction (i.e., the type of predictions provided as output);

  • type of adopted approach and technique;

  • type of information exploited in order to get predictions (i.e., the type of information taken as input).

Fig. 2. Types of predictions.

Concerning the type of prediction, according to the literature [13, 46], we can classify the existing prediction types into three main categories:

  • predictions related to predefined categorical or boolean outcome values (outcome-based predictions);

  • predictions related to measures of interest taking numeric or continuous values (numeric value predictions);

  • predictions related to sequences of future activities and related data payloads (next event predictions).

Figure 2 shows an example of an execution trace describing the activities carried out by John. Let us assume that it is 8:54 a.m. now. At 8:00 a.m. John registered at the hospital to undergo some health checks, at 8:10 he was taken to the radiology department, where he was visited at 8:15, and he is now having X-rays. Predictive Process Monitoring would allow us to answer different types of questions on the future of John. For instance, we could predict whether John will undergo an ultrasound scan in the future. The answer to this specific question will be a boolean value (e.g., it is true that John will undergo an ultrasound in the future). This is a typical example of an outcome-based prediction. However, this class of predictions also includes predictions assuming categorical values, that is, values that range over a limited and fixed number of possible options. Examples are the class of discount that will be applied to a customer at the end of his shopping, the class of risk of a given execution, or, in our scenario, the specific exam out of a number of options. Another typical question Predictive Process Monitoring could allow us to answer about John’s future is, once we know that he will undergo an ultrasound, how long it will be before he has it. The answer to this question is generally provided in terms of a numeric value (e.g., John is going to have an ultrasound exam in 26 min) and is an example of a numeric-value prediction. Typical examples in this setting are predictions related to the remaining time of an ongoing execution and predictions related to the duration or to the cost of an ongoing case. Finally, we could even predict what John is going to do from now on. The answer to this question is a sequence of future activities (e.g., John will undergo an ultrasound, will ask for his bill and will pay it). Typical examples of predictions falling under this category refer to the prediction of the sequence of the future activities (and of their data payloads) of a process case until its completion.

Fig. 3. Types of PPM approaches.

Predictive Process Monitoring approaches are usually characterized by two phases. In a first phase, the training or learning phase (see the light blue part in Fig. 3), one or more models are built or enriched by leveraging the information contained in the execution log. In the second phase, the runtime or prediction phase (see the light green part in Fig. 3), the learned model(s) is(are) exploited in order to get predictions related to an ongoing execution trace. We can identify two main groups of approaches dealing with the prediction problem:

  • approaches relying on an explicit model (model-based approaches), e.g., annotated transition systems. The explicit model can either be discovered from the event log and then enriched with the information the log contains or directly be enriched, if an explicit model is already available. In model-based approaches, the model that is then leveraged at runtime in order to get predictions is an (enriched) model in which the process control flow is somehow made explicit (see the blue box in the middle on the right of Fig. 3).

  • approaches leveraging machine learning and statistical techniques, e.g., classification and regression models, as well as neural networks. These approaches only rely on (implicit) predictive models built by encoding event log information in terms of features to be used as input for machine/deep learning techniques (see the blue box at the top on the right of Fig. 3).

Fig. 4. Information used for making predictions.

Finally, we can identify four different types of information that can be used as input to the Predictive Process Monitoring approaches, e.g., for building a model annotated with execution information or for building the features to be used by machine learning approaches:

  • information related to the control flow - i.e., the sequence of events. As depicted in the fourth row of Fig. 4, in the example of John’s history this is the information related to the activities carried out by John (e.g., check in to the hospital, go to the radiology department, ...).

  • information related to the structured data payload associated with the events. This information usually includes the timestamp of the events, but it can also include other types of data attributes. For instance, in John’s history, besides the timestamp associated with each event, the data payload of the event Visit patient also includes the doctor who has visited John, i.e., Alice (see the third row in Fig. 4).

  • information related to unstructured (textual) content, which can be available together with the event log. Indeed, it often happens that, together with the structured information related to the events and data payload, some unstructured information is also available. In John’s example, for instance, the text of Alice’s medical report is available together with the event visit patient (see the second row in Fig. 4) and could provide useful information on what John is going to do later on.

  • information related to process context, such as workload or resource availability. In John’s example, this kind of information could be related for instance to the availability of free ultrasound scan machines (first row in Fig. 4). Contextual information could provide useful information on what John is going to do later on and when. For example, the time John has to wait before undergoing an ultrasound could depend on the immediate availability of scan equipment.

In several approaches, more than one of these types of information is used in order to learn from the past.

After reporting a few more details on the model-based approaches and the approaches leveraging machine learning in the next subsection, in the following sections we will mainly focus on machine-learning approaches and on encodings taking into account event and data payload features. We will look in more detail at each of the three prediction type macro-categories, i.e., predicting outcomes, numeric values and sequences of activities (and related payloads), respectively.

3.1 Predictive Process Monitoring Approaches

We report here an overview of the main phases related to the two main families of Predictive Process Monitoring approaches, i.e., model-based approaches and approaches based on machine learning.

Fig. 5. Overview of the typical phases of model-based approaches.

Figure 5 shows the main phases characterizing the model-based approaches. At training time, an explicit and conformant model (see [7]) can either be already available or can be discovered from the historical traces (see [2, 4]) using the optional Model discovery phase in Fig. 5. The model is then enriched with information related to the data (Model enrichment), as for instance the remaining time extracted from the historical traces. At runtime, the enriched model is used in order to return a prediction.

Fig. 6. Running example model-based approaches.

Fig. 7. Annotated transition system obtained from the log reported in Fig. 6.

One of the main model-based approaches leverages a transition system as explicit control-flow model (see Definition 1 in [4]). The transition system is built based on a given abstraction of the representation of the events in the traces (e.g., the name of the activity), as well as of the representation of the state of the transition system, as for instance the sequence of activities executed so far or the set of activities that have occurred so far. For instance, let us consider the simple event log reported in Fig. 6 related to the example described in Sect. 2. Each case relates to a different patient and the corresponding sequence of events indicates the activities executed for a medical treatment of that patient. Given the variability of the process, different interplays are possible between the clinical and administrative activities. In particular, in sequence \(\sigma _1\) the process starts directly with a visit (possibly due to urgency), while the administrative part is executed in the middle of the process; in sequence \(\sigma _3\), instead, the process starts with a computation of the overall price (possibly due to the request of having a quote) before proceeding further. The timestamp of each event is reported in brackets next to the activity. For example, trace \(\sigma _2\) refers to a process execution in which the activity Visit patient is executed at time 08:00, the activity Compute rate at time 10:00 and so on. Figure 7 shows the transition system computed using the name of the activity as event representation abstraction and the activity set as state representation.

The transition system is then annotated, given a certain measurement function, as for instance the elapsed time or the remaining time, with the corresponding information extracted from the event log. For instance, information about the remaining time can be extracted from the traces and reported for each state of the transition system. This information is then used for making predictions, e.g., on the completion time of an ongoing trace, given a certain prediction function, as for example the average remaining execution time. The transition system in Fig. 7, for instance, is annotated (in blue) with the remaining time of each trace in the event log of Fig. 6. Moreover, for each state, the average of these values is also computed and reported. For example, the state corresponding to the empty set of activities is annotated with the remaining time of each trace at the beginning of the execution, i.e., 11 h for \(\sigma _1\), 8 h for \(\sigma _2\) and so on.

When, at runtime, a prediction about the completion time of a new ongoing trace is required, the annotated transition system can be queried by looking at the state of the transition system corresponding to the ongoing case, and the value of the chosen prediction function returned. For instance, let us assume we want to predict the completion time of an ongoing case \(\sigma _t =\) (Compute rate (CR) {12:00}, Visit patient (VP) {13:00}). Two measurements are associated to the corresponding state of the transition system in Fig. 7 (see the state in light green), i.e., 6 and 2 hours. Considering the average as prediction function, the average value of the measurements (4 hours) can be used to compute the predicted completion time, i.e., according to the prediction, the patient will complete his process at 17:00.
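The annotate-then-query mechanism just described can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the two-trace log below is hypothetical (it is not the log of Fig. 6), timestamps are plain hours, the abstraction is the set of activities seen so far, and the names `annotate` and `predict_completion` are introduced here for illustration. The log was chosen so that the state {CR, VP} carries the measurements 6 and 2 hours used in the example above.

```python
from collections import defaultdict

# Hypothetical log (NOT the log of Fig. 6): each event is (activity, hour).
LOG = [
    [("VP", 8), ("CR", 10), ("PU", 12), ("GP", 16)],
    [("CR", 9), ("VP", 11), ("GP", 13)],
]


def annotate(log):
    """Map each state (set of activities seen so far) to remaining times."""
    annotations = defaultdict(list)
    for trace in log:
        end = trace[-1][1]  # completion time of the trace
        seen = set()
        for activity, hour in trace:
            seen.add(activity)
            annotations[frozenset(seen)].append(end - hour)
    return annotations


def predict_completion(annotations, prefix):
    """Average remaining time of the prefix's state, added to its last timestamp."""
    state = frozenset(activity for activity, _ in prefix)
    measurements = annotations.get(state)
    if not measurements:
        return None  # the state was never observed in the historical log
    return prefix[-1][1] + sum(measurements) / len(measurements)


ANNOTATIONS = annotate(LOG)
ONGOING = [("CR", 12), ("VP", 13)]
PREDICTED = predict_completion(ANNOTATIONS, ONGOING)  # 13 + (6 + 2) / 2
```

With the average as prediction function, the ongoing case is mapped to the state {CR, VP} and the sketch returns hour 17, in line with the worked example. The sketch annotates only states reached after at least one event; handling the initial empty state would require an additional convention for the start time.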

Several extensions have been proposed to the original approach, such as annotating the transition systems with machine learning models like Naïve Bayes and Support Vector Regression models [34], taking into account also data payloads [35], and combining the annotated transition systems with a context-driven predictive clustering approach [16, 17]. Other model-based approaches consider instead sequence trees [9] or stochastic Petri nets [40, 41] as explicit models to predict the remaining execution time of a process instance.

Fig. 8. Overview of the typical phases of approaches based on machine learning.

Figure 8 sketches the main phases of the typical approaches based on machine learning. These approaches usually require that trace prefixes are extracted from the historical execution traces (Prefix extraction phase). This is due to the fact that at runtime predictions are made on incomplete traces, so that correlations between incomplete traces and what we want to predict (target variables or labels) have to be learned in the training phase. After prefixes have been extracted, prefix traces and labels (i.e., the information that has to be predicted) are encoded in the form of feature vectors (Encoding phase). Encoded traces are then passed to the (supervised learning) techniques in charge of learning one (or more) predictive model(s) from the encoded data (Supervised learning phase). At runtime, the incomplete execution traces, i.e., the traces whose future is unknown, are also encoded as feature vectors and used to query the predictive model(s) so as to get the prediction (Predicting phase).
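As a minimal sketch of the Prefix extraction phase, the following Python function turns one labelled historical trace into a set of (prefix, label) training pairs. The function name `extract_prefixes` and the `min_len`/`max_len` parameters are assumptions introduced here for illustration; whether the complete trace is also kept as a prefix is a design choice that varies across approaches.

```python
def extract_prefixes(trace, label, min_len=1, max_len=None):
    """Return (prefix, label) pairs for every prefix length in the range."""
    upper = len(trace) if max_len is None else min(max_len, len(trace))
    return [(trace[:k], label) for k in range(min_len, upper + 1)]


# Hypothetical complete trace with a boolean outcome label.
TRACE = ["Visit patient", "Perform ultrasound", "Compute rate"]
PREFIXES = extract_prefixes(TRACE, label=True)
```

Each prefix inherits the label of the complete trace it comes from, so that the learner can correlate incomplete traces with the value to be predicted.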

In this chapter we will mainly focus on approaches leveraging machine learning - and in particular supervised learning - techniques.

4 Predicting Outcomes

Outcome predictions are predictions related to (categorical) case outcomes [46]. Typical examples of outcome predictions in the Predictive Process Monitoring literature are predictions related to risks or related to the fulfilment of a predicate [11, 29].

Given an event log L and a prefix execution trace \(\sigma _i^m =\,{<}e_1, \ldots , e_m{>}\) of length m, the overall idea is learning a function \(f_c(L,\sigma _i^m)\) returning a categorical value \(\overline{label_i}\), which is as close as possible to \(label_i\), i.e., the actual (categorical) value of the variable that we aim to predict (e.g., whether the predicate will be actually fulfilled).

As described in the previous section, when dealing with approaches based on machine learning, one of the main steps to be carried out deals with encoding the information contained in (prefix) execution traces and corresponding labels in a format that is understandable by machine learning techniques. This allows the technique to train, and hence learn, a predictive model from the encoded data. In order to train a model, each (prefix) execution trace \(\sigma _i\) (and its corresponding label) has to be represented through a feature vector \(g_i=(g_{i1},g_{i2}, \ldots , g_{ih}, label_i)\).

In this section (and in the next two sections) we will present first the typical encodings used with the corresponding type of predictions and then the main (machine-learning) pipelines/approaches used to build the predictive model and query it.

4.1 Typical Data Encodings

To exemplify the different data encoding techniques, we consider the very simple log in Fig. 9 pertaining to our running example of Sect. 2. Similarly to the log used in Sect. 3.1, also in this log each case relates to a different patient and the corresponding sequence of events indicates the activities executed for a medical treatment of that patient. Visit patient is the first event of sequence \(\sigma _1\). Its data payload “\(\{\) 33, radiology\(\}\)” corresponds to the data associated with the attributes age and department. Note that the value of age is static: it is the same for all the events in a case, while the value of department is different for every event. In the payload of an event, the entire set of attributes available in the log is considered as well. If, for some event, the value of a specific attribute is not available, the value \(\bot \) (unknown) is specified for it.

Given a case prefix, we aim at predicting whether the patient will recover soon (true), or not (false). We report the corresponding value, i.e., the corresponding label, for each case after the semicolon in Fig. 9.

Fig. 9. Running example with an outcome label.

Table 1. Typical outcome-based encodings for the example in Fig. 9.

Boolean Encoding. In the boolean encoding sequences of events are represented as feature vectors, in such a way that each feature corresponds to an event class (an activity) from the log. In particular, the boolean encoding represents a sequence \(\sigma _i\) through a feature vector \(g_i=(g_{i1_A},g_{i2_A},...g_{ih_A}, label_i)\), where \(h_A\) is the size of the event class alphabet \(A=\{a_{1_A}, \ldots , a_{h_A}\}\) and if \(g_{ij_A}\) corresponds to the event class \(a_{j_A} \in A\) then:

$$ g_{ij_A} = \left\{ \begin{array}{l l} 1 & \quad \text{if } a_{j_A} \text{ occurs in } \sigma _i\\ 0 & \quad \text{if } a_{j_A} \text{ does not occur in } \sigma _i \end{array} \right. $$

For instance, the encoding of the example reported in Fig. 9 with the boolean encoding is shown in Table 1a.

Frequency-Based Encoding. The frequency-based encoding, instead of boolean values, represents the control flow in a case with the frequency of each event class in the case. The frequency-based encoding \(g_i=(g_{i1_A},g_{i2_A},...g_{ih_A}, label_i)\) of \(\sigma _i\), is such that, if \(g_{ij_A}\) corresponds to the event class \(a_{j_A} \in A\) then:

$$ g_{ij_A} = \left\{ \begin{array}{l l} n & \quad \text{if } a_{j_A} \text{ occurs } n \text{ times in } \sigma _i\\ 0 & \quad \text{if } a_{j_A} \text{ does not occur in } \sigma _i \end{array} \right. $$

Table 1b shows the frequency-based encoding for the example in Fig. 9, assuming that Visit patient occurs two times in \(\sigma _i\) and Get Payment occurs four times in \(\sigma _k\).
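The boolean and frequency-based encodings can be sketched as follows. This is an illustrative sketch, not the implementation used in the cited works: the four-activity alphabet, the sample trace and the function names are assumptions, and the label is simply appended as the last element of the feature vector.

```python
from collections import Counter

# Hypothetical event class alphabet; its order fixes the feature positions.
ALPHABET = ["Visit patient", "Perform ultrasound", "Compute rate", "Get Payment"]


def frequency_encode(trace, label, alphabet=ALPHABET):
    """One feature per event class: how many times it occurs in the trace."""
    counts = Counter(trace)
    return [counts[activity] for activity in alphabet] + [label]


def boolean_encode(trace, label, alphabet=ALPHABET):
    """One feature per event class: 1 if it occurs in the trace, 0 otherwise."""
    present = set(trace)
    return [int(activity in present) for activity in alphabet] + [label]


TRACE = ["Visit patient", "Compute rate", "Visit patient"]
FREQ = frequency_encode(TRACE, label=True)   # [2, 0, 1, 0, True]
BOOL = boolean_encode(TRACE, label=True)     # [1, 0, 1, 0, True]
```

Both encodings have a fixed length equal to the alphabet size, regardless of the trace length, but they discard the order in which the events occur.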

Simple-Index Encoding. Another way of encoding a sequence is by taking into account also information about the order in which events occur in the sequence, as in the simple-index encoding. Here, each feature corresponds to a position in the sequence and the possible values for each feature are the event classes. The resulting feature vector \(g_i\) of the simple-index encoding of an execution trace \(\sigma _i\) of length m is \(g_i=(a_{i1},a_{i2},...a_{im}, label_i)\), such that \(a_{ik}\) corresponds to the event class of the event at position k in \(\sigma _i\). By using this type of encoding the example in Fig. 9 would be encoded as reported in Table 1c.
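A minimal sketch of the simple-index encoding is given below. The function name and the padding with `None` for traces shorter than the requested length are assumptions made here (the text considers traces of exactly length m); real pipelines would typically also map event classes to numeric or one-hot values before training.

```python
def simple_index_encode(trace, label, length):
    """Feature k holds the event class at position k; the label comes last."""
    prefix = trace[:length]
    padding = [None] * (length - len(prefix))  # assumption: pad short traces
    return prefix + padding + [label]


TRACE = ["Visit patient", "Perform ultrasound", "Compute rate"]
ENCODED = simple_index_encode(TRACE, label=False, length=2)
PADDED = simple_index_encode(TRACE, label=False, length=4)
```

Unlike the boolean and frequency-based encodings, the length of the vector here depends on the prefix length, which is one reason why prefixes of the same length are often grouped together.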

Latest-Payload Encoding. The latest-payload encoding takes into account both the static and the dynamic data attributes of the traces. The value of static attributes (trace attributes) is the same for all the events in the sequence, while the value of dynamic data attributes (event attributes) changes for different events. However, in this encoding, data attributes, also the dynamic ones, are all treated as static features without taking into consideration their evolution over time. Indeed, the latest-payload encoding encodes the static attributes together with the dynamic attributes of the latest payload only. The latest-payload encoding \(g_i\) of an execution trace \(\sigma _i\) of length m is \(g_i=(s^1_{i}, \ldots , s^u_{i}, d^1_{im}, \ldots , d^r_{im}, label_i)\), where each \(s_{i}\) is a static feature and each \(d_{im}\) is a dynamic feature associated with the last event, i.e., the event at position m. Table 1d shows this encoding for the example in Fig. 9.

Index Latest-Payload Encoding. The index latest-payload encoding adds the latest-payload encoding to the simple-index encoding. The resulting feature vector \(g_i\), for a sequence \(\sigma _i\), is \(g_i = (s^1_{i}, \ldots , s^u_{i}, a_{i1}, a_{i2}, \ldots , a_{im}, d^1_{im}, \ldots , d^r_{im}, label_i)\), where each \(s_{i}\) is a static feature, each \(a_{ij}\) is the event class at position j and each \(d_{im}\) is a dynamic feature associated with the event at position m. Table 1e reports this encoding for the example in Fig. 9.

Complex Index-Based Encoding. In the complex index-based encoding, the evolution over time of the dynamic information is taken into account. The resulting feature vector \(g_i\), for a sequence \(\sigma _i\), is \(g_i = (s^1_{i}, \ldots , s^u_{i}, a_{i1}, a_{i2}, \ldots , a_{im}, d^{1}_{i1}, d^{1}_{i2}, \ldots , d^{1}_{im}, \ldots , d^{r}_{i1}, d^{r}_{i2}, \ldots , d^{r}_{im}, label_i)\), where each \(s_{i}\) is a static feature, each \(a_{ij}\) is the event class at position j and each \(d_{ij}\) is a dynamic feature associated with the event at position j. The example in Fig. 9 is transformed into the encoding shown in Table 1f.
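The three payload-aware encodings differ only in which parts of the trace they concatenate, as the following sketch tries to make explicit. It assumes that events are dictionaries, that age is the only static attribute and department the only dynamic one (as in the running example), and that static values are read from the first event; the second department value and all function names are hypothetical.

```python
STATIC = ["age"]          # trace attributes: same value for every event
DYNAMIC = ["department"]  # event attributes: may change from event to event


def latest_payload(trace, label):
    """Static attributes plus the dynamic attributes of the last event."""
    last = trace[-1]
    return [trace[0][s] for s in STATIC] + [last[d] for d in DYNAMIC] + [label]


def index_latest_payload(trace, label):
    """Static attributes, event classes by position, last dynamic payload."""
    last = trace[-1]
    return ([trace[0][s] for s in STATIC]
            + [event["activity"] for event in trace]
            + [last[d] for d in DYNAMIC]
            + [label])


def complex_index(trace, label):
    """Static attributes, event classes, and every dynamic value per position."""
    return ([trace[0][s] for s in STATIC]
            + [event["activity"] for event in trace]
            + [event[d] for d in DYNAMIC for event in trace]
            + [label])


# Hypothetical two-event prefix; "nursing ward" is an invented value.
TRACE = [
    {"activity": "Visit patient", "age": 33, "department": "radiology"},
    {"activity": "Perform ultrasound", "age": 33, "department": "nursing ward"},
]
```

For this prefix, `latest_payload` keeps only the last department, whereas `complex_index` keeps the department of every event, grouped attribute by attribute as in the formula above.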

4.2 Mostly Used Approaches: Classification-Based Approaches

Different pipelines and frameworks have been proposed for providing outcome predictions. Most of them rely on classification techniques (e.g., Decision Tree, Random Forest, Support Vector Machine) for the supervised learning phase [12, 23, 25, 29]. Moreover, most of these pipelines have been enriched with a Bucketing phase [46] (see the orange blocks in Fig. 10). The idea is that at training time multiple predictive models are trained. Specifically, the log of prefix traces is divided into multiple buckets and each bucket is used to train a different classifier. At runtime, the most suitable bucket is identified and the corresponding classifier is used for predicting the outcome.

Fig. 10. Typical outcome-based pipeline.

The Bucketing phase has been instantiated in different ways in the Predictive Process Monitoring literature. For instance, in [12] trace clustering has been used to group prefix traces. Specifically, at training time, a clustering algorithm has been leveraged to cluster together prefix traces sharing a similar control flow. For each cluster, the data payload of the prefix traces in the cluster, once encoded in the proper format, has then been used to train a classifier. At runtime, the cluster of the incomplete ongoing trace is identified, i.e., the cluster containing the trace prefixes closest to the current incomplete trace, and the corresponding classifier is queried in order to get the prediction. In [25], instead, a bucket consists of a set of prefix traces of the same length. Also in this case, at training time, a classifier for each prefix length k is built by learning from all prefix traces of length k. At runtime, the classifier trained for the length of the ongoing trace is identified and queried, and the prediction is returned.
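The prefix-length bucketing strategy can be sketched as follows. To keep the example self-contained, the classifier is replaced by a deliberately trivial placeholder that ignores the features and predicts the majority label of its bucket; in an actual pipeline each bucket would train a real classifier (e.g., a Random Forest) on encoded prefixes. The log, the class `MajorityClassifier` and the function names are all assumptions introduced for illustration.

```python
from collections import Counter, defaultdict


class MajorityClassifier:
    """Placeholder learner: ignores features and predicts the majority label."""

    def fit(self, prefixes, labels):
        self.majority = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, prefix):
        return self.majority


def train_buckets(log):
    """Train one model per prefix length k on all prefixes of length k."""
    buckets = defaultdict(lambda: ([], []))
    for trace, label in log:
        for k in range(1, len(trace) + 1):
            prefixes, labels = buckets[k]
            prefixes.append(trace[:k])
            labels.append(label)
    return {k: MajorityClassifier().fit(p, y) for k, (p, y) in buckets.items()}


def predict_outcome(models, ongoing):
    """Pick the bucket matching the ongoing prefix length and query its model."""
    model = models.get(len(ongoing))
    return None if model is None else model.predict(ongoing)


# Hypothetical labelled log: (trace, outcome).
LOG = [(["VP", "PU", "GP"], True), (["VP", "CR"], False), (["CR", "VP"], False)]
MODELS = train_buckets(LOG)
```

Here an ongoing trace of length 1 is routed to the model trained on all length-1 prefixes, while a trace longer than any historical one has no matching bucket and yields `None`; how to handle that case is left open in this sketch.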

5 Predicting Numeric Values

Numeric value predictions are predictions related to quantitative measures of interest of business process executions. Typical examples of numeric predictions in the Predictive Process Monitoring literature are predictions related to time, cost or generic process performance [1, 8, 48].

Given an event log L and a prefix execution trace \(\sigma _i^m =\,{<}e_1, \ldots , e_m{>}\) of length m, the overall idea is learning a function \(f_n(L,\sigma _i^m)\) returning a numerical value \(\overline{label_i}\), which is as close as possible to \(label_i\), i.e., the actual (numerical) value of the variable that we aim to predict (e.g., the remaining cycle time until the completion of the execution).

5.1 Typical Data Encodings

Let us consider the running example of Fig. 9 and let us assume that this time we would like to predict the time required for completing the execution (reported in Fig. 11 after the semicolon).

Fig. 11. Running example with a numeric label.

Encodings typically used for numeric predictions are the same as the ones used for categorical predictions, except for the label, which is a numerical value rather than a boolean or a categorical value. Table 2 summarizes the boolean, frequency, simple-index, latest-payload, index latest-payload and complex-index encodings for numeric-based predictions.

Table 2. Typical numeric-based encodings for the example in Fig. 9.

5.2 Mostly Used Approaches: Regression-Based Approaches

Pipelines and frameworks proposed for numeric predictions are quite similar to the ones for outcome predictions. Most of them rely on regression techniques (e.g., Regression Trees, Random Forest, XGBoost) for the supervised learning phase [23, 29].

6 Predicting Next Events

Next event predictions are predictions related to the unfolding of the future events - until the end - of an incomplete ongoing trace [45]. Next event predictions can be related to the sequence of next event classes, but also to the next data payloads associated to the events, as for instance, the timestamps or the resources associated to the next event(s).

In case of activity predictions, given an event log L and a prefix execution trace \(\sigma _i^m =\,{<}e_1, \ldots , e_m{>}\) of length m, the overall idea is learning a function \(f_{sa}(L,\sigma _i^m)\) returning a sequence of next event classes that is as close as possible to \(a_{m+1}, \ldots , \omega \), i.e., to the activity suffix of the current ongoing trace.

Most of the approaches for next activity predictions typically first learn a function \(f_{1a}\) that, given the first m events of a trace \(\sigma _i^{m}\), predicts the next event class, i.e., the event class that will occur at time step \(m+1\). The suffix of the ongoing trace \(\sigma _i^m\) is then predicted until the last event \(\omega \), by predicting the next event iteratively, that is by learning the function \(f_{sa}\):

$$\begin{aligned} f_{sa}(L,\sigma _i^m) = \left\{ \begin{array}{l l} f_{1a}(L,\sigma _i^m) & \quad \text{if } f_{1a}(L,\sigma _i^m)=\omega \\ f_{sa}(L,{<}e_1, e_2, \ldots , e_m, e{>}) & \quad \text{otherwise, with } f_{1a}(L,\sigma _i^m) \text{ as } e\text{'s event class} \end{array} \right. \end{aligned}$$
(1)
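The iterative scheme of (1) can be sketched in Python as follows. The next-activity predictor used here is a deliberately simple stand-in, not one of the models discussed in this chapter: it returns the most frequent successor of the last activity observed in a hypothetical log. The end marker `END` plays the role of \(\omega \), and the names `learn_next_activity`, `predict_suffix` and the `max_steps` guard against endless loops are assumptions introduced for illustration.

```python
from collections import Counter, defaultdict

END = "<END>"  # stands for the last event marker (omega in the text)


def learn_next_activity(log):
    """Stand-in for f_1a: most frequent successor of the last activity."""
    successors = defaultdict(Counter)
    for trace in log:
        for current, following in zip(trace, trace[1:] + [END]):
            successors[current][following] += 1

    def f_1a(prefix):
        options = successors.get(prefix[-1])
        return options.most_common(1)[0][0] if options else END

    return f_1a


def predict_suffix(f_1a, prefix, max_steps=50):
    """Apply f_1a repeatedly, extending the prefix until END is predicted."""
    suffix = []
    current = list(prefix)
    for _ in range(max_steps):
        following = f_1a(current)
        if following == END:
            break
        suffix.append(following)
        current.append(following)
    return suffix


# Hypothetical historical log of complete traces.
LOG = [["VP", "PU", "CR", "GP"], ["VP", "PU", "CR", "GP"], ["VP", "CR", "GP"]]
F_1A = learn_next_activity(LOG)
SUFFIX = predict_suffix(F_1A, ["VP"])
```

Each predicted activity is appended to the prefix before the next query, which mirrors the recursive call in (1); any next-activity model with the same interface could replace the frequency-based stand-in.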

Similarly, when predicting the values of the next events’ data attribute x, e.g., the next timestamps, the idea is learning a function \(f_{sx}(L,\sigma _i^m)\) returning a sequence of values of the data attribute x that is as close as possible to the sequence of values actually held by the attribute x in the next events of the ongoing trace.

In the next subsection describing the typical data encodings, we mainly focus on the encoding for the next event class prediction. The results can then be extended to the prediction of other data attributes related to the next event, as well as to predictions related to next events, as described in (1).

6.1 Typical Data Encodings

Let us consider the running example described in Fig. 9 enriched with timestamp information and let us assume that we want to predict the next activity related to the next time step (i.e., the activity at time step \(m+1\)). The actual activity at time step \(m+1\) is reported after the semicolon for the training traces in Fig. 12.

Fig. 12. Running example with next activity as label.

Table 3. Typical sequence-based encodings for the example in Fig. 12.

One-Hot Encoding. The one-hot encoding allows categorical data to be transformed into a numeric format. It relies on the existence of an alphabet of activities. Given the set \(A = \{a_{1_A}, \ldots , a_{h_A}\}\) of all possible activities, an ordering function \(idx:A \rightarrow \{1, \ldots , \left| A\right| \}\subseteq \mathbb {N}\) is defined on it, such that \(a_{i_A} \ne a_{j_A}\) if and only if \(i_A \ne j_A\), i.e., two activities have the same A-index if and only if they are the same activity.

For instance, in the example in Fig. 12, if the activity alphabet is \(A=\{\texttt {Visit patient}, \texttt {Perform ultrasound}, \texttt {Compute rate}, \texttt {Get Payment}, \texttt {Check X-ray}, \texttt {Emit receipt} \}\), the function \(idx: A \rightarrow \{1,2,3,4,5,6\}\) can be defined such that \(idx(\texttt {Visit patient})=1\), \(idx(\texttt {Perform ultrasound})=2\), \(idx(\texttt {Compute rate})=3\) and so on. Each event \(e_{ij} \in \sigma _i\) is then encoded as a vector \((A_{ij})\) whose features are all set to 0, except the one at the index of its event class, which is set to 1. In the training phase, the event class of the next event \(e_{m+1}\), which represents the target variable or label, is also encoded in the corresponding vector \((A_{im+1})\). The trace is finally encoded by composing the vectors obtained from all activities in the trace and the next activity into a matrix. The encoding of the trace \(\sigma _i\) is hence given by \(g_i = ((A_{i1}), \ldots , (A_{im}), (A_{im+1}))\). The one-hot encoding related to the example in Fig. 12 is reported in Table 3a.
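As a minimal sketch of this encoding, the following Python fragment builds the matrix \(g_i\) for a prefix and its label. The alphabet is the one given in the text; assigning indices 4 to 6 in the listed order is an assumption, since the text only fixes the first three. The names `one_hot` and `encode_trace` are hypothetical.

```python
# Activity alphabet from the running example; positions 4-6 follow the
# listing order, which is an assumption (the text only fixes 1-3).
ALPHABET = [
    "Visit patient", "Perform ultrasound", "Compute rate",
    "Get Payment", "Check X-ray", "Emit receipt",
]
IDX = {activity: i + 1 for i, activity in enumerate(ALPHABET)}  # 1-based idx


def one_hot(activity):
    """Vector of |A| zeros with a single 1 at position idx(activity)."""
    vec = [0] * len(ALPHABET)
    vec[IDX[activity] - 1] = 1
    return vec


def encode_trace(prefix, next_activity):
    """g_i: one row per event of the prefix, plus a final row for the label."""
    return [one_hot(a) for a in prefix] + [one_hot(next_activity)]


g = encode_trace(["Visit patient", "Perform ultrasound"], "Compute rate")
for row in g:
    print(row)
```

At prediction time the label row is of course not available, so only the rows of the prefix would be passed to the model.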

One-Hot Encoding with Temporal Features. The one-hot encoding, which takes into account only the activities, can be enriched with other information. For instance, another encoding used with activity sequences combines the one-hot encoding of features related to event classes with features related to time [45]. In the one-hot encoding with temporal features, given the set \(A = \{a_{1_A}, \ldots , a_{h_A}\}\) of all possible activities, each event \(e_{ij} \in \sigma _i\) is encoded as the one-hot encoding of its event class enriched with three additional features pertaining to time. The first one is the time difference between the considered event and the previous one (\(\delta _i\)), the second one is the time since midnight (\(h_i\)), which allows for distinguishing between working and night time, and the last one is the time since the beginning of the week (\(w_i\)), which allows for distinguishing between business and non-working days. In this case too, in the training phase, the label, i.e., the event class of the next event \(e_{m+1}\), is encoded with the one-hot encoding. The one-hot encoding with temporal features related to the example in Fig. 12 is reported in Table 3b.
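The three temporal features can be computed from the event timestamps as in the following sketch. Several choices here are assumptions not fixed by the text: all features are expressed in seconds, the week is taken to start on Monday at midnight, the first event of a trace gets a time difference of 0, and timestamps are naive `datetime` objects (time zones and daylight saving are ignored). The function name `temporal_features` is hypothetical.

```python
from datetime import datetime


def temporal_features(timestamps):
    """Return one (delta, since_midnight, since_week_start) triple per event.

    All values are in seconds; the week is assumed to start on Monday.
    """
    features = []
    previous = None
    for ts in timestamps:
        # Time elapsed since the previous event (0 for the first event).
        delta = (ts - previous).total_seconds() if previous is not None else 0.0
        midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
        since_midnight = (ts - midnight).total_seconds()
        # weekday() is 0 for Monday, so this counts from Monday 00:00.
        since_week_start = ts.weekday() * 86400 + since_midnight
        features.append((delta, since_midnight, since_week_start))
        previous = ts
    return features


# Invented timestamps: a Monday morning and the following Tuesday.
events = [datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 2, 10, 30)]
for triple in temporal_features(events):
    print(triple)
```

Each triple would then be appended to the one-hot vector of the corresponding event, typically after some normalisation.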

Embedding-Based Encoding. The embedding-based encoding is typically used when the number of possible values of one or more categorical variables is high, so that the one-hot encoding, whose vectors have one feature per possible value, would make the feature vectors very large and sparse. In the embedding-based encoding, categorical data with an alphabet of possible values of size m is mapped into an n-dimensional embedding space (where n is the chosen dimensionality of the embedding space) that encodes the values of the categorical attribute so that values that are closer in the vector space are expected to be similar.
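Mechanically, an embedding is a lookup table with one n-dimensional vector per categorical value. The sketch below only illustrates this lookup and the resulting dimensionality; the vectors are randomly initialised as a stand-in, whereas in practice they are learned together with the predictive model so that similar values end up close to each other. The names `make_embedding` and `embed_trace` are hypothetical.

```python
import random


def make_embedding(values, n, seed=0):
    """Lookup table: one n-dimensional vector per categorical value.

    Random initialisation is a placeholder for vectors learned in training.
    """
    rng = random.Random(seed)
    return {v: [rng.uniform(-1.0, 1.0) for _ in range(n)] for v in values}


def embed_trace(table, trace):
    """Replace each activity of the trace with its embedding vector."""
    return [table[activity] for activity in trace]


activities = ["Visit patient", "Perform ultrasound", "Compute rate",
              "Get Payment", "Check X-ray", "Emit receipt"]
table = make_embedding(activities, n=3)
encoded = embed_trace(table, ["Visit patient", "Check X-ray"])
print(len(encoded), len(encoded[0]))  # events in the trace, dimensions per event
```

With an alphabet of size m, each event is thus represented by n values rather than m, which is the saving the encoding aims at when n is much smaller than m.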

6.2 Mostly Used Approaches: LSTM-Based Approaches

Most approaches for next event prediction rely on Recurrent Neural Networks and, more specifically, on LSTM (Long Short-Term Memory) architectures [6, 26, 45]. This type of deep learning approach, which uses recurrent connections within a single block (the LSTM cell), is particularly suitable for sequence problems. Different types of LSTM architectures have been proposed in the literature for predicting the label associated with the next event and its data attributes.

For instance, in [45] three types of architectures have been proposed in order to predict both the next activity and the timestamp of the next event and then, iteratively, the suffix and the remaining cycle time: a first type with separate layers for activity and timestamp prediction, a second type with shared LSTM layers for both activity and timestamp prediction, and a third one with some shared and some separate layers. The architecture proposed in [6] for predicting the next activity and its timestamp, as well as the remaining cycle time and the suffix of a running case, is a composition of LSTMs and feedforward layers. In [26] an encoder-decoder framework based on LSTMs is proposed to predict the next activity and the suffix of an ongoing case. The encoder maps an input sequence into a set of high-dimensional vectors and the decoder maps them back into a new sequence that can be used for prediction tasks.

7 New Trends in ML-Driven Operational Support

Besides the mainstream works in the field of Predictive Process Monitoring, new research trends and directions focusing on ML-driven operational support have recently started being investigated and developed. Some of these new trends are summarised in the following subsections.

7.1 Intercase Predictions

In classical works, Predictive Process Monitoring methods assume that the predicted value of interest of an ongoing case depends only on intra-case information, such as the execution history of that specific case. This assumption results in encodings that include past events, inter-event durations, and other case-related attributes. However, the intra-case-only assumption does not hold in many real-life scenarios. For example, in situations where cases share limited resources, the completion time of a case heavily depends on the other cases that are running at the same time [42, 43].

Table 4. Simple-index encoding enriched with some inter-case features for the example in Fig. 9.

Inter-case information can be encoded in different ways, for instance by aggregating data related to traces running simultaneously. Examples of aggregated inter-case information that can be encoded together with the intra-case features are the number of traces and the average duration of the traces being executed in the same time window in which the considered trace (prefix) is being executed, e.g., the number of traces and the average duration of the traces executed on the same day as the current prefix trace. Table 4 shows an example of a simple-index encoding enriched with these two simple inter-case features for the example reported in Fig. 9, where we assume that 10 other traces are running on the same day on which \(\sigma ^m_1\) is being executed and that their average duration is 6 hours, while 18 traces are running simultaneously with \(\sigma ^m_k\), with an average duration of 8 hours.
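The two aggregated features can be computed as in the following sketch. It assumes, for illustration only, that each other trace is summarised by a (start, end) pair of timestamps and that a trace counts as running on a day if that day falls between its start and end dates; how to treat traces that are still open is left out. The function name `intercase_features` is hypothetical.

```python
from datetime import date, datetime


def intercase_features(current_day, other_traces):
    """Number and average duration (hours) of traces running on current_day.

    other_traces is a list of (start, end) datetime pairs.
    """
    running = [(s, e) for s, e in other_traces
               if s.date() <= current_day <= e.date()]
    count = len(running)
    if count == 0:
        return 0, 0.0
    total_seconds = sum((e - s).total_seconds() for s, e in running)
    return count, total_seconds / count / 3600


# Invented data: two traces on 1 January, one on 2 January.
others = [
    (datetime(2024, 1, 1, 8), datetime(2024, 1, 1, 14)),
    (datetime(2024, 1, 1, 9), datetime(2024, 1, 1, 17)),
    (datetime(2024, 1, 2, 8), datetime(2024, 1, 2, 10)),
]
print(intercase_features(date(2024, 1, 1), others))
```

The returned pair would be appended to the intra-case feature vector of the prefix, as done in Table 4 for the simple-index encoding.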

Taking into account the inter-case dimension is a challenging problem, since, on the one hand, we would like to take into account as much inter-case information as possible as the levels of dependencies among cases can greatly vary in different scenarios and, on the other hand, encoding several features for a large number of simultaneously running cases may lead to a feature space explosion.

7.2 Explainable Predictions

In many applications of Predictive Process Monitoring techniques, users are asked to trust a model that helps them make decisions. However, users need a certain level of trust in the predictive model: a doctor will not operate on a patient simply because the operation has been predicted or recommended by the model. Understanding the rationale behind predictions would certainly help users decide when to trust them and when not to.

Explainability techniques are a way to implement responsible process decision making (see [30]) and can help to achieve this aim. Different explainability techniques have been proposed in the XAI (Explainable Artificial Intelligence) literature. Some of these techniques have already been applied in the field of Predictive Process Monitoring in order to support users in understanding the overall predictive model [33] or the specific predictions it provides [18, 44, 51], either with model-agnostic techniques, i.e., techniques that can be applied to any predictive model, as in the case of [18], or with techniques specific to the predictive model used, as in the case of XNAP [51] and the attention layer [44] for neural networks.

Fig. 13. Example of an explanation plot related to the prediction for \(\sigma _j\).

As an example of prediction explanation related to a trace instance, let us assume that we have trained our predictive model by encoding the training set of the example reported in Fig. 9 with the complex-index encoding (see Table 1f) and that, for our current ongoing trace \(\sigma _j\) (Visit patient {20, clinic}, Perform X-Ray {20, radiology}, Perform ultrasound {20, radiology}), which we have observed up to event 3, the prediction of our predictive model is that the patient will recover soon. In order to understand whether or not we can trust the prediction, we need to understand why our predictive model has returned it. Figure 13 shows an example of a possible explanation returned by a prediction explainer such as LIME [37] or SHAP [27] applied to our specific Predictive Process Monitoring problem. The plot shows the impact of each feature (and related value) towards (in case of positive values) or against (in case of negative values) the fast recovery of the patient. In the example, the feature that has had the largest impact on the prediction of the fast recovery of the patient is her young age.

Furthermore, the explanations used for making predictions more trustworthy for the users can also be used to understand why a predictive model is wrong and hence to improve its accuracy [38].

7.3 Predictions with A-Priori Knowledge

Past event logs, or more generally knowledge about the past, are not the only important source of knowledge that can be leveraged to make predictions. In many real-life situations, together with past execution data, some case-specific additional knowledge (a-priori knowledge) about the future is available and can be leveraged for improving the predictive power of a Predictive Process Monitoring technique. Indeed, this additional a-priori knowledge is what characterizes the future context of the process executions that will affect the development of the currently running cases.

Consider, for instance, the occurrence of a strike, which may cause the delay or the cancellation of a flight in the travel process of a passenger, or the temporary unavailability of a surgery room, which may delay or even rule out the possibility of executing certain activities in a patient treatment process. In this kind of scenario, the information about the strike or about the unavailability of the surgery room is often available in advance. However, traditional Predictive Process Monitoring approaches, which only learn from the most frequently observed behaviours, are not able to take this knowledge into account. They will predict that the next activities of the passenger will be the usual ones, as if there were no strike, e.g., having the security check, moving to boarding gate 3, boarding, and so on. While it is impractical to retrain the predictive algorithms to take this additional knowledge into consideration every time it becomes available, it is reasonable to assume that considering it in some way would allow the Predictive Process Monitoring algorithm to predict, for instance, that the passenger will be moved to gate 2 and that there will be no boarding, and hence to improve the accuracy of the predictions on an ongoing case.

Fig. 14. Beam search in the a-priori approach.

One way to deal with a-priori knowledge is to take this knowledge K into account at prediction time, by guiding the Predictive Process Monitoring algorithm towards a solution that is compliant with the a-priori knowledge [14]. In [14], for instance, an approach using LSTM for predicting the next activities has been enriched with a mechanism able to take into account background knowledge K expressed in terms of LTL formulae, in order to guide the LSTM algorithm to make predictions compliant with the a-priori knowledge. The LSTM approach keeps returning likely predictions on the suffix of the current ongoing trace (up to the last event \(\omega \)) until it finds a suffix that is compliant with K. In more detail, the LSTM network uses a beam search algorithm for considering, at each time step, the top beam-width bw most likely next events. Figure 14 shows the idea of the beam-search approach with \(bw=2\). \(\sigma ^m =\,<\) Take shuttle, Enter via door 3, Check in> is the current ongoing trace at time step m. At time step \(m+1\), among the three possible next events we take the bw most likely next events (the green nodes in Fig. 14) and keep exploring those future paths. At time step \(m+2\), we again select the 2 most likely next events and keep exploring the next events of these sequences. Whenever we find a sequence that is not compliant with K, as at time step \(m+3\), we discard that path and we keep on exploring bw compliant paths. We stop whenever we predict the last event \(\omega \) (see the circle with the thicker border) and the considered trace is still compliant with K.
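The search strategy described above can be sketched in Python as follows. This is not the implementation of [14]: the LSTM is replaced by an arbitrary function `next_probs` returning a probability for each possible next activity, and the LTL check is replaced by a user-supplied predicate `is_compliant(trace, complete)`; both names, the `END` marker for \(\omega \), and the toy airport data are assumptions made for illustration. The sketch keeps the bw most likely compliant paths at each step and returns the first completed compliant suffix it encounters.

```python
import math

END = "<omega>"  # hypothetical marker for the last event (omega)


def beam_search(next_probs, prefix, is_compliant, bw=2, max_len=20):
    """Return a suffix compliant with K, or None if the search fails.

    next_probs(trace) -> {activity: probability} stands in for the LSTM;
    is_compliant(trace, complete) -> bool stands in for the check against K.
    """
    beams = [(0.0, list(prefix))]  # (log-probability, trace so far)
    for _ in range(max_len):
        candidates = [
            (logp + math.log(p), trace + [act])
            for logp, trace in beams
            for act, p in next_probs(trace).items()
            if p > 0
        ]
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for logp, trace in candidates:
            complete = trace[-1] == END
            if not is_compliant(trace, complete):
                continue  # discard paths that violate K
            if complete:
                return trace[len(prefix):-1]  # predicted suffix without END
            beams.append((logp, trace))
            if len(beams) == bw:
                break  # keep only the bw most likely compliant paths
        if not beams:
            return None
    return None


# Toy next-event model (invented numbers), keyed on the last activity.
TABLE = {
    "Check in": {"Security check": 1.0},
    "Security check": {"Go to gate 3": 0.7, "Go to gate 2": 0.3},
    "Go to gate 3": {"Boarding": 1.0},
    "Go to gate 2": {END: 1.0},
    "Boarding": {END: 1.0},
}
toy_model = lambda trace: TABLE[trace[-1]]
no_boarding = lambda trace, complete: "Boarding" not in trace  # K: strike

prefix = ["Take shuttle", "Enter via door 3", "Check in"]
print(beam_search(toy_model, prefix, no_boarding, bw=2))
```

In this toy setting a beam width of 1 would follow only the most likely path, which ends in the forbidden boarding, and the search would fail; with \(bw=2\) the less likely but compliant path via gate 2 is kept alive and eventually returned.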

7.4 Prescriptive Process Monitoring

Predictive Process Monitoring techniques are able to predict the likelihood of a positive outcome, the time required for completing an execution, or the next activities that will be executed. However, all these techniques are limited to prediction. They do not further support stakeholders in deciding whether it is worth intervening to avoid undesired outcomes and what to do next to optimize a given Key Process Performance Indicator (KPI) [24, 32, 47, 50].

Prescriptive Process Monitoring aims to overcome this limitation of Predictive Process Monitoring by recommending or prescribing to stakeholders whether to take actions in order to prevent or mitigate the occurrence of an undesired outcome [32, 47], or which activities to perform in order to optimize a certain measure of interest [24, 50].

In the first scenario [32, 47], predictions are used in order to evaluate, through a cost model, the tradeoff between the cost of intervening to mitigate undesired outcomes and the cost of compensating for unnecessary interventions. For instance, in the example related to the patient recovery described in Sect. 4, if the prediction related to an ongoing trace is that the patient will not recover soon, a surgery may increase the likelihood that the patient will recover soon and hence reduce the overall cost for the hospital. However, the surgery has a cost, so that if the surgery has been planned because of a wrong prediction, its cost is unnecessary and should be avoided.
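A simple way to make this tradeoff concrete is to compare the expected cost of intervening with the expected cost of not intervening. The sketch below is one possible cost model written for illustration, not the one used in [32, 47]; all parameter names are hypothetical: `p_undesired` is the predicted probability of the undesired outcome, `c_out` its cost, `c_in` the cost of the intervention, `eff` the fraction of undesired outcomes the intervention is assumed to avoid, and `c_com` the cost of compensating for an intervention that turned out to be unnecessary.

```python
def expected_costs(p_undesired, c_out, c_in, eff, c_com=0.0):
    """Expected cost of (not intervening, intervening) under the toy model."""
    no_action = p_undesired * c_out
    action = (
        c_in                                    # the intervention itself
        + p_undesired * (1.0 - eff) * c_out     # outcome not avoided
        + (1.0 - p_undesired) * c_com           # unnecessary intervention
    )
    return no_action, action


def should_intervene(p_undesired, c_out, c_in, eff, c_com=0.0):
    """Recommend the intervention only if it lowers the expected cost."""
    no_action, action = expected_costs(p_undesired, c_out, c_in, eff, c_com)
    return action < no_action


# Invented numbers: a likely slow recovery justifies the surgery,
# an unlikely one does not.
print(should_intervene(0.8, c_out=100, c_in=20, eff=0.5, c_com=10))
print(should_intervene(0.1, c_out=100, c_in=20, eff=0.5, c_com=10))
```

The same comparison shows why the quality of the prediction matters: an overestimated probability of the undesired outcome makes the intervention look worthwhile when it is not.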

In the second scenario [24, 50], predictions are used to uncover the future of different continuations of the current trace, so as to identify and hence recommend the one(s) leading to the best value for the KPI of interest. For instance, we can consider the example of the patient recovery described in Sect. 5. If the aim is to recommend next activities that minimize the remaining cycle time until the completion of an ongoing trace \(\sigma ^m\) of length m, the possible next activities at step \(m+1\) can be considered. For each possible continuation \(\sigma ^{m+1}\) of \(\sigma ^m\), the remaining time until the end of the execution can be predicted, and the next activity corresponding to the minimum remaining time can be recommended.
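This recommendation step can be sketched as follows. The function `predict_remaining_time` stands for any trained remaining-time predictor and is passed in as a parameter; it, the function name `recommend_next_activity`, and the toy estimates are assumptions made for illustration.

```python
def recommend_next_activity(prefix, candidates, predict_remaining_time):
    """Pick the candidate whose continuation has the lowest predicted time.

    predict_remaining_time(trace) -> float stands in for a trained model.
    Returns the recommended activity and the score of every candidate.
    """
    scores = {
        activity: predict_remaining_time(prefix + [activity])
        for activity in candidates
    }
    best = min(scores, key=scores.get)
    return best, scores


# Toy predictor with invented estimates (hours), keyed on the last activity.
TOY_ESTIMATES = {"Perform ultrasound": 30.0, "Check X-ray": 12.0}
toy_predictor = lambda trace: TOY_ESTIMATES[trace[-1]]

best, scores = recommend_next_activity(
    ["Visit patient"], ["Perform ultrasound", "Check X-ray"], toy_predictor
)
print(best, scores)
```

The sketch looks only one step ahead; exploring longer continuations would require combining it with a search strategy over the possible suffixes.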

8 Tool Support

The research on Predictive Process Monitoring has been accompanied by the development of non-commercial plugins and tools intended to be used and improved by the research community. In the following, we briefly illustrate three of the main open-source tools supporting Predictive Process Monitoring.

8.1 Predictive Process Monitoring in ProM

ProM [15] is one of the most widely used and best-known tools in Process Mining. It is a framework collecting a number of plugins that work independently of one another, each focused on implementing a specific task. Among its variety of plugins, ProM also collects several plugins implementing techniques for the prediction of outcomes (e.g., [8, 29]), for the prediction of numerical values (e.g., [1, 10, 16, 23]), as well as for the prediction of next activity sequences (e.g., [35]). Some of them leverage model-based approaches (e.g., [1]), while others rely on machine-learning solutions (e.g., [10]).

8.2 Predictive Process Monitoring in Apromore

Apromore [22] is a well-known and established tool. It is an advanced process model repository that allows storing, analysing, and re-using large sets of process models. The tool is web-based and therefore allows the easy integration of new plug-ins in a service-oriented manner. It aims both at allowing practitioners to deal with the challenges of process stakeholders and at allowing researchers to develop and benchmark their own techniques, with a strong emphasis on the separation of concerns. The only plug-in addressing Predictive Process Monitoring tasks in Apromore is the one described in [49]. This plug-in supports outcome-based and numeric predictions, as well as next event predictions.

8.3 Predictive Process Monitoring in Nirdizati

Nirdizati [21, 39] is a web-based application for supporting users in building, comparing, and analyzing predictive models that can then be used to perform predictions on the future development of an ongoing case. Differently from the other tools, Nirdizati specifically addresses Predictive Process Monitoring problems. Nirdizati, which collects a rich set of state-of-the-art approaches based on machine learning algorithms, supports users in dealing with different predictive monitoring tasks: outcome-based, numeric, and next activity predictions. Moreover, it provides services for supporting the users in tuning the hyperparameters of the specific technique, the possibility of adding some simple inter-case features to the encodings, as well as some incremental algorithms, so that the predictive model can be incrementally updated as soon as new execution traces are available. Finally, it also offers several plots for the visualisation of the results, thus supporting the users in comparing predictive models.