1 Introduction

Predictive business process monitoring aims at anticipating potential process performance violations during the execution of a business process instance [15]. To this end, predictive business process monitoring predicts how an ongoing process instance will unfold up to its completion [18]. If a violation is predicted, a process instance may be proactively adapted to prevent the occurrence of violations [23]. As an example, if a delay in delivery time is predicted for an ongoing freight transport process, faster means of transport or alternative transport routes may be scheduled before the delay actually occurs.

A key requirement for the applicability of any predictive business process monitoring technique is that the technique delivers accurate predictions. Informally, prediction accuracy characterizes the ability of a prediction technique to forecast as many true violations as possible, while – at the same time – generating as few false alarms as possible [23]. Prediction accuracy is important to avoid the execution of unnecessary process adaptations, as well as not to miss required process adaptations [20].

Research has focused on improving a prediction technique’s aggregate accuracy [1, 2, 5, 6, 7, 11, 13, 16, 19, 22, 25]. Aggregate accuracy takes into account the results of a set of predictions. Examples of aggregate accuracy metrics are precision, recall, and mean average prediction error. Compared with aggregate accuracy metrics, prediction reliability estimates provide additional information about the error of an individual prediction for a given business process instance [4]. As an example, an aggregate accuracy of 75% means that, for a given prediction, there is a 75% chance that the prediction is correct. In contrast, the reliability estimate of one prediction may be 60% while for another prediction it may be 90%. Reliability estimates thus facilitate distinguishing between more and less reliable predictions on a case-by-case basis. Reliability estimates can help decide whether to trust an individual prediction [12] and consequently whether to perform a proactive adaptation of the given process instance. In our freight transport example above, a process manager may only trust predictions with a reliability higher than 80%. Only then would the process manager proactively schedule faster – and therefore also more expensive – means of transport.

Intuitively, considering reliability estimates sounds appealing, as reliability estimates offer more information to the process manager for decision making. However, we lack empirical evidence to support this intuition, as research on predictive business process monitoring has focused on aggregate prediction accuracy. As an example, the process manager may be more conservative and act only if a prediction is reliable, which in turn should reduce the number of unnecessary process adaptations. Yet, it may also be that a process manager becomes too conservative and rejects relevant predictions deemed not reliable enough.

We experimentally analyze the effect of considering reliability estimates during predictive business process monitoring. We use an ensemble of artificial neural network prediction models, which we apply to an industry data set from the transport and logistics domain. Each prediction model aims at predicting whether a transport process instance may violate its delivery deadline. We consider reliability estimates for proactive process adaptation and analyze their effect from two complementary points of view. First, we analyze their effect on the rate of process performance violations (in our freight example, the rate of processes completed with delays). Second, we analyze their effect on costs.

After introducing relevant background and discussing related work in Sect. 2, we describe our experimental design in Sect. 3. In Sect. 4 we present and discuss our experimental results. Section 5 concludes with an outlook on future work.

2 Background and Related Work

2.1 Prediction Reliability

Background. Reliability estimates provide more information about the individual prediction error than aggregate accuracy metrics [4]. Reliability estimates can be computed in different ways. On the one hand, reliability estimates can be derived from information provided by individual prediction techniques. An example is decision tree learning (e.g., see [18]), which indicates the number of training examples correctly classified (“class support”) and the percentage of training examples correctly classified (“class probability”) for each prediction.

On the other hand, reliability estimates can be computed using ensemble prediction, as illustrated in Fig. 1. Ensemble prediction is a meta-prediction technique where the predictions of m prediction models are combined [21]. The main aim of ensemble prediction is to increase aggregate prediction accuracy. However, ensemble prediction also allows computing reliability estimates (e.g., see [3, 4]). In our experiments, we use ensemble prediction to compute reliability estimates for binary predictions. This means each prediction result \(T_i\), \(i = 1, \ldots , m\), is either of class “violation” or “non-violation”. Formally, we compute the reliability estimate \(\rho \) as the fraction of the m prediction models that predict the majority class: \(\rho = \max \left( \frac{|\{i : T_i = \text{violation}\}|}{m}, \frac{|\{i : T_i = \text{non-violation}\}|}{m} \right) \).

Fig. 1. Reliability estimates computed using ensemble prediction
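The computation of a reliability estimate from an ensemble can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that reliability is the share of the m models voting for the majority class, which is consistent with the threshold range \(\theta \in [0.5,1]\) used in Sect. 3.1. The function name `ensemble_prediction` is hypothetical.

```python
from collections import Counter


def ensemble_prediction(votes):
    """Combine m binary predictions into (majority_class, reliability).

    `votes` holds the individual results T_1..T_m, each either
    "violation" or "non-violation". Reliability is ASSUMED here to be
    the fraction of models that vote for the majority class.
    """
    if not votes:
        raise ValueError("need at least one prediction")
    counts = Counter(votes)
    # On a tie, most_common keeps the class that was encountered first.
    majority_class, majority_count = counts.most_common(1)[0]
    return majority_class, majority_count / len(votes)


if __name__ == "__main__":
    votes = ["violation"] * 7 + ["non-violation"] * 3
    print(ensemble_prediction(votes))
```

With an ensemble of size m, this estimate can only take values in steps of 1/m, which is one way to read the later remark that larger ensembles deliver more fine-grained reliability estimates.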

Related Work. Research on predictive business process monitoring has focused on improving aggregate prediction accuracy [1, 2, 5, 6, 7, 11, 13, 16, 19, 22, 25]. Only recently have reliability estimates been considered in the context of predictive business process monitoring.

Maggi et al. [18] use decision tree learning for predictive business process monitoring. Reliability estimates are computed from class probabilities and class support. In their experiments, they report on the impact of considering reliability estimates on aggregate prediction accuracy. They observe that using reliability estimates may improve aggregate accuracy. They also measure the loss of predictions, i.e., the number of predictions that are not considered because they are below the threshold. They observe that the loss of predictions is usually not very high (around 20%). However, they provide no further analysis of their observations and the possible implications for considering reliability estimates.

Di Francescomarino et al. [12] present a general predictive business process monitoring framework, which can be tailored to fit a given data set. They use decision trees and random forests for prediction. Reliability estimates are computed from class probabilities and class support. If prediction reliability is above a certain threshold, the prediction is considered. In their experiments, they measure “failure rate” among other metrics to assess the performance of different framework instances. “Failure rate” is defined as the percentage of process instances for which no reliable prediction could be given. They observe that “failure rate” may vary widely for the different framework instances. Yet, they provide no further analysis on the variables that may have an effect on “failure rate”.

2.2 Costs

Background. Proactive process adaptation entails asymmetric costs. On the one hand, one may face penalties in case of violations; e.g., due to contractual arrangements (e.g., SLAs) or due to loss of customers. On the other hand, adapting the running business processes may incur costs; e.g., due to executing roll-back actions or due to scheduling alternative process activities.

Figure 2 shows a cost model that incorporates these two aforementioned cost drivers (based in parts on [19, 20]). In this model, costs depend on (1) the actual process performance if no adaptation was taken, (2) whether the prediction was accurate, and (3) whether a business process adaptation was effective, i.e., whether the adaptation indeed resulted in a non-violation.

Fig. 2. Asymmetric costs of proactive business process adaptation

We use the cost model in Fig. 2 as the basis for our experiments. We have purposefully chosen it to be simple, in order to concisely analyze and present our experimental results. Of course, costs may be more complex in reality. On the one hand, different shapes of penalties may exist [17]; e.g., penalties may be higher if the actual delays in a transport process are longer. On the other hand, adaptation costs may differ depending on the extent of changes needed for a specific business process instance; e.g., scheduling air transport as an alternative means of transport would typically be more expensive than truck transport.
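To make the three cost factors concrete, the following sketch encodes one possible reading of the cost model. It rests on assumptions not confirmed by the text: without adaptation, a penalty is incurred only if the instance actually violates; with adaptation, the adaptation costs are always incurred, and the penalty is added whenever the adaptation is not effective. The function and parameter names are hypothetical, and `lam` stands for the ratio of adaptation costs to penalty.

```python
def instance_cost(actual_violation, adapted, adaptation_effective,
                  penalty=100.0, lam=0.5):
    """Cost of one process instance under an ASSUMED reading of Fig. 2.

    actual_violation     -- would the instance violate without adaptation?
    adapted              -- was a proactive adaptation performed?
    adaptation_effective -- did the adaptation result in a non-violation?
    lam                  -- adaptation costs as a fraction of the penalty
    """
    if not adapted:
        # Missed adaptation costs the penalty; otherwise nothing is due.
        return penalty if actual_violation else 0.0
    cost = lam * penalty  # adaptation costs are always incurred
    if not adaptation_effective:
        cost += penalty   # the instance still ends in a violation
    return cost


if __name__ == "__main__":
    print(instance_cost(True, False, False))          # missed adaptation
    print(instance_cost(True, True, True, lam=0.2))   # effective adaptation
    print(instance_cost(False, True, False, lam=0.2)) # ineffective adaptation
```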

Related Work. Different ways of factoring in costs during predictive business process monitoring and proactive process adaptation have been presented in the literature. On the one hand, costs may be considered by the prediction technique itself. A prominent class of approaches is cost-sensitive learning [10]. Cost-sensitive learning attempts to minimize costs due to prediction errors, rather than optimizing aggregate prediction accuracy. Cost-sensitive learning incorporates asymmetric costs into the learning of prediction models [26, 27]. However, existing cost-sensitive learning techniques do not consider reliability estimates.

On the other hand, costs may be considered when deciding on proactive process adaptations. Cost-based adaptation attempts to minimize the overall costs of process execution and adaptation. Leitner et al. [17] were among the first to argue that the costs of adaptation should be considered when deciding on the adaptation of service-oriented workflows. They formalized an optimization problem taking into account costs of violations and costs of applying adaptations. Their experimental results indicate that cost reductions of up to 56% may be achieved. However, these cost-aware proactive adaptation techniques do not consider prediction reliability. In addition, they rest on the assumption that process adaptations are always effective (i.e., lead to non-violations), which may not give an accurate view of reality. In our experiments, we factor in both reliability estimates and the effectiveness of adaptations.

3 Experiment Design

As motivated in Sect. 1, we aim to analyze the effect of considering reliability estimates during predictive business process monitoring and proactive process adaptation. In this section, we describe the design of our experiments to answer the following two research questions:

 

RQ1: What effect does considering reliability estimates have on the rate of non-violations?

RQ2: What effect does considering reliability estimates have on costs?

 

Below, we define the experimental variables, introduce the industry data set, and describe how we implemented the prediction techniques used.

3.1 Experimental Variables

In our experiment, we consider the following dependent variables:

  • Non-violation rate: We use the non-violation rate to answer RQ1. Given the number of process instances that completed with non-violations, l, and the number of all process instances, n, the non-violation rate is l / n.

  • Costs: We use costs to answer RQ2. For each process instance, we compute its individual costs according to the cost model from Sect. 2.2. The total costs are the sum of the individual costs of all process instances.

We consider the following independent variables:

  • Reliability threshold \(\theta \in [0.5,1]\). If the reliability estimate for an individual process instance is higher than \(\theta \), we assume that a process manager would consider this a reliable prediction. As a result, the process manager would perform a proactive process adaptation if the prediction indicates a violation.

  • Adaptation effectiveness \(\alpha \in (0,1]\). If an adaptation helps achieve a non-violation (cf. Fig. 2), we consider such an adaptation effective. We use \(\alpha \) to represent the fact that not all adaptations might be effective. More concretely, \(\alpha \) represents the probability that an adaptation is effective; e.g., \(\alpha = 1\) means that all adaptations are effective. We do not consider \(\alpha = 0\) as this means that no adaptation is effective.

  • Relative adaptation costs \(\lambda \in [0,1]\). Based on our cost model from Sect. 2.2, \(\lambda \) expresses the costs of a business process adaptation, \(c_a\), as a fraction of the penalty for process violation, \(c_p\), i.e., \(c_a = \lambda \cdot c_p\). Choosing \(\lambda > 1\) would not make sense, as this leads to higher costs than if no adaptation is performed.
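The way the reliability threshold and the relative adaptation costs enter the decision can be sketched as follows. The function names are hypothetical; the strict comparison follows the wording "higher than \(\theta \)" above.

```python
def should_adapt(predicted_violation, reliability, theta):
    """Adapt only if a violation is predicted AND the prediction is
    considered reliable, i.e., its reliability is strictly above theta."""
    return bool(predicted_violation and reliability > theta)


def adaptation_costs(penalty, lam):
    """Adaptation costs c_a expressed as a fraction lam of the penalty c_p."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam is expected to lie in [0, 1]")
    return lam * penalty


if __name__ == "__main__":
    print(should_adapt(True, 0.9, 0.8))   # reliable violation prediction
    print(should_adapt(True, 0.8, 0.8))   # not strictly above the threshold
    print(adaptation_costs(100.0, 0.25))
```

Under this rule, \(\theta = 1\) means that no prediction is ever considered reliable, so no proactive adaptation takes place.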

3.2 Industry Data Set

The data set we use in our experiments stems from operational data of an international freight forwarding company. The data set covers five months of business operations and includes event logs of 3,942 business process instances, comprising a total of 56,082 activities. The processes and event data comply with IATA’s Cargo 2000 standard. Figure 3 shows the BPMN model of the business processes covered by the data set.

Fig. 3. Structure of Cargo 2000 transport and logistics process

Up to three shipments from suppliers are consolidated and in turn shipped to customers to benefit from better freight rates or increased cargo security. The business processes are structured into incoming and outgoing transport legs, which jointly aim at ensuring that freight is delivered to customers on time. Each transport leg involves the execution of transport and logistics activities, which are labeled using the acronyms of the Cargo 2000 standard. A transport leg may involve multiple flight segments (e.g., if cargo is transferred to other flights or airlines at stopover airports), in which case, “RCF” loops back to “DEP”. The number of segments per leg may range from one to four.

3.3 Implementation of Ensemble Prediction

We focus on predicting violations of business process performance metrics. Specifically, we aim at predicting, during process execution, whether a transport process instance may violate its delivery deadline. Thereby, process managers are warned about possible delays as early as possible (e.g., see [9]) so they can proactively adapt the running process instance.

Predictions may be performed at any point in time during process execution. For our experiment, we have chosen to perform the predictions immediately after the synchronization point of the incoming transport processes as indicated in Fig. 3. Our earlier work has shown reasonably good prediction accuracy of more than \(70\%\) for this point in process execution, while still leaving time to execute actions required to respond to violations or mitigate their effects [19].

As prediction model, we use artificial neural networks (ANNs [14]), which have shown good results in our earlier work [19]. We use the implementation of ANNs (with their standard parameters) of the WEKA open source machine learning toolkit. As attributes for the ANN model, we use the expected and actual times for all process activities until the point of prediction (i.e., all activities of the incoming transports in Fig. 3), and the actual violation or non-violation of the delivery time of the completed process instance.

To automatically train the ensembles of ANNs and to compute the reliability estimates according to Sect. 2.1, we developed a Java tool that interfaces with WEKA. We use bagging (bootstrap aggregating) as a concrete ensemble prediction technique. Bagging generates m new training data sets by sampling from the whole training data set uniformly and with replacement. For each of the m new training data sets an individual prediction model is trained. Bagging is a generally recommended and used ensemble prediction technique for ANNs [8].
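The sampling step of bagging can be illustrated with a few lines of Python. This is only a sketch of the general technique, not the Java tool described above; the function name and the `sample_fraction` parameter are hypothetical, and training the individual models is left out.

```python
import random


def bootstrap_samples(training_set, m, sample_fraction=1.0, seed=None):
    """Draw m bootstrap samples, uniformly and with replacement.

    sample_fraction is a hypothetical knob for the size of each new
    training set relative to the original one.
    """
    if not training_set:
        raise ValueError("training set must not be empty")
    rng = random.Random(seed)
    size = max(1, round(sample_fraction * len(training_set)))
    # random.Random.choices samples with replacement.
    return [rng.choices(training_set, k=size) for _ in range(m)]


if __name__ == "__main__":
    data = list(range(10))  # stand-in for training instances
    for sample in bootstrap_samples(data, m=3, sample_fraction=0.8, seed=42):
        print(sorted(sample))
```

Each of the m samples would then be used to train one prediction model of the ensemble.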

4 Results

Here, we present the experimental results to answer the two research questions posed above, and discuss how we addressed potential threats to validity.

To give a first impression of the effect of considering reliability estimates, Fig. 4 shows aggregate accuracy measurements. We measured precision, recall, specificity and correct classification rate. As can be seen, considering reliability estimates has a positive effect on aggregate accuracy. Aggregate accuracy improves with higher reliability threshold \(\theta \). This is in line with previous empirical evidence (e.g., see [4, 12, 18]).

Fig. 4. Aggregate accuracy indicators and loss

In addition, Fig. 4 shows the loss of predictions, i.e., the rate of predictions that have a reliability estimate below the threshold \(\theta \). As can be seen, the loss of predictions increases with higher \(\theta \), and relatively more predictions are lost the higher \(\theta \) gets. As we speculated in the introduction, this loss of predictions may imply that required process adaptations could be missed, as the process manager may be too conservative and reject relevant predictions with low reliability. In turn, the number of violation situations in which the process manager does not act may increase. We further explore this in the following sections.
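How loss and aggregate accuracy relate to the threshold can be sketched as below. The sketch assumes that "violation" is the positive class and that a prediction is retained only if its reliability is strictly above \(\theta \); the record layout and function name are hypothetical.

```python
def accuracy_with_threshold(records, theta):
    """Aggregate accuracy over retained predictions, plus loss.

    records -- iterable of (actual_violation, predicted_violation, reliability)
    A prediction is retained if reliability > theta (ASSUMED cut-off rule).
    """
    records = list(records)
    if not records:
        raise ValueError("need at least one prediction")
    kept = [r for r in records if r[2] > theta]
    tp = sum(1 for a, p, _ in kept if a and p)
    fp = sum(1 for a, p, _ in kept if not a and p)
    tn = sum(1 for a, p, _ in kept if not a and not p)
    fn = sum(1 for a, p, _ in kept if a and not p)

    def ratio(num, den):
        # Undefined metrics (empty denominator) are reported as None.
        return num / den if den else None

    return {
        "loss": 1 - len(kept) / len(records),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "correct_classification_rate": ratio(tp + tn, len(kept)),
    }


if __name__ == "__main__":
    demo = [(True, True, 0.9), (False, True, 0.6),
            (False, False, 0.8), (True, False, 0.55)]
    for theta in (0.5, 0.7):
        print(theta, accuracy_with_threshold(demo, theta))
```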

4.1 Results for RQ1 (Non-violation Rates)

RQ1 is concerned with the effect of reliability estimates on non-violation rates. Figure 5 shows the non-violation rates depending on the reliability threshold (\(\theta \)) and the probability that adaptations are effective (\(\alpha \)).

Fig. 5. Non-violation rates for varying adaptation effectiveness (\(\alpha \))

A positive effect means that non-violation rates are (1) higher than the non-violation rates if no proactive process adaptation is performed (= value on the right-hand side of the figure), and (2) higher than the non-violation rates when process adaptations are performed without reliability estimates (= values on the left-hand side of the figure). This leads to two cases as indicated in Fig. 5:

  • For \(\alpha \ge .7\) (dashed lines), considering reliability estimates has a negative effect on non-violation rates and thus non-violation rates decrease. The reason is that with a higher loss of predictions (cf. Fig. 4), the number of missed adaptations increases. Yet, this is not compensated by fewer unnecessary adaptations. Due to the large \(\alpha \), each adaptation will lead to a non-violating situation with high probability. Thus performing too many adaptations seems to be better than performing too few.

  • Reliability estimates can have a positive effect and lead to higher non-violation rates if \(\alpha < .7\) (solid lines). For a given \(\alpha \), optimal non-violation rates are marked by “\(\times \)” in Fig. 5.

In conclusion, our experimental results suggest the following answer to RQ1: Considering reliability estimates can have a positive effect on non-violation rates if the probability of effective process adaptations is low.
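The interplay between \(\theta \) and \(\alpha \) can be explored with a small sketch that computes the expected non-violation rate for a set of predictions. It assumes, as one reading of the notion of effectiveness in Sect. 2.2, that any adapted instance ends as a non-violation with probability \(\alpha \), regardless of whether it would have violated without adaptation. The record layout and function name are hypothetical.

```python
def expected_non_violation_rate(records, theta, alpha):
    """Expected share of instances completing without a violation.

    records -- iterable of (actual_violation, predicted_violation, reliability)
    ASSUMPTION: an adapted instance ends as a non-violation with
    probability alpha; a non-adapted instance keeps its actual outcome.
    """
    records = list(records)
    if not records:
        raise ValueError("need at least one process instance")
    total = 0.0
    for actual_violation, predicted_violation, reliability in records:
        adapted = predicted_violation and reliability > theta
        if adapted:
            total += alpha
        elif not actual_violation:
            total += 1.0
    return total / len(records)


if __name__ == "__main__":
    demo = [(True, True, 0.9), (False, True, 0.6),
            (False, False, 0.8), (True, False, 0.55)]
    for theta in (0.5, 0.7, 1.0):
        print(theta, expected_non_violation_rate(demo, theta, alpha=0.5))
```

Under this assumption, an unnecessary adaptation can turn a non-violation into a violation with probability \(1 - \alpha \), which is why filtering out unreliable predictions can pay off when \(\alpha \) is low.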

Conversely, this would imply that considering reliability estimates when the probability of effective process adaptations is high might not make sense in practice. However, this conclusion would not consider costs, which we analyze in the next section.

4.2 Results for RQ2 (Costs)

RQ2 is concerned with the effect of reliability estimates on costs. Figure 6 shows our experimental results when factoring in costs.

Fig. 6. Costs for \(\alpha = .8\)

We factor in costs according to the cost model from Sect. 2.2. Without loss of generality, penalty costs are set to 100. Adaptation costs are computed as fraction (\(\lambda \)) of penalty costs. We have chosen \(\alpha = 0.8\), which is a relatively high probability of effective process adaptations. According to RQ1, considering reliability estimates for such \(\alpha \) should have a negative effect.

A positive effect on costs means that costs are (1) lower than the costs if no proactive process adaptation is performed (value on the right-hand side of the figure), and (2) lower than the costs when process adaptations are performed without reliability estimates (values on the left-hand side of the figure). This leads to three cases, as indicated in Fig. 6:

  • For \(\lambda \ge .8\) (dashed lines), proactive adaptation – independently of whether considering reliability estimates or not – leads to costs that are higher than not performing any proactive process adaptation. The reason is that the avoided penalties do not compensate the prohibitively high adaptation costs.

  • For \(\lambda < .2\) (dashed lines), considering reliability estimates during proactive process adaptation leads to higher costs than not considering reliability estimates. The reason is that due to the loss of predictions, the rate of missed adaptations may go up. As adaptation costs are low, investing in proactive adaptation – even if some are unnecessary – pays off to prevent penalties. It should be noted though, that even if one considered reliability estimates, costs would remain lower than if not performing any proactive adaptation.

  • Reliability estimates have a positive effect for \(.2 \le \lambda \le .7\) (solid lines). In these situations, there is an optimal choice of \(\theta \) that leads to the lowest costs.

Above, \(\alpha \) was fixed. To provide the complete picture, Fig. 7 depicts a matrix considering the complete ranges of \(\alpha \), \(\lambda \) and \(\theta \) (cf. Sect. 3.1), thereby aggregating 5,000 experimental data points.

Fig. 7. Threshold \(\theta \) that leads to minimal costs, depending on \(\alpha \) and \(\lambda \)

For each combination of \(\alpha \) and \(\lambda \), the matrix shows the value of \(\theta \) that leads to the lowest costs (these are the points marked with “\(\times \)” in Fig. 6). If not considering reliability estimates performed best, this is indicated by \(\theta = 0\). If no proactive adaptation performed best, this is indicated by \(\theta = 1\). Again, we can differentiate three cases:

  • Proactive process adaptation in general does not have a positive effect on costs if \(\lambda \ge \alpha \). This is the case in 47.5% of all situations.

  • Considering reliability estimates does not have a positive effect on costs if \(\lambda \) is small and \(\alpha \) is large (9% of all situations). Again, even if one considered reliability estimates, costs would remain lower than if not performing any proactive adaptation.

  • In the remaining 43.5% of all situations, considering reliability estimates can have a positive effect. Again, the minimal costs depend on the choice of \(\theta \).
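Finding the cost-minimal threshold for one combination of \(\alpha \) and \(\lambda \) amounts to a search over candidate values of \(\theta \). The following sketch illustrates this under assumptions that are not confirmed by the text: an adaptation always incurs the adaptation costs and is effective with probability \(\alpha \), and a non-adapted instance incurs the penalty only if it actually violates. The names and the record layout are hypothetical.

```python
def expected_total_cost(records, theta, alpha, lam, penalty=100.0):
    """Expected total costs for a given threshold (ASSUMED cost model).

    records -- iterable of (actual_violation, predicted_violation, reliability)
    """
    total = 0.0
    for actual_violation, predicted_violation, reliability in records:
        if predicted_violation and reliability > theta:
            # Adaptation costs plus expected penalty of an ineffective adaptation.
            total += lam * penalty + (1.0 - alpha) * penalty
        elif actual_violation:
            total += penalty  # missed adaptation
    return total


def best_threshold(records, alpha, lam, thetas):
    """Return the candidate theta with the lowest expected total costs."""
    records = list(records)
    return min(thetas, key=lambda t: expected_total_cost(records, t, alpha, lam))


if __name__ == "__main__":
    demo = [(True, True, 0.9), (False, True, 0.6),
            (False, False, 0.8), (True, False, 0.55)]
    print(best_threshold(demo, alpha=0.8, lam=0.3, thetas=[0.5, 0.7, 1.0]))
```

Repeating this search over a grid of \(\alpha \) and \(\lambda \) values would yield a matrix of the same kind as the one described above.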

Fig. 8. Histogram of relative cost savings

To quantify the size of the effect on costs, we have measured the relative cost savings that may be achieved in each of the situations in which considering reliability estimates has a positive effect. Results are shown in Fig. 8. Savings range from 2% to 54%, with 14% savings on average. Savings of more than 30% are achieved in 15% of the situations.

Finally, we aimed to determine whether we can choose a reliability threshold that would work in all situations. To this end, we have computed for each situation the minimal \(\theta \) for which costs will be below the costs of not performing any adaptation. We considered the largest of these minimal \(\theta \) as a safe lower bound for reliability thresholds. Our results indicate that a threshold of \(\theta > 80\%\) will lead to costs lower than if not performing any adaptation.

In conclusion, our experimental results suggest the following answer to RQ2: Provided that the relative adaptation costs are smaller than the probability of effective process adaptations, considering reliability estimates can have a positive effect on costs.
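The condition \(\lambda < \alpha \) can be motivated by a short expected-cost comparison. This is a sketch under an assumed reading of the cost model from Sect. 2.2: an adaptation always costs \(c_a = \lambda \cdot c_p\) and results in a non-violation with probability \(\alpha \); otherwise the penalty \(c_p\) is still incurred. The symbol \(p\), denoting the probability that a predicted violation is a true violation, is introduced here for illustration only.

\[
E[\text{costs} \mid \text{adaptation}] = c_a + (1 - \alpha ) \cdot c_p = (\lambda + 1 - \alpha ) \cdot c_p, \qquad
E[\text{costs} \mid \text{no adaptation}] = p \cdot c_p
\]

Adapting is therefore cheaper in expectation only if \(\lambda + 1 - \alpha < p\), i.e., \(\lambda < \alpha - (1 - p)\). Since \(p \le 1\), \(\lambda < \alpha \) is a necessary condition: for \(\lambda \ge \alpha \), adaptation cannot pay off even if every violation prediction is correct. Less accurate predictions (smaller \(p\)) tighten the condition further, which is where retaining only reliable predictions may help.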

4.3 Addressing Threats to Validity

Regarding internal validity, we minimized the risk of bias as follows. For training and testing the prediction models, we performed a 10-fold cross-validation. In addition, we analyzed the impact of the following main parameters of our ensemble prediction technique: (1) Computing reliability estimates: In addition to computing reliability estimates as defined in Sect. 2.1, we used weighted reliability, which factors in the probability delivered by individual ANN predictions. Differences in results were marginal. (2) Bootstrap size: Bootstrap size (see Sect. 3.3) refers to the size of the newly generated training data sets. We used 80%, 66%, and 50% as bootstrap sizes. There was no clear trend that larger bootstrap sizes would perform better than smaller ones, and different bootstrap sizes did not impact the general shape of the experimental results. (3) Ensemble size: We varied ensemble size from 2 to 100. The size of the ensemble did not lead to different principal findings. However, as expected, larger ensembles generally delivered better aggregate accuracy. More importantly, larger ensembles delivered more fine-grained reliability estimates.

Regarding external validity, our experimental results are based on a relatively large industry data set. We have specifically chosen different reliability thresholds (\(\theta \)), different probabilities of effective process adaptations (\(\alpha \)), and different adaptation costs (\(\lambda \)) to cover different possible situations that may be faced in practice. The process model covers many relevant workflow patterns [24]: sequence; exclusive choice and simple merge; cycles; parallel split and synchronization. Still, our data set is from a single application domain, which may limit generalizability. As mentioned in Sect. 2.2, our cost model was purposefully chosen to be simple. Even though this cost model helped us analyze and understand the effects of the independent variables in our experiment, it may have been too simple.

In view of construct validity, we took great care to ensure we measure the right things. In particular, we assessed the impact of considering reliability estimates from different, complementary angles: aggregate accuracy, loss, non-violation rates, and costs.

5 Conclusions and Perspectives

Our experimental evidence suggests that considering reliability estimates during predictive business process monitoring can have a positive effect on costs. With respect to the independent variables of our experiment, the effect mainly depends on the effectiveness of proactive process adaptations and the relative costs of these process adaptations, while the concrete reliability threshold has a secondary effect. In our experiments, proactive process adaptation in general had a positive effect on costs in 52.5% of the situations. In 82.9% of these situations, considering reliability estimates increased the positive effect, with cost savings ranging from 2% to 54%, and 14% on average. We also determined that even if one considered reliability estimates in the remaining 17.1% of situations, costs would remain lower than if not performing any proactive adaptation. Finally, our results suggest 80% is a safe lower bound for choosing reliability thresholds.

Our results also clearly indicate that considering reliability estimates does not lead to a positive effect in all situations. How to determine these situations up front remains an open question. This paper was a first step towards answering this question. We plan to gather further empirical data by replicating our experiments in other application domains, such as energy and e-commerce.

Further, we aim at using more complex cost models. These cost models will consider different shapes of penalties and different costs of adaptations, both of which depend on the extent of deviations from expected business performance metrics. We will therefore perform numeric predictions, instead of the binary predictions used in this paper, to quantify the extent of deviations. This will be complemented by case studies with industry to capture the perspective of users.