Predictive Business Process Monitoring Considering Reliability Estimates
Predictive business process monitoring aims at predicting potential problems during process execution so that these problems can be proactively managed and mitigated. Compared to aggregate prediction accuracy indicators (e.g., precision or recall), prediction reliability estimates provide additional information about the prediction error for an individual business process. Intuitively, it appears appealing to consider reliability estimates when deciding on whether to adapt a running process instance or not. However, we lack empirical evidence to support this intuition, as research on predictive business process monitoring has focused on aggregate prediction accuracy. We experimentally analyze the effect of considering prediction reliability estimates for proactive business process adaptation. We use ensemble prediction techniques, which we apply to an industry data set from the transport and logistics domain. In our experiments, proactive business process adaptation in general had a positive effect on costs in 52.5% of the situations. In 82.9% of these situations, considering reliability estimates increased the positive effect, leading to cost savings of up to 54%, with 14% savings on average.
Keywords: Business process monitoring · Proactive adaptation · Prediction · Empirical evaluation
Predictive business process monitoring aims at anticipating potential process performance violations during the execution of a business process instance. To this end, predictive business process monitoring predicts how an ongoing process instance will unfold up to its completion. If a violation is predicted, a process instance may be proactively adapted to prevent the occurrence of violations. As an example, if a delay in delivery time is predicted for an ongoing freight transport process, faster means of transport or alternative transport routes may be scheduled before the delay actually occurs.
A key requirement for the applicability of any predictive business process monitoring technique is that the technique delivers accurate predictions. Informally, prediction accuracy characterizes the ability of a prediction technique to forecast as many true violations as possible, while – at the same time – generating as few false alarms as possible. Prediction accuracy is important to avoid the execution of unnecessary process adaptations, as well as not to miss required process adaptations.
Research has focused on improving a prediction technique’s aggregate accuracy [1, 2, 5, 6, 7, 11, 13, 16, 19, 22, 25]. Aggregate accuracy takes into account the results of a set of predictions. Examples of aggregate accuracy metrics are precision, recall, and mean average prediction error. Compared with aggregate accuracy metrics, prediction reliability estimates provide additional information about the error of an individual prediction for a given business process instance. As an example, an aggregate accuracy of 75% means that, for a given prediction, there is a 75% chance that the prediction is correct. In contrast, the reliability estimate of one prediction may be 60%, while for another prediction it may be 90%. Reliability estimates thus facilitate distinguishing between more and less reliable predictions on a case-by-case basis. Reliability estimates can help decide whether to trust an individual prediction, and consequently whether to perform a proactive adaptation of the given process instance. In our freight transport example above, a process manager may only trust predictions with a reliability higher than 80%. Only then would the process manager proactively schedule faster – and therefore also more expensive – means of transport.
Intuitively, considering reliability estimates sounds appealing, as reliability estimates offer more information to the process manager for decision making. As an example, the process manager may be more conservative and act only if a prediction is reliable, which in turn should reduce the number of unnecessary process adaptations. Yet, a process manager may also become too conservative and reject relevant predictions deemed not reliable enough. However, we lack empirical evidence about these effects, as research on predictive business process monitoring has focused on aggregate prediction accuracy.
We experimentally analyze the effect of considering reliability estimates during predictive business process monitoring. We use an ensemble of artificial neural network prediction models, which we apply to an industry data set from the transport and logistics domain. Each prediction model aims at predicting whether a transport process instance may violate its delivery deadline. We consider reliability estimates for proactive process adaptation and analyze their effect from two complementary points of view. First, we analyze their effect on the rate of process performance violations (in our freight example, the rate of processes completed with delays). Second, we analyze their effect on costs.
After introducing relevant background and discussing related work in Sect. 2, we describe our experimental design in Sect. 3. In Sect. 4 we present and discuss our experimental results. Section 5 concludes with an outlook on future work.
2 Background and Related Work
2.1 Prediction Reliability
Background. Reliability estimates provide more information about the individual prediction error than aggregate accuracy metrics. Reliability estimates can be computed in different ways. For example, they can be derived from information provided by individual prediction techniques, such as decision tree learning, which indicates the number of training examples correctly classified (“class support”) and the percentage of training examples correctly classified (“class probability”) for each prediction.
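As an illustration (a minimal sketch with hypothetical numbers, not taken from the paper), the two quantities can be computed from the class counts of the decision-tree leaf that produced a prediction:

```python
# Sketch: "class support" and "class probability" at a decision-tree leaf,
# usable as a reliability estimate for the prediction made at that leaf.
def leaf_reliability(class_counts, predicted_class):
    """class_counts maps each class to its number of training examples in the leaf."""
    support = class_counts[predicted_class]             # class support
    probability = support / sum(class_counts.values())  # class probability
    return support, probability

# A hypothetical leaf holding 18 'violation' and 2 'non-violation' training examples:
support, probability = leaf_reliability(
    {"violation": 18, "non_violation": 2}, "violation")
print(support, probability)  # 18 0.9
```

A prediction from a nearly pure leaf (probability close to 1) would then be deemed more reliable than one from a mixed leaf.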
Related Work. Research on predictive business process monitoring has focused on improving aggregate prediction accuracy [1, 2, 5, 6, 7, 11, 13, 16, 19, 22, 25]. Only recently have reliability estimates been considered in the context of predictive business process monitoring.
Maggi et al. use decision tree learning for predictive business process monitoring. Reliability estimates are computed from class probabilities and class support. In their experiments, they report on the impact of considering reliability estimates on aggregate prediction accuracy. They observe that using reliability estimates may improve aggregate accuracy. They also measure the loss of predictions, i.e., the number of predictions that are not considered because their reliability estimates fall below the threshold. They observe that the loss of predictions is usually not very high (around 20%). However, they provide no further analysis of their observations and the possible implications of considering reliability estimates.
Di Francescomarino et al. present a general predictive business process monitoring framework, which can be tailored to fit a given data set. They use decision trees and random forests for prediction. Reliability estimates are computed from class probabilities and class support. If prediction reliability is above a certain threshold, the prediction is considered. In their experiments, they measure “failure rate” among other metrics to assess the performance of different framework instances. “Failure rate” is defined as the percentage of process instances for which no reliable prediction could be given. They observe that “failure rate” may vary widely for the different framework instances. Yet, they provide no further analysis on the variables that may have an effect on “failure rate”.
2.2 Cost Model
Background. Proactive process adaptation entails asymmetric costs. On the one hand, one may face penalties in case of violations; e.g., due to contractual arrangements (e.g., SLAs) or due to loss of customers. On the other hand, adapting the running business process may incur costs; e.g., due to executing roll-back actions or due to scheduling alternative process activities.
We use the cost model in Fig. 2 as the basis for our experiments. We have purposefully chosen it to be simple, in order to concisely analyze and present our experimental results. Of course, costs may be more complex in reality. On the one hand, different shapes of penalties may exist; e.g., penalties may be higher if the actual delays in a transport process are longer. On the other hand, adaptation costs may differ depending on the extent of changes needed for a specific business process instance; e.g., scheduling air transport as an alternative means of transport would typically be more expensive than truck transport.
Related Work. Different ways of factoring in costs during predictive business process monitoring and proactive process adaptation have been presented in the literature. On the one hand, costs may be considered by the prediction technique itself. A prominent class of approaches is cost-sensitive learning. Cost-sensitive learning attempts to minimize costs due to prediction errors, rather than optimizing aggregate prediction accuracy. Cost-sensitive learning incorporates asymmetric costs into the learning of prediction models [26, 27]. However, existing cost-sensitive learning techniques do not consider reliability estimates.
On the other hand, costs may be considered when deciding on proactive process adaptations. Cost-based adaptation attempts to minimize the overall costs of process execution and adaptation. Leitner et al. were among the first to argue that the costs of adaptation should be considered when deciding on the adaptation of service-oriented workflows. They formalized an optimization problem taking into account costs of violations and costs of applying adaptations. Their experimental results indicate that cost reductions of up to 56% may be achieved. However, these cost-aware proactive adaptation techniques do not consider prediction reliability. In addition, they rest on the assumption that process adaptations are always effective (i.e., lead to non-violations), which may not give an accurate view of reality. In our experiments, we factor in both reliability estimates and the effectiveness of adaptations.
3 Experiment Design
As motivated in Sect. 1, we aim to analyze the effect of considering reliability estimates during predictive business process monitoring and proactive process adaptation. In this section, we describe the design of our experiments to answer the following two research questions:
RQ1: What effect does considering reliability estimates have on the rate of non-violations?
RQ2: What effect does considering reliability estimates have on costs?
Below, we define the experimental variables, introduce the industry data set, and describe how we implemented the prediction techniques used.
3.1 Experimental Variables
Non-violation rate: We use the non-violation rate to answer RQ1. Given the number of process instances that completed with non-violations, l, and the number of all process instances, n, the non-violation rate is l / n.
Costs: We use costs to answer RQ2. For each process instance, we compute its individual costs according to the cost model from Sect. 2.2. The total costs are the sum of the individual costs of all process instances.
Reliability threshold \(\theta \in [0.5,1]\). If the reliability estimate for an individual process instance is higher than \(\theta \), we assume that a process manager would consider this a reliable prediction. As a result, the process manager would perform a proactive process adaptation if the prediction indicates a violation.
Adaptation effectiveness \(\alpha \in (0,1]\). If an adaptation helps achieve a non-violation (cf. Fig. 2), we consider such an adaptation effective. We use \(\alpha \) to represent the fact that not all adaptations might be effective. More concretely, \(\alpha \) represents the probability that an adaptation is effective; e.g., \(\alpha = 1\) means that all adaptations are effective. We do not consider \(\alpha = 0\) as this means that no adaptation is effective.
Relative adaptation costs \(\lambda \in [0,1]\). Based on our cost model from Sect. 2.2, \(\lambda \) expresses the costs of a business process adaptation, \(c_a\), as a fraction of the penalty for process violation, \(c_p\), i.e., \(c_a = \lambda \cdot c_p\). Choosing \(\lambda > 1\) would not make sense, as this leads to higher costs than if no adaptation is performed.
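To make the interplay of the three variables concrete, the following sketch (our own illustration, not the authors' implementation; the penalty is fixed at 100 as in Sect. 4.2, and adaptation effectiveness is treated in an expected-value view) computes the expected cost of a single process instance:

```python
# Sketch: expected cost of one process instance under the simple cost model,
# given the reliability threshold theta, adaptation effectiveness alpha,
# and relative adaptation costs lam (lambda).
def instance_cost(predicted_violation, reliability, actually_violates,
                  theta, alpha, lam, penalty=100.0):
    adaptation_cost = lam * penalty  # c_a = lambda * c_p
    adapt = predicted_violation and reliability > theta
    if adapt:
        # The adaptation is always paid for; it averts the penalty
        # with probability alpha (expected-value view).
        expected_penalty = (1 - alpha) * penalty if actually_violates else 0.0
        return adaptation_cost + expected_penalty
    # No adaptation: the full penalty is due if the instance violates.
    return penalty if actually_violates else 0.0

# Reliable true-positive prediction (adapted) vs. the same prediction
# rejected as unreliable (reliability below theta):
print(round(instance_cost(True, 0.9, True, theta=0.8, alpha=0.8, lam=0.3), 2))  # 50.0
print(round(instance_cost(True, 0.7, True, theta=0.8, alpha=0.8, lam=0.3), 2))  # 100.0
```

The example shows the trade-off studied in Sect. 4: trusting the reliable prediction costs 30 for the adaptation plus an expected residual penalty of 20, while rejecting it incurs the full penalty of 100.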
3.2 Industry Data Set
The industry data set captures real-world air freight transport processes. Up to three shipments from suppliers are consolidated and in turn shipped to customers to benefit from better freight rates or increased cargo security. The business processes are structured into incoming and outgoing transport legs, which jointly aim at ensuring that freight is delivered to customers on time. Each transport leg involves the execution of transport and logistics activities, which are labeled using the acronyms of the Cargo 2000 standard. A transport leg may involve multiple flight segments (e.g., if cargo is transferred to other flights or airlines at stopover airports), in which case “RCF” loops back to “DEP”. The number of segments per leg may range from one to four.
3.3 Implementation of Ensemble Prediction
We focus on predicting violations of business process performance metrics. Specifically, we aim at predicting, during process execution, whether a transport process instance may violate its delivery deadline. Thereby, process managers are warned about possible delays as early as possible, so they can proactively adapt the running process instance.
Predictions may be performed at any point in time during process execution. For our experiment, we have chosen to perform the predictions immediately after the synchronization point of the incoming transport processes, as indicated in Fig. 3. Our earlier work has shown reasonably good prediction accuracy of more than \(70\%\) for this point in process execution, while still leaving time to execute actions required to respond to violations or mitigate their effects.
As the prediction model, we use artificial neural networks (ANNs), which have shown good results in our earlier work. We use the implementation of ANNs (with their standard parameters) of the WEKA open source machine learning toolkit. As attributes for the ANN model, we use the expected and actual times for all process activities until the point of prediction (i.e., all activities of the incoming transports in Fig. 3), and the actual violation or non-violation of the delivery time of the completed process instance.
To automatically train the ensembles of ANNs and to compute the reliability estimates according to Sect. 2.1, we developed a Java tool that interfaces with WEKA. We use bagging (bootstrap aggregating) as a concrete ensemble prediction technique. Bagging generates m new training data sets by sampling from the whole training data set uniformly and with replacement. For each of the m new training data sets an individual prediction model is trained. Bagging is a generally recommended and widely used ensemble prediction technique for ANNs.
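The essence of this scheme can be sketched as follows (our own simplified Python illustration, not the WEKA-based Java tool; a trivial threshold rule stands in for an ANN base learner). The reliability estimate is the fraction of ensemble members that agree with the majority vote, which for two classes always lies in [0.5, 1]:

```python
# Sketch: bagging with majority voting, where the reliability estimate is
# the fraction of ensemble members agreeing with the majority prediction.
import random

def train_bagged_ensemble(data, m, bootstrap_frac=0.66, seed=0):
    """data: list of (feature, label) pairs; returns m threshold 'classifiers'."""
    rng = random.Random(seed)
    models = []
    for _ in range(m):
        # Sample uniformly with replacement (the bootstrap).
        sample = [rng.choice(data) for _ in range(int(bootstrap_frac * len(data)))]
        # "Training": pick the threshold as the mean feature value of the sample.
        threshold = sum(x for x, _ in sample) / len(sample)
        models.append(threshold)
    return models

def predict_with_reliability(models, x):
    votes = [1 if x > t else 0 for t in models]
    majority = 1 if sum(votes) >= len(votes) / 2 else 0
    reliability = votes.count(majority) / len(votes)  # in [0.5, 1] for 2 classes
    return majority, reliability

data = [(0.1, 0), (0.2, 0), (0.3, 0), (0.7, 1), (0.8, 1), (0.9, 1)]
models = train_bagged_ensemble(data, m=25)
print(predict_with_reliability(models, 0.95))  # (1, 1.0)
```

With a larger ensemble, the agreement fraction becomes more fine-grained, consistent with the observation on ensemble size reported in Sect. 4.3.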
4 Experimental Results
Here, we present the experimental results to answer the two research questions posed above, and discuss how we addressed potential threats to validity.
In addition, Fig. 4 shows the loss of predictions, i.e., the rate of predictions that have a reliability estimate below the threshold \(\theta \). As can be seen, the loss of predictions increases with higher \(\theta \), and relatively more predictions are lost the higher \(\theta \) gets. As we speculated in the introduction, this loss of predictions may imply that required process adaptations could be missed, as the process manager may be too conservative and reject relevant predictions with low reliability. In turn, the number of violation situations in which the process manager does not act may increase. We further explore this in the following sections.
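For illustration (our own sketch with hypothetical reliability values, not the experimental data), the loss of predictions for a given threshold can be computed as:

```python
# Sketch: loss of predictions = fraction of predictions whose reliability
# estimate falls below the threshold theta.
def loss_of_predictions(reliabilities, theta):
    below = sum(1 for r in reliabilities if r < theta)
    return below / len(reliabilities)

estimates = [0.55, 0.6, 0.8, 0.95, 1.0]  # hypothetical reliability estimates
print(loss_of_predictions(estimates, theta=0.7))  # 0.4
print(loss_of_predictions(estimates, theta=0.9))  # 0.6
```

Raising theta can only increase this fraction, which is why the loss of predictions grows monotonically with the threshold in Fig. 4.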
4.1 Results for RQ1 (Non-violation Rates)
For \(\alpha \ge .7\) (dashed lines), considering reliability estimates has a negative effect on non-violation rates and thus non-violation rates decrease. The reason is that with a higher loss of predictions (cf. Fig. 4), the number of missed adaptations increases. Yet, this is not compensated by fewer unnecessary adaptations. Due to the large \(\alpha \), each adaptation will lead to a non-violating situation with high probability. Thus performing too many adaptations seems to be better than performing too few.
Reliability estimates can have a positive effect and lead to higher non-violation rates if \(\alpha < .7\) (solid lines). For a given \(\alpha \), optimal non-violation rates are marked by “\(\times \)” in Fig. 5.
In conclusion, our experimental results suggest the following answer to RQ1: Considering reliability estimates can have a positive effect on non-violation rates if the probability of effective process adaptations is low.
Conversely, this would imply that considering reliability estimates when the probability of effective process adaptations is high might not make sense in practice. However, this conclusion would not consider costs, which we analyze in the next section.
4.2 Results for RQ2 (Costs)
We factor in costs according to the cost model from Sect. 2.2. Without loss of generality, penalty costs are set to 100. Adaptation costs are computed as a fraction (\(\lambda \)) of penalty costs. We have chosen \(\alpha = 0.8\), which is a relatively high probability of effective process adaptations. According to our findings for RQ1, considering reliability estimates for such \(\alpha \) should have a negative effect.
For \(\lambda \ge .8\) (dashed lines), proactive adaptation – independently of whether considering reliability estimates or not – leads to costs that are higher than not performing any proactive process adaptation. The reason is that the avoided penalties do not compensate the prohibitively high adaptation costs.
For \(\lambda < .2\) (dashed lines), considering reliability estimates during proactive process adaptation leads to higher costs than not considering reliability estimates. The reason is that, due to the loss of predictions, the rate of missed adaptations may go up. As adaptation costs are low, investing in proactive adaptations – even if some are unnecessary – pays off to prevent penalties. It should be noted, though, that even if one considered reliability estimates, costs would remain lower than if not performing any proactive adaptation.
Reliability estimates have a positive effect for \(.2 \le \lambda \le .7\) (solid lines). In these situations, there is an optimal choice of \(\theta \) that leads to the lowest costs.
Proactive process adaptation in general does not have a positive effect on costs if \(\lambda \ge \alpha \). This is the case in 47.5% of all situations.
Considering reliability estimates does not have a positive effect on costs if \(\lambda \) is small and \(\alpha \) is large (9% of all situations). Again, even if one considered reliability estimates, costs would remain lower than if not performing any proactive adaptation.
In the remaining 43.5% of all situations, considering reliability estimates can have a positive effect. Again, the minimal costs depend on the choice of \(\theta \).
To quantify the size of the effect on costs, we have measured the relative cost savings that may be achieved in each of the situations in which considering reliability estimates has a positive effect. Results are shown in Fig. 8. Savings range from 2% to 54%, with 14% savings on average. Savings of more than 30% are achieved in 15% of the situations.
Finally, we aimed to determine whether we can choose a reliability threshold that would work in all situations. To this end, we have computed for each situation the minimal \(\theta \) for which costs will be below the costs of not performing any adaptation. We considered the largest of these minimal \(\theta \) as a safe lower bound for reliability thresholds. Our results indicate that a threshold of \(\theta > 80\%\) will lead to costs lower than if not performing any adaptation.
In conclusion, our experimental results suggest the following answer to RQ2: Provided that the relative adaptation costs are smaller than the probability of effective process adaptations, considering reliability estimates can have a positive effect on costs.
4.3 Addressing Threats to Validity
Regarding internal validity, we minimized the risk of bias as follows. For training and testing the prediction models, we performed a 10-fold cross-validation. In addition, we analyzed the impact of the following main parameters of our ensemble prediction technique: (1) Computing reliability estimates: In addition to computing reliability estimates as defined in Sect. 2.1, we used weighted reliability, which factors in the probability delivered by individual ANN predictions. Differences in results were marginal. (2) Bootstrap size: Bootstrap (see Sect. 3.3) refers to the size of the newly generated training data sets. We used 80%, 66%, and 50% as bootstrap sizes. There was not a clear trend that larger bootstrap sizes would perform better than smaller ones and different bootstrap sizes did not impact the general shape of the experimental results. (3) Ensemble size: We varied ensemble size from 2 to 100. The size of the ensemble did not lead to different principal findings. However, as expected, larger ensembles generally delivered better aggregate accuracy. More importantly, larger ensembles delivered more fine-grained reliability estimates.
Regarding external validity, our experimental results are based on a relatively large industry data set. We have specifically chosen different reliability thresholds (\(\theta \)), different probabilities of effective process adaptations (\(\alpha \)), and different adaptation costs (\(\lambda \)) to cover different possible situations that may be faced in practice. The process model covers many relevant workflow patterns: sequence; exclusive choice and simple merge; cycles; parallel split and synchronization. Still, our data set is from a single application domain, which may limit generalizability. As mentioned in Sect. 2.2, our cost model was purposefully chosen to be simple. Even though this cost model helped us analyze and understand the effects of the independent variables in our experiment, it may have been too simple.
In view of construct validity, we took great care to ensure we measure the right things. In particular, we assessed the impact of considering reliability estimates from different, complementary angles: aggregate accuracy, loss, non-violation rates, and costs.
5 Conclusions and Perspectives
Our experimental evidence suggests that considering reliability estimates during predictive business process monitoring can have a positive effect on costs. With respect to the independent variables of our experiment, the effect mainly depends on the effectiveness of proactive process adaptations and the relative costs of these process adaptations, while the concrete reliability threshold has a secondary effect. In our experiments, proactive process adaptation in general had a positive effect on costs in 52.5% of the situations. In 82.9% of these situations, considering reliability estimates increased the positive effect, with cost savings ranging from 2% to 54%, and 14% on average. We also determined that even if one considered reliability estimates in the remaining 17.1% of situations, costs would remain lower than if not performing any proactive adaptation. Finally, our results suggest that 80% is a safe lower bound for choosing reliability thresholds.
Our results also clearly indicate that considering reliability estimates does not lead to a positive effect in all situations. How to determine these situations up front remains an open question. This paper was a first step towards answering this question. We plan to gather further empirical data by replicating our experiments in other application domains, such as energy and e-commerce.
Further, we aim at using more complex cost models. These cost models will consider different shapes of penalties and different costs of adaptations, both of which depend on the extent of deviations from expected business performance metrics. We will therefore perform numeric predictions, instead of the binary predictions used in this paper, to quantify the extent of deviations. This will be complemented by case studies with industry to capture the perspective of users.
According to how we compute reliability estimates (see Sect. 3.3), the smallest possible reliability value is 0.5; hence \(\theta \in [0.5, 1]\).
Available from http://www.s-cube-network.eu/c2k.
Cargo 2000 (now Cargo iQ: http://cargoiq.org/) is an initiative of IATA.
The raw data of our experiments is available from http://www.s-cube-network.eu/reliability/.
We thank Zoltán Ádám Mann for his constructive comments on earlier versions of the paper. Research leading to these results has received funding from the EFRE co-financed operational program NRW.Ziel2 under grant agreement 005-1010-0012 (LoFIP – Cockpits for Operational Management of Transport Processes) and from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 731932 (TransformingTransport).
References

- 2. Bevacqua, A., Carnuccio, M., Folino, F., Guarascio, M., Pontieri, L.: A data-adaptive trace abstraction approach to the prediction of business process performances. In: Hammoudi, S., Maciaszek, L.A., Cordeiro, J., Dietz, J.L.G. (eds.) 15th International Conference on Enterprise Information Systems (ICEIS 2013), Angers, France, pp. 56–65. SciTePress (2013)
- 9. Eder, J., Panagos, E., Rabinovich, M.: Workflow time management revisited. In: Bubenko, J., Krogstie, J., Pastor, O., Pernici, B., Rolland, C., Sølvberg, A. (eds.) Seminal Contributions to Information Systems Engineering: 25 Years of CAiSE, pp. 207–213. Springer, Heidelberg (2013). doi:10.1007/978-3-642-36926-1_16
- 10. Elkan, C.: The foundations of cost-sensitive learning. In: Nebel, B. (ed.) Proceedings 7th International Joint Conference on Artificial Intelligence, IJCAI 2001, Seattle, Washington, USA, 4–10 August 2001, pp. 973–978. Morgan Kaufmann (2001)
- 11. Feldmann, Z., Fournier, F., Franklin, R., Metzger, A.: Proactive event processing in action: a case study on the proactive management of transport processes. In: Chakravarthy, S., Urban, S., Pietzuch, P., Rundensteiner, E., Dietrich, S. (eds.) 7th International Conference on Distributed Event-Based Systems (DEBS 2013), Arlington, Texas, USA, pp. 97–106. ACM (2013)
- 12. Di Francescomarino, C., Dumas, M., Federici, M., Ghidini, C., Maggi, F.M., Rizzi, W.: Predictive business process monitoring framework with hyperparameter optimization. In: Nurcan, S., Soffer, P., Bajec, M., Eder, J. (eds.) CAiSE 2016. LNCS, vol. 9694, pp. 361–376. Springer, Cham (2016). doi:10.1007/978-3-319-39696-5_22
- 14. Haykin, S.: Neural Networks and Learning Machines: A Comprehensive Foundation, 3rd edn. Prentice Hall, Upper Saddle River (2008)
- 18. Maggi, F.M., Di Francescomarino, C., Dumas, M., Ghidini, C.: Predictive monitoring of business processes. In: Jarke, M., Mylopoulos, J., Quix, C., Rolland, C., Manolopoulos, Y., Mouratidis, H., Horkoff, J. (eds.) CAiSE 2014. LNCS, vol. 8484, pp. 457–472. Springer, Cham (2014). doi:10.1007/978-3-319-07881-6_31
- 20. Metzger, A., Sammodi, O., Pohl, K.: Accurate proactive adaptation of service-oriented systems. In: Cámara, J., de Lemos, R., Ghezzi, C., Lopes, A. (eds.) Assurances for Self-Adaptive Systems. LNCS, vol. 7740, pp. 240–265. Springer, Heidelberg (2013). doi:10.1007/978-3-642-36249-1_9
- 22. Rogge-Solti, A., Weske, M.: Prediction of remaining service execution time using stochastic Petri nets with arbitrary firing delays. In: Basu, S., Pautasso, C., Zhang, L., Fu, X. (eds.) ICSOC 2013. LNCS, vol. 8274, pp. 389–403. Springer, Heidelberg (2013). doi:10.1007/978-3-642-45005-1_27
- 24. Skouradaki, M., Ferme, V., Pautasso, C., Leymann, F., van Hoorn, A.: Micro-benchmarking BPMN 2.0 workflow management systems with workflow patterns. In: Nurcan, S., Soffer, P., Bajec, M., Eder, J. (eds.) CAiSE 2016. LNCS, vol. 9694, pp. 67–82. Springer, Cham (2016). doi:10.1007/978-3-319-39696-5_5
- 25. Verenich, I., Dumas, M., La Rosa, M., Maggi, F.M., Di Francescomarino, C.: Complex symbolic sequence clustering and multiple classifiers for predictive process monitoring. In: Reichert, M., Reijers, H.A. (eds.) BPM 2015. LNBIP, vol. 256, pp. 218–229. Springer, Cham (2016). doi:10.1007/978-3-319-42887-1_18
- 26. Zadrozny, B., Elkan, C.: Learning and making decisions when costs and probabilities are both unknown. In: Lee, D., Schkolnick, M., Provost, F.J., Srikant, R. (eds.) Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 26–29 August 2001, pp. 204–213. ACM (2001)
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.