
1 Introduction

Remaining time prediction approaches learn from historical process executions and build prediction models for running process instances, i.e., cases, based on features extracted from the event data. Many approaches have been suggested to solve the remaining time prediction problem [17]. However, most proposed approaches have considerably high prediction errors. Based on [17], the best performing model, using an LSTM neural network [10], showed an average prediction error of 178.4 days for the Road Traffic Fine Management (RF) event log [9]. Moreover, these approaches only consider control-flow-related aspects of processes and individual case properties, i.e., intra-case properties, while making predictions [12]. A process also has other dimensions associated with it [13]. For instance, specific rules determining the scheduling and assignment of limited resources, queuing mechanisms, and decision logic in the process create inter-case dependencies within the performance of process instances.

Furthermore, most of the effort put into this research area has focused on applying new predictive modeling techniques, which create black-box prediction models. Considering inter-case along with intra-case process features in RTMs increases the explainability, interpretability, and accuracy of the predictions [8]. Therefore, we aim to improve the quality of RTMs and introduce more interpretability into the predictions. The accuracy of a RTM that is unaware of inter-case behavior is substantially impacted if cases in a process segment, i.e., a pair of related activities, are processed in a batch, First-In-First-Out (FIFO), or according to other patterns. The prediction accuracy decreases as a case passes through such segments, indicating that the RTM is uncertain about the underlying process behavior there. We call these process segments uncertain segments. Therefore, recognizing all uncertain segments and translating their various inter-case patterns of process execution into features for training RTMs increases prediction quality.

Fig. 1. Our proposed framework for inter-case-aware RTMs. Patterns are discovered after detecting uncertain segments, i.e., segments causing high prediction errors due to inter-case dynamics. RTMs are trained using the features extracted from the patterns within uncertain segments.

In this paper, we present a three-step approach for developing inter-case dynamics aware RTMs: (1) Identifying process segments that cause high prediction errors due to inter-case dynamics, i.e., uncertain segments. (2) Discovering insights about the underlying patterns, e.g., batching, that lead to inter-case dependencies within the detected segments. (3) Transforming the derived insights into features and incorporating them in RTMs to improve the quality of predictions. For instance, the waiting time caused by batching in a segment is transformed into a feature and introduced into the RTM. We evaluate the prediction errors of RTMs without incorporating inter-case dependencies, such as batching behavior in a process segment, as shown in Fig. 1, and identify uncertain segments that involve inter-case dynamics. We continue by extracting the features associated with the observed patterns in the uncertain segments.

We introduce preliminaries and the related work in Sect. 2. In Sect. 3, we present our main approach. We evaluate the approach in Sect. 4 using real event logs, and Sect. 5 concludes this work.

2 Preliminaries and Related Work

In this section, we introduce the necessary concepts and related work required to understand the approach presented in this paper.

2.1 Related Work

RTM approaches can be classified into three broad categories [17]. Process aware approaches make predictions using explicit process model representations such as transition systems [1]. Process agnostic approaches typically use machine learning (ML) methods [14] to make predictions. Recent process agnostic approaches predominantly make use of sophisticated neural network architectures like LSTM [16] and explainable AI methods [5] to develop RTMs. Hybrid approaches like [11] combine capabilities of both categories by exploiting transition systems that are annotated using a machine learning algorithm. However, most approaches across all three categories only consider the intra-case perspective for predictions.

RTM approaches based on queuing models [15] and supervised learning [14] utilize the inter-case dimension in predictions. They create features on the basis of queuing theory, like case priority and open cases of similar type. However, these approaches assume FIFO queuing behavior throughout the entire process. Two recent PPM approaches [3, 8] use performance spectra [2] to learn inter-case dynamics present in the process without any prior assumption. Denisov et al. [3] presented a novel approach to predict the aggregated performance of non-isolated cases that utilizes performance-related features. Klijn et al. [8] presented a novel RTM approach that is aware of batching-at-end dynamics. In this paper, we extend the process agnostic RTM approach presented in [8] by also considering inter-case dynamics caused by non-batching and batching-at-start patterns. We use and improve the fine-grained error analysis technique proposed in [8] to identify inter-case dynamics while limiting manual intervention.

2.2 RTM Background

RTM approaches predict the remaining time to completion of an ongoing process instance, i.e., case, based on process execution data of completed cases. Process execution of a completed case is recorded as a non-empty sequence of events (e), i.e., \(\sigma \,{=}\, \langle e_1,..,e_n\rangle \), or trace. An event log L is a set of completed traces. Let \(\mathcal {A}, \mathcal {T}, \mathcal {E}\) be the universes of activities (event classifiers), timestamps, and events, respectively. Each event \(e\, {\in }\,\mathcal {E}\,\) consists of mandatory and additional attributes. Let AN be the set of attribute names. For \(an {\in }AN\), we define \(\#_{an}(e)\) as the value of attribute an for event e. An event e has the mandatory attributes timestamp \(\#_{t}(e) {\in }\mathcal {T}\), at which e occurs, and activity \(\#_{act}(e) {\in }\mathcal {A}\), which occurs during e.

We first need to understand the general steps to develop a RTM described in [17]. In the offline or training phase, the first step is to prepare the input data, i.e., the event log. Since a RTM makes predictions for incomplete traces, it trains on prefixes extracted from traces in L. A prefix is extracted by taking the first \(k{\in }\mathbb {N}\) events from a completed trace (\(\sigma \,{=}\, \langle e_1,..,e_n\rangle \)) using function \(hd^k(\sigma )\,{=}\,\langle e_1,..,e_k \rangle , k \le n\). The resulting prefixes are collectively known as a prefix log \(L^*\) of L. Therefore, data preparation includes cleaning the data, creating a prefix log, and feature engineering. Features like weekday or sojourn time are extracted from event data, and categorical features are encoded.
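To make the prefix extraction concrete, the following minimal Python sketch builds a prefix log from completed traces. It assumes a trace is represented simply as a list of (activity, timestamp) pairs; attribute handling and feature engineering are omitted, and this is an illustration rather than the benchmark implementation.

```python
from typing import List, Tuple

Event = Tuple[str, float]   # (activity #act(e), timestamp #t(e)); other attributes omitted
Trace = List[Event]

def hd(sigma: Trace, k: int) -> Trace:
    """Prefix hd^k(sigma) = <e_1, .., e_k> with k <= n."""
    assert 1 <= k <= len(sigma)
    return sigma[:k]

def prefix_log(log: List[Trace]) -> List[Trace]:
    """Prefix log L*: every prefix of every completed trace in L."""
    return [hd(sigma, k) for sigma in log for k in range(1, len(sigma) + 1)]

# A completed trace with three events yields three prefixes.
sigma = [("Send Fine", 0.0), ("Insert Fine Notification", 5.0), ("Add Penalty", 65.0)]
assert len(prefix_log([sigma])) == 3
```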

A RTM can be instantiated based on three main parameters: the method for grouping similar prefixes into buckets, the prefix encoding method, and the prediction technique used. For instance, \(RTM\,{=}\,(p,a,x)\) represents that the model’s prefix bucketing method is based on similar prefix lengths (p), the encoding method is aggregating data of all prefix events (a), and the ML algorithm is XGBoost (x). After training, the models are tuned using techniques like hyperparameter optimization. Finally, the optimal model’s prediction accuracy is evaluated using aggregated metrics, e.g., Mean Absolute Error (MAE).

2.3 Performance Spectrum with Error Progression

To identify process segments subject to high prediction errors due to inter-case dynamics, Klijn et al. [8] introduced a visual analysis technique, Performance Spectrum with Error Progression (PSwEP). It uses the performance spectrum (PS) [2], which maps the performance of each case passing through a segment over time. A process segment \((a,b) {\in } \mathcal {A}\times \mathcal {A}\) can be defined as any two successive steps in the process, e.g., a step from activity a to activity b. For traces of form \(\langle ...,e_i,e_{i+1},...\rangle \), where \(\#_{act}(e_i)\,{=}\,a, \#_{t}(e_i)\,{=}\,t_a, \#_{act}(e_{i+1})\,{=}\,b\), and \(\#_{t}(e_{i+1})\,{=}\,t_b\), we observe an occurrence of a segment (a, b) from time \(t_a\) to \(t_b\). Each occurrence of segment (a, b) representing a case is plotted in a PS as a line from \((t_a,a)\) to \((t_b,b)\). In PSwEP, segment occurrences within a PS are classified based on the error progression of the case while passing through the segment. Let \(\mathcal {P}\) be the set of predictions made on test data using RTM. Each prediction \(pr_{k}{\in }\mathcal {P}\) corresponds to a prediction made for prefix \(hd^k(\sigma )\,{=}\,\langle e_1,..,e_k\rangle \) at point of prediction \(\#_{act}(e_k) \,{=}\, a_k\) and \(t_{pr_{k}} \,{=}\, \#_t(e_k)\), i.e., the time moment of prediction.
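As an illustration, segment occurrences can be collected from traces in a few lines of Python. This sketch reuses the (activity, timestamp) trace representation assumed earlier and is not the authors' implementation; each collected \((t_a, t_b)\) pair corresponds to one line from \((t_a,a)\) to \((t_b,b)\) in the performance spectrum.

```python
from collections import defaultdict

def segment_occurrences(log):
    """Map each segment (a, b) to its occurrences (t_a, t_b),
    one per pair of successive events in a trace."""
    occ = defaultdict(list)
    for sigma in log:
        for (a, t_a), (b, t_b) in zip(sigma, sigma[1:]):
            occ[(a, b)].append((t_a, t_b))
    return occ
```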

Fig. 2. PSwEP for (Add Penalty (AP), Send for Credit Collection (SC)) in RF: error decrease (red), error increase (blue).

\(y_{pr_{k}}\) and \(\overline{y_{pr_{k}}}\) denote the actual and predicted outcomes of \(pr_{k}\). To measure the error progression of segment occurrence \((a_k,a_{k+1})\) linked to \(\sigma \), the prediction errors at \(a_k\) and \(a_{k+1}\) are compared. The difference in relative absolute errors \(DRAE(rae_k, rae_{k+1}) = rae_k - rae_{k+1}\) with \( rae_k {=} |\overline{y_{pr_{k}}} - y_{pr_{k}}|/y_{pr_{k}}\) is measured. If the prediction error decreases for a segment occurrence, i.e., \(DRAE>0\), the plotted line is colored red in the PSwEP. If the prediction error increases, i.e., \(DRAE<0\), the line is colored blue. Figure 2 shows the PSwEP of segment (Add Penalty (AP), Send for Credit Collection (SC)) in the RF event log.
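The error progression measure translates directly into code. The sketch below assumes the actual remaining time \(y_{pr_k}\) is strictly positive; how zero targets are handled in the original implementation is not specified here.

```python
def rae(y_true: float, y_pred: float) -> float:
    """Relative absolute error rae_k = |y_pred - y_true| / y_true
    (assumes y_true > 0, i.e., the case has not yet completed)."""
    return abs(y_pred - y_true) / y_true

def drae(rae_k: float, rae_k1: float) -> float:
    """DRAE = rae_k - rae_{k+1}: positive means the error decreased after
    the segment occurrence (red line), negative means it increased (blue)."""
    return rae_k - rae_k1
```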

3 Approach

In this section, we will discuss the main approach proposed to develop an inter-case-dynamics-aware RTM. In Sect. 3.1, we discuss the proposed techniques to automatically identify uncertain segments. In Sect. 3.2, we discuss the process of identifying and deriving insights about inter-case dynamics. Finally, in Sect. 3.3, we propose ways to create inter-case features by utilizing derived insights.

3.1 Detecting Uncertain Segments

Measuring Uncertainty of a Process Segment. To identify uncertain segments, we need to measure the uncertainty of each process segment. To do so, we first measure the DRAE (Sect. 2.3) of individual segment occurrences linked to predictions made using the RTM on test data. Table 1 shows an example of how individual predictions are aligned with segment occurrences and how the error progression of each occurrence is classified. A decrease in error, i.e., \(DRAE>0\), for a case passing through segment (a, b) implies that after the occurrence of activity b the remaining time prediction improves. This decrease could indicate some uncertainty between activities a and b, which gets resolved after activity b completes. An increase in error implies that after the occurrence of activity b, the prediction model becomes more unsure about how the partial trace will proceed. If the prediction error remains the same, i.e., \(DRAE\,{=}\,0\), there is no clear indication of uncertainty within the process segment. Such occurrences are rare; we can either ignore them or count them as error decreases, and we choose the latter.

Based on the above insights, we use three aggregated metrics to quantify the uncertainty of segments. For each segment (S) linked to \(\mathcal {P}\), we measure (1) observations, the total occurrences linked to S in \(\mathcal {P}\), (2) decrease cases, the total occurrences linked to S with \(DRAE \ge 0\), and (3) increase cases, the total occurrences linked to S with \(DRAE < 0\). Table 2 is the result of applying the above aggregations to the occurrences of segments found in Table 1.
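A possible pandas-based aggregation producing a table like Table 2, assuming the per-occurrence DRAE values are available in a DataFrame with columns segment and drae (these column names are an assumption of this sketch):

```python
import pandas as pd

def segment_uncertainty(occurrences: pd.DataFrame) -> pd.DataFrame:
    """Aggregate per-occurrence error progression into observations,
    decrease cases (DRAE >= 0), and increase cases (DRAE < 0)."""
    return (occurrences.groupby("segment")["drae"]
            .agg(observations="size",
                 decrease_cases=lambda d: int((d >= 0).sum()),
                 increase_cases=lambda d: int((d < 0).sum()))
            .reset_index())
```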

Table 1. Error progression for the occurrence of segments linked to predictions.
Table 2. Measuring uncertainty of each segment by aggregating its occurrences to calculate observations, decrease cases, and increase cases.

Selecting the Most Uncertain Segments. We define a mapping function \(u_{S}:\mathbb {N} \times \mathbb {R} \longrightarrow [0,1]\) to select a subset of process segments for which inter-case features could be created (Eq. 1). The inputs are the number of observations (o) and the ratio \(r \,= \,d/max(1,i)\) of decrease cases (d) to increase cases (i) for segment S (as shown in Table 2). Output 1 indicates that the segment is highly uncertain. Note that ideal candidates for uncertain segments are those where decrease cases are almost the same as or more than increase cases, i.e., their ratio should be greater than some threshold \(t_r\). The threshold on the number of observations (\(t_{obs}\)) ensures that the segment occurs frequently enough. These thresholds can be set for each process individually.

$$\begin{aligned} u_S(o,r)={\left\{ \begin{array}{ll} 1 \quad &{}\text {if } o \ge t_{obs} \text { and } round(r)\ge t_r \\ 0 \quad &{}\text {otherwise} \end{array}\right. } \end{aligned}$$
(1)

Let SG be the set of all segments in a process and \({SG}_{start}\) be the set of starting segments. We apply \(u_S\) to each \(S {\in }{SG} \setminus {SG}_{start}\), given some \(t_r\) and \(t_{obs}\), and select the set of segments U for which \(u_S(o,r)\,{=}\,1\). Starting segments are excluded because the RTM has too little information at the beginning of a trace, and the prediction error is likely to decrease when the second activity occurs. We use the RF event log [9] as the running example. First, predictions are made on the last 20% (temporally split) of the event log using a RTM, here \(RTM\,{=}\,(p,a,x)\). Then, these predictions are used to measure the uncertainty of each process segment, and \(u_S\) is applied to all non-starting segments. We set \(t_r\,{=}\,1\) and \(t_{obs}> \mu \), e.g., \(t_{obs}\,{=}\, 2*std\), where \(\mu\) and std are the mean and standard deviation of segment occurrences. The selected uncertain segments are (Send Fine (SF), Insert Fine Notification (IF)), (Insert Fine Notification (IF), Add Penalty (AP)), and (Add Penalty (AP), Send for Credit Collection (SC)). The details of selecting the most uncertain segments are presented here (Footnote 1).
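Putting the pieces together, a sketch of Eq. 1 and the segment selection step, assuming the aggregated table from the previous sketch; the function signatures are illustrative.

```python
def u_S(o: int, r: float, t_obs: float, t_r: float) -> int:
    """Eq. (1): 1 iff the segment has enough observations and the rounded
    decrease/increase ratio reaches the threshold t_r."""
    return 1 if o >= t_obs and round(r) >= t_r else 0

def select_uncertain_segments(table, starting_segments, t_obs, t_r=1.0):
    """Apply u_S to every non-starting segment, with r = d / max(1, i)."""
    U = []
    for _, row in table.iterrows():
        if row["segment"] in starting_segments:
            continue
        r = row["decrease_cases"] / max(1, row["increase_cases"])
        if u_S(row["observations"], r, t_obs, t_r):
            U.append(row["segment"])
    return U
```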

3.2 Identifying Inter-case Dynamics in Uncertain Segments

In order to diagnose causes for uncertainty within segments, first, we visualize the performance of cases within the process segment using PSwEP (Sect. 2.3). After that, the observed patterns in the performance spectrum are compared to a taxonomy [2] to identify underlying process behavior that causes inter-case patterns within the process segment. We explain the process of deriving insights for the uncertain segments identified in the running example.

In the PSwEP of \((SF,\ IF)\) shown in Fig. 3 (left), two patterns, batching-at-start and non-batching FIFO behavior, are identified. These are elementary patterns related to the order of case arrival. We notice uncertainty (shown by the red lines) for non-batched cases; the RTM is currently not aware that non-batched cases are processed much faster than batched ones. Batched cases within the segment (Fig. 3) are also colored red. The uncertainty concerning these cases is caused by the prediction model’s lack of awareness of batching-at-start dynamics. The order of lines in the PSwEP of \((AP,\ SC)\), presented earlier in Fig. 2, clearly shows that the inter-case pattern is caused by batching-at-end. The prediction model is currently unaware of this inter-case dynamic within the process segment. In the PSwEP of \((IF,\ AP)\) in Fig. 3 (right), we observe a FIFO with constant time pattern in the order of case arrival. The performance of a case is strongly correlated with the previous case that passed through the segment. We also know that there are two possible activities, Add Penalty (AP) or Insert Date Appeal to Prefecture (ID), that can occur after Insert Fine Notification (IF), and the time that cases wait within the two segments differs significantly. Therefore, incorrectly assuming the path of a case arriving at IF impacts the remaining time prediction. We are able to predict the path by observing the recent performance of cases in \((IF,\ AP)\) and \((IF,\ ID)\) w.r.t. inter-case dependencies. Lastly, across all three segments, we observe changing density of lines, indicating varying workload.

Based on the above derived insights, we define the abbreviated inter-case pattern(s) identified for segments \((SF,\ IF),(IF,\ AP)\) and \((AP,\ SC)\) as \(R_1\,{=}\,non-batching,\ batch(s)\), \(R_2\,{=}\,non-batching\) and \(R_3\,{=}\,batch(e)\) respectively.

Fig. 3. PSwEP for segments (Send Fine (SF), Insert Fine Notification (IF)) (left), and (Insert Fine Notification (IF), Add Penalty (AP)) (right) in the RF event log.

Table 3. The created inter-case features for segment predictions (\(\mathcal {C}\,{=}\,\{C_S, C_{S_1},C_{S_2},C_{S_3}\}\)) and waiting time (w) within uncertain segments for the RF event log.

3.3 Inter-case Feature Creation

As the running example shows, ignoring inter-case dynamics results in high prediction errors for prefixes expected to pass through a segment \(S {\in } U\). Therefore, prior to the occurrence of the segment, we need to provide the RTM with information that a prefix is subject to the inter-case pattern(s) R detected in uncertain segment S. We use these insights to develop inter-case features.

Fig. 4. Overview of the feature creation process for the RF event log with uncertain segments \(S_1\), \(S_2\) and \(S_3\).

Consider the running example with three uncertain segments \(S_1\), \(S_2\), and \(S_3\) with inter-case pattern(s) \(R_1\), \(R_2\), and \(R_3\), respectively. We define the following inter-case features: (1) \(C_S {\in }\{0,1\}\), to indicate if a prefix passes through an uncertain segment \(S {\in }U\), (2) \(C_{S_{1}} {\in }\{0,1\}\), to indicate that the prefix passes through \(S_1\) with inter-case pattern(s) \(R_1\), (3) \(C_{S_2} {\in }\{0,1\}\), to indicate that the prefix passes through \(S_2\) with inter-case pattern(s) \(R_2\), (4) \(C_{S_3} {\in }\{0,1\}\), to indicate that the prefix passes through \(S_3\) with inter-case pattern(s) \(R_3\), and (5) w, to indicate the waiting time of the prefix in \(S {\in }U\) as a result of inter-case pattern(s) R. As a result of the feature creation step for the running example, Table 3 is generated, showing the inter-case features. These features are used to train an inter-case-dynamics-aware RTM. Feature y is the target feature, i.e., the remaining time to completion.
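For illustration, the features of Table 3 for a single prefix could be encoded as follows; the mapping of segment names to activity pairs and the argument names are assumptions of this sketch.

```python
def inter_case_features(next_segment, uncertain_segments, w):
    """Encode the Table 3 features for one prefix: next_segment is the
    uncertain segment the prefix passes through (or None), w its waiting time."""
    features = {"C_S": int(next_segment is not None), "w": w}
    for name, seg in uncertain_segments.items():   # e.g. {"S1": ("SF", "IF"), ...}
        features[f"C_{name}"] = int(next_segment == seg)
    return features

# A prefix entering S2 with a 12-day wait would yield, e.g.:
# {"C_S": 1, "w": 12.0, "C_S1": 0, "C_S2": 1, "C_S3": 0}
```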

Creating inter-case features for an ongoing case at run-time requires its own prediction models. We need a model (NS) to predict the inter-case features related to segment prediction, and a waiting time prediction model (\(TM_{S,R}\)) for each uncertain segment \(S {\in }U\) with inter-case pattern(s) R. Figure 4 gives an overview of the steps involved in creating the models (offline) and utilizing these models to create inter-case features (at run-time). This process extends the feature creation presented in [8].

3.4 Predicting the Next Segment

Classifier NS should determine whether a prefix passes through a segment \(S{\in }U\) at the point of prediction. To build NS, we train a next activity prediction classifier following [18] and map its outcome to the values of the segment prediction inter-case features. Let \(hd^k(\sigma )\) be the input prefix with last activity a for NS. If the predicted next activity is b, we say that the prefix passes through segment (a, b) at the point of prediction. If \((a,b) {\in }U\), then \(\overline{C_S}\,{=}\,1\); otherwise we set it to 0. If \(\overline{C_S}\,{=}\,1\), we also set the boolean variable representing that the prefix passes through segment (a, b) to 1. For example, if the predicted segment \((a,b) \,{=}\, S_1\), then \(\overline{C_{S_1}}\,{=}\,1\), \(\overline{C_{S_2}}\,{=}\,0\), and \(\overline{C_{S_3}}\,{=}\,0\). The collective set of features predicted using NS is called \(\overline{\mathcal {C}}\).
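A sketch of how NS's next-activity outcome might be mapped to the segment prediction features \(\overline{\mathcal {C}}\); the predict interface of the classifier is a placeholder, not the API of [18].

```python
def predict_segment_features(next_activity_model, prefix, uncertain_segments):
    """Map a next-activity prediction to the segment prediction features:
    the predicted next segment is (last activity, predicted next activity)."""
    a = prefix[-1][0]                        # activity of the last prefix event
    b = next_activity_model.predict(prefix)  # hypothetical classifier interface
    predicted = (a, b)
    C = {f"C_{name}": int(predicted == seg)
         for name, seg in uncertain_segments.items()}
    C["C_S"] = int(predicted in uncertain_segments.values())
    return C
```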

Fig. 5. Illustration of a single instance for \(TM_{S,R}\) to learn the waiting time for case \(c_1\) using performance-related features extracted from \(S_h\) and relevant individual properties of \(c_1\).

3.5 Predicting Waiting Time

In this section, we present the general steps to create a waiting time prediction model (\(TM_{S,R}\)) that predicts how long a case stays in a segment S with inter-case pattern(s) R. Consider a case \(c_1\) arriving at segment \(S\,{=}\,(a,b)\) at time \(t_a\) (Fig. 5). Because of inter-case dynamics, the waiting time w of \(c_1\) depends on the performance of other cases in relevant segments in some recent time interval, i.e., the historic spectrum (\(S_h\)) [3], and on relevant individual properties (intra-case features). The intra-case features of \(c_1\) and the performance seen within \(S_h\) can be encoded as a feature vector \(X_{1}..X_{n}\) using the insights gained about R within S. This allows us to formulate the waiting time prediction problem as a supervised learning problem: \(w \,{=}\, f(X_1..X_{n}) + \varepsilon \), where function f predicts w from \(X_{1}..X_{n}\). To learn f, we create training samples using the sliding window method and apply an ML method like LightGBM [6] that tries to minimize the prediction error \(\varepsilon \). Table 4 shows sample data used to train a \(TM_{S,R}\) for (IF, AP).
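A minimal sketch of training \(TM_{S,R}\) with LightGBM, assuming the sliding-window samples have already been turned into a feature matrix X and target vector w; the hyperparameters are illustrative, not those used in the evaluation.

```python
import lightgbm as lgb
import numpy as np

def train_tm(X: np.ndarray, w: np.ndarray) -> lgb.LGBMRegressor:
    """Fit TM_{S,R}, i.e., learn f in w = f(X_1..X_n) + eps from
    sliding-window training samples."""
    model = lgb.LGBMRegressor(n_estimators=200, learning_rate=0.05)
    model.fit(X, w)
    return model

# At run-time: w_hat = model.predict(x_new.reshape(1, -1)) for a case arriving at S.
```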

Table 4. Sample data for training waiting time prediction model (\(TM_{S,R}\)) for uncertain segment (IF, AP) with pattern \(R\,{=}\,non-batching\).

Waiting Time Prediction for Non-batching Dynamics. In Sect. 3.2, we learned that \(\overline{w}\) of a case in (IF, AP) is influenced by \(R\,{=}\,non-batching\) and the varying workload in segments (IF, AP) and (IF, ID). To derive workload-related context, we define h in \(S_h\) as the period between the arrival of \(c_1\) and the last case before it and derive: (1) starting cases, the number of cases that started (arrived at the segment) in period h, (2) ending cases, the number of cases that completed (exited the segment) in period h, and (3) pending cases, the number of cases that started within period h and will complete in the future. Since the performance of a case in (IF, AP) strongly depends on the previous case, we also extract the last waiting time (\(w_{l}\)), e.g., Table 4.
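These workload features can be derived from segment occurrences as follows; representing occurrences as (start, end) time pairs is an assumption of this sketch.

```python
def workload_features(occurrences, t_prev, t_a):
    """Workload context over h = [t_prev, t_a), the period between the previous
    arrival (t_prev) and the arrival of c_1 (t_a); occurrences are
    (t_start, t_end) pairs of cases in the relevant segments of S_h."""
    starting = sum(t_prev <= s < t_a for s, e in occurrences)
    ending = sum(t_prev <= e < t_a for s, e in occurrences)
    pending = sum(t_prev <= s < t_a and e >= t_a for s, e in occurrences)
    return {"starting_cases": starting, "ending_cases": ending,
            "pending_cases": pending}
```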

Waiting Time Prediction for Batching-at-Start Dynamics. \(\overline{w}\) of a case \(c_1\) arriving at (SF, IF) depends on \(R\,{=}\, batch(s), non-batching\) and the varying workload within the segment. Therefore, \(S_h\) contains only segment (SF, IF). To learn performance related to \(R\,{=}\,non-batching\) and the workload, we include the features presented above. To include features related to \(R\,{=}\,batch(s)\), we extract features related to the previous batch [7] with batching moment \(BM_l\): (1) the least (\(w_{min}\)) and longest (\(w_{max}\)) waiting times in the previous batch, (2) the previous batch size and batch size percentile, (3) the mean and standard deviation of IBCT, or inter batch case completion time, which is the time difference between the completion times of two successive cases in the batch, (4) the batch type, which distinguishes batches with fewer than two observations that behave like simultaneous batches, and (5) CIA, or case inter-arrival time, which is the time between the arrival of \(c_1\) and the case before it. We also include the relevant intra-case features resource, expense, points, and the weekday, month, and hour of the previous batch. Duration, the waiting time of the case in the previous segment, is also included to distinguish batched and non-batched cases. However, learning a case-specific w is difficult because batching-at-start cases proceed randomly, i.e., not in the order they arrived at the batch. To avoid learning this random behavior, we propose building a \(TM_{S,R}\) that predicts the average of the expected waiting times for all cases that arrive along with \(c_1\). Hence, the training data is prepared by extracting the above-mentioned features and then aggregating (calculating the mean of) feature values for instances that correspond to cases arriving simultaneously in the segment, as sketched below.
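A sketch of the proposed aggregation step, assuming the extracted training instances carry an arrival_time column identifying cases that arrive simultaneously (the column name is an assumption):

```python
import pandas as pd

def aggregate_simultaneous_arrivals(instances: pd.DataFrame) -> pd.DataFrame:
    """Average feature values and target w over all cases arriving at the
    segment at the same moment, so TM_{S,R} learns the expected mean waiting
    time of a batch(s) arrival instead of the random within-batch order."""
    return instances.groupby("arrival_time").mean(numeric_only=True).reset_index()
```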

Waiting Time Prediction for Batching-at-End Dynamics. (AP, SC) contains inter-case dynamics caused by \(R\,{=}\,batch(e)\) and varying workload. To consider the varying workload across the segment, we include the workload features presented above. To learn batching-related performance, we extract the features \(w_{min}\), \(w_{max}\), and CIA described in the previous paragraph. Additionally, we include: (1) \(t_{lb}\), the time elapsed since the occurrence of the last batch, and (2) the mean and standard deviation of IBIA, i.e., the inter batch case arrival time, which is the difference between the arrival times of two successive cases in the batch. We also include the intra-case features month and weekday.
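A sketch of the previous-batch arrival features, assuming the arrival times of the last batch and its batching moment \(BM_l\) are known; the argument names are illustrative.

```python
import numpy as np

def batch_end_features(batch_arrivals, batching_moment, t_now):
    """Previous-batch features for batch(e) dynamics: t_lb, the time since the
    last batching moment BM_l, and the mean/std of IBIA, the gaps between
    successive case arrivals in that batch."""
    gaps = np.diff(np.sort(np.asarray(batch_arrivals, dtype=float)))
    return {"t_lb": t_now - batching_moment,
            "ibia_mean": float(gaps.mean()) if gaps.size else 0.0,
            "ibia_std": float(gaps.std()) if gaps.size else 0.0}
```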

4 Evaluation

4.1 Experimental Setup

We evaluate the proposed approach on two real-life event logs: the RF event log [9] and the BPIC’20 event log [4]. We implemented inter-case feature creation and PSwEP in Python, which is publicly available (Footnote 2). To train and test RTMs, we use the benchmark implementation for RTM approaches (Footnote 3) [17]. First, we make predictions with \(RTM\,{=}\,(p,a,x)\) for both event logs to identify uncertain segments and their patterns. The uncertain segments identified from the RF event log are (SF, IF), (IF, AP), and (AP, SC) with inter-case pattern(s) \(R_1\,{=}\,non-batching,\ batch(s)\), \(R_2\,{=}\,non-batching\), and \(R_3\,{=}\,batch(e)\), respectively. The two uncertain segments identified from the BPIC’20 event log are (Declaration Final Approved by Administration (DF), Request Payment (RP)) and (Request Payment (RP), Payment Handled (PH)). The inter-case pattern(s) identified for segments (DF, RP) and (RP, PH) are \(R_1\,{=}\,non-batching,\ batch(s)\) and \(R_2\,{=}\,batch(e)\), respectively. To create inter-case features, we implement NS using [18] and follow the steps described in Sect. 3.5 to create \(TM_{S,R}\) models using LightGBM [6]. Predictions are made with different prefix bucketing methods, prefix encoding methods, and ML methods. We consider the prefix bucketing methods grouping by prefix lengths (p), using a clustering algorithm (c), and grouping all prefixes in a single bucket (s). Common prefix encoding methods include the data of only the last prefix event (l) or aggregating the data of all prefix events (a), and we apply the ML models XGBoost (x) or random forest (r) to the encoded feature vectors. The following input configurations are used: (1) \(I(\emptyset )\): event log with no inter-case features, (2) \(I(\mathcal {C}, \overline{w})\): event log with inter-case features created using the actual segment prediction \(\mathcal {C}\), and (3) \(I(\overline{\mathcal {C}}, \overline{w})\): event log with inter-case features created using segment predictions made by NS. We use \(80\%\) and \(20\%\) of the event logs (split temporally) for training and testing the RTMs. To measure overall prediction accuracy, we measure the weighted average MAE [17] of all predictions \(\mathcal {P}\) made on the test data.

4.2 Results

Table 5. Weighted average MAE (in days) of different RTM models with different bucketing, encoding and ML methods, e.g., (p, a, x), while using no inter-case features \(I(\emptyset )\) and with the created inter-case features using segment predictions \(I(\overline{\mathcal {C}}, \overline{w})\).
Table 6. MAE (in days) for different configurations (I) with the similar lengths bucketing (p), aggregating events data for encoding prefix events (a), and XGBoost (x) as the ML method, \(RTM\,=\,(p,a,x)\). \(\mathcal {P}_{k}\) is the set of all predictions for prefixes of length k.

Table 5 shows that using inter-case features leads to an increase in performance for all 8 combinations of prefix bucketing, prefix encoding, and ML methods in RTMs against the baseline \(I(\emptyset )\). For the RF event log, the prediction error decreases by a maximum of \(14.26\%\) and a minimum of \(4.27\%\) for methods (p, l, x) and (c, a, x), respectively, with \(I(\overline{\mathcal {C}}, \overline{w})\). For the BPIC’20 event log, we observe a maximum decrease of \(5.12\%\) and a minimum decrease of \(1.55\%\) in weighted average MAE for methods (c, a, r) and (c, a, x), respectively. Since BPIC’20 is a smaller event log with fewer cases subject to the identified inter-case patterns, the overall reduction in prediction error is smaller. The most accurate predictions for the RF event log, obtained using \(I(\overline{\mathcal {C}}, \overline{w})\) with (c, l, x), have an MAE 0.6 days lower than the benchmark result [17]. Moreover, an advantage of our approach is that these predictions can be interpreted more easily because of the inter-case features.

Fig. 6. Comparing prediction results for RF.

Fig. 7. Comparing prediction results for BPIC’20.

In our approach, inter-case features are primarily included for prefixes passing through uncertain segments, which occur at some step k of the process. Therefore, we look at the MAE of predictions made for all prefixes of relevant length k, i.e., \(\mathcal {P}_k \subseteq \mathcal {P}\). Segments \((SF,\ IF)\), \((IF,\ AP)\), and \((AP,\ SC)\) of the RF event log occur predominantly at steps \(k\,=\,2\), \(k\,=\,3\), and \(k\,=\,4\) of the process, respectively. Segments \((DF,\ RP)\) and \((RP,\ PH)\) of the BPIC’20 log occur predominantly at steps \(k\,=\,3\) and \(k\,=\,4,\ 5\), respectively. Table 6 shows the results for predictions made using \(RTM\,=\,(p,a,x)\). For the RF event log, the prediction error decreases by \(39\%\), \(12\%\), and \(15\%\) for \(\mathcal {P}_2\), \(\mathcal {P}_3\), and \(\mathcal {P}_4\), respectively, using \(I(\overline{\mathcal {C}}, \overline{w})\) over the baseline. For BPIC’20, the error decreases by up to \(15\%\) and \(9\%\) for \(\mathcal {P}_4\) and \(\mathcal {P}_5\), respectively, when using \(I(\overline{\mathcal {C}}, \overline{w})\). However, the MAE of \(\mathcal {P}_3\) is slightly higher for configuration \(I(\overline{\mathcal {C}}, \overline{w})\) compared to \(I(\emptyset )\). This is caused by incorrect segment predictions for (DF, RP) made by NS, as shown by the results of \(I(\mathcal {C}, \overline{w})\). Figures 6 and 7 compare the batching-at-end aware predictions made using the inter-case features created in our approach (using LightGBM [6]) and in the previous approach [8] (using exponential smoothing (ES)). We measure the increase/decrease in performance of \(\mathcal {P}_4\) made using different combinations of RTMs over their respective baselines. We compare only predictions at \(k\,{=}\,4\) for both logs, where the uncertain segments with batching-at-end dynamics occur. Figure 6 shows that our approach performs better than the previous approach in 5 of the 8 input configurations (I) for batched cases in the RF event log. Figure 7 shows that for the batched cases in the BPIC’20 log, our method performs better for all configurations.

5 Conclusion

We presented an approach to systematically discover a subset of uncertain process segments with inter-case dynamics that cause high prediction errors. Contrary to previous approaches, our designed function for detecting this subset of uncertain segments limits the manual intervention to the identification of inter-case patterns within these segments. Using visual analysis, we identified and gained insights about the inter-case pattern(s) within uncertain segments. In particular, we gained insights into non-batching (FIFO and unordered), batching-at-start, and batching-at-end inter-case patterns. Subsequently, we included these insights in remaining time predictions by transforming them into inter-case features. For instance, the overall prediction performance increases by up to 14.26% for the RF event log. Since there is no standardized process to create an ML model for inter-case feature creation, our proposed approach is also sensitive to user interpretation. Yet, it provides more interpretability to RTMs. Note that despite an overall decrease in prediction error, some prefixes were heavily over-predicted or under-predicted. Therefore, the next step is to improve the prediction models and leverage routing probabilities derived from stochastic process models, which would improve the inter-case feature creation for segment prediction. Another possible path is to make RTMs aware of non-case-related aspects, e.g., resource dependencies.