Predictive End-to-End Enterprise Process Network Monitoring

Ever-growing data availability combined with rapid progress in analytics has laid the foundation for the emergence of business process analytics. Organizations strive to leverage predictive process analytics to obtain insights. However, current implementations are designed to deal with homogeneous data. Consequently, their practical use is limited in organizations with heterogeneous data sources. To overcome this limitation, the paper proposes a method for predictive end-to-end enterprise process network monitoring that leverages multi-headed deep neural networks. A case study performed with a medium-sized German manufacturing company highlights the method’s utility for organizations.


Introduction
Business processes are the backbone of organizational value creation. The progressing digitalization of business processes results in massive amounts of historical process data (van der Aalst 2016). In parallel, analytics capabilities facilitate the use of this data (Vera-Baquero et al. 2013; Beheshti et al. 2018). Business process analytics refers to a set of approaches, methods, and tools for analyzing process data to provide process participants, decision-makers, and other related stakeholders with insights into the efficiency and effectiveness of organizational processes (Zur Muehlen and Shapiro 2015; Polyvyanyy et al. 2017; Benatallah et al. 2016).
One type of business process analytics aims to predict future process behavior based on business process data (Zur Muehlen and Shapiro 2015). Such predictive process analytics is typically realized by a class of information systems called predictive monitoring systems, which promise to assist decision-makers through predictions based on historical event log data (Schwegmann et al. 2013). As a methodological basis for predictive monitoring systems, predictive process monitoring (PPM) is gaining momentum in business process management. PPM provides a set of methods that allow predicting measures of interest based on event log data (Maggi et al. 2014). By gaining insights into the uncertain future of a process, PPM methods enable decision-makers to prevent undesirable outcomes (van der Aalst et al. 2010; Márquez-Chamorro et al. 2017). For example, in a hypothetical manufacturing company whose production process is recorded in a manufacturing execution system, a PPM tool can be used to predict disruptions for running process instances. The predictions allow the company to intervene proactively in the respective process instances to mitigate or prevent disruptions. As disruptions directly affect productivity, proactive management of process instances enhances value creation. This is typically achieved by providing extended and relevant information at the right time, which in turn leads to time, cost, and workforce savings.
As its data basis, PPM typically uses a single event log documenting a specific process or multiple sub-processes (e.g., Cuzzocrea et al. 2019; Senderovich et al. 2019). Oftentimes, the (process) control flow information is feature-encoded, with one target variable per process instance or prefix (i.e., the initial part of a process instance) (e.g., Breuker et al. 2016; Lakshmanan et al. 2015). More sophisticated approaches append (process) context information to the control flow information of a single event log to increase the explanatory power of the input variables with respect to the target variable (e.g., Yeshchenko et al. 2018; Brunk et al. 2020).
In organizations with a process-oriented design (Eversheim 2013), the organizational alignment of the departments supports end-to-end business process execution and management. Departments are connected via the organization and departments layer and via the enterprise process network layer, which connects departments, processes, and information systems (Fig. 1). More specifically, this layer establishes inter-department and inter-process dependencies, as departments are usually involved in a multitude of processes (e.g., the production department is responsible for disruptions that affect the shipment process in the logistics department and may influence the sales process in the sales department), and a process often involves multiple departments (e.g., an order process (red) that spans the sales, logistics, and production departments).
Consequently, the enterprise process network extends the scope from the process level to the process network level. The primary data sources in enterprise process networks are event logs documenting the control flow information of a process. This logged control flow is often combined with additional context information directly related to the process. The primary log data is supplemented by further process-related data sources, e.g., sensor data (temperature, humidity, vibration) or measurements. Complex manufacturing business process environments encompass many heterogeneous data sources. We refer to these as different types of data, i.e., data measured on different scales or collected at varying frequencies (Canizo et al. 2019).
Given this data scope definition, Fig. 1 distinguishes data sources such as an order event log (red-dashed) and a production event log (blue-dash-dotted), both with control flow and process-related context information, as well as disruption context information (green-dotted). In this exemplary enterprise process network, a disruption prediction may benefit from additional information from the logistics process. Considering the interplay between the different processes may increase the predictive power, as more data potentially yields additional relevant features, and higher predictive power in turn enhances the organization's value creation. By contrast, existing PPM approaches do not adopt such a process network perspective (Borkowski et al. 2019). This may limit their practical use, as seamlessly combining heterogeneous data sources relating to multiple processes is very difficult. By focusing on enterprise process network monitoring, we address this limitation and introduce a predictive end-to-end method. The main contribution of our research is threefold:
1. We present a method for predictive enterprise process network monitoring in the business process management (BPM) domain. The method establishes an end-to-end perspective on predictive process network monitoring in an organizational context. In doing so, it facilitates the combination of heterogeneous data sources for predictive tasks and guides the problem specification as well as the design and application of a multi-headed neural network (MH-NN) model.
2. Our novel multi-headed deep neural network (DNN) model integrates multiple data sources from an enterprise process network, such as the color-highlighted process logs or context information in Fig. 1. With this deep learning (DL) architecture, the heterogeneous data are processed in dedicated neural network (NN) input heads, whose outputs are concatenated to make a prediction based on cross-department information.
3. The results from a case study conducted with a medium-sized German manufacturing company shed light on the method's practical relevance. We evaluate our method against traditional machine learning (ML) and state-of-the-art DL approaches in terms of predictive power and runtime performance based on real-world data. While the DL model constructed with our method exhibits somewhat higher computational costs, its predictive power is significantly higher than that of the considered baselines.

Background and Related Work
We first review recent advances in PPM with a special focus on predictive models. In doing so, we highlight the research gap and position our methodological contributions.

Prediction Methods in Predictive Process Monitoring
Process mining (PM) is an established process analysis method in BPM that involves data-driven (process model) discovery, conformance checking, and enhancement of processes (van der Aalst et al. 2011a). PM's general idea is to gain process transparency from event log data. It is thus an approach to process analytics that focuses particularly on ex-post process diagnostics. With the advent of predictive analytics, new potentials of gaining insights from event log data have been unlocked (Breuker et al. 2016). Building on these methods, PPM has emerged as a new subfield of PM (Márquez-Chamorro et al. 2017). PPM provides a set of techniques to predict properties of operational processes, which can be arranged into two general groups (Mehdiyev et al. 2020). The first group of techniques addresses regression tasks and refers to the prediction of continuous target variables, such as the completion time of a process instance (e.g., van der Aalst et al. 2011b; Wahid et al. 2019). In contrast, the second group tackles classification tasks and refers to the prediction of discrete target variables, such as the next activity (e.g., Mehdiyev et al. 2017; Breuker et al. 2016), process violations (e.g., Di Francescomarino et al. 2016), or process-related outcomes (e.g., Flath and Stein 2018; Kratsch et al. 2020).
A branch of early PPM approaches augments discovered process models with predictive capabilities but requires certain model structures to support prediction tasks. Thereby, the process model is transformed into a predictive model. For example, van der Aalst et al. (2011b) introduce a technique that uses an annotated transition system to predict process completion time based on historical event log data. Another example is Rogge-Solti et al. (2013), who mine a stochastic Petri net with arbitrary delay distributions from event log data. These approaches can be described as process-aware because they utilize "(...) an explicit representation of the process model to make predictions" (Márquez-Chamorro et al. 2017, p. 4). However, real-world processes are usually more complex than the discovered process models (van der Aalst 2011). This dependence on a process model limits the predictive power (Senderovich et al. 2019). To overcome this restriction, another, more recent branch of PPM approaches proposes to encode sequences of process steps as feature vectors for the straightforward use of ML models. In this way, a predictive model is learned from the event log's sequential process information without discovering a process model. Leveraging the generalization power of ML models, sequence-encoding approaches often outperform predictive models built on top of discovered process models (Senderovich et al. 2017).
The multi-layer perceptron (MLP) is a classic NN architecture from the class of feed-forward DNNs (Goodfellow et al. 2016) that has been leveraged for PPM. The MLP does not explicitly model temporality. As a workaround, sequential data must be transformed into a two-dimensional data structure. For example, Theis and Darabi (2019) used MLPs to predict the next activities. DNNs have been applied to PPM due to the conceptual similarities between next event prediction and natural language processing tasks (Evermann et al. 2016). DNNs can outperform statistical (e.g., Verenich et al. 2019) and traditional ML approaches (e.g., Kratsch et al. 2020; Mehdiyev et al. 2020; Evermann et al. 2016). DNNs perform multi-representation learning, which "(...) focuses on extracting the multiple representations from the single view of data" (Zhu et al. 2019, p. 3), and are good at unveiling intricate structures in data (LeCun et al. 2015). A popular sub-class of DNNs are recurrent neural network (RNN) approaches (Rama-Maneiro et al. 2021), including long short-term memory (LSTM) and gated recurrent unit (GRU) neural networks, which can capture temporal dependencies within sequences (Rumelhart et al. 1985). Another DNN architecture, which allows processing temporal patterns across short time horizons (i.e., a local temporal neighborhood), is the convolutional neural network (CNN) (Zhao et al. 2017). To leverage CNNs for PPM, sequences must be preprocessed from a temporal into a spatial structure. Pasquadibisceglie et al. (2019) show the validity of such sequence preprocessing for predicting the next process activity using the Helpdesk event log and the BPI Challenge 2012 data. Graph neural networks (GNNs) have recently been used in PPM (e.g., Stierle et al. 2021) because the process control flow follows a graph structure and can thus be processed directly by GNNs.
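The two-dimensional workaround for MLPs can be illustrated with a minimal Python sketch, assuming a small hypothetical activity alphabet: each event of a prefix is one-hot encoded, shorter prefixes are left-padded with all-zero slots, and the slots are flattened into one fixed-length row per prefix. The alphabet, the padding scheme, and the function names are illustrative assumptions, not the encoding used by Theis and Darabi (2019).

```python
# Hypothetical activity alphabet; not taken from any real event log.
ACTIVITIES = ["register", "check", "approve", "ship"]


def one_hot(activity, alphabet):
    """Return a one-hot list for a single activity label."""
    vec = [0] * len(alphabet)
    vec[alphabet.index(activity)] = 1
    return vec


def flatten_prefix(prefix, alphabet, max_len):
    """Encode a prefix as max_len one-hot slots flattened into one row.

    Only the most recent max_len events are kept; shorter prefixes are
    left-padded with all-zero slots so that every row has the same length,
    max_len * len(alphabet), as an MLP input layer requires.
    """
    prefix = prefix[-max_len:]
    padding = [[0] * len(alphabet)] * (max_len - len(prefix))
    slots = padding + [one_hot(a, alphabet) for a in prefix]
    return [value for slot in slots for value in slot]


prefixes = [["register"], ["register", "check"], ["register", "check", "approve"]]
X = [flatten_prefix(p, ACTIVITIES, max_len=3) for p in prefixes]
```

Each row has max_len × |alphabet| entries, so `X` is a regular (samples × features) table that a feed-forward network can consume; the temporal order survives only implicitly through the slot positions.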
Beyond the four general architecture types (MLPs, RNNs, CNNs, and GNNs), extensions (e.g., transformer networks with dense layers like MLPs; Moon et al. 2021) and combinations (e.g., long-term recurrent convolutional networks; Park and Song 2020) have been proposed for PPM.

Data Scope vs. Prediction Methods in Predictive Process Monitoring
Statistical approaches in PPM (e.g., van der Aalst et al. 2011b; Rogge-Solti et al. 2013) start from the control flow information of event log data. This type of information is key for process predictions, as the control flow of processes describes their structure. By using ML, PPM techniques extend the data scope and can encode further event log information in feature vectors (e.g., Folino et al. 2012). This additional information is called process context information. It characterizes the environment in which the process is performed (Da Cunha Mattos et al. 2014; Rosemann et al. 2008) and represents, for example, information about the resource that performs an activity.
In recent years, PPM research has suggested DL architectures that integrate context information to improve prediction results (Rama-Maneiro et al. 2021). Current PPM approaches receive single event logs as input and do not leverage information from multiple data sources. Such a single event log can nevertheless contain several subprocesses, as does the event log published for the BPI Challenge 2012. Currently, there are no PPM techniques that use multiple data sources to perform end-to-end enterprise process network predictions. Figure 2 differentiates published PPM techniques along two dimensions, namely data scope and prediction method, to identify the research gap in the literature concerning end-to-end PPM.
Recent time series forecasting techniques (e.g., Canizo et al. 2019; Mo et al. 2020; Wan et al. 2019) offer a promising way to realize such predictions through multi-headed NNs. These networks process the data of each input head (e.g., from a machine sensor) individually and subsequently merge the heads' outputs. Motivated by this idea, we set out to adapt this approach to end-to-end enterprise process networks.

Predictive End-To-End Enterprise Process Network Monitoring
We propose PPNM, a five-phase method for predictive end-to-end enterprise process network monitoring (Fig. 3). We develop the PPNM method based on the method engineering research framework for information systems development methods and tools proposed by Brinkkemper (1996). Methods describe systematic procedures "to perform a systems development project, based on a specific way of thinking, consisting of directions and rules, structured in a systematic way in development activities" (Brinkkemper 1996). The method engineering process consists of three phases (Gupta and Prakash 2001): requirements engineering, method design, and method implementation. First, we define requirements for the construction of the PPNM method, such as its applicability as an end-to-end approach, the integration of multiple data sources, and superior predictive power. Second, we present the design, evaluation, and implementation of the PPNM method in this section and describe the method's phases in detail in the context of a case study with a medium-sized German manufacturing company. Finally, we discuss the PPNM method critically and provide implications (Sect. 3.4).
Fig. 3 Five-phase method for predictive end-to-end enterprise process network monitoring
In our PPNM method, the underlying problem is specified first. This includes (business) problem identification, (business) process understanding, and predictive task specification. Second, the method prescribes acquiring and preparing the input data for the MH-NN model. Third, the MH-NN model is designed; it is subsequently evaluated in the fourth phase. Lastly, PPNM describes aspects of the model application.

Problem Specification
The first phase specifies the problem by adapting the approach of Benscoter (2012), beginning with problem identification at the business department or enterprise process network layer. This approach to "identify and analyze problems in your organization" (Benscoter 2012) focuses particularly on identifying a situation's impact on processes and workers as well as problem-relevant metrics. Subsequently, it is crucial to establish an understanding of the interdependent processes and data sources. Within the organization's layers, all relevant processes and data sources that can add value to the predictive analysis task should be identified. Then, their dependencies should be analyzed to find common denominators for synchronizing the heterogeneous data sources (e.g., shared identifiers or timestamps) and to understand how they relate to the organizational problem or situation. Based on this process and data understanding, the method prescribes defining the organizational objective and the type of predictive task (regression or classification).

Data Acquisition and Preparation
Having identified relevant processes and data sources, we next acquire and prepare the input data for the MH-NN. Data acquisition comprises the activities that obtain the heterogeneous data. This data is analyzed to gain insights into each data source and is subsequently prepared for the MH-NN. The network processes each data source individually, without the need for prior aggregation and combination. We apply some standard preparation techniques (Han et al. 2011) but, more generally, follow the DL recommendation to rely on standard DL architectures for feature extraction and to limit extensive preparation (LeCun et al. 2015).
As a crucial step of data preparation, PPM requires appropriately encoded events and sequences. Events can be encoded based on the types of their attributes. Sequences of events can be encoded as feature-outcome pairs (Van Dongen et al. 2008), n-grams of sub-sequences (Mehdiyev et al. 2020), feature vectors derived from Petri nets (Theis and Darabi 2019), or weighted adjacency matrices (Oberdorf et al. 2021a).
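As an illustration of one of these sequence encodings, the sketch below counts contiguous n-grams (here bigrams) per trace over a vocabulary built from all traces. It is a generic n-gram count encoding under simplifying assumptions (toy traces, no hashing or vocabulary pruning) and is not meant to reproduce the exact featurization of Mehdiyev et al. (2020).

```python
from collections import Counter


def ngrams(trace, n):
    """Return the contiguous sub-sequences of length n in a trace."""
    return [tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)]


def ngram_features(traces, n):
    """Encode each trace as counts over the n-grams observed in all traces."""
    vocabulary = sorted({g for t in traces for g in ngrams(t, n)})
    rows = []
    for t in traces:
        counts = Counter(ngrams(t, n))
        rows.append([counts[g] for g in vocabulary])
    return vocabulary, rows


# Hypothetical traces with activities a, b, c.
traces = [["a", "b", "c"], ["a", "b", "b", "c"], ["a", "c"]]
vocab, rows = ngram_features(traces, n=2)
```

Every trace becomes a fixed-length count vector, regardless of its length, which makes the encoding directly usable by standard ML models; local ordering is preserved only within each n-gram.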

Multi-headed Neural Network Design
For the design of the multi-headed NN, we follow recent work on PPM methods, which moves from explicit process models and traditional ML approaches to NN-based approaches (Mehdiyev et al. 2020). Yet, in some scenarios, the sequential structure of these NNs is not sufficiently flexible, for instance, if data from different sources with different dimensions are required to explain the output variable. Following Chollet (2018, p. 301), the proposed architecture for these cases is a multi-headed NN. Architectures with multiple heads use independent single-channel input heads to process each input individually. With this approach, each data source can be processed according to its data type and structure. The head outputs are then concatenated and further processed to obtain a prediction in the output layer.
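The multi-head principle can be sketched as a forward pass in plain Python, assuming two hypothetical inputs of different shape (a 3-feature vector and a 2×2 matrix): each input passes through its own head, the head outputs are concatenated, and a shared output layer produces class probabilities. The weights are random and untrained and the layer sizes are arbitrary; in practice, such a model would be built and trained with a DL framework (e.g., the Keras functional API described by Chollet 2018).

```python
import math
import random


def dense(inputs, weights, biases, activation=None):
    """Fully connected layer; weights holds one row per output unit."""
    out = [sum(w * x for w, x in zip(row, inputs)) + b
           for row, b in zip(weights, biases)]
    if activation == "relu":
        out = [max(0.0, v) for v in out]
    return out


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def random_layer(n_in, n_out, rng):
    """Hypothetical untrained layer with small random weights."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


rng = random.Random(0)
head_a = random_layer(3, 4, rng)   # head for the 3-feature vector input
head_b = random_layer(4, 4, rng)   # head for the flattened 2x2 matrix input
output = random_layer(8, 3, rng)   # shared output layer over both heads


def multi_head_forward(vector_input, matrix_input):
    """Process each input in its own head, concatenate, then classify."""
    out_a = dense(vector_input, *head_a, activation="relu")
    flat_b = [v for row in matrix_input for v in row]
    out_b = dense(flat_b, *head_b, activation="relu")
    merged = out_a + out_b         # list concatenation of the head outputs
    return softmax(dense(merged, *output))


probs = multi_head_forward([0.2, 0.5, 1.0], [[0.1, 0.0], [0.3, 0.9]])
```

The key design choice is that neither input has to be reshaped to match the other: each head adapts its own input, and only the head outputs share a common representation.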
For the design of the multi-headed NN, the method supports a multitude of architectures (Fig. 4). In general, it distinguishes between customized and state-of-the-art architectures.
For customized architectures, a combination of NN layers can be selected (Sect. 2.1). Following Goodfellow et al. (2016), combining various layers in a task-specific manner enables the implicit extraction of valuable features. To this end, distinct properties of architectures can be leveraged, such as the particular suitability of LSTM layers for time series or of CNN layers for matrix data. These properties can also be combined, e.g., by stacking CNN and LSTM layers to process time series (Brownlee 2017).
In addition to the customized architectures, the method taps into recent advances in the DL domain by incorporating established architectures. There are state-of-the-art architectures for various domains, such as image, text, or signal processing. As the set of available architectures is constantly changing, we suggest checking for currently available state-of-the-art networks during a model's design phase to build on recent research advances. Figure 4 provides an overview of currently established state-of-the-art methods for various tasks. Depending on the data type, we show current DL solutions for problems such as sentiment analysis (Jiang et al. The common denominator of such models is that they consist of complex DL architectures with many hidden layers and trainable parameters. Because training such models is computationally demanding, they are usually provided with pretrained weights, which can be leveraged for the prediction task at hand or even fine-tuned on the task's specific data.

Multi-headed Neural Network Evaluation
Next, the method addresses model evaluation. For this purpose, we follow the approach of Brownlee (2020), which includes generating a validation set and using performance metrics to assess a model's performance. The evaluation of the resulting model is crucial for selecting a proper configuration. It reveals whether the model is suitable to estimate the desired target variables. To this end, validation and test sets are generated from the available data using validation methods. In the field of PPM in particular, selecting an appropriate validation set method is challenging. There are three established validation set generation methods (Fig. 3). In addition to the validation set generation, it is common to keep a holdout set containing data reserved exclusively for a final model evaluation.
The most common method is a straightforward strategy referred to as the train-test split procedure (James et al. 2017, pp. 176-178). An alternative procedure for estimating the prediction error is k-fold cross-validation (James et al. 2017, pp. 181-186). It splits the data set into k folds, uses k − 1 folds for training and the remaining fold for validation, and repeats this k times so that each fold serves once as the validation fold.
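A minimal sketch of both procedures, assuming plain index lists and a fixed random seed: a shuffled split first reserves a holdout set, and the remaining development indices are then partitioned into k folds, each of which serves once as the validation fold. The helper names are hypothetical; libraries such as scikit-learn provide equivalent utilities (`train_test_split`, `KFold`).

```python
import random


def train_test_split_indices(n, test_ratio, seed=0):
    """Shuffle indices 0..n-1 and cut off the last test_ratio share."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(round(n * (1 - test_ratio)))
    return idx[:cut], idx[cut:]


def k_fold_indices(indices, k):
    """Yield (train, validation) lists; each fold is validated exactly once."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        # The k - 1 remaining folds form the training set.
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, validation


# Reserve 20% as holdout, then run 5-fold cross-validation on the rest.
dev, holdout = train_test_split_indices(100, test_ratio=0.2)
cv_splits = list(k_fold_indices(dev, k=5))
```

Because the holdout indices never enter `k_fold_indices`, they stay untouched for the final evaluation.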
In some settings, regular k-fold cross-validation is not directly applicable. This is the case for time-series data, where observations are sampled at fixed time intervals. The constraint stems from the temporal component inherent in the problem: randomly assigned folds would allow a model to be trained on observations that lie in the future of its validation data. Here, a time-series split is an appropriate method, where in the kth split, the first k folds are used as the training set and the (k + 1)th fold is used as the test set. Time-series splits have the drawback that training and testing data overlap across splits, as the test fold of one split becomes part of the training set of the next. This limitation can be resolved by forward testing techniques, where the model is automatically retrained at each time step when new data is added (Kohzadi et al. 1996).
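The expanding-window scheme described above can be sketched as follows, assuming the samples are already sorted chronologically. For simplicity, the sketch uses equally sized folds and leaves any remainder samples at the end unused; scikit-learn's `TimeSeriesSplit` implements a similar scheme with its own handling of the remainder.

```python
def time_series_split(n_samples, n_splits):
    """Expanding-window split over chronologically ordered samples.

    The samples are cut into n_splits + 1 consecutive folds; in the k-th
    split, the first k folds form the training set and fold k + 1 the test set.
    """
    fold_size = n_samples // (n_splits + 1)
    for k in range(1, n_splits + 1):
        train = list(range(0, k * fold_size))
        test = list(range(k * fold_size, (k + 1) * fold_size))
        yield train, test


ts_splits = list(time_series_split(12, n_splits=3))
```

Every training index precedes every test index within a split, so the model never sees observations from the future of its test data.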
After selecting an appropriate validation technique, the next step is choosing a performance metric for the predictive problem. For classification tasks, accuracy is a very commonly applied metric. It measures the ratio of correctly predicted target labels to the total number of predictions. Accuracy is only appropriate for tasks in which all classes are equally important, and its usefulness suffers if the samples are not equally distributed across the classes. For imbalanced data sets, the preferable metrics are balanced accuracy, the weighted F1-score, or the Matthews correlation coefficient. The most common metrics for evaluating regression tasks are the mean absolute error (MAE) and the mean squared error (MSE). To provide relative insights, in particular in an organizational context, the mean absolute percentage error (MAPE) is useful. One of these metrics is then chosen for model training, yet it is common to report multiple metrics in the evaluation.
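To make the difference between accuracy and balanced accuracy concrete, the sketch below computes both on a hypothetical imbalanced toy example, together with the regression metrics MAE, MSE, and MAPE. The functions are simplified illustrations (e.g., MAPE is undefined if a true value is zero); the weighted F1-score and the Matthews correlation coefficient are available in scikit-learn as `f1_score(..., average="weighted")` and `matthews_corrcoef`.

```python
def accuracy(y_true, y_pred):
    """Share of correctly predicted labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall, so every class counts equally."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error in percent; y_true must be nonzero."""
    return 100 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Imbalanced toy labels: always predicting the majority class "A".
y_true = ["A"] * 9 + ["B"]
y_pred = ["A"] * 10
acc = accuracy(y_true, y_pred)            # 0.9
bacc = balanced_accuracy(y_true, y_pred)  # (1.0 + 0.0) / 2 = 0.5
```

The majority-class predictor looks strong on accuracy but is exposed by balanced accuracy, which is why the latter is preferable for imbalanced targets.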
Based on the validation set and performance metrics, the model is trained and tuned. Finally, the tuned models are tested and their learning curves are evaluated to ensure a robust model for the prediction task.

Multi-headed Neural Network Application
In the last phase, the method describes aspects of MH-NN application. This includes the operationalization of data acquisition and preparation as well as the deployment of an evaluated MH-NN. Of particular importance is the live connection to the enterprise process network and its data sources. Beyond being trained on historical data, the MH-NN must handle live data to provide real-time predictions. Thus, besides model performance, runtime performance becomes particularly relevant during model deployment.
Once the model is integrated into the enterprise process network and connected to (live) data sources, it enables the prediction of the desired variable. Such a prediction then feeds into an organizational process, for example, by anticipating upcoming events or classifying an event's type, which can inform better decisions and solution steps in the organization. As the predictions improve the processes, the designed model contributes to the organizational goal of process improvement.

Method Evaluation
To evaluate the PPNM method, we use a real-world use case and walk through the method's five phases. We provide insights into the real-world application and discuss both the method's engineering and its application.

Problem Specification and Industry Background
We collaborated with a medium-sized German manufacturing company. The firm has multiple distributed production and assembly lines for highly customized mechatronics products. Competitive pressure requires the firm to offer high-quality products with (mass) customization options. This combination can lead to fairly complex production processes. Here, disruptions, i.e., situations in which a worker has to interrupt work, are not uncommon.
To handle such disruptions efficiently, our cooperation partner has deployed a disruption management system (Oberdorf et al. 2021b). The system automates the notification of responders who solve a disruption. When the responding agent solves a disruption, the agent provides the system with additional information, such as one of 32 disruption reasons (types). We identified the disruption type as a central component of the problem specification. If the type were known in advance, an agent could prepare the solution process (e.g., by bringing relevant tools or documentation), which would reduce the downtime associated with the disruption.
In parallel, the production processes have been analyzed with PM techniques to identify optimization potentials. However, due to the enterprise process network's complexity, interrelations, and dependencies, the respective analyses are very time-consuming. Consequently, the realization horizon of possible benefits is long. Striving for immediate benefit with minimal analysis effort, we adopt the PPNM method and provide an end-to-end PPNM solution in which the MH-NN is integrated into the organizational enterprise process network. The organizational objective is to improve the production process through better disruption handling, resulting in reduced downtime. We do so by predicting the disruption type and providing a solution suggestion to a notified agent based on the prediction. Accurate predictions are essential for meaningful notifications and suggestions.
We engaged with various departments (digitalization, logistics, and production) to evaluate the PPNM method in practice. In doing so, we examined each department's process event log and related databases.

Data Acquisition and Preparation
We compute basic statistics and advanced event log characteristics such as sparsity, variation, or repetitiveness (Heinrich et al. 2021; Di Francescomarino et al. 2017) to better understand the production and logistics event log data used (Table 1) as well as the disruption context information (Table 2). The descriptives demonstrate the high complexity of the semi-structured event logs, with many unique process variants and activity types. Furthermore, we combine both event logs into the combined production event log, which contains information about the logistics and production processes, their control flow, and context information.
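Basic descriptives of this kind can be computed directly from an event log. The sketch below assumes a hypothetical log given as (case id, activity, timestamp) tuples and derives the number of cases, events, activity types, process variants (distinct activity sequences), and the mean trace length. The advanced characteristics mentioned above (sparsity, variation, repetitiveness) follow the definitions in the cited works and are not reproduced here.

```python
from collections import defaultdict

# Hypothetical event log rows: (case_id, activity, timestamp), in any order.
events = [
    ("c1", "cut", 1), ("c1", "drill", 2), ("c1", "assemble", 3),
    ("c2", "cut", 1), ("c2", "assemble", 4),
    ("c3", "cut", 2), ("c3", "drill", 3), ("c3", "assemble", 5),
]


def traces_by_case(events):
    """Group events per case and order them by timestamp."""
    grouped = defaultdict(list)
    for case_id, activity, ts in events:
        grouped[case_id].append((ts, activity))
    return {c: tuple(a for _, a in sorted(evts)) for c, evts in grouped.items()}


def log_statistics(events):
    """Basic descriptive statistics of an event log."""
    traces = traces_by_case(events)
    return {
        "cases": len(traces),
        "events": len(events),
        "activity_types": len({a for _, a, _ in events}),
        "variants": len(set(traces.values())),
        "mean_trace_length": len(events) / len(traces),
    }


stats = log_statistics(events)
```

A high ratio of variants to cases, as reported for the case study logs, signals semi-structured behavior that is hard to capture with a single discovered process model.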
The disruption log is closely related to the intra-logistics and production departments and processes, as disruptions occur in both departments. It contains information about historical disruptions with features such as the disruption's hardware id and timestamp. Via the hardware id, disruptions can be mapped to a workplace using the hardware device database. This enables us to retrieve product information from the respective data sources, which we also leverage as features for the predictive task.
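The mapping via the hardware id amounts to a lookup join. A minimal sketch, assuming hypothetical in-memory tables for the hardware device database and for a workstation-to-product assignment; in practice, the product would depend on the time of the disruption and be retrieved from the respective data sources.

```python
# Hypothetical hardware device database: hardware id -> workstation.
HARDWARE_DB = {"hw-17": "WS-03", "hw-22": "WS-08"}
# Hypothetical (time-independent) assignment: workstation -> product.
PRODUCT_BY_WORKSTATION = {"WS-03": "P-100", "WS-08": "P-205"}

disruptions = [
    {"hardware_id": "hw-17", "timestamp": "2021-03-01T07:45:00"},
    {"hardware_id": "hw-22", "timestamp": "2021-03-01T09:10:00"},
    {"hardware_id": "hw-99", "timestamp": "2021-03-01T10:00:00"},
]


def enrich(disruption):
    """Attach workstation and product; unknown devices map to None."""
    workstation = HARDWARE_DB.get(disruption["hardware_id"])
    product = PRODUCT_BY_WORKSTATION.get(workstation)
    return {**disruption, "workstation": workstation, "product": product}


enriched = [enrich(d) for d in disruptions]
```

Returning `None` for unknown devices keeps unmatched disruptions visible, so they can be inspected instead of silently dropped.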
Following the PPNM method, we prepare the inputs of the multi-headed NN, starting with the disruption log. Based on the hardware id, we include additional workstation and product information using one-hot encoding. Besides, we extract time features, such as days, weekdays, hours, and minutes, from the disruption-associated timestamp, which we subsequently normalize. By aggregating the logistics and production logs, we obtain a process event log with context information. To transform the event log into valuable features, we follow Oberdorf et al. (2021a) and select the process instances within a time window, which we subsequently transform into a matrix representation. Thereby, rows and columns relate to specific workstations, and the value of a cell corresponds to the production quantity within the time window. For NN preparation, we scale each matrix by the maximum production quantity across all matrices. We apply this procedure to both the control flow data (process matrices) and the context data (context matrices).
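The preparation steps can be sketched as follows, under explicit assumptions: the workstation list and transfer records are hypothetical, time features are min-max scaled by their fixed calendar ranges (the normalization actually used is not stated here), and a matrix cell (i, j) holds the quantity passed from workstation i to workstation j within a window. As described above, all matrices are scaled by the maximum quantity across all matrices; the same helper would apply to the context matrices.

```python
from datetime import datetime

WORKSTATIONS = ["WS-01", "WS-02", "WS-03"]  # hypothetical row/column order


def encode_category(value, categories):
    """One-hot encode a categorical value as floats."""
    return [1.0 if value == c else 0.0 for c in categories]


def time_features(ts):
    """Day, weekday, hour, and minute min-max scaled to [0, 1]."""
    t = datetime.fromisoformat(ts)
    return [(t.day - 1) / 30, t.weekday() / 6, t.hour / 23, t.minute / 59]


def adjacency_matrix(transfers, workstations):
    """Sum the quantities moved between workstations within one window."""
    pos = {w: i for i, w in enumerate(workstations)}
    m = [[0.0] * len(workstations) for _ in workstations]
    for src, dst, qty in transfers:
        m[pos[src]][pos[dst]] += qty
    return m


def scale_by_global_max(matrices):
    """Divide every cell by the largest quantity found in any matrix."""
    peak = max(v for m in matrices for row in m for v in row) or 1.0
    return [[[v / peak for v in row] for row in m] for m in matrices]


# Two hypothetical time windows of (source, destination, quantity) records.
windows = [
    [("WS-01", "WS-02", 4), ("WS-02", "WS-03", 4)],
    [("WS-01", "WS-02", 8), ("WS-01", "WS-03", 2)],
]
matrices = scale_by_global_max([adjacency_matrix(w, WORKSTATIONS) for w in windows])
disruption_vector = (encode_category("WS-02", WORKSTATIONS)
                     + time_features("2021-03-01T07:45:00"))
```

Scaling by the global rather than a per-matrix maximum preserves the relative production volume between windows, which a per-matrix normalization would erase.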

Multi-headed Neural Network Design
We choose a three-headed DNN architecture (Fig. 6 in the Appendix, available online via http://link.springer.com). The disruption vector is the first input of the multi-headed NN and is processed with an MLP head that includes batch normalization. For both input matrices (weighted adjacency and context matrices), we use CNN architectures consisting of stacked CNN and fully connected (FC) layers. For the context information, we found a CNN-FC architecture to perform best in combination with the other heads. It consists of three CNN layers and a subsequent FC layer. The design of the third head, the process event head, posed a more challenging task. We tried the architecture used for the context information and, alternatively, appended the adjacency matrices to the context matrices along the fourth dimension. However, neither approach delivered satisfactory results. For this reason, we leverage process knowledge in the definition of the CNN kernel sizes: multiple sequential CNN layers extract features with distinct kernels. After feature extraction, both matrix head outputs have a 4D shape. To combine them with the output vector of the disruption head, we flatten the matrix head outputs. The flattened features are subsequently processed by a dense layer and the final dense output layer for the multi-class classification task.
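The flattening and concatenation step can be illustrated in plain Python, assuming hypothetical 3×3 matrices and a short disruption input. A "valid" 2D cross-correlation (the operation a convolutional layer computes, here with a single fixed kernel instead of learned weights) extracts features from each matrix; the feature maps are flattened and concatenated with the disruption input. The row-wise 1×3 kernel for the process matrix is only an example of encoding process knowledge in the kernel shape, not the kernel configuration of the case study.

```python
def conv2d_valid(matrix, kernel):
    """2D cross-correlation without padding, as computed by a CNN layer."""
    kh, kw = len(kernel), len(kernel[0])
    rows = len(matrix) - kh + 1
    cols = len(matrix[0]) - kw + 1
    return [[sum(matrix[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(cols)]
            for i in range(rows)]


def relu(fmap):
    return [[max(0.0, v) for v in row] for row in fmap]


def flatten(fmap):
    return [v for row in fmap for v in row]


# Hypothetical inputs: process matrix, context matrix, disruption input.
process_matrix = [[0.0, 1.0, 0.0], [0.0, 0.0, 0.5], [0.0, 0.0, 0.0]]
context_matrix = [[0.2, 0.0, 0.0], [0.0, 0.4, 0.0], [0.0, 0.0, 0.6]]
disruption_input = [1.0, 0.0, 0.25]

# A 1x3 kernel spans a whole row, summarizing each workstation's outgoing
# flows; a 2x2 kernel inspects local neighborhoods of the context matrix.
process_features = flatten(relu(conv2d_valid(process_matrix, [[1.0, 1.0, 1.0]])))
context_features = flatten(relu(conv2d_valid(context_matrix, [[1.0, 0.0], [0.0, 1.0]])))

# Concatenate all head outputs before the shared dense layers.
merged = disruption_input + process_features + context_features
```

Here `process_features` evaluates to `[1.0, 0.5, 0.0]`, one value per workstation row, which shows how a kernel shape aligned with the matrix semantics yields directly interpretable features.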

Multi-headed Neural Network Evaluation
For the quantitative evaluation, we classify the type of each disruption event with the constructed MH-NN. In addition, we compare it with traditional aggregation-based approaches, in which we append engineered (process) adjacency list features and, additionally, a vector of context information to the disruption input vector. Instead of the 24 disruption vector features, the adjacency list combination uses 291 input features (24 disruption plus 267 adjacency list features). Together with the 267 additional context list features, the full combination uses a total of 558 features.
We perform a five-times repeated five-fold cross-validation with random initialization. To prevent the DNN models from overfitting, we integrate an early stopping rule based on validation accuracy. We store the best-performing models during each training cycle and use a Bayesian optimization algorithm (O'Malley et al. 2019) for hyperparameter tuning. Our tuning objective is the validation accuracy, with a maximum of 50 trial configurations.
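The early stopping rule can be sketched as a small Python class, assuming a patience of three epochs (the patience used in the case study is not stated) and a hypothetical history of validation accuracies. The epoch with the best validation accuracy is remembered, which corresponds to storing the best-performing model of a training cycle.

```python
class EarlyStopping:
    """Stop when validation accuracy has not improved for `patience` epochs."""

    def __init__(self, patience):
        self.patience = patience
        self.best = float("-inf")
        self.best_epoch = None
        self.wait = 0

    def update(self, epoch, val_accuracy):
        """Record one epoch; return True if training should stop."""
        if val_accuracy > self.best:
            self.best, self.best_epoch, self.wait = val_accuracy, epoch, 0
            # A real training loop would checkpoint the model weights here.
        else:
            self.wait += 1
        return self.wait >= self.patience


history = [0.52, 0.61, 0.66, 0.65, 0.66, 0.64]  # hypothetical values
stopper = EarlyStopping(patience=3)
for epoch, val_acc in enumerate(history):
    if stopper.update(epoch, val_acc):
        break
```

Training stops after epoch 5, and the checkpoint from epoch 2 is the one kept. In Keras, the built-in callback `tf.keras.callbacks.EarlyStopping(monitor="val_accuracy", patience=3, restore_best_weights=True)` offers comparable behavior.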
For the tuned FC, CNN, and multi-headed (MH) models, we first compare the validation loss at the stopping time (Fig. 5). The multi-headed approach achieves a clearly lower loss than the other DNN architectures. In addition, it reaches a solid model in fewer epochs than the CNN or FC architectures with flattened feature inputs.
The final models are subsequently evaluated on the hold-out set, resulting in the metrics summarized in Table 3. There, we compare basic benchmark approaches, such as most frequent (mFreq) or k-nearest neighbor (KNN), with more advanced machine learning, deep learning, and multi-headed architectures. All evaluated algorithms, including the ML and DNN models, outperform the naive benchmark in terms of BMACC as well as the (weighted) F1-score, precision, and recall. We observe that the FC architecture benefits from the additional adjacency list features. However, the additional context list features decrease its predictive power, indicating that the FC architecture cannot completely prevent overfitting.
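To reproduce such a comparison, the metrics can be computed as follows. This minimal sketch assumes that BMACC denotes balanced multi-class accuracy, i.e., the unweighted mean of per-class recall; the weighted F1-score averages per-class F1 with weights proportional to class support. The toy disruption labels are hypothetical. scikit-learn's `balanced_accuracy_score`, `f1_score(..., average="weighted")`, and `DummyClassifier(strategy="most_frequent")` offer equivalent functionality.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple

Label = Hashable


def most_frequent_baseline(y_train: Sequence[Label], n: int) -> List[Label]:
    """mFreq: always predict the most frequent training label."""
    label = Counter(y_train).most_common(1)[0][0]
    return [label] * n


def _counts(y_true, y_pred) -> Tuple[Counter, Counter, Counter]:
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p, but it was wrong
            fn[t] += 1  # missed the true class t
    return tp, fp, fn


def balanced_accuracy(y_true, y_pred) -> float:
    """Mean per-class recall over the classes present in y_true."""
    tp, _, _ = _counts(y_true, y_pred)
    support = Counter(y_true)
    return sum(tp[c] / n_c for c, n_c in support.items()) / len(support)


def weighted_f1(y_true, y_pred) -> float:
    """Per-class F1, averaged with weights proportional to class support."""
    tp, fp, _ = _counts(y_true, y_pred)
    support = Counter(y_true)
    score = 0.0
    for c, n_c in support.items():
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / n_c
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += f1 * n_c / len(y_true)
    return score


# Hypothetical, imbalanced disruption types; mFreq is fitted on the same labels.
y_true = ["jam", "jam", "jam", "sensor"]
y_pred = most_frequent_baseline(y_true, len(y_true))
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

The example shows why balanced accuracy is informative for imbalanced classes: mFreq reaches 75% plain accuracy here but only 0.5 BMACC, because it never predicts the minority class.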
A comparison with the CNN that uses only adjacency matrix features shows that these features contain some basic information. However, its performance does not match that of the FC architecture with disruption and adjacency list features. The proposed multi-headed NN approach outperforms all benchmark architectures. Besides the better training behavior of the multi-headed NN approach, the stronger aggregation of the data in the benchmark approaches seems to cause information loss. Owing to the matrix structure of the inputs, the CNN can identify patterns in the data that lead to improved results. Note that the reported multi-class accuracy refers to a 32-class classification problem, for which uniform random guessing would achieve only about 3.1% (1/32). Accordingly, the MH accuracy of 81% is a good result that allows reliable solution suggestions. The experimental results of the multi-headed architecture are in line with recent research in computer vision (He et al. 2016) in general and predictive process monitoring (Rama-Maneiro et al. 2021) in particular. The DL algorithms show superior performance for the specific use case of multi-class classification. However, the superior predictive power of the MH-NN architecture comes with drawbacks regarding implementation effort and training time. Compared to standard ML models, which are readily implemented using libraries such as Scikit-learn (Pedregosa et al. 2011), finding and implementing optimal NN architectures for each network head is a complex and time-consuming task. Additionally, training the multi-headed NN takes significantly more time. 11 Clearly, this is a limitation of the MH-NN model. For our use case, however, the prediction duration is more relevant; it is acceptable and facilitates the application of the model.

Multi-headed Neural Network Application
In the last phase of the PPNM method, we deploy the data acquisition and preparation as well as the best identified model. The method's resources are deployed on a standard commercial virtual machine running a Linux OS. The machine is connected to the organizational enterprise process network through an MQTT connection, which enables live interaction with the disruption management system. Whenever a disruption occurs and the worker triggers the notification process, the disruption data is transmitted through the MQTT connection and triggers the prediction process. Recent production and intralogistics event log data are obtained automatically, and all data are prepared and forwarded to the MH-NN. The prediction result is then transmitted to the disruption management system and enriches the information that the responding agent receives as part of the disruption notification. This enables better preparation for the disruption task at hand, which ultimately reduces disruption downtimes and associated costs.
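The event-driven prediction flow can be sketched independently of a concrete MQTT client. In this minimal sketch, the payload fields (`id`, `line_id`), the result topic, and the injected callables `fetch_recent_event_logs`, `predict`, and `publish` are hypothetical placeholders for the organization's data access, the trained MH-NN, and the MQTT publisher; the paper does not describe the actual message schema.

```python
import json
from typing import Any, Callable, Dict

RESULT_TOPIC = "disruptions/predicted-type"  # hypothetical topic name


def handle_disruption_message(
    payload: bytes,
    fetch_recent_event_logs: Callable[[str], Dict[str, Any]],
    predict: Callable[[Dict[str, Any], Dict[str, Any]], str],
    publish: Callable[[str, bytes], None],
) -> str:
    """Parse a disruption notification, predict its type, publish the result."""
    disruption = json.loads(payload.decode("utf-8"))
    # Pull recent production and intralogistics event logs for the affected line.
    logs = fetch_recent_event_logs(disruption["line_id"])
    predicted_type = predict(disruption, logs)
    # Enrich the notification that the responding agent will receive.
    result = {"disruption_id": disruption["id"], "predicted_type": predicted_type}
    publish(RESULT_TOPIC, json.dumps(result).encode("utf-8"))
    return predicted_type


# Usage with in-memory fakes instead of a broker:
sent = []
predicted = handle_disruption_message(
    b'{"id": "D-1", "line_id": "L3"}',
    fetch_recent_event_logs=lambda line: {"line": line, "events": []},
    predict=lambda disruption, logs: "material shortage",
    publish=lambda topic, body: sent.append((topic, body)),
)
print(predicted)  # material shortage
```

With a client library such as Eclipse Paho (`paho-mqtt`), the handler would typically be called from the client's `on_message` callback with `msg.payload`, and `publish` could wrap `client.publish`. Injecting the dependencies keeps the prediction logic testable without a running broker.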
To provide an evaluation based on the real-world setting, we follow the approach described by Kraus et al. (2020) and evaluate the prediction error costs (c_err). These costs originate from the downtimes incurred while solving a disruption. We calculate them based on the production environment setup across the production lines, with a mean disruption rate of 1.3% per produced part, and report them in a relative monetary unit (MU). To do so, we leverage a previously established study that analyzes prediction accuracy with respect to the resulting downtimes (Oberdorf et al. 2021b). Based on this quantitative study, increasing model accuracy results in decreasing downtimes due to better information and thus better preparation of the notified agents. Consequently, higher accuracy, as achieved by the MH-NN, reduces prediction error costs. For example, while the basic benchmark approach mFreq creates prediction error costs of about 3,246 MU, the MH-NN reduces them to 695 MU, a reduction of roughly 79%.
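To make the link between accuracy and prediction error costs concrete, the following back-of-the-envelope sketch uses a simple linear cost model. The model and its parameters (number of produced parts, extra downtime cost per misclassified disruption) are hypothetical assumptions and not the cost model of Kraus et al. (2020) or Oberdorf et al. (2021b); only the 1.3% disruption rate, the 81% MH-NN accuracy, and the two reported cost figures come from the text.

```python
DISRUPTION_RATE = 0.013  # mean disruptions per produced part (from the case study)


def prediction_error_cost(parts: int, accuracy: float, cost_per_error_mu: float) -> float:
    """Hypothetical linear model: each misclassified disruption adds a fixed
    extra downtime cost (in MU) because the agent arrives less prepared."""
    expected_disruptions = parts * DISRUPTION_RATE
    expected_errors = expected_disruptions * (1.0 - accuracy)
    return expected_errors * cost_per_error_mu


def relative_saving(baseline_cost: float, model_cost: float) -> float:
    """Fraction of the baseline cost avoided by the better model."""
    return 1.0 - model_cost / baseline_cost


# Illustrative parameters only (hypothetical part count and cost per error).
print(round(prediction_error_cost(parts=10_000, accuracy=0.81, cost_per_error_mu=10.0), 1))  # 247.0
# Reported figures: mFreq about 3,246 MU vs. MH-NN 695 MU.
print(round(relative_saving(3246, 695), 3))  # 0.786
```

The empirical relationship between accuracy and downtime in the cited study need not be linear; the sketch only illustrates why every accuracy gain translates into lower expected costs.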
In addition, we interviewed a data scientist and a project manager. According to the data scientist, the collaboration raised awareness of the strong interdependence of the processes. Processes clearly affect each other, even across organizational borders, and the employees were aware of this. However, combining these heterogeneous data sources required considerable effort. The proposed method provides a valuable tool for structured data combination across departments.
Of course, we are aware of interdependent processes, but leveraging the data was usually not practical. The multi-headed NN approaches bridge this gap, as we can further combine data without the downside of extensive aggregation. And due to the deployment, even without first searching and collecting the data. (Data Scientist)
We presented the initial results to data scientists, project managers, and managers of the cooperation partner and discussed the practical implications. In line with the data scientist's perspective, the project manager outlined the potential on an organizational scale. Beyond the digitalization, production, and logistics departments, applications in finance and controlling are of particular interest. Connections to the customer relationship management (CRM) system or to website user statistics may enable better prediction of incoming orders, leading to improved production planning. Beyond better predictions, the deployment is then of special importance.

Discussion and Implications
The presented method enables predictive end-to-end enterprise process network monitoring by leveraging a multi-headed NN architecture. Through a cross-organizational end-to-end view, interrelationships and dependencies between different departments, processes, and information systems can be jointly analyzed.

Critical Perspective on the PPNM Method
Through the first and last phases, which focus particularly on the organizational layers, we enable end-to-end analyses. Leveraging the multi-headed DNN architecture provides a scalable solution for combining multiple data sources from across the organization and its processes, each with a specialized input head. In the case study, we applied PPNM to a real-world use case and designed a three-headed DNN architecture with multi-log and context data input heads. Based on the numerical evaluation, combined with the employees' feedback, we conclude that the PPNM method helps guide the development of predictive end-to-end enterprise process network monitoring.
Moreover, there are standard procedure models for data mining, such as CRISP-DM (Wirth and Hipp 2000), to which our engineered method may be compared. Even though these procedure models work well for numerous use cases in practical settings, they lack specifications and instructions for guiding the actual model design or for combining multiple data sources, particularly considering the complex design process of a multi-headed neural network in an organizational context. For this purpose, the engineered PPNM method establishes a more specialized perspective on defining the problem in the enterprise process network and explicitly considers the combination of data sources in the design of an MH-NN with dedicated NN input heads.
Finally, alternative MH-NN architectures may enhance predictive power. Thus, it may be worth comparing multiple architectures for the same input. We did so during the MH-NN design, which resulted in the design with three customized heads. However, with ongoing advances in NN development, new layers or even (pretrained) state-of-the-art methods may emerge. Thus, the chosen MH-NN architecture should be reviewed regularly.

Concept Drift in the Enterprise Process Network
The fifth phase consists of the final step of model integration and operationalization in the enterprise process network. For the current use case, the prediction time is satisfactory, although it may offer optimization potential for future research. Once the predictive model has been put into production, it draws on the knowledge from the historical data used for training. Deployed models inevitably face structural changes in the data over time, a phenomenon referred to as concept drift, which usually deteriorates prediction performance. Maisenbacher and Weidlich (2017), Denisov et al. (2018), and Spenrath and Hassani (2020) report respective observations in various organizational PPM contexts. However, the concept drift problem is not limited to PPM; it is also known in the more general fields of PM (Adams et al. 2021; de Sousa et al. 2021) and ML (Widmer and Kubat 1996). For valid process predictions and analyses, concept drift has to be detected and counteracted at an early stage. Currently, the PPNM method does not account for concept drift. Multiple methods for detecting concept drift are known (Seidl 2021; Kahani et al. 2021), such as local outlier detection, which can initiate retraining of the model with updated data to avoid wrong predictions and achieve temporal stability (Teinemaa et al. 2018).
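The PPNM method does not yet handle drift, and the text names local outlier detection as one option. As a simpler, swapped-in illustration, the sketch below applies the Page-Hinkley test to the stream of prediction errors (1 = wrong prediction, 0 = correct) and reports when retraining might be warranted. The parameters `delta` and `threshold` and the synthetic error stream are hypothetical and would need tuning on real data; this is not part of the PPNM method.

```python
from typing import Iterable, Optional


class PageHinkley:
    """Page-Hinkley test for an upward shift in the mean of a stream."""

    def __init__(self, delta: float = 0.005, threshold: float = 5.0):
        self.delta = delta          # tolerated magnitude of change
        self.threshold = threshold  # alarm threshold (lambda)
        self.n = 0
        self.mean = 0.0
        self.cum = 0.0              # cumulative deviation m_t
        self.min_cum = 0.0          # running minimum M_t

    def update(self, x: float) -> bool:
        """Add one observation; return True if drift is detected."""
        self.n += 1
        self.mean += (x - self.mean) / self.n  # incremental running mean
        self.cum += x - self.mean - self.delta
        self.min_cum = min(self.min_cum, self.cum)
        return self.cum - self.min_cum > self.threshold


def first_drift(errors: Iterable[int], detector: PageHinkley) -> Optional[int]:
    """Index of the first alarm, e.g. to trigger retraining on recent data."""
    for i, e in enumerate(errors):
        if detector.update(e):
            return i
    return None


# Hypothetical stream: 200 correct predictions, then the model starts failing.
stream = [0] * 200 + [1] * 20
print(first_drift(stream, PageHinkley()))  # 205
```

Monitoring the error stream requires ground-truth disruption types to arrive after the fact; unsupervised detectors on the input features, such as the outlier-based methods cited above, avoid that dependency.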

Detailed Analytics vs. End-to-End Method
A common phenomenon in traditional enterprises with hierarchical organizational structures is silo thinking, whose symptoms include weak collaboration throughout the organization. As a result, isolated process analysis within departmental boundaries is often observed, as there is little responsibility for end-to-end processes (Eggers et al. 2021). Nevertheless, a holistic view of the organization is necessary, as processes often span several departments. Since departments are connected through information systems, inter-departmental information about processes is available. In this regard, digitalization and emerging technologies, such as PM or PPM, enable end-to-end insights into processes and a holistic view of the heterogeneous IT landscape of enterprises (Armengaud et al. 2020). Both PM and PPM provide tools for generating insights into processes on an organizational scale, as they can process large amounts of data. For example, Lorenz et al. (2021) provide an end-to-end perspective for PM to improve productivity in make-to-stock manufacturing processes, and Eggers et al. (2021) show how management decisions can drive an end-to-end perspective on process data by creating new process owner positions. However, end-to-end process analysis has received little attention in both research and practice.
Our proposed PPNM method contributes to this field of research by integrating the enterprise process network with all its interrelations and dependencies. In addition, for PPM as a subcategory of PM, our research shows the benefits of taking an end-to-end view of processes for predictive tasks. The PPNM method and the fusion of interdepartmental data sources significantly increase predictive power. This is a first contribution, but it should not be the end of the research: our approach for end-to-end PPNM is only one avenue towards general approaches for end-to-end PM. Therefore, future research should focus on leveraging the resources of the enterprise process network for PM to derive end-to-end insights.

Conclusion and Outlook
We present the PPNM method for end-to-end enterprise process network monitoring, which leverages an MH-NN approach. In doing so, we overcome silo thinking and the separate analysis of data sources, as we enable the seamless combination of multiple data sources with specialized processing and NN computation for each input. The resulting MH-NN outperforms classical ML and DL models and was applied and evaluated in an organizational context.
From a more general perspective, the method is an essential piece of research enabling end-to-end PPNM on an organizational scale. Further, it paves the way towards a more general end-to-end PM, which overcomes silo thinking and unlocks the potential of an organization's enterprise process network (van der Aalst 2021). However, the approach is not limited to single organizations. Due to the method's extensibility, additional data sources, even across multiple organizations, could be combined and each leveraged in the best possible way. Thus, we further contribute to research towards holistic supply chain analytics. Respective inter-organizational PM analyses are proposed by Hernandez-Resendiz et al. (2021) for descriptive supply chain analytics, yet predictive insights are neglected. Our research extends the scope and enables the inter-organizational combination of data, even for predictive tasks. As more data are integrated, additional analytics research streams, such as federated learning, and aspects such as data ownership become more relevant and should be investigated in future research. The transfer of improved process predictions within and across organizations is relevant not only for research but especially for enterprises seeking to scale the respective solutions. Thus, our method not only enables new research but could be a fundamental component of scalable, enterprise-ready PPNM solutions with heterogeneous intra- and inter-organizational data sources.
Funding Open Access funding enabled and organized by Projekt DEAL.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.