Conformance-oriented Predictive Process Monitoring in BPaaS Based on Combination of Neural Networks

As a new cloud service for delivering complex business applications, Business Process as a Service (BPaaS) is another challenge faced by cloud service platforms recently. To effectively reduce the security risk caused by business process execution load in BPaaS, it is necessary to detect the non-compliant process executions (instances) from tenants in advance by checking and monitoring the conformance of the executing process instances in real-time. However, the vast majority of existing conformance checking techniques can only be applied to the process instances that have been executed completely offline and only focus on the conformance from the single control-flow perspective. We develop an extensible multi-perspective conformance measurement method to address these issues first and then investigate the predictive conformance monitoring approach by automatically constructing an online multi-perspective conformance prediction model based on deep learning techniques. In addition, to capture more decisive features in the model from both local information and long-distance dependency within an executed process instance, we propose an approach, called CNN-BiGRU, by combining Convolutional Neural Network (CNN) with a variant and enhancement of Recurrent Neural Network (RNN). Extensive experiments on two data sets demonstrate the effectiveness and efficiency of the proposed CNN-BiGRU.


Introduction
Cloud computing has changed the supply of computing, storage, and software services. It provides users with computing resources on demand through a pay-per-use approach to offer flexible IT solutions [1]. Since the formulation of cloud security policies always lags behind the use of cloud services, cloud services carry many security risks. Generally, cloud service vendors provide three main types of services: software as a service (SaaS), platform as a service (PaaS), and infrastructure as a service (IaaS). In terms of SaaS, the service provider deploys application software on remote servers, and users access these application services via the Internet by paying for them [2]. As we know, the daily operations of organizations and enterprises mainly rely on automated business processes executing on IT infrastructure. On the one hand, introducing business process management (BPM) into an enterprise can cause integration problems because not all software involved in a business process (BP) has clear interfaces. On the other hand, when business processes are delivered as a cloud service, enterprises do not need to buy and maintain expensive servers and Process-aware Information Systems (PAIS) to manage and perform their business processes.
Accordingly, a new type of SaaS paradigm, business process as a service (BPaaS), has emerged [3,4]. It is delivered via the Internet through cloud-based business process (model) execution [5,6]. In BPaaS, the business process models are usually developed by service providers for tenants to pay for. Since business process execution is susceptible to resource availability and the configuration of information systems, a process instance executed in the real world may deviate from the defined business process. These non-compliant process executions can bring serious consequences, and most of them prove to be invalid and meaningless by the time they finish. In particular, a malicious attacker posing as a legitimate tenant can initiate a large number of non-compliant process executions to attack BPaaS by increasing the process execution load. Accordingly, these invalid and non-compliant process executions waste cloud service resources and may create a security risk for the business process execution load in BPaaS. Therefore, it is particularly necessary to predictively monitor the conformance of process executions while they are ongoing [7] to enhance the security of the process execution load in BPaaS. Inspired by the definition of Predictive (business) Process Monitoring (PPM), whose purpose is to predict the future state of an executing process instance [8], we propose the concept of predictive conformance monitoring for business process executions (i.e., conformance-oriented predictive process monitoring, conformance-oriented PPM) in this paper.
Until now, existing conformance-oriented monitoring technologies have mainly focused on measuring conformance via Conformance Checking. Conformance checking technologies are designed to find and measure behavioral deviations of actual process executions (denoting the real behavior) from a predefined process model (denoting the expected behavior). The conformance of a process instance can therefore only be determined once it has completed. Accordingly, traditional conformance checking techniques are mostly offline, and they cannot determine in real-time whether a process instance is consistent with the predefined process model. Driven by intelligent business process management and deep learning techniques, people would like to know whether a process deviates at runtime, rather than a few days later or even longer [9]. Subsequently, some online conformance checking techniques have been proposed based on executed business activities and behavioral patterns [10,11]. However, they can only perceive the current conformance of an executing process instance. In contrast, our proposed predictive conformance monitoring for an ongoing process execution (instance) can perceive its future (final) conformance in real-time and take action to terminate, in advance, process instances that are unlikely to be compliant, thereby enhancing the security of the process execution load in BPaaS. In addition, when monitoring the compliance of process execution in BPaaS, the conformance of a process instance should be considered from multiple perspectives rather than only the control-flow perspective (the order in which certain activities are performed), because deviations cannot only occur in the control-flow. Deviations from other perspectives such as data, resources, and time can also result in invalid and meaningless process execution.
Nevertheless, most existing techniques only study the conformance of control-flow, such as [12][13][14][15]. As indicated in [16], it is necessary to propose additional conformance checking techniques from multiple perspectives. Thus, we innovatively propose the Multi-perspective (i.e., control-flow, data, resource, and time) Conformance-oriented Predictive (Business) Process Monitoring (MCPPM) for tenant-oriented business process executions to reduce the security risk of business process execution load and the tenants' cost in BPaaS.
To achieve the MCPPM task in terms of enhancing the security of process execution load in BPaaS, inspired by the research of PPM on other tasks [17][18][19][20], we investigate how to predict the multi-perspective conformance of an executing case on the basis of the historical process executions and a predefined process model in BPaaS. Meanwhile, we view this MCPPM problem as a binary classification task grounded on a certain multi-perspective conformance threshold. To solve this problem, the general approach usually includes two parts, by reference to the solution for outcome prediction of a process instance in [18]. The first is the offline part, during which we study the relationship between the historical executed process instances and their multi-perspective (control-flow, data, resource, etc.) conformance and then build a predictive classification model. The second is the online part, during which we forecast the future multi-perspective conformance of an executing case based on this model. Consequently, the effectiveness and efficiency of online real-time predictions are crucial for this task. Driven by the relevant work [17,[20][21][22], we develop an approach based on deep learning techniques. One reason is that prediction approaches based on conventional machine learning are less intelligent and have deficiencies in terms of prediction performance, especially the efficiency of online prediction, as demonstrated in [17,22]. Moreover, considering that an executed process instance consists of a series of events, the MCPPM can be viewed as a sequential data prediction problem. For this kind of problem, approaches based on Recurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs) prove to be more efficient [23][24][25], because RNN-based approaches pay more attention to features between the contexts of elements in sequential data, while CNNs can extract local features [21].
However, RNNs have obvious disadvantages in terms of gradient vanishing and long-distance dependency feature extraction. To address this issue, two variants of Gated RNN, RNN with a Long Short-term Memory unit (LSTM) [26] and RNN with a Gated Recurrent Unit (GRU) [27], were subsequently proposed, where the GRU has advantages in computational complexity [28]. Besides, some other enhancements of neural networks prove to be more efficient regarding prediction performance, such as bidirectional neural networks. Meanwhile, the MCPPM task is relatively complex because the multi-perspective conformance of a case is related to some of the events, the contextual dependencies between these events, and the attributes of these events together with the local dependencies between these attributes. Accordingly, in this paper, we integrate the bidirectional enhancement of GRU with CNN to automatically build a prediction model for MCPPM to enhance the security of process execution load in BPaaS. Here, the purpose of the CNN is first to extract representative attribute features by aggregating attributes of events, while the purpose of the BiGRU is to extract more temporal relation features, such as long context dependencies, from these events.
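To make the intended data flow concrete, the following minimal NumPy sketch (an illustration, not the authors' implementation; all layer sizes and weight initializations are arbitrary assumptions) passes an encoded trace through a 1-D convolution over events and then a bidirectional GRU, concatenating the final forward and backward states for a binary conformance prediction:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv1d_relu(x, w, b):
    """Valid 1-D convolution over the event axis with ReLU.
    x: (T, d_in); w: (k, d_in, d_out); b: (d_out,) -> (T-k+1, d_out)."""
    k = w.shape[0]
    out = np.stack([np.einsum('kd,kdo->o', x[t:t + k], w) + b
                    for t in range(x.shape[0] - k + 1)])
    return np.maximum(out, 0.0)

def gru_step(x_t, h, p):
    """One GRU step: update gate z, reset gate r, candidate state."""
    z = sigmoid(x_t @ p['Wz'] + h @ p['Uz'])
    r = sigmoid(x_t @ p['Wr'] + h @ p['Ur'])
    h_cand = np.tanh(x_t @ p['Wh'] + (r * h) @ p['Uh'])
    return (1.0 - z) * h + z * h_cand

def bigru_last(xs, p_fwd, p_bwd, d_h):
    """Run a GRU forward and backward; concatenate the two final states."""
    h_f = np.zeros(d_h)
    for x_t in xs:
        h_f = gru_step(x_t, h_f, p_fwd)
    h_b = np.zeros(d_h)
    for x_t in xs[::-1]:
        h_b = gru_step(x_t, h_b, p_bwd)
    return np.concatenate([h_f, h_b])

def gru_params(d_in, d_h):
    # W* act on the input, U* on the recurrent state.
    return {k: rng.normal(scale=0.1, size=(d_in if k[0] == 'W' else d_h, d_h))
            for k in ('Wz', 'Uz', 'Wr', 'Ur', 'Wh', 'Uh')}

# Toy trace: 8 events, each encoded as a 6-dimensional attribute vector.
trace = rng.normal(size=(8, 6))
w_conv = rng.normal(scale=0.1, size=(3, 6, 16))   # kernel spans 3 events
b_conv = np.zeros(16)
features = conv1d_relu(trace, w_conv, b_conv)     # (6, 16) local features
h = bigru_last(features, gru_params(16, 8), gru_params(16, 8), 8)  # (16,)
w_out = rng.normal(scale=0.1, size=16)
p_conform = sigmoid(h @ w_out)   # predicted conformance probability
```

In practice the model would be built and trained with a deep learning framework; the sketch only shows how the convolution aggregates attribute features locally while the BiGRU reads the resulting event sequence in both directions.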
In summary, this article makes the following contributions:
- We innovatively propose a conformance-oriented predictive process monitoring solution based on deep learning techniques, which aims to enhance the cloud security of the process execution load in BPaaS by detecting future non-compliant process executions in advance and taking action.
- We consider the conformance of an executed process instance from multiple perspectives (such as control-flow, data, resource, and time) rather than just the control-flow, and propose an extensible multi-perspective conformance measurement method.
- We focus on the relationship between historical process executions and their multi-perspective conformances and put forward the CNN-BiGRU approach, which can aggregate attribute features of events and progressively extract temporal features among events to forecast the future multi-perspective conformance of an executing process instance.
The remainder of this article is organized as follows. We first introduce the related work and give a brief discussion in Section 2. Then, in Section 3, we outline some basic concepts, the problem we are trying to solve, and the architecture of our proposed approach. Afterward, Section 4 introduces our proposed CNN-BiGRU approach, and Section 5 discusses the experiments and their results. Finally, we conclude and give an outlook on some open research questions in Section 6.

Conformance Checking
As an important part of Process Mining, the purpose of Conformance Checking is to detect and measure the difference between business process executions in the real world and the corresponding process models that set expected behaviors. Generally speaking, conformance checking for a certain process requires as input the corresponding process execution log and the predefined process description. Most current research concentrates on process models represented in graphical languages and views them as technical descriptions of a business process [16]. Accordingly, given a predefined model of a certain process and the corresponding process executions, the conformance of this process can be calculated by adopting or designing an algorithm to compare the two. The related research mainly focuses on two algorithm types: log replay and trace alignment.
Algorithms based on log replay aim to replay the trace of each process instance against a predefined process model and then utilize different measurement methods to define and calculate a conformance metric. One of the most commonly used is token-based log replay [29]. In contrast, Leemans et al. [15] designed another log replay algorithm: the input process model is first transformed into deterministic finite automata (DFAs), and then the event log is replayed on the execution traces allowed by the automata. In addition, Adriansyah et al. [30] presented a cost-based replay approach, which computes the conformance according to the total costs of the arcs in the Petri net and the insertion or skipping of activities. Similarly, Munoz-Gama et al. [31] investigated another log replay approach for large event logs by first dividing a process model into single-entry single-exit sub-models and then replaying each part of the log on its corresponding sub-model. These approaches can only be employed for conformance checking from the control-flow perspective (i.e., the order in which activities are performed) because they are all developed according to the structure of the process model. Besides, regarding multi-perspective conformance checking, Burattin et al. [32] designed another log replay method for conformance checking of declarative process models, in which an interpreter first extracts Linear Temporal Logic (LTL) constraints from these declarative process models, and these constraints are then employed to check the conformance of executed cases.
Algorithms based on trace alignment aim to convert the input used for conformance checking into event sequences and then align them as closely as possible. Up to now, many conformance checking techniques based on trace alignment have been proposed. The original trace alignment was developed through a series of heuristic steps such as calculating the score matrix, constructing the guide tree, and evaluating and pruning the alignment [33]. Based on this, Adriansyah et al. [30] first developed an approach based on the A* algorithm. Meanwhile, they defined a cost function to evaluate each alignment in order to retrieve the optimal one. In addition, some other measurements for the cost function are available, such as distance and legal moves. Besides, Song et al. [12] presented an alignment method based on a heuristic algorithm and a divide-and-conquer strategy. As for multi-perspective conformance checking, Mannhardt et al. [34] put forward a balanced multi-perspective alignment approach to align contextual data and resources, and Alizadeh et al. [35] developed an approach on the basis of a data CRUD matrix constructed from a process execution log and the corresponding process model. Similarly, De Leoni et al. [36] employed causal nets with data from BPMN process models to create multi-perspective alignments. Besides, multi-perspective conformance checking can be treated as an Integer Linear Programming problem, with a cost function of alignment defined from the perspectives of resource, data, and time [37].
In comparison, the trace-alignment-based approaches are more extensible toward the other perspectives of resource, data, and time, rather than the single control-flow. After comparing the process execution log with the corresponding process model, a metric is required to define and evaluate their conformance. Usually, four quality metrics are available: fitness, precision, simplicity, and generalization [38]. Among them, the concept closest to conformance is fitness, which denotes the proportion of executed process instances that can be successfully replayed on the corresponding process model. Accordingly, different definitions of fitness, including token-based and cost-based ones, are proposed in [29,30].
Of all related approaches, the closest to our work is online conformance checking. For online conformance checking, Burattin [39] developed an approach grounded on the behavioral difference between a business process model and a process execution log. Moreover, they proposed an online conformance checking framework based on the Transition Systems (TS) extracted from process models modeled by Petri nets [9]. After that, they employed behavior patterns to denote the business process and then implemented a framework to detect the compliance between these patterns and the process executions [10]. Additionally, Zelst et al. proposed an online conformance checking approach based on the event stream of an executing process instance and prefix-alignment [11]. Unlike online conformance checking, we propose a more forward-looking and meaningful framework to forecast the future conformance of an executing process instance online. This framework can provide a predictive measurement of conformance before a process execution completes. In this way, process executions that will not conform to the predefined process model in BPaaS can be terminated in advance.

Predictive (Business) Process Monitoring
Process executions always change due to dynamic execution settings and external conditions (e.g., laws, regulations, and policies), especially in the BPaaS application. As described above, traditional monitoring methods are mostly passive because deviations are only identified after they have occurred, rather than prevented in the first place. To address this issue, PPM techniques have been proposed to predict the future status of an executing process instance (case). Generally, PPM techniques consist of two parts: the first is offline training, where one or more predictive models are constructed grounded on a completed process execution log, and the second is online prediction, where the established prediction model(s) is (are) employed to make relevant predictions about the process instances being executed.
The vast majority of existing PPM techniques can be grouped into three categories according to prediction content: time prediction [17,40,41], outcome prediction [18,[42][43][44][45], and next activity (sequence) prediction [19,21,41,[46][47][48]. Moreover, all approaches mentioned above can be grouped into three categories depending on the techniques used: those based on an extended process model, traditional machine learning, and deep learning. For example, Rogge-Solti et al. [40] employed a specific Petri net to capture duration distributions and then utilized it to forecast the remaining execution time of an ongoing process instance. Furthermore, Lakshmanan et al. [44] provided an algorithm to relate process instances to an expanded state-space-based Markov chain and then leveraged existing techniques to predict the possibility of performing an activity in the future. Ferilli et al. [46] extended the process model by using the WoMan framework for activity prediction of processes. Whereas Appice et al. [41] investigated a data-centric process execution prediction approach based on shallow machine learning techniques, Leontjeva et al. [45] focused on trace encoding techniques and then proposed an approach grounded on Random Forest classification to predict the outcome of cases. As for deep learning-based approaches, Tax et al. [17] investigated the performance of the LSTM network on predictive tasks such as the next event to be executed and the complete continuation time of a running case. Pasquadibisceglie et al. [21] transformed the event log into 2D image-like data structures and then employed a CNN network to train a prediction model for next activity prediction. In addition, Park et al. [49] applied deep neural networks to predict the future performance of a business process based on an event log.

Discussion
According to the above analysis, traditional conformance checking techniques are offline and delayed, so they cannot support real-time checking of conformance. Although online conformance checking techniques are real-time, they cannot forecast the future conformance of an executing case at the present moment, because they only compare the currently executed partial trace of an ongoing case with a fixed pattern to obtain the current conformance result, instead of predicting the future conformance by learning representative features. Additionally, the majority of existing studies only focus on conformance from the single control-flow perspective rather than comprehensively from multiple perspectives such as resource, data, and time.
To achieve business process conformance monitoring in terms of reducing the security risk of process execution load, and inspired by the forward-looking nature of PPM, we first focus on and clarify the problem of MCPPM in this paper. After investigating existing PPM studies, we find that some of them are based on deep learning and prove to be more automatic and efficient than the most advanced approaches [17,20,21]. These deep learning-based approaches treat the completed case as a series of events (i.e., a trace), where each event is recorded with multiple attributes; they then adopt encoding methods to transform traces into numerical vectors and finally build a prediction model that captures the decisive features by utilizing neural networks such as RNNs and CNNs. Likewise, the future conformance of an executing case can be predicted by a prediction model constructed with these neural networks, because its input is sequential data (i.e., a trace), which is well suited to RNNs and LSTMs. Thus, we explore the efficient application of deep learning to the MCPPM task and then propose an approach that combines CNN with a variant and enhancement of RNN to construct an effective and efficient prediction model.

Measurement of Multi-perspective Conformance
As for a defined business process model in the BPaaS application, the future multi-perspective conformance of an executing process instance can be forecasted based on the historical executed cases for a tenant to enhance process execution load security. Generally, these executed cases for each tenant are recorded in an event log, and each case consists of some event records with several attributes. These attributes indicate the detailed execution (behavior) information of a real-world case, such as the resource, data, and time perspective. By comparing them with the predefined process models with multi-perspective constraints, the consistency between executed cases and a specific process model can be measured from multiple perspectives.
Here, we use a specific Petri net with constraints (C_PN), in which the transitions of the Petri net must satisfy certain constraints, to model a specific process from multiple perspectives. In reality, such a process model is determined by the original process model covering the control-flow perspective (modeled by a Petri net) and constraints from other perspectives (e.g., resource, data, and time).

Definition 3.1 (C_PN Process Model)
A predefined process model that sets the desired behavior can be expressed as a Petri net with constraints (C_PN) M = (P, T, F, V, G_d, G_r, G_t), which includes:
- a set of places P;
- a set of transitions T, each labeled with an activity name;
- a set of directed arcs F ⊆ (P × T) ∪ (T × P), recorded as the flow relation;
- a set of variables V;
- a constraint function G_d : T → Γ_V that relates a set of logical expressions (i.e., guards) in terms of the data perspective to each transition;
- a constraint function G_r : T → Γ_V that relates a set of logical expressions (i.e., guards) in terms of the resource perspective to each transition;
- a constraint function G_t : T → Γ_V that relates a set of logical expressions (i.e., guards) in terms of the time perspective to each transition.
As shown in Fig. 1, the process modeled by C_PN can be represented as T = {a_1, a_2, a_3, a_4, a_5, a_6, a_7, a_8}, P = {start, c_1, c_2, c_3, c_4, c_5, end}, and F = {(start, a_1), (a_1, c_1), (a_1, c_2), (c_1, a_3), (c_1, a_2), (a_2, c_3), (c_2, a_4), (a_4, c_4), (a_3, c_3), (c_3, a_5), (c_4, a_5), (a_5, c_5), (c_5, a_6), (a_6, c_1), (a_6, c_2), (c_5, a_7), (a_7, end), (c_5, a_8), (a_8, end)}. The control-flow (i.e., structure) of a process defines the order in which business activities are performed. Table 1 gives the corresponding guards for each transition of this process from multiple perspectives (i.e., data, resource, and time), in which each constraint function is composed of multiple logical expressions connected with "∨", "∧", and "¬". If a transition t has no constraint in a certain dimension, we set G_i(t) = true (t ∈ T, i ∈ {d, r, t}). For example, the data guard of transition examine thoroughly, G_d(a_1) = (amount > 2000), means that the data constraint of this transition is amount > 2000. In other words, the transition examine thoroughly can be enabled only if the control structure is satisfied and the amount of the flight claim request is greater than 2,000. Likewise, the resource guard G_r(t) specifies the resource related to each transition. In particular, for transitions a_2 and a_3 there is an additional constraint: when re-initiating a request within an execution of this process, the resource executing each of them must differ from the resource that executed it the previous time. Besides, the time guard G_t(t) defines the time constraints for each transition in addition to the ordering of transitions described in the process model. For transition a_6, the time constraint T_{a_6} ≤ T_{a_5} + 5 days requires that it happen within 5 days after transition a_5 occurs. Taking the process model described in Fig. 1 as an example, a sequence of transitions can be formed based on the above firing rules.
At first, the start place holds a token, so transition a_1 is enabled and can fire. Once a_1 fires, each of its two output places c_1 and c_2 receives a token. Since place c_1 holds only one token, only one of the transitions a_2 or a_3 can be enabled. In this way, a transition sequence can be generated until a token reaches the end place. For the process modeled in Fig. 1, we can obtain the reachable traces T(M) = {⟨a_1, a_4, a_2, a_5, a_7⟩, ⟨a_1, a_4, a_2, a_5, a_8⟩, ⟨a_1, a_4, a_3, a_5, a_6, a_3, a_4, a_5, a_8⟩, …}.
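The token-game replay described above can be sketched in a few lines of Python (the net below transcribes the arcs of the Fig. 1 example; the `replays` helper is purely illustrative, not part of the paper's method):

```python
from collections import Counter

# Arcs of the example net, transcribed from the text: each transition lists
# the places it consumes tokens from and the places it produces tokens to.
INPUTS = {
    'a1': ['start'], 'a2': ['c1'], 'a3': ['c1'], 'a4': ['c2'],
    'a5': ['c3', 'c4'], 'a6': ['c5'], 'a7': ['c5'], 'a8': ['c5'],
}
OUTPUTS = {
    'a1': ['c1', 'c2'], 'a2': ['c3'], 'a3': ['c3'], 'a4': ['c4'],
    'a5': ['c5'], 'a6': ['c1', 'c2'], 'a7': ['end'], 'a8': ['end'],
}

def replays(trace):
    """Token-game replay: True iff the trace fires from start to end."""
    marking = Counter({'start': 1})
    for t in trace:
        if any(marking[p] < 1 for p in INPUTS[t]):
            return False                       # transition not enabled
        for p in INPUTS[t]:
            marking[p] -= 1                    # consume input tokens
        for p in OUTPUTS[t]:
            marking[p] += 1                    # produce output tokens
    # +marking drops zero counts; only one token in 'end' may remain.
    return +marking == Counter({'end': 1})
```

For instance, `replays(['a1', 'a4', 'a2', 'a5', 'a7'])` accepts the first reachable trace above, while `replays(['a1', 'a2', 'a5'])` rejects a trace that tries to fire a_5 without a token in c_4.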

Definition 3.3 (Event, Event Log)
An instantiation of a transition is an event, which can be defined as a tuple e = (a, c, t_start, t_end, d_1, …, d_m), where a is the activity name, c is the case id representing the specific process execution, t_start and t_end denote the start and complete timestamps respectively, and d_1, …, d_m (∀i ∈ [1, m], d_i ∈ D_i) denote a series of additional attributes. All the executed events recorded by a process-centered PAIS in BPaaS make up an event log. For an event log L, all occurred events can be represented as a collection A_L. Table 2 gives the event log associated with the process model described by Fig. 1. As shown in this table, events with the same caseID occur in the same process execution. Meanwhile, the activity attribute of each event is associated with a transition in the process model. Generally, the attributes of each event can be divided into two types, case attributes and event attributes, according to whether their values are distinguished by cases or by events. For example, the attributes caseID and amount are case attributes, while startTimestamp, completeTimestamp, activity, and resource are event attributes. Generally, the value of each event attribute and case attribute may be numeric, categorical, or textual data. For instance, the values of activity and resource in Table 2 are categorical data, while the value of amount is numeric data.
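Assembling traces from such an event log is straightforward; a minimal sketch follows (field names such as `caseID` and `t_start` mirror the attribute names above, while the concrete event values are illustrative assumptions):

```python
from collections import defaultdict

def traces_from_log(events):
    """Group event records by caseID and time-order each group into a trace."""
    by_case = defaultdict(list)
    for e in events:
        by_case[e['caseID']].append(e)
    # Sort each case's events by start timestamp to obtain the trace.
    return {cid: sorted(es, key=lambda e: e['t_start'])
            for cid, es in by_case.items()}

# Toy log: two cases with out-of-order records.
log = [
    {'caseID': 1, 'activity': 'a2', 't_start': 5, 'amount': 2000},
    {'caseID': 1, 'activity': 'a1', 't_start': 1, 'amount': 2000},
    {'caseID': 2, 'activity': 'a1', 't_start': 2, 'amount': 3000},
]
traces = traces_from_log(log)
```

Each resulting trace is then a time-ordered event sequence, matching Definition 3.4 below.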

Definition 3.4 (Trace, Prefix Trace)
Each process execution (i.e., process instance or case) generates a non-empty finite time-ordered sequence of events, which can be defined as a trace σ = ⟨e_1, e_2, ..., e_|σ|⟩ satisfying ∀i, j ∈ [1, |σ|]: e_i ∈ A_L, e_j ∈ A_L, and e_i.c = e_j.c. For a given trace σ, the preceding part of its event sequence from the beginning represents the events executed up to a given moment, which can be defined as a prefix trace σ^l = ⟨e_1, e_2, ..., e_l⟩ of a certain length l (l ≤ |σ|).
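Prefix traces are what the online predictor actually observes; generating them from a completed trace is trivial (illustrative sketch):

```python
def prefixes(trace, min_len=1):
    """All prefix traces <e1, ..., el> of a trace, per Definition 3.4."""
    return [trace[:l] for l in range(min_len, len(trace) + 1)]

# For training, each completed trace typically yields one sample per prefix.
# prefixes(['a1', 'a4', 'a2']) -> [['a1'], ['a1', 'a4'], ['a1', 'a4', 'a2']]
```

A minimum prefix length can be enforced via `min_len` when very short prefixes carry too little signal to be useful.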

Definition 3.5 (Control-flow Alignment, Optimal Alignment)
A control-flow alignment between process model M and its executed trace σ can be defined as a sequence of pairs (a_M, a_L) ∈ A_⊥^M × A_⊥^L, where A_⊥^M denotes all the activities (i.e., the transition collection T) in the process model together with the placeholder "⊥", while A_⊥^L represents all the activities occurring in the event log together with the placeholder "⊥". The possible shifts when they align can be formalized as follows.

Given a reachable trace σ_M (σ_M ∈ T(M)) from M and a trace σ_L of L, the alignment between them can be defined as a series of shifts: a synchronous move (a_M, a_L) with a_M = a_L, a move on model only (a_M, ⊥), or a move on log only (⊥, a_L). For each pair of σ_M and σ_L, many different alignments can be obtained, such as the two shown below:

γ_1 = | a_3  ⊥   a_5  a_7  a_9  a_11 |
      | a_3  a_4  ⊥   a_7  a_9  a_11 |

γ_2 = | a_3  a_5  a_7  a_9  a_11 |
      | a_3  a_4  a_7  a_9  a_11 |

To measure these alignments, a cost function δ over the legal shifts can be defined; a typical choice assigns cost 0 to synchronous moves and a positive cost to moves on model only or on log only. To obtain the best complete alignment between process model M and trace σ_L, we define an optimal alignment as κ(σ_L, M) = argmin{δ(γ) | σ_M ∈ T(M), γ is an alignment between σ_M and σ_L}, because a process model typically has multiple reachable traces. Accordingly, the alignment between a trace σ_L and its optimally aligned reachable trace σ_M, together with the cost of this alignment, determines the control-flow fitness fitness_ctrl, which normalizes the optimal alignment cost into [0, 1].

As for process model M, the data fitness of trace σ_L measures its conformance from the data perspective in terms of the data constraints described in M:

fitness_data(σ_L, M) = |{t ∈ σ_L | G_d(t) = true}| / |σ_L|

where {t ∈ σ_L | G_d(t) = true} is the collection of transitions that belong to trace σ_L and satisfy the data guard function G_d(t), |{t ∈ σ_L | G_d(t) = true}| indicates the number of such transitions, and |σ_L| indicates the length of σ_L, that is, the number of transitions involved in trace σ_L.

Definition 3.8 (Resource Fitness)
As for process model M, the resource fitness of trace σ_L is introduced to measure its conformance from the resource perspective in terms of the resource constraints described in M. To normalize its value into the range [0, 1], we define it as:

fitness_res(σ_L, M) = |{t ∈ σ_L | G_r(t) = true}| / |σ_L|

where {t ∈ σ_L | G_r(t) = true} is the collection of transitions that belong to trace σ_L and satisfy the resource guard function G_r(t), |{t ∈ σ_L | G_r(t) = true}| indicates the number of such transitions, and |σ_L| indicates the length of σ_L.

Definition 3.9 (Time Fitness)

Similarly, as for process model M, the time fitness of trace σ_L measures its conformance from the time perspective:

fitness_time(σ_L, M) = |{t ∈ σ_L | G_t(t) = true}| / |σ_L|

where {t ∈ σ_L | G_t(t) = true} is the collection of transitions that belong to trace σ_L and satisfy the time guard function G_t(t), |{t ∈ σ_L | G_t(t) = true}| indicates the number of such transitions, and |σ_L| indicates the length of σ_L.
Take the process instance with caseID 1 in Table 2: its data fitness can be calculated as 4/5 (0.8), because the length of this trace is 5 and G_d(a_2) is false (the amount of a_2 is 2000, which does not satisfy amount > 2000). The resource fitness and the time fitness can be calculated in the same way.

Definition 3.10 (Multi-perspective Conformance)

As for process model M, the multi-perspective conformance of trace σ_L can be defined as:

Fitness(σ_L, M) = ω_1 · fitness_ctrl + ω_2 · fitness_data + ω_3 · fitness_res + ω_4 · fitness_time     (7)

Here, the weights ω_1, ω_2, ω_3, and ω_4 satisfy the constraint ω_1 + ω_2 + ω_3 + ω_4 = 1. To balance the multiple perspectives, each weight can be set equally to 0.25. Besides, (7) can be extended to further perspectives beyond the above-mentioned ones, such as the role perspective.
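Under the equal-weight setting, the worked example above can be reproduced with a small sketch (the guard predicates and event fields here are illustrative assumptions, not the paper's exact guard tables):

```python
def guard_fitness(trace, guards):
    """Fraction of a trace's events whose guard holds; the data, resource,
    and time fitness definitions all share this form."""
    ok = sum(1 for e in trace if guards.get(e['activity'], lambda _: True)(e))
    return ok / len(trace)

def multi_perspective_conformance(f_ctrl, f_data, f_res, f_time,
                                  w=(0.25, 0.25, 0.25, 0.25)):
    """Weighted combination of the four per-perspective fitness values."""
    assert abs(sum(w) - 1.0) < 1e-9   # weights must sum to 1
    return w[0] * f_ctrl + w[1] * f_data + w[2] * f_res + w[3] * f_time

# Case 1 mimicked from the example: five events, amount = 2000, and only a2
# carries a data guard requiring amount > 2000, so data fitness is 4/5.
trace_1 = [{'activity': a, 'amount': 2000}
           for a in ['a1', 'a4', 'a2', 'a5', 'a7']]
data_guards = {'a2': lambda e: e['amount'] > 2000}
f_data = guard_fitness(trace_1, data_guards)
```

With perfect control-flow, resource, and time fitness, the multi-perspective conformance of this case would be 0.25 · (1 + 0.8 + 1 + 1) = 0.95.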

The Prediction of Multi-perspective Conformance
Definition 3.11 (Multi-perspective Conformance Labeling) As for trace σ_L, its multi-perspective conformance class y(σ_L) can be defined as a mapping function y : (M, σ_L, τ) → {True, False} based on a predefined fitness threshold τ. Here, True indicates that the process execution is consistent with the process model from the multiple perspectives, while False indicates that it is not. The detailed definition is as follows:

y(σ_L) = True if Fitness(σ_L, M) ≥ τ, and False otherwise.  (8)
For instance, as shown in Table 2, we can obtain the multi-perspective conformance class y(σ_1) for trace σ_1 based on the measurement of multi-perspective conformance and (8).

Definition 3.12 (Attribute Encoding, Event Encoding, Trace Encoding)
An attribute encoding is defined as a mapping f_attr : attr_i → R^{p_i} (i ∈ [1, |e|], where |e| denotes the number of attributes within event e = {(attr_1, val_1), (attr_2, val_2), …, (attr_|e|, val_|e|)}) that encodes the value of each attribute attr_i as a vector with a specific dimension p_i. In this case, for each event e, its event encoding can be defined as the concatenation of its attribute encodings, i.e., f_event : e → R^{p_1 + p_2 + … + p_|e|}. Likewise, a trace encoding combines the event encodings of all events in the trace.

Problem Statement
For a certain process in BPaaS, given its process model M described in C_PN for a tenant, its event log L = {σ_1, σ_2, …, σ_s} with s historical executed cases, and an ongoing case σ′ = ⟨e_1, e_2, …, e_{|σ′|}⟩ to be predicted, the problem considered in this paper is to forecast the future conformance Fitness(σ′, M) of each running trace σ′ from multiple perspectives, that is, the control-flow, data, resource, and time perspectives. This is the so-called Multi-perspective Conformance-oriented Predictive (Business) Process Monitoring (MCPPM) problem.
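The attribute and event encodings of Definition 3.12 can be sketched as below. This is an illustrative assumption: the attribute domains, value ranges, and event structure are invented for the example; real logs use the strategies described later (standardization for numerical attributes, one-hot coding for categorical ones).

```python
def one_hot(value, domain):
    """Categorical attribute encoding: value -> R^{|domain|}."""
    return [1.0 if value == v else 0.0 for v in domain]

def scale(value, lo, hi):
    """Numerical attribute encoding: standardize the value into [0, 1]."""
    return [(value - lo) / (hi - lo)]

def encode_event(event):
    """Event encoding f_event: concatenation of the attribute encodings,
    so the event vector lives in R^{p_1 + ... + p_|e|}."""
    vec = []
    vec += one_hot(event["activity"], ["a1", "a2", "a3"])  # categorical attribute
    vec += scale(event["amount"], 0, 5000)                 # numerical attribute
    return vec

def encode_trace(trace):
    """Trace encoding: event encodings of all events, concatenated."""
    vec = []
    for e in trace:
        vec += encode_event(e)
    return vec

v = encode_trace([{"activity": "a1", "amount": 2500},
                  {"activity": "a3", "amount": 5000}])
print(v)  # [1.0, 0.0, 0.0, 0.5, 0.0, 0.0, 1.0, 1.0]
```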

Overall Architecture
In this paper, we concentrate on the security of process execution load in BPaaS applications and propose a predictive monitoring solution for multi-perspective conformance. In order to effectively forecast the final multi-perspective conformance of an executing process instance, we put forward the CNN-BiGRU approach to establish a prediction model from a certain process model (a general process structure combined with some other constraints) and the corresponding historical process executions. The CNN-BiGRU approach, which combines a CNN with a variant and enhancement of RNN, aims to determine a specific neural network with the CNN-BiGRU architecture and the optimal parameters. Based on it, a prediction model can reveal the impact of the behavior of process executions on their multi-perspective conformance, where this behavior involves the order in which activities are performed and the resources and times used to perform them, all recorded as attribute-value pairs of events. Our proposed solution to the conformance prediction problem mainly includes two parts, offline and online. The former aims to construct a prediction model (i.e., a classification model or classifier) that captures the relationship between process executions and their multi-perspective conformance, while the latter aims to predict the future real-time conformance of an executing process instance. Figure 2 gives an overview of our proposed solution, which includes the following two parts.
Offline Training In this stage, the required inputs are the process model described by C_PN and its historical event log (for a tenant), which contains a series of executed process instances. Based on these, we can calculate the fitness of each executed case in terms of the multiple perspectives. Then, the multi-perspective conformance of each case can be measured from these fitness values, and each case can be labeled with a conformance class based on the predefined threshold. Meanwhile, we preprocess the event log by expanding events with additional attributes and then encode each trace according to some coding strategies. Thus, we obtain, for each case of this event log, a pair of the encoded trace and the corresponding multi-perspective conformance class. A prediction model is finally obtained through sample training by taking these pairs as the input of a specific neural network.
Online Predicting Based on the built prediction model, the final multi-perspective conformance of an ongoing case can be forecasted. The running case (i.e., a prefix trace) includes some executed events, each of which has a series of attribute-value pairs. Before being taken as input of the prediction model, this prefix trace needs to be encoded and padded with zeros up to the length of the encoded vectors of the historically completed traces. Afterward, the encoded vector of the prefix trace is fed into the prediction model to determine the multi-perspective conformance class of this case.
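The zero-padding step for an ongoing prefix trace can be sketched as follows; a minimal sketch, assuming the encoded prefix is a flat list of floats and the target length is the encoded length of the longest historical trace.

```python
def pad_prefix(encoded_prefix, target_len):
    """Pad an encoded prefix trace with zeros up to the encoded length
    used for the historically completed traces, so that the online input
    matches the shape the prediction model was trained on."""
    if len(encoded_prefix) > target_len:
        raise ValueError("prefix longer than the model input")
    return encoded_prefix + [0.0] * (target_len - len(encoded_prefix))

x = pad_prefix([0.2, 1.0, 0.5], 8)
print(x)  # [0.2, 1.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0]
```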

Multi-perspective Conformance-oriented Predictive (Business) Process Monitoring (MCPPM) Approach
The purpose of MCPPM is to forecast the final conformance class of an executing process instance in real-time, both effectively and efficiently. To solve the multi-perspective prediction problem, we first measure the multi-perspective conformance of historical executed cases by comparing them with a certain process model, and then determine their conformance classes (i.e., True vs. False). We can then construct a predictive classification model with a specific neural network to extract the relationship between the executed cases and their determined multi-perspective conformance classes. Thus, according to this prediction model, the future conformance class of an executing process instance can be forecasted. To better introduce our proposed approach, we first describe its whole framework and then detail how to construct a specific neural network based on CNN-BiGRU.

Measuring the Multi-perspective Conformance of Process Instances
To compute the multi-perspective conformance of an executed case, we propose an algorithm grounded on the process model described by C_PN to measure the conformance from the perspectives of control structure, data, resource, and time. As introduced in Definition 3.1, a process modeled by C_PN can specify the expected behavior with constraints from multiple perspectives: the desired behavior of a process in the control-flow aspect is expressed by the transitions T, places P, and flow relations F of C_PN, while the desired behavior in the data, resource, and time perspectives is expressed by the guard functions G_d, G_r, and G_t, respectively. Thus, for each executed process case, we compute its fitness from these multiple perspectives according to the C_PN process model. Subsequently, the multi-perspective conformance of an executed process instance can be measured from these fitness values.
As analyzed above, we design a multi-perspective conformance measurement algorithm in Algorithm 1. Firstly, we initialize the event log L′ that adds the measurement of multi-perspective conformance (line 1). Then, based on the structure of process model M (i.e., M.T, M.P, and M.F) described in C_PN, we get all the reachable traces T(M) that the process model allows to perform (line 2). After that, for each trace σ_Li of a process case in L, we compute its fitness in terms of the multiple perspectives and then obtain its multi-perspective conformance (lines 3-16). As for the control-flow perspective, we find all of the possible alignments between a trace σ_Li and each trace σ_Mj of T(M) and then compute the cost of these alignments based on (2) (lines 4-7). Based on them, we can find an optimal reachable trace σ_M in T(M) that aligns with the trace σ_Li at minimum cost (line 8). Meanwhile, we obtain the cost of aligning trace σ_Li with the process model M from this minimum cost (line 9). According to (3), the fitness of trace σ_Li relative to the process model M in the control-flow aspect can be calculated (line 10). In addition, based on the constraint functions G_d, G_r, and G_t described in the process model M, the fitness of each trace σ_Li from the data, resource, and time perspectives can be calculated according to (4)-(6), respectively (lines 11-13). After that, the multi-perspective conformance Fitness(σ_Li, M) of trace σ_Li can be computed on the basis of these fitness values and the given weight parameters (line 14). Finally, we add the conformance value of each trace to L′ (line 15) and then return the final result L′ (line 17).
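The control-flow part of Algorithm 1 (lines 4-10) can be sketched as below. This is a simplified stand-in, not the paper's cost function: here log-only and model-only moves cost 1 and synchronous moves cost 0, and the normalization of the fitness is an assumption; the paper's eq. (3) may differ.

```python
def alignment_cost(trace, model_trace):
    """Minimum alignment cost between a log trace and one reachable model
    trace, via dynamic programming: log-only and model-only moves cost 1,
    synchronous moves (matching activities) cost 0."""
    n, m = len(trace), len(model_trace)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i                      # only log moves remain
    for j in range(m + 1):
        d[0][j] = j                      # only model moves remain
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sync = 0 if trace[i - 1] == model_trace[j - 1] else float("inf")
            d[i][j] = min(d[i - 1][j] + 1,        # move on log only
                          d[i][j - 1] + 1,        # move on model only
                          d[i - 1][j - 1] + sync) # synchronous move
    return d[n][m]

def control_flow_fitness(trace, reachable_traces):
    """Pick the optimal alignment over all reachable traces T(M); the
    normalization by the worst-case cost is an illustrative choice."""
    best_model = min(reachable_traces, key=lambda m: alignment_cost(trace, m))
    cost = alignment_cost(trace, best_model)
    worst = len(trace) + len(best_model)
    return 1 - cost / worst

t_m = [["a3", "a4", "a7", "a9", "a11"], ["a3", "a5", "a8", "a9", "a11"]]
fit = control_flow_fitness(["a3", "a5", "a7", "a9", "a11"], t_m)
print(fit)  # cost 2 (one log move, one model move) over worst case 10 -> 0.8
```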

Predicting the Multi-perspective Conformance
After the multi-perspective conformance measurement, the key to forecasting the future conformance class of an executing process instance is to establish a prediction model that reveals the relationship between a case and its conformance, because the multi-perspective conformance of a case is always determined by its execution information. Generally, the actually executed cases are recorded in the event log, where each case consists of a series of events that are logged as attribute-value pairs. Considering the relation between them, the multi-perspective conformance of a case is related not only to some of the events and the contextual dependencies between these events, but also to the attributes in these events and the local dependencies between these attributes. Therefore, the prediction model to be built requires the ability to learn both the long context dependencies of events and the local dependencies of attributes. However, traditional machine learning techniques struggle to capture both kinds of dependencies automatically. Hence, we propose the CNN-BiGRU approach based on neural networks to build a prediction model that performs well.

The Framework of Multi-perspective Conformance Prediction Based on Neural Networks
To forecast the multi-perspective conformance class of an executing process instance in terms of the security of process execution load in BPaaS, we here describe a framework covering how to preprocess the event log for training, how to obtain a prediction model, and how to make predictions about the multi-perspective conformance of a running case. The executed cases, recorded as series of events with attribute-value pairs in the event log, can be employed to train a prediction model after their conformance classes are determined. In order to obtain a more effective prediction model, the original event log can be extended with additional attributes computed from the basic ones. However, the extended event log cannot be used directly to train a prediction model because each attribute in the event log has a different value type, such as numerical or categorical data. Thus, we adopt some coding strategies to encode them as vectors with a certain dimension before training a prediction model. Taking these vectors as inputs, a prediction model based on neural networks can be trained, and then the future conformance class of an executing process instance can be forecasted through this model. As shown in Algorithm 2, we design an algorithm to describe a framework that contains the preprocessing of the event log, the determination of conformance classes, the construction of a prediction model, and the prediction of a running case. Algorithm 2 can be divided into two parts: the offline part, in which the historical cases are preprocessed for training and a prediction model is trained (lines 1-29), and the online part, in which the multi-perspective conformance of an executing case is forecasted through the built model (lines 30-31). In the offline part, for the executed cases of event log L′, we first compute the values of the newly added case attributes (newAttr, val) according to the given additional case attribute collection CA (lines 2-6).
Then, for each event of these cases, we compute the values of the newly added event attributes (newAttr, val) based on the given additional event attribute collection EA (lines 8-12). Meanwhile, the additional attributes are attached to each event L′.e_ij of event log L′ (line 13). Afterwards, we encode each case as a numerical vector v⃗(σ_Li) with a specific dimension l according to the characteristics of its attributes (lines 17-20). For instance, if the value of an attribute is numerical, we standardize it according to the value range of this attribute. For a categorical attribute, we utilize the one-hot coding method to transform its value into a vector that consists of 0s and 1s.
Meanwhile, each case is labeled with a conformance class based on its multi-perspective conformance and the given conformance threshold τ, and the encoded trace and its conformance class are stored in L (line 19). By taking these encoded cases as well as their conformance classes as input of a designed neural network, a predictive classification model can be trained. Accordingly, the architecture of the neural network should be determined first, and then the related weight parameters need to be trained with the encoded and labeled event log. Here, we suppose a neural network has been determined. Firstly, we initialize all weight parameters W and b of this neural network randomly (line 21). Next, each pair of an encoded case and its conformance class is fed to this neural network for training (lines 22-29). For each encoded case v⃗(σ_Li), we compute its result ŷ(σ_Li) based on the neural network NN with the initialized W_0 and b_0, and then employ a loss function to denote the deviation between the computed result ŷ(σ_Li) and its real result y(σ_Li) (lines 23-24). Based on the loss, W and b can be iteratively updated through a loss-based Back-Propagation (BP) algorithm together with some related hyperparameters HP, such as the batch size, learning rate, and dropout (line 25). The above operations are repeated until the loss function converges (lines 26-27). Finally, a prediction model is obtained from the final values of the weights W and biases b. In the online part, for an ongoing case σ′, we use the obtained prediction model to predict its conformance class according to the executed events with data-payload (lines 30-31). In particular, we encode this case σ′ and then fill out its remaining part with zeros according to the determined dimension l.
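The training loop of Algorithm 2 (initialize weights, compute ŷ, measure the loss, update by gradient) can be sketched with a deliberately tiny stand-in: a single logistic unit replaces the CNN-BiGRU, and plain stochastic gradient descent replaces the full BP machinery. The data, learning rate, and epoch count are illustrative assumptions.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, lr=0.5, epochs=200, seed=0):
    """Minimal stand-in for the offline training of Algorithm 2: a logistic
    unit trained by gradient descent on the binary cross-entropy loss."""
    rng = random.Random(seed)
    dim = len(samples[0][0])
    w = [rng.uniform(-0.1, 0.1) for _ in range(dim)]  # random initialization
    b = 0.0
    for _ in range(epochs):
        for x, y in samples:
            y_hat = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = y_hat - y  # d(cross-entropy)/dz for a sigmoid output
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def predict(w, b, x):
    """Online part: classify an encoded (padded) prefix trace."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5

# Toy labeled encoded traces: conformant iff the first feature is high.
data = [([1.0, 0.2], 1), ([0.9, 0.8], 1), ([0.1, 0.3], 0), ([0.0, 0.9], 0)]
w, b = train(data)
print(predict(w, b, [0.95, 0.5]))  # True on this separable toy data
```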

The Construction of a Specific Neural Network Based on CNN-BiGRU
To construct a prediction model based on neural networks, we first need to determine a specific neural network that can reveal the relationship between a case and its multi-perspective conformance class. As analyzed above, the multi-perspective conformance of a case relates not only to some of the events and the contextual dependencies between these events, but also to the attributes of these events and the local dependencies between these attributes. Therefore, such a neural network requires the ability to learn both the long context dependencies of events and the local dependencies of attributes. To this end, this paper presents the CNN-BiGRU approach to construct a specific neural network by combining CNN with a variant and improvement of RNN. Figure 3 gives the architecture of this specific neural network. In this figure, we combine a 3-layer CNN (i.e., three convolutional-pooling layers) with a multi-layer BiGRU (i.e., multiple bidirectional GRU layers). The features extracted with CNN and BiGRU are concatenated to obtain a hybrid feature. Finally, we compute the probabilities of the conformance class for a case. Here, we give an example to show how the proposed CNN-BiGRU neural network performs effective feature extraction. Considering an event log L = {σ_1, σ_2, …, σ_s} with s cases, one of them can be denoted as σ_t = ⟨e_t1, e_t2, …, e_tn⟩ (t ∈ [1, s], n = |σ_t|), in which e_ti (i ∈ [1, n]) denotes the i-th event occurring in this case.

Input Layer
Taking the above case σ_t as an input of the CNN-BiGRU, the i-th event is represented as an l-dimensional vector x⃗_ti = [x_ti,1, x_ti,2, …, x_ti,l] by adopting the event encoding strategy described in Definition 3.12. Here, l denotes the total length of the event encoding vector based on the attribute encoding strategy. Accordingly, this case can be represented as x⃗_t1, x⃗_t2, …, x⃗_tn, and then feature extraction is carried out by the 3-layer CNN and the multi-layer BiGRU, respectively. In particular, as the input of the 3-layer CNN, a padding operation is required to extend the dimension of the input matrix so that the convolution output has the same size as the input. In the subsequent experiments of this paper, we pad each trace with zeros based on the length of the longest trace when encoding them. In the following description, we assume that n is the length of the longest trace in L.
Attribute Feature Aggregation based on CNN A Convolutional Neural Network extends a simple, fully-connected feed-forward neural network with three additional operations: local filters (i.e., convolution), pooling, and weight sharing. The CNN shown in Fig. 3 has three pairs of convolution-pooling layers, Conv1-Pool1, Conv2-Pool2, and Conv3-Pool3, in which each convolution layer utilizes a series of filters to compute small local context information for each part of the input. Each pooling layer utilizes an optional pooling function, such as max-pooling, to obtain the refined local information from the convolution layer output. In particular, the pooling function keeps translation invariance for minor differences in the positions where features occur, which makes sense when we focus on whether a feature appears rather than where it appears. Thus, we use the CNN to aggregate the important representative features among the encoded attribute items in the event log. For a CNN with multiple layers, after the convolution and pooling computations have been applied several times, a fully connected layer (i.e., the FC Layer shown in Fig. 3) is finally employed to integrate the features extracted from all positions. To further demonstrate the application of CNN, Fig. 4 gives a detailed example of the first convolution-pooling layer in Fig. 3, where an input case with nine events, each encoded as a 6-dimensional attribute vector, is considered.

Convolution Layer
In a convolution layer, some filters, i.e., convolution kernels, with different sizes are applied to extract features. Here, we use K filters of size h × 1 (i.e., a window of h-gram encoded attribute items) to extract the local h-gram features. For the k-th filter f^(k), the j-th feature is computed as

c^(k)_j = F_3(W^(k) · X_{j:j+h−1} + b^(k))

where c^(k)_j is the j-th feature extracted from input matrix X by filter f^(k), F_3 denotes the ReLU activation function, and W^(k) and b^(k) respectively denote the corresponding trainable weight matrix and bias. Accordingly, we obtain the feature map c^(k) = (c^(k)_1, c^(k)_2, …, c^(k)_{l−h+1}) after feature extraction through the k-th filter.
Pooling Layer
Here, we utilize a max-pooling function for each filter output (i.e., feature map) to choose its most important feature:

c^(k)_max = max{c^(k)_j | j ∈ [1, l−h+1]}

Accordingly, after all K filters (i.e., convolution kernels) of size h have been applied, the extracted features are represented as c = (c^(1)_max, c^(2)_max, …, c^(K)_max).
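A scalar sketch of one convolution-pooling pair may help; this is illustrative Python over a single sequence with a single hand-picked h = 2 filter, not the trained multi-channel layers of Fig. 3.

```python
def conv1d(x, kernel, bias=0.0):
    """Valid 1-D convolution followed by ReLU: for a kernel of size h,
    the feature map has length len(x) - h + 1."""
    h = len(kernel)
    out = []
    for j in range(len(x) - h + 1):
        z = sum(kernel[k] * x[j + k] for k in range(h)) + bias
        out.append(max(0.0, z))  # ReLU activation (F_3)
    return out

def max_pool(feature_map):
    """Max-pooling keeps the single most important feature of a map."""
    return max(feature_map)

x = [0.1, 0.9, 0.2, 0.8, 0.4]   # one encoded attribute sequence (assumed)
fmap = conv1d(x, [1.0, -1.0])   # an h = 2 "rising edge" filter
print(fmap, max_pool(fmap))     # map [0.0, 0.7, 0.0, 0.4], pooled max 0.7
```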

Fully-connected Layer
After the max-pooling operation, all output features can be concatenated to represent the local context of h-gram attribute items, which is represented as c⃗ = c^(1)_max ⊕ c^(2)_max ⊕ … ⊕ c^(K)_max (⊕ is the concatenation operation).

Temporal Relation Extraction based on BiGRU
An RNN [50] originates from a basic feed-forward neural network and has recurrent hidden states. The hidden state is activated at each moment depending on the hidden state of the previous moment; accordingly, an RNN can handle variable-length sequential data. However, it is difficult for an RNN to extract long-distance dependencies due to possible gradient vanishing or explosion. To address this issue, two variants of the gated recurrent neural network have been proposed: the RNN with Long Short-Term Memory (LSTM) units [26] and the RNN with Gated Recurrent Units (GRU) [27]. Both have been demonstrated to work well in tasks with sequential input, but the GRU performs better than the LSTM under a controlled model complexity. Compared with the general RNN, a GRU-based RNN adds two gate units to the recurrent hidden state (i.e., cell) so as to extract the dependencies of different time periods adaptively. Here, we adopt a bidirectional improvement and propose a BiGRU with multiple layers as shown in Fig. 3, in which each BiGRU layer, such as the 1st-BiGRU Layer and the 2nd-BiGRU Layer, consists of a forward propagation layer (i.e., the recurrent direction is consistent with that of the events in a case) and a backward propagation layer. Such a BiGRU layer can extract the context information between the preceding events and the current event with the forward propagation layer, and the context information between the current event and the subsequent events with the backward propagation layer.
BiGRU Layer Each unit of a BiGRU layer integrates the basic GRU unit bidirectionally, i.e., forward and backward, as shown in Fig. 3. Each BiGRU layer extracts the hidden context features from the previous BiGRU layer; for instance, the output of the 1st-BiGRU Layer is the input of the 2nd-BiGRU Layer. Taking the first BiGRU layer as an example, there are two propagation layers, integrated forward and backward, whose hidden states are combined into the output o_t for case σ_t. Then, the probability that the multi-perspective conformance class of case σ_t is positive can be calculated by using o_t as the input of the sigmoid activation function:

ŷ_t = sigmoid(W_c · o_t + b_c)

where W_c and b_c are the corresponding weight parameters, and the value of ŷ_t is in the range 0-1.
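The recurrence of a single GRU cell and its bidirectional wrapping can be sketched as below. This is a scalar toy version with hand-set parameters (real layers use weight matrices and vector-valued hidden states), shown only to make the gate mechanics and the forward/backward concatenation concrete.

```python
import math

def gru_step(x, h, p):
    """One GRU recurrence: update gate z and reset gate r control how much
    of the previous hidden state h survives (scalar version for clarity)."""
    sig = lambda v: 1.0 / (1.0 + math.exp(-v))
    z = sig(p["wz"] * x + p["uz"] * h + p["bz"])          # update gate
    r = sig(p["wr"] * x + p["ur"] * h + p["br"])          # reset gate
    h_cand = math.tanh(p["wh"] * x + p["uh"] * (r * h) + p["bh"])
    return (1 - z) * h + z * h_cand                       # new hidden state

def bigru(xs, p):
    """A BiGRU layer: run the GRU forward and backward over the sequence
    and pair the two hidden states at every step."""
    fwd, h = [], 0.0
    for x in xs:                       # forward propagation layer
        h = gru_step(x, h, p)
        fwd.append(h)
    bwd, h = [], 0.0
    for x in reversed(xs):             # backward propagation layer
        h = gru_step(x, h, p)
        bwd.append(h)
    bwd.reverse()
    return list(zip(fwd, bwd))

params = {"wz": 1.0, "uz": 0.5, "bz": 0.0,
          "wr": 1.0, "ur": 0.5, "br": 0.0,
          "wh": 1.0, "uh": 0.5, "bh": 0.0}
out = bigru([0.5, -0.2, 0.9], params)
print(len(out))  # one (forward, backward) hidden-state pair per event
```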
Finally, to train the neural network, we adopt a binary cross-entropy function that measures the error between the computed result ŷ_t of the CNN-BiGRU neural network and the actual result y_t for the case σ_t:

Loss(ŷ_t, y_t) = −((1 − y_t) · log(1 − ŷ_t) + y_t · log ŷ_t)  (16)
Based on the determined CNN-BiGRU neural network and loss function, we can use Algorithm 2 to train a prediction model.

Experiment Settings
Here, we make a comprehensive comparison with some other deep learning approaches and two typical traditional machine learning approaches to demonstrate the performance of the hybrid CNN-BiGRU approach, because, to the best of our knowledge, no other solutions to this problem have been proposed. The comparative deep learning approaches include the basic RNN, LSTM, and GRU approaches; their bidirectional improvements, the Bi-RNN, Bi-LSTM, and Bi-GRU approaches; and the hybrid CNN-RNN, CNN-LSTM, CNN-GRU, CNN-BiRNN, CNN-BiLSTM, and CNN-BiGRU approaches. Moreover, two other traditional machine learning-based approaches, Gradient Boosted Trees (XGBoost) and Random Forest (RF), are chosen for comparison because a recent empirical study on 165 datasets shows that they are usually better than other traditional machine learning algorithms for classification tasks [51].

NN (Neural Network)-based approaches
For further comparison, we develop the RNN, LSTM, and GRU approaches by utilizing the original RNN neural network and its variants, the LSTM and GRU neural networks, to construct prediction models, respectively. Similarly, we develop the Bi-RNN, Bi-LSTM, and Bi-GRU approaches by bidirectionally integrating the RNN, LSTM, and GRU networks, respectively. Based on these six approaches, we also develop six hybrid approaches by combining them with the CNN network, i.e., the CNN-RNN, CNN-LSTM, CNN-GRU, CNN-BiRNN, CNN-BiLSTM, and CNN-BiGRU approaches.

RF-based approaches
To compare with traditional machine learning approaches, we choose the RF algorithm to construct a prediction model. As shown in [52], some optional operations need to be determined first, e.g., the bucketing (clustering) and the encoding of the (prefix) traces extracted from the executed cases. Here, we choose the single-bucket method for clustering and two typical methods, laststate and aggregation, for encoding. The laststate method encodes a (prefix) trace according to the last state (event) of this trace, whereas the aggregation method encodes a (prefix) trace by applying an optional aggregation function over this trace. The former considers only the last event of a (prefix) trace, while the latter considers all events of a (prefix) trace but ignores their order. Accordingly, two approaches, RF_single_laststate and RF_single_agg, are developed for comparison based on the RF classification algorithm.
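The difference between the two encodings can be sketched as follows; illustrative Python, where the event attributes and the choice of aggregation functions (counts and mean) are assumptions rather than the exact configuration of [52].

```python
def laststate_encoding(prefix):
    """Encode a prefix trace by the attributes of its last event only."""
    return dict(prefix[-1])

def aggregation_encoding(prefix):
    """Encode a prefix trace by aggregating over all events (event order
    is lost); here: activity counts plus the mean of a numeric attribute."""
    counts = {}
    total = 0.0
    for e in prefix:
        counts[e["activity"]] = counts.get(e["activity"], 0) + 1
        total += e["amount"]
    return {"counts": counts, "mean_amount": total / len(prefix)}

prefix = [{"activity": "a1", "amount": 100.0},
          {"activity": "a2", "amount": 300.0},
          {"activity": "a1", "amount": 200.0}]
print(laststate_encoding(prefix))    # only the last event survives
print(aggregation_encoding(prefix))  # all events, but order is discarded
```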

XGBoost-based approaches
To compare with traditional machine learning approaches, we also choose the XGBoost algorithm to construct a prediction model, inspired by [53]. Similar to the RF-based approaches, we develop two approaches, XGBoost_single_laststate and XGBoost_single_agg, for comparison.
The 16 approaches mentioned above are applied to two publicly available data sets and then compared in terms of the accuracy and time performance of their predictions. We implement all approaches in Python and conduct the comparative experiments on a server with three NVIDIA Tesla V100 GPUs, two 12-core Intel Xeon 5118 CPUs @ 2.30 GHz, and 256 GB memory. Here, we employ two public event logs to represent the real process executions from tenants in terms of two different processes in BPaaS. To obtain the processes modeled by C_PN in BPaaS, we use plug-ins from the public ProM framework to mine a process model described by a general Petri net together with some constraints on each transition from the other perspectives, inspired by [14].
Datasets Two event logs, Traffic Fines and BPIC2012, from the public 4TU Centre for Research Data are employed in this experiment. The detailed statistics of these two processed logs are shown in Table 3. The BPIC2012 log records the historical executions of a loan application process, while the Traffic Fines log mainly includes a set of activities about paying traffic fines and some information related to individual cases, such as the reason and the total amount paid for each traffic fine. At first, we preprocess these logs by removing some noise records. We use ProM to obtain their process models expressed by Petri nets with some constraints on each transition from the perspectives of data, resource, and time. Based on them, we calculate the multi-perspective conformance of each case through Algorithm 1 and then decide their conformance classes by a defined threshold of τ = 0.8. After that, we extend some additional attributes for each event of these two logs. In addition, we truncate the long cases whose length exceeds a certain value, because long traces can decrease the performance of the prediction model during training. To determine the truncation length, we first select the conformance class with fewer cases, then sort these cases in ascending order of length and take the length at the 90% point. From Table 3, it is not difficult to discover that the proportions of positive and negative samples in these two event logs are unbalanced; in particular, the proportion of positive samples in the Traffic Fines log is up to 97%. In addition, we also give the corresponding numbers of case attributes and event attributes, respectively. Finally, we encode the executed cases of these event logs according to the value types of their attributes.
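The truncation-length rule described above can be sketched as follows; a minimal sketch where the case lengths are made-up toy values and the percentile convention (ceiling index) is an assumption.

```python
import math

def truncation_length(case_lengths, quantile=0.9):
    """Sort the lengths of the minority-class cases in ascending order and
    take the length at the given quantile (the 90% point in the paper)."""
    ordered = sorted(case_lengths)
    idx = max(0, math.ceil(quantile * len(ordered)) - 1)
    return ordered[idx]

lengths = [3, 4, 4, 5, 6, 7, 8, 9, 15, 40]  # toy minority-class case lengths
print(truncation_length(lengths))  # the 90% point of 10 values -> 15
```

Cases longer than the returned value would then be cut to that length before encoding.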
Evaluation Metrics Generally, predictive process monitoring techniques are expected to obtain an accurate prediction result efficiently during process execution, because real-time process monitoring makes sense in terms of the security of process execution load in BPaaS. Therefore, we compare these approaches from the perspectives of accuracy and execution time when making predictions about the multi-perspective conformance classes. First, the AUC (the area under the ROC curve), which expresses the probability that a given classifier will rank a positive case higher than a negative one, is adopted to measure the prediction accuracy [54], because other metrics require a predefined threshold whose value greatly influences the measured accuracy. Furthermore, the ROC curve underlying the AUC remains unchanged even if the sample ratio is imbalanced. As for execution time, two metrics, offline time and online time, are used for comparison: the offline time denotes the time it takes to obtain a prediction model, while the online time denotes the mean time it takes to forecast the multi-perspective conformance (class) of an executing process instance each time.
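The probabilistic reading of the AUC suggests a direct rank-based computation; a minimal sketch (equivalent to the Mann-Whitney U statistic, with ties counted as 0.5), using toy scores and labels.

```python
def auc(scores, labels):
    """AUC as the probability that a randomly chosen positive case is
    ranked above a randomly chosen negative one (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(auc([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0]))  # perfect ranking -> 1.0
```

Because only the ranking of scores matters, this value is unaffected by the class ratio, which is why the AUC suits the imbalanced logs above.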

Implementation Details
To simulate the real scenario of multi-perspective conformance-oriented PPM (i.e., the multi-perspective conformance class of an executing case is predicted after each event is performed), the processed event logs, consisting of pairs of an encoded case and its conformance class, are divided into the first 80% as the training set and the last 20% as the test set, depending on the time the cases occurred. Furthermore, the training set is randomly split into 80% training data and 20% validation data to compare the approaches after optimization. The samples of the training data are employed to train a prediction model, while the samples of the validation data are used as held-out data to find the optimal hyperparameter combination for each constructed model. Since there are many parameters in the NN-based approaches, we choose the random search method [55] for hyperparameter optimization. For the above 16 approaches, we set a distribution and value domain for each involved parameter and then sample these parameters to obtain combinations for optimization in this experiment. In addition, for the NN-based approaches, the number of epochs is fixed at 50. Besides, for the CNN-BiRNN, CNN-BiLSTM, and CNN-BiGRU approaches on the Traffic Fines log, the parameters involved in the CNN part are determined empirically: the numbers of filters in the three convolution layers are 16, 32, and 64, respectively, the kernel size is 3, the stride is 2, and the activation function is ReLU.
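The random search procedure can be sketched as below; an illustrative stand-in where the search space and the validation score are toy assumptions (a real run would train the network on the training data and score it on the validation data).

```python
import random

def random_search(space, evaluate, n_iter=20, seed=42):
    """Random search over hyperparameters: sample each parameter from its
    value domain independently and keep the best-scoring combination."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        params = {name: rng.choice(domain) for name, domain in space.items()}
        score = evaluate(params)  # validation score of this combination
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Toy space; a real evaluate() would train and validate a model.
space = {"batch_size": [32, 64, 128],
         "lr": [1e-2, 1e-3, 1e-4],
         "dropout": [0.1, 0.3, 0.5]}
toy_eval = lambda p: -abs(p["lr"] - 1e-3) - abs(p["dropout"] - 0.3)
best, score = random_search(space, toy_eval)
print(best)
```

Continuous parameters would be sampled from distributions (e.g. a log-uniform learning rate) instead of finite domains.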

Experimental Results
To make online conformance predictions for an executing case in terms of the security in BPaaS, we first extract all prefix traces with different lengths from the tenant-oriented historical executed cases in the test set. Then, we utilize the prediction model constructed by each approach to forecast the multi-perspective conformance class of each prefix trace. Based on these predictive results, we calculate the AUC values and the online prediction time (i.e., online time) for each prefix length and each approach on the different datasets. Meanwhile, we also record the time for training a prediction model (i.e., offline time) for each approach on the different datasets. Table 4 gives the overall AUC of each approach on the two datasets as well as the mean overall AUC over both datasets. In particular, this table also shows the overall AUC of each approach when a class weight is used to address the issue of sample imbalance. The overall AUC of an approach on a dataset refers to the weighted average of the AUC values calculated from the prediction results of all predicted prefix traces with different lengths, where the weights depend on the number of prefix traces of a certain length. As shown in Table 4, it is easy to find that our proposed CNN-BiGRU approach outperforms the other compared approaches on both the Traffic Fines and BPIC2012 logs. Among these 16 approaches, the traditional machine learning-based approaches, i.e., the RF-based and XGBoost-based approaches, have the worst performance according to the mean of the overall AUC values. Furthermore, RF_single_laststate and XGBoost_single_laststate have lower AUC values than the other approaches; we infer that the reason for this phenomenon is the laststate encoding method they use. Among the NN-based approaches, in terms of the mean of the overall AUC values, GRU performs best among the basic approaches, followed by LSTM and then RNN.
Likewise, among their three bidirectional improvements (i.e., BiRNN, BiLSTM, and BiGRU), BiGRU is also better than the BiLSTM and BiRNN approaches. Comparing BiRNN, BiLSTM, and BiGRU with RNN, LSTM, and GRU, respectively, we find that the bidirectional variants do indeed perform better. However, among the hybrid approaches, CNN-RNN outperforms CNN-LSTM, which may be due to interference from the CNN. Similarly, we also find that CNN-BiRNN is worse than CNN-RNN. Nevertheless, CNN-BiGRU and CNN-BiLSTM still perform better than CNN-GRU and CNN-LSTM, respectively. In addition, from the perspective of sample imbalance, the class-weighted variants of these approaches perform better than the original ones, especially on the Traffic Fines dataset, which may be because this dataset is more imbalanced.
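The paper does not state how the class weights are derived; a common choice, sketched here as an assumption, is the inverse-frequency scheme (n_samples / (n_classes * class_count), the same heuristic as scikit-learn's "balanced" mode):

```python
from collections import Counter

def balanced_class_weights(labels):
    """Inverse-frequency class weights for an imbalanced label set:
    weight(c) = n_samples / (n_classes * count(c))."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}

# Hypothetical imbalanced labels: 90 conformant vs 10 non-conformant cases.
labels = ["True"] * 90 + ["False"] * 10
print(balanced_class_weights(labels))
```

Here the rare non-conformant class receives weight 5.0 against roughly 0.56 for the majority class, so misclassifying a minority sample costs the loss about nine times more.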

Accuracy Comparison
For further comparison, Figs. 5 and 6 show the AUC values of predictions for prefix traces of different (prefix) lengths. In these subgraphs, the prefix length on the x-axis denotes the set of prefix traces of a given length that are waiting for prediction, and the corresponding AUC value on the y-axis denotes the mean AUC achieved by an approach when predicting these prefix traces. For the Traffic Fines dataset, Fig. 5a and b show how the AUC changes as the prefix length increases for the above 16 original approaches and for their class-weighted variants, respectively, as do Fig. 6a and b for BPIC2012. In Fig. 5, it is easy to see that the trends in subfigures (a) and (b) are similar, but both fluctuate noticeably as the prefix length increases, especially at the beginning and end of a case. Normally, the AUC value should gradually increase with the prefix length, since a longer prefix provides more reference information for prediction; the observed fluctuations may instead be related to the sample imbalance of the dataset. As shown in Fig. 6a and b, at the beginning of cases, the AUC values of the different approaches fluctuate significantly in the short term and soon tend to increase steadily with increasing prefix length. Subsequently, the AUC values of the NN-based approaches keep increasing gradually until most cases are complete. However, the AUC values of RF_single_laststate and XGBoost_single_laststate begin to fluctuate and decrease once the length of the prefix trace exceeds 20. On the one hand, this phenomenon may be caused by the encoding method used in these two approaches; on the other hand, these two approaches may be susceptible to activities that have a decisive impact on the multi-perspective conformance class when making predictions for an executing case.
In addition, a hypothesis test was used to further evaluate these approaches and demonstrate that the experimental results in this paper are not accidental. Since the predictive performance of the different approaches (classifiers) is measured per ongoing process instance to be predicted, a hypothesis test is required over the predicted results of the test samples of each dataset rather than over the datasets themselves. In other words, because we compare the performance of multiple approaches on different test samples, we can only use a rank-based hypothesis test. Meanwhile, our analysis shows that the AUC-based prediction results over all test samples and all approaches (classifiers) do not follow a normal distribution. Therefore, we choose a nonparametric multivariate hypothesis test, the Friedman test, to refine our evaluation. The p-value of the Friedman test was less than 0.05, indicating that there are significant differences among these approaches. However, this test alone cannot tell which pairs of approaches differ, so a post-hoc test is further required [56]. The Nemenyi post-hoc test, typically used in conjunction with the Friedman test, can show whether there are significant differences between every pair of approaches. Therefore, we apply the combined Friedman-Nemenyi test to all test samples and their predicted AUC results under the different approaches (classifiers) in each event log. We can then calculate the p-value (between 0 and 1) between every two of the above 16 approaches; if the value is less than the significance level of 0.05, we conclude that there is a significant difference between them. Afterward, we find that the p-value between most approach pairs is 0.001 (less than 0.05).
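In practice these tests are available off the shelf (e.g., `scipy.stats.friedmanchisquare` and scikit-posthocs' `posthoc_nemenyi_friedman`); as a self-contained illustration of the rank-based statistic underlying the procedure, the following sketch computes the Friedman chi-square statistic from a score matrix:

```python
def friedman_statistic(scores):
    """Friedman chi-square statistic over n blocks (test samples) and
    k treatments (approaches); scores[i][j] = score of approach j on
    sample i. Tied values receive their average rank."""
    n, k = len(scores), len(scores[0])
    rank_sums = [0.0] * k
    for row in scores:
        order = sorted(range(k), key=lambda j: row[j])
        ranks = [0.0] * k
        i = 0
        while i < k:                    # assign average ranks to tied values
            j = i
            while j + 1 < k and row[order[j + 1]] == row[order[i]]:
                j += 1
            avg = (i + j) / 2 + 1
            for m in range(i, j + 1):
                ranks[order[m]] = avg
            i = j + 1
        for j in range(k):
            rank_sums[j] += ranks[j]
    mean_ranks = [s / n for s in rank_sums]
    return 12 * n / (k * (k + 1)) * sum(r * r for r in mean_ranks) - 3 * n * (k + 1)

# Toy example: approach 3 always best, approach 1 always worst.
scores = [[0.6, 0.7, 0.8], [0.5, 0.6, 0.9], [0.4, 0.7, 0.8]]
print(friedman_statistic(scores))  # mean ranks 1, 2, 3 give chi-square = 6.0
```

The statistic is compared against a chi-square distribution with k - 1 degrees of freedom to obtain the p-value; when it is significant, the Nemenyi test then compares the mean ranks of each pair of approaches.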
For example, on these two event logs, there are significant differences between CNN-BiGRU and BiGRU (p-value = 0.001) and between CNN-BiGRU and CNN-GRU (p-value = 0.001), and similarly for CNN-BiRNN and BiRNN.

Table 5 gives the offline time (in seconds) required to train a classification model with each approach and the online time (in milliseconds) required to make a prediction about the conformance of a running case (i.e., a prefix trace). First of all, compared with the neural network-based approaches, the traditional RF-based and XGBoost-based approaches require less time to build a prediction model (almost always within 120 seconds), while the neural network-based approaches require considerably longer; in particular, the LSTM approach requires nearly 5,000 seconds on the Traffic Fines dataset. From the perspective of the datasets, these approaches need less time to construct prediction models on the BPIC2012 dataset. However, in terms of online prediction, the traditional RF-based and XGBoost-based approaches require more time to make predictions with the built model. As shown in this table, the online time required by the neural network-based approaches is less than ten milliseconds, mostly around two milliseconds, which also reflects the limited efficiency of process prediction and monitoring applications based on traditional machine learning techniques. Generally, the online prediction time is considered to be more crucial than the offline training time in real-time prediction or process execution monitoring scenarios. Accordingly, the NN-based approaches have an advantage in online prediction tasks over these traditional machine learning approaches.
In particular, as shown in Table 5, the online time of the GRU-based approaches, i.e., GRU, BiGRU, CNN-GRU, and CNN-BiGRU, remains steady enough for real-time prediction.
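A per-prediction online time such as the one reported in Table 5 can be estimated by averaging wall-clock latency over many prefix traces. A minimal sketch with a stub classifier standing in for a trained model (the function and data below are illustrative, not the authors' code):

```python
import time

def measure_online_time(predict, prefix_traces):
    """Mean per-prediction latency in milliseconds over a set of
    prefix traces (one prediction per trace)."""
    start = time.perf_counter()
    for trace in prefix_traces:
        predict(trace)
    elapsed = time.perf_counter() - start
    return elapsed / len(prefix_traces) * 1000.0

# Stub classifier standing in for a trained CNN-BiGRU model.
predict = lambda trace: sum(trace) > 0
traces = [[0.1] * n for n in range(1, 101)]
print(f"{measure_online_time(predict, traces):.3f} ms per prediction")
```

Averaging over many traces smooths out timer resolution and per-call jitter, which matters when the quantity of interest is on the order of a few milliseconds.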

Interpretability of Prediction Results
In practical applications, we can use our proposed approach to predict the final multi-perspective conformance of an ongoing process instance. As noted above, an ongoing case corresponds to a prefix trace that consists of a series of performed activities (i.e., events). These performed activities carry many attribute values, such as the activity name, resource, timestamp, and amount. In addition, some attributes can be generated from these original attributes, such as open_case, event_nr, and time_since_midnight. As shown in Figs. 7 and 8, we take two decision points of an ongoing case in BPIC2012 as an example. Here, 'A_SUBMITTED-COMPLETE' is the loan application submission stage, and the first decision point, 'A_PARTLYSUBMITTED-COMPLETE', is the supplementary submission stage. The current state of this process case is called a prefix trace. Before prediction, the above-mentioned approaches (classifiers) learn the relationship between the encoded features of a case and its multi-perspective conformance class from similar prefix traces generated from the training set. When making a prediction for a case at such a decision point, the input of the ongoing case for each approach (classifier) includes all attributes of the completed activities, and these attributes are encoded as numerical values as described previously. As shown in Figs. 7 and 8, with our CNN-BiGRU approach, the predicted probability of the target True (conformance) is 0.38 at the first decision point and then rises to 0.39 when the case further completes the activity 'W_Afhandelen leads-SCHEDULE'. As the process instance moves forward, the predicted probability of the target True (conformance) does not always increase: in this case, the probability drops to 0.33 after 5 activities are completed and then increases to 0.36 after 8 activities are completed.
Moreover, such changes can happen repeatedly. For this case, the predicted probability gradually increases from 0.54 when 25 activities are completed to 0.89 after 39 activities are completed.
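The probability trajectory described above amounts to re-running the classifier on every prefix of the case. A minimal sketch with a stub probability model (the events and probabilities below are illustrative placeholders, not values from the real BPIC2012 model):

```python
def conformance_trajectory(case_events, predict_proba):
    """Predicted probability of the True (conformance) class after each
    completed event of an ongoing case, i.e. one prediction per prefix."""
    return [predict_proba(case_events[:i + 1]) for i in range(len(case_events))]

# Stub model: a fixed probability per prefix length (hypothetical values).
probs = {1: 0.38, 2: 0.39, 5: 0.33, 8: 0.36}
predict_proba = lambda prefix: probs.get(len(prefix), 0.5)
events = [f"e{i}" for i in range(1, 9)]
print(conformance_trajectory(events, predict_proba))
# → [0.38, 0.39, 0.5, 0.5, 0.33, 0.5, 0.5, 0.36]
```

Monitoring this trajectory, rather than a single final prediction, is what lets a BPaaS platform flag a tenant's case as soon as its conformance probability drops at a decision point.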

Conclusion and Future Work
It is very important to predictively monitor the final conformance of an executing process instance with regard to the security of the process execution load in cloud-based BPaaS applications. In this article, we concentrated on the multi-perspective conformance-oriented predictive process monitoring task for enhancing the security of BPaaS. We first proposed an extensible multi-perspective (i.e., structure, data, resource, and time) conformance measurement. Based on it, given a predefined process model in BPaaS with some multi-perspective constraints (determined by tenants), the multi-perspective conformance of an executed case in a historical event log can be determined and used as supervised knowledge. To predict the multi-perspective conformance of an executing process instance for a tenant, we proposed the CNN-BiGRU approach, which builds a prediction model from the historical executed cases of this tenant by combining a CNN with a variant and enhancement of the RNN. The proposed CNN-BiGRU simultaneously uses a 3-layer CNN to aggregate the features of attributes and a multi-layer bidirectional GRU network to extract the temporal relations of events. In addition, we developed a framework in which a multi-perspective conformance prediction can be made for an executing process instance based on neural networks to enhance BPaaS security. Extensive experimental results on two event logs demonstrated the superiority of our CNN-BiGRU approach in comparison with a bundle of state-of-the-art techniques for process prediction tasks.
However, in terms of applicability, the proposed method has some limitations. For example, our solution for conformance-oriented predictive process monitoring requires an original regulatory process model as a baseline when measuring the multi-perspective conformance of an executed process instance. Moreover, due to page limitations, this paper does not explore what to do after the conformance prediction for an ongoing case; in the future, we therefore plan to develop a strategy for taking action on a tenant-initiated ongoing case based on its conformance prediction results. Besides, as more efficient feature representation learning techniques emerge, our future work will consider more contextual information to improve the performance of the multi-perspective conformance prediction model. Last but not least, with the continued execution of processes in BPaaS for a tenant, we also plan to investigate incremental conformance prediction based on neural networks to avoid duplicate offline training.