In a metaphorical sense, event logs are a “force of the past” [159] that can be used to shape the present and predict future events. Similarly, in the context of this thesis, past process executions are intended to eliminate a present obstruction and predict the sequence of events for how the process execution can be completed (cf. the Log-based Completability Requirements in Chapter 2.3). In addition to the model-based approaches, this chapter assumes there is historical information for detecting and resolving obstructions. The consideration of logs addresses the general requirements of completability to consider all inputs (model, policy, log) throughout all approaches (GR-3). Similar to the obstructed path sequences of process models, obstructed traces represent obstructed states (GR-1). As identified in Chapter 2.3, various log-based techniques exist to derive a range of indicators to consider when assessing the security-sensitivity of solutions (GR-2).

Analogously to the model-based OLive-M  approach, this chapter presents OLive-L  to make obstructed sequences live again based on the log. The log-based completability requirements derived in Chapter 2.3, representing the potentials for how logs can be used in the context of obstructions, are incorporated.

Leveraging this approach, this chapter presents a further use case of the SecANet   encoding to demonstrate its applicability to log-based techniques. Although the SecANet  approach forms the basis, OLive-L  works independently because, depending on the information system used to execute a process, the process model or policy is not necessarily provided or enforceable, and only logs are available. However, the approach also leverages SecANet  modeling, which is relevant in additional aspects, such as generating or classifying logs.

Figure 5.1 The OLive-L approach for resolving obstruction based on logs

Figure 5.1 illustrates the OLive-L  approach. To detect and separate obstructed and successful traces (RLC-1), the traces are first partitioned accordingly. After this preprocessing, the approach then proposes a partial trace of events, i.e., a segment of the successful trace, to find paths to complete the obstructed trace (RLC-3). Such a completion trace could violate a safety property, such as an SoD requirement, but eventually allows the liveness to be “enforced” (in case a PAIS is provided that steers the execution accordingly). The proposed solution is again assessed with regard to its security-sensitivity. Indicators that quantify the cost of the obtained completion traces (RLC-2) are identified based on the log, enabling a measure of security-sensitivity for selecting the most security-sensitive candidate.

Figure 5.2 Log-based approach with example traces

Figure 5.2 illustrates the OLive-L  approach with example traces of an arbitrary model that are categorized as “successful” or “obstructed.” Based on this partitioning of cases, the intention is to find the closest matches for the obstructed trace to the successfully executed traces. These nearest matches then propose the partial trace to complete the execution, and the example presented here results in two candidates. An objective function tries to find, for example, the shortest completion trace, which could be justified based on the assumption that a shorter exceptional completion trace involving the minimal number of executed events may be less risky and more security-sensitive. If constraint definitions exist, then an objective function minimizes the number of violations a solution candidate implies.

Similarly to the OLive-M approach, this chapter demonstrates the applicability of OLive-L by using logs for resolving obstructed workflow executions. A subsequent illustration presents a realization of the OLive-L approach that exemplifies how process mining and machine learning methods are applied in this context. The OLive-L approach primarily addresses the obstruction resolving requirement (RLC-3) and depends on the partitioning of traces and the assignment of costs. Therefore, the requirements RLC-1 and RLC-2 are incorporated as essential building blocks. Chapter 2 previously examined various possibilities of using logs for the partitioning of traces (RLC-1) and how they can determine indicators and costs (RLC-2). This chapter complements these methods for partitioning by taking the SecANet into account to separate the traces.

Next, possible methods for exploring the similarity of traces are examined, resulting in identifying a common and applicable approach to provide solution candidates of successful traces that represent the closest match to the considered obstruction. Then, to finally select a candidate trace from which the completion trace is obtained, the ways to determine the security-sensitivity of candidates, i.e., the different costs they may imply, are illustrated. An implementation of the OLive-L approach offers experimental results based on a log synthesized from the CEW SecANet. Finally, a discussion considers further developments and extensions, such as the possibilities to assess security-sensitivity, and sketches how the logs may be used for the model-based approach.

5.1 Methods and Realization of the OLive-L  Approach

This section identifies the methods to realize an implementation of the OLive-L approach, which builds upon the approaches and methods indicated in Chapter 2 that appeared promising for use or adaptation in OLive-L. The required formalism is introduced, beginning with a trace replay performed on a SecANet considered for identifying the successful or obstructed traces. The partitioning of logs represents a fundamental initial step to reason further about the traces of the logs. For resolving an obstructed trace, the logs can also recommend or predict actions based on the behavior they reveal, such as how to complete the processes to achieve a positive outcome. Therefore, this section will then consider log-based techniques from process mining, in particular those from predictive monitoring suitable for determining nearest matches, to select and demonstrate an appropriate method to resolve obstructions. The partitioning of logs further enables deriving meaningful measures for identifying the most security-sensitive candidate trace, where such indicators and measures are considered under the notion of “security-sensitive costing.”

5.1.1 Trace Replay

To partition traces, a log may already provide enough information to directly categorize its traces as obstructed or successful, for example, by applying attribute-based filtering (e.g., “pi_abort” in XES) or an endpoints filter. Alternatively, the log may be classified by corresponding conformance checking artifacts, such as rule checking, replay, or alignments. Subsequently, based on the SecANet encoding, a straightforward way of conformance checking, the trace replay, is prepared and illustrated. If a fully replayed trace reaches the final marking, then it is considered successful. If it instead reaches some other terminal marking, then it is considered obstructed.

5.1.1.1 Trace-Related Notations

The basic definition related to traces and event logs is presented in the following.

Definition 5.1

(Trace and Event Log). Given an alphabet of events, \(T=\{t_1,\ldots ,t_n\}\), a trace is a word \(\sigma \in T^*\) that represents a finite sequence of events. An event log \(L \in \mathcal {B}(T^*)\) is a multiset of traces.

In the current context, an event consists of the executed task and the user who executes it, denoted as \(<task, user>\). A trace that contains events of this form is denoted as \(\sigma _{tu}\). Accordingly, the obstructed trace in Figure 5.2 is given as \(\sigma _{tu\otimes }=\langle<t_1, u_1>,<t_2, u_4>,<t_3, u_2>\rangle \), and the successful trace as \(\sigma _{tuS}=\langle<t_1, u_1>,<t_2, u_1>,<t_3, u_2>,<t_4, u_6>,<t_5, u_8>,<t_6, u_9>\rangle \). Analogously to these language-related formalisms, the trace sequence beginning after the i-th position of the trace is indicated by \(\sigma _{tuSi}\). Then, the partial trace to complete the workflow is \(\sigma _{tuS3}=\langle<t_4, u_6>,<t_5, u_8>,<t_6, u_9>\rangle \).

Similar to full firing sequences, full traces indicate a sequence of events that is fully replayable on a WF-net by taking the net from the initial to the end markings. The sequence definitions from van der Aalst [3] are adapted and extended to this notion, as well as the use of traces, so that partial and possibly incomplete traces can be defined and potentially concatenated with a completion trace.

Definition 5.2

(Concatenation of traces, set of log events). A trace \(\sigma \) appended with element \(t^\prime \) is denoted as \(\sigma \bigoplus t^\prime = \langle t_1,\ldots ,t_n,t^\prime \rangle \). Similarly, \(\sigma _1 \sigma _2\) appends the trace \(\sigma _2\) to \(\sigma _1\), resulting in a trace of length \(|\sigma _1| + |\sigma _2|\). These can equivalently be written as \(\sigma t^\prime \) and \(\sigma _1 \bigoplus \sigma _2\), respectively. For any log \(L = \{\sigma _1, \sigma _2,\ldots ,\sigma _n\}\), \(L_{\bigoplus } = \sigma _1 \bigoplus \sigma _2 \bigoplus \cdots \bigoplus \sigma _n\) concatenates all traces into a single trace of length \(|\sigma _1| + |\sigma _2| + \cdots + |\sigma _n| \). Hence, \(\text {supp}(\widehat{L_{\bigoplus }})\) gives the set of all events that occur in at least one trace contained in the log.

Appending the completion trace \(\sigma _{tuS|\sigma _{tu\otimes }|}\) to the obstructed trace \(\sigma _{tu\otimes }\) can be denoted as \(\sigma _{tu\otimes }\sigma _{tuS|\sigma _{tu\otimes }|}=\langle<t_1, u_1>,<t_2, u_4>,<t_3, u_2>,<t_4, u_6>,<t_5, u_8>,<t_6, u_9>\rangle \).
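The trace notation above can be illustrated with a minimal Python sketch. It assumes that events are modelled as (task, user) tuples and traces as lists; the helper names `suffix_after`, `concat`, and `log_events` are hypothetical and not part of the thesis implementation.

```python
from typing import List, Set, Tuple

Event = Tuple[str, str]  # (task, user), i.e. <t_i, u_j>
Trace = List[Event]


def suffix_after(trace: Trace, i: int) -> Trace:
    """Events after the i-th position (1-based), i.e. sigma_{tuS i}."""
    return trace[i:]


def concat(first: Trace, second: Trace) -> Trace:
    """Concatenation of two traces; the length is |first| + |second|."""
    return first + second


def log_events(log: List[Trace]) -> Set[Event]:
    """Set of all events that occur in at least one trace of the log."""
    return {event for trace in log for event in trace}


# Example traces from Figure 5.2
obstructed = [("t1", "u1"), ("t2", "u4"), ("t3", "u2")]
successful = [("t1", "u1"), ("t2", "u1"), ("t3", "u2"),
              ("t4", "u6"), ("t5", "u8"), ("t6", "u9")]

completion = suffix_after(successful, len(obstructed))
completed = concat(obstructed, completion)
```

Here `completion` corresponds to \(\sigma _{tuS3}\) and `completed` to the concatenated trace \(\sigma _{tu\otimes }\sigma _{tuS|\sigma _{tu\otimes }|}\).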

5.1.1.2 Replay-Based Partitioning of Traces

To replay the traces on the SecANet, the events of the trace may need enriching. For example, an event that encodes which user executed a task is represented as a distinct user-task transition and a corresponding task. In turn, the traces must consist of events containing the name of the executed task \(t_i\) and the user \(u_j\) who executed it. For the case that the traces are not easy to map and replay on the flattened model, it is first shown how the replay can be prepared. For this, events of the form \(<t_i, u_j>\) are mapped to the transitions of the model:

Definition 5.3

(Replay preparation). For each event \(< t_i, u_j>\) of the traces \(\sigma _{tu}\) occurring in the log \(L_{tu}\), the corresponding transitions of the flattened SecANet  N, i.e., the corresponding user-task transition \(t_{u_jt_i}\) that assigns the user to its task and the transition \(t_i\) indicating the task afterwards, are mapped to each other. By doing this, each event \(<t_i, u_j>\) of the trace \(\sigma _{tu}\) is transformed to the sequence \(\langle t_{u_jt_i}\), \(t_i \rangle \). The resulting trace is notated as \(\sigma _{utt}\), indicating the order of the transformed events. Analogously, the log is denoted as \(L_{utt}\).
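As a minimal sketch of the replay preparation in Definition 5.3, the following Python snippet expands each event \(<t_i, u_j>\) into the pair of transitions \(\langle t_{u_jt_i}, t_i \rangle \). The label scheme `"<user>_<task>"` for user-task transitions is an assumption made only for illustration; the actual transition names depend on the flattened SecANet.

```python
from typing import List, Tuple

Event = Tuple[str, str]  # (task, user)


def user_task_transition(task: str, user: str) -> str:
    """Hypothetical label for the user-task transition t_{u_j t_i}."""
    return f"{user}_{task}"


def prepare_for_replay(trace_tu: List[Event]) -> List[str]:
    """Transform sigma_tu into sigma_utt: <t_i, u_j> -> <t_{u_j t_i}, t_i>."""
    trace_utt: List[str] = []
    for task, user in trace_tu:
        trace_utt.append(user_task_transition(task, user))  # assign user
        trace_utt.append(task)                              # execute task
    return trace_utt


sigma_tu = [("t1", "u1"), ("t2", "u4"), ("t3", "u2")]
sigma_utt = prepare_for_replay(sigma_tu)
```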

The log replay algorithm is used to replay the resulting traces \(\sigma _{utt}\) [175]. The transformation from the BPMN model into a P/T-Net [78] and the conducted flattening introduce transitions that are not visible in the log (e.g., the forks or joins that route the control flow).

Such invisible tasks are considered lazy. In other words, they might fire in order to enable succeeding visible tasks, i.e., the tasks from the BPMN model (\(t_i\)) or user-task transitions (\(t_{u_jt_i}\)), but never directly in the course of log replay because they do not have an associated log event [175]. If a trace \(\sigma _{utt}\) is replayable by applying the log replay algorithm and reaches the final marking (with only one token remaining in the end position of the WF-net), then the trace is considered successful. Thus, the corresponding original trace \(\sigma _{tu}\) is added to the set of successful traces \(L_{tu\text {S}}\). If the trace \(\sigma _{utt}\) is replayable and does not reach the desired final marking but some other terminal marking, then it represents an obstruction with its corresponding original trace \(\sigma _{tu}\) classified as obstructed \(\sigma _{tu\otimes }\). Traces not fully replayable, such as those resulting from the aforementioned incompleteness or noise (cf. Chapter 2.3), are neglected.
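The classification rule can be sketched as follows. This is a deliberately simplified illustration and not the log replay algorithm of [175]: it assumes a net with arc weights of one and no invisible (lazy) transitions, and the toy net, the place names, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

# transition label -> (input places, output places); all arc weights are 1
Net = Dict[str, Tuple[List[str], List[str]]]


def enabled(net: Net, marking: Counter, label: str) -> bool:
    """A transition is enabled if every input place holds enough tokens."""
    need = Counter(net[label][0])
    return all(marking[place] >= n for place, n in need.items())


def fire(net: Net, marking: Counter, label: str) -> Counter:
    """Consume tokens from the input places and produce on the outputs."""
    pre, post = net[label]
    nxt = Counter(marking)
    nxt.subtract(pre)
    nxt.update(post)
    return +nxt  # unary plus drops places with zero tokens


def classify(net: Net, initial: Counter, final: Counter,
             trace: List[str]) -> str:
    marking = Counter(initial)
    for label in trace:
        if label not in net or not enabled(net, marking, label):
            return "neglected"          # trace is not fully replayable
        marking = fire(net, marking, label)
    if marking == final:
        return "successful"             # final marking reached
    if not any(enabled(net, marking, t) for t in net):
        return "obstructed"             # other terminal marking reached
    return "neglected"                  # incomplete trace


# Toy net: 'c' leads into a dead end, 'b' leads to the end place.
toy_net: Net = {
    "a": (["p0"], ["p1"]),
    "b": (["p1"], ["end"]),
    "c": (["p1"], ["p2"]),
}
start, end = Counter({"p0": 1}), Counter({"end": 1})
```

Supporting invisible transitions would additionally require searching for a sequence of lazy firings that enables the next visible event, which is omitted here.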

5.1.2 Nearest Match

Based on the performed classification of logs, given an obstructed trace (cf. Figure 5.2) that contains the executed tasks and their executors (e.g., \(\sigma _{tu\otimes } = \langle<t1,u1>,<t2,u4>,<t3,u2> \rangle \)), the nearest match among the successful traces must be found to identify a partial sequence to complete the execution. To obtain the traces that are in some way the “closest” to the obstructed trace, this section introduces the so-called k-nearest neighbor (kNN) search as a method to realize the OLive-L approach.

5.1.2.1 k-Nearest Neighbor

Identifying the nearest match and proposing an addition of events have strong similarities to the imputation of missing values for cleaning and imputing raw data. If the traces related to the OLive-L approach were considered as data points, then an obstructed data point could be imputed with the user-task events from the points that encode successful executions.

A popular imputation approach to correct missing values is based on the k-nearest neighbor search. For each instance that contains one or more missing values, the k-nearest neighbors are calculated, and gaps are imputed based on the existing values of the selected neighbors. The most commonly used similarity function to obtain the k-nearest neighbors for missing values imputation is a variation of the Euclidean distance that accounts for those samples containing missing values [27]. Along with being a typical method in machine learning (ML), kNN is often used by predictive monitoring approaches that leverage ML, in addition to support vector machines, artificial neural networks, decision trees, clustering methods, or regression trees [144, 202].

The kNN algorithm is based on the Parikh vector representation, introduced previously, allowing for an easy transition of the results to Petri net-related matrix equations and Parikh vectors, which facilitate combining the elements and solutions of the log- and model-based approaches. Therefore, kNN is suitable to search for similarities between the data points used to realize the approach presented here.
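To make the vector representation concrete, the following minimal sketch builds the Parikh vector of a user-task trace over a given feature space. The ordering of the dimensions (sorted events) and the function names are assumptions chosen for illustration.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Event = Tuple[str, str]  # (task, user)


def feature_space(traces: Sequence[Sequence[Event]]) -> List[Event]:
    """One dimension per distinct event; sorted for a stable order."""
    return sorted({event for trace in traces for event in trace})


def parikh_vector(trace: Sequence[Event], space: List[Event]) -> List[int]:
    """Count how often each event of the space occurs in the trace."""
    counts = Counter(trace)
    return [counts[event] for event in space]


obstructed = [("t1", "u1"), ("t2", "u4"), ("t3", "u2")]
successful = [("t1", "u1"), ("t2", "u1"), ("t3", "u2"),
              ("t4", "u6"), ("t5", "u8"), ("t6", "u9")]

space = feature_space([obstructed, successful])
v_obstructed = parikh_vector(obstructed, space)
v_successful = parikh_vector(successful, space)
```

Note that the Parikh vector abstracts from the order of events, so two traces with the same events in a different order map to the same point.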

Figure 5.3 Sketch of obstructed and successful traces in n-dimensional space

Figure 5.3 sketches the points of successful traces in relation to an obstructed trace (o), indicating the n-dimensional space in which kNN operates. To identify the traces with the smallest distance to “o”, a variety of similarity metrics exist, such as the Manhattan or cosine distance. Because the straightforward applicability of the approach is of primary interest here, the Euclidean distance that corresponds to the typical spatial understanding of a distance is introduced. The distance of each selected data point to all other data points is computed sequentially, i.e., the computational steps increase linearly with the size of the problem. In this respect, the caveat of kNN lies in the dimensionality because the required space increases exponentially with each added dimension.

5.1.2.2 kNN-Based Completion Trace

The kNN algorithm, dating back to Cover et al. [56], is adapted for use in the OLive-L  approach by finding the nearest match between an obstructed trace and the successfully executed traces. The completion traces are then identified based on these k-candidates.

Definition 5.4

(Find k-nearest neighbors). Given a set of successful traces \(L_{tu\text {S}}\), an obstructed trace \(\sigma _{tu\otimes }\), and a positive integer k, calculating the k nearest traces to \(\sigma _{tu\otimes }\) is performed as follows:

  1. For each trace \(\sigma _i\) in \(L_{tu\text {S}}\), assign its Parikh vector \(\widehat{\sigma _i}\) to the n-dimensional space \(\mathcal {R}^n\), where \(n = |\{\text {supp}(\widehat{L_{{tu\text {S}}\bigoplus }}) \cup \text {supp}(\widehat{\sigma _{tu\otimes }})\}|\).

  2. Find the k nearest Parikh vectors of the successful traces \(\{\widehat{\sigma _1},\widehat{\sigma _2},\ldots , \widehat{\sigma _k}\}\) with minimal distance to the Parikh vector of the obstructed trace, \(\underset{\widehat{\sigma _i}\in L_{{tu\text {S}}}}{\text {min}^k}\; d(\widehat{\sigma _i}, \widehat{\sigma _{tu\otimes }})\), where d is the Euclidean distance metric \(d(a,b) = \sqrt{\sum _{i=1}^{n}(a_{i}-b_{i})^{2}}\).   \(\dashv \)

Given \(\{\widehat{\sigma _1},\widehat{\sigma _2},..., \widehat{\sigma _k}\}\), the partial sequences of the corresponding traces \(\{\sigma _{1|\sigma _{tu\otimes }|}\), \(\sigma _{2|\sigma _{tu\otimes }|},\) \(..., \sigma _{k|\sigma _{tu\otimes }|}\}\) contain all the events after the \(|\sigma _{tu\otimes }|\)-th position of the trace, presenting the k potential sequences of events to complete from \(\sigma _{tu\otimes }\).
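Definition 5.4 and the derivation of the completion traces can be sketched in a few lines of Python. This is an illustrative brute-force search under the assumption that traces are lists of (task, user) tuples; it is not the Java implementation used in the experiments, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Event = Tuple[str, str]
Trace = List[Event]


def parikh(trace: Sequence[Event], space: Sequence[Event]) -> List[int]:
    counts = Counter(trace)
    return [counts[e] for e in space]


def euclidean(a: Sequence[int], b: Sequence[int]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_nearest_completions(successful: List[Trace], obstructed: Trace,
                          k: int) -> List[Tuple[float, Trace]]:
    """Return (distance, completion trace) for the k nearest traces."""
    # Step 1: span the space over all events of both trace sets.
    space = sorted({e for t in successful + [obstructed] for e in t})
    target = parikh(obstructed, space)
    # Step 2: rank the successful traces by Euclidean distance.
    scored = [(euclidean(parikh(t, space), target), t) for t in successful]
    scored.sort(key=lambda pair: pair[0])
    # Completion trace: all events after the |obstructed|-th position.
    cut = len(obstructed)
    return [(dist, trace[cut:]) for dist, trace in scored[:k]]


obstructed = [("t1", "u1"), ("t2", "u4"), ("t3", "u2")]
s1 = [("t1", "u1"), ("t2", "u1"), ("t3", "u2"),
      ("t4", "u6"), ("t5", "u8"), ("t6", "u9")]
s2 = [("t1", "u1"), ("t2", "u1"), ("t3", "u2"),
      ("t4", "u6"), ("t5", "u7")]

neighbors = k_nearest_completions([s1, s2], obstructed, k=2)
```

For these two example traces, the shorter trace is the nearer neighbor because its Parikh vector differs from that of the obstructed trace in four dimensions rather than five.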

If more than one candidate is found, then one must be selected, for example, by an objective function that considers the length of the partial sequence to complete or the number of violations it implies. For instance, if two successful candidates

$$\sigma _{tu\text {S}1} = \langle<t1,u1>,<t2,u1>,<t3,u2> ,<t4,u6>,<t5,u8>,<t6,u9> \rangle \text { and}$$
$$\sigma _{tu\text {S}2} = \langle<t1,u1>,<t2,u1>,<t3,u2>,<t4,u6>,<t5,u7> \rangle $$

are chosen as the nearest match, both having the same first three events \(<t1,u1>,<t2,u1>,<t3,u2> \) in which only the executor of t2, namely u1, differs from the executor of t2 in the obstructed trace, u4, then the potential partial sequences of events to complete the execution, i.e., the completion traces

$$\sigma _{tu\text {S}1|\sigma _{tu\otimes }|} = \langle<t4,u6>,<t5,u8>,<t6,u9> \rangle \text { and}$$
$$\sigma _{tu\text {S}2|\sigma _{tu\otimes }|} = \langle<t4,u6>,<t5,u7> \rangle $$

can be compared by an objective function. If the length of the partial sequence of events is minimized, then the completion trace \(\sigma _{tu\text {S}2|\sigma _{tu\otimes }|}\) is chosen to complete \(\sigma _{tu\otimes }\).
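A simple objective function of this kind can be sketched as follows. The lexicographic order (first the number of violations, then the length) and the optional `violations` callback are assumptions made for illustration; other weightings are equally conceivable.

```python
from typing import Callable, List, Optional, Tuple

Event = Tuple[str, str]
Trace = List[Event]


def select_completion(candidates: List[Trace],
                      violations: Optional[Callable[[Trace], int]] = None
                      ) -> Trace:
    """Pick the candidate with the fewest violations, then the shortest."""
    def cost(completion: Trace) -> Tuple[int, int]:
        n_violations = violations(completion) if violations else 0
        return (n_violations, len(completion))
    return min(candidates, key=cost)


c1 = [("t4", "u6"), ("t5", "u8"), ("t6", "u9")]
c2 = [("t4", "u6"), ("t5", "u7")]

shortest = select_completion([c1, c2])


# Hypothetical constraint check: any event executed by u7 is a violation.
def count_u7(completion: Trace) -> int:
    return sum(1 for _, user in completion if user == "u7")


fewest_violations = select_completion([c1, c2], violations=count_u7)
```

Without constraint information the shorter completion trace is chosen; with the hypothetical constraint, the longer but violation-free candidate is preferred.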

5.1.2.3 Security-Sensitive Costing of Candidates

As examined in Chapter 2.3, the log can be used to enhance the model as well as the log by indicators, such that violations can be better assessed. Such a security-sensitive costing based on the log may consider a plethora of indicators that can be derived from the logs [17, 164, 165] and then incorporated and weighted into the overall cost of the related elements in the SecANet  or the events of a log. More precisely, in addition to assigning indicators to places or transitions of a SecANet, these could also be assigned to each user-task event or each event that involves only a certain user or task. Then, the overall cost of each proposed candidate is summed to assess the solution in terms of its security-sensitivity.
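The summation of costs over a candidate can be sketched as follows. The three cost tables (per user-task event, per task, per user), the unweighted sum, and all numeric values are hypothetical and serve only to illustrate how indicators at different levels could be combined.

```python
from typing import Dict, List, Tuple

Event = Tuple[str, str]  # (task, user)


def candidate_cost(completion: List[Event],
                   event_cost: Dict[Event, float],
                   task_cost: Dict[str, float],
                   user_cost: Dict[str, float]) -> float:
    """Sum the indicator-based costs of all events in a completion trace."""
    total = 0.0
    for task, user in completion:
        total += event_cost.get((task, user), 0.0)  # user-task level
        total += task_cost.get(task, 0.0)           # task level
        total += user_cost.get(user, 0.0)           # user level
    return total


# Hypothetical indicator values, for illustration only.
event_cost = {("t5", "u7"): 2.0}
task_cost = {"t4": 0.5}
user_cost = {"u9": 1.0}

c1 = [("t4", "u6"), ("t5", "u8"), ("t6", "u9")]
c2 = [("t4", "u6"), ("t5", "u7")]

cost_c1 = candidate_cost(c1, event_cost, task_cost, user_cost)
cost_c2 = candidate_cost(c2, event_cost, task_cost, user_cost)
```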

This extraction of indicators and metrics used for security-sensitive assessments can focus on multiple aspects, some of which are presented in the following with examples:

  • Tasks: The general relevance of a task can be assessed on the basis of the log, or how important a task is for successful execution (cf. Key Performance Indicators). Given that a log contains the appropriate attributes, identifying the tasks affected by unauthorized access in a break-glass case is possible. Accordingly, the possible cost of a violation affecting those tasks can be lowered.

  • User-task level: Referred to as the user-task transitions of a SecANet or an access control matrix (ACL), the relevance of a corresponding permission (or user-task transition) is evaluated, depending on whether a corresponding user-task authorization occurs in the log.

  • Data elements: Interpreted in relation to tasks or users, exceeding a threshold (e.g., a credit of more than 5000 Euro), for example, could mean a higher risk for the tasks or users affected. As another example, dealing with a larger amount of money may improve the qualification and lower the risk (or cost) associated with a user.

  • Resources/Users: In Chapter 2.3, many existing methods were identified, such as resource behavior indicators, as identified in Figure 2.11, and resource profiles. In addition, profiling techniques that identify the threat emanating from an insider could indicate how risky the participation of a particular user in a process is [34]. However, the egocentric perspective on an individual user can be considered, as well as the socio-centric perspective that relates the users with each other. Specifically, the consideration of social networks that use different metrics for the type of collaboration of actors can be assessed from a security perspective.

To offer additional examples, the social network analysis provides different metrics that seem suitable for use in security-sensitive costing. A relationship can result, for example, from how often multiple actors are involved in the same process execution. This so-called “working together” metric indicates that the process is being handled well, but there remains an increased risk of collusion or fraud. Furthermore, considering which actors perform similar tasks is possible. A user who carried out a task similar to the obstructed one could be favored by the costs. Finally, as identified in Chapter 2, process logs can contain events that represent the execution or completion of tasks, as well as provide more detailed information about the state of processing, which can include the assignment of an activity to a specific actor, whose actual start of editing is recorded in another event. Other event types can define delegations, pauses, resumes, and the end of activities (see the Standard Transactional Life-cycle Model in Chapter 2). Metrics that refer to an event type can, for example, explicitly track delegations and derive information about the hierarchy of process participants to be used for role mining. Such hierarchies can then be associated with costs. For instance, if the head of a department carries out an activity that an employee would otherwise perform, then this activity results in a higher cost.

Based on the classification of logs, a feature weighting of the partitioned classes can be used to assess the security-sensitivity of a solution. When determining the influence of individual features, i.e., the dimensions of the n-dimensional space, on whether a trace is classified as successful or obstructed, a high weight for certain user-task events suggests that the presence of these features is crucial in the selection of candidates. For example, based on the typical RELIEF algorithm [129], the weights are determined with respect to both classes by considering the nearest hit from the class under consideration and the nearest miss from the other class. It is then possible to assess how important the different events are for making the entire trace successful or obstructed, respectively. The k candidates can then be multiplied with the obtained feature vector, such that the candidate with the highest summed weight contains most of the user-task assignments that are crucial for successful execution; this sum thereby provides a “completability measure.” A feature vector for obstructability can also be deduced by a feature weighting of the obstructed traces, in which case the minimum value is chosen, as it represents an “obstructability measure.”
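The following sketch illustrates such a feature weighting with a basic Relief-style update and the subsequent scoring of candidates. It is a simplified variant under stated assumptions: every trace vector is visited once instead of being sampled randomly, the vectors are binary, and the names `relief_weights` and `completability` are hypothetical.

```python
import math
from typing import List, Sequence


def _dist(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def relief_weights(vectors: List[Sequence[int]],
                   labels: List[str]) -> List[float]:
    """Reward features that differ on the nearest miss, penalize those
    that differ on the nearest hit."""
    n, m = len(vectors[0]), len(vectors)
    weights = [0.0] * n
    for i, x in enumerate(vectors):
        hits = [v for j, v in enumerate(vectors)
                if j != i and labels[j] == labels[i]]
        misses = [v for j, v in enumerate(vectors)
                  if labels[j] != labels[i]]
        if not hits or not misses:
            continue
        hit = min(hits, key=lambda v: _dist(v, x))
        miss = min(misses, key=lambda v: _dist(v, x))
        for f in range(n):
            weights[f] += (abs(x[f] - miss[f]) - abs(x[f] - hit[f])) / m
    return weights


def completability(weights: Sequence[float],
                   candidate: Sequence[int]) -> float:
    """Summed weight of the features present in a candidate vector."""
    return sum(w * c for w, c in zip(weights, candidate))


# Toy data: feature 1 separates the classes, feature 2 is noise.
vectors = [[1, 1, 0], [1, 1, 1], [1, 0, 0], [1, 0, 1]]
labels = ["successful", "successful", "obstructed", "obstructed"]

w = relief_weights(vectors, labels)
scores = [completability(w, c) for c in ([1, 1, 0], [1, 0, 1])]
```

In this toy data set, the separating feature receives the highest weight, so the candidate containing it obtains the higher score.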

If there exists a SecANet  model of the process, then replaying the overall trace \(\sigma _{tu\otimes }\sigma _{tuS|\sigma _{tu\otimes }|}\), i.e., the obstructed and completion traces, can also indicate the missing tokens during replay firing. When the SecANet  has costs assigned, the cost to add missing tokens in the places, together with the cost to execute the transitions related to the events in the completion trace, sum to the overall cost of the solution.

5.2 OLive-L  Experiments for Log-Based Obstruction Solving

An implementation of the OLive-L approach presented above is demonstrated through the following experiments applied to logs and traces related to the CEW SecANet.

5.2.1 Implementation

The implementation of the OLive-L  obstruction solution was developed in Java 8 using the Apache Commons math library to calculate the Euclidean distance for the kNN algorithm. Based on two CSV files for the successful and the obstructed traces, and a positive integer value of k that encodes the vicinity to scan when finding the closest matches to a given obstructed trace, it returns a list containing at most k closest vectors, the related traces, and their distance from the obstructed trace vector. These experiments were conducted on a MacBook Pro with 8 GB RAM and an Intel Core i7 3 GHz CPU.

Table 5.1 Encoding of successful traces in 12-dimensional space

5.2.2 Experiment Preparation: Obtaining Traces

Comparable to real WSP instances, acquiring real-world traces with a corresponding model along with all the authorization data required to perform the described analysis is difficult. Therefore, successful and obstructed traces were generated by playing out firing sequences of the flattened Petri net from Figure 3.46 (i.e., sequential firing of the enabled transitions until an obstructed or a final marking is reached). With this data generation method, both the model-based and the log-based evaluations build upon the same model to compare the results.

After generating the traces, the events were mapped to the users who executed each, according to the flattened user-task assignment, and filtered only by the relevant user-task events (in a real-world log, such events would contain the task name with the executing user/originator, cf. Definition 5.3). From this, successful and obstructed traces conforming to the user-task assignment and SoD/BoD constraints were generated. For each trace of the form \(\sigma _{tu}\), the corresponding Parikh vector \(\widehat{\sigma _{tu}}\) was built and assigned to the n-dimensional space. Table 5.1 displays the successful Parikh vectors of the traces as assigned to a 12-dimensional space, based on all possible user-task assignments.

Table 5.2 Solution for \(k=5\) with highlighted partial sequence

5.2.3 Experiment Setup and Solution

The nearest neighbor of the successful traces to the corresponding obstructed trace was computed with the Euclidean distance measure. For comparability, the obstructed trace \(\sigma _\otimes \) from the model-based experiments, encoded as \((0,0,1,1,0,0,1,0,1,0,0,0)\), was also chosen. The solution for \(k=5\) is depicted in Table 5.2.

Trivially, if \(k=1\), then no decision is needed as to which partial sequence to choose. Interestingly, \(<t5,d>\) would be proposed, although the majority of successful traces for \(k=5\) in Table 5.1 ends with \(<t5,a>\). Based on the candidates identified by the Euclidean distance, the \(k=5\) solution requires selecting one of these candidates with their completion trace by considering security violations or the minimum length of the partial trace. As the partial trace, i.e., the completion trace, is selected at the \(|\sigma _\otimes |\)-th position, the second and third solutions are empty. Because the remaining completion traces have the same length, the length-criterion to identify completion traces is neglected.

To assess the security violations of the completion traces, the obstructed trace is checked against the different solutions and impacts on the given SoD and BoD constraints. Similarly to the model-based approach, both solutions, \(<t5,a>\) and \(<t5,d>\), violate one SoD constraint. However, reviewing the set of partial sequences provided in Table 5.2, a majority for \(<t5,d>\) can be identified. Additional techniques and corresponding limitations due to the uncertainties that a log may entail are discussed below.

5.2.4 Experiments with Extensive Logs

To illustrate the applicability of this implementation to real-world logs, the performance was tested on a large data set with a 65-dimensional space and 247,192 traces. This log did not allow partitioning based on a SecANet  because no authorization specific data was available. Therefore, the traces were related to a randomly selected trace to assess the performance, and the qualitative aspect of the result had to be neglected. In this experiment, finding the five nearest neighbors for one trace required 0.31 seconds, which suggests the efficiency of the approach.

5.3 Discussion and Potentials of the OLive-L  Approach

In realizing the OLive-L approach, this chapter first proposed an additional way to detect and separate obstructed and successful traces (RLC-1) by replaying traces on the SecANet model. The kNN method was leveraged to find the traces that complete an obstructed execution (RLC-3). Depending on the actual capabilities of process monitoring, control, and enforceability that a PAIS provides, the solutions for the completion traces may only present a recommendation on how to proceed. Alternatively, the PAIS may steer the obstructed process towards its completion by the obtained completion trace. To determine the security-sensitivity of the solution candidates, in addition to examining the log-based indicators, the SecANet model was used to quantify the costs of the proposed solutions (RLC-2). Based on the feature space spanned by all user-task events, it was illustrated how a feature vector can evaluate the significance of the events in a candidate trace, for example, with regard to the successful execution of the process. Because the presented methods represent only one possible approach to realizing the building blocks of the implementation, the limitations that exist in each step, as well as possible improvements, are discussed in the following.

5.3.1 Log and Partitioning

The solutions strongly depend on the size and quality (e.g., in terms of noise or granularity) of the log, especially to the extent to which successful and obstructed traces appear. At the same time, the log allows considering only those executions that are relevant in practice.

The possibilities of partitioning traces are extended by the SecANet model. Although the result of this partitioning may initially appear as just a reflection of the firing sequences of the “played out” model or the terminal language of the net, considering the users and tasks that are part of real processes makes a difference. The log reflects a concrete selection of real-world process executions, such that the possibilities to solve an obstruction are reasonably limited. Therefore, a model-based solution might lack practical relevance if it does not represent a solution within the log-based approach.

Another issue concerning the replay is that traces not fully replayable are neglected. However, these traces could still indicate “good” outliers that result from a break-glass situation that was resolved and reviewed, so they are no longer replayable by the (idealized) model. An event attribute could additionally indicate the trace as completed, such that it would be included via attribute-based filtering.

Apart from replaying the traces, further conformance checking artifacts, such as rule checking or alignments, could use the SecANet  encoding to partition traces. For example, the alignment of the log traces to the firing sequences of the SecANet  model could be computed such that deviations from the synthesized traces are relatable to violations and then introduce more classifications.

5.3.2 kNN and Selection of Completion Segment

Although the kNN solution provides a way to escape an obstructed trace, the necessary assumptions leave room for discussion and improvement. Variants for finding the nearest neighbors, such as other distance measures like the Manhattan, cosine, or edit distance, could be explored. Alignments could also be used as an additional similarity metric to obtain the nearest matches of an obstructed trace to the successful executions.
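The alternative distance measures mentioned above can be sketched as follows. The Manhattan and cosine distances operate on Parikh vectors, whereas the edit (Levenshtein) distance operates directly on the event sequences and is therefore sensitive to the order of events. The function names are chosen for illustration only.

```python
import math
from typing import Hashable, Sequence


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # convention chosen here for zero vectors
    return 1.0 - dot / (norm_a * norm_b)


def edit_distance(s: Sequence[Hashable], t: Sequence[Hashable]) -> int:
    """Levenshtein distance between two event sequences."""
    prev = list(range(len(t) + 1))
    for i, x in enumerate(s, start=1):
        cur = [i]
        for j, y in enumerate(t, start=1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]


obstructed = [("t1", "u1"), ("t2", "u4"), ("t3", "u2")]
successful = [("t1", "u1"), ("t2", "u1"), ("t3", "u2"),
              ("t4", "u6"), ("t5", "u8"), ("t6", "u9")]
d_edit = edit_distance(obstructed, successful)
```

For the example traces, the edit distance consists of one substitution (the executor of t2) and three insertions.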

When determining the completion trace, the approach also suggests opportunities for refinement when considering which events of the successful trace must be contained in the completion segment to complete the obstructed trace adequately. In contrast to the model, many uncertainties exist in the traces. For example, the obstruction does not necessarily occur in the final task of the obstructed trace. The reason for the obstruction might appear early in a trace, and concurrent activities could follow. Currently, the length of the obstructed trace is key for choosing the completion segment, i.e., the completion trace, because traces of equal length may have similar execution paths, so they already inherit a certain similarity. Candidate traces that have the same tasks executed after a certain length within the total of the obstructed trace could additionally be required. However, if this is not the case, then the completion segments of a candidate may become too short, such as in the second and third experimental candidates in Table 5.2.

Therefore, other techniques could be considered to determine the partial trace. The last common task of the obstructed and the considered candidate trace when traversing the traces from left to right might be considered. Alternatively, the Parikh vector of the obstructed trace could be subtracted from the Parikh vector of the candidate trace. The remaining events in the Parikh vector of the successful trace could then be executed according to their order of occurrence in the corresponding trace. Based on predictive monitoring approaches, the next task of an obstructed trace could be predicted, such that the subsequent event or the starting point of the completion trace may be determined. Finally, by using a SecANet model, traces could be replayed and obstruction markings identified. The marking obtained could then be checked against the obstructed marking to determine the remaining activities to be executed. For simplicity, however, this implementation uses the length of the obstructed trace, as it is the most straightforward criterion for illustrating the applicability of the approach.
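The Parikh-based alternative mentioned above can be sketched as follows. This is an illustrative sketch under stated assumptions, not part of the implementation: events are assumed to be (task, user) pairs, and when an event occurs several times in the candidate, its earliest occurrences are assumed to be the ones matched by the obstructed trace.

```python
from collections import Counter


def completion_by_parikh(obstructed, candidate):
    """Derive a completion segment by Parikh vector subtraction.

    Returns the events of the candidate that remain after removing the
    multiset of events of the obstructed trace, in their order of
    occurrence in the candidate. Returns None if the obstructed trace
    contains events that the candidate does not cover.
    """
    to_skip = Counter(obstructed)
    # Counter subtraction keeps positive counts only, so a non-empty
    # result means the candidate lacks some obstructed event.
    if to_skip - Counter(candidate):
        return None
    segment = []
    for event in candidate:
        if to_skip[event] > 0:
            to_skip[event] -= 1      # already executed in the obstructed trace
        else:
            segment.append(event)    # still to be executed
    return segment


# Hypothetical example: events are (task, user) pairs.
obstructed = [("t1", "u1"), ("t2", "u2")]
candidate = [("t1", "u1"), ("t3", "u3"), ("t2", "u2"), ("t4", "u1")]
segment = completion_by_parikh(obstructed, candidate)
```

In contrast to the length-based selection, this variant also recovers events such as (t3, u3) that occur in the candidate before the position given by the length of the obstructed trace.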

5.3.3 Security-Sensitive Costing

Additional factors could be considered (e.g., in an objective function) to better assess the security-sensitivity. The security implications of choosing the closest trace could also be captured through additional features, for example, by conformance checking of an SoD rule and classifying the traces accordingly. A feature weighting could further partition traces regarding violations. With logs partitioned into the classes “conforming” and “violating,” a corresponding feature vector could then be used for a security-based weighting instead of a purely success-based weighting. From this feature weighting, the events that contribute to violations could be indicated.

Additional indicators and measures could be refined using the manifold approaches offered by machine learning and process mining. For example, the partitioned obstructed traces could be used in predictive or prescriptive monitoring to predict obstructions so that they can be avoided directly. Alternatively, these traces could be hypothetically completed by considering each as a completion trace to ensure that the proposed solution has a low risk of being obstructed again. Because the current focus, apart from the feature weighting, is on the successful traces, the knowledge contained in the obstructed traces could be leveraged further.

5.3.4 SecANet  Discovery

If no SecANet is available, then mining a SecANet in the course of “process discovery” is conceivable. In this case, discovering all aspects separately is advisable, including the control flow and all user-task assignments of the complete executions. Some reasonable assumptions must be made for mining the SoD or BoD constraints. For example, the pairs of tasks that usually involve different users at the case level could be investigated and then defined as SoD constraints for these tasks. As identified in Chapter 2.3, process mining techniques focus on various resource perspectives, e.g., role mining, that may be used or adapted here. Then, an adequate process discovery method can obtain the control flow, user-task assignments, and constraints as inputs, based on which the usual SecANet encoding, as described in Chapter 3, could be performed. This would enable the discovery of a SecANet based on a log, to which the OLive-M approach can then be applied.
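The heuristic for SoD constraints described above could be sketched as follows. The log format (triples of case identifier, task, and user), the function name, and the `threshold` parameter are hypothetical and serve only to illustrate the idea; the mined pairs are candidates that would still require review.

```python
from collections import defaultdict
from itertools import combinations


def mine_sod_candidates(log, threshold=1.0):
    """Suggest SoD candidates from a log of (case, task, user) events.

    A pair of tasks becomes a candidate if, in at least `threshold` of
    the cases containing both tasks, they were executed by disjoint
    sets of users.
    """
    users = defaultdict(lambda: defaultdict(set))
    for case_id, task, user in log:
        users[case_id][task].add(user)

    together = defaultdict(int)   # cases in which both tasks occur
    separated = defaultdict(int)  # ... and were executed by different users
    for tasks in users.values():
        for t1, t2 in combinations(sorted(tasks), 2):
            together[(t1, t2)] += 1
            if tasks[t1].isdisjoint(tasks[t2]):
                separated[(t1, t2)] += 1

    return {pair for pair, n in together.items()
            if separated[pair] / n >= threshold}


# Hypothetical example log.
log = [
    ("c1", "request", "alice"), ("c1", "approve", "bob"), ("c1", "pay", "alice"),
    ("c2", "request", "carol"), ("c2", "approve", "dave"), ("c2", "pay", "carol"),
]
sod = mine_sod_candidates(log)
```

A BoD candidate could be mined analogously by counting the cases in which both tasks were executed by the same user. On small logs, such heuristics can easily mistake coincidental assignments for constraints.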

5.3.5 Log- and Model-Based OLive Extensions

When assessing violations, the SecANet is already considered by a replay to identify missing tokens during firing. However, additional specialized ways are sketched below that consider not only the SecANet but also the respective model-based or log-based counterparts of the OLive approaches to resolve obstructions. The sequences of user-task events (\(\sigma _{tu}\)) and user-task transitions (\(\sigma _{utt}\)) are assumed to be mapped according to Definition 5.3. The following subsections explore how the log-based and model-based techniques can be combined to their mutual advantage.

5.3.5.1 OLive-LM: Refining the Log-Based Approach with the Model-Based Approach

In addition to using the SecANet to assess security-sensitivity, elements of the model-based OLive-M approach can be used to assess violations. For example, the obstructed trace can be replayed on the SecANet until it ends in the obstruction marking, followed by a replay of the candidate trace without the partial trace for completion. From the perspective of the model-based approach, the resulting marking represents a live marking that must be reached from the obstructed marking to fire the partial trace and complete the execution. Subtracting the place vector of the obstructed marking from the place vector of the live marking could reveal the tokens that must be added to complete the execution. If there is no non-negative integer solution for the tokens to add, then the solution may be too far from the model, such as when tasks are skipped.

Finally, if the resulting vector is \(\ge 0\), the costs of the related places and transitions can be summed based on the added tokens and the events in the partial completion trace. To perform such a cost-based evaluation, the corresponding costs can be assigned to the model during the log-based model enhancement.
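As a minimal sketch of this refinement, the following code computes the token difference between two markings and sums the costs. Markings are assumed to be given as mappings from places to token counts; the place and transition names as well as the cost values are hypothetical.

```python
def tokens_to_add(obstructed_marking, live_marking):
    """Subtract the obstructed marking from the live marking, place by place.

    Returns the tokens that must be added, or None if the difference
    contains a negative entry (no non-negative solution exists).
    """
    places = set(obstructed_marking) | set(live_marking)
    delta = {p: live_marking.get(p, 0) - obstructed_marking.get(p, 0)
             for p in places}
    if any(v < 0 for v in delta.values()):
        return None
    return {p: v for p, v in delta.items() if v > 0}


def solution_cost(delta, completion_segment, place_cost, transition_cost):
    """Sum the costs of the added tokens and of the fired transitions."""
    token_cost = sum(place_cost.get(p, 0) * n for p, n in delta.items())
    firing_cost = sum(transition_cost.get(t, 0) for t in completion_segment)
    return token_cost + firing_cost


# Hypothetical markings: user u2 lacks the token in place "pu2" to fire t2.
m_obstructed = {"p1": 1}
m_live = {"p1": 1, "pu2": 1}
delta = tokens_to_add(m_obstructed, m_live)
cost = solution_cost(delta, ["t2_u2"],
                     place_cost={"pu2": 5}, transition_cost={"t2_u2": 1})
```

A result of None corresponds to the case in which the candidate is too far from the model, e.g., because the live marking lacks a token that the obstructed marking still holds.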

5.3.5.2 OLive-ML: Refining the Model-Based Approach with the Log-Based Approach

Just as the model-based OLive-M approach can be leveraged for the log-based approach, logs can be integrated into the model-based OLive-M approach to provide more realistic and possibly faster solutions. Two possibilities are considered: checking the solution Parikh vector X of the OLive-M approach against the corresponding traces, or directly inserting the successful partial traces of the log as a Parikh vector X into the marking equation.

Using Logs to Address the Problem of Replayability:

The first option of checking the solution Parikh vector X of the OLive-M  approach with the corresponding traces addresses the problem of replayability. In contrast to the replayability of the Parikh vector, the log replay has the advantage that the traces in the partitioned log already contain an order that can be deduced by the trace tuple or by the timestamp of the corresponding trace event, if provided. After the ILP model involving the OLive-M  state equation is solved, the resulting X vector is related to the logs. The solution also provides a live marking \(m_{live}\) that consists of the obstructed marking \(m_\otimes \) and the addition of the tokens in \(\Delta \).

Instead of checking the replayability of the solution vector X, it could be checked whether the transitions or events in X are completely contained in some of the successful traces. To filter only those traces that correspond to the solution, X must be fully contained in the Parikh vector of each considered trace \(\sigma _{uttS}\), i.e., \(\sigma _{uttS}\in L_{utt\text {S}}\) and \(\widehat{\sigma _{uttS}} \ge X\). For each identified trace \(\sigma _{uttS}\), it can be checked whether the replay of the trace without the events in the Parikh solution, i.e., \(\widehat{\sigma _{uttS}}- X\), ends in \(m_{live}\). This could be checked either by a SecANet replay or directly by the marking equation \(m_{live} = m_0 + A(\widehat{\sigma _{uttS}}- X)\). The replay on the model directly excludes those traces that have gaps in their firing sequences, which can result from subtracting the Parikh vector. In contrast, the extent to which checking via the marking equation alone is sufficient would have to be examined.

Finally, if the live marking obtained after replaying the sequence is equal to the live marking obtained by the OLive-M solution by adding \(\Delta \) to \(m_\otimes \), then X is replayable because it contains all the events that have not yet been replayed from the successful trace. This combined model- and log-based approach excludes spurious solutions and eliminates the possibly exhaustive replay analysis of the Parikh vectors, as described in Chapter 4. Moreover, relating the ILP solution of X to a limited set of successful traces does not necessarily restrict the possible ILP solutions. If no corresponding trace is available to check X, then the replayability of X can still be examined, as described in Chapter 4. By using logs to check replayability, the combination of the model and the log thus provides a way to obtain a solution that resolves obstructions efficiently, assuming that a small ILP instance can be solved efficiently.
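The filtering of successful traces and the check via the marking equation can be sketched as follows. The toy net below is a hypothetical, loosely SecANet-like example and not the encoding of Chapter 3: place `pu2` holds a token that user u2 needs to fire `t2_u2` and that is consumed if u2 already executes t1. The effect table plays the role of the columns of the incidence matrix A.

```python
from collections import Counter

# Hypothetical net: each transition maps to its column of the incidence matrix.
EFFECT = {
    "t1_u1": {"p0": -1, "p1": 1},
    "t1_u2": {"p0": -1, "pu2": -1, "p1": 1},
    "t2_u2": {"p1": -1, "pu2": -1, "p2": 1},
}


def marking_after(m0, counts, effect):
    """Evaluate m0 + A * counts, dropping places without tokens."""
    marking = dict(m0)
    for t, n in counts.items():
        for place, change in effect[t].items():
            marking[place] = marking.get(place, 0) + n * change
    return {p: v for p, v in marking.items() if v != 0}


def matching_traces(successful, x, m0, m_live, effect):
    """Keep traces whose Parikh vector covers X and whose remainder
    (Parikh vector minus X) leads from m0 to m_live by the marking equation."""
    target = {p: v for p, v in m_live.items() if v != 0}
    matches = []
    for trace in successful:
        parikh = Counter(trace)
        if any(parikh[t] < n for t, n in x.items()):
            continue  # X is not contained in this trace
        remainder = {t: parikh[t] - x.get(t, 0) for t in parikh}
        if marking_after(m0, remainder, effect) == target:
            matches.append(trace)
    return matches


m0 = {"p0": 1, "pu2": 1}
m_live = {"p1": 1, "pu2": 1}   # obstructed marking {p1: 1} plus Delta {pu2: 1}
x = {"t2_u2": 1}               # assumed ILP solution of the OLive-M approach
result = matching_traces([("t1_u1", "t2_u2")], x, m0, m_live, EFFECT)
```

As noted above, the marking equation is only a necessary condition for reachability, so a SecANet replay of the remaining prefix may still be required to rule out gaps in the firing sequence.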

Using Logs to Address the Problem of Larger ILP Instances:

The second option of inserting successful partial traces of the log directly as the Parikh vector X in the marking equation allows for solving a system of linear equations instead of an ILP instance. A simplifying assumption for this method is that only successful traces with the same tasks in the same order as in the obstructed trace are selected. The completion segment of such a successful trace \(\sigma _{uttS}\in L_{utt\text {S}}\) is denoted as \(\sigma _{uttS|\sigma _{tu\otimes }|}\), where \(\sigma _{tu\otimes }\) denotes the sequence that leads to the obstruction marking.

As mentioned above, this choice of the completion segment can be significantly more multi-faceted and differentiated. Although the simplifying assumption allows for a fast identification of possible solutions by setting \(X=\widehat{\sigma _{uttS|\sigma _{tu\otimes }|}}\), additional solutions could be lost. Once these values are assigned to X, every linear equation has only one independent variable, such that \(\Delta \) can be solved directly. Based on these observations, the solutions can be further restricted by \(\Delta \) such that \(0 \le \Delta \le 2\) in which the result only allows for the addition of single tokens. A solution means that the process can be completed by adding the obtained tokens \(\Delta \) and firing the completion sequence \(\sigma _{uttS|\sigma _{tu\otimes }|}\). As before, the costs can be identified for each possible solution based on the assigned cost in the model. After solving the equations for each possible successful trace, the solution with the least cost can be identified. While this approach is efficient because only systems of linear equations are solved, solving smaller ILP instances may yield a more exhaustive set of solutions. With increasing problem sizes, a considerable amount of space is required, but the problem remains efficiently solvable compared to larger ILP instances.
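The second option can be sketched as follows. The sketch assumes that the state equation has the form \(m_{final} = m_\otimes + \Delta + A X\), where the name \(m_{final}\) for the marking of a completed execution is introduced here for illustration only. With X fixed by a completion segment, each place yields one equation with \(\Delta \) as its only unknown. The toy net, the cost values, and the parameter `upper` for the bound on \(\Delta \) are hypothetical.

```python
from collections import Counter

# Hypothetical net: either u2 or u3 may complete t2, each needing its own token.
EFFECT = {
    "t2_u2": {"p1": -1, "pu2": -1, "p2": 1},
    "t2_u3": {"p1": -1, "pu3": -1, "p2": 1},
}


def solve_delta(m_obstructed, m_final, x, effect):
    """Solve m_final = m_obstructed + delta + A * x for delta, place by place."""
    places = set(m_obstructed) | set(m_final)
    for t in x:
        places |= set(effect[t])
    delta = {}
    for p in sorted(places):
        fired = sum(n * effect[t].get(p, 0) for t, n in x.items())
        delta[p] = m_final.get(p, 0) - m_obstructed.get(p, 0) - fired
    return delta


def cheapest_solution(segments, m_obstructed, m_final, effect, place_cost, upper):
    """Insert each completion segment as X and keep the cheapest valid delta."""
    best = None
    for segment in segments:
        delta = solve_delta(m_obstructed, m_final, Counter(segment), effect)
        if any(v < 0 or v > upper for v in delta.values()):
            continue  # violates the bound on the added tokens
        cost = sum(place_cost.get(p, 0) * v for p, v in delta.items())
        if best is None or cost < best[0]:
            best = (cost, segment, {p: v for p, v in delta.items() if v})
    return best


best = cheapest_solution(
    segments=[("t2_u2",), ("t2_u3",)],
    m_obstructed={"p1": 1},
    m_final={"p2": 1},
    effect=EFFECT,
    place_cost={"pu2": 5, "pu3": 2},
    upper=2,  # bound on Delta as stated in the text
)
```

Because no search over X is involved, the effort grows linearly with the number of candidate segments and places, which is the source of the efficiency discussed above.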

In summary, while the OLive-M approach may be practical for smaller ILP instances, it must be determined, depending on the size of the log and the possibly larger ILP instances, which of the two options presented above is more suitable for enhancing the OLive-M approach based on logs. The logs can thus build a solution base to be included if necessary. In both approaches described in this section, the finite nature of the logs could be used to ease computation and limit solutions to realistic possibilities. The additional computing steps are light, such as the linear search that compares the X vector with each Parikh vector of the successful traces. In addition, by the Parikh mapping, all traces can be transferred as points into an n-dimensional space, such that one point encodes multiple traces that execute the same events in a different order. This simplified representation of the traces to be checked reduces the search space and memory requirements. A log already contains a certain degree of evidence for how the real-world executions may work, so by using logs, the results of the model-based approach can be adjusted to reality. As a drawback of using logs, the theoretically conceivable solutions that do not appear in the log are potentially suppressed. Complementing logs with successful traces synthesized from the SecANet model can be considered to counter this scenario. However, because synthesis means computing the reachability, this must be done in a workable way. Otherwise, only checking the replayability when required is more appropriate.
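The reduction achieved by the Parikh mapping can be illustrated with a short sketch. The alphabet, the example traces, and the function names are hypothetical; the point is only that traces differing in order collapse into one point, so that the comparison with X is performed once per point rather than once per trace.

```python
from collections import Counter, defaultdict


def group_by_parikh(traces, alphabet):
    """Map each trace to its Parikh vector and group traces by that point."""
    groups = defaultdict(list)
    for trace in traces:
        counts = Counter(trace)
        point = tuple(counts[a] for a in alphabet)
        groups[point].append(trace)
    return dict(groups)


def points_covering(groups, x_point):
    """Linear search for the points that contain X component-wise."""
    return [point for point in groups
            if all(c >= x for c, x in zip(point, x_point))]


alphabet = ("a", "b", "c", "d")
traces = [("a", "b", "c", "d"), ("a", "c", "b", "d"), ("a", "b", "d")]
groups = group_by_parikh(traces, alphabet)

# X requires one firing each of "c" and "d".
covering = points_covering(groups, (0, 0, 1, 1))
```

Here, three traces are reduced to two points, and only the point (1, 1, 1, 1), which stands for two traces, covers X.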