Visualizing Trace Variants From Partially Ordered Event Data

Executing operational processes generates event data, which contain information on the executed process activities. Process mining techniques make it possible to systematically analyze event data to gain insights that are then used to optimize processes. Visual analytics for event data are essential for the application of process mining. Visualizing unique process executions, also called trace variants (i.e., unique sequences of executed process activities), is a common technique implemented in many scientific and industrial process mining applications. Most existing visualizations assume a total order on the executed process activities, i.e., these techniques assume that process activities are atomic and were executed at a specific point in time. In reality, however, the executions of activities are not atomic. Multiple timestamps are recorded for an executed process activity, e.g., a start-timestamp and a complete-timestamp. Therefore, the executions of process activities may overlap and, thus, cannot be represented as a total order if more than one timestamp is to be considered. In this paper, we present a visualization approach for trace variants that incorporates start- and complete-timestamps of activities.


Introduction
The execution of operational processes, e.g., business and production processes, is often supported by information systems that record process executions in detail. We refer to such recorded information as event data. The analysis of event data is of great importance for organizations to improve their processes. Process mining [1] offers various techniques for systematically analyzing event data, e.g., to learn a process model, to check compliance, and to obtain performance measures. These insights into the processes can then be used to optimize them.
As in other data analysis applications, visual analytics for event data are important in the application of process mining. A state-of-the-art process mining methodology [6] lists process analytics, including visual analytics, as a key component next to the classic fields of process mining: process discovery, conformance checking, and process enhancement. A visualization approach that is used across various process mining tools, ranging from industry to scientific tools, is called the variant explorer. Consider Figure 1a for an example. In classic trace variant visualizations, a variant describes a unique sequence of executed process activities. Thus, a strict total order on the contained activities is required to visualize such a sequence. Recorded timestamps of the executed activities are usually used for ordering them.
This classic trace variant visualization has two main limitations. (1) Assume atomic process activities, i.e., a single timestamp is recorded for each process activity. A strict total order cannot be derived if multiple activities have the same timestamp. In such cases, the sequential visualization of process activities, which indicates temporal execution, is problematic because a secondary criterion is needed to obtain a strict total order. (2) In many real-life scenarios, process activities are performed over time, i.e., they are non-atomic. Thus, the executions of activities may overlap. Consider Figure 2a for an example. Considering both start- and complete-timestamps, a strict total order cannot be obtained if the executions of activities overlap. The classic trace variant explorer usually splits each activity into a start and a complete event, as shown in Figure 1b, to obtain atomic activities. However, the parallel behavior of activities is not easily discernible from this visualization. In addition, the first limitation remains.
In this paper, we propose a novel visualization of trace variants to overcome the two aforementioned limitations. We define a variant as an interval order, which can be represented as a graph. For instance, Figure 2b shows the interval order of the two process executions shown in Figure 2a. The graph representation of an interval order (cf. Figure 2b) is, however, not easy to read compared to the classic trace variant explorer (cf. Figure 1). Therefore, we propose an approach to derive a visualization from interval orders representing trace variants.
The remainder of this paper is structured as follows. Section 2 presents related work. Section 3 introduces concepts and definitions used throughout this paper. Section 4 introduces the proposed visualization approach. Section 5 presents an experimental evaluation, and Section 6 concludes this paper.

Related Work
For a general overview of process mining, we refer to [1]. Note that the majority of process mining techniques assume totally ordered event data. For example, in process discovery, few algorithms exist that utilize life cycle information, i.e., more than one timestamp, of the recorded process activities. For instance, the Inductive Miner algorithm has been extended in [9] to utilize start- and complete-timestamps of process activities. In conformance checking, too, there exist algorithms that utilize life cycle information, e.g., [10]. A complete overview of techniques utilizing life cycle information is outside the scope of this paper.
In [6], the authors present a methodology for conducting process mining projects and highlight the importance of visual analytics. In [8], open challenges regarding visual analytics in process mining are presented. The visualization of time-oriented event data, which is the topic of this paper, is identified as a challenge.
The classic variant explorer as shown in Figure 1 can be found in many different process mining tools, e.g., in ProM, which is an open-source process mining software tool. In [3], the authors present a software tool to visualize event data. Various visualizations of event data are offered; however, a variant explorer, as considered in this paper, is not available. In [2], the authors present a plugin for ProM to visualize partially ordered event data. The approach considers events to be atomic, i.e., an event representing the start and an event representing the completion of an activity are considered to be separate events. Based on a user-selected time granularity, events within the same time segment are aggregated, i.e., they are considered and visualized to be executed in parallel. This offers the advantage that the user can change the visualization depending on how accurately the timestamps are to be interpreted. In contrast, our approach considers non-atomic activity instances, i.e., we map the start and complete events of a process activity to an activity instance. We then relate these activity instances to each other instead of atomic events as proposed in [2]. Therefore, both approaches, the one presented in [2] and the one presented in this paper, can coexist, and each has its advantages and disadvantages.

Preliminaries
In this section, we present concepts and definitions used within this paper.
Event data describe the historical execution of processes. Table 1 shows an example of such event data. Each row corresponds to an event, i.e., in the given example, an activity instance. For example, the first event, identified by event-id 1, recorded that activity A was executed from 08:00 until 09:30 on 07/13/2021 within the process instance identified by case-id 1.
In general, activity instances describe the execution of a process activity within a specific case. A case describes a single execution of a process, i.e., a process instance, and it is formally a set of activity instances that have been executed for the same case. Activity instances consist of at least the following attributes: an identifier, a case-id, an activity label, a start-timestamp, and a complete-timestamp. Since we are only interested in the order of activity instances within a case and not in possible additional attributes of an activity instance, we define activity instances as a 5-tuple.
Definition 1 (Universes). $\mathcal{T}$ is the universe of totally ordered timestamps. $\mathcal{L}$ is the universe of activity labels. $\mathcal{C}$ is the universe of case identifiers. $\mathcal{I}$ is the universe of activity instance identifiers.
Definition 2 (Activity Instance). An activity instance $(i, c, l, t_s, t_c) \in \mathcal{I} \times \mathcal{C} \times \mathcal{L} \times \mathcal{T} \times \mathcal{T}$ describes the execution of an activity labeled $l$ within the case $c$. The start-timestamp of the activity's execution is $t_s$, and the complete-timestamp is $t_c$, where $t_s \le t_c$. Each activity instance is uniquely identifiable by $i$. We denote the universe of activity instances by $\mathcal{A}$.
Note that any event log with only one timestamp per executed activity can also be easily expressed in terms of activity instances, i.e., $t_s = t_c$. For a given activity instance $a = (i, c, l, t_s, t_c) \in \mathcal{A}$, we define projection functions: $\pi_i(a) = i$, $\pi_c(a) = c$, $\pi_l(a) = l$, $\pi_{t_s}(a) = t_s$, and $\pi_{t_c}(a) = t_c$.

Definition 3 (Event Log). An event log $E$ is a set of activity instances, i.e., $E \subseteq \mathcal{A}$ such that $\forall a_1, a_2 \in E: \pi_i(a_1) = \pi_i(a_2) \Rightarrow a_1 = a_2$. We denote the universe of event logs by $\mathcal{E}$.
For a given event log $E \in \mathcal{E}$, we refer to the set of activity instances executed within a given case $c \in \mathcal{C}$ as a trace, i.e., $T_c = \{a \in E \mid \pi_c(a) = c\}$. As shown in Figure 2a, we can visualize a trace and its activity instances in a time plot.
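Definitions 2 and 3 and the notion of a trace can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the class name `ActivityInstance`, the functions `traces` and `ts`, and all rows except the first (which mirrors the first event described for Table 1) are assumptions made for this example.

```python
from collections import defaultdict
from datetime import datetime
from typing import NamedTuple


class ActivityInstance(NamedTuple):
    """An activity instance (i, c, l, t_s, t_c) in the sense of Definition 2."""
    id: int
    case: str
    label: str
    start: datetime
    complete: datetime


def traces(event_log):
    """Group an event log (a set of activity instances) into traces T_c."""
    by_case = defaultdict(set)
    for a in event_log:
        if a.start > a.complete:  # Definition 2 requires t_s <= t_c
            raise ValueError(f"instance {a.id}: start after completion")
        by_case[a.case].add(a)
    return dict(by_case)


def ts(hhmm):
    """Hypothetical helper: timestamp on 2021-07-13 from an 'HH:MM' string."""
    return datetime.strptime(f"2021-07-13 {hhmm}", "%Y-%m-%d %H:%M")


log = {
    ActivityInstance(1, "1", "A", ts("08:00"), ts("09:30")),  # first row of Table 1
    ActivityInstance(2, "1", "B", ts("08:30"), ts("10:00")),  # illustrative
    ActivityInstance(3, "2", "A", ts("09:00"), ts("09:00")),  # single timestamp: t_s = t_c
}
by_case = traces(log)
```

Using a set of immutable tuples keeps the sketch close to the set-based definitions; uniqueness of identifiers is not enforced here.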
Note that each activity instance $a = (i, c, l, t_s, t_c) \in \mathcal{A}$ defines an interval on the timeline, i.e., $[t_s, t_c]$. A collection of intervals (in this paper, we focus on traces) defines an interval order. In general, given two activity instances $a_1, a_2 \in \mathcal{A}$, we say $a_1 < a_2$ iff $\pi_{t_c}(a_1) < \pi_{t_s}(a_2)$. Note that interval orders are a proper subclass of strict partial orders [7]; hence, interval orders are irreflexive, transitive, and asymmetric. Interval orders additionally satisfy the interval order condition, i.e., for any $x, y, w, z$: $x < w \wedge y < z \Rightarrow x < z \vee y < w$ [7].
In this paper, we represent an interval order as a directed, labeled graph that consists of vertices $V$, representing activity instances, and directed edges $E \subseteq V \times V$, representing ordering relations between activity instances. Figure 2b shows the interval order of the traces shown in Figure 2a. We observe that the first two activity instances labeled with A and B are incomparable to each other because there is no arc from A to B or vice versa. Thus, the first execution of A and the execution of B take place in parallel, i.e., their intervals overlap. In contrast, activity C is related to F, G, and the second execution of A. Thus, C is executed before F, G, and the second execution of A. Next, we formally define the construction of the directed graph representing the interval order of a trace.
Definition 4 (Interval Order of a Trace). Given a trace $T_c \subseteq \mathcal{A}$, we define the corresponding interval order as a labeled, directed graph $(V, E, \lambda)$ consisting of vertices $V$, directed edges $E \subseteq V \times V$, and a labeling function $\lambda: V \to \mathcal{L}$. The set of vertices is defined by $V = T_c$ with $\lambda(a) = \pi_l(a)$. Given two activity instances $a_1, a_2 \in T_c$, there is a directed edge $(\pi_i(a_1), \pi_i(a_2)) \in E$ iff $\pi_{t_c}(a_1) < \pi_{t_s}(a_2)$. We denote the universe of interval orders by $\mathcal{P}$.
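The precedence relation and the interval order condition can be checked by brute force on small examples. The following sketch assumes intervals are given as plain `(start, complete)` pairs; the function names `precedes` and `is_interval_order` are illustrative. Asymmetry is not tested separately because it follows from irreflexivity and transitivity.

```python
from itertools import product


def precedes(a, b):
    """a < b iff a completes strictly before b starts; a, b are (start, complete)."""
    return a[1] < b[0]


def is_interval_order(items, less):
    """Brute-force check of irreflexivity, transitivity, and the interval order condition."""
    items = list(items)
    if any(less(x, x) for x in items):
        return False
    for x, y, z in product(items, repeat=3):
        if less(x, y) and less(y, z) and not less(x, z):
            return False
    for x, y, w, z in product(items, repeat=4):
        # x < w and y < z must imply x < z or y < w
        if less(x, w) and less(y, z) and not (less(x, z) or less(y, w)):
            return False
    return True


# Illustrative intervals; any set of intervals with start <= complete passes.
intervals = [(0, 3), (1, 4), (2, 6), (5, 8), (7, 9)]

# A strict partial order that is NOT an interval order: two unrelated chains
# a < b and c < d violate the interval order condition.
two_chains = {("a", "b"), ("c", "d")}
```

The second example shows why the extra condition matters: it is a valid strict partial order, yet no assignment of intervals can produce it.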
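As a small sketch of Definition 4, the graph can be built by comparing every pair of activity instances. The representation below (a dictionary from instance identifier to `(label, start, complete)` with integer timestamps) and the function name `interval_order` are assumptions chosen for brevity.

```python
def interval_order(trace):
    """Build (V, E, labels) from a trace given as {id: (label, start, complete)}."""
    vertices = set(trace)
    edges = {
        (i, j)
        for i in vertices
        for j in vertices
        if trace[i][2] < trace[j][1]  # complete(i) < start(j)
    }
    labels = {i: trace[i][0] for i in vertices}
    return vertices, edges, labels


# Illustrative trace: A and B overlap, and both complete before C starts.
trace = {
    1: ("A", 0, 3),
    2: ("B", 1, 4),
    3: ("C", 5, 6),
}
V, E, lam = interval_order(trace)
```

In this example, `E` contains the edges `(1, 3)` and `(2, 3)` only; the missing edge between instances 1 and 2 encodes that A and B overlap. The pairwise comparison is quadratic in the trace length, which is acceptable for a sketch.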
Next, we define the induced interval order.

Visualizing Trace Variants
This section introduces the proposed approach to visualize trace variants from partially ordered event data. Section 4.1 introduces the approach, and Section 4.2 proves that the approach is deterministic. Section 4.3 discusses the potential limitations of the approach. Finally, Section 4.4 covers the implementation.

Approach
The proposed visualization approach of trace variants is based on chevrons, a graphical element known from classical trace variant visualizations (cf. Figure 1). Figure 3 shows an example of the proposed visualization for the interval order given in Figure 2b. The interpretation of a chevron as indicating sequential order is maintained in our approach. Additionally, chevrons can be nested and stacked on top of each other. Stacked chevrons indicate parallel/overlapping execution of activities. Nested chevrons relate groups of activities to each other. In the given example, the first chevron indicates that C is executed in parallel to A, B, D, and E. The two upper chevrons indicate that A and B are executed in parallel, but are executed before D and E, both of which are also executed in parallel.
The proposed approach assumes an interval order, representing a trace variant, as input and recursively partitions the interval order by applying cuts to compute the layout of the visualization (cf. Figure 3). In general, a cut is a partition of the nodes of a given interval order. Based on the partition, induced interval orders are derived. Each application of such a cut corresponds to chevrons and their positioning in the final visualization, e.g., stacked or side-by-side chevrons. Nested chevrons result from the recursive application of cuts. Next, we define the computation of the proposed layout, i.e., we define two types of cuts.
An ordering cut partitions the activity instances into sets such that these sets can be totally ordered, i.e., all activity instances within a set can be related to all other activity instances from other sets. In terms of the graph representation of an interval order, this implies that all nodes from one partition have a directed edge to all nodes from the other partition(s). We depict an example of an ordering cut in Figure 4. Note that all nodes in $V_1$ are related to all nodes in $V_2$ and $V_3$. Next, we formally define an ordering cut for an interval order.
Definition 6 (Ordering Cut). Assume an interval order $(V, E, \lambda) \in \mathcal{P}$. An ordering cut describes a partition of the nodes $V$ into $n > 1$ non-empty subsets $V_1, \dots, V_n$ such that: $\forall 1 \le i < j \le n\ \forall v \in V_i, v' \in V_j: (v, v') \in E$.
A parallel cut indicates that activity instances from one partition overlap in time with activity instances in the other partition(s), i.e., activity instances from different partitions are unrelated to each other. Thus, we are looking for components in the graph representation of an interval order.
Definition 7 (Parallel Cut). Assume an interval order $(V, E, \lambda) \in \mathcal{P}$. A parallel cut describes a partition of the nodes $V$ into $n > 1$ non-empty subsets $V_1, \dots, V_n$ such that $V_1, \dots, V_n$ represent connected components of $(V, E, \lambda)$, i.e., $\forall 1 \le i < j \le n\ \forall v \in V_i, v' \in V_j: (v, v') \notin E \wedge (v', v) \notin E$.
We call a cut maximal if $n$, i.e., the number of subsets, is maximal.
Figure 5 shows an example of the proposed visualization approach. We use the interval order from Figure 2b as input. The visualization approach recursively looks for a maximal ordering or parallel cut. In the example, we initially find an ordering cut of size three (cf. Figure 5a). Given the cut, we create three induced interval orders (cf. Figure 5b). As stated before, each induced interval order created by a cut represents a chevron. In general, an ordering cut indicates the horizontal alignment of chevrons, while a parallel cut indicates the vertical alignment of chevrons. Since we found an ordering cut of size three, the intermediate visualization consists of three horizontally aligned chevrons (cf. Figure 5c). If an induced interval order consists of only one element (e.g., the third induced interval order in Figure 5b), we fill the corresponding chevron with a color that is unique for the given activity label (cf. Figure 5c). As in the classic trace variant explorer, colors are used to better distinguish different activity labels.
We now recursively apply cuts to the induced interval orders. In the first two interval orders, we apply a parallel cut (cf. Figure 5d). The third interval order consists of only one node, labeled with G; thus, no further cuts can be applied. Figure 5e shows the induced interval orders after applying the two parallel cuts. As stated before, time-overlapping activity instances are indicated by stacked chevrons. Since both applied parallel cuts have size two, we create two stacked chevrons within the first chevron and two within the second chevron (cf. Figure 5f). After another ordering cut (cf. Figures 5g to 5i) and two more parallel cuts (cf. Figure 5j), the visualization approach stops because all induced interval orders consist of only one activity instance. Figure 5l shows the final visualization.
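The recursive procedure walked through above can be sketched as follows. This is not the Cortado implementation; it is one possible realization under stated assumptions. In particular, the maximal ordering cut is computed here as the connected components of the graph of unrelated (overlapping) pairs, sorted by the number of predecessors, which is a choice made for this sketch. The names `components`, `ordering_cut`, `parallel_cut`, and `layout`, the nested-tuple output, and the example timestamps (loosely modeled on the description of Figure 3) are all illustrative.

```python
def components(nodes, adjacent):
    """Connected components of an undirected graph given by an adjacency test."""
    remaining, result = set(nodes), []
    while remaining:
        stack = [remaining.pop()]
        comp = set(stack)
        while stack:
            u = stack.pop()
            linked = {v for v in remaining if adjacent(u, v)}
            remaining -= linked
            comp |= linked
            stack.extend(linked)
        result.append(comp)
    return result


def parallel_cut(nodes, edges):
    """Maximal parallel cut: components of the order graph, ignoring edge direction."""
    return components(nodes, lambda u, v: (u, v) in edges or (v, u) in edges)


def ordering_cut(nodes, edges):
    """Maximal ordering cut: components of the graph of unrelated pairs, in order."""
    parts = components(nodes, lambda u, v: (u, v) not in edges and (v, u) not in edges)

    def in_degree(part):
        v = next(iter(part))  # any representative works for sorting
        return sum((u, v) in edges for u in nodes)

    return sorted(parts, key=in_degree)


def layout(nodes, edges, labels):
    """Recursively apply maximal cuts; returns a nested description of chevrons."""
    if len(nodes) == 1:
        return labels[next(iter(nodes))]
    for kind, cut in (("seq", ordering_cut), ("par", parallel_cut)):
        parts = cut(nodes, edges)
        if len(parts) > 1:
            children = [layout(p, edges, labels) for p in parts]
            if kind == "par":
                children.sort(key=repr)  # stacking order is arbitrary; fix it
            return (kind, children)
    # No cut applicable: show all activities in a single chevron.
    return ("unordered", sorted(labels[v] for v in nodes))


# Assumed timestamps: id -> (label, start, complete)
intervals = {
    1: ("A", 0, 2), 2: ("B", 1, 3), 3: ("C", 1, 6), 4: ("D", 4, 6),
    5: ("E", 5, 7), 6: ("F", 8, 10), 7: ("A", 9, 11), 8: ("G", 12, 13),
}
edges = {(u, v) for u in intervals for v in intervals
         if intervals[u][2] < intervals[v][1]}
labels = {k: v[0] for k, v in intervals.items()}
result = layout(set(intervals), edges, labels)
```

Here `"seq"` corresponds to horizontally aligned chevrons and `"par"` to stacked chevrons. Restricting the node set while reusing the full edge set has the same effect as deriving the induced interval order, since only pairs within the current node set are ever tested.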

Formal Guarantees
Next, we show that the proposed approach is deterministic, i.e., the same visualization is always returned for the same interval order. We therefore show that different cuts cannot coexist, i.e., either a parallel cut, an ordering cut, or no cut exists in an interval order. Further, we show that maximal cuts are unique.

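Lemma 1 (Cuts Cannot Coexist). In an interval order $(V, E, \lambda) \in \mathcal{P}$, a parallel and an ordering cut cannot coexist.
Proof. Let $(V, E, \lambda) \in \mathcal{P}$ be an interval order with an ordering cut $V_1, \dots, V_n$ for some $n \ge 2$. Assume there exists a parallel cut, too, i.e., $V'_1, \dots, V'_m$ for some $m \ge 2$. Consider an arbitrary $v \in V$ with $v \in V'_j$ for some $1 \le j \le m$ and $v \in V_i$ for some $i \in \{1, \dots, n\}$. Since an ordering cut exists, we know that $(v, w) \in E$ for all $w \in V_{i+1} \cup \dots \cup V_n$ and $(w', v) \in E$ for all $w' \in V_1 \cup \dots \cup V_{i-1}$. Since no edges exist between different subsets of a parallel cut, also all $w$ and $w'$ must be in $V'_j$. Because $n \ge 2$, at least one such $w$ or $w'$ exists, and every other node in $V_i$ is related to it in the same way; hence, these nodes are in $V'_j$, too. Hence, $V'_j = V$ and all other subsets are empty because $V'_1, \dots, V'_m$ is a partition of $V$. This contradicts our assumption that there exists a parallel cut, too. The other direction is symmetrical.
Since cuts cannot coexist (cf. Lemma 1), at most one type of cut is applicable for a given interval order. Next, we show that maximal cuts are unique.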
Lemma 2 (Maximal Ordering Cuts Are Unique). If an ordering cut exists in a given interval order (V, E, λ)∈P, the maximal ordering cut is unique.
Lemma 3 (Maximal Parallel Cuts Are Unique). If a parallel cut exists in a given interval order $(V, E, \lambda) \in \mathcal{P}$, the maximal parallel cut is unique.
Proof (Lemma 3). By definition, components of a graph are unique.
Lemma 2 and Lemma 3 show that maximal cuts, both ordering and parallel, are unique. Together with Lemma 1, we derive that the proposed visualization approach is deterministic, i.e., the approach always returns the same visualization for the same input, because for a given interval order only one cut type is applicable at most and if a cut exists, the maximal cut is unique.

Limitations
In this section, we discuss the limitations of the proposed visualization approach.
Reconsider the example in Figure 5. Cuts are recursively applied until one node, i.e., an activity instance, remains in each induced interval order (cf. Figure 5k). However, there are certain cases in which the proposed approach cannot apply cuts although more than one node exists in an (induced) interval order. Consider Figure 6a, which shows an example of a trace to which no cuts can be applied. Since each activity instance overlaps with some other activity instance, we cannot apply an ordering cut. Also, since there is no activity instance that overlaps with all other activity instances, we cannot apply a parallel cut. Note that the visualized pattern of chained activity instances can be arbitrarily extended by adding more activity instances vertically and horizontally, indicated by the dots in Figure 6a. Figure 6b shows the corresponding interval order.
For the example trace, the proposed approach visualizes the activities A, . . . , F within a single chevron, indicating that the activities are executed in an unspecified order (cf. Figure 6c). Thus, the visualization highly simplifies the observed process behavior in such cases. Alternatively, it would be conceivable to show the interval order within a chevron if an (induced) interval order cannot be cut anymore. However, we decided to keep the visualization simple and show all activities within a single chevron. Note that this design decision entails that the expressiveness of the proposed visualization is lower than the graphical notation of interval orders, i.e., different interval orders can have the same visualization.
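Whether a trace falls into this limitation can be tested directly on its intervals. The sketch below is an illustration under assumptions, not taken from the paper: it treats a trace as uncuttable when both the graph of related pairs and the graph of unrelated pairs are connected, and it uses a four-instance chain that is smaller than the pattern in Figure 6a. The names `is_connected` and `has_no_cut` and the timestamps are hypothetical.

```python
def is_connected(nodes, adjacent):
    """True if the undirected graph defined by `adjacent` is connected."""
    nodes = list(nodes)
    seen, stack = {nodes[0]}, [nodes[0]]
    while stack:
        u = stack.pop()
        for v in nodes:
            if v not in seen and adjacent(u, v):
                seen.add(v)
                stack.append(v)
    return len(seen) == len(nodes)


def has_no_cut(intervals):
    """True if neither a parallel nor an ordering cut exists.

    `intervals` maps an instance name to (start, complete).
    """
    def related(u, v):
        return intervals[u][1] < intervals[v][0] or intervals[v][1] < intervals[u][0]

    def unrelated(u, v):
        return u != v and not related(u, v)

    return (len(intervals) > 1
            and is_connected(intervals, related)      # rules out a parallel cut
            and is_connected(intervals, unrelated))   # rules out an ordering cut


# Chained instances: A < C, B < C, B < D; all other pairs overlap.
chain = {"A": (0, 3), "B": (0, 1), "C": (4, 5), "D": (2, 5)}
sequential = {"A": (0, 1), "B": (2, 3)}
```

For `chain`, both graphs are connected, so the approach would place A to D in a single chevron; for `sequential`, an ordering cut exists and the check returns `False`.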

Implementation
The proposed visualization approach for partially ordered event data has been implemented in Cortado [12], which is a standalone tool for interactive process discovery. Figure 7 shows a screenshot of Cortado visualizing an event log with partially ordered events. The implemented trace variant explorer works for both partially and totally ordered event data. The tool assumes an event log in the .xes format as input. If the provided event log contains start- and complete-timestamps, the visualization approach presented in this paper is applied.

Evaluation
In this section, we evaluate the proposed visualization approach. We focus on the performance of computing the visualization and on the limitation discussed in Section 4.3, i.e., cases in which no further cuts can be applied although the (induced) interval order has more than one element. We use publicly available, real-life event logs [4,5,11]. Table 2 shows the results. The first three columns show information about the logs. Two logs [4,5] contain start- and complete-timestamps per activity instance, while one log [11] contains only a single timestamp per activity instance. Regarding the total calculation time, we note that the duration of the visualization calculation is reasonable from a practical point of view. We observe that the recursive application of cuts takes up most of the computation time in all logs, as expected. Regarding the variants, we observe that the number of classic variants is higher than the number of variants derived from the interval order for all event logs. We observe this even for the third event log [11] because some activities within the cases share the same timestamp. Regarding the limitations of the approach, as discussed in Section 4.3, we observe that only the first log [5] is affected: in approximately 6% of its trace variants, patterns occur for which no further cuts could be applied. Note that the limitation cannot occur in event logs where only a single timestamp per activity is available, e.g., [11].

Conclusion
This paper introduced a novel visualization approach for partially ordered event data. Based on chevrons, known from the classic trace variant explorer, our approach visualizes the ordering relations between activity instances in a hierarchical manner. Our visualization makes it easy to identify common patterns in trace variants from partially ordered event data. The approach has been implemented in the tool Cortado and has been evaluated on real-life event logs.