1 Introduction

In the last few years, object-centric event logs have been proposed as the next step forward in event log representation. The drive behind this is the fact that the eXtensible Event Stream (XES) standard [15] with a single case notion does not allow capturing reality adequately [14]. A more realistic assumption instead is to view a process as a sequence of events that interact with several objects. Several object-centric event log representations have been proposed such as eXtensible Object-Centric (XOC) event logs [18], Object-Centric Behavioral Constraint model (OCBC) [4], and most recently Object-Centric Event Logs (OCEL) [14]. The first two event log representations face scalability issues related to the storage of an object model with each event or to the duplication of attributes [14]. However, there is a difficult trade-off to be made between expressiveness and simplicity, leaving the recent OCEL proposal as the most suitable for object-centric process mining as it strikes a good balance between storing objects, attributes and their relationships and yet keeping everything simple.

OCEL offers interesting new research opportunities not only for process mining with, e.g., object-centric Petri nets [1] or object-centric predictive analysis [11], but also for decision mining [16]. OCEL is already well on its way to becoming an established standard with a visualization tool [12], log sampling and filtering techniques [5], its own fitness and precision notions [2], its own clustering technique [13], an approach to define cases and variants in object-centric event logs [3] and a method to extract OCEL logs from relational databases [23]. In this paper, attributes are considered to be logged together with events and objects in an event log and should relate clearly to their respective concepts, i.e., events, objects or both. As such, OCEL could provide more analysis opportunities by supporting attributes that have several values simultaneously, by allowing attributes to change values over time and by unambiguously linking attributes to objects. None of these is currently fully supported, although all are common in object-centric models such as structural conceptual models like the Unified Modeling Language (UML) [20].

For this purpose, this paper proposes an extension to OCEL called Data-aware OCEL, or DOCEL, which allows for such dynamic object attributes. The findings are illustrated through a widely-used running example for object-centric processes, indicating how this standard can also support the further development of object-centric decision/process mining and other domains such as Internet of Things (IoT) related business processes. This paper also presents an algorithm to convert XES logs to DOCEL logs. Since many event logs are available in a “flat” XES format for every object involved in the process, not all information can be found in one event log. As such, providing an algorithm that merges these XES files into one DOCEL log would centralize all the information in one event log without compromising on the data flow aspects that make XES such an interesting event log format.

The structure of this paper is as follows: Sect. 2 explains the problem together with a running example applied to the standard OCEL format. Section 3 introduces the proposed DOCEL format together with an algorithm to automatically convert XES log files into this novel DOCEL format. Next, the limitations of this work and future work are discussed in Sect. 4. Finally, Sect. 5 concludes this paper.

2 Motivation

The IEEE Task Force conducted a survey during the 2.0 XES workshop, concluding that complex data structures, especially one-to-many or many-to-many object relationships, form a challenge for practitioners when pre-processing event logs. By including multiple objects with their own attributes, object-centric event logs have the opportunity to address these challenges. This does entail that the correct attributes must be unambiguously linked to the correct object and/or activity to correctly discover the process of each object type as well as the relevant decision points [1]. The next subsection discusses the importance that object attribute analysis has had for single case notion event logs.

2.1 Importance of Object Attributes in Single Case Notion Event Logs

Various single case notion process mining algorithms make use of both event and case attributes, e.g., in [7], a framework is proposed to correlate, predict and cluster dynamic behavior using data-flow attributes. Both types of attributes are used to discover decision points and decision rules within a process in [17]. For predictive process monitoring, the authors of [9] develop a so-called clustering-based predictive process monitoring technique using both event and case data. Case attributes are also used to provide explanations of why a certain case prediction is made within the context of predictive process monitoring [10].

The same challenges apply to decision mining, which aims to discover the reasoning and structure of decisions that drive the process based on event logs [22]. In [8], both event and case attributes are used to find attribute value shifts to discover a decision structure conforming to a control flow, and in [19], these are used to discover overlapping decision rules in a business process. Lastly, within an IoT context, it has been pointed out that contextualization is not always understood in the same way as it is in process mining [6]. As such, object-centric event logs offer an opportunity for these different views of contextualization to be better captured.

The previous paragraphs show (without aiming to provide an exhaustive overview) that various contributions made use of attributes that could be stored and used in a flexible manner. Unfortunately, as will be illustrated in the next subsections, the aforementioned aspects related to attribute analysis are currently not fully supported in object-centric event logs.

2.2 Running Example

Consider the following adapted example, inspired by [8], of a simple order-to-delivery process with three object types: Order, Product, Customer. Figure 1 visualizes the process.

A customer places an order with the desired quantity for Product 1, 2 or 3. Next, the order is received and the order is confirmed. This creates the value attribute of order. Afterwards, the ordered products are collected from the warehouse. If a product is a fragile product, it is first wrapped with cushioning material before being added to the package. The process continues and then the shipping method needs to be determined. This is dependent on the value of the order, on whether there is a fragile product and on whether the customer has asked for a refund. If no refund is asked, this finalizes the process. A refund can only be requested once the customer has received the order. If that is the case, the order needs to be shipped back and this finalizes the process.

Fig. 1. BPMN model of running example

2.3 OCEL Applied to the Running Example

In this subsection, the standard OCEL representation visualizes a snippet of this process. Table 1 is an informal OCEL representation of events and Table 2 is an informal OCEL representation of objects. Figure 2 visualizes the meta-model of the original OCEL standard. Several observations can be made about the standard OCEL representation:

Table 1. Informal representation of the events in an OCEL format
Table 2. Informal representation of the objects in an OCEL format

A: Attributes that are stored in the events table cannot unambiguously be linked to an object. The OCEL standard makes the assumption that attributes that are stored in the events table can only be linked to an event. This assumption was made for the sake of simplicity and it holds in this running example, which has straightforward attribute relationships and no changing product values over time. Even though in the given example the attribute names make it obvious how the attributes relate to the objects, this is not always the case. If the value of a product could change over time, the product value attributes would have to be added to the events table but then there would be 4 attributes storing values, i.e., order value, product 1 value, product 2 value and product 3 value. Knowing which attribute is linked to which object would then require domain knowledge as it is not explicitly made clear in the events table. As such, this can be an issue in the future for generic OCEL process discovery or process conformance algorithms since prior to running such an algorithm, the user would have to specify how attributes and objects are related to one another.

B: Based on the OCEL metamodel (Fig. 2), it is unclear whether attributes can only be linked to an event or an object individually or whether an attribute can be linked to both an event and an object simultaneously. Since the OCEL standard, by design, did not intend for attribute values to be shared between events and objects in order to keep things compact and clear, and since the OCEL UML model (Fig. 2) cannot enforce the latter, Object Constraint Language (OCL) constraints would have made things clearer. Nevertheless, it might be beneficial to support the possibility to track an attribute change, e.g., the refund attribute of object Order can change from 0 to 1 and back to 0 across the process.

C: Attributes can only contain exactly one value at a time according to the OCEL metamodel (see Fig. 2). This observation entails two aspects. First, it is unclear, based on the metamodel of Fig. 2, whether an attribute can contain a list of values. It is not difficult to imagine situations with a list of values, e.g., customers with multiple bank accounts or emails, or products with more than one color. Currently, OCEL supports multiple values by creating a separate column for each value in the object or event table. This means that each value is treated as a distinct attribute, e.g., in the running example, a customer orders a quantity of product 1, 2 and 3. This can be considered as 1 attribute with 3 values. However, in Table 1, the columns Q1, Q2 and Q3 are considered to be separate attributes even though they could be considered as being from the same overarching attribute Quantity. Secondly, even if an attribute only has 1 value at a time, its value could change over time as well. Such an attribute can be considered to have multiple values at different points in time. If a value were to change, currently, one would have to create a new object for each attribute change. Unfortunately, this only works to some degree since there are no object-to-object references (only through events) in the standard OCEL format. Another possibility would be to unambiguously trace the value of an attribute of an object back to the event that created it. This is also valid within an IoT context with sensors having multiple measurements of the same attributes over time. As such, the first three observations clearly go hand in hand.

D: Both the event and object tables seem to contain a lot of columns that are not always required for each event or object. When looking at the events table, attribute Order Value is only filled once with event ‘confirm purchase’ when it is set for order 1. One could either duplicate this value for all the next events dealing with order 1 or one could simply keep it empty. Therefore, in a big event log with multiple traces one could expect a lot of zero padding or duplication of values across events. Even though this issue is not necessarily present in a storage format, it still shows that ambiguity about attribute relationships might lead to wrongly stored attributes without domain knowledge.
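Observations A and D can be made concrete with a small sketch. The flat events table below is hypothetical: the column names follow the running example (order value, Q1 to Q3), but the event IDs, object IDs, activities other than 'confirm purchase' and all values are illustrative assumptions, not data from Tables 1 and 2.

```python
# Hypothetical flat, OCEL-style events table (illustrative values only).
ATTRIBUTE_COLUMNS = ["order_value", "q1", "q2", "q3"]

events = [
    {"event_id": "e1", "activity": "place order", "objects": ["c1", "o1", "p1", "p2"],
     "order_value": None, "q1": 5, "q2": 2, "q3": None},
    {"event_id": "e2", "activity": "receive order", "objects": ["o1"],
     "order_value": None, "q1": None, "q2": None, "q3": None},
    {"event_id": "e3", "activity": "confirm purchase", "objects": ["o1"],
     "order_value": 100, "q1": None, "q2": None, "q3": None},
    {"event_id": "e4", "activity": "collect product", "objects": ["o1", "p1"],
     "order_value": None, "q1": None, "q2": None, "q3": None},
]


def empty_cell_ratio(rows, columns):
    """Share of attribute cells that are empty (observation D)."""
    cells = [row[column] for row in rows for column in columns]
    return sum(value is None for value in cells) / len(cells)


def possible_owners(row, column):
    """Objects that could own a filled attribute cell (observation A).

    The flat table carries no attribute-to-object link, so every object
    of the event remains a candidate unless domain knowledge is applied.
    """
    if row[column] is None:
        return []
    return list(row["objects"])


print(empty_cell_ratio(events, ATTRIBUTE_COLUMNS))  # 13 of 16 cells are empty
print(possible_owners(events[0], "q1"))             # four candidate objects
```

In this toy table, most attribute cells are empty and the single filled Q1 cell could belong to any of the four objects involved in the event, which is the ambiguity that domain knowledge currently has to resolve.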

Fig. 2. OCEL UML model from [14]

Fig. 3. DOCEL UML model

3 Data-Aware OCEL (DOCEL)

Subsection 3.1 introduces the DOCEL UML metamodel. Next, Subsect. 3.2 applies DOCEL to the running example. Finally, Subsect. 3.3 introduces an algorithm to convert a set of XES files into this DOCEL format.

3.1 DOCEL UML Metamodel

To formally introduce the DOCEL standard, a UML class diagram is modeled (Fig. 3). UML diagrams clearly formalize how all concepts relate to one another in OCEL or DOCEL. Based on the observations from Sect. 2.3, the key differences with the UML class diagram of OCEL (Fig. 2) are indicated in color in Fig. 3 to enrich OCEL even further:

  • 1: Attribute values can be changed and these changes can be tracked. By allowing ambiguities, domain knowledge becomes indispensable to make sensible and logical conclusions. In the DOCEL UML model, attributes are considered to be an assignment of a value to an attribute name in a particular context event and/or object. A distinction is made between static and dynamic attributes. Static event attributes and static object attributes are assumed to be linked to an event or an object respectively and only contain fixed value(s). Static attributes are stored in a similar fashion as with the standard OCEL format, namely in the event or the object table, except that now each object type has an individual table to avoid having null values for irrelevant columns. On the other hand, dynamic attributes are assumed to have changing values over time. Dynamic attributes are linked to both an object and an event so that a value change of an attribute can easily be tracked. Another design choice would be to store a timestamp with the attribute value instead of linking it to the event, however, this might lead to ambiguity in case two events happened at the exact same moment. As such, this proposal tackles observation A.

  • 2: Event attributes can unambiguously be linked to an object. This issue goes hand in hand with the previous proposal and is solved at the same time. By distinguishing between dynamic and static attributes, all relations between attributes, events and objects are made clear and ambiguities have been reduced. A static attribute is either linked to an object or an event and its value(s) cannot change over time. A dynamic attribute is clearly linked to the relevant object and to the event that updated its value. The DOCEL UML model (Fig. 3) can enforce that a static attribute must be linked with at least 1 event or at least 1 object since a distinction is made between static event attributes and static object attributes. For dynamic attributes, this issue does not apply since they need to be connected to both an object and an event anyhow. This proposal solves both observations A & B.

  • 3: Attributes can contain a list of values. Even though not all attributes have a list of values, supporting this certainly reflects the reality that multiple values do occur in organizations. In the DOCEL UML model (Fig. 3) the 1 cardinality for Attribute Value allows both dynamic and static attributes to have complex values, e.g., lists, sets and records containing multiple values. In practice, these values are stored in the relevant attribute tables with a list of values. This proposal solves observation C.
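The three proposals can be read as a small data model. The following is a minimal sketch under stated assumptions, not the formal DOCEL metamodel: the class and field names (`Event`, `ProcessObject`, `DynamicAttributeValue`, `check_references`) are hypothetical, and the referential check only approximates the constraint that a dynamic attribute is connected to both an event and an object.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Event:
    event_id: str
    activity: str
    timestamp: str
    object_ids: List[str] = field(default_factory=list)
    # Static event attributes: linked to the event only, fixed value(s).
    static_attributes: Dict[str, List[Any]] = field(default_factory=dict)


@dataclass
class ProcessObject:
    object_id: str
    object_type: str
    # Static object attributes: linked to the object only, fixed value(s).
    static_attributes: Dict[str, List[Any]] = field(default_factory=dict)


@dataclass
class DynamicAttributeValue:
    value_id: str
    name: str
    event_id: str    # event that set the value
    object_id: str   # object the value belongs to
    values: List[Any]  # a list, so an attribute may hold several values


def check_references(events, objects, dynamic_values):
    """Return the IDs of dynamic values with a missing event or object."""
    event_ids = {event.event_id for event in events}
    object_ids = {obj.object_id for obj in objects}
    return [
        value.value_id
        for value in dynamic_values
        if value.event_id not in event_ids or value.object_id not in object_ids
    ]
```

Storing every attribute value as a list is one way to reflect proposal 3; a single-valued attribute is then simply a list of length one.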

3.2 DOCEL Applied to the Running Example

Table 3 is the events table containing all the events together with their static event attributes (in green), in this case Resource. Complying with the DOCEL UML model, only static event attributes, which are solely linked to events, are found in this table. The main changes from the OCEL to the DOCEL tables have been highlighted using the same color scheme as in the DOCEL UML model to show where the columns have been moved to in the DOCEL tables.

Table 3. Informal representation of events with static attributes in a DOCEL format

Tables 4, 5 and 6 represent object type tables where the objects are stored. Each object is given an object ID. In this data-aware format, aligned with the UML model, a distinction is made between static attributes and dynamic attributes. Static attributes are assumed to be immutable and, therefore, the static object attributes (in blue) are stored together with the objects themselves, e.g., customer name, product value, fragile and bank account. Notice how here, once again, the attributes can be clearly linked to an object. Table 5 only contains primary keys because its attributes are dynamic attributes in this example.

Table 4. Product Type table
Table 5. Order table
Table 6. Customer table

The red Tables 7, 8, 9 and 10 are dynamic attribute tables. Dynamic attributes are assumed to be mutable and their values can change over time. Using two foreign keys (event ID and object ID), the attribute and its value can be traced back to the relevant object as well as the event that created it. Each attribute value is given an attribute value ID with the value(s) being stated in the following column. This complies with the proposed UML model in Fig. 3 where dynamic attributes are clearly linked to the relevant event and relevant object.

Table 7. Quantity table
Table 8. Order Value table
Table 9. Refund table
Table 10. Shipping method table
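To illustrate how the two foreign keys support tracing, the sketch below models a dynamic attribute table as a list of rows and derives which value holds at a given event. It uses the refund changes of the running example (set to 0 by event 1, to 1 by event 15 and back to 0 by event 24); the object ID `o1`, the value IDs and the function names are hypothetical, and the sketch assumes that event IDs are integers that increase chronologically.

```python
# Hypothetical refund table: one row per value change.
refund_table = [
    {"value_id": "r1", "event_id": 1, "object_id": "o1", "values": [0]},
    {"value_id": "r2", "event_id": 15, "object_id": "o1", "values": [1]},
    {"value_id": "r3", "event_id": 24, "object_id": "o1", "values": [0]},
]


def value_at(table, object_id, event_id):
    """Value(s) of the attribute for object_id as of event_id, else None."""
    rows = [
        row for row in table
        if row["object_id"] == object_id and row["event_id"] <= event_id
    ]
    if not rows:
        return None
    return max(rows, key=lambda row: row["event_id"])["values"]


def validity_intervals(table, object_id):
    """List (values, set_by_event, replaced_by_event) per value change."""
    rows = sorted(
        (row for row in table if row["object_id"] == object_id),
        key=lambda row: row["event_id"],
    )
    intervals = []
    for current, following in zip(rows, rows[1:] + [None]):
        end = following["event_id"] if following else None
        intervals.append((current["values"], current["event_id"], end))
    return intervals


print(value_at(refund_table, "o1", 20))        # [1]
print(validity_intervals(refund_table, "o1"))
```

A `None` end marker means the value has not been replaced by any later event in the log.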

From the DOCEL log, the following things are observed:

Attributes can unambiguously be linked to an object, to an event or to both an event and an object with the use of foreign keys.

Attributes can have different values over time, with value changes directly tracked in the dynamic attributes tables. This means one knows when the attribute was created and for how long it was valid, e.g., refund was initialized to 0 by event 1, then event 15 set it to 1 and finally event 24 set it back to 0.

Static and dynamic attributes can contain a list of values in the relevant attributes table, e.g., attribute Quantity.

The amount of information stored has only increased by the foreign keys. Previously, the dynamic attributes would have been stored anyhow in the events table with the unfortunate side-effect of not being explicitly linked to the relevant object and with more columns in the events table. This essentially is a normalization of an OCEL data format. Even though it starts resembling a relational database structure, it was decided for this DOCEL format to not include relations between objects. Deciding on whether to include object models within event logs is essentially a difficult trade-off between complexity/scalability and available information within the event log. From this perspective, the design choice of OCEL was mostly focused on reducing complexity [14], whereas we aim for an event log format that offers more information in exchange for a slightly increased complexity. As such, the DOCEL standard has decreased the number of columns per table and thus observation D is solved as well.
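The normalization step can be sketched as follows. This is an illustrative sketch rather than a prescribed procedure: the function name `normalize`, the input layout (events with an `objects` dictionary keyed by object type) and the `owner_type_of` mapping are assumptions, with the mapping standing in for the domain knowledge that links each attribute column to an object type. As a simplification, a value is linked to every object of the owning type that takes part in the event.

```python
def normalize(events, owner_type_of):
    """Split a flat events table into an events table and dynamic tables.

    owner_type_of maps an attribute column to the object type that owns it.
    """
    event_rows = []
    dynamic_tables = {column: [] for column in owner_type_of}
    for event in events:
        # Keep only the columns that are not moved to a dynamic table.
        event_rows.append(
            {key: value for key, value in event.items() if key not in owner_type_of}
        )
        for column, object_type in owner_type_of.items():
            value = event.get(column)
            if value is None:
                continue  # empty cells simply produce no row
            for object_id in event["objects"].get(object_type, []):
                table = dynamic_tables[column]
                table.append({
                    "value_id": f"{column}-{len(table) + 1}",
                    "event_id": event["event_id"],
                    "object_id": object_id,
                    "values": [value],
                })
    return event_rows, dynamic_tables


# Illustrative input, not taken from Tables 1 and 2.
flat_events = [
    {"event_id": 1, "activity": "place order", "resource": "system",
     "objects": {"order": ["o1"], "customer": ["c1"]},
     "order_value": None, "refund": 0},
    {"event_id": 2, "activity": "confirm purchase", "resource": "clerk",
     "objects": {"order": ["o1"]}, "order_value": 100, "refund": None},
]
event_rows, dynamic_tables = normalize(
    flat_events, {"order_value": "order", "refund": "order"}
)
```

Only filled cells become rows in a dynamic table, so the empty cells of the flat table disappear while each remaining value gains an explicit event ID and object ID.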

3.3 Automatically Converting XES Logs to DOCEL Logs

Currently, research is focused on automatically converting XES logs to OCEL logs with a first proposal introduced in [21]. Automatically transforming XES logs or an OCEL log to the proposed DOCEL log would mainly require domain knowledge to correctly link all attributes to the right object, but this is also required for a normal process analysis of an OCEL log. Our algorithm can be found in Algorithm 1. This algorithm takes as input a set of XES files describing the same process and assumes that each XES file describes the process from the point of view of one object type. The main ideas of the algorithm are as follows:

  • Line 3 starts the algorithm by looping over all XES-logs.

  • Lines 4–8 create the object type tables with all their objects and static object attributes. In line 7, it is assumed that the trace attributes are not changing and solely linked to one object. Since the assumption is made that an XES file only contains one object type, these trace attributes can be considered as static object attributes belonging to that object.

  • Lines 10–12 require the user to identify the static event attributes and the other event attributes that can be linked to an object. Next, a new EventID is made to know which log an event comes from.

  • In line 15, the dynamic attributes tables are constructed under the assumption that attributes that have not yet been identified as static object attributes or static event attributes are dynamic attributes.

  • Lines 17–18 create the new chronologically ordered events table E.

  • Line 20 matches the events with the relevant objects based on the dynamic attributes tables using the new EventID. It should definitely also include the object related to the original traceID related to that event.

  • Finally, lines 21–22 will create the final DOCEL eventIDs and update the eventID across all dynamic attribute tables.

[Algorithm 1: pseudocode for converting a set of XES logs into a DOCEL log]
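The main ideas listed above can be sketched in a few lines. This is a simplified illustration, not a transcription of Algorithm 1. It assumes that each XES log has already been parsed into plain dictionaries (one list of traces per object type), that timestamps are ISO 8601 strings in one common format so that they sort chronologically, and that events sharing the same timestamp and activity across logs describe the same event. The function name and the input layout are hypothetical.

```python
from collections import defaultdict


def xes_logs_to_docel(logs, static_event_attributes):
    """Merge per-object-type logs into DOCEL-style tables.

    logs: {object_type: [{"trace_id", "attributes", "events": [...]}]}
    static_event_attributes: attribute names chosen by the user.
    """
    object_tables = defaultdict(list)
    dynamic_tables = defaultdict(list)
    merged = {}  # (timestamp, activity) -> event row under construction

    for object_type, traces in logs.items():
        for trace in traces:
            # One trace = one object; trace attributes become static
            # object attributes.
            object_id = trace["trace_id"]
            object_tables[object_type].append(
                {"object_id": object_id, **trace["attributes"]}
            )
            for event in trace["events"]:
                key = (event["timestamp"], event["activity"])
                row = merged.setdefault(key, {
                    "timestamp": event["timestamp"],
                    "activity": event["activity"],
                    "objects": [],
                })
                if object_id not in row["objects"]:
                    row["objects"].append(object_id)
                for name, value in event["attributes"].items():
                    if name in static_event_attributes:
                        row[name] = value
                    else:
                        # Everything else is treated as a dynamic attribute.
                        dynamic_tables[name].append({
                            "event_key": key,
                            "object_id": object_id,
                            "values": value if isinstance(value, list) else [value],
                        })

    # Chronologically ordered events table with final event IDs.
    events, final_id = [], {}
    for number, key in enumerate(sorted(merged), start=1):
        final_id[key] = number
        events.append({"event_id": number, **merged[key]})

    # Replace the temporary keys by the final event IDs.
    for rows in dynamic_tables.values():
        for row in rows:
            row["event_id"] = final_id[row.pop("event_key")]
    return events, dict(object_tables), dict(dynamic_tables)
```

Matching events on timestamp and activity is the most fragile part of this sketch; real logs would likely need a more careful matching rule.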

4 Limitations and Future Work

To better store information about attributes, DOCEL comes with a variable number of tables. However, the tables should be smaller as there are fewer columns compared to the standard OCEL format. It is still possible to only use certain attributes or attribute values for analysis by extracting the relevant attributes/values. Instead of selecting a subset of columns with OCEL, the user selects a subset of tables in DOCEL, which offer more information. Next, neither OCEL nor DOCEL includes the specific roles of objects of the same object type in an event. For example, in the case of a Send Message event from person 1 to person 2, it is currently impossible to distinguish between the sender and the receiver.

To further validate the DOCEL format, the authors are planning to develop a first artificial event log together with a complete formalization of the DOCEL UML with OCL constraints. Furthermore, directly extracting DOCEL logs from SAP is also planned. Regarding the algorithm to automatically convert XES logs to DOCEL logs, the authors are planning to extend the algorithm with a solution to automatically discover which attributes are linked to objects or events. Secondly, an extension to create a DOCEL log based on a single XES file with multiple objects is also planned. DOCEL, however, offers many other research opportunities such as novel algorithms for object-centric process discovery, conformance checking or enhancement, which would further validate or improve the DOCEL format. Other domains, such as IoT-related process mining, can also be interesting fields to apply DOCEL to.

5 Conclusion

This paper illustrates that the OCEL standard has certain limitations regarding attribute analysis, such as not being able to unambiguously link attributes to both an event and an object or to track attribute value changes. To deal with these challenges, an enhanced Data-aware OCEL (DOCEL) is proposed together with an algorithm to convert XES logs into the DOCEL log format. With DOCEL, the authors hope that new contributions will also take into account this data-flow perspective not only for object-centric process and decision mining algorithms but also for other domains such as IoT-oriented process analysis.