1 Introduction

Event log files are the input to any process mining algorithm. Often, the aim of these algorithms is to derive an as-is model of the process that created the logs, which can then be used to further analyze the actual process execution. Usually, process mining is applied to historical data and, thus, to event log files that were recorded by IT systems (e.g., ERP systems). Recently, the application of process mining to real-time data has become more common: log files of low-level sensor data are increasingly being leveraged for process mining, and for such data the event-activity mapping is especially challenging [1].

In process mining, events are commonly defined as the observable and instantaneous occurrences of specific well-defined business activities [2] within the scope of specific process instances (i.e., cases). In fact, most process mining methods rely on these two requirements, and results have been shown to be poor if either requirement is not met [3,4,5,6]. Many kinds of events, e.g., a sudden change in sensor data, do not comply with the strict assumptions of process mining methods. Therefore, we use the term event in a wider sense than is typical in process mining. Events can be any kind of observation (e.g., a sensor value changed) with relevance to a certain process and are not necessarily already linked to a specific (business) process activity.

Fig. 1. Two examples for event-activity mappings are shown. The behavior of a process in the physical world is captured through sensing of low-level events from the observable behavior, aggregation and abstraction to high-level events, inference of activity instances from the process context and correlation to a specific process instance. The objective of the derived as-is model influences the event-activity mapping.

Generally, process mining algorithms assume that each event has already been assigned to a distinct meaningful process activity in the context of the process in question. Although several mapping approaches exist in the literature, the status quo is that approaches can only indicate likelihoods of mappings, since there is often more than one possible solution [7, 8]. Figure 1 shows different levels of events and their mapping to process activities. The first and lowest level consists of low-level events generated through sensing or observation of the physical world. Let us assume that in this hospital example the low-level events were generated by sensors. Looking at low-level events in isolation is usually not useful for the analysis of a process since their semantics may be unclear or even ambiguous. Low-level events need to be aggregated or abstracted to high-level events using methods such as complex event processing (CEP), which derives events on a higher level of abstraction from a set of low-level events. High-level events already carry semantics in terms of the process under observation since they are often derived through rules based on domain knowledge. However, it is also possible to derive high-level events through unsupervised abstraction techniques [6], in which case their exact semantics may not be clear. Sometimes high-level events may already be correlated with the occurrence of a specific (business) activity that is recognizable in the context of a particular process. Yet, often additional information on the context in which one or several high-level events occurred needs to be taken into account. Through contextualization of events into the realm of a specific process, occurrences of activities, i.e., activity instances, can be identified and correlated to a specific process instance. Referring to Fig. 1, the high-level event “image on” requires the aggregation of two low-level events in the left-hand process, while in the right-hand process it is aggregated from three low-level events due to the stronger light needed for the nightly diagnostic imaging (i.e., in this example time is a contextual factor). Also, the event “image on” is mapped to more activities of the nightly intensive care process than of the daytime oncology process. Thus, the accuracy of event-activity mappings is difficult to benchmark if contextual factors are not fully considered.
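The context-dependent aggregation described for the “image on” event can be illustrated with a minimal Python sketch. It is not the method of any cited approach. The sensor name "light", the 30-second window, and the night hours (22:00 to 06:00) are assumptions made purely for illustration. Only the thresholds of two low-level events by day and three at night follow the example above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class LowLevelEvent:
    sensor: str          # hypothetical sensor identifier, e.g. "light"
    timestamp: datetime


def required_light_events(ts: datetime) -> int:
    """Context rule: three low-level events at night, two by day.

    The night interval (22:00 to 06:00) is an assumption.
    """
    is_night = ts.hour >= 22 or ts.hour < 6
    return 3 if is_night else 2


def detect_image_on(events, window=timedelta(seconds=30)):
    """Return the timestamps at which the high-level event "image on" is inferred."""
    lights = sorted(e.timestamp for e in events if e.sensor == "light")
    detected = []
    i = 0
    while i < len(lights):
        need = required_light_events(lights[i])
        # Low-level events that fall into the window opened by lights[i].
        group = [t for t in lights[i:] if t - lights[i] <= window]
        if len(group) >= need:
            # The high-level event occurs when the last required event arrives.
            detected.append(lights[i + need - 1])
            i += need
        else:
            i += 1
    return detected
```

The same two low-level events thus yield “image on” during the day but not at night, which is why time has to be part of the mapping's context.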

Traditionally, event-activity mappings consider the order of events, timestamps and related persons (resources) as the sole context attributes. However, a more comprehensive view of the process context in which the event was recorded is necessary in order to increase the quality of event-activity mappings. To understand how event-activity mappings can be contextualized, we studied context taxonomies. Based on this study, we provide a framework for classifying context factors for event-activity mappings and demonstrate the applicability of the framework.

The remainder of this paper is structured as follows. The next section discusses context dimensions. These dimensions are applied in Sect. 3 to our literature search. Section 4 defines the context framework for event-activity mappings and demonstrates its application. The paper ends with a summary and an outlook.

2 Context Dimensions

In [9] context is defined as “any information that can be used to characterize the situation of an entity. An entity refers to a person, place, or object, which is related to the interaction between user and application.” A process context is “...the set of process context information that characterizes the current execution situation of a process...” [10]. To understand the process context of event-activity mappings several context taxonomies were studied [11,12,13,14,15,16,17,18,19]. From this study, we classify context information into four context dimensions as depicted in Fig. 2:

Fig. 2. Context information classified into four context dimensions with properties.

  1. Personal and Social Context: describes all tasks in which an entity is involved as well as mental and physical information about the entity and its interaction with others [17]. The tasks in which an entity is involved are captured by the context property activity, the mental and physical information by ability, and the interaction with others by relationship [13]. The personal and social context of entities might additionally be described by properties that are not covered elsewhere (e.g., workload).

  2. Environmental Context: addresses an entity's surroundings [18] such as tool and device aspects (equipment) [17] and the performance of the algorithm [14].

  3. Task Context: is related to the history [14], the goal or intention behind the process [17], the frequency of tasks or events (causality) [19], its application [12], and rules [17]. The task context might also be described by properties that are not covered elsewhere (such as costs, the cycle time of tasks, security issues) [19, 20]. In particular, historical information on the process is recorded within history [19]. Causality can be uncovered through metrics such as the overall frequency of a task, the frequency with which a task is directly preceded by another task, the frequency with which it is directly succeeded by another task, and the frequencies with which it is directly or indirectly preceded or succeeded by another task before the next appearance of the task [21]. The context property application refers to the domain of the task, such as health care or mobile banking. Rules refer to business rules that involve tasks (e.g., if more than two persons travel together, the third pays only half price) and to structural constraints, and can be specified in a formalized or textual form [22].

  4. Spatial-temporal Context: is related to the spatial-temporal coordination of the entity and subsumes location and time [17,18,19].
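The simplest of the causality metrics named under Task Context can be computed directly from the traces of an event log. The following Python sketch counts the overall frequency of each task and how often one task directly succeeds another. The indirect metrics are omitted for brevity. The task names and the function name `causality_counts` are hypothetical.

```python
from collections import Counter


def causality_counts(traces):
    """Count overall task frequencies and directly-follows pairs.

    `traces` is a list of traces; each trace is a list of task names
    in the order in which they were executed.
    """
    overall = Counter()
    directly_follows = Counter()  # key (a, b): b directly succeeds a
    for trace in traces:
        overall.update(trace)
        directly_follows.update(zip(trace, trace[1:]))
    return overall, directly_follows


# Hypothetical traces of a small hospital process.
traces = [
    ["register", "examine", "discharge"],
    ["register", "examine", "examine", "discharge"],
]
overall, directly_follows = causality_counts(traces)
print(overall["examine"])                          # 3
print(directly_follows[("register", "examine")])   # 2
```

The frequency with which a task b is directly preceded by a task a is the same count read from the perspective of b, i.e., `directly_follows[(a, b)]`.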

Context information is observed or sensed from raw data, or results from the aggregation or abstraction of such observations (see Fig. 3), and can be classified as:

  • simple context information, such as location or time, or

  • complex context information that is aggregated from simple context information, such as that the entity was in a room on the 28th of February (see Footnote 1).

Fig. 3. Hierarchy of context information where simple context information can be captured from raw data and complex context information is obtained by aggregation.
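The distinction between simple and complex context information can be sketched as a small data structure. The class and function names below are hypothetical, and the year in the example date is an assumption, since the example above only names the day and month.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SimpleContext:
    entity: str
    prop: str       # context property, e.g. "location" or "time"
    value: object


@dataclass(frozen=True)
class ComplexContext:
    entity: str
    description: str
    parts: tuple    # the simple context information it was aggregated from


def aggregate_presence(location: SimpleContext, time: SimpleContext) -> ComplexContext:
    """Aggregate a location and a time of the same entity into one statement."""
    if location.entity != time.entity:
        raise ValueError("simple context information describes different entities")
    if location.prop != "location" or time.prop != "time":
        raise ValueError("expected one location and one time item")
    description = f"{location.entity} was in {location.value} on {time.value.isoformat()}"
    return ComplexContext(location.entity, description, (location, time))


# Hypothetical entity and room; the year 2018 is assumed.
where = SimpleContext("nurse_1", "location", "room_12")
when = SimpleContext("nurse_1", "time", date(2018, 2, 28))
presence = aggregate_presence(where, when)
print(presence.description)  # nurse_1 was in room_12 on 2018-02-28
```

Keeping the simple items in `parts` preserves the hierarchy, so a complex statement can always be traced back to the observations from which it was derived.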

The next section summarizes the results of a literature review on context in process mining and on event-activity mappings. The context dimensions from Fig. 2 are used to classify the results.

3 Event to Activity Mapping

To understand context awareness in event-activity mappings, we performed two different literature searches. First, we intended to extract event attributes from log files that are related to each of the four context dimensions (see Fig. 2) in order to give guidelines for developing event-activity mappings. Second, we were interested in the degree to which these event attributes have already been tackled by event-activity mapping approaches. The literature reviews were conducted between November 2017 and February 2018. We searched the research databases ACM Digital Library, IEEE Xplore, ISI Web of Science, ScienceDirect, Scopus and Springer Link. We additionally used Google Scholar to widen the scope of our search.

3.1 Context Awareness for Process Mining

The literature analysis on context awareness in process mining reveals that several context properties have already been taken into account when mining a process model from an event log. Table 2 summarizes the results.

  1. Personal and Social Context: The social context of individuals was considered in process mining through the work environment [23,24,25,26] or organizational structures [20]. For this purpose, events were enriched with the attributes performer, identity or originator, and some approaches differentiate between role and resource. These works cover the context properties activity and relationship. Other contributions uncovered the abilities of entities through the analysis of the attributes service line and entity position [27] or capabilities [29]. Interactions of entities can be extracted through the mining of performers and their relationships. Entity properties such as department identifiers (group) have been used to discover sub-processes per department [3] (Table 1).

  2. Task Context: Objectives behind a task (goal) or a process can be uncovered from the order of activities and activity labels [28]. History and causality are directly determined from a “common” event log as shown in Fig. 1 [2]. The context property application was considered in the attribute subject domain [29]. To uncover rules, either a rule attribute was used to point to rules [31] or alignments to LTL-based descriptions or transaction protocols were defined [30].

  3. Environmental Context: The context property performance (measured by recall or fitness) was determined through meta-data analysis of a log file, in case the user indicated the performance of the mapping algorithm (e.g., exact, approximate), or through the size of a trace [33]. To uncover tool and device aspects, i.e., the equipment property, the attribute medium was attached as an event attribute (see Footnote 2).

  4. Spatial-temporal Context: When mining process models related to spatial-temporal information, the timestamp attribute was refined into startTime and endTime [34] (time context property), and the attributes location area, location level, and location category (location context property) [35] were attached to events.

To sum up, the personal and social context of entities, causality, history and rule were the context properties most often considered when mining a process model. The context properties history and causality are implicitly used by all process mining approaches since they can be determined directly from the minimal event log requirements [36]. Properties such as goal, application or equipment were mostly disregarded, which might be explained by the difficulty of retrieving this information.
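The attributes collected above can be attached to an event and grouped by context dimension. The following Python sketch shows one possible representation. The dictionary keys (including the spelling `locationArea`), the attribute values, and the function name are illustrative assumptions. Only the assignment of attributes to dimensions follows the list above.

```python
# Assignment of event attributes to context dimensions, following the
# literature summary above; the key spellings are assumptions.
ATTRIBUTE_DIMENSION = {
    "performer": "personal and social",
    "role": "personal and social",
    "group": "personal and social",
    "rule": "task",
    "medium": "environmental",
    "startTime": "spatial-temporal",
    "endTime": "spatial-temporal",
    "locationArea": "spatial-temporal",
}


def context_by_dimension(event):
    """Group the context attributes of one event by context dimension."""
    grouped = {}
    for name, value in event.items():
        dimension = ATTRIBUTE_DIMENSION.get(name)
        if dimension is not None:  # attributes without a dimension are skipped
            grouped.setdefault(dimension, {})[name] = value
    return grouped


# Hypothetical event of a hospital process.
event = {
    "activity": "diagnostic imaging",
    "performer": "nurse_1",
    "role": "nurse",
    "medium": "tablet",
    "startTime": "2018-02-28T23:00:00",
    "endTime": "2018-02-28T23:10:00",
    "locationArea": "intensive care",
}
grouped = context_by_dimension(event)
```

In this example the task dimension stays empty because the event carries no rule attribute, which makes gaps in the recorded context visible at a glance.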

Table 1. Literature identified for event-activity mappings organized according to context dimensions and context properties.

3.2 Context Awareness for Event-Activity Mappings

This section summarizes the literature review on context awareness for event-activity mappings. Additionally, the results of this review are compared with the status quo of context awareness in process mining (i.e., we examine which context properties have already been tackled for event-activity mappings). The literature results are again classified according to the context dimensions listed in Sect. 2.

  1. Personal and Social Context: Folino et al. [37] enrich the event logs with the attribute team, which indicates the team associated with the first event of a trace, and employ a clustering approach to obtain activities. Mannhardt et al. [38] use information about the department in which events occurred to determine per-department activity patterns for event-activity mapping. Moreover, the event log attribute workload, defined as the number of problems open at the time when a trace of an event log started [37], is also related to the context property entity property.

  2. Task Context: Domain knowledge is a keyword often used in the literature; it refers to knowledge about the direct task context in terms of the context property history. Two examples of domain-knowledge-based approaches are Baier et al. [39], who use domain knowledge extracted from process documentation to semi-automatically match events and activities, and Mannhardt et al. [38], who use activity patterns to capture domain knowledge for event-activity mapping. The behavior defined by the activity patterns is aligned with the observed behavior in the event log, which records historical information about the process. Mounira et al. [14] propose history-related context with regard to a patient's immediate members for developing a context-aware process mining framework that maximizes business process flexibility, illustrated in a hospital environment. Another context property is causality: Tax et al. [40] propose the entropy of an activity in an event log based on its directly-follows ratio vector and its directly-precedes ratio vector. Lu et al. describe a semi-supervised approach for log pattern detection. They refine causal dependencies into directly causes and eventually causes. In addition, Diamantini et al. [41] refer in their work to relevant subtraces of an event log by considering process execution patterns. Attributes like one-to-many correspondence (an event corresponds to a set of low-level events) are also considered. The third context property we deal with is application: we identified Folino et al. [37] as a related paper that combines the discovery of different execution scenarios with the automatic abstraction of log events. Finally, the last context property we deal with is rule: Goedertier et al. [42] propose the use of first-order logic to define preconditions and time-varying properties to overcome difficulties like the limitation of process mining to a setting of non-supervised learning, since negative information is often not available.

  3. Spatial-temporal Context: Time as a context property is found in several papers [14, 37, 40, 42, 43]. In particular, Folino et al. [37] enrich the time dimension with attributes such as weekday, month and year. The location of activities can be determined via RFID tracking [43].
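To make the entropy-based view of causality mentioned under Task Context more concrete, the following Python sketch computes the Shannon entropy of the distribution of tasks that directly follow a given activity. This is one plausible reading of an entropy over a directly-follows ratio vector and is not necessarily the exact definition used in [40]. The function name and the traces are hypothetical.

```python
import math
from collections import Counter


def directly_follows_entropy(traces, activity):
    """Shannon entropy (in bits) of the tasks that directly follow `activity`.

    Returns 0.0 if `activity` is never followed by another task.
    """
    successors = Counter(
        b
        for trace in traces
        for a, b in zip(trace, trace[1:])
        if a == activity
    )
    total = sum(successors.values())
    if total == 0:
        return 0.0
    # Ratio of each successor among all direct successors of `activity`.
    ratios = [count / total for count in successors.values()]
    return -sum(r * math.log2(r) for r in ratios)


# Hypothetical traces: "a" is followed by "b" and "c" equally often.
print(directly_follows_entropy([["a", "b"], ["a", "c"]], "a"))  # 1.0
```

A low entropy indicates that an activity is almost always followed by the same task, whereas a high entropy indicates that its direct successors vary strongly.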

In fact, many context properties have not yet been covered when mapping events to activities, as the comparison with the analysis of context awareness in process mining shows. In particular, the context properties “activity”, “ability”, “entity property”, “equipment”, and “location” are not sufficiently covered by the literature we identified in our review. This might be explained by the difficulty of retrieving the information. Additionally, more attributes for the personal & social context have been addressed.

Both literature reviews show that several contextual factors have already been tackled when mining a process from an event log and when developing an event-activity mapping. Properties such as goal, application, privacy or equipment should attract more attention. One solution for identifying the application from a log might be the linguistic analysis of activity labels [36]. The development of privacy-aware event-activity mappings might be inspired by privacy-aware modeling approaches [44], where privacy policies or privacy restrictions are considered. We are convinced that mapping approaches addressing these context properties will emerge in the future. This is supported by the increasing interest in this topic and particularly by the rise of IoT.

Table 2. Literature identified for event-activity mappings organized according to context dimensions and context properties.

4 Framework for Event-Activity Mappings

To improve the accuracy of event-activity mappings, we developed a framework based on the literature results on context awareness in process mining and event-activity mappings. The pillars of the framework are the four context dimensions presented in Sect. 2. The properties of each dimension are those event attributes that are tackled for context awareness in process mining and event-activity mappings. Depending on the objective, these properties might be on different abstraction levels. For instance, for event-activity mappings based on personal & social relationships, one might either subsume resource and role under performer or consider role and performer as synonyms. The sole consideration of performer without any other event attributes is not sufficient in any way.

The benefit of the framework is twofold. On the one hand, those who intend to develop an event-activity mapping are recommended to first specify the objective of the mapping. Next, it should be decided whether simple or complex context information is of relevance. Depending on the extent of context awareness, event attributes have to be distilled and attached to the trace. Certainly, the decision on which attributes to attach to the log file depends on the mining objective and on the accessibility and availability of information. Generally, the more complete the log file, the more accurate the results obtained from process mining. In this way, the framework serves as a guide towards developing an accurate event-activity mapping.

On the other hand, the framework can be applied to compare event-activity mapping approaches (e.g., to find the event-activity mapping that considers more attributes of the spatial-temporal context). The accuracy in this scenario correlates with the number of attributes used. In this way, the framework benchmarks event-activity mapping approaches.
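The comparison scenario can be sketched as a ranking of mapping approaches by the number of attributes they use in a chosen context dimension. The approach names and attribute sets below are purely hypothetical and do not describe any of the cited works. The function name `rank_by_dimension` is likewise an assumption.

```python
def rank_by_dimension(approaches, dimension):
    """Order approach names by the number of attributes used in `dimension`.

    `approaches` maps an approach name to a dict that maps each context
    dimension to the set of event attributes the approach considers.
    Approaches with more attributes come first.
    """
    return sorted(
        approaches,
        key=lambda name: len(approaches[name].get(dimension, ())),
        reverse=True,
    )


# Hypothetical approaches and the attributes they take into account.
approaches = {
    "approach_A": {
        "spatial-temporal": {"startTime", "endTime"},
        "personal and social": {"performer"},
    },
    "approach_B": {
        "spatial-temporal": {"timestamp", "weekday", "month", "year"},
    },
}
ranking = rank_by_dimension(approaches, "spatial-temporal")
print(ranking)  # ['approach_B', 'approach_A']
```

Counting attributes is only a proxy for accuracy. It follows the assumption stated above that accuracy correlates with the number of attributes used.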

Fig. 4. Context framework for event-activity mappings and the applications of the framework. The benefit of the framework is decision support improving the accuracy of event-activity mapping approaches.

5 Conclusion and Implications

Events need to be contextualized through the use of context information for a successful mapping to activity instances. However, a systematic discussion of the use of context information for event-activity mappings is missing. To fill this gap, we conducted a comprehensive literature review on existing event-activity mappings as well as on the general use of context properties in process mining methods. The literature was structured according to four context dimensions: personal and social context, task context, environmental context, and spatial-temporal context, which we identified from work on context taxonomies. As a result, we identified 14 context properties that should be recorded in event logs and that should be used by event-activity mapping methods to improve the mapping accuracy. We found that the context properties causality and history, which belong to the task context dimension, are supported most frequently. However, other properties such as activity, ability, goal, equipment, performance, and location are not or only rarely described in the literature on event-activity mappings. Thus, it remains challenging to consider the wider context for event-activity mapping problems.