Artifact-Centric Process Mining
Keywords: Artifact-centric Process; Behavioral Dependence; Case Management Model and Notation (CMMN); Classical Process Mining; Data-centric Dynamic Systems
Artifact-centric process mining is an extension of classical process mining (van der Aalst 2016) that makes it possible to analyze event data with more than one case identifier in its entirety. It supports analyzing the dynamic behavior of (business) processes that create, read, update, and delete multiple data objects that are related to each other in relationships with one-to-one, one-to-many, and many-to-many cardinalities. Such event data is typically stored in relational databases of, for example, Enterprise Resource Planning (ERP) systems (Lu et al. 2015). Artifact-centric process mining comprises artifact-centric process discovery, conformance checking, and enhancement. The outcomes of artifact-centric process mining can be used for documenting the actual data flow in an organization and for analyzing deviations in the data flow for performance and conformance analysis.
During artifact-centric process discovery, each event is associated with one data object in the data source. From the behavioral relations between all events associated with one data object, a life-cycle model of the data object is learned using automated process discovery techniques. Each life-cycle model describes the possible changes to the object and their ordering as they have been observed in reality. From behavioral relationships between events in different related data objects, information about behavioral dependencies between changes in different data objects is discovered preserving the one-to-one, one-to-many, and many-to-many cardinalities.
Several modeling languages have been proposed to describe a complete artifact-centric model of all object life cycles and their behavioral interdependencies. Existing behavioral modeling languages can be extended to express interdependencies of one-to-many and many-to-many cardinalities including Petri nets (van der Aalst et al. 2001) and UML (Estañol et al. 2012). Specifically designed languages including the Guard-Stage-Milestone (GSM) model (Hull et al. 2011) or data-centric dynamic systems (Hariri et al. 2013) employ both data and behavioral constructs as primary modeling concepts. The Case Management Model and Notation (CMMN) standard v1.1 incorporates several modeling concepts of GSM (OMG 2016).
Artifact-centric conformance checking compares event data to an artifact-centric model with the aim of identifying where recorded events deviate from the behavior described in the artifact-centric model. Deviations may exist between observed and specified data models, between observed events and the life-cycle model of an artifact, and between observed events and the interactions of two or more artifacts.
Artifact-centric model enhancement uses event data to enrich an artifact-centric model, for example, with information about the frequency of paths through a life-cycle model or interactions, or to identify infrequent behavior as outliers.
Historically, artifact-centric process mining addressed the unsolved problem of process mining on event data with multiple case identifiers and one-to-many and many-to-many relationships by adopting the concept of a (business) artifact as an alternative approach to describing business processes.
Event Data with Multiple Case Identifiers
Convergence and Divergence
Process mining requires associating events with a case identifier in order to analyze behavioral relations between events in the same case (van der Aalst 2016). The data in Fig. 2 provides three case identifiers: SD id, DD id, and BD id. Classical process mining forces the analyst to associate all events with a single case identifier. However, this is equivalent to flattening and de-normalizing the relational structure along its one-to-many relationships.
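The effect of flattening can be illustrated with a minimal sketch. The event records below are hypothetical and are not the data of Fig. 2; only the idea of a sales document related to several delivery documents is taken from the text. All events are associated with the delivery identifier as the single case identifier, so each sales event is copied into the trace of every related delivery.

```python
from collections import Counter

# Hypothetical events (not the data of Fig. 2):
# (event id, timestamp, activity, id of the object the event belongs to)
SALES_EVENTS = [("e1", 1, "Create Order", "S1"), ("e2", 4, "Update Price", "S1")]
DELIVERY_EVENTS = [("e3", 2, "Create Delivery", "D1"), ("e4", 3, "Create Delivery", "D2")]
# One-to-many relation: sales document S1 is related to deliveries D1 and D2.
SD_TO_DD = {"S1": ["D1", "D2"]}


def flatten_on_delivery(sales_events, delivery_events, sd_to_dd):
    """Build one time-ordered trace per delivery id (the chosen single case identifier)."""
    traces = {}
    for event_id, timestamp, activity, dd in delivery_events:
        traces.setdefault(dd, []).append((timestamp, event_id, activity))
    # Each sales event is copied to every delivery related to its sales document.
    for event_id, timestamp, activity, sd in sales_events:
        for dd in sd_to_dd.get(sd, []):
            traces.setdefault(dd, []).append((timestamp, event_id, activity))
    return {dd: sorted(events) for dd, events in traces.items()}


flat_log = flatten_on_delivery(SALES_EVENTS, DELIVERY_EVENTS, SD_TO_DD)
# Count in how many traces each recorded event now appears.
event_copies = Counter(
    event_id for trace in flat_log.values() for _, event_id, _ in trace
)
```

In this sketch each sales event appears in two traces although it was recorded only once, so frequencies computed on the flattened log overstate how often the sales activities actually occurred.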
Artifact-Centric Process Models
Artifact-centric process mining adopts the modeling concept of a (business) artifact to analyze event data with multiple case identifiers in their entirety (Lu et al. 2015; Nooijen et al. 2012; van Eck et al. 2017). The notion of a (business) artifact was proposed by Nigam and Caswell (2003) as an alternative approach to describing business processes. This approach assumes that any process materializes itself in the (data) objects that are involved in the process, for instance, sales documents and delivery documents; these objects have properties such as the values of the fields of a paper form, the processing state of an order, or the location of a package. Typically, a data model describes (1) the classes of objects that are relevant in the process, (2) the relevant properties of these objects in terms of class attributes, and (3) the relations between the classes. A process execution instantiates new objects and changes their properties according to the process logic. Thereby, the relations between classes describe how many objects of one class are related to how many objects of another class.
An artifact-centric process model enriches the classes of the data model themselves with process logic restricting how objects may evolve during execution. More precisely, one artifact (1) encapsulates several classes of the data model (e.g., Sales Documents and Sales Document Lines), (2) provides actions that can update the classes' attributes and move the artifact to a particular state, and (3) defines a life cycle. The artifact life cycle describes when an instance of the artifact (i.e., a concrete object) is created, which actions may occur in which state of the instance to advance it to another state (e.g., from created to cleared), and which goal state the instance has to reach to complete a case. A complete artifact-centric process model provides a life-cycle model for each artifact in the process and describes which behavioral dependencies exist between actions and states of different artifacts (e.g., pick delivery may occur for a Delivery object only if all its Billing objects are in state cleared). Where business process models created in languages such as BPMN, EPC, or Petri nets describe a process in terms of activities and their ordering in a single case, an artifact-centric model describes process behavior in terms of creation and evolution of instances of multiple related data objects. In an artifact-centric process model, the unit of modularization is the artifact, consisting of data and behavior, whereas in an activity-centric process modeling notation, the unit of modularization is the activity, which can be an elementary task or a sub-process. A separate entry in this encyclopedia discusses the problem of automated discovery of activity-centric process models with sub-processes.
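The combination of a life cycle per artifact and a behavioral dependency across artifacts can be sketched as follows. This is a minimal illustration, not one of the modeling languages discussed in this entry: the class ArtifactInstance, the function can_pick, the action "clear invoice", and the state "picked" are assumptions introduced for the example; only the states created and cleared and the rule for pick delivery come from the text.

```python
# Hypothetical life cycles: state -> {action: next state}.
BILLING_LIFE_CYCLE = {"created": {"clear invoice": "cleared"}, "cleared": {}}
DELIVERY_LIFE_CYCLE = {"created": {"pick delivery": "picked"}, "picked": {}}


class ArtifactInstance:
    """A concrete object with a current state and links to related instances."""

    def __init__(self, name, life_cycle, initial_state="created"):
        self.name = name
        self.life_cycle = life_cycle
        self.state = initial_state
        self.related = []  # related instances of other artifacts

    def enabled(self, action):
        """True if the life cycle allows the action in the current state."""
        return action in self.life_cycle[self.state]

    def execute(self, action):
        """Advance the instance to the next state, or fail if the action is not allowed."""
        if not self.enabled(action):
            raise ValueError(f"{action!r} not allowed for {self.name} in state {self.state!r}")
        self.state = self.life_cycle[self.state][action]


def can_pick(delivery):
    """Behavioral dependency: pick delivery requires all related Billing objects to be cleared."""
    return delivery.enabled("pick delivery") and all(
        billing.state == "cleared" for billing in delivery.related
    )


delivery = ArtifactInstance("D1", DELIVERY_LIFE_CYCLE)
billing_1 = ArtifactInstance("B1", BILLING_LIFE_CYCLE)
billing_2 = ArtifactInstance("B2", BILLING_LIFE_CYCLE)
delivery.related = [billing_1, billing_2]  # one-to-many relation

billing_1.execute("clear invoice")
blocked = can_pick(delivery)   # B2 is still in state created
billing_2.execute("clear invoice")
allowed = can_pick(delivery)   # all related Billing objects are cleared
```

The life cycle alone would enable pick delivery as soon as the Delivery object exists; the dependency restricts it further based on the states of all related Billing objects, which is what a single-case, activity-centric model cannot express directly.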
The gray part of Fig. 4 shows a more involved behavioral dependency. Whenever Update Price occurs in a Sales document, all related Delivery documents get blocked. Only after Release delivery block has occurred in Sales may the Delivery document be updated again and Pick Delivery occur. For the sake of simplicity, the model does not show further behavioral dependencies such as “Update price also blocks related Billing documents.”
Artifact-Centric Process Mining
Structural conformance states how well the data model describes the data records observed in reality. The proclet model of Fig. 4 structurally conforms to the data in Fig. 2 regarding objects and relations but not regarding actions: the recorded event data shows two additional event types for the life cycle of the Sales document – Release billing block and Last Change.
Life-cycle conformance states how well the life-cycle model of each artifact describes the order of events observed in reality. This corresponds to conformance in classical process mining. For example, in Fig. 2, Update invoice date occurs in Billing after Clear Invoice which does not conform to the life-cycle model in Fig. 4.
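A simplified replay illustrates the idea of life-cycle conformance. The Billing life cycle below is a hypothetical stand-in for the model of Fig. 4, which is not reproduced here, and the replay function is a deliberately naive check rather than one of the classical conformance checking techniques: it skips every event the model does not allow and reports it as a deviation.

```python
# Hypothetical Billing life cycle (an assumption, not the model of Fig. 4):
# state -> {activity: next state}
BILLING_MODEL = {
    "start": {"Create Invoice": "created"},
    "created": {"Update invoice date": "created", "Clear Invoice": "cleared"},
    "cleared": {},
}


def replay(trace, model, initial_state="start"):
    """Replay a trace on the model and return the events the model does not allow.

    A deviating event is recorded as (position, activity, state) and skipped,
    so the replay continues from the same state.
    """
    state = initial_state
    deviations = []
    for position, activity in enumerate(trace):
        next_state = model[state].get(activity)
        if next_state is None:
            deviations.append((position, activity, state))
        else:
            state = next_state
    return deviations


# Observed order as described in the text: the date is updated after clearing.
observed = ["Create Invoice", "Clear Invoice", "Update invoice date"]
deviations = replay(observed, BILLING_MODEL)

conforming = ["Create Invoice", "Update invoice date", "Clear Invoice"]
no_deviations = replay(conforming, BILLING_MODEL)
```

Under the assumed model, the observed trace yields one deviation at the last event, because Update invoice date is not allowed once the invoice is in state cleared.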
Interaction conformance states how well the entire artifact-centric model describes the behavioral dependencies between artifacts. In Fig. 2, instance D3 of Delivery is created after its related instance B2 of Billing. This does not conform to the channels and ports specified in Fig. 4.
The objective of artifact-centric process mining is to relate recorded behavior to modeled behavior, through (1) discovering an artifact-centric process model that conforms to the recorded behavior, (2) checking how well recorded behavior and an artifact-centric model conform to each other and detecting deviations, and (3) extending a given artifact-centric model with further information based on recorded event data.
1. Discovering the data model of entities or tables, their attributes, and relations from the data records in the source data. This step corresponds to data schema recovery. It can be omitted if the data schema is available and correct; however, in practice foreign key relations may not be documented at the data level and need to be discovered.
2. Discovering artifact types and relations from the data model and the event data. This step corresponds to transforming the data schema discovered in step 1 into a domain model, often involving undoing horizontal and vertical (anti-) partitioning in the technical data schema and grouping entities into domain-level data objects. User input or detailed information about the domain model is usually required.
3. Discovering artifact life-cycle models for each artifact type discovered in step 2. This step corresponds to automated process discovery for event data with a single case identifier and can be done fully automatically up to parameters of the discovery algorithm.
4. Discovering behavioral dependencies between the artifact life cycles discovered in step 3 based on the relations between artifact types discovered in step 2. This step is specific to artifact-centric process mining; several alternative, automated techniques have been proposed. User input may be required to select domain-relevant behavioral dependencies among the discovered ones.
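The life-cycle discovery step can be sketched under the assumption that artifact types and their instances have already been identified. The event records are hypothetical, and counting directly-follows relations is used here only as a simple stand-in for a process discovery algorithm; it is not one of the specific algorithms cited in this entry.

```python
from collections import Counter, defaultdict

# Hypothetical event records: (artifact type, artifact instance id, timestamp, activity)
EVENTS = [
    ("Sales", "S1", 1, "Create Order"),
    ("Delivery", "D1", 2, "Create Delivery"),
    ("Sales", "S1", 3, "Update Price"),
    ("Delivery", "D2", 4, "Create Delivery"),
    ("Delivery", "D1", 5, "Pick Delivery"),
    ("Delivery", "D2", 6, "Pick Delivery"),
]


def extract_logs(events):
    """Group events into one event log per artifact type, with one trace per instance."""
    logs = defaultdict(lambda: defaultdict(list))
    for artifact_type, instance_id, _, activity in sorted(events, key=lambda e: e[2]):
        logs[artifact_type][instance_id].append(activity)
    return logs


def directly_follows(log):
    """Count how often one activity is directly followed by another within a trace."""
    counts = Counter()
    for trace in log.values():
        counts.update(zip(trace, trace[1:]))
    return counts


# One directly-follows graph per artifact type, each using a single case identifier.
life_cycles = {
    artifact_type: directly_follows(log)
    for artifact_type, log in extract_logs(EVENTS).items()
}
```

Because each artifact type is mined from its own log, every event is counted exactly once, and the edge counts can also serve as simple frequency annotations on the resulting life-cycle model.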
Artifact-centric conformance checking and artifact-centric model enhancement follow the same problem decomposition into data schema, artifact types, life cycles, and interactions as artifact-centric discovery. Depending on which models are available, the techniques may also be combined by first discovering data schema and artifact types, then extracting event logs, and then checking life-cycle and behavioral conformance for an existing model or enhancing an existing artifact model with performance information.
Key Research Findings
Nooijen et al. (2012) provide a technique for automatically discovering artifact types from a relational database, leveraging schema summarization techniques to cluster tables into artifact types based on information entropy in a table and the strength of foreign key relations. The semiautomatic approach of Lu et al. (2015) can then be used to refine artifact types and undo horizontal and vertical (anti-) partitioning and to discover relations between artifacts. Popova et al. (2015) show how to discover artifact types from a rich event stream by grouping events based on common identifiers into entities and then deriving structural relations between them.
Event log extraction.
In addition to discovering artifact types, Nooijen et al. (2012) also automatically create a mapping from the relational database to the artifact-type specification. The technique of Verbeek et al. (2010) can use this mapping to generate queries for event log extraction for life-cycle discovery automatically. Jans (2017) provides guidelines for extracting specific event logs from databases through user-defined queries. The event log may also be extracted from database redo logs using the technique of de Murillas et al. (2015) and from databases through a meta-model-based approach as proposed by de Murillas et al. (2016).
Given the event log of an artifact, artifact life-cycle discovery is a classical automated process discovery problem for which various process discovery algorithms are available, most returning models based on or similar to Petri nets. Weerdt et al. (2012) compared various discovery algorithms using real-life event logs. Lu et al. (2015) advocate the use of the Heuristics Miner of Weijters and Ribeiro (2011) and vanden Broucke and Weerdt (2017) for the purpose of visual analytics. Popova et al. (2015) advocate discovering models with precise semantics and free of behavioral anomalies that (largely) fit the event log (Leemans et al. 2013; Buijs et al. 2012), which allows translating the result to the Guard-Stage-Milestone notation.
Lu et al. (2015) discover behavioral dependencies between two artifacts by extracting an interaction event log that combines the events of any two related artifact instances into one trace. Applying process discovery to this interaction event log then makes it possible to extract “flow edges” between activities of the different artifacts, also across one-to-many relations, leading to a model as shown in Fig. 6. This approach has been validated to return only those dependencies actually recorded in the event data but suffers when interactions can occur in many different variants, leading to many different “flow edges.”
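The construction of an interaction event log can be sketched as follows. The data, the function names, and the "Artifact:activity" labeling are assumptions made for illustration; counting directly-follows pairs that cross artifact boundaries is a simplification and not the discovery procedure of Lu et al. (2015).

```python
from collections import Counter

# Hypothetical events per artifact instance: instance id -> [(timestamp, "Artifact:activity")]
SALES = {"S1": [(1, "Sales:Create Order"), (4, "Sales:Update Price")]}
DELIVERY = {
    "D1": [(2, "Delivery:Create Delivery"), (5, "Delivery:Pick Delivery")],
    "D2": [(3, "Delivery:Create Delivery"), (6, "Delivery:Pick Delivery")],
}
# Pairs of related instances (a one-to-many relation from S1 to D1 and D2).
RELATED = [("S1", "D1"), ("S1", "D2")]


def interaction_log(left, right, related):
    """Create one trace per pair of related instances by merging their events by time."""
    return {
        (left_id, right_id): [
            activity for _, activity in sorted(left[left_id] + right[right_id])
        ]
        for left_id, right_id in related
    }


def flow_edges(log):
    """Count directly-follows pairs whose two activities belong to different artifacts."""
    edges = Counter()
    for trace in log.values():
        for first, second in zip(trace, trace[1:]):
            if first.split(":")[0] != second.split(":")[0]:
                edges[(first, second)] += 1
    return edges


log = interaction_log(SALES, DELIVERY, RELATED)
edges = flow_edges(log)
```

In this example the events of S1 appear in both traces, which is intended here: each trace describes the interaction of exactly one pair of related instances, so the cross-artifact edges are counted once per pair.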
van Eck et al. (2017) generalize the interaction event log further and create an integrated event log of all artifact types to be considered (two or more) where for each combination of related artifact instances, all events are merged into a single trace. From this log, a composite state machine model is discovered which describes the synchronization of all artifact types. By projecting the composite state machine onto the steps of each artifact type, the life-cycle model for each artifact is obtained, and the interaction between multiple artifacts can be explored interactively in a graphical user interface through their relation in the composite state machine. This approach assumes one-to-one relations between artifacts.
Popova and Dumas (2013) discover behavioral dependencies in the form of data conditions over data attributes and states of other artifacts, similar to the notation in Fig. 5; the approach is limited to one-to-one relations between artifacts.
Artifact life-cycle conformance can be checked through extracting artifact life-cycle event logs and then applying classical conformance checking techniques (Fahland et al. 2011a). The technique in Fahland et al. (2011b) checks interaction conformance in an artifact life-cycle model if detailed information about which artifact instances interact is recorded in the event data.
Models for artifacts.
Artifact-centric process mining techniques originated from, and are to a large extent determined by, the modeling concepts available to describe process behavior and data flow with multiple case identifiers. Several proposals have been made in this area. The Proclet notation (van der Aalst et al. 2001) extended Petri nets with ports that specify one-to-many and many-to-many cardinality constraints on messages exchanged over channels in an asynchronous fashion. Fahland et al. (2011c) discuss a normal form for proclet-based models akin to the second normal form in relational schemas. The Guard-Stage-Milestone (GSM) notation (Hull et al. 2011) allows specifying artifacts and interactions using event-condition-action rules over the data models of the different artifacts. Several modeling concepts of GSM were adopted by the CMMN 1.1 standard of OMG (2016). Hariri et al. (2013) propose data-centric dynamic systems (DCDS) to specify artifact-centric behavior in terms of updates of database records using logical constraints. Existing industrial standards can also be extended to describe artifacts as shown by Lohmann and Nyolt (2011) for BPMN and by Estañol et al. (2012) for UML. Freedom from behavioral anomalies can be verified for UML-based models (Calvanese et al. 2014) and for DCDS (Montali and Calvanese 2016). Meyer and Weske (2013) show how to translate between artifact-centric and activity-centric process models, and Lohmann (2011) shows how to derive an activity-centric process model describing the interactions between different artifacts based on behavioral constraints.
Examples of Application
Artifact-centric process mining is designed for analyzing event data where events can be related to more than one case identifier or object and where more than one case identifier has to be considered in the analysis.
The primary use case is analyzing processes in information systems that store multiple, related data objects, such as Enterprise Resource Planning (ERP) systems. These systems store documents about business transactions that are related to each other in one-to-many and many-to-many relations. Lu et al. (2015) correctly distinguish normal and outlier flows between 18 different business objects over 2 months of data of the Order-to-Cash process in an SAP ERP system using artifact-centric process mining. The same technique was also used for identifying outlier behavior in processes of a project management system together with end users. van Eck et al. (2017) analyzed the personal loan and overdraft process of a Dutch financial institution. Artifact-centric process mining has also been applied successfully to software project management systems such as Jira and customer relationship management systems such as Salesforce (Calvo 2017).
Artifact-centric process mining can also be applied to event data outside information systems. One general application area is analyzing the behavior of physical objects as sensed by multiple related sensors. For instance, van Eck et al. (2016) analyzed the usage of physical objects equipped with multiple sensors. Another general application area is analyzing the behavior of software components from software execution event logs. For instance, Liu et al. (2016) follow the artifact-centric paradigm to structure events of software execution logs into different components and discover behavioral models for each software component individually.
Future Directions for Research
At the current stage, artifact-centric process mining is still under development, which leaves several directions for future research.
Automatically discovering artifact types from data sources is currently limited to summarizing the structures in the available data. Mapping these structures to domain concepts still requires user input. The automated extraction of event logs from the data source also relies on the mapping from the data source to the artifact-type definition. How to aid the user in discovering and mapping the data to domain-relevant structures and reducing the time and effort to extract event logs, possibly through the use of ontologies, is an open problem. In addition, little research has been done on improving the queries generated for automated event log extraction to handle large amounts of event data.
Although many different modeling languages and concepts for describing artifact-centric processes have been proposed, the proposed concepts do not adequately capture the complex dynamics of artifact interactions in an easy-to-understand form (Reijers et al. 2015). Further research is needed to identify appropriate modeling concepts for artifact interactions.
Systems with multiple case identifiers are by their nature complex systems, where complex behaviors and multiple variants in the different artifacts multiply when considering artifact interactions. Further research is needed on how to handle this complexity, for example, through generating specific, interactive views as proposed by van Eck et al. (2017).
Although several comprehensive conformance criteria in artifact-centric process mining have been identified, only behavioral conformance of artifact life cycles can currently be measured. Further research on measuring structural conformance and interaction conformance is required, not only for detecting deviations but also for objectively evaluating the quality of artifact-centric process discovery algorithms.
- Buijs JCAM, van Dongen BF, van der Aalst WMP (2012) A genetic algorithm for discovering process trees. In: IEEE congress on evolutionary computation
- Calvanese D, Montali M, Estañol M, Teniente E (2014) Verifiable UML artifact-centric business process models. In: CIKM
- Calvo HAS (2017) Artifact-centric log extraction for cloud systems. Master’s thesis, Eindhoven University of Technology
- de Murillas EGL, van der Aalst WMP, Reijers HA (2015) Process mining on databases: unearthing historical data from redo logs. In: BPM
- de Murillas EGL, Reijers HA, van der Aalst WMP (2016) Connecting databases with process mining: a meta model and toolset. In: BMMDS/EMMSAD
- Estañol M, Queralt A, Sancho MR, Teniente E (2012) Artifact-centric business process models in UML. In: Business process management workshops
- Fahland D, de Leoni M, van Dongen BF, van der Aalst WMP (2011b) Conformance checking of interacting processes with overlapping instances. In: BPM
- Fahland D, de Leoni M, van Dongen BF, van der Aalst WMP (2011c) Many-to-many: some observations on interactions in artifact choreographies. In: ZEUS
- Hariri BB, Calvanese D, Giacomo GD, Deutsch A, Montali M (2013) Verification of relational data-centric dynamic systems with external services. In: PODS
- Hull R, Damaggio E, Masellis RD, Fournier F, Gupta M, Heath FT, Hobson S, Linehan MH, Maradugu S, Nigam A, Sukaviriya N, Vaculín R (2011) Business artifacts with guard-stage-milestone lifecycles: managing artifact interactions with conditions and events. In: DEBS
- Jans M (2017) From relational database to event log: decisions with quality impact. In: First international workshop on quality data for process analytics
- Liu CS, van Dongen BF, Assy N, van der Aalst WMP (2016) Component behavior discovery from software execution data. In: 2016 IEEE symposium series on computational intelligence (SSCI), pp 1–8
- Lohmann N, Nyolt M (2011) Artifact-centric modeling using BPMN. In: ICSOC workshops
- Meyer A, Weske M (2013) Activity-centric and artifact-centric process model roundtrip. In: Business process management workshops
- Nooijen EHJ, van Dongen BF, Fahland D (2012) Automatic discovery of data-centric and artifact-centric processes. In: Business process management workshops. Lecture notes in business information processing, vol 132. Springer, pp 316–327. https://doi.org/10.1007/978-3-642-36285-9_36
- OMG (2016) Case management model and notation, version 1.1. http://www.omg.org/spec/CMMN/1.1
- Popova V, Dumas M (2013) Discovering unbounded synchronization conditions in artifact-centric process models. In: Business process management workshops
- van der Aalst WMP (2016) Process mining – data science in action, 2nd edn. Springer. https://doi.org/10.1007/978-3-662-49851-4
- van der Aalst WMP, Barthelmess P, Ellis CA, Wainer J (2001) Proclets: a framework for lightweight interacting workflow processes. Int J Coop Inf Syst 10:443–481
- van Eck ML, Sidorova N, van der Aalst WMP (2016) Composite state machine miner: discovering and exploring multi-perspective processes. In: BPM
- van Eck ML, Sidorova N, van der Aalst WMP (2017) Guided interaction exploration in artifact-centric process models. In: 2017 IEEE 19th conference on business informatics (CBI), vol 1, pp 109–118
- Verbeek HMW, Buijs JCAM, van Dongen BF, van der Aalst WMP (2010) XES, XESame, and ProM 6. In: CAiSE forum
- Weijters AJMM, Ribeiro JTS (2011) Flexible heuristics miner (FHM). In: 2011 IEEE symposium on computational intelligence and data mining (CIDM), pp 310–317