
1 Introduction

According to the UN annual report, the number of older people is projected to reach nearly 2.1 billion by 2050, more than twice as many as in 2017 [1]. As a result of population aging, “multi-morbidity” is expected to increase [2]. Multi-morbidity refers to any co-occurrence of conditions in the same person [2]. The term “co-morbidity” is sometimes used instead of multi-morbidity, although co-morbidity is strictly defined as the combination of additional disorders besides an index disease [2]. Treating multi-morbid patients is a complicated task because it raises several challenges. These include recognizing the signs and symptoms of different illnesses, managing multiple medications and treatments, handling interactions between various health conditions, and allocating resources in medical centers. These challenges lead us to develop care pathways for patients with multi-morbidity that overcome them.

Care pathways, one of the central tools used in healthcare, can be described as a straightforward statement of aims, a representation of the interactions between health resources and patients, or a description of the roles, sequential decisions, and activities related to the care process [3]. The primary goal of care pathways is to reduce variability in the treatment of diseases [4]. Since care pathways are a set of time-framed events that focus on a specific situation and provide guidance on how to deal with conditions that appear during treatment [4], a care pathway can itself be considered a process, i.e., a sequence of events with a common goal [5].

Table 1. Example of an event log with single-dimensional (single-entity) event data.

Processes can be represented graphically by process models [5], which are used to explain responsibilities, inspect compliance, predict performance using simulation [6], manage complexity, reduce variation, and enhance coordination in processes [5]. Discovering process models from event logs, known as process discovery, is one of the main tasks in process mining. Event logs contain sequences of events recorded by information systems. Every recorded event refers to at least (1) an activity (i.e., a well-defined step in the process), (2) a case or process instance representing a single entity, and (3) a unique timestamp. Logs fulfilling these requirements are called single-dimensional or single-entity event data [7]; an example is shown in Table 1. Events in single-entity event data can also refer to properties (e.g., the person executing or initiating the activity) [5].
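To make the single-entity setting concrete, the following minimal Python sketch (not taken from the study) represents an event log as (case, activity, timestamp) records and counts directly-follows pairs within each case, which is the basis of a directly-follows graph. The case identifiers, activity names, and integer timestamps are hypothetical and are not the contents of Table 1.

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class Event(NamedTuple):
    case_id: str    # the single case notion: every event belongs to exactly one case
    activity: str   # a well-defined step in the process
    timestamp: int  # hypothetical integer time used only for ordering


def directly_follows(log: list[Event]) -> Counter:
    """Count (a, b) pairs where activity b directly follows activity a in the same case."""
    traces: dict[str, list[Event]] = defaultdict(list)
    for event in log:
        traces[event.case_id].append(event)
    dfg: Counter = Counter()
    for events in traces.values():
        events.sort(key=lambda e: e.timestamp)
        for first, second in zip(events, events[1:]):
            dfg[(first.activity, second.activity)] += 1
    return dfg


# Hypothetical single-entity log with two cases.
log = [
    Event("c1", "Register", 1), Event("c1", "Examine", 2), Event("c1", "Discharge", 3),
    Event("c2", "Register", 1), Event("c2", "Discharge", 5),
]
print(directly_follows(log))
```

Because every event carries exactly one case identifier, grouping by that identifier is all the correlation that is needed; this is precisely what breaks down once events refer to several entities.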

To satisfy all practitioners in the healthcare sector and achieve a holistic process view of care pathways [8], we should consider more than one clinical process of a patient's care pathway. However, the standard type of event data forces us to create a separate event log for each clinical process of a patient's care pathway. On the other hand, if we have multi-entity event data, meaning that events refer to multiple entities (e.g., each clinical process of a care pathway), relational databases and traditional process mining techniques are ineffective.

This study explores the potential of analyzing care pathways of patients with multi-morbidity using a multi-entity event data representation that reflects the independent clinical processes. Our main research question is: How can we identify valuable insights by using a multi-entity event data representation for the care pathways of multi-morbid patients? The remainder of this paper is structured as follows. Section 2 reviews state-of-the-art research on the use of multi-entity event data in process mining and on how to represent and store such data. Section 3 introduces MIMIC-III, which is used to illustrate and validate our approach. In Sect. 4, we show how to build multi-entity event data for multi-morbid patients. In Sect. 5, we present preliminary results, which are then discussed in Sect. 6. We conclude with an outlook on future work in Sect. 7.

2 Related Work

Multi-entity event data cannot be stored in the same way as single-entity event data; furthermore, in this setting process discovery is not possible with traditional methods. In this section, we review the related literature from several perspectives to select a suitable format for the event data of multi-morbid care pathways.

Table 2. Excerpt of an event log with multi-entity event data relating to multiple entities that can be converted into an event graph representation [7].

2.1 Multi-entity Event Data

In the approach of [9], known as object-centric process mining, each case notion is referred to as an object type (e.g., application and vacancy can be two case notions, or two object types, each with its own case identifiers). In that approach, events can refer to multiple case notions instead of a single one. A process model is first discovered sequentially for all objects; then each directly-follows relation is labeled with its related object type. For example, if event-1 and event-2 both relate to object-1 and event-1 happened right before event-2 among the events of object-1, then event-2 directly follows event-1 for that object's type, and so on.
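The labeling idea can be illustrated with a simplified sketch using the application/vacancy example above. It is not the discovery algorithm of [9]: it only follows the events of each object in time order and labels every resulting directly-follows edge with that object's type. All events, object identifiers, and activity names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical object-centric events: each event may refer to several objects,
# and every object is a (object type, object id) pair.
events = [
    {"id": "e1", "activity": "Open vacancy", "time": 1, "objects": [("vacancy", "v1")]},
    {"id": "e2", "activity": "Submit application", "time": 2,
     "objects": [("application", "a1"), ("vacancy", "v1")]},
    {"id": "e3", "activity": "Submit application", "time": 3,
     "objects": [("application", "a2"), ("vacancy", "v1")]},
    {"id": "e4", "activity": "Reject", "time": 4, "objects": [("application", "a1")]},
    {"id": "e5", "activity": "Close vacancy", "time": 5, "objects": [("vacancy", "v1")]},
]


def typed_directly_follows(events):
    """Return a Counter of (object_type, activity_a, activity_b) edges."""
    per_object = defaultdict(list)
    for event in events:
        for obj in event["objects"]:
            per_object[obj].append(event)
    edges = Counter()
    for (obj_type, _obj_id), obj_events in per_object.items():
        obj_events.sort(key=lambda e: e["time"])
        # Consecutive events of the same object form a DF edge labeled with its type.
        for a, b in zip(obj_events, obj_events[1:]):
            edges[(obj_type, a["activity"], b["activity"])] += 1
    return edges


print(typed_directly_follows(events))
```

Note that the two "Submit application" events follow each other for the vacancy but not for any single application, which is exactly the distinction the type label preserves.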

Another type of multi-entity event data was proposed in [7]. There, no single case notion is required; instead, events are related to one or more entities of different entity types. Entities themselves can also be related to each other. The input event data required by [7] is similar to the excerpt shown in Table 2. Information about the relations may also be extracted from other sources, e.g., relational database keys. Process models can be discovered flexibly per entity or for various combinations of entities. Our event log format for storing multi-entity event data is based on this model.

2.2 Storing Event Data

A classical approach for storing event data is to use relational databases (RDBs). A relatively new approach is to use an event graph, a graph data structure built by mapping relational database concepts to nodes and relationships [7]. This yields a natural representation of multi-entity event data and makes it possible to discover multi-entity models by querying the event graph.

A series of experiments was conducted in [10] to compare the performance and efficacy of relational databases and event graphs, showing the higher capabilities of event graphs. Extracting multi-entity event data for traditional process mining requires flattening the event data, because only a single case notion can be chosen [7]. Additionally, a graph database can store all case notions of a multi-entity directly-follows graph in a single graph [7].
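The effect of flattening can be illustrated with a small sketch. Assuming hypothetical events that each refer to a patient and to zero or more admissions, choosing Admission as the single case notion silently drops a pre-admission laboratory event and duplicates an event linked to two admissions, whereas choosing Patient keeps every event but loses the admission structure. The event names and identifiers are illustrative only.

```python
from collections import defaultdict

# Hypothetical multi-entity events: each maps an entity type to the ids it refers to.
events = [
    {"activity": "Lab test taken", "time": 1, "entities": {"Patient": ["p1"], "Admission": []}},
    {"activity": "Admission", "time": 2, "entities": {"Patient": ["p1"], "Admission": ["h1"]}},
    {"activity": "Transfer", "time": 3, "entities": {"Patient": ["p1"], "Admission": ["h1"]}},
    # Hypothetical event that refers to two admissions at once.
    {"activity": "Review", "time": 4, "entities": {"Patient": ["p1"], "Admission": ["h1", "h2"]}},
]


def flatten(events, case_type):
    """Build a single-entity log (case id -> activity sequence) for one entity type."""
    log = defaultdict(list)
    for event in sorted(events, key=lambda e: e["time"]):
        # An event referring to k entities of the chosen type is copied k times;
        # an event referring to none of them does not appear in the flattened log.
        for case_id in event["entities"].get(case_type, []):
            log[case_id].append(event["activity"])
    return dict(log)


print(flatten(events, "Admission"))
print(flatten(events, "Patient"))
```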

Event graphs have recently been used to store event data. The work in [11] introduces an approach to store single-entity event logs in, and retrieve them from, graph databases. That approach defines how log files shall be stored in a graph database, and it also illustrates how directly-follows graphs (DFGs) can be computed in the graph database. In another recent study, task executions and routines in processes were classified and detected using event graphs [12]. There, the event log was first transformed into an event graph, and graph theory was then used to detect task execution patterns and their changes over time.

Converting multi-entity event data to an event graph was formalized in [7] by conceptualizing the event log, events, entities, and classes. According to [7], each event log contains several events, and each event is, on the one hand, correlated to entities and, on the other hand, observed by classes. Events are related to each other if they directly follow each other. Entities can be related to each other based on the occurrence of their events. Likewise, classes can be related to each other by directly-follows relationships. For these reasons, an event graph appears to be better suited than a relational database for storing multi-entity event data.
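One simple reading of how entities become related "based on the occurrence of their events" is that two entities are related whenever at least one event is correlated to both. The sketch below derives such entity-to-entity relations from hypothetical (event, entity) correlation pairs. It is an assumption for illustration, not the procedure of [7], which also allows relations taken from other sources such as database keys.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical correlation pairs: (event id, (entity type, entity id)).
corr = [
    ("e1", ("Patient", "p1")),
    ("e2", ("Patient", "p1")), ("e2", ("Admission", "h1")),
    ("e3", ("Admission", "h1")), ("e3", ("Prescription", "rx1")),
]


def derive_entity_relations(corr):
    """Relate two entities whenever some event is correlated to both of them."""
    entities_per_event = defaultdict(set)
    for event_id, entity in corr:
        entities_per_event[event_id].add(entity)
    relations = set()
    for entities in entities_per_event.values():
        # Sorting gives each unordered pair one canonical orientation.
        for a, b in combinations(sorted(entities), 2):
            relations.add((a, b))
    return relations


print(derive_entity_relations(corr))
```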

Vogelsang et al. [13] look at process mining from multiple dimensions. However, these dimensions concern properties of cases, such as region and patient age, rather than event data. Their approach uses several single-entity event logs, separated by region, age, and so forth.

Overall, we found that using event graphs in a healthcare setting, and in particular discovering care pathways from multi-entity event data using event graphs, has not yet been explored in the literature.

3 Multi-entity Event Data in MIMIC-III

To evaluate the feasibility of using event graphs for clinical pathways of multi-morbid patients, we use MIMIC-III [14]. MIMIC-III is a freely accessible tertiary care database containing information on patients admitted to critical care units (CCU) of the Beth Israel Deaconess Medical Center in Boston, Massachusetts, between 2001 and 2012. Its data were drawn from several sources, such as critical care monitoring information systems, bedside monitors, hospital and laboratory electronic health record databases, and the Social Security Administration.

Table 3. Combinations of ICD-9 codes per hospital admission and their number of occurrences among patients.

The ninth revision of the International Statistical Classification of Diseases and Related Health Problems (ICD-9) is a widely used diagnostic coding system. Each ICD-9 code corresponds to a single diagnosis, except codes starting with E and V, which relate to external causes of injury and to a supplementary classification, respectively. We use ICD-9 codes to identify multi-morbid patients: patients with several ICD-9 codes are considered patients with multi-morbidity.

We use a subset of the MIMIC-III data. To extract event data, we first grouped the values of the icd9_code column of the DIAGNOSES_ICD table, excluding codes starting with E and V, by each distinct hospital admission identifier (hadm_id). The DIAGNOSES_ICD table contains patient identifiers (subject_id), hospital admission identifiers (hadm_id), the order in which the ICD-9 diagnoses were made (seq_num), and the ICD-9 codes (icd9_code). The admissions were then grouped by their collection of ICD-9 codes, as shown in Table 3. Each row of Table 3 shows the number of occurrences of a disease (or group of diseases), coded in ICD-9 format, at the time of the patients' hospital admission. If a row lists more than one disease, we consider the corresponding admissions multi-morbidity cases. Note that a patient can have several admission identifiers, one for each time the patient was admitted to the hospital.
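The grouping step can be sketched with the Python standard library. The rows below are hypothetical stand-ins for DIAGNOSES_ICD records in the column order (subject_id, hadm_id, seq_num, icd9_code); they are not MIMIC-III data. The rule "more than one remaining code per admission means multi-morbidity" is our reading of the text, and the helper names are illustrative.

```python
from collections import Counter, defaultdict

# Hypothetical DIAGNOSES_ICD rows: (subject_id, hadm_id, seq_num, icd9_code).
rows = [
    (1, 100, 1, "41401"), (1, 100, 2, "25000"), (1, 100, 3, "E8790"),
    (1, 101, 1, "41401"), (1, 101, 2, "25000"),
    (2, 200, 1, "4019"), (2, 200, 2, "V4581"),
]


def codes_per_admission(rows):
    """Group ICD-9 codes by hadm_id, skipping codes that start with E or V."""
    grouped = defaultdict(set)
    for _subject_id, hadm_id, _seq_num, code in rows:
        if not code.startswith(("E", "V")):
            grouped[hadm_id].add(code)
    return {hadm: frozenset(codes) for hadm, codes in grouped.items()}


def combination_counts(per_admission):
    """Count how often each combination of codes occurs (one row per combination, cf. Table 3)."""
    return Counter(per_admission.values())


per_admission = codes_per_admission(rows)
multimorbid = {hadm for hadm, codes in per_admission.items() if len(codes) > 1}
print(combination_counts(per_admission))
print(sorted(multimorbid))
```

Using a frozenset per admission makes the code combination hashable, so identical combinations from different admissions are counted together regardless of their seq_num order.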

From this initial look at a subset of the MIMIC-III dataset on multi-morbid patients, multiple entities can be identified, e.g., admissions and diseases (ICD codes). We now describe the relevant entities in detail and extract them to build an event graph representation.

4 Event Graphs for Multi-morbid Patients Pathways

This study explores how to analyze multi-entity event data of patients with multi-morbidity based on an event graph. Based on our research question, we formulated the following hypothesis: applying event graphs produces valuable insights when using multi-entity event data for clinical pathways of multi-morbid patients. To investigate this hypothesis, we designed an experiment. This section describes the method we followed to build event graphs and discover care pathways for multi-morbid patients.

4.1 Identifying and Extracting Entities

Each distinct clinical process related to patients with multi-morbidity is called an entity. Since several clinical processes are involved in treating multi-morbid patients, entities can easily be identified by considering those clinical processes. We identified the following entities in the subset of the MIMIC-III dataset:

  1. Logistic. This entity contains events for admission, discharge, registration at the emergency department (ED), discharge from the ED, in-hospital death (if applicable), call-out requests (when a patient is ready for discharge), and transfers between services, care units, and wards. Six MIMIC-III tables were used to extract this entity's events: PATIENTS, ADMISSIONS, CALLOUT, SERVICES, ICUSTAYS, and TRANSFERS.

  2. Laboratory_Measurement. This entity contains events of the type abnormal laboratory measurement, which play an essential role in diagnosing and treating patients' diseases. To extract these events, the label, value, valueuom, and flag columns of the D_LABITEMS and LABEVENTS tables were used.

  3. Prescriptions. This entity contains the start and end timestamps of medication-related order entries, i.e., prescriptions, including the prescribed drug and its dose value, form, and unit. The PRESCRIPTIONS table was used to extract this entity.

  4. Diagnosis. This entity relates to the first event at the beginning of each patient admission. It comprises the group of ICD codes recording the patient's diseases in that admission. The DIAGNOSES_ICD table, together with its relationships to other tables, was used to extract the ICD codes of this entity.

  5. Admission. Finally, the hospital admission identifier was appended to the multi-entity event data. An event related to a NULL admission identifier is associated with the outpatient clinic.

Table 4 shows an excerpt of the multi-entity event data created for the patient with identifier 4900. Each row of Table 4 is one event of the multi-entity event data, and we consider the admission identifier, or the corresponding patient identifier, as the case identifier.
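The following sketch shows one possible way to merge per-entity events into a single multi-entity log with the admission as case identifier. The table lists mirror the entities described above; the MultiEntityEvent record, the helpers build_log and case_of, and all identifiers and timestamps are hypothetical, and the actual SQL extraction from MIMIC-III is not shown.

```python
from dataclasses import dataclass
from typing import Optional

# Source tables per entity type, as listed in the entity descriptions above.
# The Admission entity is carried by the hadm_id field rather than a separate source.
ENTITY_SOURCES = {
    "Logistic": ["PATIENTS", "ADMISSIONS", "CALLOUT", "SERVICES", "ICUSTAYS", "TRANSFERS"],
    "Laboratory_Measurement": ["D_LABITEMS", "LABEVENTS"],
    "Prescriptions": ["PRESCRIPTIONS"],
    "Diagnosis": ["DIAGNOSES_ICD"],
}


@dataclass
class MultiEntityEvent:
    subject_id: int
    hadm_id: Optional[int]  # None (NULL) means the event belongs to the outpatient clinic
    timestamp: str          # ISO 8601 strings sort chronologically
    activity: str
    entity: str             # one of the keys of ENTITY_SOURCES


def build_log(*sources):
    """Merge per-entity event lists into one log ordered by patient and time."""
    merged = [event for source in sources for event in source]
    for event in merged:
        if event.entity not in ENTITY_SOURCES:
            raise ValueError(f"unknown entity type: {event.entity}")
    return sorted(merged, key=lambda e: (e.subject_id, e.timestamp))


def case_of(event):
    """Use the admission as case identifier; outpatient events get a per-patient case."""
    return event.hadm_id if event.hadm_id is not None else f"outpatient-{event.subject_id}"


# Hypothetical events for one patient.
lab_events = [MultiEntityEvent(1, None, "2101-01-01T08:00", "L_Taken", "Laboratory_Measurement")]
logistic_events = [MultiEntityEvent(1, 10, "2101-01-05T10:00", "HA", "Logistic")]
log = build_log(logistic_events, lab_events)
print([(case_of(e), e.activity, e.entity) for e in log])
```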

Fig. 1.

Graph creation for Patient_4900: Steps (top), (bottom left), and (bottom right)

Fig. 2.

Graph creation for Patient_4900: Steps (top left), (top right), (bottom)

Table 4. Excerpt of an event log extracted from MIMIC-III with multiple entities. We abbreviate event labels in the remainder as follows: L_Taken = Laboratory Test Taken, LAM = Laboratory Abnormal Measurement, CA = Coronary Atherosclerosis, DM = Diabetes Mellitus, HL = Hypercholesterolemia, HT = Hypertension, TBS = Transfers Between Services, TIW 27 = Transfer Into Ward: 27, HA = Hospital Admission.

4.2 Building the Event Graph

Figs. 1 and 2 show the steps we followed to create the event graph from the multi-entity event data, based on the approach introduced in [7]. First, each record of the event log was converted to a node, called an event node, and another node was created for the event log itself; relationships from each event node to the log node were then created. Next, nodes for the cases' entities and their properties, called entity nodes, were generated, and each event node was correlated to its corresponding entity node. The entity nodes were related to each other based on the sequential occurrence of their events, and these relationships between entity nodes were reified, i.e., turned into derived entity nodes of their own. Directly-follows relationships between event nodes were then created based on the entities and properties. Finally, event class nodes and property class nodes were created for distinct events and properties, respectively, and aggregated directly-follows relationships between these class nodes were created.
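The construction steps above can be sketched in plain Python without a graph database. This in-memory version creates event nodes, a log node, entity nodes with correlation relationships, per-entity directly-follows relationships, event class nodes, and aggregated class-level directly-follows relationships. It omits property nodes and the reification of entity relationships. The relationship names (IN_LOG, CORR, DF, OBSERVED, DF_C) are illustrative labels chosen to mirror the vocabulary of Sect. 2.2 and may differ from those used in [7]; the records are hypothetical.

```python
from collections import Counter, defaultdict


def build_event_graph(log_id, records):
    """Build an in-memory event graph from multi-entity event records.

    Each record is a dict with keys "id", "activity", "time" and "entities",
    where "entities" maps an entity type to one entity id. Returns
    (nodes, rels); each rel is (source, relationship type, target, properties).
    """
    log_node = ("Log", log_id)
    nodes = {log_node}
    rels = []
    events_of = defaultdict(list)  # entity node -> records correlated to it

    for rec in records:
        event = ("Event", rec["id"])
        nodes.add(event)
        rels.append((event, "IN_LOG", log_node, {}))      # event node -> log node
        for entity_type, entity_id in rec["entities"].items():
            entity = ("Entity", entity_type, entity_id)
            nodes.add(entity)
            rels.append((event, "CORR", entity, {}))      # event correlated to entity
            events_of[entity].append(rec)

    # Directly-follows per entity: consecutive events of the same entity in time order.
    for entity, recs in events_of.items():
        ordered = sorted(recs, key=lambda r: r["time"])
        for a, b in zip(ordered, ordered[1:]):
            rels.append((("Event", a["id"]), "DF", ("Event", b["id"]),
                         {"entity_type": entity[1]}))

    # Event class nodes: one per distinct activity, linked by OBSERVED.
    for rec in records:
        cls = ("Class", rec["activity"])
        nodes.add(cls)
        rels.append((("Event", rec["id"]), "OBSERVED", cls, {}))

    # Aggregate event-level DF into class-level DF_C, counted per entity type.
    activity_of = {rec["id"]: rec["activity"] for rec in records}
    aggregated = Counter(
        (activity_of[src[1]], activity_of[dst[1]], props["entity_type"])
        for src, kind, dst, props in rels if kind == "DF"
    )
    for (a, b, entity_type), count in aggregated.items():
        rels.append((("Class", a), "DF_C", ("Class", b),
                     {"entity_type": entity_type, "count": count}))
    return nodes, rels


# Hypothetical records using abbreviations from Table 4.
records = [
    {"id": "e1", "activity": "HA", "time": 1, "entities": {"Admission": "h1", "Logistic": "lg1"}},
    {"id": "e2", "activity": "LAM", "time": 2,
     "entities": {"Admission": "h1", "Laboratory_Measurement": "lab1"}},
    {"id": "e3", "activity": "TBS", "time": 3, "entities": {"Admission": "h1", "Logistic": "lg1"}},
]
nodes, rels = build_event_graph("mimic_subset", records)
```

In a Neo4j-based implementation, each step would instead be a Cypher statement over stored nodes; the in-memory version only makes the order of the steps and the per-entity nature of the directly-follows relation explicit.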

5 Results of Application to MIMIC-III

Our preliminary evaluation relies on a qualitative discussion. We analyze the multi-entity directly-follows graphs generated from the MIMIC-III database and evaluate to what extent they support our hypothesis. We implemented the event graph creation using Python and Neo4j and adapted the code provided by [7] for our case¹. Multi-entity directly-follows graphs were discovered by querying the event graph with Cypher (CQL) and visualized with Graphviz.
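The Cypher queries used in the study are not reproduced here. As a hedged sketch of the visualization step only, the function below turns class-level directly-follows edges, given as hypothetical (source, target, entity type, count) tuples, into Graphviz DOT text and colors each edge by entity type. Red for admissions follows the description in Sect. 6; the other colors and the function name to_dot are assumptions.

```python
# Assumed color per entity type (admissions are drawn in red, as described in Sect. 6).
COLORS = {
    "Admission": "red",
    "Logistic": "blue",
    "Laboratory_Measurement": "darkgreen",
    "Prescriptions": "orange",
    "Diagnosis": "purple",
}


def to_dot(df_edges, name="multi_entity_dfg"):
    """Render (source, target, entity_type, count) edges as Graphviz DOT text."""
    lines = [f"digraph {name} {{", "  rankdir=LR;"]
    for src, dst, entity_type, count in df_edges:
        color = COLORS.get(entity_type, "gray")
        lines.append(f'  "{src}" -> "{dst}" [color={color}, label="{entity_type} ({count})"];')
    lines.append("}")
    return "\n".join(lines)


edges = [("HA", "LAM", "Admission", 1), ("HA", "TBS", "Logistic", 1)]
print(to_dot(edges))
```

The resulting text can be saved to a file and rendered with the Graphviz command line, for example `dot -Tpng graph.dot -o graph.png`.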

The multi-entity directly-follows graphs of two patients are shown in Figs. 3 and 4. These two patients, Patient_4900 and Patient_14606, are examples of multi-morbid patients who were admitted to the hospital several times and had more than one disease at each admission.

As Fig. 3 shows, a laboratory measurement was taken for Patient_4900 before hospital admission (L_Taken node); the abnormal laboratory measurements (LAM node) are one of the bases for diagnosing the patient's diseases. The patient was admitted to the hospital three times. In each admission, several diseases were diagnosed, after which several activities related to the Logistic, Laboratory_Measurement, and Prescriptions entities took place. Four, six, and four diseases were diagnosed in the first, second, and third admission, respectively. The activities of the Logistic, Laboratory_Measurement, and Prescriptions entities differ between admissions because the diagnosed diseases differ across the three admissions; that is, the activities performed for the patient are related to their diseases. The diseases CA (Coronary Atherosclerosis) and DM (Diabetes Mellitus type II) were diagnosed in all three admissions, which indicates that some common activities related to the entities occurred in all three admissions. On the other hand, diseases such as HL (Hypercholesterolemia), HH (Hemorrhage), MN (Malignant neoplasm), and MF (Myocardial Infarction) were diagnosed in only one admission. This shows, first, that there are activities related to the entities that are unique to these diseases and, second, that these diseases were treated in the hospital.

According to Fig. 4, Patient_14606 was admitted to the hospital without any prior laboratory measurement, which means that the diagnoses of the first admission are not related to previous measurements. In the patient's first admission, a group of diseases was diagnosed: CA (Coronary Atherosclerosis), CS (Coronary Syndrome), HD (Hyperlipidemia), HM (Hypothyroidism), and HT (Hypertension). After that, several activities related to the Logistic, Laboratory_Measurement, and Prescriptions entities were conducted to treat these diseases. After the first admission, a laboratory test was taken that served as the basis for the diagnoses of the second admission. In the second admission of Patient_14606, another group of diseases was diagnosed: DM (Diabetes mellitus), CC (Carotid Artery Occlusion), VD (Vascular Disease), HL (Hypercholesterolemia), HM (Hypothyroidism), and HT (Hypertension), after which activities related to the Logistic, Laboratory_Measurement, and Prescriptions entities took place. In the third admission, another group of diseases was diagnosed: CH (Congestive Heart Failure), CD (Cardiac Dysrhythmia), HM (Hypothyroidism), and CC (Carotid Artery Occlusion). For Patient_14606, the coronary diseases (CA and CS) were not diagnosed in the second and third admissions, indicating that the activities in the first admission treated them. Also, some diseases, such as HM (Hypothyroidism), recur in all three admissions, which indicates that they are chronic or that the activities performed for the patient were not effective.
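Comparing diagnoses across admissions, as done above, amounts to simple set operations. The sketch below applies them to the abbreviations listed above for Patient_14606 (the diagnosis lists only, not the event graph) to find diseases that recur in every admission, diseases seen only in the first admission, and how often each disease occurs.

```python
from collections import Counter

# Diagnosis abbreviations per admission of Patient_14606, as listed in the text.
admissions = [
    {"CA", "CS", "HD", "HM", "HT"},
    {"DM", "CC", "VD", "HL", "HM", "HT"},
    {"CH", "CD", "HM", "CC"},
]

# Diseases diagnosed in every admission (candidates for chronic conditions).
in_all = set.intersection(*admissions)
# Diseases diagnosed in the first admission but in no later one.
first_only = admissions[0] - set.union(*admissions[1:])
# Number of admissions in which each disease was diagnosed.
occurrences = Counter(code for diagnoses in admissions for code in diagnoses)

print(sorted(in_all), sorted(first_only), occurrences.most_common(3))
```

The intersection contains only HM, and the first-only set contains the coronary diseases CA and CS together with HD.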

Fig. 3.

Multi-entity directly-follows graph for Patient_4900. (Color figure online)

Fig. 4.

Multi-entity directly-follows graph for Patient_14606 (top) and details (bottom) (Color figure online)

6 Discussion

As Figs. 3 and 4 show, the discovered multi-entity directly-follows graphs of these patients display traditional process mining concepts (e.g., sequences of activities) for all involved clinical processes in a single graph. Moreover, the relationships between activities of different clinical processes, which are not detectable in traditional models, are clearly shown in the discovered directly-follows graphs. These graphs show how the diagnoses of multi-morbid patients evolved during the care pathways, how these diagnoses relate to other events, and how the trajectory of patients varies for each group of diagnoses.

The multi-entity directly-follows graphs of Patient_4900 and Patient_14606 involve four entities, each shown in a different color. Before the first admission of Patient_4900, the patient had abnormal values in out-of-hospital laboratory measurements from clinics the patient had visited. These measurements can be one of the bases for diagnosing the diseases of the patient's first admission, which are shown in the Diagnosis entity. In the discovered graphs, the admissions of each patient are indicated by separate red edges.

These graphs demonstrate that analyzing the care pathways of patients with multi-morbidity is feasible using an event graph. The discovered graphs for individual patients can illustrate all single-entity concepts, such as activities, cases, and their properties, for all entities simultaneously. Based on these results, our hypothesis, that applying event graphs produces valuable insights when using multi-entity event data for clinical pathways of multi-morbid patients, appears to be valid.

7 Conclusions

In this research, we discovered graphs that are more insightful than those of traditional process mining by using multi-entity event data stored in an event graph. We evaluated the potential of the event graph approach proposed by Esser and Fahland [7] for clinical data using the MIMIC-III database. Some limitations of this paper relate to the case study, such as the absence of resources in the MIMIC-III database and its shifted timestamps. Another limitation is the lack of visualization methods for multi-entity event data. Creating appropriate visualization approaches and automating process discovery are directions for future research. Showing sub-processes inside an event would be a highly insightful capability for such graphs and is left for future work. Multi-entity graph notations also need to be researched and developed.