1 Introduction

In recent years, healthcare has become an increasingly important part of daily life, and delivering high-quality care with limited resources has become challenging (Strome et al., 2013). In many countries, healthcare has become a thriving sector of the economy (Yang et al., 2015) and has undergone considerable technological advancement; for example, information technology is now used as part of healthcare management systems (Prokosch & Ganslandt, 2009).

Data mining can be defined as the analysis of data to discover relationships or identify patterns among the elements of a data set. In healthcare, it has been applied to extract hidden patterns from patient data in areas including clinical medicine (Iavindrasana et al., 2009), adverse drug reaction signal detection (Karimi et al., 2015), big data analytics (Ghassemi et al., 2015), diabetes (Sigurdardottir et al., 2007), and skin diseases.

Discrete event simulation (DES) models a system by simulating its sequence of events or processes over time. Because healthcare processes are complex, DES is one of the most widely used decision support tools for assessing trade-offs between the multiple objectives of healthcare systems. Simulation-based optimization can be used to find solutions to problems with a very large number of possible scenarios (Fetter & Thompson, 1965).

Climate change is occurring because of the accumulation of greenhouse gases in the atmosphere from the combustion of fossil fuels. The major greenhouse gases responsible for climate change and global warming are carbon dioxide (CO2), methane (CH4), and nitrous oxide (N2O). Healthcare infrastructure has a large carbon footprint, and hospitals, as a major part of this system, have a high energy demand for lighting, heating, ventilation, air conditioning, and electrical and electronic equipment (Bi & Hansen, 2018a).

From an operations research perspective, carbon footprint (CFP) reduction has been tackled at the strategic (Badri et al., 2013), tactical, and operational (Absi et al., 2013) levels. Planning and advanced scheduling techniques can play key roles in supporting CFP reduction (Liu, 2014). Intense climate events can affect human health directly or indirectly by disrupting ecosystems, agriculture, food supply, water quality, and air quality, and by damaging infrastructure (World Health Organization, 2014). All these effects place a great burden on health systems. Given the importance of healthcare and the environmental threats posed by rising levels of greenhouse gases (GHGs), health authorities need effective ways to reduce the carbon footprint of healthcare systems. Pollard et al. (2013) used data from a case study to propose a bottom-up modeling framework to support decisions regarding both cost and carbon in healthcare, and their results indicated that a bottom-up approach is effective for estimating and modeling the carbon footprint of healthcare.

The study in this paper was conducted at Bushehr Heart Hospital in Iran. The negative effects of waiting on patients are well studied (e.g., Sigurdardottir et al., 2007). However, clinical staff also experience negative effects (Viccellio, 2017): growing queues of patients put staff under significant work pressure and often require them to deal with frustrated patients. In the long run, such pressures can create morale problems and are likely to contribute to absenteeism. We were brought in to help the clinic staff diagnose the causes of poor patient flow and identify effective solutions, and we used simulation modeling as the main tool in our diagnostic and improvement efforts.

One major contribution of our work is to show how a simulation analysis of patient flow can significantly reduce waiting time in a specialized hospital. To the best of our knowledge, our study is the first of its kind for a heart hospital. Another major contribution is to show how simulation modeling, and the “hard” quantitative analysis it provides, can help convince the parties involved to implement improvements. The clinic had previously attempted to improve patient waiting-time performance by testing localized initiatives using standard Plan-Do-Study-Act (PDSA) methods (Fetter & Thompson, 1965). While this approach likely helped create a culture more accepting of change, our modeling provided a systems perspective together with quantitative evidence that the improvements should work. Given the preponderance of scientifically trained staff in healthcare settings, a quantitative, evidence-based approach can be important for successful implementation.

The approach proposed in this research consists of two parts. In the first part, data mining is used to discover relevant relationships in the patient data. In the second part, a simulation–optimization model is developed to find an optimized patient flow while minimizing CO2 emissions. The remainder of this paper is organized as follows. Section 2 reviews related work. Section 3 describes the research methodology. Section 4 presents the results of patient clustering and of the optimization of the care process. Section 5 discusses the findings of this research and presents conclusions.

2 Literature review

Air pollution is a leading cause of global mortality and morbidity in the twenty-first century (Eckelman et al., 2018). Many studies have examined the relationship between climate change and death rates (Sheridan et al., 2011); they have shown, for example, that blood pressure and cardiovascular diseases are related to the diurnal temperature range and that the incidence of heart disease increases with air pollution. The healthcare sector, like other industries, needs to improve its environmental performance, and many new facilities built in the past decades have been designed with this in mind (Pinzone et al., 2012). Appropriate policies are needed to cope with the demand for climate change management within the health sector (Frumkin et al., 2008). As noted in Sect. 1, hospitals have a high energy demand for lighting, heating, ventilation, air conditioning, and electrical and electronic equipment (Bi & Hansen, 2018b); in fact, all medical equipment needs energy to function (Chevalier et al., 2009). Considering the importance of patient flow in hospitals and the use of medical equipment in care processes, this work integrates data mining and simulation–optimization modeling to find an optimized patient flow while minimizing CO2 emissions.

The combination of simulation and optimization methods has grown remarkably in recent years (Sheridan et al., 2011). Klassen and Yoogalingam (2009) proposed a simulation–optimization approach to determine optimal rules for outpatient healthcare service scheduling problems; their approach uses more variables and factors for system modeling than previous studies. Kasaie and Kelton (2013) proposed a simulation–optimization framework for resource allocation (RA) among epidemic control interventions, analyzed the behavior of RA outcomes under different investment strategies, and sought optimal allocations. Cabrera et al. (2012) presented an agent-based modeling (ABM) approach to design a decision support system for a healthcare emergency department (ED). Osorio et al. (2017) presented an integrated simulation–optimization model to support both strategic and operational decisions for production planning in the blood supply chain; this method improved key indicators such as shortages, outdated units, required donors, and cost.

Healthcare encompasses many processes dealing with the diagnosis, treatment, and prevention of disease, injury, and other physical and mental impairments. Data mining has been used in previous studies to extract hidden patterns from patient data (Sun & Reddy, 2013; Yoo et al., 2012). Bruno et al. (2014) proposed an exploratory data mining approach to identify the examinations undergone by patients with a given disease. Their results showed that the approach was effective at discovering interesting groups of patients based on disease severity and similar examination histories.

Xu et al. (2016) proposed an alternating optimization approach that discovers clusters in the positive class and optimizes the classifiers that separate each positive cluster from the negative samples. Mahoto et al. (2014) transformed patient diagnostic exam data into patient vectors and clustered them with three algorithms (DBSCAN, K-means, and hierarchical clustering), showing that DBSCAN performed better than the other two.

Several studies have combined simulation/optimization with data mining. Ng et al. (2011) proposed integrating DES and data mining techniques for the analysis of general systems, particularly production systems. Codrington-Virtue et al. (2006) developed an intelligent patient management system for the Accident and Emergency (A&E) setting, based on DES and clustering techniques, to calculate the maximum number of treatment places and nurse units required to serve A&E ambulance arrivals. Their study also demonstrated how A&E ambulance arrivals can be categorized into diagnosis subgroups according to length-of-stay quantiles. Ceglowski et al. (2016) combined data mining and DES to identify bottlenecks at the interface between the ED and hospital wards. Their model provided a value-added view of the hospital emergency department, covering treatment, disposal, and the occurrence of queues for treatment.

Amaran et al. (2016) compared and contrasted simulation optimization (SO) with algebraic model-based mathematical programming. The capacity problem of perinatal networks in the United Kingdom was considered by Asaduzzaman et al. (2010), while Mallor and Azcárate (2014) assessed bed occupancy levels in an intensive care unit using simulation–optimization.

The above review shows that patient flow in general, and emergency departments in particular, have received considerable attention. It also reveals that the combination of data mining, discrete event simulation, and optimization for improving patient flow has rarely been considered. In this study, in addition to optimizing the length of stay, the number of patients discharged from the hospital, and waiting time, we investigate the amount of carbon produced by medical equipment in the hospital. The review also revealed that the carbon footprint of healthcare has been neglected in many low- and middle-income countries and has not been considered in heart hospitals, which motivates our research.

3 Research methodology

This research uses clustering algorithms to group patients and DES to capture the complexity of patient flow. The clustered patient flow is then optimized with respect to waiting time, length of stay, patient throughput, and CO2 emissions using OptQuest (Eckelman & Sherman, 2016). The three stages of the methodology are shown in Fig. 1.

Fig. 1 Stages of the proposed methodology

The first stage, data mining (M), consists of five steps: patient recording, data processing, retrieval of the patient database, data clustering, and data modeling. Data collection is followed by data preprocessing, which comprises data cleaning, data integration, data selection, and data transformation. Data cleaning is needed because real-world data are sometimes noisy, inconsistent, and incomplete. The cleaned data are stored in a database, from which the data relevant to the analysis are then retrieved. Finally, the data are transformed and consolidated into forms suitable for the mining procedure.
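The cleaning, selection, and transformation steps can be sketched as follows. This is a minimal illustration rather than the hospital's pipeline: the record layout, the field names in `FEATURES`, the 0/1 gender encoding, and the choice of min-max scaling are all assumptions made for the example.

```python
# Hypothetical preprocessing sketch: field names, encodings, and scaling are
# illustrative assumptions, not the BHH database schema.
FEATURES = ["age", "gender", "cost", "los", "cabg", "pci", "cholesterol"]


def clean(records):
    """Data cleaning: drop records with a missing (None) value in any selected feature."""
    return [r for r in records if all(r.get(f) is not None for f in FEATURES)]


def transform(records):
    """Data selection and transformation: keep FEATURES, encode gender as 0/1
    (F = 1), and min-max scale every column to [0, 1]."""
    rows = [
        [(1.0 if r[f] == "F" else 0.0) if f == "gender" else float(r[f]) for f in FEATURES]
        for r in records
    ]
    scaled_columns = []
    for column in zip(*rows):
        lo, hi = min(column), max(column)
        span = hi - lo
        # A constant column carries no information for clustering; map it to 0.
        scaled_columns.append([(v - lo) / span if span else 0.0 for v in column])
    return [list(row) for row in zip(*scaled_columns)]


# Usage with made-up records; the third is dropped because its cost is missing.
raw = [
    {"age": 64, "gender": "M", "cost": 5200, "los": 4300, "cabg": 1, "pci": 0, "cholesterol": 240},
    {"age": 45, "gender": "F", "cost": 300, "los": 600, "cabg": 0, "pci": 0, "cholesterol": 190},
    {"age": 70, "gender": "F", "cost": None, "los": 2000, "cabg": 0, "pci": 1, "cholesterol": 210},
]
X = transform(clean(raw))
```

Scaling matters here because cost and LOS are several orders of magnitude larger than the binary CABG and pPCI/PCI flags and would otherwise dominate any distance-based clustering.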

In the second stage, simulation (S), patient flow is modeled using the general framework of the flexible job-shop scheduling problem (FJSP). The care units that a patient must visit during treatment, and the average per-patient electricity consumption of the equipment in each unit, are described in Sect. 4. The last stage of the methodology is optimization (O), in which OptQuest is used to optimize the objective function. Given the relationship between environment and health (Schulz et al., 2016) and the role of the health sector in tackling climate change (Frumkin et al., 2008), the reduction of carbon dioxide emissions caused by the use of electrical equipment during treatment is considered in addition to throughput, waiting time, and length of stay.
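To make the FJSP framing concrete: each patient can be viewed as a job, each treatment step as an operation, and each bed or resource able to perform that step as an eligible machine. The sketch below shows only this data representation together with a simple greedy dispatching rule. The resource names and durations are hypothetical, all patients are assumed to be present at time 0, and the heuristic is neither the simulation model used in this study nor an optimal FJSP solver.

```python
from dataclasses import dataclass


@dataclass
class Operation:
    """One treatment step; options maps each eligible resource to its duration (minutes)."""
    options: dict


def greedy_fjsp(jobs):
    """Greedy list scheduling for a flexible job shop.

    jobs: list of patients, each a list of Operations that must run in order.
    All patients are assumed present at time 0. Each operation is assigned to
    the eligible resource on which it would finish earliest.
    Returns {job index: [(resource, start, end), ...]}.
    """
    free_at = {}      # resource -> time at which it becomes free
    schedule = {}
    for j, ops in enumerate(jobs):
        ready = 0.0   # a patient's next step cannot start before the previous one ends
        schedule[j] = []
        for op in ops:
            best = min(op.options,
                       key=lambda r: max(ready, free_at.get(r, 0.0)) + op.options[r])
            start = max(ready, free_at.get(best, 0.0))
            end = start + op.options[best]
            free_at[best] = end
            ready = end
            schedule[j].append((best, start, end))
    return schedule


# Usage with hypothetical resources: two ED beds and one CCU bed.
jobs = [
    [Operation({"ED_bed1": 30, "ED_bed2": 30}), Operation({"CCU_bed1": 120})],
    [Operation({"ED_bed1": 20, "ED_bed2": 25}), Operation({"CCU_bed1": 60})],
]
plan = greedy_fjsp(jobs)
```

Because jobs are scheduled in list order, a later patient cannot use an idle gap left before an earlier patient's booking; a proper FJSP solver or a DES model handles such interleaving.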

3.1 Data mining: clustering methods and internal validation

Data mining emerged in the mid-1990s as a new approach to data analysis and knowledge discovery. “Data Mining” was first introduced as a term in the 2010 Medical Subject Headings (MeSH) (Yoo et al., 2012). Data mining has been used for pattern recognition (Kaya & Schoop, 2019), database design (Chaudhuri, 1998), artificial intelligence (Navale et al., 2016), visualization, and healthcare applications (Tomar & Agarwal, 2013). One of the most widely used definitions states that “data mining is the analysis of observational data sets to summarize the data in novel ways and to find unsuspected relationships that are both useful and understandable to the data owner” (Hand et al., 2001).

Clustering is one of the major classes of data mining algorithms. In clustering, data are grouped into clusters such that each cluster contains similar data points (Ibrahim et al., 2013). In healthcare systems, data points represent clinical profiles. Since patients with similar diseases need fairly similar types of care, the system should be able to identify diagnostic patterns that inform treatment. Clear, validated clusters based on comorbidities can help clinicians select treatments for specific patients, which in turn can support resource planning and improve system performance. In this research, four clustering methods are used to categorize patients: K-means (Duda et al., 2012; Jain et al., 1999), K-medoid (Na et al., 2010), hierarchical clustering (Jain et al., 1999), and fuzzy C-means (Mannila, 1996).
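As an illustration of the first of these methods, the sketch below implements plain K-means (Lloyd's algorithm) with NumPy: alternate between assigning each point to its nearest centroid and moving each centroid to the mean of its points. It runs on made-up two-dimensional points, and random initialization from the data is an assumed design choice; the study's own implementation and settings are not specified here.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Plain K-means (Lloyd's algorithm). Returns (labels, centroids)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize with k distinct data points chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its points
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centroids[c]
            for c in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


# Usage on two obvious groups of made-up 2-D points.
labels, centroids = kmeans([[0, 0], [0, 1], [10, 10], [10, 11]], k=2)
```

K-medoid follows the same alternating scheme but restricts each cluster centre to an actual data point, which makes it less sensitive to outliers such as extreme treatment costs.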

Validation is vital to ensure that a technique produces reliable results. Clustering validation is recognized as essential to the success of clustering applications (Jain & Dubes, 1988); it evaluates the goodness of clustering results (Liu et al., 2010). Internal validation measures include root-mean-square error, R-squared, Dunn's index, and the Silhouette index, among many others (Liu et al., 2010). In this research, we evaluated the clustering performance of each method using two of the most commonly used measures: the Dunn index (Azuaje, 2002) and the Silhouette score (Wang et al., 2003).

Silhouette analysis measures how close each object is to the other objects in its own cluster compared with the objects in the nearest neighboring cluster, and thus reflects how well separated the resulting clusters are. Silhouette scores lie between −1 and +1: values close to +1 indicate that data points are well clustered, while values close to −1 indicate that they have been assigned to the wrong cluster. Dunn's validation index is defined as the ratio of the minimum distance between two clusters to the largest cluster diameter (Azuaje, 2002); higher values indicate compact, well-separated clusters.
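Both validity indices can be computed directly from pairwise distances. The NumPy sketch below follows the definitions given above; Euclidean distance, nearest-point distance between clusters, and the largest pairwise distance as a cluster's diameter are assumptions, since several variants of the Dunn index exist.

```python
import numpy as np


def pairwise_distances(X):
    X = np.asarray(X, dtype=float)
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)


def silhouette_score(X, labels):
    """Mean of s(i) = (b - a) / max(a, b), where a is the mean distance from point i
    to its own cluster and b the smallest mean distance to any other cluster."""
    D = pairwise_distances(X)
    labels = np.asarray(labels)
    scores = []
    for i in range(len(labels)):
        own = labels == labels[i]
        own[i] = False
        if not own.any():            # singleton cluster: s(i) = 0 by convention
            scores.append(0.0)
            continue
        a = D[i, own].mean()
        b = min(D[i, labels == c].mean() for c in np.unique(labels) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))


def dunn_index(X, labels):
    """Smallest distance between points of different clusters divided by
    the largest within-cluster distance (cluster diameter)."""
    D = pairwise_distances(X)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    min_between = min(D[np.ix_(labels == p, labels == q)].min()
                      for p in clusters for q in clusters if p < q)
    max_diameter = max(D[np.ix_(labels == c, labels == c)].max() for c in clusters)
    return float(min_between / max_diameter)


# Usage: a sensible two-cluster labelling of made-up points.
points = [[0, 0], [0, 1], [10, 10], [10, 11]]
good = [0, 0, 1, 1]
print(silhouette_score(points, good), dunn_index(points, good))
```

A deliberately wrong labelling such as `[0, 1, 0, 1]` on the same points yields a negative Silhouette score, matching the interpretation given in the text.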

3.2 Simulation optimization method

Discrete event simulation (DES) is a computer-based methodology used to model complex, dynamic, and stochastic systems, including health care delivery, and is characterized by its speed and high flexibility. Nowadays, DES software often includes robust tools to support optimization in a variety of applications, including manufacturing (Rivera-Gómez et al., 2016) and operations scheduling (Cadi et al., 2015).
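To illustrate the event-driven mechanism behind DES, the sketch below simulates a single care unit with a fixed number of beds, exponential inter-arrival times, and triangular service times, the same distribution families reported for this study's inputs in Sect. 4.2. All parameter values are illustrative assumptions rather than BHH data, and a hospital model would chain many such units along patient routes.

```python
import heapq
import random
from collections import deque


def simulate_unit(n_patients, beds, mean_interarrival, service, seed=1):
    """Event-list DES of one care unit with `beds` identical beds and a FIFO queue.

    Inter-arrival times are exponential with the given mean (minutes); service
    times are triangular with service = (low, mode, high) minutes.
    Returns the mean waiting time in minutes. Parameters are illustrative only.
    """
    rng = random.Random(seed)
    low, mode, high = service
    events = []            # heap of (time, sequence number, kind, patient)
    seq = 0                # tie-breaker so simultaneous events pop in insertion order
    t = 0.0
    for p in range(n_patients):
        t += rng.expovariate(1.0 / mean_interarrival)
        heapq.heappush(events, (t, seq, "arrive", p))
        seq += 1

    free_beds, queue, arrived, waits = beds, deque(), {}, []

    def start_service(p, now):
        nonlocal seq
        waits.append(now - arrived[p])
        heapq.heappush(events, (now + rng.triangular(low, high, mode), seq, "depart", p))
        seq += 1

    while events:
        now, _, kind, p = heapq.heappop(events)
        if kind == "arrive":
            arrived[p] = now
            if free_beds > 0:
                free_beds -= 1
                start_service(p, now)
            else:
                queue.append(p)
        elif queue:        # a departure hands the bed straight to the next patient
            start_service(queue.popleft(), now)
        else:              # a departure with nobody waiting frees the bed
            free_beds += 1
    return sum(waits) / len(waits)


print(simulate_unit(2000, beds=1, mean_interarrival=10, service=(5, 8, 12)))
print(simulate_unit(2000, beds=2, mean_interarrival=10, service=(5, 8, 12)))
```

Adding a second bed to this heavily loaded unit sharply reduces the mean wait, which is the kind of capacity question the optimization stage later explores across all units.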

DES is useful in hospitals where patient demand outstrips medical system capacity, and low-cost approaches to improve health care delivery are essential. It allows users to estimate the impact of operational changes before expanding resources (Abo-Hamad & Arisha, 2013).

Simulation optimization is an important extension of the simulation methodology, because optimization is often desired when designing systems. For instance, Li and Wang (2012) modeled and compared the impact of different ordering policies using simulation with OptQuest. Zhang et al. (2020) developed an ED model to evaluate different assignment strategies in terms of expected patient waiting time, care quality, and physician and hospital profit. Lin et al. (2013) presented a multi-objective simulation optimization system that combines a genetic algorithm with data envelopment analysis to evaluate the simulation results and guide the search process.

4 Case study and results

Our case study is based at Bushehr Heart Hospital (BHH), a hospital in southern Iran specializing in cardiovascular disease (CVD), one of the most prevalent causes of death worldwide (Sufi & Khalil, 2010). BHH has eight care units: triage, cardiopulmonary resuscitation (CPR), emergency department (ED), coronary care units I and II (CCU I and CCU II), post coronary care unit (PCCU), intensive care units I and II (ICU I and ICU II), catheterization laboratory (Cath Lab), and operating rooms (ORs). It also has two administrative units, reception and discharge, which are modeled as workstations in this research.

The conceptual model of the patient flow is illustrated in Fig. 2. Patients arrive either as walk-ins or by ambulance. On arrival, they are registered at the admission desk and, depending on their condition, receive the required treatment. Patients are discharged when treatment is successfully completed, or are transferred to an inpatient ward or another hospital. Unfortunately, treatment is sometimes unsuccessful and the patient dies.

Fig. 2 Conceptual model of patient flow simulation

Patient flow is defined as the movement of patients through a set of care units in the hospital. Based on interviews with the head nurse and the supervisor, ten common pathways were identified. To validate these pathways, the patient database was investigated using a mix of descriptive and advanced data analytics techniques. Data extracted from the repository were transformed and structured as Excel files. Using the Emergency Severity Index (ESI) and data analytics (Bachhety et al., 2021), the pathways defined by staff were confirmed.

On arrival, patients are categorized into one of five severity levels using the Emergency Severity Index. Depending on their ESI level, patients follow different sequences of treatment and care. The care units visited by a patient during treatment are illustrated in Fig. 3. As shown, patients categorized as ESI 1 (resuscitation) follow Route 11, Route 12, or Route 13. Patients with ESI 2 (emergent), who are classified as acute coronary syndrome (ACS) cases, follow Route 21, Route 22, or Route 23. Patients with ESI 3 (urgent) follow Route 31 or Route 32, while patients with ESI 4 (nonurgent) follow Route 4. Patients categorized as ESI 5 (referred) follow Route 5.

Fig. 3 Treatment routes of patients based on the ESI

The collected data cover patients who visited the hospital over one year, from August 2017 to July 2018. In total, 11,700 patients were referred to the hospital, of whom 5% were in the ESI 1 category and 10% in ESI 2; the remaining 30%, 30%, and 25% were in ESI 3, ESI 4, and ESI 5, respectively. Four clustering algorithms were applied to capture the patient flow; the results are reported in the following subsection.

4.1 Patients data clustering

According to Nyman (2007), length of stay (LOS) is an important performance measure for a hospital. Because patients pay for the care services they receive, cost is also an important performance measure. Data gathered at BHH show that patients who underwent surgery, coronary artery bypass grafting (CABG), or primary percutaneous coronary intervention (pPCI)/PCI had longer stays and higher costs than other patients. Furthermore, age, gender, and blood cholesterol are important and influential factors in heart disease. Using the features age, gender, cost, LOS, CABG, pPCI/PCI, and blood cholesterol, patients in the BHH dataset were categorized into two groups with each of the clustering algorithms (K-means, K-medoid, hierarchical clustering, and fuzzy C-means).

As can be seen in Table 1, hierarchical clustering with two clusters outperformed the other methods on both the Silhouette score (0.8520) and the Dunn index (0.4548). This is also confirmed in Fig. 4, where both indices reach their highest values (shown in boldface) for the hierarchical algorithm with two clusters.
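For reference, the sketch below shows the bottom-up (agglomerative) procedure behind hierarchical clustering: start from singletons, repeatedly merge the two closest clusters, and stop at the requested number of clusters. It is a naive NumPy implementation on made-up points; the linkage criterion used in the study is not stated, so the `linkage` options, and average linkage as the default, are assumptions.

```python
import numpy as np


def agglomerative(X, k, linkage="average"):
    """Naive agglomerative (bottom-up) hierarchical clustering cut at k clusters.

    linkage: "single" (closest pair), "complete" (farthest pair) or
    "average" (mean pairwise distance). O(n^3), so only for small examples.
    """
    X = np.asarray(X, dtype=float)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    link = {"single": np.min, "complete": np.max, "average": np.mean}[linkage]
    clusters = [[i] for i in range(len(X))]     # start with singletons
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = link(D[np.ix_(clusters[a], clusters[b])])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = clusters.pop(b)                 # b > a, so index a stays valid
        clusters[a].extend(merged)
    labels = np.empty(len(X), dtype=int)
    for c, members in enumerate(clusters):
        labels[members] = c
    return labels


print(agglomerative([[0, 0], [0, 1], [10, 10], [10, 11]], k=2))
```

Cutting the same merge sequence at different values of `k` is what allows the validity indices to be compared across cluster counts.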

Table 1 Internal validation of clustering algorithms
Fig. 4 Comparison of the results of clustering algorithms

The resulting clusters are justified in two ways. First, at BHH it is very important for hospital authorities to classify patients by so-called "cost class", which yields two classes: low-cost patients and high-cost patients. Second, as shown in Table 1 and illustrated in Fig. 4, these two classes are technically confirmed using machine learning techniques.

According to Fig. 5a, cost separates the observations into two clusters better than the other features. As seen in Fig. 5b, a cost boundary line of provides a separation of observations into two clusters with an accuracy of 0.99.
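A boundary-line check of this kind can be reproduced in a few lines: given each patient's cost and cluster label, scan candidate cut-offs and report the share of patients whose label agrees with the rule "cost at or above the cut-off means cluster 1". The costs and labels below are invented, and scanning midpoints between sorted costs is an assumed procedure; the paper does not state how its boundary was chosen.

```python
def threshold_accuracy(costs, labels, threshold):
    """Share of patients whose cluster (1 = high cost) agrees with cost >= threshold."""
    hits = sum((c >= threshold) == (lab == 1) for c, lab in zip(costs, labels))
    return hits / len(costs)


def best_threshold(costs, labels):
    """Scan midpoints between consecutive distinct costs; return (threshold, accuracy)."""
    xs = sorted(set(costs))
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    return max(((t, threshold_accuracy(costs, labels, t)) for t in candidates),
               key=lambda pair: pair[1])


# Hypothetical costs and cluster labels (1 = high-risk/high-cost, 2 = low-risk/low-cost).
costs = [100, 150, 200, 5000, 6000]
labels = [2, 2, 2, 1, 1]
print(best_threshold(costs, labels))
```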

Fig. 5 Results of hierarchical clustering

As shown in Fig. 6, based on BHH data and for modeling purposes, patients in ESI 1, ESI 2, and ESI 3 who received CABG or pPCI/PCI (or both) are regrouped as high-risk/high-cost patients (cluster 1), while all other patients are labeled as low-risk/low-cost patients (cluster 2). A higher treatment cost is a consequence of being in the high-risk category; hence, in this study, cost is used as one of the features for differentiating high-risk from low-risk patients.
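The regrouping rule stated above is simple enough to express as a single function; the argument names and the boolean encoding of CABG and pPCI/PCI are assumptions for illustration.

```python
def risk_cluster(esi: int, cabg: bool, pci: bool) -> int:
    """Return 1 (high-risk/high-cost) or 2 (low-risk/low-cost).

    ESI 1-3 patients who received CABG or pPCI/PCI (or both) form cluster 1;
    every other patient, including all ESI 4 and ESI 5 patients, forms cluster 2.
    """
    return 1 if esi in (1, 2, 3) and (cabg or pci) else 2


# Usage on a few hypothetical patients: (ESI, CABG, pPCI/PCI).
examples = [(1, True, False), (2, False, True), (3, False, False), (4, True, True)]
print([risk_cluster(*p) for p in examples])  # [1, 1, 2, 2]
```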

Fig. 6 Medical versus modeling view of data

The clustering results indicate that approximately 90% of ESI 1, 70% of ESI 2, and 13% of ESI 3 patients are in the high-risk cluster. Figure 7a shows the percentage of patients in each ESI level, as reported at the beginning of Sect. 4, while Fig. 7b shows the percentage of high-risk and low-risk patients based on the clustering approach.

Fig. 7 Percentage of patients based on ESI and categories

4.2 Simulation input model

To build a valid simulation model, the right probability distribution must be determined for each model input that behaves randomly. Based on historical data and graphical analysis, probability distributions for the durations of essential procedures were determined using the classical Kolmogorov–Smirnov test. A triangular distribution provided a good fit for the durations of most activities, while an exponential distribution was adequate for modeling patient inter-arrival times. Table 2 shows the time distributions of the activities that are essential for patients; these distributions were also verified by clinical staff.
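A goodness-of-fit check of this kind can be sketched with the standard library alone. The code computes the Kolmogorov–Smirnov statistic against exponential and triangular CDFs; the synthetic data, the 45-minute mean inter-arrival time, and the approximate large-sample 5% critical value \(1.36/\sqrt{n}\) are assumptions for illustration. When parameters are estimated from the same sample, as here, the classical critical values are conservative (the Lilliefors correction addresses this).

```python
import math
import random


def ks_statistic(sample, cdf):
    """Kolmogorov-Smirnov D = sup_x |F_n(x) - F(x)| for a continuous CDF."""
    xs = sorted(sample)
    n = len(xs)
    # The empirical CDF jumps at each sorted point, so check both sides of each jump.
    return max(max((i + 1) / n - cdf(x), cdf(x) - i / n) for i, x in enumerate(xs))


def exponential_cdf(rate):
    return lambda x: 1.0 - math.exp(-rate * x) if x > 0 else 0.0


def triangular_cdf(a, c, b):
    """CDF of a triangular distribution with min a, mode c, and max b."""
    def cdf(x):
        if x <= a:
            return 0.0
        if x <= c:
            return (x - a) ** 2 / ((b - a) * (c - a))
        if x < b:
            return 1.0 - (b - x) ** 2 / ((b - a) * (b - c))
        return 1.0
    return cdf


# Usage on synthetic inter-arrival times (minutes); not hospital data.
rng = random.Random(42)
arrivals = [rng.expovariate(1 / 45) for _ in range(500)]
rate_hat = len(arrivals) / sum(arrivals)            # maximum-likelihood rate estimate
d_exp = ks_statistic(arrivals, exponential_cdf(rate_hat))
critical = 1.36 / math.sqrt(len(arrivals))          # approx. 5% level, large n
print(f"D = {d_exp:.3f}, critical = {critical:.3f}, reject = {d_exp > critical}")
```

The same `ks_statistic` call works for activity durations by passing `triangular_cdf(min, mode, max)` with the fitted parameters.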

Table 2 Input distributions for simulation model

4.3 Carbon emission calculation

To calculate the CFP, the electricity consumption of equipment in the different care units was investigated. Electricity consumption depends almost linearly on how long the equipment is used during treatment. Statistical analysis shows that the usage time of each piece of equipment follows a triangular distribution with parameters (min, mode, max) (see Table 8 in the Appendix). The electricity consumption of each piece of equipment is taken from its technical specifications, shown in Table 9 in the Appendix.

The emission factor is conventionally expressed as the mass of carbon dioxide emitted per unit of energy delivered, e.g., kilograms of carbon dioxide per kilowatt-hour (\(\mathrm{kg\,CO_{2}/kWh}\)). The amount of \(\mathrm{CO_{2}}\) (in kg) produced in the hospital is calculated using Eqs. (1) and (2).

$${C}_{i}=\sum_{k=1}^{K}\sum_{j=1}^{J}\left(EF*{T}_{ijk}*{W}_{jk}*{Z}_{ijk}\right) \quad \forall i=1,2,\dots ,I$$
(1)
$$T{CO}_{2}=\sum_{i=1}^{I}{C}_{i}$$
(2)

where \(T{CO}_{2}\) is the total amount of carbon dioxide produced in the hospital, \({C}_{i}\) is the amount of carbon dioxide produced for patient \(i\), \(EF\) is the emission factor (\(\mathrm{kg\,CO_{2}/kWh}\)), \({T}_{ijk}\) is the usage time (hours) of equipment \(j\) in care unit \(k\) for patient \(i\), \({W}_{jk}\) is the power consumption rate (kW) of equipment \(j\) in care unit \(k\), and \({Z}_{ijk}\) equals one if equipment \(j\) is used in care unit \(k\) for patient \(i\) and zero otherwise.

Calculating the total CO2 emitted requires the emission factor. Migone et al. (2010) estimated the greenhouse gas (GHG) emissions of the electricity generation sector for Iranian power plants and reported that Iran's national grid emission factor (EF) was 0.58, 0.62, 0.61, and 0.62 kgCO2/kWh for 2007, 2008, 2009, and 2010, respectively. Despite the growth of hydropower and renewable power plants and their share of generated power, Iran's grid EF has not changed dramatically, mostly because the simultaneous expansion of fossil fuel power plants counterbalances this positive effect. Therefore, we used the four-year weighted average EF of 0.61 kgCO2/kWh in our models.
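Equations (1) and (2) translate directly into code. In the sketch below, the indicator \({Z}_{ijk}\) is represented implicitly: a patient's usage record simply omits any (equipment, unit) pair that was not used. The equipment names, power ratings, and usage hours are hypothetical placeholders for the values in Tables 8 and 9, and the grid factor is the 0.61 kgCO2/kWh adopted above. As a side check, the unweighted mean of the four reported yearly factors is 0.6075, which rounds to 0.61; the weights behind the paper's weighted average are not stated.

```python
EF = 0.61  # kg CO2 per kWh, Iranian grid emission factor used in this study

# Sanity check on the factor: unweighted mean of the 2007-2010 values.
ef_mean = sum([0.58, 0.62, 0.61, 0.62]) / 4   # 0.6075, rounds to 0.61


def patient_co2(usage_hours, power_kw, ef=EF):
    """Eq. (1): C_i = sum over (j, k) of EF * T_ijk * W_jk * Z_ijk.

    usage_hours maps (equipment j, unit k) -> T_ijk in hours; pairs that are
    absent have Z_ijk = 0 and contribute nothing.
    power_kw maps (equipment j, unit k) -> W_jk in kW.
    """
    return sum(ef * hours * power_kw[jk] for jk, hours in usage_hours.items())


def total_co2(patients, power_kw, ef=EF):
    """Eq. (2): TCO2 = sum of C_i over all patients i."""
    return sum(patient_co2(p, power_kw, ef) for p in patients)


# Hypothetical ratings and usage records (not the Table 8/9 values).
power = {("ecg_monitor", "ED"): 0.05, ("ventilator", "ICU I"): 0.3}
patients = [
    {("ecg_monitor", "ED"): 2.0},                                  # 0.061 kg
    {("ecg_monitor", "ED"): 1.0, ("ventilator", "ICU I"): 10.0},   # 1.8605 kg
]
print(round(total_co2(patients, power), 4))
```

In the simulation, each patient's usage hours would be drawn from the triangular distributions in Table 8 rather than fixed, so \(T{CO}_{2}\) becomes a random output averaged over replications.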

4.4 Patient flow simulation model

The focus of patient flow analysis is how patients move through the treatment process from activity to activity. A patient's flow can range from a simple sequence of care services to a very complex flow with many decisions, branches, repetitions, and reworks. The complexity of the flow depends on the patient's condition and on uncertainties.

Patients who arrive by ambulance are categorized as ESI 1 and transferred to CPR immediately, whereas those who walk into the hospital first go to the admission desk to be routed to the appropriate treatment activity. The routing is based on probabilities determined from historical data and observation. In the simulation model, each arriving patient is routed according to their cluster and the ten routing schemes. When an incoming patient is generated, an ESI label is also generated from a probability distribution based on the number of patients in each of the five ESI categories among the patients who visited the hospital in the last two years. The service time for each activity is randomly generated from the probability distributions presented in Table 2. For each simulated activity, the amount of CO2 emitted is calculated from the time for which each piece of equipment is used in treating the patient and the device's electricity consumption (see Table 9 in the Appendix).
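The label-generation step can be sketched as follows. The ESI shares are those reported for the 11,700 patients at the beginning of Sect. 4 (585, 1,170, 3,510, 3,510, and 2,925 patients, respectively), and the route names follow Fig. 3. The within-level route probabilities were derived from historical data in the study but are not reproduced here, so the `route_probs` argument is a hypothetical input that defaults to equal weights.

```python
import random

# ESI shares of the 11,700 patients recorded from August 2017 to July 2018.
ESI_SHARES = {1: 0.05, 2: 0.10, 3: 0.30, 4: 0.30, 5: 0.25}
# Routes available at each ESI level (Fig. 3): ten routing schemes in total.
ROUTES = {1: ["11", "12", "13"], 2: ["21", "22", "23"], 3: ["31", "32"], 4: ["4"], 5: ["5"]}


def sample_patient(rng, route_probs=None):
    """Draw an ESI label, then a route allowed for that label.

    route_probs is a hypothetical {esi: [weights]} mapping; when it is missing
    for a level, routes at that level are drawn with equal probability.
    """
    esi = rng.choices(list(ESI_SHARES), weights=list(ESI_SHARES.values()))[0]
    weights = (route_probs or {}).get(esi)
    route = rng.choices(ROUTES[esi], weights=weights)[0]
    return esi, route


rng = random.Random(7)
cohort = [sample_patient(rng) for _ in range(10_000)]
expected_counts = {esi: round(11_700 * share) for esi, share in ESI_SHARES.items()}
```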

The simulation model was run for one year (365 days) and replicated 50 times to obtain sufficiently precise estimates of the model outputs. To further validate the model, a t-test was used to check whether the mean simulation results differed statistically from the actual values for the year from August 2017 to July 2018. As Table 3 shows, there is no significant difference between the simulation output and the actual data.
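One way to run this validation check is a one-sample t-test of the 50 replication outputs against the observed annual value, with \(n-1=49\) degrees of freedom. The paper does not state which variant of the t-test was used, so this reading, the replication values, and the observed value below are assumptions; the two-sided 5% critical value for 49 degrees of freedom is approximately 2.01.

```python
import math
import random
import statistics

T_CRIT_49 = 2.01  # approx. two-sided 5% critical value of Student's t, 49 df


def one_sample_t(replications, actual):
    """t = (mean - actual) / (s / sqrt(n)), where s is the sample standard deviation."""
    n = len(replications)
    mean = statistics.mean(replications)
    s = statistics.stdev(replications)
    return (mean - actual) / (s / math.sqrt(n))


# Hypothetical example: 50 replications of annual patients served vs. an observed count.
rng = random.Random(3)
replications = [rng.gauss(11_700, 120) for _ in range(50)]
t = one_sample_t(replications, actual=11_700)
print(f"t = {t:.2f}; significant at 5%: {abs(t) > T_CRIT_49}")
```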

Table 3 The t-test results of comparing the mean of simulation output and the actual data

Table 4 shows the average simulation results for each patient group (low-risk and high-risk). According to Table 4, the average waiting time is higher for low-risk patients (10 min) than for high-risk patients (7 min), while the LOS is lower for low-risk patients (2,595 versus 4,273 min). The amount of CO2 produced per patient is 19.18 kg for high-risk patients and 13.26 kg for low-risk patients, resulting in total emissions of 14,615 kg for high-risk patients and 145,224 kg for low-risk patients.
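A quick arithmetic check ties these figures together: dividing each group's total emissions by its per-patient average gives the implied number of patients per group. This assumes both figures are in kg CO2, as produced by Eqs. (1) and (2). The implied counts, about 762 high-risk and 10,952 low-risk patients (roughly 11,714 in total), are consistent with the 11,700 patients recorded for the year.

```python
# Table 4 figures as quoted in the text (assumed to be kg CO2).
per_patient = {"high_risk": 19.18, "low_risk": 13.26}
total = {"high_risk": 14_615, "low_risk": 145_224}

implied_patients = {group: total[group] / per_patient[group] for group in total}
for group, n in implied_patients.items():
    print(f"{group}: {n:,.0f} patients")
print(f"all: {sum(implied_patients.values()):,.0f} patients")
```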

Table 4 Average simulation results for one year

Simulation enables us to identify the best configuration among a set of predetermined scenarios. Optimization is then applied to search for an optimal configuration among a very large (potentially infinite) number of scenarios, subject to specified constraints.

4.5 Simulation-based optimization

OptQuest is a generic optimization package that treats the simulation model as a black box, considering only its inputs and outputs, and combines the metaheuristics of neural networks (NNs), scatter search (SS), and tabu search (TS) into a single search heuristic. To optimize the hospital performance criteria (number of patients served, waiting time, length of stay, and amount of CO2 produced), a mathematical model with one objective and ten constraints is proposed.

In the following optimization problem, Eq. (3) is a single-objective function \({f}_{i}\) representing the hospital performance criterion to be optimized, with i = 1 for the number of patients served (to be maximized), i = 2 for waiting time, i = 3 for total length of stay, and i = 4 for the total amount of CO2 produced; for \({f}_{2}\), \({f}_{3}\), and \({f}_{4}\), the model is a minimization problem. Variables \({x}_{1}\), \({x}_{2}\), \(\dots\), \({x}_{8}\) represent the number of beds in the ED, Cath Lab, PCCU, CCU I, CCU II, ICU I, ICU II, and operating rooms, respectively. The value of \({\alpha }_{i}\) is the average of the simulation outputs over the 50 runs; for example, for i = 2 (waiting time), \({\alpha }_{2}\) is the average waiting time over all simulation runs. Equation (4) represents four constraints of the optimization model. For instance, \({f}_{2}({x}_{j})\le {\alpha }_{2}\) forces the optimization model to choose a solution in which the optimized total waiting time is at least as good as the simulation result. Equation (5) bounds the number of beds \({x}_{j}, j=1,\dots ,8\), in each care unit, as defined by hospital authorities based on operational requirements and financial conditions and shown in Table 5. Equation (6) states that the mathematical model is an integer programming problem.

Table 5 Bounds on the number of beds defined by the hospital
$$\underset{{x}_{j}}{\operatorname{opt}}\;{f}_{i}({x}_{j}),\qquad \operatorname{opt}=\begin{cases}\max, & i=1\\ \min, & i=2,3,4\end{cases}$$
(3)
$${f}_{1}({x}_{j})\ge {\alpha }_{1};\qquad {f}_{i}({x}_{j})\le {\alpha }_{i}\quad \forall i=2,3,4;\quad j=1,\dots ,8$$
(4)
$${x}_{jL}\le {x}_{j}\le {x}_{jU}\quad j=1,2,\ldots ,8$$
(5)
$${x}_{j}\in {Z}^{+}\quad j=1,2,\ldots ,8$$
(6)

The purpose of the simulation–optimization model is to determine the number of beds in the hospital wards such that the number of patients discharged from the hospital is maximized, and the length of stay, waiting time, and amount of carbon produced by the use of medical equipment during treatment are minimized. Optimizing the number of hospital beds plays an important role in improving hospital performance.

Optimized simulation outputs for high-risk and low-risk patients are shown in Table 6. For example, the optimal value of the objective function \({f}_{1}\) is 11,125 for low-risk patients and 776 for high-risk patients. Considering all four objective functions, and wishing to take a conservative approach to the number of beds, the hospital authorities decided to set the number of beds in each unit to the maximum value suggested by the four objective functions.
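The structure of this optimization step can be illustrated with a plain random search over integer bed vectors. This is explicitly not OptQuest (which combines scatter search, tabu search, and neural networks); it only mirrors the model's shape: one objective at a time, Eq. (4) read so that no objective may be worse than its simulated baseline \({\alpha }_{i}\), integer bounds per unit, and the authorities' rule of keeping, per unit, the largest bed count across the four single-objective solutions. The black-box function `toy_simulation`, the bounds, and the current bed vector are hypothetical stand-ins for the DES model and Table 5.

```python
import random

# Hypothetical bounds (x_jL, x_jU) for ED, Cath Lab, PCCU, CCU I, CCU II, ICU I, ICU II, OR.
BOUNDS = [(4, 8), (1, 3), (4, 10), (4, 8), (4, 8), (4, 8), (4, 8), (2, 4)]


def toy_simulation(x):
    """Hypothetical black box standing in for the DES model (not BHH data).

    Returns (f1 patients served, f2 waiting time, f3 LOS, f4 kg CO2).
    """
    total = sum(x)
    served = 10_000 + 40 * total                 # more beds, more patients served
    wait = 600.0 / total                         # more beds, shorter waits
    los = 2_500 + 300.0 / (x[0] + 1)             # ED beds drive LOS in this toy
    co2 = 150_000 + 900 * (x[5] + x[6] + x[7])   # ICU/OR beds use energy-heavy equipment
    return served, wait, los, co2


def feasible(f, alpha):
    """No objective may be worse than its baseline: f1 >= alpha1, f2..f4 <= alpha2..alpha4."""
    return f[0] >= alpha[0] and all(f[k] <= alpha[k] for k in (1, 2, 3))


def optimize_objective(i, simulate, bounds, alpha, n_samples=2000, seed=0):
    """Random search on objective index i (0 = f1, maximized; 1-3 = f2-f4, minimized)."""
    rng = random.Random(seed)
    best_x, best_score = None, None
    for _ in range(n_samples):
        x = tuple(rng.randint(lo, hi) for lo, hi in bounds)   # integer and within bounds
        f = simulate(x)
        if not feasible(f, alpha):
            continue
        score = f[i] if i == 0 else -f[i]   # larger score is better for every objective
        if best_score is None or score > best_score:
            best_x, best_score = x, score
    return best_x


def conservative_beds(solutions):
    """Per unit, keep the largest bed count suggested by any single-objective solution."""
    return tuple(max(column) for column in zip(*solutions))


x_now = (6, 2, 6, 6, 6, 6, 6, 3)        # hypothetical current configuration
alpha = toy_simulation(x_now)           # baseline alpha_1..alpha_4
solutions = [optimize_objective(i, toy_simulation, BOUNDS, alpha) for i in range(4)]
beds = conservative_beds(solutions)
print(solutions, beds)
```

The element-wise maximum is not itself guaranteed to satisfy every constraint, so the combined configuration would normally be re-simulated before adoption.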

Table 6 Optimized values of objective functions and number of beds

Table 7 shows the percentage improvement in the objective functions \({f}_{1}\), \({f}_{2}\), \({f}_{3}\), and \({f}_{4}\) for low-risk and high-risk patients after optimization. As can be seen, the largest improvement is in \({f}_{2}\) (waiting time) for both low-risk and high-risk patients.

Table 7 Percentage improvement obtained compared to current status

5 Findings and conclusions

This study reports a successful improvement in patient flow at a heart hospital in Iran. It proposed a hybrid method combining data mining and a simulation–optimization approach to improve care delivery in a cardiovascular hospital.

In the data mining part, four clustering algorithms (K-means, K-medoid, hierarchical clustering, and fuzzy C-means) were applied to cluster patients based on the age, gender, cost, LOS, CABG, pPCI/PCI, and blood cholesterol features. The clustering results were evaluated using Dunn's index and the Silhouette index, which showed that hierarchical clustering with two clusters performed better than the other clustering algorithms. Hence, patients were classified into two categories: high-risk and low-risk patients.

Then, a simulation-based methodology was applied to each patient cluster to track performance measures of the treatment process. The OptQuest package was used to optimize the number of patients served, total waiting time, LOS, and the amount of CO2 produced during the process. Simulation–optimization models proved particularly valuable for identifying process improvements and quantifying the resulting gains in hospital performance.

Reducing the environmental impact of hospitals while maintaining a good level of care is a great challenge, and the approach proposed in this study helped a hospital address it. Although our research was applied to a specific hospital in Iran, other hospitals, and healthcare in general, appear to have comparable performance measures and environmental concerns. Therefore, the problems and potential solutions described in this study are likely to be relevant to many hospitals worldwide.

The proposed approach could be extended in several directions. Modeling time-dependent patient flows could help bridge environmental concerns with other crucial challenges, such as scheduling and resource management; timed colored Petri nets could then be used to model the different flow branches and resources.