1 Introduction

Over the past decade, emergency departments (EDs) have faced increasingly challenging conditions in fulfilling their missions. Optimal control and management of patient flow, risk, and care processes are therefore needed to ensure adequate quality of care. For several years, this has motivated various strategic and operational actions in the healthcare sector. Despite these combined efforts, EDs still suffer from numerous structural, human, material, and financial constraints, low morale (Clancy 2007; Kellermann 2006), and crowding (Hoot and Dominik 2008; Kadri et al. 2014b; Stone et al. 2019). EDs are often the first line of care for patients with sudden illness and injury (Jun et al. 2011), and their activity has been increasing steadily over the last decades. In France, for example, ED visits have doubled over this period, growing by about 3.5% per year (Bejaoui et al. 2018). In this context, EDs often face patient influxes generated by exceptional events or situations, such as health threats related to epidemics, and must be able to receive and manage patient flows that can be very large for medical and surgical treatment. It has therefore become essential to predict ED activity, which is necessary for appropriate patient flow control and optimal utilization of human and material resources in the ED (Bhattacharjee and Ray 2014; Chen et al. 2019; Kadri et al. 2017).

Hospital systems regularly face patient influxes generated by events such as seasonal epidemics (e.g., influenza, colds, gastroenteritis, and bronchiolitis in winter, and trauma in summer) or health crises related to epidemics (e.g., COVID-19). Hospitals, and particularly EDs, must receive these patients for medical treatment regardless of the extent of the demand for care. Such influxes often increase the patients' length of stay (LOS) in the ED, which leads to overcrowding and strain situations (Kadri et al. 2014a, b; Boyle et al. 2012; Visintin et al. 2019; Wachtel and Elalouf 2017). Managing ED overcrowding is one of the most critical challenges faced by many hospitals, and it requires significant human and material resources, which are unfortunately limited. To address this issue, hospital managers need to predict the patient LOS, which is considered an essential indicator for assessing ED overcrowding (Stone et al. 2019; Chung-Hsien et al. 2017).

The rapid advancement of machine learning (ML) technology has created substantial opportunities in data-driven applications, especially in healthcare (Vijay and Bala 2018; Xue et al. 2018). In recent years, many statistical and machine learning models have been proposed to improve predictions of ED activities (Carter and Potts 2014; Beunza et al. 2019; Stewart et al. 2018; Wason 2018; Yucesan et al. 2020). Applying machine intelligence in healthcare can improve performance in a wide range of medical applications, such as ECG signal classification and arrhythmia analysis (Kumar et al. 2020) and predicting inpatient mortality in departments of internal medicine (Schwartz et al. 2018). Importantly, machine learning models can be employed to develop intelligent medical decision support tools for clinical interpretation and analysis, saving time for practitioners (Abacha et al. 2015; Benbelkacem et al. 2019; Daghistani Tahani et al. 2019; Ichikawa et al. 2016; Layeghian et al. 2018). For example, in Li et al. (2013), a back-propagation (BP) neural network is applied to predict patient LOS in surgery services to help the medical staff individualize patient treatment; the patient's age, gender, admission conditions, and other hospitalization information are used as input features for the neural network model. Similarly, in Gentimis et al. (2017), a neural network-based model is used to predict hospital LOS from patient admission, discharge, and transfer records together with medical and laboratory information. The authors in Pendharkar and Khurana (2014) investigated three ML techniques, namely regression trees (CART), chi-square automatic interaction detection (CHAID), and support vector regression (SVR), to predict patient LOS in Pennsylvania Federal and Specialty hospitals. In Mekhaldi et al. (2020), the authors investigated the factors influencing patient LOS in a hospital setting and explored random forest and gradient boosting models for LOS prediction. In Awad et al. (2017), the authors presented an overview of the methods and applications of LOS and mortality prediction in acute medicine and critical care units. The study in Elbattah and Molloy (2016) investigated the feasibility of machine learning for predicting the inpatient LOS and discharge destinations of hip-fracture patients using historical patient data; random forests provided superior performance compared to neural networks and linear regression, reaching an overall classification accuracy of 0.88. In Bacchi et al. (2020), four models were investigated to predict LOS and discharge destination in a hospital: an artificial neural network (ANN), a convolutional neural network (CNN), logistic regression, and random forest; results showed that the ANN outperformed the other models in predicting LOS and discharge destinations. In Ma et al. (2020), an approach combining just-in-time learning (JITL) with a one-class extreme learning machine (ELM) is introduced for predicting the number of days a patient stays in the intensive care unit (ICU) of a hospital, with experiments conducted on physiological data of ICU patients. The JITL-ELM approach reached good prediction accuracy with an AUC of 0.8510 and outperformed traditional binary classification schemes (i.e., one-class ELM and one-class SVM). Recently, in Bacchi et al. (2020b), the authors reported a review of LOS prediction based on machine learning methods. In He et al. (2021), the authors proposed a multi-task approach, designed using artificial neural networks and multi-task learning (MTL), to predict LOS and classify inpatient flow simultaneously. Results based on data from a hospital in New York City revealed the superior performance of the multi-task approach compared to four single-task learning models.

With the progress of computer technology and the massive increase in collected data, deep learning has shown improved performance compared to traditional machine learning models and has progressively become a hot topic (Wang et al. 2018; Harrou et al. 2020b; Pham et al. 2017; SabaLuca et al. 2019). Deep learning methods have gained particular attention in healthcare applications because of their outstanding generalization capability and superior nonlinear approximation (Lalmuanawma et al. 2020; Xue et al. 2019; Chang et al. 2020; Harrou et al. 2018). Moreover, they enable automatic learning of relevant features from complex data (Lundervold 2019; Guo et al. 2018; Mai et al. 2019; Pham et al. 2017). For instance, in Liu et al. (2019), an iHome smart dental health Internet of Things (IoT) system was proposed using intelligent hardware, deep learning, and a mobile terminal; this study explored the possibility of dental treatment supported by home-based dental health care services. In Wang et al. (2020), deep learning and machine learning methods are considered for detecting Parkinson's disease (PD) by monitoring premotor features of PD, i.e., rapid eye movement (REM) sleep behaviour disorder (RBD) and olfactory loss; the results highlighted the superior performance of deep learning in discriminating between normal individuals and patients affected by PD. In Mefraz et al. (2019), a transfer learning-based approach with intelligent training data selection was employed for Alzheimer's diagnosis from magnetic resonance imaging (MRI) images. In Xiaokang et al. (2020), a semisupervised deep learning-based strategy is used to improve human activity recognition in Internet of Healthcare environments. In Yuan et al. (2018), a unified multi-view deep learning approach is proposed to detect electroencephalogram (EEG) seizures on a clinical scalp multi-channel EEG epilepsy dataset; the results demonstrate the potential advantages of deep learning for EEG seizure detection. In Harrou et al. (2020a), the authors applied a particularly promising deep learning model, the variational autoencoder (VAE), to the problem of predicting patient admissions and flow through the emergency department of a pediatric hospital. The VAE model performed much better than the other models, mainly because of its extended capacity to learn higher-level features that allow good forecasting precision. In another study (Liao et al. 2020), a deep learning-based approach was developed to assess physical rehabilitation exercises toward enhanced patient outcomes. In Kretz et al. (2020), a regression convolutional neural network (CNN) is used to reliably estimate mammography image quality; the approach is tested using virtual mammography data with known ground truth and real images of the contrast-detail phantom for mammography (CDMAM), and the trained CNN can predict contrast-detail curves from single simulated and single real images of the CDMAM phantom. In Chang et al. (2019), a deep learning-based approach is employed to design an intelligent medicine recognition system, called ST-Med-Box, that recognizes drugs and delivers recognition outputs systematically and practically for chronic patients. The system comprises an intelligent medicine recognition device, an application running on an Android-based mobile device, a cloud-based management platform, and a deep learning training server.
The system achieved a 96.6% recognition rate when applied to eight different medicines. These applications illustrate how deep learning models can interpret and extract meaning from complicated data sets.

Recently, generative adversarial networks (GANs) have shown good efficacy in synthesizing visually appealing images (Goodfellow et al. 2014). A GAN essentially comprises two networks: a generator network \({\mathcal {G}}\) and a discriminator network \({\mathcal {D}}\) (Goodfellow et al. 2014). The generator attempts to learn the distribution underlying the input data and then generates new data points following the learned distribution to elude the discriminator, whose aim is to distinguish real data points from fake ones (Zhang et al. 2018). GANs also offer a promising strategy to mitigate overfitting (Creswell et al. 2018). These desirable features have been exploited in numerous applications, including text-to-image synthesis (Zhang et al. 2018), face synthesis (Di et al. 2018), and single-image super-resolution (Ledig et al. 2017). Moreover, the GAN architecture is flexible: any neural network architecture can be chosen as generator or discriminator, resulting in a deeper model able to learn more complex features. This study investigates the effectiveness of a GAN-based approach to anticipate overcrowding and strain situations by predicting the patient LOS at EDs, where the input data are time series rather than images.

In our previous work (Benbelkacem et al. 2019), supervised machine learning models, including Naive Bayes, C4.5, and SVM, were applied to predict the length of stay in the PED at the Lille University Hospital. However, these models are static and cannot capture dynamics in the data. In another work (Kadri and Abdennbi 2020), we investigated the performance of recurrent neural networks (RNNs), including LSTM and GRU, trained in a supervised manner to forecast patient flow in a hospital emergency department. Recently, in Harrou et al. (2021), we combined the advantages of the autoregressive-moving-average (ARMA) model with the detection capacity of the generalized likelihood ratio (GLR) scheme to detect abnormally high patient influx in an ED; the results revealed the superior performance of this coupled monitoring scheme compared to conventional charts (e.g., the Shewhart chart and the exponentially weighted moving average chart). The present work aims to develop an unsupervised predictive approach, based on GAN deep learning models, for predicting patient LOS at an emergency department. To the best of our knowledge, this is the first study presenting an unsupervised deep learning model for patient LOS prediction. Overall, the main contributions of this paper are as follows:

  • Firstly, this study introduces an effective prediction approach based on a deep learning-driven GAN model. To the best of the authors' knowledge, this is the first study introducing a GAN-based model to forecast patient LOS in an ED. This choice is motivated by the flexibility of the GAN architecture, which enables selecting any neural network architecture as generator or discriminator, resulting in a deeper model that learns more complex features. Furthermore, the GAN-driven approach can flexibly learn relevant information from linear and nonlinear processes without prior assumptions on the data distribution, which significantly enhances prediction accuracy. Crucially, the GAN-driven method is trained in an unsupervised manner, and a predictor layer is appended to the GAN as the output layer to perform LOS prediction.

  • Secondly, we classified the predicted patients' LOS according to the time spent in the PED to further help decision-making and prevent overcrowding. The classes were established manually and validated based on the expertise and knowledge of pediatric medical experts.

  • Thirdly, this study compared the prediction performance of the proposed GAN-driven approach with other deep learning models, including deep belief networks (DBN) (Hinton Geoffrey et al. 2006), convolutional neural networks (CNN) (LeCun et al. 1998), and stacked auto-encoders (SAE) (Vincent et al. 2010), as well as machine learning models, namely support vector regression (SVR) (Awad and Khanna 2015), random forests (RF), AdaBoost, and decision trees (DT). DBN and SAE were chosen for comparison because they are unsupervised and generative deep learning models; CNN was included as a powerful and widely used deep learning model; and SVR, RF, DT, and AdaBoost were incorporated as efficient machine learning baselines.

  • The GAN-based LOS prediction performance was assessed using an actual database collected from the pediatric emergency department (PED) of the Lille regional hospital center, France. The results show that the proposed GAN-based predictor achieves better prediction accuracy than the other investigated models.

The remainder of the paper is organized as follows. Section 2 briefly presents the GAN model, how it can be applied to predict patient LOS in EDs, and the evaluation metrics used. Section 3 presents the dataset and the statistical data analysis, and discusses the results of applying the proposed approach to data collected from the pediatric emergency department of Lille (France). Section 4 reviews the main points discussed in this work and concludes the study.

2 Methodology

  The deep learning-driven approach proposed in this paper aims to predict the patient LOS in EDs. This section first gives an overview of the GAN model and how it is used, and then presents the deep learning-driven patient LOS prediction procedure.

2.1 Generative adversarial networks

GANs are effective deep learning models that can learn complex data representations and capture data distributions without any need for labeling (i.e., in an unsupervised manner) (Creswell et al. 2018; Lin et al. 2018). Specifically, a GAN is formed of two neural networks, called the generator and the discriminator, arranged in an adversarial way and in competition with each other. This composite architecture is inspired by game theory, specifically two-person zero-sum games, in which the gain of one player is the loss of the other so that the sum is zero; here, the two players are the generator and the discriminator. During training, the GAN attempts to learn the statistical distribution of the training dataset by modeling high-dimensional distributions with the two deep learning models, the generator \({\mathcal {G}}\) and the discriminator \({\mathcal {D}}\).

Fig. 1
figure 1

GAN architecture

The following steps describe the role of the generator and the discriminator.

  • The generator produces synthetic data (called fake data) from input noise in order to make realistic samples. To do so, it learns to capture the probability distribution of the training data and generates new data samples from it.

  • The discriminator receives both fake and real data as input and aims to determine whether a given sample comes from the actual data or from the generator (Fig. 1).

  • Both models are driven by adversarial competition, which further enhances the quality of the generated fake data so that it becomes progressively closer to the real (training) data.

  • The optimum is reached when an equilibrium between the two models is found: the discriminator achieves its best discriminative capacity while the generator produces fake data whose distribution is approximately identical to that of the real data. A minimal sketch of the two networks is given after this list.
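To make the two roles concrete, the following minimal PyTorch sketch defines a small fully connected generator and discriminator for vector-valued (time-series) inputs. The layer sizes, the noise dimension `noise_dim`, and the feature dimension `n_features` are illustrative assumptions, not the architecture used in this study.

```python
import torch
import torch.nn as nn

noise_dim, n_features = 32, 12   # illustrative sizes (assumption)

class Generator(nn.Module):
    """Maps a noise vector z ~ p_z to a synthetic (fake) sample."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(noise_dim, 64), nn.ReLU(),
            nn.Linear(64, n_features),
        )
    def forward(self, z):
        return self.net(z)

class Discriminator(nn.Module):
    """Outputs the probability that a sample comes from the real data."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Linear(n_features, 64), nn.ReLU(),
            nn.Linear(64, 32), nn.ReLU(),
        )
        self.classify = nn.Sequential(nn.Linear(32, 1), nn.Sigmoid())
    def forward(self, x):
        return self.classify(self.features(x))
```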

The real data distribution is represented by the probability density function \(p_{data}(x)\), and the distribution of the data produced by the generator (fake data) is denoted by \(p_{g}(x)\). The vector z represents noise drawn from a given distribution \(p_{z}(z)\); a uniform distribution is usually used. The discriminator function \({\mathcal {D}}(x,\theta _D)\) is parameterized by \(\theta _D\), which is updated during training based on the loss function \({\mathcal {L}}_{D}\) defined as follows:

$$\begin{aligned} {\mathcal {L}}_{D} = - {\mathbb {E}}_{x \sim p_{data} } [\log ({\mathcal {D}} ( x ))] - {\mathbb {E}}_{z \sim p_{z} } [\log (1- {\mathcal {D}} ( {\mathcal {G}}(z) ))]. \end{aligned}$$
(1)

Similarly, the generator function \({\mathcal {G}}(z,\theta _G)\) is parameterized by \(\theta _G\). Minimizing \({\mathcal {L}}_{D}\) amounts to maximizing \({\mathcal {D}}(x)\) on real data while minimizing \({\mathcal {D}}( {\mathcal {G}} ( z )) \) on generated data. The training objective of \({\mathcal {G}}\) is to bring the generated distribution \(p_{g}(x)\) as close as possible to the true data distribution \(p_{data}(x)\). During the training phase, a batch of samples is drawn from \(p_{z}(z)\) and fed to \({\mathcal {G}}\), which produces new outputs constrained to follow the statistical distribution \(p_{g}(x)\) (i.e., increasingly realistic generated data). The parameters of \({\mathcal {G}}\) are updated with respect to the loss function \({\mathcal {L}}_{G}\) defined as follows:

$$\begin{aligned} {\mathcal {L}}_{G} = - {\mathbb {E}}_{z \sim p_{z} } [\log ( {\mathcal {D}} ( {\mathcal {G}}(z) ))]. \end{aligned}$$
(2)

GAN training optimizes a value function \({\mathcal {V}}\), built from the two loss functions \({\mathcal {L}}_{G}\) and \({\mathcal {L}}_{D}\), so as to form a two-player minimax game:

$$\begin{aligned} \nonumber \min _{G} \max _{D} {\mathcal {V}} ({\mathcal {G}}, {\mathcal {D}})&= {\mathbb {E}}_{x \sim p_{data}} [log {\mathcal {D}}(x)]\\&\quad + {\mathbb {E}}_{z \sim p_{z}} [log (1-{\mathcal {D}}({\mathcal {G}}(z)) )]. \end{aligned}$$
(3)

Equation (4) gives the stochastic gradient that the discriminator ascends to update its parameters during the training phase:

$$\begin{aligned} \bigtriangledown _{\theta d} \dfrac{1}{m} \sum _{i=1}^{m} [log ({\mathcal {D}}(x^i) ) + log (1 - {\mathcal {D}}(G(z^i)) )]. \end{aligned}$$
(4)

Equation (5) is used by the generator, which updates its parameters by descending its stochastic gradient:

$$\begin{aligned} \bigtriangledown _{\theta g} \dfrac{1}{m} \sum _{i=1}^{m} log (1 - {\mathcal {D}}(G(z^i)) ). \end{aligned}$$
(5)

For a fixed generator, the optimal discriminator \({\mathcal {D}}^*\) is given by Eq. (6), where the training data distribution is denoted by \(p_{data}(x)\) and \(p_{g}(x)\) represents the learned (generated) distribution.

$$\begin{aligned} {\mathcal {D}}^*(x) = \dfrac{p_{data} (x)}{p_{data} (x) + p_{g}(x)}. \end{aligned}$$
(6)

In practice, the GAN is trained by alternately updating the generator's and the discriminator's parameters on mini-batches (subsets sampled from the training dataset); a minimal training-loop sketch is given below. Although GANs were originally designed for computer vision problems, this adversarial training strategy has also achieved promising performance on prediction problems.
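The following PyTorch sketch illustrates these alternating updates, using the binary cross-entropy formulation of the discriminator loss in Eq. (1) and the non-saturating generator loss in Eq. (2). The data loader `real_loader`, the batch size, and the learning rates are hypothetical placeholders; this is a generic GAN training loop under those assumptions, not the exact training code used in this study.

```python
import torch
import torch.nn as nn

G, D = Generator(), Discriminator()          # networks from the previous sketch
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

def train_gan(real_loader, epochs=100):
    for _ in range(epochs):
        for real in real_loader:                        # real: (batch, n_features)
            batch = real.size(0)
            ones, zeros = torch.ones(batch, 1), torch.zeros(batch, 1)

            # --- discriminator step: minimize L_D (Eq. 1, gradient of Eq. 4) ---
            z = torch.rand(batch, noise_dim)            # noise from p_z (uniform)
            fake = G(z).detach()                        # do not backprop into G here
            loss_d = bce(D(real), ones) + bce(D(fake), zeros)
            opt_d.zero_grad(); loss_d.backward(); opt_d.step()

            # --- generator step: minimize L_G (Eq. 2, gradient of Eq. 5) ---
            z = torch.rand(batch, noise_dim)
            loss_g = bce(D(G(z)), ones)                 # fool D into predicting "real"
            opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```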

In summary, the GAN model, which is composed of generative and discriminative models, is trained in an unsupervised manner by alternately optimizing the generator and the discriminator (Goodfellow et al. 2014). The training objective is to approximate the LOS data distribution by learning to generate samples that resemble the LOS data points. After training, only the discriminator is kept for the rest of the study, since the role of the generator is to help the discriminator distinguish between true and fake (noisy) data; at the end of training, the discriminator output constitutes a feature map, or feature space, that provides a compact continuous representation of the input data (LOS variables). Note that this study aims to predict LOS time-series data based on the GAN model, so the output layer should provide a vector of real values (i.e., LOS predictions). In particular, the last layer of the GAN model serves as a predictor layer. Since the proposed model should produce a single output representing the LOS, we added a mapping layer that we call a predictor (Figs. 2, 3). It should be noted that the discriminator and the predictor are then trained via supervised learning to learn the mapping between the observed LOS variables and the LOS value.

Fig. 2
figure 2

Illustration of the GAN predictor layer

2.2 Building GAN prediction model

Here, we propose a deep generative approach for LOS prediction, called the GAN Predictor (GAN-P). The GAN was initially proposed to deal with computer vision problems such as image classification (Fig. 3); here we adapt it to predict the LOS time series from a set of input variables. The main objective of the GAN is to learn the probability distribution underlying the training dataset, which represents a mapping between given patient cases and the time spent waiting before leaving the PED. The learning procedure is divided into two stages. First, unsupervised training is used to approximate this probability distribution with the two models, the generator and the discriminator, which learn together in an adversarial way to discover features and reconstruct them with the objective of minimizing the reconstruction error.

The second stage consists of fine-tuning the discriminator parameters, i.e., adjusting and optimizing the already trained parameters to approach the global optimum. Moreover, a predictor layer is added at this stage to complete the whole architecture; the predictor is randomly initialized and optimized during the fine-tuning step to learn how to map a given observation to its LOS. This is why we call the proposed approach GAN-P.
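A minimal sketch of this second stage is given below, reusing the `Discriminator` defined earlier: a single-output regression layer (the predictor) is stacked on the discriminator's feature extractor, and the whole stack is fine-tuned with a mean squared error loss against the observed LOS values. The optimizer settings and epoch count are illustrative assumptions.

```python
import torch
import torch.nn as nn

class GANPredictor(nn.Module):
    """Discriminator feature extractor + single-output predictor layer (GAN-P sketch)."""
    def __init__(self, trained_discriminator):
        super().__init__()
        self.features = trained_discriminator.features   # reuse adversarially trained weights
        self.predictor = nn.Linear(32, 1)                 # randomly initialized predictor head
    def forward(self, x):
        return self.predictor(self.features(x))

def fine_tune(model, loader, epochs=50):
    """Supervised fine-tuning: learn the mapping from LOS descriptors to the LOS value."""
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    mse = nn.MSELoss()
    for _ in range(epochs):
        for x, y in loader:                 # x: descriptors, y: normalized LOS
            loss = mse(model(x), y)
            opt.zero_grad(); loss.backward(); opt.step()
    return model

gan_p = GANPredictor(D)                      # D: discriminator from the adversarial stage
```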

Fig. 3
figure 3

Schematic illustration of GAN-driven predictor

2.3 Evaluation metrics

In this study, we assess the accuracy of the prediction models using six commonly used metrics: the coefficient of determination (R2), root mean square error (RMSE), mean absolute percentage error (MAPE), mean absolute error (MAE), explained variance (EV), and root mean squared log error (RMSLE).

$$\begin{aligned} RMSE= & {} \sqrt{\frac{1}{n}\sum _{t=1}^{n}(y_{t}-{\hat{y}}_{t})^{2}}, \end{aligned}$$
(7)
$$\begin{aligned} MAE= & {} \frac{\sum _{t=1}^n\left| y_t-{\hat{y}}_{t}\right| }{n}, \end{aligned}$$
(8)
$$\begin{aligned} MAPE= & {} \frac{100}{n} \sum _{t=1}^{n}\bigg |\frac{y_{t}-{\hat{y}}_{t}}{y_{t}}\bigg |\%, \end{aligned}$$
(9)
$$\begin{aligned} EV= & {} 1 - \frac{\mathrm {Var}( \hat{{\mathbf {y}}} - {\mathbf {y}})}{ \mathrm {Var}({\mathbf {y}})}, \end{aligned}$$
(10)
$$\begin{aligned} RMSLE= & {} \sqrt{\frac{1}{n}\sum _{t=1}^{n}(\log (y_{t})-\log ({\hat{y}}_{t}))^{2}}, \end{aligned}$$
(11)

where \(y_{t}\) is the observed patient LOS, \({\hat{y}}_{t}\) is its corresponding forecasted LOS, and n is the number of data points. Lower RMSE, MAE, MAPE, and RMSLE values and higher EV and R2 values imply better precision and prediction quality.
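For reference, the metrics in Eqs. (7)-(11), together with R2, can be computed with numpy and scikit-learn as in the sketch below; the array names `y_true` and `y_pred` are placeholders.

```python
import numpy as np
from sklearn.metrics import r2_score, explained_variance_score

def evaluate(y_true, y_pred):
    """Compute the six evaluation metrics used in this study."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    err = y_true - y_pred
    return {
        "RMSE":  np.sqrt(np.mean(err ** 2)),                              # Eq. (7)
        "MAE":   np.mean(np.abs(err)),                                    # Eq. (8)
        "MAPE":  100 * np.mean(np.abs(err / y_true)),                     # Eq. (9)
        "EV":    explained_variance_score(y_true, y_pred),                # Eq. (10)
        "RMSLE": np.sqrt(np.mean((np.log(y_true) - np.log(y_pred))**2)),  # Eq. (11)
        "R2":    r2_score(y_true, y_pred),
    }
```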

2.4 The proposed forecasting framework

The flowchart of the proposed LOS forecasting framework is depicted in Fig. 4. The recorded LOS time-series datasets are first preprocessed and then used to train the deep learning and machine learning models. Importantly, to improve data quality, outliers are discarded and missing values are imputed (Wu et al. 2020). Handling missing values and detecting outliers are essential steps in the data preparation process (Wu and Luo 2020). Outliers differ from typical samples in a given dataset and are usually removed to increase the performance of a forecasting method (Wu et al. 2019; Liu et al. 2008). To further improve data quality, numerous techniques have been reported in the literature to impute missing values and appropriately predict missing data, including k-nearest neighbors and the random forest regressor (Wu et al. 2021; Aydilek and Arslan 2012).
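As an illustration of this preprocessing step, the sketch below imputes missing values with a k-nearest-neighbors imputer and removes gross outliers in the target column with a simple interquartile-range rule; both the imputer choice and the IQR threshold are generic assumptions rather than the exact preprocessing applied to the PED data, and the dataframe is assumed to contain only numeric columns.

```python
import pandas as pd
from sklearn.impute import KNNImputer

def preprocess(df, target="LOS", k=5, iqr_factor=3.0):
    """Impute missing values and drop gross outliers in the target column."""
    # k-nearest-neighbors imputation of missing feature values
    imputed = pd.DataFrame(KNNImputer(n_neighbors=k).fit_transform(df),
                           columns=df.columns, index=df.index)
    # interquartile-range rule on the LOS column to discard extreme outliers
    q1, q3 = imputed[target].quantile([0.25, 0.75])
    iqr = q3 - q1
    mask = imputed[target].between(q1 - iqr_factor * iqr, q3 + iqr_factor * iqr)
    return imputed[mask]
```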

In this study, we normalize the LOS data, \({\mathbf {y}}\), by min-max normalization within the interval [0, 1].

$$\begin{aligned} {\widetilde{y}} = \frac{( {y} - y_{min})}{( y_{max} - y_{min})} \end{aligned}$$
(12)

where \(y_{min}\) and \(y_{max}\) denote the minimum and maximum of the LOS data, respectively. Note that the inverse transformation is applied after the forecasting process to recover LOS values on the original scale.
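A small sketch of this normalization and its inverse, as would typically be applied before training and after forecasting (Eq. 12), is given below.

```python
import numpy as np

def minmax_fit(y):
    """Return the (min, max) pair used for normalization (Eq. 12)."""
    y = np.asarray(y, float)
    return y.min(), y.max()

def minmax_transform(y, y_min, y_max):
    """Scale LOS values into [0, 1]."""
    return (np.asarray(y, float) - y_min) / (y_max - y_min)

def minmax_inverse(y_tilde, y_min, y_max):
    """Map normalized forecasts back to the original LOS scale."""
    return np.asarray(y_tilde, float) * (y_max - y_min) + y_min
```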

Fig. 4
figure 4

Deep learning-driven LOS forecasting procedure

Here, we adopted the GAN-driven forecasting approach and compared its performance with three deep learning models (i.e., DBN, CNN, and SAE) and four machine learning models (i.e., SVR, DT, RF, and AdaBoost) to forecast the LOS time-series data recorded in a pediatric emergency department. Toward this end, we split the normalized data into training and testing sub-datasets. The forecasting models are first established using the training set, and the parameters of each model are estimated. For the GAN and the DBN, a new layer with a single output is incorporated for forecasting purposes, and the whole structure of each model is trained to optimize its parameters and reach suitable performance. For instance, the training of the coupled GAN and forecaster layer starts in an unsupervised way: the first layer is trained, and the latent variables at each layer are employed as the input for training the following layer. After this greedy layer-wise training, supervised fine-tuning is employed to tune the weight matrices and biases of the network so as to minimize the loss function; the crucial role of fine-tuning is to adjust the network parameters to minimize information losses. After that, we evaluate the forecasting performance of each model using the testing data. The quality of each model is judged in terms of six statistical metrics: R2, RMSE, MAPE, MAE, EV, and RMSLE.
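As a sketch of this train/evaluate workflow for the machine learning baselines, the code below splits the prepared data, fits the four scikit-learn regressors mentioned above, and scores them with the `evaluate` helper defined in Sect. 2.3; the 80/20 split and default hyperparameters are illustrative assumptions (the actual hyperparameters were selected by grid search, see Table 6).

```python
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor, AdaBoostRegressor

def compare_baselines(X, y, test_size=0.2, seed=42):
    """Fit the four machine learning baselines and report their test metrics."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=test_size,
                                              random_state=seed, shuffle=False)
    models = {
        "SVR": SVR(),
        "DT": DecisionTreeRegressor(random_state=seed),
        "RF": RandomForestRegressor(random_state=seed),
        "AdaBoost": AdaBoostRegressor(random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_tr, y_tr)
        scores[name] = evaluate(y_te, model.predict(X_te))   # helper from Sect. 2.3
    return scores
```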

3 Results and discussion

3.1 Data source: presentation of the pediatric emergency department (PED)

A dataset of daily patient visits recorded at the PED of CHRU-Lille is used to assess the considered methods. The Lille hospital serves about four million residents of the Nord-Pas-de-Calais region in France, and its PED receives, on average, 23,900 care demands per year.

The patient care process within the PED begins when the patient arrives and ends when the patient exits the PED (Fig. 5). The general process of patient management at the PED of CHRU-Lille is characterized by the following parameters:

  • input of the care process.

  • output of the care process.

  • resources (human and material resources).

  • procedures and care protocols.

  • management and control rules.

  • ED care process.

When the patient does not present a vital emergency, he or she first goes through administrative registration, is then received by the hostess (reception staff), and then proceeds to the nurse and medical consultations. If the patient presents a vital emergency, he or she is directly admitted to a vital emergency room without administrative registration (in general, the registration is completed after treatment).

Fig. 5
figure 5

Schematic illustration of the general procedure performed for managing patient in the PED (Kadri and Abdennbi 2020)

Based on interviews conducted with the PED medical staff, and taking into account the patient's state of emergency upon arrival at the PED (critical or not) and whether additional examinations are needed, we established a dynamic model of the care process in the PED (see Fig. 6).

Fig. 6
figure 6

Stages of care process according to the type of patient admitted to the PED

3.2 Data pre-processing and data analysis

This study utilized a dataset extracted from the PED database at CHRU-Lille covering the period from January 1, 2011, to December 31, 2012. The dataset includes 44,676 patients. Each patient \(P_{i}\) is assigned a length of stay at the PED, \(Y_i\), and is described by 12 descriptors \(X_1, X_2, \ldots , X_{12}\). The meaning and description of the attributes are given in Table 1.

Table 1 Description and signification of features

Figure 7 presents the distribution of the patient length of stay (LOS) at the PED over the whole studied period. More than 28% of patients arriving at the PED have a LOS between 1 and 2 h, while 22.46%, 13.42%, and 8.66% of patients have a LOS between 2 and 3 h, 3 and 4 h, and 4 and 5 h, respectively.

To help decision-making and prevent overcrowding situations within the PED, we defined and classified patients according to their length of stay (LOS). With the help of the medical staff, we defined the patient LOS as the time between the completion of the administrative registration at the PED and the discharge (departure) from the PED, and we proposed five classes of patient LOS. These classes were validated by the pediatric medical staff and are defined as follows (a small sketch mapping a LOS value to its class is given after the list):

  • Class C1 (LOS \(<=\) 120 min): represents the very short LOS.

  • Class C2 (120 min < LOS \(<=\) 210 min): represents the short LOS.

  • Class C3 (210 min < LOS \(<=\) 300 min): represents the average LOS.

  • Class C4 (300 min <LOS \(<=\) 480 min): represents the long LOS.

  • Class C5 (LOS > 480 min): represents the very long LOS.
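The following helper simply encodes the five class boundaries listed above; it is a direct transcription of the thresholds, with the function name chosen for illustration.

```python
def los_class(los_minutes: float) -> str:
    """Map a LOS value (in minutes) to one of the five PED classes C1-C5."""
    if los_minutes <= 120:
        return "C1"   # very short LOS
    elif los_minutes <= 210:
        return "C2"   # short LOS
    elif los_minutes <= 300:
        return "C3"   # average LOS
    elif los_minutes <= 480:
        return "C4"   # long LOS
    else:
        return "C5"   # very long LOS
```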

Fig. 7
figure 7

Distribution of the patient's length of stay at the PED

The six groups of the GEMSA classification developed by the Multicentric Emergency Department Study Group are summarized in Table 2, where each group is associated with a type of patient (Afilal et al. 2016; Kadri et al. 2014b). The classification criteria are established according to the patient's entry and exit mode and the planned or unplanned nature of the care activity. This classification provides information on the arrival of patients who require significant and prolonged care (Afilal et al. 2016; Kadri et al. 2014b). According to Table 2, more than 81% of patients arriving at the PED are unplanned (GEMSA2); in general, these patients return home after receiving the needed care. The GEMSA4 group represents unexpected arrivals (10.614%), most of whom are hospitalized after emergency treatment, which increases their length of stay.

Table 2 GEMSA groups description and distribution for the studied period (2011–2012)

The clinical classification of patients in an ED (CCMU) is a coding scheme that assesses the patient's condition in the ED, the level of severity, and the medical prognosis. Table 3 presents this classification and the distribution of patients treated at the PED over the whole studied period (2011–2012). Patients classified as CCMU2 represent the majority of patients admitted to the PED (more than 62%), followed by patients classified as CCMU1 (more than 31%). These patients presented a non-urgent state, and they all returned home after receiving medical care.

Table 3 CCMU groups description and distribution for the studied period (2011–2012)

According to the data for the studied period, a patient may be required to undergo additional examinations depending on the severity of their condition. The five types of complementary examinations are presented in Table 4, which shows that more than 66% of patients admitted to the PED underwent medical imaging, and more than 23% and 19% underwent radiology and biology examinations, respectively. These data help explain the patient's LOS at the PED.

Table 4 Distribution of patients according to additional examinations for the studied period (2011–2012)

Based on the data analysis and on discussions with the PED staff, we derived new features from the raw data summarized in Table 1. The new deduced features are presented in Table 5 and described below (a small sketch deriving them from the arrival timestamp follows the list). Note that we did not apply any feature selection algorithm to identify these new features.

  • Period of year: the period of the year influences the LOS, especially the epidemic period that takes place between November and March (see Fig. 8). Three different periods were distinguished. We also added the month of the year as a new descriptor variable.

  • Day of week: the number of patient arrivals varies according to the day of the week. Figure 9 presents box plots of patient arrivals by day of the week for the two studied years. Sunday accounts for the highest number of arrivals at the PED, followed by Monday, Saturday, and Thursday.

  • Period of day: the analysis of arrivals by hour shows that the number of patient arrivals varies over the day; in general, it is low between 0 and 7 am and increases gradually from 8 am to 7 pm (see Fig. 10). From this analysis, three separate daily periods were deduced.
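The sketch below derives these calendar features from an arrival timestamp column with pandas. The exact boundaries of the three yearly and three daily periods are illustrative assumptions consistent with the description above (epidemic period from November to March, low-activity hours between 0 and 7 am), not necessarily the cut-offs listed in Table 5.

```python
import pandas as pd

def add_calendar_features(df, ts_col="arrival_time"):
    """Derive period-of-year, month, day-of-week and period-of-day features."""
    ts = pd.to_datetime(df[ts_col])
    df = df.copy()
    df["month"] = ts.dt.month
    df["day_of_week"] = ts.dt.dayofweek                      # 0 = Monday
    # assumed yearly periods: epidemic (Nov-Mar), summer (Jun-Aug), other
    df["period_of_year"] = df["month"].map(
        lambda m: "epidemic" if m in (11, 12, 1, 2, 3)
        else ("summer" if m in (6, 7, 8) else "other"))
    # assumed daily periods: night (0-7 am), day (8 am-7 pm), evening (8-11 pm)
    hour = ts.dt.hour
    df["period_of_day"] = pd.cut(hour, bins=[-1, 7, 19, 23],
                                 labels=["night", "day", "evening"])
    return df
```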

Fig. 8
figure 8

Patient arrival at the PED by month of year

Fig. 9
figure 9

Distribution of PED visits time-series by day of the week

Fig. 10
figure 10

Descriptive statistics (mean, median and max) of hourly PED visit time-series in the daytime

Table 5 New deduced features

3.3 Building and validation models for patient LOS prediction

Before starting the training process, the dataset is arranged to satisfy the mapping \(X \longrightarrow Y\), where \(X = \{x_1, x_2, \ldots , x_k\}\) is the observation vector and Y is the corresponding LOS. We evaluate the performance of the proposed GAN-P against three deep learning models (i.e., CNN, DBN, and SAE) and four machine learning models (i.e., SVR, DT, RF, and AdaBoost). Table 6 presents the model parameters obtained in the training phase using a grid search (a sketch of this tuning step is given below). Note that the DBN and the SAE are trained in the same way as the proposed approach, i.e., in two steps: unsupervised training first, to learn the training data distribution, after which a predictor layer is added and the whole architecture is fine-tuned with supervised training to learn the LOS mapping. The CNN, in contrast, is trained in a supervised way to learn the LOS mapping through nonlinear transformations that extract features and build a feature space supporting this mapping. Finally, SVR is a supervised machine learning model based on a statistical approach that performs nonlinear regression, mapping statistical features to the corresponding LOS value.
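As an illustration of how such hyperparameters can be selected, the sketch below runs a scikit-learn grid search for the random forest baseline; the parameter grid and the 5-fold cross-validation are generic assumptions and do not reproduce the exact grids behind Table 6.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

def tune_random_forest(X_train, y_train):
    """Select random forest hyperparameters by grid search (illustrative grid)."""
    grid = {
        "n_estimators": [100, 300, 500],
        "max_depth": [None, 10, 20],
        "min_samples_leaf": [1, 5, 10],
    }
    search = GridSearchCV(RandomForestRegressor(random_state=42), grid,
                          scoring="neg_root_mean_squared_error", cv=5)
    search.fit(X_train, y_train)
    return search.best_estimator_, search.best_params_
```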

Table 6 Signification and description of model parameters

Forecasting results obtained by the trained models on the testing data are depicted in Fig. 11, which shows that these models can follow the trend of the patient LOS time series. Figure 12 depicts the boxplots of the prediction errors of each model on the testing LOS data, where the prediction errors represent the mismatch between the recorded and forecasted patient LOS. When comparing boxplots, a wider distribution is characterized by a large box with large ranges, whereas small and compact boxes with a central line (median) near zero indicate small prediction errors. We observe that SVR and CNN have the largest errors, as indicated by the width of the central boxes and whiskers of their boxplots. The GAN and SAE models have compressed boxplots with relatively narrow interquartile ranges and whiskers, and their prediction errors fluctuate tightly around zero. Visually, the GAN reaches the best LOS prediction (i.e., the shortest boxes with a median around zero) compared to the other models.
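For completeness, a boxplot comparison like Fig. 12 can be produced with matplotlib as below; `errors_by_model` is a hypothetical dictionary of per-model prediction errors (observed minus forecasted LOS).

```python
import matplotlib.pyplot as plt

def plot_error_boxplots(errors_by_model):
    """Boxplots of prediction errors (observed - forecasted LOS) per model."""
    names = list(errors_by_model.keys())
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.boxplot([errors_by_model[n] for n in names], labels=names, showfliers=False)
    ax.axhline(0.0, linestyle="--", linewidth=1)   # reference line at zero error
    ax.set_ylabel("Prediction error (minutes)")
    plt.tight_layout()
    plt.show()
```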

Fig. 11
figure 11

Forecasting results obtained by the considered methods

Fig. 12
figure 12

Boxplot of forecasting errors obtained by the considered methods

Table 7 reports the prediction results of the proposed GAN-P approach and the other investigated models, evaluated with the statistical metrics RMSE, MAE, R2, EV, MDAE, and MLSE. The results show that GAN-P records the lowest prediction errors, with an RMSE of about 100.309, an MAE of about 61.722, and an MDAE of 37.322. GAN-P also maximizes the fitting scores R2 and EV (which measure how well the model fits the data), both around \(87\%\), and records the lowest MLSE value of 0.272, where MLSE quantifies how poorly the model performed the prediction.

Table 7 Validation measures applied to patient LOS prediction

Firstly, the deep learning models outperform the baseline approaches (i.e., SVR, DT, RF, and AdaBoost). This can be attributed to the ability of deep learning models to learn deep information embedded in time-series data. The results also demonstrate the robustness of the proposed approach against the other generative models, namely DBN and SAE, which are trained in the same way. Furthermore, GAN-P demonstrates its superiority over the deep 1D CNN. The proposed approach achieves superior performance thanks to its learning procedure based on a dual, adversarial architecture, which uses two deep learning models to accurately capture complex features.

Figure 13 displays the predicted LOS values for 100 patients when using the GAN-based model. The results indicate that the GAN-P model achieved good prediction across the five patient LOS classes presented in Sect. 3.2. This information is crucial for PED managers to better prepare for managing each class. According to this study, the developed GAN-P model has good prediction ability and can be used to predict the patient LOS at the PED.

Fig. 13
figure 13

Prediction of patient LOS at the PED using GAN-based model

4 Conclusion

The prediction of LOS at the pediatric emergency department (PED) of the Lille hospital, France, was studied using a deep learning framework. Predicting LOS in emergency departments can help ED managers effectively manage the human and material resources within their establishments and increase the quality of patient care. Predicting ED patient length of stay is therefore essential for controlling and managing health resources.

Firstly, this paper reported the development of a deep learning-driven approach for predicting the patient LOS in EDs using a generative adversarial network (GAN) model. Secondly, the GAN model was used to predict the patient LOS at the pediatric emergency department (PED) of the Lille regional hospital center, France. The experiments were carried out on an actual database collected from the PED. The GAN model's results were compared with those of other deep learning models (SAE, DBN, and CNN) and four machine learning models (i.e., SVR, DT, RF, and AdaBoost). The results clearly show the promising performance of these deep learning models in predicting patient LOS and emphasize the GAN's better performance compared to the other models.

However, this study raises several questions about patient LOS prediction within EDs; predicting patient LOS on more refined LOS classes and time scales would be very interesting. These predictions could also be improved by taking into account the real state of the ED at each new patient arrival, using other internal explanatory features, such as the occupancy rate of resources, the waiting times for nurse and medical consultations, the waiting time for additional examinations, and the number of patient transfers, as well as external explanatory features, such as epidemic and meteorological events. To this end, all explanatory features covering the internal and external information of EDs should be taken into account when designing strategies aimed at avoiding overcrowding that may lead to severe strain situations in these establishments.