Introduction

In cardiac surgery, optimal use of intensive care unit (ICU) and operating room (OR) capacity requires the prediction of future availability of ICU beds. At the department management level, a number of beds are reserved for cardiac surgery patients. To manage the planning of the intensive care unit and the operating theatre, it would be very helpful to have a system that provides an early alert if there is a high probability that a patient will be disconnected from ventilation during the next day. As long as patients are still ventilated, they cannot be transferred to a normal ward, the bed does not become available, and the surgeon cannot operate on new patients. This medically relevant prediction task, concerning the moment at which patients can be disconnected from mechanical ventilation, is well suited for our research, in which we focus primarily on the impact of using dynamic information in the prediction task.

Information on vital signs such as heart rate, blood pressure, oxygenation, etc. is routinely gathered in the ICU. Continuous evaluation of the values of these variables starting from the arrival in the ICU is important because the alterations are relevant to patient management [1]. For our analysis, we wanted to use the trends of the vital signals during the first hours of ICU stay to predict a short or prolonged length of stay from early on. Prognostic models in medicine can be useful for various tasks: from capacity planning to individual patient interventions. For an overview of uses and development approaches in the statistical and artificial intelligence field, we refer to the work of Abu-Hanna et al. and Ohno-Machado et al. [2, 3].

Minimal conditions used to start the weaning from mechanical ventilation in these critically ill patients are: hemodynamic and respiratory stability, absence of bleeding and normothermia. Many different physiological parameters related to these criteria are measured, monitored and stored in a typical ICU. Research has shown that humans have difficulty interpreting and handling more than seven variables at the same time. On top of that, the interpretation of the information can differ between clinicians, and the interpretation of temporal data seems to be the most important problem [4]. The need for decision support in the medical environment is therefore very high [5]. Medical diagnostic decision support systems are already an established component of medical technology [6]. A number of quantitative models, including logistic regression, neural networks and many others, have been used in such systems to assist human decision-makers in several applications [6, 7], e.g. in epilepsy detection [8].

Since all living organisms are complex, individually different, time-variant and dynamic (so-called CITD systems) [9], it is expected that taking these characteristics into account will lead to better models of the physiological signals of intensive care patients. For example, Cappi et al. used repeated measurements of the Acute Physiology and Chronic Health Evaluation (APACHE) score instead of only one score on the day of admission [10], since the APACHE score is based on physiological measurements at a certain moment in time and does not consider the evolution of the signals over time. Chang et al. predicted deaths among ICU patients on the basis of trend analysis of daily measured APACHE II scores that were corrected for organ system failure. They applied this approach because they were convinced that the patho-physiological processes affecting ICU patients are dynamic and cannot be reflected by a single assessment of a static score on the day of admission [11–13]. Clermont et al. also used repeated static scores in their micro simulation model to predict temporal patterns of multiple outcomes on the basis of demographic variables and the Sequential Organ Failure Assessment (SOFA) scores on admission [14]. The work of Toma et al. describes a method that captures the temporal evolution of organ functioning, which is quantified by SOFA scores or Individual Organ System Failure (IOSF) scores, and uses these patterns in a logistic regression modeling framework [15, 16].

Instead of using repeated static scores as described above to obtain dynamic information about a patient, it is also possible to extract dynamically relevant features from the commonly measured physiological data themselves. Since many time series of physiological variables are available in the ICU environment, these signals could be well suited as inputs for different modeling techniques and for cepstral coefficient analysis. These techniques can all be used to analyze individual patients whose health status varies with time. So far, univariate autoregressive analyses of physiological variables have been applied in several studies in the field of intensive care medicine [17, 18]. Akaike used a multivariate autoregressive method for the identification of a multivariate feedback system [19]. Many systems, e.g. in the human body, can be described as feedback systems of this kind [20]. His method has been applied in several medical applications [21–23] and helps to detect the relationships between all variables included in the model. The calculation of cepstral coefficients is another way to extract significant features from time series. Curcie et al., for example, used this technique to identify individual heart rate patterns [24].

For making classifications using many variables at the same time, several data mining techniques are available. It has been demonstrated that machine learning algorithms can analyze data from a collection of patients and can be trained to make predictions on new unseen patients. Machine learning algorithms have been used in a variety of medical applications [25] and have been shown to be especially valuable in data mining scenarios involving large databases and where the domain is poorly understood and therefore difficult to model by humans [26]. Intensive care is one of those domains that can benefit from the use of machine learning techniques [27]. In this field they have been used for prediction and classification tasks. For instance, they have been used for classifying pressure-volume curves into different measurement methods for artificially ventilated patients suffering from the Adult Respiratory Distress Syndrome (ARDS) [28]. They have been shown to outperform logistic regression in the task of classifying ICU patients with head injuries according to their outcome: good vs. poor Glasgow Outcome Scale (GOS) scores and dead vs. alive [29]. In a different prediction task, Tong and colleagues successfully classified a neonatal ICU population according to ventilation duration, a study that extends their previous success with the same machine learning technique and classification task in an adult ICU setting [30]. Giraldo and colleagues [31] classified respiratory patterns of patients on weaning trials into those that will succeed or fail to sustain spontaneous breathing. Gaussian processes (GP) have been applied to the problem of neonatal seizure detection from electroencephalograph (EEG) signals, where they are shown to outperform other modeling methods currently in clinical use for EEG analysis [32].

However, in the above cases no dynamic information about the patients is taken into account when applying the data mining approach, although it is important and useful to capture and analyze the temporal aspects of the data as part of the knowledge discovery process [33]. Several attempts at temporal feature extraction for time series classification have been made [34–37]. According to Kadous et al. [38], abstracting temporal features is not a trivial task, especially when it has to be done automatically. In this work, several automatically extracted representations of the dynamics of time series will be studied. Moreover, the classification results will be compared with the classification based on admission data only, whereas in the references cited above only monitoring data were considered, even though in clinical practice static admission data play an important role in the calculation of health evaluation scores such as APACHE scores.

The general objective of this study was to explore and quantify the prognostic value of dynamic information that was abstracted from time series data in various ways. More specifically, it was investigated whether the timeframe in which the minimal clinical conditions to start weaning from mechanical ventilation are reached can be predicted more accurately by using dynamic information of the individual patients than by using static admission data.

Materials and methods

Figure 1 gives an overview of the consecutive steps in our analyses. In this section the signals used, the different types of time series analyses, the GP classifier and the prediction task are briefly explained.

Fig. 1 Schematic overview of the analyses performed in this research

Data generation

In the surgical ICU of the university hospitals of Leuven, 22 beds are reserved for cardiac surgery patients. We screened all patients admitted to the ICU after planned coronary bypass surgery, between February 2006 and December 2006 for this retrospective study. Ethics committee approval was obtained, and the need for informed consent was waived because of the retrospective nature of the study. We selected five physiological variables, routinely monitored in these patients (Philips Merlin monitor), to be used as inputs. Since we were focusing on the dynamics of the patients in this study, we took into account signals that were measured with the highest frequency (i.e. a sample interval of 1 min) in the Patient Data Management System (Metavision®, iMD-Soft®) and that were, on top of that, almost always measured and registered and showed enough variability. For an overview, see Table 1. Data of a total of 203 patients was used for analysis.

Table 1 Physiological variables

For these patients also admission data was used (see Table 2). For this, parameters from the Parsonnet score [39] and Euroscore [40] were selected, as far as they were available. Both scores have been shown to be predictive for ICU length of stay. The following seven variables were taken into account: age, sex, body mass index (BMI), normal lung function, diabetes, creatinine level, and NYHA class. The NYHA (New York Heart Association) classifies the extent of heart failure and ranges from I (no symptoms or limitations) to IV (severe limitations).

Table 2 The population description table

Modeling analysis

Abstraction of dynamic information

In order to quantify the dynamics of the patients’ physiological variables, we used the mean and standard deviations of the signals (Avgstd), we applied multivariate autoregressive models (MAR) and calculated cepstral coefficients (CEP). The latter two are explained in more detail in this section.

Multivariate autoregressive models (MAR)

A time series is a sequence of observations taken sequentially in time. Most time series consist of elements that are serially dependent. A common approach for analyzing this dependence is the AR model. In this type of model, a coefficient or a set of coefficients is estimated that describes the association between consecutive elements of the series [41]. The general equation of a multivariate autoregressive model (MAR) can be written as

$$Y\left( t \right) = \sum\limits_{m = 1}^M {A\left( m \right)Y\left( {t - m} \right)} + E\left( t \right)$$
(1)

Every observation is made up of a linear combination of M prior observations (the order of the model) and a white noise term, which is a vector of mutually independent white noises. \(Y\left( t \right) = \left[ {y_1 \left( t \right),y_2 \left( t \right), \ldots ,y_K \left( t \right)} \right]\) is the vector of simultaneously measured values at time t for K variables, in this case all variables of Table 1, and \(E\left( t \right) = \left[ {e_1 \left( t \right),e_2 \left( t \right), \ldots ,e_K \left( t \right)} \right]\) is a prediction error vector. The generation of the AR models was performed using the ARfit package for Matlab [42]. The matrices A(m) are the MAR coefficients and are estimated using a stepwise least squares algorithm. In this study, the coefficients of matrix A are used as features in the data mining (cfr. Fig. 1) since they describe the dynamics of the considered system.

Cepstral coefficients (CEP)

Cepstrum analysis is a nonlinear signal processing technique with a variety of applications in areas such as speech and image processing. The cepstrum is defined as the inverse Fourier transform of the short-time logarithmic amplitude spectrum [43, 44]. In more detail, the real cepstrum of a sequence x is given by the sequence y:

$$y\left( t \right) = \frac{1}{{2\pi }}\int_{ - \pi }^\pi {\log \left| {X\left( {e^{j\omega } } \right)} \right|e^{j\omega t} d\omega }$$
(2)

where \(X{\left( {e^{{j\omega }} } \right)}\) is the Fourier transform of x.

The difference between the cepstral coefficients of different time series can be used as a similarity measure between these time series. Cepstral coefficients decay rapidly to zero, so only the first few coefficients are needed to capture most of the dynamic information in the time series. An example of the cepstrum of the heart rate signal of one patient is shown in Fig. 2. Because of the good clustering results of Kalpakis et al. [43] on the basis of cepstral coefficients, it is interesting to use these coefficients as input features in the data mining analysis as an alternative summary of the dynamics of the signals. Moreover, other techniques based on frequency information, such as the calculation of wavelet coefficients [34], have been applied for the summarization of data. Given the good results of Zhang et al., it is worthwhile to explore and use frequency information (such as cepstral coefficients) in the classification task.
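As an illustration of Eq. 2, the sketch below computes a discrete real cepstrum with NumPy and stacks the first coefficients of several signals into one feature vector. It is not the implementation used in this study: the discrete FFT stands in for the continuous integral, and the function names, the small constant `eps` and the choice to start counting coefficients at index 0 are assumptions made for the example.

```python
import numpy as np

def real_cepstrum(x, eps=1e-12):
    """Real cepstrum: inverse FFT of the log amplitude spectrum of x."""
    x = np.asarray(x, dtype=float)
    spectrum = np.fft.fft(x)
    # eps is an assumption of this sketch; it only guards against log(0).
    log_amplitude = np.log(np.abs(spectrum) + eps)
    # The log amplitude spectrum of a real signal is even, so the inverse
    # transform is real up to rounding error; keep the real part.
    return np.fft.ifft(log_amplitude).real

def cepstral_features(signals, n_coef=5):
    """Concatenate the first n_coef cepstral coefficients of each signal.

    signals: array of shape (n_signals, n_samples), one row per variable.
    """
    return np.concatenate([real_cepstrum(s)[:n_coef] for s in signals])
```

With five signals, `n_coef=4` gives a 20 dimensional vector and `n_coef=5` a 25 dimensional one.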

Fig. 2 Left: A heart rate signal in beats per minute of 230 samples. Right: The corresponding cepstrum truncated at 50 cepstral coefficients

Gaussian processes for classification

Gaussian processes [45], a type of kernel method, are a machine learning technique that has been successfully used to model and forecast real dynamic systems because of their flexible modeling abilities and their high predictive performances. They allow for multi-dimensional inputs and they assign a confidence value to their predictions. The main advantage of using a GP classifier over other kernel method classifiers is that it produces an output with a clear probabilistic interpretation [46].

In probabilistic binary classification the task is to determine, for an unlabeled test input vector \(x_* \), the probability of belonging to the class C, \(\pi _{c} {\left( {x_{ * } } \right)} = p{\left( {\left. {t_{ * } = + 1} \right|x_{ * } } \right)}\), when a training set {X, t} is given. The training set comprises N training input vectors \(X = \left\{ {x_1 ,x_2 , \ldots ,x_N } \right\}\) and their corresponding N binary class labels \(t = \left\{ {t_1 ,t_2 , \ldots ,t_N } \right\}\) such that \(t_i = + 1\) if \(x_i\) belongs to a given class C and \(t_i = - 1\) if \(x_i\) does not belong to the class. The probability that \(x_* \) does not belong to the class can then be computed as \(p{\left( {\left. {t_{ * } = - 1} \right|x_{ * } } \right)} = 1 - \pi _{c} {\left( {x_{ * } } \right)}\). In the remainder of this text the input vectors X will be referred to as examples.

In GP binary classification [47], a GP over a function f(x) is defined and then transformed through a logistic or squashing function \(\sigma \left( \cdot \right)\) so that its outputs lie in the [0,1] interval and can thus be interpreted as probabilities: \(\pi _{c} {\left( x \right)}: = p{\left( {\left. {t = + 1} \right|x} \right)} = \sigma {\left( {f{\left( x \right)}} \right)}\). Conditioning the predictive distribution on the training data allows for a probabilistic prediction on a test input example [48].

A GP is a distribution over functions and is a natural generalization of the Gaussian distribution, which has a vector as its mean and a matrix as its covariance. The GP over a function is accordingly specified by a mean function and a covariance function. The covariance function is given by a positive semi-definite kernel function \(k\left( {x_i ,x_j } \right)\). The covariance function determines the properties of the function distribution in the GP; for example, it can impose smoothness so that nearby inputs \(x_i ,x_j\) have similar values \(f\left( {x_i } \right),f\left( {x_j } \right)\) with high probability.

Learning from data in the GP case means to modify the function distribution by conditioning it on the observed data. This modified or posterior function distribution has a mean function that coincides with the target values when evaluated on the training examples.

Figure 3 shows a GP learned from the one-dimensional training data depicted with crosses. The shaded area corresponds to the 95% confidence region learned for the function distribution; it can be seen that the uncertainty of the prediction grows in regions where there are few training points. Figure 4 shows a cut-section of the predicted distribution for the test input at -6, which has a mean predicted value of 1.92. Also shown (dashed line) is the predicted distribution before training, which has a mean predicted value of 0 and is very broad to reflect the uncertainty associated with this prediction. Once learning has occurred, the predictions become more certain because data has been seen in the vicinity of the test point, and the predictions must be consistent with these observations.
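The conditioning step illustrated in Figs. 3 and 4 can be sketched for the simpler regression case, where the posterior is available in closed form. This is not the classifier used in this study (which requires expectation propagation); the sketch assumes a zero-mean prior, a squared exponential kernel and a small noise variance, and all function and parameter names are hypothetical.

```python
import numpy as np

def se_kernel(a, b, length_scale=1.0, signal_var=1.0):
    """Squared exponential covariance between two sets of 1-D inputs."""
    d = a[:, None] - b[None, :]
    return signal_var * np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_train, y_train, x_test, noise_var=1e-6):
    """Posterior mean and variance of a zero-mean GP at the test inputs."""
    K = se_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    K_s = se_kernel(x_train, x_test)
    # Cholesky factorisation for a numerically stable solve.
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    # Prior variance at each test point minus the part explained by the data.
    var = np.diag(se_kernel(x_test, x_test)) - np.sum(v ** 2, axis=0)
    return mean, var
```

Near a training input the posterior variance shrinks towards the noise level, while far from all training inputs the prediction falls back to the prior mean of 0 with the full prior variance.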

Fig. 3 Gaussian Process learned from the one-dimensional training data depicted with crosses. The shaded area corresponds to the 95% confidence region learned for the function distribution, and bold line indicates the mean predictions. The dashed line indicates a test point more thoroughly studied in Fig. 4

Fig. 4 Predicted distribution for the test input at -6, with a mean predicted value of 1.92. Dashed line is the predicted distribution before training, which is very broad and has a mean predicted value of 0

Given that the GP is defined by its covariance function, and that the covariance or kernel function is defined by a set of parameters (referred to as hyper parameters), then training the GP amounts to finding the values of the hyper parameters such that the probability of the data given these hyper parameters is maximized.

Because of the inclusion of the logistic function \(\sigma \left( \cdot \right)\) required for classification, the inference of the predictive or posterior distribution requires the solution of integrals that are analytically intractable, a problem that is solved either by resorting to Monte Carlo sampling or by analytical approximations to the integrals. In this study we follow the latter approach through the use of expectation propagation [49].

The covariance function used in this study is the so called rational quadratic with ARD (automatic relevance determination) defined as follows:

$$k{\left( {x_{i} ,x_{j} } \right)} = \sigma ^{2}_{f} {\left[ {1 + \frac{{{\left( {x_{i} - x_{j} } \right)}^{T} M{\left( {x_{i} - x_{j} } \right)}}}{{2\alpha }}} \right]}^{{ - \alpha }} $$
(3)

Recall that each example x corresponds to a vector obtained from the different time series models. In this equation \(M = {\text{diag}}\left( l \right)^{ - 2}\), and the \(l_1 ,l_2 , \ldots ,l_D \) parameters in the diagonal matrix are characteristic length-scales for each dimension of the input examples. \(\sigma ^{2}_{f}\) is the signal variance of the process, which controls its magnitude, and α is the shape parameter. Learning or training the GP amounts to finding the values for the parameters \(\theta = {\left\{ {\sigma _{f} ,\alpha ,l_{{\text{1}}} ,l_{{\text{2}}} , \ldots ,l_{D} } \right\}}\) (which are iteratively updated according to the expectation propagation algorithm) so as to maximize the likelihood of the class labels given the training data [46]. The values of the parameters of the diagonal matrix M determine the relevance of the corresponding input dimension. If, after training, a length-scale has a very large value, the covariance will become almost independent of that input dimension. This ARD covariance function has been found in other works to successfully remove uninformative input dimensions [50]. Also, the increase in degrees of freedom of the ARD covariance function given by the increase in hyper parameters allows for more complex mappings between the inputs and the targets to be found.

It has been shown [46] that in the limit α→∞, the covariance function of Eq. (3) converges to the squared exponential covariance function, one of the most frequently used covariance functions in kernel methods. The rational quadratic covariance function can thus be seen as an infinite sum of squared exponential covariance functions with different characteristic length-scales. A detailed description of commonly used covariance functions can be found in the work of Rasmussen et al. [46].
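The covariance function of Eq. (3) and its squared exponential limit can be written down directly. The following NumPy sketch is illustrative only; the function names and default values are assumptions, and `signal_var` plays the role of the signal variance in Eq. (3).

```python
import numpy as np

def _scaled_sq_dist(X1, X2, length_scales):
    """(x_i - x_j)^T M (x_i - x_j) with M = diag(l)^-2, for all pairs."""
    A = np.atleast_2d(X1) / length_scales
    B = np.atleast_2d(X2) / length_scales
    diff = A[:, None, :] - B[None, :, :]
    return np.sum(diff ** 2, axis=-1)

def rq_ard_kernel(X1, X2, length_scales, signal_var=1.0, alpha=1.0):
    """Rational quadratic covariance with one length-scale per dimension."""
    d2 = _scaled_sq_dist(X1, X2, length_scales)
    return signal_var * (1.0 + d2 / (2.0 * alpha)) ** (-alpha)

def se_ard_kernel(X1, X2, length_scales, signal_var=1.0):
    """Squared exponential covariance, the alpha -> infinity limit."""
    d2 = _scaled_sq_dist(X1, X2, length_scales)
    return signal_var * np.exp(-0.5 * d2)
```

Setting a very large length-scale for one input dimension makes the kernel values nearly identical to those obtained when that dimension is dropped, which is the mechanism behind automatic relevance determination.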

Protocol

Prediction task

Can we predict the time frame in which the patients fulfill the criteria for stability that will lead to weaning from mechanical ventilation? In our ICU, cardiac surgery patients are weaned off the ventilator using a protocol. In this protocol, the following criteria have to be met before sedation can be switched off: hemodynamic stability (dobutamine ≤5 μg/kg/min, levophed ≤0.2 μg/kg/min and lactate <2 mmol/L), respiratory stability (the partial pressure of oxygen in arterial blood (PaO2) ≥75 mmHg, the fraction of inspired oxygen (FiO2) ≤0.5, the positive end-expiratory pressure (PEEP) ≤8 mbar), temperature stability (blood temperature >36°C, peripheral temperature >30°C) and blood loss stability (sum of blood loss of all drains <100 ml/h).

To enable future comparisons with predictions performed by intensivists, the considered task was restated as follows: Predict the probability that the patient will begin to satisfy the stability criteria within each of the following time frames (classes): class 1: earlier than 9 h after admission; class 2: later than 9 h after admission. This 9 h threshold was chosen such that the resulting classes contained roughly the same number of patients: class 1 contained 102 patients and class 2 contained 101 patients. These classes also conform to an intuitive classification into patients that recover quickly and those that require prolonged ICU stays.

Preprocessing

Before doing any analysis, the signals were normalized: the mean was put to zero and the standard deviation to one. Furthermore, the recorded time series contained a limited number of missing values or artifacts, usually due to sensor disconnections. The missing data points were calculated using linear interpolation. In order to remove these artifacts, a peak-shaving algorithm was applied. This algorithm consisted of three major parts. In the first step, the trend of the original time series was calculated. Secondly, an upper and lower bound were computed as the trend plus and minus four times the standard deviation of the trend respectively. In the third step, values of the original signal that did not lie in between the lower and upper bound were replaced by linearly interpolated values calculated from the previous and next value that lay in between the two borders.
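The preprocessing steps described above can be sketched as follows. The text does not state how the trend was computed, so the moving median and its window length are assumptions of this example, as are all function names; the bounds follow the description literally (trend plus and minus four times the standard deviation of the trend).

```python
import numpy as np

def zscore(x):
    """Normalize a signal to zero mean and unit standard deviation."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

def fill_missing(x):
    """Replace NaN samples by linear interpolation between valid samples."""
    x = np.asarray(x, dtype=float).copy()
    missing = np.isnan(x)
    idx = np.arange(len(x))
    x[missing] = np.interp(idx[missing], idx[~missing], x[~missing])
    return x

def moving_median(x, window=15):
    """Trend estimate (assumed method): centred moving median."""
    half = window // 2
    padded = np.pad(x, half, mode="edge")
    return np.array([np.median(padded[i:i + window]) for i in range(len(x))])

def peak_shave(x, window=15, n_std=4.0):
    """Replace samples outside trend +/- n_std * std(trend) by interpolation."""
    x = np.asarray(x, dtype=float)
    trend = moving_median(x, window)
    spread = n_std * np.std(trend)
    inside = (x >= trend - spread) & (x <= trend + spread)
    idx = np.arange(len(x))
    cleaned = x.copy()
    # Interpolate from the nearest samples that lie within the bounds.
    cleaned[~inside] = np.interp(idx[~inside], idx[inside], x[inside])
    return cleaned
```

A moving median is used here because a single spike barely shifts it; a moving average or another smoother could be substituted without changing the rest of the procedure.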

In total, the imputation of missing values and the removal of artifacts affected 1.9% of all data points.

Time series models

Data from each patient, collected during the first 4 h of ICU stay, were used to generate the different time-series models, the parameters of which were used as the features of the examples. One of the two possible class labels was assigned to each example. Figure 5 shows data from one patient used to generate a training example, and how the appropriate class label was assigned.

Fig. 5 The gray area corresponds to the 4-hour interval of data used to generate the example. The signals are numbered according to Table 1 (1: arterial blood pressure, 2: SpO2, 3: heart rate, 4: blood temperature, 5: arterial pulmonary pressure). The dashed vertical line depicts the 9-hour class boundary and the solid vertical line indicates the moment when the patient satisfies the stability criteria (minute 627). The example generated from this data is labeled as belonging to Class 2 (stability criteria met after 9 h)

On the one hand, sufficient data points should be taken into account in the modeling process. On the other hand, the sooner after admission a reliable prediction can be made about the extubation time, the better. Therefore, an interval duration of 4 h was chosen for our analysis; shorter time intervals led to unstable MAR models. The different time-series analysis techniques described above were applied to the 4-hour interval of data of each patient in order to generate the examples used as inputs for the GP classifier. In order to avoid over-fitting, the dimension of the examples should be kept low enough. According to traditional rules of thumb, 5 to 10 observations are required for each parameter to be estimated [51]. This leads to a maximum number of parameters between 18 and 36 in our 10-fold cross-validation scheme (explained below) for 203 patients. The types of examples (input vectors) used to train the GP classifiers in our experiments are explained below. They were designed so that this rule is not violated.
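The bounds of 18 and 36 are not derived explicitly in the text. One plausible reading, assuming that each training set holds nine of the ten folds of the 203 patients and writing \(N_{\text{train}}\) and \(P_{\max }\) (symbols introduced here) for the training set size and the maximum number of parameters, is:

$$N_{\text{train}} = \tfrac{9}{10} \times 203 = 182.7 \approx 183,\qquad \left\lfloor {\tfrac{182.7}{10}} \right\rfloor = 18 \le P_{\max } \le \left\lfloor {\tfrac{182.7}{5}} \right\rfloor = 36$$

The 20 and 25 dimensional examples described below fall within this range.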

Signal Average and standard deviation (Avgstd)

Each example is a 20 dimensional vector containing four values for each of the five physiological variables of Table 1. For two intervals of 2 h the mean and the standard deviation were calculated for each signal.
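A minimal sketch of this feature construction is given below, assuming that the 4-hour recording of each signal is split into two consecutive halves. The function name, the ordering of the values within the vector and the use of the sample standard deviation (`ddof=1`) are assumptions made for illustration.

```python
import numpy as np

def avgstd_features(signals, n_intervals=2):
    """Mean and standard deviation per signal for consecutive intervals.

    signals: array of shape (n_signals, n_samples), one row per variable.
    Returns a vector of length n_signals * 2 * n_intervals.
    """
    signals = np.asarray(signals, dtype=float)
    features = []
    for block in np.array_split(signals, n_intervals, axis=1):
        features.append(block.mean(axis=1))
        features.append(block.std(axis=1, ddof=1))
    return np.concatenate(features)
```

For five signals and two intervals this yields the 20 dimensional vector described above.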

MAR coefficients

All five variables of Table 1 were used as input of a first order MAR model. The first order was chosen in order to keep all models as simple and compact as possible. Moreover, higher order models would lead to examples of high dimensions, which increases the risk of over-fitting (see above). So, matrix A of Eq. 1 was a 5 × 5 matrix, of which all 25 parameters were put in a 25 dimensional vector that served as input example of the GP.
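To make the feature construction concrete, the sketch below estimates the coefficient matrix of a first order model with ordinary least squares and flattens it into a vector. It is a simplification and not the ARfit algorithm used in this study: there is no intercept term (the signals are assumed to be normalized to zero mean) and no order selection, and the function names are hypothetical.

```python
import numpy as np

def fit_mar1(Y):
    """Least-squares estimate of A in Y(t) = A Y(t-1) + E(t).

    Y: array of shape (T, K), one row per time step, one column per variable.
    """
    Y = np.asarray(Y, dtype=float)
    past, present = Y[:-1], Y[1:]
    # Solve past @ A.T ~= present in the least-squares sense.
    A_transposed, *_ = np.linalg.lstsq(past, present, rcond=None)
    return A_transposed.T

def mar_features(Y):
    """Flatten the K x K coefficient matrix into a K*K feature vector."""
    return fit_mar1(Y).ravel()
```

For K = 5 variables, `mar_features` returns the 25 values that make up one example.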

Cepstral coefficients (CEP)

Each example contained the four (CEP_4) or five (CEP_5) first cepstral coefficients of all variables in Table 1, i.e. the four or five first numbers of the sequence y of Eq. 2. This resulted in a 20 or 25 dimensional vector respectively.

Gaussian processes

A binary probabilistic classifier was learned for class 1, such that for each patient a probability p of belonging to the class was obtained; the probability of belonging to class 2 could then readily be determined as 1−p. Training examples for the classifier were labeled positive (t=+1) if the moment when the patient became stable fell within the corresponding time interval and were labeled negative (t=−1) otherwise.

All examples generated for all patients from one type of time series model and their corresponding class labels were collected in one dataset. The dataset was randomly split into 10 folds; one fold was removed and used as test set, while the data from the remaining folds were used to train the classifier. Once the classifier had been trained, the predicted probability of belonging to class 1 was determined for each example in the test set. The described process was repeated for each of the 10 folds so that a probability of belonging to each class was assigned to each of the N patients. In other words, a 10-fold cross-validation was performed. The obtained probabilities allowed for the computation of an aROC (area under the receiver operating characteristic curve) for each classifier. If a hard classification is required, each patient can be assigned to the class for which it has the highest probability. To evaluate the calibration of the predicted probabilities the Brier score [52] was also computed.
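The evaluation quantities named above can be computed from the cross-validated probabilities in a few lines. The sketch below is illustrative: the function names are hypothetical, labels are assumed to be coded as +1 and −1 as in the text, and the aROC is computed through its equivalence with the probability that a positive example receives a higher score than a negative one (ties counted as one half).

```python
import numpy as np

def kfold_indices(n, k=10, seed=0):
    """Randomly split the indices 0..n-1 into k folds of near-equal size."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n), k)

def brier_score(probs, labels):
    """Mean squared difference between predicted probability and outcome."""
    probs = np.asarray(probs, dtype=float)
    targets = (np.asarray(labels) == 1).astype(float)
    return float(np.mean((probs - targets) ** 2))

def aroc(probs, labels):
    """Area under the ROC curve via pairwise comparison of scores."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    pos, neg = probs[labels == 1], probs[labels != 1]
    diff = pos[:, None] - neg[None, :]
    return float((np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size)
```

Each fold returned by `kfold_indices` would serve once as the test set, with the classifier trained on the remaining nine folds.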

To evaluate whether the differences in performance between the classifiers were statistically significant, two approaches were followed. Regarding the Brier scores, a non-parametric bootstrap method [53] was used to generate a bootstrap distribution of 1,000 samples of mean differences, from which a 95% confidence interval could be determined based on the 2.5% and 97.5% percentiles. If the confidence interval did not include 0, then a statistically significant difference at the 0.05 level was declared. Regarding the aROC scores, the non-parametric method described in DeLong et al. [54] was implemented to determine significance at the 0.05 level.
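The bootstrap comparison of Brier scores can be sketched as below. The text does not specify the resampling unit, so this example assumes paired resampling over patients of the per-patient squared errors of two classifiers; the function name and the fixed seed are likewise assumptions.

```python
import numpy as np

def bootstrap_ci_mean_diff(sq_err_a, sq_err_b, n_boot=1000, seed=0):
    """95% percentile interval for the mean difference in squared error.

    sq_err_a, sq_err_b: per-patient squared errors (p - y)^2 of two
    classifiers evaluated on the same patients.
    """
    diffs = np.asarray(sq_err_a, dtype=float) - np.asarray(sq_err_b, dtype=float)
    n = len(diffs)
    rng = np.random.default_rng(seed)
    # Each row holds the patient indices of one bootstrap sample.
    idx = rng.integers(0, n, size=(n_boot, n))
    boot_means = diffs[idx].mean(axis=1)
    lower, upper = np.percentile(boot_means, [2.5, 97.5])
    return float(lower), float(upper)
```

A difference would be declared significant at the 0.05 level when the returned interval does not contain 0.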

Results and discussion

Table 3 gives the obtained aROCs as well as the Brier scores for each experiment. The left column of Table 3 contains the results of the corresponding GP probabilistic binary classifier with the covariance function of Eq. 3. The right column contains the results obtained when using a logistic regression (LOGREG) model [55], included here as a baseline for performance.

Table 3 The aROC’s and Brier scores for all experiments

The main goal of this research was to investigate the prognostic value of dynamic information abstracted in various ways when predicting how much time a critically ill patient needs to reach a stable state after coronary bypass surgery. Five physiological variables were considered, not including demographic or historical patient information. A separate model on the basis of admission data was developed for comparison purposes.

Table 4 shows that the increase in performance for all GP models versus the LOGREG models was found to be significant, except for the model based on admission data, for which the difference in performance was not statistically significant. So, although logistic regression techniques are commonly used in medical applications, other classifiers might lead to better results. This was also concluded by, among others, Sakai et al., who found that artificial neural networks have a higher level of accuracy than logistic regression models for the diagnosis of acute appendicitis [56]. In another study, on assessing posttraumatic cerebral hemodynamics in patients with minor head injuries, Erol et al. obtained better classification results with multi-layer perceptron neural networks than with logistic regression [57].

Table 4 The statistical significance of the differences in performance between the GP classifier and the logistic regression shown for the Brier scores as well as the aROC’s

The statistical significances of the differences between the GP classifiers are shown in Table 5. From this it is clear that all dynamic models perform better than the model purely based on admission information, with respect to both the Brier score and aROC. In Table 3, the GP with 5 cepstral coefficients (CEP_5) had the best performance (lowest Brier score and highest aROC). From Table 5 it can be seen that this difference in performance is significant. This agrees with our assumption that cepstral analysis is a promising approach towards feature extraction for time-series prediction tasks. Only the first five cepstral coefficients seem to contain enough information to result in a good classification, which is consistent with the findings of Kalpakis et al. [43]. The poor performance of the models based on static information alone can be attributed to the similarity of these parameters for the two classes in our particular population (see Table 2).

Table 5 The statistical significance of the differences between the different GP classifiers

There is no statistically significant difference in performance between CEP_4 and MAR or between CEP_4 and Avgstd. With respect to the Brier score there is no statistically significant difference between MAR and Avgstd, but there is one regarding aROC values (in favor of Avgstd).

Table 6 gives the statistical significances for the logistic regression models. From this table it can be concluded that there is no statistically significant difference between any two models with respect to the Brier score. When considering the aROCs, Avgstd has the best performance with statistical significance.

Table 6 The statistical significance of the differences between the different logistic regressions

A possibility to improve the results is to combine the parameters of several dynamic analysis techniques in one input example for the GP classifier, or to combine static admission data and dynamic information of the first hours in the ICU.

To improve the generalization capabilities of the classifiers, it would also be useful to increase the number of patients used during training. When more patients are included, the models can be trained on more features, which possibly results in better performance while over-fitting is still avoided. This increase in both the number of physiological variables and the number of patients will, however, require more complex implementations of the algorithms presented, such that they are able to cope with the data increase while still remaining computationally tractable. Possible variants of the GP classifier include the use of sparse methods, aggregation, dimensionality reduction techniques and the inclusion of more specialized kernels that better incorporate the available prior knowledge.

To our knowledge, the work of Verduijn et al. [35] is most closely related to our study. They compared two temporal abstraction procedures, one that resulted in symbolic descriptions of the data and one that resulted in numerical meta features. These procedures were applied to monitoring data from the ICU for the estimation of the risk of prolonged mechanical ventilation after cardiac surgery. They defined the outcome as “mechanical ventilation longer than 24 h” and used physiological data measured at high frequency as well as laboratory values from the first 12 h in the ICU. The main conclusion of their work was that induction of numerical meta features is preferable to extraction of symbolic meta features using existing clinical concepts. These results complement our own findings, in which, for a particular population, extracted dynamic features can be used as predictors that outperform more typically used clinical concepts such as static admission data.

Conclusion

In this study, the use of dynamic information, obtained from physiological signals in various ways, was investigated for predicting the future stability of ICU patients, which leads to weaning from mechanical ventilation. For every patient a probability of belonging to each of two classes was assigned. Each class was defined according to the time needed to reach a stable state after coronary bypass surgery: less or more than 9 h. For this prediction, dynamic data from the first 4 h of the patient’s ICU stay were included and results were compared to a model built upon admission data only. The main conclusion of this work is that, for the considered prediction task, it is preferable to use dynamic information from the first few hours after admission to the ICU rather than static admission data only. All models based on dynamic information performed better with respect to aROCs and Brier scores, and the differences were found to be significant. When compared to logistic regression, the Gaussian process classifier results in better performance in all cases.