Modelling physiological deterioration in post-operative patient vital-sign data

Patients who undergo upper-gastrointestinal surgery have a high incidence of post-operative complications, often requiring admission to the intensive care unit several days after surgery. A dataset comprising observational vital-sign data from 171 post-operative patients taking part in a two-phase clinical trial at the Oxford Cancer Centre was used to explore the trajectory of patients’ vital-sign changes during their stay in the post-operative ward using both univariate and multivariate analyses. A model of normality based on vital-sign data from patients who had a “normal” recovery was constructed using a kernel density estimate, and tested with “abnormal” data from patients who deteriorated sufficiently to be re-admitted to the intensive care unit. The vital-sign distributions from “normal” patients were found to vary over time from admission to the post-operative ward to their discharge home, but no significant changes in their distributions were observed from halfway through their stay on the ward to the time of discharge. The model of normality identified patient deterioration when tested with unseen “abnormal” data, suggesting that such techniques may be used to provide early warning of adverse physiological events.


Introduction
Delayed detection of clinical deterioration has been repeatedly associated with high rates of avoidable in-hospital death and intensive care unit (ICU) readmissions (which are associated with a substantially increased mortality rate) [8,11,15]. According to large national surgical audits such as the UK National Confidential Enquiry into post-operative deaths, current systems of post-operative care fail to detect or respond appropriately to early signs of critical illness [17]. Such failures have been explained by lack of experienced senior nursing staff, inexperienced trainee medical staff [17], poor quality of care offered to critically ill patients [6,15], and, more importantly, the inability of current systems to recognise clinical deterioration early. All of these factors can lead to deterioration in a patient's condition and admission to the ICU, or death.
The UK National Institute for Health and Clinical Excellence (NICE) [16] has recommended that physiological track and trigger (T&T) systems should be used to monitor all adult patients in acute hospital units, to promote the recognition of patient deterioration early enough to allow proper intervention by medical staff. These systems are based on early warning scores (EWS) calculated from the values of physiological variables observed periodically. Univariate scoring criteria are applied to each physiological variable (vital sign) in turn, and then care is escalated to a higher level if any of the scores assigned to individual vital signs, or the sum of all such scores, exceed some threshold. There is widespread interest and clinical utilisation of these scores in countries across Europe and Australasia, and increasingly in North America [5]. However, the quality of evidence supporting the use of T&T systems is poor [5], and they have a number of disadvantages. The thresholds and ranges of these EWS systems are mostly determined heuristically (although evidence-based methods have recently been proposed [18,23]). Furthermore, each vital sign is treated independently and correlations between them are not taken into account. Also, the clinical setting from which data are acquired for either validating or designing the EWS system is an important consideration. Many studies have been conducted in medical assessment units [5,18], and it is questionable whether the scores can be extrapolated to other medical units; for example, post-operative wards, general wards, or other settings.
An alternative approach to detecting patient deterioration from changes in vital signs is that of novelty detection [2,20], or one-class classification, which involves the construction of a multivariate, multimodal model of normality using examples of “normal” vital signs. This then allows the classification of test data as either “normal” or “abnormal” with respect to that model. Several approaches to novelty detection have been proposed, and an extensive review of these techniques is presented in [13,14]. We have shown how novelty detection can be combined with continuous vital-sign monitoring of acutely ill in-hospital patients [7,9,10,21].
In this paper, we investigate models of normality tuned to a specific post-operative patient population recovering from gastro-intestinal surgery. Following surgery, patients start in their most acute state and gradually stabilise. We hypothesise that models of the distribution of vital-sign data from “normal” patients may be used to describe the physiological trajectory associated with “normal” recovery of these patients. These models may then be used to identify “abnormal” trajectories in patients who experience major deterioration and have to be re-admitted to the ICU.

Dataset
Vital-sign data (heart rate, HR, measured in beats per minute; respiratory rate, RR, measured in breaths per minute; arterial blood oxygen saturation, SpO2, measured as a percentage; systolic blood pressure, SysBP, measured in mmHg; core temperature, measured with a tympanic thermometer in °C; and level of consciousness, typically assessed with the Glasgow Coma Scale, GCS) were recorded by nursing staff during their regular observations of post-operative patients in the upper gastrointestinal (GI) ward at the Oxford Cancer Centre, Oxford University Hospitals NHS Trust, Oxford, UK. The dataset used for the work described in this paper comprises measurements of HR, RR, SpO2, SysBP and temperature (dimensionality of the input space, D = 5) acquired by ward staff every hour or every 2 h in the days immediately following the patient's admission to the ward (depending on the patient's condition), and every 4 h in the last few days of the patient's stay on the ward. These measurements were then transcribed by two independent research nurses into an electronic database.
A total of 200 patients were recruited during Phase I of the CALMS2 clinical trial in the upper GI ward (approved by the local research ethics committee, REC reference: 08/H0607/79). This dataset was first refined to include only observations with no missing physiological variables (for example, an observation that did not include HR was removed from the dataset). The median length of stay on the ward was 9 days, and we selected those patients who stayed on the ward for a minimum of 4 days (which corresponds to the 10th percentile) and a maximum of 29 days (90th percentile) to construct our model of normality for the purposes of novelty detection. This reduced the number of patients from 200 to 171 (see Fig. 1).
From the original dataset, a set of 12,797 observations $X \in \mathbb{R}^5$ obtained from the 171 patients was then analysed. From the patients analysed, those who were either admitted to the ICU or died on the ward were labelled as belonging to the “abnormal group” of patients (17 patients), while the remainder were labelled as being part of the “normal group” (154 patients). The patient population characteristics in each group are shown in Table 1. We note that the mortality rate in the “abnormal” set of patients was

Vital-sign distributions
The changes in vital-sign distributions between admission to the upper GI ward and subsequent discharge, when the patient was deemed sufficiently stable to go home, were evaluated for the patients in the “normal group”. Normalised histograms (unit area under the curve) and cumulative distribution functions (cdfs) were plotted for each physiological variable (HR, RR, SpO2, SysBP and temperature), using the average value for each variable on the admission and discharge days.
The trajectory of each vital sign throughout the patient's stay on the ward was evaluated by examining the following subgroups of observations:
• G1: the set of averages of all observations performed on the first day of the patient's stay on the ward (admission day);
• G2: the set of averages of all observations performed on the day that corresponds to a quarter (25 %) of the length of the patient's stay on the ward;
• G3: the set of averages of all observations performed on the day that corresponds to half (50 %) of the length of the patient's stay on the ward;
• G4: the set of averages of all observations performed on the day that corresponds to 75 % of the length of the patient's stay on the ward;
• G5: the set of averages of all observations performed on the last day of the patient's stay on the ward (discharge day).
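The subgroup definition can be illustrated with a minimal Python sketch. The function names, the 0-based day indexing and the rounding of fractional positions to the nearest whole day are assumptions made for illustration; the text does not state how fractional days were resolved.

```python
import math

# Fractions of the stay that define G1..G5 (admission, 25 %, 50 %, 75 %, discharge).
FRACTIONS = (0.0, 0.25, 0.5, 0.75, 1.0)


def subgroup_days(length_of_stay_days):
    """Return the 0-based day index used for each of G1..G5.

    Assumption: fractional positions are rounded half-up to a whole day.
    """
    last = length_of_stay_days - 1
    return [int(math.floor(f * last + 0.5)) for f in FRACTIONS]


def daily_mean(observations, day):
    """Average each vital sign over all observations taken on `day`.

    `observations` is a list of (day_index, [vital-sign values]) tuples
    (a hypothetical layout). Returns None if nothing was recorded that day.
    """
    rows = [values for d, values in observations if d == day]
    if not rows:
        return None
    return [sum(column) / len(rows) for column in zip(*rows)]
```

For a patient with the median stay of 9 days, `subgroup_days(9)` selects days 0, 2, 4, 6 and 8, and `daily_mean` then yields one averaged vector per subgroup.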
These subgroups were defined in this way because of the different lengths of patient stay on the ward (which varied between 4 and 28 days). Three different metrics were used to compare the resulting vital-sign distributions: the Kolmogorov-Smirnov (KS) metric [4], the symmetrical Kullback-Leibler (KL) distance [12,25] and the Bhattacharyya (Bhat) distance [1].
The KS distance is a non-parametric metric that quantifies the distance between the empirical distribution functions of two sample sets [4]. Considering two probability densities, $p$ and $q$, with respective cdfs $P$ and $Q$, the KS distance ($D_{KS}$) between them is defined by
$$D_{KS} = \sup_x \left| P(x) - Q(x) \right| \tag{1}$$
where $\sup(y)$ is the supremum of the set of distances $y$.
The KL divergence [12] compares the entropy of two distributions over the same random variable. It measures the number of additional bits required when encoding a random variable with a distribution $p(x)$ using the alternative distribution $q(x)$. This measure is asymmetrical, but it can be modified to be the symmetrical KL distance ($D_{KL_S}$) [25], defined as
$$D_{KL_S} = \int p(x) \log_2 \frac{p(x)}{q(x)} \, dx + \int q(x) \log_2 \frac{q(x)}{p(x)} \, dx \tag{2}$$
The Bhat distance ($D_{Bhat}$) [1] measures the amount of overlap between two distributions, and is defined by
$$D_{Bhat} = -\ln \int \sqrt{p(x) \, q(x)} \, dx \tag{3}$$
In order to study the physiological trajectory of the “normal” patients, the distributions of each vital sign for each of the first four subgroups (G1, G2, G3, G4) were compared with G5 (which contains the average of the vital signs from the most physiologically stable period of the patient stay), using the three metrics defined by (1), (2) and (3).
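As an illustration of how the three metrics might be evaluated for one vital sign, the following NumPy sketch compares two samples (for example, the G1 and G5 averages of SysBP). It is a sketch under stated assumptions rather than the procedure used in the study: the KL and Bhat distances are computed on histograms over a shared grid, and the number of bins, the small constant `eps` and all function names are illustrative choices.

```python
import numpy as np


def histogram_pmfs(a, b, bins=20):
    """Bin two samples on a shared grid; return their probability mass functions."""
    lo, hi = min(a.min(), b.min()), max(a.max(), b.max())
    edges = np.linspace(lo, hi, bins + 1)
    p, _ = np.histogram(a, bins=edges)
    q, _ = np.histogram(b, bins=edges)
    return p / p.sum(), q / q.sum()


def ks_distance(a, b):
    """Largest absolute gap between the two empirical cdfs."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))


def symmetric_kl(p, q, eps=1e-12):
    """Sum of the two directed KL divergences, in bits.

    Assumption: empty bins are smoothed with `eps` to keep the logs finite.
    """
    p = (p + eps) / (p + eps).sum()
    q = (q + eps) / (q + eps).sum()
    return float(np.sum((p - q) * np.log2(p / q)))


def bhattacharyya(p, q, eps=1e-12):
    """Negative log of the overlap coefficient sum(sqrt(p * q))."""
    overlap = np.sum(np.sqrt(p * q))
    return float(-np.log(max(overlap, eps)))
```

All three functions return zero for identical inputs and grow as the two distributions separate, which is the property used when comparing G1 to G4 against G5.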

Data visualisation
The first stage in constructing a model of normality for novelty detection usually consists of obtaining more insight into the structure of the data [22]. Procedures for visualisation of the data in their original high-dimensional space are therefore required.
High-dimensional data can be visualised through a non-linear projection from $\mathbb{R}^D$ to $\mathbb{R}^2$. Sammon's method [19] seeks to create a mapping such that the distances between pairs of image points in a projection plane ($y_i$, $y_j$) are as close as possible to the distances between the corresponding pairs of points in data space ($x_i$, $x_j$). The following error function, known as the Sammon stress metric, is defined as
$$E = \frac{1}{\sum_{i<j} d_{ij}} \sum_{i<j} \frac{\left( d_{ij} - d^{*}_{ij} \right)^2}{d_{ij}} \tag{4}$$
with $d_{ij} = \| x_i - x_j \|$ and $d^{*}_{ij} = \| y_i - y_j \|$, where $\| \cdot \|$ is the Euclidean norm. The Sammon mapping aims to minimise the error metric (4), which can be achieved by initialising the image points $y$ to have random locations in a 2-D map and by iteratively adjusting these locations in the direction which gives the maximum change in $E$ using a gradient descent method.
It is assumed a priori that each of the five vital signs has equal importance in the model of normality. Each variable was therefore scaled to have approximately the same dynamic range to ensure that variables with large changes (e.g., blood pressure in mmHg) do not dominate variables with smaller changes (e.g., temperature in °C). Every vital-sign measurement, $x$, was normalised using a zero-mean, unit-variance transformation, $x_n = (x - \mu)/\sigma$, where $x_n$ is the normalised value and $\mu$ and $\sigma$ are the mean and standard deviation of the vital sign, respectively, in the overall dataset (171 patients).
The Sammon mapping algorithm was then applied to the 770 normalised vectors contained in the 5 subgroups (G1, G2, G3, G4 and G5) from the 154 patients included in the “normal” subset of patients.
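The stress metric and its minimisation can be sketched in NumPy as follows. This is a minimal illustration, not the implementation used in the study: it uses plain gradient descent with a step that is halved whenever an update fails to reduce the stress (an assumption; Sammon's original scheme uses a second-order update), it assumes that no two data points coincide, and all names and default parameters are hypothetical.

```python
import numpy as np


def pairwise_distances(z):
    """Matrix of Euclidean distances between all rows of z."""
    diff = z[:, None, :] - z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def sammon_stress(x, y, eps=1e-12):
    """Sammon stress between data points x and their 2-D images y."""
    upper = np.triu_indices(len(x), k=1)          # each pair i < j once
    d = pairwise_distances(x)[upper]              # distances in data space
    d_star = pairwise_distances(y)[upper]         # distances in the map
    return float(((d - d_star) ** 2 / (d + eps)).sum() / d.sum())


def sammon_map(x, n_iter=100, step=1.0, seed=0, eps=1e-12):
    """Project x to 2-D by gradient descent on the Sammon stress."""
    rng = np.random.default_rng(seed)
    y = rng.normal(size=(len(x), 2))              # random initial layout
    d = pairwise_distances(x)
    c = d[np.triu_indices(len(x), k=1)].sum()
    d_safe = np.maximum(d, eps)                   # avoids 0/0 on the diagonal
    stress = sammon_stress(x, y)
    for _ in range(n_iter):
        diff = y[:, None, :] - y[None, :, :]
        d_star = np.maximum(np.sqrt((diff ** 2).sum(axis=-1)), eps)
        weights = (d - d_star) / (d_safe * d_star)
        grad = -(2.0 / c) * (weights[:, :, None] * diff).sum(axis=1)
        candidate = y - step * grad
        new_stress = sammon_stress(x, candidate)
        if new_stress < stress:                   # accept only improving steps
            y, stress = candidate, new_stress
        else:
            step *= 0.5
    return y
```

Because only improving steps are accepted, the returned layout never has a higher stress than the random initial layout generated from the same seed.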
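A short NumPy sketch of the normalisation step is given below. The point it illustrates is that the mean and standard deviation are estimated once from the overall dataset and then reused for any subset. The function names are hypothetical, and the use of the population standard deviation (`ddof=0`) is an assumption, since the text does not specify the estimator.

```python
import numpy as np


def fit_normaliser(all_observations):
    """Per-variable mean and standard deviation from the overall dataset.

    `all_observations` has one row per observation and one column per vital sign.
    Assumption: population standard deviation (ddof=0).
    """
    return all_observations.mean(axis=0), all_observations.std(axis=0)


def normalise(observations, mean, std):
    """Apply the zero-mean, unit-variance transformation column by column."""
    return (observations - mean) / std
```

After fitting on the full dataset, the same `mean` and `std` would be applied to the subgroup vectors, so that all subgroups share one scale.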

Model of normality
We now consider the construction of a model of normality, based on all observations made on the last day on the ward (discharge day) of each patient from the “normal group”. This dataset contains the vital signs from the most physiologically stable period of the patient stay, because these data were acquired immediately prior to discharge from the ward, when the patient is at their most “normal” after recovering from surgery. This set of “normal” pre-discharge data contains 1,100 vital-sign vectors, $X \in \mathbb{R}^5$, which were subsequently used for the construction of our model of normality.
A kernel density estimate [3] is a technique that allows the underlying 5-dimensional vital-sign pdf to be estimated from training data. A kernel density estimate was chosen because it is a non-parametric method, so no a priori assumptions are made about the form of the probability distribution.
The pdf of the set of N = 1,100 “normal” vectors, $x_1, \ldots, x_N$, was estimated using the following equation:
$$p(x) = \frac{1}{N (2\pi)^{D/2} \sigma^{D}} \sum_{i=1}^{N} \exp\left( -\frac{\| x - x_i \|^2}{2\sigma^2} \right) \tag{5}$$
which is a weighted sum of Gaussian kernels centred on the 1,100 vectors, $x_i$, and where each kernel is isotropic with variance $\sigma^2$. The variance was determined using the nearest-neighbour method proposed by Bishop [2], in which the average of the squared Euclidean distance to the set of 10 nearest neighbours $\{NNs\}$ is determined for each point $x_1, \ldots, x_N$ in $X$, and $\sigma^2$ is estimated by calculating the average over all points:
$$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{10} \sum_{x_j \in \{NNs\}_i} \| x_i - x_j \|^2$$
The likelihood for all data from the “normal” group of patients was then calculated using (5). The likelihood of all data from the “abnormal” group of patients, prior to the occurrence of an adverse event (either death or ICU admission), was also evaluated using the same model of normality.
In order to estimate the “abnormality” of a data point $x$, the departure from normality is usually quantified using a novelty score defined as follows:
$$z(x) = -\log p(x \mid \theta)$$
where $z(x)$ is the novelty score and $\theta = \{x_i, \sigma\}$. “Normal” data, which have higher likelihoods $p(x \mid \theta)$, therefore generate low novelty scores $z(x)$; conversely, “abnormal” data, which have lower likelihoods, generate high novelty scores $z(x)$.
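The kernel density estimate, the nearest-neighbour variance estimate and the novelty score described above can be sketched together in NumPy. This is an illustrative sketch, not the code used in the study: the function names are hypothetical, the natural logarithm is assumed for the novelty score, and the log-sum-exp rearrangement is an implementation choice that keeps the result finite for points far from all training vectors.

```python
import numpy as np


def squared_distances(a, b):
    """Squared Euclidean distances between every row of a and every row of b."""
    diff = a[:, None, :] - b[None, :, :]
    return (diff ** 2).sum(axis=-1)


def estimate_variance(train, k=10):
    """Mean squared distance to the k nearest neighbours, averaged over all points."""
    d2 = squared_distances(train, train)
    np.fill_diagonal(d2, np.inf)              # a point is not its own neighbour
    nearest = np.sort(d2, axis=1)[:, :k]      # k smallest squared distances per row
    return float(nearest.mean())


def log_density(x, train, sigma2):
    """Log of the isotropic Gaussian kernel density estimate at each row of x."""
    n, dim = train.shape
    log_kernels = -squared_distances(x, train) / (2.0 * sigma2)
    peak = log_kernels.max(axis=1, keepdims=True)
    log_sum = peak[:, 0] + np.log(np.exp(log_kernels - peak).sum(axis=1))
    log_norm = np.log(n) + 0.5 * dim * np.log(2.0 * np.pi * sigma2)
    return log_sum - log_norm


def novelty_score(x, train, sigma2):
    """Novelty score: negative log-likelihood under the model of normality."""
    return -log_density(x, train, sigma2)
```

With the training vectors normalised beforehand, `estimate_variance(train)` supplies the kernel variance and `novelty_score(test, train, sigma2)` returns one score per test observation; scores rise as an observation moves away from the training data.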

Vital-sign distributions
Empirical pdfs (histograms) and cdfs for each physiological variable for each of the two subgroups G1 (average of observations at admission) and G5 (average of observations at discharge day) are shown in Fig. 2. Table 2 gives the corresponding means and standard deviations. Figure 3 shows the KS, KL and Bhat distance-maps between each of the distributions for the four subgroups (G1, G2, G3 and G4) and the distribution for the G5 subgroup. In each distance-map, the subgroups involved (G5-Gi with i = {1, 2, 3, 4}) are represented on the x-axis, and physiological variables are represented on the y-axis. The colour code is associated with the values of the calculated distances (blue indicates small distances, and red indicates large distances).
It may be seen that, apart from the HR, the distributions represented for each of the other 4 vital signs vary from admission to discharge, as the patient recovers from surgery.
The results obtained by the three different metrics are very similar, in the sense that the patterns in the distances for each physiological variable are identical. The distances between the G1 and G5 distributions are greater than the distances between the G3 and G5 distributions.

Data visualisation
The resulting Sammon maps are shown in Fig. 4. Represented in each map are the projected data points from G1, G2, G3 and G4 (red crosses) superimposed on the projected data points from G5 (blue points).

Model of normality
Fig. 2 Histograms for respiratory rate, heart rate, blood oxygen saturation and temperature, computed from the average of vital-sign data acquired from patients at admission to the post-operative ward (light blue) and time near discharge (dark blue). Cumulative distribution functions P(x) for each vital sign from patients at admission (light red) and time near discharge (dark red) are also represented (refer to the right vertical axis) (colour figure online)
The novelty scores are computed each day for each patient, by averaging the scores for each set of vital-sign observations that day. The group mean novelty scores z(x) for each day are shown in Fig. 5 for “normal” and “abnormal” patients. The median length of stay on the ward for the “normal” group of patients is 9 days (see Table 1). For the “abnormal” group, we considered the length of stay on the ward prior to the event (either admission to the ICU or death). The median time to event for this group is 5 days. The novelty scores are displayed in Fig. 5 for the length of stay (or time to event in the case of the “abnormal” group) up to the 75th percentile (12 and 8 days, respectively) for each group of patients. Figure 6 shows the change in novelty score over time for two example patients from the “abnormal” group who deteriorated sufficiently after surgery to be re-admitted to the ICU. A threshold z(x) = k was determined using k = μ + 3 s.d., where μ is the average of the novelty scores z(x) for the 154 “normal” patients in the model of normality, and where s.d. is one standard deviation of z(x) for these “normal” patients.
The first example (Fig. 6a) shows a patient who deteriorated 2 days after admission to the upper GI ward and was then admitted to the ICU. The patient was sent back to the upper GI ward after 2 days in the ICU, and stayed a further 4 days before being discharged. During the first 2 days after surgery, the patient exhibited physiological instability (which was more significant at the end of the second day), showing indications of tachycardia (HR reaching 150 beats per minute) and tachypnea (RR reaching almost 40 breaths per minute). It can be seen that z(x) increases in value approximately 12 h before ICU admission, indicating physiological deterioration. After the stabilisation of the patient in the ICU, z(x) remains close to the “normal” trajectory.
The second example (Fig. 6b) shows a patient who had some periods of instability after being admitted to the upper GI ward following surgery. After 7 days, the patient was re-admitted to the ICU, and died 1 month later. In this case, large variations are observed in z(x) during the post-surgical period of abnormality, which are caused by episodes of elevated blood pressure (SysBP at around 190 mmHg) and bradycardia (HR decreasing to 45 beats per minute). These exceed the threshold defined by z(x) = k.
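The daily averaging and the threshold rule can be illustrated with a short NumPy sketch. It takes pre-computed novelty scores as input; the function names, the dictionary return type and the use of the population standard deviation are assumptions made for illustration.

```python
from collections import defaultdict

import numpy as np


def novelty_threshold(normal_scores, n_sd=3.0):
    """Threshold k = mean + n_sd * standard deviation of the "normal" scores.

    Assumption: population standard deviation (ddof=0).
    """
    scores = np.asarray(normal_scores, dtype=float)
    return float(scores.mean() + n_sd * scores.std())


def daily_mean_scores(days, scores):
    """Average the novelty scores of all observations made on the same day."""
    grouped = defaultdict(list)
    for day, score in zip(days, scores):
        grouped[day].append(score)
    return {day: float(np.mean(values)) for day, values in sorted(grouped.items())}


def days_above_threshold(daily_scores, threshold):
    """Days on which the mean novelty score exceeds the threshold."""
    return [day for day, score in daily_scores.items() if score > threshold]
```

A patient's observations are first reduced to one score per day with `daily_mean_scores`, and `days_above_threshold` then lists the days on which that score exceeds k.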

Discussion
The histograms and cdfs shown in Fig. 2 indicate that the HR distributions are similar and approximately symmetrical. The distributions for SpO2 are one-sided because they are limited to the maximum value SpO2 = 100 %. For the distribution of SpO2 values at admission, a mode occurs at SpO2 = 97 %. Patients are likely to achieve 100 % oxygen saturation only if they are receiving additional oxygen through an oxygen mask. Therefore, the distributions shown in Fig. 2 exclude values of SpO2 > 99 %. RR distributions are similar between admission and discharge. Tympanic temperature and SysBP distributions show that patients are, in general, mildly pyrexic (high temperature) and hypotensive (low systolic blood pressure) when admitted to the ward following surgery. They subsequently show decreasing temperature and increasing blood pressure, both returning to “normal” values by the last day of their stay on the ward.
From the distances between the distributions calculated with the three different metrics (Fig. 3), the pattern of recovery with time is apparent: the distance between the G1 and G5 distributions is greater than, for example, the distance between the G3 and G5 distributions. If we consider the SysBP, for example, the KS, KL and Bhat distances between the G1 and G5 distributions are 0.29, 0.41 and 0.54, respectively, whereas the distances between the G3 and G5 distributions are 0.21, 0.31 and 0.34. The Sammon maps in Fig. 4 show that the projected data from the five groups form clusters with some overlap between them, but that there are groups with visually separable distributions. The G1 cluster is the most diffuse (shown in red, in the upper-left plot in Fig. 4), while the projected data from G3, G4 and G5 are more concentrated, and similar to each other in their locus in the projection plane. This suggests that there are no large changes in the vital-sign distributions from halfway through a patient's stay to the time of their discharge from the ward. That is, “normal” patients appear to have stabilised at around halfway through their stay on the ward. These results suggest that patients included in the “normal group” could have been considered for earlier discharge, or provided with a lower level of care from halfway through their stay.
From the trajectory of z(x) for the “normal” group of patients (Fig. 5a) we can see a significant decrease in z(x) in the first 4 days, after which z(x) is approximately constant for t ≥ 4 days. The first 4 days correspond to patient recovery immediately following surgery [24]. After day 4, the majority of patients included in the “normal” group appear to have fully physiologically recovered from surgery and are physiologically stable. It could be argued that these patients are sufficiently stable for early discharge to be considered, or for them to be provided with a lower level of care should they need to remain in hospital for reasons not related to physiological instability. Conversely, z(x) for the “abnormal” group of patients suggests that the physiological trajectory for these patients is significantly different from that of “normal” patients, with a sudden increase in novelty in the last 48 h following a gradual decrease prior to this. These results suggest that patients' criticality could be assessed by evaluating the distribution of their vital signs using the novelty score of Eq. (8) after their admission to the post-operative upper GI ward, following major surgery.
In summary, this study indicates that multivariate models of normality may be used to assess post-operative patients' criticality. A multivariate model of the distribution of vital-sign data from “normal” patients was constructed using a kernel density estimate, and tested using “abnormal” data from patients who deteriorated sufficiently after surgery to be re-admitted to the ICU. Significant differences were found between the physiological trajectories for “normal” patients and those for “abnormal” patients.