Adaptive threshold-based alarm strategies for continuous vital signs monitoring

Continuous vital signs monitoring in post-surgical ward patients may support early detection of clinical deterioration, but novel alarm approaches are required to ensure timely notification of abnormalities and prevent alarm fatigue. The current study explored the performance of classical and various adaptive threshold-based alarm strategies to warn of vital sign abnormalities observed during development of an adverse event. A classical threshold-based alarm strategy used for continuous vital signs monitoring in surgical ward patients was evaluated retrospectively. Next, six methods to adapt alarm thresholds to personal or situational factors, and combinations of these methods, were simulated in the same dataset. Alarm performance was assessed using the overall alarm rate and sensitivity to detect adverse events. Using a wireless patch-based monitoring system, 3999 h of vital signs data were obtained in 39 patients. The clinically used classical alarm system produced 0.49 alarms/patient/day, and alarms were generated for 11 out of 18 observed adverse events. Each of the tested adaptive strategies either increased sensitivity to detect adverse events or reduced overall alarm rate. Combining specific strategies improved overall performance most and resulted in earlier presentation of alarms in case of adverse events. Strategies that adapt vital sign alarm thresholds to personal or situational factors may improve early detection of adverse events or reduce alarm rates as compared to classical alarm strategies. Accordingly, further investigation of the potential of adaptive alarms for continuous vital signs monitoring in ward patients is warranted. Supplementary Information The online version contains supplementary material available at 10.1007/s10877-021-00666-4.


Introduction
Patients admitted to the hospital for postoperative care are at risk of developing adverse events (AEs), which may lead to serious harm and life-threatening situations [1][2][3]. Early identification and timely treatment of AEs is important to reduce secondary injury and improve patient outcomes [4]. As serious AEs are often preceded by changes in vital signs, routine vital sign measurements are an essential part of early warning systems in the hospital. However, various studies have reported that detection of clinical deterioration in ward patients may be delayed, or that deterioration may remain unnoticed, due to infrequent or incomplete manual measurements of vital signs [5,6]. As a result, there is increasing interest in implementing unobtrusive wearable wireless sensors that enable continuous monitoring of vital signs in postoperative patients on the ward and may support early identification of clinical deterioration and AEs [6][7][8][9][10].
The first two authors (van Rossum, Vlaskamp) contributed equally to this work.

Although continuous monitoring has been applied in high care units for many years, its application in the ward is challenged by the lower nurse-to-patient ratio, limited critical care training of nurses, and increased mobilization of patients [11]. In this setting, active alarm systems that support interpretation of signs are crucial for adequate and timely response to potential deterioration [12]. Currently, most mobile monitoring systems adopt traditional and widely used alarm strategies where an alert is sent automatically as soon as measurements of one of the vital signs exceed a pre-set upper or lower threshold. Although this threshold-based alarm system can be life-saving in critical situations [4,13], it does not consider factors that affect vital signs levels such as physical activity, the circadian rhythm [14] or age [15]. As a result, many false positive alarms are generated in settings where there is continuous (wired) vital signs monitoring, such as intensive care units. Such high false alarm rates are unacceptable for continuous monitoring in a ward setting [9,16], as alarm overload is a high burden for caregivers and may even cause life-threatening situations from delayed response or even ignored alarms [17,18]. On the other hand, the use of standard thresholds may result in delayed notification of subtle but relevant vital sign abnormalities. As such, there is a clear clinical need for improved alarm strategies.
Various alternative methods that may improve alarm precision or reduce the number of false alarms in continuous vital signs monitoring have been described, with a clear trend towards intelligent techniques [18][19][20]. However, the integration of these advanced methods in patient care brings various concerns regarding the accuracy, reliability, efficiency, and interoperability [21][22][23]. Furthermore, alarms generated by complex or black-box models can be harder to interpret, which may hamper adoption by caregivers. We hypothesize that relatively simple alterations to classical threshold-based strategies may improve identification of AEs in post-surgical ward patients and reduce false alarm rates, which is investigated in the current study.

Data collection
The current observational retrospective study collected data from adult patients who were admitted to the surgical ward for postoperative care after elective major or intermediate surgery in the Amsterdam University Medical Center (Amsterdam, the Netherlands) between December 2018 and March 2019. All patients received standard postoperative care including intermittent vital signs measurements according to local Early Warning Score protocols. In addition, patients were monitored using the wireless Sensium Vitals® system (Sensium Healthcare, Oxford, UK). For this purpose, a chest-worn patch sensor with axillary temperature probe was applied to measure the patient's heart rate (HR), respiratory rate (RR) and axillary temperature (T) every 2 min.
To support continuous monitoring, the original Sensium Vitals® algorithm was used as the active alarm system. An alarm was generated when the measurements of one of the vital signs fell outside the predefined normal range (HR: 40-120 beats/min, RR: 8-24 breaths/min, T: ≤ 38 °C) for at least 7 successive measurements. As such, this alarm strategy includes an annunciation delay with an interval length of 14 min in case of no missing or invalid measurements. For recurrent abnormalities, a new alarm of the same type was only generated if at least 5 successive measurements (minimum 10 min) had been in the normal range since the preceding alarm. In case of alarms, nurses were asked per protocol to assess the patient. When the nurse judged that an alarm was not caused by technical disturbances or movement, vital signs were measured manually and the Modified Early Warning Score (MEWS) [24] was calculated; further actions were taken according to established local protocols.
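The alarm rule described above can be illustrated with a minimal Python sketch. This is not the manufacturer's implementation: the function name `classical_alarms`, its parameters, and the choice to simply skip missing samples are assumptions made for illustration only.

```python
from typing import List, Optional, Sequence


def classical_alarms(values: Sequence[Optional[float]],
                     lower: Optional[float],
                     upper: Optional[float],
                     persist: int = 7,
                     rearm: int = 5) -> List[int]:
    """Return the sample indices (one sample per 2 min) at which an alarm fires.

    An alarm fires once `persist` successive valid samples lie outside
    [lower, upper]. A new alarm is only possible after `rearm` successive
    in-range samples have been seen since the preceding alarm.
    """
    alarms: List[int] = []
    abnormal_run = 0  # successive out-of-range samples
    normal_run = 0    # successive in-range samples
    armed = True      # False between an alarm and the next re-arm
    for i, v in enumerate(values):
        if v is None:
            # Assumption: missing or invalid samples are skipped, which
            # lengthens the effective annunciation delay.
            continue
        too_low = lower is not None and v < lower
        too_high = upper is not None and v > upper
        if too_low or too_high:
            abnormal_run += 1
            normal_run = 0
            if armed and abnormal_run >= persist:
                alarms.append(i)
                armed = False
        else:
            normal_run += 1
            abnormal_run = 0
            if normal_run >= rearm:
                armed = True
    return alarms


# Hypothetical HR trace: two normal samples between the first and second
# abnormal episode are too few to re-arm, so the second episode stays silent.
hr = [80] * 3 + [130] * 7 + [80] * 2 + [130] * 7 + [80] * 5 + [130] * 7
print(classical_alarms(hr, lower=40, upper=120))
```

In this sketch the annunciation delay is the `persist` parameter, so a longer delay can be simulated by raising it without changing the thresholds.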
Patients were only included for analysis when the total vital signs recording time was at least 24 h and each of the vital signs measurements was available for at least a third of the total recording time. In addition to the collection of vital sign measurements and alarms, the presence of observed AEs was assessed retrospectively using the patient's clinical record. Adverse events were defined as any postoperative complication, new illness, or deterioration of existing disease described in the patient record. The onset of the AEs was defined as the timing of diagnostic confirmation reported in nursing files, laboratory or radiology results, following the Institute for Healthcare Improvement's Global Trigger Tool [25]. The end of the AE was defined by the moment that AE treatment was no longer reported in the patient record. Only AEs that presented or were treated during the period of continuous monitoring were included in the analysis.

Simulation of alarm strategies
The collected wireless vital sign measurements of patients and clinically observed alarms were used to retrospectively evaluate the performance of the currently used Sensium Vitals® alarm algorithm for detection of AEs. Next, simulation was used to investigate the performance of alternative alarm strategies in the same dataset. For this purpose, the original Sensium Vitals® algorithm was first reproduced retrospectively in MATLAB (version 2019b, The MathWorks Inc., Natick, MA, US) adopting the alarm principles described by the manufacturers and default settings. Subsequently, six alternative alarm strategies were explored by modifying the original alarm algorithm, as specified in Table 1. Two of these strategies were based on previously described methods for abnormality detection (I) or reduction of false alarm rates (III), as explained below. The other strategies were introduced based on physiological assumptions (II, IV, V, VI). For each alternative alarm strategy, three (sets of) parameter settings were subsequently tested to investigate and select optimal standard parameter settings (Table 1). The tested parameter settings were chosen arbitrarily within a range that was expected to be suitable, given physiology and default settings of the currently used algorithm.
The first alternative alarm strategy (I) implemented individual thresholds to correct for differences in normal vital signs ranges between patients. For this aim, the first available 24 h of the recording was used to create individual distributions of the vital signs for each patient and identify corresponding upper and lower alarm thresholds for the remaining monitoring period, similar to the approach described by Poole et al. [26].
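A minimal sketch of how individual thresholds could be derived from the first 24 h of a recording is given below. The text does not state how thresholds were read from the individual distribution, so the mean ± k·SD rule, the value of `k`, and the function name are assumptions for illustration, not the study's settings.

```python
import statistics
from typing import Optional, Sequence, Tuple


def individual_thresholds(values: Sequence[Optional[float]],
                          sample_min: float = 2.0,
                          baseline_h: float = 24.0,
                          k: float = 3.0) -> Tuple[float, float]:
    """Derive (lower, upper) thresholds from the first `baseline_h` hours.

    Assumption: thresholds are placed k standard deviations around the
    baseline mean. Missing samples (None) in the baseline are ignored.
    """
    n_baseline = int(baseline_h * 60 / sample_min)  # 720 samples for 24 h at 2 min
    baseline = [v for v in values[:n_baseline] if v is not None]
    mean = statistics.mean(baseline)
    sd = statistics.stdev(baseline)
    return mean - k * sd, mean + k * sd


# Hypothetical recording: the baseline alternates between 70 and 90 beats/min;
# later samples do not influence the thresholds.
recording = [70, 90] * 360 + [200] * 100
low, high = individual_thresholds(recording, k=2.0)
print(round(low, 1), round(high, 1))
```

The returned pair would then replace the fixed normal range for the remainder of the monitoring period.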
The second alarm strategy (II) aimed to prevent false alarms by increasing upper threshold levels during the first four postoperative days, when levels of HR, RR and T are typically higher due to the surgical stress response [27,28].
The third strategy (III) focused on optimization of the annunciation delay, supported by the beneficial results reported in other studies [20,29,30]. Accordingly, an increase in the interval length of alarms was simulated, such that vital signs had to exceed a threshold for a longer successive period to cause an alarm. This adaptation aimed to reduce the number of false alarms related to short-lasting abnormalities caused by normal variations or movement artifacts.
The fourth alarm strategy (IV) was designed to compensate for increased physical activity, which leads to higher HR and RR levels as compared to the resting state. As patients are most active during daytime, the upper HR and RR thresholds were increased for daytime (8 a.m. to 10 p.m.) to prevent false alarms.
Likewise, the fifth alarm strategy (V) corrected for low HR and RR levels that are often observed during sleep [31] by decreasing the corresponding lower threshold during nighttime (10 p.m. to 8 a.m.).
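The postoperative, daytime and nighttime adjustments (strategies II, IV and V) all shift a threshold depending on the time of the measurement. The sketch below shows one way to express this for a single vital sign. The offsets are hypothetical placeholders (the tested values are listed in Table 1 and are not reproduced here), and counting postoperative days as elapsed 24-h periods since surgery is an assumption.

```python
from datetime import datetime
from typing import Tuple


def situational_thresholds(base_lower: float,
                           base_upper: float,
                           timestamp: datetime,
                           surgery_time: datetime,
                           postop_offset: float = 0.0,
                           day_offset: float = 0.0,
                           night_offset: float = 0.0,
                           postop_days: int = 4,
                           day_start_h: int = 8,
                           day_end_h: int = 22) -> Tuple[float, float]:
    """Return (lower, upper) thresholds valid at `timestamp`.

    Strategy II: raise the upper threshold in the first `postop_days` days.
    Strategy IV: raise the upper threshold during daytime (8 a.m. to 10 p.m.).
    Strategy V:  lower the lower threshold during nighttime.
    """
    lower, upper = base_lower, base_upper
    days_since_surgery = (timestamp - surgery_time).total_seconds() / 86400.0
    if 0 <= days_since_surgery < postop_days:
        upper += postop_offset
    if day_start_h <= timestamp.hour < day_end_h:
        upper += day_offset
    else:
        lower -= night_offset
    return lower, upper


# Hypothetical HR example: offsets of 10, 5 and 5 beats/min are illustrative only.
surgery = datetime(2019, 1, 1, 9, 0)
print(situational_thresholds(40, 120, datetime(2019, 1, 2, 14, 0), surgery,
                             postop_offset=10, day_offset=5, night_offset=5))
```

Since strategies IV and V concern HR and RR only, temperature would be passed with `day_offset` and `night_offset` left at zero.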
The sixth alarm strategy (VI) assessed vital signs solely based on time trends, as patterns of change are crucial in the detection of clinical deterioration [32]. Accordingly, this alarm strategy generated alarms in case the upward or downward slope calculated over a predefined time window exceeded a certain threshold, without taking the absolute vital sign value into account. Trends were assessed for time windows of multiple hours, as the wireless monitoring system is currently indicated for detection of clinical deterioration and not as surveillance system for acute situations.
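A trend-based rule of this kind can be sketched as a sliding-window slope test. The least-squares slope estimate, the 4-h window and the slope limit used below are assumptions for illustration; the text does not specify how the slope was calculated, and the tested settings are those of Table 1.

```python
from typing import List, Optional, Sequence


def slope_per_hour(values: Sequence[Optional[float]], sample_min: float = 2.0) -> float:
    """Least-squares slope (units per hour) over equally spaced samples."""
    pts = [(i * sample_min / 60.0, v) for i, v in enumerate(values) if v is not None]
    if len(pts) < 2:
        return 0.0
    mean_t = sum(t for t, _ in pts) / len(pts)
    mean_v = sum(v for _, v in pts) / len(pts)
    num = sum((t - mean_t) * (v - mean_v) for t, v in pts)
    den = sum((t - mean_t) ** 2 for t, _ in pts)
    return num / den


def trend_alarms(values: Sequence[Optional[float]],
                 window_h: float = 4.0,
                 max_slope: float = 5.0,
                 sample_min: float = 2.0) -> List[int]:
    """Indices at which the absolute slope over the preceding window exceeds
    `max_slope`, regardless of the absolute vital sign level."""
    w = int(window_h * 60 / sample_min)
    flagged = []
    for end in range(w, len(values) + 1):
        if abs(slope_per_hour(values[end - w:end], sample_min)) > max_slope:
            flagged.append(end - 1)
    return flagged


# Hypothetical HR trace: 4 h flat, then rising by 15 beats/min per hour.
trace = [70.0] * 120 + [70.0 + 0.5 * i for i in range(1, 121)]
flagged = trend_alarms(trace)
print(flagged[0], flagged[-1])
```

This sketch flags every sample that meets the slope criterion; suppression of repeated alarms would have to be added on top.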

Evaluation of alarm strategies
Alarms generated in clinical practice or during simulation were classified as true positives (TP) or false positives (FP) to evaluate the performance in detecting AEs. Alarms that occurred in the 24 h before diagnostic confirmation or during the treatment period of the AE were classified as TP if the vital sign abnormality could be physiologically explained by development or presence of the AE. To enable consistent alarm classification, a list of assumed relations between AEs and vital sign abnormalities was composed using clinical guidelines and literature. In case subsequent AEs with overlapping windows of presentation were observed, alarms that could be related to both events were not double counted but allocated only to the event that developed latest in time. As continuous monitoring is intended to be used as an early warning tool, TP alarms that were generated in the 24 h before diagnostic confirmation of the AE were also investigated as a separate category (TP early).
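The classification rule above can be sketched as follows. The event dictionaries, the alarm type labels and the `related` mapping are hypothetical stand-ins for the study's list of assumed relations between AEs and vital sign abnormalities, which is not reproduced here.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Set, Tuple


def classify_alarm(alarm_time: datetime,
                   alarm_type: str,
                   events: List[dict],
                   related: Dict[str, Set[str]],
                   early_window_h: float = 24.0) -> Tuple[str, Optional[dict]]:
    """Return ("TP_early" | "TP" | "FP", matched event or None).

    An alarm matches an event when it falls between 24 h before onset
    (diagnostic confirmation) and the end of treatment, and its type is
    listed as physiologically related to the event type.
    """
    candidates = []
    for ev in events:
        window_start = ev["onset"] - timedelta(hours=early_window_h)
        in_window = window_start <= alarm_time <= ev["end"]
        if in_window and alarm_type in related.get(ev["type"], set()):
            candidates.append(ev)
    if not candidates:
        return "FP", None
    # Overlapping events: allocate the alarm to the event that developed latest.
    ev = max(candidates, key=lambda e: e["onset"])
    label = "TP_early" if alarm_time < ev["onset"] else "TP"
    return label, ev


# Hypothetical event and relation list.
t0 = datetime(2019, 1, 5, 12, 0)
events = [{"type": "example_event_a", "onset": t0, "end": t0 + timedelta(hours=48)}]
related = {"example_event_a": {"high_HR"}}
print(classify_alarm(t0 - timedelta(hours=2), "high_HR", events, related)[0])
```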
The performance of the original alarm strategy and each of the optimized alternative strategies was evaluated using two sensitivity rates (S total, S early), the total alarm rate, and the false discovery rate. S total and S early were defined as the proportion of AEs for which TP alarms or TP early alarms were observed respectively, and represent the sensitivity for detection or early detection of AEs. The total alarm rate was calculated as the sum of all alarms divided by the total recording time of all patients, resulting in an average number of alarms/patient/day. The false discovery rate was calculated as the percentage of alarms classified as FP. In addition to these four metrics, we introduced a performance score (P-score) to evaluate the relative improvement in overall performance for each of the alternative alarm strategies as compared to the original alarm strategy, based on the trade-off between early AE detection and total alarm rate. For this purpose, sub scores were assigned to the level of increase or decrease in S early and total alarm rate, as specified in Table 2. The P-score was calculated as the sum of the two sub scores assigned to S early and total alarm rate respectively. Accordingly, a positive P-score indicates improvement in overall performance compared to the original alarm strategy, while a negative P-score indicates impairment.
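The first four metrics reduce to simple ratios, sketched below with hypothetical function names. The P-score is left out because its sub scores are defined in Table 2, which is not reproduced in this text. The usage lines apply the formulas to counts reported for the original algorithm in the Results (83 alarms, 3999 h of recording, 18 AEs of which 11 were detected and 7 detected early); the false discovery rate call uses made-up counts.

```python
def sensitivity(n_events_with_tp: int, n_events: int) -> float:
    """Proportion of adverse events for which at least one TP alarm was seen."""
    return n_events_with_tp / n_events


def total_alarm_rate(n_alarms: int, total_recording_h: float) -> float:
    """Alarms per patient per day: all alarms divided by pooled recording days."""
    return n_alarms / (total_recording_h / 24.0)


def false_discovery_rate(n_false_positive: int, n_alarms: int) -> float:
    """Fraction of all alarms classified as false positive."""
    return n_false_positive / n_alarms if n_alarms else 0.0


s_total = sensitivity(11, 18)       # about 0.61
s_early = sensitivity(7, 18)        # about 0.39
rate = total_alarm_rate(83, 3999)   # about 0.5 alarms/patient/day
fdr_demo = false_discovery_rate(6, 10)  # illustrative counts only
print(round(s_total, 2), round(s_early, 2), round(rate, 2), fdr_demo)
```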
For each alternative alarm strategy, the parameter set with the highest P-score was selected as optimal and used as the standard setting applied to each patient record for further analysis and evaluation. In case of an equal P-score, the setting with the lowest false discovery rate was selected, followed by the setting with the smallest modification (lowest correction factor) as compared to the original alarm algorithm. In addition to evaluation of individual alarm strategies, we explored whether combining multiple strategies improved alarm performance. For this purpose, all possible combinations of strategies I to V were implemented cumulatively. The trend-based strategy (VI) was not included in these combinations due to its incompatibility with strategies that adapt thresholds for absolute vital sign values. Last, stepwise backward elimination was performed. Accordingly, the strategies that affected the P-score most were removed step-by-step from the combination, starting from the full combination of strategies (I-V). This process was repeated until all combination sizes were tested.
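The combination search and the backward elimination can be sketched as below. The `score` callable is a hypothetical stand-in for simulating a combination and computing its P-score. The removal criterion used here (drop the strategy whose removal leaves the best-scoring combination) is one possible reading of the procedure and should be treated as an assumption.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple

STRATEGIES = ("I", "II", "III", "IV", "V")


def all_combinations(strategies: Sequence[str] = STRATEGIES,
                     min_size: int = 2) -> List[Tuple[str, ...]]:
    """Every combination of at least `min_size` strategies."""
    return [c for k in range(min_size, len(strategies) + 1)
            for c in combinations(strategies, k)]


def backward_elimination(strategies: Sequence[str],
                         score: Callable[[Tuple[str, ...]], float]
                         ) -> List[Tuple[Tuple[str, ...], float]]:
    """Start from the full set and remove one strategy per step.

    At each step the reduced combination with the highest score is kept
    (ties resolved in favour of the first candidate).
    """
    current = tuple(strategies)
    path = [(current, score(current))]
    while len(current) > 1:
        reduced = [tuple(s for s in current if s != removed) for removed in current]
        current = max(reduced, key=score)
        path.append((current, score(current)))
    return path


def toy_score(combo: Tuple[str, ...]) -> float:
    """Made-up score: rewards II and III, penalizes combination size."""
    return len(set(combo) & {"II", "III"}) - 0.1 * len(combo)


print(len(all_combinations()))
for combo, value in backward_elimination(STRATEGIES, toy_score):
    print(combo, round(value, 1))
```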

Patients
Data was collected for a total of 60 patients, of which 21 patients were excluded due to limited availability of wireless vital signs recordings. Table 3 reports the characteristics of the 39 remaining patients that were included for analysis. A total of 20 included patients (51%) developed one or more AEs during postoperative ward stay. In 14 patients, AEs presented during the continuous monitoring period, resulting in a total inclusion of 18 AEs (Clavien Dindo class I: N = 6, II: N = 8, III: N = 4). The type of included AEs is reported in Table 3.

Current alarm strategy
In total, 3999 h of vital signs data were available for the 39 included patients with a median duration of 94 (range: 28-279) h per patient. The population distribution of the vital signs is shown in Supplementary file 1 (Fig. 3). The original Sensium Vitals® algorithm generated a total of 83 alarms in 20 out of 39 patients, which translates to an average total alarm rate of 0.49 (median: 0.18, IQR: 0.0-0.73) alarms per patient per day. Figure 1 reports the type and classification of original alarms observed, indicating clear differences in the total number of HR, RR, and T alarms and the corresponding ratio of TP and FP alarms. Most alarms (63%) presented during daytime (8 a.m. to 10 p.m.). Furthermore, the false discovery rate during daytime (52%) was lower than during nighttime (68%), which indicates that daytime alarms were more often classified as TP alarms. Often, alarms were not spread throughout the admission period but presented clustered on a specific day. Days with ≥ 3 alarms were found in seven patients. Figure 2 visualizes the presentation of different alarm types for patients with observed AEs. In total, 42% of the alarms generated by the original algorithm were classified as TP, where one or more TP alarms were observed in 11 out of 18 AEs (S total: 61%). In seven of these AEs, TP alarms were caused by one type of vital sign (HR, RR or T) only. However, the type of alarm was not necessarily the same for the few AE types that were observed in multiple patients (see Fig. 2). TP alarms were exclusively triggered by high vital sign levels and never by low levels. Although most TP alarms were generated during the period of AE treatment, alarms were generated before diagnostic confirmation in seven AEs (S early: 39%). TP alarms were observed for all four AEs with a Clavien Dindo score of III, but only in half of the AEs with a Clavien Dindo score of I (3 out of 6) or II (4 out of 8).
In 15 out of 25 (60%) of the patients without events, no alarms were generated at all.

Alternative alarm strategies
The reproduced original algorithm that was used as the starting point to simulate alternative alarm strategies regenerated 96% of the original alarms and created two additional alarms that were not created by the original algorithm. Table 4 summarizes the performance of the six simulated alternative alarm strategies, using the parameter settings that provided optimal results out of three tested options. The performance of all settings can be found in Supplementary file 2 (Table 6). As compared to the original alarm strategy, adapted alarm strategies I and VI, implemented with the optimal settings, improved overall detection and early identification of AEs (S early and S total) but also led to a multifold increase in total alarm rate. In contrast, adapted alarm strategies II, III and IV decreased the total alarm rate at the cost of S early and S total. Strategy V led to a lower daily alarm rate without affecting (early) detection of AEs, constituting the only strategy with a positive P-score. This reduction in total alarm rate was only related to the modification in the lower threshold of RR, as no alarms were generated for low HR. The performance of all tested combinations is found in Supplementary file 2 (Table 7), and Table 5 reports the results of the backward selection process. Various combinations of strategies improved overall performance (P-score ≥ 1), which was always the result of a larger increase in S early relative to the growth in total alarm rate. Although S early increased most in the well-performing alarm strategies, this was often accompanied by higher levels of S total as well. The combination of strategies II, III and IV performed best and increased S early to 61% and S total to 72%, but also caused a small increase in total alarm rate to 0.59 alarms/patient/day. Remarkably, all combinations with a high P-score (P-score = 2) included strategy II, whilst this strategy impaired performance when implemented alone.
Strategy I contributed least to improving alarm performance, as this strategy was included least frequently in combinations with improved performance and dropped out first in the backward elimination process.

Main findings
This study evaluated the performance of classical and adaptive threshold-based alarm strategies for continuous vital signs monitoring in ward patients. We aimed to explore easy-to-implement and transparent methods to support identification of clinical deterioration related to postoperative AEs. Our results show that the currently used classical threshold-based alarm strategy detected abnormalities in vital signs before or after onset of treatment in most of the observed AEs in ward patients. Each of the six adapted threshold-based alarm strategies that we simulated retrospectively in the same population showed potential to either increase the sensitivity for detection of AEs or to reduce the total alarm rate as compared to the currently used alarm strategy. However, the individual alarm strategies caused minimal improvement or even impairment of overall alarm performance. Combining specific alternative alarm strategies improved overall performance most, where sensitivity rates increased while raising only few extra alarms. In particular, the number of AEs where alarms were observed in the 24 h prior to onset of treatment was increased, which suggests that implementation of multiple approaches to adaptive alarm thresholds may improve early detection of clinical deterioration in ward patients.
Table 4 Performance of original and alternative alarm strategies. For definition of alarm strategies (I-VI) and corresponding parameters see Table 1. S total: sensitivity for detection of adverse events, S early: sensitivity for early detection of adverse events, P-score: performance score (for specification see Table 2), AE: adverse event (N = 18), TP: true positive alarm, TP early: true positive alarm presenting before presentation of the adverse event, NA: not applicable
Table 5 … Table 1) implemented using optimal parameter settings (as mentioned in Table 4). The crosses indicate that the alternative alarm strategy under consideration was included in the combination. S total: sensitivity for detection of adverse events, S early: sensitivity for early detection of adverse events, P-score: performance score (for specification see Table 2)

Evaluating alarm strategies
Alarms are seen as an essential element of continuous physiological monitoring, as these support timely identification of abnormalities and create awareness of potentially relevant deterioration. Yet, the alarm burden is also considered one of the major concerns for successful implementation of continuous monitoring in a ward setting [9]. Therefore, critical evaluation of optimal alarm strategies for this setting is desired. Although there is general consensus about the need for adequate alarm systems, no clear definition of acceptable alarm rates and of situations that require alarms exists. By definition, alarm systems are most effective when the alerts promote actions that directly or indirectly contribute to patient outcome. Furthermore, it is known that the response to alarms is best when alarms convey specific events [16,33]. Most studies that investigated detection methods for ward patients used cardiac arrest, ICU transfer, or death as marker for deterioration [19,34], where the need to call for rapid action is obvious. However, we believe that alarm strategies for ward patients should also focus on less severe events that are more common in this setting, and on the early phase of serious AEs where sequelae could still be minimized. Accordingly, the current study evaluated alarm strategies by their ability to detect any type of postoperative adverse event requiring treatment, focusing on actionable situations. Despite the small study cohort, we were able to study a relatively high rate [3] and variety of AEs. By using a retrospective study set-up and investigating alarms that presented in the 24-h window prior to AE treatment, we explored whether alarms could serve as an early warning tool. However, one should be aware that most AEs develop gradually, which hampers sharp delineation of the corresponding onset and duration, challenged even more by variations in clinical response times and delays in reporting.
Furthermore, it should be kept in mind that vital sign measurements do not detect diseases but only signs of deterioration related to (progression of) disease. Therefore, vital sign abnormalities that develop in a later phase of AEs may also be of clinical importance. For this reason, early alarms as well as alarms that presented after onset of AE treatment were included in the evaluation of overall TP rate. Still, as the causality of vital sign abnormalities and true timing of AEs remains uncertain, careful interpretation of sensitivity rates is required.

Relation to previous studies
Even though various methods to improve alarm strategies for continuous vital signs monitoring have been described [20], most monitoring systems still work with classical threshold-based alarms, and high alarm rates remain problematic to date [17,35]. The average alarm rate observed in the current study was approximately 0.5 alarms per patient per day, which is markedly low as compared to previously reported rates of physiological monitoring systems used in the ICU (38-350 alarms/patient/day) and ward (96 alarms/patient/day) [36,37]. Although this lower alarm rate is partially explained by the fact that the currently used monitoring system does not assess oxygen saturation, blood pressure and electrocardiogram, this also indicates that the current alarm strategy has relatively good performance in terms of minimizing alarm burden. Still, more than half of the observed alarms were classified as false positive and no abnormalities were detected for a part of the AEs, supporting the search for improvement of alarm strategies.
Various studies described that manual or automated personalization of alarm thresholds improves alarm strategies for vital signs monitoring [26,38]. In addition, it has been suggested that use of trend information contributes to outcome prediction [32,39]. In the current study, the personalized and trend-based strategies (I and VI) were indeed able to improve sensitivity rates but also resulted in relatively high alarm rates. These findings indicate that the isolated assessment of relative or absolute changes in vital sign levels has limited specificity for AE detection, and question whether normal ranges should solely be based on previous postoperative measurements of the individual patient without considering current vital sign levels. As such, further investigation of alternative methods that define and adapt to normal patterns representative of an unaffected physiological state is warranted.
As expected, adapting the annunciation delay interval (strategy III) reduced alarm rates, which was in line with previous studies [20,29,30]. Likewise, the methods correcting for the postoperative phase (strategy II) or day/night differences (strategies IV and V) lowered the number of alarms. Still, most of these strategies also reduced sensitivity rates when applied individually, and resulted in minimal improvement or even impairment of overall alarm performance. Combining strategies II-V was more effective and led to the highest performance scores observed in the current study, supporting the expectation that integration of different methods is beneficial [20].
Remarkably, however, combining strategies II-V improved sensitivity rather than alarm rates, which is the opposite of the effect observed for individual implementation of strategy II, III, IV or V. These reversed results indicate that the overall benefits of the modifications strongly depend on the overall algorithm design and settings applied, which is possibly related to the general limitations of static single-parameter alarms. This is underlined by studies reporting that classical methods for detection of deterioration are outperformed by more advanced methods for personalization of alarm thresholds [38] or identification of abnormal trends or patterns in vital signs [19,32]. Furthermore, the integration of vital signs and context data can improve prediction of severe outcome events, which has led to the development of various patient assessment tools such as the MEWS [24], electronic Cardiac Arrest Risk Triage (eCART) score [40], Rothman score [41], and prediction methods based on machine learning [42][43][44]. However, most of these methods use more complex models or require additional data sources, and their clinical benefits still have to be demonstrated for applications of continuous wireless patient monitoring. Nevertheless, their underlying principles may guide further improvement of adaptive systems that trigger clinical response.

Limitations
To simulate modification to the original algorithm, the current clinically used alarm strategy was reconstructed based on descriptions of the original source code. Although the alarms of the reproduced original algorithm were almost identical to those observed in clinical practice, some inaccuracy may have been induced. Furthermore, even though accuracy of the currently used wireless monitoring system has been described as acceptable and reliable for HR and RR monitoring in ward patients [45], the continuous measurements could have been affected by missing data or inaccuracies. As such, the performance of the clinically used alarm strategy and adapted alarm methods may not translate to other systems and requires external verification.
To evaluate modified alarm strategies and optimize parameter settings, we introduced a performance score assessing the degree of improvement in sensitivity rate and total alarm rate as compared to the original algorithm. However, the optimal trade-off between sensitivity and alarm load in a ward setting is not yet established [20]. Besides, the specificity of alarm strategies is also relevant but could not be judged, as the number of true and false negative cases could not be verified retrospectively. Moreover, the opportunities to improve alarm performance were limited by the relatively small population size, the low clinical alarm rate, and the restricted number of alterations and range of parameter settings that were tested. In addition, the methods used in the adaptive strategies were based on previously described principles or physiological assumptions, and the computational design or settings were not further trained or adapted for individual patients. Furthermore, the trend-based strategy was only tested in isolation, while combined or stepped assessment of trends and absolute vital sign values may be of interest as well. Therefore, larger prospective studies are desired to further optimize and integrate the alternative threshold-based alarm strategies and validate current results. Moreover, it is recommended to evaluate the effects of the adaptive alarm strategies also in other settings where higher false alarm rates are currently observed. Last, it is desired to verify the performance of the adaptive alarm strategies in relation to alternative methods for prediction of patient deterioration such as the MEWS, and to assess their overall potential clinical benefits.

Conclusions
In conclusion, a classical threshold-based alarm strategy is able to identify abnormalities in continuously measured vital signs for the majority of AEs observed in surgical ward patients without causing excessive alarm rates. Implementation of transparent methods that adapt thresholds to personal or situational factors may increase event detection rates or lower alarm rates as compared to the classical strategy, yet with no or minimal overall improvement of alarm performance. Combining multiple adaptive threshold-based strategies seems more successful in improving alarm performance and may contribute to increased or earlier identification of clinical deterioration.
Luscii Healthtech BV (Health ICT company, Amsterdam, The Netherlands). B. Preckel takes part in an advisory board for Sensium Healthcare, United Kingdom. No competing financial interests exist.

Ethical approval The Medical Research Ethics Committee Academic Medical Center Amsterdam (MREC AMC) waived ethical approval for this study.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.