More than 70% of preventable in-hospital deaths are associated with “failure to rescue” and “death in low-mortality risk groups”.1 Unexpected intensive care unit admissions, cardiopulmonary arrests, and deaths are preceded by abnormal physiologic parameters 6-24 hr prior to these events.2,3,4,5 Altered respiratory rate (RR) is potentially the most important predictor of these serious adverse events.6,7 Although RR (typically indicated as breaths per minute) is a very informative physiologic parameter, it is not routinely and accurately measured in patients on general hospital wards.8,9 Caregivers typically assess RR by counting chest wall movements for at most 15-30 sec, while the other physiologic parameters (e.g., systolic blood pressure and temperature) are only intermittently measured with well-tolerated reliable equipment.

Accurate continuous RR monitors for low care clinical settings can facilitate early recognition of abnormal respiratory patterns and early intervention (e.g., by alerting the rapid response team), which in turn might reduce the incidence of serious adverse events and unfavourable patient outcomes.10 The functionality of the complete monitoring strategy – which includes the sensing principle, signal analysis, caregiver notification, and protocolized therapeutic intervention – is essential to determine whether introduction of a continuous RR monitor improves patient outcomes.11

In this method-comparison study, we focus on the first and second components (the sensing principle and the signal analysis) of the monitoring strategy. Although our eventual intent is to study the ability of different devices to detect RR in low care clinical settings, such as general hospital wards, these technologies first need to be evaluated in settings where accurate reference standards can be assured. The current measurement of RR on general hospital wards based on counting chest wall movement is not accurate enough to serve as a reference standard. Furthermore, the incidence of respiratory abnormalities on the general hospital ward is likely too low for such a method-comparison study. Therefore, we studied the ability of frequency modulated continuous wave (FMCW) radar, thoracic impedance pneumography (IPG), and breath sound acoustic monitoring to detect RR in comparison with the reference standard (capnography) in surgical patients during the early postoperative phase on the postanesthesia care unit (PACU). The study was conducted in PACU patients as they are a closely monitored, diverse group of patients who share important similarities with our target population and have a variety of respiratory alterations. We hypothesized that remote continuous RR monitors (radar, IPG, and acoustic) can measure RR and recognize abnormal breathing to a similar standard as capnography.

Methods

Study population

We studied adult patients who underwent major abdominal surgery and received postoperative care on the PACU at a large tertiary teaching hospital in the Netherlands (University Medical Centre Utrecht). The institutional review board (IRB) reviewed the study protocol and found that it was not subject to the Dutch act on “medical research involving human subjects” (14-278/C, November 2014). Although the IRB waived the need for informed consent, our study group opted to request written informed consent from all participants. The study was conducted in accordance with the moral, ethical, and scientific principles governing clinical research as set out in the Declaration of Helsinki (2013) and good clinical practice.12

Design and conduct of the study

This method-comparison study had an observational cross-sectional diagnostic design. Hence, all RR monitors were fitted to patients in a similar systematic manner at the same time. Three index devices, namely FMCW radar (RespiRadar; ART Innovation Company, Rotterdam, The Netherlands),13 thoracic IPG (from the Datex Ohmeda Compact S3, module e-PRESTN, Datex Ohmeda, Inc., Helsinki, Finland),14 and breath sound acoustic monitor (RRa; RAD-87TM Pulse CO-Oximeter, Masimo Corporation, Irvine, CA, USA)15 were compared with sidestream capnography (the reference standard). Treatment decisions were solely based on vital signs displayed on routinely used bedside monitors (pulse oximetry, electrocardiogram, oscillometric or invasive blood pressure measurement), as per current guidelines for postoperative care.

Reference standard

Capnography was performed using a nasal cannula (Smart Capnoline; Oridion Medical 1987 Ltd., Israel) connected to the capnograph (Compact S3 module E-miniC, Datex Ohmeda, Inc., Helsinki, Finland). The RR derived from the capnograph was calculated as a 1 min average and was updated breath-by-breath. The manufacturer reported an accuracy of ±1 breath·min−1. Supplemental oxygen was delivered over the cannula at a maximum of 4 L·min−1. Patients needing more oxygen received oxygen through a mask placed over the cannula.

Index devices

We studied three different index monitors with different sensing principles. The first, the FMCW radar, was a prototype that determines RR from frequency and amplitude differences between electromagnetic radio waves emitted to and reflected by the chest wall.13 It is a non-invasive, wireless, contactless RR monitor that is mounted to the ceiling above the bed. The radar determines the RR over 20 sec epochs with an update rate of 1.6 Hz. The accuracy of the FMCW radar has not been previously reported. The second index device was thoracic IPG, which measured impedance changes (0.2-7 Ohm) over electrocardiographic electrodes.14 Impedance changes originate from an altered gas to fluid volume ratio together with an altered length of conductance induced by the inhalation and exhalation of air. The IPG determined the RR over 30 sec epochs with an update rate of 0.1 Hz. The accuracy reported in the device manual is ± 5 breaths·min−1. The third index device, the breath sound acoustic monitor, includes an adhesive transducer applied to the patient’s neck that detects the acoustic signals induced by air displacement during breathing.15 The manufacturer reported a bias of 0.8 breaths·min−1 (standard deviation [SD] 3.4) in postsurgical patients compared with sidestream capnography.15 The update rate was 0.5 Hz, and the averaging mode was 20 sec with an alarm delay for apnea of 30 sec. Other alarms were switched off.

Data collection and processing

Signals were synchronously sampled and recorded with corresponding time stamps in the Anesthesia Information Management System (AIMS) (AnStat; Carepoint, Ede, the Netherlands). The reference and index devices used different averaging times for RR calculation. We were not able to adjust these averaging times to allow for comparison of the moving averages. Therefore, we stored the median values of RR once every minute to enable comparison. The raw FMCW radar waveform information (describing the amplitude and phase of the reflected radio waves) was stored separately on secure digital cards. Signal-processing, based on a zero-crossing algorithm, was performed offline and RRs derived from the radar device were subsequently stored in the AIMS. Offline processing of the raw FMCW radar data was performed by an engineer blinded to the RRs derived from the other devices. Two observers (K.L. and E.V.) observed the patients and their caregivers continuously for voluntary movement, talking, and specific activities (without interfering with the regular care process) to identify potential artifacts. Observations were categorized and listed in advance. In addition, free text observations were made. The appendix in the electronic supplement information describes the activities that were observed during the study measurements.
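The two processing steps described above (a zero-crossing estimate of RR from a raw waveform, and the reduction of device readings to one median value per minute) can be illustrated with a minimal sketch. This is not the study's actual signal-processing code: the function names, the 20 Hz sampling rate, and the synthetic sine-wave signal are assumptions chosen for illustration only.

```python
import math
from statistics import median


def rr_from_zero_crossings(samples, fs_hz):
    """Estimate RR (breaths/min) by counting upward zero crossings.

    Hypothetical helper: the waveform is mean-centred, and every crossing
    from negative to non-negative is taken as the start of one breath.
    """
    mean = sum(samples) / len(samples)
    centered = [s - mean for s in samples]
    upward = [i for i in range(1, len(centered))
              if centered[i - 1] < 0 <= centered[i]]
    if len(upward) < 2:
        return None  # fewer than one complete breathing cycle observed
    # Time between the first and last crossing spans len(upward) - 1 cycles.
    span_s = (upward[-1] - upward[0]) / fs_hz
    return 60.0 * (len(upward) - 1) / span_s


def per_minute_median(timestamps_s, rr_values):
    """Collapse RR readings into one median value per minute of recording."""
    buckets = {}
    for t, rr in zip(timestamps_s, rr_values):
        buckets.setdefault(int(t // 60), []).append(rr)
    return {minute: median(vals) for minute, vals in sorted(buckets.items())}


# Synthetic 60 s chest-wall signal: 0.25 Hz (15 breaths/min), sampled at 20 Hz.
fs = 20.0
signal = [math.sin(2 * math.pi * 0.25 * (n / fs) + 1.0)
          for n in range(int(60 * fs))]
rr_estimate = rr_from_zero_crossings(signal, fs)

# Readings at irregular times (seconds) reduced to one value per minute.
minute_medians = per_minute_median([0, 30, 59, 60, 90], [12, 14, 16, 20, 22])
```

A real radar waveform would need filtering and artifact rejection before zero crossings could be counted reliably; the sketch omits both.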

The ability of a monitor to support correct treatment decisions is clinically very important and can be depicted graphically in a Clarke Error Grid (CEG). The reference lines in a CEG for RR, which mark the boundaries between correct and incorrect treatment decisions, were determined by consensus between the authors.

Outcome measures

The primary outcome was RR (breaths·min−1) compared with the reference standard. We considered limits of agreement (LoA) within ± 2 breaths·min−1 as clinically relevant, considering the allowance for false negatives. A secondary outcome was the ability of the monitors to guide treatment decisions using CEG analysis, expressed as the percentage of data pairs representing adequate and inadequate treatment decisions.

Statistical analysis

The published statistical literature does not provide formal rules to calculate statistical power for method-comparison studies when multiple measurements are recorded per patient. Therefore, the sample size by convenience was set at 20 patients, each of whom had at least 60 consecutive data points (median RR every min) per device. The primary outcome (RR) was analyzed using both the Bland and Altman (BA) method for repeated measurements and the mixed effect models as suggested by Myles.16,17 In the BA method, the bias is determined by the mean of the subject means. The LoA are calculated with the SD, which is corrected for the number of observations per subject and multiple testing (LoA ± 2.63·SD). We also report the variance ratio (between-subject variance/total variance) as the repeated measurements correction is only applied to the subject variance. Furthermore, we calculated the LoA that were derived by adding the residual random effects from two mixed effect models for the index and reference device to the total variance. Myles previously suggested using this method as an alternative for handling repeated measurements.17 For mixed effect modelling, we used the “nlme” package version 3.1-131 of R (https://cran.r-project.org/web/packages/nlme/index.html) with the restricted maximum likelihood method.
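As an illustration of the repeated-measurements BA calculation, the sketch below estimates the within-subject and between-subject variance of the differences (index minus reference) with a one-way analysis of variance and combines them into a total SD. It is a minimal sketch under stated assumptions rather than the analysis code used in this study: the function name and the example data are hypothetical, the variance decomposition follows the commonly described approach for unequal numbers of observations per subject, and the multiplier of 2.63 is simply taken from the text above.

```python
from statistics import mean


def repeated_measures_loa(diffs_by_subject, multiplier=2.63):
    """Return (bias, lower LoA, upper LoA) for repeated measurements.

    diffs_by_subject: one list per subject holding index-minus-reference
    RR differences. Hypothetical helper for illustration only.
    """
    n = len(diffs_by_subject)
    sizes = [len(d) for d in diffs_by_subject]
    total = sum(sizes)
    subject_means = [mean(d) for d in diffs_by_subject]
    grand_mean = sum(sum(d) for d in diffs_by_subject) / total

    # Bias taken as the mean of the subject means, as described in the text.
    bias = mean(subject_means)

    # One-way ANOVA mean squares on the differences.
    ss_within = sum((x - m) ** 2
                    for d, m in zip(diffs_by_subject, subject_means)
                    for x in d)
    ms_within = ss_within / (total - n)
    ss_between = sum(k * (m - grand_mean) ** 2
                     for k, m in zip(sizes, subject_means))
    ms_between = ss_between / (n - 1)

    # Effective number of observations per subject for unequal group sizes.
    n0 = (total ** 2 - sum(k ** 2 for k in sizes)) / ((n - 1) * total)
    var_between = max(0.0, (ms_between - ms_within) / n0)

    sd = (var_between + ms_within) ** 0.5
    return bias, bias - multiplier * sd, bias + multiplier * sd


# Made-up differences (breaths/min) for three subjects, four pairs each.
example_diffs = [[1, -1, 1, -1], [2, 0, 2, 0], [0, -2, 0, -2]]
bias, lower, upper = repeated_measures_loa(example_diffs)
```

In this toy data set the within-subject variance is 4/3 and the between-subject variance is 2/3, giving a total SD of √2 and LoA of about ± 3.7 breaths·min−1 around a bias of zero.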

We studied the potential influence on treatment decisions using CEG.18 Adequate (areas a, b), unnecessary (area c), inadequate (area d), and opposite (area e) treatment decisions are reflected by different regions in CEGs with cut-off points for bradypnea (< 9 breaths·min−1) and tachypnea (individually assessed using definitions of > 20 and > 25 breaths·min−1) with 98.9% confidence intervals (CI). Furthermore, CEG percentages for areas a-e were compared between monitors with a Fisher exact test. We also studied the positive predictive value and negative predictive value for abnormal respiration defined as bradypnea (< 9 breaths·min−1) and tachypnea (> 25 breaths·min−1) combined. Analyses were performed with R version 3.1.1 (R Foundation for Statistical Software, Vienna, Austria).
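The predictive values for abnormal respiration can be illustrated with a short sketch that classifies each data pair using the cut-off points stated above (bradypnea < 9 and tachypnea > 25 breaths·min−1) and treats capnography as the reference. The function names and the example RR values are hypothetical and are not study data.

```python
def is_abnormal(rr, brady_below=9, tachy_above=25):
    """Abnormal respiration: bradypnea or tachypnea by the stated cut-offs."""
    return rr < brady_below or rr > tachy_above


def predictive_values(index_rr, reference_rr):
    """Return (PPV, NPV) of the index device against the reference device."""
    tp = fp = tn = fn = 0
    for idx, ref in zip(index_rr, reference_rr):
        flagged, truly = is_abnormal(idx), is_abnormal(ref)
        if flagged and truly:
            tp += 1
        elif flagged and not truly:
            fp += 1
        elif not flagged and truly:
            fn += 1
        else:
            tn += 1
    ppv = tp / (tp + fp) if (tp + fp) else None
    npv = tn / (tn + fn) if (tn + fn) else None
    return ppv, npv


# Made-up per-minute median RR values (breaths/min) for eight data pairs.
reference = [6, 12, 14, 28, 30, 16, 8, 18]
index = [7, 12, 26, 27, 22, 15, 10, 18]
ppv, npv = predictive_values(index, reference)
```

With these invented values the index device flags three minutes as abnormal, of which two are confirmed by the reference (PPV 2/3), and it labels five minutes as normal, of which three are confirmed (NPV 3/5).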

Results

From December 15 2014 until February 20 2015, 20 patients were enrolled. Study measurements were performed for 1,203 min, with a median [interquartile range] of 61 min [60-63] per patient. For the acoustic breath sound monitor, we stored fewer (n = 952 min) complete data pairs because of data acquisition problems in four patients. The RR measurements derived from the capnograph ranged between 4 and 38 breaths·min−1. Table 1 shows other patient, surgical, and respiratory pattern characteristics.

Table 1 Baseline characteristics of 20 participants monitored during their postoperative stay on the PACU

The primary outcomes are presented in Table 2. Compared with the RR measured by reference capnography, the acoustic breath sound monitor showed a bias of 0.1 breaths·min−1 with a 98.9% LoA of −7.9 to 7.9 breaths·min−1. The 98.9% LoA derived from the mixed effect model was −6.2 to 6.2 breaths·min−1. The bias and 98.9% LoA for IPG and FMCW radar showed a larger deviation from the reference. The BA plots for IPG, FMCW radar, and the acoustic monitor are depicted in Fig. 1, with LoAs from the BA method in red, and from the mixed effect models in black. The intensity of the dots displays the number of superimposed measurement pairs. The BA plot for the acoustic monitor shows a horizontal pattern around the bias near zero for the average RR range of 5-28 breaths·min−1. The dispersion of measurement pairs around the bias was larger for the radar and IPG measurements.

Table 2 Primary outcome, respiratory rate as measured by different monitors (IPG, FMCW radar, and acoustic breath sound monitoring) compared with capnography
Fig. 1 Bland and Altman plots for impedance pneumography (IPG), radar, and acoustic monitors. The intensity of the dots indicates the number of superimposed measurement pairs (1-10). The 98.9% limits of agreement from the Bland and Altman method are shown in red, and those from the mixed effect models in black. FMCW = frequency modulated continuous wave

The CEGs in Fig. 2 show the measurement error together with its influence on treatment decisions. The percentages of data pairs in the regions a-e are provided in Table 3. The extent to which the monitors guided adequate or led to inadequate treatment decisions (determined by CEG analysis) differed significantly between the three monitors (P = 0.011). Adequate treatment decisions (areas a and b) with 98.9% CI would have been made 96 (95 to 97)%, 95 (93 to 96)%, and 94 (91 to 95)% of the time for acoustic, radar, and IPG monitoring, respectively. Area e of the CEG contains those points that would treat patients with bradypnea as if they had tachypnea, and vice versa. None of the measurements with the acoustic monitor were in this region, while the radar and IPG had 0.1% and 0.7% of measurement pairs in this region, respectively.

Fig. 2 Clarke Error Grids for respiratory rate measured with impedance pneumography (IPG), radar, and acoustic sensor. Areas labelled a-e provide information on measurement error and its influence on treatment. Areas: dark blue a = points within 20% of the reference measurement; light blue b = points outside 20% of the reference but not leading to unnecessary treatment; light orange c = points leading to unnecessary treatment; light orange d = points indicating a potentially dangerous failure to detect bradypnea or tachypnea; dark orange e = points where tachypnea and bradypnea are confused; bpm = breaths·min−1; FMCW = frequency modulated continuous wave; RR = respiratory rate

Table 3 Diagnostic accuracy for abnormal breathing (defined as RR below 10 or above 25 breaths·min−1)

Discussion

Diagnostic studies evaluating non-invasive RR monitors are scarce and even fewer studies describe wireless RR monitors for use on the general hospital ward.11 In this study, we focused on continuous RR monitors, their accuracy, and the extent to which they cause erroneous treatment decisions in a clinical setting that partly mimics postoperative admission to a surgical hospital ward. For the three monitors that were studied, the LoAs were outside our predefined acceptable LoA of ± 2 breaths·min−1. The acoustic monitor had the highest percentage of RR measurements in CEG area a (i.e., correct treatment decisions and within 20% of the reference measurement) and the lowest percentage of RR measurements in CEG area d (i.e., dangerous failure to detect bradypnea/tachypnea) and area e (i.e., opposite treatment decisions). These percentages differed significantly from the other devices (IPG and FMCW radar).

When interpreting these findings, the presumed superiority of the acoustic monitor may, at least partly, be due to the choice of capnography as a potentially imperfect reference standard. In addition, the acoustic monitor may have an advantage compared with the two other index devices, as the capnograph and the acoustic monitor both measure displacement of air by tracing carbon dioxide concentrations or sound production during the respiratory cycle respectively. Patient activities such as talking and partially obstructive breath (e.g., snoring) can interrupt the displacement of air and may cause measurement artifacts in a similar direction. Unlike capnography and acoustic monitoring, IPG and radar are likely to suffer from movement artifacts as these modalities rely on the measurement of chest wall movements. We advocate selecting and combining different sensor technologies to obtain reliable information on RR in all clinical situations. Future studies should evaluate the signal analytic techniques necessary to select the correct RR measurement for a specific clinical situation.

The finding that the accuracy for all three index monitors was outside the predefined limits needs our consideration. Capnography has proved its additive value in the high care setting during mechanical ventilation,19 but this has not been translated into clinical use in lower or intermediate care settings. The accuracy of capnography may also be reduced during spontaneous breathing in patients who receive supplemental oxygen, which dilutes expired carbon dioxide, resulting in erroneously low end-tidal carbon dioxide.12,20 It is thus important to realize that an undefined part of the measurement error is potentially attributable to the capnograph, rather than to the index devices. For this reason, we also calculated the bias and 98.9% LoA with mixed effect models that add the mean of each measurement pair as an explanatory variable. This corrects, at least partly, the measurement error that is attributable to the capnograph. The SDs obtained from the mixed effect models are reduced for all three methods, suggesting that the RR obtained with capnography did indeed differ from the true RR. Despite this improvement using the Myles method,17 the monitors frequently exceeded our predefined LoA.

Although our investigation focused on the accuracy of the various non-invasive RR monitors in the low care setting, other factors also play a role in the usability of continuous RR monitoring in this setting. First, it is important that the sensing principle detects most respiratory problems. Monitoring displacement of air (as with the capnograph and acoustic monitor) can detect upper airway obstruction, while monitoring chest wall movements (as with IPG and radar) may falsely suggest adequate breathing even in the presence of severe upper airway obstruction. Second, patient comfort is also an important factor in the choice of RR monitors, as discomfort can hinder both accuracy and patient acceptance. Patient contact and wires are needed for the capnograph and acoustic monitors. This would, in many cases, reduce patient comfort, limit patient mobility, and thereby threaten the sustainability of the monitoring strategy. We encountered this issue during a previous observational pilot study, in which we intended to study respiratory patterns with capnography on hospital wards (IRB decision 10-032/C, February 2010). That study involved an additional nasal cannula (Smart Capnoline; Oridion Medical 1987 Ltd., Israel), which annoyed patients, and it was terminated early as all six patients refrained from further participation within a few hours of starting monitoring (unpublished results). Similar issues with patient discomfort could conceivably occur with long-term use of the acoustic and IPG adhesive sensors. The radar was the only wireless and contactless RR monitor evaluated in this study. The radar was a prototype monitor that provided us with detailed information on all patient movements, whereas for the current study, we were only interested in chest wall movements to determine RR. An update of this prototype with smart algorithms that can reliably distinguish between breaths and voluntary movements will likely improve the accuracy and sustainability in low care clinical settings.

There were some limitations to our study. First, we were unable to obtain a complete acoustic RR data set in all the patients because of data storage problems. This, however, is unlikely to have influenced the study conclusions, as the LoA in our analyses only included patients with similar complete data sets. Whether our sample size (of approximately 1,200 data pairs in 20 patients) is sufficient remains unclear as formal rules for sample size calculation in method-comparison studies with repeated measurements have not been established. Another potential limitation is the study setting itself – while our overarching aim is to better understand the characteristics of these monitors in low care settings such as general hospital wards, due to the need to use a continuous reference standard, we studied patients in the PACU during the immediate postoperative period with capnography as our reference. In addition, we measured a wide range of RRs that were also outside normal values. We presume that these results can be generalized to patients admitted to the general hospital ward with other pathologies causing abnormal RRs. Confirming the functionality on the general hospital ward in a method-comparison study is difficult without a suitable reference standard. Therefore, we studied the monitors’ accuracy in a PACU setting with a pragmatic study design. Such a study should ultimately be followed by an intervention study assessing the additive value on general hospital wards.11

Conclusion

This study directly compared acoustic, IPG, and FMCW radar monitoring with capnography monitoring for RR during spontaneous breathing in postsurgical patients in the immediate postoperative period. RR measurements with the acoustic monitor were most accurate but the LoA were wide. Accordingly, none of the studied continuous RR monitors were accurate enough to be introduced into clinical practice.