Clinical study
The data was collected opportunistically during an unrelated hypoxia (‘breathe-down’) study designed to evaluate a pulse-oximeter sensor. The protocol of that parallel study included a desaturation event comprising a series of step changes in oxygen saturation. Approval was given for depth camera data acquisition, and no other alteration was made to the existing protocol.
Fourteen subjects participated in the study. Each subject gave informed consent, approved by an institutional review board (IRB), covering the essential information stated in the protocol, as required according to 21 CFR 812.150 for a non-significant risk medical device investigation. Each subject was fitted with a face mask through which a mixture of nitrogen and oxygen was delivered to adjust the fraction of inspired oxygen (FiO2) and induce desaturation. Each subject underwent a discrete episode of hypoxia. The sequence of targeted oxygen saturation levels is shown schematically in Fig. 1. In addition to the pulse oximetry data, capnography data was recorded during the study using a Datex-Ohmeda S/5 Monitor (GE Healthcare, Chicago, IL, USA). The capnograph is the reference against which we assessed our non-contact respiratory rate algorithm. Each session took approximately 35 min.
All 14 subjects successfully completed the hypoxic challenge. For technical reasons, the primary study captured only 12 of the 14 desaturation profiles. Our secondary study, however, captured depth information for all subjects and, since it did not require the saturation data, we could analyse all 14 subjects. The subjects had a mean age of 31.9 (standard deviation, SD 6.9) years and a mean body mass index (BMI) of 26.3 (SD 4.2) kg/m². Individual demographic information is provided in Table 1. Subjects with known respiratory conditions and/or heart or cardiovascular conditions were excluded.
Table 1 Participant demographic information
Data acquisition and processing
The depth data was captured at a frame rate of 30 fps using a Kinect V2 camera (Microsoft Corporation, Redmond, WA, USA) connected to a laptop. The camera was mounted on a tripod, placed in front of each subject and positioned vertically at approximately chest height. The data was collected over several days, and the distance between the camera and the subject varied between 1.2 and 2.0 m over the collection period. The subjects were seated in a slightly reclined position. The room was illuminated with standard ceiling-mounted fluorescent lights. Other than starting and stopping the recording, no intervention or calibration was required over the study period.
Respiratory rate (RR) is extracted algorithmically from the acquired depth data as illustrated in Fig. 2. A region-of-interest (ROI) is defined on the torso area of the subject (Fig. 3a). An estimate of the volume change across the ROI over time is obtained by calculating the depth changes of each frame and integrating spatially across the ROI. The resulting volume signal gives a clear indication of the breathing pattern, as shown in Fig. 3b, where the peaks and troughs of the individual breaths are marked. Note that Fig. 3b shows a whole trace from one of the subjects in the trial. Three large breaths (marked by the arrow) are evident in the main plot at the end of the hypoxic challenge, where the subject is instructed to take large breaths. The zoomed-in portion of the signal shows the respiratory modulations in more detail.
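The spatial integration step can be illustrated with a minimal sketch. It is not the authors' implementation: the rectangular ROI layout, the use of the first frame as a baseline, and the constant `pixel_area` are all assumptions made for illustration.

```python
def volume_signal(frames, roi, pixel_area=1.0):
    """Return one volume-change estimate per frame.

    frames: list of 2-D depth maps (lists of rows, depths in metres).
    roi: (row_start, row_end, col_start, col_end), end-exclusive
         (a hypothetical rectangular ROI layout).
    pixel_area: assumed constant area represented by one pixel.
    """
    r0, r1, c0, c1 = roi
    reference = frames[0]  # assumption: the first frame is the baseline
    signal = []
    for frame in frames:
        total = 0.0
        for r in range(r0, r1):
            for c in range(c0, c1):
                # A chest wall moving towards the camera reduces the depth,
                # so the displacement is reference minus current depth.
                total += reference[r][c] - frame[r][c]
        signal.append(total * pixel_area)
    return signal


# Two synthetic 3x3 frames: the scene moves 1 cm closer in the second frame.
frames = [[[2.0] * 3 for _ in range(3)], [[1.99] * 3 for _ in range(3)]]
print(volume_signal(frames, roi=(0, 2, 0, 2)))
```

In practice the area covered by a pixel grows with distance from the camera, so a constant `pixel_area` gives only a relative volume signal. That is sufficient for locating breath peaks.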
The next steps in the algorithm (outlined in Fig. 2) extract a robust value of RR from the volume signal. The respiratory volume signal is first filtered by a low-pass filter (Butterworth, 5th order, cut-off 0.67 Hz). The peaks of this signal are then identified and the respiratory periods (RPs) calculated as the time difference between successive peaks to produce a “per breath” RP signal. This RP signal is low-pass filtered (Butterworth, 5th order, cut-off 0.67 Hz) to smooth the periods. The RR signal, in breaths per minute, is then calculated by multiplying the reciprocal of the RP signal (in seconds) by 60. The final step removes the effect of outliers in the RR signal by averaging, over a 60 s sliding window, only those points that lie between the 25th and 75th percentiles of the values. (We have found that outliers may arise if non-prominent peaks are not eliminated during the initial stages of the algorithm, and that this outlier removal step deals with them successfully.) This processing produces the output RRdepth signal, an example of which is shown in Fig. 3c.
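The period-to-rate conversion and the interquartile averaging can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the two Butterworth filtering stages are omitted, `find_peaks` is a plain local-maximum stand-in with no prominence check, and the 60 s window is assumed to trail the current sample, since the text does not specify its alignment.

```python
import statistics


def find_peaks(signal):
    """Indices of simple local maxima (stand-in; no prominence check)."""
    return [i for i in range(1, len(signal) - 1)
            if signal[i - 1] < signal[i] >= signal[i + 1]]


def per_breath_rr(peak_times):
    """Respiratory periods (s) between successive peaks, as breaths/min."""
    periods = [t1 - t0 for t0, t1 in zip(peak_times, peak_times[1:])]
    return [60.0 / p for p in periods]


def interquartile_mean(values):
    """Mean of the values lying between the 25th and 75th percentiles."""
    if len(values) < 3:
        # Too few points to trim; fall back to the plain mean.
        return statistics.fmean(values) if values else None
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    kept = [v for v in values if q1 <= v <= q3]
    return sum(kept) / len(kept)


def windowed_rr(times, rr_values, window_s=60.0):
    """At each sample, average the interquartile values in a trailing window."""
    out = []
    for t in times:
        window = [v for tt, v in zip(times, rr_values) if t - window_s < tt <= t]
        out.append(interquartile_mean(window))
    return out


# Peaks every 4 s correspond to 15 breaths/min.
print(per_breath_rr([0.0, 4.0, 8.0, 12.0]))
# A single spurious value of 60 is rejected by the interquartile step.
print(windowed_rr([0, 10, 20, 30, 40], [15, 15, 15, 15, 60]))
```

Restricting the average to the interquartile range discards the extreme per-breath rates produced by falsely detected or missed peaks, while leaving the central values in the window untouched.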
The capnograph provides a reference respiratory rate on a per-second basis. A one-second reporting interval is typical of the screen update rate of medical monitoring devices. We therefore resampled the output of the depth sensing RR to match this. The two respiratory rate signals, RRdepth and RRcap, required synchronization prior to statistical analysis because the depth camera and capnograph signals were collected independently on separate acquisition systems. Synchronization was carried out using cross-correlation of the two signals.
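Synchronization by cross-correlation amounts to finding the time shift that maximizes the overlap of the two mean-removed signals. The sketch below assumes both signals are already sampled at 1 Hz; the function name `best_lag` and the search limit `max_lag` are hypothetical, and the source does not state how the authors bounded or normalized the correlation.

```python
def best_lag(reference, test, max_lag):
    """Lag in samples by which `test` trails `reference`.

    Uses the peak of the unnormalized cross-correlation of the
    mean-removed signals, searched over -max_lag..+max_lag.
    """
    def centred(x):
        m = sum(x) / len(x)
        return [v - m for v in x]

    ref, tst = centred(reference), centred(test)
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, r in enumerate(ref):
            j = i + lag
            if 0 <= j < len(tst):
                score += r * tst[j]
        if score > best_score:
            best, best_score = lag, score
    return best


# The test signal is the reference delayed by three samples.
rr_cap = [0, 0, 0, 0, 1, 3, 1, 0, 0, 0, 0, 0]
rr_depth = [0, 0, 0, 0, 0, 0, 0, 1, 3, 1, 0, 0]
lag = best_lag(rr_cap, rr_depth, max_lag=5)
print(lag)  # 3: rr_depth[i + 3] lines up with rr_cap[i]
```

Once the lag is known, one signal is shifted by that many samples and the non-overlapping ends are discarded before the paired statistics are computed.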
Data analysis
Bias and accuracy statistics were calculated to compare the depth-derived RR with that of the reference (capnograph) system. These are, respectively, the mean difference and the root mean squared difference (RMSD) between the test and reference values, taken over the N paired samples. That is,
$$\mathrm{Bias} = \frac{\sum_{i=1}^{N} \left( RR_{depth}(i) - RR_{cap}(i) \right)}{N} \tag{1}$$
and
$$\mathrm{RMSD\ accuracy} = \sqrt{\frac{\sum_{i=1}^{N} \left( RR_{depth}(i) - RR_{cap}(i) \right)^{2}}{N}} \tag{2}$$
The latter expression, the RMSD, represents a combination of the systematic and random components of the differences between the corresponding readings from the two devices.
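As a worked illustration of the bias and RMSD definitions, the following sketch computes both from paired samples. The function name and the sample values are made up for the example and are not study data.

```python
import math


def bias_and_rmsd(rr_depth, rr_cap):
    """Mean difference and root mean squared difference of paired readings."""
    diffs = [d - c for d, c in zip(rr_depth, rr_cap)]
    n = len(diffs)
    bias = sum(diffs) / n
    rmsd = math.sqrt(sum(x * x for x in diffs) / n)
    return bias, rmsd


# Illustrative values only (breaths/min).
bias, rmsd = bias_and_rmsd([15, 16, 14, 17], [14, 16, 15, 15])
print(bias)       # 0.5
print(rmsd ** 2)  # approximately 1.5
```

Here the differences are 1, 0, -1 and 2, so the squared RMSD of 1.5 equals the squared bias (0.25) plus the population variance of the differences (1.25). This identity is what makes the RMSD a combined measure of systematic and random error.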
Least-squares linear regression was performed to obtain the line of best fit between the depth-derived and reference RR values, from which the gradient, intercept, Pearson correlation coefficient, R, and associated p values were computed. In this work p < 0.05 was considered statistically significant. A Bland–Altman analysis of the data was also performed using the method of Bland and Altman [23], which compensates for within-subject longitudinal correlation in the data. The SD of the bias and the corresponding limits of agreement were calculated using this methodology.
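The regression quantities can be sketched directly from their textbook definitions. This illustration covers only the gradient, intercept and Pearson R. It does not compute p values, and it does not implement the within-subject correction of the Bland and Altman method cited above. The function name and data are hypothetical.

```python
import math


def linear_fit(x, y):
    """Least-squares gradient, intercept and Pearson R of y against x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    gradient = sxy / sxx
    intercept = my - gradient * mx
    r = sxy / math.sqrt(sxx * syy)
    return gradient, intercept, r


# Synthetic, perfectly linear data: y = 2x + 1.
rr_cap = [10, 12, 14, 16]
rr_depth = [2 * v + 1 for v in rr_cap]
print(linear_fit(rr_cap, rr_depth))
```

For two devices in perfect agreement, the gradient would be 1, the intercept 0 and R equal to 1.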
A reliability measure in the form of an uptime was also computed. This is the percentage of time for which RRdepth can be computed for each subject. A high uptime is usually a fundamental technical requirement in the development of a medical device. To be acceptable for use in clinical practice, both accuracy and uptime must be sufficiently high. Note that accuracy may be improved at the expense of uptime by not posting results when the signal quality is poor (e.g., due to noise). Here we define uptime as the duration, Tvalid, for which a valid respiratory rate can be calculated and reported by the algorithm, expressed as a percentage of the total acquisition time, Tacq, i.e.
$$\mathrm{Uptime} = \frac{T_{valid}}{T_{acq}} \times 100 \tag{3}$$
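As a small illustration of the uptime definition, assume the reported RR is stored as one value per second, with `None` marking seconds in which no valid rate was posted. This encoding is an assumption for the example; the source does not describe how invalid samples were flagged.

```python
def uptime_percent(rr_samples, sample_period_s=1.0):
    """Percentage of the acquisition time with a valid reported RR."""
    valid = sum(1 for v in rr_samples if v is not None)
    t_valid = valid * sample_period_s          # T_valid in seconds
    t_acq = len(rr_samples) * sample_period_s  # T_acq in seconds
    return 100.0 * t_valid / t_acq


# Three valid seconds out of four.
print(uptime_percent([15.0, None, 16.0, 15.0]))  # 75.0
```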
MATLAB (R2018b) was used to process the data and perform the statistical analysis. An in-house C++ application was used to capture the depth data.