1 Introduction

Monitoring respiration rate (RR) and its changes is important for assessing an individual's health condition [1]. Keeping track of these parameters is even more important for vulnerable individuals, including the critically ill, neonates, infants, and the elderly [2]. An anomalous RR is important evidence of serious health issues and can also be used to predict potentially serious clinical events such as cardiopulmonary arrest, chronic heart failure, pneumonia, pulmonary embolism, weaning failure, overdose, or admission to an intensive care unit [3,4,5,6,7,8,9,10,11,12,13,14]. Some studies have shown that RR is more effective than other vital signs in differentiating between healthy and unhealthy individuals [15]. Moreover, monitoring changes in RR can identify a high-risk patient up to 24 hours before the actual adverse event [15].

In the past decade, various non-contact methods for RR measurement have been explored. Respiratory movement is subtle and cyclic, differs between individuals, and is not easy to detect through vision. Tan et al. [2] detect breathing motion using the difference of frames (DOF), applying image and signal processing techniques to derive chest and abdominal movements from a series of video images captured with a single camera. Their method is largely dependent on distinctively patterned clothing and can be greatly affected when clothing is removed or by non-respiratory movements. Zhao et al. [16] used a near-infrared illuminated camera to detect heart rate (HR) and RR. Their method deconstructs single-channel images using delay-coordinate transformation and independent component analysis to reveal the temporal dynamics of heart beat and breathing. However, it requires subjects to sit in front of the camera and face towards it, which imposes strict positional and postural constraints. Xia et al. [17] used the KINECT sensor and a translation surface to measure RR, applying a motion magnification method to improve KINECT's inherent depth resolution from 1 cm to 1 mm. The KINECT was programmed, via its software development kit, to capture depth images and compute the average depth over a thoracic area of interest viewed almost parallel to the subject's surface. This method requires the translation surface to be placed on the subject's abdomen, which also imposes strict positional and postural constraints and causes discomfort to the subject. Wijenayake et al. [18] and Rehouma et al. [19, 20] used RGB-D cameras to measure RR, which likewise constrains the subject's position. Wijenayake et al. [18] use principal component analysis (PCA) to eliminate spatial and temporal noise from the input depth data and construct a patient-specific respiratory motion model, which is then used to calculate external respiratory motion in real time with high precision. Rehouma et al.'s [19] method uses depth information captured by two Red-Green-Blue-Depth (RGB-D) cameras at different view angles to reconstruct a 3D surface of the patient's torso with high temporal and spatial resolution and broad spatial coverage. Motion data are recorded for the top of the torso as well as its two lateral sides; the volume of each reconstruction is estimated by a recursive subdivision of the 3D space into cubic unit elements, and the volume change between successive reconstructions is measured by subtraction. Both methods [18,19,20] impose positional and postural restrictions that can make the subject uneasy and affect the measurement results. Sanyal et al. [21] used an RGB camera to capture video of the subject's face and measured HR and RR from variations in the colour (hue) of the reflected light. This technique is highly affected by lighting conditions and by the individual's skin tone, and it fails to measure RR and HR if the subject is not facing the camera. Massaroni et al. [22] also proposed an RGB video-based RR measurement with the subject sitting in front of the camera.
Massaroni et al.'s [22] technique consists of a laptop's built-in RGB camera and an algorithm for post-processing of the acquired video data; analysing pixel-intensity shifts in recordings of the subject's chest movements produces a waveform showing the respiratory pattern. However, this method uses manual annotation of the breathing region, which again places postural restrictions on the individual. Harte et al. [23] used structured light (SL) to detect chest-wall motion and estimate RR. In this method the cameras are placed around the patient, who must stand in a fixed, precisely marked position due to system restrictions. Janssen et al. [24] measure RR with a video camera and also detect the region of interest automatically, exploiting the intrinsic properties of respiration to find the respiratory region of interest and extracting the respiratory signal via motion factorization, based on the observation that respiration-induced chest/abdomen motion is independent motion in a video. However, this method was unable to produce accurate results and tends to fail under different lighting conditions. Al-Naji et al. [25] and Brieva et al. [26] use motion magnification to measure RR from video. Al-Naji et al.'s [25] method uses motion magnification based on an elliptic filter and wavelet decomposition to magnify breathing motion that is difficult to see with the naked eye; RR and its time parameters are calculated by identifying the fastest moving areas in the magnified video frame sequences. However, magnification techniques are also likely to amplify environmental noise, which may cause inaccurate RR measurements. For this reason, Brieva et al. [26] used a convolutional neural network (CNN) with manual annotation of the breathing region to address this problem. This technique may still fail in general because sleeping posture and breathing pattern vary between individuals. Wang et al. [27] built a persistent luminous impression model (PLIM) to detect subtle breathing motion from a camera without positional constraints for identification of obstructive sleep apnoea behaviour, but their method does not produce RR results and is sensitive to environmental noise. Lee et al. [28] used a radar sensor for contactless RR measurement and achieved high precision, but their method fails when subjects move substantially. Phokela et al. [29] proposed estimating different breathing rates from the strength of nasal breath sounds, using a headset microphone placed beneath the nose and attached to a smartphone; this constrains the subject, who must keep the microphone close to the nose. Reyes et al. [30] proposed a computer-vision-based estimation of RR and respiratory movement in which a combination of the Kanade-Lucas-Tomasi, Viola-Jones, and Harris-Stephens feature algorithms automatically detects and tracks a region of interest on the chest of a person facing the camera; respiratory movement and RR are then estimated from the vertical displacement of the ROI. This method also imposes a postural constraint. Schoun et al. [31] used thin-medium thermal imaging to calculate RR, tidal volume, and mouth distribution.
In that method, a subject breathes on a thin medium placed perpendicular to the exhaled airflow while the heat signature is recorded on the opposite side of the medium by a thermal camera, which again constrains the subject's position. Nosrati et al. [32] use an electromagnetic Doppler radar to calculate RR; however, the radar is placed in front of the patient, who must stand in a fixed, precisely marked position due to system restrictions. Nam et al. [33] introduce a method for the simultaneous measurement of heart and respiratory rates using dual smartphone cameras: heart rate is estimated with the rear-facing camera while breathing rate is measured contactlessly with the front-facing camera at the same time. One drawback of this method is that hand movement must be minimised during data recording, which might not always be feasible.

In this paper, we present an improved adaptive real-time camera-based RR measurement system based on [34]. The proposed method is demonstrated to automatically adapt to different lighting conditions, deal with environmental noise, detect the active breathing region, and operate without positional constraints. This paper is organized as follows. We introduce our adaptive real-time camera-based RR monitoring algorithm in Section 2. In Section 3, we present the experimental setup for data acquisition and the results. Finally, Section 4 states concluding remarks.

2 Methods

In this research, we propose a real-time camera-based adaptive RR monitoring system. The proposed system uses a regular smartphone RGB camera as the video input to monitor the RR of individuals and changes in it. The proposed system is robust to lighting conditions, environmental noise, and different breathing patterns (i.e., shallow, middle, and deep), and it is sensitive to subtle respiratory movement that is difficult to detect otherwise. Figure 1 illustrates an overview of the proposed algorithm. To monitor RR, we observe the subtle respiratory movement while breathing and propose an adaptive RR monitoring system which consists of the following: (1) adaptive breathing motion detection, (2) adaptive region of interest detection to eliminate environmental noise, (3) breathing and body movement classification, (4) respiration rate estimation, (5) monitoring of change in respiration rate to examine the overall health of an individual, and (6) online adaptation to lighting.

Fig. 1

The proposed framework for a real-time camera-based adaptive RR monitoring system. (i) Data acquisition: the animal data are recorded with three smartphones, highlighted with orange boxes, and two stopwatches, highlighted with yellow boxes; the distance between the smartphones and the subject is indicated by the blue arrows. (ii) Filtered breathing motion detection: (ii, a) shows real-time breathing motion detection with the ROI and activity map in the yellow rectangle; (ii, b) illustrates the saved normal breathing template; (ii, c) shows the real-time raw breathing activity levels from the activity map as an oscillogram; (ii, d) shows the filtered breathing activity levels obtained with the help of the saved normal breathing template. (iii) Region of interest selection avoiding environmental noise. (iv) Matching of the breathing motion with the normal template: (iv, a) shows a normal breathing event, where green indicates motion matched with the normal breathing template; (iv, b) presents a limb movement event with a high raw activity level (ef) while the filtered breathing activity level (\(e^{\prime }_{f}\)) remains stable, where red indicates non-breathing motion that does not match the template; (iv, c) shows a non-breathing status where the subject is at rest without any movement, and blue indicates the latest saved normal breathing template. (v) Movement classification: (v, a) shows the signal of normal breathing activity; (v, b) shows the signal of body or limb movement activity; (v, c) shows respiratory arrest. (vi) Estimation of respiration rate: RR is estimated using the filtered breathing activity (\(e^{\prime }_{f}\)) signal over f video frames. af is the initial frame number and bf is the final frame number required to calculate the frame difference between two breaths; the calculated frame difference, τ, is used to estimate the RR per minute. The activity peak (Υf), upper threshold (ζu), and lower threshold (ζl) are used to detect breathing activity: the inclining slope shaded in green shows inhalation, the declining slope shaded in blue shows exhalation, and the upper and lower thresholds are marked to identify correct inhalation and exhalation patterns. (vii) Monitoring of change in respiration: a minor alert is generated if the change in RR is less than 25%, a moderate alert if the change in RR is between 25% and 50%, and a critical alert if the change in RR is more than 50%

2.1 Adaptive breathing motion detection

Adaptive breathing motion detection is introduced in the proposed system to adapt to different lighting environments and detect subtle breathing motion patterns accurately. Existing vision-based RR monitoring methods are significantly affected by different lighting environments, which compromise their results. Therefore, the proposed system assesses the video input (If(x, y)) and adapts according to the lighting condition.

In order to detect the subtle breathing motion without any positional constraint, the proposed system uses a persistent luminous impression model (PLIM) [27]. Given a w × h image and a frame rate of F frames/s, the PLIM is initialized with the initial frame (I0(x, y)) of the video input as in Eq. 1. The PLIM incorporates slow adaptation, allowing pose changes to be accommodated while cyclical movements remain detectable. The subtle breathing motion is detected within the region of interest (ROI) boundaries by calculating the difference between the current frame (If(bx, by)) and the background frame (Pf−ν(bx, by)) at frame f using Eq. 2. The background frame is updated with frequency ν to incorporate slow adaptation and enhance subtle breathing movement using Eq. 3. After breathing motion detection, PLIM's breathing activity map (Af(bx, by)) is computed using Eq. 4 by comparing the difference with the threshold (α). We also define the raw motion activity level (ef) as the number of active pixels in the activity map (Af(bx, by)) at frame f (see Fig. 1(ii, c)); it is computed using Eq. 5.

$$ P_{0}(x,y) = I_{0}(x,y) $$
(1)

At frame f, the PLIM in the ROI is updated using

$$ {{\varDelta}}_{f} (bx,by) = I_{f}(bx,by) - P_{f - \nu}(bx,by) $$
(2)
$$ P_{f}(bx,by)=P_{f-\nu}(bx,by)+\left\{\begin{array}{ccc} 1 & if & {{\varDelta}}_{f}(bx,by)>0 \\ 0 & if & {{\varDelta}}_{f}(bx,by)=0 \\ -1& if & {{\varDelta}}_{f}(bx,by)< 0 \end{array}\right. $$
(3)

The PLIM activity map Af(bx, by) in the ROI is defined as

$$ A_{f}(bx,by)=\left\{ \begin{array}{ccc} 1 & if & {{\varDelta}}_{f}(bx,by) > \alpha \\ 0 & & otherwise \end{array}\right. $$
(4)
$$ e_{f}=\underset{bx}{\sum}\underset{by}{\sum}A_{f}(bx,by) $$
(5)

where bx ∈ xROI = [x1, x2] and by ∈ yROI = [y1, y2]; x1 and x2 are the first and last x-axis points of the ROI, and y1 and y2 are the first and last y-axis points of the ROI, which are further elaborated in Section 2.2. If(bx, by) is the image at frame f, Δf(bx, by) is the image difference at frame f, Af(bx, by) is the PLIM's activity map, and ef is the number of active pixels in the activity map. α is the motion detection threshold and ν is the PLIM background update frequency, which are further elaborated in Section 2.6.
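To make the update rule concrete, the following is a minimal NumPy sketch of Eqs. 1-5, assuming 8-bit grayscale frames; the function names are illustrative, and the simplification of applying the background update on every call (rather than only every ν-th frame) is ours, not the authors'.

```python
import numpy as np

def init_plim(first_frame):
    # Eq. 1: the PLIM background is initialised with the first frame I_0
    return first_frame.astype(np.int16)

def plim_step(frame, plim, roi, alpha):
    """One PLIM update inside the ROI; returns (updated PLIM, A_f, e_f).
    In the paper the background update (Eq. 3) is applied only every
    nu-th frame; here it is applied on every call for brevity."""
    x1, y1, x2, y2 = roi
    patch = frame[y1:y2, x1:x2].astype(np.int16)
    delta = patch - plim[y1:y2, x1:x2]            # Eq. 2: frame difference
    plim = plim.copy()
    plim[y1:y2, x1:x2] += np.sign(delta)          # Eq. 3: slow +/-1 adaptation
    activity = (delta > alpha).astype(np.uint8)   # Eq. 4: activity map A_f
    e_f = int(activity.sum())                     # Eq. 5: raw activity level
    return plim, activity, e_f
```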

In this paper, an improved method is developed to produce stable breathing signals by filtering out signals from environmental noise and other body movements. First, the normal breathing template (Bf) of an individual is saved from the activity map (Af(bx, by)) and used as the breathing motion reference. Then, the real-time raw motion is compared and matched with the reference template to identify breathing signals using Eq. 6. The filtered breathing activity level (\(e^{\prime }_{f}\)) is defined in Eq. 7 as the number of active pixels in the breathing activity map (Mf(bx, by)). While the raw activity level ef is produced from the raw motion data as in [27], the proposed filtered breathing activity model is able to distinguish breathing signals from signals caused by other movements and produce a reliable breathing activity measurement \(e^{\prime }_{f}\). Importantly, the online normal breathing template is designed to be adaptive and is automatically updated if the proposed system detects a notable raw activity level (ef) but little or no filtered breathing activity (\(e^{\prime }_{f}\)) for ten continuous seconds.

$$ M_{f}(bx, by) = \left\{ \begin{array}{ccc} 1 & if & A_{f}(bx, by)=B_{f}(bx, by) \\ 0 & & otherwise \end{array}\right. $$
(6)
$$ e^{\prime}_{f}=\underset{bx}{\sum}\underset{by}{\sum}M_{f}(bx, by) $$
(7)

where Bf is the saved breathing template for the individual, Mf(bx, by) is the breathing activity map filtered from the raw activity map Af(bx, by), and \(e^{\prime }_{f}\) is the filtered breathing activity level at frame f.
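A literal sketch of Eqs. 6 and 7, assuming the template Bf and the activity map Af are binary NumPy arrays of the same ROI size; names are illustrative.

```python
import numpy as np

def filtered_activity(activity_map, template):
    # Eq. 6: M_f marks the pixels where the current activity map A_f
    # agrees with the stored normal breathing template B_f
    match = (activity_map == template).astype(np.uint8)
    # Eq. 7: e'_f is the number of pixels set in M_f
    return match, int(match.sum())
```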

2.2 Adaptive region of interest detection

In order to develop a robust and reliable vision-based breathing monitoring system, it is important to deal with foreground and background noise, distinguish breathing movements from other motion signals, and produce accurate measurements of breathing activity. We therefore build a model for the localization of active breathing behaviour (LABB) regions, which helps produce pure breathing signals and quantitative measurement results. In our preliminary simulated tests (Fig. 2), the proposed LABB model is able to deal with background noise such as a moving ball (Fig. 2a) and foreground noise such as a person walking by (Fig. 2b), and it works well with various camera view angles (Fig. 2c).

Fig. 2

The results of preliminary tests on the localization of active breathing regions in the presence of foreground and background noise and at different view angles. a The proposed method effectively and robustly identifies the active breathing regions despite the background noise caused by a thrown ball. b The proposed method effectively locates the active breathing regions despite the foreground noise caused by a human walking by. c The proposed framework identifies the active breathing region at different view angles. Red indicates the motion signal, and yellow represents the detected active breathing region

Given the individual normal breathing template Bf defined in Section 2.1, the system identifies the area with personalized breathing patterns as the active region of interest \(R^{\prime }\) for analysis. The proposed system automatically resets the ROI if a large motion event is detected. For initialization, the breathing region (R) detected from the full frame is refined by applying erosion (Eqs. 8 and 9) to remove noise, after which a bounding box of the refined region (\(R^{\prime }\)) is determined. In some cases, the breathing motion may produce a weak signal, which results in a small ROI. The proposed system resolves this situation using Eqs. 10, 11, 12, and 13.

$$ H=\left( \begin{array}{ccc} 0 & 1 & 0 \\ 1 & 1 & 1 \\ 0 & 1 & 0 \end{array} \right) $$
(8)
$$ R^{\prime}_{f}=B_{f}\ominus H $$
(9)
$$ \begin{array}{r}x_{1}=\left\{\begin{array}{ccc} (1-{{\varOmega}})x^{\prime}_{1} & if & x^{\prime}_{2} - x^{\prime}_{1} < w / \lambda\\ x^{\prime}_{1} & & otherwise \end{array}\right. \end{array} $$
(10)
$$ \begin{array}{r}x_{2}=\left\{\begin{array}{ccc} (1+{{\varOmega}})x^{\prime}_{2} & if & x^{\prime}_{2} - x^{\prime}_{1} < w / \lambda\\ x^{\prime}_{2} & & otherwise \end{array}\right. \end{array} $$
(11)
$$ \begin{array}{r}y_{1}=\left\{\begin{array}{ccc} (1-{{\varOmega}})y^{\prime}_{1} & if & y^{\prime}_{2} - y^{\prime}_{1} < h / \lambda\\ y^{\prime}_{1} & & otherwise \end{array}\right. \end{array} $$
(12)
$$ \begin{array}{r}y_{2}=\left\{\begin{array}{ccc} (1+{{\varOmega}})y^{\prime}_{2} & if & y^{\prime}_{2} - y^{\prime}_{1} < h / \lambda\\ y^{\prime}_{2} & & otherwise \end{array}\right. \end{array} $$
(13)

where \(R^{\prime }\) is the breathing region after noise removal with bounding box \(R^{\prime } = \{x^{\prime }_{1}, y^{\prime }_{1}, x^{\prime }_{2}, y^{\prime }_{2}\}\), w is the width of the image, h is the height of the image, Ω is the resize factor, and λ is the video frame dividing factor (in this study Ω = 0.15, and λ = 5); ROI with bounding box is defined as b = {x1, y1, x2, y2} with dimensions xROI × yROI.
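The ROI refinement of Eqs. 8-13 can be sketched as follows, using SciPy's binary erosion with the cross-shaped structuring element of Eq. 8; the handling of an empty region and the function signature are our own assumptions.

```python
import numpy as np
from scipy.ndimage import binary_erosion

def refine_roi(template, w, h, omega=0.15, lam=5):
    """Erode the breathing template (Eqs. 8-9), take its bounding box, and
    enlarge the box if it is smaller than 1/lam of the frame (Eqs. 10-13)."""
    cross = np.array([[0, 1, 0],
                      [1, 1, 1],
                      [0, 1, 0]], dtype=bool)                        # Eq. 8: H
    refined = binary_erosion(template.astype(bool), structure=cross)  # Eq. 9
    ys, xs = np.nonzero(refined)
    if xs.size == 0:
        return None                                # no breathing region found
    x1p, x2p, y1p, y2p = xs.min(), xs.max(), ys.min(), ys.max()
    # Eqs. 10-13: stretch a bounding box that is too small
    x1 = (1 - omega) * x1p if (x2p - x1p) < w / lam else x1p
    x2 = (1 + omega) * x2p if (x2p - x1p) < w / lam else x2p
    y1 = (1 - omega) * y1p if (y2p - y1p) < h / lam else y1p
    y2 = (1 + omega) * y2p if (y2p - y1p) < h / lam else y2p
    return int(x1), int(y1), int(x2), int(y2)
```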

2.3 Breathing and body movement classification

Breathing and activity classification is pivotal to the accuracy of the proposed system. Unconscious body and limb movements are natural phenomena and are unavoidable. The proposed system, with the help of the raw activity level (ef) and the filtered activity level (\(e^{\prime }_{f}\)), is able to distinguish between respiratory and non-respiratory movements. Figure 1v illustrates oscillograms of different activity patterns based on activity level and peak duration.

Accurate activity classification ensures that the results produced by the proposed system are accurate and robust. After a body or limb movement is detected, the proposed system resets the ROI and the normal breathing template to automatically adapt to the new resting position of the subject. Moreover, it excludes that particular movement from the RR calculation. As a result, the proposed system can easily adapt to the new position and keep producing accurate RR measurements.
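The paper does not spell out the exact decision rule, so the following sketch only illustrates one plausible way to classify a frame from ef and \(e^{\prime }_{f}\); the threshold values are hypothetical.

```python
def classify_movement(e_f, e_filtered, motion_floor=50, breathing_ratio=0.5):
    # motion_floor and breathing_ratio are hypothetical values, chosen only
    # to illustrate comparing the raw and filtered activity levels
    if e_f < motion_floor:
        return "rest"                        # negligible motion in the ROI
    if e_filtered >= breathing_ratio * e_f:
        return "breathing"                   # motion matches the template
    return "body_or_limb_movement"           # large motion, little template match
```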

2.4 Respiratory rate estimation

Respiratory activity causes cyclic movement of individuals with varying frequency and depth. The video input is a series of f frames at a frame rate of F frames per second, and a complete breathing activity is identified from the inspiration and expiration events in the filtered breathing activity level (\(e^{\prime }_{f}\)). Initially, the peak value of the filtered activity level (ϒf) is tracked from the filtered activity level (\(e^{\prime }_{f}\)) using Eq. 14. Upper (ζu) and lower (ζl) thresholds are calculated using Eqs. 15 and 16, respectively (see Fig. 1vi). These upper and lower thresholds are used to detect a complete inhalation and exhalation activity without interference from unnecessary movement using Eq. 17. Because the breathing pattern of an individual can change over time, the proposed system is designed to adapt to different breathing patterns: the peak value (ϒf) is decremented by one every second (see Eq. 14). According to the results, this adaptive approach proves effective in detecting inhalation and exhalation activity even if the breathing activity level (\(e^{\prime }_{f}\)) decreases or increases over time. Furthermore, the time interval between two breathing peaks helps identify and classify a movement as respiratory or non-respiratory. If the system detects a non-respiratory movement or a long breathing interval, the activity peak (Υf) is reset.

$$ {\Upsilon}_{f} = \left\{ \begin{array}{ccc} e^{\prime}_{f} & if & e^{\prime}_{f} > {\Upsilon}_{f-1} \\ {\Upsilon}_{f-F} - 1 & & otherwise \end{array}\right. $$
(14)
$$ \zeta_{u} = {\Upsilon}_{f} \times \sigma_{1} $$
(15)
$$ \zeta_{l} = {\Upsilon}_{f} \times (1 - \sigma_{1}) $$
(16)
$$ \begin{array}{ll} \rho_{f} = \left\{ \begin{array}{ccccc} 1 & if & e^{\prime}_{f} > \zeta_{u} & \wedge & \rho_{f-1} = 0\\ 0 & if & e^{\prime}_{f} < \zeta_{l} & \wedge & \rho_{f-1} = 1\\ \rho_{f-1} & & otherwise \end{array}\right. \end{array} $$
(17)

where Υf is the maximum level of breathing activity at frame f, and F is the video frame rate per second; σ1 = 0.75 is empirically determined for further calculation of the upper-threshold (ζu) and lower-threshold (ζl); ρf is the breathing peak value at frame f, and \(e^{\prime }_{f}\) is the filtered breathing activity level at frame f.
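A per-frame sketch of the adaptive peak tracking and hysteresis detection of Eqs. 14-17, with σ1 = 0.75 as in the text; the per-frame decay of 1/F approximates the once-per-second decrement of Eq. 14, and the function names are our own.

```python
def update_peak(e_filtered, peak_prev, frame_rate):
    # Eq. 14: track the maximum e'_f; otherwise decay by one per second,
    # approximated here as 1/F per frame
    return e_filtered if e_filtered > peak_prev else peak_prev - 1.0 / frame_rate

def breathing_state(e_filtered, peak, state_prev, sigma1=0.75):
    zeta_u = peak * sigma1              # Eq. 15: upper threshold
    zeta_l = peak * (1 - sigma1)        # Eq. 16: lower threshold
    if e_filtered > zeta_u and state_prev == 0:
        return 1                        # Eq. 17: inhalation crosses the upper threshold
    if e_filtered < zeta_l and state_prev == 1:
        return 0                        # exhalation drops below the lower threshold
    return state_prev
```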

The RR (rf) is estimated using the time interval between two breathing activities (τ) (see Fig. 1, vi). The number of breathing activities (ßf) is counted according to the detected breathing events using Eq. 18. Further, the initial frame number (af) and the final frame number (bf) are determined from the series of frames (f) using Eqs. 19 and 20, respectively. The time interval between two breathing activities (τ) is calculated as the difference between the final frame (bf) and the initial frame (af) as in Eq. 21. This time interval is then used to estimate the RR per minute (rf) at frame f using Eq. 22.

$$ \begin{array}{ll} \ss_{f} = \left\{ \begin{array}{ccc} \ss_{f-1} + 1 & if & \rho_{f-1} - \rho_{f} = 1\\ 0 & if & \ss_{f-1} = L \\ \ss_{f-1} & & otherwise \end{array}\right. \end{array} $$
(18)
$$ \begin{array}{ll} a_{f} = \left\{\begin{array}{ccc} f & if & \ss_{f-1} - \ss_{f} = L \\ a_{f-1} && otherwise \end{array}\right. \end{array} $$
(19)
$$ \begin{array}{ll} b_{f} = \left\{\begin{array}{ccc} f & if & \ss_{f-1} + \ss_{f} = L + 1\\ b_{f-1} & & otherwise \end{array}\right. \end{array} $$
(20)
$$ \begin{array}{ll} \tau = \frac{|b_{f} - a_{f}|}{F} \end{array} $$
(21)
$$ r_{f} = \frac{\chi}{\tau} \times L $$
(22)

where ßf is the breath counter at frame f, bf is the final frame number at frame f, af is the initial frame number at frame f, F is the video frame rate per second, rf is the RR at frame f, χ is the number of seconds in 1 min (χ = 60), and τ is the elapsed time for L breaths (in this study L = 2, an empirically determined parameter).
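The breath counting and RR estimation of Eqs. 18-22 can be sketched as a per-frame loop over the hysteresis states ρf; the bookkeeping details (e.g., checking the counter reset before the increment) are our reading of the equations rather than the authors' implementation.

```python
def estimate_rr(rho_seq, frame_rate, L=2, chi=60.0):
    """Per-frame breath counting and RR estimation (Eqs. 18-22);
    rho_seq holds the hysteresis state rho_f of Eq. 17, one value per frame."""
    breaths, a_f, b_f, rr = 0, 0, 0, None
    prev_rho, prev_breaths = rho_seq[0], 0
    for f, rho in enumerate(rho_seq):
        # Eq. 18: reset after L breaths, count a breath on each falling edge
        if prev_breaths == L:
            breaths = 0
        elif prev_rho - rho == 1:
            breaths = prev_breaths + 1
        else:
            breaths = prev_breaths
        if prev_breaths - breaths == L:
            a_f = f                                   # Eq. 19: window start
        if prev_breaths + breaths == L + 1:
            b_f = f                                   # Eq. 20: L-th breath
            tau = abs(b_f - a_f) / frame_rate         # Eq. 21: elapsed time
            rr = chi / tau * L                        # Eq. 22: breaths per minute
        prev_rho, prev_breaths = rho, breaths
    return rr
```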

2.5 Monitor change in respiration rate

The proposed system is also able to monitor changes in RR. Monitoring changes in RR is important for identifying critical events, which must be reported to medical experts. For this reason, the proposed system also introduces an intelligent alarming system, which existing monitoring systems lack. Although RR is estimated every two breaths in the proposed system, the change in RR is calculated every 10 s. The proposed system has three types of alarm: (1) adjustable upper and lower thresholds; (2) the change in respiration rate over a certain time period; and (3) a respiratory arrest (no breathing) alarm. The adjustable upper and lower thresholds work similarly to the existing system (GE Dash 5000 [27]): the alarm is triggered if the RR exceeds the upper limit or drops below the lower limit. Equation 23 shows the alarm triggering conditions.

$$ Al_{f}=\left\{\begin{array}{ccc} N_{U}& if & r_{f} > \varepsilon_{u}\\ N_{L}& if & r_{f} < \varepsilon_{l} \\ N_{G}& & otherwise \end{array}\right. $$
(23)

where Alf is the fixed-threshold alarm trigger, εu is the upper threshold, εl is the lower threshold, and rf is the RR at frame f. NU and NL are the alarms for the upper and lower limits, respectively, whereas NG denotes the condition in which no alarm is triggered.

For the rate of change in RR, the proposed system calculates the change (Δrf) and the percentage change (δrf) every 10 s (see Eqs. 24 and 25), which existing ETCO2 monitors are unable to do. The proposed system classifies the alarming situation according to the calculated percentage as a critical, moderate, or minor alarm (Eq. 26). This intelligent classification can alert and inform medical staff about the patient's condition.

$$ {{\varDelta}} r_{f} = |r_{f} - r_{f-k}| $$
(24)
$$ \delta r_{f} = \frac{{{\varDelta}} r_{f}}{r_{f}} \times 100 $$
(25)
$$ Al_{c} = \left\{\begin{array}{ccc} N_{A} &if& \varpi_{1} < \delta r_{f} \leq \varpi_{2}\\ N_{B} &if& \varpi_{2} < \delta r_{f} \leq \varpi_{3}\\ N_{C} &if& \delta r_{f} > \varpi_{3}\\ N_{G} & & otherwise \end{array}\right. $$
(26)

where Δrf is the change of RR over k frames (in this study k = 10F), δrf is the percentage change in the RR, and Alc is the alarm classification according to δrf. NC indicates a critical change in RR, NB a moderate change, NA a minor decrease or increase, and NG no change in RR. In this study ϖ1 = 1%, ϖ2 = 25%, and ϖ3 = 50% are empirically determined parameters.
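A compact sketch of the alarm logic of Eqs. 23-26; the string labels stand in for the notifications NU, NL, NA, NB, NC, and NG, and the guard against division by zero is our own addition.

```python
def fixed_threshold_alarm(rr, eps_u, eps_l):
    # Eq. 23: fixed upper/lower RR limits
    if rr > eps_u:
        return "upper_limit_alarm"       # N_U
    if rr < eps_l:
        return "lower_limit_alarm"       # N_L
    return "no_alarm"                    # N_G

def change_alarm(rr_now, rr_10s_ago, w1=1.0, w2=25.0, w3=50.0):
    delta = abs(rr_now - rr_10s_ago)                        # Eq. 24
    percent = (delta / rr_now * 100.0) if rr_now else 0.0   # Eq. 25
    if percent > w3:
        return "critical"                # N_C: change above 50%
    if percent > w2:
        return "moderate"                # N_B: change between 25% and 50%
    if percent > w1:
        return "minor"                   # N_A: change between 1% and 25%
    return "no_change"                   # N_G
```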

The proposed system can generate an alarm if there is no breathing activity for \(\mho \) continuous seconds (see Eq. 27). Furthermore, the proposed system includes an adaptive respiratory arrest detection mechanism, which the ETCO2 monitor (GE Dash 5000 [27]) lacks: the ETCO2 monitor can only check for no-breathing over a fixed (default) time interval, whereas the proposed system adapts this interval over time, which increases its adaptability and accuracy. Initially, the proposed system generates its first respiratory arrest critical alarm after \(\mho \) seconds of inactivity and resets the RR result. The system then automatically adds \(\mho \) seconds to the previous waiting time (qf) until it crosses the maximum RR reset limit (Z); when the waiting time (qf) crosses the maximum limit (Z), the system automatically resets it to \(\mho \) seconds. Due to this adaptability, our system can handle some special conditions and can estimate an RR as low as two breaths per minute (see Fig. 5b), whereas the waiting time of the existing system (GE Dash 5000 [27]) is fixed and cannot adapt to low respiration rates. Thus, the existing system (GE Dash 5000 [27]) is unable to estimate an RR lower than six breaths per minute.

$$ Al_{n} = \left\{\begin{array}{ccc} 1& if & {\sum}_{f}^{\mho} r_{f} = 0\\ 0& &otherwise \end{array}\right. $$
(27)
$$ q_{f} =\left\{\begin{array}{ccccc} q_{f-1} + \mho& if & Al_{n} = 1& \wedge &q_{f-1} < Z\\ \mho & if & q_{f-1} > Z\\ q_{f-1} & &otherwise& \end{array}\right. $$
(28)

where Aln is the no-breathing alarm, ℧ is the threshold for breathing inactivity, rf is the RR at frame f, qf is the RR reset threshold at frame f, and Z is the maximum frame limit for breathing inactivity. In this study \(\mho = 10F\) and Z = 30F.
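A sketch of the adaptive waiting-time update of Eq. 28, with times expressed in frames as in the text (℧ = 10F, Z = 30F); the function name is illustrative.

```python
def update_wait_time(q_prev, no_breathing_alarm, mho, Z):
    # Eq. 28: extend the waiting time by mho frames after each no-breathing
    # alarm until it exceeds Z, then wrap back to the initial value mho
    if no_breathing_alarm and q_prev < Z:
        return q_prev + mho
    if q_prev > Z:
        return mho
    return q_prev
```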

2.6 Online adaptation to lighting

For online adaptation to lighting, the proposed system initially assesses the first frame (I0(x, y)) of the video input and computes the illumination level (φf). Each frame is an image composed of three channels, i.e. red, green, and blue channels. RGB image (If(x, y)) is converted into a grayscale image (Vf(x, y)) by averaging the pixel intensities of the three channels, and from the grayscale image, the illumination is calculated using Eq. 29.

$$ \varphi_{f} = \frac{{{\sum}_{x}^{w}}{{\sum}_{y}^{h}} V_{f}(x,y)}{w \times h} $$
(29)

where Vf(x, y) is the grayscale image at frame f with dimensions w × h, φf is the illumination with resulting values in the range [0, 255] at frame f.

Motion detection threshold (α) and background update frequency (ν) are determined using Eqs. 30 and 31. α and ν are the key parameters that control the sensitivity of subtle breathing motion detection: α governs how much frame difference is treated as motion, and ν governs the motion sensitivity through the update rate, meaning that the system updates the persistent impression frame every ν frames.

$$ \begin{array}{r} \alpha = \left\{\begin{array}{lcc} \omega_{1} & if & \varphi_{f} < \beta_{1}\\ \omega_{2} & if & \beta_{1} \leq \varphi_{f} \leq \beta_{2}\\ \omega_{3} & if & \varphi_{f} > \beta_{2} \end{array}\right. \end{array} $$
(30)
$$ \begin{array}{r} \nu = \left\{\begin{array}{lcc} \gamma_{1} & if & \alpha = \omega_{1}\\ \gamma_{2} & & otherwise \end{array}\right. \end{array} $$
(31)

where α is motion detection threshold, and ν is background update frequency; ω1 = 8, ω2 = 12, ω3 = 24, γ1 = 4, γ2 = 2, β1 = 83, and β2 = 100 are empirically determined.
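The lighting adaptation of Eqs. 29-31 reduces to a simple lookup, sketched below with the empirically determined constants from the text; the tuple-based parameter passing and function name are our own conventions.

```python
import numpy as np

def lighting_parameters(frame_rgb,
                        omega=(8, 12, 24), gamma=(4, 2), beta=(83, 100)):
    gray = frame_rgb.mean(axis=2)       # average of the R, G, and B channels
    phi = float(gray.mean())            # Eq. 29: illumination in [0, 255]
    if phi < beta[0]:                   # Eq. 30: pick alpha from the illumination
        alpha = omega[0]
    elif phi <= beta[1]:
        alpha = omega[1]
    else:
        alpha = omega[2]
    nu = gamma[0] if alpha == omega[0] else gamma[1]   # Eq. 31: update frequency
    return phi, alpha, nu
```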

3 Data and results

3.1 Experimental setup and data acquisition

The video data were acquired at the National Taiwan University Veterinary Hospital. The animal use protocol was reviewed and approved (Approval Number NTU106-EL-00116) by the Institutional Animal Care and Use Committee (IACUC), National Taiwan University, Taiwan. All experiments were carried out in accordance with the approved guidelines. Our experimental setup is shown in Fig. 1i. Using three smartphones and two digital stopwatches, the videos were recorded at 1080p resolution with 30 frames per second. Two smartphones were used to record the animal video from two different angles, and the distance from the camera to the target is shown in Fig. 1i. The third smartphone was used to record the measurement outputs of the reference device, an FDA-approved medical device (GE Dash 5000 [27]). The stopwatches were used to synchronize the video of the target and the video of the reference device.

In total, 15 young healthy animals (Landrace × Yorkshire × Duroc (LYD) pigs) were used as subjects, and video data were recorded twice with a 7-day interval between the two acquisitions, yielding 30 recorded videos. In order to evaluate the proposed system more comprehensively, the data from each subject were recorded and classified into seven different stages to simulate various medical conditions and breathing patterns. When a subject was being prepared for the experiment, premedication was administered by intramuscular injection of a mixture of 2 mg/kg xylazine (Rompun, Bayer Korea Ltd., Seoul, Korea) and 4 mg/kg Zoletil (a mixture of tiletamine hydrochloride and zolazepam hydrochloride, Virbac Laboratory, Carros, France). A 22-gauge intravenous catheter was placed in the marginal auricular vein, through which the fluids and drugs were subsequently injected. Anaesthesia was induced with intravenous propofol (1 mg/kg). Once anaesthetized and intubated, general anaesthesia was maintained with isoflurane (2–3%) throughout the procedure. Heart rate, ECG, body temperature, mean arterial pressure, pulse oximetry, ETCO2, and RR were monitored continuously. Respiratory condition and rate under general anaesthesia were controlled by a mechanical ventilator (Anesthesia Delivery System (ADS) 1000, Engler, Florida, USA) and intravenous administration of the muscle relaxant atracurium (10%). The pressure control mode of the ADS 1000 was used to control the RR of the subject.

Figure 3 illustrates the seven stages and the ADS ventilation rate in the experimental design used to simulate various medical conditions and breathing patterns. At the beginning of the anaesthesia (Stage 1), atracurium was given as a loading dose of 0.5 mg/kg followed by a 0.5 mg/kg/h constant rate infusion for 15 min, and the manual ventilator rate of the ADS was set at one breath per minute (the lowest frequency). In Stage 2, the subject breathes with the help of the mechanical ventilator only. The effect of atracurium sets in about 2 min after injection, and spontaneous respiration gradually recovers about 10 min after withdrawal of atracurium, so the subject gradually recovers in Stage 3. At the beginning of Stage 4, the ventilator rate of the ADS was set to 30 breaths per minute and then decreased to 10 breaths per minute to enter Stage 5. In Stage 6, we turned off the ventilator. The concentration of isoflurane was elevated to 5% in Stage 7 to deepen the anaesthesia and observe the change in respiratory condition. In the first data collection procedure, all pigs gradually recovered from general anaesthesia from the end of Stage 6, after the isoflurane was turned off. In the second experiment, propofol and potassium chloride (KCl) were administered intravenously to euthanize the pigs at the end-point of the experiment.

Fig. 3

This figure illustrates the experimental design with different stages to simulate various medical conditions and breathing patterns. An anesthesia delivery system (ADS) is used as a positive pressure ventilator (pressure control mode) to control the respiratory rate (RR) during the machine-aided and spontaneous plus machine-aided breathing stages. In Stage 1, atracurium was given as a loading dose of 0.5 mg/kg followed by a 0.5 mg/kg/h constant rate infusion for 15 min, and the manual ventilator rate of the ADS was set at one breath per minute (the lowest frequency). In Stage 2, the subject breathes with the help of the mechanical ventilator only. The effect of atracurium sets in about 2 min after injection, and spontaneous respiration gradually recovers about 10 min after withdrawal of atracurium, so the subject gradually recovers in Stage 3. At the beginning of Stage 4, the ventilator rate of the ADS was set to 30 breaths per minute and then decreased to 10 breaths per minute to enter Stage 5. In Stage 6, we turned off the ventilator. The concentration of isoflurane was elevated to 5% in Stage 7 to deepen the anaesthesia and observe the change in respiratory condition

The breathing pattern in the animal data is categorized into three types: spontaneous breathing, machine-aided breathing, and spontaneous plus machine-aided breathing. Spontaneous breathing occurs in Stage 1 and from Stage 6 until the end, machine-aided breathing occurs in Stage 2 only, and spontaneous plus machine-aided breathing occurs from Stage 3 to Stage 5. For detailed comparison, RR values from the monitor and the proposed system need to be quantized and recorded at fixed time intervals. In this study, we computed RR every two breaths and calculated the difference every 10 s.

3.2 Results and discussion

The RR measurements from the proposed system were compared with the results from the FDA-approved ETCO2 monitor (GE Dash 5000 [27]) to evaluate the accuracy of the proposed system. In the preliminary test, five trials were performed with two different viewing angles (anterior and lateral) of the animals in the supine position. The RR results from both viewing angles were compared with the reference standard, and the anterior view proved to be more accurate than the lateral view (see Table 1). Based on the preliminary test results, the anterior view was selected for the detailed experiments. The full evaluation was performed using the anterior view of 30 animal videos, each with seven stages simulating medical conditions (see Fig. 3). The RR values were recorded every 10 s for evaluation.

Table 1 Average accuracy of the RR results from different viewing angles

Using Spearman's rho test, the proposed system produces results significantly correlated with those of the reference medical device, with a correlation coefficient of 0.92 and a p-value less than 0.001. Figure 4 shows box plots of the RR results for (a) spontaneous breathing and (b) spontaneous plus machine-aided breathing from the proposed method and the reference medical device, showing that the proposed system produces highly correlated and stable results.

Fig. 4

The box plots of the RR measurements by the reference medical device and the proposed system in two different breathing stages, where outliers > 1.5 × interquartile range are marked with a dot and outliers > 3 × interquartile range are marked with an asterisk, including a the spontaneous breathing stage and b the spontaneous plus machine-aided breathing stage. The results show that the proposed method produces results correlated with the medical monitoring device and generates comparably fewer outliers

3.3 Discussion

In our experiments, the proposed system was found to be more sensitive and stable than the reference medical device. Figure 5 illustrates examples from our experiments. Figure 5a shows that in Stage 1 the spontaneous breathing results from the proposed system are highly consistent with the reference device, with a correlation coefficient of 0.92 and p < 0.001 using Spearman's rho test. At the end of Stage 1, there is an implausible sudden increase in the RR value from the reference medical device while the proposed system remains stable. In addition, the proposed vision-based system is able to notify medical doctors about a critical situation 10 to 20 s earlier than the reference medical device. However, human intervention in the foreground may cause minor noise for the proposed vision-based system, as observed around 100 s in Stage 1, where the results are unusual. For machine-aided breathing, the reference medical device fails to produce results because of the extremely low RR, while the proposed system adapts to that condition and successfully produces results, as shown in Fig. 5b. In the spontaneous plus machine-aided breathing stage, shown in Fig. 5c, the results from the proposed system and the reference device are also significantly correlated, with a correlation coefficient of 0.63 and p < 0.001 using Spearman's rho test. For the critical stages from spontaneous breathing to respiratory arrest (see Fig. 5d), the results from the proposed system and the reference device are highly consistent; more importantly, the proposed vision-based monitoring system is demonstrated to notify the critical situation 10 to 20 s earlier than the reference medical device, as also shown for Stage 1 in Fig. 5a. The proposed framework does not require expensive state-of-the-art equipment, yet it produces accurate results compared to the FDA-approved ETCO2 monitor. The proposed framework only requires a smartphone and a personal computer; for the stand-alone system, only a smartphone is required.

Fig. 5

a In stage 1 the spontaneous breathing results from the proposed system are highly consistent with the reference medical device. b the reference medical device fails to produce results because of low RR while the proposed system adapts to that condition and successfully produces results. c Results from the proposed system and the reference device are also significantly correlated. d For the critical stages from spontaneous breathing to respiratory arrest, the results from the proposed system and the reference device are highly consistent, but more importantly the proposed vision-based monitoring system is demonstrated to be able to notify the critical situation 10 to 20 s earlier than the reference medical device

4 Conclusion

In this paper, we present an improved camera-based respiration monitoring system, which includes real-time (1) adaptive breathing motion detection, (2) adaptive region of interest detection to eliminate environmental noise, (3) breathing and body movement classification, (4) respiration rate estimation, (5) monitoring of change in respiration rate to examine the overall health of an individual, and (6) online adaptation to lighting. The proposed method has been thoroughly evaluated using 30 animal videos with simulation of various breathing conditions such as spontaneous breathing, machine-aided breathing, a combination of spontaneous and machine-aided breathing, and respiratory arrest. The proposed method produces RR measurements highly consistent with the reference contact-based medical monitoring device, and more importantly it is demonstrated to produce alarms 10 to 20 s earlier than the conventional medical device. As a limitation of the system, in our simulation tests it was found that a continuously and cyclically moving fan in the scene can influence the system, as shown in Fig. 6. In future work, we would like to improve the system by integrating the concept of a reasonable moving speed to distinguish breathing behaviour from other cyclical movements. Another possible improvement is to use deep learning-based image processing to extract breathing features.

Fig. 6

System limitation. A continuous and cyclic moving fan in the scene would influence the system. In the future work, we would like to improve the system by integrating the concept of reasonable moving speed to distinguish breathing behavior from other cyclical movements