# Respiratory Rate Estimation from the Built-in Cameras of Smartphones and Tablets


## Abstract

This paper presents a method for respiratory rate estimation using the camera of a smartphone, an MP3 player or a tablet. The iPhone 4S, iPad 2, iPod 5, and Galaxy S3 were used to estimate respiratory rates from the pulse signal derived from a finger placed on the camera lens of these devices. Prior to estimation of respiratory rates, we systematically investigated the optimal signal quality of these four devices by dividing the video camera’s resolution into 12 different pixel regions. We also investigated the optimal signal quality among the red, green and blue color bands for each of these 12 pixel regions for all four devices. It was found that the green color band provided the best signal quality for all four devices and that the left-half VGA pixel region was the best choice only for the iPhone 4S. For the other three devices, smaller 50 × 50 pixel regions provided signal quality better than or equal to that of the larger pixel regions. Using the green signal and the optimal pixel regions derived for the four devices, we then investigated the suitability of the smartphones, the iPod 5 and the tablet for respiratory rate estimation using three different computational methods: the autoregressive (AR) model, variable-frequency complex demodulation (VFCDM), and continuous wavelet transform (CWT) approaches. Specifically, these time-varying spectral techniques were used to identify the frequency and amplitude modulations, as they contain respiratory rate information. To evaluate the performance of the three computational methods and the pixel regions for the optimal signal quality, data were collected from 10 healthy subjects. It was found that the VFCDM method provided good estimates of breathing rates that were in the normal range (12–24 breaths/min). Both the CWT and VFCDM methods provided reasonably good estimates for breathing rates higher than 26 breaths/min, but their accuracy degraded concomitantly with increased respiratory rates.
Overall, the VFCDM method provided the best results for accuracy (smaller median error), consistency (smaller interquartile range of the median value), and computational efficiency (less than 0.5 s on 1 min of data using a MATLAB implementation) to extract breathing rates that varied from 12 to 36 breaths/min. The AR method provided the least accurate respiratory rate estimation among the three methods. This work illustrates that both heart rates and normal breathing rates can be accurately derived from a video signal obtained from smartphones, an MP3 player and tablets with or without a flashlight.

## Keywords

Respiratory rate estimation · Autoregressive model · Continuous wavelet transform · Variable frequency complex demodulation method · Smartphone · Tablet

## Introduction

Respiratory rate is an important indicator for early detection and diagnosis of potentially dangerous conditions such as sleep apnea,24 sudden infant death syndrome,18 cardiac arrest3 and chronic obstructive pulmonary disease.5 In addition, for some patients who undergo surgery, relative changes in respiratory rates are much greater than changes in heart rate or systolic blood pressure, thus, respiratory rates can be an important vital sign indicator.21 Respiratory rate is most accurately measured using transthoracic impedance plethysmography,1 nasal thermocouples20 or capnography.16 However, these methods all require expensive external sensors which may require donning a mask, nasal cannula or chest band sensors. More importantly, since these devices may disturb natural breathing and sleep positions, they are mostly applicable in constrained environments such as operating rooms and intensive care units.

Recently, photoplethysmography (PPG) has been widely considered for respiratory rate extraction due to its simplicity and non-invasive measurement capability.11, 12, 13 The PPG signal contains components that are synchronous with respiratory and cardiac rhythms. Indeed, the respiratory rhythm is modulated by frequency and/or amplitude of the cardiac rhythm. The occurrence of temporal variations of frequency and amplitude is characteristic of the respiratory sinus arrhythmia.6 Thus, the respiratory rate can be obtained by detecting the presence of either amplitude modulation (AM) or frequency modulation (FM) in the PPG signal.2
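The AM/FM mechanism described above can be illustrated with a short synthetic sketch (all signal parameters here are illustrative, not taken from the study): a 1.2 Hz "cardiac" oscillation is amplitude- and frequency-modulated at a 0.3 Hz "respiratory" rate, and the spectrum of its envelope recovers that rate.

```python
import numpy as np
from scipy.signal import hilbert

fs = 30.0                          # typical smartphone video frame rate (Hz)
t = np.arange(0, 60, 1 / fs)       # one 1-min segment
f_hr, f_resp = 1.2, 0.3            # illustrative cardiac (72 bpm) and respiratory (18 breaths/min) rates

# Cardiac oscillation whose amplitude (AM) and phase (FM) both vary at the
# respiratory rate, mimicking respiratory sinus arrhythmia in a PPG/PS.
am = 1.0 + 0.2 * np.sin(2 * np.pi * f_resp * t)
fm = 0.5 * np.sin(2 * np.pi * f_resp * t)
ps = am * np.cos(2 * np.pi * f_hr * t + fm)

# The analytic-signal envelope isolates the AM component; its spectrum
# then peaks at the respiratory frequency.
envelope = np.abs(hilbert(ps))
freqs = np.fft.rfftfreq(len(t), 1 / fs)
spectrum = np.abs(np.fft.rfft(envelope - envelope.mean()))
resp_est = freqs[np.argmax(spectrum)]   # ≈ 0.3 Hz, i.e. 18 breaths/min
```

The same idea applied to the FM component (the instantaneous frequency of the analytic signal) gives a second, independent estimate of the respiratory rate.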

Numerous advanced signal processing algorithms (both parametric and nonparametric approaches) have been applied to extract respiratory rates by looking for AM or FM signatures from a PPG signal.2,19 For a parametric approach, the autoregressive (AR) model approach has been shown to provide relatively good respiratory rate estimation.7, 8, 9, 10 For nonparametric approaches, time–frequency spectrum (TFS) methods such as continuous wavelet transform (CWT) and variable frequency complex demodulation method (VFCDM) have also been shown to provide accurate respiratory rate estimation.2,11, 12, 13

To our knowledge, respiratory rate estimation using the camera of either a smartphone or a tablet has never been demonstrated or discussed in the literature. We have recently demonstrated that a pulsatile signal (PS) with dynamics similar to those of a PPG signal can be obtained from a smartphone’s camera when a fingertip is pressed onto it.4,19 Utilizing the PS derived from an iPhone, we have also shown that accurate detection of atrial fibrillation can be made.17 Given these advances, the aims of this work were: (1) to systematically examine the quality of the PS derived from the video cameras of several measurement modalities, including the iPhone 4S, iPad 2, iPod 5, and Galaxy S3; and (2) to determine whether accurate respiratory rates can be estimated directly from the PS of the different measurement modalities. The challenge here is that PPG signals are often sampled at greater than 100 Hz, whereas most smartphones’ video sampling rates are no more than 30 Hz. Since previous studies have shown good estimation of respiratory rates using the AR model, CWT, and VFCDM from a PPG signal, we also use these methods to compare the accuracy of breathing rates obtained from the PS of various models of smartphone, MP3 player (iPod 5) and tablet.

## Methods

### Data Collection

Data were collected from 10 healthy subjects on 2 separate occasions using 4 different devices: iPhone 4S, iPad 2, iPod 5, and Galaxy S3. Only two devices were used simultaneously for data collection in a given experimental setting. Worcester Polytechnic Institute’s Institutional Review Board approved the data collection technique. For the PS acquisition, we used the Objective-C programming language and the Xcode platform for the iPhone 4S, iPad 2, and iPod 5; Java was used for the Galaxy S3 on the mobile platform Android 4.1 (Jelly Bean). Specifically, we used the Eclipse IDE Indigo R2 as the development environment and a Samsung Galaxy S3 for development and debugging purposes. For the video recordings of the iPhone, iPad, and iPod, we examined four different sizes of pixel regions: 50 × 50, 320 × 240 (QVGA), 640 × 240 (vertical HVGA), and 640 × 480 (VGA) for determining the optimal signal quality. For each pixel region size, the PS was obtained by averaging over the entire pixel region for each of the three color bands (red, green and blue) for every frame. All four devices provided sampling rates close to 30 frames per second. However, when the video sampling rate was lower than 30 Hz, a cubic spline algorithm was used to interpolate the signal to 30 Hz.
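The per-frame averaging and spline interpolation described above can be sketched as follows; the frame stack and the irregular timestamps are synthetic stand-ins for real camera output, and the specific rates are illustrative.

```python
import numpy as np
from scipy.interpolate import CubicSpline

rng = np.random.default_rng(0)

# Hypothetical video: n_frames frames of a 50 x 50 RGB region
n_frames = 90
frames = rng.integers(0, 256, size=(n_frames, 50, 50, 3)).astype(float)

# One PS sample per frame: mean of the green channel over the whole region
ps = frames[:, :, :, 1].mean(axis=(1, 2))

# Frames arrive at a slightly irregular rate below 30 Hz; resample the PS
# onto a uniform 30 Hz grid with a cubic spline, as described in the text.
t_actual = np.cumsum(rng.uniform(1 / 29, 1 / 25, n_frames))  # irregular timestamps (s)
cs = CubicSpline(t_actual, ps)
t_uniform = np.arange(t_actual[0], t_actual[-1], 1 / 30)
ps_30hz = cs(t_uniform)
```

In a real app the per-frame green-channel mean would be computed on-device for each captured frame, so only one scalar per frame needs to be stored.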

### Extraction of Respiratory Rates

#### VFCDM

Detection of AM and FM from a PS using the power spectral density (PSD) is difficult since the dynamics are time-varying, hence, require high resolution time–frequency spectral (TFS) methods to resolve them. We have recently shown that because the VFCDM method provides one of the highest TFS resolutions, it can identify AM and FM dynamics. Consequently, Fourier transform of either the AM or FM time series extracted from the heart rate frequency band can lead to accurate estimation of respiratory rates when the acquired signal is PPG data.23

The details of the VFCDM algorithm are provided elsewhere;23 hence, we only briefly describe the main essence of the algorithm here. The VFCDM starts with the assumption that a signal *x*(*t*) is a narrow-band sinusoidal oscillation with a center frequency *f*_{0}, instantaneous amplitude *A*(*t*), phase \( \phi (t) \), and a direct-current component *dc*(*t*), as follows:

$$ x(t) = dc(t) + A(t)\cos \left( 2\pi f_{0} t + \phi (t) \right) \quad (1) $$

The instantaneous amplitude *A*(*t*) and phase information \( \phi (t) \) can be extracted by multiplying Eq. (1) by \( e^{ - j2\pi f_{0} t} \), resulting in the following:

$$ z(t) = x(t)e^{ - j2\pi f_{0} t} = dc(t)e^{ - j2\pi f_{0} t} + \frac{A(t)}{2}e^{j\phi (t)} + \frac{A(t)}{2}e^{ - j\left( 4\pi f_{0} t + \phi (t) \right)} \quad (2) $$

This operation shifts the component of interest, centered at *f*_{0}, to zero frequency in the spectrum of *z*(*t*). If *z*(*t*) in Eq. (2) is subjected to an ideal low pass filter (LPF) with a cutoff frequency *f*_{c} < *f*_{0}, then the filtered signal *z*_{lp}(*t*) will contain only the component of interest and the following Eqs. (3a)–(3c) are obtained:

$$ z_{\text{lp}} (t) = \frac{A(t)}{2}e^{j\phi (t)} \quad (3\text{a}) $$

$$ A(t) = 2\left| z_{\text{lp}} (t) \right| \quad (3\text{b}) $$

$$ \phi (t) = \arctan \left( \frac{\operatorname{Im} \left( z_{\text{lp}} (t) \right)}{\operatorname{Re} \left( z_{\text{lp}} (t) \right)} \right) \quad (3\text{c}) $$

For the variable-frequency case, a signal *x*(*t*) with a time-varying carrier frequency *f*(*t*) can be written in the following form:

$$ x(t) = dc(t) + A(t)\cos \left( \int_{0}^{t} 2\pi f(\tau )\,d\tau + \phi (t) \right) \quad (4) $$

As in the fixed-frequency case, multiplying Eq. (4) by \( e^{ - j\int_{0}^{t} 2\pi f(\tau )\,d\tau } \) extracts the instantaneous amplitude *A*(*t*) and instantaneous phase \( \phi (t), \) as described in the following equation:

$$ z(t) = x(t)e^{ - j\int_{0}^{t} 2\pi f(\tau )\,d\tau } = dc(t)e^{ - j\int_{0}^{t} 2\pi f(\tau )\,d\tau } + \frac{A(t)}{2}e^{j\phi (t)} + \frac{A(t)}{2}e^{ - j\left( \int_{0}^{t} 4\pi f(\tau )\,d\tau + \phi (t) \right)} \quad (5) $$

If *z*(*t*) is filtered with an ideal LPF with a cutoff frequency *f*_{c} < *f*_{0}, then the filtered signal *z*_{lp}(*t*) will be obtained with the same instantaneous amplitude *A*(*t*) and phase \( \phi (t) \) as provided in Eqs. (3b) and (3c). The instantaneous frequency is given by:

$$ f(t) = f_{0} + \frac{1}{2\pi }\frac{d\phi (t)}{dt} \quad (6) $$

The VFCDM method thus involves a two-step procedure. The first step is to use complex demodulation (CDM) or what we termed the fixed frequency CDM (FFCDM) to obtain an estimate of the TFS, and the second step is to select only the dominant frequencies of interest for further refinement of the time–frequency resolution using the VFCDM approach. In the first step of the VFCDM method, a bank of LPFs is used to decompose the signal into a series of band-limited signals. The analytic signals that are obtained from these, through use of the Hilbert transform, then provide estimates of the instantaneous amplitude, frequency, and phase within each frequency band.
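A minimal sketch of the fixed-frequency demodulation step (the first stage of the procedure above) on a synthetic narrow-band signal; the zero-phase Butterworth filter stands in for the ideal LPF, and all parameter values are illustrative rather than the study's.

```python
import numpy as np
from scipy.signal import butter, filtfilt

fs = 30.0
t = np.arange(0, 60, 1 / fs)
f0 = 1.2                                        # center (cardiac) frequency, Hz
amp = 1.0 + 0.2 * np.sin(2 * np.pi * 0.3 * t)   # respiratory AM at 0.3 Hz
x = amp * np.cos(2 * np.pi * f0 * t)

# Shift the component centered at f0 down to zero frequency
z = x * np.exp(-1j * 2 * np.pi * f0 * t)

# Ideal LPF approximated by a zero-phase Butterworth with cutoff f_c < f0;
# filter real and imaginary parts separately
b, a = butter(4, 0.6 / (fs / 2))
z_lp = filtfilt(b, a, z.real) + 1j * filtfilt(b, a, z.imag)

A_est = 2 * np.abs(z_lp)                         # instantaneous amplitude
phi_est = np.unwrap(np.angle(z_lp))              # instantaneous phase
f_inst = f0 + np.gradient(phi_est, t) / (2 * np.pi)  # instantaneous frequency
```

Here `A_est` recovers the respiratory AM envelope and `f_inst` stays near the 1.2 Hz carrier; in the full VFCDM this demodulation is repeated across a bank of low-pass bands and then refined around the dominant frequencies.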

#### CWT

As described in the Introduction, numerous studies11, 12, 13 showed relatively good results using the CWT for extraction of respiratory rates directly from a pulse oximeter. The Morlet wavelet, with a half-length of five samples at the coarsest scale, was applied for estimating the scalogram of the PS.22 The procedure of the CWT for extracting respiratory rates is nearly identical to that of the VFCDM, in that the identified AM and FM series are Fourier transformed to estimate respiratory rates.
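As a sketch of the scalogram computation, the following is a hand-rolled complex Morlet CWT (not the study's implementation; the wavelet parameter `w0`, the frequency grid, and the synthetic signal are illustrative). The AM series is read off along the cardiac ridge and Fourier transformed to recover the respiratory rate.

```python
import numpy as np

def morlet_cwt(x, fs, freqs, w0=6.0):
    """Continuous wavelet transform of x with a complex Morlet mother wavelet."""
    n = len(x)
    t = (np.arange(n) - n // 2) / fs          # wavelet support centered at zero
    out = np.empty((len(freqs), n), dtype=complex)
    for i, f in enumerate(freqs):
        s = w0 / (2 * np.pi * f)              # scale giving center frequency f
        w = np.exp(1j * w0 * t / s) * np.exp(-(t ** 2) / (2 * s ** 2))
        w /= np.sqrt(s)                       # per-scale normalization
        out[i] = np.convolve(x, w, mode="same")
    return out

fs = 30.0
t = np.arange(0, 60, 1 / fs)
# AM pulse signal: cardiac band at 1.2 Hz, respiratory modulation at 0.3 Hz
x = (1 + 0.2 * np.sin(2 * np.pi * 0.3 * t)) * np.cos(2 * np.pi * 1.2 * t)
freqs = np.linspace(0.8, 1.8, 21)
scalogram = np.abs(morlet_cwt(x, fs, freqs))

# Ridge of the scalogram tracks the cardiac band; its magnitude is the AM series
ridge_idx = np.argmax(scalogram.mean(axis=1))
ridge_hz = freqs[ridge_idx]
am_series = scalogram[ridge_idx]

# Fourier transform the AM series (edges trimmed to avoid filter transients)
inner = am_series[200:-200]
fr = np.fft.rfftfreq(inner.size, 1 / fs)
resp_hz = fr[np.argmax(np.abs(np.fft.rfft(inner - inner.mean())))]
```

The FM series would be obtained analogously from the ridge frequency rather than the ridge magnitude.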

#### AR Modeling

This approach involves estimation of AR model parameters using the optimal parameter search (OPS) criteria.15 The AR parameters are formulated as the transfer function followed by factorization into pole terms. The real and complex conjugate poles define the power spectral peaks with the larger magnitude poles corresponding to higher magnitude peaks. The resonant frequency of each spectral peak is given by the phase angle of the corresponding pole. Among the poles, we set the region of interest for respiratory rates between 0.15 and 1 Hz. The details of the respiratory algorithm using the AR model are described in Lee and Chon.7
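A minimal sketch of the pole-based estimate, using an ordinary least-squares AR fit in place of the OPS criterion; the model order, the 4 Hz series rate, and the synthetic respiratory-band series are all illustrative assumptions.

```python
import numpy as np

def ar_fit(x, p):
    """Least-squares AR(p) fit: x[n] ~ sum_{k=1..p} a[k] * x[n-k]."""
    A = np.column_stack([x[p - k : len(x) - k] for k in range(1, p + 1)])
    coef, *_ = np.linalg.lstsq(A, x[p:], rcond=None)
    return coef

fs = 4.0                                 # assumed resampling rate of the AM/FM series
t = np.arange(0, 60, 1 / fs)
rng = np.random.default_rng(1)
# Synthetic respiratory-band series: a 0.3 Hz oscillation in noise
x = np.sin(2 * np.pi * 0.3 * t) + 0.1 * rng.standard_normal(t.size)

a = ar_fit(x - x.mean(), p=10)
# Poles of the AR transfer function; spectral peaks sit at the pole angles,
# and larger-magnitude poles correspond to stronger peaks
poles = np.roots(np.concatenate(([1.0], -a)))
pole_hz = np.angle(poles) / (2 * np.pi) * fs
# Keep poles in the respiratory band of interest (0.15-1 Hz) and take the
# largest-magnitude one; its angle gives the respiratory frequency
in_band = [(np.abs(p), f) for p, f in zip(poles, pole_hz) if 0.15 <= f <= 1.0]
resp_hz = max(in_band)[1]
```

The magnitude-based pole selection is what makes the method robust to weak noise poles that also fall inside the respiratory band.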

### Data Analysis

Using PPG signals with sampling rates of at least 250 Hz to derive heart rates has previously been shown to be a good alternative to ECG monitoring.14 However, the sampling rates of most smartphone and tablet video cameras range from 25 to 30 Hz. Given these low sampling rates, it is necessary to determine the accuracy of the smartphone and tablet devices in estimating heart rates and respiratory rates. Comparisons of derived heart rates were made between the standard ECG, the smartphones and the tablets. We used our own peak detection algorithm to determine R-wave peaks from the ECG signals and cardiac pulse peaks from the camera-derived PS. Due to the frame rate variability, we interpolated the PS to 30 Hz using a cubic spline algorithm, followed by the peak detection. The peak detection algorithm incorporated a filter bank with variable cutoff frequencies, spectral estimates of the heart rate, rank-order nonlinear filters and decision logic.

For the VFCDM, the first cutoff frequency was set to *F*_{w} = 0.03 Hz, the second cutoff frequency to *F*_{v} = 0.015 Hz, and the filter length to *N*_{w} = 64. We have previously shown that the parameter *F*_{v} = *F*_{w}/2, and that *N*_{w} is chosen to be approximately half the data length. For each of these categories, detection errors were found for each frequency for all subjects using the four different methods. The error *ε* is calculated as follows:

$$ \varepsilon = \frac{1}{n}\sum\limits_{i = 1}^{n} \left| R_{\text{D}}^{i} - R_{\text{T}}^{i} \right| \quad (7) $$

where *n* is the number of 1-min segments, and \( R_{\text{D}}^{i} \) and \( R_{\text{T}}^{i} \) denote the detected breathing rate and the true breathing rate of the *i*-th 1-min dataset, respectively.
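Reading the error metric as the mean absolute difference between detected and true rates over the *n* one-minute segments (the interpretation consistent with the definitions above), it can be computed as:

```python
import numpy as np

def detection_error(detected, true_rates):
    """Average absolute error across n one-minute segments."""
    detected = np.asarray(detected, dtype=float)
    true_rates = np.asarray(true_rates, dtype=float)
    return np.mean(np.abs(detected - true_rates))

# Illustrative values: five 1-min segments at a metronome rate of 18 breaths/min
eps = detection_error([17.8, 18.3, 18.0, 17.5, 18.6], [18.0] * 5)  # 0.32 breaths/min
```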

## Results

### Selection of the Best Color Band and the Optimal Video Pixel Size for Estimation of Heart Rates

Figures 2b and 2c provide details of the video pixel regions examined on all four devices and they consist of the following 12 video regions: left top (LT), left middle (LM), left bottom (LB), right top (RT), right middle (RM), right bottom (RB), middle top (MT), center (C), middle bottom (MB), vertical left half-VGA (vertical left HVGA), vertical right half-VGA (vertical right HVGA) and VGA.

Experimental results of heart rate extracted from the ECG and the three color band signals obtained from the iPhone 4S (resolution: HVGA)

| Color | PS | RRI | Median error |
|---|---|---|---|
| Blue | 0.8124 ± 0.23334 | 0.8103 ± 0.0514 | 0.0021 |
| Green | 0.8149 ± 0.19698 | 0.8103 ± 0.0514 | 0.0047 |
| Red | 0.8121 ± 0.22897 | 0.8103 ± 0.0514 | 0.0018 |

The mean amplitude values of the green color pulse signals, with flash on except for the iPad 2

| No. | Resolution | Region | iPhone 4S | iPad 2 | iPod 5 | Galaxy S3 |
|---|---|---|---|---|---|---|
| 1 | 50 × 50 | RT | 6.33 ± 1.99 | 4.78 ± 1.42\* | 2.67 ± 0.82 | 9385.85 ± 3140.96\* |
| 2 | 50 × 50 | RM | 7.02 ± 2.19 | 4.77 ± 1.42\* | 2.41 ± 0.75 | 9326.86 ± 3123.12\* |
| 3 | 50 × 50 | RB | 6.15 ± 1.94 | 2.44 ± 0.72 | 2.31 ± 0.72 | 8583.78 ± 2839.43\* |
| 4 | 50 × 50 | MT | 8.45 ± 2.64 | 4.10 ± 1.22 | 4.11 ± 1.27 | 7066.07 ± 2365.34 |
| 5 | 50 × 50 | Center | 9.05 ± 2.82 | 3.88 ± 1.16 | 2.79 ± 0.88 | 6550.41 ± 2173.4 |
| 6 | 50 × 50 | MB | 8.28 ± 2.59 | 3.07 ± 0.91 | 3.59 ± 1.12 | 3459.99 ± 1148.69 |
| 7 | 50 × 50 | LT | 9.42 ± 2.94 | 3.53 ± 1.06 | 5.79 ± 1.79\* | 5682.13 ± 1910.77 |
| 8 | 50 × 50 | LM | 10.49 ± 3.26 | 2.89 ± 0.85 | 6.23 ± 1.92\* | 3969.18 ± 1315.59 |
| 9 | 50 × 50 | LB | 9.61 ± 3.01 | 4.05 ± 1.21 | 5.04 ± 1.57 | 1605.74 ± 525.84 |
| 10 | HVGA | Right | 8.67 ± 2.54 | 4.74 ± 1.39\* | 3.53 ± 1.02 | 7595.58 ± 2521.62 |
| 11 | HVGA | Left | 11.37 ± 3.32\* | 3.78 ± 1.11 | 5.17 ± 1.49 | 2766.16 ± 915.96 |
| 12 | VGA | Full | 9.05 ± 2.65 | 3.11 ± 0.91 | 5.75 ± 1.66\* | 5168.72 ± 1715.26 |

\* Highest mean amplitude for that device, or statistically equivalent to the highest.

### Heart Rate, Frequency Spectrum and Power Spectrum

Figures 4a–4c show a representative PS, its time–frequency spectrum obtained *via* the VFCDM, and the PSD of the AM and FM signals derived from the HR frequency band (e.g., ~1 to 1.5 Hz), respectively, while a subject was breathing at a metronome rate of 18 breaths/min. Note the similarity of the PS in Fig. 4a to commercially-available PPG signals. As shown in Fig. 4c, the PSD of the extracted AM and FM time series shows the largest peaks at 0.3 Hz; these peaks correspond accurately to the true respiratory rate of 18 breaths/min.

### Respiratory Rate

The true respiratory rates were derived by taking the PSD of the respiratory impedance trace signals during the metronome breathing experiments. True respiratory rates from the respiration trace and the breathing rates estimated from the green signal using both the FM and AM sequences from the VFCDM and CWT were compared at metronome rates ranging from 0.2 to 0.9 Hz. In order to evaluate the computational methods, we provide figures and tables that show the accuracy and repeatability of each method as a function of the true breathing rate. For tabulating results, we grouped the results for 0.2–0.3 Hz together and designated them as the low frequency (LF) breathing rates. Likewise, the results for 0.4–0.6 Hz breathing rates were lumped together and designated as the high frequency (HF) breathing rates. Since the percentage errors were found to be not normally distributed, we report the median and interquartile range (IQR) values.
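Because the errors are summarized by median and IQR rather than mean and variance, a small sketch (with made-up error values) shows how a single outlier leaves these robust statistics nearly unaffected:

```python
import numpy as np

# Illustrative percentage errors for one method/rate group; 15.2 is an outlier
errors = np.array([2.1, 3.4, 1.8, 2.9, 15.2, 2.5, 3.1, 2.2, 2.8, 3.0])

median = np.median(errors)                  # 2.85: barely moved by the outlier
q1, q3 = np.percentile(errors, [25, 75])
iqr = q3 - q1                               # 0.8: spread of the central half
```

By contrast, the mean of the same sample is pulled up to about 3.9, which is why median/IQR reporting is preferred for non-normally distributed errors.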

As shown in Fig. 5, the AR model approach is the least accurate, followed by CWT-AM, CWT-FM, and VFCDM (both AM and FM approaches), when we consider all breathing frequencies. Note that the variances of the median values as determined by *ε* [the average respiratory estimation error as defined in Eq. (7)] are significantly lower for both VFCDM and CWT than for the AR model approach. Although there was no significant difference in the median error between the CWT and VFCDM methods at 0.3 Hz, *ε* was found to be the lowest for VFCDM-FM at 0.2 Hz. In general, *ε* is larger for HF than for LF breathing rates for all computational methods. For HF breathing rates, *ε* is lowest for CWT-FM, followed by VFCDM, CWT-AM, and the AR model. While there is no significant difference in variance between VFCDM-FM and CWT-FM, both methods have significantly less variance than CWT-AM, VFCDM-AM or the AR model. Thus, gauging the accuracy as defined by the median errors and their variances, as shown in Fig. 5, we observed that for HF breathing rates, CWT-FM consistently provides the lowest median errors and variance values.

Figure 5 also presents the subjects’ variation of percentage detection errors in the form of box plots, extracted from the front cameras of an iPhone 4S and an iPad 2 (no flash), respectively, for the left HVGA region. While not shown, the left HVGA region also had the best signal quality with the flashlight off for the iPhone 4S. The AR model approach is the least accurate, followed by the CWT and VFCDM methods, when we consider all breathing frequencies. For LF breathing rates, there was no significant difference in the median error between the VFCDM methods. However, the variances of the median values as determined by *ε* are significantly lower for both VFCDM and CWT than for the AR model approach. In general, *ε* is larger for HF than for LF breathing rates. For HF breathing rates, *ε* is lowest for CWT-FM, followed by VFCDM, CWT-AM, and the AR model. While there is no significant difference in variance between VFCDM-FM and VFCDM-AM for LF breathing rates, the median errors of VFCDM-FM are significantly lower than those of VFCDM-AM. Thus, gauging the accuracy as defined by the median errors and their variances, as shown in Fig. 5, we observed that for both LF and HF breathing rates, CWT-FM consistently provides the lowest median errors and variance values.

Figure 5 also shows the subjects’ variation of percentage detection errors in the form of box plots, extracted from the front cameras of a Galaxy S3 and an iPod 5, respectively, both from 50 × 50 pixel regions (LT for the former and LM for the latter). The AR model approach is the least accurate, followed by the CWT and VFCDM methods, when we consider all breathing frequencies. For LF breathing rates, there was no significant difference in the median error between the VFCDM methods. However, the variances of the median values as determined by *ε* are significantly lower for both VFCDM and CWT than for the AR model approach. *ε* is larger for HF than for LF breathing rates. For HF breathing rates, *ε* is lowest for CWT-FM. While there is no significant difference in variance between VFCDM-FM and VFCDM-AM for LF breathing rates, the median errors of VFCDM-FM are significantly lower than those of VFCDM-AM. Thus, gauging the accuracy as defined by the median errors and their variances, as shown in Fig. 5, we observed that for both LF and HF breathing rates, VFCDM-FM most often provides the lowest median errors and variance values.

The best-performing device showed lower IQR values (*ε* < 7) and median errors (*ε* < 6) than the other devices in the 0.2–0.4 Hz breathing rate range.

Population statistics for IQR detection errors for each method

| Device | Breaths/min | AR | WT-AM | WT-FM | CDM-FM | CDM-AM |
|---|---|---|---|---|---|---|
| iPhone 4S | 12 | 1.06 ± 0.55 | 1.52 ± 0.8 | 3.65 ± 1.89 | 3.17 ± 1.65 | 1.04 ± 0.53 |
| | 18 | 0.94 ± 0.47 | 2.25 ± 1.12 | 2.08 ± 1.14 | 1.84 ± 1.01 | 3.95 ± 2 |
| | 24 | 1.28 ± 0.65 | 6.12 ± 3.12 | 5.86 ± 3.08 | 3.76 ± 1.89 | 5.24 ± 2.67 |
| | 30 | 1.95 ± 1.02 | 11.54 ± 5.8 | 4.82 ± 2.5 | 8.87 ± 4.47 | 9.03 ± 4.86 |
| | 36 | 2.48 ± 1.32 | 4.57 ± 2.43 | 6.38 ± 3.46 | 7.02 ± 3.51 | 7.44 ± 3.94 |
| iPad 2 | 12 | 0.59 ± 0.3 | 2.69 ± 1.38 | 7.96 ± 4.08 | 5.18 ± 2.84 | 4.58 ± 2.39 |
| | 18 | 0.83 ± 0.42 | 3.03 ± 1.63 | 3.66 ± 1.92 | 1.89 ± 1.03 | 2.84 ± 1.45 |
| | 24 | 2.15 ± 1.17 | 5.94 ± 2.98 | 6.25 ± 3.22 | 4.4 ± 2.2 | 2.01 ± 1.02 |
| | 30 | 3.21 ± 1.7 | 11.24 ± 5.83 | 5.98 ± 3.2 | 8.01 ± 4.01 | 9.2 ± 4.8 |
| | 36 | 2.45 ± 1.28 | 8.93 ± 4.48 | 6.95 ± 3.54 | 9.15 ± 4.6 | 4.34 ± 2.23 |
| Galaxy S3 | 12 | 0.42 ± 0.22 | 1.26 ± 0.64 | 2.1 ± 1.05 | 1.68 ± 0.92 | 1.09 ± 0.55 |
| | 18 | 0.41 ± 0.21 | 4.31 ± 2.31 | 5.99 ± 3 | 6.04 ± 3.12 | 4.25 ± 2.2 |
| | 24 | 0.15 ± 0.08 | 7.96 ± 4.03 | 6.48 ± 3.24 | 5.28 ± 2.66 | 5.79 ± 2.89 |
| | 30 | 0.42 ± 0.22 | 7.02 ± 3.51 | 7.55 ± 3.8 | 5.01 ± 2.58 | 2.93 ± 1.48 |
| | 36 | 0.69 ± 0.35 | 9.94 ± 4.97 | 14.07 ± 7.13 | 7.79 ± 3.99 | 7.93 ± 4.18 |
| iPod 5 | 12 | 8.64 ± 4.72 | 7.48 ± 3.79 | 4.42 ± 2.21 | 3.29 ± 1.66 | 5.49 ± 3.05 |
| | 18 | 0.4 ± 0.2 | 3.88 ± 1.97 | 4.96 ± 2.59 | 4.09 ± 2.06 | 2.1 ± 1.07 |
| | 24 | 0.67 ± 0.34 | 4.54 ± 2.27 | 2.38 ± 1.29 | 4.57 ± 2.4 | 7.87 ± 4.05 |
| | 30 | 0.38 ± 0.19 | 5.38 ± 2.77 | 6.57 ± 3.29 | 6.77 ± 3.46 | 12.21 ± 6.51 |
| | 36 | 0.9 ± 0.45 | 9.34 ± 4.68 | 19.38 ± 9.97 | 11.19 ± 5.74 | 11.8 ± 5.96 |

*t* tests were used for analysis of differences between the medians for the different methods. Statistical significance was identified as *p* < 0.05. Tables 4 and 5 provide a summary of the statistical analysis comparing the performance of the five methods (AR, CWT-AM, CWT-FM, VFCDM-AM and VFCDM-FM) to each other. For Tables 4 and 5, we list only those comparisons that showed a significant difference among the five computational methods for each device, for both the LF and HF breathing ranges. Regarding accuracy, for both the LF and HF breathing ranges, the tables show that the AR method is significantly less accurate than either the AM or the FM version of the CWT and VFCDM methods for all four devices. Further, the FM versions of CWT and VFCDM are significantly more accurate than their AM counterparts for all four devices, but only for the HF breathing range. The repeatability results shown in Table 5 are similar to the accuracy results. For example, for both the LF and HF breathing ranges, the AR method is significantly less repeatable than either the AM or FM versions of the CWT and VFCDM methods for all four devices. For the HF breathing range, the FM versions of CWT and VFCDM are significantly more repeatable than their AM counterparts for all four devices.

Statistical significance (accuracy) among the five methods for the four devices

| Device | LF | HF |
|---|---|---|
| iPhone 4S | AR vs. VFCDM-AM; AR vs. VFCDM-FM; AR vs. CWT-AM; AR vs. CWT-FM | AR vs. VFCDM-AM; AR vs. VFCDM-FM; AR vs. CWT-AM; AR vs. CWT-FM; VFCDM-AM vs. VFCDM-FM; VFCDM-AM vs. CWT-FM; CWT-AM vs. VFCDM-FM; CWT-AM vs. CWT-FM |
| iPad 2 | | AR vs. VFCDM-AM; AR vs. VFCDM-FM; AR vs. CWT-AM; AR vs. CWT-FM; VFCDM-AM vs. VFCDM-FM; VFCDM-AM vs. CWT-FM; CWT-AM vs. CWT-FM |
| iPod 5 | AR vs. VFCDM-AM; AR vs. VFCDM-FM; AR vs. CWT-AM; AR vs. CWT-FM; CWT-AM vs. VFCDM-FM | AR vs. VFCDM-AM; AR vs. VFCDM-FM; AR vs. CWT-AM; AR vs. CWT-FM; VFCDM-AM vs. VFCDM-FM; VFCDM-AM vs. CWT-FM; CWT-AM vs. CWT-FM |
| Galaxy S3 | AR vs. VFCDM-AM; AR vs. VFCDM-FM; AR vs. CWT-AM; AR vs. CWT-FM | AR vs. VFCDM-AM; AR vs. VFCDM-FM; AR vs. CWT-AM; AR vs. CWT-FM; VFCDM-AM vs. VFCDM-FM; VFCDM-AM vs. CWT-FM; CWT-AM vs. CWT-FM |

Statistical significance (repeatability across time) among the five methods for the four devices

| Device | LF | HF |
|---|---|---|
| iPhone 4S | AR vs. VFCDM-AM; AR vs. VFCDM-FM; AR vs. CWT-AM; AR vs. CWT-FM | AR vs. VFCDM-AM; AR vs. VFCDM-FM; AR vs. CWT-AM; AR vs. CWT-FM; VFCDM-AM vs. CWT-FM |
| iPad 2 | | AR vs. VFCDM-AM; AR vs. VFCDM-FM; AR vs. CWT-AM; AR vs. CWT-FM; VFCDM-AM vs. CWT-FM |
| iPod 5 | AR vs. VFCDM-AM; AR vs. VFCDM-FM; AR vs. CWT-AM; AR vs. CWT-FM | AR vs. VFCDM-AM; AR vs. VFCDM-FM; AR vs. CWT-AM; AR vs. CWT-FM; VFCDM-AM vs. CWT-FM |
| Galaxy S3 | AR vs. VFCDM-AM; AR vs. VFCDM-FM; AR vs. CWT-AM; AR vs. CWT-FM | |

Accuracy as determined by median errors at 42, 48, and 54 breaths/min (iPhone 4S, flashlight on). The error values listed for each method represent breaths/min

| Breaths/min | Error | AR | WT-AM | WT-FM | CDM-FM | CDM-AM |
|---|---|---|---|---|---|---|
| 42 (0.7 Hz) | Median | 40.05 ± 0.41 | 21.58 ± 9.14 | 5.58 ± 5.16 | 16.05 ± 4.58 | 24.21 ± 6.33 |
| | IQR | 0.72 ± 0.38 | 19.89 ± 10.15 | 7.22 ± 3.87 | 9.27 ± 4.72 | 5.17 ± 2.59 |
| 48 (0.8 Hz) | Median | 45.69 ± 1.21 | 32.61 ± 4.65 | 24.06 ± 9.67 | 24.74 ± 4.08 | 28.53 ± 6.82 |
| | IQR | 0.68 ± 0.35 | 9.3 ± 4.97 | 14.07 ± 7.04 | 4.61 ± 2.32 | 6.25 ± 3.15 |
| 54 (0.9 Hz) | Median | 51.49 ± 1.46 | 38.14 ± 4.9 | 36.38 ± 3.55 | 32.8 ± 4.87 | 33.24 ± 8.93 |
| | IQR | 0.41 ± 0.22 | 6.79 ± 3.68 | 6.93 ± 3.51 | 6.05 ± 3.07 | 11.77 ± 6.28 |

The variances of the median values as determined by *ε* are significantly lower for both VFCDM and CWT than for the AR model approach. In the normal range (11–27 breaths/min), VFCDM-FM consistently provides the lowest median errors and variance values. As shown in Table 7, there was no significant difference in the median error among WT-AM, WT-FM, VFCDM-FM, and VFCDM-AM during spontaneous breathing; the accuracy of AR was lower than that of the other approaches.

Statistical significance (accuracy and repeatability across time) among the five methods for spontaneous respiratory rates

| Accuracy | Repeatability across time |
|---|---|
| AR vs. VFCDM-AM; AR vs. VFCDM-FM; AR vs. WT-AM; AR vs. WT-FM | AR vs. VFCDM-AM; AR vs. VFCDM-FM; AR vs. WT-AM; AR vs. WT-FM |

In general, the ability of the methods to provide consistent results is best for the CWT-FM and VFCDM methods, for both LF and HF breathing rates. As with the accuracy results, the repeatability is also better for the LF than for the HF breathing rates for all four methods. Both CWT-FM and VFCDM provide significantly more repeatable results than either CWT-AM or the AR model.

### Computation Time

Frame-processing rates achieved for heart rate extraction from the color band signals of the iPhone 4S at different resolutions

| Resolution | Color | Processing rate |
|---|---|---|
| 320 × 240 (QVGA) | Green | 25 frames/s |
| 480 × 320 (HVGA) | Green | 25 frames/s |
| 480 × 320 (HVGA) | Green and red | 23 frames/s |
| 480 × 320 (HVGA) | 3 colors | 20 frames/s |
| 640 × 480 (VGA) | Green or red | 19 frames/s |

The clock speed of the CPU used in the iPhone 4S and iPod 5 is 800 MHz. The latest iPhone 5 is clocked at 1.02 GHz. The recently released Samsung Galaxy S4 is equipped with a 1.9 GHz Quad-core processor. Thus, for most new smartphone and tablet cameras, higher than 30 frames/s can be achieved, suggesting that a choice of higher pixel resolution will not be a significant problem for accurate and real-time detection of heart rates and respiratory rates.

## Discussion

In this work, we tested several smartphones and tablets for their feasibility in estimating respiratory rates using the PS derived from a resident video camera and flashlight, when available. The motivation for this work is based on several recent works which showed that accurate respiratory rates, especially at normal breathing rates, can be obtained from pulse oximeters.11, 12, 13 The characteristics of the PS derived from cameras in smartphones and tablets are similar to PPG signals, hence, similarly-accurate respiratory rates can be obtained, theoretically. Our results do indicate that certainly for normal breathing ranges (0.2–0.3 Hz), this is feasible from PS derived from smartphone and tablet video cameras.

We optimized the accuracy of the respiratory rates by first systematically analyzing the optimal pixel resolution of the video signal for attainment of the strongest PS. It is logical to assume that the greater the amplitude of the PS, the higher the signal’s strength, with the proviso that care is taken to minimize motion artifacts during measurements. Our results showed that a choice of larger pixel resolutions does not necessarily result in higher PS amplitude. For example, for the Galaxy S3, iPod 5 and iPad 2, the 50 × 50 resolution provided either the highest pulsatile amplitude or one statistically equivalent to that of the HVGA resolution. In fact, the HVGA resolution was the best choice only for the iPhone 4S. The important implication of a smaller pixel region providing signal quality as good as or better than that of a larger pixel region is the significant reduction in computational time, so that real-time calculation of respiratory rates can be attained.

Commercial pulse oximeters in either transmittance or reflectance mode normally employ a single photodetector (PD) element, typically with an active area of about 6–10 mm^{2}. The image sensor size of the iPhone 4S is 4.54 × 3.42 = 15.5268 mm^{2}. Consequently, when signals are extracted in HVGA (320 × 480 pixels) video mode, the active area is 2.27 × 3.42 = 7.7634 mm^{2}. Hence, we initially thought that motion artifact and noise could be significantly reduced by increasing the active area of the sensor. However, our investigation revealed that larger pixel resolutions do not necessarily result in a higher signal-to-noise ratio.

We compared the AR-based approach, CWT, and VFCDM for respiratory rate estimation from smartphones and a tablet because these techniques have been shown to provide good results from PPG signals. Consistent with the PPG results, the VFCDM-FM provided the most accurate respiratory rate estimation, with a faster computational time than any of the other methods compared in this study, for the LF breathing rates. For HF breathing rates, both the CWT and VFCDM methods provided comparable results. The CWT approach using either the FM or AM signals fared better than the AR method, but at the expense of higher computational time.

Due to the inherent non-stationarity in the respiratory rate, a time–frequency method is needed and appears to be the most appropriate approach. Another advantage of the TFS approach to estimating respiratory rates is that unlike most filtering approaches, tuning of a number of parameters specific to each subject is not required. Note that in our work, we have used the same parameters (as described in Methods section) for both CWT and VFCDM for all subjects and for all breathing rates.

As was the case with respiratory rate estimation using the PPG signal, the computational speed of the VFCDM method is faster than that of the wavelet method for smartphone and tablet data. The average time to calculate the respiration frequency using the VFCDM method was found to be around 1.4 s, while using the wavelet method took 37.8 s on average (programs running on MATLAB R2012a). The AR spectral method was the fastest as it took only 0.2 s on average using MATLAB, and this computation time includes the time needed to calculate the model order based on an initial model order selection of 50. However, the AR method is the least accurate in respiratory rate estimation.

All three methods showed increased estimation errors as the breathing rates increased, for all devices tested. This observation was also noted for the PPG signal.2 We also examined breathing rates of 0.7, 0.8 and 0.9 Hz, and the results showed significant deviation from the true breathing rates for all three methods. Both the CWT and VFCDM methods provided comparable results, with significantly worse estimates for the AR method, as was also the case for both the LF and HF breathing rates. Hence, our results show that it is feasible to obtain good results for normal breathing rates but not for higher breathing rates (i.e., >0.5 Hz). We can speculate that there are two reasons for inaccurate results at high breathing rates. First, detection of both AM and FM phenomena requires persistent oscillations over several cycles, and with faster respiratory rates, our decision to limit the data segment to 1 min may not be sufficient. More importantly, with faster breathing rates the AM or FM phenomenon becomes less apparent, and thus it becomes more difficult to detect, no matter how sophisticated the detection method may be.

We have not considered device-to-device variation in obtaining respiratory rates. However, we do not believe this is a concern, because the camera specifications are tightly controlled by the manufacturers and should therefore vary little, if at all, from one device to another. Moreover, the examined cameras have more than sufficient pixel resolution to resolve pulsatile changes; hence, small variations in pixel resolution, if they occur, should not materially affect the respiratory rate estimates. Thus, we believe device-to-device variation is minimal or absent and should not affect respiratory rate estimation.

In summary, our work was undertaken to determine the optimal pixel resolution, location, and color band for obtaining the best quality signal, so as to maximize the accuracy of respiratory rate estimation from a video signal recorded by either a smartphone or a tablet. It was found that a larger pixel resolution does not necessarily result in better signal quality; in fact, in most scenarios a 50 × 50 pixel region was as good as or better than HVGA resolution. In addition, we found that the region closest to the flash in most cases yielded higher signal quality, which is logical and expected. Finally, using the optimal pixel size, location, and color band of the PS, we obtained accurate respiratory rate estimates, especially in the normal breathing range. Estimation errors increased as respiratory rates rose above 0.5 Hz, with unreliable results at 0.6 Hz or higher. When both computational time and estimation accuracy are taken into account, VFCDM-FM provided the best results among all approaches examined in this work. This work enables at least two vital sign measurements, heart rate and respiratory rate, directly from a finger pressed onto the video camera of either a smartphone or a tablet. It is expected that future work by our laboratory or others will yield additional vital sign capabilities directly from such video signals.
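For completeness, forming the PS from the chosen pixel region and color band reduces to averaging the green channel over that region in every frame. The sketch below assumes frames are already decoded into an `(n_frames, height, width, 3)` RGB array; the function name, default region offset, and channel ordering are our assumptions, and in practice the region offset would be chosen near the LED flash as discussed above.

```python
import numpy as np

def green_ps_signal(frames, row0=0, col0=0, size=50):
    """Mean green intensity of a size x size region in each frame.

    frames: (n_frames, height, width, 3) array, RGB channel order assumed.
    Returns one PS sample per frame.
    """
    region = frames[:, row0:row0 + size, col0:col0 + size, 1]
    return region.reshape(len(frames), -1).mean(axis=1)
```

Spatial averaging over the 50 × 50 region suppresses per-pixel sensor noise, which is one reason a small, well-placed region can match or beat a larger one.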

## Notes

### Acknowledgments

This work was supported in part by the US Army Medical Research and Materiel Command (USAMRMC) under Grant No. W81XWH-12-1-0541.

## References

- 1. Allison, R., E. Holmes, and J. Nyboer. Volumetric dynamics of respiration as measured by electrical impedance plethysmography. *J. Appl. Physiol.* 19:166–173, 1964.
- 2. Chon, K. H., S. Dash, and K. Ju. Estimation of respiratory rate from photoplethysmogram data using time–frequency spectral estimation. *IEEE Trans. Biomed. Eng.* 56:2054–2063, 2009.
- 3. Fieselmann, J. F., M. S. Hendryx, C. M. Helms, and D. S. Wakefield. Respiratory rate predicts cardiopulmonary arrest for internal medicine inpatients. *J. Gen. Intern. Med.* 8:354–360, 1993.
- 4. Grimaldi, D., Y. Kurylyak, F. Lamonaca, and A. Nastro. Photoplethysmography detection by smartphone’s videocamera. In: The 6th IEEE International Conference on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications, 2011.
- 5. Hasselgren, M., M. Arne, A. Lindahl, S. Janson, and B. Lundbäck. Estimated prevalences of respiratory symptoms, asthma and chronic obstructive pulmonary disease related to detection rate in primary health care. *Scand. J. Prim. Health Care* 19:54–57, 2001.
- 6. Hirsch, J., and B. Bishop. Respiratory sinus arrhythmia in humans: how breathing pattern modulates heart rate. *Am. J. Physiol. Heart Circ. Physiol.* 241:H620–H629, 1981.
- 7. Lee, J., and K. Chon. Respiratory rate extraction via an autoregressive model using the optimal parameter search criterion. *Ann. Biomed. Eng.* 38:3218–3225, 2010.
- 8. Lee, J., and K. H. Chon. An autoregressive model-based particle filtering algorithms for extraction of respiratory rates as high as 90 breaths per minute from pulse oximeter. *IEEE Trans. Biomed. Eng.* 57:2158–2167, 2010.
- 9. Lee, J., and K. H. Chon. Time-varying autoregressive model-based multiple modes particle filtering algorithm for respiratory rate extraction from pulse oximeter. *IEEE Trans. Biomed. Eng.* 58:790–794, 2011.
- 10. Lee, J., J. P. Florian, and K. H. Chon. Respiratory rate extraction from pulse oximeter and electrocardiographic recordings. *Physiol. Meas.* 32:1763, 2011.
- 11. Leonard, P., T. Beattie, P. Addison, and J. Watson. Standard pulse oximeters can be used to monitor respiratory rate. *Emerg. Med. J.* 20:524–525, 2003.
- 12. Leonard, P. A., D. Clifton, P. S. Addison, J. N. Watson, and T. Beattie. An automated algorithm for determining respiratory rate by photoplethysmogram in children. *Acta Paediatr.* 95:1124–1128, 2006.
- 13. Leonard, P., N. R. Grubb, P. S. Addison, D. Clifton, and J. N. Watson. An algorithm for the detection of individual breaths from the pulse oximeter waveform. *J. Clin. Monit. Comput.* 18:309–312, 2004.
- 14. Liu, H., Y. Wang, and L. Wang. A review of non-contact, low-cost physiological information measurement based on photoplethysmographic imaging. *Conf. Proc. IEEE Eng. Med. Biol. Soc.* 2012. doi:10.1109/EMBC.2012.6346371.
- 15. Lu, S., K. H. Ju, and K. H. Chon. A new algorithm for linear and nonlinear ARMA model parameter estimation using affine geometry. *IEEE Trans. Biomed. Eng.* 48:1116–1124, 2001.
- 16. Mason, K. P., P. E. Burrows, M. M. Dorsey, D. Zurakowski, and B. Krauss. Accuracy of capnography with a 30 foot nasal cannula for monitoring respiratory rate and end-tidal CO2 in children. *J. Clin. Monit. Comput.* 16:259–262, 2000.
- 17. McManus, D. D., J. Lee, O. Maitas, N. Esa, R. Pidikiti, A. Carlucci, J. Harrington, E. Mick, and K. H. Chon. A novel application for the detection of an irregular pulse using an iPhone 4S in patients with atrial fibrillation. *Heart Rhythm* 10:315–319, 2013.
- 18. Rantonen, T., J. Jalonen, J. Grönlund, K. Antila, D. Southall, and I. Välimäki. Increased amplitude modulation of continuous respiration precedes sudden infant death syndrome: detection by spectral estimation of respirogram. *Early Hum. Dev.* 53:53–63, 1998.
- 19. Scully, C. G., J. Lee, J. Meyer, A. M. Gorbach, D. Granquist-Fraser, Y. Mendelson, and K. H. Chon. Physiological parameter monitoring from optical recordings with a mobile phone. *IEEE Trans. Biomed. Eng.* 59:303–306, 2012.
- 20. South, M. Measurement of respiratory rate and timing using a nasal thermocouple. *J. Clin. Monit.* 11:159–164, 1995.
- 21. Subbe, C., R. Davies, E. Williams, P. Rutherford, and L. Gemmell. Effect of introducing the Modified Early Warning score on clinical outcomes, cardio-pulmonary arrests and intensive care utilisation in acute medical admissions. *Anaesthesia* 58:797–802, 2003.
- 22. Torrence, C., and G. P. Compo. A practical guide to wavelet analysis. *Bull. Am. Meteorol. Soc.* 79:61–78, 1998.
- 23. Wang, H., K. Siu, K. Ju, and K. H. Chon. A high resolution approach to estimating time–frequency spectra and their amplitudes. *Ann. Biomed. Eng.* 34:326–338, 2006.
- 24. Younes, M. Role of respiratory control mechanisms in the pathogenesis of obstructive sleep disorders. *J. Appl. Physiol.* 105:1389–1405, 2008.