1 Introduction

Heart rate is an important parameter in many health care applications and clinical settings. Cardiac pulse signals are currently acquired and monitored using one of two techniques: pulse oximetry based on photoplethysmography (PPG) or electrocardiography (ECG). Both methods are inexpensive and reliable, but both require adhesive sensors to be attached to the user, and attaching and removing these sensors may cause discomfort.

Attempts to overcome this problem have produced many published works that use a fixed camera to record users and measure their heart rate [1,2,3,4,5,6,7]. Such contactless heart rate monitoring meets an important need in health care, both in hospitals and at home [8]. These methods rely on the same principle as contact PPG in pulse oximetry: cardiovascular activity changes the blood volume, which changes the skin color, which in turn affects the light reflected from the skin [9]. The only difference between the two techniques is that the camera-based approach depends on reflection, whereas pulse oximetry depends on wired transmission [10].

However, these methods of camera-based heart rate measurement still face some obstacles regarding motion artifact (MA) elimination, brightness changes, and other disturbances that may arise throughout video recording [11, 12].

This article presents a new MA reduction algorithm for remote photoplethysmography (rPPG) signals acquired from facial videos captured with an ordinary camera at short distances of up to 2 m. The recorded videos are processed in MATLAB (MathWorks), which extracts the PPG signal from the face region of each video before MA reduction filtering is applied. The enhanced signal is then used to find the cardiac pulse frequency over each 30-s recording.

The algorithm was tested on 33 subjects, each of whom was video recorded for 30 s. The presented method applies RGB color separation to isolate the green channel of the facial video recordings for heart rate signal extraction, as described in Sect. 3. The contributions of this research can be summarized as follows:

  1. We propose dividing the green component of each video into 60 segments (two per second) and enhancing each segment separately before recombining the segments into a single wave.

  2. A wavelet transform is applied to denoise the signal obtained from the partitioning step.

  3. FFT is applied to the denoised signal to find the heart rate frequency.

  4. The measurements show that the proposed method can efficiently suppress the noise generated by MA in heart rate signals.

The remainder of this paper is arranged as follows. Section 2 reviews related work on MA reduction in PPG signals. Section 3 describes the proposed approach for MA reduction and heart rate measurement. Section 4 presents the experiments and analyzes the results, demonstrating the efficiency of the suggested algorithm for reducing the noise in heart rate signals acquired from video recordings. Finally, Sect. 5 concludes the paper and proposes future work.

2 Literature Review

Different signal processing algorithms were recently proposed for artifact reduction and noise cancelation to improve the signal attained from remote PPG (rPPG) sources [13, 14]. Several of these approaches have been presented for MA reduction using independent component analysis (ICA) to disassemble the rPPG data to obtain the desired heart rate signal [15, 16].

Other recent work utilized the Hilbert-Huang transform (HHT) to build a decomposition technique that reduces MA in PPG signals. The HHT algorithm exploits empirical mode decomposition for nonlinear data to obtain the intrinsic mode function (IMF) elements and analyzes the changes in amplitude against time [17].

Another paper presented a comparative analysis of the performance of two motion artifact reduction techniques being applied to rPPG signals, namely, the digital adaptive filtering technique versus the wavelet transform method. The research concluded that using wavelet transformation for restoring corrupted PPG signals can facilitate clinical interpretation in terms of both heart rate (HR) and pulse transit time (PTT) measurements due to the unwarranted phase variability in its inherent algorithm [18].

The study in [19] compared the performance of different wavelet transform techniques for reducing various MAs in PPG signals. The arterial blood oxygen saturation (SpO2) estimates obtained with the various wavelets were rather close, with the Daubechies wavelet performing better than the other types. Adaptive filtering has also been employed to reduce MA in PPG signals, such as in [20], which combined an adaptive lattice infinite impulse response (IIR) notch filter (ALNF) with an adaptive comb filter (ACF).

Another work proposed a fusion approach for PPG artifact cancelation. The first detection stage used the inherent time and frequency domain (FD) properties of PPG signals, while the second stage involved enhanced preprocessing, combining motion detection, period estimation and Fourier series reconstruction, followed by amplitude-based FD independent component analysis (FD-ICA) [21].

The authors of [22] proposed an algorithm framework for MA reduction based on multiple signal wavelengths acquired from wrist-worn PPG sensors. The framework used the green component of the PPG signal for HR monitoring and an infrared PPG source as the motion reference. The authors proposed four major stages: motion artifact detection, motion cancelation using the continuous wavelet transform, heart rate estimation, and signal reconstruction.

Gaussian decomposition and minimum mean square error (MSE) estimation can also be used for MA correction, as in [23], where a notch filter is applied for blur reduction and signal refinement. Multiple features were calculated from a number of PPG signals to investigate the existence of MA. Then, MSE and Gauss functions were used for disturbing signal property estimation and corrected signal synthesis, respectively.

Time and frequency domain analyses are another possibility for achieving the desired MA reduction, as proposed in [24], where two types of adaptive filters are sequentially utilized along with singular spectrum analysis (SSA) for peak enhancement in the frequency domain, which are both incorporated to achieve spectral heart rate estimation. The authors claimed to reach an average absolute error of 1.16 beats per minute (BPM) with a standard deviation of 1.74 BPM when tested on 12 subjects with motion.

Other time and frequency domain analyses used empirical mode decomposition (EMD) and singular value decomposition (SVD), preceded by a preprocessing stage called variance characterization series (VCS) along with the Haar wavelet transform (HWT), to achieve the same goal of minimizing the effect of MA in PPG signals extracted from facial videos [25].

The study in [26] focused on the refinement of PPG signals to filter out all disturbed, unreliable data to obtain better, more accurate results when used in internet-of-Things (IoT)-based health monitoring systems. This reliability improvement is realized by applying the convolutional neural network (CNN) method while testing the results versus heart rate values acquired from ECG signals.

Another neural network-based system can be seen in [27], which combines multilayer perceptron (MLP) and radial basis function (RBF) artificial neural networks with an adaptive neuro-fuzzy inference system (ANFIS). The authors focused on reconstructing the disturbed parts of PPG signals acquired from 23 subjects aged 25–28 years. They concluded that ANFIS with a subtractive clustering algorithm gave the best results for reconstructing missing parts of PPG signals.

Other research articles have suggested various less popular algorithms. In [28], an algorithm called periodic component factorization (PCF), driven by independent component analysis (ICA), is presented for better MA elimination from PPG signals. Its aim is to overcome the loss of accuracy of the pulse transit time (PTT) method in the presence of hand movements and the deterioration in blood pressure measurement under intensive activity.

Another approach can be observed in [29, 30], which applied variants of the least mean square (LMS) algorithm along with SVD and the slope sum method (SSM) to remove MA from PPG signals, and tested them against the tunable Q-factor wavelet transform (TQWT) approach suggested by the authors.

A special case of the previously mentioned ICA called joint approximate diagonalization eigenmatrices (JADE) is presented in [31], where multiple enhancement filters were applied for normalization, illumination variation reduction and smoothing prior to JADE. A fast Fourier transform (FFT) was also employed to find the equivalent PPG signal in the frequency domain.

Despite the many MA reduction methods available for rPPG signals, the problem persists because inexpensive approaches for rPPG signal detection and MA reduction are not easily attainable. Moreover, to the best of our knowledge, the partitioning method has not been used in any of the related works for MA reduction in rPPG signals. This paper presents a new fusion method for MA reduction to overcome these limitations. First, the signal is divided into segments, and every segment is enhanced individually before the signal is reassembled. Then, a wavelet transform is applied to denoise the signal and complete the enhancement process. The experimental results show that the suggested method performs well compared with existing works.

3 Proposed Method

As stated earlier, the experiments were carried out on recorded videos of 33 participants (27 males and 6 females) aged between 18 and 38 years. A total of 129 pairs of measurements were recorded from these participants. An ordinary camera was used for recording; the videos were acquired in the RGB color model (24 bits per pixel (bpp), with three channels of 8 bits each) at 30 frames per second (fps). The video clips were saved on the computer in AVI format for later processing in MATLAB. A pulse oximeter with a stated precision of 18–300 bpm ± 3 digits provided the reference heart rate. The subjects had varying skin colors and origins (Arab, African and Asian participants), and the recording sessions were carried out at various times of the day. The participants were placed approximately 80 cm (0.8 m) from the camera and were asked to move naturally but to avoid quick or large motions while recording. The only source of illumination during recording was sunlight, which varied with the time of day.

3.1 Heart Rate Signal Estimation

In this study, the wave generated by heart pulse activity, which propagates through the facial skin, is considered to be the source of the signal of interest. The region of interest (ROI) is segmented using the Viola-Jones algorithm [32] for face detection. Variations in the amount of reflected light are due to the volumetric changes in the facial blood vessels during the cardiac cycle; the camera picks up these changes to identify the timing of all heart pulse events.

The summation of values for the green component of all pixels in the ROI of each frame is calculated to yield the component g(t) over all video frames. Experiments showed that the signals extracted from the green channel are the best signals to describe heart rate activity; however, these signals are distorted mainly by MA noise.
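As a minimal sketch of how g(t) could be computed, the Python snippet below sums the green values inside a fixed rectangular ROI for every frame. The function name `green_trace`, the nested-list frame layout and the fixed ROI box are assumptions made for illustration; in the study itself the ROI comes from face detection and the processing is done in MATLAB.

```python
def green_trace(frames, roi):
    """Return g(t): the sum of green values inside the ROI, one value per frame.

    frames: list of frames, each a list of rows of (r, g, b) tuples (assumed layout).
    roi: (top, left, height, width), a hypothetical fixed face box.
    """
    top, left, height, width = roi
    trace = []
    for frame in frames:
        total = 0
        for row in frame[top:top + height]:
            for _r, g, _b in row[left:left + width]:
                total += g  # only the green channel contributes
        trace.append(total)
    return trace


# Two tiny synthetic 2x2 frames (made-up values, not study data).
frames = [
    [[(0, 10, 0), (0, 20, 0)], [(0, 30, 0), (0, 40, 0)]],
    [[(0, 11, 0), (0, 21, 0)], [(0, 31, 0), (0, 41, 0)]],
]
print(green_trace(frames, (0, 0, 2, 2)))  # [100, 104]
```

A 30-s recording at 30 fps would give a trace of 900 samples under this layout.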

It is assumed that the pixel's weight changes result from variations in the arterial blood volume and slight motions of the head. The signal with small fluctuations represents the heart activity extracted from the green channel component, while the other signal painted in red with coarse amplitude variations indicates the noise from MA, as shown in Fig. 1, which displays the green component signals being acquired by the camera. These three signals are obtained from the green components of the captured videos for three participants. They show ambiguous heart pulse characteristics, being degraded by the noise introduced from face MA and illumination changes that occurred during the process of recording. To reduce this noise and yield a clear heart pulse signal, a new two-stage method for MA reduction is implemented in this research.

Fig. 1 Three rPPG signals of heart pulses with motion artifact noise from three participants

3.2 Noise Reduction

The MA reduction method presented in this work consists of two stages. The first stage consists of equally partitioning the signal at 0.5-s periods, shifting these segments to the mean level before recombining them again to rebuild the signal. The second stage uses the wavelet transform to enhance the signal obtained from the first stage. Figure 2 shows the signal acquired from the green channel and the enhancement processes.

Fig. 2 a Face detection and ROI cropping. b Decomposition into RGB channels. c Stage 1 of MA reduction. d Stage 2: wavelet transform denoising

The normal heart rate (HR) of an adult person ranges from 60 to 100 bpm, meaning that every second of the video contains one or two pulses according to the participant’s status. By partitioning the green component, each second would contain two partitions, giving 60 partitions per 30 s.

Every pulse in one partition is separated from the other pulses and shifted to the mean level of the partition. Then, all of the pulses are reconstructed together to form one signal. Finally, the wavelet transform is applied to denoise the reconstructed signal. The aforementioned two-stage approach of noise reduction in heart rate signals acquired from human face video recording can be summarized as follows:

Stage 1:

The changes in pixel values result from variations in the arterial blood volume and slight motions of the head. The signal with small fluctuations represents the heart activity extracted from the green channel component. These small fluctuations are separated from one another by partitioning the green component of the acquired signal (30-s video) into 60 partitions, as shown in Fig. 3a.

Fig. 3 MA reduction and recombined signal enhancement

Calculate the mean value of each partition according to the formula below:

$$ \overline{X} = \frac{\sum_{i=1}^{n} pg(t)_{i}}{n} \tag{1} $$

where g(t) is the green component, pg(t) is a partition of g(t), \(pg(t)_{i}\) is the i-th sample of that partition, and n is the number of samples in the partition.

Shift the samples of every partition to the mean level as in Fig. 3b using the equation:

$$ sg(t) = pg(t) - \overline{X} \tag{2} $$

where sg(t) is the shifted partition.

Repeat the steps above for all partitions before recombining them to yield the improved signal with reduced noise, as shown in Eq. (3) and Fig. 3c.

$$ G(t) = \left[ sg(t)_{1} \;\; sg(t)_{2} \;\; sg(t)_{3} \;\; \ldots \;\; sg(t)_{60} \right] \tag{3} $$

where G(t) is the signal formed by the recombined partitions.
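Stage 1 can be sketched in a few lines of Python. At 30 fps, a 0.5-s partition holds 15 samples, so a 30-s recording of 900 samples yields 60 partitions. The function name `stage1_mean_shift` and its parameters are illustrative assumptions, not the authors' MATLAB code.

```python
def stage1_mean_shift(g, fps=30, segment_seconds=0.5):
    """Partition g(t), subtract each partition's mean, and recombine.

    g: list of green-trace samples.
    fps, segment_seconds: 30 fps and 0.5 s give 15 samples per partition.
    """
    n = int(round(fps * segment_seconds))  # samples per partition
    recombined = []
    for start in range(0, len(g), n):
        partition = g[start:start + n]              # pg(t)
        mean = sum(partition) / len(partition)      # mean of the partition
        recombined.extend(x - mean for x in partition)  # sg(t) = pg(t) - mean
    return recombined


# Synthetic trace with a steady upward drift (made-up data): 900 samples.
g = [float(i) for i in range(900)]
G = stage1_mean_shift(g)
print(len(G), G[0], G[15])
```

Because every partition is re-centred on zero, slow baseline drift that spans several partitions is removed, while the faster variation inside each 0.5-s window is kept.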

Stage 2:

Wavelet transform denoising [33] is used for the final enhancement, as shown in Fig. 3d. Wavelet denoising consists of three steps, as shown in Fig. 4.

Fig. 4 Denoising using the wavelet transform

The three denoising steps using the wavelet transform are as follows:

  1. Wavelet transform (decomposition).

  2. Threshold selection.

  3. Inverse wavelet transform (denoised signal).

The discrete wavelet transform (DWT) is used in this work. The DWT is the discrete version of the continuous wavelet transform (CWT), which is given in Eq. (4) below:

$$ G(a,b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{\infty} G(t)\, \overline{\psi}\!\left( \frac{t - b}{a} \right) \mathrm{d}t \tag{4} $$

where G(t) is the input signal, a is the scaling parameter, b is the translation parameter, and \(\overline{\psi}\) is the complex conjugate of the mother wavelet \(\psi\).

The DWT decomposes a signal into approximation and detail information using low-pass and high-pass filters, allowing analysis in different frequency bands at different resolutions. The decomposition equations are:

$$ D(n) = \sum_{k} g(k) \cdot G(2n - k) \tag{5} $$
$$ A(n) = \sum_{k} h(k) \cdot G(2n - k) \tag{6} $$

where G(n) is the input signal, D(n) refers to the detail coefficients, A(n) refers to the approximation coefficients, h(n) is the low-pass filter and g(n) is the high-pass filter.
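The two decomposition sums can be illustrated with one analysis level in plain Python. This is a sketch under stated assumptions: it uses the two-tap Haar filters instead of the Symlet 8 filters used in this work (Haar keeps the example short), and it handles the signal borders by periodic extension; the names `dwt_level`, `HAAR_LOW` and `HAAR_HIGH` are hypothetical.

```python
import math


def dwt_level(signal, low_pass, high_pass):
    """One DWT analysis level: filter, then keep every second output.

    A(n) = sum_k h(k) * G(2n - k),  D(n) = sum_k g(k) * G(2n - k),
    with indices wrapped periodically (an assumption for border handling).
    """
    n_samples = len(signal)
    approx, detail = [], []
    for n in range(n_samples // 2):
        a = 0.0
        d = 0.0
        for k in range(len(low_pass)):
            sample = signal[(2 * n - k) % n_samples]
            a += low_pass[k] * sample   # low-pass branch -> approximation
            d += high_pass[k] * sample  # high-pass branch -> detail
        approx.append(a)
        detail.append(d)
    return approx, detail


S = 1.0 / math.sqrt(2.0)
HAAR_LOW = [S, S]    # h(k): stand-in for the Symlet 8 low-pass filter
HAAR_HIGH = [S, -S]  # g(k): stand-in for the Symlet 8 high-pass filter

signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]  # made-up samples
approx, detail = dwt_level(signal, HAAR_LOW, HAAR_HIGH)
print(len(approx), len(detail))  # 4 4
```

Applying `dwt_level` again to `approx` gives the next level; repeating this five times would correspond to a level-5 decomposition.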

The Symlet 8 wavelet function is used to reduce artifacts in the heart rate signals, as it was considered the most suitable mother wavelet for removing artifacts from such signals. The DWT decomposition is carried out to level 5, which is an appropriate level for noise removal from noisy signals [34].

Denoising with DWT involves three steps, the second of which is the most important. This step consists of determining a threshold and the treatment of the wavelet coefficients. In thresholding, two types are distinguished (hard thresholding and soft thresholding).

Hard thresholding (Eq. 7) is more sensitive to small changes in the signal and is unstable because it tends to have a larger variance. Soft thresholding (Eq. 8) shrinks the large wavelet coefficients and therefore tends to produce a larger bias, but it is more stable than hard thresholding.

$$ \tilde{D}_{i} = \begin{cases} 0, & |D_{i}| \le T \\ D_{i}, & |D_{i}| > T \end{cases} \tag{7} $$
$$ \tilde{D}_{i} = \begin{cases} 0, & |D_{i}| \le T \\ \operatorname{sign}(D_{i})\,\left( |D_{i}| - T \right), & |D_{i}| > T \end{cases} \tag{8} $$

where \(D_{i}\) is a detail coefficient, \(\tilde{D}_{i}\) is the updated detail coefficient, and T is the threshold.
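The two thresholding rules can be sketched as follows. The comparison is made on the magnitude of each coefficient, which is the usual convention and is assumed here; the function names are illustrative, and the threshold value in the example is arbitrary rather than one chosen by a selection rule.

```python
import math


def hard_threshold(coeffs, t):
    """Keep coefficients whose magnitude exceeds t; zero the rest."""
    return [d if abs(d) > t else 0.0 for d in coeffs]


def soft_threshold(coeffs, t):
    """Zero small coefficients and shrink the remaining ones toward zero by t."""
    return [math.copysign(abs(d) - t, d) if abs(d) > t else 0.0 for d in coeffs]


detail = [-3.0, -0.5, 0.2, 2.0]  # made-up detail coefficients
print(hard_threshold(detail, 1.0))  # [-3.0, 0.0, 0.0, 2.0]
print(soft_threshold(detail, 1.0))  # [-2.0, 0.0, 0.0, 1.0]
```

The example shows the trade-off described above: hard thresholding leaves the surviving coefficients untouched but jumps abruptly at T, whereas soft thresholding is continuous at T at the cost of shrinking every surviving coefficient.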

In this work, soft thresholding with the rigorous SURE (Stein's unbiased risk estimate) threshold selection rule is used to denoise all levels of the DWT decomposition of the heart rate signals obtained from stage 1 of noise reduction. The denoised signals are reconstructed from the updated detail coefficients of the DWT, with the approximation coefficients eliminated.

To compute the heart rate frequency, the processed data are transformed to the frequency domain (FD) via FFT. As mentioned earlier, the suggested method is tested on the recordings of 33 healthy subjects (ranging from 18 to 38 years old) at various times of day. Additionally, a pulse oximeter is used as a reference for heart rate measurement to contrast the attained results with the reference ground truth information.
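The frequency-domain step can be sketched as a search for the strongest spectral peak inside a plausible heart rate band. The snippet below evaluates a plain discrete Fourier transform bin by bin so that it needs only the standard library (an FFT returns the same spectrum faster). The band limits of 0.75–4 Hz and the name `heart_rate_bpm` are assumptions for illustration and are not taken from the paper.

```python
import cmath
import math


def heart_rate_bpm(signal, fps=30.0, f_min=0.75, f_max=4.0):
    """Return the frequency of the largest spectral peak in [f_min, f_max], in bpm."""
    n = len(signal)
    best_k, best_power = None, -1.0
    for k in range(1, n // 2 + 1):
        freq = k * fps / n  # frequency of bin k in Hz
        if not (f_min <= freq <= f_max):
            continue
        # DFT coefficient of bin k
        x_k = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                  for i, s in enumerate(signal))
        power = abs(x_k) ** 2
        if power > best_power:
            best_k, best_power = k, power
    if best_k is None:
        raise ValueError("no frequency bin falls inside the search band")
    return 60.0 * best_k * fps / n


# Synthetic 30-s trace at 30 fps with a 1.2 Hz pulse (72 bpm), made-up data.
pulse = [math.sin(2 * math.pi * 1.2 * i / 30.0) for i in range(900)]
print(heart_rate_bpm(pulse))  # 72.0
```

With 900 samples at 30 fps the bin spacing is 1/30 Hz, so the estimate from a single 30-s recording has a resolution of 2 bpm unless the spectrum is interpolated.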

4 Results and Discussion

The video-capturing process was performed in an inside-outside room at different periods of the day. During the experiment, the camera was placed approximately 80 cm away from the person to pick up the RGB weight variations. The face of each subject is considered to be the ROI and segmented using the Viola-Jones face detection method for further analysis.

The results of the MA reduction method and the contactless heart pulse measurement are displayed in Fig. 5, which is taken from various participants at different recording times. Figure 5a displays the waveforms in the time domain, with oscillating amplitudes due to the light intensity fluctuations caused by heart activity and distorted by MA. Figure 5b shows the results of the suggested noise reduction approach, applied through signal enhancement followed by WT denoising, and Fig. 5c shows the result of applying FFT to the enhanced signals from (b). The resulting spectrum shows a clear high peak at the frequency of the heart pulse rate over 30 s.

Fig. 5 a Green component signals in the time domain with different amplitudes. b MA reduction results. c High peak at the heart rate in the FD spectrum of the MA-reduced signals

A Bland‒Altman analysis was used to test agreement between 129 pairs of measurements from 33 subjects. The standard deviation (SD), mean of the differences, 95% limits of agreement (± 1.96 SD) and mean of the absolute differences were calculated. A correlation coefficient and root-mean-squared error (RMSE) were calculated for the heart rate obtained from the pulse oximeter and our two-stage artifact reduction method.
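The agreement statistics named above can be computed as in the following sketch. The sign convention (estimate minus reference), the use of the sample standard deviation, and the function name `agreement_stats` are assumptions; the four data pairs are invented for illustration and are not measurements from this study.

```python
import math
import statistics


def agreement_stats(reference, estimate):
    """Bias, 95% limits of agreement, RMSE and Pearson r for paired readings."""
    diffs = [e - r for r, e in zip(reference, estimate)]
    bias = statistics.mean(diffs)    # mean of the differences
    sd = statistics.stdev(diffs)     # sample SD of the differences
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    mean_r = statistics.mean(reference)
    mean_e = statistics.mean(estimate)
    cov = sum((r - mean_r) * (e - mean_e) for r, e in zip(reference, estimate))
    var_r = sum((r - mean_r) ** 2 for r in reference)
    var_e = sum((e - mean_e) ** 2 for e in estimate)
    return {
        "bias": bias,
        "loa": (bias - 1.96 * sd, bias + 1.96 * sd),  # limits of agreement
        "rmse": rmse,
        "r": cov / math.sqrt(var_r * var_e),
    }


reference = [70.0, 80.0, 90.0, 100.0]  # hypothetical pulse oximeter readings (bpm)
estimate = [72.0, 79.0, 91.0, 98.0]    # hypothetical camera-based readings (bpm)
stats = agreement_stats(reference, estimate)
print(stats)
```

In a Bland‒Altman plot, each pair is drawn at the mean of its two readings on the horizontal axis and their difference on the vertical axis, with horizontal lines at the bias and at the two limits of agreement.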

Figure 6 shows the 129 pairs of measurements from the 33 participants when the heart rate is estimated from the green trace component only, without the two-stage MA reduction. The correlation coefficient was 0.5, the mean bias error (MBE) was 4.3 bpm with 95% limits of agreement of −37.2 to 28.5 bpm, and the root-mean-squared error (RMSE) was 17.2 bpm.

Fig. 6 Correlation between the heart rate obtained from the green trace component and the heart rate from the pulse oximeter

The heart rate measured by our proposed method is compared with the heart rate obtained from a pulse oximeter as a reference. This comparison illustrates the accuracy of the proposed method for MA reduction and heart rate measurement. The distribution of points in the correlation analysis differs notably before and after applying our two-stage method of MA reduction, as shown in Figs. 6 and 7.

Fig. 7 Correlation between the heart rate obtained by our method and the heart rate from the pulse oximeter

The correlation coefficient between the heart rate measured by the proposed method and the heart rate from the pulse oximeter was 0.97. The MBE was 0.13 bpm with 95% limits of agreement of −6.9 to 6.6 bpm, and the RMSE was 3.41 bpm.

Bland‒Altman plots of the same 129 pairs of measurements from the 33 subjects clarify the effect of the two-stage MA reduction. For the green trace component without our MA reduction method, the MBE was 4.3 bpm with 95% limits of agreement of −37.2 to 28.5 bpm, and the root-mean-squared error (RMSE) was 17.2 bpm, as shown in Fig. 8. The two-stage MA reduction method raised the correlation coefficient from 0.5 to 0.97, reduced the MBE from 4.3 to 0.13 bpm with 95% limits of agreement of −6.9 to 6.6 bpm, and decreased the RMSE from 17.2 to 3.41 bpm, as shown in Fig. 9.

Fig. 8 Bland‒Altman plot of the difference between the two measures against their mean (green trace component and pulse oximeter)

Fig. 9 Bland‒Altman plot of the difference between the two measures against their mean (two-stage MA reduction and pulse oximeter)

Table 1 summarizes the descriptive statistics for the key evaluations of the proposed method compared to the pulse oximeter. Overall, the proposed method showed very high agreement with pulse oximeter measurements in the presence of motion artifacts (MBE = 0.13 ± 3.4 bpm). A two-stage MA reduction technique increased the correlation level and reduced the RMSE, mean bias and standard deviation.

Table 1 Heart rate measurements by the proposed method and the reference pulse oximeter

To compare the proposed method with state-of-the-art approaches, Table 2 lists previous related works on contactless PPG methods of heart rate measurement, and Table 3 gives an RMSE-based comparison between our proposed MA reduction technique and existing techniques. The comparison techniques reported by Ming-Zher et al. [1], Holton et al. [2], Monkaresi et al. [3], Hsu et al. [5], Tran et al. [7], Frédéric Bousefsaf et al. [35] and Si-Qi Liu et al. [36] are indicated as MZ, HO, MH, HY, TD, FB and SQ, respectively. The RMSE of 3.41 bpm obtained by the proposed method on the 33 participants is the lowest among the compared methods.

Table 2 Previous papers on contactless PPG methods of heart rate measurements
Table 3 Performance comparison in terms of RMSE between existing methods

5 Conclusion

In this research, a method is proposed for reducing the noise generated by motion artifacts in heart rate signals acquired remotely by video recording. To examine the performance of the proposed method, FFT is applied to the enhanced green component of the signal. The highest spectral peak corresponds to the heart rate frequency of the participant.

The results demonstrate the capability of the proposed method to reduce the noise generated by MA in heart rate signals, so it can be used with rPPG data. Furthermore, the method was shown to be able to process videos recorded at various times of the day. The motion artifacts examined in this study were those of natural movement, such as when participants interact with a laptop. The attained results were compared against reference results from a pulse oximeter, giving an MBE of 0.13 bpm, an RMSE of 3.41 bpm and a correlation coefficient of 0.97.

However, this technique is still under development and will be extended for use in robust and accurate heart rate measuring systems. A multiparameter, real-time contactless monitoring system based on the proposed approach, measuring heart rate variability, respiration and arterial oxygen saturation, is a possibility that can be realized in future studies.