Non-invasive cuff-less blood pressure estimation using a hybrid deep learning model

Conventional blood pressure (BP) measurement methods have various drawbacks, such as being invasive, cuff-based or requiring manual operation. There is significant interest in the development of non-invasive, cuff-less and continual BP measurement based on physiological measurement. However, in these methods, extracting features from signals is challenging in the presence of noise or signal distortion. When using machine learning, errors in feature extraction result in errors in BP estimation. This study therefore explores the use of raw signals as a direct input to a deep learning model. To enable comparison with the traditional machine learning models which use features from the photoplethysmogram and electrocardiogram, a hybrid deep learning model that utilises both raw signals and physical characteristics (age, height, weight and gender) is developed. This hybrid model performs best in terms of both diastolic BP (DBP) and systolic BP (SBP), with mean absolute errors of 3.23 ± 4.75 mmHg and 4.43 ± 6.09 mmHg respectively. DBP and SBP meet the Grade A and Grade B performance requirements of the British Hypertension Society respectively.


Introduction
Blood pressure (BP) is one of the most important and commonly measured clinical parameters, and accurate measurement is crucial for therapeutic decisions. The World Health Organization (WHO) estimates that 1.13 billion people worldwide have hypertension, which is a major cause of premature death. However, fewer than 1 in 5 people with hypertension have the problem under control (World Health Organisation 2019). One of the global targets for noncommunicable diseases is to reduce the prevalence of hypertension by 25% by 2025 (baseline 2010). Regular BP monitoring is thus essential for prevention and control in the general population and for hypertensive patients.
Despite its importance, the existing non-invasive methods for regular BP measurement have downsides that can be ascribed to their measuring devices. The most popular cuff-based BP measurement requires the user to follow protocols to obtain accurate BP values. Some cuff-based devices, e.g., the mercury sphygmomanometer, require frequent calibration. Other disadvantages include movement artefacts during physical activity (Ogedegbe and Pickering 2010) and discomfort during cuff inflation. In addition, for some people, the act of going to the doctor triggers a response that makes their BP soar, which clinicians recognize as white-coat syndrome (Stergiou et al. 2018).
Due to the aforementioned factors, developing a cuff-less, continual or near real-time periodic, robust, comfortable, and wearable BP measurement system is desirable. Using physiological signals to conduct non-invasive and cuff-less BP measurement has emerged in the past decade (Liang et al. 2018a; Sharifi et al. 2019; Yoon et al. 2009). Two typical physiological signals used in BP estimation are the photoplethysmogram (PPG) and electrocardiogram (ECG). Pulse arrival time (PAT) is the time interval between the R-wave peak of the ECG and the systolic peak of the PPG. A longer PAT indicates a lower BP, while a shorter PAT indicates a higher BP, but the precise relationship is uncertain due to the complexity of the cardiovascular system. This method requires a calibration protocol for stepwise increases in BP and several simultaneous measurements of ECG, PPG and a reference method (e.g. a mercury sphygmomanometer). Furthermore, individual calibrations are often needed to increase accuracy. Therefore, it is a challenge to use PAT for BP measurement under clinical conditions (Hennig and Patzak 2013).
Recently, there has been growing interest in cuff-less and non-invasive BP estimation using machine learning algorithms with the PPG and ECG (Chen et al. 2019; Kachuee et al. 2017; Mousavi et al. 2019; Ribas Ripoll and Vellido 2019; Rundo et al. 2018). Most of the studies extracted specific features in the time domain or frequency domain, and their results reveal a high correlation of these features with BP (Kachuee et al. 2017; Tanveer and Hasan 2019; Wang et al. 2018). The two main challenges with these approaches are the need for considerable signal processing and the extraction of features associated with physiological signals.
These drawbacks, alongside the emerging methods of using raw signals as inputs into deep learning for different purposes (Gotlibovych et al. 2018; Slapničar et al. 2019), have motivated us to investigate this approach for non-invasive BP measurement. To ensure the high quality of the data, this research conducted a series of measurements on 45 participants to obtain a database of 315 records, each containing PPG, ECG, BP values and the corresponding participant's physical characteristics (i.e., age, height, weight and gender). This database is suitable for investigating, for the first time to the authors' knowledge, the use of deep learning with raw signals and physical characteristics for BP measurement. Another objective is to compare the accuracy of predictions between traditional machine learning methods and the novel hybrid deep learning model.
The novelty of this study is threefold. Firstly, although there have been attempts to predict BP values using deep learning methods, they rely on the use of physiological signals. To date, no one has tried to use both physiological signals and physical characteristics as inputs in a deep learning structure. This study presents the first attempt in this regard by devising a novel hybrid deep learning model. Secondly, this study provides a comprehensive comparison not only between traditional machine learning methods and hybrid deep learning models, but also between hybrid deep learning models with different structures. Thirdly, the methods used to collect data to predict BP are simple and replicable. Combined with the automatic nature of the hybrid deep learning model, this largely reduces the complexity for the end user and offers the potential for large-scale implementation.
The paper is organized as follows: Sect. 2 explains the experimental data processing procedures and all the algorithms used in this work. Section 3 describes the experimental design for this study. Section 4 presents the results and finally, Sect. 5 concludes and discusses the research.

Methods
Before the application of machine learning algorithms, the ECG and PPG are pre-processed and features are extracted. Several popular machine learning algorithms are then applied to estimate BP from signal features and physical characteristics. Unlike traditional methods, the proposed deep learning method does not require feature extraction; key information contained in the raw data is automatically extracted by the deep learning network through self-learning. Data acquisition will be described in Sect. 3.

Data pre-processing
The acquired PPG signal is processed by a Chebyshev II bandpass filter with lower and upper cut-off frequencies of 0.5 and 10 Hz respectively in order to reduce noise within the raw PPG signal (Liang et al. 2018b). For the ECG signal, baseline drift and high-frequency noise are removed using a Butterworth bandpass filter with lower and upper cut-off frequencies of 0.5 and 40 Hz respectively (Shin et al. 2010). Afterwards, the PPG and ECG signals are normalized and their peaks in each period are obtained. The most stable segments are chosen from both signals by finding the highest cross-correlation coefficient between periods, where each period is delimited by neighbouring peaks (Kachuee et al. 2015).
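The segment selection step can be illustrated with a minimal sketch in plain Python. It assumes min-max normalisation (the text only says the signals are "normalized") and uses the Pearson correlation between neighbouring periods as the cross-correlation coefficient; the function names are hypothetical and the filtering stage is not reproduced here.

```python
import math

def min_max_normalise(signal):
    """Scale a signal to the range [0, 1] (assumed normalisation scheme)."""
    lo, hi = min(signal), max(signal)
    if hi == lo:
        return [0.0 for _ in signal]
    return [(x - lo) / (hi - lo) for x in signal]

def pearson(a, b):
    """Correlation coefficient of two sequences over their common length."""
    n = min(len(a), len(b))
    a, b = a[:n], b[:n]
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)

def most_stable_pair(signal, peaks):
    """Return (index, correlation) of the most similar neighbouring periods.

    peaks holds the sample indices of successive peaks in increasing order;
    period i spans peaks[i] up to (not including) peaks[i + 1].
    """
    periods = [signal[s:e] for s, e in zip(peaks, peaks[1:])]
    scores = [pearson(p, q) for p, q in zip(periods, periods[1:])]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]
```

Given a normalised signal and its peak indices, `most_stable_pair` points to the pair of adjacent periods that resemble each other most closely, which is one plausible reading of "most stable segment".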

PPG
In the literature, morphological features from PPG and complexity features from ECG are often used to predict BP (Elgendi 2012; Kachuee et al. 2017; Simjanoska et al. 2018; Yang et al. 2020). There are more than twenty features that can be extracted from a PPG signal and its first and second derivatives (Elgendi 2012). Twelve of them are selected and used for further estimation in this research. A PPG signal with labelled features is displayed in Fig. 1a, and a PPG signal and its second derivative are shown in Fig. 1b.

ECG
The ECG features extracted and used in this research are listed in Table 2. Most of these features are obtained from complexity analysis, except heart rate, which is calculated from the peak-to-peak time interval of the ECG signal.
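As an illustration of the heart rate feature, the following sketch converts successive R-peak positions into a mean heart rate. It assumes the peaks are given as sample indices at the 1000 samples/second rate used in this study; the function name is hypothetical.

```python
def heart_rate_bpm(r_peak_indices, fs=1000):
    """Mean heart rate in beats per minute from successive R-peak sample indices."""
    if len(r_peak_indices) < 2:
        raise ValueError("need at least two R peaks")
    # Peak-to-peak intervals in seconds.
    intervals = [(b - a) / fs for a, b in zip(r_peak_indices, r_peak_indices[1:])]
    mean_rr = sum(intervals) / len(intervals)
    return 60.0 / mean_rr
```

For example, R peaks spaced 800 samples apart correspond to an interval of 0.8 s and hence a heart rate of about 75 beats per minute.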

Pulse arrival time (PAT)
PAT is extracted and applied as one of the features in this study. PAT is defined as the time interval between the electrical activation of the heart and arrival of the pulse pressure at a distal point, measured as the time between the peaks of the PPG and ECG (Chan et al. 2019). It includes the pre-ejection period, which is the time it takes for blood to leave the heart after the heart's electrical impulse.

Table 1 PPG features (rows 3 to 12)
No. | Feature | Description
3 | Peak to peak interval | Time difference between two successive systolic peaks
4 | Inflection point (Millasseau et al. 2002) | Used to replace diastolic point
5 | Augmentation index (Elgendi 2012) | AuI = x/y
6 | Large arterial stiffness index | Inversely related to the time interval ΔT
7-10 | S1, S2, S3, S4 | Areas under the PPG signal
11 | Crest time (Alty et al. 2007) | CT
12 | Ratio of b/a (Baek et al. 2007) | From 2nd derivative (Fig. 1b)
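The PAT definition can be sketched as follows, assuming that the ECG R peaks and PPG systolic peaks have already been detected and are given as sorted sample indices at 1000 samples/second. The pairing rule (first PPG peak after each R peak and before the next one) is an assumption made for illustration, and the function name is hypothetical.

```python
def pulse_arrival_times(ecg_r_peaks, ppg_sys_peaks, fs=1000):
    """For each ECG R peak, time in seconds to the next PPG systolic peak.

    R peaks with no PPG peak before the following R peak are skipped.
    Both inputs are assumed to be sorted lists of sample indices.
    """
    pats = []
    for i, r in enumerate(ecg_r_peaks):
        next_r = ecg_r_peaks[i + 1] if i + 1 < len(ecg_r_peaks) else float("inf")
        following = [p for p in ppg_sys_peaks if r < p < next_r]
        if following:
            pats.append((following[0] - r) / fs)
    return pats
```

Averaging the returned values over a stable segment would give a single PAT feature per record, although the averaging choice is not specified in the text.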

Traditional machine learning methods
Several commonly used machine learning methods are applied in this study to evaluate their effectiveness in predicting BP using features extracted from PPG and ECG signals and physical characteristics.
LASSO (least absolute shrinkage and selection operator) is a linear model with an L1 prior as a regularizer (Friedman et al. 2010). As a large number of features are used to predict BP, it is important to add a regularization term in linear models to help with variable selection. LASSO is able to perform both variable selection and regularization, which can increase prediction accuracy. The amount of regularization is controlled by α, the coefficient of the L1 term, which can be determined experimentally using cross-validation during the training process. In this study, fivefold cross-validation is used to select α.
Support Vector Regression (SVR) is a popular machine learning model and has been proven to be an effective tool in real-value function estimation (Drucker et al. 1996). SVR uses a symmetrical loss function in which errors with absolute values smaller than a certain threshold are ignored. As a result, the model produced by SVR depends only on a subset of the training data. A fivefold cross-validated grid-search is used to search for the optimal values of several important parameters, including kernel type (linear, polynomial, radial basis function), kernel coefficient (0.1, 0.01, 0.001, 0.0001), regularization parameter (1, 0.1, 0.01, 0.001, 0.0001) and epsilon-tube (0.1, 1, 5, 10, 20), which specifies the tolerance level.
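To make the SVR description concrete, the sketch below shows the epsilon-insensitive loss and enumerates the candidate settings listed in the text. The dictionary keys ("kernel", "gamma", "C", "epsilon") follow a common naming convention and are an assumption, since the text does not name the software used.

```python
from itertools import product

def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Errors inside the epsilon-tube cost nothing; larger errors cost their excess."""
    return max(0.0, abs(y_true - y_pred) - epsilon)

# Candidate values quoted in the text for the SVR grid search.
svr_grid = {
    "kernel": ["linear", "poly", "rbf"],
    "gamma": [0.1, 0.01, 0.001, 0.0001],   # kernel coefficient
    "C": [1, 0.1, 0.01, 0.001, 0.0001],    # regularization parameter
    "epsilon": [0.1, 1, 5, 10, 20],        # width of the epsilon-tube
}

# Every combination that a full grid search would evaluate.
candidates = [dict(zip(svr_grid, values)) for values in product(*svr_grid.values())]
```

The grid contains 3 × 4 × 5 × 5 = 300 combinations, each of which would be scored with fivefold cross-validation on the training data.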
AdaBoost, short for Adaptive Boosting, is an ensemble method that fits a sequence of weak learners (other types of learning algorithms) to improve performance (Drucker 1997). The final output is a weighted sum of the predictions generated by these weak learners. A commonly used weak learner, the decision tree regressor, is adopted in this study. A fivefold cross-validated grid-search is further used to search for the optimal values of the number of iterations (5, 50, 500), learning rate (1, 0.1, 0.01, 0.001, 0.0001) and loss function (linear, square, exponential).
Random forest (RF) is another ensemble method that constructs a number of decision trees built from samples drawn with replacement (Breiman 2001). With the added randomness, random forest can decrease the variance of the forest estimator. A fivefold cross-validated grid-search is used to search for the optimal values of several important parameters, namely the number of trees (100, 150, 200, 500, 1000), the criterion to measure the quality of a split (mean squared error, mean absolute error) and the minimum number of samples required to split an internal node (2, 3, 4, 5, 10).
K-Nearest Neighbours (KNN) is a non-parametric method that calculates the predicted value by taking a weighted average of the values of the k nearest neighbours. The integer k needs to be specified, as well as the weighting scheme and distance metric. In this study, a fivefold cross-validated grid-search is used to search for the optimal values of k (1, 5, 10, 15, 20), weighting scheme (uniform, distance) and distance metric (Euclidean, Manhattan).
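The KNN prediction rule can be sketched in a few lines of plain Python. This is an illustrative implementation under the usual definitions of uniform and inverse-distance weighting, not the code used in the study; the function name and example values are hypothetical.

```python
def knn_predict(train_x, train_y, query, k=5, weighting="uniform", metric="euclidean"):
    """Predict a value as the (weighted) average of the k nearest training targets."""
    def distance(a, b):
        if metric == "manhattan":
            return sum(abs(u - v) for u, v in zip(a, b))
        return sum((u - v) ** 2 for u, v in zip(a, b)) ** 0.5

    nearest = sorted((distance(x, query), y) for x, y in zip(train_x, train_y))[:k]
    if weighting == "distance":
        # A neighbour at distance zero would get infinite weight, so use it directly.
        exact = [y for d, y in nearest if d == 0]
        if exact:
            return sum(exact) / len(exact)
        weights = [1.0 / d for d, _ in nearest]
    else:
        weights = [1.0] * len(nearest)
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)
```

With distance weighting, closer neighbours pull the prediction towards their own BP value, whereas uniform weighting treats all k neighbours equally.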
Multi-layer Perceptron (MLP) is a typical class of feedforward neural network and it has the capability to learn non-linear models. It consists of at least three layers, including input, hidden and output layers. A fivefold cross-validated grid-search is used to search for the optimal values of several important parameters, namely the number of hidden layers (1, 2, 3), number of nodes in the hidden layers (5, 10, 20, 50), activation function in the hidden layer (logistic sigmoid, hyperbolic tangent, ReLU), coefficient for the L2 regularization term (1, 0.1, 0.01, 0.001, 0.0001) and maximum number of iterations (100, 200, 500, 1000).

Proposed deep learning model
This study proposes a novel deep learning model to utilize the information contained in the PPG and ECG along with physical characteristics to predict BP. In contrast to the methods mentioned earlier, which require pre-processing and feature extraction from the PPG and ECG, deep learning models can directly take the raw signal data as input, and the feature learning is essentially embedded in the modelling process. This novel hybrid deep learning model consists of various types of neural network models, such as the convolutional neural network (CNN), long short-term memory (LSTM) and fully connected layer (Dense). The Dense layer is essentially a hidden layer in the MLP.
CNN was initially developed for image classification problems, where it receives two-dimensional image pixels as input and generates output after a series of operations that involve pattern learning. Multiple CNN layers are often applied in problems like this so that simple patterns can first be identified in the lower layers and be used to form more complex patterns within higher layers (Krizhevsky et al. 2012). The same process can be applied to one-dimensional time series data, such as the PPG and ECG in this study. One-dimensional CNN (1D CNN) can automatically learn to extract useful features from these signals and how to construct appropriate models to predict BP.
1D CNN applies the convolution operation on the input data with a number of filters (also called feature detectors) (LeCun and Bengio 1995). The length of these filters can be specified and is often referred to as the kernel size. These filters are then moved along the signals, and the shift size is referred to as the stride, which is often chosen to be 1. Different types of padding can be applied to determine the size of the output. Zero-padding is often found to perform well in practice (Krizhevsky et al. 2012), and it is also adopted in this study. An activation function is often applied to the results generated from the convolution operation. ReLU is very popular and found to perform well in practice (Jarrett et al. 2009). Convolutional layers are often followed by dropout layers for regularization, and then pooling layers, such as max pooling and average pooling (Krizhevsky et al. 2012; Srivastava et al. 2014). CNN models tend to learn very quickly, and the dropout layer can help slow down the learning process and result in a potentially better final model. The pooling layers can help reduce the dimension and consolidate learned features to the most essential elements. A pool length of 2 is often used in practice and it is also adopted in this study. Several convolutional layers can be stacked together to extract more complicated features. Hyperparameters that need to be determined for 1D CNN layers include kernel size (3, 5, 7, 9), number of filters (64, 128, 256, 512) and number of epochs (20, 50, 100). In this study, the range of kernel sizes, number of filters and epochs is investigated using a cross-validation process in which an optimum is selected based on accuracy and convergence time.
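The operations inside a convolutional block can be illustrated with a minimal single-filter sketch in plain Python: a stride-1 convolution with zero-padding, a ReLU activation and max pooling with pool length 2. The dropout layer is omitted for brevity, the kernel values are hypothetical, and a real layer would learn many filters rather than apply one fixed kernel.

```python
def conv1d_same(signal, kernel):
    """Stride-1 convolution (cross-correlation, as in CNN layers) with zero padding,
    so the output has the same length as the input."""
    k = len(kernel)
    pad_left = (k - 1) // 2
    pad_right = k - 1 - pad_left
    padded = [0.0] * pad_left + list(signal) + [0.0] * pad_right
    return [sum(w * padded[i + j] for j, w in enumerate(kernel))
            for i in range(len(signal))]

def relu(values):
    """Replace negative activations with zero."""
    return [max(0.0, v) for v in values]

def max_pool(values, pool_length=2):
    """Non-overlapping max pooling; a trailing incomplete window is dropped."""
    return [max(values[i:i + pool_length])
            for i in range(0, len(values) - pool_length + 1, pool_length)]

def cnn_block(signal, kernel):
    """Convolution, then ReLU, then max pooling (dropout omitted)."""
    return max_pool(relu(conv1d_same(signal, kernel)))
```

Applied to a six-sample ramp with the difference kernel `[-1, 0, 1]`, the block returns three values, showing how pooling halves the sequence length while the padded convolution preserves it.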
LSTM network model is a special type of recurrent neural network (RNN) that is able to learn long-term dependencies (Hochreiter and Schmidhuber 1997). It has been proven to be effective for sequence prediction tasks such as speech recognition, natural language processing and machine translation (Chen et al. 2017;Cui et al. 2016;Tian et al. 2017).
A typical memory block in LSTM contains a memory cell and three gates, namely, input, output and forget gates. The activation function associated with the gates is often the logistic sigmoid function. LSTM can support multiple parallel sequences of input data, such as the PPG and ECG signals in this study. LSTM can be used to automatically learn temporal dependencies in raw PPG and ECG signals and use them to predict BP values (Su et al. 2018). The parameter that needs to be chosen for the LSTM is the length of the state vector (10, 50, 100).
CNN and LSTM are two types of deep learning structures that can be used separately to automatically learn from raw PPG and ECG signals to predict BP. They can also be stacked together in a way that the output from CNN is fed to the following LSTM layer. This stacked structure can be used to extract useful features and then learn the long-term temporal dependencies from the raw signals. This type of structure has been used for tasks such as detection of diabetes (Goutham et al. 2018), human activity recognition (Ordóñez and Roggen 2016), continuous cardiac monitoring (Saadatnejad et al. 2020), atrial fibrillation detection (Gotlibovych et al. 2018) and classification of myocardial infarction (Baloglu et al. 2019), and it is often found to perform well in practice.
In addition to the raw signals, this study investigates a novel deep learning structure that can also utilize useful information contained in physical characteristics to predict BP. This novel model consists of various types of models, including CNN, LSTM and Dense. This new structure can directly take raw signals and physical characteristics as input at the same time. It can learn to automatically pick up useful information contained in different types of input data and find an optimal way to link to BP.

Experimental design
Two streams of experiments are conducted in this study. The first stream involves the use of physical characteristics and features extracted from PPG and ECG signals, which are then used as input in traditional machine learning methods, namely LASSO, SVR, AdaBoost, RF, KNN and MLP. The second stream is the construction of novel hybrid models that consist of various deep learning methods such as CNN and LSTM and utilise physical characteristics and the raw PPG and ECG directly as inputs. Several different architectures of hybrid models are investigated, which comprise different numbers of CNN and LSTM layers. Experimental data is gathered using the set-up and protocol described in the next section.

Data acquisition system
The proposed cuff-less BP estimation system is illustrated in Fig. 2, where BP, ECG and PPG are measured simultaneously. Acquisition of data with this system has been reviewed and approved according to the ethical review process in place.
The measurement system shown in Fig. 2 comprises three sections for measurement of BP, ECG and PPG. BP values are measured as a reference standard by a commercial device (Lloyds Pharmacy Fully Automatic Blood Pressure Monitor LBPK1) with a measurement accuracy of ± 3 mmHg (Lloyds 2021). The PPG signal is measured by infrared transmission through the finger via a finger clip sensor (HRM-2511E, Kyoto Electronic Co., China) with data transferred to a data acquisition board (Easy Pulse Sensor Version 1.1, Elecrow, China) (Raj 2013). The ECG is measured with 3 disposable solid gel electrodes based on the lead I configuration, placed on the 2 wrists and an ankle and connected to a data acquisition board (Analog Devices, AD8232) (Lu et al. 2014). Due to availability and convenience, the power for both circuit boards is supplied by an Arduino UNO board (Arduino Co., Italy).
The measured PPG and ECG are then transmitted to a data acquisition device (USB-6211, National Instruments). The sample frequency for the data acquisition is 1 k samples/second in order to achieve a high-quality signal. The collected signals are sent to the processing unit which is a battery powered laptop for the benefit of minimum noise and to isolate the subject from mains power lines. All data were monitored and recorded through LabView (National Instruments).

Data collection protocol
Data collection is performed on 45 participants. A detailed description is listed in Table 3. All participants are healthy adults with no apparent arterial disease or physiological abnormality. Informed consent is obtained from all participants and they are requested not to consume caffeinated drinks or a heavy meal in the 4 h before the experiment, to prevent large variability in BP. For each participant, the data collection includes two measurement sections occurring on the same day using the same data acquisition protocol. Each experiment takes less than 40 min to collect all relevant signals with the following protocol:
1. The participant stays still for 10 min, during which the consent forms are signed; the individual physical characteristics including age, height, weight and gender are recorded; the cuff of the commercial BP measurement device is worn on the upper right arm; electrodes are pasted on the limbs for the ECG signal; and the clip is fixed on the index finger of the left hand for PPG signal acquisition. Participants are requested to keep still during the measurement because the PPG signal is sensitive to movement.
2. The PPG and ECG are recorded continuously for a period of 3 min. At the same time, BP is also measured. This procedure is repeated 3 times.
3. To induce a change in BP, the participant is asked to go downstairs from the 4th floor to the 1st floor and then return as rapidly and safely as possible.
4. Once the participant returns, the same procedure as in step (2) is repeated, but 4 times.
Hence, 7 sets of data are collected from each participant within around 40 min, giving a total of 315 records. Each record includes PPG, ECG, BP and the corresponding participant's physical characteristics.
While raw PPG and ECG signals are fed directly into the hybrid deep learning model, pre-processed and extracted features from PPG and ECG signals are used as inputs for traditional machine learning methods. As detailed in Sect. 2.2, 12 features are extracted from PPG and 45 features are extracted from ECG. In addition, PAT is also extracted, which involves the use of both PPG and ECG. As a result, there are 58 features extracted from PPG and ECG in total. Combined with four physical characteristics, the input dimension for each observation in traditional machine learning models is 62. As models for DBP and SBP are separately built, the output for each observation in traditional machine learning models is 1, which is the corresponding DBP or SBP value.

Cross validation experiments
To generate a model with good generalization ability, this study conducts fivefold cross-validation (CV) experiments where the training and testing samples are from different sets of subjects. Since there are 45 participants in this study, data samples from 9 random participants are used as testing samples and the rest are used as training samples in each CV experiment. CV experiments are repeated 20 times and the evaluation results are averaged over these 20 experiments. Separate models are built for systolic blood pressure (SBP) and diastolic blood pressure (DBP). Such an experimental design provides robust results as it involves multiple experiments to tackle the potential instability in a particular CV experiment. In addition, CV can also help avoid dependence of results on the choice of the split in each experiment.
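A subject-wise split of this kind can be sketched as below. The sketch assumes 45 subject identifiers with 7 records each, shuffles the subjects with a fixed seed and assigns 9 of them to the test set of each fold; the function name and seed are hypothetical, and any subjects left over when the count is not divisible by the number of folds would simply stay in the training sets.

```python
import random

def subject_wise_folds(subject_ids, n_folds=5, seed=0):
    """Shuffle subjects and yield (train_subjects, test_subjects) for each fold,
    so that no subject contributes records to both sets."""
    subjects = sorted(set(subject_ids))
    random.Random(seed).shuffle(subjects)
    fold_size = len(subjects) // n_folds
    for f in range(n_folds):
        test = subjects[f * fold_size:(f + 1) * fold_size]
        train = [s for s in subjects if s not in test]
        yield train, test

# One subject label per record: 45 participants with 7 records each.
record_subjects = [s for s in range(45) for _ in range(7)]
folds = list(subject_wise_folds(record_subjects))
```

Repeating the procedure with 20 different seeds would correspond to the 20 repetitions of the CV experiment described above.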
During the training process of each CV experiment, the hyperparameters of traditional machine learning methods are determined using a fivefold CV on the training data set. For hybrid deep learning models, as they take a lot more time to train, their hyperparameters are determined in the first CV experiment where there are 5 different train-test splits. The best hyperparameters are decided to be the ones that are chosen most times in these 5 splits.
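The selection rule for the hybrid models can be sketched as a simple majority vote. The sketch assumes that a complete hyperparameter setting is chosen jointly in each split (the text does not say whether the vote is joint or per parameter), and the example selections in the usage note are purely illustrative.

```python
from collections import Counter

def most_frequent_setting(chosen_per_split):
    """Return the hyperparameter setting chosen most often across the splits.

    Ties are resolved in favour of the setting encountered first.
    """
    counts = Counter(tuple(sorted(setting.items())) for setting in chosen_per_split)
    winner, _ = counts.most_common(1)[0]
    return dict(winner)
```

For instance, if three of the five splits select `{"kernel_size": 7, "filters": 128}` and the other two select different settings, that setting is returned and reused in the remaining CV experiments.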

Hybrid model architectures
A general representation of the architecture of the hybrid model is shown in Fig. 3.
In contrast to traditional machine learning methods, this newly proposed hybrid model can take raw PPG and ECG signals and physical characteristics as simultaneous inputs by combining different deep learning structures. This hybrid model does not need any feature extraction from the raw signals and can learn to extract the optimal features itself. The input consists of two main parts, namely raw PPG and ECG signals and physical characteristics. The dimension of the signal part for each sample observation is (5000, 2), which means it is 5 s of data (sampling rate 1000/s) and 2 channels (PPG and ECG), while the dimension of physical characteristics is 4, including age, height, weight and gender. Again, as the models for DBP and SBP are separately built, the output dimension is 1.
Fig. 3 The architecture of the hybrid model
CNN blocks and LSTM are used to extract features from raw signals, while a Dense layer is used to extract features from physical characteristics. The features learnt are then concatenated and fed to another Dense layer. Finally, this Dense layer is followed by an output layer with no (linear) activation function, as the target variable BP is continuous. Mean absolute error (MAE) is used as the loss function.
As CNN layers can be stacked together to extract more complicated features, different numbers of CNN blocks are used to form several different architectures. The number of CNN blocks is set to vary from 1 to 5, which leads to 5 different architectures. We denote hybrid models with 1-5 CNN blocks as Hybrid Model 1-5 respectively.
Each CNN block comprises 1D CNN, dropout and max pooling layers. A dropout layer is also used following the LSTM because it can impose regularization and prevent overfitting. The dropout rate defines the probability of a randomly selected neuron being dropped out. Dropout is only applied during training and is not used in testing. The dropout rate is chosen from 0.1, 0.2 and 0.5 during training, when the other hyperparameters of the hybrid model are being chosen, including the number of hidden nodes for the Dense layers (10, 50, 100).
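The effect of stacking CNN blocks on the sequence length can be worked out with a short calculation. It assumes that the zero-padded, stride-1 convolution preserves the length and that each max pooling layer with pool length 2 halves it, discarding a trailing odd sample; the function and constant names are hypothetical.

```python
def length_after_blocks(n_samples, n_blocks, pool_length=2):
    """Sequence length after n_blocks of (same-padded convolution + max pooling)."""
    length = n_samples
    for _ in range(n_blocks):
        length //= pool_length  # pooling drops any incomplete final window
    return length

SAMPLING_RATE = 1000   # samples per second, as in the data acquisition set-up
WINDOW_SECONDS = 5     # duration of each input sample
n_samples = SAMPLING_RATE * WINDOW_SECONDS

# Sequence length passed on to the LSTM for Hybrid Model 1-5.
lengths = {blocks: length_after_blocks(n_samples, blocks) for blocks in range(1, 6)}
```

Under these assumptions, the 5000-sample input is reduced to 2500, 1250, 625, 312 and 156 time steps for Hybrid Model 1 to 5 respectively, so deeper models hand a much shorter sequence to the LSTM.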

Measured BP
Histograms of the BP data obtained from the sphygmomanometer are presented in Fig. 4. The measured DBP and SBP ranged from 56 to 106 mmHg and from 84 to 170 mmHg respectively. The relatively large range of BP values is driven by the measurements taken at intervals after physical exercise, which are included in order to test the robustness of the prediction of BP values.

Training and prediction results
After the first CV experiment, the hyperparameters are all chosen for the hybrid models. Then, after 20 repetitions of CV experiments, the model with the best prediction performance is Hybrid Model 3 with the following configuration details:
• CNN parameters: kernel size 7, number of filters 128 and number of epochs 20
• Dropout rate: 0.5
• Pool length of max pooling layer: 2
• Length of state vector of LSTM: 50
• Number of hidden nodes for the Dense layers: 50
The BP prediction results of traditional machine learning methods and the newly proposed hybrid models are shown in Table 4. Criteria for performance evaluation are MAE and standard deviation (STD) of estimation.
The MAE and STD are calculated over 20 repetitions of CV experiments. According to Table 4, some comparisons can be made. For instance, what stands out in the table is that Hybrid Model 3 performs best in terms of both DBP and SBP with the results of 3.23 ± 4.75 mmHg and 4.43 ± 6.09 mmHg respectively. It is closely followed by Hybrid Model 4 and Hybrid Model 5. It suggests that 3 CNN blocks are sufficient to extract useful features from the raw signals.
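The two evaluation criteria can be sketched as follows. The sketch assumes that STD is the population standard deviation of the signed prediction errors, which is one common reading of "standard deviation of estimation" but is not stated explicitly in the text; the function name is hypothetical.

```python
from statistics import mean, pstdev

def mae_and_std(y_true, y_pred):
    """Mean absolute error and standard deviation of the signed errors (mmHg)."""
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = mean(abs(e) for e in errors)
    std = pstdev(errors)  # assumed definition: population STD of signed errors
    return mae, std
```

With illustrative reference values of 80, 90, 100 and 110 mmHg and predictions of 82, 88, 103 and 107 mmHg, the signed errors are 2, -2, 3 and -3 mmHg, giving an MAE of 2.5 mmHg.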
All hybrid models achieve lower SBP and DBP errors than the traditional machine learning methods. This indicates that the newly proposed hybrid model architecture can extract more information from the raw signals than is captured by manually extracted features, which leads to a more accurate prediction of BP when combined with physical characteristics. It also points to a possible misrepresentation of information by manually extracted features, due to the challenges encountered with distorted waveforms. Deep learning models seem to be more robust in this regard.
Among the traditional machine learning methods, SVR performs best in terms of DBP with a result of 5.05 ± 7.26 mmHg, followed by MLP and KNN. In terms of SBP, MLP, with 6.92 ± 9.11 mmHg, performs best among traditional machine learning methods, followed by SVR and RF. LASSO performs worst for both DBP and SBP, from which it can be inferred that strong nonlinear relationships exist between the features used and BP, leading to the inferior performance of this linear model. In addition, the inherent complexity of this problem necessitates powerful regression algorithms, such as the hybrid models.
In order to further understand the origins of the improvements provided by the proposed hybrid model, the performance of models without physical characteristics is investigated. This examines whether the superior performance of the hybrid models is mainly driven by automatic feature extraction from raw signals or by the inclusion of the physical characteristics of the subject.
The number of CNN blocks of these deep learning models is again set to vary from 1 to 5, and we denote these models by DP 1-5 respectively. The same procedure applied for hybrid models is used for training.
The results of DP are shown in Table 5. Regarding DBP prediction, the accuracy of DP models improves quickly when the number of CNN blocks increases from 1 to 3. However, as the number further increases from 3 to 5, the performance does not improve. A similar pattern can be observed for SBP prediction by the DP models, except that the lowest MAE and STD are obtained when the number of CNN blocks is equal to 4. When comparing DP and hybrid models with the same number of CNN blocks, the results indicate that the hybrid model always performs better than DP. This indicates that the inclusion of physical characteristics increases the prediction accuracy.
The results from the 3 best performing models, which are the Hybrid Models with 3, 4 and 5 CNN blocks, are further compared with the British Hypertension Society (BHS) standard as shown in Table 6. This standard grades a method according to the cumulative percentage of readings whose error falls within 5 mmHg, 10 mmHg and 15 mmHg (O'Brien et al. 2001). In this work, the predicted value of DBP obtained from the Hybrid Model with 3 CNN blocks is consistent with Grade A and the other two models meet Grade C. In addition, the Hybrid Model with 3 CNN blocks is consistent with Grade B and that with 4 CNN blocks meets Grade C in the estimation of SBP values. However, the estimation of SBP from the Hybrid Model with 5 CNN blocks is not consistent with the BHS standard.
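The BHS grading can be sketched as below, using the commonly cited grade thresholds of the BHS protocol (Grade A: 60%, 85% and 95% of readings within 5, 10 and 15 mmHg; Grade B: 50%, 75% and 90%; Grade C: 40%, 65% and 85%). These thresholds and the inclusive comparison are stated here as assumptions, since the text does not list them, and the function names are hypothetical.

```python
# Minimum cumulative percentages within 5, 10 and 15 mmHg for each grade
# (assumed from the BHS protocol; not listed in the text).
BHS_THRESHOLDS = {
    "A": (60, 85, 95),
    "B": (50, 75, 90),
    "C": (40, 65, 85),
}

def cumulative_percentages(abs_errors):
    """Percentage of absolute errors within 5, 10 and 15 mmHg."""
    n = len(abs_errors)
    return tuple(100.0 * sum(e <= limit for e in abs_errors) / n
                 for limit in (5, 10, 15))

def bhs_grade(abs_errors):
    """Best grade whose three thresholds are all met, or 'D' if none is met."""
    achieved = cumulative_percentages(abs_errors)
    for grade, required in BHS_THRESHOLDS.items():
        if all(a >= r for a, r in zip(achieved, required)):
            return grade
    return "D"
```

A model is assigned the highest grade for which all three cumulative percentages are reached; results below Grade C fall outside the standard.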
The Association for the Advancement of Medical Instrumentation (AAMI) standard requires BP measurement devices to have MAE and STD values lower than 5 mmHg and 8 mmHg, respectively. According to Table 6, all hybrid models meet the requirements when estimating DBP. However, only Hybrid Models 3 and 4 are consistent with the standard in SBP estimation. Also, the MAE and STD values of all traditional machine learning models are outside the stipulated limits.
Table 7 compares the approach proposed in this paper with other works in the literature which use PPG, ECG and machine learning algorithms for BP estimation. In general, it is difficult to compare related work in this field because of different and inadequately specified databases, different signal pre-processing procedures, and different evaluation methods with different machine learning algorithms. In addition, some authors mixed all data and split them randomly for training and testing, but did not explicitly report whether any data of test subjects were included in the training data. It is therefore difficult to perform an objective comparison between different studies.
However, from a general perspective, most previous research applies traditional machine learning algorithms, and the best results for DBP estimation range from about 4 to 6 mmHg, while the best SBP estimation results vary from 5 to 12 mmHg. Although the BP estimation results of the traditional algorithms in this work have no obvious advantage over the results of previous research, the hybrid models provide more accurate prediction of BP.

Discussion and conclusions
In this paper, a novel hybrid deep learning model is proposed to predict BP using raw PPG and ECG signals together with physical characteristics. Traditional machine learning methods for predicting BP rely on extracting features from the signals, which is often challenging when the signal quality is poor. The hybrid deep learning model consists of several different types of deep learning layers, which enable automatic feature extraction and can learn to extract optimal features during the modelling process. The hybrid models are tested on the collected data set and provide superior prediction results compared with traditional machine learning models. Deep learning models have shown high performance in many research areas, and this study shows their potential for predicting BP. Because of their flexible structure, deep learning models can receive various combinations of different types of inputs. This is a very useful feature, as incorporating more physiological data relevant to BP is likely to increase the prediction accuracy. The best-performing hybrid model achieves 3.23 ± 4.75 mmHg for DBP estimation and 4.43 ± 6.09 mmHg for SBP estimation. This result is consistent with BHS Grade A and Grade B for DBP and SBP, respectively, and the model also meets the requirements of the AAMI standard. This indicates that hybrid models using raw PPG and ECG signals have high potential in cuff-less BP estimation.
Different numbers of CNN blocks are tested in this study, and three CNN blocks are found to provide the best prediction results. Compared with applications in other areas such as image processing, which often benefit from many more CNN layers, the useful features contained in physiological signals are not as complex. Therefore, the hybrid model does not have to be very deep. Indeed, the hybrid models with four and five CNN blocks are outperformed by the hybrid model with three CNN blocks.
An LSTM layer is included after the CNN blocks because it is well suited to finding important temporal features in time series data such as physiological signals. After the LSTM, the features extracted from the signals are combined with the physical characteristics, which are age, height, weight and gender in this study.
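The data flow described above (CNN blocks, then an LSTM, then concatenation with the physical characteristics) can be sketched as a plain NumPy forward pass. This is a shape-level illustration with random, untrained weights and is not the model trained in this study. The following are all assumptions made for the example: PPG and ECG are stacked as two input channels, each block is a convolution with ReLU and max pooling, a single dense layer outputs DBP and SBP together, and the kernel size, filter count, LSTM units, signal length and every function name are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def conv_block(x, w, b, pool=2):
    """1-D 'valid' convolution, ReLU, then non-overlapping max pooling.

    x: (time, in_ch), w: (kernel, in_ch, out_ch), b: (out_ch,)
    """
    k = w.shape[0]
    # windows has shape (time - k + 1, in_ch, k)
    windows = np.lib.stride_tricks.sliding_window_view(x, k, axis=0)
    y = np.maximum(np.einsum("tck,kco->to", windows, w) + b, 0.0)
    t = (y.shape[0] // pool) * pool  # drop a trailing step if length is odd
    return y[:t].reshape(-1, pool, y.shape[1]).max(axis=1)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_last_state(seq, W, U, b):
    """Run a single-layer LSTM over seq (time, features); return final h."""
    units = U.shape[0]
    h, c = np.zeros(units), np.zeros(units)
    for x_t in seq:
        i, f, g, o = np.split(x_t @ W + h @ U + b, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
    return h


def build_params(n_blocks=3, in_ch=2, filters=16, kernel=5, units=32,
                 n_phys=4, n_out=2):
    """Random weights for illustration; a real model would learn these."""
    init = lambda shape: rng.normal(scale=0.1, size=shape)
    conv, ch = [], in_ch
    for _ in range(n_blocks):
        conv.append((init((kernel, ch, filters)), np.zeros(filters)))
        ch = filters
    lstm = (init((ch, 4 * units)), init((units, 4 * units)), np.zeros(4 * units))
    dense = (init((units + n_phys, n_out)), np.zeros(n_out))
    return {"conv": conv, "lstm": lstm, "dense": dense}


def hybrid_forward(signals, phys, params):
    """signals: (time, 2) raw PPG and ECG; phys: (4,) physical characteristics."""
    x = signals
    for w, b in params["conv"]:
        x = conv_block(x, w, b)
    h = lstm_last_state(x, *params["lstm"])
    merged = np.concatenate([h, phys])  # signal features + physical characteristics
    w, b = params["dense"]
    return merged @ w + b  # assumed output order: [DBP, SBP]


params = build_params(n_blocks=3)
signals = rng.normal(size=(1000, 2))         # placeholder for raw PPG and ECG
phys = np.array([0.45, 0.70, 0.35, 1.0])     # placeholder scaled age, height, weight, gender
prediction = hybrid_forward(signals, phys, params)
print(prediction.shape)
```

Changing `n_blocks` reproduces the kind of depth comparison discussed in this section: each additional block roughly halves the sequence length passed to the LSTM, while the physical characteristics bypass the convolutional and recurrent layers entirely.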
By automatically learning optimal features in the training stage, the hybrid model minimizes the risk of omitting important features contained in the signals. Traditional feature extraction requires expert knowledge of specific signals, and it is often not possible to extract all potentially useful features.
Despite the promising results of this study, many important research questions remain. The focus was on developing a general hybrid model whose hyperparameters are determined using data pooled from a number of individuals. A more accurate model could be obtained by tuning these hyperparameters on the data of each individual, since the optimal structure and hyperparameters may vary between people. Such models could then be further calibrated to provide potentially more accurate predictions for each person. The data used in this study were collected by the authors and comprise 315 samples in total. As deep learning models often require large data sets, more data is likely to further improve the prediction accuracy. Therefore, in the next stage, we intend to test the hybrid model on larger data sets, including publicly available ones. In addition, this paper focuses on improving the prediction accuracy of BP. However, before deep learning approaches are widely adopted, it is important to consider the causality and relative importance of the various features in predicting BP values (Holzinger et al. 2019). Because of the multilayer and nonlinear structure of deep learning models, the relationship between input and output is not transparent and predictions are often not traceable. This limits the interpretability of deep learning models and makes them of limited use where causality is of great importance. Although beyond the scope of this work, this is a future direction that should be investigated.