1 Introduction & background

Heart monitoring is an important consideration for activities such as scuba diving or performance swimming. Many important metrics can be observed from cardiac analysis, arguably the most important being Beats Per Minute (BPM), as it is a crucial physiological marker in human beings [1]. Electrocardiography (ECG) is the gold standard approach for cardiac analysis; it operates by monitoring the electrical activity of the heart via chest electrodes. The signals from the chest electrodes are amplified via a differential amplifier and filtered accordingly, and the results can be plotted for analysis purposes [2]. This configuration generates the well-known ECG trace featuring its constituent components: the P, Q, R, S and T waves. In the time domain, to calculate BPM from an ECG trace, one of these wave components must be identified per heartbeat. Arguably the QRS complex, characterised by the sharp signal (R) peak, is the easiest feature to locate. Figure 1 demonstrates an example ECG trace sampled at 50 Hz. The difficulty associated with underwater ECG analysis, and therefore with generating underwater BPM, is the inherent signal noise and distortion resulting from underwater deployment.

Fig. 1 Example ECG trace sampled at 50 Hz

Solutions to this problem exist in the literature, including electrode fabrication methods. These involve utilising sealed Ag/AgCl electrodes or carbon powder electrodes [3, 4]. Such electrodes can yield clean underwater ECG traces suitable for further analysis; however, the Ag/AgCl electrodes are time consuming to apply and are reported to cause severe skin irritation [4]. Carbon powder electrodes are not associated with such difficulties and can also generate clean ECG traces, especially when combined with poly-dimethylsiloxane [3]. However, this process is chemical in nature, adding complexity to the design. An alternative to electrode design discussed in the literature is to focus on amplifier considerations. A solution proposed by Gradl et al. utilises a transimpedance amplifier circuit including active current feedback to produce clean underwater ECG waveforms [5]. Post amplification, additional filters are required to further clean the signal. This process also demonstrates good results but increases electronic circuit complexity. Whilst both of the solutions discussed here (electrode/amplifier design) can achieve usable underwater ECG measurements suitable for calculating BPM, there are demonstrated design/use issues and complexities associated with each.

We find no examples in the literature of software-focused solutions to underwater ECG interpretation. There are, however, demonstrated techniques for general ECG analysis of low Signal to Noise Ratio (SNR) data. For example, Vullings et al. demonstrate the effective filtering of noisy ECG data via novel signal processing techniques [6]. This method is compared to existing wavelet approaches and is shown to perform well. There are also several examples of machine learning methods intended for application to ECG signals; many solutions can be found that classify abnormalities such as arrhythmia [7, 8]. More relevantly, other machine learning techniques demonstrate R peak detection via unsupervised clustering methods [9] or supervised convolutional neural networks [10]. The supervised neural network approaches appear to perform significantly better.

Commercially, devices intended for swim fitness tracking exist. In particular, the Garmin HRM Swim strap [11] tracks BPM from the chest by using an ECG sensor during swimming. This product is, to our understanding, one of the best performing devices available on the market for this purpose. Garmin highlight that the non-slip strap is specifically designed for use in swimming pools but will also work in open water. Emphasis is placed on the design of the strap with regard to ECG signal generation. Garmin do not indicate whether any specialised signal processing is necessary to produce usable data, nor how accurate their device is with regard to generating BPM, and we can find no Garmin HRM Swim BPM accuracy information in the literature. Note that Garmin state a compatible Garmin watch is also required in order to use the HRM chest strap.

In addition to the strap discussed above, Garmin (and other companies) produce wrist worn devices capable of monitoring swimming heart rate via an optical technique called Photoplethysmography (PPG). However, nominal (land) PPG has demonstrated limitations, including inaccuracies especially at heightened BPM ranges [12]. Underwater PPG presents further difficulties, including decreased signal amplitude and SNR at lower water temperatures [13].

For this work, we present a software-based, time domain solution for underwater BPM generation. This is achieved via a CNN-RNN regression model that identifies the number of R peaks per 10 second ECG sample. A neural network approach is selected because of the demonstrated CNN performance on ECG data [10]. Whilst conventional signal processing methods that filter noisy ECG signals can perform well with regard to low SNR interpretation [6], our investigations show that a neural network approach is more effective. This is demonstrated in Section 3, where the performance of the devised CNN-RNN model on underwater ECG data is compared to the results of applying the Continuous Wavelet Transform (CWT), a signal processing solution suitable for locating R peaks in low SNR ECG data [14]. The R peak count per 10 second sample is then scaled (multiplied by 6) to estimate BPM. The devised approach requires minimal signal preprocessing prior to application and, in addition, does not require specific physical design of the electrodes, amplifier or chest strap in order to estimate BPM from underwater ECG signals.

The main contributions of this research are summarised by the list below:

  • A novel approach for counting R peaks in fixed length, low SNR ECG samples.

  • An approach for artefact replication: by applying modifications to clean ECG data, underwater noise/artefacts can be replicated, allowing training to be performed on data sampled in standard conditions.

  • A software-focused solution that can be applied to any sampled ECG data regardless of the collection method.

  • A method that performs well compared to conventional signal processing techniques.

The remainder of this manuscript is structured as follows. Section 2 introduces the machine learning techniques employed and details the design of the constructed sensor system and the structure of the devised neural network. Section 3 demonstrates the results obtained. It first explains the evaluation methods used, then analyses the recorded underwater ECG data. Modifications applied to clean ECG data, to replicate the underwater samples, are then investigated. The results of the neural network trained on this modified data and tested on underwater data are then given, comparing the performance against a network trained on unmodified data and against the application of CWT. These results are discussed in Section 4, where future work is also detailed. Finally, conclusions are given in Section 5.

2 Methods

2.1 Recurrent & convolutional neural networks

Advances in the field of machine learning have resulted in neural network structures comprised of specialised neurons that are well suited to specific tasks. This work demonstrates the combined use of two such structures: Recurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs). Recurrent neurons use information feedback, which gives rise to cell memory, meaning that RNNs are suited to interpreting time series data. However, RNNs suffer from short term memory loss, whereby information is preserved over only a limited number of time-steps [15]. CNNs are typically associated with image-related tasks but can be applied in one dimension to sequence data [15]. Where an RNN interprets a time series sequentially, a CNN interprets it spatially [15].

A network constructed with a layer of 1D convolutional neurons, followed by a layer of recurrent neurons, can often achieve better performance on sequence-based tasks than a network comprised solely of recurrent or convolutional neurons [15]. The convolutional layer outputs N one-dimensional signals, where N is the number of convolutional filters. Configuring the stride hyperparameter of this convolutional layer to be greater than 1 reduces the length of the output sequence(s), which alleviates the short term memory losses of the recurrent layer(s). Hence a CNN-RNN neural network configuration is used for this work.
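To make the effect of the stride concrete, the minimal TensorFlow sketch below shows a 500 step input being shortened to 100 steps before it would reach a recurrent layer. The filter count, kernel size and stride here are illustrative assumptions, not the values used in Section 2.3.

```python
import tensorflow as tf

# Illustrative Conv1D front end: filters, kernel size and stride are assumptions.
conv = tf.keras.layers.Conv1D(filters=32, kernel_size=7, strides=5,
                              padding="same", activation="relu")

x = tf.random.normal((1, 500, 1))   # one 10 second ECG sample at 50 Hz
y = conv(x)
print(y.shape)                      # (1, 100, 32): 5x fewer time-steps for the RNN
```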

2.2 Initial train/validation and test set generation

Figure 2 shows a diagram of the device we have constructed. Our chest worn device includes an ECG sensor (a single lead heart rate monitor [2]) and a microcontroller (Espressif ESP32 [16]), both of which are housed within an IP67 waterproof enclosure [17]. The ECG sensor performs the necessary signal processing/conditioning to produce readable ECG traces in standard conditions. Two ECG electrodes are fabricated into the elasticated chest strap; these electrodes are connected to the sensor cable (an RJ45 Ethernet lead [18]). To provide a waterproof interface between the electrodes (outside the IP67 enclosure) and the ECG sensor (within the IP67 enclosure), an IP68 RJ45 junction is used [19]. The microcontroller, powered via a LiPo battery, samples the ECG sensor at a rate of 200 Hz and transmits data every 10 seconds (2000 data points per transmission). The acquired data is transmitted to a nearby base-station (WiFi router), which communicates with a separate computer running a GUI designed by the author that saves all incoming data to CSV files.
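As an illustration of the base-station side of this pipeline, the sketch below shows a hypothetical receiver that accumulates one 10 second window (2000 points at 200 Hz) per message and appends it to a CSV file. The transport, port number and message format are assumptions, not details of the actual GUI.

```python
import csv
import socket

SAMPLES_PER_WINDOW = 2000          # 10 s at 200 Hz
PORT = 5005                        # assumed port, not taken from the paper

def run_receiver(csv_path="ecg_log.csv"):
    """Hypothetical base-station receiver: one comma-separated window per UDP datagram."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", PORT))
    with open(csv_path, "a", newline="") as f:
        writer = csv.writer(f)
        while True:
            payload, _ = sock.recvfrom(65535)
            values = [float(v) for v in payload.decode().split(",")]
            if len(values) == SAMPLES_PER_WINDOW:
                writer.writerow(values)   # one row per 10 second ECG window
                f.flush()
```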

Fig. 2 Diagram of the constructed sensor system

The microcontroller employed is WiFi enabled; hence this protocol is used for transmission. It is understood that Radio Frequency (RF) signals do not propagate well through water [20]. However, we find that data transmission is possible provided the enclosure housing the microcontroller is not submerged beyond a certain depth threshold.

Sampled ECG data can be broken down into two main categories:

  • Land based: from which 7359 10 second ECG samples have been collected. These samples are used for neural network training/validation purposes. To generate a range of BPM values from this ECG data, various activities such as sitting, ascending stairs or jogging on the spot were performed during sampling. Each individual sample is normalised to a range of (-1,1) and resampled at 50 Hz using MATLAB's resample function. This reduces each sequence from 2000 to 500 data points per 10 seconds. Network performance is poor without this step because of recurrent neuron short term memory.

    The samples are labelled via MATLAB's peak detection algorithm [21], where minimum peak height and minimum spacing values are set to identify the R peaks of the ECG waveform; the peak detection algorithm is configured with a minimum peak height of 0.7 and a minimum peak spacing of 0.5. To verify this process, visual analysis was performed comparing peaks identified by software against peaks identified by eye. Having labelled this data set, it is split in a stratified manner into training and validation subsets with an 80:20 ratio. As mentioned previously, this land ECG data, once modified, is used for neural network training; a sketch of the preprocessing and labelling steps is given after this list.

  • Underwater: from which 525 10 second ECG samples have been collected, either from a swimming pool (chlorine concentration 1.5 ppm) or a bath (UK tap water). These samples are used to assess trained neural network performance and, as before, are normalised to (-1,1) and resampled at 50 Hz. Initially, underwater ECG sequences were sampled from a bath, as this approach grants a controlled, accessible environment in which sensor/transmission functionality could be verified. Experimentation then progressed to a swimming pool in order to allow dynamic movements such as swimming.
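The sketch below illustrates the preprocessing and labelling steps described above, using SciPy stand-ins for the MATLAB resample and peak detection functions. Interpreting the 0.5 minimum peak spacing as seconds is an assumption, as is the exact normalisation formula.

```python
import numpy as np
from scipy.signal import resample, find_peaks

FS_RAW, FS_TARGET = 200, 50        # Hz

def preprocess(raw_window):
    """Normalise a 10 s ECG window to (-1, 1) and resample 2000 -> 500 data points."""
    x = np.asarray(raw_window, dtype=float)
    x = 2.0 * (x - x.min()) / (x.max() - x.min()) - 1.0
    return resample(x, int(len(x) * FS_TARGET / FS_RAW))

def label_r_peak_count(ecg_50hz, min_height=0.7, min_spacing_s=0.5):
    """Label a sample with its R peak count (SciPy stand-in for MATLAB's peak detector)."""
    peaks, _ = find_peaks(ecg_50hz, height=min_height,
                          distance=int(min_spacing_s * FS_TARGET))
    return len(peaks)
```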

As highlighted, when generating the training data, various activities were performed in order to yield ECG samples with a range of R peak counts. The resultant distribution of labels is positively skewed (it features more low R peak count samples), as this data was mostly recorded whilst sitting. A positive skew is not noted for the underwater ECG data set, as many of these samples were recorded whilst swimming (which generated more high R peak count samples). To account for this, the number of underrepresented beat count samples in the train and validation data sets was increased. This was done by segmenting underrepresented samples into 5 sections, which were randomly shuffled to produce a new ECG sample. This shuffled sample has the same number of R peaks as the original but is a new sequence for the neural network to interpret. Repeating this process allows new samples to be generated for underrepresented data labels; however, the number of possible permutations is limited given that each sample is divided into only 5 segments. Hence this process is used sparingly to avoid sample duplication. Crucially, this balancing is performed independently on the train/validation subsets after the data is split.
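A minimal sketch of this balancing step is given below. Splitting into 5 equal segments with NumPy's permutation is an assumed implementation detail; the principle, that shuffling segments preserves the R peak count while producing a new sequence, is as described above.

```python
import numpy as np

def segment_shuffle(sample, n_segments=5, rng=None):
    """Create a new sample with the same R peak count by shuffling equal segments."""
    rng = rng or np.random.default_rng()
    segments = np.array_split(np.asarray(sample, dtype=float), n_segments)
    order = rng.permutation(n_segments)
    return np.concatenate([segments[i] for i in order])
```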

2.3 Neural network configuration/hyperparameters

The neural network has been built using the Python library TensorFlow Keras [22]. Experimentation has determined that the optimal network configuration for this problem is to employ two hidden layers: the first is a convolutional layer and the second is a recurrent layer. Figure 3 details the structure of the model, which features dropout layers (rate 10%) after the convolutional and recurrent layers to limit overfitting. Specific hyperparameters of the convolutional layer are detailed in Table 1 and those of the recurrent layer in Table 2. Importantly, the continuous output of this network is rounded in order to yield an integer R peak count.

Fig. 3 Devised CNN-RNN structure used for counting underwater heart beats

Table 1 Convolutional layer
Table 2 Recurrent layer

The Adaptive Moment Estimation (ADAM) [15] optimisation algorithm is used with a Mean Square Error (MSE) cost function. During training, the batch size is set at 32 and the learning rate of the ADAM optimisation algorithm is configured to be 5 × 10−4. However, this is lowered during training by the Keras ReduceLROnPlateau callback when the validation loss ceases to reduce. This callback has been configured with a reduction factor of 0.5 and a patience (number of epochs without improvement before the callback acts) of 5. This technique is known to increase the rate at which a network converges [15]. To prevent exploding gradients, the norm of the gradient vector used by the optimisation algorithm is clipped to 1.

Early stopping is also used as a method of regularisation, configured to halt training if the validation loss does not reduce within a patience of 10 epochs.
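The sketch below assembles an illustrative version of this model and training configuration in TensorFlow Keras. The filter count, kernel size, stride, number of recurrent units and choice of an LSTM are assumptions (the actual values are those of Tables 1 and 2 and Fig. 3); the optimiser, loss, learning rate, gradient clipping, batch size and callback settings follow the text.

```python
import tensorflow as tf

def build_model(seq_len=500):
    """Illustrative CNN-RNN regressor; layer hyperparameters are assumptions."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(seq_len, 1)),
        tf.keras.layers.Conv1D(filters=32, kernel_size=7, strides=5,
                               padding="same", activation="relu"),
        tf.keras.layers.Dropout(0.1),
        tf.keras.layers.LSTM(64),
        tf.keras.layers.Dropout(0.1),
        tf.keras.layers.Dense(1),          # continuous R peak count, rounded later
    ])

model = build_model()
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=5e-4, clipnorm=1.0),
    loss="mse",
    metrics=["mae"],
)

callbacks = [
    tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=5),
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=10),
]

# x_train/x_val: (n, 500, 1) modified ECG samples; y_train/y_val: R peak counts.
# The epoch budget below is an assumption; training halts early via the callbacks.
# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           epochs=200, batch_size=32, callbacks=callbacks)
```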

3 Results

3.1 Model evaluation & BPM generation

To assess the performance of the devised CNN-RNN network, the Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) are used. Equation 1 describes the MAE, where n is the number of samples, \(\hat{x}_{i}\) is the true value and \(x_{i}\) is the predicted value. The RMSE is detailed by (2). The RMSE is similar to the MSE, a metric which significantly punishes large errors due to the squaring of the error residual (the difference between predicted value and true value) [23]. By taking the square root of the MSE, the RMSE has the same units as the predictions/labels [15].

$$ MAE = \frac{\sum_{i=1}^{n}\left| x_{i} - \hat{x}_{i}\right|}{n} $$
(1)
$$ RMSE = \sqrt{\frac{\sum_{i=1}^{n} \left(x_{i} - \hat{x}_{i}\right)^{2}}{n}} $$
(2)
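These metrics translate directly into code; the short NumPy sketch below implements (1) and (2) as written.

```python
import numpy as np

def mae(y_pred, y_true):
    """Mean Absolute Error, Eq. (1)."""
    y_pred, y_true = np.asarray(y_pred, float), np.asarray(y_true, float)
    return np.mean(np.abs(y_pred - y_true))

def rmse(y_pred, y_true):
    """Root Mean Square Error, Eq. (2)."""
    y_pred, y_true = np.asarray(y_pred, float), np.asarray(y_true, float)
    return np.sqrt(np.mean((y_pred - y_true) ** 2))
```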

As discussed, once the number of R peaks is obtained for a given time window (10 s / 500 data points), this value is multiplied by an appropriate factor (6) to provide the BPM. This method is used by medical practitioners as a quick approach to generating BPM estimates [24]. However, generating BPM in this fashion gives rise to BPM error. This error can be minimised by increasing the sample length [25]; however, this can incur recurrent neuron short term memory issues. An alternative solution would be to reduce the sampling rate but increase the sample length, so that the number of data points remains constant. Whilst this would increase the sample length and hence reduce the BPM calculation error, reducing the sampling rate below certain values can cause R peak distortion [26]. We initially sample at 200 Hz, generating 2000 data points per 10 seconds. We then employ MATLAB's resample functionality [27] to reduce each sequence to 500 data points, yielding an effective sampling rate of 50 Hz. This is done in an attempt to minimise recurrent short term memory losses. Sampling ECG signals at 50 Hz is known to cause some ECG distortion; however, it has been demonstrated to be an acceptable frequency for Heart Rate Variability (HRV) analysis [26].
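As a concrete illustration of this scaling, the snippet below (assuming a trained Keras model such as the one sketched in Section 2.3) converts the rounded network output for one 10 second window into a BPM estimate.

```python
import numpy as np

WINDOW_SECONDS = 10

def bpm_from_window(model, ecg_window_50hz):
    """Estimate BPM from one preprocessed 10 s / 500 point ECG window."""
    x = np.asarray(ecg_window_50hz, dtype=float).reshape(1, -1, 1)
    count = int(round(float(model.predict(x, verbose=0)[0, 0])))
    return count * (60 // WINDOW_SECONDS)    # 6 x count -> beats per minute
```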

3.2 Underwater ECG recordings

Figure 4 shows examples of the ECG data recorded by the sensor system and pre-processed as discussed above (normalised and resampled). The plots are labelled according to the activity performed during sampling, where: (a) is land based; (b) is underwater stationary; (c) is underwater front crawl swimming and (d) is underwater breast stroke swimming.

Fig. 4 ECG samples colour coded according to sampling activity where: (a) is land based; (b) is underwater stationary; (c) is underwater front crawl swimming and (d) is underwater breast stroke swimming

Crucially, for the underwater samples, the R peaks are generally identifiable by eye despite the signal noise/artefacts. Analysis of the distortions, particularly for the swimming examples, can correlate the motion artefacts to the type of swim stroke. ECG motion artefacts are common signal errors caused by subject movement during sampling and appear as baseline wander on the signal [28]. The lower frequency baseline wander component seen on the breast stroke example (d), compared to the front crawl example (c), correlates to the rhythm associated with each stroke. Breast stroke involves a low frequency, vertical motion that is larger in magnitude compared to the higher frequency rolling movement associated with front crawl; we believe this is reflected by the two examples shown. The more subtle baseline wander seen on the stationary underwater sample (b) is associated with gentler movements such as body adjustments whilst submerged. The signal saturations seen on all underwater examples (b, c and d) appear as noise when the operational capabilities of the ECG amplifier are exceeded. This could be caused by sharp body movements that saturate the amplifier, potentially obscuring important signal information. Another effect noticed on the underwater samples is that the R peaks may not have a consistent magnitude across the sample. The precise cause of this may be the result of multiple factors; importantly, it does not appear to fully obscure the R peaks.

3.3 Data augmentations and artefact replication

To ensure neural network performance, we replicate the noise/artefacts highlighted above by applying various modifications to the initially clean training data. These modifications are demonstrated by Fig. 5. Shown here is an original sample (a) that is modified via multiple approaches.

Fig. 5 Example of how an original ECG sample (a) is modified by: addition of baseline wander (b); compression (c); high frequency, short period saturation (d) and low frequency, long period saturation (e)

As discussed, underwater samples, particularly those recorded whilst swimming, exhibit significant baseline wander that is dependent on the activity. Traditionally, baseline wander is removed via appropriate signal filtering. However, our intention here is to avoid the use of such techniques. Instead, we generate a composite signal, formed by summing 3 randomly generated sine waves, which is added to a clean ECG signal. This results in a new sample that has a random baseline wander; an example is demonstrated by sample (b) in Fig. 5. Furthermore, to allow the neural network to interpret the signal saturations/inconsistent magnitudes discussed previously, these artefacts are also replicated. See Fig. 5: (c) - compressed sample; (d) - high frequency, short period saturation sample and (e) - low frequency, long period saturation sample. These modifications are controlled by constrained random variables determining, where appropriate, the length, number or magnitude of modifications to the sample. To increase the dataset size, the original sample (a) is also reversed and the processes detailed by Fig. 5 are repeated for this reversed sample. Note that these modifications are applied to both the train and validation datasets post split.
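The sketch below gives one possible implementation of these modifications. The frequency, amplitude and duration ranges, and the probabilities in the combining step, are illustrative assumptions constrained only loosely by the description above and by Fig. 6.

```python
import numpy as np

rng = np.random.default_rng()
FS = 50                                    # Hz, resampled rate

def add_baseline_wander(ecg):
    """Add the sum of 3 random sine waves to a clean sample (ranges are assumptions)."""
    t = np.arange(len(ecg)) / FS
    wander = sum(rng.uniform(0.1, 0.5) *
                 np.sin(2 * np.pi * rng.uniform(0.05, 0.5) * t + rng.uniform(0, 2 * np.pi))
                 for _ in range(3))
    return ecg + wander

def compress(ecg):
    """Reduce signal amplitude by a random factor."""
    return ecg * rng.uniform(0.2, 0.6)

def saturate(ecg, n_bursts=(1, 4), burst_len_s=(0.2, 1.5)):
    """Overwrite random segments with clipped noise to mimic amplifier saturation."""
    out = ecg.copy()
    for _ in range(rng.integers(*n_bursts)):
        length = int(rng.uniform(*burst_len_s) * FS)
        start = rng.integers(0, max(1, len(out) - length))
        out[start:start + length] = np.clip(rng.normal(0, 2, length), -1, 1)
    return out

def augment(ecg):
    """Apply the modifications in combination, optionally on the reversed sample."""
    x = np.asarray(ecg, dtype=float)
    if rng.random() < 0.5:
        x = x[::-1]
    x = add_baseline_wander(x)
    if rng.random() < 0.5:
        x = compress(x)
    if rng.random() < 0.5:
        x = saturate(x)
    return x
```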

Figure 6 demonstrates how the modifications detailed by Fig. 5 are applied in combination to an original ECG sample. This process is repeated for all samples in the training/validation subsets (7359 10 second land based ECG samples). All samples shown by Fig. 6 are included for training/validation purposes. This approach is used to generate samples that are representative of the underwater ECG data and to augment the size of the training/validation subsets. Augmentation is beneficial because increasing the size of the training set reduces neural network overfitting [29]. Data set augmentation is often used in the field of image classification (commonly performed via a CNN [30]), where simple strategies such as the addition of random noise and/or image flipping are used [31]. This generates a modified image that can be used for training alongside the original sample; the modifications detailed by Fig. 5 are applied in combination to replicate this principle.

Fig. 6 Flow chart detailing how an original sample is modified and how the modifications demonstrated by Fig. 5 combine

Figure 6 also shows how the combination of modifications can yield samples for which the R peaks are difficult to locate manually. The labels for such samples are known, as the modified samples are associated with unmodified data for which the R peak count has been determined via peak detection. This means that effective training on the heavily distorted samples is possible and therefore the devised neural network can attempt to interpret such sequences.

3.3.1 Neural network performance

As discussed previously, we evaluate our beat counting CNN-RNN regression model on 525 underwater ECG samples. To assess the performance, Table 3 compares the results of this neural network when trained on modified and unmodified data, alongside the results of applying CWT to the data. From the literature, CWT is a technique that can be used to effectively locate R peaks in noisy ECG samples. The CWT configuration used here matches that described by MATLAB for locating R peaks in low SNR ECG signals (wavelet: sym4, scales: 4 and 5) [14]. As stated by MATLAB, the sym4 wavelet is selected as it resembles the shape of the QRS complex, and scales 4 and 5 are used to cover the passband frequency maximising QRS wave energy. As seen, the neural network trained on modified data performs significantly better compared to the other methods.
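For reference, the sketch below reproduces the spirit of this wavelet baseline in Python. PyWavelets' continuous transform does not support the (orthogonal) sym4 wavelet, so a Mexican hat wavelet and simple energy thresholding are substituted; this is an illustration of the wavelet baseline rather than MATLAB's exact configuration.

```python
import numpy as np
import pywt
from scipy.signal import find_peaks

def wavelet_r_peak_count(ecg_50hz, fs=50):
    """Wavelet-energy R peak counter (Mexican hat stand-in for MATLAB's sym4 recipe)."""
    coefs, _ = pywt.cwt(np.asarray(ecg_50hz, float), scales=[4, 5], wavelet="mexh")
    energy = np.sum(coefs ** 2, axis=0)               # emphasise QRS-band energy
    # Threshold and refractory period are illustrative assumptions, not tuned values.
    peaks, _ = find_peaks(energy, height=4 * energy.mean(), distance=int(0.3 * fs))
    return len(peaks)
```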

Table 3 Testing on underwater ECG samples

To understand the significance of the applied noise/artefacts, Table 4 demonstrates how the removal of specific noise/artefacts from the train and validation data subsets affects the results of the model when tested on the underwater samples.

Table 4 Effects of noise/artefact removal from train/validation sets

4 Discussion

Table 3 demonstrates the notable performance achieved when a CNN-RNN model trained on modified land sampled ECG data is tested on underwater ECG data. It is clear that not applying the modifications to the training/validation data results in a model that cannot interpret the underwater data, which is logical considering the lack of exposure to the associated noise/distortions. Whilst CWT performs slightly better than the CNN-RNN model trained on unmodified data, the performance of the signal processing technique is still poor. This is despite matching the CWT configuration settings used by MATLAB to locate R peaks in low SNR ECG signals. A possible cause is that the underwater ECG waveforms are distorted such that the sym4 wavelet no longer adequately matches the shape of the QRS complex, and therefore CWT fails to reliably identify heartbeats.

From the results demonstrated by Table 4, it can be deduced that replication of baseline wander and compression are the most important artefacts to include within the train/validation sets. However, the best performance is achieved when all approaches are included and combined. Whilst certain modifications may be more significant than others with regard to yielding representative samples, each technique applied increases the size of the train/validation datasets, as demonstrated by Fig. 6. As highlighted previously, increasing the size of the training set reduces overfitting and hence improves neural network performance. Therefore, the improved performance noted when all of the discussed modifications are applied in combination is likely not solely due to the modifications themselves, but also attributable to the resultant reduction in overfitting.

Whilst the use of a neural network is in any case associated with increased computational/mathematical complexity, we argue that our results demonstrate the potential of this approach. Minimal data pre-processing is required, with no need for any signal filtering. A software-focused approach to this problem means that no special design is necessary with regard to the chest strap, electrodes or amplifier. Minimising physical design ensures such an approach is low cost and easy to distribute. Whilst the application demonstrated in this paper is relatively simple (counting heartbeats per 10 second sample), the approach employed to make normal ECG samples representative of underwater data could be used to train a neural network to recognise other ECG measurable factors such as HRV.

Future work beyond what is discussed in this manuscript will involve continued investigation of the artefact replication methods. As demonstrated by this paper, good neural network performance (when tested on underwater ECG data) is achievable for models trained on modified land sampled ECG data. This concept will be applied to data sampled from other sensors, such as breathing rate, so as to investigate the viability of artefact replication for other physiological metrics. Furthering this idea, we are interested in applying similar artefact replication methods in other environments where a large training/validation set is difficult to collect, and then investigating the performance of a neural network trained on the modified data when tested on genuine data from said environment for a particular task.

5 Conclusions

In this work we investigated a neural network approach to identifying heartbeats from fixed length ECG sequences sampled underwater. This was done as a method to estimate BPM, an important physiological measurement both generally and for applications such as performance sport, rehabilitation and/or diving in hazardous environments. We show that the devised neural network performs significantly better compared to other BPM generation techniques and requires minimal signal pre-processing prior to network application. In addition, no specific electrical/mechanical considerations are necessary, meaning that off-the-shelf components can be used, minimising cost. The network has been trained on land based ECG samples which have been modified to represent underwater ECG data. Testing performed on this genuine underwater ECG data shows that the modifications applied yield representative samples suitable for training purposes.