Abstract
Heart health monitoring based on wearable devices is often contaminated by noise of varying severity. Using signal quality indicators (SQIs) for signal quality assessment (SQA) is among the most promising ways to address this problem, but SQIs remain poor at expressing the quality features of ECGs contaminated by motion artifact (MA) noise. Here, we present a novel SQA method that fuses the proposed depth local dual-view (DLDV) features with a dual-input transformer (DI-Transformer) framework to improve the recognition of MA-contaminated ECGs. The DLDV features identify subtle differences between MA and ECG through depth local amplitude and phase angle features. When they are fused with the temporal relationship features extracted by the DI-Transformer, accuracy improves significantly compared with SQI-based methods. We also verify the robustness and accuracy of the DLDV features on four traditional classifiers. We conduct experiments on two datasets. On the PhysioNet/Computing in Cardiology Challenge dataset, the DLDV features (Acc = 95.49%) outperform the combination of six SQIs (Acc = 91.26%). Combined with our DI-Transformer, they deliver an accuracy of 99.62%, outperforming state-of-the-art SQA methods. On the artificial test set constructed with MA noise, our DI-Transformer outperforms four traditional methods and delivers an accuracy of 97.69%.
Introduction
Traditional electrocardiogram (ECG) analysis usually requires doctors to diagnose and treat patients based on ECG waveform information. However, ECGs recorded by wearable devices are commonly contaminated by various kinds of noise, especially motion artifacts [MA: muscle artifact (ma) and electrode motion artifact (em)]. The result is a large number of poor-quality signals, which seriously hinder diagnosis and delay timely treatment. To make matters worse, some MA frequency components overlap with the ECG band, which limits frequency-domain filtering, and some MA has a morphology similar to that of ECG waves, which limits time-domain filtering [1]. It is challenging to eliminate this noise without distorting clinical features [2].
In general, there are two ways to solve this problem. The first is to use denoising techniques [3,4,5,6], which work well on baseline wander, high-frequency noise, etc., but have difficulty removing the MA noise mentioned above. The other is to discard signals heavily contaminated by MA through signal quality assessment (SQA) [7, 8]. Currently, mainstream SQA methods can be roughly divided into two categories. The first category is based on traditional machine learning and signal quality indicators (SQIs) [9,10,11,12,13,14]. For example, Xia et al. proposed an ECG SQA method based on a support vector machine (SVM) and multi-feature fusion of waveform attributes, power spectrum, R-wave detection, and other characteristics [9]. Behar et al. employed indicators such as kSQI, sSQI, pSQI, basSQI, bSQI, pcaSQI, and rSQI, and trained an SVM model to evaluate ECG signal quality and reduce false alarms [10]. Satija et al. calculated SQIs through signal loss detection, baseline mutation extraction, and high-frequency noise detection and extraction to evaluate the clinical acceptability of ECG signals [11]. Zhang et al. adopted waveform feature-based methods (including lead-off features, baseline wander features, power spectral features, and nonlinear features) to train random forest and SVM models for SQA [12]. Shahriari et al. used a structural similarity measure (SSIM) to compare ECG images obtained from two ECGs at standard scales. A representative subset of ECG images is then selected from the training set as templates by a clustering method. Finally, the SSIM between each image and all templates is used as the feature set to train a linear discriminant analysis classifier for SQA [13]. Holzinger et al. provided a taxonomy of entropy methods, describing in more detail approximate entropy, sample entropy, fuzzy entropy, and particularly topological entropy for finite sequences. They also state that entropy measures have been successfully tested for analyzing short, sparse, and noisy time series data [14]. These handcrafted features have the advantage of interpretability and can reflect specific ECG characteristics to a certain extent. However, because SQIs rely on human-defined desirable properties of clean signals, they are inherently limited in expressing latent features of signal quality. They also rarely consider effective ECG feature extraction under MA interference.
The second category is deep learning-based methods [15,16,17,18], which usually utilize abstract features extracted by deep learning techniques or combine them with handcrafted features to implement SQA. For instance, Liu et al. proposed a method that combines deep learning-based Stockwell Transform (S-Transform) spectrogram features and handcrafted statistical features to achieve SQA [15]. Huerta et al. combined convolutional neural networks and the wavelet transform to robustly identify high-quality ECG segments in the challenging setting of single-lead recordings of alternating sinus rhythms, atrial fibrillation episodes, and other rhythms [16]. Seeuws et al. used an unsupervised deep learning model to derive a data-driven quality metric that outperformed some traditional metrics (kSQI, sSQI, IOR, pSQI, basSQI, bSQI, and pcaSQI) and highlighted the consistently superior performance of their metric across different tasks [17]. Zhang et al. designed a comprehensive feature set (covering spectral distribution, signal complexity, horizontal and vertical variations of waves, etc.) and utilized two long short-term memory (LSTM) layers to learn time-dependent features automatically [18]. Compared with handcrafted features, the abstract features extracted by deep learning methods describe ECG recordings from a different perspective. However, these methods seldom offer effective solutions for MA interference that has a morphology similar to, and frequency bands aliased with, some ECG signals. In addition, they rarely address the interpretability of these features or the relationships between them.
Here, we mainly address two problems: (1) noise such as MA, with a morphology similar to and frequency bands aliased with some ECGs, can easily deceive machine learning methods, resulting in low SQA accuracy; (2) handcrafted features require substantial human intervention and cannot express signal quality comprehensively. We propose a novel SQA method that fuses depth local dual-view (DLDV) features and a dual-input Transformer (DI-Transformer) framework to improve the recognition of MA-contaminated ECGs. Specifically, we extract the first three intrinsic mode functions (FTIMF) of the signal through empirical mode decomposition (EMD) [19] and then employ the fast Fourier transform (FFT) [20] to explore the deeper local amplitude and phase angle features of the FTIMF. Then, the DLDV features are dimensionally reduced by kernel principal component analysis (KPCA) [21] and employed to identify subtle differences between MA and ECG signals. At the same time, we analyze the FTIMF’s central tendency and dispersion degree and combine the result with the dimensionality-reduced DLDV features to form augmented features (FTIMF\(_\mathrm{all}\)). Finally, FTIMF\(_\mathrm{all}\) is fused with the temporal relational features extracted from the raw ECG by the proposed DI-Transformer framework to train the SQA model. In particular, the phase angle features we extract contain the contribution of each time sample point, so they can quantify subtle changes in ECGs at the sample level and thus distinguish the nuances of ECGs and MA. As far as we know, there is no literature on extracting DLDV features (phase angle and amplitude–frequency features) from the FTIMF to achieve SQA. Only Lee et al. calculated the mean, variance, and Shannon entropy of the first IMF (FIMF) obtained by EMD and used them for SQA [22]. These indicators can reflect the signal’s central tendency and dispersion degree but cannot fully reflect the deeper local features used to distinguish MA noise, because the feature information computed by their method loses the temporal features. In this paper, the DLDV features extracted through the FTIMF not only capture the characteristic features of MA that traditional methods cannot obtain, but also have the advantage of interpretability. We also verify the accuracy and robustness of the DLDV features on four traditional classifiers and provide an accurate and efficient SQA scheme based on K-Nearest Neighbor (KNN). In addition, our proposed DI-Transformer model is based on the transformer [23] architecture, whose multi-head attention module can be executed in parallel and can capture the temporal relationships of the ECG signal. Our combined strategy with the transformer model can overcome the shortcoming of traditional machine learning, which requires full human intervention, while accurately distinguishing MA noise from ECGs. The contributions of this study can be summarized as follows:

The proposed DLDV features can identify subtle differences between MA and ECG signals through depth local amplitude and phase angle features, which provides a practical and novel solution for identifying MA-contaminated ECGs.

The proposed DI-Transformer can focus on the temporal relationship between sample points and reflect the subtle local changes in the signal sequence, which can effectively improve the model’s ability to identify MA-contaminated ECGs.

The strategy of fusing the DLDV features with the DI-Transformer’s temporal relational features extracted from the raw ECG significantly improves the accuracy of MA noise recognition and is applicable to settings such as wearable ECG monitoring devices.

For the first time, we propose the DLDV features to solve the ECG SQA problem and achieve an accuracy of 94.27% on GSVM and 93.32% on KNN, outperforming six traditional SQIs. More importantly, we obtain the best accuracy (99.62%) with the proposed DI-Transformer, which outperforms other state-of-the-art SQA methods.
This paper is organized as follows: “Methodology” presents the data used in the experiments and the details of our method. “Experiments and results” demonstrates the experimental results. Finally, we discuss and conclude our work in “Discussion” and “Conclusion”.
Methodology
We present the overall framework of the proposed SQA method in Fig. 1. It mainly consists of three parts: data preprocessing, DLDV feature extraction and KPCA, and the DI-Transformer framework. The DI-Transformer framework in turn consists of two parts: the transformer encoder layer and the classification layer. We describe each part in detail in the following sections.
The DLDV features extraction and KPCA
DLDV features extraction
We start our DLDV feature extraction method from EMD [19]. EMD can effectively process nonlinear and nonstationary time-series signals, such as ECG signals. Unlike the FFT and the discrete wavelet transform (DWT) [24], EMD reveals the inherent features of a signal through its decomposed IMFs. It represents a signal as a combination of multiple IMF components, ordered from high to low frequency. Different IMFs reflect the feature information of the signal and the noise to different degrees.
In general, some MA noise has a morphology similar to, and frequencies overlapping with, some ECG signals, so traditional denoising methods cannot effectively eliminate such noise. However, we find that the local nuances between them can be expressed by the IMFs. Therefore, we design a dedicated method to obtain the DLDV features of MA-contaminated ECGs. Figure 2 shows the architecture of the proposed DLDV feature extraction method. The light green areas represent the key modules of the proposed method, which we name the DLDV feature extraction module (DLDVFEM); it is composed of a stack of \(N = 3\) identical modules. Each module has two submodules: an FFT-based submodule and a statistical analysis-based submodule (SA-based submodule). After performing EMD on x[n], we obtain its FTIMF components (FIMF: the first IMF, SIMF: the second IMF, and TIMF: the third IMF). When we feed the FIMF to the DLDVFEM through the “Input” pipeline, the FFT-based submodule obtains its amplitude and phase angle in the frequency domain through the FFT [20] (denoted as FTIMF\(_\mathrm{f}\)). Meanwhile, the SA-based submodule obtains its central tendency and degree of dispersion (denoted as FTIMF\(_\mathrm{t}\)). Then, FTIMF\(_\mathrm{t}\) and FTIMF\(_\mathrm{f}\) are output together as FTIMF\(_\mathrm{F}\) through the lavender pipeline. When the remaining SIMF and TIMF pass through the DLDVFEM in turn, we get two more output components (SIMF\(_\mathrm{S}\) and TIMF\(_\mathrm{T}\)). The FTIMF\(_\mathrm{f}\) parts of these three output components are concatenated to form our FTIMF\(_\mathrm{freq}\) (DLDV) features, and the three FTIMF\(_\mathrm{t}\) parts are concatenated to form our FTIMF\(_\mathrm{time}\) features. Finally, the output features (FTIMF\(_\mathrm{all}\)) of the entire module are obtained by concatenating FTIMF\(_\mathrm{freq}\) and FTIMF\(_\mathrm{time}\).
Next, we will describe the feature extraction process in detail:
Let \(X \in {\mathbb {R}}^{12 \times \ell }\) denote a multi-lead ECG signal and \(X_\mathrm{f} \in {\mathbb {R}}^{1 \times \ell }\) denote the f-th lead, where \(f \in [1, \ldots , 12]\) is the lead index and \(\ell \) is the length of the ECG segment. After performing EMD according to [19], we obtain the IMFs as follows:
\[ X_\mathrm{f}[n] = \sum _{p=1}^{N} \mathrm{IMF}_{\mathrm{f},p}[n] + r_{f,N}[n] \]
where n is the sample index within the ECG segment, \(\mathrm{IMF}_{\mathrm{f},p}[n]\) represents the p-th IMF of the f-th lead, \(\mathrm {p} \in [1,2\ldots ,N]\), N is the total number of IMF layers (here, \(N = 3\) and \(f = 1\)), and \(r_{f,p}[n]\) is the residual signal left after the f-th lead signal passes through the p-th layer of EMD. Note that this paper mainly uses the FTIMF (FIMF, SIMF, and TIMF) components of EMD, because the dynamics of the FTIMF behave as though they had been passed through a high-pass filter [25]. Hence, it is not surprising that the FTIMF contains dynamics associated with noise for any well-sampled data [26].
Figure 3 shows the FTIMF of a clean signal, a baseline wander (bw)-contaminated signal, an ma-contaminated signal, and an em-contaminated signal, respectively. We find several interesting phenomena: (1) the amplitudes of the IMFs of the noise-contaminated ECG signals are significantly lower than those of the clean signal. (2) The FTIMF components of EMD contain almost no bw noise (there is almost no difference between the corresponding IMF components in Fig. 3a, b), but reflect the inherent features of em and ma noise well (the FTIMF of the noisy signals in Fig. 3c, d reflect the feature information of the noise to varying degrees). (3) R peaks have high amplitudes in every IMF component, while em or ma noise resembling R peaks has different amplitudes in different IMFs. The differences caused by ma artifacts in each IMF component are marked in light purple in Fig. 3c, and it can be seen that ma is manifest to different degrees in all three components. In Fig. 3d, the differences caused by em artifacts in each IMF component are marked in light purple, and it can be seen that em has obvious characteristics in the TIMF. These phenomena indicate that the FTIMF contains features beneficial to recognizing MA-contaminated ECGs. Therefore, we utilize the FFT-based submodule to extract the amplitude and phase angle of the FIMF, SIMF, and TIMF in the frequency domain, and concatenate the features obtained from the three components:
\[ \mathrm{FTIMF}_\mathrm{freq} = \mathop {\mathrm{Concat}}\limits _{p=1}^{3}\left( \Vert {\text {fft}}(\mathrm{IMF}_{\mathrm{f},p})\Vert ,\ {\text {angle}}({\text {fft}}(\mathrm{IMF}_{\mathrm{f},p}))\right) \]
where FTIMF\(_\mathrm{freq} \in {\mathbb {R}}^{3 \times 2l}\), \(\Vert \cdot \Vert \) denotes the absolute value operation, \({\text {angle}}(\cdot )\) denotes the calculation of the phase angle, \({\text {fft}}(\cdot )\) denotes the FFT, and \({\text {Concat}}(\cdot )\) denotes concatenation. Simultaneously, we utilize the SA-based submodule to analyze the central tendency and dispersion degree of the FTIMF in the time domain, and concatenate the features obtained from the three components:
\[ \mathrm{FTIMF}_\mathrm{time} = \mathop {\mathrm{Concat}}\limits _{p=1}^{3}\left( {\text {mean}}(\mathrm{IMF}_{\mathrm{f},p}),\ {\text {var}}(\mathrm{IMF}_{\mathrm{f},p})\right) \]
where the \({\text {mean}}(\cdot )\) is the averaging operation, \({\text {var}}(\cdot )\) represents the operation of calculating variance, and FTIMF\(_\mathrm{time} \in {\mathbb {R}}^{3 \times 2}\).
Figure 4 shows an example of the features extracted from em- and ma-contaminated signals at each stage. Figure 4a shows the amplitude–frequency features of the em-contaminated ECG, and Fig. 4b shows its corresponding phase angle features. Figure 4c shows the amplitude–frequency features of the ma-contaminated ECG, and Fig. 4d shows its corresponding phase angle features. It can be seen that when the frequency of the intermediate quantity decomposed from the em- or ma-contaminated ECG is not 0, the corresponding phase angle is also not 0 and has no obvious periodic characteristics. In contrast, the phase angle feature of the clean signal is periodic, which is in line with the periodic nature of the ECG signal. In addition, the phase angle can reflect the local change of the signal waveform at a certain moment [27], so the depth features extracted in this way can capture the subtle differences between signal and noise. Finally, we obtain FTIMF\(_\mathrm{freq}\) and FTIMF\(_\mathrm{time}\), and we also refer to FTIMF\(_\mathrm{freq}\) as \(X_\mathrm{DLDV}\).
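The two submodules described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `dldv_features` is hypothetical, the IMFs are assumed to have been computed beforehand by an external EMD routine, and the toy sinusoids merely stand in for real IMFs.

```python
import numpy as np

def dldv_features(imfs):
    """Frequency-view (amplitude, phase) and time-view (mean, variance)
    features of the first three IMFs.

    imfs: array of shape (3, L), assumed to come from an external EMD step.
    Returns (ftimf_freq with shape (3, 2L), ftimf_time with shape (3, 2)).
    """
    imfs = np.asarray(imfs, dtype=float)
    spectrum = np.fft.fft(imfs, axis=1)        # FFT of each IMF
    amplitude = np.abs(spectrum)               # amplitude view
    phase = np.angle(spectrum)                 # phase-angle view
    ftimf_freq = np.concatenate([amplitude, phase], axis=1)
    ftimf_time = np.stack([imfs.mean(axis=1), imfs.var(axis=1)], axis=1)
    return ftimf_freq, ftimf_time

# Toy stand-ins for three IMFs (hypothetical data, not real EMD output).
t = np.arange(3000) / 500.0                    # 6 s sampled at 500 Hz
toy_imfs = np.stack([np.sin(2 * np.pi * f * t) for f in (40.0, 15.0, 5.0)])
freq_feats, time_feats = dldv_features(toy_imfs)
print(freq_feats.shape, time_feats.shape)
```

The output shapes match the dimensions stated in the text (3 × 2l for the frequency view and 3 × 2 for the time view).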
DLDV feature dimension reduction
Principal component analysis (PCA) [28] is one of the essential methods for linear dimensionality reduction. Each principal component is a projection of the data in a certain direction, and the variance along each direction is given by the corresponding eigenvalue. In the dimensionality reduction process, the eigenvalues are sorted from large to small, and the eigenvectors corresponding to the first k eigenvalues are used to obtain the dimensionality-reduced features that express the information we are interested in. However, the data we need to process are nonlinear and nonstationary ECG signals. Therefore, this paper adopts kernel principal component analysis (KPCA) [21]. KPCA treats the data as lying in a higher-dimensional space (a Hilbert space) and performs PCA there. The advantage is that, for nonlinear data points that are difficult to classify in a lower-dimensional space, an effective projection direction for classification may be found in the higher-dimensional space. Because the dimensionality of the DLDV features (nonlinear features) is too high and they contain some features that hardly contribute to classification (as reflected in Fig. 4), we utilize KPCA to reduce the dimensionality of the DLDV features.
For PCA, given \(\mathbf {{\textbf {X}}}_\mathrm{DLDV} = \left[ x_{1}, x_{2}, \ldots , x_{n}\right] , \varvec{{\textbf {X}}}_\mathrm{DLDV} \in {\mathbb {R}}^{n \times d}\), where n is the number of sequences in \({\textbf {X}}_\mathrm{DLDV}\) and d is the dimension of each sequence, PCA yields the following decomposition model:
where \(\mathbf {{\textbf {S}}}_{\mathrm {t}}(1 \le \mathrm {t} \le \mathrm {d})\) and \(\mathbf {{\textbf {U}}}_{\mathrm {t}}(1 \le \mathrm {t} \le \mathrm {d})\) represent the principal component vector and the corresponding projection vector, respectively. Since the \({\textbf {U}}_{t}\) form a set of orthonormal vectors, the principal component \({\textbf {S}}_{t}\) can be expressed as \({\textbf {S}}_{t} = {\textbf {X}}_\mathrm{DLDV} {\textbf {U}}_{t}\). The projection vector \({\textbf {U}}_{t}\) can therefore be calculated by solving the eigenvalue problem:
For KPCA, we define a mapping: \({\textbf {X}} _\mathrm{DLDV} \in {\mathbb {R}}^{n \times d} \rightarrow \varvec{\mathbb {\aleph }}\left( \mathrm {{\textbf {X}}}_\mathrm{DLDV}\right) \in {\mathbb {R}}^{n \times p}\), the \(\varvec{\mathbb {\aleph }}(\cdot )\) denotes a nonlinear mapping function which is to map the signal to the Hilbert functional space (\(\varvec{\beth }\)), and p represents the dimension of the feature space. We denote the mapping function of \({\textbf {X}}_\mathrm{DLDV}\) to the \(\varvec{\beth }\) space as:
For the nonlinear case, it is difficult to solve for \({\textbf {U}}_{t}\) by simply replacing \({\textbf {X}}_\mathrm{DLDV}\) with \(\varvec{\mathbb {\aleph }}({\textbf {X}}_\mathrm{DLDV} )\) in (6), because the mapping function \(\varvec{\mathbb {\aleph }}(\cdot )\) is unknown. To address this problem, we introduce the kernel trick to develop the KPCA model. Following [29], \({\varvec{U}}_{t}\) can be expanded in the feature space as \(\mathrm {{\textbf {U}}}_{\mathrm {t}} = \varvec{\mathbb {\aleph }}^{T}\left( {\varvec{X}}_\mathrm{D L D V}\right) \varvec{\beta }_{t}\), where \(\varvec{\beta }_{t}\) is a linear transformation vector. Thus, formula (6) is transformed as:
we find that \(K = \varvec{\mathbb {\aleph }}({\textbf {X}}_\mathrm{DLDV} ) \varvec{\mathbb {\aleph }}^{T} ({\textbf {X}}_\mathrm{DLDV} )\) is the kernel matrix of the kernel function, and the elements of the kernel matrix are calculated by the Gaussian kernel function \(k(x, y) = e^{-\frac{\left\| x - y\right\| ^{2}}{w}}\), where w represents the bandwidth of the Gaussian kernel.
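As a small illustration of how such a kernel matrix is filled in, the sketch below assumes the standard Gaussian form exp(-||x - y||² / w). The function names, the toy vectors, and the bandwidth value are hypothetical and not taken from the paper.

```python
import math

def gaussian_kernel(x, y, w):
    """Gaussian kernel exp(-||x - y||^2 / w), with w the kernel bandwidth."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / w)

def kernel_matrix(samples, w):
    """Symmetric n x n matrix K with K[i][j] = k(x_i, x_j)."""
    return [[gaussian_kernel(xi, xj, w) for xj in samples] for xi in samples]

# Hypothetical low-dimensional feature vectors, for illustration only.
toy_samples = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
K = kernel_matrix(toy_samples, w=2.0)
for row in K:
    print([round(value, 4) for value in row])
```

The diagonal is always 1 because each vector has zero distance to itself, and the matrix is symmetric, which is what makes the eigen-decomposition used by KPCA well behaved.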
For a given test vector \({\varvec{X}}_\mathrm{D L D V}^{j} \in {\mathbb {R}}^{d}\), which represents the j-th DLDV feature vector, the corresponding kernel principal component can be calculated by [30,31,32]:
where \(t = [1,2,..., k]\) indexes the first k vectors retained after dimensionality reduction, that is, \(\mathrm {{\varvec{S}}_{t}}\left( {\varvec{X}}_\mathrm{D L D V}\right) \in {\mathbb {R}}^{1 \times k}\). We determine the value of k by the cumulative contribution rate of the principal components. Usually, if the cumulative contribution rate (P) of the first k principal components reaches 80–90%, the first k principal components contain the main information of all measurement indicators. To retain as much information as possible while still reducing the dimensionality, we keep the smallest set of leading principal components with \(P\ge 95\%\):
After DLDV feature extraction and dimensionality reduction for 6 s ECG signals, the minimum k that satisfies Eq. (10) is \(k = 2124\) (\(354\times 6\)). Finally, we combine FTIMF\(_\mathrm{time}\) with the low-dimensional result to obtain FTIMF\(_\mathrm{all} \in {\mathbb {R}}^{1 \times (k+6)}\) as:
\[ \mathrm{FTIMF}_\mathrm{all} = {\text {Concat}}\left( {\varvec{S}}_{t}\left( {\varvec{X}}_\mathrm{DLDV}\right) ,\ \mathrm{FTIMF}_\mathrm{time}\right) \]
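The rule for choosing k by cumulative contribution rate can be sketched as follows. This is an illustrative reading of the criterion, assuming P is the share of the summed eigenvalues covered by the k largest ones; the helper `smallest_k` and the toy eigenvalues are hypothetical.

```python
def smallest_k(eigenvalues, threshold=0.95):
    """Smallest k whose k largest eigenvalues reach the cumulative
    contribution threshold (assumed definition: partial sum / total sum)."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for k, value in enumerate(ordered, start=1):
        running += value
        if running / total >= threshold:
            return k
    return len(ordered)

# Hypothetical eigenvalues; cumulative shares are roughly 0.60, 0.90, 0.96, ...
toy_eigenvalues = [6.0, 3.0, 0.6, 0.3, 0.1]
print(smallest_k(toy_eigenvalues, threshold=0.95))
```

With a lower threshold the same helper returns fewer components, which is the trade-off between compactness and retained information that the 95% choice is meant to settle.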
Proposed dual-input transformer model
Deep learning-based approaches can automatically extract abstract features of samples. However, their complex convolutional and recurrent structures give the hidden layers a large number of front-to-back dependencies, which leads to low parallelism. The Transformer is the first sequence transduction model based entirely on attention; it replaces the recurrent layers most commonly used in encoder-decoder architectures with multi-headed self-attention [23]. Existing studies have shown that the transformer can handle not only translation problems but also the classification of temporal sequences [23], such as ECG sequences [33, 34]. For the first time, we propose a DI-Transformer model to deal with the problem of ECG SQA; its overall structure is shown in Fig. 5. Our DI-Transformer model mainly includes the transformer encoder layer and the classifier layer. Furthermore, the features obtained by feature extraction and KPCA are plugged into our model as augmented features. Note that the transformer encoder layer is formed by stacking six attention modules, each of which includes a multi-head attention block with six heads; the specific composition of the multi-head attention mechanism is described in [23, 35, 36]. Since ECG classification does not require a standard translation process, we replace the decoder part of the transformer with a fully connected layer. We describe the DI-Transformer in detail as follows.
Transformer encoder layer
Input embedding and positional encoding: The input embedding of the sequential signal is similar to the methods used in most natural language processing (NLP) architectures [37]. To get the embedding for each point, the raw ECG or FTIMF\(_\mathrm{all}\) is mapped to the \(d_\mathrm{model}\)-dimensional space through 1D convolution. Note that the sequence length must be the same before and after convolution, which we ensure through a suitable choice of padding and kernel size, and that each point of the embedding output must have dimension \(d_\mathrm{model}\). In addition, we choose the sinusoidal version [23, 36] to provide positional embedding for our input sequence.
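For reference, the sinusoidal positional encoding of [23] assigns sin(pos / 10000^(2i/d_model)) to even dimensions and the matching cosine to odd dimensions. The sketch below is a plain-Python illustration with toy sizes; the function name is hypothetical and the paper's own implementation may differ.

```python
import math

def sinusoidal_positional_encoding(seq_len, d_model):
    """Table of shape seq_len x d_model with
    PE[pos][2i] = sin(pos / 10000^(2i / d_model)) and
    PE[pos][2i + 1] = cos(pos / 10000^(2i / d_model))."""
    table = []
    for pos in range(seq_len):
        row = []
        for dim in range(d_model):
            pair_index = dim // 2
            angle = pos / (10000 ** (2 * pair_index / d_model))
            row.append(math.sin(angle) if dim % 2 == 0 else math.cos(angle))
        table.append(row)
    return table

# Toy sizes for illustration; the text sets d_model to 512.
pe = sinusoidal_positional_encoding(seq_len=4, d_model=8)
print(len(pe), len(pe[0]))
```

The table is added element-wise to the embedded sequence, so it must share both the sequence length and the embedding dimension of the convolutional output.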
Attention module: We stack the attention module six times, each consisting of two parts (the multi-head attention block and the feed-forward network). The former comprises six parallel attention heads, and its internal structure is shown in Fig. 6. After the “input embedding and positional encoding” operation on the raw ECG, the input vector U of the transformer encoder layer is obtained. Then, we define three transformation matrices, \(\mathrm {W}_{e}^{\mathrm {Q}}\in {\mathbb {R}}^{d_\mathrm{model}\times d_{k}}\), \(\mathrm {W}_{e}^{\mathrm {K}}\in {\mathbb {R}}^{d_\mathrm{model}\times d_{k}}\) and \(\mathrm {W}_{e}^{\mathrm {V}}\in {\mathbb {R}}^{d_\mathrm{model}\times d_{v}}\), \(e = \{1,2,\ldots ,6\}\), and use them to perform three linear transformations on U to get the query (\(Q_{e}\)), key (\(K_{e}\)) and value (\(V_{e}\)). Finally, the e-th head is calculated from \(Q_{e}\), \(K_{e}\), and \(V_{e}\):
\[ h_{e} = \mathrm{softmax}\left( \frac{Q_{e} K_{e}^{T}}{\sqrt{d_{k}}}\right) V_{e} \]
where T represents the matrix transpose. To combine the results of all \(h_{e}\), we define the transformation matrix \(W^{P}\) and then obtain the output of the multi-head attention block through a linear mapping:
\[ \mathrm{MHAB}(Q, K, V) = {\text {Concat}}\left( h_{1}, h_{2}, \ldots , h_{6}\right) W^{P} \]
where \(W^{P}\in {\mathbb {R}}^{6 d_{v}\times d_\mathrm{model}}\) [23]. A residual connection and a layer normalization are then applied to MHAB(Q, K, V) in the “Add &Norm” block. The result is passed to the feed-forward network (the second part of the attention module), which consists of two fully connected layers with a rectified linear unit (ReLU), followed again by a residual connection and layer normalization. The output of each attention module is denoted \(X_\mathrm{attention}\). Note that we use layer normalization rather than batch normalization. We finally obtain the output of the transformer encoder layer, which is either used as the input of the next attention module or, after the last module, fused with FTIMF\(_\mathrm{all}\) and fed to the classification layer to determine the final output category.
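The multi-head attention block can be sketched with NumPy as scaled dot-product attention per head followed by concatenation and a linear map, following [23]. This is a minimal sketch under stated assumptions: the sizes are toy values, the weights are random placeholders rather than trained parameters, and the residual connection, layer normalization, and feed-forward network are omitted.

```python
import numpy as np

def softmax(scores):
    """Row-wise softmax with max-subtraction for numerical stability."""
    shifted = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(shifted)
    return weights / weights.sum(axis=-1, keepdims=True)

def multi_head_attention(U, W_q, W_k, W_v, W_p):
    """U: (seq_len, d_model); W_q, W_k: (heads, d_model, d_k);
    W_v: (heads, d_model, d_v); W_p: (heads * d_v, d_model)."""
    heads = []
    for Wq, Wk, Wv in zip(W_q, W_k, W_v):
        Q, K, V = U @ Wq, U @ Wk, U @ Wv
        d_k = Q.shape[-1]
        attention = softmax(Q @ K.T / np.sqrt(d_k))   # (seq_len, seq_len)
        heads.append(attention @ V)                   # (seq_len, d_v)
    return np.concatenate(heads, axis=-1) @ W_p       # (seq_len, d_model)

# Toy sizes and random placeholder weights (hypothetical, for illustration).
rng = np.random.default_rng(0)
seq_len, d_model, n_heads, d_k, d_v = 10, 12, 6, 2, 2
U = rng.normal(size=(seq_len, d_model))
W_q = rng.normal(size=(n_heads, d_model, d_k))
W_k = rng.normal(size=(n_heads, d_model, d_k))
W_v = rng.normal(size=(n_heads, d_model, d_v))
W_p = rng.normal(size=(n_heads * d_v, d_model))
out = multi_head_attention(U, W_q, W_k, W_v, W_p)
print(out.shape)
```

Because the six heads share no state, the loop body could run in parallel, which is the parallelism advantage the text attributes to the transformer.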
Dual-input feature fusion and classification
In the model initialization phase, we extract the FTIMF\(_\mathrm{time}\) and FTIMF\(_\mathrm{freq}\) features through the proposed method and perform KPCA on FTIMF\(_\mathrm{freq}\). They are then concatenated and used as the second-channel input feature (FTIMF\(_\mathrm{all}\)) of the DI-Transformer. In the training phase, the raw ECG of the first channel is divided into mini-batches, position-encoded, and then fed into the transformer encoder layer. For each iteration, we randomly select 6 s of data from each raw ECG sample (follow-up experiments show that a length of 6 s is optimal). After the raw ECG passes through the transformer encoder layer, the extracted feature map is flattened and concatenated with the FTIMF\(_\mathrm{all}\) features prepared in the model initialization phase:
\[ X_\mathrm{hidden} = {\text {Concat}}\left( \mathrm{Flatten}(X_\mathrm{attention}),\ \mathrm{FTIMF}_\mathrm{all}\right) \]
Then, \(X_\mathrm{hidden}\) goes through a linear layer (a 1D fully connected layer with input dimension \(d_\mathrm{in}\)) followed by a softmax function. The softmax scores are compared with the corresponding input labels to calculate the cross-entropy loss. Finally, the classification layer outputs a vector \(V = (v_{1},v_{2})\), where \(v_{i}\) denotes the probability that the segment belongs to class i (good quality or bad quality).
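The fusion and classification steps can be sketched as a concatenation followed by a linear layer, a softmax, and a cross-entropy loss. The shapes, the zero-initialized weights, and the function names below are hypothetical placeholders chosen so that the result is easy to check by hand; they are not the trained model.

```python
import numpy as np

def fuse_features(x_attention, ftimf_all):
    """Flatten the encoder feature map and append the handcrafted features."""
    return np.concatenate([np.ravel(x_attention), np.ravel(ftimf_all)])

def classify(x_hidden, weight, bias):
    """Linear layer followed by a numerically stable softmax."""
    logits = x_hidden @ weight + bias
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

def cross_entropy(probs, label):
    """Negative log-probability assigned to the true class."""
    return float(-np.log(probs[label]))

# Toy sizes (hypothetical): a 4 x 3 feature map and 5 handcrafted features.
x_attention = np.ones((4, 3))
ftimf_all = np.arange(5.0)
x_hidden = fuse_features(x_attention, ftimf_all)   # d_in = 12 + 5 = 17
weight = np.zeros((x_hidden.size, 2))              # untrained placeholder
bias = np.zeros(2)
probs = classify(x_hidden, weight, bias)
loss = cross_entropy(probs, label=0)
print(probs, loss)
```

With all-zero weights the two classes are equally likely, so the loss equals ln 2; training moves the weights away from this uninformative starting point.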
Experiments and results
ECG database and experimental setting
ECG database
This paper employs the PhysioNet Computing in Cardiology Challenge 2011 (PCCC) [38] database to test the proposed SQA method. The PCCC includes 1500 10 s standard 12-lead ECG recordings with a sampling rate of 500 Hz, and it contains two subsets: set-a includes 1000 12-lead 10 s recordings, and set-b includes 500 12-lead 10 s recordings. This paper employs set-a, which contains 9276 (\(773\times 12\)) 10 s good-quality (“acceptable”) ECGs and 2700 (\(225\times 12\)) 10 s bad-quality (“unacceptable”) ECGs. In addition, we select 500 single-lead good-quality records and 500 single-lead bad-quality records from the PCCC to form a test set (test-a). Then, we randomly select em or ma noise after upsampling and use it to contaminate one of the 500 selected good-quality records according to the method in [39]; repeating this process 500 times generates 500 records contaminated with em and ma noise. Finally, the 500 generated bad-quality records and the 500 good-quality records selected from the PCCC are combined into a second test set (test-b). The details of each database are described in Table 1. Figure 7 shows randomly selected good-quality and bad-quality segments from set-a. Note that the Z-score is used to normalize each 10 s record of all datasets, which is calculated as follows:
\[ z = \frac{x - \mu }{\sigma } \]
where x denotes a signal segment, and \(\mu \) and \(\sigma \) are the mean and standard deviation of that segment, respectively.
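A minimal sketch of per-record Z-score normalization is given below. It assumes the population standard deviation, since the text does not say whether the population or sample form is used; the function name and the toy segment are hypothetical.

```python
import statistics

def z_score(segment):
    """Normalize one record to zero mean and unit standard deviation.
    Uses the population standard deviation (an assumption)."""
    mu = statistics.fmean(segment)
    sigma = statistics.pstdev(segment)
    if sigma == 0:
        raise ValueError("a constant segment cannot be Z-score normalized")
    return [(value - mu) / sigma for value in segment]

toy_segment = [1.0, 2.0, 3.0, 4.0]   # hypothetical samples
normalized = z_score(toy_segment)
print(normalized)
```

The guard against a zero standard deviation matters in practice for flat-line (lead-off) recordings, which would otherwise cause a division by zero.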
Experimental setting
Model parameter settings: The key parameters of the DI-Transformer model are shown in Table 2. Because of the physiological characteristics of the human body, ECG signal strength is limited to a certain range, which means there is not much numerical difference between peaks and troughs, so \(d_\mathrm{model}\) is set to 512 [33]. In addition, to achieve rapid convergence and prevent oscillation near a local minimum, the learning rate is dynamically adjusted during training.
The whole method is developed and trained using TensorFlow and PyTorch. Our experiments are performed on a computer with an Intel(R) Core(TM) i5-7640X CPU @ 4.00 GHz, equipped with two GeForce GTX 1080 Ti GPUs, each with 11 GB of memory.
Performance evaluation: To evaluate the performance of the proposed method for SQA, we adopt five-fold cross-validation. Set-a is randomly divided into five equal subsets; each subset is selected as the test set in turn, and the remaining four subsets are used for training. However, less than a quarter of the data is labeled as bad quality. It is well known that building classifiers on an unbalanced dataset causes bias and results in poor generalization. When prior probabilities (and Bayesian training paradigms) are not used, one way to overcome this problem is to balance the dataset. Therefore, we balance the dataset by adding real noise [em and ma noise from NSTDB [40] and additive Gaussian white noise (AGWN)] to the good-quality segments to generate additional bad-quality data. Note that we upsample the em and ma noise to 500 Hz before adding it to the training subset, and the sampling rate of the AGWN is also 500 Hz. The method of balancing the dataset is described in [39]. For each cross-validation task, we balance the training subset (containing \(7421\approx 9276/5 * 4\) 10 s good-quality segments and \(6838\approx 2700/5*4+4678\) 10 s bad-quality segments) but keep the test subset unchanged (containing \(1855\approx 9276/5\) 10 s good-quality segments and \(540\approx 2700/5\) 10 s bad-quality segments).
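The per-fold counts quoted above can be checked with a few lines of arithmetic. The variable names are illustrative; the totals and the 4678 added bad-quality segments are the figures given in the text.

```python
good_total, bad_total, folds = 9276, 2700, 5
added_bad = 4678  # bad-quality segments generated by adding noise (from the text)

# Four of the five folds are used for training, one for testing.
train_good = round(good_total / folds * (folds - 1))
train_bad = round(bad_total / folds * (folds - 1)) + added_bad
test_good = round(good_total / folds)
test_bad = round(bad_total / folds)

print(train_good, train_bad, test_good, test_bad)
print(round(train_bad / (train_good + train_bad), 3))  # bad-quality share in training
```

After balancing, bad-quality segments make up roughly 48% of each training subset, while the untouched test subset keeps the original imbalance of about 22.5%.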
In addition, we employ multiple indicators to evaluate the performance of the proposed method: sensitivity (Se), specificity (Sp), precision (\(P_{+}\)), accuracy (Acc), \(F_{1}\) and area under the curve (AUC) [41]. It should be noted that for extremely unbalanced data (e.g., a low prevalence or incidence of a disease in the total population), the ROC curve and AUC are only partially meaningful. Carrington et al. [42] give an effective solution to this problem. Here, we balanced the training set. The standard definitions of these indicators are as follows:
\[ \mathrm{Se} = \frac{TP}{TP + FN}, \qquad \mathrm{Sp} = \frac{TN}{TN + FP}, \qquad P_{+} = \frac{TP}{TP + FP}, \]
\[ \mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad F_{1} = \frac{2 \times P_{+} \times \mathrm{Se}}{P_{+} + \mathrm{Se}}, \]
where TP is the number of true positives, TN the number of true negatives, FP the number of false positives and FN the number of false negatives.
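As a worked illustration of these indicators, the following sketch computes Se, Sp, \(P_{+}\), Acc and \(F_{1}\) from the four confusion-matrix counts using their standard definitions. The function name and the example counts are hypothetical and are not results from the paper.

```python
def quality_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Standard binary classification indicators from confusion-matrix counts."""
    se = tp / (tp + fn)                      # sensitivity (recall)
    sp = tn / (tn + fp)                      # specificity
    precision = tp / (tp + fp)               # positive predictive value, P+
    acc = (tp + tn) / (tp + tn + fp + fn)    # accuracy
    f1 = 2 * precision * se / (precision + se)
    return {"Se": se, "Sp": sp, "P+": precision, "Acc": acc, "F1": f1}


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
print(quality_metrics(tp=90, tn=80, fp=20, fn=10))
```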
Experimental results
Performance evaluation of DLDV features
To evaluate the performance of the DLDV features extracted by our method, we employ four traditional classifiers, namely Gaussian Kernel Support Vector Machines (GSVM) [43], Logistic Regression (LR) [44], Random Forests (RF) [45] and K-Nearest Neighbors (KNN) [46] (the parameter settings of each classifier are shown in Table 3), and six time-frequency dependent SQIs [10, 47, 48]: sSQI, kSQI, pSQI, LpSQI, MpSQI and HpSQI. Table 4 shows the binary classification results of ECG signal quality obtained with each feature on the four traditional classifiers. Figure 8 shows the confusion matrices obtained from the DLDV features (FTIMF\(_\mathrm{freq}\)) on the four classifiers. Table 4 shows that our DLDV features outperform the six traditional SQIs on GSVM, LR, RF and KNN. Our DLDV features achieve their best performance on GSVM, where Se, \(P_{+}\) and Acc reach 93.42, 97.85 and 93.32%, respectively. Among the six comparison SQIs, sSQI achieves the best performance on KNN, with Se, \(P_{+}\) and Acc of 89.91, 93.27 and 87.92%, respectively. Even so, its Acc is still 5.40 percentage points lower than that of our method. These results show that the DLDV features outperform the six comparison SQIs.
To further test the performance of the proposed method, instead of randomly combining SQIs to train the classification model, we generate new combinations of SQIs in order of decreasing average accuracy of the six SQIs on the four classifiers. These combinations are then compared with DLDV and FTIMF\(_\mathrm{all}\), and the results on each classifier are shown in Table 5. The combination of all six SQIs has the highest Acc among all combinations, but it is still lower than the Acc of DLDV and FTIMF\(_\mathrm{all}\), which shows that our features perform better than the six traditional SQIs. Furthermore, our DLDV features perform best on GSVM (Acc = 93.32%), which benefits from both the DLDV features and the strong performance of the SVM classifier with a Gaussian kernel. The results obtained on KNN (Acc = 92.98%) are slightly inferior to those on GSVM. Our features perform worst on LR (Acc = 87.76%), which is even lower than SQI\(_\mathrm{features}\) on KNN (Acc = 89.98%) but still slightly ahead of the results for the combination of all six SQIs. This indicates that our method outperforms these six traditional SQIs in quality classification.
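One plausible reading of the combination rule above is: rank the six SQIs by their average accuracy over the four classifiers, then form nested combinations from the best two up to all six. The sketch below follows that reading. The accuracy values are hypothetical placeholders (not the values in Table 4), and the function name is illustrative.

```python
# Hypothetical mean accuracies of each SQI over the four classifiers (placeholders only).
MEAN_ACC = {
    "sSQI": 0.85, "kSQI": 0.84, "pSQI": 0.80,
    "LpSQI": 0.78, "MpSQI": 0.76, "HpSQI": 0.74,
}


def ranked_combinations(mean_acc: dict) -> list:
    """Nested SQI combinations, adding SQIs in order of decreasing mean accuracy."""
    # Sort by accuracy (descending); break ties by name so the order is deterministic.
    ranked = sorted(mean_acc, key=lambda name: (-mean_acc[name], name))
    return [ranked[:k] for k in range(2, len(ranked) + 1)]


for combo in ranked_combinations(MEAN_ACC):
    print(combo)
```

Each combination would then be used to train the four classifiers, giving one row per combination in a comparison such as Table 5.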
Comparison of our DITransformer and four traditional classifiers
This section compares our DITransformer with four traditional methods (GSVM, LR, RF and KNN). Four feature sets (SQI\(_\mathrm{features}\), FTIMF\(_\mathrm{time}\), FTIMF\(_\mathrm{freq}\) and FTIMF\(_\mathrm{all}\)) are used to build the five categories of classifiers, and the results on the test set are shown in Table 6. Among the four traditional models built with SQI\(_\mathrm{features}\), KNN achieves the highest accuracy (Acc = 89.98%), which is still lower than that of DITransformer (Acc = 91.26%). The classification models built with FTIMF\(_\mathrm{all}\) generally perform better than those built with SQI\(_\mathrm{features}\). With FTIMF\(_\mathrm{all}\), the accuracy on GSVM (Acc = 94.27%) is higher than that on KNN (Acc = 93.64%), but Table 7 and Fig. 13b show that KNN (AUC = 0.962) outperforms GSVM (AUC = 0.921) in terms of AUC. More importantly, combined with FTIMF\(_\mathrm{all}\), our DITransformer achieves the best overall performance (Acc = 99.62% and AUC = 0.993). The p values in Table 8 show that the difference between the proposed DITransformer and the four traditional classifiers in expressing signal quality is statistically significant.
Ablation study on DITransformer model
In this section, we design a series of ablation experiments to comprehensively evaluate the performance of the proposed DITransformer. Experiment A uses only the FTIMF\(_\mathrm{freq}\) feature as the input to train the transformer-based model. Building on experiment A, experiment B uses both FTIMF\(_\mathrm{freq}\) and FTIMF\(_\mathrm{time}\) as the input to train the transformer-based model. Experiment C uses only the Raw ECG as the input to train the transformer model. Building on C, experiment D treats FTIMF\(_\mathrm{time}\) as an augmented feature, which is concatenated with the output of the transformer encoder layer and fed to the classification layer. Experiment E encodes the Raw ECG as the input of the transformer and uses the dimension-reduced FTIMF\(_\mathrm{freq}\) as an augmented feature, which is fed into the classification layer along with the output of the transformer encoder layer (see Fig. 5). Building on experiment E, experiment F treats both FTIMF\(_\mathrm{freq}\) and FTIMF\(_\mathrm{time}\) as augmented features, which are concatenated with the output of the transformer encoder layer and fed to the classification layer. Note that, whereas experiments A, B and C use a single-input structure, experiments D, E and F augment the features through a dual-input structure, whose main advantage is that it can fully utilize the depth local dual-view features.
Table 9 shows the results of the ablation experiments associated with the proposed method, and Fig. 9 shows the six corresponding confusion matrices. As shown in Table 9, the Acc of the transformer-based model reaches 95.49% in experiment A, and the Acc of experiment C reaches 92.57%. Compared with experiment C, the Acc of experiment E (DITransformer model) increases by 6.01%. This result shows that, as an augmented feature, FTIMF\(_\mathrm{freq}\) significantly improves the performance of the model. Comparing the results of experiments A and C, we find that feeding FTIMF\(_\mathrm{freq}\) into the transformer improves classification performance more effectively than feeding the Raw ECG into it directly. In experiment B, the Acc of the transformer-based model reaches 97.70% and the \(F_{1}\) reaches 98.51%. Comparing the results of experiments A and B shows that, as an augmented feature, FTIMF\(_\mathrm{time}\) also improves the classification performance of the model, but its contribution is not as significant as that of FTIMF\(_\mathrm{freq}\). Experiment F gives the best performance of the proposed DITransformer method: its Se, Sp, \(P_{+}\) and Acc values reach 99.68, 99.44, 99.83 and 99.62%, respectively. As shown in Fig. 9f, only 0.25% of the good quality data are misclassified as bad quality data. These results show that our DITransformer performs much better than GSVM and KNN.
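The six ablation configurations can be summarized as a small data structure. The sketch below records, for each experiment, what is fed to the transformer encoder and what is concatenated to the encoder output as augmented features, following the description above. The dictionary layout and the helper names `structure` and `fuse` are illustrative, and `fuse` only shows the concatenation step, not the actual model.

```python
# Encoder input and augmented features per ablation experiment, as described in the text.
ABLATION_EXPERIMENTS = {
    "A": {"encoder_input": ["FTIMF_freq"], "augmented": []},
    "B": {"encoder_input": ["FTIMF_freq", "FTIMF_time"], "augmented": []},
    "C": {"encoder_input": ["Raw ECG"], "augmented": []},
    "D": {"encoder_input": ["Raw ECG"], "augmented": ["FTIMF_time"]},
    "E": {"encoder_input": ["Raw ECG"], "augmented": ["FTIMF_freq"]},
    "F": {"encoder_input": ["Raw ECG"], "augmented": ["FTIMF_freq", "FTIMF_time"]},
}


def structure(experiment: str) -> str:
    """An experiment is dual-input when augmented features bypass the encoder."""
    return "dual-input" if ABLATION_EXPERIMENTS[experiment]["augmented"] else "single-input"


def fuse(encoder_output: list, augmented_features: list) -> list:
    """Concatenate the encoder output with augmented features before the classification layer."""
    return list(encoder_output) + list(augmented_features)


for name in ABLATION_EXPERIMENTS:
    print(name, structure(name))
```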
Performance of each model to recognize the MA noise
First, we select the four traditional classification models trained with the SQI\(_\mathrm{features}\) and FTIMF\(_\mathrm{all}\) features, respectively. The performance of these models is then tested on artificial test sets with a progressively increasing share of MA-contaminated ECG segments. To test the ability of each model to identify MA-contaminated ECG, we generate a series of test sets with a fixed total number of samples (1000) by adjusting the proportion of data drawn from testa and testb. We take data from testa and testb at ratios of 8:2, 6:4, 4:6 and 2:8, and denote the resulting test sets as testab1, testab2, testab3 and testab4, respectively. The results of the four traditional classifiers trained with SQI\(_\mathrm{features}\) on each test subset are shown in Table 10. As the proportion of MA-contaminated ECG segments increases, the Acc of all four classifiers decreases to different degrees. At each proportion, KNN performs better than the other three classifiers. Figure 10a shows the results obtained on testab1 and testab4. It can be seen that classifiers trained with SQI\(_\mathrm{features}\) are rather sensitive to MA noise. The results of the five classifiers trained with FTIMF\(_\mathrm{all}\) on each test subset are shown in Table 11. As the proportion of MA-contaminated ECG segments increases, the accuracy of all five classifiers also decreases to different degrees, but the decrease is much smaller than that in Table 10. As shown in Fig. 10b, the results on testab1 and testab4 confirm this view. The results in Tables 12 and 13 show that the contribution of our FTIMF\(_\mathrm{all}\) features to identifying MA noise is significant at the 0.05 level, and that our DITransformer based on FTIMF\(_\mathrm{all}\) outperforms the four conventional classifiers across the board in recognizing MA noise.
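The construction of testab1 to testab4 can be sketched as follows: each mixed set holds 1000 segments drawn from testa and testb at the stated ratio. The function name, the fixed random seed and the use of sampling without replacement are assumptions; the text gives only the ratios and the total.

```python
import random

# Ratios of testa : testb for each mixed test set, as stated in the text.
RATIOS = {"testab1": (8, 2), "testab2": (6, 4), "testab3": (4, 6), "testab4": (2, 8)}


def build_mixed_test_set(test_a, test_b, ratio, total=1000, seed=0):
    """Draw `total` segments from two pools at the given a:b ratio (without replacement)."""
    ratio_a, ratio_b = ratio
    n_a = total * ratio_a // (ratio_a + ratio_b)
    n_b = total - n_a
    rng = random.Random(seed)  # fixed seed (an assumption) keeps the split reproducible
    return rng.sample(test_a, n_a) + rng.sample(test_b, n_b)


# Placeholder pools; in practice these would be the ECG segments of testa and testb.
pool_a = [("a", i) for i in range(1200)]
pool_b = [("b", i) for i in range(1200)]
for name, ratio in RATIOS.items():
    mixed = build_mixed_test_set(pool_a, pool_b, ratio)
    print(name, len(mixed), sum(1 for source, _ in mixed if source == "b"))
```

Keeping the total fixed means that any change in accuracy across the four sets reflects the changing mix of segments rather than the test set size.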
Optimal data length and computational time
To find the optimal segment length (\(N_\mathrm{seg}\)) for SQA, we repeat experiment F ten times on seta with \(N_\mathrm{seg}\) varying from 1 to 10 s in increments of 1 s. Throughout the experiment, we change only \(N_\mathrm{seg}\); the relationship between \(N_\mathrm{seg}\) and the accuracy of SQA is shown in Fig. 11a. As \(N_\mathrm{seg}\) increases, the quality classification accuracy of our model also increases. However, when \(N_\mathrm{seg}\) is greater than 6 s, the accuracy hardly improves, which shows that a 6 s segment already covers most of the features required for signal quality classification. In addition, Fig. 11b shows the relationship between segment length and training and testing times. As \(N_\mathrm{seg}\) increases up to 5 s, the training and testing times of the model increase slowly; beyond 6 s, they rise rapidly. Combining the results of Fig. 11a, b and weighing classification accuracy against computational complexity, we choose \(N_\mathrm{seg} = 6\) s as the optimal signal segment length.
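The trade-off described above can be made explicit with a simple rule: choose the shortest segment length whose accuracy is within a small tolerance of the best accuracy observed. This rule is our illustration, not a procedure stated in the paper, and the accuracy values and tolerance below are hypothetical placeholders shaped like the trend in Fig. 11a.

```python
# Hypothetical accuracy (%) per segment length in seconds; placeholders, not measured values.
ACC_BY_NSEG = {
    1: 90.00, 2: 93.00, 3: 95.50, 4: 97.00, 5: 98.50,
    6: 99.50, 7: 99.55, 8: 99.60, 9: 99.60, 10: 99.62,
}


def shortest_near_optimal(acc_by_length: dict, tolerance: float = 0.2) -> int:
    """Smallest segment length whose accuracy is within `tolerance` points of the best."""
    best = max(acc_by_length.values())
    candidates = [n for n, acc in acc_by_length.items() if acc >= best - tolerance]
    return min(candidates)


print(shortest_near_optimal(ACC_BY_NSEG))
```

Because computation time grows with segment length, preferring the shortest near-optimal length captures the same reasoning as weighing Fig. 11a against Fig. 11b.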
Performance comparison
This paper employs the PCCC [38] database, which has also been used in other papers. Table 14 lists some other well-performing methods that use this database. Albaba et al. [49] constructed an SQA pipeline by combining multiple time-frequency domain features with multiple traditional classifiers, and obtained good results with the Medium Gaussian SVM (MGSVM) classifier. Their method achieves an accuracy of 93.00% on MGSVM, which is comparable to the result obtained by our FTIMF\(_\mathrm{all}\) features on GSVM (Acc = 94.27%) but still much lower than that of our DITransformer (Acc = 99.62%). Shahriari et al. [13] used the SSIM to compare ECG images obtained from two ECGs at standard scales, and then trained a linear discriminant analysis classifier for SQA using the SSIM between each image and all templates as feature vectors. Compared with the other methods, theirs obtained a lower accuracy. Behar et al. [10] employed indicators such as kSQI, sSQI, pSQI, basSQI, bSQI, pcaSQI and rSQI, and trained an SVM model to evaluate the quality of ECG signals and reduce false alarms, achieving an accuracy of 99.30%. This result is higher than that of our GSVM based on FTIMF\(_\mathrm{all}\) but slightly inferior to that of our DITransformer. It is worth noting that our methods have a strong MA noise recognition ability, whereas [10] targeted ordinary noisy signals and did not consider the interference of MA noise. Therefore, although their performance metrics are high, they are not entirely comparable with ours. The methods in [13, 49] also hardly consider the case of MA-contaminated ECG. In addition, the proposed methods have good interpretability and can achieve accurate SQA for ECG that contains a large amount of MA noise.
Discussion
Analyzing the performance of DLDV features
This paper uses EMD and FFT to extract the DLDV features of ECG signals. Four traditional classifiers (GSVM, LR, RF and KNN) are then employed to evaluate the performance of the extracted DLDV features. We also employ six traditional time-frequency related SQIs as references against which to evaluate our DLDV features. In general, the larger the span of signal quality, the more significant the difference in SQI value. For example, as shown in Fig. 12, because the probability density distributions of signals of different quality differ markedly, the kurtosis (kSQI) and skewness (sSQI) provide effective information for distinguishing good quality signals from bad quality signals. In addition, the other four time-frequency-related SQIs are all valid SQA indicators that have been verified by researchers and have achieved good results in practical SQA [4, 5, 43, 44]. Therefore, this paper selects them as references for evaluating the confidence of our DLDV features for SQA.
Table 4 and Fig. 8 show the classification results and confusion matrices of the six traditional SQIs and the DLDV features on the four classifiers. The DLDV features outperform the traditional SQIs on all four classifiers; even on LR, where accuracy is lowest, the SQI\(_\mathrm{features}\) score below our DLDV features. Our method comprehensively outperforms the six traditional SQIs because the extracted features not only express the central tendency and degree of dispersion of the signal segment, but also use the phase angle and amplitude–frequency values to express the transient changes of the signal.
Analyzing the performance of each model to recognize the MA noise
We also design experiments to test the proposed method’s ability to recognize MA-contaminated ECGs. Our DLDV features work well for MA-contaminated ECGs, as confirmed by Fig. 10 and Tables 10, 11, 12 and 13. Table 10 reflects the ability of SQI\(_\mathrm{features}\) to express MA noise: as MA noise increases, the accuracy of all four classifiers decreases, and the smallest decrease is 6.82%. Table 11 shows that, under the same conditions, the accuracy of all four classifiers also decreases, but the largest decrease is only 1.08%. Tables 12 and 13 give the results of the statistical analysis for Tables 10 and 11. The p values show that the difference between SQI\(_\mathrm{features}\) and FTIMF\(_\mathrm{all}\) in expressing MA noise is statistically significant. The results in Fig. 10a show that SQI\(_\mathrm{features}\) are limited in expressing MA noise. Because these metrics rely on human-specified desirable properties of clean signals, they have inherent limitations in expressing latent features of signal quality [17]. In addition, it is difficult to specify by hand the features of MA noises that resemble the ECG signal, so it is not surprising that their feature information is hard to extract with SQI\(_\mathrm{features}\). In contrast to Fig. 10a, the results obtained by each classifier in Fig. 10b on the two test sets are very close, with an average difference of 0.76%. This shows that classifiers built with our features identify general noise well and, more importantly, also perform strongly in identifying MA noise. Furthermore, our DITransformer structure achieves high accuracy on testab4.
Such high accuracy is due not only to the design of the dual-input structure but, more importantly, to the transformer’s self-attention module, which captures the temporal relationships of the signal and combines them with the DLDV features to improve the model’s ability to recognize MA noise. Note that we do not use the FTIMF\(_\mathrm{time}\) feature in this test experiment because this feature can only express the central tendency and dispersion of the signal and cannot fully reflect its transient changes.
Analyzing the performance of the proposed DITransformer
The effectiveness and robustness of our FTIMF\(_\mathrm{all}\) feature for SQA are verified on traditional classifiers (GSVM, LR, RF and KNN). Furthermore, we propose a DITransformer SQA method based on the FTIMF\(_\mathrm{all}\) features. Table 9 presents a series of ablation experiments for the proposed DITransformer method, and Fig. 9 shows the confusion matrix corresponding to each ablation experiment. The results of experiments C, D, E and F show that the contribution of FTIMF\(_\mathrm{freq}\) to SQA is much more significant than that of FTIMF\(_\mathrm{time}\). The results of experiments C and E show that the proposed dual-input structure significantly improves the model’s classification performance. Feeding FTIMF\(_\mathrm{freq}\) to the transformer as input data (experiment A) is much better than feeding it the Raw ECG directly (experiment C), which shows that DLDV features help the transformer model learn the quality features more easily. This is because the phase angle features represent the transient changes of the signals well and, combined with the amplitude features, allow these transient changes to be quantified. We also observe that the Se value of experiment E is higher than that of experiment C, and that the accuracy of experiment F is the best. This shows that experiment F tends to identify more signal segments as good quality, with the advantage of not missing valuable signals in subsequent processing stages, as is also demonstrated by the confusion matrix in Fig. 9f. From this point of view, the abstract features automatically extracted by the transformer from the Raw ECG are complementary to the FTIMF\(_\mathrm{freq}\) features. Comparing the results of A with E and of B with F, we find that the DITransformer combines the advantages of DLDV features and transformer-based abstract features, and has higher Se, Sp and Acc values. It can obtain more effective signal quality features than the single-input structure (A, B and C).
We also compare the proposed DITransformer with four traditional classifiers. Table 6 shows that the result with SQI\(_\mathrm{features}\) is inferior to that with our FTIMF\(_\mathrm{all}\) but better than that with our FTIMF\(_\mathrm{time}\), because FTIMF\(_\mathrm{time}\) does not focus on the nuances of signal and noise. The AUC values in Table 7 show that our DITransformer exhibits the best performance on all features, followed by KNN combined with FTIMF\(_\mathrm{all}\). Furthermore, the p values in Table 8 show that the difference between the method based on SQI\(_\mathrm{features}\) and the method based on FTIMF\(_\mathrm{all}\) is statistically significant. These good results are not surprising, because our method rarely considers the morphology of the Raw ECG and instead mines the depth local features of the signal. We extract not only the transient amplitude features of the intermediate components of the signal (IMFs) but also the transient phase angle features that express the subtle differences between the signal and the noise (especially MA noise). Equally important, among the traditional classifier-based methods, although the accuracy of the FTIMF\(_\mathrm{all}\) features on GSVM is higher than that on KNN, the receiver operating characteristic (ROC) curve of each model in Fig. 13 shows that DITransformer performs best (AUC = 0.993). Therefore, the DITransformer-based model built with FTIMF\(_\mathrm{all}\) can provide a new practical solution for SQA. In addition, Fig. 13b shows that, among the traditional classifiers, the KNN model built with FTIMF\(_\mathrm{all}\) exhibits the best performance (AUC = 0.962), followed by RF (AUC = 0.948). If users build the signal quality classifier with a traditional method, the KNN or RF method based on FTIMF\(_\mathrm{all}\) is preferable under the same conditions.
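The observation that one classifier can have the higher accuracy while another has the higher AUC follows from what each metric measures: accuracy depends on a single decision threshold, whereas AUC measures how well the scores rank positive segments above negative ones. The toy sketch below illustrates this with hypothetical scores for two unnamed models; the numbers are invented for illustration and have no connection to Tables 6 and 7.

```python
def auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def accuracy(pos_scores, neg_scores, threshold=0.5):
    """Accuracy when scores at or above `threshold` are predicted positive."""
    correct = sum(s >= threshold for s in pos_scores) + sum(s < threshold for s in neg_scores)
    return correct / (len(pos_scores) + len(neg_scores))


# Hypothetical scores: model_1 has one badly ranked positive, model_2 ranks perfectly
# but places two positives just below the 0.5 threshold.
model_1 = {"pos": [0.9, 0.8, 0.7, 0.1], "neg": [0.4, 0.3, 0.2, 0.15]}
model_2 = {"pos": [0.9, 0.8, 0.45, 0.44], "neg": [0.43, 0.3, 0.2, 0.1]}

for name, m in (("model_1", model_1), ("model_2", model_2)):
    print(name, accuracy(m["pos"], m["neg"]), auc(m["pos"], m["neg"]))
```

Here model_1 has the higher accuracy at the 0.5 threshold, while model_2 has the higher AUC, which is the same kind of reversal reported for GSVM and KNN.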
Conclusion
In summary, we present a novel ECG SQA method that fuses the proposed DLDV features and the DITransformer framework to improve the recognition of MA-contaminated ECG. For the first time, we combine DLDV features and a transformer to handle the ECG SQA problem. Specifically, we use EMD and FFT to extract DLDV features of the Raw ECG in the time-frequency domain. The extracted DLDV features can identify subtle differences between MA and ECG signals through depth local amplitude and phase angle features. When they are fused with the temporal relationship features extracted by DITransformer, accuracy improves significantly compared with methods based on traditional SQIs. Experiments on SQA tasks show that the proposed method outperforms state-of-the-art SQA methods. In addition, our method can identify not only common types of noise in noise-contaminated ECGs but, more importantly, MA-contaminated ECG as well. In the future, we will improve the proposed method and adapt it to the SQA of other physiological signals, such as the electroencephalogram and electromyogram.
References
Clifford GD, Azuaje F (2006) Advanced methods and tools for ECG data analysis, vol 10. In: McSharry P (ed). Artech house, Boston
Satija U, Ramkumar B, Manikandan MS (2016) A unified sparse signal decomposition and reconstruction framework for elimination of muscle artifacts from ECG signal. In: 2016 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, pp 779–783
Nguyen P, Kim JM (2016) Adaptive ECG denoising using genetic algorithm-based thresholding and ensemble empirical mode decomposition. Inf Sci 373:499–511
Hu M, Zhang S, Dong W, Xu F, Liu H (2021) Adaptive denoising algorithm using peak statistics-based thresholding and novel adaptive complementary ensemble empirical mode decomposition. Inf Sci 563:269–289
Alyasseri ZAA, Khader AT, Al-Betar MA, Awadallah MA (2018) Hybridizing \(\beta\)-hill climbing with wavelet transform for denoising ECG signals. Inf Sci 429:229–246
Xie X, Liu H, Shu M, Zhu Q, Huang A, Kong X, Wang Y (2021) A multistage denoising framework for ambulatory ECG signal based on domain knowledge and motion artifact detection. Future Gener Comput Syst 116:103–116
Orphanidou C, Drobnjak I (2016) Quality assessment of ambulatory ECG using wavelet entropy of the HRV signal. IEEE J Biomed Health Inform 21(5):1216–1223
Mayer C, Bachler M, Holzinger A, Stein PK, Wassertheurer S (2016) The effect of threshold values and weighting factors on the association between entropy measures and mortality after myocardial infarction in the Cardiac Arrhythmia Suppression Trial (CAST). Entropy 18(4):129
Xia Y, Jia H (2017) ECG quality assessment based on multi-feature fusion. In: 2017 13th international conference on natural computation, fuzzy systems and knowledge discovery (ICNC-FSKD). IEEE, pp 672–676
Behar J, Oster J, Li Q, Clifford GD (2013) ECG signal quality during arrhythmia and its application to false alarm reduction. IEEE Trans Biomed Eng 60(6):1660–1666
Satija U, Ramkumar B, Manikandan MS (2017) Real-time signal quality-aware ECG telemetry system for IoT-based health care monitoring. IEEE Internet Things J 4(3):815–823
Zhang Y, Wei S, Zhang L, Liu C (2019) Comparing the performance of random forest, SVM and their variants for ECG quality assessment combined with nonlinear features. J Med Biol Eng 39(3):381–392
Shahriari Y, Fidler R, Pelter MM, Bai Y, Villaroman A, Hu X (2017) Electrocardiogram signal quality assessment based on structural image similarity metric. IEEE Trans Biomed Eng 65(4):745–753
Holzinger A, Hörtenhuber M, Mayer C, Bachler M, Wassertheurer S, Pinho AJ, Koslicki D (2014) On entropy-based data mining. Interactive knowledge discovery and data mining in biomedical informatics. Springer, Berlin, pp 209–226
Liu G, Han X, Tian L, Zhou W, Liu H (2021) ECG quality assessment based on hand-crafted statistics and deep-learned S-transform spectrogram features. Comput Methods Progr Biomed 208:106269
Herraiz ÁH, Martínez-Rodrigo A, Bertomeu-González V, Quesada A, Rieta JJ, Alcaraz R (2020) A deep learning approach for featureless robust quality assessment of intermittent atrial fibrillation recordings from portable and wearable devices. Entropy 22(7):733
Seeuws N, De Vos M, Bertrand A (2021) Electrocardiogram quality assessment using unsupervised deep learning. IEEE Trans Biomed Eng 69(2):882–893
Zhang J, Wang L, Zhang W, Yao J (2018) A signal quality assessment method for electrocardiography acquired by mobile device. In: 2018 IEEE international conference on bioinformatics and biomedicine (BIBM). IEEE, pp 1–3
Magrin-Chagnolleau I, Baraniuk RG (1999) Empirical mode decomposition based time-frequency attributes. In: SEG technical program expanded abstracts 1999. Society of Exploration Geophysicists, pp 1949–1952
Oberst U (2007) The fast Fourier transform. SIAM J Control Optim 46(2):496–540
Schölkopf B, Smola A, Müller KR (1997) Kernel principal component analysis. In: International conference on artificial neural networks. Springer, pp 583–588
Lee J, McManus DD, Merchant S, Chon KH (2011) Automatic motion and noise artifact detection in Holter ECG data using empirical mode decomposition and statistical approaches. IEEE Trans Biomed Eng 59(6):1499–1506
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. Adv Neural Inf Process Syst 30
Heil CE, Walnut DF (1989) Continuous and discrete wavelet transforms. SIAM Rev 31(4):628–666
Rilling G, Flandrin P (2007) One or two frequencies? The empirical mode decomposition answers. IEEE Trans Signal Process 56(1):85–95
Wu Z, Huang NE, Long SR, Peng CK (2007) On the trend, detrending, and variability of nonlinear and nonstationary time series. Proc Natl Acad Sci 104(38):14889–14894
Hasan S, Muttaqi KM, Sutanto D (2019) Automated segmentation of the voltage SAG signal using Hilbert Huang transform to calculate and characterize the phase angle jump. In: 2019 IEEE industry applications society annual meeting. IEEE, pp 1–6
Wold S, Esbensen K, Geladi P (1987) Principal component analysis. Chemom Intell Lab Syst 2(1–3):37–52
Schölkopf B, Smola A, Müller KR (1998) Nonlinear component analysis as a kernel eigenvalue problem. Neural Comput 10(5):1299–1319
Lee JM, Yoo C, Choi SW, Vanrolleghem PA, Lee IB (2004) Nonlinear process monitoring using kernel principal component analysis. Chem Eng Sci 59(1):223–234
Lee JM, Yoo C, Lee IB (2004) Fault detection of batch processes using multiway kernel principal component analysis. Comput Chem Eng 28(9):1837–1847
Cai P, Deng X (2020) Incipient fault detection for nonlinear processes based on dynamic multiblock probability related kernel principal component analysis. ISA Trans 105:210–220
Yan G, Liang S, Zhang Y, Liu F (2019) Fusing transformer model with temporal features for ECG heartbeat classification. In: 2019 IEEE international conference on bioinformatics and biomedicine (BIBM). IEEE, pp 898–905
Guan J, Wang W, Feng P, Wang X, Wang W (2021) Low-dimensional denoising embedding transformer for ECG classification. In: ICASSP 2021–2021 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, pp 1285–1289
Song H, Rajan D, Thiagarajan JJ, Spanias A (2018) Attend and diagnose: clinical time series analysis using attention models. In: Thirty-second AAAI conference on artificial intelligence
Yuan S, He Z, Zhao J, Yuan Z (2021) Low-dimensional depth local dual-view features embedded transformer for electrocardiogram signal quality assessment. In: 2021 IEEE international conference on bioinformatics and biomedicine (BIBM). IEEE, pp 1137–1144
Zhao Z, Wu Y (2016) Attention-based convolutional neural networks for sentence classification. Interspeech 8:705–709
Silva I, Moody GB, Celi L (2011) Improving the quality of ECGs collected using mobile phones: the physionet/computing in cardiology challenge 2011. In: 2011 computing in cardiology. IEEE, pp 273–276
Li Q, Clifford G (2011) Signal quality indices and data fusion for determining acceptability of electrocardiograms collected in noisy ambulatory environments. Comput Cardiol 38:1
Moody GB, Muldrow W, Mark RG (1984) A noise stress test for arrhythmia detectors. Comput Cardiol 11(3):381–384
Fletcher GS (2019) Clinical epidemiology: the essentials. Lippincott Williams & Wilkins
Carrington AM, Fieguth PW, Qazi H, Holzinger A, Chen HH, Mayr F, Manuel DG (2020) A new concordant partial AUC and partial c statistic for imbalanced data in the evaluation of machine learning algorithms. BMC Med Inform Decis Mak 20(1):1–12
Varewyck M, Martens JP (2010) A practical approach to model selection for support vector machines with a Gaussian kernel. IEEE Trans Syst Man Cybern Part B (Cybern) 41(2):330–340
Sahadat MN, Jacobs EL, Morshed BI (2014) Hardware-efficient robust biometric identification from amplitude and interval features of 0.58 second limb (lead I) ECG signal using logistic regression classifier. In: Engineering in Medicine and Biology Society (EMBC), Chicago, IL, pp 1440–1443
Li T, Zhou M (2016) ECG classification using wavelet packet entropy and random forests. Entropy 18(8):285
Fukunaga K, Narendra PM (1975) A branch and bound algorithm for computing k-nearest neighbors. IEEE Trans Comput 100(7):750–753
Li Q, Mark RG, Clifford GD (2007) Robust heart rate estimation from multiple asynchronous noisy sources using signal quality indices and a Kalman filter. Physiol Meas 29(1):15
Liu F, Wei S, Lin F, Jiang X, Liu C (2020) An overview of signal quality indices on dynamic ECG signal quality assessment. Feature Eng Comput Intell ECG Monit 33–54
Albaba A, Simões-Capela N, Wang Y, Hendriks RC, De Raedt W, Van Hoof C (2021) Assessing the signal quality of electrocardiograms from varied acquisition sources: a generic machine learning pipeline for model generation. Comput Biol Med 130:104164
Acknowledgements
This project is supported by the Science and Technology Major Project of Hubei Province (NextGeneration AI Technologies) under Grant 2019AEA170.
Ethics declarations
Conflict of interest
The authors declare no conflicts of interest.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Yuan, S., He, Z., Zhao, J. et al. Fusing depth local dual-view features and dual-input transformer framework for improving the recognition ability of motion artifact-contaminated electrocardiogram. Complex Intell. Syst. 9, 981–999 (2023). https://doi.org/10.1007/s4074702200861z
Keywords
Depth local dual-view (DLDV) features
Dual-input transformer (DITransformer)
Motion artifacts (MA)
Signal quality assessment (SQA)
Electrocardiogram (ECG)