1 Introduction

Electroencephalogram (EEG) quantitatively measures the human brain’s electrical activity which takes place due to the firing of neurons [1] and such brain activity is recorded non-invasively utilizing several electrodes located in different regions of the scalp [2]. For epileptic seizure detection, the utilization of long-duration EEG is widespread [3,4,5]. EEG is also utilized to detect Alzheimer’s disease [6, 7], estimate drowsiness levels [8,9,10], recognize human emotions [11], evaluate cognitive workload [8, 12], develop brain–computer interfaces (BCIs) [13,14,15,16], implement biometric systems [17,18,19] and so on.

EEG is a crucial physiological signal because of its widespread use, but it is highly susceptible to motion artifacts caused by voluntary and/or involuntary movement of the test subject during data recording with wearable devices. In some instances, movement artifacts are so conspicuous that the recorded EEG signal becomes unusable unless they are reduced significantly. Several earlier efforts to reduce movement artifacts in motion-corrupted EEG data are summarized in [20, 21]. Several single-stage and two-stage motion artifact correction techniques for the EEG modality were introduced and implemented in [21]. The authors of [21] investigated discrete wavelet transform (DWT) [22] using the Daubechies 5 mother wavelet, empirical mode decomposition (EMD) [23], ensemble empirical mode decomposition (EEMD) [24], EMD in conjunction with independent component analysis [25] (EMD-ICA), EEMD-ICA, EMD cascaded with canonical correlation analysis [26] (EMD-CCA), and EEMD-CCA to decompose the single-channel EEG data, and utilized the “reference ground truth signal” and the autocorrelation function separately to identify and discard motion-corrupted component(s). In [27], the authors utilized singular spectrum analysis (SSA) [28] to decompose the single-channel EEG signals and then used the adaptive noise cancellation (ANC) technique to eliminate motion artifacts. Gajbhiye et al. [29] used DWT to decompose the single-channel EEG data into sub-band signals and applied the total variation (TV) and weighted total variation (MTV) multi-resolution technique to the approximation sub-band signal to filter out motion artifacts. For the reduction of motion artifacts from single-channel EEG, a wavelet domain optimized Savitzky–Golay filter was implemented in [30]. Noorbasha et al. [31] used SSA along with the generalized Moreau envelope total variation (SSA-GMETV) technique to lessen motion artifacts in single-channel EEG signals. To reduce movement artifacts from EEG efficiently, Shukla et al. [32] suggested a two-stage artifact correction technique in which EEMD and Gaussian elimination CCA (GECCA) were utilized jointly, whereas in [33], modified EMD in combination with an optimized Laplacian of Gaussian (LoG) filter was proposed for suppressing movement artifacts. Recently, Hossain et al. [34] utilized variational mode decomposition (VMD) [35], VMD cascaded with principal component analysis [36] (VMD-PCA), and VMD-CCA for the correction of motion artifacts from single-channel EEG data. In [37], the wavelet packet decomposition technique in combination with CCA was proposed. The main limitation of these studies is their reliance on manually tuned signal processing techniques. Although performance has improved over the years, the correlation improvement reported in these studies has not exceeded the 70% mark, owing to the static nature of such techniques. Moreover, the existing techniques have not been evaluated with robust metrics, both temporally and spectrally, as we have done in this study to ensure that the underlying EEG information is not lost during the process.

EEG signals, in addition to motion artifacts, suffer from other forms of artifacts, among which ocular, muscular, and cardiac artifacts are prominent. Autoencoders (AEs) based on fully connected layers were developed by Ghosh et al. [38] and Yang et al. [39] to eliminate ocular artifacts from EEG signals. Leite et al. [40], Zhang et al. [41], and Sun et al. [42] introduced deep convolutional neural network (DCNN)-based models that can extract spatio-temporal information and are hence more resilient than typical fully connected neural networks. In [40], a deep convolutional autoencoder (DCAE) was developed to reduce eye blink and jaw clenching artifacts in EEG data. To reduce muscular artifacts in EEG data, the authors of [41] developed a DCNN that progressively increases its width. Sun et al. [42] reported a residual-connection-based DCNN for reducing ocular, muscular, and cardiac artifacts in noisy EEG data. Recently, the authors of [43] proposed EEGANet, a framework based on generative adversarial networks (GANs) for the removal of ocular artifacts from EEG data, whereas in [44], the k-means algorithm in combination with the SSA technique was proposed for the reduction of eye blink artifacts. Although a fair number of studies address the removal of ocular, muscular, and cardiac artifacts from EEG recordings, to the best of our knowledge, the removal of motion artifacts using deep learning models has not been investigated to date.

Unlike EEG, both classical and deep Machine Learning techniques have been used to correct motion artifacts from other physiological signals such as photoplethysmography (PPG) [45,46,47,48,49,50,51,52], electrocardiogram (ECG) [45, 53,54,55,56,57,58,59,60], electromyogram (EMG) [61, 62], and phonocardiogram (PCG) [63]. To fill this void, this study presents a novel 1D convolutional neural network (CNN)-based signal synthesis or reconstruction approach to correct motion artifacts from motion-corrupted EEG recordings. The key contributions from this study can be summarized as follows:

  • This is the very first study that used any kind of Machine Learning approach to remove motion artifacts from EEG signals. All other previous studies used a combination of traditional signal processing techniques.

  • This study used a deep learning (CNN)-based 1D signal reconstruction network to reduce motion artifacts significantly in motion-corrupted EEG signals, with considerably higher SNR improvement and correlation improvement than existing studies.

  • This study evaluated the contribution of onboard accelerometers in reducing motion artifacts from corrupted EEG signals.

  • The methodology or framework proposed in this study can be extended to any other 1D physiological signal such as photoplethysmogram (PPG) and electrocardiogram (ECG) for signal artifacts correction.

The remainder of this paper is structured as follows: Sect. 2 describes the proposed convolutional neural network (CNN)-based MLMRS-Net segmentation network for EEG motion artifact correction, followed by an overview of the single-channel EEG benchmark dataset and the data preprocessing techniques adopted in this study. Section 3 discusses in detail the experimental setup and the performance evaluation metrics used. Section 4 reports the performance of the proposed model as well as ten other state-of-the-art segmentation networks and discusses the results along with a comparison to past studies. Finally, a brief conclusion is presented in Sect. 5.

2 Materials and methods

In this section, the proposed MLMRS-Net segmentation network for EEG motion artifact removal is discussed in detail. A brief overview of the EEG benchmark dataset used as well as the data preprocessing steps adopted in this study is discussed in two separate sub-sections. Figure 1 shows the framework proposed in this study for effective EEG motion artifact removal using a 1D-CNN-based segmentation network.

Fig. 1

Proposed EEG motion artifacts correction framework

2.1 Overview of MLMRS-Net

The architecture of the proposed MLMRS-Net segmentation network is illustrated in Fig. 2. MLMRS-Net is a 1D-CNN-based segmentation network that contains one multi-resolution pooling (MRP) block in each encoder and decoder layer of the network. The network itself follows the UNet framework [64] where the final output of each encoder level gets concatenated with the decoder layer at the same level to retain the feature map from the contracting path. Deep supervision [65] is used in each decoder layer including the latent layer at the bottom. Hence, apart from the final output, our proposed model generates five extra outputs (Fig. 2), all of which are being deeply supervised at the same time.

Fig. 2

Proposed multi-layer multi-resolution spatially pooled (MLMRS) network architecture

2.1.1 Modified spatial pooling (MSP) layer

The architecture of the modified spatial pooling (MSP) layer is depicted in Fig. 3; it is configured by the input ‘\(n\)’ to the layer, as shown in Fig. 4. The input to the MSP layer is mix-pooled with a pool size of \(2^{n}\) (Fig. 3). If the values of ‘\(n\)’ for a segmentation network with ‘\(k\)’ levels are \(n = \left[ {0, 1, \ldots, k - 2, k - 1} \right]\), the pool sizes for the MSP layers in each MRP block are \(s = \left[ {1, 2, \ldots, 2^{k - 2}, 2^{k - 1}} \right]\). Since the proposed MLMRS-Net model has 5 levels, the corresponding pool sizes are \(s = \left[ {2^{0}, 2^{1}, 2^{2}, 2^{3}, 2^{4}} \right] = \left[ {1, 2, 4, 8, 16} \right]\). Mixed or “Max-Average” pooling [66, 67] is a combination of max and average pooling. The outputs from the max and average pooling blocks are added according to the weight regulator value ‘\(\alpha\)’, as formulated in Eq. (1),

$$x^{l + 1} = \left( {\alpha \times f_{\max } \left( {x^{l} ,2^{n} } \right)} \right) \oplus \left( {\left( {1 - \alpha } \right) \times f_{{{\text{avg}}}} \left( {x^{l} ,2^{n} } \right) } \right)$$
(1)

Here, \(x^{l}\) and \(x^{l + 1}\) denote the input and output of the operation, respectively, \(f_{\max }\) and \(f_{{{\text{avg}}}}\) denote the max-pooling and average-pooling functions, and the ‘\(\oplus\)’ symbol denotes addition. In our study, the value of ‘\(\alpha\)’ is chosen as 0.5, i.e., equal weight is given to max-pooling and average-pooling.
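As an illustration of Eq. (1), the following is a minimal, dependency-free sketch of mixed pooling on a 1D sequence. The function name `mixed_pool_1d` and the use of non-overlapping windows (stride equal to the pool size) are assumptions made for this example; they are not taken from the original implementation.

```python
def mixed_pool_1d(x, n, alpha=0.5):
    """Max-average pooling of a 1D sequence with pool size 2**n, as in Eq. (1).

    Assumption: windows do not overlap (stride equals the pool size).
    """
    size = 2 ** n
    if len(x) % size:
        raise ValueError("sequence length must be a multiple of the pool size")
    pooled = []
    for start in range(0, len(x), size):
        window = x[start:start + size]
        max_part = max(window)            # f_max over the window
        avg_part = sum(window) / size     # f_avg over the window
        pooled.append(alpha * max_part + (1 - alpha) * avg_part)
    return pooled


levels = 5                                    # k = 5 levels, as stated in the text
pool_sizes = [2 ** n for n in range(levels)]  # [1, 2, 4, 8, 16]

signal = [1.0, 3.0, 2.0, 6.0, 4.0, 4.0, 0.0, 8.0]
pooled = mixed_pool_1d(signal, n=1)           # pool size 2, alpha = 0.5
```

With \(n = 0\) the pool size is 1, so the max and the average of each window coincide and the input passes through unchanged.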

Fig. 3

Modified spatial pooling (MSP) layer

Fig. 4

Multi-resolution pooling (MRP) block expanded

The pooling layer is followed by a convolutional block of kernel size = 3. Then, the feature map generated by the convolutional layer is forwarded into two branches. In one branch, the features are upsampled by a factor of \(2^{n}\) through ‘Bilinear Interpolation’ [68] [Eq. (2)], whereas in the second branch, the feature map is squeezed through a transposed convolution block [69] with a kernel size and stride of 1 [Eq. (3)]. The squeezed feature maps are then fed into a transposed convolution block of kernel size = 3 and stride = \(2^{n}\) to restore the feature maps to their original size. The feature maps from these two upsampling paths are concatenated [Eq. (4)]. In this way, the proposed network benefits from both interpolation-based and transposed-convolution-based feature upsampling [70].

$$x_{1} = f_{{{\text{upConv}}}} \left( {f_{{{\text{conv}}}} \left( {x^{l} ,k = 3} \right),\,\,^{^{\prime}} Bilinear^{^{\prime}} } \right)$$
(2)
$$x_{2} = f_{{{\text{TransConv}}}} \left( {f_{{{\text{conv}}}} \left( {x^{l} ,k = 3} \right),\,\,k = 1,\,\,s = 1} \right)$$
(3)
$$x^{l + 1} = x_{1} \otimes \left( {f_{{{\text{TransConv}}}} \left( {x_{2} ,\,k = 3,\,s = 2^{n} } \right)} \right)$$
(4)

Finally, the concatenated feature maps are squeezed through a transposed convolution block of kernel size and stride of 1 before outputting [Eq. (5)]. This process reduces the feature footprint in the next stage during concatenation [71]. It is evident that the coarseness (or fineness) of the MSP layer depends on the value of ‘n’. If n = 0, the feature map is the coarsest, which gets finer as the value of ‘n’ is increased [72].

$$x^{l + 1} = \left( {f_{{{\text{TransConv}}}} \left( {x^{l} ,\,k = 1,\,s = 1} \right)} \right).$$
(5)
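The bookkeeping of sequence lengths inside one MSP layer can be sketched as follows. This is only an illustration under stated assumptions: the convolutions are assumed to use 'same' padding (so they preserve length), and both upsampling branches are assumed to multiply the length by exactly \(2^{n}\). The function name `msp_lengths` is hypothetical.

```python
def msp_lengths(length, n):
    """Return (pooled_length, restored_length) for one MSP layer with input n.

    Assumptions: 'same'-padded convolutions keep the length unchanged, and the
    upsampling branches (interpolation and strided transposed convolution)
    scale the length by exactly 2**n.
    """
    pool = 2 ** n
    if length % pool:
        raise ValueError("length must be a multiple of the pool size")
    pooled = length // pool      # after mixed pooling with pool size 2**n
    restored = pooled * pool     # after upsampling by a factor of 2**n
    return pooled, restored


# Lengths for a 1024-sample segment at each of the five MSP settings.
table = [msp_lengths(1024, n) for n in range(5)]
for n, (pooled, restored) in enumerate(table):
    print(f"n={n}: pooled length {pooled}, restored length {restored}")
```

Because every branch returns to the input length, the outputs of MSP layers with different \(n\) can be concatenated along the channel axis.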

2.1.2 Multi-resolution pooling (MRP) block

The multi-resolution pooling (MRP) blocks contain as many MSP layers as the segmentation model has levels, which is ‘5’ in our proposed model. As discussed previously, the value of ‘\(n\)’ varies from 0 to ‘\(k - 1\)’. The feature maps become finer as the value of ‘\(n\)’ increases. Inside the MRP block, a skip connection from the input is concatenated with the output of each MSP layer [73], as shown in Fig. 4. In this way, the model is able to capture coarser to finer features from the input signals, the skip connection being the coarsest and ‘\(n = k - 1\)’ being the finest. Since EEG signals are more unpredictable than other physiological signals such as ECG or PPG, the proposed model is designed to facilitate capturing various types of features from the EEG signals. This is also reflected in the results reported in Sect. 4. The operation in the MRP block is formulated in Eq. (6), where ‘\(\bigcup\)’ signifies concatenation over the MSP outputs and, as in Eq. (4), ‘\(\otimes\)’ concatenates the result with the skip connection.

$$x^{l + 1} = x^{l} \otimes \left( {\bigcup\limits_{n = 0}^{k - 1} {f_{{{\text{MSP}}}} \left( {x^{l} ,n} \right)} } \right)$$
(6)

It is worthwhile to mention that the model size or the number of parameters can be reduced by decreasing the number of MSP layers per MRP block, which is termed the ‘Cardinality’ [74] of the model. Increasing the cardinality exponentially boosts the number of model parameters. In this study, the cardinality of the model was kept as five.

2.2 Overview of the EEG benchmark dataset

The dataset used in this study, namely “Motion Artifact Contaminated fNIRS and EEG Data,” is a publicly available PhysioNet dataset contributed by Sweeney et al. [75, 76]. This dataset contains “reference ground truth” and motion-corrupted functional near-infrared spectroscopy (fNIRS) and electroencephalogram (EEG) recordings, which were primarily recorded for evaluating motion artifact removal techniques. During the 9-min-long EEG data acquisition from each test subject, two electrodes with the same hardware properties were placed simultaneously on the test subject’s scalp in very close proximity (30 mm). One electrode was left undisturbed to record the “reference ground truth” EEG signal, while the other was disturbed by tapping the sensor for 10–25 s at intervals of around two minutes to record the motion-corrupted EEG signal. The absence of motion artifacts in one sensor and their presence in the other was also documented using 3-axis accelerometers placed along with each sensor. The simultaneously recorded “reference ground truth” and motion-corrupted EEG signals showed a high correlation (~ 0.83, \(\sigma\) = 0.2) during the motion-free intervals and a much lower correlation otherwise (~ 0.40, \(\sigma\) = 0.19) [75]. The dataset contains 23 sets of single-channel EEG data, collected from the prefrontal cortex region of the brain, along with the corresponding 3-axis accelerometer signals for both the “reference ground truth” and motion-corrupted EEG signals. All recorded signals were synchronized through software-based trigger signals for both EEG and accelerometer, and these trigger signals were utilized during data preprocessing. The “reference ground truth” and motion-corrupted EEG signals in this dataset were labeled as channel 1 and channel 2, respectively.
Each EEG signal was recorded at a sampling frequency of 2048 Hz whereas the accelerometer and trigger signals were sampled at a rate of 200 Hz. Figure 5a shows an example of synchronized plots of EEG Signals (“reference ground truth” and motion-corrupted), corresponding accelerometer 3-axis plots with motion artifacts, and accelerometer trigger during the whole recording duration (9 min), and Fig. 5b depicts one zoomed-in segment with motion artifacts. From Fig. 5, it is clear that EEG channel 1 (“reference ground truth” EEG signal) suffers from baseline drift whereas the high amplitude fluctuations in motion-corrupted EEG signal are noticeable in four different regions.

Fig. 5

Synchronized plots for EEG signals (ground truth and motion-corrupted), corresponding accelerometer 3-axis plots with motion artifacts, and accelerometer trigger a Whole Duration; b Zoomed-in

2.3 Data preprocessing

Data preprocessing is one of the most crucial steps in deep learning applications, since model performance depends greatly on how the data are preprocessed. Well-prepared data can boost model performance significantly, while the same model might fail if the data are not preprocessed properly. In this study, the signals were resampled, baseline corrected, segmented, and normalized to make them suitable for the segmentation networks. Each step is explained below in detail.

2.3.1 Resampling

During data acquisition, the EEG and accelerometer signals were sampled at 2048 Hz and 200 Hz, respectively, so the EEG signals had many more sample points than the accelerometer signals. To use them concurrently in deep learning model training, all signals should have the same number of data points. To fulfill this prerequisite, all the signals used in this study were resampled to a single sampling frequency of 256 Hz, i.e., the EEG signals were downsampled from 2048 to 256 Hz and the 3-axis accelerometer signals were upsampled from 200 to 256 Hz. Downsampling a signal does not affect the signal morphology much if the interpolation method is chosen carefully, since the interpolation draws on more data points than it produces. Upsampling a signal to a much higher frequency, however, might change the signal morphology because the algorithm has to estimate several intermediate points. For this reason, the common sampling frequency (256 Hz) was kept close to the lowest sampling frequency (200 Hz). The linear interpolation method was found to be adequate for this study.
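The resampling step can be illustrated with a minimal linear-interpolation sketch. The function name `resample_linear` is hypothetical, and no anti-aliasing filter is included because the text above mentions only linear interpolation; the original pipeline may differ in these details.

```python
def resample_linear(samples, fs_in, fs_out):
    """Resample a uniformly sampled sequence by linear interpolation.

    Assumption: the first output sample coincides with the first input sample
    and no anti-aliasing filter is applied.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to interpolate")
    # Number of output samples that fit inside the input time span.
    n_out = int((len(samples) - 1) * fs_out // fs_in) + 1
    out = []
    for k in range(n_out):
        pos = k * fs_in / fs_out               # fractional index into the input
        i = min(int(pos), len(samples) - 2)    # left neighbour
        frac = pos - i                         # distance past the left neighbour
        out.append((1 - frac) * samples[i] + frac * samples[i + 1])
    return out


# One second of a ramp "recorded" at each native rate (value equals time in s).
acc_200 = [i / 200 for i in range(201)]
eeg_2048 = [i / 2048 for i in range(2049)]

acc_256 = resample_linear(acc_200, fs_in=200, fs_out=256)    # upsampling
eeg_256 = resample_linear(eeg_2048, fs_in=2048, fs_out=256)  # downsampling
```

After this step both sequences contain the same number of samples for the same duration, which is what allows them to be stacked as input channels.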

2.3.2 Baseline drift correction

The raw signals, especially the “reference ground truth” and motion-corrupted EEG signals, exhibited baseline drift throughout the recording in all 23 trials. Baseline drifts have random patterns that deep learning models learn during training, which degrades performance. Moreover, the drift patterns differ between channels (as shown in Supplementary Fig. 1); even if they matched, training 1D-CNN models on signals affected by baseline wandering would lead the model to produce baseline-corrupted EEG during prediction. Baseline wandering therefore needs to be removed or minimized from all signals (target or predictor) so that the deep learning architecture can focus on learning only the important features. For motion-corrupted EEG signals, however, motion artifacts and baseline wandering remain mixed. In addition, the EEG signals have large DC shifts or offsets that affect the baseline correction process, so these need to be removed beforehand. Our baseline correction process primarily involves fitting a polynomial along the baseline of the signal and subtracting it. The polynomial order and the window length of the operation are two crucial factors that control the sharpness of the polynomial [77]. If the polynomial order is high or the window length is small, even a rapidly varying baseline can be removed, and vice versa. The right choice depends solely on the nature of the baseline. If the baseline varies slowly, a higher-order polynomial will distort the signal itself. Conversely, if the baseline varies rapidly, a lower-order polynomial will not correct it properly. Moreover, in this case, the baseline remains mixed with motion artifacts in some segments. Our segmentation models work ideally when the “reference ground truth” and motion-corrupted EEG signals match closely during motion artifact-free segments (Fig. 6a) and differ during motion artifact-contaminated segments (Fig. 6b), and when the motion artifacts are left unaffected by the baseline correction algorithm. If higher-order polynomials are applied to the whole signal, they also partially remove the motion artifacts, as depicted in Supplementary Fig. 2. In that case, the segmentation network will still perform well, but during evaluation, comparing the estimated EEG signal to the input signals will yield low performance, since baseline correction has already partly removed the motion artifacts. Moreover, the AI framework would not be validated properly, since removing a portion of the motion artifacts before training makes the task less challenging for the network than it would be in a real-world scenario.

Fig. 6

Superimposed ground truth and motion-corrupted EEG segments after preprocessing for deep learning: a non-corrupted segment, both channels are matching closely; b motion-corrupted segment, both channels differ as channel 2 contains large motion artifacts

On the contrary, if lower-order polynomials are used, the baseline is not removed properly during non-corrupted segments. As shown in Supplementary Fig. 1, during a non-corrupted segment, the correlation between EEG channels 1 and 2 improved from around 89% to ~ 99% due to proper baseline correction (ideally it should be 100%). For clean EEG segments, a very high correlation between channels 1 and 2 is necessary since the model is trying to map the relationship between the corrupted (input) and the clean (output) EEG segments. For this reason, we developed an adaptive baseline drift correction scheme that can handle all these scenarios and fulfill the data requirements of deep learning models. The main idea of the scheme is to remove baseline drift extensively from both EEG channels using higher-order polynomials during non-corrupted segments so that they match closely, and to remove the baseline using lower-order polynomials during motion-corrupted segments so that motion artifacts are not removed or reduced during baseline correction, which ensures proper evaluation of the proposed deep learning framework. After DC offset removal, the baseline of the ground truth EEG signals (channel 1) was approximated by a higher-order polynomial (e.g., 20), while a lower-order polynomial (e.g., 3) was used for the motion-corrupted EEG signals (channel 2). After the signals were chopped into much smaller segments (1024 data samples per segment), the motion artifact-free segments from both channels were baseline-corrected again, more aggressively. With the polynomial order kept at 20, this operation on much smaller segments removes any remaining high-frequency drifts from both EEG channels. Baseline correction of the accelerometer signals was done using 10th-order polynomials.
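The adaptive scheme described above can be sketched with a least-squares polynomial fit. This is an illustrative sketch only: the function names are hypothetical, NumPy is assumed to be available, and the choice of a Chebyshev-basis fit over the whole record (rather than a windowed fit) is an assumption made for numerical stability, not a detail taken from the original pipeline.

```python
import numpy as np
from numpy.polynomial import Chebyshev


def remove_baseline(signal, order):
    """Remove the DC offset, then subtract a fitted polynomial baseline."""
    signal = np.asarray(signal, dtype=float)
    centered = signal - signal.mean()          # DC offset removal
    t = np.arange(centered.size)
    # Chebyshev.fit rescales t internally, which keeps high-order
    # least-squares fits (e.g., order 20) well conditioned.
    baseline = Chebyshev.fit(t, centered, deg=order)(t)
    return centered - baseline


def adaptive_baseline_correction(reference, corrupted,
                                 high_order=20, low_order=3):
    """Whole-record stage: high order for channel 1, low order for channel 2."""
    return (remove_baseline(reference, high_order),
            remove_baseline(corrupted, low_order))


def refine_clean_segment(segment, order=20):
    """Per-segment stage, intended only for motion artifact-free segments."""
    return remove_baseline(segment, order)
```

Deciding which segments are artifact-free is left outside this sketch; in the dataset described above that information could come from the trigger signals.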

2.3.3 Segmentation

Longer signal segments are likely to contain several features that the deep learning model might overlook during training. Relatively smaller signal segments also reduce the resource requirements during training. Considering these two points, the EEG and accelerometer signals were chopped into segments of 1024 sample points, following the approach of [77,78,79]. During segmentation, 50% overlap was used, roughly doubling the number of segments. This approach is similar to patching [80] for images. Signals were processed both before and after segmentation, as discussed in the previous subsection. After prediction by the deep learning model, the overlap is undone by discarding every even-numbered segment. The remaining segments are concatenated to form a signal of the same length as the original signal for evaluation purposes.
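The segmentation and rejoining steps can be sketched as follows. The function names are hypothetical, and trailing samples that do not fill a complete segment are simply dropped here, which is an assumption rather than a documented detail of the original pipeline.

```python
def segment_with_overlap(signal, length=1024, overlap=0.5):
    """Split a sequence into fixed-length segments with fractional overlap."""
    hop = int(length * (1 - overlap))      # 512 samples for 50% overlap
    return [signal[start:start + length]
            for start in range(0, len(signal) - length + 1, hop)]


def rejoin_segments(segments):
    """Discard every even-numbered segment (2nd, 4th, ...) and concatenate.

    With 50% overlap, the remaining segments tile the signal without overlap.
    """
    joined = []
    for segment in segments[::2]:
        joined.extend(segment)
    return joined


signal = list(range(4096))                 # stand-in for a 4096-sample record
segments = segment_with_overlap(signal)    # 7 segments of 1024 samples
restored = rejoin_segments(segments)       # same length and order as `signal`
```

For a record whose length is a multiple of the segment length, the rejoined sequence covers every original sample exactly once.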

2.3.4 Normalization

Each extracted segment was ‘Zscore’ normalized first, then ‘range’ normalized between 0 and 1 [Eq. (7)]. ‘Zscore’ normalization is important for normalizing signals of high variance [77], and ‘range’ normalization is utilized to constrain the amplitudes between 0 and 1, which is crucial for deep learning algorithms [77,78,79].

$${\text{EEG}}_{{\text{i}}} \left( {{\text{norm}}} \right) = range\left( {\left( {\frac{{{\text{EEG}}_{{\text{i}}} - {\upmu }_{{\text{i}}} }}{{{\upsigma }_{{\text{i}}} }}} \right),\left[ {0\,\,1} \right]} \right)$$
(7)

The predicted signal (after joining) is denormalized later to calculate the change in signal-to-noise ratio (∆SNR) before and after motion artifact correction. Other studies conducted in the past also computed ∆SNR using denormalized signals.
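A minimal sketch of Eq. (7) and of the later denormalization is given below. The function names are hypothetical, the population standard deviation is assumed for the z-score, and it is assumed that the statistics stored at normalization time are reused for denormalization; the source does not state these details.

```python
import statistics


def normalize_segment(segment):
    """Z-score a segment, then rescale it to the range [0, 1] (Eq. 7).

    Returns the normalized segment and the parameters needed to invert it.
    """
    mu = statistics.fmean(segment)
    sigma = statistics.pstdev(segment)     # assumption: population SD
    if sigma == 0:
        raise ValueError("constant segment cannot be z-scored")
    z = [(v - mu) / sigma for v in segment]
    lo, hi = min(z), max(z)
    scaled = [(v - lo) / (hi - lo) for v in z]
    return scaled, (mu, sigma, lo, hi)


def denormalize_segment(scaled, params):
    """Invert normalize_segment using the stored parameters."""
    mu, sigma, lo, hi = params
    return [(v * (hi - lo) + lo) * sigma + mu for v in scaled]


segment = [2.0, 4.0, 6.0, 10.0]
scaled, params = normalize_segment(segment)
restored = denormalize_segment(scaled, params)
```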

2.3.5 Filtering

The EEG signals in this dataset were corrupted with 50 Hz powerline noise of varying amplitude across trials. A notch filter was used to clean 50 Hz noise from the signals during the preprocessing stage. A Quality Factor (Q-Factor) of 10 was found to be suitable for the whole dataset.
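For illustration, a 50 Hz notch with a Q-factor of 10 can be sketched as a second-order IIR (biquad) filter. The coefficient formulas below follow the widely used audio-EQ-cookbook notch design, which is an assumption: the source does not specify the filter design, whether zero-phase filtering was used, or the sampling rate at which the notch was applied (256 Hz is assumed here).

```python
import math


def notch_coefficients(f0, fs, q):
    """Biquad notch coefficients (cookbook-style design), normalized so a[0] = 1."""
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b = [1 / a0, -2 * math.cos(w0) / a0, 1 / a0]
    a = [1.0, -2 * math.cos(w0) / a0, (1 - alpha) / a0]
    return b, a


def apply_biquad(x, b, a):
    """Single-pass (causal) direct-form I filtering of a sequence."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


fs = 256                                   # assumed post-resampling rate
b, a = notch_coefficients(f0=50.0, fs=fs, q=10.0)
hum = [math.sin(2 * math.pi * 50 * n / fs) for n in range(1024)]
slow = [math.sin(2 * math.pi * 10 * n / fs) for n in range(1024)]
hum_out = apply_biquad(hum, b, a)          # 50 Hz component is suppressed
slow_out = apply_biquad(slow, b, a)        # 10 Hz component passes through
```

A larger Q-factor narrows the rejected band around 50 Hz, at the cost of a longer filter transient.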

3 Experimentation and performance metrics

In this section, the experimental setup and all related components of this study are discussed in detail. Also, the evaluation metrics used in this study are introduced in a separate subsection to quantitatively measure the performance of all the deep CNN models in removing motion artifacts from single-channel EEG recordings.

3.1 Experimental setup

The raw EEG dataset was preprocessed and prepared for the deep learning pipeline, which was developed using TensorFlow 2.0 in Python and used to train 1D-CNN-based segmentation networks for motion artifact correction from EEG signals. A segmentation network in the deep learning domain is essentially a one-to-one mapping algorithm. Therefore, the proposed MLMRS-Net along with ten other 1D-CNN models were trained to map motion-corrupted EEG to the corresponding clean version and were validated through the Jackknife validation method. The dataset was divided into 23 folds, each fold containing processed segments from a single, independent trial. All experiments were therefore repeated 23 times, and the reported results are the average of the outcomes from all 23 test sets. The following two experiments were performed to evaluate our proposed approach and model.

3.1.1 Experiment A

In experiment A, the motion-corrupted EEG signal and its corresponding 3-axis accelerometer signals were fed into the 1D-CNN model as inputs (predictor signals), whereas the “reference ground truth” EEG signal was the output (target signal) to be estimated by the model. Thus, the models had four input channels and one output channel. Apart from our proposed MLMRS-Net, ten state-of-the-art segmentation networks, viz. Feature Pyramid Network (FPN) [81], LinkNet [82], UNet [64], Attention Guided UNet [71], DenseInceptionUNet [70], MultiResUNet [83], UNet+ [84], UNet++ [84], Attention Guided UNet++ [85], and UNet3+ [86], were implemented in this experiment for training and testing. These deep CNN models were originally proposed for 2D image segmentation, and we converted them into 1D segmentation networks for our purpose. All the parameters of the networks, such as the number of layers (depth) and the number of filters or kernels in each layer (width), were kept the same for all models to make the evaluation fair. All models had 5 layers; the initial layer had 64 filters, and the number of filters was doubled at each deeper level. Each model was trained for 300 epochs with an early-stopping patience of 30 epochs on the Google Colab platform. Data prepared in MATLAB were imported into the Python environment and used directly for training and evaluation.

3.1.2 Experiment B

In this experiment, the 3-axis accelerometer data were removed from the input and the proposed MLMRS-Net model was evaluated to observe the effect and/or contribution of the accelerometer signals (individually or combined) in the motion artifact correction process. Through this experiment, the feasibility of using only the EEG data for motion artifact correction was analyzed to determine whether the requirement for extra hardware (e.g., an accelerometer) can be dropped in practical implementations. It is worth mentioning that all previous studies on cleaning motion artifacts from EEG signals utilized traditional signal processing techniques in which only EEG data (motion-corrupted and reference ground truth) were used. To the best of our knowledge, this is the first study to use the accelerometer data in parallel to aid the estimation process and to evaluate the effect of the accelerometer signals individually and combined. In addition, an experiment was performed to estimate clean EEG signals from the 3-axis accelerometer signals alone to understand their standalone contribution.

3.1.3 Jackknife validation

In this study, Jackknife validation [87], also known as leave-one-out cross-validation, was adopted for validating the proposed EEG motion artifact correction method. As mentioned earlier, the benchmark dataset used in this study had 23 sets of EEG recordings, where each set contained one “reference ground truth” EEG signal and one motion-corrupted EEG signal. For each iteration, 22 sets of EEG data were selected for training and the remaining set for testing, giving 23 folds. For each model, the performance metrics computed and reported in Sect. 4 are averages over the 23 runs. This validation approach is robust since the test sets were always independent of the training sets and contained data from only a single trial. The training set, on the other hand, was ‘general’ because it contained all independent trials apart from the one in the test set.
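The fold construction described above can be sketched in a few lines. The generator name `leave_one_trial_out` and the 0-based trial indices are illustrative choices, not part of the original pipeline.

```python
def leave_one_trial_out(n_trials=23):
    """Yield (train_trials, test_trial) pairs, one fold per recorded trial."""
    for test_trial in range(n_trials):
        train_trials = [t for t in range(n_trials) if t != test_trial]
        yield train_trials, test_trial


def average_over_folds(fold_metrics):
    """Average a per-fold metric (one value per test trial)."""
    return sum(fold_metrics) / len(fold_metrics)


folds = list(leave_one_trial_out())        # 23 folds, 22 training trials each
```

Because each fold holds out all segments of one trial, no segment from the test trial can leak into training through the 50% overlap.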

3.2 Quantitative evaluation metrics

Since the objective of this study is to reduce artifacts from motion-corrupted EEG signals, calculating the difference in SNR \(\left( {\Delta SNR} \right)\) value between motion-corrected and motion-corrupted EEG signals, quantifying the improvement in correlation between motion-corrected and reference ground truth signals (expressed by the percentage reduction in motion artifact ‘\(\eta\)’) and computing the signal reconstruction error ‘\(\varepsilon\)’ can robustly assess the efficacy of the corresponding model in removing motion artifacts. Evaluating the performance of a signal reconstruction network using only a single or similar metrics might not show the complete picture. Hence, in this study, \(\Delta SNR\), \(\eta\), and \(\varepsilon\) computed using mean absolute error (MAE) are used as quantitative performance metrics.

3.2.1 Change in signal-to-noise ratio (∆SNR)

Motion artifacts appear as high-power noise components in both temporal and spectral domains. Removing motion artifacts from the EEG signals should result in a large improvement in the SNR of the signals. For the calculation of ∆SNR, Eq. (8) is used as provided in [20],

$$\Delta {\text{SNR}} = 10\log_{10} \left( {\frac{{\sigma_{x}^{2} }}{{\sigma_{{e_{{{\text{after}}}} }}^{2} }}} \right) - 10\log_{10} \left( {\frac{{\sigma_{x}^{2} }}{{\sigma_{{e_{{{\text{before}}}} }}^{2} }}} \right)$$
(8)

Here, \(\sigma_{x}^{2}\), \(\sigma_{{e_{{{\text{before}}}} }}^{2}\), and \(\sigma_{{e_{{{\text{after}}}} }}^{2}\) represent the variance of the “reference ground truth” signal, motion-corrupted signal, and motion-corrected signal, respectively.
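Eq. (8) can be evaluated directly from three variances, as in the sketch below. The function name `delta_snr` is hypothetical, and the arguments `before` and `after` stand for whichever signals supply \(\sigma_{{e_{{{\text{before}}}} }}^{2}\) and \(\sigma_{{e_{{{\text{after}}}} }}^{2}\) under the definition given above; population variance is assumed.

```python
import math
import statistics


def delta_snr(reference, before, after):
    """Change in SNR (dB) according to Eq. (8).

    reference: signal supplying sigma_x^2
    before:    signal supplying sigma_e_before^2
    after:     signal supplying sigma_e_after^2
    """
    var_x = statistics.pvariance(reference)
    var_before = statistics.pvariance(before)
    var_after = statistics.pvariance(after)
    snr_before = 10 * math.log10(var_x / var_before)
    snr_after = 10 * math.log10(var_x / var_after)
    return snr_after - snr_before


reference = [0.0, 1.0, 0.0, -1.0] * 4
before = [4 * v for v in reference]        # 16 times the reference variance
after = [2 * v for v in reference]         # 4 times the reference variance
gain_db = delta_snr(reference, before, after)
```

Algebraically, \(\sigma_{x}^{2}\) cancels in Eq. (8), so \(\Delta {\text{SNR}} = 10\log_{10} \left( {\sigma_{{e_{{{\text{before}}}} }}^{2} /\sigma_{{e_{{{\text{after}}}} }}^{2} } \right)\); in the example the variance ratio is 4, i.e., about 6.02 dB.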

3.2.2 Correlation coefficient (η)

The correlation between the estimated and the ground truth EEG signals should be more than the correlation between the ground truth and corrupted EEG channels. In this study, the Pearson Correlation Coefficient (PCC) is used to quantify the correlation between signals. To calculate the percentage reduction in motion artifacts \(\eta\), Eq. (9) is used as provided in [20]:

$$\eta = 100\left( {1 - \frac{{1 - \rho_{{{\text{after}}}} }}{{1 - \rho_{{{\text{before}}}} }}} \right)$$
(9)

Here \(\rho_{{{\text{before}}}}\) is the PCC between the “reference ground truth” and motion-corrupted signals, whereas \(\rho_{{{\text{after}}}}\) is the PCC between the “reference ground truth” and motion-corrected signals over the epochs where motion artifact is absent.
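Eq. (9) can be illustrated with a small sketch in which \(\rho_{{{\text{after}}}}\) is computed against the model output, in line with the comparison described at the start of this subsection. The helper names `pearson` and `artifact_reduction` and the example correlation values are illustrative only.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def artifact_reduction(rho_before, rho_after):
    """Percentage reduction in motion artifacts, eta, from Eq. (9)."""
    return 100 * (1 - (1 - rho_after) / (1 - rho_before))


# Illustrative values: correlation rises from 0.40 to 0.82 after correction.
eta = artifact_reduction(rho_before=0.40, rho_after=0.82)
```

With these illustrative values, \(\eta = 100\left( {1 - 0.18/0.60} \right) = 70\%\); a perfect reconstruction (\(\rho_{{{\text{after}}}} = 1\)) gives 100%, and no change in correlation gives 0%.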

3.2.3 Construction error (\(\varepsilon\))

MAE is one of the primary metrics for evaluating the construction error of signals reconstructed by 1D-segmentation networks [77,78,79]. Similar metrics such as mean squared error (MSE), root mean squared error (RMSE), or median absolute error can also be used instead. In this study, the mean and standard deviation (SD) of the construction error over all reconstructed segments are reported as the final metrics. For ground truth signals \({\text{Y}} = \left[ {{\text{Y}}_{1} ,{\text{Y}}_{2} ,{\text{Y}}_{3} , \ldots ,{\text{Y}}_{{\text{N}}} } \right]\) and predicted signals (or vectors) \({\hat{\text{Y}}} = \left[ {{\hat{\text{Y}}}_{1} ,{\hat{\text{Y}}}_{2} ,{\hat{\text{Y}}}_{3} , \ldots ,{\hat{\text{Y}}}_{{\text{N}}} } \right]\), the construction error ‘\(\varepsilon\)’, computed using MAE as the primary metric, can be defined as in Eq. (10),

$${\text{Construction}}\,\,{\text{Error}}, \varepsilon = \frac{{\mathop \sum \nolimits_{i = 1}^{N} \left( {\frac{{\mathop \sum \nolimits_{j = 1}^{M} \left| {Y_{ij} - \hat{Y}_{ij} } \right|}}{M}} \right)}}{N}$$
(10)

where ‘N’ is the number of signal segments and ‘M’ is the number of samples in each segment, which is 1024 for this study. With \(\mu\) denoting the mean construction error \(\varepsilon\) from Eq. (10), the standard deviation (SD) of the construction error, \(\sigma_{\varepsilon }\), can be formulated as in Eq. (11),

$${\text{SD}}\,{\text{of}}\,{\text{Construction}}\,{\text{Error}},\, \sigma_{\varepsilon } = \sqrt {\frac{{\mathop \sum \nolimits_{i = 1}^{N} \left( {\left( {\frac{{\mathop \sum \nolimits_{j = 1}^{M} \left| {Y_{ij} - \hat{Y}_{ij} } \right|}}{M}} \right) - \mu } \right)^{2} }}{N}}$$
(11)
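Eqs. (10) and (11) amount to a per-segment MAE followed by a population mean and SD over segments. The following NumPy sketch illustrates this, assuming the segments are stacked as rows of a 2D array; the function name and array layout are assumptions made for illustration.

```python
import numpy as np


def construction_error(y_true, y_pred):
    """Mean and SD of the per-segment MAE, following Eqs. (10) and (11).

    y_true, y_pred -- arrays of shape (N, M): N segments of M samples each
    Returns (epsilon, sigma_epsilon).
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Inner sum of Eq. (10): MAE of each segment, shape (N,).
    per_segment_mae = np.mean(np.abs(y_true - y_pred), axis=1)
    epsilon = per_segment_mae.mean()
    # Population SD (divides by N, not N - 1), matching Eq. (11).
    sigma_epsilon = per_segment_mae.std(ddof=0)
    return float(epsilon), float(sigma_epsilon)
```

For example, two segments with per-segment MAE values of 1 and 3 give a construction error of 2 with an SD of 1.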

In Eqs. (10) and (11), capital symbols signify that they represent the whole population (i.e., all the segments in the dataset in this case). Since the deep learning pipeline outputs estimated segments of the same length as the training segments, the predicted segments for a single trial were combined to reconstruct the final estimated signal. \(\Delta {\text{SNR}}\) and \(\eta\) were computed on these reconstructed signals only. All evaluation metrics were calculated for each of the 23 folds and averaged to report the performance of each model.

Quantitative evaluation often fails to show the true picture of the outcomes of a study, sometimes even when it is carried out from different aspects. For this reason, we also qualitatively evaluate the motion artifact correction performance of the proposed MLMRS-Net on EEG signals, in both the temporal and spectral domains.

4 Results

This section provides the quantitative and qualitative evaluation outcomes from the experiments conducted in this study along with illustrations.

4.1 Quantitative evaluation

This section presents the quantitative outcomes of the experiments performed in this study. Table 1 presents the results from Experiment A and shows that MLMRS-Net outperforms all the state-of-the-art segmentation models in terms of construction error and percentage reduction in motion artifacts. The construction error for normalized EEG segments is 0.056, the lowest among all the trained models. The SD of the construction error for MLMRS-Net is also low, which signifies that the construction error varies minimally while the network estimates clean EEG signals. High variability in performance parameters can call the robustness of a deep learning model into question; the minimal variation observed here therefore indicates that the proposed MLMRS-Net is robust and reliable. MLMRS-Net achieves an average percentage reduction in motion artifacts of 90.52%, the highest among the deep CNN models compared. As evident from Table 1, MLMRS-Net is one of only two models that exceeded 90%. Even though MLMRS-Net performed well with a ΔSNR of 26.641 dB, no significant difference in ΔSNR can be observed across models. Detailed four-channel results per trial for MLMRS-Net are provided in Supplementary Table 1. Note that in all tables reporting results, the outcomes of the best-performing model are shown in bold for each metric.

Table 1 Results for EEG motion artifact correction through signal reconstruction using 1D-CNN

Table 2 presents the results from Experiment B, in which the input signals to MLMRS-Net were varied to understand the effect of the accelerometer signals on motion artifact removal performance. First, the accelerometer signals were removed entirely and only the motion-corrupted EEG was used. Then, individual accelerometer axes were combined with the EEG in turn to understand their respective contributions. In addition, one experiment attempted to estimate clean EEG signals from the 3-axis accelerometer signals alone. Table 2 shows that when only EEG signals were used to train MLMRS-Net, it reached an average \(\eta\) of 89.32% during testing. Adding any single accelerometer axis to the EEG signals slightly boosted \(\eta\), whereas using all three accelerometer axes along with the EEG signals produced the best average \(\eta\) of 90.52%. The improvement in \(\Delta {\text{SNR}}\) is similar in all these cases. The deliberately implausible experiment of estimating EEG from only the 3-channel accelerometer signals yields a minor average \(\eta\) of 15.46% and a ΔSNR of 15.44 dB, which is expected because the signals estimated from accelerometer data alone were nothing but noise. Nevertheless, the comparison shows that using the accelerometer signals as predictors alongside the EEG has some positive impact, boosting the average percentage reduction in motion artifacts by 1.34% in relative terms (from 89.32% to 90.52%). At the same time, a signal reconstruction model trained on EEG signals alone already reaches close to the best results in motion artifact correction. Thus, in a hardware system design, the accelerometers can be removed while still expecting high performance from MLMRS-Net or similar models in EEG motion artifact correction.
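The 1.34% figure is consistent with a boost measured relative to the EEG-only baseline rather than a difference in percentage points (which would be 90.52 − 89.32 = 1.20). Assuming that reading, the arithmetic is:

$$\frac{90.52 - 89.32}{89.32} \times 100\% \approx 1.34\%$$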

Table 2 Effect of the Accelerometer in EEG motion artifact correction performance using MLMRS-Net

4.2 Qualitative evaluation

As mentioned earlier, qualitative evaluation is crucial for such studies since numbers alone cannot always provide a clear and convincing picture of the feasibility of a newly proposed approach. The high ∆SNR values reported by the studies in the current literature (Table 3) suggest that those methods were good at reducing noise but might also have attenuated the embedded biological EEG signals in the process; consequently, their correlation improvement did not exceed 70% even with high ∆SNR. In contrast, the deep learning technique proposed in this work removes the motion artifact while keeping the biological signals intact, which makes its ∆SNR value slightly smaller than those of some earlier studies. This can be seen in the plots in Fig. 7 for various trials or folds across the dataset, for both clean and corrupted segments. Figure 7a–d shows sample ground truth (EEG channel 1), moderately to highly motion-corrupted (EEG channel 2), and MLMRS-Net estimated EEG segments. Figure 7e, f displays segments without any motion artifacts in EEG channel 2. During segments with no motion, all three signals show a high correlation; in such cases, MLMRS-Net keeps its output as close as possible to the input EEG segments from channel 2. On the other hand, even for highly motion-corrupted segments, MLMRS-Net improved the correlation considerably, which supports the robustness of the approach.

Fig. 7 Corresponding ground truth, motion-corrupted, and motion-corrected EEG segments from various sample test folds. Panels a–d show the proposed MLMRS-Net’s motion correction ability, while panels e, f show its almost unchanged outputs in the case of clean segments

So far, we have visualized the performance in the temporal domain. Our claims are further supported by the Power Spectral Density (PSD) plots [87] and topographic maps [88] of the EEG signals shown in Figs. 8 and 9, respectively, which represent the performance of the model in the spectral domain. For spectral evaluation, segments from all 23 folds were concatenated, and their spectra were analyzed and presented in a single plot. Figure 8 shows that the PSD of the EEG signals estimated by the proposed deep learning framework closely matches that of the ground truth EEG signals across the spectrum. Although motion artifacts insert high-power components into the EEG signals across the whole spectrum, the distortion is worst in the Delta (\(\delta\)) band (\(\cong\) 0.5 to 4 Hz). The proposed framework substantially reduces the effect of motion artifacts in this range, as shown in the PSD plot in Fig. 8 and the topographic map for the Delta band in Fig. 9b.
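A periodogram PSD of the kind compared in this paragraph can be sketched with NumPy's FFT routines. This is an illustrative implementation with a rectangular window, not the authors' code; the function name is hypothetical, and the sampling rate `fs` is left as a parameter because it is not stated in this section.

```python
import numpy as np


def periodogram_db(signal, fs):
    """One-sided periodogram PSD in dB/Hz (rectangular window).

    signal -- 1D array, e.g. concatenated EEG segments
    fs     -- sampling rate in Hz (assumption: supplied by the caller)
    Returns (freqs, psd_db).
    """
    signal = np.asarray(signal, dtype=float)
    n = signal.size
    spectrum = np.fft.rfft(signal - signal.mean())  # remove DC offset
    psd = np.abs(spectrum) ** 2 / (fs * n)
    # Double every bin except DC (and Nyquist when n is even) to fold in
    # the negative-frequency half of the spectrum.
    last = n // 2 if n % 2 == 0 else None
    psd[1:last] *= 2.0
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    tiny = np.finfo(float).tiny  # avoids log10(0) for empty bins
    return freqs, 10.0 * np.log10(psd + tiny)
```

Calling this function on the ground truth, motion-corrupted, and estimated signals with the same `fs` yields three curves on a common frequency axis, which can then be overlaid in a single plot.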

Fig. 8 Periodogram power spectral density (PSD) plots of ground truth, motion-corrupted, and MLMRS-Net estimated EEG signals from the whole dataset

Fig. 9 EEG topographic maps of various EEG frequency bands for the whole dataset to show the robustness of the proposed motion artifact correction scheme

EEG topography is a neuroimaging technique for visualizing neural activity around the brain: the bandpower of the EEG signals collected from various electrodes is computed and plotted smoothly following the gradient. In this case, we have single-channel EEG collected from the prefrontal cortex region of the brain, as explained in detail in the dataset section. That is, we have a single electrode at the ‘Fpz’ location, as denoted by the international 10–20 system for scalp electrode placement in EEG data acquisition [89]. To compute the topographic maps, we consider a total EEG bandwidth of 0.5 to 80 Hz, divided into five EEG frequency bands: Delta (\(\delta\)) \(=\) 0.5–4 Hz, Theta (\(\theta\)) = 4–8 Hz, Alpha (\(\alpha\)) = 8–13 Hz, Beta (\(\beta\)) = 13–40 Hz, and Gamma (\(\gamma\)) = 40–80 Hz [90, 91]. We combine the EEG signals (ground truth, motion-corrupted, and estimated) from all 23 folds, calculate the bandpower, and plot the topographic maps using the same scale in all cases [92]. The topographic plots in Fig. 9 show that, regardless of the frequency range, motion artifacts distort the topographic maps by inserting high-power components into the EEG signal, with Delta being the worst-affected band. Regardless of the EEG band, the estimated EEG components contain bandpower similar to that of the ground truth. Individual topographic plots for all 23 folds over the whole EEG band (0.5–80 Hz) are provided in Supplementary Fig. 3 to show the performance of the proposed framework in specific cases. The ground truth EEG has high bandpower in some folds (Folds 1–11), low bandpower in others (Folds 12–15), and medium bandpower in the rest (Folds 16–23). In all of these cases, the model removed the motion artifacts and extracted the EEG signals robustly. Overall, significant improvements in signal correlation and SNR can be observed from both temporal and spectral perspectives, as presented in Figs. 7, 8, and 9.
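The bandpower computation described above can be sketched by integrating a periodogram over each of the five bands. The band edges are those listed in the text; everything else (the function name, the periodogram-based estimator, half-open band intervals, and the sampling rate `fs`) is an assumption made for illustration, since the exact estimator used in the study is not specified here.

```python
import numpy as np

# Band edges in Hz, as listed in the text above.
BANDS = {
    "delta": (0.5, 4.0),
    "theta": (4.0, 8.0),
    "alpha": (8.0, 13.0),
    "beta": (13.0, 40.0),
    "gamma": (40.0, 80.0),
}


def band_powers(signal, fs):
    """Power in each EEG band from a one-sided periodogram.

    signal -- 1D array of EEG samples
    fs     -- sampling rate in Hz (assumption; must exceed 160 Hz so
              that the gamma band up to 80 Hz is below Nyquist)
    """
    signal = np.asarray(signal, dtype=float)
    n = signal.size
    psd = np.abs(np.fft.rfft(signal - signal.mean())) ** 2 / (fs * n)
    # Fold in negative frequencies (skip DC, and Nyquist when n is even).
    last = n // 2 if n % 2 == 0 else None
    psd[1:last] *= 2.0
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    df = fs / n  # frequency resolution in Hz
    # Each band is the half-open interval [lo, hi), so a bin on a shared
    # edge (e.g. 4 Hz) is counted once, in the higher band.
    return {
        name: float(psd[(freqs >= lo) & (freqs < hi)].sum() * df)
        for name, (lo, hi) in BANDS.items()
    }
```

Applying `band_powers` to the ground truth, motion-corrupted, and estimated signals gives three sets of five values that can be compared band by band on a common scale.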

4.3 Comparison with the existing works

Several studies have worked on removing motion artifacts caused by external perturbations from EEG signals, and all of them relied on signal processing algorithms. Among the earliest studies, Sweeney et al. [21] used DWT-, EMD-, and EEMD-based signal processing techniques combined with ICA and CCA for motion artifact removal. Maddirala et al. [27] and Noorbasha et al. [31] proposed SSA and its variations for the same purpose and found better results. Gajbhiye et al. [29, 30], in two studies, used DWT combined separately with multi-resolution TV, multi-resolution WTV, and Savitzky–Golay filtering and reached much better outcomes. Most recently, Hossain et al. [34, 37], in two papers, developed motion artifact correction pipelines using several single-stage (VMD, WPD) and two-stage (VMD-PCA, VMD-CCA, WPD-CCA) signal processing methods and reported good ∆SNR performance. Table 3 summarizes these works for comparison: the best average ∆SNR reported was 30.76 dB, using the WPD-CCA technique with the db1 wavelet packet, and the highest average η was 68.76%, using DWT along with Savitzky–Golay filtering. In this work, our proposed MLMRS-Net model outperformed all the previous works in terms of η, with an average value of 90.52%. Also, to the best of our knowledge, this is the first paper in this domain to use a machine learning approach to clean motion artifacts from EEG signals. In contrast to traditional signal processing techniques, which have their drawbacks, the proposed approach lets the deep learning model learn relevant features from the EEG signals through training for better motion artifact correction. Among the past studies in Table 3 that report similar correlation improvements, ΔSNR varies by a large margin, i.e., the two metrics do not follow the same trend. As mentioned before, the ΔSNR parameter depends more on the data preprocessing steps than on the artifact correction technique, which is evident here and in past studies.

Table 3 Comparison of performance with the recent literature

5 Conclusion

In this study, we have proposed a novel deep learning-based 1D-segmentation network (MLMRS-Net) to remove motion artifacts from single-channel, motion-corrupted EEG signals. Motion artifacts can severely affect EEG signals and, because of the very low amplitude of EEG, can sometimes distort the signal morphology itself. It is therefore crucial to develop robust methods for reducing the effect of motion artifacts on EEG signals. The performance metrics obtained from all the networks tested in this study clearly indicate the efficacy of deep learning models, compared with traditional signal processing techniques, in removing motion artifacts from EEG signals. Our proposed MLMRS-Net produced the best performance in reducing the effect of motion artifacts in comparison with previously reported studies, reaching an average percentage reduction in motion artifacts (η) of 90.52% along with an average ΔSNR of 26.64 dB while reliably retaining the underlying biological signals. A very small construction error was also found when MLMRS-Net was used to reconstruct motion-corrected EEG signals. This study indicates that, after being trained on a sufficiently large dataset, such deep learning models can be used to reliably remove artifacts from corrupted EEG signals in real time. Moreover, 1D-CNN-based signal reconstruction networks could be used for motion artifact correction in similar physiological signals such as electromyogram (EMG), electrocardiogram (ECG), photoplethysmogram (PPG), and phonocardiogram (PCG) following a similar experimental setup. In this way, less effort needs to be spent on developing more efficient signal processing techniques when artificial intelligence can reliably learn the pattern of the signals itself.