MLMRS-Net: Electroencephalography (EEG) motion artifacts removal using a multi-layer multi-resolution spatially pooled 1D signal reconstruction network

Electroencephalogram (EEG) signals suffer substantially from motion artifacts when recorded in ambulatory settings using wearable sensors. Because the diagnosis of many neurological diseases relies heavily on clean EEG data, it is critical to eliminate motion artifacts from motion-corrupted EEG signals using reliable and robust algorithms. Although a few deep learning-based models have been proposed for the removal of ocular, muscle, and cardiac artifacts from EEG data, to the best of our knowledge, no attempt has been made to remove motion artifacts from motion-corrupted EEG signals using deep learning. In this paper, a novel 1D convolutional neural network (CNN) for signal reconstruction, called the multi-layer multi-resolution spatially pooled (MLMRS) network, is proposed for EEG motion artifact removal. The performance of the proposed model was compared with that of ten other 1D CNN models (FPN, LinkNet, UNet, UNet+, UNetPP, UNet3+, AttentionUNet, MultiResUNet, DenseInceptionUNet, and AttentionUNet++) in removing motion artifacts from motion-contaminated single-channel EEG signals. All eleven deep CNN models were trained and tested using a single-channel benchmark EEG dataset from PhysioNet containing 23 sets of motion-corrupted and reference ground truth EEG signals. Leave-one-out cross-validation was used in this work. The performance of the deep learning models was measured using three well-known performance metrics, viz. the mean absolute error (MAE)-based construction error, the difference in the signal-to-noise ratio (ΔSNR), and the percentage reduction in motion artifacts (η). The proposed MLMRS-Net model showed the best denoising performance, producing average ΔSNR, η, and MAE values of 26.64 dB, 90.52%, and 0.056, respectively, across all 23 sets of EEG recordings. The proposed model outperformed all existing state-of-the-art techniques in terms of average η improvement.

EEG is an exceptionally important physiological signal owing to its widespread use, but it is highly susceptible to motion artifacts caused by voluntary and/or involuntary movement of the subject during recording with wearable devices. In some instances, movement artifacts can become so prominent that the recorded EEG signal loses its usability unless the artifacts are substantially reduced. Several earlier efforts to reduce movement artifacts in motion-corrupted EEG data are summarized in [20,21]. Several single-stage and two-stage motion artifact correction techniques for the EEG modality were introduced and implemented in [21]. The authors of [21] investigated the discrete wavelet transform (DWT) [22] with the Daubechies 5 mother wavelet, empirical mode decomposition (EMD) [23], ensemble empirical mode decomposition (EEMD) [24], EMD in conjunction with independent component analysis [25] (EMD-ICA), EEMD-ICA, EMD cascaded with canonical correlation analysis [26] (EMD-CCA), and EEMD-CCA to decompose single-channel EEG data, and used the "reference ground truth signal" and the autocorrelation function separately to identify and discard motion-corrupted component(s). In [27], the authors used singular spectrum analysis (SSA) [28] to decompose single-channel EEG signals and then applied the adaptive noise cancellation (ANC) technique to eliminate motion artifacts. Ghajbhiye et al. [29] used DWT to decompose single-channel EEG data into sub-band signals and applied the total variation (TV) and weighted total variation (MTV) multi-resolution technique to the approximation sub-band signal to filter out motion artifacts. For the reduction of motion artifacts in single-channel EEG, a wavelet-domain optimized Savitzky-Golay filter was implemented in [30]. Noorbasha et al. [31] used SSA along with the generalized Moreau envelope total variation (SSA-GMETV) technique to reduce motion artifacts in single-channel EEG signals. To efficiently reduce movement artifacts in EEG, Shukla et al. [32] suggested a two-stage artifact correction technique in which EEMD and Gaussian elimination CCA (GECCA) were used jointly, whereas in [33], a modified EMD combined with an optimized Laplacian of Gaussian (LoG) filter was proposed for suppressing movement artifacts. Recently, Hossain et al. [34] used variational mode decomposition (VMD) [35], VMD cascaded with principal component analysis [36] (VMD-PCA), and VMD-CCA to correct motion artifacts in single-channel EEG data. In [37], the wavelet packet decomposition technique combined with CCA was proposed. The main limitation of these studies is their reliance on conventional signal processing techniques. Although there have been some improvements over the years, the correlation improvement achieved by these studies has not exceeded the 70% mark, owing to the static nature of these manually tuned techniques. Moreover, the existing techniques have never been properly evaluated with robust metrics in both the temporal and spectral domains, as is done in this study, to ensure that the underlying EEG information is not lost during the process.
In addition to motion artifacts, EEG signals suffer from other types of artifacts, among which ocular, muscular, and cardiac artifacts are prominent. Autoencoders (AEs) based on fully connected layers were developed by Ghosh et al. [38] and Yang et al. [39] to eliminate ocular artifacts from EEG signals. Leite et al. [40], Zhang et al. [41], and Sun et al. [42] introduced deep convolutional neural network (DCNN)-based models that can extract spatio-temporal information and are therefore more resilient than typical fully connected neural networks. In [40], a deep convolutional autoencoder (DCAE) was developed to reduce eye blink and jaw clenching artifacts in EEG data. To reduce muscular artifacts in EEG data, the authors of [41] developed a DCNN that progressively increases its width. Sun et al. [42] reported a residual-connection-based DCNN for reducing ocular, muscular, and cardiac artifacts in noisy EEG data. More recently, the authors of [43] proposed EEGANet, a framework based on generative adversarial networks (GANs), for the removal of ocular artifacts from EEG data, whereas in [44], the k-means algorithm combined with the SSA technique was proposed for the reduction of eye blink artifacts. Although a fair number of studies exist on the removal of ocular, muscle, and cardiac artifacts from EEG recordings, to the best of our knowledge, the removal of motion artifacts using deep learning models has not been investigated to date.
Unlike for EEG, both classical and deep machine learning techniques have been used to correct motion artifacts in other physiological signals such as photoplethysmography (PPG) [45-52], electrocardiogram (ECG) [45,53-60], electromyogram (EMG) [61,62], and phonocardiogram (PCG) [63]. To fill this void, this study presents a novel 1D convolutional neural network (CNN)-based signal synthesis or reconstruction approach to correct motion artifacts in motion-corrupted EEG recordings. The key contributions of this study can be summarized as follows:
• This is the first study to use any kind of machine learning approach to remove motion artifacts from EEG signals; all previous studies used combinations of traditional signal processing techniques.
• This study used a deep learning (CNN)-based 1D signal reconstruction network to substantially reduce motion artifacts in motion-corrupted EEG signals, achieving considerably better SNR improvement and correlation improvement than existing studies.
• This study evaluated the contribution of onboard accelerometers to reducing motion artifacts in corrupted EEG signals.
• The methodology or framework proposed in this study can be extended to any other 1D physiological signal such as photoplethysmogram (PPG) and electrocardiogram (ECG) for signal artifacts correction.
The remainder of this paper is structured as follows: Section 2 describes the proposed convolutional neural network (CNN)-based MLMRS-Net segmentation network for EEG motion artifact correction, followed by an overview of the single-channel EEG benchmark dataset and the data preprocessing techniques adopted in this study. Section 3 discusses the experimental setup and the performance evaluation metrics in detail. Section 4 presents the performance of the proposed model and ten other state-of-the-art segmentation networks, and discusses the results along with a comparison with past studies. Finally, a brief conclusion is presented in Section 5.

Materials and methods
In this section, the proposed MLMRS-Net segmentation network for EEG motion artifact removal is discussed in detail. The EEG benchmark dataset and the data preprocessing steps adopted in this study are briefly described in two separate subsections. Figure 1 shows the framework proposed in this study for effective EEG motion artifact removal using a 1D-CNN-based segmentation network.

Overview of MLMRS-Net
The architecture of the proposed MLMRS-Net segmentation network is illustrated in Fig. 2. MLMRS-Net is a 1D-CNN-based segmentation network that contains one multi-resolution pooling (MRP) block in each encoder and decoder layer. The network follows the UNet framework [64], in which the final output of each encoder level is concatenated with the decoder layer at the same level to retain the feature maps from the contracting path. Deep supervision [65] is applied to each decoder layer, including the latent layer at the bottom. Hence, in addition to the final output, the proposed model generates five extra outputs (Fig. 2), all of which are deeply supervised simultaneously.
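Deep supervision, as described above, means that the final output and the five side outputs all contribute to the training loss. A minimal NumPy sketch follows, assuming an MAE loss for every output, equal loss weights, and side outputs already brought to the target length; none of these details is specified in the text, and `deep_supervision_loss` is a hypothetical helper.

```python
import numpy as np


def mae(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Mean absolute error between two arrays of identical shape."""
    return float(np.mean(np.abs(y_true - y_pred)))


def deep_supervision_loss(target, outputs, weights=None):
    """Weighted combination of per-output MAE losses (hypothetical helper).

    `outputs` holds the final output plus every deeply supervised side output,
    each assumed to have the same shape as `target`. Equal weights summing to 1
    are an assumption, not a value taken from the paper.
    """
    if weights is None:
        weights = [1.0 / len(outputs)] * len(outputs)
    return sum(w * mae(target, out) for w, out in zip(weights, outputs))


# Final output + 5 side outputs = 6 supervised outputs (batch, 1024 samples, 1 channel).
rng = np.random.default_rng(0)
target = rng.standard_normal((8, 1024, 1))
outputs = [target + 0.1 * k for k in range(6)]  # toy predictions with growing error
loss = deep_supervision_loss(target, outputs)
```

With equal weights, the loss here is the mean of the six per-output errors (0.0, 0.1, ..., 0.5), i.e., about 0.25.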

Modified spatial pooling (MSP) layer
The architecture of the modified spatial pooling (MSP) layer is depicted in Fig. 3; it can be adjusted based on the input parameter 'n', as shown in Fig. 4. The input to the MSP layer is mix-pooled with a pool size of $2^n$ (Fig. 3). If the values of 'n' for a segmentation network with 'k' levels are $n = [0, 1, \ldots, k-2, k-1]$, the pool sizes for the MSP layers in each MRP block are $s = [1, 2, \ldots, 2^{k-2}, 2^{k-1}]$. Since the proposed MLMRS-Net model has 5 levels, the corresponding pool sizes are $s = [2^0, 2^1, 2^2, 2^3, 2^4] = [1, 2, 4, 8, 16]$. Mixed or "Max-Average" pooling [66], [67] is a combination of max and average pooling. The outputs of the max and average pooling blocks are added according to the weight regulator value 'a', as formulated in Eq. (1):

$$x_{l+1} = a \, f_{\max}(x_l) \oplus (1 - a) \, f_{\mathrm{avg}}(x_l) \qquad (1)$$

Here, $x_l$ and $x_{l+1}$ denote the input and output of an operation, respectively, $f$ denotes a function, and $\oplus$ denotes addition. In this study, 'a' was set to 0.5, i.e., equal weight was given to max pooling and average pooling. The pooling layer is followed by a convolutional block with kernel size 3. The feature map generated by the convolutional layer is then forwarded into two branches. In one branch, the features are upsampled by a factor of $2^n$ through "Bilinear Interpolation" [68] [Eq. (2)], whereas in the second branch, the feature map is squeezed through a transposed convolution block [69] with kernel size and stride of 1 [Eq. (3)]. The squeezed feature maps are then fed into a transposed convolution block with kernel size 3 and stride $2^n$ to restore the feature maps to their original size. The feature maps from these two upsampling paths are concatenated [Eq. (4)]. In this way, the proposed network benefits from both interpolation-based and transposed-convolution-based feature upsampling [70].
Finally, the concatenated feature maps are squeezed through a transposed convolution block with kernel size and stride of 1 before being output [Eq. (5)]. This reduces the feature footprint during concatenation in the next stage [71]. The coarseness (or fineness) of the MSP layer therefore depends on the value of 'n': for n = 0, the feature map is the coarsest, and it becomes finer as 'n' increases [72].
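The pooling-and-interpolation path of the MSP layer can be sketched in NumPy as below. This is only an illustration under stated assumptions: the learned convolution and both transposed-convolution branches are omitted, and because the signals are 1D, the interpolation branch reduces to linear interpolation along time. `mixed_pool_1d` and `linear_upsample_1d` are hypothetical helpers, and a = 0.5 follows the text.

```python
import numpy as np


def mixed_pool_1d(x: np.ndarray, pool_size: int, a: float = 0.5) -> np.ndarray:
    """Mixed max-average pooling over time: a * max + (1 - a) * mean.

    x has shape (length, channels); length must be divisible by pool_size.
    """
    length, channels = x.shape
    windows = x.reshape(length // pool_size, pool_size, channels)
    return a * windows.max(axis=1) + (1.0 - a) * windows.mean(axis=1)


def linear_upsample_1d(x: np.ndarray, factor: int) -> np.ndarray:
    """Upsample (length, channels) by `factor` with per-channel linear interpolation."""
    length, channels = x.shape
    # Place each pooled value at the centre of the window it summarises.
    old_t = np.arange(length) * factor + (factor - 1) / 2.0
    new_t = np.arange(length * factor)
    return np.stack(
        [np.interp(new_t, old_t, x[:, c]) for c in range(channels)], axis=1
    )


rng = np.random.default_rng(0)
x = rng.standard_normal((1024, 4))  # one segment, 4 channels
# k = 5 levels -> pool sizes 1, 2, 4, 8, 16; each path is restored to the input length.
restored_shapes = [
    linear_upsample_1d(mixed_pool_1d(x, 2 ** n), 2 ** n).shape for n in range(5)
]
```

For n = 0 the pool size is 1, so both steps leave the input unchanged; larger n summarises progressively longer windows before interpolating back.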

Multi-resolution pooling (MRP) block
Each multi-resolution pooling (MRP) block contains as many MSP layers as the segmentation model has levels, which is five in the proposed model. As discussed previously, 'n' varies from 0 to $k-1$, and the feature maps become finer as 'n' increases. Inside the MRP block, a skip connection from the input is concatenated with the output of each MSP layer [73], as shown in Fig. 4. In this way, the model can capture coarse-to-fine features from the input signals, with the skip connection being the coarsest and $n = k-1$ the finest. Since EEG signals are less predictable than other physiological signals such as ECG or PPG, the proposed model is designed to capture diverse types of features from EEG signals, which is reflected in the results reported in Section 4. The operation in the MRP block is formulated in Eq. (6), where the symbol '∪' denotes concatenation.
It is worth mentioning that the model size, i.e., the number of parameters, can be reduced by decreasing the number of MSP layers per MRP block, which is termed the 'Cardinality' [74] of the model. Increasing the cardinality substantially increases the number of model parameters. In this study, the cardinality of the model was set to five.
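To illustrate how an MRP block combines its branches, the sketch below concatenates the block input (the skip connection) with the outputs of `cardinality` toy MSP layers along the channel axis. `toy_msp` is a hypothetical stand-in with no learned weights (mixed pooling followed by nearest-neighbour upsampling), so the sketch shows only how the output width depends on the cardinality, not the trained layer.

```python
import numpy as np


def toy_msp(n: int, a: float = 0.5):
    """Hypothetical MSP stand-in: mixed pooling with window 2**n, then
    nearest-neighbour upsampling back to the input length (no learned weights)."""
    p = 2 ** n

    def layer(x: np.ndarray) -> np.ndarray:
        length, channels = x.shape
        windows = x.reshape(length // p, p, channels)
        pooled = a * windows.max(axis=1) + (1 - a) * windows.mean(axis=1)
        return np.repeat(pooled, p, axis=0)

    return layer


def mrp_block(x: np.ndarray, msp_layers) -> np.ndarray:
    """Concatenate the skip connection (block input) with every MSP output."""
    return np.concatenate([x] + [layer(x) for layer in msp_layers], axis=-1)


x = np.random.default_rng(0).standard_normal((1024, 4))  # (length, channels)
widths = {}
for cardinality in range(1, 6):
    out = mrp_block(x, [toy_msp(n) for n in range(cardinality)])
    widths[cardinality] = out.shape[1]  # (cardinality + 1) * 4 channels
```

In this toy setup each MSP layer keeps the channel count C, so the concatenated output has (cardinality + 1) x C channels, and any layer that follows the block must accept that width.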

Overview of the EEG benchmark dataset
The dataset used in this study, "Motion Artifact Contaminated fNIRS and EEG Data," is a publicly available PhysioNet dataset contributed by Sweeney et al. [75,76]. It contains "reference ground truth" and motion-corrupted functional near-infrared spectroscopy (fNIRS) and electroencephalogram (EEG) recordings, which were primarily recorded to evaluate motion artifact removal techniques. During the 9-min EEG data acquisition from each subject, two electrodes with identical hardware properties were placed simultaneously on the subject's scalp in close proximity (30 mm); one electrode was left undisturbed to record the "reference ground truth" EEG signal, while the other was disturbed by tapping the sensor for 10-25 s at approximately two-minute intervals to record the motion-corrupted EEG signal. The absence of motion artifacts in one sensor and their presence in the other were also documented using 3-axis accelerometers placed alongside each sensor. The simultaneously recorded "reference ground truth" and motion-corrupted EEG signals showed a high correlation (~0.83, σ = 0.2) during motion-free intervals and a much lower correlation otherwise (~0.40, σ = 0.19) [75]. The dataset contains 23 sets of single-channel EEG data collected from the prefrontal cortex region, along with the corresponding 3-axis accelerometer signals for both the "reference ground truth" and motion-corrupted EEG signals. All recorded signals were synchronized through software-based trigger signals for both EEG and accelerometer, and these trigger signals were used during data preprocessing. The "reference ground truth" and motion-corrupted EEG signals in this dataset are labeled channel 1 and channel 2, respectively. Each EEG signal was recorded at a sampling frequency of 2048 Hz, whereas the accelerometer and trigger signals were sampled at 200 Hz.
Figure 5a shows an example of synchronized plots of the EEG signals ("reference ground truth" and motion-corrupted), the corresponding 3-axis accelerometer signals with motion artifacts, and the accelerometer trigger over the whole recording duration (9 min), and Fig. 5b depicts a zoomed-in segment with motion artifacts. Fig. 5 shows that EEG channel 1 (the "reference ground truth" EEG signal) suffers from baseline drift, whereas high-amplitude fluctuations in the motion-corrupted EEG signal are noticeable in four different regions.

Fig. 5 Synchronized plots for EEG signals (ground truth and motion-corrupted), corresponding 3-axis accelerometer plots with motion artifacts, and accelerometer trigger: a whole duration; b zoomed-in

Data preprocessing
Data preprocessing is one of the most crucial steps in deep learning applications, since model performance depends greatly on how the data are preprocessed. Well-prepared data can boost model performance significantly, whereas the same model might fail if the data are not preprocessed properly. In this study, the signals were resampled, baseline corrected, segmented, and normalized to make them suitable for the segmentation networks. Each step is explained in detail below.

Resampling
During data acquisition, the EEG and accelerometer signals were sampled at 2048 Hz and 200 Hz, respectively, so the EEG signals had many more sample points than the accelerometer signals. To use them concurrently in deep learning model training, all signals must have the same number of data points. To fulfill this prerequisite, all signals used in this study were resampled to a single sampling frequency of 256 Hz, i.e., the EEG signals were downsampled from 2048 to 256 Hz and the 3-axis accelerometer signals were upsampled from 200 to 256 Hz. Downsampling does not alter the signal morphology much if the interpolation method is chosen carefully, since the interpolation draws on more data points. Upsampling a signal to a much higher frequency, however, might change the signal morphology, as the algorithm must estimate several intermediate points. For this reason, the common sampling frequency (256 Hz) was kept as close as possible to the lowest sampling frequency (200 Hz). The linear interpolation method was found to be adequate for this study.
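The resampling step can be sketched with NumPy's `np.interp`, which performs piecewise-linear interpolation. The helper below is a hypothetical illustration of resampling EEG (2048 Hz) and accelerometer (200 Hz) signals to the common 256 Hz rate, using placeholder signals; plain linear interpolation applies no anti-aliasing low-pass filter before downsampling, so this is a sketch rather than the authors' exact MATLAB procedure.

```python
import numpy as np


def resample_linear(signal: np.ndarray, fs_in: float, fs_out: float) -> np.ndarray:
    """Resample a 1D signal from fs_in to fs_out (Hz) by linear interpolation."""
    duration = len(signal) / fs_in
    n_out = int(round(duration * fs_out))
    t_in = np.arange(len(signal)) / fs_in
    t_out = np.arange(n_out) / fs_out
    return np.interp(t_out, t_in, signal)


FS_TARGET = 256
seconds = 10
eeg = np.random.default_rng(0).standard_normal(2048 * seconds)     # placeholder EEG
acc_x = np.sin(2 * np.pi * 1.0 * np.arange(200 * seconds) / 200)   # placeholder axis
eeg_256 = resample_linear(eeg, 2048, FS_TARGET)    # 20480 -> 2560 samples
acc_x_256 = resample_linear(acc_x, 200, FS_TARGET)  # 2000 -> 2560 samples
```

Because 2048/256 = 8 exactly, every new EEG sample time coincides with an original one, so linear interpolation here simply keeps every 8th sample; the accelerometer upsampling, by contrast, genuinely estimates intermediate points.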

Baseline drift correction
The raw signals, especially the "reference ground truth" and motion-corrupted EEG signals, exhibited baseline drift throughout the recordings in all 23 trials. Baseline drifts follow random patterns that deep learning models learn during training, which degrades performance. Moreover, the drift patterns differ between channels (as shown in Supplementary Fig. 1), and even if they matched, training 1D-CNN models with signals affected by baseline wander would lead the model to produce baseline-corrupted EEG during prediction. Therefore, baseline wander needs to be removed or minimized in all signals (target and predictor) so that the deep learning architecture can focus on learning only the important features. For motion-corrupted EEG signals, however, motion artifacts and baseline wander are mixed. In addition, the EEG signals have large DC shifts or offsets that affect the baseline correction process, so these must be removed beforehand. Our baseline correction process primarily involves fitting a polynomial along the baseline of the signal and subtracting it. The polynomial order and the window length of the operation are two crucial factors that control how sharply the polynomial follows the signal [77]. If the polynomial order is high or the window length is small, even a high-frequency baseline can be removed, and vice versa. The right choice, however, depends entirely on the nature of the baseline: if the baseline varies slowly, a higher-order polynomial will distort the signal itself; conversely, if the baseline varies rapidly, a lower-order polynomial will not correct it properly. Moreover, in this dataset, the baseline remains mixed with motion artifacts in some segments. Our segmentation models work best when the "reference ground truth" and motion-corrupted EEG signals match closely during motion artifact-free segments (Fig. 6a) and differ during motion artifact-contaminated segments (Fig. 6b), and when the motion artifacts are unaffected by the baseline correction algorithm. If higher-order polynomials are applied to the whole signal, they also partially remove the motion artifacts, as depicted in Supplementary Fig. 2. In that case, the segmentation network would still perform well, but during evaluation, comparing the estimated EEG signal with the input signals would yield low performance, since baseline correction had already partly removed the motion artifacts. Moreover, the AI framework would not be evaluated fairly, since removing a portion of the motion artifacts before training makes the task less challenging for the network, which will not be the case in a real-world scenario.
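A minimal sketch of the DC-offset removal and polynomial baseline subtraction described above, using NumPy's `Polynomial.fit`, which rescales the time axis to [-1, 1] and so keeps higher-order fits numerically stable. The helper `remove_baseline` and the synthetic drift are hypothetical; the usage applies a low-order (3rd-order) fit to a 10 Hz oscillation riding on a DC offset plus a slow quadratic drift.

```python
import numpy as np
from numpy.polynomial import Polynomial


def remove_baseline(signal: np.ndarray, order: int):
    """Remove the DC offset, then subtract a least-squares polynomial baseline.

    Returns (corrected_signal, estimated_baseline).
    """
    x = signal - np.mean(signal)  # DC offset removal first
    t = np.arange(len(x))
    baseline = Polynomial.fit(t, x, deg=order)(t)
    return x - baseline, baseline


fs = 256
t = np.arange(1024) / fs                 # one 4 s segment
eeg_like = np.sin(2 * np.pi * 10 * t)    # 10 Hz oscillation (placeholder "EEG")
drift = 50 + 5 * (t / t[-1]) ** 2        # DC offset + slow quadratic drift
corrected, baseline = remove_baseline(eeg_like + drift, order=3)
```

Raising `order` lets the fitted baseline follow faster drifts, but it also starts absorbing genuine slow signal content, which is the trade-off discussed above.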
On the other hand, if lower-order polynomials are used, the baseline is not removed properly during non-corrupted segments. As shown in Supplementary Fig. 1, during a non-corrupted segment, the correlation between EEG channels 1 and 2 improved from around 89% to ~99% owing to proper baseline correction (ideally, it would be 1). For clean EEG segments, a very high correlation between channels 1 and 2 is necessary, since the model is trying to learn the mapping between the corrupted (input) and clean (output) EEG segments. For this reason, we developed an adaptive baseline drift correction scheme that can handle all of these scenarios and fulfill the data requirements of deep learning models. The main idea of the scheme is to remove baseline drift extensively from both EEG channels using higher-order polynomials during non-corrupted segments so that the channels match closely, while during motion-corrupted segments the baseline is removed using lower-order polynomials so that the motion artifacts are not removed or reduced by baseline correction, ensuring a proper evaluation of the proposed deep learning framework. After DC offset removal, the baseline of the ground truth EEG signals (channel 1) was approximated by a higher-order polynomial (e.g., order 20), while a lower-order polynomial (e.g., order 3) was used for the motion-corrupted EEG signals (channel 2). After the signals were chopped into much smaller segments (1024 samples per segment), the motion artifact-free segments from both channels were further hard baseline-corrected. Keeping the polynomial order at 20, this operation on much smaller segments removes any remaining high-frequency drifts from both EEG channels. Baseline correction for the accelerometer signals was performed using 10th-order polynomials.
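The adaptive scheme can be sketched as follows, under several assumptions: the per-segment clean/corrupted labels are supplied as a boolean mask (`clean_mask`, hypothetical; in practice it might be derived from the accelerometer and trigger signals), segments are non-overlapping, and the polynomial orders are the example values given in the text (20 and 3). `detrend_poly` and `adaptive_baseline` are illustrative names, and the signals are synthetic.

```python
import numpy as np
from numpy.polynomial import Polynomial

SEG_LEN = 1024


def detrend_poly(x: np.ndarray, order: int) -> np.ndarray:
    """Subtract a least-squares polynomial of the given order."""
    t = np.arange(len(x))
    return x - Polynomial.fit(t, x, deg=order)(t)


def adaptive_baseline(ch1, ch2, clean_mask, high=20, low=3):
    """ch1: reference EEG, ch2: motion-corrupted EEG; lengths are multiples of SEG_LEN.

    clean_mask holds one bool per segment (True = no motion artifact).
    """
    ch1 = detrend_poly(ch1 - ch1.mean(), high)  # aggressive fit on the reference
    ch2 = detrend_poly(ch2 - ch2.mean(), low)   # gentle fit keeps artifacts intact
    seg1 = ch1.reshape(-1, SEG_LEN).copy()
    seg2 = ch2.reshape(-1, SEG_LEN).copy()
    for i, is_clean in enumerate(clean_mask):
        if is_clean:  # hard per-segment correction only where no artifact is present
            seg1[i] = detrend_poly(seg1[i], high)
            seg2[i] = detrend_poly(seg2[i], high)
    return seg1, seg2


fs = 256
t = np.arange(4 * SEG_LEN) / fs
eeg = np.sin(2 * np.pi * 10 * t)
ch1 = eeg + 3 * np.sin(2 * np.pi * 0.05 * t) + 40           # reference with slow drift
ch2 = eeg + 0.5 * t + np.cos(2 * np.pi * 0.03 * t) - 25     # different drift
ch2[SEG_LEN:2 * SEG_LEN] += 20 * np.exp(-((t[SEG_LEN:2 * SEG_LEN] - 6) ** 2))  # artifact
clean_mask = [True, False, True, True]
seg1, seg2 = adaptive_baseline(ch1, ch2, clean_mask)
```

In the clean segments the two channels end up almost identical, while the artifact in segment 1 only passes through the gentle 3rd-order fit and is therefore left largely intact.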

Segmentation
Longer signal segments are likely to contain many features, some of which might be overlooked by the deep learning model during training. In addition, shorter segments reduce the computational resources required for training. Considering these two points, the EEG and accelerometer signals were chopped into segments of 1024 samples, following the approach of [77][78][79]. During segmentation, a 50% overlap was used, which approximately doubles the number of segments. This approach is similar to patching [80] for images. Signals were processed before and after segmentation, as discussed in the previous subsection. After prediction by the deep learning model, the overlap is removed by discarding every even-numbered segment, and the remaining segments are concatenated to form a signal of the same length as the original for evaluation.
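The 50%-overlap segmentation and the reassembly step can be sketched as below. The helpers are hypothetical; keeping the 1st, 3rd, 5th, ... segments (0-based indices 0, 2, 4, ...) removes the overlap exactly when the signal length is a multiple of 1024. Any remaining tail would need separate handling, which is an assumption, since the text does not say how it was treated.

```python
import numpy as np

SEG_LEN = 1024
HOP = SEG_LEN // 2  # 50% overlap


def segment(signal: np.ndarray, seg_len: int = SEG_LEN, hop: int = HOP) -> np.ndarray:
    """Split a 1D signal into overlapping windows of shape (n_segments, seg_len)."""
    starts = range(0, len(signal) - seg_len + 1, hop)
    return np.stack([signal[s:s + seg_len] for s in starts])


def reassemble(segments: np.ndarray) -> np.ndarray:
    """Drop every second (overlapping) segment and concatenate the rest."""
    return np.concatenate(segments[::2])


x = np.arange(4 * SEG_LEN, dtype=float)
segs = segment(x)     # 7 segments starting at 0, 512, ..., 3072
y = reassemble(segs)  # segments starting at 0, 1024, 2048, 3072
```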
The predicted signal (after joining) is later denormalized to calculate the change in signal-to-noise ratio (ΔSNR) before and after motion artifact correction. Previous studies also computed ΔSNR using denormalized signals.

Filtering
The EEG signals in this dataset were corrupted with 50 Hz powerline noise of varying amplitude across trials. A notch filter was used to clean 50 Hz noise from the signals during the preprocessing stage. A Quality Factor (Q-Factor) of 10 was found to be suitable for the whole dataset.
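A 50 Hz notch filter with Q = 10 can be sketched in NumPy using the second-order notch from the widely used RBJ "Audio EQ Cookbook". The text does not state the exact filter design, the sampling rate at which it was applied, or whether zero-phase filtering was used, so the 256 Hz rate and the causal filtering below are assumptions; SciPy's `scipy.signal.iirnotch` is a library alternative that defines the bandwidth slightly differently. All helper names are hypothetical.

```python
import numpy as np


def notch_coefficients(f0: float, fs: float, q: float):
    """Second-order IIR notch (RBJ cookbook form) centred at f0 Hz."""
    w0 = 2 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2 * q)
    b = np.array([1.0, -2 * np.cos(w0), 1.0])
    a = np.array([1 + alpha, -2 * np.cos(w0), 1 - alpha])
    return b / a[0], a / a[0]


def freq_response(b, a, f: float, fs: float) -> float:
    """Magnitude |H| at frequency f (Hz); polynomials are evaluated in z^-1."""
    z_inv = np.exp(-1j * 2 * np.pi * f / fs)
    return float(abs(np.polyval(b[::-1], z_inv) / np.polyval(a[::-1], z_inv)))


def biquad_filter(b, a, x: np.ndarray) -> np.ndarray:
    """Direct-form I: y[n] = b0 x[n] + b1 x[n-1] + b2 x[n-2] - a1 y[n-1] - a2 y[n-2]."""
    y = np.zeros_like(x, dtype=float)
    x1 = x2 = y1 = y2 = 0.0
    for n, xn in enumerate(x):
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        y[n] = yn
        x2, x1 = x1, xn
        y2, y1 = y1, yn
    return y


fs = 256
b, a = notch_coefficients(50.0, fs, q=10.0)
t = np.arange(10 * fs) / fs
hum_removed = biquad_filter(b, a, np.sin(2 * np.pi * 50 * t))  # 50 Hz is suppressed
alpha_kept = biquad_filter(b, a, np.sin(2 * np.pi * 10 * t))   # 10 Hz passes almost unchanged
```

A larger Q narrows the notch, so less neighbouring EEG content is attenuated, at the cost of a longer filter transient.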

Experimentation and performance metrics
In this section, the experimental setup and all related components of this study are discussed in detail. Also, the evaluation metrics used in this study are introduced in a separate subsection to quantitatively measure the performance of all the deep CNN models in removing motion artifacts from single-channel EEG recordings.

Experimental setup
The raw EEG dataset was preprocessed and prepared for a deep learning pipeline developed using TensorFlow 2.0 in Python, which was used to train 1D-CNN-based segmentation networks for motion artifact correction in EEG signals. In this context, a segmentation network is essentially a one-to-one (sequence-to-sequence) mapping algorithm. Therefore, the proposed MLMRS-Net and ten other 1D-CNN models were trained to map motion-corrupted EEG segments to their corresponding clean versions and were validated using the Jackknife validation method. The dataset was divided into 23 folds, each containing the processed segments from a single, independent trial. Thus, all experiments were repeated 23 times, and the reported results are averages over all 23 test sets. The following two experiments were performed to evaluate the proposed approach and model.

Experiment A
In Experiment A, the motion-corrupted EEG signal and its corresponding 3-axis accelerometer signals were fed into the 1D-CNN model as inputs (predictor signals), whereas the "reference ground truth" EEG signal was the output (target signal) to be estimated by the model. Thus, the models had four input channels and one output channel. Apart from the proposed MLMRS-Net, ten state-of-the-art segmentation networks, viz. Feature Pyramid Network (FPN) [81], LinkNet [82], UNet [64], Attention Guided UNet [71], DenseInceptionUNet [70], MultiResUNet [83], UNet+ [84], UNet++ [84], Attention Guided UNet++ [85], and UNet3+ [86], were implemented, trained, and tested in this experiment. These deep CNN models were originally proposed for 2D image segmentation, and we converted them into 1D segmentation networks for our purpose. All network hyperparameters, such as the number of levels (depth) and the number of filters or kernels in each level (width), were kept the same for all models to make the evaluation fair. All models had 5 levels, with 64 filters in the initial level, doubling at each deeper level. Each model was trained for up to 300 epochs with an early-stopping patience of 30 epochs on the Google Colab platform. Data prepared in MATLAB were imported into the Python environment and used directly for training and evaluation.
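The input/target layout of Experiment A and the filter widths can be made concrete with a short sketch. It assumes the segmented signals are stored as NumPy arrays of shape (segments, 1024) and uses a channels-last layout, as is common in TensorFlow; `build_io` is a hypothetical helper and the data are placeholders.

```python
import numpy as np

N_LEVELS = 5
BASE_FILTERS = 64
# 64 filters at the first level, doubled at each deeper level.
FILTERS_PER_LEVEL = [BASE_FILTERS * 2 ** i for i in range(N_LEVELS)]


def build_io(eeg_corrupted, acc_x, acc_y, acc_z, eeg_reference):
    """Stack four predictor channels and one target channel (channels-last)."""
    x = np.stack([eeg_corrupted, acc_x, acc_y, acc_z], axis=-1)  # (segments, 1024, 4)
    y = eeg_reference[..., np.newaxis]                           # (segments, 1024, 1)
    return x, y


rng = np.random.default_rng(0)
segments = [rng.standard_normal((10, 1024)) for _ in range(5)]  # placeholder data
x, y = build_io(*segments)
```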

Experiment B
In this experiment, the 3-axis accelerometer data were removed from the input, and the proposed MLMRS-Net model was evaluated to assess the contribution of the accelerometer signals (individually or combined) to motion artifact correction. This experiment analyzed the feasibility of using only EEG data for motion artifact correction, to determine whether extra hardware (e.g., accelerometers) can be omitted in practical implementations. All previous studies on cleaning motion artifacts from EEG signals used traditional signal processing techniques with only EEG data (motion-corrupted and reference ground truth). To the best of our knowledge, this is the first study to use accelerometer data in parallel to aid the estimation process and to evaluate the effect of the accelerometer signals individually and combined. In addition, an experiment was performed to estimate clean EEG signals from the 3-axis accelerometer signals alone to understand their standalone contribution.
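Experiment B amounts to training the same network on different subsets of the input channels. The sketch below selects channel subsets from the four-channel input of Experiment A; the specific list of configurations is a hypothetical example, since the text only states that the accelerometer axes were evaluated individually and combined, and that an accelerometer-only model was also tested.

```python
import numpy as np

CHANNEL_INDEX = {"eeg": 0, "acc_x": 1, "acc_y": 2, "acc_z": 3}

# Hypothetical input configurations (channel names refer to CHANNEL_INDEX).
CONFIGS = {
    "EEG + 3-axis ACC": ["eeg", "acc_x", "acc_y", "acc_z"],
    "EEG only": ["eeg"],
    "EEG + ACC x": ["eeg", "acc_x"],
    "EEG + ACC y": ["eeg", "acc_y"],
    "EEG + ACC z": ["eeg", "acc_z"],
    "3-axis ACC only": ["acc_x", "acc_y", "acc_z"],
}


def select_inputs(x: np.ndarray, names: list) -> np.ndarray:
    """Keep only the named channels of a (segments, samples, 4) input array."""
    return x[..., [CHANNEL_INDEX[n] for n in names]]


x = np.zeros((10, 1024, 4))
x[..., 0] = 1.0  # mark the EEG channel so it can be tracked
inputs = {name: select_inputs(x, chans) for name, chans in CONFIGS.items()}
```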

Jackknife validation
In this study, Jackknife validation [87], also known as leave-one-out cross-validation, was adopted to validate the proposed EEG motion artifact correction method. As mentioned earlier, the benchmark dataset used in this study has 23 sets of EEG recordings, each containing one "reference ground truth" EEG signal and one motion-corrupted EEG signal. In each iteration, 22 sets were used for training and the remaining set for testing, giving 23 folds. For each model, the performance metrics reported in Section 4 are averages over the 23 runs. This validation approach is robust, since the test set is always independent of the training set and contains data from only a single trial, while the training set is general, containing all other independent trials.
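The leave-one-trial-out procedure can be written as a simple fold generator. `jackknife_folds` and `average_over_folds` are hypothetical helpers, and trials are identified here by the indices 0-22.

```python
from typing import Iterator, List, Tuple

N_TRIALS = 23


def jackknife_folds(n_trials: int = N_TRIALS) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_trials, test_trials): each trial is held out exactly once."""
    for test_trial in range(n_trials):
        train = [t for t in range(n_trials) if t != test_trial]
        yield train, [test_trial]


def average_over_folds(fold_scores: List[float]) -> float:
    """Report a metric as the mean of its per-fold values."""
    return sum(fold_scores) / len(fold_scores)


folds = list(jackknife_folds())
```

Splitting by trial rather than by segment keeps overlapping segments of the same recording from appearing in both the training and test sets.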

Quantitative evaluation metrics
Since the objective of this study is to reduce artifacts in motion-corrupted EEG signals, the efficacy of a model in removing motion artifacts can be robustly assessed by calculating the difference in SNR (ΔSNR) between the motion-corrected and motion-corrupted EEG signals, quantifying the improvement in correlation between the motion-corrected and reference ground truth signals (expressed as the percentage reduction in motion artifacts, η), and computing the signal construction error ε. Evaluating the performance of a signal reconstruction network using only a single metric, or several similar ones, might not show the complete picture. Hence, in this study, ΔSNR, η, and ε computed using the mean absolute error (MAE) are used as quantitative performance metrics.

Change in signal-to-noise ratio (ΔSNR)
Motion artifacts appear as high-power noise components in both the temporal and spectral domains, so removing them from EEG signals should result in a large improvement in SNR. ΔSNR is calculated using Eq. (8), as provided in [20]:

$$\Delta SNR = 10\log_{10}\left(\frac{\sigma_x^2}{\sigma_{e_{\mathrm{after}}}^2}\right) - 10\log_{10}\left(\frac{\sigma_x^2}{\sigma_{e_{\mathrm{before}}}^2}\right) \qquad (8)$$

Here, $\sigma_x^2$, $\sigma_{e_{\mathrm{before}}}^2$, and $\sigma_{e_{\mathrm{after}}}^2$ represent the variance of the "reference ground truth" signal, the motion-corrupted signal, and the motion-corrected signal, respectively.
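Eq. (8) translates directly into code. The sketch below follows the variable definitions given above (variances of the reference, motion-corrupted, and motion-corrected signals); with these definitions, the reference-variance terms cancel, so ΔSNR reduces to 10 log10 of the ratio between the corrupted-signal and corrected-signal variances. `delta_snr` is a hypothetical helper and the signals are synthetic.

```python
import numpy as np


def delta_snr(reference: np.ndarray, corrupted: np.ndarray, corrected: np.ndarray) -> float:
    """Change in SNR (dB) per Eq. (8), using the signal variances."""
    var_x = np.var(reference)
    snr_before = 10 * np.log10(var_x / np.var(corrupted))
    snr_after = 10 * np.log10(var_x / np.var(corrected))
    return float(snr_after - snr_before)


rng = np.random.default_rng(0)
reference = rng.standard_normal(2560)
corrupted = reference + 10 * rng.standard_normal(2560)
corrected = corrupted / 10  # toy "correction": variance reduced by a factor of 100
gain_db = delta_snr(reference, corrupted, corrected)  # 20 dB
```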

Correlation coefficient (η)
The correlation between the estimated and ground truth EEG signals should be higher than the correlation between the ground truth and corrupted EEG channels. In this study, the Pearson correlation coefficient (PCC) is used to quantify the correlation between signals. The percentage reduction in motion artifacts, η, is calculated using Eq. (9), as provided in [20]. Here, $\rho_{\mathrm{before}}$ is the PCC between the "reference ground truth" and motion-corrupted signals, whereas $\rho_{\mathrm{after}}$ is the PCC between the "reference ground truth" and motion-corrupted signals over the epochs where motion artifacts are absent.
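Because Eq. (9) itself is not reproduced here, the sketch below assumes the three-correlation form commonly used in the EEG motion-artifact literature, η = 100 (1 - (ρ_clean - ρ_after) / (ρ_clean - ρ_before)), where ρ_clean is the PCC between the two channels over artifact-free epochs, ρ_before is the PCC between the reference and motion-corrupted signals, and ρ_after is the PCC between the reference and motion-corrected signals. This form is an assumption and should be checked against [20]; `pcc` and `eta` are hypothetical helpers.

```python
import numpy as np


def pcc(a: np.ndarray, b: np.ndarray) -> float:
    """Pearson correlation coefficient between two 1D signals."""
    return float(np.corrcoef(a, b)[0, 1])


def eta(rho_clean: float, rho_before: float, rho_after: float) -> float:
    """Percentage reduction in motion artifacts (ASSUMED form, see lead-in).

    100 % when correction restores the artifact-free correlation level,
    0 % when the correlation is unchanged by the correction.
    """
    return 100.0 * (1.0 - (rho_clean - rho_after) / (rho_clean - rho_before))


# Correlation levels quoted for this dataset: ~0.83 (motion-free), ~0.40 (otherwise).
print(eta(0.83, 0.40, 0.83))  # 100.0
print(eta(0.83, 0.40, 0.40))  # 0.0
```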

Construction error (ε)
MAE is one of the primary evaluation metrics to calculate the construction error of the reconstructed signals through 1D-segmentation networks [77][78][79]. Other similar metrics such as mean squared error (MSE), root mean squared error (RMSE), or median absolute error can also be used instead.
In this study, the mean and standard deviation (SD) of the construction error over all reconstructed segments are reported as the final metrics. For ground truth signals $Y = [Y_1, Y_2, Y_3, \ldots, Y_N]$ and predicted signals (or vectors) $\hat{Y} = [\hat{Y}_1, \hat{Y}_2, \hat{Y}_3, \ldots, \hat{Y}_N]$, the construction error ε, computed using MAE as the primary metric, can be defined as in Eq. (10):

$$\varepsilon = \frac{1}{N}\sum_{i=1}^{N}\varepsilon_i, \qquad \varepsilon_i = \frac{1}{M}\sum_{j=1}^{M}\left|Y_{i,j} - \hat{Y}_{i,j}\right| \qquad (10)$$

where N is the number of signal segments and M is the number of samples in each segment, which is 1024 in this study. The SD of the construction error, $\sigma_\varepsilon$, can be formulated as in Eq. (11):

$$\sigma_\varepsilon = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\varepsilon_i - \varepsilon\right)^2} \qquad (11)$$

In Eqs. (10) and (11), capital symbols signify that they represent the whole population (i.e., all segments in the dataset). Since the deep learning pipeline outputs estimated segments of the same length as the training segments, the predicted segments for a single trial were combined to reconstruct the final estimated signal, and the ΔSNR and η measurements were performed on these reconstructed signals only.
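The construction-error statistics amount to averaging per-segment MAE values and taking their population standard deviation. A minimal NumPy sketch, assuming the ground truth and predictions are stacked as (N, M) arrays; `construction_error` is a hypothetical helper.

```python
import numpy as np


def construction_error(y_true: np.ndarray, y_pred: np.ndarray):
    """Mean and population SD of per-segment MAE for (N, M) arrays."""
    per_segment = np.mean(np.abs(y_true - y_pred), axis=1)  # one MAE per segment
    # np.std defaults to ddof=0, i.e., the population standard deviation.
    return float(per_segment.mean()), float(per_segment.std())


y_true = np.zeros((2, 4))
y_pred = np.array([[1.0, 1.0, 1.0, 1.0], [3.0, 3.0, 3.0, 3.0]])
mean_err, sd_err = construction_error(y_true, y_pred)  # per-segment MAE = [1, 3]
```

Because every segment has the same length M, the mean of the per-segment MAE values equals the MAE over all samples pooled together.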
All evaluation metrics were calculated for each of the 23 folds and averaged to report the performance for each model.
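The fold structure and the final averaging can be sketched as below. This is a minimal sketch, assuming each of the 23 folds holds out exactly one recording for testing (leave-one-out) and trains on the other 22; model training and evaluation are deliberately omitted, and the function names and metric keys are hypothetical.

```python
import numpy as np

def leave_one_out_folds(n_recordings=23):
    """Yield (train_indices, test_index): each recording is held out exactly once."""
    for test_idx in range(n_recordings):
        train_idx = [i for i in range(n_recordings) if i != test_idx]
        yield train_idx, test_idx

def average_over_folds(fold_metrics):
    """Average per-fold metric dicts (e.g. keys 'delta_snr', 'eta', 'mae')."""
    keys = fold_metrics[0].keys()
    return {k: float(np.mean([m[k] for m in fold_metrics])) for k in keys}

folds = list(leave_one_out_folds())
```

In a full pipeline, a model would be trained on `train_idx` and evaluated on `test_index` in every fold, and the resulting per-fold metric dictionaries would then be passed to `average_over_folds`.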
Quantitative evaluation alone often fails to show the true picture of a study's outcomes, even when the results are evaluated from several different aspects. For this reason, we also qualitatively evaluated the motion artifact correction performance of the proposed MLMRS-Net in both the temporal and spectral domains.

Results
This section provides the quantitative and qualitative evaluation outcomes from the experiments conducted in this study along with illustrations.

Quantitative evaluation
This section presents the quantitative outcomes of the experiments performed in this study. Table 1 presents the results of Experiment A and shows that MLMRS-Net outperforms all the state-of-the-art segmentation models in terms of construction error and percentage reduction in motion artifacts. The construction error for normalized EEG segments is 0.056, the lowest among all the trained models. The SD of the construction error for MLMRS-Net is also low, which indicates that the construction error varies little across segments when the network estimates clean EEG signals. High variability in performance parameters would call the robustness of a deep learning model into question; the minimal variation shown by MLMRS-Net therefore indicates that it is robust and reliable when estimating the clean signal. MLMRS-Net achieves an average percentage reduction in motion artifacts (η) of 90.52%, the highest among the deep CNN models, and, as Table 1 shows, it is one of only two models to exceed 90%. Although MLMRS-Net also achieves a ΔSNR of 26.641 dB, the ΔSNR values do not differ substantially across models. Detailed four-channel results per trial for MLMRS-Net are provided in Supplementary Table 1. In all tables reporting results, the best-performing value for each metric is shown in bold.
Table 2 presents the results of Experiment B, in which the input signals to the MLMRS-Net model were varied to understand the effect of the accelerometer signals on motion artifact removal performance. First, the accelerometer signals were removed entirely, and only the motion-corrupted EEG was used. Next, each accelerometer axis was combined with the EEG in turn to assess its individual contribution. Finally, as a control, clean EEG signals were estimated from the motion-corrupted 3-axis accelerometer signals alone. Table 2 shows that when only EEG signals were used to train the MLMRS-Net model, it reached an average η of 89.32% on the test data. Adding any single accelerometer axis to the EEG signals slightly increased η, whereas using all three accelerometer axes together with the EEG produced the best average η of 90.52%. The ΔSNR improvement was similar in all cases. Using only the 3-channel accelerometer signals to estimate EEG yielded a small average η of 15.46% and a ΔSNR of 15.44 dB, which is expected because the signals estimated from accelerometer data alone were essentially noise. Overall, these experiments show that using the accelerometer signals as predictors alongside the EEG has a small positive effect on the average percentage reduction in motion artifacts, a relative gain of 1.34% (from 89.32% to 90.52%). Conversely, a signal reconstruction model trained on EEG signals alone can still achieve near-optimal motion artifact correction. Thus, accelerometers could be omitted from a hardware system design while still expecting high EEG motion artifact correction performance from MLMRS-Net or similar models.
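The input configurations compared in Experiment B can be sketched as channel stacks. This is a minimal sketch, assuming one EEG channel plus three accelerometer axes stored under hypothetical names and a channels-first `(n_channels, n_samples)` layout; the actual data loader and tensor layout used in the study are not described here. The last lines reproduce the relative gain quoted above from the Table 2 averages.

```python
import numpy as np

# Hypothetical channel names: one EEG channel and a 3-axis accelerometer.
INPUT_CONFIGS = {
    "EEG": ("eeg",),
    "EEG+AccX": ("eeg", "acc_x"),
    "EEG+AccY": ("eeg", "acc_y"),
    "EEG+AccZ": ("eeg", "acc_z"),
    "EEG+AccXYZ": ("eeg", "acc_x", "acc_y", "acc_z"),
    "AccXYZ": ("acc_x", "acc_y", "acc_z"),  # accelerometer-only control
}

def build_input(channels, config):
    """Stack the channels used by `config` into a (n_channels, n_samples) array."""
    return np.stack([channels[name] for name in INPUT_CONFIGS[config]], axis=0)

# Relative gain in eta from adding all three accelerometer axes (Table 2 averages).
eta_eeg_only, eta_eeg_acc = 89.32, 90.52
relative_gain_pct = 100 * (eta_eeg_acc - eta_eeg_only) / eta_eeg_only  # about 1.34
```

The absolute difference is 1.20 percentage points; dividing by the EEG-only baseline gives the 1.34% relative gain.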

Qualitative evaluation
As mentioned earlier, qualitative evaluation is crucial for such studies, since numbers alone cannot always provide a clear and convincing picture of the feasibility of a newly proposed approach. The high ΔSNR values reported by the studies in Table 3 suggest that those methods were good at reducing noise but may also have suppressed the underlying biological EEG signal in the process; as a result, their correlation improvement did not exceed 70% despite the high ΔSNR. In contrast, the deep learning technique proposed in this work removes the motion artifact while keeping the biological signal intact, which is why its ΔSNR value is slightly lower than in some of the earlier studies. This is illustrated in Fig. 7 for various trials (folds) across the dataset, for both clean and corrupted segments. Figure 7a-d shows sample ground truth (EEG channel 1), moderately to highly motion-corrupted (EEG channel 2), and MLMRS-Net-estimated EEG segments. Figure 7e, f shows segments in which EEG channel 2 contains no motion artifacts. During segments with no motion, all three signals are highly correlated, and MLMRS-Net keeps its output as close as possible to the input EEG segment from channel 2. Conversely, for highly motion-corrupted segments, MLMRS-Net substantially improves the correlation, which demonstrates the robustness of the approach. Up to this point, the performance has been visualized in the temporal domain. The results are further supported by the Power Spectral Density (PSD) plots [87] and topographic maps [88] of the EEG signals shown in Figs. 8 and 9, respectively, which represent the performance of the model in the spectral domain. For the spectral evaluation, segments from all 23 folds were concatenated, and their spectra were analyzed and presented in a single plot. As Fig. 8 shows, the PSD of the EEG signals estimated by the proposed deep learning framework closely matches that of the ground truth EEG signals across the spectrum. Although motion artifacts introduce high-power components throughout the EEG spectrum, the distortion is worst in the Delta (δ) band, approximately 0.5-4 Hz. The proposed framework greatly reduces the effect of motion artifacts in this range, as shown by the PSD plot in Fig. 8 and the Delta-band topographic map in Fig. 9b.
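One way to put a number on the Delta-band agreement seen in Fig. 8 is sketched below. This is an illustrative check, not a metric reported in the study: it assumes SciPy's Welch estimator with 2-second segments, and the function name `band_psd_gap_db`, the sampling rate, and the synthetic signals are hypothetical.

```python
import numpy as np
from scipy.signal import welch

def band_psd_gap_db(reference, test, fs, band=(0.5, 4.0), eps=1e-20):
    """Mean absolute dB difference between the Welch PSDs of two signals
    inside `band` (default: the Delta band, 0.5-4 Hz)."""
    nperseg = int(2 * fs)  # 2-second Welch segments (assumed choice)
    f, p_ref = welch(reference, fs=fs, nperseg=nperseg)
    _, p_test = welch(test, fs=fs, nperseg=nperseg)
    mask = (f >= band[0]) & (f < band[1])
    diff_db = 10 * np.log10((p_test[mask] + eps) / (p_ref[mask] + eps))
    return float(np.mean(np.abs(diff_db)))

# Synthetic usage: a strong 1 Hz component stands in for a motion artifact.
fs = 256.0
rng = np.random.default_rng(1)
ground_truth = rng.standard_normal(int(30 * fs))
t = np.arange(ground_truth.size) / fs
corrupted = ground_truth + 20.0 * np.sin(2 * np.pi * 1.0 * t)
gap_corrupted = band_psd_gap_db(ground_truth, corrupted, fs)
gap_identical = band_psd_gap_db(ground_truth, ground_truth, fs)
```

A well-corrected estimate should give a gap close to zero, while the corrupted signal gives a large gap concentrated in the low-frequency bins.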
EEG topography is a neuroimaging technique that visualizes neural activity across the scalp by computing the band power of EEG signals recorded at several electrodes and interpolating smoothly between them. Here, we have a single-channel EEG recorded from the prefrontal cortex region of the brain, as explained in detail in the dataset section; that is, a single electrode at the 'Fpz' location of the international 10-20 system for scalp electrode placement in EEG data acquisition [89]. To compute the topographic maps, we consider a total EEG bandwidth of 0.5 to 80 Hz, divided into five frequency bands: Delta (δ) = 0.5-4 Hz, Theta (θ) = 4-8 Hz, Alpha (α) = 8-13 Hz, Beta (β) = 13-40 Hz, and Gamma (γ) = 40-80 Hz [90,91]. We combine the EEG signals (ground truth, motion-corrupted, and estimated) from all 23 folds, calculate the band power, and plot the topographic maps on the same scale for all cases [92]. The topographic plots in Fig. 9 show that, regardless of the frequency range, motion artifacts distort the topographic maps by introducing high-power components into the EEG signal, with Delta being the worst-affected band. In every band, the estimated EEG has band power similar to that of the ground truth. Individual topographic plots for all 23 folds over the whole EEG band (0.5-80 Hz) are provided in Supplementary Fig. 3 to show the performance of the proposed framework in specific cases. In some folds the ground truth EEG has high band power (Folds 1-11), in others low (Folds 12-15) or medium band power (Folds 16-23). In all cases, the model removed the motion artifacts and recovered the EEG signals robustly.
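The per-band power values that feed such topographic maps can be sketched as follows, using the band edges listed above. This is a minimal sketch, not the study's code: it assumes a Welch PSD with 2-second segments integrated by a simple rectangle rule, and the function name `band_powers` and the 256 Hz sampling rate in the usage are hypothetical. The sampling rate must exceed 160 Hz for the 40-80 Hz Gamma band to be covered.

```python
import numpy as np
from scipy.signal import welch

# Frequency bands (Hz) as listed in the text.
EEG_BANDS = {
    "delta": (0.5, 4.0),
    "theta": (4.0, 8.0),
    "alpha": (8.0, 13.0),
    "beta": (13.0, 40.0),
    "gamma": (40.0, 80.0),
}

def band_powers(signal, fs, bands=EEG_BANDS):
    """Absolute band power per EEG band from a Welch PSD (rectangle-rule sum)."""
    f, psd = welch(signal, fs=fs, nperseg=int(2 * fs))
    df = f[1] - f[0]
    return {name: float(psd[(f >= lo) & (f < hi)].sum() * df)
            for name, (lo, hi) in bands.items()}

# Synthetic usage: a unit-amplitude 10 Hz sine has power 0.5, all in the Alpha band.
fs = 256.0
t = np.arange(int(8 * fs)) / fs
powers = band_powers(np.sin(2 * np.pi * 10 * t), fs)
```

Computing these values for the ground truth, motion-corrupted, and estimated signals on a common scale gives the quantities that the maps in Fig. 9 visualize.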
In summary, the improvements in signal correlation and SNR are evident from both the temporal and spectral perspectives, as presented in Figs. 7, 8, and 9.

Comparison with the existing works
Several studies have addressed the removal of motion artifacts caused by external perturbations from EEG signals, and all of them relied on signal processing algorithms. Among the earliest, Sweeney et al. [21] used DWT-, EMD-, and EEMD-based signal processing techniques combined with ICA and CCA for motion artifact removal. Maddirala et al. [27] and Noorbasha et al. [31] proposed SSA and its variants for the same purpose and reported improved results. Gajbhiye et al. [29,30], in two studies, combined DWT separately with multi-resolution TV, multi-resolution WTV, and Savitzky-Golay filtering, and achieved considerably better outcomes. More recently, Hossain et al. [34,37], in two papers, developed motion artifact correction pipelines using several single-stage (VMD, WPD) and two-stage (VMD-PCA, VMD-CCA, WPD-CCA) signal processing methods and reported good ΔSNR performance. Table 3 summarizes these works together with the results of this study for comparison. Among them, the best average ΔSNR, 30.76 dB, was reported for the WPD-CCA technique with the db1 wavelet packet, and the highest average η, 68.76%, was obtained using DWT with Savitzky-Golay filtering. Our proposed MLMRS-Net outperforms all the previous works, with an average η of 90.52%. Moreover, to the best of our knowledge, this is the first paper in this domain to use machine learning to remove motion artifacts from EEG signals. Unlike traditional signal processing techniques, which have their own drawbacks, the proposed approach lets the deep learning model learn relevant features from the EEG signals during training for better motion artifact correction. For the past studies in Table 3 that report similar correlation improvements, ΔSNR varies by a large margin; that is, the two metrics do not change in step.
As mentioned before, the ΔSNR parameter depends more on the data preprocessing steps than on the artifact correction technique, which is evident here and in past studies.

Conclusion
In this extensive study, we have proposed a novel deep learning-based 1D-segmentation network (MLMRS-Net) to remove motion artifacts from single-channel, motion-corrupted EEG signals, a new approach in this domain. Because EEG signals have very low amplitude, motion artifacts can severely affect them and sometimes distort the signal morphology itself. It is therefore crucial to develop robust methods for reducing the effect of motion artifacts on EEG signals. The performance metrics obtained from all the networks tested in this study clearly indicate the efficacy of deep learning models, rather than traditional signal processing techniques, for removing motion artifacts from EEG signals. Our proposed MLMRS-Net produced the best performance in reducing the effect of motion artifacts compared with previously reported studies in the literature, reaching an average percentage reduction in motion artifacts (η) of 90.52% and an average ΔSNR of 26.64 dB while reliably retaining the underlying biological signals. MLMRS-Net also achieved a very low construction error when reconstructing motion-corrected EEG signals. This study suggests that, once trained on a sufficiently large dataset, such deep learning models could reliably remove artifacts from corrupted EEG signals, potentially in real time. Moreover, 1D-CNN-based signal reconstruction networks could be applied to motion artifact correction in similar physiological signals, such as the electromyogram (EMG), electrocardiogram (ECG), photoplethysmogram (PPG), and phonocardiogram (PCG), following a similar experimental setup. In this way, less effort may need to go into developing ever more efficient signal processing techniques, since artificial intelligence can reliably learn the patterns of the signals itself.
Acknowledgements
The dataset used in this study was kindly shared in the PhysioNet database by Sweeney et al. [75,76].

Data availability statement
The data that support the findings of this study are available from the corresponding author upon reasonable request.

Declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Open Access
This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.