In this section, the approaches used to detect right-hand imagery movement patterns are presented as follows: (1) selecting a frequency band for filtering; (2) extracting the ERDs for individual subjects; (3) implementing the DFA algorithm; (4) combining the DWPT with the DFA; (5) automatically updating the DWPT-DFA mother wavelet for individual subjects; and (6) classifying the extracted features using the SSVM classifier with the GRBF kernel.
Filtering
For preprocessing, the EEG data consists of segments, each a time window starting 200 msec before visual stimulation (displaying pictures) and ending 2500 msec after it. The data is then filtered with a second-order Butterworth filter in six frequency bands: 8-12 Hz, 12-16 Hz, 16-20 Hz, 20-24 Hz, 24-28 Hz, and 28-32 Hz [20, 23, 24]. Based on classification accuracies, 8-12 Hz and 12-16 Hz are identified as the best bands [24]. Finally, the 8-15 Hz band is selected by merging these two bands through trial and error. After selecting the best frequency band, a customized mother wavelet using an ERD for the DWPT-DFA is obtained as follows:
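The band-pass step above can be sketched as follows. The original work uses Matlab; this is a minimal Python equivalent with SciPy, assuming a 500 Hz sampling rate (stated in the data-acquisition section) and zero-phase filtering, which the paper does not specify:

```python
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass_filter(x, low=8.0, high=15.0, fs=500.0, order=2):
    """Second-order Butterworth band-pass (8-15 Hz), applied
    forward-backward (zero phase) to one EEG channel."""
    b, a = butter(order, [low / (fs / 2), high / (fs / 2)], btype="band")
    return filtfilt(b, a, x)

# Example: a 10 Hz component passes, a 40 Hz component is attenuated.
fs = 500.0
t = np.arange(0, 3, 1 / fs)
x = np.sin(2 * np.pi * 10 * t) + np.sin(2 * np.pi * 40 * t)
y = bandpass_filter(x, fs=fs)
```

The same function applies to any of the six candidate bands by changing `low` and `high`.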
Event related desynchronization (ERD)
To obtain the ERD waves, a repetitive task with 280 trials is designed, as depicted in Fig. 3. Subjects imagine the right-hand movement after the hand pictures disappear from the screen. During the procedure, a marker is sent to record the moment of picture display and of the right-hand movement imagination. In Figs. 5 and 6, the y-axis is placed at the marker location, which serves as the zero point of our calculations in each EEG segment. The imaginary-movement trials are extracted and filtered based on the marker locations. Each window is 2700 msec wide, starting 200 msec before the stimulation and ending 2500 msec after it. Finally, the filtered segments are averaged to obtain the waveforms shown in Figs. 5 and 6. The ERD from the FC1 channel is then integrated with the DWPT as follows:
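The epoching-and-averaging step can be sketched as below, a Python illustration under the section's stated parameters (500 Hz sampling, a [-200, 2500] msec window around each marker); the function and variable names are ours, not from the paper:

```python
import numpy as np

def average_erd(signal, markers, fs=500, pre_ms=200, post_ms=2500):
    """Cut a [-200, 2500] msec window around each marker and average
    the filtered segments to obtain the ERD waveform (cf. Figs. 5, 6)."""
    pre = int(pre_ms * fs / 1000)
    post = int(post_ms * fs / 1000)
    segs = [signal[m - pre:m + post] for m in markers
            if m - pre >= 0 and m + post <= len(signal)]
    return np.mean(segs, axis=0)

fs = 500
sig = np.random.randn(10 * fs)          # stand-in for one filtered EEG channel
markers = [1000, 2000, 3000]            # stimulation onsets, in samples
erd = average_erd(sig, markers, fs=fs)  # 2700 msec = 1350-sample average
```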
Wavelet
A well-known method to extract properties of a time series (TS) is the Fourier transform (FT). However, the FT cannot specify the correspondence between the time and frequency domains. The wavelet transform overcomes this limitation by decomposing a TS into a frequency domain localized in time, i.e. a time-frequency domain. Another property of the wavelet transform is detecting self-similarity in the frequency domain, which depends on the mother wavelet function [16]. One constraint of wavelets is that predefined mother wavelets are not particularly effective in EEG studies. Our aim is to implement an automatically updatable mother wavelet method that captures each subject's ERD patterns; hence, the ERD patterns of individual subjects are extracted and substituted for the predefined mother wavelet. For instance, the ERD patterns of one subject are depicted in Figs. 5 and 6. Wavelets fall into two main categories, discrete and continuous. The continuous wavelet is used when high frequency ranges (or scales) are required for the computations, whereas the discrete wavelet is useful when the low frequency bands are important. The continuous wavelet transform is defined as follows [41]:
$$ \begin{array}{@{}rcl@{}} W{\!}_{\varphi }f(a,b)=\frac{1}{\sqrt{a}}\int\limits_{-\infty }^{+\infty }{f(t)\varphi \left( \frac{t-b}{a}\right)dt,} \end{array} $$
(1)
where φ, a, and b denote the mother wavelet, the scaling parameter, and the shifting parameter, respectively [41]. The DWPT is defined as a generalization of the DWT: it is a linear combination of the DWT mother wavelet properties and provides access to low and high frequency bands simultaneously, as shown in Fig. 1. To build the decomposition tree, the input signal is repeatedly split into low- and high-frequency components until the intended frequency bands are reached (the values in Fig. 1 are rounded, but exact values are used in the calculations) [42].
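The full packet tree can be sketched in a few lines of Python. This minimal illustration uses the Haar filter pair as a stand-in for the db4/db8/coif4 or ERD-derived mother wavelets used in the paper; at level l the tree yields 2^l nodes, each covering roughly 1/2^l of the input bandwidth:

```python
import numpy as np

# Haar analysis filters (stand-in for the paper's mother wavelets).
LO = np.array([1.0, 1.0]) / np.sqrt(2)   # low-pass (approximation)
HI = np.array([1.0, -1.0]) / np.sqrt(2)  # high-pass (detail)

def dwpt(x, level):
    """Full wavelet-packet tree: unlike the DWT, every node (not only the
    approximation) is split at each level, so level l yields 2**l bands."""
    nodes = [np.asarray(x, float)]
    for _ in range(level):
        nxt = []
        for n in nodes:
            nxt.append(np.convolve(n, LO)[1::2])  # low band, downsampled by 2
            nxt.append(np.convolve(n, HI)[1::2])  # high band, downsampled by 2
        nodes = nxt
    return nodes

bands = dwpt(np.random.randn(64), 3)  # 8 frequency bands at level 3
```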
To implement the DWPT-DFA algorithm, the DFA approach is implemented as follows:
Detrended fluctuation analysis (DFA)
The DFA is an effective approach for evaluating self-similarity by estimating long-term correlations and scaling behavior in a TS [39]. Based on the DFA, the self-similarity profile x(j) is calculated by integrating the TS (y(i), i = 1,...,N) as follows [43]:
$$ \begin{array}{@{}rcl@{}} x(j)=\sum\limits_{i=1}^{j}{[y(i)-\left\langle y \right\rangle ],} \qquad \left\langle y \right\rangle =\frac{1}{N}\sum\limits_{i=1}^{N}{y(i)}, \end{array} $$
(2)
where x(j) is the integrated profile, which is divided into Nn equal segments of length n, \({{N}_{n}}=\text {int}(\frac {N}{n})\). Then, a least-squares error (LSE) fit produces the local trend xn(j) within each segment. Next, a mean square error (MSE) based fluctuation function S(n) is computed from the detrended signal as follows:
$$ \begin{array}{@{}rcl@{}} & {{x}_{s}}(j)=x(j)-{{x}_{n}}(j),\\ & S(n)=\sqrt{\frac{1}{N}\sum\limits_{j=1}^{N}{{{[x(j)-{{x}_{n}}(j)]}^{2}}}}, \end{array} $$
(3)
where xs(j) is the detrended TS. S(n) is then plotted against n on a logarithmic scale. By the power law, S(n) behaves as S(n) ∝ nα, where α is obtained by fitting a line to the log-log diagram; the slope of the fitted line is the self-similarity value. The value of α is categorized as follows:
long-term anti-correlation for 0 < α < 0.5,
long-term correlation for α > 0.5,
white noise for α = 0.5,
\(\frac {1}{f}\) noise for α = 1,
Brownian noise for α = 1.5.
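A compact Python sketch of the DFA procedure described above (the function name and scale choices are ours; the paper's implementation is in Matlab):

```python
import numpy as np

def dfa_alpha(y, scales=(4, 8, 16, 32, 64)):
    """Estimate the DFA scaling exponent alpha: integrate the series (Eq. 2),
    detrend each length-n segment with a least-squares line, compute S(n)
    (Eq. 3), and fit the slope of log S(n) versus log n."""
    y = np.asarray(y, float)
    x = np.cumsum(y - y.mean())            # integrated profile x(j)
    S = []
    for n in scales:
        Nn = len(x) // n
        seg = x[:Nn * n].reshape(Nn, n)
        t = np.arange(n)
        # subtract the LSE linear trend x_n(j) within each segment
        detr = seg - np.array([np.polyval(np.polyfit(t, s, 1), t) for s in seg])
        S.append(np.sqrt(np.mean(detr ** 2)))
    alpha, _ = np.polyfit(np.log(scales), np.log(S), 1)
    return alpha
```

As a sanity check against the categorized list, white noise should give α near 0.5 and its cumulative sum (Brownian noise) a value near 1.5.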
In the final part of the feature extraction, the DFA is combined with the DWPT and the ERD mother wavelet.
DWPT-DFA
To combine the DWPT-DFA with the ERD, y(i) is decomposed into components at level l. At each level, the components are juxtaposed to form a new TS, called y′(i). The complete procedure is as follows:
- I. Forming a new TS (y′(i)) by computing and juxtaposing the wavelet packet components at each level,
- II. Constructing x(j) by integrating y′(i) using (2),
- III. Dividing the obtained x(j) into Nn equal segments,
- IV. Fitting a line to the points of x(j) in each segment using the LSE method,
- V. Computing S(n) using the MSE method and forming a logarithmic diagram,
- VI. Computing the slope (w) of the line fitted to the logarithmic diagram drawn from S(n).
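Steps I-VI can be condensed into one Python sketch. For illustration we use the Haar filter pair as a stand-in for the ERD-derived mother wavelet, and our own scale choices; the slope w returned at the end is the DWPT-DFA feature:

```python
import numpy as np

# Haar analysis filters (stand-in for the ERD mother wavelet).
LO = np.array([1.0, 1.0]) / np.sqrt(2)
HI = np.array([1.0, -1.0]) / np.sqrt(2)

def dwpt_dfa_feature(y, level=3, scales=(4, 8, 16, 32)):
    """DWPT-DFA: decompose, juxtapose the level-l components into y'(i),
    then run DFA on y'(i) and return the log-log slope w (step VI)."""
    nodes = [np.asarray(y, float)]
    for _ in range(level):                                  # step I: packet tree
        nodes = [c for n in nodes
                 for c in (np.convolve(n, LO)[1::2], np.convolve(n, HI)[1::2])]
    yp = np.concatenate(nodes)                              # juxtaposed y'(i)
    x = np.cumsum(yp - yp.mean())                           # step II: integrate
    S = []
    for n in scales:                                        # step III: segments
        Nn = len(x) // n
        seg = x[:Nn * n].reshape(Nn, n)
        t = np.arange(n)
        # step IV: LSE line fit per segment; step V: MSE fluctuation S(n)
        detr = seg - np.array([np.polyval(np.polyfit(t, s, 1), t) for s in seg])
        S.append(np.sqrt(np.mean(detr ** 2)))
    w, _ = np.polyfit(np.log(scales), np.log(S), 1)         # step VI: slope w
    return w

w = dwpt_dfa_feature(np.random.RandomState(2).randn(1024))
```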
After extracting the DWPT-DFA features with different mother wavelets, the features are classified using the SSVM-GRBF classifier. At this stage, db4, db8, and coif4 are the predefined mother wavelets, which are then replaced with the ERD mother wavelet to extract the features classified by the SSVM-GRBF classifier. The SSVM-GRBF algorithm is described in detail as follows:
Classifiers
The last stage of the DWPT-DFA pipeline is to classify the imaginary-movement and non-imaginary-movement features. To classify the extracted features, the SSVM classifier with the GRBF kernel is used; it is described in two parts, the GRBF kernel and the SSVM classifier:
Generalized radial basis function (GRBF)
The generalized RBF is a successful kernel that we used in our previous studies [21, 22, 30]. The flexibility of the GRBF comes from the width (w > 0), shape (τ > 0), and center (c) parameters of the generalized Gaussian distribution, which lead to highly accurate and robust results. The kernel is defined as follows:
$$ \begin{array}{@{}rcl@{}} G(s;c,w,\tau )=\frac{\tau }{2w\gamma ({}^{1}/{}_{\tau })}\exp \left( -\frac{{\left\| s-c \right\|}^{\tau}}{{{w}^{\tau }}}\right), \end{array} $$
(4)
where w is the width and γ is the factorial extension (gamma) function, computed as follows:
$$ \begin{array}{@{}rcl@{}} w&=&\sigma \sqrt{\frac{\gamma ({}^{1}/{}_{\tau })}{\gamma ({}^{3}/{}_{\tau })}},\\ \gamma (k)&=&\int\limits_{0}^{\infty }{{{t}^{k-1}}{{e}^{-t}}dt} ,\qquad for \quad k>0, \end{array} $$
(5)
where σ, t, and k denote the standard deviation, a positive integration variable, and the function input, respectively. The factorial extension function was developed in [42] as part of an approach for generalizing the Gaussian function in the extreme learning machine method. The key parameter for generalizing the RBF kernel is τ, which is tuned by optimization over a wide range.
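Equations (4)-(5) translate directly into Python; here the gamma function from SciPy plays the role of the factorial extension γ, and τ = 2 recovers the ordinary Gaussian. The function name and the sample values of σ and τ are ours:

```python
import numpy as np
from scipy.special import gamma

def grbf(s, c, sigma=1.0, tau=2.0):
    """Generalized RBF of Eq. (4): the width w follows from sigma and tau
    via Eq. (5); tau is the shape parameter (tau = 2 gives the Gaussian)."""
    w = sigma * np.sqrt(gamma(1.0 / tau) / gamma(3.0 / tau))   # Eq. (5)
    norm = tau / (2.0 * w * gamma(1.0 / tau))
    return norm * np.exp(-(np.linalg.norm(s - c) / w) ** tau)  # Eq. (4)

# The kernel peaks at the center c and decays with distance.
v0 = grbf(np.array([0.0]), np.array([0.0]), sigma=1.0, tau=1.5)
v1 = grbf(np.array([1.0]), np.array([0.0]), sigma=1.0, tau=1.5)
```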
Soft margin support vector machine (SSVM)
The SSVM is a powerful binary supervised classifier that can be integrated with different kernels. The SSVM is based on the traditional SVM, with a modification in solving the dual problem using Lagrange's theorem when the number of feature weights W grows [8]. In the procedure, two classes y = [−1, +1] are defined for the training features (xi ∈ Rn, i = 1,...,l). The features are then mapped into a higher dimension with the GRBF, which is defined in Section 2.6. Afterwards, a linear hyperplane WTxi + b = 0 is used to classify the data according to two decision-boundary criteria:
$$ \begin{array}{@{}rcl@{}} & {{W}^{T}}{{x}_{i}}+b\ge 1\quad \qquad if\quad {{y}_{i}}=1, \\ & {{W}^{T}}{{x}_{i}}+b\le -1\qquad if\quad {{y}_{i}}=-1, \end{array} $$
(6)
where W, b, and T denote the feature weights, the bias, and the transpose operator, respectively [37]. The hyperplane is then fixed by choosing the features that maximize the margin between the classes. The selected features are called support vectors, and the margin center fixes the decision boundary, which is computed as follows:
$$ f(x)= sgn ({{W}^{T}}x+b). $$
(7)
In high-dimensional and complicated situations, a nonlinear map ϕ(xi) is defined for (7), giving WTϕ(x) + b = 0. The resulting dual problem is solved through Lagrange's theorem [5] in (10), which limits the number of features used. The final decision boundary is computed by (8):
$$ sgn ({{W}^{T}}\phi (x)+b)=sgn \left( \sum\limits_{i=1}^{l}{{{y}_{i}}{{\theta }_{i}}K({{x}_{i}},x)}+b\right), $$
(8)
where K is the kernel function, K(xi,xj) ≡ ϕ(xi)Tϕ(xj), and W is computed by \(W=\sum \limits _{i=1}^{l}{{{y}_{i}}{{\theta }_{i}}\phi ({{x}_{i}})}\). The optimized values for (8) are obtained from (9), where 𝜃i is the Lagrange coefficient for the ith trial in the high-dimensional feature space (10).
$$ \mathop{\min_{W,b,\psi } } \left( \frac{1}{2}{{W}^{T}}W+Reg\sum\limits_{i=1}^{l}{{{\psi }_{i}}}\right)\quad \text{ Subject to }\quad \begin{array}{l} {{y}_{i}}({{W}^{T}}\phi ({{x}_{i}})+b)\ge 1-{{\psi }_{i}} \\ {{\psi }_{i}}\ge 0 \\ i=1,...,l, \end{array} $$
(9)
where ψ is the loss function, ψ(W;x,y) ≡ (1 − yWTx)2, and Reg is the regularization parameter [8].
The Lagrange coefficients are computed as follows:
$$ \begin{array}{@{}rcl@{}} \mathop{\min_{\theta}} \left( \frac{1}{2}\sum\limits_{i,j=1}^{l}{{{\theta }_{i}}{{y}_{i}}{{y}_{j}}K({{x}_{i}},{{x}_{j}}){{\theta }_{j}}}-{{e}^{T}}\theta\right) \text{ Subject to } \begin{array}{l} {{y}^{T}}\theta =0, \\ 0\le {{\theta }_{i}}\le Reg ,\\ i=1,...,l, \end{array} \end{array} $$
(10)
where e is an l × 1 vector of ones (Fig. 2).
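The soft-margin formulation of (9)-(10) with the GRBF kernel can be sketched with scikit-learn's custom-kernel interface; Reg corresponds to the C parameter, and the toy two-class data below merely stands in for the DWPT-DFA features (σ, τ, and the cluster layout are our choices, not the paper's):

```python
import numpy as np
from scipy.special import gamma
from sklearn.svm import SVC

def make_grbf_kernel(sigma=1.0, tau=2.0):
    """GRBF Gram-matrix kernel; the normalization constant of Eq. (4)
    cancels in the SVM decision function, so only the exponential matters."""
    w = sigma * np.sqrt(gamma(1.0 / tau) / gamma(3.0 / tau))  # Eq. (5)
    def kernel(X, Y):
        d = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=2)
        return np.exp(-(d / w) ** tau)
    return kernel

# Toy features for the two classes y = [-1, +1].
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(40, 2) - 2, rng.randn(40, 2) + 2])
y = np.array([-1] * 40 + [1] * 40)
clf = SVC(kernel=make_grbf_kernel(sigma=1.0, tau=1.5), C=1.0).fit(X, y)
```

In practice τ would be tuned over a wide range, as described for the GRBF above, rather than fixed at 1.5.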
Data acquisition and experiment setup
In this experiment, EEG is recorded from nine international students at Lappeenranta University of Technology (LUT). The subjects do not drink alcohol and have no smoking or drug habits, and they did not consume caffeinated drinks such as coffee or tea in the four hours before the experiment. The average age of the subjects is 27.3 years. To record the EEG data, a task is designed following the validated BCI competition III data set IVa [37], which is presented in Fig. 3 and available at http://www.bbci.de/competition/iii/. The task consists of four stages: I) displaying a cross sign (cross fixation) at the center of a black screen for 500 msec; II) presenting pictures of right-hand movement for 500 msec; III) imagining the right-hand movement for 2500 msec; and IV) resting for a random interval of 3500 msec to 4000 msec after stimulation. We used the ENOBIO 32 portable EEG recorder with 32 gel electrodes, connected to Matlab through Wi-Fi. The electrode locations are depicted in Fig. 4. The sampling rate was set to 500 Hz. The computations, simulations, and task are implemented in Matlab 2017.