Effect of time windows in LSTM networks for EEG-based BCIs

People with impaired motor function could be helped by an effective brain–computer interface (BCI) based on a real-time electroencephalogram (EEG) and artificial intelligence algorithms. However, current methodologies for interpreting patient instructions from an EEG are not accurate enough to be completely safe in a real-world situation, where a poor decision would place their physical integrity at risk, such as when traveling in an electric wheelchair in a city. For various reasons, such as the low signal-to-noise ratio of portable EEGs or the effects of signal contamination (disturbances due to user movement, temporal variation of the features of EEG signals, etc.), a long short-term memory (LSTM) network (a type of recurrent neural network) that is able to learn data flow patterns from EEG signals could improve the classification of the actions taken by the user. In this paper, the effectiveness of using an LSTM with a low-cost wireless EEG device in real time is tested, and the time window that maximizes its classification accuracy is studied. The goal is to implement it in the BCI of a smart wheelchair with a simple coded command protocol, such as opening or closing the eyes, which could be executed by patients with reduced mobility. Results show higher performance from the LSTM, with an accuracy range between 76.61 and 92.14%, compared to traditional classifiers (59.71% on average), and an optimal time window of around 7 s for the task performed by the users in this work. In addition, tests in real-life contexts show that a trade-off between accuracy and response times is necessary to ensure detection.


Introduction
New technological developments increasingly require nonphysical interaction with the user. These types of applications not only cover routine or leisure aspects, but also facilitate the lives of people who may have impaired motor function, e.g. patients with amyotrophic lateral sclerosis (ALS) (Miao et al. 2020), spinal injuries, head trauma or those undergoing rehabilitation (Pfurtscheller et al. 1998; Craik et al. 2019). In recent years, many works have studied the development of brain–computer interfaces based on non-invasive EEG devices, from the standpoint of the design of both paradigms and algorithms for extracting features from EEG signals (Miao et al. 2020). In applications where the BCI is used by patients with some pathology, a BCI with maximum precision is essential for user safety. The difficulty with this application is that the classifier must in most cases make a decision based on a single trial (it is unable to obtain an averaged trial for the feature extraction). There are currently many research works aimed at finding better tools to identify an instruction and make a decision (Taniguchi et al. 2007; Li et al. 2016). A wide variety of methods for acquiring brain activity focus on non-invasive devices, such as magnetoencephalography (MEG), positron emission tomography (PET) and functional magnetic resonance imaging (fMRI). Even though electroencephalography (EEG) signals by nature have a poor signal-to-noise ratio, especially when acquired with wireless EEG devices, we conducted our research with this type of device due to its low cost, comfort and ease of installation for a future user.
The first fact to consider is that EEG signal patterns vary among users, and do not even remain constant over time for the same user. They change with any patient state: at rest, while performing some task (such as moving a hand) or under acoustic stimulation. Therefore, finding user responses to stimuli in real time, as well as determining the recording duration required for the signal to be classified properly, are essential to building an efficient BCI. All this time-dependent information and the changes in the patient's state extracted from the EEG signals have to be taken into account to design a classifier that improves decision-making in any application executed in real time. Real-time execution poses additional problems. The first, as Netzer et al. (2020) note, is a common issue in BCI processing, namely that the probability of successful detection depends on the capture time: if the BCI has a short reaction time, the classification algorithm will perform worse, while a better algorithm considerably increases the reaction time. To address this, they propose an algorithm based on coresets to calculate the common spatial patterns (CSP), preserving their representation with better results. The second problem is fatigue, as Roy et al. (2013) demonstrated: mental fatigue can affect the EEG signal through the distribution of the band power features, which are influenced by emotional and physical states. Complementarily, Myrden and Chau (2015) found lower BCI performance caused by fatigue in their experiments involving a navigation game in which the participants had to report, inter alia, their level of fatigue.
As a result of the complexity of EEG interpretation, especially in a real-time application, the extraction and classification of information requires a complex methodology based on machine learning (ML) or deep learning (DL) techniques. Among the many existing classification algorithms in artificial intelligence-based applications, several are most commonly applied. Lotte et al. (2007, 2018) review the most frequently used classification algorithms in EEG-based applications, highlighting the following methods: discriminant analysis (DA), support vector machine (SVM), artificial neural network (ANN), k-nearest neighbor (KNN) and decision tree learning (DTL), among others, which have yielded high accuracies, although not high enough to apply these systems in a real environment. The DA and SVM algorithms rely on a discriminant hyperplane search to identify classes, and they are prevalent in EEG-based BCI. DA attempts to separate the data into different classes using a hyperplane, assuming that the data have a normal distribution and each class has an equal covariance matrix. The SVM searches for the hyperplane that maximizes the margin between the training data and the hyperplane. Algorithms based on artificial neural networks (ANN) assemble artificial neurons to establish non-linear decision boundaries, but these are generic algorithms that can be negatively affected by noisy and non-stationary signals, such as those from an EEG. The KNN algorithm classifies new data according to the distance to the k nearest neighbors (Poorna et al. 2017), and DTL is based on recursive partitioning of the instance space to predict the response (Wang et al. 2009; Ishfaque et al. 2013).
All these algorithms need a set of features extracted from the EEG signals as input data. However, due to the large number of features that can be obtained, the features needed for each classification method have to be prescreened; that is, from each trace performed, a number of specific features is extracted (average power of a band in electrodes, connectivity, symmetry, etc.), losing the temporal information present in the brain when executing a task.
Against this backdrop, and given the importance of deep learning in a biomedical context (Zemouri et al. 2019), a classifier based on neural networks that can remember information could provide a suitable alternative for an improved BCI. There is a type of recurrent neural network (RNN), the long short-term memory (LSTM) network, which is able to meet this requirement. The LSTM was developed by Hochreiter and Schmidhuber (1997). It is an RNN with the ability to remember information for long periods and forget unnecessary information, which means that its outputs are based on a temporal pattern (Hochreiter and Schmidhuber 1997; Gers et al. 2000). It has been applied to EEG records, for example, to predict epileptic seizures (Tsiouris et al. 2018) or for emotion recognition (Alhagry et al. 2017; Li et al. 2018; Yang et al. 2018; Xing et al. 2019). Additionally, other researchers have applied this kind of RNN in contexts similar to ours: Kumar et al. (2019) applied it to the BCI competition IV dataset 1 by combining common spatial patterns (CSP) with this type of network; Nagabushanam et al. (2020) compared two specific neural network architectures (one with an LSTM layer) with the SVM; Li et al. (2016) defined an architecture and applied an LSTM network to classify motor imagery tasks in a public database; and Wang et al. (2018) did something similar, classifying motor imagery tasks in a public dataset with their proposed LSTM-based framework, although using a fixed time window.
The aims of this work are to study the effectiveness of using an LSTM with a low-cost wireless EEG device, the Emotiv EPOC+, working in real time, and to find the shortest time window that maximizes the accuracy of the LSTM classifier. These objectives will lay the foundations for a low-cost BCI that could be used by people with impaired motor function, for example, in a smart electric wheelchair with basic commands based on a simple code of opening and closing the user's eyes. In order to determine the performance of the LSTM, we compared our results with popular 'classical' classification algorithms applied to EEG-based BCI, such as DA, SVM, KNN and DTL. In addition, the results obtained with the LSTM algorithms are compared with those obtained with a conventional recurrent neural network (RNN).

Materials and methods
This section aims to define and explain the methods carried out in this research in an effort to highlight the relationship between the time window size and the robustness of a BCI, which follows the workflow shown in Fig. 1.
The flowchart for EEG signal pre-processing, feature extraction and classification is given in Fig. 1. It starts with the data acquisition using the Emotiv EPOC+ to register the entire 14-channel EEG signal at a sampling frequency of 128 Hz. Each data trial was divided into two-second segments. Eye-blink denoising, a band-pass filter and a threshold were applied to remove any possible noise and artefacts. The flow continues by generating all the individual datasets from this processed signal, which consisted of different sequence sizes. The signal is then subjected to a frequency analysis to obtain its power. Finally, a common train, validation and test process is carried out to review the performance of the system.
All the analyses in this paper were conducted using Matlab® (The MathWorks, Inc.). The data were analyzed with Fieldtrip (Oostenveld et al. 2011), a toolbox developed by the Donders Institute for Brain, Cognition and Behaviour in Nijmegen (Netherlands) in collaboration with other institutes, which offers pre-processing and advanced analysis methods for MEG, EEG, iEEG and NIRS records. In addition, the LSTM was defined, trained and tested with the Matlab Deep Learning Toolbox™. The statistical computing (Cohen 1988) was carried out using R software (R Core Team 2019).

Participants
All the volunteers, a total of 13, gave their consent before the experiment. They ranged in age from 18 to 50 and all of them were right-handed, with no motor pathologies. The study was approved by the ethics committee of the University of La Laguna (register number: CEIBA2020-0405).

Experimental protocol
The experimental paradigm is defined in Fig. 3. This protocol was fully recorded and is composed of two important parts, both lasting 20 s, with a 10-s rest period in between. In the first part of the experiment, a cross was displayed, and the volunteer did not have to do anything. This is considered the baseline state, during which the resting EEG is recorded. The second part started and finished with an acoustic stimulus (beep) indicating when to close and open the eyes, respectively. These 20-s recordings were later divided into different time-window sizes for classification purposes.

Data pre-processing
The pre-processing consists of three steps: eye-blink denoising, band-pass filtering and automatic artefact rejection.
The denoising of eye-blinks was done using the REBLINCA method described in Di Flumeri et al. (2016), in which the regression component of the FPZ channel (Regr-FPZ) is calculated. The reason for applying this procedure is to remove the muscular effect of blinking from the signal, so that only the EEG activity is classified. Because our EEG device does not have the necessary central sensor, we defined a reference signal (AFZ) as the average of the AF3 and AF4 channels. The regression component of this signal (Regr-AFZ) was calculated using a 5th-order Butterworth filter between 1 and 7 Hz. A threshold signal (Thres-AFZ) was computed as the derivative of Regr-AFZ in order to highlight both the rising and falling parts generated by an eye blink. This signal was then normalized to zero mean and unit variance, and a moving average was applied over the square of the signal. The Thres-AFZ component was used to locate the eye-blink disturbances. Then, in the blinking areas, the original signal was corrected by subtracting the weighted Regr-AFZ signal from each channel:

x'_ch(t) = x_ch(t) − b_ch · Regr-AFZ(t)

where the weight b_ch, the proportion of the AFZ signal present in each channel, was previously estimated by linear regression as:

b_ch = cov(x_ch, AFZ) / var(AFZ)

Figure 4 shows an example of an eye-blink subtraction in four sensors (AF3, AF4, F7 and F8) where these artefacts are highlighted.
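As an illustration, the channel-wise regression subtraction can be sketched in a few lines. The authors worked in Matlab with Fieldtrip; this Python/numpy version, including its function and variable names, is our own assumption of the step described above:

```python
import numpy as np

def blink_subtract(eeg, regr_afz, blink_mask):
    """Remove eye-blink activity from each EEG channel by regression.

    eeg        : (n_channels, n_samples) raw EEG
    regr_afz   : (n_samples,) regression component of the AFZ reference
                 (AFZ = mean of AF3 and AF4, band-passed 1-7 Hz)
    blink_mask : (n_samples,) boolean, True where a blink was detected
    """
    cleaned = eeg.copy()
    for ch in range(eeg.shape[0]):
        # Least-squares weight: proportion of the AFZ signal in this channel
        b = np.dot(eeg[ch], regr_afz) / np.dot(regr_afz, regr_afz)
        # Subtract the weighted reference only inside the blink regions
        cleaned[ch, blink_mask] -= b * regr_afz[blink_mask]
    return cleaned
```

If a channel is exactly a scaled copy of the reference, the correction removes it entirely; on real data only the blink-correlated fraction is subtracted.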
Then, a Butterworth band-pass filter between 5 and 40 Hz was applied to the data.
Finally, a threshold (th_rej) was computed for the automatic artefact rejection as defined in the Fieldtrip tutorial (Donders Centre for Cognitive Neuroimaging 2019): the signal of each channel (x_ch(t)) is standardized (z_ch(t)) with respect to its average (μ_ch) and its standard deviation (σ_ch) over time (t):

z_ch(t) = (x_ch(t) − μ_ch) / σ_ch

where

μ_ch = (1/N) Σ_t x_ch(t),   σ_ch = √( (1/(N − 1)) Σ_t (x_ch(t) − μ_ch)² )

with N being the total number of samples.
After that, a new signal (z_sum(t)) is created as the average of the standardized data over all channels, where c is the number of channels:

z_sum(t) = (1/c) Σ_ch z_ch(t)

The threshold (th_rej) used to remove the outliers, which were considered artefacts and noise, is defined from the average (μ) and the standard deviation (σ) of the signal z_sum over its N samples, following the Fieldtrip tutorial. An example of the application of this threshold to an EEG user test is shown in Fig. 5.
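The standardization and summed z-score used for the artefact rejection can be sketched as follows. This is a Python/numpy illustration; the 3-sigma cutoff is our assumption, since the exact threshold follows the Fieldtrip tutorial:

```python
import numpy as np

def artifact_threshold(x):
    """Automatic artefact rejection via summed per-channel z-scores.

    x : (n_channels, n_samples) pre-processed EEG.
    Returns the summed z-score signal and a boolean mask of samples
    flagged as artefacts (z_sum above its mean + 3 std; the cutoff
    multiplier is an illustrative assumption).
    """
    mu = x.mean(axis=1, keepdims=True)
    sigma = x.std(axis=1, ddof=1, keepdims=True)    # "n-1" estimator
    z = (x - mu) / sigma                            # per-channel z-score
    z_sum = z.mean(axis=0)                          # average over channels
    th_rej = z_sum.mean() + 3.0 * z_sum.std(ddof=1)
    return z_sum, z_sum > th_rej
```

Averaging the z-scores over channels makes a disturbance that appears simultaneously on many channels (a typical artefact) stand out against channel-local noise.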

Power and dataset generation
After the signal pre-processing, the Morlet wavelet transform (Tallon-Baudry and Bertrand 1999) is applied to calculate the evolution of the power in the time-frequency domain. Because the purpose of the classifier is to discern whether a user is in the eyes-open (EO) or eyes-closed (EC) state, the alpha band (8-12 Hz) of the EEG signals in the visual area (occipital lobe), corresponding to the O1 and O2 sensors, must be analyzed (Barry et al. 2009).
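A minimal sketch of this time-frequency power estimation is given below. The authors used Fieldtrip in Matlab; this numpy version and its wavelet parameters (e.g. seven cycles) are illustrative assumptions:

```python
import numpy as np

def morlet_power(signal, fs, freq, n_cycles=7):
    """Time-resolved power at one frequency via a complex Morlet wavelet."""
    sigma_t = n_cycles / (2 * np.pi * freq)           # wavelet width in s
    t = np.arange(-4 * sigma_t, 4 * sigma_t, 1 / fs)  # kernel support
    wavelet = np.exp(2j * np.pi * freq * t) * np.exp(-t**2 / (2 * sigma_t**2))
    wavelet /= np.sqrt(np.sum(np.abs(wavelet) ** 2))  # energy-normalize
    analytic = np.convolve(signal, wavelet, mode="same")
    return np.abs(analytic) ** 2                      # instantaneous power

# Toy O1 trace sampled at 128 Hz with a strong 10 Hz (alpha) rhythm
fs = 128
t = np.arange(0, 4, 1 / fs)
x = np.sin(2 * np.pi * 10 * t)
p10 = morlet_power(x, fs, 10.0)   # power in the alpha band
p25 = morlet_power(x, fs, 25.0)   # power well outside the alpha band
```

For a pure alpha rhythm, the power traced at 10 Hz dominates the power traced at 25 Hz, mirroring the EO/EC contrast exploited by the classifier.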
Since the selected classical classification algorithms (SVM, DA, KNN and DTL) and the LSTM network use the same features in different formats, two databases were generated. The LSTM dataset was created with time sequences of the power in O1 and O2 from the 20-s trials, each sequence with its corresponding label (EO or EC), where n_w is the number of points in a sequence, n_s is the number of sequences and E_j is the label. The time window, which defines the duration of the sequence, was varied from 1 to 10 s to study the LSTM performance. Each sequence was created by sliding the window along the signals, always starting back from the last sample (Fig. 6). The dataset used for the classic algorithms was created with the power measurements at each time point of all 20-s trials in O1 and O2, with their corresponding label (EO or EC), where N is the number of samples and E_i is the label. For these algorithms, it would be possible to do extra pre-processing to work with sequences in a similar way to the LSTM; however, in this research, the aim is to apply the same database to all algorithms with no additional transformations, and to observe both the metrics obtained and the result in a real case.
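The end-aligned windowing for the LSTM dataset can be sketched as follows (Python/numpy; the non-overlapping step of one window length is our assumption, as the step size is not stated in the text):

```python
import numpy as np

def make_sequences(power, fs, window_s, label):
    """Cut a 20-s power trace into end-aligned sequences for the LSTM.

    power    : (2, n_samples) alpha power from O1 and O2
    fs       : sampling frequency in Hz
    window_s : time-window size in seconds (1-10 s in the paper)
    Each sequence is obtained by sliding the window back from the
    last sample.
    """
    n_w = int(window_s * fs)               # points per sequence
    seqs, labels = [], []
    end = power.shape[1]
    while end - n_w >= 0:
        seqs.append(power[:, end - n_w:end])
        labels.append(label)
        end -= n_w                         # move the window back
    return seqs, labels
```

For a 20-s trial at 128 Hz and a 7-s window this yields two full sequences, both ending flush with recorded samples, so no zero-padding is needed.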

LSTM algorithm
The classification algorithm selected is an ANN with an LSTM layer (Hochreiter and Schmidhuber 1997). The complete neural network architecture is shown in Fig. 7: a sequence input layer (with two neurons corresponding to the features selected for each channel, O1 and O2), an LSTM layer with eight cells (this layer can learn long-term dependencies between the time steps of sequence data), a fully connected layer (which connects every neuron in one layer to every neuron in the next), a SoftMax layer (which applies the SoftMax activation function to define the probability of each class) and, finally, the classification layer that defines the outcome (Kudo et al. 1999; Bishop 2006). The most important layer is the LSTM. The top of Fig. 7 shows its cell in an exploded view, where each square is a neural network layer corresponding to a sigmoid or tanh operation, the circles represent element-wise operations (vector multiplication 'x', vector sum '+' or a tanh operation), and the arrows define the data flow, including the merging of two lines, concatenation or bifurcation. The cell uses the following data: the input from the recurrent network (x_t), the cell output (h_t) and the memory state (c_t), each changing as a function of time t.
The structure of a cell can be divided into different parts. The horizontal line at the top of the diagram carries the state of the cell, and data can be added to or removed from it through specific gates. The first interaction with the cell state is generated by a sigmoid layer, the forget gate, which decides what information from the h_{t-1} and x_t sources is irrelevant and deletes it from the cell state:

f_t = σ(W_f · [h_{t-1}, x_t] + b_f)

The cell input selection can be divided into two steps. In the first step, a sigmoid layer, the input gate, defines the key values and selects the most important ones to update:

i_t = σ(W_i · [h_{t-1}, x_t] + b_i)

In the second step, a vector of new candidate data (c̃_t) is generated by a tanh layer from the past output and the current input:

c̃_t = tanh(W_c · [h_{t-1}, x_t] + b_c)

Finally, the data selected to be forgotten (f_t) are removed and the previously selected candidates (c̃_t) are added, generating the newly updated cell state (c_t), where ⊙ denotes the element-wise product:

c_t = f_t ⊙ c_{t-1} + i_t ⊙ c̃_t

The cell output h_t, shown in Fig. 7, is then obtained by filtering the updated state through the output gate: h_t = o_t ⊙ tanh(c_t), with o_t = σ(W_o · [h_{t-1}, x_t] + b_o).
A more detailed explanation of how LSTMs work is provided by Hochreiter and Schmidhuber (1997).
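The gate operations described above can be condensed into a single forward step. The sketch below (Python/numpy, with a stacked weight layout chosen for convenience and not taken from the paper) shows one time step of a cell with two inputs (the O1 and O2 power features) and eight hidden units, as in the network used here:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell(x_t, h_prev, c_prev, W, b):
    """One step of a standard LSTM cell (Hochreiter & Schmidhuber 1997).

    x_t : (n_in,) input; h_prev, c_prev : (n_hidden,) previous output/state.
    W maps the concatenated [h_prev, x_t] to all four gates at once;
    this stacked layout is an implementation convenience.
    """
    n = h_prev.size
    z = W @ np.concatenate([h_prev, x_t]) + b
    f_t = sigmoid(z[0 * n:1 * n])        # forget gate: what to discard
    i_t = sigmoid(z[1 * n:2 * n])        # input gate: what to update
    c_tilde = np.tanh(z[2 * n:3 * n])    # candidate cell state
    o_t = sigmoid(z[3 * n:4 * n])        # output gate
    c_t = f_t * c_prev + i_t * c_tilde   # new cell state
    h_t = o_t * np.tanh(c_t)             # new hidden output
    return h_t, c_t

# One step with 2 inputs and 8 hidden cells, as in the paper's architecture
rng = np.random.default_rng(1)
W = rng.normal(scale=0.1, size=(4 * 8, 8 + 2))
b = np.zeros(4 * 8)
h, c = lstm_cell(rng.normal(size=2), np.zeros(8), np.zeros(8), W, b)
```

Because o_t lies in (0, 1) and tanh in (−1, 1), the output h_t is always bounded in (−1, 1), regardless of the magnitude of the input power values.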

Classic algorithms
The LSTM algorithm was compared with the following common classification algorithms used in the literature (Lotte et al. 2007).

Support vector machine (SVM)
The SVM algorithm searches for a hyperplane in a space of n dimensions, where n is the number of features, which separates the classes with a margin as wide as possible, the margin being the width of the region parallel to the hyperplane. In this algorithm, multiple kernels that map the input data to a higher-dimensional space can be applied. These kernels reduce the computational effort and time, and enable the algorithm to deal with nonlinear problems.

Discriminant analysis (DA)
A classification method that assumes that the data in each class follow a different Gaussian distribution. Based on this assumption, the classification is done by calculating the probability that a new element belongs to each trained class.

k-nearest neighbor (KNN)
A lazy learning method that performs the classification only when new data are tested. The algorithm assigns the label of a new element based on the labels of its closest known neighbors. This requires a distance function, so several distance functions were used as kernels.
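As an illustration of this classical baseline, a minimal KNN vote can be written directly (Python/numpy; the toy alpha-power features are invented for the example, and k = 10 mirrors the 'Medium' preset in Table 1):

```python
import numpy as np

def knn_predict(X_train, y_train, x_new, k=10):
    """Label a new sample by majority vote of its k nearest neighbors
    under the Euclidean distance."""
    d = np.linalg.norm(X_train - x_new, axis=1)   # distance to all points
    nearest = np.argsort(d)[:k]                   # indices of the k closest
    votes = y_train[nearest]
    labels, counts = np.unique(votes, return_counts=True)
    return labels[np.argmax(counts)]

# Toy alpha-power features: EC trials have high power, EO trials low
X = np.vstack([np.full((20, 2), 5.0), np.full((20, 2), 1.0)])
y = np.array(["EC"] * 20 + ["EO"] * 20)
```

Note that no model is fitted in advance: all the work happens at query time, which is what "lazy learning" means here.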

Decision tree learner (DTL)
This algorithm is based on a tree structure: it defines a root node and creates forks, or branches, according to the training parameters until it reaches a leaf node, which defines the final label. Starting from the root, the label of a new element can be determined, since at each node its values are evaluated to decide which branch to follow until a leaf node is reached.

Recurrent neural network (RNN)
A recurrent neural network is similar to a feedforward network, but has a recurrent connection with an associated tap delay. This gives the network an infinite dynamic response to time-series input data.

Parameters and kernels
Since the fit parameters and the dimensionality of the data can affect the classifiers themselves, several parameters and kernel functions have been applied (Table 1).

Metrics
Well-known machine learning metrics were applied to compare the performance of the algorithms. These metrics are based on the confusion matrix, where true positives (TP) are the values classified as positive that were actually positive, and true negatives (TN) are the values classified as negative that were actually negative. P is the number of records of the class considered positive (eyes closed) and N is the number of records of the class considered negative (eyes open).

• Accuracy (ACC): percentage of cases where the model was correct, ACC = (TP + TN)/(P + N). A full success rate corresponds to a value of 1.

• Sensitivity (SEN): the probability of a positive test, conditioned on being truly positive (true positive rate), SEN = TP/P. A value of 1 corresponds to a full hit rate on positive cases.

• Specificity (SPE): the probability of a negative test, conditioned on being truly negative (true negative rate), SPE = TN/N. As with the previous metric, a value of 1 corresponds to all the truly negative values being correctly classified.

• Matthews correlation coefficient (MCC): correlation coefficient for binary classifications, MCC = (TP·TN − FP·FN) / √((TP + FP)(TP + FN)(TN + FP)(TN + FN)). This metric ranges from −1 to +1, with 1 representing a perfect prediction, 0 a random one, and −1 total disagreement between prediction and observation.
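These four metrics can be computed directly from the confusion-matrix counts. The sketch below (plain Python; the function name is our own) follows the definitions above with EC as the positive class:

```python
def bci_metrics(tp, fn, tn, fp):
    """ACC, SEN, SPE and MCC from confusion-matrix counts
    (EC = positive class, EO = negative class)."""
    p, n = tp + fn, tn + fp                       # actual positives/negatives
    acc = (tp + tn) / (p + n)                     # overall hit rate
    sen = tp / p                                  # true positive rate
    spe = tn / n                                  # true negative rate
    mcc_den = ((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)) ** 0.5
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0
    return acc, sen, spe, mcc
```

An MCC near 0, as reported for several classical algorithms in Table 2, indicates a classification barely better than chance even when the accuracy looks moderate.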

Results
The temporal variation of the EEG recordings in BCI applications suggests that classification algorithms that take their historical evolution into account (such as the LSTM) would give better results. This section shows the feasibility of using an LSTM in BCI systems with a low-cost, wireless EEG device (Emotiv EPOC+).
EEG recordings of 13 volunteers were taken using the protocols defined in the previous section. After pre-processing, the time evolution of the alpha-band power was computed using the wavelet transform. Figure 8 shows that the magnitude of the alpha-band power in the visual area (sensors O1 and O2) is not constant over time and has larger values in the EC state than in the EO state, although the EC state also has low values that could be confused with the EO state if the power is analyzed with a small time window.
A good BCI should ideally react to the user's action (command) instantly, or at least in the shortest time possible. If the system takes too long to react, the user could run into trouble. In fact, if the BCI's response is not quick and accurate, a person with reduced mobility could spend considerable time and effort issuing a simple action; consequently, a long journey would take time and be exhausting.

Table 1 (fragment). Parameters and kernels applied to the classifiers:

• SVM, Medium Gaussian: kernel scale set to sqrt(P) to make medium distinctions between classes, where P is the number of predictors
• SVM, Coarse Gaussian: kernel scale set to sqrt(P)·4 to make coarse distinctions between classes
• KNN, Fine: the number of neighbors is set to 1 to make fine distinctions between classes
• KNN, Medium: the number of neighbors is set to 10 to make medium distinctions between classes
• KNN, Coarse: the number of neighbors is set to 100 to make coarse distinctions between classes
• KNN, Cubic: medium distinctions between classes are sought with a cubic distance metric and 10 neighbors
• KNN, Weighted: medium distinctions between classes are sought with a weighted distance metric and 10 neighbors
• KNN, Cosine: medium distinctions between classes are sought with a cosine distance metric and 10 neighbors
• RNN, Delay: input delays associated with the recurrent connection of the layer

Thus, to deal with these situations and obtain a more efficient BCI using an LSTM, it is important to define the shortest time window that generates optimal control of the system. Different time windows were used to create the datasets for the LSTM in order to find the best one.
The time evolution of the alpha-band power recorded at the O1 and O2 positions was used as a dataset to train and validate the classical algorithms and the LSTM network. The dataset was randomly divided to train (75%) and validate (25%) the algorithms. Four metrics, m, were calculated: accuracy (ACC), sensitivity (SEN), specificity (SPE) and Matthews correlation coefficient (MCC). The process of randomly dividing the dataset to train and validate the algorithms was repeated 20 times for each subject. The mean value of the metric for each subject, m_i, is the average of the 20 metrics calculated, m_ij, where i indicates the subject and j the iteration. Finally, the mean and the standard deviation using the "n − 1" method (sd.s) over the set of subjects are calculated as

m_i = (1/J) Σ_j m_ij,   m̄ = (1/I) Σ_i m_i,   sd.s = √( (1/(I − 1)) Σ_i (m_i − m̄)² )

where i = 1,…,I with I = 13, and j = 1,…,J with J = 20. Table 2 shows the mean results of the classical algorithms and Table 3 the means for the LSTM network with different time windows. In Table 3, we can see that the accuracy is between 0.7661 and 0.9214, while in Table 2 the accuracy of the classical algorithms is between 0.4854 and 0.6872. The Friedman test yielded a statistically significant difference in the accuracy depending on the time window, with a test statistic value F = 20.462 > χ²(2) = 7.815 and a significance level of p = 0.000036. Then, to locate time windows with significant differences, post hoc Wilcoxon signed-rank tests were applied between the different windows. The p-values obtained from these comparisons are shown in Table 4; p-values greater than 0.05 mean that there is no significant improvement in the LSTM classifier accuracy between the two time windows compared. Therefore, as this table shows, there is no significant improvement for window sizes longer than 7 s. Once the classifier algorithms were trained and validated, they were tested with new records.
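The aggregation described above, from per-iteration metrics to per-subject means and the "n − 1" standard deviation, can be sketched as follows (Python/numpy; the function name and the toy input are our own):

```python
import numpy as np

def subject_summary(m_ij):
    """Aggregate per-iteration metrics across subjects.

    m_ij : (I, J) array, metric for subject i at iteration j
           (I = 13 subjects, J = 20 random splits in the paper).
    Returns the per-subject means m_i, the grand mean, and the
    "n-1" standard deviation (sd.s) across subjects.
    """
    m_i = m_ij.mean(axis=1)                  # average over the J splits
    return m_i, m_i.mean(), m_i.std(ddof=1)  # ddof=1 -> the n-1 estimator

# Toy example: 13 subjects whose mean accuracies span 0.70-0.90,
# each repeated over 20 identical splits
m = np.tile(np.linspace(0.7, 0.9, 13)[:, None], (1, 20))
mi, mean, sd = subject_summary(m)
```

Averaging within each subject first, then across subjects, keeps the 13 subjects equally weighted regardless of split-to-split variability.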
The paradigm used for the test recordings consisted of three EC states, each of which is t seconds long, with an EO state in between. Figure 9 shows the classification output for a user with different algorithms. The signals at the top (Fig. 9a) are EEG power records from the O1 and O2 sensors. Figure 9b shows the results using the best classifiers according to Table 2, and an LSTM classifier with a time window of 7 s. Figure 10 represents the classification of the same recordings as in Fig. 9 with the LSTM and different time windows (w): 2, 4 and 7 s. Also, long EC states (t) were defined with different durations: 7, 15 and 20 s.
In addition, to test the network architecture, the number of LSTM layers was increased (using eight neurons in each) and the accuracy evaluated, yielding the results shown in Fig. 11 for the different time windows used. Note the elbow formed at time windows between 5 and 7 s. When applying several layers (2, 3 and 4), the evolution of the accuracy is similar, while a single layer yields lower values.

Discussion
In this study, we have chosen a low-cost wireless EEG device, namely the Emotiv EPOC+, because it is wireless and easy to fit, which facilitates its use in BCI applications, for example in situations where people with reduced mobility wish to interact with their environment through computers. In addition, because it is a low-cost device, it can be used by a greater number of potential users, including those with less purchasing power. However, a low-cost wireless EEG device has a number of disadvantages compared to clinical devices in terms of the quality of the acquired signal. Therefore, in this study, we looked at the possibility of applying an LSTM network to the EEG recorded by a low-cost wireless EEG device and at the need to choose an appropriate time window to improve accuracy. We also wanted to show that, in this case, an LSTM network can yield better results than classical classification algorithms.
In order to study and compare the selected classification algorithms, the brain activity of 13 volunteers was measured while they kept their eyes closed or open, depending on the external stimuli received. The purpose of the classifier is then to decide whether the subject has their eyes closed or open. As shown at the beginning of the results section, there is a clear contrast in brain signals when a user is in the EO or EC state, which is present in the alpha-band power signal (Fig. 8). The results obtained by Gopika Gopan et al. (2017) are similar, demonstrating a clear difference between these two states with an accuracy of 77.92%. When comparing the classic classification algorithms (Table 2) with the LSTM networks (Table 3), we see that the latter are more accurate, yielding an accuracy that is at least 8% higher than the standard algorithms. However, as Fig. 8 shows, the instantaneous alpha-band power values partially overlap between the EO and EC states, and in those segments traditional classification algorithms cannot discern the difference between the two states. Therefore, when the classification algorithms are trained directly with the values of the alpha-band power signal, which change over time, their accuracy is worse, and the records would have to be processed to extract other features that could improve the classification over a longer time window. When we focus on the sensitivity, specificity and MCC of these classic algorithms (Table 2), we see this behavior reflected: the average accuracy is 59.71%, and the sensitivity (correctly classified positives relative to all truly positive elements) and specificity (correctly classified negatives relative to all truly negative elements) are 52.38% and 67.04%, respectively. In the case of the MCC, most of these algorithms have a value very close to 0, which means that their classification is almost random.
Furthermore, it is observed that applying the RNN, which works with previous information in the data, does not depart from the accuracies obtained with the rest of the classical algorithms. However, a classifier such as the LSTM, which takes into account the information from previous time instants, produces better results. Table 3 shows that the accuracy (76.61-92.14%), sensitivity (70.75-91.62%) and specificity (82.47-93.63%) of the LSTM networks are higher than those of the classical algorithms. In addition, the MCC is distributed over the range of 0.53 to 0.85, which means that its prediction is quite good (a perfect prediction corresponds to an MCC equal to 1).
An important factor in properly training an LSTM network is choosing the duration (time window) of the records that are passed to the network. Other authors, like Kumar et al. (2019), used a single time window of 25% of the trial duration with an overlap of approximately 5%. In this paper, however, the time window size is varied to study its relationship with the algorithm's performance.
The results show that LSTM classifiers achieve better accuracy with longer windows: from a mean accuracy of 76.61% with a 1-s window to 90.94% with 7 s. From this point on, the accuracy values remain around 91%. In our case, no pre-selection of the signal was done when applying the proposed methodology. Other authors do select the best segments to build their dataset, which is not applicable in real-time systems; for example, Piątek et al. (2018) obtained an accuracy of over 96% with different classification algorithms using a time window of 10 s.
From the results presented in Tables 3 and 4, we concluded that after 7 s there is no significant improvement in the classification.
Finally, EEG records representing a more realistic situation with EO and EC states were used to test how the different classifiers would work in a real BCI (Fig. 9a). Figure 9b shows the outputs of the Coarse DTL, SVM Fine Gaussian and Coarse KNN algorithms and of the RNN with a delay of 1 s, which had the best accuracy in Table 2. Although the user is in a fixed state (EC or EO), the responses of these four classifiers switch continuously between EC and EO. In contrast, the LSTM results are more stable and closer to the user's behavior. It should be noted that the RNN also models dynamic temporal behavior, but does not achieve the consistency of output shown by the LSTM.
However, one aspect of Fig. 10 should be highlighted, where different lengths of the EC state (t) are used in the protocol and the LSTM is applied with different time windows (w). Although the optimal result in Table 3 is obtained at 7 s, lengthening the window size delays the detection of the EO transition substantially, which can decrease the accuracy. For a sequence to be classified correctly, it must be formed by characteristics exclusive to a single state; if it spans several classes, the classification output will not change until the new state predominates within the window, creating a delay. This means that in a real system a compromise between accuracy and response time must be established, since the user of an LSTM-based BCI has to hold each command for the duration of the time window for it to be detected.
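This delay mechanism can be illustrated with a deliberately simplified stand-in for the sequence classifier: a majority vote over the trailing window. It is not the LSTM itself, only a sketch of why an output that depends on a window of w samples cannot flip until the new state dominates that window.

```python
def majority_over_window(labels, w):
    """Classify each instant by the majority label in the trailing window of size w.

    A crude proxy for a window-based sequence classifier: the output only
    flips once the new state dominates the window, so a longer window
    means a longer detection delay after a state change.
    """
    out = []
    for i in range(len(labels)):
        win = labels[max(0, i - w + 1):i + 1]
        out.append(max(set(win), key=win.count))
    return out

# The true state changes from EO (0) to EC (1) at sample 10; with w = 7
# the detected change lags by w // 2 samples.
true = [0] * 10 + [1] * 10
pred = majority_over_window(true, w=7)
delay = pred.index(1) - true.index(1)
```

Scaling w to seconds, a 7-s window implies a detection lag of roughly half the window under this simple model, which is consistent with the trade-off between accuracy and response time described above.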
It should be noted that the network used in the real context has a single LSTM layer, which already obtains higher accuracy than the classical algorithms. The results can be further improved by adding more layers, as seen in Fig. 11.

Conclusions
In general, the algorithms applied can be divided into, on the one hand, classical ML algorithms and, on the other, DL algorithms such as the LSTM. In both cases, the quality and reliability of the data are essential to ensure efficient performance, including in the case addressed in this research. DL techniques usually require a large dataset and have a higher computational cost due to the complexity of the operations they perform. In this case study, however, the training effort was not significantly different and the results were comparable, with the LSTM results standing out thanks to the sequential processing performed by this layer.
In a BCI context, although there is an extensive literature on EEG-based BCIs with LSTM networks, there are not enough studies focused on the time window parameter, which is essential to their success. Our results highlight several aspects that must be taken into account to implement an LSTM network in a BCI.
In our case study, the task of distinguishing the EO and EC states is complicated by the changing evolution of the EEG signal. Most of the alpha-band power peaks in the EO state are low compared to those in the EC state; however, in the EC state, some power values may be similar to those in the EO state. The classifier therefore requires extra information to differentiate the two states, since the discrete power values alone do not carry enough information to do so correctly.
There are different approaches to this problem, for example, using a threshold that depends on the maximum peak power detected in each state: when several consecutive power peaks of the EEG signal exceed this threshold within a predefined period, the subject is considered to be in the EC state, and otherwise in the EO state. However, this method involves several assumptions (the behavior of the power, how often the power peaks appear, the value of the threshold, etc.), so a detailed study of each user's EEG would be necessary. In this work, we show how an LSTM neural network can solve this problem elegantly.
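The threshold heuristic just described can be sketched as follows. The threshold, the required number of supra-threshold peaks and the power values are all assumptions for illustration; as noted above, in practice they would have to be tuned per user.

```python
def detect_ec(power, threshold, min_peaks):
    """Threshold heuristic for the eyes-closed state (sketch of the approach
    described above; threshold and min_peaks must be tuned per user).

    Counts alpha-band power samples in the window that exceed the threshold;
    enough supra-threshold peaks -> EC, otherwise EO.
    """
    peaks = sum(1 for p in power if p > threshold)
    return "EC" if peaks >= min_peaks else "EO"

# Illustrative alpha-band power windows (arbitrary units):
ec_like = [2.0, 8.5, 9.1, 3.0, 8.8, 9.4, 2.5]   # frequent high peaks
eo_like = [1.0, 2.2, 1.5, 8.9, 1.8, 2.0, 1.2]   # one isolated peak only
state_a = detect_ec(ec_like, threshold=7.0, min_peaks=3)
state_b = detect_ec(eo_like, threshold=7.0, min_peaks=3)
```

The fragility is visible even in this toy form: the isolated high peak in the EO-like window is only rejected because `min_peaks` was set correctly, which is exactly the kind of per-user tuning the LSTM avoids.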
The outcomes obtained with the LSTM demonstrate that it is able to capture a temporal pattern in the EEG signals, as other authors have shown (Li et al. 2016; Wang et al. 2018). Our results also show that the time window is a vital parameter of an LSTM, because there is a direct relationship between the length of the input sequence and the accuracy of the classifier. In addition, the proposed LSTM models yielded better outcomes than the classical algorithms, which would need more extensive pre-processing of the data to reach the same level of performance. Moreover, unlike a plain RNN, an LSTM can store information over long periods of time, improving the classification results. In summary, the relationship between the time window and the accuracy of the LSTM classifier determines the efficacy and accuracy of a BCI system.
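The long-term storage property can be seen in the LSTM cell equations themselves. Below is a minimal scalar LSTM step with hand-picked (not trained) weights, chosen so that the forget gate saturates near 1: a value written into the cell state by an early input then survives 100 steps of zero input, which a plain RNN's repeatedly squashed hidden state would not.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h, c, w):
    """One step of a scalar LSTM cell (weights w are hand-picked, not trained).

    The forget gate f decides how much of the old cell state c survives each
    step; with f near 1, information persists over long sequences.
    """
    f = sigmoid(w["wf"] * x + w["uf"] * h + w["bf"])    # forget gate
    i = sigmoid(w["wi"] * x + w["ui"] * h + w["bi"])    # input gate
    g = math.tanh(w["wg"] * x + w["ug"] * h + w["bg"])  # candidate state
    o = sigmoid(w["wo"] * x + w["uo"] * h + w["bo"])    # output gate
    c = f * c + i * g
    h = o * math.tanh(c)
    return h, c

# Forget gate saturated open (bf large), candidate gated off when x = 0.
w = dict(wf=0, uf=0, bf=10, wi=6, ui=0, bi=-3,
         wg=6, ug=0, bg=0, wo=0, uo=0, bo=6)
h, c = 0.0, 0.0
h, c = lstm_step(1.0, h, c, w)    # "write": the input is stored in c
c_after_write = c
for _ in range(100):              # 100 steps with no input
    h, c = lstm_step(0.0, h, c, w)
```

After 100 silent steps the cell state has decayed only by the factor `sigmoid(10) ** 100`, i.e. it retains well over 90% of the written value, which is the mechanism behind the LSTM's advantage on long windows.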
Finally, as a general conclusion, despite the good results obtained with the EEG, other methods, such as EMG (Usman et al. 2021) or video eye tracking (Ibrahim et al. 2021), could obtain better results in the proposed task. However, we must not overlook the potential of the system proposed here, which can be applied to other tasks, such as providing feedback to aid with motor diseases (Miao et al. 2020; Pimenta et al. 2021), neuromarketing (Gers et al. 2000) or the classification of more elaborate brain tasks (Pfurtscheller et al. 1998; Kumar et al. 2019; Netzer et al. 2020), to which the same study conducted in this work can be applied.
Funding Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature. This research was funded under the auspices of Research Project ProID2017010100, supported by the Consejería de Economía, Industria, Comercio y Conocimiento of the Government of the Canary Islands (Spain) and the ERDF (European Regional Development Fund), and Research Projects TEC2016-80063-C3-2-R and DPI2017-90002-R, supported by the Spanish Ministerio de Economía, Industria y Competitividad, Gobierno de España. J. Ortega and K. Martín-Chinea each have a fellowship from the Agencia Canaria de Investigación, Innovación y Sociedad de la Información (ACIISI) of the Government of the Canary Islands (Spain) and the European Social Fund (ESF).
Data availability The data obtained for this manuscript is available upon request to the corresponding author.

Declarations
Ethical approval The study was conducted according to the guidelines of the Declaration of Helsinki, and approved by the Ethics Committee of the University of La Laguna (Register No. CEIBA2020-0405 and 3rd July, 2020).
Consent to participate Informed consent was obtained from all subjects involved in the study.

Consent for publication
The authors declare no conflict of interest and have given their explicit consent to publish this manuscript.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.