1 Introduction

Electroencephalography (EEG) provides a way to observe dynamic neuronal activity and regionally distinct oscillatory activity of the brain, in frequency ranges that extend over multiple orders of magnitude [1]. EEG is widely used in clinical diagnostics, the diagnosis of brain disorders, estimation of mental workload and fatigue, emotion recognition, motor imagery, brain-computer interfaces, sleep scoring, neuroergonomics and many other applications [2, 3]. Extensive work has been done to find biomarkers of brain disorders correlated with neural activity [4]. Diagnosis of sleep disorders (for example narcolepsy), head injuries, degeneration of brain tissue, brain hemorrhage, stroke, brain infection, Alzheimer’s or Parkinson’s disease can be supported by extracting specific features of the EEG signals. EEG is also used in monitoring the depth of anesthesia and disorders of consciousness. Another popular application area of EEG is the construction of brain-computer interfaces [2, 3].

Many original methods have been proposed for EEG analysis. Complex neural networks and deep learning models frequently provide good classification results. Such models learn specific kernels and use them for convolution of the time-series signals, but they do not provide features that allow for a clear interpretation [5]. One of the most important problems in the diagnostics of mental disorders based on EEG signals is opening the black box of complex classifiers and providing comprehensible explanations based on features that offer useful information. Cognitive, behavioral and emotional tasks can be used to elicit the responses of individuals to cognitive stimuli [6]. Such test-focused methodology is useful in neurology and psychiatry to provide personalized diagnostics. New approaches are needed to support the development of biomarkers of various brain disorders from resting-state and task-related measurements. One promising direction for creating objective biomarkers of brain disorders is based on the identification of abnormal dynamical connections measured in the resting state, using functional-connectivity magnetic resonance imaging (rs-fcMRI) [7]. This is an expensive technique that is hard to standardize, but it provides good quantitative results, showing the importance of neurodynamic assessment for clinical diagnosis and underlining the significance of searching for EEG-based biomarkers that could replace fcMRI.

Recurrence quantification analysis (RQA) is a method of nonlinear data analysis that quantifies the number, duration, and structure of recurrences of dynamical system states [8, 9]. This analysis is based on a fundamental property of recurrence, observed in many complex systems, including brain neurodynamics. A recurrence plot (RP) enables the visualization of higher-dimensional phase spaces using a two-dimensional chart that displays the similarity of the current state to past states. Patterns contained in recurrence plots include information about metastable states (when the system trajectory fluctuates within some basin of attraction but does not change significantly), the frequency of recurrence of specific states, and the similarity between signals from spatially distributed regions, indicating synchronized subnetwork activity. RQA of the EEG signal has been successfully used to characterize the various sleep stages and episodes of sleep apnea [10], and to assess the depth of anesthesia [11]. It also enables the automated identification of epileptic EEG signals [12]. In this area, RQA is a new tool that provides the means to observe dynamical changes in brain activity that are hard to notice with other types of EEG signal analysis. Non-linear features extracted from RQA have potential in clinical applications, being not only useful in diagnostics, but also helping to understand brain dynamics.

The standard procedure for working with non-linear time series is to use the embedding theorem due to Takens [13], significantly extended in recent papers by many researchers [14]. These theorems provide the basis for optimal state-space reconstruction from univariate time series, creating a shadow version of the original multidimensional manifold simply by analyzing a single component of a multi-dimensional time series. This enables reconstruction of a trajectory that preserves essential mathematical properties of the original system, such as the topology of the manifold on which the trajectory lies. A one-to-one mapping between the original attractor and its reconstruction is established, allowing the discovery of a low-dimensional manifold describing the state of the original dynamical system. This requires sampling the time series with time delays and embedding the results in a higher-dimensional space. An optimal representation of the original data requires the determination of the embedding dimension and time delays (TD-EMB) [8, 9, 14].

Although we have mathematical proof that attractors of dynamical systems may be recreated using the embedding approach [13, 14], in practice such methods have succeeded only for relatively simple systems. Most reconstructions of dynamics based on the embedding theorems have been tested on low-dimensional chaotic systems, such as the Rössler system, the Duffing oscillator, or the Mackey–Glass delay equation [8, 9]. EEG data is much more complex. Specifying the embedding and time-delay hyperparameters can be very challenging, especially when analyzing resting-state brain activity data. The hyperparameters defining the time-delay embedding may be selected for each frequency band separately, providing a detailed analysis of the delta (0.5–4 Hz), theta (4–8 Hz), alpha (8–14 Hz), beta (14–30 Hz) and gamma (above 30 Hz) bands. Each band is associated with different brain processes and mental states, from deep sleep to relaxation, anticipation, perception and attention. An extensive review of differences in frequency bands in eyes open and closed conditions showed substantial overlap across a spectrum of psychiatric disorders and large variability within disorders [15]. Parameters for every frequency band need to be optimized and calculations performed separately. Such an approach may lose important information when peak frequencies shift between several bandpasses. In the resting-state analysis, we would like to focus on features that are present in the whole spectrum of the brain waves. Because the EEG signal is a complex, non-stationary mixture of many oscillations, it is difficult to create a useful set of features that can be used to reliably estimate the similarity of its states, needed for the creation of recurrence plots.

There is growing evidence that brain dynamics evolves in low-dimensional latent spaces [16]. A coarse description at the level of local field potentials may contain more information than detailed, spiking-neuron models provide [17]. Pinotsis and Miller [16] claim that the emergence of stable electrical fields provides a stable representation of memory. If this hypothesis proves to be true, investigation of local field potentials, and EEG as their proxy, may be the best road to understanding brain processes. So far, extracting useful information from the EEG signals has had limited success. We need to explore new ways of EEG analysis and find a new language for the description of brain processes.

Our main goal in this paper is to compare non-linear features extracted from recurrence plots built in two ways: using the standard embedding approach to create a time-dependent representation of signals, and using the Short-Time Fourier Transform (STFT) to convert the input signal from the time domain into the time–frequency domain. The two types of vectors constructed in this way endow recurrence plots with different properties, reflected in the features that are extracted from the recurrence plots using recurrence quantification analysis (RQA). The embedded representation of the time series (TD-EMB-RQA) is based on sampling the original, quickly oscillating signal, and it is not easy to find parameters that faithfully recreate the underlying attractor states. The STFT-RQA representation is based on the distributions of power spectra that show peaks for certain frequencies, stabilizing the process of similarity estimation and facilitating meaningful interpretation. While STFT has recently been combined with RQA in a few applications, we have not seen EEG analysis done in this way. A combination of STFT with precise timing of neural events in recurrence plots may offer the most effective feature-based approach to characterize the nonstationary and largely chaotic EEG signals. The interpretation of recurrence is in this case rather simple: it is the return of oscillations that have similar power spectra, playing a specific role in the information flow in the brain.

Our strategy to compare the usefulness of these two methods of EEG signal representation involves the construction of recurrence matrices, extraction of RQA features, and using them in the classification of resting-state data acquired in the eyes open and eyes closed conditions. This is a rather simple, but non-trivial task. It requires a search for the best parameters for embedding, construction of recurrence matrices, and STFT calculation. The discriminatory power of RQA features is assessed by inspection of their distributions for each subject and each condition. Using a linear SVM classifier allows for the selection of the most useful combinations of features and electrodes. A single prototype for each class was used as a reference vector in the nearest prototype method. This approach is sufficient to reach our goal without an extensive comparison of various approaches to construct recurrence plots.

The next section describes the steps used in our calculation pipelines and the 64-channel EEG time-series data used for testing. The third section presents details of the methods used to construct signal representations for the construction of recurrence matrices, calculation of distances, and feature extraction using recurrence quantification analysis. The fourth section contains the results of the calculations. Parameters of the TD-EMB-RQA and STFT-RQA approaches have been selected to achieve the best classification results in tenfold cross-validation. The distribution of the selected RQA feature values obtained for each of the 64 electrodes, calculated for each subject, illustrates differences among individual subjects. Histograms of the selected RQA feature values over all subjects show the discriminatory power of these features. RQA features calculated for each electrode provide spaces in which the linear SVM method creates a discriminating hyperplane. Histograms of projections on the direction perpendicular to this hyperplane show overlapping distributions for the open and closed eyes conditions. Selecting the largest SVM coefficients shows that a few features for specific electrodes are sufficient to achieve the best results. Feature vectors are also displayed using UMAP visualization. The final section contains conclusions and plans for future work.

2 Preliminaries

Non-linear features are created from the analysis of signals measured by single electrodes. Starting with the original data matrix Uk = (uik), where the index i enumerates time-series samples and the index k refers to electrodes (data streams, input channels), two types of matrices representing the time series are constructed, labeled Xk and Sk. First, the vectors representing original samples from each electrode are replaced by the time-delayed embedded vectors \(\mathbf{x}_i^k = (u_i^k, u_{i+\tau}^k, \ldots, u_{i+(m-1)\tau}^k)\). Here m is the embedding dimension and τ is the time delay, measured in samples [18]. A sequence of these vectors creates the Xk matrix, containing the representation of the time series in the m-dimensional embedded space, separately for each channel. The recurrence matrix RXk is created by comparing the distances between the xik vectors. RQA features FXk describing properties of the dynamics are extracted from the RXk matrix. Thus, the TD-EMB-RQA pipeline used to generate non-linear features for each channel k involves the Uk → Xk → RXk → FXk transformations. A set of features for each electrode defines the input space for classifiers.
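
To make the TD-EMB step concrete, the sketch below builds the Xk matrix for a single channel with NumPy, using the m = 5 and τ = 9 values adopted later in the paper; the function name and the random placeholder signal are illustrative, not part of the published pipeline.

```python
import numpy as np

def time_delay_embedding(u, m=5, tau=9):
    """Build the matrix X of time-delayed vectors
    x_i = (u_i, u_{i+tau}, ..., u_{i+(m-1)tau}) for one channel."""
    n_vectors = len(u) - (m - 1) * tau          # number of embedded vectors N
    if n_vectors <= 0:
        raise ValueError("signal too short for the chosen m and tau")
    # each row is one embedded vector x_i
    return np.stack([u[i:i + n_vectors] for i in range(0, m * tau, tau)], axis=1)

# illustrative usage: 5000 samples of one EEG channel (placeholder signal)
u = np.random.randn(5000)
X = time_delay_embedding(u, m=5, tau=9)         # shape (5000 - 4*9, 5)
```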

In the second approach, STFT is used to determine the power spectrum for a range of oscillation frequencies in a short time window that contains fragments of the EEG signal. STFT has been used with good results in a wide range of applications in Brain-Computer Interfaces (BCI), e.g. the analysis of signals during imagined writing [19], and in many applications in neurology, such as seizure prediction [20]. Vectors \(\mathbf{s}_i^k = \mathrm{STFT}([u_i^k, \ldots, u_{i+n}^k])\) are created, where n is the number of samples in the time window and the components \(s_i^k(f)\) represent the EEG power at frequency f in the time window i. We have filtered frequencies to the 0–50 Hz range, with the signal sampling rate at 160 Hz, and discretized the whole spectrum to 257 points (obtained from 512 FFT frequency bins), so this is the dimension of the sik vectors in the STFT representation. Vectors for all time windows i are collected in the matrix Sk. The rising and waning of power peaks in different frequency bands is observed in consecutive time windows, shifted by a few samples. To preserve temporal resolution we have used a shift of a single sample. By calculating distances between the columns of the Sk matrix, the recurrence matrix RSk is created. The RSk matrix is analyzed to obtain a set FSk of RQA features. The STFT-RQA pipeline for the generation of non-linear features is: Uk → Sk → RSk → FSk.

Which set of features, FX or FS, contains more useful information, and which is easier to optimize and calculate? To answer this question we have performed various analyses of the EEG resting state data collected with open and closed eyes.

EEG Recordings: a subset of data from the BCI2000 project [21] was used for our tests. It contains preprocessed data of 109 subjects. This dataset is publicly available as a part of the PhysioNet resources [22]. The BCI2000 project facilitates the implementation of different BCI systems as well as various psychophysiological experiments. It is available with full documentation and has been used in a variety of studies by many research groups. To avoid artifacts, we have removed 11 subjects, leaving the data of 98 people.

Electrodes were placed according to the 10–10 international electrode placement system [23], excluding the following 11 channels at the circumference of the skull: Nz, F9, F10, FT9, FT10, A1, A2, TP9, TP10, P9, and P10. This leaves 64 electrodes. The sampling rate of the EEG recordings was a relatively low 160 Hz for 64 channels. We set up an average reference for the electrodes and used the data with pre-applied preprocessing. For the purpose of our analysis, we have extracted the two 1-min baseline runs, one with eyes open and one with eyes closed. For our computations, only 31 s of the signal from each baseline were used, containing 5000 samples per channel. For the EEG preprocessing and data handling pipeline, we used the MNE-Python library [24].

3 Methods

Here we describe the details of the methods used to create our two representations, TD-EMB and STFT.

Time-delay embedding (TD-EMB): to create the X matrix that contains the EEG signal vectors \(\mathbf{x}_i = (u_i, u_{i+\tau}, \ldots, u_{i+(m-1)\tau})\) for each channel (for simplicity we drop the k index here), one has to determine the hyperparameters: the embedding dimension m and the time delay τ, equal to the number of samples each component is shifted by [18]. The actual time delay is equal to τΔt, where Δt is the sampling interval (in our case Δt = 1000/160 ms = 6.25 ms).

The value of the τ parameter is often estimated by selecting the delay corresponding to the first local minimum of the delayed mutual information (DMI) [18]. One of the methods recommended for selecting the embedding dimension is based on the number of false nearest neighbors (FNN) [18]. Here we have used both methods as implemented in the NoLiTSA package [25]. Moreover, we have estimated the FNN number (and the values of the RQA parameters) as a function of the embedding dimension, finding a minimum for m between 4 and 6, correlated with relatively high trapping time and other RQA features. This was true for the whole frequency range as well as for separate bands, from delta to gamma.
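
The delay-selection heuristic can be illustrated with a simple histogram-based estimate of the delayed mutual information, picking its first local minimum; this is a generic stand-in for the NoLiTSA routines actually used here, and the bin count and maximum delay are arbitrary choices.

```python
import numpy as np

def mutual_information(x, y, bins=64):
    """Histogram-based estimate of the mutual information I(x; y) in nats."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy /= pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz]))

def first_minimum_delay(u, max_tau=40):
    """Smallest tau at which the delayed mutual information
    I(u_t; u_{t+tau}) has a local minimum (generic stand-in for NoLiTSA)."""
    ami = [mutual_information(u[:-t], u[t:]) for t in range(1, max_tau + 1)]
    for t in range(1, len(ami) - 1):
        if ami[t] < ami[t - 1] and ami[t] < ami[t + 1]:
            return t + 1                 # tau counted in samples, starting at 1
    return int(np.argmin(ami)) + 1       # fall back to the global minimum
```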

After the creation of the X matrix, the recurrence matrix \(R_{i,j} = \Theta(\varepsilon - \|\mathbf{x}_i - \mathbf{x}_j\|)\), \(i,j = 1, \ldots, N\), is computed. Here N is the number of vectors after embedding (equal to the total number of samples minus (m − 1)τ), and ε is the distance threshold (also called the neighborhood parameter), the maximum distance by which vectors may differ and still be considered similar. Θ(\(\cdot\)) is the binary step function, and \(\|\cdot\|\) is the norm used to measure distances [8]. We have used the Euclidean distance for the embedded signals. For the power spectra, we have experimented with the Wasserstein distance metric, but it made calculations slower and Euclidean distances gave similar results, so they were used in both methods.

Selecting values of the embedding parameters reveals one problem with this approach: it is usually done using various heuristics. The choice of ε is somewhat arbitrary and has to be adjusted to the estimated embedding and delay parameters. To this day, a gold standard for choosing ε has not been established. Selecting different thresholds for each channel, subject and time segment could probably give better results, but the whole procedure would be very tedious and require considerable computation. Automatic procedures for the selection of embedding parameters have been published recently, but have not yet been used with such complex time series as EEG signals [14]. Selection of a non-uniform time-delay embedding, like the one developed in [14], might give better results with the time-delay embedding method. However, our goal is not to maximize classification accuracy, but to compare the advantages of two approaches to EEG time series representation. Here we have used a well-established approach, selecting the ε value for a fixed global recurrence rate. Specifically, we have calculated ε as the 4th percentile of the distance distribution [26].
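
The fixed-recurrence-rate thresholding can be sketched as follows, with ε taken as the 4th percentile of the pairwise Euclidean distances; SciPy distance routines are used here only for brevity, whereas the actual computations were CUDA-parallelized.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def recurrence_matrix(vectors, percentile=4.0):
    """Binary recurrence matrix R_ij = Theta(eps - ||v_i - v_j||),
    with eps fixed as a percentile of the pairwise distance distribution,
    i.e. a fixed global recurrence rate (~0.04 for the 4th percentile)."""
    d = pdist(vectors, metric="euclidean")   # condensed pairwise distances
    eps = np.percentile(d, percentile)
    return (squareform(d) <= eps).astype(np.uint8), eps

# usage with the embedded vectors X from the TD-EMB step:
# R, eps = recurrence_matrix(X, percentile=4.0)
```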

Despite optimization of the code and parallelization using the Compute Unified Device Architecture (CUDA) on Tesla V100 24 GB cards, the complete pipeline, from the construction of the representation to the final classification, takes more than two hours on a PC workstation. As our goal was not to achieve the highest possible classification accuracy, but to compare the advantages of the two approaches to EEG time series representation, for the sake of simplicity during classification we fixed the m and τ parameters at their most frequent values: m = 5 and τ = 9.

After obtaining m, τ and ε parameters, recurrence matrices are calculated separately for each electrode.

Short-Time Fourier Transform (STFT): to create the S matrices, the STFT was computed in sliding time windows, separately for every electrode and each subject. For this purpose, we have used the TensorFlow implementation, with a Hamming window of 240 samples (corresponding to 1500 ms), shifted by a single sample. This number of samples proved to be a good compromise: shorter time frames increase the uncertainty of the frequency estimates of the Fourier transform, while larger windows increase the uncertainty of the time at which the calculated spectra arise. High gamma frequencies have very small amplitudes; therefore, a cutoff frequency of 50 Hz was used, well below the 80 Hz Nyquist frequency for the 160 Hz sampling rate. With this relatively slow sampling frequency and 240 samples in the window, a rather low frequency resolution of 2/3 Hz is obtained (the Rayleigh frequency). The number of vectors generated in this way is equal to N minus the window size (here 4800–240 = 4560). Between consecutive sampling points we have Δt = 6.25 ms. At 50 Hz each oscillation lasts 20 ms, about three times longer. Although we have calculated the STFT starting from every sampling point, a window shift of 3 samples (or more for higher sampling rates) should be sufficiently accurate.
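
A sketch of this step using tf.signal.stft is given below; it assumes the channel signal has already been low-pass filtered to 50 Hz, and apart from the window type, window length, shift and FFT size quoted above, the details are illustrative.

```python
import tensorflow as tf

def stft_vectors(u, frame_length=240, frame_step=1, fft_length=512):
    """Power-spectrum vectors s_i for consecutive sliding windows of one channel.
    The signal u is assumed to be already low-pass filtered to the 0-50 Hz range."""
    spec = tf.signal.stft(tf.constant(u, dtype=tf.float32),
                          frame_length=frame_length,   # 240 samples = 1.5 s at 160 Hz
                          frame_step=frame_step,       # shift by a single sample
                          fft_length=fft_length,       # 512 bins -> 257 one-sided frequencies
                          window_fn=tf.signal.hamming_window)
    return (tf.abs(spec) ** 2).numpy()                 # shape (n_windows, 257)

# S = stft_vectors(u)   # one 257-component power vector per time window
```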

With the number of FFT bins set to 512, the STFT returns 257 frequency samples representing each si vector in each time window. This operation results in a matrix S containing a representation of the time series based on the STFT (Fig. 1). These vectors have a clear interpretation, showing peaks at characteristic frequencies. The separation of EEG signals into separate frequency bands may distort such power shifts. In Fig. 1, the main alpha peak is shifting towards a slightly higher frequency; the usual boundary between the alpha and beta bands is at 12–13 Hz.

Fig. 1

Example of STFT vectors calculated in time windows of about 1.5-s length, showing a shift of the alpha peak frequency in the power spectra after 100 ms and 200 ms, measured by the O1 electrode (occipital area) during eyes closed condition

Power peaks of similar frequencies observed in spectra for different electrodes indicate patterns of synchronized activity, but investigation of these processes is left for future work.


Recurrence matrices from STFT vectors: the RX matrices are calculated directly from the columns of the X matrix, using the Euclidean distance with the threshold (neighborhood parameter) ε equal to the 4th percentile of the distance distribution. To create the RS recurrence matrices, we have also used the Euclidean distance metric.

Every column vector of the matrix S, representing the power spectrum in a given time window, is compared with all previous vectors using the Euclidean distance metric. Figure 2 shows a spectrogram with a rather stable alpha peak and weaker power at other frequencies. The threshold parameter was similarly calculated as the 4th percentile of the distance distribution, creating recurrence plots with rather large basins of similarity (Fig. 2b). Note that the column vectors of X and S have very different sizes, m components versus 257 components; therefore the neighborhood size parameter is also quite different. In the STFT case we expect to see many distinct metastable states, corresponding to similar power spectra and similar fragments of the time/frequency spectrograms. The distance is small when the peaks and shapes of the power spectra are similar, indicating recurrence to a similar state. While recurrence plots do not show precisely which frequencies contribute to this result, we can identify states that belong to groups with similar spectra and see which states form a distinct group. Recurrence matrices contain information about spectra with sets of peaks at different frequencies. If ε is small, all states seem to be different, similarity basins are small and only a few off-diagonal elements appear. If ε is large, everything seems to be quite similar. We know that microstates, clusters of similar global power distributions, have a typical duration of about 100 ms [27]. One way to estimate the ε value is to require that recurrence plots show rich structure within a stripe of no more than a few hundred milliseconds near the diagonal, indicating metastable and transition states, as in Fig. 2b. We can also check how the value of ε influences classification accuracy at our final step.
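
One simple way to quantify this heuristic is to compare the recurrence rate inside and outside a diagonal stripe of a few hundred milliseconds; the helper below is a hypothetical diagnostic of this kind, not part of the pipeline described above, and the 500 ms stripe width is an arbitrary choice.

```python
import numpy as np

def near_diagonal_recurrence(R, fs=160.0, width_ms=500.0):
    """Recurrence rate inside / outside a diagonal stripe of +- width_ms.
    Hypothetical diagnostic: a useful eps gives rich structure near the
    diagonal and sparser recurrences far from it."""
    w = int(width_ms * 1e-3 * fs)                  # stripe half-width in samples
    i, j = np.indices(R.shape)
    inside = np.abs(i - j) <= w
    return R[inside].mean(), R[~inside].mean()

# rr_in, rr_out = near_diagonal_recurrence(RS)
```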

Fig. 2

a Example of a spectrogram, with brighter colors for higher power. b Fragment of a recurrence plot built from the STFT column vectors for each point in time, with periods of rapid change (red dot) and relative stability (orange dot), when the system is trapped in a meta-stable state. c The red and orange markers for the two time points shown in the spectrogram correspond to the two power distributions

Recurrence Quantification Analysis (RQA): non-linear features are calculated from the matrices RX and RS using recurrence quantification analysis. We have used the PyRQA library [28] to extract the RQA features FX from each electrode for each subject, and to plot the recurrence matrices. For the FS features, we have used our own implementation of recurrence quantification analysis, based on a modification of the recurrence_python software [29].

In both cases the computed RQA features included: TT = trapping time (also called dwell time), DET = determinism, L = average diagonal line length, Lentr = entropy of diagonal lines, and LAM = laminarity (see [8] for the definitions of these quantities). Although it is possible to extract at least 12 additional features from the RQA analysis [29], these five features were the most useful in the prediction of autism from infant EEG data [30] and are quite sufficient to compare the TD-EMB-RQA and STFT-RQA approaches. Non-linear EEG features have so far been used only in the analysis of relatively small numbers of subjects. Here we have 98 subjects, allowing for more reliable statistics. We may expect better results using more features, but the relative advantages of the two methods should not change.
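
For the FX features, the extraction can be sketched with PyRQA roughly as follows; the calls follow the PyRQA documentation, and the ε value is only illustrative (compare Fig. 10).

```python
from pyrqa.time_series import TimeSeries
from pyrqa.settings import Settings
from pyrqa.analysis_type import Classic
from pyrqa.neighbourhood import FixedRadius
from pyrqa.metric import EuclideanMetric
from pyrqa.computation import RQAComputation

def rqa_features(u, m=5, tau=9, eps=0.21):
    """TT, DET, L, Lentr and LAM for one channel, computed by PyRQA
    from the time-delay embedded signal (eps here is illustrative)."""
    ts = TimeSeries(u, embedding_dimension=m, time_delay=tau)
    settings = Settings(ts,
                        analysis_type=Classic,
                        neighbourhood=FixedRadius(eps),
                        similarity_measure=EuclideanMetric,
                        theiler_corrector=1)
    result = RQAComputation.create(settings).run()
    return {"TT": result.trapping_time,
            "DET": result.determinism,
            "L": result.average_diagonal_line,
            "Lentr": result.entropy_diagonal_lines,
            "LAM": result.laminarity}
```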

Features generated from the RX and RS matrices may be compared using the histograms in Figs. 3 and 4. They show only 3 features: trapping time (TT), average diagonal line length (L), and determinism (DET). The histograms for laminarity (LAM) and the entropy of diagonal lines (Lentr) are not shown to save space: the LAM histogram is quite similar to that of DET, and the TT histogram resembles that of L.

Fig. 3

Examples of histograms of three FS feature values (from the STFT column vectors, window size = 240) from the RQA analysis (other features have similar histograms), showing, for all 98 subjects, the distributions of TT, L and DET values extracted from 64 electrodes

Fig. 4

Examples of histograms of three FX feature values (X column vectors for m = 5, τ = 9) from the RQA analysis (other features have similar histograms), showing, for all 98 subjects, the distributions of TT, L and DET values extracted from 64 electrodes

The distributions shown in Figs. 3 and 4 have been normalized, and outliers due to artifacts produced by some electrodes have been removed using Robust Chauvenet Outlier Rejection [31]. This procedure has been extensively tested in radio astronomy, but to our knowledge has not previously been used for EEG data. For most subjects the histograms of the FS features show distributions with lower values in the open-eyes condition. This is understandable, as brain activation is in this case more varied and trapping times are shorter. Oculometry data would probably show a correlation between the frequency of eye movements and the distribution of these feature values. It also indicates that even single features from selected electrodes may have discriminative power. The variance of the feature values is quite large, which points to the importance of selecting the most useful EEG channels. In a few cases subjects have similar distributions of feature values in both conditions, which may point to vivid imagery, making it hard to distinguish between brain activations in the eyes open and closed conditions.

The FX features show strong overlaps for most subjects. The distributions of all RQA feature values from all electrodes are also quite similar for open and closed eyes. There is a small tendency towards higher values in the open-eyes condition, the opposite of the FS case. The high variance shows that careful selection of the electrodes for which extreme values are obtained may help in discriminating the two conditions.

Figures 5 and 6 show the L, TT, and DET histograms for the FS and FX features for all subjects. From these histograms we may also conclude that, in the case of closed eyes, L, TT, and DET calculated from the FS features have in general larger values than in the case of open eyes. TT shows the largest difference and may therefore be the most useful feature in classification. This is followed by L, and to a lesser degree by DET, where the peaks of the two distributions are much closer to each other. Brain dynamics is less chaotic with eyes closed. The differences seen here may be attributed mostly to the strong, persistent alpha peaks in the eyes-closed condition, as seen in the spectrogram in Fig. 1. The STFT representation of the EEG signal shows these differences quite clearly. Strong alpha peaks may contribute to the higher values of these parameters. In the case of the FX features (Fig. 6) the picture is not so clear. The overlaps of the feature distributions are much stronger. There is a small shift of the TT and L values towards lower values in the eyes-open condition. Therefore these features will not be of much use to distinguish between our two experimental conditions, and their interpretation may be misleading.

Fig. 5

Histograms of the RQA features for all 98 subjects: a TT (trapping time), b DET (determinism), c L (average diagonal line length), calculated for the FS features (STFT parameters as in Fig. 3)

Fig. 6

Histograms of the RQA features for all 98 subjects: a TT (trapping time), b DET (determinism), c L (average diagonal line length), calculated for the FX features (embedding parameters as in Fig. 4)

4 Classification results

The FX and FS features calculated for each channel define the vector space for EEG classification. Although many approaches could be applied here, probably achieving slightly better results, we have used the Linear Support Vector Machine (LSVM) algorithm. This is a version of the linear discrimination algorithm that finds a hyperplane in a high-dimensional space dividing two classes of data. For our goal—comparison of representations based on two sets of features—this has important advantages. The linear classification approach allows for a simple interpretation that is lost when non-linear kernels are used. LSVM allows for visualization of results, creating histograms of the data for each class projected on the vector perpendicular to the discriminating hyperplane [32]. The weights obtained from the LSVM show the importance of each feature/channel combination for discrimination, allowing an estimation of feature/channel usefulness. The creation of filters based on a small number of electrodes, with algorithms that extract specific information from each channel, is very important for clinical applications. A step in this direction is taken here.


LSVM classification and visualization of results: we now have two sets of nonlinear features. Combining the five features with all 64 electrodes creates vectors Z with 320 components that contain all the information available in our feature space. This should allow for better discrimination than the single features in Figs. 5 and 6. Classification accuracy for the whole group of 98 subjects was calculated with the linear SVM classifier using these vectors as inputs. For cross-validation, the stratified k-fold method was chosen with k = 10. This method shuffles the data randomly and splits it into k folds with a balanced number of samples per condition; in each iteration k − 1 folds are used as the training set and the remaining fold as the test set. The whole procedure is repeated k times. The total mean classification accuracy, as well as the mean accuracy for each condition derived from the confusion matrix, was calculated. The SVM implementation from the Scikit-learn Python library was used.
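
A sketch of this classification step with Scikit-learn is given below; the feature standardization and the regularization constant C are assumptions, not tuned values from our experiments.

```python
from sklearn.svm import LinearSVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import StratifiedKFold, cross_val_score

def lsvm_accuracy(Z, y):
    """Mean tenfold stratified cross-validation accuracy for the linear SVM.
    Z: (n_cases, 320) RQA feature vectors, y: condition labels (open/closed).
    Scaling and C=1.0 are illustrative choices."""
    clf = make_pipeline(StandardScaler(), LinearSVC(C=1.0, max_iter=10000))
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
    return cross_val_score(clf, Z, y, cv=cv, scoring="accuracy").mean()
```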

To exclude redundant electrodes and features, we have performed recursive feature elimination with tenfold cross-validation using the corresponding Scikit-learn function. In every step of this process, the algorithm removes the component with the lowest absolute value of the SVM coefficient, performing cross-validation in the space of reduced dimensionality.
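
This selection step corresponds roughly to the following Scikit-learn sketch; the estimator settings are indicative only, and Z is assumed to be standardized beforehand.

```python
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import LinearSVC

def select_components(Z, y):
    """Recursive feature elimination with tenfold cross-validation:
    drop the feature/electrode component with the lowest |SVM coefficient|
    at each step (assumes Z has already been standardized)."""
    selector = RFECV(estimator=LinearSVC(max_iter=10000),
                     step=1,
                     cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=0),
                     scoring="accuracy")
    selector.fit(Z, y)
    return selector.support_, selector.n_features_
```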

The STFT-RQA classification achieved fairly good results with 88.2% total accuracy. Elimination of redundant features increased the score up to 95.9% for 194 selected dimensions.

Results for the TD-EMB-RQA representation turned out to be lower than for the STFT technique. Analysis of the RQA features did not show much discriminatory power, giving a total mean accuracy of 80%. However, the removal of components in the cross-validation scheme significantly increased the accuracy, reaching 89.2% for 80 selected components.

Using the weight vector W obtained from the LSVM, the scalar product z = W \(\cdot\) Z projects individual cases onto the line perpendicular to the discrimination hyperplane [32]. Figure 7 shows the histogram of these projections for features from the STFT-RQA pipeline, and Fig. 8 for the TD-EMB-RQA pipeline. The overlaps of the two distributions are very small for the SVM based on the FS features (Fig. 7), but significantly larger when the FX features are used (Fig. 8). This could also be expected from the classification accuracy on the test partition of the data: 89.2% based on the FX features and 95.9% based on the FS features.
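
The projection itself is a one-line operation once the LSVM has been fitted; the sketch below assumes a plain fitted LinearSVC (not a pipeline) and omits the intercept, following the z = W \(\cdot\) Z formula above.

```python
import numpy as np
import matplotlib.pyplot as plt

def projection_histogram(clf, Z, y, labels=("eyes open", "eyes closed")):
    """Histograms of z = W . Z, the projection of each case on the direction
    perpendicular to the LSVM hyperplane (clf: a fitted LinearSVC)."""
    y = np.asarray(y)
    w = clf.coef_[0]                      # LSVM weight vector W
    z = Z @ w                             # projection of all cases
    for value, name in zip(np.unique(y), labels):
        plt.hist(z[y == value], bins=50, alpha=0.6, label=name)
    plt.xlabel("projection on the LSVM normal")
    plt.legend()
    plt.show()
```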

Fig. 7

Histograms of the projection of 320 FS feature values (5 RQA features for 64 electrodes), for all subjects, in the direction perpendicular to the LSVM hyperplane (parameters as in Fig. 3), for all data

Fig. 8

Histograms of the projection of 320 FX feature values (5 RQA features for 64 electrodes), for all subjects, in the direction perpendicular to the LSVM hyperplane (parameters as in Fig. 4), for all data

We have also defined a single prototype for each condition by averaging the normalized vectors of each class, and used the nearest prototype algorithm for classification with tenfold cross-validation. This simple procedure gives 75.5% accuracy for the STFT-RQA and 58.2% for the TD-EMB-RQA representation.
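
A minimal sketch of this nearest-prototype baseline is given below; for brevity it uses simple z-score normalization and reports accuracy on the full data, rather than the tenfold cross-validation used for the numbers quoted above.

```python
import numpy as np

def nearest_prototype_accuracy(Z, y):
    """Single prototype per condition: the mean of the normalized feature
    vectors of each class; cases are assigned to the closer prototype.
    (Simple z-score normalization and no cross-validation: illustration only.)"""
    y = np.asarray(y)
    Zn = (Z - Z.mean(axis=0)) / Z.std(axis=0)
    classes = np.unique(y)
    prototypes = np.stack([Zn[y == c].mean(axis=0) for c in classes])
    d = np.linalg.norm(Zn[:, None, :] - prototypes[None, :, :], axis=2)
    return (classes[np.argmin(d, axis=1)] == y).mean()
```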

Visualization of the similarity of all feature/electrode vectors using the UMAP (Uniform Manifold Approximation and Projection for Dimension Reduction) method (Fig. 9) showed pure clusters. UMAP is a dimensionality reduction algorithm that is frequently used to plot high-dimensional data in two or three dimensions. It has similar applications as t-SNE for general non-linear dimensionality reduction and visualization [33]. Perfect clusterization for TD-EMB-RQA is quite surprising, but it shows that, using the whole data, a low-dimensional manifold may be constructed that captures the similarity of the vectors for each condition. UMAP dimensionality reduction is not a projection, so we cannot add test data that have not been used for the creation of the model.
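
A sketch of this visualization with the umap-learn package is shown below; default UMAP parameters are assumed, which may differ from those used to produce Fig. 9.

```python
import matplotlib.pyplot as plt
import umap

def plot_umap(Z, y):
    """Two-dimensional UMAP embedding of the 320-component Z vectors
    (default umap-learn settings; illustrative only)."""
    emb = umap.UMAP(n_components=2, random_state=0).fit_transform(Z)
    plt.scatter(emb[:, 0], emb[:, 1], c=y, cmap="coolwarm", s=10)
    plt.title("UMAP of RQA feature vectors, eyes open vs closed")
    plt.show()
```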

Fig. 9

a Visualization of the 320-dimensional Z column vectors based on the STFT-RQA approach for 98 subjects using UMAP dimensionality reduction. The two embedding dimensions show high similarity of the vectors representing the eyes open and closed conditions; separability of these two conditions is high. UMAP shows only the topological structure of the data. b Visualization for the TD-EMB-RQA approach shows lower similarity and separation.

Examples of the recurrence plots for the eyes open and closed conditions are presented in Fig. 10a, b for the RS matrices, and c, d for the RX matrices. These plots depend strongly on the neighborhood thresholds and the selection of electrodes. In the case of RS, we can see precisely structured, short metastable states, lasting from about 100 ms (typical times for microstates) to more than a second. The power spectra in these states are relatively stable and, as can be seen in the vertical columns, the system returns to similar distributions, more often in the eyes-closed case. In the TD-EMB-RQA case, the RX matrices show a more diffuse structure. This is understandable, because in this representation the EEG signal oscillates quickly, so distances between vectors at consecutive time steps (6.25 ms) tend to fall in and out of the ε threshold. Although precise interpretation is more difficult in this case, recurrent structure is still clearly observed in the vertical columns of these plots.

Fig. 10

Example of recurrence plots, 30 s, electrode O1, subject S001. Dark dots show distances inside the ε neighborhood. The structure seen in plots of the RS matrix is much more precise than in the RX matrix plots. Each recurrence plot has the same recurrence rate = 0.04. a RS matrix, STFT column vectors, window size = 240 samples, ε = 10.23, eyes open, and b for the eyes closed condition with ε = 15.84. c RX matrix, embedding dimension m = 5, time delay τ = 9, ε = 0.21, eyes open and d for the eyes closed condition with ε = 0.36

Looking at the power spectra in specific time windows, we may identify the dominance of alpha states, combinations of theta with alpha, and short beta peaks. The separation of EEG into 4 bands would show a somewhat clearer picture. In the case of eyes closed, recurrence plots show the rise of relatively stable alpha peaks. Recurrence plots show many structures that can be interpreted, such as intermittent pulsation and slow oscillations that bring the system back into roughly the same states.

Recently Férat et al. [27] performed calculations on EEG in eyes open/closed conditions, segmenting microstates either in the broadband signal (1–30 Hz) or in four frequency bands. The separation into bands increased accuracy from 73 to 80%. The separation of our signals into four classical frequency bands would expand the feature space to 1280 components, and adding more RQA features would expand it to many thousands. Training such high-dimensional models would require a large number of subjects, while typical EEG experiments usually involve only a few tens of participants.

5 Conclusions and future work

We have been inspired by the excellent results of Bosl, Tager-Flusberg, and Nelson [30], who showed that RQA EEG features were very useful in the early diagnosis of autism. They used seven RQA features and two additional nonlinear features, computed separately for 6 frequency bands and only 19 sensors, for a total of 1026 features. The number of children diagnosed with autism was small; EEG was collected from 8 to 31 subjects, depending on their age. Moreover, it is not clear how they selected the parameters for the recurrence analysis. While promising, such results have to be carefully checked and replicated on a larger number of participants.

Our main goal in this paper was to show an alternative route to create recurrence matrices from EEG signals. We have compared two approaches that generate non-linear features, useful for the interpretation and classification of real EEG data. Both methods, TD-EMB-RQA and STFT-RQA, require careful selection of the neighborhood parameter ε for the construction of recurrence matrices. STFT has two parameters, the time-window length and the overlap of the sliding windows, but both are easy to determine. The embedding approach requires the selection of the embedding dimension and time delays. The optimal choice of these hyperparameters is not simple, and the interpretation of the results is not as straightforward as in the STFT case.

Our method of creating non-linear features using recurrence quantification analysis has important advantages over the standard approach based on signal embedding. We have used the short-time Fourier transform (STFT) with RQA to generate non-linear features. The STFT-RQA representation has several advantages for creating informative features for EEG time-series classification. We did not divide the signal into frequency bands, as is usually done. The results presented in Figs. 3, 4, 5, 6, 7 and 8, and in Table 1, show that features obtained from RQA using the STFT approach distinguish the two experimental conditions significantly better. This is confirmed by the classification results obtained using the linear SVM classifier, showing an average improvement over the embedding approach of as much as 30% when all feature/electrode components are used.

Table 1 Summary of the classification test results using LSVM with tenfold cross-validation (CV) and selection of features/electrodes by recursive feature elimination (RFE), for the STFT-RQA (a) and TD-EMB-RQA (b) methods

Equally significant is the insight into the data structure. We could identify a small number of electrodes and RQA features that extract information important for the classification of brain states. Recurrence plots may be interpreted using the language of dynamical systems. Periods of metastability are better delineated in the case of STFT (Fig. 10); in the eyes-closed condition they are much longer than in the eyes-open case. This is also seen in the histograms in Figs. 3, 4, 5 and 6. The stable alpha rhythm that dominates in the eyes-closed condition may be seen in the filtered alpha-band signal, but it is also seen in the recurrence plots using a broad frequency band. During the resting state, many activations appear in different parts of the brain, and the broadband power spectra analyzed here may include power peaks that shift between different bands. One of the most important reasons for using the STFT in our pipeline is the ability to identify peaks of the EEG power spectra that are meta-stable for a relatively long time.

Although we present here only a univariate analysis, we hope that recurrence analysis based on the STFT column vectors will provide more precise information about global brain states than is contained in the microstate dynamics [34]. An alternative to the feature-based approach analyzes two-dimensional time/frequency images using deep neural networks. It has been applied to various disorders, such as major depression, epilepsy, autism, and schizophrenia (see the review by Craik, He, and Contreras-Vidal [35]). We are confident that feature-based approaches can be applied to the same data, providing not only comparable accuracy but also an interpretation and explanation of results that deep networks are not yet capable of.

The time needed for STFT-RQA computations is longer than that needed for a single run of the TD-EMB-RQA approach. This is due to the large number of time windows (almost equal to the number of samples) used for the STFT calculations. The standard method based on embedding is faster only if the time needed to select optimized embedding dimensions and delays is not taken into account. The low sampling rate limits the accuracy of our power spectra to 2/3 Hz. For the comparison of the two methods of signal representation this was not important. However, small shifts of peak frequencies may have diagnostic value; for example, a shift of frequency peaks towards lower values in mild cognitive impairment may be used as a measure of progression toward Alzheimer’s disease [15, 36]. Therefore, higher sampling rates should be used when testing the usefulness of the STFT-RQA method in clinical applications. This will also allow testing how large the shifts of the sliding windows may be.

The results presented here may be improved in many ways, using more sophisticated classifiers, various feature selection methods [37], or adding more features. The weights obtained from the LSVM show which combinations of features and channels are most useful for discrimination. For example, we have found that the SVM coefficients corresponding to the TT feature are much larger than those of the L and DET features. This allows for a reduction of the number of features and EEG channels, and may serve as a basis for creating filters tailored to different applications, working with simpler EEG equipment.

More non-linear features that characterize neurodynamics may be generated and tested. As our goal was to show the advantages of this approach compared to the standard embedding procedure, we have restricted ourselves to five features only, although the PyRQA software we have used [29] generates up to 19 features. We can also add NOnLinear measures for Dynamical Systems (NOLDS). Sample entropy (SAMPEN) is frequently used to evaluate the complexity of information in EEG time series. Other potentially useful non-linear features include the correlation dimension (CORR-DIM), a measure of the fractal dimension of a time series that is also related to its complexity. The Hurst exponent (HURST-RS) is a measure of the “long-term memory” of a time series and can be used to determine long trends in the data. Detrended fluctuation analysis (DFA) estimates the Hurst parameter H; it is similar to the Hurst exponent, but DFA can also be used for non-stationary processes, where the mean and/or variance change over time [38]. Features generated from topological data analysis should also be tested [39].
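
Several of these measures are available in the nolds package; one possible way to collect such additional features is sketched below, with default, untuned parameter choices.

```python
import nolds

def extra_nonlinear_features(u, emb_dim=5):
    """Additional candidate features from the nolds package
    (default / illustrative parameters, not tuned for EEG)."""
    return {"SAMPEN": nolds.sampen(u),               # sample entropy
            "CORR-DIM": nolds.corr_dim(u, emb_dim),  # correlation dimension
            "HURST-RS": nolds.hurst_rs(u),           # Hurst exponent (rescaled range)
            "DFA": nolds.dfa(u)}                     # detrended fluctuation analysis
```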

As an alternative to the STFT approach, the Short-Time Padé Transform (STPT) can be used. In resting-state EEG microstate research, interesting data structures have been discovered using this technique: discrete, frequency-modulated oscillatory processes, called “oscillations” [34]. For a specific task, for example the diagnosis of a particular disease, our approach may be combined with feature selection methods, creating specific filters with a minimal number of features and electrodes. This should help to find features that will improve the generalization of results and help with their interpretation. Instead of using thresholds, instantaneous amplitude correlations (IAC) between the envelopes of EEG signals have recently been used [40]. To speed up the calculations, the influence of reduced overlap of the time windows will be checked, and the selection of features and channels performed.

The next important step is to test our approach on various datasets. Multivariate STFT signal representations should be used after reconstruction of the source signals. This will help to discover functional connections between subnetworks of sources with similar frequencies of power peaks in the same time windows. The goal is to reveal subsets of brain regions that form active subnetworks and to provide insight into brain processes at a more detailed level than microstate analysis provides. Fuzzy symbolic dynamics may then be used to visualize trajectories, providing complementary information to recurrence plots [41]. There are many open research pathways worthy of exploration.