1 Introduction

Electroencephalography (EEG) is a key tool in the diagnosis and treatment of brain disorders in the medical field, and the analysis of EEG signals is the most prevalent approach to extracting knowledge of brain dynamics. Visual inspection is not an appropriate method for accurate and reliable diagnosis and interpretation of such complex EEG data, as it is time-consuming, burdensome, dependent on expensive human resources, and prone to error and bias. Therefore, extensive research is needed to develop methods for automatically diagnosing brain disorders. Computer-aided diagnosis (CAD) systems have been used to perform automatic neurophysiological assessment for detecting abnormalities from EEG signal data. A CAD system comprises four primary components: data preprocessing, feature extraction, dimensionality reduction, and classification. Each of these components is illustrated in Fig. 1 and briefly described below.

Fig. 1

Workflow diagram of the comparative study

  • Data preprocessing: This component involves refining input data by eliminating noise and inconsistencies, enhancing data quality through techniques like denoising and normalization, and integrating diverse data sources to enable comprehensive analysis.

  • Feature extraction: This entails identifying and representing relevant characteristics within preprocessed EEG data. These characteristics can include frequency components, statistical measures, or other patterns that are essential for distinguishing different brain activities or conditions. Feature extraction plays a crucial role in preparing EEG data for effective classification algorithms and aiding in tasks like detecting seizures, sleep stages, or cognitive states.

  • Dimensionality reduction: This refers to techniques that reduce the number of EEG features while preserving essential information. This helps simplify analysis, improves model efficiency, and prevents overfitting, making it easier to classify brain activities or disorders accurately. It mainly includes two techniques (1) feature selection which focuses on choosing a subset of the most informative features from the original set while discarding less relevant ones. (2) feature reduction which focuses on transforming the original EEG features into a lower-dimensional representation, often using techniques like principal component analysis (PCA) or independent component analysis (ICA).

  • Classification: It refers to a machine learning task in which a model is trained to assign labels to data by learning from its inherent characteristics. Once trained, the model can leverage this acquired knowledge to label new, previously unlabelled data.

Of these components, feature extraction, selection, and classification are essential, while the others are optional.

For pre-processing, which is primarily responsible for suppressing noise in the recorded signals, several techniques have been used by researchers over the past few years. For a thorough analysis of methods employed for removing noise from EEG signals, readers may refer to (Jiang et al. 2019). The second and most crucial component of any CAD system is feature extraction, for which a wide range of methods (time-domain, frequency-domain, and time-frequency-domain methods) have been reported in the literature for classifying numerous brain disorders from EEG signals. Fourier transform (FT) based techniques were developed for frequency-domain analysis of EEG signals, but they do not provide time-domain information. Autoregressive (AR) methods are computationally efficient, but they have limitations that prevent their use in practical CAD systems. Thus, time-frequency signal-processing algorithms such as the discrete wavelet transform (Sharmila and Geethanjali 2016), tunable-Q wavelet transform (TQWT) (Anuragi and Sisodia 2017), empirical mode decomposition (Thilagaraj and Rajasekaran 2019), empirical wavelet transform (Anuragi and Sisodia 2020), and Fourier-Bessel series expansion-based empirical wavelet transform (FBSE-EWT) (Bhattacharyya et al. 2018) have been used to extract time- and frequency-domain features for analyzing EEG signals associated with several brain disorders. Some commonly used non-linear features extracted from decomposed EEG signals are Hjorth parameters (Mert and Akan 2018), approximate entropy (Krishnan et al. 2020), Renyi entropy (Sharma et al. 2014), line length entropy (Esteller et al. 2001), and norm entropy (Anuragi et al. 2020). The standard approach involves extracting hand-crafted features from a large number of sub-band signals obtained by advanced wavelet methods from multi-channel EEG recordings. This results in a high-dimensional feature vector.

1.1 Dimension reduction

Dealing with high-dimensional data can pose various challenges for AI-based machine learning (ML) models, affecting their ability to accurately classify, recognize patterns, and visualize information. Consequently, this paper emphasizes the importance of dimension reduction.

  • Significance of dimension reduction: Learning with high-dimensional features can become difficult due to high computational complexity. The curse of dimensionality refers to the fact that, if the amount of data used to train a model is fixed, increasing the dimensionality can lead to overfitting, which lowers classification success rates. In principle, this issue can be resolved by collecting exponentially more data for each additional dimension, but this is rarely possible; hence, feature dimensionality reduction has been adopted by many researchers.

  • Feature projection versus selection: Dimensionality reduction involves two approaches: feature selection and feature reduction. Feature selection refers to picking out the most relevant aspects of the original signal, while feature reduction transforms the EEG data into a lower-dimensional representation (Li et al. 2017).

This review study exclusively focuses on reduction techniques because of the prevalence of highly correlated features in EEG signals, stemming from the intricate nature of brain activity. Feature reduction methods play a pivotal role in this context as they not only mitigate multicollinearity caused by numerous electrodes but also excel at noise filtration, amplifying the importance of key EEG components. Ultimately, this emphasis on feature reduction is driven by its potential to significantly boost the accuracy of classification models. Numerous researchers have shifted their attention towards dimensionality reduction, as illustrated in the overview of the relevant literature.

In the studies (Razzak et al. 2019; Sadiq et al. 2019; Peng et al. 2021; Peng et al. 2020), several feature combinations were evaluated for classification enhancement by reducing the dimension of a large feature matrix. Furthermore, a number of dimension-reduction techniques (Van Der Maaten et al. 2009), including feature selection and feature transformation (projection), have been used to select the most effective features for classifying EEG signals. In the study (Zhang et al. 2019), the authors examined six dimensionality reduction algorithms, namely ICA, isometric feature mapping (ISOMAP), PCA, kernel PCA (K-PCA), locally linear embedding (LLE), and Laplacian eigenmaps (LE), to reduce the dimension of the features. These features were then evaluated with least squares support vector machine (LS-SVM) classifiers. The findings demonstrated that, compared to the other methods, ICA gave the highest classification accuracy for classifying epilepsy-seizure EEG signals. In (You et al. 2020), the authors presented work on motor imagery classification using the flexible analytic wavelet transform (FAWT) method, where time-frequency features were extracted, reduced using multidimensional scaling (MDS), PCA, K-PCA, LLE, and LE, and then classified using a linear discriminant analysis (LDA) classifier; the MDS reduction method achieved the highest classification accuracy of 94.29%. For EEG-based focal detection, Raghu and Sriraam (2018) extracted 23 sets of time, frequency, statistical, and time-frequency features, which were further reduced using neighborhood component analysis (NCA) and classified with a support vector machine (SVM), achieving the highest classification accuracy of 96.1%. To reduce the features derived from the bispectrum of focal and non-focal EEG signals, Sharma et al. (2019) utilized the locality-sensitive discriminant analysis (LSDA) data reduction technique. These features were then passed to SVM classifiers for performance evaluation and achieved 96.2% accuracy. In the study (Jiang et al. 2022), the authors employed a convolutional autoencoder (CAE) for deep feature extraction and dimensionality reduction for children's focal epilepsy EEG classification. Akbari et al. (2021) used the forward selection algorithm (FSA) to reduce the geometrical features derived from phase space dynamic (PSD) analysis for the classification of schizophrenia EEG signals; the k-NN classifier attained the highest classification accuracy of 94.80%. In another study, Prabhakar et al. (2020) extracted nine different non-linear features, which resulted in high-dimensional features. Four optimization algorithms, namely artificial flora (AF) optimization, glowworm search (GS) optimization, black hole (BH) optimization, and monkey search (MS) optimization, were used to determine an optimal number of features to improve SVM classifier performance, resulting in a maximum accuracy of 92.17%. Mumtaz et al. (2016) computed absolute power (AP) and relative power (RP) features from multi-channel alcoholic and non-alcoholic EEG signals using a fast Fourier transform (FFT), yielding 133 high-dimensional features. PCA was used to retain the most significant features, which were then classified using the logistic model trees classifier, achieving a maximum accuracy of 96%. Patidar et al. (2017) proposed a framework for alcoholic EEG signal classification in which centered correntropy features were extracted using the TQWT method, and the dimension was reduced using PCA before being fed to LS-SVM classifiers for performance evaluation, achieving the highest classification accuracy of 97.02%. A framework for EEG-based major depressive disorder (MDD) classification was developed by Saeedi et al. (2020), which uses a genetic algorithm (GA) to select significant features from feature vectors produced by sample entropy and approximate entropy on decomposed signals derived from wavelet packet coefficients. The proposed framework achieved 94.28% accuracy using enhanced k-NN. The authors in (Mahato and Paul 2019) investigated linear and non-linear features for classifying MDD EEG signals; after both feature types were combined, PCA was used to effectively reduce the feature dimension. The radial basis function network (RBFN) classifier was then applied to the reduced features and achieved the highest accuracy of 93.33%. Another study on MDD classification was carried out by Raghavendra et al. (2023), where high-dimensional features were extracted using a continuous wavelet transform (CWT). Thereafter, K-PCA and PCA techniques were employed to reduce the dimension, and the resulting features were evaluated using an SVM classifier. The highest classification accuracy was 99.33% with the K-PCA technique. The authors in (Ray et al. 2021) reviewed various feature reduction techniques for high-dimensional data analysis. Face recognition is another area where dimension reduction techniques have been examined (Gu et al. 2016).

1.2 Major contributions

The majority of the methods for classifying brain disorders from EEG signals discussed above focus on feature extraction using various wavelet transform techniques. All of these methods increase computational complexity, and because of the high dimensionality of the resulting features, the extracted features are rarely chosen carefully. As the dimension of the features increases, so does the classification complexity. More work on dimensionality reduction is needed to overcome these drawbacks and achieve better classification results. A few authors have attempted to address this issue, but most of them used traditional PCA or K-PCA (Mumtaz et al. 2016; Raghavendra et al. 2023; Zhang et al. 2019). Furthermore, most studies in the literature investigated reduction techniques on only one EEG dataset. Thus, in this study, we conducted an extensive empirical review of the effectiveness of 23 different projection techniques on five different EEG signal-based brain disorder datasets (shown in Fig. 1). The objective of this study is to help end users select the most appropriate feature projection technique for their application.

The key findings and contributions of this review study are summarized as follows:

  • To the best of our knowledge, this article presents the first empirical review of existing reduction techniques employed on diverse EEG datasets, thereby enhancing the versatility of this study.

  • A comprehensive review of 23 individual and combinational projection techniques for high-dimensionality features derived from EEG is conducted here.

  • These techniques are evaluated using three performance metrics: average classification accuracy, the number of reduced features, and the dimensionality reduction rate (DRR).

  • The key findings of the empirical review are discussed and summarized in the form of tables and plots.

  • The study recommends PCA+t-SNE, which outperforms the other studied techniques in mitigating the curse of dimensionality of classifiers across the EEG datasets considered.

The paper’s remaining sections are structured in the following manner: Sect. 1 provides an initial overview of the necessary dimensionality reduction. Section 2 offers a brief introduction to a variety of EEG datasets that have been considered. Detailed descriptions of the linear and non-linear projection techniques, the classifiers employed, and the performance evaluation metrics can be found in Sect. 3 and Sect. 4, respectively. Section 5 presents a comprehensive analysis of the results and discussions regarding the performance of each projection technique. Finally, Sect. 6 concludes the paper.

2 Experimental datasets

As depicted in Fig. 1, the first step of the study concerns the datasets on which the projection techniques are assessed. In this study, three traits are taken into consideration: the type of observations, the dimensionality, and the intrinsic dimension ratio. These traits follow previous surveys and papers on dimensionality reduction. All the traits are described below.

  • # Observation (N): The variable N represents the number of observations in the dataset. Three distinct categories are established based on N:

    • Small: When N is less than 1000 in the dataset.

    • Medium: When N is greater than 1000 and less than or equal to 4000 in the dataset.

    • Large: When N is greater than 4000 in the dataset.

    These ranges correspond to the dataset sizes that are commonly used in research papers to evaluate projection methods.

  • # Dimensionality (d): The variable d represents the dimensionality of the dataset, indicating the number of features. In this study, three distinct categories are established:

    • Low: When d is less than 100 in the dataset.

    • Medium: When d is greater than 100 and less than or equal to 500 in the dataset.

    • High: When d is greater than 500 in the dataset.

  • # Intrinsic dimension ratio (\(\sigma _{d}\)): It is the ratio of the number of principal components required by PCA to explain 95% of the data variance to the total number of features (d). This ratio ranges between 0 and 1. Higher values typically indicate that a projection will have difficulty mapping the data. Three ranges are defined here (a minimal sketch of computing this ratio follows the list):

    • Low: When \(\sigma _{d}\) is less than 0.1.

    • Medium: When \(\sigma _{d}\) is greater than 0.1 and less than or equal to 0.5.

    • High: When \(\sigma _{d}\) is greater than 0.5 and less than or equal to 1.
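The following minimal Python sketch (an illustration, not the study's code) shows how \(\sigma _{d}\) can be computed with scikit-learn's PCA on a placeholder \(N\times d\) feature matrix `D`; the 0.95 variance threshold follows the definition above, and the data and sizes are hypothetical.

```python
# Sketch: intrinsic dimension ratio sigma_d = (#PCs explaining 95% variance) / d
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
D = rng.standard_normal((1200, 300))      # placeholder: N=1200 observations, d=300 features

pca = PCA(n_components=0.95)              # keep components explaining 95% of the variance
pca.fit(D)
sigma_d = pca.n_components_ / D.shape[1]  # ratio in [0, 1]
print(f"sigma_d = {sigma_d:.2f}")         # e.g. "medium" if 0.1 < sigma_d <= 0.5
```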

Table 1 Description of the five considered datasets and their pre-processing approaches
Table 2 Statistics of the binary-class classification datasets considered for the study after pre-processing

In this study, we have chosen a set of five EEG datasets on which we have already performed pre-processing and feature extraction, obtaining high-dimensional feature vectors that suffer from the dimensionality problem (Anuragi 2023). A description of the five datasets is provided in Table 1. For analyzing EEG signals in the majority of the datasets, the FBSE-EWT (Bhattacharyya et al. 2018) method was used. For schizophrenia EEG signal classification, a multivariate FBSE-EWT (MFBSE-EWT) method was developed as an extension of the FBSE-EWT method to multiple channels. The TQWT method was used in depression detection to decompose EEG signals into sub-band signals. After decomposing the EEG signals, various features were computed to achieve the highest classifier performance. The features computed from each dataset are shown in Table 1. The dimensionality and number of observations of the obtained features, along with their types, are depicted in Table 2 for each dataset.

3 Projection techniques

The process of dimension reduction (DR) involves projecting high-dimensional data into low-dimensional data. Different projection techniques are now frequently used in numerous applications due to the increase in the dimensionality of the features. DR techniques preserve as much of the original information of the data as possible while projecting the original high-dimensional data into a new low-dimensional dataset.

The problem of the "curse of dimensionality" can be mitigated by the new low-dimensional representation of the original dataset. Table 3 lists the twenty-three projection techniques reviewed in this article, together with their linearity type, learning type, neighborhood, computational complexity, tuning-parameter requirements, and topology. Here, the linearity type is grouped into linear and non-linear. Linear techniques are simple to implement and understand, but they cannot capture sample distributions spread over complex manifolds in the original high-dimensional space, whereas non-linear projections perform better on such datasets but their parameters are more difficult to control. Similarly, the learning type is grouped into supervised and unsupervised: supervised techniques use the label information, while unsupervised techniques project the original data without labels. Projection techniques aim to preserve one of two types of neighborhood structure, local or global; the categorization of the 23 techniques based on neighborhood is given in Table 3. Local neighborhood methods attempt to retain the distances between points and their neighbors; this may help distinguish clusters, but the distances between clusters in the projected space become less meaningful (Van der Maaten and Hinton 2008). Global methods, by contrast, try to maintain the pairwise distances between all points, which may yield more accurate projections of the high-dimensional space but exhibit less effective cluster separation (Frey and Pimentel 1978). The computational complexity of each technique, in \(\mathcal {O}(\cdot )\) notation as a function of N and d, is also shown in Table 3. Low-complexity techniques work best for interactive visual exploration, but they may struggle to produce reliable results.

Here is a concise explanation of the diverse DR projection methods explored in this study. They are primarily categorized into two types: linear and non-linear. Each type is described in detail in the following section:

3.1 Linear types

3.1.1 PCA

Pearson (1901) introduced PCA, and Hotelling (1933) developed it further. This research investigates various forms of PCA, including randomized PCA (Feng et al. 2018), sparse PCA (Zou et al. 2006), incremental PCA (Ross and Lim 2008), and kernel PCA (Schölkopf et al. 2005), offering concise descriptions of each. They are all unsupervised learning techniques that use orthogonal transformations to obtain a new set of uncorrelated variables. The basic algorithmic steps involved in PCA are demonstrated in Algorithm 1.

  • Randomized PCA: It is an approximation of traditional PCA that employs random sampling techniques to select a subset of data, making it computationally more efficient for large datasets while still providing a close approximation of the principal components and their variances.

  • Sparse PCA: It is an extension of traditional PCA that enforces sparsity in the loading coefficients, encouraging a solution where most coefficients are zero. This promotes a more interpretable and concise representation of the data’s principal components.

  • Incremental PCA: This is employed for processing EEG data with large dimensionality by handling it in batches, facilitating the analysis of high-dimensional EEG datasets that might not fit into memory at once. This makes incremental PCA suitable for managing the complexities of EEG data in real-time or streaming applications.

Algorithm 1: PCA

Step 1:

Get the data D of \(N\times d\) matrix

Step 2:

Compute the covariance matrix

Step 3:

Compute the eigenvectors and eigenvalues of the covariance matrix

Step 4:

Choosing principal components and forming a feature vector

Step 5:

Deriving the new projected data \(D_{n}\)
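As a complement to Algorithm 1, the following minimal Python sketch (an illustration, not the study's code) applies scikit-learn's PCA and the variants named above to a placeholder \(N\times d\) feature matrix; the choice of 10 components is arbitrary.

```python
# Sketch: classical, randomized, sparse, and incremental PCA on placeholder data
import numpy as np
from sklearn.decomposition import PCA, SparsePCA, IncrementalPCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
D = rng.standard_normal((1000, 200))                  # placeholder N x d feature matrix
D = StandardScaler().fit_transform(D)                 # center/scale before the covariance step

Dn_pca = PCA(n_components=10).fit_transform(D)                              # Algorithm 1
Dn_rand = PCA(n_components=10, svd_solver="randomized",
              random_state=0).fit_transform(D)                              # randomized PCA
Dn_sparse = SparsePCA(n_components=10, random_state=0).fit_transform(D)     # sparse loadings
Dn_inc = IncrementalPCA(n_components=10, batch_size=200).fit_transform(D)   # batch-wise PCA
print(Dn_pca.shape, Dn_rand.shape, Dn_sparse.shape, Dn_inc.shape)           # each (1000, 10)
```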

3.1.2 LDA

In this study, LDA is the only supervised learning method examined. It captures global geometrical information while ignoring the geometrical variation of local data points within the same class (Balakrishnama and Ganapathiraju 1998). The basic goal of LDA is to determine a linear transformation matrix that maps high-dimensional data to low-dimensional data by following the fundamental steps in Algorithm 2.

Table 3 Conceptual comparison of feature projection algorithms

Algorithm 2: LDA

Step 1:

Get the data D of \(N\times d\) matrix and label of \(N\times 1\).

Step 2:

Compute each class’s mean vector

Step 3:

Compute the total mean vector

Step 4:

Compute within-class scatter

Step 5:

Compute between-class scatter

Step 6:

Compute eigenvectors with corresponding eigenvalues sorted in non-increasing (descending) order.

Step 7:

Deriving the new projected data \(D_n\)
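A minimal Python sketch of the supervised LDA projection with scikit-learn is shown below; the placeholder data, labels, and the single output dimension (LDA yields at most C-1 components for C classes) are illustrative assumptions.

```python
# Sketch: supervised LDA projection for a binary-class placeholder dataset
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
D = rng.standard_normal((600, 120))        # placeholder N x d feature matrix
y = rng.integers(0, 2, size=600)           # placeholder binary labels (N x 1)

lda = LinearDiscriminantAnalysis(n_components=1)  # scatter matrices solved internally (Steps 2-6)
Dn = lda.fit_transform(D, y)                      # Step 7: projected data
print(Dn.shape)                                   # (600, 1)
```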

3.1.3 NMF

In 1994, Paatero and Tapper introduced NMF, an unsupervised linear reduction technique, which was later popularised by Lee and Seung (2000). NMF factorizes a non-negative data matrix into the product of two lower-rank matrices W and H such that their product approximates the original data. The values of both matrices are then updated iteratively, with t as the iteration index, so that their product gets closer to the original data. The technique preserves the data's structure and ensures that the weights and basis vectors are non-negative. NMF stops after a fixed number of iterations or when the approximation error converges. The basic algorithm of NMF is illustrated in Algorithm 3.

Algorithm 3: NMF

Step 1:

Get the data D of \(N\times d\) matrix and label of \(N\times 1\).

Step 2:

Initialize two non-negative factor matrix

Step 3:

Update \(W\left( t\right) =update(D,H\left( t-1\right) ,W(t-1))\)

Step 4:

Update \({H(t)}^T=update(D^T,\ {W\left( t\right) }^T,{H(t-1)}^T)\)

Step 5:

Repeat Steps 3 and 4 until the stopping criterion is reached.
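A minimal Python sketch of NMF with scikit-learn follows; since NMF requires non-negative input, the placeholder matrix is made non-negative here with min-max scaling, and the factorization rank is an illustrative choice.

```python
# Sketch: non-negative matrix factorization D ≈ W @ H on placeholder data
import numpy as np
from sklearn.decomposition import NMF
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
D = MinMaxScaler().fit_transform(rng.standard_normal((500, 150)))  # non-negative N x d matrix

nmf = NMF(n_components=10, init="nndsvda", max_iter=500, random_state=0)
W = nmf.fit_transform(D)       # N x r factor: the reduced representation
H = nmf.components_            # r x d factor, updated iteratively until convergence
print(W.shape, H.shape)        # (500, 10) (10, 150)
```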

3.1.4 F-ICA

F-ICA, also referred to as FastICA, is a linear unsupervised reduction technique introduced in (Hyvarinen 1999). Like the majority of ICA algorithms, F-ICA uses a fixed-point iteration scheme that orthogonally rotates the whitened data to maximize the non-Gaussianity of the rotated components. Empirical tests have demonstrated that the F-ICA fixed-point iteration scheme is 10–100 times faster than gradient-based ICA and does not require choosing a step size. Formally, the algorithmic process is described in Algorithm 4.

Algorithm 4: F-ICA

Step 1:

Get the data D of \(N\times d\) matrix and label of \(N\times 1\).

Step 2:

Center and whiten the matrix D

Step 3:

Initializing weight vector randomly

Step 4:

Updating weight vector

Step 5:

Normalize the updated weight vector (repeat Steps 4 and 5 until the weight vector converges)
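A minimal Python sketch of FastICA with scikit-learn is given below; whitening and the fixed-point iterations of Algorithm 4 are handled internally, and the number of components is an illustrative choice applied to placeholder data.

```python
# Sketch: FastICA extracting independent components as reduced features
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
D = rng.standard_normal((800, 100))       # placeholder N x d feature matrix

ica = FastICA(n_components=10, max_iter=500, random_state=0)
Dn = ica.fit_transform(D)                 # estimated independent components
print(Dn.shape)                           # (800, 10)
```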

3.1.5 FA

FA (Lee and Seung 2000), an unsupervised linear reduction technique, aims to model the observed variables and their covariance matrix in terms of a smaller set of underlying factors. The fundamental steps involved in FA are illustrated in Algorithm 5.

Algorithm 5: FA

Step 1:

Get the data D of \(N\times d\) matrix and label of \(N\times 1\).

Step 2:

Construct the initial matrix

Step 3:

Constructing correlation matrix

Step 4:

Compute Eigenvalues

Step 5:

Determine the number of factors

Step 6:

Compute the factor load matrix

Step 7:

Estimable factor analysis model
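A minimal Python sketch of factor analysis with scikit-learn follows; the number of factors (Step 5) is fixed here for illustration rather than chosen by an eigenvalue criterion, and the data are placeholders.

```python
# Sketch: factor analysis returning factor scores and the loading matrix
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
D = rng.standard_normal((700, 80))        # placeholder N x d feature matrix

fa = FactorAnalysis(n_components=10, random_state=0)
Dn = fa.fit_transform(D)                  # factor scores: the reduced representation
loadings = fa.components_                 # factor loading matrix (Step 6)
print(Dn.shape, loadings.shape)           # (700, 10) (10, 80)
```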

3.1.6 LPP

LPP is an unsupervised linear dimensionality reduction technique based on a linear approximation of the non-linear Laplacian eigenmap (He and Niyogi 2003). Utilizing the graph Laplacian, a transformation matrix is computed that maps the data points to a subspace. In a certain sense, this linear transformation optimally preserves local neighborhood information. Formally, the algorithmic process is described in Algorithm 6.

Algorithm 6: LPP

Step 1:

Get the data D of \(N\times d\) matrix and label of \(N\times 1\).

Step 2:

Construct the adjacency graph

Step 3:

Select the weight

Step 4:

Compute the eigenvector and eigenvalues

Step 5:

Deriving the new projected data \(D_n\)
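LPP is not available in scikit-learn, so the following from-scratch numpy/scipy sketch follows Algorithm 6 under illustrative assumptions (k-NN adjacency graph, heat-kernel weights with width t, and a small ridge term for numerical stability); it is a sketch of the technique, not the implementation used in the reviewed studies.

```python
# Sketch: locality preserving projections via a generalized eigenproblem
import numpy as np
from scipy.linalg import eigh
from sklearn.neighbors import kneighbors_graph

def lpp(D, n_components=5, k=10, t=1.0):
    # Step 2: k-nearest-neighbour adjacency graph with pairwise distances as entries
    A = kneighbors_graph(D, n_neighbors=k, mode="distance").toarray()
    A = np.maximum(A, A.T)                               # symmetrize the graph
    # Step 3: heat-kernel weights on connected pairs
    W = np.where(A > 0, np.exp(-A ** 2 / t), 0.0)
    Deg = np.diag(W.sum(axis=1))                         # degree matrix
    L = Deg - W                                          # graph Laplacian
    # Step 4: generalized eigenproblem  D^T L D a = lambda D^T Deg D a
    M1 = D.T @ L @ D
    M2 = D.T @ Deg @ D + 1e-6 * np.eye(D.shape[1])       # ridge for stability
    vals, vecs = eigh(M1, M2)                            # eigenvalues in ascending order
    # Step 5: project on the eigenvectors with the smallest eigenvalues
    return D @ vecs[:, :n_components]

rng = np.random.default_rng(0)
D = rng.standard_normal((300, 40))                       # placeholder N x d feature matrix
print(lpp(D).shape)                                      # (300, 5)
```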

3.2 Non-linear types

3.2.1 Kernel PCA

In kernel PCA, rather than computing the covariance matrix, the principal eigenvectors of the kernel matrix are calculated; this property makes it appropriate for non-linear data mapping (Schölkopf et al. 2005).
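A minimal Python sketch of kernel PCA with scikit-learn is shown below; the RBF kernel and gamma value are illustrative choices applied to placeholder data, not the settings of the original study.

```python
# Sketch: kernel PCA with an RBF kernel (eigenvectors of the kernel matrix)
import numpy as np
from sklearn.decomposition import KernelPCA

rng = np.random.default_rng(0)
D = rng.standard_normal((500, 100))       # placeholder N x d feature matrix

kpca = KernelPCA(n_components=10, kernel="rbf", gamma=0.01)
Dn = kpca.fit_transform(D)                # non-linear projection
print(Dn.shape)                           # (500, 10)
```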

3.2.2 t-SNE

t-SNE (Hinton and Roweis 2002), an unsupervised non-linear reduction technique, was presented by Hinton and Roweis. By matching pairwise similarity distributions between the high- and low-dimensional spaces, t-SNE captures the majority of the local structure of high-dimensional data while also revealing global structure, and it can work with manifold learning. The fundamental steps involved in t-SNE are illustrated in Algorithm 7.

Algorithm 7: t -SNE

Step 1:

Get the data D of \(N\times d\) matrix and label of \(N\times 1\).

Step 2:

Compute pairwise affinities under taken perplexity

Step 3:

Initialize new data \(D_n\) randomly

Step 4:

Compute low-dimensional affinities

Step 5:

Compute gradient by minimizing the cost function

Step 6:

Deriving the new projected data \(D_n\).
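A minimal Python sketch of t-SNE with scikit-learn follows; the perplexity and the 3-D target dimension are illustrative choices on placeholder data (the default Barnes-Hut implementation supports at most three output dimensions).

```python
# Sketch: t-SNE embedding via pairwise-affinity matching and KL-cost gradient descent
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
D = rng.standard_normal((800, 150))       # placeholder N x d feature matrix

tsne = TSNE(n_components=3, perplexity=30, init="pca", random_state=0)
Dn = tsne.fit_transform(D)                # Steps 2-6 of Algorithm 7
print(Dn.shape)                           # (800, 3)
```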

3.2.3 UMAP

UMAP produces a topological representation of high-dimensional data by patching together local manifold approximations and fuzzy simplicial set representations. An equivalent topological representation can be constructed from a low-dimensional representation of the data. UMAP then optimizes the layout of the data representation in low-dimensional space to minimize the cross-entropy between the two topological representations. The basic algorithm of UMAP is depicted in Algorithm 8, and for the detailed background of UMAP, readers may refer to (McInnes et al. 2018).

Algorithm 8: UMAP

Step 1:

Get the data D of \(N\times d\) matrix and label of \(N\times 1\).

Step 2:

Initialize embedding

Step 3:

Compute Local Fuzzy simplicial set

Step 4:

Compute probabilities of the points being nearest neighbors

Step 5:

Optimize embedding
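A minimal Python sketch is given below, assuming the third-party umap-learn package is installed (it is not part of scikit-learn); n_neighbors and min_dist are illustrative hyperparameters applied to placeholder data.

```python
# Sketch: UMAP embedding via fuzzy simplicial sets and cross-entropy layout optimization
import numpy as np
import umap

rng = np.random.default_rng(0)
D = rng.standard_normal((800, 150))       # placeholder N x d feature matrix

reducer = umap.UMAP(n_components=3, n_neighbors=15, min_dist=0.1, random_state=0)
Dn = reducer.fit_transform(D)             # Steps 2-5 of Algorithm 8
print(Dn.shape)                           # (800, 3)
```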

3.2.4 MDS

MDS, an unsupervised non-linear DR technique that retains a similarity measure between pairs of data points, was presented by Kruskal and Wish (Carroll and Arabie 1998). It has been utilized for exploratory analysis, multivariate analysis, and data visualization. To transform the data, MDS optimizes a stress function, defined as the sum of squared errors between the dissimilarities and the corresponding inter-vector distances in the embedding. The basic steps of MDS are shown in Algorithm 9. A superset of MDS, known as non-metric MDS (NMDS), also exists. Unlike metric MDS, non-metric MDS finds a non-parametric monotonic relationship between the dissimilarities in the item-item matrix, the Euclidean distances between items, and the location of each item in the low-dimensional space. Both metric and non-metric MDS are examined in this study using Manhattan and Euclidean distance matrices, denoted MDS-E, MDS-M, NMDS-E, and NMDS-M (a combined sketch of all four variants follows Algorithm 9).

  • Metric multidimensional scaling with Euclidean distance (MDS-E): MDS-E focuses on transforming data into a lower-dimensional space while preserving the pairwise Euclidean distances between data points. It is effective for capturing linear relationships within the data.

  • Metric multidimensional scaling with Manhattan distance (MDS-M): MDS-M, similar to MDS-E, aims to represent data in a lower-dimensional space, but it preserves pairwise Manhattan distances instead of Euclidean distances. This is particularly useful when dealing with data where Manhattan distances are more appropriate, such as in city-block distance metrics.

  • Non-metric multidimensional scaling with Euclidean distance (NMDS-E): NMDS-E is a non-metric variant of multidimensional scaling that focuses on preserving the rank order of pairwise Euclidean distances rather than the actual distances. It is robust to outliers and suitable for data where exact distances may not be meaningful.

  • Non-metric multidimensional scaling with Manhattan distance (NMDS-M): NMDS-M, similar to NMDS-E, is a non-metric approach but with Manhattan distances. It is suitable for situations where the rank order of Manhattan distances is more relevant than the exact distances, offering flexibility in capturing the underlying structure of the data without assuming a linear relationship.

Algorithm 9: MDS

Step 1:

Get the data D of \(N\times d\) matrix and label of \(N\times 1\).

Step 2:

Compute centering matrix

Step 3:

Determining the largest eigenvalues and corresponding eigenvectors

Step 4:

Compute the square root of the dot product of the matrix of eigenvectors and the diagonal matrix of eigenvalues

Step 5:

Deriving the new projected data \(D_n\)
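The following minimal Python sketch illustrates the four MDS variants with scikit-learn; the Manhattan-distance versions pass a precomputed dissimilarity matrix, and all settings and data are illustrative assumptions rather than the study's configuration.

```python
# Sketch: metric/non-metric MDS with Euclidean and Manhattan dissimilarities
import numpy as np
from sklearn.manifold import MDS
from sklearn.metrics import pairwise_distances

rng = np.random.default_rng(0)
D = rng.standard_normal((300, 60))                     # placeholder N x d feature matrix
D_man = pairwise_distances(D, metric="manhattan")      # city-block dissimilarity matrix

mds_e  = MDS(n_components=3, metric=True,  random_state=0).fit_transform(D)      # MDS-E
nmds_e = MDS(n_components=3, metric=False, random_state=0).fit_transform(D)      # NMDS-E
mds_m  = MDS(n_components=3, metric=True,  dissimilarity="precomputed",
             random_state=0).fit_transform(D_man)                                 # MDS-M
nmds_m = MDS(n_components=3, metric=False, dissimilarity="precomputed",
             random_state=0).fit_transform(D_man)                                 # NMDS-M
print(mds_e.shape, nmds_e.shape, mds_m.shape, nmds_m.shape)
```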

3.2.5 LLE

LLE is a local, graph-based approach that primarily preserves the local structure of the data by reconstructing each point as a convex linear combination of its neighbors (Roweis and Saul 2000). Modified LLE (M-LLE), a variant of LLE that uses multiple linearly independent local weights, was introduced by Zhang and Wang (2006). Algorithm 10 provides the fundamental steps of the LLE technique.

  • Modified LLE (M-LLE): It differs from basic LLE by incorporating adjustments for improved stability. It preserves local relationships by identifying neighborhoods, computing weights for linear combinations of neighbors, and optimizing a lower-dimensional representation with modifications to enhance robustness.

Algorithm 10: LLE

Step 1:

Get the data D of \(N\times d\) matrix and label of \(N\times 1\).

Step 2:

Compute the neighbors of each data point \(D\left( i\right)\).

Step 3:

Compute the weights W that best reconstruct each data point \(D\left( i\right)\) from its neighbors by minimizing the reconstruction error rate.

Step 4:

Compute the vector \(D_n(i)\) best reconstructed by the weights, minimizing the quadratic form by its bottom non-zero eigenvectors.
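A minimal Python sketch of LLE and modified LLE with scikit-learn follows; the neighborhood size is an illustrative assumption on placeholder data (modified LLE requires the number of neighbors to exceed the number of output components).

```python
# Sketch: standard and modified locally linear embedding
import numpy as np
from sklearn.manifold import LocallyLinearEmbedding

rng = np.random.default_rng(0)
D = rng.standard_normal((500, 80))        # placeholder N x d feature matrix

lle  = LocallyLinearEmbedding(n_components=5, n_neighbors=12, method="standard")
mlle = LocallyLinearEmbedding(n_components=5, n_neighbors=12, method="modified")
Dn_lle, Dn_mlle = lle.fit_transform(D), mlle.fit_transform(D)   # Steps 2-4 of Algorithm 10
print(Dn_lle.shape, Dn_mlle.shape)                              # (500, 5) (500, 5)
```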

3.2.6 ISOMAP

ISOMAP is a well-known non-linear DR technique for determining the intrinsic structure of the data through manifold learning. ISOMAP (Tenenbaum et al. 2000) does not learn the embedding directly in the target space. Instead, it explicitly models non-linear relationships between nearby points in terms of geodesic distances, which are learned by linearly approximating the non-linear manifold. The essential steps of ISOMAP are demonstrated in Algorithm 11.

Algorithm 11: ISOMAP

Step 1:

Get the data D of \(N\times d\) matrix and label of \(N\times 1\).

Step 2:

Compute an undirected k-neighborhood graph by connecting each point \(D(i)\) to the k points with the smallest dissimilarity, using this dissimilarity as the edge weight.

Step 3:

By calculating the shortest paths through the k-neighborhood graph, the geodesic distances matrix is determined.

Step 4:

Derive the new projected data \(D_n\) using the geodesic distance matrix and metric MDS, as shown in Algorithm 9.
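A minimal Python sketch of ISOMAP with scikit-learn is shown below; the neighborhood size k is an illustrative choice applied to placeholder data.

```python
# Sketch: ISOMAP (k-neighborhood graph -> geodesic distances -> MDS step)
import numpy as np
from sklearn.manifold import Isomap

rng = np.random.default_rng(0)
D = rng.standard_normal((500, 80))        # placeholder N x d feature matrix

iso = Isomap(n_components=5, n_neighbors=10)
Dn = iso.fit_transform(D)                 # Steps 2-4 of Algorithm 11
print(Dn.shape)                           # (500, 5)
```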

3.2.7 SOM

Teuvo Kohonen presented SOM in (Kohonen 1990), and it has since become one of the most popular non-linear unsupervised neural network algorithms for tasks like clustering, dimensionality reduction, and feature detection. In SOM, the dissimilarity of two instances in a data set with mixed-type features can be assessed separately for the numerical and categorical features. For numerical features, the dissimilarity can be calculated using the squared Euclidean distance, while for categorical features, the number of mismatches is used. Normalization is typically performed prior to computing the distance matrix to ensure that each feature has an equal impact on the distance. The fundamental steps for applying SOM after this pre-processing are shown in Algorithm 12.

Algorithm 12: SOM

Step 1:

Get the data D of \(N\times d\) matrix and label of \(N\times 1\).

Step 2:

Initializing weight vector and initial winner neighborhood

Step 3:

Draw random input vector from D

Step 4:

Determine the winning neighborhood that has a weight vector closest to the D

Step 5:

Update the weight vector (repeat from step 3 until the feature map stops changing)
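SOM-based projection is not part of scikit-learn, so the compact numpy sketch below implements Algorithm 12 from scratch and maps each sample to the grid coordinates of its best-matching unit as a simple 2-D projection; the grid size, learning-rate schedule, and data are illustrative assumptions.

```python
# Sketch: a minimal rectangular SOM trained with Algorithm 12's steps
import numpy as np

def train_som(D, grid=(10, 10), n_iter=2000, lr0=0.5, sigma0=3.0, seed=0):
    """Train a SOM on D (N x d) and return the weight grid."""
    rng = np.random.default_rng(seed)
    n, d = D.shape
    W = rng.random((grid[0], grid[1], d))                       # Step 2: random initial weights
    gx, gy = np.meshgrid(np.arange(grid[0]), np.arange(grid[1]), indexing="ij")
    coords = np.stack([gx, gy], axis=-1).astype(float)          # neuron grid coordinates
    for t in range(n_iter):
        x = D[rng.integers(n)]                                  # Step 3: draw a random input vector
        bmu = np.unravel_index(np.argmin(np.linalg.norm(W - x, axis=-1)), grid)  # Step 4: winner
        lr = lr0 * np.exp(-t / n_iter)                          # decaying learning rate
        sigma = sigma0 * np.exp(-t / n_iter)                    # shrinking neighborhood radius
        g = np.exp(-((coords - coords[bmu]) ** 2).sum(-1) / (2 * sigma ** 2))
        W += lr * g[..., None] * (x - W)                        # Step 5: pull weights towards the input
    return W

def som_project(D, W):
    """Map each sample to the 2-D grid coordinates of its best-matching unit."""
    flat = W.reshape(-1, W.shape[-1])
    idx = np.argmin(((D[:, None, :] - flat[None]) ** 2).sum(-1), axis=1)
    return np.column_stack(np.unravel_index(idx, W.shape[:2]))

rng = np.random.default_rng(0)
D = rng.random((300, 40))                                       # placeholder normalized features in [0, 1]
Dn = som_project(D, train_som(D))
print(Dn.shape)                                                 # (300, 2)
```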

In addition, hybridized dimensionality reduction techniques that use PCA, a linear reduction technique, as the pre-reduction method and the most robust non-linear reduction techniques as integration partners, such as PCA+t-SNE (Khagi et al. 2018), kernel PCA+t-SNE, and PCA+UMAP (Khagi et al. 2018), have also been explored and are described here (a combined sketch of these hybrid pipelines follows the list below).

  • PCA+t-SNE: In EEG signal classification, combining PCA and t-SNE optimally reduces features. PCA captures global patterns, and t-SNE highlights local relationships, enhancing the distinction of subtle patterns among different EEG signal classes, resulting in a more effective low-dimensional representation for improved classification.

  • Kernel PCA+t-SNE: The key distinction between PCA + t-SNE and kernel PCA + t-SNE lies in the initial dimensionality reduction step. PCA + t-SNE employs linear PCA, which may not effectively capture complex non-linear relationships. On the other hand, kernel PCA + t-SNE uses kernel PCA in the first step, enhancing its ability to handle non-linearities and making it more powerful in specific cases compared to either technique used in isolation.

  • PCA + UMAP: This combines the efficiency of linear PCA in capturing primary sources of variance with the ability of non-linear UMAP to refine the representation by considering complex relationships. This hybrid approach is particularly beneficial for datasets with a mix of linear and non-linear structures.
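The following minimal Python sketch illustrates the three hybrid pipelines; the 30-component pre-reduction step, the placeholder data, and the availability of the third-party umap-learn package are all assumptions for illustration, not the study's exact settings.

```python
# Sketch: hybrid pipelines PCA+t-SNE, kernel PCA+t-SNE, and PCA+UMAP
import numpy as np
from sklearn.decomposition import PCA, KernelPCA
from sklearn.manifold import TSNE
import umap

rng = np.random.default_rng(0)
D = rng.standard_normal((800, 300))                                   # placeholder N x d feature matrix

D_pca  = PCA(n_components=30, random_state=0).fit_transform(D)        # linear pre-reduction
D_kpca = KernelPCA(n_components=30, kernel="rbf").fit_transform(D)    # non-linear pre-reduction

Dn_pca_tsne  = TSNE(n_components=3, random_state=0).fit_transform(D_pca)       # PCA + t-SNE
Dn_kpca_tsne = TSNE(n_components=3, random_state=0).fit_transform(D_kpca)      # kernel PCA + t-SNE
Dn_pca_umap  = umap.UMAP(n_components=3, random_state=0).fit_transform(D_pca)  # PCA + UMAP
print(Dn_pca_tsne.shape, Dn_kpca_tsne.shape, Dn_pca_umap.shape)
```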

4 Classification and evaluation

Following dimensionality reduction, the new projected features are fed into SVM and k-NN classifiers. Both the classifiers are briefly discussed below:

SVM: SVM, a supervised machine learning classifier, was first introduced by Cortes and Vapnik (1995). The main purpose of the SVM is to find the hyperplane that best separates the data points into two classes so that the distance between the hyperplane and the closest data points is maximized. This distance is known as the margin, and the closest points are technically referred to as support vectors. The Classification Learner app from MATLAB is used in this study to implement SVM with a Gaussian kernel function.

k-NN: k-NN is a non-linear classifier that depends on two parameters: the number of nearest neighbors and the distance metric. Euclidean, Minkowski, and Mahalanobis distances are the metrics most frequently used by the k-NN algorithm to improve classifier performance. k-NN is appropriate for EEG data because it handles large and noisy data easily and depends only on the value of k and the chosen distance metric. The Classification Learner app from MATLAB is used in this study to implement k-NN with the 'Weighted KNN' preset (weighted k-NN).
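The study itself used MATLAB's Classification Learner app; the following minimal Python sketch is only an approximation of the same classification stage, pairing a Gaussian-kernel SVM and a distance-weighted k-NN with 10-fold cross-validation on placeholder projected features and labels.

```python
# Sketch: Gaussian-kernel SVM and weighted k-NN evaluated with 10-fold CV
import numpy as np
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
Dn = rng.standard_normal((600, 3))               # placeholder reduced features from a projection
y = rng.integers(0, 2, size=600)                 # placeholder binary labels (e.g. normal vs. disorder)

svm = SVC(kernel="rbf")                                          # Gaussian-kernel SVM
knn = KNeighborsClassifier(n_neighbors=10, weights="distance")   # weighted k-NN
for name, clf in [("SVM", svm), ("k-NN", knn)]:
    acc = cross_val_score(clf, Dn, y, cv=10, scoring="accuracy")
    print(f"{name}: mean 10-fold accuracy = {acc.mean():.3f}")
```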

4.1 Performance evaluation metrics

The efficacy of the SVM and k-NN classifiers can be assessed visually by means of a confusion matrix, also known as a contingency table or error matrix. The confusion matrix can be used to evaluate accuracy, precision, recall, and F-measure, but in this study we only computed the accuracy of the classifier because it has been considered the most important metric in most studies. Fundamentally, a confusion matrix for a binary-class problem is a two-by-two table (as shown in Fig. 2) that indicates the number of true positives (\(T_p\)), true negatives (\(T_n\)), false positives (\(F_p\)), and false negatives (\(F_n\)).

Fig. 2

Confusion matrix

  1.

    Accuracy: The ratio of the classifier's correct predictions to the total number of its predictions is known as the classifier's accuracy. Mathematically, it is expressed as shown in Eq. (1).

    $$\begin{aligned} \text {Accuracy}=\frac{T_p+T_n}{T_p+T_n+F_p+F_n} \end{aligned}$$
    (1)
  2.

    Reduced feature dimension: In this study, we also report the number of features retained after reduction.

  3.

    Dimensionality reduction rate (DRR): The purpose of feature reduction methods is to identify the most pertinent and significant features; as a result, the feature dimension can be reduced. The mathematical formulation for computing DRR is shown in Eq. (2); a minimal sketch computing both evaluation metrics follows this list.

    $$\begin{aligned} \text {DRR}=\frac{{Full}_{feature}-{Reduced}_{feature}}{{Full}_{feature}} \end{aligned}$$
    (2)

    where \({Full}_{feature}\) denotes the number of features in the full feature vector without applying any reduction technique and \({Reduced}_{feature}\) denotes the number of features retained after applying the reduction technique. DRR values range from 0 to 1: a DRR close to 1 means the feature space is strongly reduced, whereas a DRR close to 0 indicates little reduction.
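The minimal Python sketch below computes both metrics exactly as defined in Eqs. (1) and (2); the confusion-matrix counts and feature numbers are illustrative placeholders, not results from the study.

```python
# Sketch: accuracy from confusion-matrix counts (Eq. 1) and DRR (Eq. 2)
def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def drr(full_features, reduced_features):
    # DRR close to 1 => the feature space is strongly reduced
    return (full_features - reduced_features) / full_features

print(accuracy(tp=45, tn=42, fp=8, fn=5))            # placeholder counts -> 0.87
print(drr(full_features=1000, reduced_features=3))   # placeholder sizes  -> 0.997
```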

5 Experiment results and discussions

This empirical review conducts an extensive comparative analysis of 23 projection techniques on five sets of high-dimensional features generated from EEG signals for the classification of schizophrenia, alcoholic, focal, focal-with-deep-features, and depression datasets. Both Python and MATLAB are used for the experiments. The pre-processing methods described in Table 1 were employed to extract features in this study. The dimensionality of the features obtained from each dataset is depicted in Table 2. Following that, the high-dimensional features were projected into a low-dimensional space by employing the 23 reduction techniques listed in Table 3. Several parameters needed to be set for each projection technique; these parameter values are given in Tables 4 and 5.

Table 4 Summary of all the parameters and their values of projection techniques used in this study
Table 5 Summary of all the parameters and their values of projection techniques used in this study (Cont...)

After setting the parameter values, a comparative analysis of their performance using the k-NN and SVM classifiers with a 10-fold cross-validation approach is conducted, and the achieved results are depicted in Tables 7 and 8, respectively. The tables show three performance evaluation metrics; for a better understanding, an illustration of each cell is shown in Table 6, where the left corner of each cell shows classification accuracy, the upper right corner shows the number of features, and the lower right corner shows the DRR.

Table 6 Representation of each cell in Tables 7 and 8
Table 7 Data-wise classification accuracy achieved by the k-NN classifier (columns) for different projection techniques (rows). (Note: accuracy values are in %). Reduced feature dimensions are mentioned in the upper right corner of each cell, and DRR in the range of 0 to 1 is depicted in the lower right corner of each cell
Table 8 Data-wise classification accuracy achieved by the SVM classifier (columns) for different projection techniques (rows). (Note: accuracy values are in %). Reduced feature dimensions are mentioned in the upper right corner of each cell, and DRR in the range of 0 to 1 is depicted in the lower right corner of each cell

In Table 7, the rows represent projection techniques, while the columns represent the different datasets. Each cell shows the performance metrics, and the color of each cell encodes the accuracy value via a sequential colormap: red indicates that the technique did not perform well on the corresponding dataset, while light yellow indicates the opposite. Scanning Table 7 along rows shows how the performance of a given projection technique varies across the EEG datasets examined. For instance, we observe that the PCA+t-SNE and F-ICA projections have quite similar (good) classification accuracy across all five datasets; that is, Table 7 shows a relatively compact block of light-yellow cells. In contrast, if we concentrate on the block spanned by the NMDS-E and NMDS-M projection rows, we see little variation along the rows and similarly poor colors, depicted in red. More specifically, the average classification accuracy of PCA+t-SNE is 93.36%, which is higher than the average classification accuracy of the full feature vector. Additionally, the impact of the technique can be seen in the number of selected features, which was reduced to as few as three. At the same time, scanning Table 7 along columns shows which projection technique is best for a particular dataset. For example, the previously mentioned PCA+t-SNE projection performs best on the schizophrenia, focal, and depression datasets. This is because these datasets have a low intrinsic dimensionality (see Table 2), and this projection technique handles such data very well. PCA+t-SNE, LLE, F-ICA, and ISOMAP perform well on average for most datasets with the k-NN classifier, whereas NMDS-E, NMDS-M, SOM, and LPP perform poorly.

Further, Table 8 shows similar observations from the performance analysis of the SVM classifier. Row-wise, the PCA+t-SNE and F-ICA projections again have quite similar (good) classification accuracy across all five datasets, forming a relatively compact block of light-yellow cells. In contrast, the block spanned by the NMDS-E and NMDS-M projection rows shows little variation and similarly poor colors, depicted in red. PCA+t-SNE, F-ICA, UMAP, and LDA perform well on average for most datasets with the SVM classifier, whereas NMDS-E, NMDS-M, LPP, and kernel PCA perform poorly.

We implemented 23 different linear and non-linear reduction techniques for five sets of EEG signals in this work. Based on the experimental results presented in Tables 7 and 8, it can be concluded that PCA+t-SNE performs well on all considered EEG datasets. Using scatter plots (shown in Fig. 3), we have also visualized the results of the top four reduction techniques along with the original features, particularly for the depression EEG dataset, as this dataset performs well across all techniques. Here, three reduced features, denoted RF-# in Fig. 3, are depicted for each technique. The reduced features of the depressed and normal EEG signals are represented in two different colors (blue: normal, red: depressed). According to the visualization plot, the PCA+t-SNE technique produces a more meaningful embedding than the others, clearly distinguishing the cluster formation of the two classes.

Fig. 3

3D visualization of the top four dimensionality reduction techniques along with the original features for the depression dataset: a original features, b UMAP, c PCA+UMAP, d PCA+t-SNE, and e ISOMAP (Note: RF denotes reduced features after the projection technique is applied)

6 Conclusion

This review article presents a comparative analysis of multidimensional projection techniques from the perspective of end users who want to know how specific algorithms and parameter settings perform on high-dimensional EEG datasets. We reviewed the effectiveness of 23 recently utilized feature projection techniques (including various combinations of reduction techniques) on high-dimensional features derived from five diverse sets of EEG datasets. Performance assessment of these techniques was carried out using SVM and k-NN classifiers, both before (with the full feature vector) and after applying the reduction techniques. Based on an extensive review and evaluation using three performance metrics, four techniques emerged as the most effective. When paired with the k-NN classifier, PCA+t-SNE, LLE, F-ICA, and ISOMAP showed superior performance. When paired with the SVM classifier, PCA+t-SNE, F-ICA, UMAP, and LDA demonstrated the highest efficacy. The empirical findings, along with the visualization analysis, strongly suggest that PCA+t-SNE is the most effective reduction technique for classifying high-dimensional EEG datasets. Notably, the average classification accuracy achieved with the PCA+t-SNE reduction technique using the k-NN and SVM classifiers is 93.36% and 91.98%, respectively, across all five datasets. These results provide valuable insights for researchers in selecting the most suitable reduction technique for high-dimensional EEG features. Moreover, the findings of this empirical review have broader applicability, as they can be extended to other high-dimensional datasets from diverse domains in future research endeavors.