Introduction

Neuroscience, also called brain science, is the discipline that explores the structure and function of the brain [1]. It is an inherently interdisciplinary field, drawing on biology, psychology, information science, medicine, engineering, and artificial intelligence. Neuroscience has developed over roughly a century and has been extensively applied to the diagnosis of neurological disorders. With the development of research methods, the focus of neuroscience has gradually shifted from brain structure to brain function over the past decade [2,3,4]. The neural response of the brain is regarded as the core of cognition. Accurate identification of the brain's neural response can help characterize important human cognitive functions, inform the development of intelligent algorithms, and advance medical progress on neurological diseases [3].

With the development of research tools for neuroscience, multiple neuroimaging modalities are available for exploring brain function, including electroencephalography (EEG)/intracranial electroencephalography (iEEG), functional magnetic resonance imaging (fMRI), magnetoencephalography (MEG), positron emission tomography, and optogenetic techniques [4,5,6,7]. Among these methods, EEG/iEEG has been the most widely used tool for functional brain imaging because of its excellent temporal resolution and low equipment cost [8]. From the perspective of neurophysiology, EEG/iEEG reflects postsynaptic potentials, which are generated when neurotransmitters bind to receptors on the postsynaptic membrane [9]. These postsynaptic potentials generate electric fields around neurons; once sufficient neurons are activated, electrophysiological signals with specific patterns can be captured through a voltage amplifier. Because iEEG electrodes lie closer to the neuronal populations, iEEG has higher accuracy and a higher signal-to-noise ratio than EEG [9]. The electric signals captured by EEG have poorer spatial resolution and signal-to-noise ratio because they must pass through the skull. However, EEG is a non-invasive technology, so it can be applied in a wider range of scenarios [8]. Both methods record the discharge of neuronal populations with a voltage amplifier, so the captured signals have essentially the same form. Hitherto, EEG/iEEG has been extensively applied to research diverse aspects of brain function, including attention [10], memory [11], language [12], emotions [13], and brain function disorders [14].

Although EEG/iEEG is highly practical, applying it requires a certain foundation in signal processing because EEG signals have a complex representation; as a result, some researchers lack clarity when selecting and applying analytical methods for EEG/iEEG. We therefore provide a brief introduction to commonly used EEG signal processing methods in this article. In this review, we first classify EEG signals comprehensively based on their characteristics. Next, we introduce commonly used analytical methods for EEG in terms of characteristics such as power spectrum and connectivity, and present their suitability for various types of EEG to assist researchers in method selection. We also summarize current issues and prospects for the future, which is expected to expedite the application of EEG/iEEG in brain science and neurological disease research.

Types of EEG signals

Generally, in research articles, especially those on neurological disease, EEG is classified based on the research subject. For instance, in sleep studies, EEG is classified into EEG during wakefulness and sleep EEG [15]; in epilepsy-related studies, EEG is sub-divided into interictal EEG, preictal EEG, ictal EEG, and postictal EEG [16]; and in research on event-related potentials, EEG is categorized into resting-state EEG and task-state EEG [17]. In addition, EEG can be classified according to the shape of the signal itself: for example, into delta, theta, and alpha bands based on frequency [18, 19], or into slow waves, fast waves, sharp waves, and spikes based on waveform.

However, from the perspective of EEG analysis, we believe that EEG can be classified into the following three categories.

(1) Time-invariant EEG: an EEG in which the functional state of the brain remains unchanged over time [20]. In this type of EEG, the state of the brain does not change appreciably during the recording, for example, a resting-state EEG without psychological activity [16]. Alternatively, some changes in brain characteristics are not among the main features to be studied; for example, in epilepsy research, where the pathological EEG is of primary interest, the interictal period without epileptic discharge can also be considered a time-invariant EEG [16]. Figure 1 shows an example based on sleep EEG, during which the brain remains in a stable state for a long time. Figure 1a shows a 5-s segment of sleep EEG, while Fig. 1b shows a 150-s segment. Although the long recording is not strictly stationary, comparing these two segments shows that relatively stable data segments can be found within it.

(2) Accurate event-related EEG: this can be regarded as an extension of event-related potentials and refers to EEG induced by an event whose onset time is known exactly. Stimuli with a definite onset time, such as visual or auditory stimuli, transcranial magnetic stimulation, and electrical cortical stimulation, produce EEG with time-varying characteristics [17]. Figure 2 shows an example based on event-related potentials. Because the events occur at known times, accurate brain electrical responses can be obtained by aligning the signals to those times. Figure 2a shows a segment of EEG containing event-related potentials, with the exact moment of each event marked. Figure 2b shows the EEG response obtained by superimposing multiple event-related potentials with the event as time zero, and Fig. 2c shows the response obtained by superimposing the same potentials with the reaction time as time zero.

(3) Random event-related EEG: an EEG induced by an event whose onset time is random and cannot be determined in advance. In research on diseases such as epilepsy or Parkinson's disease, pathological EEG is triggered by abnormal neural activity in the lesion area, but the timing of this pathological activity is difficult to determine, resulting in a time-varying EEG [19]. Figure 3 displays the EEG signals of a patient with epilepsy. Figure 3a shows the interictal EEG, Fig. 3b, c show the EEG in the early and late stages of a seizure, respectively, and Fig. 3d shows the EEG of the entire seizure. Because a seizure is a random event, its onset time must be located retrospectively after it occurs, which poses challenges for the real-time analysis of epileptic EEG. Nevertheless, Fig. 3a, c show that epileptic EEG is still relatively stable within each period, and in Fig. 3d the data from the interictal and postictal periods, which represent two stable stages, reveal a marked difference.

Fig. 1

Example of time-invariant electroencephalogram (EEG) based on sleep EEG. The data come from the C3, C4, O1, and O2 channels (10–20 system). a EEG data for a short sleep time of 5 s. b EEG data for a long sleep time of 150 s

Fig. 2

Example of accurate event-related electroencephalogram (EEG) based on event-related potentials. a A 5-second period of continuous EEG data with an event marker. b Corresponding event-related potentials of all channels obtained by superimposing EEG signals with all “Event” markers taken as the zero time. c Corresponding event-related potentials of all channels obtained by superimposing EEG signals with all “Reaction time” markers taken as the zero time

Fig. 3

Example of random event-related electroencephalogram (EEG) based on epilepsy EEG data. a A 5-second period of EEG data between epileptic seizures. b A 5-second period of EEG data in the early stage of an epileptic seizure. c A 5-second period of EEG data in the late stages of an epileptic seizure. d A 150-second period of EEG data from all stages of an epileptic seizure

It should also be noted that the main classification criteria for these three types of EEG are based on the EEG features to be analyzed, so the classification must be made with the features of interest in mind. In sleep disorder research, we can consider stage N1 EEG as a time-invariant EEG. However, if the study target is dreaming or memory, the main characteristics to be studied may also change during the N1 phase, and this EEG should then be classified as a random event-related EEG. Therefore, researchers should accurately identify the target features to be studied before selecting an analytical method.

Common EEG analysis methods

This section introduces common methods for extracting brain functional features, organized according to the characteristics of the EEG signals to which they are suited.

Power spectrum analyses

Power spectrum analysis is one of the most common methods in EEG analysis; it quantifies the energy of the various frequency components of the signal. It is applicable to studies of brain function and neurological disease in which state changes, such as sleep-stage transitions, seizures, and emotional changes, alter EEG energy. Multiple power spectrum estimation methods are available, such as the fast Fourier transform (FFT), Welch's method, and the autoregressive (AR) model, each with its own characteristics and usage limitations. The articles in Table 1 covered a range of topics related to EEG analysis, including sleep onset, transitions between sleep stages, classification of neurological disorders, detection of post-stroke EEG signals, analysis of EEG background activity in autism and dyslexia, and the impact of various factors such as focused ultrasound stimulation, cognitive impairment in diabetes, and neurofeedback training in autism [20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41].

Table 1 Applications of power spectrum analyses

FFT

FFT is a fast algorithm for computing the discrete Fourier transform (DFT) [42]. The DFT is defined as:

$$X\left(k\right)=\sum_{n=0}^{N-1}x\left(n\right){e}^{-i2\pi nk/N},\quad k=0,\dots ,N-1$$
(1)

Here, \(X\left(k\right)\) denotes the DFT, \(N\) represents the length of the available data, \(x\left(n\right)\) refers to the input signal in the time domain, \(e\) signifies the exponential operation, \(i\) denotes the imaginary unit, and \(k\) represents the frequency index. The FFT computes Eq. (1) efficiently: by exploiting the symmetric and periodic nature of the exponential factor, it avoids repetitive calculations [42]. The FFT offers high frequency resolution but is easily affected by noise in the EEG signal, so the average period method has been proposed as an improvement.

The average period method splits the original signal into N non-overlapping consecutive segments, calculates the periodogram of each segment individually, and finally averages them [43]. Averaging N periodograms reduces the variance of the power spectral density estimate, but spectral leakage occurs easily because segmentation introduces additional data boundaries, which the Fourier transform handles poorly [44].
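As a minimal sketch of this segment-averaging idea (often called the Bartlett approach), the NumPy example below uses a synthetic signal and an assumed 250 Hz sampling rate; the normalization is illustrative:

```python
import numpy as np

def average_period_psd(x, fs, n_segments=10):
    """Average-period PSD estimate: split the signal into non-overlapping
    segments, compute one periodogram per segment, then average them."""
    seg_len = len(x) // n_segments
    segs = x[:seg_len * n_segments].reshape(n_segments, seg_len)
    periodograms = np.abs(np.fft.rfft(segs, axis=1)) ** 2 / (seg_len * fs)
    freqs = np.fft.rfftfreq(seg_len, d=1 / fs)
    return freqs, periodograms.mean(axis=0)

fs = 250                                                       # assumed sampling rate (Hz)
t = np.arange(0, 60, 1 / fs)
eeg = np.sin(2 * np.pi * 10 * t) + np.random.randn(t.size)    # synthetic 10 Hz rhythm + noise
freqs, psd = average_period_psd(eeg, fs)
```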

Welch

The Welch method introduces two improvements over the average period method that enhance the quality of the estimate. First, it allows overlap between data segments. Second, a Hamming window is applied to each segment instead of a rectangular window, which mitigates the distortion caused by the rectangular window [45]. Let \({\{x}_{l}\left(n\right)\},l=1,\dots ,S\) denote the data segments and \(M\) the length of each segment; the overlap is usually set to 50% (\(M/2\)). The Welch spectrum estimate is given by the following equations:

$${\widehat{P}}_{l}\left(f\right)=\frac{1}{M}\frac{1}{P}{\left|\sum_{n=1}^{M}v\left(n\right){x}_{l}\left(n\right){e}^{-i2\pi nf}\right|}^{2}$$
(2)
$${\widehat{P}}_{w}\left(f\right)=\frac{1}{S}\sum_{l=1}^{S}{\widehat{P}}_{l}\left(f\right)$$
(3)

Here, \({\widehat{P}}_{l}\left(f\right)\) represents the periodogram estimate of segment \(l\) and \(v\left(n\right)\) denotes the window function. \(P\) refers to the average power of the window function, \(P=\frac{1}{M}\sum_{n=1}^{M}{\left|v\left(n\right)\right|}^{2}\), \(e\) signifies the exponential operation, \(i\) denotes the imaginary unit, \(f\) represents frequency, \({\widehat{P}}_{w}\left(f\right)\) refers to the Welch power spectral density estimate, and \(S\) signifies the number of segments.

Currently, Welch’s method is one of the most widely used power spectrum analysis methods because it reduces the influence of boundary effects on the power spectrum, providing more stable power spectral results than FFT/short-time Fourier transform (STFT) methods.
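In practice, Welch's method is available directly in SciPy. The sketch below uses synthetic data, an assumed 250 Hz sampling rate, and 2-s Hamming-windowed segments with 50% overlap; the band limits are illustrative:

```python
import numpy as np
from scipy import signal

fs = 250                                                       # assumed sampling rate (Hz)
t = np.arange(0, 60, 1 / fs)
eeg = np.sin(2 * np.pi * 10 * t) + np.random.randn(t.size)    # synthetic alpha rhythm + noise

# 2-s Hamming-windowed segments, 50% overlap, periodograms averaged
freqs, psd = signal.welch(eeg, fs=fs, window='hamming', nperseg=2 * fs, noverlap=fs)
alpha_power = psd[(freqs >= 8) & (freqs <= 13)].mean()         # mean alpha-band power
```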

Multitaper

Multitaper can solve the bias and variance problems of nonparametric spectral estimation simultaneously in an optimal manner [46]. Windowing the signal using different tapers allows multiple independent estimates to be derived from the same signal since the different windows are uncorrelated with each other.

Assuming that \({X}_{k}={\{{x}_{1,k},{x}_{2,k},\dots ,{x}_{p,k}\}}^{T}\) is the signal sequence, where \(p\) denotes the number of channels and \(k\) indexes the samples, the multitaper cross-spectrum estimate between channels \(l\) and \(m\) is calculated as follows:

$${\widehat{S}}_{l,m}^{mt}\left(f\right)=\frac{1}{K}\sum_{i=1}^{K}{\widehat{S}}_{l,m}^{\left(i\right)}\left(f\right)$$
(4)
$${\widehat{S}}_{l,m}^{\left(i\right)}\left(f\right)=\Delta \left[\sum_{k=0}^{N-1}{h}_{k}^{\left(i\right)}{x}_{l,k}{e}^{-i2\pi kf\Delta }\right]\left[\sum_{k=0}^{N-1}{h}_{k}^{\left(i\right)}{x}_{m,k}{e}^{-i2\pi kf\Delta }\right]$$
(5)

Here, \(K\) refers to the number of tapers, \(N\) represents the sequence length, \({\widehat{S}}_{l,m}^{\left(i\right)}\left(f\right)\) signifies the \(i\)th direct cross-spectrum estimate between channels \(l\) and \(m\), \(\Delta\) indicates the sampling interval, \(f\) represents frequency, \(e\) signifies the exponential operation, \(i\) in the exponent denotes the imaginary unit, and \({h}_{k}^{\left(i\right)}\) represents the \(i\)th taper. Since the final result averages over multiple tapers, the information loss caused by single-taper analysis is reduced.

The multitaper method can be viewed as a modified Welch's method that provides features similar to those of the STFT and Welch's methods, but its stability is improved and the number of parameters to be determined is reduced because it superimposes multiple tapers. For instance, in the well-known toolbox FieldTrip, multitaper is employed as the major power spectrum analysis method [47].
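A minimal multitaper sketch using the DPSS (Slepian) tapers provided by SciPy is shown below; the normalization and the choice of time-bandwidth product and taper count are illustrative assumptions, and dedicated implementations (e.g., in FieldTrip or MNE) handle these details more carefully:

```python
import numpy as np
from scipy.signal.windows import dpss

def multitaper_psd(x, fs, nw=3, n_tapers=5):
    """Average the periodograms obtained with orthogonal DPSS tapers."""
    n = len(x)
    tapers = dpss(n, nw, Kmax=n_tapers)                # shape (n_tapers, n), unit-energy tapers
    spectra = np.abs(np.fft.rfft(tapers * x, axis=1)) ** 2 / fs
    freqs = np.fft.rfftfreq(n, d=1 / fs)
    return freqs, spectra.mean(axis=0)                 # average over tapers

fs = 250
x = np.random.randn(4 * fs)                            # 4 s of noise as a stand-in for EEG
freqs, psd = multitaper_psd(x, fs)
```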

AR model

The AR model performs linear prediction modeling of the original signal \(x\left(n\right),0\le n\le (N-1)\), which is treated as the output of a filter driven by white noise with a mean of 0 and a variance of \({\sigma }^{2}\). The signal's amplitude at a given time is modeled as a weighted sum of the amplitudes of preceding samples plus a prediction error, and the order of the model (filter) equals the number of AR coefficients employed.

$$x\left(n\right)=-\sum_{k=1}^{p}a\left(k\right)x\left(n-k\right)+w\left(n\right)$$
(6)

Here, \(a\left(k\right)\) represents the coefficients of the AR model, \(w\left(n\right)\) signifies white noise with a variance equal to \({\sigma }^{2}\), and \(p\) refers to the model order. The AR(\(p\)) model can be characterized by the parameters \(\left\{a\left[1\right],a\left[2\right],\dots ,a\left[p\right],{\sigma }^{2}\right\}\). The power spectral density is then:

$${P}_{AR}\left(f\right)=\frac{{\sigma }^{2}}{{\left|A\left(f\right)\right|}^{2}}$$
(7)

Here, \(A\left(f\right)=1+{a}_{1}\mathrm{exp}\left(-j2\pi f\right)+{\dots +a}_{p}\mathrm{exp}\left(-j2\pi fp\right)\). The AR model parameters can be derived using the Burg or least-squares methods [48]. Unlike the FFT, the AR model can handle short signals. It is used less often because of its demanding coefficient estimation, but its scalability is excellent and it has been applied in many studies [37,38,39,40,41].
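The sketch below fits a low-order AR model via the Yule-Walker equations (the Burg method mentioned above is an alternative) and then evaluates Eq. (7); the order, FFT grid, and normalization by the sampling rate are illustrative assumptions:

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def ar_psd(x, fs, order=8, nfft=512):
    """Yule-Walker AR(p) spectrum: estimate coefficients from the
    autocorrelation, then evaluate sigma^2 / |A(f)|^2 on a frequency grid."""
    x = x - x.mean()
    r = np.correlate(x, x, mode='full')[len(x) - 1:] / len(x)   # biased autocorrelation
    a = solve_toeplitz(r[:order], -r[1:order + 1])              # AR coefficients a_1..a_p
    sigma2 = r[0] + np.dot(a, r[1:order + 1])                   # driving-noise variance
    A = np.fft.rfft(np.r_[1.0, a], nfft)                        # A(f) = 1 + sum_k a_k e^{-j2*pi*f*k}
    freqs = np.fft.rfftfreq(nfft, d=1 / fs)
    return freqs, sigma2 / (fs * np.abs(A) ** 2)

fs = 250
x = np.random.randn(2 * fs)                                     # a short 2-s segment
freqs, psd = ar_psd(x, fs)
```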

Time–frequency analyses

Since the EEG of human beings in a task state shows time-dependent changes, a time–frequency analysis is quite suitable for EEG analysis. For short-term signals, time–frequency analysis can replace power spectrum analysis to characterize the signals in two dimensions. The commonly used methods for time–frequency analysis include STFT, the wavelet transform (WT), empirical mode decomposition (EMD), and the Wigner-Ville distribution (WVD). The articles in Table 2 investigated various methods for EEG signal analysis, including the use of rational discrete STFT and deep learning for epileptic seizure classification, a hybrid approach for alcohol and control EEG signal classification, connectivity analysis in autism disorders using STFT and coherence values, drowsiness detection based on relative band power and STFT, automatic sleep stage classification using time–frequency images, and detection of deception using smoothed pseudo WVD, among other techniques [49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68].

Table 2 Applications of time–frequency analyses

STFT

The STFT divides a long signal into shorter segments of uniform length and computes the Fourier transform of each segment separately. It is defined as follows:

$$F\left(\omega ,\tau \right)={\int }_{-\infty }^{\infty }f\left(t\right){{\psi }^{*}\left(t-\tau \right)e}^{-i\omega t}dt$$
(8)

Here, \(f\left(t\right)\) refers to the original signal, \(\tau\) signifies the translation parameter, and \({\psi }^{*}\left(t-\tau \right)\) denotes the window function (usually a Hamming window); when the window function is a Gaussian, the STFT is called the Gabor transform. Moreover, \(e\) signifies the exponential operation, \(i\) denotes the imaginary unit, \(t\) refers to time, and \(\omega\) represents frequency. The STFT has a limitation in that its fixed time window results in a fixed time–frequency resolution [69].
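A minimal STFT sketch with SciPy is shown below; the synthetic signal (whose dominant rhythm jumps from 10 Hz to 20 Hz), the assumed 250 Hz sampling rate, and the 1-s Hamming window with 50% overlap are illustrative choices:

```python
import numpy as np
from scipy import signal

fs = 250
t = np.arange(0, 10, 1 / fs)
x = np.where(t < 5, np.sin(2 * np.pi * 10 * t), np.sin(2 * np.pi * 20 * t))
x = x + 0.3 * np.random.randn(t.size)                 # add measurement noise

f, tt, Zxx = signal.stft(x, fs=fs, window='hamming', nperseg=fs, noverlap=fs // 2)
power = np.abs(Zxx) ** 2                              # time-frequency power, shape (n_freqs, n_times)
```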

WVD

The WVD is a classical method for time–frequency analysis that excels in handling non-stationary signals. Unlike the STFT, the WVD is unaffected by leakage effects. The distribution is a quadratic energy density, derived by correlating the signal with its time- and frequency-translated complex conjugate. The instantaneous autocorrelation function of the signal \(f(t)\) is as follows:

$${R}_{f}\left(t,\tau \right)=f\left(t+\frac{\tau }{2}\right){f}^{*}\left(t-\frac{\tau }{2}\right)$$
(9)

Here, \(t\) refers to time, \(\tau\) signifies the time lag, \(*\) denotes the complex conjugate, and the WVD of \(f(t)\) is the Fourier transform of \({R}_{f}\left(t,\tau \right)\) with respect to \(\tau\).

$$W\left(t,\omega \right)={\int }_{-\infty }^{+\infty }{R}_{f}\left(t,\tau \right){e}^{-j\omega \tau }d\tau$$
(10)

Here, \(e\) signifies the exponential operation, \(j\) denotes the imaginary unit, and \(\omega\) represents frequency.

The WVD has a series of good properties such as conjugate symmetry, time-marginal properties, frequency-marginal properties, and energy distribution properties. However, when the signals have multiple frequency components, the WVD is affected by cross terms, that is, it is easily affected by noise [70].

WT

The WT overcomes the fixed time–frequency resolution of the STFT by translating and scaling wavelets, which yields varying time–frequency resolutions across the result. The wavelet family is defined as:

$${\psi }_{a,b}\left(t\right)=\left(\frac{1}{\sqrt{a}}\right)\psi \left(\frac{t-b}{a}\right)$$
(11)

Here, \(\psi \left(t\right)\) refers to the mother wavelet, \({\psi }_{a,b}\left(t\right)\) signifies the sub-wavelet, and \(a\) and \(b\) refer to the scale (dilation) and translation parameters, respectively. The WT of the signal \(f(t)\) is as follows:

$${W}_{\psi }f\left(a,b\right)=\frac{1}{\sqrt{\left|a\right|}}{\int }_{-\infty }^{\infty }f\left(t\right){\psi }^{*}\left(\frac{t-b}{a}\right)dt$$
(12)

Many mother wavelet functions are available (such as Morse, Morlet, Daubechies, and Haar), covering both discrete and continuous wavelets. For EEG signal analysis, discrete wavelets are commonly used for signal decomposition, and continuous wavelets are commonly used for signal presentation. Although the continuous WT causes massive data redundancy, it has been the preferred time–frequency analysis method owing to its more accurate and smooth time–frequency representation [69].
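A continuous WT sketch using the PyWavelets package (assumed installed) is shown below; the complex Morlet wavelet name, the target frequency grid, and the scale mapping are illustrative assumptions:

```python
import numpy as np
import pywt                                            # PyWavelets, assumed installed

fs = 250
t = np.arange(0, 4, 1 / fs)
# synthetic signal: 10 Hz for the first 2 s, 20 Hz afterwards
x = np.sin(2 * np.pi * 10 * t) * (t < 2) + np.sin(2 * np.pi * 20 * t) * (t >= 2)

freqs_hz = np.arange(2, 40)                            # frequencies of interest (Hz)
fc = pywt.central_frequency('cmor1.5-1.0')             # center frequency of the complex Morlet wavelet
scales = fc * fs / freqs_hz                            # map target frequencies to wavelet scales
coef, freqs = pywt.cwt(x, scales, 'cmor1.5-1.0', sampling_period=1 / fs)
power = np.abs(coef) ** 2                              # time-frequency power, shape (n_scales, n_times)
```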

EMD

EMD is a self-adaptive multiresolution technique that decomposes the original signals into components of different resolutions and can analyze non-linear and non-smooth signals [71]. EMD decomposes the input signals into several intrinsic mode functions (IMFs) and a residual:

$$I\left(n\right)=\sum_{m=1}^{M}{IMF}_{m}\left(n\right)+{Res}_{M}\left(n\right)$$
(13)

Here, \(I\left(n\right)\) refers to a multi-component signal, \({IMF}_{m}\left(n\right)\) signifies the \(m\)th IMF, and \({Res}_{M}\left(n\right)\) denotes the final residual. Time–frequency features are usually extracted from the IMF components using Hilbert spectral analysis. EMD is characterized by self-adaptability and high efficiency. However, it may exhibit mode-mixing (aliasing) effects when a single IMF contains significantly different characteristic time scales or when similar characteristic time scales are dispersed across different IMFs. Because EMD does not rely on a fixed-frequency basis function, its time–frequency results are not affected by the Gibbs phenomenon; however, its frequency localization is poor [72].

Connectivity analyses

Connectivity analysis of EEG has gained much attention in recent years and is fundamental to research on brain networks and connectivity. Connectivity analysis includes multiple types, such as signal morphology-based, signal phase-based, statistics-based, and information-based analyses. Different connectivity analysis methods are based on different principles, and their results express different characteristics. Hence, it is important to choose a suitable connectivity analysis method for brain network research. We introduce several common connectivity analysis methods here. Table 3 summarizes articles that explored various aspects of EEG connectivity analysis, including analysis of different severities of obstructive sleep apnea, synchrony measures for early Alzheimer’s disease diagnosis, correlation between EEG abnormalities and symptoms of autism spectrum disorder, EEG channel correlation for emotion recognition, correlation of quantitative EEG with functional status in ischemic stroke, variability of EEG functional connectivity in young attention deficit hyperactivity disorder subjects, and the identification of causal relationships between EEG activity and intracranial pressure changes in neurocritical care patients, among other topics [73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106].

Table 3 Applications of correlation analyses

Correlation (CORR)

CORR measures the similarity between two signals by calculating their covariance [107]. The CORR for each given frequency is as follows:

$$Corr\left(x\right)=\frac{{C}_{AB}\left(x\right)}{\left({C}_{AA}\left(x\right){C}_{BB}\left(x\right)\right)}$$
(14)

\({C}_{AB}\left(x\right)\) represents the cross-covariance between signal \(A\) and signal \(B\), and \({C}_{AA}\left(x\right)\) and \({C}_{BB}\left(x\right)\) refer to the auto-covariance of signal \(A\) and signal \(B\), respectively. CORR is sensitive to both phase and polarity.

Coherence (COH)

COH measures the similarity between two signals by calculating the power spectral density [108]. The COH for each given frequency is:

$$COH\left(x\right)=\frac{{\left|{S}_{AB}\left(x\right)\right|}^{2}}{\left({S}_{AA}\left(x\right){S}_{BB}\left(x\right)\right)}$$
(15)

Here, \({S}_{AB}\left(x\right)\) represents the cross-spectrum between signal \(A\) and signal \(B\), and \({S}_{AA}\left(x\right)\) and \({S}_{BB}\left(x\right)\) refer to the auto-spectra of signal \(A\) and signal \(B\), respectively. Because COH is calculated from the cross-spectrum and auto-spectra, it is very sensitive to the phase changes of the signals but is little affected by energy changes.
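SciPy provides the magnitude-squared coherence directly; the sketch below uses two synthetic channels sharing a 10 Hz component, with an assumed sampling rate and segment length:

```python
import numpy as np
from scipy import signal

fs = 250
t = np.arange(0, 60, 1 / fs)
shared = np.sin(2 * np.pi * 10 * t)                    # common 10 Hz component
x = shared + np.random.randn(t.size)                   # "channel A"
y = shared + np.random.randn(t.size)                   # "channel B"

f, Cxy = signal.coherence(x, y, fs=fs, nperseg=2 * fs) # magnitude-squared coherence in [0, 1]
alpha_coh = Cxy[(f >= 8) & (f <= 13)].mean()           # mean alpha-band coherence
```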

Wavelet coherence (WTC)

WTC can represent the time-varying relationships between different signals in the time–frequency domain by producing different time–frequency resolutions through wavelet translation and dilation [109].

The WT of signal \(x\) is a function of time and frequency, defined as the convolution of an input with a wavelet family \(\theta (u)\):

$${W}_{x}\left(t,f\right)={\int }_{-\infty }^{\infty }x\left(u\right){\theta }_{t,f}^{*}\left(u\right)du$$
(16)

With given input signals \(x\) and \(y\), the wavelet cross-spectrum around time \(t\) and frequency \(f\) can be derived through the WT of \(x\) and \(y\):

$$C{W}_{xy}\left(t,f\right)={\int }_{t-\delta /2}^{t+\delta /2}{W}_{x}\left(\tau ,f\right){W}_{y}^{*}\left(\tau ,f\right)d\tau$$
(17)

Here, \(*\) represents the complex conjugate, and \(\delta\) denotes a scalar dependent on the frequency. The WTC at time \(t\) and frequency \(f\) is obtained by normalizing the wavelet cross-spectrum \(C{W}_{xy}\left(t,f\right)\) by the wavelet auto-spectra \(C{W}_{xx}\left(t,f\right)\) and \(C{W}_{yy}\left(t,f\right)\) of signals \(x\) and \(y\). WTC reveals the phase correlation between signals in the time–frequency plane while reducing the interference of energy differences.

Phase locking value/phase lag index (PLV/PLI)

PLV and PLI are commonly applied to acquire the strength of phase synchronization [110]. The instantaneous phase of signal \(x(t)\) is generated using the following formula:

$${\varnothing }_{x}\left(t\right)=arctan\frac{\widetilde{x}\left(t\right)}{x\left(t\right)}$$
(18)

Here, \(\widetilde{x}\left(t\right)\) signifies the Hilbert transform of \(x\left(t\right)\), defined as follows:

$$\widetilde{x}\left(t\right)=\frac{1}{\pi }PV{\int }_{-\infty }^{\infty }\frac{x(\tau )}{t-\tau }d\tau$$
(19)

PV refers to the Cauchy principal value. The PLV of two signals is defined as follows:

$$PLV=\left|\frac{1}{N}\sum_{j=0}^{N-1}{e}^{i\left({\varnothing }_{x}\left(j\Delta t\right)-{\varnothing }_{y}\left(j\Delta t\right)\right)}\right|$$
(20)

Here, \(\Delta t\) denotes the sampling period, \(N\) represents the number of samples per signal, \(i\) denotes the imaginary unit, and \(e\) signifies the exponential operation. PLV signifies phase synchronization, with values ranging from 0 to 1. A value of 0 indicates a lack of synchronization, while 1 represents strict phase synchronization. On the other hand, PLI characterizes the asymmetry in the phase difference distribution between two signals. It is computed based on the relative phase difference between the two signals:

$$PLI=\left|E[sign(\Delta \varnothing (t))]\right|$$
(21)

\(E\) represents expectation, the result value is located within the interval [0, 1], and a higher value indicates a higher phase synchronization.
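Both measures follow directly from the Hilbert-transform phases defined above. A minimal sketch is given below; it assumes the two inputs have already been band-pass filtered to the frequency band of interest:

```python
import numpy as np
from scipy.signal import hilbert

def plv_pli(x, y):
    """Phase synchronization between two narrow-band (pre-filtered) signals."""
    dphi = np.angle(hilbert(x)) - np.angle(hilbert(y))   # instantaneous phase difference (Eqs. 18-19)
    plv = np.abs(np.mean(np.exp(1j * dphi)))             # Eq. (20): 0 = no sync, 1 = strict sync
    pli = np.abs(np.mean(np.sign(np.sin(dphi))))         # Eq. (21): asymmetry of the phase-difference sign
    return plv, pli
```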

Mutual information (MI)

MI is based on information theory and quantifies how much information one signal provides about another [109]. \(P({x}_{j})\) and \(P({y}_{j})\) are the probability distributions of signal \(X=\{{x}_{j}\}\) and signal \(Y=\{{y}_{j}\}\), respectively. The entropy of \(X\) and \(Y\) is defined as follows:

$$H\left(X\right)=-\sum_{j=1}^{N}P\left({x}_{j}\right)log\left(P\left({x}_{j}\right)\right)$$
(22)
$$H\left(Y\right)=-\sum_{j=1}^{N}P\left({y}_{j}\right)log\left(P\left({y}_{j}\right)\right)$$
(23)

N signifies the window length. \(H\left(\left.Y\right|X\right)\) and \(H\left(X,Y\right)\) refer to the conditional entropy and joint entropy between \(X\) and \(Y\), respectively, which are defined as:

$$H\left(X,Y\right)=-{E}_{X}\left[{E}_{Y}\left[logP\left(X,Y\right)\right]\right]$$
(24)
$$H\left(\left.Y\right|X\right)=-{E}_{X}\left[{E}_{Y}\left[logP\left(\left.Y\right|X\right)\right]\right]$$
(25)

Here, E denotes the expected value function. The MI of signal X and signal Y is calculated as follows:

$$MI\left(X,Y\right)=H\left(X\right)+H\left(Y\right)-H\left(X,Y\right)=H\left(Y\right)-H\left(\left.Y\right|X\right)$$
(26)

MI can simultaneously detect the linear and nonlinear correlations between two signals, but it requires a large amount of data.
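A simple histogram-based estimator of Eq. (26) is sketched below (NumPy only); the number of bins is an illustrative assumption, and more refined estimators exist:

```python
import numpy as np

def mutual_information(x, y, bins=32):
    """Histogram-based estimate of MI between two signals (Eq. 26), in nats."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy /= pxy.sum()                                      # joint probability P(X, Y)
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)             # marginals P(X), P(Y)
    nz = pxy > 0                                          # skip empty bins to avoid log(0)
    return np.sum(pxy[nz] * np.log(pxy[nz] / np.outer(px, py)[nz]))
```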

Granger causality (GC)

GC is a linear vector AR model based on random time-series data, which can estimate effective interactions from time-series data [111]. In this framework, if the past values of the signal \({X}_{1}(t)\) contain information that improves the prediction of \({X}_{2}(t)\) beyond the information contained in the past values of \({X}_{2}(t)\) alone, then \({X}_{1}(t)\) “Granger-causes” \({X}_{2}(t)\). The corresponding bivariate AR model is as follows:

$${X}_{1}\left(t\right)=\sum_{j=1}^{p}{A}_{11,j}{X}_{1}\left(t-j\right)+\sum_{j=1}^{p}{A}_{12,j}{X}_{2}\left(t-j\right)+{E}_{1}\left(t\right)$$
(27)
$${X}_{2}\left(t\right)=\sum_{j=1}^{p}{A}_{21,j}{X}_{1}\left(t-j\right)+\sum_{j=1}^{p}{A}_{22,j}{X}_{2}\left(t-j\right)+{E}_{2}\left(t\right)$$
(28)

Here, \(p\) refers to the maximum number of delayed observations, \(j\) denotes the lag index, the matrices \(A\) contain the coefficients quantifying the contribution of each delayed observation to the predicted signal values, and \({E}_{1}\left(t\right)\) and \({E}_{2}\left(t\right)\) signify the residuals of each time series. GC can only capture the linear characteristics of the signals and cannot analyze nonlinear interactions.
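A minimal bivariate GC test can be run with statsmodels (assumed installed); the simulated coupling, lag, and choice of test statistic below are illustrative:

```python
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests   # statsmodels, assumed installed

rng = np.random.default_rng(0)
x1 = rng.standard_normal(2000)
x2 = 0.6 * np.roll(x1, 2) + 0.5 * rng.standard_normal(2000)   # x2 follows x1 with a 2-sample delay

# columns are [effect, cause]: tests whether the 2nd column Granger-causes the 1st
res = grangercausalitytests(np.column_stack([x2, x1]), maxlag=4, verbose=False)
f_stat, p_value, _, _ = res[2][0]['ssr_ftest']                 # F-test at lag 2
```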

Cross-frequency analysis (CFA)

CFA is a kind of rapidly developing connectivity analysis method, which mainly includes cross-frequency coupling (CFC) and cross-frequency directionality (CFD).

CFC describes the interaction of brain oscillations across different frequency bands and manifests in four modes: phase-to-amplitude, power-to-power, phase-to-phase, and phase-to-frequency interactions. The Kullback–Leibler distance serves as an effective metric for quantifying CFC [112]. Notably, CFC holds significance in working memory processes [113]. According to the theta/gamma neural code hypothesis, conserved memory items are encoded through theta-nested gamma cycles in sensory regions, facilitating communication between different brain cortices during memory and sensory processes [114]. A study leveraging iEEG data in epilepsy patients, coupled with behavioral outcomes, underscores the association between theta/gamma CFC across diverse brain regions and working memory performance [115]. Key findings reveal the widespread distribution of theta/gamma phase amplitude coupling across the cortex, with increased coupling strength observed in more cognitively demanding working memory tasks [116].
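As an illustration of the Kullback–Leibler-based quantification of phase-amplitude coupling mentioned above, the sketch below computes a theta-gamma modulation index from a single channel using SciPy; the band limits, filter order, and bin count are illustrative assumptions:

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def bandpass(x, lo, hi, fs, order=4):
    """Zero-phase band-pass filter."""
    b, a = butter(order, [lo / (fs / 2), hi / (fs / 2)], btype='band')
    return filtfilt(b, a, x)

def modulation_index(x, fs, phase_band=(4, 8), amp_band=(30, 80), n_bins=18):
    """KL-based phase-amplitude coupling index (theta phase vs. gamma amplitude)."""
    phase = np.angle(hilbert(bandpass(x, *phase_band, fs)))       # low-frequency phase
    amp = np.abs(hilbert(bandpass(x, *amp_band, fs)))             # high-frequency amplitude envelope
    edges = np.linspace(-np.pi, np.pi, n_bins + 1)
    mean_amp = np.array([amp[(phase >= edges[i]) & (phase < edges[i + 1])].mean()
                         for i in range(n_bins)])
    p = mean_amp / mean_amp.sum()                                 # amplitude distribution over phase bins
    return np.sum(p * np.log(p * n_bins)) / np.log(n_bins)        # KL distance from uniform, scaled to [0, 1]
```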

CFD, measuring information flow direction between brain regions, involves the modulation of high-frequency signal amplitude by the phase of a low-frequency signal [117]. It relies on the phase slope index, quantifying the phase slope in the cross-spectrum of two signals [117]. CFD has proven valuable for inferring causal relationships and estimating signal delays [118]. Additionally, it has been employed in exploring information flow directions between distinct brain regions during various cognitive tasks [117].

Source localization analysis

With the continuous development of EEG and MEG devices, the number of channels in scalp EEG or MEG has increased to over 100. Multi-channel and multi-location EEG/MEG signals have accelerated the development of EEG source localization.

First, structural MRI is often used as a prior in source localization analysis because it provides a high-resolution three-dimensional (3D) image of the brain’s anatomy. This image can be used to create a head model that accurately represents the geometry and conductivity of the brain and skull [119]. The head model is then used to calculate the forward solution, which describes how electrical activity generated by the brain is measured at the scalp [120]. By incorporating structural MRI information into the forward solution, the accuracy of the source localization can be improved.

Moreover, structural MRI has the potential to generate an accurate boundary element model of the head, facilitating the computation of the lead field matrix [121]. This matrix characterizes the propagation of electrical activity generated by the brain to the scalp electrodes [122]. The utilization of a realistic boundary element model enhances the precision of lead field matrix calculations, thereby improving the accuracy of source localization [123].

The source localization method can infer the intracranial discharge status of the brain through multi-channel signals from the scalp, human brain physical models, and finite element calculations. Source localization methods are commonly employed to localize functional areas and lesion areas, among others, under non-invasive conditions. Common source localization methods are introduced below. The articles in Table 4 explored EEG source localization techniques, including dipole analysis, beamforming approaches, and methods like low-resolution electromagnetic tomography (LORETA) and standardized LORETA (sLORETA), to study various conditions such as epilepsy, visual working memory tasks, auditory attention, depression, obsessive–compulsive disorder, pain perception, age-related hearing loss, and different neurological disorders, providing insights into the localization of brain activity in these contexts [124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151].

Table 4 Applications of source localization analyses

Minimum norm estimation

The minimum norm estimation method uses MEG for analysis and solves for the current distribution by estimating a linear combination of the magnetometer lead fields. \({L}_{i}\) signifies the lead field at sensor position \(i\), so the output of the magnetometer is defined as follows:

$${B}_{i}\left(J\right)=\int {L}_{i}\left(r\right)J\left(r\right)dV$$
(29)

Here, \(J\left(r\right)\) denotes the current (source) distribution, and the linear relationship among the magnetometer readings, the current distribution, and the lead field is expressed as:

$$B=LJ$$
(30)

Consequently, the shortest current vector needed to explain the magnetometer output is defined by multiplying the output vector \(B\) by the pseudo-inverse of \(L\):

$$\widehat{J}={L}^{+}B$$
(31)

Here, \({L}^{+}={L}^{T}{(L{L}^{T})}^{+}\) represents the Moore–Penrose generalized inverse, which yields minimum norm solutions for pure signals, noise-contaminated signals, and smoothed noise signals. Due to the harmonic nature of the minimum norm solution, deep sources tend to be mislocalized toward the outermost cortex, leading to localization errors [152].

Focal underdetermined system solution (FOCUSS)

FOCUSS is a high-resolution non-parametric technique that allocates current to each element within a predetermined reconstruction region using a forward model [153]. The weighted minimum norm method is used to perform mathematical calculations in the recursive steps in focusing. The calculation formula for the unknown current element \(I\) is as follows:

$$I=W{\left(GW\right)}^{+}B=W{W}^{T}{G}^{T}{\left(GW{W}^{T}{G}^{T}\right)}^{-1}B$$
(32)

Here, \(W\) is an \(n \times n\) matrix that refers to a constraint on the results to strengthen some elements in \(I\), \(B\) denotes the measured value of the radial magnetic field, and \(G\) signifies the spatial weight of the element:

$${W}_{k}=\left[\begin{array}{ccc}{I}_{{1}_{k-1}}& & 0\\ & \ddots & \\ 0& & {I}_{{n}_{k-1}}\end{array}\right]$$
(33)

Here, \({I}_{{i}_{k-1}}\) represents the \(i\)th element of vector \(I\) at iteration \((k-1)\), and \(k\) signifies the index of the iteration step. By repeatedly constructing \(W\) and computing the weighted minimum norm solution, the model converges, but the computation time of FOCUSS is longer than that of other algorithms.

LORETA

LORETA is an innovative method in the high temporal resolution neuroimaging field that allows for the 3D reconstruction of the EEG activity distribution [154]. A head model is used for LORETA, and the intensity and direction of electrical activity at each point determine the electromagnetic field measured on the scalp. It is defined as:

$$\underset{J}{\mathrm{min}}{F}_{W}$$
(34)

with

$${F}_{W}={\Vert \Phi -KJ\Vert }^{2}+\alpha {J}^{T}WJ$$
(35)

In the above equation, \(\Phi\) represents the vector of potential differences, \(K\) denotes the lead field matrix of the volume, \(J\) signifies the current density, \(W\) denotes the discrete Laplace operator in the square space, and \(\alpha\) refers to the Tikhonov regularization parameter. The corresponding solution is:

$${\widehat{J}}_{W}={T}_{W}\Phi$$
(36)

The \({T}_{W}\) value can be calculated using the following formula:

$${T}_{W}={W}^{-1}{K}^{T}{\left(K{W}^{-1}{K}^{T}+\alpha H\right)}^{+}$$
(37)

Here, \(H\) denotes the average reference operator. Because the smoothness constraint is realized using the discrete spatial Laplacian operator, the spatial resolution of LORETA is relatively low.

sLORETA is also a common and popular source localization method. sLORETA incorporates additional assumptions regarding the smoothing and weighting of the values [155]. An advantage of sLORETA is that it has “guaranteed accuracy” in the presence of a single dipole, while LORETA does not [155]. sLORETA has been used in various studies to estimate the sources of EEG signals in the brain [156]. For example, sLORETA has been used to study the neural correlates of cognitive processes such as attention, memory, and language [156]. sLORETA has also been used to study the neural correlates of various disorders such as depression, schizophrenia, and Alzheimer’s disease [156].
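As a hedged sketch of how such an estimate is typically computed in software, the example below uses MNE-Python's inverse-operator API with its bundled sample dataset; the file names follow MNE's sample-data tutorials and are assumptions here rather than part of the methods above:

```python
import mne
from mne.datasets import sample
from mne.minimum_norm import make_inverse_operator, apply_inverse

data_path = sample.data_path()                         # downloads MNE's sample dataset on first use
meg_dir = data_path / 'MEG' / 'sample'
evoked = mne.read_evokeds(meg_dir / 'sample_audvis-ave.fif', condition='Left Auditory')
fwd = mne.read_forward_solution(meg_dir / 'sample_audvis-meg-eeg-oct-6-fwd.fif')
noise_cov = mne.read_cov(meg_dir / 'sample_audvis-cov.fif')

inv = make_inverse_operator(evoked.info, fwd, noise_cov)
stc = apply_inverse(evoked, inv, lambda2=1.0 / 9.0, method='sLORETA')
peak_vertex, peak_time = stc.get_peak()                # vertex and latency of maximal source activity
```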

Dipole

The dipole method can predict the electric field generated by a theoretical dipole in the brain using dipole property-related principles [157]. Location and orientation are the two parameters of the dipole model: the location indicates the position of the active region within the brain, and the orientation indicates the arrangement of the brain cells in the active region.

The six parameters of a dipole source consist of three coordinates in \({r}_{d}\in {R}^{3\times 1}\) and three dipole components in \(d=({d}_{x},{d}_{y},{d}_{z})\in {R}^{3\times 1}\) (equivalently, two orientation angles and an intensity parameter). For each dipole position \({r}_{d}\) within the head, the relation between \(d\) and the potentials measured at the \(m\) electrodes, \({V}_{mod}\in {R}^{m\times 1}\), can be written as:

$${V}_{mod}=L\left({r}_{d}\right)d$$
(38)

The matrix \(L\in {R}^{m\times 3}\) is a lead field matrix, determined by dipole position, electrode position, and head geometry.

A more realistically shaped head model is often required for patient EEG data analysis, but in this case, boundary element methods or numerical methods such as the finite-difference method are needed to compute the lead field matrix.

Beamforming

Beamforming is a spatial filtering technique for signals measured by discrete sensors [158]. Beamforming refocuses the signals captured on the scalp to their original locations by finding the weights of each location in the source space, thus minimizing the variance of the current dipole at each location. It is often desirable to extract signals from a small region of the brain that is modeled by dipoles at the location \({r}_{d}\) with a specific orientation. With a given dipole and its components, the potential distribution is defined as follows:

$$c= L\left({r}_{d}\right)d$$
(39)
$${w}^{T}c=1$$
(40)

\({r}_{d}\) denotes the dipole coordinate, \(d\) represents the dipole component, \(L\) signifies the lead field matrix, and \(w\) refers to the weight vector.

The output variance or output power of a beamformer is calculated as follows:

$$\varepsilon \left\{{\left|y\left(k\right)\right|}^{2}\right\},k=-\infty ,\dots ,\infty$$
(41)

\(y(k)\) represents the output and \(\varepsilon \left\{{|\cdot |}^{2}\right\}\) denotes the expected value of its parameter. The results are constrained with different restrictions.

Current source density (CSD)

CSD estimates the current projected radially from the underlying neuronal tissue at a given surface location toward the skull and scalp; it is computed as a spatially weighted sum of the potential gradients pointing to that location from some or all of the recorded locations.

The CSD estimate is:

$$C\left(E\right)=\sum_{i=1}^{N}{c}_{i}h\left(\mathrm{cos}\left(E,{E}_{i}\right)\right)$$
(42)

Here, \(C\left(E\right)\) denotes the current density value at any point \(E\) on the sphere surface, \({c}_{i}\) refers to a constant to express an \(i\) surface potential set, and \(\mathrm{cos}\left(E,{E}_{i}\right)\) refers to the cosine of the angle between the surface point \(E\) and the electrode projection \({E}_{i}\). The function \(h\left(x\right)\) is defined as the sum of the grades:

$$h\left(x\right)=\frac{1}{4\pi }\sum_{n=1}^{\infty }\frac{2n+1}{{n}^{m-1}{\left(n+1\right)}^{m-1}}{P}_{n}\left(x\right)$$
(43)

Here, \(m\) is a constant greater than 1 and \({P}_{n}\) is the \(n\)th Legendre polynomial, defined as follows:

$$\Delta {P}_{n}=-n\left(n+1\right){P}_{n}$$
(44)

CSD does not require reference information but is susceptible to noise. CSD is a source localization method designed for scalp EEG that treats the entire head as a conductor of uniform conductivity. It concentrates signals from multiple EEG channels to their respective channels by adjusting the parameters, but it cannot focus EEG signals into the intracranial region. However, its computation is simple and fast, and it is still used in some EEG analyses for scalp-level localization without the assistance of a brain model [159].

Machine learning

Machine learning is a very popular class of signal processing methods currently applied in the medical field [160], and with the rapid development of deep learning, machine learning methods in EEG analyses have gained attention. Machine learning methods are commonly used for classification and regression problems in EEG analyses and have yielded substantial results in disease research [161]. The studies in Table 5 utilized various machine learning and signal processing techniques, including common spatial pattern (CSP), deep learning, wavelet analysis, support vector machine (SVM), convolutional neural network (CNN), recurrent neural network (RNN), and long short-term memory (LSTM) network, to address diverse applications such as seizure detection, diagnosis of neurological disorders (autism, schizophrenia, Parkinson’s disease), mental fatigue measurement, and emotion recognition using EEG signals [162,163,164,165,166,167,168,169,170,171,172,173,174,175,176,177,178,179,180,181,182,183,184,185].

Table 5 Applications of machine learning methods

CSP

The CSP algorithm is a common spatial-filtering algorithm for multi-channel EEG analysis; it uses a linear transformation to maximize the ratio of the variances of two classes of signals after projection [186].

\({X}_{1},{X}_{2}\) refer to signal data of size \(\left(n,{T}_{1}\right)\) and \((n,{T}_{2})\), where \(n\) is the number of channels and \({T}_{1},{T}_{2}\) are the lengths of the respective signals:

$$w={argmax}_{w}\frac{{\Vert w{X}_{1}\Vert }^{2}}{{\Vert w{X}_{2}\Vert }^{2}}$$
(45)

\(w\) denotes the projection matrix, which can be solved using matrix diagonalization.

In contrast to other spatial feature extraction methods, the CSP method is simple and efficient, but it is only suitable for processing two categories of signal data.
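In practice, CSP is usually applied to band-pass-filtered epochs and followed by a simple classifier. The sketch below uses MNE-Python's CSP implementation paired with an LDA classifier (introduced in the next subsection), as is typical in motor-imagery BCI pipelines; the random data and parameter choices are illustrative assumptions:

```python
import numpy as np
from mne.decoding import CSP                            # MNE-Python, assumed installed
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

# X: band-pass-filtered epochs, shape (n_epochs, n_channels, n_times); y: two class labels
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 32, 500))
y = np.repeat([0, 1], 50)

clf = make_pipeline(CSP(n_components=4, log=True), LinearDiscriminantAnalysis())
scores = cross_val_score(clf, X, y, cv=5)               # ~chance level here, since X is pure noise
```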

Linear discriminant analysis (LDA)

LDA, a classical linear method, is mainly used to find features that characterize or separate two classes and is also applicable to the dimensionality reduction of data [186]. After projection, samples within each class are compact, while the two classes are well separated.

$$J\left(w\right)=\frac{{\left({M}_{1}-{M}_{2}\right)}^{2}}{{S}_{1}+{S}_{2}}$$
(46)
$$\widehat{w}={argmax}_{w} J\left(w\right)$$
(47)

\({S}_{1},{S}_{2}\) are intra-class scatters, \({\left({M}_{1}-{M}_{2}\right)}^{2}\) refers to the inter-class scatter, and \(\widehat{w}\) represents the mapping matrix. Because LDA assumes that the data obey the Gaussian distribution, it does not perform satisfactorily in processing data with non-Gaussian distributions.

SVM

SVM is a class of generalized linear classifiers for the supervised classification of binary data [176]. SVM constructs a hyperplane in a high-dimensional space to distinguish between two classes of data. Assuming that the dataset is \([\left({x}_{1},{y}_{1}\right),\left({x}_{2},{y}_{2}\right),\dots ,\left({x}_{n},{y}_{n}\right)]\), wherein \({y}_{i}\in \{-1, 1\}\), the hyperplane is defined as:

$${w}^{T}x-b=0$$
(48)

The plane separating the two classes of data is as follows:

$${w}^{T}{x}_{i}+{w}_{0}=1,\quad {w}^{T}{x}_{i}+{w}_{0}=-1$$
(49)

Here, \(w\) represents the normal vector and \(b\) denotes the offset, so the margin between the two classes is \(2/\Vert w\Vert\). The method maximizes \(2/\Vert w\Vert\) while ensuring that all data satisfy the constraints. Methods such as Lagrangian duality can be used to solve such constrained optimization problems and compute the separating hyperplane. SVM performs poorly on multi-class classification problems.
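A typical EEG application trains an SVM on per-epoch spectral features rather than raw samples. The scikit-learn sketch below uses random stand-in features (e.g., band power per channel); the feature layout, kernel, and regularization constant are illustrative assumptions:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

# X: one row per epoch of band-power features (e.g., delta/theta/alpha/beta per channel)
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 16))
y = rng.integers(0, 2, 200)                             # two classes (e.g., patient vs. control)

clf = make_pipeline(StandardScaler(), SVC(kernel='rbf', C=1.0))
scores = cross_val_score(clf, X, y, cv=5)               # chance level here, since features are random
```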

CNN

The convolutional layers of a CNN can effectively extract structural information from EEG signals in the spatial, temporal, and frequency domains [187,188,189].

For feature extraction, each convolution kernel computes a dot product with the input data region by region, sliding across the input with a fixed stride and sharing the same weights at every position. The resulting output is a set of K feature maps.

$${Z}_{j}^{l}=\sigma \left(\sum {Z}_{j}^{l-1}*{W}_{ j,i}^{l}+{B}_{j}^{l}\right)$$
(50)

Here, \({B}_{j}^{l}\) signifies the \(j\)th bias in layer \(l\), \({W}_{ j,i}^{l}\) refers to the weight matrix connecting the feature maps in the neighboring layers (\({Z}_{j}^{l},{Z}_{j}^{l-1}\)), \(*\) represents the convolution operator, and \(\sigma (\cdot )\) denotes the nonlinear activation function.

The extracted feature maps are recognized by a classifier, which often uses the cross-entropy loss function:

$$L\left({y}_{i},{\widehat{y}}_{i}\right)=\frac{1}{N}\sum_{i=1}^{N}\left(-{y}_{i}\mathrm{log}{\widehat{y}}_{i}-\left(1-{y}_{i}\right)\mathrm{log}\left(1-{\widehat{y}}_{i}\right)\right)$$
(51)

Here, \({y}_{i}\) represents the sample’s true value and \({\widehat{y}}_{i}\) signifies the model-predicted value. CNN requires less preprocessing than other algorithms but also has the risk of overfitting.
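The sketch below shows how the temporal and spatial convolutions described above are commonly arranged for EEG epochs, using PyTorch; it is a minimal illustrative architecture, not a published model, and all layer sizes are assumptions:

```python
import torch
import torch.nn as nn

class SimpleEEGCNN(nn.Module):
    """Minimal CNN for EEG epochs shaped (batch, 1, n_channels, n_times)."""
    def __init__(self, n_channels=32, n_times=500, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=(1, 25), padding=(0, 12)),   # temporal convolution
            nn.Conv2d(8, 16, kernel_size=(n_channels, 1)),           # spatial convolution across channels
            nn.BatchNorm2d(16), nn.ELU(),
            nn.AvgPool2d((1, 4)), nn.Dropout(0.5),
        )
        self.classifier = nn.Linear(16 * (n_times // 4), n_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

model = SimpleEEGCNN()
logits = model(torch.randn(4, 1, 32, 500))              # 4 epochs, 32 channels, 500 samples
loss = nn.CrossEntropyLoss()(logits, torch.tensor([0, 1, 0, 1]))   # Eq. (51)-style cross-entropy
```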

RNN

The RNN model performs well for temporal signals: connections between nodes form a directed graph along the time sequence, effectively extracting feature information in the time dimension [190]. However, gradient explosion and gradient vanishing problems arise from the RNN's backpropagation-through-time structure. LSTM was later developed and has broader applications than the RNN.

LSTM

LSTM alleviates the problems of the RNN [191] and selectively transmits information using forget gates, input gates, and output gates.

The forget gate determines which information to remove from the cell state:

$${f}_{t}=\sigma \left({W}_{f}\left[{h}_{t-1},{x}_{t}\right]+{b}_{f}\right)$$
(52)

The input gate determines which values will be updated:

$${i}_{t}= \sigma \left({W}_{i}\left[{h}_{t-1},{x}_{t}\right]+{b}_{i}\right)$$
(53)
$${\widetilde{C}}_{t}=\mathit{tan}h\left({W}_{c}\left[{h}_{t-1},{x}_{t}\right]+{b}_{c}\right)$$
(54)

Then, the unit value state is updated based on the equations above:

$${C}_{t}={\mathrm{f}}_{t}*{C}_{t-1}+{i}_{t}*{\widetilde{C}}_{t}$$
(55)

Finally, the output gate determines which parts of the unit state will be the final output:

$${o}_{t}= \sigma \left({W}_{o}\left[{h}_{t-1},{x}_{t}\right]+{b}_{o}\right)$$
(56)
$${h}_{t}={o}_{t}*\mathrm{tanh}\left({C}_{t}\right)$$
(57)

wherein \(\sigma\) signifies the sigmoid activation function that compresses numbers to the range [0, 1], \(tanh\) denotes the hyperbolic tangent activation function that compresses numbers to the range [−1, 1], \({W}_{f}\), \({W}_{i}\), \({W}_{c}\), and \({W}_{o}\) are the weight matrices, \({x}_{t}\) represents the input vector, \({h}_{t-1}\) represents the previous hidden state, and \({b}_{f}\), \({b}_{i}\), \({b}_{c}\), and \({b}_{o}\) are bias vectors. LSTM trains slowly because of its sequential processing and higher computational cost [192].

Joint application of EEG analysis and machine learning methods

The joint application of EEG analysis and machine learning methods has been an active area of research in neuroscience and disease diagnosis. EEG is a non-invasive method of measuring the electrical activity of the brain, and machine learning algorithms can be used to extract information from EEG signals to help diagnose various disorders and identify different brain states [193]. Machine learning algorithms have been developed to extract features from EEG signals, such as frequency bands, time–frequency representations, and connectivity measures [193]. These features can then be used to train machine learning models to classify different brain states or diagnose various disorders [193]. Machine learning algorithms have been developed to detect seizures in EEG signals with high accuracy [194] and classify EEG signals from patients with Alzheimer’s disease and healthy controls [195].

The joint application of EEG analysis and machine learning methods has several advantages in neuroscience and disease diagnosis. It allows for the identification of patterns in EEG signals that are difficult to detect using traditional methods. Machine learning algorithms can be used to extract features from EEG signals that are not easily visible to the human eye, such as subtle changes in frequency or amplitude [193]. These features can then be used to train machine learning models to classify different brain states or diagnose various disorders. Another advantage of this combination is that it can help reduce the subjectivity of EEG analysis. Traditional EEG analysis methods rely on visual inspection of the EEG signal by a trained expert, which can be time-consuming and subjective [193]. Machine learning algorithms can be used to automate the process of EEG analysis, reducing the time and subjectivity involved in the analysis [194].

Moreover, different models may be better suited to different aspects of the data. One model may be better at detecting certain types of patterns, while another may be better at classifying the data into different categories. By combining the strengths of different models, it is possible to create a more accurate and robust analysis [193], reduce the risk of overfitting, and obtain a more generalizable analysis [196].

In summary, the joint application of EEG analysis and machine learning methods has great potential for the diagnosis of various disorders and the identification of different brain states. It has several advantages, including the ability to identify patterns in EEG signals that are difficult to detect using traditional methods, and the ability to reduce the subjectivity of EEG analysis.

Discussion

This article reviewed several commonly used EEG/iEEG analysis methods in neuroscience and introduced the applied principles based on the data generation characteristics. Due to different EEG data generation approaches, there are fundamental differences in data time, events, and variability. Therefore, the methods for data analysis should be selected based on these characteristics to ensure theoretical accuracy. Figure 4 presents a summary diagram of method selection. The required method can be selected based on the characteristics of the EEG signal and application requirements. The various methods are discussed below.

Fig. 4

Summary of method selection for different data characteristics and application requirements. AR autoregressive, CFA cross-frequency analysis, CNN convolutional neural network, COH coherence, CORR correlation, CSD current source density, CSP common spatial patterns, EEG electroencephalography, EMD empirical mode decomposition, FFT fast Fourier transform, FOCUSS focal underdetermined system solution, GC Granger causality, LDA linear discriminant analysis, LORETA low-resolution electromagnetic tomography, LSTM long short-term memory, MI mutual information, PLV/PLI phase locking value/phase lag index, RNN recurrent neural network, STFT short-time Fourier transform, SVM support vector machine, WT wavelet transform, WTC wavelet coherence, WVD Wigner-Ville distribution

The power spectrum analysis method is used to reflect the energy changes in various brain regions. FFT has high frequency resolution and accuracy but is easily affected by noise and requires a large amount of data [197]. Therefore, FFT is more suitable for analyzing time-invariant EEG signals of good quality [44]. Welch and multitaper can suppress noise in EEG signals using the window averaging method, but the frequency resolution is decreased and the requirement for data length is increased [198]. Hence, for time-invariant EEG, if the signals contain slight burst noise, Welch or multitaper is a good choice [45]. Additionally, for accurate event-related EEG, the EEG can be segmented into epochs based on the onset time of events. Windowed superposition analysis of EEG in the same state, such as baseline EEG before stimulation, can also yield good power spectrum results. If signal segments are directly spliced together, jumps easily occur at the splice points, which can cause severe Gibbs ringing. Therefore, it is not recommended to splice signals before FFT calculation or Welch power spectrum analysis. The AR model can calculate the power spectrum after signal prediction and modeling, which is very suitable for analyzing short-term signal power spectra [58], such as the power spectrum of a small segment of rapidly changing EEG signals in event-related EEG. However, it should be noted that the AR model method is model-based, so the selected model may not fit the signal well, and selecting the wrong model will result in large deviations [197]. The application of AR models therefore requires a more accurate evaluation of the signals [199]. Additionally, for signals with rapid changes in amplitude, commonly used models cannot fit effectively, so AR models cannot effectively analyze the power spectrum. In summary, if the data obtained are of good quality, such as high-quality sleep data, FFT is recommended to obtain more accurate results. If the data contain random noise, such as in the long-term monitoring of epilepsy, Welch or multitaper is recommended. If the data are short, such as EEG immediately after a brief physical stimulus, the AR model is recommended for fitting. Power spectrum analysis is the basis of EEG signal analysis and one of the most important analysis methods. With upgrades in EEG acquisition equipment, the signal-to-noise ratio of the obtained EEG signal is also increasing. The transient power spectrum will then play an important role, and related research will enhance the development of brain-computer interfaces (BCIs) and machine learning fields.

Time–frequency analysis is a powerful tool for analyzing event-related EEG signals because it can describe the changes in time-varying EEG in two dimensions, time and frequency. Among the various time–frequency analysis methods, the continuous WT is the most commonly used method due to its good performance in balancing time and frequency. However, its high computational and spatial complexity makes it unsuitable for long-term data analysis [200]. Correspondingly, the discrete WT (DWT) shows good performance in signal decomposition but a poor visualization effect and unsatisfactory frequency resolution [201]. For time-invariant EEG, although the changes in the EEG are insignificant, the energy of diverse frequency bands will still change in response to long-term changes. The STFT can present this response well, having good frequency resolution [3] but low time resolution [69]. If the WT is used, although it is more informative, the resulting massive data redundancy is not conducive to observing the main features. The WVD has a very high time–frequency resolution for short-term signals but is highly susceptible to noise; therefore, the WVD is more suitable for analyzing short-term signals with little noise [70]. EMD is very suitable for analyzing signals with many abrupt amplitude changes [72] and is not affected by the Gibbs phenomenon, but its frequency localization is poor [201]. With advances in computer performance, time–frequency analysis methods have gradually replaced power spectrum methods as the first choice for observing spectrum changes. The integration of time–frequency graphs and image-based deep learning has also produced many high-quality applications. However, it should be noted that time–frequency analysis is still not detailed enough for the calculation of instantaneous changes. For example, it is difficult to use time–frequency diagram analysis for the EEG signal during epileptic discharge. In this case, it should be combined with time-domain methods to improve the description accuracy.

Connectivity analysis is an important part of neural signal analysis, and the choice of method depends on the main features to be analyzed. This article introduced several classical correlation analysis methods from the perspectives of time, frequency, and nonlinearity. Many other methods were not introduced, such as event statistics, cross-frequency coupling (CFC), and amplitude–frequency coupling; these are similar to the methods in this article and can likewise be selected based on the data characteristics. Because the characteristics of time-invariant EEG change little over time, correlation methods suited to long-term signals can be used to observe relationships between brain regions. CORR focuses on the time-scale similarity of signals, while COH focuses on their frequency-scale similarity [107]. PLV/PLI are phase-based statistics, with PLV being the stricter of the two; such measures can easily be converted into correlation statistics across frequencies or other events [202]. These three approaches measure inter-channel relationships in terms of signal similarity and phase synchronization [112]. MI and GC analyze the driving force between signals and can be used to study inter-signal relationships [109]: MI observes the relationship from the perspective of information transmission, based on information entropy, whereas GC infers it from regression modeling; consequently, the characteristics of the observations differ between the two methods [111]. In contrast, WTC requires clearly defined events to produce accurate results, making it suitable for accurate event-related EEG but not for time-invariant EEG. All of the methods mentioned above are also applicable to accurate event-related EEG. Because the connectivity changes that event-related EEG exhibits during an event are precisely timed, correlation analysis of short post-event segments is recommended to obtain results with a higher signal-to-noise ratio. For random event-related EEG, the timing of events is uncertain, so WTC cannot provide accurate results; the other correlation methods require the signals to be segmented and classified according to the main analysis features to improve the signal-to-noise ratio of the results [203]. Connectivity analysis methods have emerged as a powerful tool for studying brain networks, which are important components of brain cognition. Moreover, these methods can be combined with neural networks to develop new bionic operations: different connection patterns can represent different ways of connecting neurons, and although existing neural networks usually use direct signal connections, other connection schemes could yield more intelligent network architectures. Spiking neural networks are a representative example of this line of research.
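
The sketch below illustrates, under simplified assumptions, how three of the measures discussed here (CORR, COH, and PLV) can be computed for a pair of toy channels using NumPy and SciPy; the sampling rate, band limits, and filter order are arbitrary choices for illustration only.

```python
import numpy as np
from scipy import signal

fs = 250                                        # assumed sampling rate (Hz)
t = np.arange(0, 10, 1 / fs)
# two toy "channels" sharing a 10-Hz component with a fixed phase lag
x = np.sin(2 * np.pi * 10 * t) + 0.5 * np.random.randn(t.size)
y = np.sin(2 * np.pi * 10 * t + np.pi / 4) + 0.5 * np.random.randn(t.size)

# CORR: time-domain similarity (Pearson correlation coefficient)
corr = np.corrcoef(x, y)[0, 1]

# COH: frequency-domain similarity (magnitude-squared coherence)
f, coh = signal.coherence(x, y, fs=fs, nperseg=2 * fs)

# PLV: phase synchronization from the instantaneous phase (Hilbert transform),
# computed here after band-pass filtering both channels to the alpha band (8-13 Hz)
b, a = signal.butter(4, [8, 13], btype="bandpass", fs=fs)
phase_x = np.angle(signal.hilbert(signal.filtfilt(b, a, x)))
phase_y = np.angle(signal.hilbert(signal.filtfilt(b, a, y)))
plv = np.abs(np.mean(np.exp(1j * (phase_x - phase_y))))
```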

Source localization analysis converts multi-channel scalp EEG into estimates of activity in deeper brain regions, localizing the site of signal generation more clearly. A personalized head model is generated from structural MRI, after which intracranial neural activity is inferred from extracranial signals to estimate the discharge location within the individual brain. This process can identify key intracranial locations without surgery and has been widely used for localizing epileptic foci and mapping functional areas. Existing source localization methods differ in their implementation principles. Minimum norm estimation can partly localize sources within the intracranial region but has low accuracy for deep sources and high signal-quality requirements owing to its use with MEG [139]. The dipole and beamforming methods can effectively localize intracranial sources with good accuracy, but they require more precise parameters and an accurate head model [142, 163]. Beamforming balances speed and accuracy and is developing rapidly [204]. The FOCUSS method has good resolution but low computational efficiency [140]. LORETA is currently the most frequently used source localization method [205]; its spatial resolution is not high, but its temporal resolution is good, so it has received considerable attention in EEG analysis [139]. CSD does not require a head model and is simple and fast to compute, but it cannot localize sources within the intracranial cavity [144]. Signal quality plays a crucial role in the accuracy of multi-channel source localization. Therefore, source localization is best suited to short-term analysis of event-related EEG, where the high signal-to-noise ratio around event onsets yields more accurate localization results. For source localization of time-invariant EEG, the discharge characteristics of the EEG should be transformed before localization, or superposition should be used to increase the signal-to-noise ratio. For random event-related EEG, it is advisable to select segments with the highest possible signal-to-noise ratio for localization. Source localization methods are constrained by signal quality, channel count, and head-model accuracy, all of which affect localization accuracy, but targeted method selection can improve performance. With the continuous development of computing power and artificial intelligence, the data-quality requirements of source localization will gradually decrease, allowing these methods to be widely used in clinical practice.
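
As a hedged illustration of a distributed-source workflow, the sketch below uses the MNE-Python package to compute an sLORETA estimate from averaged event-related epochs. The file names, the precomputed forward model and noise covariance, and the regularization value are assumptions for illustration, not a prescribed pipeline.

```python
import mne
from mne.minimum_norm import make_inverse_operator, apply_inverse

# hypothetical inputs: epoched event-related EEG plus a forward model built from structural MRI
epochs = mne.read_epochs("subject01-epo.fif")           # assumed file name
fwd = mne.read_forward_solution("subject01-fwd.fif")    # assumed precomputed forward solution
noise_cov = mne.compute_covariance(epochs, tmax=0.0)    # noise covariance from the pre-stimulus baseline

evoked = epochs.average()                               # averaging epochs raises the signal-to-noise ratio
evoked.set_eeg_reference("average", projection=True)    # average reference is required for the EEG inverse

inv = make_inverse_operator(evoked.info, fwd, noise_cov)
# distributed source estimate; "sLORETA" could be replaced by "MNE" or "dSPM"
stc = apply_inverse(evoked, inv, lambda2=1.0 / 9.0, method="sLORETA")
```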

Machine learning methods are rapidly developing approaches that learn features, which can be either the original EEG signals or features derived from them [148], such as power and connectivity [149]. Hence, although machine learning adapts to a wide range of problems through self-learning, the generation and selection of features remain points of discussion. For time-invariant EEG, the features can be long-term averages, such as the power spectrum and connectivity of each channel. For accurate event-related EEG, event-related features can be learned. For random event-related EEG, machine learning methods with clustering properties are more suitable, enabling semi-supervised or unsupervised feature learning [150]. Deep learning has been successfully applied to multiple EEG tasks, such as motor imagery, epilepsy detection, major depression detection, sleep stage scoring, and event-related potential tasks [206]. The data for these tasks differ, for example in window length and channel count; given these differences, selecting a suitable type of deep learning network can achieve better classification performance. Using a CNN to classify spectrograms can produce good results [207], but CNN models are suited to classifying data without temporal information, whereas LSTM models are suited to regression analyses involving temporal information [190]. Compared with traditional single models such as CNN and LSTM, a mixed model is recommended [208]: if a single model fails to achieve the expected results, researchers can fuse multiple models to improve accuracy. For skilled machine learning researchers, this is a simple and fast way to build applications, and such mixed models are expected to perform well in both classification accuracy and transfer learning. Nevertheless, deep learning models also have shortcomings. For example, because disease datasets are often small, models are prone to overfitting; methods such as adding regularization terms should therefore be considered to minimize its impact [209]. Moreover, introducing model interpretability can aid in understanding the features a model uses for classification, and because EEG features may contain causal relationships, causal algorithms can be considered to further optimize the models [210]. At present, most deep learning models are designed for images and do not adapt well to EEG data. The transformer, a newer type of neural network model, is being used in the diagnosis and treatment of brain diseases, but its application to EEG needs further study [211]. It is therefore necessary to design encoding and decoding schemes based on the characteristics of EEG data and to develop new model structures. Machine learning is currently the fastest-growing family of neural signal processing methods, and many researchers have proposed new processing ideas for EEG analysis. With the rapid development of brain-inspired intelligence, large-scale models that perform bionic simulations of human brain function may emerge, helping to advance artificial intelligence to a higher level.
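
The following is a minimal sketch of the mixed-model idea in PyTorch: a small 1-D CNN extracts local waveform features and an LSTM models their temporal evolution before classification. The channel count, window length, and layer sizes are illustrative assumptions rather than a validated architecture.

```python
import torch
import torch.nn as nn

class CNNLSTM(nn.Module):
    """Toy hybrid model: a 1-D CNN extracts local waveform features,
    then an LSTM models their temporal evolution before classification."""

    def __init__(self, n_channels=32, n_classes=2):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv1d(n_channels, 16, kernel_size=7, padding=3),
            nn.BatchNorm1d(16),
            nn.ReLU(),
            nn.MaxPool1d(4),                 # downsample the time axis
        )
        self.lstm = nn.LSTM(input_size=16, hidden_size=32, batch_first=True)
        self.fc = nn.Linear(32, n_classes)

    def forward(self, x):                    # x: (batch, channels, time)
        feats = self.cnn(x)                  # (batch, 16, time / 4)
        feats = feats.permute(0, 2, 1)       # LSTM expects (batch, time, features)
        _, (h_n, _) = self.lstm(feats)
        return self.fc(h_n[-1])              # class logits

# example forward pass on 2-s windows of 32-channel EEG sampled at 250 Hz
model = CNNLSTM()
logits = model(torch.randn(8, 32, 500))      # -> shape (8, 2)
```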

This article discussed some commonly used EEG analysis methods; in practice, however, combinations of methods are frequently used. For instance, the BCI, a frontier field of neuroscience and neurological disease research, requires complex processing of brain electrical signals using multiple methods. A BCI is a neural device that translates an individual's neural activity into external responses or commands. These interfaces find applications in restoring function in conditions such as epilepsy, stroke, spinal cord injury, amyotrophic lateral sclerosis, cerebral palsy, narcolepsy, Parkinson's disease, and neuromuscular disorders [212]. In mental health, BCIs are under investigation as potential treatments for depression, anxiety, obsessive–compulsive disorder, and other neuropsychiatric disorders [213]. BCIs can acquire a diverse array of signals, each associated with different applications. Despite this variability, the decoding of brain signals generally follows a five-stage process: signal acquisition, preprocessing, feature extraction, classification, and control interface; these stages integrate various methodologies, as sketched below. Ongoing research on BCI analysis methods aims to enhance accuracy and reliability: the application of deep learning algorithms to EEG data is a particularly promising avenue, as is the use of explainable artificial intelligence techniques to gain insights into BCI analysis outcomes. Like BCI research, numerous studies necessitate the integration of traditional and innovative technologies to continually enhance the efficacy of EEG analysis methods and establish a foundation for further research.
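
The sketch below illustrates the five-stage decoding process on toy data, assuming SciPy and scikit-learn are available: acquisition is replaced by random epochs, preprocessing is a band-pass filter, features are log band powers, classification uses linear discriminant analysis, and the control interface maps the predicted label to a hypothetical command. All names and parameters are illustrative assumptions.

```python
import numpy as np
from scipy import signal
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

fs = 250                                                  # assumed sampling rate (Hz)

def preprocess(epoch):
    """Stage 2: band-pass filter each channel to 8-30 Hz, a range often used in motor imagery."""
    b, a = signal.butter(4, [8, 30], btype="bandpass", fs=fs)
    return signal.filtfilt(b, a, epoch, axis=-1)

def extract_features(epoch):
    """Stage 3: log band power per channel as a simple feature vector."""
    return np.log(np.var(epoch, axis=-1))

# Stage 1 (signal acquisition) is replaced by random toy epochs: (trials, channels, samples)
rng = np.random.default_rng(0)
X_raw = rng.standard_normal((40, 8, 2 * fs))
y = rng.integers(0, 2, size=40)

X = np.array([extract_features(preprocess(ep)) for ep in X_raw])

# Stage 4: classification
clf = LinearDiscriminantAnalysis().fit(X, y)

# Stage 5: control interface maps the predicted label to a hypothetical device command
command = {0: "rest", 1: "move_cursor_left"}[int(clf.predict(X[:1])[0])]
```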

Conclusions

EEG/iEEG is commonly applied in functional neuroimaging and is one of the leading tools in neuroscience. Clinical medicine, BCIs, and psychological research all require EEG/iEEG analysis. In recent decades, a wide variety of analysis methods have emerged, and interest in these techniques is high. However, the sheer abundance of methods has left many researchers uncertain about which are applicable to their data.

This review categorizes representative research methods based on the characteristics of EEG/iEEG signals, classifying them into power spectrum analysis, time–frequency analysis, connectivity analysis, source localization analysis, and machine learning. Other widely applied approaches, such as nonlinear analysis, predictive analysis, and graph theory analysis, are not introduced here; in terms of analysis purpose, they can be regarded as similar to, or extensions of, the classical methods covered in this review. The methods introduced here are only a subset of common methods, and users should make choices based on the characteristics of both their data and the methods.