Background

The first recording of the electric field of the human brain was made by the German psychiatrist Hans Berger in Jena, Germany, in 1924; he named the recorded signals electroencephalograms (EEGs) [1]. Over the past few decades, this signal has attracted considerable interest in the study of cognitive processes in both clinical [2-9] and research settings [10-16]. Its main advantages are non-invasive measurement, high temporal resolution, ease of implementation, and low cost [17, 18]. An event-related potential (ERP), derived from the EEG, is a measured brain response that results directly from a thought or perception. In 1964 and 1965, two groups (Chapman and Bragdon [19] and Sutton et al. [20], respectively) independently discovered the P300 component, a peak occurring approximately 300 milliseconds (ms) after a task-relevant stimulus. Recently, a wide variety of potential applications of the ERP-based P300 component have been studied [21-26].

Ideally, an EEG system records along the scalp the electrical activity generated by the firing of neurons within the brain. In practice, however, EEG signals also contain the activity of neurons located at considerable distances from the sensors (electrodes). Because of this distance, the EEG signal collected at any point on the scalp is a nonlinear mixture of the activities generated over a large brain area; in this paper, for simplicity, the recorded EEG data are assumed to be a linear mixture of neuronal activities. Because ERPs typically have low amplitudes and a low signal-to-noise ratio (SNR), removing other biological signals is one of the major challenges in ERP studies. To address this problem, EEG data are usually down-sampled and averaged over multiple trials. However, down-sampling can make some signals indistinguishable or distorted, altering the original characteristics of the waveform. Averaging, in turn, assumes that the signals are stationary over long periods and deterministic relative to the stimulus onset; this assumption can cause a loss of time resolution, particularly when trials are dissimilar. Moreover, the stationarity and determinism assumptions may not hold, because other factors such as maturation, age, sex, state of consciousness, and psychiatric and neurological disorders must also be considered [27].

In this paper, a more efficient feature extraction method is developed to overcome the drawbacks of down-sampling and averaging. Previous research has shown that several aspects of the ERP (especially latency, magnitude, and topography) vary considerably across trials [27, 28]. Many of the techniques proposed in the research literature to address this problem for EEG (specifically, for obtaining P300 components) [29-33] are not sufficiently standardized for clinical use, and most of them are applied off-line. In this paper, a real-time feature extraction method for P300 components is proposed, using an adaptive nonlinear principal component analysis (ANPCA) combined with a multilayer neural network (MNN). MNNs have been widely adopted in the information and neural sciences (e.g., for feature extraction, classification, and modeling) [34-39]. The experimental results show that the proposed method yields a significant improvement in extracting P300 components.

The main contributions of this paper are as follows. (i) The multi-stage principal component analysis (PCA) applied in the pre-separation step significantly reduces external noise and artifacts and separates the colored sources in the measured EEG signals. (ii) The adaptive rule designed for the whitening step allows the subsequent separation algorithm to converge quickly. (iii) The combination of the proposed ANPCA method and the MNN for feature extraction identifies the P300 components in real time (i.e., without down-sampling and averaging). (iv) The proposed method could therefore become a viable tool in both research and clinical applications.

Methods

Data acquisition

Figures 1(a) and 1(b) show the overall scheme and the block diagram, respectively, of the proposed real-time feature extraction method. Two master's students and five Ph.D. students (all male, aged 32 ± 5 years, none with any known neurological deficits) participated in the experiment. A seven-choice signal paradigm (forward, turn right, turn left, backward, backward right, backward left, and stop) is used to stimulate the seven subjects. The subjects sit in a comfortable chair in front of a computer monitor located 60 cm from their eyes. They are asked to silently count the number of times a preselected image flashes on the screen while imagining a car moving in the direction of the flashed signal. Four seconds after a starting tone, the seven images flash in random order, one image at a time. The stimuli are presented with E-Prime 2.0 (developer: Schneider, Sharpsburg, USA).

Figure 1

The proposed scheme for real-time feature extraction (the seven traffic signals flash one at a time to evoke the P300): (a) the overall scheme; (b) the block diagram, showing the configuration of the ANPCA algorithm combined with the MNN for real-time extraction of the independent components associated with the P300 component.

The left-hand box in Figure 1(a) shows the g-MOBIlab+ biosignal acquisition device (Christoph Guger, Austria), with which the EEG signals are recorded continuously and digitized at a sampling rate of 256 Hz. Figure 2 depicts the positions of the eight electrodes (channels) at Fz, Cz, Pz, Oz, P7, P3, P4, and P8, following the 10-20 International System [40], with a linked-ears reference. The ground electrode is placed at the center of the forehead, and the impedance at each location is kept below 5 kΩ. The participants are instructed to avoid eye and head movements during the EEG recording. Each subject completes four sessions, each with a different image-flash duration (25 ms, 50 ms, 75 ms, or 100 ms) followed by a 300 ms blank screen. Hence, the inter-stimulus intervals (ISIs) in this work range from 325 ms to 400 ms.
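The ISI values above follow directly from the flash and blank durations, and it can help to see what they mean at the 256 Hz sampling rate. The short Python sketch below adds each flash duration to the 300 ms blank and converts the result to samples; the constant and function names are illustrative and are not taken from the acquisition or stimulus software.

```python
# Illustrative constants taken from the protocol description (names are hypothetical).
FS_HZ = 256                      # sampling rate of the EEG recordings
BLANK_MS = 300                   # blank screen shown after every flash
FLASH_MS = [25, 50, 75, 100]     # image-flash duration in each of the four sessions


def isi_ms(flash_ms: int, blank_ms: int = BLANK_MS) -> int:
    """Inter-stimulus interval = flash duration + blank duration (ms)."""
    return flash_ms + blank_ms


def ms_to_samples(ms: float, fs_hz: int = FS_HZ) -> float:
    """Convert a duration in milliseconds to a (possibly fractional) number of samples."""
    return ms * fs_hz / 1000.0


if __name__ == "__main__":
    for flash in FLASH_MS:
        isi = isi_ms(flash)
        print(f"flash {flash:3d} ms -> ISI {isi} ms = {ms_to_samples(isi):.1f} samples")
```

Only the 375 ms ISI corresponds to a whole number of samples (96); for the other ISIs, stimulus onsets fall between samples, so epoch boundaries would need rounding if epochs were cut on the sample grid.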

Figure 2

The eight-electrode configuration: the standard positions (Fz, Cz, Pz, Oz, P7, P3, P4, and P8) prescribed by the 10-20 International System, with a linked-ears reference.

Real-time feature extraction

Let M be the number of measured EEG signals and N the number of unknown sources (typically M ≥ N). The measured signal at channel i, $x_i(k)$, can then be represented as a linear combination of N unknown, mutually statistically independent source signals $s_j(k)$, $j = 1, 2, \ldots, N$, as follows [41, 42]:

$$x_i(k) = \sum_{j=1}^{N} a_{ij}\, s_j(k) + n_i(k), \tag{1}$$

or in matrix form,

$$\mathbf{x}(k) = \mathbf{A}\,\mathbf{s}(k) + \mathbf{n}(k), \tag{2}$$

where $\mathbf{x}(k) = [x_1(k), x_2(k), \ldots, x_M(k)]^T \in \mathbb{R}^M$ is the vector of EEG signals, $\mathbf{A} \in \mathbb{R}^{M \times N}$ with entries $a_{ij}$ is the unknown mixing matrix, $\mathbf{s}(k) = [s_1(k), s_2(k), \ldots, s_N(k)]^T \in \mathbb{R}^N$ is the unknown vector of colored source signals, and $\mathbf{n}(k) \in \mathbb{R}^M$ is the vector of additive noise. The objective of this work is to estimate both $\mathbf{A}$ and $\mathbf{s}(k)$. The following assumptions are made: the individual components of the source vector $\mathbf{s}(k)$ are statistically independent of one another; the mixing matrix $\mathbf{A}$ has full column rank (it is invertible when M = N); each component of $\mathbf{s}(k)$ is stationary; and the noise vector $\mathbf{n}(k)$ is white and Gaussian. The P300 extraction proceeds in four steps (pre-separation, whitening, separation, and estimation), without ignoring the additive noise $\mathbf{n}(k)$.

Pre-separation step

The pre-separation step uses a multi-stage PCA to separate the sources and to reduce external noise and artifacts in the measured signal vector. The eigenvalue decomposition of the correlation matrix $\mathbf{R}_{xx}$ of the measured signal $\mathbf{x}(k)$ is given by [42]

$$\mathbf{R}_{xx} = E\{\mathbf{x}(k)\,\mathbf{x}^T(k)\} = \mathbf{V}\boldsymbol{\Lambda}\mathbf{V}^T, \tag{3}$$

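where $\boldsymbol{\Lambda} \in \mathbb{R}^{M \times M}$ is the diagonal matrix of eigenvalues and $\mathbf{V}$ is the orthogonal matrix of the corresponding eigenvectors. Retaining the N largest eigenvalues, the spatial whitening procedure can be written as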

$$\bar{\mathbf{x}}(k) = \mathbf{B}\,\mathbf{x}(k) = \boldsymbol{\Lambda}_j^{-1/2}\,\mathbf{V}_j^T\,\mathbf{x}(k), \tag{4}$$

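where $\boldsymbol{\Lambda}_j = \mathrm{diag}\{\lambda_1, \lambda_2, \ldots, \lambda_N\}$ with $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_N$, and $\mathbf{V}_j = [\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_N] \in \mathbb{R}^{M \times N}$ holds the corresponding eigenvectors, so that $\mathbf{B} \in \mathbb{R}^{N \times M}$. PCA is then performed on a new signal vector defined as [41, 42]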

$$\tilde{\mathbf{x}}(k) = \bar{\mathbf{x}}(k) + \bar{\mathbf{x}}(k-\tau), \tag{5}$$

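where $\tau$ is an arbitrary time delay. Since the signals are stationary, $E\{\bar{\mathbf{x}}(k-\tau)\,\bar{\mathbf{x}}^T(k-\tau)\} = \mathbf{R}_{\bar{x}}(0)$, and the covariance matrix of $\tilde{\mathbf{x}}(k)$ is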

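$$\mathbf{R}_{\tilde{x}\tilde{x}} = \mathbf{R}_{\tilde{x}}(0) = E\{\tilde{\mathbf{x}}(k)\,\tilde{\mathbf{x}}^T(k)\} = 2\mathbf{R}_{\bar{x}}(0) + \mathbf{R}_{\bar{x}}(\tau) + \mathbf{R}_{\bar{x}}^T(\tau), \tag{6}$$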

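where $\mathbf{R}_{\bar{x}\bar{x}} = \mathbf{R}_{\bar{x}}(0) = E\{\bar{\mathbf{x}}(k)\,\bar{\mathbf{x}}^T(k)\} = \mathbf{H}\mathbf{R}_{ss}\mathbf{H}^T = \mathbf{I}$, under the assumptions that $\mathbf{H} = \mathbf{B}\mathbf{A}$ is orthogonal and $\mathbf{R}_{ss} = \mathbf{I}$, and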

$$\mathbf{R}_{\bar{x}}(\tau) = E\{\bar{\mathbf{x}}(k)\,\bar{\mathbf{x}}^T(k-\tau)\} = \mathbf{H}\,\mathbf{R}_s(\tau)\,\mathbf{H}^T. \tag{7}$$

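Hence, the matrix decomposition can be written as

$$\mathbf{R}_{\tilde{x}\tilde{x}} = \mathbf{H}\,\mathbf{D}(\tau)\,\mathbf{H}^T = \mathbf{V}_{\tilde{x}}\boldsymbol{\Lambda}_{\tilde{x}}\mathbf{V}_{\tilde{x}}^T, \tag{8}$$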

where $\mathbf{D}(\tau)$ is a diagonal matrix given by

$$\mathbf{D}(\tau) = 2\mathbf{I} + \mathbf{R}_s(\tau) + \mathbf{R}_s^T(\tau), \tag{9}$$

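with diagonal elements $d_{ii}(\tau) = 2\left(1 + E\{s_i(k)\,s_i(k-\tau)\}\right)$. If the diagonal elements are distinct, the eigenvalue decomposition is unique (up to the sign and order of the eigenvectors), so $\mathbf{V}_{\tilde{x}}$ recovers $\mathbf{H}$. Thus, the mixing matrix and the pre-separated input vector $\breve{\mathbf{x}}(k)$ can be estimated as $\hat{\mathbf{A}} = \mathbf{B}^{+}\mathbf{V}_{\tilde{x}}$, where $\mathbf{B}^{+}$ is the pseudo-inverse of $\mathbf{B}$, and

$$\breve{\mathbf{x}}(k) = \mathbf{V}_{\tilde{x}}^T\,\bar{\mathbf{x}}(k) = \mathbf{V}_{\tilde{x}}^T\,\mathbf{B}\,\mathbf{x}(k). \tag{10}$$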
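The batch form of (3)-(10) can be sketched with NumPy as follows. This is a minimal illustration that assumes zero-mean data stored as an (M, K) array and replaces every expectation with a sample average; it computes the eigendecompositions once (in the style of the AMUSE algorithm) rather than adaptively as in the neural implementation derived next. The function name `pre_separate` and the AR(1) test sources are hypothetical.

```python
import numpy as np


def pre_separate(X: np.ndarray, n_sources: int, tau: int = 1):
    """Batch sketch of the pre-separation step, eqs. (3)-(10).

    X: zero-mean observations of shape (M, K).
    Returns (A_hat, X_breve, B, V_tilde).
    Assumption: expectations are replaced by sample averages over all K samples.
    """
    K = X.shape[1]
    Rxx = X @ X.T / K                                    # eq. (3), sample estimate
    lam, V = np.linalg.eigh(Rxx)                         # eigenvalues in ascending order
    keep = np.argsort(lam)[::-1][:n_sources]             # indices of the N largest
    B = np.diag(lam[keep] ** -0.5) @ V[:, keep].T        # eq. (4): B = Lambda_j^{-1/2} V_j^T
    Xbar = B @ X                                         # spatially whitened signals
    R0 = Xbar @ Xbar.T / K                               # R_xbar(0)
    Rtau = Xbar[:, tau:] @ Xbar[:, :-tau].T / (K - tau)  # R_xbar(tau) = E{xbar(k) xbar^T(k - tau)}
    Rtilde = 2.0 * R0 + Rtau + Rtau.T                    # eq. (6)
    _, V_tilde = np.linalg.eigh(Rtilde)                  # eq. (8): eigenvectors estimate H
    X_breve = V_tilde.T @ Xbar                           # eq. (10)
    A_hat = np.linalg.pinv(B) @ V_tilde                  # A_hat = B^+ V_xtilde
    return A_hat, X_breve, B, V_tilde
```

With sources whose lag-τ autocorrelations differ, the global matrix $\mathbf{V}_{\tilde{x}}^T\mathbf{B}\mathbf{A}$ should come out close to a signed permutation, which is the uniqueness condition on distinct $d_{ii}(\tau)$ stated above.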

Assume that the process $\mathbf{x}(k) \in \mathbb{C}^M$ is a zero-mean sequence whose covariance matrix is defined as in (3), and that its complex-valued eigenvectors $\mathbf{v}_i$ and the corresponding principal components (PCs) are to be extracted in real time. Using a self-supervised learning principle and a hierarchical neural network architecture, the PCs $x_i$ are extracted sequentially as

$$x_i = \mathbf{v}_i^T\mathbf{x} = \sum_{p=1}^{M} v_{ip}\, x_p(t). \tag{11}$$

The vector $\mathbf{v}_i$ should be determined such that the reconstructed vector $\bar{\mathbf{x}} = \mathbf{v}_i^{*}x_i$ reproduces the input vector $\mathbf{x}(t)$ as closely as possible under a suitable optimality criterion. For this purpose, the complex-valued instantaneous error vector is defined as

$$\mathbf{e}_i(t) = [e_{i1}(t), e_{i2}(t), \ldots, e_{iM}(t)]^T = \mathbf{x}(t) - \bar{\mathbf{x}}(t) = \mathbf{x}(t) - \mathbf{v}_i^{*}x_i(t) = (\mathbf{I} - \mathbf{v}_i^{*}\mathbf{v}_i^{T})\,\mathbf{x}(t) = \mathbf{e}_i^{R}(t) + j\,\mathbf{e}_i^{I}(t), \tag{12}$$

where $\mathbf{I}$ is the identity matrix, $\mathbf{e}_i^R(t)$ and $\mathbf{e}_i^I(t)$ are the real and imaginary parts of the error vector $\mathbf{e}_i(t)$, respectively, and $j = \sqrt{-1}$. To find the optimal vector $\mathbf{v}_i$, the following squared 2-norm cost function is defined:

$$E_i(\mathbf{v}_i) = \frac{1}{2}\left(\|\mathbf{e}_i^R\|_2^2 + \|\mathbf{e}_i^I\|_2^2\right) = \frac{1}{2}\left(\sum_{p=1}^{M}(e_{ip}^R)^2 + \sum_{p=1}^{M}(e_{ip}^I)^2\right), \tag{13}$$

where $e_{ip}^R$ is the $p$th element of $\mathbf{e}_i^R$. Minimizing the cost function (13) by standard gradient descent with respect to the real and imaginary parts of $\mathbf{v}_i = \mathbf{v}_i^R + j\,\mathbf{v}_i^I$ leads to the set of differential equations

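$$\frac{dv_{ip}^R}{dt} = -\beta_i\frac{\partial E_i(\mathbf{v}_i)}{\partial v_{ip}^R} = \beta_i\left[E_1 + x_p^R\sum_{h=1}^{M}E_2 + x_p^I\sum_{h=1}^{M}E_3\right], \tag{14}$$

$$\frac{dv_{ip}^I}{dt} = -\beta_i\frac{\partial E_i(\mathbf{v}_i)}{\partial v_{ip}^I} = \beta_i\left[E_4 + x_p^R\sum_{h=1}^{M}E_3 - x_p^I\sum_{h=1}^{M}E_2\right], \tag{15}$$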

where $\beta_i > 0$ is the learning rate, $E_1 = e_{ip}^R x_i^R + e_{ip}^I x_i^I$, $E_2 = e_{ih}^R v_{ih}^R - e_{ih}^I v_{ih}^I$, $E_3 = e_{ih}^R v_{ih}^I + e_{ih}^I v_{ih}^R$, $E_4 = e_{ip}^R x_i^I - e_{ip}^I x_i^R$, $x_i \triangleq x_i^R + j\,x_i^I$, and $e_{ip} \triangleq e_{ip}^R + j\,e_{ip}^I$. Combining (14) and (15) and noting that $v_{ip} \triangleq v_{ip}^R + j\,v_{ip}^I$, the adaptation law for updating the parameters is obtained as

$$\frac{dv_{ip}(t)}{dt} = \beta_i(t)\left[x_i(t)\,e_{ip}^{*}(t) + x_p^{*}(t)\sum_{h=1}^{M}v_{ih}(t)\,e_{ih}(t)\right], \tag{16}$$

which can be written in matrix form as

$$\frac{d\mathbf{v}_i}{dt} = \beta_i\left[x_i\,\mathbf{e}_i^{*} + \mathbf{x}^{*}\,\mathbf{v}_i^T\mathbf{e}_i\right], \tag{17}$$

for any $\mathbf{v}_i(0) \neq \mathbf{0}$ and $\beta_i(t) > 0$. The second term in (17) can be written as $\mathbf{x}^{*}\mathbf{v}_i^T\mathbf{e}_i = \mathbf{x}^{*}(1 - \mathbf{v}_i^H\mathbf{v}_i)\,x_i$; because it tends quickly to zero as $\mathbf{v}_i^H\mathbf{v}_i \to 1$ for $t \to \infty$, it can be neglected. The adaptation law (17) then simplifies to

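$$\frac{d\mathbf{v}_i}{dt} = \beta_i\,x_i\,\mathbf{e}_i^{*} = \beta_i\,x_i\left[\mathbf{x} - \mathbf{v}_i^{*}x_i\right]^{*} = \beta_i\,\mathbf{v}_i^T\mathbf{x}\left[\mathbf{I} - \mathbf{v}_i\mathbf{v}_i^H\right]\mathbf{x}^{*}, \tag{18}$$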

where $(\cdot)^{*}$ denotes complex conjugation. In discrete time, the adaptation law (18) can be written as

$$\mathbf{v}_i(k+1) = \mathbf{v}_i(k) + \beta_i(k)\,x_i(k)\,\mathbf{E}_5, \tag{19}$$

where $\mathbf{E}_5 = \mathbf{x}^{*}(k) - \mathbf{v}_i(k)\,x_i^{*}(k)$.
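A minimal NumPy sketch of the discrete-time rule (19) is given below. It assumes a constant learning rate β and realizes the hierarchical extraction by deflation (subtracting each reconstructed component before training the next neuron); the function names, the deflation step, and the test covariance are illustrative assumptions, not the authors' implementation. Writing $\mathbf{w}_i = \mathbf{v}_i^{*}$ shows that (19) is the complex form of Oja's rule, so $\mathbf{v}_i^{*}$ should align with the $i$th eigenvector of $E\{\mathbf{x}\mathbf{x}^H\}$.

```python
import numpy as np


def complex_pca_step(v: np.ndarray, x: np.ndarray, beta: float) -> np.ndarray:
    """One update of eq. (19): v <- v + beta * x_i * (x^* - v x_i^*)."""
    xi = v @ x                          # x_i(k) = v_i^T x(k), eq. (11)
    e5 = np.conj(x) - v * np.conj(xi)   # E_5 = x^*(k) - v_i(k) x_i^*(k)
    return v + beta * xi * e5


def extract_pcs(X: np.ndarray, n_pcs: int, beta: float = 0.01,
                epochs: int = 5, seed: int = 0) -> np.ndarray:
    """Sequentially extract n_pcs weight vectors from samples X of shape (K, M).

    Assumptions: constant beta, and deflation (removing v_i^* x_i from every
    sample) as the way of realizing the hierarchical architecture.
    """
    rng = np.random.default_rng(seed)
    residual = X.astype(complex)        # working copy of the data
    weights = []
    for _ in range(n_pcs):
        m = X.shape[1]
        v = rng.standard_normal(m) + 1j * rng.standard_normal(m)
        v /= np.linalg.norm(v)
        for _ in range(epochs):
            for x in residual:          # one sample at a time, as in real-time use
                v = complex_pca_step(v, x, beta)
        weights.append(v)
        xi = residual @ v               # current PC for every sample
        residual = residual - np.outer(xi, np.conj(v))  # deflation
    return np.array(weights)
```

In a streaming setting, the inner loop would process each new sample as it arrives instead of sweeping a stored array for several epochs.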

Whitening step

The whitening step uses PCA to transform the data into an appropriate space and to reduce the redundancy of the observed data. In this second step, the pre-separated input vector $\breve{\mathbf{x}}(k)$ is whitened by the transformation

$$\mathbf{u}(k) = \mathbf{P}(k)\,\breve{\mathbf{x}}(k), \tag{20}$$

where $\mathbf{u}(k)$ is the whitened vector and $\mathbf{P}$ is the whitening matrix, which is determined by a neural learning approach. The objective is a simple adaptive algorithm for estimating $\mathbf{P}$ such that the covariance matrix of the whitened signals is diagonal, $\mathbf{R}_{uu} = E\{\mathbf{u}\mathbf{u}^T\} = \mathrm{diag}\{\lambda_1, \lambda_2, \ldots, \lambda_N\}$, with $\lambda_i = 1$ so that $\mathbf{R}_{uu} = \mathbf{I}_N$. The whitened signals are then mutually uncorrelated, that is, all cross-correlations vanish, $r_{ij} = E\{u_i u_j\} = 0$ for all $i \neq j$, while the autocorrelations are non-zero, $r_{ii} = E\{u_i^2\} = \lambda_i > 0$. The cost function to be minimized can therefore be formulated with the Frobenius norm as

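$$J_2(\mathbf{W}) = \frac{1}{4}\sum_{i=1}^{N}\sum_{j=1}^{N}\left(E\{u_i u_j\} - \lambda_i\delta_{ij}\right)^2 = \frac{1}{4}\left\|E\{\mathbf{u}\mathbf{u}^T\} - \mathbf{I}_N\right\|_F^2. \tag{21}$$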

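To derive an adaptive learning algorithm, the transformation

$$E\{\mathbf{u}\mathbf{u}^T\} = E\{\mathbf{P}\mathbf{x}\mathbf{x}^T\mathbf{P}^T\} = E\{\mathbf{P}\mathbf{A}\mathbf{s}\mathbf{s}^T(\mathbf{P}\mathbf{A})^T\} = \mathbf{B}\mathbf{R}_{ss}\mathbf{B}^T = \mathbf{B}\mathbf{B}^T \tag{22}$$

is used, where $\mathbf{B} = \mathbf{P}\mathbf{A}$ is the global transformation matrix from $\mathbf{s}$ to $\mathbf{u}$. Without loss of generality, $\mathbf{R}_{ss} = E\{\mathbf{s}\mathbf{s}^T\} = \mathbf{I}_N$ is assumed. Substituting (22) into (21), the optimization criterion can be written as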

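$$J_2(\mathbf{W}) = \frac{1}{4}\left\|\mathbf{B}\mathbf{B}^T - \mathbf{I}_N\right\|_F^2 = \frac{1}{4}\,\mathrm{tr}\left[(\mathbf{B}\mathbf{B}^T - \mathbf{I}_N)(\mathbf{B}\mathbf{B}^T - \mathbf{I}_N)\right]. \tag{23}$$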

Applying standard gradient descent and the chain rule to (23) yields the learning dynamics

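$$\frac{d\mathbf{B}}{dt} = \eta\left(\mathbf{I}_N - \mathbf{B}\mathbf{B}^T\right)\mathbf{B} = \eta\left(\mathbf{I}_N - \mathbf{R}_{uu}\right)\mathbf{B}. \tag{24}$$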

Taking into account that B = PA and assuming that A varies very slowly in time (i.e., dA/dt≈0), we have

$$\frac{d\mathbf{P}}{dt} = \eta\left(\mathbf{I}_N - \mathbf{R}_{uu}\right)\mathbf{P}. \tag{25}$$

Using the simple Euler formula, the corresponding discrete-time adaptive learning algorithm can be written as

$$\mathbf{P}(k+1) = \mathbf{P}(k) + \eta(k)\left(\mathbf{I}_N - \hat{\mathbf{R}}_{uu}^{(k)}\right)\mathbf{P}(k), \tag{26}$$

where $\eta(k)$ is the learning parameter, adjusted according to $\eta(k) = 1/\left[\xi\,\eta(k-1) + \|\mathbf{u}(k)\|_2^2\right]$, and $\xi$ is the forgetting factor ($0 < \xi < 1$). The covariance matrix $\mathbf{R}_{uu}$ can be estimated as

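$$\hat{\mathbf{R}}_{uu}^{(k)} = \langle\mathbf{u}\mathbf{u}^T\rangle = \frac{1}{N}\sum_{k=0}^{N-1}\mathbf{u}(k)\,\mathbf{u}^T(k), \tag{27}$$

where $\mathbf{u}(k) = \mathbf{P}(k)\,\breve{\mathbf{x}}(k)$ is the whitened vector of (20).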
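The discrete-time rule (26), with the block estimate (27), can be sketched as follows. For simplicity, the sketch assumes a constant learning rate η and re-uses one stored block of samples instead of the adaptive η(k) schedule above; the function names are hypothetical. The update is only well-behaved when η times the largest eigenvalue of the input covariance stays small, so raw EEG in microvolts would need rescaling first.

```python
import numpy as np


def whitening_step(P: np.ndarray, X_block: np.ndarray, eta: float) -> np.ndarray:
    """One update of eq. (26) using the block covariance estimate of eq. (27).

    P: (N, N) whitening matrix; X_block: (N, K) block of input vectors.
    """
    U = P @ X_block                          # u(k) = P(k) x(k) for the whole block
    R_uu = U @ U.T / X_block.shape[1]        # eq. (27)
    I = np.eye(P.shape[0])
    return P + eta * (I - R_uu) @ P          # eq. (26)


def learn_whitening(X_block: np.ndarray, eta: float = 0.1, n_iter: int = 300) -> np.ndarray:
    """Iterate eq. (26) from P(0) = I.

    Assumption: constant eta, in place of the adaptive eta(k) of the text.
    """
    P = np.eye(X_block.shape[0])
    for _ in range(n_iter):
        P = whitening_step(P, X_block, eta)
    return P
```

The converged matrix satisfies $\mathbf{P}\hat{\mathbf{R}}\mathbf{P}^T \approx \mathbf{I}_N$ for the block covariance $\hat{\mathbf{R}}$; a streaming implementation would instead update the covariance estimate recursively from the most recent samples.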

Separation step

The separation of the whitened signals u(k) is the third step of the proposed algorithm, which is accomplished by applying the nonlinear principal component analysis (NPCA) learning rule. The multichannel linear separation transformation is given in the following form.

y ( k ) = W T ( k ) u ( k ) ,
(28)

where $\mathbf{W}(k)$ is the separation matrix, whose entries are updated through the NPCA learning rule. If the independent signals are zero-mean, the generalized covariance matrix of $f(y_i)$ and $g(y_j)$, $\mathbf{R}_{fg} = E\{\mathbf{f}(\mathbf{y})\mathbf{g}^T(\mathbf{y})\} - E\{\mathbf{f}(\mathbf{y})\}E\{\mathbf{g}^T(\mathbf{y})\}$, is a non-singular diagonal matrix, where $f(\cdot)$ and $g(\cdot)$ are different odd nonlinear activation functions (for example, $f(y) = y^3$ and $g(y) = \tanh(y)$). On the basis of this independence criterion, the nonlinear covariance matrix is given as [41, 43]

$$\mathbf{R}_{fg} = \mathbf{f}(\mathbf{y})\,\mathbf{g}^T(\mathbf{y}) + \mathbf{I}, \tag{29}$$

where $\mathbf{f}(\mathbf{y}) = [f(y_1), f(y_2), \ldots, f(y_N)]^T$ and $\mathbf{g}(\mathbf{y}) = [g(y_1), g(y_2), \ldots, g(y_N)]^T$, provided that $E\{f(y_i)\} = 0$ or $E\{g(y_i)\} = 0$. To satisfy these conditions for arbitrarily distributed sources, the nonlinearities are selected as $f_i(y_i) = \varphi_i(y_i)$, $g_i(y_i) = y_i$ or $f_i(y_i) = y_i$, $g_i(y_i) = \varphi_i(y_i)$, where the $\varphi_i(y_i)$ are suitably designed nonlinear functions; here $g(y)$ is defined as an odd function and $f(y) = g(y) - y$. Therefore, similarly to (21)-(26), a real-time implementation algorithm can be derived as

$$\mathbf{W}(k+1) = \mathbf{W}(k) - \mu(k)\,\mathbf{f}[\mathbf{y}(k)]\,\mathbf{g}^T[\mathbf{y}(k)]\,\mathbf{W}(k), \tag{30}$$

where $\mathbf{g}^T[\mathbf{y}(k)] = \mathbf{f}^T[\mathbf{y}(k)] - \mathbf{y}^T(k)$. Since the separation matrix $\mathbf{W}(k)$ is assumed to be orthogonal (i.e., $\mathbf{W}^T(k)\mathbf{W}(k) = \mathbf{I}$), the real-time adaptation rule can be rewritten as

$$\mathbf{W}(k+1) = \mathbf{W}(k) + \mu(k)\,\mathbf{f}[\mathbf{y}(k)]\,\mathbf{W}_b\,\mathbf{W}(k), \tag{31}$$

where $\mathbf{y}(k)$ is the separated signal computed from the output $\mathbf{u}(k)$ of the second step, $\mathbf{W}_b = \mathbf{u}^T(k) - \mathbf{f}^T[\mathbf{y}(k)]$, $\mu(k)$ is the learning parameter (adjusted according to $\mu(k) = 1/\left[\gamma\,\mu(k-1) + \|\mathbf{y}(k)\|_2^2\right]$ with forgetting factor $0 < \gamma < 1$), and $f(\cdot)$ is a suitably chosen nonlinear function, usually odd to ensure both stability and signal separation. Such nonlinear functions implicitly exploit higher-order statistics (HOS). In the present study, $f(t) = \tanh(t)$; since $f(t) = dg(t)/dt$, the corresponding function is $g(t) = \ln[\cosh(t)]$.
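For illustration, the sketch below applies the widely used nonlinear PCA subspace rule, $\mathbf{W} \leftarrow \mathbf{W} + \mu\,[\mathbf{u} - \mathbf{W}\mathbf{f}(\mathbf{y})]\,\mathbf{f}^T(\mathbf{y})$ with $\mathbf{y} = \mathbf{W}^T\mathbf{u}$ and $f = \tanh$, to whitened data. This is a swapped-in, well-known NPCA variant rather than a transcription of (30)-(31); the constant step size μ, the uniform (sub-Gaussian) test sources, and the function names are assumptions made for the example.

```python
import numpy as np


def npca_step(W: np.ndarray, u: np.ndarray, mu: float) -> np.ndarray:
    """One nonlinear PCA subspace update on a whitened sample u of shape (N,).

    Assumption: standard NPCA subspace rule with constant mu, not eq. (31) verbatim.
    """
    y = W.T @ u                                  # eq. (28): y(k) = W^T(k) u(k)
    fy = np.tanh(y)                              # odd nonlinearity f(t) = tanh(t)
    return W + mu * np.outer(u - W @ fy, fy)     # W <- W + mu [u - W f(y)] f(y)^T


def separate(U: np.ndarray, mu: float = 0.01, epochs: int = 3) -> np.ndarray:
    """Run the rule over whitened data U of shape (N, K), starting from W(0) = I."""
    W = np.eye(U.shape[0])
    for _ in range(epochs):
        for u in U.T:                            # one sample at a time
            W = npca_step(W, u, mu)
    return W
```

With tanh, this rule is commonly used for sub-Gaussian sources such as the uniform ones in the check; sources with other statistics may call for a different nonlinearity, which is one reason the choice of $f(\cdot)$ matters.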

Estimation step

The final step estimates the mixing matrix $\mathbf{A}(k)$, whose columns are the basis vectors of the independent components. The estimate of the observed data is given by

$$\hat{\mathbf{x}}(k) = \mathbf{Q}(k)\,\mathbf{y}(k). \tag{32}$$

Comparing (32) with (2), and since $\hat{\mathbf{s}}(k) \approx \mathbf{y}(k)$ (i.e., $\hat{\mathbf{s}}(k)$ is the estimate of the source signal $\mathbf{s}(k)$), it can be concluded that $\hat{\mathbf{A}}(k) = \mathbf{Q}(k)$; that is, the columns of $\mathbf{Q}(k)$ are the estimates of the columns of the mixing matrix. Since $\mathbf{Q}(k)$ is the estimation matrix, its entries are updated (similarly to (26)) through the adaptation law

$$\mathbf{Q}(k+1) = \mathbf{Q}(k) + \alpha(k)\,\mathbf{Q}_e\,\mathbf{y}^T(k), \tag{33}$$

where $\mathbf{Q}_e = \mathbf{x}(k) - \mathbf{Q}(k)\,\mathbf{y}(k)$ is the reconstruction error of the observed data. The quality of the source estimate $\mathbf{y}(k)$ can be measured using the zero-forcing solution, which adapts the demixing matrix such that

$$\lim_{k \to \infty} \mathbf{C}(k)\,\hat{\mathbf{A}}(k) = \boldsymbol{\Phi}\mathbf{D}, \tag{34}$$

where $\mathbf{C}(k) = \mathbf{W}(k)\mathbf{P}(k)\mathbf{V}(k)$, $\boldsymbol{\Phi}$ is an $M \times M$ permutation matrix with exactly one unit entry in each row and column, and $\mathbf{D}$ is a nonsingular diagonal scaling matrix. In this case, each output becomes

$$y_i(k) = d_{jj}\,s_j(k) + \sum_{l=1}^{N} b_{il}(k)\,n_l(k), \tag{35}$$

for some non-repeating assignment $j \leftrightarrow i$ with $1 \le i \le N$ and $1 \le j \le M$. Thus, each element of $\mathbf{y}(k)$ is the sum of a single unique source in $\mathbf{s}(k)$ and a noise term. In each simulation run, the performance index (PI) is evaluated using the following equation [44]:

$$PI(k) = \frac{1}{M-1}\left[M - \frac{1}{2}\sum_{i=1}^{N}\left(C_a + C_b\right)\right], \tag{36}$$

where $C_a = \dfrac{\max_{1 \le j \le M}|c_{ij}(k)|^2}{\sum_{j=1}^{M}|c_{ij}(k)|^2}$ and $C_b = \dfrac{\max_{1 \le j \le M}|c_{ji}(k)|^2}{\sum_{j=1}^{M}|c_{ji}(k)|^2}$, and $c_{ij}$ denotes the $(i, j)$th element of $\mathbf{C}(k)$, corresponding to the $j$th independent component (IC) in the desired subset of sources. This dimensionless metric measures the deviation of the combined system from a diagonally scaled permutation matrix: $0 \le PI(k) \le 1$ for all matrices $\mathbf{C}(k)$, with $PI(k) = 1$ when the sources are maximally mixed in the outputs and $PI(k) = 0$ when the desired subset of ICs is perfectly separated. The first term in (36) measures the error in separating the output component $y_i(k)$ in (35) from the sources, and the second term measures the degree to which a desired IC, $c_j$, appears multiple times at the output. The combination of the four steps is called the adaptive nonlinear principal component analysis (ANPCA) method. To improve the flexibility, efficiency, and performance of blind signal separation or extraction, the proposed ANPCA scheme is run on a multilayer neural network. Multiple layers of neurons with nonlinear transfer functions allow the network to learn both linear and nonlinear relationships between input and output vectors. This also makes it possible to combine second-order statistics (SOS) and HOS algorithms to extract features that have different statistical properties, exist at various layers, and originate from various sources. The synaptic weights in each layer are updated using the algorithm described above.
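The estimation update (33) and the performance index (36) can be sketched together as below. The sketch assumes a square system ($M = N$), a constant step size α, and an idealized case in which the separated outputs $\mathbf{y}(k)$ equal the sources exactly, so that (33) should drive $\mathbf{Q}$ to the true mixing matrix; the function names and test values are illustrative only. If the dB values reported in the Results are taken as $10\log_{10} PI$ (an assumption), a PI of $10^{-3}$ corresponds to $-30$ dB.

```python
import numpy as np


def estimation_step(Q: np.ndarray, x: np.ndarray, y: np.ndarray, alpha: float) -> np.ndarray:
    """Eq. (33): Q <- Q + alpha * (x - Q y) y^T (constant alpha assumed)."""
    q_e = x - Q @ y                              # reconstruction error Q_e
    return Q + alpha * np.outer(q_e, y)


def performance_index(C: np.ndarray) -> float:
    """Eq. (36) for a square global matrix C: 0 = perfect separation, 1 = maximal mixing."""
    P2 = np.abs(C) ** 2
    M = C.shape[0]
    c_a = P2.max(axis=1) / P2.sum(axis=1)        # row-wise terms C_a
    c_b = P2.max(axis=0) / P2.sum(axis=0)        # column-wise terms C_b
    return float((M - 0.5 * np.sum(c_a + c_b)) / (M - 1))


def pi_db(pi: float) -> float:
    """Express PI on a 10*log10 scale (an assumed convention)."""
    return float(10.0 * np.log10(pi))
```

Any signed, scaled permutation matrix gives a PI of exactly zero, which is why (36) is insensitive to the permutation and scaling ambiguities in (34).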

Results

Before analyzing the features of the P300 components in real time, actual EEG signals were recorded with the eight-channel configuration (Fz, Cz, Pz, Oz, P7, P3, P4, and P8). Figure 3 shows the observed EEG signals, with background amplitudes of around 300 microvolts. Figure 4 shows the pre-processed signals, with amplitudes of around 25 microvolts, after filtering with a sixth-order band-pass filter (BPF) with cut-off frequencies of 1 Hz and 12 Hz. The ANPCA technique is then applied to gain further insight into the EEG signals. The ANPCA model consists of four main steps: pre-separation (learning rate β of 0.6), whitening (forgetting factor η of 0.01), separation (forgetting factor γ of 0.002), and estimation (learning rate α of 0.3). In this algorithm, the pre-separation and whitening steps enable faster adaptation in the separation step. The separation performance of the ANPCA algorithm at the output was evaluated using (36). The evolution of PI(Ni) for six different runs of the proposed method, generated from the data with a 350 ms ISI, is shown in Figure 5. The algorithm takes between four and ten epochs to converge. Depending on the simulation run, the performance factor varies from -22 dB to -33 dB owing to random differences in the source signals. The robustness of the ANPCA was evaluated by comparing its separation performance with that of other algorithms (NPCA [45], Nonstationary Source Separation-Joint Diagonalization (NSS-JD) [42], Joint Approximate Diagonalization of Eigen-matrices (JADE) [46], and Second-Order Blind Identification (SOBI) [47]), as shown in Figure 6. Figures 7, 8, 9, and 10 show the P300 components extracted in real time from the eight electrodes using the ANPCA algorithm with ISIs of 325 ms, 350 ms, 375 ms, and 400 ms, respectively.
The P300 amplitudes of the individual subjects at the Fz electrode for ISIs of 325 ms, 350 ms, 375 ms, and 400 ms are shown in Figure 11: (a) the P300 amplitude for a single stimulus and (b) the P300 amplitude over multiple stimuli. When the eight signals extracted from the eight electrodes were averaged, the P300 components were not detected in some periods, as indicated in Figure 12; this average was computed from the 350 ms ISI data. Comparative plots of the classification accuracies over seven stimuli for all subjects (subjects 1-7) are provided in Figure 13. The best classification accuracy was achieved with the 350 ms ISI. The average classification accuracies over seven stimulus blocks for all subjects are given in Table 1; the 350 ms ISI gave the highest average value with the smallest standard deviation.

Figure 3

The measured raw EEG signals, recorded continuously and digitized at a 256 Hz sampling rate using the g-MOBIlab+ biosignal acquisition device.

Figure 4

The pre-processed (band-pass filtered) EEG signals. The EEG signals were pre-processed using a sixth-order BPF with cut-off frequencies of 1 Hz (to remove trends in the low-frequency band) and 12 Hz (to remove unimportant information in the high-frequency band).

Figure 5

Evolution of PI(Ni) for six different runs of the ANPCA algorithm using a 350 ms ISI. The performance of the ANPCA algorithm in (31) was evaluated using (36), with W(0) = I. A single block of N = 7000 samples was used to compute all coefficient updates for the six runs, where $\hat{\mathbf{x}}(k + Ni) = \hat{\mathbf{x}}(k)$ for all integer values $i \ge 0$ and $0 \le k \le N-1$.

Figure 6

Comparison of separation performance indices: the proposed method (ANPCA) vs. other algorithms (NPCA, NSS-JD, JADE, and SOBI). To evaluate the robustness of the ANPCA against background noise, its separation performance indices are compared with those of other well-known algorithms (NPCA, NSS-JD, JADE, and SOBI).

Figure 7

Extracted P300 components in real time (325 ms ISI). For the ISI of about 325 ms, the amplitude of the P300 component was higher than for the other ISIs, but the signals were noisier than for the longer ISIs.

Figure 8

Extracted P300 components in real time (350 ms ISI). For the ISI of about 350 ms, the target and non-target amplitudes were clearer and easier to distinguish than for the other ISIs.

Figure 9

Extracted P300 components in real time (375 ms ISI). For the ISI of about 375 ms, it was found that in some sessions the non-target amplitudes were higher than the target ones.

Figure 10

Extracted P300 components in real time (400 ms ISI). For the ISI of about 400 ms, it was found that none of the channels showed similar behaviour.

Figure 11

Comparison of P300 amplitudes for four different ISIs (Fz channel): (a) P300 amplitude for a single stimulus, (b) P300 amplitude over multiple stimuli. The amplitudes of the P300 component for each ISI indicate that a short ISI could increase both the target and non-target amplitudes. The ISI of about 350 ms shows the best performance.

Figure 12

Averages of the eight-electrode data in Figure 8 (350 ms ISI): P300s are not detected in some periods. Averaging makes the target amplitude larger than the non-target amplitude only if the signal is stationary over long periods; it fails for dissimilar trials, as indicated by the marks (solid circle for the target and dashed circle for the non-target).

Figure 13

Comparison of classification accuracies over seven stimuli (four ISIs, seven subjects): the 350 ms ISI was the best of the four. All subjects achieved an average classification accuracy of 100% after three blocks of stimulus presentations were averaged (i.e., 8 s); that is, the subject's intention can be recognized eight seconds after the first stimulus.

Table 1 Average classification accuracies over seven stimuli

Discussion

The ability to measure and classify single-trial responses from specific brain regions in real time has important theoretical and practical implications for both clinical and research applications. In this study, the amplitude of the background signal was around 300 microvolts, as shown in Figure 3. Since the amplitude of the P300 component is very small (around 1.5 microvolts) compared with that of the background, pre-processing filtering is required. The EEG signals were filtered using a sixth-order BPF with cut-off frequencies of 1 Hz (to remove trends in the low-frequency band) and 12 Hz (to remove unimportant information in the high-frequency band). However, as shown in Figure 4, the signals were still corrupted by noise, with background amplitudes of around 25 microvolts. Despite some noticeable improvement, classifying the signals with respect to the given stimulus remained difficult. Therefore, a multilayer neural network model based on the ANPCA algorithm is proposed to analyze the complex P300 component of EEG signals in real time. The MNN model, trained with the back-propagation algorithm, has five layers: the input and output layers have the same number of units, N; the first and third layers are nonlinear (using a sigmoid function as a universal approximator); and the second and fourth layers are linear. Layer 2 contains M units, that is, as many units as there are nonlinear PCs, and the activations of the neurons in Layer 2 are the nonlinear PCs of the input data. The neural networks were trained with the back-propagation algorithm using an adaptive learning rate and momentum. The learning rate and momentum were tuned by trial and error until no further improvement in the performance index could be obtained; the chosen values were 0.3 and 0.8, respectively. The networks were trained before the EEG signals of a session were recorded.
The training time ranged from 15.925 s to 19.6 s for each ISI.
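The five-layer structure described above resembles an autoassociative (bottleneck) network, and its forward pass can be sketched in NumPy as below. The hidden-layer width, the weight initialization, and the function names are hypothetical; only the layer types (sigmoid, linear, sigmoid, linear), the M-unit bottleneck, and the learning rate of 0.3 and momentum of 0.8 come from the text. The momentum update uses the classical heavy-ball form, assumed here because the exact variant is not specified.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def init_network(n_in: int, n_hidden: int, n_pcs: int, seed: int = 0):
    """Weights for N -> H (sigmoid) -> M (linear) -> H (sigmoid) -> N (linear).

    Assumption: hidden width H and small random initialization are illustrative.
    """
    rng = np.random.default_rng(seed)
    sizes = [n_in, n_hidden, n_pcs, n_hidden, n_in]
    return [(0.1 * rng.standard_normal((a, b)), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]


def forward(params, X: np.ndarray):
    """Return (nonlinear PCs of layer 2, reconstruction of layer 4) for X of shape (K, N)."""
    (W1, b1), (W2, b2), (W3, b3), (W4, b4) = params
    h1 = sigmoid(X @ W1 + b1)      # layer 1: nonlinear
    pcs = h1 @ W2 + b2             # layer 2: linear, M units = nonlinear PCs
    h3 = sigmoid(pcs @ W3 + b3)    # layer 3: nonlinear
    out = h3 @ W4 + b4             # layer 4: linear output with N units
    return pcs, out


def momentum_update(w, grad, velocity, lr=0.3, momentum=0.8):
    """Assumed heavy-ball step: v <- momentum * v - lr * grad; w <- w + v."""
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity
```

Training would feed the gradients of the squared reconstruction error between `out` and `X` into `momentum_update` for every weight array; computing those gradients (by hand or with an automatic-differentiation library) is omitted here for brevity.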

Figure 5 shows the evolution of PI(Ni) for six different simulation runs in one implementation of the proposed method. The performance of the ANPCA algorithm in (30) was evaluated using (36) with W(0) = I. A single block of N = 7000 samples was used to compute all coefficient updates for the six runs, where $\hat{\mathbf{x}}(k + Ni) = \hat{\mathbf{x}}(k)$ for all integer values $i \ge 0$ and $0 \le k \le N-1$. The algorithm took between four and ten epochs to converge. Depending on the simulation run, the performance factor varied from -22 dB to -33 dB owing to random differences in the source signals. The accuracy of the method generally improves as the block length N increases. These results indicate that the ANPCA algorithm successfully separates the mixture of source signals. To evaluate the robustness of the ANPCA against background noise, its separation performance indices were compared with those of other algorithms (NPCA, JADE, NSS-JD, and SOBI). The accuracy of the recovered independent components relative to the sources was measured with the performance function in (36). Figure 6 shows the overall performance of all algorithms. Beyond 5000 iterations, the performance index improved little while the computation became increasingly time-consuming. The quality of separation increases dramatically after about 1500 iterations for the proposed method (ANPCA) and after about 4000 iterations for the other algorithms. The proposed method therefore reaches an acceptable separation level (a performance index of a little over 0.03) in the fewest iterations, whereas the other algorithms needed about 4000 iterations to reach the same level. On this basis, the ANPCA algorithm separates mixed source signals successfully.

The ICs produced from the observed data by the ANPCA algorithm (for ISIs of about 325 ms, 350 ms, 375 ms, and 400 ms) are shown in Figures 7, 8, 9, and 10. Although the signals were still corrupted by noise (manifested as high non-target amplitudes in some sessions), the extracted signals clearly exhibited the characteristic form of the P300 event-related potential. For the ISI of about 325 ms (Figure 7), the amplitude of the P300 component was higher than for the other ISIs, as shown in Figure 11(a), but the signals were noisier than for the longer ISIs; as Figure 7 shows, the non-target amplitudes were roughly similar to the target amplitudes. For the ISI of about 350 ms (Figure 8), the target and non-target amplitudes were clearer and easier to distinguish than for the other ISIs. For the ISI of about 375 ms (Figure 9), the non-target amplitudes were higher than the target ones in some sessions. For the ISI of about 400 ms, none of the channels showed similar behavior, as indicated in Figure 10; in this case, the averaging method's assumption of long stationary segments would cause a loss of time resolution. Figures 7, 8, 9, and 10 also show that the extracted signal amplitudes decreased (from the Fz to the P8 channel) as the electrode distance increased. Figure 11(a) plots the amplitudes of the P300 component for the four ISIs (Fz channel) for a single stimulus (700 ms scale) and indicates that a short ISI could increase both the target and non-target amplitudes. Figure 11(b) plots the amplitudes of the P300 component for the four ISIs over multiple stimuli (60 s scale) and shows that the P300 peak shifts with the ISI. The experiment using the 350 ms ISI showed the best performance. Figure 12 displays the averages of the signals extracted from the eight channels with the 350 ms ISI.
Averaging makes the target amplitude larger than the non-target amplitude only if the signal is stationary over long periods; it fails for dissimilar trials, as indicated in Figure 12 (solid circle for the target and dashed circle for the non-target). This is one of the main reasons why the proposed method does not use averaging.

Comparative plots of the classification accuracies for the seven subjects are provided in Figure 13. All subjects achieved an average classification accuracy of 100% after three blocks of stimulus presentations were averaged (i.e., 8 s); that is, the subject's intention was recognized eight seconds after the first stimulus. Table 1 lists the average classification accuracies over seven stimulus blocks for all subjects, together with the corresponding 85% confidence intervals. According to Table 1, the experiment with the 350 ms ISI gave the highest average classification accuracy (88.921%) and the smallest standard deviation (1.807) over all subjects. By contrast, the 400 ms ISI gave the lowest classification accuracy (84.839%), while the largest standard deviation (4.959) was obtained with the 325 ms ISI. These results indicate that the best performance was obtained with the 350 ms ISI.

The P300 component of EEG signals has been widely used in clinical settings [21-26]. In this context, using physiological signals rather than the patient's behavioral responses is often advisable, albeit challenging. Overall, the P300 component has attracted considerable interest as a clinical diagnostic tool, and real-time detection is the most efficient way to implement such a tool. The amplitudes of different waveforms at a single point can also be displayed in a similar format; this type of display provides a more objective analysis of EEG activity than a subjective visual analysis by a physician. Simultaneous video monitoring of the patient during EEG recording is also becoming more popular, as it allows the physician to correlate EEG waveforms closely with the patient's activity and may help produce a more accurate diagnosis.

Conclusions

The applicability of the proposed ANPCA method for extracting the P300 waves contained in EEG signals in real time, without down-sampling or averaging the original signals, was demonstrated. The separation performance factor of the ANPCA varied from -22 dB to -33 dB owing to the randomness of the source signals. In comparison with other algorithms (NPCA, NSS-JD, JADE, and SOBI), the ANPCA required the fewest iterations, reaching a performance index of about 0.03. Since all computations are done in real time, the ANPCA can be used as a viable tool for clinical applications.