1. Introduction

Over the past decade, research on brain-computer interfaces (BCIs) has grown steadily. Thanks to the efforts of many researchers, innovative, user-friendly, and ergonomic BCI systems have emerged[1]. BCI has thus become a new communication channel between human and computer. In the literature, a BCI is defined as a communication system that does not depend on the brain’s normal output pathways of peripheral nerves and muscles[2]. It offers the novel idea that a person may use brain activity to control computers, machines, or other electronic devices.

BCI research generally falls into two categories: dependent BCI and independent BCI. In a dependent BCI, communication depends on the brain’s normal output channels. For example, a BCI based on visual evoked potentials (VEPs) is considered dependent because the production of VEPs relies on muscular control of gaze direction. In contrast, an independent BCI is a communication channel that does not rely on the brain’s normal output pathways. BCIs based on P300 evoked potentials[3, 4], slow cortical potentials[5-7], μ and β rhythms[8-12], and cortical neuronal action potentials are all regarded as independent BCI systems[13].

A BCI system generally consists of three components: signal acquisition, signal processing, and application. EEG signals generated by cortical activity are first acquired and digitized. The recorded EEG signals are then passed to the signal processing unit for noise reduction and feature extraction. Finally, the feature vectors are translated into device commands that execute designated tasks such as cursor movement or wheelchair control. Figure 1 shows a conceptual block diagram of a typical BCI system[14, 15].

Figure 1

A conceptual diagram of basic design and operation of a BCI system. In general, it can be divided into three parts: signal acquisition, signal processing, and application.

In this study, we propose a signal-processing technique, template matching, for detecting P300 evoked potentials, with application to automated character recognition. The use of template matching to detect brain potentials is not itself novel. For example, Kim et al. proposed a template-matching-based algorithm for spike detection[16]. Compared with the conventional threshold detection algorithm, their method was shown to detect action potentials of a variety of morphologies quickly and reliably in extracellular neuronal recordings. Moreover, while the visually evoked P300 response has attracted great attention in BCI system design, some previous works have focused on using a so-called m-sequence to evoke responses in the visual cortex. Along these lines, Nezamfar et al. used multiple m-sequences for intent discrimination in a brain interface, rather than a single m-sequence and its shifted versions as had been done previously[17]. They employed a correlation-coefficient-based template matching scheme to define the classifier. Beyond brain potential detection, template matching has also been applied successfully to other topics in biomedical signal processing, such as ECG component detection (e.g., QRS detection[18]).

As the works cited above show, the idea of template matching is not new. However, it has seldom been applied to the task of P300 detection, which is the main reason we employed it in this study. We believe that such a simple and reliable method is well suited to preventing P300 detection performance from being degraded by noise or artifacts arising from the skin, electrode motion, or power-line interference. To demonstrate the validity and reliability of the proposed method, we evaluated its performance using an existing database provided by BCI Competition 2003.

Furthermore, an existing BCI platform, BCI2000, is currently used by several BCI research groups. It was developed by the Wadsworth Center in New York and is open-source software, so it can be freely modified and used for a variety of BCI applications. It provides an integrated environment for general-purpose BCI research and development[19]. In this study, we also implemented the proposed algorithm on the BCI2000 system, so that automated character recognition can be performed either offline or in real time.

2. Materials and methods

2.1 BCI2000 system

BCI2000 is a general-purpose system for BCI research that can be used for data acquisition, stimulus presentation, and brain monitoring applications[20]. As noted above, it is open-source software. The platform software is divided into four modules that communicate with each other: a) operator, b) source, c) signal processing, and d) application. It coordinates all the peripheral devices so that the system can respond to the user's intention in real time. The software is built with Borland C++ Builder and is carefully designed to satisfy real-time constraints. A schematic block diagram is depicted in Figure 2. Because BCI2000 can incorporate any source of EEG signals and different signal processing algorithms, which facilitates the implementation of a variety of BCI systems, we used it as the platform both to implement the proposed P300-based signal processing algorithm and to evaluate its performance in a BCI spelling system.

Figure 2

A schematic block diagram of the BCI2000 system. It consists of four modular units: operator, source, signal processing, and application. During its operation, the information is communicated from source to signal processing, to user application, and then back to source.

2.2 Feature extraction: template matching

In BCI2000, the feature vector used to detect a P300 evoked potential consists only of a number of EEG amplitudes sampled at instants close to 300 ms after the current flashing stimulus. Such a feature set may not adequately reflect the brain's response to the stimulus and may therefore achieve only modest detection accuracy. To improve the detection accuracy, we propose a template-matching-based feature extraction for P300 detection. We chose template matching because, in the time domain, the P300 evoked potential has a well-defined waveform: a positive potential approximately 300 ms after the task-relevant stimulus. We therefore expect template matching to be one of the most direct and effective ways to capture this specific waveform.

The template matching method is based on the correlation coefficient, which serves as a measure of similarity. The correlation coefficient is defined as

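$$R = \frac{\sum_{n} \left(X_n - \bar{X}\right)\left(Y_n - \bar{Y}\right)}{\sqrt{\sum_{n} \left(X_n - \bar{X}\right)^2 \, \sum_{n} \left(Y_n - \bar{Y}\right)^2}}$$
(1)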

where $X_n$ is the nth sample of the signal to be analyzed and $\bar{X}$ is the mean of the signal; $Y_n$ is the nth sample of the template and $\bar{Y}$ is the mean of the template. The correlation coefficient R measures the similarity between the signal and the template, and it can serve as a feature for the subsequent pattern classification. Figure 3a, b provides illustrative examples of applying template matching to EEG signals with and without P300 potentials, respectively. In these examples, the correlation coefficient of the former is nearly twice that of the latter (0.69 versus 0.37).
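
As a minimal sketch of how (1) can be evaluated, the following Python function computes R for one EEG segment and one template of equal length. The function name, the handling of a flat (zero-variance) input, and the synthetic Gaussian-shaped "template" are illustrative assumptions and are not taken from the study.

```python
import numpy as np

def correlation_coefficient(segment, template):
    """Correlation coefficient R between a signal segment and a template, as in (1)."""
    x = np.asarray(segment, dtype=float)
    y = np.asarray(template, dtype=float)
    if x.shape != y.shape:
        raise ValueError("segment and template must have the same length")
    xc = x - x.mean()  # X_n minus the signal mean
    yc = y - y.mean()  # Y_n minus the template mean
    denom = np.sqrt(np.sum(xc ** 2) * np.sum(yc ** 2))
    if denom == 0.0:
        # Assumption: a flat input carries no shape information, so report 0.
        return 0.0
    return float(np.sum(xc * yc) / denom)

# Synthetic illustration only: a Gaussian bump stands in for a P300 template.
t = np.arange(96)
template = np.exp(-0.5 * ((t - 72) / 12.0) ** 2)
rng = np.random.default_rng(0)
with_p300 = template + 0.3 * rng.standard_normal(t.size)
without_p300 = 0.3 * rng.standard_normal(t.size)
print(correlation_coefficient(with_p300, template))
print(correlation_coefficient(without_p300, template))
```

Because both the mean and the scale are normalized out, R responds to the shape of the segment rather than to its amplitude or offset.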

Figure 3

Illustrations of applying the template matching to the task of P300 detection from the EEG signals (a) with the presence of P300 waves; and (b) without the P300 waves. The P300 template is highlighted in blue in both plots. The correlation coefficient between the EEG and the P300 template is 0.69 in plot (a) and 0.37 in plot (b).

2.3 The proposed P300 detection flow

As stated in the previous section, the P300 detection algorithm built into the BCI2000 system simply uses sampled EEG data as the features for P300 detection. Given the characteristics of P300 evoked potentials, the EEG samples that qualify as feature candidates are those acquired close to 300 ms after the stimulus. That is, at a sampling rate of 240 Hz, the system may take the sampled EEG data in the vicinity of the 72nd sample (0.3 s × 240 Hz = 72) as the feature vector used for classification. Figure 4 shows a typical P300 waveform and indicates a possible feature set composed of a number of EEG amplitudes. In addition to EEG amplitude, we propose another feature candidate: the correlation coefficient described in the previous section and expressed in (1). As Figure 4 shows, the shape of a P300 waveform is well defined. We expected that using template matching to extract this morphological feature would enhance the detection rate. The correlation coefficient between the brain activity, in the form of EEG data, and the P300 template obtained from a training process was evaluated by template matching. This value represents the similarity between the EEG segment and the P300 template, so an EEG segment with a larger correlation coefficient may indicate the occurrence of a P300.
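
The text does not spell out how the P300 template is derived during training. One common choice, assumed here purely for illustration, is to average the stimulus-locked EEG epochs that are known to contain a P300. The sketch below also shows the latency-to-sample conversion mentioned above; the function names are hypothetical.

```python
import numpy as np

FS_HZ = 240  # sampling rate stated in the text

def latency_to_sample(latency_s, fs_hz=FS_HZ):
    """Number of samples elapsed at a given latency after the stimulus."""
    return int(round(latency_s * fs_hz))

def build_template(target_epochs):
    """Average target epochs of shape (n_epochs, n_samples) into one template.

    Assumption: averaging target epochs is used here as a stand-in for the
    training process; the study does not state its exact procedure.
    """
    epochs = np.asarray(target_epochs, dtype=float)
    if epochs.ndim != 2:
        raise ValueError("target_epochs must have shape (n_epochs, n_samples)")
    return epochs.mean(axis=0)

print(latency_to_sample(0.3))  # 300 ms at 240 Hz
```

Averaging attenuates activity that is not time-locked to the stimulus, which is why an averaged waveform such as the one in Figure 4 looks much cleaner than a single epoch.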

Figure 4

A typical plot of an averaged P300 waveform. Also, an example of a possible feature vector consisting of a set of EEG samples in the vicinity of 300 ms, say, from the 57th to the 65th samples, given the sampling rate 240 Hz, is indicated in red color.

At this point, a number of features formed from the EEG amplitudes and the correlation coefficient are available for P300 detection. The concept of a linear discriminant function (LDF) applies to the case of N features: the two class clusters in the N-dimensional feature space are separated by an (N-1)-dimensional hyperplane, which yields the optimal linear classification. All or some of our feature candidates can therefore be combined linearly to perform the so-called LDF analysis. Not all the features may be required for P300 detection, and different subsets or combinations of features may achieve different detection rates. The optimal feature set can be sought experimentally by evaluating and comparing the performance obtained in a number of numerical trials. The LDF applied in our study can be expressed as

$$y = \begin{bmatrix} l_1 & l_2 & \cdots & l_N \end{bmatrix} \times \begin{bmatrix} F_1 \\ F_2 \\ \vdots \\ F_N \end{bmatrix}$$
(2)

where $F_1, \dots, F_N$ are the N features composing an N-dimensional feature vector and y is the single linear feature that retains the information relevant for classification. The N × 1 linear mapping vector $L = [l_1\ l_2\ \dots\ l_N]^T$ was obtained by training over a number of datasets with known classes. A decision rule for the occurrence of a P300 wave was then formulated as follows:

$$\begin{cases} \text{if } y \ge \mathit{th}, & \text{say P300 occurs;} \\ \text{if } y < \mathit{th}, & \text{say P300 does not occur.} \end{cases}$$
(3)
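
The projection in (2) and the decision rule in (3) can be sketched as follows. The study states only that L and the threshold come from training on data with known classes; the Fisher-style estimate of L and the midpoint threshold used below are assumptions made for illustration, and the function names are hypothetical.

```python
import numpy as np

def train_ldf(features, labels):
    """Estimate a mapping vector L and threshold th from labeled features.

    features: array of shape (n_samples, N); labels: 1 for P300, 0 otherwise.
    Assumption: Fisher's linear discriminant with a midpoint threshold.
    """
    X = np.asarray(features, dtype=float)
    is_target = np.asarray(labels).astype(bool)
    X1, X0 = X[is_target], X[~is_target]
    m1, m0 = X1.mean(axis=0), X0.mean(axis=0)
    # Pooled within-class scatter matrix.
    Sw = (X1 - m1).T @ (X1 - m1) + (X0 - m0).T @ (X0 - m0)
    # The pseudo-inverse keeps the estimate defined if Sw is singular.
    L = np.linalg.pinv(Sw) @ (m1 - m0)
    th = 0.5 * (L @ m1 + L @ m0)
    return L, float(th)

def detect_p300(feature_vector, L, th):
    """Apply (2) to obtain y, then the decision rule (3)."""
    y = float(np.dot(L, np.asarray(feature_vector, dtype=float)))
    return bool(y >= th)
```

With a single feature (N = 1), such as the correlation coefficient alone, the projection reduces to a scaling and the rule becomes a plain threshold on that feature.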

Figure 5 further provides a schematic diagram of the P300 detection flow proposed in our study.

Figure 5

The P300 evoked potential detection flow as proposed in our study.

3. Results and discussion

3.1 Descriptions of a computerized EEG database

The EEG database used to evaluate the proposed P300 detection algorithm was taken from BCI Competition 2003. The EEG signals in the database were digitized at 240 Hz and were recorded in exactly the same way as proposed by Donchin et al.[3, 4]. The P300 measurement setting is as follows. During the experiment, each user was presented with a 6 by 6 matrix of characters, as shown in Figure 6, and was asked to focus attention on the characters of a word, one character at a time. All rows and columns of the matrix were highlighted successively and in random order at a rate of 5.7 Hz. Of the 12 stimuli (6 rows and 6 columns), only two, namely the row and the column containing the desired character, should evoke brain responses different from those evoked by the other stimuli, because the former should contain P300 evoked potentials while the latter should not. Consequently, the particular row and column that evoke a P300 jointly determine the final target, i.e., the desired character. For each character, the set of 12 stimuli (intensifications) was repeated 15 times. That is, each row or column was highlighted 15 times, giving 12 × 15 = 180 intensifications in total for each character.
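
The way a row and a column jointly determine the character can be sketched as below, assuming each intensification has already been reduced to a single score (for example, a similarity measure between the EEG segment and a P300 template). The matrix layout, the stimulus numbering (0 to 5 for columns, 6 to 11 for rows), and the averaging of scores over repetitions are all assumptions made for this illustration and are not specified in the text.

```python
import numpy as np

# Assumed character layout; the text only states that the matrix is 6 by 6.
MATRIX = ["ABCDEF", "GHIJKL", "MNOPQR", "STUVWX", "YZ1234", "56789_"]

def select_character(scores, stimulus_codes):
    """Pick the character at the best-scoring row and column.

    scores: one score per intensification.
    stimulus_codes: hypothetical coding, 0..5 = columns, 6..11 = rows.
    """
    scores = np.asarray(scores, dtype=float)
    codes = np.asarray(stimulus_codes, dtype=int)
    if scores.shape != codes.shape:
        raise ValueError("scores and stimulus_codes must align")
    if not all(np.any(codes == k) for k in range(12)):
        raise ValueError("every row and column must be intensified at least once")
    # Average the scores of each of the 12 stimuli over its repetitions.
    mean_score = np.array([scores[codes == k].mean() for k in range(12)])
    col = int(np.argmax(mean_score[:6]))
    row = int(np.argmax(mean_score[6:]))
    return MATRIX[row][col]

# 15 repetitions of the 12 stimuli give 180 intensifications per character.
codes = np.tile(np.arange(12), 15)
scores = np.where((codes == 3) | (codes == 8), 0.7, 0.1)  # synthetic scores
print(select_character(scores, codes))
```

Passing only the scores from the first few repetitions is one way to study how accuracy changes when fewer intensifications are used.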

Figure 6

The 6 by 6 matrix of characters. For each character, all rows and columns in the matrix were highlighted by flashing for a number of times.

3.2 Numerical experimental settings and detection results

Two datasets were employed in our study. Each dataset is a complete record of P300 evoked potentials measured using the paradigm proposed by Donchin et al.[3, 4]. According to the description of the database, one dataset came from a ten-word experiment (39 characters) and was intended for training; the other came from an eight-word experiment (31 characters) and was intended for testing. In our analysis, we reorganized these datasets as follows. First, we split the original ten-word training dataset into two subsets, one of three words (DOG, GLOVE, HAT(2)) and one of seven words (CAT, FISH(1), FISH(2), BOWL, WATER, HAT(1), SHOES). We used the three-word subset for training and then tested the seven-word subset using the decision rule formulated in (3), as derived from the three-word training subset.

First, a feature vector consisting of 20 designated EEG amplitude values acquired around 300 ms after the flashing stimulus was adopted for P300 detection. We transformed this N-dimensional feature vector (N = 20) into the single linear feature y using (2) and classified y using the decision rule in (3). The linear mapping vector L was obtained from the training dataset. The resulting detection accuracy on the testing dataset was only about 54%, which implies that features selected directly from the raw EEG samples might be inadequate for this detection problem.

Next, we used both amplitude and morphological information as features for the same detection problem. The feature vector consisted of five designated EEG samples (close to 300 ms) and the correlation coefficient, i.e., the similarity (N = 6). In terms of (2), the feature values F1 to F5 were set to the EEG samples and F6 to the correlation coefficient R measured between the EEG segment and the P300 template. The accuracy rose to about 82%, but this was still not satisfactory. This might be partly because a P300 wave does not usually appear exactly 300 ms after a stimulus, so EEG samples taken from a fixed time frame may not always reflect the appearance of a P300 wave.

Since neither the amplitude-based nor the combined amplitude-and-morphology-based feature vector achieved a sufficiently high detection rate, we employed the correlation coefficient as the sole feature and repeated the same tests. Tests on the testing dataset then produced a perfect detection rate, i.e., 100% predictive accuracy for P300. We speculate that this morphology-based feature adequately reflects the presence of a P300 waveform. We therefore used the correlation coefficient obtained by template matching as the sole feature in the subsequent numerical experiments on P300 detection.

Moreover, as mentioned previously, each row or column was intensified 15 times, giving 180 intensifications in total for each character. The maximum time required to detect a character was therefore about 32 s (i.e., 180/5.7 ≈ 31.58 s), which allows a communication rate of only about two characters per minute (char/min). Reducing the number of repetitions would raise the communication rate and speed up detection, but it may unavoidably degrade the detection rate. To assess the reliability and robustness of the proposed template-matching-based algorithm, we computed the detection rate as a function of the number of repetitions of the set of 12 stimuli, denoted M, in the three numerical experiments described below.

In numerical experiment I, we merged the seven-word testing subset described above with the original eight-word testing dataset provided by the competition into a larger testing dataset. We then performed P300 detection/classification on these 15 words using the same P300 template and decision rule as before (i.e., those obtained from the three-word training subset). The detection rate was evaluated for various numbers of repetitions (M) of the stimulus set. The numerical results are provided in Table 1.

Table 1 Listings of the P300 detection results on the 15-word testing dataset

In numerical experiment II, we performed P300 detection on the original eight-word testing dataset provided by the competition, using exactly the same P300 template as in experiment I. The numerical results are provided in Table 2. Finally, in numerical experiment III, we used all ten words in the original training dataset to derive a new P300 template and decision rule, and then performed P300 detection on the original eight-word testing dataset again. The numerical results are provided in Table 3.

Table 2 Listings of the P300 detection results on the eight-word testing dataset
Table 3 Listings of the P300 detection results on the eight-word testing dataset

3.3 Discussion

All the numerical results listed in Tables 1, 2, and 3 are summarized in the plot shown in Figure 7. Given M repetitions (also referred to as M trials) of the set of 12 stimuli at a flashing rate of 5.7 Hz, the communication rate, denoted CR, is defined as

$$\mathrm{CR} = \frac{60}{M \times 12 / 5.7}\ \text{char/min}$$
(4)

Figure 7

Numerical results in terms of the detection accuracy (in percentage) obtained at various communication rates (in char/min) for all the three different experimental settings.

In general, Tables 1, 2, and 3 and Figure 7 show that the system exceeded 80% detection accuracy at a communication rate of about 4 char/min (i.e., M ≈ 8) and 90% detection accuracy at about 3 char/min (i.e., M ≈ 10), and that it reached 100% detection accuracy when the communication rate was reduced to about 2 char/min (i.e., M ≈ 15). In addition, experiment III shows that the detection rate was not significantly enhanced even though the number of training words was increased to ten, suggesting that the size of the training database might not play a crucial role in the overall detection setting.
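
The communication rates quoted above follow directly from (4). A minimal sketch, with a hypothetical function name and the flashing rate and set size exposed as parameters for illustration:

```python
def communication_rate(m_repetitions, flash_rate_hz=5.7, stimuli_per_set=12):
    """Communication rate in char/min according to (4)."""
    if m_repetitions <= 0:
        raise ValueError("m_repetitions must be positive")
    # Time to present M sets of stimuli for one character, in seconds.
    seconds_per_char = m_repetitions * stimuli_per_set / flash_rate_hz
    return 60.0 / seconds_per_char

for m in (8, 10, 15):
    print(m, round(communication_rate(m), 2))
```

For M = 8, 10, and 15 this evaluates to roughly 3.56, 2.85, and 1.90 char/min, consistent with the rounded figures of about 4, 3, and 2 char/min.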

It should also be noted that, although the database of P300 evoked potentials used in our work was recorded in exactly the same way as that employed by Donchin et al. in their previous works[3, 4], the rows and columns of the matrix were intensified at different rates in the two databases. The database used in our study was dataset IIb provided by BCI Competition 2003, in which each of the 12 events (i.e., 6 rows and 6 columns) was intensified at a rate of 5.7 Hz, whereas in the database employed in Donchin's work the events were intensified at a rate of 8 Hz[3]. It is therefore difficult and unfair to compare our work against Donchin et al.'s on the basis of the communication rate in char/min alone, since their intensification rate is higher than ours. Instead, there is a direct relationship between the number of trials required for reliable recognition, denoted M in our study, and the detection accuracy (i.e., the character recognition rate). Recall that a ‘trial’ is defined here as the intensification of all 12 elements of the matrix. The performance can thus be evaluated using the number of trials required for a given detection accuracy.

Since the detection accuracy depends on the communication rate, and the communication rate is a function of the number of trials, as indicated in (4), we can relate the detection accuracy to the number of trials. In Donchin et al.'s work, there were 40 trials per minute (since the intensification rate is 8 Hz, 40 trials × 12 events/trial × 0.125 s/event = 60 s = 1 min). Therefore, given the communication rate CR (in char/min) at a certain level of accuracy, the number of trials M required to achieve that level of character detection accuracy is 40/CR; the same relation follows from (4) by replacing 5.7 with 8. Using this relation, the results reported by Donchin et al. in[3] in terms of communication rate were converted into numbers of trials, as tabulated in Table 4. According to[3], CR = 2.5 at the 96% ~ 99% level of accuracy, while CR = 1.25 at 100% detection accuracy; achieving these two levels of accuracy would therefore require 16 (40/2.5) and 32 (40/1.25) trials, respectively. The results of our numerical experiments are provided in Table 5. Tables 4 and 5 show, first, that the detection accuracy increases with the number of trials in each case, as expected. Second, although the average numbers of trials at the 80% and 95% levels of accuracy in Donchin et al.'s work and in ours were close to each other (6.65 vs. 6.79 and 11.96 vs. 13.11, respectively), our proposed template matching in conjunction with LDA (TMLDA) needed only 15 trials to achieve a 100% recognition rate, far fewer than required by the method of Donchin et al. (15 vs. 32). Furthermore, Table 4 shows that Donchin et al.'s method still could not achieve 100% detection accuracy even with up to 16 trials. This represents a substantial improvement over the results reported in[3].
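
The conversion between communication rate and number of trials can be written in a general form. Solving (4) for M with a generic intensification rate f in Hz (a symbol introduced here only for illustration) gives

$$M = \frac{60\, f}{12\, \mathrm{CR}} = \frac{5 f}{\mathrm{CR}}, \qquad f = 8\ \text{Hz} \Rightarrow M = \frac{40}{\mathrm{CR}}, \qquad f = 5.7\ \text{Hz} \Rightarrow M = \frac{28.5}{\mathrm{CR}}$$

At 8 Hz, CR = 2.5 and CR = 1.25 char/min give M = 16 and M = 32, matching the values quoted above. At 5.7 Hz, M = 15 corresponds to CR = 28.5/15 = 1.9 char/min.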

Table 4 Results in terms of number of trials derived by Donchin et al.'s work[3]
Table 5 Results in terms of the number of trials

4. Conclusions

According to the research conducted by Donchin et al., a BCI based on P300 detection allows a user to communicate with a PC or microprocessor-based equipment via a sequence of characters. In this research, a novel, simple, and reliable P300 detection method is proposed, which achieved high detection accuracy with a short detection time. Our approach differs from those based on conventional spectral or temporal analysis of EEG: it employs template matching to extract the brain intent from EEG signals in the form of correlation-coefficient-based morphological information. The numerical experimental results obtained on the database provided by BCI Competition 2003 show that the correlation coefficient is a good feature for P300 detection. Moreover, the proposed algorithm achieved higher detection accuracy than the default method in the BCI2000 system. The numerical experimental results showed that the proposed template matching algorithm achieved 100% detection accuracy for automatic character recognition.