1 Introduction

Brain-Computer Interfaces (BCIs) translate brain signals, usually acquired non-invasively using electroencephalography (EEG), into computer commands without using the brain’s normal output pathways of peripheral nerves and muscles [24]. Such communication technologies have the potential to help people with physical impairments, as they could provide user interfaces that work independently of the person’s limitations.

This article focuses on BCIs based on steady state visual evoked potentials (SSVEPs) [3], a commonly used BCI paradigm. When a user looks at a flickering visual stimulus, brain signals are modulated with the corresponding frequency. These signals are then measured with EEG and can be classified in real time. An SSVEP-based BCI can be utilized as a communication tool and is one of the fastest BCI paradigms [1, 20].

A crucial point regarding user friendliness is the design of the graphical user interface (GUI). Four-class SSVEP-based BCIs have been shown to grant sufficient control over the system and allow greater freedom in the choice of stimulation frequencies than systems with a larger number of targets [9, 11].

Integrated dictionaries can make spelling applications more efficient [13, 23]. However, due to the limited number of commands, implementing a dictionary in a four class BCI is a challenge. As the English alphabet consists of 26 letters, multiple steps are necessary to select a letter. Additional steps are needed to select dictionary suggestions.

Various BCI spelling interfaces with built-in dictionaries have already been developed by different research groups [17].

When designing a communication system with text predictive mechanisms, it is important to take its intended use in real-life scenarios into consideration. For this reason, the presented dictionary was based on word frequency lists for spoken language. As the 200 most frequently spoken words make up 80% of everyday language [4], the use of such frequency lists can greatly increase the communication speed of such systems.

Another way to improve system speed and accuracy is the careful adjustment of key parameters for SSVEP classification. Auto-calibration procedures allow expert-independent home use of a BCI, as they enable people with no technical knowledge to set up the system. Though such calibration methods are common for other BCI approaches [10, 13, 15, 16], they are rarely used for BCIs based on the SSVEP paradigm. In the dictionary-supported SSVEP-based BCI presented here, a modified version of our previously developed calibration methods [9] was integrated to set up user-specific stimulation frequencies and other key parameters of the classification methods, such as frequency-dependent classification thresholds.

Our goal was to develop a more user-friendly BCI system suitable for daily communication. To achieve this, we improved and consolidated our previous developments [9, 23]. The dictionary-driven, four-class SSVEP-based spelling application presented here was tested with 41 participants. We further

  • evaluated the efficiency of the integrated dictionary for daily communication by analyzing results of an online spelling task and

  • demonstrated that the modified SSVEP-calibration methods allow expert-independent adjustment of BCI key parameters.

The paper is organized as follows: the second section describes the experimental setup, and presents details about the spelling interface. The results are presented in the third section, followed by a discussion and conclusion in the final section.

2 Methods and Materials

2.1 Participants

The study was carried out in accordance with the guidelines of the Rhine-Waal University of Applied Sciences. All participants gave written informed consent in accordance with the Declaration of Helsinki. Information needed for the analysis of the experiments was stored anonymously during the experiment; results cannot be traced back to the participant. All participants had the opportunity to opt out of the study at any time. Forty-one participants (11 female) with a mean (SD) age of 22.31 (2.73) years participated in the study. All participants were students or employees of the Rhine-Waal University of Applied Sciences and had little or no previous experience with BCI systems. The EEG recording took place in a normal laboratory room (area \(\approx 36\,\mathrm{m}^{2}\)). Spectacles were worn when appropriate. Participants did not receive any financial reward for participation in this study.

2.2 Hardware

Participants were seated in front of an LCD screen (BenQ XL2420T, resolution: \(1920 \times 1080\) pixels, vertical refresh rate: 120 Hz) at a distance of about 60 cm. The computer ran Microsoft Windows 7 Enterprise on an Intel processor (Intel Core i7, 3.40 GHz). Standard Ag/AgCl electrodes were used to acquire the signals from the surface of the scalp. The ground electrode was placed over \(AF_Z\), the reference electrode over \(C_Z\), and the eight signal electrodes were placed at predefined locations on the EEG-cap marked with \(P_Z, PO_3, PO_4, O_1, O_2, O_Z, O_9\), and \(O_{10}\) in accordance with the international system of EEG electrode placement. Standard abrasive electrolytic electrode gel was applied between the electrodes and the scalp to bring impedances below \(5\,\mathrm{k}\Omega \). An EEG amplifier, g.USBamp (Guger Technologies, Graz, Austria), was utilized. The sampling frequency was set to 128 Hz. During the EEG signal acquisition, an analogue band pass filter (between 2 and 30 Hz) and a notch filter (around 50 Hz) were applied directly in the amplifier.

2.3 Signal Acquisition

The minimum energy combination (MEC) method [7, 20] was used for SSVEP signal classification. To detect a frequency in the spatially filtered signals, the SSVEP power estimations for the considered frequencies were normalized into probabilities:

$$\begin{aligned} p_{i} = \frac{\hat{P}_{i}}{ \sum _{j=1}^{N_{f}}{\hat{P}_{j}} } \quad \text{with} \quad \sum _{i=1}^{N_{f}}{p_{i}} = 1 \end{aligned}$$
(1)

where \(N_f\) is the number of considered frequencies and \(\hat{P}_{i}\) is the i-th power estimation, \(1 \le i \le N_{f}\). Note that to increase robustness, three additional frequencies (means between pairs of target frequencies, see e.g. [21]) were considered in addition to the four target frequencies, hence \(N_{f}=7\).
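The normalization in Eq. (1) can be sketched in a few lines of Python. The function name and the example power values are illustrative assumptions; the MEC power estimation itself is not reproduced here.

```python
from typing import Sequence


def normalize_powers(powers: Sequence[float]) -> list[float]:
    """Turn non-negative SSVEP power estimates into probabilities (Eq. 1)."""
    total = sum(powers)
    if total <= 0:
        raise ValueError("power estimates must sum to a positive value")
    return [p / total for p in powers]


# Hypothetical power estimates for N_f = 7 frequencies
# (four targets plus three intermediate frequencies).
example_powers = [4.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
probabilities = normalize_powers(example_powers)
```

By construction the returned values sum to one, so a single dominant power estimate translates directly into a high probability for that frequency.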

All classifications were performed on the basis of the hardware synchronization of the EEG amplifier (g.USBamp); new EEG data were transferred to the PC in blocks of 13 samples (101.5625 ms at the sampling rate of 128 Hz). The classification was performed with a stepwise increasing sliding window (up to 20 s) each time a new EEG data block was received. If none of the \(p_{i}\) exceeded its corresponding threshold \(\beta _i\), the classifier output was rejected. The choice of the \(\beta _i\) depended on the corresponding stimulation frequency (in general, lower stimulation frequencies produce stronger SSVEP responses) but also on user factors, as the quality of the SSVEP signals differs between participants. The values of the \(\beta _i\) were determined with calibration software [9]. After each classification, the classifier output was rejected for a duration of 914 ms (9 blocks). During this gaze shifting period, the targets did not flicker, allowing the user to shift his/her focus to another target (see [20] for more details).
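The block timing and the threshold-based rejection can be illustrated with a minimal Python sketch. The constant and function names are assumptions made for illustration, and so is the tie-breaking rule (choosing the highest probability if several exceed their thresholds), which the text does not specify.

```python
from typing import Optional, Sequence

SAMPLING_RATE_HZ = 128
BLOCK_SAMPLES = 13
BLOCK_MS = 1000.0 * BLOCK_SAMPLES / SAMPLING_RATE_HZ  # duration of one data block
GAZE_SHIFT_BLOCKS = 9
GAZE_SHIFT_MS = GAZE_SHIFT_BLOCKS * BLOCK_MS  # pause after each classification


def classify(probabilities: Sequence[float],
             thresholds: Sequence[float]) -> Optional[int]:
    """Return the index of the detected frequency, or None if rejected.

    probabilities[i] is p_i and thresholds[i] is beta_i for the same frequency.
    """
    candidates = [i for i, (p, beta) in enumerate(zip(probabilities, thresholds))
                  if p > beta]
    if not candidates:
        return None  # no p_i exceeded its threshold: output rejected
    # Assumption: if several p_i exceed their thresholds, take the largest.
    return max(candidates, key=lambda i: probabilities[i])
```

In an online loop, such a function would be called once per incoming block on the current sliding window until it returns an index or the window reaches its maximum length.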

2.4 Auto-calibration

Key SSVEP-parameters were determined individually for each participant in a short calibration session with the previously developed BCI wizard software [9]. This wizard ran the user through three phases in order to provide participant-specific stimulation frequencies (phases 1 and 2), classification thresholds, and minimal time segment lengths (phase 3).

The number of stable frequencies on LCD monitors is limited by the vertical refresh rate (here 120 Hz), since the number of frames in a stimulation cycle needs to be constant [5]. Therefore, only frequencies obtained by dividing the monitor’s vertical refresh rate by an integer divider were considered as stimulation frequencies. The four optimal stimulation frequencies were drawn from frequencies obtained with dividers between 6 and 24 of the vertical refresh rate (see x-axis of Fig. 3).
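As a small illustration, the candidate frequencies for a given refresh rate and divider range can be enumerated as follows. The function name is a hypothetical choice; the refresh rate and divider range are taken from the text.

```python
REFRESH_RATE_HZ = 120


def candidate_frequencies(refresh_rate_hz: float,
                          min_divider: int,
                          max_divider: int) -> list[float]:
    """Frequencies whose stimulation cycle spans a whole number of frames."""
    return [refresh_rate_hz / d for d in range(min_divider, max_divider + 1)]


frequencies = candidate_frequencies(REFRESH_RATE_HZ, 6, 24)
```

For example, a divider of 11 corresponds to a cycle of 11 frames and a stimulation frequency of about 10.91 Hz.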

The low frequency band overlaps with the alpha band (8–13 Hz), which can cause false classifications [26]. Therefore, alpha activity was measured in phase 1 and critical frequencies were filtered out: if a possible target frequency interfered with the user’s alpha wave (frequency difference less than 0.3 Hz), this frequency was excluded as described in [9].
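The exclusion rule can be sketched as a simple filter. How the alpha frequency is estimated from the phase 1 recording is described in [9] and not reproduced here; the peak value below is a hypothetical input, and the function name is an assumption.

```python
from typing import Sequence


def remove_alpha_conflicts(candidates: Sequence[float],
                           alpha_peak_hz: float,
                           min_distance_hz: float = 0.3) -> list[float]:
    """Drop candidate frequencies closer than min_distance_hz to the alpha peak."""
    return [f for f in candidates if abs(f - alpha_peak_hz) >= min_distance_hz]


# Hypothetical alpha peak at 10.2 Hz: 10.0 Hz is excluded, 10.91 Hz is kept.
kept = remove_alpha_conflicts([8.0, 9.23, 10.0, 10.91, 12.0], alpha_peak_hz=10.2)
```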

The determination of optimal stimulation frequencies was based on a comparison of the integral values of the normalized probabilities from Eq. (1); more details can be found in [9]. The so-called multi-target technique (see [22]), in which the user focuses on multiple simultaneously flickering stimuli, was used to find optimal target frequencies.

To this end, the user was shown three circles in sequence, each representing possible stimulation frequencies. Each circle flickered for 10 s while EEG data were recorded, and the possible target frequencies were then sorted from highest to lowest averaged probability. The first two circles contained seven of the considered frequencies each (see Fig. 1a). In order to avoid mutual influences between stimulation frequencies, the seven frequencies contained in each circle satisfied the following additional restrictions (see e.g. [22]):

$$\begin{aligned} f_i \ne [ f_j +f_k]/2,\, f_i \ne 2f_j-f_k,\, f_i \ne 2 f_k-f_j. \end{aligned}$$
(2)

The considered stimulation frequencies for the first circle were 6.32, 7.50, 8.00, 10.00, 10.91, 13.33, and 6.67, 7.06, 8.57, 9.23, 12.00, 15.00, 12.00 Hz for the second circle.
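Read together, the three conditions in Eq. (2) require that no frequency in a set is the arithmetic mean of two others. A minimal check could look as follows; the function name and the numerical tolerance are assumptions, and the example sets are made up for illustration.

```python
from itertools import permutations
from typing import Sequence


def violates_restrictions(frequencies: Sequence[float], tol: float = 1e-6) -> bool:
    """Return True if any triple (f_i, f_j, f_k) breaks one of the rules in Eq. (2)."""
    for f_i, f_j, f_k in permutations(frequencies, 3):
        if (abs(f_i - (f_j + f_k) / 2) < tol
                or abs(f_i - (2 * f_j - f_k)) < tol
                or abs(f_i - (2 * f_k - f_j)) < tol):
            return True
    return False


# 10 Hz is the mean of 8 Hz and 12 Hz, so this set is not allowed.
bad_set = violates_restrictions([8.0, 10.0, 12.0])
# No frequency here is the mean of two others.
good_set = violates_restrictions([6.0, 7.5, 10.0])
```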

Fig. 1. (a) One of the circles containing seven of the considered stimulation frequencies is displayed on the left. Each tested frequency was represented by the same number of segments spread randomly across the circle. The randomly distributed segments representing one specific frequency are shown on the right side. (b) In phase 3 the BCI user had to focus on each of the four determined target frequencies. The recorded data were then analyzed to determine frequency-specific classification thresholds. (Color figure online)

The third circle contained the seven highest ranked frequencies from the first two recordings. Finally, the top four frequencies from the third recording were selected as optimal target frequencies. However, if the highest ranked frequency was more than 20% stronger than the second highest, it was filtered out, because an overly strong SSVEP response to a particular frequency could cause classification errors.
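One possible reading of this selection step is sketched below. Two points are assumptions rather than facts from the text: "more than 20% stronger" is interpreted as the top averaged probability exceeding 1.2 times the second one, and after filtering the next four ranked frequencies are used. The function name and the example values are hypothetical.

```python
def select_targets(mean_probabilities: dict[float, float],
                   n_targets: int = 4,
                   dominance_ratio: float = 1.2) -> list[float]:
    """Pick target frequencies from averaged probabilities (frequency -> value)."""
    ranked = sorted(mean_probabilities, key=lambda f: mean_probabilities[f],
                    reverse=True)
    if (len(ranked) >= 2
            and mean_probabilities[ranked[0]]
            > dominance_ratio * mean_probabilities[ranked[1]]):
        ranked = ranked[1:]  # drop the overly dominant frequency
    return ranked[:n_targets]


dominant = {6.67: 0.40, 7.5: 0.20, 8.0: 0.15, 10.0: 0.10, 12.0: 0.08, 15.0: 0.07}
balanced = {6.67: 0.22, 7.5: 0.20, 8.0: 0.18, 10.0: 0.16, 12.0: 0.13, 15.0: 0.11}
targets_dominant = select_targets(dominant)
targets_balanced = select_targets(balanced)
```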

In order to find optimal thresholds, each of the four determined stimulation frequencies was presented as a white box on the screen (see Fig. 1b). The boxes contained the numbers 1, 2, 3, and 4. Initially, the box containing the number 1 had a red frame, while the frames of the remaining boxes were white. An audio message instructed the user to focus on the box highlighted by the red frame. Each box flickered for 10 s while EEG data were recorded. The flickering then stopped for a two-second break so that further recordings would not be influenced by the SSVEP responses from the previous one. Then the second box was highlighted and EEG data were recorded again. This procedure was repeated until data for all four frequencies were collected.

The classification thresholds were then determined as follows. The classification outputs of the recorded data were analyzed with different threshold sets for each frequency. For each target frequency, the resulting distributions of correct and false classifications were compared, and the threshold was chosen on the basis of this comparison. For further details regarding this procedure, please refer to [9].

2.5 Dictionary Driven SSVEP-Based Three-Step Speller

The presented Dictionary driven SSVEP-based Three-step Speller resembles previously developed GUI layouts [8, 14, 23] and allows selection of single letters (spelling mode) as well as complete words (dictionary mode). In each mode, four frequencies were presented as flickering boxes (175\(\,\times \,\)175 pixels) on the monitor. The size of the boxes varied during the experiment as described in [20]. The output of the Dictionary driven SSVEP-based Three-step Speller, the spelled text, was displayed at the bottom of the screen.

Fig. 2. Graphical user interface of the dictionary driven Three-step speller during the online experiment. First, the participant selected the letter “H” (a–c). After the character sequence “HEL” had been selected, the participant entered the dictionary mode (d), and selected the word “HELLO” (e, f). In total, twelve correct commands were necessary to select the desired word. (Color figure online)

Spelling Mode. To select a character in the spelling mode, three steps were necessary. Initially, a matrix of nine boxes, each containing three characters (26 letters plus the command space), was presented (see Fig. 2). The frames of the boxes were colored differently for each row, with each color corresponding to one frequency: green (“A B C”, “D E F”, “G H I”), red (“J K L”, “M N O”, “P Q R”), and blue (“S T U”, “V W X”, “Y Z _”), respectively. An additional tenth box with a yellow frame, containing the command “Dict/Del” (delete the last spelled character or switch to dictionary mode), was located on the left side of the screen. After the first selection, the boxes of the selected row were highlighted with individual colors (green, red, and blue), while the other rows were grayed out.

To enhance user friendliness, an animation (in the form of a slow rearrangement of the boxes containing the selected single letters) was presented during the gaze shifting period between the 2nd and 3rd step of the letter selection in spelling mode, while the remaining boxes were faded out (Fig. 2b and c). Next, four boxes were presented, three each representing a single letter and one for the command “back”. The purpose of the animation was to show the user from which box the single letters originated. The animation was intended to ensure that the user did not have to search for the desired letter, as he/she witnessed the position change. Based on our previous experience, this should reduce the number of wrong selections in the 3rd step. During the first two steps, no gaze shifting was necessary, as only the frequencies (and frame colors) changed, but not the position of the target letter. For example, if the user wanted to select the letter “H”, initially the target letter was contained in a green-framed box (first row), then in the blue-framed box, and finally in the red-framed box (see Fig. 2a–c). The role of the yellow-framed box changed depending on the current step of the selection phase. In the 1st step of the spelling mode the user could enter the dictionary mode, see Fig. 2(a) box “Dict/Del”, where he/she could delete the last selected letter or word. In the 2nd and 3rd step of the spelling mode the yellow-framed box contained the command “back”, which gave the user the opportunity to go to the previous step. In order to increase user friendliness, every command classification was followed by audio feedback naming the selected command or the letter spelled.
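The three-step letter selection can be modeled as three successive choices among the colored targets. In the sketch below, the indices 0, 1, and 2 stand for the green, red, and blue targets; this color-to-index mapping for steps 2 and 3 is inferred from the “H” example above and is an assumption, as are all names. The yellow target (“Dict/Del” or “back”) is not modeled.

```python
LAYOUT = [
    ["ABC", "DEF", "GHI"],  # step 1: green row
    ["JKL", "MNO", "PQR"],  # step 1: red row
    ["STU", "VWX", "YZ_"],  # step 1: blue row
]
COLORS = ["green", "red", "blue"]


def decode_letter(row: int, box: int, position: int) -> str:
    """Map three successive selections to the chosen character."""
    return LAYOUT[row][box][position]


def commands_for(letter: str) -> tuple[int, int, int]:
    """Return the three selections (row, box, position) that spell a character."""
    for r, row in enumerate(LAYOUT):
        for b, box in enumerate(row):
            p = box.find(letter)
            if p != -1:
                return r, b, p
    raise ValueError(f"character not in layout: {letter!r}")


steps_for_h = [COLORS[i] for i in commands_for("H")]
```

With this layout, “H” is reached via green, then blue, then red, which matches the sequence described for Fig. 2(a–c).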

Dictionary Mode. The dictionary mode was used to select from a list of six suggested words, which were positioned above the “Dict/Del” button, see Fig. 2(a and d). The presented suggestions were entries from a dictionary containing 39 000 words, ordered by word frequency. This dictionary was derived from a list of the most frequently used words from spoken English (https://en.wiktionary.org/wiki/Wiktionary:Frequency_lists; accessed on 4th November 2015). Initially, the first six entries of the dictionary were displayed as suggestions. After a single letter was chosen in the spelling mode, the list of displayed suggestions was updated to contain the first six words from the dictionary list starting with the selected letter or sequence. If the desired word was displayed as one of the suggestions, the user could switch to the dictionary mode and select the whole word directly, or continue to spell the word letter by letter. For example, after entering “HEL” the user was able to choose a word from the suggestion list in three steps. First, the “Dict/Del” button had to be selected to enter the dictionary mode. As only four commands were available, the suggestion list was split into two lists of three words. In a second step, one of those lists could be selected by gazing at a “select” button located beside each list. After selection, the three words from the selected list were displayed in separate boxes and the desired word could be selected directly. Figure 2(d–f) shows the three steps necessary to choose the word “HELLO” after the character chain “HEL” was already selected. When a word was selected from the suggestion list, the system automatically added a space at the end of the word.
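The suggestion mechanism amounts to a prefix filter over a frequency-ordered word list, followed by a split into two groups of three. The sketch below uses a tiny made-up word list; its content and order are hypothetical and do not reflect the actual 39 000-word dictionary, and the function names are assumptions.

```python
from typing import Sequence


def suggest(words_by_frequency: Sequence[str], prefix: str,
            limit: int = 6) -> list[str]:
    """First `limit` words (most frequent first) that start with `prefix`."""
    matches = [w for w in words_by_frequency if w.startswith(prefix)]
    return matches[:limit]


def split_for_selection(suggested: Sequence[str],
                        group_size: int = 3) -> list[list[str]]:
    """Split the suggestions into the lists shown next to the "select" buttons."""
    return [list(suggested[i:i + group_size])
            for i in range(0, len(suggested), group_size)]


# Hypothetical frequency-ordered word list, most frequent first.
toy_dictionary = ["THE", "HE", "HELLO", "HELP", "HELD", "HELL",
                  "HELPED", "HELPING", "HELMET"]
suggested = suggest(toy_dictionary, "HEL")
groups = split_for_selection(suggested)
```

An empty prefix returns the first six entries of the list, which corresponds to the initial suggestions shown before any letter is spelled.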

2.6 Experimental Setup

After signing the consent form, each participant completed a brief questionnaire. Thereafter, the participants were prepared for the EEG recording. To get familiar with the application, participants completed a brief test run in which they spelled the word “BCI” and a short phrase. Next, each participant used the GUI to spell a randomly selected sentence from a list of 80 sentences taken from English conversations between two or more people in real-life scenarios. Each spelling phase ended automatically when the sentence was spelled correctly. Spelling errors were corrected via the implemented delete button. The entire session took on average about 40 min for each participant.

Table 1. Results for 41 participants.

3 Results

Table 1 summarizes the results for all participants. Provided are the time needed to complete the task, the command accuracy, and the information transfer rate (ITR). It was expected beforehand that the dictionary support would increase the overall system performance.

BCI performance was evaluated by calculating the commonly used ITR in bits/min (see e.g. [24]):

$$\begin{aligned} B=\log _2 N+P\log _2 P+(1-P)\log _2\left[ \frac{1-P}{N-1}\right] . \end{aligned}$$
(3)

In the formula above, B represents the number of bits per trial. The overall number of possible choices was four (\(N = 4\)), and the accuracy P was calculated based on the number of correct command classifications \(\widetilde{C_{n}}\) divided by the total number of classified commands \({C_{n}}\). To obtain ITR in bits per minute, B is multiplied by the number of command classifications per minute.
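Eq. (3) and the conversion to bits per minute can be written out as a short Python sketch. The function names are assumptions; the terms that would evaluate to 0 · log 0 are treated as zero, which is the usual convention.

```python
import math


def bits_per_trial(n_classes: int, accuracy: float) -> float:
    """Bits per command classification according to Eq. (3)."""
    if n_classes < 2 or not 0.0 <= accuracy <= 1.0:
        raise ValueError("need n_classes >= 2 and 0 <= accuracy <= 1")
    bits = math.log2(n_classes)
    if accuracy > 0.0:
        bits += accuracy * math.log2(accuracy)
    if accuracy < 1.0:
        bits += (1.0 - accuracy) * math.log2((1.0 - accuracy) / (n_classes - 1))
    return bits


def itr_bits_per_min(n_classes: int, accuracy: float,
                     commands_per_min: float) -> float:
    """ITR in bits/min: bits per trial times classifications per minute."""
    return bits_per_trial(n_classes, accuracy) * commands_per_min
```

For \(N = 4\), a perfect accuracy yields 2 bits per command, while chance-level accuracy (\(P = 0.25\)) yields 0 bits.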

Fig. 3. Frequency power estimations over all participants for the considered frequencies. Probabilities based on the recorded data from the wizard were calculated with the Minimum Energy Combination algorithm.

Further, \(\widetilde{C_{n}^{*}}\) denotes the number of commands needed to spell the phrase without the implemented dictionary (\(\widetilde{C_{n}^{*}}\) is three times the sentence length, as each character requires three commands). Almost half (18 out of 41) of the participants reached an accuracy of 100% and the rest scored above 93%. The length of the spelling task varied only slightly, from 27 to 37 characters.
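The relation between the two command counts can be made concrete with a small sketch. The function names are hypothetical, and counting the automatically appended space as part of the spelled text in the “HELLO” example from Fig. 2 is an assumption.

```python
COMMANDS_PER_CHARACTER = 3  # three selection steps per character in spelling mode


def commands_without_dictionary(text: str) -> int:
    """Commands needed to spell the text letter by letter, without errors."""
    return COMMANDS_PER_CHARACTER * len(text)


def command_ratio(text: str, correct_commands_used: int) -> float:
    """Ratio of letter-by-letter commands to the commands actually needed."""
    return commands_without_dictionary(text) / correct_commands_used


# "HELLO" plus trailing space: 18 commands letter by letter,
# 12 commands via "HEL" and the dictionary (see Fig. 2).
ratio_hello = command_ratio("HELLO_", 12)
```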

Figure 3 shows the distribution of the probabilities of the considered frequencies, averaged over all participants, after the last recording in phase 2 of the wizard. The recorded data were analyzed with the Minimum Energy Combination algorithm to find the frequencies with the highest probabilities.

4 Discussion and Conclusion

Though the introduced interface was more complex due to the implementation of the dictionary, neither a drop in accuracy nor a drop in speed was observed in comparison to our previous experiments [8, 9]. The achieved mean accuracy of 97.92% and the ITR of 23.84 bits/min are comparable to the results from our previous field study, where a similar user interface and similar algorithms were tested (97.02% accuracy and an ITR of 21.58 bits/min were achieved, see [9]).

The dictionary driven speller was implemented as a four-class BCI system, as those systems allow the majority of users to gain control over the system [9, 11, 19]. Indeed, the accuracies achieved, as well as the fact that all participants were able to control the system, support the view that a low number of simultaneously displayed targets might decrease the number of wrong selections, as discussed e.g. in [9]. A further advantage of BCIs with a low number of classes is that they seem to be less stressful for the user (see e.g. [8]). A common drawback of these systems is that, because of the size of the alphabet, several steps are necessary to choose a desired target. The time needed to solve the tasks is usually quite large compared to typical input devices or multi-target BCIs (e.g. [12]). However, the accuracy and speed of the system can be increased through mechanisms like the implemented dictionary or other language-based models (see e.g. the review [17]). Had the previously developed three-step spelling application (without dictionary [8]) been used instead, the number of commands needed to complete the spelling tasks would have increased on average by a factor of 1.92 (see \(\widetilde{C_{n}^{*}}/\widetilde{C_{n}}\) in Table 1).

This factor varied between participants; its maximal value was 2.50. Communication was sped up for almost all participants; however, one participant purposely chose not to use the dictionary (\(\widetilde{C_{n}^{*}}/\widetilde{C_{n}}=1.00\)). Nevertheless, through the implemented dictionary, the effort to spell typical everyday sentences was reduced substantially for almost all participants. Choosing an appropriate dictionary for everyday use that suits the needs of different users is still a challenging task [4].

Communication interfaces developed for other commonly used BCI approaches yielded similar differences between dictionary driven and conventional spelling. For example, Akram et al. integrated a word suggestion mechanism into a conventional P300-based speller in order to reduce typing time [2]. With the conventional speller, an average word typing time of 2.9 min was achieved. In contrast, with the word suggestion mechanism the average time was reduced to 1.66 min, which is 1.74 times as fast. D’Albis et al. proposed a motor imagery based spelling device adopting natural language processing [6]. The spelling speed with the proposed interface was on average 2.1 times faster (6.15 min compared to 12.93 min with the standard approach).

The automatic calibration procedure integrated in the spelling interface presented here allowed non-experts to set up a functioning SSVEP-BCI with user-specific parameters. The experiment was carried out by student assistants with little to no experience in BCI setup. All necessary parameters were determined automatically. After the program was started, no further adjustments by the experimenters were necessary.

The wizard determined a set of target frequencies by comparing SSVEP responses. As shown in Fig. 3, the lower stimulation frequencies evoked the highest SSVEP responses, which is in line with previous observations [18]. However, it is known that low frequencies cause more fatigue [25]. A further disadvantage of those frequencies is that they overlap with the alpha band (8–13 Hz). If the participant closed the eyes a little too long, false classifications might occur. In order to avoid this problem, the wizard checked the considered frequencies for interference with a participant’s alpha wave.

In order to increase user-friendliness of the calibration procedure, we plan to integrate the necessary data recordings for calibration in an online copy spelling task. The dictionary could also be improved. For example, the structure of the already written part of a sentence could also be considered (e.g. through language based spelling correction). Another approach could be the implementation of the detection of error-related potentials that could help the user to correct errors easily. Future work should take those ideas into consideration.