1 Introduction

Humans use computers through interfaces such as the keyboard, mouse, touch screen, digital camera or data glove (Fig. 10.1). These interfaces have one thing in common: they require physical movement by the user, which may not be possible for physically "locked-in" [76] patients. A Brain Computer Interface (BCI) is a device through which a person uses his brain to control a machine, which can be a computer, wheelchair, robot, or an assistive or alternative communication device. BCI is a promising technology that provides direct communication between the brain and a computer, conveying messages from one's thoughts to the external world without using any of the appendages. It provides an individual a non-muscular [87] way to communicate and control his surroundings. Each time we perform a task, or think of performing one, our brain generates distinct signals, and the signals corresponding to each activity have a pattern. Exploring and identifying this pattern is challenging and forms the crux of any BCI task. A BCI picks up signals from the brain of a user in the form of electroencephalography (EEG). Feature extraction and classification translate these signals into meaningful commands to drive the device. Owing to its tremendous potential, BCI attracts substantial investment and research activity from around the world, facilitating and accelerating development. BCI has a wide range of applications across a variety of fields, both medical and non-medical. At the outset, we review the principles and practical applications of BCI related to speech communication, including "locked-in" patients, synthetic telepathy, cognitive biometrics and silent speech communication.

Fig. 10.1 Conventional human computer interfaces

Thousands of severely disabled people are unable to communicate due to paralysis, locked-in syndrome (LIS) or other neurological disorders. Reinstating communication with these patients is a major challenge, and BCI offers a means of expression to people deprived of speech. LIS is a condition in which the patient is awake and conscious, but "locked in" an immobile body. Voluntary motor paralysis prevents the subject from communicating by way of words or body movements: the subject wishes to speak or move, as he is able to perceive his environment, but the "locked-in" state prevents it. The inability to communicate with others is distressing. Recent advances in computer-based communication technology and BCI have enabled these people to communicate, control their surroundings and access the internet. This has improved patients' quality of life and helped them live with dignity.

Several BCI techniques have evolved over the past decade to restore communication to persons with severe paralysis. These assistive devices range from simple binary (yes/no) communication devices and speller devices to virtual keyboards and imagined speech communication, to name a few. Birbaumer et al. [5] and Perelmouter et al. [64] developed a speller device that lets a "locked-in" person compose letters. The BCI makes binary tree structured decisions, dividing the alphabet into successive halves until the desired letter is selected, as sketched below. A similar kind of speller is described by Wolpaw et al. [86], where the alphabet is iteratively divided into fourths instead of halves. Donchin et al. [20] developed a method based on the P300 component of event-related potentials: the rows and columns of a two-dimensional alphabet grid are illuminated in sequence, allowing the user to select the desired letter. A 2-D cursor navigation scheme for selecting letters from a WiViK virtual keyboard for "locked-in" subjects is suggested by Kennedy et al. [38].
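The successive-halving selection can be made concrete with a short sketch. The Python fragment below is illustrative only, not the published implementation: the `choose_first_half` callback stands in for the classified binary (yes/no) brain response that a real speller would decode from the user's EEG.

```python
import string

def select_letter(alphabet, choose_first_half):
    """Narrow the alphabet by halves until a single letter remains.

    `choose_first_half(subset)` models the user's binary brain
    response: True keeps the first half, False keeps the second.
    """
    letters = list(alphabet)
    while len(letters) > 1:
        mid = (len(letters) + 1) // 2
        first, second = letters[:mid], letters[mid:]
        letters = first if choose_first_half(first) else second
    return letters[0]

if __name__ == "__main__":
    target = "K"
    # Simulated user: answers "yes" whenever the shown half contains the target.
    decision = lambda half: target in half
    print(select_letter(string.ascii_uppercase, decision))  # -> K
```

Selecting one of 26 letters this way takes about log2(26), i.e. five, binary decisions per letter, which is why the letter rate of such spellers is inherently low.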

At its most basic level, communication for "locked-in" patients involves a simple yes-no scheme [24] based on eye movements. One eye blink means "yes" and two blinks mean "no"; other patients may look up for "yes" and down for "no". For more detailed communication, alphabet boards may be used, with the letters arranged in order of frequency of use, or in blocks or a grid. An assistant goes through the letters one by one until the patient blinks to choose a letter. A laser pointer controlled by head movement can be used for faster communication. Special infrared sensors that react to eye movement can also be employed: the patient moves the cursor or selects a letter by staring at it, then clicks on it by blinking. This technology serves as a rehabilitation measure for patients suffering from classical and incomplete LIS, as it exploits the residual movements these patients can still control. But such indirect communication systems have a few major disadvantages. Though precise, their letter choice rate can be as slow as one word/min, limiting the user's fluency. Moreover, these indirect methods do not improve the patient's behavioral abnormalities and do not address their psychological condition [74], nor do they relieve the constraints on the patient's speech communication capabilities.

In an effort to handle the aforementioned problems and make BCI speech production more natural and fluent, direct methods are being developed. Figure 10.2 summarizes the direct and indirect methods of speech communication. The direct method involves capturing the brain signals of the intended speech, processing the signals to predict the speech, and synthesizing the speech output in real time. The direct method of speech communication in BCI has extensive applications, both medical and general. In this context, Suppes et al. [78, 79] used EEG and MEG signals to characterize speech imagery of words and sentences. DaSalla et al. [17] developed a BCI for vowel speech imagery using EEG. Brumberg et al. [11, 12, 15] and Guenther et al. [27, 28] have developed speech BCIs using EEG and ECoG, in which the decoded signals from imagined speech drive a speech synthesizer. Leuthardt et al. [48–50] have shown that ECoG activity is associated with different overt and imagined phoneme articulations, enabling invasively monitored human patients to control one-dimensional computer cursors rapidly and accurately. Extensive work is being carried out on this topic by numerous research groups.

Fig. 10.2 Communication strategies in LIS patients

2 Evidence of Speech Communication in Locked-in Patients

In this chapter, we introduce a few examples of patients affected by LIS and their achievements in spite of severe impairment. Stephen Hawking, one of the most brilliant theoretical physicists in history and the author of "A Brief History of Time", has had a motor neuron disorder since early adulthood. Hawking communicates by selecting words from a series of menus on a screen, pressing a switch in his hand or operating it by head or eye motion. The chosen words are stored and sent to a speech synthesizer, enabling him to communicate up to 15 words a minute [77]. He has written books and numerous scientific documents, and delivered many scientific and popular talks, using this device. A victim of LIS, Jean-Dominique Bauby, penned the book "The Diving Bell and the Butterfly" (later an award-winning movie of the same name). Through the book he showed the world that disability need not hold one back from achievement [3]. Bauby communicated by blinking his left eyelid to choose letters from an alphabet board. He founded the Association of Locked-In Syndrome (ALIS) with the intent to aid patients suffering from LIS and their families. The French-based ALIS registered 367 LIS-affected patients [43] between 1997 and 2004, a database that serves as the foundation for research performed on this patient population. Julia Tavalaro, a wheelchair-bound quadriplegic, was believed to be in a vegetative state; she could move only her head and eyes, but her senses were intact [44]. Tavalaro trained her residual movements to use a computer and eventually penned her own memoir, "Look Up for Yes". Philippe Vigand, another victim of LIS, used an infrared camera which enabled him to "speak" and "write" by blinking his eyes [25]. This magic camera helped Philippe write his memoir, entitled "Only the Eyes Say Yes", in which he described the progression of his illness and demonstrated his willingness to face new challenges. Another poignant testimony of LIS comes from Judy Mozersky, who lost all bodily motion except that of her eyes. Through the aid of assistive computer technology, she has been able to continue her studies at Cornell. Her memoir, "Locked-In: A Young Woman's Battle with Stroke" [55], has been published by the National Stroke Association.

These people, with courage and hope, have rebuilt their lives despite their supposedly insurmountable conditions. This underscores the need to recognize and appreciate the views and feelings surging behind the quiet and stillness of those who are "locked-in". It is documented that the inability to communicate is alarming, and more devastating than the inability to move. As a result, rehabilitation strategies for patients with LIS have focused on finding ways to aid communication using whatever means are available to a particular patient. Clinicians believe that in the majority of cases, improved communication improves patients' quality of life and allows them to be more involved with family and society. Austrian researchers have categorized LIS into 3 subtypes, shown in Fig. 10.3 [4]: (a) classical LIS, in which conscious patients are completely immobile except for eye movement and blinking; (b) incomplete LIS, in which minimal residual movement is preserved in parts of the body besides the eyes; and (c) total LIS, in which patients are conscious but unable to move any muscle. Rehabilitation is available for classical LIS patients and a few incomplete LIS patients, but is not viable for total LIS patients.

Fig. 10.3 Classification of locked-in syndrome (LIS)

3 Supplementary Target BCI Applications for Speech Communication

BCI for speech communication is prominently perceived as an alternative augmentative communication (AAC) device for severely disabled people. As illustrated in Fig. 10.4, it also finds application in a variety of non-medical fields. Research on synthetic telepathy is being carried out by the U.S. Army, with the intention of allowing its soldiers to communicate [19] just by thinking. In 2008, the U.S. Army awarded a $4 million contract to a team of scientists from three American universities: the University of California, Irvine (led by Mike D'Zmura), Carnegie Mellon University, and the University of Maryland. The aim is to build a thought-helmet, a device that can read and broadcast the unspoken speech of soldiers, enabling silent communication among them. The thought-helmet extracts the brain signals of the soldier who wishes to communicate silently, interprets the signals as speech, and conveys them to a radio speaker or an earpiece worn by other soldiers. The developers [9, 11, 22] are working towards decoding the brain signals associated with speech. To begin with, a decoded message specifying the soldier's position and distance from the recipient is to be delivered by a synthetic voice made to sound like the soldier's own. One team, directed by Schalk, is pursuing the invasive electrocorticography (ECoG) approach. The second group, headed by Mike D'Zmura, is using EEG, a noninvasive brain-scanning technique better suited to an actual thought-helmet.

Fig. 10.4 Applications of BCI in speech communication

Silent speech communication is one of the most interesting future technologies, enabling speech communication without the sounds created during vocalization. A silent speech interface [18] allows people to communicate with each other using whispered or even soundless speech. Furthermore, a voice-disabled individual can use his tongue and mouth movements, and the silent speech interface technology [14] produces the voice on his behalf, facilitating communication with others. This technology is useful to NASA astronauts who need to communicate where voices cannot be heard over surrounding noise. In preliminary experiments, NASA scientists found that small, button-sized sensors fixed under the jawbone and on either side of the throat [56] could collect the signals. The signals are then sent to a processor, and a computer program translates them into words. These subvocal speech systems can be used in space suits, in noisy environments and in airport towers to capture air-traffic controllers' [56] commands. They might also be used by a person who has lost his voice permanently. A person using a subvocal speech system [21] thinks of phrases and talks to himself in silence; the tongue and vocal cords still receive speech signals from the brain, and these biological signals are tapped and fed to the speech system for further analysis. Chuck Jorgensen and his team [36, 37] are developing silent speech recognition at NASA's Ames Research Center. They developed special software trained to recognize six words and 10 digits repeated sub-vocally, achieving a word recognition rate of 92 %. The speech system was trained to learn the words "stop", "go", "left", "right", "alpha" and "omega", and the digits 0–9. With these sub-vocalized words, the software performed simple searches on the internet and controlled a web browser program.

Sub-vocalized or imagined speech can be used as a new feature for biometrics [72], as opposed to traditional methods. This new class of biometrics based on cognitive aspects of human behavior, called cognitive biometrics, is a novel approach to user authentication. Using the brain state of an individual as the authentication mechanism increases robustness and enables cross-validation when combined with traditional biometric methods. Biometric approaches based on the biological features of humans [42, 52, 62, 63, 69–71] have distinct advantages over traditional methods: a cognitive biometric cannot be hacked, stolen or transferred from one person to another, as it is unique to each person.
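As a purely illustrative sketch of the template-matching idea behind such authentication, the fragment below compares an enrolled EEG feature vector against a probe using cosine similarity. The feature dimensionality, the similarity measure and the 0.95 threshold are assumptions for illustration, not a published scheme.

```python
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def authenticate(enrolled_template, probe_features, threshold=0.95):
    """Accept the user if the probe feature vector matches the template."""
    return cosine_similarity(enrolled_template, probe_features) >= threshold

rng = np.random.default_rng(0)
template = rng.normal(size=64)            # features from enrollment sessions
genuine = template + 0.1 * rng.normal(size=64)   # same user, new recording
impostor = rng.normal(size=64)                   # unrelated user
print(authenticate(template, genuine))    # expected: True
print(authenticate(template, impostor))   # expected: False
```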

Speech communication has an extensive scope in various domains of applications. But the challenges in processing the EEG signals are significant. The EEG signals are extremely complex and prone to internal and external interference. Advancement in sensor technology, data acquisition techniques and robust signal processing algorithms may lead to efficient usage of speech communication in diverse applications and may overcome the challenges posed.

4 BCI Design Principles

BCI research is a comparatively young and multidisciplinary [53] field, integrating researchers from engineering, neuroscience, psychology, physiology and other healthcare fields. BCIs use brain signals to control and communicate with a computer or other external device. Hence the ability to acquire the brain signals and render the information into usable electrical signals is important. Existing BCIs can be classified based on their sensory systems and on the control signals they use.

4.1 Types of BCI

The neuroimaging techniques used in BCI can be broadly classified into invasive and non-invasive methods. Invasive BCIs involve implanting electrodes inside the brain, while non-invasive ones include haptic controllers and EEG scanners. The basic purpose of these devices is to capture the electrical signals in the brain and translate them into signals sensed by external devices. Invasive modalities require implanting microelectrode arrays inside the skull, within the brain, which demands expert surgeons with high-precision skills. The problem with such devices is that scar tissue forms over them as a reaction to the foreign matter, reducing efficacy and increasing the health risk to the patient. Though they possess the best signal-to-noise ratio, the complex surgical procedure, which leaves a permanent hole in the skull, is rarely worth the risk; however, multiple degrees of freedom can be achieved only through invasive approaches. Partially invasive BCIs are implanted inside the skull but over the brain, spreading electrode arrays on the brain's surface. The signal is stronger than in non-invasive recordings, and the problem of scar tissue formation is eliminated; examples are ECoG and intracranial EEG (iEEG). Non-invasive BCIs are the most widely used neuroimaging methods, dealing with brainwaves that are dampened by passing through the skull yet still carry extractable, specific information. The EEG is the most widely used non-invasive technique and the most studied in recent times. Other non-invasive methods considered for capturing brain signals include magnetoencephalography (MEG), functional magnetic resonance imaging (fMRI) and near-infrared spectrum imaging (NIRS). The invasive and non-invasive methods are summarized in Table 10.1.

Table 10.1 Summary of neuroimaging techniques

According to the nature of the input signals used, BCI systems can be classified as exogenous or endogenous. An exogenous BCI uses an external stimulus, such as a sound or picture, to elicit the brain activity, while an endogenous BCI is based on self-regulation [59] of brain rhythms and potentials without external stimuli. Table 10.2 summarizes the differences between exogenous and endogenous BCIs.

Table 10.2 Main differences between exogenous and endogenous BCI

BCI systems are also classified, based on how input data are processed, as synchronous or asynchronous. Synchronous BCIs [68] analyze brain signals during a pre-defined time window; the user is expected to send commands during this specific period, and any signal outside the window is ignored. An asynchronous BCI analyzes the brain signals continuously, irrespective of when the user issues commands, and is therefore more natural than a synchronous BCI, but also computationally heavier and more complex. Table 10.3 summarizes the differences between the two.

Table 10.3 Major differences between synchronous and asynchronous BCIs

4.2 EEG Based BCI

A common method for designing a BCI [61] is to use EEG signals extracted during mental tasks. The EEG is the most widely used neuroimaging method, owing to its high temporal resolution, comparatively low cost, portability, and low risk to users. The EEG records the brain's electrical activity along the scalp, produced by the firing of neurons within the brain. However, the signals are of low resolution [59], as they travel through the scalp, skull, and many other layers; the signal strength reaching the electrodes is attenuated to the order of microvolts and is very sensitive to noise. Noise is a key factor [2] in EEG signals: it reduces the signal-to-noise ratio and the ability to extract meaningful information from the recorded signals. The noise may be due to other current fields in the brain or to external sources. The EEG signal is measured as the potential difference, over time, between an active electrode and a reference electrode. The international 10–20 system, accessed from Brain Master Technologies Inc. [10], is shown in Fig. 10.5. Multichannel EEG sets contain up to 128 or 256 active electrodes, typically made of silver chloride (AgCl). A gel creates a conductive path between the skin and the electrode for the flow of current. Electrodes that do not use gels, called 'dry' electrodes, are made of materials such as titanium and stainless steel.

Fig. 10.5 A standard 10–20 international electrode placement system
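Because scalp potentials are on the order of microvolts and easily contaminated, recordings are conventionally re-referenced and filtered before analysis. The sketch below shows one common SciPy-based approach; the 250 Hz sampling rate, 50 Hz mains frequency and filter orders are assumptions for illustration.

```python
import numpy as np
from scipy.signal import butter, filtfilt, iirnotch

fs = 250.0                         # assumed sampling rate in Hz
t = np.arange(0, 2.0, 1.0 / fs)
# Synthetic single-channel EEG: 10 Hz alpha activity plus 50 Hz mains hum.
raw = 20e-6 * np.sin(2 * np.pi * 10 * t) + 10e-6 * np.sin(2 * np.pi * 50 * t)

# Remove mains interference with a narrow notch filter at 50 Hz.
b_notch, a_notch = iirnotch(w0=50.0, Q=30.0, fs=fs)
clean = filtfilt(b_notch, a_notch, raw)

# Keep the 1-40 Hz range that carries most EEG activity of interest.
b_band, a_band = butter(4, [1.0, 40.0], btype="bandpass", fs=fs)
clean = filtfilt(b_band, a_band, clean)

# For multichannel data (channels x samples), a common average reference
# subtracts the instantaneous mean across electrodes from every channel:
#   data_car = data - data.mean(axis=0, keepdims=True)
```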

EEG signals consist of a set of frequency bands. These frequency bands are identified as delta (δ), theta (θ), alpha (α), beta (β), and gamma (γ). Relevant characteristics of these bands are mentioned in Table 10.4.

Table 10.4 Frequency bands in the brain signal
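To make the band decomposition concrete, the following sketch integrates a Welch power spectral density estimate over each classical band. The band edges are approximate, since exact limits vary across the literature, and the 250 Hz sampling rate is an assumption.

```python
import numpy as np
from scipy.signal import welch

BANDS = {  # approximate edges in Hz; exact limits vary in the literature
    "delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 13),
    "beta": (13, 30), "gamma": (30, 45),
}

def band_powers(signal, fs):
    """Integrate the Welch PSD estimate over each classical EEG band."""
    freqs, psd = welch(signal, fs=fs, nperseg=int(2 * fs))
    df = freqs[1] - freqs[0]
    return {name: psd[(freqs >= lo) & (freqs < hi)].sum() * df
            for name, (lo, hi) in BANDS.items()}

fs = 250.0
t = np.arange(0, 10, 1 / fs)
# A signal dominated by 10 Hz activity should score highest in alpha.
alpha_heavy = np.sin(2 * np.pi * 10 * t) + 0.2 * np.random.randn(t.size)
print(band_powers(alpha_heavy, fs))
```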

The EEG can be modulated [29] by motor imagery, which has been used successfully by patients with severe motor impairments (e.g., late-stage amyotrophic lateral sclerosis) to help them communicate with their environment. Despite the exceptional convenience of EEG-based BCI applications, the need for brain signals of higher resolution has limited progress in restoring motor function.

4.3 Control Signals Used in BCI for Speech Communication

The physiological phenomena of brain signals can be tapped, decoded and modulated to control a BCI system; such signals are regarded as control signals in BCIs. The control signals employed in current BCI systems are classified as visual evoked potentials (VEP), slow cortical potentials (SCP), P300 evoked potentials, and sensorimotor rhythms (SMR). Wang et al. [83] have listed the control signals with their main features (see Table 10.5).

Table 10.5 Summary of control signals [83]

EEG records the electrical activity arising from neurons in the cerebral cortex using scalp electrodes. The brain's electrical activity may be spontaneous or evoked by a specific external or internal stimulus or event. Responses to stimuli are termed event-related potentials (ERPs). Event-related potentials are time-locked to physical stimuli and help capture neural activity related to sensory and cognitive processes; they can be elicited by a wide variety of sensory, cognitive or motor events. The EEG activity reflects the summed activity [65] of postsynaptic potentials: a measurable electrical potential is produced when thousands or millions of neurons fire in tandem. ERPs are categorized as exogenous or endogenous. ERPs occurring within the first 100 ms after stimulus onset [31, 80] are termed sensory or exogenous, as they depend on the physical parameters of the stimulus; exogenous ERPs are obligatory responses to the presentation of a physical stimulus, such as its visual or auditory character or its intensity. In contrast, ERPs generated with latencies from 100 ms up to several seconds are termed cognitive or endogenous. Endogenous ERPs reveal the manner in which the subject evaluates the stimulus and depend on the behavioral and psychological processes of the event. ERPs are characterized by their latency and amplitude relative to stimulus onset; those with latencies from 500 ms to around 10 s are categorized as slow cortical potentials (SCP). The EEG signals are extremely complex and prone to noise, so to separate ERPs from the background, the signals are time-locked and averaged across many trials, improving the signal-to-noise ratio.
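The time-locked averaging just described can be sketched in a few lines. Assuming stimulus onsets are known as sample indices, averaging N epochs attenuates the non-time-locked background EEG roughly by a factor of sqrt(N) while the stimulus-locked ERP survives; the epoch window below is an illustrative choice.

```python
import numpy as np

def average_erp(eeg, onsets, fs, tmin=-0.1, tmax=0.6):
    """Cut epochs around each stimulus onset and average them.

    eeg    : 1-D array of one channel's samples
    onsets : sample indices of stimulus onsets
    Returns the trial-averaged waveform from tmin to tmax seconds
    around onset; background activity not locked to the stimulus
    averages out, raising the effective signal-to-noise ratio.
    """
    pre, post = int(-tmin * fs), int(tmax * fs)
    epochs = [eeg[s - pre:s + post] for s in onsets
              if s - pre >= 0 and s + post <= eeg.size]
    return np.mean(epochs, axis=0)
```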

4.3.1 Visually Evoked Potentials

The VEP is an evoked potential elicited when users view a flickering stimulus at frequencies in the range 3.5–75 Hz. The brain generates electrical activity at the same frequency as the visual stimulus, or at multiples of it. Spectral analysis of EEG over the visual areas (i.e. the occipital lobe) reveals the frequency components, which can then be mapped to device commands. These modulations are easy to detect, since the amplitude of VEPs increases a great deal [82] as the stimulus is moved closer to the central visual field. This control signal needs very little training; its drawback, however, is that the user has to keep his eyes fixed on one point, without random movements.

Lee et al. [45–47] presented subjects (participants of the experiment) with a 5 × 5 matrix of flashing stimuli, containing digits, characters, and symbols, displayed on an LCD screen. The cells of the matrix flicker in a random sequence, and participants gaze at the cell containing the digit or character they want to select. The potentials over the occipital cortex are measured, and the matrix cell that elicited the largest signal amplitude is taken as the target cell the participant wanted to select. Successful communication with a high information transfer rate is achieved as a consequence, and the evoked potential serves as an efficient and reliable tool for disabled people to communicate with their external environment. A BCI based on VEP requires that the user be able to control gaze direction precisely.
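A minimal sketch of the frequency-mapping idea follows: each candidate flicker frequency is scored by the spectral power at that frequency and its second harmonic (since, as noted above, responses appear at the stimulus frequency or its multiples), and the highest-scoring frequency identifies the attended cell. The recording length, tolerance and candidate frequencies are assumptions.

```python
import numpy as np

def detect_flicker_target(occipital, fs, candidate_freqs):
    """Return the candidate flicker frequency with the most power.

    Scores each frequency by summed spectral power at the fundamental
    and its second harmonic, then picks the maximum.
    """
    spectrum = np.abs(np.fft.rfft(occipital)) ** 2
    freqs = np.fft.rfftfreq(occipital.size, d=1.0 / fs)

    def power_near(f, tol=0.3):
        return spectrum[np.abs(freqs - f) < tol].sum()

    scores = [power_near(f) + power_near(2 * f) for f in candidate_freqs]
    return candidate_freqs[int(np.argmax(scores))]

fs = 250.0
t = np.arange(0, 4, 1 / fs)          # 4 s of simulated occipital EEG
signal = np.sin(2 * np.pi * 12 * t) + 0.5 * np.random.randn(t.size)
print(detect_flicker_target(signal, fs, [8.0, 10.0, 12.0, 15.0]))  # -> 12.0
```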

4.3.2 Slow Cortical Potentials

Slow cortical potentials [34] are potential shifts of the cerebral cortex in the frequency range below 1–2 Hz and may persist over several seconds. One of the first communication devices in BCI, the Thought Translation Device (TTD) developed by Birbaumer and his colleagues [5–7, 32, 33], uses SCPs to control the movement of an object on a computer screen. The TTD supports completely paralyzed patients with basic communication ability. Patients are trained to self-regulate their SCPs voluntarily to navigate a binary-tree spelling device: in each selection, the choice is between selecting or not selecting a set of one or more letters [87] until a single letter is chosen, with a back-up or erase option as well. Self-regulation of SCPs is critical, as the information rate provided by SCP-based BCIs is rather low. For instance, Lutzenberger et al. [51] and Rockstroh et al. [73] trained patients to self-regulate their SCP by providing feedback and positive reinforcement of correct responses. Continuous practice and extensive training are required to use an SCP-based BCI.

4.3.3 The P300 Event-Related Potential

A P300 wave is an endogenous event-related potential component [35] elicited by infrequent auditory, visual, or somatosensory stimuli. The signal is characterized by an increase in time-series amplitude approximately 300 ms after stimulus onset. The increase in amplitude is most prominent at the parietal and occipital electrodes, although it is observed at several other locations on the scalp. P300 was suggested by Farwell and Donchin [20, 23, 75] for operating a letter speller BCI and has more recently been investigated by other research groups [26, 40, 60]. It is seen in response to the oddball paradigm, in which a rare, irregularly occurring target stimulus is presented within a series of standard stimuli. For example, if a subject is viewing a random series of names presented every 3 s, and occasionally one of these is the subject's own name, a P300 wave is generated in response to this rarely presented, recognized, meaningful stimulus. The P300 is larger for less probable events [20].

The speller device consists of a matrix of letters, numbers and symbols whose rows and columns are highlighted in sequence. To select a letter, the user focuses attention on the cell containing the target letter; when the row or column containing that letter flashes, a P300 component of the ERP is elicited. The BCI detects the character by determining the row and column that produced a P300 response, and the corresponding character is printed on the screen. P300-based BCIs need no initial user training; however, performance may degrade as the user becomes accustomed to the infrequent stimulus [16] and the P300 amplitude decreases. A common form of P300-based spelling BCI uses a 6 × 6 matrix holding the 26 letters of the alphabet and the numbers 0–9. In every trial, each row and column is illuminated once for a period of 100–175 ms, totaling 12 events, two containing the target item and ten containing non-target items, characterizing an oddball paradigm. The presentation sequence is repeated several times per selection and the signals are averaged to improve the P300 signal-to-noise ratio for reliable detection.
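One simple way to realize the row/column decision is to average the post-flash epochs for each row or column index and score the mean amplitude in a window around 300 ms. The sketch below is illustrative: the 250–450 ms window is an assumed choice, and a practical speller would typically use a trained classifier rather than raw averaged amplitude.

```python
import numpy as np

def p300_select(epochs, labels, fs, window=(0.25, 0.45)):
    """Choose the row/column index whose flashes evoke the largest P300.

    epochs : (n_flashes, n_samples) array of post-flash EEG segments
    labels : row/column index flashed on each trial
    Each index is scored by its trial-averaged amplitude inside the
    window (in seconds after flash onset) where the P300 peaks.
    """
    labels = np.asarray(labels)
    lo, hi = int(window[0] * fs), int(window[1] * fs)
    scores = {}
    for idx in np.unique(labels):
        avg = epochs[labels == idx].mean(axis=0)   # average the repeats
        scores[idx] = avg[lo:hi].mean()
    return max(scores, key=scores.get)

# In a 6 x 6 speller, running this once over row flashes and once over
# column flashes yields the (row, column) pair identifying the letter.
```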

4.3.4 Sensorimotor Rhythms

Sensorimotor rhythms comprise the µ (8–12 Hz) and β (18–25 Hz) rhythms, localized over the primary sensory and motor cortical areas. A decrease in µ and β rhythms, associated with movement or preparation for movement, is labeled event-related desynchronization (ERD); an increase, associated with relaxation, is labeled event-related synchronization (ERS) [39]. These rhythm changes also occur with motor imagery, i.e. imagining the movement, and with cognitive tasks.

This technique, developed by Wolpaw et al. [85], is used to control one- and two-dimensional cursor movements on a computer screen [41, 81, 88]. People, including those with LIS, have learned to control µ and β amplitudes in the absence of movement or sensation: increased µ rhythm amplitude [54] moves the cursor towards the top target, and decreased µ rhythm amplitude moves it towards the bottom target. Pfurtscheller and colleagues at Graz University of Technology [57, 58, 66, 67] have developed a BCI for two-state classification using a mental imagery strategy, in which different motor imagery tasks, such as imagining left-hand, right-hand, or foot movement, elicit brain activity in the sensorimotor areas in response to a visual cue.
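ERD/ERS is conventionally quantified by comparing band power during the event with a pre-event reference interval, as ERD% = (P_reference - P_activity) / P_reference x 100. A minimal sketch follows, with an assumed trial layout (reference in the first second, imagery in the third) and an 8–12 Hz µ band.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def erd_percent(trial, fs, ref=(0.0, 1.0), act=(2.0, 3.0), band=(8.0, 12.0)):
    """Event-related desynchronization of the mu band, in percent.

    ERD% = (P_ref - P_act) / P_ref * 100. Positive values mean band
    power dropped during the activity window, as expected for motor
    imagery recorded over sensorimotor cortex.
    """
    b, a = butter(4, band, btype="bandpass", fs=fs)
    power = filtfilt(b, a, trial) ** 2          # instantaneous band power
    p_ref = power[int(ref[0] * fs):int(ref[1] * fs)].mean()
    p_act = power[int(act[0] * fs):int(act[1] * fs)].mean()
    return (p_ref - p_act) / p_ref * 100.0
```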

4.3.5 Intracranial Method—ECoG

As mentioned in earlier sections, speech communication for severely paralyzed people can be achieved using EEG or ECoG. The ECoG method requires implantation of microelectrodes into the outer layers of the human cortex. Kennedy et al. [38] described an invasive method to drive a BCI, in which a patient affected by amyotrophic lateral sclerosis (ALS) learned to control a cursor to produce synthetic speech and type. Brumberg and colleagues have developed a BCI that allows an individual with "locked-in" syndrome to control an artificial speech synthesizer during imagined speech [12, 27, 28]; the neural signal recorded from an electrode implanted in the speech motor cortex of a human volunteer drives the synthesizer. Leuthardt et al. [48–50] demonstrated that ECoG activity recorded from the surface of the brain enables users to control a one-dimensional computer cursor rapidly and accurately. The ECoG signals associated with different types of motor and speech imagery have been identified and used to control two-dimensional joystick movements.

Though speech prostheses for paralyzed individuals can be achieved using cortical surface electrodes (e.g. ECoG) or intra-cortical microelectrodes, EEG remains the preferred technology [84] because of its excellent temporal resolution, non-invasive character, portability and reasonably low price. However, due to volume conduction through the scalp, skull, and other layers of the brain, the spatial resolution of EEG signals diminishes and needs to be improved.

Though EEG is endowed with high temporal resolution, its poor spatial resolution makes extensive user training almost essential; ECoG, on the other hand, carries significant clinical risks that limit its usage.

5 Challenges and Future Research Directions for Speech Communication BCI

Several crucial issues need to be handled to facilitate expanded use of BCI technology in speech communication. Most existing techniques use fMRI to study the language and speech areas of the brain, as it has good spatial resolution; but fMRI has limited temporal resolution, whereas the high temporal resolution of EEG and ECoG has the potential to reveal the functional relation between the language and speech areas of the brain. Neuroscience research [8] has shown that imagined speech activates the frontal cortex as well as Broca's and Wernicke's areas, and the changes in neural activity in the language areas of the brain need to be understood clearly. EEG signals are usually recorded in a high-dimensional space, and the size of the data makes classification computationally intensive. To address this issue, competitive dimension reduction techniques [1] and spatial filters need to be identified. An important attribute of spatial filtering [2] is that it reduces the number of channels on the scalp while retaining all the information needed for classification; electrodes that do not contribute to the activity of interest may be discarded, reducing the number of electrodes considerably.
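Common spatial patterns (CSP) is one widely used spatial filtering technique in EEG-based BCI, and it illustrates the channel-reduction idea above: it derives a handful of spatial projections that maximize the variance ratio between two classes of trials. The sketch below is a minimal illustration, not a reference implementation; regularization and proper cross-validation are omitted.

```python
import numpy as np
from scipy.linalg import eigh

def csp_filters(class_a, class_b, n_filters=4):
    """Common spatial patterns for two classes of multichannel EEG trials.

    class_a, class_b : lists of (n_channels, n_samples) trial arrays
    Returns an (n_filters, n_channels) projection matrix; applying it
    reduces many scalp channels to a few discriminative components.
    """
    def mean_cov(trials):
        return np.mean([np.cov(t) for t in trials], axis=0)

    ca, cb = mean_cov(class_a), mean_cov(class_b)
    # Generalized eigendecomposition of the two class covariances.
    vals, vecs = eigh(ca, ca + cb)
    order = np.argsort(vals)
    # Take filters from both ends of the spectrum (most discriminative).
    picks = np.concatenate([order[:n_filters // 2], order[-n_filters // 2:]])
    return vecs[:, picks].T
```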

Another key issue is whether it is practically useful to the actual target population. The patient population in need of BCI has severe neurological diseases [13] causing extensive changes in EEG patterns and the power spectrum. Due to their continuous degenerative state, a decrease in spectral power is possible, which induces classification errors.

To achieve high success rates and stability in BCI, evoked responses among the event-related potentials (e.g. P300, VEP) are often relied upon. However, it may not be possible to provide an external stimulus in every situation. In the patient population, sensory perception is often impaired or degrades continuously, ruling out reliance on external stimuli, so the ability to self-regulate brain signals becomes critical. For other applications, such as silent speech communication and cognitive biometrics, usage occurs outside the lab environment, rendering the supply of external stimuli infeasible; here too, self-regulation of endogenous signals (e.g. SCP) is required. This, however, implies extensive training for the user in order to produce the same signals every time.

A combination of advances in sensor technology, data acquisition systems, standard methods and metrics for evaluation and reliable algorithms can propel the use of BCI for speech communication to diverse directions.

6 Conclusion

Research and development in BCI for speech communication have attracted great attention and investigation from many research groups across varied realms of interest. Though the primary goal of BCI technology is to restore communication in the severely paralyzed population, speech communication BCI has expanded into silent speech communication, synthetic telepathy and cognitive biometrics.

The most common BCI applications use EEG for recording neural activity. The EEG-based speller devices are controlled either by evoked potentials (VEP, P300, SMR) or by self-regulation of SCP and motor imagery, for selection of letters from a visual display or a binary speller device. These EEG studies confirm that it is feasible to use non-invasive neurophysiological methods to control spelling devices. Though such indirect methods enable speech communication, they do not operate at rates fast enough for conversational or near-conversational speech, and the slow production may cause disabled users to withdraw from social interactions in frustration. To overcome the drawbacks of EEG-based speller devices, intracranial electrodes (ECoG) are used for signal acquisition; ECoG boasts an improved signal-to-noise ratio (SNR). Still, the risk of neurosurgery, the cost involved and the ethical issues make invasive methods impractical except for users who are severely disabled.

In recent times, researchers have been investigating the feasibility of direct speech production from different neurological signals, for more natural and fluent speech. The direct method involves capturing the brain signals of the intended speech or speech imagery, processing the signals to predict the speech, and synthesizing the speech output in real time. The direct method of speech communication in BCI has an extensive scope of medical and general applications, and extensive work is being carried out in this field by several research groups. To promote the feasibility of BCI for speech imagery, we must take into account the psychological factors and the advances in EEG pattern recognition techniques. With advancing technology, faster and more accurate communication may be achieved with EEG-based BCI systems for direct speech production.