1 Introduction

Many neurological diseases, including brainstem stroke, spinal cord injury, multiple sclerosis, cerebral palsy, muscular dystrophies, and amyotrophic lateral sclerosis (ALS), can impair a person's natural communication pathways [1]. Individuals severely affected by these disorders may partially or completely lose voluntary muscle control, leaving them unable to interact with their surroundings through any conventional form of communication. To address this issue, researchers have been developing a variety of assistive technologies, among which the brain-computer interface (BCI) is the most widely investigated.

In a BCI system, specific patterns of brain activity are translated into control commands for operating particular devices [2]. Mind-controlled wheelchairs [3], home appliances [4], prosthetic arm control [5], spelling systems [6], emotion detection systems [7], and biometrics [8] are popular BCI applications [9]. BCI applications have also widened from medical to non-medical fields, for example, BCI-based games and virtual reality [9]. Both non-invasive and invasive brain activity recording modalities are contributing progressively to neuroscience research as well as to brain-computer interfacing [10]. Two invasive strategies are employed in BCI research: electrocorticography (ECoG) and intracortical neuron recording. The broadly used non-invasive modalities are electroencephalography (EEG), magnetoencephalography (MEG), near-infrared spectroscopy (NIRS), and functional magnetic resonance imaging (fMRI). The majority of BCI studies have utilized brain waves recorded with EEG, which offers several advantages: low-cost data acquisition devices, ease of mobility, and non-invasive recording. However, the signal-to-noise ratio (SNR) of EEG is not always satisfactory, and in certain cases the EEG analysis algorithm limits both the classification accuracy and the data transfer rate.

Brain activity arising from changes in facial expression can be used, either on its own or combined with EEG, for BCI applications. Raheel et al. [11] classified five facial expressions from brain activity, including smile, wink, looking up, and looking down, captured through a 14-channel Emotiv EPOC EEG headset. The authors extracted thirteen statistical features and classified them using K-NN, Naive Bayes, SVM, and a multi-layer perceptron; the highest classification accuracy, 81.60%, was achieved with K-NN, which is relatively poor. Ma et al. [12] proposed a hybrid approach consisting of electrooculogram (EOG) and event-related potential (ERP) signals for robot control. Wink, eye blink, frown, and gaze were detected from the EOG signals, whereas P300, N170, and VPP were evoked from the ERP. Double and triple blinks were used to stop the robot and move it forward, respectively, looking left and looking right moved the robot in the corresponding direction, and a frown was used to stop the robot and enter ERP mode. Reyes et al. [13] investigated the possibility of using brain activity generated by facial gestures and eye movements for controlling assistive devices.

In this study, brain activity corresponding to three facial expressions, namely left wink, right wink, and no wink, has been classified. The brain waves associated with the winks were captured using the five-channel Emotiv Insight EEG headset, which has also been employed in other studies [14,15,16,17,18,19]. The remainder of this paper is organized as follows: Sect. 2 describes the methodology, Sect. 3 presents the results and discussion, and Sect. 4 concludes the paper.

2 Methodology

The human brain is the control unit of the whole body; hence, any change in facial expression is reflected in brain activity. In the current study, the brain waves due to left wink, right wink, and no wink have been analyzed. The brain waves were captured from five subjects using the Emotiv Insight headset. Two different features were extracted from the collected dataset, and the extracted features were classified using machine learning techniques. The complete methodology of the proposed approach is discussed in this section.

2.1 Experimental design for data acquisition

The complete data collection experiment was organized into seven steps, as shown in Fig. 1. The first step was subject selection. Brain activity was recorded from a total of five subjects, three male and two female, aged 21–28 years. All subjects were students familiar with BCI technology. Each subject was asked to sleep for 6 h on the night before data collection and to abstain from coffee, alcohol, and smoking during the preceding 12 h. All subjects signed a consent form before data collection.

Fig. 1 Complete steps of experimental design

The experiment was conducted at the Applied Electronics and Communication Engineering (AppECE) lab, Faculty of Electrical and Electronic Engineering, University Malaysia Pahang, Malaysia. The temperature at the data acquisition site was 16–20 ℃ and the background noise level was 25–30 dB. Data were collected on two different days; on each day, four trials of each class (left wink, right wink, and no wink) were recorded from the five subjects. All subjects were informed of their 30-min time slots before the experiment day. Table 1 shows detailed information regarding subjects, dates, time slots, and trials.

Table 1 Experiment details of data acquisition

The total number of observations from the five subjects is 120, of which 90 observations were used for training and the remaining 30 for testing; hence, the training and testing portions are 75% and 25%, respectively (a scripted sketch of this split is given at the end of this subsection). A five-channel Emotiv Insight EEG headset was used to capture brain activity, with the five electrodes, namely AF3, AF4, T7, T8, and Pz, placed according to the 10–20 electrode positioning system. The device can capture EEG data reflecting mental commands, attention, stress, meditation, and focus; moreover, some facial expressions, including blink, wink, surprise, clench, and smile, can be detected accurately [20]. The sampling rate of the device is 128 Hz, and it transfers data to a computer or other embedded system wirelessly, giving users freedom of mobility. Figure 2 shows the experimental setup of the Emotiv Insight. Emotiv Pro software was used to record and store the brain activity; it can display real-time raw EEG, data packet acquisition and loss, motion data, performance metrics (0.1 Hz), and contact quality [21], as well as real-time FFT and band power graphs. The Emotiv Insight was first paired with the computer via Bluetooth, and the connection was then completed through the Emotiv Pro software. The contact between the skin and the electrodes can be improved using conductive gel.
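
As noted above, the 75%/25% hold-out split can be reproduced in MATLAB with cvpartition. The following is a minimal sketch, assuming the 120 observations are stored as rows of a feature matrix X with a corresponding label vector y (both hypothetical names):

```matlab
% Minimal sketch: 75%/25% hold-out split of the 120 observations.
% X (120 x nFeatures) and y (120 x 1 class labels) are assumed to exist.
rng(1);                                  % fix the random seed for repeatability
c = cvpartition(y, 'HoldOut', 0.25);     % stratified 25% test partition
Xtrain = X(training(c), :);  ytrain = y(training(c));
Xtest  = X(test(c), :);      ytest  = y(test(c));
```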

Fig. 2 Data acquisition device and software set-up

A PowerPoint slide was prepared, and the data were captured according to it. The slide lasted 10 s, as shown in Fig. 3. The first 4 s presented a blank white screen to stabilize the subject's mental activity, a blue screen was displayed from 4 to 5 s, and from 5 to 10 s the slide cued either left wink, right wink, or no wink. During this 5–10 s interval, subjects were asked to wink according to the display; hence, the effective duration of each trial was 5 s. Before the final dataset was collected, all subjects practiced with the PowerPoint slide. Figure 4 shows the raw data of a trial for each of left wink, right wink, and no wink.

Fig. 3 Data acquisition protocol

Fig. 4 Raw data of five channels for left wink, right wink and no wink

2.2 Data analysis framework

Once collected, the data must be analyzed; Fig. 5 shows the complete flow chart of the data analysis. The raw data were stored as .csv files, each containing 10 s of data. In the pre-processing phase, the data for the first 5 s were removed and the raw data of the five electrodes were separated, so that each trial contains 640 samples per channel. Features in terms of the fast Fourier transform (FFT) and the sample range were then extracted. Finally, the extracted features were fed to three machine learning algorithms, namely LDA, SVM, and K-NN.
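
The pre-processing step can be illustrated with a minimal MATLAB sketch; the file name, column layout, and variable names below are assumptions for illustration, since the actual Emotiv Pro export format may differ:

```matlab
% Minimal preprocessing sketch (assumed .csv layout: one column per channel).
fs  = 128;                          % Emotiv Insight sampling rate (Hz)
raw = readmatrix('trial01.csv');    % hypothetical file: 10 s x 5 channels
eeg = raw(5*fs+1 : 10*fs, :);       % keep the 5-10 s window -> 640 x 5 samples
```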

Fig. 5 Complete flow chart of data analysis

2.2.1 Fast Fourier transform

Fourier analysis converts signals from the time domain to the frequency domain. The fast Fourier transform (FFT) algorithm computes the discrete Fourier transform (DFT) of a sequence; by factorizing the DFT matrix into a product of sparse factors, it produces the same result as a direct DFT computation but much faster. The N-point DFT of the sequence \( \{x_n,\; n = 0, 1, 2, \ldots, N-1\} \) is defined by Eq. (1) [22].

$$ X_{k} = \sum_{n=0}^{N-1} x_{n}\, e^{-\frac{i 2\pi k n}{N}} $$
(1)

where \( X_{k} \) denotes the FFT coefficients, N is the total number of input EEG samples, n is the sample index, and k = 0, 1, 2, …, N − 1. In this study, the mean absolute value of the FFT coefficients has been taken as the feature.
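
As a minimal sketch (assuming the 640 × 5 eeg matrix from the preprocessing step above), the mean absolute value of the FFT coefficients can be computed per channel as follows:

```matlab
% FFT feature: mean absolute FFT coefficient of each channel.
X          = fft(eeg);           % column-wise N-point DFT, N = 640
fftFeature = mean(abs(X));       % 1 x 5 feature vector (one value per channel)
```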

2.2.2 Sample range

Another feature, namely the sample range, has also been employed in this study. The sample range is the difference between the maximum and minimum sample of each channel; mathematically, it is expressed by Eq. (2):

$$ \text{Sample range} = X_{max}(k) - X_{min}(k) $$
(2)

where \( X_{max}(k) \) and \( X_{min}(k) \) are the maximum and minimum samples of the raw data, respectively, and k denotes the channel (electrode).
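
Under the same assumptions, the sample range feature reduces to a one-line computation per trial:

```matlab
% Sample range feature: maximum minus minimum sample of each channel.
rangeFeature = max(eeg) - min(eeg);   % 1 x 5 feature vector
```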

2.2.3 Linear discriminant analysis

LDA is employed to find linear combinations of feature vectors that describe the characteristics of the corresponding signal. It uses hyperplanes to separate two or more classes: the separating hyperplane is obtained by searching for the projection that maximizes the distance between the class means and minimizes the intraclass variance [23, 24]. This technique has very low computational requirements, is simple to use, and has been successfully applied in a variety of BCI systems.
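
A minimal MATLAB sketch of this classifier, reusing the hypothetical Xtrain/ytrain matrices and labels introduced earlier:

```matlab
% Minimal LDA sketch (Statistics and Machine Learning Toolbox).
ldaModel = fitcdiscr(Xtrain, ytrain);   % linear discriminant by default
ldaPred  = predict(ldaModel, Xtest);    % predicted class labels for test data
```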

2.2.4 K-nearest neighbor

The K-NN algorithm relies on the principle that features belonging to different classes form individual clusters in the feature space, and features close to each other are recognized as neighbors. To classify a test feature vector, the classifier considers the k metric distances between the test sample's features and those of the nearest training samples. In a K-NN architecture, the number of neighbors and the type of distance metric are key factors [24]. In this study, the Euclidean distance metric was used to build the K-NN model; different values of K were evaluated, and K = 2 provided the best accuracy.
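
A corresponding sketch with the settings stated above (K = 2, Euclidean distance):

```matlab
% Minimal K-NN sketch matching the text: K = 2, Euclidean distance metric.
knnModel = fitcknn(Xtrain, ytrain, ...
                   'NumNeighbors', 2, 'Distance', 'euclidean');
knnPred  = predict(knnModel, Xtest);
```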

2.2.5 Support vector machine

The concept of SVM is to find the hyperplane in the feature space that separates the data points most distinctly, i.e., the hyperplane for which the margin to each adjacent class is maximal [25]; such a hyperplane contributes to higher classification accuracy. The type of kernel function and the regularization parameter C play a crucial role in the structure of an SVM. The widely employed kernel functions are the radial basis function (RBF), linear, polynomial, and sigmoid kernels [26]. In this study, the Gaussian RBF kernel has been employed, with the C and gamma parameters left at the defaults of the Classification Learner app.
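
Because MATLAB's fitcsvm handles only binary problems, a three-class RBF-kernel SVM is typically built through fitcecoc; the sketch below is one plausible scripted equivalent of the Classification Learner configuration described above:

```matlab
% Minimal multiclass SVM sketch with a Gaussian (RBF) kernel.
svmTemplate = templateSVM('KernelFunction', 'gaussian');   % default C and gamma
svmModel    = fitcecoc(Xtrain, ytrain, 'Learners', svmTemplate);
svmPred     = predict(svmModel, Xtest);
```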

To validate the training model, five-fold cross-validation has been employed. The complete data analysis was carried out in the MATLAB environment; to classify the extracted features, we used the Classification Learner app, a built-in MATLAB tool.
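
The same validation can be scripted; a sketch, shown here for the LDA model from above (the other models are validated identically):

```matlab
% Five-fold cross-validation of the trained model (LDA as an example).
cvModel  = crossval(ldaModel, 'KFold', 5);   % re-partitions the training data
trainAcc = 1 - kfoldLoss(cvModel);           % cross-validated training accuracy
```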

2.3 Performance evaluation

The performance of the proposed method has been analyzed in terms of the confusion matrix, classification accuracy, true positive rate (TPR), false negative rate (FNR), and the area under the curve (AUC) of the receiver operating characteristic (ROC). The classification accuracy (CA) of the proposed method is calculated by Eq. (3) [27].

$$ CA = \frac{TP + TN}{TP + FN + TN + FP} \times 100\% $$
(3)

where TP = true positive, FN = false negative, TN = true negative and FP = false positive.
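
Both the confusion matrix and Eq. (3) follow directly from the predicted labels; for the three-class case, the accuracy generalizes to the trace of the confusion matrix divided by the total count, as in this sketch:

```matlab
% Confusion matrix and overall classification accuracy, per Eq. (3).
C  = confusionmat(ytest, ldaPred);    % 3 x 3 confusion matrix
CA = 100 * sum(diag(C)) / sum(C(:));  % correctly classified / total, in percent
```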

3 Results and discussion

In this section, the performance of the classifiers is evaluated. Before training the classification models, the extracted features were labeled: left wink, right wink, and no wink were labeled 1, 2, and 3, respectively.

Figure 6 presents the confusion matrix (see Fig. 6a) and the TPR and FNR (see Fig. 6b) of LDA, SVM, and K-NN when the feature extraction technique is FFT. From the confusion matrix, 72 observations (out of 90) were recognized accurately by LDA and K-NN, whereas 70 observations were recognized by SVM. In the case of LDA, the TPR for classes 1, 2, and 3 is 90%, 90%, and 70%, respectively, the best performance compared with K-NN and SVM. According to Table 2, the classification accuracies of LDA, SVM, and K-NN are 83.3%, 75.6%, and 80%, respectively; hence, LDA shows the highest accuracy when the feature is extracted using FFT.

Fig. 6 All classifiers with FFT: a confusion matrix, b true positive rate (TPR) and false negative rate (FNR)

Table 2 Classification accuracy

Figure 7 presents the confusion matrix, TPR, and FNR of the employed classifiers when the feature is extracted by the sample range. The performance of this feature is excellent with all classifiers: a total of 89 observations (out of 90) were recognized by SVM and K-NN, whereas 88 observations were recognized accurately by LDA. In the case of SVM and K-NN, the TPR for classes 1, 2, and 3 is 100%, 100%, and 97%, respectively.

Fig. 7 All classifiers with sample range: a confusion matrix, b true positive rate (TPR) and false negative rate (FNR)

With the sample range feature, the training accuracy of SVM and K-NN is 98.9%, whereas LDA achieves 97.8%, as shown in Table 2. Once the models were trained, the test data were used to evaluate them; the testing accuracies of LDA, SVM, and K-NN with respect to both FFT and sample range are also given in Table 2. With FFT, the maximum testing accuracy was achieved by LDA, whereas with sample range both SVM and K-NN achieved 96.7%. Since FFT and sample range were employed separately for feature extraction, Table 2 makes it evident that both the training and testing accuracy with the sample range are significantly higher than with FFT.

In machine learning, classification performance can also be evaluated by the area under the curve (AUC) of the receiver operating characteristic (ROC). The ROC is essentially a probability curve, whereas the AUC quantifies the degree of separability; in short, this metric describes the model's capability of differentiating the classes. The AUC ranges from 0 to 1, and a larger value indicates that the model recognizes the classes more accurately. Figure 8 presents the classifier performance in terms of AUC-ROC when the feature was extracted using the sample range. Among the three classifiers, the AUC of SVM (see Fig. 8b) is 1 for all classes, indicating that the performance of SVM is superior to that of LDA and K-NN.
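
Per-class ROC curves and AUC values of this kind can be generated in MATLAB with perfcurve; below is a one-vs-rest sketch for class 1 (left wink), using the per-class scores returned by predict for the hypothetical SVM model above:

```matlab
% One-vs-rest ROC/AUC sketch for class 1 (left wink).
[~, scores] = predict(svmModel, Xtest);                % per-class scores
[fpr, tpr, ~, auc] = perfcurve(ytest, scores(:, 1), 1);
plot(fpr, tpr), xlabel('False positive rate'), ylabel('True positive rate')
title(sprintf('Class 1 ROC (AUC = %.2f)', auc))
```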

Fig. 8 Area under the curve (AUC) of the receiver operating characteristic (ROC): a LDA, b SVM and c K-NN

4 Conclusion

In this study, facial expressions in terms of left wink, right wink, and no wink have been classified. The wink data were recorded as brain activity using a five-channel Emotiv Insight EEG headset. Two features, namely FFT and sample range, were extracted and classified using SVM, K-NN, and LDA. The performance of the classifiers was evaluated using classification accuracy, the confusion matrix, AUC-ROC, TPR, and FNR. The sample range feature achieved better accuracy with all classifiers than FFT. The obtained classification accuracy indicates that wink-based facial expressions can be utilized to operate assistive technology. However, some issues remain to be overcome. Since the key objective of BCI technology is to assist physically challenged people, data should be captured from the targeted users; moreover, data should be collected from more subjects, with more trials per subject. Finally, the classifier results should be transformed into device commands, and the complete experiment should be implemented in real time.