Automated facial expression recognition using novel textural transformation

Facial expressions convey important information about our emotions and reveal our real intentions. In this study, a novel texture transformation method using graph structures is presented for facial expression recognition. The proposed method consists of five steps. First, the face image is segmented and resized. Then, the proposed graph-based texture transformation is used as the feature extractor: exemplar feature extraction is performed using the proposed deep graph texture transformation, and the extracted features are concatenated into a one-dimensional feature set. This feature set is subjected to maximum pooling and principal component analysis to reduce the number of features. The reduced features are fed to classifiers, and the highest classification accuracies obtained are 97.09% and 99.25% for the JAFFE and TFEID datasets respectively. Moreover, we have used the CK+ dataset to obtain comparison results, and our textural transformation based model yielded 100% classification accuracy on the CK+ dataset. The proposed method has the potential to be employed in security applications such as counter-terrorism, day care, residential security, ATM machines and voter verification.


Introduction
Facial expressions contain important information about the mood, mind, emotion and pain of people as a reaction of the brain to various situations (Farajzadeh and Hashemzadeh 2018; Yaddaden et al. 2018). Facial expression recognition (FER) is widely used in many fields, such as biometrics and digital forensics, to reveal human emotion from a face image. The performance of mood detection is improved by focusing on facial areas such as the eyes, mouth and nose using advanced image processing techniques. Humans have various expressions, such as neutral, anger, happiness, sadness, fear, and surprise (Owusu et al. 2014; Shan et al. 2009; Zhang et al. 1996). These expressions play a major role in people's communications. In terms of digital forensics, states such as excitement or nervousness carry important information about the psychology of guilt. For this purpose, many algorithms have been developed to enable human-machine interaction. The basic building blocks of these systems are face detection, facial local feature extraction, distinctive feature selection, and classification (Cohen et al. 2003; Fasel and Luettin 2003; Lien et al. 1998). The human brain reflects emotions through facial expressions, and recognition systems classify these expressions (Matsugu et al. 2003; Shan et al. 2009; Virrey et al. 2019). Many facial expression recognition methods have been proposed based on expression recognition (feelings, opinion, thought), non-verbal communication (gestures, facial expressions, signs), verbal communication (listener reactions), and psychological activities (pain and fatigue) (Moore and Bowden 2011; Zhao and Pietikainen 2007).

Image processing and pattern recognition methods have been widely used for FER. Farajzadeh and Hashemzadeh (2018) proposed a new face recognition model using five FER image datasets. The local binary pattern (LBP) and histogram of oriented gradients (HOG) techniques were used for feature extraction; they worked in two phases using the LBP and HOG methods, and the HOG based method achieved better results than the LBP based one. Revina and Sam Emmanuel (2018) presented a modified facial expression recognition method using the local directional number (LDN) pattern and the dominant gradient local ternary pattern (DGLTP); two FER image datasets were used to obtain numerical results. Li et al. (2018) proposed a novel end-to-end learnable local binary pattern network for face detection, reporting comparison results in terms of equal error rate (EER) and half total error rate (HTER). Turan and Lam (2018) proposed a structure with histogram-based local descriptors, reporting comparison results in terms of accuracy. Zhang and Hua (2015) analyzed driver fatigue recognition based on LBP and boosted LBP; they used a support vector machine (SVM) for classification and recognition rate as the performance measure. Ertugrul et al. (2018) proposed a model of kinship patterns for facial expressions, using two databases (smile and disgust) and accuracy rates for performance evaluation. Chao et al. (2015) presented a model using expression-specific local binary patterns and class-regularized locality preserving projection, evaluating the recognition rate on the JAFFE (Lyons et al. 1999) dataset. Guo et al. (2018) analyzed expression-dependent susceptibility to study face distortions and evaluate facial expressions, using accuracy, reaction time, expression intensity, fixation duration and fixation counts for performance evaluation.
Moore and Bowden (2011) proposed local Gabor binary patterns to study facial expressions, presenting experimental results on three FER image datasets. Savchenko (2021) used lightweight neural networks for FER, selecting the AFEW and VGAF datasets. Vo et al. (2020) proposed a pyramid with super-resolution (PSR) method for in-the-wild FER, using three datasets (FER+, RAF-DB and AffectNet) and reporting results with confusion matrices and accuracy. Revina and Emmanuel (2019) applied a multi-support vector neural network to FER images, using the Cohn-Kanade AU-Coded Expression and JAFFE databases and reporting accuracy, true positive rate and false positive rate. Abate et al. (2019) proposed a method for clustering facial attributes, utilizing principal component analysis and convolutional neural networks; the experiments were carried out on the CelebA dataset.
Facial expression recognition is a major problem in image processing and pattern recognition, and many methods have been presented in the literature to solve it. We present a texture based method to detect facial expressions automatically, as shown in Fig. 1. The proposed scheme is as follows:
- In the preprocessing phase, the face area of the image is segmented and resized.
- The segmented image is divided into pieces to implement the exemplar method.
- The proposed texture transformation is applied to each piece to extract the features.
The major contributions of this paper are given below.
• Most of the image descriptors available in the literature, such as the local binary pattern (LBP) (Ojala et al. 2002), the local quadruple pattern (LQPAT) (Chakraborty et al.), the descriptor of Ding et al. (2016), and the local ternary pattern (LTP) (Tan and Triggs 2010), use a single pattern applied to each block for feature extraction. Therefore, these descriptors cannot achieve successful results with a single pattern on problems like FER that involve heterogeneous datasets. To solve this problem, a novel graph based texture transform is proposed. This transform uses 15 patterns across five levels for feature extraction by utilizing graph based structures.
• A novel cognitive and lightweight deep feature extraction network is proposed in this paper. The method is cognitive because no optimization or weight-updating methods are used, and it is fast because it involves no complex calculations.
• A novel hybrid feature reduction method is proposed, in which maximum pooling and PCA are utilized together.

The proposed graph based texture transformation
Image descriptors have been widely used in face recognition and FER methods (Ahonen et al. 2006; Hernández et al. 2007; Kabir et al. 2010; Rivera et al. 2013; Sadeghi and Raie 2019). A novel graph based texture descriptor employing variable patterns is presented here; it extracts features across five levels, with basic geometric shapes utilized as the patterns. The proposed transformation can be used with variable block sizes; in this study, overlapping blocks of size 3 × 3, 5 × 5 and 7 × 7 are utilized. The transformation is illustrated below using a sample 3 × 3 block, whose mathematical description is given in Eq. (1).
where Im i,j is the pixel value of the image at position (i, j). The proposed graph based transformation is explained using a 3 × 3 sample block; the patterns for the 3 × 3, 5 × 5 and 7 × 7 block sizes are shown in the figures using arrows. The signum function is used as the binary feature extraction function, and its mathematical description is given in Eq. (2).
where Sig(., .) represents the signum function, a and b are input parameters of the signum function.
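As a concrete sketch, the block layout of Eq. (1) and the binary comparison of Eq. (2) can be written as follows. The inclusive (≥) comparison convention is an assumption here; Eq. (2) fixes the exact definition used by the authors.

```python
import numpy as np

def sig(a, b):
    # Signum-style binary comparison per Eq. (2).
    # The >= (inclusive) convention is an assumption of this sketch.
    return 1 if a >= b else 0

def blocks_3x3(im):
    # Yield every overlapping 3x3 block of a grayscale image,
    # following the Im[i, j] indexing of Eq. (1).
    h, w = im.shape
    for i in range(h - 2):
        for j in range(w - 2):
            yield im[i:i + 3, j:j + 3]
```

The same sliding-window generator extends directly to the 5 × 5 and 7 × 7 block sizes by changing the window extent.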

Level 1
In this paper, a graph-based texture transformation of five levels is implemented. In each level, variable shapes are used to create graphs and these graphs are utilized as patterns.
In the first level, two graphs are utilized, and 8-bit binary feature values are extracted from each graph. Finally, 16-bit binary features are extracted in level 1.
To obtain the numerical results of the example, a sample block size of 3 × 3 is used as shown in Fig. 2.
The mathematical description and an example of level 1 are shown in Fig. 3. In level 1, the local binary pattern and a square-like graph are used.
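A minimal sketch of the level-1 extraction follows. The clockwise neighbor ordering for the LBP bits and the edge set of the square-like graph (comparing consecutive border pixels) are hypothetical stand-ins; the actual patterns are defined in Fig. 3.

```python
import numpy as np

def level1_bits(block):
    """Level-1 bits from a 3x3 block: 8 LBP bits (neighbors vs. center)
    plus 8 bits from a square-like graph, 16 bits in total.
    The square-graph edges used here are illustrative only."""
    c = block[1, 1]
    # Border pixels, clockwise from the top-left corner.
    ring = [block[0, 0], block[0, 1], block[0, 2], block[1, 2],
            block[2, 2], block[2, 1], block[2, 0], block[1, 0]]
    lbp = [1 if p >= c else 0 for p in ring]
    # Square-like graph: compare each border pixel with the next one.
    square = [1 if ring[k] >= ring[(k + 1) % 8] else 0 for k in range(8)]
    return lbp + square
```

The levels below follow the same recipe with different graphs (pentagons, triangles, lines), so each level contributes its own group of bits.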

Level 2
In this level, 4 pentagons are used for feature extraction. These pentagons are obtained by 90-degree rotations. In level 2, 20 binary feature bits are extracted. The mathematical description and an example of level 2 are shown in Fig. 4.

Level 3
8-bit binary data are extracted from level 3 which uses two shapes. The shapes used and mathematical expressions of level 3 are shown in Fig. 5.

Level 4
Triangle shapes are utilized as patterns in this level. In total, 12 bits are extracted; a description of this level is given in Fig. 6. To extract these bits, 4 triangles at 90-degree rotations are used.

Level 5
In this level, lines are used to extract the features. The line patterns are shown in Fig. 7. 3 + 3 + 2 bits are obtained by applying the signum function to the pixels at the ends of the lines; thus, 8 bits are obtained using the 8 lines.
Finally, 64 bits (16 + 20 + 8 + 12 + 8) are extracted from each block, and 8 feature images are constructed from these bits. The pseudo code of feature image creation is given in Algorithm 1. The 8 feature images are constructed from the resulting decimal values, and the histograms of these images are used as features.
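The feature image creation of Algorithm 1 can be sketched as follows: the 64 bits of each block are grouped into 8 bytes, each byte becomes a pixel of one of 8 feature images, and the 256-bin histograms of those images are concatenated into the feature vector. The row-major block order and MSB-first byte layout are assumptions of this sketch, not necessarily those of Algorithm 1.

```python
import numpy as np

def feature_images(bits_per_block, h, w):
    """Build 8 feature images from per-block 64-bit codes and return
    them together with the concatenated 8 x 256 = 2048 histogram features."""
    imgs = np.zeros((8, h, w), dtype=np.uint8)
    weights = 2 ** np.arange(7, -1, -1)  # MSB-first byte weights
    for idx, bits in enumerate(bits_per_block):
        r, c = divmod(idx, w)  # row-major block placement (assumed)
        for k in range(8):
            byte = bits[8 * k:8 * (k + 1)]
            imgs[k, r, c] = int(np.dot(byte, weights))
    # 256-bin histogram of each feature image, concatenated.
    feats = np.concatenate([np.bincount(im.ravel(), minlength=256)
                            for im in imgs])
    return imgs, feats
```

With 8 images and 256 histogram bins each, this yields the 2048 features per region referred to later in the pipeline.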

The texture transformation based facial expression recognition
In this section, the proposed exemplar facial expression recognition method is presented. The algorithm uses the novel graph based texture transformation for feature extraction and consists of preprocessing, feature extraction, feature concatenation, feature reduction and classification phases. During the preprocessing step, facial images are segmented to obtain the facial area. The segmented image is then resized and divided into 30 × 30 non-overlapping blocks. The proposed texture transformation is applied to each block for feature extraction. Maximum pooling and PCA are employed together for feature reduction: the 2048 × 12 = 24,576 features are reduced to 1024 features using 1D maximum pooling. The pseudo code of 1D maximum pooling is shown in Algorithm 2.
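A minimal sketch of the 1D maximum pooling step (Algorithm 2), assuming non-overlapping windows whose size is the ratio of input to output length (24,576 → 1024 gives a window of 24); the exact windowing in Algorithm 2 may differ.

```python
import numpy as np

def max_pool_1d(features, out_len=1024):
    """Non-overlapping 1D maximum pooling: keep the maximum of each
    consecutive window so that out_len values remain."""
    features = np.asarray(features, dtype=float)
    win = len(features) // out_len
    # Trim any remainder, reshape into (out_len, win) windows, take maxima.
    return features[:win * out_len].reshape(out_len, win).max(axis=1)
```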
In the last phase, LDA and SVM classifiers are used for classification. The steps of the proposed method are given in Table 1. 1024 and 128 features are used as inputs to the LDA and the quadratic kernel SVM respectively; to obtain the 128 features, PCA is applied to the pooled features.
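The PCA step that reduces the 1024 pooled features to 128 can be sketched from scratch with NumPy as below; centering on the training mean and projecting onto the top right-singular vectors is standard PCA, but the exact implementation used by the authors is not specified.

```python
import numpy as np

def pca_reduce(X, n_components=128):
    """Project row-vector samples X onto their top n_components
    principal components (illustrative PCA via SVD)."""
    Xc = X - X.mean(axis=0)  # center each feature
    # Right-singular vectors of the centered data are the principal axes.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T
```

In practice a library routine (e.g. an off-the-shelf PCA implementation) would be used, but the projection it computes is the one shown here.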
In this study, LDA and SVM are utilized as classifiers, and their attributes are listed in Table 2.

Datasets
In order to evaluate the recognition ability of the proposed texture transformation based exemplar FER method, the JAFFE (Lyons et al. 1999) and TFEID (Chen and Yen 2007) datasets are used. These two public datasets are widely used for FER. Sample images from these datasets are shown in Fig. 8, and their attributes are given below.

Experimental setup
In the experiments, the computer with specifications listed in Table 3 was used and simulations were done using Matlab2018a.
As can be seen from Table 3, the proposed graph-based model has been implemented on a modestly configured computer, since it is a lightweight image classification (FER) model.

Cases
The presented graph-based textural extractor is a parametric function: variable-sized overlapping blocks can be used for feature extraction. To clearly show the feature extraction ability of the presented graph-based function, 3 × 3, 5 × 5 and 7 × 7 overlapping blocks have been used. Moreover, we have considered LDA and SVM as classifiers. Hence, six cases have been created; these cases are defined in Table 4.

Performance evaluation
To measure the classification ability of these cases, the accuracy parameter is used. The mathematical formula of accuracy is given in Eq. (3):

Accuracy = (#true predicted images / #total images) × 100 (3)

Moreover, the calculated confusion matrices are given; from these, other commonly used performance evaluation parameters can be calculated.
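Since the other evaluation parameters can be derived from the confusion matrices, a short sketch of how accuracy per Eq. (3) follows from a confusion matrix may be useful; rows are taken as true classes and columns as predicted classes, the usual convention.

```python
import numpy as np

def accuracy_from_confusion(cm):
    """Accuracy per Eq. (3): correctly predicted images (the diagonal
    of the confusion matrix) over total images, times 100."""
    cm = np.asarray(cm)
    return 100.0 * np.trace(cm) / cm.sum()
```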

Validation
Tenfold cross-validation is used in the classification phase, and average success rates are reported. The performance obtained on the two databases (JAFFE and TFEID) is given below.
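The tenfold split can be sketched as follows: shuffle the sample indices, partition them into ten near-equal folds, and let each fold serve once as the test set while the rest train the classifier. The shuffling and seed are implementation assumptions, as the paper does not specify them.

```python
import numpy as np

def kfold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index arrays for k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    folds = np.array_split(idx, k)
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, test
```

The reported accuracy is then the mean of the per-fold accuracies over the ten iterations.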

Experimental results using JAFFE database
The cases used in the experiment are implemented on the JAFFE dataset to classify the seven universal facial expressions. The results of the cases are shown in Table 5, and the confusion matrices obtained for the different cases are shown in Fig. 9.

Experimental results using TFEID database
The cases used in the experiment are implemented on the TFEID dataset to classify the seven universal facial expressions. The results obtained are shown in Table 6, and Fig. 10 shows the confusion matrices obtained for the various cases.

Execution time and computational complexity analysis
In order to evaluate the performance of the proposed method clearly, execution time and computational complexity are examined. The average execution time of the proposed method for an image is listed in Table 7 for different block sizes. In Table 7, the listed time measures are the summation of preprocessing, feature extraction, feature concatenation and feature reduction times for an image. The space complexity of the proposed texture transformation is also calculated, and the computational complexity of the different cases is given in Table 8. It is calculated for 3 × 3, 5 × 5 and 7 × 7 block sizes separately.
According to Table 8, the presented model has linear time complexity. Therefore, a mobile FER application could be implemented using this model.

Comparisons with state-of-the-arts
In this paper, an exemplar-based FER method is proposed using a novel graph based texture transformation. The classification rate and space complexity are considered as the performance evaluation metrics. 3 × 3, 5 × 5 and 7 × 7 block sizes have been used for feature extraction, and six cases have been defined using these block sizes and the classifiers. Seven widely used methods are chosen to obtain the comparison results. Among the previously presented state-of-the-art methods using the same datasets, our Case 3 obtained the highest classification accuracy.
The proposed method is also compared with exemplar based SVM [1] method for each expression and the results are listed in Table 10.
Moreover, we have used the commonly used CK+ FER dataset (Lucey et al. 2010) to extend the comparative results. CK+ contains 981 images with seven emotions. The distribution of this dataset and the confusion matrices obtained are given in Fig. 11. As can be seen from Fig. 11, classification accuracies of 97.25% and 100% have been obtained using the LDA and SVM classifiers respectively. Comparative results on the CK+ dataset are tabulated in Table 11.

Table 1 The steps of the proposed textural graph transformation based exemplar FER method:
Step 1: Load the W × H image, where W and H represent the width and height of the original image.
Step 2: Segment the facial area (w × h), where w and h define the size of the face-segmented image.
Step 3: Resize the facial area of the face image to 120 × 90.
Step 4: Divide the facial area into 30 × 30 non-overlapping regions (the exemplar method is used in this step).
Step 5: Extract the features f_i of each region using the proposed graph based transform with variable block size, where f_i represents the features of a region with size 2048.
Step 6: Concatenate the features of each region.
Step 7: Reduce the feature dimension using maximum pooling or maximum pooling + PCA, giving feat_k, k = 1 … 1024 or 128.
Step 8: Classify the reduced features using the quadratic kernel SVM or LDA with tenfold cross-validation.

Discussions
The experimental results clearly illustrate that the proposed exemplar based FER method is superior to the previously presented methods. Case 3 achieved 2.44% and 0.37% higher classification rates than the best previously reported results for JAFFE and TFEID respectively.

Table 4 Definitions of the six cases (classifier and block size used with the presented graph-based textural transformation):
Case 1: LDA, 3 × 3 overlapping blocks
Case 2: LDA, 5 × 5 overlapping blocks
Case 3: LDA, 7 × 7 overlapping blocks
Case 4: SVM, 3 × 3 overlapping blocks
Case 5: SVM, 5 × 5 overlapping blocks
Case 6: SVM, 7 × 7 overlapping blocks

The second performance evaluation criterion of this method is the execution time. The execution time results and the space complexity of the proposed graph methods are given in Table 7. The space complexity is O(n²), and the computational complexity per block is O(145). It can be noted from Table 8 that the computational complexity of this method depends directly on the size of the image.
The advantages of the proposed method can be summarized as follows. Convolutional neural networks (CNNs) are widely used for FER and achieve good accuracy rates; however, CNNs involve high computational complexity and do not achieve high success rates on small datasets. To extract features deeply, a novel graph based texture transformation is proposed in this study, in which 15 descriptors are utilized across five levels. The developed image descriptors yield higher classification accuracy and lower computational complexity than previous studies; therefore, the graph based texture transformation is a lightweight feature extractor for FER. The limitation of the proposed technique is that it has been developed and tested on relatively small databases. On big datasets, deep learning models (Aouayeb et al. 2021; Minaee et al. 2021; Revina and Emmanuel 2019; Savchenko 2021; Umer et al. 2021; Vo et al. 2020) have been used to obtain high classification performance. This research presents a feature engineering study; obtaining the highest possible classification results is not its main objective. Rather, by using simple graphs, we have investigated the classification ability of textural features on FER image datasets.

Conclusions
In this paper, a novel graph based texture transformation is employed in the feature extraction phase of automated facial expression recognition. The proposed method consists of five levels, and 2048 features are extracted using the proposed graph based texture transformation. The main purpose of the proposed transformation is to obtain in-depth properties with low computational complexity using variable patterns constructed from basic geometric shapes. The experimental results are presented for six cases. We obtained 99.25% and 97.09% success rates using the TFEID and JAFFE datasets respectively for Case 3. Moreover, the CK+ FER image dataset was used to extend the comparative results; our graph-based textural feature extraction based classification model yielded 97.25% accuracy using LDA and 100% accuracy using the SVM classifier on the CK+ dataset. By using three FER image datasets, the general classification capability of the presented graph-based model has been demonstrated. The proposed framework can be used in real-world applications as it requires minimal execution time and has low space complexity.