1 Introduction

Facial expressions convey important information about people's mood, state of mind, emotion and pain, as reactions of the brain to various situations (Farajzadeh and Hashemzadeh 2018; Yaddaden et al. 2018). Facial expression recognition (FER) is widely used in many fields, such as biometrics and digital forensics, to reveal human emotion from face images. Mood detection performance is improved by analyzing facial regions such as the eyes, mouth and nose using advanced image processing techniques. Humans have various expressions, such as neutral, anger, happiness, sadness, fear, and surprise (Owusu et al. 2014; Shan et al. 2009; Zhang et al. 1996), and these expressions play a major role in human communication. In digital forensics, states such as excitement or nervousness carry important information about the psychology of guilt. For this purpose, many algorithms have been developed to enable human–machine interaction. The basic building blocks of these programs include face detection, facial local feature extraction, distinctive feature selection, and classification (Cohen et al. 2003; Fasel and Luettin 2003; Lien et al. 1998). The human brain expresses emotions through facial expressions, and recognition systems classify these expressions (Matsugu et al. 2003; Shan et al. 2009; Virrey et al. 2019). Many facial expression recognition methods have been proposed based on expression recognition (feelings, opinions, thoughts), non-verbal communication (gestures, facial expressions, signs), verbal communication (listener reactions), and psychological activities (pain and fatigue) (Moore and Bowden 2011; Zhao and Pietikainen 2007).

Image processing and pattern recognition methods have been widely used for FER. Farajzadeh and Hashemzadeh (2018) proposed a new face recognition model using five FER image datasets. The local binary pattern (LBP) and histogram of oriented gradients (HOG) techniques were used for feature extraction; their method operates in two phases, and the HOG-based variant achieved better results than the LBP-based one. Revina and Sam Emmanuel (2018) presented a modified facial expression recognition method based on the local directional number (LDN) pattern and the dominant gradient local ternary pattern (DGLTP); two FER image datasets were used to obtain numerical results. Li et al. (2018) proposed a novel end-to-end learnable local binary pattern network for face detection, reporting comparison results in terms of equal error rate (EER) and half total error rate (HTER). Turan and Lam (2018) proposed a structure with histogram-based local descriptors, reporting comparison results in terms of accuracy. Zhang and Hua (2015) analyzed driver fatigue recognition based on LBP and boosted local binary pattern, using a support vector machine (SVM) for classification and recognition rate as the performance measure. Ertugrul et al. (2018) proposed a model of kinship patterns for facial expressions, using two databases (smile and disgust) and accuracy rates for performance evaluation. Chao et al. (2015) presented a model using expression-specific local binary patterns and class-regularized locality preserving projection, evaluating the recognition rate on the JAFFE dataset (Lyons et al. 1999). Guo et al. (2018) analyzed expression-dependent susceptibility to face distortions and evaluated facial expressions, using accuracy, reaction time, expression intensity, and fixation duration and count to assess performance.
Moore and Bowden (2011) proposed local Gabor binary patterns for facial expression analysis, presenting experimental results on three FER image datasets. Savchenko (2021) used lightweight neural networks for FER on the AFEW and VGAF datasets. Vo et al. (2020) proposed a pyramid with super-resolution (PSR) method for in-the-wild FER, using three datasets (FER+, RAF-DB and AffectNet) and reporting confusion matrices and accuracy. Revina and Emmanuel (2019) applied a multi-support vector neural network to FER images, using the Cohn-Kanade AU-Coded Expression and JAFFE databases and evaluating performance with accuracy, true positive rate and false positive rate. Abate et al. (2019) proposed a method for clustering facial attributes using principal component analysis and convolutional neural networks, with experiments carried out on the CelebA dataset.

Facial expression recognition is a major problem in image processing and pattern recognition, and many methods have been presented in the literature to solve it. We present a texture-based method for automated FER, shown in Fig. 1. The proposed scheme is as follows:

  • In the preprocessing phase, the face area of the image is segmented and resized.

  • The segmented image is divided into pieces to implement exemplar method.

  • The proposed texture transformation is applied on each piece to extract the features.

  • The extracted features are concatenated to obtain the final feature.

  • PCA and one dimensional (1D) maximum pooling are used for feature reduction.

  • SVM and LDA are used for automated classification.

Fig. 1
figure 1

Block diagram of the proposed method

The major contributions of this paper are given below.

  • Most of the image descriptors available in the literature, such as local binary pattern (LBP) (Ojala et al. 2002), local quadruple pattern (LQPAT) (Chakraborty et al. 2017), dual cross pattern (DCP) (Ding et al. 2016), and local ternary pattern (LTP) (Tan and Triggs 2010), use a single pattern applied to each block for feature extraction. Therefore, these descriptors cannot achieve successful results on problems like FER that involve heterogeneous datasets. To solve this problem, a novel graph based texture transform is proposed. This transform uses 15 patterns across five levels for feature extraction by utilizing graph based structures.

  • A novel cognitive and lightweight deep feature extraction network is proposed in this paper. This method is cognitive because optimization and weight updating methods are not used. The proposed method is fast as it does not involve any complex calculations.

  • A novel hybrid feature reduction method is proposed. Here, maximum pooling and PCA are utilized together for feature reduction.

2 The proposed graph based texture transformation

The image descriptors have been widely used for face recognition and FER methods (Ahonen et al. 2006; Hernández et al. 2007; Kabir et al. 2010; Rivera et al. 2013; Sadeghi and Raie 2019). A novel graph based texture descriptor is employed using variable patterns and it extracts features up to five levels. The basic geometric shapes are utilized as patterns. The proposed transformation can be used with variable block sizes. In this study, overlapping blocks with size of 3 × 3, 5 × 5, 7 × 7 are utilized.
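As an illustration of the block structure, overlapping blocks with a stride of one can be extracted as in the following minimal Python sketch (the paper's implementation is in MATLAB; the function name here is ours):

```python
import numpy as np

def overlapping_blocks(img, k):
    """Slide a k x k window over the image with stride 1 and
    return every fully contained overlapping block."""
    h, w = img.shape
    return [img[i:i + k, j:j + k]
            for i in range(h - k + 1)
            for j in range(w - k + 1)]
```

For an h × w image this yields (h − k + 1)(w − k + 1) blocks of size k × k, so a 5 × 5 image produces nine 3 × 3 blocks.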

Our proposed transformation is illustrated below using a sample block of size 3 × 3. Equation (1) shows the mathematical description of the 3 × 3 block.

$$block=\left[\begin{array}{ccc}{Im}_{i,j}& {Im}_{i,j+1}& {Im}_{i,j+2}\\ {Im}_{i+1,j}& {Im}_{i+1,j+1}& {Im}_{i+1,j+2}\\ {Im}_{i+2,j}& {Im}_{i+2,j+1}& {Im}_{i+2,j+2}\end{array}\right]$$
(1)

where \({Im}_{i,j}\) is the pixel value at the \(i\)th row and \(j\)th column of the image. The proposed graph based transformation is explained using a 3 × 3 sample block, and the patterns for 3 × 3, 5 × 5 and 7 × 7 block sizes are shown in the figures using arrows. The signum function is used as the binary feature extraction function, and its mathematical description is shown in Eq. (2).

$$\mathrm{Sig}\left(a,b\right)=\left\{\begin{array}{c}0, a<b\\ 1,a\ge b\end{array}\right.$$
(2)

where \(\mathrm{Sig}\left(.,.\right)\) represents the signum function, \(a\) and \(b\) are input parameters of the signum function.
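The comparison in Eq. (2) is a simple thresholded comparison of two pixel values; a minimal Python sketch (the function name is ours):

```python
def sig(a, b):
    """Bit-extraction function of Eq. (2): 1 when a >= b, else 0."""
    return 1 if a >= b else 0
```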

2.1 Level 1

In this paper, a graph-based texture transformation of five levels is implemented. In each level, variable shapes are used to create graphs and these graphs are utilized as patterns. In the first level, two graphs are utilized, and 8-bit binary feature values are extracted from each graph. Finally, 16-bit binary features are extracted in level 1.

To obtain the numerical results of the example, a sample block size of 3 × 3 is used as shown in Fig. 2.

Fig. 2
figure 2

Sample 3 × 3 block

The mathematical description and an example of level 1 are shown in Fig. 3. In level 1, the local binary pattern and a square-like graph are used.

Fig. 3
figure 3

Mathematical description and examples of the Level 1
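Since level 1 uses the classic local binary pattern, its 8-bit extraction on a 3 × 3 block can be sketched as follows. This is an illustrative Python sketch; the clockwise bit ordering chosen here is our assumption, and the exact ordering used in the paper is defined by the graph in Fig. 3.

```python
import numpy as np

def sig(a, b):
    # Bit-extraction function of Eq. (2)
    return 1 if a >= b else 0

def lbp_bits(block):
    """Classic LBP on a 3x3 block: compare each of the 8 neighbours
    to the centre pixel. The clockwise ordering starting at the
    top-left corner is an illustrative assumption."""
    c = block[1, 1]
    coords = [(0, 0), (0, 1), (0, 2), (1, 2),
              (2, 2), (2, 1), (2, 0), (1, 0)]
    return [sig(block[i, j], c) for i, j in coords]
```

The second (square-like) graph of level 1 would contribute another 8 bits in the same manner, giving the 16 bits of this level.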

2.2 Level 2

In this level, four pentagons, obtained by 90-degree rotations, are used for feature extraction. In level 2, 20-bit binary features are extracted. The mathematical description and an example of level 2 are shown in Fig. 4.

Fig. 4
figure 4

Mathematical description and examples of the level 2

2.3 Level 3

8-bit binary data are extracted in level 3, which uses two shapes. The shapes used and the mathematical expressions of level 3 are shown in Fig. 5.

Fig. 5
figure 5

Mathematical description and examples of level 3

2.4 Level 4

Triangle shapes are utilized as patterns in this level. In total, 12-bit values are extracted, and the description of this level is given in Fig. 6. To extract these bits, four triangles at 90-degree rotations are used.

Fig. 6
figure 6

Mathematical description of the Level 4 with a numerical example

2.5 Level 5

In this level, lines are used to extract the features. The line patterns are shown in Fig. 7. 3 + 3 + 2 bits are obtained by applying the signum function to the pixels at the ends of the lines; in total, 8 bits of data are obtained using 8 lines.

Fig. 7
figure 7

Mathematical description of level 5 with a numerical example

Finally, 64-bit features are extracted from each block, and 8 feature images are constructed using these bits. The pseudo code of feature image creation is given in Algorithm 1.

figure d

Then, the 8 feature images are constructed using these decimal values, and the histograms of these images are used as features.
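Algorithm 1 appears above as an image. A minimal Python sketch of the idea it describes, under our assumptions (MSB-first bit ordering, and the 64 bits split into eight consecutive 8-bit groups), is:

```python
import numpy as np

def bits_to_feature_values(bits64):
    """Split the 64 bits from one block into eight 8-bit groups and
    convert each group to a decimal value (MSB-first is assumed)."""
    assert len(bits64) == 64
    values = []
    for g in range(8):
        group = bits64[8 * g: 8 * (g + 1)]
        values.append(sum(b << (7 - k) for k, b in enumerate(group)))
    return values

def feature_histograms(feature_images):
    """Concatenate the 256-bin histograms of the eight feature images."""
    return np.concatenate(
        [np.bincount(fi.ravel(), minlength=256) for fi in feature_images])
```

Note that eight feature images with 256-bin histograms each give 8 × 256 = 2048 features, matching the feature count stated in the paper.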

3 The texture transformation based facial expression recognition

In this method, an exemplar facial expression recognition method is presented. This algorithm uses a novel graph based texture transformation for feature extraction. The proposed method consists of preprocessing, feature extraction, feature concatenation, feature reduction and classification phases. During the preprocessing step, facial images are segmented to obtain the facial area. Then, the segmented image is resized and divided into 30 × 30 non-overlapping blocks. The proposed texture transformation is applied to each block for feature extraction. Maximum pooling and PCA are employed together for feature reduction. In this step, 2048 × 12 = 24,576 features are reduced to 1024 features using 1D maximum pooling. The pseudo code of 1D maximum pooling is shown in Algorithm 2.

figure e
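Algorithm 2 appears above as an image. A minimal Python sketch of non-overlapping 1D maximum pooling is given below; reducing 24,576 features to 1024 implies a window of 24, which is our inference rather than a stated parameter.

```python
import numpy as np

def max_pool_1d(features, window):
    """Non-overlapping 1D maximum pooling: keep the largest value in
    each consecutive window (the length must divide evenly here)."""
    features = np.asarray(features)
    assert features.size % window == 0
    return features.reshape(-1, window).max(axis=1)
```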

In the last phase, LDA and SVM classifiers are used for classification. The steps of the proposed method are given in Table 1. 128 and 1024 features are utilized as inputs for the quadratic-kernel SVM and LDA, respectively. To obtain the 128 features, PCA is applied to the pooled features.
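A sketch of this final stage using scikit-learn as a stand-in for the paper's MATLAB implementation (the feature matrix here is random placeholder data, and the quadratic kernel is expressed as a degree-2 polynomial kernel):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.svm import SVC

# Hypothetical pooled feature matrix (n_samples x 1024) and labels
# for the seven expression classes.
rng = np.random.default_rng(0)
X_pooled = rng.random((200, 1024))
y = rng.integers(0, 7, size=200)

# LDA receives the 1024 pooled features directly.
lda = LinearDiscriminantAnalysis().fit(X_pooled, y)

# For the quadratic-kernel SVM, PCA first reduces the pooled
# features to 128 components.
X_128 = PCA(n_components=128).fit_transform(X_pooled)
svm = SVC(kernel='poly', degree=2).fit(X_128, y)
```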

Table 1 The steps of the proposed textural graph transformation based exemplar FER method

In this study, LDA and SVM are utilized as classifier and attributes of them are listed in Table 2.

Table 2 The attributes of the classifiers used

4 Results

4.1 Datasets

To evaluate the recognition ability of the proposed texture transformation based exemplar FER method, the JAFFE (Lyons et al. 1999) and TFEID (Chen and Yen 2007) datasets are used. These two public datasets are widely used for FER. Sample images from these datasets are shown in Fig. 8 and their attributes are given below.

Fig. 8
figure 8

Various facial expressions of two datasets: a JAFFE, b TFEID

JAFFE (Lyons et al. 1999) consists of 210 facial expression images of Japanese female models. The TFEID (Chen and Yen 2007) dataset comprises 268 images of Taiwanese models covering seven expressions (anger, disgust, fear, happiness, neutral, sadness, surprise).

4.2 Experimental setup

In the experiments, a computer with the specifications listed in Table 3 was used, and the simulations were carried out in MATLAB 2018a.

Table 3 The specifications of the PC used in this study

As can be seen from Table 3, the proposed graph-based model has been implemented on a modestly configured computer, since it is a lightweight image classification (FER) model.

4.3 Cases

The presented graph-based textural extractor is a parametric function. For feature extraction, variable sized overlapping blocks can be used. To clearly show feature extraction ability of the presented graph-based function, 3 × 3, 5 × 5 and 7 × 7 sized overlapping blocks have been used. Moreover, we have considered LDA and SVM as classifiers. Hence, six cases have been created and these cases are defined in Table 4.

Table 4 The attributes of the defined cases

4.4 Performance evaluation

To measure classification ability of these cases, accuracy parameter is used. The mathematical formula of the accuracy is given in Eq. (3).

$$Acc\left(\%\right)=\frac{\#\mathrm{True}\, \mathrm{predictive}\, \mathrm{images}}{\#\mathrm{Total}\, \mathrm{images}} \times 100$$
(3)

Moreover, the calculated confusion matrices have been given. By using the confusion matrices, other commonly used performance evaluation parameters can be calculated.
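Accuracy as defined in Eq. (3) follows directly from the confusion matrix: the correctly predicted images are its diagonal entries. A minimal Python sketch (the function name is ours):

```python
import numpy as np

def accuracy_from_confusion(cm):
    """Eq. (3): correctly predicted images (the diagonal of the
    confusion matrix) over all images, as a percentage."""
    cm = np.asarray(cm)
    return 100.0 * np.trace(cm) / cm.sum()
```

Other common metrics (per-class recall, precision, etc.) can be computed from the same matrix in the same way.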

4.5 Validation

Tenfold cross-validation is used in the classification phase, and average success rates are reported. The performance obtained using the two databases (JAFFE and TFEID) is given below.
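A sketch of the tenfold cross-validation protocol using scikit-learn as a stand-in for the paper's MATLAB implementation (the feature matrix below is random placeholder data; stratified folds are our assumption):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.random((70, 16))           # stand-in feature matrix
y = np.repeat(np.arange(7), 10)    # 7 balanced expression classes

# Ten stratified folds; the mean fold accuracy is reported.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)
scores = cross_val_score(SVC(kernel='poly', degree=2), X, y, cv=cv)
mean_accuracy = 100.0 * scores.mean()
```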

4.6 Experimental results using JAFFE database

The cases used in the experiments are implemented on the JAFFE dataset to classify the seven universal facial expressions. The accuracy rates of the cases are shown in Table 5, and the confusion matrices obtained for the different cases are shown in Fig. 9.

Table 5 Accuracy rates (%) of the cases used in the experiments utilizing JAFFE dataset
Fig. 9
figure 9

The confusion matrices obtained for various cases using JAFFE dataset

4.7 Experimental results using TFEID database

The cases used in the experiments are implemented on the TFEID dataset to classify the seven universal facial expressions. The results obtained are shown in Table 6, and Fig. 10 shows the confusion matrices obtained for the various cases using the TFEID dataset.

Table 6 Accuracy rates (%) obtained for various cases using TFEID dataset
Fig. 10
figure 10

The confusion matrices obtained for various cases using TFEID dataset

4.8 Execution time and computational complexity analysis

In order to evaluate the performance of the proposed method clearly, the execution time and computational complexity are examined. The average execution time of the proposed method for an image is listed in Table 7 for different block sizes. The time measures listed in Table 7 are the sum of the preprocessing, feature extraction, feature concatenation and feature reduction times for one image.

Table 7 Execution time (in seconds) for the proposed method with variable block sizes

The space complexity of the proposed texture transformation is also calculated, and the computational complexity of the different cases is given in Table 8. It is calculated for 3 × 3, 5 × 5 and 7 × 7 block sizes separately.

Table 8 The computational complexity of the proposed graph based feature extraction method for 3 × 3, 5 × 5 and 7 × 7 block sizes

According to Table 8, the presented model has linear time complexity. Consequently, a mobile FER application can be implemented using this model.

4.9 Comparisons with state-of-the-arts

In this paper, an exemplar-based FER method is proposed using a novel graph based texture transformation. Classification rate and space complexity are considered as the performance evaluation metrics. 3 × 3, 5 × 5 and 7 × 7 block sizes have been used for feature extraction, and six cases have been defined using these block sizes and classifiers. Seven widely used methods are chosen for comparison. Case 3 achieved the highest classification accuracy among the seven previously presented state-of-the-art methods on the same datasets.

The proposed method is compared with 3D Gabor features + SVM (Zhang and Tjondronegoro 2011), contours + SimNet (Lee et al. 2013), meta probability codes + SVM (Farajzadeh et al. 2014), curvelet + online sequential learning machine (Uçar et al. 2016), local Fisher discriminant analysis (Wang et al. 2016), pyramid + SVM + single-branch decision tree (Ashir and Eleyan 2017), exemplar-based + SVM (Farajzadeh and Hashemzadeh 2018). The summary of comparison results obtained using various methods using the same database is listed in Table 9.

Table 9 The overall comparison of results obtained for automated detection of FER using the same datasets

The proposed method is also compared with exemplar based SVM [1] method for each expression and the results are listed in Table 10.

Table 10 The comparison results obtained using the same datasets for different expressions

Moreover, we have used the commonly used CK+ FER dataset to extend the comparative results. CK+ (Lucey et al. 2010) contains 981 images with seven emotions, distributed as follows: anger: 135, contempt: 54, disgust: 177, fear: 75, happy: 207, sadness: 84 and surprise: 249. We applied our graph based textural feature extractor with 3 × 3 overlapping blocks, and the calculated results are shown in Fig. 11.

Fig. 11
figure 11

Confusion matrices of the presented graph based model using LDA and SVM classifiers

As can be seen from Fig. 11, classification accuracies of 97.25% and 100% have been obtained using the LDA and SVM classifiers, respectively. Moreover, comparative results on the CK+ dataset are tabulated in Table 11.

Table 11 Comparative results on the CK+ dataset

5 Discussions

The experimental results clearly illustrate that the proposed exemplar based FER method is superior to the previously presented methods. Case 3 achieved 2.44% and 0.37% higher classification rates than the best previously reported results for JAFFE and TFEID, respectively.

The second performance evaluation criterion of this method is the execution time. The execution time and space complexity results of the proposed graph based method are given in Table 7. The space complexity is O(n²), and the computational complexity for each block is O(145). It can be noted from Table 8 that the computational complexity of this method depends directly on the size of the image.

The advantages of the proposed method can be summarized as follows:

  • Convolutional neural networks (CNNs) are widely used for FER and achieve good accuracy rates. However, CNNs involve high computational complexity and do not achieve high success rates on small datasets. To extract features deeply, a novel graph based texture transformation is proposed in this study, in which 15 descriptors are utilized across five levels. Our image descriptors yield higher classification accuracy and lower computational complexity compared to previous studies. Therefore, the graph based texture transformation method is a lightweight feature extractor for FER, and this model is suitable for developing mobile FER applications.

  • The proposed scheme is a cognitive method as it does not involve any random weight assignment, weight updating, ensemble based method or optimization algorithms.

  • The developed technique is robust, as we performed tenfold cross-validation.

  • The proposed technique extracted distinctive features and as a result achieved high classification accuracy.

  • The algorithm is computationally less intensive and can be used in real time applications.

  • This graph-based model has been tested on three FER datasets and our model has attained over 97% classification accuracies for all datasets.

The limitation of the proposed technique is that it has been developed and tested on relatively small databases. On big datasets, deep learning models (Aouayeb et al. 2021; Minaee et al. 2021; Revina and Emmanuel 2019; Savchenko 2021; Umer et al. 2021; Vo et al. 2020) have been used to obtain high classification performance. In this research, we have presented a feature engineering study; obtaining the highest classification results is not the main objective of this work. Using simple graphs, we have investigated the classification ability of textural features on FER image datasets.

6 Conclusions

In this paper, a novel graph based texture transformation is employed in the feature extraction phase of automated facial expression recognition. The proposed method consists of five levels, and 2048 features are extracted using the proposed graph based texture transformation. The main purpose of the proposed transformation is to obtain in-depth properties with low computational complexity using variable patterns constructed from basic geometric shapes. The experimental results are presented for six cases. We have obtained 99.25% and 97.09% success rates using the TFEID and JAFFE datasets, respectively, for Case 3. Moreover, we have used the CK+ FER image dataset to extend the comparative results. Our graph-based textural feature extraction based classification model yielded 97.25% accuracy using LDA and 100% accuracy using the SVM classifier on the CK+ dataset. The general classification capability of the presented graph-based model has thus been demonstrated on three FER image datasets. Our proposed framework can be used in real world applications as it requires minimal execution time and low space complexity. The developed algorithm can also be applied to other image types, such as palm, hair, and vein images.