Introduction

Facial expressions provide valuable information about a person by reflecting their psychological state, and thus serve as an important means for effective communication [66]. The facial expressions conveyed by humans are common across traditions and cultures, and they provide an immediate means to analyze a person's mood [15]. Research suggests that about 55% of human communication happens through facial expressions alone [8]. Over the last decade, the field of Facial Expression Recognition (FER) has attracted considerable attention from researchers because of its wide range of applications in driver mood detection, affective computing, clinical psychology, animation, etc. [35]. Its relevance to day-to-day communication and to advanced intelligent interaction between humans and machines is an important factor driving the study of FER [45].

The robustness of an FER system is hindered by inconsistent image acquisition conditions, spontaneous expressions, ethnicity variations, illumination variations, aging factors, noise, etc. [33]. Developing a feature descriptor that is robust to such dynamic changes is therefore a complicated and challenging task. An FER system comprises an image acquisition and pre-processing stage, a feature extraction stage that extracts expression-specific features, and a classification stage that classifies the expressions [35]. The classification stage of an FER system depends heavily on the method used for feature extraction, as inappropriate feature extraction degrades performance even with the best of classifiers. A proper feature extraction technique that can effectively capture expression-specific changes is therefore essential for an FER system [66].

A person's face can depict many expressions, and there are only minute differences between the various expressions conveyed by human beings. For accurate classification in FER, it is essential to capture the minute details related to specific expressions. Based on the literature, texture-based feature descriptors have proven effective for extracting valuable features from a facial image by exploiting the neighborhood pixel relationships to detect and capture minute details in an image [35, 56, 57]. In FER systems, the most significant problem is to extract effective and meaningful patterns from the facial images [56].

The relationships between neighboring pixels and between adjacent pixels are crucial for detecting the finer appearance changes associated with specific expressions. Accordingly, the proposed feature descriptors are developed by considering the neighboring pixels relationship (both diagonal and four-neighbor) for the Radial Cross Pattern (RCP) and the adjacent pixels relationship based on Knight pixel positions for the Center Symmetric Pattern (CSP). In this work, for feature extraction, the 24 pixels surrounding the center pixel in a 5 \(\times \) 5 neighborhood are arranged into two groups: RCP, which extracts two feature values by comparing 16 pixels with the center pixel, and CSP, which extracts one feature value from the remaining 8 pixels. The features are extracted using RCP and CSP independently and also with their fusion, named Radial Cross Symmetric Pattern (RCSP). The main strength of the proposed methods is that the relevant features can be extracted from the available images in a dataset without requiring much training data.

The remainder of the paper is structured as follows: the related work in the field of FER is summarized in the second section, and a brief review of existing descriptors that are a basis for the proposed methods are mentioned in the third section. The proposed feature descriptors for facial feature extraction are discussed in the fourth section. In the fifth section, the datasets considered for experimental evaluation and the comparison analysis of the proposed methods with the existing methods are reported. The concluding remarks and the suggestions for further analysis and study are mentioned in the last section.

Related work

The existing techniques in FER systems are broadly classified into geometric-based methods and appearance-based methods [35]. The geometric-based methods [7, 13] encode the locations, shapes, corner and contour information of the main facial components such as the eyes, nose and mouth. These geometric-based methods encode characteristics that describe the entire facial image with a smaller number of features that are scale and rotation invariant. Although these methods represent facial geometry, they fail to capture minute local details such as the skin's texture variations and ridges. The facial appearance is best described by appearance-based methods, which can further be classified into global (holistic) methods [4, 58] and local methods [21, 27, 34, 35, 40,41,42]. Holistic (global) methods such as Eigenfaces [58] and Fisherfaces [4] apply projection-based techniques to produce a global description of the entire facial image. As these global methods aim to represent a facial image globally, they are unsuitable for capturing the finer appearance changes corresponding to various facial expressions [41]. The local appearance-based methods [17, 34, 35, 40,41,42, 45] investigate local regions to describe features such as corners, curved and straight edges, etc. Local methods can also capture micro-level texture information such as ridge details, specific skin changes and minute characteristics relevant to various facial expressions.

Research on local-based methods has been carried out in two directions, namely texture-based [35, 45, 67] and edge-based approaches [17, 34, 40,41,42]. Local Binary Pattern (LBP) [45] is the most popular texture-based method for facial feature extraction. LBP is computationally efficient and is also invariant to monotonic illumination changes. However, in cases of intensity fluctuations, random noise and non-monotonic illumination levels, LBP's feature extraction capability is affected [74]. Lai et al. [27] proposed Center Symmetric LBP (CSLBP) to greatly reduce the feature vector length of LBP. In the Local Directional Pattern (LDP) [17], the directions related to the top three of the eight obtained responses are encoded. Since Kirsch masks are applied on a local 3 \(\times \) 3 neighborhood, the presence of noise or intensity distortions may affect the computation of the Kirsch response values. The top negative and positive Kirsch response values are encoded by the Local Directional Number pattern (LDN) [41]. LDN is still affected by noise in the local neighborhood, even after preserving the top 'k' positive and negative Kirsch responses. The difference in intensity values of opposing pixels in the principal directions is encoded as numbers in the Local Directional Texture Pattern (LDTP) [40]. Ryu et al. [42] proposed the Local Directional Ternary Pattern (LDTerP) and a multi-level approach for efficiently encoding emotion-related information. NEDP [16] considers the gradient of the center pixel as well as its neighbors for exploring a wider neighborhood, extracting consistent features despite the presence of subtle distortions and random noise in a local region.

For capturing expression-specific changes, Murari et al. [35] proposed the Regional Adaptive Affinitive Pattern (RADAP), which uses positional thresholds and multi-distance information to describe features that are robust to intra-class variations and illumination changes. The XRADAP, ARADAP and DRADAP operators are obtained from RADAP by performing xor, adder and decoder operations, respectively. Despite the existence of noise in a local neighborhood, the Local Prominent Directional Pattern (LPDP) [33] explores local regions to extract crucial information about edges. Micheal Revina et al. [39] proposed the Multi-Directional Triangles Pattern (MDTP) for extracting features at the locations of the lips and eyes. Local Dominant Directional Symmetrical Coding Patterns (LDDSCP) [54] generate two feature values by partitioning the Kirsch response values into two symmetrical groups based on the directional information. The Local Optimal Oriented Pattern (LOOP) [21] uses sorted Kirsch responses for weight assignment, rather than sequential weights. In Center Symmetric Local Gradient Coding (CS-LGC) [66], the gradients are computed in four different directions in a center-symmetric manner.

Local Directional Maximum Edge Patterns (LDMEP) [32] applied Robinson’s masks in a local neighborhood for extracting both magnitude and phase information. Kas et al. [22] proposed Multi-level Directional Cross Binary Pattern (MDCBP) for texture recognition by combining both multi-radius and multi-orientation information. Durga et al. [23] proposed LBP with Adaptive Window (LBP-AW) for noise robust facial feature extraction. Alphonse et al. [2] proposed Multi-Scale and Rotation-Invariant Phase Pattern (MRIPP) for extracting blur-insensitive and rotation invariant facial features. Kumar et al. [26] proposed Weighted Full binary Tree-Sliced Binary Pattern (WFBT-SBP) for analyzing an RGB image based on inter-pixel similarity patterns. Kola et al. [24] proposed fusion of both singular values and Wavelet-based Local Gradient Coding-Horizontal and Diagonal (WLGC-HD) features for effective FER.

Szegedy et al. [53] proposed GoogleNet for object detection and classification. Duc et al. [62] proposed fusion of AlexNet and Support Vector Machine (SVM) for effective facial feature extraction. Subramanian et al. [50] proposed Meta-Cognitive Neuro-Fuzzy Inference System (McFIS) for FER. Jung et al. [20] used Convolutional Neural Network (CNN) for detecting faces and Deep Neural Network (DNN) for recognizing facial expressions from those detected faces. Shojaeilangari et al. [48] proposed Landmark-based Pose Invariant feature Descriptor (PID) for handling continuous head pose variations. Zhao et al. [72] proposed LBP on three orthogonal planes (LBP-TOP) for dynamic texture recognition. Shojaeilangari et al. [47] proposed Optical Flow-based spatial temporal feature descriptor for representing the facial expressions.

Aneja et al. [3] proposed DeepExpr, a transfer learning technique to map expressions from humans to animated characters. Zhao et al. [73] proposed an instance-based transfer learning approach with multiple feature representations. Sun et al. [51] proposed Individual Free Representation-Based Classification (IFRBC) that utilizes Variation Training Set (VTS) and virtual VTS for remitting the side effects caused by the individual differences. Wu et al. [63] proposed Adaptive Feature Mapping (AFM) for transforming the feature distribution of testing samples into that of training samples. Li et al. [29] proposed Deep Locality Preserving Convolutional Neural Network (DLPCNN) to preserve the locality closeness by maximizing the inter-class scatters. Verma et al. [60] proposed variants of Hybrid Inherited Feature Learning Network (HiNet) for capturing the local contextual information of expressive regions. Ji et al. [18] proposed a fusion network based on intra category common and distinctive feature representation. For FER, Xie et al. [64] presented the Deep Attentive Multi-path Convolutional Neural Network (DAMCNN), which combines the Salient Expression Region Descriptor (SERD) with the Multi-Path Variation Suppressing Network (MPVS-Net).

Zeng et al. [70] proposed a framework for FER that combines both geometric and appearance-based features and utilizes Deep Sparse Auto Encoders (DSAE) for recognizing the facial expressions. Drawing inspiration from the human vision system, Sadeghi et al. [43] proposed a method based on Gabor filters. Li et al. [28] used reinforcement learning to select relevant images for expression classification. Saurav et al. [44] proposed the Dual Integrated Convolution Neural Network (DICNN) model for recognizing 'in the wild' facial expressions on embedded platforms. Jeen et al. [25] utilized subband selective multilevel stationary wavelet gradient transform features for recognizing facial expressions. Image filter-based Subspace Learning (IFSL) was proposed by Yan et al. [65] for better capturing the facial information. Feutry et al. [10] proposed a framework to learn anonymized representations of statistical data. Zeng et al. [69] proposed a novel pattern recognition-based method for accurate segmentation of test and control lines for quantitative analysis of gold immunochromatographic strips. Minaee et al. [36] proposed the Attentional Convolutional Network (ACN) model with less than ten layers for classifying emotions from facial images. Sun et al. [52] adopted a Dictionary Learning Feature Space (DLFS) for training and Sparse Representation Classification for finding the emotion of query images. A novel Deep Belief Network (DBN)-based multi-task learning algorithm was proposed by Zeng et al. [68] for the diagnosis of Alzheimer's disease. There are various other CNN-based methods such as VGG [49], PCANet [5], ResNet [12] and MobileNet [14] which have shown promising results in FER.

Fig. 1 Feature extraction through CP. a Logical placement of chessmen in a 5 \(\times \) 5 neighborhood. b Numbering scheme of chessmen followed for feature extraction through CP. c A 5 \(\times \) 5 sample image patch. d–i Process of feature extraction through CP for a sample 5 \(\times \) 5 image patch. d R = 01110110. e B = 00110011. f K = 00110011. g R_K = 01100110. h R_B = 01100110. i K_B = 11001010

Existing feature descriptors

In this section, the existing feature descriptors namely Chess Pattern (CP) [57], Local Gradient Coding (LGC) [55] and its variants are presented.

CP

Tuncer et al. [57] proposed CP, a local texture-based feature descriptor developed using chess game rules for texture recognition. With reference to the center pixel (Gc) in a 5 \(\times \) 5 neighborhood, CP logically places chessmen (Rook, Bishop and Knight) in the positions allowed by the chess game rules. The positions where the Rook {R1,2,...,8}, Bishop {B1,2,...,8} and Knight {K1,2,...,8} are logically placed are numbered, and this numbering forms the basis for feature extraction, as shown in Fig. 1a, b. CP extracts six features in a 5 \(\times \) 5 neighborhood. The first three extracted features are Rook (R), Bishop (B) and Knight (K). To extract the R feature, the pixel intensities at positions {R1,2,...,8} are compared with the pixel intensity of Gc, as shown in Eq. (1). The signum function used for comparing two pixel intensities is given in Eq. (2). To extract the B feature, the pixel intensities at positions {B1,2,...,8} are compared with the pixel intensity of Gc, as shown in Eq. (3). To extract the K feature, the pixel intensities at positions {K1,2,...,8} are compared with the pixel intensity of Gc, as shown in Eq. (4).

The next three extracted features are Rook_Knight (R_K), Rook_Bishop (R_B) and Knight_Bishop (K_B). To extract the R_K feature, the pixel intensities at positions {R1,2,...,8} are compared with those at positions {K1,2,...,8}, as shown in Eq. (5). To extract the R_B feature, the pixel intensities at positions {R1,2,...,8} are compared with those at positions {B1,2,...,8}, as shown in Eq. (6). To extract the K_B feature, the pixel intensities at positions {K1,2,...,8} are compared with those at positions {B1,2,...,8}, as shown in Eq. (7). Finally, all six extracted features are concatenated to form the final feature vector. The process of feature extraction through CP is demonstrated with a numerical example in Fig. 1c–i. Thus, CP considers all the pixels present in the 5 \(\times \) 5 neighborhood and extracts six features using sign information to encode the texture details in an image. As CP uses binary weights, the feature vector length of CP is 256 \(\times \) 6 = 1536, which is very high.

$$\begin{aligned}&R = \sum _{i=1}^{8} \sigma (R_i, G_c) \times 2^{8-i} \end{aligned}$$
(1)
$$\begin{aligned}&\sigma (m,n)= \begin{cases} 1, & \text {if } m \ge n \\ 0, & \text {otherwise} \end{cases} \end{aligned}$$
(2)
$$\begin{aligned}&B = \sum _{i=1}^{8} \sigma (B_i, G_c) \times 2^{8-i} \end{aligned}$$
(3)
$$\begin{aligned}&K = \sum _{i=1}^{8} \sigma (K_i, G_c) \times 2^{8-i} \end{aligned}$$
(4)
$$\begin{aligned}&R\_K = \sum _{i=1}^{8} \sigma (R_i, K_i) \times 2^{8-i} \end{aligned}$$
(5)
$$\begin{aligned}&R\_B = \sum _{i=1}^{8} \sigma (R_i, B_i) \times 2^{8-i} \end{aligned}$$
(6)
$$\begin{aligned}&K\_B = \sum _{i=1}^{8} \sigma (K_i, B_i) \times 2^{8-i} \end{aligned}$$
(7)
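For clarity, a minimal Python sketch of Eqs. (1)–(7) follows. Since Fig. 1a, b is only summarized here, the (row, column) offsets assigned to the Rook, Bishop and Knight positions and their ordering are illustrative assumptions based on standard chess moves from the centre of a 5 \(\times \) 5 patch, not the exact numbering of the original figure; the weights are the binary weights of Eqs. (1)–(7).

```python
import numpy as np

def sigma(m, n):
    """Signum comparison of Eq. (2): 1 if m >= n, else 0."""
    return 1 if m >= n else 0

# Offsets (row, col) from the centre of a 5x5 patch. The clockwise ordering
# of Fig. 1b is not reproduced here, so these orderings are assumptions.
ROOK   = [(-2, 0), (-1, 0), (0, 2), (0, 1), (2, 0), (1, 0), (0, -2), (0, -1)]
BISHOP = [(-2, -2), (-1, -1), (-2, 2), (-1, 1), (2, 2), (1, 1), (2, -2), (1, -1)]
KNIGHT = [(-2, -1), (-2, 1), (-1, 2), (1, 2), (2, 1), (2, -1), (1, -2), (-1, -2)]

def cp_features(patch):
    """Six CP feature values (R, B, K, R_K, R_B, K_B) for one 5x5 patch."""
    gc = patch[2, 2]
    r = [patch[2 + dr, 2 + dc] for dr, dc in ROOK]
    b = [patch[2 + dr, 2 + dc] for dr, dc in BISHOP]
    k = [patch[2 + dr, 2 + dc] for dr, dc in KNIGHT]
    w = [2 ** (8 - i) for i in range(1, 9)]          # binary weights of Eqs. (1)-(7)
    code = lambda pairs: sum(sigma(m, n) * wi for (m, n), wi in zip(pairs, w))
    R   = code([(ri, gc) for ri in r])
    B   = code([(bi, gc) for bi in b])
    K   = code([(ki, gc) for ki in k])
    R_K = code(list(zip(r, k)))
    R_B = code(list(zip(r, b)))
    K_B = code(list(zip(k, b)))
    return R, B, K, R_K, R_B, K_B

patch = np.random.randint(0, 256, (5, 5))
print(cp_features(patch))
```

Concatenating the block-wise histograms of these six codes yields the 256 \(\times \) 6 = 1536-dimensional CP feature vector mentioned above.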

LGC

Tong et al. [55] proposed LGC for facial feature extraction. LGC extracts texture features in a 3 \(\times \) 3 neighborhood. The LGC operator encodes the gradient information in the horizontal, vertical and diagonal directions to generate an eight-bit binary number. The binary number thus formed is converted into a decimal number, which replaces the center pixel value. This process is repeated throughout the image, and the histogram features are concatenated block-wise to form the final feature vector. This encoding captures consistent expression-specific texture features in all possible directions. The coding formula for feature extraction through the LGC operator is shown in Eq. (8). As a 3 \(\times \) 3 mask is used for LGC, the radius is d = 1 and the number of neighbors is p = 8. Several extensions of the LGC operator have also been proposed, namely the LGC-HD, LGC-FN and LGC-AD operators.

$$\begin{aligned} \mathrm{{LGC}}_{p}^d&= \sigma (h_4, h_2) \times 2^{7} + \sigma (h_5, h_1) \times 2^{6} \nonumber \\&\quad + \sigma (h_6, h_8) \times 2^{5} + \sigma (h_4, h_6) \times 2^{4} \nonumber \\&\quad + \sigma (h_3, h_7) \times 2^{3} + \sigma (h_2, h_8) \times 2^{2} \nonumber \\&\quad + \sigma (h_4, h_8) \times 2^{1} + \sigma (h_2, h_6) \times 2^{0} \end{aligned}$$
(8)
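A minimal sketch of Eq. (8) is given below; it assumes that h[1]..h[8] hold the eight neighbor intensities in the layout of Fig. 2a (which is not reproduced here), so the indexing is a placeholder rather than the exact mask.

```python
def lgc(h):
    """LGC code of Eq. (8); h[1]..h[8] are the eight neighbour intensities
    in the (assumed) layout of Fig. 2a."""
    s = lambda m, n: 1 if m >= n else 0
    # pixel-index pairs in the order of Eq. (8), weighted 2^7 down to 2^0
    pairs = [(4, 2), (5, 1), (6, 8), (4, 6), (3, 7), (2, 8), (4, 8), (2, 6)]
    return sum(s(h[a], h[b]) << (7 - i) for i, (a, b) in enumerate(pairs))

# index 0 is unused; entries 1..8 are illustrative neighbour intensities
h = [None, 52, 48, 60, 55, 47, 50, 58, 53]
print(lgc(h))
```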
Fig. 2 Feature extraction through LGC and LGC-HD. a Sample mask for 3 \(\times \) 3 operators. b A sample 3 \(\times \) 3 image patch. c LGC feature value for the sample 3 \(\times \) 3 patch. d LGC-HD feature value for the sample 3 \(\times \) 3 patch

Fig. 3 Feature extraction through LGC-FN and LGC-AD. a Sample mask considered for 5 \(\times \) 5 operators. b A sample 5 \(\times \) 5 image patch for LGC-FN. c Computed feature value using LGC-FN. d Computed feature value using LGC-AD

LGC-HD

LGC-HD, also proposed by Tong et al. [55], further optimizes the LGC operator and decreases the feature vector length by considering the gradient information in the horizontal and diagonal directions only. The coding formula for feature extraction through the LGC-HD operator is shown in Eq. (9). In Fig. 2, the feature extraction of LGC and LGC-HD is shown for a sample 3 \(\times \) 3 numerical example. As a 3 \(\times \) 3 mask is used for LGC-HD, d = 1 and p = 6.

$$\begin{aligned} \mathrm{{LGC-HD}}_{p}^d&= \sigma (h_4, h_2) \times 2^{4} + \sigma (h_5, h_1) \times 2^{3} \nonumber \\&\quad + \sigma (h_6, h_8) \times 2^{2} + \sigma (h_4, h_8) \times 2^{1} \nonumber \\&\quad + \sigma (h_2, h_6) \times 2^{0} \end{aligned}$$
(9)

LGC-FN

The LGC-FN operator [46, 66] extends LGC by considering a 5 \(\times \) 5 neighborhood. LGC-FN computes feature values in three directions, namely the horizontal direction and the two diagonal directions. The coding formula for feature extraction through the LGC-FN operator is shown in Eq. (10). A sample mask considered for 5 \(\times \) 5 operators is shown in Fig. 3a, a sample 5 \(\times \) 5 image patch is shown in Fig. 3b, and the computed LGC-FN feature value is shown in Fig. 3c.

$$\begin{aligned} \mathrm{{LGC-FN}}&= \sigma (h_1, h_2) \times 2^{7} + \sigma (h_3, h_4) \times 2^{6} \nonumber \\&\quad + \sigma (h_5, h_6) \times 2^{5} + \sigma (h_7, h_8) \times 2^{4} \nonumber \\&\quad + \sigma (h_1, h_8) \times 2^{3} + \sigma (h_3, h_6) \times 2^{2} \nonumber \\&\quad + \sigma (h_2, h_7) \times 2^{1} + \sigma (h_4, h_5) \times 2^{0} \end{aligned}$$
(10)

LGC-AD

The LGC-AD operator [46, 66] computes feature values in four directions, namely the horizontal, vertical and two diagonal directions. Thus, the LGC-AD operator extends the LGC-FN operator by additionally considering the vertical gradient information. The coding formula for feature extraction through the LGC-AD operator is shown in Eq. (11). The computed LGC-AD feature value is shown in Fig. 3d.

$$\begin{aligned} \mathrm{{LGC-AD}}&= \sigma (h_1, h_2) \times 2^{11} + \sigma (h_3, h_4) \times 2^{10} \nonumber \\&\quad + \sigma (h_5, h_6) \times 2^{9} + \sigma (h_7, h_8) \times 2^{8} \nonumber \\&\quad + \sigma (h_1, h_7) \times 2^{7} + \sigma (h_3, h_5) \times 2^{6} \nonumber \\&\quad + \sigma (h_4, h_6) \times 2^{5} + \sigma (h_2, h_8) \times 2^{4} \nonumber \\&\quad + \sigma (h_1, h_8) \times 2^{3} + \sigma (h_3, h_6) \times 2^{2} \nonumber \\&\quad + \sigma (h_2, h_7) \times 2^{1} + \sigma (h_4, h_5) \times 2^{0} \end{aligned}$$
(11)

Limitations of existing descriptors

The existing feature descriptors have some limitations as follows:

  • CP generates six feature values, so its feature vector (fv) length is six times the fv length of LBP. It also takes more computation time than the traditional LBP operator.

  • The LGC method extracts features in a 3 \(\times \) 3 neighborhood using three groups of horizontal pixels and three groups of vertical pixels, but in the diagonal directions, only two groups of pixels are considered. As a result, the gradient information in the diagonal directions is not completely captured, which negatively impacts the recognition accuracy [66].

  • In the LGC-HD operator, the fv length is reduced compared to LGC, as it does not consider the gradient information computed in the vertical direction.

  • The LGC-FN operator does not consider the gradient information in the vertical direction, and the characteristics of the center pixel are also neglected.

  • The LGC-AD operator generates an fv of length 4096, which is very large compared to the traditional LBP operator.

  • Most of the existing edge-based methods generate unstable patterns in the smoother regions of an image. Also, some of the existing variants of binary patterns generate the same feature values for different image portions.

  • Deep learning techniques need high computational resources and much training data for effective expression recognition.

Considering all these observations, new feature descriptors are developed, which are discussed in detail in the next section.

Main contributions

The main contributions in this work are summarized as follows:

  • Local texture-based feature descriptors, namely RCP, CSP and their fusion RCSP are proposed and applied in a 5 \(\times \) 5 neighborhood for extracting facial features by considering both the neighboring pixels relationship and the adjacent pixels relationship.

  • RCP considers multi-radial and multi-orientation information and extracts two features by comparing the neighboring pixels with the current pixel in horizontal, vertical and diagonal directions.

  • CSP extracts one feature value by comparing the adjacent Knight pixel positions in a 5 \(\times \) 5 neighborhood. Along with the horizontal and vertical pixels, the CSP method also considers the pixels in four diagonal directions, in contrast to LGC, which compares pixels in only two directions. Upon considering the diagonal pixels in four directions, the experimental results also showed an enhanced recognition accuracy.

  • The proposed methods have been evaluated with different weights to find out the optimal recognition accuracy.

  • To evaluate and validate the efficiency of the proposed methods, the experiments are conducted on a variety of facial expression datasets which include datasets captured in the lab environment, dataset in the wild and also on an animated facial expression dataset.

  • The proposed methods, which are non-parametric methods, outperformed the standard existing methods proving the robustness of the proposed descriptors.

Proposed methodology

Designing a suitable and robust feature descriptor is of utmost relevance for any classification task. As with the CP, LGC-FN and LGC-AD operators, a 5 \(\times \) 5 neighborhood is considered in this work. A sample 5 \(\times \) 5 block (T) at the pixel coordinate (r, s) is shown in Eq. (12). The center pixel is denoted as Gc, as shown in Eq. (13). From Fig. 1a, it is observed that in the 5 \(\times \) 5 grid of CP, with reference to Gc, the Rook is placed at the horizontal and vertical pixel positions, the Bishop at the diagonal and anti-diagonal pixel positions and the Knight at the remaining pixel positions. The same positioning of Rook, Bishop and Knight as in CP is adopted while designing our feature descriptors. In CP, the pixel positions where the Rook, Bishop and Knight could be placed are numbered only in a clockwise manner, and the feature extraction process is explained based on this number assignment. For image processing applications, choosing the neighborhood size is crucial in the design phase of handcrafted feature descriptors. If a smaller neighborhood (3 \(\times \) 3) is chosen, the number of pixels considered is limited, and hence the accuracy obtained may not be optimal. In general, the more pixels involved in designing a kernel, the more accurate the classification. However, choosing a larger neighborhood (7 \(\times \) 7) increases the computation time. In this work, a 5 \(\times \) 5 neighborhood is chosen to incorporate both multi-radial and multi-orientation information for exploring wider information in a local neighborhood. By extracting features in such a manner, large inter-class distinctions and low intra-class variations can be achieved.

In this work, the 24 neighboring pixels surrounding the center pixel are divided into two groups, namely RCP and CSP. The group of pixels considered for RCP is the same as mentioned in CS-LGC [66] and MDCBP [22], but the methodology used for feature extraction is different. RCP extracts features by comparing 16 pixels with Gc, and CSP extracts features from the remaining 8 pixels. RCP is sub-divided into two groups, namely RCP1 and RCP2, each extracting one feature value by comparing 8 pixels with Gc. The process of feature extraction through RCP1, RCP2 and CSP is discussed in the subsequent subsections. To capture better texture details, the number assignment is done in both the clockwise (for CSP) and anti-clockwise (for RCP) directions. The numbering assigned to the Rook, Bishop and Knight is shown in Fig. 5a. From CP, the concept of comparing the pixel intensity with Gc is adopted while designing RCP, and from the LGC operators, the concept of comparing vertical, diagonal and horizontal pixel information is adopted while designing CSP. Thus, the feature descriptor is modelled by considering the advantages of both the CP and LGC operators.

$$\begin{aligned}&T = \begin{bmatrix} (r,s) & (r,s+1) & (r,s+2) & (r,s+3) & (r,s+4)\\ (r+1,s) & (r+1,s+1) & (r+1,s+2) & (r+1,s+3) & (r+1,s+4)\\ (r+2,s) & (r+2,s+1) & (r+2,s+2) & (r+2,s+3) & (r+2,s+4)\\ (r+3,s) & (r+3,s+1) & (r+3,s+2) & (r+3,s+3) & (r+3,s+4)\\ (r+4,s) & (r+4,s+1) & (r+4,s+2) & (r+4,s+3) & (r+4,s+4) \end{bmatrix} \end{aligned}$$
(12)
$$\begin{aligned}&G_c = T_{r+2,s+2} \end{aligned}$$
(13)
Fig. 4 Block diagram of the proposed method

Fig. 5 Process of feature extraction through Modified Chess Patterns. a Numbering scheme of chessmen followed for feature extraction through Modified Chess Patterns. b Feature extraction through RCP1. c Feature extraction through RCP2. d Feature extraction through CSP

Overview of the proposed method

Initially, the images from the standard facial expression datasets are given as input. Pre-processing is then done using the Viola-Jones algorithm [61] to extract the facial region. To maintain uniformity among all "in the lab" datasets, the images are resized to 120 \(\times \) 120 pixels. Histogram equalization is applied on the pre-processed images to normalize the illumination levels in an image. The proposed feature descriptors, namely RCP, CSP and their fusion RCSP, are applied over an input image to obtain feature response maps. The feature response maps generated using the proposed methods are divided into 'R' non-overlapping regions, each of size \(N \times N\). The features are extracted from each of these regions, and the feature vector is formed by concatenating the features obtained from all the regions. The feature vectors are obtained for both the training and the testing images. These feature vectors are then passed to a multi-class Support Vector Machine (SVM) classifier for expression classification. The block diagram of the proposed method is shown in Fig. 4.
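A minimal sketch of the block-wise feature construction described above is given below, assuming the response-map values fall in [0, levels-1] and that each \(N \times N\) region contributes one histogram; the function and variable names are illustrative.

```python
import numpy as np

def blockwise_histogram(response_map, block=8, levels=256):
    """Split a feature response map into non-overlapping block x block regions,
    histogram each region and concatenate the histograms (illustrative sketch)."""
    H, W = response_map.shape
    feats = []
    for r in range(0, H - block + 1, block):
        for c in range(0, W - block + 1, block):
            region = response_map[r:r + block, c:c + block]
            hist, _ = np.histogram(region, bins=levels, range=(0, levels))
            feats.append(hist)
    return np.concatenate(feats)

fmap = np.random.randint(0, 256, (120, 120))   # e.g. one RCP response map
print(blockwise_histogram(fmap).shape)         # (15*15*256,) for a 120x120 map
```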

Feature extraction through RCP1

The sign component has proven to be an effective factor in developing feature descriptors [22]. RCP1 contains a set of 8 pixels, which includes the 4 Rook pixels taken from the 3 \(\times \) 3 neighborhood (d = 1) and the 4 Bishop pixels taken from the 5 \(\times \) 5 neighborhood (d = 2). The numbering of the Rook and Bishop follows the anti-clockwise direction, following the Moore neighborhood [35]. The pixel positions corresponding to RCP1 are shown in Fig. 5b. Considering pixels in this manner enables better capture of the expression-specific texture information in eight directions (\(0^{\circ }\), \(90^{\circ }\), \(180^{\circ }\), \(270^{\circ }\), \(45^{\circ }\), \(135^{\circ }\), \(225^{\circ }\), \(315^{\circ }\)). The pixel intensities at these eight positions (R1,2,3,4, B5,6,7,8) are compared with the pixel intensity of Gc using the signum function shown in Eq. (2); if the obtained result is positive, the corresponding bit is encoded as one, otherwise it is encoded as zero. Thus, for the eight pixel positions, eight corresponding values (either 0 or 1) are obtained, which are concatenated to form an eight-bit binary pattern that is subsequently multiplied with the weight matrix (Wz). The corresponding equations for calculating the fv based on RCP1 are shown in Eqs. (14) and (15). Generally, the weight matrix contains binary weights [17, 35, 45]. Upon using binary weights (shown in Eq. 17), the fv length of RCP1 is 256, the same as LBP. To further reduce the fv length, different weights such as Fibonacci (shown in Eq. 18) [38], prime (shown in Eq. 19), natural (shown in Eq. 20), squares (shown in Eq. 21), odd (shown in Eq. 22) and even (shown in Eq. 23) have been considered. For the Fibonacci weights, the first eight Fibonacci numbers are used, as shown in Eq. (18); similarly, for the other weights, the first eight numbers of the corresponding series are used. At each time, Wz can take any one of these weight vectors for feature extraction. The value obtained after multiplying with Wz then replaces the value of Gc. The process of feature extraction through RCP1 for a numerical example is demonstrated in Fig. 6a, b.

$$\begin{aligned}&\mathrm{{RCP}}_{1} = \{\sigma (R_1, G_c), \sigma (R_2, G_c), \sigma (R_3, G_c), \nonumber \\&\qquad \qquad \sigma (R_4, G_c), \sigma (B_5, G_c), \sigma (B_6, G_c), \nonumber \\&\qquad \qquad \sigma (B_7, G_c), \sigma (B_8, G_c)\} \end{aligned}$$
(14)
$$\begin{aligned}&\mathrm{{RCP}}_{1} = \sum (\mathrm{{RCP}}_{1}\times W_{z}) \end{aligned}$$
(15)
$$\begin{aligned}&z = \{\mathrm{{binary,Fibonacci,prime,natural}}, \nonumber \\&\qquad \mathrm{{squares,odd,even}}\} \end{aligned}$$
(16)
$$\begin{aligned}&W_{\mathrm{{binary}}} = \begin{bmatrix} 1,&2,&4,&8,&16,&32,&64,&128 \end{bmatrix} \end{aligned}$$
(17)
$$\begin{aligned}&W_{\mathrm{{fibonacci}}} = \begin{bmatrix} 1,&1,&2,&3,&5,&8,&13,&21 \end{bmatrix} \end{aligned}$$
(18)
$$\begin{aligned}&W_{\mathrm{{prime}}} = \begin{bmatrix} 2,&3,&5,&7,&11,&13,&17,&19 \end{bmatrix} \end{aligned}$$
(19)
$$\begin{aligned}&W_{\mathrm{{natural}}} = \begin{bmatrix} 1,&2,&3,&4,&5,&6,&7,&8 \end{bmatrix} \end{aligned}$$
(20)
$$\begin{aligned}&W_{\mathrm{{squares}}} = \begin{bmatrix} 1,&4,&9,&16,&25,&36,&49,&64 \end{bmatrix} \end{aligned}$$
(21)
$$\begin{aligned}&W_{\mathrm{{odd}}} = \begin{bmatrix} 1,&3,&5,&7,&9,&11,&13,&15 \end{bmatrix} \end{aligned}$$
(22)
$$\begin{aligned}&W_{\mathrm{{even}}} = \begin{bmatrix} 2,&4,&6,&8,&10,&12,&14,&16 \end{bmatrix} \end{aligned}$$
(23)
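The following sketch illustrates Eqs. (14)–(23) for a single 5 \(\times \) 5 patch. The weight vectors are taken directly from Eqs. (17)–(23); the (row, column) offsets for the d = 1 Rook and d = 2 Bishop positions follow the textual description above, but the exact anti-clockwise ordering of Fig. 5b is an assumption.

```python
import numpy as np

WEIGHTS = {
    'binary':    [1, 2, 4, 8, 16, 32, 64, 128],
    'fibonacci': [1, 1, 2, 3, 5, 8, 13, 21],
    'prime':     [2, 3, 5, 7, 11, 13, 17, 19],
    'natural':   [1, 2, 3, 4, 5, 6, 7, 8],
    'squares':   [1, 4, 9, 16, 25, 36, 49, 64],
    'odd':       [1, 3, 5, 7, 9, 11, 13, 15],
    'even':      [2, 4, 6, 8, 10, 12, 14, 16],
}

# Assumed offsets: Rook at distance 1 (horizontal/vertical) and Bishop at
# distance 2 (diagonals); the anti-clockwise numbering of Fig. 5b is assumed.
RCP1_OFFSETS = [(0, 1), (-1, 0), (0, -1), (1, 0),     # R1..R4 (d = 1)
                (-2, 2), (-2, -2), (2, -2), (2, 2)]   # B5..B8 (d = 2)

def rcp1(patch, weight='binary'):
    """RCP1 code for one 5x5 patch, Eqs. (14)-(15)."""
    gc = patch[2, 2]
    bits = [1 if patch[2 + dr, 2 + dc] >= gc else 0 for dr, dc in RCP1_OFFSETS]
    return int(np.dot(bits, WEIGHTS[weight]))

patch = np.random.randint(0, 256, (5, 5))
print(rcp1(patch, 'natural'))   # value in [0, 36] for natural weights
```

RCP2 is computed in exactly the same way with the d = 1 Bishop and d = 2 Rook offsets.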
Fig. 6 Example of feature extraction through Modified Chess Patterns. a A sample 5 \(\times \) 5 image patch. b RCP1 = 10110011. c RCP2 = 00111001. d CSP = 00010010

Feature extraction through RCP2

RCP2 contains a set of 8 pixels, which includes the 4 Bishop pixels taken from the 3 \(\times \) 3 neighborhood (d = 1) and the 4 Rook pixels taken from the 5 \(\times \) 5 neighborhood (d = 2). The numbering of the Rook and Bishop follows the anti-clockwise direction, following the Moore neighborhood [35]. The pixel positions corresponding to RCP2 are shown in Fig. 5c. Considering pixels in this manner enables better capture of the information in eight directions (\(45^{\circ }\), \(135^{\circ }\), \(225^{\circ }\), \(315^{\circ }\), \(0^{\circ }\), \(90^{\circ }\), \(180^{\circ }\), \(270^{\circ }\)). The pixel intensities at these eight positions (B1,2,3,4, R5,6,7,8) are compared with the pixel intensity of Gc using the signum function shown in Eq. (2); if the obtained result is positive, the corresponding bit is encoded as one, otherwise it is encoded as zero. Thus, for the eight pixel positions, eight binary values are obtained, which are concatenated to form an eight-bit binary pattern that is subsequently multiplied with Wz. The value obtained after multiplying with Wz then replaces the value of Gc. The corresponding equations for calculating the fv based on RCP2 are shown in Eqs. (24) and (25). The process of feature extraction through RCP2 for a numerical example is demonstrated in Fig. 6a, c.

$$\begin{aligned} \mathrm{{RCP}}_{2}&= \{\sigma (B_1, G_c), \sigma (B_2, G_c), \sigma (B_3, G_c), \nonumber \\&\quad \sigma (B_4, G_c), \sigma (R_5, G_c), \sigma (R_6, G_c), \nonumber \\&\quad \sigma (R_7, G_c), \sigma (R_8, G_c)\} \end{aligned}$$
(24)
$$\begin{aligned} \mathrm{{RCP}}_{2}&= \sum (\mathrm{{RCP}}_{2}\times W_{z}) \end{aligned}$$
(25)

Feature extraction through RCP

RCP is obtained by concatenating RCP1 and RCP2. Thus, the RCP feature descriptor generates two feature values, one each for RCP1 and RCP2, and hence the fv length of RCP becomes 512 in the case of binary weights. The fv length becomes 110, 156, 74, 410, 130 and 146 when Fibonacci, prime, natural, squares, odd and even weights are used for feature extraction, respectively. Thus, using weights other than binary weights, even though two feature values are generated, the fv length of RCP is much smaller than the fv generated for one feature value by LBP or LDP in all cases (except when the squares weights are used for feature extraction). The corresponding equation for calculating RCP is shown in Eq. (26).

$$\begin{aligned} \mathrm{{RCP}} = \mathrm{{RCP}}_{1} \cup \mathrm{{RCP}}_{2} \end{aligned}$$
(26)
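These lengths can be checked by noting that each code is a weighted sum of eight bits, so its values range from 0 to the sum of the weights; the per-feature histogram therefore has sum(W) + 1 bins, and RCP concatenates two such histograms. A small check, reproducing the lengths quoted above:

```python
WEIGHTS = {
    'binary':    [1, 2, 4, 8, 16, 32, 64, 128],
    'fibonacci': [1, 1, 2, 3, 5, 8, 13, 21],
    'prime':     [2, 3, 5, 7, 11, 13, 17, 19],
    'natural':   [1, 2, 3, 4, 5, 6, 7, 8],
    'squares':   [1, 4, 9, 16, 25, 36, 49, 64],
    'odd':       [1, 3, 5, 7, 9, 11, 13, 15],
    'even':      [2, 4, 6, 8, 10, 12, 14, 16],
}

for name, w in WEIGHTS.items():
    per_feature = sum(w) + 1        # codes range from 0 to sum(w)
    print(name, 2 * per_feature)    # RCP concatenates RCP1 and RCP2
# binary 512, fibonacci 110, prime 156, natural 74, squares 410, odd 130, even 146
```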

Feature extraction through CSP

The process of feature extraction through CSP is inspired by the LGC-AD operator. The fv generated by LGC-AD has length 4096, which is very high, and the computational complexity involved in LGC-AD is also very high. From LGC-AD, the concept of comparing the horizontal, vertical and diagonal pixel information is adopted while designing the CSP operator. The number assignment for the Knight is done in a clockwise manner. The CSP operator captures the pixel information in four diagonal directions, two vertical directions and two horizontal directions. Thus, by comparing the pixels as shown in Fig. 5d, the fv length of CSP is 16 times smaller than the fv length of LGC-AD. The corresponding equations for calculating CSP are shown in Eqs. (27) and (28). The process of feature extraction through CSP for a numerical example is demonstrated in Fig. 6a, d.

$$\begin{aligned} \mathrm{{CSP}}&= \{\sigma (K_1, K_5), \sigma (K_2, K_6), \sigma (K_3, K_7), \nonumber \\&\quad \sigma (K_4, K_8), \sigma (K_1, K_6), \sigma (K_2, K_5), \nonumber \\&\quad \sigma (K_3, K_8), \sigma (K_4, K_7)\} \end{aligned}$$
(27)
$$\begin{aligned} \mathrm{{CSP}}&= \sum (\mathrm{{CSP}}\times W_{z}) \end{aligned}$$
(28)
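A minimal sketch of Eqs. (27)–(28) follows, here with natural weights. The eight Knight offsets and their clockwise labelling K1..K8 are assumptions standing in for Fig. 5a, which is not reproduced; only the pairing of positions follows Eq. (27) exactly.

```python
import numpy as np

# Assumed clockwise labelling of the eight Knight positions (Fig. 5a not
# reproduced); offsets are (row, col) relative to the 5x5 centre.
KNIGHT = {1: (-2, 1), 2: (-1, 2), 3: (1, 2), 4: (2, 1),
          5: (2, -1), 6: (1, -2), 7: (-1, -2), 8: (-2, -1)}

CSP_PAIRS = [(1, 5), (2, 6), (3, 7), (4, 8),   # centre-symmetric pairs
             (1, 6), (2, 5), (3, 8), (4, 7)]   # remaining pairs of Eq. (27)

def csp(patch, weights=(1, 2, 3, 4, 5, 6, 7, 8)):
    """CSP code for one 5x5 patch, Eqs. (27)-(28), here with natural weights."""
    k = {i: patch[2 + dr, 2 + dc] for i, (dr, dc) in KNIGHT.items()}
    bits = [1 if k[a] >= k[b] else 0 for a, b in CSP_PAIRS]
    return int(np.dot(bits, weights))

patch = np.random.randint(0, 256, (5, 5))
print(csp(patch))
```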

Feature extraction through RCSP

RCSP is obtained by concatenating RCP and CSP. Thus, RCP generates two feature values, one each for RCP1 and RCP2, and CSP generates one feature value; hence, the fv length of RCSP becomes 768 in the case of binary weights. When other weights are used, the fv length becomes 165, 234, 111, 615, 195 and 219 for Fibonacci, prime, natural, squares, odd and even weights, respectively. Thus, using weights other than binary weights, even though three feature values are generated, the fv length of RCSP is smaller than the fv generated for one feature value by LBP or LDP in all cases (except when the squares weights are used for feature extraction). The corresponding equation for calculating RCSP is shown in Eq. (29). The different steps involved in the calculation of the final feature histogram are shown in Fig. 7.

$$\begin{aligned} \mathrm{{RCSP }} = \mathrm{{RCP}} \cup \mathrm{{CSP}}. \end{aligned}$$
(29)
Fig. 7 Steps involved in obtaining the final feature vector. a Sample image from TFEID. b Obtained feature response map. c Dividing the response map into blocks. d Histograms for each block. e Final feature histogram

Experimental analysis

For experimental analysis and evaluation, a total of ten datasets, namely JAFFE [31], MUG [1], Extended Cohn–Kanade (CK+) [30], OULU VIS Strong [71], TFEID [6], KDEF [11], WSEFEP [37], ADFES [59], RAF [29] and FERG [3], have been considered. All the datasets other than RAF and FERG were captured in a lab environment; RAF contains images captured in the wild and FERG is an animated facial expression dataset. Thus, a variety of facial expression datasets have been considered for experimental evaluation. Anger, Disgust, Fear, Happy, Sad and Surprise are the basic expressions, and Neutral is considered as the seventh expression [9]. The number of images considered for experimental evaluation across datasets is shown in Table 1, where An. corresponds to Anger, Co. to Contempt, Di. to Disgust, Em. to Embarrass, Fe. to Fear, Ha. to Happy, Ne. to Neutral, Pr. to Pride, Sa. to Sad and Su. to Surprise.

Table 1 Number of images considered for experimental evaluation across datasets

Dataset description

JAFFE

The JAFFE (Japanese Female Facial Expression) dataset contains 213 facial images belonging to the basic six and neutral facial expressions, captured from ten Japanese female subjects. For each subject and expression, there are approximately four images available in the dataset. The hair of the female subjects was tied back so that the expressive regions of the face could be captured effectively.

MUG

Multimedia Understanding Group created the MUG dataset. This dataset contains image sequences collected from 86 subjects, of which 51 are men and the remaining 35 are women. The images are stored in .jpg format with image size being 896 \(\times \) 896 pixels. Among those 86 subjects, the images belonging to 45 subjects are chosen for experimental analysis. For each subject, there are five images for each expression.

CK+

CK+ dataset consists of 593 image sequences collected from 123 subjects. Each sequence begins with a neutral expression and ends with the apex of an expression. For each expression, the three apex frames from each sequence are chosen, and the neutral expression images are chosen from the onset of the image sequences [35].

OULU

The OULU-CASIA dataset has images captured from 80 subjects aged 23–58 years. The expressions are captured with both Near-Infrared (NIR) and Visible Light (VIS) cameras under different illumination conditions, namely strong, dark and weak. For each illumination condition, 480 video sequences were captured from those 80 subjects. The OULU VIS Strong subset has been considered for experimental analysis and evaluation. For the basic six expressions, the three peak frames of each expression are chosen, and the images for the neutral expression were collected from the onset of each recording session [35].

TFEID

The Taiwanese Facial Expression Image Database (TFEID), developed at the Brain Mapping Laboratory of National Yang-Ming University, has 7200 stimuli collected from forty subjects, of which twenty are male and twenty are female. The TFEID database has an eighth expression, contempt, in addition to the basic six expressions and the neutral expression.

KDEF

The Karolinska Directed Emotional Faces (KDEF) dataset, developed at the Karolinska Institute, Sweden, has 4900 images obtained from seventy subjects, of which 35 are male and 35 are female, aged 20–30 years. Each subject posed for seven expressions, and the images were collected from five different angles, namely \(-\, 90^{\circ }\), \(-\, 45^{\circ }\), \(0^{\circ }\), \(+\, 45^{\circ }\) and \(+\, 90^{\circ }\). KDEF stores its images in JPEG format with a size of 562 \(\times \) 762 pixels.

WSEFEP

WSEFEP stands for the Warsaw Set of Emotional Facial Expression Pictures; it has 210 images collected from 30 individuals, of which 14 are male and 16 are female. The subjects posed for all seven facial expressions.

ADFES

ADFES stands for the Amsterdam Dynamic Facial Expression Set. This dataset has three more expressions, namely contempt, embarrassment and pride, apart from the basic seven expressions, and contains 216 images. The dataset provides videos stored in MPEG-2 format as well as still pictures. The images were captured from 22 persons, of which 12 are male and 10 are female, aged between 18 and 25 years.

RAF

The RAF (Real-world Affective Faces) dataset has 29,672 real-world images captured in the wild. The RAF dataset has six basic emotions and twelve compound emotions; for our experimental evaluation, only the basic emotions are considered. This dataset has 12,271 aligned facial images for training and 3068 aligned images for testing, and the same training and testing splits have been utilized in our experimental evaluation. The image size considered for experimental evaluation is 100 \(\times \) 100.

FERG

FERG (Facial Expression Research Group Database) has 55,767 annotated facial images from six stylized characters. The characters were developed using MAYA software. The images from each character are categorized into seven expressions. The proposed methods have been mainly implemented on this dataset to evaluate their performance on cartoon characters. Out of 55,767 images, 48,767 are considered for training and the remaining 7000 (1000 from each expression, chosen randomly) are considered for testing purposes.

Table 2 Recognition accuracy of RCP with different weights for six expressions
Table 3 Recognition accuracy of CSP with different weights for six expressions
Table 4 Recognition accuracy of RCSP with different weights for six expressions

Experimental setup

Initially, the datasets are gathered, and for the datasets captured in the lab, pre-processing is done using the Viola-Jones algorithm [61] to extract the facial region. To maintain uniformity among the datasets, all the images are converted into grayscale and resized to 120 \(\times \) 120 pixels (as in RADAP [35]). The image size considered for the RAF dataset is 100 \(\times \) 100, whereas for the FERG dataset it is 48 \(\times \) 48. The feature extraction capability of the proposed methods is affected under illumination variations, and some of the datasets used for experimental analysis contain images of people of different races; hence, histogram equalization is performed to normalize the illumination levels in an image. Histogram equalization is chosen over other methods because it is useful for images whose backgrounds and foregrounds are both bright or both dark. Then, the RCP and CSP feature descriptors are used to extract the facial features, and feature response maps corresponding to RCP and CSP are generated. Each feature response map is then partitioned into R non-overlapping regions, where the size of each block is \(N \times N\). The block size is empirically chosen as 8 (i.e., N = 8) for all the datasets. The block-wise features extracted from each feature map are concatenated to form the final feature vector.

For the "in the lab" datasets, a person-independent (PI) scheme is adopted for experimental evaluation and analysis. In the PI scheme, a leave-one-subject-out strategy is followed (for all datasets except CK+), i.e., each time, all the images from one particular subject are excluded from training and used for testing. A ten-fold PI cross validation is performed for the CK+ dataset. By excluding a subject in this way, person independence is always ensured. For classification, a multi-class classifier model employing \(\gamma (\gamma -1)/2\) binary SVM models with a one-versus-one approach and a linear kernel is used, where \(\gamma \) is the total number of classes. The multi-class SVM is chosen for classification as it is the most widely used classifier for addressing the FER problem in the field of pattern recognition [23, 24, 35]. The experiments are performed using MATLAB R2018a on an i5 processor with the Windows 10 operating system and 16 GB RAM. The existing variants of binary patterns, such as LBP, LDP, LDN, CSLBP, LGC, LDTP, LDTerP, RADAP and CP, are implemented in our setup, and the corresponding recognition accuracies are reported. The recognition accuracy is computed as the mean of the accuracies obtained for each subject. The results generated in our setup might differ from the accuracies reported in the original papers because of the different image and block sizes considered for experimental evaluation. For methods other than the variants of binary patterns, the results are taken from their corresponding papers for comparison analysis.
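As an illustration of the classification stage, the sketch below uses scikit-learn rather than the MATLAB implementation described above; libsvm's SVC with a linear kernel is inherently one-versus-one and therefore fits \(\gamma (\gamma -1)/2\) binary models. The feature matrices, their dimensions and the labels are placeholders.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical feature matrices produced by the block-wise RCP/CSP/RCSP
# histograms; shapes and labels are placeholders for illustration only.
X_train = np.random.rand(200, 74)          # e.g. RCP with natural weights
y_train = np.random.randint(0, 7, 200)     # seven expression classes
X_test  = np.random.rand(50, 74)

# One-versus-one multi-class SVM with a linear kernel, as in the setup above.
clf = SVC(kernel='linear', decision_function_shape='ovo')
clf.fit(X_train, y_train)
pred = clf.predict(X_test)
print(pred[:10])
```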

Experiments for six expressions

The experiments for six expressions were conducted on all the datasets captured in the lab environment. The proposed methods have been implemented with different weights, and the results are tabulated. For each dataset, Table 2 shows the recognition accuracy of RCP, Table 3 that of CSP and Table 4 that of RCSP with different weights. In all the tables listed below, the weight that achieved the highest recognition accuracy is highlighted in bold. For some datasets, multiple weights achieved the highest recognition accuracy; in such cases, the weight with the smallest fv length is highlighted in bold. For example, in Table 2, for the WSEFEP dataset, RCP with natural and even weights achieved the highest recognition accuracy, but only RCP with natural weights is highlighted in bold, as its fv length is smaller than that with even weights.

Table 5 Comparison analysis with existing variants of binary patterns for six expressions

Among the proposed methods with different weights, CSP method with Fibonacci weights achieved highest recognition accuracy for JAFFE dataset. RCP method with natural weights achieved highest recognition accuracy for MUG, KDEF and WSEFEP datasets. RCP method with squares weights achieved highest recognition accuracy for TFEID and ADFES datasets. RCP method with prime weights achieved highest recognition accuracy for OULU-VIS dataset. RCSP method with squares weights achieved highest recognition accuracy for CK+ dataset. Hence, these methods are chosen for comparison analysis with the existing methods. In Table 5, the comparison analysis of the proposed methods with the existing variants of binary patterns, implemented in our environment setup is shown. In Table 6, the comparison analysis of the proposed method with the existing methods is shown.

The comparison analysis for JAFFE dataset with the existing variants of binary patterns is reported in the second column of Table 5. From Table 5, the proposed CSP method outperformed existing variants of binary patterns such as LDTP, LDTerP, RADAP and CP by 8.11%, 5.12%, 6.44% and 4.92% respectively. From Table 6, the proposed method could also outperform deep learning methods such as VGG19, ResNet50, IFRBC and WLGC-HD by 3.11%, 4.22%, 1.37% and 2.96% respectively. The comparison analysis for MUG dataset with the existing variants of binary patterns is reported in the third column of Table 5. From Table 5, the proposed RCP method outperformed existing variants of binary patterns such as LDTP, LDTerP, RADAP and CP by 6.03%, 7.92%, 4.59% and 1.84% respectively. From Table 6, the proposed method could also outperform deep learning methods such as VGG19, ResNet50, MobileNet and HiNet by 2.85%, 1.19%, 11.37% and 0.27% respectively.

The comparison results for CK+ dataset with the existing variants of binary patterns are reported in the fourth column of Table 5. From Table 5, the proposed RCSP method outperformed existing variants of binary patterns such as LDTP, LDTerP, RADAP and CP by 2.82%, 6.53%, 1.79% and 1.08% respectively. From Table 6, the proposed method could also outperform some deep learning methods such as VGG19, ResNet50, MobileNet and HiNet by 1.11%, 3.1%, 13.72% and 1.02% respectively. The comparison analysis for OULU-VIS Strong dataset with the existing variants of binary patterns is reported in the fifth column of Table 5. From Table 5, the proposed RCP method outperformed existing variants of binary patterns such as LDTP, LDTerP, RADAP and CP by 3.89%, 7.64%, 0.28% and 2.2% respectively. From Table 6, the proposed method could also outperform some deep learning methods such as VGG16, ResNet50, MobileNet and HiNet by 2.78%, 3.08%, 15.28% and 1.88%, respectively.

Table 6 Comparison analysis with the existing methods for six expressions
Table 7 Recognition accuracy of RCP with different weights for seven expressions
Table 8 Recognition accuracy of CSP with different weights for seven expressions
Table 9 Recognition accuracy of RCSP with different weights for seven expressions

The comparison analysis for TFEID dataset with the existing variants of binary patterns is reported in the sixth column of Table 5. From Table 5, the proposed RCP method outperformed existing variants of binary patterns such as LDTP, LDTerP, RADAP and CP by 2.33%, 4.83%, 1.41% and 2.66% respectively. The comparison analysis for KDEF dataset with the existing variants of binary patterns is reported in the seventh column of Table 5. From Table 5, the proposed RCP method outperformed existing variants of binary patterns such as LDTP, LDTerP, RADAP and CP by 0.95%, 3.09%, 2.14% and 0.48% respectively. From Table 6, the proposed method could also outperform the existing methods such as ICVR and IFRBC by 8.21% and 6.54% respectively.

Table 10 Comparison analysis with existing variants of binary patterns for seven expressions

The comparison analysis for the WSEFEP dataset with the existing variants of binary patterns is reported in the eighth column of Table 5. From Table 5, the proposed RCP method outperformed existing variants of binary patterns such as LDTP, LDTerP, RADAP and CP by 1.66%, 8.33%, 1.66% and 2.77%, respectively. The comparison analysis for the ADFES dataset is reported in the ninth column of Table 5. From Table 5, the proposed RCP method outperformed existing variants of binary patterns such as LDTP, LDTerP, RADAP and CP by 5.3%, 9.05%, 2.27% and 2.27%, respectively.

Experiments for seven expressions

The experiments for seven expressions were conducted on all ten datasets. For each dataset, Table 7 shows the recognition accuracy of RCP, Table 8 that of CSP and Table 9 that of RCSP with different weights. Among the proposed methods with different weights, the CSP method with Fibonacci weights achieved the best recognition accuracy for the JAFFE dataset, and the CSP method with natural weights for the WSEFEP dataset. The RCP method with binary weights achieved the best recognition accuracy for the OULU-VIS dataset, the RCP method with natural weights for the TFEID dataset and the RCP method with odd weights for the ADFES dataset. The RCSP method with natural weights achieved the best recognition accuracy for the MUG dataset, the RCSP method with squares weights for the CK+ dataset and the RCSP method with prime weights for the KDEF dataset. For the RAF dataset, the RCSP method with Fibonacci weights and, for the FERG dataset, the RCSP method with natural weights achieved the best recognition accuracy. Hence, these methods are chosen for comparison analysis with the existing methods. The comparison analysis of the proposed methods with the existing variants of binary patterns is shown in Table 10, and the comparison analysis with the existing methods is shown in Table 11.

The comparison analysis for the JAFFE dataset with the existing variants of binary patterns is reported in the second column of Table 10. From Table 10, the proposed CSP method outperformed existing variants of binary patterns such as LDTP, LDTerP, RADAP and CP by 10.87%, 10.49%, 5.99% and 3.69%, respectively. From Table 11, the proposed method could also outperform the existing methods such as VGG19, ResNet50, DLFS and PCANet by 0.77%, 5.06%, 1.39% and 3.84%, respectively. The comparison analysis for the MUG dataset with the existing variants of binary patterns is reported in the third column of Table 10. From Table 10, the proposed RCSP method outperformed existing variants of binary patterns such as LDTP, LDTerP, RADAP and CP by 5.05%, 5.64%, 3.49% and 2.49%, respectively. From Table 11, although the deep learning methods VGG19, ResNet50 and HiNet achieved 1.37%, 1.83% and 3.45% more than the proposed method, the proposed RCSP method is simple, and whenever natural weights are utilized, its fv length is much smaller than that of the traditional binary pattern variants.

The comparison analysis for the CK+ dataset with the existing variants of binary patterns is reported in the fourth column of Table 10. From Table 10, the proposed RCSP method outperformed the existing variants of binary patterns such as LDTP, LDTerP, RADAP and CP by 4.92%, 6.6%, 3.40% and 2.06%, respectively. From Table 11, the proposed method could also outperform deep learning methods such as VGG19, ResNet50, DLFS and PCANet by 9.29%, 0.69%, 4.28% and 9.16%, respectively. Although HiNet achieved 0.6% more than the proposed method, our method of feature extraction is simple and easily implementable. The comparison analysis for the OULU-VIS dataset with the existing variants of binary patterns is reported in the fifth column of Table 10. From Table 10, the proposed RCP method outperformed the existing variants of binary patterns such as LDTP, LDTerP, RADAP and CP by 6.85%, 11.18%, 1.37% and 3.63%, respectively. From Table 11, the proposed method could also outperform deep learning methods such as VGG19, ResNet50, MobileNet and HiNet by 5.21%, 10.31%, 15.31% and 3.71%, respectively.

Table 11 Comparison analysis with the existing methods for seven expressions

The comparison analysis for the TFEID dataset with the existing variants of binary patterns is reported in the sixth column of Table 10. From Table 10, the proposed RCP method outperformed existing variants of binary patterns such as LDTP, LDTerP, RADAP and CP by 3.04%, 6.01%, 2.92% and 2.92%, respectively. The comparison analysis for the KDEF dataset with the existing variants of binary patterns is reported in the seventh column of Table 10. From Table 10, the proposed RCSP method outperformed existing variants of binary patterns such as LDTP, LDTerP, RADAP and CP by 1.84%, 5.30%, 2.45% and 1.22%, respectively. From Table 11, the proposed method could also outperform the existing methods such as DLFS and PCANet by 4.05% and 13.06%, respectively.

The comparison analysis for the WSEFEP dataset with existing variants of binary patterns is reported in the eighth column of Table 10. From Table 10, although the RADAP method achieved 1.71% more than the proposed method, feature extraction using CSP is very simple, as natural weights are used. Moreover, RADAP uses binary weights and generates six feature values, so its fv length is larger than that of the proposed CSP method, which generates only one feature value. The comparison analysis for the ADFES dataset is reported in the ninth column of Table 10. From Table 10, the proposed RCP method outperformed existing variants of binary patterns such as LDTP, LDTerP, RADAP and CP by 8.45%, 14.29%, 3.25% and 1.95% respectively. From Table 11, the proposed method also outperformed deep learning methods such as GoogleNet, AlexNet, CNN and AFM by 7.68%, 5.73%, 7.15% and 1.46% respectively.

For the RAF dataset, the comparison analysis of the proposed methods with the existing methods is shown in Table 12. From Table 12, the proposed RCSP method outperformed some of the existing methods such as DLP-CNN, ICID Fusion, DCNN + RLPS and IFSL by 3.44%, 2.24%, 4.80% and 0.74% respectively. For the FERG dataset, the comparison analysis of the proposed methods with the existing methods is shown in Table 13. From Table 13, the proposed RCSP method outperformed existing methods such as DeepExpr, Ensemble Multi-feature, Adversarial NN, Deep Emotion, LBP-AW and WLGC-HD by 10.97%, 2.99%, 1.79%, 0.69%, 3.29% and 2.09% respectively.

Table 12 Comparison analysis with the existing methods on RAF dataset
Table 13 Comparison analysis with the existing methods on FERG dataset

Experiments for eight expressions

Table 14 Recognition accuracy comparison for eight expressions with different weights on TFEID Dataset
Table 15 Recognition accuracy comparison for ten expressions with different weights on ADFES Dataset
Table 16 Comparison with existing methods for eight expressions on TFEID dataset

The experiments for eight expressions were performed on the TFEID dataset. Apart from the six basic expressions plus neutral, this dataset has one more expression, namely contempt. For the experimental evaluation, 336 images belonging to eight expressions have been considered. The proposed methods have been tested with different weights and the results are tabulated in Table 14 for eight expressions and in Table 15 for ten expressions (on the ADFES dataset). From Table 14, the CSP method with prime weights achieved the best recognition accuracy and is therefore chosen for comparison analysis with the existing methods. The existing binary pattern variants have been implemented in our environment and the corresponding recognition accuracy is reported in Table 16. From Table 16, the proposed method outperformed recent methods such as RADAP and CP by 3.57% and 3.65% respectively.

Experiments for ten expressions

The experiments for ten expressions were performed on the ADFES dataset. Apart from the six basic expressions plus neutral, this dataset has three more expressions, namely contempt, embarrassment and pride. For the experimental evaluation, 215 images belonging to ten expressions have been considered. The proposed methods have been tested with different weights and the results for ten expressions are tabulated in Table 15. From Table 15, RCP with squares weights achieved the best recognition accuracy and is therefore chosen for comparison analysis with existing methods. The existing binary pattern variants have been implemented in our environment and the corresponding recognition accuracy is reported in Table 17. The results reported by Shojaeilingari et al. [48] for the methods LBP-TOP, OF PID and Landmark PID are taken directly for comparison analysis in Table 17. From Table 17, Landmark PID [48] achieved 0.70% more than the proposed method. Other than Landmark PID, the proposed method outperformed recent methods such as RADAP, CP, LBP-TOP and OF PID by 3.28%, 4.6%, 10.95% and 7.78% respectively.

Table 17 Comparison with existing methods for ten expressions on ADFES dataset

Deep neural network approaches are generally preferred over handcrafted methods. However, parameters such as batch size, learning rate, number of training images, image size, and number of trainable model parameters all affect the overall recognition accuracy of deep neural networks, so the accuracy of deep learning algorithms may vary based on these factors. The results reported for the deep learning methods are taken from the corresponding cited papers. The main advantage of the proposed methods is that they are easily implementable and extract simple, relevant features within a local neighborhood by considering both the neighboring pixels relationship and the adjacent pixels relationship, thereby capturing the finer appearance changes associated with specific facial expressions. Also, the proposed methods can learn from the available images in the dataset itself and classify the test data. The experiments are performed in a person-independent setup to simulate a real-world scenario. From the experimental results, the proposed methods performed well on a variety of facial expression datasets and outperformed standard existing methods, demonstrating the robustness of the proposed descriptors.
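As a rough illustration of this local feature extraction, the sketch below (Python/NumPy) computes per-pixel codes from a 5 \(\times \) 5 neighborhood: sixteen pixels along the radial directions are compared with the center (split here into an inner and an outer ring to yield two codes) and the eight Knight-position pixels are compared in center-symmetric pairs to yield one code. The exact pixel groupings, comparison rules and weights of RCP, CSP and RCSP follow the definitions given in the proposed-method section; the groupings, the greater-than-or-equal comparisons and the natural weights used below are our illustrative assumptions.

import numpy as np

# Illustrative sketch of a 5x5-neighborhood texture descriptor in the spirit of
# RCP/CSP. Offsets, pairings and weights below are assumptions made for this
# example only; the actual descriptors are defined in the proposed-method section.

RADIAL_INNER = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]
RADIAL_OUTER = [(-2, 0), (-2, 2), (0, 2), (2, 2), (2, 0), (2, -2), (0, -2), (-2, -2)]
KNIGHT = [(-2, 1), (-1, 2), (1, 2), (2, 1), (2, -1), (1, -2), (-1, -2), (-2, -1)]
WEIGHTS = np.arange(1, 9)  # natural weights, as an example

def weighted_code(bits, weights):
    # Weighted sum of the 0/1 comparison results.
    return int(np.dot(bits, weights[:len(bits)]))

def describe_pixel(patch):
    # patch: 5x5 neighborhood; returns (rcp_inner, rcp_outer, csp) codes.
    c = patch[2, 2]
    inner = np.array([patch[2 + dy, 2 + dx] >= c for dy, dx in RADIAL_INNER], dtype=int)
    outer = np.array([patch[2 + dy, 2 + dx] >= c for dy, dx in RADIAL_OUTER], dtype=int)
    # Center-symmetric comparison of the four opposite Knight-position pairs.
    pairs = [(KNIGHT[i], KNIGHT[i + 4]) for i in range(4)]
    cs_bits = np.array([patch[2 + a[0], 2 + a[1]] >= patch[2 + b[0], 2 + b[1]]
                        for a, b in pairs], dtype=int)
    return (weighted_code(inner, WEIGHTS),
            weighted_code(outer, WEIGHTS),
            weighted_code(cs_bits, WEIGHTS))

def describe_image(img):
    # img: 2-D grayscale array; borders of width 2 are skipped.
    h, w = img.shape
    codes = np.zeros((h - 4, w - 4, 3), dtype=int)
    for y in range(2, h - 2):
        for x in range(2, w - 2):
            codes[y - 2, x - 2] = describe_pixel(img[y - 2:y + 3, x - 2:x + 3])
    return codes

The per-pixel codes would then typically be histogrammed over image regions and the region histograms concatenated to form the fv, as is standard for binary pattern variants.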

Conclusion

The main objective of FER systems is to develop feature descriptors that can accurately classify facial expressions into various categories. Towards realizing this task, texture-based feature descriptors, namely RCP (which generates two feature values) and CSP (which generates one feature value), along with their fusion RCSP, are proposed in this work. The experiments are conducted using RCP and CSP independently and with their fusion RCSP using different weights on a variety of facial expression datasets, which include datasets captured in the lab, an ‘in the wild’ dataset and an animated facial expression dataset. From the experimental results, the proposed methods outperformed standard existing methods, proving the robustness of the proposed descriptors. In most of the experiments, RCP achieved better recognition accuracy than CSP or RCSP, and using weights other than binary resulted in enhanced performance with decreased fv length. The pixels along the radial directions proved to be efficient for capturing local minute details. As future work, novel graph-based feature descriptors with low dimensions can be proposed using those pixel positions. The proposed descriptors can also be extended to handle pose and illumination problems, which are more prevalent in the real world. Further research can be carried out on different weighting approaches to choose the best weights for various image processing applications.