Modified chess patterns: handcrafted feature descriptors for facial expression recognition

Facial expressions are predominantly important in social interaction as they convey the personal emotions of an individual. The main task in Facial Expression Recognition (FER) systems is to develop feature descriptors that can effectively classify facial expressions into various categories. In this work, towards extracting distinctive features, Radial Cross Pattern (RCP), Chess Symmetric Pattern (CSP) and Radial Cross Symmetric Pattern (RCSP) feature descriptors are proposed and implemented in a 5 × 5 overlapping neighborhood to overcome some of the limitations of existing methods such as Chess Pattern (CP), Local Gradient Coding (LGC) and its variants. In a 5 × 5 neighborhood, the 24 pixels surrounding the center pixel are arranged into two groups: RCP extracts two feature values by comparing 16 pixels with the center pixel, and CSP extracts one feature value from the remaining 8 pixels. Experiments are conducted using RCP and CSP independently, and also with their fusion RCSP using different weights, on a variety of facial expression datasets. The results obtained from the experimental analysis demonstrate the efficiency of the proposed methods.


Introduction
Facial expressions provide valuable information about a person by reflecting their psychological characteristics, and they provide an important means for effective communication [66]. Facial expressions expressed by humans are common across traditions and cultures, and they provide an immediate means to analyze the mood of a person [15]. Research suggests that about 55% of human communication happens through facial expressions alone [8]. Over the last decade, the field of Facial Expression Recognition (FER) has gathered a lot of attention from researchers because of its wide range of applications in driver mood detection, affective computing, clinical psychology, animation [35] etc. The relevance to day-to-day communication and advanced intelligent interactions between humans and machines are the important factors driving the study of FER [45].
The robustness of an FER system is hindered by inconsistent image acquisition conditions, spontaneous expressions, ethnicity variations, illumination variations, aging factors, noise variations etc. [33]. So, developing a feature descriptor that is robust to such dynamic changes is a complicated and challenging task. An FER system includes an image acquisition and pre-processing task, a feature extraction task for extracting expression-specific features and a classification task for classifying the expressions [35]. The classification task of an FER system is hugely dependent on the method used for feature extraction, as inappropriate feature extraction would degrade the performance even with the best of classifiers. So, a proper feature extraction technique that can effectively capture expression-specific changes is essential for an FER system [66].
A person's face can depict many expressions, and there are only minute differences between the various expressions conveyed by human beings. For accurate classification in FER, it is essential to capture the minute details related to specific expressions. Based on the literature, texture-based feature descriptors have proven effective for extracting valuable features from a facial image by manipulating the neighborhood pixel relationship to detect and capture minute details in an image [35,56,57]. In FER systems, the most significant problem is to extract effective and meaningful patterns from the facial images [56].
The neighboring pixels relationship and the adjacent pixels relationship are crucial for detecting the finer appearance changes associated with specific expressions, and hence for detecting facial expressions effectively. Accordingly, the proposed feature descriptors have been developed by considering the neighboring pixels relationship (both diagonal and four neighbors) for Radial Cross Pattern (RCP) and by considering the adjacent pixels relationship based on Knight pixel positions for Chess Symmetric Pattern (CSP). In this work, for the purpose of feature extraction, the 24 neighboring pixels surrounding the center pixel in a 5 × 5 neighborhood are arranged into two groups: RCP extracts two feature values by comparing 16 pixels with the center pixel, and CSP extracts one feature value from the remaining 8 pixels. The features are extracted using RCP and CSP independently and also with their fusion, named Radial Cross Symmetric Pattern (RCSP). The main strength of the proposed methods is that the relevant features can be extracted from the available images in the dataset without requiring much training data.
The remainder of the paper is structured as follows: the related work in the field of FER is summarized in the second section, and a brief review of existing descriptors that are a basis for the proposed methods are mentioned in the third section. The proposed feature descriptors for facial feature extraction are discussed in the fourth section. In the fifth section, the datasets considered for experimental evaluation and the comparison analysis of the proposed methods with the existing methods are reported. The concluding remarks and the suggestions for further analysis and study are mentioned in the last section.

Related work
The existing techniques in FER systems are broadly classified into geometric-based methods and appearance-based methods [35]. The geometric-based methods [7,13] encode the locations, shapes, corner and contour information of the main facial components such as the eyes, nose and mouth. These geometric-based methods encode characteristics that can describe the entire facial image with a smaller number of features that are scale and rotation invariant. Despite the fact that these geometric-based methods represent facial geometry, they fail to capture minute local details such as the skin's texture variations and ridges. The facial appearance is best described by appearance-based methods, which can further be classified into global (holistic)-based methods [4,58] and local-based methods [21,27,34,35,40–42]. Holistic (global) methods such as Eigenfaces [58] and Fisherfaces [4] apply projection-based techniques for producing a global description of the entire facial image. As these global-based methods are aimed at representing a facial image globally, they are unsuitable for capturing finer appearance changes corresponding to various facial expressions [41]. The local appearance-based methods [17,34,35,40–42,45] are aimed at investigating local regions to describe various features such as corners, curved and straight edges etc. Also, the local-based methods can capture micro-level texture information such as ridge details, specific skin changes and minute characteristics that are relevant to various facial expressions.
The research on local-based methods is being carried out in two directions, namely texture-based [35,45,67] and edge-based approaches [17,34,40–42]. Local Binary Pattern (LBP) [45] is the most popular texture-based method for facial feature extraction. LBP is computationally efficient and is also invariant to monotonic illumination changes. In cases of intensity fluctuations, random noise, and non-monotonic illumination levels, LBP's feature extraction capability is affected [74]. Lai et al. [27] proposed Center Symmetric LBP (CSLBP) for greatly reducing the feature vector length of LBP. In Local Directional Pattern (LDP) [17], the directions related to the top three of the eight obtained responses are encoded. Since Kirsch masks are applied on a local 3 × 3 neighborhood, the presence of noise or intensity distortions may affect the computation of the Kirsch response values. The top negative and positive Kirsch response values are encoded by the Local Directional Number (LDN) [41]. LDN is still affected by the noise present in the local neighborhood, even after preserving the top 'k' positive and negative Kirsch responses. The difference in intensity values of opposing pixels in the principal directions is encoded as numbers in Local Directional Texture Pattern (LDTP) [40]. Ryu et al. [42] proposed Local Directional Ternary Pattern (LDTerP) and a multi-level approach for efficiently encoding the emotion-related information. NEDP [16] considers the gradient of the center pixel as well as its neighbors, exploring a wider neighborhood to extract consistent features despite the presence of subtle distortions and random noise in a local region.
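As a point of reference for the texture-based descriptors above, here is a minimal LBP sketch (standard 3 × 3, 8-neighbor formulation; the function name and neighbor ordering are illustrative):

```python
import numpy as np

def lbp_code(patch):
    """Basic 3x3 LBP: threshold the 8 neighbors against the center
    pixel and weight the resulting bits with powers of two."""
    c = patch[1, 1]
    # Clockwise neighbor order starting at the top-left pixel.
    neighbors = [patch[0, 0], patch[0, 1], patch[0, 2], patch[1, 2],
                 patch[2, 2], patch[2, 1], patch[2, 0], patch[1, 0]]
    bits = [1 if n >= c else 0 for n in neighbors]
    return sum(b << i for i, b in enumerate(bits))

patch = np.array([[6, 5, 2],
                  [7, 6, 1],
                  [9, 8, 7]])
print(lbp_code(patch))  # an 8-bit code in [0, 255]
```

Replacing every pixel by its code and histogramming the result block-wise yields the 256-bin-per-block feature vector that the LGC family and the proposed descriptors refine.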
For capturing expression-specific changes, Murari et al. [35] proposed Regional Adaptive Affinitive Pattern (RADAP), which uses positional thresholds and multi-distance information for describing features that are robust to intra-class variations and illumination changes. Also, the XRADAP, ARADAP and DRADAP operators are obtained from RADAP by performing xor, adder and decoder operations respectively. Despite the existence of noise in a local neighborhood, Local Prominent Directional Pattern (LPDP) [33] explores local regions for extracting crucial information about edges. Micheal Revina et al. [39] proposed Multi-Directional Triangles Pattern (MDTP) for extracting features at the locations of the lips and eyes. Local Dominant Directional Symmetrical Coding Patterns (LDDSCP) [54] generates two feature values by partitioning the Kirsch response values into two symmetrical groups based on the directional information. Local Optimal Oriented Pattern (LOOP) [21] uses sorted Kirsch responses for weight assignment, rather than using sequential weights. In Center Symmetric Local Gradient Coding (CS-LGC) [66], the gradients are computed in four different directions in a center-symmetric manner.
Local Directional Maximum Edge Patterns (LDMEP) [32] applied Robinson's masks in a local neighborhood for extracting both magnitude and phase information. Kas et al. [22] proposed Multi-level Directional Cross Binary Pattern (MDCBP) for texture recognition by combining both multi-radius and multi-orientation information. Durga et al. [23] proposed LBP with Adaptive Window (LBP-AW) for noise robust facial feature extraction. Alphonse et al. [2] proposed Multi-Scale and Rotation-Invariant Phase Pattern (MRIPP) for extracting blur-insensitive and rotation invariant facial features. Kumar et al. [26] proposed Weighted Full binary Tree-Sliced Binary Pattern (WFBT-SBP) for analyzing an RGB image based on inter-pixel similarity patterns. Kola et al. [24] proposed fusion of both singular values and Wavelet-based Local Gradient Coding-Horizontal and Diagonal (WLGC-HD) features for effective FER.
Szegedy et al. [53] proposed GoogleNet for object detection and classification. Duc et al. [62] proposed fusion of AlexNet and Support Vector Machine (SVM) for effective facial feature extraction. Subramanian et al. [50] proposed Meta-Cognitive Neuro-Fuzzy Inference System (McFIS) for FER. Jung et al. [20] used Convolutional Neural Network (CNN) for detecting faces and Deep Neural Network (DNN) for recognizing facial expressions from those detected faces. Shojaeilangari et al. [48] proposed Landmark-based Pose Invariant feature Descriptor (PID) for handling continuous head pose variations. Zhao et al. [72] proposed LBP on three orthogonal planes (LBP-TOP) for dynamic texture recognition. Shojaeilangari et al. [47] proposed Optical Flow-based spatial temporal feature descriptor for representing the facial expressions.
Aneja et al. [3] proposed DeepExpr, a transfer learning technique to map expressions from humans to animated characters. Zhao et al. [73] proposed an instance-based transfer learning approach with multiple feature representations. Sun et al. [51] proposed Individual Free Representation-Based Classification (IFRBC) that utilizes Variation Training Set (VTS) and virtual VTS for remitting the side effects caused by the individual differences. Wu et al. [63] proposed Adaptive Feature Mapping (AFM) for transforming the feature distribution of testing samples into that of training samples. Li et al. [29] proposed Deep Locality Preserving Convolutional Neural Network (DLPCNN) to preserve the locality closeness by maximizing the inter-class scatters. Verma et al. [60] proposed variants of Hybrid Inherited Feature Learning Network (HiNet) for capturing the local contextual information of expressive regions. Ji et al. [18] proposed a fusion network based on intra category common and distinctive feature representation. For FER, Xie et al. [64] presented the Deep Attentive Multi-path Convolutional Neural Network (DAMCNN), which combines the Salient Expression Region Descriptor (SERD) with the Multi-Path Variation Suppressing Network (MPVS-Net).
Zeng et al. [70] proposed a framework for FER that combines both geometric and appearance-based features and utilized Deep Sparse Auto Encoders (DSAE) for recognizing the facial expressions. Drawing inspiration from the human vision system, Sadeghi et al. [43] proposed a method based on Gabor filters. Li et al. [28] used reinforcement learning for the selection of relevant images for expression classification. Saurav et al. [44] proposed the Dual Integrated Convolution Neural Network (DICNN) model for recognizing 'in the wild' facial expressions on an embedded platform. Jeen et al. [25] utilized subband selective multilevel stationary wavelet gradient transform features for recognizing facial expressions. Image filter-based Subspace Learning (IFSL) was proposed by Yan et al. [65] for better capturing the facial information. Feutry et al. [10] proposed a framework to learn anonymized representations of statistical data. Zeng et al. [69] proposed a novel pattern recognition-based method for accurate segmentation of test and control lines for quantitative analysis of gold immunochromatographic strips. Minaee et al. [36] proposed an Attentional Convolutional Network (ACN) model with fewer than ten layers for classifying emotions from facial images. Sun et al. [52] adopted Dictionary Learning Feature Space (DLFS) for training and Sparse Representation Classification for finding the emotion of query images. A novel Deep Belief Network (DBN)-based multi-task learning algorithm was proposed by Zeng et al. [68] for the diagnosis of Alzheimer's disease. There are various other CNN-based methods such as VGG [49], PCANet [5], ResNet [12] and MobileNet [14] which have shown promising results in FER.

Existing feature descriptors
In this section, the existing feature descriptors namely Chess Pattern (CP) [57], Local Gradient Coding (LGC) [55] and its variants are presented.

CP
Tuncer et al. [57] proposed CP, a local texture-based feature descriptor developed using chess game rules for texture recognition. With reference to the center pixel (G c) in a 5 × 5 neighborhood, feature values are extracted based on the movement rules of the chess pieces. Finally, all six extracted features are concatenated together to form a final feature vector. The process of feature extraction through CP is demonstrated through a numerical example in Fig. 1c–i. Thus, CP considers all the pixels present in a 5 × 5 neighborhood and extracts six features by considering sign information to encode the texture details present in an image. As CP considers binary weights, the feature vector length of CP is 256 × 6 = 1536, which is very high.
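The stated feature vector length can be checked with a short sketch: six 8-bit code maps each contribute a 256-bin histogram, and their concatenation gives 256 × 6 = 1536 dimensions (a numpy sketch, not the authors' implementation; names are illustrative):

```python
import numpy as np

def concat_histograms(code_maps, bins=256):
    """Concatenate one 256-bin histogram per 8-bit code map; with the
    six CP code maps this yields a 256 * 6 = 1536-dimensional vector."""
    hists = [np.bincount(m.ravel(), minlength=bins) for m in code_maps]
    return np.concatenate(hists)

rng = np.random.default_rng(0)
# Stand-ins for the six CP code maps of a 120 x 120 image.
maps = [rng.integers(0, 256, size=(120, 120)) for _ in range(6)]
fv = concat_histograms(maps)
print(fv.shape)  # (1536,)
```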

LGC
LGC [55] extracts texture features in a 3 × 3 neighborhood. The LGC operator encodes the gradient information in the horizontal, vertical and diagonal directions to generate an eight-bit binary number. The binary number thus formed is then converted into a decimal number, which replaces the center pixel. This process is repeated throughout the image, and the histogram features are concatenated block-wise to form the final feature vector. This LGC encoding captures consistent expression-specific texture features in all possible directions. The coding formula for feature extraction through the LGC operator is shown in Eq. (8). As a 3 × 3 mask is considered for LGC, the radius is d = 1 and the number of neighbors is p = 8. Some extensions of the LGC operator have also been proposed, namely the LGC-HD, LGC-FN and LGC-AD operators.
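A minimal sketch of this encoding (assuming the common LGC neighbor labeling g1–g8 read row by row around the center; the exact bit order of Eq. (8) may differ):

```python
import numpy as np

def s(a, b):
    # Sign comparison used throughout the LGC family.
    return 1 if a >= b else 0

def lgc_code(p):
    """LGC in a 3x3 patch p: three horizontal, three vertical and two
    diagonal end-pixel comparisons form an eight-bit code (one common
    formulation of Eq. (8))."""
    g1, g2, g3 = p[0]
    g4, _,  g5 = p[1]
    g6, g7, g8 = p[2]
    bits = [s(g1, g3), s(g4, g5), s(g6, g8),   # horizontal comparisons
            s(g1, g6), s(g2, g7), s(g3, g8),   # vertical comparisons
            s(g1, g8), s(g3, g6)]              # two diagonal comparisons
    return sum(b << (7 - i) for i, b in enumerate(bits))

p = np.array([[3, 8, 1],
              [5, 4, 9],
              [2, 7, 6]])
print(lgc_code(p))
```

Note that only two diagonal comparisons fit in the 8-bit code, which is exactly the incomplete diagonal coverage criticized in the limitations section below.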

LGC-HD
LGC-HD, also proposed by Tong et al. [55], further optimizes the LGC operator and decreases the feature vector length by considering the gradient information in the horizontal and diagonal directions only. The coding formula for feature extraction through the LGC-HD operator is shown in Eq. (9). In Fig. 2, the feature extraction of LGC and LGC-HD for a sample 3 × 3 numerical example is shown. As a 3 × 3 mask is considered for LGC-HD, d = 1 and p = 6.

LGC-FN
The LGC-FN operator [46,66] expands LGC by considering a 5 × 5 neighborhood size.
LGC-FN computes feature values in three directions, namely the horizontal direction and the two diagonal directions. The coding formula for feature extraction through the LGC-FN operator is shown in Eq. (10). A sample mask considered for the 5 × 5 operators is shown in Fig. 3a. In Fig. 3b, a sample 5 × 5 image patch is shown. In Fig. 3c, the computed feature value using LGC-FN is shown.

LGC-AD
The LGC-AD operator [46,66] computes feature values in four directions, namely the horizontal, vertical and two diagonal directions. Thus, the LGC-AD operator is an extension of the LGC-FN operator that additionally considers vertical gradient information. The coding formula for feature extraction through the LGC-AD operator is shown in Eq. (11). In Fig. 3d, the computed feature value using LGC-AD is shown.

Limitations of existing descriptors
The existing feature descriptors have some limitations, as follows:
- CP generates six feature values, so its feature vector (fv) length is six times the fv length of LBP. It also takes more computation time than the traditional LBP operator.
- The LGC method extracts features in a 3 × 3 neighborhood using three groups of horizontal pixels and three groups of vertical pixels, but in the diagonal directions, only two groups of pixels are considered. As a result, the gradient information in the diagonal directions is not completely captured, which negatively impacts the recognition accuracy [66].
- In the LGC-HD operator, the fv length is reduced compared to LGC, as it does not take into consideration the gradient information computed in the vertical direction.
By considering all this information, new feature descriptors are developed, which are discussed in detail in the next section.

Main contributions
The main contributions of this work are summarized as follows:
- Local texture-based feature descriptors, namely RCP, CSP and their fusion RCSP, are proposed and applied in a 5 × 5 neighborhood for extracting facial features by considering both the neighboring pixels relationship and the adjacent pixels relationship.
- RCP considers multi-radial and multi-orientation information and extracts two features by comparing the neighboring pixels with the current pixel in the horizontal, vertical and diagonal directions.
- CSP extracts one feature value by comparing the adjacent Knight pixel positions in a 5 × 5 neighborhood. Along with horizontal and vertical pixels, the CSP method also considers pixels in four diagonal directions, in contrast to LGC, which compares in only two diagonal directions. Upon considering the diagonal pixels in four directions, the experimental results showed an enhanced recognition accuracy.
- The proposed methods have been evaluated with different weights to find the optimal recognition accuracy.
- To evaluate and validate the efficiency of the proposed methods, experiments are conducted on a variety of facial expression datasets, including datasets captured in the lab environment, a dataset captured in the wild and an animated facial expression dataset.
- The proposed methods, which are non-parametric, outperformed the standard existing methods, proving the robustness of the proposed descriptors.

Proposed methodology
Designing a suitable and robust feature descriptor is of utmost relevance for any classification task. As with the CP, LGC-FN and LGC-AD operators, a 5 × 5 neighborhood is considered in this work as well. A sample 5 × 5 block (T) at the pixel coordinate (r, s) is shown in Eq. (12). The center pixel is denoted as G c, as shown in Eq. (13). From Fig. 1a, it is observed that in CP, in a 5 × 5 grid with reference to G c, the Rook is placed in the horizontal and vertical pixel positions, the Bishop is placed in the diagonal and anti-diagonal pixel positions and the Knight is positioned in the leftover pixel positions. The same positioning of Rook, Bishop and Knight as in CP is adopted while designing our feature descriptors. In CP, the pixel positions where the Rook, Bishop and Knight could be placed are numbered only in a clockwise manner, and the feature extraction process is explained based on this number assignment. For image processing applications, choosing the neighborhood size is crucial in the design phase of handcrafted feature descriptors. If a smaller neighborhood (3 × 3) is chosen, the number of pixels considered is limited, so the accuracy obtained may not be optimal. In general, the more pixels involved in designing a kernel, the more accurate the classification. However, choosing a larger neighborhood size (7 × 7) increases the computation time. In this work, a 5 × 5 neighborhood is chosen to incorporate both multi-radial and multi-orientation information, exploring wider information in a local neighborhood. By extracting features in such a manner, large inter-class distinctions and low intra-class variations can be achieved. The 24 neighboring pixels surrounding the center pixel are divided into two groups, namely RCP and CSP. The group of pixels considered for RCP is the same as mentioned in CS-LGC [66] and MDCBP [22], but the methodology used for feature extraction is different.
RCP extracts features by comparing 16 pixels with G c, and CSP extracts features from the remaining 8 pixels. RCP is subdivided into two groups, namely RCP 1 and RCP 2, each extracting one feature value by comparing the information of 8 pixels with G c. The process of feature extraction through RCP 1, RCP 2 and CSP is discussed in the subsequent subsections. To capture better texture details, number assignment is done in both the clockwise (for CSP) and anti-clockwise (for RCP) directions. The numbering assigned to the Rook, Bishop and Knight is shown in Fig. 5a. From CP, the concept of comparing the pixel intensity with G c is adopted while designing RCP, and from the LGC operators, the concept of comparing vertical, diagonal and horizontal pixel information is adopted while designing CSP. Thus, the feature descriptors are modelled by considering the advantages of both the CP and LGC operators.
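The chess-based grouping of the 24 neighbors can be sketched as follows (plain Python; the classification by row/column offsets mirrors the Rook, Bishop and Knight placement described above, and the function name is illustrative):

```python
# Classify the 24 neighbors of the center of a 5x5 grid by chess moves,
# mirroring the grouping used by RCP (Rook + Bishop) and CSP (Knight).
def chess_groups():
    rook, bishop, knight = [], [], []
    for r in range(5):
        for c in range(5):
            dr, dc = r - 2, c - 2          # offset from the center (2, 2)
            if (dr, dc) == (0, 0):
                continue                   # skip the center pixel itself
            if dr == 0 or dc == 0:
                rook.append((r, c))        # horizontal / vertical positions
            elif abs(dr) == abs(dc):
                bishop.append((r, c))      # diagonal / anti-diagonal positions
            else:
                knight.append((r, c))      # leftover "knight" positions
    return rook, bishop, knight

rook, bishop, knight = chess_groups()
print(len(rook), len(bishop), len(knight))  # 8 8 8
```

The 8 Rook and 8 Bishop positions together give the 16 pixels consumed by RCP, and the 8 Knight positions give the pixels consumed by CSP.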

Overview of the proposed method
Initially, the images from the standard facial expression datasets are given as input. Pre-processing is then done using the Viola–Jones algorithm [61] to extract the facial region. To maintain uniformity among all "in the lab" datasets, the images are resized to 120 × 120 pixels. Histogram equalization is applied on the pre-processed images to normalize the illumination levels in an image. The proposed feature descriptors, namely RCP, CSP and their fusion RCSP, are applied over an input image to get feature response maps. The feature response maps generated using the proposed methods are divided into 'R' non-overlapping regions, each of size N × N.
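The pre-processing and region-splitting steps above can be sketched as follows (numpy-only stand-ins; the face detection step is omitted, and the region size n = 20 is an assumed example value, not the paper's choice of N):

```python
import numpy as np

def equalize(img):
    """Histogram equalization on an 8-bit grayscale image (a numpy-only
    stand-in for the illumination normalization step)."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf = (cdf - cdf.min()) * 255 // max(cdf.max() - cdf.min(), 1)
    return cdf[img].astype(np.uint8)

def split_regions(resp, n):
    """Divide a feature response map into non-overlapping n x n regions."""
    h, w = resp.shape
    return [resp[r:r + n, c:c + n]
            for r in range(0, h - n + 1, n)
            for c in range(0, w - n + 1, n)]

img = np.random.default_rng(1).integers(0, 256, (120, 120)).astype(np.uint8)
regions = split_regions(equalize(img), 20)
print(len(regions))  # 36 regions of 20 x 20
```

A per-region histogram of the descriptor codes, concatenated over all R regions, forms the final feature vector.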

Feature extraction through RCP 1
The sign component has proved to be an efficient factor in developing feature descriptors [22]. RCP 1 contains a set of 8 pixels, which includes 4 pixels corresponding to the Rook, considered from the 3 × 3 neighborhood (d = 1), and the remaining 4 pixels corresponding to the Bishop, considered from the 5 × 5 neighborhood (d = 2). The numbering system for the Rook and Bishop follows the anti-clockwise direction, following Moore's neighborhood [35]. The pixel positions corresponding to RCP 1 are shown in Fig. 5b, and the process of feature extraction through RCP 1 for a numerical example is demonstrated in Fig. 6a, b.

Feature extraction through RCP 2
RCP 2 contains the complementary set of 8 pixels, namely the 4 Rook pixels from the 5 × 5 neighborhood (d = 2) and the 4 Bishop pixels from the 3 × 3 neighborhood (d = 1). The corresponding equations for calculating RCP 2 are shown in Eqs. (24) and (25). The process of feature extraction through RCP 2 for a numerical example is demonstrated in Fig. 6a, c.
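A hedged sketch of the RCP 1 comparisons follows (RCP 2 is analogous with the complementary pixel set; the bit ordering and default weights here are assumptions, not the exact numbering of Fig. 5b):

```python
import numpy as np

def rcp1_code(p, weights=None):
    """RCP 1 sketch: compare the four d = 1 Rook pixels and the four
    d = 2 Bishop (corner) pixels of a 5x5 patch with the center pixel.
    The pixel ordering and default binary weights are illustrative."""
    gc = p[2, 2]
    pixels = [p[1, 2], p[2, 1], p[3, 2], p[2, 3],   # Rook positions, d = 1
              p[0, 0], p[4, 0], p[4, 4], p[0, 4]]   # Bishop positions, d = 2
    bits = [1 if g >= gc else 0 for g in pixels]
    weights = weights or [1 << i for i in range(8)]  # binary weights default
    return sum(b * w for b, w in zip(bits, weights))

p = np.arange(25).reshape(5, 5)
print(rcp1_code(p))                     # binary weights
print(rcp1_code(p, list(range(1, 9))))  # natural weights, shorter fv
```

The `weights` parameter hints at how the different weight schemes evaluated in the experiments change the code range and hence the histogram (fv) length.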

Feature extraction through CSP
The process of feature extraction through CSP is inspired by the LGC-AD operator. The length of the fv generated by LGC-AD is 4096 (very high), and the computational complexity involved in LGC-AD is also very high. From LGC-AD, the concept of comparing the horizontal, vertical and diagonal pixel information is adopted while designing the CSP operator. The numbering assignment of the Knight is done in a clockwise manner. The CSP operator captures the pixel information in 4 diagonal directions, 2 vertical directions and 2 horizontal directions. Thus, by comparing the pixels as shown in Fig. 5d, the fv length of CSP is 16 times less than the fv length of LGC-AD. The corresponding equations for calculating CSP are shown in Eqs. (27) and (28). The process of feature extraction through CSP for a numerical example is demonstrated in Fig. 6a, d. The overall framework of the proposed method is illustrated in Fig. 7.
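A hedged sketch of CSP (the Knight positions are as described above, but the exact comparison pairing follows Fig. 5d in the paper; the adjacent-clockwise pairing below is an assumption):

```python
import numpy as np

# Knight positions around the center of a 5x5 patch, in clockwise order.
KNIGHT = [(0, 1), (0, 3), (1, 4), (3, 4), (4, 3), (4, 1), (3, 0), (1, 0)]

def csp_code(p):
    """CSP sketch: eight comparisons between adjacent Knight pixel
    positions form an eight-bit code, giving a 256-bin histogram
    (16 times shorter than LGC-AD's 4096)."""
    g = [p[r, c] for r, c in KNIGHT]
    bits = [1 if g[i] >= g[(i + 1) % 8] else 0 for i in range(8)]
    return sum(b << i for i, b in enumerate(bits))

p = np.arange(25).reshape(5, 5)
print(csp_code(p))  # an 8-bit code in [0, 255]
```

Whatever the exact pairing, eight sign comparisons over the Knight pixels yield a code in [0, 255], which is what produces the 16-fold reduction relative to LGC-AD's fv length.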

Experimental analysis
For experimental analysis and evaluation, a total of ten datasets, namely JAFFE [31], MUG [1], Extended Cohn-Kanade (CK+) [30], OULU VIS Strong [71], TFEID [6], KDEF [11], WSEFEP [37], ADFES [59], RAF [29] and FERG [3], have been considered. All the datasets other than RAF and FERG are captured in the lab environment. RAF contains images captured in the wild, and FERG is an animated facial expression dataset. Thus, a variety of facial expression datasets have been considered for experimental evaluation. Anger, Disgust, Fear, Happy, Sad and Surprise are the basic expressions, and Neutral is considered as the seventh expression [9]. The number of images considered for experimental evaluation across datasets follows [35].

OULU
The OULU-CASIA dataset has images captured from 80 subjects aged 23–58 years. The expressions are captured in both Near-Infra-Red (NIR) and Visual Light (VIS) scenarios under different illumination conditions, namely strong, dark and weak environments. For each illumination condition, 480 video sequences were captured from those 80 subjects. The OULU VIS Strong subset has been considered for experimental analysis and evaluation. For the basic six expressions, the three peak frames from each expression are chosen, and the images for the neutral expression were collected from the onset of each recording session [35].

ADFES
ADFES stands for the Amsterdam Dynamic Facial Expression Set. Apart from the basic seven expressions, this dataset has three more expressions, namely contempt, embarrassment and pride. The dataset contains 216 images, along with videos stored in MPEG-2 format. The images were captured from 22 persons (12 male and 10 female) aged between 18 and 25 years.

Experimental setup
Initially, the datasets are gathered, and for the datasets captured in the lab, pre-processing is done using the Viola–Jones algorithm [61] to extract the facial region. To maintain uniformity among the datasets, all the images are converted into grayscale and resized to 120 × 120 pixels (as in RADAP [35]). For "in the lab" datasets, a person-independent (PI) scheme is adopted for experimental evaluation and analysis. In the PI scheme, a leave-one-subject-out strategy is followed (for all datasets except CK+), i.e., at each time, all the images from one particular subject are excluded from training and are used for testing purposes. A ten-fold PI cross-validation is performed for the CK+ dataset. By excluding a subject in this way, person independence is always ensured. In this work, for the purpose of classification, a multi-class classifier model employing γ(γ − 1)/2 binary SVM models with a one-versus-one approach and a linear kernel is used, where γ corresponds to the total number of classes. The multi-class SVM is chosen for classification as it is the most widely used classifier for addressing the problem of FER in the field of pattern recognition [23,24,35]. The experiments are performed using MATLAB R2018a software on an i5 processor with the Windows 10 operating system and 16 GB RAM. The existing variants of binary patterns such as LBP, LDP, LDN, CSLBP, LGC, LDTP, LDTerP, RADAP and CP are implemented in our setup, and the corresponding recognition accuracy is reported. The recognition accuracy is computed as the ratio of correctly classified test images to the total number of test images.
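The evaluation protocol can be sketched as follows (plain Python; `loso_splits` and `ovo_model_count` are illustrative helper names, not the authors' code):

```python
from itertools import combinations

def loso_splits(subject_ids):
    """Leave-one-subject-out: each fold holds out every sample of one
    subject for testing and trains on all remaining subjects."""
    subjects = sorted(set(subject_ids))
    for held_out in subjects:
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield train, test

def ovo_model_count(n_classes):
    """One-versus-one SVM trains gamma * (gamma - 1) / 2 binary models."""
    return len(list(combinations(range(n_classes), 2)))

print(ovo_model_count(7))  # 21 binary SVMs for seven expressions
```

For seven expression classes, the one-versus-one scheme therefore trains 7 × 6 / 2 = 21 linear-kernel binary SVMs, and a test image is assigned the class that wins the most pairwise votes.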

Experiments for six expressions
The experiments for six expressions were conducted on all the datasets captured in the lab environment. The proposed methods have been implemented with different weights, and the results are tabulated. For each dataset, the recognition accuracy comparison of RCP is shown in Table 2, that of CSP in Table 3 and that of RCSP with different weights in Table 4. For all the tables listed below, the weight that achieved the highest recognition accuracy is highlighted in bold. For some datasets, multiple weights achieved the highest recognition accuracy; in such cases, the weight with the smallest fv length is highlighted in bold. For example, in Table 2, for the WSEFEP dataset, RCP with natural and even weights achieved the highest recognition accuracy, but only RCP with natural weights is highlighted in bold as its fv length is smaller compared to even weights. Among the proposed methods with different weights, the CSP method with Fibonacci weights achieved the highest recognition accuracy for the JAFFE dataset. The RCP method with natural weights achieved the highest recognition accuracy for the MUG, KDEF and WSEFEP datasets. The RCP method with squares weights achieved the highest recognition accuracy for the TFEID and ADFES datasets. The RCP method with prime weights achieved the highest recognition accuracy for the OULU-VIS dataset. The RCSP method with squares weights achieved the highest recognition accuracy for the CK+ dataset. Hence, these methods are chosen for comparison analysis with the existing methods. In Table 5, the comparison analysis of the proposed methods with the existing variants of binary patterns, implemented in our environment setup, is shown. In Table 6, the comparison analysis of the proposed methods with the existing methods is shown. The comparison analysis for the JAFFE dataset with the existing variants of binary patterns is reported in the second column of Table 5.
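The weight schemes referenced in Tables 2, 3 and 4 can be illustrated as follows (the exact weight vectors used in the paper are not reproduced here; these are plausible 8-element sketches, and the histogram length sum(w) + 1 assumes integer codes ranging from 0 to sum(w)):

```python
def weight_scheme(name, n=8):
    """Candidate 8-element weight vectors for the weighted bit sum
    (binary, natural, even, odd, squares, prime, Fibonacci); the exact
    vectors used by the paper may differ from these sketches."""
    schemes = {
        "binary":    [2 ** i for i in range(n)],
        "natural":   list(range(1, n + 1)),
        "even":      list(range(2, 2 * n + 1, 2)),
        "odd":       list(range(1, 2 * n, 2)),
        "squares":   [(i + 1) ** 2 for i in range(n)],
        "prime":     [2, 3, 5, 7, 11, 13, 17, 19][:n],
        "fibonacci": [1, 2, 3, 5, 8, 13, 21, 34][:n],
    }
    return schemes[name]

# Per-region histogram length = number of possible codes = sum(weights) + 1.
for name in ("binary", "natural", "fibonacci"):
    w = weight_scheme(name)
    print(name, sum(w) + 1)
```

This illustrates why natural weights are preferred on ties: they produce far fewer possible codes (37) than binary weights (256), and hence a much shorter fv.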
From Table 5, the proposed CSP method outperformed existing variants of binary patterns such as LDTP, LDTerP, RADAP and CP by 8.11%, 5.12%, 6.44% and 4.92% respectively. From Table 6, the proposed method also outperformed methods such as VGG19, ResNet50, IFRBC and WLGC-HD by 3.11%, 4.22%, 1.37% and 2.96% respectively. The comparison analysis for the MUG dataset with the existing variants of binary patterns is reported in the third column of Table 5. From Table 5, the proposed RCP method outperformed existing variants of binary patterns such as LDTP, LDTerP, RADAP and CP by 6.03%, 7.92%, 4.59% and 1.84% respectively. From Table 6, the proposed method also outperformed deep learning methods such as VGG19, ResNet50, MobileNet and HiNet by 2.85%, 1.19%, 11.37% and 0.27% respectively.
The comparison results for the CK+ dataset with the existing variants of binary patterns are reported in the fourth column of Table 5. From Table 5, the proposed RCSP method outperformed existing variants of binary patterns such as LDTP, LDTerP, RADAP and CP by 2.82%, 6.53%, 1.79% and 1.08% respectively. From Table 6, the proposed method also outperformed deep learning methods such as VGG19, ResNet50, MobileNet and HiNet by 1.11%, 3.1%, 13.72% and 1.02% respectively. The comparison analysis for the OULU-VIS Strong dataset with the existing variants of binary patterns is reported in the fifth column of Table 5. From Table 5, the proposed RCP method outperformed existing variants of binary patterns such as LDTP, LDTerP, RADAP and CP by 3.89%, 7.64%, 0.28% and 2.2% respectively. From Table 6, the proposed method also outperformed deep learning methods such as VGG16, ResNet50, MobileNet and HiNet by 2.78%, 3.08%, 15.28% and 1.88% respectively. The comparison analysis for the TFEID dataset with the existing variants of binary patterns is reported in the sixth column of Table 5. From Table 5, the proposed RCP method outperformed existing variants of binary patterns such as LDTP, LDTerP, RADAP and CP by 0.95%, 3.09%, 2.14% and 0.48% respectively. From Table 6, the proposed method also outperformed existing methods such as ICVR and IFRBC by 8.21% and 6.54% respectively.
The comparison analysis for WSEFEP dataset with the existing variants of binary patterns is reported in the eighth column of Table 5. From Table 5, the proposed RCP method outperformed existing variants of binary patterns such as LDTP, LDTerP, RADAP and CP by 1.66%, 8.33%, 1.66% and 2.77% respectively. The comparison analysis for ADFES dataset is reported in the ninth column of Table 5. From Table 5, the proposed RCP method outperformed existing variants of binary patterns such as LDTP, LDTerP, RADAP and CP by 5.3%, 9.05%, 2.27% and 2.27% respectively.

Experiments for seven expressions
The experiments for seven expressions were conducted on all the ten datasets. For each dataset, the recognition accuracy comparison with different weights is shown in Table 7 for RCP, in Table 8 for CSP and in Table 9 for RCSP. Among the proposed methods with different weights, the CSP method with Fibonacci weights achieved better recognition accuracy for JAFFE dataset and the CSP method with natural weights for WSEFEP dataset. The RCP method achieved better recognition accuracy with binary weights for OULU-VIS dataset, with natural weights for TFEID dataset and with odd weights for ADFES dataset. The RCSP method achieved better recognition accuracy with natural weights for MUG dataset, with squares weights for CK+ dataset and with prime weights for KDEF dataset. For RAF dataset, the RCSP method with Fibonacci weights and, for FERG dataset, the RCSP method with natural weights achieved better recognition accuracy; hence these methods are chosen for comparison analysis with the existing methods. The comparison analysis of the proposed methods with the existing variants of binary patterns is shown in Table 10 and the comparison analysis with the existing methods is shown in Table 11. The comparison analysis for JAFFE dataset with the existing variants of binary patterns is reported in the second column of Table 10. From Table 10, the proposed CSP method outperformed existing variants of binary patterns such as LDTP, LDTerP, RADAP and CP by 10.87%, 10.49%, 5.99% and 3.69% respectively. From Table 11, the proposed method could also outperform the existing methods such as VGG19, ResNet50, DLFS and PCANet by 0.77%, 5.06%, 1.39% and 3.84% respectively.
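The weight schemes referred to above (binary, natural, odd, prime, squares and Fibonacci) all turn a set of comparison bits into a single feature value by a weighted sum; only the weight vector changes. The following is an illustrative sketch, not the paper's implementation: the bit string and the exact Fibonacci/prime sequences used here are assumptions for demonstration, and the actual comparison rules follow the proposed descriptors.

```python
# Hypothetical weight tables for an 8-bit comparison pattern.
# The non-binary schemes produce a much smaller maximum code,
# which shortens the resulting histogram (fv length).
WEIGHTS = {
    "binary":    [1, 2, 4, 8, 16, 32, 64, 128],
    "natural":   [1, 2, 3, 4, 5, 6, 7, 8],
    "odd":       [1, 3, 5, 7, 9, 11, 13, 15],
    "prime":     [2, 3, 5, 7, 11, 13, 17, 19],
    "squares":   [1, 4, 9, 16, 25, 36, 49, 64],
    "fibonacci": [1, 2, 3, 5, 8, 13, 21, 34],
}

def feature_value(bits, scheme):
    """Weighted sum of the 8 comparison bits under the chosen scheme."""
    return sum(b * w for b, w in zip(bits, WEIGHTS[scheme]))

bits = [1, 0, 1, 1, 0, 0, 1, 0]          # example comparison result
print(feature_value(bits, "binary"))     # 77
print(feature_value(bits, "natural"))    # 15
```

With binary weights the codes span 0-255 (256 histogram bins), whereas natural weights span only 0-36 (37 bins), which is the source of the reduced fv length reported for the non-binary schemes.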
The comparison analysis for MUG dataset with the existing variants of binary patterns is reported in the third column of Table 10. From Table 10, the proposed RCSP method outperformed existing variants of binary patterns such as LDTP, LDTerP, RADAP and CP by 5.05%, 5.64%, 3.49% and 2.49% respectively. From Table 11, although the deep learning methods VGG19, ResNet50 and HiNet achieved 1.37%, 1.83% and 3.45% more than the proposed method, the proposed RCSP method is simple and, whenever natural weights are utilized, the fv length is much smaller than that of traditional binary pattern variants.
The comparison analysis for CK+ dataset with the existing variants of binary patterns is reported in the fourth column of Table 10. From Table 10, the proposed RCSP method outperformed the existing variants of binary patterns such as LDTP, LDTerP, RADAP and CP by 4.92%, 6.6%, 3.40% and 2.06% respectively. From Table 11, the proposed method could also outperform deep learning methods such as VGG19, ResNet50, DLFS and PCANet by 9.29%, 0.69%, 4.28% and 9.16% respectively. Although the method HiNet achieved 0.6% more than the proposed method, our method of feature extraction is simple and easily implementable. The comparison analysis for OULU-VIS dataset with the existing variants of binary patterns is reported in the fifth column of Table 10. From Table 10, the proposed RCP method outperformed the existing variants of binary patterns such as LDTP, LDTerP, RADAP and CP by 6.85%, 11.18%, 1.37% and 3.63% respectively. From Table 11, the proposed method could also outperform deep learning methods such as VGG19, ResNet50, MobileNet and HiNet by 5.21%, 10.31%, 15.31% and 3.71% respectively.
The comparison analysis for TFEID dataset with the existing variants of binary patterns is reported in the sixth column of Table 10. From Table 10, the proposed RCP method outperformed existing variants of binary patterns such as LDTP, LDTerP, RADAP and CP by 3.04%, 6.01%, 2.92% and 2.92% respectively. The comparison analysis for KDEF dataset with the existing variants of binary patterns is reported in the seventh column of Table 10. From Table 11, the proposed method could also outperform the existing methods such as DLFS and PCANet by 4.05% and 13.06% respectively. The comparison analysis for WSEFEP dataset with existing variants of binary patterns is reported in the eighth column of Table 10. From Table 10, although the RADAP method achieved 1.71% more than the proposed method, the feature extraction using CSP is very simple, as natural weights are used. Also, RADAP uses binary weights and generates six feature values, so its fv length is larger than that of the proposed CSP method, which generates one feature value. The comparison analysis for ADFES dataset is reported in the ninth column of Table 10. From Table 10, the proposed RCP method outperformed existing variants of binary patterns such as LDTP, LDTerP, RADAP and CP by 8.45%, 14.29%, 3.25% and 1.95% respectively. From Table 11, the proposed method could also outperform deep learning methods such as GoogleNet, AlexNet, CNN and AFM by 7.68%, 5.73%, 7.15% and 1.46% respectively. For RAF dataset, the comparison analysis of the proposed methods with the existing methods is shown in Table 12. From Table 12, the proposed RCSP method outperformed some of the existing methods such as DLP-CNN, ICID Fusion, DCNN + RLPS and IFSL by 3.44%, 2.24%, 4.80% and 0.74% respectively. For FERG dataset, the comparison analysis of the proposed methods with the existing methods is shown in Table 13.
From Table 13, the proposed RCSP method outperformed the existing methods such as DeepExpr, Ensemble Multi-feature, Adversarial NN, Deep Emotion, LBP-AW and WLGC-HD by 10.97%, 2.99%, 1.79%, 0.69%, 3.29% and 2.09% respectively.
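The fv-length advantage noted above (one feature value with natural weights for CSP versus six binary-weighted feature values for RADAP) can be made concrete with a simple back-of-the-envelope calculation. The sketch below is illustrative only: it assumes one histogram per feature map with bins equal to the maximum attainable code plus one, assumes 8-bit binary codes for RADAP, and omits the per-region multiplication common to such descriptors.

```python
# Illustrative fv-length estimate: feature maps x histogram bins.
# Assumptions (not from the paper's exact setup): one histogram per
# feature map, bins = max attainable code + 1, region count omitted.
def fv_length(num_feature_maps, max_code):
    return num_feature_maps * (max_code + 1)

# RADAP (assumed): six feature maps, binary weights over 8 bits -> 0..255
radap_len = fv_length(6, 255)
# CSP: one feature map, natural weights 1..8 -> codes 0..36
csp_len = fv_length(1, 36)
print(radap_len, csp_len)  # 1536 vs 37
```

Under these assumptions the CSP histogram is roughly forty times shorter per region, which is why the text favours CSP even where RADAP's accuracy is marginally higher.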

Experiments for eight expressions
The experiments for eight expressions are performed on TFEID dataset. Apart from the basic six plus neutral expressions, this dataset has one additional expression. The proposed methods have been tested with different weights and the results are tabulated in Table 14 for eight expressions and in Table 15 for ten expressions (for ADFES dataset). From Table 14, CSP method with prime weights achieved better recognition accuracy and hence is chosen for comparison analysis with the existing methods. The existing binary variants have been implemented in our environment and correspondingly the recognition accuracy is reported in Table 16. From Table 16, the proposed method outperformed recent methods such as RADAP and CP by 3.57% and 3.65% respectively.

Experiments for ten expressions
The experiments for ten expressions are performed on ADFES dataset. Apart from the basic six plus neutral expressions, this dataset has three more expressions, namely contempt, embarrassment and pride. For the experimental evaluation, 215 images belonging to ten expressions have been considered. The proposed methods have been tested with different weights and the results are tabulated in Table 15 for ten expressions. From Table 15, RCP with squares weights achieved better recognition accuracy and hence is chosen for comparison analysis with existing methods. The existing binary variants have been implemented in our environment and correspondingly the recognition accuracy is reported in Table 17. The results reported by Shojaeilingari et al. [48] for the methods LBP-TOP, OF PID and Landmark PID are taken directly for comparison analysis in Table 17. From Table 17, Landmark PID [48] achieved 0.70% more than the proposed method. Other than Landmark PID, the proposed method outperformed recent methods such as RADAP, CP, LBP-TOP and OF PID by 3.28%, 4.6%, 10.95% and 7.78% respectively. Deep neural network approaches are generally preferable to handcrafted methods. However, parameters such as batch size, learning rate, number of training images, image size, and number of trainable model parameters all affect the overall recognition accuracy of deep neural networks. As a result, the accuracy of deep learning algorithms may vary based on the above factors. The results reported for deep learning methods are taken from the corresponding cited papers. The main advantage of the proposed methods is that they are easily implementable and they extract simple and relevant features within a local neighborhood, considering both the neighboring pixels' and adjacent pixels' relationships to capture the finer appearance changes associated with specific facial expressions. Also, the proposed methods can learn from the available images in the dataset itself and classify the test data.
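The local-neighborhood extraction described above can be sketched as follows. This is a simplified illustration, not the paper's exact definition: the grouping assumed here places the 16 pixels lying along the eight radial directions of the 5x5 window into two rings for RCP (giving two feature values) and the remaining 8 knight-move positions into CSP (giving one feature value), with a simple greater-or-equal comparison against the center pixel and natural weights; the precise comparison rules and pixel orderings follow the paper.

```python
import numpy as np

# Assumed 5x5 offsets relative to the center pixel at (2, 2):
INNER_RING = [(-1,-1),(-1,0),(-1,1),(0,1),(1,1),(1,0),(1,-1),(0,-1)]
OUTER_RING = [(-2,-2),(-2,0),(-2,2),(0,2),(2,2),(2,0),(2,-2),(0,-2)]
CHESS_POS  = [(-2,-1),(-2,1),(-1,2),(1,2),(2,1),(2,-1),(1,-2),(-1,-2)]

def rcp_csp(patch):
    """patch: 5x5 grayscale block. Returns (rcp_inner, rcp_outer, csp):
    two RCP feature values from the radial pixels and one CSP feature
    value from the knight-move pixels, using natural weights 1..8."""
    c = patch[2, 2]
    weights = range(1, 9)
    def val(positions):
        return sum(w * int(patch[2 + r, 2 + s] >= c)
                   for w, (r, s) in zip(weights, positions))
    return val(INNER_RING), val(OUTER_RING), val(CHESS_POS)

patch = np.arange(25).reshape(5, 5)   # toy gradient patch
print(rcp_csp(patch))                 # (22, 22, 22)
```

Sliding this over the image with overlapping 5x5 windows would produce the feature maps whose histograms form the descriptor; the weighted-sum step is where the different weight schemes plug in.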
The experiments are performed in a person-independent setup to simulate a real-world scenario. From the experimental results, the proposed methods performed better on a variety of facial expression datasets and outperformed the standard existing methods, demonstrating the robustness of the proposed descriptors.

Conclusion
The main objective of FER systems is to develop feature descriptors that could accurately classify the facial expressions into various categories. Towards realizing this task, texture-based feature descriptors, namely RCP (generates two feature values) and CSP (generates one feature value) and their fusion RCSP, are proposed in this work. The experiments are conducted using RCP and CSP independently and with their fusion RCSP using different weights on a variety of facial expression datasets, which include datasets captured in the lab, an 'in the wild' dataset and an animated facial expression dataset. From the experimental results, it is observed that the proposed methods outperformed standard existing methods, demonstrating the robustness of the proposed descriptors. In most of the experiments, RCP achieved better recognition accuracy than CSP or RCSP, and using weights other than binary resulted in enhanced performance with decreased fv length. The pixels along the radial directions proved to be efficient for capturing local minute details. As a future work, using those pixel positions, novel graph-based feature descriptors with low dimensions can be proposed. The proposed descriptors can be extended in future to handle pose and illumination problems, which are more prevalent in the real world. Also, further research can be carried out on different weight approaches to choose the best weight for various image processing applications.

Conflict of interest On behalf of all authors, Mukku Nisanth Kartheek states that there is no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.