Weighted statistical binary patterns for facial feature representation

We present a novel framework for efficient and robust facial feature representation based on the Local Binary Pattern (LBP), called Weighted Statistical Binary Pattern, in which the descriptors exploit a straight-line topology along different directions. The input image is first decomposed into mean and variance moment images. A new variance moment, which highlights distinctive facial features, is prepared by extracting the root k-th of the original variance. Sign and Magnitude components are then constructed along four different directions from the mean moment, and each component is weighted according to the new variance moment. Finally, the weighted histograms of the Sign and Magnitude components are concatenated to build a novel histogram of Complementary LBP along different directions. A comprehensive evaluation on six public face datasets shows that the proposed framework outperforms state-of-the-art methods, achieving accuracies of 98.51% on ORL, 98.72% on YALE, 98.83% on Caltech, 99.52% on AR, 94.78% on FERET, and 99.07% on KDEF. The influence of color spaces and the issue of degraded images are also analyzed with our descriptors. These results, together with the theoretical underpinning, confirm that our descriptors are robust against noise, illumination variation, diverse facial expressions, and head poses.


Introduction
Artificial intelligence has been developing rapidly, with many real-world applications such as time series prediction [24], image classification [40,46], and smart cities [28]. Among them, personal identification using biometric traits has received increasing attention in the computer vision community. Among biometric characteristics, the face can be easily captured by a camera in a non-invasive acquisition process. Therefore, face recognition can be widely applied in public environments such as video surveillance, criminal detection, access control systems, and mobile device security [9]. Although diverse methods for face recognition have been introduced [18,26,36], they still have shortcomings, and face recognition remains a challenging topic. Figure 1 shows several face recognition challenges, such as facial expression, head pose, illumination, and background complexity; other difficulties include occlusion, aging, makeup, and image quality. These challenges remain formidable.
A face recognition application typically consists of face detection, feature extraction, and classification. The feature extraction stage plays a vital role, because the system will fail to achieve decent results when the employed feature descriptor is not adequate. Indeed, most well-known methods rely on feature descriptors that are highly discriminative and robust to extrinsic changes. In recent years, most face recognition algorithms aimed at robust and discriminative descriptors have followed three primary approaches: holistic, local, and hybrid models [23]. The holistic approach exploits the entire face and projects it into a small subspace, as in Eigenfaces in manifold space [45] and Fisherfaces [16,33]. The local approach considers certain facial features, as in Speeded-Up Robust Features (SURF) [17] and Local Binary Patterns (LBP) [22]. In the hybrid approach, local information is combined with holistic information to enrich feature descriptors and improve performance: examples include the fusion of 54 Gabor functions and fuzzy logic for facial expression recognition [15], the two-color local descriptor Color ZigZag Binary Pattern (CZZBP) [19], and the fusion of deep features [12].
Thanks to their low computational cost and efficient feature extraction capability, LBP-based methods have been studied and widely applied to many tasks, such as face recognition, facial expression classification, and texture classification. A large number of LBP variants and hybrid models based on LBPs have been introduced for face recognition [1,36]. However, they still have drawbacks, such as sensitivity to noise, loss of contrast information, and susceptibility to illumination variation. This paper proposes a weighted statistical binary pattern framework that improves the local descriptor in terms of discriminative power and robustness against noise and illumination variation.
This work extends our prior efforts, in which we considered neighborhoods in a straight-line topology [44], by utilizing more useful information for local feature descriptors through statistical binary patterns [14,37]. The proposed framework first considers two statistical moments (mean and variance) to eliminate noise and obtain complementary information. The proposed LBP variant is then applied to the first-moment image to produce LBP representations. The second-moment image serves as a complementary component for building the weighted histogram that incorporates the contribution of each pattern. The framework thus enriches local descriptors by utilizing both moments without increasing the dimension of the fused histogram. The present study addresses prior shortcomings and proposes an upgraded descriptor for face recognition. Its contributions are as follows:

- We present a straight-line topology approach with LBP by direction (denoted LBP α ), which is robust against several visual challenges such as noise, illumination, and facial expressions, as a base foundation.
- We propose a novel complementary LBP variant (denoted CLBP α ), inspired by the local difference magnitude-sign transform, to complement the information of the local descriptor.
- To extract more robust descriptors from the salient information in statistical moments, we propose the fused histogram of CLBP α , constructed using WSBP α to obtain enriched features.
- A comprehensive evaluation on six public datasets shows that our proposed framework outperforms state-of-the-art methods.
The paper is organized as follows. Section 2 reviews background on LBP. Section 3 details the proposed framework. Section 4 analyzes the implementation through several parameter settings. Experimental results are interpreted in Section 5, and a discussion of the proposed framework is given in Section 6. The last section presents our conclusions and future work.

Related works
Many methods based on the basic LBP descriptor, which encodes the local appearance through the relations between neighboring pixels, have been introduced. However, several shortcomings remain, such as local information loss and sensitivity to noise. Diverse LBP variants with new neighborhood topologies or encoding operators have been proposed to address them, such as Dominant Rotated Local Binary Patterns (DRLBP) [32] and Enhanced Line Local Binary Pattern (EL-LBP) [44].
Recently, several hybrid models based on LBP-like descriptors for face analysis have been examined and shown to have high discriminative power [22]. Lin et al. [27] proposed a fast algorithm, called the LBP edge-mapped descriptor, which fuses LBP and SIFT using the maxima of gradient-magnitude points in the image to illustrate facial contours for face recognition. Ding et al. [11] introduced the Dual-Cross Patterns (DCPs) as a core algorithm to extract facial features at both the holistic and component levels of a human face, then applied the first derivative of Gaussian to eliminate illumination differences. The Multi-scale Block Local Multiple Patterns (MB-LMP) [49] exploited multiple feature maps based on a modified Weber's ratio, then fused the histograms of non-overlapping patches for more robust features. Kas et al. [21] addressed the shortcomings of previous LBPs and proposed Mixed Neighborhood Topology Cross Decoded Patterns (MNTCDP), which considers multi-radial and multi-orientation information simultaneously to exploit the relationship between the reference point and its neighbors in each 5 × 5 pixel block. Inspired by LBP-like descriptors in face recognition, Shu et al. [43] proposed Equilibrium Difference LBP (ED-LBP) in multiple color channels (RGB, HSV, YCbCr), accompanied by an SVM classifier, for face spoofing detection. Unlike the traditional LBP circle, the Local Diagonal Extrema Number Pattern (LDENP) [42] descriptor encodes information only within the local diagonal neighbors, using first-order local diagonal derivatives to obtain a compact description for face recognition. Deng et al. [10] proposed an accurate face recognition method by exploiting compressive binary patterns (CBP) on a set of the first six random-field eigenfilters, which reduced the bit error rate of LBP-like descriptors and was more robust against additive Gaussian noise.
Following the LBP principle, another approach, the Local Gradient Hexa Pattern (LGHP) [6], encodes information by examining neighboring pixels at different distances across different derivative directions, generating features that discriminate well between inter-class facial images. Lu et al. [29] proposed an unsupervised feature learning method, Simultaneous Local Binary Feature Learning and Encoding (SLBFLE), which represents face images from raw pixels and jointly encodes a codebook for small regions to obtain highly discriminative descriptors.
Another line of work utilizes more information in the descriptors to overcome local information loss within images. For instance, the Completed LBP (CLBP) [14] applied the local difference sign-magnitude transform to obtain higher performance. A further improvement of CLBP, the statistical binary patterns model [37], was built on several statistical moments to produce robust descriptors and improve performance.

LBP
LBP was first introduced by Ojala et al. [38]. The LBP feature describes the spatial relationship in an image by encoding the neighbor points of a given central point. Let f be a 2D discrete image in Z 2 space. Then, the LBP encoding of f can be considered as a mapping from Z 2 to {0, 1} P :

LBP_{P,R}(c) = \sum_{p=0}^{P-1} s(g_p - g_c)\, 2^p, \qquad s(x) = \begin{cases} 1 & \text{if } x \ge 0 \\ 0 & \text{otherwise} \end{cases} \tag{1}

where g_c = f(c) is the intensity of the central point c, and the g_p are the intensities of the P neighbors measured on the circle of center c and radius R. The dimension of the LBP descriptor can be reduced by considering its uniform patterns, whose values satisfy U(LBP_{P,R}) ≤ 2, where U is defined by the following equation:

U(LBP_{P,R}) = \sum_{p=1}^{P} \left| LBP^{p}_{P,R} - LBP^{p-1}_{P,R} \right| \tag{2}

where LBP^p_{P,R} is the p-th bit of LBP_{P,R}, and LBP^P_{P,R} = LBP^0_{P,R}. LBP^{u2}_{P,R} [38] was a very robust and reliable descriptor for face representation or texture classification. As a result, the mapping from LBP_{P,R} to LBP^{u2}_{P,R} produces L = P(P − 1) + 3 distinct output values by building a lookup table of 2^P patterns. Therefore, the local descriptor is described as follows:

H_t = \sum_{x,y} T\!\left[ LBP_{P,R}(x, y) = t \right] \tag{3}

where

T[A] = \begin{cases} 1 & \text{if } A \text{ is true} \\ 0 & \text{otherwise} \end{cases} \tag{4}

in which H_t is the occurrence of the t-th LBP^{u2} code, t ∈ [0..L−1]. Therefore, the length of the histogram in the uniform LBP representation is L = P(P − 1) + 3.
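As a concrete check of the counts above, the sketch below (our Python illustration, sampling the circle with nearest-neighbor rounding rather than interpolation) computes an LBP code, the transition count U, and verifies that L = P(P − 1) + 3 for P = 8.

```python
import numpy as np

def lbp_code(patch, P=8, R=1):
    """LBP code of the center pixel of a (2R+1)x(2R+1) patch.

    Neighbors are sampled on the circle of radius R with nearest-neighbor
    rounding (a simplification; interpolation is also common).
    """
    c = patch[R, R]
    code = 0
    for p in range(P):
        dy = int(round(R * np.sin(2 * np.pi * p / P)))
        dx = int(round(R * np.cos(2 * np.pi * p / P)))
        code |= int(patch[R + dy, R + dx] >= c) << p
    return code

def transitions(code, P=8):
    """U(LBP_{P,R}): number of 0/1 transitions in the circular bit string."""
    bits = [(code >> p) & 1 for p in range(P)]
    return sum(bits[p] != bits[(p + 1) % P] for p in range(P))

# Uniform patterns satisfy U <= 2; there are P*(P-1) + 2 of them, and mapping
# all non-uniform codes to one extra bin gives L = P*(P-1) + 3 histogram bins.
P = 8
uniform = [c for c in range(2 ** P) if transitions(c, P) <= 2]
print(len(uniform))      # 58 = P*(P-1) + 2
print(len(uniform) + 1)  # 59 = L, the u2 histogram length
```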

Completed LBP
The Completed LBP (CLBP) [14] decomposes the local difference between each neighbor and its central point into two complementary components, Sign and Magnitude: d_p = g_p − g_c = s_p · m_p, with s_p = sign(d_p) and m_p = |d_p|. The Sign operator CLBP_S is identical to the original LBP, while the Magnitude operator is defined as

CLBP\_M_{P,R}(c) = \sum_{p=0}^{P-1} t(m_p, \bar{m})\, 2^p, \qquad t(x, c) = \begin{cases} 1 & \text{if } x \ge c \\ 0 & \text{otherwise} \end{cases}

where \bar{m} is the mean value of m_p from the whole image. Moreover, the last component C also carries discriminant information. Therefore, the CLBP_C operator is formulated as

CLBP\_C(c) = t(g_c, \bar{f}),

where \bar{f} is set as the mean gray level of the whole image. Because of the complementary relationship between these operators, the Completed LBP descriptor proves useful for the texture classification task.
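A minimal sketch of the sign/magnitude decomposition, assuming the same nearest-neighbor circle sampling as traditional LBP; `thresh_m` stands in for the image-wide mean magnitude \bar{m}:

```python
import numpy as np

def clbp_sm_codes(patch, thresh_m, P=8, R=1):
    """CLBP_S and CLBP_M codes at the center of a patch (sketch).

    The local difference d_p = g_p - g_c is split into sign and magnitude;
    CLBP_M thresholds the magnitudes at `thresh_m`, which stands in for
    the mean |d_p| over the whole image.
    """
    c = float(patch[R, R])
    s_code = m_code = 0
    for p in range(P):
        dy = int(round(R * np.sin(2 * np.pi * p / P)))
        dx = int(round(R * np.cos(2 * np.pi * p / P)))
        d = float(patch[R + dy, R + dx]) - c
        s_code |= int(d >= 0) << p              # sign component (same as LBP)
        m_code |= int(abs(d) >= thresh_m) << p  # magnitude component
    return s_code, m_code

patch = np.array([[9, 1, 9],
                  [1, 5, 1],
                  [9, 1, 9]])
print(clbp_sm_codes(patch, thresh_m=3.0))  # (170, 255)
```

Here the sign bits alternate (every other neighbor is brighter than the center), while every magnitude |d_p| = 4 exceeds the threshold, so all magnitude bits are set.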

Face representation based on LBPs
Face representation based on LBP descriptors was first introduced by Ahonen et al. [1], who analyzed small local regions of the face instead of striving for a holistic facial texture representation. In this local approach, a face image is partitioned into m non-overlapping patches R^{(j)} (j = 1..m), and an LBP operator is independently applied to each patch to produce local histograms. The aim is to fuse all LBP histograms into a single vector (also known as the local LBP descriptor) for facial texture representation. Concatenation is a simple and efficient approach for this. Each LBP histogram H^{(j)} of image patch R^{(j)} is computed by (3). Finally, the global LBP descriptor over all patches R^{(j)} is formulated as follows (T denotes the transpose operator):

H = \left[ H^{(1)T}, H^{(2)T}, \ldots, H^{(m)T} \right]^{T}

The resulting feature vector has size m × n, where n is the length of the LBP histogram for its topology. This approach to face representation is therefore more robust under variations such as pose or illumination. Notably, the small patches within an image can be of different sizes or can overlap. Many face recognition works have followed the local approach and obtained significant LBP variants [5,42,47,49].
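The patch-histogram concatenation can be sketched as follows (the 4 × 4 grid and the 59-bin u2 histogram are illustrative choices on our part, not fixed by the paper):

```python
import numpy as np

def local_descriptor(code_image, grid=(4, 4), n_bins=59):
    """Concatenate per-patch histograms of an LBP code image (sketch).

    `code_image` holds one u2-mapped code per pixel; the image is split
    into grid[0] x grid[1] non-overlapping patches, so the descriptor
    length is m * n with m = 16 patches and n = 59 bins here.
    """
    H, W = code_image.shape
    ph, pw = H // grid[0], W // grid[1]
    hists = []
    for i in range(grid[0]):
        for j in range(grid[1]):
            patch = code_image[i * ph:(i + 1) * ph, j * pw:(j + 1) * pw]
            hist, _ = np.histogram(patch, bins=n_bins, range=(0, n_bins))
            hists.append(hist)
    return np.concatenate(hists)

codes = np.random.randint(0, 59, size=(64, 64))
print(local_descriptor(codes).shape)  # (944,) = 16 patches * 59 bins
```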

Statistical moment images
Since f is a 2D discrete image in Z 2 space, we can obtain a real-valued image in R by a mapping technique. The spatial support employed to compute the local statistics is modeled as B ⊂ Z 2 , such that O ∈ B, where O is the origin of Z 2 [37]. Figure 2 illustrates how to construct a spatial support B.
The r-order moment image associated with f and B is also a mapping from Z 2 to R, defined as

m_r^{(f,B)}(c) = \frac{1}{|B|} \sum_{b \in B} \left[ f(c + b) \right]^r

where c is a pixel of Z 2 and |B| is the cardinality of the structuring element B. Accordingly, the r-order centered moment image (r > 1) is defined as

\mu_r^{(f,B)}(c) = \frac{1}{|B|} \sum_{b \in B} \left[ f(c + b) - m_1^{(f,B)}(c) \right]^r

where m_1^{(f,B)}(c) is the average value (1-order moment) calculated around c. Finally, the r-order normalized centered moment image (r > 2) is defined as

\beta_r^{(f,B)}(c) = \frac{\mu_r^{(f,B)}(c)}{\left[ \mu_2^{(f,B)}(c) \right]^{r/2}}

where μ 2 (f,B) (c) is the variance (2-order centered moment) calculated around c.
We propose the Weighted Statistical Binary Patterns by direction α (WSBP α ) descriptor to enhance the discriminant capability of LBPs for face recognition while reducing their sensitivity to representative challenges such as facial emotions, noise, or illumination. The descriptor encodes spatial information in a set of local statistical moment images and maps this coding to uniform LBP "u2" to produce a more compact descriptor. Because of their complementary and consistent characteristics, the two crucial components CLBP S α and CLBP M α (here, α is a given direction) are computed on the mean image and weighted by the variance image to improve performance. The details are given as follows.
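The two moment images used throughout the paper can be computed as below; the edge-padding border handling and the explicit offset list for B are our assumptions:

```python
import numpy as np

def moment_images(f, B):
    """First-order (mean) and second-order centered (variance) moment images.

    B is the spatial support: a list of (dy, dx) offsets containing (0, 0).
    Borders are handled by edge padding, which is our assumption.
    """
    f = f.astype(np.float64)
    R = max(max(abs(dy), abs(dx)) for dy, dx in B)
    fp = np.pad(f, R, mode="edge")
    H, W = f.shape
    stack = np.stack([fp[R + dy:R + dy + H, R + dx:R + dx + W]
                      for dy, dx in B])
    m1 = stack.mean(axis=0)                  # m_1: local mean image
    mu2 = ((stack - m1) ** 2).mean(axis=0)   # mu_2: local variance image
    return m1, mu2

# Circle support with (R, P) = (1, 8) plus the origin, i.e. a 3x3 block.
B = [(0, 0)] + [(int(round(np.sin(2 * np.pi * p / 8))),
                 int(round(np.cos(2 * np.pi * p / 8)))) for p in range(8)]
```

On a constant image the variance image is zero everywhere, which is why near-uniform regions receive low weight later in the framework.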

Local Binary Patterns by direction (LBP α )
In the original LBP and several variants, the neighbors g p have the coordinates (R cos(2πp/P ), R sin(2πp/P )) lying on a circle of radius R. In the proposed LBP α , we consider the relationship between pixels through a straight-line topology along a direction α, with the coordinate of c taken as (0, 0). The neighbors of the straight-line topology are defined as follows:

g_k^{\alpha} \text{ at coordinates } (\pm k \cos\alpha,\; \pm k \sin\alpha), \qquad k = 1, \ldots, P/2

When considering a line topology, the number of neighbors should be even, and the neighbors are bilaterally symmetric about the central point c. Figure 3 illustrates four LBP α i operators with 6 neighbors along a line topology.
Similar to the traditional LBP, we encode an image with the LBP α operator as defined in (1). It can be expressed as follows:

LBP_{P,R}^{\alpha}(c) = \sum_{p=0}^{P-1} s(g_p^{\alpha} - g_c)\, 2^p

where g p α is defined in (11), and the remaining variables f , c, P , and R are as in (1). The LBP α operator produces 2 P distinct patterns, which leads to a huge descriptor. Inspired by the LBP uniform principle in Section 2.1, we reduce the number of patterns by applying the uniform-patterns concept to LBP α . After this process, the LBP α "uniform patterns" have P (P −1)+3 distinct output values from a lookup table of 2 P values.
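A sketch of the LBP α encoding at a single pixel, assuming the neighbors sit at integer offsets ±k(cos α, sin α) for k = 1, …, P/2 (valid for α ∈ {0°, 45°, 90°, 135°}); the bit ordering below is our own choice for illustration:

```python
import numpy as np

def lbp_alpha(f, y, x, alpha_deg, P=6):
    """LBP_alpha code at (y, x): P neighbors on a line of direction alpha.

    Neighbors are assumed to lie at integer offsets +/- k*(cos a, sin a),
    k = 1..P/2, symmetric about the center (valid for 0/45/90/135 degrees).
    """
    dy = int(round(np.sin(np.radians(alpha_deg))))
    dx = int(round(np.cos(np.radians(alpha_deg))))
    c = f[y, x]
    code, bit = 0, 0
    for k in range(1, P // 2 + 1):
        for sgn in (+1, -1):  # bilateral symmetry around the center
            code |= int(f[y + sgn * k * dy, x + sgn * k * dx] >= c) << bit
            bit += 1
    return code

f = np.arange(49).reshape(7, 7)
print(lbp_alpha(f, 3, 3, 0))  # 21: only the "increasing" side sets bits
```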
The main difference between the circular LBP and LBP α is that LBP considers the spatial relationship on a circle, whereas LBP α exploits the spatial information of a straight line of neighbors along a given direction.
Although several primary factors, such as the direction of exposure, illumination, and facial expressions, pose challenges in face recognition, the LBP α -based representation is robust against changes in illumination and scale since it examines micro-patterns in a line topology. Moreover, by inheriting the advantages of traditional LBP, the proposed LBP α can characterize the distribution of local pixels along a direction, and the frequencies of occurrence of LBP α values can be used to represent various facial structures.

Complementary Local Binary Patterns by direction α (CLBP α )
The CLBP [14] has been used for texture classification by combining three operators, CLBP S, CLBP M, and CLBP C, in a joint or hybrid way. Similarly, we propose Complementary Local Binary Patterns by direction α (CLBP α ), which considers neighbors g p α along direction α for the face recognition task. The proposed CLBP α consists of two operators: CLBP α -Sign (CLBP S α ) and CLBP α -Magnitude (CLBP M α ). The CLBP S α operator is identical to the proposed LBP α described in (12) and captures the structure of image f with respect to local relationships, whereas CLBP M α complements it with the local difference magnitude, in a format consistent with CLBP S α . This operator is defined as follows:

CLBP\_M_{P,R}^{\alpha}(c) = \sum_{p=0}^{P-1} t(m_p^{\alpha}, \bar{m}^{\alpha})\, 2^p, \qquad t(x, c) = \begin{cases} 1 & \text{if } x \ge c \\ 0 & \text{otherwise} \end{cases}

where m_p^{\alpha} = |g_p^{\alpha} - g_c| and \bar{m}^{\alpha} is the mean value of m p α over the whole discrete image f . Each component S and M has P (P − 1) + 3 distinct values corresponding to the "uniform" LBP α coding of f . Inspired by the construction of CLBP descriptors [14], there are two ways to combine the components into enhanced descriptors. The first descriptor, CLBP S/M α , forms a joint 2D histogram from the CLBP S α and CLBP M α codes and has [P (P −1)+3] 2 values. The second descriptor, CLBP S M α , concatenates the two histograms and has 2[P (P − 1) + 3] values. The distribution of the first can become too sparse as the dimension (i.e., the number of neighbors P ) increases, whereas the marginal histogram of the second retains a reasonable size of 2[P (P − 1) + 3]. As a trade-off between performance and computational cost, the marginal-histogram approach is utilized in our experiments. Note that the component C, which expresses the local gray level in the image, is ignored in our proposed model. The proposed CLBP S M α provides a more reliable and expressive facial feature representation.

Weighted Statistical CLBP by directions α i (WSBP α i )
The introduction of two statistical moments (mean and variance) into an LBP-based operator was proposed as Statistical Binary Patterns (SBP) [37]. The first-order moment m 1 (mean) gives the contribution of individual pixel intensities over the entire image. The second-order moment μ 2 (variance) captures how each pixel varies from its neighbors and highlights the salient regions of an image. Our proposed WSBP builds a novel histogram from CLBP α i descriptors by computing the CLBP α i image on the first-order moment m 1 and counting the occurrences of every pattern in that image with a significance index corresponding to the salient regions given by the new second-order moment μ 2 . The proposed descriptor can thereby discard noise, illumination effects, and near-uniform regions. Figure 4 illustrates the flow diagram of our WSBP α i descriptors. From the mean image m 1 , the spatial relationships between local structures are represented using the CLBP α i operator to obtain the two essential components S α i and M α i . Then, each component is weighted by the contribution of every local pattern according to the new variance image μ 2 to form the weighted histogram.
Let H be the histogram vector of each component, and let (x, y) be the location of a pixel in each component of the CLBP α i (P ,R) image. The histogram of each component is then based on the contribution of every location (pixel) in the new variance moment μ 2 . Extending (4), the weighted occurrence of every CLBP α i (P ,R) code t is defined as follows:

H_t = \sum_{x,y} \mu_2(x, y) \cdot T\!\left[ CLBP^{\alpha_i}_{P,R}(x, y) = t \right] \tag{14}

The SBP descriptor [37] produces enhanced descriptors but gives all patterns the same weight, ignoring their significance. In contrast, the WSBP α i descriptors capture the local relationships within images through the mean moment, and exploit contrast and gradient-magnitude information through the variance moment to enhance the description of local relationships. Equation (14) describes how every pixel occurrence is weighted by its contribution according to the corresponding pixel in the new variance moment μ 2 . The histogram of each component S α i and M α i thus has P (P − 1) + 3 values, and the dimensionality of the WSBP α i descriptor is 2[P (P − 1) + 3] after the concatenation of the two histograms. As a result, the WSBP α i descriptor is not only compact but also robust to noise, illumination, and other variations.
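The weighted histogram of (14) reduces to a weighted bin count; a minimal sketch with illustrative values:

```python
import numpy as np

def weighted_histogram(code_image, weight_image, n_bins):
    """Histogram of CLBP codes in which each pixel contributes its weight.

    `weight_image` is the new variance moment, so pixels in salient
    regions count more than pixels in near-uniform regions.
    """
    return np.bincount(code_image.ravel(),
                       weights=weight_image.ravel(),
                       minlength=n_bins)

codes = np.array([[0, 1], [1, 2]])
w = np.array([[0.5, 2.0], [1.0, 0.25]])
print(weighted_histogram(codes, w, 3))  # bins: [0.5, 3.0, 0.25]
```

With unit weights this collapses to the plain SBP histogram, which is exactly the degenerate case the text contrasts against.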

The computational complexity
In this section, we address the computational complexity of the WSBP descriptor for an input image of size N × N. Suppose that the pre-defined spatial support B is defined as {(R 1 , P 1 ), (R 2 , P 2 )}, and that WSBP α is calculated by considering P neighbors. The computational complexity of the WSBP descriptor depends on the following factors.
-Construction of moment images: at each pixel, the mean value can be obtained in O(P 1 + P 2 ) operations, while the variance value requires O((P 1 + P 2 ) 2 ) operations. Therefore, the construction of the moment images can be done in O((P 1 + P 2 ) 2 N 2 ). -Encoding: CLBP α consists of the two components CLBP S α and CLBP M α . The first is calculated in O(P N 2 ); the second has the same complexity of O(P N 2 ). As a result, the overall complexity of the WSBP descriptor is O(((P 1 + P 2 ) 2 + P ) N 2 ), i.e., linear in the number of pixels.

Implementation
In this section, we detail the configuration of the WSBP descriptor.

The fusion of different descriptors WSBP α i
Suppose that WSBP α considers only one direction α; this could lead to an inadequate description, since such a descriptor would exploit only the local relationships along that direction. Our aim is for the descriptor to utilize every useful surrounding feature. Inspired by LBP operators on a circle topology (with scale (P, R) = (8, 1)), we propose to consider at least four directions for the fused histogram, α i ∈ {0°, 45°, 90°, 135°} (see Section 5). Figure 5 shows the S and M components of CLBP at the four directions {α i } as four views of a given image. The fusion of the four views yields an adequate descriptor for recognizing faces under illumination or head-pose variations. Such a WSBP can be expressed as follows:

WSBP = \left[ WSBP^{0°}, WSBP^{45°}, WSBP^{90°}, WSBP^{135°} \right]
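The fusion is a plain concatenation of the four per-direction histograms; for P = 6 the fused descriptor has 4 · 2 · (P(P − 1) + 3) = 264 values:

```python
import numpy as np

def fused_wsbp(histograms_by_direction):
    """Concatenate the per-direction WSBP histograms into one descriptor.

    Each entry is the [S; M] histogram of one direction alpha_i in
    {0, 45, 90, 135} degrees, so the fused length is 4 * 2 * (P*(P-1) + 3).
    """
    return np.concatenate(histograms_by_direction)

P = 6
per_direction = 2 * (P * (P - 1) + 3)  # 2 * 33 = 66 values per direction
hists = [np.zeros(per_direction) for _ in range(4)]
print(fused_wsbp(hists).shape)         # (264,)
```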

Moment parameters
For a successful implementation of our descriptor, a proper parameter setting has to be made. As a pre-processing step, the mean (m 1 ) and variance (μ 2 ) moments obtained by computing the spatial support B are used to reduce the noise sensitivity. Thus, moment parameters should be in the optimal settings for this purpose.
We define the structuring element as a circular spatial support B = {(R i , P i )}, where P i is the number of neighbors and R i the radius. Figure 6 shows an example of the two moment images using B = {(1, 8)}. Given that the second-order moment (variance) tends to emphasize only dominant edges, some potentially important information could be discarded. To handle this problem, we propose to perform an extraction of the root k-th of the variance moment, \tilde{\mu}_2 = (\mu_2)^{1/k} with k ∈ [2, 16]. For example, Fig. 6e shows the new variance moment built by extracting the root 9-th of the original one. With this method, more useful facial features, such as the eyes, nose, and mouth, are enhanced as salient regions. Thus, the weighted histogram can enrich the essential areas by exploiting the contribution of every statistical pattern in the variance image. In the next section, we show how the root 9-th extraction under the structuring element B = {(1, 6)} makes a substantial difference through a series of experiments with six public face datasets.
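The root k-th extraction is a single power operation; the sketch below (with illustrative values) shows how it compresses the dynamic range of the variance so that weaker facial structures survive alongside dominant edges:

```python
import numpy as np

def new_variance(mu2, k=9):
    """Root k-th extraction of the variance moment: (mu_2)^(1/k)."""
    return np.power(mu2, 1.0 / k)

mu2 = np.array([1e-4, 1.0, 1e4])
print(new_variance(mu2))  # roughly [0.36, 1.0, 2.78]: a 1e8 spread collapses
```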

Experiments
This section describes experiments with six face datasets: ORL, YALE, AR, Caltech, FERET, and KDEF. Our statistical feature descriptors were computed with the algorithm described above, and the features used in the experiments were the concatenation of the four directions from the CLBP α i operators. The fusion of different directions and components (S, M, m 1 , μ 2 ) would lead to a very long descriptor after the concatenation of histograms. To handle this, Principal Component Analysis (PCA) retaining 95% of the cumulative sum of eigenvalues was adopted for dimension reduction. For the classification task, linear SVMs were utilized.
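The reduction-and-classification stage can be sketched with scikit-learn (our choice of library; the paper does not name an implementation), where `PCA(n_components=0.95)` keeps enough components for 95% of the cumulative eigenvalue sum:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 264))   # 40 fused WSBP descriptors (illustrative)
y = np.repeat(np.arange(4), 10)  # 4 subjects, 10 images each

# PCA keeps components up to 95% of the cumulative explained variance,
# then a linear SVM classifies the reduced descriptors.
clf = make_pipeline(PCA(n_components=0.95), LinearSVC())
clf.fit(X, y)
print(clf.score(X, y))
```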

Databases and experimental protocols
The ORL dataset 1
Figure 11 shows several examples in which the original color images were converted to gray-scale and decomposed into mean and variance moment images. For this dataset, we conducted two experiments for a comprehensive evaluation: since the images are in color, the first experiment used gray-scale images following the same protocol as the other datasets, while the second was carried out on color channels to provide additional perspectives.
The FERET dataset 6 [41], collected in 15 sessions over four years, was a large benchmark used extensively for comparison. The dataset comprised a total of 14,126 images from 1,199 individuals. The subset adopted for our evaluations had 1,400 images of 200 subjects (7 images per person), including variations in pose, expression, and illumination. Figure 12 shows the images of one person under the 7 states.

Results with the ORL and YALE datasets
For the ORL dataset, N train training images per subject were randomly selected (N train = 2, 4, 5, 8), while the remaining (10 − N train ) images were used for testing. For the YALE dataset, N train training images per subject were randomly selected (N train = 2, 4, 6, 8), and the remaining (11 − N train ) images were used for testing. The measurements were repeated 100 times with data shuffling. The average classification rates are shown in Tables 1 and 2.
Tables 1 and 2 summarise our experimental results under various configurations. First, CLBP S(m 1 ) and CLBP M(m 1 ), as well as WSBP S and WSBP M, produced similar recognition results; CLBP S(m 1 ) and WSBP S played the major role in achieving good results, suggesting that the S component encodes more valuable information from each face image. Second, when utilizing both the mean (m 1 ) and variance (μ 2 ) moments as input for CLBP, CLBP S(m 1 , μ 2 ) and CLBP M(m 1 , μ 2 ) increased the classification rate markedly compared with considering only the first-order moment m 1 , and the fusion of the S and M components improved the performance further. Indeed, CLBP S M(m 1 ) produced better results: for instance, at (P, R) = (4, 2), the best results obtained by CLBP S M(m 1 ) with N = 2, 4, 5, and 8 training images on the ORL dataset were 86.22%, 96.2%, 98.25%, and 99.45%, respectively, while the WSBP descriptors reached 87.91%, 96.77%, 98.51%, and 99.41%. WSBP and CLBP S M(m 1 , μ 2 ) performed similarly. For the YALE dataset (Table 2), our WSBP outperformed the other methods with 91.95%, 97.14%, 98.72%, and 99.44% at the scale (P, R) = (4, 2). Our proposed method was also compared with other state-of-the-art methods, as shown in Table 3, which summarizes the technique and recognition rate of each method. The top ten methods, including ours, were based on hand-crafted features, and the three remaining ones on deep features. Our method achieved 98.51% and 98.72% recognition rates on the ORL and YALE datasets, higher than the other methods, suggesting that our descriptor was robust against visual challenges such as facial expressions, illumination, and head poses.

Results with Caltech 1999 and KDEF datasets
Since the number of images for each class in the Caltech 1999 dataset varied, we did not change the number of training images per class as in the previous experiments. Instead, we randomly chose half of the images in each class as the training set and used the remaining ones for testing. Table 4 shows our results for three configurations of (P, R), suggesting that our WSBP and CLBP S M(m 1 , μ 2 ) achieved the highest recognition rates. Table 5 compares our results with a deep learning approach based on Deep Stack Denoising Sparse Autoencoders (DSDSA) [13]. As can be seen from these tables, even using a single small scale (P, R) = (4, 2), our descriptors WSBP (Ours 1) and CLBP S M(m 1 , μ 2 ) reached recognition rates of 98.83% and 98.96%, respectively, exceeding the performance of DSDSA. When CLBP S M(m 1 , μ 2 ) was considered at the scale (P, R) = (6, 3), the performance was 99.03% (Ours 2).
For the KDEF dataset, we conducted the face recognition task by changing the number of training images per person to verify the accuracy of each train/test split. Several training images N train (N train = 2, 3, 4, 5) were randomly chosen, while the remaining (7 − N train ) images were used for testing. The evaluation was repeated 100 times with data shuffling to obtain the average accuracy. Table 6 shows our results for three configurations of (P, R). Specifically, at the scale (P, R) = (4, 2), our WSBP descriptor increased the accuracy up to 94.11%, 97.87%, 99.07%, and 99.33% with N train = 2, 3, 4, and 5, respectively. Such high performance suggests that our descriptor can effectively handle visual challenges such as diverse facial expressions, illumination, or occlusions.

Evaluation with gray-scale images
We carried out cross-validation for training and testing. Different numbers of training images (N train = 10, 13, 15, 20) were used, with the remaining (26 − N train ) images reserved for testing to guarantee that the test set contained unseen images. Results over 100 shuffle splits are summarized in Table 7 (recognition rates) and Table 8 (comparison with other methods). Table 7 shows the results of several LBPs obtained with various parameters. As can be seen, CLBP S(m 1 ) and CLBP M(m 1 ) provided the baseline results. With the specific parameter (P, R) = (6, 2), the best results obtained by CLBP S(m 1 ) with N train = 10, 13, 15, 20 were 82.87%, 88.83%, 90.79%, and 95.90%, respectively; similarly, CLBP M(m 1 ) reached 80.09%, 86.22%, 89.83%, and 95.67%. However, the recognition rates improved significantly when complementing the S and M components bilaterally with CLBP S M(m 1 ), yielding 98.46%, 98.68%, 99.04%, and 99.93%, respectively. In this case, CLBP S M(m 1 ) achieved a significant improvement of up to 12.46% at N train = 13 (compared to CLBP M(m 1 ) and CLBP S(m 1 )). Moreover, the CLBP S M(m 1 , μ 2 ) and WSBP descriptors further increased the performance, reaching 98.79% and 99.37%. With this parameter, our proposed WSBP framework outperformed CLBP S M(m 1 ) (by 0.69%) and CLBP S M(m 1 , μ 2 ) (by 0.58%). Notice that the improvement of WSBP could reach 13% compared with the original LBPs. Table 8 compares our method with others. In terms of recognition rates, ours outperformed the state-of-the-art methods, including hand-crafted-feature and deep-feature techniques. Our WSBP was also better than the Multi-resolution dictionary [30] (82.19%), MNTCDP [21] (96.18%), Local Multiple Patterns [49] (98.00%), and even the deep facial features CS [2] (93.99%) by a substantial margin.
The remaining algorithms, including EL-LBP [44] (98.27%) and deep feature FDDL + CNN [39] (98%) were comparable with our descriptors, and yet ours prevailed.

Evaluation with color channel
The motivation of this experiment was to examine the behavior of the facial descriptors on a color channel. The experiment was conducted with HSV color images, keeping the other experimental settings the same as in the gray-scale case. First, an RGB color image was converted into an HSV color image. Second, the Hue channel was extracted from the HSV space, called the H image, and fed as input to our experiment. Our descriptors were able to extract the eyes, eyebrows, and mouth from the H image, probably because these areas have distinctive colors. Yet the Sign (S) and Magnitude (M) components, computed by CLBP α on m 1 , could not discriminate the subtle color changes occurring within the facial skin area, as shown in Fig. 13.
(Fig. 11: AR dataset samples decomposed into pairs of moment images (mean m 1 , variance μ 2 ) for each two-row pair. Fig. 12: FERET dataset samples decomposed into pairs of moment images (mean m 1 , variance μ 2 ) for each two-row pair.)
The evaluation with the color channel was conducted under the same protocol settings as the gray-scale case. Results over 100 shuffle splits are summarized in Table 9, with the specific parameters (P, R) = (4, 2) and N_train = 13 chosen from Table 9, suggesting that S worked better than M in three cases. CLBP_α, which was inspired by CLBP [14], was designed for the gray-scale case to complement the crucial M component; it was not very effective in discriminating the color change within the facial skin (see the Magnitude components of the H and gray-scale images in Fig. 13). On the other hand, the S component worked very well on the H image by utilizing the statistical moments (m1, μ2), since its accuracy was comparable with the state-of-the-art methods. For instance, the accuracy of CLBP_S(m1, μ2) and WSBP_S reached 93.56% and 90.28%, respectively, while CLBP_S(m1) reached only 49.41%; that is, CLBP_S(m1, μ2) and WSBP_S were better than CLBP_S(m1) by margins of 44.15% and 40.87%, respectively. These results suggest that our descriptor extracts the spatial relationship of neighboring pixels rather than simply discriminating the magnitude difference between pixels.
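The Sign/Magnitude decomposition discussed above follows the generic CLBP scheme [14]: local differences d_p = g_p - g_c are split into a binary Sign component and a Magnitude component thresholded by a global value. The sketch below shows that decomposition for a single pixel neighborhood; it is a simplified illustration, not the paper's directional CLBP_α operator, and the names are ours.

```python
def sign_magnitude(center, neighbors, global_threshold):
    """Decompose local differences d_p = g_p - g_c into the binary Sign (S)
    and Magnitude (M) components of CLBP-style descriptors.
    S_p = 1 if d_p >= 0; M_p = 1 if |d_p| >= c, where c is a global
    threshold (e.g., the mean absolute difference over the whole image)."""
    diffs = [g - center for g in neighbors]
    s = [1 if d >= 0 else 0 for d in diffs]
    m = [1 if abs(d) >= global_threshold else 0 for d in diffs]
    return s, m

# Toy neighborhood: differences are [10, -10, 5, -30], threshold c = 12.
s, m = sign_magnitude(120, [130, 110, 125, 90], 12.0)
# s -> [1, 0, 1, 0], m -> [0, 0, 0, 1]
```

On a Hue image, S still encodes which neighbor is larger, whereas M depends on the difference magnitude, which is nearly uniform across facial skin; this is consistent with the observation above that S discriminates well in Hue space while M does not.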
The experiment was carried out with N_train randomly chosen training images per class (N_train = 1, 2, 3, 4, 5, 6) and N_test = 7 - N_train testing images, averaging the accuracy over 100 splits. Table 10 illustrates the results achieved with the CLBP_α_i operators at various scales of (P, R). In most cases, WSBP obtained the best results and reached over 90% accuracy with only 2 training images. Table 11 compares a few recent methods and shows that our descriptors achieved the best performance. In detail, WSBP with 3 training images exceeded MNTCDP [21] by 2.57% on the challenging FERET dataset, whose images exhibit multiple orientations. As mentioned above, FERET has two different evaluation protocols, and comparing methods under different protocols would not be fair. Nevertheless, it is interesting to relate our results to previous reports. For this purpose, we computed the average accuracy based on recent reports: CLBP [10], SLBFLE [29], and WPCBP+FLD (HI) [47] (see Table 11), wherein these methods performed efficiently on the subset of frontal face cases.

Robustness against degraded images
In practical surveillance scenarios, image degradation often happens during the acquisition process and can significantly affect system performance. The motivation of this experiment was therefore to examine how our facial descriptors deal with such problems. In the first scenario, Gaussian noise was added to the original image at five different levels = {10%, 20%, 30%, 40%, 50%} using the Matlab function "imnoise". In the second scenario, occlusion was simulated by adding a white rectangle at a random position within the face region, with sizes ranging from [20, 20] to [30, 60], using the Matlab function "insertShape". Figure 14 shows both scenarios.
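The two degradation scenarios can be reproduced without the Matlab toolboxes. The sketch below is a plain-Python stand-in for `imnoise` (zero-mean Gaussian noise on pixels in [0, 1]) and for the `insertShape`-style white rectangle; the function names, the clamping, and the fixed seed are our illustrative choices.

```python
import random

def add_gaussian_noise(image, variance, seed=0):
    """Add zero-mean Gaussian noise to an image with pixel values in [0, 1],
    similar in spirit to Matlab's imnoise(I, 'gaussian', 0, variance).
    Results are clamped back into [0, 1]."""
    rng = random.Random(seed)
    sigma = variance ** 0.5
    return [[min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row]
            for row in image]

def add_occlusion(image, top, left, height, width, value=1.0):
    """Paint a white rectangle over part of the face region, mimicking
    the insertShape-based occlusion of the second scenario."""
    out = [row[:] for row in image]          # copy, leave the input intact
    for r in range(top, min(top + height, len(out))):
        for c in range(left, min(left + width, len(out[0]))):
            out[r][c] = value
    return out

# Toy 8x8 mid-gray "face" image.
img = [[0.5] * 8 for _ in range(8)]
noisy = add_gaussian_noise(img, 0.01)
occ = add_occlusion(img, 2, 3, 3, 2)
```

In the paper's setting, the rectangle position is randomized within the face region and the noise level is swept from 10% to 50%.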
In each scenario, five images per class were chosen as training samples and the rest as testing samples, over 100 random splits. The average recognition rates of the different methods are shown in Table 12. Here, we fine-tuned the structuring element B_2 = {(1, 5); (2, 6)} to obtain the best results for WSBP and CLBP_S_M(m1, μ2). This structuring element made our descriptors more robust against noise and occlusion compared to the other methods.

The processing time
This section describes the computational cost of several LBP-based descriptors. Experiments on the ORL dataset (400 images of 92 × 112 pixels) were carried out on a machine with a 3.5 GHz CPU, 32 GB RAM, and Windows 10 64-bit. Table 13 reports the computational cost from two aspects: the processing time of the feature-extraction phase and the processing time of the matching phase (in seconds), for various descriptors with three different configurations of (P, R). The processing times were measured with the structuring element B = {(1, 6)}, with 200 training images and 200 testing images. Table 13 shows that WSBP required a longer processing time for both feature extraction and matching than CLBP_S(m1) or WSBP_S; indeed, the processing time grows with the size of (P, R) due to the larger descriptor dimension. And yet, notice that our WSBP descriptor remained efficient compared with CLBP_S_M(m1, μ2), since both achieved approximately the same recognition rates; see Tables 13 and 1.
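The two timed phases (feature extraction and matching) can be measured with a simple wall-clock helper. This is a generic sketch of how such per-phase timings are collected, with names of our choosing; it is not the authors' benchmarking code.

```python
import time

def time_phase(fn, *args, repeats=1):
    """Measure the wall-clock time of one processing phase (e.g., feature
    extraction or matching), in seconds, averaged over `repeats` runs."""
    start = time.perf_counter()
    for _ in range(repeats):
        result = fn(*args)
    elapsed = (time.perf_counter() - start) / repeats
    return result, elapsed

# Toy stand-in for a descriptor-extraction phase.
res, secs = time_phase(sum, range(1000))
```

In a benchmark like Table 13, one would time the extraction of all 400 descriptors and, separately, the 200 × 200 matching pass, and report both in seconds.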

Summary and discussion
Based on our experiments, we summarize and discuss several advantages of our proposed descriptors:

- The WSBP descriptor is designed to extend the LBPs with local-difference Sign-Magnitude distributions on statistical moments. As a pre-processing step, the statistical moment images obtained with the spatial support B of local filters can eliminate noise coming from contrast change or illumination variation (mean moment) while deriving useful information from the salient regions of a face image (variance moment) (see Fig. 6).
- The classical LBPs consider neighborhoods bilaterally in a circle, whereas our WSBP descriptors exploit CLBP_α_i operators along multiple directions, i.e. four directions, independently and combine them in the final descriptor. They are found to be robust against different lighting conditions, head poses, and facial expressions, achieving high performance (see CLBP_S_M(m1), CLBP_S_M(m1, μ2), and WSBP in Tables 1 and 2). One should, however, avoid using many parameters (P, R), because this could lead to a high-dimensional descriptor.
- The evaluation on six face datasets suggests that our descriptors outperform state-of-the-art methods such as EL-LBP [44], AECLBP-S (B16) [22], Multi-resolution dictionary [30], DR-LBP + LDA [35], and LDENP [42]. Moreover, our WSBP descriptors achieve better results than some deep facial features such as Deep Belief Net (GDBN) [8], Deep Autoencoders (DSDSA) [13], Compressive Sensing (CS) [2], and FDDL + CNN [39] (see Tables 3, 5, and 8).
- According to the additional experiment with the color channel, the Magnitude transform captures the relationship of pixel magnitudes on gray-scale images very well, but it is not effective on Hue images (see Fig. 13), since the combination of the Sign-Magnitude components of CLBP_α in Hue space performs worse than in the gray-scale case. Also, fusing the statistical moments (m1 and μ2) in CLBP_S and WSBP_S achieves higher accuracy in Hue space by ignoring texture pixel intensity. This evaluation suggests a new direction for face recognition problems, such as integrating multiple color channels to enhance face spoofing detection performance [43].
- Although the YALE dataset contains some facial expression cases, it is interesting to test how our descriptor handles systematic variation of facial expressions. We therefore also use the KDEF dataset, which has seven facial expressions per subject, to study the effect of facial expressions. The results suggest that our descriptor deals with such cases very well.
- Our facial descriptors using both the mean (m1) and variance (μ2) moments have shown robustness against degraded images in the evaluation on the ORL dataset with artificial noise. Given that a Gaussian noise level of 50% makes the degraded face challenging to recognize even for human eyes, our WSBP(B_2) descriptor still reaches acceptable accuracies of 93.05% under noise and 85.09% under occlusion, which are much higher than those of the other LBPs.
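The mean and variance moment images underlying these points can be illustrated with a simplified local computation. The sketch below uses a fixed 3x3 window as a stand-in for the paper's circular structuring element B, and applies the k-th root to the variance as described for the variance moment; the function name, window shape, and border handling are our assumptions.

```python
def moment_images(image, k=2):
    """Compute a mean moment m1 and a k-th-root variance moment mu2 over a
    3x3 spatial support (a simplified stand-in for the circular structuring
    element B). The root-k step amplifies weak variance responses."""
    h, w = len(image), len(image[0])
    m1 = [[0.0] * w for _ in range(h)]
    mu2 = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            vals = [image[rr][cc]
                    for rr in range(max(0, r - 1), min(h, r + 2))
                    for cc in range(max(0, c - 1), min(w, c + 2))]
            mean = sum(vals) / len(vals)
            var = sum((v - mean) ** 2 for v in vals) / len(vals)
            m1[r][c] = mean
            mu2[r][c] = var ** (1.0 / k)
    return m1, mu2

# On a flat patch, the mean moment equals the input and the variance is zero.
flat = [[5.0] * 4 for _ in range(4)]
m1, mu2 = moment_images(flat)
```

The directional binary-pattern operators are then applied to m1, while mu2 supplies the per-pixel weights for the histograms.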

Conclusions and future work
We presented a set of descriptors wherein the local difference distributions of local binary patterns are exploited along different directions, and a weighting approach for the binary patterns is then applied to statistical moment images for an efficient and robust facial feature representation. A comprehensive evaluation on several standard face datasets was carried out to validate our proposal. We analyzed the behavior of several descriptors on gray-scale images and found that our method mostly outperforms the state of the art. An analysis on color images was also conducted using the Hue channel of the AR dataset. We further simulated a few practical scenarios that can occur during the data acquisition stage by adding various levels of Gaussian noise and random occlusion to the ORL dataset. The spatial support strategy can be understood as a special pre-processing technique for eliminating noise, and selecting the structuring element B depends on the levels and types of noise. For the scenarios examined in this study, the structuring element of two circles eliminated noise very efficiently. Although such degradation can lower the recognition performance, our experimental results remain higher than those of other methods, showing that the proposed descriptor is robust against image degradation. Overall, our experimental results suggest that the proposed descriptor is robust against noise, contrast change, illumination variation, and facial expressions by exploiting different directions of binary pattern operators on the mean moment and by weighting the binary patterns according to the variance moment.
We expect that these descriptors will find further applications in face recognition and in other areas such as facial paralysis analysis and face spoofing detection. Although our proposed framework is novel and high-performing, it has a few issues to be addressed: (1) the computational cost of matching increases when the descriptor dimension becomes larger; (2) the optimal k-parameter for the root extraction of the variance moment must be fine-tuned. We plan to focus on how to deal with these issues. It would also be interesting to combine the WSBP descriptors with deep neural networks to build more powerful descriptors.

Table 13: The processing time of the descriptors used in the present study with different parameters (FT: feature extraction time, FTS: feature size of the LBP descriptors without any dimension-reduction technique, MT: matching time).

Conflict of Interests
The authors declare that they have no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.