A study in facial regions saliency: a fuzzy measure approach

People recognize familiar faces in a similar way by using interior facial features (facial regions) such as eyes, nose, mouth, etc. However, the importance of these regions in the realization of face identification and a quantification of the impact of such regions on the recognition process could vary from one region to another. An intuitively appealing observation is that of monotonicity: the more regions are taken into account in the recognition process, the better. From a formal point of view, the relevance of the facial regions and an aggregation of these pieces of experimental evidence can be described in the formal setting of fuzzy measures. Fuzzy measures are of particular interest in this regard given their monotonicity property (which stands in clear contrast with the more restrictive additivity property inherent to probability-like measures). In this study, we concentrate on the construction of fuzzy measures (more specifically, the $\lambda$-fuzzy measure) and characterize their performance in the problem of face recognition using a collection of experimental data.


Introduction
Perception and recognition of faces by humans is still a challenging problem. Each individual seems to recognize faces in a slightly different way. Nevertheless, numerous psychological studies report experiments confirming that some properties of facial recognition mechanisms (including perception of salient facial regions, their mutual relationships, or identification of brain areas responsible for identification of faces) are common to most people. Various methods of automatic face recognition have received significant attention in recent years, mainly because of their broad applications to forensic sciences, border control, passport verification, etc., where computers can help alleviate limitations of humans when working with large and continuously growing collections of data.
Computational face recognition methods can be divided into two main groups, namely holistic matching and feature-based matching methods (cf. Zhao et al. 2003). The first group comprises methods utilizing information contained in the overall face region, such as Eigenfaces (Turk and Pentland 1991), Fisherfaces (Belhumeur et al. 1997), support vector machines (SVMs, Phillips 1998), independent component analysis (ICA, Bartlett et al. 2002), and their modifications, e.g., lattice ICA (Marques and Graña 2012).
The latter group includes methods where the localization and local statistics of facial features such as eyes, nose, landmarks, contours, etc., are essential. Such approaches include elastic bunch graph matching (EBGM, Wiskott et al. 1997), geometry-oriented methods (Kanade 1977), and local descriptors (Ahonen et al. 2004; Heikkilä et al. 2009).
There are many methods combining these two approaches (so-called hybrid methods); refer to the study reported by Pentland et al. (1994), where the Eigenfaces, Eigenfeatures and the combined modular representation are discussed. Other examples are component-based methods, where the face is decomposed into a set of features for which a flexible geometrical relation is allowed to compensate for pose changes (Heisele et al. 2003; Huang et al. 2003; Bonnen et al. 2013). Among these methods, approaches based on fuzzy information fusion produce accurate results, see e.g., Kwak and Pedrycz (2005).
The results concerning the way humans perceive faces may be fundamental from the point of view of automatic face recognition. For instance, some facts about the significance of face features can play an important role in the design of automatic systems. Generally speaking, faces are processed by humans in a holistic manner (Sinha et al. 2006). Second-order spatial relations (i.e., spacing between features) play a very important role here (Rotshtein et al. 2007). However, internal facial features such as eyes, nose and mouth are relatively more important for recognizing trained (familiar) faces than external facial features (hair and face contour), which are more salient for recognition of untrained (unfamiliar) faces (cf. Ellis et al. 1979; Young et al. 1985). In the work by Davies et al. (1977), the Photofit Kit (a tool developed for reconstruction of faces by the police) was used to change the images of faces. Changes of foreheads, eyes and mouths caused the lowest error rates made by subjects. Similar results were presented in Haig (1986) and Matthews (1978), where the observers indicated that eyes/eyebrows, followed by mouth and then nose, were the most dominant regions as recognition features (taking into account internal features only). It was shown by O'Donnell and Bruce (2001) that people are highly sensitive to changes in the eye region of faces with which they have been familiarized. Other results confirm the significance of the upper half of the face (Haig 1986) and eyebrows. In the experiment described by Sadr et al. (2003), subjects recognized the faces of celebrities with removed eyebrows significantly worse than the faces without eyes (with a mean difference of 9.5 %). Surveys of works on human recognition of familiar and unfamiliar faces and cue saliency can be found in Johnston and Edmonds (2009) and Shepherd et al. (1981).
Detailed studies on cue saliency in computational face recognition have been reported in many works, along with the performance obtained by using numerous techniques. Moreover, it is worth noting that in some situations, especially in crime investigations, we encounter a face image containing only a small visible part of the face (for instance, when a person wears a balaclava, a helmet, sunglasses, a veil or a mask, or the head is not aligned properly in the image). Let us describe some results. Brunelli and Poggio (1993) applied a template matching strategy and came up with the following ranking of saliency: eyes, mouth, nose and whole face template. Similar results were obtained in Lam and Yan (1998), where the created feature windows were compared using correlation values as a similarity measure, and in Kwak and Pedrycz (2005), where the Fisherfaces method was applied. Radial basis function networks were used to determine the dependency of the recognition rate on the facial feature or the percentage of the face image utilized (Sato et al. 1998; Gutta et al. 2002; Gutta and Wechsler 2003). Ekenel and Stiefelhagen (2009) presented a comparison of five salient region-based partitioning approaches and one generic approach with results obtained on five different databases. However, new images were built from the divided segments. Generic partitioning provided the highest correct recognition rates. The best of the salient-based partitioning schemes was the one containing five overlapping regions: forehead, left and right eyes, left and right cheeks. Experiments with images consisting of 14 small face regions did not produce such good results. The recognition method was based on feature extraction with the discrete cosine transform (Ekenel and Stiefelhagen 2005). Yan and Osadciw (2004) discussed a combination of the Eigenfaces method with eyes, mouth, nose and forehead Eigenfeatures. Adding an individual Eigenfeature improves the identification accuracy, except for the nose Eigenfeatures. In (Dargham et al. 2012) the results of LDA for chosen partial regions were presented. Finally, the performance of 14 facial components in a 3D morphable model approach is considered by Heisele and Blanz (2005). All the results indicate that the eye region exhibits the highest discriminative value. Other works presenting results for particular regions of the face can be found in (Savvides et al. 2004a, b, 2006; Neo et al. 2007, 2010; Teo et al. 2007; Wright et al. 2009; Woodard et al. 2010; Park et al. 2011).
To take advantage of information about cue saliency in the process of face identification based on an aggregation of a given number of classifiers, one can use the fuzzy measure, which may help determine the weights associated with the corresponding criteria. The aggregation procedure can then be realized by the fuzzy integral. This technique was presented by Kwak and Pedrycz (2005), where the Fisherfaces method was applied to the eye, nose, mouth and whole face regions, and by Melin et al. (2005), where modular neural networks were applied to each of the facial areas around the eyes, nose and mouth. In a similar way, the fuzzy measure and fuzzy integral were applied to aggregate the classifiers obtained by the Fisherface method based on subimages decomposed by wavelets (Kwak and Pedrycz 2004), and to aggregate the outputs of separate component SVMs weighted by the importance of each component SVM (Yan et al. 2006). In the three-dimensional case this method was used by Lee and Marshall (2008). An application of the fuzzy measure to gender recognition can be found in (Li et al. 2012). The authors used the fuzzy measure to aggregate the results produced by support vector machine classifiers obtained for each of the following features: hair, forehead, eyes, nose, mouth, chin and clothing. Other applications of the fuzzy measure in pattern recognition were presented, for instance, in Pedrycz (1990), where the measure was found helpful in the process of feature selection. Graves and Nagarajah (2007) presented a model for estimating the uncertainty of a new observation for a multiclass classifier. In Keller et al. (1994) the fuzzy measure was applied to the fusion of handwritten character classifiers, and in (Yan and Keller 1991) a method of image segmentation was described. All these applications are rooted in the models of decision-making theory. A detailed study of the application of the fuzzy measure in decision making can be found in (Grabisch 1995).
Other methods utilizing cue saliency and psychophysical mechanisms in the face recognition process were described in (Venkat et al. 2013), where similarity mappings existing in facial regions were modeled by means of Bayesian networks, and in (Da et al. 2010), where an LBP-based local descriptor was combined with weights assigned to distinctive facial areas. Similarly, a 9-region mask was applied to obtain weights for so-called Principal Local Binary Patterns (Pujol and García 2012). Some methods of determining weights for algorithms based on local descriptors use human fixations (Fang et al. 2011; Choi et al. 2012).
In this paper, we construct the fuzzy measure by using the results of psychological and computational experiments and relate it to the saliency of facial regions for recognizing faces by humans. We are motivated by the fact that the fuzzy measure can potentially capture important information about the saliency of particular facial areas and their combinations. People use the information contained in the merged regions and in the particular areas for the purpose of recognition. It is intuitively apparent that the more features are taken into account (assuming that each of the features is of relatively high significance itself), the higher the face recognition performance that can be achieved. The monotonicity property plays a vital role here. To proceed with a formal description of this effect, we investigate the concept of a fuzzy measure (in which the notion of monotonicity assumes a pivotal role) and provide extensive experimental evidence in order to quantify the performance of the fuzzy measure. We consider face regions such as the eyes, nose, mouth and cheek areas, which are intuitively the most descriptive features of the face used in recognition activities. The main objectives of this study can be outlined as follows:
• Investigation of the abilities of the fuzzy measure to reflect the importance of information included in facial regions and their aggregates.
• Quantification of the role of the face regions and, particularly, their combinations when considered in the context of face recognition.
• Determination of dependencies between the potential importance (expressed by the fuzzy measure) of merged face areas and the recognition rate produced by using well-known face recognition algorithms such as Eigenfaces and Fisherfaces.
• Comparative analysis of the results obtained in the experiments on automatic and human face recognition.
• Construction of the Sugeno fuzzy measure using the results of psychological experiments on cue saliency and offering a novel approach to model the mechanism of face identification by people.
The study provides a comprehensive examination of the potential abilities of the fuzzy measure to capture and quantify the importance of the information contained in six of the most salient facial features. We also look at the contributions of all possible concatenations of the facial components when designing face classifiers. In this way we concentrate on face identification when taking into account both the information contained in the entire face (as a result of merging features) and in the local components of the face. We examine how the presence of these facial components influences the recognition process.
The paper is organized as follows. The general processing scheme is presented in Sect. 2. In Sect. 3, we discuss the fuzzy measure and its usage and elaborate on its semantics in the context of feature-based face recognition. Experiments are presented in Sect. 4, while conclusions are covered in Sect. 5.

A general processing scheme
An overall scheme highlighting the sequence of main processing phases is presented in Fig. 1. First, a face image is preprocessed (which includes cropping, scaling, and, optionally, histogram equalization). Next, the positions of salient facial regions such as eyes, eyebrows, nose, mouth and cheeks are determined manually by selecting the facial areas that give the highest accuracy rates in some preliminary tests. Then, using the selected facial segments, we determine the recognition rates associated with all the regions, i.e., the combinations of these atomic areas, by applying the PCA method and, simultaneously, PCA followed by the well-known LDA dimensionality reduction procedure (known as Fisherfaces (Belhumeur et al. 1997)). The commonly used Euclidean distance function is applied.
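The region-based recognition step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the rectangular region boxes, the number of principal components and the synthetic data are all hypothetical, and only the PCA branch (without the LDA step of Fisherfaces) is shown. Selected regions are cropped, flattened and concatenated into one vector per image, projected onto the principal axes, and classified by the nearest neighbour under the Euclidean distance.

```python
import numpy as np

def concatenate_regions(images, boxes):
    """Crop each (top, bottom, left, right) box and join the pixels into one vector per image."""
    n = images.shape[0]
    parts = [images[:, t:b, l:r].reshape(n, -1) for (t, b, l, r) in boxes]
    return np.hstack(parts).astype(float)

def fit_pca(X, n_components):
    """Return the training mean and the leading principal axes (as rows)."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def nearest_neighbour_accuracy(train_X, train_y, test_X, test_y):
    """Fraction of test vectors whose closest training vector (Euclidean) has the same label."""
    distances = np.linalg.norm(test_X[:, None, :] - train_X[None, :, :], axis=2)
    predicted = train_y[distances.argmin(axis=1)]
    return float(np.mean(predicted == test_y))

def region_accuracy(train_imgs, train_y, test_imgs, test_y, boxes, n_components=10):
    """Recognition rate obtained from the given combination of regions."""
    train_X = concatenate_regions(train_imgs, boxes)
    test_X = concatenate_regions(test_imgs, boxes)
    mean, axes = fit_pca(train_X, n_components)
    return nearest_neighbour_accuracy((train_X - mean) @ axes.T, train_y,
                                      (test_X - mean) @ axes.T, test_y)

def demo(seed=0):
    """Synthetic stand-in for a face database: 5 'subjects', 4 training and 4 test images each."""
    rng = np.random.default_rng(seed)
    prototypes = rng.random((5, 32, 32))
    labels = np.repeat(np.arange(5), 4)
    train = prototypes[labels] + 0.05 * rng.standard_normal((20, 32, 32))
    test = prototypes[labels] + 0.05 * rng.standard_normal((20, 32, 32))
    eyes, mouth = (8, 14, 4, 28), (22, 28, 10, 22)  # hypothetical region boxes
    return {
        "eyes": region_accuracy(train, labels, test, labels, [eyes]),
        "eyes+mouth": region_accuracy(train, labels, test, labels, [eyes, mouth]),
    }
```

Calling `demo()` returns one recognition rate per region combination; with real data the boxes would come from the manually selected facial areas.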
Using the accuracy values obtained for the atomic facial regions, we construct a fuzzy measure (more specifically, a $\lambda$-fuzzy measure) for all the combinations of these regions. In parallel, we determine the recognition rates obtained for the combinations of the areas when using the PCA and Fisherfaces methods.

Interpretation of the fuzzy measure
Classifying face images based on a given number of classifiers (i.e., face regions compared independently), we take into account weights of criteria (both individually and considering their groups, i.e., concatenations of facial regions). They express various qualities of recognition obtained from the classifiers and should be determined so as to affect the final decision about the classification of an unknown image in a proper way. These weights, used in the aggregation of the outcomes of different classifiers, can be represented by a fuzzy measure, and the values obtained in the process described in the previous section (see Fig. 1) are suitable for this purpose. Moreover, the fuzzy measure can express the dependencies between the regions on the basis of which the classifiers are constructed (Grabisch 1995). More formally, let us assume that $X = \{x_1, \ldots, x_n\}$ denotes the overall face area, where $x_1, \ldots, x_n$ stand for non-overlapping facial segments such as eyes, nose, etc. A fuzzy measure is defined as a set function $g : P(X) \to [0, 1]$ satisfying the following conditions:
$$ g(\emptyset) = 0, \quad g(X) = 1, $$
$$ \text{if } A \subset B \text{ then } g(A) \le g(B). $$
The first condition reflects the observation that, having the entire face image, we have complete information about the face. The second property (monotonicity) quantifies the psychologically motivated observation that the likelihood of a proper identification of the individual increases when the knowledge about the available region of the face is augmented by pieces of knowledge concerning other facial areas. In the original definition of the fuzzy measure, the limit condition is also provided (Sugeno 1974):
$$ \lim_{n \to \infty} g(A_n) = g\left(\lim_{n \to \infty} A_n\right), $$
where $\{A_n\}$, $n = 1, 2, \ldots$, is an arbitrary increasing sequence of measurable sets. Sugeno (1974) proposed a parametric version of the fuzzy measure (often denoted $g_\lambda$) by introducing the following aggregation scheme:
$$ g(A \cup B) = g(A) + g(B) + \lambda\, g(A)\, g(B), \quad \lambda > -1, $$
to be satisfied for any pair of disjoint sets $A$ and $B$.
The value of the parameter $\lambda$ describes the dependency between the two combined face regions. Note that if $\lambda < 0$ then the measure is sub-additive, which means that the satisfaction arising from one source of evidence, i.e., region, more or less entails the satisfaction arising from the second, and the two are in competition (or redundancy).
Here the combination of the areas may not be as efficient as one might have expected. On the other hand, values $\lambda > 0$ imply a synergy effect, meaning that these sources of evidence support each other (Grabisch 1995; Pedrycz and Gomide 1998). In such a case a combination of two or more classifiers should be more efficient. Once the set of facial regions is chosen, the Sugeno measure cannot be sub-additive for one group of parts of the face and super-additive for another, because the value of the parameter $\lambda$ is constant.
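As a small numerical illustration of these two regimes (the values below are made up for the example and are not taken from the experiments), consider two disjoint regions $A$ and $B$ combined by the Sugeno rule $g(A \cup B) = g(A) + g(B) + \lambda\, g(A)\, g(B)$:

```latex
% Hypothetical values, for illustration only.
\begin{align*}
\lambda = -0.9,\ g(A) = 0.6,\ g(B) = 0.5:\quad
  g(A \cup B) &= 0.6 + 0.5 - 0.9 \cdot 0.6 \cdot 0.5 = 0.83 < 1.1 = g(A) + g(B),\\
\lambda = 0.5,\ g(A) = 0.3,\ g(B) = 0.2:\quad
  g(A \cup B) &= 0.3 + 0.2 + 0.5 \cdot 0.3 \cdot 0.2 = 0.53 > 0.5 = g(A) + g(B).
\end{align*}
```

In the first case the two regions are partly redundant; in the second they reinforce each other.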
The parameter $\lambda$ can be determined by solving a polynomial equation of the following form (Sugeno 1974):
$$ \lambda + 1 = \prod_{i=1}^{n} (1 + \lambda g_i), \quad g_i = g(\{x_i\}), $$
where, as previously, $x_1, \ldots, x_n$ are non-overlapping facial regions such as eyes, nose, etc. There exists a unique solution $\lambda$ to the above equation with $\lambda > -1$, $\lambda \ne 0$ (Sugeno 1974). The values $g_i$ are known and are called densities of the fuzzy measure. Denoting $A_i = \{x_1, \ldots, x_i\}$, $A_{i+1} = \{x_1, \ldots, x_i, x_{i+1}\}$, we use the recurrence formula to calculate the fuzzy measure over the combined facial regions:
$$ g(A_1) = g_1, \quad g(A_{i+1}) = g(A_i) + g_{i+1} + \lambda\, g(A_i)\, g_{i+1}. $$
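The computation of $\lambda$ and of the measure over combined regions can be sketched in a few lines. This is an illustrative sketch, not the authors' code: the function names, the bisection solver and the density values are assumptions, and the densities are taken to lie strictly between 0 and 1 with at least two regions.

```python
import math

def solve_lambda(densities, tol=1e-6):
    """Root of prod(1 + lam*g_i) = 1 + lam with lam > -1, lam != 0.

    Returns 0.0 when the densities sum to 1 (the additive case).
    """
    total = sum(densities)
    if abs(total - 1.0) < tol:
        return 0.0

    def f(lam):
        return math.prod(1.0 + lam * g for g in densities) - (1.0 + lam)

    if total > 1.0:                 # sub-additive: the root lies in (-1, 0)
        lo, hi = -1.0, -tol / 10
    else:                           # super-additive: the root is positive
        lo, hi = tol / 10, 1.0
        while f(hi) < 0.0:          # expand the bracket until the sign changes
            hi *= 2.0
    f_lo = f(lo)
    for _ in range(200):            # plain bisection
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0.0) == (f_lo > 0.0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def sugeno_measure(subset_densities, lam):
    """g(A) via the recurrence g(A_{i+1}) = g(A_i) + g_{i+1} + lam * g(A_i) * g_{i+1}."""
    value = 0.0                     # g of the empty set
    for g in subset_densities:
        value = value + g + lam * value * g
    return value

# Hypothetical densities (e.g. single-region accuracies), not values from the paper.
densities = {"eyebrows": 0.8, "eyes": 0.7, "nose": 0.5, "mouth": 0.6}
lam = solve_lambda(list(densities.values()))
g_upper = sugeno_measure([densities["eyebrows"], densities["eyes"]], lam)
g_all = sugeno_measure(list(densities.values()), lam)
```

With these densities summing to more than 1, `lam` falls in $(-1, 0)$, and `g_all` equals 1 up to rounding, as required by the boundary condition.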

Experiments
The main objective of the series of experiments carried out in this study is to determine the accuracies of recognition for salient facial regions and their combinations as well as the corresponding values of fuzzy measure for these areas.
Similarly, we calculate these values using the results of psychological studies reported in the literature. We are interested in examining the properties of the fuzzy measures obtained in this way. The results of experiments reported in this study are obtained for the AT&T (formerly ORL) image database (http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html) and the Facial Recognition Technology (FERET) Database (Phillips et al. 1998).
The AT&T database consists of 400 images of 40 individuals with various illumination, pose, and expression. The number of images per subject is always 10. In the experiments completed for the FERET database we use its sets (called ba, bk and bj), which consist of 600 images of 200 individuals (3 pictures of each subject) taken with various expressions and under different illumination conditions.
In the preliminary tests, the following six facial areas were selected: eyebrows, eyes, nose, mouth, left cheek, and right cheek. These regions cover most of the face area and are important in the process of recognition/classification realized by a human being. Next, we found the detailed sizes of the regions that produce the highest accuracy rates for each of these six segments (refer to Fig. 2 and Table 1 for details).
In the first series of experiments we present the recognition rates and the calculated fuzzy measures for the atomic regions and their combinations obtained by using the PCA method. We divide the set of images from the AT&T database by taking five randomly chosen images of each individual for the training set, while the remaining images form the testing set. We carry out similar experiments with images from the FERET database (two training images and one testing image per person). Simultaneously, we apply the Fisherfaces method to the same datasets. Each of these computations was repeated 100 times and the final recognition rate is taken as the average of all the results.
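The evaluation protocol described above (a random per-subject split, repeated and averaged) can be sketched as follows. The function names and the `evaluate` callback are hypothetical; any classifier that maps training and testing indices to a recognition rate could be plugged in.

```python
import random
from collections import defaultdict
from statistics import mean

def split_per_subject(labels, n_train, rng):
    """Randomly pick n_train image indices per subject for training; the rest go to testing."""
    by_subject = defaultdict(list)
    for index, subject in enumerate(labels):
        by_subject[subject].append(index)
    train, test = [], []
    for indices in by_subject.values():
        shuffled = list(indices)
        rng.shuffle(shuffled)
        train.extend(shuffled[:n_train])
        test.extend(shuffled[n_train:])
    return train, test

def average_recognition_rate(labels, evaluate, n_train, repeats=100, seed=0):
    """Average of evaluate(train_indices, test_indices) over repeated random splits."""
    rng = random.Random(seed)
    rates = []
    for _ in range(repeats):
        train, test = split_per_subject(labels, n_train, rng)
        rates.append(evaluate(train, test))
    return mean(rates)
```

For the AT&T setting one would call `average_recognition_rate(labels, evaluate, n_train=5)`, and for the FERET setting `n_train=2`.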
The values of the recognition rates for all the atomic salient facial regions are reported in Table 2. Figure 3a illustrates the values resulting from the calculations of the fuzzy measure (1) and (2) with respect to combinations of regions and the corresponding recognition accuracy values obtained for the concatenated images (i.e., vectors being the result of merging two images treated as vectors of pixel values) of two facial segments obtained by using the PCA and PCA followed by LDA methods on the AT&T database, and PCA and PCA followed by LDA on the FERET database.
The scatter plots of the accuracy values and of the values produced when using the fuzzy measure are presented in Fig. 4a-d. Along with the results, we also show a linear regression, which expresses the relationship between the results formed by the fuzzy measure and those obtained when running classification schemes. Note that the accuracy values have been rescaled to be consistent with the boundary condition imposed on the fuzzy measure.
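A comparison of this kind can be sketched as below. The rescaling rule is an assumption (accuracies are divided by the accuracy of the full set of regions so that the full set maps to 1); the text does not spell out the exact procedure, and the function names are hypothetical.

```python
import math

def rescale(accuracies, full_key):
    """Divide every accuracy by that of the full region set (assumed rescaling rule)."""
    top = accuracies[full_key]
    return {key: value / top for key, value in accuracies.items()}

def linear_fit(xs, ys):
    """Least-squares slope and intercept of ys on xs, plus the Pearson correlation."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x, sxy / math.sqrt(sxx * syy)
```

Here `xs` would hold the fuzzy measure values of the region combinations and `ys` the corresponding rescaled accuracies.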
Similar results are obtained when using a probability measure, representing the class of measures fulfilling the condition of additivity (see Fig. 4e-h). This measure is constructed using the recognition rates obtained for the basic facial segments (eyes, nose, etc.) and then simply applying the additivity condition to the combinations of regions. All the values are normalized to fulfill the boundary condition 1.
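The additive construction can be illustrated with a short sketch. The recognition rates below are hypothetical and the function name is an assumption; the normalisation simply divides by the sum over all basic segments so that the full set of regions receives the value 1.

```python
def additive_measure(accuracies, subset):
    """Normalised additive measure: sum of the chosen regions' rates over the sum of all rates."""
    total = sum(accuracies.values())
    return sum(accuracies[name] for name in subset) / total

# Hypothetical single-region recognition rates, for illustration only.
rates = {"eyebrows": 0.8, "eyes": 0.7, "nose": 0.5, "mouth": 0.6}
single = additive_measure(rates, ["eyes"])
pair = additive_measure(rates, ["eyebrows", "eyes"])
everything = additive_measure(rates, rates)
```

In this example additivity together with normalisation pushes the value assigned to the eyes alone down to roughly 0.27, far below its recognition rate of 0.7, which hints at why an additive measure is less flexible than a monotone non-additive one.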
The values of the parameter $\lambda$, together with the maximal and minimal differences between the values of the Sugeno measure and the recognition rates obtained in the classification process, are presented in Table 3. Figure 5 visualizes the recognition rates of the Fisherfaces method and the corresponding fuzzy measures for two groups consisting of combinations of four facial parts, i.e., eyebrows, eyes, nose, and mouth, and nose, mouth, left cheek, and right cheek. These parts are distinguished because they correspond to the upper and lower parts of the face in the first and in the second case, respectively. In Fig. 6 we include the values of the classification accuracy along with the values of the fuzzy measure for the areas of the eyes and mouth being gradually augmented by other regions, proceeding with their neighbors.
The results confirm in general the following facts. For any part of the face serving as a classifier (especially one containing the eyes), the recognition accuracy improves as the area under consideration gets bigger. Moreover, the recognition rate increases when a combination of salient facial regions is considered. Finally, the most descriptive region of the face is the eyes and eyebrows area (in general, the upper half of the face). Eyebrows in particular are very significant in the process of computational face identification (over 81 % accuracy rate obtained for the AT&T database) and their presence in the considered area can increase the recognition rate significantly (see Fig. 6b). Even augmenting the eyes region by the other face segments does not increase the recognition rate in a meaningful fashion (see Fig. 6a). Even more importantly, the results show that the fuzzy measure is strongly sub-additive ($\lambda \le -0.9608$); this occurs in all considered cases with the exception of PCA for FERET. In this case the accuracies are very low and the method is rather inefficient. Therefore, the value of the parameter $\lambda$ is positive (this follows from the boundary conditions on the fuzzy measure). However, in all the considered cases the fuzzy measure can be treated as a good source of evidence for the salient facial regions, as it corresponds to the accuracies obtained for the combinations of regions. This property leads to the conclusion that …

Table 3 The values of the parameter of the $g_\lambda$ fuzzy measure, maximal and minimal differences between the calculated fuzzy measure and recognition rates obtained when using merged facial regions

Moreover, the correlation between the fuzzy measure and accuracy when all six crucial facial segments (eyebrows, eyes without eyebrows, nose, mouth, left and right cheeks) are considered is lower than the correlation in the cases of four facial regions (see Fig. 5). Therefore, we conclude that when some segments of the face are occluded, it may be sufficient to take into account only the most important facial parts, such as eyes and eyebrows, in combination with whichever others are available for recognition. Nevertheless, the general trend is that both the real recognition accuracy and the Sugeno fuzzy measure increase when a higher number of facial regions is merged. The highest differences between the classification accuracy and the fuzzy measure can be observed for the regions placed in the lower part of the face, such as nose, mouth and cheeks. As discussed in Sect. 1, these regions are considered to be less useful as classifiers than the eyes area.

It can be noticed that fuzzy measure slightly tends to overvalue the potential weight of information included in these regions.
Comparing the scatter between the accuracy of classifiers and the fuzzy and probability measures, respectively, it is easy to see that the fuzzy measure is more flexible and its values fit the scaled values of recognition rates better than the values of the measure fulfilling the condition of additivity. Only in the case of PCA on the FERET database (where the method is of low efficiency) are the scatters similar (see Fig. 4c, g). Now let us consider the psychological experiments reported by Matthews (1978). In these experiments, the subjects were to answer a question about the similarity or dissimilarity of two images after modifications of one or more facial features: eyes, eyebrows, nose, mouth, chin and hair. The last feature is not discussed here, as it is an exterior face area. The face images were constructed using a police "Identikit" from transparent overlays of facial features. Chosen results (and Sugeno measure values) are presented in Fig. 7. A comparison of the accuracies observed in this experiment and in our computational experiments with the Fisherfaces method for chosen facial regions is presented in Fig. 8a. A similar comparison containing the values of the Sugeno fuzzy measure is presented in Fig. 8b.
The value of the parameter $\lambda$ obtained from the accuracies of human recognition is -0.99994, and the correlation between the recognition accuracies and the computed Sugeno measures for selected parts of the face is 0.846. This means, as in the case of automatic face identification described above, that this measure reflects the way people recognize faces and can be applied to model the interactions between facial segments. Figure 8a shows that in the process of identifying faces by humans, particular features and their combinations have a similar meaning and their saliency is comparable with the saliency in the process of automatic face recognition (keeping the relationships between their values). As a consequence, a similar situation takes place in the case of the fuzzy measure values (see Fig. 8b).
In the last series of experiments we divide the group of individuals into eight subsets $A_1, \ldots, A_8$. In the case of the AT&T database, the first of these sets consists of 5 individuals, the second of 10, etc. Similarly, the subsets of FERET are built from images of 25, 50, …, 200 people, respectively. As before, we find the recognition accuracies using the PCA and PCA + LDA methods. Next, we construct the Sugeno fuzzy measure, taking as fuzzy densities the accuracies for the atomic facial segments such as the eyebrows, eyes, nose, mouth, and left and right cheek areas.
The values of the parameter $\lambda$ are presented in Fig. 9. It can be observed that this value tends to -1 as the number of people in the considered dataset decreases and as the method becomes more efficient, i.e., attains higher accuracies. The reason for this is the boundary condition 1. The measure tends to fulfill it by overvaluation of the results. The most meaningful example here is the almost linear dependency between the number of people in the dataset and the values of $\lambda$ in the case of the PCA method.
Figure 10 presents the values of the fuzzy measure depending on the number of considered classes from each database. Combinations of the four upper and the four lower facial segments were taken into consideration. The results show that the measure is rather stable in the case of an efficient method, such as Fisherfaces with five training images per class for the AT&T database, or in the case of a significant facial area, e.g., the eyes and their neighborhood. However, in other cases, particularly PCA, the value of the fuzzy measure decreases when the number of classes increases. This is strictly related to the real rates, whose values decrease in a similar way.
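The tendency of $\lambda$ towards $-1$ for high accuracies can be seen already in the simplest case of two regions with equal densities $g_1 = g_2 = g$ (a simplified illustration, not a configuration used in the experiments). For $n = 2$ the polynomial equation $\lambda + 1 = (1 + \lambda g_1)(1 + \lambda g_2)$ has the non-zero root $\lambda = (1 - g_1 - g_2)/(g_1 g_2)$, so that:

```latex
\begin{align*}
\lambda(g) &= \frac{1 - 2g}{g^{2}}, \\
\lambda(0.6) &= \frac{-0.2}{0.36} \approx -0.556, \qquad
\lambda(0.8) = \frac{-0.6}{0.64} = -0.9375, \qquad
\lambda(0.9) = \frac{-0.8}{0.81} \approx -0.988, \\
\lim_{g \to 1} \lambda(g) &= -1 .
\end{align*}
```

Higher densities, which arise for more efficient methods or smaller sets of people, thus push $\lambda$ closer to $-1$ so that the full set of regions still receives the value 1.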

Conclusions and future work
In this study, we have investigated an application of the fuzzy measure (Sugeno fuzzy measure) as a vehicle to quantify the aggregation of important discriminatory information conveyed by facial regions. We discussed the properties of additivity and monotonicity in the context of face recognition based on salient facial regions. The comprehensive series of experiments led us to the conclusion that the fuzzy measure can be regarded as a sound vehicle to aggregate evidence, i.e., pieces of knowledge residing within face segments. In most cases, we can conclude that the fuzzy measure (owing to its monotonicity) comes as a sound classification model.
Future work may include an efficient application of the fuzzy measure (particularly, related to the psychological studies) in face recognition systems based on methods other than PCA or LDA, the development of a measure that is more flexible in expressing the interactions between a higher number of facial features, as well as an insightful study of the way of determining the membership grades of a class in a classifier, which play a significant role in information fusion by the fuzzy integral. Another interesting issue may be a deeper study of the eyes and eyebrows region and its impact on the recognition process. Finding the subareas highly responsible for the quality of the recognition process would significantly reduce the dimensionality of the data needed for the computation.

Fig. 2 Facial regions: a original image, b cropped face, c salient regions
Figure 3b-d present similar standings for combinations of three, four, five and all the regions, respectively.

Fig. 3 Values of accuracy and fuzzy measure obtained for combinations of a two, b three, c four, d five and six facial regions by using the PCA and PCA + LDA methods on the AT&T and FERET databases

Fig. 5 Accuracy of the Fisherfaces method and corresponding fuzzy measures for combinations of regions of a the upper and b the lower parts of the face

Fig. 7 Recognition accuracy and fuzzy measure built from the results of psychological tests. E eyes, Eb eyebrows, N nose, M mouth, Ch chin area

Fig. 9 The values of the parameter $\lambda$: a AT&T, b FERET

Table 1 Face regions and their characteristics (size of the regions)