1 Introduction

Face recognition is one of the most common biometrics, with unique advantages such as naturalness, non-contact and non-intrusiveness. It has been of great interest to the computer vision and pattern recognition communities for many years, and has been exploited for applications such as public security [1], fraud prevention [2] and crime prevention and detection [3]. Considerable effort has been devoted in the past to the development of 2-D face recognition systems that use intensity images as input data. Although 2-D face recognition systems perform well under constrained conditions, they still face great difficulties because facial appearance can vary significantly, even for the same individual, due to differences in pose, lighting conditions and expressions [4]. Using the 3-D geometry of the face instead of its 2-D appearance is expected to alleviate these difficulties, since the human face is a natural 3-D entity [5]. According to the type of features used, the relevant work on 3-D face recognition can be roughly classified into three major categories: geometrical feature-based, shape descriptor-based and prominent region-based approaches.

Geometrical feature-based methods achieve the face recognition task using structural information extracted from 3-D faces, such as landmarks, salient curves and geodesic-like patterns. Landmarks are representative key facial points that are often used to construct a feature space. Shi et al. [6] introduced a method based on the so-called ‘soft’ landmarks, i.e., landmarks that are easily located on actual skin surfaces, such as eye corners, mouth corners, nose edge, etc. It showed that these landmarks vary significantly if different subjects are used to generate them. The use of anthropometric facial fiducial landmarks for face recognition was presented in [7]. Salient curves are discriminative surface curves extracted from 3-D faces. The symmetric profile curve obtained from the intersection between the symmetry plane and the 3-D facial scan was described in [8]. Three facial curves which intersect the facial scan using horizontal and vertical planes as well as a cylinder were proposed in [9]. A geodesic is a locally length-minimizing curve along the surface and it contains information related to the intrinsic geometry of an object. Mpiperis et al. [10] proposed a geodesic polar representation of the facial surface. With this representation, the intrinsic surface attributes do not change under isometric deformations and therefore it can be used for expression-invariant face recognition. Based on a similar concept, a method using the similarity measurement of local geodesic patches has been proposed by Hajati et al. [11].

Shape descriptor-based methods look into attributes of local surfaces and encode the 3-D face into specially designed patterns, which are often invariant to the orientation of faces. A representation of free-form surfaces based on the point signature was proposed for 3-D face recognition by Chua et al. [12]. The approach uses the point signature extracted from the rigid parts of a face to overcome the challenge of facial expressions. A similar representation, named local shape map, was proposed in [13]. Tanaka et al. [14] introduced a special shape descriptor based on the extended Gaussian image (EGI) and local surface curvature. The EGI was used as an intermediate feature after curvature-based segmentation, onto which principal directions are mapped as local features. Huang et al. [15] adopted a multi-scale extended local binary pattern (eLBP) as an accurate descriptor of local shape changes for 3-D face identification. A hybrid matching approach based on the scale-invariant feature transform (SIFT) was designed to measure similarities between control and test face scans once they were represented by the multi-scale eLBP.

Prominent region-based methods use dense point clouds detected from specific regions to form feature vectors. The similarity between two faces is determined based on their relationship in the feature space. Queirolo et al. [16] proposed a union of the segmented regions from faces for the purpose of face identification. These regions include the circular nose area, elliptical nose area and upper head. Regions segmented by median and Gaussian curvature were utilised for the feature construction in [17]. Gupta et al. [7] manually placed anthropometric points on faces and used a feature vector based on the anthropometric distances between points for face recognition. Xu et al. [18] proposed a method which converts a 3-D face into a regular mesh and creates a feature vector encoding the 3-D shape of the face based on this regular mesh.

In this paper, we propose a new geodesic-map representation for 3-D faces, which is an extension of the original work proposed by Quan et al. [19]. The proposed method preserves the intrinsic geometrical information related to the identity of the face. It can be considered as an expression-free representation for faces from the same person and is able to reduce the effect of non-rigid deformation in the face recognition task. The method first creates a geodesic strip for each extracted landmark on a single face, based on the geodesic distances between the surface points and that landmark. It then combines the calculated geodesic strips for all extracted landmarks in order to form a map. This map is the new representation of the face. In the subsequent stage of statistical shape modelling, the search for dense point correspondences is therefore simplified to an image-based method using the calculated geodesic-map instead of an iterative search in 3-D space. This helps to improve the efficiency of the whole face recognition task, including training and testing.

The rest of the paper is organized as follows: Sect. 2 introduces the sparse facial landmark detection. Section 3 describes the processing steps for generating the geodesic-map representation for 3-D faces. Section 4 explains the mechanism for dense point correspondence search across all training datasets. Section 5 presents the statistical shape models used in this work. Section 6 illustrates the model matching process. The experimental results using the statistical shape models for face recognition are presented in Sect. 7. Finally, concluding remarks and possible future work are given in Sect. 8.

2 Sparse Facial Landmark Detection

Landmarks are often used to assist the process of data registration in order to determine the coefficients of a transformation function. A minimum of three pairs of corresponding landmarks is needed if the transformation is considered rigid, and more are required when the transformation has more degrees of freedom. In this work, a small number of landmarks are extracted from 3-D faces and used for generating the geodesic-map representation and for the dense point correspondence search at a later stage. Using a combination of the shape index [20] and the intersecting profiles of the facial symmetry plane [21], a set of 12 key landmarks can be extracted: two upper nose base points, two nose corners, the upper and lower lip tips, two inner eye corners, two outer eye corners and two mouth corners.

The general strategy of this landmark detection process is to use the Gaussian curvature and mean curvature to locate a set of candidates for each landmark along the intersecting profiles of the facial symmetry plane, and then to select the key landmark from these candidates using the shape index. The shape index S(p) at point p is calculated as:

$$\begin{aligned} S(p)=\frac{1}{2}-\frac{1}{\pi }{{\tan }^{-1}}\frac{{{K}_{1}}(p)+{{K}_{2}}(p)}{{{K}_{1}}(p)-{{K}_{2}}(p)} \end{aligned}$$
(1)

where \(K_{1}(p)\) and \(K_{2}(p)\) are the maximum and minimum local curvature at point p, respectively. According to the value of the shape index, which lies between zero and one, each point can be classified into one of six shape types, such as cup, rut, saddle, ridge and cap. Figure 1 shows the locations of all 12 extracted key landmarks. Because the extracted landmarks are sparse and lie in facial areas that are anatomically stable, well defined for all faces and invariant to facial expressions, they are more likely to be detected robustly than landmarks located in other parts of the face.
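As a small illustration of Eq. (1), the following Python sketch evaluates the shape index from two principal curvatures. The function name `shape_index` is hypothetical, and the use of `atan2` is an assumption made here so that umbilic points (where \(K_{1}=K_{2}\) and the ratio in Eq. (1) is undefined) still map to the end values 0 or 1; a perfectly planar point, for which the shape index is undefined, is returned as 0.5 by this sketch.

```python
import math


def shape_index(k1: float, k2: float) -> float:
    """Shape index of Eq. (1) from the two principal curvatures at a point."""
    if k1 < k2:
        k1, k2 = k2, k1  # enforce k1 = maximum, k2 = minimum curvature
    # With k1 - k2 >= 0, atan2(k1 + k2, k1 - k2) equals atan of the ratio in
    # Eq. (1) whenever k1 > k2, and tends to +/- pi/2 in the umbilic case.
    return 0.5 - math.atan2(k1 + k2, k1 - k2) / math.pi


# Example: a symmetric saddle (k1 = -k2) lies in the middle of the range.
saddle_value = shape_index(1.0, -1.0)
```

How the curvatures themselves are estimated on the mesh is outside the scope of this sketch.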

Fig. 1. Examples of sparse landmarks detected on 3-D faces.

3 Geodesic-Map Representation

The geodesic distance is a generalization of the Euclidean distance and is defined as the length of the shortest path between two points along a continuous surface [22]. Bronstein et al. [23] proposed a face recognition method based on a transformation, \(\psi \), mapping an original face \(\mathbb {S}\) with the given geodesic distance \(d_{\mathbb {S}}(\xi _{1},\xi _{2})\) onto another space \(\mathbb {S'}\) with the Euclidean distance \(d_{\mathbb {S'}}(\psi (\xi _{1}),\psi (\xi _{2}))\) in such a way that corresponding distances are preserved:

$$\begin{aligned} d_{\mathbb {S}}(\xi _{1},\xi _{2})=d_{\mathbb {S'}}(\psi (\xi _{1}),\psi (\xi _{2})) \end{aligned}$$
(2)

This means that the surface information represented by the geodesic distances between points on the surface is preserved. Such a mapping is invariant to rigid transformations as well as to any non-rigid deformation that does not change the distances between points on the surface. Based on the assumption that, for the same subject, facial expressions do not change geodesic distances, the dense point correspondence between two faces of the same subject can be estimated using the geodesic distances from surface points to a number of fixed points on both surfaces. Figure 2 illustrates the search for the point \(\mathbf {P'}\) on another surface corresponding to a given point \(\mathbf {P}\), where \(\mathbf {L}_{1}\), \(\mathbf {L}_{2}\), \(\mathbf {L}_{3}\) are three landmark points on one surface with the corresponding geodesic distances to \(\mathbf {P}\) denoted by \(g_{1}\), \(g_{2}\), \(g_{3}\); \(\mathbf {L'}_{1}\), \(\mathbf {L'}_{2}\), \(\mathbf {L'}_{3}\) are three landmark points on the other surface with the corresponding geodesic distances to \(\mathbf {P'}\) denoted by \(g'_{1}\), \(g'_{2}\) and \(g'_{3}\). \(\mathbf {P'}\) is said to correspond to \(\mathbf {P}\) if \(g_{1}=g'_{1}\), \(g_{2}=g'_{2}\) and \(g_{3}=g'_{3}\). For a unique solution, a minimum of three fixed landmark points is needed.

Fig. 2. Geodesic distances for searching point correspondence: (a) original surface; (b) deformed surface from (a).

Fig. 3. Geodesic-map representation for a 3-D face (N is the number of surface points on the face).

The geodesic distance is used in this paper to assist the dense point correspondence search across 3-D faces. The 12 key landmarks extracted using the method described in Sect. 2 are considered as the fixed surface point landmarks on 3-D faces. A geodesic-map representation is proposed to simplify the overall correspondence search. The geodesic-map is built in three steps. The first step is to compute the geodesic distances between the 12 key landmarks and all surface points on a 3-D face. The second step is to re-arrange the geodesic distances related to each key landmark into a geodesic-stripe. The final step is to combine all the geodesic-stripes in order to form the geodesic-map. An example of generating the geodesic-map representation for a 3-D face is illustrated in Fig. 3, where the colour of the surface represents the geodesic distance to a specific landmark. In the geodesic-map, the row index corresponds to the order of the landmarks and the column index matches the order of the surface points. From the figure, it can be seen that the 3-D faces are transformed from \(\mathbb {R}^{3}\) space to an \(\mathbb {R}^{2}\) image space, which enables an efficient dense point correspondence search. Figure 4 shows examples of 3-D faces and their corresponding geodesic-maps.
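The three steps above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: geodesic distances are approximated by shortest paths along mesh edges using Dijkstra's algorithm, which is a stand-in chosen here and not necessarily the geodesic solver used in this work, and the names `geodesic_strip` and `geodesic_map` as well as the vertex/edge input format are hypothetical.

```python
import heapq
import math


def geodesic_strip(vertices, edges, source):
    """Approximate geodesic distances from vertex `source` to all vertices.

    Assumption: distances are measured along mesh edges (Dijkstra), which
    over-estimates the true surface geodesic on coarse meshes.
    """
    adjacency = [[] for _ in vertices]
    for i, j in edges:
        length = math.dist(vertices[i], vertices[j])
        adjacency[i].append((j, length))
        adjacency[j].append((i, length))

    dist = [math.inf] * len(vertices)
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry
        for v, length in adjacency[u]:
            if d + length < dist[v]:
                dist[v] = d + length
                heapq.heappush(heap, (d + length, v))
    return dist


def geodesic_map(vertices, edges, landmark_indices):
    """Stack one strip per landmark: rows = landmarks, columns = surface points."""
    return [geodesic_strip(vertices, edges, s) for s in landmark_indices]


# Toy example: four points along a bent path, with landmarks at both ends.
toy_vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
toy_edges = [(0, 1), (1, 2), (2, 3)]
toy_map = geodesic_map(toy_vertices, toy_edges, [0, 3])
```

In the toy example the two end points are one unit apart in Euclidean terms but three units apart along the path, which is the distinction the geodesic-map relies on.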

Fig. 4. Examples of 3-D face scans with the corresponding geodesic-maps: (a) faces from the same person with four expressions (from left to right): neutral, anger, fear and happiness; (b) corresponding geodesic-maps calculated using 12 key landmarks.

4 Geodesic-Map Matching

Once the geodesic-maps have been created, the pair-wise dense point correspondences among faces can be estimated using standard image-based matching techniques. For faces from the same person, this can be achieved using cross-correlation [22], in which each column of the geodesic-map of the given face is cross-correlated with the columns of the geodesic-map of the target face. The column of the target geodesic-map with the highest cross-correlation value is considered to be in correspondence with the point in question on the given face, as shown in Fig. 5.

Fig. 5. Pair-wise point correspondence search procedure using geodesic-map representation (N and M are the number of surface points on the original face and target face, respectively).
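The column-wise search described above might be sketched as follows. The use of a zero-mean normalised cross-correlation score is an assumption made for this illustration (the exact correlation measure is not specified here), and the helper names `column`, `ncc` and `match_columns` are hypothetical. The search is exhaustive, so its cost grows with the product of the two point counts.

```python
import math


def column(gmap, j):
    """Return column j of a geodesic-map stored as a list of rows."""
    return [row[j] for row in gmap]


def ncc(a, b):
    """Zero-mean normalised cross-correlation of two equal-length vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [x - mean_b for x in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(x * x for x in db))
    if denom == 0.0:
        return 0.0  # a constant column carries no usable pattern
    return sum(x * y for x, y in zip(da, db)) / denom


def match_columns(source_map, target_map):
    """For each source point, index of the best-correlated target point."""
    n_target = len(target_map[0])
    target_cols = [column(target_map, k) for k in range(n_target)]
    matches = []
    for j in range(len(source_map[0])):
        src = column(source_map, j)
        best = max(range(n_target), key=lambda k: ncc(src, target_cols[k]))
        matches.append(best)
    return matches


# Toy example: three landmarks, two surface points, target columns swapped.
source = [[0.0, 2.0], [1.0, 1.0], [3.0, 0.5]]
target = [[2.0, 0.0], [1.0, 1.0], [0.5, 3.0]]
correspondence = match_columns(source, target)
```

Because this score ignores offset and scale, two columns that differ only by a constant factor would be treated as identical; a distance-based score could be substituted if that matters.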

For faces from different persons, the geodesic-map cannot be used directly for the correspondence search because of the characteristics of the geodesic distance: it is assumed to be preserved across expressions of the same subject, but not across different subjects. Computation of the point correspondence between faces of different subjects is nevertheless required to construct the statistical shape model for the face recognition task. To tackle this problem, a data warping process is introduced prior to the geodesic-map matching process when it is applied to faces from different subjects. The data warping is based on the Thin-plate Splines (TPS) warping technique [24] and is applied to the target face. The 12 pairs of key landmarks extracted from the original and target faces are used as the control points for the calculation of the warping function. This function is then used to warp the whole target face to match the original face, so that the standard geodesic-map matching described above can be carried out. This process is able to minimise the non-rigid deformation caused by the change of identity.

5 Dimensionality Reduction

Statistical models have been successfully used for face analysis and recognition for many years. The core of these models is dimensionality reduction, which often serves the purpose of feature vector extraction. Principal Component Analysis (PCA) is a popular choice, producing a compact representation based on low-dimensional linear manifolds [25]. However, PCA-based models fail to discover the underlying nonlinear structure of facial data, especially for faces containing facial expressions. Another choice is Locality Preserving Projection (LPP), which is able to handle a wider range of data variability while preserving the local structure linked to the nonlinear structure of facial data. In this work, the LPP-based statistical model was used and its performance was evaluated for the task of face recognition. Details of the method can be found in [26].

6 New Dataset Fitting

Given the eigenvectors of the statistical model extracted from the training dataset, the next processing stage is to estimate the feature vector needed to synthesise the shape of a face from a new dataset using the constructed statistical model. This is usually achieved by a recursive data registration in which the shape and pose parameters are iteratively estimated in turn. While the pose parameters control the orientation and position of the model, the shape parameters encapsulate its deformation. Instead of applying one of the widely used approaches, the modified Iterative Closest Point (ICP) registration, a hybrid fitting based on the combination of the geodesic-map representation and feature sub-space projection is proposed in this work. In order to solve for all unknown parameters effectively, the following optimization scheme is used:

  1. Create the geodesic-map representation for both the model and a new face using the method described in Sect. 3.

  2. Estimate the dense point correspondence between the model and the new face using the geodesic-map matching process explained in Sect. 4.

  3. Calculate the feature vector, \(\mathbf {\alpha }\), for the new face by projection onto the created feature sub-space, described as

    $$\begin{aligned} \mathbf {\alpha }=\mathbf {W}_{opt}^{T}\widehat{\mathbf {x}} \end{aligned}$$
    (3)

    where \(\widehat{\mathbf {x}}\) contains the surface points of the new face related to the estimated dense correspondences and \(\mathbf {W}_{opt}\) is the matrix containing the basis vectors of the feature sub-space.

  4. Generate a new instance of the statistical model, \(\mathbf {Q}\), using the feature vector \(\mathbf {\alpha }\), as

    $$\begin{aligned} \mathbf {Q}=\mathbf {W}_{opt}\mathbf {\alpha } \end{aligned}$$
    (4)

    and repeat steps 2 to 4 until the preset convergence condition is reached.

In this optimization scheme, the geodesic-map representation and map matching serve a purpose similar to that in the training stage, namely estimating the correspondence between the model and the new face. Since the models have learnt the non-rigid deformation from faces across different identities in the training set and can be adapted to match the deformation in the new dataset, the TPS warping technique described in Sect. 4 is no longer needed. Furthermore, the use of the proposed method speeds up the whole fitting process for the new dataset and saves up to \(70\,\%\) computation time on average compared with the widely used modified ICP registration [19, 27]. A few examples of the fitting results generated using the LPP-based method are shown in Fig. 6. From the figure it can be seen that the shapes of the synthesised faces are very close to those of the new faces.
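The projection and synthesis of Eqs. (3) and (4), together with the repetition of steps 2 to 4, can be sketched as below. This is a minimal illustration rather than the implementation used in this work: `correspond` is a hypothetical callback standing in for the geodesic-map matching of step 2, the matrix is a plain list of rows, and the stopping rule (largest coordinate change below `tol`) is an assumed convergence condition.

```python
def project(W, x_hat):
    """Eq. (3): alpha = W^T x_hat, with W given as a list of rows (d x k)."""
    k = len(W[0])
    return [sum(W[i][c] * x_hat[i] for i in range(len(x_hat))) for c in range(k)]


def reconstruct(W, alpha):
    """Eq. (4): Q = W alpha."""
    return [sum(w * a for w, a in zip(row, alpha)) for row in W]


def fit(W, correspond, model_shape, max_iter=20, tol=1e-6):
    """Repeat steps 2 to 4 until the model instance stops changing.

    `correspond(shape)` is a hypothetical callback returning the new-face
    points x_hat that correspond to the current model instance.
    """
    alpha = None
    for _ in range(max_iter):
        x_hat = correspond(model_shape)          # step 2
        alpha = project(W, x_hat)                # step 3
        new_shape = reconstruct(W, alpha)        # step 4
        change = max(abs(a - b) for a, b in zip(new_shape, model_shape))
        model_shape = new_shape
        if change < tol:
            break
    return alpha, model_shape


# Toy example: a 2-D sub-space of a 3-D "shape" space and a fixed target.
W_toy = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
alpha_toy, shape_toy = fit(W_toy, lambda shape: [2.0, 3.0, 5.0], [0.0, 0.0, 0.0])
```

When the columns of the matrix are orthonormal, the synthesised instance is the orthogonal projection of the corresponded points onto the sub-space; for a non-orthogonal basis the two equations are simply applied as written.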

Fig. 6. Example of new dataset fitting: (a) new faces; (b) synthesised faces after the fitting.

It is worth noting that the feature vector \(\mathbf {\alpha }\) controls the shape of the model in order to match it to the new face. It therefore contains geometrical information about the face and is used as the feature vector for the classification of face identity in this work. A variety of classification methods can be applied, including Nearest-Neighbour, Naive Bayesian, Support Vector Machine, etc. For the sake of simplicity, and to demonstrate the discriminative nature of the shape parameters \(\mathbf {\alpha }\) used as the proposed feature vector, the Nearest-Neighbour classifier is chosen for the face classification in this work.
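A Nearest-Neighbour decision on the shape parameters can be sketched in a few lines. The Euclidean metric and the gallery layout (a list of label and feature-vector pairs) are assumptions made for this illustration; the function name `nearest_neighbour` is hypothetical.

```python
import math


def nearest_neighbour(gallery, probe_alpha):
    """Return the label of the gallery feature vector closest to the probe.

    gallery: list of (label, alpha) pairs; Euclidean distance is assumed.
    """
    label, _ = min(gallery, key=lambda item: math.dist(item[1], probe_alpha))
    return label


# Toy example with two enrolled identities.
toy_gallery = [("subject_a", [0.0, 0.0]), ("subject_b", [5.0, 5.0])]
predicted = nearest_neighbour(toy_gallery, [4.0, 4.5])
```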

7 Experimental Results

To show the effectiveness of the proposed method for face recognition tasks, two publicly available 3-D facial databases, BU-3DFE and Gavab, were used for the evaluation in this work. The BU-3DFE database consists of 2,500 3-D faces from 100 people, with ages ranging from 18 to 70 years, and with a variety of ethnic origins including White, Black, East-Asian, Middle East Asian, Indian and Hispanic Latino [28]. Each person has seven basic expressions. The Gavab database contains 549 face scans from 61 different subjects [29]. Each subject was scanned 9 times with different poses and expressions, giving six neutral scans and three scans with an expression. The scans with missing data comprise one scan while looking up (\(+35^{\circ }\)), one while looking down (\(-35^{\circ }\)), one for the left profile (\(-90^{\circ }\)), one for the right profile (\(+90^{\circ }\)) as well as one with random poses.

7.1 Facial Expression Changes

Robustness to facial expression variation is an important aspect of face recognition. To test the invariance of the face recognition with respect to face articulation, a series of tests were run and the performance of the proposed method was compared with that of state-of-the-art methods, including Geodesic Polar Representation [10], Patch Geodesic Moments [11] and Canonical Image Representation [23]. In order to make a direct comparison with the results reported in [11], the same experimental protocol as in [11] is adopted here. The performance is measured in terms of the rank-1 recognition rate and the Cumulative Matching Characteristics (CMC) [30]. In the test, all faces with neutral expression from the BU-3DFE database are used to form the statistical models, while the rest of the database is used as the testing faces.
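For reference, the two performance measures can be computed from ranked match lists as in the following sketch. It assumes that, for every test face, the gallery identities have already been sorted from best to worst match; the function name `cmc_curve` and the input layout are hypothetical. The first entry of the returned curve is the rank-1 recognition rate.

```python
def cmc_curve(ranked_labels, true_labels, max_rank):
    """Cumulative Matching Characteristics up to `max_rank`.

    ranked_labels[i] lists gallery identities for probe i, best match first.
    Entry r-1 of the result is the fraction of probes whose true identity
    appears within the top r matches.
    """
    hits = [0] * max_rank
    for ranking, truth in zip(ranked_labels, true_labels):
        top = ranking[:max_rank]
        if truth in top:
            first = top.index(truth)
            for r in range(first, max_rank):
                hits[r] += 1
    return [h / len(true_labels) for h in hits]


# Toy example: three probes of identity "a", found at ranks 1, 2 and 3.
toy_rankings = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"]]
toy_cmc = cmc_curve(toy_rankings, ["a", "a", "a"], max_rank=3)
rank1_rate = toy_cmc[0]
```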

The rank-1 recognition rates of the proposed approach are given in Table 1 together with the reported results of the Patch Geodesic Moments, Geodesic Polar Representation and Canonical Image Representation [11]. From Table 1, it can be seen that among the four methods the LPP-based approach achieved the highest recognition rate, with an average accuracy of \(89\,\%\), outperforming the state-of-the-art 3-D expression-invariant techniques by at least \(4\,\%\). It is worth noting that the recognition rates for different expressions range from \(87\,\%\) to \(94\,\%\). This shows that the proposed statistical shape modelling scheme handles facial expression changes well, although facial expressions still introduce some uncertainty into the face recognition task.

The CMC curves of the proposed method and of the benchmark methods are shown in Fig. 7. From the figure it can be noted that the recognition rate of the proposed LPP-based method is always the highest.

Table 1. Performance comparison under facial expression changes.

Fig. 7. Recognition rate obtained under facial expression changes.

7.2 Data Resolution Variation

In many practical applications, the data resolution varies because of the specification of the data acquisition system, data storage requirements or the use of pre-processing. A face recognition system is therefore often required to cope with low-resolution data. In order to evaluate the capability of the proposed method to handle low-resolution data in the face recognition task, a set of experiments was conducted using data with \(75\,\%\), \(50\,\%\) and \(25\,\%\) of the original resolution as the test dataset. The original resolution is approximately 5,000 surface points for each face. In terms of experimental strategy, all 2,500 faces from the BU-3DFE database were used in the experiments. The faces were divided into ten subsets, with each subset containing all 100 subjects and all seven expressions. One subset was selected for testing while the remaining subsets were used for training. The experiment was repeated ten times with a different subset selected for testing each time. The faces in the training set were not used for testing. Figure 8 reports the CMC of the proposed method. From the figure it can be seen that the method achieves reasonable recognition rates at \(75\,\%\) and \(50\,\%\) of the original resolution, whereas the performance is lower at \(25\,\%\).
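The ten-fold protocol described above follows the usual cross-validation pattern, which can be sketched as below. The round-robin assignment of faces to subsets is an assumption made for this illustration; it does not by itself guarantee that every subset contains all subjects and expressions, so the input list would need to be ordered accordingly. The function name `k_fold_splits` is hypothetical.

```python
def k_fold_splits(items, k=10):
    """Yield (train, test) pairs; each of the k subsets is the test set once."""
    folds = [items[i::k] for i in range(k)]  # round-robin assignment
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


# Toy example: 20 face identifiers split into ten train/test partitions.
toy_splits = list(k_fold_splits(list(range(20)), k=10))
```

Each face appears in exactly one test subset, and the training and test sets of any single run are disjoint.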

Table 2. Recognition comparison on missing data using Gavab database.

7.3 Missing Data

In order to evaluate the robustness of the proposed method to missing data, and to compare it with the results achieved by the existing benchmark methods reported in [31], the same experimental protocol as in [31] was used here. The benchmark methods include sparse representation [32], 3-D ridge images [33], concave and convex regions [34] and elastic radial curves [31]. In the experiment, the frontal scans with neutral expression of each person were taken as the training set. The rest of the scans were used for testing. Since the proposed approach is not designed to work on facial scans with a large part of the data missing, the scans for the left and right profiles were not included in testing. Table 2 lists the rank-1 recognition accuracy for the different categories of testing faces. From the table, it can be seen that the proposed approach provides high recognition accuracy under both expression and pose variations, outperforms the majority of the existing methods, and performs close to the best recognition accuracy, achieved by the elastic radial curves [31].

Fig. 8. Recognition rate obtained with various resolution.

8 Conclusions

This paper presents an effective representation, together with a statistical shape modelling scheme, for 3-D face recognition. Given a set of training faces with a variety of facial expressions, the proposed scheme is able to effectively estimate accurate dense point correspondences among the faces using the geodesic-map, construct the statistical shape models and synthesise appropriate shapes for a new face. The face recognition experiments show that the proposed method handles well non-rigid deformations caused by changes in appearance (although changes due to, e.g., weight gain were not tested here) as well as a certain level of missing data. They also provide evidence that the proposed method can cope with the face recognition task at lower data resolution. The use of the geodesic-map also helps to improve the efficiency of the entire face recognition task. The research will be extended further by taking into consideration other practical factors, such as independent databases, lack of training samples and occlusion.