A 3D morphometric perspective for facial gender analysis and classification using geodesic path curvature features

The relationship between the shape and gender of a face, with particular application to automatic gender classification, has been the subject of significant research in recent years. Determining the gender of a face, especially when dealing with unseen examples, presents a major challenge. This is especially true for certain age groups, such as teenagers, due to their rapid development at this phase of life. This study proposes a new set of facial morphological descriptors, based on 3D geodesic path curvatures, and uses them for gender analysis. The aim is to identify the key facial areas related to gender, particularly those suited to the task of gender classification. These new curvature-based features are extracted along the geodesic path between two biological landmarks located in key facial areas. Classification performance based on the new features is compared with that achieved using the Euclidean and geodesic distance measures traditionally used in gender analysis and classification. Five different experiments were conducted on a large teenage face database (4745 faces from the Avon Longitudinal Study of Parents and Children) to investigate and justify the use of the proposed curvature features. Our experiments show that the combination of the new features with geodesic distances provides a classification accuracy of 89%. They also show that nose-related traits provide the most discriminative facial feature for gender classification, with the most discriminative features lying along the 3D face profile curve.


Introduction
Gender identification plays an important role in social communication. Humans find this task relatively easy. They are remarkably accurate at determining the gender of subjects from their facial appearance. Even with altered hairstyle, removal of men's facial hair, and no cosmetic cues, humans can still determine subjects' genders from their faces with more than 95% accuracy [1][2][3][4][5][6][7]. However, achieving similar accuracy in automatic gender classification using computers remains a challenge. Such classification is crucial in many applications, for instance making human-computer interaction (HCI) more user friendly, conducting passive surveillance and access control, and collecting valuable statistics, such as the number of women who enter a store on a given day. Researchers have considered techniques for gender classification since the 1990s, when SexNet, the first automated system capable of gender recognition using the human face, was created [8].
A particular topic of research, for more than two decades, has been the relationship between facial traits and gender classification or face recognition. Enlow and Moyers [9] contended that men have wider and longer noses compared to women, and that the male forehead is more bowed and slanting than the female forehead, while Shepherd [10] argued that the female nose is less pointed than the male nose. Another interesting study highlighted the relation between face parts and face recognition rate; the authentication score is obtained by combining the Surface Interpenetration Measure values corresponding to four different face regions: circular and elliptical areas around the nose, forehead, and the entire face region. Establishing which parts of the face and facial morphology features are most effective for gender classification remains an open research topic due to the strong dependency on ethnicity and age.
There are two main sources of information for gender analysis: the shape and appearance of the face [5].
3D facial images are rich in shape-related information but there are difficulties in capturing such images.
In contrast, 2D facial images are easy to capture but poor in shape-related information. Two studies [12,13] going back to 1997 and 1986, respectively, argued that geometric features are superior to textural features for the identification of visually derived semantic information such as gender, age, and expression. Geometric features of faces are usually defined with landmarks; for example, Farkas [14] annotated a set of 23 anthropometric facial landmarks to extract a set of anthropometric facial measurements (Euclidean distances, proportions, and angles). The present study uses Farkas landmarks to define geometric features relevant to gender variations. This paper proposes new geodesic geometric features for gender analysis; specifically, these are derived from mean and Gaussian curvatures, shape indices, and curvedness calculated along the geodesic path between two landmarks. The determination of such features along geodesic paths is novel. We conduct a thorough investigation of the utility of our new features and of which parts of the face are the most effective for gender discrimination.
Direct Euclidean and geodesic distance measures between facial landmarks are quite common as local geometric gender classification features (e.g., Refs. [15,16]).
In the current study, shape features derived from 3D geodesic paths between facial landmarks are used as descriptors to classify gender; we use as a dataset the extensive Avon Longitudinal Study of Parents and Children (ALSPAC) teenage face database.
The mean curvature, Gaussian curvature, shape index, and curvedness are utilised to gain maximum benefit; these features have shown good results in facial morphology classification in the past. For example, mean and Gaussian curvature features have been utilised to classify philtrum morphology [17]. Shape index and curvedness features have been applied in a wide range of 3D face recognition applications [18][19][20][21][22], and provide good classification results, ranging from 77% to 99% accuracy, depending on the dataset's complexity and the algorithms employed. Zhao et al., in Ref. [23], used a geodesic network generated for each face with predetermined geodesics and iso-geodesics; they then computed the mean curvature, Gaussian curvature, shape index, and curvedness for each network point. The authors then utilised these features for automated 3D facial similarity measurement.
Furthermore, geodesic distances and curves have been utilised extensively in face recognition systems for faces with different poses and expressions (e.g., Refs. [24,25]), and even in video processing (e.g., Ref. [26]). Since these features have shown robust results in previous studies on 3D facial applications, our current work bases its new descriptors on a combination of these features along a geodesic path, to achieve better results.
The new combinations of 3D geodesic path features were assessed in a gender classification application using the ALSPAC dataset containing 4745 3D facial meshes of fifteen-year-olds. This is a challenging dataset, as gender discrimination in young subjects is much more difficult than in adults. The results were then compared with the gender classification results for the same dataset obtained using the method in Ref. [27]; our approach was found to improve classification accuracy by over 8%. An important part of our research was determining the most discriminative parts of the face for gender classification. Nose morphology was found to be most discriminative for teenage Caucasian populations.

Related work
It is logical to focus on biologically significant landmarks in order to extract features for facial gender classification, since gender is a biological characteristic. Facial landmarks can be divided into three broad categories [28]: biological landmarks, mathematical landmarks, and pseudo-landmarks.
Biological landmarks, which are often used by scientists and physicians, are meaningful points that are defined as standard reference points on the face and head, such as the pupil, dacryon, nasion, or pogonion. Mathematical landmarks are defined according to certain mathematical or geometric properties of human faces, such as the middle point between two biological landmarks. Pseudo-landmarks are defined using two or more mathematical or anatomical landmarks or hair contours.
Although the gender classification problem has been the subject of considerable research in recent years, current computer-based vision methods for facial gender recognition tend to overlook the use of facial biological landmarks as the basis for gender classification, despite their capability to efficiently classify gender with a minimum number of features when compared to the methods that use global 3D facial geometric features. For example, Ballihi et al. [24] used a large set of geometric-curve features (circular curves and radial curves) together with the AdaBoost algorithm for feature selection to yield a gender classification rate of 86% on the FRGCv2 dataset. We exploit the usefulness of facial biological landmarks in our work, by introducing a novel 3D set of geometric features based on anthropometric landmarks, with the goals of classifying gender and discovering the relationship between facial morphology and gender.
Gender classification and face recognition with landmark-based and simple geometric features have been the subject of much research in the past. For example, Burton et al. [5] manually annotated 73 biological (anthropometric) landmarks for a dataset of 179 subjects, employing a total of 2628 Euclidean distance measurements. Due to limited computational capacity, the authors handpicked only 19 distances (and related ratios) and used these features, attaining a classification accuracy of 94%. Han et al. [29] utilised more intricate measures such as the volumes and areas of face portions to classify gender, but considered a small public dataset of only 61 subjects. The authors used a support vector machine (SVM) classifier to classify the areas and volumes of five local craniofacial regions: the temple, eyes, nose, mouth, and cheeks. Using five-fold cross-validation, they reported 83% gender classification accuracy. Gilani et al. [15] extracted geodesic and Euclidean distances between 23 biological landmarks annotated manually for 64 3D facial meshes. Using these features, the authors proposed an approach that gave a gender classification accuracy for 3D faces of 90%. Toma [27] derived 250 facial parameters (90 Euclidean distances between landmarks, 118 angles, and 42 Euclidean distance ratios) from the large ALSPAC dataset to predict gender with approximately 80% accuracy.
Finding the relationship between face morphology and gender has also received some recent attention. For example, Brown and Perrett [30] reported results of such an investigation for a database of 32 photographs of male and female faces. Their results showed that the jaw, eyebrows, eyes, and chin contribute (in descending order) to gender perception. For 3D faces, Gilani et al. [15] and Toma [27] considered which parts of the face are most effective in gender discrimination when using distance measurements between anthropometric landmarks. In Ref. [27], the authors used Euclidean distance measures on ALSPAC facial meshes and found the nose ridge to be the most discriminative portion of the face for gender, which is also our finding in this paper. In Ref. [15], the authors used geodesic and Euclidean distances between anthropometric landmarks and found that the distances between the eyes and forehead landmarks are the most gender discriminative distances for 64 adult faces.
It follows from the above literature survey that currently there are no semi- or fully automated gender classification methods that simultaneously satisfy the following requirements: (1) use surface geometric features dependent on 3D biological facial landmarks; (2) analyse which portions of a 3D face are the most discriminative between males and females; and (3) are validated on a large dataset. These three points motivated us to propose a semi-automatic facial gender classification algorithm based on 3D facial morphology and biological landmarks. As a result, this paper proposes new 3D geometric features based on curvature measures calculated from the geodesic path between 3D facial landmarks. We show that these features provide better gender discriminating results than the current state-of-the-art methods, which is due to the improved capability of the included features to represent the shape of 3D facial surfaces.

Dataset and methods
The following section presents an overview of the dataset, landmarks, and tools that are used in this work.

Dataset and landmarks
The Avon Longitudinal Study of Parents and Children (ALSPAC) dataset is used in the present work. The ALSPAC study was designed to explore how an individual's genotype is influenced by environmental factors impacting on health, behavior, and development of children. The initial ALSPAC sample was based on 14,541 pregnant women with an estimated delivery date between April 1991 and December 1992. Out of the initial 14,541 pregnancies, all but 69 had known birth outcomes; 195 sets of twins, 3 of triplets, and 1 of quadruplets represented the multiple births. 13,988 children were alive at one year. Mothers were asked to complete postal questionnaires that covered a range of health outcomes [31]. Ethical approval for using this data in the present study was obtained from the ALSPAC Ethics and Law Committee and the Local Research Ethics Committees [32]. The cohort was recalled when the children were 15 years of age, and 3D scans of their faces were obtained using two Konica Minolta Vivid 900 laser cameras [33]. The final sample represented normal variation in 4747 British adolescents (2514 females and 2233 males); 92% of these individuals were Caucasians, and the remaining subjects (8%) were a mixture of other ethnic groups [34]. Each set of scanned images was imported into Rapidform 2006 (a reverse engineering software package) and processed by removing noise and unwanted areas as well as the color texture in order to highlight morphological features and eliminate the influence of dissimilar facial color tones. Then, 21 3D facial landmarks were manually identified and recorded for each 3D facial image using the method of Toma et al. [35]. The biological landmark points for this dataset as well as their locations and meanings on the human face are shown in Fig. 1; Table 1 presents their definitions.

Estimating curvature normal cycles
A variety of algorithms based on the estimation of curvature tensors may be used to calculate curvatures on triangular meshes. For example, Taubin [36] developed a method called normal cycles, in which the principal curvature directions can be approximated by two of the three eigenvectors, and the principal curvature magnitudes can be calculated by linear combinations of two of the three eigenvalues. This theory provides a unified, simple, and accurate way to determine curvatures for both smooth and polyhedral surfaces [37,38]. The main idea of the normal cycle theory is that in order to acquire a continuous tensor field over an entire surface, a piecewise linear curvature tensor field should be calculated by estimating the curvature tensor at each vertex and then adding those values linearly across triangles. Figure 2 illustrates the main method used to calculate the curvature tensor for each vertex along the edge $e$, where for every edge $e$ of the mesh there is a minimum curvature (along the edge) and maximum curvature (across the edge). These line dense tensors can be averaged over an arbitrary mesh region $B$ according to the following equation:
\[ T(v) = \frac{1}{|B|} \sum_{\text{edges } e} \beta(e)\, |e \cap B|\, \bar{e}\, \bar{e}^{\mathsf{T}} \]
where $v$ represents the vertex position on the mesh, $|B|$ is the surface area around $v$ over which the curvature tensor is estimated, $\beta(e)$ is the signed angle between the normal vectors of the two oriented triangles incident to edge $e$, $|e \cap B|$ is the length of $e \cap B$, and $\bar{e}$ is a unit vector in the same direction as $e$ [39]. In practice, the normal cycle method is fast, and it provides excellent results, although the important issue of how the user should choose the neighbourhood $B$ that approximates a geodesic disk around the vertex $v$ still remains. Altering the neighbourhood size can significantly affect the results: small neighbourhoods provide better estimates (and cleaner data), while an increase in the neighbourhood size smooths the estimates, reducing sensitivity to noise [37,40].
The eigenvectors of $T(v)$ and their associated eigenvalue magnitudes are used to estimate curvatures at each vertex. The principal curvatures $k_1$ and $k_2$ at $v$ are estimated by the eigenvalues, while the eigenvectors represent the curvature directions [17,39].
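The averaging and eigen-decomposition steps can be illustrated with a short Python sketch. It assumes the commonly used normal cycle form, in which each edge contributes its signed dihedral angle times its length inside B times the outer product of its unit direction, and the sum is divided by the area of B. The function names, the edge tuple layout, and the toy input are hypothetical and are not taken from the study's implementation; the association between eigenvectors and principal directions is also simplified here.

```python
import numpy as np

def curvature_tensor(edges, area):
    """Average per-edge tensors: T = (1/area) * sum(beta * length * e e^T).

    edges: iterable of (beta, length_in_B, direction), where beta is the signed
    dihedral angle at the edge, length_in_B is the edge length inside B, and
    direction is a 3-vector along the edge (normalised here).
    """
    T = np.zeros((3, 3))
    for beta, length_in_B, direction in edges:
        e = np.asarray(direction, dtype=float)
        e = e / np.linalg.norm(e)
        T += beta * length_in_B * np.outer(e, e)
    return T / area

def principal_curvatures(T):
    """Return (k1, k2, normal) with k1 >= k2 from a symmetric 3x3 tensor."""
    vals, vecs = np.linalg.eigh(T)
    order = np.argsort(np.abs(vals))      # smallest |eigenvalue| first
    normal = vecs[:, order[0]]            # near-zero eigenvalue: surface normal
    k = sorted(vals[order[1:]], reverse=True)
    return float(k[0]), float(k[1]), normal

# Toy neighbourhood (assumed values): two edges in the xy-plane.
toy_edges = [(0.4, 1.0, (1, 0, 0)), (0.1, 2.0, (0, 1, 0))]
T = curvature_tensor(toy_edges, area=2.0)
k1, k2, n = principal_curvatures(T)
print(k1, k2, n)
```

In a real pipeline the edge list would be gathered from the mesh ring around each vertex, so the result depends on the chosen neighbourhood size.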
Mean ($H$) and Gaussian ($G$) curvatures are given in terms of the principal curvatures $k_1$ and $k_2$ by
\[ H = \frac{k_1 + k_2}{2}, \qquad G = k_1 k_2 \]
The sign of $G$ indicates whether the surface is locally elliptic or hyperbolic [41]. The shape index $S$ quantitatively measures the shape of a surface at a point $p$ and captures the intuitive notion of the local shape of a surface. Every distinct surface shape corresponds to a unique value of $S$ (except for planar shapes). The shape index for any surface point can be calculated in terms of the principal curvatures $k_1$ and $k_2$ at that point as
\[ S = \frac{2}{\pi} \arctan\!\left( \frac{k_2 + k_1}{k_2 - k_1} \right), \qquad k_1 \geq k_2 \]
Another surface feature, called curvedness $R$, measures how much a surface is bent. Curvedness can define the scale difference between objects, such as the difference between a soccer ball and a cricket ball. This feature can also be calculated in terms of the principal curvatures as follows [42]:
\[ R = \sqrt{\frac{k_1^2 + k_2^2}{2}} \]
In general, shape index and curvedness are robust surface characteristics of a 3D image; they are invariant to changes in 3D image orientation. Figure 3 shows how variations in shape and curvedness, $S$ and $R$, can be represented as polar coordinates within a Cartesian coordinate frame given by two principal curvatures ($k_1$ and $k_2$).
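As a worked illustration, the four curvature features can be computed from a pair of principal curvatures as sketched below. The sketch assumes the standard definitions H = (k1 + k2)/2 and G = k1 k2, the Koenderink and van Doorn form of the shape index, S = (2/π) arctan((k2 + k1)/(k2 − k1)) with k1 ≥ k2, and R = sqrt((k1² + k2²)/2); the function name is hypothetical and the sign convention of S depends on the orientation of the surface normal.

```python
import math

def curvature_features(k1, k2):
    """Return (H, G, S, R) for principal curvatures k1, k2 (reordered so k1 >= k2)."""
    if k1 < k2:
        k1, k2 = k2, k1
    H = 0.5 * (k1 + k2)                       # mean curvature
    G = k1 * k2                               # Gaussian curvature
    R = math.sqrt(0.5 * (k1 * k1 + k2 * k2))  # curvedness
    if k1 == 0.0 and k2 == 0.0:
        S = float("nan")                      # shape index is undefined on a plane
    else:
        # arctan((k2 + k1) / (k2 - k1)) written with atan2 so that the
        # umbilic case k1 == k2 does not divide by zero.
        S = (2.0 / math.pi) * math.atan2(-(k1 + k2), k1 - k2)
    return H, G, S, R

for name, (a, b) in {"saddle": (1.0, -1.0), "cylinder": (1.0, 0.0), "sphere": (1.0, 1.0)}.items():
    print(name, curvature_features(a, b))
```

Note that a sphere of radius 1 and a sphere of radius 10 share the same shape index but differ in curvedness, which is the scale distinction described in the text.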

Geodesic path and distance
A geodesic path is the shortest curve or path between two points on a surface; the geodesic distance may be defined as the length of this curve or path [44] (see Fig. 4). The computation of geodesic paths and distances on triangular meshes is a common problem in many computer graphics and vision applications. Several approaches may be used to compute geodesic distances and paths on triangular meshes; some are approximate, while others are exact. Exact methods include the Mitchell, Mount, and Papadimitriou (MMP) method [45], the Chen and Han (CH) method [46], and the Xin and Wang method [47]; the heat method [48] is a fast approximate alternative. Other exact methods (see, e.g., Refs. [49][50][51][52]) could also be considered, to determine whether they improve gender classification accuracy.
We use the popular fast marching method [53]. Although this is an approximate method, in practice the average error is below 1%, and the computational time and memory requirements are considerably lower than those of other methods. In addition, the method has been shown to work well with large 3D meshes [54].
The fast marching method is a widely used algorithm in computer vision and computer graphics; for instance, it has been utilised to solve global minimisation problems for deformable models [55]. This algorithm works as follows. Suppose we are given a metric $M(s)\,ds$ on some manifold $S$ satisfying $M > 0$. If we have two points, $r_0 \in S$ and $r_1 \in S$, the weighted geodesic distance between $r_0$ and $r_1$ is defined as
\[ d(S, r_0, r_1) = \min_{\gamma} \int_0^1 \|\gamma'(t)\|\, M(\gamma(t))\, dt \]
where the $\gamma$'s are all possible piecewise regular curves on $S$ such that $\gamma(0) = r_0$ and $\gamma(1) = r_1$. Fixing the point $r_0$ as the starting point, the distance $U(r) = d(S, r_0, r)$ to all other points, $r$, can be computed by propagating the level set curve $C_t = \{r : U(r) = t\}$ using the evolution equation $\partial C_t / \partial t\,(r) = n_r / M(r)$, where $n_r$ is the exterior unit normal to $C_t$ at the point $r$ and $U(r)$ satisfies the nonlinear Eikonal equation [53,56]:
\[ \|\nabla U(r)\| = M(r), \qquad U(r_0) = 0 \]
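To make the notion of a path over the mesh concrete, the following Python sketch computes a shortest path between two vertices. It is deliberately not fast marching: as a simpler stand-in it runs Dijkstra's algorithm along mesh edges only, which gives an upper bound on the true geodesic distance because the path cannot cross triangle interiors. The function name and the toy mesh are hypothetical.

```python
import heapq
import math

def edge_graph_geodesic(vertices, triangles, source, target):
    """Shortest path along mesh edges (Dijkstra). Returns (length, vertex path)."""
    adjacency = {i: set() for i in range(len(vertices))}
    for a, b, c in triangles:
        adjacency[a].update((b, c))
        adjacency[b].update((a, c))
        adjacency[c].update((a, b))

    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break
        if d > dist.get(u, math.inf):
            continue                      # stale heap entry
        for w in adjacency[u]:
            nd = d + math.dist(vertices[u], vertices[w])
            if nd < dist.get(w, math.inf):
                dist[w] = nd
                prev[w] = u
                heapq.heappush(heap, (nd, w))

    if target not in dist:
        return math.inf, []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]

# Toy "tent" mesh: the surface path must climb over a ridge.
toy_vertices = [(0, 0, 0), (1, 0, 1), (2, 0, 0), (1, 1, 1)]
toy_triangles = [(0, 1, 3), (1, 2, 3)]
length, path = edge_graph_geodesic(toy_vertices, toy_triangles, 0, 2)
print(length, path, math.dist(toy_vertices[0], toy_vertices[2]))
```

The vertex list returned here plays the role of the path along which curvature values would be sampled; the surface distance exceeds the straight-line Euclidean distance between the same two points.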

Euclidean distance
Researchers have frequently used geodesic and Euclidean distances as features for 3D facial recognition, 3D facial morphological analysis, and gender identification. In Ref. [14], 3D Euclidean distance was used to measure the deviations of morphological facial traits from a normal face; these distances have also been used to delineate syndromes in Ref. [57]. Some studies have shown, however, that geodesic distances are more appropriate for gender identification and for measuring levels of facial masculinity/femininity [15,58]. The present work uses both Euclidean and geodesic distances as features for gender classification in order to compare their performance with the proposed geodesic curvature features (i.e., mean curvature, Gaussian curvature, shape index, and curvedness for geodesic paths between landmarks). Figure 4 shows the difference between Euclidean and geodesic distances.

LDA classifier
The present study uses linear discriminant analysis (LDA) as a binary classifier to predict the gender of 4745 3D facial meshes, since this classifier is easy to implement and does not require the adjustment of any tuning parameters. LDA has been successfully used for gender classification in the past [15,59].
In preliminary tests, we also found that LDA outperformed another popular classifier of choice, the support vector machine (SVM) [60]. LDA attempts to maximise the ratio of between-class scatter to within-class scatter. Suppose we have $n$ feature vectors $\{x_1, \ldots, x_n\}$, $N_1$ of which belong to $W_1$ (the first class) and $N_2$ of which belong to $W_2$ (the second class). Then, to compute the linear discriminant projection for these two classes, the following steps should be followed.
Calculate the class means:
\[ \mu_i = \frac{1}{N_i} \sum_{x \in W_i} x, \qquad i = 1, 2 \]
Next, calculate the class covariance matrices:
\[ S_i = \sum_{x \in W_i} (x - \mu_i)(x - \mu_i)^{\mathsf{T}}, \qquad i = 1, 2 \]
Then the within-class scatter matrix is $S_W = S_1 + S_2$ and the between-class scatter is
\[ S_B = (\mu_1 - \mu_2)(\mu_1 - \mu_2)^{\mathsf{T}} \]
The LDA projection is then obtained as the solution to the generalised eigenvalue problem:
\[ S_W^{-1} S_B\, W^{*} = \lambda W^{*} \]
Gender is then assigned by projecting the extracted 3D facial feature descriptor of the tested face onto the LDA space and computing its Euclidean distance to each of the projected class means, $\mu_1^{*} = (W^{*})^{\mathsf{T}} \mu_1$ and $\mu_2^{*} = (W^{*})^{\mathsf{T}} \mu_2$; the face is given the label of the nearer mean [15,58,61].
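The two-class procedure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the study's implementation: it uses unnormalised class scatter matrices, exploits the fact that for two classes the leading generalised eigenvector is proportional to the inverse within-class scatter applied to the difference of the class means, and uses a pseudo-inverse for numerical safety. The function names and the toy data are hypothetical.

```python
import numpy as np

def fit_two_class_lda(X1, X2):
    """Return (w, m1, m2): unit projection vector and projected class means."""
    X1 = np.asarray(X1, dtype=float)
    X2 = np.asarray(X2, dtype=float)
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    S1 = (X1 - mu1).T @ (X1 - mu1)        # class scatter, class 1
    S2 = (X2 - mu2).T @ (X2 - mu2)        # class scatter, class 2
    Sw = S1 + S2                          # within-class scatter
    # With S_B = (mu1 - mu2)(mu1 - mu2)^T, the leading eigenvector of
    # Sw^{-1} S_B is proportional to Sw^{-1} (mu1 - mu2).
    w = np.linalg.pinv(Sw) @ (mu1 - mu2)
    w = w / np.linalg.norm(w)
    return w, float(w @ mu1), float(w @ mu2)

def predict_class(x, w, m1, m2):
    """Assign class 1 or 2 according to the nearer projected class mean."""
    z = float(np.asarray(x, dtype=float) @ w)
    return 1 if abs(z - m1) <= abs(z - m2) else 2

# Toy data (assumed): two well-separated clusters in 2D.
class1 = [[0, 0], [1, 0], [0, 1], [1, 1]]
class2 = [[4, 4], [5, 4], [4, 5], [5, 5]]
w, m1, m2 = fit_two_class_lda(class1, class2)
print(predict_class([0.2, 0.3], w, m1, m2), predict_class([4.8, 4.1], w, m1, m2))
```

Because the projection is one-dimensional for two classes, the Euclidean distance to each projected mean reduces to an absolute difference.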

Verification and validation
An n-fold cross-validation scheme was chosen to evaluate the capability of LDA in gender classification. A cross-validation procedure is generally recommended for data mining and machine learning when estimating a classifier's performance, as well as to avoid over-fitting. In this scheme, the dataset is divided into n equal subsets: one is used for testing, while the others are used to train the LDA classifier. This step is repeated for the other subsets so that each is used once as the test sample. Three measures are used to assess the performance of the LDA classifier: percentage accuracy, sensitivity, and specificity [62]:
\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \text{Sensitivity} = \frac{TP}{TP + FN}, \qquad \text{Specificity} = \frac{TN}{TN + FP} \]
where $TP$ is the number of true positives (i.e., LDA identifies as a man someone who was labeled as such), $TN$ is the number of true negatives (i.e., the classifier recognises as a woman someone who was labeled as such), $FP$ is the number of false male classifications, and $FN$ is the number of false female classifications. Accuracy indicates overall detection performance, sensitivity is defined as the capability of features to accurately identify a male, and specificity indicates the features' capability not to identify a false male.
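The three measures follow directly from the confusion counts, as the short Python sketch below shows. It assumes the usual definitions (accuracy as the fraction of correct labels, sensitivity as TP/(TP + FN), specificity as TN/(TN + FP)) with male treated as the positive class, as in the text; the function name and the label strings are hypothetical.

```python
def classification_metrics(y_true, y_pred, positive="M"):
    """Return (accuracy, sensitivity, specificity) with `positive` as the male label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return accuracy, sensitivity, specificity

# Toy labels (assumed): 4 males and 4 females, with one error in each class.
truth = ["M", "M", "M", "F", "F", "F", "F", "M"]
guess = ["M", "M", "F", "F", "F", "M", "F", "M"]
print(classification_metrics(truth, guess))
```

Multiplying each value by 100 gives the percentage form used when reporting results.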

Gender analysis approach
An overview of our gender analysis approach is in Fig. 5, while different components of the algorithm are explained below.

Preprocessing
The current study used 4745 subjects (all British adolescents: 2512 females and 2233 males) from the ALSPAC dataset; all 3D faces had neutral expressions with a frontal view. For each face, all 21 of the 3D landmarks were used to extract feature descriptors. Generalised Procrustes Analysis (GPA) was performed to register (align) all sets of 21 facial landmarks by removing translation and rotation [63], and the respective facial meshes were transformed accordingly; this step was required to reduce any errors due to different face positions during computation of geodesic distances. A smoothing (Laplacian) filter [64] was then used to reduce high-frequency information (i.e., spikes) in the geometry of the mesh, thus giving the mesh a better shape and distributing the vertices more evenly [65].
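The registration step can be sketched as follows. This is a minimal Python illustration of a Procrustes-style alignment that removes translation and rotation only (no scaling), assuming the Kabsch algorithm for each pairwise rotation and an iteratively updated mean shape; the function names, iteration limit, and toy landmarks are hypothetical and do not reproduce the study's software.

```python
import numpy as np

def rigid_align(shape, reference):
    """Rotate and translate `shape` (n x 3) onto `reference` (Kabsch algorithm)."""
    shape_c = shape - shape.mean(axis=0)
    ref_mean = reference.mean(axis=0)
    ref_c = reference - ref_mean
    U, _, Vt = np.linalg.svd(shape_c.T @ ref_c)
    d = 1.0 if np.linalg.det(U @ Vt) >= 0 else -1.0   # exclude reflections
    R = U @ np.diag([1.0, 1.0, d]) @ Vt
    return shape_c @ R + ref_mean

def generalised_procrustes(shapes, max_iter=10, tol=1e-10):
    """Align all landmark sets to an iteratively re-estimated mean shape."""
    shapes = [np.asarray(s, dtype=float) for s in shapes]
    mean = shapes[0] - shapes[0].mean(axis=0)         # initial reference
    aligned = shapes
    for _ in range(max_iter):
        aligned = [rigid_align(s, mean) for s in shapes]
        new_mean = np.mean(aligned, axis=0)
        converged = np.linalg.norm(new_mean - mean) < tol
        mean = new_mean
        if converged:
            break
    return aligned, mean

# Toy landmarks (assumed): the second set is a rotated, shifted copy of the first.
base = np.array([[0, 0, 0], [1, 0, 0], [0, 2, 0], [0, 0, 3]], dtype=float)
t = np.pi / 6
Rz = np.array([[np.cos(t), -np.sin(t), 0], [np.sin(t), np.cos(t), 0], [0, 0, 1]])
moved = base @ Rz.T + np.array([5.0, -2.0, 1.0])
aligned, mean_shape = generalised_procrustes([base, moved])
print(np.abs(aligned[0] - aligned[1]).max())
```

The same rotation and translation found for each landmark set would then be applied to the vertices of the corresponding facial mesh.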

Extraction of geodesic paths
Peyre's MATLAB fast marching toolbox [66] was used to determine geodesic paths and geodesic distances between landmarks. To this end, a number of landmark pairs were selected following the recommendations of Toma [27]; the same landmarks were used for calculation of the respective Euclidean distances. Figure 6 illustrates the paths used for further analysis in four facial regions: forehead/eyes, nose, upper lip, and lower lip/chin. For each face, eight paths were used for the forehead/eyes region, nine paths for the nose region, ten paths for the upper lip region, and six paths for the lower lip/chin region. These paths were selected following the gender classification results of Toma [27], who found that only 24 distances provided gender discrimination efficiency of over 70% when considering 250 Euclidean inter-landmark distances in the ALSPAC facial dataset.

Curvature features
Principal curvatures were first computed for all points of each path; the other features (mean curvature, Gaussian curvature, shape index, and curvedness) were then calculated from the principal curvatures as explained. Extraction of these features depends on the selection of the ring size, the size of the neighborhood (number of mesh layers) around the vertex used to calculate the curvature tensor. Ring size also affects the local surface smoothing (for curvature calculations). The ring size should be large if noise is present, but smoothing can also mask surface details. In the present study, a ring size of 2 was used. For more details on the principal curvature calculation algorithm, see Ref. [17].

Normalisation of curvature features
Each geodesic path has a different number of nodes (vertices) for curvature calculation. To cope with this, a normalised histogram distribution was calculated for each path feature; for this purpose, the number of bins selected was 5, 10, 15, 20, or 25, depending on the minimum number of nodes in a path across the entire sample. Let $P^k_1, \ldots, P^k_n$ denote the vertices of path $P^k$ on facial mesh $k$ and let $m^k_i$, $g^k_i$, $c^k_i$, and $s^k_i$ denote, respectively, the mean curvature, Gaussian curvature, curvedness value, and shape index value evaluated at vertex $P^k_i$ ($i = 1, \ldots, n$). For each path, we choose a number $b$ = 5, 10, 15, 20, or 25 such that $b \leq \min n$, where $\min n$ is the minimum number of vertices in all paths $P^k$ across the sample. After histogram normalisation with $b$ bins, we get exactly $4b$ characteristic curvature features for path $P^k$:
\[ \hat{m}^k = (\hat{m}^k_1, \ldots, \hat{m}^k_b), \quad \hat{g}^k = (\hat{g}^k_1, \ldots, \hat{g}^k_b), \quad \hat{c}^k = (\hat{c}^k_1, \ldots, \hat{c}^k_b), \quad \hat{s}^k = (\hat{s}^k_1, \ldots, \hat{s}^k_b) \]
where the hat denotes the respective values resulting from histogram normalisation. Then we compose a feature descriptor $D^k = (\hat{m}^k, \hat{g}^k, \hat{c}^k, \hat{s}^k)$ consisting of $4b$ components.
In addition, we compose a feature descriptor for each region of face $k$ by concatenating the descriptors of the paths in that region.
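The way histogram normalisation yields a fixed-length descriptor from paths of different lengths can be sketched in Python as follows. The sketch assumes that each histogram is normalised by the number of vertices on the path and that a fixed value range per feature is shared across the sample; neither choice is stated in the text, and the function names and ranges are hypothetical.

```python
def normalised_histogram(values, bins, lo, hi):
    """Fractions of `values` falling into `bins` equal-width bins on [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        clipped = min(max(v, lo), hi)             # out-of-range values go to end bins
        idx = min(int((clipped - lo) / width), bins - 1)
        counts[idx] += 1
    return [c / len(values) for c in counts]

def path_descriptor(mean_c, gauss_c, curvedness, shape_index, bins, ranges):
    """Concatenate four normalised histograms into a descriptor of 4 * bins values."""
    descriptor = []
    features = (mean_c, gauss_c, curvedness, shape_index)
    for values, (lo, hi) in zip(features, ranges):
        descriptor.extend(normalised_histogram(values, bins, lo, hi))
    return descriptor

# Two toy paths (assumed values) with 7 and 12 vertices give equal-length descriptors.
ranges = [(-1.0, 1.0), (-1.0, 1.0), (0.0, 2.0), (-1.0, 1.0)]
def ramp(n, lo, hi):
    return [lo + (hi - lo) * i / (n - 1) for i in range(n)]
short = path_descriptor(ramp(7, -1, 1), ramp(7, -1, 1), ramp(7, 0, 2), ramp(7, -1, 1), 5, ranges)
long_ = path_descriptor(ramp(12, -1, 1), ramp(12, -1, 1), ramp(12, 0, 2), ramp(12, -1, 1), 5, ranges)
print(len(short), len(long_))
```

A region descriptor would then be obtained by concatenating the descriptors of the paths in that region, giving 4b values per path.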

Calculation of geodesic and Euclidean distances
For each face, 33 geodesic distances and 33 Euclidean distances were calculated between the same landmarks utilised to extract the geodesic paths shown in Fig. 6. The fast marching algorithm was used to compute geodesic distances.

Classification
The LDA classifier was used to determine gender using a five-fold validation process, as suggested in Ref. [67] for large datasets. Using this process, the 4745 facial meshes were first partitioned into five equally sized sets (folds); five iterations of training and validation were subsequently performed in such a way that within each iteration, a different fold of the data was withheld for validation, while the remaining four folds were used for training.
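The partitioning described above can be sketched in Python as follows. The shuffling, the fixed seed, and the function name are assumptions made for illustration; the text does not state how faces were assigned to folds.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # assumed: random assignment
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test

# With 4745 meshes and five folds, each fold holds 949 faces.
for train, test in k_fold_indices(4745, k=5):
    print(len(train), len(test))
```

Each face appears in exactly one validation fold, so every mesh contributes once to the averaged accuracy.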

Experiments
Five computational experiments were designed for this study in order to determine an optimal set of features for gender classification; an investigation was also conducted on the influence of different parts of the face on gender classification accuracy within the ALSPAC dataset. Experiments 1 through 3 determined which facial features were best suited to gender classification; they also investigated which facial regions (eyes, forehead, chin, lips, and nose) were most important for the task. Experiments 1 and 2 used the traditional Euclidean and geodesic distances as classification features, while Experiment 3 utilised our novel feature descriptors. Experiment 4 then examined the effect on the classification scores of combining the Euclidean distances, geodesic distances, and geodesic path curvature features. Finally, Experiment 5 sought the most discriminatory features for gender recognition in each facial region. The evaluation criterion for all experiments was the average gender-classification accuracy using five-fold cross-validation. In addition, sensitivity and specificity measures were also determined. A detailed explanation of these experiments now follows.

Experiment 1: Euclidean distances
We classified gender using 33 Euclidean distances extracted from the 21 biologically significant landmarks, as shown in Fig. 6. Our algorithm classified 79.4% of faces correctly as either male or female. Table 2 shows the gender recognition accuracies as well as the sensitivities and specificities for different facial parts. The Euclidean distances were calculated as in the previous study [27] on the ALSPAC dataset, which used these as features for gender recognition.

Experiment 2: geodesic distances
The second experiment used geodesic distances to determine facial gender; many previous studies [16,68,69] suggested that geodesic distances may represent 3D models better than 3D Euclidean distances. Using the fast marching algorithm, 33 geodesic distances were calculated between a number of facial landmarks as shown in Fig. 6. Gender classification results obtained with these features are shown in Table 3.

Experiment 3: geodesic path curvature
The previous two experiments utilised the Euclidean and geodesic distances as features for gender recognition, both of which are very common features for this task. In contrast, the third experiment uses our novel feature descriptors based on the geodesic path curvatures. As explained in Section 4.1, our feature descriptors rely on selecting certain points of a geodesic path between landmarks (see Fig. 6), followed by determining the mean curvature, Gaussian curvature, shape index, and curvedness features for those points. We used histogram normalisation to resolve the problem of variations in face size. Before applying the resulting feature descriptors, it was important to analyse which histogram bin size would provide the best results. Different bin sizes were tried to achieve the best classification accuracy. Figure 7 demonstrates the relation between classification accuracy and bin size for each facial region. As is clear from Fig. 7, the optimal result was obtained using a bin size of 5 for both the nose and lower lip/chin regions, 10 for the forehead/eyes region, and 15 for the upper lip region. With these bin sizes, the overall gender recognition accuracy was 87.3%, much higher than achieved in Experiments 1 and 2. Table 4 shows the accuracy, sensitivity, and specificity obtained for all facial regions using the geodesic path curvature feature descriptors.

Experiment 4: combination of features
With the results of the previous experiments, we were able to rank the features according to their classification accuracy. The best result was achieved when geodesic path curvature features were used, while Euclidean distances provided the poorest result. In the fourth experiment, the robustness of the gender recognition performance was explored with the aid of different combinations of Euclidean distances, geodesic distances (after scaling using a min-max algorithm [70]), and the geodesic path curvature features. The total 3D facial gender-recognition rates are shown in Table 5. In general, combining different features improves the classification results. The best performance, 88.6% accuracy, was achieved when geodesic distance and geodesic path curvature features were concatenated.
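The scaling and concatenation step can be illustrated with the Python sketch below. It assumes that the min-max algorithm refers to the usual column-wise rescaling to the interval [0, 1]; the function names and toy values are hypothetical. In a cross-validated setting the minima and maxima would normally be taken from the training folds only, which this sketch does not show.

```python
def min_max_scale(rows):
    """Scale each column of `rows` (equal-length lists) to the interval [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0   # constant columns map to 0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

def combine_features(distance_rows, curvature_rows):
    """Concatenate scaled distances with curvature descriptors, face by face."""
    scaled = min_max_scale(distance_rows)
    return [d + list(c) for d, c in zip(scaled, curvature_rows)]

# Toy values (assumed): two distances and two histogram bins for three faces.
distances = [[10.0, 1.0], [20.0, 3.0], [30.0, 5.0]]
curvature = [[0.2, 0.8], [0.5, 0.5], [0.9, 0.1]]
print(combine_features(distances, curvature))
```

Scaling places the distances in the same numerical range as the normalised histogram values before the features are concatenated.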

Experiment 5: discrimination capability of landmarks
Our gender classification approach relies on biological landmarks as basic points to determine different classification features. Certain landmarks were used for each facial region to select Euclidean distance, geodesic distances, and geodesic path curvatures following the recommendations of Toma [27]. The aim of this experiment was to determine the landmark pairs that define the best geodesic-curvature based features (as described above) for gender discrimination. To achieve this aim, the paths connecting landmarks were ranked according to their individual classification accuracies obtained with the best combination of features (geodesic distance and geodesic path curvatures) from Experiment 4. Table 6 ranks inter-landmark paths for each face region, while Fig. 8 shows the three geodesic paths with the highest rank for each region.

Fig. 8
Geodesic paths with the highest gender discrimination capability.

General discussion
The first three experiments in this study aimed to determine which facial features are most effective for gender classification. Using only 3D Euclidean distance (Experiment 1), we found the gender classification accuracy to be 79.4%, which is well below human perception accuracy but close to the results of Toma [27]. Experiment 2 demonstrated that geodesic distances between facial landmarks provide better gender identification; this is to be expected, since geodesic distance is a better measure of face shape than Euclidean distance. However, the classification accuracy determined using this measure was still below the human accuracy threshold.
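The difference between the two distance measures can be made concrete with a small sketch. It assumes that a geodesic path is available as an ordered list of 3D points on the surface and approximates its length by summing the segment lengths; computing the path itself on a mesh is outside the scope of this illustration. The function names and the semicircular toy surface are hypothetical.

```python
import math


def euclidean_distance(p, q):
    """Straight-line distance between two 3D landmarks."""
    return math.dist(p, q)


def path_length(points):
    """Length of a polyline, used here as an approximation of geodesic length."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


# Toy path: a semicircular arc of radius 1 between two "landmarks".
n_segments = 100
arc = [
    (math.cos(math.pi * i / n_segments), 0.0, math.sin(math.pi * i / n_segments))
    for i in range(n_segments + 1)
]
straight = euclidean_distance(arc[0], arc[-1])  # chord through the arc's interior
along_surface = path_length(arc)                # follows the curved surface
```

Two landmark pairs with the same Euclidean separation can therefore have quite different geodesic lengths, depending on how strongly the surface bulges between them.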
Our novel feature descriptors (geodesic path curvatures) were evaluated in Experiment 3 and produced a classification accuracy of 87.3%. The proposed geometric descriptor is an amalgamation of the mean curvature, Gaussian curvature, curvedness, and shape index at the vertices of the path and thus represents a richer description of the surface than simple Euclidean or geodesic distance measures, explaining the obvious improvement in classification accuracy.
As shown in previous studies [15][16][17], using combinations of facial features can further improve classification accuracies. We explored the various combinations of Euclidean or geodesic distances with our new geodesic path features. We achieved a further improvement in gender classification accuracy (88.6%) using a combination of geodesic distances between landmarks and our geodesic path features.
This result compares favourably with other methods for gender classification. To the best of our knowledge, this is the best published result based on an anthropometric landmark approach, and was achieved for a credibly large sample of 4745 facial meshes. Several studies [5,15,71] used geodesic distances, Euclidean distances, or their combination, as geometric gender classification features for both 2D and 3D facial images. The reported classification accuracies were generally higher than ours, but were achieved for much smaller sample sizes. Burton et al. [5] reported 96% accuracy for a sample of 179 faces, while Fellous [71] obtained 90% accuracy for 109 facial images. Gilani et al. [15] achieved 98.5% accuracy for 64 3D facial scans.
Other studies have only utilised global facial features for gender classification. For example, Wu et al. [61] used raw shape from shading depth features to achieve a gender classification accuracy of 69.9% with the FRGCv1 dataset comprising 200 subject faces. Lu et al. [72] obtained a gender classification rate of 85.4% using the vertices of a generic facial mesh fitted to the raw 3D data as a classification feature descriptor for the same FRGCv1 dataset. Ballihi et al. [24] achieved a classification accuracy of 86% using a combination of radial and circular curves as classification features, and specified curves on the nose, forehead, and cheeks as a compact signature of a 3D face for face recognition and gender selection. However, it should be noted that the authors used a relatively small sample of 466 subject faces. It should also be noted that none of the above global methods is suitable for investigating specific relationships between individual facial regions and gender classification accuracy, which was the aim of this work. The present study operated on a large population cohort of 4745 fifteen-year-old Caucasian adolescents; thus, the gender recognition effectiveness identified in this study is likely to be more robust than that of other studies based on smaller samples.
Physiological and psychological research [1,5,73] supports the idea that facial and gender recognition in the human brain is based more on individual regions than on the whole face. For example, Edelman et al. [74] compared human performance against a computer model in the classification of gender in 160 adult individuals (80 males, 80 females) using frontal facial images. The study revealed that humans were better than computers at discriminating females based on the upper face, whereas for males the human accuracy was better for the lower face. It was also highlighted that males have thicker eyebrows and larger noses and mouths than females. Several forensic and anthropometric studies have also shown that the female face, mouth, and nose are smaller than in males [14].
Based on this information, the first three experiments conducted in the present study concentrated on using individual facial parts to determine the gender recognition capability. Figure 9 shows an annotated view comparing the classification performance for 3D facial parts for each feature type (Euclidean and geodesic distances and geodesic path curvatures).
As can be seen from Fig. 9, the nose is the most important facial area for gender discrimination in the ALSPAC dataset. In addition, the sensitivity and specificity results in Tables 2-4 identify the nasal morphological areas that are most effective for discriminating gender in young Caucasian people.
This finding is in agreement with medical studies [75][76][77], which addressed changes in nasal shapes and sizes in groups of 11-17-year-old subjects in relation to gender discrimination. These studies have found that nasal height and nasal bridge length become fully mature at 15 years of age in males and 12 years of age in females.
After establishing a set of strong gender-differentiating geometric features, we evaluated the discrimination capabilities of pairs of landmarks and their curvature features along the shortest geodesic path between them. We did this by finding prime determinants of classification accuracy using the LDA classification method. Such landmarks can then form the basis of a more efficient, focused selection of specific manual landmarks or even assist in developing a suitable directed automated landmark detection approach.
Our results indicate that the landmarks that describe 3D facial profile curves are important in gender classification, as shown in Fig. 8. These findings validate other studies that have relied solely on 3D profile curves. For example, Lei et al. [78] extracted the central vertical profile and the nasal tip transverse profile, and located the face feature points by analysing the curvatures of profiles to obtain ten 3D geometric facial features with a rank-1 accuracy of 98.8% using the ZJU-3DFED dataset and a rank-2 accuracy of 100% with the 3DFACE-XMU dataset. Similarly, Ter Haar and Veltkamp [79] performed 3D face matching and evaluation using the profile and contour of the facial surface to achieve a mean average precision (MAP) of 0.70 and a 92.5% recognition rate (RR) on the 3D face Shape Retrieval Contest dataset (SHREC'08), and an MAP of 0.96 and a 97.6% RR on the University of Notre Dame (UND) dataset.
Our analysis revealed that the path between the inner canthi of the eyes (enR-enL), the ala shape path (alL-prn-alR), and the Cupid's bow path (cphL-ls-cphR) are the best characteristic paths for gender classification. These results are corroborated by previous studies conducted on the same ALSPAC dataset. Examples include Toma [27], who worked on whole 3D faces (PCA of a small set of anthropometric landmarks and Euclidean distance measures), and Wilson et al. [80], who worked only on the lower parts of 3D faces using manually identified regions. These studies identified approximately the same face regions, but with lower classification accuracy.
Finally, our analysis of the sensitivity and specificity results showed little difference across the region-based gender classification results obtained with the Euclidean distances, geodesic distances, and geodesic path curvature features, with the exception of the nose. In general, our first four experiments yielded good specificity and sensitivity results, particularly Experiment 4, in which the geodesic distance and geodesic path curvature features were combined; the sensitivity was 0.87 and the specificity was 0.9.
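For reference, the three reported measures can be computed from predicted and true labels as in the sketch below. Which gender is treated as the positive class is not stated here, so the choice of "F" is an assumption made purely for illustration, and the labels are toy data rather than results from this study. The function names are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive):
    """Count true/false positives and negatives for a two-class problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        elif pred == positive:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred, positive):
    """Return (accuracy, sensitivity, specificity); NaN where undefined."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    nan = float("nan")
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else nan
    sensitivity = tp / (tp + fn) if (tp + fn) else nan  # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else nan  # true negative rate
    return accuracy, sensitivity, specificity


# Toy labels (not from the ALSPAC data); "F" is the assumed positive class.
truth = ["F", "F", "F", "F", "M", "M", "M", "M", "M", "M"]
predicted = ["F", "F", "F", "M", "M", "M", "M", "M", "M", "F"]
accuracy, sensitivity, specificity = classification_metrics(truth, predicted, "F")
```

Reporting sensitivity and specificity alongside accuracy shows whether errors are concentrated in one gender, which accuracy alone would hide.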

Conclusions and future work
This paper proposed a novel 3D geometric descriptor for effective gender analysis and discrimination. It utilises curvature features determined from geodesic paths between landmarks within a facial region. Five experiments were performed, exploring in detail some aspects of facial traits based on key anthropometric landmarks. The results show that geodesic path curvature features extracted between 3D facial landmarks have the capability to classify the gender of Caucasian teenagers with an accuracy of 87.3%. Combination of the new 3D geometric descriptor with classical distance measures resulted in the best classification accuracy of 88.6%. The hybrid geodesic path curvature features and geodesic distance demonstrated an improved capability not only in terms of accuracy but also in terms of sensitivity and specificity. The sensitivity and specificity results show a noticeable variation between Caucasian teenagers in terms of both female and male nose morphology. Finally, Experiment 5 demonstrated that the geodesic paths between certain facial landmarks were more discriminative for gender classification and were more significant in 3D facial-profile contours. The nose ala path, Cupid's bow path, and the path between the inner canthi of the eyes were also shown to be significant.
In future, this study will be extended to explore gender variations depending only on profile contours. We will also work on evaluating the robustness of our novel feature descriptors on a dataset with respect to moderate changes in facial expression and ethnicity.