Automatic location and semantic labeling of landmarks on 3D human body models

Landmarks on human body models are of great significance for applications such as digital anthropometry and clothing design. The diversity of pose and shape of human body models and the semantic gap make landmarking a challenging problem. In this paper, a learning-based method is proposed to locate landmarks on human body models by analyzing the relationship between geometric descriptors and semantic labels of landmarks. A shape alignment algorithm is proposed to align human body models to break symmetric ambiguity. A symmetry-aware descriptor is proposed based on the structure of the human body models, which is robust to both pose and shape variations in human body models. An AdaBoost regression algorithm is adopted to establish the correspondence between several descriptors and semantic labels of the landmarks. Quantitative and qualitative analyses and comparisons show that the proposed method can obtain more accurate landmarks and distinguish symmetrical landmarks semantically. Additionally, a dataset of landmarked human body models is also provided, containing 271 human body models collected from current human body datasets; each model has 17 landmarks labeled manually.


Introduction
A landmark on a model, also referred to as a labeled keypoint, has a specific position and a semantic label [1]. Landmarking is a task to predict both the positions of landmarks and their semantic labels.
Landmarks on human body models not only play an essential role in many graphics applications, such as shape matching [2], object recognition [3], and surface remeshing [4], but also play a role in human body measurement, clothing design, and healthcare-related applications [5,6]. However, to label landmarks, one must bridge the semantic gap and have a semantic-level understanding of human body models to identify the positions of the landmarks and attach semantics. Additionally, variations in pose and shape of human body models and their intrinsic symmetry further increase the difficulty of semantic understanding, making landmarking of human body models a challenging problem.
To bridge the semantic gap, the semantics of landmarks can be either transferred from a labeled template model or learned from training data. The former mainly targets shape matching and is limited by the matching accuracy. On the other hand, landmarks can improve the accuracy of shape matching. For learning-based methods, however, few datasets have been released to investigate the landmarking problem on human body models. You et al. [7] collected a keypoint dataset for diverse models in 16 categories, which is not limited to human body models. Although CAESAR [8] and SHREC'14 Track [9] provide human body model datasets with some landmarks, the human body models in these datasets have similar poses. Therefore, current datasets for human body model landmarking cover shape variations across individuals without considering the diversity of poses, which limits their applications. To solve this problem, we collected 271 human body models with varied shapes and poses and labeled the landmarks manually to form a new keypoint dataset. Considering both anatomical meaning and discrimination of shape features, 17 vertices were selected from a human body model for labeling landmarks. The landmarks and their semantic labels are shown in Fig. 1. Symmetric landmarks, such as the left and right patella, are semantically distinguished.
The most common way to label landmarks is to establish correspondences between features and semantic labels. These features can be either handcrafted descriptors or features extracted by deep learning algorithms. Few handcrafted descriptors are robust to simultaneous variations in pose, shape variations between individuals, and the intrinsic symmetry of human body models. Learning-based features [7,10] may be extracted by treating landmarking as a special segmentation problem, in which each landmark is regarded as an individual class. Since the training data for each class in landmarking are far fewer than those for shape segmentation, the accuracy of these methods is limited. In addition, models for learning-based methods need to be aligned first to extract consistent features and distinguish symmetrical landmarks. However, aligning models manually is tedious and time-consuming, and shape alignment seems to be the most difficult point in the normalization process [11].
As the human body model structure is invariant to changes in shape and pose, it may be possible to locate landmarks via structural information. With the development of shape segmentation methods [12][13][14], we can easily access structural information through the segments obtained by shape segmentation. In this paper, a new multi-scale descriptor is defined based on the segments, making it robust to variations in shape and pose of human body models. Moreover, a shape alignment algorithm is adopted to break the symmetric ambiguity of human body models. In this way, the symmetric segments are separated so that the proposed segment-based descriptor is distinguishable for symmetric landmarks. Finally, combining the proposed descriptor and other handcrafted descriptors, an AdaBoost regression model is trained to map these descriptors to landmark labels. To further improve landmarking accuracy, the label is defined as a truncated distance field of the landmark. In summary, the main contributions in this paper are as follows:
• an automatic method to locate symmetry-aware landmarks for 3D human body models with varied shapes and poses, with a keypoint dataset with landmarks labeled manually on 271 human body models,
• a multi-scale descriptor based on segments, which is robust to the variations in shape and pose of human body models and can discriminate symmetric parts, and
• a shape alignment algorithm based on the kinematic skeleton extracted from the segments using existing algorithms, giving human body models a consistent orientation.

Related work
Since the descriptors are closely related to the landmarking methods, both descriptors and landmarking methods for human body models will be reviewed in this section.

Descriptors
A descriptor encodes local or global geometric features of a 3D model. Descriptors can be roughly classified into two types: handcrafted descriptors and learning-based descriptors [2]. Because learning-based descriptors are trained for specific tasks, such descriptors relevant to this paper will be introduced in the landmarking method review. In earlier years, basic geometric attributes were used as model descriptors, such as 3D coordinates of vertices, which are sensitive to rigid transformations of models. To meet the requirements of applications, more complicated descriptors have been proposed, e.g., Gaussian curvature, spin images [15], 3D shape contexts [16], and shape diameter function (SDF) [17]. These descriptors are robust under rigid transformations but not under non-rigid deformations. To better deal with articulated models that can undergo non-rigid deformations, intrinsic descriptors that are isometric-invariant have been proposed based on geodesic distances or spectral geometry. These descriptors include the global point signature [18], heat kernel signature (HKS) [19], scale-invariant heat kernel signature (SIHKS) [20], wave kernel signature (WKS) [21], etc. Although these descriptors are robust to models of the same person in varied poses, they are not robust to human body models with shape variations between individuals, which are inevitable in practical applications. Moreover, these descriptors cannot distinguish the symmetric parts of human body models since the local or global geometries of the symmetric parts are almost the same. Based on the observation that some orientations, such as the gradient of the average diffusion distance field, are sensitive to local symmetry, some oriented descriptors [22][23][24] have been proposed to distinguish the symmetric parts of models. However, the orientations adopted by these descriptors are also unstable for non-isometric models, so are not robust for human body models with varied shapes.
In this paper, a new multi-scale descriptor is proposed based on structural information, making it robust to variations in shape and pose of human body models. In addition, the proposed descriptor can discriminate symmetric parts when the adopted structural information is symmetrically distinguishable.

Landmarking on human body models
Landmarking is a procedure to both predict the positions of landmarks and identify their semantic labels. The semantic labels of corresponding landmarks should be consistent across different models in the same category.
Descriptors play an essential role in early landmarking methods for human body models. Some methods [9,25] locate landmarks directly via similarity of descriptors (Ref. [9] reports on the SHREC'2014 track on automatic location of landmarks, and introduces six landmarking methods). Several methods [26,27] connect landmarks to form a graph, and a Markov network is used to predict the landmarks. In other methods [9,28], a classifier is adopted to establish the relationship between several descriptors and landmarks. In this way, effective descriptors can be selected by the classifier automatically. Shu et al. [29] adopted a stacked auto-encoder method to predict points of interest using multiple feature descriptors. Limited by the descriptors, none of these methods can distinguish symmetric parts, making them applicable only to human body models with known orientations. Wuhrer et al. [26] generalized the landmarking method to human body models in varied poses by mapping the models to canonical forms. However, the best orientation among the canonical forms is chosen manually to avoid symmetric ambiguity.
With the development of deep learning algorithms, some methods have been proposed to locate landmarks via features extracted through deep learning. Wang et al. [2] introduced a triplet convolution neural network (CNN) to learn descriptors that can discriminate landmarks. However, these descriptors aim to improve the accuracy of matching rather than landmarking. Xi et al. [30] located landmarks by rendering 3D human body models into 2D images for use in deep CNNs. Although this method addresses the data imperfections of 3D models and validates its effectiveness on human body models in a standard pose, landmarking results on human body models in varied poses cannot be guaranteed because of the diversity of the rendered images. Other methods [7,10,31] treat landmarking as a special segmentation problem, where each landmark is regarded as an individual class. Since the training data for each class in landmarking are far fewer than those in shape segmentation, it is more difficult to extract effective features than to perform shape segmentation. As shown in the results [7], these methods face significant difficulties in locating consistent landmarks.
Another way to locate landmarks is to transfer landmarks from a labeled template model to a target model. Several shape matching methods can be used to locate the landmarks by establishing correspondences between the template model and the target model, such as iterative closest point and thin plate spline robust point matching adopted in Ref. [9], and coherent point drift adopted by Zhou and Hao [32]. In these methods, the accuracy of the landmarks directly depends on the shape matching accuracy. To bridge the non-isometric gap, a template model very similar to the target model is selected from the training data. However, limited by the accuracy of shape matching, the accuracy of landmarks located by these methods is still restricted. In recent years, significant progress has been made in shape matching [23,33,34]. The accuracy of landmarks based on shape matching may benefit from these new shape matching methods.
In this paper, a learning-based landmarking method is proposed for human body models with arbitrary shapes and poses. First, several descriptors are defined and extracted. An AdaBoost regression algorithm is then adopted to locate the landmarks by selecting effective features from the multiple descriptors.

Outline
In this paper, a novel landmarking method for human body models having varied shapes and poses is proposed based on structural information. The input to our method is a triangular mesh of a human body model and the corresponding segments obtained by learning-based shape segmentation methods. Although the segments are semantically labeled, symmetric segments have the same label, as shown in the segments in Fig. 2. Regarding these segments, the symmetric ambiguity is first broken; then, a better way is proposed to enrich structural information by transforming the segments into a multi-scale descriptor and extending the descriptor to shape analysis.
The proposed landmarking method consists of three steps: shape alignment, feature extraction, and AdaBoost regression. First, a shape alignment algorithm is used to transform the human body models into a consistent orientation. In this way, symmetric segments with the same labels can be distinguished, and some descriptors, such as the normal vector, can be computed consistently. Then, a new multi-scale descriptor is used based on heat diffusion for the symmetry-aware segments, called the part heat diffusion signature (PHDS), which is robust to variations in human body models and discriminating with regard to symmetric parts. Additionally, some handcrafted descriptors, such as the HKS and WKS, are also extracted. Combining all these descriptors, an AdaBoost regression algorithm is adopted to establish the correspondence between these descriptors and the probability distributions of the landmarks. The vertices with maximal probability values in the regression results are marked as landmarks. The pipeline of the proposed method is shown in Fig. 2.
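The final selection step of the pipeline above (marking the vertex with maximal regressed probability as the landmark) can be sketched as follows. This is a minimal illustration only: the function name `pick_landmarks` and the toy probability values are hypothetical and not taken from the paper.

```python
def pick_landmarks(prob):
    """Select one vertex per landmark.

    prob[i][j] is the regressed probability that vertex j is landmark i
    (hypothetical layout). Returns, for each landmark i, the index of the
    vertex with the maximal value in row i.
    """
    return [max(range(len(row)), key=row.__getitem__) for row in prob]


# Toy example: 2 landmarks, 5 vertices (values are made up).
prob = [
    [0.05, 0.10, 0.80, 0.30, 0.00],
    [0.60, 0.20, 0.10, 0.00, 0.55],
]
landmarks = pick_landmarks(prob)
```

Each row is processed independently, so symmetric landmarks are kept apart only if the regression output itself separates them.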

Shape alignment
Due to the various poses and intrinsic symmetry of human body models, it is challenging to define a consistent orientation to align human body models. In recent years, some methods [33,35] have been proposed to distinguish symmetric parts of human body models with the help of kinematic skeletons. Based on the observation that the feet of human body models always face forward in the standard pose, these methods transmit the orientation information along the kinematic skeleton to the torso and then correctly distinguish the symmetric parts of human body models. However, the kinematic skeleton extraction steps of these methods are complex and time-consuming, whether template model fitting [33] or template skeleton embedding [35]. With the development of learning-based shape segmentation methods, we have easy access to the segments of human body models. This enables us to propose a simpler algorithm to extract kinematic skeletons directly from the segments, and then the orientation transmitting algorithm provided by Luo and Feng [35] is adopted to break the symmetric ambiguity via the extracted kinematic skeleton.
The steps of the proposed shape alignment algorithm are shown in Fig. 3(a). First, the boundary curves between the segments are extracted, and their center points are adopted as the skeleton points. Then, the skeleton points are connected using skeleton bones based on the structures of the segments and are further processed to form a kinematic skeleton. Once the kinematic skeleton has been extracted, the algorithm proposed by Luo and Feng [35] is adopted to establish a local coordinate system on the human body model. Thus, human body models may be consistently oriented by aligning the local coordinate systems, and the symmetric segments are distinguished as well.
Since the skeleton points extracted from the boundary curves may be insufficient to form a complete kinematic skeleton, further steps are taken to form the kinematic skeleton, as shown in Fig. 3(b).
For each leaf segment that has only one neighbor, the vertex on the segment farthest from the corresponding skeleton point is defined as a new skeleton point, and a skeleton bone is obtained by connecting these two skeleton points, as shown by the blue lines of the middle model in Fig. 3(b). For the torso segment, two additional skeleton points are extracted to make the kinematic skeletons closer to the medial axis of the human body models. As shown in Fig. 3(c), two auxiliary points $P'_0$ and $P'_1$ are defined as the mid-points of the shoulder skeleton points and the hip skeleton points, respectively. Then, an offset along the line $P'_0 P'_1$ is used to obtain two skeleton points $P_0$ and $P_1$ which satisfy the condition $|P'_0 P_0| = |P'_1 P_1| = |P'_0 P'_1|/5$. In this way, a kinematic skeleton is extracted directly from the segments. Although skeleton points $P_0$ and $P_1$ are defined without considering geometric features, they do not affect the establishment of the coordinate system, in which only the direction of skeleton bone $P_0 P_1$ is helpful.
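The construction of the two torso skeleton points can be sketched as below. The text fixes only the offset length (one fifth of the distance between the auxiliary mid-points); the direction of the offset is an assumption here (each point is moved inward, toward the other mid-point), and the function names and coordinates are hypothetical.

```python
def midpoint(p, q):
    """Mid-point of two 3D points given as tuples."""
    return tuple((a + b) / 2.0 for a, b in zip(p, q))


def torso_skeleton_points(l_shoulder, r_shoulder, l_hip, r_hip, ratio=0.2):
    """Return (chest, waist) skeleton points for the torso segment.

    The auxiliary points are the mid-points of the shoulder and hip
    skeleton points. Each torso skeleton point is offset from its auxiliary
    point along the line joining them by ratio * |line| (ratio = 1/5).
    ASSUMPTION: the offset is directed inward, toward the other mid-point.
    """
    aux_top = midpoint(l_shoulder, r_shoulder)
    aux_bottom = midpoint(l_hip, r_hip)
    axis = tuple(b - a for a, b in zip(aux_top, aux_bottom))
    chest = tuple(a + ratio * d for a, d in zip(aux_top, axis))
    waist = tuple(b - ratio * d for b, d in zip(aux_bottom, axis))
    return chest, waist


# Toy skeleton points (made-up coordinates, y is "up").
chest, waist = torso_skeleton_points(
    (-1, 5, 0), (1, 5, 0), (-0.5, 0, 0), (0.5, 0, 0)
)
```

Because only the direction of the bone between the two points is used when establishing the local coordinate system, flipping the assumed offset direction would not change that direction.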
To establish the local coordinate system, the algorithm provided by Luo and Feng [35] rotates the leg bones and foot bones of the kinematic skeleton to the standard pose first. An example of this step is shown in Fig. 4. By assuming that the feet face forward in the standard pose and the chest skeleton point ($P_0$) is above the waist skeleton point ($P_1$), a local coordinate system may be established to distinguish the symmetry of the human body model, as seen in the fourth model in Fig. 3(a).
To ensure the segments agree with human body structure, the segments are subjected to a preprocessing step to update their semantic labels.

Fig. 4
Steps in rotating leg and foot bones to the standard pose. Reproduced with permission from Ref. [35], © Springer Science+Business Media, LLC, part of Springer Nature 2020.
When the number of segments with the same label exceeds the target number, such as three foot segments for a human body model, the labels of the segments with fewer vertices are modified according to geodesic distances to their adjacent segments. In addition, the symmetric thigh segments are sometimes so close that the two thigh boundary curves near the torso geometrically merge into one curve, as shown in Fig. 5(b). In this case, the merged boundary curve should first be subdivided into two boundary curves, and then two skeleton points may be extracted. To this end, the center point of the merged boundary curve is calculated first (red point C). Then, the point on the boundary curve closest to the center point is identified and marked as a temporary separation point (purple point A). Another separation point B is selected as the point closest to point A on the other side of the boundary curve. Point A is further refined to be the point closest to point B on the other side of the boundary curve. Having defined the separation points, the boundary curve can be subdivided into two boundary curves, and the center points of the two parts are regarded as the skeleton points, shown as yellow points in Fig. 5. Figure 6 shows some results of the proposed shape alignment algorithm. Human body models in various orientations are aligned around the hip region, resulting in a consistent orientation.
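The subdivision of a merged thigh boundary curve described above can be sketched on a toy closed curve. Several details are assumptions made only for illustration: the curve is an ordered list of sample points, "the other side of the boundary curve" is taken to mean samples at least a quarter of the loop away (by index) from the reference point, and the center of each part is the mean of its samples. All function names are hypothetical.

```python
import math


def centroid(points):
    """Mean of a list of equally sized coordinate tuples."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(len(points[0])))


def closest_index(points, target, candidates):
    """Index (among candidates) of the point closest to target."""
    return min(candidates, key=lambda i: math.dist(points[i], target))


def other_side(n, i):
    """ASSUMPTION: samples at least a quarter loop away from index i."""
    return [j for j in range(n) if min(abs(i - j), n - abs(i - j)) >= n // 4]


def split_merged_curve(points):
    """Split a closed, pinched curve into two loops.

    Returns the separation indices (A, B) and the centers of the two parts.
    """
    n = len(points)
    c = centroid(points)                                    # center point C
    a = closest_index(points, c, range(n))                  # temporary point A
    b = closest_index(points, points[a], other_side(n, a))  # point B
    a = closest_index(points, points[b], other_side(n, b))  # refined point A
    lo, hi = sorted((a, b))
    part1 = points[lo:hi + 1]
    part2 = points[hi:] + points[:lo + 1]
    return (a, b), (centroid(part1), centroid(part2))


# Toy peanut-shaped curve: two lobes along x, pinched near the y axis.
n = 40
curve = []
for k in range(n):
    t = 2.0 * math.pi * k / n
    r = 1.0 + 0.8 * math.cos(2.0 * t)
    curve.append((r * math.cos(t), r * math.sin(t)))

separation, centers = split_merged_curve(curve)
```

On this toy curve the separation points fall on the pinch and the two centers land inside the left and right lobes, which play the role of the two thigh skeleton points.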

Feature extraction
Features play an essential role in locating landmarks. Handcrafted descriptors, such as HKS and WKS, can distinguish vertices to some extent. However, due to limited discrimination of vertices, accurately locating landmarks and distinguishing symmetrical vertices is challenging. Semantic segmentation results of 3D models can narrow the search for landmarks according to semantic information; for example, the nose landmark must appear in the head segment. Nevertheless, the inclusion relation between vertices and segments is insufficient to locate landmarks accurately.
In this paper, a new multi-scale descriptor called PHDS is proposed to describe the relationship between vertices and segments, which solves a heat equilibrium equation on the surface with different initial heats attached to the segments. In addition to the inclusion relation, the proposed descriptor is also related to the distance between the vertices and the segments, which enriches structural information compared to the segments.
For each segment $i$, the heat equilibrium equation over the surface is given by Eq. (1), where $\Delta$ is the discrete cotangent Laplacian matrix, $p^i$ is an indicator vector for the initial heat values of vertices with $p^i_j = 1$ if vertex $j$ belongs to segment $i$ and $p^i_j = 0$ otherwise, and $H^i$ is a diagonal matrix with $H^i_{jj}$ being the heat contribution weight between vertex $j$ and its corresponding segment. In this paper, the weight $H^i_{jj}$ is computed from the following quantities: $A$ is the total area of the model, $A_j$ is the area represented by vertex $j$, which is one third of the total area of its adjacent triangles, $c$ is a constant value to balance the parameters of $\Delta$ and $H^i$, and $d^i_j$ can be considered as the heat retention ratio of vertex $j$ for the initial heat. The larger $d^i_j$ is, the closer the heat value of vertex $j$ at equilibrium is to its initial heat value. In this paper, $d^i_j$ is defined in terms of two constant values $a$ and $b$. To find the equilibrium parameter $w^i$, Eq. (1) can be rewritten as Eq. (3), which can be solved directly by a least squares method. For each segment $i$, an equilibrium parameter $w^i$ is calculated by solving Eq. (3). Since a human body model is divided into 14 segments, a 14-dimensional feature vector for each vertex is obtained by taking each segment as the initial heat source in turn: the proposed PHDS is a multi-dimensional descriptor, as shown in each row in Fig. 7. In addition to the initial heat source, the parameters $(a, b)$ also affect the equilibrium parameter $w^i$. Figure 7 shows the features obtained for three pairs of parameters $(a, b)$. The greater the difference between $a$ and $b$, the larger the range of heat diffusion on the human body model, so information about segments is transmitted to further vertices. Thus, PHDS is also a multi-scale descriptor with variable parameters $(a, b)$.
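To give a feel for the kind of computation involved, the following toy sketch solves a heat equilibrium problem on a small graph. It rests on several assumptions that are not taken from the paper: the equilibrium is written in the common form $(L + H)\,w = H\,p$ with $L$ playing the role of $-\Delta$, a uniform graph Laplacian on a path graph stands in for the cotangent Laplacian of a mesh, the diagonal weights in `h` are made-up numbers rather than the paper's definition of $H^i_{jj}$, and Gauss-Seidel iteration replaces the least squares solve. The name `heat_equilibrium` is hypothetical.

```python
def heat_equilibrium(neighbors, p, h, iters=2000):
    """Gauss-Seidel solve of (L + H) w = H p on a graph.

    neighbors[j] lists the vertices adjacent to j (uniform edge weights, a
    stand-in for cotangent weights), p is the 0/1 indicator of the source
    segment, and h holds the diagonal of H (assumed positive).
    Row j of the system reads: (deg_j + h_j) w_j - sum_k w_k = h_j p_j.
    """
    w = list(p)
    for _ in range(iters):
        for j, nbrs in enumerate(neighbors):
            w[j] = (h[j] * p[j] + sum(w[k] for k in nbrs)) / (len(nbrs) + h[j])
    return w


# Toy "surface": a path of 10 vertices; the source segment is the first 3.
n = 10
neighbors = [[k for k in (j - 1, j + 1) if 0 <= k < n] for j in range(n)]
p = [1.0] * 3 + [0.0] * 7
# Made-up weights: strong retention inside the source, weak outside.
h = [1.0 if pj else 0.05 for pj in p]
w = heat_equilibrium(neighbors, p, h)
```

In this toy setting the equilibrium values decay smoothly with distance from the source segment, which is the behavior that lets a per-segment heat value encode distance as well as membership. Repeating the solve with each segment as the source and stacking the results per vertex would give one value per segment, in the spirit of the 14-dimensional vector described above.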
Moreover, since the features obtained by symmetric segments are arranged in different dimensions, the symmetrical vertices can be distinguished easily according to the features of different dimensions. Thus, the proposed PHDS is discriminative for symmetric parts.
In a broad sense, the proposed PHDS is somewhat similar to the skinning attachment [36]: both are composed of results by taking multiple elements as initial values. The skinning attachment takes each bone as an element, while PHDS takes each segment as an element. However, PHDS does not need complex operations such as skeleton extraction and vertex assignment, making it more effective and feasible. Furthermore, skinning attachment is usually used in shape animation. To the best of our knowledge, our method is the first to extend this type of descriptor to shape analysis. Different heat equilibrium parameters w i can be obtained using different parameters (a, b). There are no best parameters for (a, b) but they should be selected according to task. In this paper, three sets of parameters (a, b) are adopted to form the descriptor PHDS. In this way, heat can be transmitted over different segments so that the source segment can be perceived by vertices at different distances. In addition to PHDS, some other widely used descriptors, such as the HKS, SIHKS, WKS, SDF, curvature, and normal vectors at the vertices, are also adopted to enrich the features. For each vertex on the human body model, a 150-dimensional feature vector is formed using the descriptors above. Details of these descriptors are given in Table 1.

AdaBoost regression
The essence of landmarking is to automatically establish correspondence between the features and the labels of the landmarks. Since no descriptor can solve the landmarking problem alone, the AdaBoost regression algorithm is adopted to select the effective descriptors from many to locate the landmarks. First, some training data, including features and labels of the vertices, are needed to train an AdaBoost regression model. If the labels of the vertices are directly defined as 1 (landmark) and 0 (non-landmark), as in previous methods, regardless of whether segmentation or regression is used, it is challenging to learn effective features to locate the landmarks. In this case, the labels for landmarks and neighboring vertices may change suddenly, while the features vary slightly. Therefore, we define the labels in a smooth probability field by constructing a distance field for each landmark. In this way, the labels are more consistent with the features, which is conducive to learning. However, since the values of the probability field range from 0 to 1, the difference of probability values between adjacent vertices becomes smaller with expansion of the probability field distribution, in turn increasing the difficulty of landmark recognition. Thus, a truncation parameter τ is adopted to constrain the range of the probability field, as shown in Fig. 8.
To flexibly constrain the range of local regions around the landmarks, the distance field is defined as a Gaussian kernel function with respect to geodesic distance to the landmark. The truncation parameter τ is set as a geodesic distance to control local regions. The labels of landmark $i$ are defined by this truncated Gaussian kernel, where $d_{ij}$ is the geodesic distance between vertex $j$ and landmark $i$, and $k$ is a constant set to 3.
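A truncated Gaussian label of this kind might look like the sketch below. The exact functional form is an assumption made for illustration: the Gaussian width is taken as $\sigma = \tau / k$, so that the label has decayed to roughly $e^{-k^2/2}$ at the truncation distance, and the label is set to zero beyond τ. The function name `truncated_label` is hypothetical; the defaults τ = 0.05 and k = 3 follow the values quoted in the text.

```python
import math


def truncated_label(d, tau=0.05, k=3.0):
    """Label of a vertex at normalized geodesic distance d from a landmark.

    ASSUMED form: a Gaussian of d with sigma = tau / k, cut to zero
    for d > tau.
    """
    if d > tau:
        return 0.0
    sigma = tau / k
    return math.exp(-0.5 * (d / sigma) ** 2)


# Labels at a few made-up distances from the landmark.
labels = [truncated_label(d) for d in (0.0, 0.01, 0.03, 0.05, 0.08)]
```

The label is 1 at the landmark itself, falls off smoothly inside the truncation radius, and vanishes outside it, which keeps the non-zero region local while avoiding the abrupt 0/1 jump discussed above.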
Once the features and the labels are ready, the AdaBoost regression algorithm analyzes the relationship between the features and the labels to obtain a trained model, in which effective descriptors

Experiments
In this section, the experimental setting for testing the proposed landmarking method is introduced first. Then, the proposed method is experimentally validated and compared with previous methods qualitatively and quantitatively. Limitations of the proposed method are presented at the end. The proposed method was implemented on a desktop PC with an Intel Core i9-10900X CPU and 128 GB RAM.

Experimental setting
The experimental setting is considered from three points of view: the new dataset, the parameters used, and the evaluation criteria for landmarks.
To validate the proposed landmarking method, some human body models are selected from three datasets: the SCAPE dataset [37], the MPI FAUST dataset [38], and the SPRING dataset [39]; they were labeled manually to provide a dataset for human body model landmarking. The SCAPE dataset contains 71 human body models with various poses of the same person. The MPI FAUST dataset provides 100 human body models of 10 people in 10 different poses. Since the human body models in the SPRING dataset are all in the standard pose, only 100 human body models are selected from this dataset, including 50 male human body models and 50 female human body models. Notably, the vertices of the human body models in MPI FAUST are in one-to-one mapping, and the vertices of the human body models in both SCAPE and SPRING are also one-to-one mapped. Therefore, we downsample these human body models to 2048 vertices for two reasons: to reduce the influence of vertex correspondences on landmarking, and to remain consistent with the keypoint dataset proposed by You et al. [7] so that it can be expanded. Taking repeatability of poses into consideration, 120 human body models of various shapes and poses were used as training data, 51 human body models with partially similar poses were used for validation, and the remainder were taken as test data. Details of this data division are shown in Table 2.
Many learning-based shape segmentation methods [12][13][14] have been proposed to obtain labeled segments with the training data provided by Maron et al. [40]. For convenience, the segments used in this paper are the labeled segments from the training data [40].
Since the human body models are downsampled to 2048 vertices, the parameter $c$ is also set to 2048 to balance the parameters of $\Delta$ and $H^i$. Note that $c = 2048$ is also effective for human body models at other resolutions because the heat contribution weight is determined by both the parameter $c$ and the parameters $(a, b)$. The maximum geodesic distance of all human body models is normalized to 1, and the truncation parameter τ is set to 0.05. Two evaluation criteria are adopted to evaluate the accuracy of the located landmarks: (i) the geodesic error between the located landmark and the ground truth labeled manually, and (ii) the percentage of correct keypoints (PCK), which considers the fraction of landmarks with geodesic error less than a certain threshold $\varepsilon_{\mathrm{pck}}$.
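The two evaluation criteria can be computed as in the short sketch below, assuming the per-landmark geodesic errors have already been measured and normalized by the maximum geodesic distance of the model. The function names and the error values are hypothetical.

```python
def mean_geodesic_error(errors):
    """Average normalized geodesic error over all located landmarks."""
    return sum(errors) / len(errors)


def pck(errors, threshold):
    """Percentage of correct keypoints: fraction of landmarks whose
    geodesic error is strictly less than the threshold."""
    return sum(e < threshold for e in errors) / len(errors)


# Made-up normalized geodesic errors for four landmarks.
errors = [0.01, 0.02, 0.04, 0.08]
score = pck(errors, threshold=0.05)
```

Sweeping the threshold over a range of values gives a PCK curve, which shows how quickly the located landmarks concentrate around the ground truth.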

Experimental results and analysis
The proposed method is comprehensively evaluated from the following five considerations: (i) the necessity of the descriptors used in this paper, (ii) the effectiveness of the setting labels in AdaBoost regression, (iii) the strength of the proposed PHDS, (iv) the landmarking results, and (v) the robustness to changes in resolution of the human body models.
First, we validate the utility of the adopted descriptors in human body model landmarking. An ablation study is conducted by disabling the descriptors in turn, thus evaluating whether each descriptor is helpful for landmarking. The geodesic errors of the landmarks are shown in Table 3. On balance, the experiment that takes all descriptors has the best average performance for all landmarks, as shown by the average values in Table 3. Therefore, all descriptors are helpful for locating landmarks on human body models.
The effectiveness of setting labels as a distance field with a truncation parameter is verified next. If the truncation parameter is too small, the labels have fewer non-zero values, which makes it difficult for AdaBoost to extract effective features. However, the possible regions for landmarks increase as the truncation parameter becomes larger, making the landmarks challenging to locate. Therefore, the truncation parameter needs to balance the number of non-zero labels and the size of the local region. Table 4 shows the geodesic errors in landmarks for different truncation parameters, while the labels of the navel landmark for different truncation parameters are shown in Fig. 9. In addition, the commonly used labeling, which sets the label to 1 for a landmark and 0 otherwise, is also tested; this experiment is denoted "0-1" in Table 4. Landmarking performs best on average with the truncation parameter τ = 0.05. In addition, compared to the "0-1" experiment and the distance field without truncation parameter (τ = 1), our method achieves better results, showing the effectiveness of setting labels using a truncated distance field.

Fig. 9
Labels of the navel landmark for different truncation parameters.
Next, the strength of the proposed PHDS in human body model landmarking is evaluated. The geodesic errors in experiments when disabling the descriptors in turn are shown in Table 3. The geodesic errors increase greatly without PHDS, while in other cases, the geodesic errors are slightly worse. Thus, PHDS has the most significant influence on the accuracy of landmarking. As effective descriptors are selected by AdaBoost during the training process, the importance of descriptors was additionally analyzed for effectiveness, and the results are shown in Fig. 10. The proposed PHDS is of great significance for most landmarks, which further validates the strength of PHDS. Moreover, since human body models are structurally consistent, PHDS, which is based on structural information, is also robust to changes in shape and pose, as shown in Fig. 11.
Figure 12 shows the landmarking results of the proposed method on human body models with varied shapes and poses. The landmarks obtained by the proposed method are accurately located and consistent with our perception. Additionally, the symmetric landmarks are semantically distinguished.
Finally, the robustness of the proposed method to the resolution of the human body models is considered. In this paper, landmarks are located via the descriptors of the human body models. Whether the proposed method can be applied to human body models with different resolutions therefore depends on the properties of the descriptors. Figure 13(a) shows some dimensions of the proposed PHDS on human body models with different resolutions. It can be seen that the PHDS is visually almost the same for these human body models. The other descriptors used in this paper are also robust to changes in the resolution of human body models. Figure 13(b) shows the landmarking results for these human body models. The results were obtained by directly applying the AdaBoost regression model trained on human body models with 2048 vertices to models with other resolutions. Although the resolutions of the test human body models differ from those of the training models, all landmarks for these human body models are labeled in line with our expectations. Thus, the proposed landmarking method is robust to resolution changes in human body models.
Fig. 12
Landmarking results for the proposed methods for human body models with various shapes and poses. Landmarks with consistent semantic labels are shown in the same color. Left 3 columns: human body models from the validation set. Other columns: models from the test set. Each model is shown from two views.
The proposed landmarking method was also tested on human body models from the Princeton Segmentation Benchmark [42], where the models are noisier than the training data. The results are shown in Fig. 14, as are segments of these human body models obtained by MeshCNN [13]. Given that MeshCNN is a shape segmentation method for edges on models, we transform the edge segments to face segments, as shown in Fig. 14(b), and then apply them to the landmarking. MeshCNN is trained using models with 752 vertices, a limited quantity that might lead to overly simplified results and detail loss, increasing the difficulty of landmarking. Taking this into account, we also tested the models with the original resolution, in which the segments are obtained from the simplified models through a nearest neighbor algorithm. The segments of human body models at the original resolution are shown in Fig. 14(d). Although the quality of the segments is not comparable to that in Maron et al. [40], most landmarks are located accurately. Only one nose landmark is labeled at some offset due to the lack of detail. Since the proposed method is robust to the resolution of human body models, better results might be obtained by combining the landmarking results for human body models with different resolutions, which can be considered in future research.
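The nearest-neighbor transfer of segments from a simplified model to the original resolution can be sketched as follows. This is a brute-force illustration that assumes Euclidean distance between 3D points (for example, vertices or face centroids); the function name and toy data are hypothetical, and the paper does not specify the distance measure or search structure it uses.

```python
import math


def transfer_segment_labels(original_points, simplified_points, simplified_labels):
    """Give each original point the segment label of its nearest simplified point.

    Brute-force search with Euclidean distance (an assumption); for large
    meshes a spatial index such as a k-d tree would be preferable.
    """
    if len(simplified_points) != len(simplified_labels):
        raise ValueError("each simplified point needs exactly one label")
    if not simplified_points:
        raise ValueError("the simplified model must contain at least one point")

    labels = []
    for point in original_points:
        nearest = min(
            range(len(simplified_points)),
            key=lambda i: math.dist(point, simplified_points[i]),
        )
        labels.append(simplified_labels[nearest])
    return labels


# Toy data: two labeled points on the simplified model, three on the original.
toy_simplified = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
toy_labels = ["torso", "arm"]
toy_original = [(0.0, 0.0, 0.0), (0.9, 0.0, 0.0), (0.4, 0.0, 0.0)]
toy_transferred = transfer_segment_labels(toy_original, toy_simplified, toy_labels)
```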
The speed of the proposed landmarking method is discussed next. Our method contains three steps: shape alignment, feature extraction, and AdaBoost regression. For a human body model with 2048 vertices, it takes approximately 0.48 s to align the human body model. The feature extraction step takes approximately 3.13 s on average to obtain the 150-dimensional feature vector. For the AdaBoost regression step, training on 120 human body models takes approximately 12 min, and locating the landmarks on a test human body model takes approximately 0.44 s. Therefore, for a test human body model, it takes approximately 0.48 s + 3.13 s + 0.44 s = 4.05 s to locate the landmarks by the proposed method.

Comparisons
In this section, we qualitatively and quantitatively compare the proposed landmarking method with previous landmarking methods to demonstrate the accuracy of our method. Additionally, the effectiveness of the proposed shape alignment algorithm is verified.
As described in Section 2, there are two types of methods to locate landmarks: template-based methods, which transfer landmarks from a labeled template model via shape matching, and learning-based methods, which learn the locations and labels of landmarks from the training data. In this paper, we compared the proposed method with two template-based methods, FARM [33] and COCFM [23], and three learning-based methods, SpiderCNN [43], DGCNN [44], and PointConv [45]. FARM is an automatic registration method that deforms a template model to align well with the target, while COCFM is a shape matching method that establishes the correspondence between model vertices. For the three learning-based methods, the landmarking problem is treated as a multi-class classification problem, in which each landmark is associated with its semantic label and all the non-landmarks are assigned a background class label. For these learning-based methods, the models are usually aligned manually first to locate landmarks [7]. Therefore, we test these landmarking methods with both unaligned and aligned human body models. The human body models are given random orientations to make up the unaligned training data and are processed via the proposed shape alignment algorithm to form the aligned training data. Table 5 shows the geodesic errors in landmarks located by these methods on the unaligned and aligned human body models. All these landmarking methods achieve better results when aligned human body models are adopted. For template-based methods, although spectral features are adopted to establish correspondence, the skeleton deformation of FARM and the displacement vector used by COCFM are affected by the orientation of the models, which slightly decreases the accuracy of the landmarks when the human body models are unaligned.
For learning-based methods, the inconsistent orientations of the unaligned human body models make it challenging to extract effective features, resulting in worse results by a large margin. Although most descriptors used in this paper are invariant to model orientation, the proposed method achieves better results due to the alignment of the normal vectors. Therefore, the proposed shape alignment algorithm could be used as a preprocessing step for other methods to improve their landmarking accuracy. Our method achieves the best performance in terms of landmark accuracy on average, whether unaligned or aligned human body models are adopted. Table 6 compares PCK results at a geodesic error threshold ε_pck = 0.01 for aligned human body models. The proposed method outperforms other methods for most landmarks and has better performance on average. Figure 15 shows the PCK curves with the geodesic error threshold ε_pck varying from 0 to 0.1. This shows the significant benefit of our method, with PCK results higher than those of other methods at any geodesic error threshold, which further confirms the high accuracy of our method. Figure 16 shows some landmarking results for different methods. The landmarks located by our method are much more accurate than those located by other landmarking methods and more consistent with the ground truth.
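For readers unfamiliar with the PCK metric, the following minimal sketch computes the fraction of landmarks whose geodesic error falls within a threshold, and samples a curve over thresholds from 0 to 0.1. The function names, the use of a non-strict comparison (`<=`), and the toy error values are assumptions for illustration and are not results from this paper.

```python
def pck(geodesic_errors, threshold):
    """Fraction of landmarks whose geodesic error is at most `threshold`."""
    if not geodesic_errors:
        raise ValueError("at least one geodesic error is required")
    correct = sum(1 for error in geodesic_errors if error <= threshold)
    return correct / len(geodesic_errors)


def pck_curve(geodesic_errors, max_threshold=0.1, steps=11):
    """Sample PCK at `steps` evenly spaced thresholds in [0, max_threshold]."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    thresholds = [max_threshold * i / (steps - 1) for i in range(steps)]
    return [(t, pck(geodesic_errors, t)) for t in thresholds]


# Toy geodesic errors for four landmarks (placeholder values).
toy_errors = [0.004, 0.008, 0.02, 0.06]
toy_pck = pck(toy_errors, threshold=0.01)  # two of the four errors are <= 0.01
toy_curve = pck_curve(toy_errors)
```

By construction the curve is non-decreasing in the threshold, so one method dominating another at every threshold, as described above, means its curve lies above the other over the whole range.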

Limitations
First, because the human body models in the training data have minimal clothing, the proposed method is only applicable to human body models with minimal clothing.
Second, the proposed method adopts kinematic skeletons to establish local coordinate systems. As in the method of Luo and Feng [35], if the input human body model does not contain feet, or has such flexibility that the feet can both face backwards simultaneously, then the human body models cannot be aligned correctly, making symmetric segments indistinguishable.
Finally, segments play an important role in the proposed landmarking method. Although the accuracy of current shape segmentation exceeds 90%, some models might still be poorly segmented and thus fail to be accurately landmarked. This problem should diminish with further research in shape segmentation, or by combining landmarking results for models with different resolutions. Nevertheless, for human body models with genus larger than 0 or with missing parts, there is no guarantee that shape segmentation methods can obtain correct segments, making the proposed method ineffective.

Conclusions
In this paper, we collected and labeled 271 human body models with various shapes and poses to form a new dataset for landmarking. The human body models in this dataset are aligned to a consistent orientation, and 17 landmarks are located manually for each model. A learning-based landmarking method was proposed to map several descriptors to the landmark labels. Since the length of the descriptor is invariant to changes in the number of the vertices, the proposed method is robust to changes in model resolution. In addition to some handcrafted descriptors, a multi-scale descriptor named PHDS was proposed based on structural information, making it robust to variations in human body models. Further, a shape alignment algorithm was proposed to align human body models to enable PHDS to distinguish symmetric parts. Because of the strength of the proposed PHDS descriptor, a simple AdaBoost regression algorithm can accurately locate the landmarks. Although some steps of the proposed landmarking method are based on existing methods, we use these techniques to solve this challenging problem effectively, with good results.
Since the poses of the human body models vary, it is challenging to select a standard orientation to fully align them. The proposed shape alignment algorithm provides a way to roughly align human body models using local coordinate systems based on the hip, giving the human body models a consistent orientation. In addition, the proposed PHDS is not only symmetry-aware but also applicable to models in the same category, so it can be used in applications such as shape morphing and shape correspondence to prevent symmetric ambiguity. In the future, we will focus on landmarking for 3D models in other categories. Various applications that benefit from the landmarks, such as shape correspondence and shape morphing, will also be further investigated.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Other papers from this open access journal are available free of charge from http://www.springer.com/journal/41095. To submit a manuscript, please go to https://www.editorialmanager.com/cvmj.