Anthropometric clothing measurements from 3D body scans

We propose a full processing pipeline to acquire anthropometric measurements from 3D body scans. The first stage of our pipeline is a commercial point cloud scanner. In the second stage, a pre-defined body model is fitted to the captured point cloud. We have generated one male and one female model from the SMPL library. The fitting process is based on a non-rigid Iterative Closest Point (ICP) algorithm that minimizes an overall energy consisting of point distance and local stiffness terms. In the third stage, we measure multiple circumference paths on the fitted model surface and use a non-linear regressor to provide the final estimates of the anthropometric measurements. We scanned 194 male and 181 female subjects, and the proposed pipeline provides mean absolute errors from 2.5 mm to 16.0 mm depending on the anthropometric measurement.

made manually on a physical subject using a tape measure, but the rise of online shopping and personalized tools has created new demand for computerized anthropometric measurements.
A standard pipeline for computerized anthropometric measurements is the following [10,26,7,3,27,25]: 1) a 2D or 3D body scan producing a 3D point cloud or an initial model, 2) fitting of a pre-defined model and 3) measurements from the fitted model. The main challenge is step two, which should provide an accurate and watertight volumetric model of a subject so that important measurements can be made on the model surface. Challenges arise from different sensor modalities, poses and occluded regions. The method proposed in this work shares the main steps of the standard pipeline (Figure 1), but instead of a physiologically valid model fit we adopt a non-rigid Iterative Closest Point (ICP) registration between the model and the captured point clouds. Moreover, we do not make anthropometric measurements directly from the fitted model surface but extract a set of physiologically meaningful surface features (body circumferences) and use them to train a regressor that provides estimates of the physical anthropometric measurements. Our main contributions are:
- A full processing pipeline from 3D body scans to anthropometric measurements.
- A body model registration step that uses non-rigid ICP to fit a pre-defined model to captured body scans.
- A non-linear regression based anthropometric measurement estimation step that uses circumference based intermediate features.
- A public benchmark dataset, NOMO3D, with anthropometric measurement ground truth.

B. Optimisation Stage
The purpose of the regression optimisation stage is to find a non-linear or linear mapping $f(\cdot)$ such that

$$f(P_i) : (t_i^{(1)}, \dots, t_i^{(C)}) \mapsto \hat{t}_i \qquad (7)$$

i.e. the function maps the surface measurements to $\hat{t}_i$, which is an estimate of the true anthropometric clothing measurement $t_i$. For this purpose, many existing regression methods can be used. Linear Regression is a simple approach for modelling the relationship between multiple explanatory variables, denoted $X$, and a scalar dependent variable $y$, which can be represented in the mathematical form as $\hat{y} = X\omega$. To find the best coefficients $\omega$, Least Square Estimation can be employed to minimize the sum of squared errors, $\min_{\omega} \|y - X\omega\|_2^2$. Support Vector Regression (SVR) is another efficient supervised learning model for regression, and it employs the same principles as the Support Vector Machine (SVM). The non-linear SVR can be represented as $f(x) = \sum_i (\alpha_i - \alpha_i^*) K(x_i, x) + b$, where $K(\cdot)$ denotes the kernel function (polynomial or Gaussian), which transforms the data into a higher dimensional feature space to make linear separation possible, and $\alpha$ is the Lagrange multiplier.
Fig. 1 The proposed pipeline for measuring anthropometric clothing measurements from 3D body scans. A 3D point cloud is produced by a set of depth sensors (body scanner). A body template is fitted (registered) to the 3D point cloud (step-2); circumference measurements are computed on the model surface (step-3); supervised regression is adopted to provide estimates of anthropometric measurements (step-4)

Our pipeline is evaluated with the NOMO3D dataset of real male and female subjects (194 plus 181), for which we provide the average accuracy and the percentage of subjects whose error is below the thresholds in [9].

Related work
Anthropometric Measurement Datasets - There have been several campaigns to collect 3D body scans and anthropometry ground truth for them. For example, the UMTRI dataset was collected to find the safest sitting posture of young children in cars [12]. The reproducibility error in their experiments was less than ±5 mm for most measurements. However, these were relative accuracies over repeated tests. Simmons and Istook [23] noted that there is substantial variation in how the available software packages take anthropometric measurements from 3D data. Paquette et al. [17] demonstrated much larger errors for 3D measurements as compared to manual tape measurements. They reported systematic errors of up to 30-40 mm despite the fact that standard measurement procedures (ISO-8559 and U.S. Army) were implemented in the software.
3D Human Body Models - The early works following the data campaigns above were based on "virtual tape measurements" where the anthropometric measurements were made manually with the help of 3D measurement software. If this step needs to be automated, then the 3D scan data needs to be aligned with a model for which the measurement paths can be predefined using 3D model vertex ids. However, first a good 3D human body model needs to be devised. The model should contain an intuitive parameterization for shape and pose and provide realistic body shapes. There are several options for scientific work. The most popular parametric body model is MakeHuman, which is an open source project (http://www.makehuman.org/) based on an artistic body model and aiming at high quality rendering for games and movies. However, better models are based on statistics of real human data. These require a single artist-made initial point model which is iteratively matched to scanned point clouds in a normalized pose. Principal Component Analysis (PCA) over the matched model points provides a natural parameterization for the shape. The pose can be intuitively defined by a skeleton joint model, but the final quality depends on how well the model can represent pose specific shape deformations. One of the first attempts to create a 3D human body from PCA shape and skeleton pose is the SCAPE body model by Anguelov et al. [2]. Hirshberg et al. [11] proposed a better parametric body model for SCAPE and introduced the BlendSCAPE model. Other attempts are by Baek and Lee [3] and more recently the SMPL model by Loper et al. [15]. SMPL provides high quality models where the shape is divided into pose invariant and pose dependent deformations, and the model parameters are optimized using a combination of their own dataset of 1,786 scans and 3,800 scans from CAESAR. For this work we adopt the SMPL model due to its good overall quality.
Computerized Anthropometric Measurements -There have been several attempts to infer 3D body models from 2D RGB images. For example, Guan et al. [10] proposed a method and compared their measurements to the ground truth. However, for many industrial and commercial applications the accuracy of 2D measurements is insufficient. For better accuracy 3D scans are needed.
Weiss et al. [26] propose a Kinect-based 3D body scan method that uses the SCAPE body model. The method requires manual pose initialization and then optimizes the model mesh using a standard ICP. Tsoli et al. [25] propose a pipeline that is similar to ours. They use the BlendSCAPE model to register a 3D scan and then compute various local and global features which are used in regression. A different approach was proposed by Zuffi et al. [32] in their "stitched puppet" model, where the body model is divided into local templates, "local PCA" matching is performed on each template, and the local parts are then globally aligned in the next optimization step. Wuhrer et al. [27] address the inverse of our problem, where a 3D body model is estimated from given 1D anthropometric measurements.
The above works particularly address the problem of unknown pose. However, we believe that a fixed pose can be assumed for many applications, since customers can be assumed to be co-operative. The process can therefore be drastically simplified while still providing accurate results. Recently, novel single depth-sensor based body scanning approaches have been proposed, for example BodyFusion [28] and DoubleFusion [29], but since 3D scanning is outside the scope of this work, a commercial 3D body scanner was used.

Our dataset was collected using a commercial TC2 body scanner (https://www.tc2.com) that uses off-the-shelf depth sensors (Intel RealSense R200). Inside the scanner, subjects were instructed to step on the rotating platform and take a standing pose with the feet around shoulder width apart and the arms slightly raised to create a gap between the arms and torso. The platform then rotates once, during which three depth sensors produce a raw 3D scan of the customer; the process takes a few seconds (Figure 2). The test subjects wore tight fitting underwear-like sport costumes. The scanner outputs a triangulated mesh structure in the regular OBJ file format. Each triangulated mesh contains on average 57,000 vertices and around 113,000 faces. For our experimental studies, we scanned 194 men and 181 women.

SMPL body model
The popular 3D human body models MakeHuman, SCAPE [2], BlendSCAPE [11] and SMPL [15] (see Section 2 for details) share a similar model parameterization $\{T, S, \theta\}$, where $T$ is the initial model in a "canonical shape" and "canonical pose", $S$ defines the shape deformation and $\theta$ defines the pose. Pose parameterization is intuitive and typically based on a skeleton rig of $K$ skeleton joints. A pose is encoded as the 3D rotation angles of the $K$ joints in $\theta$. Each vertex location in $T$ is relative to a specific skeleton part or parts, and therefore the whole point cloud deforms with the skeleton. Parameterization of the shape is more difficult to model since the parameters need to capture shape statistics of the human population. The standard approach is to use Principal Component Analysis (PCA), where the principal components represent the most important axes of variation in the population. In the PCA space any shape can be reconstructed by linearly adding $|\beta|$ principal directions to a mean shape $T$ (the zero shape):

$$T(\beta) = T + \sum_{i=1}^{|\beta|} \beta_i \mathbf{s}_i,$$

where $\mathbf{s}_i$ is the $i$-th principal direction and $\beta_i$ its coefficient. Often as few as $|\beta| = 10$ principal component vectors provide sufficient accuracy for applications where subtle details are not important.

For our work we selected the Skinned Multi-person Linear Model (SMPL) by Loper et al. [15] since it provides very competitive accuracy and the original implementation is publicly available. The SMPL mesh model contains $N = 6,890$ vertices (13,766 faces) and $K = 23$ skeleton joints. The mesh has the same topology for men and women, spatially varying resolution, a clean quad structure, segmentation into parts, initial blend weights, and a skeletal rig. A particular detail that makes SMPL registration more accurate than its competitors is that it divides the shape deformation into a pose independent deformation $B_S(\beta)$ and a pose specific deformation $B_P(\theta)$, which are summed to define the final shape. Notably, the shape deformation parameters are also used to predict the locations of the $K = 23$ skeleton joints, $J(\beta) : \mathbb{R}^{|\beta|} \to \mathbb{R}^{3K}$.
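The linear shape reconstruction described above (a mean shape plus weighted principal directions) can be illustrated with a minimal Python sketch. It is not the SMPL implementation: the function name `reconstruct_shape`, the two-vertex "mesh" and the two principal directions are made up for illustration, and real SMPL directions come from the published model files.

```python
from typing import List, Sequence

Vertex = List[float]  # [x, y, z]


def reconstruct_shape(mean_shape: Sequence[Vertex],
                      directions: Sequence[Sequence[Vertex]],
                      betas: Sequence[float]) -> List[Vertex]:
    """Return mean_shape + sum_i betas[i] * directions[i], vertex by vertex."""
    if len(directions) != len(betas):
        raise ValueError("one coefficient is needed per principal direction")
    shape = [list(v) for v in mean_shape]  # copy so the mean is not modified
    for beta, direction in zip(betas, directions):
        for vertex, offset in zip(shape, direction):
            for axis in range(3):
                vertex[axis] += beta * offset[axis]
    return shape


# Toy data (hypothetical): two vertices and two principal directions.
mean = [[0.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
dirs = [
    [[0.0, 0.0, 0.0], [0.0, 0.0, 0.1]],    # a "height-like" direction
    [[0.05, 0.0, 0.0], [0.05, 0.0, 0.0]],  # a "sideways" direction
]
shape = reconstruct_shape(mean, dirs, betas=[2.0, -1.0])
```

Setting all coefficients to zero returns the mean (zero) shape, which is why the mean is also called the zero shape.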
We re-defined the SMPL zero-pose to correspond to the pose subjects were instructed to take (Figure 3).

Fig. 4 A scanned point cloud contains holes and measurement noise, but registration of the 3D body model (red) is robust to these distortions and achieves an accurate "skin level" registration, which is essential for accurate anthropometric measurements in the next stage

Non-rigid ICP registration
The goal of registering the body model to the scanned point cloud is to provide "skin level registration" where the two surfaces, the model and the scan, overlay almost perfectly (Figure 4). This is a challenging task since a) the points contain measurement noise, b) large point regions may be missing and c) the model points do not exactly match the scan point locations. To make the final anthropometric measurements accurate in the next processing stage we need a registration method that is accurate and robust to these non-idealities.
A core component in constructing the SCAPE, BlendSCAPE and SMPL datasets is an artist-generated point model and an algorithm to register the model to real human scans. However, these algorithms perform complex optimization and must be manually initialized. Therefore the artistic models and special algorithms have not been used outside body model generation. The final body models, SCAPE, BlendSCAPE and SMPL, do provide an intuitive parameterization as discussed in Section 4.1, and registration can be defined as an optimization problem where a few pose and shape parameters $\{S, \theta\}$ are optimized to minimize a registration error. Skin level registration, however, requires a large number of PCA components for the shape, and therefore we take an alternative approach from the generic point cloud matching literature.
Several comparisons of generic registration methods exist. For example, Bogo et al. [4] introduced the FAUST dataset for comparing non-rigid registration methods. In their experiments, several popular methods, e.g., Generalized Multi-Dimensional Scaling (GMDS) [5], Möbius voting [14] and Blended Intrinsic Maps (BIM) [13], did not perform well since these methods assume that both inputs are watertight and have the same topology. However, the baseline point cloud matching method, Iterative Closest Point (ICP), does not require such assumptions.
There are two extensions of the baseline ICP that are suitable for human body point clouds: Amberg et al. [1] and Schneider et al. [22]. Since the 3D scans often contain holes (Figure 2), we adopted the Amberg et al. approach, which explicitly handles missing points. The challenge is two-fold: we want to retain the global convergence properties of ICP while still allowing local deformations down to the skin level. The local deformations make this ICP non-rigid.
The starting point of our algorithm is a pre-aligned model defined by $\{T, \beta_i, \theta_k\}_{i=1,\dots,|\beta|,\; k=1,\dots,K}$ that brings the SMPL template into approximate correspondence with the obtained scan point cloud $T_{scan}$. A simple procedure for pre-alignment is described in Section 4.3. If we define the pre-aligned model as $V$, then the problem is to find optimal values for the alignment parameters $X$ so that $V(X)$ registers the template points to the surface points $T_{scan}$.
To solve for the optimal parameters $X$, an energy function of three terms is defined [1]:

$$E(X) = E_d(X) + \alpha E_s(X) + \beta E_l(X).$$

$E_d$ is the standard ICP distance term between the model and scan points,

$$E_d(X) = \sum_{v_i \in V} w_i \, \mathrm{dist}^2(T_{scan}, X_i v_i),$$

where $X_i$ is a linear mapping of a single model vertex $v_i$ to its correspondence in $T_{scan}$, and $w_i$ defines whether a model point has a correspondence in the scan ($w_i = 1$) or not ($w_i = 0$). $E_s$ is a local stiffness term,

$$E_s(X) = \sum_{j} \sum_{i \in N_j} \| (X_i - X_j) G \|_F^2,$$

where $\|\cdot\|_F^2$ is the squared matrix Frobenius norm. The stiffness term enforces similar transformations between the neighbor vertices $N_j$ of the model vertex $v_j$. The weighting matrix $G := \mathrm{diag}(1, 1, 1, \gamma)$ uses $\gamma$ to weight differences in the rotational and skew part of the deformation against the translational part ($\gamma = 1$ in the experiments). The third energy term is a landmark term,

$$E_l(X) = \sum_{(v_i, l) \in L} \| X_i v_i - l \|^2.$$

The landmarks $L$ are pre-defined, important positions in the model, and this term enforces them to be registered accurately. The landmark term improves registration significantly, but it requires manual labeling of selected keypoints and is therefore omitted in our experiments. The algorithm in [1] uses locally affine regularization, which assigns an affine transformation to each vertex and minimizes the difference in the transformations of neighboring vertices. The deformation parameters $X$, which are applied to the source vertices to generate the target surface deformation, are obtained by minimizing the cost function in Eq. 6 directly and exactly:

$$\bar{E}(X) = \left\| \begin{bmatrix} \alpha M \otimes G \\ W D \\ \beta D_L \end{bmatrix} X - \begin{bmatrix} 0 \\ W U \\ U_L \end{bmatrix} \right\|_F^2 = \| A X - B \|_F^2. \qquad (6)$$

The cost function $\bar{E}(X)$ takes its minimum at $X = (A^T A)^{-1} A^T B$. In the above equation $M$ is the node-arc incidence matrix of the template mesh topology; $G$ is the weighting matrix defined above; $W := \mathrm{diag}(w_1, \dots, w_n)$ is the weighting matrix in which $w_i = 0$ if the template vertex $v_i$ corresponds to missing data in the target mesh, and $n$ is the number of template vertices; $D$ is the sparse matrix that maps the $4n \times 3$ deformation parameters $X$ to the displaced template vertices; $U$ is the matrix of the correspondence points on the target mesh; $D_L$ and $U_L$ are the pre-defined landmarks on the template mesh and their correspondence points on the target mesh, respectively; and $\otimes$ denotes the Kronecker product. $\alpha$ and $\beta$ are the penalty weights that balance the stiffness and landmark energy terms against the standard ICP term $E_d$.
The whole registration process consists of two loops. In the outer loop a series of deformations of the template is performed, one for each stiffness $\alpha_i \in \{\alpha_1, \dots, \alpha_n\}$, where $\alpha_i > \alpha_{i+1}$. The decreasing $\alpha$ values move the registration from global deformations to increasingly localized ones. In our experiments the $\alpha$ values run from 100 down to 1 with a step size of 1. In the inner loop a deformation $X$ is found for a fixed stiffness $\alpha_i$ and preliminary correspondences, which are obtained by a nearest point search. The inner loop is repeated until $\|X_j - X_{j-1}\| < \epsilon$, where $X_j$ is the deformation at inner iteration $j$ and $\epsilon$ is the threshold.
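To make the two-loop structure concrete, the following Python sketch implements a deliberately simplified variant. It is not the Matlab code used in the experiments: each vertex gets only a translation instead of an affine transformation, the landmark term is left out, the linear system is solved approximately by Gauss-Seidel sweeps rather than in closed form, and all names, thresholds and the stiffness schedule are assumptions chosen for the toy example.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def nearest(p: Point, cloud: Sequence[Point]) -> Tuple[Point, float]:
    """Brute-force nearest scan point and its distance."""
    best = min(cloud, key=lambda q: math.dist(p, q))
    return best, math.dist(p, best)


def solve_translations(verts, targets, weights, neighbors, alpha, x0,
                       sweeps=2000, tol=1e-10):
    """Gauss-Seidel minimisation of
    sum_i w_i |v_i + x_i - u_i|^2 + alpha * sum_(i,j) |x_i - x_j|^2."""
    x = [list(t) for t in x0]
    for _ in range(sweeps):
        change = 0.0
        for i, nbrs in enumerate(neighbors):
            denom = weights[i] + alpha * len(nbrs)
            if denom == 0.0:
                continue  # isolated vertex without correspondence
            for a in range(3):
                data = weights[i] * (targets[i][a] - verts[i][a])
                smooth = alpha * sum(x[j][a] for j in nbrs)
                new = (data + smooth) / denom
                change = max(change, abs(new - x[i][a]))
                x[i][a] = new
        if change < tol:
            break
    return x


def nonrigid_icp(verts, edges, scan, alphas, eps=1e-6, max_dist=0.5,
                 max_inner=50) -> List[Point]:
    neighbors = [[] for _ in verts]
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    x = [[0.0, 0.0, 0.0] for _ in verts]
    for alpha in alphas:                # outer loop: decreasing stiffness
        for _ in range(max_inner):      # inner loop: fixed stiffness
            moved = [tuple(v[a] + t[a] for a in range(3))
                     for v, t in zip(verts, x)]
            targets, weights = [], []
            for p in moved:
                q, d = nearest(p, scan)
                targets.append(q)
                # w_i = 0 marks a vertex with no plausible correspondence
                weights.append(1.0 if d <= max_dist else 0.0)
            x_new = solve_translations(verts, targets, weights,
                                       neighbors, alpha, x)
            delta = max(abs(n[a] - o[a])
                        for n, o in zip(x_new, x) for a in range(3))
            x = x_new
            if delta < eps:
                break
    return [tuple(v[a] + t[a] for a in range(3)) for v, t in zip(verts, x)]


# Toy example: a 4-vertex chain and a scan that is a shifted copy of it.
template = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
chain = [(0, 1), (1, 2), (2, 3)]
scan_pts = [(x + 0.2, y + 0.1, z) for x, y, z in template]
registered = nonrigid_icp(template, chain, scan_pts, alphas=[10.0, 3.0, 1.0])
```

In this toy case the scan is a pure translation of the template, so the stiffness term vanishes at the solution and the registered vertices land on the scan points.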

Pre-alignment and Initialization Procedures
A simple pre-alignment procedure is performed before the non-rigid ICP registration. In general, registration misalignment is partly caused by wrong scales, different facing directions and different center points of the subjects. To mitigate this, we first scale all scans to the same unit of measurement (meter) as the SMPL model meshes; we then rotate all scans so that they face the same direction. Previous works use the mean vertex coordinate as the center point and align all meshes to a common center point. We additionally align all samples to the same lowest point along the Z-axis, because missing parts in the scans shift the center points considerably, which has a negative effect on registration. A standard point $(x, y, 0)$ is employed as the lowest point for all meshes. After the pre-alignment procedure, all scans and the SMPL models stand on the X-Y plane and face the Y-axis direction with the same scale.
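The pre-alignment steps can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the scan unit and the yaw angle that turns the subject towards the Y-axis are taken as known inputs, the horizontal position is centered on the mean of the vertices, and the function name `prealign` is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def prealign(points: Sequence[Point], unit_in_meters: float = 1.0,
             yaw_rad: float = 0.0) -> List[Point]:
    """Scale to meters, rotate about the Z-axis, center in X-Y and put
    the lowest point on the plane z = 0."""
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    u = unit_in_meters
    # 1) common unit, 2) common facing direction (rotation about Z)
    rotated = [(u * (c * x - s * y), u * (s * x + c * y), u * z)
               for x, y, z in points]
    # 3) horizontal centering on the mean, vertical alignment on the minimum,
    #    which is less sensitive to missing body parts than the mean height
    cx = sum(p[0] for p in rotated) / len(rotated)
    cy = sum(p[1] for p in rotated) / len(rotated)
    z_min = min(p[2] for p in rotated)
    return [(x - cx, y - cy, z - z_min) for x, y, z in rotated]


# Toy scan given in millimeters (made-up coordinates).
scan_mm = [(100.0, 200.0, 50.0), (300.0, 200.0, 1750.0)]
aligned = prealign(scan_mm, unit_in_meters=0.001)
```

After the call the lowest vertex has z = 0 and all coordinates are in meters, matching the convention described above for the SMPL meshes.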
The height of the SMPL model is controlled by the first shape parameter $\beta_1$. To obtain a suitable initial value for $\beta_1$, we fit a simple linear function to the heights of the training set scans and use it to compute the estimate $\hat{\beta}_1 \approx \beta_1$. To initialize the pose parameters, we start from the pose $\theta$ (on the right in Figure 3) and iteratively test a number of arm angle shifts to match the target scan. These initialization procedures aid convergence and improve accuracy, but their effect is not significant.
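A one-variable least squares fit is enough to illustrate the height-based initialization. The sketch below is an assumption about how such a linear function could be fitted; the training pairs of scan height and $\beta_1$ are invented numbers, and the sign and scale of the relation depend on the actual model.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical training data: scan heights (m) and the matching beta_1.
heights = [1.60, 1.70, 1.80, 1.90]
beta1_values = [-1.5, -0.5, 0.5, 1.5]
slope, intercept = fit_line(heights, beta1_values)


def initial_beta1(height_m: float) -> float:
    """Initial guess for beta_1 from the height of a new scan."""
    return slope * height_m + intercept
```

A new scan then only needs its height (for example the extent along the Z-axis after pre-alignment) to obtain a starting value for the first shape parameter.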

Anthropometric Measurements
The proposed pipeline outputs estimates of the target physical anthropometric measurements from a fitted model (Section 4) by first calculating circumference paths through the model points (Section 5.1) and then estimating the physical measurements from the path distances by non-linear regression (Section 5.2).

Surface Measurements
The registration process brings two main benefits: (a) it produces a hole-free mesh without missing body parts and reduces the point cloud noise; and (b) the registered meshes of all subjects share the same topology, which makes it easy to find the corresponding vertices of the predefined circumference paths.
For each anthropometric measurement $t_i$ we define a set of surface circumference paths whose lengths are denoted $t_i^{(1)}, \dots, t_i^{(C)}$. The length of a circumference path is the sum of the edge lengths along the defined path (Figure 5). The selected circumference paths were not optimized, but were set manually near the true anthropometric measurement locations. It was assumed that multiple paths provide extra robustness to shape deformations (see the ablation study in the experimental part of our work).
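Because all registered meshes share one topology, a circumference path can be stored once as a list of vertex ids and evaluated on any subject. The following Python sketch shows the edge-length summation; the function name `path_length` and the unit-square example are illustrative assumptions, not part of the original implementation.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def path_length(vertices: Sequence[Point], path_ids: Sequence[int],
                closed: bool = True) -> float:
    """Sum of edge lengths along a path given as ordered vertex ids.

    With closed=True the last vertex is connected back to the first,
    which is what a circumference needs.
    """
    total = 0.0
    for a, b in zip(path_ids, path_ids[1:]):
        total += math.dist(vertices[a], vertices[b])
    if closed and len(path_ids) > 1:
        total += math.dist(vertices[path_ids[-1]], vertices[path_ids[0]])
    return total


# Toy mesh: four vertices forming a unit square in the z = 0 plane.
square = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
loop_ids = [0, 1, 2, 3]
circumference = path_length(square, loop_ids)
```

Evaluating several such id lists placed near the same body location yields the feature vector of path lengths for that measurement.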

Non-linear regression
The purpose of a suitable regressor is to find a mapping $f(\cdot)$ such that

$$f : (t_i^{(1)}, \dots, t_i^{(C)}) \mapsto \hat{t}_i,$$

where $\hat{t}_i$ is the estimate of the true anthropometric clothing measurement $t_i$. The most straightforward solution is ordinary least squares (linear regression), which finds a solution $\omega = (\omega_0, \omega_1, \dots, \omega_C)^T$ that minimizes the squared loss over the training subjects $i$,

$$\min_{\omega} \sum_i \Big( t_i - \omega_0 - \sum_{c=1}^{C} \omega_c t_i^{(c)} \Big)^2,$$

where $t_i$ is the ground truth value in the training set and $\mathbf{t}_i = (t_i^{(1)}, \dots, t_i^{(C)})^T$ are the computed circumference path distances for this specific anthropometric measurement. Linear regression with regularization (ridge regression) minimizes the squared loss with an additional weight penalty term $\lambda \|\omega\|_2^2$. There are also more advanced extensions of linear regression, such as Elastic Net Regression [31], and other learning based regressors such as Support Vector Regression (SVR) [24]. We compare several popular regression methods in our ablation studies.
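As a worked illustration of the linear baselines, the sketch below fits ridge regression through the normal equations in plain Python (setting the penalty to zero gives ordinary least squares). It is a sketch under assumptions: the intercept is left unpenalized, the small Gaussian elimination solver is only meant for a handful of features, and the path lengths and targets are synthetic. The SVR regressor used for the reported results is not reproduced here.

```python
from typing import List, Sequence


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_ridge(features: Sequence[Sequence[float]], targets: Sequence[float],
              lam: float = 0.0) -> List[float]:
    """Return (w0, w1, ..., wC) minimising squared loss + lam * sum_c wc^2."""
    rows = [[1.0] + list(f) for f in features]  # leading 1 models w0
    d = len(rows[0])
    ata = [[sum(r[p] * r[q] for r in rows) for q in range(d)]
           for p in range(d)]
    atb = [sum(r[p] * t for r, t in zip(rows, targets)) for p in range(d)]
    for p in range(1, d):  # assumption: the intercept w0 is not penalised
        ata[p][p] += lam
    return solve_linear(ata, atb)


def predict(w: Sequence[float], feature: Sequence[float]) -> float:
    return w[0] + sum(wc * fc for wc, fc in zip(w[1:], feature))


# Synthetic example: two path lengths (m) per subject and a tape
# measurement generated as t = 0.05 + 0.5 * path1 + 0.4 * path2.
paths = [(0.90, 0.92), (1.00, 1.01), (1.10, 1.15), (0.95, 0.99)]
tape = [0.05 + 0.5 * a + 0.4 * b for a, b in paths]
w_ols = fit_ridge(paths, tape, lam=0.0)
w_ridge = fit_ridge(paths, tape, lam=1.0)
```

With noise-free synthetic data the unregularized fit reproduces the targets, while a positive penalty shrinks the path weights towards zero.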

Dataset and Settings
We collected a set of 3D scans using the commercial scanner (Section 3). The dataset, NOMO3D, consists of 194 male and 181 female scans. For each subject, a clothing expert (tailor) made the actual anthropometric measurements (15 for male and 19 for female subjects). All results are the average performance over 5-fold cross-validation.
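The 5-fold protocol can be sketched as below. The shuffling seed and the way subjects are assigned to folds are assumptions made for the illustration; the original fold assignment is not specified.

```python
import random
from typing import List, Tuple


def k_fold_indices(n: int, k: int = 5,
                   seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, test) index splits covering range(n) exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # fold sizes differ by at most one
    splits = []
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train, test))
    return splits


# 194 male subjects: each split trains on roughly 80% and tests on 20%.
splits = k_fold_indices(194, k=5)
```

Each subject appears in exactly one test fold, so averaging the per-fold errors uses every scan once for testing.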
Method Evaluation - We employ the Mean Absolute Error (MAE) as the error metric between the ground truth and the estimated anthropometric measurements. For each measurement $i$, the Mean Absolute Error over all subjects $j$ was obtained as

$$\mathrm{MAE}_i = \frac{1}{N} \sum_{j=1}^{N} \left| t_{i,j} - \hat{t}_{i,j} \right|,$$

where $N$ is the number of subjects, $t_{i,j}$ is the ground truth value of measurement $i$ for subject $j$ and $\hat{t}_{i,j}$ is its estimate. In addition to the measurement specific MAEs we also computed the average MAE over all measurements. All numbers are reported in millimeters (mm). Moreover, for each measurement we also report, as the success rate, the proportion of the test samples for which the error was below the error limits defined in [9].
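Both evaluation metrics are simple to compute. The Python sketch below shows one way to do it; treating an error exactly equal to the limit as a success is an assumption, and the neck values are invented for the example.

```python
from typing import Sequence


def mean_absolute_error(truth: Sequence[float],
                        estimate: Sequence[float]) -> float:
    """Average of |t_j - t_hat_j| over all subjects j (same unit as input)."""
    return sum(abs(t, ) if False else abs(t - e)
               for t, e in zip(truth, estimate)) / len(truth)


def success_rate(truth: Sequence[float], estimate: Sequence[float],
                 limit_mm: float) -> float:
    """Proportion of subjects whose absolute error is within limit_mm."""
    hits = sum(1 for t, e in zip(truth, estimate) if abs(t - e) <= limit_mm)
    return hits / len(truth)


# Made-up neck circumferences in millimeters for three subjects.
neck_truth = [380.0, 400.0, 410.0]
neck_estimate = [383.0, 392.0, 410.0]
mae = mean_absolute_error(neck_truth, neck_estimate)       # (3 + 8 + 0) / 3
rate = success_rate(neck_truth, neck_estimate, limit_mm=6.0)  # 2 of 3
```

The two numbers are complementary: the MAE summarizes the typical error, while the success rate shows how many subjects fall inside the tolerance for that measurement.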
Computational complexity - The most time consuming part is the non-rigid ICP registration. The Matlab code was adapted from [1] and takes approximately 2 minutes per scan. The pre-alignment and initialization procedures take less than a second, and the regression is also computationally fast.

Results
The average 5-fold errors for each anthropometric measurement, together with their accuracy thresholds and success rates, are shown in Table 1. In all cases, the number of surface measurements was optimized for each anthropometric measurement and the best performing regressor (non-linear SVR) was used. For both male and female subjects the best performing measurement was the neck circumference, with 93% of test cases below the threshold (6 mm) for men and 81% for women. The worst performing measurement was the ankle circumference, for which success rates of only 28% for male and 24% for female subjects were achieved. The error distributions for the male and female neck and ankle circumferences and for the male chest and female natural waist circumferences are shown in Figure 6. The distributions reveal that there is a small number of test samples with a large error. It turned out that the main source of large estimation errors is the body scanner, which often misses certain body parts. For example, the feet regions often lack point cloud points, which makes the registration fail in these regions (Figure 7).

Ablation Study
Number of circumference paths - In the first ablation study, we investigated the effect of adding multiple surface measurements (circumference paths) to the anthropometric regression. The results for three well performing and three poorly performing measurements for both male and female subjects are shown in Figure 8. The results are for the non-linear SVR regressor with 5-fold cross-validation. The most important findings are that additional paths always improve the accuracy and that, depending on the measurement, the results saturate at 3 to 9 surface circumference paths. In particular, paths close to the physical anthropometric measurement location contribute strongly to the estimation accuracy. The best single paths (C = 1) were also selected using cross-validation, and the results with and without SVR regression are shown in Table 1.
These results indicate that i) multi-path regression is superior to single path regression and ii) SVR significantly improves the estimation performance.
Non-rigid ICP - To validate the importance of non-rigid ICP we conducted an experiment where the SMPL model was fitted directly to the point clouds. The SMPL parameter optimization was done using the popular L-BFGS-B optimizer [30]. Similar to the non-rigid ICP, the distance term $E_d$ with the normal direction constraints was used as the target function. The stop criterion was set to $10^{-6}$ to keep the computation times reasonable, and the same pre-alignment procedure was adopted. The results are shown in Table 1 and are clearly inferior to the proposed non-rigid ICP registration.

Table 1 Average 5-fold (80% for training and 20% for testing) performance (Mean Absolute Error) and success rate (the proportion of the test samples within the error limits in [9]) of anthropometric measurements. "C" denotes the number of circumference paths used in estimation. "best single" is the best single path performance. "+SVR" uses SVR regression for the estimates. "L-BFGS-B + SVR" uses the SMPL model fitted by the L-BFGS-B optimizer.

Conclusions
This work introduced a full processing pipeline for estimating physical anthropometric measurements from 3D body scans. The pipeline consisted of a commercial 3D scanner, a deformable SMPL body model, non-rigid ICP based model registration, computation of circumference path features and non-linear regression for anthropometric measurement estimation. Depending on the measurement, our pipeline provided success rates from 28% to 93% for male and from 24% to 82% for female subjects. The proposed pipeline works in practice and shows that an affordable scanning system can be built for the clothing industry.
In future work, we will further investigate and refine each step of the pipeline. Examples include the selection of better surface features in addition to the circumference paths, fast-to-compute alternatives to the slow ICP algorithm (e.g. Chen et al. [6]), and better scanners and scanning procedures.