Statistical Modeling of Craniofacial Shape and Texture
Abstract
We present a fully-automatic statistical 3D shape modeling approach and apply it to a large dataset of 3D images, the Headspace dataset, thus generating the first public shape-and-texture 3D morphable model (3DMM) of the full human head. Our approach is the first to employ a template that adapts to the dataset subject before dense morphing. This is fully automatic and achieved using 2D facial landmarking, projection to 3D shape, and mesh editing. In dense template morphing, we improve on the well-known Coherent Point Drift algorithm by incorporating iterative data sampling and alignment. Our evaluations demonstrate that our method has better performance in correspondence accuracy and modeling ability than competing algorithms. We propose a texture map refinement scheme to build high-quality texture maps and a texture model. We present several applications, including the first clinical use of craniofacial 3DMMs in the assessment of different types of surgical intervention applied to a craniosynostosis patient group.
Keywords
3D morphable model · Statistical shape model · Craniofacial shape · Shape morphing

1 Introduction
Here, we are concerned with 3D statistical shape modeling of craniofacial data, i.e. models of the variability of the full head and face. A full head model opens up new applications and introduces useful constraints that are not afforded by existing 3D face models. In graphics, knowledge of full head shape is necessary for modeling hairstyles onto a correctly proportioned cranium (Petr and Ivana 2015). In ergonomics, predicting the fit of headwear objects such as helmets, spectacles and breathing apparatus requires modeling the fit over the full head region (Harrison and Robinette 2006). In forensics, reconstructing face models from skulls (Madsen et al. 2018) could be further constrained if the relationship between skull and outer face surface were modelled over the whole skull. In craniofacial syndrome diagnosis and surgical planning and evaluation, a full head model is a prerequisite for comparing syndromic or post-surgical head shape to population norms and for proposing surgical interventions (Dai et al. 2017a). In face animation, the skull location can be used to stabilize rigid motion of a performance (Beeler and Bradley 2014). Estimating skull location would be considerably simplified with a full head model. In computer vision, a full head model enables prediction or completion of the unobserved back of the head from face images or from silhouettes, which has potential applications in biometrics, and provides a mechanism to address any of the other aforementioned applications.
These rich applications motivate our work, but building full head models brings with it new challenges that are not confronted by face-only models. To capture the rich variability in craniofacial shape variation requires a dataset of 3D scans that covers the whole head and face area and is diverse enough to sample the full space of variation. The first challenge is that cranial shape usually cannot be directly observed (due to hair or headwear) and many scanning systems only cover the ear-to-ear region, so no suitable dataset previously existed. Second, with large-scale data, the model construction pipeline must be fully automatic to avoid costly and unreliable manual intervention. Third, building a 3DMM requires establishment of dense correspondence between all training samples. The cranium and neck region dominate the face in terms of surface area, yet are largely devoid of distinctive features. This makes meaningful correspondence difficult to compute in the cranial area and also risks sacrificing quality of correspondence in the face area, as the cranium dominates.
We propose a fully automatic pipeline to train 3DMMs that: i. is the first to employ an adaptive template; ii. employs a new morphing algorithm that integrates ideas from Iterative Closest Points (Besl and McKay 1992) and Coherent Point Drift (Myronenko and Song 2010) and iii. employs regularized projection to reduce morphing error.
We present a high-quality, multi-view texture mapping method and employ it for texture modeling.
We build both global craniofacial 3DMMs and demographic sub-population 3DMMs from 1212 distinct identities in the Headspace dataset and we make both the 3DMMs [improved from our earlier public release in Dai et al. (2017b)] and the Headspace dataset publicly available (Duncan et al. 2018). To our best knowledge, our models are the first public shape-and-texture craniofacial 3DMMs of the full human head.
We demonstrate a wide range of applications that illustrate the power of our 3DMMs, including: i. flexibility modes for shape completion, ii. age-based regression of craniofacial shape, and iii. clinical assessment of craniofacial surgery.
2 Related Work
The first 3DMM of the human face was presented by Blanz and Vetter (1999). Here, 3D face scans of 200 young adults, evenly split between male and female, were used to construct the model. Dense correspondences were computed using optical flow with an energy term dependent on both shape and texture. Independent shape and texture models were developed, where each was constructed using Principal Component Analysis (PCA). This was achieved in an iterative bootstrapping process, where the expressive power of the model was gradually increased by increasing the number of model components. Later the same authors described how to do face recognition within an analysis-by-synthesis setting (Blanz and Vetter 2003). This was achieved by fitting the 3D face model, along with pose and illumination model parameterisation, to a single facial image.
Many important works led up to the first 3DMM, dating back at least to the transformation grid work of Thompson et al. (1917). Other key milestones include the shape-space work of Kendall (1984), work on Thin Plate Splines by Bookstein (1989), the theoretical underpinnings of statistical shape modeling by Dryden et al. (1998), Point Distribution Models (PDMs) by Cootes and Taylor (1995), Active Shape Models (ASMs) (Cootes et al. 1995), and Active Appearance Models (AAMs) (Cootes et al. 2001), to name a few.
More recently, the Basel Face Model (BFM) has become the most well-known and widely-used 3DMM of the human face and was developed by Paysan et al. (2009). As with Blanz and Vetter’s model, 200 scans were used, but the method of determining corresponding points was improved. Instead of optical flow, a set of hand-labelled feature points is marked on each of the 200 training scans. The corresponding points on a template mesh are known; the template is then morphed onto each training scan using per-vertex affine transformations that are individually under-constrained but regularised across neighbouring points (Amberg et al. 2007). The technique is known as optimal-step Non-rigid Iterative Closest Points (NICP).
The BFM was released as both a global model and a part-based model that is learned for four regions: the eyes, nose, mouth and the rest of face. In the part-based version, the manually-defined regions are fitted to the data independently and merged in a post-processing step (ter Haar and Veltkamp 2008; Basso et al. 2007). The part-based model was shown to lead to a higher data accuracy than the global model. De Smet and Van Gool (2010) proposed a method to find the optimal segmentation automatically by clustering the vertices, which is based on features derived from their displacements. In order to address the potential discontinuities at the boundaries of the segments, they smoothly weight the segments to obtain regionalized basis functions for the training data.
A multilinear model has been employed by several authors (Vlasic et al. 2005; Yang et al. 2011; Bolkart and Wuhrer 2013; Yang et al. 2012) to capture varying facial expressions. Vlasic et al. (2005) modelled facial shape using a combination of identity and expression variation. Yang et al. (2011) modelled the expression of a face in a different input image of the same subject. A number of PCA shape spaces for each expression are built and combined with a multilinear model. A follow-up work (Bolkart and Wuhrer 2013; Yang et al. 2012) used this model for a better description of expressions in videos. When a sequence of 3D meshes is given, Bolkart and Wuhrer (2013) fitted a multi-linear model to parameterize a 4D sequence. Later, they demonstrated a direct construction of a multilinear model from a set of meshes using a global optimization of 3DMM parameters along with a group-wise registration over the 3D scans (Bolkart and Wuhrer 2015). Another alternative to modeling faces with expression is to blend different shape models with expressions, which was introduced by Salazar et al. (2014) to establish correspondence among faces with expression.
Golovinskiy et al. (2006) introduced a hierarchical pyramid method to build a localized statistical model that captures the varying geometric detail in high-resolution face meshes. Brunton et al. (2011) described 3D facial shape variation at multiple scales using a wavelet basis, which provides a way to capture small signals in local facial regions that are difficult for PCA to represent.
Recently, Booth et al. (2016) built a Large Scale Facial Model (LSFM), using the NICP template morphing approach, as was used in the BFM, but with error pruning, followed by Generalized Procrustes Analysis (GPA) for alignment, and PCA for the model construction. This 3DMM employs the largest 3D face dataset to date, and is constructed from 9663 distinct facial identities.
Lüthi et al. (2017) model the shape variations with a Gaussian process, which they represent using the leading components of its Karhunen–Loeve expansion. Such Gaussian Process Morphable Models (GPMMs) unify a variety of non-rigid deformation models with B-splines and PCA models as examples. In their follow-on work, they present a novel pipeline for morphable face model construction based on Gaussian processes (Gerig et al. 2017). GPMMs separate problem-specific requirements from the registration algorithm by incorporating domain-specific adaptions as a prior model.
Tran and Liu (2018) proposed a framework to construct a nonlinear 3DMM model from a large set of unconstrained face images, without collecting 3D face scans. Specifically, given a face image as input, a network encoder estimates the projection, shape and texture parameters. Two decoders served as the nonlinear 3DMM to map from the shape and texture parameters to the 3D shape and texture, respectively.
The work presented here builds on our earlier conference publication (Dai et al. 2017b) that introduced the first publicly available 3DMM of the human head. In that paper, we used a hierarchical parts-based approach to shape morphing. Here, we use an adaptive template approach to personalize the template to the subject’s facial features before dense morphing. The dense morphing algorithm itself then employs a new algorithm called Iterative Coherent Point Drift (ICPD), combining concepts from the well-known ICP (Besl and McKay 1992; Chen and Medioni 1992) and CPD (Myronenko and Song 2010) algorithms. In Sect. 9.2, we demonstrate that the new morphing results are superior to our earlier approach and we update our LYHM 3DMM public release with a version that uses this improved morphing. Our other earlier work introduced a symmetrized version of the CPD morphing algorithm (Dai et al. 2018a), and we also evaluate against this pipeline here, although symmetry is not a central consideration of this paper. Furthermore, in this paper, we use the constructed 3DMM in the first 3DMM-based clinical assessment of craniofacial surgery.
3 Overview of 3DMM Training
Our 3DMM training pipeline has three main functional blocks:
i. Data preprocessing We use automatic 2D landmarking and map to 3D using the known 2D-to-3D registration supplied by the 3D camera system. These 3D landmarks can then be used for both pose normalisation and template adaptation (personalization of the template).
ii. Dense correspondence A collection of 3D scans are reparametrized into a form where each scan has the same number of points joined into a mesh triangulation that is shared across all scans. This is achieved by non-rigid template deformation. Furthermore, the semantic or anatomical meaning of each point is shared across the collection, as defined by the template mesh. We use the publicly-available FaceWarehouse head mesh (Cao et al. 2014) as our template, which has 11510 vertices.
iii. Alignment and statistical modelling The collection of scans in dense correspondence are aligned using Generalized Procrustes Analysis (GPA). Then Weighted Principal Component Analysis (WPCA) is applied, generating a 3DMM.
4 Overview of the Headspace Dataset
The Headspace dataset comprises 3D images of the human head for 1519 subjects. The data was collected and annotated by the Alder Hey Children’s Hospital (AHCH) Craniofacial Unit (Liverpool, UK), who employed 3dMD Ltd’s static 5-view 3dMDhead scanning system, using the five 3D camera configuration shown in Fig. 2. This dataset has been structured and made available online for research purposes, in a collaboration between AHCH and the Department of Computer Science, University of York. Access to the dataset is via the author’s Headspace web page (Duncan et al. 2018).
A typical output of this system rendered from different viewpoints, both with and without texture, is shown in Fig. 3. Vertex resolution is variable but typically there are around 180K vertices. All subjects are wearing tight fitting latex caps to reduce the effect of hairstyles. For subjects with relatively low-volume hairstyles, the shape of the cranium is clearly revealed. If this is not the case, we exclude them from the 3DMM training data, filtering on the basis of the hair bulge flag in the metadata.
The dataset is supplied with subject-based metadata, capture-based metadata and a set of 3D landmark coordinates extracted using the Zhu-Ramanan mixture-of-trees algorithm (Zhu and Ramanan 2012). The subject information includes: gender, declared ethnic group, age, eye color, hair color, beard descriptor (none, low, medium, high), moustache descriptor (none, low, medium, high), and a spectacles flag. The capture information contains a quality descriptor (free text, such as ‘data spike right upper lip’), a hair bulge flag (hair bulge under latex cap distorting the apparent cranial shape), a cap artefact flag (cap has a ridge at its apex due to poor fitting), a cranial hole flag (a missing part in the data scan at the cranium) and an under chin hole flag (missing part under chin).
The dataset is well-balanced in gender, but not in age, which is predominantly in the 20s (see Fig. 5), although the age range is wide: from 1 to 89 years. It is also not well-balanced in declared ethnicity, which is predominantly ‘white’: 90% ‘white’, 5.3% ‘asian’, 2.7% ‘mixed heritage’, 1% ‘black’ and 1% ‘other’. Eye color is distributed as 33.36% brown, 46.38% blue, 19.89% green and 0.37% other.
5 Data Preprocessing
Preprocessing of the 3D scan serves to place the data in a frontal pose and localise a complete and accurate set of automatic 3D landmark positions, for every 3D image, that corresponds to a set of manually-placed landmarks on the template. Placing manual landmarks on the template is done only once; there is no manual landmarking on a per-subject basis.
In more detail, the first stage employs the Mixture of Trees method of Zhu and Ramanan (2012) to localize 2D facial landmarks on the 5-view composite texture image. Although there are more recent network-based landmarkers (see Wu and Ji (2019) for a review), this method works highly reliably on non-frontal poses, as captured in our 5-view composite 2D images (see Fig. 4), where there are typically two facial views at around \(\pm 45^{\circ }\) to the frontal view. In all cases, at least one view was successfully landmarked (in 99% of cases both views are landmarked), so all 1212 3DMM training images could be pose normalized. Our framework may be used with any other landmarker that can handle pose variations.
The mixture of trees that we use has 13 landmark tree models (‘components’) for 13 different yaw angles of the head. For each subject, two face detections are found, corresponding to the left and right side of the face. The detected 2D landmarks are then projected to 3D using the OBJ texture coordinates in the data. Given that we know where all 3D landmarks should be for a frontal pose, it is possible to do standard 3D pose alignment in a scale-normalized setting (Dai et al. 2017b).
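The projection of a detected 2D landmark to 3D via the texture coordinates amounts to a barycentric lookup: find the mesh triangle whose UV coordinates contain the landmark, then interpolate that triangle's 3D vertices. Below is a minimal sketch (brute-force over faces; the function names are illustrative, not taken from the released code):

```python
import numpy as np

def barycentric(p, a, b, c):
    """Barycentric coordinates of 2D point p in triangle (a, b, c)."""
    m = np.column_stack([b - a, c - a])
    w = np.linalg.solve(m, p - a)
    return np.array([1.0 - w.sum(), w[0], w[1]])

def lift_landmark_to_3d(uv, uvs, verts, faces):
    """Project a 2D texture-space landmark onto the 3D mesh surface.

    uv    : (2,) landmark in texture coordinates
    uvs   : (n, 2) per-vertex texture coordinates
    verts : (n, 3) vertex positions
    faces : (m, 3) vertex indices per triangle
    """
    for f in faces:
        bary = barycentric(uv, *uvs[f])
        if np.all(bary >= -1e-9):      # uv lies inside this UV triangle
            return bary @ verts[f]     # barycentric interpolation in 3D
    return None                        # landmark falls outside the UV atlas
```

In practice a spatial index over the UV triangles would replace the linear scan, but the interpolation step is unchanged.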
We automatically learn how to orientate each of the detected trees to frontal pose, based on their 3D structure. To do this, we apply Generalized Procrustes Analysis (GPA) to each collection of 3D tree components and find the nearest-to-mean tree shape in a scale-normalized setting. We do not have any clear semantic meaning of the landmarks in the nearest-to-mean tree and therefore we do not know their relative target positions when normalising to a frontal pose. Therefore, we apply a 3D face landmarker (Creusot et al. 2013) to the 3D data of the nearest-to-mean tree shape, which generates a set of 14 near-symmetric landmarks, each with clear semantic meaning. This landmark set is easily frontal-pose normalized. Here, we find the alignment that moves the symmetry plane of these 14 landmarks to the Y–Z plane and positions the nasion directly above the subnasale to normalize the tilt angle. To complete the training phase, the mean 3D tree points for each of the 13 trees are then carried into this frontal pose using the same rotation, and are used as reference points for the frontal pose normalisation of the 3D trees.
In around 1% of the dataset, only one tree is detected and that is used for pose normalisation; in the rest, two or three trees are detected. In the cases where three trees are detected, the lowest-scoring tree is always a false positive and can be discarded. For the remaining two trees, a weighted combination of the two rotations is computed using quaternions, where the weighting is based on the mean Euclidean error to the mean tree, in the appropriate tree component.
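A weighted combination of two rotations can be computed by spherical linear interpolation (slerp) of the corresponding unit quaternions. The sketch below is one plausible formulation; the inverse-error weighting in `blend_rotations` is our assumption for illustration, as the exact weighting scheme is not spelled out here:

```python
import numpy as np

def quat_slerp(q0, q1, t):
    """Spherical linear interpolation between unit quaternions q0, q1."""
    q0, q1 = q0 / np.linalg.norm(q0), q1 / np.linalg.norm(q1)
    d = np.dot(q0, q1)
    if d < 0.0:                 # take the shorter arc on the quaternion sphere
        q1, d = -q1, -d
    if d > 0.9995:              # nearly parallel: fall back to normalized lerp
        q = q0 + t * (q1 - q0)
        return q / np.linalg.norm(q)
    theta = np.arccos(d)
    return (np.sin((1 - t) * theta) * q0 + np.sin(t * theta) * q1) / np.sin(theta)

def blend_rotations(q0, e0, q1, e1):
    """Blend two tree rotations, weighting each inversely by its mean
    Euclidean error to the mean tree (e0, e1); returns a unit quaternion."""
    w1 = e0 / (e0 + e1)         # larger error on tree 0 -> more weight on tree 1
    return quat_slerp(q0, q1, w1)
```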
After we have rotated the 3D image to a frontal view and generated a synthetic frontal 2D image, we wish to generate a set of landmarks that are accurate and correspond to the set marked up on the template. This is the set related to the central tree (\(0^{\circ }\) yaw) in the mixture, and we subsample 17 of these 68 landmarks around the eyes (indices 28, 37, 39, 40, 42, 43, 44, 46, 47), nose base (indices 31, 32, 34, 36) and mouth (indices 49, 52, 55, 58). After these 2D facial landmarks are extracted, they are again projected onto the 3D mesh.
6 Correspondence Establishment
We employ template morphing as a means of establishing correspondence across our 3DMM training dataset. However, very low error non-rigid shape morphing over a diverse set of target shapes is still a challenging problem. The true underlying shape transformation of the template to the data is very different for different head shapes and we require a technique that permits an accurate mapping between target points (corresponding landmarks), while regulating the deformation of the remaining mesh. The use of Gaussian Process models in morphing is a leading recent approach (Gerig et al. 2017), whereas the use of the Laplace–Beltrami operator in As Rigid As Possible shape regulation (Sorkine and Alexa 2007) is a leading traditional approach. These were natural choices for us to evaluate within our pipeline—in particular they are employed to adapt, or personalize, the template to each individual subject, before dense morphing proceeds.
Now we present a new fully-automatic non-rigid 3D shape registration pipeline by integrating several powerful ideas from computer vision and graphics. These include Iterative Closest Points (ICP) (Besl and McKay 1992), Coherent Point Drift (CPD) (Myronenko and Song 2010), and mesh editing using the Laplace–Beltrami (LB) operator (Sorkine and Alexa 2007). As mentioned, we also provide comparisons of the latter approach with the use of Gaussian Process Morphable Models (GPMMs) (Gerig et al. 2017).
Figure 9 is a qualitative illustration of a typical result where our method achieves a more accurate correspondence than standard CPD. Note that the landmarks in our method lie almost exactly at the same positions as their corresponding ground-truth points on the 3D scan. Even though standard CPD-affine is aided by Laplace–Beltrami Regularized Projection (LBRP, a component of our new pipeline), the result shows a squeezed face around the eye and mouth regions and the landmarks are far away from their corresponding ground-truth positions.
6.1 Template Adaptation
As shown in Fig. 8, template adaptation consists of two sub-stages: i. global alignment followed by ii. dynamically adapting the template shape to the data. For global alignment, we manually select the same landmarks on the template as we automatically extract on the data, i.e. using the 17 landmarks sampled from the zero yaw angle tree component from Zhu and Ramanan (2012), augmented with three additional landmarks per ear. Note that this needs to be done only once and so does not impact the autonomy of the online operation of the framework. Then we align rigidly (without scaling) from the 3D landmarks on the 3D data to the same landmarks on the template. The rigid transformation matrix is used for the data alignment to the template.
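The landmark-based rigid alignment (rotation and translation, no scaling) is the classic orthogonal Procrustes problem, solvable in closed form via the Kabsch algorithm. A minimal sketch:

```python
import numpy as np

def rigid_align(src, dst):
    """Least-squares rigid transform (rotation R, translation t, no scale)
    mapping src landmarks onto dst landmarks: dst ~ src @ R.T + t."""
    mu_s, mu_d = src.mean(0), dst.mean(0)
    H = (src - mu_s).T @ (dst - mu_d)           # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))      # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_d - R @ mu_s
    return R, t
```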
6.1.1 Laplace–Beltrami Mesh Manipulation
We decompose the template into several facial parts: eyes, nose, mouth, left ear and right ear. We rigidly align landmarks on each part separately to their corresponding landmarks on 3D data. These rigid transformation matrices are used for aligning the decomposed parts to 3D data. The rigidly transformed facial parts tell the original template where it should be. We treat this as a mesh manipulation problem. We use Laplace–Beltrami mesh editing to manipulate the original template towards the rigidly transformed facial parts, as follows: i. the facial parts (fp) of the original template are manipulated towards their target positions—these are rigidly transformed facial parts; ii. all other parts of the original template are moved As Rigidly As Possible (ARAP) (Sorkine and Alexa 2007).
The parameter \(\lambda \) weights the relative influence of the position and regularisation constraints, effectively determining the ‘stiffness’ of the mesh manipulation. As \(\lambda \rightarrow 0\), the facial parts of the original template are manipulated exactly to the rigidly transformed facial parts, but the template mesh shape is not retained. As \(\lambda \rightarrow \infty \), the adapted template will retain the shape of the original template, \(\mathbf {X}_0\), but will not be well-adapted to the subject’s face and head shape. A suitable \(\lambda \) value is chosen to give a good trade off.
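The trade-off controlled by \(\lambda \) can be illustrated on a toy 1D chain, where a second-difference matrix stands in for the cotangent Laplace–Beltrami operator and the ARAP term is omitted: the edit minimizes a position term on the handle vertices plus \(\lambda \) times a term that preserves the original differential coordinates. This is only a sketch of the general least-squares structure, not our exact energy:

```python
import numpy as np

def lb_edit(x0, L, idx, targets, lam):
    """Soft-constrained Laplacian edit of one coordinate channel.

    Minimizes ||x[idx] - targets||^2 + lam * ||L (x - x0)||^2.
    lam -> 0 snaps the handle vertices exactly onto their targets;
    lam -> inf keeps the original shape x0 and ignores the handles.
    """
    n = len(x0)
    S = np.zeros((len(idx), n))
    S[np.arange(len(idx)), idx] = 1.0            # selector for handle vertices
    A = S.T @ S + lam * (L.T @ L)                # normal equations
    b = S.T @ targets + lam * (L.T @ (L @ x0))
    return np.linalg.solve(A, b)
```

Dragging the two endpoints of a straight chain apart, for example, yields a smoothly stretched chain, because any linear configuration has zero second differences.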
6.1.2 Template Adaptation Using Gaussian Process Posterior Models
Gaussian Process Morphable Models (GPMMs) allow a more general formulation of 3D shape deformation than PCA-based statistical shape models (Lüthi et al. 2017). Firstly, they operate on a continuous rather than discrete domain (which can of course be sampled, as required). Secondly, they allow a wider range of covariance formulations, such as those that do not require training data. Thus, as well as being usable to train statistical shape models, GPMMs can be exploited as an alternative to the ARAP constraint for template adaptation, and we do so here.
Our aim is to employ the posterior Gaussian Process formulation to solve a regression problem. We present and later evaluate this approach to give a comparison with the ARAP approach. As before, the aim is to infer the full shape from a set of landmark positions on the shape. Given partial observations, such GPMMs are able to determine the potential full shape. They show the probable range of motion of all the vertices in the shape, when the landmarks are fixed to their target positions.
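The posterior regression has the standard Gaussian process form: condition a zero-mean prior on the landmark observations and read off the posterior mean at the remaining vertices. Below is a scalar-channel sketch with a squared-exponential kernel; the kernels used in actual GPMM pipelines differ, so this is illustrative only:

```python
import numpy as np

def rbf(a, b, s2=1.0, ell=1.0):
    """Squared-exponential covariance between point sets a (n,d) and b (m,d)."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return s2 * np.exp(-0.5 * d2 / ell ** 2)

def gp_posterior_mean(X_obs, y_obs, X_query, noise=1e-6):
    """Posterior mean of a zero-mean GP given noisy observations.

    For template adaptation, y_obs would be one channel of the landmark
    displacements and X_query the remaining template vertices."""
    K = rbf(X_obs, X_obs) + noise * np.eye(len(X_obs))
    k_star = rbf(X_query, X_obs)
    return k_star @ np.linalg.solve(K, y_obs)
```

With a small noise term, the posterior mean interpolates the landmark displacements and decays smoothly away from them, which is the behaviour exploited for shape inference from landmarks.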
6.2 Iterative Coherent Point Drift
After template alignment and shape adaptation, the task is to further deform and align the template to the target 3D data scan. Here, we employ a new shape morphing algorithm that integrates ideas from Iterative Closest Points (ICP) (Besl and McKay 1992) and Coherent Point Drift (CPD) (Myronenko and Song 2010). ICP in itself is only a rigid alignment scheme and, although CPD offers non-rigid morphing, we have found that it is often unstable when the template and data are highly imbalanced in the number of points; in particular, our data has significantly more points than our template. To counter this, we use the template to iteratively sample the data via nearest neighbours.
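The rebalancing idea, sampling the dense data down to the template's resolution before each CPD update, can be sketched as follows (`cpd_step` in the commented loop is a stand-in for one CPD EM iteration, which we do not reproduce here):

```python
import numpy as np
from scipy.spatial import cKDTree

def sample_data_like_template(template, data):
    """One ICPD-style sampling step: for each template vertex, keep its
    nearest data point, so the (much denser) data is rebalanced to the
    template's resolution before a CPD iteration is run on the pair."""
    idx = cKDTree(data).query(template)[1]
    return data[np.unique(idx)]

# Outer loop sketch (cpd_step is hypothetical):
# for it in range(n_iters):
#     sampled = sample_data_like_template(template, data)
#     template = cpd_step(template, sampled)
```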
The qualitative output of ICPD is very smooth, a feature inherited from standard CPD. A subsequent regularized point projection process is required to capture the target shape detail, and this is described next.
6.3 Laplace–Beltrami Regularized Projection
When ICPD has deformed the template close to the data, point projection is required to eliminate any shape distance error in a direction normal to the data’s surface. Such a point projection process is potentially fragile. If the data is incomplete or noisy, then projecting vertices from the deformed template to their nearest vertex or surface position on the data may cause large artefacts. Again, we overcome this by treating the projection operation as a mesh editing problem with three ingredients. First, position constraints are provided by those vertices with mutual nearest neighbours between the deformed template and data. Using mutual nearest neighbours reduces sensitivity to missing data. Second, local position constraints are provided by the automatically detected landmarks on the data. Third, regularisation constraints are provided by the LB operator which acts to retain the local structure of the mesh. We call this process Laplace–Beltrami Regularized Projection (LBRP), as shown in the registration framework in Fig. 8.
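The mutual-nearest-neighbour position constraints can be computed with two k-d tree queries; template vertices that fail the mutuality test (e.g. over holes in the scan) receive no position constraint and are left to the Laplace–Beltrami regularizer. A sketch:

```python
import numpy as np
from scipy.spatial import cKDTree

def mutual_nearest_pairs(template, data):
    """Index pairs (i, j) such that data[j] is the nearest data point to
    template[i] AND template[i] is the nearest template point to data[j].
    Only these pairs supply position constraints in an LBRP-style step."""
    t2d = cKDTree(data).query(template)[1]       # template -> data
    d2t = cKDTree(template).query(data)[1]       # data -> template
    i = np.arange(len(template))
    keep = d2t[t2d] == i                         # mutuality test
    return i[keep], t2d[keep]
```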
7 Alignment and Statistical Modeling
We use Generalized Procrustes Analysis (GPA) to align our deformed templates before applying statistical modelling using PCA. This generates a 3DMM as a linear basis of shapes, allowing for the generation of novel shape instances. We may use all of the full head template vertices for this modelling, or any subset. For example, later we select the cranial vertices when we build a specialized 3DMM to analyse a cranial medical condition.
In many applications, vertex resolution is not uniform across the mesh. For example, we may use more vertices to express detail around facial features of high curvature. However, standard PCA attributes the same weight to all points in its covariance analysis, biasing the capture of shape variance to those regions of high resolution. Whether or not this is desirable is application dependent. Here, to normalize against the effect of varying surface-sampling resolution, we employ Weighted PCA (WPCA) in our statistical modelling.
7.1 3DMM Training and Fitting Using Weighted PCA
Standard PCA performs an eigendecomposition of the covariance matrix associated with the set of training examples, \({\Sigma } = \mathbf {X_{\text {D}}}^\text {T} \mathbf {X}_\text {D}\). In our case, we have a small number of training data observations (N) compared with the number of features, or dimensions (3p), hence we would need to apply SVD to \(\mathbf {X}_{\text {D}}\), as \({\Sigma }\) is not full rank. However, a more efficient alternative is to employ snapshot PCA, which computes the eigenvectors of the Gram matrix \(\mathbf {G} =\mathbf {X_{\text {D}} X_{\text {D} }}^\text {T} \). This is significantly smaller than the covariance matrix and the desired covariance eigenvectors can be computed by premultiplying the Gram eigenvectors by \(\mathbf {X}_{\text {D}}^\text {T}\). That is, the desired principal components are linear combinations of the original zero-mean data using weights that are the eigenvectors of the Gram matrix.
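The snapshot trick can be written in a few lines of linear algebra, with the Gram eigenvectors lifted back to data space by premultiplying with \(\mathbf {X}_{\text {D}}^\text {T}\) and renormalizing (rows of the data matrix are zero-mean training shapes):

```python
import numpy as np

def snapshot_pca(X):
    """Principal components of zero-mean data X (N samples x 3p dims, N << 3p)
    via the N x N Gram matrix instead of the 3p x 3p covariance."""
    G = X @ X.T                                   # N x N Gram matrix
    evals, W = np.linalg.eigh(G)
    order = np.argsort(evals)[::-1][:np.linalg.matrix_rank(G)]
    evals, W = evals[order], W[:, order]
    V = X.T @ W / np.sqrt(evals)                  # lift to data space; note that
    return evals, V                               # ||X.T w||^2 = w.T G w = eval
```

The nonzero eigenvalues of \(\mathbf {G}\) equal those of \({\Sigma }\), and the lifted, renormalized columns of `V` are the unit-norm covariance eigenvectors, matching a direct SVD of the data matrix up to sign.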
7.2 Flexibility Modes of Partial Data Reconstruction
In the cases of incomplete scan data, the morphed template will also be incomplete. This is straightforward to detect in our morphing algorithm, as those vertices will not have a mutual nearest neighbour with the scan that it is warped to. Despite this, the 3DMM can both infer the missing parts and estimate the variation in shape of these missing parts when the variation in the partial shape is minimized.
Given partial data, we can divide the shape components into two parts: one for the missing data, \(\mathbf {x}_{\text {a}}\), and the other for (partial) present data, \(\mathbf {x}_{\text {b}}\) . Without loss of generality, this is achieved by permutation of blocks of 3 variables (one block per mesh vertex) such that \(\mathbf {x} = ( \mathbf {x}^\text {T}_{\text {a}}, \mathbf {x}^\text {T}_{\text {b}})^\text {T} \) and \(\mathbf {V}_\text {a,b}\) are the associated partitioned eigenvectors.
Fixing the present part exactly would leave no remaining flexibility in the model. To circumvent this problem, Albrecht et al. (2008) allow a small amount of variance in the present part in order to explore the remaining flexibility in the missing part. This can be formulated as a generalized eigenvalue problem, the solution of which yields a set of generalized eigenvectors that describe the variation in the overall shape; these are called flexibility modes.
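In coefficient space, the construction reduces to a generalized symmetric eigenproblem. The sketch below uses one plausible formulation, maximizing motion of the missing part per unit motion of the present part; the scaling and regularisation details in Albrecht et al. (2008) differ, so treat the function as illustrative:

```python
import numpy as np
from scipy.linalg import eigh

def flexibility_modes(Qa, Qb, eps=1e-6, k=3):
    """Directions in model-coefficient space that move the missing part
    (rows Qa of the scaled basis) as much as possible while barely moving
    the present part (rows Qb): solve the generalized eigenproblem
        Qa^T Qa c = mu (Qb^T Qb + eps I) c
    and keep the k generalized eigenvectors with the largest mu."""
    A = Qa.T @ Qa
    B = Qb.T @ Qb + eps * np.eye(Qb.shape[1])    # eps keeps B positive definite
    mu, C = eigh(A, B)                           # ascending eigenvalues
    return C[:, ::-1][:, :k]                     # columns sorted by decreasing mu
```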
In the following, we explore flexibility modes associated with our global 3DMM. We choose scale factors (\(\pm 2.2\sqrt{l}\)) to illustrate the flexibility modes of: i. a missing face, ii. a missing cranium, and iii. a missing half head, at one side of the sagittal symmetry plane. In the first row of Fig. 11, we largely fix the shape of the cranium and reconstruct the full head from that shape, while permitting the shape of the face to vary. Showing the remaining flexibility when one shape part is highly constrained provides more insight into the statistical properties of the shape. Here we found that most variation occurs over the chin region, which may have a wide range of forms (shapes and sizes) for a given cranium. Perhaps this is unsurprising, as the jaw is distant from the cranium and is a separate bone. However, to our knowledge, this is the first time that flexible reconstruction has been performed using a 3DMM of the head.
8 High Resolution Texture Mapping
It is preferable to store texture information in a UV space texture map, where resolution is unconstrained, rather than store only per-vertex colors, where resolution is limited by mesh resolution. To do so requires the texture information from each data scan to be transformed into a standard UV texture space for which the embedding of the template is known. As we use the FaceWarehouse mesh, we also use their vertex embedding into UV space (Cao et al. 2014). The key to obtaining a high quality texture map is embedding all the pixels in one triangular mesh face from the texture image to its corresponding mesh face in the UV template (see Fig. 12 (1)). Compared to a per-vertex color-texture map, a pixel embedding texture map employs all the pixels in each template mesh face, thus capturing more texture detail.
Minimization of face area does not guarantee that all UV coordinate combinations belong to the same viewpoint (see Fig. 12 (2)). To overcome this, a second stage that employs affine transformations is used to refine the UV coordinates. If the UV coordinates in one mesh face are placed in different views, we compute the affine transformation \(\mathbf {T}\) from its adjacent mesh face (the one with a common edge in the same viewpoint) to the corresponding face in the template UV faces. Then this corresponding face is transformed by \(\mathbf {T}^{-1}\) to find the vertex position in the common viewpoint (see blue point in Fig. 12 (1)). The outcome of affine transformation refinement is shown in Fig. 12 (3). As shown in Fig. 13, the quality of the texture map improves compared to the per-vertex approach, such that the freckles can be seen in the rendering.
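Since each UV triangle and its counterpart in the template atlas are related by a 2D affine map, \(\mathbf {T}\) can be recovered exactly from the three vertex correspondences and then inverted as described. A minimal sketch:

```python
import numpy as np

def affine_from_triangles(src, dst):
    """2D affine transform T (3x3 homogeneous) mapping triangle src onto dst,
    solved exactly from the three vertex correspondences (src, dst: 3x2)."""
    S = np.column_stack([src, np.ones(3)])      # homogeneous source vertices
    D = np.column_stack([dst, np.ones(3)])
    return np.linalg.solve(S, D).T              # T @ [x, y, 1]^T = [x', y', 1]^T

def apply_affine(T, pts):
    """Apply homogeneous 2D affine T to an (n, 2) array of points."""
    P = np.column_stack([pts, np.ones(len(pts))])
    return (P @ T.T)[:, :2]
```

Applying `np.linalg.inv(T)` to a point, as in the refinement step, carries it back into the common viewpoint.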
9 Evaluation of Correspondences
9.1 Ablation Study
In order to validate the effectiveness of each key step in the proposed registration pipeline, we first remove the process of template adaptation from the pipeline and evaluate performance. We then restore this step and instead remove LB Regularized Projection from the proposed framework, again evaluating performance qualitatively and quantitatively, and comparing both modified pipelines with the full pipeline. Typical results for a child in the Headspace dataset are shown in Fig. 14. After pure rigid alignment without template adaptation, the nose of the template is still bigger than that of the target. As can be seen in Fig. 14 (3), the nose and ear areas both have a poor deformation result. Without LB Regularized Projection, shown in Fig. 14 (4), the pipeline fails to capture the shape detail recovered by the proposed full pipeline. The adaptive template improves the correspondence accuracy in local regions, while the LB Regularized Projection helps in decreasing the correspondence error in the surface normal direction.
9.2 Comparison with Previous Work
This section compares the proposed method with our previous work. Figure 16 compares: (i) the proposed ICPD with adaptive template, (ii) hierarchical parts-based CPD-LB (Dai et al. 2017b), and (iii) symmetry-aware CPD (Dai et al. 2018a).
9.2.1 Quantitative Evaluation
9.2.2 Qualitative Evaluation
Figure 17 illustrates that the eye and mouth regions can exhibit slight over-fitting during morphing when using either hierarchical parts-based CPD-LB or symmetry-aware CPD. The third row in Fig. 17 shows that, at least for this example, ICPD with adaptive template gives a better morphing in the ear region, where outliers exist in the scan.
9.3 Comparison with Other Literature
9.3.1 Quantitative Evaluation
9.3.2 Qualitative Evaluation
Figure 19 shows a typical example where the proposed method is qualitatively superior to the other methods with respect to the capture of shape detail and the accuracy of the mouth region. The LSFM pipeline captures shape detail, but the mouth region is not close to the 3D scan. The OF pipeline has a smooth deformation field and thereby fails to capture shape detail; the OF approach requires a point projection stage after shape registration to reduce shape error.
10 Evaluation of 3DMMs
We select 1212 individuals (606 males, 606 females) from the Headspace dataset (Duncan et al. 2018) to build our global 3DMM using our fully-automatic 3DMM training pipeline. Note that the full dataset contains 1519 subjects, but we exclude 307 subjects on the following grounds: i. Poor fitting of the latex cap; ii. Excessive hair bulge under the latex cap; iii. Excessive noise or missing parts in the 3D scan; iv. Declared craniofacial condition/trauma; v. Gender balancing. Subpopulations of these 1212 Headspace subjects are employed to build gender-specific models, LYHM-male, LYHM-female, and four age-specific models (LYHM-age-X).
Table 1 Comparison of 3DMM construction pipelines

| | Initialisation | Dense correspondence | Alignment | Modelling |
|---|---|---|---|---|
| LSFM | Automatic facial landmarks | NICP with error pruning | GPA | PCA |
| OF | Manual landmarks needed | GPMM registration | GPA | GP |
| Proposed | Automatic pose normalisation | ICPD | GPA | WPCA |
10.1 3DMM Visualisation
We present visualisations that provide insight into how different regions of the high-dimensional space of human face/head shape and texture are naturally related to different demographic characteristics. Taking into account the available demographic metadata in the Headspace dataset, we define the following groups: male (all ages) and female (all ages). The dataset is further clustered into four age groups: under 15 years old, 15–30 years old, 31–50 years old, and over 50 years old. The mean and 7 most significant shape components of the six demographic-specific models are given in Fig. 20. Likewise, Fig. 21 shows the mean and 7 most significant texture components of the six demographic-specific models, visualized on the mean shape. The shape and texture are varied from \(+3\sigma \) to \(-3\sigma \), where \(\sigma \) is the standard deviation.
10.2 3DMM Quantitative Evaluation Metrics
For quantitative statistical shape model evaluation, Styner et al. (2003) propose three metrics: compactness, generalisation and specificity, as follows:

i. Compactness describes the number of model parameters (principal components for PCA-based models) required to express some fraction of the variance in the training set. Fewer parameters is better, meaning that the shape variation is captured more efficiently.

ii. Generalisation measures the capability of the model to represent unseen examples of the class of objects. It can be measured using a leave-one-out strategy, where one example is omitted from the training set and used for reconstruction testing. The accuracy of describing the unseen example is calculated as the mean vertex-to-vertex Euclidean distance error (lower is better for a given number of model components). With an increasing number of model parameters, the generalisation error is expected to decrease.

iii. Specificity measures the ability to generate shape instances of the class that are similar to those in the training set. To assess specificity, a set of shape instances is randomly sampled from the shape space. The Euclidean distance error to the closest training shape is then calculated for each sampled instance, and the average is taken over all instances. This mean error is expected to increase with an increasing number of parameters, as the additional PCA coefficients give more flexibility to shape reconstruction and increase the likelihood of sampled instances lying away from the real data. For specificity, the lower the Euclidean distance error, the closer the model is to the training data, and so the better the specificity.
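The three metrics can be computed directly on top of a PCA shape model. The sketch below is illustrative (our own function names; for brevity, the per-vertex averaging described above is replaced by a whole-vector Euclidean norm):

```python
import numpy as np

def pca(X):
    """PCA of a data matrix X (n_samples, dim): mean, components (rows), eigenvalues."""
    mu = X.mean(axis=0)
    _, s, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt, s ** 2 / (len(X) - 1)

def compactness(evals, k):
    """Fraction of total training-set variance captured by the first k components."""
    return evals[:k].sum() / evals.sum()

def generalisation(X, k):
    """Mean leave-one-out reconstruction error with k components (lower is better)."""
    errs = []
    for i in range(len(X)):
        mu, Vt, _ = pca(np.delete(X, i, axis=0))   # train without sample i
        c = Vt[:k] @ (X[i] - mu)                   # project the held-out sample
        errs.append(np.linalg.norm(X[i] - (mu + Vt[:k].T @ c)))
    return float(np.mean(errs))

def specificity(X, mu, Vt, evals, k, n_samples=100, seed=0):
    """Mean distance from random model samples to their nearest training shape."""
    rng = np.random.default_rng(seed)
    errs = []
    for _ in range(n_samples):
        c = rng.standard_normal(k) * np.sqrt(evals[:k])  # sample coefficients
        sample = mu + Vt[:k].T @ c
        errs.append(np.linalg.norm(X - sample, axis=1).min())
    return float(np.mean(errs))
```

Each metric is then swept over the number of components k to produce curves such as those in Figs. 23 onwards.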
10.3 Evaluation of Full Head 3DMMs Using 3DMM Training Pipelines in the Literature
We build full head 3DMMs using the proposed method, the LSFM pipeline (Booth et al. 2016), and the OF pipeline (Gerig et al. 2017), again with 1212 subjects from the Headspace dataset. As can be seen from Fig. 23-a, when fewer than 33 components are used, LSFM is more compact than the proposed method and OF. Between 33 and 79 components, the model constructed by OF is more compact than the other two. When more than 79 components are used, the proposed method has better compactness than LSFM (Booth et al. 2016) and OF (Gerig et al. 2017). With the first 56 and the first 146 components, the 3DMM constructed by the proposed method retains 95% and 98% of the shape variation in the training set, respectively.
The proposed method has the lowest specificity error, which implies that the proposed method is best at generating instances close to real data.
Overall, when more than 79 components are used, the proposed pipeline is better than LSFM (Booth et al. 2016) and OF (Gerig et al. 2017) in terms of compactness. The generalisation error of LSFM decreases faster than that of the proposed method. However, with more components used, the proposed method has the lowest generalisation error of the three pipelines. The proposed method also outperforms LSFM and OF in specificity.
10.4 Number of Model Components
10.5 Quantitative Evaluation of Submodels
Table 2 Texture map image quality assessment using three metrics

| | SSIM | MS-SSIM | IW-SSIM |
|---|---|---|---|
| Per-vertex color | 0.8790 | 0.8618 | 0.6238 |
| Texture mapping | 0.8926 | 0.8712 | 0.6505 |
10.6 Texture Map Image Quality Assessment
As shown in Fig. 13, the proposed texture map technique qualitatively outperforms the per-vertex texture image. We use several performance metrics, namely SSIM (Wang et al. 2004), MS-SSIM (Wang et al. 2003) and IW-SSIM (Wang and Li 2011), to measure texture map quality quantitatively. Under the assumption that human visual perception is highly adapted for extracting structural information from a scene, Structural SIMilarity (SSIM) is based on the degradation of structural information (higher is better). We render the captured image, the proposed texture map and the per-vertex texture image at the same image size. When using these full-reference image quality assessment indices, we treat the rendering of the captured image as the reference image. The renderings of the proposed texture map and of the per-vertex texture image are each compared with this reference image. As can be seen in Table 2, the proposed texture mapping technique improves texture image quality for texture modelling when compared with the per-vertex color texture image.
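As a rough illustration of the SSIM family, the sketch below computes a simplified single-window SSIM from global image statistics. The published metric (Wang et al. 2004) aggregates the same luminance, contrast and structure statistics over local sliding windows, and MS-SSIM and IW-SSIM extend it across scales and with information-content weighting; the function here is our own reduced variant, not the evaluation code used in the paper.

```python
import numpy as np

def ssim_global(x, y, L=255.0):
    """Simplified single-window SSIM between two same-size grayscale images.

    L is the dynamic range of the pixel values; higher scores are better,
    with identical images scoring exactly 1.
    """
    C1, C2 = (0.01 * L) ** 2, (0.03 * L) ** 2   # stabilising constants
    x, y = x.astype(float), y.astype(float)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + C1) * (2 * cov + C2)) / \
           ((mx ** 2 + my ** 2 + C1) * (vx + vy + C2))
```

In the full-reference setting above, the rendering of the captured image plays the role of `x`, and a rendered texture map (per-vertex or pixel-embedded) plays the role of `y`.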
11 Applications
We now demonstrate two applications of our 3DMMs: age regression in Sect. 11.1 and clinical intervention outcome evaluation in Sect. 11.2.
11.1 Age Regression
11.2 Clinical Intervention Outcome Evaluation
We use our modeling to describe post-surgical cranial change in a sample of 17 craniosynostosis patients (children), 10 of whom underwent one type of cranial corrective procedure, Barrel Staving (BS), and the remaining 7 another such procedure, Total Calvarial Reconstruction (TCR).
We build a scale-normalized cranial model with the face removed, to focus on cranial shape variation only. The model is constructed using Headspace subjects under 15 years of age, and we note that major cranial shape changes are not thought to occur after 2 years. Thus the model is applicable to all but very young children. Note that we are merely illustrating how our 3DMMs can evaluate surgical procedures. In this case study, the relatively small number of patients and the young age of some of them make concrete inferences about the relative quality of the procedures unsafe.
We plot the preoperative and postoperative cranial model parameters of the patients. The expected result is that the parameterisations of the head shapes move nearer to the mean of the training examples. Figure 30 shows the full head meshes of the patients after registration to the 3D scans, for both preoperative and postoperative shapes. The results are shown in Figs. 31 and 32: the parameterisations do indeed move nearer to the mean, which lies at the origin of the plots. To our knowledge, this is the first use of full head 3DMMs in a craniofacial clinical study.
Figure 33 demonstrates a case study of cranial shape change for a specific patient. Clinicians are also interested in the influence of operation type on facial shape, so here we use the shape of the full head, both face and cranium, in the analysis. We can clearly observe the improvement after the operation when viewing the 3D shape, and this is validated by the shape analysis: the preoperative shape parameters lie outside the 2\(\sigma \) (2 standard deviation) ellipse of the training set, while the postoperative shape parameters lie within it.
For a quantitative evaluation, we calculate the Mahalanobis distance of each patient. As can be seen in Figs. 31-right and 32-right, the mean Mahalanobis distance over all patients decreases from 3.21 to 1.18 standard deviations for the BS operation, and from 3.52 to 2.23 standard deviations for the TCR operation. The improvement is therefore 63.24% for the BS intervention and 36.65% for the TCR intervention. In this case study, the BS intervention has better clinical outcomes than TCR. However, we note that the BS intervention is more appropriate for very young children (under 1 year old), while TCR is more appropriate for children older than a year, which is likely to influence the results. Furthermore, as mentioned earlier, the patient population sample is too small to be conclusive, and our 3DMM is more appropriate for the TCR group of patients than for the younger BS group.
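In a PCA model, the Mahalanobis distance of a head shape from the model mean reduces to a variance-weighted norm of its model parameters, which is why it is naturally reported in standard deviations. A minimal sketch (our own notation, not the paper's code):

```python
import numpy as np

def mahalanobis_to_mean(shape, mean, components, evals, k):
    """Mahalanobis distance (in standard deviations) of a shape from the model
    mean, measured in the space of the first k principal components.

    components: (k_max, dim) matrix whose rows are principal components
    evals:      (k_max,) per-component variances (PCA eigenvalues)
    """
    b = components[:k] @ (shape - mean)          # model parameters of this shape
    return float(np.sqrt(np.sum(b ** 2 / evals[:k])))
```

Scoring each patient's registered scan before and after surgery with such a function yields the kind of decrease reported above (e.g. from 3.21 to 1.18 standard deviations for the BS group), indicating movement towards the population norm.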
12 Conclusion
We released the first publicly-available full head dataset with age, gender and ethnicity metadata for academic research. We proposed a fully-automatic 3DMM training pipeline and used it to build the first shape-and-texture 3DMM of the full head. The correspondence framework avoids both over-fitting and under-fitting in template morphing: the adaptive template improves correspondence accuracy in local regions, while the LB Regularized Projection decreases the correspondence error in the direction normal to the shape surface. The correspondence accuracy is state-of-the-art among publicly-available pipelines. The texture mapping technique captures high quality texture for texture modelling. The proposed 3DMMs are powerful in reconstructing incomplete data and in model regression to observe the influence of age on craniofacial growth; the flexibility of reconstruction from incomplete craniofacial data is useful in many computer vision applications. We present the first use of statistical 3D craniofacial shape models in a clinical study.
Acknowledgements
We thank Google Faculty Awards and our Google sponsor, Forrester Cole, for supporting this research in 2017–2018. We thank the Royal Academy of Engineering and the Leverhulme Trust for priming this work in 2013–2014, via their Senior Research Fellowship awards. Headspace data collection was supported by QIDIS from the National Commissioning Group. We thank Rachel Armstrong, Headspace data collection coordinator.
References
- Albrecht, T., Knothe, R., & Vetter, T. (2008). Modeling the remaining flexibility of partially fixed statistical shape models. In 2nd MICCAI workshop on mathematical foundations of computational anatomy (pp. 160–169).
- Amberg, B., Romdhani, S., & Vetter, T. (2007). Optimal step nonrigid ICP algorithms for surface registration. In IEEE conference on computer vision and pattern recognition (pp. 1–7).
- An, Z., Deng, W., Yuan, T., & Hu, J. (2018). Deep transfer network with 3D morphable models for face recognition. In 2018 13th IEEE international conference on automatic face gesture recognition (pp. 416–422).
- Basso, C., Verri, A., & Herder, J. (2007). Fitting 3D morphable models using implicit representations. Journal of Virtual Reality and Broadcasting, 4(18), 1–10.
- Beeler, T., & Bradley, D. (2014). Rigid stabilization of facial expressions. ACM Transactions on Graphics (TOG), 33(4), 44.
- Besl, P. J., & McKay, N. D. (1992). Method for registration of 3-D shapes. In Sensor fusion IV: Control paradigms and data structures (Vol. 1611, pp. 586–607). International Society for Optics and Photonics.
- Blanz, V., & Vetter, T. (1999). A morphable model for the synthesis of 3D faces. In Proceedings of the 26th annual conference on computer graphics and interactive techniques (pp. 187–194).
- Blanz, V., & Vetter, T. (2003). Face recognition based on fitting a 3D morphable model. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(9), 1063–1074.
- Bolkart, T., & Wuhrer, S. (2013). Statistical analysis of 3D faces in motion. In 2013 international conference on 3D vision (3DV) (pp. 103–110). IEEE.
- Bolkart, T., & Wuhrer, S. (2015). A groupwise multilinear correspondence optimization for 3D faces. In Proceedings of the IEEE international conference on computer vision (pp. 3604–3612).
- Bookstein, F. L. (1989). Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(6), 567–585.
- Booth, J., Roussos, A., Ponniah, A., Dunaway, D., & Zafeiriou, S. (2018). Large scale 3D morphable models. International Journal of Computer Vision, 126(2–4), 233–254.
- Booth, J., Roussos, A., Zafeiriou, S., Ponniah, A., & Dunaway, D. (2016). A 3D morphable model learnt from 10,000 faces. In Proceedings of CVPR (pp. 5543–5552).
- Brunton, A., Lang, J., Dubois, E., & Shu, C. (2011). Wavelet model-based stereo for fast, robust face reconstruction. In 2011 Canadian conference on computer and robot vision (CRV) (pp. 347–354).
- Cao, C., Weng, Y., Zhou, S., Tong, Y., & Zhou, K. (2014). Facewarehouse: A 3D facial expression database for visual computing. IEEE Transactions on Visualization and Computer Graphics, 20(3), 413–425.
- Chen, Y., & Medioni, G. (1992). Object modelling by registration of multiple range images. Image and Vision Computing, 10(3), 145–155.
- Cootes, T. F., Edwards, G. J., & Taylor, C. J. (2001). Active appearance models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(6), 681–685.
- Cootes, T. F., & Taylor, C. J. (1995). Combining point distribution models with shape models based on finite element analysis. Image and Vision Computing, 13(5), 403–409.
- Cootes, T. F., Taylor, C. J., Cooper, D. H., & Graham, J. (1995). Active shape models-their training and application. Computer Vision and Image Understanding, 61(1), 38–59.
- Creusot, C., Pears, N. E., & Austin, J. (2013). A machine-learning approach to keypoint detection and landmarking on 3D meshes. International Journal of Computer Vision, 102(1), 146–179.
- Dai, H., Pears, N., & Duncan, C. (2017a). A 2D morphable model of craniofacial profile and its application to craniosynostosis. In Medical image understanding and analysis, communications in computer and information science (Vol. 723).
- Dai, H., Pears, N., Smith, W., & Duncan, C. (2017b). A 3D morphable model of craniofacial shape and texture variation. In 2017 IEEE international conference on computer vision (ICCV) (pp. 3104–3112). IEEE.
- Dai, H., Pears, N., Smith, W., & Duncan, C. (2018a). Symmetric shape morphing for 3D face and head modelling. In 2018 13th IEEE international conference on automatic face and gesture recognition (FG 2018) (pp. 91–97). IEEE.
- Dai, H., Pears, N., Smith, W., & Duncan, C. (2018b). Symmetric shape morphing for 3D face and head modelling. In 2018 13th IEEE international conference on automatic face gesture recognition (FG 2018) (pp. 91–97).
- De Smet, M., & Van Gool, L. (2010). Optimal regions for linear model-based 3D face reconstruction. In Asian conference on computer vision (pp. 276–289).
- Dryden, I. L., & Mardia, K. V. (1998). Statistical shape analysis. Chichester: John Wiley and Sons.
- Duncan, C., Armstrong, R., Pears, N. E., Dai, H., & Smith, W. (2018). The headspace dataset. https://www-users.cs.york.ac.uk/~nep/research/Headspace/. Accessed 5 Nov 2019.
- Garrido, P., Zollhöfer, M., Casas, D., Valgaerts, L., Varanasi, K., Pérez, P., et al. (2016). Reconstruction of personalized 3D face rigs from monocular video. ACM Transactions on Graphics, 35(3), 28:1–28:15.
- Gerig, T., Forster, A., Blumer, C., Egger, B., Lüthi, M., Schönborn, S., & Vetter, T. (2017). Morphable face models: An open framework. CoRR arXiv:1709.08398.
- Golovinskiy, A., Matusik, W., Pfister, H., Rusinkiewicz, S., & Funkhouser, T. (2006). A statistical model for synthesis of detailed facial geometry. ACM Transactions on Graphics (TOG), 25, 1025–1034.
- Harrison, C. R., & Robinette, K. M. (2006). Principles of fit to optimize helmet sizing. Technical report, Air Force Research Lab Wright-Patterson.
- Kendall, D. G. (1984). Shape manifolds, procrustean metrics, and complex projective spaces. Bulletin of the London Mathematical Society, 16(2), 81–121.
- Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2), 91–110.
- Lüthi, M., Gerig, T., Jud, C., & Vetter, T. (2017). Gaussian process morphable models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40, 1860–1873.
- Madsen, D., Lüthi, M., Schneider, A., & Vetter, T. (2018). Probabilistic joint face-skull modelling for facial reconstruction. In Proceedings of CVPR (pp. 5295–5303).
- Myronenko, A., & Song, X. (2010). Point set registration: Coherent point drift. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(12), 2262–2275.
- Paysan, P., Knothe, R., Amberg, B., Romdhani, S., & Vetter, T. (2009). A 3D face model for pose and illumination invariant face recognition. In Sixth IEEE international conference on advanced video and signal based surveillance (AVSS'09) (pp. 296–301).
- Petr, M., & Ivana, K. (2015). Hairstyles modeling for police identikits. In Proceedings of the 31st spring conference on computer graphics (pp. 151–158). ACM.
- Salazar, A., Wuhrer, S., Shu, C., & Prieto, F. (2014). Fully automatic expression-invariant face correspondence. Machine Vision and Applications, 25(4), 859–879.
- Saragih, J. M., Lucey, S., & Cohn, J. F. (2011). Real-time avatar animation from a single image. In IEEE international conference on automatic face and gesture recognition 2011 (pp. 213–220).
- Sorkine, O., & Alexa, M. (2007). As-rigid-as-possible surface modeling. In Proceedings of the fifth Eurographics symposium on geometry processing (pp. 109–116).
- Styner, M. A., Rajamani, K. T., Nolte, L. P., Zsemlye, G., Székely, G., Taylor, C. J., & Davies, R. H. (2003). Evaluation of 3D correspondence methods for model building. In Information processing in medical imaging (pp. 63–75).
- ter Haar, F. B., & Veltkamp, R. C. (2008). 3D face model fitting for recognition. In European conference on computer vision (pp. 652–664).
- Thompson, D. W. (1917). On growth and form. Cambridge University Press.
- Tran, L., & Liu, X. (2018). Nonlinear 3D face morphable model. arXiv preprint arXiv:1804.03786.
- Van Der Maaten, L. (2014). Accelerating t-SNE using tree-based algorithms. The Journal of Machine Learning Research, 15(1), 3221–3245.
- Vlasic, D., Brand, M., Pfister, H., & Popović, J. (2005). Face transfer with multilinear models. ACM Transactions on Graphics (TOG), 24, 426–433.
- Wang, Z., & Li, Q. (2011). Information content weighting for perceptual image quality assessment. IEEE Transactions on Image Processing, 20(5), 1185–1198.
- Wang, Z., Bovik, A. C., Sheikh, H. R., & Simoncelli, E. P. (2004). Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4), 600–612.
- Wang, Z., Simoncelli, E. P., & Bovik, A. C. (2003). Multiscale structural similarity for image quality assessment. In The thirty-seventh Asilomar conference on signals, systems and computers (Vol. 2, pp. 1398–1402). IEEE.
- Wu, Y., & Ji, Q. (2019). Facial landmark detection: A literature survey. International Journal of Computer Vision, 127(2), 115–142.
- Yang, F., Bourdev, L., Shechtman, E., Wang, J., & Metaxas, D. (2012). Facial expression editing in video using a temporally-smooth factorization. In 2012 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 861–868). IEEE.
- Yang, F., Wang, J., Shechtman, E., Bourdev, L., & Metaxas, D. (2011). Expression flow for 3D-aware face component transfer. In ACM transactions on graphics (TOG) (Vol. 30, p. 60).
- Yin, L., Chen, X., Sun, Y., Worm, T., & Reale, M. (2008). A high-resolution 3D dynamic facial expression database. In 8th IEEE international conference on automatic face and gesture recognition (FG'08) (pp. 1–6). IEEE.
- Zhou, Y., & Zaferiou, S. (2017). Deformable models of ears in-the-wild for alignment and recognition. In 2017 12th IEEE international conference on automatic face and gesture recognition (FG 2017) (pp. 626–633). IEEE.
- Zhu, X., & Ramanan, D. (2012). Face detection, pose estimation, and landmark localization in the wild. In Proceedings of CVPR (pp. 2879–2886).
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.