This section describes the entire pipeline used to build the RBSM. As outlined in Fig. 4, we start by establishing dense correspondence among our training data. Based on a novel concept called breast probability masks (Sect. 3.1), this is achieved by means of a fully automated, pairwise registration pipeline as proposed in Sect. 3.2. Finally, we follow the typical workflow used to build a point-based statistical shape model by applying Generalized Procrustes Analysis and Principal Component Analysis to the registered data set (briefly summarized in Sect. 3.3).
In what follows, 3D breast scans are represented using triangular surface meshes. A triangle mesh \({\mathcal {M}}=(V,E,{\mathcal {P}})\) is fully specified by a set of n vertices \(V\subset {\mathbb {N}}\), edges \(E\subset V\times V\), and an embedding \({\mathcal {P}}=\{{\mathbf {p}}_1,{\mathbf {p}}_2,\ldots ,{\mathbf {p}}_n\}\subset {\mathbb {R}}^3\). Sometimes, however, instead of arranging the points \({\mathbf {p}}_i\) into a set, it is more convenient to use the matrix notation \({\mathbf {P}}=({\mathbf {p}}_1,{\mathbf {p}}_2,\ldots ,{\mathbf {p}}_n)^\top \in {\mathbb {R}}^{n\times 3}\). Hence, we will denote a triangle mesh either as \({\mathcal {M}}=(V,E,{\mathcal {P}})\) or, equivalently, as \({\mathcal {M}}=(V,E,{\mathbf {P}})\).
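To make the two notations concrete, the following minimal Python sketch stores the embedding as a list of rows (the matrix \({\mathbf {P}}\)) and derives the edge set \(E\) from a triangle list. The class name `TriangleMesh`, the 0-based vertex indices, and the choice to store each undirected edge only once are our own assumptions, not part of the original pipeline.

```python
from dataclasses import dataclass, field

Point = tuple[float, float, float]


@dataclass
class TriangleMesh:
    """Hypothetical container for M = (V, E, P) with V = {0, ..., n-1}."""

    points: list[Point]                    # row i of P is the position p_i
    triangles: list[tuple[int, int, int]]  # vertex indices of each face
    edges: set[tuple[int, int]] = field(init=False)  # E, derived below

    def __post_init__(self) -> None:
        # Each undirected edge is stored once as (smaller, larger) index.
        self.edges = set()
        for a, b, c in self.triangles:
            for i, j in ((a, b), (b, c), (c, a)):
                self.edges.add((min(i, j), max(i, j)))

    @property
    def n(self) -> int:
        """Number of vertices."""
        return len(self.points)
```

For a unit square split into the triangles (0, 1, 2) and (0, 2, 3), this yields four vertices and five edges, the diagonal (0, 2) being shared by both faces.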
Breast probability masks
Given a 3D breast scan represented as triangle mesh \({\mathcal {M}}=(V,E,{\mathcal {P}})\), we call
$$\begin{aligned} p_{\mathcal {M}}:{\mathcal {P}}\longrightarrow (0,1] \end{aligned}$$
(1)
a breast probability mask (BPM). Technically, a BPM is a scalar field defined over \({\mathcal {M}}\) that assigns each point \({\mathbf {p}}_i\) of a 3D breast scan a probability \(p_{\mathcal {M}}({\mathbf {p}}_i)\) indicating how likely it is that \({\mathbf {p}}_i\) belongs to the breast region.
Concrete mapping As a concrete mapping for \(p_{\mathcal {M}}\), we propose to use a normalized sum of elliptical basis functions (EBFs) centered at the nipples. We use EBFs instead of ordinary radial basis functions (RBFs) because we found that they better capture the natural teardrop shape of the breast (see Fig. 5 for a comparison between RBFs and EBFs). Technically, EBFs are a generalization of RBFs that use the Mahalanobis distance instead of the Euclidean distance. Formally, given a function \(\phi :[0,\infty )\longrightarrow {\mathbb {R}}\), an EBF centered at a point \({\mathbf {c}}\in {\mathbb {R}}^n\) is the function \({\mathbf {x}}\mapsto \phi \left( d_M\left( {\mathbf {x}},{\mathbf {c}}\right) \right) \), where \(d_M\) is the Mahalanobis distance, defined as
$$\begin{aligned} d_M({\mathbf {x}},{\mathbf {c}}):=\sqrt{\left( {\mathbf {x}}-{\mathbf {c}}\right) ^\top {\mathbf {S}}^{-1}\left( {\mathbf {x}}-{\mathbf {c}}\right) }\,, \end{aligned}$$
(2)
where \({\mathbf {S}}\in {\mathbb {R}}^{n\times n}\) is a symmetric positive definite matrix, also called the covariance matrix. If \({\mathbf {S}}\) is the identity matrix, \(d_M\) reduces to the Euclidean distance and the EBF to an ordinary RBF. To stress that the Mahalanobis distance depends on \({\mathbf {S}}\), we write \(d_M({\mathbf {x}},{\mathbf {c}};{\mathbf {S}})\) in the following.
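A minimal sketch of Eq. (2) for a general symmetric positive definite \({\mathbf {S}}\), assuming plain Python lists for vectors and matrices. Rather than inverting \({\mathbf {S}}\), it factorizes \({\mathbf {S}}={\mathbf {L}}{\mathbf {L}}^\top \) (Cholesky) and solves \({\mathbf {L}}{\mathbf {y}}={\mathbf {x}}-{\mathbf {c}}\), since \(({\mathbf {x}}-{\mathbf {c}})^\top {\mathbf {S}}^{-1}({\mathbf {x}}-{\mathbf {c}})=\Vert {\mathbf {y}}\Vert ^2\). The helper names `cholesky` and `mahalanobis` are hypothetical.

```python
import math


def cholesky(S):
    """Lower-triangular L with S = L L^T; raises if S is not positive definite."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = S[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("S is not positive definite")
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


def mahalanobis(x, c, S):
    """d_M(x, c; S) = sqrt((x - c)^T S^{-1} (x - c)), Eq. (2)."""
    L = cholesky(S)
    diff = [xi - ci for xi, ci in zip(x, c)]
    # Forward substitution solves L y = x - c, so ||y||^2 = diff^T S^{-1} diff.
    y = []
    for i in range(len(diff)):
        y.append((diff[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return math.sqrt(sum(v * v for v in y))
```

For example, with \({\mathbf {S}}=\mathrm{diag}(4,1,1)\), both (2, 0, 0) and (0, 1, 0) lie at Mahalanobis distance 1 from the origin, which is what stretches the level sets of an EBF into ellipsoids.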
Now, to define a concrete BPM using EBFs, let \({\mathbf {p}}_\text {N}^\tau \in {\mathcal {P}}\) with \(\tau \in \{\text {L, R}\}\) denote the position of the left (L) and right (R) nipple, respectively. We first construct two individual probability masks for the left and the right breast, given as
$$\begin{aligned} p_{\mathcal {M}}^\tau ({\mathbf {p}}_i)=\phi \left( d_M\left( {\mathbf {p}}_i,{\mathbf {p}}_\text {N}^\tau ;{\mathbf {S}}_\tau \right) \right) . \end{aligned}$$
(3)
Here, we define \(\phi :[0,\infty )\longrightarrow (0,1]\) as
$$\begin{aligned} \phi (x)=\exp \left( -x^2\right) . \end{aligned}$$
(4)
Finally, the BPM for a whole 3D breast scan is given as the normalized sum
$$\begin{aligned} p_{\mathcal {M}}({\mathbf {p}}_i)= & {} \frac{1}{4}\left( p_{\mathcal {M}}^\text {L}({\mathbf {p}}_i)+{\hat{p}}_{\mathcal {M}}^\text {L}({\mathbf {p}}_i)\right. \nonumber \\&\left. +p_{\mathcal {M}}^\text {R}({\mathbf {p}}_i)+{\hat{p}}_{\mathcal {M}}^\text {R}({\mathbf {p}}_i)\right) , \end{aligned}$$
(5)
where
$$\begin{aligned} {\hat{p}}_{\mathcal {M}}^\tau ({\mathbf {p}}_i)=\phi \left( d_M\left( {\mathbf {p}}_i,{\hat{\mathbf {p}}}_\text {N}^\tau ;{\hat{\mathbf {S}}}_\tau \right) \right) \end{aligned}$$
(6)
are shifted masks for the left and right breast, added to better mimic the teardrop shape, with shifted centers \({\hat{\mathbf {p}}}_\text {N}^\tau ={\mathbf {p}}_\text {N}^\tau +{\mathbf {t}}_\tau \) and translation vectors \({\mathbf {t}}_\tau \in {\mathbb {R}}^3\).
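The following Python sketch evaluates Eqs. (3) to (6) at a single point, assuming diagonal covariance matrices (as in the parameter selection described next), so that \(d_M^2\) reduces to a sum of per-axis ratios. Nipple positions, shift vectors, and covariance diagonals are passed as dictionaries keyed by `'L'` and `'R'`; these names are hypothetical.

```python
import math


def ebf(p, center, diag_s):
    """phi(d_M(p, center; S)) with phi(x) = exp(-x^2), Eqs. (3)-(4),
    assuming a diagonal covariance S = diag(diag_s)."""
    d2 = sum((pi - ci) ** 2 / si for pi, ci, si in zip(p, center, diag_s))
    return math.exp(-d2)  # exp(-d_M^2): no square root needed


def breast_probability(p, nipples, shifts, S, S_hat):
    """Normalized sum of Eq. (5); every argument except p is a dict keyed by 'L'/'R'."""
    total = 0.0
    for tau in ("L", "R"):
        n = nipples[tau]
        n_hat = tuple(a + b for a, b in zip(n, shifts[tau]))  # p_N + t_tau
        total += ebf(p, n, S[tau]) + ebf(p, n_hat, S_hat[tau])
    return total / 4.0
```

Evaluated exactly at a nipple with zero shift and well-separated breasts, the mask is close to 0.5, since only the two terms of that side contribute.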
Parameter selection To fully define a BPM, appropriate matrices \({\mathbf {S}}_\tau ,{\hat{\mathbf {S}}}_\tau \in {\mathbb {R}}^{3\times 3}\) and translation vectors \({\mathbf {t}}_\tau \in {\mathbb {R}}^3\) must be chosen first. In total, 30 values need to be determined (six for each of the symmetric matrices \({\mathbf {S}}_\tau \) and \({\hat{\mathbf {S}}}_\tau \), and three for each \({\mathbf {t}}_\tau \)). To simplify this task, we assume diagonal covariance matrices and use landmarks previously marked by experts on the 3D breast scans. Specifically, denote the landmark points shown in Fig. 2 as \({\mathbf {p}}_\text {SN},{\mathbf {p}}_\text {XI}\in {\mathcal {P}}\) for the sternal notch and xiphoid, and \({\mathbf {p}}_\text {LaBP}^\tau ,{\mathbf {p}}_\text {LBP}^\tau \in {\mathcal {P}}\) for the lateral and lower breast pole of the left and right breast, respectively. We then define
$$\begin{aligned} \begin{aligned}&{\mathbf {S}}_\tau = \frac{1}{2}{{\,\mathrm{diag}\,}}\Bigl (d_G\left( {\mathbf {p}}^\tau _\text {LaBP},{\mathbf {p}}^\tau _\text {N}\right) +d_G\left( {\mathbf {p}}^\tau _\text {N},{\mathbf {p}}_\text {XI}\right) ,\\&d_G\left( {\mathbf {p}}^\tau _\text {N},{\mathbf {p}}^\tau _\text {LBP}\right) , d_G\left( {\mathbf {p}}^\tau _\text {LaBP},{\mathbf {p}}^\tau _\text {N}\right) \Bigr )\,, \\&{\hat{\mathbf {S}}}_\tau = \frac{1}{2}{{\,\mathrm{diag}\,}}\Bigl (d_G\left( {\mathbf {p}}^\tau _\text {LaBP},{\mathbf {p}}^\tau _\text {N}\right) +d_G\left( {\mathbf {p}}^\tau _\text {N},{\mathbf {p}}_\text {XI}\right) ,\\&d_G\left( {\mathbf {p}}^\tau _\text {N},{\mathbf {p}}_\text {SN}\right) ,d_G\left( {\mathbf {p}}^\tau _\text {LaBP},{\mathbf {p}}^\tau _\text {N}\right) \Bigr )\,, \\&{\mathbf {t}}_\tau =\frac{1}{5}\left( 0,d_G\left( {\mathbf {p}}^\tau _\text {N},{\mathbf {p}}_\text {SN}\right) ,0\right) , \end{aligned} \end{aligned}$$
(7)
where \(d_G\) denotes the geodesic distance between two points on the surface mesh. Note that \({\mathbf {S}}_\tau \) and \({\hat{\mathbf {S}}}_\tau \) differ only in the second diagonal element.
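As a sketch of Eq. (7), assuming the mesh is given as a vertex list plus an edge list, the code below approximates \(d_G\) by shortest paths along mesh edges (Dijkstra's algorithm). This is a stand-in, not necessarily what the original pipeline uses: edge paths overestimate true geodesic distances, which exact methods such as fast marching would avoid. The landmark keys in `lm` are hypothetical, and \({\mathbf {t}}_\tau \) is returned as the pure offset that is added to \({\mathbf {p}}_\text {N}^\tau \) to obtain \({\hat{\mathbf {p}}}_\text {N}^\tau \).

```python
import heapq
import math


def edge_graph_distances(points, edges, source):
    """Dijkstra along mesh edges: an upper-bound approximation of d_G
    from vertex `source` to every vertex (assumption, not exact geodesics)."""
    adj = {i: [] for i in range(len(points))}
    for i, j in edges:
        w = math.dist(points[i], points[j])
        adj[i].append((j, w))
        adj[j].append((i, w))
    dist = [math.inf] * len(points)
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry
        for v, w in adj[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist


def bpm_parameters(points, edges, lm, tau):
    """Diagonals of S_tau and S_hat_tau and the offset t_tau, following Eq. (7).
    `lm` maps hypothetical keys such as 'N_L', 'LaBP_L', 'LBP_L', 'XI', 'SN'
    to vertex indices; d_G is symmetric, so one search from the nipple suffices."""
    from_nipple = edge_graph_distances(points, edges, lm["N_" + tau])
    d_la = from_nipple[lm["LaBP_" + tau]]
    d_xi = from_nipple[lm["XI"]]
    d_lb = from_nipple[lm["LBP_" + tau]]
    d_sn = from_nipple[lm["SN"]]
    s_diag = (0.5 * (d_la + d_xi), 0.5 * d_lb, 0.5 * d_la)
    s_hat_diag = (0.5 * (d_la + d_xi), 0.5 * d_sn, 0.5 * d_la)
    t = (0.0, d_sn / 5.0, 0.0)  # shift of the nipple toward the sternal notch
    return s_diag, s_hat_diag, t
```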
Registration of 3D breast scans
Following Fig. 4, the proposed pairwise registration pipeline is mainly composed of rigid alignment (Sect. 3.2.1) and non-rigid alignment (Sect. 3.2.2). To speed up convergence, the latter is carried out in a hierarchical, multi-resolution fashion (Sect. 3.2.3).
Both phases make extensive use of BPMs in order to align a template surface \({\mathcal {S}}=(V,E,{\mathbf {P}})\) to a target \({\mathcal {T}}\) as accurately as possible inside the breast region and only roughly outside. This effectively decouples the breast from the rest of the thorax by reducing the variance outside the breast region to a minimum. The justification is the well-known bound \(\left| {{\,\mathrm{cov}\,}}(x,y)\right| \le \sqrt{{{\,\mathrm{var}\,}}(x)}\sqrt{{{\,\mathrm{var}\,}}(y)}\), which follows from the Cauchy–Schwarz inequality: lowering \({{\,\mathrm{var}\,}}(x)\) or \({{\,\mathrm{var}\,}}(y)\) shrinks the largest covariance the two variables can have.
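For completeness, the bound can be derived in one line; the damping example in the second row is our own illustration, assuming x and y have finite variances and writing \(\mu _x,\mu _y\) for their means:

$$\begin{aligned} \left| {{\,\mathrm{cov}\,}}(x,y)\right| &=\left| {\mathbb {E}}\left[ (x-\mu _x)(y-\mu _y)\right] \right| \le \sqrt{{\mathbb {E}}\left[ (x-\mu _x)^2\right] }\sqrt{{\mathbb {E}}\left[ (y-\mu _y)^2\right] }=\sqrt{{{\,\mathrm{var}\,}}(x)}\sqrt{{{\,\mathrm{var}\,}}(y)}\,,\\ {\tilde{x}}&:=\mu _x+s\,(x-\mu _x),\quad 0\le s\le 1\quad \Longrightarrow \quad {{\,\mathrm{var}\,}}({\tilde{x}})=s^2\,{{\,\mathrm{var}\,}}(x),\quad {{\,\mathrm{cov}\,}}({\tilde{x}},y)=s\,{{\,\mathrm{cov}\,}}(x,y). \end{aligned}$$

Damping the variation of x by a factor s thus scales its covariance with any other variable by the same factor.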
Finally, note that the target surface \({\mathcal {T}}\) can be given in any representation that allows for closest point search. We use a triangular surface mesh but write \({\mathcal {T}}\subset {\mathbb {R}}^3\) for the sake of notational simplicity.
Rigid alignment
The overall goal of the rigid alignment is to move the template as close as possible to the rigid part of the target, which we define as the thorax without the breast. In particular, if we assume the breast to be the only part of the thorax that deforms non-rigidly, we expect that the thoraces of two subjects, excluding the breast region, can be aligned sufficiently well. Based on this assumption, the absence of suitable landmarks, and the fact that our initial 3D breast scans are already reasonably well aligned (see Sect. 4.1), we propose a modified version of the Iterative Closest Point (ICP) algorithm, originally introduced by Besl and McKay [8].
Compared to the standard ICP algorithm, our modified version differs in the following three aspects: (i) A scaling factor is added to the rigid transformation, effectively allowing for Euclidean similarity transformations [20, 60]. (ii) To ensure that only the rigid parts of the 3D breast scans are used for alignment, correspondences in which both points have a high probability of belonging to the breast region are discarded. This is implemented by thresholding the template and target BPMs. (iii) Rotations are restricted to rotations about the x-axis, corresponding to the transversal plane. Rotations around the y- and z-axis (sagittal and coronal plane), which may arise from severe overweight in conjunction with an uneven distribution of abdominal fat, could destroy the initial alignment. In any case, asymmetries of the thorax itself should not affect the rigid alignment of the template.
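The three modifications can be sketched for a single ICP iteration as follows, assuming point lists and per-point BPM values. Closest points are found by brute force, and the similarity transformation with rotation restricted to the x-axis has a closed-form least-squares solution: after centering, only the (y, z) components are rotated, so the optimal angle follows from an atan2 of summed 2D cross and dot products. The threshold value 0.5 and all function names are hypothetical; the threshold actually used is not stated here.

```python
import math


def rotate_x(p, theta):
    """Rotate a 3D point about the x-axis by angle theta (radians)."""
    x, y, z = p
    c, s = math.cos(theta), math.sin(theta)
    return (x, c * y - s * z, s * y + c * z)


def similarity_about_x(src, dst):
    """Least-squares (s, theta, t) minimizing sum ||s R_x(theta) a + t - b||^2:
    a similarity transform whose rotation is restricted to the x-axis."""
    n = len(src)
    ca = [sum(p[k] for p in src) / n for k in range(3)]
    cb = [sum(q[k] for q in dst) / n for k in range(3)]
    A = [[p[k] - ca[k] for k in range(3)] for p in src]
    B = [[q[k] - cb[k] for k in range(3)] for q in dst]
    # A rotation about x only acts on the (y, z) components.
    dot = sum(a[1] * b[1] + a[2] * b[2] for a, b in zip(A, B))
    cross = sum(a[1] * b[2] - a[2] * b[1] for a, b in zip(A, B))
    theta = math.atan2(cross, dot)
    RA = [rotate_x(a, theta) for a in A]
    s = sum(sum(r[k] * b[k] for k in range(3)) for r, b in zip(RA, B)) / sum(
        sum(v * v for v in a) for a in A)
    rc = rotate_x(ca, theta)
    t = tuple(cb[k] - s * rc[k] for k in range(3))
    return s, theta, t


def icp_step(template, target, bpm_template, bpm_target, threshold=0.5):
    """One modified ICP iteration: closest points, discard pairs in which BOTH
    points are likely breast, then fit and apply the restricted similarity."""
    src, dst = [], []
    for i, p in enumerate(template):
        j = min(range(len(target)), key=lambda k: math.dist(p, target[k]))
        if bpm_template[i] > threshold and bpm_target[j] > threshold:
            continue  # both points lie in the breast region: not rigid
        src.append(p)
        dst.append(target[j])
    if not src:
        raise ValueError("all correspondences were discarded")
    s, theta, t = similarity_about_x(src, dst)
    return [tuple(s * v + tk for v, tk in zip(rotate_x(p, theta), t))
            for p in template]
```

In practice, `icp_step` would be repeated until the template stops moving, and a k-d tree would replace the brute-force closest-point search for realistically sized scans.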
Non-rigid alignment
Given the rigidly aligned template \({\mathcal {S}}=(V,E,{\mathbf {P}})\), the goal of the non-rigid alignment is to gradually deform \({\mathcal {S}}\) into a new surface \({\mathcal {S}}'=(V,E,{\mathbf {P}}')\) with identical topology such that \({\mathcal {S}}'\) is as close as possible to the target \({\mathcal {T}}\) inside the breast region. Following various authors including Jiang et al. [31] and Yamazaki et al. [57], we formulate our non-rigid registration problem using the following nonlinear energy functional
$$\begin{aligned} F\left( {\mathbf {P}}'\right) =F_D\left( {\mathbf {P}}'\right) +\alpha F_R\left( {\mathbf {P}}'\right) +\beta F_L\left( {\mathbf {P}}'\right) , \end{aligned}$$
(8)
where \(F_D\) is a distance term penalizing the point-to-point distance between the template and target surface, \(F_R\) is a regularization term constraining deformations to be locally as similar as possible, and \(F_L\) is a landmark term ensuring that certain points are matched. The weights \(\alpha ,\beta \ge 0\) control the individual contribution of each term to the cost function. Minimizing F finally yields the new points \({\mathbf {P}}'\) of the deformed template surface \({\mathcal {S}}'\), i.e.,
$$\begin{aligned} {\mathbf {P}}'=\mathop {\mathrm{arg\,min}}\limits _{\mathbf {P'}\in {\mathbb {R}}^{n\times 3}}F(\mathbf {P'}). \end{aligned}$$
(9)
Adapting the strategy proposed by Allen et al. [2], instead of computing (9) only once, we minimize F several times, each time with a lower regularization weight \(\alpha \) in (8). As later demonstrated by Amberg et al. [4], this strategy efficiently recovers the whole range of global and local non-rigid deformations. Following various authors [31, 46], the optimization problem in (9) is solved using an alternating minimization (AM) approach, as briefly summarized in Appendix A.
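The decreasing-weight schedule can be sketched as an outer loop around any solver for Eq. (9). Here `minimize` is a hypothetical callback standing in for the alternating minimization of Appendix A, and the default weights (10, 1, 0.1) are placeholders rather than values used in this paper.

```python
from typing import Callable, Sequence

Points = list[tuple[float, float, float]]


def anneal_registration(
    points: Points,
    minimize: Callable[[Points, float], Points],
    alphas: Sequence[float] = (10.0, 1.0, 0.1),
) -> Points:
    """Minimize F repeatedly with a strictly decreasing regularization weight
    alpha (Eq. 8), warm-starting each stage from the previous solution."""
    if any(a2 >= a1 for a1, a2 in zip(alphas, alphas[1:])):
        raise ValueError("alphas must be strictly decreasing")
    for alpha in alphas:
        points = minimize(points, alpha)  # hypothetical solver for Eq. (9)
    return points
```

Warm-starting lets the stiff early stages capture the global deformation before the weaker regularization of later stages admits local detail.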
Distance term The distance term \(F_D\) is used to attract the template \({\mathcal {S}}\) to the target \({\mathcal {T}}\). Assuming fixed correspondences between both surfaces, i.e., \(\left\{ ({\mathbf {p}}_1,{\mathbf {q}}_1),({\mathbf {p}}_2,{\mathbf {q}}_2),\ldots ,({\mathbf {p}}_n,{\mathbf {q}}_n)\right\} \) with \({\mathbf {q}}_i\in {\mathcal {T}}\) being the closest point to \({\mathbf {p}}_i\), the distance term can be written as
$$\begin{aligned} F_D({\mathbf {P}}')=\frac{1}{2}\left\| {\mathbf {C}}\left( {\mathbf {P}}'-{\mathbf {Q}}\right) \right\| ^2_F, \end{aligned}$$
(10)
where \({\mathbf {C}}:=\text {diag}(c_1,c_2,\ldots ,c_n)\), \(c_i\ge 0\) for all \(i\in \left\{ 1,2,\ldots ,n\right\} \) are weights used to quantify the reliability of a match, and \({\mathbf {Q}}:=\left( {\mathbf {q}}_1,{\mathbf {q}}_2,\ldots ,{\mathbf {q}}_n\right) ^\top \in {\mathbb {R}}^{n\times 3}\). Using the BPMs \(p_{\mathcal {S}}\) and \(p_{\mathcal {T}}\) of the template and target, we set
$$\begin{aligned} c_i=\frac{p_{\mathcal {S}}({\mathbf {p}}_i)+p_{\mathcal {T}}({\mathbf {q}}_i)}{2}. \end{aligned}$$
(11)
This way, correspondences \(({\mathbf {p}}_i,{\mathbf {q}}_i)\) mapping from one breast region to the other have a greater impact on the overall distance term, since \(c_i\in (0,1]\) is large in this case. Conversely, their influence tends to zero as \(c_i\rightarrow 0\), i.e., if both points are unlikely to belong to the breast region. As such, the deformation of points \({\mathbf {p}}_i\) on the template with a small value for \(c_i\) is mainly controlled by the regularization term, as previously described by Allen et al. [2].
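A sketch of Eqs. (10) and (11), assuming the BPM values have already been evaluated at each template point and at its closest target point. Because \({\mathbf {C}}\) multiplies the residual inside the Frobenius norm, each squared point-to-point distance is effectively weighted by \(c_i^2\). The function names are hypothetical.

```python
def correspondence_weights(bpm_template, bpm_target_at_closest):
    """c_i = (p_S(p_i) + p_T(q_i)) / 2, Eq. (11)."""
    return [(a + b) / 2.0 for a, b in zip(bpm_template, bpm_target_at_closest)]


def distance_term(P_new, Q, c):
    """F_D(P') = 1/2 ||C (P' - Q)||_F^2 with C = diag(c), Eq. (10).
    Row i of C (P' - Q) is c_i (p'_i - q_i), hence the factor c_i^2."""
    return 0.5 * sum(
        ci * ci * sum((p - q) ** 2 for p, q in zip(pi, qi))
        for ci, pi, qi in zip(c, P_new, Q)
    )
```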
Regularization term The regularization term \(F_R\) should prevent the template surface from shearing and distortion while simultaneously ensuring structure preservation and smooth deformations. To do so, we adapt the consistent as-similar-as-possible (CASAP) regularization technique, in which deformations are constrained to be locally as similar as possible [31, 57]. Specifically, given a local neighborhood \(E_i\subset E\) around each point \({\mathbf {p}}_i\), the template surface is only allowed to move in terms of a Euclidean similarity transformation
$$\begin{aligned} {\mathbf {p}}'_j-{\mathbf {p}}'_k=s_i{\mathbf {R}}_i\left( {\mathbf {p}}_j-{\mathbf {p}}_k\right) \qquad \forall (j,k)\in E_i, \end{aligned}$$
(12)
where \(s_i>0\) is a scaling factor and \({\mathbf {R}}_i\in {{\,\mathrm{SO}\,}}(3)\) a rotation matrix. Following Chao et al. [13], we define \(E_i\) as the set containing all (directed) edges of triangles incident to \({\mathbf {p}}_i\), also known as spokes-and-rims. Finally, our CASAP regularization term reads
$$\begin{aligned}&F_R({\mathbf {P}}')=\frac{1}{2}\sum _{i=1}^n w_i\nonumber \\&\quad \left[ \sum _{(j,k)\in E_i}w_{jk}\left\| \left( {\mathbf {p}}'_j-{\mathbf {p}}'_k\right) -s_i{\mathbf {R}}_i\left( {\mathbf {p}}_j-{\mathbf {p}}_k\right) \right\| ^2_2+\right. \nonumber \\&\qquad \left. \lambda \sum _{l\in N_i}w_{il}\left\| {\mathbf {R}}_i-{\mathbf {R}}_l\right\| ^2_F\right] \,, \end{aligned}$$
(13)
where weights \(w_i>0\) are added to individually control the amount of regularization for each particular point. As mentioned above, since the deformation of points \({\mathbf {p}}_i\) with a small value for \(c_i\) is mainly controlled by the regularization term, we define
$$\begin{aligned} w_i=\frac{1}{(h-1)c_i+1}\qquad \text {with}\qquad \frac{1}{h}\le w_i<1 \end{aligned}$$
(14)
for all \(i\in \{1,2,\ldots ,n\}\) and an integer \(h\ge 2\) (we used \(h=2\) throughout this paper; for \(h=1\), all weights would equal 1). This strategy keeps a point \({\mathbf {p}}_i\) of the template relatively stiff if (i) \({\mathbf {p}}_i\) has a low probability of belonging to the breast region and (ii) the corresponding point on the target is also unlikely to be part of the breast region (because \(w_i\rightarrow 1\) as \(c_i\rightarrow 0\)), thus effectively preventing the template from adapting too closely to the target outside the breast region. Lastly, \(N_i\subset V\) in (13) denotes the one-ring neighborhood of the i-th point, and \(w_{jk}\in {\mathbb {R}}\) are the popular cotangent weights, see, e.g., [12]. The parameter \(\lambda \ge 0\) is usually set to 0.02A, where \(A\ge 0\) is the total surface area of \({\mathcal {S}}\) [34].
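The per-vertex ingredients of Eqs. (12) to (14) can be sketched as follows: the spokes-and-rims sets \(E_i\), the point weights \(w_i\), and the default \(\lambda =0.02A\). We assume consistently oriented triangles given as vertex-index triples and read "directed edges" as the oriented edges (a, b), (b, c), (c, a) of each incident triangle; this interpretation and all function names are our own.

```python
import math
from collections import defaultdict


def spokes_and_rims(triangles):
    """E_i for every vertex i: the oriented edges of all triangles incident
    to i. Spokes join i to its one-ring; rims join consecutive neighbors."""
    E = defaultdict(set)
    for a, b, c in triangles:
        half_edges = {(a, b), (b, c), (c, a)}
        for v in (a, b, c):
            E[v] |= half_edges
    return dict(E)


def regularization_weights(c, h=2):
    """w_i = 1 / ((h - 1) c_i + 1), Eq. (14); lies in [1/h, 1) for c_i in (0, 1]."""
    if h < 2:
        raise ValueError("h must be at least 2")
    return [1.0 / ((h - 1) * ci + 1.0) for ci in c]


def surface_area(points, triangles):
    """Total area A: half the norm of each face's edge cross product."""
    area = 0.0
    for a, b, c in triangles:
        u = [points[b][k] - points[a][k] for k in range(3)]
        v = [points[c][k] - points[a][k] for k in range(3)]
        cross = (u[1] * v[2] - u[2] * v[1],
                 u[2] * v[0] - u[0] * v[2],
                 u[0] * v[1] - u[1] * v[0])
        area += 0.5 * math.sqrt(sum(x * x for x in cross))
    return area


def rotation_smoothness_weight(points, triangles):
    """lambda = 0.02 A, the default quoted for Eq. (13)."""
    return 0.02 * surface_area(points, triangles)
```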
Landmark term The goal of the landmark term \(F_L\) is to ensure that certain positions (i.e., landmarks) on the template are matched to their counterparts on the target during the registration process. Let \(I=(l_1,l_2,\ldots ,l_m)\) with \(l_i\in \{1,2,\ldots ,n\}\) be an ordered index set containing the indices of the m landmarks specified on the template surface \({\mathcal {S}}\). Define a matrix \({\mathbf {D}}\in {\mathbb {R}}^{m\times n}\) as
$$\begin{aligned} {\mathbf {D}}=(d_{ij}):= {\left\{ \begin{array}{ll} 1, &{} \text {if }j=l_i,\\ 0, &{} \text {otherwise} \end{array}\right. } \end{aligned}$$
(15)
for \(i=1,2,\ldots ,m\) and \(j=1,2,\ldots ,n\). Next, denote the corresponding landmarks on the target surface by \(\{{\mathbf {q}}_1,{\mathbf {q}}_2,\ldots ,{\mathbf {q}}_m\}\subset {\mathcal {T}}\). Then, the landmark term is defined as
$$\begin{aligned} F_L({\mathbf {P}}')=\frac{1}{2}\left\| \mathbf {DP}'-{\mathbf {Q}}_L\right\| ^2_F, \end{aligned}$$
(16)
where \({\mathbf {Q}}_L:=\left( {\mathbf {q}}_1,{\mathbf {q}}_2,\ldots ,{\mathbf {q}}_m\right) ^\top \in {\mathbb {R}}^{m\times 3}\).
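For \({\mathbf {D}}\) to select the landmarks in order, row i must contain a single 1 at column \(l_i\), the index of the i-th landmark; \(\mathbf {DP}'\) then simply picks the landmark rows of \({\mathbf {P}}'\). The sketch below builds \({\mathbf {D}}\) for illustration and evaluates Eq. (16) without forming it, assuming 0-based indices; the function names are hypothetical.

```python
def selection_matrix(landmark_indices, n):
    """D in R^{m x n}: row i has a single 1 in column l_i (Eq. 15)."""
    D = [[0.0] * n for _ in landmark_indices]
    for row, j in enumerate(landmark_indices):
        D[row][j] = 1.0
    return D


def landmark_term(P_new, landmark_indices, Q_L):
    """F_L(P') = 1/2 ||D P' - Q_L||_F^2, Eq. (16); D P' just picks rows of P'."""
    return 0.5 * sum(
        sum((p - q) ** 2 for p, q in zip(P_new[j], q_row))
        for j, q_row in zip(landmark_indices, Q_L)
    )
```

Forming \({\mathbf {D}}\) explicitly is only useful when assembling a sparse linear system; for evaluating the energy, indexing is equivalent and cheaper.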
Multi-resolution fitting strategy
Following common practice and to speed up convergence, instead of applying the previously described non-rigid alignment only once, we employ a hierarchical, multi-resolution fitting strategy composed of initial fitting, coarse fitting, and fine fitting (see also Fig. 4).
Initial fitting Having a low-resolution instance of the rigidly aligned template at hand, the goal of the initial fitting is to roughly adapt the coarse template to the key features (i.e., landmarks) of the target. To do so, we strictly prioritize the landmark constraints and do not use BPMs in this phase.
Coarse fitting In this step, the initially fitted low-resolution template is gradually deformed toward the target.
Upsampling Next, the deformation obtained from the previous step is applied to the original, full-resolution template using a concept called Embedded Deformation, introduced by Sumner et al. [50]. In essence, the deformation of the coarse template is transferred to the full-resolution template by linearly interpolating the transformations at each point.
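A sketch of this transfer, using the Embedded Deformation blend \({\mathbf {p}}'=\sum _j w_j({\mathbf {p}})\left[ {\mathbf {R}}_j({\mathbf {p}}-{\mathbf {g}}_j)+{\mathbf {g}}_j+{\mathbf {t}}_j\right] \) of Sumner et al. It assumes that the coarse template vertices act as graph nodes \({\mathbf {g}}_j\), each carrying a rotation \({\mathbf {R}}_j\) and translation \({\mathbf {t}}_j\) from the coarse fit, and uses the k-nearest-node weights \(w_j=(1-\Vert {\mathbf {p}}-{\mathbf {g}}_j\Vert /d_{\max })^2\), normalized to sum to one. The node setup, k = 4, and the function names are assumptions for illustration.

```python
import math


def mat_vec(R, v):
    """3x3 matrix times 3-vector."""
    return tuple(sum(R[r][k] * v[k] for k in range(3)) for r in range(3))


def embedded_deformation(points, nodes, rotations, translations, k=4):
    """Blend per-node transforms onto full-resolution points. d_max is the
    distance to the (k+1)-th nearest node, so at least k + 1 nodes are needed."""
    if len(nodes) <= k:
        raise ValueError("need more than k nodes")
    out = []
    for p in points:
        order = sorted(range(len(nodes)), key=lambda j: math.dist(p, nodes[j]))
        near = order[:k]
        d_max = math.dist(p, nodes[order[k]])
        if d_max > 0.0:
            w = [(1.0 - math.dist(p, nodes[j]) / d_max) ** 2 for j in near]
        else:
            w = [0.0] * k
        total = sum(w)
        if total == 0.0:  # degenerate tie: fall back to uniform weights
            w, total = [1.0] * k, float(k)
        acc = [0.0, 0.0, 0.0]
        for wj, j in zip(w, near):
            g = nodes[j]
            local = mat_vec(rotations[j], [p[m] - g[m] for m in range(3)])
            for m in range(3):
                acc[m] += wj / total * (local[m] + g[m] + translations[j][m])
        out.append(tuple(acc))
    return out
```

Because the weights form a convex combination, any deformation that all nodes agree on (for example, a common rigid motion) is reproduced exactly at every full-resolution point.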
Fine fitting Lastly, the upsampled template is fitted to the target again to produce the final result.
Model building
Once the data set is brought into correspondence, we follow the typical workflow used to build a classical point-based statistical shape model as proposed by Cootes et al. [14]. For notational simplicity, instead of stacking points \({\mathcal {P}}\) of a triangular mesh \({\mathcal {M}}=(V,E,{\mathcal {P}})\) into a matrix \({\mathbf {P}}\in {\mathbb {R}}^{n\times 3}\) as before, we use a vectorized representation, denoted as \({\mathbf {x}}={{\,\mathrm{vec}\,}}({\mathbf {P}})\in {\mathbb {R}}^{3n}\) in the following.
Briefly, given a set of k breast scans \(\{{\mathbf {x}}_1,{\mathbf {x}}_2,\ldots ,{\mathbf {x}}_k\}\subset {\mathbb {R}}^{3n}\) in correspondence, we first perform Generalized Procrustes Analysis (GPA) as introduced by Gower [26]. GPA iteratively aligns the objects to the arithmetic mean \({\bar{\mathbf {x}}}\in {\mathbb {R}}^{3n}\) (successively estimated from the data) using a Euclidean similarity transformation, effectively transforming the objects into shape space. Second, principal component analysis (PCA) is carried out on the Procrustes-aligned shapes. Let \(\{\lambda _1,\lambda _2,\ldots ,\lambda _q\}\subset {\mathbb {R}}^+\) be the \(q<k\) nonzero eigenvalues of the empirical covariance matrix calculated from the data, sorted in descending order. Denote the corresponding eigenvectors, also called principal components (PCs) in this context, as \(\{{\mathbf {u}}_1,{\mathbf {u}}_2,\ldots ,{\mathbf {u}}_q\}\subset {\mathbb {R}}^{3n}\). Then, a statistical shape model can be interpreted as a linear function \(M:{\mathbb {R}}^q\longrightarrow {\mathbb {R}}^{3n}\) defined as
$$\begin{aligned} M(\varvec{\alpha })={\bar{\mathbf {x}}}+{\mathbf {U}}\varvec{\alpha }, \end{aligned}$$
(17)
where \({\mathbf {U}}:=({\mathbf {u}}_1,{\mathbf {u}}_2,\ldots ,{\mathbf {u}}_q)\in {\mathbb {R}}^{3n\times q}\). To ensure plausibility of the newly generated shapes, \(\alpha _i\) is usually restricted to \(\left| \alpha _i\right| \le 3\sqrt{\lambda _i}\) for all \(i\in \{1,2,\ldots ,q\}\). If a (possibly unseen) shape \({\mathbf {x}}'\in {\mathbb {R}}^{3n}\) is in correspondence with the model and properly aligned, it can be reconstructed from M in a least-squares sense by using
$$\begin{aligned} \varvec{\alpha }^*={\mathbf {U}}^\top \left( {\mathbf {x}}' - {\bar{\mathbf {x}}}\right) \end{aligned}$$
(18)
as the model parameters, i.e., \({\mathbf {x}}'\approx M(\varvec{\alpha }^*)\). The number \(q<k\) of retained PCs is typically chosen such that the model explains a fixed proportion of the total variance, e.g., 98%.
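A sketch of Eqs. (17) and (18), the \(3\sqrt{\lambda _i}\) plausibility bound, and the variance-based choice of the number of PCs. It assumes the eigenvectors are given as orthonormal lists of length 3n (e.g., from a PCA computed elsewhere), in which case the least-squares coefficients are obtained with \({\mathbf {U}}^\top \), since \({\mathbf {U}}^\top {\mathbf {U}}\) is the identity. The function names are hypothetical.

```python
import math


def model_instance(mean, U, alpha):
    """M(alpha) = mean + U alpha, Eq. (17); U is a list of q orthonormal
    eigenvectors u_1..u_q, each of length 3n."""
    x = list(mean)
    for a, u in zip(alpha, U):
        for r in range(len(x)):
            x[r] += a * u[r]
    return x


def project(x_new, mean, U):
    """alpha* = U^T (x' - mean): least-squares coefficients for orthonormal U."""
    diff = [a - b for a, b in zip(x_new, mean)]
    return [sum(ur * dr for ur, dr in zip(u, diff)) for u in U]


def clamp(alpha, eigenvalues, k=3.0):
    """Restrict |alpha_i| <= k sqrt(lambda_i) to keep shapes plausible."""
    return [max(-k * math.sqrt(lam), min(k * math.sqrt(lam), a))
            for a, lam in zip(alpha, eigenvalues)]


def components_for_variance(eigenvalues, fraction=0.98):
    """Smallest number of leading PCs (eigenvalues sorted in descending
    order) that explains at least `fraction` of the total variance."""
    total = sum(eigenvalues)
    acc = 0.0
    for q, lam in enumerate(eigenvalues, start=1):
        acc += lam
        if acc >= fraction * total:
            return q
    return len(eigenvalues)
```

Clamping is applied to the coefficients, not to the reconstructed shape, so a projected scan that lies far outside the training distribution is pulled back toward the nearest plausible model instance along each PC independently.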