1 Introduction

In this article, we propose an elastic 3D–2D image registration technique, which is motivated by a challenging image processing problem arising in biological studies of leucocyte extravasation into inflamed tissue in mice. A common experimental procedure consists of irritating a certain tissue region and performing two-dimensional intravital microscopy (IVM) to monitor the reactions on the cellular level, see for instance [5]. After a sufficiently long video sequence has been recorded, the animal is killed, the tissue is removed, stained, and fixed, and a single, volumetric 3D image is acquired using a 3D confocal microscope. The two obtained datasets therefore show essentially the same tissue region, one being an elastically deformed configuration of the other. While the IVM recording shows in addition the temporal behaviour of the moving leucocytes, the confocal microscopy image contains more detailed structures such as the blood vessel basement membranes. Aligning and merging both datasets allows one to combine the enhanced structural information obtained by the 3D confocal microscope with the temporal information gained from the IVM.

Both the 2D and 3D datasets use fluorescent staining and thus possess multiple colour channels corresponding to different fluorescent dyes. In particular, one channel of both images contains the same biological structures (for instance, the endothelial cell membranes) so that those can be used for registration. Our approach for registering the images is based on mimicking the image acquisition system of the 2D microscope, including a nonrigid tissue deformation and a blurring and projection step. The intensities of the blurred and projected image volume are then compared with those of a 2D reference image (for instance, a single frame of the IVM video), see Fig. 1 for the overall procedure. The goal of the registration process is then to find a deformation transforming the volumetric image in such a way that the similarity of the two datasets is maximal. The set of admissible deformations is chosen in a way that reflects physical properties of the underlying tissue.

Fig. 1

Forward model of the relation between the 2D and 3D datasets. Given a specimen of the tissue, 3D confocal microscopy yields a corresponding 3D image of voxels. Taking the tissue configuration during confocal microscopy as a reference, the 2D microscopy image is obtained by a (geometrically nonlinear) deformation and a subsequent 2D image acquisition in which structures outside the focus plane are blurred during the projection onto the 2D image plane

In this article, we present a complete mathematical description of the aforementioned procedure and, using the direct method in the calculus of variations, guarantee the existence of an admissible deformation that maximizes the similarity between the datasets. Although rigid or parameterized 3D–2D registration [22, 30] and elastic 3D–3D registration [4] have been studied before, elastic 3D–2D registration has, to the best of our knowledge, not been attempted so far.

Xu and Wan presented an intensity-based 3D–2D registration technique for the matching of a CT volume to 2D X-ray images and a description of how their method can be implemented on GPUs [29]. Studies on CT to X-ray fluoroscopy registration for guidance of surgical intervention were published by Otake et al. [22] and Uneri et al. [28]. All three articles employ rigid motions to obtain optimal alignment, which can be expected to yield satisfactory registration results for this specific type of problem.

Also, descriptions of nonrigid methods can be found in the literature. Zheng and Yu presented a spline-based 3D–2D registration technique [30] for matching 3D CT data to 2D X-ray images using a statistical deformation model. The article also references a number of publications which describe similar methods. In contrast to our setting, in all the above studies the 2D images result from a 3D volume via the X-ray transform, while our 2D images stem from optical microscopy and thus depict structures outside the focus plane with blur. On the one hand, this blur makes the lateral alignment of structures between the 3D and 2D images more difficult (also computationally); on the other hand, it can provide additional information about the deformation along the viewing direction, which in the above studies must come solely from the deformation model.

Heldmann and Papenberg presented a variational method to register 3D CT data to a sequence of 2D ultrasound slices [12]. Note that their 2D images thus stem from point- or slicewise evaluation of a 3D volume rather than from an integral transform as in our and the above approaches. The well-posedness of this problem thus requires high regularity of the 3D image to be registered and the corresponding deformation, which the authors provide employing curvature regularization. They include results of the application of their technique to clinical data; however, no in-depth analysis of the presented energy functional is presented. Berkels et al. studied 2D–3D surface registration, where graph representations of cortical surfaces had to be matched to 2D brain images. Unlike in our case, an optimal deformation \(\psi : {\mathbb {R}}^2 \rightarrow {\mathbb {R}}^3\) had to be found. For the regularization of this highly ill-posed problem, the authors suggested a regularizer based on the second-order thin-plate spline energy of \(\psi \) [3]. A different, yet still related, situation can arise in surface-to-surface matching problems, if surface regions are locally described by two-dimensional images, which then in turn can be processed further by 2D–2D image matching algorithms. Such methods were, for example, described by Merelli et al. [17, 18].

Our method will estimate the displacement in the viewing direction from the severity of the out-of-focus blur. A closely related problem, known as depth-from-defocus, is to estimate the distance between an object and the image acquisition device from a whole image stack (rather than just one image as in our case) that was generated by capturing the object under varying focal settings. The forward model described by Persch et al. [24], which mimics the image acquisition of a thin lens camera, is comparable to our 2D microscope model. Likewise related is the article by Aguet et al. [1], which suggests how an image stack, produced by moving a sample through the different focal planes of a 2D microscope, can be combined into a single feature-enriched image.

For further methods, we refer to the extensive overview [15] of 3D–2D registration techniques with a focus on applications in image-guided surgery.

In the next section, we present the mathematical model and the analysis of existence of solutions in detail. Section 3 describes the numerical implementation, while Sect. 4 shows the behaviour of the method in carefully chosen test cases and a real biological dataset.

2 Mathematical Formulation

We continue with the mathematical description of our registration problem. We first briefly describe the forward operator of the 2D microscope and subsequently a physically reasonable model of tissue deformations.

For simplicity, our exposition will consider the 3D and 2D unit domains \({\Omega _{\text {3D}}}= [-1,1]^3\) and \({\Omega _{\text {2D}}}= [-1,1]^2\), respectively. The volumetric image stemming from 3D confocal microscopy is denoted \(u_{\text {3D}}:{\Omega _{\text {3D}}}\rightarrow [0,\infty )\) and is considered to represent the tissue reference configuration. In comparison with the IVM, the quality of the 3D microscopy process is typically good enough to assume \(u_{\text {3D}}\) noise- and blur-free. The 2D microscopy image \(u_{\text {2D}}:{\Omega _{\text {2D}}}\rightarrow [0,\infty )\) will be interpreted as resulting from a nonrigid deformation of \(u_{\text {3D}}\) and a subsequent projection into the plane. Above, both \(u_{\text {3D}}\) and \(u_{\text {2D}}\) are assumed to represent only that colour channel in which the same biological structures are visible; furthermore, we will assume both images to be uniformly bounded (which is appropriate since the image intensities can be interpreted as the local concentration of fluorescent molecules).

The forward operator of 2D microscopy can be modelled by

$$\begin{aligned}&{{\mathcal {F}}}: L^\infty ({\Omega _{\text {3D}}}) \rightarrow L^\infty ({\Omega _{\text {2D}}}),\\&({{\mathcal {F}}}u)(x_1,x_2) = [\chi *u](x_1,x_2, 0), \end{aligned}$$

where the blurring kernel \(\chi \in L^1({\mathbb {R}}^3)\) has compact support and encodes the microscope’s point spread function. Above, \(L^p\) denotes the standard Lebesgue function space, and \(\chi *u\) denotes convolution after extending u by zero outside \({\Omega _{\text {3D}}}\). Note that this zero extension is appropriate if outside \({\Omega _{\text {3D}}}\) there are no more fluorescently labelled structures or if \({\Omega _{\text {2D}}}\) only shows an interior subregion of \({\Omega _{\text {3D}}}\). Due to

$$\begin{aligned}&|[\chi *u](x)-[\chi *u]({\tilde{x}})|\\&\quad =\left| \int _{{\mathbb {R}}^3}(\chi (x-z)-\chi ({\tilde{x}}-z))u(z)\,{\mathrm {d}}z\right| \\&\quad \le \Vert \chi (x-\cdot )-\chi ({\tilde{x}}-\cdot )\Vert _{L^1}\Vert u\Vert _{L^\infty } \rightarrow _{{\tilde{x}}\rightarrow x}0 \end{aligned}$$

we have \(\chi *u\in C_c^0({\mathbb {R}}^3)\) (the space of continuous functions with compact support) so that \({{\mathcal {F}}}\) is well-defined and even maps into \(C^0({\Omega _{\text {2D}}})\) (the space of continuous functions on \({\Omega _{\text {2D}}}\)). The value \(\chi ((x_1,x_2,0)-z)\) can be interpreted as the amount of light recorded at \((x_1,x_2)\) by the 2D microscope from a point source of light at z, which is readily seen from \(({{\mathcal {F}}}u)(x_1,x_2)=\int _{{\mathbb {R}}^3}\chi ((x_1,x_2,0)-z)u(z)\,{\mathrm {d}}z\). In our model, we assume the blurring kernel to be spatially independent so that the observed blur of a point \(z\in {\Omega _{\text {3D}}}\) only depends on its height \(z_3\). A simple example (used in our calculations due to the lack of a properly measured point spread function) is given by:

$$\begin{aligned} \chi (x)={\left\{ \begin{array}{ll} \frac{1}{\pi c^2x_3^2}&{}{\text {if }}|x_3|\le 1,x_1^2+x_2^2\le c^2x_3^2,\\ 0&{}{\text {else,}} \end{array}\right. } \end{aligned}$$

which blurs each point z outside the focus plane \({\mathbb {R}}^2\times \{0\}\) uniformly onto a disc of radius \(c|z_3|\).
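To make the forward model concrete, the following minimal Python/NumPy sketch evaluates \({{\mathcal {F}}}u\) for a voxel image by blurring each z-slice with the normalized disc kernel of radius \(c|z_3|\) and summing the contributions over \(z_3\). The helper names, the isotropic voxel spacing h, and the use of scipy's convolution routine are our illustrative assumptions; Sect. 3 describes the actual FFT-based implementation.

```python
import numpy as np
from scipy.ndimage import convolve  # one of several possible 2D convolution routines

def disc_kernel(radius_px):
    """Normalized indicator of a disc: chi(.,.,z3) for fixed z3, total mass 1."""
    r = max(int(np.ceil(radius_px)), 0)
    if r == 0:
        k = np.ones((1, 1))                      # point in focus: no blur
    else:
        xx, yy = np.meshgrid(np.arange(-r, r + 1), np.arange(-r, r + 1))
        k = (xx**2 + yy**2 <= radius_px**2).astype(float)
    return k / k.sum()

def forward(u3d, c, h):
    """F u: blur each z-slice by a disc of radius c*|z3| and integrate over z3.

    u3d -- voxel image of shape (nz, ny, nx), with z3 = 0 in the middle slice
    c   -- blur growth rate of the example kernel
    h   -- voxel spacing, so the z-integral becomes a Riemann sum
    """
    nz = u3d.shape[0]
    z = (np.arange(nz) - (nz - 1) / 2) * h       # z3-coordinate of each slice
    out = np.zeros(u3d.shape[1:])
    for k in range(nz):
        out += convolve(u3d[k], disc_kernel(c * abs(z[k]) / h),
                        mode="constant") * h     # zero extension outside the domain
    return out
```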

Remark 1

(Compactness of forward operator) The previous calculations even show that \({{\mathcal {F}}}\) is a compact operator from \(L^\infty ({\Omega _{\text {3D}}})\) to \(C^0({\Omega _{\text {2D}}})\) (as expected for convolutions, even though we only extract a slice from the convolution result). Indeed, due to \(|[\chi *u](x)-[\chi *u]({\tilde{x}})|\le \Vert \chi (x-\cdot )-\chi ({\tilde{x}}-\cdot )\Vert _{L^1}\Vert u\Vert _{L^\infty }\), the image under \({{\mathcal {F}}}\) of a bounded subset \(A\subset L^\infty ({\Omega _{\text {3D}}})\) is uniformly bounded and equicontinuous and thus relatively compact in \(C^0({\Omega _{\text {2D}}})\) by the Arzelà–Ascoli theorem. However, in our analysis we will not make use of this fact.

Remark 2

(Less image regularity) With additional conditions on the blurring kernel \(\chi \), one may reduce the image regularity to \(u_{\text {3D}}\in L^q({\Omega _{\text {3D}}})\) and obtain a continuous linear forward operator \({{\mathcal {F}}}:L^q({\Omega _{\text {3D}}})\rightarrow L^q({\Omega _{\text {2D}}})\) for any \(q\ge 1\). Indeed, assume that \(\Vert \chi (\cdot ,\cdot ,x_3)\Vert _{L^1}\le C\) uniformly for almost all \(x_3\in {\mathbb {R}}\) and some \(C>0\) (note that this is satisfied by our example kernel), then for \(u\in L^q({\Omega _{\text {3D}}})\) we have

$$\begin{aligned} {{\mathcal {F}}}u(x_1,x_2) =\int _{\mathbb {R}}\left[ u(\cdot ,\cdot ,x_3)*\chi (\cdot ,\cdot ,-x_3)\right] (x_1,x_2)\,{\mathrm {d}}x_3 \end{aligned}$$

using Fubini’s theorem and thus \(|{{\mathcal {F}}}u(x_1,x_2)|^q\le K\int _{\mathbb {R}}|u(\cdot ,\cdot ,x_3)*\chi (\cdot ,\cdot ,-x_3)|^q\,{\mathrm {d}}x_3\) by Jensen’s inequality for some \(K>0\) depending on the support of \(\chi \). Therefore, by Fubini’s theorem and Young’s convolution inequality, we have

$$\begin{aligned} \Vert {{\mathcal {F}}}u\Vert _{L^q}^q= & {} \int _{\Omega _{\text {2D}}}|{{\mathcal {F}}}u|^q(x_1,x_2)\,{\mathrm {d}}(x_1,x_2)\\\le & {} K\int _{\mathbb {R}}\int _{\Omega _{\text {2D}}}|u(\cdot ,\cdot ,x_3)*\chi (\cdot ,\cdot ,-x_3)|^q\,{\mathrm {d}}(x_1,x_2)\,{\mathrm {d}}x_3\\\le & {} K\int _{\mathbb {R}}\Vert u(\cdot ,\cdot ,x_3)\Vert _{L^q}^q\Vert \chi (\cdot ,\cdot ,-x_3)\Vert _{L^1}^q\,{\mathrm {d}}x_3 \\\le & {} KC^q\int _{\mathbb {R}}\Vert u(\cdot ,\cdot ,x_3)\Vert _{L^q}^q\,{\mathrm {d}}x_3 \\\le & {} KC^q\Vert u\Vert _{L^q}^q. \end{aligned}$$

Between the 2D and the 3D image acquisition, the tissue is deformed by a deformation \(y:{\Omega _{\text {3D}}}\rightarrow {\mathbb {R}}^3\), where for any point \(x\in {\Omega _{\text {3D}}}\) the value y(x) shall be interpreted as the new position assumed during the 3D image acquisition. The inverse deformation \(y^{-1}\) thus moves the reference configuration back into the configuration during the 2D microscopy so that we expect \(u_{\text {2D}}={{\mathcal {F}}}(u_{\text {3D}}\circ y)\). However, due to noise in the image acquisition and additional artefacts not included in our model (such as diffuse background signals), one cannot expect equality. Instead we shall seek a deformation y that leads to a small dissimilarity measure

$$\begin{aligned} {\mathcal {J}}^d[y] = \int _{{\Omega _{\text {2D}}}} d\left( {{\mathcal {F}}}(u_{\text {3D}}\circ y)(x),\, u_{\text {2D}}(x)\right) \, {\mathrm {d}}x, \end{aligned}$$

where \(d:{\mathbb {R}}\times {\mathbb {R}}\rightarrow [0,\infty )\) measures the distance between two image intensities (again \(u_{\text {3D}}\) is extended by zero outside \({\Omega _{\text {3D}}}\)). The optimal choice of d is in general determined by the type of noise contained in the image data. We will focus on the \(L^1\) and \(L^2\) distance measures obtained with

$$\begin{aligned} d_1(a,b) = |a-b|, \quad d_2(a,b) = |a-b|^2, \end{aligned}$$

respectively. The former is well known to be appropriate if the data contain strong outliers, while the latter is appropriate for additive Gaussian noise. Other choices include correlation-based distance measures (see, for instance, [19, Sct. 6.1]) or the Kullback–Leibler divergence \(d_{\mathrm{KL}}(a,b)=b-a+a\log \frac{a}{b}\) in case of Poisson noise.
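For concreteness, the three intensity distances and the resulting discrete dissimilarity can be written down in a few lines; the following sketch uses a plain pixel sum as quadrature, and the eps guard in the Kullback–Leibler term is our addition to avoid evaluating log at zero.

```python
import numpy as np

def d1(a, b):                      # L1 distance, robust to strong outliers
    return np.abs(a - b)

def d2(a, b):                      # squared L2 distance, additive Gaussian noise
    return (a - b) ** 2

def d_kl(a, b, eps=1e-12):         # Kullback-Leibler divergence, Poisson noise
    return b - a + a * np.log((a + eps) / (b + eps))

def dissimilarity(projected, u2d, d=d2, pixel_area=1.0):
    """J^d[y] for a given projected image F(u3d o y), as a quadrature sum."""
    return pixel_area * np.sum(d(projected, u2d))
```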

The minimization of \({\mathcal {J}}^d[y]\) for general mappings \(y:{\Omega _{\text {3D}}}\rightarrow {\mathbb {R}}^3\) is neither physically reasonable nor well-posed. Indeed, y might be discontinuous or not even measurable so that the composition \(u_{\text {3D}}\circ y\) does not make sense. Thus, we have to regularize the deformation y by imposing additional constraints and adding an extra energy term. Concerning physical constraints, y should neither reverse the orientation of the deformed body \({\Omega _{\text {3D}}}\) nor should it introduce self-intersections of the material. The first condition translates to

$$\begin{aligned} \det \nabla y > 0 {\text { almost everywhere,}} \end{aligned}$$

whereas the second condition is imposed by additionally demanding

$$\begin{aligned} \int _{{\Omega _{\text {3D}}}} \det \, \nabla y(x)\, {\mathrm {d}}x \ \le {\mathrm {vol}}(y({\Omega _{\text {3D}}})), \end{aligned}$$

where \({\mathrm {vol}}\) shall denote the three-dimensional Lebesgue measure. The advantage of this constraint is that it guarantees the injectivity of y if combined with \(\det \nabla y > 0\) and that it also holds for the weak limit of a sequence \((y_k)_k\) satisfying the constraint, see for example [6, Thm. 7.9-1]. An alternative way to ensure global injectivity is to make y coincide on the domain boundary \(\partial {\Omega _{\text {3D}}}\) with a homeomorphism of \({\Omega _{\text {3D}}}\) [2]. Such a constraint is difficult to treat, though, unless one imposes Dirichlet boundary conditions on y. However, Dirichlet boundary conditions are somewhat unnatural for image registration problems as y is not known on the boundary, which is why we stick to the above inequality and impose no boundary conditions on y at all. We shall also constrain the maximal possible displacement according to

$$\begin{aligned} \Vert y-{\mathrm {id}}\Vert _{L^\infty }\le {\mathrm {diam}}({\Omega _{\text {2D}}}) \end{aligned}$$

in order not to shift the three-dimensional volume out of the visible area (recall that, if y is a deformation, then \(y-{\mathrm {id}}\) is the corresponding displacement). In addition to those hard constraints, unreasonable growth or shrinkage of lengths, areas, or volumes by the deformation should be penalized. Modelling the biological tissue as an elastic material, such a penalization can be achieved by adding an elastic deformation energy

$$\begin{aligned} {{\mathcal {R}}}[y]=\int _{\Omega _{\text {3D}}}W(\nabla y)\,{\mathrm {d}}x \end{aligned}$$

as regularization, which will prevent physically unreasonable deformations with too high elastic energy. Here, \(W:{\mathbb {R}}^{3\times 3}\rightarrow [0,\infty )\) represents the stored energy function. Since biological tissue is very soft, it typically undergoes considerable nonlinear deformation so that the elastic model should be geometrically nonlinear and, in particular, rigid motion invariant. Furthermore, due to the lack of better information we may assume a homogeneous and isotropic elastic constitutive law, meaning that W does not depend on the spatial location within \({\Omega _{\text {3D}}}\) and that \(W(AR)=W(A)\) for all rotation matrices R so that the material behaves the same way in all directions. Actually, biological tissue is usually neither homogeneous nor isotropic. Unfortunately, though, one rarely has sufficiently accurate information available on the constitutive law within \({\Omega _{\text {3D}}}\). In case one does, this information can of course be incorporated in W—the further development depends neither on homogeneity nor on isotropy (those are just most natural without additional data, and they simplify notation). Together, the above assumptions imply that the stored energy function can be written as a function of the singular values or of the invariants of its argument (see, for example, [6, Ch. 4]) so that

$$\begin{aligned} W(A)={\hat{W}}(\Vert A\Vert _F,\Vert {\mathrm {cof}}A\Vert _F,\det A) \end{aligned}$$

with the Frobenius norm \(\Vert A\Vert _F = \sqrt{\mathrm {tr}(A^T A)}\) and the cofactor matrix \({\mathrm {cof}}A=\det A A^{-T}\). Note that \(\Vert A\Vert _F\), \(\Vert {\mathrm {cof}}A\Vert _F\), and \(\det A\) control length, area, and volume changes, respectively. Furthermore, biological tissue is almost incompressible so that deviation from \(\det \nabla y=1\) should be strongly penalized. In our numerical experiments, we will use the particular example

$$\begin{aligned} W(\nabla y)&= c_1 \Vert \nabla y\Vert _F^2 + g(\det \nabla y) \end{aligned}$$

for a positive constant \(c_1\) and a convex function \(g:{\mathbb {R}}\rightarrow [0,\infty ]\) such that \(g(x) \rightarrow \infty \) as \(x \rightarrow 0\). To favour deformations obeying \(\det \nabla y \approx 1\), a term like \(|\det \nabla y-1|^2\) can be incorporated into the definition of g. Note that this W is then polyconvex (that is, it can be written as a convex function of \((\nabla y,{\mathrm {cof}}\nabla y,\det \nabla y)\)), which ensures weak lower semi-continuity of \({{\mathcal {R}}}\) as will be needed for the existence analysis. Summarizing, we arrive at the optimization problem

$$\begin{aligned} {\mathcal {E}}^d[y]={\mathcal {J}}^d[y]+{{\mathcal {R}}}[y]\rightarrow \min ! \end{aligned}$$

whose solution is the sought matching deformation.

Remark 3

(Bayesian perspective) The task of registering the two datasets can also be viewed as an inverse problem, where a measurement \(u_{\text {2D}}\) is given alongside with a model of the forward operator \(y\mapsto {{\mathcal {F}}}(u_{\text {3D}}\circ y)\), which maps a deformation of the tissue into the corresponding projection onto a planar image. Due to measurement noise and unknown modelling errors in the forward operator, the measurement is a random variable. Likewise, the sought deformation y can be interpreted as a random variable with different outcomes in different repetitions of the measurement. From a Bayesian perspective, one would now like to maximize the conditional probability

$$\begin{aligned} P({{\mathcal {F}}}(u_{\text {3D}}\circ y)|u_{\text {2D}}) = \frac{P(u_{\text {2D}}|{{\mathcal {F}}}(u_{\text {3D}}\circ y))P(y)}{P(u_{\text {2D}})} \end{aligned}$$

(we formally use P like a probability density over an infinite-dimensional space). After taking the negative logarithm, the optimization problem thus turns into

$$\begin{aligned}&\min _y \ -\log P({{\mathcal {F}}}(u_{\text {3D}}\circ y)|u_{\text {2D}}) \nonumber \\&\quad = \min _y \left[ -\log P(u_{\text {2D}}|{{\mathcal {F}}}(u_{\text {3D}}\circ y)) - \log P(y)\right. \nonumber \\&\qquad \left. + \log P(u_{\text {2D}}) \ \right] . \end{aligned}$$

The term \(\log P(u_{\text {2D}})\) is independent of y and can be neglected, and for the probability distribution of deformations it is reasonable to assume a Boltzmann-type distribution \(P(y)\sim \exp (-{{\mathcal {R}}}[y])\), in which the probability decreases exponentially with increasing deformation energy. Likewise, the probability distribution of the measurement \((u_{\text {2D}})_k\) in the kth image pixel is taken as \(P((u_{\text {2D}})_k|(v)_k)\sim \exp (-d(v_k,(u_{\text {2D}})_k))\), where \(v_k=({{\mathcal {F}}}(u_{\text {3D}}\circ y))_k\) is the expected value in pixel k. For instance, if the noise distribution is Gaussian we have \(P((u_{\text {2D}})_k|(v)_k)\sim \exp (-d_2(v_k,(u_{\text {2D}})_k))\). Integrating over all pixels, we obtain \(P(u_{\text {2D}}|{{\mathcal {F}}}(u_{\text {3D}}\circ y))\sim \exp (-\int _{\Omega _{\text {2D}}}d({{\mathcal {F}}}(u_{\text {3D}}\circ y)(x),u_{\text {2D}}(x))\,{\mathrm {d}}x)=\exp (-{\mathcal {J}}^d[y])\). Summarizing, we arrive at the same optimization problem, \(\min _y{\mathcal {J}}^d[y]+{{\mathcal {R}}}[y]\).

Below we briefly prove the existence of minimizers to our registration problem. Note that the lower semi-continuity of \({{\mathcal {R}}}\) and the properties of the limit function are well known in the field of nonlinear elasticity (see, for instance, [6, Thm. 7.9-1]); nevertheless, we include a short corresponding argument. The difference from classical results in nonlinear elasticity lies in the lack of a Dirichlet boundary condition for the deformation y and in the additional dissimilarity measure \({\mathcal {J}}^d\). Similar hyperelastic image registration problems have been considered in [4, 8, 25]. Our analysis differs from the previous ones in the incorporation of the linear blurring operator \({{\mathcal {F}}}\) and the enforcement of global injectivity. Furthermore, in contrast to [4] we allow the images \(u_{\text {3D}}\) and \(u_{\text {2D}}\) to have discontinuities since we only require \(L^\infty \) regularity, and we impose a constraint on the supremum norm of the deformation y, which the authors in [4] explicitly try to avoid. A further minor difference is the version of the lower semi-continuity result employed in [4]: While they use the variant allowing a weaker growth condition on W at the expense of having to penalize the cofactor matrix of the deformation gradient (in fact they just consider one specific choice for the stored energy W, though it can be readily generalized), we employ the variant that does not need the cofactor matrix since we believe that biological structures withstand length changes (through fibrous structures) and volume changes (as being mainly composed of water), but not area changes (which are controlled by the cofactor matrix). The different versions of possible lower semi-continuity settings are nicely summarized in [23, Thm. 3.6].

Theorem 1

(Existence of minimizer) Let \(u_{\text {3D}}\in L^\infty ({\Omega _{\text {3D}}})\) and \(u_{\text {2D}}\in L^\infty ({\Omega _{\text {2D}}})\). Also, let \({\mathcal {E}}^d = {\mathcal {J}}^{d} + {{\mathcal {R}}}\) for a finite nonnegative dissimilarity \(d(\cdot ,\cdot )\), convex and lower semi-continuous in its first argument, and a polyconvex lower semi-continuous stored energy function W satisfying

$$\begin{aligned} W(A) \ge C \left( \Vert A\Vert _F^p + \max \{0,\det A\}^{-r}\right) - \beta \end{aligned}$$

for some exponents \(p>3\), \(r>0\) and constants \(C,\beta >0\). Then \({\mathcal {E}}^d\) admits a minimizing deformation y on the set of admissible deformations

$$\begin{aligned} {\mathcal {A}}=\Big \{y\in W^{1,p}({\Omega _{\text {3D}}};{\mathbb {R}}^3)\ \Big |\ \det \nabla y>0{\text { a.e., }} \int _{{\Omega _{\text {3D}}}}\det \nabla y\,{\mathrm {d}}x\le {\mathrm {vol}}(y({\Omega _{\text {3D}}})),\ \Vert y-{\mathrm {id}}\Vert _{L^\infty }\le {\mathrm {diam}}({\Omega _{\text {2D}}})\Big \} \end{aligned}$$

(\(W^{1,p}\) denotes the standard Sobolev space). Furthermore, y is almost everywhere injective.

Proof

We follow the direct method of the calculus of variations. First note that \({\mathcal {E}}^d\) is bounded below by \(-\beta {\mathrm {vol}}({\Omega _{\text {3D}}})\) and that \(\inf _{{\mathcal {A}}}{\mathcal {E}}^d\) is finite due to \({\mathcal {E}}^d[{\mathrm {id}}]<\infty \).

Compactness: Consider a minimizing sequence \(y_k\in {\mathcal {A}}\), \(k=1,2,\ldots \), with \({\mathcal {E}}^d[y_k]\rightarrow \inf _{{\mathcal {A}}}{\mathcal {E}}^d\) monotonically as \(k\rightarrow \infty \). Potentially passing to a subsequence, we may additionally assume \(\liminf _{k\rightarrow \infty }{{\mathcal {R}}}[y_k]=\lim _{k\rightarrow \infty }{{\mathcal {R}}}[y_k]\). Due to the growth condition on W, we have

$$\begin{aligned} {\mathcal {E}}^d[y_1]\ge {\mathcal {E}}^d[y_k]\ge {{\mathcal {R}}}[y_k]\ge C\Vert \nabla y_k\Vert _{L^p}^p-\beta {\mathrm {vol}}({\Omega _{\text {3D}}}) \end{aligned}$$

so that \(\Vert \nabla y_k\Vert _{L^p}\) is uniformly bounded. Together with the admissibility condition \(\Vert y_k-{\mathrm {id}}\Vert _{L^\infty }\le {\mathrm {diam}}({\Omega _{\text {2D}}})\), we obtain uniform boundedness of \(\Vert y_k\Vert _{W^{1,p}}\) so that we can extract a weakly converging subsequence (still indexed by k) \(y_k\rightharpoonup y\) in \(W^{1,p}({\Omega _{\text {3D}}};{\mathbb {R}}^3)\). Due to \(p>3\), by Sobolev embedding we may even assume \(y_k\rightarrow y\) strongly in the space \(C^{0,\alpha }({\Omega _{\text {3D}}};{\mathbb {R}}^3)\) of Hölder continuous functions with exponent \(\alpha <1-3/p\).

Lower semi-continuity of \({{\mathcal {R}}}\): We have \({{\mathcal {R}}}[y]\le \liminf _{k\rightarrow \infty }{{\mathcal {R}}}[y_k]\) by the properties of W. Indeed, by Hölder’s inequality \({\mathrm {cof}}\nabla y_k\) and \(\det \nabla y_k\) are uniformly bounded in \(L^{p/2}({\Omega _{\text {3D}}})\) and \(L^{p/3}({\Omega _{\text {3D}}})\), respectively, and thus converge for a subsequence. By [6, Thm. 7.6-1], we even have

$$\begin{aligned}&{\mathrm {cof}}\nabla y_k\rightharpoonup {\mathrm {cof}}\nabla y {\text { in }} L^{p/2}({\Omega _{\text {3D}}}) \quad {\text {and}}\\&\det \nabla y_k\rightharpoonup \det \nabla y {\text { in }} L^{p/3}({\Omega _{\text {3D}}}). \end{aligned}$$

Now Mazur’s lemma implies the existence of a sequence of strongly and pointwise almost everywhere converging convex combinations

$$\begin{aligned} \sum _{i=k}^{N_k}a_i^k(\nabla y_i,{\mathrm {cof}}\nabla y_i,\det \nabla y_i)\rightarrow (\nabla y,{\mathrm {cof}}\nabla y,\det \nabla y) \end{aligned}$$

as \(k\rightarrow \infty \), where \({N_k}\ge k\) and the nonnegative coefficients \(a_i^k\) sum up to one. Since W is polyconvex, we can write \(W(A)={\tilde{W}}(A,{\mathrm {cof}}A,\det A)\) for a convex function \({\tilde{W}}:{\mathbb {R}}^{3\times 3}\times {\mathbb {R}}^{3\times 3}\times {\mathbb {R}}\rightarrow {\mathbb {R}}\). Thus, with Fatou’s lemma and the lower semi-continuity of W, we now have

$$\begin{aligned} {{\mathcal {R}}}[y]= & {} \int _{\Omega _{\text {3D}}}{\tilde{W}}(\nabla y,{\mathrm {cof}}\nabla y,\det \nabla y)\,{\mathrm {d}}x\\= & {} \int _{\Omega _{\text {3D}}}{\tilde{W}}\left( \lim _{k\rightarrow \infty }\sum _{i=k}^{{N_k}}a_i^k(\nabla y_i,{\mathrm {cof}}\nabla y_i,\det \nabla y_i)\right) \,{\mathrm {d}}x\\\le & {} \int _{\Omega _{\text {3D}}}\liminf _{k\rightarrow \infty }\sum _{i=k}^{{N_k}}a_i^k{\tilde{W}}(\nabla y_i,{\mathrm {cof}}\nabla y_i,\det \nabla y_i)\,{\mathrm {d}}x\\\le & {} \liminf _{k\rightarrow \infty }\sum _{i=k}^{{N_k}}a_i^k\int _{\Omega _{\text {3D}}}{\tilde{W}}(\nabla y_i,{\mathrm {cof}}\nabla y_i,\det \nabla y_i)\,{\mathrm {d}}x\\= & {} \liminf _{k\rightarrow \infty }\sum _{i=k}^{{N_k}}a_i^k{{\mathcal {R}}}[y_i] =\liminf _{k\rightarrow \infty }{{\mathcal {R}}}[y_k], \end{aligned}$$

where the last equality follows from the assumption \(\liminf _{k\rightarrow \infty }{{\mathcal {R}}}[y_k]=\lim _{k\rightarrow \infty }{{\mathcal {R}}}[y_k]\) and the fact that the coefficients \(a_i^k\) sum up to one.

Properties of limit function: The limit function y lies in \({\mathcal {A}}\). Indeed, by the uniform convergence \(y_k\rightarrow y\) we have \(\Vert y-{\mathrm {id}}\Vert _{L^\infty }=\lim _{k\rightarrow \infty }\Vert y_k-{\mathrm {id}}\Vert _{L^\infty }\le {\mathrm {diam}}({\Omega _{\text {2D}}})\). To see that \(\det \nabla y > 0\) holds almost everywhere, consider the set

$$\begin{aligned} S_{\varepsilon } = \{ x \in {\Omega _{\text {3D}}}|\det \nabla y(x) < \varepsilon \} \end{aligned}$$

for \(\varepsilon >0\). The growth condition on W and the lower semi-continuity of \({{\mathcal {R}}}\) imply

$$\begin{aligned}&{\mathrm {vol}}(S_{\varepsilon })C\varepsilon ^{-r}-\beta {\mathrm {vol}}({\Omega _{\text {3D}}}) \le {{\mathcal {R}}}[y] \le \liminf _{k\rightarrow \infty }{{\mathcal {R}}}[y_k] \\&\quad \le \liminf _{k\rightarrow \infty }{\mathcal {E}}^d[y_k] \le {\mathcal {E}}^d[y_1]. \end{aligned}$$

Thus, \({\mathrm {vol}}(S_{\varepsilon })\le \varepsilon ^r({\mathcal {E}}^d[y_1]+\beta {\mathrm {vol}}({\Omega _{\text {3D}}}))/C\rightarrow 0\) as \(\varepsilon \rightarrow 0\), which implies \(\det \nabla y > 0\) almost everywhere. Likewise, due to the weak convergence \(\det \nabla y_k\rightharpoonup \det \nabla y\) and the convergence \(y_k\rightarrow y\) in \(C^{0,\alpha }({\Omega _{\text {3D}}})\), we have

$$\begin{aligned}&\int _{\Omega _{\text {3D}}}\det \nabla y\,{\mathrm {d}}x =\lim _{k\rightarrow \infty }\int _{\Omega _{\text {3D}}}\det \nabla y_k\,{\mathrm {d}}x\\&\quad \le \lim _{k\rightarrow \infty }{\mathrm {vol}}(y_k({\Omega _{\text {3D}}})) ={\mathrm {vol}}(y({\Omega _{\text {3D}}})). \end{aligned}$$

Furthermore, y is injective almost everywhere, that is, the cardinality

$$\begin{aligned} N(y\,|\,v)={\mathrm {card}}(y^{-1}(\{v\})) \end{aligned}$$

equals 1 for almost every \(v\in y({\Omega _{\text {3D}}})\). Indeed, by the change of variables formula for Sobolev functions [14, Thm. 2] we have

$$\begin{aligned} {\mathrm {vol}}(y({\Omega _{\text {3D}}})) \le \int _{y({\Omega _{\text {3D}}})}N(y\,|\,v)\,{\mathrm {d}}v =\int _{\Omega _{\text {3D}}}\det \nabla y(x)\,{\mathrm {d}}x. \end{aligned}$$

Since also the opposite inequality holds, we must have equality and \(N(y\,|\,v)=1\) almost everywhere.

Lower semi-continuity of \({\mathcal {J}}^d\): Note that we have \(u_{\text {3D}}\circ y_k\rightarrow u_{\text {3D}}\circ y\) as \(k\rightarrow \infty \) in any \(L^q({\Omega _{\text {3D}}})\) with \(q\in [1,\infty )\). Indeed, for a Dirac sequence \(G_{\delta }\) of smooth mollifiers we have

$$\begin{aligned}&\Vert u_{\text {3D}}\circ y_k-u_{\text {3D}}\circ y \Vert _{L^q} \le \Vert u_{\text {3D}}\circ y_k-(G_{\delta } *u_{\text {3D}}) \circ y_k\Vert _{L^q}\\&\quad + \Vert (G_{\delta } *u_{\text {3D}}) \circ y_k-(G_{\delta } *u_{\text {3D}}) \circ y\Vert _{L^q}\\&\quad + \Vert (G_{\delta } *u_{\text {3D}}) \circ y-u_{\text {3D}}\circ y\Vert _{L^q}. \end{aligned}$$

Abbreviating \(s=(r+1)/r\) and employing Hölder’s inequality, for the first summand we obtain

$$\begin{aligned}&\int _{{\Omega _{\text {3D}}}} |u_{\text {3D}}\circ y_k - (G_{\delta } *u_{\text {3D}}) \circ y_k |^q \, {\mathrm {d}}x \\&\quad \le \int _{{\Omega _{\text {3D}}}}\left| \frac{| u_{\text {3D}}\circ y_k- (G_{\delta } *u_{\text {3D}}) \circ y_k |^q }{ \det \nabla y_k^{\frac{1}{s}} } \det \nabla y_k^{\frac{1}{s}}\right| \, {\mathrm {d}}x\\&\quad \le \left( \int _{{\Omega _{\text {3D}}}} | u_{\text {3D}}\circ y_k -(G_{\delta } *u_{\text {3D}}) \circ y_k |^{qs} \det \nabla y_k \, {\mathrm {d}}x \right) ^{\frac{1}{s}}\\&\quad \quad \left( \int _{\Omega _{\text {3D}}}\det \nabla y_k^{-\frac{1}{s-1}}\,{\mathrm {d}}x\right) ^{1-\frac{1}{s}}\\&\quad =\left( \int _{y_k({\Omega _{\text {3D}}})} | u_{\text {3D}}-G_{\delta } *u_{\text {3D}}|^{qs} N(y_k\,|\,v) \, {\mathrm {d}}v \right) ^{\frac{1}{s}}\\&\quad \quad \left( \int _{\Omega _{\text {3D}}}\det \nabla y_k^{-r}\,{\mathrm {d}}x\right) ^{\frac{1}{r+1}}\\&\quad \le \Vert u_{\text {3D}}- G_{\delta } *u_{\text {3D}}\Vert _{L^{qs}({\mathbb {R}}^3)}^q \left( \frac{{{\mathcal {R}}}[y_k]+\beta {\mathrm {vol}}({\Omega _{\text {3D}}})}{C}\right) ^{\frac{1}{r+1}}, \end{aligned}$$

where we used the change of variables for Sobolev functions [14, Thm. 2] as well as \(N(y_k\,|\,v)=1\) for almost all \(v\in y_k({\Omega _{\text {3D}}})\) (by the same argument as for y). Since \(u_{\text {3D}}\in L^{qs}({\Omega _{\text {3D}}})\) and \({{\mathcal {R}}}[y_k]\le {\mathcal {E}}^d[y_k]\le {\mathcal {E}}^d[y_1]\), the right-hand side converges to 0 as \(k\rightarrow \infty \) and then \(\delta \rightarrow 0\). For the second summand, we observe

$$\begin{aligned} \Vert (G_{\delta } *u_{\text {3D}}) \circ y_k-(G_{\delta } *u_{\text {3D}}) \circ y \Vert _{L^q}^q \le L_{\delta }^q \Vert y_k- y\Vert _{L^q}^q, \end{aligned}$$

\(L_{\delta }\) being the Lipschitz constant of \(G_{\delta } *u_{\text {3D}}\). Again, letting first \(k\rightarrow \infty \) and then \(\delta \rightarrow 0\) the right-hand side converges to 0. The third summand is treated like the first so that in summary \(u_{\text {3D}}\circ y_k\rightarrow u_{\text {3D}}\circ y\).

Due to \(u_{\text {3D}}\in L^\infty ({\Omega _{\text {3D}}})\), the composition \(u_{\text {3D}}\circ y_k\) is uniformly bounded in \(L^\infty ({\Omega _{\text {3D}}})\) so that any subsequence contains another weakly-* converging subsequence in \(L^\infty ({\Omega _{\text {3D}}})\). Due to the strong convergence \(u_{\text {3D}}\circ y_k\rightarrow u_{\text {3D}}\circ y\) in \(L^q({\Omega _{\text {3D}}})\), the limit must be the same and thus

$$\begin{aligned} u_{\text {3D}}\circ y_k{\mathop {\rightharpoonup }\limits ^{*}}u_{\text {3D}}\circ y\quad {\text {in}}\quad L^\infty ({\Omega _{\text {3D}}}) \end{aligned}$$

for the whole sequence. Furthermore, it is straightforward to check that \({{\mathcal {F}}}\) is the adjoint operator to

$$\begin{aligned}&{{\mathcal {F}}}':L^1({\Omega _{\text {2D}}})\rightarrow L^1({\Omega _{\text {3D}}}),\quad \\&{{\mathcal {F}}}'(g)(x)=(g*{\bar{\chi }}_{x_3})(x_1,x_2)\\&{\text { with }} {\bar{\chi }}_{x_3}(x_1,x_2)=\chi (-x_1,-x_2,-x_3), \end{aligned}$$

which is a bounded linear operator due to

$$\begin{aligned} \Vert {{\mathcal {F}}}'g\Vert _{L^1}&=\int _{-1}^1\Vert g*\chi (-\cdot ,-\cdot ,-x_3)\Vert _{L^1}\,{\mathrm {d}}x_3 \\&\le \int _{-1}^1\Vert g\Vert _{L^1}\Vert \chi (-\cdot ,-\cdot ,-x_3)\Vert _{L^1}\,{\mathrm {d}}x_3\\ {}&=\Vert g\Vert _{L^1}\Vert \chi \Vert _{L^1} \end{aligned}$$

by Young’s convolution inequality. As a consequence, \({{\mathcal {F}}}(u_{\text {3D}}\circ y_k){\mathop {\rightharpoonup }\limits ^{*}}{{\mathcal {F}}}(u_{\text {3D}}\circ y)\) in \(L^\infty ({\Omega _{\text {2D}}})\), since for any \(g\in L^1({\Omega _{\text {2D}}})\) we have

$$\begin{aligned}&\int _{\Omega _{\text {2D}}}g{{\mathcal {F}}}(u_{\text {3D}}\circ y_k)\,{\mathrm {d}}x =\int _{\Omega _{\text {3D}}}{{\mathcal {F}}}'(g)\,u_{\text {3D}}\circ y_k\,{\mathrm {d}}x\\&\quad \rightarrow \int _{\Omega _{\text {3D}}}{{\mathcal {F}}}'(g)\,u_{\text {3D}}\circ y\,{\mathrm {d}}x =\int _{\Omega _{\text {2D}}}g{{\mathcal {F}}}(u_{\text {3D}}\circ y)\,{\mathrm {d}}x \end{aligned}$$

as \(k\rightarrow \infty \). The convexity of d in its first argument now implies \(\liminf _{k\rightarrow \infty }{\mathcal {J}}^d[y_k]\ge {\mathcal {J}}^d[y]\), as desired.

Summarizing, \({\mathcal {E}}^d[y]={\mathcal {J}}^d[y]+{{\mathcal {R}}}[y]\le \liminf _{k\rightarrow \infty }\left( {\mathcal {J}}^d[y_k]+{{\mathcal {R}}}[y_k]\right) =\liminf _{k\rightarrow \infty }{\mathcal {E}}^d[y_k]=\inf _{{\mathcal {A}}}{\mathcal {E}}^d\) so that \(y\in {\mathcal {A}}\) must be a minimizer. \(\square \)

3 Numerical Implementation

In this section, we discuss the discretization and numerical minimization of the energy functional

$$\begin{aligned} {\mathcal {E}}^d[y]&= \int _{{\Omega _{\text {2D}}}}d\left( \,{{\mathcal {F}}}(u_{\text {3D}}\circ y)(x),\, u_{\text {2D}}(x)\, \right) \, {\mathrm {d}}x \\&\quad + \int _{{\Omega _{\text {3D}}}} W(\nabla y(x)) \ {\mathrm {d}}x, \end{aligned}$$

which is cumbersome due to the nonlocal convolution operator in \({{\mathcal {F}}}\), the composition of discretized functions, and the nondifferentiability of the discretized functions. As before, d denotes either the Euclidean distance \(d_1(x,y) = |x-y|\) or its square \(d_2(x,y) = |x-y|^2\), but other choices can be implemented in the same way. To obtain a differentiable functional in the former case (which will allow simpler numerics), we make the modification \(d_1(x,y) = \sqrt{(x-y)^2+\delta ^2}\) with \(\delta > 0\) a small regularization parameter. In our implementation, the stored energy function W has the form

$$\begin{aligned} W(A)={\left\{ \begin{array}{ll} c_1 \Vert A\Vert _F^2 + c_2(\det A)^{-1} \\ + c_3(1-\det A)^2 \\ + c_4(\Vert A\Vert _F^2-3)^2 + D&{}{\text {if}} \det A>0,\\ \infty &{}{\text {else}} \end{array}\right. } \end{aligned}$$
(1)

with constants \(c_1,c_2,c_3,c_4 \ge 0\), \(c_2=2c_1\) and \(D=-3c_1-c_2\) such that \(W\ge 0\) and \(W(S) = 0\) for rotation matrices S. This specific choice satisfies the polyconvexity and the growth condition of Theorem 1 as long as \(c_2,c_4>0\) and \(c_4\le c_1/3\) (we pick \((c_1,c_2,c_3,c_4,D)=(1,2,4,0.1,-5)\)). As already indicated in the previous section, we deliberately do not include a term depending on \({\mathrm {cof}}A\) since biological tissues typically contain no structures that specifically withstand area changes. For certain tissues at least, this also seems to be supported experimentally, see for instance [16], where the hyperelastic material models describing soft tissue best only depend on the first and third strain invariant \(\Vert \nabla y\Vert _F\) and \(\det \nabla y\). Note that the growth conditions of Theorem 1 only concern the asymptotic behaviour for infinite strains so that in principle any constitutive relationship may be chosen in the regime of finite strains (if necessary it can be complemented by an energy term that is only active for large enough strains and ensures sufficient growth). Our above choice is a particular instance of the (compressible) Yeoh model (the latter was most consistent with experimental data in [16]). A popular alternative is the family of Ogden material models, which allow fitting many parameters to measurements.
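For reference, a direct transcription of the stored energy (1) with the parameter values quoted above might look as follows; a minimal NumPy sketch, which also checks that W vanishes exactly on rotations (here at the identity).

```python
import numpy as np

C1, C2, C3, C4 = 1.0, 2.0, 4.0, 0.1   # c2 = 2*c1, parameters from the text
D = -3 * C1 - C2                       # = -5, so that W(rotation) = 0

def W(A):
    """Stored energy (1); infinite for non-orientation-preserving gradients."""
    det = np.linalg.det(A)
    if det <= 0:
        return np.inf
    I1 = np.sum(A * A)                 # squared Frobenius norm of A
    return C1 * I1 + C2 / det + C3 * (1 - det) ** 2 + C4 * (I1 - 3) ** 2 + D

# sanity check: the reference configuration carries zero elastic energy
assert abs(W(np.eye(3))) < 1e-12
```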

Note that in our experiments the constraints \(\int _{{\Omega _{\text {3D}}}} \det \nabla y(x)\, {\mathrm {d}}x \le {\mathrm {vol}}(y({\Omega _{\text {3D}}}))\) and \(\Vert y-{\mathrm {id}}\Vert _{L^\infty }\le {\mathrm {diam}}({\Omega _{\text {2D}}})\) were always satisfied without explicit enforcement. The latter is trivial to check numerically or even just by visual inspection of the deformation y, and also the former can readily be checked by visual inspection of the deformed domain: By a classical result due to John Ball [2, Thm. 1(ii)] the condition is implied if y coincides on \(\partial {\Omega _{\text {3D}}}\) with a homeomorphism of \({\Omega _{\text {3D}}}\), and by the generalized Jordan–Schoenflies theorem this is the case for piecewise smooth (thus in particular discretized) y if the deformed domain boundary has no self-intersections (actually this result seems to go back already to the work of James W. Alexander in connection with the Alexander horned sphere). The automatic fulfilment of these conditions is not too surprising: There would be no benefit in violating them for the image pairs \(u_{\text {2D}}\) and \(u_{\text {3D}}\) used in our experiments. In principle, it would be possible to let the minimization algorithm run into a deformation y violating the conditions by carefully crafting the initialization of the algorithm (for instance, in Fig. 6 one could initialize the deformation such that the left cuboid is moved on top of the right one without local self-penetration). However, the associated cost \({\mathcal {E}}^d[y]\) is so high that without special preparation of the initialization this region of the energy landscape would not be visited during minimization. For the same reason, the growth condition imposed in Theorem 1 is rarely necessary from a practical perspective: Imagine that the minimizer y has a bounded gradient (which is usually the case for physiological deformations of tissue). Then it actually does not matter how the stored energy W behaves for larger deformation gradients: y will stay a local minimizer even if W is changed for large deformation gradients so as to violate the growth conditions of Theorem 1. The significance of these conditions is that at some point y may cease to be a global minimizer (and global minimizers may cease to exist altogether due to a lack of lower semi-continuity), but a minimization algorithm will typically not see this global behaviour.

Assuming a twice differentiable 3D image \(u_{\text {3D}}\), the first and second Gâteaux derivatives of \({\mathcal {J}}^d\) in \(y\in {\mathcal {A}}\) for suitable variations \(\phi \) and \(\psi \) are

$$\begin{aligned} \partial {\mathcal {J}}^d[y](\phi )&= \int _{{\Omega _{\text {2D}}}} \partial _1d(\, {{\mathcal {F}}}(u_{\text {3D}}\circ y)(x),\, u_{\text {2D}}(x))\, \\&\quad \left( \chi *[\phi \cdot (\nabla u_{\text {3D}})\circ y] \right) (x_1,x_2,0)\ {\mathrm {d}}x, \\ \partial ^2 {\mathcal {J}}^d[y](\phi , \psi )&={\mathcal {H}}^{\mathrm L}(\phi ,\psi )+{\mathcal {H}}^{\mathrm {NL}}(\phi ,\psi )\quad {\text {with}}\\ {\mathcal {H}}^{\mathrm L}(\phi ,\psi )&= \int _{{\Omega _{\text {2D}}}} \partial _1d(\, {{\mathcal {F}}}(u_{\text {3D}}\circ y)(x),\, u_{\text {2D}}(x)) \\&\quad \left( \chi *\left[ \psi ^T\, (({\mathrm {Hess}}\, u_{\text {3D}})\circ y) \, \phi \right] \right) (x_1,x_2,0) \, {\mathrm {d}}x, \\ {\mathcal {H}}^{\mathrm {NL}}(\phi ,\psi )&= \int _{{\Omega _{\text {2D}}}} \partial _1^2 d(\, {{\mathcal {F}}}(u_{\text {3D}}\circ y)(x),\, u_{\text {2D}}(x))\\&\quad \, \left( \chi *[\phi \cdot (\nabla u_{\text {3D}})\circ y]\right) (x_1,x_2,0)\\&\quad \,\left( \chi *[\psi \cdot (\nabla u_{\text {3D}})\circ y]\right) (x_1,x_2,0)\, {\mathrm {d}}x, \end{aligned}$$

where \({\mathrm {Hess}}\, u_{\text {3D}}\) denotes the Hessian of \(u_{\text {3D}}\). While for convex d the second summand in \(\partial ^2 {\mathcal {J}}^d[y]\) is always positive semi-definite, the first summand may destroy this definiteness.

Actually, real images are typically only piecewise smooth so that the above differentiability assumptions are too strong. However, for implementation purposes one may smooth the images to facilitate the energy minimization, where the amount of smoothing is gradually reduced. In our implementation, this is accomplished by the discretization and multilevel scheme introduced in the following paragraphs. As will be seen in the right charts of Fig. 4, the dissimilarity measure indeed becomes smoothed as a consequence. In fact, its second derivative will only be needed for the Newton method, which we will compare to alternative first-order minimization methods.

For the sake of completeness, we also write down the expressions for the first and second Gâteaux derivative of the hyperelastic regularizer. Rewriting W defined in (1) in the form

$$\begin{aligned}&W(A)={\bar{W}}(\Vert A\Vert _F^2,\det A) \quad {\text {with}}\\&\quad {\bar{W}}(I_1, I_3) = c_1 I_1 + c_2 I_3^{-1} + c_3(1-I_3)^2 +c_4(I_1-3)^2 + D, \end{aligned}$$

its partial derivatives are given by

$$\begin{aligned} \partial _1{\bar{W}}(I_1, I_3)&= c_1+2c_4(I_1-3),\\ \partial _2{\bar{W}}(I_1, I_3)&= -c_2 I_3^{-2} - 2c_3 (1-I_3),\\ \partial _1^2{\bar{W}}(I_1, I_3)&= 2c_4,\\ \partial _1\partial _2{\bar{W}}(I_1, I_3)&= 0,\\ \partial _2^2{\bar{W}}(I_1, I_3)&= 2\, c_2 I_3^{-3} + 2c_3. \end{aligned}$$
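Analytic derivative formulas like these are error-prone, so it can pay off to verify them against central finite differences before plugging them into a Newton scheme; a minimal sketch with the parameters from above:

```python
import numpy as np

C1, C2, C3, C4, D = 1.0, 2.0, 4.0, 0.1, -5.0

def W_bar(I1, I3):
    return C1 * I1 + C2 / I3 + C3 * (1 - I3) ** 2 + C4 * (I1 - 3) ** 2 + D

def dW_bar(I1, I3):
    """Analytic partials (w.r.t. I1 and I3) of W_bar as stated above."""
    return C1 + 2 * C4 * (I1 - 3), -C2 / I3**2 - 2 * C3 * (1 - I3)

# central finite differences agree up to O(h^2)
I1, I3, h = 3.7, 0.9, 1e-5
fd1 = (W_bar(I1 + h, I3) - W_bar(I1 - h, I3)) / (2 * h)
fd2 = (W_bar(I1, I3 + h) - W_bar(I1, I3 - h)) / (2 * h)
assert np.allclose(dW_bar(I1, I3), (fd1, fd2), atol=1e-6)
```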

With the help of the identities \(\partial _A\det (A)(B) = \mathrm {tr}(B^T{\mathrm {cof}}A)\) and \(\partial _A(\Vert A\Vert _F^2)(B) = 2\mathrm {tr}(B^TA)\) and abbreviating \(I_1=\Vert \nabla y\Vert _F^2\) and \(I_3=\det \nabla y\), the Gâteaux derivatives of the hyperelastic regularizer read

$$\begin{aligned} \partial {{\mathcal {R}}}[y](\phi )&= \int _{{\Omega _{\text {3D}}}} 2\partial _1{\bar{W}}(I_1, I_3)\, \mathrm {tr}(\nabla \phi ^T\nabla y)\\&\quad + \partial _2{\bar{W}}(I_1, I_3)\, \mathrm {tr}(\nabla \phi ^T {\mathrm {cof}}\nabla y)\, {\mathrm {d}}x\,, \\ \partial ^2 {{\mathcal {R}}}[y](\phi , \psi )&= \int _{{\Omega _{\text {3D}}}}4\partial _1^2{\bar{W}}(I_1,I_3)\mathrm {tr}(\nabla \phi ^T\nabla y)\mathrm {tr}(\nabla \psi ^T\nabla y)\\&\quad + 2\partial _1{\bar{W}}(I_1, I_3)\mathrm {tr}(\nabla \phi ^T\nabla \psi )\\&\quad + \partial _2^2{\bar{W}}(I_1, I_3)\, \mathrm {tr}(\nabla \psi ^T {\mathrm {cof}}\nabla y)\, \mathrm {tr}(\nabla \phi ^T {\mathrm {cof}}\nabla y) \nonumber \\&\quad + \frac{\partial _2{\bar{W}}(I_1, I_3)}{I_3}\, \left[ \mathrm {tr}(\nabla \psi ^T {\mathrm {cof}}\nabla y)\,\mathrm {tr}(\nabla \phi ^T{\mathrm {cof}}\nabla y) \right. \\&\quad \left. -\mathrm {tr}(\nabla \phi ^T{\mathrm {cof}}\nabla y\,\nabla \psi ^T{\mathrm {cof}}\nabla y) \right] {\mathrm {d}}x. \end{aligned}$$

Spatial discretization The image domains \({\Omega _{\text {2D}}}\) and \({\Omega _{\text {3D}}}\) are discretized by two dyadically nested hierarchies of regular rectilinear grids \(({\mathcal {T}}_{\mathrm {2D}}^n)_{n=1,\ldots , N}\) and \(({\mathcal {T}}_{\mathrm {3D}}^n)_{n=1,\ldots , N}\), respectively. The number of nodes in the nth grid along each coordinate direction is \(M_n=2^n+1\), where the resolution of the given discrete input images determines N, the level of the finest grids \({\mathcal {T}}_{\mathrm {2D}}^N\) and \({\mathcal {T}}_{\mathrm {3D}}^N\).

Introducing multilinear finite element basis functions on each grid gives rise to a hierarchy of \(C^0\)-finite element spaces \(X^n = (X^n_{_{\text {2D}}}\times X^n_{_{\text {3D}}})_{n=1,\ldots ,N}\) with \(X^n \subset X^m\) whenever \(n \le m\) and corresponding restriction and prolongation operators chosen as follows. On a one-dimensional grid with \(M_n\) nodes, a finite element function \(\xi \) can be identified with the vector \((\xi _1,\ldots ,\xi _{M_n})\) of its nodal values. On such a grid, we define the one-dimensional restriction and prolongation operators as

$$\begin{aligned} {\mathfrak {R}}_n^{_{\text {1D}}}: \left( \xi _j\right) _{1\le j \le M_n}\&\mapsto \ \left( \tfrac{\xi _{2j-2}+2\xi _{2j-1}+\xi _{2j}}{4}\right) _{1\le j\le (M_n+1)/2}, \\ {\mathfrak {P}}_n^{_{\text {1D}}}: \left( \xi _j\right) _{1\le j \le M_n}\&\mapsto \ (\xi _1, \tfrac{\xi _1+\xi _2}{2}, \xi _2, \tfrac{\xi _2+\xi _3}{2}, \ldots , \xi _{M_n-1}, \\ \tfrac{\xi _{M_n-1}+\xi _{M_n}}{2}, \xi _{M_n}) \end{aligned}$$

(for ease of notation, we set \(\xi _0=\xi _1\) and \(\xi _{M_n+1}=\xi _{M_n}\)). The restriction and prolongation operators

$$\begin{aligned} {\mathfrak {R}}_n: X^n \rightarrow X^{n-1}, \quad {\mathfrak {P}}_n: X^{n} \rightarrow X^{n+1} \end{aligned}$$

are then obtained by applying consecutively \({\mathfrak {R}}_n^{_{\text {1D}}}\) and \({\mathfrak {P}}_n^{_{\text {1D}}}\) along each coordinate direction of the grid.
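In code, the one-dimensional restriction and prolongation operators amount to a few lines each; the following NumPy sketch realizes the ghost values \(\xi _0=\xi _1\) and \(\xi _{M_n+1}=\xi _{M_n}\) by edge padding.

```python
import numpy as np

def restrict_1d(xi):
    """(xi_{2j-2} + 2*xi_{2j-1} + xi_{2j}) / 4 for j = 1, ..., (M_n+1)/2."""
    p = np.pad(xi, 1, mode="edge")            # ghost nodes xi_0 and xi_{M_n+1}
    return (p[:-2:2] + 2 * p[1:-1:2] + p[2::2]) / 4

def prolongate_1d(xi):
    """Keep nodal values at even positions, insert midpoint averages between them."""
    out = np.empty(2 * len(xi) - 1)
    out[::2] = xi
    out[1::2] = (xi[:-1] + xi[1:]) / 2
    return out

xi = np.arange(5.0)                            # grid level n = 2, M_2 = 5 nodes
assert len(restrict_1d(xi)) == 3               # M_1 = 3
assert len(prolongate_1d(xi)) == 9             # M_3 = 9
```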

Fixing a grid \({\mathcal {T}}_{\mathrm {3D}}^n\) with \(N_n=(2^n+1)^3\) nodes \((x^1, \ldots , x^{N_n})\), we denote the finite element basis functions on \({\mathcal {T}}_{\mathrm {3D}}^n\) by \(\phi _1^n,\ldots ,\phi _{N_n}^n\). The finite element representation of \(u_{\text {3D}}\) in \(X^N_{_{\text {3D}}}\) is taken as \(u_{\text {3D}}^N=\sum _{j=1}^{N_N} u_{\text {3D}}(x^j) \phi _j^N\) and in \(X^n_{_{\text {3D}}}\) as \(u_{\text {3D}}^n={\mathfrak {R}}_{n+1}\cdots {\mathfrak {R}}_Nu_{\text {3D}}^N\) (note that we will denote discretized functions on grid level n with a superscript n). Similarly, the discretized deformation is expressed as \(y^n = \sum _{j=1}^{N_n} y^n_j \phi _j^n\) with nodal coefficients \(y^n_j\in {\mathbb {R}}^3\). The discretized version \(u_{\text {2D}}^n\) of \(u_{\text {2D}}\) on grid \({\mathcal {T}}_{\mathrm {2D}}^n\) is defined in an analogous way.

Since images in applications are typically given on rectilinear pixel or voxel grids, the use of multilinear finite elements is quite natural and efficient. It represents the primary discretization method in the QuocMesh Library, a C++ library dedicated to image processing, which we use for our numerical implementation. The most prominent alternative would be the use of linear finite elements on a tetrahedral submesh of the rectilinear grid. However, unless one introduces additional nodes in order to achieve a symmetric subdivision of all rectilinear elements (for instance, [4, 26] almost double the degrees of freedom), this discretization is known to induce a symmetry-breaking discretization bias. The downside of multilinear finite elements, on the other hand, is that, unlike for linear finite elements, the discrete deformation gradient \(\nabla y^n\) is not piecewise constant so that the quadrature of the regularization term \({{\mathcal {R}}}\) is not exact (while the dissimilarity measure \({\mathcal {J}}^d\) cannot be exactly integrated with any type of finite elements anyway due to the nonlinearity of the composition \(u_{\text {3D}}^n\circ y^n\)). This means that without considerable additional effort one can only exclude negative determinants \(\det \nabla y^n\) at the quadrature points so that a slightly negative determinant might still occur in tiny regions of a few elements. However, this is usually not problematic in applications and is conceptually comparable to the quite successful use of nonconformal finite elements in the numerical solution of partial differential equations, where the violation of regularity requirements is reduced during grid refinement and vanishes in the limit. Such grid refinement does involve an additional technicality, though, which will be described later on.

Discretized cost functional The evaluation of the dissimilarity measure \({\mathcal {J}}^d\) and its derivatives requires the computation of \({{\mathcal {F}}}(u_{\text {3D}}\circ y)\). On the discretized level, the convolution contained in \({{\mathcal {F}}}\) is computed with the help of the discrete Fourier transform (DFT). To this end, \(u_{\text {3D}}^n \circ y^n\) and the discretized convolution kernel \(\chi ^n\) are evaluated at all grid points, and the resulting grid functions are padded with zeros so as to emulate the DFT using the fast Fourier transform on a periodic grid (in our case using routines of the FFTW project http://www.fftw.org/). The resulting grid function specifies the nodal values of a multilinear finite element function, which is then restricted to the \(x_1\)-\(x_2\)-plane to yield the discretized forward operator \(F^n(u_{\text {3D}}^n \circ y^n)\). Using second-order Gaussian quadrature on each element, the discrete analogue of \({\mathcal {J}}^d\) is now computed as

$$\begin{aligned} J^d_n[y^n] = A_n\sum _{q} w_{\text {2D}}^q\, d( F^n(u_{\text {3D}}^n \circ y^n)(q),\, u_{\text {2D}}^n(q)), \end{aligned}$$

where \(A_n\) stands for the area of each element in \({\mathcal {T}}_{\mathrm {2D}}^n\), \(w_{\text {2D}}^q\) denotes the quadrature weights, and the sum is taken over all quadrature points q. Similarly, the discretized regularizer is evaluated via

$$\begin{aligned} R_n[y^n]=V_n\sum _qw_{\text {3D}}^q\,W(\nabla y^n(q)) \end{aligned}$$

with \(V_n\) the volume of each element in \({\mathcal {T}}_{\mathrm {3D}}^n\). Finally, \(E^d_n=J^d_n+R_n\).
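The zero-padded FFT convolution and the subsequent restriction to the focus plane can be sketched as follows for a voxel image that has already been warped by \(y^n\); the array shapes, the stand-in kernel, and the use of NumPy's FFT routines (instead of FFTW) are illustrative assumptions, and nodal sampling replaces the second-order Gaussian quadrature to keep the sketch short.

```python
import numpy as np

def fft_blur_and_slice(v3d, chi3d):
    """Discrete F: zero-padded FFT convolution of the (already deformed)
    volume with the kernel, then restriction to the x3 = 0 plane."""
    shape = [a + b - 1 for a, b in zip(v3d.shape, chi3d.shape)]   # linear conv
    spec = np.fft.rfftn(v3d, shape) * np.fft.rfftn(chi3d, shape)
    conv = np.fft.irfftn(spec, shape)
    z0 = v3d.shape[0] // 2 + chi3d.shape[0] // 2   # slice through the focus plane
    oy, ox = chi3d.shape[1] // 2, chi3d.shape[2] // 2
    return conv[z0, oy:oy + v3d.shape[1], ox:ox + v3d.shape[2]]

# usage: J^d_n for a given warped volume u3d o y (random stand-in data), d = d2
rng = np.random.default_rng(0)
warped, u2d = rng.random((17, 33, 33)), rng.random((33, 33))
chi = np.ones((5, 5, 5)) / 125.0                   # stand-in blurring kernel
Jd = np.mean((fft_blur_and_slice(warped, chi) - u2d) ** 2)
```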

Discretized functional derivatives For the numerical evaluation of \(\partial {\mathcal {J}}^d\) we exploit that the adjoint operator to a convolution is the cross-correlation. In more detail, for functions \(f:{\Omega _{\text {2D}}}\rightarrow {\mathbb {R}}\), \(g:{\Omega _{\text {3D}}}\rightarrow {\mathbb {R}}\) we have

$$\begin{aligned}&\int _{\Omega _{\text {2D}}}f(x)[\chi *g](x_1,x_2,0)\,{\mathrm {d}}x\\&\quad =\int _{\Omega _{\text {2D}}}\int _{\Omega _{\text {3D}}}f(x)\chi ((x_1,x_2,0)-y)g(y)\,{\mathrm {d}}y\,{\mathrm {d}}x\\&\quad =\int _{\Omega _{\text {3D}}}\int _{\Omega _{\text {2D}}}f(x)\chi ((x_1,x_2,0)-y)\,{\mathrm {d}}x\,g(y)\,{\mathrm {d}}y\\&\quad =\int _{\Omega _{\text {3D}}}[\chi \diamond f](y)g(y)\,{\mathrm {d}}y, \end{aligned}$$

where we used Fubini’s theorem and \([\chi \diamond f](y)=[\chi (-\cdot ,-\cdot ,-y_3)*f](y_1,y_2)\) denotes the adjoint operator to the convolution evaluated in the \(x_1\)-\(x_2\)-plane. Denoting by \(*^n\) our discrete approximation of the convolution described previously, our discrete approximation of \(\diamond \) applied to two finite element functions \(f^n\in X_{\text {2D}}^n\) and \(\chi ^n\in X_{\text {3D}}^n\) is computed as the finite element function

$$\begin{aligned} \chi ^n\diamond ^nf^n=\chi ^n*^n Bf^n, \end{aligned}$$

where \(B:X_{\text {2D}}^n\rightarrow X_{\text {3D}}^n\) is the operator copying all nodal function values from \({\mathcal {T}}_{\mathrm {2D}}^n\) into the \(x_1\)-\(x_2\)-plane of \({\mathcal {T}}_{\mathrm {3D}}^n\) and leaving all other nodal values 0. Using the above notation, we can write

$$\begin{aligned}&\partial {\mathcal {J}}^d[y](\phi ) =\int _{\Omega _{\text {3D}}}\left[ \phi \cdot (\nabla u_{\text {3D}})\circ y\right] (x)\\&\quad \left[ \chi \diamond \partial _1d({{\mathcal {F}}}(u_{\text {3D}}\circ y),u_{\text {2D}})\right] (x)\,{\mathrm {d}}x. \end{aligned}$$

Correspondingly, using second-order Gaussian quadrature, the discretized derivative is calculated for each finite element basis function \(\phi _j^{n,k}=e_k\phi _j^n\) (with \(e_k\) the kth Cartesian unit vector) as

$$\begin{aligned}&\partial J^d_n[y^n](\phi _j^{n,k})=V_n\sum _qw_{\text {3D}}^q\left[ \phi _j^{n,k} \cdot (\nabla u_{\text {3D}}^n)\circ y^n\right] \!(q)\\&\quad \left[ \chi ^n\diamond ^n\partial _1 d(F^n(u_{\text {3D}}^n \circ y^n), u_{\text {2D}}^n)\right] (q). \end{aligned}$$

The discretized analogue of \(\partial {{\mathcal {R}}}\) can be written as

$$\begin{aligned}&\partial R_n[y^n](\phi _j^{k,n}) = V_n \sum _{q} w_{\text {3D}}^q \mathrm {tr}\Big ((\nabla \phi _j^{k,n})^T(q)\\&\quad \big [2\partial _1{\bar{W}}(\Vert \nabla y^n(q)\Vert _F^2,\det \nabla y^n(q))\nabla y^n(q)\\&\quad +\partial _2{\bar{W}}(\Vert \nabla y^n(q)\Vert _F^2,\det \nabla y^n(q)){\mathrm {cof}}(\nabla y^n(q)) \big ]\Big ) \end{aligned}$$

and \(\partial E^d_n=\partial J^d_n+\partial R_n\).
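The adjoint operation \(\diamond ^n\) can reuse the same FFT machinery: the 2D grid function is embedded into the \(x_1\)-\(x_2\)-plane of an otherwise zero 3D grid by the operator B and then convolved with the reflected kernel (for our symmetric example kernel the reflection is a no-op). A sketch under the same conventions as the previous listing:

```python
import numpy as np

def embed_B(f2d, nz):
    """B: copy the nodal values of a 2D grid function into the x1-x2-plane
    (central z-slice) of a 3D grid; all other nodal values stay zero."""
    g = np.zeros((nz,) + f2d.shape)
    g[nz // 2] = f2d
    return g

def diamond(chi3d, f2d, nz):
    """chi diamond f: the adjoint of 'blur and slice', realized as an
    ordinary 3D convolution of B f with the reflected kernel."""
    g = embed_B(f2d, nz)
    chi_r = chi3d[::-1, ::-1, ::-1]                 # reflected kernel
    shape = [a + b - 1 for a, b in zip(g.shape, chi_r.shape)]
    conv = np.fft.irfftn(np.fft.rfftn(g, shape) * np.fft.rfftn(chi_r, shape),
                         shape)
    off = [s // 2 for s in chi_r.shape]             # crop back to the 3D grid
    return conv[off[0]:off[0] + nz,
                off[1]:off[1] + f2d.shape[0],
                off[2]:off[2] + f2d.shape[1]]
```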

Discretized second derivatives The second derivative of \({\mathcal {J}}^d\) can be written as the sum \(\partial ^2{\mathcal {J}}^d[y]={\mathcal {H}}^{\mathrm L}+{\mathcal {H}}^{\mathrm {NL}}\) of a local and a nonlocal linear operator, which are discretized separately. Indeed, while \({\mathcal {H}}^{\mathrm L}(\phi ,\psi )\) is nonzero only if \(\phi \) and \(\psi \) have overlapping support, the sparsity of a matrix representation of \({\mathcal {H}}^{\mathrm {NL}}(\phi ,\psi )\) depends on the size of the blurring kernel \(\chi \) and will typically be very low, making the storage of the assembled matrix impractical. However, during our numerical optimization we will only apply iterative solvers like BiCGStab, which only require the evaluation of matrix-vector products. Due to the tensor product structure of the integrand of \({\mathcal {H}}^{\mathrm {NL}}\), the application of the linear operator can be implemented efficiently as follows. Again exploiting the relation between convolution and the operator \(\diamond \), we can rewrite

Thus, given a finite element function \(\psi ^n\in (X_{\text {3D}}^n)^3\), for each finite element basis function \(\phi _j^{k,n}=e_k\phi _j^n\) we can compute the discretized analogue

$$\begin{aligned} H^{\mathrm {NL}}_n(\phi _j^{k,n},\psi ^n) =V_n\sum _qw_{\text {3D}}^q\big [\chi ^n\diamond ^n\big (\partial _1^2d(F^n(u_{\text {3D}}^n\circ y^n),u_{\text {2D}}^n)\\ (\chi ^n*^n[\psi ^n\cdot (\nabla u_{\text {3D}}^n)\circ y^n])(\cdot ,\cdot ,0)\big )\big ](q)\, [\phi _j^{k,n}\cdot (\nabla u_{\text {3D}}^n)\circ y^n](q). \end{aligned}$$

The operator \({\mathcal {H}}^{\mathrm L}\) is discretized as

$$\begin{aligned}&H^{\mathrm L}_n(\phi _j^{k,n},\phi _i^{l,n}) =V_n\sum _qw_{\text {3D}}^q\left[ \phi _j^{k,n}\cdot \,(({\mathrm {Hess}}^n\, u_{\text {3D}}^n)\circ y^n) \, \phi _i^{l,n}\right] (q)\\&\quad \left[ \chi ^n\diamond ^n\partial _1 d(F^n(u_{\text {3D}}^n \circ y^n), u_{\text {2D}}^n)\right] (q), \end{aligned}$$

where \({\mathrm {Hess}}^n\,u_{\text {3D}}^n\) is defined weakly as in mixed finite element approaches. Indeed, as an artefact of our discretization, the piecewise multilinear finite element function \(u_{\text {3D}}^n\) does not possess a weak second derivative (part of its distributional second derivative is concentrated on the element boundaries), yet second-order information \({\mathrm {Hess}}u_{\text {3D}}\) is helpful for the registration and should not be neglected in the discretization. Thus, we define \({\mathrm {Hess}}^nu_{\text {3D}}^n\) via

$$\begin{aligned}&\int _{\Omega _{\text {3D}}}\mathrm {tr}\left( \psi ^T{\mathrm {Hess}}^nu_{\text {3D}}^n\right) \,{\mathrm {d}}x =\int _{\partial {\Omega _{\text {3D}}}}n^T\psi \nabla u_{\text {3D}}^n\,{\mathrm {d}}a\\&\quad -\int _{\Omega _{\text {3D}}}{\mathrm {div}}\psi \cdot \nabla u_{\text {3D}}^n\,{\mathrm {d}}x \quad {\text {for all }}\;\psi \in (X_{\text {3D}}^n)^{3\times 3}, \end{aligned}$$

where n denotes the unit outward normal to \(\partial {\Omega _{\text {3D}}}\). This amounts to solving a linear system of the form \(MV=LU_{\text {3D}}\) for the nodal value vector V of \({\mathrm {Hess}}^nu_{\text {3D}}^n\), with M a mass matrix, L a stiffness matrix, and \(U_{\text {3D}}\) the vector of nodal values of \(u_{\text {3D}}^n\). The main advantage of this approach is that the same finite element framework can be employed as for the rest of the discretization. Alternatively, one could use a smooth spline interpolation for the discrete images \(u_{\text {3D}}^n\), which would render the discretized energy \(E_n^d\) truly twice differentiable. (In fact, since the acquired image data is discrete, a smooth interpolation would be just as valid as any other interpolation.) Note that the matrix representation of \(H^{\mathrm L}_n\) is just a weighted mass matrix.
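The following one-dimensional analogue illustrates the weak-Hessian construction \(MV=LU_{\text {3D}}\): the right-hand side combines the boundary term with the stiffness matrix applied to the nodal values, exactly as in the integration-by-parts formula above (a sketch; the paper works with multilinear elements in 3D).

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import spsolve

def weak_second_derivative(u, h):
    """1D analogue of M V = L U: nodal values V of the weak second
    derivative of a P1 finite element function u with grid width h.
    The right-hand side realizes the boundary term minus the stiffness
    matrix applied to u, mirroring the integration-by-parts formula."""
    n = len(u)
    m_main = np.full(n, 2 * h / 3); m_main[[0, -1]] = h / 3
    M = diags([np.full(n - 1, h / 6), m_main, np.full(n - 1, h / 6)],
              [-1, 0, 1], format='csc')                 # P1 mass matrix
    k_main = np.full(n, 2 / h); k_main[[0, -1]] = 1 / h
    K = diags([np.full(n - 1, -1 / h), k_main, np.full(n - 1, -1 / h)],
              [-1, 0, 1], format='csc')                 # P1 stiffness matrix
    rhs = -(K @ u)
    rhs[0] -= (u[1] - u[0]) / h        # boundary term -psi(0) u'(0)
    rhs[-1] += (u[-1] - u[-2]) / h     # boundary term +psi(1) u'(1)
    return spsolve(M, rhs)
```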

Abbreviating \(I_1^n(q)=\Vert \nabla y^n(q)\Vert _F^2\) and \(I_3^n(q)=\det \nabla y^n(q)\), the discretized analogue of \(\partial ^2{{\mathcal {R}}}\) reads

$$\begin{aligned}&\partial ^2R_n[y^n](\phi _j^{k,n},\phi _i^{l,n}) = V_n \sum _{q} w_{\text {3D}}^q\\&\qquad \bigg ( 4\partial _1^2{\bar{W}}(I_1^n(q),I_3^n(q))\\&\qquad \qquad \mathrm {tr}((\nabla \phi _j^{k,n}(q))^T\nabla y^n(q)) \mathrm {tr}((\nabla \phi _i^{l,n}(q))^T\nabla y^n(q))\\&\qquad \quad + 2\partial _1{\bar{W}}(I_1^n(q),I_3^n(q)) \mathrm {tr}((\nabla \phi _j^{k,n}(q))^T\nabla \phi _i^{l,n}(q))\\&\qquad \quad + \partial _2^2{\bar{W}}(I_1^n(q),I_3^n(q))\, \mathrm {tr}((\nabla \phi _i^{l,n}(q))^T {\mathrm {cof}}\nabla y^n(q))\,\\&\qquad \qquad \mathrm {tr}((\nabla \phi _j^{k,n}(q))^T {\mathrm {cof}}\nabla y^n(q)) \\&\qquad \quad + \frac{\partial _2{\bar{W}}(I_1^n(q),I_3^n(q))}{I_3^n(q)}\, \Big [ \mathrm {tr}((\nabla \phi _i^{l,n}(q))^T {\mathrm {cof}}\nabla y^n(q))\,\\&\qquad \qquad \mathrm {tr}((\nabla \phi _j^{k,n}(q))^T{\mathrm {cof}}\nabla y^n(q))\\&\qquad \quad -\mathrm {tr}((\nabla \phi _j^{k,n}(q))^T{\mathrm {cof}}\nabla y^n(q)\,(\nabla \phi _i^{l,n}(q))^T{\mathrm {cof}}\nabla y^n(q)) \Big ]\bigg ). \end{aligned}$$

Numerical optimization We tested and compared the performance of gradient-based methods, in particular nonlinear conjugate gradient and quasi-Newton methods, and of second-order line-search and trust-region Newton methods. Below we provide a few details on the latter (the implementation of the former being straightforward). We included Newton methods since they can be considered the gold standard in nonlinear elasticity problems, even though in the context of image registration one typically cannot expect second-order convergence (cf. Fig. 2a), since the input images (and thus the objective functional) are usually not smooth enough. The Hessian of the objective here rather plays the same role as a good preconditioner in a gradient-type algorithm.

The line-search Newton method for the numerical minimization of \(E^d_n\) over \((X_{\text {3D}}^n)^3\) takes the form

$$\begin{aligned} y_{k+1}^n&= y_k^n - \gamma _k(\partial ^2 E^d_n[y_k^n])^{-1} \partial E^d_n[y_k^n]\,, \quad k \ge 0, \end{aligned}$$

where we compute the step size \(\gamma _k>0\) using a backtracking line search with Armijo’s condition and where the inverse linear operator is applied using BiCGStab (note that due to the lack of sparsity in \(H^{\mathrm {NL}}_n\), the linear system has to be solved iteratively). However, while \(H^{\mathrm {NL}}_n\) is always positive semidefinite, \(H^{\mathrm L}_n\) and \(\partial ^2 R_n\) can be indefinite, so that \(\partial ^2E^d_n\) may be indefinite as well. Consequently, the Newton step may not be a descent direction, and the iteration might converge to a saddle point. To compensate for a possible lack of positive definiteness, we add a scalar multiple \(\lambda _k\) of the identity to the Hessian operator,

$$\begin{aligned} y_{k+1}^n&= y_k^n - \gamma _k(\partial ^2 E^d_n[y_k^n] + \lambda _k {\mathrm {id}})^{-1} \partial E^d_n[y_k^n],\quad k \ge 0, \end{aligned}$$

where \(-\lambda _k\) should approximate the most negative eigenvalue of \(H^{\mathrm L}[y_k^n] + \partial ^2 R_n[y_k^n]\). To determine \(\lambda _k\), we use the procedure described in [9, Sct. 8.5.2, Thm. 8.5.1], which consists of a number of truncated Lanczos iterations to obtain a symmetric tridiagonal approximation \(T\in {\mathbb {R}}^{m\times m}\) to \(H^{\mathrm L}[y_k^n] + \partial ^2 R_n[y_k^n]\) and a subsequent computation of its characteristic polynomial, whose smallest zero is found via a bisection method.
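A compact sketch of this safeguarded Newton step might look as follows. For brevity we obtain the smallest eigenvalue of the Lanczos tridiagonal matrix with SciPy's direct tridiagonal eigensolver instead of the bisection on the characteristic polynomial described above; apply_H stands for the symmetric operator \(H^{\mathrm L}[y_k^n]+\partial ^2R_n[y_k^n]\), and all function names are hypothetical.

```python
import numpy as np
from scipy.linalg import eigvalsh_tridiagonal
from scipy.sparse.linalg import LinearOperator, bicgstab

def smallest_eigenvalue(apply_H, n, m=20, seed=0):
    """Estimate the smallest eigenvalue of the symmetric operator apply_H
    from m truncated Lanczos steps. (The paper instead finds the smallest
    zero of the characteristic polynomial of T by bisection.)"""
    rng = np.random.default_rng(seed)
    q_prev = np.zeros(n)
    q = rng.standard_normal(n)
    q /= np.linalg.norm(q)
    alpha, beta, b = [], [], 0.0
    for _ in range(m):
        v = apply_H(q) - b * q_prev
        a = q @ v
        alpha.append(a)
        v -= a * q
        b = np.linalg.norm(v)
        if b < 1e-12:
            break
        beta.append(b)
        q_prev, q = q, v / b
    d = np.array(alpha)
    e = np.array(beta[:len(alpha) - 1])
    return eigvalsh_tridiagonal(d, e)[0]

def safeguarded_newton_step(y, energy, grad, apply_H):
    """One line-search Newton step with the indefiniteness shift lambda_k."""
    n = y.size
    lam = max(0.0, -smallest_eigenvalue(apply_H, n)) + 1e-8
    H = LinearOperator((n, n), matvec=lambda v: apply_H(v) + lam * v)
    g = grad(y)
    p, _ = bicgstab(H, -g)          # iterative solve, H is matrix-free
    gamma, E0, slope = 1.0, energy(y), g @ p
    for _ in range(30):             # backtracking with Armijo's condition
        if energy(y + gamma * p) <= E0 + 1e-4 * gamma * slope:
            break
        gamma *= 0.5
    return y + gamma * p
```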

Fig. 2
figure 2

Performance of Newton’s method using a fully and partially assembled Hessian operator (left) and a breakdown of the computational costs of a generic step of Newton’s method (right)

Additional modifications of the Newton iteration allow a further reduction of the computational complexity of each Newton step. Since \(\partial _1 d({{\mathcal {F}}}(u_{\text {3D}}\circ y), u_{\text {2D}})\) is zero for perfectly aligned images, the contribution of \(H^{\mathrm L}\) is negligible for closely aligned images, and \(\partial ^2 E^d_n\) can be approximated by \(H^{\mathrm {NL}}_n+ \partial ^2R_n\). We refer to [26] for a detailed discussion of this strategy in the context of hyperelastic 3D–3D image registration. Borrowing from least squares problems, it is sometimes referred to as a Gauss–Newton method. In contrast to [26], we do not neglect the terms of \(\partial ^2{{\mathcal {R}}}[y]\) stemming from differentiating the determinant twice, since their contribution to the overall computational effort is minor and, unlike for \(H^{\mathrm L}\), there is no justification for assuming them to be small. In fact, these terms are the only ones encoding the nonconvexity of the hyperelastic deformation energy, which is an inherent, important feature of nonlinear elasticity. To support this, we numerically verified in our experiments that the terms of \(\partial ^2{{\mathcal {R}}}[y]\) involving first and second derivatives of \({\bar{W}}\) have operator norms of the same order, in contrast to the relation between \(H^{\mathrm {NL}}\) and \(H^{\mathrm L}\) (the former having a \(10^4\)-fold higher norm near the optimum). Figure 2a compares the energy decrease of Newton’s method and this Gauss–Newton type modification when applied to the datasets shown in Fig. 6.

The major cost of each Newton iteration lies in the numerical solution of the linear system. To improve convergence of the BiCGStab solver, we tested several preconditioning methods. Jacobi, geometric scaling, and incomplete LU preconditioning—applied to the part \(H^{\mathrm L}[y_k^n] + \partial ^2 R_n[y_k^n]\) of the Hessian matrix that can be assembled—turned out to be inferior to a multigrid preconditioner applied to the entire Hessian operator \(\partial ^2 E^d_n\). Note that this operation does not require the assembly of \(\partial ^2 E^d_n\), since iterative solvers like BiCGStab or GMRES—this time without preconditioning—can be employed for the pre- and post-smoothing steps.
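As an illustration of such a matrix-free multigrid preconditioner, a two-grid V-cycle with a few unpreconditioned GMRES sweeps as smoother could be sketched as follows. The restriction R, prolongation P, and assembled coarse matrix are assumed given; this is a simplified stand-in for the full multigrid hierarchy actually used.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

def two_grid_preconditioner(apply_H, R, P, H_coarse, n, n_smooth=3):
    """Two-grid V-cycle as a matrix-free preconditioner.
    apply_H : matvec of the (unassembled) fine-level Hessian
    R, P    : restriction and prolongation matrices
    H_coarse: small assembled coarse-level matrix (direct solve)
    Smoothing uses a few unpreconditioned GMRES steps."""
    H = LinearOperator((n, n), matvec=apply_H)
    def vcycle(r):
        x, _ = gmres(H, r, restart=n_smooth, maxiter=1)        # pre-smoothing
        r_c = R @ (r - apply_H(x))                             # restrict residual
        x = x + P @ np.linalg.solve(H_coarse, r_c)             # coarse correction
        x, _ = gmres(H, r, x0=x, restart=n_smooth, maxiter=1)  # post-smoothing
        return x
    return LinearOperator((n, n), matvec=vcycle)

# usage sketch: bicgstab(H, rhs, M=two_grid_preconditioner(...))
```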

A line-search Newton method is prone to getting stuck or at least slowing down at saddle points (even with the compensation for indefiniteness described above). This can be avoided by using a trust-region Newton method, which can also minimize indefinite quadratic functions within its trust region. To this end, we solved the Newton system via preconditioned truncated Lanczos iterations [10, 31] (since this simultaneously allows the use of the above-mentioned technique for compensating indefiniteness), where we applied the same preconditioners as in the line-search approach. Figure 2b shows that the use of truncated Lanczos iterations reduces the computational cost of the trust-region subproblem to a little less than that of computing the Hessian (while in the line-search Newton method the solution of the linear system by far outweighed that computation).

Overall, as already suggested by Fig. 2b, the solution of the linear system in Newton’s method (line search or trust region) turns out to consume so much time that a mere quasi-Newton method with BFGS updates (see, e.g., [21, Ch. 6.1]) is more efficient. In fact, a plain nonlinear conjugate gradient method with Polak–Ribière updates (see, e.g., [7, Sct. 8.5]) performed best in our experiments and is thus chosen in the following.
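For reference, a minimal sketch of the nonlinear conjugate gradient iteration with Polak–Ribière updates reads as follows (the nonnegativity safeguard on the update parameter is an implementation choice not specified in the text; the iteration count per grid level matches the 350 iterations reported in Sect. 4).

```python
import numpy as np

def ncg_polak_ribiere(y, energy, grad, iters=350):
    """Nonlinear conjugate gradient with Polak-Ribiere updates and an
    Armijo backtracking line search (illustrative sketch)."""
    g = grad(y)
    p = -g
    for _ in range(iters):
        gamma, E0, slope = 1.0, energy(y), g @ p
        for _ in range(30):                  # Armijo backtracking
            if energy(y + gamma * p) <= E0 + 1e-4 * gamma * slope:
                break
            gamma *= 0.5
        y = y + gamma * p
        g_new = grad(y)
        beta = max(0.0, g_new @ (g_new - g) / (g @ g))  # PR+ safeguard
        p = -g_new + beta * p
        g = g_new
    return y
```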

Preprocessing Before starting the minimization of \(E^d_n\), we rigidly align \(u_{\text {3D}}\) to \(u_{\text {2D}}\) by minimizing \(J^d_n[y^n]\) among all rigid deformations (for which actually \(E^d_n=J^d_n\)). In fact, to accommodate a potential slight mismatch in magnification between the 2D intravital and the 3D confocal microscopy as well as different resolutions in \(x_1\)-, \(x_2\)-, and \(x_3\)-direction (or to make up for a bad choice of the blurring kernel \(\chi \)), we additionally allow a rescaling along the coordinate directions. Thus, we minimize \(J^d_n\) among all deformations

$$\begin{aligned} x&\mapsto R_3(\gamma )\,R_2(\beta )\,R_1(\alpha )\,{\mathrm {diag}}(s_1, s_2, s_3)\, x + t, \end{aligned}$$

parameterized by a translation vector \(t\in {\mathbb {R}}^3\) as well as scalings \(s_1,s_2,s_3\) along and rotation angles \(\alpha ,\beta ,\gamma \in [0, 2\pi [\) about the three coordinate directions (\(R_i(\delta )\in SO(3)\) denotes the rotation about the ith axis by angle \(\delta \)). We now use the grid hierarchy \({\mathcal {T}}_{\mathrm {3D}}^n\), \(n=1,\ldots ,N\), to iteratively find the optimal parameters on each grid level n via a quasi-Newton method initialized with the optimal parameters from level \(n-1\). After the optimal deformation on level N has been found, we replace \(u_{\text {3D}}\) by its composition with that deformation so that the new \(u_{\text {3D}}\) has the correct length scales and is already optimally aligned.
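The parameterized preprocessing alignment can be sketched as follows; the extrinsic Euler-angle convention reproduces the product \(R_3(\gamma )R_2(\beta )R_1(\alpha )\), while the objective callback J and the optimizer choice are placeholders for the discrete fidelity \(J^d_n\) and the quasi-Newton method of the text.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.transform import Rotation

def apply_params(p, x):
    """x -> R3(gamma) R2(beta) R1(alpha) diag(s1,s2,s3) x + t for the
    parameter vector p = (alpha, beta, gamma, s1, s2, s3, t1, t2, t3);
    x has shape (N, 3). Extrinsic 'xyz' Euler angles give R3 R2 R1."""
    R = Rotation.from_euler('xyz', p[:3]).as_matrix()
    return (x * p[3:6]) @ R.T + p[6:9]

def prealign(J, levels):
    """Coarse-to-fine estimation of the nine parameters; J(p, n) is a
    placeholder for the discrete fidelity J^d_n evaluated under the
    parameterized deformation on grid level n."""
    p = np.array([0, 0, 0, 1, 1, 1, 0, 0, 0], dtype=float)
    for n in levels:  # grid hierarchy, coarsest to finest
        p = minimize(lambda q: J(q, n), p, method='BFGS').x
    return p
```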

Fig. 3
figure 3

Multilevel and multiscale strategies simplify the energy landscape and thereby facilitate registration: Both top and bottom experiments use a three-dimensional image \(u_{\text {3D}}\) consisting of three cuboids at a resolution of \(256\times 256\times 9\) pixels and take the two-dimensional image \(u_{\text {2D}}\) as the projection of a simple translation, \(u_{\text {2D}}={{\mathcal {F}}}(u_{\text {3D}}\circ y_{0.35})\) for \(y_t(x)=(x_1,x_2,x_3+t)\) (note that the image width of 256 pixels corresponds to length 1). The right graphs show the registration energy as a function of the translation t in the range [0, 0.7], where the different colours correspond to the different resolutions or blurs shown on the left. In the top experiment, the \(x_1,x_2\)-resolution of both \(u_{\text {3D}}\) and \(u_{\text {2D}}\) was decreased repeatedly using the restriction operators \({\mathfrak {R}}_n\), while in the bottom experiment the images are obtained by blurring in \(x_1,x_2\)-direction with the Gaussian kernel \(K_s\) of scale s, \(u_{\text {2D}}^s=K_s*u_{\text {2D}}\), \(u_{\text {3D}}^s=K_s*_{\text {2D}}u_{\text {3D}}\). Clearly, spurious local minima are alleviated or even eliminated at coarser resolution or stronger blur (Color figure online)

Multilevel optimization problems We have already detailed how, by replacing \({\mathcal {J}}^d\), \({{\mathcal {R}}}\), and \({\mathcal {E}}^d\) with their discretized analogues \(J^{d}_n\), \(R_n\), and \(E^d_n\), we arrive at a set of optimization problems

$$\begin{aligned} \min _{y^n\in (X_{_{\text {3D}}}^n)^3} E^{d}_n[y^n], \quad n=1, \ldots , N, \end{aligned}$$

that are numerically solved using the nonlinear conjugate gradient method. As so often in image processing, the use of a multilevel approach is essential for the quality of the results as well as for computational efficiency; in the context of hyperelastic image registration, it was first applied in [8] (see also [20, § 3.7]). Owing to its nonconvexity, the functional \({\mathcal {E}}^d\) can be expected to have many local minima, whose number is in general linked to the resolution of the given image data and the number of image features that promote regional alignment. Downsampling the image data reduces the number of image features (and thus of local minima) in the input datasets as well as the complexity of the optimization procedure (see the experimental illustration in Fig. 3). The results of the less costly optimization on coarser grids can then be used as good initial values for the higher-level optimization problems. We make use of this strategy by first minimizing \(E^d_n\) on a low grid level n and then successively solving the optimization problem on higher grid levels. Since our discretization can only ensure a positive determinant of the deformation gradient at the quadrature points, it may happen that after grid refinement some new quadrature points exhibit a negative determinant, leading to an infinite energy. As a remedy, we modify the stored energy (1) to

$$\begin{aligned}&W(A)= c_1 \Vert A\Vert _F^2 + c_3(1-\det A)^2 + c_4(\Vert A\Vert _F^2-3)^2 \\&\quad + D + c_2{\left\{ \begin{array}{ll}(\det A)^{-1}&{}{\text {if}}\det A>\varepsilon ,\\ \frac{2}{\varepsilon } - \frac{\det A}{\varepsilon ^2}&{}{\text {else}} \end{array}\right. } \end{aligned}$$

for some positive \(\varepsilon \) which is halved during each grid refinement (in our experiments, we start with \(\varepsilon =0.1\) on the coarsest grid) so that the penalization of negative determinants increases from level to level. This strategy of approximating a singular energy by a Lipschitz-continuous regularization is common in convex optimization and optimal control, where typically the Moreau–Yosida approximation is used as regularization. We instead just employ a simple linear extension of \(\det A\mapsto (\det A)^{-1}\). The resulting range of determinant values in our experiments is provided in Table 1.
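A direct transcription of this regularized stored energy, together with the per-level halving of \(\varepsilon \), might read as follows (the constants are placeholders, except for the stated constraints \(c_2=2c_1\) and small \(c_4\); the extension matches value and slope of \(\det A\mapsto (\det A)^{-1}\) at \(\det A=\varepsilon \), so the energy stays \(C^1\)).

```python
import numpy as np

def W_eps(A, eps, c1=1.0, c2=2.0, c3=1.0, c4=0.1, D=0.0):
    """Stored energy with the det^{-1} term extended linearly below eps;
    the extension is C^1 at det A = eps. Constants are placeholders,
    except that c2 = 2*c1 and a small c4 obey the stated constraints."""
    I1 = np.sum(A * A)                     # ||A||_F^2
    det = np.linalg.det(A)
    det_term = 1.0 / det if det > eps else 2.0 / eps - det / eps**2
    return c1 * I1 + c3 * (1.0 - det)**2 + c4 * (I1 - 3.0)**2 + D + c2 * det_term

# multilevel usage: halve eps on each grid refinement, starting from 0.1
eps = 0.1
for level in range(4):        # illustrative number of levels
    # ... minimize E^d_n on this level with stored energy W_eps(., eps) ...
    eps *= 0.5
```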

Table 1 Numeric information on the example experiments shown in Figs. 6, 7, 8, 9, 10, 11 and 12

In addition to the use of hierarchical grids, we will use a smoothing-based multiscale strategy. A drawback of local, pixel-based distance measures is their inability to align corresponding but nonoverlapping image features. As illustrated in Fig. 4, blurring of the datasets can compensate the lack of overlap at the expense of a diminished data fidelity. We therefore solve the registration problem for blurred versions of \(u_{\text {2D}}\) and \(u_{\text {3D}}\) with successively decreasing blur radius. In our implementation, we convolved \(u_{\text {2D}}\) and the slices of \(u_{\text {3D}}\) in \(x_1\)-\(x_2\)-direction with a Gaussian kernel \(K_s(x) = \frac{1}{s\sqrt{2\pi }} \exp (-\frac{x^2}{2s^2})\) via the fast Fourier transform. Experimentally, a good sequence of decreasing kernel radii turned out to be \(s = 8h, 4h, 2h, 0\), where h denotes the grid width of the volumetric input image.
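The multiscale loop can be sketched as follows; for brevity we use SciPy's separable Gaussian filter rather than an explicit FFT convolution, and register(...) stands in for the full minimization of \(E^d_n\).

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def multiscale_registration(u2d, u3d, register, h):
    """Smoothing-based multiscale strategy with kernel radii 8h, 4h, 2h, 0.
    register(u2, u3, init) is a placeholder for the minimization of E^d_n;
    the paper convolves via the FFT, here SciPy's separable Gaussian
    filter is used for brevity (sigma is measured in pixels)."""
    y = None
    for s in (8 * h, 4 * h, 2 * h, 0.0):
        sig = s / h
        u2 = gaussian_filter(u2d, sig) if sig > 0 else u2d
        # blur the volume only in x1-x2-direction, slice by slice
        u3 = gaussian_filter(u3d, (sig, sig, 0)) if sig > 0 else u3d
        y = register(u2, u3, init=y)
    return y
```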

Full algorithm The full algorithmic workflow is depicted in Fig. 5. Note that the iteration over the different grid levels always starts at the coarsest level n on which the grid size corresponds to the current blurring kernel radius. The actual implementation was based on the QuocMesh library, a C++ finite element library which supports quadratic, cuboid, and simplicial elements.

4 Experimental Results

In all following examples, we took \({\Omega _{\text {3D}}}= [0,1]^2 \times [0,Z]\) for some \(Z \in ]0, 1]\) and used the blurring kernel

$$\begin{aligned} \chi (x) = {\left\{ \begin{array}{ll} \left( \pi \, \left[ x_3-\frac{Z}{2}\right] ^2\right) ^{-1} &{} \sqrt{x_1^2+x_2^2} \le \left| x_3 - \frac{Z}{2}\right| , \\ 0 &{} {\text { else,}} \end{array}\right. } \end{aligned}$$

which corresponds geometrically to a double cone. For the data fidelity, we employ the distance measure \(d_1\). Numeric data on the different experiments are provided in Table 1.
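For illustration, the double-cone kernel can be sampled as follows. Each \(x_3\)-slice of the continuous \(\chi \) has unit mass (the disk area \(\pi \,\mathrm {depth}^2\) cancels the prefactor), which the sketch enforces discretely; representing the singular focus-plane slice by a discrete point mass is an implementation choice, and the grid layout is an assumption.

```python
import numpy as np

def double_cone_kernel(n, nz, Z):
    """Sample chi on an n x n x nz grid, centred for 'same'-mode
    convolution. Each x3-slice is renormalized to unit mass, matching
    the continuous kernel; the focus-plane slice degenerates to a
    Dirac and is represented by a one-hot slice."""
    ax = np.linspace(-0.5, 0.5, n)
    x1, x2 = np.meshgrid(ax, ax, indexing='ij')
    r = np.hypot(x1, x2)
    chi = np.zeros((n, n, nz))
    for k, x3 in enumerate(np.linspace(-Z / 2, Z / 2, nz)):
        depth = abs(x3)
        if depth == 0.0:
            chi[n // 2, n // 2, k] = 1.0   # point mass at the focus plane
            continue
        disk = np.where(r <= depth, 1.0 / (np.pi * depth**2), 0.0)
        if disk.sum() > 0:
            chi[:, :, k] = disk / disk.sum()
    return chi
```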

Fig. 4
figure 4

Left: A reference image \(u_r\) (red) and a template image \(u_t\) (green) show the same structure at different locations. Due to the lack of overlap, the gradient of the registration functional \(y\mapsto \int _{{\mathbb {R}}^2}d(u_t\circ y,u_r)\,{\mathrm {d}}x\) at \(y={\mathrm {id}}\) is zero. Right: After blurring both images the structures overlap, resulting in a nonzero gradient. The negative gradient, which points into the direction of an improved registration, can be interpreted as a perturbation of \(y={\mathrm {id}}\) which produces a slight rightward deformation in the overlapping region, as indicated by the arrows (Color figure online)

Fig. 5
figure 5

Schematic overview of the overall registration procedure

Synthetic data To test the performance of the elastic regularizer, we applied our technique to synthetic datasets (Figs. 6, 7, 8, 9 and 10) representing cuboids and a deformed vessel structure made of some elastic material.

In the first two test cases (Figs. 6, 7, 8, and 9), the simplicity of the shapes made it possible to generate the three-dimensional deformed and undeformed scenes \(u_{\text {3D}}\) and \(u_{\text {3D}}\circ y\) by setting the pixel intensities manually. The two-dimensional reference images \(u_{\text {2D}}\) were then obtained by applying the forward operator to the undeformed scenes. Figure 6 depicts the results of a test assessing the algorithm’s ability to compute nonrigid lateral deformations. The resolutions of \(u_{\text {3D}}\) and \(u_{\text {2D}}\) in this example are \(129\times 129 \times 17\) and \(129 \times 129\), respectively. Figure 6 shows that the overall structural alignment works flawlessly, but also that small-scale spurious deformations can be introduced locally, which have negligible influence on the data fidelity term. This is typical behaviour in registration problems. Increasing the hyperelastic regularization would reduce this spurious deformation of each cuboid, but also slightly hamper the strong global shear deformation between both cuboids, so that the alignment of \(u_{\text {2D}}\) and \({{\mathcal {F}}}(u_{\text {3D}}\circ y)\) would be impaired. Note that the projected registration result \({{\mathcal {F}}}(u_{\text {3D}}\circ y)\) looks almost perfect despite the small spurious deformations. Figure 8 (left) shows the results for hyperelastic parameters increased and decreased by a factor of three, respectively. While the result for stronger regularization qualitatively looks the same as in Fig. 6, the weaker regularization leads to a slight depression in both cuboids, associated with a volume change. This can be counteracted by increasing the volume change penalization weight \(c_3\) (Fig. 8, third image), restoring the previous image quality (as for the other parameters, note that \(c_2=2c_1\) is dictated by the identity deformation being stress-free and that \(c_4\) has to be small enough not to interfere with polyconvexity). Figure 7 visualizes the deformation along with the norm and determinant of its gradient along a horizontal cross-section. As expected, one observes strong dilation behind the displaced cuboids.

While our experiments were performed for the data dissimilarity measure \(d_1\), exchanging it for \(d_2\) did not lead to any significant differences; Fig. 8 shows the corresponding results for the example from Fig. 6. In principle, the choice of data dissimilarity is expected to play a role for very noisy images: As mentioned in Remark 3, \(d_2\) would be more appropriate for Gaussian noise, while \(d_1\) fits better to salt-and-pepper noise or heavy-tailed noise. The noise in microscopy images often comes from the Poisson statistics of photon counts, for which yet another dissimilarity measure, a Kullback–Leibler divergence, is known to be most suitable. However, due to the strong smoothing via the kernel \(\chi \), we expect the dissimilarity choice to be relevant only for high noise levels, for which any registration procedure will be strongly impaired anyway.

Fig. 6
figure 6

Inplane deformation of a pair of elastic cuboids. In this experiment and the one shown in Fig. 9, the parameters of the stored energy function W were chosen such that the two cuboids behave as if they were embedded in a tenfold softer substance. The table shows the number of iterations performed on each grid level, the initial and final energy value, as well as the initial and final Euclidean norm of the discrete energy gradient

Fig. 7
figure 7

Cross-sectional visualization of the deformation from Fig. 6: On a horizontal cross-section we show the composition of the deformation y with a chequerboard pattern as well as the norm and determinant of its gradient

Fig. 8
figure 8

Influence of model parameters on the registration result. The first three results are for the default dissimilarity measure \(d_1\), the fourth for \(d_2\) (using the default hyperelastic parameters). All simulations use \(c_2=2c_1\) and \(c_4=c_1/10\). The corresponding two-dimensional projections look indistinguishable from the one shown in Fig. 6

Fig. 9
figure 9

Displacement of one of a pair of elastic cubes, surrounded by a soft material

Fig. 10
figure 10

Idealized vessel structure used as a more realistic test case

Fig. 11
figure 11

Registration despite missing regions in the 2D image

The second test, shown in Fig. 9, is intended to evaluate how well the algorithm infers the displacement in the viewing direction from the blurriness of the two-dimensional image. To this end, we use a configuration of two cubes, where the deformed scene differs from the undeformed configuration only by a vertical displacement of one cube. The underlying resolutions are \(129\times 129 \times 129\) and \(129 \times 129\), respectively. As Fig. 9 shows, the left cube is correctly displaced in the viewing direction, but its initial shape is not entirely preserved. The lack of smoothness of the actually applied transformation

$$\begin{aligned} x&\mapsto {\left\{ \begin{array}{ll} \left( x_1,x_2,x_3+\frac{1}{3}\right) , &{}{\text {if}}\; 0 \le x_1 \le \frac{1}{2}\\ x,&{} {\mathrm {else,}} \end{array}\right. } \end{aligned}$$

which is incompatible with the regularizer \({{\mathcal {R}}}\), explains this phenomenon. As the hyperelastic regularizer favours more regular deformations, its contribution to the overall energy will outweigh that of the dissimilarity measure if, as in this case, the transformation introduces too much shear.

Since vessel structures constitute the predominant image features in our microscopy images, we apply the algorithm to another, more realistic test case to see whether branched tube-like structures are registered equally well as in the previous cases. The synthetic dataset shown in Fig. 10 was generated with the help of VascuSynth [11], a software package capable of generating realistic-looking synthetic vessel structures. To obtain a volumetric template dataset \(u_{\text {3D}}\), the generated vessel structure was deformed using a CGAL [13] implementation of the algorithm described in [27], which can generate triangular mesh deformations in real time under the constraint that the resulting deformation acts as rigidly as possible on each triangle. The deformed and undeformed triangular meshes were then turned into grayscale image stacks. As in the previous tests, the two-dimensional reference image \(u_{\text {2D}}\) was generated by applying the forward operator to the undeformed volume image stack. The dimensions of the input datasets were the same as in our first test case. In this example, the preprocessing step described in Sect. 3 turned out to be necessary before the elastic registration procedure in order to obtain a satisfactory data alignment. The comparison of the reference image \(u_{\text {2D}}\) and the projected registration result \({{\mathcal {F}}}(u_{\text {3D}}\circ y)\) in Fig. 10 indicates a faultless overall alignment, but also spurious small-scale deformations similar to those encountered in the first test case. To emulate the situation in which the 2D microscopy image has deficiencies (for instance due to occlusion of small regions), we further artificially removed part of the main vessel in the 2D reference image and repeated the registration, see Fig. 11. Away from this region the image alignment was not impaired, as is desirable. In the missing region, for our choice of parameters the obtained deformation y strongly compresses the vessel structure to make it almost invisible, thus producing a matching 2D projection.

Even though there is no ground-truth deformation, the ground-truth deformed 3D structure is available for our synthetic examples (the one from which \(u_{\text {2D}}\) was produced). Since the examples dealt with the deformation of sets, we can compare the obtained deformed set with this ground truth using the Dice coefficient. The results are provided in Table 1. The experiments with displaced cuboids reach a Dice coefficient of around 90%, which is respectable in view of the fact that a vertical misalignment leads to a major deterioration of the Dice coefficient but has very little influence on the fidelity term. The Dice coefficient for the vessel structure is significantly worse at 74%; however, this is not unexpected: For thin structures, the same misalignment leads to a much lower Dice coefficient because of their lower volume. The computation time for all these examples ranged between 4.6 h and 4.9 h on an Intel Xeon X5450 processor (\(4\times 3\) GHz) with 8 GB RAM. Note that we simply performed a fixed number of 350 iterations on each grid level, even though most often there are only negligible changes after the first 50 iterations.
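The Dice comparison itself is elementary; for completeness, a minimal sketch:

```python
import numpy as np

def dice(a, b):
    """Dice coefficient 2|A and B| / (|A| + |B|) of two binary volumes,
    used to compare the deformed set with the synthetic ground truth."""
    a, b = a.astype(bool), b.astype(bool)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())
```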

Microscopy data In Fig. 12, the technique was finally applied to a microscopy dataset as described in the introduction.

Fig. 12
figure 12

Microscopy images acquired with 3D confocal and 2D IVM microscopy and their elastic registration. Data courtesy of Lydia Sorokin, Konrad Buscher, Jian Song (Institute of Physiological Chemistry and Pathobiochemistry, Münster, Germany)

Fig. 13
figure 13

Registration result from Fig. 12 showing additional channels combining temporal information from two-dimensional intravital microscopy with well-resolved spatial information from three-dimensional confocal microscopy images acquired after tissue excision. Magenta shows fat cells (from confocal microscopy), blue shows leukocytes (from intravital microscopy)

The centre region of the reference image was obscured by diffused fluorescent dye, which interfered with the registration process. As a consequence, the dissimilarity measure had to be augmented by a mask \(m:{\Omega _{\text {2D}}}\rightarrow (0,1]\) taking small values in the degraded image region,

$$\begin{aligned} {\mathcal {J}}^d[y] =\int _{{\Omega _{\text {2D}}}}m(x)\, d({{\mathcal {F}}}( u_{\text {3D}}\circ y)(x), u_{\text {2D}}(x) ) \ {\mathrm {d}}x. \end{aligned}$$

As can be seen from overlaying \(u_{\text {2D}}\) and \({{\mathcal {F}}}(u_{\text {3D}}\circ y)\) in Fig. 12, right, the alignment of the blood vessel structures is satisfactory. Note that projecting the original, undeformed three-dimensional configuration (Fig. 12, left) yields dark regions in the right middle part of the image, where vessels lie outside the focus plane. This is corrected by the registration, so that the overall registration result on the right of Fig. 12 shows both a strong red and a strong green signal in that region. Similarly, the in-plane distortion on the left side of the image is corrected by the registration. The alignment of the three-dimensional with the two-dimensional images allows information from other colour channels to be integrated into a single dataset, as shown in Fig. 13. Thereby one can combine temporal information from intravital microscopy with well-resolved spatial information from confocal microscopy that can only be obtained after tissue excision. Here, clusters of fat cells that surround the larger blood vessels were stained after tissue excision and are shown in magenta, while individual migrating leukocytes were observed during intravital microscopy and are shown in blue.

5 Discussion

We have devised a model for 3D–2D image registration which combines ideas from hyperelastic registration and depth-from-defocus models. In contrast to other 3D–2D registration techniques, where typically one or more 2D slices of a 3D volume are given, we only see a projection of the three-dimensional volume. This setting is motivated by a biological application in which one combines 2D intravital microscopy with 3D confocal microscopy. In that field, a proper inplane alignment of the 3D volume with the 2D image is already of great help (and estimating the out-of-plane deformation gives some extra benefit). This alignment cannot be based on just a 2D slice of the 3D volume, since that slice might not contain the features most prominent in the recorded 2D image. On the theory side, this setting of using projections rather than slices allows the use of less regular images and deformations.

Even though 3D confocal intravital microscopy setups have recently been established as well, the two-dimensional version is still much cheaper and less complicated. In particular, in 2D one can visualize larger regions and thus obtain a better overview of the occurring cell dynamics, so that one can decide at the end which tissue region to excise and examine further. Due to the larger field of view, 2D microscopy is also more robust to motion artefacts. In addition, the frame rate is higher, allowing the observation of faster dynamics.

In the biological experiments, the imaged tissue slabs are usually relatively thin. Otherwise, too little light is transmitted during 2D microscopy, and laser penetration into deeper layers is hampered during 3D confocal microscopy. In addition, light refraction effects from tissue inhomogeneities become nonnegligible in deep tissue layers, so that the 2D image of a thick tissue slab actually results from convolving the 3D volume with a spatially varying kernel that depends on the tissue sample itself. Thus, one might argue that one could concentrate on reconstructing merely the inplane deformation. However, as seen in Fig. 12, it may happen that certain structures lie so far out of focus that they are hardly visible in the two-dimensional projection \({{\mathcal {F}}}(u_{\text {3D}})\) or \(u_{\text {2D}}\), which can only be corrected by additionally reconstructing the out-of-plane deformation.

Since out-of-focus blur is the same on either side of the focus plane, the depth-from-defocus problem does not have a unique solution. An example would be Fig. 9, where the same 2D image could have been obtained by moving the left cube upwards. If there is enough texture in the neighbourhood, one may hope to resolve this nonuniqueness by identifying whether the structures above or below have less blur and are thus closer to the focus plane. The hyperelastic regularization, which does not allow material inversion, will then automatically position the different tissue layers in the correct order and thus on the correct side of the focus plane.

In the biological application, several cells are only visible in the 2D microscopy video, but not in the 3D volume, in particular moving cells whose dynamics are supposed to be imaged. In our approach, which merely dealt with the 3D–2D registration problem, the locations of those cells are only specified in the \(x_1\)-\(x_2\)-plane. In a second step, one could potentially use existing depth-from-defocus methods to roughly localize those cells in the deformed 3D volume.

As in any ill-posed inverse problem, a perfect reconstruction of the ground-truth deformation cannot be expected due to modelling errors. In particular, it is delicate to strike the right balance between regularization and data fidelity: Increasing the regularization may lead to more appropriate deformations in some image regions, but may prevent necessary strong deformations in other regions. To still allow enough freedom for the registering deformation, we regularized it via its elastic deformation energy. An alternative would have been to constrain the allowed deformations to the set of equilibrium deformations induced by different boundary loads. From a physical viewpoint, this would be more appropriate, since the forces acting within the imaged tissue are usually negligible, so that the tissue deformation is solely caused by forces acting on the boundary of the imaged region. In such an approach, the optimization variable would be the (spatially varying) boundary load of the 3D volume. This would render the actual deformations much more regular in the domain interior, since they solve an elliptic partial differential equation with (typically at least) piecewise smooth coefficients. Unfortunately, if the deformations are thus strongly constrained, the inhomogeneous and anisotropic material properties of biological tissue become more important. If they are not known for the given tissue sample, as is usually the case, and are therefore replaced by more or less arbitrary ones, the actual tissue deformation might not even lie within the space of deformations reachable by varying the boundary load.

In principle, the proposed framework may also be applied to other 3D–2D image alignment problems such as registering a computed tomography (CT) image with an X-ray image. More generally, the framework of course allows pose reconstruction for all kinds of measurements that are obtained by first deforming an object and then taking some integral transform of it (such as low- or limited-angle CT or magnetic resonance imaging with only few phase space measurements). Taking the 3D volume to be an anatomical template, the approach may in principle even serve for image reconstruction (where the possible reconstructions are deformed versions of the template).