Elastic 3D-2D Image Registration

We propose a method to non-rigidly align a three-dimensional (3D) volumetric image with a two-dimensional (2D) planar image representing a projection of the deformed volume. The application in mind comes from biological studies in which 2D intravital microscopy videos of living tissue are recorded, after which the tissue is excised and a more detailed 3D volume microscopy is performed. Coregistration of both data sets allows to combine the temporal (but 2D) information with more detailed spatial 3D information. Our approach is variational and uses a hyperelastic deformation regularization, as is appropriate for biological material. As a particular feature, the out of plane deformation is estimated based on the out of focus blur inside the 2D microscopy image. The approach becomes computationally feasible through the use of a coarse-to-fine optimization strategy and higher order optimization methods.


Introduction
In this article we propose an elastic 3D-2D image registration technique, which is motivated by a challenging image processing problem arising in biological studies of leucocyte extravasation into inflamed tissue in mice. A common experimental procedure consists of irritating a certain tissue region and performing two-dimensional intravital microscopy (IVM) to monitor the reactions on the cellular level, see for instance [4]. After a sufficiently long video sequence has been recorded, the animal is sacrificed, the tissue is removed, stained, and fixed, and a single, volumetric 3D image is acquired using a 3D confocal microscope. The two obtained datasets therefore show essentially the same tissue region, one being an elastically deformed configuration of the other. While the IVM recording shows in addition the temporal behaviour of the moving leucocytes, the confocal microscopy image contains more detailed structures such as the blood vessel basement membranes. Aligning and merging both datasets allows to combine the enhanced structural information obtained by the 3D confocal microscope with temporal information gained from the IVM.
Both the 2D and 3D dataset use flourescent staining and thus possess multiple colour channels corresponding to different flourescent dyes. In particular, one channel of both images contains the same biological structures (for instance the endothelial cell membranes) so that those can be used for registration. Our approach for registering the images is based on mimicking the image acquisition system of the 2D microscope, including a non-rigid tissue deformation and a blurring and projection step. The intensities of the blurred and projected image volume are then compared with those of a 2D reference image (for instance a single frame of the IVM video), see fig. 1 for the overall procedure. The goal of the registration process is then to find a deformation transforming the volumetric image in such a way that the similarity of the two datasets is maximal. The set of admissible deformations is chosen in a way that reflects physical properties of the underlying tissue.
In this article, we present a complete mathematical description of the aforementioned procedure and, using the direct method in the calculus of variations, guarantee the existence of an admissible deformation which maximizes the similarity between the datasets. Although rigid or parameterized Figure 1: Forward model of the relation between the 2D and 3D datasets. Given a specimen of the tissue, 3D confocal microscopy yields a corresponding 3D image of voxels. Taking the tissue configuration during confocal microscopy as a reference, the 2D microscopy image is obtained by a (geometrically nonlinear) deformation and a subsequent 2D image acquisition in which structures outside the focus plane are blurred during the projection onto the 2D image plane.
3D-2D registration [18,25] and elastic 3D-3D registration [3] has been studied before, elastic 3D-2D registration has so far not been attempted to the best of our knowledge. Xu and Wan presented an intensity based 3D-2D registration technique for the matching of a CT volume to 2D X-Ray images and a description of how their method can be implemented on GPUs [24]. Studies on CT to X-Ray Fluoroscopy registration for guidance of surgical intervention were published by Otake et al. [18] and Uneri et al. [23]. All three articles employ rigid motions to obtain optimal alignment, which can be expected to yield satisfactory registration results for this specific type of problem. Also descriptions of nonrigid methods can be found in the literature. Zheng and Yu presented a spline based 3D-2D registration technique [25] for matching 3D CT data to 2D X-Ray images using a statistical deformation model. The article also references a number of publications which describe similar methods. In contrast to our setting, in all above studies the 2D images result from a 3D volume via the X-Ray transform, while our 2D images stem from optical microscopy and thus depict structures outside the focus plane with blur. On the one hand this blur makes the lateral alignment of structures between the 3D and 2D image more difficult (also computationally), on the other hand it can provide additional information about the deformation along the viewing direction, which in the above studies must come solely from the deformation model. Heldmann and Papenberg presented a variational method to register 3D CT data to a sequence of 2D ultrasound slices [10]. Note that their 2D images thus stem from point-or slicewise evaluation of a 3D volume rather than from an integral transform as in our and the above approaches. The well-posedness of this problem thus requires high regularity of the 3D image to be registered and the corresponding deformation, which the authors provide employing curvature regularization. They include results of the application of their technique to clinical data, however, no in depth analysis of the presented energy functional is presented. Berkels et al. studied 2D-3D surface registration, where graph representations of cortical surfaces had to be matched to 2D brain images. Unlike in our case, an optimal deformation ψ : R 2 → R 3 had to be found. For the regularization of this highly ill-posed problem, the authors suggested a regularizer based on the second order thin plate spline energy of ψ [2]. A different, yet still related, situation can arise in surface to surface matching problems, if surface regions are locally described by two dimensional images which then in turn can be processed further by 2D-2D image matching algorithms. Such methods were for example described by Merelli et al. [14,15]. Our method will estimate displacement in viewing direction from the severity of out of focus blur. A closely related problem, known as depth-from-defocus, is to estimate the distance between an object and the image acquisition device from a whole image stack (rather than just one image as in our case) that was generated by capturing the object under varying focal settings. The forward model described by Persch et al. [20], which mimics the image acquisition of a thin lens camera, is comparable to our 2D microscope model. Likewise related is the article by Aguet et al. [1], which suggests a method of how an image stack, produced by moving a sample through the different focal planes of a 2D microscope, can be combined into a single feature enriched image. For further methods we refer to the extensive overview [13] of 3D-2D registration techniques with a focus on applications in image guided surgery. In the next section we present the mathematical model and the analysis of existence of solutions in detail. Section 3 describes the numerical implementation, while section 4 shows the behaviour of the method in carefully chosen test cases and a real biological dataset.

Mathematical Formulation
We continue with the mathematical description of our registration problem. We first briefly describe the forward operator of the 2D microscope and subsequently a physically reasonable model of tissue deformations. For simplicity, our exposition will consider the 3D and 2D unit domains Ω 3D = [−1, 1] 3 and Ω 2D = [−1, 1] 2 , respectively. The volumetric image stemming from 3D confocal microscopy is denoted u 3D : Ω 3D → [0, ∞) and is considered to represent the tissue reference configuration. In comparison to the IVM the quality of the 3D microscopy process is typically good enough to assume u 3D noise-and blurfree. The 2D microscopy image u 2D : Ω 2D → [0, ∞) will be interpreted as resulting from a non-rigid deformation of u 3D and a subsequent projection into the plane. Above, both u 3D and u 2D are assumed to represent only that colour channel in which the same biological structures are visible; furthermore we will assume both images to be uniformly bounded (which is appropriate since the image intensities can be interpreted as the local concentration of fluorescent molecules). The forward operator of 2D microscopy can be modelled by where the blurring kernel χ ∈ L 1 (R 3 ) has compact support and encodes the microscope's point spread function. Above, L p denotes the standard Lebesgue function space, and χ * u denotes convolution (after extending u by zero outside Ω 3D ). Due to (the space of continuous functions with compact support) so that F is well-defined and even maps into C 0 (Ω 2D ) (the space of continuous functions on Ω 2D ). The value χ((x 1 , x 2 , 0)−z) can be interpreted as the amount of light recorded at (x 1 , x 2 ) by the 2D microscope from a point source of light at z, which is readily seen from (Fu)(x 1 , dz. In our model we assume the blurring kernel to be spatially independent so that the observed blur of a point z ∈ Ω 3D only depends on its height z 3 . A simple example (used in our calculations due to lack of a properly measured point spread function) is given by which blurs each point z outside the focus plane R 2 × {0} uniformly onto a disc of radius cz 3 .
Remark 1 (Compactness of forward operator). The previous calculations even show that F is a compact operator from L ∞ (Ω 3D ) to C 0 (Ω 2D ) (as expected for convolutions, even though we only extract a slice from the convolution result). Indeed, due to is equicontinuous and thus compact in C 0 (Ω 2D ) by the Arzelà-Ascoli theorem. However, in our analysis we will not make use of this fact.
Remark 2 (Less image regularity). With additional conditions on the blurring kernel χ one may reduce the image regularity to u 3D ∈ L q (Ω 3D ) and obtain a continuous linear forward operator F : L q (Ω 3D ) → L q (Ω 2D ) for any q ≥ 1. Indeed, assume that χ(·, ·, x 3 ) L 1 ≤ C uniformly for almost all x 3 ∈ R and some C > 0 (note that this is satisfied by our example kernel), then for u ∈ L q (Ω 3D ) we have using Fubini's theorem and thus |Fu(x 1 , x 2 )| q ≤ K R |u(·, ·, x 3 ) * χ(·, ·, −x 3 )| q dx 3 by Jensen's inequality for some K > 0 depending on the support of χ. Therefore, by Fubini's theorem and Young's convolution inequality we have Between the 2D and the 3D image acquisition the tissue is deformed by a deformation y : Ω 3D → R 3 , where for any point x ∈ Ω 3D the value y(x) shall be interpreted as the new position assumed during the 3D image acquisition. The inverse deformation y −1 thus moves the reference configuration back into the configuration during the 2D microscopy so that we expect u 2D = F(u 3D • y).
However, due to noise in the image acquisition and additional artefacts not included in our model (such as diffuse background signals) one cannot expect equality. Instead we shall seek a deformation y that leads to a small dissimilarity measure where d : R × R → [0, ∞) measures the distance between two image intensities (again u 3D is extended by zero outside Ω 3D ). The optimal choice of d is in general determined by the type of noise contained in the image data. We will focus on the L 1 and L 2 distance measures obtained with respectively. The former is well-known to be appropriate if the data contains strong outliers, while the latter is appropriate for additive Gaussian noise. Other choices include correlation-based distance measures (see for instance [16, Sct. 6.1]) or the Kullback-Leibler Divergence d KL (a, b) = b − a + a log a b in case of Poisson noise.
The minimization of J d [y] for general mappings y : Ω 3D → R 3 is neither physically reasonable nor well-posed. Indeed, y might be discontinuous or not even measurable so that the composition u 3D • y does not make sense. Thus we have to regularize the deformation y by imposing additional constraints and adding an extra energy term. Concerning physical constraints, y should neither reverse the orientation of the deformed body Ω 3D nor should it introduce self-intersections of the material. The first condition translates to det ∇y > 0 almost everywhere, whereas the second condition is imposed by additionally demanding where vol shall denote the three-dimensional Lebesgue measure. The advantage of this constraint is that it guarantees the injectivity of y if combined with det ∇y > 0 and that it also holds for the weak limit of a sequence (y k ) k satisfying the constraint, see for example [5,. We shall also constrain the maximal possible displacement according to in order not to shift the three-dimensional volume out of the visible area. In addition to those hard constraints, unreasonable growth or shrinkage of lengths, areas, or volumes by the deformation should be penalized. Modelling the biological tissue as an elastic material, such a penalization can be achieved by adding an elastic deformation energy as regularization, which will prevent physically unreasonable deformations with too high elastic energy. Here, W : R 3×3 → [0, ∞) represents the stored energy function. Since biological tissue is very soft, it typically undergoes considerable nonlinear deformation so that the elastic model should be geometrically nonlinear and, in particular, rigid motion invariant. Furthermore, due to lack of better information we may assume a homogeneous and isotropic elastic constitutive law. Together, the above assumptions imply that the stored energy function can be written as a function of the singular values or of the invariants of its argument (see for example [5,Ch. 4 with the Frobenius norm A F = tr(A T A) and the cofactor matrix cofA = det AA −T . Note that A F , cofA F , and det A control length, area, and volume changes respectively. Furthermore, biological tissue is almost incompressible so that deviation from det ∇y = 1 should be strongly penalized. In our numerical experiments we will use the particular example for a positive constant c 1 and a function g : To favour deformations obeying det ∇y ≈ 1, a term like | det ∇y − 1| 2 can be incorporated into the definition of g. Note that this function is polyconvex (that is, can be written as a convex function of (∇y, cof∇y, det ∇y)), which ensures weak lower semi-continuity of R as will be needed for the existence analysis. Summarizing, we arrive at the optimization problem whose solution is the sought matching deformation.
Remark 3 (Bayesian perspective). The task of registering the two datasets can also be viewed as an inverse problem, where a measurement u 2D is given alongside with a model of the forward operator y → F(u 3D • y), which maps a deformation of the tissue into the corresponding projection onto a planar image. Due to measurement noise and unknown modelling errors in the forward operator, the measurement is a random variable. Likewise, the sought deformation y can be interpreted as a random variable with different outcomes in different repetitions of the measurement. From a Bayesian perspective, one would now like to maximize the conditional probability (we formally use P like a probability density over an infinite-dimensional space). After taking the negative logarithm, the optimization problem thus turns into The term log P (u 2D ) is independent of y and can be neglected, and for the probability distribution of deformations it is reasonable to assume a Boltzmann-type distribution in which the probability decreases exponentially with increasing deformation energy. Likewise, the probability distribution of the measurement (u 2D ) k in the k th image pixel is taken as Summarizing, we arrive at the same optimization problem, .
, convex and lower semi-continuous in its first argument, and a polyconvex lower semi-continuous stored energy function W satisfying , det A} −r − β for some exponents p > 3, r > 0 and constants C, β > 0. Then E d admits a minimizing deformation y on the set of admissible deformations Proof. We follow the direct method of the calculus of variations. First note that so that ∇y k L p is uniformly bounded. Together with the admissibility condition y k L ∞ ≤ diam(Ω 2D ) we obtain uniform boundedness of y k W 1,p so that we can extract a weakly converging subsequence (still indexed by k) y k y in W 1,p (Ω 3D ; R 3 ). Due to p > 3, by Sobolev embedding we may even assume y k → y strongly in the space C 0,α (Ω 3D ; R 3 ) of Hölder continuous functions with exponent α < 1 − p/3. Lower semi-continuity of R: We have R[y] ≤ lim inf k→∞ R[y k ] by the properties of W . Indeed, by Hölder's inequality cof∇y k and det ∇y k are uniformly bounded in L 2p/3 (Ω 3D ) and L p/3 (Ω 3D ), respectively, and thus converge for a subsequence. By [5, Thm. 7.6-1] we even have cof∇y k cof∇y in L 2p/3 (Ω 3D ) and det ∇y k det ∇y in L 2p/3 (Ω 3D ) . Now Mazur's lemma implies the existence of a sequence of strongly and pointwise almost everywhere converging convex combinations as k → ∞, where N n ≥ k and the nonnegative coefficients a k i sum up to one. Since W is polyconvex we can write W (A) =W (∇A, cof∇A, det ∇A) for a convex functionW : Thus, with Fatou's lemma and the lower semi-continuity of W we now have Properties of limit function: The limit function y lies in A. Indeed, by the uniform convergence y k → y we have y L ∞ = lim k→∞ y k L ∞ ≤ diam(Ω 2D ). To see that det ∇y > 0 holds almost everywhere, consider the set for ε > 0. The growth condition on W and the lower semi-continuity of R imply Thus, vol(S ε ) → 0 as ε → 0 implies det ∇y > 0 almost everywhere. Likewise, due to the weak convergence det ∇y k det ∇y and the convergence y k → y in C 0,α (Ω 3D ) we have Furthermore, y is injective almost everywhere, that is, the cardinality equals 1 for almost every v ∈ y(Ω 3D ). Indeed, by the change of variables formula for Sobolev Since also the opposite inequality holds, we must have equality and N (y | v) = 1 almost everywhere.
Lower semi-continuity of J d : Note that we have u 3D • y k → u 3D • y as k → ∞ in any L q (Ω 3D ) with q ∈ [1, ∞). Indeed, for a Dirac sequence G δ of smooth mollifiers we have Abbreviating s = (r + 1)/r and employing Hölder's inequality, for the first summand we obtain where we used the change of variables for Sobolev functions [12,Thm. 2] as well as N (y k | v) = 1 for almost all v ∈ y k (Ω 3D ) (by the same argument as for y). Since u 3D ∈ L qs (Ω 3D ) and R[y k ] ≤ E d [y k ] ≤ E d [y 1 ], the right-hand side converges to 0 as k → ∞ and then δ → 0. For the second summand we observe L δ being the Lipschitz constant of G δ * u 3D . Again, letting first k → ∞ and then δ → 0 the right-hand side converges to 0. The third summand is treated like the first so that in summary Due to u 3D ∈ L ∞ (Ω 3D ), the composition u 3D • y k is uniformly bounded in L ∞ (Ω 3D ) so that any subsequence contains another weakly-* converging subsequence in L ∞ (Ω 3D ). Due to the strong convergence u 3D • y k → u 3D • y in L q (Ω 3D ), the limit must be the same and thus for the whole sequence. Furthermore it is straightforward to check that F is the adjoint operator to F : which is a bounded linear operator due to by Young's convolution inequality. As a consequence,

Numerical Implementation
In this section we discuss the discretization and numerical minimization of the energy functional which is nontrivial due to the nonlocal convolution operator in F, the composition of discretized functions, and the nondifferentiability of the discretized functions. As before, d denotes either the Euclidean distance d 1 (x, y) = |x − y| or its square d 2 (x, y) = |x − y| 2 , but other choices can be implemented in the same way. To obtain a differentiable functional in the former case (which will allow simpler numerics), we make the modification d 1 (x, y) = (x − y) 2 + δ 2 with δ > 0 a small regularization parameter. In our implementation the stored energy function W has the form with constants c 1 , c 2 , c 3 ≥ 0, D ∈ R such that W ≥ 0 and W (S) = 0 for rotation matrices S. This specific choice violates the growth condition of theorem 1 (which was needed to apply a change of variables formula for Sobolev functions, while the lower semi-continuity of R could also be obtained for weaker growth conditions [19,Thm. 3.6]), however, we observed no indication of degeneration of the deformations in our numerical experiments so that the above choice seemed sufficient. Note that other stored energy functions can be implemented just as well, for instance the strain energy densities of Neo-Hookean materials, which only differ from our choice by the slightly weaker penalty term for volume compression. Note also that in our experiments the constraints Ω3D det ∇y(x) dx ≤ vol(y(Ω 3D )) and y L ∞ ≤ diam(Ω 2D ) were always satisfied without explicit enforcement.
Assuming a twice differentiable 3D image u 3D , the first and second Gâteaux derivatives of J d in y ∈ A for suitable variations φ and ψ are where Hess u 3D denotes the Hessian of u 3D . While for convex d the second summand in is always positive semi-definite, the first summand may destroy this definiteness. For the sake of completeness, we also write down the expressions for the first and second Gâteaux derivative of the hyperelastic regularizer. Rewriting W defined in (1) in the form its partial derivatives are given by Spatial discretization. The image domains Ω 2D and Ω 3D are discretized by two dyadically nested hierarchies of regular rectilinear grids (T n 2D ) n=1,...,N and (T n 3D ) n=1,...,N , respectively. The number of nodes in the n th grid along each coordinate direction is M n = 2 n + 1, where the resolution of the given discrete input images determines N , the level of the finest grids T N 2D and T N 3D . Introducing multilinear Finite Element basis functions on each grid gives rise to a hierarchy of C 0 -Finite Element spaces X n = (X n 2D × X n 3D ) n=1,...,N with X n ⊂ X m whenever n ≤ m and corresponding restriction and prolongation operators chosen as follows. On a one-dimensional grid with M n nodes a Finite Element function ξ can be identified with the vector (ξ 1 , . . . , ξ Mn ) of its nodal values. On such a grid we define the one-dimensional restriction and prolongation operators as , ξ Mn ) (for ease of notation we set ξ 0 = ξ 1 and ξ Mn+1 = ξ Mn ). The restriction and prolongation operators R n : X n → X n−1 , P n : X n → X n+1 are then obtained by applying consecutively R 1D n and P 1D n along each coordinate direction of the grid. Fixing a grid T n 3D with N n = (2 n + 1) 3 nodes (x 1 , . . . , x Nn ), we denote the Finite Element basis functions on T n 3D by φ n 1 , . . . , φ n Nn . The Finite Element representation of u 3D in X N 3D is taken as

With the help of the identities ∂
and in X n 3D as u n 3D = R n+1 · · · R N u N 3D (note that we will denote discretized functions on grid level n with a superscript n). Similarly, the discretized deformation is expressed as y n = Nn j=1 y n j φ n j with nodal coefficients y n j ∈ R 3 . The discretized version u n 2D of u 2D on grid T n 2D is defined in an analogous way.
Discretized cost functional. The evaluation of the data term J d and its derivatives requires the computation of F(u 3D •y). On the discretized level, the convolution contained in F is computed with the help of the discrete Fourier transform (DFT). To this end, u n 3D • y n and the discretized convolution kernel χ n are evaluated at all grid points, and the resulting grid functions are padded with zeros so as to emulate the DFT using the fast Fourier transform on a periodic grid (in our case using routines of the FFTW project http://www.fftw.org/). The resulting grid function specifies the nodal values of a multilinear Finite Element function, which is then restricted to the x 1 -x 2 -plane to yield the discretized forward operator F n (u n 3D • y n ). Using second order Gaussian quadrature on each element, the discrete analogue of J d is now computed as where A n stands for the area of each element in T n 2D , w q 2D denotes the quadrature weights, and the sum is taken over all quadrature points q. Similarly, the discretized regularizer is evaluated via R n [y n ] = V n q w q 3D W (∇y n (q)) with V n the volume of each element in T n 3D . Finally, E d n = J d n + R n .
Discretized functional derivatives. For the numerical evaluation of ∂J d we exploit that the adjoint operator to a convolution is the cross-correlation. In more detail, for functions f : where we used Fubini's theorem and [χ f ](y) = [χ(−·, −·, −y 3 ) * f ](y 1 , y 2 ) denotes the adjoint operator to the convolution evaluated in the x 1 -x 2 -plane. Denoting by * n our discrete approximation of the convolution described previously, our discrete approximation of applied to two Finite Element functions f n ∈ X n 2D and χ n ∈ X n 3D is computed as the Finite Element function χ n n f n = χ n * n Bf n , where B : X n 2D → X n 3D is the operator copying all nodal function values from T n 2D into the x 1 -x 2plane of T n 3D and leaving all other nodal values 0. Using the above notation we can write Correspondingly, using second order Gaussian quadrature, the discretized derivative is calculated for each Finite Element basis function φ n,k j = e k φ n j (with e k the k th Cartesian unit vector) as The discretized analogue of ∂R can be written as ∂R n [y n ](φ k,n j ) = V n q w q 3D tr (∇φ k,n j ) T (q) 2c 1 ∇y n (q) − c 2 det(∇y n (q)) −2 + 2c 3 (1 − det(∇y n (q))) cof(∇y n (q)) , Discretized second derivatives. The second derivative of J d can be written as the sum ∂ 2 J d [y] = H L + H NL of a local and a nonlocal linear operator, which are discretized separately. Indeed, while H L (φ, ψ) is nonzero only if φ and ψ have overlapping support, the sparsity of a matrix representation of H NL (φ, ψ) depends on the size of the blurring kernel χ and will typically be very low, making the storage of the assembled matrix impractical. However, during our numerical optimization we will only apply iterative solvers like BiCGStab, which only require the evaluation of matrix-vector products. Due to the tensor product structure of the integrand of H NL , the application of the linear operator can be implemented efficiently as follows. Again exploiting the relation between convolution and the operator we can rewrite Thus, given a Finite Element function ψ n ∈ (X n 3D ) 3 , for each Finite Element basis function φ k,n j = e k φ n j we can compute the discretized analogue The operator H L is discretized as where Hess n u n 3D is defined weakly as in mixed Finite Elements approaches. Indeed, as an artefact of our discretization, the piecewise multilinear Finite Element function u n 3D does not possess a weak second derivative (part of its distributional second derivative is concentrated on the element boundaries), yet second order information Hessu 3D is helpful for the registration and should not be neglected in the discretization. Thus we define Hess n u n 3D via where n denotes the unit outward normal to ∂Ω 3D . This amounts to solving a linear system of the form M V = LU 3D for the nodal value vector V of Hess n u n 3D with M a mass matrix, L a stiffness matrix, and U 3D the vector of nodal values of u n 3D . Note that the matrix representation of H L n is just a weighted mass matrix.
Numerical optimization. We tested and compared the performance of gradient-based methods, in particular nonlinear conjugate gradient and quasi-Newton methods, and a second order line search or trust region Newton method. Below we provide a few details on the latter (the implementation of the former being straightforward). The line search Newton method for the numerical minimization of E d n over (X n 3D ) 3 takes the form where we compute the step size γ k > 0 using a backtracking line search with Armijo's condition and where the inverse linear operator is applied using BiCGStab (note that due to the lack of sparsity  in H NL n , the linear system has to be solved iteratively). However, while H NL n is always positive semidefinite, H L n and ∂ 2 R n can be indefinite so that ∂ 2 E d n may be so as well. Consequently, the Newton step may not be a descent direction, and the iteration might converge to a saddle point. To compensate a possible lack of positive definiteness we add a scalar multiple λ k of the identity to the Hessian operator, where −λ k should approximate the most negative eigenvalue of H L [y n k ] + ∂ 2 R n [y n k ]. To determine λ k , we use the procedure described in [7, Sct. 8.5.2, Thm. 8.5.1], which consists of a number of truncated Lanczos iterations to obtain a symmetric tridiagonal approximation T ∈ R m×m to H L [y n k ] + ∂ 2 R n [y n k ] and a subsequent computation of its characteristic polynomial, whose smallest zero is found via a bisection method. Further modifications of the Newton iteration allow a further reduction of the computational complexity of each Newton step. Since ∂ 1 d(F(u 3D • y), u 2D ) is zero for perfectly aligned images, the contribution of H L for closely aligned images is negligible and ∂ 2 J d n can be approximated by H NL +∂ 2 R n . We refer to [21] for a detailed discussion of this strategy in the context of hyperelastic 3D-3D image registration. Figure 2a compares the energy decrease of Newton's method and this modification when applied to the datasets shown in Figure 8. Figure 2b shows that the major cost of each Newton iteration lies in the numerical solution of the linear system. To improve convergence of the BiCGStab solver, we tested several preconditioning methods. Jacobi, geometric scaling, and incomplete LU preconditioning -applied to the part H L [y n k ] + ∂ 2 R n [y n k ] of the Hessian matrix that can be assembled -turned out to be inferior to a multigrid preconditioner applied to the entire Hessian operator ∂ 2 E d n . Note that this operation does not require the assembly of ∂ 2 E d n , since iterative solvers like BiCGStab or GMRES -this time without preconditioning -can be employed for the pre-and postsmoothing steps. A line search Newton method is prone to getting stuck or at least slowing down at saddle points (even despite the compensation for indefiniteness). This can be avoided by using a trust region Newton method, which can also minimize indefinite quadratic functions within its trust region. To this end we solved the Newton system via preconditioned truncated Lanczos iterations [8,26] (since this simultaneously allows to use the above-mentioned technique for compensating indefiniteness), where we applied the same preconditioners as in the line search approach. Overall, as already suggested by fig. 2b, the solution of the linear system in Newton's method (line search or trust region) turns out to consume so much time that a mere quasi-Newton method with BFGS updates (see e. g. [17,Ch. 6.1]) is more efficient. In fact, a plain conjugate gradient descent with Polak-Ribiére updates (see e. g. [6,Sct. 8.5]) performed best in our experiments.
Preprocessing. Before starting the minimization of E d n we rigidly align u 3D to u 2D by minimizing J d n [y n ] among all rigid deformations (for which actually E d n = J d n ). In fact, to accommodate a potential slight mismatch in magnification between the 2D intravital and the 3D confocal microscopy as well as different resolutions in x 1 -, x 2 -, and x 3 -direction (or to make up for a bad choice of the blurring kernel χ), we additionally allow a rescaling along the coordinate directions. Thus, we minimize J d n among all deformations parameterized by a translation vector t ∈ R 3 as well as scalings s 1 , s 2 , s 3 along and rotation angles α, β, γ ∈ [0, 2π[ about the three coordinate directions (R i (δ) ∈ SO(3) denotes the rotation about the i th axis by angle δ). We now use the grid hierarchy T n 3D , n = 1, . . . , N , to iteratively find the optimal parameters for each grid level n via a quasi-Newton method initialized with the optimal parameters from level n − 1. After the optimal deformation on level N is found, we replace u 3D by its composition with that deformation so that the new u 3D now has the correct length scales and already is optimally aligned.
Multilevel optimization problems. We have already detailed how by replacing J d , R, and E d with discretized analogues J d n , E d n , and R n we arrive at a set of optimization problems that are numerically solved using the nonlinear conjugate gradient method. As so often in image registration methods, the use of a multilevel approach is essential for the quality of the results as well as for computational efficiency. Owing to its nonconvexity, the functional E d can be expected to have a large number of local minima, which is in general linked to the resolution of the given image data and the number of image features that promote regional alignment. Downsampling of the image data reduces the number of image features (and thus of local minima) in the input datasets as well as the complexity of the optimization procedure (see the experimental illustration in fig. 4). The results of the less costly optimization on coarser grids can then be used as good initial values for the higher level optimization problems. We make use of this strategy by first minimizing E d n on a low grid level n and then successively solving the optimization problem on higher grid levels. In addition to the use of hierarchical grids we will use a smoothing-based multiscale strategy. A drawback of local, pixel-based distance measures is their inability to align corresponding but non-overlapping image features. As illustrated in fig. 4, blurring of the datasets can compensate the lack of overlap at the expense of a diminished data fidelity. We therefore solve the registration problem for blurred versions of u 2D and u 3D with successively decreasing blur radius. In our implementation, we convolved u 2D and the slices of u 3D in x 1 -x 2 -direction with a Gaussian kernel 2s 2 ) via the fast Fourier transform. Experimentally, a good sequence of decreasing kernel radii turned out to be s = 8h, 4h, 2h, 0, where h denotes the grid width of the volumetric input image.
Full algorithm. The full algorithmic workflow is depicted in fig. 5. Note that the iteration over the different grid levels always starts with that level n as the coarsest one on which the grid size corresponds to the current blurring kernel radius. The actual implementation was based on the QuocMesh Library, a C++ Finite Element library which supports quadratic, cuboid and simplicial elements. res.
----- Figure 4: Left: A reference image u r (red) and a template image u t (green) show the same structure at different locations. Due to the lack of overlap, the gradient of the registration functional y → R 2 d(u t • y, u r ) dx at y = id is zero. Right: After blurring both images the structures overlap, resulting in a nonzero gradient. The negative gradient, which points into the direction of an improved registration, can be interpreted as a perturbation of y = id which produces a slight rightward deformation in the overlapping region, as indicated by the arrows.
Synthetic data. To test the performance of the elastic regularizer, we applied our technique to synthetic datasets (figs. 6 to 8), representing cuboids and a deformed vessel structure made of some elastic material.
In the first two test cases (figs. 6 to 7), the simplicity of the shapes made it possible to generate the three-dimensional deformed and undeformed scenes u 3D and u 3D • y by setting the pixel intensities manually. The two-dimensional reference images u 2D were then obtained by applying the forward operator to the undeformed scenes. Figure 6 depicts the results of a test assessing the algorithm's ability to compute non-rigid lateral deformations. The resolutions of u 3D and u 2D in this example are 129 × 129 × 17 and 129 × 129. Figure 6 shows, that the overall structural alignment works flawlessly, but also that small-scale spurious deformations can be introduced locally which have negligible influence on the data fidelity term. The second test, shown in fig. 7, is intended to evaluate how well the algorithm infers the displacement in viewing direction from the blurriness of the two-dimensional image. To this end we use a configuration of two cubes, where the deformed scene differs from the undeformed configuration only by a vertical displacement of one cube. The underlying resolutions are 129 × 129 × 129 and 129 × 129, respectively. As fig. 7 shows, the left cube is correctly displaced in viewing direction, but its initial shape is not entirely preserved. The lack of smoothness of the actually applied transformation which is incompatible with the regularizer R, explains this phenomenon. As the hyperelastic regularizer favours more regular deformations, its contribution to the overall energy will outweigh those of the data term, if, as in this case, a transformation introduces too much shear. Since vessel structures constitute the predominant image features in our microscope images, we apply the algorithm to another, more realistic test case to see whether branched tube-like structures Registration result u 3D •y (perspective view) Figure 6: Inplane deformation of a pair of elastic cuboids. In this experiment and the one shown in fig. 7, the parameters of the stored energy function W were chosen to make the two cuboids behave like they were embedded in some soft, gel-like substance. are registered equally well as in the previous cases. The synthetic dataset shown in fig. 8 was generated with the help of VascuSynth [9], a software package capable of generating realistically looking synthetic vessel structures. To obtain a volumetric template dataset u 3D , the generated vessel structure was deformed using a CGAL [11] implementation of the algorithm described in [22], which is capable of generating triangular mesh deformations in real-time under the constraint that the resulting deformation acts as rigidly as possible on each triangle. The deformed and undeformed triangular meshes were then turned into grayscale image stacks. As in the previous tests, the application of the forward operator to the undeformed volume image stack generated the two-dimensional reference image u 2D . The dimensions of the input datasets were the same as in our first test case. It turned out that in this example, a preprocessing step as described in section 3, preceding the elastic registration procedure, was necessary to obtain a satisfactory data alignment. The comparison of the reference image u 2D and the projected registration result F(u 3D • y) in fig. 8 indicates a faultless overall alignment, but spurious small-scale deformations similar to those encountered in the first test case.
Microscopy data. In fig. 9 the technique was finally applied to a microscopy dataset as described in the introduction. The centre region of the reference image was obscured by diffused fluorescence dye, interfering with the registration process. As a consequence, the data term had to be augmented by a mask m : Ω 2D → (0, 1] taking small values in the degraded image region, m(x) d(F(u 3D • y)(x), u 2D (x)) dx .
As can be seen from overlaying u 2D and F(u 3D • y) in fig. 9, right, the alignment of the blood Figure 10: Registration result from fig. 9 showing additional channels combining temporal information from two-dimensional intravital microscopy with well-resolved spatial information from three-dimensional confocal microscopy images acquired after tissue excision. Magenta shows fat cells (from confocal microscopy), blue shows leukocytes (from intravital microscopy).
vessel structures is satisfactory. Note that projecting the original, undeformed three-dimensional configuration, as shown in fig. 9 left, one obtains dark regions on the right middle part of the image, where vessels lie outside the focus plane. This is corrected by the registration so that the overall registration result on the right of fig. 9 shows both a strong red and green signal in that region. Similarly, the in-plane distortion on the left side of the image is corrected by the registration. The alignment of the three-dimensional with the two-dimensional images allows information of other colour channels to be integrated into the single dataset as shown in fig. 10. Thereby one can combine temporal information from intravital microscopy with well-resolved spatial information from confocal microscopy that can only be obtained after tissue excision. Here, clusters of fat cells that surround the larger blood vessels were stained after tissue excision and are shown in magenta, while individual migrating leukocytes were observed during intravital microscopy and are shown in blue.