Template-Based Image Reconstruction Facing Different Topologies

The reconstruction of images from measured data is an increasing field of research. For highly under-determined problems, template-based image reconstruction provides a way of compensating for the lack of sufficient data. A caveat of this approach is that dealing with different topologies of the template and the target image is challenging. In this paper, we propose a LDDMM-based image-reconstruction model that resolves this issue by adding a source term. On the theoretical side, we show that the model satisfies all criteria for being a well-posed regularization method. For the implementation, we pursue a discretize-then-optimize approach involving the proximal alternating linearized minimization algorithm, which is known to converge under mild assumptions. Our simulations with both artificial and real data confirm the robustness of the method, and its ability to successfully deal with topology changes even if the available amount of data is very limited.


Introduction
In medical applications such as computed tomography (CT) [31], images are typically observed via indirect and potentially noisy measurements.Especially when the amount of measured data is limited, obtaining meaningful reconstructions is challenging.This is, for instance, the case in limited-angle CT [17,31], where sparse data is acquired in order to minimize exposure time of organisms to X-radiation.In such settings, it is inevitable to add a priori information about the target into the reconstruction process, e.g., in form of a template image that is somehow close to the expected reconstruction.Template-based methods, outlined in more detail below, encode this closeness assumption directly into the reconstruction process.Hence, any reconstruction will strongly depend on the chosen template.If a good template is available, e.g., from an earlier observation, competing methods such as the filtered backprojection [31] or total-variation (TV) regularization [39] are outperformed by large margins [10,Sec. 10].Clearly, template-based methods can also be applied for other inverse problems such as deblurring or MRI.In the following, we discus indirect image matching in more detail.The included examples lead us to the proposed model, which is a simplified extension of the metamorphosis approach proposed in [18].

Indirect image matching
Let Ω ⊂ R d be a bounded domain.Image matching refers to the task of transforming a template image T ∈ L 2 (Ω) to match a target image U ∈ L 2 (Ω) as closely as possible regarding some misfit measure.For indirect image matching, the target U is additionally In the definition of V, V is an admissible vector space that continuously embeds into C1,α 0 (Ω, R d ), 0 < α ≤ 1, namely the closure of C ∞ c (Ω, R d ) with respect to the Hölder norm ∥ • ∥ C 1,α .Using the diffeomorphism φ, we implicitly define a transformation path I : [0, 1] × Ω → R starting from the template T via I(t, φ(t, x)) = T (x).From an Eulerian perspective, we can link this path I to the velocity v directly [10], namely as the (weak) solution of the transport equation ∂ ∂t I(t, x) + v(t, x)∇ x I(t, x) = 0 for (t, x) ∈ [0, 1] × Ω, I(0, x) = T (x) for x ∈ Ω.
We denote the set of feasible tuples (I, v) ∈ L 2 ([0, 1], L 2 (Ω)) × V solving this PDE by A. Given the data g and the template T , a reconstruction can now be defined as R = I(1, •) = T (φ −1 (1, •)) 1 , where I is a minimizer of the variational problem Here, the data fidelity term D : Y × Y → R ≥0 quantifies the misfit of K • I(1, •) with the measurements g, and the regularizer E : V → R ≥0 enforces the required smoothness of v [10].Note that (2) can be simplified by using linearized deformations and a Taylor expansion around some initial flow field, leading to optical-flow-based transformation models [1,6,22,35].However, an image transformation model purely based on diffeomorphisms can yield unsatisfying reconstructions, e.g., if the images have different mass or topological properties [10].To resolve this issue, [18] proposed to replace the flow of diffeomorphism model (3) underlying (4) with the metamorphosis model [28,38,43,44].In addition to the transport of pixel intensities, metamorphosis also allows intensity variations along the trajectories based on a source term ζ ∈ L 2 ([0, 1] × Ω).Hence, this model can create or remove objects during the transformation process.To put this into formulas, metamorphosis paths I are solutions of the transport equation ∂ ∂t I(t, x) + v(t, x)∇ x I(t, x) = ζ(t, x) for (t, x) ∈ [0, 1] × Ω, I(0, x) = T (x) for x ∈ Ω.
We denote the set of feasible tuples From the Lagrangian perspective, I(t, x) can be equivalently defined using the solution φ of (2)via The relation between ( 5) and ( 6) is commonly known as the method of characteristics [15].Given the data g and the template T , a reconstruction can then be defined as R = I(1, •), where I solves Here, the additional regularizer E 2 : L 2 ([0, 1] × Ω) → R ≥0 enforces the necessary regularity of ζ.
Proposed model If we take a closer look at (7), we notice that the objective takes only I(1, •) into account and is blind to the path I at all other times.Further, it holds that Consequently, any reconstruction R consists of a template deformation and some intensity change z that depends on ζ and φ.In (7), we have a complicated regularization of z in terms of both E 1 and E 2 .To obtain a simpler reconstruction model, we propose to replace the underlying transformation model ( 5) in (7) by The set of tuples (I, v, z) ∈ L 2 (Ω) × V × L 2 (Ω) satisfying ( 8) is denoted by C. Loosely speaking, we first deform the template T based on φ.Afterwards, we modify the pixel values with the source z.
Based on the simplified transformation model (8), we propose the reconstruction model Instead of implicitly regularizing z as in (7), we directly regularize it with E 2 : L 2 (Ω) → R ≥0 .To this end, we can rely on any well-established (convex) regularizer for images.The reconstruction R in ( 9) is composed of two summands, where one depends on v and the other on z.Therefore, designing efficient numerical schemes for (8) appears simpler than for (6), where we have a highly nonlinear dependence of R on ζ.Additionally, the dimensionality of z is lower than that of ζ as we do not have a time dependence.Since each tuple (v, z) gives rise to a unique reconstruction R v,z , we can eliminate the set C from (9) and end up with where the system (8) is implicitly encoded in R v,z .
Contributions We study the regularizing properties of the proposed model (10), and develop efficient numerical schemes for solving it.To the best of our knowledge, the only other templatebased reconstruction model with a source term is [18], but their approach involves a nonlinear coupling between the deformation and the source part.Further, their regularizer choice is restricted to differentiable ones.The simulations in [18] are based on L 2 regularization for ζ in (7), which in our experiments turned out to be unsuitable for sparse data.Better regularizers seem to be necessary to avoid reconstruction artifacts related to the source term in both ( 7) and (10).Hence, we propose to use TV regularization instead, which is known to yield good results for CT problems at reasonable computational cost.With this choice, we obtain meaningful reconstructions even for sparse data, which is a setting that [18] does not target.Compared to [23], we achieve reconstructions of similar quality, but with the advantage that topology changes are possible due to z.The detail of reconstructable structures that are not present in the template T depends on the available amount of data g, i.e., we cannot reconstruct detailed structures out of nothing.Hence, the proposed method is most useful if a good template T is available.Many algorithmic approaches for the flow of diffeomorphism model have been proposed during the last years [2,26,20,34,40,47,48].For linear forward operators K, these approaches can be often extended to the indirect setting (9) without any complicated modifications.As Lagrangian approaches turned out to be very efficient for indirect image matching, we decided to adapt the methods developed in [23,24], which build upon the FAIR toolbox [30].Since the problem is nonsmooth due to the TV regularization of z, we cannot deploy their proposed Gauss-Newton-Krylov solver, and we use the iPALM algorithm [37] instead.Similarly as in [23], the ODE in ( 8) is solved with an explicit Runge-Kutta method, which allows one for efficient algorithmic differentiation of (v, z) → R v,z .By construction of R v,z , computing its gradient ∇R v,z has basically the same cost as for the LDDMM-based model [23], which does not involve a source term z.Further, the approach does not require the storage of multiple space-time vector fields or images at intermediate time instances, as it is often the case when directly solving the PDE (5) with Eulerian methods.We want to emphasize that the proposed scheme can be implemented matrix-free, which is crucial when using dense forward operators K such as the Radon transform.Since iPALM has in general worse convergence rates than second-order methods, we combine it with a Gauss-Newton solver for v as post-processing step.This can only improve the objective function values, and in practice we also observed an improved reconstruction quality.
Outline In Section 2, the necessary theoretical background for the flow equation (2) and the total variation is provided.For the proposed model (10), existence of a minimizer, stability with respect to the data, and convergence for vanishing noise are established in Section 3. In order to approximate solutions of (10) numerically, we follow a discretize-then-optimize approach that involves the iPALM algorithm as outlined in Section 4. This allows one to easily exchange the regularizer for v and z if desired.Our implementation builds upon the FAIR toolbox [30], which allows for a simple extension to other distances and regularizers that are already implemented as part of the toolbox.Numerical results for the proposed model are provided in Section 5. Finally, conclusions are drawn in Section 6.

Preliminaries
In this section, we briefly review the necessary theoretical background.
Diffeomorphisms Recall that the deformations φ are induced by (2).For our theoretical investigations, it is useful to consider different initial times s ∈ [0, 1], i.e., the modified equation Here, the subscripts express the dependence on the initial time s and the velocity field v.This generalization enables us to move back and forth on trajectories φ s,v (•, x) starting from arbitrary time instants s, allowing us to rely on a unified theoretical result.The following theorem is a reformulation of [43, Thms. 1 and 9] and characterizes the solutions of (11).11).Further, the solution operator ) assigning a flow φ s,v to a velocity field v is continuous with respect to the weak topology in L 2 ([0, T ], V).
As φ t,v (0, φ 0,v (t, x)) = x, we directly get that φ 0,v (t, •) is a diffeomorphism for every t ∈ [0, 1].Now, let us have a closer look at the solutions of (8).Since Total variation The total variation [39] is a popular regularizer in imaging as it tends to preserve edges and sharp structures, which is in contrast to other techniques such as linear smoothing or Tikhonov regularization [7].For any Based on this, the total variation of f ∈ L 1 (Ω, R) is introduced as and the functions of bounded variation are defined as Equipped with the norm ∥f ∥ BV := ∥f ∥ L and since C 1 c (Ω, R d ) is dense and continuously embedded in C 0 (Ω, R d ), the gradient ∇f can be uniquely extended to a continuous linear functional on C 0 (Ω, R d ) using the Hahn-Banach theorem.

Therefore, we can associate a unique measure
Hence, BV(Ω) consists of those functions f ∈ L 1 (Ω) having a distributional gradient that is a finite Radon measures.Note that the domain of TV can be extended to It is well-known that this extension is proper, convex and (weakly) lower semi-continuous (lsc) [7,Lem. 6.105].Furthermore, we have for p ≤ d d−1 the continuous embedding BV(Ω) → L p (Ω) as well as the Poincaré-Wirtinger inequality ∥P 0 f ∥ p ≤ TV(f ), where More precisely, P 0 is the projection onto the complement of the subspace Π 0 of the constant functions.The projection onto Π 0 itself is denoted by This can be used to prove coercivity of functionals involving TV regularization, provided that the remaining terms are coercive with respect to ∥Q 0 f n ∥ p .

Regularizing Properties
Here, we study the regularizing properties of the reconstruction model (10) following the considerations in [23,Sec. 3] and making the necessary modifications due to additional source term z.
General case Throughout this section, we assume that V fulfills the regularity requirements from Theorem 2.1, i.e., V → C 1,α 0 (Ω, R d ) for some 0 < α ≤ 1. Regarding the data fidelity term D, the forward operator K, and the regularizers E 1 , E 2 , we make the following assumptions: 1.The operator K : L 2 (Ω) → Y is continuous and weak-weak-continuous, i.e., This can be interpreted as coercivity in v and z.

6.
The regularizers E 1 and E 2 are weakly lsc.
Remark 3.1.These conditions readily imply that if Further, it is easy to verify that Conditions 1-4 are fulfilled if D is a metric and K a bounded linear operator.
First, we prove existence of a minimizer for (10).
Proof.Let {v n , z n } n∈N ⊂ V × L 2 (Ω) be a minimizing sequence for (10).As E 1 (v n ) ≤ J(v 0 , z 0 )/λ 1 holds for all n ∈ N, the sequence {v n } n∈N is bounded in V by Condition 5. Hence, there exists a subsequence, also denoted with {v n } n∈N , such that v n ⇀ v * for some v * ∈ V.By Theorem 2.1 and the discussion thereafter, the sequence •) and is thus bounded.By Condition 5, we conclude that also {z n } n∈N is bounded.Therefore, we can extract a weakly convergent subsequence, also denoted with {z n } n∈N , with limit Hence, (v * , z * ) is a minimizer for (10).
Note that ( 10) is non-convex.Hence, we cannot expect uniqueness of the minimizer.If K is nonlinear, it appears sensible to require that it is completely continuous, i.e., that it maps weakly convergent sequences to strongly convergent ones.In this case, we can relax the constraints for D by considering operators that are only lsc.Next, we provide a result regarding the continuous dependence of minimizers for (10) on the data g ∈ Y .
Theorem 3.3 (Dependence on the data).Consider a sequence g n → g in Y .For each n ∈ N, let (v n , z n ) ∈ V × L 2 (Ω) be a minimizer of the functional J n := J λ,gn , where λ = (λ 1 , λ 2 ).Then, there exists a subsequence of {v n , z n } n∈N that converges weakly to a minimizer of J := J λ,g .(18) where the convergence follows by continuity of D in its second entry (Condition 1).Similarly as in Theorem 3.2, we choose a subsequence of {v n } n∈N with limit v * such that Now, Condition 5 implies that also {z n } n∈N is bounded.Hence, we may extract a weakly convergent subsequence {v n , z n } n∈N with limit . By incorporation of (19), we get that it holds Now, let (ṽ, z) ∈ V × L 2 (Ω) be arbitrary.By utilizing (20) and the weak lower semi-continuity of E 1 and E 2 , we have The continuity of D implies D(K(R ṽ,z ), g n ) → D(K(R ṽ,z ), g).Thus, it holds that which concludes the proof.
We conclude our investigation with a convergence result for vanishing noise, provided that we use an appropriate parameter choice rule λ = γ(δ).This enables us to approximate solutions of (1).Theorem 3.4 (Convergence for vanishing noise).Let T ∈ L 2 (Ω), g ∈ Y and assume that there exists Then there exists a subsequence of {v n , z n } n∈N that converges weakly to a point (v * , z * ) with K(R v * ,z * ) = g.

Proof.
For every n ∈ N, it holds that Hence, {v n } n∈N and {E 2 (z n )} n∈N are bounded.Further, D(K(R vn.zn ), g n ) is bounded, which as in Theorem 3.3 implies that {z n } n∈N is bounded.Therefore, {v n , z n } n∈N is bounded and possesses a weakly convergent subsequence (without relabeling) with limit (v * , z * ), for which the following estimate holds true Finally, by using Condition 3, we deduce K(R v * ,z * ) = g.

Specific setting
In our numerical experiments, we work with 2D images, i.e., Ω = (0, 1) 2 ⊂ R 2 , and we choose E 2 (z) := TV(z).Further, the data fidelity term D is chosen for all examples with synthetic data as D(f, g) = 1 2 ∥f − g∥ 2 , and the regularizer for v as where B is a differential operator such that Condition (5) is satisfied, i.e., E 1 is coercive in v. Since the Sobolev space V = H 3 0 (Ω, R 2 ) can be continuously embedded into C 1,0.5 (Ω, R 2 ), we have to choose a matrix B that encodes all third-order derivatives in space.We want to remark that our numerical approach in Section 4 works for any regularizers of the form (25), even if it is not coercive.Now, the specification of problem (10) reads In case that K : L 2 (Ω) → Y is linear, bounded and does not vanish for constant functions, the assumptions underlying our theoretical investigations in the previous paragraph are satisfied: • As K is linear and bounded, it is also continuous and weak-weak-continuous.
• Due to the norm properties, D is weakly lsc and continuous in both entries.Further, Condition 3 and 4 also follow from the norm properties.
• We show Condition 5 explicitly.Due to our choice of B, the regularizer E 1 is coercive.Hence, it remains to show the first part of the condition.Let {f n } n be bounded and let ∥z n ∥ → ∞.
Then, either ∥P 0 z n ∥ → ∞ or ∥Q 0 z n ∥ → ∞.In the first case, we have that E 2 (z n ) → ∞ and the claim follows by positivity of D. For the second one, we get due to linearity of K that where the first term remains bounded.Since ∥K(Q 0 z n )∥ → ∞ due to the assumption that K does not vanish for constant functions, we conclude ∥K(f n + z n ) − g∥ → ∞ and the claim follows by positivity of E 2 .
• The regularizers E 1 and E 2 are chosen such that they are weakly lsc.
Remark 3.5.Although this is not covered theoretically, we use a normalized-cross-correlation-based distance for our numerical experiments with real data as proposed in [23].This modification is necessary as the gray value scale between template and target is often different for real data.It holds that

Numerical Approach
In this section, we present a numerical scheme for solving (26).Our approach is based on the Lagrangian method developed in [23,24] as well as the iPALM algorithm [4,37].The actual implementation relies upon the FAIR toolbox [29].
So far, we have not specified the differential operator B in (26).As our approach is guided by the modular framework of [23,24], we could use curvature regularization, defined by B = ∇ x , or diffusion regularization, where B = ∆ x .These choices correspond to the H 1 and H 2 semi-norm, respectively.However, they do not satisfy the coercivity requirement, i.e., Condition 5. Instead, we use the H 3 semi-norm, which has been proposed in [23].Although our theoretical investigations for (26) in Section 3 only hold for the 2D case, we provide the algorithm in general form, and also deploy it for 3D images later.In the following, we briefly sketch the components of our approach.

Discretization
We pursue a discretize-then-optimize strategy.By partitioning every coordinate in m blocks of length h X = 1/m, the domain (0, 1) d is split into m d equally sized cubes.Then, the template T ∈ L 2 (Ω) and the source z ∈ L 2 (Ω) are both sampled at the cell-centered nodes x c ∈ R m d , resulting in discrete versions T(x c ) and z(x c ) ∈ R m d , respectively.Their values are interpolated by cubic B-splines if off grid values are required.Further, the time domain [0, 1] is uniformly partitioned into m t units of length h t = 1/m t .Then, the velocity v : [0, 1] × Ω → R d is sampled over cell-centered locations in space and at the nodes in time, resulting in a discrete velocity vector Lagrangian solver for R v,z In order to compute the solution map (v, z) → R v,z , we need to solve the flow equation (11).Here, every function φ s,v (•, x 0 ) : [0, 1] → Ω can be interpreted as a trajectory of some particle with position x 0 at initial time s.We compute R v,z as follows: 1. Computing the characteristics: For numerically solving (11), we employ a fourth order Runge-Kutta scheme (RK4).Since we require φ 1,v (0, •), we solve (11) backwards in time with N t equidistant steps of size ∆t = − 1 Nt and initial condition φ 1,v (1, x c ) = x c .To simplify the notation, the remaining discussion is instead based on the explicit Euler scheme where k = 0, . . ., N t − 1 and t k = 1 − k∆t.Here, I interpolates the velocity at time t k and transformed positions φ 1,v (t k , x c ).This is necessary as the points φ 1,v (t k , x c ) are in general not on the grid.Note that the time discretization parameters N t and m t differ in general: The first determines the accuracy of the ODE solver, whereas the second is related to the discretization.

2.
Deforming the template T : Based on the output φ 1,v (0, x c ) of the RK4-scheme, we can evaluate the template T at the deformed grid using interpolation.
3. Computing R v,z : Finally, we add the source term z and the deformed template Note that actually all steps of this procedure are independent of the forward operator K, the data fidelity term D and the chosen regularizers E 1 , E 2 .Further, the gradient ∇ v R v,z can be explicitly computed within the Runge-Kutta scheme, which is important for computational efficiency.
Forward operator Let us denote by K : R m d → R M , M ∈ N, a finite-dimensional, Fréchet differentiable approximation of the operator K : L 2 (Ω, R) → Y .With the application to CT in mind, we discuss a discretization of the d-dimensional Radon transform.More generally, if a discretization K of some operator K is given, we can simply insert it into the model (26).
For given θ ∈ S n−1 and s ∈ R, the Radon transform of f : R d → R is defined pointwise by where θ ⊥ is the orthogonal complement of span{θ} [31,Chap. 2].The Radon transform is linear and thus also Fréchet differentiable.For f ∈ L 2 (Ω), the value R(f ) ∈ Y is a function that maps from the cylinder S d−1 × R to R. We discretize this cylinder as follows: Take p ∈ N directions in S d−1 .For simplicity, we say that we take one measurement in each direction.Furthermore, we take the interval (0, 1) instead of R, and split it into q ∈ N equally sized cells of length 1/q, as we have also done with Ω. Depending on the dimension d and the diameter of Ω, the intervals length requires adjustment.Then, the data is sampled at cell-centered points y c for each angle, resulting in vectors g i (y c ) ∈ R q , i = 1, . . ., p, and the entire data vector is represented as g ∈ R M with M = p • q.A discrete Radon transform is implemented for both CPUs and GPUs as part of the ASTRA toolbox [36,45,46].

Data fidelity term and regularizers
We discretize the data fidelity term D(x, y) = 1 2 ∥x − y∥ 2 2 using the midpoint-rule for numerical integration, which results in with h Y = 1/q and x, y ∈ R M , see also [29,Chap. 6.2].As we consider the Radon transform for few directions θ ∈ S n−1 only, we disregard the necessary modifications related to the integration over the unit sphere.Similarly, the discretization of the regularizer E 1 is given by where B is a finite-difference counterpart of the chosen differential operator B. To mitigate boundary effects caused by the discretization of B, we pad the spatial domain and impose zero Neumann boundary conditions.For the TV regularizer, the norm is again discretized based on the midpointrule and the gradient is replaced by a finite difference counterpart, resulting in In our experiments, we employ backward differences and the boundary is extended by 0. To this end, we denote by z ∈ d i=1 R m the tensor representation of z ∈ R m d .For any i = 1, ..., m d , z i corresponds to z (i) = z i1,...,i d with i k = 1, ...m.Then, it holds that (∇ h X (z)

iPALM
Putting all parts from Section 4.1 together, we obtain the discrete problem In the following, we deploy the inertial proximal alternating linearized minimization (iPALM) algorithm [4,37] for solving (35).This scheme can solve generic problems of the form where , namely the proper, convex, and lsc functions on E i .The corresponding iterations are given by where prox τ f (x) = arg min y 1 2 ∥x−y∥ 2 2 +τ f (y) is the proximal mapping of f .The stepsizes σ 1 k , σ 2 k > 0 are chosen according to the respective partial Lipschitz-constants of ∇H, and α k > 0 is an inertia parameter, which helps to escape from local minima and boosts the convergence speed.In [37], the convergence of the iterations ( 37) is proven under the assumption that the objective has the Kurdyka-Łojasiewicz (KL) property.Among others, this property holds for semi algebraic functions, which include real polynomials, the ∥ • ∥ p -norm with rational, non-negative p, indicator-functions of semi-algebraic sets, as well as functions of the form x → sup{g(x, y) : y ∈ S}, where S is semialgebraic and g is a semi-algebraic function, see [4].Furthermore, compositions of semi-algebraic functions are semi-algebraic.(36) has the KL property.Further, let all functions have finite infima.Assume that ∇H is locally Lipschitz continuous and that both x i → ∇ xi H(x 1 , x 2 ) are globally Lipschitz, where the constants L 1 (x 2 ), L 2 (x 1 ) possibly depend on the fixed variable and are bounded on compact sets.Finally, assume and α k < 0.5 for every k ∈ N. If the sequence generated by (37) is bounded, then it converges to a critical point.
Remark 4.2.Although this is not supported by Theorem 4.1, it turned out that choosing ) works very well in practice [37].Unfortunately, the Lipschitz-moduli L(x k ), L(y k ) are often unknown or difficult to compute.Instead, a backtracking scheme can be used to ensure this condition, see [37] for details.
Next, we want to apply iPALM to problem (35).Both regularizers and the data fidelity term have the KL property since they are computed via composition of an affine function and a permissible norm function.Further, the operation (v, z) → R v,z is polynomial in (v, z) for every component as the RK4 scheme and linear interpolators provide a composition of polynomial functions.Hence, the objective in (35) has the KL property.We propose to use the splitting The partial gradients of H are given by where ∂K refers to the Fréchet-derivative of the operator K.If K is linear, its derivative coincides with the operator itself.The derivative ∂ ∂v R v,z of the solution map is given by Here, ∇ x T is the gradient of the interpolated template T .The derivative ∂ ∂v φ 1,v (0, x c ) can be computed recursively within the ODE solver for (11).Exemplary, we obtain for the explicit Euler scheme (29) that for all k = 0, . . ., N t − 1, see also [30].Further, we require the proximal mappings of G 1 and G 2 .For G 1 , it holds that where the minimum is determined by the linear system of equations (Id + σ 1 γh t h d X B T B)x = v.This system is sparse and efficiently solvable with a preconditioned conjugate gradient method.Regarding G 2 , we need to compute prox σ2 TV with the discrete TV (33).As outlined in [7], this can be done using the primal dual hybrid gradient method [9].
The step-sizes in the iPALM scheme (37) are chosen as the partial Lipschitz constants of ∇ v H(v, z) and ∇ z H(v, z), respectively.For a linear K, it holds Unfortunately, an upper bound for L 1 cannot be derived explicitly.Hence, we have to rely on backtracking instead.Remark 4.3 (Normalized-cross-correlation-based distance).For D NCC , we use the discretization for which the derivative is given by Incorporating (48), we can perform the according gradient steps in the iPALM scheme (37).Then, also a line search for estimating the Lipschitz constant with respect to z is needed.As discussed in Remark 3.5, D NCC should be paired with an L 2 -regularizer for the source variable to get theoretical guarantees.Then, we obtain E 2 (z) = ∥z∥ 2 L 2 (Ω) , which we discretize similarly as D SSD , namely as This results in the proximal operator Multi-level approach and post-processing Due to the non-convexity of (35), we have to cope with local minima.To avoid being trapped in one, we follow a multi-level strategy [30], which also helps to reduce the computational cost.Its different levels refer to different resolutions of the template and the target.We apply iPALM at each level, starting with the coarsest resolution.Each computed minimizer is bilinearly interpolated to the next finer scale to serve as initialization.This approach requires multi-level versions of the operator K and a method for downsampling the measurements g.If these are not available, iPALM can still be performed with only one scale.For the 2D Radon transform, multi-level versions of the operator K can be obtained with any backend that takes the discretization of the measurement geometry as an input.More precisely, assume that the number of grid cells used to discretize Ω ⊂ R 2 at the finest level is m = 2 l , l ∈ N. In our experiments, we set the number of cells for discretizing the measurement domain (0, 1) at level k ≤ l to q (k) = 1.5 • 2 k and the length of each cell to h where the denominator arises from averaging over two neighboring grid points and dividing the edge length of the image domain Ω in each coordinate direction in half.Additionally, after applying iPALM, we deploy the second-order inexact Gauss-Newton method from [23,24] for refining the velocity v.This method utilizes the same discretization and Lagrangian solver that is used for iPALM.Due to the linearity of K and the structure of D SSD , we can indeed fix z and consider the data g = g − K(z) within the corresponding LDDMM Gauss-Newton scheme for v.We observed that this can compensate the slower convergence rate of iPALM.Finally, note the described modifications can also be applied within the setting of Remark 4.3.

Remark 4.4.
There is a vast literature on alternating minimization and forward-backward schemes with variable metrics [3,5,11,16,41], which attempt to improve the convergence speed.For the iPALM scheme (37), our simulations indicate that an adaption of the metric is most promising for v.To pursue this idea further, we choose a different splitting of (35) and add G 1 to H instead. Hence, G 1 = 0 and prox G1 = Id is independent of the chosen metric for the coordinate v.There are different strategies for constructing metrics such that convergence guarantees can be obtained, e.g., minimize-maximize strategies [11] or sparse approximations of the Hessian [3].For a small benchmark, we deployed the same metric for v as proposed in [23], and solved (35) with a variable metric version of PALM, i.e., without the inertia steps.The sufficient decrease of the objective with respect to v is ensured with the same Armijo line search as in [23].Experimentally, this has led to similar results as our post-processing scheme.A more thorough comparison of the approaches could be an interesting direction of future research.

Numerical Examples
Here, we present numerical results for our model (26), discretized and solved numerically as described in Section 4. The code for all experiments is available on Github2 .Since we are mainly interested in a proof-of-concept, we only investigate CT and leave other inverse problems for future work.Our examples mostly rely on synthetic data generated from target images U of size 128 × 128 with range [0, 1].A corresponding sinogram g is obtained by applying the Radon transform with ten equally distributed angles in [0, 180] to U.Then, we add 5% Gaussian noise to get g.Further, we also include one example with real data from a CT scanner.For all examples, we use a timedependent velocity field v with a single time point, and 5 steps in the Runge-Kutta method that solves the associated equation (11).The multi-level procedure starts with resolution 32 × 32 at the coarsest scale.Throughout this section, all parameters are optimized via grid search.
For the first experiment, the template T is chosen as the Shepp-Logan phantom and the target U is a diffeomorphically deformed version of T with an additional small white square, see Figure 1.
As TV regularization favors constant areas, this image pair is well-suited for our model (26), and an almost perfect reconstruction is expected.Due to the additional structure, a good reconstruction with purely diffeomorphic approaches such as [10,23] is impossible.The regularization parameter λ 1 splits into λ 1 = [λ 1 1 , λ 2 1 , λ 3  1 ] for the spatial, temporal and L 2 (Ω, R) regularization of v, respectively.We have chosen λ 1 = [0.001,0.001, 10 −6 ] and λ 2 = 0.1.Recall that the reconstruction R can be decomposed into a deformation part T(φ −1 (1, x c )) and a source part z.We observe that the main structure is reconstructed as deformation of the template T, whereas z reconstructs the additional square, see Figure 1.This is indeed the expected behavior for properly chosen parameters.As comparison, we included a reconstruction with the standard L 2 -TV model [39], i.e., the solution of where we have chosen λ = 0.1 based on a grid search.For (52), we observe the typical reconstruction artifacts related to the Radon transform, i.e., rays crossing the reconstruction.If we increase λ to avoid these artifacts, the reconstruction looses details.Last, we also included a reconstruction with E 2 = ∥ • ∥ 2 and λ 2 = 0.0001 instead of E 2 = TV.This regularization has been investigated for the metamorphosis model (7) in [18].As expected, we get similar artifacts as in (52) with small λ.
In our second experiment, we deal with a pair of images that contain finer details.More precisely, we have chosen an artificial brain image [19] as template T, which is diffeomorphically deformed into the target U [23].Further, we added a structure in U to get a non-diffeomorphic setting.We also varied the smoothness of this structure, which leads to two sub-experiments.In the first one, we added a circle with constant intensity and in the second one a 2D Gaussian.The obtained results for (26) with λ 1 = [0.001,0.1, 10 −6 ] and λ 2 = 0.2 are depicted in Figure 2. Note that we increased the contrast for the error maps in order to improve the visibility.Our method is able to reconstruct all the major structures in U. Most of the errors occur at the boundaries of the structures, which is partially due to the employed interpolation.Similarly as reported in [23], our approach struggles with the swirl in the middle of U.This is most likely due to the large, almost non-diffeomorphic deformation and the limited amount of data.If we compare the z for the two sub-experiments, we observe that the reconstruction of the Gaussian is worse, which is not surprising as TV-regularization favors piece-wise constant images.In absence of topological changes, [23] demonstrated that a LDDMM-based approach without the source term can yield satisfactory results.A reconstruction with our method for U without the additional structure is provided in Figure 3.For λ 1 = [0.001,0.001, 10 −5 ] and λ 2 = 1, the reconstruction consists only of a deformation part.However, if λ 2 is chosen too small, artifacts related to the source z appear.
Next, we comment on the robustness with respect to parameter changes.As the comparative examples in Figure 3 and 7 show, the choice of parameters significantly impacts the quality of the reconstruction.For the images from Figure 2, the influence of the parameter choice in terms of SSD error and SSIM value is provided in Tabular 1.Some corresponding reconstructions are given in Figure 4. Changing each parameter by an order of magnitude leads to clearly visible changes in the reconstruction, see Figure 4.In particular, too large regularization parameters can repress the effect of their respective component.In contrast, changes on a smaller scale lead to robust reconstruction results.For the other examples from this section, we observed a similar behavior, although the precise scale depends on the underlying set of images.
For our third experiment, see Figure 5, we have chosen two X-ray images that are not diffeomorphic to each other and contain some noise structures.Finding the correct deformation for this pair is challenging as the deformation is relatively large and irregular.In this experiment, we     Table 1: SSD-error and SSIM for various parameters with the images from Figure 2.For greater clarity, the deformation regularization parameter is chosen as λ = λ 1 [0.001, 0.1, 10 −6 ].
compare the third-order regularizer (λ 1 = [0.5, 0.1, 10 −6 ], λ 2 = 1) with the curvature one from [24] (λ 1 = [6, 6], λ 2 = 1), which imposes less regularity on v.For both choices, we observe that the model struggles to bend the hand to correctly match the target.This is most noticeable at the right corner of the palm and at some of the fingers.For the third-order regularizer, we also notice that the fingers are not spread sufficiently from each other.The additional or disappearing structures are very fine, such as the noise beside the hands or the slight change in intensity on the edges of the bones.Hence, these are not well-suited for TV regularization and not reconstructed by our method.However, even if we would use a different E 2 , it is questionable if the amount of data suffices to reconstruct these structures.More precisely, reconstructing fine details without sufficient data or an appropriate prior is in general impossible.Nevertheless, this experiment shows that our model (26) allows to align image pairs with large deformations between them, even under the presence of noise.
In our last synthetic experiment, we demonstrate that our method is in principal also applicable to 3D CT reconstruction problems, see Figure 6.To this end, we use an image pair from [30] and the same 10 simulated 2D measurements as in [23], which correspond to an rotation around the third coordinate axis with angles equally distributed in [0, 180].Due to the problem size, we use curvature regularization on v with λ 1 = [0.07,0.07] for the spatial and temporal components, respectively, and for the source z we use the regularization parameter λ 2 = 0.01.Overall, we obtain a satisfying reconstruction with a SSIM of 0.9060, which is significantly better than the SSIM of 0.8807 obtained by [23].
In our final experiment, we tackle real CT data from a lotus root cross-section [8].Since the recorded data is dense, the underlying target U can be reconstructed via filtered back-projection.Retroactively, we deformed the computed target U and removed a hole in the lotus root to obtain a template T with a different topology, see Figure 7.To get a sparse setting, we subsampled g with 12 uniformly distributed angles in [0, 180].As we are already dealing with real data, no additional noise is added.Unfortunately, the intensity range of the given sinogram does not match the range of our Radon transform operator K. Therefore, D SSD is no sensible choice for comparing the real data with the simulated data K(R).Instead, we use D NCC , which is invariant to the scaling of the images' intensity, see Remark 3.5.Although our theoretical results in Section 3 do not apply in this setting, we obtain satisfying numerical results using λ 1 = [0.01,0.01, 10 −6 ] and λ 2 = 0.001, see Figure 7.More precisely, our method manages to find the main deformation and the additional hole.In this real data setting, we also investigate how a reconstruction with E 2 = ∥ • ∥ 2 instead of E 2 = TV performs.Here, our theoretical results from Section 3 hold again.To this end, we included two reconstructions corresponding to λ 2 = 0.1 and λ 2 = 0.01, respectively.In the first reconstruction, we have almost no artifcats, but can also only guess the new hole.For the second one, the hole can be clearly seen, but the artifcats are much stronger.Either way, the results are inferior to those generated with E 2 = TV.

Conclusions
In this paper, we extended the reconstruction model from [23] to also cope with topology changes.On the theoretical side, we were able to carry over all of the previous results.Compared to [18], we utilize a simplified metamorphosis approach, which allows one to use non-convex regularizers at lower computational cost.The chosen TV regularization enabled us to obtain satisfying reconstructions even for very limited data without suffering from the typical artifacts.So far, our experiments are a proof-of-concept.In the future, we also want to work with larger real data.To this end, it        could be necessary to use more sophisticated (maybe even problem-tailored) regularization methods for z or different data terms D, which can be incorporated into our model without much effort.Again, we stress that the method can be easily extended to higher dimensions and other forward operators K.Even without the scope of real data, this seems to be a natural direction of future research as our method is designed in a modular way.
Condition 3 is violated.Further, we do not have the required coercivity with respect to z (Condition 5).Hence, this setup is not covered by our theory.Nevertheless, our numerical experiments indicate that D NCC combined with E 1 = TV leads to good reconstructions.If we choose E 2 = ∥ • ∥ 2 L 2 (Ω) instead, we can derive the same theoretical results as for D(f, g) = ∥f −g∥ 2 .In this case, Condition 4 and 5 are obsolete, as they are only required to infer boundedness of z ∈ L 2 (Ω) from the boundedness of the objective, which now follows directly.The remaining conditions are met by D NCC , except for Condition 3. Hence, the convergence in Theorem 3.4 holds only up to a scalar.

Figure 1 :
Figure 1: Reconstruction for a cartoon like image with 10 measurement angles using our model (26) and the L 2 -TV model (52) as a baseline comparison.The reconstruction R obtained with (26) can be decomposed into a deformation and a source part.

Figure 2 :
Figure 2: Artifical brain image with data from 10 equally distributed angles in [0, 180].The first target contains an additional circle, which is well-suited for TV regularization of z.The second one contains an additional 2D Gaussian, which is more challenging for this setting.

Figure 3 :
Figure 3: Diffeomorphic counterpart to Figure 2, where U does not contain an additional structure.

Figure 5 :
Figure 5: Reconstruction of a human hand [30] with measurements for 10 angles in [0, 180].The underlying deformation is relatively large and the images are non-diffeomorphic.Especially the small noise structures outside of the bone areas are hard to reconstruct.

( a )
Template image T. (b) Target U.(c) Reconstruction of U.

Figure 6 :
Figure 6: Reconstruction of a 3D volume using only ten measurement directions.Each image depicts a slice of the volume along the third coordinate axis.

Figure 7 :
Figure 7: Reconstructions for data obtained by a CT scanner, namely 12 measurements with angles equally distributed in [0, 180].
R d on the Borel σ-algebra B(Ω) is called a vector-valued Radon measure if every coordinate function µ i : B(Ω) → R is a Radon measure.We denote by M(Ω, R d ) the space of vector-valued finite Radon measures.Due to the Riesz-Markov-Kakutani representation theorem, it holds C 0 (Ω, R d ) * ∼ = M(Ω, R d ), where C 0 (Ω, R d ) denotes the continuous functions vanishing at infinity.Hence, we can equip M(Ω, R d ) with the associated weak* convergence.