1 Introduction

Image registration seeks a transformation that maps corresponding image data, taken at different times, from different sensors, or from different viewpoints, either to detect differences or to merge information. Nowadays, image registration is widely used in many areas, such as computer vision, biological imaging, remote sensing and medical imaging [6, 21, 26, 32, 36, 38, 40, 47, 57].

Depending on the specific application, image registration can be classified into two categories: mono-modal registration and multi-modal registration. For multi-modal registration, finding a suitable distance measure is the most essential step [22, 35, 36, 47, 57]. The idea of this paper is applicable to the multi-modal registration framework, but we focus on mono-modal registration in this work.

In mono-modal registration, there are many choices of data fidelity term [33]; a common approach for computing the transformation is to use the sum of squared differences (SSD) to measure the difference between the reference image R and the deformed template image T [11]. However, minimizing SSD alone is an ill-posed problem in the sense of Hadamard since it may have many solutions. To overcome this difficulty, regularization is indispensable [38, 52]. However, the choice of the regularization term, which encodes prior information about physical properties and helps to avoid local minima, depends on the specific application.

All registration models are nonlinear, but they can be classified into two main categories according to how the deformation mapping is represented: linear registration and nonlinear registration. In linear registration, the deformation model is linear and global, including rotation, translation, shearing and scaling [11, 38]. Since a linear model contains few variables and is fast to compute, it is commonly used as a pre-registration step for starting a more sophisticated model; on its own, it cannot accommodate local details (differences). In contrast, nonlinear registration models inspired by physical processes [47], such as the elastic model [5], fluid model [9], diffusion model [16], TV (total variation) model [19], MTV (modified TV) model [12], linear curvature model [17, 18], mean curvature model [14], Gaussian curvature model [27] and total fractional-order variation model [56], account for localized variations by allowing many degrees of freedom. Free-form deformation models based on B-splines, lying between the above two types, possess simplicity, smoothness, efficiency and the ability to describe local deformation with few degrees of freedom [44, 45, 47]. For relatively small deformations, all models can be effective; for large deformations, not all models are effective, and in particular few models can guarantee a one-to-one mapping unless one fine-tunes the coupling parameters to reduce the allowed deformation magnitude (since the mapping quality is perfect if the deformation is zero), which in turn loses the ability to model large deformations.

Over the last decade, more and more researchers have focused on diffeomorphic image registration, where folding, measured by the local invertibility quantity \(\det (J_{\mathbf{y }})\), is reduced or avoided. Here, \(\mathbf{y }\) denotes the transformation in the registration model and \(\det (J_{\mathbf{y }})\) is the Jacobian determinant of \(\mathbf{y }\). Under suitable assumptions, obtaining a one-to-one mapping is a natural requirement, as reviewed in [47].

In 2004, Haber and Modersitzki [23] proposed an image registration model imposing volume preserving constraints, by ensuring \(\det (J_{\mathbf{y }})\) is close to 1. Although volume preservation is very important in some applications where some underlying (e.g. anatomical) structure is known to be incompressible [47], it is not required or reasonable in others. In a later work, the same authors [25] relaxed the constraint to allow \(\det (J_{\mathbf{y }})\) to lie in a specific interval. Yanovsky et al. [55] applied the symmetric Kullback–Leibler distance to quantify \(\det (J_\mathbf{y })\) to achieve a diffeomorphic mapping. Burger et al. [7] designed a volume penalty term that ensured that shrinkage and growth had the same cost in their variational functional. The constrained hierarchical parametric approach [41] ensures that the mapping is globally one-to-one and thus preserves topology in the deformed image. Sdika [46] introduced a regularizer to penalize non-invertible transformations. In [51], Vercauteren et al. proposed an efficient non-parametric diffeomorphic image registration algorithm based on Thirion’s demons algorithm [49]. In addition, a framework called large deformation diffeomorphic metric mapping (LDDMM) can generate diffeomorphic transformations for image registration [3, 15, 37, 50]. An entirely different framework proposed by Lam and Lui [30] obtains diffeomorphic registrations by constraining Beltrami coefficients of a quasi-conformal map \(f=y_1(\mathbf{x })+ \mathbf{i}\,y_2(\mathbf{x })\), instead of controlling the map \(\mathbf{y }(\mathbf{x })\) directly.

In this paper, we aim to reformulate the Lam and Lui Beltrami measure as a direct regularizer for controlling \(\det (J_\mathbf{y })\) and to assess the effectiveness of the resulting variational models; though the idea applies to any commonly used models, we apply it to the diffusion model as one simple example. Our contributions are twofold:

  • We propose a new Beltrami coefficient-based regularizer that is explicitly expressed in terms of \(\det (J_{\mathbf{y }})\). This establishes a link between the Beltrami coefficient of the transformation and the quantity \(\det (J_{\mathbf{y }})\).

  • An effective iterative scheme is presented, and numerical experiments show that the new registration model performs well and produces a diffeomorphic mapping while remaining competitive with state-of-the-art models from non-Beltrami frameworks.

We remark that several interesting works that are concerned with reversible transformations (such as [8, 54]) may also benefit from this study.

The rest of the paper is organized as follows. Section 2 briefly reviews the basic mathematical formulation of image registration, several typical regularization terms and how to obtain a diffeomorphic transformation. In Sect. 3, we propose a new regularizer and a new registration model. The discretization and numerical scheme are discussed in Sect. 4. Numerical results are shown in Sect. 5, and finally conclusions are drawn in Sect. 6.

2 Preliminaries, Regularization and Diffeomorphic Transformation

In general, image registration aims to compare, in space \(\mathbb {R}^{d}\), two or more images or image sequences in a video. In this work, we consider the case of a pair of images \(T, R:\Omega \subset \mathbb {R}^{d}\rightarrow \mathbb {R}\) and \(d = 2\). Here by convention, R is the Reference image and T is the (moving) Template image.

The aim of image registration is to find a transformation \(\mathbf y (\mathbf x )\) such that

$$\begin{aligned} T\circ \mathbf{y }(\mathbf x ) = T(\mathbf{y }(\mathbf x )) \approx R, \end{aligned}$$

where \(\mathbf x =(x_{1},x_{2})\) and \(\mathbf{y }(\mathbf x ) = (y_{1}(\mathbf x ),y_{2}(\mathbf x ))\). That is, the transformation \(\mathbf{y }(\mathbf x )\) moves T to match R. If we define \(\mathbf{y }(\mathbf x ) =\mathbf x +\mathbf{u }(\mathbf x )\), then \(\mathbf{u }(\mathbf x ) = (u_{1}(\mathbf x ),u_{2}(\mathbf x ))\) indicates how much T moves, i.e. \(\mathbf{u }(\mathbf x )\) is the displacement. Thus, the determination of the transformation \(\mathbf y (\mathbf x )\) is equivalent to the determination of the displacement field \(\mathbf{u }(\mathbf x )\).

2.1 Data Fidelity

One way to ensure that \(T(\mathbf{y })\) can approximate R is to minimize the difference \(T(\mathbf{y }) - R\). A commonly used difference measure is the sum of squared differences (SSD) defined by

$$\begin{aligned} {\mathcal {D}}[\mathbf{y }] = \frac{1}{2}\int _{\Omega }(T(\mathbf{y })-R)^{2}\,\mathrm{d}\mathbf{x } = \frac{1}{2}\Vert T(\mathbf{y })-R\Vert ^{2} = \frac{1}{2}\Vert T(\mathbf{x }+\mathbf{u })-R\Vert ^{2} = {\mathcal {D}}[\mathbf{u }] \end{aligned}$$
(1)

where \(\Vert \cdot \Vert ^{2}\) denotes the squared \(L_{2}\)-norm. There are other typical distance measures, including normalized cross-correlation [38], mutual information [35, 38], normalized gradient fields [24, 39] and the mass-preserving measure [7].
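To make the measure concrete, the following is a minimal NumPy/SciPy sketch of evaluating (1) for a given displacement field; the grid spacing h, the interpolation order and the boundary mode are illustrative assumptions, not choices made in the paper.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def ssd(T, R, u1, u2, h=1.0):
    """D[u] = 0.5 * h^2 * sum (T(x+u) - R)^2 on a pixel grid.

    T, R   : 2D arrays of equal shape (template and reference)
    u1, u2 : displacement components on the same grid, in pixel units
    """
    n1, n2 = R.shape
    x1, x2 = np.meshgrid(np.arange(n1), np.arange(n2), indexing="ij")
    # deformed template T(x + u), sampled by spline interpolation
    Ty = map_coordinates(T, [x1 + u1, x2 + u2], order=3, mode="nearest")
    return 0.5 * h**2 * np.sum((Ty - R) ** 2)
```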

2.2 Regularization

Minimizing any of the above-mentioned measures alone is insufficient to obtain a unique transformation \(\mathbf{y }\), because \(\min {\mathcal {D}}[\mathbf{y }]\) is ill-posed [38, 39]. To overcome this problem, regularization is necessary. Combining the distance measure and regularization gives the variational model for image registration:

$$\begin{aligned} \min _{\mathbf{u }} J(\mathbf{u }) = {\mathcal {D}}[\mathbf{u }] + \alpha S[\mathbf{u }], \end{aligned}$$
(2)

where \({\mathcal {D}}[\mathbf{u }]\) is the distance measure from (1), \(S[\mathbf{u }]\) is the regularizer to be discussed and \(\alpha \) is a positive parameter to balance these two terms.

There exist many regularizers and we can classify them into three categories:

  • First-order regularizers involving \(|\nabla \mathbf{u }|\) or \(|\nabla \cdot \mathbf{u }|\). The diffusion regularizer [16] and the TV regularizer [19] are well-known first-order regularizers. The former one aims to control smoothness of the displacement and the latter one can preserve the discontinuity.

  • Fractional-order regularizers \(\nabla ^\alpha \mathbf{u }\) with \(\alpha \in (1,2)\). In [56], a fractional-order regularizer is used for image registration. Because the fractional-order regularizer is global, its implementation must exploit structured Toeplitz matrices. This regularizer can not only produce accurate and smooth solutions but also allow for a large rigid alignment [56].

  • Second-order regularizers involving \(\nabla ^2 \mathbf{u }\) or \(\nabla \cdot (\nabla \mathbf{u }/|\nabla \mathbf{u }|)\). These include the linear curvature regularizer [17, 18], mean curvature regularizer [14] and Gaussian curvature regularizer [27].

The first two categories of models require an affine linear transformation in an initial pre-registration step while the latter category does not need a linear transformation in pre-registration.

Differing from the above three categories, an important class of fluid-like models based on partial differential equations was developed to capture large deformations. Christensen et al. [10] proposed an effective viscous fluid model characterized by a spatial smoothing of the velocity field. For the viscous fluid model, the deformation is governed by the Navier–Stokes equation:

$$\begin{aligned} \eta \nabla ^{2}\mathbf{v }+(\eta +\lambda )\nabla (\nabla \cdot \mathbf{v })+\mathbf{F }=0, \quad \mathbf{v }=\partial _{t}\mathbf{u }+\mathbf{v }\cdot \nabla \mathbf{u }. \end{aligned}$$
(3)

Here, \(\eta \) and \(\lambda \) are the viscosity coefficients, the term \(\nabla ^{2}\mathbf{v }\) constrains the velocity field to vary smoothly, the term \(\nabla (\nabla \cdot \mathbf{v })\) allows structures in the template to change in mass, and \(\mathbf{F }\) is the nonlinear deformation force field, which can be defined by \((T(\mathbf{x }+\mathbf{u })-R)\nabla {T}\). The velocity field \(\mathbf{v }\) is initialized as \(\mathbf{0 }\) in implementation. In [10], the condition \(|\det (J_\mathbf{y })|\ge 0.5\) is checked at each iteration; if it fails, the numerical solver is restarted so that a diffeomorphic transform is obtained; see also [38]. Further, in [55], the model is enhanced by incorporating a volume preservation idea, relating to minimizing \(|\det (J_{\mathbf{y }})-1|\), again to ensure diffeomorphism without restarting.

Next, we review the Diffusion model [16]

$$\begin{aligned} \min _{\mathbf{u }} J(\mathbf{u }) = {\mathcal {D}}[\mathbf{u }] + \alpha S[\mathbf{u }] = \frac{1}{2}\int _{\Omega }(T({\mathbf{x }+\mathbf{u }})-R)^{2}\,\mathrm{d}\mathbf{x } + \frac{\alpha }{2}\int _{\Omega }\sum _{\ell =1}^{2} |\nabla u_{\ell }|^{2}\,\mathrm{d}\mathbf{x }. \end{aligned}$$
(4)

It leads to the Euler–Lagrange equation:

$$\begin{aligned} (T(\mathbf{x }+\mathbf{u })-R)\nabla _{\mathbf{u }} T(\mathbf{x }+\mathbf{u }) - \alpha \Delta \mathbf{u }= 0, \quad \hbox {i.e.}\quad \begin{array}{l} (T(\mathbf{x }+\mathbf{u })-R)\partial _{u_1} T(\mathbf{x }+\mathbf{u }) - \alpha \Delta u_1 = 0, \\ (T(\mathbf{x }+\mathbf{u })-R)\partial _{u_2} T(\mathbf{x }+\mathbf{u }) - \alpha \Delta u_2 = 0, \end{array} \end{aligned}$$

subject to \(\langle \nabla u_{\ell },\mathbf{n } \rangle = 0\) on \(\partial \Omega \) for \(\ell = 1, 2\). In particular, there exists a fast implementation based on the so-called additive operator splitting (AOS) scheme [38, 53]. In [13], a fast solver was developed for this model.
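For illustration only, one explicit (gradient-descent) time step for these Euler–Lagrange equations might look as follows; the step size tau is a hypothetical parameter, and this plain descent sketch is not the AOS scheme of [38, 53] or the fast solver of [13].

```python
import numpy as np
from scipy.ndimage import laplace, map_coordinates

def diffusion_descent_step(T, R, u1, u2, alpha, tau):
    """One explicit step u <- u - tau*[(T(x+u)-R) grad T(x+u) - alpha*Lap u]."""
    n1, n2 = R.shape
    x1, x2 = np.meshgrid(np.arange(n1), np.arange(n2), indexing="ij")
    coords = [x1 + u1, x2 + u2]
    Ty = map_coordinates(T, coords, order=3, mode="nearest")
    # gradient of T, evaluated at the deformed positions
    g1 = map_coordinates(np.gradient(T, axis=0), coords, order=3, mode="nearest")
    g2 = map_coordinates(np.gradient(T, axis=1), coords, order=3, mode="nearest")
    res = Ty - R   # image residual T(x+u) - R
    # 'nearest' padding mimics the homogeneous Neumann boundary condition
    u1 = u1 - tau * (res * g1 - alpha * laplace(u1, mode="nearest"))
    u2 = u2 - tau * (res * g2 - alpha * laplace(u2, mode="nearest"))
    return u1, u2
```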

However, as with the other models reviewed in the three categories, the obtained solution \(\mathbf{u }\) or \(\mathbf{y }\) is mathematically correct but often physically incorrect. This is because there is no guarantee of mesh non-folding, which is measured by \(\det (J_{\mathbf{y }})>0\), i.e. a positive determinant of the local Jacobian matrix \(J_\mathbf{y }\) of the transform \(\mathbf{y }\).

2.3 Models of Diffeomorphic Transformation

To achieve \(\det (J_{\mathbf{y }})>0\), several recent works impose this constraint in direct ways. We review a few such models before we present our new constraint. Extending (4), the idea is to choose \(S_1[\cdot ]\) in the following (note \({\mathbf{y }=\mathbf{x }+ \mathbf{u }}\))

$$\begin{aligned} \min _{\mathbf{u }} J(\mathbf{u }) = {\mathcal {D}}[\mathbf{u }] + \alpha S[\mathbf{u }] + \beta S_1[ \mathbf{y } ]. \end{aligned}$$
(5)

2.3.1 Volume Control

In 2004, Haber and Modersitzki [23] used volume preserving constraint (area in 2D) for image registration, namely

$$\begin{aligned} \det (J_{\mathbf{y }}) = 1. \end{aligned}$$

As a consequence, we can ensure that the transformation is diffeomorphic. However, volume preservation is not desirable when the anatomical structure is compressible in medical imaging.

2.3.2 Slack Constraint

Improving on [23], the constraint \(\det (J_{\mathbf{y }})=1\) is relaxed in [25] and a slack constraint is proposed

$$\begin{aligned} M_{a}\le \det (J_{\mathbf{y }}) \le M_{b}, \end{aligned}$$

where a positive interval \([M_{a},M_{b}]\) is provided by the user as prior information in the specific application e.g. \([M_{a},M_{b}]=[0.1, 2]\).

2.3.3 Unbiased Transform

In [55], based on information theory, \(\det (J_{\mathbf{y }})\) is controlled by the symmetric Kullback–Leibler distance

$$\begin{aligned} \int _{\Omega }|\det (J_{\mathbf{y }})-1|\log (|\det (J_\mathbf{y })|)\,\mathrm{d}\mathbf{x }. \end{aligned}$$

It helps to obtain an unbiased diffeomorphic transformation. This idea was tested with the fluid regularizer (first order).

2.3.4 Balance of Shrinkage and Growth

Geometrically \(\det (J_{\mathbf{y }})=1\) implies volume preservation. Similarly \(\det (J_{\mathbf{y }})<1\) implies shrinkage while \(\det (J_\mathbf{y })>1\) implies growth. A function that treats the cases of shrinkage and growth identically is \(\phi (x)=((x-1)^2/x)^2\) since \(\phi (1/x)=\phi (x)\). A volume penalty

$$\begin{aligned} \int _{\Omega }\left( \frac{(\det (J_\mathbf{y })-1)^{2}}{\det (J_{\mathbf{y }})}\right) ^{2}\,\mathrm{d}\mathbf{x } \end{aligned}$$
(6)

is used in the hyperelastic model [7], which ensures that shrinkage and growth have the same price.
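For completeness, the stated symmetry \(\phi (1/x)=\phi (x)\) of this penalty can be verified in one line:

$$\begin{aligned} \phi (1/x) = \left( \frac{(1/x-1)^{2}}{1/x}\right) ^{2} = \left( \frac{(1-x)^{2}}{x^{2}}\cdot x\right) ^{2} = \left( \frac{(x-1)^{2}}{x}\right) ^{2} = \phi (x). \end{aligned}$$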

2.3.5 LDDMM Framework

In the LDDMM framework, the deformation is modelled by considering its velocity over time according to the transport equation. Its variational formulation can be written as follows:

$$\begin{aligned} \begin{aligned}&\min _{{\mathcal {T}},v} {\mathcal {D}}({\mathcal {T}}(\cdot ,1),R) + \alpha {\mathcal {S}}(v)\\&\hbox {s.t.}\quad \partial _{t}{\mathcal {T}}(\mathbf{x },t)+v(\mathbf{x },t)\cdot \nabla {\mathcal {T}}(\mathbf{x },t)=0 \ \text{ and } \ {\mathcal {T}}(\mathbf{x },0) = T, \end{aligned} \end{aligned}$$

where \(v:\Omega \times [0,1]\rightarrow \mathbb {R}^{2}\) is the velocity and \({\mathcal {T}}:\Omega \times [0,1]\rightarrow \mathbb {R}\) is a series of images. For more details, please see [3, 15, 37, 47, 50].

2.3.6 Beltrami Indirect Control

In 2014, Lam and Lui [30] presented a novel approach in a Beltrami framework to obtain diffeomorphic registrations with large deformations using landmark and intensity information via quasi-conformal maps. Before introducing this model, we first describe some basic theory about quasi-conformal maps and the Beltrami coefficient.

A complex map \(z=x_1+\mathbf{i}x_2 \longmapsto f(z)=y_1(x_1,x_2)+ \mathbf{i}y_2(x_1,x_2)\) from a domain in \(\mathbb {C}\) onto another domain is quasi-conformal if it has continuous partial derivatives and satisfies the following Beltrami equation:

$$\begin{aligned} \frac{\partial f}{\partial {\bar{z}}} = \mu (f)\frac{\partial f}{\partial z}, \end{aligned}$$
(7)

for some complex-valued Lebesgue measurable \(\mu \) [4] satisfying \(\Vert \mu \Vert _{\infty } < 1\). Here \(\mu =\mu ({\mathbf{y }})\equiv f_{{\bar{z}}}/f_z\) is called the Beltrami coefficient explicitly computable from \({\mathbf{y }}\) since

$$\begin{aligned} \left\{ \begin{aligned} f_z=\frac{\partial f}{\partial z}&\equiv \frac{1}{2}\left( \frac{\partial f}{\partial x_1} - \mathbf{i}\frac{\partial f}{\partial x_2}\right) = \frac{(y_1)_{x_1}+(y_2)_{x_2}}{2} + \mathbf{i} \frac{(y_2)_{x_1}-(y_1)_{x_2}}{2}, \\ f_{{\bar{z}}}=\frac{\partial f}{\partial {\bar{z}}}&\equiv \frac{1}{2}\left( \frac{\partial f}{\partial x_1} + \mathbf{i}\frac{\partial f}{\partial x_2}\right) = \frac{(y_1)_{x_1}-(y_2)_{x_2}}{2} + \mathbf{i} \frac{(y_2)_{x_1}+(y_1)_{x_2}}{2}, \end{aligned}\right. \end{aligned}$$
(8)

where \((y_1)_{x_{1}}=\partial y_1/\partial x_1\). Conversely \(\mathbf y =\mathbf y ^{\mu }\) can be computed for a given \(\mu \) through solving \(\mu (\mathbf y ) = \mu \).

A quasi-conformal map is a homeomorphism (i.e. one-to-one) and its first-order approximation takes small circles to small ellipses of bounded eccentricity [20]. As a special case, \(\mu =0\) means that the map f is holomorphic and conformal, characterized by \(f_{ {\bar{z}}}=0\) or \(y_1, y_2\) satisfying the Cauchy–Riemann equations \((y_1)_{x_1} = (y_2)_{x_2}, \ (y_1)_{x_2} =- (y_2)_{x_1}\).
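To illustrate (8) and the definition \(\mu = f_{{\bar{z}}}/f_z\), the following NumPy sketch computes the Beltrami coefficient of a discrete map by central differences; the grid spacing h is an assumption, and no care is taken here for points where \(f_z = 0\).

```python
import numpy as np

def beltrami_coefficient(y1, y2, h=1.0):
    """Beltrami coefficient mu = f_zbar / f_z of f = y1 + i*y2, cf. (8), (13).

    y1, y2 : 2D arrays giving the map components on a regular grid.
    |mu| < 1 everywhere indicates a locally bijective map.
    """
    y1_x1, y1_x2 = np.gradient(y1, h)   # rows ~ x1, columns ~ x2
    y2_x1, y2_x2 = np.gradient(y2, h)
    fz    = 0.5 * ((y1_x1 + y2_x2) + 1j * (y2_x1 - y1_x2))
    fzbar = 0.5 * ((y1_x1 - y2_x2) + 1j * (y2_x1 + y1_x2))
    return fzbar / fz

# sanity check: the identity map is conformal, so mu = 0
x1, x2 = np.meshgrid(np.linspace(0, 1, 32), np.linspace(0, 1, 32), indexing="ij")
assert np.allclose(beltrami_coefficient(x1, x2, h=1 / 31), 0)
```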

Thus in the context of image registration, enforcing \(\Vert \mu \Vert _{\infty } < 1\) provides the control for the transform f and ensures homeomorphism. The quasi-conformal hybrid registration model (QCHR) in [30] is

$$\begin{aligned} \min _\mathbf{y }\int _{\Omega }|\nabla \mu |^{2}+\alpha \int _{\Omega }|\mu |^{p}+\beta \int _{\Omega }(T( \mathbf{y })-R)^{2} \end{aligned}$$
(9)

subject to \(\mathbf{y }=(y_1,y_2)\) satisfying

  1. \(\mu = \mu ( \mathbf{y })\);

  2. \( \mathbf{y }(p_{j})=q_{j}\) for \(1\le j\le m\) (landmark constraints);

  3. \(\Vert \mu ( \mathbf{y })\Vert _{\infty }<1\) (bijectivity),

which indirectly controls \(\det (J_{\mathbf{y }})\) via Beltrami coefficient, where \(\mu ( \mathbf{y })\) is the Beltrami coefficient of the transformation \( \mathbf{y }\). The above model is solved by a penalty splitting method. It minimizes the following functional:

$$\begin{aligned} \int _{\Omega }|\nabla \nu |^{2}+\alpha \int _{\Omega }|\nu |^{p}+\sigma \int _{\Omega }|\nu -\mu |^{2} +\beta \int _{\Omega }(T( \mathbf{y }^{\mu })-R)^{2} \end{aligned}$$
(10)

subject to the constraints that \(\Vert \nu \Vert _{\infty }<1\) and \(\mathbf y ^{\mu }\) be the quasi-conformal map with Beltrami coefficient \(\mu \) satisfying \(\mathbf y ^{\mu }(p_{j}) = q_{j}\) for \(1\le j\le m\). Then in each iteration, it needs to solve the following two subproblems alternately:

$$\begin{aligned} \begin{aligned}&\mu _{n+1} = \arg \min \ \sigma \int _{\Omega }|\mu -\nu _{n}|^{2}+\beta \int _{\Omega }(T( \mathbf{y }^{\mu })-R)^{2}\\&\hbox {s.t.} \quad \mathbf{y }^{\mu }(p_{j}) = q_{j} \quad \hbox {for} \ 1\le j\le m \end{aligned} \end{aligned}$$
(11)

and

$$\begin{aligned} \nu _{n+1} = \arg \min \int _{\Omega }|\nabla \nu |^{2} +\alpha \int _{\Omega }|\nu |^{p}+\sigma \int _{\Omega }|\nu -\mu _{n+1}|^{2}. \end{aligned}$$
(12)

In addition, it also solves the equation \(\mu (\mathbf y )=\mu \) by the linear Beltrami solver (LBS) [34] to find \(\mathbf y \) and ensures that \(\mathbf y \) matches the landmark constraints.

Thus, instead of controlling the Jacobian determinant of the transformation directly, controlling the Beltrami coefficient is a good alternative providing the same, albeit indirect, control. However, since the algorithm of [30] has to deal with two main unknowns (the transformation \(\mathbf{y }\) and its Beltrami coefficient \(\mu \)) and one auxiliary unknown (the coefficient \(\nu \)) in a non-convex formulation, the increased cost, practical implementation and convergence are real issues; for challenging problems, one cannot observe convergence and therefore the full capability of the model is not realized.

We are motivated to reduce the unknowns and simplify their algorithm. Our solution is to reformulate the problem in the space of the primary variable \({\mathbf{y }}\) or \(\mathbf{u}\), not in the transformed space of variables \(\mu , \nu \). We make use of the explicit formula of \(\mu =\mu ({\mathbf{y }})\). Working with primal mapping \({\mathbf{y }}\) enables us to introduce the advantages of minimizing a Beltrami coefficient to the above reviewed variational framework (2), effectively unifying the two frameworks.

Hence, we propose a new regularizer based on the Beltrami coefficient which, as the numerical results show, is easy to implement. Moreover, the reformulated control regularizer can potentially be applied to a large class of variational models.

3 The Proposed Image Registration Model

In this section, we present a new regularizer based on the Beltrami coefficient, which helps to obtain a diffeomorphic transformation. Combining the new regularizer with the diffusion model, we then present a novel model. Combining with other models may be studied as well, since the idea is the same.

For \(f(z) = y_1(x_{1},x_{2})+\mathbf{i}y_2(x_{1},x_{2})\), according to the Beltrami equation (7) and the definitions (8), we have

$$\begin{aligned} \mu (f) = \frac{\partial f}{\partial {\bar{z}}}\Big /\frac{\partial f}{\partial z} = \frac{((y_1)_{x_{1}}-(y_2)_{x_{2}})+\mathbf{i}((y_2)_{x_{1}}+(y_1)_{x_{2}})}{((y_1)_{x_{1}}+(y_2)_{x_{2}})+ \mathbf{i}((y_2)_{x_{1}}-(y_1)_{x_{2}})}, \end{aligned}$$
(13)
$$\begin{aligned} |\mu (f)|^{2} = \frac{((y_1)_{x_{1}}-(y_2)_{x_{2}})^{2}+((y_2)_{x_{1}}+(y_1)_{x_{2}})^{2}}{((y_1)_{x_{1}}+(y_2)_{x_{2}})^{2}+((y_2)_{x_{1}}-(y_1)_{x_{2}})^{2}} = \frac{\Vert J_{f}\Vert _2^2 - 2 \det (J_{f})}{\Vert J_{f}\Vert _2^2 + 2 \det (J_{f})}. \end{aligned}$$
(14)

Note that \((y_1)_{x_{1}}(y_2)_{x_{2}}-(y_2)_{x_{1}}(y_1)_{x_{2}} = \det (J_{f})\) and, by expanding (8), \(|f_{z}|^{2}-|f_{{\bar{z}}}|^{2} = \det (J_{f})\). So \(\det (J_{f})\) can be represented by the Beltrami coefficient \(\mu (f)\):

$$\begin{aligned} \det (J_{f}) = |f_{z}|^{2}(1-|\mu (f)|^{2}). \end{aligned}$$
(15)

Clearly \(\det (J_{f})>0\) if \(|\mu (f)|<1\), and by the inverse function theorem, the map f is locally bijective. We conclude that f is a diffeomorphism if we assume that \(\Omega \) is bounded and simply connected.

For more details about quasi-conformal theory, the readers can refer to [1, 20, 31].

3.1 New Regularizer

Our new regularizer, which exploits \(|\mu (f)|<1\) to control the transformation and obtain a diffeomorphic mapping, is

$$\begin{aligned} S_1[ \mathbf{y } ] = \int _{\Omega } \phi ( |\mu |^2 ) d\mathbf{x },\quad |\mu |^2=\frac{\Vert J_{\mathbf{y }}\Vert _2^2 - 2 \det (J_{\mathbf{y }})}{\Vert J_{\mathbf{y }}\Vert _2^2 + 2 \det (J_{\mathbf{y }})} \end{aligned}$$
(16)

which clearly involves the Jacobian determinant \(\det (J_\mathbf{y })\) in a non-trivial way and we explore the choices of \(\phi \) below.

Remark

Our new regularizer has two advantages: first, the obtained transformation \(\mathbf{y }\) does not need to satisfy \(\det (J_\mathbf{y })\rightarrow 1\); second, we only compute the transformation and do not need to compute its Beltrami coefficient or introduce another auxiliary unknown as in [30]. In addition, the numerical experiments show that our new regularizer is easy to implement and yields accurate and diffeomorphic transformations.

3.2 The Proposed Model

The above regularizer (16) providing a constraint on \(\mathbf{y }\) is ready to be combined with an existing model. In the framework (5), using (16), the first version of our new model takes the form

$$\begin{aligned} \min _{\mathbf{y }} \frac{1}{2}\Vert T(\mathbf{y })-R \Vert ^2_2 +\frac{\alpha }{2}\Vert \ |\nabla {\mathbf{u }}|\ \Vert ^{2}_2 + \beta \int _{\Omega } \phi ( |\mu |^2 ) d\mathbf{x } \end{aligned}$$
(17)

where \(\mathbf{u } = \mathbf{y }(\mathbf{x })-\mathbf{x } =(y_{1}(\mathbf{x }),y_{2}(\mathbf{x }))-\mathbf{x }\) is the displacement field, \(|\nabla {\mathbf{u }}|^{2}= |\nabla u_1|^{2}+|\nabla u_2|^{2}\) and \(\mu =\mu ({\mathbf{y }})\). To promote \(|\mu (f)|<1\), our first and simplest choice is \(\phi (v)=\phi _{1}(v)=\frac{1}{(v-1)^{2}}\): starting from the initial guess \(v=0\) (when \(\mathbf{u }=\mathbf{0 }\)), minimizing (17) keeps v away from 1, since \(\phi _{1}(v)\rightarrow \infty \) as \(v\rightarrow 1\).

Remark

From (9) and (17), we see that the QCHR model focuses on obtaining a smooth Beltrami coefficient while our model focuses on the diffeomorphic transformation itself. There are major differences between the regularizer in the QCHR model and our new regularizer: the former is characterized directly by the Beltrami coefficient \(\mu \) and its gradient, while the latter is characterized by the Beltrami coefficient indirectly, in terms of the transformation \(\mathbf{y }\), and by the gradient of \(\mathbf{u }\). Since \(\mathbf{y }=\mathbf{x } + \mathbf{u }\) is our desired transformation, direct regularizers such as \(|\nabla \mathbf{u }|^2\) make more sense than indirect regularizers such as \(|\nabla \mu |^2\).

However, as long as \(|\mu (f)|<1\), we would not give a preference to forcing \(|\mu (f)|\rightarrow 0\). To put some control on bias, similarly to [7], we are led to two more choices of a less biased function to modify \(S_1[ \mathbf{y } ]\):

  • \(\phi (v) =\phi _2(v)= \frac{v}{(v-1)^{2}}\):   balance \(|\mu (f)|\) between 0 and 1 as \(\phi _2(v)=\phi _2(1/v)\);

  • \(\phi (v) =\phi _3(v)=\frac{v^2}{(v-1)^{2}}\):   encourage \(|\mu (f)|\rightarrow 0\) and \(|\mu (f)|\not =1\);

Below, we list first-order derivatives and second-order derivatives for the above different \(\phi (v)\):

  • \({\phi }'_{1}(v)=-\frac{2}{(v-1)^{3}}\) and \({\phi }''_{1}(v)=\frac{6}{(v-1)^{4}}\);

  • \({\phi }'_{2}(v)=-\frac{v+1}{(v-1)^{3}}\) and \({\phi }''_{2}(v)=\frac{2v+4}{(v-1)^{4}}\);

  • \({\phi }'_{3}(v)=-\frac{2v}{(v-1)^{3}}\) and \({\phi }''_{3}(v)=\frac{4v+2}{(v-1)^{4}}\),

which will be used in subsequent solutions. With a general \(\phi (v)\), the second version of our proposed model takes the form:

$$\begin{aligned} \min _{\mathbf{u }}\ \frac{1}{2}\int _{\Omega }(T(\mathbf{x }+\mathbf{u })-R)^{2}\,\mathrm{d}\mathbf{x } + \frac{\alpha }{2}\int _{\Omega }\sum _{\ell =1}^{2}|\nabla u_{\ell }|^{2}\,\mathrm{d}\mathbf{x } + \beta \int _{\Omega }\phi (|\mu |^{2})\,\mathrm{d}\mathbf{x }, \end{aligned}$$
(18)

where \(|\mu |^{2} = \frac{(\partial _{x_{1}}u_{1}-\partial _{x_{2}}u_{2})^{2} +(\partial _{x_{1}}u_{2}+\partial _{x_{2}}u_{1})^2}{(\partial _{x_{1}}u_{1} +\partial _{x_{2}}u_{2}+2)^{2}+(\partial _{x_{1}}u_{2}-\partial _{x_{2}}u_{1})^2}\) is written in component form ready for discretization, using \(y_1=x_{1}+u_1(x_1,x_2), \ y_2=x_{2}+u_2(x_1,x_2)\), and \(\partial _{x_{1}}u_{1}=\partial u_1/\partial x_1\).
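As a quick sanity check (sign errors here are easy to make), the three penalties and the first derivatives listed above can be verified numerically; the test point and tolerance below are arbitrary choices.

```python
import numpy as np

# phi and phi' for the three choices in Sect. 3.2
phis = [
    (lambda v: 1 / (v - 1) ** 2,      lambda v: -2 / (v - 1) ** 3),
    (lambda v: v / (v - 1) ** 2,      lambda v: -(v + 1) / (v - 1) ** 3),
    (lambda v: v ** 2 / (v - 1) ** 2, lambda v: -2 * v / (v - 1) ** 3),
]

v, eps = 0.3, 1e-6          # arbitrary test point in (0, 1)
for p, dp in phis:
    fd = (p(v + eps) - p(v - eps)) / (2 * eps)   # central difference
    assert abs(fd - dp(v)) < 1e-6 * (1 + abs(dp(v)))
```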

Remark

The existence and uniqueness of a solution of (18) are beyond the scope of the present work and will be considered in forthcoming work.

4 The Numerical Algorithm

In this section, we present a numerical algorithm to solve model (18). We choose the discretize–optimize approach: directly discretizing this variational model gives rise to a finite-dimensional optimization problem, which we then solve with optimization methods.

4.1 Discretization

We use finite differences to discretize model (18) on a unit square domain \(\Omega =[0,1]^2\). In the implementation, we employ the nodal grid and define a spatial partition \(\Omega _{h} = \{\mathbf{x }^{i,j}\in \Omega \ |\ \mathbf{x }^{i,j}=(x_{1}^{i},x_{2}^{j})=(ih,jh), 0 \le i \le n, 0 \le j \le n\}\), where \(h = \frac{1}{n}\) and the discrete domain consists of \(n^{2}\) cells of size \(h \times h\). We discretize the displacement field \(\mathbf{u }\) on the nodal grid, namely \(\mathbf{u }^{i,j} = (u_{1}^{i,j},u_{2}^{i,j}) = (u_{1}(x_{1}^{i},x_{2}^{j}), u_{2}(x_{1}^{i},x_{2}^{j}))\). For ease of presentation, according to the lexicographical ordering, we reshape

$$\begin{aligned} X = \left( x_{1}^{0},\ldots ,x_{1}^{n},\ldots ,x_{1}^{0},\ldots ,x_{1}^{n},x_{2}^{0},\ldots ,x_{2}^{0},\ldots ,x_{2}^{n},\ldots ,x_{2}^{n}\right) ^{T} \in {{\mathbb {R}}}^{2(n+1)^{2}\times 1}, \end{aligned}$$

and

$$\begin{aligned} U = \left( u_{1}^{0,0},\ldots ,u_{1}^{n,0},\ldots ,u_{1}^{0,n},\ldots ,u_{1}^{n,n}, u_{2}^{0,0},\ldots ,u_{2}^{n,0},\ldots ,u_{2}^{0,n},\ldots ,u_{2}^{n,n}\right) ^{T} \in {{\mathbb {R}}}^{2(n+1)^{2}\times 1}. \end{aligned}$$
Fig. 1 Partition of domain \(\Omega =\cup _{ij}\Omega _{i,j}\). Note that solutions \(u_{1}\) and \(u_{2}\) are defined at nodes. a Illustration of the cell-centred partition: the green cell is \(\Omega _{i,j}\); nodal grid \(\square \). b Partition for \(\partial _{x_{1}}\) and \(\partial _{x_{2}}\): the left yellow cell is \(\Omega _{i,j}^{x_{1}}\) and the right green cell is \(\Omega _{i,j}^{x_{2}}\) (color figure online)

4.1.1 Discretization of Term 1 in (18)

According to the cell-centred partition in Fig. 1a and mid-point rule, we get

$$\begin{aligned} {\mathcal {D}}[\mathbf{u }] := \frac{1}{2}\int _{\Omega }(T(\mathbf{x }+\mathbf{u }(\mathbf{x }))-R(\mathbf{x }))^{2}\,\mathrm{d}\mathbf{x } = \frac{h^{2}}{2}\sum _{i=0}^{n-1}\sum _{j=0}^{n-1}\left( T\left( \mathbf{x }^{i+\frac{1}{2},j+\frac{1}{2}}+\mathbf{u }\left( \mathbf{x }^{i+\frac{1}{2},j+\frac{1}{2}}\right) \right) -R\left( \mathbf{x }^{i+\frac{1}{2},j+\frac{1}{2}}\right) \right) ^{2}. \end{aligned}$$
(19)

Set \(\vec {R} = R(PX) \in {{\mathbb {R}}}^{n^{2}\times 1}\) as the discretized reference image and \(\vec T(PX+PU) \in {\mathbb R}^{n^{2}\times 1}\) as the discretized deformed template image, where \(P \in {{\mathbb {R}}}^{2n^{2} \times 2(n+1)^{2}}\) is an averaging matrix for the transfer from the nodal grid representation of U to the cell-centred positions.

Consequently, for SSD, we obtain the following discretization:

$$\begin{aligned} {\mathcal {D}}[\mathbf{u }] \approx \frac{h^2}{2} (\vec{T}(PX+PU)-\vec {R})^{T}(\vec{T}(PX+PU)-\vec {R}). \end{aligned}$$
(20)
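The averaging matrix P can be assembled, for instance, via a Kronecker product of 1D averaging matrices; the sketch below follows the lexicographical ordering assumed above, and the exact ordering convention is our assumption.

```python
import numpy as np
import scipy.sparse as sp

def averaging_matrix(n):
    """P: nodal values on the (n+1)x(n+1) grid -> means over the 4 corner
    nodes of each of the n^2 cells, stacked for the two components of u."""
    # 1D: n+1 nodes -> n cell centres, averaging adjacent nodes
    B = sp.diags([0.5 * np.ones(n), 0.5 * np.ones(n)], [0, 1],
                 shape=(n, n + 1), format="csr")
    P1 = sp.kron(B, B)                      # 2D averaging, n^2 x (n+1)^2
    return sp.block_diag([P1, P1]).tocsr()  # acts on U = (u1; u2)
```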

4.1.2 Discretization of Term 2 in (18)

For the diffusion regularizer,

$$\begin{aligned} {\mathcal {S}}_{\mathrm{diff}}[\mathbf{u }] := \frac{\alpha }{2}\int _{\Omega }\sum _{\ell =1}^{2}|\nabla u_{\ell }|^{2}\,\mathrm{d}\mathbf{x }, \end{aligned}$$
(21)

according to the partition in Fig. 1b and mid-point rule, we have

$$\begin{aligned} \int _{\Omega _{i,j}^{x_{1}}}\vert \partial _{x_{1}} u_{\ell }\vert ^{2}\,\mathrm{d}\mathbf{x } \approx h^{2}\left( \partial ^{i+\frac{1}{2},j}_{x_{1}}u_{\ell }\right) ^{2}, \qquad 1 \le j\le n-1, \end{aligned}$$
(22)

or at the boundary half-boxes

$$\begin{aligned} \int _{\Omega _{i,j}^{x_{1}}}\vert \partial _{x_{1}} u_{\ell }\vert ^{2}\,\mathrm{d}\mathbf{x } \approx \frac{h^{2}}{2}\left( \partial ^{i+\frac{1}{2},j}_{x_{1}}u_{\ell }\right) ^{2}, \qquad j=0,n. \end{aligned}$$
(23)

Similar results hold for \(\int _{\Omega _{i,j}^{x_{2}}}\vert \partial _{x_{2}} u_{\ell }\vert ^{2}\,\mathrm{d}\mathbf{x },\ \ell =1,2\).

As designed, we use compact (short) difference schemes to compute \(\partial _{x_{1}}u_{\ell }\) and \(\partial _{x_{2}}u_{\ell },\ \ell =1,2\):

$$\begin{aligned} \partial ^{i+\frac{1}{2},j}_{x_{1}}u_{\ell } \approx \frac{u_{\ell }^{i+1,j}-u_{\ell }^{i,j}}{h}, \qquad \partial ^{i,j+\frac{1}{2}}_{x_{2}}u_{\ell } \approx \frac{u_{\ell }^{i,j+1}-u_{\ell }^{i,j}}{h}. \end{aligned}$$
(24)

Then (21) can be rewritten in the following form:

$$\begin{aligned} {\mathcal {S}}_{\mathrm{diff}}[\mathbf{u }] \approx \frac{\alpha h^2}{2}U^{T}A^{T}GAU. \end{aligned}$$
(25)

See “Appendix A” for details on A and G.

Remark

Note that the matrix A here is the discretized gradient matrix, so \(A^{T}GA\) is the discretized Laplace matrix.
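For a single component on the nodal grid, A and \(A^{T}GA\) can be assembled with Kronecker products, as in the following sketch; the boundary half-weights follow (22)–(23), but the exact form of A and G is given in “Appendix A”, so this is only an illustration under our own ordering assumptions.

```python
import numpy as np
import scipy.sparse as sp

def diffusion_matrices(n, h):
    """Forward-difference gradient A (cf. (24)) and weighted Laplacian
    A^T G A for one scalar component on an (n+1)x(n+1) nodal grid."""
    D = sp.diags([-np.ones(n), np.ones(n)], [0, 1],
                 shape=(n, n + 1), format="csr") / h
    I = sp.identity(n + 1, format="csr")
    A = sp.vstack([sp.kron(I, D),            # d/dx1 (fast index)
                   sp.kron(D, I)])           # d/dx2 (slow index)
    w = np.ones(n + 1); w[[0, -1]] = 0.5     # half-boxes at the boundary
    g = np.concatenate([np.kron(w, np.ones(n)),    # weights for d/dx1 rows
                        np.kron(np.ones(n), w)])   # weights for d/dx2 rows
    G = sp.diags(h ** 2 * g)
    return A, (A.T @ G @ A).tocsr()          # discrete Laplacian (up to sign)
```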

4.1.3 Discretization of Term 3 in (18)

For simplicity, denote \(|\mu ( \mathbf{y })| =|\mu ( \mathbf{x } +\mathbf{u })|\) by \(|\mu ( \mathbf{u })|\). From (18), note that \(\phi (|\mu ( \mathbf{u })|^{2})\) involves only first-order derivatives and all \(\mathbf{u }^{i,j}\) are available at vertex pixels. Thus it is convenient first to obtain approximations at all cell centres (e.g. at \(V_5\) in Fig. 2) and second to use local linear elements to facilitate first-order derivatives. We divide each cell (Fig. 2) into 4 triangles. In each triangle, we construct two linear interpolation functions to approximate \(u_{1}\) and \(u_{2}\). Consequently, all partial derivatives are locally constant, i.e. \(\phi (|\mu ( \mathbf{u })|^{2})\) is constant in each triangle.

According to the partition in Fig. 2, we get

$$\begin{aligned} {\mathcal {S}}_{\mathrm{Beltrami}}[\mathbf{u }] := \beta \int _{\Omega }\phi (|\mu ( \mathbf{u })|^{2})\,\mathrm{d}\mathbf{x } = \beta \sum _{i=1}^{n}\sum _{j=1}^{n}\sum _{k=1}^{4}\int _{\Omega _{i,j,k}}\phi (|\mu (\mathbf{u })|^{2})\,\mathrm{d}\mathbf{x }. \end{aligned}$$
(26)

Set \(\mathbf{L }^{i,j,k}(\mathbf{x })= (L_{1}^{i,j,k}(\mathbf{x }),L_{2}^{i,j,k}(\mathbf{x }))= (a^{i,j,k}_{1}x_{1}+a^{i,j,k}_{2}x_{2}+a^{i,j,k}_{3}, a^{i,j,k}_{4}x_{1}+a^{i,j,k}_{5}x_{2}+a^{i,j,k}_{6})\), which is the linear interpolant of \(\mathbf{u }\) on \(\Omega _{i,j,k}\). Note that \(\partial _{x_{1}} L^{i,j,k}_{1} = a^{i,j,k}_{1}, \partial _{x_{2}} L^{i,j,k}_{1} = a^{i,j,k}_{2},\partial _{x_{1}} L^{i,j,k}_{2} = a^{i,j,k}_{4}\) and \(\partial _{x_{2}} L^{i,j,k}_{2} = a^{i,j,k}_{5}\). According to (18), the discretization of the Beltrami regularizer can be written as follows:

$$\begin{aligned} {\mathcal {S}}_{\mathrm{Beltrami}}[\mathbf{u }] \approx \frac{\beta h^{2}}{4}\sum _{i=1}^{n}\sum _{j=1}^{n}\sum _{k=1}^{4} \phi \left( \frac{\left( a^{i,j,k}_{1}-a^{i,j,k}_{5}\right) ^{2} +\left( a^{i,j,k}_{2}+a^{i,j,k}_{4}\right) ^{2}}{\left( a^{i,j,k}_{1}+a^{i,j,k}_{5}+2\right) ^{2} +\left( a^{i,j,k}_{2}-a^{i,j,k}_{4}\right) ^{2}}\right) . \end{aligned}$$
(27)

To simplify (27), define 3 vectors \(\vec {\mathbf{r }}(U), \vec {\mathbf{r }}^{1}(U), \vec {\mathbf{r }}^{2}(U)\) \(\in \mathbb {R}^{4n^{2}}\) by \(\vec {\mathbf{r }}(U)_{\ell }=\vec {\mathbf{r }}^{1}(U)_{\ell } \vec {\mathbf{r }}^{2}(U)_{\ell }\), \(\vec {\mathbf{r }}^{1}(U)_{\ell }=(a^{i,j,k}_{1}-a^{i,j,k}_{5})^{2} +(a^{i,j,k}_{2}+a^{i,j,k}_{4})^{2}\), \(\vec {\mathbf{r }}^{2}(U)_{\ell }=1\big /[(a^{i,j,k}_{1}+a^{i,j,k}_{5}+2)^{2} +(a^{i,j,k}_{2}-a^{i,j,k}_{4})^{2}]\) where \(\ell = (k-1)n^{2}+(j-1)n+i\ \in [1, 4n^2]\). Hence, (27) becomes

$$\begin{aligned} {\mathcal {S}}_{\mathrm{Beltrami}}[\mathbf{u }] \approx \frac{\beta h^{2}}{4}{\varvec{\phi }}(\vec {\mathbf{r }}(U))e^{T} \end{aligned}$$
(28)

where \({\varvec{\phi }}(\vec {\mathbf{r }}(U)) = (\phi (\vec {\mathbf{r }}(U)_{1}),\ldots ,\phi (\vec {\mathbf{r }}(U)_{4n^{2}}))\) denotes the element-wise application of \(\phi \) over all triangles and \(e = (1,\ldots ,1)\in \mathbb {R}^{4n^{2}}\). Here, \(\vec {\mathbf{r }}(U)\) is the square of the discretized Beltrami coefficient; we rewrite it in a compact form in “Appendix B”.

Finally, combining the above three parts (20), (25) and (28), we get the discretization formulation for model (18):

$$\begin{aligned} \min _{U} J(U):= \frac{h^2}{2}(\vec{T}(PX+PU)-\vec {R})^{T}(\vec{T}(PX+PU)-\vec {R}) + \frac{\alpha h^2}{2}U^{T}A^{T}GAU + \frac{\beta h^{2}}{4}{\varvec{\phi }}(\vec {\mathbf{r }}(U))e^{T}. \end{aligned}$$
(29)

Remark

According to the definition of \(\phi \), and since \(\vec {\mathbf{r }}(U)_{\ell } \ge 0\), each component of \({\varvec{\phi }}(\vec {\mathbf{r }}(U))\) is nonnegative and differentiable.

Fig. 2 Partition of a cell: nodal points \(\square \) and centre point \(\circ \). \(\triangle V_{1}V_{2}V_{5}\) is \(\Omega _{i,j,k}\)

4.2 Optimization Method for the Discretized Problem (29)

In the numerical implementation, we choose a line search method to solve the resulting unconstrained optimization problem (29). To guarantee that the search direction is a descent direction, we employ the Gauss–Newton direction, since the standard Newton direction with an indefinite Hessian may fail to be a descent direction. Moreover, a Gauss–Newton approach has two advantages: first, we do not need to compute the second-order term, which saves computation time; second, the retained Gauss–Newton matrix often dominates the omitted second-order term, either because of small second-order derivatives or because of small residuals [42].

Let \(J(U): \mathbb {R}^{2(n+1)^{2}}\rightarrow \mathbb {R}\) be twice continuously differentiable, \(U^{i}\in \mathbb {R}^{2(n+1)^{2}}\) and the approximated Hessian \(H(U^{i})\) positive definite. We model J at the current point \(U^{i}\) by the quadratic approximation \(q^{i}(s)\),

$$\begin{aligned} J(U^{i}+s)\approx q^{i}(s) = J(U^{i})+d_{J}(U^{i})^{T}s + \frac{1}{2}s^{T}H(U^{i})s, \end{aligned}$$
(30)

where \(s= U-U^{i}\) and \(d_{J}(U^{i}) = \nabla J(U^{i})\). Minimizing \(q^{i}(s)\) yields

$$\begin{aligned} U^{i+1} = U^{i}-[H(U^{i})]^{-1}d_{J}(U^{i}). \end{aligned}$$
(31)

To guarantee global convergence of the Gauss–Newton method, we employ a line search; the iteration is as follows:

$$\begin{aligned} U^{i+1} = U^{i}-\theta _{i}[H(U^{i})]^{-1}d_{J}(U^{i}), \end{aligned}$$
(32)

where \(\theta _{i}\) is a step length.

Next, we investigate the details of the approximated Hessian \(H(U^{i})\), the step length \(\theta _{i}\), the stopping criteria and the multilevel strategy.

4.2.1 Approximated Hessian H

We consider each of the three terms in J(U) from (29) separately.

Firstly, we consider the discretized SSD

$$\begin{aligned} \frac{h^{2}}{2}(\vec T(PX+PU)-\vec {R})^{T}(\vec T(PX+PU)-\vec {R}). \end{aligned}$$
(33)

Its gradient and Hessian are, respectively,

$$\begin{aligned} \left\{ \begin{array}{l} d_{1} = h^{2}P^{T}\vec {T}_{\tilde{\mathbf{U }}}^{T}(\vec{T}(\tilde{\mathbf{U }})-\vec {R})\in {{\mathbb {R}}}^{2(n+1)^{2}\times 1},\\ H_{1} = h^{2}P^{T}\left( \vec {T}_{\tilde{\mathbf{U }}}^{T}\vec {T}_{\tilde{\mathbf{U }}} + \sum _{\ell = 1}^{n^{2}}(\vec{T}(\tilde{\mathbf{U }})-\vec {R})_{\ell }\nabla ^{2}(\vec{T}(\tilde{\mathbf{U }})-\vec {R})_{\ell }\right) P, \end{array} \right. \end{aligned}$$
(34)

where \(\tilde{\mathbf{U }} =PX+PU\) and \(\vec {T}_{\tilde{\mathbf{U }}} = \frac{\partial \vec{T}(\tilde{\mathbf{U }}) }{\partial \tilde{\mathbf{U }}}\) is the Jacobian of \(\vec{T}\) with respect to \( \tilde{\mathbf{U }}\).

We cannot ensure that \(H_{1}\) is positive semi-definite; if it is not positive definite, we may not obtain a descent direction. So we omit the second-order term of \(H_{1}\) to obtain the approximated Hessian of (33):

$$\begin{aligned} {\hat{H}}_{1} = h^{2}P^{T}(\vec {T}_{\tilde{\mathbf{U }}}^{T}\vec {T}_{\tilde{\mathbf{U }}})P. \end{aligned}$$
(35)

Remark

Evaluation of the deformed template image T must involve interpolation because \(\tilde{\mathbf{U }}\) does not in general correspond to pixel points; in our implementation, as in [39], we use B-spline interpolation to get \(\vec{T}(\tilde{\mathbf{U }})\).

Secondly, for the discretized diffusion regularizer \(\frac{\alpha h^{2}}{2} U^{T}A^{T}GAU\), the gradient and Hessian are as follows:

$$\begin{aligned} \left\{ \begin{array}{l} d_{2} = \alpha h^{2}A^{T}GAU \in {{\mathbb {R}}}^{2(n+1)^{2}\times 1},\\ H_{2} = \alpha h^{2}A^{T}GA \in {{\mathbb {R}}}^{2(n+1)^{2}\times 2(n+1)^{2}}. \end{array} \right. \end{aligned}$$
(36)

Since \(H_{2}\) is positive definite when Dirichlet boundary conditions are imposed on U, we do not approximate it.

Finally, for the discretized Beltrami term

$$\begin{aligned} \frac{\beta h^{2}}{4}{\varvec{\phi }}(\vec {\mathbf{r }}(U))e^{T}, \end{aligned}$$
(37)

the gradient and the Hessian are as follows:

$$\begin{aligned} \left\{ \begin{array}{l} d_{3} = \frac{\beta h^{2}}{4} \mathrm{d}\vec {\mathbf{r }}^{T}\mathrm{d}{\varvec{\phi }}(\vec {\mathbf{r }}) \in {{\mathbb {R}}}^{2(n+1)^{2}\times 1},\\ H_{3} = \frac{\beta h^{2}}{4} \left( \mathrm{d}\vec {\mathbf{r }}^{T}\mathrm{d}^{2}{\varvec{\phi }}(\vec {\mathbf{r }})\mathrm{d}\vec {\mathbf{r }} + \sum _{\ell =1}^{4n^{2}}[\mathrm{d}{\varvec{\phi }}(\vec {\mathbf{r }})]_{\ell }\nabla ^{2}\vec {\mathbf{r }}_{\ell }\right) \in {\mathbb R}^{2(n+1)^{2}\times 2(n+1)^{2}}, \end{array} \right. \end{aligned}$$
(38)

where \(\mathrm{d}{\varvec{\phi }}(\vec {\mathbf{r }})= (\phi '(\vec {\mathbf{r }}_{1}),\ldots ,\phi '(\vec {\mathbf{r }}_{4n^{2}}))^{T}\) is the vector of derivatives of \({\varvec{\phi }}\) over all triangles,

$$\begin{aligned} \left\{ \begin{array}{l} \mathrm{d}\vec {\mathbf{r }} = {{\mathrm{diag}}}(\vec {\mathbf{r }}^{1})\,\mathrm{d}\vec {\mathbf{r }}^{2}+{{\mathrm{diag}}}(\vec {\mathbf{r }}^{2})\,\mathrm{d}\vec {\mathbf{r }}^{1}, \\ \mathrm{d}\vec {\mathbf{r }}^{1} = 2{{\mathrm{diag}}}(A_{1}U)A_{1} + 2{{\mathrm{diag}}}(A_{2}U)A_{2}, \\ \mathrm{d}\vec {\mathbf{r }}^{2} = -{{\mathrm{diag}}}(\vec {\mathbf{r }}^{2}\odot \vec {\mathbf{r }}^{2})[2{{\mathrm{diag}}}(A_{3}U+2)A_{3} + 2{{\mathrm{diag}}}(A_{4}U)A_{4}], \end{array} \right. \end{aligned}$$
(39)

\(\odot \) denotes the Hadamard product, \(\mathrm{d}\vec {\mathbf{r }}, \mathrm{d}\vec {\mathbf{r }}^{1}, \mathrm{d}\vec {\mathbf{r }}^{2}\) are the Jacobians of \(\vec {\mathbf{r }}, \vec {\mathbf{r }}^{1}, \vec {\mathbf{r }}^{2}\) with respect to U, respectively, \( [\mathrm{d}{\varvec{\phi }}(\vec {\mathbf{r }})]_{\ell }\) is the \(\ell \)th component of \(\mathrm{d}{\varvec{\phi }}(\vec {\mathbf{r }})\) and \(\mathrm{d}^{2}{\varvec{\phi }}(\vec {\mathbf{r }})\) is the Hessian of \({\varvec{\phi }}\) with respect to \(\vec {\mathbf{r }}\), which is a diagonal matrix whose ith diagonal element is \(\phi ''(\vec {\mathbf{r }}_{i}),\ 1\le i \le 4n^{2}\). Here \({{\mathrm{diag}}}(v)\) is a diagonal matrix with v on its main diagonal. More details about \(\vec {\mathbf{r }}^{1}\), \(\vec {\mathbf{r }}^{2}\), \(A_1\), \(A_2\), \(A_3\) and \(A_4\) are given in “Appendix B”, and some illustration of our notation is given in “Appendix C”.

To extract a positive semi-definite part out of (38), we omit the second-order term and obtain the approximated Hessian as

$$\begin{aligned} {\hat{H}}_{3} = \frac{\beta h^{2}}{4} \mathrm{d}\vec {\mathbf{r }}^{T}\mathrm{d}^{2}{\varvec{\phi }}(\vec {\mathbf{r }})\mathrm{d}\vec {\mathbf{r }}. \end{aligned}$$
(40)

Therefore, for the functional J(U) in (29) with any choice of \(\phi \), we obtain its gradient

$$\begin{aligned} d_{J} = d_{1}+d_{2}+d_{3} \end{aligned}$$
(41)

and approximated Hessian:

$$\begin{aligned} H = {\hat{H}}_{1}+H_{2}+{\hat{H}}_{3}. \end{aligned}$$
(42)

4.2.2 Search Direction

At each iteration, using (41) and (42), we need to solve the Gauss–Newton system to find the search direction of (29):

$$\begin{aligned} H\delta U=-\,d_{J}, \end{aligned}$$
(43)

where \(\delta U\) is the search direction. In our implementation, we use MINRES with diagonal preconditioning to solve this linear system [2, 43].
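In SciPy terms, the direction solve might look like the following sketch; the Jacobi preconditioner and the iteration cap are illustrative assumptions.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, minres

def search_direction(H, dJ):
    """Solve H dU = -dJ, cf. (43), by MINRES with diagonal preconditioning."""
    d = np.asarray(H.diagonal())
    M = LinearOperator(H.shape, matvec=lambda x: x / np.maximum(d, 1e-12))
    dU, info = minres(H, -dJ, M=M, maxiter=200)
    return dU
```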

4.2.3 Step Length

We use the standard Armijo strategy with backtracking to find a suitable step length \(\theta \). In the implementation, we also need to check that each component of \(\vec {\mathbf{r }}(U)\) (54) is smaller than 1; recall that \(\vec {\mathbf{r }}(U)\) is the squared norm of the discretized Beltrami coefficient. As a safeguard, we choose T0 = \(10^{-8}\) and Tol = \(10^{-12}\) as the lower bounds of the step length \(\theta \) and of \(\theta \Vert \delta U\Vert \), respectively [7, 28, 42, 48]. The algorithm is summarized in Algorithm 1.

[Algorithm 1: Armijo line search with backtracking]
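Since the algorithm listing is displayed as a figure in the original, the following sketch captures our reading of it; the Armijo parameter eta and the initial step theta0 are conventional values we assume, while T0 and Tol are the safeguards stated above.

```python
import numpy as np

def armijo(J, U, dU, dJ, r_of_U, theta0=1.0, eta=1e-4, T0=1e-8, Tol=1e-12):
    """Backtracking Armijo line search with the Beltrami safeguard.

    J      : objective U -> J(U) from (29)
    r_of_U : U -> vector r(U) of (54); a trial point is rejected
             unless max r(U) < 1, so the update stays diffeomorphic
    """
    J0, slope = J(U), float(dJ @ dU)    # slope < 0 for a descent direction
    theta = theta0
    while theta >= T0 and theta * np.linalg.norm(dU) >= Tol:
        U_try = U + theta * dU
        if np.max(r_of_U(U_try)) < 1.0 and J(U_try) <= J0 + eta * theta * slope:
            return U_try, theta         # sufficient decrease achieved
        theta *= 0.5                    # backtrack
    return U, 0.0                       # failure: signal the caller to stop
```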

4.2.4 Stopping Criteria

Here, we adopt the stopping criteria as in [39]:

  (1.a) \(\Vert J( U^{i+1})-J( U^{i})\Vert \le \tau _{J}(1+\Vert J( U^{0})\Vert )\),

  (1.b) \(\Vert \mathbf{y }^{i+1}-\mathbf{y }^{i}\Vert \le \tau _{W}(1+\Vert \mathbf{y }^{0}\Vert )\),

  (1.c) \(\Vert d_{J}\Vert \le \tau _{G}(1+\Vert J( U^{0})\Vert )\),

  (2) \(\Vert d_{J}\Vert \le \) eps,

  (3) \(i \ge \) MaxIter.

Here, eps is the machine precision and MaxIter is the maximal number of outer iterations. We set \(\tau _{J} = 10^{-3}\), \(\tau _{W} = 10^{-2}\), \(\tau _{G} = 10^{-2}\) and MaxIter \(= 500\). If any one of conditions (1), (2) and (3) is satisfied, the iterations are terminated. The resulting Gauss–Newton scheme with Armijo line search is summarized in Algorithm 2.

[Algorithm 2: Gauss–Newton scheme with Armijo line search]
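A sketch of the outer loop, using the two helpers above, is given below; here we read the group (1) as conditions (1.a)–(1.c) holding jointly, which is our assumption following [39].

```python
import numpy as np

def gauss_newton(J, grad_and_hess, r_of_U, U0,
                 tauJ=1e-3, tauW=1e-2, tauG=1e-2, max_iter=500):
    """Gauss-Newton iteration (32) with Armijo line search and the
    stopping tests of Sect. 4.2.4; grad_and_hess returns (d_J, H)
    of (41)-(42)."""
    U, Jold = U0.copy(), J(U0)
    J0 = Jold
    for i in range(max_iter):                          # criterion (3)
        dJ, H = grad_and_hess(U)
        if np.linalg.norm(dJ) <= np.finfo(float).eps:  # criterion (2)
            break
        dU = search_direction(H, dJ)
        U_new, theta = armijo(J, U, dU, dJ, r_of_U)
        if theta == 0.0:                               # line search failed
            break
        Jnew = J(U_new)
        # criterion (1): (1.a)-(1.c); note that ||y-differences|| equal
        # ||U-differences||, and the normalization by ||y^0|| is simplified
        stop1 = (abs(Jnew - Jold) <= tauJ * (1 + abs(J0))
                 and np.linalg.norm(U_new - U) <= tauW * (1 + np.linalg.norm(U0))
                 and np.linalg.norm(dJ) <= tauG * (1 + abs(J0)))
        U, Jold = U_new, Jnew
        if stop1:
            break
    return U
```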

Next, we discuss the global convergence of Algorithm 2 for our reformulated problem (29). First, we recall a relevant theorem.

Theorem 1

([28]) For the unconstrained optimization problem

$$\begin{aligned} \min _{U} J(U) \end{aligned}$$

let an iterative sequence be defined by \(U^{i+1}=U^{i}+\theta \delta U^{i}\), where \(\delta U^{i}=-(H^{i})^{-1}d_{J}(U^{i})\) and \(\theta \) is obtained by Algorithm 1. Assume that three conditions are met: (i) \(d_{J}\) is Lipschitz continuous; (ii) the matrices \(H^{i}\) are SPD; (iii) there exist constants \({\bar{\kappa }}\) and \(\lambda \) such that the condition number \(\kappa (H^{i})\le {\bar{\kappa }}\) and the norm \(||H^{i}||\le \lambda \) for all i. Then either \(J(U^{i})\) is unbounded from below or

$$\begin{aligned} \lim _{i\rightarrow \infty } d_{J}(U^{i})=0 \end{aligned}$$
(44)

and hence any limit point of the sequence of iterates is a stationary point.

Remark

In the above discretization leading to (29), we do not need to impose a boundary condition. However, for theoretical purposes, we will prove our convergence result under the Dirichlet boundary condition (namely, the boundary is fixed); this condition is needed to prove the symmetric positive definite (SPD) property of the approximated Hessians. In practical implementation, such a condition is not required, as confirmed by experiments.

In addition, define the set \({\mathcal {X}}:=\{U \ | \ \vec {\mathbf{r }}(U)_{\ell }\le 1-\epsilon , 1 \le \ell \le 4n^{2}\}\) for small \(\epsilon >0\); \(U\in {\mathcal {X}}\) means that the transformation is diffeomorphic. For suitable \(\beta \), we assume that each \(U^{i}\) generated by Algorithm 2 lies in \({\mathcal {X}}\).

Secondly, we state a simple lemma that is needed shortly for studying \(H^i\).

Lemma 2

Let a matrix be comprised of 3 submatrices, \(H = H_{1}+H_{2}+H_{3}\). If \(H_{1}\) and \(H_{2}\) are symmetric positive semi-definite and \(H_{3}\) is SPD, then H is SPD with \(\lambda _{h_{3}}\le \lambda _{h}\), where \(\lambda _{h_{3}}\) and \(\lambda _{h}\) are the minimum eigenvalues of \(H_{3}\) and H, respectively.

Proof

By the Rayleigh quotient characterization of the smallest eigenvalue, we can find a vector v such that

$$\begin{aligned} \lambda _{h} = \frac{v^{T}Hv}{v^{T}v}. \end{aligned}$$
(45)

Then we have

$$\begin{aligned} \lambda _{h_{3}}\le \frac{v^{T}H_{1}v}{v^{T}v}+\frac{v^{T}H_{2}v}{v^{T}v}+\frac{v^{T}H_{3}v}{v^{T}v} = \frac{v^{T}Hv}{v^{T}v} = \lambda _{h}, \end{aligned}$$
(46)

which completes the proof. \(\square \)

Theorem 3

Assume that T and R are twice continuously differentiable. For (29), when \(\phi =\phi _{1},\phi _{2}\) or \(\phi _{3}\), by using Algorithm 2, we obtain

$$\begin{aligned} \lim _{i\rightarrow \infty }d_{J}(U^{i})=0 \end{aligned}$$
(47)

and hence any limit point of the sequence of iterates produced by Algorithm 2 is a stationary point.

Proof

It suffices to show that Algorithm 2 satisfies the requirements of Theorem 1. Recall that \(\vec {\mathbf{r }}(U)\) is continuous. Here, we use the Dirichlet boundary condition and can assume that \(\Vert U\Vert \) is bounded. Then \(\vec {\mathbf{r }}(U)\) is a continuous mapping from a compact set to \(\mathbb {R}^{4n^{2}\times 1}\) and \(\vec {\mathbf{r }}(U)\) is proper. So, for some small \(\epsilon >0\), \({\mathcal {X}}\) is compact.

Firstly, we show that on \({\mathcal {X}}\), \(d_{J}\) of (29) is Lipschitz continuous. When \(\phi =\phi _{1},\phi _{2}\) or \(\phi _{3}\), the term \({\varvec{\phi }}(\vec {\mathbf{r }}(U))e^{T}\) in (29) is twice continuously differentiable with respect to \(U \in {\mathcal {X}}\). In addition, T and R are twice continuously differentiable. So (29) is twice continuously differentiable with respect to \(U \in {\mathcal {X}}\) and \(d_{J}\) is Lipschitz continuous.

Secondly, we show that on \({\mathcal {X}}\), \(H^{i}={\hat{H}}^{i}_{1}+ H^{i}_{2}+{\hat{H}}^{i}_{3}\) is SPD. By construction, \({\hat{H}}^{i}_{1}\) and \({\hat{H}}^{i}_{3}\) are symmetric positive semi-definite, and \(H^{i}_{2}\) is symmetric positive definite under the Dirichlet boundary condition. Consequently, by Lemma 2, \(H^{i}\) is SPD.

Thirdly, we show that both \(\kappa (H^i)\) and \(\Vert H^{i}\Vert \) are bounded. Note that at each iteration \(H^{i}_{2}=\alpha h^{2}A^{T}GA\) is constant; set \(\Vert H^{i}_{2}\Vert = M_{2}\). For \({\hat{H}}^{i}_{1} = h^{2}P^{T}(\vec {T}_{\tilde{\mathbf{U }}}^{T} \vec {T}_{\tilde{\mathbf{U }}})P\), we get an upper bound \(M_{1}\) because T is twice continuously differentiable and \({\mathcal {X}}\) is compact. For \(\phi =\phi _{1},\phi _{2}\) or \(\phi _{3}\), \(\phi \) is twice continuously differentiable with respect to \(U \in {\mathcal {X}}\), so \(\Vert {\hat{H}}^{i}_{3}\Vert \le \frac{\beta h^{2}}{4}\Vert \mathrm{d}\vec {\mathbf{r }}^{T}\Vert \Vert \mathrm{d}^{2}{\varvec{\phi }}(\vec {\mathbf{r }})\Vert \Vert \mathrm{d}\vec {\mathbf{r }}\Vert \le M_{3}\). Hence, we have

$$\begin{aligned} \Vert H^{i}\Vert \le \Vert {\hat{H}}^{i}_{1}\Vert +\Vert H^{i}_{2}\Vert +\Vert {\hat{H}}^{i}_{3}\Vert \le M_{1} + M_{2} + M_{3}. \end{aligned}$$
(48)

So set \(M=M_{1}+M_{2}+M_{3}\), giving \(\Vert H^{i}\Vert \le M\). Let \(\sigma \) be the minimum eigenvalue of \(H^{i}_{2}\). According to Lemma 2, the smallest eigenvalue \(\lambda _{min}\) of \(H^{i}\) is at least \(\sigma \), and the largest eigenvalue \(\lambda _{max}\) of \(H^{i}\) is at most M since \(\lambda _{max}\le \Vert H^{i}\Vert \). So the condition number of \(H^{i}\) is at most \(\frac{M}{\sigma }\).

Finally, we note that (29) is bounded below by 0. Applying Theorem 1 completes the proof. \(\square \)

4.3 Multilevel Strategy

In practice, we employ a multilevel strategy. We first coarsen the template T and the reference R over L levels, setting \(T_{L} = T\) and \(R_{L} = R\) on the finest level and \(T_{1}\) and \(R_{1}\) on the coarsest level. We then obtain \(U_{1}\) by solving our model (18) on the coarsest level. To give a good initial guess for the finer level, we apply an interpolation operator to \(U_{1}\) to obtain \(U_{2}^{0}\), the initial guess for the next level. We repeat this process and obtain the final registration on the finest level. A multilevel strategy has several advantages: on the coarse level, only important patterns are considered, which is a standard way to avoid getting trapped in a meaningless local minimum; the computation is fast because there are fewer variables than on the fine level; and the solution on the coarse level provides a good initial guess for the fine level.

The multilevel scheme representing our main algorithm is summarized below, where \(I_{H}^{h}\) is an interpolation (prolongation) operator based on bilinear interpolation and \(I_{h}^{H}\) is a restriction operator for transferring information to a coarser level.

[Multilevel registration scheme (main algorithm)]
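A compact sketch of this multilevel driver is shown below; restriction and prolongation are realized with bilinear zoom as simple stand-ins for \(I_{h}^{H}\) and \(I_{H}^{h}\), and scaling the displacement by the grid ratio is our assumption for displacements stored in pixel units.

```python
import numpy as np
from scipy.ndimage import zoom

def multilevel_register(T, R, solve_level, levels=5):
    """solve_level(T_l, R_l, U0) runs Algorithm 2 on one level and
    returns U of shape (2,) + T_l.shape."""
    Ts, Rs = [T], [R]
    for _ in range(levels - 1):                  # restriction I_h^H
        Ts.insert(0, zoom(Ts[0], 0.5, order=1))
        Rs.insert(0, zoom(Rs[0], 0.5, order=1))
    U = np.zeros((2,) + Ts[0].shape)             # coarsest-level initial guess
    for l in range(levels):
        if l > 0:                                # prolongation I_H^h
            f = np.array(Ts[l].shape) / np.array(Ts[l - 1].shape)
            U = np.stack([f[k] * zoom(U[k], f, order=1) for k in range(2)])
        U = solve_level(Ts[l], Rs[l], U)         # Algorithm 2 on level l
    return U
```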

5 Numerical Results

In this section, we give numerical results to illustrate the performance of our model (18). We hope to achieve three aims:

(1) Which choice of \(\phi \) is the best for our model (18)?

(2) We wish to compare with the current state-of-the-art methods (with codes listed for readers’ benefit) in the literature for good diffeomorphic mapping:

  (a) Hyperelastic model [7]: code from http://www.siam.org/books/fa06/

  (b) LDDMM [37]: code from https://github.com/C4IR/FAIR.m/tree/master/add-ons/LagLDDMM

  (c) Diffeomorphic Demons (DDemons) [51]: code from http://www.insight-journal.org/browse/publication/154

  (d) QCHR [30]: code provided by the author Dr. Kam Chu Lam.

All of the tests are performed on a PC with 3.40 GHz Intel(R) Core(TM) i7-4770 microprocessor, and with installed memory (RAM) of 32 GB.

(3) Most importantly, we would like to test and highlight the advantages of our new model.

Let \(\mathbf{y }\) be the final transform obtained by a particular model for registering two given images T, R. We use the following three measures to quantify the performance of a model and use them for later comparisons:

(i) Re_SSD (the relative Sum of Squared Differences), which is given by

$$\begin{aligned} \mathrm{Re}\_\mathrm{SSD} = \frac{\Vert T(\mathbf{y })-R\Vert ^2}{\Vert T-R\Vert ^2}; \end{aligned}$$
(49)
(ii) \(\min \det (J_{\mathbf{y }})\) and \(\max \det (J_{\mathbf{y }})\), the minimum and maximum of the Jacobian determinant of the transformation;

(iii) the Jaccard similarity coefficient (JSC), defined by

$$\begin{aligned} \mathrm{JSC} = \frac{|DT_{r}\cap R_{r}|}{|DT_{r}\cup R_{r}|}, \end{aligned}$$
(50)

where \(DT_{r}\) and \(R_{r}\) represent, respectively, the segmented regions of interest (e.g. a certain image feature such as an organ) in the deformed template (after registration) and in the reference. Hence, JSC is the ratio of the intersection of \(DT_{r}\) and \(R_{r}\) to their union [29]. JSC = 1 indicates a perfect alignment of the segmentation boundaries, and JSC = 0 indicates that the segmented regions have no overlap after registration. Before computing JSC, in the first three examples below, we employed a segmentation algorithm to segment the main features in both T and R; for the fourth example, the segmentation was done manually for both T and R.
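For reference, these three measures are straightforward to compute; the sketch below uses central differences for the Jacobian and boolean masks for JSC, with the grid spacing h as an assumption.

```python
import numpy as np

def re_ssd(Ty, T, R):
    """Relative SSD (49), with Ty the deformed template T(y)."""
    return np.sum((Ty - R) ** 2) / np.sum((T - R) ** 2)

def det_jacobian(y1, y2, h=1.0):
    """det(J_y) by central differences; its min/max detect folding."""
    y1_x1, y1_x2 = np.gradient(y1, h)
    y2_x1, y2_x2 = np.gradient(y2, h)
    return y1_x1 * y2_x2 - y1_x2 * y2_x1

def jsc(mask_dt, mask_r):
    """Jaccard similarity coefficient (50) for two boolean region masks."""
    inter = np.logical_and(mask_dt, mask_r).sum()
    union = np.logical_or(mask_dt, mask_r).sum()
    return inter / union
```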

In practice, we scale the intensity values of T and R to [0, 255]. Here, we state a strategy for choosing the parameters. For our model (18), \(\alpha \) should be related to the energy \({\mathcal {D}}[\mathbf{u }_{0}]\), where \(\mathbf{u }_{0}\) is the initial guess for the displacement, and \(\beta \) should be related to \(\alpha \). Empirically, we set \(\alpha \in [\alpha _{1},\alpha _{2}]\), where \(\alpha _{1}=0.5\times 10^{-2}\,{\mathcal {D}}[\mathbf{u }_{0}]\) and \(\alpha _{2}=2\times 10^{-2}\,{\mathcal {D}}[\mathbf{u }_{0}]\). For \(\phi =\phi _{1}\), \(\phi _{2}\), \(\phi _{3}\), we set \(\beta \in [3\alpha ,5\alpha ]\), \([0.5\alpha ,2\alpha ]\) and \([\alpha ,5\alpha ]\), respectively. For simplicity, we denote by New 1, New 2 and New 3 the model (18) with \(\phi _{1}\), \(\phi _{2}\) and \(\phi _{3}\), respectively.

It should be noted that a good registration result should produce a small Re_SSD, be diffeomorphic and yield a large JSC value for a region of interest.

Fig. 3 Test example 1: results of Hand to Hand registration (\(\alpha =2\)). The top row shows the template and reference; the second row shows the deformed templates obtained by model (18) and by the diffusion model. Though the last column is visually fine, the transform is not correct (see Table 1). a Template, b reference, c \(T(\mathbf{y })\) by New 1, d \(T(\mathbf{y })\) by New 2, e \(T(\mathbf{y })\) by New 3, f \(T(\mathbf{y })\) by diffusion model

5.1 Example 1—Improvement Over the Diffusion Model

In this example, we test a pair of real medical images, X-ray Hands, of resolution \(128 \times 128\). Figure 3a, b shows the template and the reference. We compare our model with the diffusion model and study the improvement over it. In the implementation, both models use a five-step multilevel strategy.

We conduct two experiments using different parameters:

i) Fixed parameters. For New 1–3, we set \(\beta =7\), \(\beta =1\) and \(\beta =9\), respectively, and fix \(\alpha =2\). To be fair, we also choose \(\alpha =2\) for the diffusion model. In this case, Fig. 3 shows the deformed templates \(T(\mathbf{y })\) from the 4 models. From it, we can see that all four models produce visually satisfactory results. To differentiate them, we have to check the quantitative measures in Table 1. We notice that the transformation obtained by the diffusion model is non-diffeomorphic since \(\min \det (J_{\mathbf{y }}) <0\) (i.e. the mesh is folded, even though the result is visually pleasing and the Re_SSD is small). Figure 4 illustrates the transform \(\mathbf{y }=\mathbf{x } + \mathbf{u }\) locally at its folding point. In contrast, our New 1–3 all generate diffeomorphic transformations.

ii) Optimized parameters. The second choice uses fine-tuned parameters for the diffusion model. We tested \(\alpha \in [1,500]\) and found that \(\alpha =430\) is the smallest value with which the diffusion model generates a diffeomorphic transformation. For our model, we then also set \(\alpha =430\) (not optimized, in order to favour the former) and set \(\beta =5\) for New 1–3 (to test the robustness of our model). Table 1 shows the detailed results for this second test. From it, we can see that the Re_SSD and JSC of our model are similar to those of the diffusion model, and that the transformations obtained by New 1–3 are all diffeomorphic, while the diffusion model is only diffeomorphic with the help of an optimized \(\alpha \). This shows that our model possesses robustness (in the sense of not requiring an optimized \(\alpha \)) with the help of a positive \(\beta \).

Hence, this example demonstrates that our New 1–3 are robust and can all help to obtain an accurate and diffeomorphic transformation.

Table 1 Test example 1—Comparison of the new model (New 1–3) with the diffusion model, based on a fixed \(\alpha \) and an optimized \(\alpha \) for the latter

5.2 Example 2—Test of Large Deformation and Comparison of Models

As is known, if the underlying deformation is small, it is generally believed that most models can deliver diffeomorphic transformations. Indeed, a diffeomorphic map can usually be obtained by increasing \(\alpha \), but this in turn compromises the registration quality by increasing Re_SSD (as seen in the 2 tests of \(\alpha \) in Example 1, where the larger \(\alpha =430\) achieves a diffeomorphic map for the diffusion model at the cost of a worse Re_SSD value).

Therefore, to test the capability of a registration model, we need an example that requires large deformation. To this end, we consider Example 2, a classic synthetic example consisting of a Disc and a C shape of resolution \(128 \times 128\), as shown in Fig. 5a, b. We compare our 3 models (New 1–3) with 5 other models, the hyperelastic model, LDDMM, DDemons, QCHR and the diffusion model, in terms of registration quality and performance. For this example, we use a five-step multilevel strategy for our model, the hyperelastic model and the diffusion model. For LDDMM and QCHR, we use a three-step multilevel strategy. We use a one-step multilevel strategy for DDemons as we find that multilevel processing does not improve its results.

Fig. 4 Zoom-in of the transformation (obtained by the diffusion model) near the point where folding occurs

Following our stated strategy for choosing the parameters of our model, we set \(\beta =80, 120, 100\) for New 1–3, respectively, and fix \(\alpha =70\). To be consistent, we also set \(\alpha =70\) for the diffusion model. For the hyperelastic model, LDDMM and QCHR, we set, respectively, \(\{\alpha _{l}=100, \alpha _{s}=0, \alpha _{v}=18\}\), \(\alpha =400\) and \(\{\alpha =0.1,\beta =1\}\), as used in the literature [7, 30, 37] for the same example. For DDemons, we optimized the parameters \(\{\sigma _{s},\sigma _{g}\}\) over the domain \([0.5,5]\times [0.5,5]\) and took the optimal choice \(\{\sigma _{s}=1.5,\sigma _{g}=3.5\}\); a sketch of this kind of grid search follows below.
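The sketch below shows one way such a grid search might be organized; `run_ddemons` is a hypothetical wrapper around the DDemons code (not a real API), and using Re_SSD as the selection criterion is our assumption.

```python
import itertools
import numpy as np

def tune_ddemons(T, R, step=0.5):
    """Exhaustive grid search over {sigma_s, sigma_g} in [0.5, 5]^2.

    run_ddemons(T, R, sigma_s, sigma_g) is a hypothetical wrapper that
    returns the deformed template; the best pair is chosen by Re_SSD.
    """
    grid = np.arange(0.5, 5.0 + step / 2, step)
    best = (np.inf, None, None)
    for s_s, s_g in itertools.product(grid, grid):
        T_def = run_ddemons(T, R, sigma_s=s_s, sigma_g=s_g)
        err = np.sum((T_def - R) ** 2) / np.sum((T - R) ** 2)  # Re_SSD
        if err < best[0]:
            best = (err, s_s, s_g)
    return best  # (Re_SSD, sigma_s, sigma_g)
```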

We now present the comparative results. Figure 5c–j shows that, except for the diffusion model, all the other models produce acceptable registered results. In particular, our model and LDDMM are slightly better than the hyperelastic model, DDemons and QCHR. It is pleasing to see that the new model produces equally good results for this challenging example. From Table 2, we see that our New 1–3, the hyperelastic model, LDDMM, DDemons and QCHR all produce \(\min \det (J_{\mathbf{y }})>0\), i.e. the transformations obtained by these models are diffeomorphic, but the diffusion model fails again with \(\min \det (J_{\mathbf{y }})<0\).

Because New 1–3 are motivated by the QCHR model, we now discuss the results of these two types of models. On the one hand, according to Table 2, our model takes less time. This is because, as we have mentioned, the algorithm for QCHR must solve two subproblems alternately (including several linear systems) in each iteration, and its convergence cannot be guaranteed, whereas our model only needs to solve one linear system in each iteration. In addition, we employ the Gauss–Newton method, which is superlinearly convergent under appropriate conditions. As we have also remarked, the QCHR algorithm can have convergence problems. This is illustrated in Fig. 6, where we plot the relative residuals of our model (New 3) and of QCHR. We observe that the relative residual of New 3 decreases below \(10^{-2}\), though not monotonically, while that of QCHR does not decrease and stays above 0.1.
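To make the structural contrast concrete, the sketch below shows a generic Gauss–Newton loop of this kind, solving a single linear system per iteration and monitoring the relative residual. It is an illustrative template under our own simplifications (dense normal equations, a small diagonal safeguard), not the paper's discretized solver.

```python
import numpy as np

def gauss_newton(residual, jacobian, x0, tol=1e-2, max_iter=50):
    """Generic Gauss-Newton iteration for min 0.5 * ||r(x)||^2.

    Each iteration solves one linear system with the Gauss-Newton
    Hessian approximation J^T J, mirroring the one-system-per-iteration
    structure described in the text (illustrative only).
    """
    x = np.asarray(x0, dtype=float).copy()
    history = []
    g0 = None
    for _ in range(max_iter):
        r = residual(x)
        J = jacobian(x)
        g = J.T @ r                          # gradient of the energy
        g0 = np.linalg.norm(g) if g0 is None else g0
        rel = np.linalg.norm(g) / g0         # relative residual
        history.append(rel)
        if rel < tol:
            break
        # small diagonal shift guards against a singular J^T J (our safeguard)
        dx = np.linalg.solve(J.T @ J + 1e-12 * np.eye(x.size), -g)
        x = x + dx
    return x, history
```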

On the other hand, we can compare the quality of the obtained solutions by checking the energy functionals. Using the same QCHR functional, the QCHR solution for Example 2 gives the value 1042, while the transformation obtained by New 3 gives the much smaller value 147. This indicates that the result obtained by the QCHR algorithm is not accurate, which is consistent with the fact that the Re_SSD and JSC of New 3 are also better than those of QCHR. Both discussions reach the same conclusion: the QCHR algorithm does not obtain the minimizer of the original QCHR functional.

Fig. 5 Test example 2 results of Disc to C. The percentage values show the Re_SSD errors. The top row shows the template and the reference; the second and third rows show the deformed templates obtained by New 1–3 and by the 5 other models. The landmarks in the template and reference are only used by QCHR, and the last result (j) by the diffusion model is evidently not correct. a Template T, b reference R, c \(T(\mathbf{y })\) 0.1% by New 1, d \(T(\mathbf{y })\) 0.1% by New 2, e \(T(\mathbf{y })\) 0.1% by New 3, f \(T(\mathbf{y })\) 0.8% by hyperelastic, g \(T(\mathbf{y })\) 0.1% by LDDMM, h \(T(\mathbf{y })\) 1.7% by DDemons, i \(T(\mathbf{y })\) 7.7% by QCHR with 6 landmarks, j \(T(\mathbf{y })\) 1.3% by diffusion model

Table 2 Test example 2—Comparison of the new model (New 1–3) with 5 other models

5.3 Example 3—Comparison of Models for a Challenging Test

Here, we illustrate the fact that area preservation between images can be unnecessary, and that trying to enforce it (as in the hyperelastic model) can cause a registration to fail. We choose particular template and reference images, shown in Fig. 7a, b, whose main features have significantly different areas: here the area of the 'Disc' is much larger than that of the 'C'. The resolution of the images is \(512 \times 512\). We test the performance of New 1–3 and the other models. In this example, we use a seven-step multilevel strategy for New 1–3, the hyperelastic model and the diffusion model. For LDDMM and QCHR, we use a five-step multilevel strategy. We use a single level for DDemons (since multilevels do not help).

In choosing the parameters of all the models for this example, we first follow our strategy to set \(\beta =250, 50, 100\) for New 1–3, respectively, and fix \(\alpha =50\). To be consistent, we also set \(\alpha =50\) for the diffusion model. For the hyperelastic model, we also set \(\alpha _{l} = 50\) because it contains the diffusion term, and take \(\alpha _{s}=0\). For the third parameter \(\alpha _{v}\) in the hyperelastic model, we test the range [55, 150] and choose the optimal value \(\alpha _{v}=75\). For LDDMM and QCHR, we use the default values \(\alpha =400\) and \(\{\alpha =0.1,\beta =1\}\) as in the previous example. For DDemons, we test the parameters \(\{\sigma _{s},\sigma _{g}\}\) over the domain \([0.5,5]\times [0.5,5]\) and take the optimal choice \(\{\sigma _{s}=2,\sigma _{g}=5\}\). Hence, we would expect the hyperelastic model and DDemons to perform well.

The test results for Example 3 are presented in Table 3 and Fig. 7. Although all models except for the diffusion model produce diffeomorphic transformations, we can see visually that only 3 models (our New 2–3 and LDDMM) produce acceptable results, also confirmed by the table:

  • The badly deformed template generated by our New 1 shows that the model lacks robustness;

  • The hyperelastic model, though producing a diffeomorphic transform, fails (despite using an optimized parameter) because the model, which includes the regularization term \((\det (J_\mathbf{y })-1)^{4}/(\det (J_\mathbf{y }))^{2}\), tends to preserve area (see the short numerical sketch after this list). If we do not optimize the parameters for the hyperelastic model, our tests show that its results are even worse;

  • In the previous example, we have pointed out that QCHR needs more computing time and, from Table 3, we see that the time for QCHR is about 20 times as long as our New 3;

  • DDemons is trapped in a local minimum and its CPU time is also excessive (\(>5000\) s). We also tried to apply a multilevel strategy to DDemons, but for this example the results are not satisfactory. The Re_SSD, JSC and CPU time of our New 3 are all slightly better than those of the second-best LDDMM;

  • Both Tables 2 and 3 show that the diffusion model produces solutions with a negative Jacobian (folding), which may be viewed as non-physical; this model is included only for reference.

Hence, our model has advantages over the other models for large-deformation registration problems that do not require area preservation.
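The area-preservation bias noted in the list above can be made visible numerically: the hyperelastic volume penalty \((v-1)^{4}/v^{2}\), with \(v=\det (J_\mathbf{y })\), vanishes only at \(v=1\) and grows steeply on both sides, so large local area changes are heavily penalized. The small sketch below (our own illustration) evaluates this penalty on a few sample values.

```python
import numpy as np

# Evaluate the hyperelastic volume penalty phi(v) = (v - 1)^4 / v^2
# on a few values of v = det(J_y); the penalty is zero only at v = 1,
# so shrinking (v << 1) or expanding (v >> 1) an area is costly.
v = np.array([0.25, 0.5, 1.0, 2.0, 4.0, 8.0])
phi = (v - 1.0) ** 4 / v ** 2
for vi, pi in zip(v, phi):
    print(f"det(J_y) = {vi:5.2f}   penalty = {pi:10.4f}")
```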

Fig. 6 Example 2 relative residuals of New 3 and QCHR: the solid line indicates the relative residual of New 3 and the dotted line shows the relative residual of the second subproblem in QCHR. Over the same 50 iterations, the relative residual of New 3 decreases below \(10^{-2}\), whereas the relative residual of QCHR does not decrease and stays above 0.1. Hence, the convergence of the algorithm for QCHR cannot be guaranteed

Fig. 7 Example 3 results of a large Disc to a small letter C: the top row shows the template and the reference; the second and third rows show the deformed templates obtained by model (18) and by the other models. The landmarks in the template and reference are only used by QCHR. a Template T, b reference R, c T(y) by New 1, d T(y) by New 2, e T(y) by New 3, f T(y) by hyperelastic model, g T(y) by LDDMM, h T(y) by DDemons, i T(y) by QCHR with 20 pairs of landmarks, j T(y) by diffusion model

Table 3 Example 3—Comparison of the new model (New 1–3) with 5 other models

We now give two remarks comparing New 3 (or New 2) and QCHR. As remarked, QCHR regularizes the Beltrami coefficient only, and the landmarks supplied to QCHR can severely affect its results, while our model regularizes the deformation rather than the Beltrami coefficient. Both points are tested further below.

(i). On the first point, regularizing only the Beltrami coefficient leads to a smooth Beltrami coefficient, not necessarily a smooth deformation. To compare the smoothness of the solutions by New 3 and QCHR, we compute three smoothness measures, \(\Vert \nabla \mathbf u \Vert _{L^{2}}\), \(\Vert \mu (\mathbf y )\Vert _{L^{2}}\) and \(\Vert \nabla \mu (\mathbf y )\Vert _{L^{2}}\), and present them in Table 4. The table clearly indicates that QCHR does generate a smoother Beltrami coefficient than our model New 3 for both Examples 2–3, but not a smoother deformation field. Hence, a model which regularizes only the Beltrami coefficient rather than the deformation is not sufficient to produce an accurate deformed template.
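For reference, a minimal sketch of computing \(\mu (\mathbf y )\) and the discrete \(L^{2}\) norms is given below; the central-difference discretization and the grid-axis convention are our assumptions and may differ from those actually used for Table 4.

```python
import numpy as np

def beltrami_coefficient(y1, y2, h=1.0):
    """Beltrami coefficient mu = f_zbar / f_z of the planar map
    f = y1 + i*y2, with f_z = (f_x - i*f_y)/2 and f_zbar = (f_x + i*f_y)/2.

    Central differences are used and axis 1 is taken as the x-direction
    (assumed conventions).
    """
    f = y1 + 1j * y2
    f_ax0, f_ax1 = np.gradient(f, h)       # derivatives along grid axes
    f_z = 0.5 * (f_ax1 - 1j * f_ax0)
    f_zbar = 0.5 * (f_ax1 + 1j * f_ax0)
    return f_zbar / f_z

def l2_norm(field, h=1.0):
    """Discrete L2 norm, used for the smoothness measures of Table 4."""
    return np.sqrt(np.sum(np.abs(np.asarray(field)) ** 2) * h * h)
```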

Table 4 Comparison of smoothness measures for the solutions obtained by New 3 and QCHR. The Beltrami coefficient \(\mu \) obtained by QCHR is smoother than that of New 3, while the displacement \(\mathbf u \) obtained by New 3 is smoother than that of QCHR
Fig. 8 Tests of QCHR with different landmarks: Example 2 (row 1) and Example 3 (row 2). The left 3 columns of row 3 show the registered templates for row 1; the right 3 columns of row 3 show the registered templates for row 2. We can see that the accuracy of QCHR improves as the number of landmarks increases. a T with 4 landmarks, b T with 6 landmarks, c T with 16 landmarks, d R with 4 landmarks, e R with 6 landmarks, f R with 16 landmarks, g T with 4 landmarks, h T with 8 landmarks, i T with 20 landmarks, j R with 4 landmarks, k R with 8 landmarks, l R with 20 landmarks, m \(T(\mathbf{y })\) JSC 83.15%, n \(T(\mathbf{y })\) JSC 85.36%, o \(T(\mathbf{y })\) JSC 90.16%, p \(T(\mathbf y )\) JSC 54.14%, q \(T(\mathbf{y })\) JSC 65.78%, r \(T(\mathbf{y })\) JSC 84.24%

Fig. 9 Illustration of the Jacobian determinants of the transformations obtained by our New 3, QCHR and LDDMM for Example 2 (left two plots) and Example 3 (right two plots). Note that all values are positive (since these models are diffeomorphic) and New 3 has distributions similar to LDDMM, different from QCHR. a \(\det (J_\mathbf{y _{\phi _{3}}})\), b \(\det (J_\mathbf{y _{\mathrm {QCHR}}})\), c \(\det (J_\mathbf{y _{\phi _{3}}})\), d \(\det (J_\mathbf{y _{\mathrm {LDDMM}}})\)

Fig. 10 Example 4 registration results of a pair of CT images: the template T and the reference R are in the top row. The contours show the regions of interest. The second and third rows show the deformed templates obtained by the 8 models. The 5 landmarks in the template and the reference are only used by QCHR. a Template T, b reference R, c T(y) by New 1 JSC 94.2%, d T(y) by New 2 JSC 94.4%, e T(y) by New 3 JSC 95.3%, f T(y) by hyperelastic model JSC 93.5%, g T(y) by LDDMM JSC 93.8%, h T(y) by DDemons JSC 87.4%, i T(y) by QCHR with 5 pairs of landmarks JSC 85.7%, j T(y) by diffusion model JSC 93.7%

(ii). On the second point, we now illustrate the importance of landmarks for QCHR, although for other problems the model can yield good results without any landmarks. Figure 8 shows three sets with increasing numbers of landmarks for Examples 2–3. We observe that more landmarks lead to better results in terms of JSC values.

As a final comparison of New 3 with LDDMM and QCHR, Fig. 9 plots the magnitudes of the Jacobian determinants of their transformations. It can be seen that New 3 and LDDMM give a similar pattern but both are different from QCHR.

5.4 Example 4—Comparison of the New Model with Other Models

In the final test, we consider a pair of anonymized CT images of resolution \(512 \times 512\) from the Royal Liverpool University Hospital. Figure 10a, b shows the template and the reference; the template was taken in September 2016 and the reference in May 2016. We wish to compare the changes, over these 4 months, in our region of interest: an abdominal aortic aneurysm with stents inserted inside it (cross sections shown as two white 'circles' in the images of Fig. 10a, b). This region of interest is also used to compute JSC. The small white region at the top of the images helps us to identify the correct slice to compare.

Here, following the previous example, we use the same multilevel strategy: a seven-step multilevel strategy for our model, the hyperelastic model and the diffusion model, a five-step multilevel strategy for LDDMM and QCHR and a one-step multilevel strategy for DDemons.

Following our strategy for choosing the parameters of our model, we set \(\alpha =20\) and set \(\beta =100, 40, 75\) for New 1–3, respectively. For the diffusion model and LDDMM, we test \(\alpha \) in [100, 2000] and set the optimal values 1300 and 500, respectively. For the hyperelastic model, we set \(\{\alpha _{l}=20, \alpha _{s}=0, \alpha _{v}=50\}\). We use the default values \(\{\alpha =0.1,\beta =1\}\) for QCHR. For DDemons, we test the parameters \(\{\sigma _{s},\sigma _{g}\}\) over the domain \([0.5,5]\times [0.5,5]\) and choose \(\{\sigma _{s}=4,\sigma _{g}=4.5\}\).

With the optimized parameters, all the models in this example generate diffeomorphic transformations, as seen from Table 5. DDemons and QCHR are not as good as the other models for this example because they give worse Re_SSD and JSC values; a worse JSC means the regions of interest obtained by these two methods differ significantly from the reference (Fig. 10h, i). The diffusion model obtains a good JSC; however, its deformed template remains some way (overall) from the reference (since Re_SSD = 10.02%). The other 2 models (hyperelastic, LDDMM) generate good Re_SSD and JSC values. However, our models produce the lowest Re_SSD and the best JSC. Hence, for this example of real images, our model is competitive with the state-of-the-art methods. Though there is broad agreement between Re_SSD and JSC, one has to combine registration with segmentation models to ensure strict agreement.

Table 5 Example 4—Comparison of New 1–3 with 5 other models

Remark

According to the above four examples, our New 1 is not robust, while New 2–3 can both generate accurate and diffeomorphic transformations. We recommend New 3 as the first choice, because it requires the least computing time and gives the best quality, and New 2 as the second choice.

We also tested these four examples with the Dirichlet boundary condition. Similar results are obtained for Examples 1 and 4. However, for Examples 2 and 3, the transformations differ since the boundary behaviour is better modelled by the Neumann condition.

6 Conclusions

Controlling mesh folding is a key issue in image registration models to ensure local invertibility. Many existing models either impose no control on the underlying transformation beyond smoothness (and so may generate unrealistic or non-physical transformations) or impose a direct, often strongly biased (e.g. towards area or volume preservation), control on some explicit function of the measure \(\det (J_{\mathbf{y }})\). This paper introduces a novel, unbiased and robust regularizer, reformulated from the Beltrami coefficient framework, to ensure a diffeomorphic transformation. Moreover, we find that a direct approach (our New 1) from this Beltrami reformulation provides an alternative but less competitive method, while further refinements of this new regularizer (especially our New 3) give rise to models that are more robust than the existing methods. We highly recommend our model New 3, i.e. (18) with \(\phi =\phi _3\).

In designing optimization methods for solving the resulting highly nonlinear variational model, we give a suitable approximation of the exact Hessian matrix, which is necessary for deriving a convergent iterative method. Our test results show that the new model (New 1–3, especially New 3) is competitive with the state-of-the-art models; its main advantage lies in robustness. Our future work will include extensions to 3D problems, multi-modality models and the development of faster iterative solvers.