Graph-Based Slice-to-Volume Deformable Registration

Ferrante, Enzo; Paragios, Nikos

doi:10.1007/s11263-017-1040-8

Graph-Based Slice-to-Volume Deformable Registration

Published: 22 August 2017

Volume 126, pages 36–58, (2018)
Cite this article

Download PDF

International Journal of Computer Vision Aims and scope Submit manuscript

Graph-Based Slice-to-Volume Deformable Registration

Download PDF

1164 Accesses
4 Citations
Explore all metrics

Abstract

Deformable image registration is a fundamental problem in computer vision and medical image computing. In this paper we investigate the use of graphical models in the context of a particular type of image registration problem, known as slice-to-volume registration. We introduce a scalable, modular and flexible formulation that can accommodate low-rank and high order terms, that simultaneously selects the plane and estimates the in-plane deformation through a single shot optimization approach. The proposed framework is instantiated into different variants seeking either a compromise between computational efficiency (soft plane selection constraints and approximate definition of the data similarity terms through pair-wise components) or exact definition of the data terms and the constraints on the plane selection. Simulated and real-data in the context of ultrasound and magnetic resonance registration (where both framework instantiations as well as different optimization strategies are considered) demonstrate the potentials of our method.

Basics of magnetic resonance imaging and quantitative parameters T1, T2, T2*, T1rho and diffusion-weighted imaging

Article 15 April 2021

Suraj D. Serai

Iterative reconstruction: how it works, how to apply it

Article 11 October 2014

James Anthony Seibert

Statistical Shape Models: Understanding and Mastering Variation in Anatomy

1 Introduction

Slice-to-volume deformable registration is an important problem in the communities of computer vision and medical image computing, which has received considerable attention during the last decade. In general terms, it seeks to determine the slice (corresponding to an arbitrary plane) from a given target volume that corresponds to the deformed version of a source 2D image. This slice is generally specified by a rigid transformation $\hat{T}$. The source 2D image is deformed by a deformation field $\hat{D}$ towards improving the matching consistency between the deformed source image and the target slice.

Slice-to-volume registration is sometimes referred as 2D/3D registration, primarily due to dimension of the images involved in the registration process. Note that this term describes two different problems depending on the technology used to capture the 2D image: it might be a projective (e.g. X-ray) or sliced [e.g. ultrasound (US)] image. In this work we only focus on the latter case. Projective images have to be treated in a different way (basically a pixel in the 2D image does not correspond only to a voxel from the target volume, but to a projection of a set of them in certain perspective) and they are out of the scope of this paper. This is principally due to the fact that conventional image similarity terms cannot be used in the projective case. However, it should be noted that the proposed formulation with an appropriate definition of the matching and regularization cost could also accommodate a solution to this problem. We refer the reader to the comprehensive survey by Markelj et al. (2012) for further information about this topic.

1.1 Motivation

A broad number of medical image computing applications benefit from slice-to-volume registration. One can cite, for example, image guided surgeries and therapies (Fei et al. 2002), biopsies (Xu et al. 2014), tracking of particular organs (Gill et al. 2008) and minimally-invasive procedures (Liao et al. 2013; Huang et al. 2009). In such a context, slice-to-volume registration is a key element for bringing high resolution annotated data into the operating room. Generally, pre-operative 3D images such as computed tomography (CT) or magnetic resonance images (MRI) are acquired for diagnosis and manually annotated by expert physicians prior to the operation. During the procedure, 2D real time images are generated using different technologies (e.g. fluoroCT, US or interventional MRI slices). These intra-operative images refer to challenging acquisition constraints and inherit lower resolution and quality than the pre-operative ones. Moreover, tissue shift collapse as well as breathing and heart motion during the procedure, causes elastic deformation in the images. Non-rigid image registration is suitable to address this issue. The alignment of intra-operative images with pre-operative volumes augments the information that doctors have access to, and allows them to navigate the volumetric annotation while performing the operation.

Another interesting application is motion correction for image reconstruction. Here, the goal is to correct for misaligned slices when reconstructing a volume of a certain modality. A typical approach to solve it consists of mapping individual slices within a volume onto another reference volume in order to correct the inter-slice misalignment. The popular map-slice-to-volume (MSV) method that introduced this idea in the context of functional MRI (fMRI) was presented by Kim et al. (1999). More recently, applications of slice-to-volume registration to the same problem in different contexts like cardiac magnetic resonance (CMR) (Chandler et al. 2008), fetal images (Seshamani et al. 2013) and diffusion tensor imaging (DTI) (Jiang et al. 2009) have shown promising results.

Although the goals of the motivating problems we have described are different, all of them require to perform (to some extent) slice-to-volume registration. In this work, we focus on the applications where we need to navigate a pre-operative volume using intra-operative images. However, the method we present is modular enough to be adapted to different image modalities and settings, and therefore can be applied to any of these problems.

1.2 Previous Work

Several methods have been proposed during the recent years to deal with slice-to-volume registration. Some of them deal only with rigid registration, and therefore they cannot manage deformations due to tissue shift, breathing or heart motion. San José Estépar et al. (2009), for example, proposed a method to register endoscopic and laparoscopic ultrasound images with pre-operative computed tomography volumes that potentially could work in real time. It is based on a new phase correlation technique called LEPART and it handles rigid registration. Gill et al. (2008) tracks intra-operative MRI slices of prostate images with a pre-operative MRI volume. This monomodal registration is designed to provide patient tracking information for prostate biopsy performed under MR guidance, but is also constrained to rigid transformations. More recently, Eresen et al. (2014) proposed a method that uses smart phone as a navigation tool for initial slice alignment followed by an overlap invariant mutual information-based refinement that estimates the rigid transformation.

Other methods tackle the challenging problem of non-rigid slice-to-volume registration using nonlinear models. Among these, there is a sub-category of approaches that uses several slices instead of a single one, in order to improve the quality of the results. Some examples are Olesch et al. (2011) which uses a variational approach and Xu et al. (2014) who designed a two-step algorithm where initial rigid registration is followed by B-spline based deformable registration. Using several slices restricts the potential applications to the ones where more than one slice is available from the beginning. It also simplifies the problem by increasing the amount of available information. Our method performs slice-to-volume registration using a single input slice. Consequently, it can be adapted to a broader range of applications where just one slice is available at a time. We refer the reader to Ferrante and Paragios (2017) for a complete survey about alternative slice-to-volume registration methods proposed in the literature of medical image registration.

Most of the aforementioned slice-to-volume registration approaches, rely on continuous methods to model and perform parameter estimation. In this paper we extend our previous work presented in Ferrante and Paragios (2013), Ferrante et al. (2015b), Ferrante et al. (2015a) through the introduction of a single, mathematically rigorous and theoretically sound framework derived as a discrete labeling problem on a graphical model. Graphical models and discrete optimization are powerful formalisms that have been successfully used during the past years in the field of computer vision (Wang et al. 2013). In particular, rigid as well as non-rigid image registration have been formulated as a minimal cost graph problem where the nodes of the graph correspond to the deformation grid and the graph connectivity encodes regularization constraints. However, this technique has been applied mainly to mono-dimensional cases (2D–2D or 3D–3D). To the best of our knowledge, the only work that focuses on multi-dimensional image registration (apart of our previous articles that have been referenced at the beginning of this paragraph) using this type of techniques is Zikic et al. (2010). However, it estimates only rigid transformations and works with projective images.

Discrete methods have several advantages when compared with continuous approaches for slice-to-volume registration. First, discrete algorithms are inherently gradient-free, while most part of continuous methods require the objective function to be differentiable. Gradient-free methods do not require computation of the energy derivative. Therefore, it may be applied to any complex energy function (allowing the user to define its own similarity measures in case of registration problems). The only requirement is that this function must be evaluable in a variety of possible discrete labelings. Second, most part of the continuous methods are prone to be stuck in local minima when the functions are not convex. In case of discrete methods, even complicated functions could potentially be optimized using large neighbor search methods. The main limitation is the discretization of the continuous space; however, as suggested by Glocker (2010), ’the optimality is bounded by the discretization, but with intelligent refinement strategy the accuracy of continuous methods can be achieved’. Third, parallel architectures can be used to perform non-sequential tasks required by several discrete algorithms leading to more efficient implementations. Fourth, by using a discrete label space we can explicitly control its range and resolution (it can be useful to introduce prior information, as it will be shown in this work), while in continuous models it is not clear how this type of information can be used to constraint the solution. Last but not least, discrete frameworks such as discrete MRF provide a modular and principled way to combine prior knowledge with data likelihood (through the energy formulation), what makes it applicable to a wide range of vision tasks (Wang et al. 2013), particularly, to the challenging slice-to-volume registration problem.

1.3 Contribution

This article contributes to enrich the standard graph-based deformable registration theory by extending it to the case of slice-to-volume registration. We present three different models to solve this challenging problem which vary in terms of graph topology, label space definition and energy construction. Our aim is to demonstrate how flexible and powerful the graph theory is in terms of expressive potential of the modeling process, while solving a new problem using graphical models. We analyze the strong and weak points of every model and we perform comparative experiments. Validation is done using a monomodal MRI cardiac dataset and a multimodal brain dataset (Mercier et al. 2012) including different inference methods.

2 Graph-Based Slice-to-Volume Deformable Registration

An enormous variety of tasks in computer vision and medical image analysis can be expressed as discrete labeling problems (Paragios et al. 2016). Low, mid and high-level vision tasks can be addressed within this framework. To this end, a visual perception task is addressed by specifying a task-specific parametric model, associating it to the available observations (images) through an objective function and optimizing the model parameters given both, the objective and the observations (Paragios and Komodakis 2014).

In the context of graph-based discrete labeling problems, the model is composed by a graph $\mathcal {G = \langle V, E \rangle }$ where vertices in $\mathcal {V}$ correspond to the variables while $\mathcal {E}$ is a neighborhood system (pair-wise & higher order cliques) that encodes the relationships among these variables. We also consider a discretized version of the search space that is represented by a discrete set of labels $l \in L$. The aim is to assign to every variable $v \in \mathcal {V}$ a label $l_v \in L$. Each time we choose to assign a label, say, $l_{v_1}$ to a variable $v_1$, we are forced to pay a price according to the so-called energy function. This objective function is domain-specific and associates the observations to the model. It is formulated as the sum of singleton terms $g_v(l_v)$ (which depend only on one label $l_v$), pairwise terms $f_{v_1v_2}(l_{v_1}, l_{v_2})$ (which depend on two variables $l_{v_1}, l_{v_2}$) and high-order terms $f_{v_1 \ldots v_n}(l_{v_{i^1}},\ldots ,l_{v_i^{\mid C_i \mid }})$ (which are associated to high-order cliques $C_i$ that depend on more than two variables). Our goal is then to choose a labeling which will allow us to recover the solution corresponding to the minimal value of the objective function. In other words, we want to choose a labeling that minimizes the sum of all the energy potentials, or equivalently the energy $\mathcal {P}(g,f)$. This amounts to solving the following optimization problem:

$$\begin{aligned} \begin{aligned} \mathop {\hbox {argmin}}\limits _{l_p} \mathcal {P}(g,f)&= \sum _{v\in \mathcal {V}}g_p(l_v) + \sum _{(v_1,v_2) \in \mathcal {E}} f_{v_1v_2}(l_{v_1},l_{v_2}) \\&\quad + \sum _{C_i \in \mathcal {E}} f_{v_1 \ldots v_n}(l_{v_{i^1}},\ldots ,l_{v_i^{\mid C_i \mid }}), \end{aligned} \end{aligned}$$

(1)

Performing parameter inference on this graphical model, could be an effective solution to a big variety of problems in computational medicine. Note that we make a distinction between singleton, pairwise and high-order terms, depending on the number of variables jointly interacting. It should be noted that most part of the graph-based vision models have explored mainly pairwise constraints (pairwise Conditional and Markov Random Field (CRF/MRF) models), because in these cases exact or approximate efficient inference of Maximum a Posteriori (MAP) solutions can be done. However, during the last few years, more and more high-order models and inference algorithms have been developed which offer higher modeling power and can lead to more accurate solutions of the problems (Kohli and Rother 2012; Komodakis et al. 2011). Given such a general setting, let us now try to explore the expressive power of such models in the context of slice-to-volume deformable registration.

The task of slice-to-volume deformable registration can be expressed mathematically as follows. Given a 2D source image I and a 3D target volume J, we seek the 2D–2D in-plane local deformation field $\hat{T}_D$ and the plane $\hat{\pi }[J]$ (i.e. a bi-dimensional slice from the volume J) which in the most general case minimize the following objective function:

$$\begin{aligned} \hat{T}_D, \hat{\pi } = \mathop {\hbox {argmin}}\limits _{T_D, \pi } \mathcal {M}(I \circ T_D(\varvec{x}), \pi [J](\varvec{x})) + \mathcal {R}(T_D, \pi ), \end{aligned}$$

(2)

where $\mathcal {M}$ represents the data similarity term and $\mathcal {R}$ the regularization term. The data term $\mathcal {M}$ measures the matching quality between the deformed 2D source image and the corresponding 3D slice. The regularization term $\mathcal {R}$ imposes certain constraints on the solution that can be used to render the problem well posed. It also imposes certain expected geometric properties on the extended (plane selection and plane deformation) deformation field. The plane $\hat{\pi }$, that minimizes the equation, indicates the location of the 3D volume slice that best matches the deformed source image. The deformation field $\hat{T}_D$ represents the in-plane deformations that must be applied to the source image in order to minimize the energy function.

The fundamental idea behind our approaches is quite intuitive: we aim at deforming a planar 2D grid in the 3D space, which encodes both the deformation field $\hat{T}_D$ and the plane $\hat{\pi }$ at the same time. This grid is super-imposed to the 2D source image and consists of control points that jointly represent the in-plane deformation and the current position of the 2D image into the 3D volume. The source image is positioned within the volume by applying different displacement vectors with respect to the control points of the superimposed grid. These displacements are chosen such that a given energy (see Eq. 2) is minimized to best fit the matching criterion $\mathcal {M}$. Since they can be moved without any restriction, geometric constraints are imposed through the regularization term $\mathcal {R}$ in order to keep a smooth deformation field and a planar grid. Given that we impose a soft planar constraint, the resulting grid is approximately planar. Therefore, we reconstruct the final solution by projecting all the points into a regression plane which is estimated out of the current position of the points. The rigid transformation that indicates the position of the regression plane is considered as $\hat{\pi }$. Finally, the projected grid is interpreted as a 2D Free Form Deformation model (FFD) (Rueckert et al. 1999) where each control point has local influence on the deformation and is used to approximate the dense deformation field $\hat{T}_D$ (other control point interpolation models could be used as well). Alternatively, depending on the application, one may prefer to deform the sliced image $\pi [J]$ instead of the source image I. Note that this can be done by simply using the inverse of the deformation field $T_D$. To guarantee the existence of the inverse, we can restrict the generated deformation fields to be diffeomorphic. This can be easily guaranteed in our framework by restricting the displacements size to 0.4 times the size of the current grid, as indicated in Glocker et al. (2011). Figure 1 illustrates the complete workflow described in this paragraph.

In this work, we restrict the geometry of the final solution to in-plane deformations only (i.e. 2D deformations acting only in the plane $\pi $). As explained in the previous paragraph, we do that by projecting the final position of the control points into a regression plane estimated out of the current position of those points. We follow this strategy since we found that it improves the stability of the method by restricting the solution space to only 2D deformation fields. However, considering out-of-plane deformations in the proposed framework would only require sidestepping the control points projection step. In our current formulation, since we allow the control points to move freely within the 3D space, the grid is actually deformed in 3D. Actually, the regularization terms imposing plane consistency are soft constraints which can be violated if the data term indicates large matching values. Indeed, they are commonly violated; otherwise we would not require a projection step. Consequently, avoiding the step where we project every control point to the regression plane and interpreting the deformed 2D grid as a 3D deformation field, would be enough to incorporate out-of-plane deformations if required.

This general formulation can be expressed through different discrete labeling problems on a graph by changing its topology, the label space definition and the energy terms. As we mentioned, in this work we propose three different approaches to derive slice-to-volume registration as a discrete graph labeling problem. First, we propose the so-called overparameterized method, which combines linear and deformable parameters within a coupled formulation on a 5-dimensional label space (Ferrante and Paragios 2013). The main advantage of such a model is the simplicity provided by its pairwise structure, while the main disadvantage is the dimensionality of the label space which makes inference computationally inefficient and approximate (limited sampling of search space). Motivated by the work of Shekhovtsov et al. (2008), we present a decoupled model where linear and deformable parameters are separated into two interconnected subgraphs which refer to lower dimensional label spaces (Ferrante et al. 2015b). It allows us to reduce the dimensionality of the label space by increasing the number of edges and vertices, while keeping a pairwise graph. Finally, in the high-order approach (Ferrante et al. 2015a), we achieve this dimensionality reduction by augmenting the order of the graphical model, using third-order cliques which exploits the expression power of this type of variable interactions. Such a model provides better satisfaction of the global deformation constraints at the expense of quite challenging inference.

2.1 Overparameterized Approach

Let us consider an undirected pair-wise graph $G_O=\langle V,E \rangle $ super-imposed to the 2D image domain with a set of nodes V and a set of cliques E. The nodes V (a regular lattice) are interpreted as control points of the bi-dimensional quasi-planar grid that we defined in the previous section. The set of edges E is formed by regular 4-neighbors grid connections and some extra edges introduced to improve the propagation of the geometrical constraints (see Fig. 2a). The vertices $v_i \in V$ are moved by assigning them different labels $u_i \in L$ (where L corresponds to the label space) until an optimal position is found.

In order to deform the graph, we need to define a label space able to describe the inplane deformations and the plane selection variables. To this end, we consider a label space L that consists of 5-tuples $\varvec{l} = (d_x, d_y, d_z, \phi , \theta )$, where the first three parameters $(d_x,d_y,d_z)$ define a displacement vector $\varvec{d_i}$ in the cartesian coordinate system (see Fig. 2b), and the angles $(\phi , \theta )$ define a vector $\varvec{N_i}$ on a unit sphere, expressed using spherical coordinates (see Fig. 2c). Let us say we have a control point $\varvec{p_i} =(p_{xi}, p_{yi}, p_{zi})$ and we assign the label $\varvec{l_i}=(d_{xi}, d_{yi}, d_{zi}, \phi _i, \theta _i)$ to this point. So, the new point position $\varvec{p'_i}$ after assigning the label is calculated using the displacement vector as given by the following equation:

$$\begin{aligned} \varvec{p'_i} =(\varvec{p_{xi}} + \varvec{d_{xi}}, \varvec{p_{yi}} + \varvec{d_{yi}}, \varvec{p_{zi}} + \varvec{d_{zi}}). \end{aligned}$$

(3)

Additionally, we define a plane $\pi _i$ containing the displaced control point $\varvec{p'_i}$ and whose unit normal vector (expressed in spherical coordinates and with constant radius $r=1$) is $\varvec{N_i} = (\phi _i, \theta _i)$. One of the most important constraints to be considered is that our transformed graph should have a quasi-planar structure, i.e. it should be similar to a plane; the plane $\pi _i$ associated with every control point $\varvec{p_i}$ is used by the energy term to take into account this constraint. Figure 2.d shows how to interpret the labels for two given points $\varvec{p_i}$ and $\varvec{p_j}$.

The energy to be optimized is formed by data terms $G=\{g_i(\cdot )\}$ (or unary potentials) associated with each graph vertex and regularization terms $F=\{f_{ij}(\cdot , \cdot )\}$ (or pairwise potentials) associated with the edges. As we described in Sect. 2, the first ones are typically used for encoding some sort of data likelihood, whereas the later ones act as regularizers and thus play an important role in obtaining high-quality results (Glocker et al. 2011). The minimization energy problem for the overparameterized formulaton is thus defined as:

$$\begin{aligned} \mathcal {P_O}(G, F) = \min \sum _{i \in V} g_i(\varvec{l_i}) + \gamma \sum _{(i,j) \in E} f_{ij}(\varvec{l_i}, \varvec{l_j}), \end{aligned}$$

(4)

where $l_i, l_j \in L$ are the labels assigned to the vertices $v_i, v_j \in V$ respectively.

The formulation of the unary potentials that we propose is independent of the similarity measure. It is calculated for each control point given any intensity based metric $\delta $ capable of measuring the similarity between two bi-dimensional images (e.g sum of absolute differences, mutual information, normalized cross correlation). This calculation is done for each control point $\varvec{p_i}$, using its associated plane $\pi _i$ in the target image J and the source 2D image I. An oriented patch ${\varOmega }_i$ over the plane $\pi _i$ (centered at $\varvec{p_i}$) is extracted from the volume J, so that the metric $\delta $ can be calculated between that patch and the corresponding area from the source 2D image (see Fig. 3). Please note that this patch will be sampled from the 3D image, given the current position of the control point $\mathbf {p_i}$. Since a single point is not enough to define a unique patch, we refer to the “patch ${\varOmega }_i$ over the plane $\pi _k$” to stress the fact that this patch will be sampled from the area surrounding the point $\mathbf {p_i}$, only considering those points living in the plane $\pi _i$ defined by the normal vector $\varvec{N_i}$. The unary potential is then defined as:

$$\begin{aligned} g_i(\varvec{l_i}) = \int _{{\varOmega }_i} \delta ( I {\circ T_D} (\varvec{x}), \pi _i[J](\varvec{x}) ) d\varvec{x}. \end{aligned}$$

(5)

One of the simplest and commonly used similarity measures is the Sum of Absolute Differences (SAD) of the pixel intensity values. It is useful in the monomodal scenario, where two images of the same modality are compared and, therefore, the grey intensity level itself is discriminant enough to determine how related are the two images. Its formulation in our framework is:

$$\begin{aligned} g_{ SAD ,i}(\varvec{l_i}) = \int _{{\varOmega }_i} \mid I {\circ T_D} (\varvec{x}) - \pi _i[J](\varvec{x}) \mid d\varvec{x}. \end{aligned}$$

(6)

In multimodal scenarios, where different modalities are compared (e.g. CT with Ultrasound images), statistical similarity measures such as Mutual Information (MI) are generally used since we can not assume that corresponding objects have the same intensities in the two images. MI is defined using the joint intensity distribution p(i, j) and the marginal intensity distribution p(i) and p(j) of the images as:

$$\begin{aligned} g_{ MI ,i}(\varvec{l_i}) = - \int _{{\varOmega }_i} \log \frac{p(I {\circ T_D} (\varvec{x}), \pi _i[J](\varvec{x}))}{p(I {\circ T_D} (\varvec{x})) p(\pi _i[J](\varvec{x})))} d\varvec{x}. \end{aligned}$$

(7)

As we can see in the previous examples, our framework can encode any local similarity measure defined over two two-dimensional images. Please note that by local similarity measure we stress the fact that the metric is computed locally around the control point, as opposed to global similarity measures which are computed using the complete image.

Let us now proceed with the definition of the regularization term. Generally, these terms are used to impose smoothness on the displacement field. In our formulation, the pairwise potentials are defined using a linear combination of two terms: the first ($F_1$) controls the grid deformation assuming that it is a plane, whereas the second ($F_2$) maintains the plane structure of the mesh. They are weighted by a coefficient $\alpha $ as indicates the following equation:

$$\begin{aligned} f_{ij}(\varvec{l_i}, \varvec{l_j}) = \alpha {F_1}_{i,j}(\varvec{l_i}, \varvec{l_j}) + (1 - \alpha ) {F_2}_{i,j}(\varvec{l_i}, \varvec{l_j}). \end{aligned}$$

(8)

The in-plane deformation is controlled using a distance preserving approach: it tries to preserve the original distance between the control points of the grid. Since this metric is based on the euclidean distance between the points, it assumes that they are coplanar. We use a distance based on the ratio between the current position of the control points $\varvec{p_i}, \varvec{p_j}$ and their original position $\varvec{p_{o,i}}, \varvec{p_{o,j}}$:

$$\begin{aligned} \psi _{i,j}(\varvec{d_i}, \varvec{d_j}) = \frac{\mid \mid (\varvec{p_i} + \varvec{d_i}) - (\varvec{p_j} + \varvec{d_j}) \mid \mid }{\mid \mid (\varvec{p_{o,i}}) - (\varvec{p_{o,j}}) \mid \mid }. \end{aligned}$$

(9)

Once we have defined $\psi _{ij}$, the regularizer should fulfill two conditions: (i) it has to be symmetric with respect to the displacement of the points, i.e. it must penalize equally whenever the control points are closer or more distant; (ii) the energy has to be zero when the points are preserving distances and monotonically increasing with respect to the violation of the constraint. The following regularization term fulfills both conditions for a couple of nodes $i,j \in V$ labeled with labels $\varvec{l_i},\varvec{l_j}$:

$$\begin{aligned} {F_1}_{i,j}(\varvec{l_i}, \varvec{l_j}) = (1-\psi _{i,j}(\varvec{d_i}, \varvec{d_j}))^2 + (1-\psi _{i,j}(\varvec{d_i}, \varvec{d_j})^{-1})^2, \end{aligned}$$

(10)

The plane preservation term is based on the average distance between a given control point and the plane defined from the neighboring ones (see Fig. 4b). The aim is to maintain the quasi-planar structure of the grid. Given that the distance between a point and a plane is zero when the point lies on the plane, this term will be minimum when the control points for which we are calculating the pairwise potential are on the same plane.

The distance between a point $\varvec{p} = (p_x, p_y, p_z)$ and a plane $\pi $ defined by the normal vector $\varvec{N}=(n_x, n_y, n_z)$ and the point $\varvec{q}=(q_x, q_y, q_z)$ is calculated as:

$$\begin{aligned} D_\pi (\varvec{p}) = \frac{\mid n_x (p_x-q_x) + n_y (p_y-q_y) + n_z (p_z-q_z) \mid }{\sqrt{n_x^2 + n_y^2 + n_z^2}}. \end{aligned}$$

(11)

$F_2$ is defined using this distance (Equation 11) and corresponds to the average of $D_{\pi _j}(\varvec{p_i}+\varvec{d_i})$ and $D_{\pi _i}(\varvec{p_j}+\varvec{d_j})$:

$$\begin{aligned} {F_2}_{i,j}(\varvec{l_i}, \varvec{l_j}) = \frac{1}{2} (D_{\pi _j}(\varvec{p_i}+\varvec{d_i}) + D_{\pi _i}(\varvec{p_j}+\varvec{d_j})). \end{aligned}$$

(12)

Recall that normal vectors in our label space are expressed using spherical coordinates with a fixed radius $r=1$ (unit sphere). However, the formulation that we presented uses cartesian coordinates. Therefore, the mapping from one space to another is done as follows:

$$\begin{aligned} x = r \sin (\theta ) \cos (\phi ), y = s \sin (\theta ) \sin (\phi ), z = r \cos (\theta ). \end{aligned}$$

(13)

Note that such pairwise terms are non submodular since we include the current position of the points (which can be arbitrary) in their formulation and therefore the submodularity constraint is not fulfilled. In this context, even if there is no energy bounding that guarantees certain quality for the solution of the optimization problem, good empirical solutions are feasible since we are in a pairwise scenario. Still, two issues do arise: (i) high dimensionality of the label space and consequently high computational cost, (ii) insufficient sampling of the search space and therefore suboptimal solutions. In order to address these issues while maintaining the pairwise nature of the methods, we propose the decoupled method inspired by Shekhovtsov et al. (2008). We consider decoupling the label space into two different ones and redefining the topology of the graph, so that we can still capture rigid plane displacements and in-plane deformation.

2.2 Decoupled Approach

We propose to overcome the limitations of the overparameterized method by decoupling every node of the previous approach into two different ones: one modeling the in-plane deformation and another the position of the plane. This is somewhat analogous to creating two separated graphs of the same size and topology corresponding to different random variables and label spaces. Once spaces have been decoupled, different sampling strategies can be used for them. Another advantage of this approach is that we can define distinct regularization terms for edges connecting deformation nodes or plane position nodes. It allows to regularize in a different way the deformation and the plane position, imposing alternative geometrical constraints for every case.

Since data term computation requires the exact location of the node, both position and deformation labels are necessary. Both graphs can thus be connected through a pairwise edge between every pair of corresponding nodes. Therefore, new pairwise potentials are associated with these edges in order to encode the matching measure.

Formally, the decoupled formulation consists of an undirected pair-wise graph $G_D=\langle V,E \rangle $ with a set of nodes $V = V_I \cup V_P$ and a set of cliques $E=E_I \cup E_P \cup E_D$. $V_I$ and $V_P$ have the same cardinality and 4-neighbor grid structure. Nodes in $V_I$ are labeled with labels that model in-plane deformation, while labels used in $V_P$ model the plane position. Edges from $E_I$ and $E_P$ correspond to classical grid connections for nodes in $V_I$ and $V_P$ respectively; they are associated with regularization terms. Edges in $E_D$ link every node from $V_I$ with its corresponding node from $V_P$, creating a graph with a three dimensional structure; those terms encode the matching similarity measure. Note that $E_I$ and $E_P$ can be extended with the same type of extra edges defined in Sect. 2.1 (see Fig. 2a) to improve the satisfaction of the desired geometrical constraints.

We define two different label spaces, one associated with $V_I$ and one associated with $V_P$. The first label space, $L_I$, is a bidimensional space that models in-plane deformation using displacement vectors $\varvec{l^I} = (d_{x}, d_{y})$. The second label space, $L_P$, indicates the plane in which the corresponding control point is located and consists of labels $\varvec{l^P}$ representing different planes. In order to specify the plane and the orientation of the grid on it, we consider an orthonormal basis acting on a reference point in this plane. Using this information, we can reconstruct the position of the control points of the grid. The planes parametrization is given by $\varvec{l^P}=(\phi , \theta , \lambda )$, where angles $\phi $ and $\theta $ define a vector $\varvec{N}$ over a unit sphere, expressed through its spherical coordinates (see Fig. 2c). This value, together with parameter $\lambda $, defines the position of the plane associated with the given control point. This is an important advantage of our method: we could use prior knowledge to improve the way we explore the plane space, just by changing the plane space sampling method.

As it concerns the considered plane sampling method, the final position of every control point $\varvec{p_k}$ of the grid is determined using the pairwise term between two graph nodes ($v^I_k \in V_I$ and $v^P_k \in V_P$) and their respective labels ($l^I_k \in L_I$ and $l^P_k \in L_P$). Imagine we have a plane $\pi _k$ with normal vector $\varvec{N}$ that contains the displaced control point $\varvec{p_k} + \varvec{l^I_k}$. Parameter $\lambda $ indicates the magnitude of the translation we apply to $\pi _k$ in the direction given by $\varvec{N}$ in order to determine the plane’s final position (see Fig. 5 for a complete explanation). Given that we can associate different planes to different control points (by assigning them different labels $\varvec{l^P}$), we need to impose constraints that will force the final solution to refer to a unique plane.

The energy that guides the optimization process involves three different pairwise terms, which encode data consistency between the source and the target, smoothness of the deformation and unique plane selection:

$$\begin{aligned} \begin{aligned} \mathcal {P_D}(I, P, D)&= \min \alpha \sum _{(i,j) \in E_I} e^I_{i,j}(\varvec{l^I_{i}, l^I_{j}}) \\&\quad +\beta \sum _{(i,j) \in E_P} e^P_{i,j}(\varvec{l^P_{i}, l^P_{j}})\\&\quad + \sum _{(i,j) \in E_D} e^D_{i,j}(\varvec{l^I_{i}, l^P_{j}}), \end{aligned} \end{aligned}$$

(14)

where $\alpha , \beta $ are scaling factors, $e^I_{i,j} \in I$ are in-plane deformation regularizers (associated to edges in $E^I$), $e^P_{i,j} \in P$ are plane consistency constraints (associated with edges in $E^P$) and $e^D_{i,j} \in D$ are data terms (associated with edges in $E^D$). $l^I_{i}$, $l^P_{i}$ are labels from label spaces $L_I$ and $L_P$ respectively.

The data term is defined for every control point of the imaginary grid $\varvec{p_k}$ using the information provided by two associated graph nodes. It is encoded in the pairwise term $e^D \in E^D$. To this end, we extract an oriented patch ${\varOmega }_k$ over the plane $\pi _k$ (centered at $\varvec{p_k}$) from the volume J, so that the similarity measure $\delta $ can be calculated between that patch and the corresponding area over the source 2D image (see Fig. 5):

$$\begin{aligned} e^D_{i,j}(\varvec{l^I_{i}, l^P_{j}}) = \int _{{\varOmega }_k} \delta ( I {\circ T_D} (\varvec{x}), \pi _k[J](\varvec{x}) ) d\varvec{x}. \end{aligned}$$

(15)

We define two different regularization terms. The first controls the in-plane deformation; it is defined on $V_I$ and corresponds to a symmetric distance preserving penalty:

$$\begin{aligned} e^{I}_{i,j}(\varvec{l^I_{i}, l^I_{j}}) = (1-\psi _{i,j}(\varvec{l^I_i}, \varvec{l^I_j}))^2 + (1-\psi _{i,j}(\varvec{l^I_i}, \varvec{l^I_j})^{-1})^2, \end{aligned}$$

(16)

where $\psi _{i,j}$ is the distance defined in Equation 9.

The second term penalizes inconsistencies in terms of plane selection, and is defined on $V_P$. We use the earlier defined (at is concerns the overparameterized model, in Equation 12) point-to-plane distance:

$$\begin{aligned} e^P_{i,j}(\varvec{l^P_i}, \varvec{l^P_j}) = \frac{1}{2} (D_{\pi _j}(\varvec{p_i}') + D_{\pi _i}(\varvec{p_j}')). \end{aligned}$$

(17)

where $\varvec{p_i}'$ and $\varvec{p_j}'$ are the positions after applying label $\varvec{l^P_i}$, $\varvec{l^P_j}$ to $\varvec{p_i}$, $\varvec{p_j}$ respectively.

Note that these terms are similar to the ones of the former approach. However, there is an important difference regarding the parameters they use. In case of the overparameterized approach, parameters are always 5-dimensional labels. In the current approach, parameters are at most 3-dimensional, thus reducing the complexity of the optimization process while also allowing a denser sampling of the solution space. Conventional pairwise inference algorithms could be used to optimize the objective function corresponding to the previously defined decoupled model. Such a model offers a good compromise between expression power and computational efficiency. However, the pairwise nature of such an approach introduces limited expression power in terms of energy potentials. The smoothness (regularization) terms with second order cliques are not invariant to linear transformations such as rotation and scaling (Glocker et al. 2009), while being approximate in the sense that plane consistency is imposed in a rather soft manner. These concerns could be partially addressed through a higher order formulation acting directly on the displacements of the 2D grid with 3D deformation labels. Furthermore, the data term is just a local approximation of the real matching score between the deformed source 2D image and the corresponding target plane; by introducing high-order terms we could define it more accurately.

2.3 High-Order Approach

The new formulation consists of an undirected graph $G_H=\langle V,E \rangle $ with a set of nodes V and a set of third-order potentials $E=E_D \cup E_R$. The nodes are control points of our two-dimensional quasi-planar grid and they are displaced using 3D vectors $l_i \in L_H$. We define two types of cliques in E. Cliques in $E_D$ are triplets of vertices with a triangular shape and they are associated with data terms. Cliques in $E_R$ are collinear triplets of vertices (aligned in horizontal and vertical directions) forming third-order cliques associated with regularization terms.

Unlike the previous methods, which require extra labels to explicitly model the plane selection, high-order potentials explicitly encode them. Furthermore, third-order triangular cliques can also explicitly encode data terms, since the corresponding plane can be precisely determined using the position of these 3 vertices. We use triplets of collinear points for regularization terms. According to Kwon et al. (2008), this allows us to encode a smoothness prior based on the discrete approximation of the second-order derivatives using only the vertices’ position. Therefore, we define a simple three dimensional label space of displacement vectors which is sampled as shown in Fig. 2b.

The energy to be minimized consists of data terms $D_{ijk}$ associated with triangular triplets of graph vertices $(i,j,k) \in E_D$ and regularization terms $R_{ijk}$ associated with collinear horizontal and vertical triplets $(i,j,k) \in E_R$. The minimization energy problem becomes:

$$\begin{aligned} \begin{aligned}&\mathcal {P_H}(D, R) = \min \sum _{(i,j,k) \in E_D}D_{ijk}(\varvec{l_i, l_j, l_k}) \\&\quad +\gamma \sum _{(i,j,k) \in E_R}R_{ijk}(\varvec{l_i,l_j,l_k}), \end{aligned} \end{aligned}$$

(18)

where $\gamma $ is a scaling factor and $\varvec{l_i}$ is a label associated with a displacement vector $(d_x, d_y, d_z)$ and assigned to the node i.

The data term is defined over a disjoint set of triangular cliques, covering the entire 2D domain, as shown in Fig. 6a. Its formulation is independent of the similarity measure $\delta $ and it is calculated for each clique $\varvec{c} = (i,j,k) \in E_D$ using the source 2D image I and the corresponding plane $\pi _d[J]$ extracted from the target volume J, defined by the three control points of the clique. For a given similarity measure $\delta $, the data term associated with the clique $\varvec{c}$ is thus defined as:

$$\begin{aligned} D_{ijk}(\varvec{l_i, l_j, l_k}) = \int _{{\varOmega }_{(l_i, l_j, l_k)}} \delta ( I {\circ T_D} (\varvec{x}), \pi _d[J](\varvec{x}) ) d\varvec{x}, \end{aligned}$$

(19)

where $\varvec{x} \in {\varOmega }_{(l_i, l_j, l_k)}$, and ${\varOmega }_{(l_i, l_j, l_k)}$ corresponds to the triangular area defined by the control points of clique $\varvec{c}=(i,j,k)$ over the plane $\pi _d[J]$, after applying the corresponding labels $\varvec{l_i, l_j, l_k}$ to the vertices.

Smoothness and plane consistency are also imposed using higher order cliques. We define a clique for every set of three collinear and contiguous grid nodes (in horizontal and vertical directions as depicts Fig. 6b). We also introduce extra cliques formed by nodes that are collinear but not contiguous. The aim is to propagate the regularization so that the planar structure is conserved. The regularization term, as noted previously, seeks to satisfy the plane structure of the grid and the smoothness nature of the in-plane deformations.

Planar consistency can be easily enforced by propagating a null second-derivative constraint among collinear triplets of points. In fact, a null second-derivative for these cliques does not impose just a planarity constraint but it also aims at regularizing the grid structure. Thanks to the third-order cliques, we can accurately approximate a discrete version of the second-order derivative (Kwon et al. 2008). Given three contiguous control points $(\varvec{p_i}, \varvec{p_j}, \varvec{p_k})$ and their corresponding displacement labels $(\varvec{l_i}, \varvec{l_j}, \varvec{l_k})$, it can be approximated as follows: $ \mid \mid (\varvec{p_i} + \varvec{l_i}) + (\varvec{p_k} + \varvec{l_k}) - 2 \cdot (\varvec{p_j} + \varvec{l_j}) \mid \mid $.

Based on this idea, we define the following energy term that is proportional to the second derivative, and normalized with the original distance between the control points, d:

$$\begin{aligned} R^A_{ijk}(\varvec{\varvec{l_i}, \varvec{l_j}, \varvec{l_k}}) = \frac{\mid \mid (\varvec{p_i} + \varvec{l_i}) + (\varvec{p_k} + \varvec{l_k}) - 2 \cdot (\varvec{p_j} + \varvec{l_j}) \mid \mid }{d^2}^2, \end{aligned}$$

(20)

In-plane deformation smoothness is reinforced in the same manner as the previous models—through a symmetric distance preserving approach. For the sake of clarity, we redefine Equation 10 as ${\varPsi }_{ij}(\varvec{l_i}, \varvec{l_j}) = (1-\psi _{i,j}(\varvec{l_i}, \varvec{l_j}))^2 + (1-\psi _{i,j}(\varvec{l_i}, \varvec{l_j})^{-1})^2$, and we apply it to both pairs of contiguous points that form the clique (i, j, k):

$$\begin{aligned} R^B_{ijk}(\varvec{l_i}, \varvec{l_j}, \varvec{l_k}) = \frac{{\varPsi }_{ij}(\varvec{l_i}, \varvec{l_j}) + {\varPsi }_{jk}(\varvec{l_j}, \varvec{l_k})}{2}. \end{aligned}$$

(21)

The equation that regularizes the grid is a weighted combination of both terms $R^A_{ijk}$ and $R^B_{ijk}$:

$$\begin{aligned} R_{ijk}(\varvec{l_i}, \varvec{l_j}, \varvec{l_k}) = (1-\alpha ) R^A_{ijk}(\varvec{l_i}, \varvec{l_j}, \varvec{l_k}) + \alpha R^B_{ijk}(\varvec{l_i}, \varvec{l_j}, \varvec{l_k}), \end{aligned}$$

(22)

where $\alpha $ represents a weighting factor used to calibrate the regularization term.

3 Results and Discussion

Let us now proceed with a systematic evaluation of the proposed methods. One of the main aspects shared across methods is the inference algorithms used to produce the desired solution.

3.1 Inference Methods

Depending on their cardinality and regularity, objective functions can be optimized using a variety of discrete optimization algorithms which offer different guaranties. It must be noted that the regularization terms presented in our three models are non submodular, since we include the current position of the points (which can be arbitrary) in their formulation. Therefore, submodularity constraint is fulfilled neither in the pairwise nor in the high-order terms (for a clear definition of submodularity in pairwise and high-order energies, we refer the reader to the work of Ramalingam et al. (2008)).

In Ferrante and Paragios (2013), the overparameterized approach was optimized using the FastPD algorithm (Komodakis et al. 2007) while for the decoupled (Ferrante et al. 2015b) and the higher order models (Ferrante et al. 2015a), we consider loopy belief propagation networks. For the sake of fairness, in order to improve the confidence of the comparison among the three methods, in this work we adapted it to be optimized with the same algorithms. Therefore, results in this work can not be directly compared with our previous works.

Given the variety of models presented in this work, we chose two different inference methods that can deal with arbitrary graph topologies and clique orders, coming from two standard inference algorithm classes: (i) Loopy Belief Propagation (LBP), a well known message passing algorithm that has been extensively used in the literature; and (ii) the Lazy Flipper (LF) by Andres et al. (2012), a move-making algorithm which is a generalization of the classical Iterated Conditional Modes (ICM) (Besag 1986) and has provided good approximations for several non-submodular models in different benchmarks. Both are approximate inference methods that can accommodate arbitrary energy functions, graph topologies and label spaces, and allow us to show how the three proposed approaches perform under different optimization strategies.

3.1.1 Loopy Belief Propagation

LBP estimates a solution by iteratively passing local messages around the variables of the random field. These messages $m_{ij}$ (sent from a node i to a node j) are actually vectors of size $\mid L \mid $ (cardinality of the label space), where every scalar entry represents what node i thinks about assigning label l to the node j. Once a node i receives all the messages from its neighbors, it compute its beliefs (also vectors of size $\mid L \mid $) in a label $l_i$. The messages are iteratively passed from one node to its neighbors until no change occurs from one iteration to the next one. When convergence is achieved, the MAP labeling is obtained for every node i as the label $l_i$ that minimizes the corresponding belief. Note that both, messages and beliefs computed for a given node, depend on the messages received from its neighbors. Therefore, if the graph that underlies the MRF is a tree, this process is initialized in the roots since messages for these nodes can be calculated considering just their potentials. In this case, at convergence, the solution is guaranteed to be optimal for arbitrary energies. If the structure is not a tree, messages are passed in any arbitrary order, but the algorithm is not guaranteed to converge in a finite number of iterations. Nonetheless, LBP has shown good performance in empirical studies (Murphy et al. 1999).

3.1.2 Lazy Flipper

LF is a move-making algorithm proposed by Andres et al. (2012). It is a generalization of the well-known ICM which offers a systematic way to explore (exhaustively or not) the search space. The idea is to start from an arbitrary initial assignment and perform successive flips of variables that reduce the energy to be minimized. A greedy strategy is adopted to explore the space of solutions: as soon as a flip reducing the energy is found, the current configuration is updated accordingly. In a first stage, only one variable is flipped at a time (as in ICM). However, once a configuration is found whose energy can no longer be reduced by flips of one variable, a new stage starts where all subsets of two connected variables (i.e. variables that are linked by an edge in the graph) are considered. This strategy is applied, considering sets of maximum size k. This parameter controls the search depth. For $k=1$, it specializes to ICM. For bigger values of k a trade-off between approximation quality and runtime is established, which in the limit converges to an exhaustive search over only the connected subgraphs (intractable in most of the cases).

3.1.3 Factor Graphs

We have adopted the OpenGM2 library (Kappes et al. 2013) which implements both inference methods, and makes it possible to perform fair comparisons. It requires construction of a factor graph for every scheme (see Fig. 7).

A factor graph $G'$ is a bipartite graph that factorizes a given global energy function, expressing which variables are arguments of which local functions (Kschischang et al. 2001). Given a graphical model of any order $G=\langle V, E \rangle $ (like the ones described in this work), we can derive a factor graph $G'=\langle V', F', E'\rangle $. Here, $V'$ is the set of variable nodes formed by the nodes of G, $F'$ is a the set of all the factors $f \in F'$ (where every f is associated to one clique G), and the set $E' \subset V' \times F'$ defines the relation between the nodes and the factors. Every factor f has a function $\varphi _f:V'^n \rightarrow \mathbb {R}$ associated with it, that might correspond to one of the data or regularization terms defined in previous sections. The energy function of our discrete labeling problem in the context of factor graphs is then given by:

$$\begin{aligned} \mathcal {E}(x) = \sum _{f \in F'} \varphi _f(l^f_1, \ldots , l^f_n), \end{aligned}$$

(23)

where x corresponds to a given labeling for the complete graph and $l^f_1 \ldots l^f_n$ are labels given to the variables in the neighborhood (or scope) of the factor f. Figure 7 shows a comparison between the three models and the derivation of the corresponding factor graph in each case.

3.1.4 Incremental Approach

In order to improve the quality of the label space sampling (and therefore the accuracy of the results) while keeping a low computational cost, we adopted a greedy incremental approach where the label space is refined for every time we run the inference algorithm. In that way we explore a wider range of parameters which result in more accurate sampling when composed after several iterations. A similar approach has been successfully used in previous graph-based registration papers (Ferrante and Paragios 2013; Ferrante et al. 2015a, b; Glocker et al. 2008, 2011).

3.2 Experimental Validation

We compute results on two different datasets for the three methods, using the two inference algorithms (LBP and LF) in order to validate both the resulting 2D–2D deformation field and the final plane estimation. The first one is a monomodal MRI heart dataset while the second one consists of 6 sequences of multimodal US-MRI brain images.

For every registration case, we run the inference algorithm several times (more precisely, the inference method is executed a number of times equal to the product between grid refinement levels and label refinement levels). For a single execution of both inference methods, we use the same compound stopping criterion based on the energy gap between iterations and maximum running time. The algorithms run until the energy improvement between two iterations is smaller than a fraction of the current energy (we use $\epsilon = 0.01\%$). If convergence is not achieved before a timeout is reached, the algorithm stops and returns the best explored solution. A timeout of 60 s is used since we observe that it is enough to achieve convergence in most of the registration cases. When it is not achieved within this time, it can take too long. For LF we used a maximum depth of $k=2$ (for details about LF, we refer the reader to Sect. 3.1 or to the work of Kappes et al. 2013).

We run the same experiments using a continuous approach to estimate rigid and deformable parameters, which serve as baseline for comparison. We adopted the best deformable approach (namely Cont Def-Two Steps) from a brief comparative analysis included in “Appendix 1”, where we discussed alternative continuous models for slice-to-volume registration. Please refer to “Appendix 1” for a detailed discussion about this model. The continuous optimization is performed using the simplex algorithm proposed by Nelder and Mead (1965). Also known as Nelder-Mead, downhill simplex or amoeba, the simplex method is one of the most popular continuous derivative-free methods. It relies on the notion of simplex (a $n+1$ vertices polytope living in a n-dimensional space) to explore the space of solutions in a systematic way. At every iteration, the method constructs a simplex over the search surface, and the objective function is evaluated on its vertices. In the simplest version, the algorithm moves across the surface by replacing, at every iteration, the worst vertex of the current set by a point reflected through the centroid of the remaining n points. The method can find a local optimum when the objective function varies smoothly and is unimodal. It has also shown to be more robust when dealing with complicated parameter space than standard gradient-based methods, providing a good compromise between robustness and convergence time (Leung et al. 2008). It has been widely used in a variety of slice-to-volume applications, to estimate all kinds of transformation models optimizing a variety of similarity measures (Fei et al. 2002; Birkfellner et al. 2007; Gill et al. 2008; Osechinskiy and Kruggel 2011). We optimized a global energy where the similarity measure was computed for the complete image, since no local deformation model is considered.

In the following subsections we describe the datasets and present quantitative and qualitative results.

3.2.1 Monomodal Dataset Experiment

The monomodal dataset was derived from a temporal series of 3D heart MRI volumes. It consists of 10 sequences of 19 MRI slices which have to be registered with an initial volume. The slices are extracted from random positions in the volumes while satisfying spatio-temporal consistency. The ground truth associated with this dataset refers to the rigid transformation used to extract every 2D slice of every sequence (it is used to validate the plane estimation or rigid registration) and a segmentation mask of the left endocardium, that can be used to validate the quality of the estimated deformation field.

The dataset was generated from a temporal series of 3D heart MRI volumes $M_i$ as shown in Fig. 8. For a given sequence in the heart dataset, every 2D slice $I_i$ was extracted from the corresponding volume $M_i$ at a position which is calculated as follows. Starting from a random initial translation $T_0=(T_{x_0}, T_{y_0}, T_{z_0})$ and rotation $R_0=(R_{x_0}, R_{y_0}, R_{z_0})$, we extract the first 2D slice $I_0$ from the initial volume $M_0$. Then, gaussian noise is added to every parameter of the transformation in order to generate the position of the next slice at the next volume. We used $\sigma _r=3^\circ $ as rotation and $\sigma _t=5\mathrm {mm}$ as translation parameters. Those parameters generate maximum distances of about 25mm between the current and the succeeding plane. In this way, we generated 2D sequences that correspond to trajectories inside the volumes. Since the initial 3D series consists of temporally spaced volumes of the heart, there are local deformations between them due to the heartbeat; therefore, extracted slices are also deformed.

The resolution of the MRI volume is $192 \times 192 \times 11$ voxels and the voxel size is $1.25\mathrm {mm} \times 1.25\mathrm {mm} \times 8\mathrm {mm}$. The slices of the 2D sequences are $120 \times 120$ pixels with a voxel size of $1.25 \mathrm {mm} \times 1.25 \mathrm {mm}$.

Experiments for the 3 methods were performed using equivalent configurations. In all of them we used 3 grid refinement levels, 4 steps of label refinement per grid level, initial grid size of 40mm and minimum patch size (for similarity measure calculation) of 20 mm. In case of the overparameterized approach we used $\alpha = 0.8$, $\gamma = 1$ and 342 labels; for the decoupled approach we used $\alpha = 0.8$, $\beta = 0.2$, 25 labels in the 2D deformation space and 91 in the plane selection space; and finally, for the high-order approach we used $\alpha = 0.5$, $\gamma = 1.10$ and 19 labels. Parameters $\alpha , \beta , \gamma $ were chosen using cross-validation. The number of labels in every label space was chosen to make the search spaces as similar as possible. Recall that alternative label spaces were adopted in every approach: the overparameterized model uses 5-dimensional labels describing in-plane deformation and plane selection variables; the decoupled model divides this unified label space into two separate ones, the in-plane deformations label space and the plane selection label space; finally, the high-order model uses a unique and simpler label space composed of 3-dimensional displacement vectors.

Results are reported (for every approach and every inference method) for 10 sequences of 19 images, giving a total of 190 registration cases. We also included the results corresponding to the rigid approach optimized using simplex method. We used SAD as similarity measure given that we are dealing with monomodal registration. The idea is to register every 2D slice $I_i$ (which plays the role of an intra-operative image) to the same initial volume $M_0$ (which acts as the pre-operative image). The resulting position of the slice $I_i$ was used to initialize the registration of slice $I_{i+1}$.

Figure 10 shows results in terms of rigid transformation estimation. We measured the distance between the transformation parameters, and reported the average of the 190 registration cases. It resulted in less than 0.02rad ($1.14^\circ $) for rotation and less than 1.5mm for translation parameters in all the discrete approaches and optimization methods. The discrete methods outperform the results obtained using the continuous baselines. The decoupled method dominates the other two by orders of magnitude in terms of reduction of the standard deviation and the mean error. However, in terms of performance, both decoupled and high-order methods are equally good when compared to the overparameterized approach whose computational time is higher (as expected, given the high dimensionality of the label space). This can be observed in Figs. 13, 14 and 15.

To measure the influence of the deformation in the final results, we used the segmentations being associated with the dataset. We computed statistics for the segmentation overlapping at three different stages: before registration (i.e. between the source image and the target volume slice corresponding to the initial transformation), after rigid registration (i.e. between the source image and the target volume slice corresponding to the estimated transformation) and after deformable registration (i.e. between the deformed source image and the target volume slice corresponding to the estimated transformation). We evaluated accuracy computing DICE coefficient, Hausdorff distance and Contour Mean Distance (CMD). We also provided sensitivity (which measures how many pixels from the reference image are correctly segmented in test image) and specificity (which measures how many pixels outside the reference image are correctly excluded from the test image) coefficients to complete the analysis. Results presented in Fig. 11 show the mean and standard deviation of the indicators at the three stages, for the three approaches and the two inference methods. The discrete methods outperform the continuous approaches (rigid and deformable) in all the cases. It can be seen that results improve at each stage, achieving DICE coefficient of around 0.9 after deformation. Hausdorff distance and CMD decreased at each stage until a total reduction of around 66%. Decoupled method still outperforms the others after deformation in all the indicators, and presents a substantial improvement in terms of standard deviation reduction with respect to them (it is consistent with the results we showed in Fig. 10 for the rigid parameters). Figure 12 complements these results by showing DICE values per sequence, while Fig. 9 shows some qualitative results before, after rigid and after deformable registration.

Finally, in terms of running time, Fig. 13 presents the average value for the three approaches and the two inference methods, together with the distribution with respect to data cost computation and optimization time. As we can see, the decoupled method again outperforms the other two when inference is performed using LBP. We run all the experiments (brain and heart datasets) on an Intel Xeon W3670 with 6 Cores, 64bits and 16GB of RAM.

3.2.2 Multimodal Experiment

Another dataset was used to test our approaches on multimodal image registration. The dataset consists of a preoperative brain MRI volume (voxel size of $0.5 \mathrm {mm} \times 0.5 \mathrm {mm} \times 0.5 \mathrm {mm}$ and resolution of $394 \times 466 \times 378$ voxels) and 6 series of 9 US images extracted from the patient 01 of the database MNI BITE presented in Mercier et al. (2012). The intra-operative US images were acquired using the prototype neuronavigation system IBIS NeuroNav. We generated 6 different sequences of 9 2D US images of the brain ventricles, with resolution around $161 \times 126$ pixels and pixel size of $0.3 \mathrm {mm} \times 0.3 \mathrm {mm}$. The brain ventricles were manually segmented in both modalities. The estimated position of the slice n was used to initialize the registration process of slice $n+1$. Slice 0 was initialized in a position near the ground truth using the rigid transformation provided together with the dataset. We computed statistics as we did in the previous experiment, but in this case based on the overlap between ventricle segmentations. Since we registered input images of different modalities, we used Mutual Information as similarity measure instead of SAD.

Figure 11 summarizes the average DICE, specificity, sensibility, Hausdorff distance and Contour Mean Distance coefficients for all the series, while Fig. 13 reports the running times. Figure 12 complements these results by showing DICE values disaggregated per sequence. Note that the decoupled method does better in terms of computational time (independently of the inference method). However, the high-order method achieves better results in terms of segmentation statistics (in the order of 5% in terms of DICE, $2 \mathrm {mm} $ for Hausdorff distance and $0.5 \mathrm {mm}$ for contour mean distance) while keeping low running times, specially when using LF as optimization strategy (see Figs. 14, 15 for a comparison between running time and energy or accuracy, respectively). It must be noted that, in this case, we are dealing with a more complex problem than in the case of monomodal registration; consequently, the increment obtained in terms of accuracy for both, rigid and deformable registration, is smaller. Given that we are dealing with highly challenging images of low resolution being heavily corrupted from speckle, those results are extremely promising. It is known to the medical imaging community that explaining correspondences between different modalities is an extremely difficult task.

In all brain experiments, we used initial grid size of $8\mathrm {mm}$, a minimum patch size of 13mm, histograms of 16 bins to measure mutual information similarity, a grid level of 3 and 4 steps of label refinement per grid level. In case of the overparameterized approach, we used $\alpha = 0.9$, $\gamma = 0.1$ and 342 labels; for the decoupled approach $\alpha = 0.015$, $\beta = 0.135$, 25 labels in the 2D deformation space and 91 in the plane selection space; finally, for the high-order approach $\alpha = 0.7$, $\gamma = 0.05$ and 19 labels. Parameters were chosen similarly as in the heart experiments.

3.3 Comparative Analysis

In this section, we aim at comparing different aspects of the three approaches we have presented in this paper, namely label spaces, graph topology and computational time. Without loss of generality, some assumptions are made regarding the models. First, we consider only square grids where N is the number of control points and consequently $\sqrt{N}$ is the number of nodes per side. Second, for the sake of simplicity we do not consider the extra cliques introduced to improve the geometrical constraints propagation, since they are contemplated as an alternative strategy which may or may not be adopted.

Figure 14 shows a comparative analysis between the three approaches, using the two proposed inference methods, in terms of optimization time and final energy, while Fig. 15 includes a similar graph representing time vs accuracy (measured using DICE coefficient). Note that in Fig. 14, both methods are equivalent with respect to the final energy in general (without considering the outliers). However, there are more important differences in terms of computational time. In the high-order approach, where the label space is small, LF outperforms LBP since convergence is achieved in a few seconds, independently of the dataset. For bigger label spaces (like decoupled and overparameterized approaches), LBP converges faster in case of the heart dataset, where SAD is used as similarity measure and therefore the energy is smooth. The last case is when we use MI as similarity measure (brain dataset) and we have big label spaces: there is no clear pattern in this case. Note that these results are consistent with those shown in Fig. 15. Indeed, one can observe that graphs in Fig. 15 are essentially a flipped version (over the X axis) of graphs included in 14. This evidences a high correlation between low energy values and high accuracy of the results, proving that the energy is appropriately modeled.

Table 2 presents a compendium of the most critical parameters related to the proposed methods. Let us start with the label spaces. We divide them into two types: displacement space ($L_D$) and plane selection space ($L_P$). The first one contains the displacement vectors (2D or 3D, depending on the model) applied to the control points, while the second one contains the set of planes that can be chosen. In terms of cardinality of the label spaces, the overparameterized approach has the highest complexity, given by the cartesian product between the displacements and all the possible planes, $|L_D \times L_P|$. The decoupled model is dominated by the maximum of the cardinality of both label spaces, $\max (|L_D|, |L_P|)$. Finally, for the high-order model it depends only on $|L_D|$ since it is not necessary anymore to explicitly model which planes can be chosen—the triangles defined by the triplets of points describe a plane (and even more, a patch on this plane) by themselves. It clearly illustrates how we can reduce the complexity of a given label space by making smart decisions in terms of energy definition and graph topology.

However, there is always a trade-off. This strong reduction in the size of the label space, has an effect on other parameters like number of cliques and number of variables. In case of the decoupled model, the main advantage is related to the fact that while the number of variables and edges augment linearly (it goes from N to 2N in case of variables, and from $2N-2\sqrt{N}$ to $5N-4\sqrt{N}$ in case of pairwise edges), the number of labels decreases quadratically (from $|L_D \times L_P|$ to $\max (|L_D|, |L_P|)$). It results in better performance for the decoupled method as can be observed in Fig. 13. A consequence of the third-order cliques in the high-order method is higher computation costs. Even then, judging from the running times reported in Fig. 13, we achieve good experimental computation time because of the smaller label space.

Table 1 Memory footprint comparison among the three methods, using two different optimizers

Full size table

Table 2 Comparison among the three methods in terms of label space and graph topology

Full size table

Finally, we include a comparison in terms of memory footprints (see Table 1) among the three methods, using two different optimizers. We reported the maximum amount of memory that a process consumed while running one registration case for the heart dataset. As expected, the overparemterized model requires more memory than the other two approaches. Results also suggest that LF is more efficient in terms of memory consumption than LBP, since given the same graphical model, LF always outperforms LBP in terms of memory consumption (Table 2).

4 Conclusion

We derived three new models from the standard graph-based deformable registration theory for slice-to-volume registration. We have shown promising results in a monomodal and a multimodal case, using different inference methods, and we compare them with baseline rigid and non-rigid approaches were inference is performed using continuous optimization. The proposed framework inherits the advantages of graph-based registration theory: modularity with respect to the similarity measure, flexibility to incorporate new types of prior knowledge within the registration process (through new energy terms) and scalability given by its parallelization potential.

The three methods we have presented aim at optimizing different types of energy functions in order to get both, rigid and deformable transformations that can be applied independently, according to the problem we are trying to solve. An extensive evaluation in terms of different statistical indicators has been presented, together with a comparative analysis of the algorithmic and computational complexity of each model. This work constitutes a clear example of the modeling power of graphical models, and it pushes the limits of the state-of-the-art by showing how a new problem can be solved not just in one, but in three different ways.

Numerous future developments built upon the proposed framework can be imagined. In this work, we proposed a joint model which encodes rigid and deformable parameters through a 2D grid of control points living in 3D space. An alternative approach, standard in the literature of slice-to-volume registration using continuous methods (i.e. Osechinskiy and Kruggel 2011), consists in decoupling the parameters into a unique global rigid transformation (6 DOF) for plane selection, and a 2D deformation model, which can be optimized in two-steps or simultaneously, as we discussed in “Appendix 1”. Adopting a similar model in the discrete case would help to reduce the number of parameters in the label space, by increasing the complexity of the graphical model itself. In that sense, the recent work presented by Porchetto et al. (2016) suggests a strategy to optimize global transformations through discrete graphical models in the context of slice-to-volume registration, which could be combined with a simplified version of the proposed models encoding the deformable parameters.

Alternative optimization methods and in particular second order methods in the context of higher order inference could improve the quality of the obtained solution while decreasing the computational complexity. The integration of geometric information (landmark correspondences) combined with iconic similarity measures (Sotiras et al. 2010) could also be an interesting additional component of the registration criterion. Last but not least, domain/problem specific parameter learning (Baudin et al. 2013; Komodakis et al. 2015) towards improving the proposed models could have a positive influence on the obtained results.

References

Andres, B., Kappes, J. H., Beier, T., Köthe, U., & Hamprecht, F. A. (2012). The lazy flipper: Efficient depth-limited exhaustive search in discrete graphical models. In A. Fitzgibbon, S. Lazebnik, P. Perona, Y. Sato & C. Schmid (Eds.), Computer vision (ECCV 2012) (pp. 154–166). Berlin: Springer.
Baudin, P. Y., Goodman, D., Kumar, P., Azzabou, N., Carlier, P. G., Paragios, N., et al. (2013). Discriminative parameter estimation for random walks segmentation. In K. Mori, I. Sakuma, Y. Sato, C. Barillot & N. Navab (Eds.), Medical image computing and computer-assisted intervention (MICCAI 2013) (pp. 219–226). Berlin: Springer.
Besag, J. (1986). On the statistical analysis of dirty pictures. Journal of the Royal Statistical Society Series B (Methodological), 48, 259–302.
Birkfellner, W., Figl, M., Kettenbach, J., Hummel, J., Homolka, P., Schernthaner, R., et al. (2007). Rigid 2D/3D slice-to-volume registration and its application on fluoroscopic CT images. Medical Physics, 34(1), 246. doi:10.1118/1.2401661.
Article Google Scholar
Chandler, A. G., Pinder, R. J., Netsch, T., Schnabel, J. A., Hawkes, D. J., Hill, D. L., et al. (2008). Correction of misaligned slices in multi-slice MR cardiac examinations by using slice-to-volume registration. Journal of Cardiovascular Magnetic Resonance, 2008(10), 13.
Article Google Scholar
Eresen, A., Li, P., & Ji, J. X. (2014). Correlating 2D histological slice with 3D MRI image volume using smart phone as an interactive tool for muscle study. In 2014 36th Annual international conference of the IEEE engineering in medicine and biology society (EMBC) (pp. 6418–6421). IEEE.
Fei, B., Duerk, J. L., & Wilson, D. L. (2002). Automatic 3D registration for interventional MRI-guided treatment of prostate cancer. Computer Aided Surgery, 7(5), 257–267.
Article Google Scholar
Ferrante, E., Fecamp, V., & Paragios, N. (2015a). Implicit planar and in-plane deformable mapping in medical images through high order graphs. In IEEE international symposium on biomedical imaging: From nano to macro (ISBI).
Ferrante, E., Fecamp, V., & Paragios, N. (2015b). Slice-to-volume deformable registration: Efficient one shot consensus between plane selection and in-plane deformation. International Journal of Computer Assisted Radiology and Surgery (IJCARS), 10(6), 16.
Ferrante, E., & Paragios, N. (2013). Non-rigid 2D–3D medical image registration using Markov random fields. In Medical image computing and computer-assisted intervention (MICCAI 2013) (pp. 163–170). Berlin: Springer.
Ferrante, E., & Paragios, N. (2017). Slice-to-volume medical image registration: A survey. Medical Image Analysis, 39, 101–123.
Gill, S., Abolmaesumi, P., Vikal, S., Mousavi, P., & Fichtinger, G. (2008). Intraoperative prostate tracking with slice-to-volume registration in MRI. In Proceedings of the 20th international conference of the society for medical innovation and technology 2008 (pp. 154–158). Society for Medical Innovation and Technology SMIT.
Glocker, B. (2010). Random fields for image registration. PhD Thesis.
Glocker, B., Komodakis, N., Paragios, N., & Navab, N. (2009). Approximated curvature penalty in non-rigid registration using pairwise MRFs. In G. Bebis, R. Boyle, B. Parvin, D. Koracin, Y. Kuno, J. Wang, et al. (Eds.), Advances in visual computing (pp. 1101–1109). Berlin: Springer.
Glocker, B., Komodakis, N., Tziritas, G., Navab, N., & Paragios, N. (2008). Dense image registration through MRFs and efficient linear programming. Medical Image Analysis, 12(6), 731–741. doi:10.1016/j.media.2008.03.006. Special issue on information processing in medical imaging 2007.
Article Google Scholar
Glocker, B., Sotiras, A., Komodakis, N., & Paragios, N. (2011). Deformable medical image registration: Setting the state of the art with discrete methods. Annual Review of Biomedical Engineering, 13, 219–244. doi:10.1146/annurev-bioeng-071910-124649.
Article Google Scholar
Huang, X., Moore, J., Guiraudon, G., Jones, D. L., Bainbridge, D., Ren, J., et al. (2009). Dynamic 2D ultrasound and 3D ct image registration of the beating heart. IEEE Transactions on Medical Imaging, 28(8), 1179–1189.
Article Google Scholar
Jiang, S., Xue, H., Counsell, S., Anjari, M., Allsop, J., Rutherford, M., et al. (2009). Diffusion tensor imaging (DTI) of the brain in moving subjects: Application to in-utero fetal and ex-utero studies. Magnetic Resonance in Medicine, 62(3), 645–655.
Article Google Scholar
Kappes, J. H., Andres, B., Hamprecht, F. A., Schnörr, C., Nowozin, S., Batra, D., et al. (2013). A comparative study of modern inference techniques for discrete energy minimization problems. In Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), 2013 (pp. 1328–1335).
Kim, B., Boes, J. L., Bland, P. H., Chenevert, T. L., & Meyer, C. R. (1999). Motion correction in fMRI via registration of individual slices into an anatomical volume. Magnetic Resonance in Medicine, 41(5), 964–972.
Kohli, P., & Rother, C. (2012). Higher-order models in computer vision. Chapter 1, Image processing and analysis with graphs. Boca Raton: CRC Press.
MATH Google Scholar
Komodakis, N., Paragios, N., & Tziritas, G. (2011). MRF energy minimization and beyond via dual decomposition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(3), 531–552.
Komodakis, N., Tziritas, G., & Paragios, N. (2007). Fast, approximately optimal solutions for single and dynamic MRFs. In IEEE conference on computer vision and pattern recognition, 2007 (CVPR’07) (pp. 1–8). IEEE. http://ieeexplore.ieee.org/xpl/mostRecentIssue.jsp?punumber=4269955.
Komodakis, N., Xiang, B., & Paragios, N. (2015). A framework for efficient structured max-margin learning of high-order MRF models. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 37(7), 1425–1441.
Kschischang, F. R., Frey, B. J., & Loeliger, H. A. (2001). Factor graphs and the sum-product algorithm. IEEE Transactions on Information Theory, 47(2), 498–519.
Article MathSciNet MATH Google Scholar
Kwon, D., Lee, K. J., Yun, I. D., & Lee, S. U. (2008). Nonrigid image registration using dynamic higher-order MRF model. In D. Forsyth, P. Torr & A. Zisserman (Eds.), Computer Vision (ECCV 2008) (pp. 373–386). Berlin: Springer.
Leung, K. Y. E., van Stralen, M., Nemes, A., Voormolen, M. M., van Burken, G., Geleijnse, M. L., et al. (2008). Sparse registration for three-dimensional stress echocardiography. IEEE Transactions on Medical Imaging, 27(11), 1568–1579.
Liao, R., Zhang, L., Sun, Y., Miao, S., & Chefd’Hotel, C. (2013). A review of recent advances in registration techniques applied to minimally invasive therapy. IEEE Transactions on Multimedia, 15(5), 983–1000.
Article Google Scholar
Markelj, P., Tomaževič, D., Likar, B., & Pernuš, F. (2012). A review of 3D/2D registration methods for image-guided interventions. Medical Image Analysis, 16(3), 642–661.
Mercier, L., Del Maestro, R. F., Petrecca, K., Araujo, D., Haegelen, C., & Collins, D. L. (2012). Online database of clinical MR and ultrasound images of brain tumors. Medical Physics, 39, 3253.
Article Google Scholar
Murphy, K. P., Weiss, Y., & Jordan, M. I. (1999). Loopy belief propagation for approximate inference: An empirical study. In Proceedings of the fifteenth conference on uncertainty in artificial intelligence (UAI’99).
Nelder, J. A., & Mead, R. (1965). A simplex method for function minimization. The Computer Journal, 7(4), 308–313. doi:10.1093/comjnl/7.4.308.
Article MathSciNet MATH Google Scholar
Olesch, J., Beuthien, B., Heldmann, S., Papenberg, N., & Fischer, B. (2011). Fast intra-operative non-linear registration of 3D-ct to tracked, selected 2D-ultrasound slices. In SPIE medical imaging (pp. 79642R–79642R). International Society for Optics and Photonics.
Osechinskiy, S., & Kruggel, F. (2011). Slice-to-volume nonrigid registration of histological sections to MR images of the human brain. Anatomy Research International, 2011, 1–17. doi:10.1155/2011/287860.
Article Google Scholar
Paragios, N., Ferrante, E., Glocker, B., Komodakis, N., Parisot, S., & Zacharaki, E. I. (2016). (Hyper)-graphical models in biomedical image analysis. Medical Image Analysis, 33, 102–106.
Paragios, N., Komodakis, N. (2014). Discrete visual perception. In 2014 22nd International conference on pattern recognition (ICPR) (pp. 18–25). IEEE.
Porchetto, R., Stramana, F., Paragios, N., & Ferrante, E. (2016). Rigid slice-to-volume medical image registration through Markov random fields. BAMBI Workshop, MICCAI 2016.
Ramalingam, S., Kohli, P., Alahari, K., & Torr, P. H. (2008). Exact inference in multi-label crfs with higher order cliques. In IEEE conference on computer vision and pattern recognition, 2008 (CVPR 2008) (pp. 1–8). IEEE.
Rueckert, D., Sonoda, L. I., Hayes, C., Hill, D. L., Leach, M. O., & Hawkes, D. J. (1999). Nonrigid registration using free-form deformations: Application to breast MR images. IEEE Transactions on Medical Imaging, 18(8), 712–721.
Article Google Scholar
San José Estépar, R., Westin, C., & Vosburgh, K. (2009). Towards real time 2D to 3D registration for ultrasound-guided endoscopic and laparoscopic procedures. International Journal of Computer Assisted Radiology and Surgery, 4(6), 549–560.
Article Google Scholar
Seshamani, S., Fogtmann, M., Cheng, X., Thomason, M., Gatenby, C., & Studholme, C. (2013). Cascaded slice to volume registration for moving fetal fMRI. In 2013 IEEE 10th international symposium on biomedical imaging (ISBI) (pp. 796–799). IEEE.
Shekhovtsov, A., Kovtun, I., & Hlaváč, V. (2008). Efficient MRF deformation model for non-rigid image matching. Computer Vision and Image Understanding, 112(1), 91–99.
Article Google Scholar
Sotiras, A., Davatzikos, C., & Paragios, N. (2013). Deformable medical image registration: A survey. IEEE Transactions on Medical Imaging, 32(7), 1153–1190.
Article Google Scholar
Sotiras, A., Ou, Y., Glocker, B., Davatzikos, C., & Paragios, N. (2010). Simultaneous geometric–iconic registration. In T. Jiang, N. Navab, J. P. W. Pluim & M. A. Viergever (Eds.), Medical image computing and computer-assisted intervention (MICCAI 2010) (pp. 676–683). Berlin: Springer.
Wang, C., Komodakis, N., & Paragios, N. (2013). Markov random field modeling, inference & learning in computer vision & image understanding: A survey. Computer Vision and Image Understanding, 117(11), 1610–1627.
Article Google Scholar
Xu, H., Lasso, A., Fedorov, A., Tuncali, K., Tempany, C., & Fichtinger, G. (2014). Multi-slice-to-volume registration for MRI-guided transperineal prostate biopsy. International Journal of Computer Assisted Radiology and Surgery, 10(5), 1–10.
Zikic, D., Glocker, B., Kutter, O., Groher, M., Komodakis, N., Kamen, A., et al. (2010). Linear intensity-based image registration by Markov random fields and discrete optimization. Medical Image Analysis, 14(4), 550–562.
Article Google Scholar

Download references

Acknowledgements

This research was partially supported by European Research Council Starting Grant Diocles (ERC-STG-259112). We thank Mihir Sahasrabudhe for proof-reading the paper, and Puneet Kumar Dokania, Vivien Fecamp and Jorg Kappes for helpful discussions.

Author information

Authors and Affiliations

Center for Visual Computing, CentraleSupelec, INRIA, Universite Paris-Saclay, Paris, France
Enzo Ferrante & Nikos Paragios
Biomedical Image Analysis (BioMedIA) Group, Department of Computing, Imperial College London, London, UK
Enzo Ferrante
TheraPanacea, Paris, France
Nikos Paragios

Authors

Enzo Ferrante
View author publications
You can also search for this author in PubMed Google Scholar
Nikos Paragios
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Enzo Ferrante.

Additional information

Communicated by Ron Kimmel.

Images in this work are better viewed in color.

Appendix 1: Continuous Slice-to-Volume Registration

In this appendix, we include a brief comparative study among alternative continuous models for slice-to-volume registration. Comparison is performed using the monomodal heart dataset (see Sect. 3.2.1 for a complete description). The aim of this experiment was to choose the most accurate method for deformable registration which (together with the standard rigid model) was then used as baseline for comparison with the discrete approaches proposed in this work (see Sect. 3.2).

Following the literature on slice-to-volume registration (for a complete survey on slice-to-volume registration see Ferrante and Paragios (2017)), we adopted a decoupled model where the transformation consists in a global 6-DOF rigid transformation for plane selection, and a 2D FFD to represent the deformation field. To account for smooth deformations, the FFD is regularized using the Jacobian of the deformation field, a common regularizer used in the deformable image registration community (Sotiras et al. 2013). Optimization was performed through the continuous Nelder-Mead simplex algorithm described in Sect. 3.2, adopted in many slice-to-volume registration studies (see Section 4.1.2 of Ferrante and Paragios 2017). The grid resolution for the 2D FFDs was set to be equivalent to the resolutions used in the discrete experiments (see Sects. 3.2.1, 3.2.2). We run simplex optimization until convergence or until a maximum of 10,000 simplex iterations were achieved. For the rigid model, convergence was always reached in a few seconds. In case of the deformable models, the algorithm did not converge in all the cases, achieving maximum running times of around 40 s for 10,000 iterations. We experimented with more iterations (100,000) but we did not reach significant improvements in the results.

Figure 16 summarizes the results for the comparative study including a simple 6-DOF rigid transformation and three variants of a decoupled model with a global 6-DOF rigid transformation and a 2D FFD. As it can be observed, the Cont Def-Two Steps outperforms the other models. That is why it was chosen as baseline for comparison with the discrete approaches proposed in this work.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Ferrante, E., Paragios, N. Graph-Based Slice-to-Volume Deformable Registration. Int J Comput Vis 126, 36–58 (2018). https://doi.org/10.1007/s11263-017-1040-8

Download citation

Received: 30 August 2016
Accepted: 07 August 2017
Published: 22 August 2017
Issue Date: January 2018
DOI: https://doi.org/10.1007/s11263-017-1040-8

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Graph-Based Slice-to-Volume Deformable Registration

Abstract

Similar content being viewed by others

Basics of magnetic resonance imaging and quantitative parameters T1, T2, T2*, T1rho and diffusion-weighted imaging

Iterative reconstruction: how it works, how to apply it

Statistical Shape Models: Understanding and Mastering Variation in Anatomy

1 Introduction

1.1 Motivation

1.2 Previous Work

1.3 Contribution

2 Graph-Based Slice-to-Volume Deformable Registration

2.1 Overparameterized Approach

2.2 Decoupled Approach

2.3 High-Order Approach

3 Results and Discussion

3.1 Inference Methods

3.1.1 Loopy Belief Propagation

3.1.2 Lazy Flipper

3.1.3 Factor Graphs

3.1.4 Incremental Approach

3.2 Experimental Validation

3.2.1 Monomodal Dataset Experiment

3.2.2 Multimodal Experiment

3.3 Comparative Analysis

4 Conclusion

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Appendix 1: Continuous Slice-to-Volume Registration

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Graph-Based Slice-to-Volume Deformable Registration

Abstract

Similar content being viewed by others

Basics of magnetic resonance imaging and quantitative parameters T1, T2, T2*, T1rho and diffusion-weighted imaging

Iterative reconstruction: how it works, how to apply it

Statistical Shape Models: Understanding and Mastering Variation in Anatomy

1 Introduction

1.1 Motivation

1.2 Previous Work

1.3 Contribution

2 Graph-Based Slice-to-Volume Deformable Registration

2.1 Overparameterized Approach

2.2 Decoupled Approach

2.3 High-Order Approach

3 Results and Discussion

3.1 Inference Methods

3.1.1 Loopy Belief Propagation

3.1.2 Lazy Flipper

3.1.3 Factor Graphs

3.1.4 Incremental Approach

3.2 Experimental Validation

3.2.1 Monomodal Dataset Experiment

3.2.2 Multimodal Experiment

3.3 Comparative Analysis

4 Conclusion

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Appendix 1: Continuous Slice-to-Volume Registration

Appendix 1: Continuous Slice-to-Volume Registration

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation