Multi-modal Brain Image Registration Based on Subset Definition and Manifold-to-Manifold Distance

Abstract. Image registration is an important procedure in multi-modal brain image processing. The main challenge is the variation of intensity distributions across image modalities. Efficient SSD-based methods cannot handle this kind of variation, while approaches based on modality-independent descriptors and metrics are usually time-consuming. In this article, we propose a novel similarity metric based on the manifold-to-manifold distance imposed on a subset of the original images. We define a subset as a compact representation of the original image. A manifold learning technique is employed to reveal the intrinsic structure of the sampled data. Instead of comparing the images in the original feature space, we use the manifold-to-manifold distance to measure the difference. By minimizing the distance between the manifolds, we iteratively obtain the optimal registration of the original image pair. Experimental results show that our approach effectively handles multi-modal image registration on the BrainWeb dataset.


Introduction
Multi-modal image registration is essential to brain image processing, e.g. for image fusion, which can provide comprehensive information from multiple modalities. Conventional registration methods based on the sum of squared differences (SSD) cannot be directly applied since the intensity distributions of different modalities are obviously different. Modality-independent metrics such as mutual information (MI) [15] and cross correlation (CC) [1] were proposed for multi-modal image registration, but registration methods based on MI and CC are generally time consuming. Moreover, modality-independent feature descriptors can be used for multi-modal image registration via landmark-based methods; however, the calculation of modality-independent descriptors can be complicated. Efforts were also made to figure out a common representation of the reference and moving images and then use the L2 norm as the similarity metric to optimize the transformation. One example is the registration method built upon the Laplacian image [22]. However, the nonlinear embedding of the original images often involves a large computational cost considering the image dimensionality.
We propose our registration framework based on the subset definition and the manifold-to-manifold distance as illustrated in Fig. 1. A subset is defined for the compact representation of the original image. We calculate the low-dimensional manifolds of the extracted subsets and use the manifold-to-manifold distance to estimate the image difference. First, we define a subset via edge detection on the reference image. And the coordinates of the sampled points are transferred to the moving image. Image patches centered on these points from both moving and reference images are used to describe the context information. Second, we use the dimensionality reduction technique to deal with the sampled subset. Considering the complicated structures of medical images, we use Laplacian eigenmaps [2] for the low-dimensional embedding with preservation of locality. Instead of measuring the similarity in the original feature space, we estimate the image difference via the distance between manifolds of the subsets. The registration is optimized iteratively by minimizing the manifold-to-manifold distance.

Related Work
Multi-modal image registration plays an important role in the field of brain image processing. It has been studied for years and many approaches have been proposed to solve the problem. One type of solution is to sample a dense point cloud on the surface of the brain and on the boundaries between different tissues, and then estimate the alignment between the point clouds with algorithms like ICP [5] and robust point matching (RPM) [16]. A similar but more efficient approach is to focus on specific positions in the brain, so that the landmarks are sparsely distributed. The landmarks can be defined manually; Han et al. [6] use a regression forest to learn the positions of manually marked points. The extraction of landmarks can also be performed automatically by analyzing the geometry of the images; commonly used features in brain images are ridges or crest lines [20]. Semantic information can effectively improve the performance of a registration method. For instance, a prior brain segmentation is very helpful in the task of registration [11,18]. However, features extracted from images of different modalities are often not precisely matched due to the differences in image characteristics.
To cope with this limitation, Lee et al. [9] propose a feature alignment method based on a Gaussian-weighted distance map that spreads the weights of feature points to their neighborhoods. An alternative to feature-based methods is the class of intensity-based methods. Instead of extracting feature points, this type of method directly uses the intensity values of the images. Considering the intensity variations of multi-modal images, the difference of intensity values is not an effective metric to describe the relationship between the images to be registered. Therefore, modality-independent similarity metrics have been proposed so that the registration framework can meet the demands of multi-modal images. One typical example is mutual information (MI) [15], and efforts have been made to improve its performance and robustness, such as normalized mutual information (NMI) [19] and conditional mutual information (CMI) [14]. Recently, methods that combine geometric and intensity features have been proposed to make intensity-based registration more robust. Sulcal information can be used to constrain the registration of brain images [4,8]. The cortical surface is considered in the processing of brain volume data [13]. The method of MI has also been extended by incorporating the gradient field calculated in the images [12]. Further related work includes registration approaches based on internal similarity and manifold learning. Instead of exploring the relationship of the original images directly, some preprocessing steps are involved to make the images comparable. The work of Penney et al. [17] considers the internal similarity in local areas to obtain modality-independent descriptors. A similar idea is used by Heinrich et al. [7] in their modality independent neighborhood descriptor (MIND), where vectors consisting of differences between each pixel and its neighbors serve as descriptors of local structures.
A second group of techniques seeks to figure out a common representation for multi-modal images so that mono-modal registration based on L1 and L2 metrics can be applied. Wachinger et al. [21] use the entropy image to represent the structures of the input images: locations with obvious anatomical structures have high entropy, while the entropy values of smooth areas are low. Finally, we refer to the method of the Laplacian image [22], where manifold learning is applied to image patches sampled across the whole image and a new intensity value is assigned to each pixel according to its coordinate in the low-dimensional space. Given the number of pixels in an image, this processing can be very time consuming.

Subset Definition
Due to the high dimensionality of the image data, we define a subset of the original image as a compact representation. In this article, we use the Canny operator [3] to extract edge points from the brain image. We observe that the extracted points cover almost all the contours of the sulci and gyri as well as the boundaries between white and grey matter.
The edge points are still densely distributed on the image. We further randomly downsample the detected points so that the final subset contains no more than 1000 samples. Image patches around these points are used as the subset to represent the original image.
We only use the Canny edge detector on the reference image and transfer the coordinates of the sampled points to the moving image. The transferred coordinates on the moving image actually deviate from the real boundaries. We use image patches around the sampled points to depict their context.
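The subset definition can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the function name is ours, and a simple gradient-magnitude threshold stands in for the Canny operator to keep the sketch self-contained.

```python
import numpy as np

def define_subset(image, patch_size=7, max_samples=1000, threshold=0.2, seed=0):
    # Detect edge points, randomly downsample them to at most `max_samples`,
    # and extract the surrounding patches as the subset representation.
    # A gradient-magnitude threshold stands in for the Canny operator here.
    h = patch_size // 2
    gy, gx = np.gradient(image.astype(float))
    mag = np.hypot(gx, gy)
    edge = mag > threshold * mag.max()
    # keep interior points only, so every patch fits inside the image
    edge[:h, :] = False
    edge[-h:, :] = False
    edge[:, :h] = False
    edge[:, -h:] = False
    ys, xs = np.nonzero(edge)
    if len(ys) > max_samples:
        idx = np.random.default_rng(seed).choice(len(ys), max_samples,
                                                 replace=False)
        ys, xs = ys[idx], xs[idx]
    coords = np.stack([ys, xs], axis=1)
    patches = np.stack([image[y - h:y + h + 1, x - h:x + h + 1].ravel()
                        for y, x in coords])
    return coords, patches
```

The returned coordinates would be reused on the moving image, as described above, so that corresponding patches can be extracted from both images.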

Metric Based on Manifold-to-Manifold Distance
Our subset definition is still a modality-related representation since the intensity values are used directly. We employ Laplacian eigenmaps [2] to obtain low-dimensional manifolds of the defined subsets, revealing their intrinsic structure while eliminating the modality-specific intensity information. The intrinsic structure describes the internal similarity of patches within each image and provides modality-independent information.
Let $\mathcal{M}_R$ and $\mathcal{M}_M$ denote the low-dimensional manifolds of the subsets defined in the reference and moving images, respectively. Laplacian eigenmaps starts with the construction of a neighborhood graph that describes the structure of the data. An image patch can be represented as a high-dimensional vector, where each entry corresponds to the intensity value of a pixel. We denote the data point corresponding to the $i$-th image patch as $x_i$, and its counterpart in the low-dimensional space as $y_i$. The Euclidean norm $\|x_i - x_j\|$ is used to estimate the distance between two patches in the high-dimensional space; this distance is only defined when the patches are close to each other. In this work, we choose the implementation that searches for the $k$ nearest neighbors and connects the corresponding nodes, with weights assigned to the edges in the form of the heat kernel $w_{ij} = \exp(-\|x_i - x_j\|^2 / t)$, where $t$ indicates the variance. Subsequently, the low-dimensional manifold with locality best preserved on average is obtained by minimizing $\sum_{i,j} \|y_i - y_j\|^2 w_{ij}$.
In the neighborhood graph, edges only connect nodes that are close to each other, so the number of neighbors should not be too large. But when the value of k is too small, the neighborhood graph may be separated into several isolated subgraphs. To avoid undefined relationships between pairs of data points, we set k = 100 so that there is only one connected component. This selection does not severely contradict the assumption of locality.
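A minimal sketch of this embedding step, assuming a dense nearest-neighbor search and SciPy's generalized eigensolver (the function name and the median-based choice of the kernel variance are our assumptions):

```python
import numpy as np
from scipy.linalg import eigh

def laplacian_eigenmaps(X, n_neighbors=10, dim=2, t=None):
    # Build a k-nearest-neighbor graph over the patch vectors with
    # heat-kernel weights w_ij = exp(-||x_i - x_j||^2 / t), then solve the
    # generalized eigenproblem L y = lambda D y and keep the `dim`
    # eigenvectors that follow the trivial constant one.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # squared distances
    if t is None:
        t = np.median(d2[d2 > 0])   # heuristic choice of the kernel variance
    W = np.zeros_like(d2)
    for i in range(len(X)):
        nbrs = np.argsort(d2[i])[1:n_neighbors + 1]  # skip the point itself
        W[i, nbrs] = np.exp(-d2[i, nbrs] / t)
    W = np.maximum(W, W.T)          # symmetrize the neighborhood graph
    D = np.diag(W.sum(axis=1))      # degree matrix
    L = D - W                       # graph Laplacian
    _, vecs = eigh(L, D)            # eigenvalues come sorted ascending
    return vecs[:, 1:dim + 1]       # drop the constant eigenvector
```

For the subset sizes used in this work (at most 1000 patches), the dense pairwise-distance matrix remains tractable.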
The dimension of the low-dimensional space is set to 2. The mapping results of Laplacian eigenmaps only guarantee that the local structures are preserved. We need to put the low-dimensional manifolds of the subsets into a mutual space before estimating the distance between the manifolds. Here, we figure out an affine transformation by minimizing the distance

$d(T) = \sum_i \|y_i^R - T(y_i^M)\|^2$  (1)

where $T$ indicates the transformation imposed on the manifold of the moving image, and $y_i^R$ and $y_i^M$ are the corresponding low-dimensional points of the reference and moving subsets.
Subsequently, the distance between two manifolds is formulated as

$D(\mathcal{M}_R, \mathcal{M}_M) = \sum_i \|y_i^R - T^*(y_i^M)\|^2$  (2)

where $T^*$ is the optimal transformation that minimizes the distance in Eq. 1. This manifold-to-manifold distance is used as the image distance measurement in our registration approach.
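Assuming the $i$-th points of the two manifolds correspond (they stem from the same sampled coordinates), Eqs. 1 and 2 amount to a linear least-squares fit followed by a residual evaluation; a sketch:

```python
import numpy as np

def manifold_distance(Y_ref, Y_mov):
    # Fit the affine map T* that best aligns the moving manifold with the
    # reference manifold in the least-squares sense (Eq. 1), then return
    # the residual sum of squares as the manifold-to-manifold distance (Eq. 2).
    H = np.hstack([Y_mov, np.ones((len(Y_mov), 1))])  # homogeneous coordinates
    A, *_ = np.linalg.lstsq(H, Y_ref, rcond=None)     # optimal affine map T*
    return float(np.sum((Y_ref - H @ A) ** 2))
```

Fitting the affine map before comparing makes the distance insensitive to the arbitrary orientation and scale of each embedding, which Laplacian eigenmaps does not fix.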

Image Registration
The registration is calculated in an iterative way. In every iteration, the transformation parameters are updated to lower the difference between the reference and moving images measured with the manifold-to-manifold distance. The updated transformation is applied to the moving image, and Laplacian eigenmaps is used to calculate the low-dimensional manifold of the moving image. Then, the distance between the manifolds of the subsets defined on the reference and moving images is calculated. The iteration ends when the manifold-to-manifold distance converges. Based on our proposed similarity metric, the problem of image registration is converted to the minimization of the manifold-to-manifold distance defined in Eq. 2:

$\hat{\tau} = \arg\min_{\tau} D(\mathcal{M}_R, \mathcal{M}_M(\tau))$  (3)

We use $\tau$ to denote the transformation of the original image. $\mathcal{M}_M(\tau)$ indicates the low-dimensional manifold of the moving image when the transformation $\tau$ is imposed on it.
We consider rigid transformations of 2D images with 3 registration parameters to be optimized: the rotation $r$ and the translations $t_x$ and $t_y$ in the $x$ and $y$ directions, respectively. The registration is optimized with BFGS [10]. The gradient of the objective function can be written as

$\nabla d = \left( \partial d / \partial r, \; \partial d / \partial t_x, \; \partial d / \partial t_y \right)^{\top}$  (4)

where $d$ is short for $D(\mathcal{M}_R, \mathcal{M}_M(\tau))$. The partial derivatives are estimated numerically with finite differences.
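The optimization loop can be sketched with SciPy's BFGS, which approximates the gradient in Eq. 4 by forward finite differences when no analytic Jacobian is supplied. The cost passed to `register` is a placeholder for the actual manifold-to-manifold distance; both function names are ours.

```python
import numpy as np
from scipy.optimize import minimize

def rigid_transform(points, params):
    # Apply a 2D rigid transform with rotation r and translations (tx, ty).
    r, tx, ty = params
    R = np.array([[np.cos(r), -np.sin(r)],
                  [np.sin(r),  np.cos(r)]])
    return points @ R.T + np.array([tx, ty])

def register(cost, x0=None):
    # Minimize the similarity cost over (r, tx, ty) with BFGS (Eq. 3);
    # `eps` is the step of the finite-difference gradient estimate (Eq. 4).
    x0 = np.zeros(3) if x0 is None else x0
    return minimize(cost, x0, method="BFGS", options={"eps": 1e-4}).x
```

In the full method, evaluating the cost for a candidate $\tau$ would involve warping the moving image, re-embedding its subset with Laplacian eigenmaps, and computing the manifold-to-manifold distance; the sketch above isolates only the optimizer.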

Experiments
We apply our method to the brain MRI images provided by the BrainWeb database, which includes the modalities T1, T2 and PD. The images have been adjusted and are well aligned. To demonstrate the effectiveness of our proposed similarity metric, we show plots of the similarity between slices selected from the T1- and T2-weighted images under rotation and translation in Fig. 2. Three different slices are tested. It can be seen that the image difference measured by the manifold-to-manifold distance approaches its minimum when the two images are well aligned.

We also test our method in registration experiments. The transformation parameters are calculated for all combinations of image modalities. We first apply a random rigid transformation, with rotations and translations within fixed ranges, to one of the images. The calculated parameters are compared with the ground-truth values and the errors are shown in Table 1. For each configuration in Table 1, the registration is calculated 100 times with random starting positions. The registration methods proposed in [21] based on the entropy image and the Laplacian image, as well as the approach based on MI [15], are compared in this table. The experiments show that Laplacian eigenmaps applied to the defined subset can effectively depict the brain images and lead to accurate registration results.

In our work, image patches are used to describe the local structures around the detected edge points. The performance of the registration is related to the selection of the patch size. If the image patch is too small, it cannot contain enough structural information. The patch size should be large enough that the subsets defined in the reference and moving images have sufficient overlap, but overly large patches lead to high computational cost. The setting of the patch size is application dependent. In our experiments, we consider rotations within the tested range and translations of up to 15 mm. We set the patch size to 31 × 31, so the dimension of each image patch is 961. In Fig. 3, we illustrate the changes of similarity with respect to translation in the y direction for different patch sizes. When the patch size is too small, many local minima show up as the translation becomes large. Without a proper initial value, the optimization is likely to be trapped in a local minimum.

Fig. 3. The similarity with respect to translation in the y direction with patch sizes of (left), (middle) and (right).

Conclusion
In this article, we proposed a novel similarity metric for multi-modal brain images based on subset definition and the manifold-to-manifold distance, and utilized it to solve the registration of brain images. We use a subset made up of patches around the sampled edge points to depict the structure of the brain images. Then, Laplacian eigenmaps is employed to calculate the low-dimensional manifold of the subset. The image difference is estimated via the manifold-to-manifold distance. Our results demonstrate that this metric provides a good measurement of image difference and leads to effective image registration.