Signal, Image and Video Processing, Volume 9, Issue 8, pp 1815–1824

A feature-based solution for 3D registration of CT and MRI images of human knee

  • Jingjie Zheng
  • Zhenyan Ji
  • Kuangdi Yu
  • Qin An
  • Zhiming Guo
  • Zuyi Wu
Original Paper


This paper presents a feature-based solution for 3D registration of CT and MRI images of a human knee. It facilitates constructing high-quality models with clear outlining of bone tissues and detailed illustration of soft tissues. The model will be used for analysing the effect of posterior cruciate ligament and anterior cruciate ligament deficiency. The solution consists of preprocessing, feature extraction, transformation parameter estimation and resampling, and blending. In preprocessing, we propose partial preserving and iterative neighbour comparing filtering to help segment bone tissues from MRI images without having to construct a statistical model. Through analysing the characteristics of knee images, tibia and femur are selected as the features and the algorithm for effectively extracting them is described. To estimate transformation parameters, we propose a method based on the statistical information of projected feature images, including translating according to the project feature image centroids and calculating the rotation angle by searching and mapping boundary points. We transform the MRI image and then blend it with the CT image by taking the maximum intensity of every two corresponding voxels from the two images. At the end of the paper, the registration result is evaluated by computing the Pearson product-moment correlation coefficient of the binarised features and the accuracy is confirmed.


Keywords: Feature-based image registration · Magnetic resonance imaging · Computed tomography · Human knee

1 Introduction

Human knees are commonly injured and suffer from various conditions. Patients who have suffered significant trauma or chronic osteoarthritis pain are usually asked to undergo an MRI of the knees, as this shows clear and detailed structures of the cartilage, ligaments, and muscles [1]. Meanwhile, CT scans are adopted where the actual bone structures require well outlining. Registering and blending CT and MRI images of the human knee facilitates constructing high-quality models providing far more abundant information, featuring both modalities. This model is constructed for the purpose of analysing the effect of posterior cruciate ligament and anterior cruciate ligament deficiency. It is also considered to be a potentially useful tool for disease diagnosis, stress analysis, and computer-aided surgery.

This paper first introduces image registration and the typical classification of registration methods, i.e., intensity-based methods and feature-based methods. Then, we review our previous work and analyse the feasibility of applying mutual information method, one of the state-of-the-art intensity-based methods for multimodal image registration. Concluding with several drawbacks of applying this method, we consider using feature-based method and point out our main focuses.

In methodology, this paper proposes a feature-based solution for 3D registration of CT and MRI images of the human knee. The solution consists of four steps: preprocessing, feature extraction, transformation parameter estimation and resampling, and blending.

Preprocessing. Preprocessing enhances the images so that the features can be extracted more easily, producing a clear contrast between bone tissues and the other parts of the image. Different approaches are adopted to preprocess CT and MRI images. CT images are preprocessed by thresholding and a hole filling operation. MRI images are preprocessed with a series of operations that increase the contrast between bone tissues and soft tissues and eliminate noise inside bone tissues. Since different parts of an MRI image vary considerably, the parameters of these operations are applied piecewise. An algorithm is proposed for separating MRI slices into four parts.

Feature extraction. The tibia and femur, as the two largest bones visible in the image, are proper choices for anatomical features. They are extracted by thresholding and morphological opening. Morphological opening is performed on a portion of the slices in the image; an algorithm is proposed for identifying whether a slice should be processed with morphological opening or not.

Transformation parameter estimation and resampling. Based on the features extracted, transformation parameters are estimated utilising the statistical information of these features. The sensed image is resampled according to the transformation matrix computed.

Blending. CT and MRI images are in the end blended by taking the maximum intensity of each corresponding voxel.

2 Analysis of methods

Image registration is the process of finding correspondence between all points in two images of a scene. The process spatially aligns the images, making it possible to fuse information in the images. The correspondence is often established by finding a transformation that minimises some distance between the transformed sensed image and the reference image [2].

Over the years, many approaches have been proposed for image registration. These approaches are generally categorised into intensity-based methods (or area-based methods) and feature-based methods.

Intensity-based methods [3, 4, 5] operate directly on voxel intensities. The basic principle of these methods is to search for a transformation that optimises a criterion measuring the intensity similarity of corresponding voxels. Among different measures, mutual information has been proved to be an excellent one for cross-modality registrations. This measure assumes the statistical dependence of the voxel intensities is maximal when the images are geometrically aligned.

Feature-based methods [6, 7, 8] typically extract distinct anatomical features from images and find the correspondence and transformation between them. Features such as points, curves, and surfaces are often employed in transformation model estimation. These methods can handle complex between-image distortions and are faster than the mutual information method, since no evaluation of a matching criterion over the whole image is needed [9].

Brain images are often registered with the mutual information method since they contain few features that are distinctive and easily detectable [10]. This measure makes only a fairly loose assumption that image intensities should have a probabilistic relationship. The assumption holds for brain images, whose statistical dependence is relatively strong. However, directly applying it to knee images is very likely to fail [11]. CT and MRI images differ in several ways. First, an MRI image provides more detailed information about soft tissues than a CT image (redundant when building the probabilistic relationship). Second, in a CT image the contrast between bone tissues and soft tissues is significant enough that a threshold can separate them, while in an MRI image this boundary is ambiguous. Third, the intensities of bone tissues are greater than those of soft tissues in a CT image, while the opposite holds in an MRI image.

In our previous work [12], two-dimensional images are preprocessed into a pair of bone skeletons. This provides images with stronger statistical dependence. The registration is carried out by optimisation of mutual information using Powell’s method [13]. This approach provides a relatively accurate registration result. However, it is time-consuming, as each evaluation of the mutual information criterion involves all the voxels in the images and the number of iterations during optimisation is very large [14]. Adding a third dimension, as with the 3D MRI and CT images in this work, makes the search take even longer.

Now, we consider the feasibility of using feature-based methods. Selecting proper features and extracting them are the keys to the success of feature-based image registration.

The correct alignment of bones is of most concern in the registration of the human knee [12]. In our research, we acquire the image pair with the assistance of a customised leg stand so that the angles of knee bending are guaranteed to be the same and the knees have already been aligned along the \(z\) axis (head to foot direction). Bone tissues, unlike soft tissues, are rigid and their physical appearance does not change easily with environment. They are also salient features that can be easily located and identified with human eyes. Tibia and femur are the two largest bones in knee images; hence, they are proper choices for anatomical features in our registration.

The tibia and femur can be extracted from a CT image easily, but they are very difficult to extract from an MRI image. Standard automatic segmentation methods often fail to provide a reasonable segmentation, as different tissues have overlapping intensity values and the boundaries between tissues are not clearly separated [15]. Most existing segmentation methods for MRI images of the human knee rely on constructing a statistical model, whose accuracy is strongly influenced by the amount of input data [1, 15].

Furthermore, the data describing the desired structures must be hand-segmented, which can be prohibitively laborious. For these reasons, it is important to find an approach that overcomes the shortcomings of these methods.

3 Methodology

3.1 Preprocessing

Preprocessing is essential since the images may contain noise and the contrast between bone tissues and other parts may not be significant enough. Preprocessing facilitates the feature extraction step.

Image intensities are first mapped to range \([0,1]\) according to the window centre wc and window width ww provided in the DICOM data:
$$\begin{aligned} y = {\left\{ \begin{array}{ll} 0 &{} \text {if}\, x \le \text {wc} - 0.5 - (\text {ww}-1)/2 \\ 1 &{} \text {if}\, x > \text {wc} - 0.5 + (\text {ww}-1)/2 \\ \frac{x-(\text {wc}-0.5)}{\text {ww}-1}+0.5 &{} \text {otherwise} \end{array}\right. } \end{aligned}$$
where \(x\) is the input intensity and \(y\) is the output intensity.
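As a minimal sketch, the mapping above can be implemented as follows (NumPy assumed; the function name is illustrative):

```python
import numpy as np

def window_normalise(x, wc, ww):
    """Map raw DICOM intensities to [0, 1] using window centre wc and
    window width ww, following the piecewise-linear mapping above."""
    x = np.asarray(x, dtype=float)
    low = wc - 0.5 - (ww - 1) / 2           # everything at or below -> 0
    high = wc - 0.5 + (ww - 1) / 2          # everything above -> 1
    y = (x - (wc - 0.5)) / (ww - 1) + 0.5   # linear ramp in between
    return np.where(x <= low, 0.0, np.where(x > high, 1.0, y))
```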
The preprocessing steps are depicted in Fig. 1. The knees in CT and MRI images are both first masked to eliminate the influence of background voxels. Then, the contrast between bone tissues and other parts is enhanced. CT and MRI images follow different contrast enhancement approaches. The CT image is enhanced with gamma correction, while the MRI image is enhanced with a combined method. Sequentially, they are partial preserving, iterative neighbour comparing filtering, and morphological closing.
Fig. 1

Schematic for preprocessing CT and MRI images of the human knee

3.1.1 Masking

Background voxels would increase the complexity of the subsequent steps. Masking separates the object (i.e., the knee) from its background and restricts processing to the volume of interest. There is a significant gap between background intensities and object intensities which can be used for thresholding. Histograms of CT and MRI images of the human knee are illustrated in Figs. 2 and 3.
Fig. 2

Histogram of a CT image of the human knee

Fig. 3

Histogram of an MRI image of the human knee

We first present the steps for computing the threshold \(\theta _\mathrm{mask}\). Store the intensity of each voxel in image \(f(x,y,z)\) into vector \(V\). Then, compute the number of bins in the image histogram, \(N_\mathrm{bins} = \lceil \frac{|V|}{S_1}\rceil \), where the value \(S_1\) is chosen based on the number of voxels in the image to generate a moderate resolution of histogram to discover the gap. We test different \(S_1\) with values 10, 100, 1,000, 10,000, and 100,000, and conclude that \(S_1 = 10,000\) is a safe and cost-effective choice.

Define the threshold for identifying the gap, \(\eta _\mathrm{gap} = \frac{|V|}{S_2}\), where \(S_2\) is chosen so that \(\eta _\mathrm{gap}\) is small, but not so small that the gap is missed. We determine \(S_2\) with a method similar to that for determining \(S_1\) and have \(S_2 = 20\).

Calculate the histogram of \(V\) to obtain another vector \(H\) of length \(N_\mathrm{bins}\). Find the first index \(i^{\prime }\) that satisfies \(H(i^{\prime })\le \eta _\mathrm{gap}\), where
$$\begin{aligned} i^{\prime } \in \{1, 2, 3, \ldots , N_\mathrm{bins}\}. \end{aligned}$$
Then, the threshold \(\theta _\mathrm{mask} = \frac{i^{\prime }}{N_\mathrm{bins}}\).
Then, we mask the object by segmenting the image with \(\theta _\mathrm{mask}\) and filling holes in the mask. With thresholding, we have an initial mask
$$\begin{aligned} {\text {mask}}_{0} = \{(x,y,z) | f(x,y,z) > \theta _\mathrm{mask} \}. \end{aligned}$$
Since there might be holes in the mask, we fill them and have \({\text {mask}} = {\text {fill}}({\text {mask}}_{0})\), where fill is the morphological operation that fills holes in the input image. A hole is a set of background voxels that cannot be reached by filling in the background from the edge of the image.
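The threshold search and masking above can be sketched as follows (a sketch assuming SciPy; intensities are assumed already normalised to \([0,1]\), and the function names are illustrative):

```python
import numpy as np
from scipy import ndimage

def mask_threshold(f, s1=10_000, s2=20):
    """Find theta_mask from the histogram gap: the first bin whose count
    drops to eta_gap or below (intensities assumed in [0, 1])."""
    v = f.ravel()
    n_bins = int(np.ceil(v.size / s1))             # N_bins = ceil(|V| / S1)
    eta_gap = v.size / s2                          # gap-detection threshold
    hist, _ = np.histogram(v, bins=n_bins, range=(0.0, 1.0))
    i_prime = int(np.argmax(hist <= eta_gap)) + 1  # first qualifying bin (1-based)
    return i_prime / n_bins

def make_mask(f):
    """Threshold the image and fill holes to obtain the object mask."""
    mask0 = f > mask_threshold(f)
    return ndimage.binary_fill_holes(mask0)
```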

3.1.2 Gamma correction

Gamma correction is often used to compensate for properties of human vision [16]. However, in this case, gamma correction is employed to increase the contrast between bone tissues and other parts of a CT image. Gamma correction is defined by the following power-law expression \(V_\mathrm{out} = {V_\mathrm{in}}^{\gamma }\), where \(V_\mathrm{in}\) and \(V_\mathrm{out}\) are input and output intensities in range [0, 1]. As illustrated in Fig. 4, when \(\gamma >1\), the contrast of the image increases, whereas the opposite occurs with \(\gamma <1\). The CT image is processed with \(\gamma = 3\) to put the soft tissues dark enough to be ignored. The result of gamma correction is illustrated in Fig. 5.
Fig. 4

Gamma correction

Fig. 5

A CT cross section before (a) and after (b) gamma correction

3.1.3 Partial preserving

Partial preserving operates on the masked voxels. It preserves the voxels that best characterise the knee image, i.e., the voxels whose intensity does not fall into the first or last certain percentage of positions in the sorted intensity vector of all the masked voxels. The voxels outside this range are set to the nearest bound intensity. The intensities are then expanded back to the range \([0, 1]\). This step produces significant contrast between bone tissues and other parts. Meanwhile, some important edge information is wiped away and the intensities of noise inside bone tissues are enhanced. The result of partial preserving is depicted in Fig. 6.
Fig. 6

An MRI cross section before (a) and after (b) partial preserving

Formally, define the intensities of the masked voxels in the image to be vector \(V_\mathrm{mask}\). Sort \(V_\mathrm{mask}\) in non-decreasing order, and we have \(V_\mathrm{mask}^{\prime }\). The upper threshold \(\theta _\mathrm{U}\) and lower threshold \(\theta _\mathrm{L}\) are computed
$$\begin{aligned} \theta _\mathrm{U}&= V_\mathrm{mask}^{\prime }(\lfloor (1 - \alpha )\cdot |V_\mathrm{mask}^{\prime }|\rfloor ),\\ \theta _\mathrm{L}&= V_\mathrm{mask}^{\prime }(\lfloor \alpha \cdot | V_\mathrm{mask}^{\prime }|\rfloor ). \end{aligned}$$
where \(\alpha \) is a scalar in the interval \([0,0.5]\) controlling the degree of elimination. Note that \(\alpha \) should be large enough that \(\lfloor \alpha \cdot |V_\mathrm{mask}^{\prime }|\rfloor \) is a valid index. Then, the voxels with an intensity smaller than the lower threshold \(\theta _\mathrm{L}\) or greater than the upper threshold \(\theta _\mathrm{U}\) are set to the nearest threshold
$$\begin{aligned} f_\mathrm{mask}(x,y,z) := {\left\{ \begin{array}{ll} \theta _\mathrm{L} &{}\hbox {if}\, f_\mathrm{mask}(x,y,z) < \theta _\mathrm{L} \\ \theta _\mathrm{U} &{}\hbox {if}\, f_\mathrm{mask}(x,y,z) > \theta _\mathrm{U} \end{array}\right. }, \end{aligned}$$
where \(f_\mathrm{mask}(x,y,z)\) is the masked voxels in image \(f\!(x,y,z)\).
Finally, the intensities are expanded back to range \([0,1]\)
$$\begin{aligned} f_\mathrm{mask}(x,y,z) := \frac{f_\mathrm{mask}(x,y,z) - \theta _\mathrm{L}}{\theta _\mathrm{U} - \theta _\mathrm{L}}. \end{aligned}$$
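The clipping and rescaling above amount to a quantile clamp; a minimal sketch in NumPy (the value of \(\alpha \) is illustrative, not prescribed by the text):

```python
import numpy as np

def partial_preserve(f, mask, alpha=0.05):
    """Clip the masked intensities to the [alpha, 1 - alpha] quantile band,
    then stretch back to [0, 1] (Sect. 3.1.3)."""
    v = np.sort(f[mask])                              # V'_mask, non-decreasing
    theta_l = v[int(np.floor(alpha * v.size))]        # lower threshold
    theta_u = v[int(np.floor((1 - alpha) * v.size))]  # upper threshold
    out = f.copy()
    out[mask] = (np.clip(f[mask], theta_l, theta_u) - theta_l) / (theta_u - theta_l)
    return out
```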

3.1.4 Iterative neighbour comparing filtering

Iterative neighbour comparing filtering increases the intensities of the voxels other than bone tissues as well as eliminates noise in bone tissues.

This step consists of a number of iterations; we denote the number of iterations as iter. In each iteration, \(I_\mathrm{v}\) and \(I_\mathrm{s}\) are computed. \(I_\mathrm{v}\) denotes the intensity of a voxel multiplied by \(|\varOmega _\mathrm{v}|\), where \(\varOmega _\mathrm{v}\) denotes the voxel itself and its neighbour voxels. \(I_\mathrm{s}\) denotes the sum of the intensities of the voxels in \(\varOmega _\mathrm{v}\). \(I_\mathrm{v}\) and \(I_\mathrm{s}\) are compared to determine the operation applied to the voxel. If \(I_\mathrm{v}\) and \(I_\mathrm{s}\) are close to each other, the voxel usually lies in a continuous volume and its intensity should be increased. If \(I_\mathrm{v}\) is much greater than \(I_\mathrm{s}\), the voxel is inferred to be isolated and probably noise to be eliminated. If \(I_\mathrm{v}\) is much smaller than \(I_\mathrm{s}\), the intensity should remain unchanged, since this happens either when there is a high-intensity noise voxel nearby or when the voxel lies in a continuous volume but has a relatively small intensity compared with its neighbours. A parameter \(\lambda \) is introduced into the comparison to control the filter behaviour.

Formally, consider a voxel \(v\) with coordinate \((x_v, y_v, z_v)\) and intensity \(f(v)\). Define set \(\varOmega _\mathrm{v}\), which consists of voxel \(v\) and its neighbour voxels
$$\begin{aligned} \varOmega _\mathrm{v} = \{(x,y,z)\,|\,|x-x_v| \le sz_\varOmega ,\ |y-y_v| \le sz_\varOmega ,\ |z-z_v| \le sz_\varOmega \}, \end{aligned}$$
where \(sz_\varOmega \) is a scalar that controls the size of the set.

Then, compute \(I_\mathrm{v} = |\varOmega _\mathrm{v}| \cdot f(v)\) and \(I_\mathrm{s} = \sum _{(x,y,z) \in \varOmega _\mathrm{v}}f(x,y,z)\). The new intensity of voxel \(v\),
$$\begin{aligned}f^{\prime }(v) = {\left\{ \begin{array}{ll} 0 &{} \hbox { if } f(v)=0 \\ 1 &{} \hbox { if } f(v) \cdot \frac{I_\mathrm{s}}{I_\mathrm{v}\cdot \lambda } > 1\\ f(v) \cdot \frac{I_\mathrm{s}}{I_\mathrm{v}\cdot \lambda } &{} \hbox { otherwise} \end{array}\right. }, \end{aligned}$$
where \(\lambda \) is the control parameter in range \((0,\infty )\). The value of \(\lambda \) should be carefully chosen to balance noise elimination against intensity enhancement: as \(\lambda \) approaches zero, voxel intensities are enhanced more strongly; as \(\lambda \) grows, noise is eliminated more aggressively. Figure 7 illustrates an MRI cross section with different values of \(\lambda \). Based on experiments, \(\lambda \) is typically set in the range \([0.4,0.8]\) to produce a moderate result.
Fig. 7

An MRI cross section with different values of \(\lambda \): original image (a), \(\lambda =0.1\) (b), \(\lambda =0.5\) (c), \(\lambda =1.0\) (d)

To deal with the situation where \(I_\mathrm{v}\) is much smaller than \(I_\mathrm{s}\), the following criterion is introduced: the voxel remains unchanged if at least \(\tau \) of its neighbours are brighter, i.e.,
$$\begin{aligned} \Big |\{(x,y,z)|f(x,y,z)>f(v) \hbox { where } (x,y,z) \in \varOmega _\mathrm{v} \}\Big | \ge \tau . \end{aligned}$$
At the end of each iteration, threshold \(\phi \) is applied to clean up the voxels whose intensities are too small.
$$\begin{aligned} f^\prime (x,y,z) := {\left\{ \begin{array}{ll} 0 &{} \hbox { if } f^\prime (x,y,z)<\phi \\ f^\prime (x,y,z) &{} \hbox { otherwise} \end{array}\right. }. \end{aligned}$$
Also, the control parameter \(\lambda \) is multiplied by a propagation scalar prop to decrease the effect in next iteration, \(\lambda ^\prime = \lambda \cdot {\text {prop}}\).
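One iteration of the filter can be sketched as follows (a sketch assuming SciPy; the parameter values are illustrative choices within the ranges discussed above, since the recommended per-part values are given in Table 1):

```python
import numpy as np
from scipy import ndimage

def incf_iteration(f, lam=0.6, sz=1, tau=14, phi=0.05):
    """One iteration of the neighbour-comparing filter (Sect. 3.1.4)."""
    k = 2 * sz + 1                                   # neighbourhood edge length
    omega = k ** 3                                   # |Omega_v|
    i_s = ndimage.uniform_filter(f, size=k) * omega  # neighbourhood sum I_s
    i_v = omega * f                                  # I_v = |Omega_v| * f(v)
    with np.errstate(divide='ignore', invalid='ignore'):
        scaled = np.where(i_v > 0, f * i_s / (i_v * lam), 0.0)
    out = np.clip(scaled, 0.0, 1.0)
    out[f == 0] = 0.0                                # zero voxels stay zero

    # tau criterion: keep a voxel unchanged when >= tau neighbours are brighter
    def brighter(win):
        return np.sum(win > win[win.size // 2])
    n_brighter = ndimage.generic_filter(f, brighter, size=k)
    out = np.where(n_brighter >= tau, f, out)

    out[out < phi] = 0.0                             # end-of-iteration clean-up
    return out
```

Across iterations, \(\lambda \) would then be multiplied by prop before the next call, as described above.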

3.1.5 Grayscale morphological closing

The foregoing steps performed on the MRI image may break the continuity of the voxel intensities of soft tissues. The morphological close operation is employed here to repair these broken voxels. A morphological closing is a dilation followed by an erosion with the same structuring element for both operations.

Let \(f(x)\) denote a grayscale image and \(b(x)\) denote the structuring function, the dilation of \(f\) by \(b\) is given by \((f\oplus b)(x)=\sup _{y\in E}[f(y)+b(x-y)]\), and the erosion of \(f\) by \(b\) is given by \((f\ominus b)(x)=\inf _{y\in E}[f(y)-b(y-x)]\), where “sup” denotes the supremum, “inf” denotes the infimum and \(E\) is the grid. Then, the close operation \(f\cdot b\) is given by \(f\cdot b=(f\oplus b)\ominus b\). The structuring element \(b\) used here is flat and disc-shaped with a radius of \(R\).
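A minimal sketch of the closing with a flat disc, assuming SciPy's grayscale morphology; the per-slice application follows from the 2-D disc-shaped element, and the radius is an illustrative value:

```python
import numpy as np
from scipy import ndimage

def disc(radius):
    """Flat disc-shaped structuring element of radius R."""
    y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    return x ** 2 + y ** 2 <= radius ** 2

def close_slices(f, radius=3):
    """Grayscale closing (dilation then erosion) applied slice by slice;
    slices are assumed stacked along the first axis."""
    se = disc(radius)
    return np.stack([ndimage.grey_closing(sl, footprint=se) for sl in f])
```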

3.1.6 Piecewise processing

MRI image slices should be processed with different parameters in iterative neighbour comparing filtering and grayscale morphological closing. The reason is that the intensity distributions of different slices vary, and applying the same parameters to all slices will not generate the best result. For example, if the parameters tuned for slices showing a single bone are applied to slices showing both the tibia and femur (which lie close to each other), morphological closing may merge the separate bone tissues into one, which must be avoided. Observing that the MRI slices along a knee generally fall into four distinct types (Fig. 8), it is reasonable to divide the slices into four parts; we therefore need a method for determining the three separation points.
Fig. 8

Typical images in the four parts along the MRI slices

The area ratio of soft tissues to bone tissues in a slice is a good criterion, because the choice of parameters in processing steps such as morphological closing largely depends on it. This ratio is not easy to compute directly, but the ratio of the number of pixels with the brightest intensities to the number with the darkest intensities in the masked area of a slice gives a good approximation, since soft tissues account for most of the brightest pixels and bone tissues for most of the darkest. The curve of this ratio is depicted in Fig. 9. We observe that slices with different characteristics separate where the curve crosses a moderate threshold; in the figure shown, it is around one. The threshold may vary for extreme MRI realisations, but the shape of the curve should be consistent across realisations, since it conforms to the natural shape of the human knee. We propose an algorithm to determine the three separation points (Algorithm 1).
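Algorithm 1 itself is not reproduced in this excerpt; the following is a hedged sketch of one plausible reading: compute the per-slice bright/dark pixel-count ratio and take the first three slices where the curve crosses the threshold. The tolerance eps for "brightest"/"darkest" is an assumption.

```python
import numpy as np

def separation_points(img, mask, theta=1.0, eps=0.02):
    """Sketch of the separation-point search (assumed reading of Algorithm 1).
    Slices are stacked along the first axis; mask marks the knee region."""
    ratios = []
    for sl, m in zip(img, mask):
        v = sl[m]
        bright = np.count_nonzero(v >= 1.0 - eps)   # brightest pixels
        dark = np.count_nonzero(v <= eps)           # darkest pixels
        ratios.append(bright / dark if dark else np.inf)
    above = np.asarray(ratios) > theta
    crossings = np.nonzero(np.diff(above))[0] + 1   # slice index after each crossing
    return crossings[:3]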
Applying this algorithm, the image slices are divided into four parts, and a different set of parameters is applied to each part. The recommended parameters are listed in Table 1.
Fig. 9

Ratio of the number of pixels with the brightest and the darkest intensities in slices

Table 1

Recommended parameters for MRI image slices: the values of \(\lambda \), \(\tau \), \(sz_{\varOmega }\), \({\text {prop}}\), \(\phi \), iter, and \(R\) for each of the four parts

\(\lambda , \tau , sz_{\varOmega }, {\text {prop}}, \phi \), and iter are the parameters in iterative neighbour comparing filtering; \(R\) is the radius of the disc-shaped structuring element in morphological closing

3.2 Feature extraction

Feature extraction extracts tibia and femur out of CT and MRI images. It first binarises the image, then selects the two greatest connected components, and eliminates extra bone tissues connected to tibia or femur with morphological open operations. The resulting slices of CT and MRI images are depicted in Figs. 10 and 11, respectively.
Fig. 10

Slices of the feature extracted from the CT image

Fig. 11

Slices of the feature extracted from the MRI image

3.2.1 Thresholding

Thresholding sets the intensities of bone tissues to one and the others to zero. The MRI image is binarised by setting the voxels of the masked volume whose intensity equals zero to one and the others to zero, since the preceding filtering leaves bone tissues at zero intensity. The CT image is thresholded with Otsu’s method, which chooses the threshold to minimise the intraclass variance of the thresholded black and white pixels [17].

3.2.2 Connected component selection

After thresholding, the two greatest connected components in each image are selected. For MRI images, these connected components are already tibia and femur. But for CT images, there are still extra bone tissues connected to tibia or femur, i.e., they are also included in these two connected components.

3.2.3 Morphological opening

This step eliminates extra bone tissues connected to the tibia or femur with morphological open operations. Morphological opening is the dilation of the erosion of a set \(A\) by a structuring element \(B\); it eliminates relatively small areas. The structuring element employed here is flat and disc-shaped.

Morphological opening is performed slice by slice, but not on all slices. In slices where the tibia and femur appear concurrently, the areas of some bone tissues are small enough to be mistaken for regions to be eliminated. We propose an algorithm to determine whether a slice should undergo morphological opening (Algorithm 2). The decision is made by comparing the ratio with a threshold of around 0.2, chosen from the observation that bone tissues take up around 20 % of the knee slice where the tibia and femur come closest. However, for reasons similar to those for Algorithm 1, this threshold may vary between subjects.

3.3 Transformation parameter estimation and resampling

Now that tibia and femur are extracted out of the original images, we can use the features for transformation parameter estimation. This step estimates the parameters required to produce the transformation matrix in resampling the sensed image to the spatial coordinates of the reference image. The transformation here is a typical affine transformation, where an object can translate, rotate, and scale. In this part, parameters for translation, rotation, and scaling are estimated sequentially.

3.3.1 Preparation

Let CT and MRI denote the binarised images of the features. They represent the same range along \(z\) axis and should be scaled so that they contain the same number of slices \(h\).

3.3.2 Translation

This step moves the centroids of the features to the same position and determines parameters \(tx\) and \(ty\), the displacements along the \(x\) and \(y\) axes, respectively.

Let \({\text {CT}}_\mathrm{proj}\) and \({\text {MRI}}_\mathrm{proj}\) denote the point sets in \(x\) and \(y\) dimensions generated from projecting CT and MRI down along \(z\) axis, respectively. Formally,
$$\begin{aligned}&{\text {CT}}_\mathrm{proj} = \{(x,y)|{\text {CT}}(x,y,z)=1 \hbox {, where } z\in [1, h]\},\\&{\text {MRI}}_\mathrm{proj} = \{(x,y)|{\text {MRI}}(x,y,z)=1 \hbox {, where } z\in [1, h]\}. \end{aligned}$$
The centroids of the two projected images are computed
$$\begin{aligned}&(cx_\mathrm{CT},cy_\mathrm{CT}) = \frac{1}{|{\text {CT}}_\mathrm{proj}|}\sum _{(x_{i},y_{i}) \in {\text {CT}}_\mathrm{proj}} (x_{i},y_{i}),\\&(cx_\mathrm{MRI},cy_\mathrm{MRI}) = \frac{1}{|{\text {MRI}}_\mathrm{proj}|}\sum _{(x_{i},y_{i}) \in {\text {MRI}}_\mathrm{proj}} (x_{i},y_{i}). \end{aligned}$$
Then, \(tx\) and \(ty\) are computed by subtracting the centroid of \({\text {MRI}}_\mathrm{proj}\) from that of \({\text {CT}}_\mathrm{proj}\)
$$\begin{aligned} (tx, ty) = (cx_\mathrm{CT}-cx_\mathrm{MRI}, cy_\mathrm{CT}-cy_\mathrm{MRI}). \end{aligned}$$
The points in \({\text {MRI}}_\mathrm{proj}\) are transformed with \(T_\mathrm{translation}\), which is given by
$$\begin{aligned} T_\mathrm{translation} = \left[ \begin{array}{l@{\qquad }l@{\qquad }l} 1 &{} 0 &{} tx \\ 0 &{} 1 &{} ty \\ 0 &{} 0 &{} 1 \end{array} \right] . \end{aligned}$$
Let \({\text {MRI}}_\mathrm{proj}^\prime \) denote the transformed point set. For convenience, let \({\text {CT}}_\mathrm{proj}^\prime = {\text {CT}}_\mathrm{proj}\). As the centroids have become the same, \(cx = cx_\mathrm{CT}, cy = cy_\mathrm{CT}\), and \(P_\mathrm{C} = (cx, cy)\).
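The centroid-based translation estimate can be sketched as follows (NumPy assumed; axes are assumed ordered \(x, y, z\), and the function names are illustrative):

```python
import numpy as np

def translation_params(ct, mri):
    """Estimate (tx, ty) from the centroids of the z-axis projections of
    the binarised feature volumes (Sect. 3.3.2)."""
    def centroid(vol):
        proj = vol.any(axis=2)      # project the volume down along z
        xs, ys = np.nonzero(proj)
        return xs.mean(), ys.mean()
    cx_ct, cy_ct = centroid(ct)
    cx_mri, cy_mri = centroid(mri)
    return cx_ct - cx_mri, cy_ct - cy_mri
```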

3.3.3 Rotation

This step calculates rotation angle \(\beta \) based on \({\text {MRI}}_\mathrm{proj}^\prime \) and \({\text {CT}}_\mathrm{proj}^\prime \). It utilises the inherent asymmetry of the point sets.

We first find the edges of \({\text {CT}}_\mathrm{proj}^\prime \) and \({\text {MRI}}_\mathrm{proj}^\prime \) with the Roberts cross operator [18]. The coordinates on the edges are sequentially added into \(V_\mathrm{CT}\) and \(V_\mathrm{MRI}\). The distances between each pixel on the edges and the centroid are calculated and stored in distance vectors \({\text {dist}}_\mathrm{CT}\) and \({\text {dist}}_\mathrm{MRI}\): \({\text {dist}}_\mathrm{CT} = |V_\mathrm{CT} - P_\mathrm{C}|\) and \({\text {dist}}_\mathrm{MRI} = |V_\mathrm{MRI} - P_\mathrm{C}|\). The longer distance vector is interpolated so that the resampled vector has the same size as the originally shorter one. We denote the new distance vectors as \({\text {dist}}_\mathrm{CT}^{\prime }\) and \({\text {dist}}_\mathrm{MRI}^{\prime }\), respectively. Then, \({\text {dist}}_\mathrm{CT}^{\prime }\) is circularly shifted right to find the minimum sum of squared differences between the corresponding entries of \({\text {dist}}_\mathrm{CT}^{\prime }\) and \({\text {dist}}_\mathrm{MRI}^{\prime }\).

Formally, let \([{\text {dist}}_\mathrm{CT}^{\prime }]^{j \mathrm{th}}(i)\) denote the \(i\)th entry of the vector obtained by circularly shifting \({\text {dist}}_\mathrm{CT}^{\prime }\) right \(j\) times. Find the shift \(s\) that minimises \(s_{j} = \sum _{i = 1}^{l_\mathrm{dist}}\left( [{\text {dist}}_\mathrm{CT}^{\prime }]^{j\mathrm th}(i) - {\text {dist}}_\mathrm{MRI}^{\prime }(i)\right) ^{2}\), where \(l_\mathrm{dist} = |{\text {dist}}_\mathrm{CT}^{\prime }| = |{\text {dist}}_\mathrm{MRI}^{\prime }|\). The corresponding entries of \([{\text {dist}}_\mathrm{CT}^{\prime }]^{s\mathrm th}\) and \({\text {dist}}_\mathrm{MRI}^{\prime }\) are mapped back to the nearest coordinates in \({\text {CT}}_\mathrm{proj}^\prime \) and \({\text {MRI}}_\mathrm{proj}^\prime \), yielding \(l_\mathrm{dist}\) pairs of matching coordinates. Let \(P_\mathrm{CT}\) and \(P_\mathrm{MRI}\) denote the vectors containing the matching coordinates in \({\text {CT}}_\mathrm{proj}^\prime \) and \({\text {MRI}}_\mathrm{proj}^\prime \), respectively.
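The exhaustive shift search can be sketched as follows (NumPy assumed; the function name is illustrative):

```python
import numpy as np

def best_circular_shift(dist_ct, dist_mri):
    """Search the circular right-shift of the CT distance profile that
    minimises the sum of squared differences against the MRI profile."""
    best_j, best_s = 0, np.inf
    for j in range(len(dist_ct)):
        s = np.sum((np.roll(dist_ct, j) - dist_mri) ** 2)
        if s < best_s:
            best_j, best_s = j, s
    return best_j
```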

Then, we calculate the median of the angles formed by \(V_{a}(i)\) and \(V_{b}(i)\), \(\beta = {\text {median}}\left( {\text {angle}}(i)\right) \), where \({\text {angle}}(i) = \arccos \left( \frac{V_{a}(i) \cdot V_{b}(i)}{|V_{a}(i)|\, |V_{b}(i)|}\right) \), with \(V_{a}(i) = P_\mathrm{CT}(i) - P_\mathrm{C}\) and \(V_{b}(i) = P_\mathrm{MRI}(i) - P_\mathrm{C}\).

With \(\beta \), we calculate the rotation matrix \(T_\mathrm{rotation}\) and points in \({\text {MRI}}_\mathrm{proj}^\prime \) are rotated about \(P_\mathrm{C}\)
$$\begin{aligned} T_\mathrm{rotation} = T_\mathrm{c} \left[ \begin{array}{l@{\qquad }l@{\qquad }l} \cos {\beta } &{} -\sin {\beta } &{} 0 \\ \sin {\beta } &{} \cos {\beta } &{} 0\\ 0 &{} 0 &{} 1 \end{array} \right] T_\mathrm{c}^{\prime }, \end{aligned}$$
$$\begin{aligned}&T_\mathrm{c} =\left[ \begin{array}{l@{\qquad }l@{\qquad }l} 1 &{} 0 &{} cx \\ 0 &{} 1 &{} cy \\ 0 &{} 0 &{} 1 \end{array} \right] \quad \mathrm{and} \quad T_\mathrm{c}^{\prime } = \left[ \begin{array}{l@{\qquad }l@{\qquad }l} 1 &{} 0 &{} -cx \\ 0 &{} 1 &{} -cy \\ 0 &{} 0 &{} \quad 1 \end{array} \right] . \end{aligned}$$
Let \({\text {MRI}}_\mathrm{proj}^{\prime \prime }\) denote the transformed point set. For convenience, let \({\text {CT}}_\mathrm{proj}^{\prime \prime } = {\text {CT}}_\mathrm{proj}^{\prime }\).

3.3.4 Scaling

This step calculates scale factors \(sx\) and \(sy\). They are estimated as the ratios of the mean distances between the centroid and the points on the edges of \({\text {CT}}_\mathrm{proj}^{\prime \prime }\) and \({\text {MRI}}_\mathrm{proj}^{\prime \prime }\), independently along each dimension: \(sx=\bar{\mathrm{CT}}_{xs} / \bar{\mathrm{MRI}}_{xs}\) and \(sy=\bar{\mathrm{CT}}_{ys} / \bar{\mathrm{MRI}}_{ys}\), where \({\text {CT}}_{xs}\) and \({\text {MRI}}_{xs}\) denote the distances between the points on the edges and the centroid along the \(x\) axis, and \({\text {CT}}_{ys}\) and \({\text {MRI}}_{ys}\) along the \(y\) axis.

The points in \({\text {MRI}}_\mathrm{proj}^{\prime \prime }\) are transformed with
$$\begin{aligned} T_\mathrm{scaling} = T_\mathrm{c} \begin{bmatrix} sx &{} 0 &{} 0 \\ 0 &{} sy &{} 0 \\ 0 &{} 0 &{} 1 \end{bmatrix} T_\mathrm{c}^{\prime }. \end{aligned}$$
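The per-axis scale estimation can be sketched as follows (a minimal NumPy illustration of the ratio of mean centroid distances; names and data are ours, not the paper's):

```python
import numpy as np

def scale_factors(ct_edge_pts, mri_edge_pts, centroid):
    """Estimate (sx, sy) as the ratio of the mean absolute distances
    from the edge points to the shared centroid, per axis.
    Both inputs are (n, 2) arrays of edge points."""
    ct_d = np.mean(np.abs(ct_edge_pts - centroid), axis=0)   # (CTxs_bar, CTys_bar)
    mri_d = np.mean(np.abs(mri_edge_pts - centroid), axis=0)  # (MRIxs_bar, MRIys_bar)
    sx, sy = ct_d / mri_d
    return sx, sy
```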

3.3.5 Resampling

Finally, \(T_\mathrm{scaling}\), \(T_\mathrm{rotation}\), and \(T_\mathrm{translation}\) are multiplied together to give the transformation matrix \(T_\mathrm{tran} = T_\mathrm{scaling} T_\mathrm{rotation} T_\mathrm{translation}\). In resampling, we take the coordinate of each voxel in the CT image (i.e., the reference image) and determine the coordinate of the corresponding voxel in the MRI image (i.e., the sensed image)
$$\begin{aligned} \left[ \begin{array}{l} X \\ Y \\ 1 \end{array} \right] = T_\mathrm{tran}^{-1} \times \left[ \begin{array}{l} x \\ y \\ 1 \end{array} \right] \end{aligned}$$
where \((x,y)\) is the coordinate of a voxel in the reference image and \((X,Y)\) is its counterpart in the sensed image. Note that as \((X,Y)\) might fall on a position where no discrete voxel lies, an interpolation method should be adopted to estimate this value from the intensities of discrete locations around it. In this case, cubic convolution is applied.
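The inverse-mapping loop can be sketched as below. Note this is a simplified 2D illustration that uses nearest-neighbour lookup for brevity, whereas the paper applies cubic convolution interpolation; the function name and test data are ours:

```python
import numpy as np

def resample(sensed, t_tran):
    """Inverse-map each reference coordinate (x, y) through T_tran^-1 to a
    sensed coordinate (X, Y) and sample the sensed image there
    (nearest neighbour; the paper uses cubic convolution)."""
    h, w = sensed.shape
    out = np.zeros_like(sensed)
    inv = np.linalg.inv(t_tran)
    for y in range(h):
        for x in range(w):
            X, Y, _ = inv @ np.array([x, y, 1.0])
            xi, yi = int(round(X)), int(round(Y))
            if 0 <= xi < w and 0 <= yi < h:   # outside-image samples stay zero
                out[y, x] = sensed[yi, xi]
    return out
```

Iterating over the reference grid and mapping backwards guarantees every output voxel receives exactly one value, avoiding the holes a forward mapping would leave.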

Thus, we have the resampled MRI image \({\text {MRI}}^\prime \). For convenience, let \({\text {CT}}^\prime = {\text {CT}}\).

3.4 Blending

Blending fuses the information from both modalities. The outcome of this step should have clear outlining of bone tissues and detailed illustration of soft tissues. The intensities of bone tissues are very high in CT images and very low in MRI images, so simply averaging the intensities of the two images is not meaningful. Instead, we take the maximum of corresponding voxels in \({\text {CT}}^\prime \) and \({\text {MRI}}^\prime \) to give the blended image \(I_\mathrm{blended}(x,y,z) = {\text {max}}({\text {CT}}^\prime (x,y,z), {\text {MRI}}^\prime (x,y,z))\). This helps to maximise information retention in CT-MRI fusion [19].
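In NumPy the voxel-wise maximum rule is a one-liner (assuming both volumes share the same shape and a comparable intensity range after normalisation):

```python
import numpy as np

def blend_max(ct, mri):
    """Voxel-wise maximum: keeps bright bone from CT and bright soft
    tissue from MRI in a single fused volume."""
    return np.maximum(ct, mri)
```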

4 Results and evaluation

We implemented our solution in Matlab 2012b and ran it on a computer with a 2.13 GHz CPU and 2 GB RAM. The complete computation took 62.3 s. The original matrix sizes of the CT and MRI images are \(176\times 148\times 59\) and \(128\times 128\times 59\), respectively.

The registration results are illustrated in Fig. 12. From top to bottom are the original CT sections, the corresponding aligned MRI sections, the corresponding fused sections, and an example of misaligned sections. The bone tissues are salient and easily noticed in the CT images, whereas the soft tissues in the MRI images are more detailed and abundant than their counterparts. The alignment is visually evident: bone tissues from the CT image fit well with the soft tissues from the MRI image. Slight misalignment manifests as the exposure of low-intensity bone tissue from the MRI image in the misaligned areas, as shown in the example. The fused image combines the merits of both modalities and contains much more abundant information than either alone.
Fig. 12
To numerically evaluate this result, the Pearson product-moment correlation coefficient of the binarised features (i.e., tibia and femur) of \({\text {CT}}^{\prime }\) and \({\text {MRI}}^{\prime }\) is computed. We evaluate with the binarised features (instead of the whole images) for two reasons. On the one hand, correct alignment of bone tissues is of greatest concern in this research: bone tissues are rigid and remain consistent between the images, while the appearance of soft tissues might change. On the other hand, CT and MRI images differ from each other in many ways; the evaluation is more comparable if only the position information of the features is taken into account, hence the binarisation.

Pearson’s correlation coefficient between two variables is defined as the covariance of the two variables divided by the product of their standard deviations
$$\begin{aligned} r(X, Y) = \frac{\sum ^n _{i=1}(X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum ^n _{i=1}(X_i - \bar{X})^2} \sqrt{\sum ^n _{i=1}(Y_i - \bar{Y})^2}} \end{aligned}$$
In our case, the correlation coefficient is
\(r({\text {CT}}^\prime _\mathrm{feature}, {\text {MRI}}^\prime _\mathrm{feature}) = 92.69\,\%\), where \({\text {CT}}^\prime _\mathrm{feature}\) and \({\text {MRI}}^\prime _\mathrm{feature}\) denote the binarised feature images of \({\text {CT}}^\prime \) and \({\text {MRI}}^\prime \), respectively. In contrast, the correlation coefficient of the binarised feature images before alignment is \(r_\mathrm{before} = 13.87\,\%\). This demonstrates the improvement achieved by the alignment process. To show the accuracy more intuitively, we project the features of the CT and MRI images after alignment along each axis and show the overlaid outlines in Fig. 13. The two outlines closely intertwine, with only slight variance.
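The coefficient can be computed directly from the flattened binarised volumes, e.g. with NumPy's `corrcoef` (a sketch of the evaluation metric, not the paper's MATLAB code; the function name is ours):

```python
import numpy as np

def pearson_binary(a, b):
    """Pearson product-moment correlation coefficient between two
    binarised feature volumes of the same shape."""
    x = a.ravel().astype(float)
    y = b.ravel().astype(float)
    return np.corrcoef(x, y)[0, 1]
```

Because the inputs are binary masks of the same object, the coefficient effectively measures positional overlap of the features, which is why it suits this evaluation.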
Fig. 13

Overlaid outlines of the projected features, a along \(z\)-axis, b along \(x\)-axis, c along \(y\)-axis, corresponding to the layouts of Fig. 12

5 Conclusion

In this paper, we present a feature-based solution for 3D registration of CT and MRI images of the human knee. The experiment shows that our solution is accurate. In our solution, statistical information is used to help extract features, estimate parameters and improve the accuracy. The model provided by this solution will be further used to analyse the effect of posterior cruciate ligament and anterior cruciate ligament deficiency. In future work, we will further verify the robustness of our solution as new datasets are acquired, to give it more generality.


References

  1. Fripp, J., Warfield, S.K., Crozier, S., Ourselin, S.: In: Pattern Recognition, 2006. ICPR 2006. 18th International Conference on, vol. 1, pp. 167–170. IEEE (2006)
  2. Goshtasby, A.A.: Image Registration: Principles, Tools and Methods. Springer, New York (2012)
  3. Alam, M.M., Howlader, T., Rahman, S.M.: Entropy-based image registration method using the curvelet transform. Signal Image Video Process. 8(3), 491–505 (2014)
  4. Maes, F., Collignon, A., Vandermeulen, D., Marchal, G., Suetens, P.: Multimodality image registration by maximization of mutual information. IEEE Trans. Med. Imaging 16(2), 187 (1997)
  5. Kim, J., Fessler, J.A.: Intensity-based image registration using robust correlation coefficients. IEEE Trans. Med. Imaging 23(11), 1430 (2004)
  6. Bouchiha, R., Besbes, K.: Comparison of local descriptors for automatic remote sensing image registration. Signal Image Video Process. 1–7 (2013)
  7. Can, A., Stewart, C.V., Roysam, B., Tanenbaum, H.L.: A feature-based technique for joint, linear estimation of high-order image-to-mosaic transformations: mosaicing the curved human retina. IEEE Trans. Pattern Anal. Mach. Intell. 24(3), 347 (2002)
  8. Dai, X., Khorram, S.: A feature-based image registration algorithm using improved chain-code representation combined with invariant moments. IEEE Trans. Geosci. Remote Sens. 37(5), 2351 (1999)
  9. Boda, S.: Feature-Based Image Registration. Ph.D. thesis (2009)
  10. Zitova, B., Flusser, J.: Image registration methods: a survey. Image Vis. Comput. 21(11), 977 (2003)
  11. Tomaževič, D., Likar, B., Pernuš, F.: Multi-feature mutual information image registration. Image Anal. Stereol. 31(1), 43 (2012)
  12. Ji, Z., Wei, H.: The registration of knee joint images with preprocessing. Int. J. Image Graphics Signal Process. (IJIGSP) 3(4), 10 (2011)
  13. Powell, M.J.: An efficient method for finding the minimum of a function of several variables without calculating derivatives. Comput. J. 7(2), 155–162 (1964)
  14. Pan, X., Zhao, K., Liu, J., Kang, Y.: In: Biomedical Engineering and Informatics (BMEI), 2010 3rd International Conference on, vol. 1, pp. 18–22. IEEE (2010)
  15. Kapur, T., Beardsley, P., Gibson, S., Grimson, W., Wells, W.: In: Proceedings of the IEEE International Workshop on Model-Based 3D Image Analysis, pp. 97–106 (1998)
  16. Poynton, C.: Digital Video and HD: Algorithms and Interfaces. Morgan Kaufmann, Burlington (2012)
  17. Otsu, N.: A threshold selection method from gray-level histograms. Automatica 11(285–296), 23 (1975)
  18. Roberts, L.G.: Machine Perception of Three-Dimensional Solids. Technical Report, DTIC Document (1963)
  19. Shah, P., Srikanth, T., Merchant, S.N.: In: Desai, U.B. (ed.) Signal, Image and Video Processing, pp. 1–16 (2013)

Copyright information

© Springer-Verlag London 2014

Authors and Affiliations

  • Jingjie Zheng (1)
  • Zhenyan Ji (1)
  • Kuangdi Yu (1)
  • Qin An (1)
  • Zhiming Guo (1)
  • Zuyi Wu (2)

  1. School of Software Engineering, Beijing Jiaotong University, Beijing, People’s Republic of China
  2. School of Science, Beijing Jiaotong University, Beijing, People’s Republic of China