Multimodal Image Reconstruction Using Supplementary Structural Information in Total Variation Regularization

In this paper, we propose an iterative reconstruction algorithm which uses available information from one dataset collected using one modality to increase the resolution and signal-to-noise ratio of one collected by another modality. The method operates on the structural information only which increases its suitability across various applications. Consequently, the main aim of this method is to exploit available supplementary data within the regularization framework. The source of primary and supplementary datasets can be acquired using complementary imaging modes where different types of information are obtained (e.g. in medical imaging: anatomical and functional). It is shown by extracting structural information from the supplementary image (direction of level sets) one can enhance the resolution of the other image. Notably, the method enhances edges that are common to both images while not suppressing features that show high contrast in the primary image alone. In our iterative algorithm we use available structural information within a modified total variation penalty term. We provide numerical experiments to show the advantages and feasibility of the proposed technique in comparison to other methods.


Introduction
The problem of combining several images into a single one is an old problem of image fusion [1]. The challenging task is to transfer information from all data sets into a single domain which represents all the available data in the most complete way. The problem of fusing images can arise in many applications where data is acquired from different imaging systems or modalities. Recent advances in medical hybrid scanners have posed new challenges in data fusion between data sets representing different characteristics of the biological materials [2].
Functional imaging modalities, such as positron emission tomography (PET) or single photon emission computed tomography (SPECT) are used for diagnosing and monitoring oncological diseases. In medical hybrid scanners, the functional modalities are combined with anatomical imaging systems, such as X-ray computed tomography (CT) or magnetic resonance (MRI), to help in identifying the exact location of the decease. Complementary CT information is also used in PET and SPECT for attenuation correction or in some cases for the partial volume correction (PVC) which leads to improvement in resolution for functional images [3]. The measured data from hybrid scanners can be reconstructed separately and fused [4] or PVC corrected, or alternatively, the information on anatomical features can be embedded directly into reconstruction process by means of a priori information [5].
In this paper, we propose an algorithm which uses available information from one data set to increase the resolution and signal-to-noise (SNR) ratio of another one. The method operates on the structural information only which increases its suitability across various applications [6][7][8][9][10].
In [10], we used a diffusion tensor framework to build a combined tensor which exhibits the local structural properties for two datasets simultaneously. The modified diffusion tensor was used in the regularization framework to reconstruct SPECT measurements given the reconstructed MR image. The resolution of SPECT functional images was improved by the structural information from the referenced MR images. However the feasibility of the proposed technique was limited by the number of parameters needing definition. In this paper, we present a novel, yet more flexible and easy-to-use method, which has the same objective as the algorithm in [10].
The core of the new method is based on the total variation (TV) semi-norm, which has proven to be a successful tool for image recovery over the past few decades [11]. The apparent drawback of the TV semi-norm in favouring piecewise constant solutions to smooth solutions, although by considering higher order regularization terms this shortcoming can be suppressed [12][13][14][15][16]. The two-step algorithm proposed to solve the problem [12]. In the first step, one needs to smooth the vector field of the noisy image and then find a surface which fits the smoothed vector field (the second step). Subsequently, various enhancements and modifications of this method have been presented [13][14][15][16].
In this work, we are not concerned with the step of smoothing normals [12][13][14][15] or tangent vectors [16]; rather we investigate a situation where a supplementary vector field is available in the surface fitting step. We then modify the surface fitting step in such a way that additional information on edges can be easily integrated into the recovery process. Our aim is to encourage structural alignment of two images when gradient orientations tend to be parallel. On the other hand, non-parallel direction of level sets must be treated as a special case to avoid strong bias in the recovery process [10].
Our approach is tested on an image denoising and debluring problem and then applied to synthetic PET/CT reconstruction of a thorax. In these experiments, we consider the case when some image parts have common edges and some are structurally different. For image reconstruction experiment we introduce lesions into the synthetic functional phantom which are absent from the supplementary anatomical image. The main goal is to enhance the spatial resolution of functional images without loss of important features, such as introduced lesions. We compare the proposed method with another state-of-the-art method which uses supplementary information.

Image Recovery by the Surface Fitting
For a given noisy image k 0 ðx; yÞ ¼ kðx; yÞ þ gðx; yÞ, which is defined on a two dimensional rectangular domain X & R 2 , one can find its noiseless representation kðx; yÞ recovered from noise gðx; yÞ by minimizing the following cost function: where c is a regularization parameter to determine a trade-off between the data fidelity and TV semi-norm [11] respectively. The magnitude of the gradient jrkj ¼ ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ffi is calculated with a small constant to avoid instabilities in the uniform regions of k. The minimization of (1) results in the noiseless piecewise constant approximation (cartoon or staircase effect) to k [11]. One can overcome the cartoon effect of the TV minimization (1) by considering a higher order regularization terms with a two-step minimization approach [12][13][14][15][16]. The first step is generally performed with the regularization of the vector field for k, e.g. unit normal vector field [12]: where d is a regularization parameter and the unit normal vector field n is given by In the second step, the surface k is found that fits the obtained smoothed normal vectors nðkÞ. The second step was defined as the surface fitting problem and it is performed with the following minimization problem: Since the minimization of (4) using noisy normal vectors nðk 0 Þ will lead to a perturbed recovery, there has been a lot of research dedicated to the regularization of the normal vector field [12][13][14][15] or the tangential one [16]. The regularized normal vector field in the surface fitting step (4) can potentially improve image quality and remove the staircase effect of the lower order TV minimization methods (1). Note that when rk Á nðkÞ ¼ 0, the functional (4) becomes the classical TV minimization problem (1), on the other hand, when rk Á nðkÞ ¼ jrkj the smoothing term disappears and the data fidelity term is in full force. Therefore the model (4) encourages the data fidelity term when rkknðkÞ (structurally valuable regions, such as image boundaries) and more smoothing when rk?nðkÞ (uniform regions).
If h is an angle between rk and nðkÞ, then one can rewrite the right hand side term in (4) as: It is now evident that the smoothing term (5) approaches zero when h ! 0 (vectors parallel) and when h ! p=2 (vectors perpendicular) the weight of the TV penalty increases.

Embedding Structural Information into the Surface Fitting
Step In this section we will show how one can embed supplementary information into the minimization term (4). Consider the following problem: where k is an image we would like to recover from its noisy (g noise component) and blurred representation k 0 , the forward operator A implements discrete convolution (for our problem it is isotropic blurring). The supplementary image l is given as a reference image. The images can differ in intensity levels, geometry, spatial resolution and signal to noise ratio (SNR). The main goal is to recover (denoise and deblur) k using only the structural information of l, while preserving the salient features in k.
One can substitute the normal vector fields of the reference nðlÞ ¼ rl jrlj directly into (4), resulting in the regularized deblurring problem: However, this model assumes that the gradients of k are parallel to l in every ðx; yÞ. This is not a valid assumption for the reconstructed multimodal datasets where the gradient orientations can differ between acquired images. Following this observation, we believe that only parallel (or almost parallel) gradients of k and l can be modified with additional information from l.
To identify how gradient orientations for k and l are related to each other, one has to find an angle between rk and rl. In this paper we use an isotropic and recursive oriented network (IRON) to identify the gradient orientations [17]. This method has proven to be more stable to noise than derivative based approaches due to its non-local nature (see Fig. 1).
The computation of IRON for any image at a given location ðx 0 ; y 0 Þ requires computation of the variance along the lines of the network (see Fig. 1). For each angle u k ; k ¼ 1; . . .; K the variance of the network is calculated as: here v i;j refers to the interpolated gray level at location ði; jÞ on the network. The network consists of L lines and p points, for our experiments we take L ¼ 3; p ¼ 5.
To obtain the desired texture orientation value one has to find a minimum (where global relates to one orientation and local to multiple) of the variance Dðx; y; u k Þ for a given location ðx 0 ; y 0 Þ for k ¼ 1; . . .; K orientations. The number of orientations K to be tested is defined by the application, in this work we used K ¼ 16, which is sufficient for our task. For the detailed description of the IRON method we refer the reader to [17], in our implementation we used the image rotation technique. We define the texture orientations estimated with IRON method for images k and l as u k ðKÞ and u l ðKÞ respectively. Let u kl ðKÞ be the angle between u k ðKÞ and u l ðKÞ for K orientations: u kl ðKÞ ¼ u k ðKÞ À u l ðKÞ; u kl ðKÞ 2 ½Àp; p: ð9Þ Here we introduce an orientation matching measure which shows how gradient orientations are aligned with each other for k and l: when Uðu kl ðKÞÞ ! 0 the normal vectors tend to be parallel nðkÞknðlÞ or rkknðlÞ.
In this paper, we say that the gradients of k and l are parallel when Uðu kl ðKÞÞ\T, when T is a small constant. Then the image recovery of k using nðlÞ can be written as: The problem expressed in (11) describes the standard TV minimization (no prior information about the supplementary image used) for the areas where gradients k and l are not parallel. The strong prior knowledge (direction of smoothing) is embedded when the gradients tend to be parallel.
The term ðjrkj À rk Á nðlÞÞ in (11) shares some similarities with the recently proposed model for nonlinear processing of color images [18], expressed by: This model measures the degree of level sets k; l being parallel to each other and also depends on the gradient magnitudes of both images. In contrast to (12) we remove the dependency on the magnitude of the gradient for image l in our model (11). Additionally, we would argue that in finding gradient orientations the IRON technique is more robust to noise than the derivative based techniques used in minimization of (12). However, a possible drawback of our method (11) is a binary decision making approach for the use of supplementary information (no linear combinations of vectors are taken). Here we do not compare the model expressed in (12) with the proposed one (11), but it is a subject of future research.

Discretization of the Proposed Model
The optimality conditions for the saddle points of (11) are (considering only the upper part): In (14), g is the outwards unit normal vector on the boundary oX. By introducing a time variable t one can write (13) as an evolution equation: Example of an IRON symmetric network used to find orientation for the point A 0 ðx 0 ; y 0 Þ with L ¼ 3 lines and p ¼ 5 points per line For numerical implementation we use the notation of forward and backward differences: D x Ç k i;j ¼ Çðk iÇ1;j À k i;j Þ and D y Ç k i;j ¼ Çðk i;jÇ1 À k i;j Þ. We use an explicit scheme [11] to discretize (15) as: Þ minðabsðaÞ; absðbÞÞ. The parameter Dt denotes the time discretization constant and is chosen to be small for explicit schemes 0\Dt 0:25.
The proposed algorithm (11) to recover k having the supplementary image l is given in Algorithm 1.

Iterative Tomographic Reconstruction Using the Proposed Model
For tomographic reconstruction we consider a multimodal medical imaging set-up comprising functional (PET) and anatomical (X-ray CT) modalities [2]. Our aim is to reconstruct the unknown radiotracer distribution k having supplementary anatomical information l.
The image k 2 R N which is an N-dimensional vector can be reconstructed from its projections (sinogram) g 2 R M . For ET, g follows a Poisson distribution and the count measurements can be written as: where the projection or system matrix P : R N ! R M depends on the system design and the detector array geometry. In this work we do not account for the scatter effects, but the resolution of PET modality is simulated.
To reconstruct the image k from the measured data g, the following constrained cost function must be optimized: where the Kullback-Leibler (KL) distance [21] is defined as: and the regularization term RðkÞ is controlled by the parameter b.
In this work, we consider three different regularization penalties, the first one is the traditional TV semi-norm The second and the third are anatomically driven functionals which depend on both k and l: and the penalty term is based on the Bowsher method (BM) [5,10]: where function q is an edge preserving Huber function which approximates the ' 1 norm similarly to the TV semi-norm [19]. The threshold f depends on jrkj and needs to be carefully defined. The penalty R 3 performs smoothing between the central pixel i and the nearest pixel k in the local neighbourhood set @ i ðl; n 0 Þ. The neighbourhood depends on l alone and n 0 is a number of the most closest neighbours (normally 20-35 % of the total number of neighbours) of i based on the smallest absolute differences absðl i À l k Þ. The BM is based on the Gibbs assumption that the closest neighbours to the central pixel have the highest probability to be within one intensity class. One can use a simple absolute difference metric to find the most similar neighbours. This metric, however, is very sensitive to noise in images and we will demonstrate in numerical experiments later how a very low level of noise can significantly affect the quality of the recovered images. For more details on BM we refer the reader to [5,10].
Similarly to [10] we write a nested forward-backward splitting iterative algorithm [22]: Here the MLEM method solves the KL optimization sub-problem and L is an operator that performs a transition from k mþ 1 2 to k mþ1 by minimizing the following function: The standard iterative gradient descent algorithm is used to optimize equation (24): Using the proposed penalty (21) we present the following iterative reconstruction Algorithm 2 for tomographic reconstruction.

Image Recovery Using Supplementary Information
The aim of this experiment is to show that the proposed method is a flexible and easy-to-use tool for embedding supplementary structural information into the recovery process. We created two phantoms k and l (see Fig. 2a, c, respectively) in a way that geometrically different structures were present in both images. Image k is significantly degraded by an isotropic blur (we use a Gaussian filter with ½15 15 pixels kernel size and the standard deviation equal to 2.0) and noise with standard deviation of 12% of the signal (see Fig. 2b). Image l (the reference) is a less noisy dataset (0:05% of noise) with sharper features. Potentially, l can have different grey-scale intensity values, however, this will not impede the performance of the proposed method. Our aim here is to recoverk from k 0 by using available structural information in l. Features of k which are geometrically correlated (common edges) with l must be enhanced by information from l during the recovery process, meanwhile the non correlated features must be preserved ink. Since some features in k are not correlated to features in l (see LB ROI in Fig. 2a) can initiate false edges in the recovery ofk, it is essential to restrict the use of supplementary information. This is a challenging task and failing to do so will result in severe artifacts ink.
For numerical experiments we use the gradient descent approach (25) to solve the least squares problem (LS) (6): k nþ1 ¼ k n À DtðA Ã ðAk n À k 0 ÞÞ. Different regularizes then applied to LS to stabilize solution against noise, such as TV (16) where u; v ¼ 0, TV-Str method without orientation matching (7) and TV-Str with orientation matching (11) (see Algorithm 1). For the BM penalty we perform gradient descent iterations using the regularization term (22) in the Algorithm 3. For the image restoration experiment we provide the computer code which is implemented in C and Matlab languages and is available from the following link [20].
In Fig. 3 we show the gradient orientations in radians calculated using the IRON method (8). In (a), one can see how angles u k 0 ðKÞ were estimated for the degraded image (first iteration of TV-Str Algorithm 1). In (b), the orientation map for the reference image is given and in (c) the orientation matching measure is calculated for k 0 and l. One can see that even for the first iteration of the proposed TV-Str method the parallel gradients can be identified (low intensity values in (c)). During iterations of Algorithm 1 the orientation matching measure becomes more precise in identification of aligned features.
In this experiment all parameters were chosen empirically based on the known level of noise and the response from normalized mean square error (NMSE), given as: In Table 1 we provide parameters which were used for this experiment. In Fig. 4 (top) we consider the UB ROI where all features in k are ideally aligned with l. The LS method (a) fails to recover k due to strong influence of noise in the data. However, using TV regularization (b) one can remove noise and substantially improve resolution. The deblurring effect of the recovered images using the supplementary information can be clearly seen for the BM (c) and TV-Str methods (d, e). The convergence behaviour of the compared algorithms for the UB ROI can be seen in Fig. 5 (left). One can notice that the algorithms which use supplementary information give the smallest NMSE error (the BM (c) should be stopped earlier to avoid divergence). The TV-Str method without orientation matching (d) has almost the same error as TV-Str with IRON matching (e) (see Table 2).
In Fig. 4 (bottom) we consider the LB ROI where some features are aligned with each other and some are completely different. To demonstrate artifacts induced by the methods which use structural information (BM, TV-Str) we show the zoomed region of LB ROI. Very strong artifacts (horizontal lines) are visible using the BM Fig. 2 a Original image k, the upper box (UB ROI) contains features that are present in the sharper reference image, while the lower box (LB ROI) contains partially correlated or structurally different features from the reference image, b isotropically blurred and noisy image k 0 (12% of random noise), c the reference image l with 0:05% of noise. The main goal of the recovery process is to enhance resolution and SNR of k 0 by using information from l (c) without introducing any false features caused by uncorrelated edges in LB ROI and TV-Str without orientation matching (c, d). Both the BM and TV-Str without orientation matching deliver a high value of bias (see Fig. 5 (right)) for this ROI (the BM should be stopped prematurely). The proposed method with IRON orientation matching (e) delivers an image almost free of artifacts. According to the plots in Fig. 5 (LB ROI) the proposed method has slightly higher level of error than TV. In the presence of a high level of noise it is problematic to identify orientation of the gradient exactly.
In Table 2, the NMSE values for the methods are provided. The proposed method with orientation matching gives the best values for UB and competitive results for LB ROIs.

Tomographic Image Reconstruction Using the TV-Str Method
To further investigate the applicability of the proposed method we model a multimodal tomographic reconstruction problem. Our aim is to reconstruct a synthetic thorax phantom (see Fig. 6a) with supplementary information given in image (b). The functional (a) and anatomical (b) phantoms were chosen to be structurally different to examine the problem of misaligned features. Several lesions were added to the functional phantom which are absent from the reference phantom.
Each projection was generated with a strip kernel [23] using the higher resolution version of the phantom (600 Â 600 isotropic pixel grid). Reconstructions were calculated on a lower 200 Â 200 isotropic pixel grid with a linear projection model thus avoiding the ''inverse crime'' of generating the data with the same model as the model that is used for calculating the reconstruction [24]. The pixel size was chosen to be 4 mm and the characteristic blur associated with the PET system was modelled by convolving each projection with a Gaussian kernel (FWHM = 5 mm) [3]. The resolution was not modelled in the reconstruction. Poisson distributed noise (W ¼ 30 realizations) was applied to the projection data. The total number of counts was 100K per sinogram. Scatter was not simulated in this study. The number of acquisition angles was set to 400.
We compare the selected methods: MLEM (23) (upper step only), MLEM with TV penalty term (20), MLEM with TV-Str (21) and MLEM with BM (22). Since we perform the MLEM step similarly for every penalty function we reduce our notation for reconstruction methods with penalties to TV, BM, and TV-Str. Note that the TV-Str method here is the proposed method with orientation matching technique, without this step the proposed method can impose bias on the solution as shown earlier (see Sect. 3.1).
In order to compare algorithms we use the following quantitative measures: normalized absolute deviation (NAD), SNR and ROI variability.
The NAD between true activity k and the estimatedk, over a ROI, is defined as: where W is a number of noise realizations. The SNR is defined as:  Table 1 and final errors are shown in Table 2 Table 1 Parameters for the image restoration experiment wherek ROI andk B is the average of counts within the ROI and the background, respectively. NB is the total number of pixels within the background B and r W j is the ensemble standard deviation of each pixel j across all noise realizations W.
The ROI variability is defined as: Parameters for all methods were found empirically by referring to the best NAD-SNR values achievable. We did not perform a rigorous optimization for the parameter values, however certain conclusions based on the behaviour of each method can be made.  (7), e TV-Str with orientation matching (11). Note how the methods behave differently for the different ROIs. The parameters for this experiment are given in Table 1  The MLEM algorithm gives an image with poor resolution and a high level of noise (see Fig. 7). To get a better reconstruction one needs to stop the iteration process prematurely [21], however in this case we run the MLEM algorithm for 50 iterations (M ¼ 50). The quantitative analysis of lesion ROIs (see Fig. 8) and whole phantom (see Fig. 9) shows high values for NAD and low SNR. The NAD-SNR values are improved significantly with the use of TV regularization. The BM gives the lowest NAD for the L1 ROI (see Fig. 8) but quite a low SNR, for L2 ROI it shows low SNR as well. One can see the high variability level in reconstructed images on Fig. 7 as well as high value on Fig. 9. The proposed method TV-Str performs very similar to the TV penalty, but it also adds a significant amount of contrast to the edges which are considered to be common to both images. The NAD-SNR values for TV-Str (lesions ROI) show very competitive performance of the method providing higher values of SNR and NAD. For the whole phantom ROI the TV-Str method provides the best bias and the lowest ROI variability. On Fig. 7d one can see that lesions are well preserved (similar to TV) and no artifacts are visible.  In this work we have intentionally disregarded the first step of the problem (2), where one needs to regularize the normal or tangential vector fields. One can consider the case when two normal vector fields nðkÞ and nðlÞ are minimized simultaneously resulting in the combined vector field nðk; lÞ which used in the surface fitting step (4). This is an interesting, yet challenging problem and it was partially examined previously using the combined diffusion tensors approach [10]. The choice between the orientations of two vector fields and the magnitude of the gradients is a complicated task which has a non-unique solution. In the future we will consider the problem of obtaining a smoothed joint vector field nðk; lÞ.
We also used non-smoothed normals of l in (4) and no strong artifacts appeared in the solution. However, if l will be more noisy it is advisable to smooth it first before using in (4).
Notably, the functional (11) is non-differentiable due to discontinuity for k when U ¼ T. The splitting techniques based on the proximity operators can deal with discontinuous penalty terms [22]. Another option is to modify functional (11) into convex and continuous combination which consists of TV and TV-Str terms in one regularization penalty.
Normally, using complex regularization terms in image reconstruction problems is discouraged due to difficulties in finding the minimizer for (18). The proposed TV-Str term with orientation matching is simple to use, however, the orientation matching step is computationally expensive. Faster and robust techniques to identify the aligned orientations for TV-Str can be strongly beneficial.
In this paper we have shown a novel approach for incorporating available additional information into TV filtering step. The resolution of features (common for various datasets) can be significantly improved while misaligned features can be recovered without strong artifacts. The proposed technique is robust to uncorrelated data since only parallel (or almost parallel) gradients are accepted for correction. The proposed functional can be used with many applications, such as, medical hybrid imaging, dynamic imaging (when pre-scan in higher resolution is available), image fusion etc.
Light Source for the use of their facilities. This work has been supported by the Engineering and Physical Sciences Research Council under Grants EP/J010456/1 and EP/I02249X/1.
Open Access This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.