1 Introduction

Magnetic resonance image (MRI) denoising is a key preprocessing step in many image processing and analysis tasks, and a large number of papers have addressed this topic [1]. Most denoising methods can be classified into those that exploit the intrinsic pattern redundancy of the images and those that exploit their sparseness properties.

In the first class, the well-known non-local means (NLM) filter [2] is perhaps the most representative method, and the literature on extensions of this method is quite extensive [3,4,5]. On the other hand, sparseness-based methods try to reduce the noise by assuming that most of the signal can be sparsely represented using few basis signals (using a fixed basis, as in the FFT or DCT [6], or a data-dependent basis obtained, for example, with PCA [7]).

Recently, deep learning methods have also been proposed to denoise MR images by training different architectures with pairs of noisy and noise-free inputs and outputs. Such methods try to infer the clean image from the noisy input. The main benefit of these techniques is that, after training, the denoising can be applied extremely fast (on GPUs). One of the first deep learning methods for denoising was proposed by Gondara [8], who used convolutional denoising autoencoders with a bottleneck strategy to denoise 2D images. Benou et al. [9] proposed a spatio-temporal denoising method using restricted Boltzmann machines. More recently, Jiang et al. [10] proposed a specific Rician noise filter using a slice-wise convolutional neural network.

In this paper, we present a novel denoising approach based on the application of a 3D Convolutional Neural Network using an overcomplete patch-based sliding window scheme. The resulting filtered image is used as a guide image to accurately estimate the voxel similarities within a rotationally invariant NLM (RI-NLM) strategy as done in Manjón et al. [7].

2 Materials and Methods

2.1 Image Data

Training Dataset:

To train a supervised neural network, a ground truth is needed to teach the network what the desired output looks like. Unfortunately, noise-free images do not exist, and the only two options are to simulate noise-free images or to work with a low-noise image resulting from the averaging of multiple acquisitions and to consider it a bronze standard. The first option indeed provides images with zero noise, but at the expense of a simpler and less realistic anatomy. The second is anatomically more complete, but the zero-noise condition is not met. In this paper, we have used both approaches.

MNI Synthetic Dataset:

We used 20 simulated T1 brain MRIs from the MNI brain simulator. To train the network, several levels of stationary Gaussian noise (1% to 9% of the maximum intensity) were added to generate the training data.

IXI Dataset:

Since acquiring multiple MRIs to average is a costly process, we used denoised images from the IXI dataset as a surrogate. Specifically, we randomly selected 30 T1 MRIs from this dataset and denoised them using the PRI-NL-PCA method [7], which is a state-of-the-art method. The denoised images had virtually no noise, and the anatomy was minimally affected by the application of the filter, as can be checked in the residual image obtained by subtracting the denoised image from the noisy one. Again, several levels of stationary Gaussian noise (1% to 9% of the maximum intensity) were added to generate the training data.

Test Dataset:

To quantitatively compare the proposed method with previous methods, we used the well-known Brainweb 3D T1-weighted MRI phantom [11] as the test dataset. This synthetic dataset has a size of 181 × 217 × 181 voxels (voxel resolution = 1 mm³) and was corrupted with different levels of stationary Gaussian and Rician noise (1% to 9% of the maximum intensity). Rician noise was generated by adding Gaussian noise to the real and imaginary parts and then computing the magnitude image.
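As an illustration, a minimal sketch of this corruption procedure (assuming the noise level is given as a percentage of the maximum intensity and that the noise-free image is purely real, both implementation details not stated in the text) could look as follows:

```python
import numpy as np

def add_rician_noise(image, sigma_percent, rng=None):
    """Corrupt a noise-free magnitude image with Rician noise.

    Gaussian noise with standard deviation sigma (a percentage of the maximum
    intensity) is added to the real and imaginary channels, and the magnitude
    image is then recomputed, as described in the text.
    """
    rng = np.random.default_rng() if rng is None else rng
    sigma = sigma_percent / 100.0 * image.max()
    real = image + rng.normal(0.0, sigma, image.shape)  # noise-free signal assumed real
    imag = rng.normal(0.0, sigma, image.shape)          # zero imaginary component
    return np.sqrt(real ** 2 + imag ** 2)
```

Setting the imaginary channel to zero noise only would yield stationary Gaussian noise instead.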

2.2 Preprocessing

Classic preprocessing in deep learning consists of centering the images by subtracting the mean and dividing by the standard deviation. Since our proposed method uses 3D patches as the input of the network, this operation could be done for each patch independently. However, since we use a sliding-window approach to denoise the images, we used a different strategy to minimize the block artifacts that could arise after mean and standard deviation restoration.

First, we estimated a low-pass filtered image with a box-car kernel of the same size as the patch (local mean map). Second, we estimated a local standard deviation map using the same patch size. Afterwards, these two maps were used to normalize the input and output volumes by subtracting the local mean map and dividing by the local standard deviation map (see Fig. 1). We found that this approach introduces significantly fewer blocking artifacts than the standard approach.
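A minimal sketch of this local normalization, assuming a scipy box-car filter and a small epsilon to avoid divisions by zero (implementation details not specified in the text), could be:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def local_normalization(volume, patch_size=12, eps=1e-6):
    """Normalize a volume using local mean and standard deviation maps.

    A box-car kernel of the same size as the training patches is used, so the
    local statistics can be restored after filtering without block artifacts.
    """
    local_mean = uniform_filter(volume, size=patch_size)
    local_sq_mean = uniform_filter(volume ** 2, size=patch_size)
    local_std = np.sqrt(np.maximum(local_sq_mean - local_mean ** 2, 0.0))
    normalized = (volume - local_mean) / (local_std + eps)
    return normalized, local_mean, local_std
```

After denoising, the output volume is de-normalized by multiplying by the local standard deviation map and adding back the local mean map.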

Fig. 1. 2D example of the proposed patch-based CNN model. Block design: Red (Batch Normalization), Blue (3D convolution) and Green (ReLU). (Color figure online)

2.3 Proposed Method

The proposed approach is based on a patch-wise single-scale CNN (no max-pooling). The input and output of the proposed CNN are 3D patches of size 12 × 12 × 12 voxels. Such patches are extracted from the pre-processed images in an overcomplete manner, with an overlap of 6 voxels in all three dimensions.
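For illustration, overcomplete patch extraction with a 6-voxel offset could be sketched as follows (the border handling is an assumption, since it is not detailed in the text):

```python
import numpy as np

def extract_patches(volume, patch_size=12, stride=6):
    """Extract overlapping 3D patches and their origins (overcomplete sampling)."""
    starts = []
    for axis_len in volume.shape:
        idx = list(range(0, axis_len - patch_size + 1, stride))
        if idx[-1] != axis_len - patch_size:  # ensure the last patch reaches the border
            idx.append(axis_len - patch_size)
        starts.append(idx)
    patches, origins = [], []
    for x in starts[0]:
        for y in starts[1]:
            for z in starts[2]:
                patches.append(volume[x:x + patch_size, y:y + patch_size, z:z + patch_size])
                origins.append((x, y, z))
    return np.stack(patches), origins
```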

Differently from other approaches, where different networks are trained to filter different levels of noise [10], our pre-processing normalizes the amount of noise present in each patch to approximately one. Thanks to this, our network is able to blindly deal with arbitrary levels of noise and, in addition, is naturally suited to deal with spatially variant noise levels, which are quite common in modern MRIs.

Overcomplete Patch-Based CNN

The topology of the proposed network is the following. First, one input block of size 12 × 12 × 12, composed of one 3D convolution with 64 filters of 3 × 3 × 3 voxels and a ReLU layer. Then, seven repeated blocks composed of a Batch Normalization, a 3D convolution and a ReLU. Finally, a last block composed of a Batch Normalization and a 3D convolution that produces a 12 × 12 × 12 output patch (see Fig. 1). To train the network, we used the ADAM optimizer, 100 epochs and a batch size of 128 patches. We used an early stopping criterion based on the validation data, which represented 10% of the training data. The whole network has a total of 779,009 trainable parameters.
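This description could be sketched, for example, with Keras as follows. Padding, initialization, the early-stopping patience and other unreported details are assumptions, so the exact parameter count of this sketch may differ slightly from the reported 779,009:

```python
from tensorflow.keras import layers, models, optimizers, callbacks

def build_patch_cnn(patch_size=12, n_filters=64, n_blocks=7):
    """Patch-based single-scale 3D CNN (no max-pooling), as described above."""
    inputs = layers.Input(shape=(patch_size, patch_size, patch_size, 1))
    x = layers.Conv3D(n_filters, 3, padding='same')(inputs)  # input block: Conv3D + ReLU
    x = layers.Activation('relu')(x)
    for _ in range(n_blocks):                                 # seven BN + Conv3D + ReLU blocks
        x = layers.BatchNormalization()(x)
        x = layers.Conv3D(n_filters, 3, padding='same')(x)
        x = layers.Activation('relu')(x)
    x = layers.BatchNormalization()(x)                        # output block: BN + Conv3D
    outputs = layers.Conv3D(1, 3, padding='same')(x)
    model = models.Model(inputs, outputs)
    model.compile(optimizer=optimizers.Adam(), loss='mse')
    return model

# Training setup taken from the text: 100 epochs, batch size 128, early stopping
# on a 10% validation split (the patience value is an assumption).
# model = build_patch_cnn()
# model.fit(noisy_patches, target_patches, epochs=100, batch_size=128,
#           validation_split=0.1,
#           callbacks=[callbacks.EarlyStopping(patience=10, restore_best_weights=True)])
```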

We used a residual learning approach (i.e., the network learns to produce a noise map), as in [10], instead of using residual connections in the network (i.e., training the network to directly produce the denoised image), as we found this option more effective (faster training and better results). Basically, instead of learning the noise-free patch, we learn the noise present in the patch. The network does this simply by removing the correlated information present in the input layer. Differently from [10], where the network removes the original image from the input, our network starts with a pre-processed patch that is highly similar to the output patch. Therefore, the effort required from the network to remove the anatomy is lower and the problem to solve is easier.
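A minimal sketch of how such residual targets could be built and inverted (the array names are illustrative, not from the original paper):

```python
import numpy as np

def make_residual_targets(noisy_patches, clean_patches):
    """Residual learning: the training target is the noise map in each normalized patch."""
    return np.asarray(noisy_patches) - np.asarray(clean_patches)

def recover_denoised(noisy_patches, predicted_noise):
    """At test time, the denoised patch is the noisy patch minus the predicted noise."""
    return np.asarray(noisy_patches) - np.asarray(predicted_noise)
```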

We trained the network using around 300,000 patches randomly selected from the cases of each dataset (i.e., we trained one network using only patches from the MNI dataset and another using only the IXI dataset). We used the mean squared error as the loss function. Once the network is trained, the test image (i.e., the Brainweb phantom) is filtered using an overcomplete 3D sliding-window approach. This overcomplete approach further reduces the noise by averaging several overlapping estimations and contributes to reducing block artifacts.
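The overlapping estimates can then be averaged back into the volume. A sketch of this aggregation, assuming patches were extracted as in the earlier sketch and that the residual convention has already been inverted patch-wise, could be:

```python
import numpy as np

def reassemble_patches(patch_estimates, origins, volume_shape, patch_size=12):
    """Average overlapping denoised patch estimates back into a volume.

    Each voxel value is the mean of all overlapping estimates covering it,
    which further reduces noise and limits block artifacts.
    """
    accum = np.zeros(volume_shape, dtype=np.float64)
    count = np.zeros(volume_shape, dtype=np.float64)
    for patch, (x, y, z) in zip(patch_estimates, origins):
        accum[x:x + patch_size, y:y + patch_size, z:z + patch_size] += patch
        count[x:x + patch_size, y:y + patch_size, z:z + patch_size] += 1.0
    return accum / np.maximum(count, 1.0)
```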

Rotational Invariant Denoising

As shown in Manjón et al. [7], when a good-quality pre-filtered image is available, we can use it within a rotationally invariant NLM filter to robustly perform a local similarity estimation defined as follows:

$$ \hat{A}(i) = \frac{\sum\limits_{j \in \Omega} w(i,j)\,y(j)}{\sum\limits_{j \in \Omega} w(i,j)} \qquad w(i,j) = e^{-\frac{1}{2}\left(\frac{(g(i) - g(j))^{2} + 3(\mu_{N_i} - \mu_{N_j})^{2}}{2h_{i}^{2}}\right)} $$
(2)

where µNi and µNj are the mean values of the patches Ni and Nj around voxels i and j in the guide image g, hi is related to the standard deviation of the noise present in image y, and Ω represents the positions of the elements of the search volume. We refer the interested reader to the original paper [7] for the full details of the rotationally invariant NLM filter. The Rician noise bias was removed as described in Manjón et al. [7].
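A simplified sketch of Eq. (2) is given below, assuming a cubic search volume, circular (wrap-around) border handling via np.roll, and a precomputed noise map h; none of these implementation choices come from the original paper:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def ri_nlm(y, guide, h_map, search_radius=5, patch_size=3, eps=1e-6):
    """Rotationally invariant NLM guided by the CNN-filtered image (Eq. 2).

    Weights are computed from the guide image g and the local patch means,
    and applied to the noisy image y; h_map holds the (possibly spatially
    varying) noise standard deviation.
    """
    mu = uniform_filter(guide, size=patch_size)  # local patch means of the guide image
    num = np.zeros_like(y)
    den = np.zeros_like(y)
    offsets = range(-search_radius, search_radius + 1)
    for dx in offsets:
        for dy in offsets:
            for dz in offsets:
                g_j = np.roll(guide, (dx, dy, dz), axis=(0, 1, 2))
                mu_j = np.roll(mu, (dx, dy, dz), axis=(0, 1, 2))
                y_j = np.roll(y, (dx, dy, dz), axis=(0, 1, 2))
                d2 = (guide - g_j) ** 2 + 3.0 * (mu - mu_j) ** 2
                w = np.exp(-0.5 * d2 / (2.0 * h_map ** 2 + eps))
                num += w * y_j
                den += w
    return num / np.maximum(den, eps)
```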

It is worth noting that applying this rotationally invariant NLM with the proposed CNN guide image not only outperforms using the CNN alone but also helps to remove the small remaining block artifacts.

3 Experiments and Results

In this section, a set of experiments is presented to show how the hyperparameters of the proposed network were selected, together with some comparisons with state-of-the-art methods. To evaluate the results, we used the Peak Signal-to-Noise Ratio (PSNR) estimated between the denoised image and the noise-free Brainweb phantom.
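For reference, the PSNR can be computed as follows (taking the peak value as the maximum intensity of the noise-free reference, which is an assumption):

```python
import numpy as np

def psnr(denoised, reference):
    """Peak Signal-to-Noise Ratio (in dB) between a denoised volume and the
    noise-free reference (e.g., the Brainweb phantom)."""
    mse = np.mean((denoised.astype(np.float64) - reference.astype(np.float64)) ** 2)
    return 10.0 * np.log10(reference.max() ** 2 / mse)
```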

3.1 Network Topology

We explored several options when designing the proposed patch-based CNN, such as the patch size, the number of layers and the number of filters. For the number of filters, we tested 16, 32 and 64 filters; the results showed that the higher the number of filters, the better the results. We chose 64 filters because 128 filters significantly increased the model size and training time while the improvement was modest. Regarding the number of layers, we found that increasing the number of layers to obtain a receptive field wider than the patch size did not significantly improve the results. We tested patch sizes of 6 × 6 × 6, 12 × 12 × 12 and 24 × 24 × 24 voxels and found that the best results were obtained for 12 × 12 × 12 voxels (with 7 internal blocks covering a receptive field of 17 × 17 × 17 voxels).

3.2 Impact of Training Data

We trained the designed network using both described datasets (MNI and IXI) and compared the results on the Brainweb dataset, used as the testing dataset. We added Gaussian noise (range 1 to 9%) to these images to simulate noisy cases. We did not add Rician noise, as the Rician bias correction is performed as a postprocessing step, as described in [7]. We also evaluated the impact of the level of overlapping on the final results. Specifically, offsets of 6 and 3 voxels in all three dimensions were evaluated. The results are shown in Table 1.

Table 1. PSNR results on the Brainweb phantom for the proposed method for stationary Gaussian noise with two different training datasets (the MNI and IXI datasets).

As can be noted, the best results were obtained when using the IXI dataset. This was counterintuitive, as we expected that, the patterns of the synthetic MNI dataset and the Brainweb phantom being similar, the results would be better when using this dataset. We think that the richer and more complex patterns of the IXI dataset allowed the network to generalize better. As expected, we also found that the higher the overlap, the better the results (at the expense of a higher computational time, 17 vs. 120 s).

3.3 Methods Comparison

We compared our proposed method with other recent state-of-the-art denoising methods: PRI-NL-PCA [7], BM4D [12], MCDnCNNg (blind) and MCDnCNNs (several noise-specific networks) [10], together with the proposed method trained with the IXI data. Both Gaussian and Rician noise at different levels were evaluated. Tables 2 and 3 summarize the results of the comparison.

Table 2. PSNR results on the Brainweb phantom of the compared methods for stationary Gaussian noise.
Table 3. PSNR results on the Brainweb phantom of the compared methods for stationary Rician noise. Results of the MCDnCNNg and MCDnCNNs methods [10] were estimated from Fig. 6 of that paper.

As can be noticed, the combination of the proposed PB-CNN with the RI-NLM further improves the results for both stationary Gaussian and Rician noise. The proposed method outperformed the compared methods for all noise levels and noise types.

3.4 Qualitative Evaluation on Real Images

Although the results on synthetic data are easy to interpret, they might not be realistic enough. To qualitatively evaluate the proposed method, we applied it to two real images and visually assessed the results, which can be checked in Fig. 2. As can be noticed, no anatomical information can be observed in the residuals. Finally, it is worth noting that, thanks to its patch-based nature, a network trained on T1 images can be used to effectively denoise T2 images.

Fig. 2. Denoising example of real T1 and T2 images. From left to right: noisy image, image filtered with the proposed filter, and residual image (removed Rician noise).

4 Discussion

In this paper, we have presented a new method for MRI denoising that combines the benefits of recent deep learning techniques with the strength of traditional non-local image processing methods. The proposed method is based on an overcomplete patch-based CNN that produces a pre-filtered image, which is then used as a guide image within a rotationally invariant non-local means framework.

The proposed method outperformed the compared methods for all noise levels and noise types (Gaussian and Rician) and is an effective approach to automatically reduce the amount of noise in MR images in a blind manner, thanks to its automatic adaptation to different noise levels. Furthermore, although it was not explicitly designed to do so, the proposed method is able to deal with spatially variant noise, as can be noticed in Fig. 2, thanks to its adaptive patch-based nature.