
1 Introduction

MRI images can provide various kinds of detailed information about physical health. However, external errors, inappropriate spatial encoding, body motion, etc., may jointly introduce undesirable artifacts and harmful noise into MRI images. Clean MRI images can increase the accuracy of computer vision tasks [1, 2] such as semantic segmentation [3] and object detection [4]. In the past, a wide variety of denoising methods have been proposed, such as filtering methods [5, 6] and transform-domain methods [7]. Nevertheless, these methods are restricted by numerous objective factors, such as undesirable texture changes caused by violated assumptions and heavy computational overhead. Recently, deep learning methods have made great progress in image denoising and achieve impressive results on MRI images. Due to the scarcity of medical images, researchers often need to use unpaired data during training. Generative adversarial networks (GANs) [8] have proven competitive in image generation tasks [9, 10]. One possible solution is to directly use unsupervised methods (DualGAN [11], CycleGAN [12]) to find the mappings between the clear and noised image domains. However, these general-purpose methods often encode irrelevant characteristics such as texture features, rather than noise attributes, into the generators, and thus do not produce high-quality denoised images.

Guided by the aforementioned observations, we present an MRI image denoising method called UEGAN, which uses a GAN based on decoupled expression to generate visually realistic denoised images. More specifically, we decouple the content and noise of noised images to accurately encode noise attributes into the denoising model. As shown in Fig. 1, the content encoders encode content information and the noise encoder encodes noise attributes from unpaired clear and noised MRI images. However, this structure alone cannot guarantee that the noise encoder encodes only noise attributes; it may encode content information as well. We therefore employ a noising branch to prevent the noise encoder from encoding the content attributes of n. The denoising generator \(G_{clear}\) and the noising generator \(G_{noised}\) take the corresponding content information, conditioned on the noise attributes, to generate denoised and noised MRI images. Following CycleGAN [12], we apply the adversarial loss and the cycle-consistency loss as regularizers to help the generators produce MRI images close to the originals. To further reduce the undesirable banding artifacts introduced by \(G_{noised}\) and \(G_{clear}\), we add an image quality penalty to this structure. We conduct experiments on the Brainweb MRI dataset and obtain qualitative and quantitative results that are competitive with several conventional methods and a deep learning method.

2 Related Work

Since the proposed model builds on popular denoising networks and the latest techniques of image disentangled representation, in this section we briefly review generative adversarial networks, single image denoising and disentangled representation.

2.1 Generative Adversarial Network

Generative adversarial networks [8] were brought forward to train generative models. Radford et al. [13] propose a CNN version of GANs called DCGANs. Arjovsky et al. [14] introduce a novel Wasserstein loss into GAN training. Zhang et al. [15] propose the Self-Attention GAN, which applies the attention mechanism to image generation.

2.2 Disentangled Representation

Recently, there has been rapid development in learning disentangled representations, namely decoupled expression. Tran et al. [16] unravel pose and identity components for face recognition with a model called DR-GAN. Liu et al. [17] present an identity extraction and elimination autoencoder to disentangle identity from other characteristics. Xu et al. propose FaceShapeGene [18], which correctly disentangles the shape features of different semantic facial parts.

2.3 Single Image Denoising

Image noise seriously damages image quality. Many deep learning methods focus on image denoising tasks. Jain et al. [19] first introduce convolutional neural networks (CNNs) with a small receptive field into image denoising. Chen et al. [20] combine Euclidean and perceptual loss functions to recover more edge information. According to the deep image prior (DIP) presented by Ulyanov et al. [21], abundant prior knowledge for image denoising already exists in the structure of a convolutional neural network itself.

3 Proposed Method

Inspired by GANs, single image denoising and decoupled expression, we propose an unsupervised MRI image denoising method called UEGAN with carefully designed loss functions based on decoupled expression. The structure combines the advantages of the three lines of work above and is made up of four parts: 1) content encoders \(E_N^{cont}\) for the noised image domain and \(E_C^{cont}\) for the clear image domain; 2) a noise encoder \(E^{noise}\); 3) noised and clear image generators \(G_{noised}\) and \(G_{clear}\); 4) noised and clear image discriminators \(D_N\) and \(D_C\). Given a training sample n \(\in\) N in the noised image domain and c \(\in\) C in the clear image domain, the content encoders \(E_N^{cont}\) and \(E_C^{cont}\) acquire content information from the corresponding samples, and \(E^{noise}\) extracts the noise attributes from n. Then \(E^{noise} \left( n \right)\) and \(E_C^{cont} \left( c \right)\) are fed into \(G_{noised}\) to generate a noised image \(c^n\); meanwhile, \(E^{noise} \left( n \right)\) and \(E_N^{cont} \left( n \right)\) are fed into \(G_{clear}\) to generate a clear image \(n^c\). The discriminators \(D_N\) and \(D_C\) differentiate real from generated examples. The final structure is shown in Fig. 1.

3.1 Decoupling Noise and Content

It is not easy to decouple content information from a noised image because the ground-truth image is not available in the unpaired setting. Since the clear image c is not affected by noise, the content encoder \(E_C^{cont}\) encodes content characteristics only. We share the weights of the last layers of \(E_N^{cont}\) and \(E_C^{cont}\) so that \(E_N^{cont}\) encodes as much content information from the noised image domain as possible.

Meanwhile, the noise encoder should encode only noise attributes. So we feed the outputs of \(E^{noise} \left( n \right)\) and \(E_C^{cont} \left( c \right)\) into \(G_{noised}\) to generate \(c^n\). Since \(c^n\) is a noised version of c, \(c^n\) does not contain any content information of n in the whole process. This noising branch further restrains the noise encoder from encoding the content information of n.
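As an illustration, the last-layer weight sharing between \(E_N^{cont}\) and \(E_C^{cont}\) can be sketched with toy encoders in which the final projection is literally the same array for both branches. This is a minimal numpy sketch with hypothetical layer sizes, not the actual UEGAN layers:

```python
import numpy as np

rng = np.random.default_rng(0)

# Each content encoder keeps a private first layer, while the last
# projection W_shared is the *same* array for both branches, mirroring
# the last-layer weight sharing described above. Sizes are hypothetical.
W_noised = rng.standard_normal((8, 4)) * 0.1   # private layer, noised branch
W_clear = rng.standard_normal((8, 4)) * 0.1    # private layer, clear branch
W_shared = rng.standard_normal((4, 3)) * 0.1   # shared last layer

def encode_noised_content(n):
    # E_N^cont: private nonlinearity, then the shared projection
    return np.tanh(n @ W_noised) @ W_shared

def encode_clear_content(c):
    # E_C^cont: different first layer, identical last layer
    return np.tanh(c @ W_clear) @ W_shared
```

Because both functions close over the same `W_shared`, any gradient update to the shared layer would affect both encoders, which is what pushes the two content spaces to align.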

Fig. 1.

The architecture of our network. The denoising branch (bottom noising branch) is represented by solid lines (dotted lines). \(E_N^{cont}\) and \(E_C^{cont}\) are content encoders for noised and clear images. \(E^{noise}\) is a noise encoder. \(G_{noised}\) and \(G_{clear}\) are the noised image and clear image generators. GAN losses are added to differentiate \(c^n\) from noised images, and \(n^c\) from clear images. Cycle-consistency loss is employed on n and \(n^{\prime}\), c and \(c^{\prime}\). IE loss is applied to n and \(n^c\).

3.2 Adversarial Loss

In order to acquire cleaner outputs, we introduce adversarial loss functions for both the clear and the noised image domains. For the clear image domain, we define the adversarial loss \(L_{D_C }\):

$$ L_{D_C } = {\mathbb{E}}_{c \sim p\left( c \right)} [\log \,D_C (c)] + {\mathbb{E}}_{n \sim p\left( n \right)} [\log (1 - D_C (G_{clear} (E_N^{cont} (n),\,z)))]. $$
(1)

where z \(= E^{noise} \left( n \right)\). \(D_C\) strives to maximize the objective function to differentiate denoised images from real clear images. In contrast, \(G_{clear}\) tries to minimize the objective function to make denoised images look similar to real samples in the clear image domain. For the noised image domain, we define the loss \(L_{D_N }\):

$$ L_{D_N } = {\mathbb{E}}_{n \sim p\left( n \right)} [\log \,D_N (n)] + {\mathbb{E}}_{c \sim p\left( c \right)} [\log (1 - D_N (G_{noised} (E_C^{cont} (c),\,z)))]. $$
(2)
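Both Eq. (1) and Eq. (2) have the standard GAN discriminator form. The following is a minimal sketch of how such a loss could be computed from discriminator scores; the network internals are abstracted away and scores are assumed to lie strictly in (0, 1):

```python
import numpy as np

def adversarial_loss(d_real, d_fake):
    """Discriminator objective of Eqs. (1)-(2):
    E[log D(real)] + E[log(1 - D(fake))].

    d_real: discriminator scores on real samples, in (0, 1).
    d_fake: discriminator scores on generated samples, in (0, 1).
    """
    d_real = np.asarray(d_real, dtype=np.float64)
    d_fake = np.asarray(d_fake, dtype=np.float64)
    return float(np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake)))
```

The discriminator updates its weights to maximize this value, while the generator minimizes the second term, as described above.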

3.3 Image Quality Penalty

In our experiments, we observed that the denoised images \(n^c\) usually contain unpleasant banding artifacts. So we introduce image information entropy (IE) [22], which measures the amount of information in an image, to reduce the banding artifacts. The IE loss guides the generator to produce MRI images with less noise. The loss is defined as:

$$ L_{IE} \left( {G_{clear} \left( z \right)} \right) = \sum\nolimits_{i = 0,\;p\left( i \right) \ne 0}^d {p\left( i \right)\log \frac{1}{p\left( i \right)}.} $$
(3)

where d is the range of image intensity and p(i), i = 0, 1, 2, …, d, is the probability distribution of the intensity of the output \(G_{clear} \left( z \right)\).
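Equation (3) is the Shannon entropy of the intensity histogram of the generated image. A possible implementation, assuming 8-bit intensities (d = 255) and natural logarithms (the paper does not specify the base):

```python
import numpy as np

def ie_loss(img, d=255):
    """Image information entropy of Eq. (3):
    sum over intensities i with p(i) != 0 of p(i) * log(1 / p(i))."""
    # p(i): normalized intensity histogram over [0, d]
    hist, _ = np.histogram(img, bins=d + 1, range=(0, d + 1))
    p = hist / hist.sum()
    p = p[p > 0]  # the sum in Eq. (3) skips intensities with p(i) = 0
    return float(np.sum(p * np.log(1.0 / p)))
```

A constant image has entropy 0, while banding artifacts spread probability mass over extra intensity levels and raise the entropy, which is why minimizing this term suppresses them.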

3.4 Cycle-Consistency Loss

\(G_{clear}\) should be able to generate visually realistic clear images after the minimax game. However, without the guidance of pairwise supervision, the denoised image \(n^c\) may fail to retain the content information of the original noised sample n. Therefore, we introduce the cycle-consistency loss to ensure that the denoised image \(n^c\) can be re-noised to reconstruct the original noised image and that \(c^n\) can be translated back to the original clear image domain. The loss preserves more content information of the corresponding original samples. In more detail, we define the forward translation as:

$$ n^c = G_{clear} (E_N^{cont} (n),\,E^{noise} (n)), $$
$$ c^n = G_{noised} (E_C^{cont} (c),\,E^{noise} (n)). $$
(4)

And the backward translation as:

$$ n^{\prime} = G_{noised} (E_C^{cont} (c^n ),E^{noise} (n^c )), $$
$$ c^{\prime} = G_{clear} (E_N^{cont} (n^c ),E^{noise} (n^c )). $$
(5)

We perform the loss on both domains as follows:

$$ L_{cc} = {\mathbb{E}}_{c \sim p\left( c \right)} \left[ {\left\| {c - c^{\prime}} \right\|_1 } \right] + {\mathbb{E}}_{n \sim p\left( n \right)} \left[ {\left\| {n - n^{\prime}} \right\|_1 } \right]. $$
(6)
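A sketch of Eq. (6) on raw image arrays; the expectations are replaced by means over pixels and batching is omitted:

```python
import numpy as np

def cycle_consistency_loss(c, c_rec, n, n_rec):
    """L_cc of Eq. (6): L1 distance between each original image and its
    reconstruction from the backward translation (c vs c', n vs n')."""
    c, c_rec = np.asarray(c, dtype=np.float64), np.asarray(c_rec, dtype=np.float64)
    n, n_rec = np.asarray(n, dtype=np.float64), np.asarray(n_rec, dtype=np.float64)
    return float(np.mean(np.abs(c - c_rec)) + np.mean(np.abs(n - n_rec)))
```

The loss is zero exactly when both reconstructions match their originals pixel for pixel, which is the behavior the cycle constraint rewards.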

Meanwhile, we carefully balance the weights among the aforementioned losses to prevent \(n^c\) from staying too close to n.

The total objective function is a combination of all the losses from (1) to (6) with respective weights:

$$ L = \lambda_{adv} L_{adv} + \lambda_{IE} L_{IE} + \lambda_{cc} L_{cc} . $$
(7)

3.5 Testing

During testing, the noising branch is removed. Given a test image a, \(E_N^{cont}\) and \(E^{noise}\) extract the content information and noise attributes. Then \(G_{clear}\) takes the outputs and generates the denoised image A:

$$ A = G_{clear} (E_N^{cont} (a),\,E^{noise} (a)). $$
(8)

4 Experiments and Analysis

We compare the MRI image denoising performance of our work with non-local means (NLM) [23] and a deep learning method, DIP. To analyze the performance of the denoising methods quantitatively, peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) are employed. We evaluate the proposed model on the Brainweb MRI dataset. The unpaired training set of 150 MRI images consists of the following two parts:


  1) Samples from the noised image domain consist of seventy-five slices with a slice thickness of 1 mm and additive Gaussian noise with standard deviation sigma of 25.

  2) Samples from the clear image domain (no additional Gaussian noise) consist of seventy-five slices with a slice thickness of 1 mm.
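The noised training samples could be synthesized from clean Brainweb slices roughly as follows; the clipping to [0, 255] is our assumption, since the text only specifies the noise standard deviation:

```python
import numpy as np

def make_noised_slice(clean_slice, sigma=25.0, seed=None):
    """Synthesize a sample for the noised domain by adding zero-mean
    Gaussian noise with standard deviation sigma to a clean MRI slice.
    Clipping to the 8-bit range is an assumption of this sketch."""
    rng = np.random.default_rng(seed)
    noisy = clean_slice.astype(np.float64) + rng.normal(0.0, sigma, clean_slice.shape)
    return np.clip(noisy, 0.0, 255.0)
```

Note that the resulting noised and clear sets are kept unpaired during training: a noised slice and a clear slice drawn in the same batch need not come from the same anatomy.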

4.1 Implementation Details

We train our network UEGAN using the PyTorch 1.4.0 package on a computer with an Intel i9 9300k CPU, an NVIDIA RTX 2080Ti GPU, 32 GB memory and Windows 10, on the Brainweb MRI dataset. UEGAN is optimized with the gradient-based Adam optimizer, whose hyper-parameters are set as β1 = 0.5, β2 = 0.999 and Nepoch = 100000; the learning rate of all generators is 2e−4 and the learning rate of all discriminators is 1e−4. We train at the original size of 208 × 176 with a batch size of 4. We experimentally set the hyper-parameters \(\lambda_{adv}\) = 1, \(\lambda_{cc}\) = 10 and \(\lambda_{IE}\) = 10.

4.2 Experimental Results

In this section, we compare our method with NLM and DIP; the denoising performance is shown in Fig. 2. For NLM, the denoising results are blurry and a great quantity of local details are missing. In contrast, our visual results have sharper textures and more structural details.

DIP produces artifacts and cannot recover meaningful MRI image information. In contrast, our model UEGAN obtains more distinct results and less noise, especially in local regions.

UEGAN achieves the best visual performance in denoising and image information recovery.

Fig. 2.

Visual denoising results on three selected MRI slices. Columns, from left to right: noised image, NLM, DIP, the proposed method UEGAN, and noise-free image.

4.3 Quantitative Analysis

Two quantitative metrics, PSNR and SSIM, are adopted to assess the effects of the traditional image denoising method NLM, the deep learning method DIP and our work UEGAN. As shown in Table 1 and Table 2, the denoising results of our work show superior performance to the other algorithms on both quantitative evaluation indexes.
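For reference, PSNR and a simplified SSIM can be computed as follows. The standard SSIM uses an 11 × 11 sliding Gaussian window; the single-window version below, computed over the whole image, is a simplification for illustration:

```python
import numpy as np

def psnr(ref, test, data_range=255.0):
    """Peak signal-to-noise ratio in dB."""
    ref = np.asarray(ref, dtype=np.float64)
    test = np.asarray(test, dtype=np.float64)
    mse = np.mean((ref - test) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(data_range ** 2 / mse))

def global_ssim(x, y, data_range=255.0):
    """SSIM computed with a single window spanning the whole image
    (a simplification of the usual sliding-window formulation)."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    cov = ((x - mx) * (y - my)).mean()
    return float(((2 * mx * my + c1) * (2 * cov + c2))
                 / ((mx**2 + my**2 + c1) * (x.var() + y.var() + c2)))
```

PSNR measures pixel-wise fidelity while SSIM also reflects structural similarity, which is why both are reported in Tables 1 and 2.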

Table 1. PSNR comparison
Table 2. SSIM comparison

5 Conclusion

In this paper, we concentrate on generating high-quality denoised MRI images with a deep learning method called UEGAN, based on decoupled expression. We utilize the noise encoder and the content encoders to decouple the content information and noise attributes in a noised MRI image. In order to obtain rich content characteristics from the original image, we add the adversarial loss and the cycle-consistency loss. We add the noising branch to the model so as to restrict the noise encoder to encoding noise attributes as much as possible. The IE loss helps to remove the banding artifacts present in the outputs of the generator. Compared with several popular methods, our work achieves promising visual effects and quantitative results.