
De-noising of Contrast-Enhanced MRI Sequences by an Ensemble of Expert Deep Neural Networks

  • Conference paper
Deep Learning and Data Labeling for Medical Applications (DLMIA 2016, LABELS 2016)

Abstract

Dynamic contrast-enhanced MRI (DCE-MRI) is an imaging protocol in which MRI scans are acquired repetitively throughout the injection of a contrast agent. The analysis of dynamic scans is widely used for the detection and quantification of blood-brain barrier (BBB) permeability. Extraction of the pharmacokinetic (PK) parameters from the DCE-MRI washout curves allows quantitative assessment of BBB functionality. Nevertheless, the curve fitting required for the analysis of DCE-MRI data is error-prone, as the dynamic scans are subject to non-white, spatially-dependent and anisotropic noise that does not fit standard noise models. The two existing approaches, curve smoothing and image de-noising, are limited: the former produces smooth curves but cannot guarantee fidelity to the PK model, while the latter cannot accommodate the high variability of the noise statistics in time and space.

We present a novel framework based on Deep Neural Networks (DNNs) to address the DCE-MRI de-noising challenges. The key idea is based on an ensembling of expert DNNs, where each is trained for different noise characteristics and curve prototypes to solve an inverse problem on a specific subset of the input space. The most likely reconstruction is then chosen using a classifier DNN. As ground-truth (clean) signals for training are not available, a model for generating realistic training sets with complex nonlinear dynamics is presented. The proposed approach has been applied to DCE-MRI scans of stroke and brain tumor patients and is shown to favorably compare to state-of-the-art de-noising methods, without degrading the contrast of the original images.


Notes

  1. The appendix is available in the electronic version of the manuscript and at: https://drive.google.com/file/d/0B_vghaLYgXRKTnAwSU5oLUNDWmc/view?usp=sharing.


Acknowledgments

This study was supported by the European Union’s Seventh Framework Program (FP7/2007–2013; grant agreement 602102, EPITARGET; A.F.), the Israel Science Foundation (A.F.) and the Binational Israel-USA Foundation (BSF; A.F.).

Author information

Correspondence to Ariel Benou.


Appendices

A Artificial Neuron and Deep Neural Network

Neural networks (NNs) are modeled as collections of computational units, called artificial neurons (ANs), connected in an acyclic graph. An AN is a computational unit with multiple inputs, denoted by an augmented input vector \(\tilde{\mathbf {c}}_{o}=\left[ 1,\mathbf {c}_{o}^{T}\right] ^{T}\), and a single output scalar \(y\in \mathbb {R}\), such that \(y=f\left( \mathbf {w}^{T}\tilde{\mathbf {c}}_{o}\right) \), where \(f:\,\mathbb {R}\rightarrow \mathbb {R}\) is called the activation function of the neuron and \(\mathbf {w}\) is the weight vector.

Let \(\mathbf {w}_{i}^{l}=[w_{i,0}^{l},w_{i,1}^{l},\ldots ,w_{i,K_{l}}^{l}]^{T}\) denote the weights of the graph edges connecting the i-th AN in layer \(l+1\) to the \(K_{l}\) ANs in layer l, \(l\in [0,L-1]\). Let us also denote by \(f_{l}(\cdot )\) the activation function of the ANs in layer l. The output \(y_{i}^{l+1}\) of the i-th AN in layer \(l+1\) is calculated as follows:

Fig. 6. The output of an artificial neuron is obtained by applying an activation function, \(f(\cdot )\), to the weighted sum of the augmented input \(\tilde{\mathbf {c}}\).

Fig. 7. A deep neural network with \(L=3\) layers and a sigmoid activation function. Hidden layers are marked in red. (Color figure online)

$$\begin{aligned} y_{i}^{l+1}=f_{l}\left( \sum _{j=0}^{K_{l}}w_{i,j}^{l}y_{j}^{l}\right) =f_{l}\left( \langle \mathbf {w}_{i}^{l},\mathbf {y}^{l}\rangle \right) \end{aligned}$$
(3)

where \(\mathbf {y}^{l}=[1,y_{1}^{l},\ldots ,y_{K_{l}}^{l}]^{T}\), with the leading entry \(y_{0}^{l}=1\) accounting for the bias. The DNN's input is the augmented vector \(\mathbf {y}^{0}=\tilde{\mathbf {c}}_{o}=\left[ 1,\mathbf {c}_{o}^{T}\right] ^{T}\).
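Equation (3) can be sketched in a few lines of numpy. The weight matrix layout and the toy values below are illustrative choices, not taken from the paper:

```python
import numpy as np

def layer_forward(W, y_prev, f):
    """Compute the outputs of layer l+1 from layer l (Eq. 3).

    W      : (K_next, K_l + 1) weight matrix; column 0 holds the bias
             weights w_{i,0} that multiply the constant input 1.
    y_prev : (K_l,) outputs of layer l (without the leading 1).
    f      : elementwise activation function f_l.
    """
    y_aug = np.concatenate(([1.0], y_prev))  # augmented vector [1, y_1, ..., y_{K_l}]
    return f(W @ y_aug)

sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

# Tiny illustrative layer: 2 inputs -> 3 neurons.
W = np.array([[0.0, 1.0, -1.0],
              [0.5, 0.0,  0.0],
              [0.0, 2.0,  2.0]])
y0 = np.array([0.3, 0.3])
y1 = layer_forward(W, y0, sigmoid)
```

Stacking such calls, one per layer, reproduces the full forward pass of the acyclic graph described above.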

B Expert DNN

We set \(L=10\) layers for each of our expert DNNs, with 100-175-150-120-80-30-80-120-150-175-100 neurons in the consecutive layers; the input and output layers are of size \(l_{0}=l_{10}=100\). The pre-training process is carried out in an aggregative manner, where the weights \(\mathbf {W}_{0}^{l}\) of each layer \(l=1\ldots 5\) of a given DNN are pre-trained using a single RBM (denoted by \(RBM^{l}\)), such that the hidden-unit realizations \(\mathbf {h}^{l-1}\) of \(RBM^{l-1}\) are used as the visible units \(\mathbf {v}^{l}\) of \(RBM^{l}\), i.e. \(\mathbf {h}^{l-1}=\mathbf {v}^{l}\) (see Fig. 8a). The weights of the DNN are initialized by setting \(\mathbf {W}^{l}=\mathbf {W}_{0}^{l}\) for the lower L/2 layers (\(l=1\ldots 5\)) and \(\mathbf {W}^{l}=(\mathbf {W}_{0}^{L-l+1})^{T}\) for the upper layers (\(l=6,\ldots ,10\)) (see Fig. 8b). The final weights of the DNN are then computed jointly over all layers using stochastic gradient descent (SGD), with a linear activation function in the output layer.
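The mirrored initialization of the autoencoder-shaped expert can be sketched as follows. The random matrices stand in for the actual RBM-pretrained weights \(\mathbf {W}_{0}^{l}\), which in the real pipeline come from the stacked-RBM procedure described above:

```python
import numpy as np

# Layer widths of one expert DNN (input ... bottleneck ... output).
sizes = [100, 175, 150, 120, 80, 30, 80, 120, 150, 175, 100]
L = 10

rng = np.random.default_rng(0)
# Stand-in for the pretrained weights W0^l of the lower L/2 layers.
W0 = {l: rng.normal(scale=0.01, size=(sizes[l], sizes[l - 1]))
      for l in range(1, L // 2 + 1)}

# Mirror initialization: lower half copies W0, upper half uses transposes.
W = {}
for l in range(1, L + 1):
    if l <= L // 2:
        W[l] = W0[l].copy()           # W^l = W0^l,          l = 1..5
    else:
        W[l] = W0[L - l + 1].T.copy() # W^l = (W0^{L-l+1})^T, l = 6..10
```

Tying the upper-half weights to transposes of the lower half gives SGD fine-tuning a starting point that already approximates an identity mapping through the bottleneck.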

For the initialization of the input layer we assume the \(\{v_{i}^{0}\}\) are random variables sampled from normal distributions \(\mathcal {N}(a_{i},\sigma _{i})\), where \(a_{i}\) and \(\sigma _{i}\) are the mean and the standard deviation, respectively, associated with unit i, estimated from the training set. The input layer is therefore trained as a Gaussian-Bernoulli RBM, with the energy function:

$$\begin{aligned} E\left( \mathbf {v},\mathbf {h}\right) =-\sum _{i=1}^{K_{l-1}}\frac{(v_{i}-a_{i})^{2}}{\sigma _{i}}-\sum _{j=1}^{K_{l}}b_{j}h_{j}-\sum _{i,j}\frac{v_{i}}{\sigma _{i}}h_{j}w_{ij} \end{aligned}$$
(4)

The entire training set was scaled such that each entry of the input has zero mean and unit variance. The learning rate of the first \(RBM^{1}\) was set to 0.001 (0.01 for all the others), and pre-training proceeded for 300 epochs. In addition, we used more binary hidden units than the size of the input vector, since real-valued data carries more information than binary feature activations.
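For concreteness, the energy of Eq. (4) can be evaluated directly; this is a sketch that implements the equation exactly as printed (note that standard Gaussian-Bernoulli references write the visible term as \(+(v_i-a_i)^2/2\sigma _i^2\)):

```python
import numpy as np

def gbrbm_energy(v, h, a, sigma, b, Wm):
    """Energy of a Gaussian-Bernoulli RBM, as printed in Eq. (4).

    v : real-valued visible units; h : binary hidden units;
    a, sigma : per-visible-unit mean and standard deviation;
    b : hidden biases; Wm[i, j] = w_ij.
    """
    quad = np.sum((v - a) ** 2 / sigma)                           # visible term
    hidden = np.sum(b * h)                                        # hidden bias term
    interact = np.sum((v / sigma)[:, None] * Wm * h[None, :])     # interaction term
    return -quad - hidden - interact

# Illustrative values (not from the paper).
v = np.array([1.0, 2.0]); a = v.copy(); sigma = np.ones(2)
b = np.ones(3); Wm = np.ones((2, 3))
E0 = gbrbm_energy(v, np.zeros(3), a, sigma, b, Wm)   # all hidden units off
E1 = gbrbm_energy(v, np.ones(3), a, sigma, b, Wm)    # all hidden units on
```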

Fig. 8. Training an expert DNN: (a) pre-training using stacked RBMs, (b) weight initialization of an expert DNN, (c) fine-tuning using stochastic gradient descent.

C Classification DNN

The classification DNN contains an input layer, two hidden layers and an output layer, with 120-180-180-24 neurons, respectively. The input of the DNN is defined as follows. Let \(g_{k}(\mathbf {c}_{o})\) be the hypothesis of the k-th DNN expert with respect to an observed input \(\mathbf {c}_{o}\), where \(k=1,\ldots ,24\). We define five measures that allow the evaluation of the experts' performances: \(z_{1;k}=\left\| g_{k}(\mathbf {c}_{o})-\mathbf {c}_{o}\right\| _{1}\) and \(z_{2;k}=\left\| g_{k}(\mathbf {c}_{o})-\mathbf {c}_{o}\right\| _{2}\) are the \(L_{1}\) and \(L_{2}\) norms of the deviation from the original washout curve (WoC); \(z_{3;k}=\frac{<g_{k}(\mathbf {c}_{o}),\mathbf {c}_{o}>}{\left\| g_{k}(\mathbf {c}_{o})\right\| _{2}\left\| \mathbf {c}_{o}\right\| _{2}}\) and \(z_{4;k}=\frac{cov(g_{k}(\mathbf {c}_{o}),\mathbf {c}_{o})}{\sigma (g_{k}(\mathbf {c}_{o}))\,\sigma (\mathbf {c}_{o})}\) are the cosine similarity and the correlation between the reconstructed and input signals, where \(\sigma (\cdot )\) denotes the standard deviation; and \(z_{5;k}=\left\| \nabla g_{k}(\mathbf {c}_{o})\right\| _{1}\) is the total variation of the hypothesis. The input feature vector is therefore:

$$\begin{aligned} \mathbf {z}=[z_{1;1},z_{2;1},z_{3;1},z_{4;1},z_{5;1},\ldots ,z_{1;24},z_{2;24},z_{3;24},z_{4;24},z_{5;24}]^{T}\in \mathbb {R}^{120}. \end{aligned}$$
(5)
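The assembly of the feature vector \(\mathbf {z}\) of Eq. (5) can be sketched as follows; the function name and array layout are illustrative, and the correlation \(z_{4}\) is computed in its usual Pearson form (covariance over the product of standard deviations):

```python
import numpy as np

def expert_features(recons, c_o):
    """Build the classifier input z (Eq. 5) from the K expert outputs.

    recons : (K, T) array, row k holding the hypothesis g_k(c_o);
    c_o    : (T,) observed noisy washout curve.
    Returns a vector of length 5K: [z1;1, ..., z5;1, ..., z5;K].
    """
    feats = []
    for g in recons:
        d = g - c_o
        z1 = np.sum(np.abs(d))                                      # L1 deviation
        z2 = np.sqrt(np.sum(d ** 2))                                # L2 deviation
        z3 = g @ c_o / (np.linalg.norm(g) * np.linalg.norm(c_o))    # cosine similarity
        z4 = np.cov(g, c_o)[0, 1] / (np.std(g, ddof=1) * np.std(c_o, ddof=1))  # correlation
        z5 = np.sum(np.abs(np.diff(g)))                             # total variation
        feats.extend([z1, z2, z3, z4, z5])
    return np.asarray(feats)

# Two toy "experts": one perfect reconstruction, one reversed curve.
recons = np.array([[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]])
c_o = np.array([1.0, 2.0, 3.0])
z = expert_features(recons, c_o)
```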

A “softmax” activation function, commonly used for multi-class classification problems, was assigned to the neurons of the last layer. The softmax activation takes into account not only the entry value of a specific AN, but also the entries of all the other ANs in that layer:

$$\begin{aligned} f(\alpha _{i})=\frac{\exp (\alpha _{i})}{\underset{j}{\sum }\exp (\alpha _{j})}, \end{aligned}$$
(6)

where \(\alpha _{i}\) denotes the entry value at the i-th neuron.
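Equation (6) is usually implemented with a max-shift for numerical stability; since \(\exp (\alpha _{i}-m)/\sum _{j}\exp (\alpha _{j}-m)\) equals Eq. (6) for any constant m, the result is unchanged:

```python
import numpy as np

def softmax(alpha):
    """Softmax over the output-layer pre-activations (Eq. 6).

    Subtracting max(alpha) leaves the ratios unchanged but
    prevents overflow in np.exp for large entries.
    """
    e = np.exp(alpha - np.max(alpha))
    return e / e.sum()

p = softmax(np.array([1.0, 2.0, 3.0]))  # a valid probability vector
```

The 24 softmax outputs can then be read as the posterior probabilities of the 24 experts, and the most likely reconstruction is the one with the largest entry.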

Seventy percent of each expert DNN's training set was picked at random to create the training set of the classification DNN. For each training example, the feature vector \(\mathbf {z}\) was calculated and a label \(\mathbf {y}\) was assigned according to the origin of the example: a training example originally belonging to the training set of the k-th expert is assigned the label \(\mathbf {y}=\mathbf {e}_{k}\), i.e., all coefficients are 0 except the k-th, which is 1.

D The Beltrami Framework

In this section we briefly describe the Beltrami framework for de-noising grayscale videos and our extension of it to DCE-MRI scans. We consider a grayscale video to be a 3D Riemannian manifold embedded in a \(D=d+3\) dimensional space, where \(d=1\) for grayscale images. The embedding map \(Q:\,\Sigma \rightarrow M\) is given by:

$$\begin{aligned} Q(x,y,\tau )=\left( x,y,\tau ,I(x,y,\tau )\right) \end{aligned}$$
(7)

where I is the image intensity map. Both \(\Sigma \) and M are Riemannian manifolds and hence are equipped with metrics G and H, respectively, which enable measurement of lengths over each manifold. We require the lengths as measured on each manifold to be the same, i.e.,

$$\begin{aligned} ds^{2}=\left( dx,dy,d\tau ,dI\right) H\left( dx,dy,d\tau ,dI\right) ^{T}=(dx,dy,d\tau )G(dx,dy,d\tau )^{T} \end{aligned}$$
(8)

where \(dI=I_{x}dx+I_{y}dy+I_{\tau }d\tau \), according to the chain rule. A natural choice for gray-level videos is a Euclidean space-feature manifold with the metric:

$$\begin{aligned} H=\left( \begin{array}{cccc} 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & \beta ^{2} \end{array}\right) \end{aligned}$$
(9)

where \(\beta \) is the relative scale between the space coordinates and the intensity component. Using the constraint in Eq. (8), the induced metric tensor \(G=\{g_{uv}\}\) is:

$$\begin{aligned} G=\left( \begin{array}{ccc} 1+\beta ^{2}I_{x}^{2} & \beta ^{2}I_{x}I_{y} & \beta ^{2}I_{x}I_{\tau }\\ \beta ^{2}I_{x}I_{y} & 1+\beta ^{2}I_{y}^{2} & \beta ^{2}I_{y}I_{\tau }\\ \beta ^{2}I_{x}I_{\tau } & \beta ^{2}I_{y}I_{\tau } & 1+\beta ^{2}I_{\tau }^{2} \end{array}\right) \end{aligned}$$
(10)

The Beltrami flow is obtained by minimizing the area of the image manifold:

$$\begin{aligned} S_{X,G}=\iiint \sqrt{g}dxdyd\tau \end{aligned}$$
(11)

where \(g=\det \left( G\right) \). Applying the calculus of variations, the resulting Euler-Lagrange equation for the minimization is:

$$\begin{aligned} -\frac{d}{dx}\left( \frac{I_{x}}{\sqrt{g}}\right) -\frac{d}{dy}\left( \frac{I_{y}}{\sqrt{g}}\right) -\frac{d}{d\tau }\left( \frac{I_{\tau }}{\sqrt{g}}\right) =-div\left( \sqrt{g}G^{-1}\nabla I\right) \end{aligned}$$
(12)

Multiplying by \(g^{-1/2}\) and evolving the image against the variational derivative (gradient descent), we obtain the Beltrami flow:

$$\begin{aligned} I_{t}=\triangle _{g}I=\frac{1}{\sqrt{g}}div\left( \sqrt{g}G^{-1}\nabla I\right) \end{aligned}$$
(13)

where \(\triangle _{g}\) is the Laplace-Beltrami operator. The discretized version of Eq. (13) allows us to traverse this scale space iteratively on a computer, and yields a very effective technique for de-noising grayscale videos when using the metric in (10):

$$\begin{aligned} I_{t+1}=I_{t}+dt\frac{1}{\sqrt{g}}(D_{x}+D_{y}+D_{\tau }) \end{aligned}$$
(14)

where \(D=\sqrt{g}G^{-1}\nabla I\), \(div(D)=D_{x}+D_{y}+D_{\tau }\), and \(dt\propto \beta ^{-2}\). Note that the output depends on two hyperparameters: the number of update iterations and the parameter \(\beta \).

The above framework assumes that the x, y, and \(\tau \) coordinates share similar physical measures. In reality, the spatial coordinates x and y do not possess the same physical measure as the temporal coordinate \(\tau \). Hence, we introduce another scaling factor, \(\gamma \), into the space-time-intensity metric:

$$\begin{aligned} H=\left( \begin{array}{cccc} 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & \gamma ^{2} & 0\\ 0 & 0 & 0 & \beta ^{2} \end{array}\right) . \end{aligned}$$
(15)

The new induced metric tensor for the 3D image manifold is computed using the constraint in Eq. (8):

$$\begin{aligned} G=\left( \begin{array}{ccc} 1+\beta ^{2}I_{x}^{2} & \beta ^{2}I_{x}I_{y} & \beta ^{2}I_{x}I_{\tau }\\ \beta ^{2}I_{x}I_{y} & 1+\beta ^{2}I_{y}^{2} & \beta ^{2}I_{y}I_{\tau }\\ \beta ^{2}I_{x}I_{\tau } & \beta ^{2}I_{y}I_{\tau } & \gamma ^{2}+\beta ^{2}I_{\tau }^{2} \end{array}\right) \end{aligned}$$
(16)

In addition, we modified the numerical update step to fit the new scaling:

$$\begin{aligned} I_{t+1}=I_{t}+dt_{1}g^{-1/2}(D_{x}+D_{y})+dt_{2}g^{-1/2}D_{\tau } \end{aligned}$$
(17)

where \(dt_{1}\propto \beta ^{-2}\) and \(dt_{2}\propto \gamma ^{-2}\).
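A minimal numpy sketch of one iteration of the modified update (Eqs. 16-17), not the authors' code: gradients are taken with `np.gradient`, the induced metric G is built voxel-wise with \(\gamma ^{2}\) on the temporal entry (consistent with the \(\gamma ^{2}\) scaling in H), and boundary handling and step-size selection are left to the text's \(dt_{1}\propto \beta ^{-2}\), \(dt_{2}\propto \gamma ^{-2}\) rules:

```python
import numpy as np

def beltrami_step(I, beta, gamma, dt1, dt2):
    """One iteration of the modified Beltrami update (Eq. 17).

    I is an (x, y, tau) volume; beta and gamma are the intensity
    and time scaling factors.
    """
    Ix, Iy, It = np.gradient(I)

    # Induced metric G (Eq. 16) at every voxel, shape (..., 3, 3).
    G = np.empty(I.shape + (3, 3))
    G[..., 0, 0] = 1 + beta**2 * Ix**2
    G[..., 1, 1] = 1 + beta**2 * Iy**2
    G[..., 2, 2] = gamma**2 + beta**2 * It**2
    G[..., 0, 1] = G[..., 1, 0] = beta**2 * Ix * Iy
    G[..., 0, 2] = G[..., 2, 0] = beta**2 * Ix * It
    G[..., 1, 2] = G[..., 2, 1] = beta**2 * Iy * It

    g = np.linalg.det(G)
    grad = np.stack([Ix, Iy, It], axis=-1)[..., None]           # (..., 3, 1)
    D = np.sqrt(g)[..., None, None] * np.linalg.inv(G) @ grad   # sqrt(g) G^-1 grad(I)

    # Divergence terms D_x + D_y and D_tau, weighted by 1/sqrt(g).
    Dx = np.gradient(D[..., 0, 0], axis=0)
    Dy = np.gradient(D[..., 1, 0], axis=1)
    Dt = np.gradient(D[..., 2, 0], axis=2)
    inv_sqrt_g = 1.0 / np.sqrt(g)
    return I + dt1 * inv_sqrt_g * (Dx + Dy) + dt2 * inv_sqrt_g * Dt

# Sanity check: a constant volume is a fixed point of the flow.
vol = np.ones((4, 4, 5))
out = beltrami_step(vol, beta=1.0, gamma=2.0, dt1=0.05, dt2=0.0125)
```

Batched `np.linalg.inv`/`det` over the trailing (3, 3) axes keep the per-voxel metric operations vectorized; in practice the step is iterated for the chosen number of iterations.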

E Results on Synthetic Data

The performance of our DNN-based de-noising method on synthetic data is evaluated using 10-fold cross-validation (10-CV). 200,000 noisy WoCs were generated using the Tofts model. The data are randomly divided into ten groups (20,000 examples in each CV group), such that nine groups are used for training and the remaining group is used for testing. The experiment is performed independently for different signal-to-noise ratio (SNR) values. Fig. 9 demonstrates successful de-noising of a single representative WoC using our DNN-based method (red) and the moving average (MA) method (green), along with the clean and noisy synthetic WoCs (blue and black, respectively). In Fig. 10, the mean MSE values and standard-deviation intervals are plotted for different SNR levels.
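The noise injection at a prescribed SNR and the 10-CV partition can be sketched as follows. White Gaussian noise is a simplifying stand-in here (the paper's noise model for the real data is more complex), and the function names are illustrative:

```python
import numpy as np

def add_noise_at_snr(curve, snr_db, rng):
    """Corrupt a clean washout curve with white Gaussian noise
    whose power matches the requested SNR (in dB)."""
    p_signal = np.mean(curve ** 2)
    p_noise = p_signal / 10 ** (snr_db / 10)
    return curve + rng.normal(scale=np.sqrt(p_noise), size=curve.shape)

def ten_fold_indices(n, rng):
    """Random 10-CV partition: yields (train_idx, test_idx) pairs,
    nine groups for training and one held out for testing."""
    perm = rng.permutation(n)
    folds = np.array_split(perm, 10)
    for k in range(10):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(10) if j != k])
        yield train, test

rng = np.random.default_rng(0)
clean = np.ones(100)                       # placeholder for a Tofts-model WoC
noisy = add_noise_at_snr(clean, 10.0, rng) # SNR = 10 dB
splits = list(ten_fold_indices(200_000, rng))
```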

Fig. 9. De-noising of a synthetic WoC (black) using DNN (red) and MA (green), at SNR = 10 dB. Ground truth (GT) is in blue.

Fig. 10. Mean values and standard-deviation intervals of the MSE between the clean (ground-truth) WoCs and the de-noised curves, using the DNN-based method (red) and MA (green), as a function of the SNR of the synthetic curves.

F Run-time Comparison

We measured the run-time of the different algorithms on the 13 DCE-MRI scans. Table 1 presents the average run-time of each de-noising algorithm in minutes. The measurements exclude any pre-processing procedures and cover only the de-noising algorithms themselves. The algorithms were tested in MATLAB 2014b (64-bit) on an Intel(R) Core(TM) i7-4470, 3.4 GHz CPU with 16 GB RAM.

Table 1. Average run-time of the different de-noising methods for a single DCE-MRI scan (4D volume with dimensions \(255\times 255\times 22\times 100\)).
Fig. 11. Block diagram of our performance assessment method for real data.

G Experimental Setup for Real Data

In the absence of ground-truth washout curves, and in addition to visual assessment, we estimated the de-noising algorithms' success using two measures: the fidelity of the de-noised output to the noisy data, and its fidelity to the PK model. Fig. 11 shows a block diagram of our performance assessment method. Given a noisy DCE-MRI scan, we apply a de-noising algorithm. The fidelity of the cleaned curves to the noisy data is measured by the mean squared error (MSE) between the noisy curve and the de-noised curve. We then extract the PK parameters from the cleaned curves by applying the standard DCE-MRI curve-fitting algorithm, and use the estimated PK parameters to generate synthetic washout curves according to the Tofts model. The MSE between the model-based synthetic curve and the de-noised curve measures the fidelity of the de-noised curves to the PK model.
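The two fidelity scores of the assessment pipeline reduce to a pair of MSEs once the curves are in hand; `model_curve` is assumed to come from refitting the Tofts model to the de-noised curve (the fitting step itself is not shown):

```python
import numpy as np

def fidelity_measures(noisy, denoised, model_curve):
    """Two fidelity scores of the assessment pipeline (Fig. 11):
    MSE of the de-noised curve against the noisy data, and against
    the PK-model (Tofts) reconstruction."""
    mse_data = np.mean((noisy - denoised) ** 2)
    mse_model = np.mean((model_curve - denoised) ** 2)
    return mse_data, mse_model

# Toy illustration with hand-picked curves.
mse_data, mse_model = fidelity_measures(np.array([1.0, 2.0]),
                                        np.array([1.0, 1.0]),
                                        np.array([1.0, 1.0]))
```

A low `mse_model` indicates the output stays consistent with the PK model, while a low `mse_data` indicates it does not drift away from the measured signal.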


Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Benou, A., Veksler, R., Friedman, A., Riklin Raviv, T. (2016). De-noising of Contrast-Enhanced MRI Sequences by an Ensemble of Expert Deep Neural Networks. In: Carneiro, G., et al. (eds.) Deep Learning and Data Labeling for Medical Applications. DLMIA LABELS 2016. Lecture Notes in Computer Science, vol. 10008. Springer, Cham. https://doi.org/10.1007/978-3-319-46976-8_11

  • DOI: https://doi.org/10.1007/978-3-319-46976-8_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-46975-1

  • Online ISBN: 978-3-319-46976-8
