# Uncertainty Estimation in Medical Image Denoising with Bayesian Deep Image Prior


## Abstract

Uncertainty quantification in inverse medical imaging tasks with deep learning has received little attention. However, deep models trained on large data sets tend to hallucinate and create artifacts in the reconstructed output that are not anatomically present. We use a randomly initialized convolutional network as parameterization of the reconstructed image and perform gradient descent to match the observation, which is known as deep image prior. In this case, the reconstruction does not suffer from hallucinations as no prior training is performed. We extend this to a Bayesian approach with Monte Carlo dropout to quantify both aleatoric and epistemic uncertainty. The presented method is evaluated on the task of denoising different medical imaging modalities. The experimental results show that our approach yields well-calibrated uncertainty. That is, the predictive uncertainty correlates with the predictive error. This allows for reliable uncertainty estimates and can tackle the problem of hallucinations and artifacts in inverse medical imaging tasks.

## Keywords

Variational inference, Hallucination, Deep learning

## 1 Introduction

Noise in medical imaging affects all modalities, including X-ray, magnetic resonance imaging (MRI), computed tomography (CT), ultrasound (US) or optical coherence tomography (OCT) and can obstruct important details for medical diagnosis [1, 7, 16]. Besides “classical” approaches with linear and non-linear filters, such as the Wiener filter, or wavelet-denoising [3, 22], convolutional neural networks (CNN) have proven to yield superior performance in denoising of natural and medical images [16, 28].

However, deep models trained on large data sets tend to create plausible-looking details in the reconstruction that are not actually present in the imaged anatomy. This phenomenon is known as *hallucination* and, while acceptable in the reconstruction of natural images [25], must be avoided at all costs in medical imaging (see Fig. 1). Hallucinations can lead to false diagnoses and thus severely compromise patient safety.

To further increase the reliability in the denoised medical images, the reconstruction uncertainty has to be considered. Bayesian autoencoders provide the mathematical framework to quantify a per-pixel reconstruction uncertainty [2, 4, 14]. This allows the detection of hallucinations and other artifacts, given that the uncertainty is well-calibrated; i. e. the uncertainty corresponds well with the reconstruction error [15].

In this work, we employ *deep image prior* [18] to cope with hallucinations in medical image denoising and provide a Bayesian approach with Monte Carlo (MC) dropout [6] that yields well-calibrated reconstruction uncertainty. We present experimental results on denoising images from low-dose X-ray, ultrasound and OCT. Compared to previous work, our approach leads to better uncertainty estimates and is less prone to overfitting the noisy image. Our code is publicly available at github.com/mlaves/uncertainty-deep-image-prior.

## 2 Related Work

*Image Priors.* Besides manually crafted priors such as 3D collaborative filtering [5], convolutional denoising autoencoders have been used to implicitly learn an image prior from data [7, 11]. Lempitsky et al. have recently shown that the excellent performance of deep networks for inverse imaging tasks, such as denoising, is based not only on their ability to learn image priors from data, but also on the structure of a convolutional image generator itself [18]. An image generator network \( \hat{\mathbf{x}} = \mathbf{f}_{\boldsymbol{\theta}}(\mathbf{z}) \) with randomly initialized parameters \( \boldsymbol{\theta} \) is interpreted as a parameterization of the image. The parameters \( \boldsymbol{\theta} \) of the network are found by minimizing the pixel-wise squared error \( \Vert \tilde{\mathbf{x}} - \mathbf{f}_{\boldsymbol{\theta}}(\mathbf{z}) \Vert^{2} \) with stochastic gradient descent (SGD). The input \( \mathbf{z} \) is sampled from a uniform distribution and perturbed by normally distributed noise in every iteration. This is referred to as deep image prior (DIP). They provided empirical evidence that the structure of a CNN alone is sufficient to capture enough image statistics to provide state-of-the-art performance in inverse imaging tasks. During SGD, low-frequency image features are reconstructed first, followed by higher frequencies, which makes human supervision necessary to retrieve the optimal denoised image. The approach therefore heavily relies on early stopping in order not to overfit the noise. However, a key advantage of deep image prior is the absence of hallucinations, since there is no prior learning. A Bayesian approach could alleviate overfitting and additionally provide reconstruction uncertainty.
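To make the optimization concrete, the DIP fitting loop can be sketched with a toy stand-in: an over-parameterized linear generator \( \hat{\mathbf{x}} = W\mathbf{z} \) trained by gradient descent to match a noisy 1-D signal. The linear map, signal, and all hyperparameters are illustrative assumptions, not the convolutional generator of [18]; the sketch only illustrates the overfitting dynamic that motivates early stopping.

```python
import numpy as np

# Toy stand-in for deep image prior: an over-parameterized linear
# "generator" x_hat = W @ z is fitted to a noisy observation by
# gradient descent. W, z, the signal and all sizes are illustrative
# choices, not the authors' architecture.
rng = np.random.default_rng(0)

n = 64                                            # number of "pixels"
x_clean = np.sin(np.linspace(0, 4 * np.pi, n))    # ground-truth signal
x_noisy = x_clean + 0.3 * rng.standard_normal(n)  # noisy observation

z = rng.uniform(size=128)                         # fixed random input, as in DIP
W = 0.01 * rng.standard_normal((n, z.size))       # random initialization

lr = 1e-3
mse_to_clean = []
for step in range(2000):
    x_hat = W @ z
    residual = x_hat - x_noisy
    W -= lr * 2.0 * np.outer(residual, z)         # gradient of ||x_noisy - W z||^2
    mse_to_clean.append(np.mean((x_hat - x_clean) ** 2))
```

Because nothing constrains the generator, running the loop to convergence reproduces the observation, noise included; in the full DIP, low frequencies are fitted first, which is what makes a well-timed early stop recover a denoised image.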

*Bayesian Deep Learning.* Bayesian neural networks allow estimation of predictive uncertainty [2], and we generally differentiate between aleatoric and epistemic uncertainty [12]. Aleatoric uncertainty results from noise in the data (e.g. speckle noise in US or OCT). It is derived from the conditional log-likelihood under the maximum likelihood estimation (MLE) or maximum a posteriori (MAP) framework and can be captured directly by a deep network (i.e. by subdividing the last layer of an image generator network). Epistemic uncertainty is caused by uncertainty in the model parameters. In deep learning, we usually perform MLE or MAP inference to find a single best estimate \( \hat{\boldsymbol{\theta}} \) for the network parameters. This does not allow estimation of epistemic uncertainty, so we instead place distributions over the parameters. In Bayesian inference, we want to consider all possible parameter configurations, weighted by their posterior. Computing the posterior predictive distribution involves marginalization over the parameters \( \boldsymbol{\theta} \), which is intractable. A common approximation of the posterior distribution is variational inference with Monte Carlo dropout [6], which allows estimation of epistemic uncertainty by Monte Carlo sampling from the posterior of a network that has been trained with dropout.
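Monte Carlo dropout amounts to keeping dropout active at inference and collecting the statistics of repeated stochastic forward passes. A minimal numpy sketch, where the tiny fixed MLP and the dropout rate are purely illustrative assumptions:

```python
import numpy as np

# Monte Carlo dropout sketch: T stochastic forward passes through a
# toy MLP with dropout masks kept active at inference time.
rng = np.random.default_rng(1)

W1 = rng.standard_normal((32, 8))     # illustrative fixed weights
W2 = rng.standard_normal((1, 32))
p_drop = 0.2                          # illustrative dropout rate

def forward(x, rng):
    h = np.maximum(W1 @ x, 0.0)                 # ReLU hidden layer
    mask = rng.random(h.shape) >= p_drop        # Bernoulli dropout mask
    h = h * mask / (1.0 - p_drop)               # inverted dropout scaling
    return (W2 @ h)[0]

x = rng.standard_normal(8)
T = 500
samples = np.array([forward(x, rng) for _ in range(T)])

mean_pred = samples.mean()       # approximate posterior predictive mean
epistemic_var = samples.var()    # spread across passes = epistemic uncertainty
```

The variance across the T passes is nonzero precisely because the dropout masks differ between passes; a deterministic network would collapse it to zero.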

*Bayesian Deep Image Prior.* Cheng et al. recently provided a Bayesian perspective on the deep image prior in the context of natural images, which is most closely related to our work [4]. They interpret the convolutional network as a spatial random process over the image coordinate space and use stochastic gradient Langevin dynamics (SGLD) [26] as a Bayesian approximation to sample from the posterior. In SGLD, an MC sampler is derived from SGD by injecting Gaussian noise into the gradients after each backward pass. The authors claim to have solved the overfitting issue of DIP and to be able to provide uncertainty estimates. In the following, we show that this is not the case for medical image denoising, even when using the code provided by the authors. Further, the uncertainty estimates obtained from SGLD do not reflect the predictive error with respect to the noise-free ground truth image.
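The SGLD update can be sketched on a one-dimensional toy posterior, where the injected Gaussian noise turns gradient descent into an approximate posterior sampler. The target \( \mathcal{N}(\mu, \sigma^2) \) and the fixed step size are illustrative assumptions (a faithful run would decay the step size, as discussed in Sect. 6):

```python
import numpy as np

# SGLD sketch on a 1-D negative log-posterior
# U(theta) = (theta - mu)^2 / (2 sigma^2), whose posterior is N(mu, sigma^2).
# Each step adds Gaussian noise with variance eps to the update.
rng = np.random.default_rng(2)

mu, sigma = 3.0, 0.5
eps = 1e-2                    # fixed small step size (no decay, as in [4])

theta = 0.0
samples = []
for t in range(20_000):
    grad = (theta - mu) / sigma**2                 # gradient of U(theta)
    theta += -0.5 * eps * grad + np.sqrt(eps) * rng.standard_normal()
    if t >= 5_000:                                 # discard "burn in" phase
        samples.append(theta)

samples = np.array(samples)
# The sample mean and std approximate the posterior mean mu and std sigma
```

With a fixed step size the chain has a small discretization bias; with a decaying step size the noise eventually vanishes and the chain freezes near a mode, which is why step size decay acts much like early stopping.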

## 3 Methods

### 3.1 Aleatoric Uncertainty with Deep Image Prior

We assume that each pixel \( \tilde{x}_{i} \) of the noisy observation with index *i* follows a Gaussian distribution \( \mathcal {N}(\tilde{x}_{i}; \hat{x}_{i}, \hat{\sigma }^{2}_{i}) \) with mean \( \hat{x}_{i} \) and variance \( \hat{\sigma }^{2}_{i} \). We split the last layer such that the network outputs these values for each pixel.

Here, *N* is the number of pixels per image. In this case, \( \hat{\boldsymbol{\sigma}}^{2} \) captures the pixel-wise aleatoric uncertainty and is jointly estimated with \( \hat{\mathbf{x}} \) by finding \( \boldsymbol{\theta} \) that minimizes Eq. (4) with SGD. For numerical stability, Eq. (4) is implemented such that the network directly outputs \( -\log \hat{\boldsymbol{\sigma}}^{2} \).
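The per-pixel Gaussian negative log-likelihood with the network outputting \( s_i = -\log \hat{\sigma}^2_i \) can be sketched as follows; the arrays and the constant-variance example are illustrative, not the paper's network outputs:

```python
import numpy as np

# Per-pixel Gaussian negative log-likelihood (constants dropped),
# with the network outputting s_i = -log sigma_hat_i^2 directly
# for numerical stability.
def gaussian_nll(x_noisy, x_hat, neg_log_var):
    """Mean over pixels of 0.5 * exp(s) * (x - x_hat)^2 - 0.5 * s,
    where s = -log sigma^2."""
    s = neg_log_var
    return np.mean(0.5 * np.exp(s) * (x_noisy - x_hat) ** 2 - 0.5 * s)

rng = np.random.default_rng(3)
x_noisy = rng.standard_normal(100)
x_hat = np.zeros(100)

# Sanity check: with constant predicted variance 1 (s = 0),
# the loss reduces to 0.5 * MSE
loss = gaussian_nll(x_noisy, x_hat, np.zeros(100))
```

Parameterizing the output as \( -\log \hat{\sigma}^2 \) avoids both division by small variances and taking logarithms of network outputs that could be non-positive.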

### 3.2 Epistemic Uncertainty with Bayesian Deep Image Prior

At inference, *T* stochastic forward passes with dropout applied are performed to sample from the approximate Bayesian posterior \( \tilde{\boldsymbol{\theta}} \sim q(\boldsymbol{\theta}) \). This allows us to approximate the posterior predictive distribution.

We denote the resulting total predictive uncertainty, combining the aleatoric and epistemic terms, by *U*.
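Combining the two uncertainty terms from T MC dropout samples follows the decomposition of Kendall and Gal [12]: the aleatoric term is the mean of the predicted variances, the epistemic term is the variance of the predicted means. A sketch with synthetic placeholder arrays standing in for the network outputs:

```python
import numpy as np

# Combining T MC dropout samples into a predictive mean and a
# per-pixel uncertainty map U. The sample arrays are synthetic
# placeholders, not real network outputs.
rng = np.random.default_rng(4)

T, n_pixels = 25, 64
x_hat_samples = rng.standard_normal((T, n_pixels))     # T predicted means
sigma2_samples = rng.uniform(0.1, 0.2, (T, n_pixels))  # T predicted variances

x_mean = x_hat_samples.mean(axis=0)        # posterior predictive mean
aleatoric = sigma2_samples.mean(axis=0)    # data noise term
epistemic = x_hat_samples.var(axis=0)      # model uncertainty term
U = aleatoric + epistemic                  # total predictive variance per pixel
```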

### 3.3 Calibration of Uncertainty

We assess the quality of the uncertainty estimates with the uncertainty calibration error (UCE) [15, 19], which partitions the predictions into bins by uncertainty and averages the absolute difference between the per-bin mean squared error and the per-bin mean uncertainty. Uncertainty is well-calibrated if it corresponds to the predictive error.
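The uncertainty calibration error (UCE) reported in the experiments bins predictions by uncertainty and takes the weighted average absolute gap between per-bin error (MSE) and per-bin mean uncertainty [15, 19]. A minimal sketch, where the bin count and the synthetic data are illustrative assumptions:

```python
import numpy as np

# Sketch of the uncertainty calibration error (UCE): group
# predictions into bins by uncertainty, then report the weighted
# average |per-bin MSE - per-bin mean uncertainty|.
def uce(errors_sq, uncertainties, n_bins=10):
    edges = np.linspace(uncertainties.min(), uncertainties.max(), n_bins + 1)
    bin_idx = np.clip(np.digitize(uncertainties, edges) - 1, 0, n_bins - 1)
    total, n = 0.0, len(uncertainties)
    for b in range(n_bins):
        mask = bin_idx == b
        if mask.any():
            total += mask.sum() / n * abs(errors_sq[mask].mean()
                                          - uncertainties[mask].mean())
    return total

rng = np.random.default_rng(5)
sigma2 = rng.uniform(0.05, 0.5, 1000)             # predicted variances
err_sq = sigma2 * rng.standard_normal(1000) ** 2  # squared errors matching them
# Well-calibrated: per-bin MSE tracks per-bin uncertainty, so UCE is small;
# zero errors with nonzero uncertainty would give a much larger UCE
```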

## 4 Experiments

We refer to the presented Bayesian approach to deep image prior with Monte Carlo dropout as MCDIP and evaluate its denoising performance and the calibration of uncertainty on three different medical imaging modalities (see Fig. 2). The first test image \( \mathbf{x}_{\mathrm{OCT}} \) shows an OCT scan of a retina affected by choroidal neovascularization. Next, \( \mathbf{x}_{\mathrm{US}} \) shows an ultrasound of a fetal head for gestational age estimation. The third test image \( \mathbf{x}_{\mathrm{xray}} \) shows a chest X-ray for pneumonia assessment. All test images are arbitrarily sampled from public data sets [9, 13] and have a resolution of \( 512 \times 512 \) pixels.

Images from optical coherence tomography and ultrasound are prone to speckle noise due to interference phenomena [21]. Speckle noise can obscure small anatomical details and reduce image contrast. Speckle patterns do contain information about the microstructure of the tissue; however, this information is not perceptible to a human observer, so denoising such images is desirable. Noise in low-dose X-ray originates from an uneven photon density and can be modeled with Poisson noise [17, 27]. In this work, we approximate the Poisson noise with Gaussian noise, since \( \mathsf{Poisson}(\lambda) \) approaches a normal distribution as \( \lambda \rightarrow \infty \) (see Appendix A.5). We first create a low-noise image \( \mathbf{x} \) by smoothing and downsampling the original image to \( 256 \times 256 \) pixels using the ANTIALIAS filter from the Python Imaging Library (PIL). Downsampling averages over highly correlated neighboring pixels affected by uncorrelated noise, which decreases the observation noise at the cost of image resolution (see Appendix A.4). The downsampled image acts as ground truth, against which we compute the peak signal-to-noise ratio (PSNR) and the structural similarity (SSIM) of the denoised image \( \hat{\mathbf{x}} \). Further, we compute the UCE and provide calibration diagrams (MSE vs. uncertainty) to show the (mis-)calibration of the uncertainty estimates.
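Both approximations used in this setup are easy to verify numerically; a small sketch, where \( \lambda \) and the sample sizes are arbitrary illustrative values:

```python
import numpy as np

# Two empirical checks behind the noise model above.
rng = np.random.default_rng(6)

# 1) Poisson(lam) approaches N(lam, lam) for large lam, justifying
#    the Gaussian approximation of low-dose X-ray noise.
lam = 100.0
counts = rng.poisson(lam, size=200_000).astype(float)
# counts.mean() and counts.var() are both close to lam

# 2) Downsampling by averaging k neighboring pixels with i.i.d.
#    noise reduces the noise variance by a factor of k.
k = 4
noise = rng.standard_normal((k, 50_000))   # unit-variance pixel noise
avg = noise.mean(axis=0)                   # k-pixel average
# avg.var() is close to 1/k = 0.25
```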

We compare the results from MCDIP to standard DIP and to DIP with SGLD from Cheng et al. [4]. SGLD posterior inference is performed by averaging over *T* posterior samples \( \hat{\mathbf{x}} = \frac{1}{T} \sum_{t=1}^{T} \hat{\mathbf{x}}_{t} \) after a “burn in” phase. The posterior variance \( \frac{1}{T} \sum_{t=1}^{T} \left( \hat{\mathbf{x}} - \hat{\mathbf{x}}_{t} \right)^{2} \) is used as an estimator of the epistemic uncertainty. Cheng et al. claim that their approach does not require early stopping and yields better denoising performance. Additionally, we train the SGLD approach with the loss function from Eq. (7) to consider aleatoric uncertainty and denote this with SGLD+NLL. We implement SGLD using the Adam optimizer, which works better in practice and is closely related to preconditioned SGLD [20].

## 5 Results

The results are presented threefold: We show (1) possible overfitting in Fig. 3 by plotting the PSNR between the reconstruction \( \hat{\mathbf{x}} \) and the ground truth image \( \mathbf{x} \); (2) denoising performance by providing the denoised images in Fig. 4 and PSNR in Table 1 after convergence (i.e. after 50k optimizer steps); and (3) goodness of uncertainty in Fig. 5 by providing calibration diagrams and uncertainty maps.

PSNR values after convergence (at least 50k iterations). Note that our goal was not to reach the highest possible PSNR, but to demonstrate overfitting at convergence.

| PSNR | DIP | SGLD | SGLD+NLL | MCDIP |
|---|---|---|---|---|
| OCT | \( 23.64 \pm 0.19 \) | \( 23.58 \pm 0.12 \) | \( 24.82 \pm 0.12 \) | \( \mathbf{29.88} \pm 0.03 \) |
| US | \( 23.55 \pm 0.11 \) | \( 23.81 \pm 0.15 \) | \( 24.55 \pm 0.08 \) | \( \mathbf{29.67} \pm 0.07 \) |
| X-ray | \( 23.28 \pm 0.08 \) | \( 23.50 \pm 0.12 \) | \( 24.60 \pm 0.04 \) | \( \mathbf{31.19} \pm 0.10 \) |

The calibration diagrams and corresponding UCE values in Fig. 5 suggest that SGLD+NLL is better calibrated than MCDIP. However, because SGLD+NLL overfits the noisy image without early stopping, its MSE concentrates around 0.0, which results in low UCE values. On the US and OCT image, the uncertainty from SGLD+NLL collapses to a single bin in the calibration diagram and does not allow one to reason about the validity of the reconstructed image (see Fig. 9 in Appendix A.1). The uncertainty map from MCDIP shows high uncertainty at edges in the image, and the mean uncertainty value (denoted by U) is close to the noise level in all three test images.

## 6 Discussion and Conclusion

In this paper, we provided a new Bayesian approach to the deep image prior. We used variational inference with Monte Carlo dropout and the full negative log-likelihood to quantify both epistemic and aleatoric uncertainty. The presented approach is applied to medical image denoising of three different modalities and provides state-of-the-art performance in denoising with deep image prior. Our Bayesian treatment does not need carefully applied early stopping and yields well-calibrated uncertainty. We observe the estimated mean uncertainty value to be close to the noise level of the images.

The question remains why Bayesian deep image prior with SGLD does not work as well as expected and is outperformed by MC dropout. First, SGLD as described by Welling and Teh requires a strong decay of the step size to ensure convergence to a mode of the posterior [26]. Cheng et al. did not implement this, and we followed their approach [4]. After implementing the described step size decay, SGLD did not overfit the noisy image (see Appendix A.3). However, this requires a carefully chosen step size decay, which is equivalent to early stopping.

The deep image prior framework is especially interesting in medical imaging as it does not require supervised training and thus does not suffer from hallucinations and other artifacts. The presented approach can further be applied to deformable registration or other inverse image tasks in the medical domain.

## References

- 1. Agostinelli, F., Anderson, M.R., Lee, H.: Adaptive multi-column deep neural networks with application to robust image denoising. In: Advances in Neural Information Processing Systems, pp. 1493–1501 (2013)
- 2. Bishop, C.M.: Pattern Recognition and Machine Learning. Springer, Boston (2006). https://doi.org/10.1007/978-1-4615-7566-5
- 3. Chang, S.G., Yu, B., Vetterli, M.: Adaptive wavelet thresholding for image denoising and compression. IEEE Trans. Image Process. **9**(9), 1532–1546 (2000). https://doi.org/10.1109/83.862633
- 4. Cheng, Z., Gadelha, M., Maji, S., Sheldon, D.: A Bayesian perspective on the deep image prior. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5443–5451 (2019)
- 5. Dabov, K., Foi, A., Katkovnik, V., Egiazarian, K.: Image denoising by sparse 3-D transform-domain collaborative filtering. IEEE Trans. Image Process. **16**(8), 2080–2095 (2007). https://doi.org/10.1109/TIP.2007.901238
- 6. Gal, Y., Ghahramani, Z.: Dropout as a Bayesian approximation: representing model uncertainty in deep learning. In: ICML, pp. 1050–1059 (2016)
- 7. Gondara, L.: Medical image denoising using convolutional denoising autoencoders. In: International Conference on Data Mining Workshops, pp. 241–246 (2016). https://doi.org/10.1109/ICDMW.2016.0041
- 8. Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: ICML, pp. 1321–1330 (2017)
- 9. van den Heuvel, T.L., de Bruijn, D., de Korte, C.L., van Ginneken, B.: Automated measurement of fetal head circumference using 2D ultrasound images. PLoS One **13**(8), e0200412 (2018). https://doi.org/10.1371/journal.pone.0200412
- 10. Hogg, R.V., McKean, J., Craig, A.T.: Introduction to Mathematical Statistics, 8th edn. Pearson, New York (2018)
- 11. Jain, V., Seung, S.: Natural image denoising with convolutional networks. In: Advances in Neural Information Processing Systems, pp. 769–776 (2009)
- 12. Kendall, A., Gal, Y.: What uncertainties do we need in Bayesian deep learning for computer vision? In: NeurIPS, pp. 5574–5584 (2017)
- 13. Kermany, D.S., et al.: Identifying medical diagnoses and treatable diseases by image-based deep learning. Cell **172**(5), 1122–1131 (2018). https://doi.org/10.1016/j.cell.2018.02.010
- 14. Kingma, D.P., Welling, M.: Auto-encoding variational Bayes. In: ICLR (2014)
- 15. Laves, M.H., Ihler, S., Fast, J.F., Kahrs, L.A., Ortmaier, T.: Well-calibrated regression uncertainty in medical imaging with deep learning. In: Medical Imaging with Deep Learning (2020)
- 16. Laves, M.H., Ihler, S., Kahrs, L.A., Ortmaier, T.: Semantic denoising autoencoders for retinal optical coherence tomography. In: SPIE/OSA European Conference on Biomedical Optics, vol. 11078, pp. 86–89 (2019). https://doi.org/10.1117/12.2526936
- 17. Lee, S., Lee, M.S., Kang, M.G.: Poisson-Gaussian noise analysis and estimation for low-dose X-ray images in the NSCT domain. Sensors **18**(4), 1019 (2018)
- 18. Lempitsky, V., Vedaldi, A., Ulyanov, D.: Deep image prior. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9446–9454 (2018). https://doi.org/10.1109/CVPR.2018.00984
- 19. Levi, D., Gispan, L., Giladi, N., Fetaya, E.: Evaluating and calibrating uncertainty prediction in regression tasks. arXiv:1905.11659 (2019)
- 20. Li, C., Chen, C., Carlson, D., Carin, L.: Preconditioned stochastic gradient Langevin dynamics for deep neural networks. In: Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, pp. 1788–1794 (2016)
- 21. Michailovich, O.V., Tannenbaum, A.: Despeckling of medical ultrasound images. IEEE Trans. Ultrason. Ferroelectr. Freq. Control **53**(1), 64–78 (2006). https://doi.org/10.1109/TUFFC.2006.1588392
- 22. Rabbani, H., Nezafat, R., Gazor, S.: Wavelet-domain medical image denoising using bivariate Laplacian mixture model. IEEE Trans. Biomed. Eng. **56**(12), 2826–2837 (2009). https://doi.org/10.1109/TBME.2009.2028876
- 23. Salinas, H.M., Fernandez, D.C.: Comparison of PDE-based nonlinear diffusion approaches for image enhancement and denoising in optical coherence tomography. IEEE Trans. Med. Imaging **26**(6), 761–771 (2007). https://doi.org/10.1109/TMI.2006.887375
- 24. Sotiras, A., Davatzikos, C., Paragios, N.: Deformable medical image registration: a survey. IEEE Trans. Med. Imaging **32**(7), 1153–1190 (2013). https://doi.org/10.1109/TMI.2013.2265603
- 25. Wang, N., Tao, D., Gao, X., Li, X., Li, J.: A comprehensive survey to face hallucination. Int. J. Comput. Vis. **106**(1), 9–30 (2014). https://doi.org/10.1007/s11263-013-0645-9
- 26. Welling, M., Teh, Y.W.: Bayesian learning via stochastic gradient Langevin dynamics. In: ICML, pp. 681–688 (2011)
- 27. Žabić, S., Wang, Q., Morton, T., Brown, K.M.: A low dose simulation tool for CT systems with energy integrating detectors. Med. Phys. **40**(3), 031102 (2013). https://doi.org/10.1118/1.4789628
- 28. Zhang, K., Zuo, W., Chen, Y., Meng, D., Zhang, L.: Beyond a Gaussian denoiser: residual learning of deep CNN for image denoising. IEEE Trans. Image Process. **26**(7), 3142–3155 (2017). https://doi.org/10.1109/TIP.2017.2662206