
Improving Aleatoric Uncertainty Quantification in Multi-annotated Medical Image Segmentation with Normalizing Flows


Part of the Lecture Notes in Computer Science book series (LNIP, volume 12959)


Quantifying uncertainty in medical image segmentation applications is essential, as it is often connected to vital decision-making. Compelling attempts have been made to quantify the uncertainty in image segmentation architectures, e.g. by learning a density segmentation model conditioned on the input image. Typical work in this field restricts these learnt densities to be strictly Gaussian. In this paper, we propose a more flexible approach that introduces Normalizing Flows (NFs), which allow the learnt densities to be more complex and thereby facilitate more accurate modeling of uncertainty. We test this hypothesis by adopting the Probabilistic U-Net and augmenting its posterior density with an NF, making the posterior more expressive. Our qualitative and quantitative (GED and IoU) evaluations on the multi-annotated LIDC-IDRI and the single-annotated Kvasir-SEG segmentation datasets show a clear improvement. This is most apparent in the quantification of aleatoric uncertainty and in an increase in predictive performance of up to 14%. This result strongly indicates that a more flexible density model should be seriously considered in architectures that attempt to capture segmentation ambiguity through density modeling. The benefit of this improved modeling will increase human confidence in annotation and segmentation, and enable eager adoption of the technology in practice.


  • Segmentation
  • Uncertainty
  • Computer vision
  • Imaging

  • DOI: 10.1007/978-3-030-87735-4_8
  • Chapter length: 14 pages
Fig. 1.
Fig. 2.
Fig. 3.



Author information


Corresponding author

Correspondence to M. M. Amaan Valiuddin.



A Probabilistic U-Net Objective

The loss function of the PU-Net is based on the standard evidence lower bound (ELBO); it is the negative ELBO, defined as

$$\mathcal{L} = -\mathbb{E}_{q_\phi(\mathbf{z}\mid\mathbf{s},\mathbf{x})}\left[\log p(\mathbf{s}\mid\mathbf{z},\mathbf{x})\right] + \mathrm{KL}\left(q_\phi(\mathbf{z}\mid\mathbf{s},\mathbf{x}) \,\|\, p_\psi(\mathbf{z}\mid\mathbf{x})\right),$$

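where the latent sample \(\mathbf {z}\) is drawn from the posterior distribution \(q_\phi\), which is conditioned on both the input image \(\mathbf {x}\) and the ground-truth segmentation \(\mathbf {s}\), whereas the prior \(p_\psi\) is conditioned on the input image \(\mathbf {x}\) alone.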
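The objective above can be made concrete with a minimal NumPy sketch. It assumes that the posterior and prior are axis-aligned Gaussians parameterized by a mean and a log-variance (so the KL term has a closed form), that the decoder defines an independent Bernoulli likelihood per pixel, and that the expectation is approximated with the single posterior sample that produced `logits`. All function names are hypothetical, and the KL term is left unweighted, exactly as written above.

```python
import numpy as np


def kl_diag_gaussians(mu_q, logvar_q, mu_p, logvar_p):
    """Closed-form KL(q || p) between axis-aligned Gaussians, summed over latent dims."""
    var_q, var_p = np.exp(logvar_q), np.exp(logvar_p)
    return 0.5 * np.sum(logvar_p - logvar_q + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)


def bernoulli_log_likelihood(logits, target):
    """log p(s | z, x) for a binary mask under an independent per-pixel Bernoulli model."""
    # log sigmoid(l) = -log(1 + e^{-l}) and log(1 - sigmoid(l)) = -log(1 + e^{l});
    # np.logaddexp(0, t) computes log(1 + e^t) without overflow.
    return np.sum(-target * np.logaddexp(0.0, -logits)
                  - (1.0 - target) * np.logaddexp(0.0, logits))


def pu_net_loss(logits, target, mu_q, logvar_q, mu_p, logvar_p):
    """Negative ELBO; `logits` are the decoder output for one posterior sample z."""
    reconstruction = -bernoulli_log_likelihood(logits, target)
    kl = kl_diag_gaussians(mu_q, logvar_q, mu_p, logvar_p)
    return reconstruction + kl


# Toy usage with random numbers standing in for network outputs (hypothetical shapes).
rng = np.random.default_rng(0)
logits = rng.normal(size=(8, 8))
target = (rng.random((8, 8)) > 0.5).astype(float)
loss = pu_net_loss(logits, target, rng.normal(size=6), rng.normal(size=6), np.zeros(6), np.zeros(6))
print(loss)
```

Once the posterior is augmented with a Normalizing Flow, the KL term no longer has this closed form and has to be estimated by sampling instead.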

B Planar and Radial Flows

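Normalizing Flows are trained by maximizing the log-likelihood given by the change-of-variables formula: a base sample \(\mathbf {z}_0 \sim p_0\) is passed through \(K\) invertible transformations \(f_i\), with \(\mathbf {z}_i = f_i(\mathbf {z}_{i-1})\) and \(\mathbf {x} = \mathbf {z}_K\), so that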

$$\log p(\mathbf{x}) = \log p_0(\mathbf{z}_0) - \sum_{i=1}^{K} \log\left|\det \frac{d f_i}{d \mathbf{z}_{i-1}}\right|.$$
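The change-of-variables formula can be checked numerically with a short sketch that pushes a standard-normal base sample through a chain of flows and subtracts each step's log-determinant. The elementwise affine flow `affine(a, c)` is a hypothetical stand-in, chosen because the resulting density is a Gaussian that can be written down in closed form; it is not one of the flows used in the paper.

```python
import numpy as np


def std_normal_logpdf(z):
    """log N(z; 0, I) of the base distribution p_0, summed over dimensions."""
    return -0.5 * np.sum(z ** 2 + np.log(2.0 * np.pi))


def affine(a, c):
    """Hypothetical elementwise flow f(z) = a * z + c (a must be nonzero).

    Returns a callable mapping z_{i-1} to (z_i, log|det df_i/dz_{i-1}|)."""
    def f(z):
        # The Jacobian is diag(a), so log|det| is the sum of log|a| over all dims.
        log_det = np.sum(np.log(np.abs(np.broadcast_to(a, z.shape))))
        return a * z + c, log_det
    return f


def flow_log_density(z0, flows):
    """Push z0 through the flows; return z_K and log p(z_K) by change of variables."""
    log_p = std_normal_logpdf(z0)
    z = z0
    for f in flows:
        z, log_det = f(z)
        log_p -= log_det  # subtract log|det df_i/dz_{i-1}| for each step
    return z, log_p
```

For instance, a single `affine(2.0, 1.0)` step maps the base distribution to \(\mathcal{N}(1, 4)\) in each dimension, and `flow_log_density` then returns the log-density of that Gaussian at \(\mathbf{z}_K\).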

In the PU-Net, the flow is applied to the posterior, whose log-density becomes

$$\log q(\mathbf{z}\mid\mathbf{s},\mathbf{x}) = \log q_0(\mathbf{z}_0\mid\mathbf{s},\mathbf{x}) - \sum_{i=1}^{K} \log\left|\det \frac{d f_i}{d \mathbf{z}_{i-1}}\right|,$$

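where \(\mathbf {z} = \mathbf {z}_K\) is the output of the final flow step, \(\mathbf {z}_0\) is sampled from the base posterior \(q_0\), and each intermediate latent sample \(\mathbf {z}_i\) is therefore conditioned on the input image \(\mathbf {x}\) and the ground-truth segmentation \(\mathbf {s}\).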
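With a flow-augmented posterior, the KL term of the PU-Net objective is no longer available in closed form. One common option, shown here as an assumption rather than as the authors' exact procedure, is a Monte Carlo estimate: draw \(\mathbf{z}_0\) from a diagonal Gaussian base posterior with the reparameterization trick, push it through the flows while tracking \(\log q\), and evaluate \(\log q(\mathbf{z}_K\vert\mathbf{s},\mathbf{x}) - \log p_\psi(\mathbf{z}_K\vert\mathbf{x})\) under a diagonal Gaussian prior. Each flow is a hypothetical callable returning the transformed sample and its log-determinant, and all function names are illustrative.

```python
import numpy as np


def diag_gaussian_logpdf(z, mu, log_sigma):
    """log N(z; mu, diag(exp(log_sigma)^2)), summed over latent dims."""
    return np.sum(-0.5 * ((z - mu) / np.exp(log_sigma)) ** 2
                  - log_sigma - 0.5 * np.log(2.0 * np.pi))


def sample_flow_posterior(mu_q, log_sigma_q, flows, rng):
    """Draw z_0 ~ q_0(z_0|s,x) by reparameterization, push it through the flows,
    and return z_K together with log q(z_K|s,x)."""
    eps = rng.standard_normal(mu_q.shape)
    z = mu_q + np.exp(log_sigma_q) * eps
    log_q = diag_gaussian_logpdf(z, mu_q, log_sigma_q)
    for f in flows:
        z, log_det = f(z)  # each flow returns (z_i, log|det df_i/dz_{i-1}|)
        log_q -= log_det
    return z, log_q


def mc_kl(mu_q, log_sigma_q, flows, mu_p, log_sigma_p, rng):
    """Single-sample estimate of KL(q(z|s,x) || p(z|x)) = E_q[log q - log p]."""
    z_k, log_q = sample_flow_posterior(mu_q, log_sigma_q, flows, rng)
    return log_q - diag_gaussian_logpdf(z_k, mu_p, log_sigma_p)
```

Averaging `mc_kl` over several posterior samples lowers the variance of the estimate at the cost of extra passes through the flow.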

The planar flow expands and contracts distributions along a specific direction by applying the transformation

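$$f(\mathbf{x}) = \mathbf{x} + \mathbf{u}\,h(\mathbf{w}^T\mathbf{x} + b),$$

where \(\mathbf {u}\) and \(\mathbf {w}\) are learnable vectors, \(b\) is a learnable scalar and \(h\) is a smooth element-wise nonlinearity such as \(\tanh\),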

while the radial flow warps distributions around a specific point with the transformation

$$f(\mathbf{x}) = \mathbf{x} + \frac{\beta}{\alpha + \lVert\mathbf{x}-\mathbf{x}_0\rVert}(\mathbf{x}-\mathbf{x}_0),$$

where \(\mathbf {x}_0\) is the reference point and \(\alpha > 0\) and \(\beta\) are learnable scalars.
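Both transformations can be sketched in NumPy together with their log-determinants, which are the quantities the flow-augmented posterior density needs. The sketch assumes \(h = \tanh\) for the planar flow and the standard radial parameterization \(h(\alpha, r) = 1/(\alpha + r)\) with \(r = \lVert\mathbf{z}-\mathbf{z}_0\rVert\); the input is named `z` rather than \(\mathbf{x}\), and all function names are hypothetical. For the planar flow, the matrix determinant lemma gives \(\det J = 1 + \mathbf{u}^T h'(\mathbf{w}^T\mathbf{z} + b)\,\mathbf{w}\); for the radial flow, the Jacobian has eigenvalue \(1 + \beta h\) with multiplicity \(d - 1\) and \(1 + \beta h + \beta h'(r)\,r\) along the radial direction. The analytic values are compared against a finite-difference Jacobian as a sanity check.

```python
from functools import partial

import numpy as np


def planar_flow(z, u, w, b):
    """Planar flow f(z) = z + u * tanh(w^T z + b); returns f(z) and log|det J|."""
    a = w @ z + b
    psi = (1.0 - np.tanh(a) ** 2) * w  # h'(a) * w for h = tanh
    return z + u * np.tanh(a), np.log(np.abs(1.0 + u @ psi))


def radial_flow(z, z0, alpha, beta):
    """Radial flow f(z) = z + beta * h(alpha, r) * (z - z0) with h = 1 / (alpha + r)."""
    d = z.shape[0]
    diff = z - z0
    r = np.linalg.norm(diff)
    h = 1.0 / (alpha + r)
    dh = -1.0 / (alpha + r) ** 2  # derivative of h with respect to r
    log_det = ((d - 1) * np.log(np.abs(1.0 + beta * h))
               + np.log(np.abs(1.0 + beta * h + beta * dh * r)))
    return z + beta * h * diff, log_det


def numerical_log_det(flow, z, eps=1e-6):
    """Central-difference Jacobian, used only to check the analytic log-determinants."""
    cols = [(flow(z + eps * e)[0] - flow(z - eps * e)[0]) / (2 * eps) for e in np.eye(z.size)]
    return np.log(np.abs(np.linalg.det(np.stack(cols, axis=1))))


z = np.array([0.5, -1.0, 0.2])
planar = partial(planar_flow, u=np.array([0.3, 0.1, -0.2]), w=np.array([1.0, 0.5, -0.5]), b=0.1)
radial = partial(radial_flow, z0=np.array([0.1, 0.2, -0.3]), alpha=0.7, beta=0.4)
for flow in (planar, radial):
    print(flow(z)[1], numerical_log_det(flow, z))  # the two values should agree closely
```

For invertibility, the planar flow with \(\tanh\) requires \(\mathbf{u}^T\mathbf{w} \ge -1\) and the radial flow requires \(\beta \ge -\alpha\); implementations commonly enforce these constraints by reparameterizing \(\mathbf{u}\) and \(\beta\).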

C Dataset Images

This appendix shows example images from the datasets used in this work. Figure 4 depicts four examples from the LIDC dataset: each shows, on the left, the 2D CT image containing the lesion, followed by the labels made by four independent annotators. Figure 5 depicts eight examples from the Kvasir-SEG dataset, each consisting of an endoscopic image and its ground-truth label.

Fig. 4.

Example images from the LIDC dataset.

Fig. 5.

Example images from the Kvasir-SEG dataset.

D Sample Size Dependent GED

The generalized energy distance (GED) depends on the number of segmentation samples drawn via the prior distribution. Figure 6 depicts this relationship for the vanilla, 2-planar and 2-radial posterior models. The uncertainty in the reported values reflects the variation in results across the folds of ten-fold cross-validation. As the sample size increases, both the GED and its associated uncertainty decrease. This holds for the vanilla model as well as when the posterior is augmented with a 2-planar or 2-radial flow; in particular, the uncertainty in the GED evaluation decreases markedly.
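The GED between sampled segmentations \(S\) and annotations \(Y\) is usually written as \(D^2_{\mathrm{GED}} = 2\,\mathbb{E}[d(S,Y)] - \mathbb{E}[d(S,S')] - \mathbb{E}[d(Y,Y')]\) with \(d = 1 - \mathrm{IoU}\). The sketch below estimates each expectation by averaging over all pairs (including self-pairs) of the finite sample and annotation sets, and returns the squared form; this averaging scheme and the convention that two empty masks have distance 0 are assumptions, not details taken from the paper. Because the expectations are estimated from a finite number of samples, the result depends on how many samples are drawn, which is the effect studied in Fig. 6.

```python
import numpy as np


def iou_distance(a, b):
    """d(a, b) = 1 - IoU for binary masks; two empty masks count as identical (assumed)."""
    a, b = a.astype(bool), b.astype(bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0
    return 1.0 - np.logical_and(a, b).sum() / union


def mean_pairwise(xs, ys):
    """Average d(x, y) over every pair drawn from the two mask collections."""
    return float(np.mean([[iou_distance(x, y) for y in ys] for x in xs]))


def ged_squared(samples, annotations):
    """Squared generalized energy distance between predicted samples and annotations."""
    return (2.0 * mean_pairwise(samples, annotations)
            - mean_pairwise(samples, samples)
            - mean_pairwise(annotations, annotations))
```

Take the square root of `ged_squared` if the unsquared distance is wanted.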

Fig. 6.

The GED as a function of the number of samples, evaluated on the vanilla, 2-planar and 2-radial models.

E Prior Distribution Variance

We investigated whether the prior distribution captures the degree of ambiguity in the input images. For every input image \(\mathbf {x}\), we obtain an L-dimensional mean vector \(\boldsymbol{\mu }\) and standard-deviation vector \(\boldsymbol{\sigma }\) of the prior distribution \(p_\psi (\mathbf {z}\vert \mathbf {x})\). To quantify the ambiguity of each image, we compute the mean of the latent prior variance vector, \(\mu _{LV} = \frac{1}{L}\sum _{l=1}^{L}\sigma _l^2\). Figure 7 shows this quantity for several input images of the test set. As can be seen, \(\mu _{LV}\) increases along with a subjective assessment of the annotation difficulty.
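Computing \(\mu_{LV}\) is a one-line reduction once the prior's standard deviations are available. The sketch assumes the prior network returns \(\boldsymbol{\sigma}\) directly (if it returns a log-standard-deviation, exponentiate first); `rank_by_ambiguity` is a hypothetical helper that orders images from least to most ambiguous, in the spirit of the comparison shown in Fig. 7.

```python
import numpy as np


def mean_latent_prior_variance(sigma):
    """mu_LV: mean over the L latent dimensions of the prior variance sigma**2."""
    return float(np.mean(np.asarray(sigma) ** 2))


def rank_by_ambiguity(sigmas):
    """Return image indices sorted by increasing mu_LV, plus the per-image scores."""
    scores = [mean_latent_prior_variance(s) for s in sigmas]
    order = sorted(range(len(sigmas)), key=scores.__getitem__)
    return order, scores
```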

Fig. 7.

Depicted in each CT image is the mean of the prior distribution variance, \(\mu _{LV}\), of the 2-planar model. For each example, we show the input CT image, its average segmentation prediction over 16 samples, and the ground truths from four annotators.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Valiuddin, M.M.A., Viviers, C.G.A., van Sloun, R.J.G., de With, P.H.N., van der Sommen, F. (2021). Improving Aleatoric Uncertainty Quantification in Multi-annotated Medical Image Segmentation with Normalizing Flows. In: , et al. Uncertainty for Safe Utilization of Machine Learning in Medical Imaging, and Perinatal Imaging, Placental and Preterm Image Analysis. UNSURE PIPPI 2021. Lecture Notes in Computer Science, vol 12959. Springer, Cham.


  • DOI: 10.1007/978-3-030-87735-4_8

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-87734-7

  • Online ISBN: 978-3-030-87735-4

  • eBook Packages: Computer Science, Computer Science (R0)