## Abstract

Quantifying uncertainty in medical image segmentation is essential, as segmentation outputs often feed into vital decision-making. Compelling attempts have been made to quantify uncertainty in image segmentation architectures, e.g. by learning a density of segmentations conditioned on the input image. Typical work in this field restricts these learnt densities to be strictly Gaussian. In this paper, we propose a more flexible approach by introducing Normalizing Flows (NFs), which allow the learnt densities to be more complex and thereby enable more accurate modeling of uncertainty. We test this hypothesis by adopting the Probabilistic U-Net and augmenting its posterior density with an NF, making it more expressive. Our qualitative as well as quantitative (GED and IoU) evaluations on the multi-annotated LIDC-IDRI and single-annotated Kvasir-SEG segmentation datasets show a clear improvement. This is most apparent in the quantification of aleatoric uncertainty and in the increased predictive performance of up to 14%. This result strongly indicates that a more flexible density model should be seriously considered in architectures that attempt to capture segmentation ambiguity through density modeling. The benefit of this improved modeling will increase human confidence in annotation and segmentation, and enable eager adoption of the technology in practice.

### Keywords

- Segmentation
- Uncertainty
- Computer vision
- Imaging




## Appendices


### A Probabilistic U-Net Objective

The loss function of the PU-Net is based on the standard ELBO and is defined as

\[\mathcal{L}_{\text{ELBO}} = \mathbb{E}_{q(\mathbf{z} \vert \mathbf{x}, \mathbf{s})}\!\left[ -\log p(\mathbf{s} \vert \mathbf{z}, \mathbf{x}) \right] + \beta \, D_{\mathrm{KL}}\!\left( q(\mathbf{z} \vert \mathbf{x}, \mathbf{s}) \,\Vert\, p(\mathbf{z} \vert \mathbf{x}) \right),\]

where the latent sample \(\mathbf{z}\) from the posterior distribution \(q\) is conditioned on the input image \(\mathbf{x}\) and ground-truth segmentation \(\mathbf{s}\), and \(p(\mathbf{z} \vert \mathbf{x})\) denotes the prior conditioned on the input image only.
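As an illustration of this objective, the sketch below computes an ELBO-style loss in NumPy, assuming diagonal-Gaussian prior and posterior and a binary segmentation target; the function names are hypothetical and this is not the authors' implementation:

```python
import numpy as np

def kl_diag_gaussians(mu_q, sigma_q, mu_p, sigma_p):
    """Closed-form KL(q || p) between two diagonal Gaussians."""
    return np.sum(
        np.log(sigma_p / sigma_q)
        + (sigma_q**2 + (mu_q - mu_p) ** 2) / (2.0 * sigma_p**2)
        - 0.5
    )

def pu_net_loss(seg_logits, seg_target, mu_q, sigma_q, mu_p, sigma_p, beta=1.0):
    """ELBO-style loss: cross-entropy reconstruction + beta-weighted KL term."""
    # Per-pixel binary cross-entropy of the predicted logits vs. ground truth.
    probs = 1.0 / (1.0 + np.exp(-seg_logits))
    eps = 1e-7
    recon = -np.mean(
        seg_target * np.log(probs + eps)
        + (1 - seg_target) * np.log(1 - probs + eps)
    )
    return recon + beta * kl_diag_gaussians(mu_q, sigma_q, mu_p, sigma_p)
```

The KL term has this closed form only because both densities are diagonal Gaussians; once the posterior is transformed by a flow (Appendix B), it must instead be estimated via Monte Carlo samples.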

### B Planar and Radial Flows

Normalizing Flows are trained by maximizing the likelihood objective

\[\log q_K(\mathbf{z}_K) = \log q_0(\mathbf{z}_0) - \sum_{k=1}^{K} \log \left| \det \frac{\partial f_k}{\partial \mathbf{z}_{k-1}} \right|,\]

where \(\mathbf{z}_K = f_K \circ \cdots \circ f_1(\mathbf{z}_0)\) is obtained by passing a sample \(\mathbf{z}_0\) from the base density \(q_0\) through the invertible transformations \(f_1, \ldots, f_K\). In the PU-Net, the objective becomes

\[\mathcal{L} = \mathbb{E}_{q_0(\mathbf{z}_0 \vert \mathbf{x}, \mathbf{s})}\!\left[ -\log p(\mathbf{s} \vert \mathbf{z}_K, \mathbf{x}) + \beta \left( \log q_0(\mathbf{z}_0 \vert \mathbf{x}, \mathbf{s}) - \sum_{k=1}^{K} \log \left| \det \frac{\partial f_k}{\partial \mathbf{z}_{k-1}} \right| - \log p(\mathbf{z}_K \vert \mathbf{x}) \right) \right],\]

where the *k*-th latent sample \(\mathbf{z}_k = f_k(\mathbf{z}_{k-1})\) from the Normalizing Flow is conditioned on the input image \(\mathbf{x}\) and ground-truth segmentation \(\mathbf{s}\).

The planar flow expands and contracts distributions along a specific direction by applying the transformation

\[f(\mathbf{z}) = \mathbf{z} + \mathbf{u} \, h(\mathbf{w}^\top \mathbf{z} + b),\]

with learnable parameters \(\mathbf{u}, \mathbf{w} \in \mathbb{R}^L\), \(b \in \mathbb{R}\) and a smooth non-linearity \(h\) (e.g. \(\tanh\)), while the radial flow warps distributions around a specific point \(\mathbf{z}_0\) with the transformation

\[f(\mathbf{z}) = \mathbf{z} + \beta h(\alpha, r)(\mathbf{z} - \mathbf{z}_0), \qquad r = \Vert \mathbf{z} - \mathbf{z}_0 \Vert, \quad h(\alpha, r) = \frac{1}{\alpha + r},\]

with parameters \(\alpha \in \mathbb{R}^{+}\) and \(\beta \in \mathbb{R}\).
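The two transformations can be sketched as follows; this is a minimal NumPy version with \(h = \tanh\) for the planar flow, where the parameter constraints that guarantee invertibility are omitted and the names are illustrative:

```python
import numpy as np

def planar_flow(z, u, w, b):
    """One planar-flow step f(z) = z + u * tanh(w.z + b), plus log|det Jacobian|."""
    a = np.dot(w, z) + b
    f_z = z + u * np.tanh(a)
    # psi = h'(a) * w, with h = tanh so h'(a) = 1 - tanh(a)^2
    psi = (1.0 - np.tanh(a) ** 2) * w
    log_det = np.log(np.abs(1.0 + np.dot(u, psi)))
    return f_z, log_det

def radial_flow(z, z0, alpha, beta):
    """One radial-flow step f(z) = z + beta * h(alpha, r) * (z - z0)."""
    r = np.linalg.norm(z - z0)
    h = 1.0 / (alpha + r)
    return z + beta * h * (z - z0)
```

The planar log-determinant follows from the matrix determinant lemma, which is what makes stacking several such steps cheap inside the flow-augmented objective above.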

### C Dataset Images

This appendix shows example images from the datasets used in this work. Figure 4 depicts four examples from the LIDC dataset: on the left, the 2D CT image containing the lesion, followed by the labels made by four independent annotators. Figure 5 depicts eight examples from the Kvasir-SEG dataset, each showing an endoscopic image with its ground-truth label.

### D Sample Size Dependent GED

The GED evaluation depends on the number of reconstructions sampled from the prior distribution. Figure 6 depicts this relationship for the vanilla, 2-planar and 2-radial posterior models. The uncertainty in the values originates from the varying results obtained with ten-fold cross-validation. One can observe that with increasing sample size, both the GED and its associated uncertainty decrease. This also holds when the posterior is augmented with a 2-planar or 2-radial flow; in particular, the uncertainty in the GED evaluation decreases significantly.
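For reference, the GED used here is the generalized energy distance \(D^2_{\mathrm{GED}} = 2\,\mathbb{E}[d(\mathbf{s}, \mathbf{y})] - \mathbb{E}[d(\mathbf{s}, \mathbf{s}')] - \mathbb{E}[d(\mathbf{y}, \mathbf{y}')]\) with \(d = 1 - \mathrm{IoU}\) between segmentations. A minimal sketch with hypothetical helper names:

```python
import numpy as np

def iou_distance(a, b):
    """1 - IoU between two binary masks; two empty masks have distance 0."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return 0.0 if union == 0 else 1.0 - inter / union

def generalized_energy_distance(samples, annotations):
    """Squared GED between model samples and ground-truth annotations."""
    cross = np.mean([iou_distance(s, y) for s in samples for y in annotations])
    within_s = np.mean([iou_distance(s, t) for s in samples for t in samples])
    within_y = np.mean([iou_distance(y, t) for y in annotations for t in annotations])
    return 2.0 * cross - within_s - within_y
```

Because `cross` is estimated from a finite number of prior samples, the metric itself is a random quantity, which is exactly the sample-size dependence plotted in Fig. 6.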

### E Prior Distribution Variance

We investigated whether the prior distribution captures the degree of ambiguity in the input images. For every input image \(\mathbf{x}\), we obtain the latent *L*-dimensional mean and standard-deviation vectors of the prior distribution \(P(\boldsymbol{\mu}, \boldsymbol{\sigma} \vert \mathbf{x})\). The mean of the latent prior variance vector, \(\mu_{LV}\), is computed for each input image in an attempt to quantify this uncertainty. Figure 7 shows \(\mu_{LV}\) for several input images from the test set. As can be seen, the mean variance of the latent prior increases along with a subjective assessment of the annotation difficulty.
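The \(\mu_{LV}\) statistic described above amounts to averaging the latent prior variances. A small sketch, with illustrative names, that also ranks images by this ambiguity proxy:

```python
import numpy as np

def mean_latent_variance(sigma):
    """mu_LV: mean of the L-dimensional latent prior variance vector sigma^2."""
    return float(np.mean(np.asarray(sigma, dtype=float) ** 2))

def rank_by_ambiguity(prior_sigmas):
    """Order images from most to least ambiguous by their mu_LV score."""
    scores = [mean_latent_variance(s) for s in prior_sigmas]
    return sorted(range(len(scores)), key=lambda i: -scores[i])
```

Such a ranking could flag the most ambiguous cases for extra annotator attention, in line with the subjective difficulty assessment in Fig. 7.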


## Copyright information

© 2021 Springer Nature Switzerland AG

## About this paper

### Cite this paper

Valiuddin, M.M.A., Viviers, C.G.A., van Sloun, R.J.G., de With, P.H.N., van der Sommen, F. (2021). Improving Aleatoric Uncertainty Quantification in Multi-annotated Medical Image Segmentation with Normalizing Flows. In: *et al.* Uncertainty for Safe Utilization of Machine Learning in Medical Imaging, and Perinatal Imaging, Placental and Preterm Image Analysis. UNSURE 2021, PIPPI 2021. Lecture Notes in Computer Science, vol. 12959. Springer, Cham. https://doi.org/10.1007/978-3-030-87735-4_8


Publisher Name: Springer, Cham

Print ISBN: 978-3-030-87734-7

Online ISBN: 978-3-030-87735-4
