Calibrated Surrogate Maximization of Dice

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12264)


In the medical imaging community, it is increasingly popular to train machine learning models for segmentation problems with objectives based on the soft-Dice surrogate. While experimental studies have shown good performance with respect to Dice, there have also been reports of stability issues. In parallel with these developments, direct optimization of evaluation metrics has been studied in the context of binary classification. Recently, in this setting, a quasi-concave, lower-bounded and calibrated surrogate for the \(F_1\)-score has been proposed. In this work, we show how to use this surrogate in the context of segmentation. We then show that it has some better theoretical properties than soft-Dice. Finally, we experimentally compare the new surrogate with soft-Dice on a 3D segmentation problem and obtain results indicating that stability is improved. We conclude that the new surrogate, for both theoretical and experimental reasons, can be considered a promising alternative to the soft-Dice surrogate.
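For readers unfamiliar with the terms used above, the following is a minimal sketch of the Dice score, its coincidence with the \(F_1\)-score on binary masks, and the common soft-Dice relaxation. It does not reproduce the surrogate studied in the paper. The function names, the smoothing constant `eps`, and the convention for empty masks are assumptions made for illustration only.

```python
def dice(y_true, y_pred):
    """Dice score between two binary masks given as flat 0/1 sequences."""
    intersection = sum(t * p for t, p in zip(y_true, y_pred))
    total = sum(y_true) + sum(y_pred)
    # Assumed convention: two empty masks are treated as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def f1(y_true, y_pred):
    """F1-score computed from true positives, false positives, false negatives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 1.0 if denom == 0 else 2.0 * tp / denom


def soft_dice(y_true, probs, eps=1e-7):
    """Soft-Dice: the Dice ratio with probabilities in place of hard labels.

    eps is a hypothetical smoothing constant that guards against 0/0.
    """
    intersection = sum(t * p for t, p in zip(y_true, probs))
    total = sum(y_true) + sum(probs)
    return (2.0 * intersection + eps) / (total + eps)


if __name__ == "__main__":
    labels = [1, 1, 0, 0, 1]
    hard = [1, 0, 0, 1, 1]
    probs = [0.9, 0.4, 0.1, 0.6, 0.8]
    print(dice(labels, hard), f1(labels, hard), soft_dice(labels, probs))
```

On hard 0/1 predictions, soft-Dice reduces to Dice up to the smoothing constant, which is why it is commonly used as a differentiable training objective for a Dice-evaluated task.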


Keywords: Dice, Calibration, Segmentation



Marcus Nordström, Fredrik Löfman, Henrik Hult and Atsuto Maki were supported by RaySearch Laboratories. Masashi Sugiyama was supported by the International Research Center for Neurointelligence (WPI-IRCN) at The University of Tokyo Institutes for Advanced Study.

Supplementary material

Supplementary material 1: 505216_1_En_27_MOESM1_ESM.pdf (PDF, 122 KB)



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. KTH Royal Institute of Technology, Stockholm, Sweden
  2. RaySearch Laboratories, Stockholm, Sweden
  3. The University of Tokyo, Tokyo, Japan
  4. RIKEN, Tokyo, Japan
