Simultaneous Semantic Segmentation and Outlier Detection in Presence of Domain Shift

  • Petra Bevandić
  • Ivan Krešo
  • Marin Oršić
  • Siniša Šegvić
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11824)


Recent success on realistic road driving datasets has increased interest in exploring robust performance in real-world applications. One of the major unsolved problems is to identify image content which cannot be reliably recognized with a given inference engine. We therefore study approaches to recover a dense outlier map alongside the primary task with a single forward pass, by relying on shared convolutional features. We consider semantic segmentation as the primary task and perform extensive validation on WildDash val (inliers), LSUN val (outliers), and pasted objects from Pascal VOC 2007 (outliers). We achieve the best validation performance by training to discriminate inliers from pasted ImageNet-1k content, even though ImageNet-1k contains many road-driving pixels and, at least nominally, fails to account for the full diversity of the visual world. The proposed two-head model performs comparably to the C-way multi-class model trained to predict a uniform distribution in outliers, while outperforming several other validated approaches. We evaluate our best two models on the WildDash test dataset and set a new state of the art on the WildDash benchmark.
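
The two training setups named in the abstract can be illustrated with a minimal per-pixel sketch. It is not the authors' implementation: the function names, the single-logit outlier head, the unweighted sum of loss terms, and the max-softmax outlier score are all assumptions made for illustration, and the abstract does not specify any of them. The sketch shows (a) pasting an outlier patch into an inlier image to obtain a dense outlier mask, (b) a two-head loss in which a segmentation head and an outlier head read the same shared features, and (c) a C-way loss that pushes predictions towards a uniform distribution at outlier pixels.

```python
import math

EPS = 1e-12  # guards against log(0)

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    # Numerically stable for large negative z.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def paste_patch(image, patch, top, left):
    """Paste an outlier patch into a copy of an inlier image (hypothetical helper).

    Returns the new image and a dense mask with 1 at pasted (outlier) pixels.
    """
    out = [row[:] for row in image]
    mask = [[0] * len(row) for row in image]
    for i, patch_row in enumerate(patch):
        for j, value in enumerate(patch_row):
            out[top + i][left + j] = value
            mask[top + i][left + j] = 1
    return out, mask

def two_head_pixel_loss(seg_logits, outlier_logit, label, is_outlier):
    """Assumed two-head loss at one pixel.

    The outlier head is trained at every pixel with binary cross-entropy;
    the segmentation head is trained only at inlier pixels.
    """
    p_out = sigmoid(outlier_logit)
    if is_outlier:
        return -math.log(p_out + EPS)
    bce = -math.log(1.0 - p_out + EPS)
    ce = -math.log(softmax(seg_logits)[label] + EPS)
    return ce + bce

def uniform_target_pixel_loss(seg_logits, label, is_outlier):
    """Assumed C-way loss: cross-entropy against a uniform target at outliers."""
    probs = softmax(seg_logits)
    if is_outlier:
        return -sum(math.log(p + EPS) for p in probs) / len(probs)
    return -math.log(probs[label] + EPS)

def max_softmax_outlier_score(seg_logits):
    """Assumed inference-time score for the C-way model: low confidence = outlier."""
    return 1.0 - max(softmax(seg_logits))
```

In both variants the outlier map comes from the same forward pass as the segmentation: the two-head model reads it from `sigmoid(outlier_logit)`, while the C-way model derives it from the class probabilities themselves.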

Supplementary material

Supplementary material 1: 480714_1_En_3_MOESM1_ESM.pdf (PDF, 4335 KB)



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Petra Bevandić (1), corresponding author
  • Ivan Krešo (1)
  • Marin Oršić (1)
  • Siniša Šegvić (1)
  1. Faculty of Electrical Engineering and Computing, University of Zagreb, Zagreb, Croatia
