
Increasing the Robustness of Semantic Segmentation Models with Painting-by-Numbers

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12355)

Abstract

For safety-critical applications such as autonomous driving, CNNs have to be robust with respect to unavoidable image corruptions, such as image noise. While previous works addressed the task of robust prediction in the context of full-image classification, we consider it for dense semantic segmentation. We build upon an insight from image classification that output robustness can be improved by increasing the network bias towards object shapes. We present a new training schema that increases this shape bias. Our basic idea is to alpha-blend a portion of the RGB training images with fake images, in which each class label is assigned a fixed, randomly chosen color that is unlikely to appear in real imagery. This forces the network to rely more strongly on shape cues. We call this data augmentation technique "Painting-by-Numbers". We demonstrate the effectiveness of our training schema for DeepLabv3+ with various network backbones (MobileNet-V2, ResNets, and Xception) and evaluate it on the Cityscapes dataset. Across our 16 types of image corruption and 5 network backbones, we outperform training with clean data in 74% of the cases. Where a model trained with our schema performs worse than one trained without it, the difference is mostly marginal. For some image corruptions, such as image noise, we see a considerable performance gain of up to 25%.
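
To make the augmentation concrete, below is a minimal NumPy sketch of the Painting-by-Numbers idea as described above. All names (e.g. CLASS_COLORS, paint_by_numbers), the 50% augmentation rate, and the uniform alpha sampling are illustrative assumptions, not the authors' implementation.

    import numpy as np

    NUM_CLASSES = 19  # e.g. the Cityscapes evaluation classes

    # One fixed, randomly chosen RGB color per class label, drawn once so
    # the colors stay constant across the whole training run.
    rng = np.random.default_rng(seed=0)
    CLASS_COLORS = rng.integers(0, 256, size=(NUM_CLASSES, 3)).astype(np.float32)

    def paint_by_numbers(image, label_map, alpha):
        """Alpha-blend an RGB image with its color-coded label map.

        image:     float32 array of shape (H, W, 3), values in [0, 255]
        label_map: int array of shape (H, W) with class ids in [0, NUM_CLASSES)
        alpha:     blend factor in [0, 1]; 0 keeps the clean image,
                   1 yields the pure "painted" fake image.
        """
        painted = CLASS_COLORS[label_map]  # (H, W, 3) fake image from labels
        return (1.0 - alpha) * image + alpha * painted

    # Usage: augment a random portion of the training samples with a
    # sampled blend factor (rate and sampling range are assumptions).
    image = rng.uniform(0, 255, size=(512, 1024, 3)).astype(np.float32)
    label_map = rng.integers(0, NUM_CLASSES, size=(512, 1024))
    if rng.uniform() < 0.5:
        image = paint_by_numbers(image, label_map, alpha=rng.uniform(0.0, 1.0))

Because the fake colors carry no realistic texture, the blended images penalize texture-based shortcuts and reward shape-based cues during training.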

Keywords

Semantic segmentation · Shape bias · Corruption robustness

Supplementary material

Supplementary material 1: 504449_1_En_22_MOESM1_ESM.pdf (PDF, 10.3 MB)


Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Corporate Research, Robert Bosch GmbH, Renningen, Germany
  2. Visual Learning Lab, Heidelberg University (HCI/IWR), Heidelberg, Germany
