Semantic Segmentation with Modified Deep Residual Networks

  • Xinze Chen
  • Guangliang Cheng
  • Yinghao Cai
  • Dayong Wen
  • Heping LiEmail author
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 663)


A novel semantic segmentation method is proposed, which consists of the following three parts: (I) First, a simple yet effective data augmentation method is introduced without any extra GPU memory cost during training. (II) Second, a deeper residual network is constructed through three effective techniques: dilated convolution, LSTM network and multi-scale prediction. (III) Third, an online hard pixels mining is adopted to improve the segmentation performance. We combine these three parts to train an end-to-end network and achieve a new state-of-the-art segmentation accuracy of 79.3 % on PASCAL VOC 2012 test set at the time of submission.


Semantic segmentation Data augmentation Residual networks LSTM Multi-scale prediction 



This work is supported by the National Natural Science Foundation of China (NSFC) (Grant No.: 61305048, Grant No.: 61503381).


  1. 1.
    Bell, S., Zitnick, C.L., Bala, K., Girshick, R.B.: Inside-outside net: detecting objects in context with skip pooling and recurrent neural networks. arXiv preprint arXiv:1512.04143 (2015)
  2. 2.
    Chen, L.C., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L.: Semantic image segmentation with deep convolutional nets and fully connected CRFs. In: ICLR (2015)Google Scholar
  3. 3.
    Chen, L.C., Yang, Y., Wang, J., Xu, W., Yuille, A.L.: Attention to scale: scale-aware semantic image segmentation. In: CVPR (2016)Google Scholar
  4. 4.
    Dai, J., He, K., Sun, J.: BoxSup: exploiting bounding boxes to supervise convolutional networks for semantic segmentation. In: ICCV, pp. 1635–1643 (2015)Google Scholar
  5. 5.
    Everingham, M., Eslami, S.M.A., Gool, L.J.V., Williams, C.K.I., Winn, J.M., Zisserman, A.: The Pascal visual object classes challenge: a retrospective. Int. J. Comput. Vis. 111(1), 98–136 (2015)CrossRefGoogle Scholar
  6. 6.
    Graves, A.: Generating sequences with recurrent neural networks. arXiv preprint arXiv:1308.0850 (2013)
  7. 7.
    Graves, A., Jaitly, N.: Towards end-to-end speech recognition with recurrent neural networks. In: ICML, pp. 1764–1772 (2014)Google Scholar
  8. 8.
    Hariharan, B., Arbelaez, P., Bourdev, L.D., Maji, S., Malik, J.: Semantic contours from inverse detectors. In: ICCV, pp. 991–998 (2011)Google Scholar
  9. 9.
    He, K., Zhang, X., Ren, S., Sun, J.: Delving deep into rectifiers: surpassing human-level performance on imagenet classification. In: ICCV, pp. 1026–1034 (2015)Google Scholar
  10. 10.
    He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR (2016)Google Scholar
  11. 11.
    Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)CrossRefGoogle Scholar
  12. 12.
    Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R.B., Guadarrama, S., Darrell, T.: Caffe: convolutional architecture for fast feature embedding. In: Proceedings of the ACM International Conference on Multimedia, pp. 675–678 (2014)Google Scholar
  13. 13.
    Krähenbühl, P., Koltun, V.: Efficient inference in fully connected CRFs with Gaussian edge potentials. In: NIPS, pp. 109–117 (2011)Google Scholar
  14. 14.
    Lin, G., Shen, C., van den Hengel, A., Reid, I.D.: Exploring context with deep structured models for semantic segmentation. In: CVPR (2016)Google Scholar
  15. 15.
    Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Heidelberg (2014). doi: 10.1007/978-3-319-10602-1_48 Google Scholar
  16. 16.
    Liu, Z., Li, X., Luo, P., Loy, C.C., Tang, X.: Semantic image segmentation via deep parsing network. In: ICCV, pp. 1377–1385 (2015)Google Scholar
  17. 17.
    Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: CVPR, pp. 3431–3440 (2015)Google Scholar
  18. 18.
    Mallat, S.: A Wavelet Tour of Signal Processing, 2nd edn. Academic Press, San Diego (1999)zbMATHGoogle Scholar
  19. 19.
    Noh, H., Hong, S., Han, B.: Learning deconvolution network for semantic segmentation. In: ICCV, pp. 1520–1528 (2015)Google Scholar
  20. 20.
    Shrivastava, A., Gupta, A., Girshick, R.B.: Training region-based object detectors with online hard example mining. In: CVPR (2016)Google Scholar
  21. 21.
    Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: ICLR (2015)Google Scholar
  22. 22.
    Badrinarayanan, V., Alex Kendall, R.C.: SegNet: a deep convolutional encoder-decoder architecture for image segmentation. arXiv preprint arXiv:1511.00561 (2015)
  23. 23.
    Yu, F., Koltun, V.: Multi-scale context aggregation by dilated convolutions. In: ICLR (2016)Google Scholar
  24. 24.
    Zheng, S., Jayasumana, S., Romera-Paredes, B., Vineet, V., Su, Z., Du, D., Huang, C., Torr, P.H.S.: Conditional random fields as recurrent neural networks. In: ICCV, pp. 1529–1537 (2015)Google Scholar

Copyright information

© Springer Nature Singapore Pte Ltd. 2016

Authors and Affiliations

  • Xinze Chen
    • 1
  • Guangliang Cheng
    • 1
  • Yinghao Cai
    • 1
  • Dayong Wen
    • 1
  • Heping Li
    • 1
    Email author
  1. 1.NLPRInstitute of Automation, Chinese Academy of SciencesBeijingChina

Personalised recommendations