Journal of Computer Science and Technology, Volume 32, Issue 4, pp 683–700

Objectness Region Enhancement Networks for Scene Parsing

  • Xin-Yu Ou
  • Ping Li
  • He-Fei Ling
  • Si Liu
  • Tian-Jiang Wang
  • Dan Li
Regular Paper

Abstract

Semantic segmentation has recently witnessed rapid progress, but existing methods focus only on identifying objects or instances. In this work, we address the task of semantic understanding of scenes with deep learning. Unlike many existing methods, our work puts forward techniques that improve existing algorithms rather than proposing a whole new framework. The first technique is objectness enhancement. It exploits a detection module to produce object region proposals with category probabilities, and these regions are used to weight the parsing feature map directly. The second technique concerns the "extra background" category, a special category that is often added to the category space to improve parsing results in semantic and instance segmentation tasks. In scene parsing tasks, the extra background category is still beneficial for training the model. However, some pixels may be assigned to this nonexistent category at inference. We propose a black-hole filling technique to avoid this incorrect classification. To verify these two techniques, we integrate them into a parsing framework that generates the parsing result. We call this unified framework the Objectness Enhancement Network (OENet). Compared with previous work, the proposed OENet system effectively improves performance over the original model on the SceneParse150 scene parsing dataset, reaching 38.4 mIoU (mean intersection-over-union) and 77.9% accuracy on the validation set without assembling multiple models. Its effectiveness is also verified on the Cityscapes dataset.
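
The two techniques named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything specific here is an assumption rather than the authors' method: the function names, the proposal tuple layout, the (1 + p) weighting rule, and the reading of black-hole filling as "pick the best real category and never the extra background" are all hypothetical.

```python
from typing import List, Tuple

# scores[c][y][x]: parsing score of category c at pixel (y, x).
Scores = List[List[List[float]]]
# Hypothetical proposal layout: (x0, y0, x1, y1, category, probability).
# The box is half-open: it covers x0 <= x < x1 and y0 <= y < y1.
Proposal = Tuple[int, int, int, int, int, float]


def enhance_with_objectness(scores: Scores, proposals: List[Proposal]) -> Scores:
    """Up-weight each proposal's category channel inside its box."""
    out = [[row[:] for row in channel] for channel in scores]  # copy the input
    for x0, y0, x1, y1, cat, prob in proposals:
        for y in range(y0, y1):
            for x in range(x0, x1):
                # Assumed rule: scale by (1 + p). The paper's form may differ.
                out[cat][y][x] *= 1.0 + prob
    return out


def fill_black_holes(scores: Scores, background: int = 0) -> List[List[int]]:
    """Label every pixel with its best real category, never the background."""
    height, width = len(scores[0]), len(scores[0][0])
    real = [c for c in range(len(scores)) if c != background]
    return [
        [max(real, key=lambda c: scores[c][y][x]) for x in range(width)]
        for y in range(height)
    ]


# Channel 0 is the extra background; channels 1 and 2 are real categories.
scores = [
    [[0.6, 0.1], [0.2, 0.5]],  # extra background
    [[0.3, 0.2], [0.5, 0.3]],  # category 1
    [[0.1, 0.7], [0.3, 0.2]],  # category 2
]
# One proposal for category 2 with probability 0.9, covering column x = 0.
proposals = [(0, 0, 1, 2, 2, 0.9)]

enhanced = enhance_with_objectness(scores, proposals)
labels = fill_black_holes(enhanced)
print(labels)  # [[1, 2], [2, 1]]
```

In this toy input, a plain argmax over all three channels would assign two pixels to the background channel. Restricting the choice to real categories removes those "holes", and the proposal flips the pixel at row 1, column 0 from category 1 to category 2.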

Keywords

objectness region enhancement, black-hole filling, scene parsing, instance enhancement, objectness region proposal

Supplementary material

11309_2017_1751_MOESM1_ESM.pdf
ESM 1 (PDF 1122 kb)

Copyright information

© Springer Science+Business Media, LLC 2017

Authors and Affiliations

  • Xin-Yu Ou (1, 2, 3)
  • Ping Li (1)
  • He-Fei Ling (1)
  • Si Liu (2)
  • Tian-Jiang Wang (1)
  • Dan Li (1)

  1. School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China
  2. Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China
  3. Cadres Online Learning Institute of Yunnan Province, Yunnan Open University, Kunming, China
