Neural Processing Letters, Volume 50, Issue 3, pp 2665–2679

Online Hard Region Mining for Semantic Segmentation

  • Jin Yin
  • Pengfei Xia
  • Jingsong He (corresponding author)

Abstract

Recent work in semantic segmentation has made significant progress by enlarging receptive fields or capturing contextual information. Semantic segmentation is usually treated as a per-pixel classification problem. Regions of an image that are hard to discriminate limit segmentation accuracy. In this work, we propose region-based hard region mining, an approach that directs more attention to local segmentation performance. We evaluate it on three popular semantic segmentation datasets (PASCAL VOC 2012, PASCAL Context and CamVid) using two different segmentation networks, DeepLab v3 and FCN. The experimental results show consistent improvement, demonstrating the efficacy of our approach.
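The abstract does not spell out the mining procedure, so the following is only a rough illustration of the general idea, not the authors' algorithm. It assumes a hypothetical scheme in which the per-pixel cross-entropy map is split into square regions, each region is scored by its mean loss, and only the hardest regions contribute to the training loss. The names `hard_region_loss`, `region_size` and `top_k` are illustrative assumptions, not terms from the paper.

```python
import math


def pixel_cross_entropy(probs, label):
    """Cross-entropy for one pixel: -log of the probability of the true class."""
    return -math.log(max(probs[label], 1e-12))


def hard_region_loss(prob_map, label_map, region_size, top_k):
    """Mean loss over the top_k hardest square regions (hypothetical scheme).

    prob_map:  H x W nested list of per-class probability lists
    label_map: H x W nested list of integer class labels
    Returns the loss and the (top, left) origins of the selected regions.
    """
    height, width = len(label_map), len(label_map[0])
    region_losses = []
    for top in range(0, height, region_size):
        for left in range(0, width, region_size):
            # Per-pixel losses inside this region (clipped at the image border).
            losses = [
                pixel_cross_entropy(prob_map[y][x], label_map[y][x])
                for y in range(top, min(top + region_size, height))
                for x in range(left, min(left + region_size, width))
            ]
            region_losses.append(((top, left), sum(losses) / len(losses)))
    # Hardest regions first: highest mean per-pixel loss.
    region_losses.sort(key=lambda item: item[1], reverse=True)
    hardest = region_losses[:top_k]
    loss = sum(value for _, value in hardest) / len(hardest)
    return loss, [origin for origin, _ in hardest]
```

In a real training loop the same selection would be applied to the loss tensor of a network such as FCN or DeepLab v3, with gradients flowing only through the selected regions. How regions are defined and how many are kept are design choices that this sketch leaves open.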

Keywords

  • Hard region mining
  • Semantic segmentation
  • Online bootstrapping
  • CNNs
  • FCN

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. University of Science and Technology of China, Hefei, China