Multi-layer Adaptive Feature Fusion for Semantic Segmentation

  • Yizhen Chen
  • Haifeng Hu

Abstract

Multi-layer feature fusion is a very important strategy for semantic segmentation, as a single-layer feature is usually unable to make an accurate prediction for every pixel. However, most current methods fuse multi-layer features by direct summation or channel concatenation, without considering the distinction and complementarity between them. To explore their respective importance and to achieve an appropriate fusion at each pixel, in this paper we propose a novel multi-layer adaptive feature fusion method for semantic segmentation based on the attention mechanism. Specifically, our method encourages the network to learn the importance of features from different layers, expressed in the form of weight maps, according to the content of the input image and the specific capability of each layer. By pixel-wise multiplication of the features with their corresponding weight maps, we change the response value at each pixel proportionally and obtain several weighted features. Finally, the weighted features are summed to obtain a highly fused feature for discrimination. A series of comparative experiments on two public datasets, PASCAL VOC 2012 and PASCAL-Person-Part, demonstrates the effectiveness of our method. Furthermore, we visualize the weight maps of the multi-layer features to provide an intuitive understanding of their importance at different locations.
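To make the fusion step described above concrete, the following is a minimal PyTorch sketch of attention-style, per-pixel weighted fusion of multi-layer features. The module and parameter names (AdaptiveFeatureFusion, fused_channels), the 1x1 projections, and the softmax normalization of the weight maps across layers are illustrative assumptions, not the authors' exact implementation; only the core idea of multiplying each layer's feature by a learned per-pixel weight map and summing the results follows the abstract.

import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveFeatureFusion(nn.Module):
    """Illustrative sketch: fuse multi-layer features with learned per-pixel weight maps."""

    def __init__(self, in_channels_list, fused_channels=256):
        super().__init__()
        # 1x1 convolutions bring every layer to a common channel width (assumption).
        self.projections = nn.ModuleList(
            [nn.Conv2d(c, fused_channels, kernel_size=1) for c in in_channels_list]
        )
        # One single-channel weight map per layer, predicted from the projected feature.
        self.weight_heads = nn.ModuleList(
            [nn.Conv2d(fused_channels, 1, kernel_size=3, padding=1) for _ in in_channels_list]
        )

    def forward(self, features):
        # Resize everything to the spatial size of the first (highest-resolution) feature.
        target_size = features[0].shape[-2:]
        projected, logits = [], []
        for proj, head, feat in zip(self.projections, self.weight_heads, features):
            f = proj(feat)
            if f.shape[-2:] != target_size:
                f = F.interpolate(f, size=target_size, mode="bilinear", align_corners=False)
            projected.append(f)
            logits.append(head(f))  # per-pixel importance score, shape [N, 1, H, W]
        # Softmax across layers (assumption) turns the scores into per-pixel weight maps.
        weights = torch.softmax(torch.stack(logits, dim=0), dim=0)
        # Pixel-wise multiply each feature by its weight map, then sum over layers.
        return sum(w * f for w, f in zip(weights, projected))

# Toy usage with three feature maps of different depths and resolutions.
feats = [torch.randn(1, 256, 64, 64), torch.randn(1, 512, 32, 32), torch.randn(1, 1024, 16, 16)]
fusion = AdaptiveFeatureFusion([256, 512, 1024])
print(fusion(feats).shape)  # torch.Size([1, 256, 64, 64])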

Keywords

Semantic segmentation · Multi-layer adaptive feature fusion · Attention mechanism · Weight map

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grant 61673402, in part by the Natural Science Foundation of Guangdong under Grant 2017A030311029, and in part by the Science and Technology Program of Guangzhou under Grant 201704020180.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. School of Electronics and Information Technology, Sun Yat-Sen University, Guangzhou, China
