Abstract
The object scale variation results in a negative effect on image segmentation performance. Spatial pyramid pooling module or the attention mechanism are two widely used components in deep neural networks to handle this problem. Applying the single component commonly achieves limited benefit. To push the limit, in this paper, we propose a scale channel attention network (SCA-Net), which enhances the fusion feature of multi-scale by using channel attention components. After the multiple-scale pooling step, the multi-scale spatial information distributes in different feature channels. Meanwhile, the channel attention block is employed to guide SCA-Net focus on the object-relevant scale channels. We further explore the channel attention block and find a simple yet effective structure to combine global average pooling and global maximum pooling, resulting in a robust global information encoder. The SCA-Net does not contain any time-consuming post-processing, which is an extra step after the neural network for the segmentation result optimization. The assessment results on PASCAL VOC 2012 and Cityscapes benchmarks achieve the test set performance of 75.5% and 77.0%.
Similar content being viewed by others
References
Adelson EH, Anderson CH, Bergen JR, Burt PJ, Ogden JM (1984) Pyramid methods in image processing. RCA Eng 29(6):33–41
Badrinarayanan V, Kendall A, Cipolla R (2017) Segnet: a deep convolutional encoder-decoder architecture for image segmentation, vol 39, pp 2481–2495
Bluche T (2016) Joint line segmentation and transcription for end-to-end handwritten paragraph recognition. In: Advances in neural information processing systems, pp 838–846
Bulo SR, Neuhold G, Kontschieder P (2017) Loss max-pooling for semantic image segmentation. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, pp 7082–7091
Burt PJ (1988) Attention mechanisms for vision in a dynamic world. In: [1998 Proceedings] 9th international conference on pattern recognition. IEEE, pp 977–987
Chen Liang-Chieh, Yi Y, Wang J, Wei X, Yuille AL (2016) Attention to scale: Scale-aware semantic image segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3640–3649
Chen L, Zhang H, Xiao J, Nie L, Shao J, Liu W, Chua T-S (2017) Sca-cnn: Spatial and channel-wise attention in convolutional networks for image captioning. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5659–5667
Chen L-C, Zhu Y, Papandreou G, Schroff F, Adam H (2018) Encoder-decoder with atrous separable convolution for semantic image segmentation. In: Proceedings of the European Conference on Computer Vision (ECCV), pp 801–818
Chen Liang-Chieh, Papandreou G, Kokkinos I, Murphy K, Yuille AL (2018) Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Trans Pattern Anal Mach Intell 40(4):834–848
Cordts M, Omran M, Ramos S, Rehfeld T, Enzweiler M, Benenson R, Franke U, Roth S, Schiele B (2016) The cityscapes dataset for semantic urban scene understanding. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3213–3223
Corbetta M, Shulman GL (2002) Control of goal-directed and stimulus-driven attention in the brain. Nature Rev Neurosci 3(3):201
Everingham M, Van Gool L, Williams CKI, Winn J, Zisserman A (2010) The pascal visual object classes (voc) challenge. Int J Comput Vis 88(2):303–338
Hariharan B, Arbeláez P, Bourdev L, Maji S, Malik J (2011) Semantic contours from inverse detectors. In: 2011 International conference on computer vision. IEEE, pp 991–998
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. Computer vision and pattern recognition, 770–778
He K, Gkioxari G, Dollár P, Girshick R (2017) Mask r-cnn. In: Proceedings of the IEEE international conference on computer vision, pp 2961–2969
Huan D, Liu Z, Shi R (2018) Salient object segmentation based on depth-aware image layering. Multimed Tools Appl, 1–14
Huang G, Liu Z, Van Der Maaten L, Weinberger KQ (2017) Densely connected convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4700–4708
Huang Z, Wang X, Huang L, Huang C, Wei Y, Liu W (2018) Ccnet: Criss-cross attention for semantic segmentation. arXiv:1811.11721
Itti L, Koch C, Niebur E (1998) A model of saliency-based visual attention for rapid scene analysis. IEEE Trans Pattern Anal Mach Intell 11:1254–1259
Jie H, Li S, Sun G (2018) Squeeze-and-excitation networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7132–7141
Lin G, Shen C, Van Den Hengel A, Reid I (2016) Efficient piecewise training of deep structured models for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3194–3203
Lin T-Y, Dollár P., Girshick R, He K, Hariharan B, Belongie S (2017) Feature pyramid networks for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2117–2125
Lin G, Milan A, Shen C, Reid I (2017) Refinenet: Multi-path refinement networks for high-resolution semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1925–1934
Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3431–3440
Milletari F, Navab N, Ahmadi S-A (2016) V-net: Fully convolutional neural networks for volumetric medical image segmentation. In: 2016 Fourth International Conference on 3D Vision (3DV). IEEE, pp 565–571
Mnih V, Heess N, Heess AG, et al. (2014) Recurrent models of visual attention. In: Advances in neural information processing systems, pp 2204–2212
Noh H, Hong S, Han B (2015) Learning deconvolution network for semantic segmentation. In: Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV). IEEE Computer Society, pp 1520–1528
Ronneberger O, Fischer P, Brox T (2015) U-net: Convolutional networks for biomedical image segmentation. In: International conference on medical image computing and computer-assisted intervention. Springer, pp 234–241
Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556
Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1–9
Tian Y, Guo J, Yulei W, Lin H (2019) Towards attack and defense views of rational delegation of computation. IEEE Access
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. In: Advances in neural information processing systems, pp 5998–6008
Wang C, Yang J, Wang K, Lai S-H (2017) Multi-scale energy optimization for object proposal generation. Multimed Tools Appl 76(8):10481–10499
Wang F, Jiang M, Qian C, Yang S, Li C, Zhang H, Wang X, Tang X (2017) Residual attention network for image classification. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3156–3164
Woo S, Park J, Lee J-Y, In SK (2018) Cbam Convolutional block attention module. In: Proceedings of the European Conference on Computer Vision (ECCV), pp 3–19
Xie H, Yang D, Sun N, Chen Z, Zhang Y (2019) Automated pulmonary nodule detection in ct images using deep convolutional neural networks. Pattern Recogn 85:109–119
Xie H, Fang S, Zha Z-J, Yang Y, Li Y, Zhang Y (2019) Convolutional attention networks for scene text recognition. ACM Trans Multimed Comput Comm Appl (TOMM) 15(1s):3
Zhang Y, Li K, Li K, Wang L, Zhong B, Yun F (2018) Image super-resolution using very deep residual channel attention networks. In: Proceedings of the European Conference on Computer Vision (ECCV), pp 286–301
Zhao H, Shi J, Qi X, Wang X, Jia J (2017) Pyramid scene parsing network. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2881–2890
Zheng S, Jayasumana S, Romera-Paredes B, Vineet V, Zhizhong S, Dalong D, Huang C, Torr PHS (2015) Conditional random fields as recurrent neural networks. In: Proceedings of the IEEE international conference on computer vision, pp 1529–1537
Zhou B, Khosla A, Lapedriza A, Oliva A, Torralba A (2016) Learning deep features for discriminative localization. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2921–2929
Acknowledgements
This paper is partly supported by the National Key Research and Development Program of China (2017YFB0803301) and the Major Scientific and Technological Special Project of Guizhou Province (20183001).
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Chen, J., Tian, Y., Ma, W. et al. Scale channel attention network for image segmentation. Multimed Tools Appl 80, 16473–16489 (2021). https://doi.org/10.1007/s11042-020-08921-7
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11042-020-08921-7