Semantic Segmentation via Efficient Attention Augmented Convolutional Networks

Cao, Jingjing; Liao, Zhengfei; Zhao, Qiangwei

doi:10.1007/978-981-16-5188-5_47


Part of the book series: Communications in Computer and Information Science (CCIS, volume 1449)

Included in the following conference series:

International Conference on Neural Computing for Advanced Applications


Abstract

Self-attention extracts global information by operating on the whole input, whereas a convolution layer operates only on a local neighborhood. Concatenating the outputs of convolution and self-attention can therefore strengthen a convolutional network's ability to collect contextual information. However, the memory and computational complexity of self-attention grows quadratically with the input size, which hinders its application to high-resolution images. We therefore propose the efficient attention augmented convolution module to address this complexity problem. The module has three branches: convolution, efficient attention, and column-row attention. Efficient attention achieves complexity linear in the input size by switching the order of the matrix multiplications in self-attention. Column-row attention is a column attention operation followed by a row attention operation; it collects the spatial information that efficient attention lacks because it flattens its input. The output of the module combines the outputs of these three branches. We replace several convolution layers in fully convolutional networks with this module to obtain efficient attention augmented convolutional networks. We test them on the PASCAL VOC 2012 semantic segmentation task, and the experimental results show that all augmented models outperform their non-augmented baselines.
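The two attention branches described in the abstract can be illustrated with a minimal NumPy sketch. It is not the chapter's implementation: the function names are hypothetical, the normalisation in `efficient_attention` follows the efficient-attention formulation cited in the reference list (softmax over channels for the queries, softmax over positions for the keys), and `column_row_attention` assumes identity query/key/value projections for brevity.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def dot_product_attention(q, k, v):
    """Standard self-attention: builds an (n, n) map, so cost is quadratic in n."""
    scores = q @ k.T / np.sqrt(q.shape[-1])      # (n, n)
    return softmax(scores, axis=-1) @ v          # (n, d_v)


def efficient_attention(q, k, v):
    """Reordered product: form the small (d_k, d_v) context first.

    Assumption: queries are normalised over channels and keys over
    positions, as in the cited efficient-attention formulation.
    No (n, n) matrix is created, so cost is linear in n.
    """
    context = softmax(k, axis=0).T @ v           # (d_k, d_v)
    return softmax(q, axis=1) @ context          # (n, d_v)


def column_row_attention(x):
    """Column attention followed by row attention on an (H, W, C) map.

    Assumption: identity projections (q = k = v = the features).
    """
    # Column attention: each of the W columns is a sequence of H positions.
    cols = np.transpose(x, (1, 0, 2))                                # (W, H, C)
    cols = np.stack([dot_product_attention(c, c, c) for c in cols])  # (W, H, C)
    x = np.transpose(cols, (1, 0, 2))                                # (H, W, C)
    # Row attention: each of the H rows is a sequence of W positions.
    return np.stack([dot_product_attention(r, r, r) for r in x])     # (H, W, C)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    H, W, d_k, d_v = 8, 8, 4, 6
    n = H * W                                    # flattened spatial positions
    q = rng.standard_normal((n, d_k))
    k = rng.standard_normal((n, d_k))
    v = rng.standard_normal((n, d_v))
    print(efficient_attention(q, k, v).shape)
    print(column_row_attention(rng.standard_normal((H, W, d_v))).shape)
```

In this sketch the largest intermediate in `efficient_attention` has shape (d_k, d_v), independent of the number of positions n, whereas `dot_product_attention` materialises an (n, n) map. The column and row passes each attend over only H or W positions at a time, which keeps the 2-D layout that flattening discards.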



Author information

Authors and Affiliations

Wuhan University of Technology, Wuhan, China
Jingjing Cao & Zhengfei Liao
Huazhong University of Science and Technology, Wuhan, China
Qiangwei Zhao


Corresponding author

Correspondence to Jingjing Cao.

Editor information

Editors and Affiliations

Harbin Institute of Technology, Shenzhen, China
Haijun Zhang
Nanfang College of Sun Yat-sen University, Guangzhou, China
Zhi Yang
Hefei University of Technology, Hefei, China
Zhao Zhang
Chongqing University, Chongqing, China
Zhou Wu
South China Normal University, Guangzhou, China
Tianyong Hao


About this paper

Cite this paper

Cao, J., Liao, Z., Zhao, Q. (2021). Semantic Segmentation via Efficient Attention Augmented Convolutional Networks. In: Zhang, H., Yang, Z., Zhang, Z., Wu, Z., Hao, T. (eds) Neural Computing for Advanced Applications. NCAA 2021. Communications in Computer and Information Science, vol 1449. Springer, Singapore. https://doi.org/10.1007/978-981-16-5188-5_47


DOI: https://doi.org/10.1007/978-981-16-5188-5_47
Published: 20 August 2021
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-5187-8
Online ISBN: 978-981-16-5188-5
eBook Packages: Computer Science, Computer Science (R0)
