Abstract
Self attention can extract global information because it operates on the whole input, whereas a convolution layer operates only on a local neighborhood. Concatenating the outputs of convolution and self attention can therefore improve the ability of convolutional networks to collect contextual information. However, the memory and computational complexity of self attention grows quadratically with the input size, which hinders its applicability to high-resolution images. We therefore propose the efficient attention augmented convolution module to address the complexity problem caused by self attention. The module has three branches: convolution, efficient attention, and column-row attention. Efficient attention achieves complexity linear in the input size by switching the order of the matrix multiplications in self attention. Column-row attention is a column attention operation followed by a row attention operation; it collects the spatial information that efficient attention loses by flattening its input. The output of the module combines the outputs of these three operations. We replace several convolution layers in fully convolutional networks with this augmentation module to obtain efficient attention augmented convolutional networks. We evaluate them on the PASCAL VOC 2012 semantic segmentation task, and the experimental results show that all augmented models outperform their non-augmented baselines.
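The reordering of matrix multiplications mentioned in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the tensor shapes, and the choice of softmax normalization (queries normalized over channels, keys normalized over positions) are assumptions made for illustration. The input is taken to be a feature map flattened to n = H·W positions.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dot_product_attention(q, k, v):
    # Standard self attention: materializes an (n, n) attention map,
    # so memory and compute grow quadratically with n.
    scores = q @ k.T / np.sqrt(q.shape[-1])   # (n, n)
    return softmax(scores, axis=-1) @ v        # (n, d_v)

def efficient_attention(q, k, v):
    # Assumed linear variant: normalize q and k separately, then
    # multiply k^T v first so no (n, n) matrix is ever formed.
    q_n = softmax(q, axis=-1)   # each query normalized over d_k channels
    k_n = softmax(k, axis=0)    # each key channel normalized over n positions
    context = k_n.T @ v         # (d_k, d_v), size independent of n
    return q_n @ context        # (n, d_v)

def reorder_check(q, k, v):
    # Without normalization the two multiplication orders agree exactly
    # (matrix multiplication is associative); only the cost differs.
    return np.allclose((q @ k.T) @ v, q @ (k.T @ v))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d_k, d_v = 64, 8, 4      # hypothetical sizes: n = H * W positions
    q = rng.normal(size=(n, d_k))
    k = rng.normal(size=(n, d_k))
    v = rng.normal(size=(n, d_v))
    print(efficient_attention(q, k, v).shape)
    print(reorder_check(q, k, v))
```

In this sketch the intermediate `context` matrix has d_k × d_v entries regardless of n, whereas `dot_product_attention` stores n × n scores. With softmax applied, the two functions are no longer numerically identical; the linear variant is an approximation of standard attention rather than an exact rewrite. Because it operates on flattened positions, it also carries no explicit row or column structure, which is the gap the abstract assigns to column-row attention.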
Copyright information
© 2021 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Cao, J., Liao, Z., Zhao, Q. (2021). Semantic Segmentation via Efficient Attention Augmented Convolutional Networks. In: Zhang, H., Yang, Z., Zhang, Z., Wu, Z., Hao, T. (eds) Neural Computing for Advanced Applications. NCAA 2021. Communications in Computer and Information Science, vol 1449. Springer, Singapore. https://doi.org/10.1007/978-981-16-5188-5_47
DOI: https://doi.org/10.1007/978-981-16-5188-5_47
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-5187-8
Online ISBN: 978-981-16-5188-5
eBook Packages: Computer Science, Computer Science (R0)