Multi-directional guidance network for fine-grained visual classification

Yang, Shengying; Jin, Yao; Lei, Jingsheng; Zhang, Shuping

doi:10.1007/s00371-023-03226-w

Multi-directional guidance network for fine-grained visual classification

Original article
Published: 29 January 2024

(2024)
Cite this article

The Visual Computer Aims and scope Submit manuscript

Shengying Yang ORCID: orcid.org/0000-0003-3010-1622¹,
Yao Jin¹,
Jingsheng Lei¹ &
…
Shuping Zhang²

147 Accesses
Explore all metrics

Abstract

Fine-grained images have a high confusion among subclasses. The key to this is finding discriminative regions that can be used for classification. The existing methods mainly use attention mechanisms or high-level linguistic information for classification, which only focus on the feature regions with the highest response and neglect other parts, resulting in inadequate capability for feature representation. Classification based on only a single feature part is not reliable. The fusion mechanism can achieve locating several different parts. However, simple feature fusion strategies do not exploit cross-layer information and lack the use of low-level information. To effectively address this limitation, we propose the multi-directional guidance network. Our network starts with a feature and attention guidance module that forces the network to learn detailed feature representations. Second, we propose a multi-layer guidance module that integrates diverse semantic information. In addition, we introduce a multi-way transfer structure to fuse low-level and high-level semantics in a novel way to improve generalization ability of the network. We have conducted extensive experiments on the FGVC benchmark dataset (CUB-200-2011, Stanford Cars and FGVC Aircraft) to demonstrate the superior performance of the method. Our code will be available at https://github.com/syyang2022/MGN.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Fusing bilinear multi-channel gated vector for fine-grained classification

Article 07 February 2023

Selecting Discriminative Features for Fine-Grained Visual Classification

Multi-branch and Multi-scale Attention Learning for Fine-Grained Visual Categorization

Data Availability Statement

Our code will be available at https://github.com/syyang2022/MGN.

References

Wah, C., Branson, S., Welinder, P., Perona, P., Belongie, S.: The caltech-ucsd birds-200-2011 dataset (2011)
Yang, G., He, Y., Yang, Y., Xu, B.: Fine-grained image classification for crop disease based on attention mechanism. Front. Plant Sci. 11, 600854 (2020)
Article Google Scholar
Zhang, N., Donahue, J., Girshick, R., Darrell, T.: Part-based r-cnns for fine-grained category detection. In: Computer Vision-ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6–12, 2014, Proceedings, Part I 13, pp. 834–849. Springer (2014)
Huang, S., Xu, Z., Tao, D., Zhang, Y.: Part-stacked cnn for fine-grained visual categorization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1173–1182 (2016)
Lin, D., Shen, X., Lu, C., Jia, J.: Deep lac: Deep localization, alignment and classification for fine-grained recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1666–1674 (2015)
Zheng, H., Fu, J., Zha, Z.-J., Luo, J., Mei, T.: Learning rich part hierarchies with progressive attention networks for fine-grained image recognition. IEEE Trans. Image Process. 29, 476–488 (2019)
Article MathSciNet Google Scholar
Zhang, T., Chang, D., Ma, Z., Guo, J.: Progressive co-attention network for fine-grained visual classification. In: 2021 International Conference on Visual Communications and Image Processing (VCIP), pp. 1–5. IEEE (2021)
Ding, Y., Zhou, Y., Zhu, Y., Ye, Q., Jiao, J.: Selective sparse sampling for fine-grained image recognition. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 6599–6608 (2019)
Zhang, L., Huang, S., Liu, W., Tao, D.: Learning a mixture of granularity-specific experts for fine-grained categorization. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 8331–8340 (2019)
Xiao, T., Xu, Y., Yang, K., Zhang, J., Peng, Y., Zhang, Z.: The application of two-level attention models in deep convolutional neural network for fine-grained image classification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 842–850 (2015)
Yang, Z., Luo, T., Wang, D., Hu, Z., Gao, J., Wang, L.: Learning to navigate for fine-grained classification. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 420–435 (2018)
Ge, W., Lin, X., Yu, Y.: Weakly supervised complementary parts models for fine-grained image classification from the bottom up. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3034–3043 (2019)
Liu, C., Xie, H., Zha, Z.-J., Ma, L., Yu, L., Zhang, Y.: Filtration and distillation: enhancing region attention for fine-grained visual categorization. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 11555–11562 (2020)
Sun, M., Yuan, Y., Zhou, F., Ding, E.: Multi-attention multi-class constraint for fine-grained image recognition. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 805–821 (2018)
Zheng, H., Fu, J., Mei, T., Luo, J.: Learning multi-attention convolutional neural network for fine-grained image recognition. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 5209–5217 (2017)
Fu, J., Zheng, H., Mei, T.: Look closer to see better: Recurrent attention convolutional neural network for fine-grained image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4438–4446 (2017)
He, J., Chen, J.-N., Liu, S., Kortylewski, A., Yang, C., Bai, Y., Wang, C.: Transfg: A transformer architecture for fine-grained recognition. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, pp. 852–860 (2022)
Wang, J., Yu, X., Gao, Y.: Feature fusion vision transformer for fine-grained visual categorization. arXiv:2107.02341 (2021)
Lin, T.-Y., RoyChowdhury, A., Maji, S.: Bilinear cnn models for fine-grained visual recognition. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1449–1457 (2015)
Kong, S., Fowlkes, C.: Low-rank bilinear pooling for fine-grained classification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 365–374 (2017)
Li, P., Xie, J., Wang, Q., Gao, Z.: Towards faster training of global covariance pooling networks by iterative matrix square root normalization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 947–955 (2018)
Liao, Q., Wang, D., Holewa, H., Xu, M.: Squeezed bilinear pooling for fine-grained visual categorization. In: Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, pp. 0–0 (2019)
Shi, X., Xu, L., Wang, P., Gao, Y., Jian, H., Liu, W.: Beyond the attention: Distinguish the discriminative and confusable features for fine-grained image classification. In: Proceedings of the 28th ACM International Conference on Multimedia, pp. 601–609 (2020)
Gao, Y., Beijbom, O., Zhang, N., Darrell, T.: Compact bilinear pooling. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 317–326 (2016)
Yu, C., Zhao, X., Zheng, Q., Zhang, P., You, X.: Hierarchical bilinear pooling for fine-grained visual recognition. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 574–589 (2018)
Zhuang, P., Wang, Y., Qiao, Y.: Learning attentive pairwise interaction for fine-grained classification. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 13130–13137 (2020)
Gao, Y., Han, X., Wang, X., Huang, W., Scott, M.: Channel interaction networks for fine-grained image categorization. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 10818–10825 (2020)
Lin, T.-Y., Dollár, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2117–2125 (2017)
Li, H., Xiong, P., An, J., Wang, L.: Pyramid attention network for semantic segmentation. arXiv:1805.10180 (2018)
Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y., Berg, A.C.: Ssd: Single shot multibox detector. In: Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pp. 21–37. Springer (2016)
Chen, X., Fu, C., Zhao, Y., Zheng, F., Song, J., Ji, R., Yang, Y.: Salience-guided cascaded suppression network for person re-identification. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3300–3310 (2020)
Chang, D., Pang, K., Zheng, Y., Ma, Z., Song, Y.-Z., Guo, J.: Your" flamingo" is my" bird": fine-grained, or not. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11476–11485 (2021)
Lei, J., Li, X., Peng, B., Fang, L., Ling, N., Huang, Q.: Deep spatial-spectral subspace clustering for hyperspectral image. IEEE Trans. Circuits Syst. Video Technol. 31(7), 2686–2697 (2020)
Article Google Scholar
Song, X., Jiang, S., Herranz, L.: Multi-scale multi-feature context modeling for scene recognition in the semantic manifold. IEEE Trans. Image Process. 26(6), 2721–2735 (2017)
Article MathSciNet Google Scholar
Jiang, S., Min, W., Liu, L., Luo, Z.: Multi-scale multi-view deep feature aggregation for food recognition. IEEE Trans. Image Process. 29, 265–276 (2019)
Article MathSciNet Google Scholar
Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440 (2015)
Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, October 5–9, 2015, Proceedings, Part III 18, pp. 234–241. Springer (2015)
Kong, T., Yao, A., Chen, Y., Sun, F.: Hypernet: Towards accurate region proposal generation and joint object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 845–853 (2016)
Yang, Z., Luo, T., Wang, D., Hu, Z., Gao, J., Wang, L.: Learning to navigate for fine-grained classification. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 420–435 (2018)
Zeiler, M.D., Fergus, R.: Visualizing and understanding convolutional networks. In: Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part I 13, pp. 818–833. Springer (2014)
Krause, J., Stark, M., Deng, J., Fei-Fei, L.: 3d object representations for fine-grained categorization. In: Proceedings of the IEEE International Conference on Computer Vision Workshops, pp. 554–561 (2013)
Maji, S., Rahtu, E., Kannala, J., Blaschko, M., Vedaldi, A.: Fine-grained visual classification of aircraft. arXiv:1306.5151 (2013)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Zheng, H., Fu, J., Zha, Z.-J., Luo, J.: Looking for the devil in the details: Learning trilinear attention sampling network for fine-grained image recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5012–5021 (2019)
Zare, M., Ghasemi, M., Zahedi, A., Golalipour, K., Mohammadi, S.K., Mirjalili, S., Abualigah, L.: A global best-guided firefly algorithm for engineering problems. J. Bionic Eng. 1–30 (2023)
Agushaka, J.O., Ezugwu, A.E., Abualigah, L.: Gazelle optimization algorithm: a novel nature-inspired metaheuristic optimizer. Neural Comput. Appl. 35(5), 4099–4131 (2023)
Article Google Scholar
Hu, G., Zheng, Y., Abualigah, L., Hussien, A.G.: Detdo: an adaptive hybrid dandelion optimizer for engineering optimization. Adv. Eng. Inform. 57, 102004 (2023)
Article Google Scholar
Luo, W., Yang, X., Mo, X., Lu, Y., Davis, L.S., Li, J., Yang, J., Lim, S.-N.: Cross-x learning for fine-grained visual categorization. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 8242–8251 (2019)
Chen, Y., Bai, Y., Zhang, W., Mei, T.: Destruction and construction learning for fine-grained image recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5157–5166 (2019)
Chang, D., Ding, Y., Xie, J., Bhunia, A.K., Li, X., Ma, Z., Wu, M., Guo, J., Song, Y.-Z.: The devil is in the channels: mutual-channel loss for fine-grained image classification. IEEE Trans. Image Process. 29, 4683–4695 (2020)
Article Google Scholar

Download references

Acknowledgements

This work is supported by Natural Science Foundation of Xinjiang Uygur Autonomous Region (No. 2022D01C349) and Scientific Research Fund of Zhejiang Provincial Education Department (Y202352150, Y202352263).

Author information

Authors and Affiliations

Zhejiang University of Science and Technology, Hangzhou, 310023, China
Shengying Yang, Yao Jin & Jingsheng Lei
Xinjiang Institute of Technology, Aksu, 735400, China
Shuping Zhang

Authors

Shengying Yang
View author publications
You can also search for this author in PubMed Google Scholar
Yao Jin
View author publications
You can also search for this author in PubMed Google Scholar
Jingsheng Lei
View author publications
You can also search for this author in PubMed Google Scholar
Shuping Zhang
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Shengying Yang.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Cite this article

Yang, S., Jin, Y., Lei, J. et al. Multi-directional guidance network for fine-grained visual classification. Vis Comput (2024). https://doi.org/10.1007/s00371-023-03226-w

Download citation

Accepted: 02 December 2023
Published: 29 January 2024
DOI: https://doi.org/10.1007/s00371-023-03226-w

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Multi-directional guidance network for fine-grained visual classification

Abstract

Access this article

Similar content being viewed by others

Fusing bilinear multi-channel gated vector for fine-grained classification

Selecting Discriminative Features for Fine-Grained Visual Classification

Multi-branch and Multi-scale Attention Learning for Fine-Grained Visual Categorization

Data Availability Statement

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interest

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Multi-directional guidance network for fine-grained visual classification

Abstract

Access this article

Similar content being viewed by others

Fusing bilinear multi-channel gated vector for fine-grained classification

Selecting Discriminative Features for Fine-Grained Visual Classification

Multi-branch and Multi-scale Attention Learning for Fine-Grained Visual Categorization

Data Availability Statement

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interest

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation