A global-local feature adaptive fusion network for image scene classification

Lv, Guangrui; Dong, Lili; Zhang, Wenwen; Xu, Wenhai

doi:10.1007/s11042-023-15519-2

Published: 10 June 2023

Volume 83, pages 6521–6554 (2024)
Multimedia Tools and Applications

Guangrui Lv¹, Lili Dong¹ (ORCID: orcid.org/0000-0002-1213-7691), Wenwen Zhang¹ & Wenhai Xu¹

Abstract

Convolutional neural networks (CNNs) have been widely used in image scene classification and have achieved remarkable progress. However, deep features extracted by a plain CNN neither focus on the local semantics of an image nor capture its spatial morphological variation, so using a CNN directly does not yield sufficiently discriminative feature representations. To relieve this limitation, a global-local feature adaptive fusion (GLFAF) network is proposed. The GLFAF framework first extracts multi-scale and multi-level features with a purpose-designed CNN. Then, to leverage the complementary advantages of these features, a global feature aggregate module is designed to discover global attention features and to learn the multiple deep dependencies of spatial scale variations among them. Meanwhile, a local feature aggregate module is designed to aggregate the multi-scale and multi-level features. Specifically, multi-level features at the same scale are fused based on channel attention, and the resulting spatially fused features at different scales are then aggregated based on channel dependence. Moreover, spatial contextual attention is designed to refine spatial features across scales, and different Fisher vector layers are designed to learn semantic aggregation among spatial features. Subsequently, two different feature adaptive fusion modules are introduced to explore the complementary associations between the global and local aggregate features, yielding a comprehensive and discriminative image scene representation. Finally, extensive experiments on real scene datasets from three different fields show that the proposed GLFAF approach classifies scenes more accurately than other state-of-the-art models.
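The abstract names two mechanisms without giving their equations: fusing same-scale, multi-level features by channel attention, and adaptively fusing global and local aggregate features. The NumPy sketch below is only an illustration of those two ideas under stated assumptions, not the authors' GLFAF implementation. It assumes a squeeze-and-excitation style gate for the channel attention and a sigmoid-gated convex combination for the adaptive fusion; the function names (`channel_attention_fuse`, `adaptive_fuse`), the weight shapes, and the random weights are all hypothetical stand-ins for learned parameters.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention_fuse(feats, w1, w2):
    """Fuse same-scale feature maps from several levels (assumed SE-style gate).

    feats: list of arrays shaped (C, H, W), all with the same shape.
    w1: (C // r, C) and w2: (C, C // r), weights of a two-layer bottleneck.
    """
    fused = np.zeros_like(feats[0])
    for f in feats:
        squeezed = f.mean(axis=(1, 2))            # global average pool -> (C,)
        hidden = np.maximum(w1 @ squeezed, 0.0)   # ReLU bottleneck -> (C // r,)
        weights = sigmoid(w2 @ hidden)            # per-channel gate in (0, 1)
        fused += f * weights[:, None, None]       # reweight channels, then sum
    return fused


def adaptive_fuse(global_vec, local_vec, w_gate, b_gate):
    """Mix global and local descriptors with a per-dimension sigmoid gate (assumed)."""
    gate = sigmoid(w_gate @ np.concatenate([global_vec, local_vec]) + b_gate)
    return gate * global_vec + (1.0 - gate) * local_vec


# Toy data: three "levels" of features at one scale; random weights stand in
# for parameters that would normally be learned end to end.
rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2
feats = [rng.standard_normal((C, H, W)) for _ in range(3)]
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1

fused_map = channel_attention_fuse(feats, w1, w2)
local_vec = fused_map.mean(axis=(1, 2))    # pooled local aggregate descriptor
global_vec = feats[-1].mean(axis=(1, 2))   # pooled deepest-level descriptor

w_gate = rng.standard_normal((C, 2 * C)) * 0.1
b_gate = np.zeros(C)
scene_vec = adaptive_fuse(global_vec, local_vec, w_gate, b_gate)
print(fused_map.shape, scene_vec.shape)
```

Because the gate lies strictly between 0 and 1, each dimension of the fused descriptor stays between its global and local values, which is one simple way to let a network weigh the two branches per feature. The spatial contextual attention and Fisher vector layers mentioned in the abstract are not modeled here.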

Data Availability

The UC Merced Land-Use dataset that supports the findings of this study is available in the UC Merced repository: http://weegee.vision.ucmerced.edu/datasets/landuse.html. The UIUC Sports dataset that supports the findings of this study is available in the Stanford repository: http://vision.stanford.edu/lijiali/event_dataset/. The infrared maritime scene dataset that supports the findings of this study is available from the corresponding author on reasonable request.

Funding

This work was supported in part by the Fundamental Research Funds for the Central Universities of China under Grants 3132019340 and 3132019200, and in part by the High-Tech Ship Research Project of the Ministry of Industry and Information Technology of the People's Republic of China under Grant MC-201902-C01.

Author information

Authors and Affiliations

College of Information Science and Technology, Dalian Maritime University, Dalian, 116026, China
Guangrui Lv, Lili Dong, Wenwen Zhang & Wenhai Xu

Corresponding author

Correspondence to Lili Dong.

Ethics declarations

Conflict of Interests

The authors declare no conflicts of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Lv, G., Dong, L., Zhang, W. et al. A global-local feature adaptive fusion network for image scene classification. Multimed Tools Appl 83, 6521–6554 (2024). https://doi.org/10.1007/s11042-023-15519-2

Received: 08 September 2021
Revised: 29 June 2022
Accepted: 19 April 2023
Published: 10 June 2023
Issue Date: January 2024
DOI: https://doi.org/10.1007/s11042-023-15519-2
