Scene Level Image Classification: A Literature Review

Chavda, Sagar; Goyani, Mahesh

doi:10.1007/s11063-022-11072-5

Published: 18 November 2022

Neural Processing Letters, Volume 55, pages 2471–2520 (2023)

This article has been updated

Abstract

Since the rise of deep learning, convolutional neural networks (CNNs) have contributed significantly to both natural and remote sensing image analysis. Scene-level image classification is a challenging task in both domains and has numerous applications. Its main focus is the set of scene entities in the image content that can be matched against the dataset images. The task remains significant because of open problems such as intraclass heterogeneity, interclass homogeneity, background clutter, high spatial resolution, and varying imaging conditions. Multi-label scene datasets additionally suffer from class imbalance, poor preservation of complex semantic relations, and high label-to-label correlation. This article presents a meta-analysis of state-of-the-art practices in the scene classification literature. We discuss CNNs, attention mechanisms, capsule networks, and generative adversarial networks, and give an overview of the activations, losses, optimization techniques, and regularization schemes pertinent to the scene domain. The standard single- and multi-label benchmark datasets are collated, and the performance metrics for scene classification are explained. The paper also covers an implementation of multi-label scene classification with several CNN models on the UC Merced multi-label dataset. The proposed MobileNet-based model outperforms the established state-of-the-art methods.

Change history

20 November 2022
This article was retracted on 14 October 2022

Funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Author information

Authors and Affiliations

Gujarat Technological University, Ahmedabad, India
Sagar Chavda & Mahesh Goyani
Government Engineering College, Modasa, India
Sagar Chavda & Mahesh Goyani

Authors

Sagar Chavda
Mahesh Goyani

Corresponding author

Correspondence to Sagar Chavda.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chavda, S., Goyani, M. Scene Level Image Classification: A Literature Review. Neural Process Lett 55, 2471–2520 (2023). https://doi.org/10.1007/s11063-022-11072-5

Accepted: 16 October 2022
Published: 18 November 2022
Issue Date: June 2023
DOI: https://doi.org/10.1007/s11063-022-11072-5
