
Scene Level Image Classification: A Literature Review

Neural Processing Letters

This article has been updated

Abstract

Since the rise of deep learning, convolutional neural networks (CNNs) have contributed significantly to the analysis of natural and remote sensing images. Scene-level image classification is a challenging task in both the natural and remote sensing domains and has numerous applications. The task centers on matching the scene entities present in an image's content against the categories represented in the dataset. Scene-level classification remains important and difficult because of open problems such as intraclass heterogeneity, interclass homogeneity, background clutter, high spatial resolution, and varying imaging conditions. Multi-label scene datasets additionally suffer from class imbalance, poor preservation of complex semantic relations, and high label-to-label correlation. This article presents a meta-analysis of state-of-the-art practices in the scene classification literature. We discuss CNNs, attention mechanisms, capsule networks, and generative adversarial networks. The article also gives an overview of the activation functions, loss functions, optimization techniques, and regularization schemes relevant to the scene domain. Standard single-label and multi-label benchmark datasets are collated, and the performance metrics used for scene classification are explained. The paper also covers an implementation of multi-label scene classification using several CNN models on the UC Merced multi-label dataset. The proposed MobileNet-based model is reported to outperform recognized state-of-the-art methods.
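
To make the multi-label setting mentioned in the abstract concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation. It assumes a common formulation in which a network emits one logit per label, each label is decided independently through a sigmoid and a threshold, training uses a mean binary cross-entropy, and quality is scored with a per-image F1. The label names, logits, and the 0.5 threshold are illustrative assumptions only.

```python
import math

# Hypothetical label subset and network outputs, for illustration only.
LABELS = ["buildings", "cars", "pavement", "trees"]


def sigmoid(z):
    """Map a raw logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_labels(logits, threshold=0.5):
    """Decide each label independently, so one image may carry several labels."""
    return [1 if sigmoid(z) >= threshold else 0 for z in logits]


def binary_cross_entropy(logits, targets, eps=1e-12):
    """Mean per-label binary cross-entropy for a single image."""
    total = 0.0
    for z, t in zip(logits, targets):
        p = min(max(sigmoid(z), eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(logits)


def sample_f1(pred, target):
    """F1 score computed over the label vector of a single image."""
    tp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, target) if p == 0 and t == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0


logits = [2.0, -1.0, 0.5, -3.0]   # assumed model outputs for one image
targets = [1, 0, 1, 1]            # assumed ground-truth label vector
pred = predict_labels(logits)
present = [name for name, flag in zip(LABELS, pred) if flag]
loss = binary_cross_entropy(logits, targets)
f1 = sample_f1(pred, targets)
```

In this toy case the model misses the last label, so the per-image F1 drops below 1 even though no false positive is produced. This is the kind of behavior that multi-label metrics are meant to expose.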

Change history

  • 20 November 2022

    This article was retracted on 14 October 2022

Funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Author information

Corresponding author

Correspondence to Sagar Chavda.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chavda, S., Goyani, M. Scene Level Image Classification: A Literature Review. Neural Process Lett 55, 2471–2520 (2023). https://doi.org/10.1007/s11063-022-11072-5

  • DOI: https://doi.org/10.1007/s11063-022-11072-5
