Skip to main content
Log in

High level visual scene classification using background knowledge of objects

  • Published:
Multimedia Tools and Applications Aims and scope Submit manuscript

Abstract

This paper introduces a novel and simple approach of high-level scene classification. Knowing that objects are the essence of any given scene, the proposed method uses them to construct a well-structured background knowledge, which is composed of ranking functions and a statistical collection, in order to support the scene classification process. Since not all objects are relevant, only the most salient ones are identified and used in computing the appropriate scene category. To prove the efficiency of the proposed method, experiments are conducted on state of the art datasets: MIT Indoor, SUN900, SUN2012, SUN397 and LabelMe+.Comparisons with other methods were also introduced. The obtained results are reported and discussed.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12

Similar content being viewed by others

References

  1. Abkenar MR, Sadreazami H, Ahmad MO (2019) Graph-based salient object detection using background and foreground connectivity cues. 2019 IEEE International Symposium on Circuits and Systems (ISCAS)

  2. Aditya S (2017) Explainable image understanding using vision and reasoning. Proceedings of the AAAI Conference on Artificial Intelligence, vol 31

  3. Aditya S et al (2018) Image understanding using vision and reasoning through scene description graph. Comput Vis Image Understand 173:33–45

    Article  Google Scholar 

  4. Alajaji D, Alhichri H (2020) Few shot scene classification in remote sensing using meta-agnostic machine. 2020 6th conference on data science and machine learning applications (CDMA)

  5. Ali N et al (2018) A hybrid geometric spatial image representation for scene classification. PLoS One 13

  6. Anbarasu B, Anitha G (2018) Indoor scene recognition for micro aerial vehicles navigation using enhanced-GIST descriptors. Def Sci J 68:129

    Article  Google Scholar 

  7. Bagschik G, Menzel T, Maurer M (2018) Ontology based scene creation for the development of automated vehicles. In 2018 IEEE Intelligent Vehicles Symposium (IV), pp 1813-1820

  8. Bai X, Yang M, Lyu P, Xu Y, Luo J (2018) Integrating scene text and visual appearance for fine-grained image classification. IEEE Access 6:66322–66335

    Article  Google Scholar 

  9. Biederman R, Mezzanotte J, Rabinowitz JC (1982) Scene perception: detecting and judging objects undergoing relational violations. Cogn Psychol 14:143–177

    Article  Google Scholar 

  10. Borji A et al (2019) Salient object detection: A survey. Computational Visual Media 5:117–150

    Article  Google Scholar 

  11. Bosch AZ, Muñoz X (2006) Scene classification via pLSA. European conference on computer vision

  12. Brady TF, Konkle T, Alvarez GA, Oliva A (2008) Visual long-term memory has a massive storage capacity for object details. Proc Natl Acad Sci 105:14325–14329

    Article  Google Scholar 

  13. Brown M, Süsstrunk S (2011) Multi-spectral SIFT for scene category recognition» CVPR 2011. IEEE

  14. Cakir F, Güdükbay U, Ulusoy Ö (2011) Nearest-neighbor based metric functions for indoor scene recognition. Comput Vis Image Underst 115:1483–1492

    Article  Google Scholar 

  15. Choi MJ, Torralba A, Willsky AS (2012) Context models and out-of-context objects. Pattern Recogn Lett 33:853–862

    Article  Google Scholar 

  16. Cordts M, Omran M, Ramos S, Rehfeld T, Enzweiler M, Benenson R, Franke U, Roth S, Schiele B (2016) The cityscapes dataset for semantic urban scene understanding. Proceedings of the IEEE conference on computer vision and pattern recognition

  17. Donadello I (2018) Semantic, «image interpretation-integration of numerical data and logical knowledge for cognitive vision». Diss. University of Trento

  18. Donadello LS, AD Garcez(2017) Logic tensor networks for semantic image interpretation arXiv preprint arXiv:1705.08968

  19. Dubey R, Peterson J, Khosla A, Yang M-H, Ghanem B (2015) What makes an object memorable? Proceedings of the IEEE international conference on computer vision

  20. Einhäuser W, Spain M, Perona P (2008) Objects predict fixations better than early saliency. J Vis 8:18–18

    Article  Google Scholar 

  21. Feng YL, Wu L (2017) Bag of visual words model with deep spatial features for geographical scene classification. Computational Intelligence and Neuroscience

  22. Finn C, Abbeel P, Levine S (2017) Model-agnostic meta-learning for fast adaptation of deep networks», arXiv preprint arXiv:1703.03400

  23. Finn C, Xu K, Levine S (2018) Probabilistic model-agnostic meta-learning. Advances in Neural Information Processing Systems

  24. Fu Q, Fu H, Yan H, Zhou B, Chen X, Li X (2020) Human-centric metrics for indoor scene assessment and synthesis. Graph Models 110

  25. Galleguillos C, Belongie S (2010) Context based object categorization: A critical survey. Comput Vis Image Underst 114:712–722

    Article  Google Scholar 

  26. Ganesan, Balasubramanian A (2019) Indoor versus outdoor scene recognition for navigation of a micro aerial vehicle using spatial color gist wavelet descriptors. Visual Computing for Industry, Biomedicine, and Art 2

  27. He X, Deng L (2017) Deep learning for image-to-text generation: A technical overview. IEEE Signal Process Mag 34:109–116

    Article  Google Scholar 

  28. Hotz L, Neumann B (2005) Scene Interpretation as a Configuration Task. KI, vol 19

  29. Hotz L, Neumann B (2005) Scene Interpretation as a Configuration Task. KI 19

  30. Hu A et al (2020) Probabilistic future prediction for video scene understanding. European Conference on Computer Vision. Springer, Cham

  31. Hwang SJ, Grauman K (2011) Reading between the lines: object localization using implicit cues from image tags. IEEE Trans Pattern Anal Mach Intell 34:1145–1158

    Article  Google Scholar 

  32. Hwang SJ, Grauman K (2012) Learning the relative importance of objects from tagged images for retrieval and cross-modal search. Int J Comput Vis 100:134–153

    Article  MathSciNet  Google Scholar 

  33. Isola P, Xiao J, Torralba A, Oliva A (2011) What makes an image memorable? CVPR 2011

  34. Isola P, Xiao J, Parikh D, Torralba A, Oliva A (2013) What makes a photograph memorable? IEEE Trans Pattern Anal Mach Intell 36:1469–1482

    Article  Google Scholar 

  35. Itti L, Koch C, Niebur E (1998) A model of saliency-based visual attention for rapid scene analysis. IEEE Trans Pattern Anal Mach Intell 20:1254–1259

    Article  Google Scholar 

  36. Karaoglu S, Tao R, Gevers T, Smeulders AWM (2016) Words matter: scene text for image classification and retrieval. IEEE Trans Multimed 19:1063–1076

    Article  Google Scholar 

  37. Kojima R, Sugiyama O, Nakadai K (2016) Multimodal scene understanding framework and its application to cooking recognition. Appl Artif Intell 30:181–200

    Article  Google Scholar 

  38. Li L, Sumanaphan S (2011) Indoor scene recognition. Stanford University

    Google Scholar 

  39. Li L-J, Su H, Lim Y, Fei-Fei L (2014) Object bank: an object-level image representation for high-level visual recognition. Int J Comput Vis 107:20–39

    Article  Google Scholar 

  40. Li E, Xia J, Du P, Lin C, Samat A (2017) Integrating multilayer features of convolutional neural networks for remote sensing scene classification. IEEE Trans Geosci Remote Sens 55:5653–5665

    Article  Google Scholar 

  41. Li Y, Zhang Z, Cheng Y, Wang L, Tan T (2019) MAPNet: multi-modal attentive pooling network for RGB-D indoor scene classification. Pattern Recogn 90:436–449

    Article  Google Scholar 

  42. Liu B-D, Meng J, Xie W-Y, Shao S, Li Y, Wang Y (2019) Weighted spatial pyramid matching collaborative representation for remote-sensing-image scene classification. Remote Sensing 11

  43. Lu M, Xu RY, Wang Z (2020) Understanding and predicting the memorability of outdoor natural scenes. IEEE Trans Image Process 29:4927–4941

    Article  Google Scholar 

  44. Mary NAB, Dharma D (2017) Coral reef image classification employing improved LDP for feature extraction. Journal of Visual Communication and Image Representation 49:225–242

    Article  Google Scholar 

  45. Mary N A B, Singh AR, Athisayamani S (2021) Classification of Banana Leaf Diseases Using Enhanced Gabor Feature Descriptor. Inventive Communication and Computational Technologies. Springer, Singapore, pp 229-242

  46. Oliva, Torralba A (2001) Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision 42:145–175

    Article  Google Scholar 

  47. Pandey M, Lazebnik S (2011) Scene recognition and weakly supervised object localization with deformable part-based models. 2011 international conference on computer vision

  48. Pangercic D et al (2009) Visual scene detection and interpretation using encyclopedic knowledge and formal description logic. Proceedings of the International Conference on Advanced Robotics (ICAR), vol 11

  49. Patel TA, Dabhi VK, Prajapati HB (2020) Survey on scene classification techniques. 2020 6th international conference on advanced computing and communication systems (ICACCS)

  50. Peng Z Li J Zhang Y Li G-JQ, Tang J (2019) Few-shot image recognition with knowledge transfer. Proceedings of the IEEE International Conference on Computer Vision

  51. Perera S, Tal A, Zelnik-Manor L (2019) Is image memorability prediction solved?. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops

  52. Pham L, McLoughlin I, Phan H , Palaniappan R, Lang Y (2019) Bag-of-features models based on C-DNN network for acoustic scene classification. Audio engineering society conference: 2019 AES international conference on audio forensics

  53. Quattoni A, Torralba A (2009) Recognizing indoor scenes. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp 413-420

  54. Rafique A, Jalal A, Ahmed A (2019) Scene understanding and recognition: statistical segmented model using geometrical features and Gaussian Naı̈ve Bayes. IEEE conference on International Conference on Applied and Engineering Mathematics

  55. Rafique AA, Jalal A, Ahmed A (2019) Scene Understanding and Recognition: Statistical Segmented Model using Geometrical Features and Gaussian Naïve Bayes. IEEE conference on International Conference on Applied and Engineering Mathematics, vol 57

  56. Ramesh B, Jian NLZ, Chen L, Xiang C, Gao Z (2019) Scalable scene understanding via saliency consensus. Soft Comput 23:2429–2443

    Article  Google Scholar 

  57. Rangel JC, Cazorla M, Garcia-Varea I, Martinez-Gomez J, Fromont E, Sebban M (2016) Scene classification based on semantic labeling. Adv Robot 30:758–769

    Article  Google Scholar 

  58. Reiter R, Mackworth AK (1989) A logical framework for depiction and image interpretation. Artif Intell 41:125–155

    Article  MathSciNet  Google Scholar 

  59. Russell BC, Torralba A, Murphy KP, Freeman WT (2008) LabelMe: a database and web-based tool for image annotation. Int J Comput Vis 77:157–173

    Article  Google Scholar 

  60. Russell C, Torralba A, Murphy KP, Freeman WT (2015) LabelMe, an annotation tool, , available at: http://labelme.csail.mit.edu/Release3.0/

  61. Rust NC, Mehrpour V (2020) Understanding image memorability. Trends Cogn Sci

  62. Sadeghi MA, Farhadi A (2011) Recognition using visual phrases. CVPR 2011

  63. Savchenko V, Rassadin AG (2019) Scene recognition in user preference prediction based on classification of deep embeddings and object detection. International Symposium on Neural Networks

  64. Triantafillou E, Zemel R, Urtasun R (2017) Few-shot learning through an information retrieval lens. Advances in Neural Information Processing Systems

  65. Vinyals O et al (2016) Matching networks for one shot learning. Adv Neural Inf Proces Syst

  66. Wang W et al (2021) Pattern Analysis and Scene Understanding. Interdisciplinary Evolution of the Machine Brain. Springer, Singapore, pp 59-93

  67. Wu R, Ye Z, Liu P, Tang X, Zhao W (2015) Knowledge as action: A cognitive framework for indoor scene classification. 2015 IEEE international conference on image processing (ICIP)

  68. Xia S, Zeng J, Leng L, Fu X (2019) WS-AM: weakly supervised attention map for scene recognition. Electronics 8

  69. Xiao T et al (2018) Unified perceptual parsing for scene understanding. Proceedings of the European Conference on Computer Vision (ECCV)

  70. Xiao J, Hays J, Ehinger KA, Oliva A, A Torralba (2010) Sun database: Large-scale scene recognition from abbey to zoo. In 2010 IEEE computer society conference on computer vision and pattern recognition, pp 3485-3492

  71. Xueqi L (2016) Method of scene image classification based on gist descriptor and CNN. Video Eng 40:7–11

    Google Scholar 

  72. Zeng D, Chen S, Chen B, Li S (2018) Improving remote sensing scene classification by integrating global-context and local-object features. Remote Sensing 10

  73. Zhang M, Zhu M, Zhao X (2020) Recognition of high-risk scenarios in building construction based on image semantics. J Comput Civil Eng 34

  74. Zhang P, Bai Y, Dong W, Bai B, Li Y (2021) Few-shot Classification of Aerial Scene Images via Meta-learning. Remote Sensing 13

  75. Zhao B, Zhong Y, Zhang L, Huang B (2016) The fisher kernel coding framework for high spatial resolution scene classification. Remote Sens 8:157

    Article  Google Scholar 

  76. Zitnick CL, Vedantam R, Parikh D (2014) Adopting abstract images for semantic scene understanding. IEEE Trans Pattern Anal Mach Intell 38:627–638

    Article  Google Scholar 

Download references

Acknowledgments

We thank both Mr. Miles Mathew and Mr. Richard Hacken from the University of Brigham Young, in Provo, Utah, USA for giving us a great help in reviewing this paper and suggesting very useful recommendations.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Lamine Benrais.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Benrais, L., Baha, N. High level visual scene classification using background knowledge of objects. Multimed Tools Appl 81, 3663–3692 (2022). https://doi.org/10.1007/s11042-021-11701-6

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11042-021-11701-6

Keywords

Navigation