Scene Recognition with Comprehensive Regions Graph Modeling

  • Haitao Zeng
  • Gongwei Chen
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11903)


Comprehensive learning of the regional contents of a scene is key to scene recognition. Modeling these regional contents is challenging because scene images are semantically diverse and spatially complex. Existing work focuses mainly on a few small, partial regions of the scene and ignores most of its area. In contrast, we propose the Semantic Regional Graph modeling framework, which comprehensively selects discriminative semantic regions in scenes. To explore the relations among these regions, we model their geometric layout with a graph model and generate discriminative representations for scene recognition. Experimental results demonstrate the effectiveness of our method, which achieves state-of-the-art performance on the MIT67 and SUN397 datasets.
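The abstract does not specify how the region graph is built or processed, so the following is only a minimal sketch of the general idea of modeling semantic regions geometrically with a graph. It assumes each region is already described by a feature vector and a normalized centroid, connects regions whose centroids are close, and applies one graph-convolution step with symmetric adjacency normalization before pooling. The function names, the distance threshold `radius`, and the mean pooling are all hypothetical choices for illustration, not the authors' method.

```python
import numpy as np


def build_adjacency(centroids, radius):
    """Link regions whose centroids lie within `radius` (hypothetical rule)."""
    diff = centroids[:, None, :] - centroids[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    adj = (dist <= radius).astype(float)
    np.fill_diagonal(adj, 1.0)  # self-loops keep each region's own features
    return adj


def normalize_adjacency(adj):
    """Symmetric normalization D^{-1/2} A D^{-1/2}."""
    deg = adj.sum(axis=1)  # >= 1 for every node because of the self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def graph_layer(features, adj_norm, weight):
    """One graph-convolution step followed by ReLU."""
    return np.maximum(adj_norm @ features @ weight, 0.0)


def scene_representation(features, centroids, weight, radius=0.5):
    """Pool region features after one round of neighbor aggregation."""
    adj_norm = normalize_adjacency(build_adjacency(centroids, radius))
    hidden = graph_layer(features, adj_norm, weight)
    return hidden.mean(axis=0)  # assumed pooling: average over regions


# Toy usage with random stand-ins for region features and positions.
rng = np.random.default_rng(0)
features = rng.normal(size=(5, 8))    # 5 regions, 8-dim features (assumed)
centroids = rng.uniform(size=(5, 2))  # normalized (x, y) centroids (assumed)
weight = rng.normal(size=(8, 4))      # untrained projection, illustration only
rep = scene_representation(features, centroids, weight)
print(rep.shape)
```

In a real pipeline the region features and centroids would come from a semantic segmentation model and the weight matrix would be learned; here they are random placeholders so the sketch runs on its own.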


Keywords: Scene recognition, Graph Neural Network



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. China University of Mining and Technology, Beijing, China
  2. Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
