Skip to main content
Log in

Adaptive local recalibration network for scene recognition

  • Published:
Applied Intelligence Aims and scope Submit manuscript

Abstract

Scene recognition is a computer vision task that categorizes scenes from photographs. In this paper, we introuduce the Adaptive Local Recalibration Network (ALR-Net), a novel scene recognition method based on convolutional neural networks (CNNs). In comparison to the object classification task, the scene classification images have a more dispersed distribution of information. To solve this issue, we suggest an attention mechanism for locating the discriminative regions for scene recognition. Along with normal data augmentation, we use the regions to guide two additional data augmentation approaches, namely adaptive cropping and adaptive hiding, in order to capture local information more efficiently and specifically. Attention maps are also used to adaptively recalibrate scene feature maps so that discriminative regions receive more attention than others. In addition, we bring in a scene distribution label for each image, which is used to assist the training of attention maps. Extensive studies on two scene recognition benchmarks verified the proposed model’s effectiveness: MIT67 (88.37%) and SUN397 (74.24%).

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12

Similar content being viewed by others

Data Availability

All data generated or analysed during this study are included in these published articles [32, 48, 49].

References

  1. Zhou B, Lapedriza A, Xiao J, Torralba A, Oliva A (2014) Learning deep features for scene recognition using places database. Advances in Neural Information Processing Systems (NIPS) 27:2014

    Google Scholar 

  2. Liu T, Wang J, Yang B, Wang X (2021) Ngdnet: Nonuniform gaussian-label distribution learning for infrared head pose estimation and ontask behavior understanding in the classroom. Neurocomputing, 436:210–220

  3. Li Z, Liu H, Zhang Z, Liu T, Xiong NN (2021) Learning knowledge graph embedding with heterogeneous relation attention networks. IEEE Trans Neural Netw Learn Syst 33(8):3961–3973

  4. H Liu, C Zheng, D Li, X Shen, K Lin, J Wang, Z Zhang, Z Zhang, NN Xiong. Edmf: Efficient deep matrix factorization with review feature learning for industrial recommender system. IEEE Transactions on Industrial Informatics, 18(7):4361–4371, 2021

  5. Wang Z, Wang L, Wang Y, Zhang B, Qiao Y (2017) Weakly supervised patchnets: Describing and aggregating local patches for scene recognition. IEEE Trans Image Process 26(4):2028–2041

  6. Wu R, Wang B, Wang W, Yu Y (2015) Harvesting discriminative meta objects with deep cnn features for scene classification. In Proceedings of the IEEE International Conference on Computer Vision, pages 1287–1295

  7. Cheng X, Lu J, Feng J, Yuan B, Zhou J (2018) Scene recognition with objectness. Pattern Recognition 74:474–487

  8. Zhao Z and Larson M (2018) From volcano to toyshop: Adaptive discriminative region discovery for scene recognition. In Proceedings of the 26th ACM international conference on Multimedia, pages 1760–1768

  9. Zhou B, Khosla A, Lapedriza A, Oliva A, Torralba A (2016) Learning deep features for discriminative localization. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2921–2929

  10. Simon M and Rodner E (2015) Neural activation constellations: Unsupervised part model discovery with convolutional networks. In Proceedings of the IEEE international conference on computer vision, pages 1143–1151

  11. Song X, Jiang S, Herranz L (2017) Multi-scale multi-feature context modeling for scene recognition in the semantic manifold. IEEE Transactions on Image Processing, 26(6):2721–2735

  12. Zeng H, Song X, Chen G, Jiang S (2019) Learning scene attribute for scene recognition. IEEE Transactions on Multimedia 22(6):1519–1530

  13. Yu L, Jin M, Zhou K (2020) Multi-channel biomimetic visual transformation for object feature extraction and recognition of complex scenes. Applied Intelligence 50(3):792–811

  14. Patterson G, Hays J (2012) Sun attribute database: Discovering, annotating, and recognizing scene attributes. In 2012 IEEE Conference on Computer Vision and Pattern Recognition, pages 2751–2758. IEEE

  15. Patterson G, Xu C, Su H, Hays J (2014) The sun attribute database: Beyond categories for deeper scene understanding. International Journal of Computer Vision, 108(1-2):59–81

  16. Wang L, Guo S, Huang W, Xiong Y, Qiao Y (2017) Knowledge guided disambiguation for large-scale scene classification with multiresolution cnns. IEEE Transactions on Image Processing 26(4):2055–2068

  17. Gao BB, Xing C, Xie CW, Wu J, Geng X (2017) Deep label distribution learning with label ambiguity. IEEE Transactions on Image Processing, 26(6):2825–2838

  18. Tanaka D, Ikami D, Yamasaki T, Aizawa K (2018) Joint optimization framework for learning with noisy labels. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5552–5560

  19. Yi K, Wu J (2019) Probabilistic end-to-end noise correction for learning with noisy labels. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7017–7025

  20. Liu JB, Huang YP, Zou Q, Wang SC (2019) Learning representative features via constrictive annular loss for image classification. Applied Intelligence, 49(8):3082–3092

  21. Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. Advances in neural information processing systems 25:1097–1105

  22. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556

  23. Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1–9

  24. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778

  25. Yuan C, Wu Y, Qin X, Qiao S, Pan Y, Huang P, Liu D, Han N (2019) An effective image classification method for shallow densely connected convolution networks through squeezing and splitting techniques. Applied Intelligence 49(10):3570–3586

    Article  Google Scholar 

  26. Hu J, Shen L, Sun G (2018) Squeeze-and-excitation networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 7132–7141

  27. Park J, Woo S, Lee JY, Kweon IS (2018) Bam: Bottleneck attention module. arXiv:1807.06514

  28. Woo S, Park J, Lee JY, Kweon IS (2018) Cbam: Convolutional block attention module. In Proceedings of the European conference on computer vision (ECCV), pages 3–19

  29. Liu H, Nie H, Zhang Z, Li YF (2021) Anisotropic angle distribution learning for head pose estimation and attention understanding in human-computer interaction. Neurocomputing 433:310–322

    Article  Google Scholar 

  30. Liu H, Fang S, Zhang Z, Li D, Lin K, Wang J (2021) Mfdnet: Collaborative poses perception and matrix fisher distribution for head pose estimation. IEEE Trans Multimedia 24:2449–2460

    Article  Google Scholar 

  31. Deng Y, Chen H, Chen H, Li Y (2021) Learning from images: A distillation learning framework for event cameras. IEEE Trans Image Process 30:4919–4931

    Article  Google Scholar 

  32. Zhou B, Lapedriza A, Khosla A, Oliva A, Torralba A (2017) Places: A 10 million image database for scene recognition. IEEE Trans Pattern Anal Mach Intell 40(6):1452–1464

    Article  Google Scholar 

  33. Lowe DG (2004) Distinctive image features from scale-invariant keypoints. International journal of computer vision 60(2):91–110

    Article  Google Scholar 

  34. Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection. In 2005 IEEE computer society conference on computer vision and pattern recognition (CVPR’05), volume 1, pages 886–893. Ieee

  35. Oliva A, Torralba A (2001) Modeling the shape of the scene: A holistic representation of the spatial envelope. International journal of computer vision 42(3):145–175

    Article  MATH  Google Scholar 

  36. Jégou H, Perronnin F, Douze M, Sánchez J, Pérez P, Schmid C (2011) Aggregating local image descriptors into compact codes. IEEE transactions on pattern analysis and machine intelligence 34(9):1704–1716

    Article  Google Scholar 

  37. Perronnin F, Sánchez J, Mensink T (2010) Improving the fisher kernel for large-scale image classification. In European conference on computer vision, pages 143–156. Springer

  38. Liu H, Wang X, Zhang W, Zhang Z, Li YF (2020) Infrared head pose estimation with multi-scales feature fusion on the irhp database for human attention recognition. Neurocomputing 411:510–520

    Article  Google Scholar 

  39. Deng Y, Chen H, Li Y (2021) Mvf-net: A multi-view fusion network for event-based object classification. IEEE Transactions on Circuits and Systems for Video Technology 32(12):8275–8284

    Article  Google Scholar 

  40. Zheng H, Fu J, Mei T, Luo J (2017) Learning multi-attention convolutional neural network for fine-grained image recognition. In Proceedings of the IEEE international conference on computer vision, pages 5209–5217

  41. Yang Z, Luo T, Wang D, Hu Z, Gao J, Wang L (2018) Learning to navigate for fine-grained classification. In Proceedings of the European Conference on Computer Vision (ECCV), pages 420–435

  42. Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R (2014) Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15(1):1929–1958

    MathSciNet  MATH  Google Scholar 

  43. Ioffe S, Szegedy C (2015) Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International conference on machine learning, pages 448–456. PMLR

  44. Singh KK, Lee YJ (2017) Hide-and-seek: Forcing a network to be meticulous for weakly-supervised object and action localization. In 2017 IEEE international conference on computer vision (ICCV), pages 3544–3553. IEEE

  45. Zhong Z, Zheng L, Kang G, Li S, Yang Y (2020) Random erasing data augmentation. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 13001–13008

  46. DeVries T, Taylor GW (2017) Improved regularization of convolutional neural networks with cutout. arXiv:1708.04552

  47. Ren S, He K, Girshick R, Sun J (2015) Faster r-cnn: Towards real-time object detection with region proposal networks. arXiv:1506.01497

  48. Quattoni A, Torralba A (2009) Recognizing indoor scenes. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 413–420. IEEE

  49. Xiao J, Hays J, Ehinger KA, Oliva A, Torralba A (2010) Sun database: Large-scale scene recognition from abbey to zoo. In 2010 IEEE computer society conference on computer vision and pattern recognition, pages 3485–3492. IEEE

  50. Goyal P, Dollár P, Girshick R, Noordhuis P, Wesolowski L, Kyrola A, Tulloch A, Jia Y, He K (2017) Accurate, large minibatch sgd: Training imagenet in 1 hour. arXiv:1706.02677

  51. Sitaula C, Xiang Y, Aryal S, Lu X (2021) Scene image representation by foreground, background and hybrid features. Expert Systems with Applications, page 115285

  52. Guo S, Huang W, Wang L, Qiao Y (2016) Locally supervised deep hybrid model for scene recognition. IEEE transactions on image processing 26(2):808–820

  53. Xie GS, Zhang XY, Yan S, Liu CL (2015) Hybrid cnn and dictionary-based models for scene recognition and domain adaptation. IEEE Transactions on Circuits and Systems for Video Technology 27(6):1263–1274

    Article  Google Scholar 

  54. Herranz L, Jiang S, Li X (2016) Scene recognition with cnns: objects, scales and dataset bias. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 571–579

  55. López-Cifuentes A, Escudero-Viñolo M (2020) Jesús Bescós, Á García-Martín. Semantic-aware scene recognition. Pattern Recognition 102:107256

    Article  Google Scholar 

  56. Chen G, Song X, Zeng H, Jiang S (2020) Scene recognition with prototype-agnostic scene layout. IEEE Transactions on Image Processing, 29:5877–5888

Download references

Funding

The numerical calculations in this paper have been done on the supercomputing system in the Supercomputing Center of Wuhan University. This work is supported by the National Natural Science Foundation of China Enterprise Innovation and Development Joint Fund (Project No. U19B2004).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Lian Zou.

Ethics declarations

Conflicts of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Wang, J., Zou, L., Fan, C. et al. Adaptive local recalibration network for scene recognition. Appl Intell 53, 27935–27950 (2023). https://doi.org/10.1007/s10489-023-04963-0

Download citation

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10489-023-04963-0

Keywords

Navigation