Influence of Different Activation Functions on Deep Learning Models in Indoor Scene Images Classification

APPLICATION PROBLEMS

Published in Pattern Recognition and Image Analysis

Abstract

The success of deep learning has led to significant breakthroughs in computer vision and object recognition, especially in improving recognition accuracy. Scene recognition algorithms have evolved over the years owing to developments in machine learning and deep convolutional neural networks (DCNNs). In this paper, the classification of indoor scenes is attempted using three deep learning models, namely, ResNet, MobileNet, and EfficientNet, and the influence of activation functions on classification accuracy is explored. Three activation functions, namely, tanh, ReLU, and sigmoid, are deployed in the work. The MIT-67 indoor dataset is split into scenes with and without people to test the effect of this split on classification accuracy. The novelty of the work lies in segregating the dataset, based on spatial layout, into two groups, namely, scenes with people and scenes without people. Among the three pre-trained models, EfficientNet gave the best results.
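The abstract describes the experimental setup in prose only. As a minimal illustrative sketch (not the authors' code), the following Keras snippet shows how such an experiment is commonly assembled: a frozen pre-trained backbone (EfficientNetB0 here; ResNet or MobileNet variants are drop-in substitutes) topped with a classification head whose hidden activation is swapped among tanh, ReLU, and sigmoid. The head width (256 units), input resolution, and optimizer are illustrative assumptions.

import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 67  # MIT-67 contains 67 indoor scene categories

def build_classifier(activation: str = "relu") -> tf.keras.Model:
    """Pre-trained backbone plus a head with a configurable activation."""
    backbone = tf.keras.applications.EfficientNetB0(
        include_top=False, weights="imagenet", input_shape=(224, 224, 3)
    )
    backbone.trainable = False  # transfer learning: keep ImageNet weights fixed
    model = models.Sequential([
        backbone,
        layers.GlobalAveragePooling2D(),
        layers.Dense(256, activation=activation),  # tanh / relu / sigmoid
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Compare the three activation functions studied in the paper.
for act in ("tanh", "relu", "sigmoid"):
    model = build_classifier(act)
    # model.fit(train_ds, validation_data=val_ds, epochs=...)  # hypothetical MIT-67 splits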



Funding

The authors declare that no external funding was received; however, the necessary hardware and software support was provided by the research center of the institute. The authors thank Visvesvaraya Technological University (VTU), Belagavi-590018, and KLE Institute of Technology, Hubballi-580027, India, for providing a platform for carrying out the research work.

Author information

Corresponding authors

Correspondence to Basavaraj S. Anami or Chetan V. Sagarnal.

Ethics declarations

COMPLIANCE WITH ETHICAL STANDARDS

This article is a completely original work of its authors; it has not been published before and will not be sent to other publications until the PRIA Editorial Board decides not to accept it for publication.

Conflict of Interest

The authors declare that the writing process and the content of the article give no grounds for raising the issue of a conflict of interest.

Additional information

Basavaraj S. Anami is the Principal of K. L. E. Institute of Technology, Hubballi-580027, India. He is a veteran professor of computer science with more than 40 years of teaching experience, including 20 years of research experience. His research areas include agriculture/horticulture image processing and natural language processing. He is a senior member of IEEE and CSI. He has authored three books in computer science, published by PHI, Wiley (India), and UP.

Chetan V. Sagarnal received a Bachelor of Engineering and a Master of Technology in Electronics and Communication Engineering from Visvesvaraya Technological University (VTU), Belagavi-590018, India. He has 7 years of teaching experience and is pursuing his PhD in Computer Science and Engineering at VTU. His current research interests include computer vision and deep learning.

About this article

Cite this article

Anami, B.S., Sagarnal, C.V. Influence of Different Activation Functions on Deep Learning Models in Indoor Scene Images Classification. Pattern Recognit. Image Anal. 32, 78–88 (2022). https://doi.org/10.1134/S1054661821040039
