Abstract
Deep learning has led to significant breakthroughs in computer vision and object recognition, especially in improving recognition accuracy. Scene recognition algorithms have evolved over the years owing to developments in machine learning and deep convolutional neural networks (DCNNs). In this paper, the classification of indoor scenes is attempted using three deep learning models, namely, ResNet, MobileNet, and EfficientNet. The influence of activation functions on classification accuracy is also explored: three activation functions, namely, tanh, ReLU, and sigmoid, are deployed in the work. The MIT-67 indoor dataset is split into scenes with and without people to test the effect of this split on classification accuracy. The novelty of the work lies in segregating the dataset, based on spatial layout, into two groups, namely, scenes with people and scenes without people. Among the three pre-trained models, EfficientNet gives the best results.
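The three activation functions compared in the paper can be sketched in plain Python as below. This is an illustrative sketch only: in the study these functions are applied inside the layers of the pre-trained DCNNs, not as standalone scalar functions.

```python
import math

# Sigmoid squashes inputs to (0, 1); it saturates for large |x|,
# which can slow gradient flow in deep networks.
def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

# Tanh squashes inputs to (-1, 1); zero-centered, but also saturating.
def tanh(x: float) -> float:
    return math.tanh(x)

# ReLU passes positive inputs unchanged and zeroes out negatives;
# it does not saturate for x > 0, which helps gradients in deep nets.
def relu(x: float) -> float:
    return max(0.0, x)

print(sigmoid(0.0))  # 0.5
print(tanh(0.0))     # 0.0
print(relu(-3.0))    # 0.0
print(relu(2.5))     # 2.5
```

The saturation behaviour noted in the comments is one common reason ReLU tends to outperform sigmoid and tanh when training deep convolutional models.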
Funding
The authors declare that no external funding was received; however, the necessary hardware and software support was provided by the research center of the institute. The authors thank Visvesvaraya Technological University (VTU), Belagavi-590018, and KLE Institute of Technology, Hubballi-580027, India, for providing a platform to carry out the research work.
COMPLIANCE WITH ETHICAL STANDARDS
This article is a completely original work of its authors; it has not been published before and will not be sent to other publications until the PRIA Editorial Board decides not to accept it for publication.
Conflict of Interest
The writing process and the content of this article give no grounds for raising the issue of a conflict of interest.
Additional information
Basavaraj S. Anami is the Principal of K. L. E. Institute of Technology, Hubballi-580027, India. He is a veteran professor in computer science with more than 40 years of teaching experience, including 20 years of research experience. His research areas include agriculture/horticulture image processing and natural language processing. He is a senior member of IEEE and CSI. He has authored three books in computer science, published by PHI, Wiley (India) and UP.
Chetan V. Sagarnal received a Bachelor of Engineering and a Master of Technology in Electronics and Communication Engineering from Visvesvaraya Technological University (VTU), Belagavi-590018, India. He has 7 years of teaching experience and is pursuing his PhD in Computer Science and Engineering at VTU. His current research interests include computer vision and deep learning.
Cite this article
Anami, B.S., Sagarnal, C.V. Influence of Different Activation Functions on Deep Learning Models in Indoor Scene Images Classification. Pattern Recognit. Image Anal. 32, 78–88 (2022). https://doi.org/10.1134/S1054661821040039