Abstract
This article evaluates the performance of local and global descriptors, as well as convolutional neural networks (CNNs), in image recognition tasks in indoor spaces. The evaluation is built around several realistic situations that closely resemble the typical working conditions of mobile robots. A robot interacting with its environment may see only portions of a scene, may view objects from different angles, or may face lighting changes from one setting to another. The aim is to investigate how well the different descriptors identify scenes under these conditions. To evaluate the effectiveness of visual descriptors and CNNs in classifying images taken from the perspective of mobile robots in indoor environments, a proprietary database was built and subjected to several controlled transformations. These transformations made it possible to analyze the performance of Bag-of-Visual-Words (BoVW), Fisher Vectors (Fisher), Vector of Locally Aggregated Descriptors (VLAD), Global Image Descriptors (GIST), and CNN descriptors in visual categorization tasks that reflect how mobile robots perceive their surroundings. The findings show the strengths of each descriptor in the different test scenarios and underline the need for hybrid models that combine descriptors and CNNs for scene identification in the indoor areas where mobile robots operate.
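The abstract names the aggregation schemes but not how they are computed. As a minimal sketch, assuming local descriptors (for example SIFT or SURF vectors) have already been extracted from an image and a codebook of visual words has already been learned (typically with k-means), a BoVW histogram and a VLAD vector could be built as below. The function names and toy data are hypothetical illustrations, not the authors' code. Fisher Vectors follow the same idea but use soft assignment to a Gaussian mixture model and first- and second-order statistics; they are not shown here.

```python
import math
from typing import List, Sequence

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_word(desc: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Index of the visual word (codebook centre) closest to a local descriptor."""
    return min(range(len(codebook)), key=lambda k: sq_dist(desc, codebook[k]))


def bovw_histogram(descs, codebook) -> Vector:
    """Bag-of-Visual-Words: L1-normalised count of descriptors per visual word."""
    hist = [0.0] * len(codebook)
    for d in descs:
        hist[nearest_word(d, codebook)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def vlad(descs, codebook) -> Vector:
    """VLAD: per-word sum of residuals (descriptor minus its assigned centre),
    concatenated, then power-normalised and L2-normalised."""
    dim = len(codebook[0])
    acc = [[0.0] * dim for _ in codebook]
    for d in descs:
        k = nearest_word(d, codebook)
        for j in range(dim):
            acc[k][j] += d[j] - codebook[k][j]
    flat = [v for row in acc for v in row]
    # Signed square root (power normalisation with exponent 0.5).
    flat = [math.copysign(math.sqrt(abs(v)), v) for v in flat]
    # L2 normalisation so images with many keypoints do not dominate.
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm else flat


if __name__ == "__main__":
    codebook = [[0.0, 0.0], [10.0, 10.0]]  # toy 2-word vocabulary
    descs = [[1.0, 0.0], [0.0, 1.0], [9.0, 10.0]]  # toy 2-D local descriptors
    print(bovw_histogram(descs, codebook))
    print(vlad(descs, codebook))
```

BoVW keeps only how many descriptors fall on each word, while VLAD also keeps where they fall relative to the word centre, which is why VLAD vectors are longer (number of words times descriptor dimension) but usually more discriminative for the same vocabulary size.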
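The abstract does not list the exact controlled transformations applied to the database. The sketch below assumes three simple stand-ins for the conditions it describes: a gamma curve for lighting changes, a central crop for partial views of a scene, and a 90-degree rotation as a crude proxy for a change of viewpoint. Images are plain grayscale lists of lists so the example needs only the standard library; all function names and parameter values are hypothetical.

```python
from typing import List

Image = List[List[int]]  # grayscale pixels in 0..255, row-major


def adjust_gamma(img: Image, gamma: float) -> Image:
    """Simulate an illumination change with a gamma curve (gamma > 1 darkens)."""
    return [[round(255 * (p / 255) ** gamma) for p in row] for row in img]


def crop_fraction(img: Image, frac: float) -> Image:
    """Keep the central `frac` of rows and columns, mimicking a partial view."""
    h, w = len(img), len(img[0])
    ch, cw = max(1, round(h * frac)), max(1, round(w * frac))
    top, left = (h - ch) // 2, (w - cw) // 2
    return [row[left:left + cw] for row in img[top:top + ch]]


def rotate90(img: Image) -> Image:
    """Rotate 90 degrees clockwise, a crude stand-in for a change of viewpoint."""
    return [list(col) for col in zip(*img[::-1])]


if __name__ == "__main__":
    img = [[0, 64, 128, 255] for _ in range(4)]  # toy 4x4 gradient image
    variants = {
        "original": img,
        "darker": adjust_gamma(img, 2.0),
        "partial": crop_fraction(img, 0.5),
        "rotated": rotate90(img),
    }
    for name, v in variants.items():
        print(name, len(v), "x", len(v[0]))
```

In an evaluation of this kind, each variant would be encoded with every descriptor and classified, so that accuracy on transformed images can be compared with accuracy on the untouched originals.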
Acknowledgements
This research has been supported by the project “COordinated intelligent Services for Adaptive Smart areaS (COSASS)”, Reference: PID2021-123673OB-C33, financed by MCIN/AEI/10.13039/501100011033/FEDER, UE.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Hernando Ríos González, L., López Flórez, S., González-Briones, A., de la Prieta, F. (2023). Evaluation of Feature Descriptors for Scene Classification. In: Castillo Ossa, L.F., Isaza, G., Cardona, Ó., Castrillón, O.D., Corchado Rodriguez, J.M., De la Prieta Pintado, F. (eds) Trends in Sustainable Smart Cities and Territories. SSCT 2023. Lecture Notes in Networks and Systems, vol 732. Springer, Cham. https://doi.org/10.1007/978-3-031-36957-5_24
DOI: https://doi.org/10.1007/978-3-031-36957-5_24
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-36956-8
Online ISBN: 978-3-031-36957-5
eBook Packages: Intelligent Technologies and Robotics; Intelligent Technologies and Robotics (R0)