Abstract
This paper presents a new approach to exploring sparse and binary convolutional filters in traditional Convolutional Neural Networks (CNNs). Recent advances in the integration of Deep Learning architectures, particularly in mobile autonomous robotics applications, have motivated several research efforts to overcome the challenges posed by limited computational resources. One of the biggest challenges in the area is the development of applications that address the Loop Closure Detection problem in Simultaneous Localization and Mapping (SLAM) systems. Such applications demand substantial computational power. Optimizing the resources used by CNN models therefore makes them easier to integrate into such systems. To this end, we propose reformulating convolutional layers through Local Binary Descriptors (LBD) to optimize a CNN’s resource usage. This paper evaluates a Bag of Visual Features (BoVF) approach that extracts features through local descriptors (e.g., SIFT, SURF, KAZE) and local binary descriptors (e.g., BRIEF, ORB, BRISK, AKAZE, FREAK). The descriptors were evaluated in the recognition and classification steps on six visual datasets (i.e., MNIST, JAFFE, Extended CK+, FEI, CIFAR-10, and FER-2013) using a Multilayer Perceptron (MLP) classifier. Experimentally, we demonstrate that combining BoVF with an MLP classifier can produce promising results. Additionally, we can assume that the descriptors computed by a Local Binary Descriptor, together with the proposed hybrid Deep Neural Network (DNN) architecture, can satisfactorily support the optimization of a CNN’s resources for the Loop Closure Detection problem.
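The BoVF encoding step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy 8-bit descriptors, and the fixed three-word codebook are all hypothetical, and the abstract does not say how the codebook is built. The sketch only shows the general idea of assigning each binary descriptor to its nearest visual word by Hamming distance and summarizing an image as a normalized histogram, which is the kind of fixed-length vector an MLP classifier could take as input.

```python
from collections import Counter
from typing import List


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two descriptors stored as integers."""
    return bin(a ^ b).count("1")


def nearest_word(descriptor: int, codebook: List[int]) -> int:
    """Index of the closest codebook entry (ties resolve to the lowest index)."""
    return min(range(len(codebook)), key=lambda i: hamming(descriptor, codebook[i]))


def bovf_histogram(descriptors: List[int], codebook: List[int]) -> List[float]:
    """L1-normalized histogram of visual-word occurrences for one image."""
    total = len(descriptors)
    if total == 0:
        # An image with no keypoints yields an all-zero feature vector.
        return [0.0] * len(codebook)
    counts = Counter(nearest_word(d, codebook) for d in descriptors)
    return [counts.get(i, 0) / total for i in range(len(codebook))]


# Toy 8-bit values (assumed for illustration; real binary descriptors are longer).
codebook = [0b00000000, 0b11111111, 0b11110000]
image_descriptors = [0b00000001, 0b11111110, 0b11110001, 0b00000011]

histogram = bovf_histogram(image_descriptors, codebook)
print(histogram)  # [0.5, 0.25, 0.25]
```

Hamming distance is used here because it is the natural metric for binary descriptors; floating-point descriptors such as SIFT or SURF would typically use Euclidean distance instead.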
Acknowledgments
– This work was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001.
– This work was carried out with the support of the Programa de Cooperação Acadêmica em Defesa Nacional (PROCAD-DEFESA).
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
da Silva, A.M.R., Casqueiro, G.A., Angonese, A.T., Rosa, P.F.F. (2022). Towards Loop Closure Detection for SLAM Applications Using Bag of Visual Features: Experiments and Simulation. In: Ribeiro, P.R.d.A., Cota, V.R., Barone, D.A.C., de Oliveira, A.C.M. (eds) Computational Neuroscience. LAWCN 2021. Communications in Computer and Information Science, vol 1519. Springer, Cham. https://doi.org/10.1007/978-3-031-08443-0_3
DOI: https://doi.org/10.1007/978-3-031-08443-0_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-08442-3
Online ISBN: 978-3-031-08443-0
eBook Packages: Computer Science, Computer Science (R0)