
Extending Maps with Semantic and Contextual Object Information for Robot Navigation: a Learning-Based Framework Using Visual and Depth Cues

Published in: Journal of Intelligent & Robotic Systems

Abstract

This paper addresses the problem of building augmented metric representations of scenes with semantic information from RGB-D images. We propose a complete framework to create an enhanced map representation of the environment with object-level information, suitable for applications such as human-robot interaction, assistive robotics, visual navigation, and manipulation tasks. Our formulation combines a CNN-based object detector (YOLO) with a 3D model-based segmentation technique to perform instance semantic segmentation and to localize, identify, and track different classes of objects in the scene. Tracking and positioning of the semantic classes are performed with a dictionary of Kalman filters, which fuses sensor measurements over time and yields more accurate maps. The formulation is designed to identify and disregard dynamic objects in order to obtain a medium-term invariant map representation. The proposed method was evaluated on collected and publicly available RGB-D data sequences acquired in different indoor scenes. Experimental results show the potential of the technique to produce augmented semantic maps containing several objects (notably doors). We also provide the community with a dataset of annotated object classes (doors, fire extinguishers, benches, water fountains) and their positions, as well as the source code as ROS packages.
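The "dictionary of Kalman filters" idea from the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes a constant-position motion model over 3D object centroids, nearest-neighbour data association within a fixed distance gate, and hypothetical class/function names (`Kalman3D`, `SemanticTracker`, `observe`).

```python
import numpy as np


class Kalman3D:
    """Constant-position Kalman filter over a 3D object centroid (assumed model)."""

    def __init__(self, z0, proc_var=1e-4, meas_var=1e-2):
        self.x = np.asarray(z0, dtype=float)  # state: centroid [x, y, z]
        self.P = np.eye(3) * meas_var         # state covariance
        self.Q = np.eye(3) * proc_var         # process noise (small drift)
        self.R = np.eye(3) * meas_var         # measurement noise

    def update(self, z):
        # Predict step for a static model: only covariance grows.
        self.P = self.P + self.Q
        # Correct step: identity measurement model (H = I).
        K = self.P @ np.linalg.inv(self.P + self.R)  # Kalman gain
        self.x = self.x + K @ (np.asarray(z, dtype=float) - self.x)
        self.P = (np.eye(3) - K) @ self.P
        return self.x


class SemanticTracker:
    """Dictionary of Kalman filters, one per (class label, instance id)."""

    def __init__(self, gate=0.5):
        self.filters = {}   # (label, instance_id) -> Kalman3D
        self.next_id = 0
        self.gate = gate    # association gate in metres (assumed value)

    def observe(self, label, position):
        """Fuse one detection: match the nearest same-class track or spawn a new one."""
        position = np.asarray(position, dtype=float)
        best, best_d = None, self.gate
        for key, kf in self.filters.items():
            if key[0] != label:
                continue
            d = np.linalg.norm(kf.x - position)
            if d < best_d:
                best, best_d = key, d
        if best is None:
            # No track within the gate: create a new object instance.
            best = (label, self.next_id)
            self.next_id += 1
            self.filters[best] = Kalman3D(position)
            return best, self.filters[best].x
        return best, self.filters[best].update(position)
```

As a usage example, repeated noisy detections of the same door are fused into one smoothed estimate, while a detection far outside the gate spawns a second instance; this is the mechanism by which repeated sensor measurements yield a more accurate object-level map.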



Acknowledgments

The authors thank PNPD-CAPES and FAPEMIG for financial support during this research. We also gratefully acknowledge NVIDIA for the donation of the Jetson TX2 GPU used in the online experiments of this research.

Author information

Corresponding author

Correspondence to Renato Martins.


About this article


Cite this article

Martins, R., Bersan, D., Campos, M.F.M. et al. Extending Maps with Semantic and Contextual Object Information for Robot Navigation: a Learning-Based Framework Using Visual and Depth Cues. J Intell Robot Syst 99, 555–569 (2020). https://doi.org/10.1007/s10846-019-01136-5


Keywords

Navigation