
Extending Maps with Semantic and Contextual Object Information for Robot Navigation: a Learning-Based Framework Using Visual and Depth Cues

Published in: Journal of Intelligent & Robotic Systems

Abstract

This paper addresses the problem of building augmented metric representations of scenes with semantic information from RGB-D images. We propose a complete framework to create an enhanced map representation of the environment with object-level information, suitable for applications such as human-robot interaction, assistive robotics, visual navigation, and manipulation tasks. Our formulation combines a CNN-based object detector (YOLO) with a 3D model-based segmentation technique to perform instance semantic segmentation and to localize, identify, and track different classes of objects in the scene. Tracking and positioning of the semantic classes are performed with a dictionary of Kalman filters, which fuses sensor measurements over time and yields more accurate maps. The formulation is designed to identify and disregard dynamic objects in order to obtain a medium-term invariant map representation. The proposed method was evaluated on collected and publicly available RGB-D data sequences acquired in different indoor scenes. Experimental results show the potential of the technique to produce augmented semantic maps containing several objects (notably doors). We also provide the community with a dataset of annotated object classes (doors, fire extinguishers, benches, water fountains) and their positions, as well as the source code as ROS packages.
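The "dictionary of Kalman filters" idea from the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes a constant-position motion model over 3D object centroids, nearest-neighbour data association within a fixed distance gate, and hypothetical class/function names (`Kalman3D`, `SemanticTracker`, `observe`).

```python
import numpy as np


class Kalman3D:
    """Constant-position Kalman filter over a 3D object centroid (assumed model)."""

    def __init__(self, z0, proc_var=1e-4, meas_var=1e-2):
        self.x = np.asarray(z0, dtype=float)  # state: centroid [x, y, z]
        self.P = np.eye(3) * meas_var         # state covariance
        self.Q = np.eye(3) * proc_var         # process noise (small drift)
        self.R = np.eye(3) * meas_var         # measurement noise

    def update(self, z):
        # Predict step for a static model: only covariance grows.
        self.P = self.P + self.Q
        # Correct step: identity measurement model (H = I).
        K = self.P @ np.linalg.inv(self.P + self.R)  # Kalman gain
        self.x = self.x + K @ (np.asarray(z, dtype=float) - self.x)
        self.P = (np.eye(3) - K) @ self.P
        return self.x


class SemanticTracker:
    """Dictionary of Kalman filters, one per (class label, instance id)."""

    def __init__(self, gate=0.5):
        self.filters = {}   # (label, instance_id) -> Kalman3D
        self.next_id = 0
        self.gate = gate    # association gate in metres (assumed value)

    def observe(self, label, position):
        """Fuse one detection: match the nearest same-class track or spawn a new one."""
        position = np.asarray(position, dtype=float)
        best, best_d = None, self.gate
        for key, kf in self.filters.items():
            if key[0] != label:
                continue
            d = np.linalg.norm(kf.x - position)
            if d < best_d:
                best, best_d = key, d
        if best is None:
            # No track within the gate: create a new object instance.
            best = (label, self.next_id)
            self.next_id += 1
            self.filters[best] = Kalman3D(position)
            return best, self.filters[best].x
        return best, self.filters[best].update(position)
```

As a usage example, repeated noisy detections of the same door are fused into one smoothed estimate, while a detection far outside the gate spawns a second instance; this is the mechanism by which repeated sensor measurements yield a more accurate object-level map.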



Acknowledgments

The authors thank PNPD-CAPES and FAPEMIG for financial support during this research. We also gratefully acknowledge NVIDIA for the donation of the Jetson TX2 GPU used in the online experiments of this research.

Author information

Corresponding author

Correspondence to Renato Martins.


About this article


Cite this article

Martins, R., Bersan, D., Campos, M.F.M. et al. Extending Maps with Semantic and Contextual Object Information for Robot Navigation: a Learning-Based Framework Using Visual and Depth Cues. J Intell Robot Syst 99, 555–569 (2020). https://doi.org/10.1007/s10846-019-01136-5


Keywords

Navigation