Can We Unify Perception and Localization in Assisted Navigation? An Indoor Semantic Visual Positioning System for Visually Impaired People

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12376)


Navigation assistance has made significant progress in recent years with the emergence of different approaches that allow visually impaired people to perceive their surroundings and localize themselves accurately, which greatly improves their mobility. However, most existing systems address each of these tasks individually, which increases the response time, a clear drawback for a safety-critical application. In this paper, we aim to cover the scene perception and visual localization needed by navigation assistance in a unified way. We present a semantic visual localization system that helps visually impaired people to be aware of their locations and surroundings in indoor environments. Our method relies on 3D reconstruction and semantic segmentation of RGB-D images captured from a pair of wearable smart glasses. We can inform the user of an upcoming object via audio feedback so that the user can prepare to avoid obstacles or interact with the object, which allows visually impaired people to be more active in unfamiliar environments.
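The abstract does not specify how an "upcoming object" is selected, so the following is only a minimal sketch of one plausible rule, not the authors' method. It assumes the localization step yields a 2D user position and heading in the map frame, and that the semantic 3D reconstruction has been reduced to labelled object positions. The names `MapObject` and `upcoming_object`, the 3 m range, and the 30 degree half field of view are all hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MapObject:
    """Hypothetical entry of a semantic map: a class label and a 2D position."""
    label: str  # semantic class, e.g. "door" (assumed to come from segmentation)
    x: float    # map-frame position in metres
    y: float


def upcoming_object(
    user_x: float,
    user_y: float,
    heading_rad: float,
    objects: List[MapObject],
    max_dist: float = 3.0,                      # assumed announcement range
    half_fov_rad: float = math.radians(30.0),   # assumed cone half-angle
) -> Optional[str]:
    """Return a short message for the nearest object ahead of the user, or None."""
    best = None
    best_dist = max_dist
    for obj in objects:
        dx, dy = obj.x - user_x, obj.y - user_y
        dist = math.hypot(dx, dy)
        if dist == 0.0 or dist > best_dist:
            continue
        # Bearing of the object relative to the walking direction,
        # wrapped to [-pi, pi] so that "ahead" means a small absolute angle.
        bearing = math.atan2(dy, dx) - heading_rad
        bearing = math.atan2(math.sin(bearing), math.cos(bearing))
        if abs(bearing) <= half_fov_rad:
            best, best_dist = obj, dist
    if best is None:
        return None
    return f"{best.label} ahead, {best_dist:.1f} metres"


objects = [
    MapObject("door", 2.0, 0.0),
    MapObject("chair", 1.0, 0.1),
    MapObject("table", -1.0, 0.0),
]
message = upcoming_object(0.0, 0.0, 0.0, objects)
```

In a wearable system the returned string would be passed to a text-to-speech component. Restricting candidates to a forward cone is one way to avoid announcing objects the user has already passed.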


Visual localization, 3D reconstruction, Semantic segmentation, Navigation assistance for the visually impaired



The work is partially funded by the German Federal Ministry of Labour and Social Affairs (BMAS) under the grant number 01KM151112. This work is also supported in part by Hangzhou SurImage Technology Company Ltd. and in part by Hangzhou KrVision Technology Company Ltd.



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Institute for Anthropomatics and Robotics, Karlsruhe Institute of Technology, Karlsruhe, Germany
