Can We Unify Perception and Localization in Assisted Navigation? An Indoor Semantic Visual Positioning System for Visually Impaired People

  • Conference paper
  • First Online:
Computers Helping People with Special Needs (ICCHP 2020)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12376)


Navigation assistance has made significant progress in recent years with the emergence of approaches that allow visually impaired people to perceive their surroundings and localize themselves accurately, which greatly improves their mobility. However, most existing systems address each of these tasks individually, which increases the response time, a clear drawback for a safety-critical application. In this paper, we aim to cover the scene perception and visual localization needed by navigation assistance in a unified way. We present a semantic visual localization system that helps visually impaired people be aware of their location and surroundings in indoor environments. Our method relies on 3D reconstruction and semantic segmentation of RGB-D images captured by a pair of wearable smart glasses. The system informs the user of an upcoming object via audio feedback, so that the user can prepare to avoid an obstacle or to interact with the object; visually impaired people can thus be more active in unfamiliar environments.
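The feedback stage described above (per-pixel semantic labels plus depth from the RGB-D glasses, turned into an audio announcement of the nearest object) can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation: the label map, the range threshold, and the function names are all assumptions, and a real system would feed the message to a text-to-speech engine.

```python
# Hypothetical sketch of the audio-feedback step: fuse a semantic label map
# with an aligned depth map and announce the closest object in range.
import numpy as np

# Assumed class-id -> name mapping; the paper's actual label set may differ.
LABELS = {1: "door", 2: "chair", 3: "stairs"}

def nearest_objects(semantic, depth, max_range_m=3.0):
    """For each labeled class visible in the frame, return its closest
    distance in meters, ignoring invalid (zero) or out-of-range depth."""
    out = {}
    for cls, name in LABELS.items():
        mask = (semantic == cls) & (depth > 0) & (depth <= max_range_m)
        if mask.any():
            out[name] = float(depth[mask].min())
    return out

def announce(objects):
    """Format a simple feedback message; a real system would use TTS."""
    if not objects:
        return "path clear"
    name, dist = min(objects.items(), key=lambda kv: kv[1])
    return f"{name} ahead, {dist:.1f} meters"

# Toy 4x4 frame: a chair at 1.2 m and a door at 2.5 m.
sem = np.array([[0, 0, 2, 2], [0, 0, 2, 2], [1, 1, 0, 0], [1, 1, 0, 0]])
dep = np.array([[0, 0, 1.2, 1.3], [0, 0, 1.4, 1.2],
                [2.5, 2.6, 0, 0], [2.5, 2.7, 0, 0]])
print(announce(nearest_objects(sem, dep)))  # → chair ahead, 1.2 meters
```

Announcing only the single nearest object keeps the audio channel uncluttered, which matters when the user must react quickly to obstacles.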




The work is partially funded by the German Federal Ministry of Labour and Social Affairs (BMAS) under grant number 01KM151112. This work is also supported in part by Hangzhou SurImage Technology Company Ltd. and in part by Hangzhou KrVision Technology Company Ltd.

Author information

Corresponding author

Correspondence to Kailun Yang.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Chen, H., Zhang, Y., Yang, K., Martinez, M., Müller, K., Stiefelhagen, R. (2020). Can We Unify Perception and Localization in Assisted Navigation? An Indoor Semantic Visual Positioning System for Visually Impaired People. In: Miesenberger, K., Manduchi, R., Covarrubias Rodriguez, M., Peňáz, P. (eds) Computers Helping People with Special Needs. ICCHP 2020. Lecture Notes in Computer Science, vol 12376. Springer, Cham.

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58795-6

  • Online ISBN: 978-3-030-58796-3

  • eBook Packages: Computer Science (R0)
