Intelligent Assistant for People with Low Vision Abilities

  • Oleksandr Bogdan
  • Oleg Yurchenko
  • Oleksandr Bailo
  • Francois Rameau
  • Donggeun Yoo
  • In So Kweon
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10749)


This paper proposes a wearable system that gives visually impaired people extensive feedback about their surrounding environment. Our system consists of a stereo camera and smartglasses that communicate with a smartphone, which serves as an intermediary computational device. Furthermore, the system is connected to a server where all the expensive computations are executed. The whole setup is capable of detecting obstacles in the immediate surroundings, recognizing faces and facial expressions, reading text, generating a generic description of an input image, and answering questions about it. In addition, we propose a novel depth question answering system that estimates object size as well as the relative positions of objects in an unconstrained environment, in near real-time and fully automatically, requiring only a stereo image pair and a voice request as input. We have conducted a series of experiments to evaluate the feasibility and practicality of the proposed system, and the results are promising for assisting visually impaired people.
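As a rough illustration of how object size can in principle be recovered from a stereo image pair, the sketch below applies the standard rectified pinhole stereo relations: depth Z = f * B / d, and metric extent = pixel extent * Z / f. It is not the method of this paper, which the abstract does not detail. The function names and the focal length, baseline, disparity, and pixel width values are hypothetical and chosen only for the example.

```python
def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth in metres of a point seen by a rectified stereo pair (Z = f * B / d)."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def metric_extent(pixel_extent: float, depth_m: float, focal_px: float) -> float:
    """Approximate real-world extent in metres of an object spanning
    pixel_extent pixels at the given depth (pinhole camera model)."""
    return pixel_extent * depth_m / focal_px


if __name__ == "__main__":
    # Hypothetical camera parameters and measurements, for illustration only.
    focal_px = 700.0      # focal length in pixels
    baseline_m = 0.12     # distance between the two cameras in metres
    disparity_px = 42.0   # disparity measured at the object
    width_px = 150.0      # width of the object's bounding box in pixels

    depth = depth_from_disparity(focal_px, baseline_m, disparity_px)
    width = metric_extent(width_px, depth, focal_px)
    print(f"depth = {depth:.2f} m, width = {width:.2f} m")
```

In this simplified model the size estimate depends only on the disparity at the object and its pixel extent, so errors in either propagate directly into the result.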


Visually impaired people, Wearable device, Mobility, Recognition, Guidance



We would like to thank the KAIST Research Promotion Team for funding this work through the URP program. The fourth author was supported by the KRF Program through the NRF funded by the Ministry of Science and ICT (2015H1D3A1066564).



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. RCV Lab, Electrical Engineering, KAIST, Daejeon, Republic of Korea
