Multimedia Tools and Applications, Volume 76, Issue 9, pp 11771–11807

A computer vision-based perception system for visually impaired

  • Ruxandra Tapu
  • Bogdan Mocanu
  • Titus Zaharia


In this paper, we introduce a novel computer vision-based perception system dedicated to the autonomous navigation of visually impaired people. A first feature concerns the real-time detection and recognition of obstacles and moving objects present in potentially cluttered urban scenes. To this end, a motion-based, real-time object detection and classification method is proposed. The method requires no a priori information about the obstacle type, size, position or location. To enhance the navigation/positioning capabilities offered by traditional GPS-based approaches, which are often unreliable in urban environments, a building/landmark recognition approach is also proposed. Finally, for the specific case of indoor applications, the system can learn a set of user-defined objects of interest. Here, multi-object identification and tracking is applied to help the user locate such objects of interest. Feedback is presented to the user through audio warnings/alerts/indications. Bone conduction headphones are employed so that visually impaired users can hear the system's warnings without obstructing sounds from the environment. At the hardware level, the system is fully integrated on an Android smartphone, which makes it easy to wear, non-invasive and low-cost.


Obstacle detection; BoVW / VLAD image representation; Relevant interest points; A-HOG descriptor; Visually impaired people



This work has been partially supported by the AAL (Ambient Assisted Living) ALICE project (AAL-2011-4-099), co-financed by ANR (Agence Nationale de la Recherche) and CNSA (Conseil National pour la Solidarité et l’Autonomie).

This work was supported by a grant of the Romanian National Authority for Scientific Research and Innovation, CNCS - UEFISCDI, project number PN-II-RU-TE-2014-4-0202.



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  1. ARTEMIS Department, Institut Mines-Télécom / Télécom SudParis, UMR CNRS MAP5 8145, Évry, France
  2. Telecommunication Department, Faculty of ETTI, University “Politehnica” of Bucharest, Bucharest, Romania
