Skip to main content
Log in

A computer vision-based perception system for visually impaired

  • Published:
Multimedia Tools and Applications Aims and scope Submit manuscript

Abstract

In this paper, we introduce a novel computer vision-based perception system, dedicated to the autonomous navigation of visually impaired people. A first feature concerns the real-time detection and recognition of obstacles and moving objects present in potentially cluttered urban scenes. To this purpose, a motion-based, real-time object detection and classification method is proposed. The method requires no a priori information about the obstacle type, size, position or location. In order to enhance the navigation/positioning capabilities offered by traditional GPS-based approaches, which are often unreliably in urban environments, a building/landmark recognition approach is also proposed. Finally, for the specific case of indoor applications, the system has the possibility to learn a set of user-defined objects of interest. Here, multi-object identification and tracking is applied in order to guide the user to localize such objects of interest. The feedback is presented to user by audio warnings/alerts/indications. Bone conduction headphones are employed in order to allow visually impaired to hear the systems warnings without obstructing the sounds from the environment. At the hardware level, the system is totally integrated on an android smartphone which makes it easy to wear, non-invasive and low-cost.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13
Fig. 14
Fig. 15
Fig. 16
Fig. 17

Similar content being viewed by others

References

  1. Alahi, Ortiz R, Vandergheynst P (2012) FREAK: fast retina keypoint. In: IEEE Conference on Computer Vision and Pattern Recognition, 2012. CVPR

  2. Ali H, Paar G, Paletta L (2007) Semantic indexing for visual recognition of buildings, 5th Int Symp Mob Mapp Technol. 6–9

  3. Arthur D, Vassilvitskii S (2007) K-means++: the advantages of careful seeding. In: Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, pp. 1027–1035. http://dl.acm.org/citation.cfm?id=1283383.1283494

  4. Baatz G, Köser K, Chen D, Grzeszczuk R, Pollefeys M (2010) Handling urban location recognition as a 2D homothetic problem. In: Daniilidis K, Maragos P, Paragios N (eds) Computer Vision – ECCV 2010 SE - 20, Springer Berlin Heidelberg, pp. 266–279. doi: 10.1007/978-3-642-15567-3_20

  5. Bay H, Ess A, Tuytelaars T, Van Gool L (2008) Speeded-Up Robust Features (SURF). Comput Vis Image Underst 110:346–359. doi:10.1016/j.cviu.2007.09.014

    Article  Google Scholar 

  6. Black M, Anandan P (1993) A framework for robust estimation of optical flow. In: International Conference on Computer Vision CVPR, 231–236

  7. Blasch BB, Wiener WR, Welsh RL (1997) Foundations of orientation and mobility. In: American Foundation for the Blind, 2nd ed., Press: New York

  8. Brock M, Kristensson PO (2013) Supporting blind navigation using depth sensing and sonification. In Proceedings of the ACM Conference on Pervasive and Ubiquitous Computing, Switzerland

  9. Chandrasekhar VR, Chen DM, Tsai SS, Cheung NM, Chen H, Takacs G et al (2011) The Stanford Mobile Visual Search Data Set, in: Proceedings of the Second Annual ACM Conference on Multimedia Systems, ACM, New York, NY, USA, pp. 117–122. doi: 10.1145/1943552.1943568

  10. Chaudhry, Chandra R (2015) Design of a mobile face recognition system for visually impaired persons. CoRR, vol. abs/1502.00756

  11. Chen DM, Baatz G, Koser K, Tsai SS, Vedantham R, Pylvanainen T et al (2011) City-scale landmark identification on mobile devices, Computer Vision and Pattern Recognition (CVPR), 2011 I.E. Conference on. 737–744. doi: 10.1109/CVPR.2011.5995610

  12. Chen L, Guo B, Sun W (2010) Obstacle detection system for visually impaired people based on stereo vision. In Proceedings of the 4th International Conference on Genetic and Evolutionary Computing, Shenzhen, China, 13–15

  13. Csurka G, Bray C, Dance C, Fan L (2004) Visual categorization with bags of keypoints, Workshop on Statistical Learning in Computer Vision, ECCV. 1–22

  14. Dakopoulos D, Boddhu SK, Bourbakis N (2007) A 2D vibration array as an assistive device for visually impaired, bioinformatics and bioengineering, 2007. BIBE 2007. Proceedings of the 7th IEEE International Conference on. 930–937. doi: 10.1109/BIBE.2007.4375670

  15. Dakopoulos D, Bourbakis N (2008) Preserving visual information in low resolution images during navigation of visually impaired. In: Proceedings of the 1st International Conference on PErvasive Technologies Related to Assistive Environments, ACM, New York, NY, USA, pp. 27:1–27:6. doi: 10.1145/1389586.1389619

  16. Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection, Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on. 1 886–893 vol. 1. doi: 10.1109/CVPR.2005.177

  17. Dalal N, Triggs B (2006) Object detection using histograms of oriented gradients. In: European Conference on Computer Vision

  18. Delhumeau J, Gosselin P-H, Jégou H, Pérez P (2013) Revisiting the VLAD image representation. In: ACM Multimedia, 653–656

  19. Ding C, He X (2004) K-means clustering via principal component analysis. In: Proceedings of the Twenty-First International Conference on Machine Learning, ACM, New York, NY, USA, pp. 29–. doi: 10.1145/1015330.1015408

  20. El Mobacher A, Mitri N, Awad M (2013) Entropy-based and weighted selective SIFT clustering as an energy aware framework for supervised visual recognition of man-made structures. Math Probl Eng

  21. Erhan D, Szegedy C, Toshev A, Anguelov D (2014) Scalable object detection using deep neural networks. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 2155–2162

  22. Everingham M, Gool L, Williams CK, Winn J, Zisserman A (2010) The Pascal Visual Object Classes (VOC) challenge. Int J Comput Vis 88:303–338. doi:10.1007/s11263-009-0275-4

    Article  Google Scholar 

  23. Farabet C, Couprie C, Najman L, LeCun Y (2013) Learning hierarchical features for scene labeling. Pattern Analysis and MachineIntelligence, IEEE Transactions on, pp. 1–15

  24. Fernando B, Fromont E, Muselet D, Sebban M (2012) Discriminative feature fusion for image classification. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 3434–3441

  25. Gauglitz S, Hollerer T, Turk M (2011) Evaluation of interest point detectors and feature descriptors for visual tracking. Int J Comput Vis, pages 1–26

  26. Girshick RB, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587

  27. Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation, Computer Vision and Pattern Recognition (CVPR), 2014 I.E. Conference on. 580–587. doi: 10.1109/CVPR.2014.81

  28. Golledge RG, Marston JR, Costanzo CM (1997) Attitudes of visually impaired persons towards the use of public transportation. J Vis Impair Blindness 90:446–459

    Google Scholar 

  29. Grauman K, Bastian L (2011) Visual object recognition. Morgan & Claypool, San Francisco

    Google Scholar 

  30. Gronat P, Obozinski G, Sivic J, Pajdla T (2013) Learning and calibrating per-location classifiers for visual place recognition, Computer Vision and Pattern Recognition (CVPR), 2013 I.E. Conference on. 907–914. doi: 10.1109/CVPR.2013.122

  31. Harris C, Stephens M (1988) A combined corner and edge detector. In: Alvey Vision Conference, 147–151

  32. Jegou H, Douze M, Schmid C (2011) Product quantization for nearest neighbor search. In PAMI 33(1):117–128

    Article  Google Scholar 

  33. Johnson LA, Higgins CM (2006) A navigation aid for the blind using tactile-visual sensory substitution. Eng Med Biol Soc 2006. EMBS’06. 28th Annual International Conference of the IEEE. 6289–6292. doi: 10.1109/IEMBS.2006.259473.

  34. José J, Farrajota M, Rodrigues João MF, Hans du Buf JM (2011) The smart vision local navigation aid for blind and visually impaired persons. Int J Digit Content Technol Appl 5:362–375

    Google Scholar 

  35. Khan A, Moideen F, Lopez J, Khoo WL, Zhu Z (2012) KinDetect: kinect detection objects. In: Computer Helping People with Special Needs, LNCS7382, 588–595

  36. Kuo BC, Ho HH, Li CH, Hung CC, Taur JS (2014) A kernel-based feature selection method for SVM with RBF kernel for hyperspectral image classification, selected topics in applied earth observations and remote sensing. IEEE J 7:317–326. doi:10.1109/JSTARS.2013.2262926

    Google Scholar 

  37. Lampert CH, Blaschko MB, Hofmann T (2008) Beyond sliding windows: object localization by efficient subwindow search, Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on. 1–8. doi: 10.1109/CVPR.2008.4587586

  38. Lee JJ, Kim G (2007) Robust estimation of camera homography using fuzzy RANSAC. In: Proceedings of the 2007 International Conference on Computational Science and Its Applications - Volume Part I, Springer-Verlag, Berlin, Heidelberg, pp. 992–1002. http://dl.acm.org/citation.cfm?id=1802834.1802930

  39. Lepetit CV, Strecha C, Fua P () BRIEF: binary robust independent elementary features. 11th European Conference on Computer Vision (ECCV), Heraklion, Crete. LNCS Springer, September 2010

  40. Leutenegger S, Chli M, Siegwart R (2011) Brisk: binary robust invariant scalable keypoints. IEEE International Conference on Computer Vision (ICCV)

  41. Li J, Allinson NM (2009) Dimensionality reduction-based building recognition. In: 9th IASTED International Conference on Visualization

  42. Lin Q, Hahn HS, Han YJ (2013) Top-view based guidance for blind people using directional ellipse model. Int J Adv Robot Syst 1:1–10

    Google Scholar 

  43. Lowe DG (1999) Object recognition from local scale-invariant features, Computer Vision, 1999. The Proceedings of the Seventh IEEE International Conference on. 2, 1150–1157 vol.2. doi: 10.1109/ICCV.1999.790410

  44. Lowe D (2004) Distinctive image features from scale-invariant keypoints. Int J Comput Vis 60:91–110. doi:10.1023/B:VISI.0000029664.99615.94

    Article  Google Scholar 

  45. Lucas B, Kanade T (1981) An iterative technique of image registration and its application to stereo. In: IJCAI’81 Proceedings of the 7th international joint conference on Artificial intelligence, 2, 674–679

  46. Manduchi R (2012) Mobile vision as assistive technology for the blind: an experimental study. In: Proceedings of the 13th International Conference on Computers Helping People with Special Needs - Volume Part II, Springer-Verlag, Berlin, Heidelberg, pp. 9–16. doi: 10.1007/978-3-642-31534-3_2

  47. Matas J, Chum O, Urban M, Pajdla T (2004) Robust wide-baseline stereo from maximally stable external regions. Image Vis Comput 22:761–767. doi:10.1016/j.imavis.2004.02.006

    Article  Google Scholar 

  48. Meers S, Ward K (2005) A substitute vision system for providing 3D perception and GPS navigation via electro-tactile stimulation. In: 1st International Conference on Sensing Technology, 21–23

  49. Muja M, Lowe DG (2009) Fast approximate nearest neighbors with automatic algorithm configuration. In: VISAPP International Conference on Computer Vision Theory and Applications, pp. 331–340

  50. Oneata D, Revaud J, Verbeek J, Schmid C (2014) Spatio-temporal object detection proposals. Europeean Conference on Computer Vision, ECCV 2014 - European Conference on Computer Vision, volume 8691, pages 737–752, Zurich, Switzerland, Springer

  51. Pascolini D, Mariotti SP (2012) Global data on visual impairments 2010, in: World Health Organization, Geneva

  52. Peng E, Peursum P, Li L, Venkatesh S (2010) A smartphone-based obstacle sensor for the visually impaired. In: Yu Z, Liscano R, Chen G, Zhang D, Zhou X (eds) Ubiquitous Intelligence and Computing SE - 45, Springer Berlin Heidelberg, pp. 590–604. doi: 10.1007/978-3-642-16355-5_45

  53. Powers DMW (2011) Evaluation: from precision, recall and F measure to roc, informedness, markedness and correlation. J Mach Learn Technol 2(1):37–63

    MathSciNet  Google Scholar 

  54. Pradeep V, Medioni G, Weiland J (2010) Robot vision for the visually impaired, Computer Vision and Pattern Recognition Workshops (CVPRW), 2010 I.E. Computer Society Conference on. 15–22. doi: 10.1109/CVPRW.2010.5543579

  55. Rister B, Wang G, Wu M, Cavallaro JR (2013) A fast and efficient sift detector using the mobile GPU, Acoustics, Speech and Signal Processing (ICASSP), 2013 I.E. International Conference on. 2674–2678. doi: 10.1109/ICASSP.2013.6638141

  56. Rodríguez A, Yebes JJ, Alcantarilla PF, Bergasa LM, Almazán J, Cela A (2012) Assisting the visually impaired: obstacle detection and warning system by acoustic feedback. Sensors 12:17476–17496. doi:10.3390/s121217476

    Article  Google Scholar 

  57. Rosa S, Paleari M, Ariano P, Bona B (2012) Object tracking with adaptive HOG detector and adaptive Rao-Blackwellised particle filter. Proceedings of SPIE 8301, Intelligent Robots and Computer Vision XXIX: Algorithms and Techniques, 83010 W. doi:10.1117/12.911991

  58. Rublee E, Rabaud V, Konolige K, Bradski G (2011) ORB: an efficient alternative to SIFT or SURF. Computer Vision (ICCV), 2011 I.E. International Conference on, vol., no., pp. 2564–2571, 6–13

  59. Saez JM, Escolano F (2008) Stereo-based aerial obstacle detection for the visually impaired. In: Workshop on Computer Vision Applications for the Visually Impaired, Marselle, France

  60. Saez JM, Escolano F, Penalver A (2005) First steps towards stereo-based 6DOF SLAM for the visually impaired, computer vision and pattern recognition - workshops, 2005. CVPR Workshops. IEEE Computer Society Conference on. 23. doi: 10.1109/CVPR.2005.461

  61. Sainarayanan G, Nagarajan R, Yaacob S (2007) Fuzzy image processing scheme for autonomous navigation of human blind. Appl Soft Comput 7:257–264

    Article  Google Scholar 

  62. Shao H, Svoboda1 T, Tuytelaars T, Van Gool L (2003) HPAT indexing for fast object/scene recognition based on local appearance. In: E. Bakker, M. Lew, T. Huang, N. Sebe, X. Zhou (Eds.), Image and Video Retrieval SE - 8, Springer Berlin Heidelberg, pp. 71–80. doi: 10.1007/3-540-45113-7_8

  63. Szegedy C, Toshev A, Erhan D (2013) Deep neural networks for object detection. In: Annual Conference on Neural Information Processing Systems, pp. 2553–2561

  64. Takizawa H, Yamaguchi S, Aoyagi M, Ezaki N, Mizuno S (2012) Kinect cane: an assistive system for the visually impaired based on three-dimensional object recognition. In Proceedings of IEEE International Symposium on System Integration, Japan

  65. Tian Y, Yang X, Arditi A (2010) Computer vision-based door detection for accessibility of unfamiliar environments to blind persons. In: Proceedings of the 12th International Conference on Computers Helping People with Special Needs, Springer LNCS, vol. 6180, pp. 263–270

  66. Tong S, Chang E (2001) Support vector machine active learning for image retrieval. In: Proceedings of the Ninth ACM International Conference on Multimedia, ACM, New York, NY, USA, pp. 107–118. doi: 10.1145/500141.500159

  67. Tuzel O, Porikli F, Meer P (2006) Region covariance: “a fast descriptor for detection and classification”. In ECCV 3952:589–600

    Google Scholar 

  68. van de Sande KEA, Uijlings JRR, Gevers T, Smeulders AWM (2011) Segmentation As Selective Search for Object Recognition, in: Proceedings of the 2011 International Conference on Computer Vision, IEEE Computer Society, Washington, DC, USA, pp. 1879–1886. doi: 10.1109/ICCV.2011.6126456

  69. Vinyals A, Toshev A, Bengio S, Erhan D (2015) Show and tell: a neural image caption generator. In: International Conference on Computer Vision and Pattern Recognition (CVPR)

  70. Wang HC et al (2015) Bridging text spotting and SLAM with junction features. Intelligent Robots and Systems (IROS), 2015 IEEE/RSJ International Conference on, Hamburg, pp. 3701–3708

  71. Yu JH, Chung HI, Hahn HS (2009) Walking assistance system for sight impaired people based on a multimodal transformation technique. In Proceedings of the ICROS-SICE International Joint Conference, Japan

  72. Zhang W (2005) Localization based on building recognition. In: IEEE Workshop on Applications for Visually Impaired, pp. 21–28

  73. Zhang M, Zhou Z (2005) A k-nearest neighbor based algorithm for multilabel classification. In: IEEE International Conference on Granular Computing 2, 718–721

  74. Zhao C, Liu C, Lai Z (2011) Multi-scale gist feature manifold for building recognition. Neurocomput 74:2929–2940. doi:10.1016/j.neucom.2011.03.035

    Article  Google Scholar 

Download references

Acknowledgments

This work has been partially supported by the AAL (Ambient Assisted Living) ALICE project (AAL-2011-4-099), co-financed by ANR (Agence Nationale de la Recherche) and CNSA (Conseil National pour la Solidarité et l’Autonomie).

This work was supported by a grant of the Romanian National Authority for Scientific Research and Innovation, CNCS - UEFISCDI, project number PN-II-RU-TE-2014-4-0202.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Ruxandra Tapu.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Tapu, R., Mocanu, B. & Zaharia, T. A computer vision-based perception system for visually impaired. Multimed Tools Appl 76, 11771–11807 (2017). https://doi.org/10.1007/s11042-016-3617-6

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11042-016-3617-6

Keywords

Navigation