Abstract
In this paper, we introduce a novel computer vision-based perception system dedicated to the autonomous navigation of visually impaired people. A first feature concerns the real-time detection and recognition of obstacles and moving objects present in potentially cluttered urban scenes. To this end, a motion-based, real-time object detection and classification method is proposed. The method requires no a priori information about the obstacle type, size, or position. To enhance the navigation and positioning capabilities offered by traditional GPS-based approaches, which are often unreliable in urban environments, a building/landmark recognition approach is also proposed. Finally, for the specific case of indoor applications, the system can learn a set of user-defined objects of interest. Here, multi-object identification and tracking are applied to guide the user toward such objects. Feedback is presented to the user through audio warnings, alerts, and indications. Bone conduction headphones are employed so that visually impaired users can hear the system's warnings without obstructing sounds from the environment. At the hardware level, the system is fully integrated on an Android smartphone, which makes it easy to wear, non-invasive, and low-cost.
Acknowledgments
This work has been partially supported by the AAL (Ambient Assisted Living) ALICE project (AAL-2011-4-099), co-financed by ANR (Agence Nationale de la Recherche) and CNSA (Conseil National pour la Solidarité et l’Autonomie).
This work was also supported by a grant from the Romanian National Authority for Scientific Research and Innovation, CNCS - UEFISCDI, project number PN-II-RU-TE-2014-4-0202.
Cite this article
Tapu, R., Mocanu, B. & Zaharia, T. A computer vision-based perception system for visually impaired. Multimed Tools Appl 76, 11771–11807 (2017). https://doi.org/10.1007/s11042-016-3617-6
DOI: https://doi.org/10.1007/s11042-016-3617-6