A computer vision-based perception system for visually impaired
Abstract
In this paper, we introduce a novel computer vision-based perception system dedicated to the autonomous navigation of visually impaired people. A first feature concerns the real-time detection and recognition of obstacles and moving objects present in potentially cluttered urban scenes. To this end, a motion-based, real-time object detection and classification method is proposed. The method requires no a priori information about the obstacle type, size, position or location. To enhance the navigation/positioning capabilities offered by traditional GPS-based approaches, which are often unreliable in urban environments, a building/landmark recognition approach is also proposed. Finally, for the specific case of indoor applications, the system can learn a set of user-defined objects of interest. Here, multi-object identification and tracking are applied to guide the user in localizing such objects of interest. Feedback is presented to the user through audio warnings/alerts/indications. Bone conduction headphones are employed so that visually impaired users can hear the system's warnings without blocking sounds from the environment. At the hardware level, the system is fully integrated on an Android smartphone, which makes it easy to wear, non-invasive and low-cost.
Keywords
Obstacle detection; BoVW / VLAD image representation; Relevant interest points; A-HOG descriptor; Visually impaired people
Acknowledgments
This work has been partially supported by the AAL (Ambient Assisted Living) ALICE project (AAL-2011-4-099), co-financed by ANR (Agence Nationale de la Recherche) and CNSA (Conseil National pour la Solidarité et l’Autonomie).
This work was supported by a grant of the Romanian National Authority for Scientific Research and Innovation, CNCS - UEFISCDI, project number PN-II-RU-TE-2014-4-0202.