Abstract
In this paper, we present a bi-modal user interface intended both as an assistive technology for persons without hands or with physical disabilities of the hands/arms, and as a contactless HCI for able-bodied users. The user manipulates a virtual mouse pointer by moving his or her head and communicates with the computer through spoken commands instead of standard input devices. Speech is a very useful modality for referencing objects and actions on them, whereas head pointing is a powerful modality for indicating spatial locations. The bi-modal interface integrates a tri-lingual system for multi-channel audio signal processing and automatic recognition of voice commands in English, French and Russian with a vision-based head detection and tracking system. It processes natural speech and head pointing movements in parallel and fuses both information streams into a unified multimodal command, in which each modality conveys its own semantic information: head position supplies the 2D pointer coordinates, while the speech signal yields the control command. The bi-modal user interface was tested and compared with contact-based pointing interfaces using the methodology of ISO 9241-9.
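The ISO 9241-9 methodology mentioned above rates a pointing device by its Fitts'-law throughput, computed from the effective index of difficulty and the mean movement time. A minimal sketch of that computation in Python follows; the function name and the trial values in the docstring example are illustrative, not measurements from the paper:

```python
import math
import statistics

def fitts_throughput(target_distance, selection_offsets, movement_times):
    """ISO 9241-9 effective throughput (bits/s) for one block of pointing trials.

    target_distance   -- nominal distance to the target (e.g. pixels)
    selection_offsets -- signed deviations of each selection from the target centre
    movement_times    -- per-trial movement times (seconds)
    """
    # Effective target width: 4.133 x sample standard deviation of selections,
    # per the standard's adjustment for a 96% hit rate
    we = 4.133 * statistics.stdev(selection_offsets)
    # Effective index of difficulty, Shannon formulation (bits)
    ide = math.log2(target_distance / we + 1)
    # Throughput = IDe / mean movement time
    return ide / statistics.mean(movement_times)
```

For example, `fitts_throughput(200, [0, 10, -10, 5, -5], [1.0, 1.1, 0.9, 1.0, 1.0])` yields a throughput of roughly 2.8 bits/s, the kind of figure such evaluations compare across contact-based and contactless pointing devices.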
References
Oviatt, S.: Multimodal interfaces. In: Human-Computer Interaction Handbook: Fundamentals, Evolving Technologies and Emerging Applications, pp. 286–304. Lawrence Erlbaum Assoc., Mahwah (2003)
Bolt, R.A.: Put-that-there: Voice and gesture at the graphics interface. Computer Graphics 14(3), 262–270 (1980)
Malkewitz, R.: Head Pointing and Speech Control as a Hands-Free Interface to Desktop Computing. In: 3rd International ACM Conference on Assistive Technologies ASSETS 1998, Marina del Rey, CA, USA, pp. 182–188. ACM Press, New York (1998)
Gorodnichy, D., Roth, G.: Nouse 'use your nose as a mouse' perceptual vision technology for hands-free games and interfaces. Image and Vision Computing 22(12), 931–942 (2004)
Harada, S., Landay, J.A., Malkin, J., Li, X., Bilmes, J.A.: The Vocal Joystick: Evaluation of voice-based cursor control techniques. In: 8th International ACM SIGACCESS Conference on Computers & Accessibility ASSETS 2006, Portland, USA, pp. 197–204. ACM Press, New York (2006)
Ito, E.: Multi-modal interface with voice and head tracking for multiple home appliances. In: 8th IFIP International Conference on Human-Computer Interaction INTERACT 2001, Tokyo, Japan, pp. 727–728 (2001)
Karpov, A., Ronzhin, A.: ICANDO: Low Cost Multimodal Interface for Hand Disabled People. Journal on Multimodal User Interfaces 1(2), 21–29 (2007)
Ronzhin, A., Karpov, A.: Russian Voice Interface. Pattern Recognition and Image Analysis: Advances in Mathematical Theory and Applications 17(2), 321–336 (2007)
Rabiner, L., Juang, B.: Speech Recognition. In: Benesty, J., et al. (eds.) Springer Handbook of Speech Processing. Springer, New York (2008)
Krim, H., Viberg, M.: Two decades of array signal processing research: the parametric approach. Signal Processing Magazine 13(4), 67–94 (1996)
Ronzhin, A., Karpov, A., Kipyatkova, I., Železný, M.: Client and Speech Detection System for Intelligent Infokiosk. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds.) TSD 2010. LNCS, vol. 6231, pp. 560–567. Springer, Heidelberg (2010)
Viola, P., Jones, M.: Rapid object detection using a boosted cascade of simple features. In: IEEE International Conference on Computer Vision and Pattern Recognition Conference CVPR 2001, Kauai, HI, USA (2001)
Lienhart, R., Maydt, J.: An Extended Set of Haar-like Features for Rapid Object Detection. In: IEEE International Conference on Image Processing ICIP 2002, Rochester, New York, USA, pp. 900–903 (2002)
Lucas, B.D., Kanade, T.: An Iterative Image Registration Technique with an Application to Stereo Vision. In: 7th International Joint Conference on Artificial intelligence IJCAI, Vancouver, Canada, pp. 674–679 (1981)
Bouguet, J.-Y.: Pyramidal Implementation of the Lucas-Kanade Feature Tracker: Description of the Algorithm. Intel Corporation Microprocessor Research Labs, Report (2000)
Karpov, A., Ronzhin, A., Cadiou, A.: A Multi-Modal System ICANDO: Intellectual Computer AssistaNt for the Disabled Operators. In: INTERSPEECH International Conference, Pittsburgh, PA, USA, pp. 1998–2001. ISCA Association (2006)
ISO 9241-9:2000(E) Ergonomic Requirements for Office Work with Visual Display Terminals (VDTs), Part 9: Requirements for Non-Keyboard Input Devices, International Standards Organization (2000)
Soukoreff, R.W., MacKenzie, I.S.: Towards a standard for pointing device evaluation, perspectives on 27 years of Fitts’ law research in HCI. International Journal of Human Computer Studies 61(6), 751–789 (2004)
Zhang, X., MacKenzie, I.S.: Evaluating Eye Tracking with ISO 9241 - Part 9. In: Jacko, J.A. (ed.) HCI 2007. LNCS, vol. 4552, pp. 779–788. Springer, Heidelberg (2007)
Carbini, S., Viallet, J.E.: Evaluation of contactless multimodal pointing devices. In: 2nd IASTED International Conference on Human-Computer Interaction, Chamonix, France, pp. 226–231 (2006)
De Silva, G.C., Lyons, M.J., Kawato, S., Tetsutani, N.: Human Factors Evaluation of a Vision-Based Facial Gesture Interface. In: International Workshop on Computer Vision and Pattern Recognition for Computer Human Interaction, Madison, USA (2003)
Wilson, A., Cutrell, E.: FlowMouse: A computer vision-based pointing and gesture input device. In: Costabile, M.F., Paternó, F. (eds.) INTERACT 2005. LNCS, vol. 3585, pp. 565–578. Springer, Heidelberg (2005)
Ward, D., Blackwell, A., MacKay, D.: Dasher: A data entry interface using continuous gestures and language models. In: ACM Symposium on User Interface Software and Technology UIST 2000, pp. 129–137. ACM Press, New York (2000)
SPIIRAS Speech and Multimodal Interfaces Web-site, TV demonstration, http://www.spiiras.nw.ru/speech/demo/ort.avi
SPIIRAS Speech and Multimodal Interfaces Web-site, demonstration 2, http://www.spiiras.nw.ru/speech/demo/demo_new.avi
© 2011 Springer-Verlag Berlin Heidelberg
Karpov, A., Ronzhin, A., Kipyatkova, I. (2011). An Assistive Bi-modal User Interface Integrating Multi-channel Speech Recognition and Computer Vision. In: Jacko, J.A. (eds) Human-Computer Interaction. Interaction Techniques and Environments. HCI 2011. Lecture Notes in Computer Science, vol 6762. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21605-3_50
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21604-6
Online ISBN: 978-3-642-21605-3