Abstract
In this paper, we present a bi-modal user interface intended both as an assistive technology for persons without hands or with physical disabilities of the hands/arms, and as a contactless HCI for able-bodied users. The user manipulates a virtual mouse pointer by moving his or her head and communicates with the computer through spoken commands instead of standard input devices. Speech is a very useful modality for referencing objects and actions on them, whereas head pointing is a powerful modality for indicating spatial locations. The bi-modal interface integrates a tri-lingual system for multi-channel audio signal processing and automatic recognition of voice commands in English, French and Russian with a vision-based head detection and tracking system. It processes natural speech and head pointing movements in parallel and fuses both information streams into a unified multimodal command, in which each modality conveys its own semantic information: head position supplies the 2D pointer coordinates, while the speech signal yields the control command. The bi-modal user interface was tested and compared with contact-based pointing interfaces using the methodology of ISO 9241-9.
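The ISO 9241-9 methodology mentioned above rates a pointing device by its Fitts'-law throughput, computed from the effective index of difficulty and the mean movement time. A minimal sketch of that computation in Python follows; the function name and the trial values in the docstring example are illustrative, not measurements from the paper:

```python
import math
import statistics

def fitts_throughput(target_distance, selection_offsets, movement_times):
    """ISO 9241-9 effective throughput (bits/s) for one block of pointing trials.

    target_distance   -- nominal distance to the target (e.g. pixels)
    selection_offsets -- signed deviations of each selection from the target centre
    movement_times    -- per-trial movement times (seconds)
    """
    # Effective target width: 4.133 x sample standard deviation of selections,
    # per the standard's adjustment for a 96% hit rate
    we = 4.133 * statistics.stdev(selection_offsets)
    # Effective index of difficulty, Shannon formulation (bits)
    ide = math.log2(target_distance / we + 1)
    # Throughput = IDe / mean movement time
    return ide / statistics.mean(movement_times)
```

For example, `fitts_throughput(200, [0, 10, -10, 5, -5], [1.0, 1.1, 0.9, 1.0, 1.0])` yields a throughput of roughly 2.8 bits/s, the kind of figure such evaluations compare across contact-based and contactless pointing devices.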
References
Oviatt, S.: Multimodal interfaces. In: Human-Computer Interaction Handbook: Fundamentals, Evolving Technologies and Emerging Applications, pp. 286–304. Lawrence Erlbaum Assoc., Mahwah (2003)
Bolt, R.A.: Put-that-there: Voice and gesture at the graphics interface. Computer Graphics 14(3), 262–270 (1980)
Malkewitz, R.: Head Pointing and Speech Control as a Hands-Free Interface to Desktop Computing. In: 3rd International ACM Conference on Assistive Technologies ASSETS 1998, Marina del Rey, CA, USA, pp. 182–188. ACM Press, New York (1998)
Gorodnichy, D., Roth, G.: Nouse 'use your nose as a mouse' perceptual vision technology for hands-free games and interfaces. Image and Vision Computing 22(12), 931–942 (2004)
Harada, S., Landay, J.A., Malkin, J., Li, X., Bilmes, J.A.: The Vocal Joystick: Evaluation of voice-based cursor control techniques. In: 8th International ACM SIGACCESS Conference on Computers & Accessibility ASSETS 2006, Portland, USA, pp. 197–204. ACM Press, New York (2006)
Ito, E.: Multi-modal interface with voice and head tracking for multiple home appliances. In: 8th IFIP International Conference on Human-Computer Interaction INTERACT 2001, Tokyo, Japan, pp. 727–728 (2001)
Karpov, A., Ronzhin, A.: ICANDO: Low Cost Multimodal Interface for Hand Disabled People. Journal on Multimodal User Interfaces 1(2), 21–29 (2007)
Ronzhin, A., Karpov, A.: Russian Voice Interface. Pattern Recognition and Image Analysis: Advances in Mathematical Theory and Applications 17(2), 321–336 (2007)
Rabiner, L., Juang, B.: Speech Recognition. In: Benesty, J., et al. (eds.) Springer Handbook of Speech Processing. Springer, New York (2008)
Krim, H., Viberg, M.: Two decades of array signal processing research: the parametric approach. Signal Processing Magazine 13(4), 67–94 (1996)
Ronzhin, A., Karpov, A., Kipyatkova, I., Železný, M.: Client and Speech Detection System for Intelligent Infokiosk. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds.) TSD 2010. LNCS, vol. 6231, pp. 560–567. Springer, Heidelberg (2010)
Viola, P., Jones, M.: Rapid object detection using a boosted cascade of simple features. In: IEEE International Conference on Computer Vision and Pattern Recognition Conference CVPR 2001, Kauai, HI, USA (2001)
Lienhart, R., Maydt, J.: An Extended Set of Haar-like Features for Rapid Object Detection. In: IEEE International Conference on Image Processing ICIP 2002, Rochester, New York, USA, pp. 900–903 (2002)
Lucas, B.D., Kanade, T.: An Iterative Image Registration Technique with an Application to Stereo Vision. In: 7th International Joint Conference on Artificial intelligence IJCAI, Vancouver, Canada, pp. 674–679 (1981)
Bouguet, J.-Y.: Pyramidal Implementation of the Lucas-Kanade Feature Tracker: Description of the Algorithm. Intel Corporation Microprocessor Research Labs, Report (2000)
Karpov, A., Ronzhin, A., Cadiou, A.: A Multi-Modal System ICANDO: Intellectual Computer AssistaNt for the Disabled Operators. In: INTERSPEECH International Conference, Pittsburgh, PA, USA, pp. 1998–2001. ISCA Association (2006)
ISO 9241-9:2000(E) Ergonomic Requirements for Office Work with Visual Display Terminals (VDTs), Part 9: Requirements for Non-Keyboard Input Devices, International Standards Organization (2000)
Soukoreff, R.W., MacKenzie, I.S.: Towards a standard for pointing device evaluation, perspectives on 27 years of Fitts’ law research in HCI. International Journal of Human Computer Studies 61(6), 751–789 (2004)
Zhang, X., MacKenzie, I.S.: Evaluating Eye Tracking with ISO 9241 - Part 9. In: Jacko, J.A. (ed.) HCI 2007. LNCS, vol. 4552, pp. 779–788. Springer, Heidelberg (2007)
Carbini, S., Viallet, J.E.: Evaluation of contactless multimodal pointing devices. In: 2nd IASTED International Conference on Human-Computer Interaction, Chamonix, France, pp. 226–231 (2006)
De Silva, G.C., Lyons, M.J., Kawato, S., Tetsutani, N.: Human Factors Evaluation of a Vision-Based Facial Gesture Interface. In: International Workshop on Computer Vision and Pattern Recognition for Computer Human Interaction, Madison, USA (2003)
Wilson, A., Cutrell, E.: FlowMouse: A computer vision-based pointing and gesture input device. In: Costabile, M.F., Paternó, F. (eds.) INTERACT 2005. LNCS, vol. 3585, pp. 565–578. Springer, Heidelberg (2005)
Ward, D., Blackwell, A., MacKay, D.: Dasher: A data entry interface using continuous gestures and language models. In: ACM Symposium on User Interface Software and Technology UIST 2000, pp. 129–137. ACM Press, New York (2000)
SPIIRAS Speech and Multimodal Interfaces Web-site, TV demonstration, http://www.spiiras.nw.ru/speech/demo/ort.avi
SPIIRAS Speech and Multimodal Interfaces Web-site, demonstration 2, http://www.spiiras.nw.ru/speech/demo/demo_new.avi
© 2011 Springer-Verlag Berlin Heidelberg
Karpov, A., Ronzhin, A., Kipyatkova, I. (2011). An Assistive Bi-modal User Interface Integrating Multi-channel Speech Recognition and Computer Vision. In: Jacko, J.A. (eds) Human-Computer Interaction. Interaction Techniques and Environments. HCI 2011. Lecture Notes in Computer Science, vol 6762. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21605-3_50
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21604-6
Online ISBN: 978-3-642-21605-3