Audio-Visual Isolated Words Recognition for Voice Dialogue System

Chaloupka, Josef

doi:10.1007/978-3-642-25775-9_8

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6800)

Abstract

This contribution reports experiments in audio-visual recognition of isolated words. The results will be used to improve our voice dialogue system by adding visual speech recognition. Voice dialogue systems can be deployed in train or bus stations (or elsewhere) where noise levels are relatively high; in such conditions the visual part of speech can improve the recognition rate. Audio-visual recognition of isolated words in our experiments was based on two-stream Hidden Markov Models (HMMs) and on HMMs of individual Czech phonemes and visemes. Different visual speech features and different numbers of HMM states and mixtures were evaluated in separate tests. In the subsequent experiments, isolated words were recognized after the HMMs had been trained, and babble noise was added to the acoustic speech signal in successive steps.
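
The two techniques named in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the function names, the convention that the two stream weights sum to 1, and the choice of signal-to-noise ratio (SNR) in dB as the control for "successive steps" of noise are all assumptions made for illustration. The first function scales a noise recording so that the mixture reaches a requested SNR; the second combines per-state audio and visual log-likelihoods the way a two-stream HMM commonly does, as a weighted sum in the log domain.

```python
import math


def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech` so the speech-to-noise power ratio is `snr_db` dB.

    Both inputs are sequences of float samples (hypothetical helper,
    not taken from the paper).
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    if p_speech == 0 or p_noise == 0:
        raise ValueError("speech and noise must have non-zero power")
    # Solve p_speech / (gain^2 * p_noise) = 10^(snr_db / 10) for gain.
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def two_stream_log_likelihood(log_b_audio, log_b_video, audio_weight):
    """Weighted log-domain combination of audio and visual stream scores.

    Assumes the common convention that the visual weight is
    1 - audio_weight; the weights used in the paper are not stated
    in the abstract.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    return audio_weight * log_b_audio + (1.0 - audio_weight) * log_b_video


if __name__ == "__main__":
    speech = [1.0, -1.0, 1.0, -1.0]
    babble = [0.5, 0.5, -0.5, -0.5]  # stand-in for a babble-noise recording
    for snr in (20, 10, 0):  # successively noisier conditions
        noisy = mix_at_snr(speech, babble, snr)
        print(snr, [round(x, 3) for x in noisy])
    print(two_stream_log_likelihood(-2.0, -4.0, 0.75))
```

In a setup like this, one would typically lower the audio weight as the SNR drops, so that the visual stream contributes more when the acoustic signal is less reliable.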

Author information

Authors and Affiliations

Institute of Information Technology, Technical University of Liberec, Studentska 2, 461 17, Liberec, Czech Republic
Josef Chaloupka

Editor information

Editors and Affiliations

Dept. of Psychology and IIASS, International Institute for Advanced Scientific Studies, Second University of Naples, Vietri sul Mare, SA, Italy
Anna Esposito
School of Computing Science, University of Glasgow, Glasgow, UK
Alessandro Vinciarelli
Department of Telecommunication and Media Informatics, Laboratory of Speech Acoustics, Budapest University of Technology and Economics, 1117, Budapest, Hungary
Klára Vicsi
TELECOM ParisTech, CNRS-LTCI UMR 5141, 75014, Paris, France
Catherine Pelachaud
Faculty of Electrical Engineering, Mathematics and Computer Science, University of Twente, 7500 AE, Enschede, The Netherlands
Anton Nijholt

Cite this paper

Chaloupka, J. (2011). Audio-Visual Isolated Words Recognition for Voice Dialogue System. In: Esposito, A., Vinciarelli, A., Vicsi, K., Pelachaud, C., Nijholt, A. (eds) Analysis of Verbal and Nonverbal Communication and Enactment. The Processing Issues. Lecture Notes in Computer Science, vol 6800. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25775-9_8

DOI: https://doi.org/10.1007/978-3-642-25775-9_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-25774-2
Online ISBN: 978-3-642-25775-9
eBook Packages: Computer Science, Computer Science (R0)
