Abstract
We present a new application of spiking neurons: audio-visual speech recognition. The features extracted from the audio (cepstral coefficients) and from the video (mouth height and width, and the percentages of black and white pixels inside the mouth) are simple enough to allow real-time integration of the complete system. A generic preprocessing step converts these features into a spike sequence that is processed by the neural network performing the classification. Training is done in one pass: the user pronounces each word of the dictionary once. Tests on the European M2VTS database show the value of such a system for audio-visual speech recognition. In the presence of noise in particular, audio-visual recognition performs much better than recognition based on the audio modality alone.
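The abstract describes a pipeline in which per-frame feature values are converted into a spike sequence before classification. The details of the paper's generic preprocessing are not given in the abstract; the following is a minimal illustrative sketch, assuming a simple latency code in which larger normalized feature values fire earlier. The feature values and the function `features_to_spikes` are hypothetical, not taken from the paper.

```python
import numpy as np

def features_to_spikes(features, t_max=100.0):
    """Convert a feature vector into one spike time per input neuron
    using latency coding: larger values fire earlier.

    Illustrative assumption only; the paper's actual preprocessing
    for its spatio-temporal network is not detailed in the abstract.
    """
    f = np.asarray(features, dtype=float)
    # Normalize the feature vector to [0, 1].
    f = (f - f.min()) / (f.max() - f.min() + 1e-12)
    # Latency code: the strongest feature spikes at t = 0,
    # the weakest at t = t_max.
    return (1.0 - f) * t_max

# Hypothetical per-frame features: audio (cepstral coefficients)
# concatenated with video (mouth geometry and pixel ratios).
audio = [12.3, -4.1, 0.8]
video = [0.42, 0.31, 0.65, 0.35]
spikes = features_to_spikes(audio + video)
print(spikes.shape)  # (7,)
```

Under this coding, each input neuron emits exactly one spike per frame, so a word becomes a spatio-temporal spike pattern that a one-pass learner can store directly.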
© 2002 Springer-Verlag Berlin Heidelberg
Séguier, R., Mercier, D. (2002). Audio-Visual Speech Recognition One Pass Learning with Spiking Neurons. In: Dorronsoro, J.R. (eds) Artificial Neural Networks — ICANN 2002. ICANN 2002. Lecture Notes in Computer Science, vol 2415. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46084-5_195
Print ISBN: 978-3-540-44074-1
Online ISBN: 978-3-540-46084-8