
Audio-Visual Speech Recognition One Pass Learning with Spiking Neurons

  • Conference paper

Artificial Neural Networks — ICANN 2002

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2415)

Abstract

We present a new application of spiking neurons: audio-visual speech recognition. The features extracted from the audio (cepstral coefficients) and from the video (mouth height, mouth width, and the percentages of black and white pixels inside the mouth) are simple enough to allow real-time integration of the complete system. A generic preprocessing converts these features into an impulse sequence that is processed by the neural network, which carries out the classification. Training is done in one pass: the user pronounces each word of the dictionary once. Tests on the European M2VTS database show the value of such a system for audio-visual speech recognition. In the presence of noise in particular, audio-visual recognition performs much better than recognition based on the audio modality alone.
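The abstract describes a generic preprocessing step that turns scalar features into an impulse (spike) sequence, but does not specify the coding scheme. A minimal sketch of one common choice, latency coding, is shown below; the function name, the time scale, and the coding scheme itself are assumptions, not the paper's actual method.

```python
# Hypothetical latency-coding sketch: each feature value is mapped to a
# single spike whose timing encodes the value (larger values fire earlier).
# This is NOT the paper's documented preprocessing, only an illustration
# of how scalar features can become a spike sequence.

def features_to_spike_times(features, t_max=100.0):
    """Map each feature to a spike time in [0, t_max] (larger -> earlier)."""
    lo, hi = min(features), max(features)
    span = (hi - lo) or 1.0  # avoid division by zero for constant vectors
    # Normalize to [0, 1], then invert so strong features have short latency.
    return [t_max * (1.0 - (f - lo) / span) for f in features]

# Example frame mixing audio features (cepstral coefficients) and video
# features (mouth height, width, black/white pixel percentages).
frame = [12.3, -4.1, 0.8, 31.0, 18.5, 0.42, 0.17]
spike_times = features_to_spike_times(frame)
```

Under this scheme the largest feature in the frame fires at time 0 and the smallest at `t_max`; the resulting spike train is what a spiking network would then classify.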






Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Séguier, R., Mercier, D. (2002). Audio-Visual Speech Recognition One Pass Learning with Spiking Neurons. In: Dorronsoro, J.R. (eds) Artificial Neural Networks — ICANN 2002. ICANN 2002. Lecture Notes in Computer Science, vol 2415. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46084-5_195


  • DOI: https://doi.org/10.1007/3-540-46084-5_195

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-44074-1

  • Online ISBN: 978-3-540-46084-8

