A Lightweight Speech Detection System for Perceptive Environments

  • Dominique Vaufreydaz
  • Rémi Emonet
  • Patrick Reignier
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4299)


In this paper, we address the problem of speech activity detection in multimodal perceptive environments. Such spaces may contain many different microphones (lapel, distant or table-top). We therefore need a generic speech activity detector that copes with different speech conditions, from close-talking to noisy distant speech. Moreover, since the number of microphones in a room can be high, the system must also be very lightweight. The speech activity detector presented in this article works efficiently on dozens of microphones in parallel. We will see that even though its absolute evaluation scores are not perfect (error rates of 30% and 40%, respectively, on the two tasks), its accuracy is good enough for the context in which we use it.
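To make the idea of a lightweight detector running on many microphones concrete, here is a minimal sketch of a frame-energy threshold detector applied independently to each channel. It is not the detector described in the paper, whose design is not given in this abstract. The function names, the 160-sample frame length, and the -30 dB threshold are all assumptions chosen for illustration.

```python
import math
from typing import Dict, List, Sequence


def frame_energies(samples: Sequence[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of each full, non-overlapping frame."""
    n_frames = len(samples) // frame_len
    return [
        sum(x * x for x in samples[i * frame_len:(i + 1) * frame_len]) / frame_len
        for i in range(n_frames)
    ]


def detect_speech(samples: Sequence[float],
                  frame_len: int = 160,          # assumed: 10 ms at 16 kHz
                  threshold_db: float = -30.0    # assumed: dB relative to amplitude 1.0
                  ) -> List[bool]:
    """Flag each frame whose energy level exceeds the threshold."""
    flags = []
    for energy in frame_energies(samples, frame_len):
        level_db = 10.0 * math.log10(energy) if energy > 0.0 else float("-inf")
        flags.append(level_db > threshold_db)
    return flags


def detect_all(channels: Dict[str, Sequence[float]],
               frame_len: int = 160,
               threshold_db: float = -30.0) -> Dict[str, List[bool]]:
    """Run the same detector independently on every microphone channel."""
    return {
        name: detect_speech(signal, frame_len, threshold_db)
        for name, signal in channels.items()
    }
```

A fixed threshold like this costs only one multiply-add per sample, which is why energy-based schemes scale to many channels. It is also fragile under changing noise, so a practical system would typically adapt the threshold or add further features.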


False Alarm, Speech Recognition, Speech Activity, Energy Detector, Microphone Array
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Dominique Vaufreydaz (1)
  • Rémi Emonet (1)
  • Patrick Reignier (1)
  1. PRIMA – INRIA Rhône-Alpes, Zirst, Montbonnot, France
