The LIMSI RT06s Lecture Transcription System

  • Lori Lamel
  • Eric Bilinski
  • Gilles Adda
  • Jean-Luc Gauvain
  • Holger Schwenk
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4299)

Abstract

This paper describes recent research, carried out within the FP6 Integrated Project CHIL, on developing a system to automatically transcribe lectures and presentations. Since only a small amount of CHIL data was available for system development, widely available corpora were used to train both the acoustic and language models. Acoustic model training used the transcribed portion of the TED corpus of Eurospeech recordings, as well as the ICSI, ISL, and NIST meeting corpora. For language model training, text was extracted from a variety of online conference proceedings. Experimental results are reported for close-talking and far-field microphones on both development and evaluation data.
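Transcription results of this kind are conventionally scored as word error rate (WER): the minimum number of word substitutions, deletions, and insertions needed to turn the system output into the reference transcript, divided by the number of reference words. The abstract does not name the metric, so treating it as WER is an assumption here. The following is a minimal sketch of that computation, with a hypothetical function name `word_error_rate`; it is not the scoring tool used in the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words.

    Hypothetical helper for illustration; tokenizes on whitespace only and
    applies no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the first (i - 1) reference words
    # and the first j hypothesis words (one row of the Levenshtein table).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    reference = "the lecture starts now"
    close_talk = "the lecture starts now"  # illustrative output, not real data
    far_field = "a lecture start now"      # illustrative output, not real data
    print(word_error_rate(reference, close_talk))
    print(word_error_rate(reference, far_field))
```

The same reference is scored against both microphone conditions, which mirrors how close-talking and far-field results can be compared on a common set of transcripts.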

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  1. LIMSI-CNRS, France (Lori Lamel, Eric Bilinski, Gilles Adda, Jean-Luc Gauvain, Holger Schwenk)

Personalised recommendations