N-Best 2008: A Benchmark Evaluation for Large Vocabulary Speech Recognition in Dutch

  • David A. van Leeuwen
Chapter
Part of the Theory and Applications of Natural Language Processing book series (NLP)

Abstract

In 2008 an evaluation of large vocabulary continuous speech recognition systems for the Dutch language was conducted. The tasks consisted of transcription of Broadcast News and Conversational Telephone Speech in the Northern and Southern regional language variants (Dutch and Flemish). The evaluation was modeled after the well-known ARPA/NIST evaluations and the French Technolangue Evalda campaigns. This paper reviews the tasks and evaluation methodology used, presents the official results and discusses some additional analyses. Acoustic and textual training material was specified and provided in a primary evaluation condition. Seven academic sites from four European countries submitted results to this evaluation in four primary transcription tasks. The best result reported is a word error rate of 15.9% for Southern Dutch Broadcast News. Text normalisation, vocabulary and pronunciation modeling are common among the important system development efforts.

Copyright information

© The Author(s) 2013

Open Access. This chapter is distributed under the terms of the Creative Commons Attribution Noncommercial License, which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

Authors and Affiliations

  1. Centre for Language and Speech Technology, Nijmegen, The Netherlands
