Advertisement

SPRAAK: Speech Processing, Recognition and Automatic Annotation Kit

  • Patrick Wambacq
  • Kris Demuynck
  • Dirk Van Compernolle
Chapter
Part of the Theory and Applications of Natural Language Processing book series (NLP)

Abstract

The availability of a speech recognition system for Dutch is mentioned as one of the essential requirements for the language and speech technology community. Indeed, researchers now are faced with the problem that no good speech recognition tool is available for their purposes or existing tools lack functionality or flexibility. This project has two primary goals that are accomplished within a single software framework. The first goal is to develop a highly modular toolkit for research into speech recognition algorithms. It allows researchers to focus on one particular aspect of speech recognition technology without needing to worry about the details of the other components. The second goal is to provide a state-of-the art recogniser for Dutch with a simple interface, so that it can be used by non-specialists with a minimum of programming requirements. Next to speech recognition, the resulting software enables applications in related fields as well. Examples are linguistic and phonetic research where the software can be used to segment large speech databases or to provide high quality automatic transcriptions. We have chosen the existing ESAT recogniser, augmented with knowledge and code from the other partners in the project, as a starting point. This code base is transformed to meet the specified requirements. The transformation is accomplished by improving the software interfaces to make the software package more user friendly and adapted for usage in a large user community, and by providing adequate user and developer documentation written in English, so as to make it easily accessible to the international language and speech technology community as well.

Notes

Acknowledgements

The SPRAAK toolkit has emerged as the transformation of KULeuven/ESAT’s previous speech recognition system (HMM75 was the latest version). This system is the result of 20 years of research and development in speech recognition at the KULeuven. We want to acknowledge the contributions of many researchers (too many to list them all) most of which have by now left the university after having obtained their PhD degree.

The SPRAAK project has also added some extra functionality to the previous system and has produced demo acoustic models and documentation. This was done in a collaborative effort with four partners: KULeuven/ESAT, RUNijmegen/CSLT, UTwente/HMI and TNO. The contributions of all these partners are also acknowledged.

References

  1. 1.
    Afify, M., Siohan, O.: Constrained maximum likelihood linear regression for speaker adaptation. In: Proceedings of the ICSLP-2000, Beijing, China, vol.3, pp. 861–864 (2000)Google Scholar
  2. 2.
    Clarkson, P., Rosenfeld, R.: Statistical language modeling using the CMU-Cambridge toolkit. In: Proceedings of the ESCA Eurospeech 1997, Rhodes, Greece, pp. 2707–2710 (1997)Google Scholar
  3. 3.
    CMUSphinx wiki available at. http://cmusphinx.sourceforge.net/wiki/start
  4. 4.
    Cucchiarini, C., Driesen, J., Van hamme, H., Sanders, E.: Recording speech of children, non-natives and elderly people for HLT applications: the JASMIN-CGN corpus. In: Proceedings of the 6th International Conference on Language Resources and Evaluation – LREC 2008, Marrakech, Morocco, 28–30 May 2008, p. 8Google Scholar
  5. 5.
    Demuynck, K.: Extracting, modelling and combining information in speech recognition. Ph.D. thesis, K.U.Leuven ESAT (2001)Google Scholar
  6. 6.
    Demuynck, K., Duchateau, J., Van Compernolle, D.: Reduced semi-continuous models for large vocabulary continuous speech recognition in Dutch. In: Proceedings of the ICSLP, Philadelphia, USA, Oct 1996, vol. IV, pp. 2289–2292Google Scholar
  7. 7.
    Demuynck, K., Duchateau, J., Van Compernolle, D.: Optimal feature sub-space selection based on discriminant analysis. In: Proceedings of the European Conference on Speech Communication and Technology, Budapest, Hungary, vol. 3, pp. 1311–1314 (1999)Google Scholar
  8. 8.
    Demuynck, K., Laureys, T., Van Compernolle, D., Van hamme, H.: FLaVoR: a flexible architecture for LVCSR. In: Proceedings of the European Conference on Speech Communication and Technology, Geneva, Switzerland, Sept 2003, pp. 1973–1976Google Scholar
  9. 9.
    Demuynck, K., Puurula, A., Van Compernolle, D., Wambacq, P.: The ESAT 2008 system for N-Best Dutch speech recognition Benchmark. In: Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, Merano, Italy, Dec 2009, pp. 339–343Google Scholar
  10. 10.
    Demuynck, K., Roelens, J., Van Compernolle, D., Wambacq, P.: SPRAAK: an open source speech recognition and automatic annotation kit. In: Proceedings of the International Conference on Spoken Language Processing, Brisbane, Australia, Sept 2008, p. 495Google Scholar
  11. 11.
    Demuynck, K., Seppi, D., Van Compernolle, D., Nguyen, P., Zweig, G.: Integrating meta-information into exemplar-based speech recognition with segmental conditional random fields. In: Proceedings of the International Conference on Acoustics, Speech and Signal Processing, Prague, Czech Republic, May 2011, pp. 5048–5051Google Scholar
  12. 12.
    Gales, M.: Maximum likelihood linear transformations for HMM-based speech recognition. Comput. Speech Lang. 12, 75–98 (1998)CrossRefGoogle Scholar
  13. 13.
    Gemmeke, J.F., Van Segbroeck, M., Wang, Y., Cranen, B., Van hamme, H.: Automatic speech recognition using missing data techniques: handling of real-world data. In Kolossa, D., Haeb-Umbach, R. (eds.) Robust Speech Recognition of Uncertain or Missing Data. Springer, Berlin/Heidelberg (2011, in press)Google Scholar
  14. 14.
  15. 15.
    Kessens, J., van Leeuwen, D.A.: N-best: the Northern- and Southern-Dutch benchmark evaluation of speech recognition technology. In: Proceedings of the Interspeech, Antwerp, Belgium, pp. 1354–1357 (2007)Google Scholar
  16. 16.
    Lee, A., Kawahara, T.: Recent development of open-source speech recognition engine Julius. In: Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Sapporo, Japan, pp. 131–137 (2009)Google Scholar
  17. 17.
    Nogueira, W., Vanpoucke, F., Dykmans, P., De Raeve, L., Van hamme, H., Roelens, J.: Speech recognition technology in CI rehabilitation. Cochlear Implant. Int. 11 (Suppl. 1), 449–453 (2010)Google Scholar
  18. 18.
    Palomäki, K., Brown, G., Barker, J.: Missing data speech recognition in reverberant conditions. In: Proceedings of the ICASSP, Orlando, Florida, pp. 65–68 (2002)Google Scholar
  19. 19.
    Palomäki, K., Brown, G., Barker, J.: Techniques for handling convolutional distortion with ‘missing data’ automatic speech recognition. Speech Commun. 43 (1–2), 123–142 (2004)CrossRefGoogle Scholar
  20. 20.
    Rybach, D., Gollan, C., Heigold, G., Hoffmeister, B., Lööf, J., Schlüter, R., Ney, H.: The RWTH Aachen university open source speech recognition system. In: Proceedings of the Interspeech 2009, Brighton, UK, Sept 2009, pp. 2111–2114Google Scholar
  21. 21.
    Stolcke, A.: SRILMan extensible language modeling toolkit. In: Hansen, J.H.L., Pellom, B. (eds.) Proceedings of the ICSLP, Denver, Sept 2002, vol. 2, pp. 901–904Google Scholar
  22. 22.
    Stolcke, A., Zheng, J., Wang, W., Abrash, V.: SRILM at sixteen: update and outlook. In: Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, Waikoloa, Hawaii, Dec 2011Google Scholar
  23. 23.
    Stouten, F., Duchateau, J., Martens, J.-P., Wambacq, P.: Coping with disfluencies in spontaneous speech recognition: acoustic detection and linguistic context manipulation. Speech Commun. 48 (11), 1590–1606 (2006)CrossRefGoogle Scholar
  24. 24.
    Stouten, V., Van hamme, H., Wambacq, P.: Model-based feature enhancement with uncertainty decoding for noise robust ASR. Speech Commun. 48 (11), 1502–1514 (2006)Google Scholar
  25. 25.
    Strik, H., Bakker, A.: Alfabetisering met een luisterende computer. DIXIT special issue on ‘STEVIN en onderwijs’ (in Dutch), p. 21. http://lands.let.ru.nl/~strik/publications/a148-TST-AAP-DIXIT.pdf(2009)
  26. 26.
    van Doremalen, J., Cucchiarini, C., Strik, H.: Optimizing automatic speech recognition for low-proficient non-native speakers. EURASIP J. Audio Speech Music Process. 2010, 13 (2010). Article ID 973954. doi:10.1155/2010/973954Google Scholar
  27. 27.
    Van Segbroeck, M., Van hamme, H.: Advances in missing feature techniques for robust large vocabulary continuous speech recognition. IEEE Trans. Audio Speech Lang. Process. 19 (1), 123–137 (2011)Google Scholar
  28. 28.
    Wambacq, P., Demuynck, K.: Efficiency of speech alignment for semi-automated subtitling in Dutch. In: Habernal, I., Matous̆ek, V. (eds.) TSD 2011. LNAI, vol. 6836, pp. 123–130. Springer, Berlin/New York (2011)Google Scholar
  29. 29.
    Wu, T., Duchateau, J., Martens, J.-P., Van Compernolle, D.: Feature subset selection for improved native accent identification. Speech Commun. 52, 83–98 (2010)CrossRefGoogle Scholar

Copyright information

© The Author(s) 2013

Open Access. This chapter is distributed under the terms of the Creative Commons Attribution Noncommercial License, which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

Authors and Affiliations

  • Patrick Wambacq
    • 1
  • Kris Demuynck
    • 1
  • Dirk Van Compernolle
    • 1
  1. 1.Katholieke Universiteit Leuven, ESATHeverleeBelgium

Personalised recommendations