Quaero 2010 Speech-to-Text Evaluation Systems

  • Conference paper

Abstract

Quaero is a French program with German participation, within which KIT works on automatic speech recognition for audio data from various sources on the World Wide Web. In this paper we describe the development of our English and German speech recognition systems for the 2010 Quaero evaluation, for which we used, at least in part, the XC4000 HPC cluster at KIT. Both recognition systems were trained with the Janus Recognition Toolkit developed at the Interactive Systems Laboratory, and both are expansions of the 2009 evaluation systems. Both systems use various front-ends, state-of-the-art acoustic models that include discriminative training, and very large language models that require the use of shared memory. Both systems also make use of domain-specific acoustic and language model training material that became available for the 2010 evaluation. In total, the expansion of the systems and the addition of domain-dependent training material led to significantly improved performance over the 2009 systems.

Author information

Corresponding author

Correspondence to Sebastian Stüker.

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Stüker, S., Kilgour, K., Kraft, F. (2012). Quaero 2010 Speech-to-Text Evaluation Systems. In: Nagel, W., Kröner, D., Resch, M. (eds) High Performance Computing in Science and Engineering '11. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23869-7_44
