Quaero 2010 Speech-to-Text Evaluation Systems

Stüker, Sebastian; Kilgour, Kevin; Kraft, Florian

doi:10.1007/978-3-642-23869-7_44

Conference paper

Abstract

Quaero is a French program with German participation, within which KIT is also working on the problem of Automatic Speech Recognition for audio data from various sources on the World Wide Web. In this paper we describe the development of our English and German speech recognition systems for the 2010 Quaero evaluation, for which we have utilized, at least in part, the XC4000 HPC cluster at KIT. Both recognition systems were trained with the help of the Janus Recognition Toolkit developed at the Interactive Systems Laboratory, and both are expansions of the 2009 evaluation systems. Both systems use various front-ends, state-of-the-art acoustic models that include discriminative training, and very large language models which require the use of shared memory. Both systems also make use of domain-specific acoustic and language model training material which became available for the 2010 evaluation. In total, the expansion of the systems and the addition of domain-dependent training material led to significantly improved performance over the 2009 systems.

Author information

Authors and Affiliations

Research Group 3-01 ‘Multilingual Speech Recognition’, Karlsruhe Institute of Technology, Karlsruhe, Germany
Sebastian Stüker, Kevin Kilgour & Florian Kraft

Corresponding author

Correspondence to Sebastian Stüker.

Editor information

Editors and Affiliations

Zentrum für Informationsdienste und Hochleistungsrechnen (ZIH), TU Dresden, Helmholtzstr. 10, Dresden, 01069, Germany
Wolfgang E. Nagel
Abt. Angewandte Mathematik, Universität Freiburg, Hermann-Herder-Str. 10, Freiburg, 79104, Germany
Dietmar B. Kröner
Höchstleistungsrechenzentrum, Universität Stuttgart, Nobelstraße 19, Stuttgart, 70569, Germany
Michael M. Resch

About this paper

Cite this paper

Stüker, S., Kilgour, K., Kraft, F. (2012). Quaero 2010 Speech-to-Text Evaluation Systems. In: Nagel, W., Kröner, D., Resch, M. (eds) High Performance Computing in Science and Engineering '11. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23869-7_44

DOI: https://doi.org/10.1007/978-3-642-23869-7_44
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23868-0
Online ISBN: 978-3-642-23869-7
eBook Packages: Mathematics and Statistics (R0)
