
Quaero Speech-to-Text and Text Translation Evaluation Systems

  • Conference paper
High Performance Computing in Science and Engineering '10

Abstract

Our laboratory used the HP XC4000, the high-performance computer of the federal state of Baden-Württemberg, to participate in the second Quaero evaluation for automatic speech recognition (ASR) and machine translation (MT). State-of-the-art ASR and MT systems use stochastic models that are trained on large amounts of data with techniques from the field of machine learning. With these techniques, the systems search for the most likely speech recognition or translation hypothesis, respectively.

The 2009 evaluation systems are further developments of the 2008 evaluation systems, incorporating more training data and updated models. The speech recognition and machine translation models were, at least in part, trained on the XC4000 high-performance cluster. The speech recognition evaluation itself was also mainly run on the XC4000.

In this paper we report on the newly developed systems and on how we used the XC4000 to train their models and to run the actual evaluation.



Author information

Corresponding author

Correspondence to Sebastian Stüker.

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Stüker, S., Kilgour, K., Niehues, J. (2011). Quaero Speech-to-Text and Text Translation Evaluation Systems. In: Nagel, W., Kröner, D., Resch, M. (eds) High Performance Computing in Science and Engineering '10. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15748-6_38
