MSD Recombination for Statistical Machine Translation into Highly-Inflected Languages

  • Conference paper
Text, Speech and Dialogue (TSD 2008)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5246)

Abstract

Freely available tools and language resources were used to build the VoiceTRAN statistical machine translation (SMT) system. Several configurations of the system are presented and evaluated. The VoiceTRAN SMT system outperformed the baseline conventional rule-based MT system in both English-Slovenian in-domain test setups. To further increase the generalization capability of the translation model for lower-coverage out-of-domain test sentences, an “MSD-recombination” approach was proposed. This approach not only allows better exploitation of conventional translation models, but also performs well in the more demanding translation direction, that is, into a highly inflectional language. Using this approach in the out-of-domain setup of the English-Slovenian JRC-ACQUIS task, we achieved significant improvements in translation quality.

Editor information

Petr Sojka, Aleš Horák, Ivan Kopeček, Karel Pala

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Žganec-Gros, J., Gruden, S. (2008). MSD Recombination for Statistical Machine Translation into Highly-Inflected Languages. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2008. Lecture Notes in Computer Science, vol 5246. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87391-4_31

  • DOI: https://doi.org/10.1007/978-3-540-87391-4_31

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-87390-7

  • Online ISBN: 978-3-540-87391-4

  • eBook Packages: Computer Science, Computer Science (R0)
