German Compounds in Factored Statistical Machine Translation

  • Conference paper
Advances in Natural Language Processing (GoTAL 2008)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5221)

Abstract

An empirical method for splitting German compounds is varied in a number of ways to investigate the consequences for factored statistical machine translation between English and German in both directions. Compound splitting is incorporated into translation as a preprocessing step, performed on the training data and on the German translation input. For translation into German, compounds are merged in a postprocessing step based on part-of-speech. Compound parts are marked to distinguish them from ordinary words. Translation quality improves in both translation directions, and the number of untranslated words in the English output is reduced. Different versions of the splitting algorithm perform best in the two translation directions.
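The pipeline in the abstract (split, mark the parts, merge by part-of-speech) can be illustrated with a minimal sketch. The abstract does not give the scoring function, the markup, or the merging rule, so everything below is an assumption made for illustration: candidate splits are scored by the geometric mean of the corpus frequencies of their parts, a linking "s" is kept on the surface form of the modifier so that concatenation restores the original word, and parts are marked with a hypothetical `-PART` suffix on the part-of-speech tag. The frequency table is toy data, and input is assumed to be lowercased.

```python
from math import prod

# Hypothetical toy frequencies; real counts would come from the training corpus.
FREQ = {"aktionsplan": 2, "aktion": 50, "plan": 80, "akt": 5}

MIN_PART_LEN = 3        # assumed minimum length of a compound part
LINKERS = ("", "s")     # assumed linking elements allowed after a modifier
PART_SUFFIX = "-PART"   # hypothetical POS marker for compound parts


def modifier_freq(part):
    """Frequency of `part` as a modifier, optionally stripping a linker."""
    best = 0
    for linker in LINKERS:
        if part.endswith(linker):
            stem = part[: len(part) - len(linker)]
            if len(stem) >= MIN_PART_LEN:
                best = max(best, FREQ.get(stem, 0))
    return best


def segmentations(word):
    """Yield (parts, freqs) for each way to cut `word` into known parts."""
    if FREQ.get(word, 0) > 0:
        yield [word], [FREQ[word]]
    for i in range(MIN_PART_LEN, len(word) - MIN_PART_LEN + 1):
        head_freq = modifier_freq(word[:i])
        if head_freq == 0:
            continue
        for parts, freqs in segmentations(word[i:]):
            yield [word[:i]] + parts, [head_freq] + freqs


def split_compound(word):
    """Pick the segmentation with the highest geometric mean of frequencies."""
    best_parts, best_score = [word], 0.0
    for parts, freqs in segmentations(word):
        score = prod(freqs) ** (1.0 / len(freqs))
        if score > best_score:
            best_parts, best_score = parts, score
    return best_parts


def split_tagged(word, pos):
    """Split a tagged word and mark every non-final part in its POS tag."""
    parts = split_compound(word)
    return [(p, pos + PART_SUFFIX) for p in parts[:-1]] + [(parts[-1], pos)]


def merge_tagged(tokens):
    """Merge marked parts with the following token when the POS matches."""
    out, buffer, buffer_pos = [], "", None
    for surface, pos in tokens:
        is_part = pos.endswith(PART_SUFFIX)
        base = pos[: -len(PART_SUFFIX)] if is_part else pos
        if buffer and base != buffer_pos:
            out.append((buffer, buffer_pos))  # no matching head: emit as a word
            buffer = ""
        if is_part:
            buffer, buffer_pos = buffer + surface, base
        else:
            out.append((buffer + surface, pos))
            buffer = ""
    if buffer:
        out.append((buffer, buffer_pos))
    return out


tokens = [("der", "ART")] + split_tagged("aktionsplan", "N")
print(tokens)
print(merge_tagged(tokens))
```

In this sketch, splitting would be applied to the German side of the training data and to German input, while merging would be applied only to German output. A part whose following token has a different part-of-speech is left as a separate word rather than merged, which is one simple way to avoid gluing a stray part onto an unrelated word.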

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Stymne, S. (2008). German Compounds in Factored Statistical Machine Translation. In: Nordström, B., Ranta, A. (eds) Advances in Natural Language Processing. GoTAL 2008. Lecture Notes in Computer Science, vol 5221. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85287-2_44

  • DOI: https://doi.org/10.1007/978-3-540-85287-2_44

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-85286-5

  • Online ISBN: 978-3-540-85287-2

  • eBook Packages: Computer Science, Computer Science (R0)
