Statistical Machine Translation and Turkish

  • Kemal Oflazer
  • Reyyan Yeniterzi
  • İlknur Durgar-El Kahlout
Chapter
Part of the Theory and Applications of Natural Language Processing book series (NLP)

Abstract

Machine translation is one of the most important applications of natural language processing. The last 25 years have seen tremendous progress in machine translation, enabled by the development of statistical techniques and the availability of large-scale parallel sentence corpora from which statistical models of translation can be learned. Turkish poses many challenges for statistical machine translation, as alluded to in Chap. 1, owing mainly to its complex morphology. This chapter discusses these challenges in more detail and describes two widely different approaches that have been employed in the last several years for English-to-Turkish machine translation.

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Kemal Oflazer (1, corresponding author)
  • Reyyan Yeniterzi (2)
  • İlknur Durgar-El Kahlout (3)

  1. Carnegie Mellon University Qatar, Doha-Education City, Qatar
  2. Özyeğin University, Istanbul, Turkey
  3. TÜBİTAK-BİLGEM, Gebze, Turkey