Artificial Intelligence Review, Volume 51, Issue 1, pp 77–117

Slavic languages in phrase-based statistical machine translation: a survey

  • Mirjam Sepesy Maučec
  • Janez Brest

Abstract

The demand for translations is increasing at a rate far beyond the capacity of professional translators. It is too difficult, time-consuming and expensive to translate everything from scratch in each language. Machine translation offers a solution, as it produces translations automatically. Until recently, statistical machine translation was one of the most successful approaches. However, a new approach to machine translation based on neural networks has emerged with promising results. The present paper concerns phrase-based statistical machine translation, an area that has been extensively studied in the literature. The translation system consists of many components, each built on probabilistic models and each described separately. Although high-quality translation systems have been developed for certain language pairs, there is still a large number of languages for which translation produces many errors. Languages with a rich morphology pose an especially difficult challenge. We address one group of morphologically rich languages: Slavic languages, which constitute a relatively homogeneous family characterized by rich, inflectional morphology. The present paper offers a comprehensive survey of approaches to coping with Slavic languages in different aspects of statistical machine translation. We observe that the community's interest in research on more difficult languages is increasing, and we believe that the translation quality for those languages will reach the level of practical use in the near future.

Keywords

  • Statistical machine translation
  • Morphology
  • Slavic language
  • Inflection
  • Free word order

Notes

Acknowledgements

The authors would like to thank the editor and anonymous reviewers for their helpful and constructive comments that greatly contributed to improving the paper. Funding was provided by Javna Agencija za Raziskovalno Dejavnost RS (Grant Nos. P2-0069, P2-0041).

  139. Popović M (2015) chrF: character n-gram F-score for automatic MT evaluation. In: Proceedings of the tenth workshop on statistical machine translation. Association for Computational Linguistics, Lisbon, Portugal, pp 392–395Google Scholar
  140. Popović M, Arčan M (2015) Identifying main obstacles for statistical machine translation of morphologically rich South Slavic languages. In: Proceedings of the eighteenth annual conference of the European association for machine translation (EAMT 15). Antalya, Turkey, pp 97–104Google Scholar
  141. Popović M, Ljubešić N (2014) Exploring cross-language statistical machine translation for closely related South Slavic languages. In: Proceedings of the conference: language technology for closely related languages and language variants (LT4CloseLang). Association for Computational Linguistics, Doha, Qatar, pp 76–84Google Scholar
  142. Popović M, Ney H (2004) Towards the use of word stems and suffixes for statistical machine translation. In: Proceedings of the 4th international conference on language resources and evaluation (LREC), Lisbon, Portugal, pp 1585–1588Google Scholar
  143. Popović M, Ney H (2011) Towards automatic error analysis of machine translation output. Comput Linguist 37(4):657–688MathSciNetGoogle Scholar
  144. Popović M, Arčan M, Avramidis E, Burchardt A, Lommel AR (2015) Poor man’s lemmatisation for automatic error classification. In: The eighteenth annual conference of the European association for machine translation (EAMT 15), pp 105–112Google Scholar
  145. Prochazka V, Pollak P, Zdansky J, Nouza J (2011) Performance of Czech speech recognition with language models created from public resources. Radioengineering 20(4):1002–1008Google Scholar
  146. Rishøj C, Søgaard A (2011) Factored translation with unsupervised word clusters. In: Proceedings of the 6th workshop on statistical machine translation. Association for Computational Linguistics, Edinburgh, Scotland, pp 447–451Google Scholar
  147. Rosa R, Mareček D, Dušek O (2012) DEPFIX: a system for automatic correction of Czech MT outputs. In: Proceedings of the seventh workshop on statistical machine translation. Association for Computational Linguistics, Montreal, Canada, WMT ’12, pp 362–368Google Scholar
  148. Rosa R, Sudarikov R, Novák M, Popel M, Bojar O (2016) Dictionary-based domain adaptation of MT systems without retraining. In: Proceedings of the first conference on machine translation. Association for Computational Linguistics, Berlin, Germany, pp 449–455Google Scholar
  149. Rotovnik T, Maučec MS, Kačič Z (2007) Large vocabulary continuous speech recognition of an inflected language using stems and endings. Speech Commun 49(6):437–452Google Scholar
  150. Ruth J, O’Regan J (2011) Shallow-transfer rule-based machine translation from Czech to Polish. In: Proceedings of the second international workshop on free/open-source rule-based machine translation, pp 69–76Google Scholar
  151. Salehi B, Cook P, Baldwin T (2014) Using distributional similarity of multi-way translations to predict multiword expression compositionality. In: Proceedings of the 14th conference of the european chapter of the association for computational linguistics. Association for Computational Linguistics, Gothenburg, Sweden, pp 472–481Google Scholar
  152. Schwenk H, Rousseau A, Attik M (2012) Large, pruned or continuous space language models on a GPU for statistical machine translation. In: Proceedings of the conference of the North American chapter of the association for computational linguistics: human language technologies (NAACL HLT). Atlanta, Georgia, pp 11–19Google Scholar
  153. Seeker W, Kuhn J (2013) Morphological and syntactic case in statistical dependency parsing. Comput Linguist 39:23–55Google Scholar
  154. Sennrich R (2015) Modelling and optimizing on syntactic N-grams for statistical machine translation. Trans Assoc Computat Linguist 3:169–182Google Scholar
  155. Sennrich R, Haddow B, Birch A (2016a) Edinburgh neural machine translation systems for WMT 16. In: Proceedings of the first conference on machine translation. Association for Computational Linguistics, pp 371–376Google Scholar
  156. Sennrich R, Haddow B, Birch A (2016b) Neural machine translation of rare words with subword units. In: Proceedings of the 54th annual meeting of the association for computational linguistics, pp 1715–1725Google Scholar
  157. Shaik MAB, Mousa AED, Schüter R, Ney H (2011) Using morpheme and syllable based sub-words for Polish LVCSR. In: Proceedings of ICASSP, pp 4680–4683Google Scholar
  158. Shalonova K, Golénia B, Flach P (2009) Towards learning morphology for under-resourced fusional and agglutinating languages. IEEE/ACM Trans Audio Speech Lang Process 17(5):956–965Google Scholar
  159. Shin E, Stüker S, Kilgour K, Fügen C, Waibel A (2013) Maximum entropy language modeling for Russian ASR. In: Proceedings of the 10th international workshop on spoken language translation, Heidelberg, Germany, pp 288–294Google Scholar
  160. Simova I, Kordoni V (2013) Improving English-Bulgarian statistical machine translation by phrasal verb treatment. In: Workshop on multi-word units in machine translation and translation technologies, pp 62–71Google Scholar
  161. Slawik I, Niehues J, Waibel A (2015) Stripping adjectives: integration techniques for selective stemming in SMT systems. In: The eighteenth annual conference of the European association for machine translation (EAMT 15), pp 105–112Google Scholar
  162. Snover M, Dorr B, Schwartz R, Micciulla L, Makhoul J (2006) A study of translation error rate with targeted human annotation. In: 5th conference of the association for machine translation in the Americas (AMTA), Boston, MassachusettsGoogle Scholar
  163. Son LH, Allauzen A, Yvon F (2012) Continuous space translation models with neural networks. In: Proceedings of the conference of the North American chapter of the association for computational linguistics: human language technologies, pp 39–48Google Scholar
  164. Stanojević M, Sima’an K (2014) BEER: BEtter evaluation as ranking. In: Proceedings of the ninth workshop on statistical machine translation. Association for Computational Linguistics, Baltimore, Maryland, USA, pp 414–419Google Scholar
  165. Tamchyna A, Bojar O (2015) What a transfer-based system brings to the combination with PBMT. In: Proceedings of the ACL 2015 fourth workshop on hybrid approaches to translation (HyTra). Association for Computational Linguistics, Beijing, China, pp 11–20Google Scholar
  166. Tiedemann J (2012) Character-based pivot translation for under-resourced languages and domains. In: Proceedings of the 13th conference of the European chapter of the association for computational linguistics (EACL 2012), The Association for Computational Linguistics, pp 141–151Google Scholar
  167. Tiedemann J, Agić Ž, Nivre J (2014) Treebank translation for cross-lingual parser induction. In: Proceedings of the eighteenth conference on computational natural language learning (CoNLL). Avignon, France, pp 130–140Google Scholar
  168. Tillmann C (2004) A unigram orientation model for statistical machine translation. In: Proceedings of HLT-NAACL 2004: short papers. Association for Computational Linguistics, Boston, Massachusetts, pp 101–104Google Scholar
  169. Tillmann C, Hewavitharana S (2013) A unified alignment algorithm for bilingual data. Nat Lang Eng 19(1):33–60Google Scholar
  170. Toral A, Pecina P, Wang L, van Genabith J (2015) Linguistically-augmented perplexity-based data selection for language models. Comput Speech Lang 32:11–26Google Scholar
  171. Toutanova K, Suzuki H, Ruopp A (2008) Applying morphology generation models to machine translation. Proc ACL. Association for Computational Linguistics, Columbus, pp 514–522Google Scholar
  172. Tran K, Bisazza A, Monz C (2014) Word translation prediction for morphologically rich languages with bilingual neural networks. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP). Association for Computational Linguistics, Doha, Qatar, pp 1676–1688Google Scholar
  173. Tsvetkov Y, Dyer C, Levin L, Bhatia A (2013) Generating English determiners in phrase-based translation with synthetic translation options. In: Proceedings of the eighth workshop on statistical machine translation. Sofia, Bulgaria, pp 271–280Google Scholar
  174. Vaswani A, Huang L, Chiang D (2012) Smaller alignment models for better translations: unsupervised word alignment with the l0-norm. In: Proceedings of the 50th annual meeting of the association for computational linguistics, pp 311–319Google Scholar
  175. Vazhenina D, Markov K (2013) Factored language modeling for Russian LVCSR. In: Proceedings of the international joint conference on awareness science and technology & ubi-media computing, pp 205–210Google Scholar
  176. Vidhu Bhala RV, Abirami S (2014) Trends in word sense disambiguation. Artif Intell Rev 42(2):159–171Google Scholar
  177. Virpioja S, Väyrynen J, Mansikkaniemi A, Kurimo M (2010) Applying morphological decomposition to statistical machine translation. In: Proceedings of the joint fifth workshop on statistical machine translation and metrics MATR. Uppsala University, Uppsala, Sweden, pp 195–200Google Scholar
  178. Wang L, Wong DF, Chao LS, Lu Y, Xing J (2014) A systematic comparison of data selection criteria for SMT domain adaptation. Sci World J 2014Google Scholar
  179. Wang R, Osenova P, Simov K (2012) Linguistically-augmented Bulgarian-to-English statistical machine translation model. IN: Proceedings of the joint workshop on exploiting synergies between information retrieval and machine translation (ESIRMT) and hybrid approaches to machine translation (HyTra). Association for Computational Linguistics, Avignon, France, pp 119–128Google Scholar
  180. Wang R, Zhao H, Lu BL (2015) Bilingual continuous-space language model growing for statistical machine translation. IEEE/ACM Trans Audio Speech Lang Process 23(7):1209–1220Google Scholar
  181. Wang R, Utiyama M, Goto I, Sumita E, Zhao H, Lu BL (2016) Converting continuous-space language models into N-gram language models with efficient bilingual pruning for statistical machine translation. ACM Trans Asian Low-Resour Lang Inf Process 15(3):11:1–11:26Google Scholar
  182. Weller M, Kisselew M, Smekalova S, Fraser A, Schmid H, Durrani N, Sajjad H, Farkas R (2013) Munich-Edinburgh-Stuttgart submissions at WMT13: morphological and syntactic processing for SMT. In: Proceedings of the eighth workshop on statistical machine translation. Association for Computational Linguistics, Sofia, Bulgaria, pp 232–239Google Scholar
  183. Williams P, Sennrich R, Post M, Koehn P (2016) Syntax-based statistical machine translation. Morgan & Claypool, San RafaelGoogle Scholar
  184. Wołk K, Marasek K (2013) Polish - English speech statistical machine translation systems for the IWSLT 2013. In: Proceedings of the international workshop on spoken language translation (IWSLT), Heidelberg, GermanyGoogle Scholar
  185. Wołk K, Marasek K (2014a) Enhanced bilingual evaluation understudy. In: Proceedings of the 11th international workshop on spoken language translation (IWSLT), Lake Tahoe, pp 191–197Google Scholar
  186. Wołk K, Marasek K (2014b) Polish - English speech statistical machine translation systems for the IWSLT 2014. In: Proceedings of the international workshop on spoken language translation (IWSLT), Lake Tahoe, pp 143–149Google Scholar
  187. Wołk K, Marasek K (2015a) Neural-based machine translation for medical text domain. Based on European Medicines Agency leaflet texts. Procedia Comput Sci 64:2–9Google Scholar
  188. Wołk K, Marasek K (2015b) PJAIT systems for the IWSLT 2015 evaluation campaign enhanced by comparable corpora. In: Proceedings of the international workshop on spoken language translation (IWSLT), Da Nang, Vietnam, pp 101–104Google Scholar
  189. Wołk K, Marasek K, Glinkowski W (2015a) Telemedicine as a special case of the machine translation. Comput Med Imaging Graph 46:249–256Google Scholar
  190. Wołk K, Rejmund E, Marasek K (2015b) Harvesting comparable corpora and mining them for equivalent bilingual sentences using statistical classification and analogy-based heuristics. In: Proceedings of the international symposium on methodologies for intelligent systems (ISMIS), pp 433–441Google Scholar
  191. Wróblewska A (2011) Polish-English word alignment: preliminary study. Emerg Intell Technol Ind 369:123–132Google Scholar
  192. Wu X, Yu H, Liu Q (2014) RED: DCU-CASICT participation in WMT2014 metrics task. In: Proceedings of the ninth workshop on statistical machine translation. Association for Computational Linguistics, Baltimore, Maryland, USA, pp 420–425Google Scholar
  193. Xiong D, Zhang M (2015) Backward and trigger-based language models for statistical machine translation. Nat Lang Eng 21(2):201–226MathSciNetGoogle Scholar
  194. Žabokrtský Z, Ptáček J, Pajas P (2008) TectoMT: Highly modular MT system with tectogrammatics used as transfer layer. In: Proceedings of the third workshop on statistical machine translation. Association for Computational Linguistics, Columbus, Ohio, USA, pp 167–170Google Scholar
  195. Zeman D, Fishel M, Berka J, Bojar O (2011) Addicter: What is wrong with my translations? Prague Bull Math Linguist 96:79–88Google Scholar
  196. Zens R, Ney H (2006) Discriminative reordering models for statistical machine translation. In: Proceedings of the workshop on statistical machine translation, New York City, pp 55–63Google Scholar

Copyright information

© Springer Science+Business Media Dordrecht 2017

Authors and Affiliations

  1. Faculty of Electrical Engineering and Computer Science, University of Maribor, Maribor, Slovenia
