Abstract
We give a brief introduction to statistical machine translation (SMT) for Semitic languages, along with an overview of machine translation approaches. We discuss the special considerations that should be taken into account when developing SMT systems for Semitic languages, review segmentation techniques for Semitic SMT, and finally present a detailed guide to building an SMT system using freely available resources.
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Hassan, H., Darwish, K. (2014). Statistical Machine Translation. In: Zitouni, I. (eds) Natural Language Processing of Semitic Languages. Theory and Applications of Natural Language Processing. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-45358-8_6
DOI: https://doi.org/10.1007/978-3-642-45358-8_6
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-45357-1
Online ISBN: 978-3-642-45358-8
eBook Packages: Computer Science, Computer Science (R0)