Statistical Machine Translation

Natural Language Processing of Semitic Languages

Abstract

We give a brief introduction to statistical machine translation (SMT) for Semitic languages, along with an overview of machine translation approaches. We discuss the special considerations that should be taken into account when developing SMT systems for Semitic languages, and we review segmentation techniques for Semitic SMT. Finally, we present a detailed guide to building an SMT system using freely available resources.

Author information

Correspondence to Hany Hassan.

Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Hassan, H., Darwish, K. (2014). Statistical Machine Translation. In: Zitouni, I. (eds) Natural Language Processing of Semitic Languages. Theory and Applications of Natural Language Processing. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-45358-8_6

  • DOI: https://doi.org/10.1007/978-3-642-45358-8_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-45357-1

  • Online ISBN: 978-3-642-45358-8

  • eBook Packages: Computer Science, Computer Science (R0)
