Machine Translation, Volume 26, Issue 1–2, pp 177–195

Machine translation between Hebrew and Arabic

  • Reshef Shilon
  • Nizar Habash
  • Alon Lavie
  • Shuly Wintner

Abstract

Hebrew and Arabic are related but mutually incomprehensible languages with complex morphology and scarce parallel corpora. Machine translation between the two languages is therefore both interesting and challenging. We discuss the similarities and differences between Hebrew and Arabic, the benefits and challenges they respectively induce, and their implications for machine translation. We highlight the shortcomings of using English as a pivot language and advocate a direct, transfer-based and linguistically informed (but still statistical, and hence scalable) approach. We report preliminary results from the two systems we are currently developing, one for each translation direction.

Keywords

Arabic, Hebrew, Transfer-based MT

Copyright information

© Springer Science+Business Media B.V. 2011

Authors and Affiliations

  • Reshef Shilon (1)
  • Nizar Habash (2)
  • Alon Lavie (3)
  • Shuly Wintner (4)

  1. Tel Aviv University, Tel Aviv, Israel
  2. Columbia University, New York, USA
  3. Carnegie Mellon University, Pittsburgh, USA
  4. University of Haifa, Haifa, Israel
