Improved Arabic-French Machine Translation through Preprocessing Schemes and Language Analysis

  • Fatiha Sadat
  • Emad Mohamed
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7884)


Arabic is a morphologically rich and complex language, which presents significant challenges for natural language processing and machine translation. In this paper, we describe an ongoing effort to build a competitive Arabic–French statistical machine translation system using the Moses decoder and other tools. The results show a significant increase in BLEU score when preprocessing schemes for Arabic are introduced alongside other language analysis rules.


Arabic morphology; statistical machine translation; comparable corpora; parallel corpora; pre-processing




Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Fatiha Sadat (1)
  • Emad Mohamed (1)
  1. Montreal, Canada
