
Arabic Morphological Representations for Machine Translation

  • Chapter
Arabic Computational Morphology

Part of the book series: Text, Speech and Language Technology (TLTB, volume 38)

Abstract

Arabic has a very rich morphology characterized by a combination of templatic and affixational morphemes, complex morphological rules, and a rich feature system. This complexity makes working with Arabic as a source or target language in machine translation (MT) a challenge for two reasons. First, it is not clear what the right representation is for Arabic words given a specific MT approach or system. Second, there are many MT-relevant resources for Arabic morphology, lexicography and syntax (e.g., morphological analyzers, dictionaries and treebanks) that adopt various representations that are not necessarily compatible with each other. As a result, MT researchers need to experiment with multiple representations used by different resources or components and to relate them to each other within a single system. In this chapter, we describe different Arabic morphological representations used by MT-relevant natural language processing resources and tools, and we discuss their usability in different MT approaches. We also present a common framework for relating different levels of representation to each other.




© 2007 Springer

About this chapter

Cite this chapter

Habash, N. (2007). Arabic Morphological Representations for Machine Translation. In: Soudi, A., Bosch, A.v., Neumann, G. (eds) Arabic Computational Morphology. Text, Speech and Language Technology, vol 38. Springer, Dordrecht. https://doi.org/10.1007/978-1-4020-6046-5_14
