Arabic Morphological Representations for Machine Translation

Habash, Nizar

doi:10.1007/978-1-4020-6046-5_14

Part of the book series: Text, Speech and Language Technology (TLTB, volume 38)

Abstract

Arabic has a very rich morphology characterized by a combination of templatic and affixational morphemes, complex morphological rules, and a rich feature system. This complexity makes working with Arabic as a source or target language in machine translation (MT) a challenge for two reasons. First, it is not clear what the right representation is for Arabic words given a specific MT approach or system. Second, there are many MT-relevant resources for Arabic morphology, lexicography and syntax (e.g., morphological analyzers, dictionaries and treebanks) that adopt various representations that are not necessarily compatible with each other. As a result, MT researchers need to experiment with the multiple representations used by different resources or components and to relate them to each other within a single system. In this chapter, we describe different Arabic morphological representations used by MT-relevant natural language processing resources and tools, and we discuss their usability in different MT approaches. We also present a common framework for relating different levels of representation to each other.
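To make "levels of representation" concrete, here is a minimal sketch. It shows one Arabic word, وسيكتبها (Buckwalter transliteration wsyktbhA, "and he will write it"), represented three ways: as a surface string, as a clitic-tokenized string, and as a lexeme with morphological features. The class, field names, lexeme identifier and feature labels are hypothetical illustrations. They are not the chapter's framework or the output format of any particular analyzer. The "+" clitic-marking convention is likewise only one of several possible choices.

```python
from dataclasses import dataclass, field


@dataclass
class WordAnalysis:
    """One hypothetical analysis of an Arabic word at several levels.

    All names and feature labels here are illustrative assumptions.
    """
    surface: str                 # undiacritized word (Buckwalter transliteration)
    proclitics: list[str]        # clitics attached before the stem
    stem: str                    # inflected stem as it appears in the word
    enclitics: list[str]         # clitics attached after the stem
    lexeme: str                  # hypothetical lexeme identifier
    features: dict[str, str] = field(default_factory=dict)

    def tokenized(self) -> str:
        """Split clitics off the stem, marking each with '+'."""
        parts = [p + "+" for p in self.proclitics]
        parts.append(self.stem)
        parts.extend("+" + e for e in self.enclitics)
        return " ".join(parts)

    def lexeme_and_features(self) -> str:
        """Abstract level: lexeme followed by feature:value pairs sorted by feature name."""
        feats = " ".join(f"{k}:{v}" for k, v in sorted(self.features.items()))
        return f"{self.lexeme} {feats}"

    def is_consistent(self) -> bool:
        """Check that concatenating the tokens reproduces the surface word."""
        return "".join(self.proclitics) + self.stem + "".join(self.enclitics) == self.surface


# w+ 'and', s+ future marker 'will', yktb 'he writes', +hA 'it/her'
example = WordAnalysis(
    surface="wsyktbhA",
    proclitics=["w", "s"],
    stem="yktb",
    enclitics=["hA"],
    lexeme="katab",  # hypothetical lexeme ID
    features={"pos": "verb", "asp": "imperfective", "per": "3", "gen": "masc", "num": "sg"},
)

if __name__ == "__main__":
    print(example.surface)
    print(example.tokenized())
    print(example.lexeme_and_features())
```

Keeping all three levels in one record, with a consistency check back to the surface form, is one simple way to let different components consume whichever representation they expect.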

Author information

Authors and Affiliations

Center for Computational Learning Systems, Columbia University, 3022 Broadway, New York
Nizar Habash

Editor information

Editors and Affiliations

École Nationale de l’Industrie Minérale, Rabat, Morocco
Abdelhadi Soudi
Tilburg University, The Netherlands
Antal van den Bosch
Deutsches Forschungszentrum für Künstliche Intelligenz, Saarbrücken, Germany
Günter Neumann

About this chapter

Cite this chapter

Habash, N. (2007). Arabic Morphological Representations for Machine Translation. In: Soudi, A., Bosch, A.v., Neumann, G. (eds) Arabic Computational Morphology. Text, Speech and Language Technology, vol 38. Springer, Dordrecht. https://doi.org/10.1007/978-1-4020-6046-5_14

DOI: https://doi.org/10.1007/978-1-4020-6046-5_14
Publisher Name: Springer, Dordrecht
Print ISBN: 978-1-4020-6045-8
Online ISBN: 978-1-4020-6046-5
eBook Packages: Humanities, Social Sciences and Law; Social Sciences (R0)