
Grammar-Lexis Relations in the Computational Morphology of Arabic

Arabic Computational Morphology

Part of the book series: Text, Speech and Language Technology (TLTB, volume 38)

Abstract

Grammar-lexis rules and relations that ensure the correct insertion of major lexical entries (nouns, verbs and deverbals) play an essential part in the computational morphology of Arabic. This chapter draws on experience with the DIINAR.1 Arabic lexical resource and related software, and with the first version of the SYSTRAN Arabic-English MT system. Section 2 outlines previous approaches to the computational morphology of the language: root-and-pattern (briefly recalled); lexeme-based; machine learning and statistical; stems based on roots and patterns; and, finally, the stem-based approach, which includes root and pattern as well as grammar-lexis information. The last of these, which best meets the requirements of machine translation and other high-level applications, is developed further in Section 3, which describes the Arabic word-form and maps the rules and relations accounting for grammar-lexis relations that operate within the boundaries of that complex unit. In the Word-Formatives Grammar, rules and relations involving the lexical nucleus of the word-form play a crucial part and are formalised from a computational perspective. The stem either coincides with the nucleus or is its core, because lexical entries fall into two overall categories: in the first, stem and entry coincide; in the second, the lexical entry corresponds to a morphological compound encompassing the stem and a lexicalized extension (in most cases, a suffix which is part of the entry). Correct relations between the lexical nucleus and the other formatives in the word-form are ensured through morphosyntactic specifiers associated with each entry of the lexical database. These relations, which have been included in the DIINAR.1 database, are both finite in number and exhaustive in coverage.
They also allow computational morphology and other applications to rely on a well-restricted generated lexicon: only cliticized or affixed formatives that can actually be associated with a given lexical nucleus are added, and ‘illegal’ ones are ruled out. In the DIINAR.1 resource, the effective number of inflected word-forms is 7,774,938 (about nine times fewer than ‘blind’ generation would produce). A comprehensive mapping of examples is given. Their compatibility with applications going beyond computational morphology is also outlined.
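
The contrast between ‘blind’ and specifier-restricted generation can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the class names, the specifier labels (`noun`, `verb`, `takes_pronoun`), and the three-entry lexicon are hypothetical and do not reproduce the DIINAR.1 specifiers or its database format. Forms are shown as hyphen-segmented transliterations, and case endings and phonological adjustments are ignored.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Formative:
    """A clitic or affix, with the specifiers it requires of the nucleus."""
    form: str
    requires: frozenset = frozenset()


@dataclass(frozen=True)
class Entry:
    """A lexical nucleus with its morphosyntactic specifiers (toy labels)."""
    nucleus: str
    specifiers: frozenset


# Hypothetical mini-inventory; "" stands for an empty slot.
PROCLITICS = [
    Formative(""),
    Formative("wa-"),                       # conjunction 'and'
    Formative("al-", frozenset({"noun"})),  # definite article
]
ENCLITICS = [
    Formative(""),
    Formative("-hu", frozenset({"takes_pronoun"})),  # 3rd masc. sg. pronoun
]

LEXICON = [
    Entry("kitāb", frozenset({"noun", "takes_pronoun"})),   # 'book'
    Entry("kataba", frozenset({"verb", "takes_pronoun"})),  # 'he wrote'
    Entry("nāma", frozenset({"verb"})),                     # 'he slept'
]


def is_legal(entry, pro, enc):
    """True if both formatives are licensed by the entry's specifiers."""
    if not (pro.requires <= entry.specifiers
            and enc.requires <= entry.specifiers):
        return False
    # The definite article and a suffixed pronoun exclude each other on a noun.
    return not (pro.form == "al-" and enc.form == "-hu")


def generate(restricted):
    """Segmented word-forms, with or without the specifier check."""
    return [
        pro.form + entry.nucleus + enc.form
        for entry, pro, enc in product(LEXICON, PROCLITICS, ENCLITICS)
        if not restricted or is_legal(entry, pro, enc)
    ]


blind = generate(restricted=False)
legal = generate(restricted=True)
print(len(blind), len(legal))  # prints: 18 11
```

In this toy setting, blind concatenation yields forms such as `al-kataba` (article on a verb) or `nāma-hu` (object pronoun on an intransitive verb), which the specifier check rules out. The ratio here is arbitrary. For the real resource, the abstract's factor of about nine would put blind generation on the order of 70 million forms against the 7,774,938 retained.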




Copyright information

© 2007 Springer

About this chapter

Cite this chapter

Dichy, J., Farghaly, A. (2007). Grammar-Lexis Relations in the Computational Morphology of Arabic. In: Soudi, A., Bosch, A.v., Neumann, G. (eds) Arabic Computational Morphology. Text, Speech and Language Technology, vol 38. Springer, Dordrecht. https://doi.org/10.1007/978-1-4020-6046-5_7
