Abstract
Grammar-lexis rules and relations ensuring the correct insertion of major lexical entries (nouns, verbs and deverbals) play an essential part in the computational morphology of Arabic. This chapter, which is based on experience with the DIINAR.1 Arabic lexical resource and related software, and with the first version of the SYSTRAN Arabic-English MT system, outlines previous approaches to the computational morphology of the language (Section 2): root and pattern (briefly recalled); lexeme-based; machine learning and statistical; stems based on roots and patterns; and, finally, the stem-based approach, which includes root and pattern as well as grammar-lexis information. The latter, which best meets the requirements of machine translation and other high-level applications, is further developed in Section 3, which presents the Arabic word-form and a mapping of rules and relations accounting for grammar-lexis relations operating within the boundaries of that complex unit. In the Word-Formatives Grammar, rules and relations involving the lexical nucleus of the word-form play a crucial part and are formalised in a computational perspective. The stem either coincides with the nucleus or is its core, because lexical entries fall into two overall categories: in the first, stem and entry coincide; in the second, the lexical entry corresponds to a morphological compound encompassing the stem and a lexicalized extension (in most cases, a suffix which is part of the entry). Correct relations between the lexical nucleus and the other formatives included in the word-form are ensured through morphosyntactic specifiers associated with each entry of the lexical database. These relations, which have been included in the DIINAR.1 database, are both finite in number and exhaustive in coverage.
They also allow computational morphology and other applications to rely on suitably restricted generated lexica: only cliticized or affixed formatives that can effectively be associated with a given lexical nucleus are added, and ‘illegal’ ones are ruled out. In the DIINAR.1 resource, the effective number of inflected word-forms is 7,774,938 (about nine times fewer than would be obtained through ‘blind’ generation). A comprehensive mapping of examples is given. The compatibility of these relations with applications going beyond computational morphology is also outlined.
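The restriction described above can be illustrated with a minimal sketch. It is not the DIINAR.1 data model: the class names, the specifier labels ("noun", "verb", "transitive"), the three toy entries and the tiny clitic inventory are all assumptions made for illustration, and the transliterations ignore case endings and vowel length. The sketch only shows the general idea of letting specifiers attached to a lexical entry license or rule out formatives, and of comparing the result with 'blind' generation.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Entry:
    stem: str              # toy transliterated lexical nucleus
    specifiers: frozenset  # hypothetical morphosyntactic specifiers


@dataclass(frozen=True)
class Formative:
    form: str
    licensed_by: frozenset  # any one of these specifiers licenses the formative;
                            # an empty set means "no restriction"


# Toy lexicon (assumed data, not taken from DIINAR.1).
ENTRIES = [
    Entry("kataba", frozenset({"verb", "transitive"})),  # 'he wrote'
    Entry("naama", frozenset({"verb"})),                 # 'he slept' (intransitive)
    Entry("kitaab", frozenset({"noun"})),                # 'book'
]

PROCLITICS = [
    Formative("", frozenset()),
    Formative("wa-", frozenset()),          # conjunction 'and'
    Formative("al-", frozenset({"noun"})),  # definite article
]

ENCLITICS = [
    Formative("", frozenset()),
    Formative("-hu", frozenset({"noun", "transitive"})),  # pronoun 'him/his'
]

# Formatives that cannot co-occur: article plus possessive pronoun.
INCOMPATIBLE = {("al-", "-hu")}


def licensed(entry: Entry, formative: Formative) -> bool:
    """True if the entry's specifiers allow this formative."""
    return not formative.licensed_by or bool(formative.licensed_by & entry.specifiers)


def generate(entry: Entry, constrained: bool = True) -> list:
    """Build word-forms around one nucleus, optionally filtering illegal ones."""
    forms = []
    for pro, enc in product(PROCLITICS, ENCLITICS):
        if constrained:
            if not (licensed(entry, pro) and licensed(entry, enc)):
                continue
            if (pro.form, enc.form) in INCOMPATIBLE:
                continue
        forms.append(f"{pro.form}{entry.stem}{enc.form}")
    return forms


blind_total = sum(len(generate(e, constrained=False)) for e in ENTRIES)
legal_total = sum(len(generate(e)) for e in ENTRIES)
print(blind_total, legal_total)
```

In this toy setting, constrained generation keeps forms such as "kataba-hu" while discarding "naama-hu" (object pronoun on an intransitive verb) and "al-kitaab-hu" (article combined with a possessive pronoun). The ratio between the two totals is a property of the toy data only and says nothing about the figures reported for DIINAR.1.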
Copyright information
© 2007 Springer
About this chapter
Cite this chapter
Dichy, J., Farghaly, A. (2007). Grammar-Lexis Relations in the Computational Morphology of Arabic. In: Soudi, A., Bosch, A.v., Neumann, G. (eds) Arabic Computational Morphology. Text, Speech and Language Technology, vol 38. Springer, Dordrecht. https://doi.org/10.1007/978-1-4020-6046-5_7
DOI: https://doi.org/10.1007/978-1-4020-6046-5_7
Publisher Name: Springer, Dordrecht
Print ISBN: 978-1-4020-6045-8
Online ISBN: 978-1-4020-6046-5
eBook Packages: Humanities, Social Sciences and Law; Social Sciences (R0)