Abstract
This paper aims to present how to build a relevant sorting of morphosyntactic tags for the NooJ dictionary of the Rromani language, through three topics: dialectal issues, the treatment of postpositions, and the countability of substantives. The module encompasses all four dialects of Rromani, whose isoglosses are now largely no longer geographical. We have therefore defined each of the four dialects through a combination of two tags corresponding to specific isoglosses. For instance, the so-called O-bi dialect (i.e. the O-superdialect with no mutation of alveolar affricates) is labelled “rro + rrbi” in NooJ. Then, on typological grounds, we decided to treat Rromani postpositions as agglutinative, non-inflectional morphemes. Rromani postpositions are appended to substantives in the oblique case and can in some cases be cumulative (as in Modern Indic). In addition, the postposition of possession may be inflected in gender, number and case like an adjective (basic forms -qo, -qi, -qe ‘of’, with variants). Accordingly, some 250 potential forms are to be encountered for postpositions, covering all basic dialectal variants. However, they can all be rendered by a much more economical system, appropriate to both Rromani grammar and computational analysis. Finally, we investigated the system of countability in Rromani nouns where relevant.
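The two-tag encoding of dialects can be pictured as two independent binary features. The following is a minimal Python sketch, not the NooJ module itself: only the tags “rro” and “rrbi” come from the text, while “rre” (E-superdialect) and “rrmu” (with mutation of alveolar affricates) are hypothetical names chosen here for the other two values. It also assumes that an entry carrying no tag on one axis is valid for both values of that axis.

```python
from itertools import product

# Superdialect axis: only "rro" is attested in the text; "rre" is a hypothetical name.
SUPERDIALECT_TAGS = {"rro": "O", "rre": "E"}
# Affricate-mutation axis: only "rrbi" is attested; "rrmu" is a hypothetical name.
MUTATION_TAGS = {"rrbi": "bi", "rrmu": "mu"}


def compatible_dialects(tags: set[str]) -> set[str]:
    """Return the dialect names (O-bi, O-mu, E-bi, E-mu) an entry's tags allow.

    Assumption: if an entry has no tag on an axis, it is valid for every
    value of that axis (e.g. "rro" alone covers both O-bi and O-mu).
    """
    supers = {v for t, v in SUPERDIALECT_TAGS.items() if t in tags}
    mutations = {v for t, v in MUTATION_TAGS.items() if t in tags}
    supers = supers or set(SUPERDIALECT_TAGS.values())
    mutations = mutations or set(MUTATION_TAGS.values())
    return {f"{s}-{m}" for s, m in product(supers, mutations)}


print(compatible_dialects({"rro", "rrbi"}))  # {'O-bi'}
print(sorted(compatible_dialects({"rro"})))  # ['O-bi', 'O-mu']
```

Under this encoding, an entry never needs more than two dialect tags, rather than an enumerated list of dialect names.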
Notes
- 1.
Proto-Rromani people were deported in 1018 by Sultan Mahmud of Ghazni from Kannauj, in the middle Ganges valley.
- 2.
The four dialects of Rromani are called O-bi, O-mu, E-bi and E-mu (see chap. 2).
- 3.
In Rromani grammar, two levels of case should be sharply distinguished: the two morphological cases (direct, oblique), expressed by an inflectional ending, and several functional cases (e.g. ablative), expressed either by a postposition appended to a noun in the oblique case or by a preposition preceding a noun in the direct case. Prepositional and postpositional phrases are often equivalent (e.g. e raklesθar ‘from the boy’, with the postposition -θar ‘from’, vs. katar o raklo ‘from the boy’, with the preposition katar ‘from’).
- 4.
For example, the noun raklo ‘boy’ generates 257 “forms” in total: seven forms without a postposition, 10 forms with invariable postpositions and 240 forms with a variable postposition.
- 5.
For example, long forms of the postposition -qo ‘of’ are used only in the O-bi dialect.
- 6.
In the Rromani module, the capital “D” precedes the inflectional or semantic information of the determinee (e.g. the possessed substantive). For example, “Dsg” means that the possessed substantive is in the singular.
- 7.
In NooJ, words, lexemes and morphemes can all be treated as ALUs (atomic linguistic units) [3].
- 8.
A colon means inclusiveness. For example, “:N” includes any noun in any inflected form.
- 9.
In Rromani, inanimate nouns do not occur in the oblique case without a postposition.
- 10.
In general, a noun inflects in four forms; for example, the masculine noun raklo ‘boy’ inflects as: raklo ‘boy’ (sg + dr), rakles ‘boy’ (sg + ob), rakle ‘boys’ (pl + dr), raklen ‘boys’ (pl + ob).
- 11.
However, a NooJ inflectional dictionary would recognize it.
- 12.
Each constraint (and its variable) is numbered from left to right ($1 being the first constraint), and the fields of the lexicon are named “L” (Lemma), “C” (morphosyntactic Category), “S” (Syntactic or semantic features) and “F” (inFlectional information). For instance, “$1L” means the lemma corresponding to the first constraint [4].
- 13.
The paradigm buxlo ‘large’ covers all adjectives that are vocalic (i.e. ending in “-o” in the basic form) and oxytonic (e.g. buxlo ‘large’, kalo ‘black’).
- 14.
Remember that the combination of the two tags “rro + rrbi” represents the O-bi dialect.
- 15.
The tag “rrs” (for “south”) represents the vernacular used in the Balkans.
- 16.
Remember that the capital “D” precedes the information about the determinee.
- 17.
These inflected forms of the possessive postposition are used in either the Balkan vernacular or the Carpathian one, both belonging to the O-bi dialect.
- 18.
Remember that “Dabl” means that the possessed noun, not the possessor, is in the ablative case.
- 19.
The tag “rrn” (for “north”) represents the vernacular used in Russia and northern Poland.
- 20.
Rromani has no dedicated indefinite article; however, the cardinal number jekh ‘one’ is used as a singular indefinite article.
- 21.
On morphological grounds, the singular form of love ‘money’ would be *lovo, but its diminutive lovorro is used instead.
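Notes 3 and 10 together describe the agglutinative treatment: a postposition is simply appended to the oblique form of the noun. The following Python sketch illustrates this with the forms of raklo ‘boy’ given in Note 10 and the postposition -θar ‘from’ from Note 3. The function names are hypothetical, the ending table covers only the four basic forms of this one masculine paradigm, and the code does not reproduce NooJ's own paradigm syntax.

```python
# Endings of the masculine paradigm of raklo 'boy' (Note 10): (number, case) -> ending.
ENDINGS = {
    ("sg", "dr"): "o",
    ("sg", "ob"): "es",
    ("pl", "dr"): "e",
    ("pl", "ob"): "en",
}


def inflect_like_raklo(lemma: str) -> dict[tuple[str, str], str]:
    """Generate the four basic forms of a noun assumed to inflect like raklo."""
    if not lemma.endswith("o"):
        raise ValueError(f"{lemma!r} does not end in -o")
    stem = lemma[:-1]
    return {key: stem + ending for key, ending in ENDINGS.items()}


def attach_postposition(oblique_form: str, postposition: str) -> str:
    """Append an agglutinative postposition such as '-θar' to an oblique form."""
    return oblique_form + postposition.lstrip("-")


forms = inflect_like_raklo("raklo")
print(forms[("sg", "ob")])                               # rakles
print(attach_postposition(forms[("sg", "ob")], "-θar"))  # raklesθar
```

In this design the postposition is stored once and concatenated with any oblique form, so the noun + postposition combinations need not be listed one by one.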
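The count in Note 4 can be checked directly. As a purely hypothetical illustration, the sketch also shows how listing every such form would scale for a lexicon of nouns patterned like raklo; the lexicon size is an arbitrary parameter, not a figure from the paper.

```python
# Breakdown of the "forms" of raklo 'boy' given in Note 4.
RAKLO_FORMS = {
    "without postposition": 7,
    "with invariable postpositions": 10,
    "with a variable postposition": 240,
}


def total_forms(breakdown: dict[str, int]) -> int:
    """Sum the per-category counts."""
    return sum(breakdown.values())


def full_form_entries(n_nouns: int, forms_per_noun: int) -> int:
    """Entries needed if every form of every noun were listed (hypothetical)."""
    return n_nouns * forms_per_noun


per_noun = total_forms(RAKLO_FORMS)
print(per_noun)                           # 257
print(full_form_entries(1000, per_noun))  # 257000
```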
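Notes 6, 16 and 18 describe the convention that a capital “D” marks features of the determinee (the possessed noun) rather than of the word itself. The following is a minimal Python sketch of how such a feature string might be split. It assumes that tags are joined with “+”, as in the “rro + rrbi” notation, and that a determinee tag is a “D” followed by a lowercase letter, so that other capitalized tags are left alone. The function name is hypothetical.

```python
def split_determinee(features: str) -> tuple[list[str], list[str]]:
    """Split '+'-joined tags into the word's own tags and determinee tags.

    Assumption: a determinee tag is 'D' followed by a lowercase letter,
    e.g. 'Dsg' (possessed noun in the singular) or 'Dabl' (possessed noun
    in the ablative); the 'D' is stripped from the returned value.
    """
    own, determinee = [], []
    for tag in (t.strip() for t in features.split("+")):
        if not tag:
            continue
        if len(tag) > 1 and tag[0] == "D" and tag[1].islower():
            determinee.append(tag[1:])
        else:
            own.append(tag)
    return own, determinee


print(split_determinee("rro+rrbi+Dsg"))  # (['rro', 'rrbi'], ['sg'])
```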
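Note 8 says that a colon marks inclusiveness, so that “:N” covers any noun in any inflected form. The sketch below illustrates that distinction under stated assumptions: a pattern with a leading colon matches on the category alone, while a bare pattern must equal the surface form exactly. The Token class and the matching rule are hypothetical illustrations, not NooJ's internal matcher.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """A hypothetical annotated token: surface form, lemma, category."""
    form: str
    lemma: str
    category: str


def matches(pattern: str, token: Token) -> bool:
    """':X' matches any inflected form of category X; otherwise exact form."""
    if pattern.startswith(":"):
        return token.category == pattern[1:]
    return token.form == pattern


rakles = Token(form="rakles", lemma="raklo", category="N")
print(matches(":N", rakles))     # True
print(matches("raklo", rakles))  # False: 'raklo' is the lemma, not this form
```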
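Note 12 describes how a variable such as “$1L” addresses one field of a numbered constraint. The following is a minimal Python sketch of resolving such references against the lexical entries matched by each constraint. It assumes each entry is a dictionary with the four fields L, C, S and F. The sample entry values and the function name are hypothetical.

```python
import re

# A reference is '$', a constraint number (1-based, left to right), then a field letter.
REFERENCE = re.compile(r"\$(\d+)([LCSF])")


def resolve(reference: str, matched: list[dict[str, str]]) -> str:
    """Return the requested field of the n-th matched entry, e.g. '$1L'."""
    m = REFERENCE.fullmatch(reference)
    if m is None:
        raise ValueError(f"not a constraint reference: {reference!r}")
    index, field = int(m.group(1)), m.group(2)
    if not 1 <= index <= len(matched):
        raise IndexError(f"no constraint number {index}")
    return matched[index - 1][field]


entries = [
    {"L": "raklo", "C": "N", "S": "m", "F": "sg+ob"},  # hypothetical entry for 'rakles'
]
print(resolve("$1L", entries))  # raklo
print(resolve("$1C", entries))  # N
```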
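Note 13 defines the buxlo paradigm for vocalic, oxytonic adjectives. The code below is only a hedged sketch. It generates the three direct-case forms, assuming the commonly cited endings -o (masculine singular), -i (feminine singular) and -e (plural). Oblique forms and dialectal variants are omitted, and because stress is not marked in the spelling used here, oxytony is passed in as a flag. Applied to -qo, the same endings reproduce the basic possessive forms -qo, -qi, -qe listed in the abstract. This is consistent with that postposition inflecting like an adjective.

```python
# Direct-case endings assumed for the buxlo paradigm (oblique forms omitted).
DIRECT_ENDINGS = {"m.sg": "o", "f.sg": "i", "pl": "e"}


def buxlo_direct_forms(lemma: str, oxytonic: bool = True) -> dict[str, str]:
    """Direct-case forms of an adjective in the buxlo paradigm.

    Stress is not written, so oxytony must be supplied by the caller.
    """
    if not lemma.endswith("o") or not oxytonic:
        raise ValueError(f"{lemma!r} is not a vocalic oxytonic adjective")
    stem = lemma[:-1]
    return {key: stem + ending for key, ending in DIRECT_ENDINGS.items()}


print(buxlo_direct_forms("kalo"))  # {'m.sg': 'kalo', 'f.sg': 'kali', 'pl': 'kale'}
print(buxlo_direct_forms("qo"))    # {'m.sg': 'qo', 'f.sg': 'qi', 'pl': 'qe'}
```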
References
1. Courthiade, M.: The nominal flexion in Rromani. In: Courthiade, M., Grigore, D. (eds.) Professor Gheorghe Sarău: a Life Devoted to the Rromani Language. Editura Universității din București, Bucharest (2016)
2. Courthiade, M., et al.: Morri angluni rromane ćhibǎqi evroputni lavustik. Cigány Ház, Budapest (2009)
3. Silberztein, M.: La formalisation des langues: l’approche de NooJ. ISTE Editions, London (2015)
4. Silberztein, M.: NooJ Manual (2003). www.nooj4nlp.net
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Watabe, M. (2018). A NooJ Dictionary for the Rromani Language: Toward a NooJ-Relevant Sorting of Morphosyntactic Tags. In: Mbarki, S., Mourchid, M., Silberztein, M. (eds) Formalizing Natural Languages with NooJ and Its Natural Language Processing Applications. NooJ 2017. Communications in Computer and Information Science, vol 811. Springer, Cham. https://doi.org/10.1007/978-3-319-73420-0_1
DOI: https://doi.org/10.1007/978-3-319-73420-0_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-73419-4
Online ISBN: 978-3-319-73420-0
eBook Packages: Computer Science (R0)