
Realistic data and paradigms: the paradigm cell finding problem


Since Blevins (2006), there has been a shift in morphological frameworks away from what he called a constructive perspective towards an abstractive perspective based on data directly available to speakers (i.e. whole words).

This evolution towards word-based morphology is part of a more general anticonstructionist movement in the social sciences, characterised by the quote in (1) about constructive approaches, cited by Blevins et al. (2016a):

  1. (1)

    The main fallacy in this kind of thinking is that the reductionist hypothesis does not by any means imply a “constructionist” one: The ability to reduce everything to simple fundamental laws does not imply the ability to start from those laws and reconstruct the universe. In fact the more the elementary physicists tell us about the nature of the fundamental laws, the less relevance they seem to have to the very real problems of the rest of science, much less to those of society (Anderson 1972).

In this paper, we elaborate on Blevins (2006) to define a realistic perspective for the use of morphological data and illustrate its place in the emergence of both inflectional and derivational paradigms, using French verbs and French Ethnics.




  1. 1.

    “In the ancient model the primary insight is not that words can be split into formatives, but they can be located in paradigms. They are not wholes composed of simple parts, but are themselves the parts within a complex whole.”

  2. 2.

    In theory, realistic data should start with utterances in context but, with morphology in mind, we consider a stage where the speaker is capable of breaking down utterances into words.

  3. 3.

    Paradigm Cell Filling/Finding Problem, see Ackerman et al. (2009) and Sect. 3 for explicit definitions.

  4. 4.

    Even though the rules of referral that Stump (2001) uses alongside his rules of exponence are in fact paradigmatic relations.

  5. 5.

    At this level, Robins (1959) considers ‘Item & Process’ and ‘Item & Arrangement’ as variants.

  6. 6.

    The syntagmatic models cited above use formal paradigms, but to define sets of morphosyntactic cells associated with the lexemes rather than to relate word-forms to one another.

  7. 7.

    We consider that the units of the syntactic frameworks and lexeme-based theories should be derived from primary data.

  8. 8.

    In fact, we do not argue that exponents should be excluded from morphological analyses, only that they should be abstracted rather than presupposed.

  9. 9.

    A misleading but practical simplification for the purpose of their demonstration.

  10. 10.

    SdeWaC (Faaß and Eckart 2013) is an 880M-word German corpus based on deWaC (Baroni et al. 2009), a 1700M-word German corpus constructed from the web.

  11. 11.

    Zipf (1932).

  12. 12.

    See Henri (2010, Chap. 4, 125–150) for a detailed description of Mauritian verbal inflection.

  13. 13.

    The inflected variants counted here are purely based on forms, as SdeWaC gives no information on the morphosyntactic properties of the noun forms. A form occupying several syncretic inflectional cells is counted only once. German noun declension has 8 morphosyntactic cells, but many German nouns do not have 4 variants. The figure is reproduced as is from the article, with the log scale on the y axis.

  14. 14.

    Lexique3 is a French lexicon containing 135,000 word entries (55,000 lemmas) giving the frequencies for every word/lemma in Frantext and in a set of subtitles. See New (2006) for a complete description.

  15. 15.

    FrWaC is a 1600M word French corpus constructed from the web (Baroni et al. 2009).

  16. 16.

    There are three major differences between the accounts: i) Blevins et al. count inflected variants (the same form in 3 cells counts for 1) while Bonami and Beniamine count cell-forms (the same form in 3 cells counts for 3), ii) noun compounding is very productive in German, leading to a lot of hapax legomena compared to French verbs, iii) Bonami and Beniamine restrict their count of verb forms to the 6847 verbs documented in the Lefff lexicon (Sagot 2010) filtering out the huge amount of noise in FrWaC while Blevins et al. say nothing about filtering.

  17. 17.

    See Sect. 3.3 for an outline of the proposed adaptations.

  18. 18.

    Henri (2010, pp. 139–140) uses the typology of dialogical moves of Henri et al. (2008) which defines the counter-oriented move as follows: the Speaker’s take-up is in opposition to the Addressee’s move or the situation.

  19. 19.

    If the features could combine freely, the MSP would consist of the 8 following cells:

    • ne +, pc +, co +; ne +, pc +, co -; ne +, pc -, co +; ne +, pc -, co -
      ne -, pc +, co +; ne -, pc +, co -; ne -, pc -, co +; ne -, pc -, co -
  20. 20.

    In the Bonami et al. (2011) sample of 2079 verbs, 30% have only one form and 70% have two different forms.

  21. 21.

    Boyé and Schalchli (2016) term this a cell paradigm. Here we adopt the term optimal morphomic paradigm to emphasize the fact that this is a generalization of the morphomic paradigms of lexemes.

  22. 22.

    In line with speakers' intuitions about the meaning of inflected forms.

  23. 23.

    A sub-paradigm is a part of a paradigm concerned with a subset of inflectional attributes.

  24. 24.

    The 1st person has no high honorific grade and no contrast between low and mid grade.

  25. 25.

    The German adjectives also have a predicative form which we left aside here. It would make for an additional cell containing gut in both the morphosyntactic and the morphomic paradigm.

  26. 26.

    Here, we do not take into account the formal Dativ-e forms which would allow for a fifth form. The OMP calculations would be similar and the resulting OMP would have an additional cell as shown below.

    1. (i)
  27. 27.

    Because of the nature of intersective syncretism, the shadings in this table do not directly correspond to the shadings in Table 19.

  28. 28.

    Like Bonami and Beniamine (2015), we distinguish homophonous forms located in different cells. We refer to these as different cell-forms. We also use co-pair to designate co-forms corresponding to a pair of cells.

  29. 29.

    The method presented here uses only cotextual information but it could be applied in a similar way with contextual information.

  30. 30.

    Overabundance is the opposite of syncretism. It occurs when several inflected forms occupy the same inflectional cell for a given lexeme.

  31. 31.

    Including 845 overabundant forms.

  32. 32.

    We use the GRACE format recommended by Rajman et al. (1997) for the tagging of French conjugation.

  33. 33.

    This is the number of times the cell-form appeared in the sample, different from the Lexique3 frequency and variable from one sample to another.

  34. 34.

    See Fig. 9.

  35. 35.

    While French conjugation distinguishes 51 cells for verbs whose past participle is targeted by gender/number agreement (four past participle cells), some intransitive verbs possess only a single past participle form and can present a full paradigm with 48 co-forms.

  36. 36.

    In this case, we could unify C1 and C2 or C2 and C3. When such a situation arises, we arbitrarily choose one of the largest unifiable sets.

  37. 37.

    As noted by Boyé (2016), a large number of co-pairs is also crucial for the emergence of the right predictions between forms.

  38. 38.

    Grevisse and Goosse (2007, §898, 1107–1108).

  39. 39.

    The minimal number of co-pairs necessary to distinguish the 57 inflectional classes defined by Stump and Finkel (2013).

  40. 40.

    The data presented here is deliberately restricted to the citation forms of each lexeme to emphasize the importance of a paradigmatic approach to derivation.

  41. 41.

    Here we do not consider inflection, but even if we did, it would change little in the analysis. Number is neutralized for the nouns and for the adjectives. Both nouns and adjectives can display number contrasts in French (e.g. œil ‘eye’: sg vs pl; normal ‘normal’: vs ), but none of the Ethnics do. Following Bonami et al. (2004), we consider liaison contrasts between singulars and plurals to be related to differences in their linking elements (Ø vs ) rather than their inflected forms. Gender would only introduce one more cell as shown below. So, in fact, only taking into account the Ethnic and the country is almost the same as considering the complete set of cell-forms, especially when the relation between cells Sync2 and Sync3 is deemed to be inflectional.

    1. (i)
  42. 42.

    The entry point of all morphological research—morpheme-based as well as lexeme-based—is the word. In the first case, the analysis aims at breaking the word into morphemes and organizing them in a tree-like structure (...) In the second case, the analysis looks to describe the relations of forms and meanings between words.

  43. 43.

    The size of corpora and the number of examples are crucial factors for morphological studies and (...) to quote a well-known motto of corpus linguistics: More data is better data (...). More to the point, the extensive approach of morphology tries to base morphological analyses on the greatest possible number of examples considering that, in fact, the quantity of the examples taken into account directly affects the quality of the analyses (...). The accuracy of the description for the phenomena and the processes observed depends on that quantity. Analyses based on large numbers of examples are more precise and give a better understanding of less common data and more marginal phenomena.

  44. 44.

    Like English, French uses different allomorphs for this suffix (e.g. voir ‘see’, visible ‘visible’ and soudre ‘solve’, soluble ‘solvable’); we included all the allomorphs in our data.
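
The two counting conventions contrasted in notes 13 and 16 (inflected variants, where a form shared by 3 cells counts for 1, versus cell-forms, where it counts for 3) can be made concrete with a minimal sketch. The toy paradigm, the cell labels, the simplified transcriptions and the function names below are all assumptions made for illustration; they are not the data or code used in the studies cited.

```python
from collections import defaultdict

# Hypothetical toy data: (lexeme, cell, form) triples.
# Cell labels and simplified transcriptions are illustrative assumptions.
TOY_PARADIGM = [
    ("LAVER", "prs.1sg", "lav"),
    ("LAVER", "prs.2sg", "lav"),
    ("LAVER", "prs.3sg", "lav"),
    ("LAVER", "prs.1pl", "lavon"),
    ("LAVER", "prs.2pl", "lave"),
    ("LAVER", "inf", "lave"),
]


def count_inflected_variants(triples):
    """Count distinct forms per lexeme: a form filling 3 cells counts for 1."""
    forms = defaultdict(set)
    for lexeme, _cell, form in triples:
        forms[lexeme].add(form)
    return {lexeme: len(found) for lexeme, found in forms.items()}


def count_cell_forms(triples):
    """Count distinct (cell, form) pairs per lexeme: a form filling 3 cells counts for 3."""
    pairs = defaultdict(set)
    for lexeme, cell, form in triples:
        pairs[lexeme].add((cell, form))
    return {lexeme: len(found) for lexeme, found in pairs.items()}


if __name__ == "__main__":
    print(count_inflected_variants(TOY_PARADIGM))  # variants per lexeme
    print(count_cell_forms(TOY_PARADIGM))          # cell-forms per lexeme
```

On this toy input the first count collapses syncretic cells while the second keeps them apart, which is one reason the two accounts compared in note 16 are not directly comparable.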


  1. Ackerman, F., Blevins, J. P., & Malouf, R. (2009). Parts and wholes: Implicative patterns in inflectional paradigms. In J. P. Blevins & J. Blevins (Eds.), Analogy in grammar: Form and acquisition (pp. 54–82). Oxford: Oxford University Press.


  2. Anderson, P. W. (1972). More is different. Science, 177(4047), 393–396.


  3. Baroni, M., Bernardini, S., Ferraresi, A., & Zanchetta, E. (2009). The WaCky wide web: A collection of very large linguistically processed web-crawled corpora. Language Resources and Evaluation, 43(3), 209–226.


  4. Blevins, J. P. (2006). Word-based morphology. Journal of Linguistics, 42, 531–573.


  5. Blevins, J. P., Ackerman, F., Malouf, R., & Ramscar, M. (2016a). Morphology as an adaptive discriminative system. In H. Harley & D. Siddiqi (Eds.), Morphological metatheory (pp. 271–302). Amsterdam: Benjamins.


  6. Blevins, J. P., Milin, P., & Ramscar, M. (2016b). The Zipfian paradigm cell filling problem. In F. Kiefer, J. P. Blevins, & H. Bartos (Eds.), Perspectives on morphological structure: Data and analyses (Chap. 8). Leiden: Brill.


  7. Bonami, O. (2014). La structure fine des paradigmes de flexion: Études de morphologie descriptive, théorique et formelle. Habilitation thesis, Université Paris Diderot-Paris 7, Paris.

  8. Bonami, O., & Beniamine, S. (2015). Implicative structure and joint predictiveness. In V. Pirrelli, C. Marzi, & M. Ferro (Eds.), Word structure and word usage: Proceedings of the NetWordS final conference, Pisa, April 2015.


  9. Bonami, O., & Boyé, G. (2008). Paradigm shape is morphomic in Nepali. Communication presented at the 13th international morphology meeting in Vienna, February 2008.

  10. Bonami, O., & Boyé, G. (2014). De formes en thèmes. In F. Villoing, S. Leroy, & S. David (Eds.), Foisonnements morphologiques: Etudes en hommage à Françoise Kerleroux (pp. 17–45). Paris: Presses Universitaires de Paris Ouest.


  11. Bonami, O., & Strnadová, J. (2018). Paradigm structure and predictability in derivational morphology. Morphology,


  12. Bonami, O., Boyé, G., & Tseng, J. (2004). An integrated approach to French liaison. In G. Jäger, P. Monachesi, G. Penn, & S. Wintner (Eds.), Proceedings of formal grammar 2004 (pp. 29–45). Nancy: CSLI Publications.


  13. Bonami, O., Boyé, G., & Henri, F. (2011). Measuring inflectional complexity: French and Mauritian. Paper presented at the Quantitative measures in morphology and morphological development workshop at UCSD, January 2011.

  14. Booij, G. (1994). Against split morphology. In G. Booij & J. van Marle (Eds.), Yearbook of morphology 1993 (pp. 27–49). Dordrecht: Kluwer Academic.


  15. Boyé, G. (2015). Small world inflectional morphology: A fragment for French conjugation. Paper presented at the Computational methods for descriptive and theoretical morphology workshop in Vienna, February 2015.

  16. Boyé, G. (2016). Pour une modélisation surfaciste de la flexion: Le cas de la conjugaison du français. In SHS web of conferences (Vol. 27). Les Ulis: EDP Sciences.


  17. Boyé, G., & Schalchli, G. (2016). The status of paradigms. In A. Hippisley & G. Stump (Eds.), The Cambridge handbook of morphology (pp. 206–234). Cambridge: Cambridge University Press.


  18. Brown, D. (2007). Peripheral functions and overdifferentiation: The Russian second locative. Russian Linguistics, 31(1), 61–76.


  19. Buchholz, E. (2004). Grammatik der finnischen Sprache. Bremen: Hempen Verlag.


  20. Carstairs, A. (1987). Allomorphy in inflexion. London: Croom Helm.


  21. Church, K. W., & Mercer, R. L. (1993). Introduction to the special issue on computational linguistics using large corpora. Computational Linguistics, 19(1), 1–24.


  22. Corbett, G. G. (2000). Number. Cambridge textbooks in linguistics. Cambridge: Cambridge University Press.


  23. Corbett, G. G., & Fraser, N. M. (1993). Network morphology: A DATR account of Russian nominal inflection. Journal of Linguistics, 29(1), 113–142.


  24. David, K. (2007). Number marking in Maltese nouns. In F. Hoyt, N. Seifert, A. Teodorescu, & J. White (Eds.), Texas linguistic society IX: The morphosyntax of underrepresented languages (pp. 79–88). Stanford: CSLI Publications.


  25. de Calmès, M., & Pérennou, G. (1998). BDLEX: A lexicon for spoken and written French. In 1st international conference on language resources and evaluation (pp. 1129–1136). Granada: ELRA.


  26. Eggert, E. (2002). La dérivation toponymes-gentilés en français: Mise en évidence des régularités utilisables dans le cadre d’un traitement automatique. Ph.D. thesis, Université de Tours.

  27. Faaß, G., & Eckart, K. (2013). SdeWaC—A corpus of parsable sentences from the web. In I. Gurevych, C. Biemann, & T. Zesch (Eds.), Language processing and knowledge in the web (pp. 61–68). Heidelberg: Springer.


  28. Finkel, R., & Stump, G. T. (2007). Principal parts and morphological typology. Morphology, 17(1), 39–75.


  29. Grevisse, M., & Goosse, A. (2007). Le bon usage. Bruxelles: De Boeck Université.


  30. Hathout, N. (2016). La question des données en morphologie. Cahiers de l’ILSL, 45, 123–160.


  31. Henri, F. (2010). A constraint-based approach to verbal constructions in Mauritian. Ph.D. thesis, University of Mauritius and Université Paris Diderot.

  32. Henri, F., Marandin, J.-M., & Abeillé, A. (2008). Information structure coding in Mauritian: Verum focus expressed by long forms of verbs. Paper presented at the Workshop on predicate focus, verum focus, verb focus in Potsdam, November 2008.

  33. Hockett, C. F. (1954). Two models of grammatical description. Word, 10, 210–234.


  34. Kilani-Schoch, M., & Dressler, W. U. (2005). Morphologie naturelle et flexion du verbe français. Tübingen: Gunter Narr Verlag.


  35. Lee, J. (2014). Automatic morphological alignment and clustering. Technical report TR-2014-07, Department of Computer Science, University of Chicago.

  36. Lignon, S. (2000). La suffixation en -ien: Aspects sémantiques et phonologiques. Ph.D. thesis, Université de Toulouse le Mirail.

  37. Malouf, R., & Ackerman, F. (2013). The low entropy conjecture. Language, 89(3), 429–463.


  38. Matthews, P. H. (1991). Morphology (2nd ed.). Cambridge: Cambridge University Press.


  39. New, B. (2006). Lexique 3 : Une nouvelle base de données lexicales. In Actes de la conférence traitement automatique des langues naturelles, TALN’2006 (pp. 892–900).


  40. Pihel, K., & Pikamäe, A. (1999). Soome-eesti sõnaraamat. Tallinn: Valgus.


  41. Plénat, M. (2008). Quelques considérations sur la formation des gentilés. In B. Fradin (Ed.), La raison morphologique: Hommage à la mémoire de Danielle Corbin (pp. 155–174). Amsterdam: Benjamins.


  42. Rajman, M., Lecomte, J., & Paroubek, P. (1997). Format de description lexicale pour le français, Partie 2: Description morpho-syntaxique. Technical report GRACE, GTR 3-2.1, LIMSI.

  43. Ramscar, M., Yarlett, D., Dye, M., Denny, K., & Thorpe, K. (2010). The effects of feature-label-order and their implications for symbolic learning. Cognitive Science, 34(6), 909–957.


  44. Robins, R. H. (1959). In defence of WP. Transactions of the Philological Society, 58, 116–144.


  45. Roché, M. (2008). Structuration du lexique et principe d’économie : Le cas des ethniques. In J. Durand, B. Habert, & B. Laks (Eds.), Congrès mondial de linguistique française, CMLF ’08 (pp. 1571–1585). Paris: Institut de Linguistique Française.


  46. Roché, M. (2011). Quelle morphologie ? In M. Roché, G. Boyé, N. Hathout, S. Lignon, & M. Plénat (Eds.), Des unités morphologiques au lexique, langues et syntaxe (pp. 15–39). Paris: Lavoisier.


  47. Sagot, B. (2010). The Lefff, a freely available and large-coverage morphological and syntactic lexicon for French. In Proceedings of the 8th international conference on language resources and evaluation, LREC’10. Valletta: ELRA.


  48. Shannon, C. E. (1948). A mathematical theory of communication. The Bell System Technical Journal, 27, 379–423, 623–656.


  49. Stump, G. T. (2001). Inflectional morphology. Cambridge studies in linguistics, Vol. 93. Cambridge: Cambridge University Press.


  50. Stump, G. T. (2006). Heteroclisis and paradigm linkage. Language, 82, 279–322.


  51. Stump, G. T. (2016). Inflectional paradigms: Content and form at the syntax-morphology interface. Cambridge: Cambridge University Press.


  52. Stump, G. T., & Finkel, R. (2008). Stem alternations and principal parts in French verb inflection. Paper presented at Décembrettes 6 in Bordeaux, December 2008.

  53. Stump, G. T., & Finkel, R. A. (2013). Morphological typology: From word to paradigm. Cambridge studies in linguistics: Vol. 138. Cambridge: Cambridge University Press.


  54. Wurzel, W. U. (1989). Inflectional morphology and naturalness. Dordrecht: Kluwer Academic.


  55. Zipf, G. K. (1932). Selected studies of the principle of relative frequency in language. Cambridge: Harvard University Press.



Author information



Corresponding author

Correspondence to Gilles Boyé.


About this article


Cite this article

Boyé, G., Schalchli, G. Realistic data and paradigms: the paradigm cell finding problem. Morphology 29, 199–248 (2019).



  • Inflectional morphology
  • Derivational morphology
  • Abstractive approach
  • Realistic data
  • Paradigms
  • Paradigm cell filling problem
  • Paradigm cell finding problem