The Paradigm Cell Filling Problem (pcfp): “What licenses reliable inferences about the inflected (and derived) surface forms of a lexical item?” (Ackerman et al. 2009, p. 54) has received considerable attention during the last decade. The two main approaches that have been explored are the information-theoretic approach, which aims to measure the information contained in the implicative relations between the cells of a paradigm, and the neural network approach, which takes an amorphous view of morphology and tries to learn paradigms from surface forms. In this paper I present a third alternative, based on analogical classification, which integrates elements from both approaches. I present a case study of the Russian nominal inflection system, and argue that implicative relations between markers, noun semantics and stem phonology all play a role in helping speakers solve the pcfp.
Even if lexemes can only inflect for two different cells, many low frequency lexemes (e.g. hapax legomena) will only appear in one inflected form.
This is, strictly speaking, not completely true. Some early connectionist models (McClelland and Rumelhart 1986) could be thought of as direct predecessors of the RNN approach. However, the idea of the pcfp had not been articulated precisely in these earlier studies.
Where ’ marks palatalization.
It is not very clear how the authors define the term irregular. It seems to include what they call unpredictable stem alternations like diphthongization in Spanish (poder ‘can’ inf ↔ puedo 1.sg), and cases like the Italian rendere ‘make’ ↔ reso ‘made’, which are hardly irregular and not at all unpredictable (Albright et al. 2001; Guzmán Naranjo 2019).
This is not necessarily the case. There are versions of Word and Paradigm morphology which allow for linguistic abstractions without the need for affixes, stems or morphemes (Guzmán Naranjo and Becker forthcoming).
I will use the term analogy to refer exclusively to analogical classification and not proportional analogy (Blevins 2016).
Where K stands for a velar and N stands for a nasal.
One could argue that the kind of output-oriented schemata which Croft and Cruse (2004, Chap. 11.2-11.3) argue for is a kind of analogical schema which takes fully inflected forms into account. However, it is not completely clear how output-oriented schemata could be implemented computationally, or whether they could handle sublexical units.
Other metrics like Cohen’s d, F-score or kappa score are alternatives to accuracy. I use accuracy in this paper because it has the most intuitive interpretation, but other metrics would work equally well.
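As a minimal illustration of the metric used here (the class labels and predictions below are invented for the example, not taken from the models in the paper), accuracy is simply the proportion of lexemes assigned to the correct inflection class:

```python
# Hypothetical gold-standard and predicted inflection-class labels.
gold = ["Ia", "Ib", "II", "Ia", "III", "II"]
pred = ["Ia", "II", "II", "Ia", "III", "Ib"]

# Accuracy: share of items where prediction matches the gold label.
accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
print(accuracy)  # 4 of 6 correct -> 0.666...
```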
Although a detailed comparison of machine learning algorithms for modelling analogy is outside the scope of this paper, I tested an LSTM network on a couple of selected models but could not reach a higher accuracy than that of the MLP. Extreme gradient boosting trees achieved accuracies similar to the MLP in most cases, but worse accuracies in others.
Notice I make no mention of morphemes or rules at any point, and markers are neither morphemes nor rules, simply contrasting material. The present approach is agnostic about how we construct the inflected forms, it only requires that we assign lexemes to inflection classes.
Notice that I do not include the vocative (Andersen 2012).
It does not provide forms for the vocative.
For the current study I allowed the class extraction to take palatalization into account. Abstracting away palatalization produces only slightly different results.
This happens when the singular and plural have different bases. For example, for получеловек (‘half-man’), the nom.sg and acc.sg are poluʨelovjek and poluʨelovjeka respectively, while the nom.pl and acc.pl are poluljudji and poluljudjej respectively. In cases like this, the algorithm does not identify that the noun should have two different stems, one for the singular (poluʨelovjek) and one for the plural poluljudj. Instead, it only identifies the common sequence polu as the stem. Because of this, the marker extraction produces a class which only fits this noun.
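The failure described in this footnote can be reproduced with a simple sketch of prefix-based stem extraction (this is my reconstruction for illustration, not the author's exact algorithm): taking the stem to be the longest common prefix of all paradigm forms, and the markers to be whatever remains, works for ordinary nouns but collapses when singular and plural have different bases:

```python
import os

def extract_class(forms):
    """Stem = longest common prefix of all forms; markers = the remainders."""
    stem = os.path.commonprefix(forms)
    return stem, [f[len(stem):] for f in forms]

# Regular noun: all forms share one base, so stem and markers come out cleanly.
print(extract_class(["zakon", "zakona", "zakony", "zakonov"]))
# -> ('zakon', ['', 'a', 'y', 'ov'])

# The footnote's case: singular and plural bases differ, so only 'polu' is
# shared and the "markers" are idiosyncratic, yielding a one-noun class.
print(extract_class(["poluʨelovjek", "poluʨelovjeka", "poluljudji", "poluljudjej"]))
```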
A potential solution to this issue would be a hybrid approach which memorizes low frequency classes, and only attempts analogical classification for classes with enough items.
Two markers separated by a comma indicate overabundance.
More precisely, the ruwikiruscorpora-func_upos_skipgram_300_5_2019 semantic vector data-set, downloaded from http://rusvectores.org/en/models/, accessed 17.06.2019.
The technique used by word2vec approaches is not the same as the distributional methods shown in Figs. 2 and 3. Instead, in word2vec with the skip-gram method a neural network learns to predict the surrounding context of a word (within some fixed window). The weights induced by the network are the semantic vectors (usually 300 or 500 dimensions).
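The skip-gram setup can be sketched by showing how (target, context) training pairs are generated from a window around each word; the toy sentence and window size below are my own assumptions, chosen only to make the mechanism concrete:

```python
def skipgram_pairs(tokens, window):
    """Generate (target, context) pairs within a fixed window around each token."""
    pairs = []
    for i, target in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs

sent = ["the", "cat", "sat", "down"]
print(skipgram_pairs(sent, 1))
# [('the', 'cat'), ('cat', 'the'), ('cat', 'sat'),
#  ('sat', 'cat'), ('sat', 'down'), ('down', 'sat')]
```

In the actual word2vec model, a shallow network is trained to predict the context word from the target word over millions of such pairs, and its hidden-layer weights become the word vectors.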
I chose the singular nominative for no other reason than that it was the form which provided the most lemmas. This of course raises the issue that the vectors also contain semantic information about the nominative itself. There is no good way around this problem without training word vectors on lemmatised corpora. Nevertheless, because all words are represented in the nominative, and because the vectors have no morphological information, this should not bias the results.
These distributions are not very precise in that it is not possible to properly discriminate between syncretic forms. This means that if two cells of the paradigm of a lexeme are identical, seeing only one of them in the corpus is counted as having seen both. The actual number of observed forms is thus somewhat inflated.
A recent attempt at developing such a method is found in Beniamine (2018); however, Beniamine’s method cannot handle processes like reduplication.
The models consider palatalization as an independent value.
Because of the large number of items, the 95% confidence intervals for all accuracy metrics of the models are within +/-0.002 of the reported accuracy.
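The tightness of these intervals follows from the normal approximation to the binomial: the half-width of a 95% confidence interval for an accuracy estimate is 1.96·√(p(1−p)/n), which shrinks with the square root of the sample size. A minimal sketch (the p and n values are hypothetical, not the paper's figures):

```python
import math

def ci_half_width(p, n):
    """95% CI half-width for a proportion p estimated from n items
    (normal approximation to the binomial)."""
    return 1.96 * math.sqrt(p * (1 - p) / n)

print(round(ci_half_width(0.9, 1000), 4))    # ~0.0186
print(round(ci_half_width(0.9, 100000), 4))  # ~0.0019
```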
Since every response cell now has 12 models, it would take too much space to present them all. Instead, I averaged the accuracy of all 12 models for each cell.
This is only true generally speaking: it is possible for a model to have so many bad predictors that they drown out the information in the good ones.
Notice each model takes about an hour to fit.
Although it would be possible to add more cells, as already discussed, the actual improvement from adding more cells is small.
This general correlation is obtained by flattening each matrix into a vector and calculating the correlation between both resulting vectors. Notice correlation values are negative because a high entropy value means a low accuracy value (and the other way around).
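The flatten-then-correlate step can be sketched as follows (the toy 2×2 matrices and their values are invented for illustration; the paper's matrices are of course larger):

```python
def pearson(x, y):
    """Pearson's r between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

entropy = [[0.1, 0.8], [0.5, 0.9]]
accuracy = [[0.95, 0.60], [0.80, 0.55]]

# Flatten each matrix into a vector, then correlate the two vectors.
flat_e = [v for row in entropy for v in row]
flat_a = [v for row in accuracy for v in row]
print(pearson(flat_e, flat_a))  # negative: high entropy goes with low accuracy
```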
A Bayesian correlation test is more or less equivalent to a Pearson correlation test: it assesses whether the two correlation values are likely to be different or not. The difference is that the Bayesian method gives us posterior probabilities, which have a simpler and more intuitive interpretation than p-values.
The generalized version of this approach is described by Guzmán Naranjo and Becker (forthcoming).
I include the six final segments of the word instead of only four because we now have to account for the segments in the markers.
Ackerman, F., Blevins, J. P., & Malouf, R. (2009). Parts and wholes: Implicative patterns in inflectional paradigms. In J. P. Blevins & J. Blevins (Eds.), Analogy in grammar: Form and acquisition (pp. 54–82). Oxford, New York: Oxford University Press.
Ackerman, F., & Malouf, R. (2013). Morphological organization: The low conditional entropy conjecture. Language, 89(3), 429–464.
Ackerman, F., & Malouf, R. (2016). Word and pattern morphology: An information-theoretic approach. Word Structure, 9(2), 125–131.
Albright, A., Andrade, A., & Hayes, B. (2001). Segmental environments of Spanish diphthongization. UCLA Working Papers in Linguistics, 7(5), 117–151.
Andersen, H. (2012). The new Russian vocative: Synchrony, diachrony, typology. Scando-Slavica, 58(1), 122–167.
Arndt-Lappe, S. (2011). Towards an exemplar-based model of stress in English noun-noun compounds. Journal of Linguistics, 47(3), 549–585.
Arndt-Lappe, S. (2014). Analogy in suffix rivalry: The case of English -ity and -ness. English Language and Linguistics, 18(3), 497–548.
Baayen, R. H., et al. (2011). An amorphous model for morphological processing in visual comprehension based on naive discriminative learning. Psychological Review, 118(3), 438–481.
Beniamine, S. (forthcoming). One lexeme, many classes: Inflection class systems as lattices. In B. Crysmann & M. Sailer (Eds.), One-to-many relations in morphology, syntax and semantics. Berlin: Language Science Press.
Beniamine, S. (2018). Classifications flexionnelles: Étude quantitative des structures de paradigmes. PhD thesis. Paris: Université Sorbonne Paris Cité – Paris Diderot.
Blevins, J. P. (2013). The information-theoretic turn. Psihologija, 46(3), 355–375.
Blevins, J. P. (2016). Word and paradigm morphology. Oxford: Oxford University Press.
Blevins, J. P., Milin, P., & Ramscar, M. (2017). The zipfian paradigm cell filling problem. In F. Kiefer, J. P. Blevins, & H. Bartos (Eds.), Perspectives on morphological organization (pp. 139–158). Leiden, The Netherlands: Brill.
Bonami, O., & Beniamine, S. (2016). Joint predictiveness in inflectional paradigms. Word Structure, 9(2), 156–182.
Boyé, G., & Schalchli, G. (2019). Realistic data and paradigms: The paradigm cell finding problem. Morphology, 29(2), 199–248.
Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32.
Bürkner, P.-C. (2017). Brms: An R package for Bayesian multilevel models using stan. Journal of Statistical Software, 80(1), 1–28.
Bürkner, P.-C. (2018). Advanced Bayesian multilevel modeling with the R package brms. The R Journal, 10(1), 395–411.
Bybee, J. L., & Slobin, D. I. (1982). Rules and schemas in the development and use of the English past tense. Language, 58(2), 265–289.
Cardillo, A. F., et al. (2018). Deep learning of inflection and the cell-filling problem. IJCoL – Italian Journal of Computational Linguistics, 4(1), 57–75.
Carpenter, B., et al. (2017). Stan: A probabilistic programming language. Journal of Statistical Software, 76(1), 1–32.
Chen, T., & Guestrin, C. (2016). XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. KDD ’16 (pp. 785–794). New York, NY, USA: ACM.
Clark, S. (2015). Vector space models of lexical meaning. In S. Lappin & C. Fox (Eds.), Blackwell handbooks in linguistics, vol. 10: handbook of contemporary semantics (pp. 493–522). Malden, MA: Wiley.
Corbett, G. G. (1982). Gender in Russian: An account of gender specification and its relationship to declension. Russian Linguistics, 6(2), 197–232.
Corbett, G. G. (1991). Gender. Cambridge textbooks in linguistics. Cambridge: Cambridge University Press.
Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273–297.
Cotterell, R., et al. (2019). On the complexity and typology of inflectional morphological systems. Transactions of the Association for Computational Linguistics, 7, 327–342.
Croft, W., & Cruse, A. D. (2004). Cognitive linguistics. Cambridge textbooks in linguistics. Cambridge, MA: Cambridge University Press.
Eddington, D. (2002). Spanish gender assignment in an analogical framework. Journal of Quantitative Linguistics, 9(1), 49–75.
Erk, K. (2012). Vector space models of word meaning and phrase meaning: A survey. Language and Linguistics Compass, 6(10), 635–653.
Firth, J. R. (1957). A synopsis of linguistic theory, 1930–1955. Studies in Linguistic Analysis, 1–32.
Fraser, N. M., & Corbett, G. G. (1995). Gender, animacy, and declensional class assignment: A unified account for Russian. Yearbook of morphology 1994 (pp. 123–150). Dordrecht: Springer.
Guzmán Naranjo, M. (2019). Analogical classification in formal grammar: Empirically oriented theoretical morphology and syntax. Berlin: Language Science Press.
Guzmán Naranjo, M., & Becker, L. (forthcoming). Coding efficiency in nominal inflection: Expectedness and type frequency effects. Linguistics Vanguard.
Harris, Z. S. (1954). Distributional structure. Word, 10(2–3), 146–162.
Köpcke, K.-M. (1988). Schemas in German plural formation. Lingua, 74(4), 303–335.
Köpcke, K.-M. (1998). Prototypisch starke und schwache Verben der Deutschen Gegenwartssprache. Germanistische Linguistik, 141(142), 45–60.
Köpcke, K.-M., & Zubin, D. A. (1984). Sechs Prinzipien für die Genuszuweisung im Deutschen: Ein Beitrag zur natürlichen Klassifikation. Linguistische Berichte, 93, 26–50.
Kutuzov, A., & Kuzmenko, E. (2017). WebVectors: A toolkit for building web interfaces for vector semantic models. In D. I. Ignatov et al. (Eds.), Analysis of images, social networks and texts, revised selected papers, 5th international conference, AIST 2016, Yekaterinburg, Russia, April 7–9, 2016 (pp. 155–161). Cham: Springer.
Lapesa, G., & Evert, S. (2014). A large scale evaluation of distributional semantic models: Parameters, interactions and model selection. Transactions of the Association for Computational Linguistics, 2, 531–546.
Lenci, A. (2008). Distributional semantics in linguistic and cognitive research. Italian Journal of Linguistics, 20(1), 1–31.
Lewandowski, D., Kurowicka, D., & Joe, H. (2009). Generating random correlation matrices based on vines and extended onion method. Journal of Multivariate Analysis, 100(9), 1989–2001.
Malouf, R. (2016). Generating morphological paradigms with a recurrent neural network. San Diego Linguistic Papers, 6, 122–129.
Marelli, M., & Baroni, M. (2015). Affixation in semantic space: Modeling morpheme meanings with compositional distributional semantics. Psychological Review, 122(3), 485–515.
Marzi, C., et al. (2018). Evaluating inflectional complexity crosslinguistically: A processing perspective. In Proceedings of the 11th international conference on language resources and evaluation (LREC 2018). European Language Resources Association.
Matthews, C. A. (2005). French gender attribution on the basis of similarity: A comparison between AM and connectionist models. Journal of Quantitative Linguistics, 12, 262–296.
Matthews, C. A. (2010). On the nature of phonological cues in the acquisition of French gender categories: Evidence from instance-based learning models. Lingua, 120(4), 879–900.
Matthews, C. A. (2013). On the analogical modelling of the English past-tense: A critical assessment. Lingua, 133, 360–373.
McClelland, J. L., & Rumelhart, D. E. (1986). A distributed model of human learning and memory. In J. L. McClelland & D. E. Rumelhart (Eds.), Parallel distributed processing: explorations in the microstructure of cognition, vol. 2: psychological and biological models (pp. 170–215). Cambridge: MIT Press.
Mikolov, T., et al. (2013). Efficient estimation of word representations in vector space. ArXiv preprint. arXiv:1301.3781.
Mortensen, D. R., Dalmia, S., & Littell, P. (2018). Epitran: Precision G2P for many languages. In Proceedings of the eleventh international conference on language resources and evaluation, LREC-2018.
Pal, S. K., & Mitra, S. (1992). Multilayer perceptron, fuzzy sets, and classification. IEEE Transactions on Neural Networks, 3(5), 683–697.
Parker, J., & Sims, A. (2019). Irregularity, paradigmatic layers, and the complexity of inflection class systems: A study of Russian nouns. In P. Arkadiev & F. Gardani (Eds.), The complexities of morphology. Oxford: Oxford University Press.
Rosen, E. R. (2019). Learning complex inflectional paradigms through blended gradient inputs. In Proceedings of the Society for Computation in Linguistics, SCiL (Vol. 2, pp. 102–112).
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986a). Learning internal representations by error propagation. In D. E. Rumelhart & J. L. McClelland (Eds.), Parallel distributed processing: explorations in the microstructure of cognition, vol. 1: foundations. Cambridge: MIT Press.
Rumelhart, D. E., et al. (1986b). On learning the past tenses of English verbs. In Parallel distributed processing: explorations in the microstructure of cognition, vol. 2: psychological and biological models, Cambridge: MIT Press.
Shannon, C. (1948). A mathematical theory of communication. The Bell System Technical Journal, 27, 379–423, 623–656.
Shavrina, T., & Shapovalova, O. (2017). To the methodology of corpus construction for machine learning: “Taiga” syntax tree corpus and parser. In Proceedings of the international conference “Corpus linguistics 2017” (pp. 78–84). St. Petersburg: St. Petersburg State University.
Silfverberg, M., Liu, L., & Hulden, M. (2018). A computational model for the linguistic notion of morphological paradigm. In Proceedings of the 27th international conference on computational linguistics. Association for computational linguistics (pp. 1615–1626).
Skousen, R., Lonsdale, D., & Parkinson, D. B. (2002). Analogical modeling: An exemplar-based approach to language. Amsterdam: John Benjamins.
Strobl, C., Malley, J., & Tutz, G. (2009). An introduction to recursive partitioning: Rationale, application, and characteristics of classification and regression trees, bagging, and random forests. Psychological Methods, 14, 323–348.
Stump, G. T., & Finkel, R. (2013). Morphological typology: From word to paradigm. Cambridge studies in linguistics: Vol. 138. Cambridge: Cambridge University Press.
Turney, P. D., & Pantel, P. (2010). From frequency to meaning: Vector space models of semantics. Journal of Artificial Intelligence Research, 37, 141–188.
Viks, Ü. (1994). A morphological analyzer for the Estonian language: The possibilities and impossibilities of automatic analysis. Automatic Morphology of Estonian, 1, 7–28.
Zaliznyak, A. A. (1967). Russkoe imennoe slovoizmenenie [Russian nominal inflection]. Moscow: Nauka.
Zaliznyak, A. A. (1977). Grammatical dictionary of the Russian language. Moscow: Russkij Jazyk.
I would like to thank Sacha Beniamine and Olivier Bonami for their help and useful comments. All errors are mine. This work was supported by a public grant overseen by the French National Research Agency (ANR) as part of the “Investissements d’Avenir” program (reference: ANR-10-LABX-0083).
Guzmán Naranjo, M. Analogy, complexity and predictability in the Russian nominal inflection system. Morphology 30, 219–262 (2020). https://doi.org/10.1007/s11525-020-09367-1
- Paradigm cell filling problem
- Information theory
- Paradigm organization