Analogy, complexity and predictability in the Russian nominal inflection system

Abstract

The Paradigm Cell Filling Problem (PCFP), “What licenses reliable inferences about the inflected (and derived) surface forms of a lexical item?” (Ackerman et al. 2009, p. 54), has received considerable attention during the last decade. The two main approaches that have been explored are the information-theoretic approach, which aims to measure the information contained in the implicative relations between the cells of a paradigm, and the neural network approach, which takes an amorphous view of morphology and tries to learn paradigms from surface forms. In this paper I present a third alternative, based on analogical classification, which tries to integrate elements from both approaches. I present a case study on the Russian nominal inflection system and argue that implicative relations between markers, noun semantics and stem phonology all play a role in helping speakers solve the PCFP.

Notes

  1.

    Even if lexemes can only inflect for two different cells, many low frequency lexemes (e.g. hapax legomena) will only appear in one inflected form.

  2.

    This is, strictly speaking, not completely true. Some early connectionist models (McClelland and Rumelhart 1986) could be thought of as direct predecessors of the RNN approach. However, the idea of the PCFP had not been articulated precisely in these earlier studies.

  3.

    For a more detailed description of the different approaches to using entropy to quantify inflectional complexity see Cotterell et al. (2019) and Parker and Sims (2019).

  4.

    Where ’ marks palatalization.

  5.

    It is not very clear how the authors define the term irregular. It seems to include what they call unpredictable stem alternations, like diphthongization in Spanish (poder ‘can’ inf ↔ puedo 1.sg), and cases like the Italian rendere ‘make’ ↔ reso ‘made’, which are hardly irregular and not at all unpredictable (Albright et al. 2001; Guzmán Naranjo 2019).

  6.

    This is not necessarily the case. There are versions of Word and Paradigm morphology which allow for linguistic abstractions without the need of affixes, stems or morphemes (Guzmán Naranjo and Becker forthcoming).

  7.

    I will use the term analogy to refer exclusively to analogical classification and not proportional analogy (Blevins 2016).

  8.

    Where K stands for a velar and N stands for a nasal.

  9.

    One could argue that the kind of output-oriented schemata which Croft and Cruse (2004, Chap. 11.2-11.3) argue for is a kind of analogical schema which takes fully inflected forms into account. However, it is not completely clear how output-oriented schemata could be implemented computationally, and whether they could handle sublexical units.

  10.

    Other metrics, like Cohen’s D, F score or Kappa score, are alternatives to accuracy. I will use accuracy in this paper because it has the most intuitive interpretation, but other metrics would work equally well.

  11.

    A detailed mathematical explanation of how perceptrons work is outside the scope of this paper. For an in-depth discussion of perceptrons see Rumelhart et al. (1986b) and Pal and Mitra (1992).

  12.

    Although a detailed comparison of machine learning algorithms for modelling analogy is outside the scope of this paper, I tested an LSTM network on a couple of selected models but could not reach a higher accuracy than that of the MLP. Extreme gradient boosting trees achieve accuracies similar to those of the MLP in most cases, but lower accuracies in others.

  13.

    Notice I make no mention of morphemes or rules at any point, and markers are neither morphemes nor rules, simply contrasting material. The present approach is agnostic about how we construct the inflected forms; it only requires that we assign lexemes to inflection classes.

  14.

    Notice that I do not include the vocative (Andersen 2012).

  15.

    It does not provide forms for the vocative.

  16.

    The software for automatic marker extraction was first introduced in Guzmán Naranjo and Becker (forthcoming) and can be found at: https://gitlab.com/mguzmann89/paradigma.

  17.

    For the current study I allowed the class extraction to take palatalization into account. Abstracting away palatalization produces only slightly different results.

  18.

    This happens when the singular and plural have different bases. For example, for получеловек (‘half-man’), the nom.sg and acc.sg are poluʨelovjek and poluʨelovjeka respectively, while the nom.pl and acc.pl are poluljudji and poluljudjej respectively. In cases like this, the algorithm does not identify that the noun should have two different stems, one for the singular (poluʨelovjek) and one for the plural (poluljudj). Instead, it only identifies the common sequence polu as the stem. Because of this, the marker extraction produces a class which only fits this noun.

  19.

    A potential solution to this issue would be a hybrid approach which memorizes low frequency classes, and only attempts analogical classification for classes with enough items.

  20.

    Two markers separated by a comma indicate overabundance.

  21.

    More precisely, the ruwikiruscorpora-func_upos_skipgram_300_5_2019 semantic vector data-set, downloaded from http://rusvectores.org/en/models/, accessed 17.06.2019.

  22.

    The technique used by word2vec approaches is not the same as with distributional methods as shown in Figs. 2 and 3. Instead, in word2vec with the skip-gram method a neural network learns to predict the surrounding context (in some fixed window, usually 300 or 500) of a word. The weights induced by the network are the semantic vectors.

  23.

    I chose the singular nominative for no other reason than that it was the form which provided the most lemmas. This of course raises the issue that the vectors also contain semantic information about the nominative itself. There is no good way around this problem without training word vectors on lemmatised corpora. Nevertheless, because all words are represented in the nominative, and because the vectors have no morphological information, this should not bias the results.

  24.

    These distributions are not very precise in that it is not possible to properly discriminate between syncretic forms. This means that if two cells of the paradigm of a lexeme are identical, seeing only one of them in the corpus is counted as having seen both. The actual number of observed forms is thus somewhat inflated.

  25.

    A recent attempt at developing such a method is found in Beniamine (2018); however, Beniamine’s method cannot handle processes like reduplication.

  26.

    The models consider palatalization as an independent value.

  27.

    Because of the large number of items, the 95% confidence intervals for all accuracy metrics of the models are within ±0.002 of the reported accuracy.

  28.

    Since every response cell now has 12 models, it would take too much space to present all models. Instead, I averaged the accuracy for all 12 models for each cell.

  29.

    This is only true generally speaking. It is possible for a model to have too many bad predictors which could drown the information in the good predictors.

  30.

    Notice each model takes about an hour to fit.

  31.

    Although it would be possible to add more cells, as already discussed, the actual improvement from adding more cells is small.

  32.

    This general correlation is obtained by flattening each matrix into a vector and calculating the correlation between both resulting vectors. Notice correlation values are negative because a high entropy value means a low accuracy value (and the other way around).

  33.

    A Bayesian correlation test is more or less equivalent to a Pearson correlation test. A Bayesian correlation test compares whether the two correlation values are likely to be different or not. The difference is that the Bayesian method gives us posterior probabilities, which have a simpler and more intuitive interpretation than p-values.

  34.

    The actual posterior probability that the correlation in 24-1 was equal to the correlations in 24-2 or 24-3 was 0%.

  35.

    Notice also that because the data-set Malouf uses consists of only the most frequent Russian nouns, the number of classes found in it is smaller than what is found in the larger Zaliznyak (1977) data-set (Parker and Sims 2019).

  36.

    The generalized version of this approach is described by Guzmán Naranjo and Becker (forthcoming).

  37.

    I include the six final segments of the word instead of only four because we now have to account for the segments in the markers.
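
Note 32 describes the general correlation as the result of flattening each matrix into a vector and correlating the two vectors. A minimal sketch of that step in Python follows, assuming a plain Pearson coefficient and matrices stored as lists of rows. The function names and the numbers are invented for illustration and are not taken from the study.

```python
from math import sqrt


def flatten(matrix):
    """Flatten a list-of-rows matrix into a single list, row by row."""
    return [value for row in matrix for value in row]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(ss_x * ss_y)


def matrix_correlation(entropy, accuracy):
    """Correlate two same-shaped matrices cell by cell after flattening."""
    if [len(row) for row in entropy] != [len(row) for row in accuracy]:
        raise ValueError("matrices must have the same shape")
    return pearson(flatten(entropy), flatten(accuracy))


# Invented 3x3 values (rows: predictor cell, columns: response cell).
entropy = [[0.10, 0.50, 0.90],
           [0.40, 0.20, 0.70],
           [0.80, 0.60, 0.30]]
accuracy = [[0.98, 0.85, 0.60],
            [0.88, 0.95, 0.72],
            [0.65, 0.78, 0.92]]

r = matrix_correlation(entropy, accuracy)
print(f"r = {r:.3f}")
```

With these toy values the coefficient comes out negative, which matches the note: cells with high entropy are the ones with low accuracy.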

References

  1. Ackerman, F., Blevins, J. P., & Malouf, R. (2009). Parts and wholes: Implicative patterns in inflectional paradigms. In J. P. Blevins & J. Blevins (Eds.), Analogy in grammar: Form and acquisition (pp. 54–82). Oxford, New York: Oxford University Press.

  2. Ackerman, F., & Malouf, R. (2013). Morphological organization: The low conditional entropy conjecture. Language, 89(3), 429–464.

  3. Ackerman, F., & Malouf, R. (2016). Word and pattern morphology: An information-theoretic approach. Word Structure, 9(2), 125–131.

  4. Albright, A., Andrade, A., & Hayes, B. (2001). Segmental environments of Spanish diphthongization. UCLA Working Papers in Linguistics, 7(5), 117–151.

  5. Andersen, H. (2012). The new Russian vocative: Synchrony, diachrony, typology. Scando-Slavica, 58(1), 122–167.

  6. Arndt-Lappe, S. (2011). Towards an exemplar-based model of stress in English noun-noun compounds. Journal of Linguistics, 47(3), 549–585.

  7. Arndt-Lappe, S. (2014). Analogy in suffix rivalry: The case of English -ity and -ness. English Language and Linguistics, 18(3), 497–548.

  8. Baayen, R. H., et al. (2011). An amorphous model for morphological processing in visual comprehension based on naive discriminative learning. Psychological Review, 118(3), 438–481.

  9. Beniamine, S. (forthcoming). One lexeme, many classes: Inflection class systems as lattices. In B. Crysmann & M. Sailer (Eds.), One-to-many relations in morphology, syntax and semantics. Berlin: Language Science Press.

  10. Beniamine, S. (2018). Classifications flexionnelles: Étude quantitative des structures de paradigmes. PhD thesis. Paris: Université Sorbonne Paris Cité – Paris Diderot.

  11. Blevins, J. P. (2013). The information-theoretic turn. Psihologija, 46(3), 355–375.

  12. Blevins, J. P. (2016). Word and paradigm morphology. Oxford: Oxford University Press.

  13. Blevins, J. P., Milin, P., & Ramscar, M. (2017). The zipfian paradigm cell filling problem. In F. Kiefer, J. P. Blevins, & H. Bartos (Eds.), Perspectives on morphological organization (pp. 139–158). Leiden, The Netherlands: Brill.

  14. Bonami, O., & Beniamine, S. (2016). Joint predictiveness in inflectional paradigms. Word Structure, 9(2), 156–182.

  15. Boyé, G., & Schalchli, G. (2019). Realistic data and paradigms: The paradigm cell finding problem. Morphology, 29(2), 199–248.

  16. Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32.

  17. Bürkner, P.-C. (2017). Brms: An R package for Bayesian multilevel models using stan. Journal of Statistical Software, 80(1), 1–28.

  18. Bürkner, P.-C. (2018). Advanced Bayesian multilevel modeling with the R package brms. The R Journal, 10(1), 395–411.

  19. Bybee, J. L., & Slobin, D. I. (1982). Rules and schemas in the development and use of the English past tense. Language, 58(2), 265–289.

  20. Cardillo, A. F., et al. (2018). Deep learning of inflection and the cell-filling problem. Journal of Computational Linguistics, 4(1), 57–75.

  21. Carpenter, B., et al. (2017). Stan: A probabilistic programming language. Journal of Statistical Software, 76(1), 1–32.

  22. Chen, T., & Guestrin, C. (2016). XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. KDD ’16 (pp. 785–794). New York, NY, USA: ACM.

  23. Clark, S. (2015). Vector space models of lexical meaning. In S. Lappin & C. Fox (Eds.), Blackwell handbooks in linguistics, vol. 10: handbook of contemporary semantics (pp. 493–522). Malden, MA: Wiley.

  24. Corbett, G. G. (1982). Gender in Russian: An account of gender specification and its relationship to declension. Russian Linguistics, 6(2), 197–232.

  25. Corbett, G. G. (1991). Gender. Cambridge textbooks in linguistics. Cambridge: Cambridge University Press.

  26. Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273–297.

  27. Cotterell, R., et al. (2019). On the complexity and typology of inflectional morphological systems. Transactions of the Association for Computational Linguistics, 7, 327–342.

  28. Croft, W., & Cruse, A. D. (2004). Cognitive linguistics. Cambridge textbooks in linguistics. Cambridge, MA: Cambridge University Press.

  29. Eddington, D. (2002). Spanish gender assignment in an analogical framework. Journal of Quantitative Linguistics, 9(1), 49–75.

  30. Erk, K. (2012). Vector space models of word meaning and phrase meaning: A survey. Language and Linguistics Compass, 6(10), 635–653.

  31. Firth, J. R. (1957). A synopsis of linguistic theory, 1930–1955. Studies in Linguistic Analysis, 1–32.

  32. Fraser, N. M., & Corbett, G. G. (1995). Gender, animacy, and declensional class assignment: A unified account for Russian. Yearbook of morphology 1994 (pp. 123–150). Dordrecht: Springer.

  33. Guzmán Naranjo, M. (2019). Analogical classification in formal grammar: Empirically oriented theoretical morphology and syntax. Berlin: Language Science Press.

  34. Guzmán Naranjo, M., & Becker, L. (forthcoming). Coding efficiency in nominal inflection: Expectedness and type frequency effects. Linguistics Vanguard.

  35. Harris, Z. S. (1954). Distributional structure. Word, 10(2–3), 146–162.

  36. Köpcke, K.-M. (1988). Schemas in German plural formation. Lingua, 74(4), 303–335.

  37. Köpcke, K.-M. (1998). Prototypisch starke und schwache Verben der Deutschen Gegenwartssprache. Germanistische Linguistik, 141–142, 45–60.

  38. Köpcke, K.-M., & Zubin, D. A. (1984). Sechs Prinzipien für die Genuszuweisung im Deutschen: Ein Beitrag zur natürlichen Klassifikation. Linguistische Berichte, 93, 26–50.

  39. Kutuzov, A., & Kuzmenko, E. (2017). WebVectors: A toolkit for building web interfaces for vector semantic models. In D. I. Ignatov et al. (Eds.), Analysis of images, social networks and texts, revised selected papers, 5th international conference, AIST 2016, Yekaterinburg, Russia, April 7–9, 2016 (pp. 155–161). Cham: Springer.

  40. Lapesa, G., & Evert, S. (2014). A large scale evaluation of distributional semantic models: Parameters, interactions and model selection. Transactions of the Association for Computational Linguistics, 2, 531–546.

  41. Lenci, A. (2008). Distributional semantics in linguistic and cognitive research. Italian Journal of Linguistics, 20(1), 1–31.

  42. Lewandowski, D., Kurowicka, D., & Joe, H. (2009). Generating random correlation matrices based on vines and extended onion method. Journal of Multivariate Analysis, 100(9), 1989–2001.

  43. Malouf, R. (2016). Generating morphological paradigms with a recurrent neural network. San Diego Linguistic Papers, 6, 122–129.

  44. Marelli, M., & Baroni, M. (2015). Affixation in semantic space: Modeling morpheme meanings with compositional distributional semantics. Psychological Review, 122(3), 485–515.

  45. Marzi, C., et al. (2018). Evaluating inflectional complexity crosslinguistically: A processing perspective. In Proceedings of the 11th language resources and evaluation conference: European language resource association.

  46. Matthews, C. A. (2005). French gender attribution on the basis of similarity: A comparison between AM and connectionist models. Journal of Quantitative Linguistics, 12, 262–296.

  47. Matthews, C. A. (2010). On the nature of phonological cues in the acquisition of French gender categories: Evidence from instance-based learning models. Lingua, 120(4), 879–900.

  48. Matthews, C. A. (2013). On the analogical modelling of the English past-tense: A critical assessment. Lingua, 133, 360–373.

  49. McClelland, J. L., & Rumelhart, D. E. (1986). A distributed model of human learning and memory. In J. L. McClelland & D. E. Rumelhart (Eds.), Parallel distributed processing: explorations in the microstructure of cognition, vol. 2: psychological and biological models (pp. 170–215). Cambridge: MIT Press.

  50. Mikolov, T., et al. (2013). Efficient estimation of word representations in vector space. ArXiv preprint. arXiv:1301.3781.

  51. Mortensen, D. R., Dalmia, S., & Littell, P. (2018). Epitran: Precision G2P for many languages. In Proceedings of the eleventh international conference on language resources and evaluation, LREC-2018.

  52. Pal, S. K., & Mitra, S. (1992). Multilayer perceptron, fuzzy sets, and classification. IEEE Transactions on Neural Networks, 3(5), 683–697.

  53. Parker, J., & Sims, A. (2019). Irregularity, paradigmatic layers, and the complexity of inflection class systems: A study of Russian nouns. In P. Arkadiev & F. Gardani (Eds.), The complexities of morphology.

  54. Rosen, E. R. (2019). Learning complex inflectional paradigms through blended gradient inputs. In Proceedings of the Society for Computation in Linguistics, SCiL (Vol. 2, pp. 102–112).

  55. Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986a). Learning internal representations by error propagation. In D. E. Rumelhart & J. L. McClelland (Eds.), Parallel distributed processing: explorations in the microstructure of cognition, vol. 1: foundations. Cambridge: MIT Press.

  56. Rumelhart, D. E., et al. (1986b). On learning the past tenses of English verbs. In Parallel distributed processing: explorations in the microstructure of cognition, vol. 2: psychological and biological models. Cambridge: MIT Press.

  57. Shannon, C. (1948). A mathematical theory of communication. The Bell System Technical Journal 27, 379–423, 623–656.

  58. Shavrina, T., & Shapovalova, O. (2017). To the methodology of corpus construction for machine learning: “Taiga” syntax tree corpus and parser. In Proceedings of the international conference “Corpus linguistics 2017” (pp. 78–84). St. Petersburg: St. Petersburg State University.

  59. Silfverberg, M., Liu, L., & Hulden, M. (2018). A computational model for the linguistic notion of morphological paradigm. In Proceedings of the 27th international conference on computational linguistics. Association for computational linguistics (pp. 1615–1626).

  60. Skousen, R., Lonsdale, D., & Parkinson, D. B. (2002). Analogical modeling: An exemplar-based approach to language. Amsterdam: John Benjamins.

  61. Strobl, C., Malley, J., & Tutz, G. (2009). An introduction to recursive partitioning: Rationale, application, and characteristics of classification and regression trees, bagging, and random forests. Psychological Methods, 14, 323–348.

  62. Stump, G. T., & Finkel, R. (2013). Morphological typology: From word to paradigm. Cambridge studies in linguistics: Vol. 138. Cambridge: Cambridge University Press.

  63. Turney, P. D., & Pantel, P. (2010). From frequency to meaning: Vector space models of semantics. Journal of Artificial Intelligence Research, 37, 141–188.

  64. Viks, Ü. (1994). A morphological analyzer for the Estonian language: The possibilities and impossibilities of automatic analysis. Automatic Morphology of Estonian, 1, 7–28.

  65. Zaliznyak, A. A. (1967). Russkoe imennoe slovoizmenenie. Moscow: Nauka.

  66. Zaliznyak, A. A. (1977). Grammatical dictionary of the Russian language. Moscow: Russkij Jazyk.

Acknowledgements

I would like to thank Sacha Beniamine and Olivier Bonami for their help and useful comments. All errors are mine. This work was supported by a public grant overseen by the French National Research Agency (ANR) as part of the “Investissements d’Avenir” program (reference: ANR-10-LABX-0083).

Author information

Corresponding author

Correspondence to Matías Guzmán Naranjo.

About this article

Cite this article

Guzmán Naranjo, M. Analogy, complexity and predictability in the Russian nominal inflection system. Morphology 30, 219–262 (2020). https://doi.org/10.1007/s11525-020-09367-1

Keywords

  • Analogy
  • Paradigm cell filling problem
  • Information theory
  • Paradigm organization