
Russian Linguistics, Volume 37, Issue 3, pp 253–291

Making choices in Russian: pros and cons of statistical methods for rival forms

  • R. Harald Baayen
  • Anna Endresen
  • Laura A. Janda
  • Anastasia Makarova
  • Tore Nesset

Abstract

Sometimes languages present speakers with choices among rival forms, such as the Russian forms ostrič’ vs. obstrič’ ‘cut hair’ and proniknuv vs. pronikši ‘having penetrated’. The choice of a given form is often influenced by various considerations involving the meaning and the environment (syntax, morphology, phonology). Understanding the behavior of rival forms is crucial to understanding the form-meaning relationship of language, yet this topic has not received as much attention as it deserves. Given the variety of factors that can influence the choice of rival forms, statistical models are needed to determine accurately which factors are significant and how strongly each contributes. The traditional model for this kind of data is logistic regression, but two newer models, called ‘tree & forest’ and ‘naive discriminative learning’, have recently emerged as alternatives. We compare the performance of logistic regression against the two new models on the basis of four datasets reflecting rival forms in Russian. We find that the three models generally provide converging analyses, with complementary advantages. After identifying the significant factors for each dataset, we show that different sets of rival forms occupy different regions in a space defined by variance in meaning and environment.
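Naive discriminative learning rests on the Rescorla–Wagner learning rule: on each learning event, the weight from every cue present to every outcome is adjusted by a learning rate times the difference between the target (λ if the outcome occurs, 0 otherwise) and the summed weights of all cues present in that event. The following is a minimal Python sketch of that rule, not the authors' implementation. The function names `rescorla_wagner` and `choose`, the cue labels `a`, `b`, `c`, and the outcomes `X` and `Y` (standing in for two rival forms) are hypothetical illustrations. As far as we understand, published NDL implementations estimate the weights at the equilibrium of this rule rather than by iterating over events as done here.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# One learning event: (cues present, outcomes present). Hypothetical encoding.
Event = Tuple[Set[str], Set[str]]
Weights = Dict[Tuple[str, str], float]


def rescorla_wagner(events: Iterable[Event], outcomes: Set[str],
                    rate: float = 0.01, lam: float = 1.0,
                    epochs: int = 1) -> Weights:
    """Iteratively apply the Rescorla-Wagner rule to cue-to-outcome weights.

    Only cues present in an event are updated; absent cues keep their weights.
    """
    w: Weights = defaultdict(float)
    events = list(events)  # allow several passes over a generator
    for _ in range(epochs):
        for cues, present in events:
            for o in outcomes:
                # Summed support for outcome o from the cues in this event.
                activation = sum(w[(c, o)] for c in cues)
                target = lam if o in present else 0.0
                delta = rate * (target - activation)
                # Every present cue receives the same error-driven update.
                for c in cues:
                    w[(c, o)] += delta
    return dict(w)


def choose(weights: Weights, cues: Set[str], outcomes: Set[str]) -> str:
    """Predict the rival form (outcome) whose summed cue support is largest."""
    support = {o: sum(weights.get((c, o), 0.0) for c in cues) for o in outcomes}
    return max(support, key=support.get)


if __name__ == "__main__":
    # Toy events: cue "a" occurs with both forms, "b" only with X, "c" only with Y.
    toy = [({"a", "b"}, {"X"}), ({"a", "c"}, {"Y"})]
    w = rescorla_wagner(toy, outcomes={"X", "Y"}, rate=0.05, epochs=500)
    print(choose(w, {"a", "b"}, {"X", "Y"}))  # prints X
```

Because the shared cue `a` occurs with both outcomes, it ends up with weaker associations than the discriminating cues `b` and `c`; this cue competition is what lets such a model reveal which contextual factors actually discriminate between rival forms.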

Keywords

Random Forest, Classification Tree, Variable Importance, Forest Model, Rival Form
These keywords were added by machine and not by the authors.

Choosing between rival forms in Russian: the pros and cons of different models of statistical analysis

Abstract

Native speakers often face a choice between rival forms, such as Russian ostrič’ and obstrič’ or proniknuv and pronikši. The choice of variant can be influenced by various factors, including semantics and the contextual environment (syntactic, morphological, and phonological). Studying the behavior of rival forms is essential for understanding the relationship between the signifier and the signified in language, yet this question has so far not received the attention it deserves. Since the choice of a rival form can depend on factors of various kinds, methods of statistical analysis are needed: they make it possible to determine precisely which factors are decisive and how large a share of the influence each has. Logistic regression is usually applied to this type of linguistic data, but two alternative models have recently appeared: ‘random forest’ and ‘naive discriminative learning’. We compared the performance of logistic regression and the two new models of statistical analysis on four datasets collected for a number of rival forms in Russian. All three models give broadly similar results, but each has its own advantages. The article identifies the determining factors for each dataset and shows that the rival forms we investigated occupy different zones of a two-axis coordinate system: one axis of difference in meaning and one axis of difference in contextual conditions.

Copyright information

© Springer Science+Business Media Dordrecht 2013

Authors and Affiliations

  • R. Harald Baayen (1)
  • Anna Endresen (2)
  • Laura A. Janda (2)
  • Anastasia Makarova (2)
  • Tore Nesset (2)
  1. University of Tübingen, Tübingen, Germany
  2. University of Tromsø, Tromsø, Norway
