Morphological variation and sensitivity to frequency of forms among native speakers of Czech

Bermel, Neil; Knittl, Luděk; Russell, Jean

doi:10.1007/s11185-015-9149-2

Морфологическая вариация и чувствительность к частотности форм у носителей чешского языка

Published: 15 September 2015

Russian Linguistics, volume 39, pages 283–308 (2015)


Abstract

This article looks at inter-speaker variation in two environments: the genitive and locative singular cases of masculine ‘hard inanimate’ nouns in Czech. It draws on a large-scale survey of native speakers that employed two tasks, one testing their preferences for certain forms (acceptability ratings) and one testing their choices (gap filling). Our hypothesis that such variation exists was upheld, but only within limited parameters. Most biographical data (age, gender, education) played no role in respondents’ choices or preferences. Their region of origin played a small but significant role, although not the one expected. Relating the two types of tasks to each other, we found that respondents’ use of the ratings scale did not correlate with their choice of forms, but their overall strength of preference for one form over another did correlate with their choices. Inter-speaker variation thus goes some way towards explaining the persistent diversity in this paradigm and arguably may contribute to its maintenance.

Abstract (translated from the Russian)

This article examines inter-speaker variation in two environments: the genitive and locative singular of masculine ‘hard inanimate’ nouns in Czech. The study is based on a large-scale survey of native speakers of Czech that tested their preferences and their choice of forms by means of two tasks: ratings on a scale and gap filling. Our hypothesis that such variation exists was confirmed to a certain extent. Most of the speakers’ biographical data (age, gender, education) played no role in our respondents’ preferences or choices. Place of origin, however, played a small but significant role, although not the one we expected. Relating the two types of task to each other, we concluded that the way respondents used the ratings scale did not correspond to their choice of forms, whereas the overall tendency in their preferences for one form or the other did correspond to the endings they chose. Inter-speaker variation thus partly explains the persistent diversity in this paradigm, and there are grounds to believe that it is a condition for maintaining that diversity.

Notes

As an example, Grepl et al. (1996, pp. 244–267) give 13 basic ‘types’ (typy) named after common nouns: pán ‘mister’, muž ‘man’, předseda ‘chairman’, soudce ‘judge’, hrad ‘castle’, stroj ‘machine’, žena ‘woman’, růže ‘rose’, kost ‘bone’, město ‘town’, moře ‘sea’, kuře ‘chicken’, stavení ‘building’. Cvrček et al. (2010, p. 144) have 12 ‘patterns’ (vzory) similarly named: list ‘sheet’, město, had ‘snake’, táta ‘dad’, žena, muž, stroj, duše ‘soul’, píseň ‘song’, moře, kost, stavení. Both then have lists of ‘subtypes’ or ‘subpatterns’ (podtypy, podvzory): Cvrček et al. have 10 subpatterns and then a lengthy list of further exceptions, while Grepl et al. do not distinguish strictly between subtypes and other sorts of deviations from the basic types. Tradition evidently plays a significant role in these descriptions: a basic pattern or type such as moře may have only a handful of items, while a subclass of it such as letiště ‘airport’ may have many more, and a much more productive class such as cyklus ‘cycle’ with hundreds of items is not even classed as a subtype or subpattern.
Meillet (1965, p. 347), writing about Common Slavonic, lists only five nouns reliably falling into the u-stem class: domŭ ‘house’, vrŭxŭ ‘summit’, volŭ ‘ox’, polŭ ‘half’, medŭ ‘honey’, to which Matthews (1967), writing about Old Russian, adds synъ ‘son’, rodъ ‘clan’, rjadъ ‘row’, činъ ‘rank’ “and several others” (ibid., p. 106). Vaillant (1964, pp. 90–92), writing about Old Church Slavonic, in addition lists oudъ ‘member, (body) part’, darъ ‘gift’, sanъ ‘post’ as largely convergent with this class, and židъ ‘Jew’ as convergent in the plural.
Details can be found in Bermel and Knittl (2012a, pp. 99–100). SYN2005 has just over 100 million word tokens, so this equates respectively to .004/.003 per million for types with the expansive ending, .01/.001 for types with the recessive ending, and 1.21/1.1 for types that have both endings.
Since our explicit goal was to relate corpus frequency to user experiment data, one of the ways we aimed to make the two data sets converge was by having users react to data drawn from the corpus. This meant they were dealing with material that came from the stylistic and structural ambit of the corpus data. To reduce the possibility of respondents being distracted by extraneous material, influenced by similar constructions elsewhere in the sentence, or confused by complex syntax, we simplified or modified some of the sentences used. However, we did not always use the simplest possible sentence structures. Our hope was that in a questionnaire of significant length, having people read sentences that caught their attention in some way or exhibited varied structure would increase their attention span for the task.
The endpoints on the scale were labelled: 1 = naprosto normální (v rámci daného kontextu bych to určitě takto napsal/a) ‘1 = absolutely normal (in this context I would definitely write it that way)’; 7 = nepřijatelné (v daném kontextu mi něco hodně „nesedí“, nepovažuji to za normální češtinu) ‘7 = unacceptable (in this context something really doesn’t feel right; I don’t think it’s normal Czech)’. Midpoints were not labelled; this encourages respondents to use the scale as equally-spaced points between 1 and 7, although there is no guarantee they will do so. The use of 1 as the high mark conforms to general Czech rating and marking systems.
For the gap-filling, it was important that the context be clear enough to elicit the desired answer. Sometimes this meant inserting an adjective to make sure that a singular form was obtained. In a few places a plural was judged so unlikely that no adjective was inserted; however, in some instances respondents nonetheless used one. For some this may have represented an attempt at avoiding the task, possibly because they were unsure of the ‘right’ answer (there is no choice to be made in the plural form).
We were interested here in regional differences, but did not wish to call that fact to our respondents’ attention by using traditional terms like ‘Bohemian’ or ‘Moravian’ that might highlight dialect affinity. We therefore used the division into 14 kraje—modern administrative regions—which can handily be divided along dialectal lines. The only problematic region was Vysočina, which is bisected by Bohemian / Moravian dialect isoglosses, and thus analyses involving regional variables leave out respondents from this area. Respondents were asked to identify the area ‘they came from’, presupposing that they would select the area for which they have the greatest affinity. They were also asked to indicate if they had lived anywhere else for a year or longer, but most did not indicate that this was the case.
In 2009, there were 129.8 women studying at Czech universities for every 100 men (Český statistický úřad 2011), meaning that women constituted 56.5 % of tertiary students. Even if all our respondents had been current university students, however, this would only have predicted 312 female respondents, compared to the actual 329, so it cannot completely explain the disproportionate response from women.
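The arithmetic in the note above can be retraced with a short sketch. The ratio of 129.8 women per 100 men and the counts 312 and 329 come from the note; the total number of respondents is not stated there, so the figure computed below is only what those numbers imply, and the function name is made up for illustration.

```python
def female_share(women_per_100_men: float) -> float:
    """Convert a 'women per 100 men' ratio into the proportion of women."""
    return women_per_100_men / (women_per_100_men + 100.0)


share = female_share(129.8)          # ratio reported for Czech universities in 2009
predicted_women = 312                # prediction quoted in the note
observed_women = 329                 # actual count quoted in the note

# ASSUMPTION: the respondent total is inferred from the note's own figures,
# not taken from the article.
implied_total = predicted_women / share

print(f"share of women among tertiary students: {share:.1%}")
print(f"respondent total implied by the note: about {implied_total:.0f}")
print(f"women beyond the prediction: {observed_women - predicted_women}")
```

On these figures the surplus of women over the student-population baseline is small, which matches the note's point that the baseline accounts for most, but not all, of the imbalance.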
We asked for three levels: primary, secondary, tertiary. In addition, respondents were asked to indicate their field of study if they had finished university. The ‘Expected’ column shows how many we might have expected to have in each area if the survey had been weighted to the proportions of the Czech population as a whole.
As some studies have shown that linguists, or even specifically those linguists with specific theoretical training, may answer differently from other respondents due to their level of metaknowledge (Dąbrowska 2010), we tried to limit linguists’ participation by specifically targeting students in modules on management, computer science and civics.
The one result of p = 0.07 is just outside the conventional threshold for significance (1 in 20, or p = 0.05). In the context of seven other non-significant results, it is not worth examining this too closely.
For a discussion of what this measure signifies, see Sect. 9.
As discussed earlier, Rácz et al. (2014), among others, have suggested that in nonce-word tasks men rely statistically more on analogy and women on the inference of general rules. Our data does not provide enough support for this, possibly because neither of our tasks involves unknown or little-known lexemes that would force respondents to resort to these processes.
It shows up occasionally in combination with other factors, but the effect sizes are very small. In all probability this is a matter of one word that people from different regions judge slightly differently.
By this we mean that the task of inserting a single response in a forced-choice question is a relatively easy and comprehensible task by comparison, familiar from school exercises and tests and from other questionnaires. As researchers we are not thereby absolved of interrogating those results with similar rigor (i.e. is it correct to deduce from the production of one variant that the other variant would not be produced?), but from the respondent’s point of view the task is a simpler one.
In combining the two ratings into one, all of the methods proposed strip out some information from the original data, and this one is no exception. Lost here is the absolute value of the ratings (overall ‘strictness’ or ‘permissiveness’ of each user). For example, if speaker S rates the {a} ending as 1 and the {u} ending as 3, the ‘delta’ is 2. This gives speaker S the same score as speaker T, who rated {a} as 3 and {u} as 5. Speaker S is significantly more positive about both endings, but that fact is not captured in the final result. (The overall level of permissiveness is better captured by taking the sum of scores, but that calculation in turn strips out the strength and direction of preference, i.e. scores for competing variants of 1 and 7 give the same result as scores of 3 and 5, or of 7 and 1.)
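A minimal sketch of the two combined measures contrasted in the note above, using the note's own example ratings. The function names are hypothetical, and the direction of subtraction (the {u} rating minus the rating of the competing ending) follows the examples given in these notes rather than any code published by the authors.

```python
def preference_delta(rating_other: int, rating_u: int) -> int:
    """Strength and direction of preference: {u} rating minus the other rating.

    On the 1-7 scale, 1 is the best mark, so a positive delta means the
    competing ending ({a} or {ě}) is rated better than {u}.
    """
    return rating_u - rating_other


def permissiveness_sum(rating_other: int, rating_u: int) -> int:
    """Overall strictness or permissiveness: the sum of both ratings."""
    return rating_other + rating_u


# Speakers S and T from the note: same delta, different overall permissiveness.
speaker_s = (1, 3)
speaker_t = (3, 5)
print(preference_delta(*speaker_s), preference_delta(*speaker_t))
print(permissiveness_sum(*speaker_s), permissiveness_sum(*speaker_t))

# The sum, in turn, cannot tell these three rating pairs apart.
for pair in [(1, 7), (3, 5), (7, 1)]:
    print(pair, permissiveness_sum(*pair), preference_delta(*pair))
```

Each measure discards what the other keeps, which is the trade-off the note describes.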
The direction of correlation is easiest to see in a hypothetical example. If the average rating for {ě} is 1.5 and the average rating for {u} is 3.5, that means respondents rate {ě} as better than {u}, and it results in a score of 2 (subtracting the {ě} score from the {u} score). A positive correlation thus indicates that the more definitively respondents like {ě}, the more they use {ě} (consonant with expectations). If the positions are reversed, then the average score for {ě} is 3.5 and the average score for {u} is 1.5. In this instance, respondents rate {u} as better than {ě}, resulting in a preference score of −2 (negative because we subtract the {ě} score from the {u} score). A negative correlation thus means that the more definitively people like {u}, the more they use {ě} (contrary to expectations). A score close to zero occurs when both forms get a similar rating (inconclusive).
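The sign convention in the note above can be illustrated with invented numbers. Everything below is hypothetical: the preference scores, the usage proportions, and the choice of Pearson's coefficient, since the note does not say which correlation measure was used.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient for two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# HYPOTHETICAL preference scores ({u} rating minus {ě} rating); positive
# values mean {ě} is rated better than {u}.
preference = [2.0, 1.0, 0.0, -1.0, -2.0]

# HYPOTHETICAL proportion of gap-filling answers that used {ě}.
use_of_e_expected = [0.9, 0.7, 0.5, 0.4, 0.1]   # usage follows the ratings
use_of_e_contrary = [0.1, 0.4, 0.5, 0.7, 0.9]   # usage runs against the ratings

r_expected = pearson(preference, use_of_e_expected)
r_contrary = pearson(preference, use_of_e_contrary)
print(f"consonant with expectations: r = {r_expected:+.2f}")
print(f"contrary to expectations:    r = {r_contrary:+.2f}")
```

A positive coefficient corresponds to the first case in the note (the stronger the preference for {ě}, the more it is used), and a negative one to the second.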

References

Baayen, R. H., Endresen, A., Janda, L. A., Makarova, A., & Nesset, T. (2013). Making choices in Russian: pros and cons of statistical methods for rival forms. Russian Linguistics, 37(3), 253–291.
Bermel, N., & Knittl, L. (2012a). Morphosyntactic variation and syntactic constructions in Czech nominal declension: corpus frequency and native-speaker judgments. Russian Linguistics, 36(1), 91–119.
Bermel, N., & Knittl, L. (2012b). Corpus frequency and acceptability judgments: a study of morphosyntactic variants in Czech. Corpus Linguistics and Linguistic Theory, 8(2), 241–275. doi:10.1515/cllt-2012-0010.
Bermel, N., Knittl, L., & Russell, J. (2014). Absolutní a proporcionální frekvence v ČNK ve světle výzkumu morfosyntaktické variace v češtině. Naše řeč, 97(4–5), 216–227.
Borschev, V., & Partee, B. H. (2002). The Russian genitive of negation: theme-rheme structure or perspective structure? In J. E. Lavine & G. R. Greenberg (Eds.), A special volume in honour of Leonard H. Babby [Special issue]. Journal of Slavic Linguistics, 10(1–2), 105–144.
Brown, D. (2007). Peripheral functions and overdifferentiation: the Russian second locative. Russian Linguistics, 31(1), 61–76. doi:10.1007/s11185-006-0715-5.
Bybee, J. (2002). Word frequency and context of use in the lexical diffusion of phonetically conditioned sound change. Language Variation and Change, 14, 261–290. doi:10.1017/S0954394502143018.
Bybee, J. (2006). From usage to grammar: the mind’s response to repetition. Language, 82(4), 711–733.
Čermák, F. et al. (2005). SYN2005: a genre-balanced corpus of written Czech. Czech National Corpus Institute, Faculty of Arts, Charles University. Prague. Available at http://www.korpus.cz.
Český statistický úřad (2011). Zaostřeno na ženy a muže – 2011. Retrieved from https://www.czso.cz/csu/czso/zaostreno-na-zeny-a-muze-2011-zwsib54xwy (16 December 2014).
Český statistický úřad (2013). Stav a pohyb obyvatelstva v ČR – v roce 2012 (předběžné výsledky). Retrieved from https://www.czso.cz/csu/czso/stav-a-pohyb-obyvatelstva-v-cr-v-roce-2012-predbezne-vysledky-e5y3986dwh (16 December 2014).
Český statistický úřad (2014). Souhrnná data o České republice (Obyvatelstvo podle dosaženého vzdělání). Retrieved from https://www.czso.cz/csu/czso/souhrnna_data_o_ceske_republice (16 December 2014).
Cvrček, V. et al. (2010). Mluvnice současné češtiny. Praha.
Dąbrowska, E. (2008). The effects of frequency and neighbourhood density on adult speakers’ productivity with Polish case inflections: an empirical test of usage-based approaches to morphology. Journal of Memory and Language, 58, 931–951. doi:10.1016/j.jml.2007.11.005.
Dąbrowska, E. (2010). Naive v. expert intuitions: an empirical study of acceptability judgments. Linguistic Review, 27, 1–23. doi:10.1515/tlir.2010.001.
Grepl, M. et al. (1996). Příruční mluvnice češtiny. Praha.
Janda, L. A. (1996). Back from the brink. A study of how relic forms in languages serve as source material for analogical extension (LINCOM Studies in Slavic Linguistics, 1). Munich, Newcastle.
Labov, W., Karen, M., & Miller, C. (1991). Near-mergers and the suspension of phonemic contrast. Language Variation and Change, 3, 33–74.
Matthews, W. K. (1967). Russian historical grammar (reprinted with corrections). London.
Meillet, A. (1965). Le slave commun. Paris.
Pierrehumbert, J. (1994). Knowledge of variation. In K. Beals et al. (Eds.), Papers from the 30th regional meeting of the Chicago Linguistic Society. Volume 2: The parasession on variation in linguistic theory (pp. 232–256). Chicago.
Rácz, P., Beckner, C., Hay, J. B., & Pierrehumbert, J. B. (2014). Rules, analogy, and social factors codetermine past-tense formation patterns in English. In Proceedings of the 2014 Joint Meeting of SIGMORPHON and SIGFSM, 27 June 2014. Baltimore, Maryland. Retrieved from http://acl2014.org/acl2014/W14-28/pdf/W14-2807.pdf (14 January 2014).
Theakston, A. L. (2004). The role of entrenchment in children’s and adults’ performance on grammaticality judgment tasks. Cognitive Development, 19, 15–34. doi:10.1016/j.cogdev.2003.08.001.
Vaillant, A. (1964). Manuel de vieux slave. Tome I: Grammaire. Paris.

Author information

Authors and Affiliations

The University of Sheffield, Western Bank, Sheffield, S10 2TN, UK
Neil Bermel, Luděk Knittl & Jean Russell

Corresponding author

Correspondence to Neil Bermel.

Additional information

This article forms part of the project ‘Acceptability and forced-choice judgements in the study of linguistic variation’, funded by the Leverhulme Trust (RPG-407).

About this article

Cite this article

Bermel, N., Knittl, L. & Russell, J. Morphological variation and sensitivity to frequency of forms among native speakers of Czech. Russ Linguist 39, 283–308 (2015). https://doi.org/10.1007/s11185-015-9149-2

Published: 15 September 2015
Issue Date: November 2015
DOI: https://doi.org/10.1007/s11185-015-9149-2
