Going Beyond “Aboutness”: A Quantitative Analysis of Sputnik Czech Republic

Fidler, Masako; Cvrček, Václav

doi:10.1007/978-3-319-98017-1_10

Part of the book series: Quantitative Methods in the Humanities and Social Sciences (QMHSS)

Abstract

This paper attempts to unpack the “alternativeness” of Sputnik Czech Republic, an online news-opinion portal that targets a Czech-speaking audience. The overarching principle of the analysis is prominence, a concept used in the corpus linguistic method of keyword analysis. Multi-level Discourse Prominence Analysis (MLDPA), which combines quantitative data with concepts from critical discourse analysis and cognitive linguistics, extends the applicability of prominence beyond the lexicon to multiple levels of language and sheds light on the overarching rhetoric and ideology of a text. The centerpiece of MLDPA is “keymorph analysis,” which applies the cognitive linguistic notion of morphemes as meaning-bearing units (Janda 1993; Janda and Clancy 2006) to the existing corpus linguistic method of keyword analysis. MLDPA helps identify and objectively characterize the ideological content of news in media that give the impression of objective, well-balanced reporting.

Notes

1.
https://sputniknews.com/docs/about/index.html, accessed September 22, 2018.
2.
The use of podle parallels the use of “neutral structuring verbs” (Caldas-Coulthard, 1994) that “introduce a saying without evaluating it explicitly” (Machin and Mayer, 2012, p. 59).
3.
The show is cited incorrectly as “Le Lene” instead of the actual “Le Iene.”
4.
Emphasis in bold style by the authors.
5.
All the examples from SPUCz used in this article were last checked and were present on the web on June 22, 2018.
6.
The phrasing připravovat + infinitive is unnatural, though not entirely ungrammatical, in Czech.
7.
Filip is the current chairman of the Czech Communist Party.
8.
“[a] word form which recurs within the text in question will be more likely to be key in it.” (Scott & Tribble, 2006).
9.
Extraction of KWs is the first statistical step (“keywords are pointers, that is all” (Scott, 2010)). KWs are often further analyzed with other methods of corpus linguistics (e.g., collocation profiles and “semantic prosody” (Stewart, 2010)).
10.
Several statistical tests, such as log-likelihood, chi², or Fisher's exact test (cf. Bertels & Speelman, 2013), are used to compare relative frequencies and determine whether the difference between them is statistically significant. However, statistical significance expressed by a p-value is a necessary but not sufficient condition of prominence. Because these tests are typically valid only asymptotically, p-values (especially when computed on large data sets) do not tell us whether the difference between the frequencies carries any descriptive value (cf. Wilson, 2013). Tests are therefore often accompanied by an effect size estimate, such as the Difference Index (DIN): the difference between an item's relative frequencies in the target text and in the reference corpus, divided by the mean of those relative frequencies and multiplied by 100 (cf. Fidler & Cvrček, 2015).
11.
The term KWs therefore differs from query terms in search engines or cultural keywords (Williams, 1976). The identification of KWs has a clear quantitative basis; “…it is less subject to the vagaries of subjective judgments of cultural importance … [and] it does not rely on researchers selecting items that might be important… but can reveal items that researchers did not know to be important in the first place.” (Culpeper & Demmen, 2015, p. 90)
12.
More discussion on the influence of a reference corpus on the results of KWA can be found in Scott, 2013.
13.
While the target corpus may be biased towards the presence of words formed from these stems, it allows us to focus on the image of these countries specifically (especially Russia and Ukraine).
14.
Both corpora are available upon request at www.korpus.cz.
15.
The significance level used in this study was set to 0.001 and the minimum effect size was set to DIN = 75.
16.
This procedure involves the level of prominence (DIN), the number of prominent units, and the number of all content words in a sentence. It investigates sentence types that are likely to attract reader attention by measuring the density of KLs.
17.
For example, the lemma hrad ‘castle’ can appear in multiple word forms in Standard Czech: hrad (nom/acc sg), hradu (gen/dat sg), hradě (loc sg), hradem (instr sg), hrady (nom/acc/voc/instr pl), hradů (gen pl), hradům (dat pl), and hradech (loc pl).
18.
Here, we only discuss common nouns, as they are most likely to be associated with the representation of entities, individuals, and events.
19.
Proper nouns and adjectives directly derived from them are not discussed here.
20.
Cf. “collocations create connotations” (Stubbs, 2005, p. 14). The contextual properties of keywords are thus examined by their links (Scott & Tribble, 2006) to other keywords (i.e., co-occurrence of KWs within a textual span).
21.
The collocates were searched within a span of three words on either side of the KWIC and were ranked first by LogDice and secondly by frequency.
22.
Collocates here are lemmas that are not necessarily keyed.
23.
The appearance of KWs referring to presidents among the collocations is expected, as the major seed words include names of presidents (e.g., Putin and Poroshenko).
24.
We excluded the remaining adverbs: zahraničně as part of the descriptive phrases zahraničně-politický/-ekonomický/-obchodní ‘internationally-politically/-economically/-commercially,’ and the adverb odkladně (used in neodkladně ‘urgently’).
25.
Subjects were manually checked and categorized.
26.
The subjects were manually identified and include instances where the subject is implicit and/or is mentioned in the surrounding discourse.
27.
DIN here (marked with the asterisk) is calculated differently than for KLs. The prominence of each case is calculated relative to all occurrences of a given lemma in SPUCz and SYN2015, respectively (i.e., not relative to the number of tokens in the corpus), as in Table 10.17.
28.
The instrumental case is highly collocated with the preposition s ‘with’ in Czech.
29.
The sentences were examined by each co-author independently first. The co-authors then discussed their differences and reached a mutually acceptable categorization.
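
Notes 10 and 15 together describe a two-part prominence criterion: a significance test on relative frequencies plus a minimum effect size. The following is a minimal Python sketch of that criterion, not the authors' implementation. It assumes the log-likelihood test in its common two-cell corpus-linguistic form, reads the DIN denominator literally from note 10 as the mean of the two relative frequencies (if the sum were used instead, all values would be halved and bounded by ±100; Fidler & Cvrček, 2015, should be consulted for the exact definition), and takes 10.83 as the chi-square critical value for p = 0.001 with one degree of freedom. All function names and the sample counts are hypothetical.

```python
import math


def din(count_target, size_target, count_ref, size_ref):
    """Difference Index as worded in note 10 (assumption: denominator is the mean)."""
    rf_target = count_target / size_target   # relative frequency in the target text
    rf_ref = count_ref / size_ref            # relative frequency in the reference corpus
    mean_rf = (rf_target + rf_ref) / 2
    if mean_rf == 0:
        return 0.0                           # item absent from both corpora
    return 100 * (rf_target - rf_ref) / mean_rf


def log_likelihood(count_target, size_target, count_ref, size_ref):
    """Two-cell log-likelihood (G2) for one item's frequency in two corpora."""
    total = count_target + count_ref
    combined_size = size_target + size_ref
    expected_target = size_target * total / combined_size
    expected_ref = size_ref * total / combined_size
    g2 = 0.0
    for observed, expected in ((count_target, expected_target),
                               (count_ref, expected_ref)):
        if observed > 0:                     # a zero count contributes nothing
            g2 += observed * math.log(observed / expected)
    return 2 * g2


def is_prominent(count_target, size_target, count_ref, size_ref,
                 min_din=75.0, g2_critical=10.83):
    """Both conditions from note 15 must hold: significance and effect size."""
    significant = log_likelihood(count_target, size_target,
                                 count_ref, size_ref) >= g2_critical
    large_effect = din(count_target, size_target,
                       count_ref, size_ref) >= min_din
    return significant and large_effect


# Hypothetical counts: 30 vs. 10 occurrences in two corpora of 1,000 tokens each.
print(din(30, 1000, 10, 1000))
print(log_likelihood(30, 1000, 10, 1000))
print(is_prominent(30, 1000, 10, 1000))
```

With these hypothetical counts, the item clears the DIN threshold but falls just short of the significance threshold, so it is not treated as prominent; doubling both counts (60 vs. 20) satisfies both conditions. This illustrates why the two criteria are checked jointly.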

References

Altshuler, D. (2010). Aspect in English and Russian flashback discourses. Oslo Studies in Language, 2, 75–107.
Baker, P., & McEnery, T. (2005). A corpus-based approach to discourse of refugees and asylum seekers in UN and newspaper texts. Journal of Language and Politics, 4(2), 197–226.
Baker, P. (2005). The public discourse of gay men. London: Routledge.
Baker, P. (2009). The question is, how cruel is it? Keywords in debates on fox hunting in the British House of Commons. In D. Archer (Ed.), What’s in a word-list? (pp. 125–136). London: Ashgate.
Bertels, A., & Speelman, D. (2013). ‘Keywords method’ versus ‘Calcul des Spécificités’. International Journal of Corpus Linguistics, 18(4), 536–560.
Biber, D. (1993). Using register-diversified corpora for general language studies. Computational Linguistics, 19(2), 219–241.
Biber, D. (2006). University language: A corpus-based study of spoken and written registers. Amsterdam: John Benjamins.
Biber, D., Johansson, S., Leech, G., Conrad, S., & Finegan, E. (1999). Longman grammar of spoken and written English. Harlow, UK: Longman.
Caldas-Coulthard, C. (1994). On reporting reporting: The representation of speech in factual and factional narratives. In M. Coulthard (Ed.), Advances in written text analysis (pp. 295–308). London: Routledge.
Chvany, C. (1990). Verbal aspect, discourse saliency, and the so-called perfect of result in Modern Russian. In N. B. Thelin (Ed.), Verbal aspect in discourse (pp. 213–236). Amsterdam: John Benjamins.
Culpeper, J. (2002). Computers, language and characterisation. An analysis of six characters in Romeo and Juliet. In U. Melander-Marttala, C. Ostman, & M. Kyto (Eds.), Conversation in life and in literature: Papers from the ASLA symposium (Vol. 15, pp. 11–30). Uppsala, Sweden: Association Suedoise de Linguistique Appliquee.
Culpeper, J. (2009). Keyness: Words, parts-of-speech and semantic categories in the character-talk of Shakespeare’s Romeo and Juliet. International Journal of Corpus Linguistics, 14(1), 29–59.
Culpeper, J., & Demmen, J. (2015). Keywords. In D. Biber & R. Reppen (Eds.), The Cambridge handbook of English corpus linguistics (pp. 90–105). Cambridge, UK: Cambridge University Press.
Cvrček, V., & Fidler, M. (forthcoming). More than keywords: Discourse prominence analysis of the Russian web portal Sputnik Czech Republic. In A. Salamurovič & M. Berrocal (Eds.), Language in politics in Slavic-speaking countries.
Desclés, J.-P., & Guentschéva, Z. (1990). Discourse analysis of aorist and imperfect in Bulgarian and French. In N. B. Thelin (Ed.), Verbal aspect in discourse (pp. 237–261). Amsterdam: John Benjamins.
Fidler, M., & Cvrček, V. (2015). A data-driven analysis of reader viewpoints: Reconstructing the historical reader using keyword analysis. Journal of Slavic Linguistics, 23(2), 197–239.
Fairclough, N. (1995). Media discourse. London: Hodder Education.
Fidler, M., & Cvrček, V. (2017). Keymorph analysis, or how morphosyntax informs discourse. Corpus Linguistics and Linguistic Theory. https://doi.org/10.1515/cllt-2016-0073. Accessed 29 Sept 2018.
Fielder, G. (1990). Narrative context and Russian aspect. In N. B. Thelin (Ed.), Verbal aspect in discourse (pp. 263–284). Amsterdam: John Benjamins.
Fischer-Starcke, B. (2009). Keywords and frequent phrases of Jane Austen’s Pride and Prejudice. A corpus-stylistic analysis. International Journal of Corpus Linguistics, 14(4), 492–523.
Groll, E. (2014). Kremlin’s ‘Sputnik’ newswire is the BuzzFeed of propaganda. Foreign Policy. https://foreignpolicy.com/2014/11/10/kremlins-sputnik-newswire-is-the-buzzfeed-of-propaganda/. Accessed 3 July 2017.
Heritage, T. (2013, December 9). Putin dissolves state news agency, tightens grip on Russia media. Reuters World News. http://www.reuters.com/article/us-russia-media-idUSBRE9B80I120131209. Accessed 17 July 2017.
Hopper, P., & Thompson, S. (1980). Transitivity. Language, 56(2), 251–299.
Jäger, S., & Maier, F. (2016). Analysing discourses and dispositives: A Foucauldian approach to theory and methodology. In R. Wodak & M. Meyer (Eds.), Methods of critical discourse studies (3rd ed., pp. 109–136). London: Sage.
Jakobson, R. (1984). Contribution to the general theory of case: General meanings of the Russian cases. In L. R. Waugh & M. Halle (Eds.), Roman Jakobson. Russian and Slavic grammar. Studies 1931–1981 (pp. 59–103). Berlin: Mouton.
Janda, L. A. (1993). The shape of the indirect object in Central and Eastern Europe. The Slavic and East European Journal, 37(4), 533–563.
Janda, L. A., & Clancy, S. (2006). The case book for Czech. Bloomington, IN: Slavica.
Kresin, S. (1998). Deixis and thematic hierarchies in Russian narrative discourse. Journal of Pragmatics, 30(4), 421–435.
Křen, M., Cvrček, V., Čapka, T., Čermáková, A., Hnátková, M., Chlumská, L., et al. (2016). SYN2015: Representative Corpus of contemporary written Czech. In N. Calzolari, K. Choukri, T. Declerck, S. Goggi, M. Grobelnik, B. Maegaard, et al. (Eds.), Proceedings of the tenth international conference on language resources and evaluation (LREC’16) (pp. 2522–2528). Portorož, Slovenia: ELRA http://www.lrec-conf.org/proceedings/lrec2016/index.html. Accessed 29 Sept 2018.
MacFarquhar, N. (2016, August 28). A powerful Russian weapon: The spread of false stories. The New York Times. https://www.nytimes.com/2016/08/29/world/europe/russia-sweden-disinformation.html. Accessed 17 July 2017.
Machin, D., & Mayer, A. (2012). How to do critical discourse analysis: A multimodal introduction. Los Angeles: Sage.
Mahlberg, M. (2007). Clusters, key clusters and local textual functions in Dickens. Corpora, 2(1), 1–31.
Scott, M. (2010). Problems in investigating keyness, or cleansing the undergrowth and marking out tails…. In M. Bondi & M. Scott (Eds.), Keyness in texts (pp. 43–57). Amsterdam: John Benjamins.
Scott, M. (2013). WordSmith tools manual. Version 7.0. Liverpool, UK: Lexical Analysis Software http://www.lexically.net/downloads. Accessed 29 Sept 2018.
Scott, M., & Tribble, C. (2006). Textual patterns: Keyword and corpus analysis in language education. Amsterdam: John Benjamins.
Smoleňová, I. (2015, June). The pro-Russian disinformation campaign in the Czech Republic and Slovakia. Types of media spreading pro-Russian propaganda, their characteristics and frequently used narratives. Prague Security Studies Institute (PSSI). http://www.pssi.cz/download/docs/253_is-pro-russian-campaign.pdf. Accessed 17 July 2017.
Sonnenhauser, B. (2008). Aspect interpretation in Russian—A pragmatic account. Journal of Pragmatics, 40(12), 2077–2099.
Stewart, D. (2010). Semantic prosody. A critical evaluation. New York: Routledge.
Straková, J., Straka, M., & Hajič, J. (2014). Open-source tools for morphology, lemmatization, pos tagging and named entity recognition. In Proceedings of 52nd annual meeting of the Association for Computational Linguistics: System demonstrations, Baltimore, Maryland, June 2014 (pp. 13–18). Stroudsburg, PA: Association for Computational Linguistics.
Stubbs, M. (2005). Conrad in the computer: Examples of quantitative stylistic methods. Language and Literature, 14(1), 5–24.
Tabbert, U. (2015). Crime and corpus. The linguistic representation of crime in the press. Philadelphia: John Benjamins.
Ueda, M. (1992). The interaction between clause-level parameters and context in Russian morphosyntax: Genitive of negation and predicate adjectives. Munich, Germany: Otto Sagner.
Walker, B. (2010). Wmatrix, key concepts and the narrator in Julian Barnes’s Talking It Over. In D. McIntyre & B. Busse (Eds.), Language and style (pp. 364–387). Basingstoke, UK: Palgrave Education.
Williams, R. (1976). Keywords: A vocabulary of culture and society. New York: Oxford University Press.
Wilson, A. (2013). Embracing Bayes factors for key item analysis in corpus linguistics. In M. Bieswanger & A. Koll-Stobbe (Eds.), New approaches to the study of linguistic variability. Language competence and language awareness in Europe (pp. 3–11). Frankfurt, Germany: Peter Lang.

Acknowledgments

This paper was supported in part by the program Progres Q08 Czech National Corpus, implemented at the Faculty of Arts, Charles University, and by the Brown University Humanities Research Funds. The authors would also like to thank Katie Krafft for data collection.

Author information

Authors and Affiliations

Department of Slavic Studies, Brown University, Providence, RI, USA
Masako Fidler
Institute of the Czech National Corpus, Charles University, Prague 1, Czech Republic
Václav Cvrček

Corresponding author

Correspondence to Václav Cvrček.

Editor information

Editors and Affiliations

Department of Slavic Studies, Brown University, Providence, RI, USA
Masako Fidler
Institute of the Czech National Corpus, Charles University, Prague 1, Czech Republic
Václav Cvrček

About this chapter

Cite this chapter

Fidler, M., Cvrček, V. (2018). Going Beyond “Aboutness”: A Quantitative Analysis of Sputnik Czech Republic. In: Fidler, M., Cvrček, V. (eds) Taming the Corpus. Quantitative Methods in the Humanities and Social Sciences. Springer, Cham. https://doi.org/10.1007/978-3-319-98017-1_10

DOI: https://doi.org/10.1007/978-3-319-98017-1_10
Published: 10 November 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-98016-4
Online ISBN: 978-3-319-98017-1
eBook Packages: Mathematics and Statistics (R0)