Sociolinguistic Extension of the ORD Corpus of Russian Everyday Speech

  • Natalia Bogdanova-Beglarian
  • Tatiana Sherstinova (corresponding author)
  • Olga Blinova
  • Olga Ermolova
  • Ekaterina Baeva
  • Gregory Martynenko
  • Anastasia Ryko
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9811)

Abstract

The ORD corpus is one of the largest resources of contemporary spoken Russian. By 2014, the collection comprised about 400 hours of recordings made by 40 respondents (20 men and 20 women of different ages and professions), who volunteered to spend a whole day with a voice recorder switched on, recording all their verbal communication. The corpus presents unique linguistic material recorded in natural communicative situations, allowing spoken Russian and everyday discourse to be studied from many angles. However, the original sample of respondents was not sufficient for studying sociolinguistic variation in speech. A large project aimed at a sociolinguistic extension of the ORD was therefore launched, supported by the Russian Science Foundation. The paper describes the general principles of this extension. It defines the social groups that should be represented in the corpus in adequate numbers, sets criteria for selecting participants, describes the “recorder’s kit” given to respondents, and outlines the principles for adapting the ORD annotation and structure. The ORD collection now exceeds 1200 hours of recordings, representing the speech of 127 respondents and hundreds of their interlocutors. So far, 2450 macro episodes of everyday spoken communication have been annotated, and the speech transcripts add up to 1 million words.

Keywords

Speech corpus; Everyday spoken Russian; Oral communication; Sociolinguistics; Social groupings; Sociolects; Speech variation

Notes

Acknowledgement

The research is supported by the Russian Science Foundation, project No. 14-18-02070 “Everyday Russian Language in Different Social Groups”.

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Natalia Bogdanova-Beglarian (1)
  • Tatiana Sherstinova (1, corresponding author)
  • Olga Blinova (1)
  • Olga Ermolova (1)
  • Ekaterina Baeva (1)
  • Gregory Martynenko (1)
  • Anastasia Ryko (1)

  1. Saint Petersburg State University, St. Petersburg, Russia
