Russian Learner Translator Corpus

Design, Research Potential and Applications
  • Andrey Kutuzov
  • Maria Kunilovskaya
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8655)


The project we present, the Russian Learner Translator Corpus (RusLTC), is a multiple learner translator corpus that stores Russian students’ translations both from English into Russian and from Russian into English. The project is being developed by a cross-functional team of translator trainers and computational linguists in Russia. Translations are collected from several Russian universities; all of them were produced by students majoring in translation, either as routine and exam assignments or as submissions to translation contests. As of March 2014, RusLTC contains a total of nearly 1.2 million word tokens, 258 source texts, and 1,795 translations. The paper gives a brief overview of related research and describes the corpus structure and the corpus-building technologies used; it also covers the features of the query tool and our error annotation solutions. In the final part, we summarize RusLTC-based research and its current practical applications, and suggest prospects for further research.


corpus building, learner corpora, multiple translation corpora, query tool, linguistic mark-up, mistakes annotation, corpus-driven translator education





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Andrey Kutuzov (1)
  • Maria Kunilovskaya (2)
  1. National Research University Higher School of Economics, Moscow, Russia
  2. Tyumen State University, Tyumen, Russia
