Simplification in translated Czech: a new approach to type-token ratio

Russian title, translated: Simplification in Czech translated texts: a new approach to the type-token ratio

Russian Linguistics

Abstract

The main objective of the paper is to examine whether simplification can be demonstrated in translated Czech texts. Simplification, one of the so-called translation universals, is generally defined as translators’ tendency to create simpler texts. According to research on English texts, simplification may manifest itself, for example, in a lower level of lexical richness. To describe lexical richness, the simple type-token ratio (TTR) is widely used; however, it is very sensitive to text size. To overcome this disadvantage, a standardized type-token ratio (sTTR) has been introduced, which is calculated for every 1000 words in the text. Nevertheless, it also has certain drawbacks. Our method for standardizing the type-token ratio (zTTR) is based on comparing the observed TTR with referential TTR values representing texts of identical size. Inspired by the z-score, this metric is capable of comparing the lexical richness of texts regardless of their length. The analysis, carried out on a large comparable corpus of translated and non-translated Czech, showed that non-translated texts tend to be lexically richer, although the difference is not as striking as some studies have predicted.

Abstract (translated from the Russian)

The main aim of the paper is to determine whether simplification is present in Czech translated texts. In general, simplification, as one of the so-called translation universals, is defined as the tendency of translators to produce simpler texts than the original. Studies of English texts show that simplification may manifest itself, for example, in a lower level of lexical richness. To describe lexical richness, the simple type-token ratio (TTR) is widely used; however, it is very sensitive to text size. To overcome this shortcoming, the standardized type-token ratio (sTTR) was introduced, which is calculated for every thousand words in the text. Nevertheless, this method also has certain shortcomings. Our method of standardizing the type-token ratio (zTTR) is based on comparing the observed TTR value with reference TTR values representing texts of identical size. This metric, inspired by the z-score, can compare the lexical richness of texts irrespective of their length. Our analysis, carried out on a large corpus of Czech translated and original texts, showed that original texts are, as a rule, lexically richer, although the difference is not as considerable as some studies predicted.
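
The three measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the referential TTR values are obtained, so the sketch simply assumes they are supplied as a list of TTRs from reference texts of the same size. The function names `ttr`, `sttr` and `zttr`, the handling of an incomplete final chunk in `sttr`, and the use of the sample standard deviation are likewise assumptions made for illustration. Types are counted as case-sensitive word forms, as in note 8.

```python
from statistics import mean, stdev


def ttr(tokens):
    """Simple type-token ratio: distinct (case-sensitive) word forms / tokens."""
    if not tokens:
        raise ValueError("cannot compute TTR of an empty text")
    return len(set(tokens)) / len(tokens)


def sttr(tokens, chunk_size=1000):
    """Standardized TTR: mean TTR over consecutive chunks of chunk_size tokens.

    Assumption: an incomplete final chunk is dropped.
    """
    n_chunks = len(tokens) // chunk_size
    if n_chunks == 0:
        raise ValueError("text is shorter than one chunk")
    return mean(
        ttr(tokens[i * chunk_size:(i + 1) * chunk_size])
        for i in range(n_chunks)
    )


def zttr(observed_ttr, reference_ttrs):
    """z-score-style TTR: (observed - reference mean) / reference SD.

    Assumption: reference_ttrs holds TTRs of reference texts that have the
    same number of tokens as the observed text.
    """
    sd = stdev(reference_ttrs)  # sample SD; needs at least two values
    if sd == 0:
        raise ValueError("reference TTR values have zero variance")
    return (observed_ttr - mean(reference_ttrs)) / sd


# Toy usage with invented data (not taken from the paper).
tokens = "the cat saw the dog and the dog saw the cat".split()
observed = ttr(tokens)
z = zttr(observed, [0.40, 0.50, 0.60])
```

Under these assumptions, a positive `zttr` value means the text is lexically richer than reference texts of the same length, and a negative value means it is poorer.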

Notes

  1. Neumann, S. CroCo: A multiply annotated and aligned corpus for the investigation of translation properties. Invited talk, Language Technology Group Seminars, Macquarie University, Sydney, 15 May 2006.

  2. Neumann, S. Beyond translation properties: the contribution of corpus studies to empirical translation theory. Plenary talk, UCCTS4, Lancaster, UK, 25th July 2014.

  3. Baker, M. Patterns of idiomaticity in translated vs. original English. Paper given at the Third EST Congress Translation Studies: Claims, Changes and Challenges, August 30–Sept. 1, 2001, Copenhagen.

  4. As part of grant VG027 2013 FA CU, see Chlumská (2013).

  5. The corpus can be accessed via the KonText interface: http://www.korpus.cz.

  6. However important, the issue of text types / genres and their definition far exceeds the limited scope of this paper. In this case, the traditional division available in the CNC was used.

  7. According to the Czech National Library statistics of translated books, available (in Czech) at http://text.nkp.cz/sluzby/sluzby-pro/sluzby-pro-vydavatele/vykazy.

  8. To avoid possible misunderstanding related to the ambiguity of the term: we use the term ‘type’ in this study to denote (different) case-sensitive word-forms (not lemmas). Nevertheless, the algorithm described below will be valid with any kind of types (lemmas, case-insensitive forms etc.).

  9. The simplicity of TTR calculation is not the only reason for its popularity among researchers. Further obvious advantages are its straightforward interpretation and low computational complexity; due to these factors, other metrics, such as Yule’s K (Yule 1944) or Zipf’s Z (Orlov 1982), are used significantly less often.

  10. WordSmith Tools, version 4 by Mike Scott. More information available at http://www.lexically.net/wordsmith/.

  11. A similar algorithm is used for comparing frequencies of language phenomena in two unequally sized corpora by converting raw frequencies to ipm (instances per million).

  12. For the purpose of this study we have excluded texts which were previously included in the Jerome corpus.

  13. A similar approach to normalizing the difference between an actual value and a sample mean using the standard deviation was adopted e.g. for measuring lexical fixedness (Fazly and Stevenson 2006).

  14. R Core Team, 2013. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org/.
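
The normalization mentioned in note 11, converting raw frequencies to ipm (instances per million) so that corpora of unequal size can be compared, can be sketched as follows. The function name `ipm` and the figures in the usage lines are hypothetical and are not taken from the paper.

```python
def ipm(raw_frequency, corpus_size):
    """Convert a raw frequency to instances per million tokens."""
    if corpus_size <= 0:
        raise ValueError("corpus size must be positive")
    return raw_frequency * 1_000_000 / corpus_size


# Invented figures: the same raw count in two corpora of different size.
small_corpus = ipm(50, 2_000_000)
large_corpus = ipm(50, 10_000_000)
```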

References

  • Baker, M. (1993). Corpus linguistics and translation studies. Implications and applications. In M. Baker, G. Francis, & E. Tognini-Bonelli (Eds.), Text and technology. In honour of John Sinclair (pp. 233–250). Amsterdam, Philadelphia.

  • Baker, M. (1996). Corpus-based translation studies: the challenges that lie ahead. In H. Somers (Ed.), Terminology, LSP and translation: studies in language engineering in honour of Juan C. Sager (Benjamins Translation Library, 18, pp. 175–186). Amsterdam, Philadelphia.

  • Chesterman, A. (2004). Hypotheses about translation universals. In G. Hansen, K. Malmkjær, & D. Gile (Eds.), Claims, changes and challenges in translation studies. Selected contributions from the EST Congress Copenhagen 2001 (Benjamins Translation Library, 50, EST Subseries, 1, pp. 1–14). Amsterdam, Philadelphia.

  • Chlumská, L. (2013). Jerome – a monolingual comparable corpus of translated and non-translated Czech. Available at http://www.korpus.cz.

  • Corpas Pastor, G., Mitkov, R., Afzal, N., & Pekar, V. (2008). Translation universals: do they exist? A corpus-based NLP study of convergence and simplification. In Proceedings of the Eighth Conference of the Association for Machine Translation in the Americas (AMTA-08). Waikiki, Honolulu. Retrieved from http://clg.wlv.ac.uk/papers/AMTA2008.pdf (18 June 2015).

  • Delaere, I., De Sutter, G., & Plevoets, K. (2012). Is translated language more standardized than non-translated language? Using profile-based correspondence analysis for measuring linguistic distances between language varieties. Target, 24(2), 203–224. doi:10.1075/target.24.2.01del.

  • Fazly, A., & Stevenson, S. (2006). Automatically constructing a lexicon of verb phrase idiomatic combinations. In EACL-2006. Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics. April 3rd–7th, 2006. Trento, Italy. Retrieved from http://www.aclweb.org/anthology/E06-1043 (18 June 2015).

  • Fernandes, L. (2006). Corpora in translation studies: revisiting Baker’s typology. Fragmentos, 30, 87–95.

  • Kenny, D. (1998). Creatures of habit? What translators usually do with words. Meta: Translators’ Journal, 43(4), 515–523.

  • Laviosa, S. (1998). Core patterns of lexical use in a comparable corpus of English narrative prose. Meta: Translators’ Journal, 43(4), 557–570. doi:10.7202/003425ar.

  • Laviosa, S. (2002). Corpus-based translation studies. Theory, findings, applications (Approaches to Translation Studies, 17). Amsterdam, New York.

  • Lind, S. (2007). Translation universals (or laws, or tendencies, or probabilities, or …?). TIC Talk. Newsletter of the United Bible Societies Translation Information Clearinghouse, 63. Retrieved from https://www.academia.edu/8696942/Translation_Universals_or_laws_or_tendencies_or_probabilities_or..._ (18 June 2015).

  • Mihăilă, C. (2010). Translation studies: simplification and explicitation universals. Retrieved from http://www.slideshare.net/claudiumihaila/report-3832657?related=1 (18 June 2015).

  • Orlov, J. (1982). Ein Modell der Häufigkeitsstruktur des Vokabulars. In H. Guiter & M. V. Arapov (Eds.), Studies on Zipf’s Law (Quantitative Linguistics 16, pp. 154–233). Bochum.

  • Teich, E. (2003). Cross-linguistic variation in system and text. A methodology for the investigation of translations and comparable texts (Text, Translation, Computational Processing, 5). Berlin, New York.

  • Tirkkonen-Condit, S. (2004). Unique items—over- or under-represented in translated language? In A. Mauranen & P. Kujamäki (Eds.), Translation universals. Do they exist? (Benjamins Translation Library, 48, pp. 177–184). Amsterdam, Philadelphia.

  • Xiao, R. (2010). How different is translated Chinese from native Chinese? A corpus-based study of translation universals. International Journal of Corpus Linguistics, 15(1), 5–35. doi:10.1075/ijcl.15.1.01xia.

  • Yang, H. Z., & Wei, N. X. (2002). Yu liao ku yu yan xue dao lun (transl. ‘An introduction to corpus linguistics’). Shanghai.

  • Yule, G. U. (1944). The statistical study of literary vocabulary. Cambridge.

  • Zanettin, F. (2011). Translation and corpus design. SYNAPS – A Journal of Professional Communication, 26, 14–23.

Author information

Corresponding author

Correspondence to Václav Cvrček.

Additional information

This paper was written within the Programme for the Development of Fields of Study at Charles University, No. P11 ‘Czech National Corpus’, sub-programme ‘Czech National Corpus’.

About this article

Cite this article

Cvrček, V., Chlumská, L. Simplification in translated Czech: a new approach to type-token ratio. Russ Linguist 39, 309–325 (2015). https://doi.org/10.1007/s11185-015-9151-8

  • DOI: https://doi.org/10.1007/s11185-015-9151-8
