Prominent POS-Grams and n-Grams in Translated Czech in the Mirror of the English Source Texts

  • Lucie Chlumská
Part of the Quantitative Methods in the Humanities and Social Sciences book series (QMHSS)


The most typical or prominent POS-grams, i.e., sequences of parts of speech or possibly other grammatical categories, can reveal a lot about the character of a text, especially with regard to its dynamics (reflected in the dominance of nominal or verbal constructions) or lexical density (the accumulation of lexical words as opposed to grammatical word sequences).

Previous research on translated Czech has shown that the POS-grams salient in translated texts differ from those in comparable non-translated Czech texts: they include more verbal combinations and pronouns. Their concrete realizations, e.g., the most frequent n-grams (sequences of n words) instantiating a given POS combination, have pointed to a possible interference effect from the most represented source language, English.
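To make the two notions concrete, the following minimal sketch counts POS-grams in a tagged token sequence and records the word n-grams that realize each of them. It is an illustration only, not the procedure used in the study: the function names, the toy sentence, and the coarse tag labels (PRON, VERB, CONJ) are assumptions made for this example, and real corpus work would rely on the tagging supplied with the corpus.

```python
from collections import Counter, defaultdict

def pos_grams(tagged, n):
    """Yield (tag sequence, word sequence) for every window of n tokens."""
    for i in range(len(tagged) - n + 1):
        window = tagged[i:i + n]
        tags = tuple(tag for _, tag in window)
        words = tuple(word.lower() for word, _ in window)
        yield tags, words

def count_pos_grams(tagged, n):
    """Count POS-grams and, for each one, the n-grams that realize it."""
    pos_counts = Counter()
    realizations = defaultdict(Counter)
    for tags, words in pos_grams(tagged, n):
        pos_counts[tags] += 1
        realizations[tags][words] += 1
    return pos_counts, realizations

# Hand-tagged toy input; the tag labels are hypothetical, not a real tagset.
tagged = [
    ("she", "PRON"), ("had", "VERB"), ("seen", "VERB"), ("him", "PRON"),
    ("and", "CONJ"),
    ("he", "PRON"), ("had", "VERB"), ("seen", "VERB"), ("her", "PRON"),
]

pos_counts, realizations = count_pos_grams(tagged, n=3)

# List each POS-gram with its frequency and its most frequent realization.
for tags, freq in pos_counts.most_common():
    top_words, _ = realizations[tags].most_common(1)[0]
    print(" ".join(tags), freq, " ".join(top_words))
```

Keeping the realizations keyed by POS-gram mirrors the two-step reading described above: the POS-gram frequencies show which grammatical patterns dominate, and the n-grams under each pattern show which word sequences are responsible for that dominance.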

This study builds on the previous POS-gram and n-gram research on translated Czech and strives to describe and interpret the prominent POS-grams in translated Czech in the light of their corresponding English source texts, using a parallel corpus (namely, the English–Czech part of the InterCorp corpus). As a theoretical basis for description, hypotheses about translation universals are discussed. The results of the analysis indicate that some of the presumably universal translation tendencies can certainly be traced in Czech translations; however, translators’ choices tend to be the result of a combination of factors rather than a single reason (such as explicitation or normalization). The study also comments on the specificities of cross-linguistic comparison based on POS-grams and n-grams in two typologically different languages.


Keywords: Language of translation, POS-grams, n-grams, Parallel corpus, Interference



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Lucie Chlumská (1)
  1. Institute of the Czech National Corpus, Charles University, Prague, Czech Republic
