Translationese and Register Variation in English-To-Russian Professional Translation

Kunilovskaya, Maria; Corpas Pastor, Gloria

doi:10.1007/978-981-16-4918-9_6

Part of the book series: New Frontiers in Translation Studies (NFTS)

Abstract

This study explores the impact of register on the properties of translations. We compare sources, translations and non-translated reference texts to describe the linguistic specificity of translations that is common to, or unique within, four registers. Our approach includes bottom-up identification of translationese effects that can be used to define translations in relation to the contrastive properties of each register. The analysis is based on an extended set of features that reflect morphological, syntactic and text-level characteristics of translations. We also experiment with lexis-based features from n-gram language models estimated on large bodies of originally authored texts from the included registers. Our parallel corpora are built from published English-to-Russian professional translations of general-domain mass-media texts, popular-scientific books, fiction and analytical texts on political and economic news. The number of observations and the data sizes for the parallel and reference components are comparable within each register and range from 166 (fiction) to 525 (media) text pairs, and from 300,000 to 1 million tokens. Methodologically, the research relies on a series of supervised and unsupervised machine learning techniques, including those that facilitate visual data exploration. We train a number of text classification models and study their performance to assess our hypotheses. We then analyse the usefulness of the features for these classifications to detect the best translationese indicators in each register. The multivariate analysis via text classification is complemented by univariate statistical analysis, which helps to explain the observed deviation of translated registers through a number of translationese effects and to detect the features that contribute to them. Our results demonstrate that each register generates a unique form of translationese that can be only partially explained by cross-linguistic factors. Translated registers differ in the amount and type of prevalent translationese. The same translationese tendencies are manifested through different features in different registers. In particular, the notorious shining-through effect is more noticeable in general media texts and news commentary and is less prominent in fiction.

Notes

1. https://www.rus-ltc.org/search.
2. Earlier studies that suggest that translationese is dependent on register are Steiner (1998), Reiss (1989) and Teich (2003), among others.
3. See, for instance, Baroni (2006), Kurokawa (2009), Arase (2013), Eetemadi (2015) and Rabinovich (2016).
4. Some relevant studies are Popescu (2011), Koppel (2011) and Nisioi (2013).
5. http://ufal.mff.cuni.cz/udpipe/models#universal_dependencies_20_models.
6. https://ruscorpora.ru/.
7. http://www.casmacat.eu/corpus/news-commentary.html.

References

Aharoni, R., M. Koppel, and Y. Goldberg. 2014. Automatic detection of machine translated text and translation quality estimation. In Proceedings of the 52nd annual meeting of the association for computational linguistics (ACL 2014), Vol. 1: Long Papers, ed. K. Toutanova, and H. Wu, 289–295. Association for Computational Linguistics. https://doi.org/10.3115/v1/p14-2048.
Arase, Y., and M. Zhou. 2013. Machine translation detection from monolingual web-text. In Proceedings of the 51st annual meeting of the association for computational linguistics, Vol. 1: Long Papers, ed. H. Schütze, F. Pascale, and M. Poesio, 1597–1607. Association for Computational Linguistics.
Baker, M. 1993. Corpus linguistics and translation studies: Implications and applications. In Text and technology: In honour of John Sinclair, ed. M. Baker, G. Francis, and E. Tognini-Bonelli, 232–250. Amsterdam: John Benjamins Publishing Company. https://doi.org/10.1075/z.64.15bak.
Baker, M. 1996. Corpus-based translation studies: The challenges that lie ahead. In Terminology, LSP and translation: Studies in language engineering, in honour of Juan C. Sager, ed. H. Somers, 175–186. Amsterdam: John Benjamins Publishing Company. https://doi.org/10.1075/btl.18.17bak.
Baroni, M., and S. Bernardini. 2006. A new approach to the study of translationese: Machine-learning the difference between original and translated text. Literary and Linguistic Computing 21 (3): 259–274. https://doi.org/10.1093/llc/fqi039.
Becher, V. 2011. Explicitation and implicitation in translation. A corpus-based study of English-German and German-English translations of business texts [Doctoral dissertation, Staats-und Universitätsbibliothek Hamburg Carl von Ossietzky]. https://ediss.sub.uni-hamburg.de/bitstream/ediss/4186/1/Dissertation.pdf.
Biber, D. 1988. Variation across speech and writing, 2nd ed. Cambridge: Cambridge University Press.
Biber, D. 1995. Dimensions of register variation: A cross-linguistic comparison. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9780511519871.
Biber, D., and S. Conrad. 2009. Register, genre, and style. Cambridge: Cambridge University Press.
Biber, D., S. Johansson, G. Leech, S. Conrad, and R. Quirk. 1999. Longman grammar of spoken and written English, vol. 2. Cambridge, MA: The MIT Press.
Castagnoli, S. 2009. Regularities and variations in learner translations: a corpus-based study of conjunctive explicitation [Doctoral dissertation, University of Pisa, Italy]. ETD System, electronic theses and dissertations repository. https://etd.adm.unipi.it/t/etd-04252009-135411/.
Castagnoli, S., D. Ciobanu, K. Kunz, N. Kübler, and A. Volanschi. 2011. Designing a learner translator corpus for training purposes. In Corpora, language, teaching, and resources: From theory to practice, Vol. 12, ed. N. Kübler, 221–248. Frankfurt: Peter Lang.
Chang, Y., and C. Lin. 2008. Feature ranking using linear SVM. In Proceedings of the workshop on the causation and prediction challenge at WCCI 2008, ed. I. Guyon, C. Aliferis, and G. Cooper, 53–64. Proceedings of Machine Learning Research.
Corpas Pastor, G. 2008. Investigar con corpus en traducción: Los retos de un nuevo paradigma. Frankfurt: Peter Lang. https://doi.org/10.4000/bulletinhispanique.1301.
Corpas Pastor, G., R. Mitkov, N. Afzal, and V. Pekar. 2008. Translation universals: Do they exist? A corpus-based NLP study of convergence and simplification. In Proceedings of the 8th conference of the association for machine translation in the Americas (AMTA’08), 21–25.
Delaere, I. 2015. Do translations walk the line? Visually exploring translated and non-translated texts in search of norm conformity. [Doctoral dissertation, Ghent University]. Academic Bibliography. https://biblio.ugent.be/publication/5888594.
Dipper, S., M. Seiss, and H. Zinsmeister. 2012. The use of parallel and comparable data for analysis of abstract anaphora in German and English. In Proceedings of the 8th international conference on language resources and evaluation (LREC 2012), ed. N. Calzolari, Kh. Choukri, Th. Declerck, M. Uğur Doğan, et al., 138–145. European Language Resources Association.
Diwersy, S., S. Evert, and S. Neumann. 2014. A semi-supervised multivariate approach to the study of language variation. In Linguistic variation in text and speech, within and across languages, ed. B. Szmrecsanyi, and B. Wälchli, 174–204. Berlin: De Gruyter Mouton.
Duff, A. 1981. The third language: Recurrent problems of translation into English. Oxford: Pergamon.
Eetemadi, S., and K. Toutanova. 2015. Detecting translation direction: A cross-domain study. In Proceedings of NAACL-HLT 2015 student research workshop (SRW), ed. D. Inkpen, S. Muresan, Sh. Lahiri, K. Mazidi, and A. Zhila, 103–109. https://doi.org/10.3115/v1/N15-2014.
Evert, S., and S. Neumann. 2017. The impact of translation direction on characteristics of translated texts: A multivariate analysis for English and German. In Empirical translation studies: New methodological and theoretical traditions, vol. 300, ed. G. De Sutter, M. Lefer, and I. Delaere, 47–80. Berlin: De Gruyter Mouton. https://doi.org/10.1515/9783110459586-003.
Fraser, B. 2006. Towards a theory of discourse markers. In Approaches to discourse particles, ed. K. Fischer, 189–204. London: Elsevier.
Frawley, W. 1984. Prolegomenon to a theory of translation. In Translation: Literary, linguistic & philosophical perspectives, ed. W. Frawley, 159–175. Newark: University of Delaware Press.
Gellerstam, M. 1986. Translationese in Swedish novels translated from English. In Translation studies in Scandinavia, ed. L. Wollin and H. Lindquist, 88–95. Lund: CWK Gleerup.
Goutte, C., D. Kurokawa, and P. Isabelle. 2009. Automatic detection of translated text and its impact on machine translation. In Proceedings of the 12th machine translation summit (MT Summit XII), 81–88.
Graham, Y., B. Haddow, and P. Koehn. 2020. Statistical power and translationese in machine translation evaluation. In Proceedings of the 2020 conference on empirical methods in natural language processing, 72–81. Association for Computational Linguistics.
Halliday, M.A.K., and R. Hasan. 1976. Cohesion in English. London: Longman.
Halliday, M., and R. Hasan. 1989. Language, context, and text: Aspects of language in a social-semiotic perspective (2nd ed.). Oxford University Press.
Hansen-Schirra, S. 2011. Between normalization and shining-through. Specific properties of English-German translations and their influence on the target language. In Multilingual discourse production: Diachronic and synchronic perspectives, ed. S. Kranich, 133–162. Amsterdam: John Benjamins.
Heafield, K. 2011. KenLM: Faster and smaller language model queries. In Proceedings of the EMNLP 2011 sixth workshop on statistical machine translation, ed. Ch. Callison-Burch, Ph. Koehn, Ch. Monz, and O. Zaidan, 187–197. Association for Computational Linguistics.
Ilisei, I., D. Inkpen, G. Corpas Pastor, and R. Mitkov. 2010. Identification of translationese: A machine learning approach. International conference on intelligent text processing and computational linguistics, 503–511.
Jiang, Z., and Y. Tao. 2017. Translation universals of discourse markers in Russian-to-Chinese academic texts: A corpus-based approach. Zeitschrift für Slawistik 62 (4): 583–605. https://doi.org/10.1515/slaw-2017-0037.
Jing, Y., and H. Liu. 2015. Mean hierarchical distance augmenting mean dependency distance. In Proceedings of the third international conference on dependency linguistics (Depling 2015), ed. J. Nivre and E. Hajicova, 161–170. Uppsala University.
Karakanta, A., and E. Teich. 2019. Detecting and analysing translationese with probabilistic language models. In Translation in Transition 4: 38–39.
Katinskaya, A., and S. Sharoff. 2015. Applying multi-dimensional analysis to a Russian webcorpus: Searching for evidence of genres. In Proceedings of the 5th workshop on Balto-Slavic natural language processing, ed. J. Piskorski, L. Pivovarova, J. Šnajder, H. Tanev, and R. Yangarber, 65–74. INCOMA Ltd. http://www.aclweb.org/anthology/W15-5311.
Koppel, M., and N. Ordan. 2011. Translationese and its dialects. In Proceedings of the 49th annual meeting of the association for computational linguistics: Human language technologies, Vol. 1, ed. D. Lin, Yu. Matsumoto, and R. Mihalcea, 1318–1326. Association for Computational Linguistics.
Kruger, H., and B. van Rooy. 2012. Register and the features of translated language. Across Languages and Cultures 13 (1): 33–65. https://doi.org/10.1556/Acr.13.2012.1.3.
Kruger, H., and B. van Rooy. 2010. The features of non-literary translated language: A pilot study. In Proceedings of using corpora in contrastive and translation studies (UCCTS 2010), ed. R. Xiao, 59–79.
Kunilovskaya, M. 2017. Linguistic tendencies in English to Russian translation: The case of connectives. In Computational linguistics and intellectual technologies: Proceedings of the international conference “Dialogue 2017”, Vol. 2, ed. V.P. Selegey, A.V. Baytin, V.I. Belikov, I.M. Boguslavsky, B.V. Dobrov, et al., 221–233. Computational Linguistics and Intellectual Technologies.
Kunilovskaya, M., and A. Kutuzov. 2018. Universal dependencies-based syntactic features in detecting human translation varieties. In Proceedings of the 16th international workshop on treebanks and linguistic theories (TLT16), ed. J. Hajič, 27–36. Association for Computational Linguistics.
Kunilovskaya, M., and E. Lapshinova-Koltunski. 2020. Lexicogrammatic translationese across two targets and competence levels. In Proceedings of the 12th conference on language resources and evaluation (LREC 2020), ed. N. Calzolari, F. Bechet, Ph. Blache, Kh. Choukri, et al., 4102–4112. The European Language Resources Association (ELRA).
Kunilovskaya, M., and E. Lapshinova-Koltunski. 2019. Translationese features as indicators of quality in English-Russian human translation. In Proceedings of the 2nd workshop on human-informed translation and interpreting technology (HiT-IT 2019), ed. I. Temnikova, C. Orasan, G. Corpas Pastor, and R. Mitkov, 47–56. INCOMA Ltd. https://doi.org/10.26615/issn.2683-0078.2019_006.
Kutuzov, A., and M. Kunilovskaya. 2014. Russian learner translator corpus: Design, research potential and applications. In Proceedings of the 17th international conference text, speech and dialogue, vol. 8655, ed. P. Sojka, A. Horák, I. Kopeček, and K. Pala, 315–323. Springer.
Lapshinova-Koltunski, E. 2017. Exploratory analysis of dimensions influencing variation in translation. The case of text register and translation method. In Empirical translation studies. New theoretical and methodological traditions, vol. 300, ed. G. De Sutter, M. Lefer, and I. Delaere, 207–234. Berlin: De Gruyter Mouton. https://doi.org/10.1515/9783110459586-008.
Lapshinova-Koltunski, E., and M. Zampieri. 2018. Linguistic features of genre and method variation in translation: A computational perspective. The grammar of genres and styles: From discrete to non-discrete units, (TiLSM, 320), 92–112. Berlin: De Gruyter Mouton.
Lee, D.Y.W. 2001. Genres, registers, text types, domains, and styles: Clarifying the concepts and navigating a path through the BNC jungle. Language Learning & Technology 5 (3): 37–72. https://doi.org/10.1016/S1364-6613(00)01594-1.
Lembersky, G., N. Ordan, and S. Wintner. 2012. Language models for machine translation: Original vs. translated texts. Computational Linguistics, 38 (4): 799–825. https://doi.org/10.1162/COLI_a_00111.
Lijffijt, J., T. Nevalainen, T. Säily, P. Papapetrou, K. Puolamäki, and H. Mannila. 2016. Significance testing of word frequencies in corpora. Digital Scholarship in the Humanities 31 (2): 374–397. https://doi.org/10.1093/llc/fqu064.
Liu, D. 2008. Linking adverbials: An across-register corpus study and its implications. International Journal of Corpus Linguistics 13 (4): 491–518. https://doi.org/10.1075/ijcl.13.4.05liu.
Martin, J.R. 1992. English text: System and structure. Amsterdam: John Benjamins.
Nakamura, S. 2007. Comparison of features of texts translated by professional and learner translators. In Proceedings of the 4th corpus linguistics conference. University of Birmingham.
Neumann, S. 2013. Contrastive register variation. A quantitative approach to the comparison of English and German. Berlin: De Gruyter Mouton.
Nikolaev, D., T. Karidi, N. Kenneth, V. Mitnik, L. Saeboe, and O. Abend. 2020. Morphosyntactic predictability of translationese. Linguistics Vanguard, 6 (1).
Nini, A. 2019. The multi-dimensional analysis tagger. In Multi-dimensional analysis: research methods and current issues, ed. T. Berber Sardinha, and M. Veirano Pinto, 67–94. London; New York: Bloomsbury Academic. https://doi.org/10.5040/9781350023857.0012.
Nisioi, S., and L.P. Dinu. 2013. A clustering approach for translationese identification. In Proceedings of the international conference recent advances in natural language processing (RANLP 2013), ed. R. Mitkov, G. Angelova, and K. Bontcheva, 532–538. INCOMA Ltd. http://www.aclweb.org/anthology/R13-1070.
Novikova, N.I. 2008. Connectives as cohesive devices in an asyndetic composite sentence [Konnektory kak svjazujushhie sredstva v bessojuznom slozhnom predlozhenii]. In Herald of the Voronezh state Architecture University, advanced linguistic and pedagogical research series [Ser.: Sovremennye lingvisticheskie i metodiko-didakticheskie issledovanija], 92–100.
Olohan, M. 2001. Spelling out the optionals in translation: A corpus study. UCREL Technical Papers 13: 423–432.
Popescu, M. 2011. Studying translationese at the character level. In Proceedings of the international conference recent advances in natural language processing (RANLP 2011), 634–639. http://aclweb.org/anthology/R11-1091.
Popovic, M. 2020. On the differences between human translations. In Proceedings of the 22nd annual conference of the European association for machine translation, ed. A. Martins, H. Moniz, S. Fumega, M. Martins, F. Batista, L. Coheur, C. Parra, … M. Forcada, 365–374. European Association for Machine Translation.
Prieels, L., I. Delaere, K. Plevoets, and G. De Sutter. 2015. A corpus-based multivariate analysis of linguistic norm-adherence in audiovisual and written translation. Across Languages and Cultures 16 (2): 209–231. https://doi.org/10.1556/084.2015.16.2.4.
Priyatkina, A.F., E.A. Starodumova, G.N. Sergeeva, et al. (eds.). 2001. A Russian dictionary of functional words [Slovar’ sluzhebnyh slov russkogo jazyka]. Vladivostok: Far-East State University Press.
Puurtinen, T. 2003. Genre-specific features of translationese? Linguistic differences between translated and non-translated Finnish children’s literature. Literary and Linguistic Computing 18 (4): 389–406. https://doi.org/10.1093/llc/18.4.389.
Rabadán, R., B. Labrador, and N. Ramón. 2009. Corpus-based contrastive analysis and translation universals: A tool for translation quality assessment. Babel 55 (4): 303–328. https://doi.org/10.1075/babel.55.4.01rab.
Rabinovich, E., and S. Wintner. 2013. Unsupervised identification of translationese. Transactions of the Association for Computational Linguistics 3: 419–432. https://doi.org/10.1162/tacl_a_00148.
Redelinghuys, K. 2016. Levelling-out and register variation in the translations of experienced and inexperienced translators: A corpus-based study. Stellenbosch Papers in Linguistics 45: 189–220. https://doi.org/10.5774/45-0-198.
Santini, M., A. Mehler, and S. Sharoff. 2010. Riding the rough waves of genre on the web concepts and research questions. In Genres on the web: Computational models and empirical studies, vol. 42, ed. A. Mehler, S. Sharoff, and M. Santini, 3–30. Springer Science & Business Media.
Santos, D. 1995. On grammatical translationese. In Proceedings of the 10th Nordic conference of computational linguistics (NODALIDA 1995), ed. K. Koskenniemi, 59–66. University of Helsinki.
Sharoff, S. 2018. Functional text dimensions for annotation of web corpora. Corpora 13 (1): 65–95. https://doi.org/10.3366/cor.2018.0136.
Shvedova, N. (ed.). 1980. Russian grammar. Moscow, Science [Nauka].
Sominsky, I., and S. Wintner. 2019. Automatic detection of translation direction. In Proceedings of the international conference on recent advances in natural language processing (RANLP 2019), ed. R. Mitkov and G. Angelova, 1131–1140. INCOMA Ltd. https://doi.org/10.26615/978-954-452-056-4_130.
Specia, L., G.H. Paetzold, and C. Scarton. 2015. Multi-level translation quality prediction with QUEST++. In Proceedings of ACL-IJCNLP 2015 system demonstrations, ed. H. Chen, and K. Markert, 115–120. Association for Computational Linguistics. https://doi.org/10.3115/v1/p15-4020.
Straka, M., and J. Straková. 2017. Tokenizing, POS tagging, lemmatizing and parsing UD 2.0 with UDPipe. In Proceedings of the CoNLL 2017 shared task: multilingual parsing from raw text to universal dependencies, ed. D. Zeman, J. Hajic, M. Popel, M. Potthast, M. Straka, F. Ginter, J. Nivre, and S. Petrov, 88–99. Association for Computational Linguistics. https://doi.org/10.18653/v1/K17-300.
Stymne, S. 2017. The effect of translationese on tuning for statistical machine translation. In Proceedings of the 21st Nordic conference of computational linguistics, ed. J. Tiedemann, 241–246. Linköping University Electronic Press.
Teich, E. 2003. Cross-linguistic variation in system and text. A methodology for the investigation of translations and comparable texts. (TTCP, 5). Berlin: De Gruyter Mouton.
Toury, G. 1995. Descriptive translation studies and beyond. Amsterdam: John Benjamins. https://doi.org/10.1075/btl.4.
Vela, M., and E. Lapshinova-Koltunski. 2015. Register-based machine translation evaluation with text classification techniques. In Proceedings of the 15th machine translation summit (Vol. 1: MT Researchers’ Track), ed. Y. Al-Onaizan, and W. Lewis, 215–228. Association for Machine Translation in the Americas.
Volansky, V., N. Ordan, and S. Wintner. 2015. On the features of translationese. Digital Scholarship in the Humanities 30 (1): 98–118. https://doi.org/10.1093/llc/fqt031.
Xiao, R., L. He, and Y. Ming. 2010. In pursuit of the third code: Using the ZJU corpus of translational Chinese in translation studies. In Using corpora in contrastive and translation studies, ed. R. Xiao, 182–214. New Castle: Cambridge Scholars Publishing.
Zanettin, F. 2013. Corpus methods for descriptive translation studies. Procedia-Social and Behavioral Sciences 95: 20–32. https://doi.org/10.1016/j.sbspro.2013.10.618.
Zhang, M., and A. Toral. 2019. The effect of translationese in machine translation test sets. In Proceedings of the fourth conference on machine translation (Volume 1: Research Papers), ed. O. Bojar, R. Chatterjee, Ch. Federmann, M. Fishel, Y. Graham, ... K. Verspoor, 73–81. Association for Computational Linguistics. https://doi.org/10.18653/v1/W19-5208.

Acknowledgements

The research presented in this paper has been partially carried out in the framework of the projects VIP (FFI2016-75831-P), TRIAGE (UMA18-FEDERJA-067) and MI4ALL (CEI-RIS3). The authors would like to thank two anonymous reviewers for their valuable comments.

Author information

Authors and Affiliations

Research Group in Computational Linguistics, University of Wolverhampton, Wolverhampton, UK
Maria Kunilovskaya & Gloria Corpas Pastor
University of Malaga, Malaga, Spain
Gloria Corpas Pastor

Authors

Maria Kunilovskaya
Gloria Corpas Pastor

Corresponding author

Correspondence to Maria Kunilovskaya.

Editor information

Editors and Affiliations

Department of English, The University of Macau, Macao, Macao
Vincent X. Wang
School of Languages and Translation, Macao Polytechnic Institute, Macao, Macao
Lily Lim
Department of English, University of Macau, Macao, Macao
Defeng Li

Appendix

The UD-based and list-based features in alphabetical order.

Preliminary Notes

1. Normalisation measures

We use several norms to make features comparable across corpora of different sizes, depending on the nature of the feature. Most of the features, including all types of discourse markers, negative particles, passives, types of verb forms, relative clauses, correlative constructions, adverbial clauses introduced by pronominal adverbs, coordinating and subordinating conjunctions, simple sentences and the number of clauses per sentence, are normalised to the number of sentences (30 features). Such features as personal, possessive and other noun substitutes, nouns, adverbial quantifiers and determiners are normalised to the number of running words (6 features). Counts for syntactic relations are represented as probabilities, normalised to the number of sentences (7 features). Some features have their own normalisation basis: comparative and superlative degrees are normalised to the total number of adjectives and adverbs, and nouns in the functions of subject, object or indirect object are normalised to the total number of these roles in the text.
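The three kinds of normalisation basis described above can be illustrated with a minimal sketch. The counts, the `TextCounts` container and the function names below are invented for illustration and are not taken from the chapter's extraction code; only the choice of basis per feature follows the description.

```python
from dataclasses import dataclass


@dataclass
class TextCounts:
    """Raw counts for one text (all values used below are invented)."""
    sentences: int
    tokens: int
    passives: int            # normalised to the number of sentences
    nouns: int               # normalised to the number of running words
    comparatives: int        # normalised to adjectives + adverbs
    adjectives_adverbs: int


def ratio(count: float, basis: float) -> float:
    """Return count / basis, or 0.0 when the basis is empty."""
    return count / basis if basis else 0.0


def normalise(c: TextCounts) -> dict:
    """Apply a feature-specific normalisation basis to each raw count."""
    return {
        "passives": ratio(c.passives, c.sentences),
        "nouns": ratio(c.nouns, c.tokens),
        "comp": ratio(c.comparatives, c.adjectives_adverbs),
    }


toy = TextCounts(sentences=40, tokens=800, passives=10,
                 nouns=200, comparatives=3, adjectives_adverbs=60)
features = normalise(toy)
print(features)
```

Returning 0.0 for an empty basis is a design choice of this sketch; a real pipeline might prefer to flag such texts instead.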

2. Groups of discourse markers

The classification of connectives (discourse markers) follows the descriptions in Halliday and Hasan (1976) and in Biber et al. (1999). Table A has the number of items in each group and the most frequent examples. The lists were initially produced independently from grammar reference books, dictionaries of function words and relevant research papers (for English we used Biber et al. (1999), Fraser (2006) and Liu (2008); for Russian, Novikova (2008), Priyatkina et al. (2001) and Russian Grammar (Shvedova 1980), to name just a few sources for each language). After the initial selection, the lists were verified for comparability. Following Fraser (2006), discourse markers are treated functionally and include items of various morphological and structural types (conjunctions, adverbs, particles, parenthetical phrases). Though most items on the lists are set phrases, we allowed for possible lexical and structural variability at extraction time. We also used orthography and punctuation to disambiguate our items. The output of the extraction procedure was manually checked to exclude greedy matching.
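A list-based extraction of this kind might look roughly like the sketch below. The four-item list, the rule that a marker must be followed by a comma, and the function names are assumptions made for illustration; the chapter does not specify its patterns, and its actual lists are considerably longer.

```python
import re

# Hypothetical mini-list of adversative markers (the real lists are longer).
ADVERSATIVE = ["however", "on the other hand", "nevertheless", "on the contrary"]


def compile_markers(items):
    """Build one case-insensitive pattern for a list of markers."""
    # Longest items first, so a multiword marker wins over a shorter one
    # that starts at the same position.
    ordered = sorted(items, key=len, reverse=True)
    # Tolerate variable whitespace inside multiword markers.
    alternatives = [r"\s+".join(re.escape(w) for w in item.split())
                    for item in ordered]
    # Crude punctuation-based disambiguation (an assumption of this sketch):
    # count a marker only when it is directly followed by a comma.
    return re.compile(r"\b(?:" + "|".join(alternatives) + r")\b(?=\s*,)",
                      re.IGNORECASE)


def marker_rate(sentences, pattern):
    """Cumulative marker frequency normalised to the number of sentences."""
    hits = sum(len(pattern.findall(s)) for s in sentences)
    return hits / len(sentences) if sentences else 0.0


pattern = compile_markers(ADVERSATIVE)
sentences = [
    "However, the results differ.",
    "On the other  hand, fiction is less affected.",
    "However hard we tried, nothing changed.",   # not a discourse marker
    "The effect is small.",
]
rate = marker_rate(sentences, pattern)
print(rate)
```

The comma rule keeps the concessive "However hard we tried" out of the count, but it would also miss markers used without a comma, which is one reason a manual check of the output remains necessary.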

Table 8 Number of listed connectives and discourse markers by category for each of the project languages and top five most frequent items

3. The alphabetic list of 45 morphosyntactic features

acl

finite and non-finite clausal modifier of noun (adjectival clause), including relative clauses as a subtype (used only in EN and RU); extraction is based on UD default annotation (e.g. the person showing (acl) her around; help people do something to overcome (acl) it; людeй, cлeдящиx (acl) зa пoлитикoй)

addit

additive connectives; cumulative frequency of the list items normalised to the number of sentences; see description in Table A

advers

adversative (contrastive) connectives; cumulative frequency of the list items normalised to the number of sentences; see description in Table A

attrib

adjectives and participles functioning as attributes; all words tagged as ADJ or VerbForm = Part with the amod dependency to their head (e.g. the rising sun; the coloured face; fried green tomatoes)

aux

auxiliary verbs; extraction is based on UD default annotation

aux:pass

auxiliary verbs in passive forms; extraction is based on UD default annotation

but

contrastive coordinating conjunction but (но), if not followed by also / и, также and not at the absolute end of the sentence

caus

causative connectives; cumulative frequency of the list items normalised to the number of sentences; see description in Table A

ccomp

clausal complement as annotated in UD (e.g. help people to do (ccomp) smth; нe oжидaли, чтo пpидeт (ccomp))

cconj

coordinating conjunctions: lemmas in and, or, both, yet, either, &, nor, plus, neither, ether / и, a, или, ни, дa, пpичeм, либo, зaтo, инaчe, тoлькo, aн, и/или, иль tagged CCONJ. Lists are used to filter out noise.

comp

comparative degree of comparison for adjectives and adverbs; synthetic forms are extracted based on the tag Degree = Comp, while analytical forms are counted as adjectives and adverbs with a dependent more/бoлee (бoльший)

copula

copula verbs; lemmas of be, быть, этo that have a cop relation to their head, excluding constructions with there as head for English

correl

correlative constructions of all types, where a PRON/DET (those, such) is syntactically or semantically connected to subsequent CONJ. In English they make a subset of relative clauses; in Russian they can also be a subtype of a clausal complement (e.g. of those who voted for him, raising the living standards of those that are poor)

demdets

pronominal determiners; lemmas in the function det from the lists this, some, these, that, any, all, every, another, each, those, either, such / этoт, вecь, тoт, тaкoй, кaкoй, кaждый, любoй, нeкoтopый, кaкoй-тo, oдин, ceй, этo, вcякий, нeкий, кaкoй-либo, кaкoй-нибyдь, кoe-кaкoй

deverbals

deverbal nouns, names of processes, actions, states. The extraction for English accounts for affixation (with most productive -ment, -tion / -ung, -tion) and conversion as types of derivation. In the first case the output is filtered with an empirically driven stop list. Converted nouns are counted from a list of true procedural nouns that were not fully substantivised. To produce this list we looked through the nominal occurrences of lemmas that also appear as verbs and filtered out items that prevail in their fully substantivised lexico-semantic variants in our data (such as design, set, measure, mark, press, stick, cross, trap, handle). For Russian we extracted nouns in -тие, -ение, -ание, -ство, -ция, -ота and employed a 150-item stop list to exclude fully substantivised words such as собрание, месторождение, министерство, телевидение, творчество, решение.

epist

epistemic stance discourse markers; cumulative frequency of the list items normalised to the number of sentences; see description in Table A

finites

verbs in finite form; extraction is based on UD default annotation VerbForm = Fin

indef

noun substitutes, i.e. pronouns par excellence, of indefinite, total and negative semantic subtypes; extraction is based on PRON tag with a filter list: anybody, anyone, anything, everybody, everyone, everything, nobody, none, nothing, somebody, someone, something, elsewhere, nowhere, everywhere, somewhere, anywhere / кoгдa, гдe, кyдa, oткyдa, oтчeгo, пoчeмy, зaчeм and words with -тo|-нибyдь|-либo, except starting with кaкoй; and items from ктo-ктo, кoгo-кoгo, кoмy-кoмy, кeм-кeм, кoм-кoм, чтo-чтo, чeгo-чeгo, чeмy-чeмy, чeм-чeм, кyдa-кyдa, гдe-гдe

infs

infinitives: all cases of a verb form tagged VerbForm = Inf with a dependent particle to, and cases of true bare infinitive, excluding those after modal verbs, have to, going to and modal adjectival predicates, but including cases after help, make, bid, let, see, hear, watch, dare, feel. For Russian, all occurrences of verb forms with the feature VerbForm = Inf, except after modal predicates and with the dependent быть, to exclude future forms (e.g. отношения будут ухудшаться).

interrog

interrogative sentences: all sentences ending in ?

lexdens

lexical density: ratio of PoS-disambiguated content word types (look_VERB vs look_NOUN) to all tokens

lexTTR

lexical type-to-token ratio: ratio of PoS-disambiguated content word types (look_VERB vs look_NOUN) to their tokens. Content words include lemmas in the ADJ, ADV, VERB, NOUN part-of-speech categories.
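The two lexical ratios can be sketched as follows. This is only an illustration of the definitions of lexdens and lexTTR given here, not the authors' extraction code: the input format (a list of `(lemma, upos)` pairs per text), the lowercasing of lemmas and the function name `lexical_measures` are assumptions.

```python
from typing import List, Tuple

# Content-word categories as listed in the lexTTR description.
CONTENT_UPOS = {"ADJ", "ADV", "VERB", "NOUN"}


def lexical_measures(tokens: List[Tuple[str, str]]) -> Tuple[float, float]:
    """Return (lexdens, lexTTR) for one text.

    tokens: (lemma, upos) pairs; this input format is an assumption.
    """
    # PoS-disambiguated content items, e.g. look_VERB vs look_NOUN.
    content = [f"{lemma.lower()}_{upos}" for lemma, upos in tokens
               if upos in CONTENT_UPOS]
    types = set(content)
    # lexdens: content word types over all tokens.
    lexdens = len(types) / len(tokens) if tokens else 0.0
    # lexTTR: content word types over content word tokens.
    lex_ttr = len(types) / len(content) if content else 0.0
    return lexdens, lex_ttr


sample = [
    ("the", "DET"), ("look", "NOUN"), ("on", "ADP"), ("she", "PRON"),
    ("face", "NOUN"), ("make", "VERB"), ("he", "PRON"), ("look", "VERB"),
    ("away", "ADV"), ("and", "CCONJ"), ("look", "VERB"), ("again", "ADV"),
]
lexdens, lex_ttr = lexical_measures(sample)
print(lexdens, lex_ttr)
```

In the sample, look_NOUN and look_VERB count as two different types, while the two occurrences of look_VERB count as one type and two tokens.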

mdd

mean dependency distance (MDD, aka comprehension difficulty) as ‘the distance between words and their parents, measured in terms of intervening words’ (Jing and Liu 2015: 162)

mhd

mean hierarchical distance (MHD, aka production (speaker’s) difficulty) as the average value of all path lengths travelling from the root to all nodes along the dependency edges (Jing and Liu 2015: 164)
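A minimal sketch of how mdd and mhd could be computed for one sentence. It assumes a well-formed dependency tree given as a list of 1-based head indices (0 marks the root), takes the dependency distance to be the absolute difference between the positions of a word and its head, and averages over non-root nodes only; these conventions and the function names are assumptions, not details confirmed by the description above.

```python
from typing import List


def mean_dependency_distance(heads: List[int]) -> float:
    """Average |position - head position| over all non-root tokens."""
    dists = [abs(i - h) for i, h in enumerate(heads, start=1) if h != 0]
    return sum(dists) / len(dists) if dists else 0.0


def mean_hierarchical_distance(heads: List[int]) -> float:
    """Average number of edges between the root and each non-root token."""
    def depth(i: int) -> int:
        d = 0
        while heads[i - 1] != 0:
            i = heads[i - 1]
            d += 1
            if d > len(heads):  # guard against malformed (cyclic) input
                raise ValueError("cycle in dependency structure")
        return d

    depths = [depth(i) for i in range(1, len(heads) + 1) if heads[i - 1] != 0]
    return sum(depths) / len(depths) if depths else 0.0


# "The cat chased a mouse": The -> cat, cat -> chased, chased = root,
# a -> mouse, mouse -> chased
heads = [2, 3, 0, 5, 3]
mdd = mean_dependency_distance(heads)
mhd = mean_hierarchical_distance(heads)
print(mdd, mhd)
```

Text-level values would then be obtained by aggregating over sentences, for example by averaging; that aggregation step is likewise an assumption.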

mpred

modal predicates; for English all verbs tagged as MD in XPOS, except will/shall, constructions with have-to-Inf and all adjectival modal predicates (given a list of 17 predicatives such as impossible, likely, sure with a dependent AUX). For Russian: lemma мoчь, lemma cлeдoвaть with a dependent infinitive, three modal adverbs (мoжнo, нeльзя, нaдo) and 11 adjectives from the modal predicative list in the short form Variant = Short (e.g. дoлжeн, cпocoбный, вoзмoжный)

mquantif

adverbial quantifiers; listed lemmas tagged ADV. The support lists include 37 English items (e.g. barely, completely, intensely, almost), 80 Russian items (aбcoлютнo, пoлнocтью, cплoшь, нeoбыкнoвeннo, дocтaтoчнo, coвepшeннo, нeвынocимo, пpимepнo). For Russian we additionally provide for functionally similar non-adverbial quantifiers such as eлe, oчeнь, вшecтepo, нeвыpaзимo, излишнe, eлe-eлe, чyть-чyть, eдвa-eдвa, тoлькo, кaпeлькy, чyтoчкy, eдвa.

neg

negative particles or main sentence negation: counts of lemmas in no, not, neither /нeт, нe

nnargs

core verbal arguments represented by nouns or proper names; ratio of nouns and proper names in the functions of nsubj, obj, iobj to the count of these functions
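The nnargs ratio can be illustrated with a short sketch. The input format (one `(upos, deprel)` pair per token) and the function name are assumptions made for the example.

```python
from typing import List, Tuple

CORE_ARGS = {"nsubj", "obj", "iobj"}   # functions named in the description
NOMINAL = {"NOUN", "PROPN"}            # nouns and proper names


def nnargs(tokens: List[Tuple[str, str]]) -> float:
    """Share of core arguments (nsubj, obj, iobj) filled by NOUN or PROPN."""
    core = [upos for upos, deprel in tokens if deprel in CORE_ARGS]
    if not core:
        return 0.0
    return sum(1 for upos in core if upos in NOMINAL) / len(core)


# "Mary gave him a book"
sent = [("PROPN", "nsubj"), ("VERB", "root"), ("PRON", "iobj"),
        ("DET", "det"), ("NOUN", "obj")]
ratio = nnargs(sent)
print(ratio)
```

Here two of the three core arguments (Mary, book) are nominal, while him is pronominal.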

nsubj:pass

subjects of verbs in the passive voice; extraction is based on UD default nsubj:pass annotation

numcls

number of clauses per sentence; number of relations from the list csubj, acl:relcl, advcl, acl, xcomp, parataxis annotated in one sentence

passives

passive constructions with expressed agentive role; all verbs tagged Voice = Pass and a dependent aux:pass (for English). For Russian we account for two morphological forms (вoйнa вeлacь, пoлитикa былa нaпpaвлeнa) and for semantic passive (cтaдиoн вoзвoдят нa нoвoм мecтe, вo Bлaдикaвкaзe eмy гoтoвят paдyшнyю вcтpeчy)

parataxis

asyndetically connected coordinated clauses (often direct speech or clauses joined by ‘:’ or ‘;’, as well as parenthetical clauses); extraction is based on UD default annotation

pasttense

verbs in the past tense: all occurrences of the feature Tense = Past

pied

correlative constructions with displaced (pied-piped) preposition (e.g. technology for which Sony could take credit; speech in which he made this argument; o тaкoм, o кaкoм вы нe cлыxaли; cкaндaл, в кoтopoм; тpaгeдии, c кoтopыми, в тoй кoнcтpyкции, в кaкoй oнa)

possdet

possessive pronouns; for English lemma in my, your, his, her, its, our, their tagged DET, PRON and Poss = Yes. For Russian lemma in мoй, твoй, вaш, eгo, ee, eё, нaш, иx, иxний, cвoй tagged DET

ppron

personal pronouns; tokens tagged PRON, with any value of attribute Person = that do not have Poss = Yes feature and are on the list: i, you, he, she, it, we, they, me, him, her, us, them / я, ты, вы, oн, oнa, oнo, мы, oни, мeня, тeбя, eгo, eё, ee, нac, вac, иx, нeё, нee, нeгo, ниx, мнe, тeбe, eй, eмy, нaм, вaм, им, нeй, нeмy, ним, мeня, тeбя, нeгo, мнoй, мнoю, тoбoй, тoбoю, Baми, им, eй, eю, нaми, вaми, ими, ним, нeм, нём, нeй, нeю

pverbals

participles: for English all occurrences of VerbForm = Part or VerbForm = Ger not in attributive function amod or part of an analytical form. For Russian VerbForm = Part not in the short form and not in the attributive function, without a dependent auxiliary, and VerbForm = Conv without dependent auxiliary (e.g. after years of translating emails, webinars and other materials)

relativ

all relative clauses, including correlative constructions and pied-piping construction. Extraction is based on affirmative sentences only. For English: which, that, whose, whom, what, who tagged as PRON, excluding cases when relative PRON has a dependent preposition and follows its head (e.g. But we will return to that (PRON) later). For Russian: кoтopый, чтo, ктo, кaкoй and a comma in the left window of 3

sconj

subordinating conjunctions: lemma in that, if, as, of, while, because, by, for, to, than, whether, in, about, before, after, on, with, from, like, although, though, since, once, so, at, without, until, into, despite, unless, whereas, over, upon, whilst, beyond, towards, toward, but, except, cause, together / чтo, кaк, ecли, чтoбы, тo, кoгдa, чeм, xoтя, пocкoлькy, пoкa, тeм, вeдь, нeжeли, ибo, пycть, бyдтo, cлoвнo, дaбы, paз, нacкoлькo, тoт, кoли, кoль, xoть, paзвe, cкoль, eжeли, пoкyдa, пocтoлькy tagged SCONJ. Lists are used to filter out noise.

sentlength

number of words per sentence averaged over all sentences in the text. The extraction accounts for typical sentence tokenisation errors such as sentences ending in ‘:’, ‘;’, ‘Mr.’, ‘Dr.’
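One way to picture the sentlength feature is sketched below. The description does not say how tokenisation errors are handled. Gluing a fragment that ends in one of the listed items onto the following sentence, and counting only tokens that contain a letter or digit as words, are both assumptions of this example.

```python
from typing import List

# Endings named in the description as typical sentence-splitting errors.
BAD_ENDINGS = (":", ";", "Mr.", "Dr.")


def merge_split_sentences(sentences: List[List[str]]) -> List[List[str]]:
    """Join fragments that end in a suspicious token with the next sentence."""
    merged: List[List[str]] = []
    carry: List[str] = []
    for sent in sentences:
        current = carry + sent
        if not current:
            continue
        if current[-1] in BAD_ENDINGS:
            carry = current          # wait for the continuation
        else:
            merged.append(current)
            carry = []
    if carry:                        # trailing fragment with no continuation
        merged.append(carry)
    return merged


def mean_sentence_length(sentences: List[List[str]]) -> float:
    """Average number of word tokens per (repaired) sentence."""
    merged = merge_split_sentences(sentences)
    lengths = [sum(1 for tok in sent if any(ch.isalnum() for ch in tok))
               for sent in merged]
    return sum(lengths) / len(lengths) if lengths else 0.0


text = [["I", "met", "Dr."], ["Smith", "yesterday", "."], ["He", "left", "."]]
avg_len = mean_sentence_length(text)
print(avg_len)
```

Without the repair step the sample would be counted as three sentences rather than two, which would lower the average.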

simple

simple sentence; a sentence where no words have relations: csubj, acl:relcl, advcl, acl, xcomp, parataxis
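The numcls and simple features rest on the same list of clausal relations, so they can be sketched together. The input format (one list of UD relation labels per sentence) and the text-level aggregation (mean count per sentence, share of simple sentences) are assumptions of this illustration.

```python
from typing import List, Tuple

# Relations listed in the numcls and simple descriptions.
CLAUSAL_RELS = {"csubj", "acl:relcl", "advcl", "acl", "xcomp", "parataxis"}


def count_clausal_relations(deprels: List[str]) -> int:
    """Number of clausal relations annotated in one sentence."""
    return sum(1 for rel in deprels if rel in CLAUSAL_RELS)


def is_simple(deprels: List[str]) -> bool:
    """A sentence is simple if none of its words bears a clausal relation."""
    return count_clausal_relations(deprels) == 0


def clause_features(sentences: List[List[str]]) -> Tuple[float, float]:
    """Return (mean clausal relations per sentence, share of simple sentences)."""
    if not sentences:
        return 0.0, 0.0
    counts = [count_clausal_relations(s) for s in sentences]
    simple_share = sum(1 for c in counts if c == 0) / len(sentences)
    return sum(counts) / len(sentences), simple_share


doc = [
    ["nsubj", "root", "obj", "punct"],                      # simple
    ["nsubj", "root", "xcomp", "mark", "advcl", "punct"],   # two clausal relations
    ["det", "nsubj", "acl:relcl", "root", "punct"],         # one clausal relation
]
numcls, simple = clause_features(doc)
print(numcls, simple)
```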

sup

superlative degree of comparison for adjectives and adverbs; synthetic forms are extracted based on the tag Degree = Sup, while analytical forms are counted as adjectives and adverbs with a dependent most/нaибoлee/caмый and, for Russian, words starting with нaи- with the exception of a few homonymous adverbs (нaиcкocoк)

tempseq

temporal and sequential connectives; cumulative frequency of the list items normalised to the number of sentences; see description in Table A
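List-based features such as tempseq and epist share one computation: the cumulative frequency of the list items divided by the number of sentences. The sketch below illustrates that normalisation only. The three-item marker list is hypothetical and stands in for the actual lists described in Table A, and simple case-insensitive token matching is an assumption.

```python
from typing import List, Sequence, Tuple

# Hypothetical mini-list for illustration; NOT the list from Table A.
HYPOTHETICAL_MARKERS: List[Tuple[str, ...]] = [
    ("then",), ("afterwards",), ("first", "of", "all"),
]


def marker_rate(sentences: List[List[str]],
                markers: Sequence[Tuple[str, ...]]) -> float:
    """Total number of marker occurrences divided by the number of sentences."""
    if not sentences:
        return 0.0
    hits = 0
    for sent in sentences:
        low = [tok.lower() for tok in sent]
        for marker in markers:
            n = len(marker)
            # Count every position where the (possibly multiword) item matches.
            hits += sum(1 for i in range(len(low) - n + 1)
                        if tuple(low[i:i + n]) == tuple(marker))
    return hits / len(sentences)


doc = [
    ["First", "of", "all", ",", "we", "tokenise", "."],
    ["Then", "we", "parse", "."],
    ["The", "results", "follow", "."],
    ["We", "stop", "."],
]
rate = marker_rate(doc, HYPOTHETICAL_MARKERS)
print(rate)
```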

whconj

adverbial clause introduced by a pronominal ADV when, where, why / кoгдa, гдe, кyдa, oткyдa, oтчeгo, пoчeмy, зaчeм

xcomp

a predicative or clausal complement without its own subject, annotated after phrasal verbs (e.g. started to sing), in case of infinitive constructions (e.g. asked me to leave), etc.; extraction is based on UD default annotation

About this chapter

Cite this chapter

Kunilovskaya, M., Corpas Pastor, G. (2021). Translationese and Register Variation in English-To-Russian Professional Translation. In: Wang, V.X., Lim, L., Li, D. (eds) New Perspectives on Corpus Translation Studies. New Frontiers in Translation Studies. Springer, Singapore. https://doi.org/10.1007/978-981-16-4918-9_6

DOI: https://doi.org/10.1007/978-981-16-4918-9_6
Published: 12 October 2021
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-4917-2
Online ISBN: 978-981-16-4918-9
eBook Packages: Education (R0)
