Abstract
This study explores the impact of register on the properties of translations. We compare sources, translations and non-translated reference texts to describe the linguistic specificity of translations, both shared across and unique to four registers. Our approach includes bottom-up identification of translationese effects that can be used to define translations in relation to the contrastive properties of each register. The analysis is based on an extended set of features that reflect morphological, syntactic and text-level characteristics of translations. We also experiment with lexis-based features from n-gram language models estimated on large bodies of originally authored texts from the included registers. Our parallel corpora are built from published English-to-Russian professional translations of general domain mass-media texts, popular-scientific books, fiction and analytical texts on political and economic news. The number of observations and the data sizes for the parallel and reference components are comparable within each register and range from 166 (fiction) to 525 (media) text pairs, and from 300,000 to 1 million tokens. Methodologically, the research relies on a series of supervised and unsupervised machine learning techniques, including those that facilitate visual data exploration. We train a number of text classification models and study their performance to assess our hypotheses. We then analyse the usefulness of the features for these classifications to detect the best translationese indicators in each register. The multivariate analysis via text classification is complemented by univariate statistical analysis, which helps to explain the observed deviation of translated registers through a number of translationese effects and to detect the features that contribute to them. Our results demonstrate that each register generates a unique form of translationese that can be only partially explained by cross-linguistic factors.
Translated registers differ in the amount and type of prevalent translationese. The same translationese tendencies in different registers are manifested through different features. In particular, the notorious shining-through effect is more noticeable in general media texts and news commentary and is less prominent in fiction.
Notes
- 2. Earlier studies suggesting that translationese depends on register include Steiner (1998), Reiss (1989) and Teich (2003), among others.
References
Aharoni, R., M. Koppel, and Y. Goldberg. 2014. Automatic detection of machine translated text and translation quality estimation. In Proceedings of the 52nd annual meeting of the association for computational linguistics (ACL 2014), Vol. 1: Long Papers, ed. K. Toutanova, and H. Wu, 289–295. Association for Computational Linguistics https://doi.org/10.3115/v1/p14-2048.
Arase, Y., and M. Zhou. 2013. Machine translation detection from monolingual web-text. In Proceedings of the 51st annual meeting of the association for computational linguistics, Vol. 1: Long Papers, ed. H. Schütze, F. Pascale, and M. Poesio, 1597–1607. Association for Computational Linguistics.
Baker, M. 1993. Corpus linguistics and translation studies: Implications and applications. In Text and technology: In honour of John Sinclair, ed. M. Baker, G. Francis, and E. Tognini-Bonelli, 232–250. Amsterdam: John Benjamins Publishing Company. https://doi.org/10.1075/z.64.15bak.
Baker, M. 1996. Corpus-based translation studies: The challenges that lie ahead. In Terminology, LSP and translation: Studies in language engineering, in honour of Juan C. Sager, ed. H. Somers, 175–186. Amsterdam: John Benjamins Publishing Company. https://doi.org/10.1075/btl.18.17bak.
Baroni, M., and S. Bernardini. 2006. A new approach to the study of translationese: Machine-learning the difference between original and translated text. Literary and Linguistic Computing 21 (3): 259–274. https://doi.org/10.1093/llc/fqi039.
Becher, V. 2011. Explicitation and implicitation in translation. A corpus-based study of English-German and German-English translations of business texts [Doctoral dissertation, Staats-und Universitätsbibliothek Hamburg Carl von Ossietzky]. https://ediss.sub.uni-hamburg.de/bitstream/ediss/4186/1/Dissertation.pdf.
Biber, D. 1988. Variation across speech and writing, 2nd ed. Cambridge: Cambridge University Press.
Biber, D. 1995. Dimensions of register variation: A cross-linguistic comparison. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9780511519871.
Biber, D., and S. Conrad. 2009. Register, genre, and style. Cambridge: Cambridge University Press.
Biber, D., S. Johansson, G. Leech, S. Conrad, and R. Quirk. 1999. Longman grammar of spoken and written English, vol. 2. Cambridge, MA: The MIT Press.
Castagnoli, S. 2009. Regularities and variations in learner translations: a corpus-based study of conjunctive explicitation [Doctoral dissertation, University of Pisa, Italy]. ETD System, electronic theses and dissertations repository. https://etd.adm.unipi.it/t/etd-04252009-135411/.
Castagnoli, S., D. Ciobanu, K. Kunz, N. Kübler, and A. Volanschi. 2011. Designing a learner translator corpus for training purposes. In Corpora, language, teaching, and resources: From theory to practice, Vol. 12, ed. N. Kübler, 221–248. Frankfurt: Peter Lang.
Chang, Y., and C. Lin. 2008. Feature ranking using linear SVM. In Proceedings of the workshop on the causation and prediction challenge at WCCI 2008, ed. I. Guyon, C. Aliferis, and G. Cooper, 53–64. Proceedings of Machine Learning Research.
Corpas Pastor, G. 2008. Investigar con corpus en traducción: Los retos de un nuevo paradigma. Frankfurt: Peter Lang. https://doi.org/10.4000/bulletinhispanique.1301.
Corpas Pastor, G., R. Mitkov, N. Afzal, and V. Pekar. 2008. Translation universals: Do they exist? A corpus-based NLP study of convergence and simplification. In Proceedings of the 8th conference of the association for machine translation in the Americas (AMTA’08), 21–25.
Delaere, I. 2015. Do translations walk the line? Visually exploring translated and non-translated texts in search of norm conformity. [Doctoral dissertation, Ghent University]. Academic Bibliography. https://biblio.ugent.be/publication/5888594.
Dipper, S., M. Seiss, and H. Zinsmeister. 2012. The use of parallel and comparable data for analysis of abstract anaphora in German and English. In Proceedings of the 8th international conference on language resources and evaluation (LREC 2012), ed. N. Calzolari, Kh. Choukri, Th. Declerck, M. Uğur Doğan, et al., 138–145. European Language Resources Association.
Diwersy, S., S. Evert, and S. Neumann. 2014. A semi-supervised multivariate approach to the study of language variation. In Linguistic variation in text and speech, within and across languages, ed. B. Szmrecsanyi, and B. Wälchli, 174–204. Berlin: De Gruyter Mouton.
Duff, A. 1981. The third language: Recurrent problems of translation into English. Oxford: Pergamon.
Eetemadi, S., and K. Toutanova. 2015. Detecting translation direction: A cross-domain study. In Proceedings of NAACL-HLT 2015 student research workshop (SRW), ed. D. Inkpen, S. Muresan, Sh. Lahiri, K. Mazidi, and A. Zhila, 103–109. https://doi.org/10.3115/v1/N15-2014.
Evert, S., and S. Neumann. 2017. The impact of translation direction on characteristics of translated texts: A multivariate analysis for English and German. In Empirical translation studies: New methodological and theoretical traditions, vol. 300, ed. G. De Sutter, M. Lefer, and I. Delaere, 47–80. Berlin: De Gruyter Mouton. https://doi.org/10.1515/9783110459586-003.
Fraser, B. 2006. Towards a theory of discourse markers. In Approaches to discourse particles, ed. K. Fischer, 189–204. London: Elsevier.
Frawley, W. 1984. Prolegomenon to a theory of translation. In Translation: Literary, linguistic & philosophical perspectives, ed. W. Frawley, 159–175. Newark: University of Delaware Press.
Gellerstam, M. 1986. Translationese in Swedish novels translated from English. In Translation studies in Scandinavia, ed. L. Wollin and H. Lindquist, 88–95. Lund: CWK Gleerup.
Goutte, C., D. Kurokawa, and P. Isabelle. 2009. Automatic detection of translated text and its impact on machine translation. In Proceedings of the 12th machine translation summit (MT Summit XII), 81–88.
Graham, Y., B. Haddow, and P. Koehn. 2020. Statistical power and translationese in machine translation evaluation. In Proceedings of the 2020 conference on empirical methods in natural language processing, 72–81. Association for Computational Linguistics.
Halliday, M.A.K., and R. Hasan. 1976. Cohesion in English. London: Longman.
Halliday, M.A.K., and R. Hasan. 1989. Language, context, and text: Aspects of language in a social-semiotic perspective, 2nd ed. Oxford: Oxford University Press.
Hansen-Schirra, S. 2011. Between normalization and shining-through. Specific properties of English-German translations and their influence on the target language. In Multilingual discourse production: Diachronic and synchronic perspectives, ed. S. Kranich, 133–162. Amsterdam: John Benjamins.
Heafield, K. 2011. KenLM: Faster and smaller language model queries. In Proceedings of the EMNLP 2011 sixth workshop on statistical machine translation, ed. Ch. Callison-Burch, Ph. Koehn, Ch. Monz, and O. Zaidan, 187–197. Association for Computational Linguistics.
Ilisei, I., D. Inkpen, G. Corpas Pastor, and R. Mitkov. 2010. Identification of translationese: A machine learning approach. International conference on intelligent text processing and computational linguistics, 503–511.
Jiang, Z., and Y. Tao. 2017. Translation universals of discourse markers in Russian-to-Chinese academic texts: A corpus-based approach. Zeitschrift Fur Slawistik 62 (4): 583–605. https://doi.org/10.1515/slaw-2017-0037.
Jing, Y., and H. Liu. 2015. Mean hierarchical distance augmenting mean dependency distance. In Proceedings of the third international conference on dependency linguistics (Depling 2015), ed. J. Nivre and E. Hajicova, 161–170. Uppsala University.
Karakanta, A., and E. Teich. 2019. Detecting and analysing translationese with probabilistic language models. In Translation in Transition 4, 38–39.
Katinskaya, A., and S. Sharoff. 2015. Applying multi-dimensional analysis to a Russian webcorpus: Searching for evidence of genres. In Proceedings of the 5th workshop on Balto-Slavic natural language processing, ed. J. Piskorski, L. Pivovarova, J. Šnajder, H. Tanev, and R. Yangarber, 65–74. INCOMA Ltd. http://www.aclweb.org/anthology/W15-5311.
Koppel, M., and N. Ordan. 2011. Translationese and its dialects. In Proceedings of the 49th annual meeting of the association for computational linguistics: Human language technologies, Vol. 1, ed. D. Lin, Yu. Matsumoto, and R. Mihalcea, 1318–1326. Association for Computational Linguistics.
Kruger, H., and B. van Rooy. 2012. Register and the features of translated language. Across Languages and Cultures 13 (1): 33–65. https://doi.org/10.1556/Acr.13.2012.1.3.
Kruger, H., and B. van Rooy. 2010. The features of non-literary translated language: A pilot study. In Proceedings of using corpora in contrastive and translation studies (UCCTS 2010), ed. R. Xiao, 59–79.
Kunilovskaya, M. 2017. Linguistic tendencies in English to Russian translation: The case of connectives. In Computational linguistics and intellectual technologies: Proceedings of the international conference “Dialogue 2017”, Vol. 2, ed. V.P. Selegey, A.V. Baytin, V.I. Belikov, I.M. Boguslavsky, B.V. Dobrov, et al., 221–233. Computational Linguistics and Intellectual Technologies.
Kunilovskaya, M., and A. Kutuzov. 2018. Universal dependencies-based syntactic features in detecting human translation varieties. In Proceedings of the 16th international workshop on treebanks and linguistic theories (TLT16), ed. J. Hajič, 27–36. Association for Computational Linguistics.
Kunilovskaya, M., and E. Lapshinova-Koltunski. 2020. Lexicogrammatic translationese across two targets and competence levels. In Proceedings of the 12th conference on language resources and evaluation (LREC 2020), ed. N. Calzolari, F. Bechet, Ph. Blache, Kh. Choukri, et al., 4102–4112. The European Language Resources Association (ELRA).
Kunilovskaya, M., and E. Lapshinova-Koltunski. 2019. Translationese features as indicators of quality in English-Russian human translation. In Proceedings of the 2nd workshop on human-informed translation and interpreting technology (HiT-IT 2019), ed. I. Temnikova, C. Orasan, G. Corpas Pastor, and R. Mitkov, 47–56. INCOMA Ltd. https://doi.org/10.26615/issn.2683-0078.2019_006.
Kutuzov, A., and M. Kunilovskaya. 2014. Russian learner translator corpus: Design, research potential and applications. In Proceedings of the 17th international conference text, speech and dialogue, vol. 8655, ed. P. Sojka, A. Horák, I. Kopeček, and K. Pala, 315–323. Springer.
Lapshinova-Koltunski, E. 2017. Exploratory analysis of dimensions influencing variation in translation. The case of text register and translation method. In Empirical translation studies. New theoretical and methodological traditions, vol. 300, ed. G. De Sutter, M. Lefer, and I. Delaere, 207–234. Berlin: De Gruyter Mouton. https://doi.org/10.1515/9783110459586-008.
Lapshinova-Koltunski, E., and M. Zampieri. 2018. Linguistic features of genre and method variation in translation: A computational perspective. The grammar of genres and styles: From discrete to non-discrete units, (TiLSM, 320), 92–112. Berlin: De Gruyter Mouton.
Lee, D.Y.W. 2001. Genres, registers, text types, domains, and styles: Clarifying the concepts and navigating a path through the BNC jungle. Language Learning & Technology 5 (3): 37–72. https://doi.org/10.1016/S1364-6613(00)01594-1.
Lembersky, G., N. Ordan, and S. Wintner. 2012. Language models for machine translation: Original vs. translated texts. Computational Linguistics, 38 (4): 799–825. https://doi.org/10.1162/COLI_a_00111.
Lijffijt, J., T. Nevalainen, T. Säily, P. Papapetrou, K. Puolamäki, and H. Mannila. 2016. Significance testing of word frequencies in corpora. Digital Scholarship in the Humanities 31 (2): 374–397. https://doi.org/10.1093/llc/fqu064.
Liu, D. 2008. Linking adverbials: An across-register corpus study and its implications. International Journal of Corpus Linguistics 13 (4): 491–518. https://doi.org/10.1075/ijcl.13.4.05liu.
Martin, J.R. 1992. English text: System and structure. Amsterdam: John Benjamins.
Nakamura, S. 2007. Comparison of features of texts translated by professional and learner translators. In Proceedings of the 4th corpus linguistics conference. University of Birmingham.
Neumann, S. 2013. Contrastive register variation. A quantitative approach to the comparison of English and German. Berlin: De Gruyter Mouton.
Nikolaev, D., T. Karidi, N. Kenneth, V. Mitnik, L. Saeboe, and O. Abend. 2020. Morphosyntactic predictability of translationese. Linguistics Vanguard, 6 (1).
Nini, A. 2019. The multi-dimensional analysis tagger. In Multi-dimensional analysis: research methods and current issues, ed. T. Berber Sardinha, and M. Veirano Pinto, 67–94. London; New York: Bloomsbury Academic. https://doi.org/10.5040/9781350023857.0012.
Nisioi, S., and L.P. Dinu. 2013. A clustering approach for translationese identification. In Proceedings of the international conference recent advances in natural language processing (RANLP 2013), ed. R. Mitkov, G. Angelova, and K. Bontcheva, 532–538. INCOMA Ltd. http://www.aclweb.org/anthology/R13-1070.
Novikova, N.I. 2008. Connectives as cohesive devices in an asyndetic composite sentence [Konnektory kak svjazujushhie sredstva v bessojuznom slozhnom predlozhenii]. In Herald of the Voronezh state Architecture University, advanced linguistic and pedagogical research series [Ser.: Sovremennye lingvisticheskie i metodiko-didakticheskie issledovanija], 92–100.
Olohan, M. 2001. Spelling out the optionals in translation: A corpus study. UCREL Technical Papers 13: 423–432.
Popescu, M. 2011. Studying translationese at the character level. In Proceedings of the international conference recent advances in natural language processing (RANLP 2011), 634–639. http://aclweb.org/anthology/R11-1091.
Popovic, M. 2020. On the differences between human translations. In Proceedings of the 22nd annual conference of the European association for machine translation, ed. A. Martins, H. Moniz, S. Fumega, M. Martins, F. Batista, L. Coheur, C. Parra, … M. Forcada, 365–374. European Association for Machine Translation.
Prieels, L., I. Delaere, K. Plevoets, and G. De Sutter. 2015. A corpus-based multivariate analysis of linguistic norm-adherence in audiovisual and written translation. Across Languages and Cultures 16 (2): 209–231. https://doi.org/10.1556/084.2015.16.2.4.
Priyatkina, A.F., E.A. Starodumova, G.N. Sergeeva, et al. (eds.). 2001. A Russian dictionary of functional words [Slovar’ sluzhebnyh slov russkogo jazyka]. Vladivostok: Far-East State University Press.
Puurtinen, T. 2003. Genre-specific features of translationese? Linguistic differences between translated and non-translated Finnish children’s literature. Literary and Linguistic Computing 18 (4): 389–406. https://doi.org/10.1093/llc/18.4.389.
Rabadán, R., B. Labrador, and N. Ramón. 2009. Corpus-based contrastive analysis and translation universals: A tool for translation quality assessment. Babel 55 (4): 303–328. https://doi.org/10.1075/babel.55.4.01rab.
Rabinovich, E., and S. Wintner. 2013. Unsupervised identification of translationese. Transactions of the Association for Computational Linguistics 3: 419–432. https://doi.org/10.1162/tacl_a_00148.
Redelinghuys, K. 2016. Levelling-out and register variation in the translations of experienced and inexperienced translators: A corpus-based study. Stellenbosch Papers in Linguistics 45: 189–220. https://doi.org/10.5774/45-0-198.
Santini, M., A. Mehler, and S. Sharoff. 2010. Riding the rough waves of genre on the web concepts and research questions. In Genres on the web: Computational models and empirical studies, vol. 42, ed. A. Mehler, S. Sharoff, and M. Santini, 3–30. Springer Science & Business Media.
Santos, D. 1995. On grammatical translationese. In Proceedings of the 10th Nordic conference of computational linguistics (NODALIDA 1995), ed. K. Koskenniemi, 59–66. University of Helsinki.
Sharoff, S. 2018. Functional text dimensions for annotation of web corpora. Corpora 13 (1): 65–95. https://doi.org/10.3366/cor.2018.0136.
Shvedova, N. (ed.). 1980. Russian grammar. Moscow, Science [Nauka].
Sominsky, I., and S. Wintner. 2019. Automatic detection of translation direction. In Proceedings of the international conference on recent advances in natural language processing (RANLP 2019), ed. R. Mitkov and G. Angelova, 1131–1140. INCOMA Ltd. https://doi.org/10.26615/978-954-452-056-4_130.
Specia, L., G.H. Paetzold, and C. Scarton. 2015. Multi-level translation quality prediction with QUEST++. In Proceedings of ACL-IJCNLP 2015 system demonstrations, ed. H. Chen, and K. Markert, 115–120. Association for Computational Linguistics. https://doi.org/10.3115/v1/p15-4020.
Straka, M., and J. Straková. 2017. Tokenizing, POS tagging, lemmatizing and parsing UD 2.0 with UDPipe. In Proceedings of the CoNLL 2017 shared task: multilingual parsing from raw text to universal dependencies, ed. D. Zeman, J. Hajic, M. Popel, M. Potthast, M. Straka, F. Ginter, J. Nivre, and S. Petrov, 88–99. Association for Computational Linguistics. https://doi.org/10.18653/v1/K17-300.
Stymne, S. 2017. The effect of translationese on tuning for statistical machine translation. In Proceedings of the 21st Nordic conference of computational linguistics, ed. J. Tiedemann, 241–246. Linköping University Electronic Press.
Teich, E. 2003. Cross-linguistic variation in system and text. A methodology for the investigation of translations and comparable texts. (TTCP, 5). Berlin: De Gruyter Mouton.
Toury, G. 1995. Descriptive translation studies and beyond. Amsterdam: John Benjamins. https://doi.org/10.1075/btl.4.
Vela, M., and E. Lapshinova-Koltunski. 2015. Register-based machine translation evaluation with text classification techniques. In Proceedings of the 15th machine translation summit (Vol. 1: MT Researchers’ Track), ed. Y. Al-Onaizan, and W. Lewis, 215–228. Association for Machine Translation in the Americas.
Volansky, V., N. Ordan, and S. Wintner. 2015. On the features of translationese. Digital Scholarship in the Humanities 30 (1): 98–118. https://doi.org/10.1093/llc/fqt031.
Xiao, R., L. He, and Y. Ming. 2010. In pursuit of the third code: Using the ZJU corpus of translational Chinese in translation studies. In Using corpora in contrastive and translation studies, ed. R. Xiao, 182–214. New Castle: Cambridge Scholars Publishing.
Zanettin, F. 2013. Corpus methods for descriptive translation studies. Procedia-Social and Behavioral Sciences 95: 20–32. https://doi.org/10.1016/j.sbspro.2013.10.618.
Zhang, M., and A. Toral. 2019. The effect of translationese in machine translation test sets. In Proceedings of the fourth conference on machine translation (Volume 1: Research Papers), ed. O. Bojar, R. Chatterjee, Ch. Federmann, M. Fishel, Y. Graham, ... K. Verspoor, 73–81. Association for Computational Linguistics. https://doi.org/10.18653/v1/W19-5208.
Acknowledgements
The research presented in this paper has been partially carried out in the framework of the projects VIP (FFI2016-75831-P), TRIAGE (UMA18-FEDERJA-067) and MI4ALL (CEI-RIS3). The authors would like to thank two anonymous reviewers for their valuable comments.
Appendix
The UD-based and list-based features in alphabetical order.
Preliminary Notes
1. Normalisation measures
We use several norms to make features comparable across corpora of different sizes, depending on the nature of the feature. Most of the features, including all types of discourse markers, negative particles, passives, types of verb forms, relative clauses, correlative constructions, adverbial clauses introduced by pronominal adverbs, coordinating and subordinating conjunctions, simple sentences and the number of clauses per sentence, are normalised to the number of sentences (30 features). Such features as personal, possessive and other noun substitutes, nouns, adverbial quantifiers and determiners are normalised to the number of running words (6 features). Counts for syntactic relations are represented as probabilities, normalised to the number of sentences (7 features). Some features have their own normalisation basis: comparative and superlative degrees are normalised to the total number of adjectives and adverbs, and nouns in the functions of subject, object or indirect object are normalised to the total number of these roles in the text.
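The idea of feature-specific normalisation bases can be illustrated with a minimal Python sketch. It is not the extraction code used in the study: the function name `normalise`, the dictionary keys and the toy counts are assumptions made for this example only.

```python
from typing import Dict


def normalise(counts: Dict[str, int], basis: Dict[str, int],
              basis_of: Dict[str, str]) -> Dict[str, float]:
    """Divide each raw feature count by its own normalisation basis."""
    result = {}
    for feature, raw in counts.items():
        denom = basis[basis_of[feature]]
        # Guard against empty texts or texts without adjectives/adverbs.
        result[feature] = raw / denom if denom else 0.0
    return result


# Hypothetical raw counts for one text.
counts = {"neg": 12, "ppron": 90, "comp": 5}
# Hypothetical totals that serve as normalisation bases.
basis = {"sentences": 40, "words": 900, "adj_adv": 100}
# Which basis applies to which feature, following the description above.
basis_of = {"neg": "sentences", "ppron": "words", "comp": "adj_adv"}

values = normalise(counts, basis, basis_of)
```

Keeping the feature-to-basis mapping in a separate dictionary makes it easy to see, and to change, which norm applies to which feature.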
2. Groups of discourse markers
The classification of connectives (discourse markers) follows the descriptions in Halliday and Hasan (1976) and in Biber et al. (1999). Table A gives the number of items in each group and the most frequent examples. The lists were initially produced independently from grammar reference books, dictionaries of function words and relevant research papers (for English we used Biber et al. (1999), Fraser (2006), Liu (2008); for Russian, Novikova (2008), Priyatkina (2015), Russian Grammar (Shvedova 1980), to name just a few sources for each language). After the initial selection, the lists were verified for comparability. Following Fraser (2006), discourse markers are treated functionally and include items of various morphological and structural types (conjunctions, adverbs, particles, parenthetical phrases). Though most items on the lists are set phrases, we allowed for possible lexical and structural variability at extraction time. We also used orthography and punctuation to disambiguate our items. The output of the extraction procedure was manually checked to exclude greedy matching.
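A list-based extraction of this kind can be sketched with regular expressions. The snippet below is an illustration under stated assumptions, not the procedure used in the study: the three adversative patterns, the name `marker_rate` and the example sentences are invented for the example, and the real lists are considerably larger.

```python
import re
from typing import List

# Hypothetical mini-list of adversative markers (illustration only).
ADVERS_PATTERNS = [
    # 'however' counts only when set off by punctuation, which excludes
    # the degree use in 'however hard we tried'.
    r"\bhowever(?=\s*[,;])",
    # The optional article models structural variability of a set phrase.
    r"\bon (?:the )?other hand\b",
    r"\bnevertheless\b",
]
ADVERS_RE = re.compile("|".join(ADVERS_PATTERNS), flags=re.IGNORECASE)


def marker_rate(sentences: List[str], pattern: re.Pattern) -> float:
    """Cumulative frequency of list items normalised to the number of sentences."""
    hits = sum(len(pattern.findall(s)) for s in sentences)
    return hits / len(sentences) if sentences else 0.0


sentences = [
    "However, the results differ across registers.",
    "However hard we tried, the effect remained.",
    "On the other hand, fiction shows less interference.",
    "The data are comparable.",
]
rate = marker_rate(sentences, ADVERS_RE)
```

Here the first and third sentences each contribute one match, while the second is filtered out by the punctuation constraint. Manual checking of the output remains necessary, since patterns of this kind can still over-match.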
3. The alphabetical list of 45 morphosyntactic features
acl
finite and non-finite clausal modifier of noun (adjectival clause), including relative clauses as a subtype (used only in EN and RU); extraction is based on UD default annotation (e.g. the person showing (acl) her around; help people do something to overcome (acl) it; людeй, cлeдящиx (acl) зa пoлитикoй)
addit
additive connectives; cumulative frequency of the list items normalised to the number of sentences; see description in Table A
advers
adversative (contrastive) connectives; cumulative frequency of the list items normalised to the number of sentences; see description in Table A
attrib
adjectives and participles functioning as attributes; all words tagged as ADJ or VerbForm = Part with the amod dependency to their head (e.g. the rising sun; the coloured face; fried green tomatoes)
aux
auxiliary verbs; extraction is based on UD default annotation
aux:pass
auxiliary verbs in passive forms; extraction is based on UD default annotation
but
contrastive coordinating conjunction but (но), if not followed by also / и, также and not in absolute sentence-final position
caus
causative connectives; cumulative frequency of the list items normalised to the number of sentences; see description in Table A
ccomp
clausal complement as annotated in UD (e.g. help people to do (ccomp) smth; нe oжидaли, чтo пpидeт (ccomp))
cconj
coordinating conjunctions: lemmas in and, or, both, yet, either, &, nor, plus, neither, ether / и, a, или, ни, дa, пpичeм, либo, зaтo, инaчe, тoлькo, aн, и/или, иль tagged CCONJ. Lists are used to filter out noise.
comp
comparative degree of comparison for adjectives and adverbs; synthetic forms are extracted based on the tag Degree = Comp, while analytical forms are counted as adjectives and adverbs with a dependent more/бoлee (бoльший)
copula
copula verbs; lemmas of be, быть, этo that have a cop relation to their head, excluding constructions with there as head for English
correl
correlative constructions of all types, where a PRON/DET (those, such) is syntactically or semantically connected to subsequent CONJ. In English they make a subset of relative clauses; in Russian they can also be a subtype of a clausal complement (e.g. of those who voted for him, raising the living standards of those that are poor)
demdets
pronominal determiners; lemmas in the function det from the lists this, some, these, that, any, all, every, another, each, those, either, such / этoт, вecь, тoт, тaкoй, кaкoй, кaждый, любoй, нeкoтopый, кaкoй-тo, oдин, ceй, этo, вcякий, нeкий, кaкoй-либo, кaкoй-нибyдь, кoe-кaкoй
deverbals
deverbal nouns, names of processes, actions, states. The extraction for English accounts for affixation (with most productive -ment, -tion/ -ung, -tion) and conversion as types of derivation. In the first case the output is filtered with an empirically driven stop list. Converted nouns are counted from a list of true procedural nouns that were not fully substantivised. To produce this list we looked through the nominal occurrences of lemmas that also appear as verbs and filtered out items that prevail in their fully substantivised lexico-semantic variants in our data (such as design, set, measure, mark, press, stick, cross, trap, handle). For Russian we extracted nouns in -тие, -ение, -ание, -ство, -ция, -ота and employed a 150-item stop list to exclude fully substantivised words such as собрание, месторождение, министерство, телевидение, творчество, решение.
epist
epistemic stance discourse markers; cumulative frequency of the list items normalised to the number of sentences; see description in Table A
finites
verbs in finite form; extraction is based on UD default annotation VerbForm = Fin
indef
noun substitutes, i.e. pronouns par excellence, of indefinite, total and negative semantic subtypes; extraction is based on PRON tag with a filter list: anybody, anyone, anything, everybody, everyone, everything, nobody, none, nothing, somebody, someone, something, elsewhere, nowhere, everywhere, somewhere, anywhere / кoгдa, гдe, кyдa, oткyдa, oтчeгo, пoчeмy, зaчeм and words with -тo|-нибyдь|-либo, except starting with кaкoй; and items from ктo-ктo, кoгo-кoгo, кoмy-кoмy, кeм-кeм, кoм-кoм, чтo-чтo, чeгo-чeгo, чeмy-чeмy, чeм-чeм, кyдa-кyдa, гдe-гдe
infs
infinitives: for English, all cases of a verb form tagged VerbForm = Inf with a dependent particle to and cases of the true bare infinitive, excluding those after modal verbs, have to, going to and modal adjectival predicates, but including those after help, make, bid, let, see, hear, watch, dare, feel. For Russian, all occurrences of verb forms with the feature VerbForm = Inf, except after modal predicates and with the dependent быть, to exclude future forms (e.g. отношения будут ухудшаться).
interrog
interrogative sentences: all sentences ending in ?
lexdens
lexical density: ratio of PoS-disambiguated content word types (look_VERB vs look_NOUN) to all tokens
lexTTR
lexical type-to-token ratio: ratio of PoS-disambiguated content word types (look_VERB vs look_NOUN) to their tokens. Content words include lemmas in ADJ, ADV, VERB, NOUN part-of-speech categories.
mdd
mean dependency distance (MDD, aka comprehension difficulty) as ‘the distance between words and their parents, measured in terms of intervening words’ (Jing and Liu 2015: 162)
mhd
mean hierarchical distance (MHD, aka production (speaker’s) difficulty) as the average value of all path lengths travelling from the root to all nodes along the dependency edges (Jing and Liu 2015: 164)
mpred
modal predicates; for English all verbs tagged as MD in XPOS, except will/shall, constructions with have-to-Inf and all adjectival modal predicates (given a list of 17 predicatives such as impossible, likely, sure with a dependent AUX). For Russian: lemma мoчь, lemma cлeдoвaть with a dependent infinitive, three modal adverbs (мoжнo, нeльзя, нaдo) and 11 adjectives from the modal predicative list in the short form Variant = Short (e.g. дoлжeн, cпocoбный, вoзмoжный)
mquantif
adverbial quantifiers; listed lemmas tagged ADV. The support lists include 37 English items (e.g. barely, completely, intensely, almost), 80 Russian items (aбcoлютнo, пoлнocтью, cплoшь, нeoбыкнoвeннo, дocтaтoчнo, coвepшeннo, нeвынocимo, пpимepнo). For Russian we additionally provide for functionally similar non-adverbial quantifiers such as eлe, oчeнь, вшecтepo, нeвыpaзимo, излишнe, eлe-eлe, чyть-чyть, eдвa-eдвa, тoлькo, кaпeлькy, чyтoчкy, eдвa.
neg
negative particles or main sentence negation: counts of lemmas in no, not, neither /нeт, нe
nnargs
core verbal arguments represented by nouns or proper names; ratio of nouns and proper names in the functions of nsubj, obj, iobj to the count of these functions
nsubj:pass
subjects of verbs in the passive voice; extraction is based on UD default nsubj:pass annotation
numcls
number of clauses per sentence; number of relations from the list csubj, acl:relcl, advcl, acl, xcomp, parataxis annotated in one sentence
passives
passive constructions with expressed agentive role; all verbs tagged Voice = Pass and a dependent aux:pass (for English). For Russian we account for two morphological forms (вoйнa вeлacь, пoлитикa былa нaпpaвлeнa) and for semantic passive (cтaдиoн вoзвoдят нa нoвoм мecтe, вo Bлaдикaвкaзe eмy гoтoвят paдyшнyю вcтpeчy)
parataxis
asyndetically connected coordinated clauses (often direct speech or clauses joined by ‘:’ or ‘;’, as well as parenthetical clauses); extraction is based on UD default annotation
pasttense
verbs in the past tense: all occurrences of the feature Tense = Past
pied
correlative constructions with displaced (pied-piped) preposition (e.g. technology for which Sony could take credit; speech in which he made this argument; o тaкoм, o кaкoм вы нe cлыxaли; cкaндaл, в кoтopoм; тpaгeдии, c кoтopыми, в тoй кoнcтpyкции, в кaкoй oнa)
possdet
possessive pronouns; for English lemma in my, your, his, her, its, our, their tagged DET, PRON and Poss = Yes. For Russian lemma in мoй, твoй, вaш, eгo, ee, eё, нaш, иx, иxний, cвoй tagged DET
ppron
personal pronouns; tokens tagged PRON, with any value of attribute Person = that do not have Poss = Yes feature and are on the list: i, you, he, she, it, we, they, me, him, her, us, them / я, ты, вы, oн, oнa, oнo, мы, oни, мeня, тeбя, eгo, eё, ee, нac, вac, иx, нeё, нee, нeгo, ниx, мнe, тeбe, eй, eмy, нaм, вaм, им, нeй, нeмy, ним, мeня, тeбя, нeгo, мнoй, мнoю, тoбoй, тoбoю, Baми, им, eй, eю, нaми, вaми, ими, ним, нeм, нём, нeй, нeю
pverbals
participles: for English all occurrences of VerbForm = Part or VerbForm = Ger not in attributive function amod or part of an analytical form. For Russian VerbForm = Part not in the short form and not in the attributive function, without a dependent auxiliary, and VerbForm = Conv without dependent auxiliary (e.g. after years of translating emails, webinars and other materials)
relativ
all relative clauses, including correlative constructions and pied-piping construction. Extraction is based on affirmative sentences only. For English: which, that, whose, whom, what, who tagged as PRON, excluding cases when relative PRON has a dependent preposition and follows its head (e.g. But we will return to that (PRON) later). For Russian: кoтopый, чтo, ктo, кaкoй and a comma in the left window of 3
sconj
subordinating conjunctions: lemma in that, if, as, of, while, because, by, for, to, than, whether, in, about, before, after, on, with, from, like, although, though, since, once, so, at, without, until, into, despite, unless, whereas, over, upon, whilst, beyond, towards, toward, but, except, cause, together / что, как, если, чтобы, то, когда, чем, хотя, поскольку, пока, тем, ведь, нежели, ибо, пусть, будто, словно, дабы, раз, насколько, тот, коли, коль, хоть, разве, сколь, ежели, покуда, постольку tagged SCONJ. Lists are used to filter out noise.
sentlength
number of words per sentence averaged over all sentences in the text. The extraction accounts for typical sentence tokenisation errors such as sentences ending in :, ;, Mr., Dr.
simple
simple sentence; a sentence where no words have relations: csubj, acl:relcl, advcl, acl, xcomp, parataxis
sup
superlative degree of comparison for adjectives and adverbs; synthetic forms are extracted based on the tag Degree = Sup, while analytical forms are counted as adjectives and adverbs with a dependent most/наиболее/самый and, for Russian, words starting with наи- with the exception of a few homonymous adverbs (наискосок)
tempseq
temporal and sequential connectives; cumulative frequency of the list items normalised to the number of sentences; see description in Table A
whconj
adverbial clause introduced by a pronominal ADV when, where, why / кoгдa, гдe, кyдa, oткyдa, oтчeгo, пoчeмy, зaчeм
xcomp
a predicative or clausal complement without its own subject, annotated after phrasal verbs (e.g. started to sing), in case of infinitive constructions (e.g. asked me to leave), etc.; extraction is based on UD default annotation
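Several of the text-level and tree-based features listed above (lexdens, lexTTR, mdd, mhd, numcls, simple) can be computed directly from a UD parse. The sketch below is a minimal illustration, not the authors' extraction code. It assumes that each sentence is a list of hypothetical `Token` records, that dependency distance is the absolute difference between the positions of a word and its head, and that the root is excluded from both averages; the treatment of punctuation is left out.

```python
from typing import Dict, List, NamedTuple


class Token(NamedTuple):
    idx: int     # 1-based position in the sentence
    lemma: str
    upos: str    # UD part-of-speech tag
    head: int    # position of the head; 0 marks the root
    deprel: str  # UD dependency relation


CONTENT_POS = {"ADJ", "ADV", "VERB", "NOUN"}
CLAUSE_RELS = {"csubj", "acl:relcl", "advcl", "acl", "xcomp", "parataxis"}


def lexical_ratios(sents: List[List[Token]]) -> Dict[str, float]:
    """lexdens: content types / all tokens; lexTTR: content types / content tokens."""
    tokens = [t for sent in sents for t in sent]
    # PoS-disambiguated items such as look_VERB vs look_NOUN
    content = [f"{t.lemma}_{t.upos}" for t in tokens if t.upos in CONTENT_POS]
    types = set(content)
    return {
        "lexdens": len(types) / len(tokens) if tokens else 0.0,
        "lexTTR": len(types) / len(content) if content else 0.0,
    }


def mdd(sent: List[Token]) -> float:
    """Mean distance between each non-root word and its head."""
    dists = [abs(t.idx - t.head) for t in sent if t.head != 0]
    return sum(dists) / len(dists) if dists else 0.0


def mhd(sent: List[Token]) -> float:
    """Mean path length from the root to each non-root node."""
    head_of = {t.idx: t.head for t in sent}

    def depth(i: int) -> int:
        steps = 0
        while head_of[i] != 0:  # climb until the root is reached
            i = head_of[i]
            steps += 1
        return steps

    depths = [depth(t.idx) for t in sent if t.head != 0]
    return sum(depths) / len(depths) if depths else 0.0


def numcls(sent: List[Token]) -> int:
    """Number of clause-introducing relations annotated in one sentence."""
    return sum(t.deprel in CLAUSE_RELS for t in sent)


def is_simple(sent: List[Token]) -> bool:
    """A simple sentence has none of the clause-introducing relations."""
    return numcls(sent) == 0


# 'She started to sing loudly', annotated by hand for the example.
sent = [
    Token(1, "she", "PRON", 2, "nsubj"),
    Token(2, "start", "VERB", 0, "root"),
    Token(3, "to", "PART", 4, "mark"),
    Token(4, "sing", "VERB", 2, "xcomp"),
    Token(5, "loudly", "ADV", 4, "advmod"),
]
ratios = lexical_ratios([sent])
```

For this sentence the sketch yields an MDD of 1.25, an MHD of 1.5 and one clause relation (xcomp), so the sentence does not count as simple. Text-level values would then be obtained by averaging or summing over all sentences and applying the norms described in the preliminary notes.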
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Kunilovskaya, M., Corpas Pastor, G. (2021). Translationese and Register Variation in English-To-Russian Professional Translation. In: Wang, V.X., Lim, L., Li, D. (eds) New Perspectives on Corpus Translation Studies. New Frontiers in Translation Studies. Springer, Singapore. https://doi.org/10.1007/978-981-16-4918-9_6
DOI: https://doi.org/10.1007/978-981-16-4918-9_6
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-4917-2
Online ISBN: 978-981-16-4918-9
eBook Packages: Education, Education (R0)