Investigating Genre and Method Variation in Translation Using Text Classification

  • Marcos Zampieri
  • Ekaterina Lapshinova-Koltunski
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9302)


In this paper, we propose the use of automatic text classification methods to analyse variation in English-German translations from both a quantitative and a qualitative perspective. The experiments are carried out in two steps: we train classifiers to 1) discriminate between different genres (fiction, political essays, etc.) and 2) identify the translation method (machine vs. human). Using semi-delexicalized models (excluding all nouns), we report results of up to 60.5% F-measure in distinguishing human from machine translations and 45.4% in discriminating between seven different genres. Beyond the classification performance itself, we argue that text classification methods can bring out the discriminative features of different variables (genres and translation methods), thus enabling researchers to investigate the properties of each in more detail.
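The semi-delexicalization step and the F-measure mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract only states that nouns are excluded, so replacing them with a placeholder token, the tag names in `NOUN_TAGS`, and the function names `delexicalize` and `f_measure` are all assumptions made for illustration.

```python
from typing import Iterable, List, Sequence, Tuple

# Assumed noun tags (STTS-style "NN"/"NE" and Universal POS "NOUN"/"PROPN");
# the tag set actually used in the paper is not given in the abstract.
NOUN_TAGS = frozenset({"NN", "NE", "NOUN", "PROPN"})


def delexicalize(
    tagged_tokens: Iterable[Tuple[str, str]],
    noun_tags: frozenset = NOUN_TAGS,
    placeholder: str = "N",
) -> List[str]:
    """Replace every noun with a placeholder, keeping all other tokens.

    Substituting a placeholder (rather than deleting the noun) is an
    assumption; it hides topical vocabulary while preserving word order.
    """
    return [
        placeholder if tag in noun_tags else token
        for token, tag in tagged_tokens
    ]


def f_measure(gold: Sequence[str], predicted: Sequence[str], label: str) -> float:
    """Balanced F-measure (harmonic mean of precision and recall) for one class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    sentence = [("Der", "ART"), ("Hund", "NN"), ("schläft", "VVFIN")]
    print(delexicalize(sentence))

    gold = ["human", "machine", "human", "machine"]
    pred = ["human", "human", "human", "machine"]
    print(f_measure(gold, pred, "human"))
```

The delexicalized token sequences would then serve as input to whatever classifier is trained; masking nouns is meant to keep the model from relying on topic-specific vocabulary, so that the remaining features reflect genre or translation method rather than subject matter.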


Keywords: Human and machine translation; Text classification; Genres





Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Marcos Zampieri (1, 2)
  • Ekaterina Lapshinova-Koltunski (1)
  1. Saarland University, Saarbrücken, Germany
  2. German Research Center for Artificial Intelligence (DFKI), Saarbrücken, Germany
