The UPF learner translation corpus as a resource for translator training

  • Original Paper
  • Language Resources and Evaluation

Abstract

The learner translation corpus developed at the School of Translation and Interpreting of Pompeu Fabra University in Barcelona is a web-searchable resource created for pedagogical and research purposes. It comprises a multiple translation corpus (English–Catalan) featuring automatic linguistic annotation and manual error annotation, complemented with an interface for monolingual or bilingual querying of the data. The corpus can be used to identify common errors in the students’ work and to analyse their patterns of language use. It provides easy access to error samples and to multiple versions of the same source text sequence to be used as learning materials in various courses in the translator-training university curriculum.

Notes

  1. An exception to this approach is the PELCRA Learner Translation Corpus, developed at the English Department of the University of Łódź (Poland) by collecting translations (as opposed to original production) from Polish (native language) into English (Uzar 2002: 249, cited in Castagnoli 2009: 39) for the purposes of research and teaching of English as a foreign language. Its error annotation typology comes from the EFL field rather than from translation pedagogy.

  2. The project was funded by the Catalan government through the Agència per a la Gestió d’Ajuts Universitaris i de Recerca (AGAUR), code number 2008MQD00006. A. Espunya belongs to the Grup CEDIT (Centre d’Estudis de Discurs i Traducció), funded by the Catalan government (code 2009SGR 771).

  3. http://mellange.eila.jussieu.fr and http://corpus.leeds.ac.uk/mellange/ltc.html.

  4. MISTiC is not annotated for errors, as it was compiled for a “traditional” corpus research study on explicitation. Castagnoli (2009: 84) reports a size of 59 source texts (30 English, 29 French) aligned in a one-to-many relation (with a few exceptions) to 482 Italian student translations. Multi-parallel and longitudinal analyses are possible, as there are several translations for each ST and each student contributed more than one translation.

  5. Markin is a tool developed by Martin Holmes. Copyright by Martin Holmes and Creative Technology (Microdesign) Ltd. URL address http://www.cict.co.uk/markin/index.php.

  6. We are indebted to Judith Domingo and Martí Quixal for their assistance with the implementation of this recycling scheme.

  7. TreeTagger is a tool for annotating text with part-of-speech and lemma information developed by Helmut Schmid in the TC project at the Institute for Computational Linguistics of the University of Stuttgart. URL address http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/.

  8. Defined by Vázquez-Ayora (1977: 102) as the usage of a given element or structure with a higher or lower frequency than is normal for original texts.

  9. Interested readers may contact the author to obtain permission for use.

  10. The PACTE research group (Process of Acquisition of the Translation and Evaluation Competencies) defines “rich points” as “specific source-text segments that contained translation problems” (2009: 212). Rich points are key research constructs intended to isolate the hidden stimulus in an experiment.

  11. As in all quantitative corpus analysis, the statistical significance of results depends on corpus size. For a multiple translation corpus, size means not only raw word count but also the number of versions for each source text.

  12. The raw numbers of tags for the top ten categories are the following: Lexical Imprecision (n = 671), Wrong Sense (n = 600), Catalan Grammar (n = 508), Inexact Sense (n = 404), Punctuation (n = 380), Literal Translation (n = 325), Hispanism (n = 222), Spelling (n = 188), Syntax (n = 187).
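
The raw counts in note 12 can be turned into relative shares with a few lines of Python. This is only an illustrative sketch: the figures are copied from the note, the function name `relative_shares` is hypothetical, and the percentages are shares of the categories listed in the note, not of all error tags in the corpus, since the overall total is not given here.

```python
# Raw tag counts copied from note 12 (only the categories listed there).
tag_counts = {
    "Lexical Imprecision": 671,
    "Wrong Sense": 600,
    "Catalan Grammar": 508,
    "Inexact Sense": 404,
    "Punctuation": 380,
    "Literal Translation": 325,
    "Hispanism": 222,
    "Spelling": 188,
    "Syntax": 187,
}


def relative_shares(counts):
    """Return each category's share of the summed counts, as a fraction.

    Assumption: shares are computed over the listed categories only.
    """
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


shares = relative_shares(tag_counts)

# Print categories from most to least frequent with count and percentage.
for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<20} {tag_counts[name]:>4}  {share:6.1%}")
```

Normalising in this way makes the categories comparable across sub-corpora of different sizes, which is relevant to the point about corpus size made in note 11.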

References

  • Alsina, A., Badia, T., Boleda, G., Bott, S., Gil, Á., Quixal, M., et al. (2002). CATCG: un sistema de análisis morfosintáctico para el catalán. Procesamiento del Lenguaje Natural, 29, 309–310.

  • Bowker, L., & Bennison, P. (2002). Translation tracking system: A tool for managing translation archives. In Proceedings of LREC 2002 (pp. 503–507). http://gandalf.aksis.uib.no/lrec2002/pdf/115.pdf. Accessed 12 March 2011.

  • Bowker, L., & Bennison, P. (2003). Student translation archive. Design, development and application. In F. Zanettin, S. Bernardini, & D. Stewart (Eds.), Corpora in translator education (pp. 104–117). Manchester: St Jerome.

  • Castagnoli, S. (2009). Regularities and variations in learner translations: A corpus-based study of conjunctive explicitation. PhD Dissertation, University of Pisa.

  • Dagneaux, E., Granger, S., & Meunier, F. (Eds.). (2002). International corpus of learner English. UCL, Presses Universitaires de Louvain.

  • Delisle, J. (1993). La traduction raisonnée: Livre du maître. Ottawa: Presses de l’Université d’Ottawa.

  • Domingo, J., Badia, T., & Colominas, C. (2010). IAC: A dynamic corpus interface. In R. Xiao (Ed.), Proceedings of the international symposium on using corpora in contrastive and translation studies 2010 conference (UCCTS2010). Edge Hill University, Ormskirk, 27–29 July 2010. http://www.edgehill.ac.uk/uccts2010proceedings. Accessed 12 March 2011.

  • Espunya, A. (2013). Investigating lexical difficulties of learners in the error-annotated UPF learner translation corpus. In S. Granger, G. Gilquin, & F. Meunier (Eds.), Twenty years of learner corpus research: Looking back, moving ahead. Corpora and language in use: Proceedings, 1 (pp. 129–137). Louvain-la-Neuve: Presses universitaires de Louvain.

  • Florén Serrano, C., & Lorés Sanz, R. (2008). The application of a parallel corpus (English–Spanish) to the teaching of translation (ENTRAD Project). In M. Muñoz-Calvo, C. Buesa-Gómez, & M. A. Ruiz-Moneva (Eds.), New trends in translation and cultural identity (pp. 433–443). Newcastle upon Tyne: Cambridge Scholars Publishing.

  • Granger, S. (1993). International corpus of learner English. In J. Aarts, P. de Haan, & N. Oostdijk (Eds.), English language corpora: Design, analysis and exploitation (pp. 57–71). Amsterdam: Rodopi.

  • Granger, S., Dagneaux, E., Meunier, F., & Paquot, M. (2009). International corpus of learner English, v2. Louvain-la-Neuve, Belgium: Presses universitaires de Louvain.

  • Mihalcea, R. (2003). Performance analysis of a part of speech tagging task. In A. Gelbukh (Ed.), Proceedings of the 4th international conference on intelligent text processing and computational linguistics (CICLING), 2003, Mexico City, Mexico. CICLING 2003, LNCS 2588 (pp. 158–167). Berlin, Heidelberg: Springer.

  • PACTE. (2009). Results of the validation of the PACTE translation competence model: Acceptability and decision making. Across Languages and Cultures, 10(2), 207–230.

  • Sosnina, E. P. (2006). Development and application of Russian translation learner corpus. In Proceedings of Corpus Linguistics 2006, St. Petersburg, Russia, 10–14 October 2006 (pp. 365–373).

  • Uzar, R. (2002). A corpus methodology for analysing translation. In S. E. O. Tagnin (Ed.), Cadernos de Tradução: Corpora e Tradução, 1(9), 235–263.

  • Vázquez-Ayora, G. (1977). Introducción a la Traductología. Washington: Georgetown University Press.

  • Vinay, J.-P. & Darbelnet, J. (1977). Stylistique comparée du français et de l’anglais: méthode de traduction. Paris: Didier (1990).

Source texts

  • Anglia Polytechnic: “Cambridge Campus Accommodation Guide”.

  • Bogosian, Eric: “Our Gang”.

  • Heaney, Seamus: “Crediting Poetry: The Nobel Lecture”.

  • Institutional author: “Internet Safety: safe surfing tips for teens” (http://kidshealth.org/teen/safety/safebasics/internet_safety.html).

  • Publishing house: “Le Méridien Boston”, Frommer’s 98 New England.

  • Kelman, James: How Late It Was, How Late.

  • Mishra, Pankaj: The Romantics.

  • Rowling, J.K.: Harry Potter and the Philosopher’s Stone.

  • Steig, W. and T. Elliot: Shrek. Screenplay.

  • The Economist: ‘Forests and How to Save Them’.

Acknowledgments

The people involved in the project are, in alphabetical order, J. Ainaud, A. Espunya (coordinator), J. M. Fontana, M. Forcadell, M. González and D. Pujol. Ainaud, Pujol and Forcadell collaborated on the definition of the taxonomy; Espunya and Pujol are responsible for the collection and error annotation of translations. Fontana, González and Espunya are responsible for the exploitation of the corpus in the English language courses. The alignment and linguistic annotation of the texts were provided by J. Foraster and P. Giménez, on external contracts. Lastly, I would like to thank our colleague C. Colominas at UPF for her advice on technical issues regarding the construction of a parallel corpus. I am also grateful to the anonymous reviewers for their insightful comments and suggestions, which have greatly improved this paper.

Author information

Corresponding author

Correspondence to Anna Espunya.

About this article

Cite this article

Espunya, A. The UPF learner translation corpus as a resource for translator training. Lang Resources & Evaluation 48, 33–43 (2014). https://doi.org/10.1007/s10579-013-9260-1

  • DOI: https://doi.org/10.1007/s10579-013-9260-1
