The undergraduate learner translator corpus: a new resource for translation studies and computational linguistics

Alfuraih, Reem F.

doi:10.1007/s10579-019-09472-6

Project Notes
Published: 24 July 2019

Language Resources and Evaluation, volume 54, pages 801–830 (2020)

Reem F. Alfuraih ORCID: orcid.org/0000-0003-3199-1989¹

Abstract

Interest in learner translator corpora, which are invaluable resources for teaching and research, is growing around the world. This paper introduces a new resource to support researchers in interdisciplinary areas such as computational linguistics, descriptive translation studies, computer-aided translation technology, Arabic machine translation applications, cognitive science, and translation pedagogy. Motivated by the lack of learner translator resources that provide data about learners translating from and into Arabic, the undergraduate learner translator corpus (ULTC) is an ongoing, error-tagged, sentence-aligned parallel corpus of English, Arabic, and French, with Arabic as its main language. The present corpus, which consists of parallel texts produced by female learners translating from English or French into Arabic, is the first of its kind in terms of the languages represented, the tasks covered, and the number of students involved. It is also unique in combining many complementary corpora of cross-lingual data, each of which has its own web-based query interface and corpus analysis tools. This paper describes the ULTC compilation process, preliminary findings, and planned future expansion and research.

Acknowledgements

The author would like to thank the anonymous reviewers for the detailed and constructive review that helped to clarify many points and improve the structure of the manuscript. The author is greatly indebted to PNU instructors, course coordinators, and learners for their contributions.

Author information

Authors and Affiliations

College of Languages, Princess Nourah bint Abdulrahman University, PO Box 7455, Riyadh, 14215, Saudi Arabia
Reem F. Alfuraih

Corresponding author

Correspondence to Reem F. Alfuraih.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Alfuraih, R.F. The undergraduate learner translator corpus: a new resource for translation studies and computational linguistics. Lang Resources & Evaluation 54, 801–830 (2020). https://doi.org/10.1007/s10579-019-09472-6

Published: 24 July 2019
Issue Date: September 2020
DOI: https://doi.org/10.1007/s10579-019-09472-6
