Abstract
In this paper we present a tool that uses comparable corpora to find appropriate translation equivalents for expressions that translators consider difficult. For a phrase in the source language, the tool identifies a range of possible expressions used in similar contexts in target-language corpora and presents them to the translator as a list of suggestions. We discuss the method and present the results of a human evaluation of the tool's performance, which highlight its usefulness when dictionary solutions are lacking.
Acknowledgements
This research is supported by EPSRC grant EP/C005902. We are grateful to the anonymous reviewers for their insightful comments and links to relevant research.
Cite this article
Sharoff, S., Babych, B. & Hartley, A. ‘Irrefragable answers’ using comparable corpora to retrieve translation equivalents. Lang Resources & Evaluation 43, 15–25 (2009). https://doi.org/10.1007/s10579-007-9046-4
DOI: https://doi.org/10.1007/s10579-007-9046-4