Abstract
Few approaches to extracting word translations from non-parallel texts have been proposed so far. Researchers have been discouraged from working on this topic because extracting information from non-parallel corpora is a difficult task that has produced poor results. Whereas word translation extraction from parallel texts can reach about 99% accuracy, the accuracy for non-parallel texts has been around 72% up to now. The approach presented here, which relies on first extracting bilingual pairs of lexico-syntactic templates from parallel corpora, improves significantly on that figure: about 89% of word translations are identified correctly.
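The abstract does not spell out how the bilingual template pairs are used, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that each aligned template pair acts as a shared dimension, that a word is represented by how often it fills each template in its own (non-parallel) corpus, and that candidate translations are ranked by cosine similarity. The seed pairs, the `<N>` slot notation, the function names, and the choice of cosine are all hypothetical.

```python
from collections import Counter
from math import sqrt

# Hypothetical seed lexicon: aligned (source template, target template) pairs,
# assumed to have been extracted beforehand from a parallel corpus.
SEED_PAIRS = [
    ("drink <N>", "beber <N>"),
    ("glass of <N>", "vaso de <N>"),
    ("drive <N>", "conducir <N>"),
]


def context_vector(observations, side):
    """Count how often a word fills each seed template.

    observations: templates the word was seen in (one entry per occurrence).
    side: 0 for source-language templates, 1 for target-language templates.
    Returns a Counter keyed by seed-pair index, so that vectors built from
    either language share the same dimensions.
    """
    index = {pair[side]: i for i, pair in enumerate(SEED_PAIRS)}
    return Counter(index[t] for t in observations if t in index)


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def rank_translations(source_obs, target_words):
    """Rank target-language words by similarity to a source-language word.

    source_obs: templates observed for the source word.
    target_words: dict mapping each candidate word to its observed templates.
    """
    src = context_vector(source_obs, 0)
    scored = [
        (cosine(src, context_vector(obs, 1)), word)
        for word, obs in target_words.items()
    ]
    return [word for _, word in sorted(scored, reverse=True)]


# Toy data, invented for illustration only.
wine_obs = ["drink <N>", "glass of <N>", "glass of <N>"]
candidates = {
    "vino": ["beber <N>", "vaso de <N>", "vaso de <N>"],
    "coche": ["conducir <N>"],
}
ranking = rank_translations(wine_obs, candidates)
```

On this toy data the candidate that shares template contexts with the source word is ranked first. A real system would need far more seed pairs and a weighting scheme, neither of which the abstract specifies.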
Keywords
- Test Word
- Computational Linguistics
- Parallel Corpus
- Correct Translation
- Bilingual Dictionary
This work has been supported by Ministerio de Educación y Ciencia of Spain, within the project GARI-COTERM, ref: HUM2004-05658-D02-02.
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Otero, P.G., Campos, J.R.P. (2005). An Approach to Acquire Word Translations from Non-parallel Texts. In: Bento, C., Cardoso, A., Dias, G. (eds) Progress in Artificial Intelligence. EPIA 2005. Lecture Notes in Computer Science, vol 3808. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11595014_59
DOI: https://doi.org/10.1007/11595014_59
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-30737-2
Online ISBN: 978-3-540-31646-6
eBook Packages: Computer Science, Computer Science (R0)
