An Approach to Acquire Word Translations from Non-parallel Texts

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNAI, volume 3808)

Abstract

Few approaches for extracting word translations from non-parallel texts have been proposed so far. Researchers have been discouraged from working on this topic because extracting information from non-parallel corpora is a difficult task that tends to produce poor results. Whereas word translation extraction from parallel texts can reach an accuracy of about 99%, accuracy on non-parallel texts has so far been around 72%. The approach presented here relies on first extracting bilingual pairs of lexico-syntactic templates from parallel corpora. It yields a significant improvement, with about 89% of word translations identified correctly.
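The abstract names the key ingredient (bilingual pairs of lexico-syntactic templates learned from parallel corpora) but not the exact procedure. The sketch below is not the authors' algorithm. It is a minimal illustration of one common way such template pairs can be used on non-parallel text, and it rests on several assumptions. Each word is represented by counts of the templates it occurs in. A source-language vector is translated into target-language template space through a precomputed template dictionary, and unpaired templates are discarded. Target candidates are then ranked by cosine similarity. The template strings, the toy counts, and the names `TEMPLATE_PAIRS`, `project`, `cosine` and `rank_translations` are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical bilingual template pairs, assumed to have been extracted
# beforehand from a parallel English-Spanish corpus.
TEMPLATE_PAIRS = {
    "drive_[obj]": "conducir_[obj]",
    "engine_of_[N]": "motor_de_[N]",
    "read_[obj]": "leer_[obj]",
}


def project(source_vector, template_pairs):
    """Map a source-language template vector into target-language template
    space, dropping templates that have no known bilingual counterpart."""
    projected = Counter()
    for template, count in source_vector.items():
        if template in template_pairs:
            projected[template_pairs[template]] += count
    return projected


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_translations(source_vector, target_vectors, template_pairs, top_n=3):
    """Rank target-language words by similarity to a projected source word."""
    projected = project(source_vector, template_pairs)
    scored = [(word, cosine(projected, vec)) for word, vec in target_vectors.items()]
    # Highest similarity first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]


# Toy counts from two unrelated (non-parallel) corpora.
car = Counter({"drive_[obj]": 5, "engine_of_[N]": 2, "park_[obj]": 1})
spanish_vectors = {
    "coche": Counter({"conducir_[obj]": 4, "motor_de_[N]": 2, "aparcar_[obj]": 1}),
    "libro": Counter({"leer_[obj]": 3, "escribir_[obj]": 1}),
    "moto": Counter({"conducir_[obj]": 2, "casco_de_[N]": 3}),
}

ranking = rank_translations(car, spanish_vectors, TEMPLATE_PAIRS)
print(ranking)
```

In this toy setting, "coche" ranks first because it shares both paired templates with "car". Because the template pairs act as a bridge between the two vector spaces, the two corpora never need to be aligned sentence by sentence.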

Keywords

  • Test Word
  • Computational Linguistics
  • Parallel Corpus
  • Correct Translation
  • Bilingual Dictionary

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

This work has been supported by Ministerio de Educación y Ciencia of Spain, within the project GARI-COTERM, ref: HUM2004-05658-D02-02.

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Otero, P.G., Campos, J.R.P. (2005). An Approach to Acquire Word Translations from Non-parallel Texts. In: Bento, C., Cardoso, A., Dias, G. (eds) Progress in Artificial Intelligence. EPIA 2005. Lecture Notes in Computer Science, vol 3808. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11595014_59

  • DOI: https://doi.org/10.1007/11595014_59

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-30737-2

  • Online ISBN: 978-3-540-31646-6

  • eBook Packages: Computer Science, Computer Science (R0)
