A Multi-view Approach for Term Translation Spotting

  • Raphaël Rubino
  • Georges Linarès
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6609)

Abstract

This paper presents a multi-view approach to term translation spotting, based on a bilingual lexicon and comparable corpora. We study three levels of representation for a term: its context, its theme and its orthography. These three views are evaluated individually and in combination to rank translation candidates. We focus on French-English medical terms. Experiments show a significant improvement over the classical context-based approach, with an F-score of 40.3% for the first-ranked translation candidates.
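
The abstract does not give the scoring functions or the combination scheme, so the following is only a minimal sketch of how three per-view scores could be merged into one ranking. It assumes a weighted linear interpolation, an edit-distance-based orthographic similarity, and precomputed context and theme scores in [0, 1]; the function names, weights and example scores are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def orthographic_similarity(source: str, candidate: str) -> float:
    """Map edit distance to [0, 1]; 1.0 means the strings are identical."""
    longest = max(len(source), len(candidate))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(source, candidate) / longest


def rank_candidates(
    source: str,
    context_scores: Dict[str, float],
    theme_scores: Dict[str, float],
    weights: Tuple[float, float, float] = (1 / 3, 1 / 3, 1 / 3),
) -> List[Tuple[str, float]]:
    """Rank candidates by an assumed weighted sum of the three view scores."""
    w_ctx, w_thm, w_ort = weights
    combined = {
        cand: w_ctx * ctx
        + w_thm * theme_scores.get(cand, 0.0)
        + w_ort * orthographic_similarity(source, cand)
        for cand, ctx in context_scores.items()
    }
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical scores for the French term "hépatite".
context = {"hepatitis": 0.6, "liver": 0.5, "heparin": 0.2}
theme = {"hepatitis": 0.7, "liver": 0.6, "heparin": 0.3}
ranking = rank_candidates("hépatite", context, theme)
```

In this sketch the first element of `ranking` is the first-ranked translation candidate, which is the position the reported F-score refers to. Equal weights are an arbitrary choice made for illustration; in practice they would have to be tuned on held-out data.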

Keywords

Multilingualism, Comparable Corpora, Topic Model

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Raphaël Rubino (1)
  • Georges Linarès (1)
  1. Laboratoire Informatique d’Avignon, Avignon, France
