French-English Terminology Extraction from Comparable Corpora

  • Béatrice Daille
  • Emmanuel Morin
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3651)


This article presents a method for extracting bilingual lexica composed of single-word terms (SWTs) and multi-word terms (MWTs) from comparable corpora of a technical domain. The method first extracts MWTs in each language, then uses statistical methods to align single words and MWTs by exploiting the contexts in which the terms occur. After explaining the difficulties involved in aligning MWTs and specifying our approach, we describe the process adopted for bilingual terminology extraction and the resources used in our experiments. Finally, we evaluate our approach and demonstrate its significance, particularly for the alignment of non-compositional MWTs.
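The abstract does not spell out the alignment statistics, so the following is only a minimal sketch of one common way to align terms through their contexts in comparable corpora: build a co-occurrence vector for each term, map the source vector into the target language with a seed dictionary, and rank target candidates by cosine similarity. The function names, the window size, the toy sentences, the seed dictionary, and the convention that an MWT has already been joined into a single token (for example `acide_gras`) are all assumptions made for illustration, not details taken from the article.

```python
from collections import Counter
from math import sqrt


def context_vector(term, sentences, window=2):
    """Count the words co-occurring with `term` within `window` tokens."""
    counts = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok == term:
                lo, hi = max(0, i - window), i + window + 1
                counts.update(
                    t for j, t in enumerate(tokens[lo:hi], lo) if j != i
                )
    return counts


def translate_vector(vec, seed_dict):
    """Map context words into the target language; unknown words are dropped."""
    out = Counter()
    for word, n in vec.items():
        for target in seed_dict.get(word, []):
            out[target] += n
    return out


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(src_term, src_sents, tgt_terms, tgt_sents, seed_dict, window=2):
    """Rank target terms by similarity of their contexts to the source term's."""
    src_vec = translate_vector(
        context_vector(src_term, src_sents, window), seed_dict
    )
    scored = [
        (cosine(src_vec, context_vector(t, tgt_sents, window)), t)
        for t in tgt_terms
    ]
    return sorted(scored, reverse=True)


# Toy comparable corpora (hypothetical data, not from the article).
fr_sents = [["le", "patient", "reçoit", "insuline", "pour", "diabète"]]
en_sents = [
    ["the", "patient", "receives", "insulin", "for", "diabetes"],
    ["the", "doctor", "reads", "a", "book", "today"],
]
seed = {
    "patient": ["patient"],
    "reçoit": ["receives"],
    "pour": ["for"],
    "diabète": ["diabetes"],
}

ranking = rank_candidates("insuline", fr_sents, ["insulin", "book"], en_sents, seed)
best_score, best_term = ranking[0]
print(best_term, round(best_score, 3))
```

Because the comparison relies only on surrounding words, a sketch like this does not require the translation of an MWT to be derivable from the translations of its parts, which is why context-based alignment is relevant to non-compositional MWTs.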


Keywords (added by machine, not by the authors): Machine Translation; Target Language; Statistical Machine Translation; Parallel Corpus; Similarity Vector





Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Béatrice Daille (1)
  • Emmanuel Morin (1)
  1. LINA – FRE CNRS 2729, University of Nantes, Nantes Cedex 3, France
