Chapter

Advances in Artificial Intelligence

Volume 1952 of the series Lecture Notes in Computer Science pp 339-349

Extracting Equivalents from Aligned Parallel Texts: Comparison of Measures of Similarity

  • António Ribeiro (Departamento de Informática, Quinta da Torre, Universidade Nova de Lisboa, Faculdade de Ciências e Tecnologia)
  • Gabriel Pereira Lopes (Departamento de Informática, Quinta da Torre, Universidade Nova de Lisboa, Faculdade de Ciências e Tecnologia)
  • João Mexia (Departamento de Matemática, Quinta da Torre, Universidade Nova de Lisboa, Faculdade de Ciências e Tecnologia)

Abstract

Extraction of term equivalents is one of the most important tasks for building bilingual dictionaries. Several measures have been proposed to extract translation equivalents from aligned parallel texts. In this paper, we compare 28 measures of similarity based on the co-occurrence of words in aligned parallel text segments. The parallel texts are aligned using a simple method that extends previous work by Pascale Fung & Kathleen McKeown and by Melamed but which, in contrast to that work, does not use statistically unsupported heuristics to filter reliable points.
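
To make the idea of a co-occurrence-based similarity measure concrete, the following is a minimal sketch in Python. It is not the authors' implementation. The abstract does not name the 28 measures compared, so the Dice coefficient is used here only as one widely known example of this family. The function names, the whitespace tokenization, and the toy English-Portuguese segments are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def cooccurrence_counts(segment_pairs):
    """Count segment-level occurrences and co-occurrences of words.

    segment_pairs is an iterable of (source_segment, target_segment)
    strings that are assumed to be already aligned.
    """
    src_freq, tgt_freq, pair_freq = Counter(), Counter(), Counter()
    for src_segment, tgt_segment in segment_pairs:
        # Naive tokenization (an assumption); each word counts once per segment.
        src_words = set(src_segment.lower().split())
        tgt_words = set(tgt_segment.lower().split())
        src_freq.update(src_words)
        tgt_freq.update(tgt_words)
        # Every source/target word pair in an aligned segment co-occurs once.
        pair_freq.update(product(src_words, tgt_words))
    return src_freq, tgt_freq, pair_freq


def dice(src_word, tgt_word, src_freq, tgt_freq, pair_freq):
    """Dice coefficient: 2 * co-occurrences / (source count + target count)."""
    both = pair_freq[(src_word, tgt_word)]
    total = src_freq[src_word] + tgt_freq[tgt_word]
    return 2 * both / total if total else 0.0


# Toy aligned segments, invented for illustration only.
segments = [
    ("the house is red", "a casa é vermelha"),
    ("the house is big", "a casa é grande"),
    ("the car is red", "o carro é vermelho"),
]
src_freq, tgt_freq, pair_freq = cooccurrence_counts(segments)

print(dice("house", "casa", src_freq, tgt_freq, pair_freq))
print(dice("house", "é", src_freq, tgt_freq, pair_freq))
```

In this toy data, "house" and "casa" occur in exactly the same two aligned segments and so receive the highest possible score, while "é" also appears in a segment without "house" and scores lower. The article "a" occurs in the same segments as "casa" and therefore ties with it. With so little data a single measure cannot separate the two, which is one reason measures of this kind need to be compared on realistic corpora.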