Empirical Methods for MT Lexicon Development

  • I. Dan Melamed
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1529)


This article reviews some recently invented methods for automatically extracting translation lexicons from parallel texts. The accuracy of these methods has been significantly improved by exploiting known properties of parallel texts and of particular language pairs. The state of the art has advanced to the point where non-compositional compounds can be automatically identified with high reliability, and their translations can be found. Most importantly, all of these methods can be smoothly integrated into the usual workflow of MT system developers. Semi-automatic MT lexicon construction is likely to be more efficient and more accurate than either fully automatic or fully manual methods alone.


Similarity Score, Word Pair, Empirical Method, Indirect Association, Word Sequence
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 1998

Authors and Affiliations

  • I. Dan Melamed (1)
  1. Computer Science Research Department, West Group, D1-66F, Eagan
