Language Resources and Evaluation, Volume 43, Issue 1, pp 27-40

A cost-effective lexical acquisition process for large-scale thesaurus translation

  • Jimmy Lin (University of Maryland)
  • G. Craig Murray (University of Maryland)
  • Bonnie J. Dorr (University of Maryland)
  • Jan Hajič (Charles University)
  • Pavel Pecina (Charles University)


Thesauri and controlled vocabularies facilitate access to digital collections by explicitly representing the underlying principles of organization. Translating such resources into multiple languages is an important component of providing multilingual access. However, the specificity of vocabulary terms in most thesauri precludes fully automatic translation using general-domain lexical resources. In this paper, we present an efficient process for leveraging human translations to construct domain-specific lexical resources. The process is illustrated on a thesaurus of 56,000 concepts used to catalog a large archive of oral histories. We elicited human translations for a small subset of concepts, induced a probabilistic phrase dictionary from these translations, and used the resulting resource to automatically translate the rest of the thesaurus. Two separate evaluations demonstrate the acceptability of the automatic translations and the cost-effectiveness of our approach.
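The abstract does not specify how the probabilistic phrase dictionary is induced or applied, so the following is only a hypothetical toy sketch of the general idea, not the authors' method. It assumes a simple relative-frequency estimate of P(target | source) over human-translated (source label, target label) pairs, and translates unseen labels by greedy longest-match over the dictionary, copying unknown words through unchanged. The function names and the English-Czech pairs in `TOY_PAIRS` are invented for illustration and do not come from the paper or the archive.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: human translations of a few concept labels.
# The same source label may receive different translations.
TOY_PAIRS = [
    ("children", "děti"),
    ("children", "děti"),
    ("children", "dítě"),
    ("forced labor", "nucená práce"),
    ("Poland", "Polsko"),
]


def induce_phrase_dictionary(pairs):
    """Estimate P(target | source) by relative frequency (toy assumption)."""
    counts = defaultdict(Counter)
    for src, tgt in pairs:
        counts[src.lower()][tgt] += 1  # source phrases are case-folded
    table = {}
    for src, tgt_counts in counts.items():
        total = sum(tgt_counts.values())
        table[src] = {tgt: n / total for tgt, n in tgt_counts.items()}
    return table


def translate(label, table):
    """Greedy longest-match: cover the label left to right with the longest
    known source phrase and emit its most probable translation."""
    words = label.lower().split()
    output = []
    i = 0
    while i < len(words):
        # Try the longest span starting at position i first.
        for j in range(len(words), i, -1):
            phrase = " ".join(words[i:j])
            if phrase in table:
                candidates = table[phrase]
                output.append(max(candidates, key=candidates.get))
                i = j
                break
        else:
            # No known phrase starts here: pass the word through untranslated.
            output.append(words[i])
            i += 1
    return " ".join(output)


if __name__ == "__main__":
    table = induce_phrase_dictionary(TOY_PAIRS)
    for label in ["children", "forced labor Poland", "hidden children"]:
        print(label, "->", translate(label, table))
```

On this toy data, "forced labor Poland" becomes "nucená práce Polsko" and "hidden children" becomes "hidden děti". Naive concatenation of this kind ignores target-language morphology and word order, which is one reason a real system for a morphologically rich language such as Czech would need more than this sketch.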


Keywords: Thesauri; Controlled vocabularies; Manual translation process