Language Resources and Evaluation, Volume 43, Issue 1, pp 27–40

A cost-effective lexical acquisition process for large-scale thesaurus translation

  • Jimmy Lin
  • G. Craig Murray
  • Bonnie J. Dorr
  • Jan Hajič
  • Pavel Pecina
Article

DOI: 10.1007/s10579-008-9074-8

Cite this article as:
Lin, J., Murray, G.C., Dorr, B.J. et al. Lang Resources & Evaluation (2009) 43: 27. doi:10.1007/s10579-008-9074-8

Abstract

Thesauri and controlled vocabularies facilitate access to digital collections by explicitly representing the underlying principles of organization. Translation of such resources into multiple languages is an important component for providing multilingual access. However, the specificity of vocabulary terms in most thesauri precludes fully-automatic translation using general-domain lexical resources. In this paper, we present an efficient process for leveraging human translations to construct domain-specific lexical resources. This process is illustrated on a thesaurus of 56,000 concepts used to catalog a large archive of oral histories. We elicited human translations on a small subset of concepts, induced a probabilistic phrase dictionary from these translations, and used the resulting resource to automatically translate the rest of the thesaurus. Two separate evaluations demonstrate the acceptability of the automatic translations and the cost-effectiveness of our approach.
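
The abstract does not spell out how the phrase dictionary is induced or applied, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes the human translations have already been aligned into (source phrase, target phrase) pairs, estimates P(target | source) by relative frequency, and translates an unseen term by greedy longest-match lookup with no reordering. The function names, the `max_len` parameter, and the toy data (placeholder target strings such as `FAM_A`) are all hypothetical.

```python
from collections import Counter, defaultdict


def induce_phrase_dictionary(aligned_pairs):
    """Estimate P(target | source) by relative frequency.

    aligned_pairs: iterable of (source_phrase, target_phrase) tuples,
    assumed to come from already-aligned human translations.
    """
    counts = defaultdict(Counter)
    for source, target in aligned_pairs:
        counts[source][target] += 1
    return {
        source: {t: c / sum(targets.values()) for t, c in targets.items()}
        for source, targets in counts.items()
    }


def translate_term(term, phrase_dict, max_len=3):
    """Translate a term by greedy longest-match phrase lookup.

    Simplifications (assumptions of this sketch): word order is kept
    as-is, and tokens with no dictionary entry are copied unchanged.
    """
    tokens = term.split()
    output = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in phrase_dict:
                options = phrase_dict[phrase]
                # Pick the most probable target phrase.
                output.append(max(options, key=options.get))
                i += n
                break
        else:
            # No phrase starting at this token is known.
            output.append(tokens[i])
            i += 1
    return " ".join(output)


# Toy data with placeholder target strings (not real translations).
toy_pairs = [
    ("family", "FAM_A"),
    ("family", "FAM_A"),
    ("family", "FAM_B"),
    ("family life", "FAMLIFE"),
    ("history", "HIST"),
]
toy_dict = induce_phrase_dictionary(toy_pairs)

print(translate_term("family history", toy_dict))
print(translate_term("family life stories", toy_dict))
```

In this toy setting, "family" maps to `FAM_A` with probability 2/3, and the longer entry "family life" takes precedence over "family" when both match. A realistic system would also need phrase alignment, reordering, and a fallback for terms the dictionary does not cover.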

Keywords

  • Thesauri
  • Controlled vocabularies
  • Manual translation process

Copyright information

© Springer Science+Business Media B.V. 2008

Authors and Affiliations

  • Jimmy Lin (1)
  • G. Craig Murray (1)
  • Bonnie J. Dorr (1)
  • Jan Hajič (2)
  • Pavel Pecina (2)

  1. University of Maryland, College Park, USA
  2. Charles University, Prague, Czech Republic