Chapter

Research and Advanced Technology for Digital Libraries

Volume 1696 of the series Lecture Notes in Computer Science pp 274-293

Disambiguation Strategies for Cross-Language Information Retrieval

  • Djoerd Hiemstra, Centre for Telematics and Information Technology, University of Twente
  • Franciska de Jong, Centre for Telematics and Information Technology, University of Twente

Abstract

This paper gives an overview of tools and methods for Cross-Language Information Retrieval (CLIR) developed within the Twenty-One project. The tools and methods are evaluated on the TREC CLIR task document collection, using Dutch queries against the English document base. The main issue addressed here is an evaluation of two approaches to disambiguation. The underlying question is whether substantial effort should be invested in finding the correct translation for each query term before searching, or whether searching with more than one possible translation leads to better results. The experimental study suggests that the quality of search methods is more important than the quality of disambiguation methods. Good retrieval methods are able to disambiguate translated queries implicitly during searching.
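
The contrast between the two approaches can be illustrated with a minimal sketch. This is a toy example under stated assumptions, not the system or scoring model evaluated in the paper: the dictionary `TRANSLATIONS`, the documents in `DOCS`, and the functions `translate_one_best`, `translate_all`, and `score` are all hypothetical. The one-best strategy here simply takes the first dictionary entry, standing in for an explicit disambiguation step that may pick the wrong sense. The alternative keeps every translation and lets co-occurrence with the other query terms favour the intended sense.

```python
from collections import Counter

# Hypothetical toy dictionary: Dutch "bank" is ambiguous (couch / bank).
TRANSLATIONS = {
    "bank": ["couch", "bank"],
    "rente": ["interest"],
}

# Hypothetical English document base.
DOCS = {
    "d1": "the bank raised the interest rate",
    "d2": "a leather couch and a chair",
    "d3": "the couch sale was slow",
}


def translate_one_best(query):
    """Commit to a single translation per term before searching
    (here: the first dictionary entry, which may be the wrong sense)."""
    return [[TRANSLATIONS[term][0]] for term in query]


def translate_all(query):
    """Keep every possible translation of each term as a group of alternatives."""
    return [list(TRANSLATIONS[term]) for term in query]


def score(doc_text, translated_query):
    """Count how many source query terms are matched by at least one
    of their translation alternatives in the document."""
    counts = Counter(doc_text.split())
    return sum(
        1 for alternatives in translated_query
        if any(counts[alt] > 0 for alt in alternatives)
    )


def rank(translated_query):
    """Return document ids ordered by descending score, ties broken by id."""
    return sorted(DOCS, key=lambda d: (-score(DOCS[d], translated_query), d))


query = ["bank", "rente"]
one_best_scores = {d: score(t, translate_one_best(query)) for d, t in DOCS.items()}
all_scores = {d: score(t, translate_all(query)) for d, t in DOCS.items()}
```

In this sketch the one-best query cannot separate the financial document from the furniture documents, because the wrong sense was fixed before searching. With all translations retained, only `d1` matches both source terms, so it ranks first without any explicit disambiguation step.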

Keywords

Cross-Language Information Retrieval; Statistical Machine Translation