Multilingual Information Retrieval Based on Document Alignment Techniques

  • Martin Braschler
  • Peter Schäuble
Conference paper

DOI: 10.1007/3-540-49653-X_12

Part of the Lecture Notes in Computer Science book series (LNCS, volume 1513)
Cite this paper as:
Braschler M., Schäuble P. (1998) Multilingual Information Retrieval Based on Document Alignment Techniques. In: Research and Advanced Technology for Digital Libraries. ECDL 1998. Lecture Notes in Computer Science, vol 1513. Springer, Berlin, Heidelberg

Abstract

A multilingual information retrieval method is presented in which the user formulates the query in his/her preferred language to retrieve relevant information from a multilingual document collection. This multilingual retrieval method involves mono- and cross-language searches as well as the merging of their results. We adopt a corpus-based approach in which documents in different languages are associated if they cover a similar story. The resulting comparable corpus enables two novel techniques we have developed. First, it enables Cross-Language Information Retrieval (CLIR) that does not suffer from the lack of vocabulary coverage we observed in approaches based on automatic Machine Translation (MT). Second, the aligned documents of this corpus make it easier to merge the results of mono- and cross-language searches. On the TREC CLIR data, the method obtains excellent results. In addition, our evaluation of the document alignments gives us new insights into the usefulness of comparable corpora.
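
The abstract does not spell out how aligned documents support result merging. As a minimal sketch of one plausible reading, and not necessarily the authors' procedure, the Python below assumes that each search returns a score per document and that aligned document pairs appearing in both result lists can be used to fit a linear mapping from cross-language scores onto the mono-language score scale. The names `fit_line` and `merge_results`, the score dictionaries, and the least-squares calibration are all illustrative assumptions.

```python
from typing import Dict, List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y ~ a * x + b (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        raise ValueError("aligned cross-language scores are all identical")
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = cov / var_x
    return a, mean_y - a * mean_x


def merge_results(
    mono: Dict[str, float],
    cross: Dict[str, float],
    alignments: List[Tuple[str, str]],
) -> List[Tuple[str, float]]:
    """Merge two ranked lists onto the mono-language score scale.

    mono:       doc id -> score from the mono-language search
    cross:      doc id -> score from the cross-language search
    alignments: (mono doc id, cross doc id) pairs covering a similar story

    Assumption: aligned pairs retrieved by both searches should receive
    comparable scores, so they serve as calibration points.
    """
    pairs = [(cross[c], mono[m]) for m, c in alignments if m in mono and c in cross]
    if len(pairs) < 2:
        raise ValueError("need at least two aligned pairs present in both lists")
    a, b = fit_line([x for x, _ in pairs], [y for _, y in pairs])
    merged = list(mono.items()) + [(doc, a * s + b) for doc, s in cross.items()]
    # Highest calibrated score first.
    return sorted(merged, key=lambda item: item[1], reverse=True)


mono_scores = {"en1": 0.9, "en2": 0.5, "en3": 0.2}
cross_scores = {"de1": 40.0, "de2": 20.0, "de3": 30.0}
aligned = [("en1", "de1"), ("en2", "de2")]
ranking = merge_results(mono_scores, cross_scores, aligned)
```

In this toy input, the unaligned document `de3` is placed between `en1` and `en2` because its raw score lies halfway between those of the two aligned calibration documents. A real system would also need to decide whether both members of an aligned pair should appear in the merged list.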

Copyright information

© Springer-Verlag Berlin Heidelberg 1998

Authors and Affiliations

  • Martin Braschler (1)
  • Peter Schäuble (2)
  1. Eurospider Information Technology AG, Zürich, Switzerland
  2. Swiss Federal Institute of Technology (ETH), Zürich, Switzerland