Disambiguation Strategies for Cross-Language Information Retrieval

  • Djoerd Hiemstra
  • Franciska de Jong
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1696)

Abstract

This paper gives an overview of tools and methods for Cross-Language Information Retrieval (CLIR) that are developed within the Twenty-One project. The tools and methods are evaluated with the TREC CLIR task document collection using Dutch queries on the English document base. The main issue addressed here is an evaluation of two approaches to disambiguation. The underlying question is whether a lot of effort should be put in finding the correct translation for each query term before searching, or whether searching with more than one possible translation leads to better results? The experimental study suggests that the quality of search methods is more important than the quality of dis-ambiguation methods. Good retrieval methods are able to disambiguate translated queries implicitly during searching.

Keywords

Cross-Language Information Retrieval Statistical Machine Translation 

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. 1.
    L. Ballesteros and W.B. Croft. Resolving ambiguity for cross-language retrieval. In W.B. Croft, A. Moffat, C.J. van Rijsbergen, R. Wilkinson, and J. Zobel, editors, Proceedings of the 21st ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’98), pages 64–71, 1998.Google Scholar
  2. 2.
    R. Bod. Enriching Linguistics with Statistics: Performance Models for Natural Language. Academische Pers, 1995.Google Scholar
  3. 3.
    M. Braschler, J. Krause, C. Peters and P. Schäuble. Cross-language information retrieval (clir) track overview. In Procedings of the seventh Text Retrieval Conference (TREC-7), 1999.Google Scholar
  4. 4.
    D. Harman. How effective is sufixing? Journal of the American Society for Information Science, 42(1):7–15, 1991.CrossRefGoogle Scholar
  5. 5.
    D. Hiemstra. A linguistically motivated probabilistic model of information retrieval. In C. Nicolaou and C. Stephanidis, editors, Proceedings of the Second European Conference on Research and Advanced Technology for Digital Libraries (ECDL-2), pages 569–584, 1998.Google Scholar
  6. 6.
    D. Hiemstra. Multilingual domain modeling in Twenty-One: automatic creation of a bi-directional translation lexicon from a parallel corpus. In P.A. Coppen, H. van Halteren, and L. Teunissen, editors, Proceedings of eightth CLIN meeting, pages 41–58, 1998.Google Scholar
  7. 7.
    D. Hiemstra. A probabilistic justi_cation for using tf.idf term weighting in information retrieval. International Journal on Digital Libraries, to appear.Google Scholar
  8. 8.
    D. Hiemstra and W. Kraaij. Twenty-One at TREC-7: Ad-hoc and cross-language track. In Proceedings of the seventh Text Retrieval Conference (TREC-7). NIST Special Publications, 1999.Google Scholar
  9. 9.
    D.A. Hull. Using structured queries for disambiguation in cross-language information retrieval. In AAAI Symposium on Cross-Language Text and Speech Retrieval. American Association for Artificial Intelligence, 1997.Google Scholar
  10. 10.
    D.A. Hull and G. Grefenstette. A dictionary-based approach to multilingual information retrieval. In Proceedings of the 19th ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’96), 1996.Google Scholar
  11. 11.
    W. Kraaij. Multilingual functionality in the Twenty-One project. In AAAI Symposium on Cross-Language Text and Speech Retrieval. American Association for Artificial Intelligence, 1997.Google Scholar
  12. 12.
    W. Kraaij and D. Hiemstra. Cross-language retrieval with the Twenty-One system. In E. Voorhees and D. Harman, editors, Proceedings of the 6th Text Retrieval Conference TREC-6, pages 753–761. NIST Special Publication 500-240, 1998.Google Scholar
  13. 13.
    D.R.H. Miller, T. Leek and R.M. Schwartz. BBN at TREC-7: using hidden markov models for information retrieval. In Proceedings of the seventh Text Retrieval Conference, TREC-7. NIST Special Publications, 1999.Google Scholar
  14. 14.
    A.M. Mood and F.A. Graybill, editors. Introduction to the Theory of Statistics, Second edition. McGraw-Hill, 1963.Google Scholar
  15. 15.
    D.W. Oard. A comparative study of query and document translation for cross-language information retrieval. In Proceedings of the Third Conference of the Association for Machine Translation in the Americas (AMTA), 1998.Google Scholar
  16. 16.
    D.W. Oard and B.J. Dorr. A survey of multilingual text retrieval. Technical report, University of Maryland, 1996. http://www.ee.umd.edu/medlab/mlir/mlir.html
  17. 17.
    J.M. Ponte and W.B. Croft. A language modeling approach to information retrieval. In W.B. Croft, A. Moffat, C.J. van Rijsbergen, R. Wilkinson, and J. Zobel, editors, Proceedings of the 21st ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’98), 1998.Google Scholar
  18. 18.
    S.E. Robertson and K. Sparck-Jones. Relevance weighting of search terms. Journal of the American Society for Information Science, 27:129–146, 1976.CrossRefGoogle Scholar
  19. 19.
    G. Salton and C. Buckley. Term-weighting approaches in automatic text retrieval. Information Processing & Management, 24(5):513–523, 1988.CrossRefGoogle Scholar
  20. 20.
    G. Salton and M.J. McGill, editors. Introduction to Modern Information Retrieval. McGraw-Hill, 1983.Google Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 1999

Authors and Affiliations

  • Djoerd Hiemstra
    • 1
  • Franciska de Jong
    • 1
  1. 1.Centre for Telematics and Information TechnologyUniversity of TwenteEnschedeThe Netherlands

Personalised recommendations