
Disambiguation Strategies for Cross-Language Information Retrieval

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1696)

Abstract

This paper gives an overview of tools and methods for Cross-Language Information Retrieval (CLIR) developed within the Twenty-One project. The tools and methods are evaluated on the TREC CLIR task document collection, using Dutch queries against the English document base. The main issue addressed here is an evaluation of two approaches to disambiguation. The underlying question is whether much effort should be put into finding the correct translation of each query term before searching, or whether searching with more than one possible translation leads to better results. The experimental study suggests that the quality of search methods is more important than the quality of disambiguation methods: good retrieval methods are able to disambiguate translated queries implicitly during searching.
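To make the two strategies concrete, here is a minimal Python sketch. It assumes a hypothetical two-entry Dutch-to-English lexicon with made-up translation probabilities, two toy English documents, and a simple smoothed unigram language-model ranking (linear interpolation with an assumed weight `LAMBDA = 0.5`). It illustrates the general idea only and is not the model or data evaluated in the paper. The hypothetical `score_one_translation` resolves each query term to its single most probable translation before searching; the hypothetical `score_all_translations` keeps every candidate and mixes their probabilities inside each query term, so the rest of the query and the document decide which sense fits.

```python
from collections import Counter
from math import log

# Hypothetical Dutch->English lexicon with made-up translation probabilities.
# Dutch "bank" can mean a financial bank or a bench; "rente" means interest.
LEXICON = {
    "bank": {"bench": 0.6, "bank": 0.4},
    "rente": {"interest": 1.0},
}

# Two toy English documents (hypothetical data).
DOCS = {
    "d1": "the bank raised the interest rate".split(),
    "d2": "a wooden bench".split(),
}

LAMBDA = 0.5  # assumed weight of the document model vs. the collection model


def collection_model(docs):
    """Relative frequency of each term over the whole collection."""
    counts = Counter(t for tokens in docs.values() for t in tokens)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def term_prob(term, doc, coll):
    """Smoothed P(term | doc): interpolate document and collection estimates.

    Assumes every candidate translation occurs somewhere in the collection;
    otherwise the probability is zero and log() in the scorers would fail.
    """
    return LAMBDA * doc.count(term) / len(doc) + (1 - LAMBDA) * coll.get(term, 0.0)


def score_one_translation(query, doc, coll):
    """Strategy 1: disambiguate first, keeping only the most probable translation."""
    score = 0.0
    for src in query:
        candidates = LEXICON[src]
        best = max(candidates, key=candidates.get)
        score += log(term_prob(best, doc, coll))
    return score


def score_all_translations(query, doc, coll):
    """Strategy 2: keep all translations, weighted, inside each query term."""
    score = 0.0
    for src in query:
        p = sum(w * term_prob(t, doc, coll) for t, w in LEXICON[src].items())
        score += log(p)
    return score


def rank(query, scorer):
    """Return document ids sorted by descending score."""
    coll = collection_model(DOCS)
    return sorted(DOCS, key=lambda d: scorer(query, DOCS[d], coll), reverse=True)


if __name__ == "__main__":
    query = ["bank", "rente"]  # Dutch query; "bank" is ambiguous
    print(rank(query, score_one_translation))   # ['d2', 'd1']
    print(rank(query, score_all_translations))  # ['d1', 'd2']
```

With the Dutch query *bank rente*, committing to the most probable translation of *bank* (here the hypothetical `bench`) ranks the park document first, while weighting all translations lets the co-occurring `interest` pull the finance document to the top. The toy numbers only illustrate implicit disambiguation during searching; they say nothing about the results measured in the paper.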




Copyright information

© 1999 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hiemstra, D., de Jong, F. (1999). Disambiguation Strategies for Cross-Language Information Retrieval. In: Abiteboul, S., Vercoustre, AM. (eds) Research and Advanced Technology for Digital Libraries. ECDL 1999. Lecture Notes in Computer Science, vol 1696. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48155-9_18


  • DOI: https://doi.org/10.1007/3-540-48155-9_18


  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-66558-8

  • Online ISBN: 978-3-540-48155-3

  • eBook Packages: Springer Book Archive
