Disambiguation Strategies for Cross-Language Information Retrieval

Hiemstra, Djoerd; de Jong, Franciska

doi:10.1007/3-540-48155-9_18


Conference paper
First Online: 01 January 1999

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1696)

Abstract

This paper gives an overview of tools and methods for Cross-Language Information Retrieval (CLIR) developed within the Twenty-One project. The tools and methods are evaluated on the TREC CLIR task document collection, using Dutch queries on the English document base. The main issue addressed here is an evaluation of two approaches to disambiguation. The underlying question is whether considerable effort should be put into finding the correct translation for each query term before searching, or whether searching with more than one possible translation leads to better results. The experimental study suggests that the quality of search methods is more important than the quality of disambiguation methods. Good retrieval methods are able to disambiguate translated queries implicitly during searching.
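
The contrast between the two strategies can be illustrated with a small sketch. It is not the Twenty-One system or the retrieval model evaluated in the paper: the dictionary, the documents, the "take the first translation" rule, and the coverage-based score are all hypothetical, chosen only to show how searching with several translations can disambiguate implicitly.

```python
# Hypothetical toy Dutch-to-English dictionary; entries are illustrative only.
DICTIONARY = {
    "bank": ["bench", "bank"],   # ambiguous: a seat or a financial institution
    "rente": ["interest"],
}

# Hypothetical toy English document base.
DOCS = {
    "d1": "the bank raised the interest rate",
    "d2": "a wooden bench in the park",
    "d3": "interest in park design is growing",
}


def one_best(query_terms):
    """Strategy 1: commit to one translation per term before searching.

    Assumption: the 'best' translation is simply the first one listed.
    """
    return [[DICTIONARY[term][0]] for term in query_terms]


def all_translations(query_terms):
    """Strategy 2: keep every possible translation, grouped per source term."""
    return [list(DICTIONARY[term]) for term in query_terms]


def score(doc_text, translated_query):
    """Count the source terms for which at least one translation occurs."""
    tokens = set(doc_text.split())
    return sum(
        1
        for alternatives in translated_query
        if any(word in tokens for word in alternatives)
    )


def rank(translated_query):
    """Return document ids ordered by descending score, ties broken by id."""
    scored = [(score(text, translated_query), doc_id) for doc_id, text in DOCS.items()]
    return [doc_id for _, doc_id in sorted(scored, key=lambda pair: (-pair[0], pair[1]))]


query = ["bank", "rente"]
print(rank(one_best(query)))          # every document matches one term only
print(rank(all_translations(query)))  # d1 matches both source terms
```

In this toy setting the one-best strategy picks "bench" and loses the financial reading, so no document covers both query terms. Keeping both translations lets the document that contains "bank" together with "interest" rise to the top without any explicit disambiguation step.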

Author information

Authors and Affiliations

Centre for Telematics and Information Technology, University of Twente, Enschede, The Netherlands
Djoerd Hiemstra & Franciska de Jong

Editor information

Editors and Affiliations

Domaine de Voluceau, INRIA, BP 105, F-78153, Le Chesnay Cedex, France
Serge Abiteboul & Anne-Marie Vercoustre

About this paper

Cite this paper

Hiemstra, D., de Jong, F. (1999). Disambiguation Strategies for Cross-Language Information Retrieval. In: Abiteboul, S., Vercoustre, AM. (eds) Research and Advanced Technology for Digital Libraries. ECDL 1999. Lecture Notes in Computer Science, vol 1696. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48155-9_18

DOI: https://doi.org/10.1007/3-540-48155-9_18
Published: 17 September 1999
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-66558-8
Online ISBN: 978-3-540-48155-3
eBook Packages: Springer Book Archive
