Abstract
This paper gives an overview of tools and methods for Cross-Language Information Retrieval (CLIR) developed within the Twenty-One project. The tools and methods are evaluated on the TREC CLIR task document collection, using Dutch queries against the English document base. The main issue addressed here is an evaluation of two approaches to disambiguation. The underlying question is whether a lot of effort should be put into finding the correct translation for each query term before searching, or whether searching with more than one possible translation leads to better results. The experimental study suggests that the quality of search methods is more important than the quality of disambiguation methods. Good retrieval methods are able to disambiguate translated queries implicitly during searching.
© 1999 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hiemstra, D., de Jong, F. (1999). Disambiguation Strategies for Cross-Language Information Retrieval. In: Abiteboul, S., Vercoustre, AM. (eds) Research and Advanced Technology for Digital Libraries. ECDL 1999. Lecture Notes in Computer Science, vol 1696. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48155-9_18
DOI: https://doi.org/10.1007/3-540-48155-9_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-66558-8
Online ISBN: 978-3-540-48155-3
eBook Packages: Springer Book Archive