Disambiguation Strategies for Cross-Language Information Retrieval
This paper gives an overview of tools and methods for Cross-Language Information Retrieval (CLIR) developed within the Twenty-One project. The tools and methods are evaluated on the TREC CLIR task document collection, using Dutch queries against the English document base. The main issue addressed is an evaluation of two approaches to disambiguation. The underlying question is whether much effort should be put into finding the correct translation of each query term before searching, or whether searching with more than one possible translation leads to better results. The experimental study suggests that the quality of the search method matters more than the quality of the disambiguation method: good retrieval methods are able to disambiguate translated queries implicitly during searching.
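The abstract contrasts resolving each query term to one translation before searching with searching on all candidate translations. The sketch below illustrates the second option only, under loudly stated assumptions: the Dutch-English dictionary and the three documents are hypothetical toy data, translation probabilities are uniform, and documents are scored with a unigram language model smoothed by linear interpolation with the collection model (the weight `lam = 0.5` is an arbitrary choice). It is an illustration of implicit disambiguation, not the Twenty-One system's implementation.

```python
import math
from collections import Counter

# Hypothetical toy Dutch-to-English dictionary. Every candidate translation is
# kept; P(t|s) is assumed uniform over a source term's translations.
DICTIONARY = {
    "bank": ["bank", "couch", "bench"],  # ambiguous Dutch term
    "rente": ["interest"],
}

# Hypothetical toy English document collection.
DOCS = {
    "d1": "the bank raised the interest rate on savings".split(),
    "d2": "a leather couch and a wooden bench in the garden".split(),
    "d3": "the bench faced the river bank near the park".split(),
}

# Collection term counts, used for smoothing.
COLL = Counter(t for doc in DOCS.values() for t in doc)
COLL_TOTAL = sum(COLL.values())


def score(source_terms, doc, lam=0.5):
    """Log-probability of a translated (structured) query given a document.

    For each source term s:
        P(s|D) = sum_t P(t|s) * (lam * P(t|D) + (1 - lam) * P(t|C))
    so the candidate translations of s are pooled instead of choosing one.
    """
    tf = Counter(doc)
    log_p = 0.0
    for s in source_terms:
        translations = DICTIONARY[s]
        p_s = sum(
            (lam * tf[t] / len(doc) + (1 - lam) * COLL[t] / COLL_TOTAL)
            / len(translations)
            for t in translations
        )
        if p_s == 0.0:  # no translation of s occurs anywhere in the collection
            return float("-inf")
        log_p += math.log(p_s)
    return log_p


def rank(source_terms):
    """Document ids sorted by descending query log-probability."""
    return sorted(DOCS, key=lambda d: score(source_terms, DOCS[d]), reverse=True)


if __name__ == "__main__":
    print(rank(["bank", "rente"]))  # prints ['d1', 'd3', 'd2']
```

Here d1 ranks first although d2 and d3 each match two translations of *bank*: only d1 also contains *interest*, so the co-occurring query term selects the financial sense without any explicit disambiguation step. A disambiguate-first strategy would instead replace the translation list for *bank* with a single chosen term before scoring.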
Published in: Research and Advanced Technology for Digital Libraries: Third European Conference, ECDL’99, Paris, France, September 22–24, 1999, Proceedings. Lecture Notes in Computer Science, Springer Berlin Heidelberg, pp. 274–293. Copyright Springer-Verlag Berlin Heidelberg.