A Probabilistic Approach to Term Translation for Cross-Lingual Retrieval

Xu, Jinxi; Weischedel, Ralph

doi:10.1007/978-94-017-0171-6_6

Part of the book series: The Springer International Series on Information Retrieval (INRE, volume 13)

Abstract

This work has three aspects. The first is a probabilistic approach to term translation for cross-lingual information retrieval (CLIR). We show that this approach, when used with a probabilistic retrieval model, yields better retrieval performance than non-probabilistic techniques such as structured query translation (Pirkola, 1998) and machine translation. We also show that parallel corpora and manual lexicons are complementary and that combining them is essential to high-performance CLIR. The second aspect is an empirical measurement of CLIR performance as a function of the size of the bilingual resources available for estimating translation probabilities. Such a measurement is useful for two reasons. First, it can help predict CLIR performance for a new language pair. Second, it can indicate how much more data to acquire if existing resources cannot meet a target performance level. The third aspect is a technique that can potentially reduce the cost of manually creating a parallel corpus, which will be useful for language pairs with little or no parallel text.
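The abstract does not spell out the model. As a hedged illustration only, the Python sketch below shows one way translation probabilities can plug into a probabilistic retrieval score. It assumes a query-likelihood mixture in the spirit of the HMM retrieval model of Miller et al. (1999): log P(Q|D) = sum over q in Q of log(alpha * P(q|G) + (1 - alpha) * sum over d of P(d|D) * P(q|d)), where P(q|G) is a background probability, P(d|D) a maximum-likelihood document model, and P(q|d) a translation probability. It also assumes that a manual lexicon contributes uniform probabilities over its listed translations and is linearly interpolated with a corpus-derived table. The function names, the weights `alpha` and `lam`, the uniform-lexicon rule, and the toy romanized terms are all hypothetical and are not taken from the chapter.

```python
import math
from collections import Counter

# Hypothetical layout: a translation table maps a document-language term d
# to a dict {query-language term q: P(q | d)}.

def lexicon_probs(lexicon):
    """Assumption: a manual lexicon {d: [q1, q2, ...]} gives each listed
    translation equal probability P(q | d) = 1 / (number of translations)."""
    return {d: {q: 1.0 / len(qs) for q in qs} for d, qs in lexicon.items() if qs}


def combine_tables(corpus_tbl, lexicon_tbl, lam=0.5):
    """Linearly interpolate corpus-derived and lexicon-derived tables.
    If a term appears in only one table, that table is used alone so each
    combined distribution still sums to 1."""
    combined = {}
    for d in set(corpus_tbl) | set(lexicon_tbl):
        c = corpus_tbl.get(d, {})
        l = lexicon_tbl.get(d, {})
        w = lam if (c and l) else (1.0 if c else 0.0)
        combined[d] = {q: w * c.get(q, 0.0) + (1.0 - w) * l.get(q, 0.0)
                       for q in set(c) | set(l)}
    return combined


def log_score(query, doc_terms, trans, background, alpha=0.3, floor=1e-9):
    """log P(Q | D) = sum over q in Q of
    log(alpha * P(q | G) + (1 - alpha) * sum_d P(d | D) * P(q | d)),
    with P(d | D) the maximum-likelihood estimate from the document."""
    counts = Counter(doc_terms)
    n = len(doc_terms)
    total = 0.0
    for q in query:
        p_trans = sum((cnt / n) * trans.get(d, {}).get(q, 0.0)
                      for d, cnt in counts.items())
        p_mix = alpha * background.get(q, 0.0) + (1.0 - alpha) * p_trans
        total += math.log(max(p_mix, floor))  # floor avoids log(0)
    return total


# Toy example with made-up romanized Chinese terms (illustrative only).
corpus = {"yinhang": {"bank": 0.8, "banking": 0.2}}
lexicon = {"yinhang": ["bank"], "hepan": ["riverbank", "bank"]}
trans = combine_tables(corpus, lexicon_probs(lexicon), lam=0.5)
background = {"bank": 0.001}

s1 = log_score(["bank"], ["yinhang", "daikuan"], trans, background)
s2 = log_score(["bank"], ["hepan", "shu"], trans, background)
print(s1 > s2)  # True: "yinhang" translates to "bank" with higher probability
```

In this toy run, "hepan" exists only in the lexicon, so its two translations each get probability 0.5, while "yinhang" blends corpus and lexicon evidence to P(bank | yinhang) = 0.9. That blend is one simple way to let the two resources complement each other; the chapter's actual estimation procedure may differ.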

References

Allan, J., Callan, J., Feng, F., and Malin, D. (2000). INQUERY at TREC8. In TREC8 Proceedings. NIST.
Ballesteros, L. and Croft, W. (1998). Resolving Ambiguity for Cross-language Retrieval. In Proceedings of ACM SIGIR 1998 Conference, pages 64–71.
Berger, A. and Lafferty, J. (1999). Information Retrieval as Statistical Translation. In Proceedings of ACM SIGIR 1999 Conference.
Brown, P., Della Pietra, S., Della Pietra, V., and Mercer, R. (1993). The Mathematics of Statistical Machine Translation: Parameter Estimation. Computational Linguistics, 19(2):263–311.
Hiemstra, D. and de Jong, F. (1999). Disambiguation Strategies for Cross-language Information Retrieval. In Proceedings of the Third European Conference on Research and Advanced Technology for Digital Libraries, pages 274–293.
Hull, D. (1997). Using Structured Queries for Disambiguation in Cross-language Information Retrieval. In AAAI Symposium on Cross-Language Text and Speech Retrieval.
Klavans, J. and Hovy, E. (1999). Multilingual (or Cross-lingual) Information Retrieval. In Hovy, E., editor, Multilingual Information Management: Current Levels and Future Abilities.
Kwok, K. L. (1997). Comparing Representations in Chinese Information Retrieval. In Proceedings of ACM SIGIR 1997 Conference.
Lafferty, J. (1999). Personal communication.
McCarley, J. (1999). Should We Translate the Documents or the Queries in Cross-language Information Retrieval? In Proceedings of ACL 1999, pages 208–214.
Miller, D., Leek, T., and Schwartz, R. (1999). A Hidden Markov Model Information Retrieval System. In Proceedings of ACM SIGIR 1999 Conference, pages 214–221.
Oard, D. (1998). A Comparative Study of Query and Document Translation for Cross-language Information Retrieval. In Third Conference of the Association for Machine Translation in the Americas.
Pirkola, A. (1998). The Effects of Query Structure and Dictionary Setups in Dictionary-based Cross-language Information Retrieval. In Proceedings of ACM SIGIR 1998 Conference, pages 55–63.
Ponte, J. (1998). A Language Modeling Approach to Information Retrieval. In Proceedings of ACM SIGIR 1998 Conference, pages 275–281.
Porter, M. (1980). An Algorithm for Suffix Stripping. Program, 14(3):130–137.
Rabiner, L. (1989). A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proceedings of the IEEE, 77(2):257–286.
Resnik, P. (1998). Parallel Strands: A Preliminary Investigation into Mining the Web for Bilingual Text. In Third Conference of the Association for Machine Translation in the Americas.
Singhal, A., Buckley, C., and Mitra, M. (1996). Pivoted Document Length Normalization. In Proceedings of ACM SIGIR 1996 Conference.
Sperer, R. and Oard, D. (2000). Structured Query Translation for Cross-language Information Retrieval. In Proceedings of ACM SIGIR 2000 Conference.
Voorhees, E. and Harman, D., editors (1997). TREC5 Proceedings. NIST.
Voorhees, E. and Harman, D., editors (1998). TREC6 Proceedings. NIST.
Voorhees, E. and Harman, D., editors (2001). TREC9 Proceedings. NIST.
Xu, J. and Croft, W. (1998). Corpus-based Stemming Using Co-occurrence of Word Variants. ACM TOIS, 18(1):79–112.

Author information

Authors and Affiliations

BBN Technologies, 10 Moulton St, Cambridge, MA, USA, 02138
Jinxi Xu & Ralph Weischedel

Editor information

Editors and Affiliations

Department of Computer Science, University of Massachusetts, Amherst, USA
W. Bruce Croft (Distinguished Professor)
Computer Science Department, Carnegie Mellon University, Pittsburgh, USA
John Lafferty (Associate Professor)

About this chapter

Cite this chapter

Xu, J., Weischedel, R. (2003). A Probabilistic Approach to Term Translation for Cross-Lingual Retrieval. In: Croft, W.B., Lafferty, J. (eds) Language Modeling for Information Retrieval. The Springer International Series on Information Retrieval, vol 13. Springer, Dordrecht. https://doi.org/10.1007/978-94-017-0171-6_6

DOI: https://doi.org/10.1007/978-94-017-0171-6_6
Publisher Name: Springer, Dordrecht
Print ISBN: 978-90-481-6263-5
Online ISBN: 978-94-017-0171-6
eBook Packages: Springer Book Archive
