A Probabilistic Approach to Term Translation for Cross-Lingual Retrieval

  • Chapter
Language Modeling for Information Retrieval

Part of the book series: The Springer International Series on Information Retrieval (INRE, volume 13)

Abstract

This work has three aspects. The first is to describe a probabilistic approach to term translation for cross-lingual information retrieval (CLIR). We show that such an approach, when used with a probabilistic retrieval model, can produce better retrieval than non-probabilistic techniques such as structural query translation (Pirkola, 1998) and machine translation. We also show that parallel corpora and manual lexicons are complementary and that combining them is essential to high-performance CLIR. The second aspect is to measure empirically how CLIR performance varies with the size of the bilingual resources available for estimating translation probabilities. Such a measurement is useful for two reasons. First, it can help predict CLIR performance for a new language pair. Second, it can guide how much more data to acquire when existing resources cannot meet a target performance level. The third aspect is to describe a technique that can potentially reduce the cost of manually creating a parallel corpus. Such a technique will be useful for language pairs with little or no parallel text.
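
To make the idea of combining translation probabilities with a probabilistic retrieval model concrete, the following is a minimal sketch of one generic way to do it. It is not taken from the chapter: the mixture form, the function name `score`, the weight `alpha`, and the tables `trans_prob` and `background` are all assumptions chosen for illustration. Each query term is assumed to be generated either from a general-language background model or by translating a term drawn from the document.

```python
import math
from collections import Counter


def score(query_terms, doc_terms, trans_prob, background, alpha=0.3):
    """Log-likelihood of a query given a document in another language.

    All inputs are hypothetical illustrations:
      trans_prob[f][e] -- estimated P(e | f), query-language term e
                          given document-language term f
      background[e]    -- general query-language unigram probability
      alpha            -- weight of the background model (assumed value)
    """
    counts = Counter(doc_terms)
    doc_len = sum(counts.values())
    log_p = 0.0
    for e in query_terms:
        # Chance of producing e by translating a term drawn from the document.
        p_translated = sum(
            (c / doc_len) * trans_prob.get(f, {}).get(e, 0.0)
            for f, c in counts.items()
        )
        # Smooth with the background model so unseen terms do not zero the score.
        p = alpha * background.get(e, 1e-9) + (1 - alpha) * p_translated
        log_p += math.log(p)
    return log_p


# Toy data, invented for this example only.
trans_prob = {"gato": {"cat": 0.9, "feline": 0.1}, "perro": {"dog": 1.0}}
background = {"cat": 0.01, "dog": 0.01}

relevant = score(["cat"], ["gato", "gato", "perro"], trans_prob, background)
irrelevant = score(["cat"], ["perro", "perro"], trans_prob, background)
```

In this toy setting the document containing "gato" scores higher for the query "cat" than the document that does not, because some probability mass reaches the query term through the translation table. In practice the translation table would be estimated from bilingual resources such as the lexicons and parallel corpora the abstract mentions.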

References

  • Allan, J., Callan, J., Feng, F., and Malin, D. (2000). INQUERY at TREC8. In TREC8 Proceedings. NIST.

  • Ballesteros, L. and Croft, W. (1998). Resolving Ambiguity for Cross-language Retrieval. In Proceedings of ACM SIGIR 1998 Conference, pages 64–71.

  • Berger, A. and Lafferty, J. (1999). Information Retrieval as Statistical Translation. In Proceedings of ACM SIGIR 1999 Conference.

  • Brown, P., Pietra, S. D., Pietra, V. D., and Mercer, R. (1993). The Mathematics of Statistical Machine Translation: Parameter Estimation. Computational Linguistics, pages 263–311.

  • Hiemstra, D. and de Jong, F. (1999). Disambiguation Strategies for Cross-language Information Retrieval. In Proceedings of the third European Conference on Research and Advanced Technology for Digital Libraries, pages 274–293.

  • Hull, D. (1997). Using Structured Queries for Disambiguation in Cross-language Information Retrieval. In AAAI Symposium on Cross-Language Text and Speech Retrieval.

  • Klavans, J. and Hovy, E. (1999). Multilingual (or Cross-lingual) Information Retrieval. In Hovy, E., editor, Multilingual Information Management, current levels and future abilities.

  • Kwok, K. L. (1997). Comparing Representations in Chinese Information Retrieval. In Proceedings of ACM SIGIR 1997 Conference.

  • Lafferty, J. (1999). Personal Communications.

  • McCarley, J. (1999). Should We Translate the Documents or the Queries in Cross-language Information Retrieval. In Proceedings of ACL 1999, pages 208–214.

  • Miller, D., Leek, T., and Schwartz, R. (1999). A Hidden Markov Model Information Retrieval System. In Proceedings of ACM SIGIR 1999 Conference, pages 214–221.

  • Oard, D. (1998). A Comparative Study of Query and Document Translation for Cross-language Information Retrieval. In Third Conference of the Association for Machine Translation in the Americas.

  • Pirkola, A. (1998). The Effects of Query Structure and Dictionary Setups in Dictionary-based Cross-language Information Retrieval. In Proceedings of ACM SIGIR 1998 Conference, pages 55–63.

  • Ponte, J. (1998). A Language Modeling Approach to Information Retrieval. In Proceedings of ACM SIGIR 1998 Conference, pages 275–281.

  • Porter, M. (1980). An Algorithm for Suffix Stripping. Program, 14(3):130–137.

  • Rabiner, L. (1989). A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proceedings of IEEE 77, pages 257–286.

  • Resnik, P. (1998). Parallel Strands: A Preliminary Investigation into Mining the Web for Bilingual Text. In Third Conference of the Association for Machine Translation in the Americas.

  • Singhal, A., Buckley, C., and Mitra, M. (1996). Pivoted Document Length Normalization. In Proceedings of ACM SIGIR 1996 Conference.

  • Sperer, R. and Oard, D. (2000). Structured Query Translation for Cross-language Information Retrieval. In Proceedings of ACM SIGIR 2000 Conference.

  • Voorhees, E. and Harman, D., editors (1997). TREC5 Proceedings. NIST.

  • Voorhees, E. and Harman, D., editors (1998). TREC6 Proceedings. NIST.

  • Voorhees, E. and Harman, D., editors (2001). TREC9 Proceedings. NIST.

  • Xu, J. and Croft, W. (1998). Corpus-based Stemming Using Co-occurrence of Word Variants. ACM TOIS, 18(1):79–112.

Copyright information

© 2003 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Xu, J., Weischedel, R. (2003). A Probabilistic Approach to Term Translation for Cross-Lingual Retrieval. In: Croft, W.B., Lafferty, J. (eds) Language Modeling for Information Retrieval. The Springer International Series on Information Retrieval, vol 13. Springer, Dordrecht. https://doi.org/10.1007/978-94-017-0171-6_6

  • DOI: https://doi.org/10.1007/978-94-017-0171-6_6

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-90-481-6263-5

  • Online ISBN: 978-94-017-0171-6

  • eBook Packages: Springer Book Archive
