Automatic dictionary extraction for cross-language information retrieval

Parallel Text Processing

Part of the book series: Text, Speech and Language Technology (TLTB, volume 13)

Abstract

In experiments comparing several methods for cross-language information retrieval using a bilingual training corpus (methods based on both machine translation and “traditional” information-retrieval techniques), a fairly simple statistical technique for automatically extracting a bilingual dictionary from parallel text performed best. Surprisingly, an improvement to the dictionary extraction method that significantly increases the accuracy of the dictionary proved slightly detrimental to overall retrieval performance, even though it is highly beneficial for other applications. This chapter describes the extraction method and its enhancement in detail, and compares the performance of a retrieval system using the automatically generated dictionaries with that of other retrieval methods.

Copyright information

© 2000 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Brown, R.D., Carbonell, J.G., Yang, Y. (2000). Automatic dictionary extraction for cross-language information retrieval. In: Véronis, J. (ed.) Parallel Text Processing. Text, Speech and Language Technology, vol 13. Springer, Dordrecht. https://doi.org/10.1007/978-94-017-2535-4_14

  • DOI: https://doi.org/10.1007/978-94-017-2535-4_14

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-90-481-5555-2

  • Online ISBN: 978-94-017-2535-4

  • eBook Packages: Springer Book Archive
