Building Bilingual Dictionaries from Parallel Web Documents

  • Conference paper
Advances in Information Retrieval (ECIR 2002)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 2291))

Abstract

In this paper we describe a system for automatically constructing a bilingual dictionary for cross-language information retrieval applications. We describe how we automatically target candidate parallel documents, filter the candidate documents, and process them to create parallel sentences. The parallel sentences are then automatically translated using an adaptation of the EMIM (expected mutual information measure) technique, and a dictionary of translation terms is created. We evaluate the dictionary with human experts, and the evaluation shows that the system performs well. In addition, the results obtained from automatically created corpora are comparable to those obtained from manually created corpora of parallel documents. Compared to other available techniques, our approach has the advantage of being simple, uniform, and easy to implement while providing encouraging results.
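The abstract does not spell out how EMIM is adapted, so the following is only a minimal sketch of the general idea, under stated assumptions: terms are whitespace-separated tokens, each sentence pair is treated as one observation, and the candidate translation of a source term is the target term with the highest EMIM score among those that co-occur with it more often than chance. The function names `emim` and `build_dictionary`, the toy sentence pairs, and the positive-association filter are all illustrative choices, not taken from the paper.

```python
import math
from collections import Counter
from itertools import product


def emim(n11, n_src, n_tgt, n):
    """Expected mutual information of two binary events from a 2x2 table.

    n11:   sentence pairs in which both terms occur
    n_src: pairs whose source side contains the source term
    n_tgt: pairs whose target side contains the target term
    n:     total number of sentence pairs
    """
    # (joint count, source marginal, target marginal) for each of the 4 cells
    cells = [
        (n11, n_src, n_tgt),                          # both present
        (n_src - n11, n_src, n - n_tgt),              # source only
        (n_tgt - n11, n - n_src, n_tgt),              # target only
        (n - n_src - n_tgt + n11, n - n_src, n - n_tgt),  # neither
    ]
    score = 0.0
    for joint, row, col in cells:
        if joint > 0:  # an empty cell contributes nothing to the sum
            score += (joint / n) * math.log((joint * n) / (row * col))
    return score


def build_dictionary(sentence_pairs):
    """Map each source term to its highest-scoring target term."""
    n = len(sentence_pairs)
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src, tgt in sentence_pairs:
        src_terms = set(src.lower().split())
        tgt_terms = set(tgt.lower().split())
        src_count.update(src_terms)
        tgt_count.update(tgt_terms)
        pair_count.update(product(src_terms, tgt_terms))

    best = {}
    for (s, t), n11 in pair_count.items():
        # Assumption: keep only positively associated pairs, because EMIM
        # also scores strong negative association highly.
        if n11 * n <= src_count[s] * tgt_count[t]:
            continue
        score = emim(n11, src_count[s], tgt_count[t], n)
        if s not in best or score > best[s][1]:
            best[s] = (t, score)
    return {s: t for s, (t, _) in best.items()}


# Hypothetical aligned English-French sentence pairs
pairs = [
    ("the house", "la maison"),
    ("the cat", "le chat"),
    ("a house", "une maison"),
    ("a cat", "un chat"),
]
dictionary = build_dictionary(pairs)
```

On this toy input, `dictionary["house"]` is `"maison"` and `dictionary["cat"]` is `"chat"`. A real system would also need the document targeting, filtering, and sentence alignment steps described above, plus handling for function words and rare terms.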

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

McEwan, C.J.A., Ounis, I., Ruthven, I. (2002). Building Bilingual Dictionaries from Parallel Web Documents. In: Crestani, F., Girolami, M., van Rijsbergen, C.J. (eds) Advances in Information Retrieval. ECIR 2002. Lecture Notes in Computer Science, vol 2291. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45886-7_20

  • DOI: https://doi.org/10.1007/3-540-45886-7_20

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-43343-9

  • Online ISBN: 978-3-540-45886-9

  • eBook Packages: Springer Book Archive
