Building Bilingual Dictionaries from Parallel Web Documents

McEwan, Craig J. A.; Ounis, Iadh; Ruthven, Ian

doi:10.1007/3-540-45886-7_20


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2291)

Included in the following conference series:

European Conference on Information Retrieval


Abstract

In this paper we describe a system for automatically constructing a bilingual dictionary for cross-language information retrieval applications. We describe how we automatically target candidate parallel documents, filter the candidate documents, and process them to create parallel sentences. The parallel sentences are then automatically translated using an adaptation of the EMIM technique, and a dictionary of translation terms is created. We evaluate our dictionary using human experts. The evaluation showed that the system performs well. In addition, the results obtained from automatically created corpora are comparable to those obtained from manually created corpora of parallel documents. Compared to other available techniques, our approach has the advantage of being simple, uniform, and easy to implement while providing encouraging results.
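
The abstract names EMIM (the expected mutual information measure) as the basis of the translation step but does not spell out the adaptation used. As a rough illustration only, the Python sketch below scores the co-occurrence of source and target terms across aligned sentence pairs with the standard four-cell EMIM and keeps the highest-scoring, positively associated target term for each source term. The function names, the whitespace tokenisation, the positive-association filter, and the toy sentence pairs are all assumptions made for this example, not details taken from the paper.

```python
import math
from collections import Counter
from itertools import product


def emim(n11, n10, n01, n00):
    """Expected mutual information of two binary occurrence variables.

    n11: sentence pairs containing both terms, n10: source term only,
    n01: target term only, n00: neither term.
    """
    n = n11 + n10 + n01 + n00
    # Each cell: (joint count, marginal count for x, marginal count for y).
    cells = [
        (n11, n11 + n10, n11 + n01),
        (n10, n11 + n10, n10 + n00),
        (n01, n01 + n00, n11 + n01),
        (n00, n01 + n00, n10 + n00),
    ]
    total = 0.0
    for joint, x_marginal, y_marginal in cells:
        if joint > 0:  # empty cells contribute nothing to the sum
            total += (joint / n) * math.log(joint * n / (x_marginal * y_marginal))
    return total


def build_dictionary(sentence_pairs):
    """Map each source term to its best target term as (term, score)."""
    n = len(sentence_pairs)
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src, tgt in sentence_pairs:
        # Assumption: naive lowercase whitespace tokenisation.
        src_terms = set(src.lower().split())
        tgt_terms = set(tgt.lower().split())
        src_count.update(src_terms)
        tgt_count.update(tgt_terms)
        pair_count.update(product(src_terms, tgt_terms))

    best = {}
    for (s, t), n11 in pair_count.items():
        # Assumption: keep only pairs that co-occur more often than chance,
        # because EMIM alone also rewards strong negative association.
        if n11 * n <= src_count[s] * tgt_count[t]:
            continue
        n10 = src_count[s] - n11
        n01 = tgt_count[t] - n11
        n00 = n - n11 - n10 - n01
        score = emim(n11, n10, n01, n00)
        if s not in best or score > best[s][1]:
            best[s] = (t, score)
    return best


# Hypothetical toy corpus of aligned English-French sentence pairs.
toy_pairs = [
    ("the house", "la maison"),
    ("the car", "la voiture"),
    ("a house", "une maison"),
    ("a car", "une voiture"),
]

if __name__ == "__main__":
    for term, (translation, score) in sorted(build_dictionary(toy_pairs).items()):
        print(f"{term} -> {translation} ({score:.3f})")
```

On a corpus this small, every correct pairing co-occurs perfectly, so the ranking is trivial. On real web-mined text the scores would be noisier, and a stop-word list or a minimum frequency threshold would probably be needed.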



Author information

Authors and Affiliations

Department of Computing Science, University of Glasgow, G12 8QQ, Glasgow
Craig J. A. McEwan & Iadh Ounis
Department of Computer and Information Sciences, University of Strathclyde, G1 1XH, UK
Ian Ruthven


Editor information

Editors and Affiliations

Department of Computer and Information Sciences, University of Strathclyde, 26 Richmond Street, G1 1XH, Glasgow, UK
Fabio Crestani
School of Information and Communication Technologies, University of Paisley, High Street, PA1 2BE, Paisley, UK
Mark Girolami
Computing Science Department, University of Glasgow, 17 Lilybank Gardens, G12 8RZ, Glasgow, UK
Cornelis Joost van Rijsbergen


About this paper

Cite this paper

McEwan, C.J.A., Ounis, I., Ruthven, I. (2002). Building Bilingual Dictionaries from Parallel Web Documents. In: Crestani, F., Girolami, M., van Rijsbergen, C.J. (eds) Advances in Information Retrieval. ECIR 2002. Lecture Notes in Computer Science, vol 2291. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45886-7_20


DOI: https://doi.org/10.1007/3-540-45886-7_20
Published: 14 March 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43343-9
Online ISBN: 978-3-540-45886-9
eBook Packages: Springer Book Archive
