Cross-Language Retrieval with Wikipedia

Schönhofen, Péter; Benczúr, András; Bíró, István; Csalogány, Károly

doi:10.1007/978-3-540-85760-0_9

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5152)

Included in the following conference series:

Workshop of the Cross-Language Evaluation Forum for European Languages

Abstract

We demonstrate a twofold use of Wikipedia for cross-lingual information retrieval. As our main contribution, we exploit Wikipedia hyperlinkage for query term disambiguation. We also use bilingual Wikipedia articles for dictionary extension. Our method is based on translation disambiguation: we combine the Wikipedia-based technique with a method based on bigram statistics of pairs formed by translations of different source-language terms.
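
As a rough illustration of the bigram-based half of this idea, the sketch below picks one translation per query term so that the chosen translations co-occur as often as possible in a target-language corpus. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `disambiguate`, the exhaustive search, the order-insensitive pair count, and the toy counts are all hypothetical, and the Wikipedia hyperlink component is not modelled.

```python
from itertools import product


def disambiguate(candidates, bigram_counts):
    """Choose one translation per source term by maximizing bigram support.

    candidates: list of lists; candidates[i] holds the possible
        target-language translations of the i-th source query term.
    bigram_counts: dict mapping a (word, word) pair to a corpus frequency
        (hypothetical data; any co-occurrence statistic could be used).
    """
    def pair_score(a, b):
        # Assumption: count the pair in either word order.
        return bigram_counts.get((a, b), 0) + bigram_counts.get((b, a), 0)

    best, best_score = None, -1
    # Exhaustive search over all combinations; fine for short queries only.
    for combo in product(*candidates):
        # Only pairs formed from translations of *different* source terms.
        score = sum(
            pair_score(combo[i], combo[j])
            for i in range(len(combo))
            for j in range(i + 1, len(combo))
        )
        if score > best_score:
            best, best_score = list(combo), score
    return best


# Toy example with invented counts: German query "Bank Fluss".
toy_candidates = [["bank", "bench"], ["river", "flow"]]
toy_counts = {("river", "bank"): 120, ("bank", "flow"): 3, ("bench", "river"): 1}
print(disambiguate(toy_candidates, toy_counts))  # prints ['bank', 'river']
```

The exhaustive search grows exponentially with query length; a greedy or beam search would be a natural substitute for longer queries.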

This work was supported by a Yahoo! Faculty Research Grant and by grants MOLINGV NKFP-2/0024/2005, NKFP-2004 project Language Miner http://nyelvbanyasz.sztaki.hu

Author information

Authors and Affiliations

Data Mining and Web Search Research Group, Informatics Laboratory, Computer and Automation Research Institute, Hungarian Academy of Sciences
Péter Schönhofen, András Benczúr, István Bíró & Károly Csalogány

Editor information

Carol Peters, Valentin Jijkoun, Thomas Mandl, Henning Müller, Douglas W. Oard, Anselmo Peñas, Vivien Petras, Diana Santos

Cite this paper

Schönhofen, P., Benczúr, A., Bíró, I., Csalogány, K. (2008). Cross-Language Retrieval with Wikipedia. In: Peters, C., et al. Advances in Multilingual and Multimodal Information Retrieval. CLEF 2007. Lecture Notes in Computer Science, vol 5152. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85760-0_9

DOI: https://doi.org/10.1007/978-3-540-85760-0_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85759-4
Online ISBN: 978-3-540-85760-0
eBook Packages: Computer Science, Computer Science (R0)
