WikiTranslate: Query Translation for Cross-Lingual Information Retrieval Using Only Wikipedia

  • Dong Nguyen
  • Arnold Overwijk
  • Claudia Hauff
  • Dolf R. B. Trieschnigg
  • Djoerd Hiemstra
  • Franciska de Jong
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5706)

Abstract

This paper presents WikiTranslate, a system that performs query translation for cross-lingual information retrieval (CLIR) using only Wikipedia to obtain translations. Queries are mapped to Wikipedia concepts, and the corresponding translations of these concepts in the target language are used to create the final query. WikiTranslate is evaluated by searching an English data collection with topics formulated in Dutch, French and Spanish. The system achieved 67% of the performance of the monolingual baseline.
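
The two steps named in the abstract (mapping a query to Wikipedia concepts, then using the target-language translations of those concepts) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say how queries are mapped to concepts, so the sketch assumes a simple greedy longest-match against article titles. The `TOY_WIKI` dictionary, its entries, and the function names `map_to_concepts` and `translate_query` are all hypothetical and stand in for real Wikipedia data and cross-language links.

```python
from typing import Dict, List

# Hypothetical miniature "Wikipedia" (made-up data, for illustration only):
# source-language (Dutch) article title -> cross-language links by language code.
TOY_WIKI: Dict[str, Dict[str, str]] = {
    "fiets": {"en": "bicycle"},
    "tweede wereldoorlog": {"en": "World War II"},
    "kaas": {"en": "cheese"},
}


def map_to_concepts(query: str, wiki: Dict[str, Dict[str, str]]) -> List[str]:
    """Map a query to concepts by greedy longest-match on article titles.

    Assumption: title matching stands in for whatever concept-mapping
    step the real system uses.
    """
    words = query.lower().split()
    concepts: List[str] = []
    i = 0
    while i < len(words):
        # Try the longest word span starting at position i first.
        for j in range(len(words), i, -1):
            candidate = " ".join(words[i:j])
            if candidate in wiki:
                concepts.append(candidate)
                i = j
                break
        else:
            i += 1  # no article title starts at this word; skip it
    return concepts


def translate_query(
    query: str, wiki: Dict[str, Dict[str, str]], target_lang: str = "en"
) -> List[str]:
    """Build the target-language query from the translations of matched concepts."""
    translations: List[str] = []
    for concept in map_to_concepts(query, wiki):
        translated = wiki[concept].get(target_lang)
        if translated is not None:
            translations.append(translated)
    return translations


if __name__ == "__main__":
    print(translate_query("Tweede Wereldoorlog en kaas", TOY_WIKI))
```

In this toy setup, words with no matching article title (such as "en") are simply dropped; a real system would need a strategy for such terms, which the abstract does not describe.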

Keywords

Cross-lingual information retrieval, query translation, word sense disambiguation, Wikipedia, comparable corpus



Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Dong Nguyen
    • 1
  • Arnold Overwijk
    • 1
  • Claudia Hauff
    • 1
  • Dolf R. B. Trieschnigg
    • 1
  • Djoerd Hiemstra
    • 1
  • Franciska de Jong
    • 1
  1. University of Twente, The Netherlands
