Crosslanguage Retrieval Based on Wikipedia Statistics

  • Conference paper
Evaluating Systems for Multilingual and Multimodal Information Access (CLEF 2008)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 5706))

Abstract

In this paper we present the methodology, implementation and evaluation results of the crosslanguage retrieval system we developed for the Robust WSD Task at CLEF 2008. Our system is based on query preprocessing for translation and homogenisation of queries. The preprocessing comprises two stages: first, a query translation step based on term statistics of co-occurring articles in Wikipedia; second, different disjunct query composition techniques for searching the CLEF corpus. We apply the same preprocessing steps to the monolingual and the crosslingual task, so that both tasks are treated fairly and in a similar way. The evaluation revealed that this uniform processing comes at nearly no cost for monolingual retrieval, while it enables crosslanguage retrieval and a feasible comparison of our system's performance on the two tasks.
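The two preprocessing stages can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no algorithmic detail, so the toy corpus `LINKED_ARTICLES`, the functions `translate_term` and `compose_disjunctive_query`, the `top_k` cutoff, and the reading of "disjunct" as a Boolean OR are all assumptions made for illustration. The sketch assumes that Wikipedia articles in two languages are paired through interlanguage links, and that a source term is translated by counting which target-language terms occur most often in the articles linked to source articles containing it.

```python
from collections import Counter

# Hypothetical toy stand-in for Wikipedia: article pairs joined by an
# interlanguage link, stored as (source-language text, target-language text).
LINKED_ARTICLES = [
    ("hund haustier hund", "dog pet dog"),
    ("hund wolf", "dog wolf"),
    ("katze haustier", "cat pet"),
]


def translate_term(term, linked_articles, top_k=2):
    """Return up to top_k target-language terms, ranked by how often they
    occur in articles linked to source articles that contain `term`."""
    counts = Counter()
    for source_text, target_text in linked_articles:
        if term in source_text.split():
            counts.update(target_text.split())
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [target_term for target_term, _ in ranked[:top_k]]


def compose_disjunctive_query(terms, linked_articles, top_k=2):
    """Join the translation candidates of every query term with OR
    (assumed interpretation of 'disjunct query composition')."""
    groups = []
    for term in terms:
        candidates = translate_term(term, linked_articles, top_k)
        if candidates:  # terms without any candidate are skipped
            groups.append("(" + " OR ".join(candidates) + ")")
    return " OR ".join(groups)


if __name__ == "__main__":
    print(compose_disjunctive_query(["hund", "katze"], LINKED_ARTICLES))
```

Under the same assumptions, the monolingual case would reuse these functions with article pairs whose two sides are in the same language, which mirrors the abstract's point that both tasks pass through identical preprocessing.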




Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Juffinger, A., Kern, R., Granitzer, M. (2009). Crosslanguage Retrieval Based on Wikipedia Statistics. In: Peters, C., et al. Evaluating Systems for Multilingual and Multimodal Information Access. CLEF 2008. Lecture Notes in Computer Science, vol 5706. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04447-2_19

  • DOI: https://doi.org/10.1007/978-3-642-04447-2_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-04446-5

  • Online ISBN: 978-3-642-04447-2

  • eBook Packages: Computer Science, Computer Science (R0)
