Danish and Greek Web Search Experiments with Hummingbird SearchServer™ at CLEF 2005

  • Stephen Tomlinson
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4022)

Abstract

Hummingbird participated in the WebCLEF mixed monolingual retrieval task of the Cross-Language Evaluation Forum (CLEF) 2005. In this task, the system was given 547 known-item queries from 11 languages (134 Spanish, 121 English, 59 Dutch, 59 Portuguese, 57 German, 35 Hungarian, 30 Danish, 30 Russian, 16 Greek, 5 Icelandic and 1 French). The goal was to find the desired page in the 82 GB EuroGOV collection (3.4 million pages crawled from government sites of 27 European domains). Our experiments found that stopword processing was more important than anticipated, perhaps because words that are common in one language tend to be overweighted by inverse document frequency in a mixed-language collection. Extra weight on the document title helped significantly, and extra weight on shallower URLs significantly helped home page queries. Stemming had a neutral impact on average, but it made a substantial difference for some individual queries. We analyze several Danish and Greek queries in detail.
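
The stopword observation can be illustrated with a small sketch. It assumes the textbook form idf = log(N / df), which is not necessarily the weighting SearchServer uses, and the toy documents below are hypothetical rather than drawn from EuroGOV. The Danish word "og" ("and") occurs in every Danish document, so it carries no weight in a Danish-only collection, but it looks fairly rare once the same documents are mixed with documents in another language.

```python
import math

def idf(term, docs):
    """Textbook inverse document frequency, log(N / df). Assumes df > 0."""
    df = sum(1 for d in docs if term in d)
    return math.log(len(docs) / df)

# Hypothetical toy collections; each document is a set of words.
danish = [{"og", "skat"}, {"og", "miljø"}, {"og", "borger"}, {"og", "kommune"}]
english = [{"and", "tax"}, {"and", "health"}, {"and", "law"}, {"and", "council"}] * 3

# "og" appears in all 4 Danish documents, so its monolingual idf is log(4/4).
idf_mono = idf("og", danish)

# In the mixed collection it appears in only 4 of 16 documents: log(16/4).
idf_mixed = idf("og", danish + english)

# A content word that appears once still outweighs "og": log(16/1).
idf_content = idf("skat", danish + english)

print(f"idf('og'), Danish only: {idf_mono:.3f}")
print(f"idf('og'), mixed:       {idf_mixed:.3f}")
print(f"idf('skat'), mixed:     {idf_content:.3f}")
```

In this sketch the function word moves from zero weight to a clearly positive weight purely because of the collection mix, which is one plausible reading of why removing stopwords mattered more than expected.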

Keywords

Inverse Document Frequency; Extra Weight; Document Title; Query Word; Greek Government
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Stephen Tomlinson
    Hummingbird, Ottawa, Canada
