Multilingual Web Retrieval Experiments with Field Specific Indexing Strategies for WebCLEF 2006 at the University of Hildesheim

  • Conference paper
Evaluation of Multilingual and Multi-modal Information Retrieval (CLEF 2006)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4730)

Abstract

For WebCLEF 2006, we carried out experiments on analyzing and extracting the HTML structure of web documents. In addition, blind relevance feedback was applied. As in WebCLEF 2005, a language-independent indexing strategy was pursued. We experimented with the HTML title, the H1 element, and other elements that emphasize text. Our index contained title and H1, emphasized elements, and full and partial content. With the WebCLEF 2005 topics, the best results were achieved with a strong weight on the title element and a very small weight on emphasized text, a marginal improvement over the best post-submission runs for the mixed-monolingual task at WebCLEF 2005. With the WebCLEF 2006 topics, improved results were achieved for manually generated topics. The best performance on the manual WebCLEF 2006 topics was achieved with a strong weight on both the HTML title and H1 elements and a decreased weight on the other elements. Blind relevance feedback has not yet improved the results.
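
To make the idea of field-specific weighting concrete, the following is a minimal sketch and not the authors' system. It assumes whitespace tokenization as a stand-in for language-independent indexing, a plain weighted sum of term frequencies as the scoring function, and treats `b`, `strong`, `em`, and `i` as the emphasizing elements. The names `FieldExtractor`, `score`, and `FIELD_WEIGHTS`, as well as the weight values, are hypothetical and chosen only to mirror the "strong title weight, very small emphasis weight" setting described above.

```python
from collections import Counter
from html.parser import HTMLParser

# Hypothetical weights (not taken from the paper): strong on the title,
# very small on emphasized text.
FIELD_WEIGHTS = {"title": 3.0, "h1": 1.0, "emphasized": 0.1, "content": 1.0}
# Assumption: these tags count as "elements emphasizing text".
EMPHASIS_TAGS = {"b", "strong", "em", "i"}


class FieldExtractor(HTMLParser):
    """Collects lowercased token counts per field while walking the HTML."""

    def __init__(self):
        super().__init__()
        self.open_tags = []
        self.fields = {name: Counter() for name in FIELD_WEIGHTS}

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag, tolerating sloppy HTML.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        tokens = data.lower().split()  # no stemming or stopword list
        if not tokens:
            return
        if "title" in self.open_tags:
            self.fields["title"].update(tokens)
            return  # title text is not part of the body content
        self.fields["content"].update(tokens)
        if "h1" in self.open_tags:
            self.fields["h1"].update(tokens)
        if EMPHASIS_TAGS.intersection(self.open_tags):
            self.fields["emphasized"].update(tokens)


def score(html, query, weights=FIELD_WEIGHTS):
    """Weighted sum of query-term frequencies over all fields."""
    parser = FieldExtractor()
    parser.feed(html)
    parser.close()
    terms = query.lower().split()
    return sum(weights[f] * parser.fields[f][t] for f in weights for t in terms)
```

Under these assumed weights, a page whose title matches the query outranks a page that mentions the same terms only in its body text, and a match inside an emphasized element adds only a small bonus on top of its content score. Changing the values in `FIELD_WEIGHTS` is the sketch's analogue of the weighting variations the abstract describes.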

Editor information

Carol Peters, Paul Clough, Fredric C. Gey, Jussi Karlgren, Bernardo Magnini, Douglas W. Oard, Maarten de Rijke, Maximilian Stempfhuber

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Heuwing, B., Mandl, T., Strötgen, R. (2007). Multilingual Web Retrieval Experiments with Field Specific Indexing Strategies for WebCLEF 2006 at the University of Hildesheim. In: Peters, C., et al. Evaluation of Multilingual and Multi-modal Information Retrieval. CLEF 2006. Lecture Notes in Computer Science, vol 4730. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74999-8_105

  • DOI: https://doi.org/10.1007/978-3-540-74999-8_105

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-74998-1

  • Online ISBN: 978-3-540-74999-8

  • eBook Packages: Computer Science (R0)
