
Setting Per-field Normalisation Hyper-parameters for the Named-Page Finding Search Task

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4425)

Abstract

Per-field normalisation has been shown to be effective for Web search tasks such as named-page finding. However, it requires a separate normalisation hyper-parameter to be tuned for each field. In this paper, we argue that the purpose of per-field normalisation is to adjust the linear relationship between field length and term frequency. We experiment with standard Web test collections, using three document fields: the body of the document, its title, and the anchor text of its incoming links. Our experiments show that, across different collections, the linear correlation between field length and normalised term frequency given by the optimised hyper-parameter settings is proportional to the maximum negative linear correlation. Based on this observation, we devise an automatic method for setting the per-field normalisation hyper-parameter values without using relevance assessments for tuning. The evaluation results show that this method is effective for the body and title fields. We also explain why setting the per-field normalisation hyper-parameter for the anchor text field is difficult.
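The abstract describes the tuning method only at a high level. The following is a minimal sketch of the idea under stated assumptions, not the authors' implementation. The abstract does not name the per-field normalisation formula, so the sketch assumes the DFR Normalisation 2 formula, tfn = tf * log2(1 + c * avg_l / l), applied separately to each field. It measures the Pearson correlation between normalised term frequency and field length over a sample of (tf, field length) pairs from one field. It then picks the hyper-parameter c whose correlation is closest to k times the maximum negative correlation found on a candidate grid. The helper names, the candidate grid and the constant k are all hypothetical. In this reading, k would be fitted once per field on a training collection and then reused without relevance assessments.

```python
import math

# Hypothetical sketch: assumes DFR Normalisation 2 as the per-field
# normalisation; the abstract does not name the formula.
def normalise_tf(tf, field_length, avg_field_length, c):
    """Normalisation 2 (assumed): tfn = tf * log2(1 + c * avg_l / l)."""
    return tf * math.log2(1.0 + c * avg_field_length / field_length)


def pearson(xs, ys):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0.0 or sd_y == 0.0:
        return 0.0  # correlation is undefined for a constant sequence
    return cov / (sd_x * sd_y)


def correlation_at(c, postings, avg_field_length):
    """Correlation between normalised tf and field length for a given c.

    postings: list of (tf, field_length) pairs from one field, field_length > 0.
    """
    tfns = [normalise_tf(tf, l, avg_field_length, c) for tf, l in postings]
    lengths = [l for _, l in postings]
    return pearson(tfns, lengths)


def set_c(postings, avg_field_length, k, grid):
    """Pick the c whose correlation is closest to k * (maximum negative correlation).

    k is a hypothetical per-field constant, fitted once on a training
    collection; no relevance assessments are needed to apply it afterwards.
    """
    corrs = [(c, correlation_at(c, postings, avg_field_length)) for c in grid]
    rho_min = min(rho for _, rho in corrs)  # most negative correlation on the grid
    target = k * rho_min
    return min(corrs, key=lambda pair: abs(pair[1] - target))[0]


if __name__ == "__main__":
    # Toy (tf, field_length) sample for one field; the values are made up.
    sample = [(1, 5), (2, 12), (1, 30), (4, 45), (3, 80), (6, 150), (2, 200)]
    grid = [0.05 * 2 ** i for i in range(12)]
    print(set_c(sample, avg_field_length=60.0, k=0.5, grid=grid))
```

A grid search keeps the sketch simple and makes no assumption that the correlation varies monotonically with c. A finer grid, or a one-dimensional root finder around the target correlation, would be a natural refinement.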




Editor information

Giambattista Amati Claudio Carpineto Giovanni Romano


Copyright information

© 2007 Springer Berlin Heidelberg

About this paper

Cite this paper

He, B., Ounis, I. (2007). Setting Per-field Normalisation Hyper-parameters for the Named-Page Finding Search Task. In: Amati, G., Carpineto, C., Romano, G. (eds) Advances in Information Retrieval. ECIR 2007. Lecture Notes in Computer Science, vol 4425. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71496-5_42


  • DOI: https://doi.org/10.1007/978-3-540-71496-5_42

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-71494-1

  • Online ISBN: 978-3-540-71496-5

  • eBook Packages: Computer Science, Computer Science (R0)
