Setting Per-field Normalisation Hyper-parameters for the Named-Page Finding Search Task

He, Ben; Ounis, Iadh

doi:10.1007/978-3-540-71496-5_42

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4425)

Abstract

Per-field normalisation has been shown to be effective for Web search tasks, e.g. named-page finding. However, per-field normalisation also suffers from having hyper-parameters to tune on a per-field basis. In this paper, we argue that the purpose of per-field normalisation is to adjust the linear relationship between field length and term frequency. We experiment with standard Web test collections, using three document fields, namely the body of the document, its title, and the anchor text of its incoming links. From our experiments, we find that across different collections, the linear correlation values, given by the optimised hyper-parameter settings, are proportional to the maximum negative linear correlation. Based on this observation, we devise an automatic method for setting the per-field normalisation hyper-parameter values without the use of relevance assessment for tuning. According to the evaluation results, this method is shown to be effective for the body and title fields. In addition, the difficulty in setting the per-field normalisation hyper-parameter for the anchor text field is explained.
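To illustrate the relationship the abstract describes, the following minimal sketch measures the linear (Pearson) correlation between field length and normalised term frequency for different hyper-parameter values. The abstract does not state the normalisation formula, so the BM25F-style form `tf / ((1 - b) + b * length / avg_length)`, the hyper-parameter name `b`, the toy data, and all function names are assumptions made for illustration only; this is not the authors' method.

```python
import math
from typing import List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson linear correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant sample
    return cov / math.sqrt(var_x * var_y)


def normalised_tf(tf: float, length: float, avg_length: float, b: float) -> float:
    """ASSUMED BM25F-style per-field normalisation (not taken from the paper)."""
    return tf / ((1.0 - b) + b * length / avg_length)


def length_tfn_correlation(
    tfs: Sequence[float], lengths: Sequence[float], b: float
) -> float:
    """Correlation between field length and normalised term frequency for a given b."""
    avg_length = sum(lengths) / len(lengths)
    tfn: List[float] = [
        normalised_tf(tf, length, avg_length, b) for tf, length in zip(tfs, lengths)
    ]
    return pearson(lengths, tfn)


# Hypothetical toy data: one term's frequency in five fields of growing length.
toy_lengths = [50, 100, 200, 400, 800]
toy_tfs = [1, 2, 3, 5, 9]

# Scan a grid of b values and report the correlation each one induces.
grid = [i / 10 for i in range(11)]
correlations = {b: length_tfn_correlation(toy_tfs, toy_lengths, b) for b in grid}
most_negative_b = min(correlations, key=correlations.get)

if __name__ == "__main__":
    for b in grid:
        print(f"b={b:.1f}  correlation={correlations[b]:+.3f}")
    print(f"most negative correlation on this grid at b={most_negative_b:.1f}")
```

On this toy data, raw term frequency (b = 0) grows with field length, while full normalisation (b = 1) reverses the sign of the correlation. Varying the hyper-parameter therefore moves the length and term frequency correlation between these two extremes, which is the quantity the abstract argues per-field normalisation is adjusting.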

Author information

Authors and Affiliations

Department of Computing Science, University of Glasgow, United Kingdom
Ben He & Iadh Ounis

Editor information

Giambattista Amati, Claudio Carpineto, Giovanni Romano

About this paper

Cite this paper

He, B., Ounis, I. (2007). Setting Per-field Normalisation Hyper-parameters for the Named-Page Finding Search Task. In: Amati, G., Carpineto, C., Romano, G. (eds) Advances in Information Retrieval. ECIR 2007. Lecture Notes in Computer Science, vol 4425. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71496-5_42

DOI: https://doi.org/10.1007/978-3-540-71496-5_42
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71494-1
Online ISBN: 978-3-540-71496-5
eBook Packages: Computer Science, Computer Science (R0)
