Using Document-Quality Measures to Predict Web-Search Effectiveness

Raiber, Fiana; Kurland, Oren

doi:10.1007/978-3-642-36973-5_12

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7814)

Included in the following conference series:

European Conference on Information Retrieval

Abstract

The query-performance prediction task is to estimate retrieval effectiveness in the absence of relevance judgments. The task becomes highly challenging over the Web due to, among other reasons, the effect of low-quality (e.g., spam) documents on retrieval performance. To address this challenge, we present a novel prediction approach that utilizes query-independent document-quality measures. While using these measures was shown to improve Web-retrieval effectiveness, this is the first study demonstrating the clear merits of using them for query-performance prediction. Evaluation performed with large-scale Web collections shows that our methods attain prediction quality that often surpasses that of state-of-the-art predictors, including those devised specifically for Web retrieval.
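To make the task concrete, the following is a minimal sketch, not the authors' method, of how a query-independent quality signal could be turned into a predictor and how prediction quality could be measured. It assumes a hypothetical predictor that averages the quality scores of the top-ranked documents, and it assumes that prediction quality is measured as the Pearson correlation between predicted values and actual per-query effectiveness. All names (`mean_top_k_quality`, `pearson`) and all numbers are invented for illustration.

```python
from math import sqrt
from typing import Dict, Sequence


def mean_top_k_quality(ranking: Sequence[str],
                       quality: Dict[str, float],
                       k: int = 3) -> float:
    """Hypothetical predictor: mean quality score of the k highest-ranked documents."""
    top = list(ranking[:k])
    if not top:
        return 0.0
    # Assumption: a document with no known quality score counts as 0.0.
    return sum(quality.get(doc, 0.0) for doc in top) / len(top)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between predicted values and actual effectiveness."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Toy, made-up data: per-document quality scores (higher means less spam-like),
# one ranked result list per query, and an effectiveness value per query
# (e.g., average precision computed from relevance judgments).
quality = {"d1": 0.9, "d2": 0.8, "d3": 0.2, "d4": 0.1, "d5": 0.6}
rankings = {
    "q1": ["d1", "d2", "d5"],
    "q2": ["d3", "d4", "d5"],
    "q3": ["d1", "d3", "d4"],
}
actual_effectiveness = {"q1": 0.55, "q2": 0.10, "q3": 0.30}

queries = sorted(rankings)
predicted = [mean_top_k_quality(rankings[q], quality) for q in queries]
actual = [actual_effectiveness[q] for q in queries]
print("prediction quality (Pearson):", pearson(predicted, actual))
```

The predictor itself never consults relevance judgments; they are needed only afterwards, to evaluate how well the predicted values track actual effectiveness.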

Author information

Authors and Affiliations

Faculty of Industrial Engineering and Management, Technion, Haifa, 32000, Israel
Fiana Raiber & Oren Kurland

Editor information

Editors and Affiliations

Yandex, Leo Tolstoy, 16, 119021, Moscow, Russia
Pavel Serdyukov & Ilya Segalovich
Kontur Labs and Ural Federal University, Fonvizina 3-27, 620078, Yekaterinburg, Russia
Pavel Braslavski
National Research University Higher School of Economics (HSE), Pokrovskii bd 11, 109028, Moscow, Russia
Sergei O. Kuznetsov
University of Amsterdam, Turfdraagsterpad 9, 1012 XT, Amsterdam, The Netherlands
Jaap Kamps
Knowledge Media Institute, The Open University, Walton Hall, MK7 6AA, Milton Keynes, UK
Stefan Rüger
Mathematics & Computer Science Department, Emory University, 400 Dowman Drive, 30329, Atlanta, GA, USA
Eugene Agichtein
Department of Computer Science, University College London, Gower Street, WC1E 6BT, London, UK
Emine Yilmaz

About this paper

Cite this paper

Raiber, F., Kurland, O. (2013). Using Document-Quality Measures to Predict Web-Search Effectiveness. In: Serdyukov, P., et al. Advances in Information Retrieval. ECIR 2013. Lecture Notes in Computer Science, vol 7814. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36973-5_12

DOI: https://doi.org/10.1007/978-3-642-36973-5_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-36972-8
Online ISBN: 978-3-642-36973-5
eBook Packages: Computer Science, Computer Science (R0)
