Abstract
This paper proposes a new approach to ranking the documents retrieved by a search engine, using click-through data. The goal is to make the final ranked list of documents accurately reflect users’ preferences as expressed in the click-through data. Our approach combines the ranking produced by a traditional IR algorithm (BM25) with that produced by a machine learning algorithm (Naïve Bayes). The machine learning algorithm is trained on click-through data (queries and their associated documents), while the IR algorithm runs over the document collection. We consider several alternative strategies for combining the results obtained from click-through data with those obtained from document data. Experimental results confirm that every method of using click-through data greatly improves preference ranking over using BM25 alone, and that a linear combination of Naïve Bayes scores and BM25 scores performs best for this task. At the same time, we found that the preference ranking methods preserve relevance ranking, i.e., they perform as well as BM25 for relevance ranking.
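The score-combination idea from the abstract can be illustrated with a short sketch. The following Python snippet is a minimal illustration, not the authors' implementation: it trains a Naïve Bayes classifier on (query, clicked-document) pairs, scores documents with a standard Okapi BM25 formula over the collection, and ranks by a linear mixture of the two scores. The toy corpus, the click log, the min-max normalization, and the mixing weight alpha are all illustrative assumptions; the paper evaluates several such combination strategies.

```python
import math
from collections import Counter

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy document collection and click-through log (hypothetical data).
docs = {
    "d1": "okapi bm25 term weighting for document retrieval",
    "d2": "naive bayes text classification with word features",
    "d3": "learning to rank documents from click through data",
}
click_log = [  # (query, clicked document) pairs from a search log
    ("bm25 ranking", "d1"),
    ("click data ranking", "d3"),
    ("text classification", "d2"),
]

def bm25(query, doc_id, k1=1.2, b=0.75):
    """Standard Okapi BM25 score of one document for a query."""
    terms = docs[doc_id].split()
    tf = Counter(terms)
    n = len(docs)
    avg_len = sum(len(d.split()) for d in docs.values()) / n
    score = 0.0
    for t in query.split():
        if tf[t] == 0:
            continue
        df = sum(1 for d in docs.values() if t in d.split())
        idf = math.log((n - df + 0.5) / (df + 0.5) + 1.0)
        score += idf * tf[t] * (k1 + 1) / (
            tf[t] + k1 * (1 - b + b * len(terms) / avg_len))
    return score

# Train Naive Bayes to estimate P(document | query) from the click log.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(q for q, _ in click_log)
nb = MultinomialNB().fit(X, [d for _, d in click_log])

def rank(query, alpha=0.5):
    """Rank all docs by alpha * NB + (1 - alpha) * normalized BM25."""
    nb_probs = dict(zip(nb.classes_,
                        nb.predict_proba(vectorizer.transform([query]))[0]))
    raw = {d: bm25(query, d) for d in docs}
    top = max(raw.values()) or 1.0  # crude [0, 1] normalization
    return sorted(docs, key=lambda d: -(alpha * nb_probs.get(d, 0.0)
                                        + (1 - alpha) * raw[d] / top))

print(rank("ranking documents with click data"))
```

With alpha = 0 this reduces to pure BM25 relevance ranking, and with alpha = 1 to pure click-through preference ranking; the abstract's finding is that an intermediate mixture improves preference ranking without hurting relevance ranking.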
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhao, M., Li, H., Ratnaparkhi, A., Hon, H.W., Wang, J. (2006). Adapting Document Ranking to Users’ Preferences Using Click-Through Data. In: Ng, H.T., Leong, M.K., Kan, M.Y., Ji, D. (eds) Information Retrieval Technology. AIRS 2006. Lecture Notes in Computer Science, vol 4182. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11880592_3
DOI: https://doi.org/10.1007/11880592_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45780-0
Online ISBN: 978-3-540-46237-8
eBook Packages: Computer Science, Computer Science (R0)