Decoding multi-click search behavior based on marginal utility

Abstract

Query logs contain rich feedback from users interacting with search engines, and various click models have been developed to interpret users' search behavior and to extract useful knowledge from these logs. However, most existing models are not designed to account for novelty bias in click behavior. The underlying hypothesis of this paper is that, given the previously clicked documents, a user tends to choose documents that provide novel relevant information to satisfy her information need, rather than redundant relevant information. Moreover, prior click models have mainly been tested on frequently occurring queries, leaving a large proportion of sparse queries uncovered. In this paper, we propose to predict users' click behavior from the perspective of utility theory (i.e., utility and marginal utility). In particular, as a complement to the examination hypothesis, we introduce a new hypothesis, the marginal utility hypothesis, to characterize the effect of novelty bias on users' click behavior by exploring the semantic divergence among documents in a result list. Moreover, to cope with sparse or unseen queries that have not been observed in the training set, we use a set of descriptive features to quantify the probability of a document being relevant and the probability of it providing marginally useful (novel) information. Finally, we conduct a series of experiments on a real-world data set to validate the proposed methods. The experimental results verify the effectiveness of interpreting users' click behavior based on the marginal utility hypothesis, especially when query sessions contain sparse queries or unseen query-document pairs.
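The core idea of combining the examination hypothesis with a marginal utility term can be sketched as follows. This is an illustrative toy model, not the paper's actual formulation or estimation procedure: the geometric position-decay parameter `gamma`, the topic vectors, and the "one minus maximum similarity" novelty score are all assumptions made for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def marginal_utility(doc_vec, clicked_vecs, relevance):
    """Relevance discounted by redundancy w.r.t. previously clicked documents.

    Novelty is taken here as 1 minus the maximum similarity to any
    already-clicked document; with no prior clicks, the document is
    maximally novel.
    """
    if not clicked_vecs:
        novelty = 1.0
    else:
        novelty = 1.0 - max(cosine(doc_vec, c) for c in clicked_vecs)
    return relevance * novelty

def click_prob(rank, doc_vec, clicked_vecs, relevance, gamma=0.8):
    """Toy click probability: P(examine at rank) * marginal utility.

    The examination term uses a simple geometric decay in rank to
    stand in for position bias.
    """
    p_exam = gamma ** (rank - 1)
    return p_exam * marginal_utility(doc_vec, clicked_vecs, relevance)
```

Under this sketch, a document whose topic vector duplicates an already-clicked result receives zero marginal utility and is predicted not to be clicked, even if it is relevant, while an equally relevant but topically distinct document keeps a high click probability.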



Notes

  1. http://research.microsoft.com/en-us/um/beijing/projects/letor/.

  2. http://www.gregsadetsky.com/aol-data/.

  3. http://www.sogou.com/labs/dl/q-e.html.

  4. http://research.microsoft.com/en-us/um/people/nickcr/wscd09/.

  5. http://research.microsoft.com/en-us/um/people/nickcr/wscd2012/.

  6. http://research.microsoft.com/en-us/um/people/nickcr/wscd2013/datasets.htm

  7. http://research.microsoft.com/en-us/um/people/nickcr/wscd2014/.

  8. https://en.wikipedia.org/wiki/Standard_score.


Author information


Corresponding author

Correspondence to Hai-Tao Yu.

About this article

Cite this article

Yu, HT., Jatowt, A., Blanco, R. et al. Decoding multi-click search behavior based on marginal utility. Inf Retrieval J 20, 25–52 (2017). https://doi.org/10.1007/s10791-016-9289-z


Keywords

  • Click model
  • Query session
  • Novelty bias
  • Marginal utility