Information Retrieval, Volume 13, Issue 1, pp 46–69

Click-based evidence for decaying weight distributions in search effectiveness metrics

  • Yuye Zhang
  • Laurence A. F. Park
  • Alistair Moffat

Abstract

Search effectiveness metrics are used to evaluate the quality of the answer lists returned by search services, usually based on a set of relevance judgments. One plausible way of calculating an effectiveness score for a system run is to compute the inner product of the run’s relevance vector and a “utility” vector, where the ith element in the utility vector represents the relative benefit obtained by the user of the system if they encounter a relevant document at depth i in the ranking. This paper uses such a framework to examine the user behavior patterns, and hence utility weightings, that can be inferred from a web query log. We describe a process for extrapolating user observations from query log clickthroughs, and employ this user model to measure the quality of effectiveness weighting distributions. Our results show that for measures with static distributions (that is, utility weighting schemes for which the weight vector is independent of the relevance vector), the geometric weighting model employed in the rank-biased precision effectiveness metric offers the closest fit to the user observation model. In addition, using past TREC data to estimate the likelihood of relevance, we also show that the distributions employed in the BPref and MRR metrics are the best fit among the measures for which static distributions do not exist.
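
As a concrete illustration of the inner-product framework described in the abstract, the following sketch (in Python; not code from the paper) scores a run by taking the inner product of a binary relevance vector with the geometric utility vector used by rank-biased precision, where the weight at depth i is (1 − p)·p^(i−1). The persistence value p = 0.8 and the relevance vector are illustrative assumptions.

    # Minimal sketch (not code from the paper) of inner-product scoring using
    # the geometric utility weights of rank-biased precision. The persistence
    # parameter p = 0.8 and the relevance vector are illustrative assumptions.

    def rbp_weights(depth, p=0.8):
        """Geometric utility weights: weight at depth i (1-based) is (1 - p) * p**(i - 1)."""
        return [(1 - p) * p ** (i - 1) for i in range(1, depth + 1)]

    def effectiveness_score(relevance, weights):
        """Effectiveness score as the inner product of a relevance vector and a utility vector."""
        return sum(r * w for r, w in zip(relevance, weights))

    # Example: binary relevance judgments for the top five documents of a hypothetical run.
    relevance = [1, 0, 1, 1, 0]
    print(effectiveness_score(relevance, rbp_weights(len(relevance), p=0.8)))
    # -> 0.4304 (= 0.2 + 0.128 + 0.1024)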

Keywords

Effectiveness metric · Query log · Clickthrough · Rank-biased precision · Average precision · Reciprocal rank · BPref

Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  • Yuye Zhang (1)
  • Laurence A. F. Park (1)
  • Alistair Moffat (1)

  1. Department of Computer Science and Software Engineering, The University of Melbourne, Melbourne, Australia
