
Weighted Rank Correlation in Information Retrieval Evaluation

  • Massimo Melucci
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5839)

Abstract

In Information Retrieval (IR), it is common practice to compare the rankings observed during an experiment; the statistical procedure for comparing rankings is called rank correlation. Rank correlation helps decide the success of new systems, models and techniques. The coefficient most commonly used to measure rank correlation is Kendall's τ. However, in IR the most relevant, useful or interesting items should often carry more weight in the correlation than the least relevant ones. Despite its simplicity and widespread use, Kendall's τ does little to discriminate items by importance. To overcome this drawback, this paper introduces a family τ* of rank correlation coefficients for IR that weights the correlation according to the rank of the items. The family is based on the notion of gain previously used in retrieval effectiveness measurement. The probability distribution of τ* is also provided.
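To make the idea concrete, the sketch below contrasts plain Kendall's τ with an illustrative gain-weighted variant in which each item pair is weighted by rank-based gains taken from the reference ranking, so disagreements near the top of the list count more than disagreements near the bottom. The DCG-style logarithmic gain function and the product-of-gains pair weight used here are assumptions made for illustration only; they are not the exact τ* family defined in the paper.

```python
from itertools import combinations
from math import log2

def kendall_tau(r1, r2):
    """Plain Kendall's tau over two rankings of the same items.
    r1 and r2 map each item to its rank (1 = top)."""
    items = list(r1)
    concordant = discordant = 0
    for a, b in combinations(items, 2):
        s1 = r1[a] - r1[b]
        s2 = r2[a] - r2[b]
        if s1 * s2 > 0:
            concordant += 1
        elif s1 * s2 < 0:
            discordant += 1
    n_pairs = len(items) * (len(items) - 1) / 2
    return (concordant - discordant) / n_pairs

def weighted_tau(r1, r2, gain=lambda rank: 1.0 / log2(rank + 1)):
    """Illustrative gain-weighted variant: each pair is weighted by the
    gains of the two items' ranks in the reference ranking r1, so a swap
    near the top of the list is penalised more than a swap near the
    bottom. The log-discount gain is only an example weighting, not the
    paper's tau*."""
    items = list(r1)
    num = den = 0.0
    for a, b in combinations(items, 2):
        w = gain(r1[a]) * gain(r1[b])   # pair weight from reference ranks
        s1 = r1[a] - r1[b]
        s2 = r2[a] - r2[b]
        if s1 * s2 > 0:
            num += w
        elif s1 * s2 < 0:
            num -= w
        den += w
    return num / den

# Two rankings that disagree with the reference only on the top two items
# versus only on the bottom two: plain tau scores both 0.67, while the
# weighted variant penalises the top swap more (about 0.47 vs 0.82).
ref         = {"d1": 1, "d2": 2, "d3": 3, "d4": 4}
top_swap    = {"d1": 2, "d2": 1, "d3": 3, "d4": 4}
bottom_swap = {"d1": 1, "d2": 2, "d3": 4, "d4": 3}
print(kendall_tau(ref, top_swap), kendall_tau(ref, bottom_swap))
print(weighted_tau(ref, top_swap), weighted_tau(ref, bottom_swap))
```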



Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Massimo Melucci, University of Padua, Italy
