Information Retrieval, Volume 13, Issue 4, pp 346–374

LETOR: A benchmark collection for research on learning to rank for information retrieval


Abstract

LETOR is a benchmark collection for research on learning to rank for information retrieval, released by Microsoft Research Asia. In this paper, we describe the details of the LETOR collection and show how it can be used in different kinds of research. Specifically, we describe how the document corpora and query sets in LETOR are selected, how the documents are sampled, how the learning features and meta information are extracted, and how the datasets are partitioned for comprehensive evaluation. We then compare several state-of-the-art learning to rank algorithms on LETOR, report their ranking performance, and discuss the results. After that, we discuss possible new research topics that can be supported by LETOR, beyond algorithm comparison. We hope that this paper helps people gain a deeper understanding of LETOR and enables more interesting research projects on learning to rank and related topics.
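To make the data layout and evaluation concrete, here is a minimal sketch (not code from the paper) of loading one LETOR-style feature file and scoring a ranked list with NDCG@10. It assumes the SVMlight-like line format used in LETOR releases ("<label> qid:<qid> <fid>:<value> ... # comment"); the file path, the rank-by-one-feature heuristic, and the cutoff k are illustrative assumptions only.

```python
# Minimal sketch: parse a LETOR-style feature file and compute NDCG@k.
# The format, path, and ranking heuristic below are assumptions for illustration.
import math
from collections import defaultdict


def load_letor(path):
    """Return {qid: [(label, feature_vector), ...]} from a LETOR-style file."""
    queries = defaultdict(list)
    with open(path) as f:
        for line in f:
            data = line.split("#", 1)[0].split()        # drop trailing comment
            if not data:
                continue
            label = int(data[0])                         # graded relevance judgment
            qid = data[1].split(":", 1)[1]               # "qid:10" -> "10"
            feats = [float(tok.split(":", 1)[1]) for tok in data[2:]]
            queries[qid].append((label, feats))
    return queries


def ndcg_at_k(labels_in_ranked_order, k=10):
    """NDCG@k with gain 2^label - 1 and log2(rank + 1) discount."""
    def dcg(labels):
        return sum((2 ** l - 1) / math.log2(i + 2) for i, l in enumerate(labels[:k]))
    ideal = dcg(sorted(labels_in_ranked_order, reverse=True))
    return dcg(labels_in_ranked_order) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    data = load_letor("Fold1/train.txt")                 # illustrative path
    qid, docs = next(iter(data.items()))
    # Rank one query's documents by their first feature value (illustrative only).
    ranked = sorted(docs, key=lambda d: d[1][0], reverse=True)
    print(qid, ndcg_at_k([label for label, _ in ranked]))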

Keywords

Learning to rank · Information retrieval · Benchmark datasets · Feature extraction


Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

Microsoft Research Asia, Beijing, China
