Information Retrieval, Volume 16, Issue 5, pp 584–628

The whens and hows of learning to rank for web search

  • Craig Macdonald
  • Rodrygo L. T. Santos
  • Iadh Ounis

Abstract

Web search engines are increasingly deploying many features, combined using learning to rank techniques. However, various practical questions remain concerning the manner in which learning to rank should be deployed. For instance, a sample of documents with sufficient recall is used, such that re-ranking of the sample by the learned model brings the relevant documents to the top. However, the properties of the document sample such as when to stop ranking—i.e. its minimum effective size—remain unstudied. Similarly, effective listwise learning to rank techniques minimise a loss function corresponding to a standard information retrieval evaluation measure. However, the appropriate choice of how to calculate the loss function—i.e. the choice of the learning evaluation measure and the rank depth at which this measure should be calculated—is as yet unclear. In this paper, we address all of these issues by formulating various hypotheses and research questions, before performing exhaustive experiments using multiple learning to rank techniques and different types of information needs on the ClueWeb09 and LETOR corpora. Among many conclusions, we find, for instance, that the smallest effective sample for a given query set is dependent on the type of information need of the queries, the document representation used during sampling and the test evaluation measure. As the sample size is varied, the selected features markedly change—for instance, we find that link analysis features are favoured for smaller document samples. Moreover, despite reflecting a more realistic user model, the recently proposed ERR measure is not as effective as the traditional NDCG as a learning loss function. Overall, our comprehensive experiments provide the first empirical derivation of best practices for learning to rank deployments.
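The abstract contrasts NDCG and ERR as candidate learning evaluation measures for the listwise loss function. As a rough illustration only (not the paper's implementation), the minimal Python sketch below computes NDCG@k and ERR@k from a list of graded relevance labels; the function names, the assumed 0–4 grade scale and the example ranking are illustrative assumptions.

```python
import math

def dcg_at_k(labels, k):
    """Discounted cumulative gain with the (2^rel - 1) gain for graded labels."""
    return sum((2 ** rel - 1) / math.log2(i + 2) for i, rel in enumerate(labels[:k]))

def ndcg_at_k(labels, k):
    """NDCG@k: DCG of the ranking divided by the DCG of the ideal (sorted) ranking."""
    ideal = dcg_at_k(sorted(labels, reverse=True), k)
    return dcg_at_k(labels, k) / ideal if ideal > 0 else 0.0

def err_at_k(labels, k, max_grade=4):
    """ERR@k: expected reciprocal rank under a cascade user model (Chapelle et al.).
    max_grade=4 is an assumed maximum relevance grade."""
    err, p_not_satisfied = 0.0, 1.0
    for i, rel in enumerate(labels[:k]):
        r = (2 ** rel - 1) / (2 ** max_grade)  # probability the user is satisfied at rank i+1
        err += p_not_satisfied * r / (i + 1)
        p_not_satisfied *= (1 - r)
    return err

# Hypothetical example: graded labels (0..4) of the top 5 documents in a ranking.
ranking = [3, 0, 2, 1, 0]
print(ndcg_at_k(ranking, 5), err_at_k(ranking, 5))
```

In a listwise learner, one of these measures, evaluated at a chosen rank depth k, would serve as the quantity the training procedure seeks to maximise on the training queries.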

Keywords

Learning to rank · Evaluation · Web search · Sample size · Document representations · Loss function


Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  • Craig Macdonald (1)
  • Rodrygo L. T. Santos (1)
  • Iadh Ounis (1)

  1. School of Computing Science, University of Glasgow, Scotland, UK
