Information Retrieval, Volume 13, Issue 3, pp 254–270

Adapting boosting for information retrieval measures

  • Qiang Wu
  • Christopher J. C. Burges
  • Krysta M. Svore
  • Jianfeng Gao
Learning to rank for information retrieval


We present a new ranking algorithm that combines the strengths of two previous methods: boosted tree classification, and LambdaRank, which has been shown to be empirically optimal for a widely used information retrieval measure. Our algorithm is based on boosted regression trees, although the ideas apply to any weak learners, and it is significantly faster in both train and test phases than the state of the art, for comparable accuracy. We also show how to find the optimal linear combination for any two rankers, and we use this method to solve the line search problem exactly during boosting. In addition, we show that starting with a previously trained model, and boosting using its residuals, furnishes an effective technique for model adaptation, and we give significantly improved results for a particularly pressing problem in web search—training rankers for markets for which only small amounts of labeled data are available, given a ranker trained on much more data from a larger market.
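The exact line search mentioned in the abstract can be illustrated with a small brute-force sketch. This is not the authors' procedure: it assumes NDCG as the retrieval measure, a single query, and a combined score of the form s1 + alpha * s2 with alpha >= 0, and every name in it (`dcg`, `ndcg`, `best_alpha`) is hypothetical. The idea it shows is that the ranking can only change at values of alpha where two documents' combined scores cross, so it is enough to evaluate the measure once inside each interval between consecutive crossings.

```python
import math
from itertools import combinations


def dcg(labels):
    # Discounted cumulative gain of relevance labels listed in ranked order.
    return sum((2 ** l - 1) / math.log2(i + 2) for i, l in enumerate(labels))


def ndcg(scores, labels):
    # NDCG of the ranking induced by sorting documents by descending score.
    ideal = dcg(sorted(labels, reverse=True))
    if ideal == 0:
        return 0.0
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    return dcg([labels[i] for i in order]) / ideal


def best_alpha(s1, s2, labels):
    # Collect every alpha > 0 at which two documents swap order under
    # the combined score s1 + alpha * s2 (assumed form of the combination).
    crossings = set()
    for i, j in combinations(range(len(s1)), 2):
        d2 = s2[i] - s2[j]
        if d2 != 0:
            a = (s1[j] - s1[i]) / d2
            if a > 0:
                crossings.add(a)

    # The ranking is constant between consecutive crossings, so one
    # candidate per interval (plus alpha = 0 and one point past the last
    # crossing) covers every distinct ranking reachable on the line.
    bounds = [0.0] + sorted(crossings)
    candidates = [0.0]
    candidates += [(lo + hi) / 2 for lo, hi in zip(bounds, bounds[1:])]
    candidates.append(bounds[-1] + 1.0)

    def value(a):
        return ndcg([x + a * y for x, y in zip(s1, s2)], labels)

    best = max(candidates, key=value)
    return best, value(best)


if __name__ == "__main__":
    alpha, score = best_alpha([3.0, 2.0, 1.0], [0.0, 1.0, 5.0], [0, 1, 2])
    print(alpha, score)
```

For several queries, the same enumeration would pool the crossing points of all queries and sum the measure over queries at each candidate. The cost here is quadratic in the number of documents per query, which is acceptable for a sketch but not tuned for large result lists.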


Keywords: Learning to rank, Boosting, Web search



Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  • Qiang Wu (1)
  • Christopher J. C. Burges (1)
  • Krysta M. Svore (1)
  • Jianfeng Gao (1)

  1. Microsoft Research, Redmond, USA
