Training Efficient Tree-Based Models for Document Ranking

Conference paper, Advances in Information Retrieval (ECIR 2013)

Part of the book series: Lecture Notes in Computer Science (volume 7814)

Abstract

Gradient-boosted regression trees (GBRTs) have proven to be an effective solution to the learning-to-rank problem. This work proposes and evaluates techniques for training GBRTs that have efficient runtime characteristics. Our approach is based on the simple idea that compact, shallow, and balanced trees yield faster predictions: thus, it makes sense to incorporate some notion of execution cost during training to “encourage” trees with these topological characteristics. We propose two strategies for accomplishing this: the first, by directly modifying the node splitting criterion during tree induction, and the second, by stagewise tree pruning. Experiments on a standard learning-to-rank dataset show that the pruning approach is superior; one balanced setting yields an approximately 40% decrease in prediction latency with minimal reduction in output quality as measured by NDCG.
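The first strategy described above, modifying the node splitting criterion to account for execution cost, can be illustrated with a minimal sketch. The snippet below is not the paper's implementation; it shows one plausible form of a cost-penalized split: the usual squared-error reduction minus a penalty that grows with node depth, so that deep, latency-increasing splits must "pay for themselves" with larger gains. The trade-off knob `alpha` is a hypothetical parameter introduced here for illustration.

```python
import numpy as np

def best_split(x, y, depth, alpha=0.0):
    """Pick the threshold on one feature that maximizes squared-error
    reduction minus a depth-dependent execution-cost penalty.

    alpha > 0 discourages splits deep in the tree, biasing induction
    toward shallow, balanced topologies. Returns (threshold,
    penalized_gain), or (None, -inf) if no valid split exists.
    """
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    parent_sse = np.sum((ys - ys.mean()) ** 2)
    best_thr, best_gain = None, -np.inf
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue  # cannot place a threshold between identical values
        left, right = ys[:i], ys[i:]
        child_sse = (np.sum((left - left.mean()) ** 2)
                     + np.sum((right - right.mean()) ** 2))
        # cost-penalized gain: impurity reduction minus a depth charge
        gain = (parent_sse - child_sse) - alpha * depth
        if gain > best_gain:
            best_thr, best_gain = (xs[i - 1] + xs[i]) / 2, gain
    return best_thr, best_gain
```

With `alpha = 0` this reduces to the standard CART-style variance-reduction split; as `alpha` grows, splits deep in the tree are increasingly rejected, which is one way to "encourage" the compact topologies the paper targets during training rather than after it.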


Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Asadi, N., Lin, J. (2013). Training Efficient Tree-Based Models for Document Ranking. In: Serdyukov, P., et al. Advances in Information Retrieval. ECIR 2013. Lecture Notes in Computer Science, vol 7814. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36973-5_13

  • DOI: https://doi.org/10.1007/978-3-642-36973-5_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-36972-8

  • Online ISBN: 978-3-642-36973-5

  • eBook Packages: Computer Science
