Training Efficient Tree-Based Models for Document Ranking

Conference paper, Advances in Information Retrieval (ECIR 2013)

Part of the book series: Lecture Notes in Computer Science (volume 7814)

Abstract

Gradient-boosted regression trees (GBRTs) have proven to be an effective solution to the learning-to-rank problem. This work proposes and evaluates techniques for training GBRTs that have efficient runtime characteristics. Our approach is based on the simple idea that compact, shallow, and balanced trees yield faster predictions: thus, it makes sense to incorporate some notion of execution cost during training to “encourage” trees with these topological characteristics. We propose two strategies for accomplishing this: the first, by directly modifying the node splitting criterion during tree induction, and the second, by stagewise tree pruning. Experiments on a standard learning-to-rank dataset show that the pruning approach is superior; one balanced setting yields an approximately 40% decrease in prediction latency with minimal reduction in output quality as measured by NDCG.
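The first strategy described above, modifying the node splitting criterion to account for execution cost, can be illustrated with a minimal sketch. The snippet below is not the paper's implementation; it shows one plausible form of a cost-penalized split: the usual squared-error reduction minus a penalty that grows with node depth, so that deep, latency-increasing splits must "pay for themselves" with larger gains. The trade-off knob `alpha` is a hypothetical parameter introduced here for illustration.

```python
import numpy as np

def best_split(x, y, depth, alpha=0.0):
    """Pick the threshold on one feature that maximizes squared-error
    reduction minus a depth-dependent execution-cost penalty.

    alpha > 0 discourages splits deep in the tree, biasing induction
    toward shallow, balanced topologies. Returns (threshold,
    penalized_gain), or (None, -inf) if no valid split exists.
    """
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    parent_sse = np.sum((ys - ys.mean()) ** 2)
    best_thr, best_gain = None, -np.inf
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue  # cannot place a threshold between identical values
        left, right = ys[:i], ys[i:]
        child_sse = (np.sum((left - left.mean()) ** 2)
                     + np.sum((right - right.mean()) ** 2))
        # cost-penalized gain: impurity reduction minus a depth charge
        gain = (parent_sse - child_sse) - alpha * depth
        if gain > best_gain:
            best_thr, best_gain = (xs[i - 1] + xs[i]) / 2, gain
    return best_thr, best_gain
```

With `alpha = 0` this reduces to the standard CART-style variance-reduction split; as `alpha` grows, splits deep in the tree are increasingly rejected, which is one way to "encourage" the compact topologies the paper targets during training rather than after it.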


Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Asadi, N., Lin, J. (2013). Training Efficient Tree-Based Models for Document Ranking. In: Serdyukov, P., et al. Advances in Information Retrieval. ECIR 2013. Lecture Notes in Computer Science, vol 7814. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36973-5_13

  • DOI: https://doi.org/10.1007/978-3-642-36973-5_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-36972-8

  • Online ISBN: 978-3-642-36973-5

  • eBook Packages: Computer Science
