Abstract
Gradient-boosted regression trees (GBRTs) have proven to be an effective solution to the learning-to-rank problem. This work proposes and evaluates techniques for training GBRTs that have efficient runtime characteristics. Our approach is based on the simple idea that compact, shallow, and balanced trees yield faster predictions: thus, it makes sense to incorporate some notion of execution cost during training to “encourage” trees with these topological characteristics. We propose two strategies for accomplishing this: the first, by directly modifying the node splitting criterion during tree induction, and the second, by stagewise tree pruning. Experiments on a standard learning-to-rank dataset show that the pruning approach is superior; one balanced setting yields an approximately 40% decrease in prediction latency with minimal reduction in output quality as measured by NDCG.
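The abstract does not spell out the modified splitting criterion, but the first strategy can be illustrated with a minimal sketch in Python/NumPy: the usual variance-reduction gain of a candidate split is discounted by a cost penalty that grows with the node's depth and with how lopsided the split is, so that deeper or more unbalanced splits must earn a larger error reduction to be chosen. The function names, the penalty form, and the trade-off parameter lam below are illustrative assumptions, not the paper's actual formulation.

import numpy as np

def sse(y):
    # Sum of squared errors around the mean: the standard regression-tree impurity.
    return float(np.sum((y - y.mean()) ** 2)) if len(y) else 0.0

def cost_aware_gain(y, left_mask, depth, lam=0.1):
    # Hypothetical cost-aware splitting criterion (a sketch, not the paper's exact rule).
    y_left, y_right = y[left_mask], y[~left_mask]
    if len(y_left) == 0 or len(y_right) == 0:
        return -np.inf  # degenerate split
    # Standard gain: reduction in squared error achieved by the split.
    gain = sse(y) - (sse(y_left) + sse(y_right))
    # Assumed penalty: splitting deeper in the tree costs more at prediction time,
    # and unbalanced splits (balance < 0.5) make the tree less compact.
    balance = min(len(y_left), len(y_right)) / len(y)
    penalty = lam * sse(y) * (depth + (1.0 - 2.0 * balance))
    return gain - penalty

def best_split(x, y, depth, lam=0.1):
    # Scan candidate thresholds on one feature and keep the highest cost-aware gain.
    best_t, best_g = None, -np.inf
    for t in np.unique(x)[:-1]:
        g = cost_aware_gain(y, x <= t, depth, lam)
        if g > best_g:
            best_t, best_g = t, g
    return best_t, best_g

The second strategy, stagewise pruning, would instead grow each tree normally and trim low-value subtrees between boosting stages; per the abstract, that route gave the better latency/quality trade-off in the experiments.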
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Asadi, N., Lin, J. (2013). Training Efficient Tree-Based Models for Document Ranking. In: Serdyukov, P., et al. Advances in Information Retrieval. ECIR 2013. Lecture Notes in Computer Science, vol 7814. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36973-5_13
DOI: https://doi.org/10.1007/978-3-642-36973-5_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-36972-8
Online ISBN: 978-3-642-36973-5
eBook Packages: Computer Science (R0)