
Parallel Training GBRT Based on KMeans Histogram Approximation for Big Data

  • Rong Gu
  • Lei Jin
  • Yongwei Wu
  • Jingying Qu
  • Tao Wang
  • Xiaojun Wang
  • Chunfeng Yuan
  • Yihua Huang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9529)

Abstract

Gradient Boosting Regression Tree (GBRT), one of the state-of-the-art ranking algorithms widely used in industry, faces challenges in the big data era. As dataset sizes grow rapidly, the iterative training process of GBRT becomes very time-consuming over large-scale data. In this paper, we aim to speed up the training of each tree in the GBRT framework. First, we propose a novel KMeans histogram building algorithm that has lower time complexity and is more efficient than the state-of-the-art histogram building method. We then put forward an approximation algorithm that combines kernel density estimation with the histogram technique to improve accuracy. We conduct a variety of experiments on both the public Learning To Rank (LTR) benchmark datasets and large-scale real-world datasets from the Baidu search engine. The experimental results show that our parallel training algorithm outperforms the state-of-the-art parallel GBRT algorithm with a nearly 2x speedup and better accuracy. Our algorithm also achieves near-linear scalability.
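To make the abstract's two ideas concrete, below is a minimal, illustrative Python sketch of (a) building a fixed-size feature histogram with 1-D KMeans, so that bin centers adapt to the data distribution rather than being fixed-width, and (b) smoothing the bin counts with a Gaussian kernel density estimate. The function names, the plain Lloyd's iterations, and the smoothing bandwidth are assumptions made for illustration only; the paper's actual algorithm, its parallel histogram merging, and its accuracy guarantees are not reproduced here.

    import numpy as np

    def kmeans_histogram(values, num_bins, num_iters=10, seed=0):
        # Summarize one feature column with a 1-D KMeans "histogram":
        # bin centers are cluster centroids, bin counts are cluster sizes.
        # (Sketch only; not the paper's exact algorithm.)
        rng = np.random.default_rng(seed)
        centers = rng.choice(values, size=num_bins, replace=False).astype(float)
        for _ in range(num_iters):
            # Assign every value to its nearest center (Lloyd's step).
            assign = np.abs(values[:, None] - centers[None, :]).argmin(axis=1)
            for k in range(num_bins):
                members = values[assign == k]
                if members.size:
                    centers[k] = members.mean()
        counts = np.bincount(assign, minlength=num_bins)
        order = np.argsort(centers)
        return centers[order], counts[order]

    def kde_smoothed_counts(centers, counts, bandwidth=0.1):
        # Redistribute bin counts with a Gaussian kernel over the bin
        # centers: a hypothetical way to combine KDE with the histogram
        # so that candidate split points between centers are estimated
        # more smoothly than from raw bin counts.
        diffs = (centers[:, None] - centers[None, :]) / bandwidth
        weights = np.exp(-0.5 * diffs ** 2)
        weights /= weights.sum(axis=1, keepdims=True)
        return weights @ counts

    # Example: summarize 100k feature values with a 32-bin KMeans histogram.
    feature = np.random.default_rng(1).normal(size=100_000)
    centers, counts = kmeans_histogram(feature, num_bins=32)
    smoothed = kde_smoothed_counts(centers, counts, bandwidth=0.1)

In a parallel GBRT trainer of this kind, each worker would build such a histogram per feature over its data partition, and the per-partition histograms would be merged before split points are chosen, in the spirit of histogram-based parallel decision tree learning.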

Keywords

Learning To Rank · Gradient boosting regression tree · Parallel computing · KMeans histogram · Kernel density estimation

Acknowledgments

This work is funded in part by the Jiangsu Province Industry Support Program (BE2014131), China NSF Grants (No. 61572250 and No. 61223003), and the Baidu Most Valuable Topics Open Research Project.


Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Rong Gu (1)
  • Lei Jin (1)
  • Yongwei Wu (2)
  • Jingying Qu (2)
  • Tao Wang (2)
  • Xiaojun Wang (2)
  • Chunfeng Yuan (1)
  • Yihua Huang (1)
  1. National Key Laboratory for Novel Software Technology, Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing University, Nanjing, China
  2. Baidu, Inc., Beijing, China
