Fast Gradient Boosting Decision Trees with Bit-Level Data Structures

Devos, Laurens; Meert, Wannes; Davis, Jesse

doi:10.1007/978-3-030-46150-8_35

Laurens Devos¹⁴,
Wannes Meert¹⁴ &
Jesse Davis¹⁴

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 11906))

Included in the following conference series:

Joint European Conference on Machine Learning and Knowledge Discovery in Databases

2143 Accesses
3 Citations

Abstract

A gradient boosting decision tree model is a powerful machine learning method that iteratively constructs decision trees to form an additive ensemble model. The method uses the gradient of the loss function to improve the model at each iteration step. Inspired by the database literature, we exploit bitset and bitslice data structures in order to improve the run time efficiency of learning the trees. We can use these structures in two ways. First, they can represent the input data itself. Second, they can store the discretized gradient values used by the learning algorithm to construct the trees in the boosting model. Using these bit-level data structures reduces the problem of finding the best split, which involves counting of instances and summing gradient values, to counting one-bits in bit strings. Modern CPUs can efficiently count one-bits using AVX2 SIMD instructions. Empirically, our proposed improvements can result in speed-ups of 2 to up to 10 times on datasets with a large number of categorical features without sacrificing predictive performance.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 84.99; Price excludes VAT (USA)

Softcover Book: USD 109.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

1.
BitBoost is hosted on GitHub: https://github.com/laudv/bitboost.
2.
https://www.kaggle.com/c/allstate-claims-severity.
3.
http://yann.lecun.com/exdb/mnist.
4.
https://archive.ics.uci.edu/ml/datasets/covertype.
5.
https://www.kaggle.com/datasnaek/youtube-new.

References

Chambi, S., Lemire, D., Kaser, O., Godin, R.: Better bitmap performance with roaring bitmaps. Softw. Pract. Exp. 46(5), 709–719 (2016)
Article Google Scholar
Chan, C.Y., Ioannidis, Y.E.: Bitmap index design and evaluation. In: ACM SIGMOD Record, vol. 27, pp. 355–366. ACM (1998)
Google Scholar
Chen, T., Guestrin, C.: XGBoost: a scalable tree boosting system. In: Proceedings of ACM SIGKDD, pp. 785–794 (2016)
Google Scholar
Colantonio, A., Pietro, R.D.: Concise: compressed ‘n’ composable integer set. Inf. Process. Lett. 110(16), 644–650 (2010)
Article MATH Google Scholar
Freund, Y., Schapire, R.E.: A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 55(1), 119–139 (1997)
Article MathSciNet MATH Google Scholar
Friedman, J.H.: Greedy function approximation: a gradient boosting machine. Ann. Stat. 29(5), 1189–1232 (2001)
Article MathSciNet MATH Google Scholar
Ke, G., et al.: LightGBM: a highly efficient gradient boosting decision tree. In: Advances in Neural Information Processing Systems, vol. 30, pp. 3146–3154 (2017)
Google Scholar
Kurz, N., Muła, W., Lemire, D.: Faster population counts using AVX2 instructions. Comput. J. 61(1), 111–120 (2017)
Google Scholar
Mason, L., Baxter, J., Bartlett, P.L., Frean, M.R.: Boosting algorithms as gradient descent. In: Advances in Neural Information Processing Systems, pp. 512–518 (2000)
Google Scholar
Prokhorenkova, L., Gusev, G., Vorobev, A., Dorogush, A.V., Gulin, A.: CatBoost: unbiased boosting with categorical features. In: Advances in Neural Information Processing Systems, vol. 31, pp. 6638–6648 (2018)
Google Scholar
Rinfret, D., O’Neil, P., O’Neil, E.: Bit-sliced index arithmetic. In: ACM SIGMOD Record, vol. 30, pp. 47–57. ACM (2001)
Google Scholar
Schapire, R.E.: The strength of weak learnability. Mach. Learn. 5(2), 197–227 (1990)
Google Scholar
Schapire, R.E., Freund, Y., Bartlett, P., Lee, W.S.: Boosting the margin: a new explanation for the effectiveness of voting methods. Ann. Stat. 26(5), 1651–1686 (1998)
Article MathSciNet MATH Google Scholar

Download references

Acknowledgments

LD is supported by the KU Leuven Research Fund (C14/17/070), WM by VLAIO-SBO grant HYMOP (150033), and JD by KU Leuven Research Fund (C14/17/07, C32/17/036), Research Foundation - Flanders (EOS No. 30992574, G0D8819N) and VLAIO-SBO grant HYMOP (150033).

Author information

Authors and Affiliations

Department of Computer Science, KU Leuven, Leuven, Belgium
Laurens Devos, Wannes Meert & Jesse Davis

Authors

Laurens Devos
View author publications
You can also search for this author in PubMed Google Scholar
Wannes Meert
View author publications
You can also search for this author in PubMed Google Scholar
Jesse Davis
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Laurens Devos .

Editor information

Editors and Affiliations

Leuphana University, Lüneburg, Germany
Ulf Brefeld
IRISA/Inria, Rennes, France
Elisa Fromont
University of Würzburg, Würzburg, Germany
Andreas Hotho
Leiden University, Leiden, The Netherlands
Arno Knobbe
ETH Zurich, Zurich, Switzerland
Marloes Maathuis
Institut National des Sciences Appliquées, Villeurbanne, France
Céline Robardet

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Devos, L., Meert, W., Davis, J. (2020). Fast Gradient Boosting Decision Trees with Bit-Level Data Structures. In: Brefeld, U., Fromont, E., Hotho, A., Knobbe, A., Maathuis, M., Robardet, C. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2019. Lecture Notes in Computer Science(), vol 11906. Springer, Cham. https://doi.org/10.1007/978-3-030-46150-8_35

Download citation

DOI: https://doi.org/10.1007/978-3-030-46150-8_35
Published: 30 April 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-46149-2
Online ISBN: 978-3-030-46150-8
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Societies and partnerships

the ECML PKDD community (opens in a new tab)