Lossless Compression of Random Forests


Ensemble methods are among the state-of-the-art predictive modeling approaches. Applied to modern big data, these methods often require a large number of sub-learners, where the complexity of each learner typically grows with the size of the dataset. This phenomenon results in an increasing demand for storage space, which may be very costly. This problem mostly manifests in a subscriber-based environment, where a user-specific ensemble needs to be stored on a personal device with strict storage limitations (such as a cellular device). In this work we introduce a novel method for lossless compression of tree-based ensemble methods, focusing on random forests. Our suggested method is based on probabilistic modeling of the ensemble’s trees, followed by model clustering via Bregman divergence. This allows us to find a minimal set of models that provides an accurate description of the trees, and at the same time is small enough to store and maintain. Our compression scheme demonstrates high compression rates on a variety of modern datasets. Importantly, our scheme enables predictions from the compressed format and a perfect reconstruction of the original ensemble. In addition, we introduce a theoretically sound lossy compression scheme, which allows us to control the trade-off between the distortion and the coding rate.
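
The clustering step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each tree-derived model is a discrete probability distribution, uses the Kullback-Leibler divergence (one member of the Bregman family) as the dissimilarity, and runs a Lloyd-style hard clustering in which each centroid is the arithmetic mean of its members. The function names, the toy distributions, and the initial centroids are all hypothetical.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in bits; infinite if q misses a symbol that p uses."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue  # 0 * log(0 / q) is taken as 0
        if qi == 0:
            return math.inf
        total += pi * math.log2(pi / qi)
    return total


def mean_distribution(dists):
    """Arithmetic mean of distributions: the optimal Bregman centroid."""
    n = len(dists)
    return [sum(column) / n for column in zip(*dists)]


def bregman_kmeans(dists, init_centroids, n_iter=20):
    """Lloyd-style hard clustering with KL divergence (illustrative only)."""
    centroids = [list(c) for c in init_centroids]
    labels = [0] * len(dists)
    for _ in range(n_iter):
        # Assignment step: each distribution joins its closest centroid.
        labels = [
            min(range(len(centroids)),
                key=lambda k: kl_divergence(p, centroids[k]))
            for p in dists
        ]
        # Update step: each centroid becomes the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(dists, labels) if lab == k]
            if members:  # keep the old centroid if the cluster is empty
                centroids[k] = mean_distribution(members)
    return labels, centroids


# Hypothetical toy data: four distributions over a 3-symbol alphabet.
dists = [
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.1, 0.8],
    [0.2, 0.1, 0.7],
]
labels, centroids = bregman_kmeans(dists, init_centroids=[dists[0], dists[2]])
```

Under these assumptions, KL(p || q) is the expected number of extra bits per symbol paid when data distributed as p is coded with a code designed for q, so a small set of centroids plays the role of a small set of shared codebooks.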





Corresponding author

Correspondence to Amichai Painsky.


Painsky, A., Rosset, S. Lossless Compression of Random Forests. J. Comput. Sci. Technol. 34, 494–506 (2019). https://doi.org/10.1007/s11390-019-1921-0



  • entropy coding
  • lossless compression
  • lossy compression
  • random forest