
Machine Learning, Volume 108, Issue 8–9, pp. 1613–1634

CaDET: interpretable parametric conditional density estimation with decision trees and forests

  • Cyrus Cousins
  • Matteo Riondato
Article
Part of the following topical collections:
  1. Special Issue of the ECML PKDD 2019 Journal Track

Abstract

We introduce CaDET, an algorithm for parametric Conditional Density Estimation (CDE) based on decision trees and random forests. CaDET uses the empirical cross entropy impurity criterion for tree growth, which incentivizes splits that improve predictive accuracy more than the regression criteria or estimated mean-integrated-square-error used in previous works. CaDET also admits more efficient training and query procedures than existing tree-based CDE approaches, and stores only a bounded amount of information at each tree leaf, by using sufficient statistics for all computations. Previous tree-based CDE techniques produce complicated uninterpretable distribution objects, whereas CaDET may be instantiated with easily interpretable distribution families, making every part of the model easy to understand. Our experimental evaluation on real datasets shows that CaDET usually learns more accurate, smaller, and more interpretable models, and is less prone to overfitting than existing tree-based CDE approaches.
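To make the split criterion concrete, the following minimal sketch (not the authors' implementation) illustrates how an empirical cross-entropy impurity can be computed from sufficient statistics alone, assuming a univariate Gaussian leaf family; the function names, the Gaussian choice, and the toy data are illustrative assumptions, not details taken from the paper.

    import numpy as np

    def gaussian_suff_stats(y):
        """Sufficient statistics for a univariate Gaussian: (count, sum, sum of squares)."""
        y = np.asarray(y, dtype=float)
        return np.array([y.size, y.sum(), np.square(y).sum()])

    def cross_entropy_impurity(stats, eps=1e-12):
        """Empirical cross entropy (mean negative log-likelihood) of the MLE Gaussian,
        computed from the sufficient statistics, with no pass over the raw labels."""
        n, s, ss = stats
        mu = s / n
        var = max(ss / n - mu * mu, eps)  # MLE variance, floored for numerical safety
        # For the MLE Gaussian, the mean NLL simplifies to 0.5*log(2*pi*var) + 0.5.
        return 0.5 * np.log(2.0 * np.pi * var) + 0.5

    def split_score(y_left, y_right):
        """Weighted impurity of a candidate split; lower is better."""
        sl, sr = gaussian_suff_stats(y_left), gaussian_suff_stats(y_right)
        n = sl[0] + sr[0]
        return (sl[0] / n) * cross_entropy_impurity(sl) + (sr[0] / n) * cross_entropy_impurity(sr)

    # Tiny usage example: the split separating the two clusters scores lower (better).
    rng = np.random.default_rng(0)
    y = np.concatenate([rng.normal(0, 1, 100), rng.normal(5, 1, 100)])
    print(split_score(y[:100], y[100:]))   # good split: low weighted cross entropy
    print(split_score(y[::2], y[1::2]))    # bad split: mixes the clusters

Because the parent's statistics are simply the sum of the children's, a scheme like this lets candidate splits be scored incrementally and requires only a fixed-size vector per leaf, in the spirit of the bounded per-leaf storage described in the abstract.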

Keywords

Parametric models · Random forests · Sufficient statistics

Notes

Supplementary material

Supplementary material 1 (PDF, 146 kB): 10994_2019_5820_MOESM1_ESM.pdf


Copyright information

© The Author(s), under exclusive licence to Springer Science+Business Media LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Computer Science, Brown University, Providence, USA
  2. Department of Computer Science, Amherst College, Amherst, USA
