
Highly adaptive regression trees

Special Issue · Evolutionary Intelligence

Abstract

The development of machine learning methods that are both accurate and interpretable is of paramount importance in healthcare and many other fields. The Highly Adaptive Lasso (HAL) has been shown to have predictive performance on par with state-of-the-art algorithms. HAL performs lasso-regularized regression of the outcome on a tensor product of indicator basis functions. In this paper, we show that this basis can be represented as a non-recursive partitioning of the feature space, and we propose a method for mapping the partitioning implied by HAL to a recursive partitioning. Such a mapping allows HAL to be represented as a decision tree, thereby making the algorithm's predictions interpretable. We refer to this post-hoc interpretability method as Highly Adaptive Regression Trees (HART). We provide a set of algorithms to construct the mapping and conveniently visualize the resulting tree. Using real data, we show that HAL's predictive performance is on par with state-of-the-art methods, and we demonstrate the construction and interpretation of HARTs.
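To make the abstract's description concrete, below is a minimal sketch of the estimator it summarizes: a zero-order tensor-product indicator basis with knots at the observed covariate values, followed by lasso-regularized regression. This is an illustrative sketch only, not the authors' implementation; the hal_basis helper, the simulated data, and the use of scikit-learn's LassoCV are all assumptions.

    # Minimal sketch of HAL-style fitting: indicator basis + lasso.
    # All names here are illustrative, not from the paper.
    import itertools
    import numpy as np
    from sklearn.linear_model import LassoCV

    def hal_basis(X_train, X_eval):
        # Zero-order HAL basis: for each non-empty covariate subset and each
        # observed knot, phi(x) = 1{x[s] >= knot[s] for every s in the subset}.
        n, d = X_train.shape
        cols = []
        for size in range(1, d + 1):
            for subset in map(list, itertools.combinations(range(d), size)):
                for i in range(n):
                    knot = X_train[i, subset]
                    cols.append(np.all(X_eval[:, subset] >= knot, axis=1))
        return np.column_stack(cols).astype(float)

    rng = np.random.default_rng(0)
    X = rng.uniform(size=(200, 3))
    y = np.sin(4 * X[:, 0]) + X[:, 1] * X[:, 2] + rng.normal(scale=0.1, size=200)

    H = hal_basis(X, X)            # n x (n * (2^d - 1)) indicator design matrix
    fit = LassoCV(cv=5).fit(H, y)  # the L1 penalty keeps a sparse set of knots
    print("active basis functions:", int(np.sum(fit.coef_ != 0)))

Each basis function surviving the lasso corresponds to a region of the feature space, which is what makes the tree representation described above possible.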


Data availability

The datasets analysed during the current study are all available in the public UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/index.php). Individual links are provided below; a sketch for loading one of them follows the list.

  • Breast Cancer: https://archive.ics.uci.edu/ml/datasets/breast+cancer
  • Cardio: https://archive.ics.uci.edu/ml/datasets/cardiotocography
  • Drugs: https://archive.ics.uci.edu/ml/datasets/Drug+consumption+
  • Wine: https://archive.ics.uci.edu/ml/datasets/wine+quality
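As an illustration of data access, the sketch below pulls the wine quality data directly from the repository. The direct winequality-red.csv path and the semicolon delimiter are assumptions based on UCI's usual file layout, not details from the paper, and may change if the archive is reorganized.

    # Hedged example of loading one of the listed UCI datasets.
    import pandas as pd

    # Assumed direct-download path under the UCI archive; verify before relying on it.
    WINE_URL = ("https://archive.ics.uci.edu/ml/machine-learning-databases/"
                "wine-quality/winequality-red.csv")

    wine = pd.read_csv(WINE_URL, sep=";")        # this file is semicolon-delimited
    X = wine.drop(columns="quality").to_numpy()  # physicochemical features
    y = wine["quality"].to_numpy()               # integer quality scores
    print(wine.shape)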

References

  1. Pirracchio R, Petersen ML, Carone M, Rigon MR, Chevret S, van der Laan MJ (2015) Mortality prediction in intensive care units with the Super ICU Learner Algorithm (SICULA): a population-based study. Lancet Respir Med 3(1):42–52. https://doi.org/10.1016/S2213-2600(14)70239-5


  2. Churpek MM, Yuen TC, Winslow C, Meltzer DO, Kattan MW, Edelson DP (2016) Multicenter comparison of machine learning methods and conventional regression for predicting clinical deterioration on the wards. Crit Care Med 44(2):368. https://doi.org/10.1097/CCM.0000000000001571


  3. Acion L, Kelmansky D, van der Laan M, Sahker E, Jones D, Arndt S (2017) Use of a machine learning framework to predict substance use disorder treatment success. PLoS ONE 12(4):e0175383. https://doi.org/10.1371/journal.pone.0175383


  4. Rosellini AJ, Dussaillant F, Zubizarreta JR, Kessler RC, Rose S (2018) Predicting posttraumatic stress disorder following a natural disaster. J Psychiatr Res 96:15–22. https://doi.org/10.1016/j.jpsychires.2017.09.010


  5. Bajari P, Nekipelov D, Ryan SP, Yang M (2015) Machine learning methods for demand estimation. Am Econ Rev 105(5):481–485


  6. Amat C, Michalski T, Stoltz G (2018) Fundamentals and exchange rate forecastability with simple machine learning methods. J Int Money Finance 88:1–24


  7. Zeineddine H, Braendle U, Farah A (2021) Enhancing prediction of student success: automated machine learning approach. Comput Electr Eng 89:106903


  8. Huang X-L, Ma X, Hu F (2018) Machine learning and intelligent communications. Mob Netw Appl 23(1):68–70


  9. Kusner MJ, Loftus JR, Russell C, Silva R (2017) Counterfactual fairness. arXiv e-prints, arXiv:1703.06856 [stat.ML]. https://doi.org/10.48550/arXiv.1703.06856

  10. Podgorelec V, Kokol P, Stiglic B, Rozman I (2002) Decision trees: an overview and their use in medicine. J Med Syst 26(5):445–463. https://doi.org/10.1023/a:1016409317640


  11. Chern C-C, Chen Y-J, Hsiao B (2019) Decision tree-based classifier in providing telehealth service. BMC Med Inform Decis Mak 19(1):1–15. https://doi.org/10.1186/s12911-019-0825-9


  12. Venkatasubramaniam A, Wolfson J, Mitchell N, Barnes T, JaKa M, French S (2017) Decision trees in epidemiological research. Emerg Themes Epidemiol 14(1):1–12. https://doi.org/10.1186/s12982-017-0064-4


  13. Zhang H, Legro RS, Zhang J, Zhang L, Chen X, Huang H, Casson PR, Schlaff WD, Diamond MP, Krawetz SA (2010) Decision trees for identifying predictors of treatment effectiveness in clinical trials and its application to ovulation in a study of women with polycystic ovary syndrome. Hum Reprod 25(10):2612–2621. https://doi.org/10.1093/humrep/deq210


  14. Friedman J, Hastie T, Tibshirani R (2001) The elements of statistical learning, vol 1. Springer, New York

  15. Benkeser D, van der Laan M (2016) The highly adaptive lasso estimator. In: 2016 IEEE international conference on data science and advanced analytics (DSAA). IEEE, pp 689–696. https://doi.org/10.1109/DSAA.2016.93

  16. van der Laan M (2017) A generally efficient targeted minimum loss based estimator based on the highly adaptive lasso. Int J Biostat. https://doi.org/10.1515/ijb-2015-0097


  17. Tibshirani R (1996) Regression shrinkage and selection via the lasso. J Royal Stat Soc Series B 58(1):267–288. https://doi.org/10.1111/j.2517-6161.1996.tb02080.x


  18. Breiman L, Friedman J, Stone CJ, Olshen RA (1984) Classification and regression trees. CRC Press, Boca Raton


  19. Bollig B, Wegener I (1996) Improving the variable ordering of OBDDs is NP-complete. IEEE Trans Comput 45(9):993–1002. https://doi.org/10.1109/12.537122


  20. Salzberg SL (1994) Book review: C4.5: Programs for machine learning by J. Ross Quinlan. Morgan Kaufmann Publishers, Inc., 1993. Mach Learn 16(3):235–240

  21. Kass GV (1980) An exploratory technique for investigating large quantities of categorical data. J Royal Stat Soc Series C (Appl Stat) 29(2):119–127. https://doi.org/10.2307/2986296


  22. Oliver JJ, Dowe DL, Wallace C (1992) Inferring decision graphs using the minimum message length principle. In: Proceedings of the 5th Australian joint conference on artificial intelligence. World Scientific, pp 361–367

  23. Zwitter M, Soklic M (1988) Breast cancer. UCI Machine Learning Repository

  24. Ayres-de-Campos D, Bernardes J, Garrido A, Marques-de-Sa J, Pereira-Leite L (2000) SisPorto 2.0: a program for automated analysis of cardiotocograms. J Matern-Fetal Med 9(5):311–318. https://doi.org/10.1002/1520-6661


  25. Fehrman E, Muhammad AK, Mirkes EM, Egan V, Gorban AN (2017) The five factor model of personality and evaluation of drug consumption risk. In: Data science. Springer, Cham, pp 231–242. https://doi.org/10.1007/978-3-319-55723-6_18

  26. Aeberhard S, Coomans D, De Vel O (1992) Comparison of classifiers in high dimensional settings. Tech. Rep. 92-02, Dept. of Mathematics and Statistics, James Cook University, North Queensland, Australia. https://doi.org/10.1016/0031-3203(94)90145-7

  27. Lao C, Elwood M, Kuper-Hommel M, Campbell I, Lawrenson R (2021) Impact of menopausal status on risk of metastatic recurrence of breast cancer. Menopause 28(10):1085–1092. https://doi.org/10.1097/GME.0000000000001817


  28. Tantau T. The TikZ and PGF packages. https://tikz.dev/

  29. Zivanovic S. Forest. https://github.com/sasozivanovic/forest


Acknowledgements

All figures were created using the TikZ [28] and Forest [29] packages in LaTeX.

Funding

The research leading to these results received funding from the National Science Foundation under Grant Agreement No. 2015540.

Author information


Contributions

SN and DB both conceptualized the research, developed the methodology, conducted the analyses, and wrote and edited this work.

Corresponding author

Correspondence to Sohail Nizam.

Ethics declarations

Conflict of interest

The authors have no competing interests to declare that are relevant to the content of this article.

Consent to participate

Not applicable. The research leading to these results did not involve any live subjects.

Consent for publication

All authors and relevant institutions have given consent for this work to be published.

Code availability

The code used to produce these results is available in the Supplementary materials.


Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Nizam, S., Benkeser, D. Highly adaptive regression trees. Evol. Intel. 17, 535–547 (2024). https://doi.org/10.1007/s12065-023-00836-0

