Estimating Heterogeneous Treatment Effect on Multivariate Responses Using Random Forests

Guo, Boyi; Holscher, Hannah D.; Auvil, Loretta S.; Welge, Michael E.; Bushell, Colleen B.; Novotny, Janet A.; Baer, David J.; Burd, Nicholas A.; Khan, Naiman A.; Zhu, Ruoqing

doi:10.1007/s12561-021-09310-w

Published: 15 May 2021

Statistics in Biosciences, Volume 15, pages 545–561 (2023)

Abstract

Estimating individualized treatment effects has become one of the most popular topics in the statistics and machine learning communities in recent years. Most existing methods focus on modeling heterogeneous treatment effects for univariate outcomes. However, many biomedical studies are interested in multiple highly correlated endpoints at the same time. We propose a random forest model that simultaneously estimates individualized treatment effects on multivariate outcomes. We consider a popular study design in which covariates and outcomes are measured both before and after the intervention. The proposed model uses oblique splitting rules to partition the population space into neighborhoods that experience distinct treatment effects. An extensive simulation study suggests that the proposed method outperforms existing methods in various nonlinear settings. We further apply the proposed method to two nutrition studies investigating the effects of food consumption on gastrointestinal microbiota composition and clinical biomarkers. The method has been implemented in a freely available R package MOTE.RF at https://github.com/boyiguo1/MOTE.RF.
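The abstract combines two ideas: oblique splits (thresholding a linear combination of covariates rather than a single covariate) and treatment effects measured on several outcomes at once in a pre/post design. The following is a minimal Python sketch of how those pieces might fit together at a single tree node; it is not the MOTE.RF splitting rule. Assumptions, all introduced here for illustration: each outcome is summarized as the post-minus-pre change `delta_y`, treatment is binary (1 = treated, 0 = control), the split direction `w` is supplied by the caller rather than learned, and the score `n_L * n_R / n^2 * ||tau_L - tau_R||^2` is a hypothetical criterion that rewards children whose treatment-effect vectors differ. Function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Vector = List[float]


def mean_vector(rows: Sequence[Sequence[float]]) -> Vector:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def leaf_effect(delta_y: Sequence[Sequence[float]], treat: Sequence[int]) -> Optional[Vector]:
    """Treatment-effect vector in a node: mean change (post - pre) among treated
    minus mean change among controls. Returns None if either arm is empty."""
    treated = [d for d, a in zip(delta_y, treat) if a == 1]
    control = [d for d, a in zip(delta_y, treat) if a == 0]
    if not treated or not control:
        return None
    return [t - c for t, c in zip(mean_vector(treated), mean_vector(control))]


def project(x: Sequence[float], w: Sequence[float]) -> float:
    """Oblique split score w^T x; an axis-aligned split is the special case w = e_j."""
    return sum(xi * wi for xi, wi in zip(x, w))


def best_oblique_split(
    X: Sequence[Sequence[float]],
    delta_y: Sequence[Sequence[float]],
    treat: Sequence[int],
    w: Sequence[float],
    min_arm: int = 2,
) -> Optional[Tuple[float, float]]:
    """Return (criterion, threshold) for the best split {w^T x <= c} vs {w^T x > c}.

    Hypothetical criterion (an illustrative choice, not the MOTE.RF rule):
        n_L * n_R / n^2 * ||tau_L - tau_R||^2,
    which favors children whose treatment-effect vectors differ the most.
    Each child must contain at least min_arm treated and min_arm control units.
    """
    if min_arm < 1:
        raise ValueError("min_arm must be at least 1 so both arms are non-empty")
    n = len(X)
    scores = [project(x, w) for x in X]
    best: Optional[Tuple[float, float]] = None
    # Candidate thresholds: every distinct score except the largest,
    # which would leave the right child empty.
    for c in sorted(set(scores))[:-1]:
        left = [i for i in range(n) if scores[i] <= c]
        right = [i for i in range(n) if scores[i] > c]
        # Skip splits where a child lacks enough treated or control units.
        if any(
            sum(treat[i] for i in idx) < min_arm
            or sum(1 - treat[i] for i in idx) < min_arm
            for idx in (left, right)
        ):
            continue
        tau_l = leaf_effect([delta_y[i] for i in left], [treat[i] for i in left])
        tau_r = leaf_effect([delta_y[i] for i in right], [treat[i] for i in right])
        dist2 = sum((a - b) ** 2 for a, b in zip(tau_l, tau_r))
        crit = len(left) * len(right) / n**2 * dist2
        if best is None or crit > best[0]:
            best = (crit, c)
    return best


if __name__ == "__main__":
    # Toy data: the effect is (1, 1) when x1 + x2 < 0 and (3, -1) otherwise.
    X = [[-2, -1], [-1, -2], [-1.5, -1.5], [-2, -2], [1, 2], [2, 1], [1.5, 1.5], [2, 2]]
    treat = [1, 0, 1, 0, 1, 0, 1, 0]
    delta_y = [[1, 1], [0, 0], [1, 1], [0, 0], [3, -1], [0, 0], [3, -1], [0, 0]]
    print(leaf_effect(delta_y, treat))  # pooled effect: [2.0, 0.0]
    print(best_oblique_split(X, delta_y, treat, w=[1.0, 1.0]))  # (2.0, -3.0)
```

On the toy data, the pooled estimate (2, 0) hides two subgroups with effects (1, 1) and (3, -1); searching along w = (1, 1) recovers the boundary at threshold -3. In a forest, each tree would repeat such a search on resampled data; how MOTE.RF actually chooses split directions and scores candidate splits is specified in the paper, not in this sketch.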

Code Availability

The method has been implemented in a freely available R package MOTE.RF at https://github.com/boyiguo1/MOTE.RF.

Author information

Authors and Affiliations

Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL, 35244, USA
Boyi Guo
Department of Food Science and Human Nutrition, University of Illinois at Urbana-Champaign, Champaign, IL, 61801, USA
Hannah D. Holscher
National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign, Champaign, IL, 61801, USA
Loretta S. Auvil, Michael E. Welge & Colleen B. Bushell
USDA, ARS, Beltsville Human Nutrition Research Center, Beltsville, MD, 20705, USA
Janet A. Novotny & David J. Baer
Department of Kinesiology and Community Health, University of Illinois at Urbana-Champaign, Champaign, IL, 61801, USA
Nicholas A. Burd & Naiman A. Khan
Department of Statistics, University of Illinois at Urbana-Champaign, Champaign, IL, 61801, USA
Ruoqing Zhu

Corresponding author

Correspondence to Ruoqing Zhu (ORCID: 0000-0002-0753-5716).

About this article

Cite this article

Guo, B., Holscher, H.D., Auvil, L.S. et al. Estimating Heterogeneous Treatment Effect on Multivariate Responses Using Random Forests. Stat Biosci 15, 545–561 (2023). https://doi.org/10.1007/s12561-021-09310-w

Received: 02 July 2020
Revised: 05 February 2021
Accepted: 24 April 2021
Published: 15 May 2021
Issue Date: December 2023
DOI: https://doi.org/10.1007/s12561-021-09310-w