Estimating Heterogeneous Treatment Effect on Multivariate Responses Using Random Forests

Abstract

Estimating individualized treatment effects has become one of the most popular topics in the statistics and machine learning communities in recent years. Most existing methods focus on modeling heterogeneous treatment effects for univariate outcomes. However, many biomedical studies are interested in studying multiple highly correlated endpoints at the same time. We propose a random forest model that simultaneously estimates individualized treatment effects on multivariate outcomes. We consider a popular study design in which covariates and outcomes are measured both before and after the intervention. The proposed model uses oblique splitting rules to partition the population space into neighborhoods that experience distinct treatment effects. An extensive simulation study suggests that the proposed method outperforms existing methods in various nonlinear settings. We further apply the proposed method to two nutrition studies investigating the effects of food consumption on gastrointestinal microbiota composition and clinical biomarkers. The method has been implemented in a freely available R package, MOTE.RF, at https://github.com/boyiguo1/MOTE.RF.
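To make the idea of an oblique split concrete, the following is a minimal sketch in Python and not the MOTE.RF implementation. It assumes a randomized binary treatment, a fixed projection direction `w`, and outcome changes computed as post-intervention minus pre-intervention values. The function names, the toy data, and the choice of `w` and `threshold` are all hypothetical; the abstract does not say how the proposed model chooses its split directions.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def oblique_split(X, w, threshold):
    """Split row indices by the hyperplane w . x = threshold.

    Unlike an axis-aligned split, the rule uses a linear
    combination of several covariates at once.
    """
    left, right = [], []
    for i, x in enumerate(X):
        score = sum(wj * xj for wj, xj in zip(w, x))
        (left if score <= threshold else right).append(i)
    return left, right


def effect_in_node(idx, treat, delta_y):
    """Treated-minus-control difference in mean outcome change,
    computed separately for each outcome within one node."""
    treated = [delta_y[i] for i in idx if treat[i] == 1]
    control = [delta_y[i] for i in idx if treat[i] == 0]
    n_outcomes = len(delta_y[0])
    return [
        mean([row[k] for row in treated]) - mean([row[k] for row in control])
        for k in range(n_outcomes)
    ]


# Toy data (hypothetical): two baseline covariates, two outcomes.
X = [(1, 1), (2, 0.5), (1.5, 1), (0.5, 2),
     (-1, -1), (-2, 0), (-0.5, -1), (-1, -2)]
treat = [1, 0, 1, 0, 1, 0, 1, 0]
# Outcome change = post-intervention minus pre-intervention value.
delta_y = [(2.0, 1.0), (0.0, 0.0), (2.0, 1.0), (0.0, 0.0),
           (0.5, -1.0), (0.5, 0.0), (0.5, -1.0), (0.5, 0.0)]

w = (1.0, 1.0)      # assumed projection direction
threshold = 0.0     # assumed cut point

left, right = oblique_split(X, w, threshold)
effect_left = effect_in_node(left, treat, delta_y)
effect_right = effect_in_node(right, treat, delta_y)
print(effect_left, effect_right)
```

In this toy example the two sides of the hyperplane show different treatment effects on both outcomes at once, which is the kind of neighborhood a multivariate treatment-effect forest aims to find. A full method would also need a criterion for choosing `w` and `threshold`, and would average over many trees.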

Code Availability

The method has been implemented in a freely available R package MOTE.RF at https://github.com/boyiguo1/MOTE.RF.

Author information

Corresponding author

Correspondence to Ruoqing Zhu.

About this article

Cite this article

Guo, B., Holscher, H.D., Auvil, L.S. et al. Estimating Heterogeneous Treatment Effect on Multivariate Responses Using Random Forests. Stat Biosci (2021). https://doi.org/10.1007/s12561-021-09310-w

Keywords

  • Individualized treatment effect
  • Microbiota
  • Multivariate
  • Random forests
  • Personalized nutrition