Lifetime Data Analysis, Volume 23, Issue 4, pp 671–691

\(L_1\) splitting rules in survival forests

  • Hoora Moradian
  • Denis Larocque
  • François Bellavance

Abstract

The log-rank test is used as the split function in many commonly used survival tree and forest algorithms. However, the log-rank test can lose substantial power in some circumstances, especially when the hazard functions or the survival functions of the two compared groups cross. We investigate the use of the integrated absolute difference between the survival functions of the two child nodes as the splitting rule. Simulation studies and applications to real data sets show that forests built with this rule perform very well in general, and that they often outperform forests built with the log-rank splitting rule.
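
As a rough illustration of the quantity named above, the following Python sketch computes the integrated absolute difference between the Kaplan-Meier survival curves of two candidate child nodes. It is only a sketch under stated assumptions: the abstract does not give the exact formula, so the version here is unweighted and integrates over [0, tau] with tau chosen by the caller, and the function names (`kaplan_meier`, `step_value`, `l1_distance`) are hypothetical rather than taken from the article or from any package.

```python
from bisect import bisect_right


def kaplan_meier(times, events):
    """Return (jump_times, survival_values) of the Kaplan-Meier estimator.

    times  : observed times (event or censoring)
    events : 1 if the event was observed, 0 if right-censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    jump_times, surv_values = [], []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all observations tied at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            jump_times.append(t)
            surv_values.append(surv)
        n_at_risk -= removed
    return jump_times, surv_values


def step_value(jump_times, surv_values, t):
    """Evaluate the right-continuous survival step function at time t."""
    k = bisect_right(jump_times, t)
    return 1.0 if k == 0 else surv_values[k - 1]


def l1_distance(left, right, tau):
    """Integrate |S_left(t) - S_right(t)| over [0, tau].

    left, right : (times, events) pairs for the two candidate child nodes
    tau         : upper integration limit (an assumption of this sketch)
    """
    lt, ls = kaplan_meier(*left)
    rt, rs = kaplan_meier(*right)
    # Both curves are constant between consecutive grid points.
    grid = sorted({0.0, tau, *[t for t in lt + rt if t < tau]})
    total = 0.0
    for a, b in zip(grid, grid[1:]):
        diff = abs(step_value(lt, ls, a) - step_value(rt, rs, a))
        total += diff * (b - a)
    return total


# Toy example with no censoring: a larger value means better-separated children.
left_node = ([1, 2, 3], [1, 1, 1])
right_node = ([2, 4, 6], [1, 1, 1])
print(round(l1_distance(left_node, right_node, tau=6.0), 6))  # 2.0
```

In a tree-growing loop, a score of this kind would be evaluated for each candidate split and the split with the largest value retained. Any weighting by child node sizes is left out of this sketch.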

Keywords

Survival data · Right-censored data · Ensemble methods · Random forests · Survival forests

Notes

Acknowledgments

The authors would like to thank the Associate Editor and two reviewers whose comments helped in preparing an improved version of this article. This research was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) and by Le Fonds québécois de la recherche sur la nature et les technologies (FQRNT).

Supplementary material

10985_2016_9372_MOESM1_ESM.pdf (224 kb)
Supplementary material 1 (pdf 223 KB)

Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  • Hoora Moradian (1)
  • Denis Larocque (1), corresponding author
  • François Bellavance (1)
  1. Department of Decision Sciences, HEC Montréal, Montreal, Canada
