Abstract

Although there is no consensus on a precise definition of interpretability, several requirements can be identified: "simplicity, stability, and accuracy", rarely all satisfied by existing interpretable methods. The structure and stability of random forests make them good candidates for improving the performance of interpretable algorithms. The first part of this chapter focuses on rule learning models, which are simple and highly predictive algorithms but are often unstable with respect to small data perturbations. A new algorithm called SIRUS, designed as the extraction of a compact rule ensemble from a random forest, considerably improves stability over state-of-the-art competitors while preserving simplicity and accuracy. The second part of this chapter is dedicated to post-hoc methods, in particular variable importance measures for random forests. An asymptotic analysis of Breiman's MDA (Mean Decrease Accuracy), conducted from a sensitivity analysis perspective, shows that this measure is strongly biased. The Sobol-MDA algorithm is introduced to fix the flaws of the MDA by replacing permutations with projections. An extension to Shapley effects, an efficient importance measure when input variables are dependent, is then proposed with the SHAFF algorithm.
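The MDA mentioned in the abstract is Breiman's permutation importance: the increase in prediction error after randomly permuting one input column, which breaks its association with the output. As a minimal sketch of that mechanism only (not the forest-based estimator analyzed in the chapter), the example below computes a permutation-MDA for an arbitrary predictor; the `predict` stand-in and the simulated data are assumptions made for illustration.

```python
import numpy as np

def permutation_mda(predict, X, y, n_repeats=10, seed=None):
    """Breiman-style MDA: mean increase in quadratic risk when one
    input column is randomly permuted, breaking its link with y."""
    rng = np.random.default_rng(seed)
    base_error = np.mean((y - predict(X)) ** 2)
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        errors = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            errors.append(np.mean((y - predict(X_perm)) ** 2))
        importances[j] = np.mean(errors) - base_error
    return importances

# Toy setup: y depends only on the first of three inputs, and the
# fitted model is replaced by the true regression function (an
# assumption for the example; in practice predict would be a forest).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = 2.0 * X[:, 0]
predict = lambda Z: 2.0 * Z[:, 0]

imp = permutation_mda(predict, X, y, seed=1)
# Only the first coordinate gets a positive MDA; the two noise
# coordinates leave the error unchanged.
```

When inputs are dependent, permuting a column generates samples outside the support of the input distribution; this extrapolation is a source of the bias that the Sobol-MDA avoids by replacing permutations with projections.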


Notes

  1. See Table 1 in Sect. 5 of the Supplementary Material in [5] for dataset details.
  2. See Sect. 2 of the Supplementary Material in [5] for details on the bi-objective procedure.
  3. See Sect. 6 of the Supplementary Material in [5] for a detailed definition of this criterion.
  4. See Bénard et al. [8] for details.

References

  1. Aas K, Jullum M, Løland A (2019) Explaining individual predictions when features are dependent: more accurate approximations to Shapley values. Preprint. arXiv:1903.10464
  2. Alelyani S, Zhao Z, Liu H (2011) A dilemma in assessing stability of feature selection algorithms. In: 13th IEEE international conference on high performance computing & communication. IEEE, Piscataway, pp 701–707
  3. Archer K, Kimes R (2008) Empirical characterization of random forest variable importance measures. Comput Stat Data Anal 52:2249–2260
  4. Basu S, Kumbier K, Brown J, Yu B (2018) Iterative random forests to discover predictive and stable high-order interactions. Proc Natl Acad Sci 115:1943–1948
  5. Bénard C, Biau G, Da Veiga S, Scornet E (2021) Interpretable random forests via rule extraction. In: International conference on artificial intelligence and statistics, PMLR, pp 937–945
  6. Bénard C, Biau G, Da Veiga S, Scornet E (2021) SHAFF: fast and consistent SHApley eFfect estimates via random Forests. Preprint. arXiv:2105.11724
  7. Bénard C, Biau G, Da Veiga S, Scornet E (2021) SIRUS: Stable and Interpretable RUle Set for classification. Electron J Stat 15:427–505
  8. Bénard C, Da Veiga S, Scornet E (2021) MDA for random forests: inconsistency, and a practical solution via the Sobol-MDA. Preprint. arXiv:2102.13347
  9. Boulesteix AL, Slawski M (2009) Stability and aggregation of ranked gene lists. Brief Bioinform 10:556–568
  10. Bousquet O, Elisseeff A (2002) Stability and generalization. J Mach Learn Res 2:499–526
  11. Breiman L (1996) Bagging predictors. Mach Learn 24:123–140
  12. Breiman L (1996) Out-of-bag estimation. Technical report, Statistics Department, University of California, Berkeley
  13. Breiman L (2001) Random forests. Mach Learn 45:5–32
  14. Breiman L (2001) Statistical modeling: the two cultures (with comments and a rejoinder by the author). Stat Sci 16:199–231
  15. Breiman L (2003) Setting up, using, and understanding random forests v3.1. https://www.stat.berkeley.edu/~breiman/Using_random_forests_V3.1.pdf
  16. Breiman L, Friedman J, Olshen R, Stone C (1984) Classification and regression trees. Chapman & Hall/CRC, Boca Raton
  17. Broto B, Bachoc F, Depecker M (2020) Variance reduction for estimation of Shapley effects and adaptation to unknown input distribution. SIAM/ASA J Uncertain Quant 8:693–716
  18. Candes E, Fan Y, Janson L, Lv J (2016) Panning for gold: Model-X knockoffs for high-dimensional controlled variable selection. Preprint. arXiv:1610.02351
  19. Chao A, Chazdon R, Colwell R, Shen TJ (2006) Abundance-based similarity indices and their estimation when there are unseen species in samples. Biometrics 62:361–371
  20. Chastaing G, Gamboa F, Prieur C (2012) Generalized Hoeffding-Sobol decomposition for dependent variables: application to sensitivity analysis. Electron J Stat 6:2420–2448
  21. Chen T, Guestrin C (2016) XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York, pp 785–794
  22. Clark P, Niblett T (1989) The CN2 induction algorithm. Mach Learn 3:261–283
  23. Cohen W (1995) Fast effective rule induction. In: Proceedings of the twelfth international conference on machine learning. Morgan Kaufmann Publishers Inc., San Francisco, pp 115–123
  24. Cohen W, Singer Y (1999) A simple, fast, and effective rule learner. In: Proceedings of the sixteenth national conference on artificial intelligence and eleventh conference on innovative applications of artificial intelligence. AAAI Press, Palo Alto, pp 335–342
  25. Covert I, Lee SI (2020) Improving KernelSHAP: practical Shapley value estimation via linear regression. Preprint. arXiv:2012.01536
  26. Covert I, Lundberg S, Lee SI (2020) Understanding global feature contributions through additive importance measures. Preprint. arXiv:2004.00668
  27. Crawford L, Flaxman S, Runcie D, West M (2019) Variable prioritization in nonlinear black box methods: a genetic association case study. Ann Appl Stat 13:958
  28. Dembczyński K, Kotłowski W, Słowiński R (2008) Maximum likelihood rule ensembles. In: Proceedings of the 25th international conference on machine learning. ACM, New York, pp 224–231
  29. Dembczyński K, Kotłowski W, Słowiński R (2010) ENDER: a statistical framework for boosting decision rules. Data Mining Knowl Discov 21:52–90
  30. Devroye L, Wagner T (1979) Distribution-free inequalities for the deleted and holdout error estimates. IEEE Trans Inf Theory 25:202–207
  31. Doshi-Velez F, Kim B (2017) Towards a rigorous science of interpretable machine learning. Preprint. arXiv:1702.08608
  32. Dua D, Graff C (2017) UCI machine learning repository. http://archive.ics.uci.edu/ml
  33. Efron B, Hastie T, Johnstone I, Tibshirani R (2004) Least angle regression. Ann Stat 32:407–499
  34. Erhan D, Bengio Y, Courville A, Vincent P (2009) Visualizing higher-layer features of a deep network. University of Montreal 1341:1
  35. Esposito F, Malerba D, Semeraro G, Kay J (1997) A comparative analysis of methods for pruning decision trees. IEEE Trans Patt Anal Mach Intell 19:476–491
  36. Fokkema M (2017) PRE: an R package for fitting prediction rule ensembles. Preprint. arXiv:1707.07149
  37. Freitas A (2014) Comprehensible classification models: a position paper. ACM SIGKDD Explorations Newsletter 15:1–10
  38. Freund Y, Schapire R (1996) Experiments with a new boosting algorithm. In: Thirteenth international conference on machine learning, Citeseer, vol 96, pp 148–156
  39. Friedman J (2001) Greedy function approximation: a gradient boosting machine. Ann Stat 29:1189–1232
  40. Friedman J, Hastie T, Tibshirani R (2001) The elements of statistical learning, vol 1. Springer series in statistics. Springer, New York
  41. Friedman J, Popescu B, et al. (2003) Importance sampled learning ensembles. J Mach Learn Res 4:94305
  42. Friedman J, Popescu B, et al. (2008) Predictive learning via rule ensembles. Ann Appl Stat 2:916–954
  43. Fürnkranz J (1999) Separate-and-conquer rule learning. Artif Intell Rev 13:3–54
  44. Fürnkranz J, Widmer G (1994) Incremental reduced error pruning. In: Proceedings of the 11th international conference on machine learning. Morgan Kaufmann Publishers Inc., San Francisco, pp 70–77
  45. Genuer R, Poggi JM, Tuleau-Malot C (2010) Variable selection using random forests. Patt Recogn Lett 31:2225–2236
  46. Ghanem R, Higdon D, Owhadi H (2017) Handbook of uncertainty quantification. Springer, New York
  47. Gregorutti B, Michel B, Saint-Pierre P (2017) Correlation and variable importance in random forests. Stat Comput 27:659–678
  48. Guidotti R, Monreale A, Ruggieri S, Turini F, Giannotti F, Pedreschi D (2018) A survey of methods for explaining black box models. ACM Comput Surv 51:1–42
  49. Guyon I, Weston J, Barnhill S, Vapnik V (2002) Gene selection for cancer classification using support vector machines. Mach Learn 46:389–422
  50. He Z, Yu W (2010) Stable feature selection for biomarker discovery. Comput Biol Chem 34:215–225
  51. Iooss B, Lemaître P (2015) A review on global sensitivity analysis methods. Springer, Boston, pp 101–122
  52. Iooss B, Prieur C (2017) Shapley effects for sensitivity analysis with correlated inputs: comparisons with Sobol’ indices, numerical estimation and applications. Preprint. arXiv:1707.01334
  53. Ish-Horowicz J, Udwin D, Flaxman S, Filippi S, Crawford L (2019) Interpreting deep neural networks through variable importance. Preprint. arXiv:1901.09839
  54. Ishwaran H (2007) Variable importance in binary regression trees and forests. Electron J Stat 1:519–537
  55. Ishwaran H, Kogalur U, Blackstone E, Lauer M (2008) Random survival forests. Ann Appl Stat 2:841–860
  56. Kim B, Wattenberg M, Gilmer J, Cai C, Wexler J, Viegas F (2018) Interpretability beyond feature attribution: quantitative testing with concept activation vectors (TCAV). In: International conference on machine learning, PMLR, pp 2668–2677
  57. Kumar IE, Venkatasubramanian S, Scheidegger C, Friedler S (2020) Problems with Shapley-value-based explanations as feature importance measures. In: Daumé III H, Singh A (eds) Proceedings of the 37th international conference on machine learning. Proceedings of machine learning research, vol 119. PMLR, pp 5491–5500
  58. Kumbier K, Basu S, Brown J, Celniker S, Yu B (2018) Refining interaction search through signed iterative random forests. Preprint. arXiv:1810.07287
  59. Letham B (2015) Statistical learning for decision making: interpretability, uncertainty, and inference. PhD thesis, Massachusetts Institute of Technology
  60. Letham B, Rudin C, McCormick T, Madigan D (2015) Interpretable classifiers using rules and Bayesian analysis: building a better stroke prediction model. Ann Appl Stat 9:1350–1371
  61. Lipton Z (2016) The mythos of model interpretability. Preprint. arXiv:1606.03490
  62. Liu S, Patel R, Daga P, Liu H, Fu G, Doerksen R, Chen Y, Wilkins D (2012) Combined rule extraction and feature elimination in supervised classification. IEEE Trans Nanobiosci 11:228–236
  63. Louppe G (2014) Understanding random forests: from theory to practice. Preprint. arXiv:1407.7502
  64. Lundberg S, Lee SI (2017) A unified approach to interpreting model predictions. In: Advances in neural information processing systems, New York, pp 4765–4774
  65. Lundberg S, Erion G, Lee SI (2018) Consistent individualized feature attribution for tree ensembles. Preprint. arXiv:1802.03888
  66. Malioutov D, Varshney K (2013) Exact rule learning via boolean compressed sensing. In: The 30th international conference on machine learning. Proceedings of machine learning research, pp 765–773
  67. Meinshausen N (2010) Node harvest. Ann Appl Stat 4:2049–2072
  68. Meinshausen N (2015) Package ‘nodeharvest’
  69. Mentch L, Hooker G (2016) Quantifying uncertainty in random forests via confidence intervals and hypothesis tests. J Mach Learn Res 17:841–881
  70. Michalski R (1969) On the quasi-minimal solution of the general covering problem. In: Proceedings of the fifth international symposium on information processing. ACM, New York, pp 125–128
  71. Murdoch W, Singh C, Kumbier K, Abbasi-Asl R, Yu B (2019) Interpretable machine learning: definitions, methods, and applications. Preprint. arXiv:1901.04592
  72. Nalenz M, Villani M, et al. (2018) Tree ensembles with rule structured horseshoe regularization. Ann Appl Stat 12:2379–2408
  73. Owen A (2014) Sobol’ indices and Shapley value. SIAM/ASA J Uncertain Quant 2:245–251
  74. Quinlan J (1986) Induction of decision trees. Mach Learn 1:81–106
  75. Quinlan J (1987) Simplifying decision trees. Int J Man-Mach Stud 27:221–234
  76. Quinlan J (1992) C4.5: programs for machine learning. Morgan Kaufmann, San Mateo
  77. Ribeiro M, Singh S, Guestrin C (2016) Why should I trust you? Explaining the predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York, pp 1135–1144
  78. Rivest R (1987) Learning decision lists. Mach Learn 2:229–246
  79. Rogers W, Wagner T (1978) A finite sample distribution-free performance bound for local discrimination rules. Ann Stat 6:506–514
  80. Rüping S (2006) Learning interpretable models. PhD thesis, Universität Dortmund
  81. Saltelli A (2002) Making best use of model evaluations to compute sensitivity indices. Comput Phys Commun 145:280–297
  82. Scornet E, Biau G, Vert JP (2015) Consistency of random forests. Ann Stat 43:1716–1741
  83. Shah R, Meinshausen N (2014) Random intersection trees. J Mach Learn Res 15:629–654
  84. Shapley L (1953) A value for n-person games. Contrib Theory Games 2:307–317
  85. Shrikumar A, Greenside P, Kundaje A (2017) Learning important features through propagating activation differences. In: Proceedings of the 34th international conference on machine learning. Proceedings of machine learning research, pp 3145–3153
  86. Simonyan K, Vedaldi A, Zisserman A (2013) Deep inside convolutional networks: visualising image classification models and saliency maps. Preprint. arXiv:1312.6034
  87. Sobol I (1993) Sensitivity estimates for nonlinear mathematical models. Math Modell Comput Exp 1:407–414
  88. Song E, Nelson B, Staum J (2016) Shapley effects for global sensitivity analysis: theory and computation. SIAM/ASA J Uncertain Quant 4:1060–1083
  89. Song L, Smola A, Gretton A, Borgwardt K, Bedo J (2007) Supervised feature selection via dependence estimation. In: Proceedings of the 24th international conference on machine learning. Morgan Kaufmann Publishers, San Francisco, pp 823–830
  90. Strobl C, Boulesteix AL, Zeileis A, Hothorn T (2007) Bias in random forest variable importance measures: illustrations, sources and a solution. BMC Bioinformatics 8:25
  91. Su G, Wei D, Varshney K, Malioutov D (2015) Interpretable two-level boolean rule learning for classification. Preprint. arXiv:1511.07361
  92. Sundararajan M, Najmi A (2020) The many Shapley values for model explanation. In: Thirty-seventh international conference on machine learning. Proceedings of machine learning research, pp 9269–9278
  93. Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B 58:267–288
  94. Vapnik V (1998) Statistical learning theory, vol 3. Wiley, New York
  95. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez A, Kaiser L, Polosukhin I (2017) Attention is all you need. Preprint. arXiv:1706.03762
  96. Wager S, Athey S (2018) Estimation and inference of heterogeneous treatment effects using random forests. J Am Stat Assoc 113:1228–1242
  97. Weiss S, Indurkhya N (2000) Lightweight rule induction. In: Proceedings of the seventeenth international conference on machine learning. Morgan Kaufmann Publishers Inc., San Francisco, pp 1135–1142
  98. Williamson B, Feng J (2020) Efficient nonparametric statistical inference on population feature importance using Shapley values. In: Thirty-seventh international conference on machine learning. Proceedings of machine learning research, pp 10282–10291
  99. Wright M, Ziegler A (2017) ranger: a fast implementation of random forests for high dimensional data in C++ and R. J Stat Softw 77:1–17
  100. Yang H, Rudin C, Seltzer M (2017) Scalable Bayesian rule lists. In: Proceedings of the 34th international conference on machine learning, PMLR, pp 3921–3930
  101. Yu B (2013) Stability. Bernoulli 19:1484–1500
  102. Yu B, Kumbier K (2019) Three principles of data science: predictability, computability, and stability (PCS). Preprint. arXiv:1901.08152
  103. Zucknick M, Richardson S, Stronach E (2008) Comparing the characteristics of gene expression profiles derived by univariate and multivariate classification methods. Stat Appl Genet Mol Biol 7:1–34

Acknowledgements

We would like to thank the many referees who helped us improve the overall quality of the papers on which this chapter is built. We also want to express our warm thanks to Gérard Biau for his work and his numerous ideas throughout the presented work.

Author information

Correspondence to Clément Bénard.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Bénard, C., Da Veiga, S., Scornet, E. (2022). Interpretability via Random Forests. In: Lepore, A., Palumbo, B., Poggi, JM. (eds) Interpretability for Industry 4.0: Statistical and Machine Learning Approaches. Springer, Cham. https://doi.org/10.1007/978-3-031-12402-0_3
