Abstract

Although there is no consensus on a precise definition of interpretability, several requirements can be identified: "simplicity, stability, and accuracy", rarely all satisfied by existing interpretable methods. The structure and stability of random forests make them good candidates for improving the performance of interpretable algorithms. The first part of this chapter focuses on rule learning models, which are simple and highly predictive algorithms but are often unstable with respect to small data perturbations. A new algorithm called SIRUS, designed as the extraction of a compact rule ensemble from a random forest, considerably improves stability over state-of-the-art competitors while preserving simplicity and accuracy. The second part of this chapter is dedicated to post-hoc methods, in particular variable importance measures for random forests. An asymptotic analysis of Breiman's MDA (Mean Decrease Accuracy), conducted from a sensitivity analysis perspective, shows that this measure is strongly biased. The Sobol-MDA algorithm is introduced to fix the flaws of the MDA by replacing permutations with projections. An extension to Shapley effects, an efficient importance measure when input variables are dependent, is then proposed with the SHAFF algorithm.
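The MDA mentioned in the abstract is Breiman's permutation importance: the increase in prediction error after randomly permuting one input column, which breaks its association with the output. As a minimal sketch of that mechanism only (not the forest-based estimator analyzed in the chapter), the example below computes a permutation-MDA for an arbitrary predictor; the `predict` stand-in and the simulated data are assumptions made for illustration.

```python
import numpy as np

def permutation_mda(predict, X, y, n_repeats=10, seed=None):
    """Breiman-style MDA: mean increase in quadratic risk when one
    input column is randomly permuted, breaking its link with y."""
    rng = np.random.default_rng(seed)
    base_error = np.mean((y - predict(X)) ** 2)
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        errors = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            errors.append(np.mean((y - predict(X_perm)) ** 2))
        importances[j] = np.mean(errors) - base_error
    return importances

# Toy setup: y depends only on the first of three inputs, and the
# fitted model is replaced by the true regression function (an
# assumption for the example; in practice predict would be a forest).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = 2.0 * X[:, 0]
predict = lambda Z: 2.0 * Z[:, 0]

imp = permutation_mda(predict, X, y, seed=1)
# Only the first coordinate gets a positive MDA; the two noise
# coordinates leave the error unchanged.
```

When inputs are dependent, permuting a column generates samples outside the support of the input distribution; this extrapolation is a source of the bias that the Sobol-MDA avoids by replacing permutations with projections.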


Notes

  1. See Table 1 in Sect. 5 of the Supplementary Material in [5] for dataset details.
  2. See Sect. 2 of the Supplementary Material in [5] for details on the bi-objective procedure.
  3. See Sect. 6 of the Supplementary Material in [5] for a detailed definition of this criterion.
  4. See Bénard et al. [8] for details.

References

  1. Aas K, Jullum M, Løland A (2019) Explaining individual predictions when features are dependent: more accurate approximations to Shapley values. Preprint. arXiv:1903.10464
  2. Alelyani S, Zhao Z, Liu H (2011) A dilemma in assessing stability of feature selection algorithms. In: 13th IEEE international conference on high performance computing & communication. IEEE, Piscataway, pp 701–707
  3. Archer K, Kimes R (2008) Empirical characterization of random forest variable importance measures. Comput Stat Data Anal 52:2249–2260
  4. Basu S, Kumbier K, Brown J, Yu B (2018) Iterative random forests to discover predictive and stable high-order interactions. Proc Natl Acad Sci 115:1943–1948
  5. Bénard C, Biau G, Da Veiga S, Scornet E (2021) Interpretable random forests via rule extraction. In: International conference on artificial intelligence and statistics, PMLR, pp 937–945
  6. Bénard C, Biau G, Da Veiga S, Scornet E (2021) SHAFF: fast and consistent SHApley eFfect estimates via random Forests. Preprint. arXiv:2105.11724
  7. Bénard C, Biau G, Da Veiga S, Scornet E (2021) SIRUS: Stable and Interpretable RUle Set for classification. Electron J Stat 15:427–505
  8. Bénard C, Da Veiga S, Scornet E (2021) MDA for random forests: inconsistency, and a practical solution via the Sobol-MDA. Preprint. arXiv:2102.13347
  9. Boulesteix AL, Slawski M (2009) Stability and aggregation of ranked gene lists. Brief Bioinform 10:556–568
  10. Bousquet O, Elisseeff A (2002) Stability and generalization. J Mach Learn Res 2:499–526
  11. Breiman L (1996) Bagging predictors. Mach Learn 24:123–140
  12. Breiman L (1996) Out-of-bag estimation. Technical report, Statistics Department, University of California, Berkeley
  13. Breiman L (2001) Random forests. Mach Learn 45:5–32
  14. Breiman L (2001) Statistical modeling: the two cultures (with comments and a rejoinder by the author). Stat Sci 16:199–231
  15. Breiman L (2003) Setting up, using, and understanding random forests v3.1. https://www.stat.berkeley.edu/~breiman/Using_random_forests_V3.1.pdf
  16. Breiman L, Friedman J, Olshen R, Stone C (1984) Classification and regression trees. Chapman & Hall/CRC, Boca Raton
  17. Broto B, Bachoc F, Depecker M (2020) Variance reduction for estimation of Shapley effects and adaptation to unknown input distribution. SIAM/ASA J Uncertain Quant 8:693–716
  18. Candes E, Fan Y, Janson L, Lv J (2016) Panning for gold: Model-X knockoffs for high-dimensional controlled variable selection. Preprint. arXiv:1610.02351
  19. Chao A, Chazdon R, Colwell R, Shen TJ (2006) Abundance-based similarity indices and their estimation when there are unseen species in samples. Biometrics 62:361–371
  20. Chastaing G, Gamboa F, Prieur C (2012) Generalized Hoeffding-Sobol decomposition for dependent variables: application to sensitivity analysis. Electron J Stat 6:2420–2448
  21. Chen T, Guestrin C (2016) XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York, pp 785–794
  22. Clark P, Niblett T (1989) The CN2 induction algorithm. Mach Learn 3:261–283
  23. Cohen W (1995) Fast effective rule induction. In: Proceedings of the twelfth international conference on machine learning. Morgan Kaufmann Publishers Inc., San Francisco, pp 115–123
  24. Cohen W, Singer Y (1999) A simple, fast, and effective rule learner. In: Proceedings of the sixteenth national conference on artificial intelligence and eleventh conference on innovative applications of artificial intelligence. AAAI Press, Palo Alto, pp 335–342
  25. Covert I, Lee SI (2020) Improving KernelSHAP: practical Shapley value estimation via linear regression. Preprint. arXiv:2012.01536
  26. Covert I, Lundberg S, Lee SI (2020) Understanding global feature contributions through additive importance measures. Preprint. arXiv:2004.00668
  27. Crawford L, Flaxman S, Runcie D, West M (2019) Variable prioritization in nonlinear black box methods: a genetic association case study. Ann Appl Stat 13:958
  28. Dembczyński K, Kotłowski W, Słowiński R (2008) Maximum likelihood rule ensembles. In: Proceedings of the 25th international conference on machine learning. ACM, New York, pp 224–231
  29. Dembczyński K, Kotłowski W, Słowiński R (2010) ENDER: a statistical framework for boosting decision rules. Data Mining Knowl Discov 21:52–90
  30. Devroye L, Wagner T (1979) Distribution-free inequalities for the deleted and holdout error estimates. IEEE Trans Inf Theory 25:202–207
  31. Doshi-Velez F, Kim B (2017) Towards a rigorous science of interpretable machine learning. Preprint. arXiv:1702.08608
  32. Dua D, Graff C (2017) UCI machine learning repository. http://archive.ics.uci.edu/ml
  33. Efron B, Hastie T, Johnstone I, Tibshirani R (2004) Least angle regression. Ann Stat 32:407–499
  34. Erhan D, Bengio Y, Courville A, Vincent P (2009) Visualizing higher-layer features of a deep network. University of Montreal 1341:1
  35. Esposito F, Malerba D, Semeraro G, Kay J (1997) A comparative analysis of methods for pruning decision trees. IEEE Trans Patt Anal Mach Intell 19:476–491
  36. Fokkema M (2017) PRE: an R package for fitting prediction rule ensembles. Preprint. arXiv:1707.07149
  37. Freitas A (2014) Comprehensible classification models: a position paper. ACM SIGKDD Explorations Newsletter 15:1–10
  38. Freund Y, Schapire R (1996) Experiments with a new boosting algorithm. In: Thirteenth international conference on machine learning, Citeseer, vol 96, pp 148–156
  39. Friedman J (2001) Greedy function approximation: a gradient boosting machine. Ann Stat 29:1189–1232
  40. Friedman J, Hastie T, Tibshirani R (2001) The elements of statistical learning, vol 1. Springer series in statistics. Springer, New York
  41. Friedman J, Popescu B, et al. (2003) Importance sampled learning ensembles. J Mach Learn Res 4:94305
  42. Friedman J, Popescu B, et al. (2008) Predictive learning via rule ensembles. Ann Appl Stat 2:916–954
  43. Fürnkranz J (1999) Separate-and-conquer rule learning. Artif Intell Rev 13:3–54
  44. Fürnkranz J, Widmer G (1994) Incremental reduced error pruning. In: Proceedings of the 11th international conference on machine learning. Morgan Kaufmann Publishers Inc., San Francisco, pp 70–77
  45. Genuer R, Poggi JM, Tuleau-Malot C (2010) Variable selection using random forests. Patt Recogn Lett 31:2225–2236
  46. Ghanem R, Higdon D, Owhadi H (2017) Handbook of uncertainty quantification. Springer, New York
  47. Gregorutti B, Michel B, Saint-Pierre P (2017) Correlation and variable importance in random forests. Stat Comput 27:659–678
  48. Guidotti R, Monreale A, Ruggieri S, Turini F, Giannotti F, Pedreschi D (2018) A survey of methods for explaining black box models. ACM Comput Surv 51:1–42
  49. Guyon I, Weston J, Barnhill S, Vapnik V (2002) Gene selection for cancer classification using support vector machines. Mach Learn 46:389–422
  50. He Z, Yu W (2010) Stable feature selection for biomarker discovery. Comput Biol Chem 34:215–225
  51. Iooss B, Lemaître P (2015) A review on global sensitivity analysis methods. Springer, Boston, pp 101–122
  52. Iooss B, Prieur C (2017) Shapley effects for sensitivity analysis with correlated inputs: comparisons with Sobol’ indices, numerical estimation and applications. Preprint. arXiv:1707.01334
  53. Ish-Horowicz J, Udwin D, Flaxman S, Filippi S, Crawford L (2019) Interpreting deep neural networks through variable importance. Preprint. arXiv:1901.09839
  54. Ishwaran H (2007) Variable importance in binary regression trees and forests. Electron J Stat 1:519–537
  55. Ishwaran H, Kogalur U, Blackstone E, Lauer M (2008) Random survival forests. Ann Appl Stat 2:841–860
  56. Kim B, Wattenberg M, Gilmer J, Cai C, Wexler J, Viegas F (2018) Interpretability beyond feature attribution: quantitative testing with concept activation vectors (TCAV). In: International conference on machine learning, PMLR, pp 2668–2677
  57. Kumar IE, Venkatasubramanian S, Scheidegger C, Friedler S (2020) Problems with Shapley-value-based explanations as feature importance measures. In: Daumé III H, Singh A (eds) Proceedings of the 37th international conference on machine learning. Proceedings of machine learning research, vol 119. PMLR, pp 5491–5500
  58. Kumbier K, Basu S, Brown J, Celniker S, Yu B (2018) Refining interaction search through signed iterative random forests. Preprint. arXiv:1810.07287
  59. Letham B (2015) Statistical learning for decision making: interpretability, uncertainty, and inference. PhD thesis, Massachusetts Institute of Technology
  60. Letham B, Rudin C, McCormick T, Madigan D (2015) Interpretable classifiers using rules and Bayesian analysis: building a better stroke prediction model. Ann Appl Stat 9:1350–1371
  61. Lipton Z (2016) The mythos of model interpretability. Preprint. arXiv:1606.03490
  62. Liu S, Patel R, Daga P, Liu H, Fu G, Doerksen R, Chen Y, Wilkins D (2012) Combined rule extraction and feature elimination in supervised classification. IEEE Trans Nanobiosci 11:228–236
  63. Louppe G (2014) Understanding random forests: from theory to practice. Preprint. arXiv:1407.7502
  64. Lundberg S, Lee SI (2017) A unified approach to interpreting model predictions. In: Advances in neural information processing systems, New York, pp 4765–4774
  65. Lundberg S, Erion G, Lee SI (2018) Consistent individualized feature attribution for tree ensembles. Preprint. arXiv:1802.03888
  66. Malioutov D, Varshney K (2013) Exact rule learning via boolean compressed sensing. In: The 30th international conference on machine learning. Proceedings of machine learning research, pp 765–773
  67. Meinshausen N (2010) Node harvest. Ann Appl Stat 4:2049–2072
  68. Meinshausen N (2015) Package ‘nodeharvest’
  69. Mentch L, Hooker G (2016) Quantifying uncertainty in random forests via confidence intervals and hypothesis tests. J Mach Learn Res 17:841–881
  70. Michalski R (1969) On the quasi-minimal solution of the general covering problem. In: Proceedings of the fifth international symposium on information processing. ACM, New York, pp 125–128
  71. Murdoch W, Singh C, Kumbier K, Abbasi-Asl R, Yu B (2019) Interpretable machine learning: definitions, methods, and applications. Preprint. arXiv:1901.04592
  72. Nalenz M, Villani M, et al. (2018) Tree ensembles with rule structured horseshoe regularization. Ann Appl Stat 12:2379–2408
  73. Owen A (2014) Sobol’ indices and Shapley value. SIAM/ASA J Uncertain Quant 2:245–251
  74. Quinlan J (1986) Induction of decision trees. Mach Learn 1:81–106
  75. Quinlan J (1987) Simplifying decision trees. Int J Man-Mach Stud 27:221–234
  76. Quinlan J (1992) C4.5: programs for machine learning. Morgan Kaufmann, San Mateo
  77. Ribeiro M, Singh S, Guestrin C (2016) Why should I trust you? Explaining the predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York, pp 1135–1144
  78. Rivest R (1987) Learning decision lists. Mach Learn 2:229–246
  79. Rogers W, Wagner T (1978) A finite sample distribution-free performance bound for local discrimination rules. Ann Stat 6:506–514
  80. Rüping S (2006) Learning interpretable models. PhD thesis, Universität Dortmund
  81. Saltelli A (2002) Making best use of model evaluations to compute sensitivity indices. Comput Phys Commun 145:280–297
  82. Scornet E, Biau G, Vert JP (2015) Consistency of random forests. Ann Stat 43:1716–1741
  83. Shah R, Meinshausen N (2014) Random intersection trees. J Mach Learn Res 15:629–654
  84. Shapley L (1953) A value for n-person games. Contrib Theory Games 2:307–317
  85. Shrikumar A, Greenside P, Kundaje A (2017) Learning important features through propagating activation differences. In: Proceedings of the 34th international conference on machine learning. Proceedings of machine learning research, pp 3145–3153
  86. Simonyan K, Vedaldi A, Zisserman A (2013) Deep inside convolutional networks: visualising image classification models and saliency maps. Preprint. arXiv:1312.6034
  87. Sobol I (1993) Sensitivity estimates for nonlinear mathematical models. Math Modell Comput Exp 1:407–414
  88. Song E, Nelson B, Staum J (2016) Shapley effects for global sensitivity analysis: theory and computation. SIAM/ASA J Uncertain Quant 4:1060–1083
  89. Song L, Smola A, Gretton A, Borgwardt K, Bedo J (2007) Supervised feature selection via dependence estimation. In: Proceedings of the 24th international conference on machine learning. Morgan Kaufmann Publishers, San Francisco, pp 823–830
  90. Strobl C, Boulesteix AL, Zeileis A, Hothorn T (2007) Bias in random forest variable importance measures: illustrations, sources and a solution. BMC Bioinformatics 8:25
  91. Su G, Wei D, Varshney K, Malioutov D (2015) Interpretable two-level boolean rule learning for classification. Preprint. arXiv:1511.07361
  92. Sundararajan M, Najmi A (2020) The many Shapley values for model explanation. In: Thirty-seventh international conference on machine learning. Proceedings of machine learning research, pp 9269–9278
  93. Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B 58:267–288
  94. Vapnik V (1998) Statistical learning theory, vol 3. Wiley, New York
  95. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez A, Kaiser L, Polosukhin I (2017) Attention is all you need. Preprint. arXiv:1706.03762
  96. Wager S, Athey S (2018) Estimation and inference of heterogeneous treatment effects using random forests. J Am Stat Assoc 113:1228–1242
  97. Weiss S, Indurkhya N (2000) Lightweight rule induction. In: Proceedings of the seventeenth international conference on machine learning. Morgan Kaufmann Publishers Inc., San Francisco, pp 1135–1142
  98. Williamson B, Feng J (2020) Efficient nonparametric statistical inference on population feature importance using Shapley values. In: Thirty-seventh international conference on machine learning. Proceedings of machine learning research, pp 10282–10291
  99. Wright M, Ziegler A (2017) ranger: a fast implementation of random forests for high dimensional data in C++ and R. J Stat Softw 77:1–17
  100. Yang H, Rudin C, Seltzer M (2017) Scalable Bayesian rule lists. In: Proceedings of the 34th international conference on machine learning, PMLR, pp 3921–3930
  101. Yu B (2013) Stability. Bernoulli 19:1484–1500
  102. Yu B, Kumbier K (2019) Three principles of data science: predictability, computability, and stability (PCS). Preprint. arXiv:1901.08152
  103. Zucknick M, Richardson S, Stronach E (2008) Comparing the characteristics of gene expression profiles derived by univariate and multivariate classification methods. Stat Appl Genet Mol Biol 7:1–34

Acknowledgements

We would like to thank the many referees who helped us improve the overall quality of the papers on which this chapter is built. We also want to express our warm thanks to Gérard Biau for his work and his numerous ideas throughout the presented work.

Author information

Correspondence to Clément Bénard.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Bénard, C., Da Veiga, S., Scornet, E. (2022). Interpretability via Random Forests. In: Lepore, A., Palumbo, B., Poggi, JM. (eds) Interpretability for Industry 4.0: Statistical and Machine Learning Approaches. Springer, Cham. https://doi.org/10.1007/978-3-031-12402-0_3
