Abstract
Although there is no consensus on a precise definition of interpretability, several requirements can be identified: “simplicity, stability, and accuracy”, which are rarely all satisfied by existing interpretable methods. The structure and stability of random forests make them good candidates for improving the performance of interpretable algorithms. The first part of this chapter focuses on rule learning models, which are simple and highly predictive algorithms but are often unstable with respect to small data perturbations. A new algorithm called SIRUS, designed as the extraction of a compact rule ensemble from a random forest, considerably improves stability over state-of-the-art competitors while preserving simplicity and accuracy. The second part of this chapter is dedicated to post-hoc methods, in particular variable importance measures for random forests. An asymptotic analysis of Breiman’s MDA (Mean Decrease Accuracy) from a sensitivity analysis perspective shows that this measure is strongly biased. The Sobol-MDA algorithm is introduced to fix the MDA flaws by replacing permutations with projections. An extension to Shapley effects, an efficient importance measure when input variables are dependent, is then proposed with the SHAFF algorithm.
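To give a concrete feel for the permutation-based MDA discussed in the abstract, the sketch below computes a Breiman-style permutation importance for a scikit-learn random forest on correlated inputs, and contrasts it with a crude leave-one-covariate-out retraining score. This is only an illustrative stand-in, not the Sobol-MDA or SHAFF algorithms of the chapter; the simulated data, variable names, and the use of in-sample scores are assumptions made for the example.

```python
# Illustrative sketch: permutation MDA vs. leave-one-covariate-out retraining.
# NOT the Sobol-MDA algorithm (which replaces permutations by projections
# inside the forest); this only illustrates the issue with correlated inputs.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
n = 2000
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=n)   # strongly correlated with x1
x3 = rng.normal(size=n)                     # independent covariate
X = np.column_stack([x1, x2, x3])
y = x1 + x3 + 0.1 * rng.normal(size=n)      # x2 is redundant given x1

# max_features=1 forces the trees to also split on the redundant x2.
forest = RandomForestRegressor(n_estimators=200, max_features=1,
                               random_state=0).fit(X, y)
base = r2_score(y, forest.predict(X))

for j in range(3):
    # Permutation MDA: shuffle one column, measure the accuracy drop.
    Xp = X.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])
    mda = base - r2_score(y, forest.predict(Xp))

    # Leave-one-covariate-out: retrain without column j (a crude proxy
    # for projecting the forest on the remaining variables).
    keep = [k for k in range(3) if k != j]
    sub = RandomForestRegressor(n_estimators=200, max_features=1,
                                random_state=0).fit(X[:, keep], y)
    loco = base - r2_score(y, sub.predict(X[:, keep]))
    print(f"X{j+1}: permutation MDA = {mda:.3f}, retrain drop = {loco:.3f}")
```

With x1 and x2 strongly correlated, permuting the redundant x2 tends to report a non-negligible accuracy drop, while retraining without it barely changes accuracy, which is the kind of bias the chapter's sensitivity-analysis viewpoint makes precise.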
Notes
1. See Table 1 in Sect. 5 of the Supplementary Material in [5] for dataset details.
2. See Sect. 2 of the Supplementary Material in [5] for details on the bi-objective procedure.
3. See Sect. 6 of the Supplementary Material in [5] for a detailed definition of this criterion.
4. See Bénard et al. [8] for details.
References
Aas K, Jullum M, Løland A (2019) Explaining individual predictions when features are dependent: More accurate approximations to Shapley values. Preprint. arXiv:1903.10464
Alelyani S, Zhao Z, Liu H (2011) A dilemma in assessing stability of feature selection algorithms. In: 13th IEEE international conference on high performance computing & communication. IEEE, Piscataway, pp 701–707
Archer K, Kimes R (2008) Empirical characterization of random forest variable importance measures. Comput Stat Data Anal 52:2249–2260
Basu S, Kumbier K, Brown J, Yu B (2018) Iterative random forests to discover predictive and stable high-order interactions. Proc Natl Acad Sci 115:1943–1948
Bénard C, Biau G, Da Veiga S, Scornet E (2021) Interpretable random forests via rule extraction. In: International conference on artificial intelligence and statistics, PMLR, pp 937–945
Bénard C, Biau G, Da Veiga S, Scornet E (2021) SHAFF: Fast and consistent SHApley eFfect estimates via random Forests. Preprint. arXiv:2105.11724
Bénard C, Biau G, Da Veiga S, Scornet E (2021) SIRUS: Stable and Interpretable RUle Set for classification. Electron J Stat 15:427–505
Bénard C, Da Veiga S, Scornet E (2021) MDA for random forests: inconsistency, and a practical solution via the Sobol-MDA. Preprint. arXiv:2102.13347
Boulesteix AL, Slawski M (2009) Stability and aggregation of ranked gene lists. Brief Bioinform 10:556–568
Bousquet O, Elisseeff A (2002) Stability and generalization. J Mach Learn Res 2:499–526
Breiman L (1996) Bagging predictors. Mach Learn 24:123–140
Breiman L (1996) Out-of-bag estimation. Technical report, Statistics Department, University of California Berkeley
Breiman L (2001) Random forests. Mach Learn 45:5–32
Breiman L (2001) Statistical modeling: the two cultures (with comments and a rejoinder by the author). Stat Sci 16:199–231
Breiman L (2003) Setting up, using, and understanding random forests v3.1. https://www.stat.berkeley.edu/~breiman/Using_random_forests_V3.1.pdf
Breiman L, Friedman J, Olshen R, Stone C (1984) Classification and regression trees. Chapman & Hall/CRC, Boca Raton
Broto B, Bachoc F, Depecker M (2020) Variance reduction for estimation of Shapley effects and adaptation to unknown input distribution. SIAM/ASA J Uncertain Quant 8:693–716
Candes E, Fan Y, Janson L, Lv J (2016) Panning for gold: Model-X knockoffs for high-dimensional controlled variable selection. Preprint. arXiv:1610.02351
Chao A, Chazdon R, Colwell R, Shen TJ (2006) Abundance-based similarity indices and their estimation when there are unseen species in samples. Biometrics 62:361–371
Chastaing G, Gamboa F, Prieur C (2012) Generalized Hoeffding-Sobol decomposition for dependent variables-application to sensitivity analysis. Electron J Stat 6:2420–2448
Chen T, Guestrin C (2016) Xgboost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York, pp 785–794
Clark P, Niblett T (1989) The CN2 induction algorithm. Mach Learn 3:261–283
Cohen W (1995) Fast effective rule induction. In: Proceedings of the twelfth international conference on machine learning. Morgan Kaufmann Publishers Inc., San Francisco, pp 115–123
Cohen W, Singer Y (1999) A simple, fast, and effective rule learner. In: Proceedings of the sixteenth national conference on artificial intelligence and eleventh conference on innovative applications of artificial intelligence. AAAI Press, Palo Alto, pp 335–342
Covert I, Lee SI (2020) Improving KernelSHAP: practical Shapley value estimation via linear regression. Preprint. arXiv:2012.01536
Covert I, Lundberg S, Lee SI (2020) Understanding global feature contributions through additive importance measures. Preprint. arXiv:2004.00668
Crawford L, Flaxman S, Runcie D, West M (2019) Variable prioritization in nonlinear black box methods: a genetic association case study. Ann Appl Stat 13:958
Dembczyński K, Kotłowski W, Słowiński R (2008) Maximum likelihood rule ensembles. In: Proceedings of the 25th international conference on machine learning. ACM, New York, pp 224–231
Dembczyński K, Kotłowski W, Słowiński R (2010) ENDER: A statistical framework for boosting decision rules. Data Mining Knowl Discov 21:52–90
Devroye L, Wagner T (1979) Distribution-free inequalities for the deleted and holdout error estimates. IEEE Trans Inf Theory 25:202–207
Doshi-Velez F, Kim B (2017) Towards a rigorous science of interpretable machine learning. Preprint. arXiv:1702.08608
Dua D, Graff C (2017) UCI machine learning repository. http://archive.ics.uci.edu/ml
Efron B, Hastie T, Johnstone I, Tibshirani R (2004) Least angle regression. Ann Stat 32:407–499
Erhan D, Bengio Y, Courville A, Vincent P (2009) Visualizing higher-layer features of a deep network. Technical report 1341, University of Montreal
Esposito F, Malerba D, Semeraro G, Kay J (1997) A comparative analysis of methods for pruning decision trees. IEEE Trans Patt Anal Mach Intell 19:476–491
Fokkema M (2017) PRE: An R package for fitting prediction rule ensembles. Preprint. arXiv:1707.07149
Freitas A (2014) Comprehensible classification models: A position paper. ACM SIGKDD Explorations Newsletter 15:1–10
Freund Y, Schapire R (1996) Experiments with a new boosting algorithm. In: Thirteenth international conference on ML, Citeseer, vol 96, pp 148–156
Friedman J (2001) Greedy function approximation: a gradient boosting machine. Ann Stat 29:1189–1232
Friedman J, Hastie T, Tibshirani R (2001) The elements of statistical learning, vol 1. Springer series in statistics. Springer, New York
Friedman J, Popescu B (2003) Importance sampled learning ensembles. Technical report, Stanford University
Friedman J, Popescu B, et al. (2008) Predictive learning via rule ensembles. Ann Appl Stat 2:916–954
Fürnkranz J (1999) Separate-and-conquer rule learning. Artif Intell Rev 13:3–54
Fürnkranz J, Widmer G (1994) Incremental reduced error pruning. In: Proceedings of the 11th international conference on machine learning. Morgan Kaufmann Publishers Inc., San Francisco, pp 70–77
Genuer R, Poggi JM, Tuleau-Malot C (2010) Variable selection using random forests. Patt Recogn Lett 31:2225–2236
Ghanem R, Higdon D, Owhadi H (2017) Handbook of uncertainty quantification. Springer, New York
Gregorutti B, Michel B, Saint-Pierre P (2017) Correlation and variable importance in random forests. Stat Comput 27:659–678
Guidotti R, Monreale A, Ruggieri S, Turini F, Giannotti F, Pedreschi D (2018) A survey of methods for explaining black box models. ACM Comput Surv 51:1–42
Guyon I, Weston J, Barnhill S, Vapnik V (2002) Gene selection for cancer classification using support vector machines. Mach learn 46:389–422
He Z, Yu W (2010) Stable feature selection for biomarker discovery. Comput Biol Chem 34:215–225
Iooss B, Lemaître P (2015) A review on global sensitivity analysis methods. Springer, Boston, pp 101–122
Iooss B, Prieur C (2017) Shapley effects for sensitivity analysis with correlated inputs: comparisons with Sobol’ indices, numerical estimation and applications. Preprint. arXiv:1707.01334
Ish-Horowicz J, Udwin D, Flaxman S, Filippi S, Crawford L (2019) Interpreting deep neural networks through variable importance. Preprint. arXiv:1901.09839
Ishwaran H (2007) Variable importance in binary regression trees and forests. Electron J Stat 1:519–537
Ishwaran H, Kogalur U, Blackstone E, Lauer M (2008) Random survival forests. Ann Appl Stat 2:841–860
Kim B, Wattenberg M, Gilmer J, Cai C, Wexler J, Viegas F (2018) Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (TCAV). In: International conference on machine learning, PMLR, pp 2668–2677
Kumar IE, Venkatasubramanian S, Scheidegger C, Friedler S (2020) Problems with shapley-value-based explanations as feature importance measures. In: III HD, Singh A (eds) Proceedings of the 37th international conference on machine learning, PMLR. Proceedings of machine learning research, vol 119, pp 5491–5500
Kumbier K, Basu S, Brown J, Celniker S, Yu B (2018) Refining interaction search through signed iterative random forests. Preprint. arXiv:1810.07287
Letham B (2015) Statistical learning for decision making: interpretability, uncertainty, and inference. PhD thesis, Massachusetts Institute of Technology
Letham B, Rudin C, McCormick T, Madigan D (2015) Interpretable classifiers using rules and Bayesian analysis: Building a better stroke prediction model. Ann Appl Stat 9:1350–1371
Lipton Z (2016) The mythos of model interpretability. Preprint. arXiv:1606.03490
Liu S, Patel R, Daga P, Liu H, Fu G, Doerksen R, Chen Y, Wilkins D (2012) Combined rule extraction and feature elimination in supervised classification. IEEE Trans. Nanobiosci. 11:228–236
Louppe G (2014) Understanding random forests: From theory to practice. Preprint. arXiv:1407.7502
Lundberg S, Lee SI (2017) A unified approach to interpreting model predictions. In: Advances in neural information processing systems, New York, pp 4765–4774
Lundberg S, Erion G, Lee SI (2018) Consistent individualized feature attribution for tree ensembles. Preprint. arXiv:1802.03888
Malioutov D, Varshney K (2013) Exact rule learning via boolean compressed sensing. In: The 30th international conference on machine learning. Proceedings of machine learning research, pp 765–773
Meinshausen N (2010) Node harvest. Ann Appl Stat 4:2049–2072
Meinshausen N (2015) Package ‘nodeharvest’
Mentch L, Hooker G (2016) Quantifying uncertainty in random forests via confidence intervals and hypothesis tests. J Mach Learn Res 17:841–881
Michalski R (1969) On the quasi-minimal solution of the general covering problem. In: Proceedings of the fifth international symposium on information processing. ACM, New York, pp 125–128
Murdoch W, Singh C, Kumbier K, Abbasi-Asl R, Yu B (2019) Interpretable machine learning: definitions, methods, and applications. Preprint. arXiv:1901.04592
Nalenz M, Villani M, et al. (2018) Tree ensembles with rule structured horseshoe regularization. Ann Appl Stat 12:2379–2408
Owen A (2014) Sobol’ indices and Shapley value. SIAM/ASA J Uncertain Quant 2:245–251
Quinlan J (1986) Induction of decision trees. Mach Learn 1:81–106
Quinlan J (1987) Simplifying decision trees. Int J Man-Mach Stud 27:221–234
Quinlan J (1992) C4.5: Programs for machine learning. Morgan Kaufmann, San Mateo
Ribeiro M, Singh S, Guestrin C (2016) Why should I trust you? Explaining the predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York, pp 1135–1144
Rivest R (1987) Learning decision lists. Mach Learn 2:229–246
Rogers W, Wagner T (1978) A finite sample distribution-free performance bound for local discrimination rules. Ann Stat 6:506–514
Rüping S (2006) Learning interpretable models. PhD thesis, Universität Dortmund
Saltelli A (2002) Making best use of model evaluations to compute sensitivity indices. Comput. Phys Commun 145:280–297
Scornet E, Biau G, Vert JP (2015) Consistency of random forests. Ann Stat 43:1716–1741
Shah R, Meinshausen N (2014) Random intersection trees. J Mach Learn Res 15:629–654
Shapley L (1953) A value for n-person games. Contrib Theory Games 2:307–317
Shrikumar A, Greenside P, Kundaje A (2017) Learning important features through propagating activation differences. In: Proceedings of the 34th International Conference on Machine Learning. Proceedings of Machine Learning Research, pp 3145–3153
Simonyan K, Vedaldi A, Zisserman A (2013) Deep inside convolutional networks: visualising image classification models and saliency maps. Preprint. arXiv:1312.6034
Sobol I (1993) Sensitivity estimates for nonlinear mathematical models. Math Modell Comput Exp 1:407–414
Song E, Nelson B, Staum J (2016) Shapley effects for global sensitivity analysis: theory and computation. SIAM/ASA J Uncertain Quant 4:1060–1083
Song L, Smola A, Gretton A, Borgwardt K, Bedo J (2007) Supervised feature selection via dependence estimation. In: Proceedings of the 24th international conference on machine learning. Morgan Kaufmann Publishers, San Francisco, pp 823–830
Strobl C, Boulesteix AL, Zeileis A, Hothorn T (2007) Bias in random forest variable importance measures: illustrations, sources and a solution. BMC Bioinformatics 8:25
Su G, Wei D, Varshney K, Malioutov D (2015) Interpretable two-level boolean rule learning for classification. Preprint. arXiv:1511.07361
Sundararajan M, Najmi A (2020) The many Shapley values for model explanation. In: Thirty-seventh international conference on machine learning. Proceedings of machine learning research, pp 9269–9278
Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B (Methodological), pp 267–288
Vapnik V (1998) Statistical learning theory, vol 3. Wiley, New York
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez A, Kaiser L, Polosukhin I (2017) Attention is all you need. Preprint. arXiv:1706.03762
Wager S, Athey S (2018) Estimation and inference of heterogeneous treatment effects using random forests. J Am Stat Assoc 113:1228–1242
Weiss S, Indurkhya N (2000) Lightweight rule induction. In: Proceedings of the seventeenth international conference on machine learning. Morgan Kaufmann Publishers Inc., San Francisco, pp 1135–1142
Williamson B, Feng J (2020) Efficient nonparametric statistical inference on population feature importance using Shapley values. In: Thirty-seventh international conference on machine learning. Proceedings of machine learning research, pp 10282–10291
Wright M, Ziegler A (2017) ranger: A fast implementation of random forests for high dimensional data in C++ and R. J Stat Softw 77:1–17
Yang H, Rudin C, Seltzer M (2017) Scalable bayesian rule lists. In: Proceedings of the 34th international conference on machine learning, PMLR, pp 3921–3930
Yu B (2013) Stability. Bernoulli 19:1484–1500
Yu B, Kumbier K (2019) Three principles of data science: predictability, computability, and stability (PCS). Preprint. arXiv:1901.08152
Zucknick M, Richardson S, Stronach E (2008) Comparing the characteristics of gene expression profiles derived by univariate and multivariate classification methods. Stat Appl Genet Mol Biol 7:1–34
Acknowledgements
We would like to thank the many referees who helped us improve the overall quality of the papers on which this chapter is built. We also express our warm thanks to Gerard Biau for his numerous ideas and contributions to the work presented here.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this chapter
Bénard, C., Da Veiga, S., Scornet, E. (2022). Interpretability via Random Forests. In: Lepore, A., Palumbo, B., Poggi, JM. (eds) Interpretability for Industry 4.0: Statistical and Machine Learning Approaches. Springer, Cham. https://doi.org/10.1007/978-3-031-12402-0_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-12401-3
Online ISBN: 978-3-031-12402-0
eBook Packages: Mathematics and Statistics (R0)