Abstract
This chapter expands on the intrinsic model interpretability discussed in the previous chapter to include many modern techniques that are both interpretable and accurate on a wide range of real-world problems. It begins by differentiating between interpretable and explainable models and explaining why, in domains where high-stakes decisions must be made, interpretable models should be the natural choice. The chapter covers several state-of-the-art interpretable models that are ensemble-based, decision-tree-based, rule-based, and scoring-system-based. We describe each algorithm in sufficient detail and then use the diabetes classification or insurance claims regression dataset to practically demonstrate its output, along with interpretations and observations.
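To illustrate the kind of inherently interpretable model the chapter discusses, the following sketch fits a shallow decision tree on scikit-learn's built-in diabetes regression dataset and prints its full rule set. This is an illustrative assumption, not the chapter's own code or data; the chapter's diabetes and insurance datasets may differ from the toy dataset used here.

```python
# Minimal sketch (assumed setup, not from the chapter): a depth-capped
# decision tree whose every prediction can be traced through a few rules.
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor, export_text

X, y = load_diabetes(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# max_depth=3 keeps the model small enough to read in full.
tree = DecisionTreeRegressor(max_depth=3, random_state=0)
tree.fit(X_train, y_train)

# export_text renders the learned splits as nested if/else rules.
rules = export_text(tree, feature_names=list(X.columns))
print(rules)
print("Held-out R^2:", round(tree.score(X_test, y_test), 3))
```

Because the whole model is the printed rule list, interpretation requires no post hoc explanation method: each leaf's prediction is justified by the feature thresholds on its path.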
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this chapter
Kamath, U., Liu, J. (2021). Model Interpretability: Advances in Interpretable Machine Learning. In: Explainable Artificial Intelligence: An Introduction to Interpretable Machine Learning. Springer, Cham. https://doi.org/10.1007/978-3-030-83356-5_4
Print ISBN: 978-3-030-83355-8
Online ISBN: 978-3-030-83356-5
eBook Packages: Computer Science (R0)