Model Interpretability: Advances in Interpretable Machine Learning

Chapter in: Explainable Artificial Intelligence: An Introduction to Interpretable Machine Learning

Abstract

This chapter builds on the intrinsic model interpretability discussed in the previous chapter, extending it to modern techniques that are both interpretable and accurate on a wide range of real-world problems. It begins by differentiating between interpretable and explainable models, and by arguing why, in domains where high-stakes decisions must be made, interpretable models should be the natural choice. The chapter then covers several state-of-the-art interpretable models that are ensemble-based, decision-tree-based, rule-based, and scoring-system-based. We describe each algorithm in sufficient detail and then use the diabetes classification or insurance claims regression dataset to demonstrate its output in practice, along with interpretations and observations.
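The chapter's worked examples sit behind the access wall below, but the flavour of an intrinsically interpretable model is easy to sketch. The snippet that follows is a minimal illustration, not the book's code: it fits a shallow decision tree, one of the tree-based model families the abstract names, and prints its learned rules as readable if-then splits. Binarising scikit-learn's diabetes regression target is an assumption made here purely for illustration; the book uses its own diabetes classification dataset.

```python
# Minimal sketch of an intrinsically interpretable model: a shallow decision
# tree whose entire decision logic can be printed and read. NOTE: the
# binarised diabetes target below is an illustrative assumption, not the
# dataset used in the chapter.
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_diabetes(return_X_y=True)
# Turn the disease-progression score into a binary label (above median = 1).
y = (y > np.median(y)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Capping the depth is what keeps the model interpretable: every prediction
# can be traced along at most three human-readable comparisons.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X_train, y_train)

print(f"held-out accuracy: {tree.score(X_test, y_test):.2f}")
print(export_text(tree, feature_names=load_diabetes().feature_names))
```

The printed rule list is the model: unlike a post hoc explanation of a black box, nothing is approximated, which is exactly the distinction between interpretable and explainable models that the chapter opens with.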




Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter


Cite this chapter

Kamath, U., Liu, J. (2021). Model Interpretability: Advances in Interpretable Machine Learning. In: Explainable Artificial Intelligence: An Introduction to Interpretable Machine Learning. Springer, Cham. https://doi.org/10.1007/978-3-030-83356-5_4

  • DOI: https://doi.org/10.1007/978-3-030-83356-5_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-83355-8

  • Online ISBN: 978-3-030-83356-5

  • eBook Packages: Computer Science, Computer Science (R0)
