Visualizing the Feature Importance for Black Box Models

  • Giuseppe Casalicchio
  • Christoph Molnar
  • Bernd Bischl
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11051)

Abstract

In recent years, a large number of model-agnostic methods for improving the transparency, trustability, and interpretability of machine learning models have been developed. Based on a recent method for model-agnostic global feature importance, we introduce a local feature importance measure for individual observations and propose two visual tools: partial importance (PI) and individual conditional importance (ICI) plots, which visualize how changes in a feature affect model performance on average as well as for individual observations. Our proposed methods are related to partial dependence (PD) and individual conditional expectation (ICE) plots, but visualize the expected (conditional) feature importance instead of the expected (conditional) prediction. Furthermore, we show that averaging ICI curves across observations yields a PI curve, and that integrating the PI curve with respect to the distribution of the considered feature results in the global feature importance. Another contribution of our paper is the Shapley feature importance, which fairly distributes the overall performance of a model among the features according to their marginal contributions and which can be used to compare feature importance across different models. Code related to this paper is available at: https://github.com/giuseppec/featureImportance.
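To make the relationship between ICI curves, the PI curve, and the global feature importance concrete, the following minimal sketch computes a permutation-style importance for a single feature. It is written in Python with scikit-learn purely for illustration; the paper's own implementation is the R package linked above, and the dataset, the squared-error loss, and the helper name ici_curves are assumptions made for this example.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

def ici_curves(model, X, y, feature, grid):
    """Individual conditional importance (ICI): per-observation change in loss
    when the value of `feature` is replaced by each grid value."""
    base_loss = (model.predict(X) - y) ** 2          # per-observation squared error
    curves = np.empty((len(grid), len(X)))
    for k, value in enumerate(grid):
        X_replaced = X.copy()
        X_replaced[:, feature] = value               # set the feature to this value for all rows
        curves[k] = (model.predict(X_replaced) - y) ** 2 - base_loss
    return curves                                    # shape: (len(grid), n_observations)

X, y = load_diabetes(return_X_y=True)
model = RandomForestRegressor(random_state=0).fit(X, y)

feature = 2                                          # illustrative choice of a single feature column
grid = np.quantile(X[:, feature], np.linspace(0.05, 0.95, 20))

ici = ici_curves(model, X, y, feature, grid)
pi_curve = ici.mean(axis=1)                          # PI curve: average of ICI curves over observations

# Monte Carlo approximation of the integral of the PI curve over the feature's
# distribution: average ICI values evaluated at feature values drawn from the data.
rng = np.random.default_rng(0)
sampled_values = rng.choice(X[:, feature], size=50, replace=False)
global_importance = ici_curves(model, X, y, feature, sampled_values).mean()
print(pi_curve, global_importance)
```

Averaging the rows of the ICI matrix gives the PI curve, and averaging ICI values at feature values drawn from the data corresponds to the integration over the feature's distribution described in the abstract; a comparable sampling scheme over feature subsets would additionally be needed to compute the Shapley feature importance.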

Keywords

Interpretable machine learning · Explainable AI · Feature importance · Variable importance · Feature effect · Partial dependence

Acknowledgments

This work is funded by the Bavarian State Ministry of Education, Science and the Arts in the framework of the Centre Digitisation.Bavaria (ZD.B).

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Giuseppe Casalicchio (1)
  • Christoph Molnar (1)
  • Bernd Bischl (1)
  1. Department of Statistics, Ludwig-Maximilians-University Munich, Munich, Germany
