Considerations when learning additive explanations for black-box models

Abstract

Many methods to explain black-box models, whether local or global, are additive. In this paper, we study global additive explanations for non-additive models, focusing on four explanation methods: partial dependence, Shapley explanations adapted to a global setting, distilled additive explanations, and gradient-based explanations. We show that different explanation methods characterize non-additive components in a black-box model’s prediction function in different ways. We use the concepts of main and total effects to anchor additive explanations, and quantitatively evaluate additive and non-additive explanations. Even though distilled explanations are generally the most accurate additive explanations, non-additive explanations such as tree explanations that explicitly model non-additive components tend to be even more accurate. Despite this, our user study showed that machine learning practitioners were better able to leverage additive explanations for various tasks. These considerations should be taken into account when deciding which explanation to trust and use to explain black-box models.
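The abstract's distinction between a non-additive black box and its additive explanation can be made concrete with a small distillation sketch: fit an additive student to the black box's predictions rather than to the labels, then measure how faithfully the student reproduces the black box. The snippet below is a minimal illustration under assumed choices (a gradient-boosted teacher, scikit-learn spline features as the additive student, and the synthetic Friedman #1 data, which contains a genuine interaction); it is not the authors' implementation, which is linked under Code availability below.

```python
# Minimal sketch of a distilled additive explanation (illustrative assumptions,
# not the authors' pipeline): an additive student is fit to the predictions of
# a non-additive black-box teacher.
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import SplineTransformer

# Friedman #1 includes an interaction term, so the teacher's prediction
# function is genuinely non-additive.
X, y = make_friedman1(n_samples=5000, noise=1.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# 1. Train the black-box teacher on the labels.
teacher = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)

# 2. Distill: fit an additive student to the teacher's *predictions*.
#    Per-feature spline bases with no interaction terms keep the student
#    strictly additive (one shape function per feature).
student = make_pipeline(SplineTransformer(n_knots=10, degree=3), RidgeCV())
student.fit(X_tr, teacher.predict(X_tr))

# 3. Fidelity of the additive explanation to the black box on held-out data;
#    the residual is the non-additive structure the student cannot capture.
gap = student.predict(X_te) - teacher.predict(X_te)
print(f"RMSE of additive student vs. teacher: {np.sqrt(np.mean(gap ** 2)):.3f}")
```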

Data availability

Publicly available datasets used in this paper can be found at the websites of UCI (Bikeshare, Magic), Kaggle (Lending Club), and FICO. Whenever available, links are cited in this paper’s references.

Code availability

Publicly-available code to help other researchers replicate the work can be found at: https://github.com/shftan/distilled_additive_explanations.

Notes

  1. The name is counter-intuitive as it is based on the conditional, not the marginal, distribution (a contrast is sketched after these notes).

  2. State-of-the-art rule lists (Letham et al., 2015; Angelino et al., 2017) do not support regression, which is needed for distillation. We used a slightly older subgroup discovery algorithm (Atzmueller & Lemmerich, 2012) that supports regression but does not generate disjoint rules. This method only achieved reasonable results on Bikeshare.

  3. For decision trees, K represents the depth; a tree of depth 4 is denoted DT-4. For sparse rules, K represents the number of rules; a group of 5 rules is denoted RULES-5. For SAT and SPARSE, K denotes the number of features to use. For SPARSE, K is set indirectly by finding the regularization parameter lambda that yields the best validation accuracy while producing exactly K non-zero feature coefficients (a selection sketch follows these notes).
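Regarding Note 1, and assuming the note refers to so-called marginal (M-) plots as contrasted with partial dependence (cf. Apley & Zhu, 2020, in the references), the distinction it alludes to can be written as follows; this is background context, not a restatement of the paper's derivation.

```latex
% Partial dependence averages over the marginal distribution of the other
% features X_{-j}; an M-plot conditions on the feature of interest X_j.
\[
\mathrm{PD}_j(x_j) \;=\; \mathbb{E}_{X_{-j}}\!\left[\, f(x_j, X_{-j}) \,\right]
\qquad \text{vs.} \qquad
\mathrm{M}_j(x_j) \;=\; \mathbb{E}\!\left[\, f(X) \mid X_j = x_j \,\right].
\]
```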
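Regarding Note 3, here is a minimal sketch of how the SPARSE-K selection could be implemented, assuming an L1-regularized linear student and mean squared error as the validation criterion; the helper name, the lambda grid, and these choices are illustrative, not the authors' code.

```python
# Hedged sketch for Note 3: among Lasso fits over a grid of regularization
# strengths, keep only those with exactly k non-zero coefficients and return
# the one with the lowest validation error. Names and grid are assumptions.
import numpy as np
from sklearn.linear_model import Lasso

def fit_sparse_k(X_train, y_train, X_val, y_val, k,
                 lambdas=np.logspace(-4, 1, 200)):
    best_model, best_err = None, np.inf
    for lam in lambdas:
        model = Lasso(alpha=lam, max_iter=10000).fit(X_train, y_train)
        if np.count_nonzero(model.coef_) != k:
            continue  # only candidates with exactly k selected features
        err = np.mean((model.predict(X_val) - y_val) ** 2)
        if err < best_err:
            best_model, best_err = model, err
    return best_model  # None if no lambda on the grid yields exactly k features
```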

References

  • Adebayo, J., Gilmer, J., Muelly, M., Goodfellow, I., Hardt, M., & Kim, B. (2019). Sanity checks for saliency maps. In NeurIPS.

  • Amodio, S., Aria, M., & D’Ambrosio, A. (2014). On concurvity in nonlinear and nonparametric regression models. Statistica.

  • Ancona, M., Ceolini, E., Oztireli, C., & Gross, M. (2018). Towards better understanding of gradient-based attribution methods for Deep Neural Networks. In ICLR.

  • Angelino, E., Larus-Stone, N., Alabi, D., Seltzer, M., & Rudin, C. (2017). Learning certifiably optimal rule lists. In KDD.

  • Apley, D. W., & Zhu, J. (2020). Visualizing the effects of predictor variables in black box supervised learning models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 82, 4.

  • Atzmueller, M., & Lemmerich, F. (2012). VIKAMINE - Open-source subgroup discovery, pattern mining and analytics. In ECML PKDD.

  • Ba, J., & Caruana, R. (2014). Do deep nets really need to be deep?. In NeurIPS.

  • Bach, S., Binder, A., Montavon, G., Klauschen, F., Muller, K. R., & Samek, W. (2015). On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PloS ONE, 10, 7.

  • Bastani, O., Kim, C., & Bastani, H. (2017). Interpreting blackbox models via model extraction. In FAT/ML Workshop.

  • Bau, D., Zhou, B., Khosla, A., Oliva, A., & Torralba, A. (2017). Network dissection: Quantifying interpretability of deep visual representations. In CVPR.

  • Bhatt, U., Weller, A., & Moura, J. M. F. (2020). Evaluating and aggregating feature-based model explanations. In IJCAI.

  • Bien, J., & Tibshirani, R. (2011). Prototype selection for interpretable classification. The Annals of Applied Statistics, 5, 4.

  • Breiman, L. (2001). Random forests. Machine Learning, 45, 1.

  • Bucilua, C., Caruana, R., & Niculescu-Mizil, A. (2006). Model compression. In KDD.

  • Caruana, R., Lou, Y., Gehrke, J., Koch, P., Sturm, M., & Elhadad, N. (2015). Intelligible models for healthcare: Predicting pneumonia risk and hospital 30-day readmission. In KDD.

  • Chang, C. H., Tan, S., Lengerich, B., Goldenberg, A., & Caruana, R. (2021). How interpretable and trustworthy are GAMs? In KDD.

  • Covert, I., Lundberg, S., & Lee, S.I. (2020). Understanding global feature contributions through additive importance measures. In NeurIPS.

  • Craven, M. W., & Shavlik, J. W. (1995). Extracting tree-structured representations of trained networks. In NeurIPS.

  • Doshi-Velez, F., & Kim, B. (2018). Towards A rigorous science of interpretable machine learning. In Explainable and interpretable models in computer vision and machine learning. Springer.

  • FICO. (2018). FICO explainable machine learning challenge. https://community.fico.com/s/explainable-machine-learning-challenge.

  • Fisher, A. J., Rudin, C., & Dominici, F. (2019). All models are wrong, but many are useful: Learning a variable’s importance by studying an entire class of prediction models simultaneously. Journal of Machine Learning Research, 20(177).

  • Friedman, J. H. (2001). Greedy function approximation: A gradient boosting machine. The Annals of Statistics, 29, 5.

  • Friedman, J. H., & Popescu, B. E. (2008). Predictive learning via rule ensembles. The Annals of Applied Statistics, 2, 3.

  • Frosst, N., & Hinton, G. (2018). Distilling a neural network into a soft decision tree. In CEUR-WS.

  • Glorot, X., & Bengio, Y. (2010). Understanding the difficulty of training deep feedforward neural networks. In AISTATS.

  • Hastie, T., & Tibshirani, R. (1986). Generalized additive models. Statistical Science, 1(3), 297–310.

  • Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning: Data mining, inference, and prediction. Springer.

  • Hinton, G., Vinyals, O., & Dean, J. (2014). Distilling the knowledge in a neural network. In NeurIPS Deep learning and representation learning workshop.

  • Hooker, G. (2004). Discovering additive structure in black box functions. In KDD.

  • Ibrahim, M., Louie, M., Modarres, C., & Paisley, J. (2019). Mapping the landscape of predictions: Global explanations of neural networks. In AIES.

  • Ioffe, S., & Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift. In ICML.

  • Jesus, S., Belém, C., Balayan, V., Bento, J., Saleiro, P., Bizarro, P., & Gama, J. (2021). How can I choose an explainer? An application-grounded evaluation of post-hoc explanations. In FAccT.

  • Rawal, K., & Lakkaraju, H. (2020). Beyond individualized recourse: Interpretable and interactive summaries of actionable recourses. In NeurIPS.

  • Kaur, H., Nori, H., Jenkins, S., Caruana, R., Wallach, H., & Wortman Vaughan, J. (2020). Interpreting interpretability: Understanding data scientists’ use of interpretability tools for machine learning. In CHI.

  • Kim, B., Khanna, R., & Koyejo, O. (2016). Examples are not enough, learn to criticize! criticism for interpretability. In NeurIPS.

  • Kim, B., Wattenberg, M., Gilmer, J., Cai, C.J., Wexler, J., Viegas, F., & Sayres, R.A. (2018). Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (TCAV). In ICML.

  • Kingma, D. P., & Ba, J. L. (2015). Adam: A method for stochastic optimization. In ICLR.

  • Lage, I., Chen, E., He, J., Narayanan, M., Kim, B., Gershman, S. J., & Doshi-Velez, F. (2019). Human evaluation of models built for interpretability. In HCOMP.

  • Lakkaraju, H., Kamar, E., Caruana, R., & Leskovec, J. (2019). Faithful and customizable explanations of black box models. In AIES.

  • Lending Club. (2011). Lending Club Loan Dataset 2007-2011. https://www.lendingclub.com/info/download-data.action.

  • Lengerich, B., Tan, S., Chang, C. H., Hooker, G., & Caruana, R. (2020). An efficient algorithm for recovering identifiable additive models: Purifying interaction effects with the functional anova. In AISTATS.

  • Letham, B., Rudin, C., McCormick, T. H., Madigan, D., et al. (2015). Interpretable classifiers using rules and bayesian analysis: Building a better stroke prediction model. The Annals of Applied Statistics, 9, 3.

  • Fu, L. (1994). Rule generation from neural networks. IEEE Transactions on Systems, Man, and Cybernetics, 24, 8.

  • Lou, Y., Caruana, R., & Gehrke, J. (2012). Intelligible models for classification and regression. In KDD.

  • Lou, Y., Caruana, R., Gehrke, J., & Hooker, G. (2013). Accurate intelligible models with pairwise interactions. In KDD.

  • Lundberg, S. M., & Lee, S. I. (2017). A unified approach to interpreting model predictions. In NeurIPS.

  • Montavon, G., Samek, W., & Muller, K. R. (2018). Methods for interpreting and understanding deep neural networks. Digital Signal Processing, 73, 1–15.

  • Mu, J., & Andreas, J. (2020). Compositional explanations of neurons. In NeurIPS.

  • Nori, H., Jenkins, S., Koch, P., & Caruana, R. (2019). InterpretML: A unified framework for machine learning interpretability. arXiv preprint arXiv:1909.09223.

  • Orlenko, A., & Moore, J. H. (2021). A comparison of methods for interpreting random forest models of genetic association in the presence of non-additive interactions. BioData Mining, 14, 1.

  • Owen, A. B. (2014). Sobol’ indices and Shapley value. SIAM/ASA Journal on Uncertainty Quantification.

  • Poursabzi-Sangdeh, F., Goldstein, D. G., Hofman, J. M., Wortman Vaughan, J. W., & Wallach, H. (2021). Manipulating and measuring model interpretability. In CHI.

  • Ribeiro, M. T., Singh, S., & Guestrin, C. (2016). Why should I trust you?: Explaining the predictions of any classifier. In KDD.

  • Rudin, C. (2019). Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5), 206–215.

  • Sanchez, I., Rocktaschel, T., Riedel, S., & Singh, S. (2015). Towards extracting faithful and descriptive representations of latent variable models. In AAAI Spring Symposium on Knowledge Representation and Reasoning: Integrating Symbolic and Neural Approaches.

  • Setzu, M., Guidotti, R., Monreale, A., Turini, F., Pedreschi, D., & Giannotti, F. (2021). GLocalX - From local to global explanations of black box AI models. Artificial Intelligence, 294.

  • Shrikumar, A., Greenside, P., & Kundaje, A. (2017). Learning important features through propagating activation differences. In ICML.

  • Simonyan, K., Vedaldi, A., & Zisserman, A. (2014). Deep inside convolutional networks: Visualising image classification models and saliency maps. In ICLR Workshop.

  • Simonyan, K., & Zisserman, A. (2015). Very deep convolutional networks for large-scale image recognition. In ICLR.

  • Sobol’, I. M. (1990). On sensitivity estimation for nonlinear mathematical models. Matematicheskoe modelirovanie, 2, 1.

  • Slack, D., Hilgard, S., Jia, E., Singh, S., & Lakkaraju, H. (2020). Fooling LIME and SHAP: Adversarial attacks on post hoc explanation methods. In AIES.

  • Štrumbelj, E., & Kononenko, I. (2014). Explaining prediction models and individual predictions with feature contributions. Knowledge and Information Systems, 41, 3.

  • Tan, S. (2018). Interpretable approaches to detect bias in black-box models. In AIES doctoral consortium.

  • Tan, S., Caruana, R., Hooker, G., & Lou, Y. (2018). Distill-and-compare: Auditing black-box models using transparent model distillation. In AIES.

  • Tan, S., Soloviev, M., Hooker, G., & Wells, M. T. (2020). Tree space prototypes: Another look at making tree ensembles interpretable. In FODS.

  • Tsang, M., Cheng, D., & Liu, Y. (2018). Detecting statistical interactions from neural network weights. In ICLR.

  • van der Linden, I., Haned, H., & Kanoulas, E. (2019). Global aggregations of local explanations for black box models. In SIGIR Fairness, accountability, confidentiality, transparency, and safety workshop.

  • Williamson, B., & Feng, J. (2020). Efficient nonparametric statistical inference on population feature importance using Shapley values. In ICML.

  • Wood, S. N. (2006). Generalized additive models: An introduction with R. Chapman and Hall/CRC.

  • Wood, S. N. (2011). Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models. Journal of the Royal Statistical Society: Series B, 73, 1.

  • Yan, T., & Procaccia, A. D. (2021). If you like Shapley then you’ll love the core. In AAAI.

  • Zhao, Q., & Hastie, T. (2021). Causal interpretations of black-box models. Journal of Business & Economic Statistics, 39, 1.

Acknowledgements

We thank Julius Adebayo for helpful discussion.

Funding

Giles Hooker was supported by NSF Grant No DMS-1712554.

Author information

Authors and Affiliations

Authors

Contributions

The authors contributed to this paper in the following manner: Sarah Tan designed, executed, and analyzed the experiments and wrote the paper. Giles Hooker formulated mathematics in the paper and wrote the paper. Paul Koch wrote software used by the experiments. Albert Gordo executed experiments and wrote the paper. Rich Caruana analyzed experiments and wrote the paper.

Corresponding author

Correspondence to Sarah Tan.

Ethics declarations

Conflict of interest

The authors declare that there is no conflict of interest.

Consent to participate

User study subjects consented to participate in the user study.

Additional information

Editor: Nathalie Japkowicz.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work started before the authors joined Facebook.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Tan, S., Hooker, G., Koch, P. et al. Considerations when learning additive explanations for black-box models. Mach Learn 112, 3333–3359 (2023). https://doi.org/10.1007/s10994-023-06335-8
