Considerations when learning additive explanations for black-box models

Abstract

Many methods to explain black-box models, whether local or global, are additive. In this paper, we study global additive explanations for non-additive models, focusing on four explanation methods: partial dependence, Shapley explanations adapted to a global setting, distilled additive explanations, and gradient-based explanations. We show that different explanation methods characterize non-additive components in a black-box model’s prediction function in different ways. We use the concepts of main and total effects to anchor additive explanations, and quantitatively evaluate additive and non-additive explanations. Even though distilled explanations are generally the most accurate additive explanations, non-additive explanations such as tree explanations that explicitly model non-additive components tend to be even more accurate. Despite this, our user study showed that machine learning practitioners were better able to leverage additive explanations for various tasks. These considerations should be taken into account when deciding which explanation to trust and use to explain black-box models.
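The abstract's distinction between a non-additive black box and its additive explanation can be made concrete with a small distillation sketch: fit an additive student to the black box's predictions rather than to the labels, then measure how faithfully the student reproduces the black box. The snippet below is a minimal illustration under assumed choices (a gradient-boosted teacher, scikit-learn spline features as the additive student, and the synthetic Friedman #1 data, which contains a genuine interaction); it is not the authors' implementation, which is linked under Code availability below.

```python
# Minimal sketch of a distilled additive explanation (illustrative assumptions,
# not the authors' pipeline): an additive student is fit to the predictions of
# a non-additive black-box teacher.
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import SplineTransformer

# Friedman #1 includes an interaction term, so the teacher's prediction
# function is genuinely non-additive.
X, y = make_friedman1(n_samples=5000, noise=1.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# 1. Train the black-box teacher on the labels.
teacher = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)

# 2. Distill: fit an additive student to the teacher's *predictions*.
#    Per-feature spline bases with no interaction terms keep the student
#    strictly additive (one shape function per feature).
student = make_pipeline(SplineTransformer(n_knots=10, degree=3), RidgeCV())
student.fit(X_tr, teacher.predict(X_tr))

# 3. Fidelity of the additive explanation to the black box on held-out data;
#    the residual is the non-additive structure the student cannot capture.
gap = student.predict(X_te) - teacher.predict(X_te)
print(f"RMSE of additive student vs. teacher: {np.sqrt(np.mean(gap ** 2)):.3f}")
```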

Data availability

Publicly available datasets used in this paper can be found at the websites of UCI (Bikeshare, Magic), Kaggle (Lending Club), and FICO. Whenever available, links are cited in this paper’s references.

Code availability

Publicly-available code to help other researchers replicate the work can be found at: https://github.com/shftan/distilled_additive_explanations.

Notes

  1. The name is counter-intuitive as it is based on the conditional, not the marginal, distribution (a contrast is sketched after these notes).

  2. State-of-the-art rule lists (Letham et al., 2015; Angelino et al., 2017) do not support regression, which is needed for distillation. We used a slightly older subgroup discovery algorithm (Atzmueller & Lemmerich, 2012) that supports regression but does not generate disjoint rules. This method only achieved reasonable results on Bikeshare.

  3. For decision trees, K represents the depth; a tree of depth 4 is denoted DT-4. For sparse rules, K represents the number of rules; a group of 5 rules is denoted RULES-5. For SAT and SPARSE, K denotes the number of features to use. For SPARSE, K is set indirectly by finding the regularization parameter lambda that yields the best validation accuracy while producing exactly K non-zero feature coefficients (a selection sketch follows these notes).
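Regarding Note 1, and assuming the note refers to so-called marginal (M-) plots as contrasted with partial dependence (cf. Apley & Zhu, 2020, in the references), the distinction it alludes to can be written as follows; this is background context, not a restatement of the paper's derivation.

```latex
% Partial dependence averages over the marginal distribution of the other
% features X_{-j}; an M-plot conditions on the feature of interest X_j.
\[
\mathrm{PD}_j(x_j) \;=\; \mathbb{E}_{X_{-j}}\!\left[\, f(x_j, X_{-j}) \,\right]
\qquad \text{vs.} \qquad
\mathrm{M}_j(x_j) \;=\; \mathbb{E}\!\left[\, f(X) \mid X_j = x_j \,\right].
\]
```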
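Regarding Note 3, here is a minimal sketch of how the SPARSE-K selection could be implemented, assuming an L1-regularized linear student and mean squared error as the validation criterion; the helper name, the lambda grid, and these choices are illustrative, not the authors' code.

```python
# Hedged sketch for Note 3: among Lasso fits over a grid of regularization
# strengths, keep only those with exactly k non-zero coefficients and return
# the one with the lowest validation error. Names and grid are assumptions.
import numpy as np
from sklearn.linear_model import Lasso

def fit_sparse_k(X_train, y_train, X_val, y_val, k,
                 lambdas=np.logspace(-4, 1, 200)):
    best_model, best_err = None, np.inf
    for lam in lambdas:
        model = Lasso(alpha=lam, max_iter=10000).fit(X_train, y_train)
        if np.count_nonzero(model.coef_) != k:
            continue  # only candidates with exactly k selected features
        err = np.mean((model.predict(X_val) - y_val) ** 2)
        if err < best_err:
            best_model, best_err = model, err
    return best_model  # None if no lambda on the grid yields exactly k features
```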

References

  • Adebayo, J., Gilmer, J., Muelly, M., Goodfellow, I., Hardt, M., & Kim, B. (2019). Sanity checks for saliency maps. In NeurIPS.

  • Amodio, S., Aria, M., & D’Ambrosio, A. (2014). On concurvity in nonlinear and nonparametric regression models. Statistica.

  • Ancona, M., Ceolini, E., Oztireli, C., & Gross, M. (2018). Towards better understanding of gradient-based attribution methods for Deep Neural Networks. In ICLR.

  • Angelino, E., Larus-Stone, N., Alabi, D., Seltzer, M., & Rudin, C. (2017). Learning certifiably optimal rule lists. In KDD.

  • Apley, D. W., & Zhu, J. (2020). Visualizing the effects of predictor variables in black box supervised learning models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 82, 4.

  • Atzmueller, M., & Lemmerich, F. (2012). VIKAMINE - Open-source subgroup discovery, pattern mining and analytics. In ECML PKDD.

  • Ba, J., & Caruana, R. (2014). Do deep nets really need to be deep?. In NeurIPS.

  • Bach, S., Binder, A., Montavon, G., Klauschen, F., Muller, K. R., & Samek, W. (2015). On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PloS ONE, 10, 7.

  • Bastani, O., Kim, C., & Bastani, H. (2017). Interpreting blackbox models via model extraction. In FAT/ML Workshop.

  • Bau, D., Zhou, B., Khosla, A., Oliva, A., & Torralba, A. (2017). Network dissection: Quantifying interpretability of deep visual representations. In CVPR.

  • Bhatt, U., Weller, A., & Moura, J. M. F. (2020). Evaluating and aggregating feature-based model explanations. In IJCAI.

  • Bien, J., & Tibshirani, R. (2011). Prototype selection for interpretable classification. The Annals of Applied Statistics, 5, 4.

  • Breiman, L. (2001). Random forests. Machine Learning, 45, 1.

  • Bucilua, C., Caruana, R., & Niculescu-Mizil, A. (2006). Model compression. In KDD.

  • Caruana, R., Lou, Y., Gehrke, J., Koch, P., Sturm, M., & Elhadad, N. (2015). Intelligible models for healthcare: Predicting pneumonia risk and hospital 30-day readmission. In KDD.

  • Chang, C. H., Tan, S., Lengerich, B., Goldenberg, A., & Caruana, R. (2021). How interpretable and trustworthy are GAMs? In KDD.

  • Covert, I., Lundberg, S., & Lee, S.I. (2020). Understanding global feature contributions through additive importance measures. In NeurIPS.

  • Craven, M. W., & Shavlik, J. W. (1995). Extracting tree-structured representations of trained networks. In NeurIPS.

  • Doshi-Velez, F., & Kim, B. (2018). Towards A rigorous science of interpretable machine learning. In Explainable and interpretable models in computer vision and machine learning. Springer.

  • FICO. (2018). FICO explainable machine learning challenge. https://community.fico.com/s/explainable-machine-learning-challenge.

  • Fisher, A. J., Rudin, C., & Dominici, F. (2019). All models are wrong, but many are useful: Learning a variable’s importance by studying an entire class of prediction models simultaneously. Journal of Machine Learning Research, 20(177).

  • Friedman, J. H. (2001). Greedy function approximation: A gradient boosting machine. The Annals of Statistics, 29, 5.

  • Friedman, J. H., & Popescu, B. E. (2008). Predictive learning via rule ensembles. The Annals of Applied Statistics, 2, 3.

  • Frosst, N., & Hinton, G. (2018). Distilling a neural network into a soft decision tree. In CEUR-WS.

  • Glorot, X., & Bengio, Y. (2010). Understanding the difficulty of training deep feedforward neural networks. In AISTATS.

  • Hastie, T., & Tibshirani, R. (1986). Generalized additive models. Statistical Science, 1(3), 297–310.

  • Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning: Data mining, inference, and prediction. Springer.

  • Hinton, G., Vinyals, O., & Dean, J. (2014). Distilling the knowledge in a neural network. In NeurIPS Deep learning and representation learning workshop.

  • Hooker, G. (2004). Discovering additive structure in black box functions. In KDD.

  • Ibrahim, M., Louie, M., Modarres, C., & Paisley, J. (2019). Mapping the landscape of predictions: Global explanations of neural networks. In AIES.

  • Ioffe, S., & Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift. In ICML.

  • Jesus, S., Belém, C., Balayan, V., Bento, J., Saleiro, P., Bizarro, P., & Gama, J. (2021). How can I choose an explainer? An application-grounded evaluation of post-hoc explanations. In FAccT.

  • Rawal, K., & Lakkaraju, H. (2020). Beyond individualized recourse: Interpretable and interactive summaries of actionable recourses. In NeurIPS.

  • Kaur, H., Nori, H., Jenkins, S., Caruana, R., Wallach, H., & Wortman Vaughan, J. (2020). Interpreting interpretability: Understanding data scientists’ use of interpretability tools for machine learning. In CHI.

  • Kim, B., Khanna, R., & Koyejo, O. (2016). Examples are not enough, learn to criticize! criticism for interpretability. In NeurIPS.

  • Kim, B., Wattenberg, M., Gilmer, J., Cai, C.J., Wexler, J., Viegas, F., & Sayres, R.A. (2018). Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (TCAV). In ICML.

  • Kingma, D. P., & Ba, J. L. (2015). Adam: A method for stochastic optimization. In ICLR.

  • Lage, I., Chen, E., He, J., Narayanan, M., Kim, B., Gershman, S. J., & Doshi-Velez, F. (2019). Human evaluation of models built for interpretability. In HCOMP.

  • Lakkaraju, H., Kamar, E., Caruana, R., & Leskovec, J. (2019). Faithful and customizable explanations of black box models. In AIES.

  • Lending Club. (2011). Lending Club Loan Dataset 2007-2011. https://www.lendingclub.com/info/download-data.action.

  • Lengerich, B., Tan, S., Chang, C. H., Hooker, G., & Caruana, R. (2020). An efficient algorithm for recovering identifiable additive models: Purifying interaction effects with the functional anova. In AISTATS.

  • Letham, B., Rudin, C., McCormick, T. H., Madigan, D., et al. (2015). Interpretable classifiers using rules and bayesian analysis: Building a better stroke prediction model. The Annals of Applied Statistics, 9, 3.

  • Fu, L. (1994). Rule generation from neural networks. IEEE Transactions on Systems, Man, and Cybernetics, 24, 8.

  • Lou, Y., Caruana, R., & Gehrke, J. (2012). Intelligible models for classification and regression. In KDD.

  • Lou, Y., Caruana, R., Gehrke, J., & Hooker, G. (2013). Accurate intelligible models with pairwise interactions. In KDD.

  • Lundberg, S. M., & Lee, S. I. (2017). A unified approach to interpreting model predictions. In NeurIPS.

  • Montavon, G., Samek, W., & Muller, K. R. (2018). Methods for interpreting and understanding deep neural networks. Digital Signal Processing, 73, 1–15.

  • Mu, J., & Andreas, J. (2020). Compositional explanations of neurons. In NeurIPS.

  • Nori, H., Jenkins, S., Koch, P., & Caruana, R. (2019). InterpretML: A unified framework for machine learning interpretability. arXiv preprint arXiv:1909.09223.

  • Orlenko, A., & Moore, J. H. (2021). A comparison of methods for interpreting random forest models of genetic association in the presence of non-additive interactions. BioData Mining, 14, 1.

  • Owen, A. B. (2014). Sobol’ indices and Shapley value. SIAM/ASA Journal on Uncertainty Quantification.

  • Poursabzi-Sangdeh, F., Goldstein, D. G., Hofman, J. M., Wortman Vaughan, J. W., & Wallach, H. (2021). Manipulating and measuring model interpretability. In CHI.

  • Ribeiro, M. T., Singh, S., & Guestrin, C. (2016). Why should I trust you?: Explaining the predictions of any classifier. In KDD.

  • Rudin, C. (2019). Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5), 206–215.

  • Sanchez, I., Rocktaschel, T., Riedel, S., & Singh, S. (2015). Towards extracting faithful and descriptive representations of latent variable models. In AAAI Spring Symposium on Knowledge Representation and Reasoning: Integrating Symbolic and Neural Approaches.

  • Setzu, M., Guidotti, R., Monreale, A., Turini, F., Pedreschi, D., & Giannotti, F. (2021). GLocalX - From local to global explanations of black box AI models. Artificial Intelligence, 294.

  • Shrikumar, A., Greenside, P., & Kundaje, A. (2017). Learning important features through propagating activation differences. In ICML.

  • Simonyan, K., Vedaldi, A., & Zisserman, A. (2014). Deep inside convolutional networks: Visualising image classification models and saliency maps. In ICLR Workshop.

  • Simonyan, K., & Zisserman, A. (2015). Very deep convolutional networks for large-scale image recognition. In ICLR.

  • Sobol’, I. M. (1990). On sensitivity estimation for nonlinear mathematical models. Matematicheskoe modelirovanie, 2, 1.

  • Slack, D., Hilgard, S., Jia, E., Singh, S., & Lakkaraju, H. (2020). Fooling LIME and SHAP: Adversarial attacks on post hoc explanation methods. In AIES.

  • Štrumbelj, E., & Kononenko, I. (2014). Explaining prediction models and individual predictions with feature contributions. Knowledge and Information Systems, 41, 3.

  • Tan, S. (2018). Interpretable approaches to detect bias in black-box models. In AIES doctoral consortium.

  • Tan, S., Caruana, R., Hooker, G., & Lou, Y. (2018). Distill-and-compare: Auditing black-box models using transparent model distillation. In AIES.

  • Tan, S., Soloviev, M., Hooker, G., & Wells, M. T. (2020). Tree space prototypes: Another look at making tree ensembles interpretable. In FODS.

  • Tsang, M., Cheng, D., & Liu, Y. (2018). Detecting statistical interactions from neural network weights. In ICLR.

  • van der Linden, I., Haned, H., & Kanoulas, E. (2019). Global aggregations of local explanations for black box models. In SIGIR Fairness, accountability, confidentiality, transparency, and safety workshop.

  • Williamson, B., & Feng, J. (2020). Efficient nonparametric statistical inference on population feature importance using Shapley values. In ICML.

  • Wood, S. N. (2006). Generalized additive models: An introduction with R. Chapman and Hall/CRC.

  • Wood, S. N. (2011). Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models. Journal of the Royal Statistical Society: Series B, 73, 1.

  • Yan, T., & Procaccia, A. D. (2021). If you like Shapley then you’ll love the core. In AAAI.

  • Zhao, Q., & Hastie, T. (2021). Causal interpretations of black-box models. Journal of Business & Economic Statistics, 39, 1.

Acknowledgements

We thank Julius Adebayo for helpful discussion.

Funding

Giles Hooker was supported by NSF Grant No DMS-1712554.

Author information

Authors and Affiliations

Authors

Contributions

The authors contributed to this paper in the following manner: Sarah Tan designed, executed, and analyzed the experiments and wrote the paper. Giles Hooker formulated mathematics in the paper and wrote the paper. Paul Koch wrote software used by the experiments. Albert Gordo executed experiments and wrote the paper. Rich Caruana analyzed experiments and wrote the paper.

Corresponding author

Correspondence to Sarah Tan.

Ethics declarations

Conflict of interest

The authors declare that there is no conflict of interest.

Consent to participate

User study subjects consented to participate in the user study.

Additional information

Editor: Nathalie Japkowicz.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work started before the authors joined Facebook.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Tan, S., Hooker, G., Koch, P. et al. Considerations when learning additive explanations for black-box models. Mach Learn 112, 3333–3359 (2023). https://doi.org/10.1007/s10994-023-06335-8
