Abstract
While a wide variety of machine-learning techniques have been productively applied to diverse prediction tasks, characterizing the patterns and rules these techniques learn remains a difficult problem; such models are often called ‘black boxes’ for this reason, and visualization has become a prominent area of research for understanding their behavior. One powerful tool for summarizing complex models, the partial dependence plot (PDP), offers a low-dimensional graphical interpretation by evaluating the effect of modifying individual predictors on fitted or predicted values. In high-dimensional settings, however, PDPs may not capture more complex associations between groups of related variables and the outcome of interest. We propose an extension of PDPs based on the idea of grouping covariates and interpreting the total effects of the groups. The method uses principal components analysis to explore the structure of the covariates and offers several plots for assessing the approximation function. In conjunction with our diagnostic plot, totalvis gives insight into the total effect a group of covariates has on the prediction and can be used in situations where PDPs may not be appropriate. These tools provide a useful approach for pattern exploration, as well as a natural mechanism to reason about potential causal effects embedded in black-box models.
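To make the two ideas in the abstract concrete, the sketch below illustrates (1) a classic partial dependence curve, which averages predictions while one feature is fixed over a grid, and (2) the PCA-based variant described above, which fixes the score on one principal component and maps back to feature space so that correlated covariates move together. This is a minimal illustration of the general technique, written in Python with a toy linear-ish model; it is not the totalvis implementation itself (the paper's package is in R, and function names here are the author's own).

```python
import numpy as np

def partial_dependence(predict, X, feature, grid):
    """Classic PDP: mean prediction as one feature is fixed over a grid."""
    curve = []
    for v in grid:
        X_mod = X.copy()
        X_mod[:, feature] = v          # fix this feature for every row
        curve.append(predict(X_mod).mean())
    return np.array(curve)

def pc_partial_dependence(predict, X, pc, grid):
    """PCA-based total-effect curve: fix the score on one principal
    component, reconstruct the data in feature space, and average the
    predictions. Grouped (correlated) covariates shift jointly."""
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    scores = (X - mu) @ Vt.T
    curve = []
    for v in grid:
        s_mod = scores.copy()
        s_mod[:, pc] = v               # fix one PC score for every row
        X_back = s_mod @ Vt + mu       # back-project to the original features
        curve.append(predict(X_back).mean())
    return np.array(curve)

# Toy data and models
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
predict_add = lambda X: 2.0 * X[:, 0] + X[:, 1] * X[:, 2]   # additive in x0
predict_lin = lambda X: X @ np.array([2.0, -1.0, 0.5])      # fully linear

grid = np.array([-1.0, 0.0, 1.0])
pdp = partial_dependence(predict_add, X, feature=0, grid=grid)
pc_curve = pc_partial_dependence(predict_lin, X, pc=0, grid=grid)
```

For the additive model the PDP of feature 0 recovers its slope exactly (the difference `pdp[2] - pdp[0]` equals 4 for a slope of 2 over a span of 2), and for a linear model the PC curve is itself linear in the component score.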
Ethics declarations
Conflict of interest
On behalf of all the authors, the corresponding author states that there is no conflict of interest.
Funding
None.
Availability of data and material
All data are publicly available at https://archive.ics.uci.edu/ml/datasets/Communities+and+Crime+Unnormalized.
Code availability
Code open source available at https://github.com/nickseedorff/totalvis.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
See Table 2 for a summary of the relevant variables explored in the community and crimes application.
About this article
Cite this article
Seedorff, N., Brown, G. totalvis: A Principal Components Approach to Visualizing Total Effects in Black Box Models. SN COMPUT. SCI. 2, 141 (2021). https://doi.org/10.1007/s42979-021-00560-5