Abstract
While a wide variety of machine-learning techniques have been productively applied to diverse prediction tasks, characterizing the patterns and rules these techniques learn remains a difficult problem; such models are often called ‘black boxes’ for this reason, and visualization has become a prominent area of research for understanding their behavior. One powerful tool for summarizing complex models, the partial dependence plot (PDP), offers a low-dimensional graphical interpretation by evaluating the effect of modifying individual predictors on fitted or predicted values. In high-dimensional settings, however, PDPs may not capture more complex associations between groups of related variables and the outcome of interest. We propose an extension of PDPs based on the idea of grouping covariates and interpreting the total effects of the groups. The method uses principal components analysis to explore the structure of the covariates and offers several plots for assessing the approximation function. In conjunction with our diagnostic plot, totalvis gives insight into the total effect a group of covariates has on the prediction and can be used in situations where PDPs may not be appropriate. These tools provide a useful approach for pattern exploration, as well as a natural mechanism to reason about potential causal effects embedded in black-box models.
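To make the two ideas in the abstract concrete, the sketch below illustrates (1) a classic partial dependence curve, which averages predictions while one feature is fixed over a grid, and (2) the PCA-based variant described above, which fixes the score on one principal component and maps back to feature space so that correlated covariates move together. This is a minimal illustration of the general technique, written in Python with a toy linear-ish model; it is not the totalvis implementation itself (the paper's package is in R, and function names here are the author's own).

```python
import numpy as np

def partial_dependence(predict, X, feature, grid):
    """Classic PDP: mean prediction as one feature is fixed over a grid."""
    curve = []
    for v in grid:
        X_mod = X.copy()
        X_mod[:, feature] = v          # fix this feature for every row
        curve.append(predict(X_mod).mean())
    return np.array(curve)

def pc_partial_dependence(predict, X, pc, grid):
    """PCA-based total-effect curve: fix the score on one principal
    component, reconstruct the data in feature space, and average the
    predictions. Grouped (correlated) covariates shift jointly."""
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    scores = (X - mu) @ Vt.T
    curve = []
    for v in grid:
        s_mod = scores.copy()
        s_mod[:, pc] = v               # fix one PC score for every row
        X_back = s_mod @ Vt + mu       # back-project to the original features
        curve.append(predict(X_back).mean())
    return np.array(curve)

# Toy data and models
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
predict_add = lambda X: 2.0 * X[:, 0] + X[:, 1] * X[:, 2]   # additive in x0
predict_lin = lambda X: X @ np.array([2.0, -1.0, 0.5])      # fully linear

grid = np.array([-1.0, 0.0, 1.0])
pdp = partial_dependence(predict_add, X, feature=0, grid=grid)
pc_curve = pc_partial_dependence(predict_lin, X, pc=0, grid=grid)
```

For the additive model the PDP of feature 0 recovers its slope exactly (the difference `pdp[2] - pdp[0]` equals 4 for a slope of 2 over a span of 2), and for a linear model the PC curve is itself linear in the component score.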
Ethics declarations
Conflict of interest
On behalf of all the authors, the corresponding author states that there is no conflict of interest.
Funding
None.
Availability of data and material
All data are publicly available at https://archive.ics.uci.edu/ml/datasets/Communities+and+Crime+Unnormalized.
Code availability
Code open source available at https://github.com/nickseedorff/totalvis.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
See Table 2 for a summary of the relevant variables explored in the community and crimes application.
About this article
Cite this article
Seedorff, N., Brown, G. totalvis: A Principal Components Approach to Visualizing Total Effects in Black Box Models. SN COMPUT. SCI. 2, 141 (2021). https://doi.org/10.1007/s42979-021-00560-5