
Knowledge and Information Systems, Volume 54, Issue 1, pp 95–122

Auditing black-box models for indirect influence

  • Philip Adler
  • Casey Falk
  • Sorelle A. Friedler
  • Tionney Nix
  • Gabriel Rybeck
  • Carlos Scheidegger
  • Brandon Smith
  • Suresh Venkatasubramanian
Regular Paper

Abstract

Data-trained predictive models see widespread use, but for the most part they are used as black boxes that output a prediction or score. It is therefore hard to acquire a deeper understanding of model behavior and, in particular, of how different features influence the model's predictions. This is important when interpreting the behavior of complex models, or when asserting that certain problematic attributes (such as race or gender) are not unduly influencing decisions. In this paper, we present a technique for auditing black-box models that lets us study the extent to which existing models take advantage of particular features in the data set, without knowing how the models work. Our work focuses on the problem of indirect influence: how some features might influence outcomes via other, related features. As a result, we can detect an attribute's influence even in cases where direct examination shows that the model never refers to the attribute at all. Our approach does not require the black-box model to be retrained. This is important if, for example, the model is only accessible via an API; it also distinguishes our work from other methods that investigate feature influence, such as feature selection. We present experimental evidence for the effectiveness of our procedure using a variety of publicly available data sets and models. We also validate our procedure using techniques from interpretable learning and feature selection, as well as against other black-box auditing procedures. To further demonstrate the effectiveness of this technique, we use it to audit a black-box recidivism prediction algorithm.
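
To make the approach concrete, below is a minimal sketch of such an indirect-influence audit, assuming a scikit-learn-style black box with a .predict() method and a pandas DataFrame of features. The helper names (obscure, indirect_influence) and the simplified pooled-quantile repair are illustrative assumptions, not the paper's exact procedure: information about the audited feature is stripped from every other column, the feature itself is blanked, and the model's accuracy drop is reported as that feature's influence.

    # Hedged sketch: an indirect-influence audit of a black-box model.
    # Assumes `model` exposes .predict(), features live in a pandas DataFrame,
    # and the audited feature is categorical (or pre-binned). The pooled-quantile
    # repair below is a simplified stand-in for the rank-preserving repair this
    # line of work builds on.
    import pandas as pd

    def obscure(df: pd.DataFrame, feature: str) -> pd.DataFrame:
        """Strip information about `feature` from every other column.

        Each value in another column is replaced by the whole-column value at
        its within-group quantile, so that column's distribution no longer
        differs across groups of `feature` and cannot act as a proxy for it."""
        out = df.copy()
        for col in df.columns:
            if col == feature:
                continue
            # percentile rank of each value within its `feature` group
            q = df.groupby(feature)[col].rank(pct=True)
            # map that rank back through the pooled (whole-column) quantile function
            out[col] = q.map(lambda p: df[col].quantile(p))
        return out

    def indirect_influence(model, X: pd.DataFrame, y, feature: str) -> float:
        """Accuracy drop when `feature` and its proxies are obscured."""
        baseline = (model.predict(X) == y).mean()
        repaired = obscure(X, feature)
        # blank out the audited feature itself, so only proxy leakage remains
        repaired[feature] = X[feature].mode().iloc[0]
        return baseline - (model.predict(repaired) == y).mean()

Ranking all features by this score yields the audit: a large accuracy drop for an attribute the model never reads directly indicates that the model exploits correlated proxies for it. Note that the model is only queried, never retrained.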

Keywords

Black-box auditing · ANOVA · Algorithmic accountability · Deep learning · Discrimination-aware data mining · Feature influence · Interpretable machine learning


Copyright information

© Springer-Verlag London Ltd. 2017

Authors and Affiliations

  • Philip Adler (1)
  • Casey Falk (1)
  • Sorelle A. Friedler (1)
  • Tionney Nix (1)
  • Gabriel Rybeck (1)
  • Carlos Scheidegger (2)
  • Brandon Smith (1)
  • Suresh Venkatasubramanian (3)

  1. Department of Computer Science, Haverford College, Haverford, USA
  2. Department of Computer Science, University of Arizona, Tucson, USA
  3. Department of Computer Science, University of Utah, Salt Lake City, USA
