Algorithmic Transparency via Quantitative Input Influence

  • Chapter
Transparent Data Mining for Big and Small Data

Part of the book series: Studies in Big Data (SBD, volume 32)

Abstract

Algorithmic systems that employ machine learning are often opaque: it is difficult to explain why a certain decision was made. We present a formal foundation to improve the transparency of such decision-making systems. Specifically, we introduce a family of Quantitative Input Influence (QII) measures that capture the degree of input influence on system outputs. These measures provide a foundation for the design of transparency reports that accompany system decisions (e.g., explaining a specific credit decision) and for testing tools useful for internal and external oversight (e.g., to detect algorithmic discrimination). Distinctively, our causal QII measures carefully account for correlated inputs while measuring influence. They support a general class of transparency queries and can, in particular, explain decisions about individuals and groups. Finally, since single inputs may not always have high influence, the QII measures also quantify the joint influence of a set of inputs (e.g., age and income) on outcomes (e.g., loan decisions) and the average marginal influence of individual inputs within such a set (e.g., income) using principled aggregation measures, such as the Shapley value, previously applied to measure influence in voting.
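The abstract does not spell out the definitions, so the following is only a minimal sketch of the idea, under stated assumptions rather than the chapter's implementation. It assumes that the influence of a set of inputs on an individual's decision is the drop in the expected outcome when those inputs are replaced by values taken from other records in the data (an intervention that breaks their correlation with the remaining inputs), and that marginal influence is the Shapley value of that set function. The classifier `approve`, the toy records, and the function names `set_qii` and `shapley_qii` are all hypothetical.

```python
from itertools import permutations

# Hypothetical toy data: (age, income in $k). Age and income are correlated.
ROWS = [(25, 30), (35, 60), (45, 80), (28, 40)]


def approve(row):
    """Hypothetical black-box decision: approve if either threshold is met."""
    age, income = row
    return int(age >= 30 or income >= 50)


def set_qii(classifier, x, rows, features):
    """Influence of a set of features on the decision for individual x.

    Assumed definition: the outcome for x minus the average outcome when
    the chosen features of x are replaced by each record's values in turn,
    while the other features of x are left untouched.
    """
    total = 0
    for donor in rows:
        intervened = list(x)
        for j in features:
            intervened[j] = donor[j]
        total += classifier(intervened)
    return classifier(x) - total / len(rows)


def shapley_qii(classifier, x, rows, n_features):
    """Average marginal influence of each feature over all orderings."""
    totals = [0.0] * n_features
    orders = list(permutations(range(n_features)))
    for order in orders:
        included = []
        previous = 0.0  # the empty set has no influence
        for i in order:
            included.append(i)
            current = set_qii(classifier, x, rows, included)
            totals[i] += current - previous
            previous = current
    return [t / len(orders) for t in totals]


x = (35, 60)  # an approved individual
age_alone = set_qii(approve, x, ROWS, [0])       # 0.0
income_alone = set_qii(approve, x, ROWS, [1])    # 0.0
joint = set_qii(approve, x, ROWS, [0, 1])        # 0.5
marginal = shapley_qii(approve, x, ROWS, 2)      # [0.25, 0.25]
```

In this toy case neither input has any influence on its own, because the other input still secures approval, yet the pair has joint influence 0.5, and the Shapley value splits that influence evenly between age and income. Exact enumeration over all orderings is only feasible for a handful of inputs; a larger system would need sampling.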

Notes

  1. By “black-box access to the decision-making system” we mean a typical setting of software testing with complete control of inputs to the system and full observability of the outputs.

  2. The adult dataset contains approximately 31k datapoints of users’ personal attributes, and whether their income is more than $50k per annum; see Sect. 5 for more details.

Author information

Corresponding author

Correspondence to Anupam Datta.

Copyright information

© 2017 Springer International Publishing AG

About this chapter

Cite this chapter

Datta, A., Sen, S., Zick, Y. (2017). Algorithmic Transparency via Quantitative Input Influence. In: Cerquitelli, T., Quercia, D., Pasquale, F. (eds) Transparent Data Mining for Big and Small Data. Studies in Big Data, vol 32. Springer, Cham. https://doi.org/10.1007/978-3-319-54024-5_4

  • DOI: https://doi.org/10.1007/978-3-319-54024-5_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-54023-8

  • Online ISBN: 978-3-319-54024-5

  • eBook Packages: Engineering, Engineering (R0)
