Algorithmic Transparency via Quantitative Input Influence

Datta, Anupam; Sen, Shayak; Zick, Yair

doi:10.1007/978-3-319-54024-5_4

Part of the book series: Studies in Big Data (SBD, volume 32)

Abstract

Algorithmic systems that employ machine learning are often opaque: it is difficult to explain why a certain decision was made. We present a formal foundation to improve the transparency of such decision-making systems. Specifically, we introduce a family of Quantitative Input Influence (QII) measures that capture the degree to which inputs influence system outputs. These measures provide a foundation for the design of transparency reports that accompany system decisions (e.g., explaining a specific credit decision) and for testing tools useful for internal and external oversight (e.g., detecting algorithmic discrimination). Distinctively, our causal QII measures carefully account for correlated inputs while measuring influence. They support a general class of transparency queries and can, in particular, explain decisions about individuals and groups. Finally, since single inputs may not always have high influence, the QII measures also quantify the joint influence of a set of inputs (e.g., age and income) on outcomes (e.g., loan decisions) and the average marginal influence of individual inputs within such a set (e.g., income), using principled aggregation measures such as the Shapley value, which has previously been applied to measure influence in voting.
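To make the abstract's ideas concrete, here is a minimal sketch of individual-level QII with Shapley aggregation. It rests on several assumptions that are not taken from the chapter: the quantity of interest is the probability that one individual receives a positive outcome; intervening on a set of inputs S means redrawing each feature in S independently from its empirical marginal distribution, which breaks its correlation with the other inputs; and expectations are computed by exact enumeration rather than by sampling, which is only feasible for tiny datasets (a practical implementation would likely sample, and exact Shapley enumeration is exponential in the number of features). The function names `expected_outcome`, `set_qii`, `shapley_qii`, the loan rule `classify`, and the toy data are all hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product
from math import factorial


def expected_outcome(classify, x, S, marginals):
    """Pr[classify(z) = 1], where z equals x except that each feature j in S is
    redrawn independently and uniformly from marginals[j]. Redrawing features
    independently breaks their correlation with the remaining inputs."""
    S = sorted(S)
    total, count = Fraction(0), 0
    # Enumerate every combination of replacement values (exact, not sampled).
    for values in product(*(marginals[j] for j in S)):
        z = list(x)
        for j, v in zip(S, values):
            z[j] = v
        total += classify(z)
        count += 1
    return total / count


def set_qii(classify, x, S, marginals):
    """Influence of feature set S on individual x's outcome: Q(x) - Q(x_{-S} U_S)."""
    return classify(x) - expected_outcome(classify, x, S, marginals)


def shapley_qii(classify, x, marginals):
    """Shapley value of each feature in the game v(S) = set_qii(..., S, ...)."""
    n = len(x)
    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = Fraction(0)
        for k in range(n):  # k = |S|, ranging over subsets that exclude i
            weight = Fraction(factorial(k) * factorial(n - k - 1), factorial(n))
            for S in combinations(others, k):
                gain = (set_qii(classify, x, set(S) | {i}, marginals)
                        - set_qii(classify, x, set(S), marginals))
                total += weight * gain
        phi.append(total)
    return phi


# Hypothetical toy data: (age, income in $k, zip code).
rows = [(22, 30, 1), (35, 80, 2), (45, 60, 1), (28, 40, 2)]
marginals = [[row[j] for row in rows] for j in range(3)]


def classify(z):
    # Hypothetical loan rule: approve iff income >= 50 and age >= 25; zip unused.
    age, income, _zip = z
    return int(income >= 50 and age >= 25)


x = rows[1]  # an approved applicant
print(set_qii(classify, x, {0}, marginals))  # unary QII of age
print(set_qii(classify, x, {1}, marginals))  # unary QII of income
print([float(p) for p in shapley_qii(classify, x, marginals)])
```

In this toy run, the unary influences of age and income are 1/4 and 1/2, and their Shapley values are 0.1875 and 0.4375. These sum to the joint influence of all three features (0.625), while the unused zip code receives 0.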

Notes

1. By “black-box access to the decision-making system” we mean the typical software-testing setting, with complete control over the system's inputs and full observability of its outputs.
2. The adult dataset contains approximately 31k data points, each recording an individual's personal attributes and whether their income exceeds $50k per annum; see Sect. 5 for more details.

Author information

Authors and Affiliations

Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA, USA
Anupam Datta & Shayak Sen
School of Computing, National University of Singapore, 21 Lower Kent Ridge Rd, Singapore, Singapore
Yair Zick

Corresponding author

Correspondence to Anupam Datta.

Editor information

Editors and Affiliations

Department of Control and Computer Engineering, Politecnico di Torino, Torino, Italy
Tania Cerquitelli
Bell Laboratories, Cambridge, United Kingdom
Daniele Quercia
Carey School of Law, University of Maryland, Baltimore, Maryland, USA
Frank Pasquale

About this chapter

Cite this chapter

Datta, A., Sen, S., Zick, Y. (2017). Algorithmic Transparency via Quantitative Input Influence. In: Cerquitelli, T., Quercia, D., Pasquale, F. (eds) Transparent Data Mining for Big and Small Data. Studies in Big Data, vol 32. Springer, Cham. https://doi.org/10.1007/978-3-319-54024-5_4

DOI: https://doi.org/10.1007/978-3-319-54024-5_4
Published: 10 May 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-54023-8
Online ISBN: 978-3-319-54024-5
eBook Packages: Engineering, Engineering (R0)
