Abstract
Algorithmic systems that employ machine learning are often opaque: it is difficult to explain why a certain decision was made. We present a formal foundation to improve the transparency of such decision-making systems. Specifically, we introduce a family of Quantitative Input Influence (QII) measures that capture the degree of input influence on system outputs. These measures provide a foundation for the design of transparency reports that accompany system decisions (e.g., explaining a specific credit decision) and for testing tools useful for internal and external oversight (e.g., to detect algorithmic discrimination). Distinctively, our causal QII measures carefully account for correlated inputs while measuring influence. They support a general class of transparency queries and can, in particular, explain decisions about individuals and groups. Finally, since single inputs may not always have high influence, the QII measures also quantify the joint influence of a set of inputs (e.g., age and income) on outcomes (e.g., loan decisions) and the average marginal influence of individual inputs within such a set (e.g., income) using principled aggregation measures, such as the Shapley value, previously applied to measure influence in voting.
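To make the idea of set influence and Shapley aggregation concrete, the following is a minimal sketch rather than the chapter's implementation. It assumes one particular reading of the intervention: the inputs in a set are replaced by the values of another record drawn from the dataset, and influence is the resulting change in the individual's outcome. The names `intervene`, `set_influence`, `shapley_influence`, the toy classifier `approve`, and the four-record dataset are all hypothetical and chosen only for illustration.

```python
from itertools import permutations


def intervene(x, u, features):
    """Return a copy of x whose entries in `features` are taken from u."""
    return tuple(u[i] if i in features else x[i] for i in range(len(x)))


def set_influence(classifier, data, x, features):
    """Assumed influence measure: x's actual outcome minus its average
    outcome when the inputs in `features` are replaced by those of each
    record in `data` (an empirical stand-in for the input distribution)."""
    if not features:
        return 0.0
    randomized = sum(classifier(intervene(x, u, features)) for u in data) / len(data)
    return classifier(x) - randomized


def shapley_influence(classifier, data, x):
    """Exact Shapley value of each input, averaging its marginal
    contribution to set_influence over every ordering of the inputs.
    Enumerates all orderings, so it is only feasible for a few inputs."""
    n = len(x)
    totals = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        seen, before = set(), 0.0
        for i in order:
            seen.add(i)
            after = set_influence(classifier, data, x, seen)
            totals[i] += after - before
            before = after
    return [t / len(orders) for t in totals]


# Hypothetical toy setting: records are (age, income in $1000s).
def approve(row):
    age, income = row
    return int(age >= 30 and income >= 50)


data = [(25, 40), (25, 60), (35, 40), (35, 60)]
x = (35, 60)  # an approved applicant

age_only = set_influence(approve, data, x, {0})
joint = set_influence(approve, data, x, {0, 1})
shapley = shapley_influence(approve, data, x)
print(age_only, joint, shapley)
```

In this toy setting the joint influence of age and income exceeds the influence of either input alone, and the Shapley values split the joint influence between the two inputs. For systems with many inputs, the exhaustive enumeration here would have to be replaced by sampling orderings.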
Notes
1. By “black-box access to the decision-making system” we mean a typical setting of software testing with complete control of inputs to the system and full observability of the outputs.
2. The adult dataset contains approximately 31k datapoints of users’ personal attributes, and whether their income is more than $50k per annum; see Sect. 5 for more details.
Copyright information
© 2017 Springer International Publishing AG
About this chapter
Cite this chapter
Datta, A., Sen, S., Zick, Y. (2017). Algorithmic Transparency via Quantitative Input Influence. In: Cerquitelli, T., Quercia, D., Pasquale, F. (eds) Transparent Data Mining for Big and Small Data. Studies in Big Data, vol 32. Springer, Cham. https://doi.org/10.1007/978-3-319-54024-5_4
DOI: https://doi.org/10.1007/978-3-319-54024-5_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-54023-8
Online ISBN: 978-3-319-54024-5
eBook Packages: Engineering, Engineering (R0)