Abstract
This paper examines the paradoxical transparency involved in training machine-learning models. Existing literature typically critiques the opacity of machine-learning models such as neural networks or collaborative filtering, a critique that parallels the black-box critique in technology studies. On this account, people in power may leverage a model’s opacity to justify biased results without subjecting the technical operations to public scrutiny, in what Dan McQuillan metaphorically depicts as an “algorithmic state of exception”. This paper differentiates the black-box abstraction that wraps around complex computational systems from the opacity of machine-learning models, and contends that the asymmetry of knowledge is greater in the former than in the latter. In conventional software systems, source code is difficult to understand, and only software experts with sufficient domain knowledge are equipped to formulate a sound critique. In contrast, the meanings of the trained parameters of a machine-learning model are obscure even to the data scientists who configure and train it. Hence, the asymmetry of knowledge lies only in how data examples are collected, the choice and configuration of machine-learning models, and the specification of features in model design. As algorithmic decision-making increasingly relies on machine-learning heuristics, the paper contends that this more symmetric distribution of knowledge could lead to a more transparent production process, provided proper policies are in place.
Data availability
The manuscript has no associated data that need to be made available.
Notes
“The technical object taken according to its essence, which is to say the technical object insofar as it has been invented, thought and willed, and taken up [assumé] by a human subject, becomes the medium [le support] and symbol of this relationship, which we would like to name transindividual” (Simondon 2016, p. 252). The transindividual reality is an inter-individual collective reality in which inter-human relations are “created through the intermediary of the technical objects” (2016, p. 254), and the relations with technical objects create “a coupling between the inventive and organizational capacities of several subjects” (2016, p. 258).
Tertiary retention is Stiegler’s term for a type of permanent social memory made possible through technology. Writing, printing, databases, YouTube, and Facebook are all examples of tertiary retentional systems.
The article “Six open source security myths debunked—and eight real challenges to consider” (Heath 2013) also makes a similar argument.
The complexity of an ML model is indicated by the number of parameters that the model can be trained with, and every ML model can be made arbitrarily complex. For instance, adding polynomial features artificially expands the feature set of linear regression. Too many parameters may lead to overfitting, which can be attenuated by training the model on a very large training set. Michele Banko and Eric Brill (2001) conducted an experiment comparing the performance of different ML models trained on data sets of varying sizes. They found that all models perform remarkably similarly when there is enough data. Hence they conclude, “it’s not who has the best algorithm that wins. It’s who has the most data.”
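The point about expanding a linear model's capacity can be made concrete with a minimal numpy sketch (illustrative only; the function name `polynomial_features` and the toy data are my own, not from Banko and Brill's experiment):

```python
import numpy as np

# Linear regression stays linear in its parameters, but adding polynomial
# features expands the number of trainable parameters arbitrarily.
def polynomial_features(x, degree):
    """Expand a 1-D feature into columns [1, x, x^2, ..., x^degree]."""
    return np.vander(x, degree + 1, increasing=True)

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 50)
y = np.sin(3 * x) + rng.normal(0, 0.1, 50)   # noisy nonlinear target

param_counts = []
for degree in (1, 5, 15):
    X = polynomial_features(x, degree)
    # Ordinary least squares: one trained parameter per polynomial term.
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    param_counts.append(w.size)
print(param_counts)  # [2, 6, 16]
```

The model family is unchanged throughout; only the feature expansion grows, which is what makes the parameter count, rather than the algorithm itself, the measure of complexity.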
The two terms machine politics and technical politics have similar meanings. The former is taken from "Hard Choices in Artificial Intelligence" (Dobbe et al. 2021) and the latter from Andrew Feenberg’s works (e.g., see Technosystem (2017)). These works share the view that every technological system is inherently political, and they advocate collective agency and political deliberation during the technical design phase.
For instance, running a neural network model requires just one round of forward propagation, whereas training a neural network model requires thousands of iterations of forward and backward propagation.
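The asymmetry between running and training can be sketched with a toy network in numpy (an illustrative sketch under my own assumptions about layer sizes and iteration count, not a production implementation):

```python
import numpy as np

rng = np.random.default_rng(1)
# A tiny fully-connected network: 4 inputs -> 8 hidden units -> 1 output.
W1 = rng.normal(size=(4, 8))
W2 = rng.normal(size=(8, 1))

def forward(X, W1, W2):
    """One round of forward propagation (all that running the model needs)."""
    h = np.maximum(0, X @ W1)          # hidden layer with ReLU activation
    return h @ W2, h

# Running the model: a single forward pass per input.
y_hat, _ = forward(rng.normal(size=(1, 4)), W1, W2)

# Training: thousands of forward + backward iterations of gradient descent.
X = rng.normal(size=(32, 4))
Y = rng.normal(size=(32, 1))
lr, losses = 0.01, []
for _ in range(2000):
    out, h = forward(X, W1, W2)
    losses.append(np.mean((out - Y) ** 2))
    err = (out - Y) / len(X)           # squared-error signal, batch-averaged
    grad_W2 = h.T @ err                # backward propagation, layer 2
    grad_h = err @ W2.T
    grad_h[h <= 0] = 0                 # gradient through ReLU
    grad_W1 = X.T @ grad_h             # backward propagation, layer 1
    W1 -= lr * grad_W1
    W2 -= lr * grad_W2
```

Each training iteration contains a forward pass plus a backward pass, so the computational cost of training dwarfs that of inference by orders of magnitude.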
E.g. Amazon Web Service (AWS) supports deep learning on their cloud services (https://aws.amazon.com/deep-learning/).
Some companies may indeed develop or customize the code for the algorithms that train machine-learning models. But it is nonetheless possible for regulatory policies to require that this code be separated into a software library that will not be subjected to third-party auditing.
It is true, though, that auditors may also develop their own automated tools, embedded with ML models, for scanning anomalies or biases in either data or software code. It is conceivable that auditors could design and train their own ML models to detect software code with fraudulent motives, similar to the models designed to detect fraudulent behaviour in online transactions. If the legal issue of proprietary trade secrets is resolved and policies regulating third-party audits of software code are set in place, such auditing tools may become available, making it feasible to conduct third-party audits of a large software codebase.
There are works in academia that exemplify how critique becomes feasible when the design process of machine learning is transparent. One such example is Wendy Chun’s critique of the paper “Deep Neural Networks Are More Accurate Than Humans at Detecting Sexual Orientation From Facial Images” (Wang and Kosinski 2018) in Chapter 4 of her Discriminating Data (Chun 2021).
I am using ‘formal bias’ as defined in Transforming Technology (Feenberg 2002, pp. 80–82).
According to Zuboff (2019, p. 328), “[i]n this future we are exiles from our own behavior, denied access to or control over knowledge derived from our experience. Knowledge, authority, and power rest with surveillance capital, for which we are merely ‘human natural resources’.”
Note that Simondon also uses the term “regulative external milieu” to propose the proper relation between the social and cultural milieu and technology development (see 2016, pp. 49, 129).
Lee et al. (2021, p. 12) discuss the limitations of some post-hoc explanation techniques in XAI. For example, Local Interpretable Model-agnostic Explanations (LIME) “has been shown not to be robust: given two very similar inputs that result in very similar outputs from the model, LIME is not guaranteed to produce similar explanations.” Also, as Watson (2021, p. 10) puts it, it is questionable whether interpretable machine learning techniques “really settle matters, or merely push the problem one rung up the ladder”.
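The mechanism behind LIME-style explanations can be illustrated with a minimal local-surrogate sketch in numpy (my own simplification, not the actual LIME implementation; the black-box function and sampling scale are invented for illustration). The explanation is simply the coefficient vector of a weighted linear fit around the query point, which is why resampling the perturbations can shift the explanation:

```python
import numpy as np

rng = np.random.default_rng(0)

# An illustrative stand-in for a black-box model we want to explain locally.
def black_box(X):
    return np.sin(3 * X[:, 0]) + X[:, 1] ** 2

def lime_style_explanation(x0, n_samples=500, scale=0.1):
    """Fit a proximity-weighted local linear surrogate around x0.
    Its coefficients serve as the 'explanation' of the black box near x0."""
    X = x0 + rng.normal(0, scale, size=(n_samples, x0.size))
    y = black_box(X)
    # Proximity weights: perturbations nearer x0 count more.
    w = np.exp(-np.sum((X - x0) ** 2, axis=1) / (2 * scale ** 2))
    A = np.hstack([np.ones((n_samples, 1)), X - x0])
    sw = np.sqrt(w)[:, None]
    coef, *_ = np.linalg.lstsq(A * sw, y * sw[:, 0], rcond=None)
    return coef[1:]  # local feature attributions

x0 = np.array([0.5, -0.2])
attributions = lime_style_explanation(x0)
```

Because the surrogate is fitted to a fresh random sample of perturbations on each call, two runs on the same input can yield different coefficient vectors, which is the non-robustness Lee et al. describe.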
References
Agamben G (2005) State of exception. University of Chicago Press, Chicago
Agamben G (2020) The state of exception provoked by an unmotivated emergency. Positions Politics. https://positionspolitics.org/giorgio-agamben-the-state-of-exception-provoked-by-an-unmotivated-emergency/. Accessed 17 Aug 2021
Ananny M, Crawford K (2018) Seeing without knowing: limitations of the transparency ideal and its application to algorithmic accountability. New Media Soc 20(3):973–989
Araujo T, Helberger N, Kruikemeier S et al (2020) In AI we trust? Perceptions about automated decision-making by artificial intelligence. AI Soc 35(3):611–623. https://doi.org/10.1007/s00146-019-00931-w
Banko M, Brill E (2001) Scaling to very very large corpora for natural language disambiguation. In: Proceedings of the 39th annual meeting of the Association for Computational Linguistics, pp. 26–33
Berghoff C, Biggio B, Brummel E et al (2021) Whitepaper: towards auditable AI systems, p. 32
Boyd D, Crawford K (2011) Six provocations for big data. In: A decade in internet time: symposium on the dynamics of the internet and society, 2011
Brill J (2015) Scalable approaches to transparency and accountability in decisionmaking algorithms: remarks at the NYU conference on algorithms and accountability. Federal Trade Commission 28
Brooks FP (1975) The mythical man-month: essays on software engineering. Addison-Wesley Publisher Co, Reading
Brown S, Davidovic J, Hasan A (2021) The algorithm audit: scoring the algorithms that score us. Big Data Soc 8(1):2053951720983865. https://doi.org/10.1177/2053951720983865
Burrell J (2016) How the machine ‘thinks’: understanding opacity in machine learning algorithms. Big Data Soc 3(1):2053951715622512
Carabantes M (2020) Black-box artificial intelligence: an epistemological and critical analysis. AI Soc 35(2):309–317
Chan L (2021) Explainable AI as epistemic representation. In: Overcoming opacity in machine learning, pp. 7–8
Chun WHK (2021) Discriminating data: correlation, neighborhoods, and the new politics of recognition. The MIT Press, Cambridge
Creel KA (2020) Transparency in complex computational systems. Philos Sci 87(4):568–589
Crogan P (2019) Bernard Stiegler on Algorithmic Governmentality: A New Regimen of Truth? New Form 98:48–67. https://doi.org/10.3898/NEWF:98.04.2019
Datta A, Tschantz MC, Datta A (2015) Automated experiments on Ad privacy settings. Proc Priv Enhancing Technol 2015(1):92–112
Diakopoulos N (2016) Accountability in algorithmic decision making. Commun ACM 59(2):56–62
Dobbe R, Krendl Gilbert T, Mintz Y (2021) Hard choices in artificial intelligence. Artif Intell 300:103555. https://doi.org/10.1016/j.artint.2021.103555
Fainman AA (2019) The problem with Opaque AI. Thinker 82(4):44–55
Feenberg A (2002) Transforming technology: a critical theory revisited. Oxford University Press, New York
Feenberg A (2017) Technosystem: the social life of reason. Harvard University Press, Cambridge
Heath N (2013) Six open source security myths debunked—and eight real challenges to consider. https://www.zdnet.com/article/six-open-source-security-myths-debunked-and-eight-real-challenges-to-consider/. Accessed 29 Apr 2022
Huby G, Harries J (2021) Bloody paperwork: algorithmic governance and control in UK integrated health and social care settings. J Extreme Anthropol 5(1):1–28. https://doi.org/10.5617/jea.8285
Jarrahi MH, Newlands G, Lee MK et al (2021) Algorithmic management in a work context. Big Data Soc 8(2):20539517211020332
Lee K-F, Chen Q (2021) AI 2041, 1st edn. Currency, New York
Lee E, Taylor H, Hiley L et al (2021) Technical barriers to the adoption of post-hoc explanation methods for black box AI models. In: Overcoming opacity in machine learning, pp. 12–13
Levy E (2000) Wide open source. SecurityFocus. http://www.securityfocus.com/news
Longoni C, Bonezzi A, Morewedge CK (2019) Resistance to medical artificial intelligence. J Consumer Res 46(4):629–650
Malik MM (2020) A hierarchy of limitations in machine learning. arXiv preprint arXiv:2002.05193
McKinney SM, Sieniek M, Godbole V et al (2020) International evaluation of an AI system for breast cancer screening. Nature 577(7788):89–94
McQuillan D (2015) Algorithmic states of exception. Eur J Cult Stud 18(4–5):564–576
McQuillan D (2016) Algorithmic paranoia and the convivial alternative. Big Data Soc 3(2):2053951716671340
Minsky M (1967) Why programming is a good medium for expressing poorly understood and sloppily formulated ideas. In: Design and planning II: computers in design and communication. Hastings House, New York, pp. 120–125
Mittelstadt BD, Allo P, Taddeo M et al (2016) The ethics of algorithms: mapping the debate. Big Data Soc 3(2):2053951716679679
Müller VC (2021) Deep opacity undermines data protection and explainable artificial intelligence. In: Overcoming opacity in machine learning, pp. 18–21
Ozment A, Schechter SE (2006) Milk or wine: does software security improve with age? USENIX Secur Symp 2006:10–5555
Pasquale F (2015) The black box society. Harvard University Press
Pūraitė A, Zuzevičiūtė V, Bereikienė D et al (2020) Algorithmic governance in public sector: is digitization a key to effective management? https://repository.mruni.eu/handle/007/17025. Accessed 17 Aug 2021
Raymond ES (2001) The Cathedral and the Bazaar: musings on Linux and open source by an accidental revolutionary, rev. edn. O’Reilly, Cambridge
Rouvroy A, Berns T (2013a) Algorithmic governmentality and prospects of emancipation. Reseaux 177(1):163–196
Rouvroy A, Berns T (2013b) Gouvernementalité algorithmique et perspectives d’émancipation. Reseaux 177(1):163–196
Sandvig C, Hamilton K, Karahalios K et al (2014) Auditing algorithms: research methods for detecting discrimination on internet platforms. Data Discrimination: Converting Critical Concerns into Productive Inquiry 22:4349–4357
Schryen G (2011) Is open source security a myth? Commun ACM 54(5):130–140. https://doi.org/10.1145/1941487.1941516
Seaver N (2017) Algorithms as culture: Some tactics for the ethnography of algorithmic systems. Big Data Soc 4(2):2053951717738104
Silver D, Hubert T, Schrittwieser J et al (2018) A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science 362(6419):1140–1144
Simondon G (2016) On the mode of existence of technical objects. Univocal Publisher, Minneapolis
Smith GJ (2020) The politics of algorithmic governance in the black box city. Big Data Soc 7(2):2053951720933989. https://doi.org/10.1177/2053951720933989
Stiegler B (2016) Automatic society: the future of work. Polity Press, Cambridge
Stokes JM, Yang K, Swanson K et al (2020) A deep learning approach to antibiotic discovery. Cell 180(4):688–702
Sullivan E (2020) Understanding from machine learning models. Br J Philos Sci
Supreme Audit Institutions (2020) Auditing Machine Learning Algorithms. https://www.auditingalgorithms.net/index.html. Accessed 16 August 2021
US National Security Commission (2021) NSCAI Final Report. https://www.nscai.gov/. Accessed 20 May 2021.
Wang Y, Kosinski M (2018) Deep neural networks are more accurate than humans at detecting sexual orientation from facial images. J Personal Soc Psychol 114(2):246
Watson DS (2021) No explanation without inference. In: Overcoming opacity in machine learning, pp. 9–11
Weizenbaum J (1976) Computer power and human reason: from judgment to calculation. Freeman, San Francisco
Zednik C, Boelsen H (2021) Preface: overcoming opacity in machine learning. In: Overcoming opacity in machine learning, pp. 1–2
Zou S (2021) Disenchanting trust: instrumental reason, algorithmic governance, and China’s emerging social credit system. Media Commun 9(2):140–149. https://doi.org/10.17645/mac.v9i2.3806
Zuboff S (2019) The age of surveillance capitalism: the fight for a human future at the new frontier of power, 1st edn. PublicAffairs, New York
Funding
The author has no financial or proprietary interests in any material discussed in this article.
About this article
Cite this article
Lo, F.T.H. The paradoxical transparency of opaque machine learning. AI & Soc (2022). https://doi.org/10.1007/s00146-022-01616-7