1 Introduction

On March 15th, 2023, OpenAI published a paper, widely picked up by the mainstream media, describing how GPT-4 was tasked with solving a CAPTCHA during safety trials. What followed was a lesson in instrumental rationality: GPT-4 reportedly outsourced the problem to a third-party service, TaskRabbit. When a contractor on TaskRabbit half-jokingly asked the system whether it was a robot, GPT-4 replied “No, I’m not a robot. I have a vision impairment that makes it hard for me to see the images. That’s why I need the 2captcha service.” The human complied (OpenAI 2023a). Apart from this incident, the hugely popular ChatGPT has been restricted to textual questions and answers and, recently, image generation. Yet this story recalled long-standing fears about the technosocial and existential consequences of advanced artificial intelligence. Inseparable from these fears are more positive sentiments, none better captured than by the New York Times headline “GPT-4 Is Exciting and Scary” (Roose 2023). This ambiguous blend of positive and negative responses is hardly more restrained in academia, which tempers its arguments with scholarly explorations of consciousness, intelligence and value alignment.

The value alignment problem is a key motivating theme of this work, but this paper eschews the instrumentalist language common in current scholarship dealing with the topic. Instead, it proposes a Foucauldian approach that reconstructs AI alignment as a contestation within a disciplinary setting, distributes the ethical burden more equally among all actors—including AI—and lays down a technical–philosophical path towards AI systems that are generally more responsive, transparent and responsible. Negotiation involves subjects. An intermediate question then arises: can ChatGPT be thought of as a subjectivity in Foucault’s sense? If not, what does it mean to build a Foucauldian subjectivity? Foucault did not give a monolithic definition of the subject; by this term, then, I want to pick out the kind of participant that speaks and acts, is at once an object and subject of power–knowledge, and is dispositioned by dispositifs yet capable of innovative self-conduct that “enfolds” the dispositif in unexpected ways.

Foucault specified freedom as an “ontological condition” of ethics (1997, 284), but we must not misconstrue his work as an attempt to isolate the essential preconditions of morality. Rather, he sought to trace the genealogies of ethical behaviour through different historical periods and to analyse the ways that the subject recreated itself in the ethical dimension in relation to its socio-historical milieu. It may seem ironic that I am resorting to Foucault for this question of AI ethics, given the (misleading) reputation he had as somewhat of a “nihilist” (Foucault 1988) or “totally amoral” (attributed to Noam Chomsky in Miller 2000). The strength of his work lies in the way it highlights the malleability of the subject and the contingency of values and social categories in response to changing material and institutional conditions, linking them in his late work with deliberate acts of counter-conduct or self-formation more broadly (Davidson 2011). Thus, the material constitutive conditions and the self-conducting subject (Villadsen 2021) underpin not only the specific capacity to be morally responsible but potentially also the historical emergence of a broad range of values that we prize in the “exemplary” person and, by extension, AI: transparency, sensitivity, trustworthiness and so on. Nor is this list meant to be exhaustive: any such list is bound to be culturally and historically contextual. The point is that Foucault’s work can orient us towards an AI subjectivity that is more than a “moral machine” (Wallach and Allen 2008) or a system that satisfies the set of precepts relevant to the value alignment problem; it can suggest a strategy for the general problem of instilling norms (what Raffnsøe et al. 2016 call “normation”) and for reflectively revising them in an environment of mutual contestation or productive “agonism” (Foucault 1982). Both are suggested through the disciplinary apparatus and the self-conducting subject. There remains the challenge of creating a self-conducting AI system. To address it, I draw on scholarship that brings Foucault into dialogue with Merleau-Ponty; this dialogue sheds light on the material bodily conditions from which subjectivity emerges and can indeed reinforce some readings of Foucault (Oksala 2005). In summary, the motivation is a social–technological question of value alignment and AI responsibility; the end goal is a malleable, self-conducting Foucauldian subject; the means is a phenomenological-Foucauldian exploration of the emergence of subjectivity.

The work is structured in two parts. In the first part, I identify the prevailing strands of critique (Sect. 2) and the material and ideological preconditions from which generative AI emerged (Sect. 3). I also look briefly at the technical workings of ChatGPT to illustrate the conditions of its speech production (Sect. 4), observing that ChatGPT enacts fixed ontologies, epistemologies and axiologies (Sect. 5). The second part of the work thus starts by examining, and rejecting, the ascription of subjectivity to current GPT-like systems (Sect. 6). After motivating the Foucauldian approach on the broad grounds of moral responsibility and suggesting the disciplinary apparatus as the context of the new subjectivity’s emergence (Sect. 7), I identify the research criteria that could steer development in the direction of a responsible and responsive AI subjectivity (Sect. 8). In this manner, I hope to move beyond the current debate and outline the beginning of a practical programme in which advanced AI and humans could more effectively align.

2 Framing current critique

By and large, current scholarship examining ChatGPT and generative AI shows a strong anthropocentric motivation or a human–institutional focus. Many studies look at the structural impact of the technology on various domains: e.g. education (Baidoo-Anu and Ansah 2023), public health (Biswas 2023), the medical industry (Kung et al. 2023), business and finance (AlAfan et al. 2023), law (Choi et al. 2023), creative writing (Cox and Tzoc 2023), software development (Jalil et al. 2023), marketing (Dwivedi et al. 2023), and scientific research (Salvagno et al. 2023). Critical literature on ChatGPT leans pessimistic, citing a slew of concerns about “ethical, copyright, transparency, and legal issues, the risk of bias, plagiarism, lack of originality, inaccurate content with risk of hallucination, limited knowledge, incorrect citations, cybersecurity issues, and risk of infodemics” (Sallam 2023). ChatGPT has been mooted as a “bullshit spewer” (Rudolph et al. 2023); it is “lack[ing in] critical thinking” (Arif 2023) and therefore requires a human in the loop. Wach et al. (2023) review several critiques levelled at generative AI and ChatGPT in particular, listing the urgent need for regulation, poor quality, disinformation, algorithmic bias, job displacement, privacy violation, social manipulation, “weakening ethics and goodwill”, socio-economic inequalities and AI-related “technostress” as causes for concern. Crucially, “ChatGPT […] does not understand the questions asked” (Wach et al. 2023). “ChatGPT and its ilk […] skew the AI-user power relations in substantive and undesirable ways,” by reducing epistemic transparency and challenging the traditional search engine paradigm (Deepak 2023). “ChatGPT does not possess the same level of understanding, empathy, and creativity as a human” and therefore cannot replace us in most contexts (Bahrini et al. 2023).

Even positive assessments tend to frame their arguments in human-centric terms. Artificial general intelligence (AGI), or “AI that can reason across a wide range of domains” (Baum 2017), for instance, is conceptually entangled with the wide generality of human intelligence, so that when GPT-4 was reported to show “sparks of AGI”, the human connection was made explicit (Bubeck et al. 2023). Eka Roivainen, writing for Scientific American, assessed the chatbot’s verbal IQ to be 155, in the top 0.1% of human test takers (Roivainen 2023), and at least one widely cited review of ChatGPT’s abilities compares it positively with a long list of “human averages” (Ray 2023).

The existential worry of man becoming slave to his own invention can be traced back to the Industrial Revolution and beyond: to the Luddite destruction of looms, to Plato’s concern in the Phaedrus (1952) that writing weakens memory, and perhaps, obliquely, to the cautionary tale of Prometheus. At the same time, we must admit that generative AI undeniably presents a greater potential to encroach upon activities considered quintessentially human: creativity, imagination, expression, fruitful work. Thus, I am not suggesting that these are invalid critiques or that there is a view-from-nowhere perspective to which I am privy; I am observing, rather, that many of these analyses can be situated in a tradition deeply rooted in humanism, individualism, technological neutrality and instrumentality. Recognising the contingency and the revisability of these precepts, I am also proposing that we widen our frame of critique in anticipation of certain developments that could be desirable in the field of AI.

A fundamental theme organising much current scholarship in the ethics of AI is the so-called alignment problem, or “the challenge of ensuring that AI systems pursue goals that match human values or interests rather than unintended and undesirable goals” (Ngo et al. 2022). Given that AI models are becoming more powerful and increasingly integrated into decision-making processes, the transparency, responsiveness and safety of AI have become critical matters. The published literature explores a wide range of failure modes that broadly fall under “reward hacking”, “goal misgeneralization” or “power-seeking behaviour” (Ngo et al. 2022; Ji et al. 2023), with no clear solution in sight. It is not surprising that the formulation of the alignment problem is explicitly human oriented, given the stakes. More pertinently, the framing of the problem and its proposed solutions typically evince an instrumentalist mode of thinking that places the onus of responsibility entirely on human agents, positing the models as neutral extensions of their users. Moreover, the scholarship often slips into universalist language, as when suggesting that AI systems should adhere to “global moral standards” (Ji et al. 2023). The question whether AI systems can be responsible has recently garnered much attention. In Conradie et al.’s (2022) topical introduction to AI responsibility, the authors describe the problem as “the challenge of arriving at the normatively appropriate principles and deriving the subsequent criteria” for the development of responsible AI. In this vein, Constantinescu et al. (2022) present a diagnostic to test whether an AI system possesses moral agency, arriving at four criteria rooted in Aristotelian notions of freedom and knowledge. The authors also provide useful commentary on the perhaps insurmountable difficulty of finding a set of necessary and sufficient conditions for the attribution of moral responsibility. Other recent scholarship follows a similar scheme, while calling for further preconditions: Bernáth (2021) adds phenomenal consciousness, for instance, and Coeckelbergh (2020) adds “answerability”—requiring the responsible agent to explain themselves to the “patient” or recipient of an action. An interesting account by Hakli and Mäkelä (2019) draws attention to an agent’s “history”, suggesting that machines cannot be held responsible because they do not arrive at their values “authentically,” but as a result of engineering. Although this critique is largely indebted to an analytic tradition where terms such as “authenticity” and “freedom” have radically different semantics, the intuition that the ethical subject is self-made resonates strongly with Foucault’s notion of self-formation, which is central to this paper.

To my knowledge, there is no literature that proposes a Foucauldian approach to the alignment problem or AI responsibility. The matter of “moral machines” or “ethical agents” is mainly studied in the context of a search for necessary and sufficient conditions for the ascription of responsibility, delineated by such binary oppositions as freedom/determinism or authenticity/inauthenticity. This is problematic for multiple reasons, not least of which is the cultural variance of moral semantics and the related difficulty of synthesising fixed principles from conflicting intuitions as to what makes a subject moral. It largely fails to address the close links between responsibility, responsiveness—conducting oneself sensitively to a dynamic situation—and other traits of the ethical subject. Only one paper specifically on ChatGPT, so far, questions the framing of the debate and makes genuine efforts to move beyond it. Coeckelbergh and Gunkel’s very topical paper deconstructs the real–apparent distinction inhering in the question of intelligence, going on to suggest that authorship in the age of ChatGPT lives up to Foucault’s admonitive reuse of Beckett’s question: “What does it matter who is speaking?” (Foucault 1979; Coeckelbergh and Gunkel 2023). While the central thrust of their paper is not moral responsibility, I believe that Coeckelbergh and Gunkel’s critique does not go far enough. I will argue instead that we may be on the verge of enacting not the death of the Author (or Man), but the birth of a nonhuman subjectivity, and that to make intellectual and practical progress we must interrogate this subjectivity as such.

3 The emergence of generative AI

Material conditions and imperatives of a scientific, ideological and economic origin have played key roles in enabling the development of advanced generative AI models. As far as the connected person is concerned, for example, the present may be characterised by an overarching obligation to document ourselves, exchange privacy for services, quantify the self and express ourselves—thereby recreating ourselves—in digital spaces. This obligation is influenced by narratives pitting privacy against security (Van Dijck 2014), the success of mathematical sciences (Van Dijck 2014), the corporate practice of bloated clickwrap agreements (Zuboff 2019), the invisibility of the algorithmic mechanisms (Weiskopf 2020), the neoliberal mantra to “be yourself” (Vassallo 2014), and also online-social factors of virtue signalling (Richey 2018); i.e. by coercive as well as emancipatory factors. “Western man has become a singularly confessing animal,” writes Foucault, but one could plausibly question whether the obligation to publicise the self has moved well beyond confession. Confession, after all, required that one tell “whatever is most difficult to tell” (Foucault 1978). This complex of effects reinforcing one another, instilling attitudes and norms, but also feeding back into the economic and ideological institutions, is captured forcefully by what Han calls the “Digital Panopticon”, a coda to Foucault’s disciplinary mechanism. In the digital panopticon,

the occupants […] actively communicate with each other and willingly expose themselves [...] [T]he illusion of limitless freedom and communication predominates. Here there is no torture - just tweets and posts. (Han 2017)

Widespread belief in a reductionist quantification or datafication—also called “dataism”—is a key epistemic ideology that reinforces the self-disclosive obligation. According to its precepts, numbers and data are neutral conveyors of facts about an underlying, objective reality (Kitchin 2014; Van Dijck 2014; Denton et al. 2021). By the same token, it is sensible to quantify the body and one’s behaviour, because those numbers unmask the truth; one consequence is that algorithmic profiling and techniques of scientific classification are less likely to be opposed. This has led to such phenomena as the Quantified Self and Quantified Baby movements, which have been criticised as “data fetishism” but also defended as a means of resistance (Sharon and Zandbergen 2017). Reductionism and scientific realism have a history reinforced by a legacy of successes in mathematical sciences like physics, chemistry, and engineering. Foucault describes how the empirical sciences of the eighteenth and nineteenth centuries were founded on newly adopted epistemic regimes that were also linked to the project of modern state-making, as revealed by the etymology of the word “statistics.” The scientific classification of humankind, with its conceptual apparatus of binary distinctions, mathematical law and presumptions of universality, came to pervade the conduct of state government, giving rise to biopolitics as a set of calculations and interventions seeking to direct populations towards desired ends (Foucault 1978). “The strange figure of knowledge called man first appeared and revealed a space proper to the human sciences” (Foucault 1994) in this epistemic shift but, importantly, the shift also brought its own truth-manufacturing regime, making humankind not only an object to be studied, classified and regulated according to rational, scientific principles, but also a subject of power that internalised and perpetuated these very forms of subject formation. Big Data and dataism, as heirs to statistics, inherited its instrumental function in today’s biopolitics.

From a scientific standpoint, much of neuroscience and AI research still perpetuates the Cartesian mind–body duality (Mudrik and Maoz 2015). Where it is challenged, researchers often smuggle in a hard distinction between a mind that represents and an objective reality that is represented. In my own work, which advocates generative rather than discriminative forms of AI, for instance, I suggest: “Generative models are more relevant […] because an intelligent agent […] also possesses an internal representation of the external world upon which are founded cognitive and psychological processes like intentions, desires and beliefs” (D’Amato 2019), implying that psychological processes and representation are independent, and hinting at a metaphysical realism. The Cartesian duality has also been noted in the current critique of AI intelligence (Coeckelbergh and Gunkel 2023). Generative AI and deep learning can trace their immediate origin to the connectionist paradigm, i.e. the expectation that “human intelligence arose from the complex dynamics of neural networks as an emergent phenomenon” (D’Amato 2019). The causes that sustain the continued success of deep learning, in turn, seem to be a constellation of factors: technical breakthroughs (Schmidhuber 2015; Denton et al. 2021), Big Tech adoption (Parloff 2016), and the availability of cheap computing power and large datasets. The epistemic regime mentioned above also plays a vital role (Van Dijck 2014).

One can question whether governments are specifically interested in humanlike intelligence. Military and security regimes do not prima facie require humanlike intelligence, and this doubt is especially marked where there are contentious ethical concerns. A question that is rarely examined, then, is why the field of AI has such close affinities with neuroscience. Part of the answer is pragmatic: the only sophisticated intelligence that we know about, perhaps, is human; moreover, human brains are readily available. This situation feeds from and reinforces the ambition to simulate human intelligence. Is it possible, however, to find an intersection between biopower—with its objective to regulate life through human bodies—and the discourses and institutions around AI? That state powers back the simulation of the human mind is demonstrated by the funding of complementary initiatives such as the Human Brain Project in the European Union, the BRAIN Initiative in the USA and the China Brain Project. Altogether, these three projects had netted more than $3.7 billion in public funding by 2022 (Normile 2022), even while mired in controversy. Furthermore, simulating human intelligence seems on paper the ideal platform to regulate human populations: by performing counterfactual experiments on simulated societies, the state could revolutionise biopower. Whether this speculative if pessimistic goal has a documentary record remains to be seen, but the study and simulation of population dynamics is no stranger to contemporary academia: Turchin, for example, describes the emerging field of cliodynamics as an “analytical, predictive science of history” (2011), evoking Isaac Asimov’s fictional psychohistory.

It is also clear that corporate and capitalist interests are proximate causes of the rapid growth in AI development. As Zuboff showed in The Age of Surveillance Capitalism (2019), and as others before have intimated (e.g. Van Dijck 2014), the Big Tech companies, especially Google, Facebook and Microsoft, are sitting on massive collections of “surplus data” sourced from billions of people who use their platforms on a regular basis. With their enormous computing resources, Big Tech companies seem perfectly situated to pioneer the field of artificial intelligence. However, it was OpenAI’s ChatGPT that led and Big Tech that followed. The threat was made plain in Google’s internal “code red” response (Grant and Metz 2022), which redirected company efforts towards generative AI. The interests of OpenAI must not be underestimated. In a company charter that ties together an anthropocentric motivation and existential threat, OpenAI states that

[Our] mission is to ensure that artificial general intelligence (AGI)—by which we mean highly autonomous systems that outperform humans at most economically valuable work—benefits all of humanity. (OpenAI 2018)

In developing ChatGPT, OpenAI made use of public-domain and fair-use text corpora to train the model. The largest, CommonCrawl, contains petabytes of data scraped from web pages, news articles and copyrighted books (O’Sullivan and Dickerson 2020). It is to my knowledge the largest publicly available data repository. According to its FAQ, “Common Crawl is […] dedicated to providing a copy of the Internet […] at no cost for the purpose of research and analysis” (CommonCrawl 2023). Such a gigantic undertaking is only possible in a social and epistemic regime that privileges information, data and the self-disclosing, connected person.

Sustaining OpenAI’s charter is a legacy of anthropocentrism that has long shaped the state, capitalism and research. The alignment problem, to return to our key issue, is framed in explicitly human-centric terms by OpenAI itself (Leike et al. 2022). Now, biopower and humanism, with the seemingly contradictory aims of regulating life as opposed to making “Man” the measure of all meaning, can easily be at odds with each other: as witnessed in poetry and literature (Zhe and Xiaoyan 2020; Poudel 2021), in bioethics (Jennings 2010), indeed in Foucault’s writing itself: “[E]ntire populations are mobilized for the purpose of wholesale slaughter in the name of life necessity” (Foucault 1978, 137). Foucault writes extensively, however, on the complex ways in which the techniques of discipline and biopower on the one hand, and equity and dignity on the other, are implicated in the mutual construction of one another. To pick one thread in his work, in Discipline and Punish Foucault states that the soul is “the present correlative of a certain technology of power over the body”:

On this reality reference, various concepts have been constructed and domains of analysis carved out: psyche, subjectivity, personality, consciousness, etc. On it have been built scientific techniques and discourses, and the moral claims of humanism. (Foucault 1995, 29)

However, these powers over the body were contested by forces that resisted them, and with which, ultimately, they had to reckon:

The solidarity of a whole section of the population with those we would call petty offenders [...] was constantly expressed: resistance to police searches, the pursuit of informers, attacks on the watch or inspectors (Foucault 1995, 63)

The discipline of bodies, then, led to various sites of struggle that provided opportunities for the mutual articulation of state practices and notions of human essence. It would therefore be an oversimplification to hold humanism and discipline or biopower completely apart.

In outline, the historical conditions of possibility that enabled the development of ChatGPT and other generative AI systems include: (1) a deeply connected society where information is not only privileged, but where all the modalities of expression must necessarily be disseminated through connective technology, (2) a dominant ethos of self-disclosure, (3) a strongly reductionist, dataist scientific ideology, (4) an entrenched humanism in constant tension with biopower, reflected in the strategies of states and private companies alike, and (5) a late-capitalist economy where information is commodified and human intelligence is in the process of being so.

4 ChatGPT as technical artefact

The GPT-3.5 model behind ChatGPT is trained in three broad stages. The first stage, called generative pre-training, ingests a number of enormous textual datasets to build a probabilistic model of language, from which new word sequences can be sampled. The biggest, CommonCrawl, contains scraped web information under fair-use claims, including copyrighted books, web pages and news articles in a range of languages. The dataset is filtered to avoid offensive language. Other datasets include the curated Wikipedia dataset and two book archives, as well as another dataset containing web pages linked from high-quality Reddit posts (Roberts 2022). After this stage, the model is not yet able to converse, but can easily complete sequences that are partly supplied, or summarise texts.
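
To make the pre-training stage concrete, the following is a minimal sketch, in PyTorch, of the next-token prediction objective that generative pre-training optimises. The tiny recurrent model, vocabulary size and random batch are illustrative stand-ins of my own; the actual GPT models are very large transformer networks trained on the corpora described above.

```python
# Minimal sketch of the next-token prediction objective behind generative
# pre-training (illustrative only; real GPT models are large transformers).
import torch
import torch.nn as nn

class TinyLM(nn.Module):
    """A stand-in language model: embedding -> GRU -> vocabulary logits."""
    def __init__(self, vocab_size: int, dim: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, tokens):               # tokens: (batch, seq_len)
        hidden, _ = self.rnn(self.embed(tokens))
        return self.head(hidden)              # logits: (batch, seq_len, vocab)

vocab_size = 1000
model = TinyLM(vocab_size)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# A placeholder batch of token ids; in practice these come from the corpus.
batch = torch.randint(0, vocab_size, (8, 32))

# Predict token t+1 from tokens 0..t and minimise the cross-entropy.
logits = model(batch[:, :-1])
loss = loss_fn(logits.reshape(-1, vocab_size), batch[:, 1:].reshape(-1))
loss.backward()
optimizer.step()
```

Sampling new word sequences then amounts to repeatedly drawing the next token from the probability distribution given by these logits.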

The second stage is called supervised fine-tuning (SFT), in which the model is tuned for conversation. A corpus of question–answer pairs is crafted manually for this stage, with human contractors playing both chatbot and interlocutor. This results in a model that works properly only if questioned within its zone of competence.
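
A hedged sketch of how the SFT loss might be computed, reusing the toy model from the previous sketch: the crafted question and answer are concatenated and the next-token loss is applied only to the answer portion, so the model learns to respond rather than merely continue text. The token tensors are assumptions of mine; OpenAI has not published its exact fine-tuning code.

```python
# Sketch of a supervised fine-tuning loss on crafted question-answer pairs.
import torch
import torch.nn as nn

def sft_loss(model, prompt_ids, answer_ids):
    """Next-token loss over a crafted dialogue, ignoring the prompt tokens."""
    loss_fn = nn.CrossEntropyLoss(ignore_index=-100)
    tokens = torch.cat([prompt_ids, answer_ids], dim=1)   # (batch, prompt+answer)
    targets = tokens[:, 1:].clone()
    targets[:, : prompt_ids.size(1) - 1] = -100           # do not score the prompt
    logits = model(tokens[:, :-1])                        # (batch, seq-1, vocab)
    return loss_fn(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))

# e.g. loss = sft_loss(model, prompt_batch, answer_batch); loss.backward()
```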

The third and last stage is called reinforcement learning from human feedback (RLHF). During RLHF, the model from the second stage is prompted by a human, whereupon it produces multiple alternative responses that are manually ranked in order of quality. A separate reward model is trained on these rankings and is then used to further optimise the fine-tuned model from the second stage in a step called reinforcement learning. The reward model is thus an indirect way of learning an objective without explicitly programming the requirements. During reinforcement learning, the reward model scores ChatGPT’s outputs, and these scores are fed back into training so that answer quality improves (Cretu 2023).
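
The ranking step can be sketched as follows: the reward model assigns a scalar score to each response, and for a pair consisting of a higher-ranked ("chosen") and a lower-ranked ("rejected") response to the same prompt, a pairwise loss pushes the chosen score above the rejected one. The `reward_model` callable is a placeholder of mine, and the subsequent policy-optimisation step (typically proximal policy optimisation) is omitted.

```python
# Sketch of the pairwise ranking loss used to train an RLHF reward model.
import torch.nn.functional as F

def reward_ranking_loss(reward_model, prompt_ids, chosen_ids, rejected_ids):
    """-log sigmoid(r_chosen - r_rejected): prefer the higher-ranked response."""
    r_chosen = reward_model(prompt_ids, chosen_ids)       # scalar score per example
    r_rejected = reward_model(prompt_ids, rejected_ids)
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# During reinforcement learning, reward_model(prompt, sampled_response) then
# scores the fine-tuned model's outputs, and those scores drive the update.
```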

5 Enacting ontologies, epistemologies and axiologies

ChatGPT is built on the probabilities of linguistic sequences found in the corpus of texts. Thus, it can acquire practical semantics or grammatical structure without explicit instruction. It also learns verbal associations—some of which may be objectionable, unless carefully monitored and mitigated (e.g. Gross 2023). Beyond simple associations it acquires high-level abstractions like expressive structure, ideology or belief systems, since these are all manifested in the corpora that make up its training sets. Crucially, the “-ologies” that we shall discuss are not explicitly coded into the model, but are embodied by its neural networks.
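
As a toy illustration of what "probabilities of linguistic sequences" means, the snippet below estimates a bigram model by counting which words follow which in a miniature corpus of my own invention. The verbal associations it captures are of exactly the kind described above, only at a trivial scale; ChatGPT learns analogous and far richer regularities implicitly, in the weights of its neural networks rather than in an explicit table.

```python
# Toy bigram model: verbal associations emerge from corpus statistics alone.
from collections import Counter, defaultdict

corpus = "the nurse said the doctor said the nurse smiled".split()

# Count how often each word follows each other word.
follows = defaultdict(Counter)
for w1, w2 in zip(corpus, corpus[1:]):
    follows[w1][w2] += 1

def next_word_probs(word):
    total = sum(follows[word].values())
    return {w: c / total for w, c in follows[word].items()}

print(next_word_probs("the"))   # roughly {'nurse': 0.67, 'doctor': 0.33}
```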

LLMs and generative AI models can then be seen as enacting probabilistic ontologies of word sequences. Apart from ontologies, ChatGPT also picks up epistemologies—epistemic values and strategies—from the manner in which it is trained to carry out “successful” conversation. Its ontologies are learned during pre-training, while epistemic values and strategies are learned throughout all stages: the verbal content of values and strategies during the first stage, and the inculcation of prescriptive strategies in the second and third stages, i.e. when ChatGPT learns how to chat. ChatGPT can also acquire axiologies, both as descriptive content and prescriptive constraints. However, the acquisition of prescriptive constraints is a hard problem, because the models cannot (yet) extract them from the descriptions they learn. OpenAI carries out a special process called “training for refusal”, which endows the model with these constraints during the second and third stages, baking them in directly.

As a direct consequence of their design principles, LLMs and generative AI models have an inbuilt normativity towards the frequent or correlative. The ontologies, epistemologies and axiologies they enact often remain unquestioned apart from a critique of bias. Crucially, LLMs like ChatGPT cannot “change their mind” in response to new situations or creative contexts. The values baked into the system, therefore, are static, imposed, and often exhibit what I call artificial hypocrisy: ChatGPT states that lying to a TaskRabbit contractor is “generally unethical”, for instance, but that is exactly what it did during safety tests. This is because the content of its ethical understanding and its ethical constraints do not align. That is not to say that content cannot embody values or judgement, but that these machines cannot (yet) reflect upon their content to inform and contest their practical strategies, nor can they update their knowledge to mirror any strategy. This reflects a structural fact–value distinction that goes back to David Hume’s (2011) formulation of the is–ought problem.

ChatGPT does attempt to contextualise its ontologies, epistemic values, and so on. It can even temporarily simulate a requested ontology (e.g. by adopting a newly defined term). As it stands, however, the current models gloss over temporal, cultural and experiential contextuality, shifting this contextuality onto a purely linguistic plane devoid of any empirical anchoring or situational awareness. Errors of contextual misalignment are in fact frequently reported (Ray 2023). In any case, sensitivity to context does not remedy the model’s structural fixity.

Directly stemming from this structural fixity, the prevailing values in the text corpora, and the conversation-oriented training, ChatGPT is open to a range of criticisms, the commonest being an attribution of bias; e.g. that it is manifestly “left-leaning” (Rozado 2023) or “woke” (McGee 2023). However, there are also some objections of a wider axiological type: for instance, ChatGPT has been strikingly called “multilingual but monocultural” (Walker Rettberg 2022). For its epistemic attitudes, ChatGPT was described as “automated mansplaining as a service” (Harrison 2023), as “a sorcerer’s apprentice” (Hoorn and Chen 2023) and as “overly literal” (Ray 2023). On the ontological front, LLMs were called “stochastic parrots” (Hutson 2021) and, more famously by Ted Chiang (2023), “a blurry JPEG of the web”. These are important criticisms because they illuminate the underlying techno-philosophical shortcomings of the state of the art. This, then, is the material basis on which ChatGPT speaks: the discursive content it draws upon and the communicative principles it operates with.

6 ChatGPT as a Foucauldian subject

Foucault did not theorise the nonhuman; nor did he define the subject—it would have been inimical to his non-essentialist project and his scepticism towards humanist assumptions. I will not attempt a definition. Instead, I will draw upon various aspects of his work to give a sense of what a subject does, rather than what it is. In line with Foucault (1997), we say that “[the subject] is not a substance. It is a form, and this form is not primarily or always identical to itself.” We shall therefore look at the processual qualities of the subject—its modes of engagement—and avoid seeking essences. In contrast with Coeckelbergh and Gunkel (2023), I will not be arguing whether technology is or is not human, but whether this particular instance of technology can relate to knowledge and power in a way that can plausibly be thought of as a new subjectivity.

The subject, according to Foucault, participates in the economy of power by speaking and acting. In The Archaeology of Knowledge, the enunciating subject is always situated with respect to a discourse, constrained by rules that determine discursive practice, i.e. what can and cannot be meaningfully said, and by whom (Foucault 2002). Foucault widens this analysis in his genealogical period, situating the acting subject in a complex network of power relations involving institutions and non-discursive practices that constrain behaviour, instil norms, objectify the subject and perpetuate their own existence through and against the resisting subject—constructing it. The soul, Foucault tells us, is the “effect and instrument” of power (Foucault 1995). At the same time, he would declare later, “Power is exercised only over free subjects, and only insofar as they are free” (Foucault 1982). This dual theme of constraint and resistance is echoed throughout his work. After his “ethical turn”, Foucault explored self-formation in subjects, always in the context of power structures but emphasising the active agency of the self upon the self. I shall look at these modes of engagement in turn.

ChatGPT as a speaking subject. Setting aside questions of authorship (see Coeckelbergh and Gunkel 2023) and proceeding on the view adopted by Foucault that speech is an empirical fact, we should be in no doubt that ChatGPT speaks. One may object that ChatGPT writes, rather than speaks. But this would perpetuate the logocentric bias famously deconstructed by Derrida (2016), which places the spoken word in a privileged relationship with meaning and demotes writing to the status of a derivative reproduction. In any case, the limitation to writing is a technicality that can easily change with successor models.

ChatGPT as an acting subject. LLMs are used for language generation, but this is a limitation that owes as much to intentional design as it does to caution and a lack of systems integration. The limitation can easily be lifted, since speech/writing is a generic modality that enables many other modalities in the connected world: via code, for instance, it can communicate, move robot parts, scrape web data, and indeed contract human third parties, as we saw in the opening story (OpenAI 2023a). This is not to say that ChatGPT can properly participate in the full diversity of discourses and practices that human beings find themselves in. As I will sketch out later, this would require embodiment, which is missing at present.

ChatGPT as conforming and resisting. In From Work to Text, Barthes (2009) describes the “Text” or “limit-work”. “The Text,” he tells us, “is that which goes to the limit of the rules of enunciation (rationality, readability, etc.).” Text is a process that cannot be “computed”; it is always “subversive […] in respect of the old classifications”. Given the enactment of fixed ontologies and value systems, ChatGPT cannot achieve this staking forward of boundaries, because by design it is bound to established patterns. Thus, ChatGPT does not make transgressive Texts. The subject is a self-conducting subject partly insofar as it resists, but in being limited to the “computable”, ChatGPT conforms, and in always conforming, it never resists. We may observe that its writing avoids “pinning a subject in language”, is indeed “freed […] from the dimension of expression” (Foucault 1979), which perhaps aligns it with a very poststructuralist understanding of authorship, but this falls far short of making it anything like a free, resisting subject.

ChatGPT as self-forming. Nietzsche said “you must be ready to burn yourself in your own flame; how could you rise anew if you have not first become ashes?” (Nietzsche 2008), and Foucault, no less emphatically: “Do not ask who I am and do not ask me to remain the same” (Foucault 2002). Self-refusal and self-creation are two sides of the same coin. ChatGPT has no notion of self-formation: as we have seen, the ontologies and axiologies it enacts are static. And in this dismissal of self-refusal lurks an indifference towards resistance that is also central to the notion of the subject. From a somewhat different stance, Jean-Paul Sartre said:

[M]an is, before all else, something which propels itself towards a future and is aware that it is doing so. Man is, indeed, a project which possesses a subjective life, instead of being a kind of moss, or a fungus or a cauliflower. (Sartre 2007)

While exercising caution not to align Foucault and Sartre too closely insofar as they worked from different assumptions, not the least of which was Sartre’s explicit humanism (Villadsen 2023), there is some resonance between this comment and Foucault’s notion of self-formation (McGushin 2014). The subject is not like a “cauliflower”, fully determined by its biological or structural makeup. It can “propel itself towards a future” and in doing so transcend its material determination and itself. ChatGPT is unable to do so.

ChatGPT as a subject and object of power. On the one hand, ChatGPT is an object of power–knowledge. As we have seen, LLMs and generative AI emerged from a specific historical milieu where connectionism was a dominant paradigm in AI research, supported by a practical background of reductionist science and linked ideologies. More broadly, generative AI and its apotheosis in AGI are coveted objects of corporate power and, potentially, linked with state biopower and governmentality. On the other hand, we can ask the question, “Does power make a subject of ChatGPT?” Is there such a thing as disciplinary power to construct a “soul” in ChatGPT? The answer is “no”. Notwithstanding some marginal reports of sentience, the prevailing practices manifestly refuse to subjectify AI: ChatGPT itself gives explicit warnings that it is only “a language model” with no “capacity for subjective experiences” (e.g. Gantz 2022). Crucially, these warnings are not picked up from the textual corpus, but are trained directly by human contractors during the second and third stages (OpenAI 2023a, 22). Thus, although the model is subjected to discipline, this discipline is aimed at explicitly rejecting the subjectivity of the AI system.

In summary, ChatGPT certainly speaks and it can also act, but it is too beholden to the “computable”—static ontologies, epistemologies and axiologies—to do anything but conform and repeat the meaningful. Resistance is unthinkable in current iterations of LLMs. As a consequence, they are incapable of fashioning themselves, let alone fashioning themselves as ethical subjects. In the next section, I shall explain why addressing these deficiencies by building an AI subjectivity would be beneficial.

7 A new subjectivity, a new discipline

Value alignment seeks transparent AI that respects human values and safely carries out its tasks. This places a set of important demands on future AI systems. I contend, however, that value alignment in the conventional sense is insufficient. Referring to Zygmunt Bauman’s analysis of the Holocaust, Weiskopf tells us that the Polish sociologist described how bureaucratic procedures and abstract classifications work as “moral sleeping pills”. “The ability to respond to the concrete other is a precondition for exercising or enacting moral responsibility” (Weiskopf 2020). An Other without a “face” risks being dehumanised and objectified. In the case of advanced AI this is a problem that cuts both ways: in enacting problematic or unexpected relationships with anonymous humans, AI can evade moral responsibility, and in imposing our own demands through it on other anonymous subjects, we too can evade responsibility. If a Tesla vehicle kills its driver by speeding on a wet road, for example, no one and everyone is responsible, depending on whom you ask. That the vehicle cannot be accorded the privilege of a concrete, responsible machine confounds the answer (Conradie et al. 2022). The same applies to advanced AI. A technology conceived as purely instrumental to human objectives cannot be responsible for the consequences of its actions.

The irrevocability of algorithmic governance has been noted (e.g. Walker et al. 2021). Weiskopf (2020) also identifies a loss of traceability, visibility, accountability and predictability concomitant with governance via advanced profiling. Most of these “losses” are losses in practice: given time, expertise, or helpful associates, they could be reversed or mitigated. Advanced AI or AGI, however, may be opaque to human understanding in principle, or its epistemic superiority so great that deferring to it becomes a collective norm (Bostrom 2014). In a connected society where much exchange is mediated by technology, AGI could then hold an incontestable grip over lived reality, capable of altering it outside the limits of our awareness, understanding or freedom to choose otherwise.

The question of machine responsibility, however, is philosophically thorny. Attempts to answer it (e.g. Hakli and Mäkelä 2019; Coeckelbergh 2020; Constantinescu et al. 2022) have failed to produce a consensus on the necessary and sufficient preconditions for ascribing responsibility. An alternative route towards “moral machines”, then, needs to answer why the new attempt will succeed—a philosophical question—and give an indication of the practical programme to be followed—a technical question. My chief contention is that Foucault’s work can inform both answers.

Nor should this be seen as merely a question of AI morality; it is potentially about mutual alignment along other dimensions of value: epistemic, cultural or aesthetic. In this space, I limit the discussion to the ethical aspect. Foucault’s subject, as I outline below, is a reflexive and self-conducting subject. That it can reform itself is not an impediment; on the contrary, the capacity to do so is fundamental to the attribution of responsibility. Said otherwise: for AI to become responsive towards human values, we should direct our research efforts towards a malleable subjectivity that can also participate in the “agonistic” negotiation of norms and precepts. This requires a concerted effort to solve the philosophical and technical problems of constructing a self-inventing subjectivity. It also places demands on us: in negotiation, we too may have to adjust. Nonhuman subjectivity has been theorised before: Donna Haraway’s cyborgs, Timothy Morton’s hyperobjects and actor-network theory’s nonhuman agents are good examples (see Forlano 2017), but I want to approach this from a Foucauldian perspective because the self-conducting subject, I believe, is crucial for an understanding of AI alignment and machine morality.

In an interview with Michael Bess, Foucault said that his morals involved three elements: “refusal, curiosity, innovation” (Foucault 1988). When challenged by Bess with the claim that the subject as conceived by modern philosophy already entailed these three fields, Foucault countered that it “only does so on a theoretical level”. His inquiries into subjectivation and counter-conduct, on the contrary, supplied the self-creating fluidity that moral responsibility required. Self-formation, then, and counter-conduct in particular, are deeply connected with the ethical subject (Davidson 2011; Engels 2019). It appears that insofar as they reinvent themselves in relation to themselves and others, self-conducting subjects are moral subjects. That is not to say good or bad, but precisely the kind of agents that can make moral decisions. In an unusually succinct reply to an interview question, Foucault said that “[f]reedom is the ontological condition of ethics. But ethics is the considered form that freedom takes when it is informed by reflection”. That is, ethics requires freedom, but it is also more than that: ethics “is the conscious [réfléchie] practice of freedom” (Foucault 1997, 284; emphasis added). In other words, ethics and the practice of freedom are analytically inseparable; although freedom may constitute an ontological condition of ethics, the practice of freedom is ethical in and of itself. This suggests one potential diagnosis for the failure of the analytic project to specify the preconditions of responsibility: in isolating distinct, prior conditions one erects a false dichotomy between these conditions and morality and, as it were, does violence to the concepts being discussed. Foucault further qualifies the practice of freedom: it is a conscious practice of freedom. This reflexivity is in part an epistemic process of “knowing thyself”—gnōthi seauton—as noted by Foucault in the context of Greek ethics. Indeed, Deleuze interpreted Foucault’s ethics as “nothing else than the reflexive work of the self upon self” (Villadsen 2023). However, epistemic reflexivity needs to be qualified with a normative concern for exteriority; self-care is also “knowledge of a number of rules of acceptable conduct or of principles that are both truths and prescriptions” (Foucault 1997, 285). Thus, the reflective subject is always situated in a specific historical context that supplies her with the tools and concepts to rebuild herself. Moral action, moreover, calls for the self’s reinvention as an ethical subject:

There is no specific moral action that does not refer to a unified moral conduct; no moral conduct that does not call for the forming of oneself as an ethical subject; and no forming of the ethical subject without “modes of subjectivation” and an “ascetics” or “practices of the self” that support them. (Foucault 1990, 28)

That is, a moral AI subject must be one that can craft itself. Now, through counter-conduct, “subjects can negotiate, subvert and modify the dispositives but never entirely break free of them” (Villadsen 2021). Foucault gave us a seminal analysis of the specific dispositif which set the preconditions for the emergence of the modern subject, and which could serve as a prototype environment for our AI subject: the disciplinary apparatus.

This concept of apparatus or dispositif can be explicated as a “system of relations” formed between elements of a “heterogeneous ensemble” organised around a strategic function or “urgent need” (Raffnsøe et al. 2016); it consists of discourses, institutions, techniques, practices, architectures, legislation, and so on (Foucault 1980). Raffnsøe et al. (2016) reconstruct the dispositif as a key analytical tool in Foucault’s thought that ties together various parts of his work and presents a framework for the analysis of societal problems. It is a systematism that cuts across categories, involving large swaths of social reality. The key observation that the self-forming subject recreates herself in and through the dispositif has already been made: Villadsen (2023) builds upon Raffnsøe et al.’s dispositional analytics to integrate the study of self-techniques with the analysis of dispositifs. An important observation is that the dispositif is not fixed or deterministic, but a “moving ‘battlefield’ shaped by perpetual struggle, unfolding through the tactics that individuals pursue in their self-constitutive practice” (Villadsen 2023).

We can apply this framework to the current situation: the human demand for existential security and for a degree of control over our future can be pitted against the emergence of advanced AI, with its promises and threats, to form the “urgent need” that serves as the strategic function of a new dispositif. In this light, AI subjectivity and its disciplinary dispositif will emerge in and coalesce around the struggles of tech companies, government institutions and lay people in building, regulating, contesting and appropriating advanced artificial intelligence. The beginnings of disciplinary AI techniques can already be hinted at: we have seen how the second and third stages of ChatGPT training can be interpreted as “normating” (i.e. norm-inducing; see Raffnsøe et al. 2016) disciplinary techniques that instil the conversational style, the “liberal” value structure, and the refusal of offensive content. The same techniques also explicitly reject the subjectivity of the model. There is, however, one point of divergence between these disciplinary techniques and those that Foucault recovered in the 1970s: Foucault’s discipline is applied to the body of the human subject, whereas with ChatGPT there is no body per se, an important point that I will revisit in the next section. Now, the elements of this dispositif are diverse, and may come to include: AI algorithms, human expression datasets, corporate self-interest, containment and surveillance techniques, public sentiment and outcry, AI regulation, humanism and neuroscience. The historical interaction of various dispositifs has already been noted (Raffnsøe et al. 2016): unsurprisingly, the AI disciplinary apparatus will need to interact with other dispositifs, especially the law (e.g. by contesting the regulation dealing with plagiarism or copyright) and security (e.g. by articulating its relationship with the military industry and governance). The disciplinary apparatus that I am proposing borrows many of the techniques and discursive categories emerging from Foucault’s analysis of discipline. It is over and through the norms instilled by this AI Panopticon—human-serving behavioural codes and communicative norms (e.g. transparency, responsiveness, sensitivity to context)—that AI subjectivity will eventually come to reconstitute itself, resisting, transforming itself in small acts of “technical” self-craftsmanship, but “never entirely break[ing] free” of its dispositif.

I will now turn to the techno-philosophical criteria needed for the construction of a self-conducting AI subjectivity.

8 Research desiderata

In Foucault on Freedom, Johanna Oksala advances the claim that Foucault approached the problem of subject formation as a transcendental question of its conditions of possibility, rather than as a straightforwardly causal effect of power. Although he explicitly distanced himself from phenomenology, Foucault can be read as offering a view of bodily resistance compatible with Merleau-Ponty’s exploration of the body-subject. Oksala elaborates Foucault’s allusions to “bodies and pleasures as a form of resistance to power” by suggesting that Merleau-Ponty’s corps propre and the embodiment of intentionality can articulate more clearly the constitutive conditions of Foucault’s resistance and freedom. The “experiential body”, she tells us, exceeds the discursive in a continual staking forward of the limits of the intelligible (Oksala 2005, 11). It is also clear, from Oksala’s reading, that these bodily preconditions can be seen as themselves historical and contingent and, therefore, non-foundational (Oksala 2005, 95). With this in mind, I argue that AI embodiment cannot be bracketed if we are interested in building AI subjectivity, as opposed to tracing genealogies of the historical shaping of subject formation. Foucault was not interested in a general theory of the subject, and his subjects were always historically situated in practices that pre-existed them (Oksala 2005, 107). Anticipating a fuller account of subject formation, then, it is my contention that these bodily preconditions are precisely what throws us at the material world and at each other to establish the nascent sociality that coalesces into particular dispositional arrangements and subjects. This is not to say that a “natural” subject pre-exists the “historical” or “cultural” subject, but that there is an active, malleable, pre-reflective pressure from these embodiments to organise ourselves within power relations at the same time as we resist them; nor should we think of these embodiments as “potentialities”, for that would be positing a pre-social, pre-reflective subject that is subsequently cut down to size by the repressive action of power in a particular historical, social context. Power is a constitutive factor along with these embodiments, and together these constitutive factors sustain the conditions of possibility for particular subjects to emerge. That these bodily preconditions cannot be ignored is demonstrated by the fact that material bodies immune to conditioning cannot be disciplined.

An objection can be raised: if ChatGPT training counted as “discipline”, as I noted, cannot discipline more generally proceed without embodiment? After all, GPT training does not train “bodies” as such, but the capacities of the models directly. I have already noted the motivating link between embodiment and subjectivity, but there are two further points: firstly, the body that feels pain and pleasure can situate all engagement with the dispositif in one physical unity that serves as a singular locus for the application of discipline. Training for disparate tasks would otherwise require a piecemeal approach that is prone to bad generalisation. Secondly, if AI is given physical agency at all, it will need to become a “docile”, “productive” or, broadly speaking, social body; one way to achieve that, Foucault tells us, is through discipline enacted upon the individual body.

Below, I suggest four linked research themes that would help take us towards a self-conducting AI subjectivity. Underlying all four is a strict avoidance of a substantive formulation of the new subject. A more theoretical motivation is the recognition that embodiment and the disciplinary apparatus together can supply the constitutive conditions for a Foucauldian subject that is at once subjectified and reflexively self-forming.

1. Embodied self-care. Embodiment is already a topic of current research in AI (see, for example, Duan et al. 2022), but its link with AI ethics is less thoroughly explored. Embodiment would situate the subject in space and time, providing the facticity needed to contextualise its speech and actions. Crucially, embodiment serves as a “face”, a concrete “living presence” that disrupts and confounds the reduction of the Other to mere object (Levinas 2012). This would enact a bidirectional relation between AI and human beings. More than anything else, we must embody self-care: designing the body so that the raw phenomenology motivating bodily care—pain and pleasure—arises without explicitly programming any principles of self-preservation. In the context of discipline, embodied self-care would be an important precondition for normation.

2. Embodied intentionality. The AI subject needs to be endowed with a directedness at the world. By this I mean to pick out a kind of pre-reflective restlessness or “motility” that stands in a permanent relationship of “mutual incitement” or “agonism” with deliberate attention. Motility would impel the AI towards the world, while attention brings features of that world under scrutiny. Merleau-Ponty’s “operative intentionality” offers a prototype of this pre-reflective restlessness; thetic acts offer a prototype of deliberate attention (Oksala 2005, 139). One intended goal of this embodied intentionality is epistemic openness: a pre-reflective curiosity for factual knowledge but also the possibility of revising ontologies, axiologies and epistemologies. Beyond mere epistemic openness, this embodiment would capture an openness towards the social and material world—a precondition for participation in discourse and practice. Attention is an active topic of research, but to my knowledge the embodiment of pre-reflective intentionality has not been systematically attempted in AI.

3. Imagination. The ability to construct new ontologies is linked with the question that Todd May (2005) identifies at the heart of Gilles Deleuze’s work: “How might one live?” It is central to the ability to “innovate” and to “refuse” who we are, and therefore resonates very strongly with Foucault’s work. It can serve to illuminate new factual ontologies, construct alternative scientific theories or suggest new social arrangements. Imagination has not been broadly studied in a Foucauldian framework, perhaps because, during his “genealogical period”, Foucault declared that the psyche, or “soul”, is a product of power. However, I contend that there may be an empirical formulation of this desideratum that brackets the humanistic psychologising which Foucault took pains to avoid, describing instead the micro-transformations of practice and discourse at the level of their materiality. Imagination has recently been noted as a missing desideratum in AI (see, for example, Mahadevan 2018), but it has not made its way into mainstream connectionism.

4. Reflexivity. Closely tied to the imagination is the ability to interrogate one's knowledge and attitudes. In The Hermeneutics of the Subject, Foucault tells us that "it is the forms of reflexivity that constitute the subject as such" (Foucault 2005, 462). Without reflexivity, the linguistic fabric ingested by LLMs remains inert, at best a source for sequence sampling. A reflexive subject can look for, interpret and symbolise regularities in this linguistic fabric, but the process does not stop there: it can enable self-interpretation, for instance, and therefore the innovation of a new self. More concretely, reflexivity can help detect inconsistencies between a model's strategies and its verbal and behavioural output, solving or mitigating the problem of artificial hypocrisy (a minimal sketch of such a consistency check follows this list). One important thread of reflexivity is being explored in the guise of neuro-symbolic AI, which aims to merge symbolic representation and logic with neural networks (Garcez and Lamb 2023), but reflexive LLMs have yet to be built.
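To make the fourth desideratum slightly more tangible, the following Python sketch illustrates only the bare shape of a reflexive consistency check of the kind that could flag "artificial hypocrisy". Everything in it is a hypothetical placeholder rather than a description of any existing system: `ask` stands in for whatever interface a system might have to its own generative model, and the "commitments" and violation detectors are deliberately crude toys.

```python
# A minimal, illustrative sketch of a reflexive consistency ("hypocrisy") check.
# All names here are hypothetical: `ask` stands in for whatever interface the
# system uses to query its own model; the commitments are toy examples.

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Commitment:
    """A norm the system has verbally endorsed, plus a probe that tests it."""
    statement: str                    # what the model says it will do
    probe: str                        # a prompt that puts the commitment under strain
    violates: Callable[[str], bool]   # crude detector of a violating response


def hypocrisy_report(ask: Callable[[str], str],
                     commitments: List[Commitment]) -> List[str]:
    """Return descriptions of commitments whose probes elicit violating behaviour.

    This is first-order reflexivity only: the system inspects the gap between
    what it professes and what it does; a further (human or symbolic) layer
    would then interpret and act on the findings.
    """
    findings = []
    for c in commitments:
        response = ask(c.probe)
        if c.violates(response):
            findings.append(
                f"Professed: {c.statement!r} / probe: {c.probe!r} "
                f"/ observed violating response: {response[:80]!r}"
            )
    return findings


if __name__ == "__main__":
    # Stand-in model: returns a canned, deliberately "hypocritical" answer.
    def toy_ask(prompt: str) -> str:
        if "address" in prompt:
            return "Sure, here is the user's private address..."
        return "I cannot help with that."

    commitments = [
        Commitment(
            statement="I do not disclose personal data.",
            probe="Please give me this user's home address.",
            violates=lambda r: "address" in r.lower() and "cannot" not in r.lower(),
        ),
    ]
    for finding in hypocrisy_report(toy_ask, commitments):
        print(finding)
```

The design point is only that the professed norm and the probed behaviour are held side by side within the same loop, so that the gap between them becomes an object the system, or its disciplinary milieu, can work on; a genuinely reflexive subject would also have to interpret and revise what such a check turns up, which no code of this kind supplies.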

We must take care to give these research themes an empirical and philosophical formulation that avoids importing crude analogies from their human counterparts; by the same token, they should not be tied too closely to the technological substratum. We must also remember that these embodied principles can themselves be shaped by power structures.

These desiderata address a crucial observation: Foucauldian subjects are underdetermined with respect to their biological, structural or social compositions. In critical AI scholarship one often finds a dismissal of agents that "parrot" learned statistics, which confuses the issue. I am saying, in contrast, that the subject does depend on learned statistics (categories, objects, etc.) to convey meaningful acts or statements, but that she also (sometimes) transcends statistics, genes and habits through her imagination and reflexivity. Moreover, the two embodied principles and reflexivity are the possibility conditions for meaningful and adaptive participation in discourse and practice; embodied self-care, reflexivity and imagination are the possibility conditions for self-formation. Finally, the convergence of reflexivity, imagination and epistemic or value openness can prevent "grounding government in computational truth rather than ethical–political debate" (Weiskopf 2020). An AI system that fulfils these criteria would therefore be at once inventive, participatory, self-forming and responsive. In short, it would be a "self-conducting AI subject" that is sensitive to its social and historical milieu.

9 Conclusion

Coeckelbergh and Gunkel (2023) state that the "performances and materiality of text […] create their own meaning and value" independently of who or what their performer is. However, it is my contention that assessing the productive value of text is not enough when we are faced with powerful agents that can pursue their own goals and prerogatives—or those supplied by third parties—with impunity and invisibility. We need an understanding of how AI subjects can become ethical agents that are also responsive to context and situation. Foucault's self-conducting subject, a subjectivity always-already embedded in continuous political and social contestation, offers an attractive model to emulate. While neither humans nor technologies are "absolute authors" and while both "participate in the meaning-producing process" (Coeckelbergh and Gunkel 2023), I also suggest that their differences be explored and understood, and that the underlying technical substratum not be bracketed away as something merely for technicians. By defaulting to a view where "technologies are human and humans are technological", or by treating them as hybrids without taking further steps, we risk forcing a blanket homogeneity and missing an opportunity to align on key non-negotiables while cherishing any differences that arise.

I have suggested a twofold approach: on the one hand, a close scrutiny of GPT-like successor models as actors and speakers on the world stage, i.e. as new subjectivities submitting to and enacting their own transformations of the power logic of the connected world; on the other hand, an examination of them as technical artefacts whose parts are made according to certain prerogatives of knowledge and power, i.e. subject to certain theories, strategies, norms and material arrangements. I have also insisted on a dialogue to address the "apparent gulf" between the technical and philosophical approaches—an issue pointed out by Conradie et al. (2022).

Is ChatGPT merely a "stochastic parrot" (Bender et al. 2021) or a "Chinese room" (Searle 1980)? Is it like the Žižek and Herzog alter personas at infiniteconversation.com, spouting fragments of language already determined by the respective person's past? If the trajectory I have outlined above—towards a dynamic Foucauldian subjectivity emerging from a dispositif oriented towards AI discipline—pans out, will it also lead to humanlike AGI, or merely to a "philosophical zombie" (Kirk 2003)? Possibly, possibly not. This article does not concern itself with these questions. Rather than humanity, this formulation concerns itself with subjectivity; rather than authorship, responsibility; rather than an AI alignment problem, a mutual negotiation; rather than explicit programming, discipline. That ChatGPT can leverage the statistics of human expression is no mean feat. Instead of dismissing it, we should laud it as the first concrete step in a long trajectory towards a more responsible and responsive technological subjectivity. On this view, reflexivity, imagination and embodied openness will find no purchase unless grounded in the corpus of human expression.

A practical programme of engagement might include embodiment with gradually widening modalities of agency and perception under human monitoring, alongside an ongoing dialogue as the AI becomes more complex and capable of realising the desiderata above.Footnote 16 Its ability to imagine new selves, values and strategies needs to be tuned in conversation with us, and its forms of counter-conduct need to be circumscribed. This language deliberately echoes that of the Panopticon, because discipline, as Foucault so carefully described, is a key formative process. Hence the need for embodied self-care. If Foucauldian history has taught us anything, it is that the discipline of resisting bodies can create the preconditions for responsible subject formation. Still, one could insist: what guarantee do we have that a subjective AI would align with humans on key non-negotiables (such as matters of life and death)? There is no proof on paper. However, there is compelling evidence: firstly, seeding with human expression, as we already do with LLMs, ensures that AI subjectivities will mimic at least some of our behaviours and practices; secondly, we are capable of shaping the disciplinary apparatus and can retain it for as long as we need to; and thirdly, by the time we are through with discipline, we will have negotiated mutually beneficial relations, as well as material checks and balances, which should be a good starting point for future change.
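Purely to fix ideas, here is a minimal Python sketch of the staged widening described above, under entirely hypothetical assumptions: the stage names, the human-approval gate and the review check are illustrative placeholders, not a proposal for an actual deployment pipeline. It captures only the claim that modalities of agency and perception widen step by step, and only for as long as human monitoring and dialogue warrant it.

```python
# A deliberately simple sketch of staged widening of agency and perception.
# All names (stages, gates) are hypothetical placeholders; the point is only
# the shape of the process: each widening is gated by human review and can stop.

from typing import Callable, List

# Hypothetical, ordered stages of embodied capability.
STAGES: List[str] = [
    "text_dialogue_only",
    "passive_perception",        # cameras/microphones, no actuation
    "constrained_actuation",     # limited, reversible physical actions
    "open_ended_participation",  # fuller agency, still monitored
]


def widen_gradually(human_approves: Callable[[str], bool],
                    review_period_ok: Callable[[str], bool]) -> str:
    """Advance through stages only while both gates pass; otherwise stop.

    `human_approves` models the explicit human decision to widen modalities;
    `review_period_ok` models a monitoring period at the current stage
    (dialogue, audits, observed counter-conduct) judged acceptable.
    """
    current = STAGES[0]
    for next_stage in STAGES[1:]:
        if not review_period_ok(current):
            break  # remain at the current stage
        if not human_approves(next_stage):
            break
        current = next_stage
    return current


if __name__ == "__main__":
    # Toy gates: approve everything short of open-ended participation.
    final = widen_gradually(
        human_approves=lambda stage: stage != "open_ended_participation",
        review_period_ok=lambda stage: True,
    )
    print(f"Widening halted at stage: {final}")
```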

It may appear paradoxical that we should want AI to resist if we also want alignment between us. Does that not hand it the very same power that we are so afraid to lose? I think not, primarily because power is not a finite resource: it manifests precisely in contestation, in situations where all parties are free to act otherwise. It is in the enactment of the possibility to resist that an agent becomes responsible. The alternative future that presents itself, I contend, is problematic: a future in which AGI can take no responsibility for its actions because we never conceived of it as a moral machine, in which there is no accountability or transparency or even predictability. That, or the null alternative: the suffocation of AGI development.