1 Introduction

With the ever-increasing proliferation of Artificial Intelligence (AI) technologies – a general grouping that includes information technology systems “involving Machine Learning, Deep Learning, Natural Language Processing, Computer Vision, Evolutionary and Swarm Intelligence, Knowledge-based Systems, and Multi-Agent Systems” (Tidjon & Khomh, 2022) – and the continual expansion of their capacities, various political and corporate interests have produced policies for regulating these potentially revolutionary technologies. Where concern has been expressed about these policies, it is mostly regarding how they are to find application, and whether the principles they lay out actually achieve the ethical standards that should be demanded of them. These are vital questions to be sure, but ones that we will only engage with tangentially in this piece. Our aim is rather to unpack one worry about the labelling of many of the most prominent of such policies, and about what this labelling signals to the public.

Many of these policies have been structured under labels that follow a particular (and we will argue potentially pernicious) trend: national or international guidelines, policies or regulations, such as the EU’s and USA’s Trustworthy AI and China’s and India’s Responsible AI (Footnote 1), frame themselves under a label that seems to attribute morally and agentially loaded notions such as responsibility or trustworthiness to AI technologies, even if this is merely an implicit result of how these policies are branded. These notions are broadly understood to be appropriate only when applied to agents, and the concern here is that this may create an inducement to inappropriately attribute these agential states to AI technologies, which are not themselves legitimate bearers of responsibility or trustworthiness. In many ways this follows from and extends the concerns that have been raised by Ryan (2020) regarding ‘Trustworthy AI’, but we endeavour to show that we have good reason to avoid any general AI policy that follows the labelling recipe of [agentially loaded notion + ‘AI’], using the most popular versions – ‘Trustworthy AI’ and ‘Responsible AI’ – as our exemplars. We also go beyond Ryan’s original argument in providing additional reasons why such labels should be avoided. To replace this potentially troublesome framing, we suggest that it is better to use the labels ‘reliable’, ‘safe’, ‘explainable’ or ‘interpretable’, and ‘sustainable’ AI in public-facing communication. As these notions already form the backbone of most of these policies, the cost of making this change appears to be very low. Relatedly, if the aim is to use a label that will communicate the presence of all of these qualities as well as a stronger moral acceptability to the public (as the labels of ‘trust’, ‘responsibility’, and ‘ethical’ convey), then we suggest labelling the policies not in terms of some quality of AI, but rather in terms of our approach to the technology and its wider development and use context: using labels such as being trustworthy about AI, rather than trustworthy AI.

We develop this argument by first outlining why the demand for the regulation of AI has become such a global trend, with an emphasis on the ethical concerns that such regulation is supposed to mitigate or resolve. The dangers raised by the use of agentially and morally loaded notions like ‘Responsible AI’ and ‘Trustworthy AI’ are then explained, using these as our examples and focusing on the way these notions may entice us to attribute inappropriate capacities and status to AI technologies, with the danger of both obscuring real ethical shortcomings by involved human agents and setting up perverse incentives for how these technologies are developed, sold, and used. We then show how, when it comes to the qualities of the AI technologies themselves, aiming at the less loaded notions of reliability, safety, explainability or interpretability, and sustainability gets us coverage of the ethical concerns we originally sought regulation for – or at least as far as a consideration of the technology in isolation can get us – and that this is widely recognised even by those producing the policies under discussion. Finally, as the ethical concerns we have extend beyond the properties of the technologies themselves and can only be properly accounted for by considering the socio-technical contexts of their development and use, we should employ framing devices and labels that capture this whole. It is, for example, not about whether what is produced is an AI that is trustworthy or responsible, but about whether those involved are trustworthy or responsible about AI.

2 Ethical Concerns and the Demand for Guidelines

The moral perils raised by the use of AI technologies, prominently including morally problematic bias, privacy violations, techno-responsibility gaps, and concerns about sustainability, have received considerable attention within the data ethics literature – and some of these have already manifested to various degrees in real-world situations.Footnote 2 At the same time, there is a concern that many in the general public approach AI technology with a surfeit of caution (Oxborough & Cameron, 2020), and that this must be combated by finding ways to promote confidence in, and the uptake of, such technologies (Gillespie et al., 2021: 2–3). This is exacerbated by the fact that the complexity of the technologies in question often results in high levels of epistemic opacity regarding their internal functions or how a given outcome came about – the so-called black box problem. Meeting these challenges will likely involve the sorts of expensive and time-consuming measures that corporations are not known for undertaking without regulation. There is also a lack of clarity as to what it means for these technologies to be rendered sufficiently explainable or interpretable to avoid epistemic opacity, and how this is to be accomplished (Brennen, 2020). This creates a demand for guidelines – a set of best practices – put together by experts, publicly communicated, and socially endorsed, which can then be followed.

In the face of this, there is justified pressure to develop policy guidelines and regulations that address these moral perils while also promoting increased user confidence in AI technologies. Although there are numerous examples of measures of this sort being enacted to varying degrees of good faith among corporations, the most significant have been undertaken by mixed expert groups in the strategic interest of a major political player: in the European Union, the Ethics Guidelines for Trustworthy AI (HLEG AI, 2019) and the EU AI Act (European Commission, 2021); in the USA, Promoting the Use of Trustworthy Artificial Intelligence in the Federal Government (Executive Order 13960, 2020) and the Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence (Executive Order 14110, 2023); in the People’s Republic of China, the Governance Principles for a New Generation of AI (MOST, 2019); and, in India, Towards Responsible AI for All (NITI Aayog, 2021). The aim of such regulation is multidimensional. It aims to promote further innovation in the field as well as to ensure that the development, purveyance, and use of AI meet ethical standards. It also aims to advance the geopolitical positioning of the relevant political actors; and, even if not a direct aim, the involved corporate stakeholders will doubtless push for choices that will boost their eventual profits. Finally, and most pertinently here, it aims to convey to the public certain understandings and expectations regarding AI technologies. Given these mixed aims, it is perhaps unsurprising that the results have often involved “lofty principles, conflicting interests” and not a small degree of controversy (Arcesati, 2021).

3 AI Policies and Agentially Loaded Qualifiers

As we have seen, the most influential AI policies, at least at the national level, have adopted the labels of ‘Responsible AI’ and ‘Trustworthy AI’. In each case, the name used is meant to be a stand-in for a set of criteria that must be met for a given product to be worthy of the ‘Responsible AI’ or ‘Trustworthy AI’ stamp of approval. Crucially, this involves two interrelated but importantly distinct tasks. First, it involves determining what demands we should make of the technologies in question – what should an AI have to be able to do or be in order to qualify as meeting our ethical demands? Secondly, it means accounting for the demands that we should make of the wider socio-technical contexts in which the technology is conceived, developed, and used. This concerns demands placed on developers, purveyors, regulators, and users.

There is an intriguing convergence in the structure of both ‘Responsible AI’ and ‘Trustworthy AI’ policies: both encapsulate their requirements under catchy labels with the structure of [morally and agentially loaded notion + ‘AI’]. At first blush this may seem an innocuous enough choice. After all, it seems reasonable that we should want AI technologies to stand in some appropriate relationship to these morally significant notions – and we surely do not want unethical, irresponsible, or untrustworthy AI. There is, however, a danger here. The notions in question are traditionally understood to be most appropriately, if not exclusively, applicable to agents with certain capacities that AI technologies are usually thought not (yet) to possess. This argument has already been well made, at least in part, as regards trust and trustworthiness by Ryan (2020) in his response to the EU HLEG’s approach of ‘Trustworthy AI’. In that paper, he lays out in detail how most of the dominant theories of trust in the philosophical literature require that the trusted party possess some agentially rich motivational states to be the sort of thing that can be appropriately trusted, and that present AI technologies lack these. Another problem is that the term ‘trustworthy AI’ conflates several important meanings, of which Stix (2022: 4–5) identifies five:

  • trust in the proper functioning and safety of the technology;

  • the technology being worthy of the trust of the humans making use of it or encountering it otherwise;

  • humans making use of it or encountering it seeing the technology as trustworthy;

  • humans making use of it or encountering it experiencing the technology as trustworthy;

  • the technology being worthy of trust to all.

It is noteworthy that the first of these meanings has the additional complication of conflating the distinction between trustworthiness and reliability (see Sutrop, 2019 for a discussion of this ambiguity as applied to the HLEG’s Trustworthy AI policy).

This same argument – that AI technologies lack the necessary capacities for appropriate attribution – can be, and has been, extended to talk of ‘Responsible AI’ (for examples see Gunkel, 2020; Dignum, 2019; Constantinescu et al., 2022). This should come as no great surprise given the sheer quantity of contributions to the discussion of responsibility gaps resulting from the introduction of AI technologies (Matthias, 2004; Sparrow, 2007; Himmelreich, 2019a; Dignum, 2020), a concern that only gets off the ground if it is assumed that these technologies are inappropriate targets of responsibility. Stated more briefly: according to many currently dominant philosophical theories of responsibility and trustworthiness, there is no current or near-future AI technology that can be responsible or trustworthy. This is not, of course, to say that these technologies are irresponsible or untrustworthy, but rather that the application of these qualifiers is a category mistake. Although we will in what follows focus on ‘Trustworthy AI’ and ‘Responsible AI’, this is due to their being the most widely employed and influential versions of the [agentially loaded notion + ‘AI’] recipe, not because our objections are limited to them. Indeed, we contend that any use of this recipe is likely to result in worrisome outcomes and is better avoided. This would encompass combinations such as ‘Ethical AI’, ‘Moral AI’, ‘Caring AI’, etc., all notions that carry heavy moral and agential baggage. It would be a long task of whack-a-mole to list each potential combination and address its problems. Our identification and discussion of those applicable to the most common versions can be seen as at least pro tanto evidence that all of them should be avoided.

Our contention is that there are two crucial points of worry here: (1) though it may not exhaust the impact and value of these policies when fully unpacked, the application of these morally and agentially loaded notions to AI technologies remains inappropriate and (potentially harmfully) misleading; and (2) it introduces into the conversation a notion that can easily be misused or abused by those who, either wilfully or through ignorance, fail to appreciate the full scope of the policies in question. As the labels used for these policies should be chosen to best promote their overall aims and to serve as a form of public communication about the subject matter – and as those choosing them cannot assume that a label will be encountered only, or indeed primarily, by those who have carefully scrutinized the underlying document – these concerns cannot be waved aside as insignificant.

It could be retorted that these concerns are misplaced or pedantic. It could be noted that the policy guidelines encapsulated by these terms are richer and more nuanced than the simple idea that the AI itself should be responsible or trustworthy, and are more focused on the provision of criteria whereby this ‘level of certification’ for an AI technology can be reached. These criteria often involve demands of ethical conduct on the part of developers and purveyors, and outline measures to be taken by regulators in order to incentivise or enforce these demands - for an example, consider the contents of the HLEG’s seven requirements for ‘Trustworthy AI’ (2019: 14–19). These persons in the wider socio-technical context may be legitimate bearers of responsibility or trust, and so the original concern is just the result of taking a label too seriously. This retort is not frivolous, and it is certainly true that the people formulating these criteria are doing important work in helping us respond to the possible perils of AI technologies. However, our primary concern is with these policies as forms of public communication that shape our pervading social discourse. Many laypeople will never hear more about these policies than their labels, and those who select these labels are unlikely to be unaware of this. Selecting a label like “Trustworthy AI”, for example, is (at least partly) intended to help promote trust by the laity in those technologies that meet this standard. If the meaning of the term in a given policy actually reduces to or is conflated with other notions such as safety, privacy, sustainability, etc., then why use the label ‘trustworthy’ at all, when it can lead the laity – who are unlikely to know this – to extend these attitudes directly to AI systems? We recognise this concern as speculative, but not without sufficient plausibility to be taken seriously. The risk that such extension might occur has yet to be empirically studied, but given the well-established correlation between the association of anthropomorphic or agentic features with an AI and the likelihood of these extensions (Hancock et al., 2021; Kaplan et al., 2023; Omrani et al., 2022; Kawai et al., 2023; Messer et al., 2024), it seems to be a reasonable possibility. Furthermore, as shown in Waytz et al. (2010), perceiving AIs as humanlike can encourage considering them as moral agents, and their actions as the results of autonomous decision-making, thereby having a moral impact that they should not have. Finally, though there is not yet empirical research testing the impacts of the terms ‘trustworthy AI’ and ‘responsible AI’, it is worth considering the conclusions of Salles et al.’s (2020) analysis of the use of anthropomorphic language within AI research itself. Their discussion of the impact of this language use on the public concludes (2020: 93):

In the general public it inadvertently promotes misleading interpretations of and beliefs about what AI is and what its capacities are. As noted before, this represents a significant failure in scientific communication and engagement, and one that is not ethically minor. Rather than checking and managing laypeople’s anthropomorphic tendency it tends to support it. But to the extent that the tendency to anthropomorphize shapes how people behave toward the anthropomorphized entity, such anthropomorphism has ethical consequences...anthropomorphizing AIs may be the source both of overblown fears of AI (that they will make humans obsolete, for example) and of uncritical optimism (regarding the extent to which AIs could actually behave like humans and perform difficult tasks better than humans). Finally, such anthropomorphism might create ethical confusion by blurring moral and ontological boundaries.

All of this makes it at least plausible that the use of language of the [agentially loaded notion + ‘AI’] sort will only feed into these trends and concerns. And since, as we will argue, the cost of changing the language to avoid this is so low, it seems similarly reasonable to take steps to avoid the possibility rather than court it unnecessarily. At the least, it should place a burden on those seeking to use this language in the labelling and framing of public-facing communication to provide good reasons for why we should follow their lead. If we plausibly speculate that a potential harm might occur and the remedy to it is this low-cost, it seems eminently reasonable to adopt that remedy.

As a nearby example of our concern, consider the argument of Laux et al. (2023) that the notion of trustworthiness used in the EU AI Act when discussing ‘Trustworthy AI’ is problematically conflated with the notion of the acceptability of risk. Since “the Commission chose to understand ‘trustworthiness’ narrowly in terms of the ‘acceptability’ of AI’s risks, with the latter being primarily assessed through conformity assessments carried out by technology expertise” (Laux et al., 2023: 1), it leaves unanswered the question of whether this approach is likely to engender trust in the folk and, more pertinently for our concerns, it does not necessarily tell us anything about the technologies’ trustworthiness. All else being equal, we should then ask why a given policy chooses a label like ‘Trustworthy AI’ or ‘Responsible AI’ if its content actually reduces to other, non-agentially loaded notions, or to the attribution of these notions not to the AI but to something else. We will discuss what we argue to be superior approaches to policy framing that avoid these concerns in Sect. 5 and Sect. 6. Before turning to these, however, we will first provide further support for the argument that we have good reason to avoid using conceptual framing that is likely to induce some to attribute agentially loaded notions to AI technologies themselves, by examining the worries that such attributions can raise in the cases of ‘Trustworthy AI’ and ‘Responsible AI’.

4 Reasons to Worry About ‘Trustworthy AI’ and ‘Responsible AI’

The HLEG policy document clearly states that it takes the stance that AI technologies can themselves be trustworthy (2019: 4–5) and defends doing so by referring to the importance that trust has for social cohesion and acceptance. Although it is true that the vast majority of philosophical accounts of trust only permit agents to be trustworthy (see Nickel et al., 2010 and Ryan, 2020 for a discussion), the authors could have brought into play a few theories regarding trustworthiness that do permit the extension of this notion to non-agents. Some examples would be the rational trust approach of theorists such as Möllering (2006) or the more recent integrated agency approach of Nguyen (2020). A different approach in the same ballpark would be that of Viehoff (2023): to view the problem of non-agential trust as a problem of conceptual engineering, a problem whose resolution Viehoff holds should keep the door open to non-agential trust. However, we contend that none of these approaches would allow policymakers to escape the worries raised by permitting the attribution of trustworthiness to AI in particular. Before delving into this, it is first important to highlight that what we should presumably aim for is not that users trust AI, but that AI is trustworthy. The former is a question for psychology and, given the explosion of literature asking this precise question, it is clearly considered to be a question of great importance. The latter asks whether the AI technologies in question meet the criteria for an agent to justifiably place their trust in the technology. If there is to be talk of these technologies justifiably being described as trustworthy, then it must be in line with the minority theories of trust that do permit such an extension to non-agents. However, these very approaches have among their conditions on trustworthiness precisely sufficient, appropriate reliability (Tuomela & Hoffman, 2003: 168; Nguyen, 2020: 2). Since the focus of these policies should be the promotion of those features that would render an AI in fact trustworthy, there would be greater clarity in straightforwardly identifying reliable AI as the immediate aim. This would not only have the advantageous upshot of avoiding the danger of supporting an approach to trustworthiness that does not on the final tally hold up, but would also prevent those engaging with the policies from mistaking the sort of trust at work here for the agentially loaded sort rather than the more object-appropriate accounts.

‘Responsible AI’, as it is commonly employed in AI policies, is a multidimensional notion. Moreover, the identities of these dimensions can differ dramatically between different ‘Responsible AI’ policies, which should not be surprising given that there is “no single agreed upon end-to-end guide that covers its different facets” (Agarwal & Mishra, 2021: 3).Footnote 3 Or, as Dignum (2019: 93) puts it:

Responsible AI means different things to different people. The concept of Responsible AI also serves as an overall container for many diverse opinions and topics. Depending on the speaker and on the context, it can mean one of the following things:

  1. Policies concerning the governance of R&D activities and the deployment and use of AI in societal settings,

  2. The role of developers, at individual and collective level,

  3. Issues of inclusion, diversity and universal access, and,

  4. Predictions and reflections on the benefits and risks of AI.

Noticeably, in none of these meanings does the idea of AI systems themselves being responsible come into play. Our contention here is that this is an oversight, though an understandable one. For those working in AI development and governance, ‘Responsible AI’ is a term of art, even if it has not settled on a single meaning (and may never do so). But it is not at all clear that laypeople will understand the term in the same way. They are unlikely to be familiar with the contexts within which the meanings that Dignum lists occur, and their central – and possibly only – significant communicative engagement with the term is likely to be as a label for a public-facing policy. This opens the way for the sorts of potential misunderstandings mentioned by Waytz et al. (2010) and Salles et al. (2020), as well as additional worries that we will unpack in the succeeding sections.

Interestingly, despite the variety of meanings ‘Responsible AI’ can carry, when we consider the national-level policy guidelines of China (MOST, 2019) and India (NITI Aayog, 2021), both of which employ the ‘Responsible AI’ label, we find a high degree of overlap at the level of principles as illustrated in the Table below.

Governance Principles for a New Generation of AI (China) | Towards Responsible AI for All (India)
Harmony and friendliness | Principle of protection and reinforcement of positive human values
Fairness and Justice | Principle of Equality
Inclusivity and sharing | Principle of Inclusivity and Non-discrimination
Respect privacy | Principle of Privacy and Security
Secure/safe and controllable | Principle of Safety and Reliability
Shared responsibility | Principle of Accountability
Open collaboration | Principle of Transparency
Agile governance | –

Though various elements of the moral and governance terrain are sliced up differently between these two sets of principles – with transparency, for example, being a principle of its own for the Indian guidelines but incorporated into Secure/safe and controllable in the Chinese version – much the same ground is covered.

As with Dignum’s various meanings, neither set of principles is committed to the idea that AI technology can or should itself be treated as responsible. However, it is not the content of the guidelines or policies with which we necessarily take issue, but the choice of label itself – what is it about either of these sets of principles that makes the label ‘Responsible AI’ appropriate? Prima facie, such a label brings to mind questions such as: (1) whether production and development have been carried out responsibly, (2) how prospective responsibility for the various elements of development, use, and future deployment is divvied up, and (3) who bears retrospective responsibility for the outcomes of an AI.Footnote 4 Given our concern with misleading framing, it can be immediately noted that (1) does not actually involve any responsible AI at all – i.e. any AI that is itself a bearer of responsibility. (2) and (3) may open the door for such responsible AI, but this would be a matter of contingency, largely determined by whether or not the system in question is open to legitimate responsibility attributions. In the case of ‘Responsible AI’, there is even more opposition to the idea that an AI can be responsible than there is to the idea that an AI can be trusted (Matthias, 2004; Sparrow, 2007; Himmelreich, 2019a; Dignum, 2019, 2020; Gunkel, 2020; Coeckelbergh, 2020). Again, the reason for this is that being a bearer of responsibility usually requires an entity to be an agent, and indeed a moral agent. Continuing the parallels with trust and trustworthiness, we can attribute responsibility to non-moral agents, but this is viewed as an illegitimate attribution – an error. Though the consensus that AI technologies cannot be legitimately held responsible is currently broad by the standards of philosophical discourse, there are important voices that object. List (2021) contends that the same functionalist picture of agency that permits group agency should also permit similar artificial agency, though he notes that this does not necessarily mean extending moral significance to AI systems. Though this is undoubtedly an interesting perspective, it should be noted that even List admits that present AI systems do not meet all the necessary conditions for moral agency (2021: 1229), though it is not a conceptual impossibility that in the future they might. We are in fact in agreement with List on this: we are not committed to the impossibility that some AI technologies at some point might be legitimate bearers of responsibility, but insofar as current AI policies are meant to help us deal with current AI technologies, this is a concern for a possible future, not grounds for contemporary policy framing. As things stand, there is little reason to think that the attribution of responsibility to an AI could be anything other than inappropriate according to our best philosophical theories. The question we seem to be left with is why persist with the framing of ‘Responsible AI’ when it does not necessarily involve responsible AI, and when it stands to induce those who do not sufficiently scrutinize its framing to make inappropriate attributions?

While the danger of inappropriate use is on the face of it something to be avoided unless there is good reason militating in the other direction, we offer several additional arguments for why we have moral reason to avoid promoting such misunderstandings:

  1. It can obscure the true culprits of betrayal in cases of trust violations.

  2. It can create problematic incentives for involved corporations.

  3. It can result in the experience of betrayal harms that would otherwise be avoided.

  4. It can engender inappropriate vulnerability.

In the following subsections these reasons are unpacked further.

4.1 ‘Trustworthy AI’ and ‘Responsible AI’ as an Obscurement

This concern, explicitly raised by Ryan (2020: 2), is probably the most significant and immediately worrisome peril of employing the [agentially loaded notion + ‘AI’] formula in the case of trustworthiness. It is a near-universal feature of agentially loaded accounts of trust that when trust is violated, there is an experience of betrayal on the part of the truster, and this feeling of betrayal is aimed at the trusted (Baier, 1986; O’Neill, 2002; Tuomela & Hoffman, 2003). Without intending to muddy the water between ‘Responsible AI’ and ‘Trustworthy AI’, it is also important to note that these feelings of betrayal are often motivating parts of our responsibility practices, with resentment from such betrayals being a paradigmatic example of the Strawsonian reactive attitudes characteristic of moral blame (for the seminal work concerning these attitudes see Strawson, 1962). Even in cases where a betrayal might not rise to the standard of full moral blame, it usually results in important restructuring of an agent’s relationship to the trusted (Hardin, 2002). Even Nguyen, who is among the friendliest to approaches that permit non-agents to be trustworthy, makes an effort to position feelings of betrayal as a justified and characteristic response to violations of trust - be this by other agents, our own body parts, or objects sufficiently integrated into our agency.

The danger is that if an AI technology is taken by a user to be the appropriate target of trust or responsibility, then they will be similarly likely to make this technology a target of their feelings of betrayal and fail to instead target the morally relevant trust violators standing behind the technology, so to speak. The first part of this concern is similar to that present in techno-responsibility gaps (Matthias, 2004; Sparrow, 2007; Hellström, 2013; Danaher, 2016; Himmelreich, 2019a), where the introduction of the technology threatens to result in inappropriate responsibility attributions. To explicate the analogy: the introduction of an AI technology into a causal chain might result in a breakdown of our ability to properly attribute responsibility for harmful outcomes. This breakdown is usually termed a responsibility gap, or more precisely a techno-responsibility gap (Tigard, 2021). Such a gap occurs when the legitimate attribution of moral responsibility seems to be called for on account of some outcome, but there appears to be no fitting target for such attribution. Gaps like this can arguably arise from many contexts, most with nothing directly to do with AI technologies. But the techno-responsibility gap is a special subset: a case where the involvement of a technology as an intervener in the causal chain of action is a reason for the occurrence of a gap. Techno-responsibility gaps arise when all the following hold:

  1. There is some outcome, O, which is morally unacceptable.

  2. O must be the result of the activity of some technology.

  3. There is a demand for responsibility, D, on account of O.

  4. D is a legitimate demand.

  5. The technology is an illegitimate target for responsibility.

  6. There is no other legitimate target for responsibility demand D.

  C. There is no legitimate target for responsibility demand D.
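Read schematically (this is our gloss rather than a formalism drawn from the responsibility-gap literature), conditions (1)–(4) establish that D is a genuine and legitimate demand, and conditions (5) and (6) then jointly yield the conclusion (C). Writing $\mathrm{Legit}(x, D)$ for “x is a legitimate target of responsibility demand D”:

$$\neg\,\mathrm{Legit}(\mathrm{tech}, D)\;\wedge\;\forall x \neq \mathrm{tech}:\ \neg\,\mathrm{Legit}(x, D)\;\;\Rightarrow\;\;\neg\,\exists x:\ \mathrm{Legit}(x, D)$$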

We have moral reasons to avoid events such as this as they represent instances of injustice for the victims of harms, whose legitimate demands go unanswered. There are two features of AI technologies that make them uniquely prone to opening such gaps: (i) they operate with some (presently minimal) degree of autonomy – great enough to problematise responsibility tracking, but insufficient for legitimate openness to responsibility (Nyholm, 2018); and (ii) they are often epistemically opaque, which means that developers and users cannot predict the outcomes they will give rise to with the degree of certainty needed for these humans to bear the responsibility for those outcomes.

So, if an AI is (erroneously) deemed to be the responsible entity for some morally problematic outcome, then it seems there is a responsibility gap if we assume (as is generally agreed) that the technology lacks the capacities to be legitimately responsible. So too in the trust case, there is a danger of seeing these technologies as being capable of betrayal, when they are not the sorts of things that can legitimately betray you. We may, in cases such as these, have good reason to speak of “betrayal gaps” in a sense, but it should be noted that this is not entirely analogous to the case with responsibility: our responsibility practices are directed toward others in ways that often demand responses from the targets of responsibility, and thus when there is a demand for responsibility but no appropriate target our practice is frustrated. Our feelings of betrayal, on the other hand, do not necessarily have this feature, and where they do it is potentially difficult to divorce such betrayal demands from responsibility demands – when we demand a response on the basis of betrayal, this is usually a demand for responsibility. Despite this difference, it is worth noting that the overall picture that emerges still reinforces our central argument that the application of either agentially loaded notion to AI raises concerns.Footnote 5

The worry in the case of both trustworthiness and responsibility is that placing these inappropriate targets in the way will screen off those who should be the proper targets of the attitudes in question. This would be to the perverse benefit of the developers or purveyors who may have failed in their moral duties toward the user but whose failings and responsibility go unrecognised. If used in this way, the guidelines that facilitate this would be a form of ethics washing (McMillan & Brown, 2019; Heilinger, 2022). This possibility is made particularly likely by the susceptibility of AI technology to anthropomorphisation. The tendency to anthropomorphise is widespread among humans (Epley et al., 2007). We see faces in light sockets and sometimes say things like, “Damn it! Why did you have to break now?” when our computers fail us. In most cases this is entirely unproblematic. In some cases, however, it can result in problems, when anthropomorphisation leads us to attribute inappropriate agential capacities or states to non-agents (Heilinger, 2022: 14–15). AI technologies (particularly brain-inspired varieties) are uniquely capable of exacerbating this tendency and inducing such misattributions of agential capacities to these technologies (Salles et al., 2020).

It may well be the case that different AI technologies, particularly those differing in the degree and type of their anthropomorphisms, are more or less likely to incite feelings of betrayal or blame (the negative dimension of responsibility). As usual, the assumption would be that the greater the anthropomorphism, the greater the likelihood that feelings of betrayal are incited, or blame is attributed. Is this merely a matter of degree? In the case of responsibility there is considerable literature on the idea that responsibility comes in different types (Shoemaker, 2015; Tigard, 2021), but these variations are not usually indexed to features clearly related to anthropomorphism. On one influential taxonomy, responsibility can be divided between answerability, accountability, and attributability, and what differentiates these is the type of failing that legitimates them (poor judgement, insufficient regard, and bad character, respectively). We think it unlikely that these would map meaningfully onto the differences in AI technology. Regarding betrayal, might we talk about different sorts of betrayal relevant in these different cases? This is an option hinted at by Nguyen (2020), who notes that the feelings incited by failures by trusted non-agents might be betrayal-like, enough like betrayal to still capture this apparent explanandum, but not identical to the feelings incited by agential betrayal. As is very often the case when trying to parse attitudes of this sort, it is not clear where the joints are, and differentiating what constitutes a qualitative as opposed to a merely quantitative change can be elusive. For our purposes, we grant that there are at least differences in degree between betrayals – even amongst betrayals by agents this would be the case – and that the level of anthropomorphism of a technology is likely to impact this degree. We are agnostic on whether we should speak of different sorts of betrayal here, but note that the issue of obscurement discussed above, and of unnecessary betrayal discussed in Sect. 4.3, would apply to any sort of betrayal that shares the following features: (i) it is targeted towards an entity that is taken to be its bearer on account of some failure of trust, and (ii) it is a feeling that negatively impacts the betrayed agent’s well-being (i.e. it is a form of emotional harm).

4.2 Problematic Incentives

Many people will hear the notions of ‘Trustworthy AI’ and ‘Responsible AI’ and associate them with a certain idea of moral security when considering AI technologies. They will not have any detailed knowledge about what the policy document this label is intended to encapsulate actually says, but both the name itself and the purpose behind it will make them put aside some ethical doubts they may otherwise have held about AI technology. At least this is the outcome that the authors of the policy document are presumably hoping for, as such a document is always, among other things, intended as a form of public communication - in this case a way of conveying that the developer, purveyor, and implementer of the policy takes the ethical concerns of AI seriously. However, there is a real threat that this setup may incentivise the corporations directing so much AI development, and the nations looking to protect and improve their geopolitical positions, to focus their attention not on being trustworthy or responsible about the AI they are developing, but rather on what can be done to make users trust their products or view them as responsible. There is already a substantial range of studies investigating how best to promote user trust in various AI technologies, and corporations have significant incentive to heed any findings that result. The motivation for much of this work is often framed as a response to the (perceived to be unreasonable) reluctance among the folk to trust AI technology. A common (though by no means universal) trend in the findings of these studies has been that enhancing or adding features that induce users to anthropomorphise the technology often correlates with increased attributions of trust (Glikson & Woolley, 2020). This result should not be surprising if we assume that trustworthiness is usually something that the folk attribute to agents - make the technology more agent-like, and the attribution of trust increases. In light of what we have already discussed, the danger here is hopefully obvious: these corporations have an incentive to develop their technologies in such a way as to induce the user to anthropomorphise them. This will contribute to the inappropriate attribution of agential states to these technologies, feeding the other worries listed here.

The problematic incentives are even more worrying in the case of ‘Responsible AI’, as the danger compounds with the issue of obscurement we have already discussed. If individuals are induced by the framing to attribute responsibility to AI technologies and so fail to adequately hold responsible the true culprits behind the scenes, then these culprits have reduced reason to be attentive to the possibility of future failures, to rectify the existing failures, and to make reparations to those already harmed. This is an incentive to cut corners and be reckless, callous, and apathetic. It is also an excellent example of the self-defeating nature of the label ‘Responsible AI’. A further concern, identified by Tuvo et al. (2022), is the possibility that anthropomorphising an AI can lead to a diffusion of responsibility in Human-Robot Interaction. The conclusion they draw from their experimental work is that when a human undertakes a joint task with a robot, the level of anthropomorphism attributed to the robot by the human participant is negatively correlated with the participant’s perception of their own responsibility for the negative consequences of their actions. If we enter a world where robots (and possibly other technologies) bearing a ‘Responsible AI’ label work alongside human workers, this has the potential to serve as an inducement to further anthropomorphise the technology and, if Tuvo et al. are correct, to worryingly diffuse perceptions of responsibility. As we have already discussed, ‘Responsible AI’ policies – when sufficiently scrutinised – contain requirements for proper responsibility schemas intended to distribute responsibility aptly in cases of harm, or when considering future developments that carry risks of harm. If this is a true goal for those employing the label, then the very possibility that it could lead to the sort of problematic incentives noted here should be a reason to abandon this framing. Why take the risk of a breakdown in correct responsibility distribution for no benefit, when a more accurate label would remove the concern?

4.3 Unnecessary Betrayals

The experience of betrayal is, among other things, a form of emotional harm (Rachman, 2010) and, almost certainly relatedly, humans demonstrate strong betrayal-aversion (Bohnet & Zeckhauser, 2004). We thus have a pro tanto moral reason to avoid bringing it about, if possible, as it is both a direct form of disutility and is also actively undesired by almost all humans. This is most obviously true in our direct relations with others, where violating another’s trust (i.e., betraying them) is only morally permissible if there are strong countervailing moral reasons for doing so. But it also includes more unusual situations: I should not convince my friend to put their trust in somebody that I can foresee will, with sufficient likelihood, betray that trust. It is also important to note that the emotional harm of experienced betrayal is no less harmful whether or not the original trust was misplaced, though it might be dispelled if the truster can be convinced that what they trusted could never be the sort of thing that can betray someone. Indeed, in cases where the betrayal response is aimed at something that could never be trustworthy, any emotional harm that results is doubly tragic for being wholly unnecessary. There is evidence that this sort of interaction occurs in human dealings with computers, where some users will feel betrayed by certain failures (Ferdig & Mishra, 2004). However, we concur with Vanneste and Puranam (2022: 26) when they point out that it seems at least plausible to surmise that the feeling of betrayal would be more pronounced or more frequent when dealing with more agentic technologies such as many AI systems, but that ultimately this is a question for empirical investigation. We will hold the assumption that this is true for the rest of this work.

To adapt this to the present context, we can sketch out two possible ways that taking an AI as the source of betrayal could lead to unnecessary betrayal harms: (i) sometimes a technology’s failure is just an accident. In this case, no relevant person in the socio-technical context has been anything less than trustworthy. Seeing an AI technology as trustworthy can induce a user to feel betrayed in a case of failure such as this, where if they had not formed this mistaken expectation of the technology no such feeling would have arisen. As a result, unnecessary harm is experienced. (ii) even when there is a betrayal in the wider context and the user appropriately targets it, it may be that the user feels an additional betrayal by the device itself due to the inducement to view it as trustworthy. This again is unnecessary harm. A conclusion we can draw from this is that we have a pro tanto moral reason not to take actions that we foresee might, with sufficient likelihood, result in individuals placing misplaced trust in AI technologies that can then fail this trust and result in betrayal harms.

4.4 Inappropriate Vulnerability

Similar and related to betrayal responses, another common component of the prominent theories of trust is that entering a trust relationship at least sometimes involves the truster making themselves vulnerable to the trusted. This vulnerability, as it is usually unpacked, is more than the mere vulnerability to the failure of delivery of an outcome but follows from the need to put one’s faith in the trusted. In light of this, Kerasidou et al. (2021: 3) make the following argument in the context of AI applications in healthcare:

In trust relationships, the trustor can become vulnerable to the trustee, and dependent on their goodwill…By asking the public to trust AI, and as such the tech companies driving this innovation, what is asked of them is to accept the risk that these companies are free to decide whether they will confirm or betray public trust. But how could the public reasonably take such a position, when they feel that they don’t yet have reasons to trust? It seems inappropriate to ask the public to accept that position of vulnerability. In this light, trust seems to be an inappropriate, if not a dangerous, basis on which to base our relationship to AI.

Crucial here is that the vulnerability is not only to things going astray, but to the emotional harm of the betrayal itself (Vanneste & Puranam, 2022: 26). Though certain sorts of vulnerability are at times unavoidable, there are other sorts of relationships that we could envisage between users and AI technologies that would at least not open them to this particular variety of vulnerability. The immediate example would be reliance - which is also the preferred alternative of Ryan (2020: 17) and Kerasidou et al. (2021: 4). To rely on a technology might involve some openness to vulnerability due to failure, as any assessment of reliability might be inaccurate or the given reliability might be very low, but it does not entail the emotional vulnerability to betrayal so characteristic of trusting relations. However, it should be noted that vulnerability’s special relationship to trust is disputed between different accounts of trust - some argue that this openness to a special vulnerability of betrayal is constitutive or essential for trustworthiness (Baier, 1986; Hall et al., 2001) while others see this vulnerability as only a possibility that may sometimes arise in trusting situations (Nickel, 2007; Hinchman, 2017). Even so, if it is granted that these situations of special vulnerability only arise from time to time, it still seems troubling to ask people to accept a relationship with a technology that opens this possibility even sometimes, when no such vulnerability need be present at all.

5 What We Need AI Technologies to Be: Reliable, Safe, Explainable or Interpretable, and Sustainable

It could be said that the concerns discussed in Sect. 4 are only directly pertinent to trustworthiness and responsibility. It is possible that some other agentially loaded notions would not be open to similar concerns, though of course by the same token they could open new ones. It is not impossible that this is true, but we have severe reservations. It will very likely be the case that the close association of the agentially loaded notion with the AI technology itself will induce misplaced attributions, something exacerbated by our existing tendency toward anthropomorphizing AI. Although this is most pertinent for ‘Trustworthy AI’ and ‘Responsible AI’ as undoubtedly the biggest games in town, the same is also true for other popular notions used in this role, like ‘Ethical AI’ or ‘Moral AI’ (for examples see Theodorou & Dignum, 2020; Gibert & Martin, 2022). This should not be surprising: the norms around avoiding the misattribution of these notions to non-agents have a functional role, and flouting them by inducing such misattributions to AI technologies was always likely to cause problems. Though we do not rule out the possibility that in the future there may be AI systems that justify the application of some or all of these notions, given the present and near-future technological realities we would urge AI guidelines and regulations to avoid contributing to AI anthropomorphism or the possibility of attributing agential features to AI.

Indeed, if our arguments have merit then it would be best if such guidelines explicitly discouraged measures in the design or use of AI that would themselves contribute to misattributions, or alternatively included clear mechanisms to mitigate the possible risks. For example, it may be the case that designing carebots with certain anthropomorphic features brings a great net benefit in well-being amongst those cared for, and so we would have a good reason to want to maintain these features. But to mitigate the possible harms of misattribution of, for example, trust and responsibility, the technology should be accompanied by a clear and transparent schema for the distribution of both – one where human individuals or groups are the bearers.

This is not to say that there are not important and reasonable demands that we can make of AI technologies. If we disregard the attribution of agentially loaded notions, and with some individual differences, the various AI policies in play have converged on a similar set of significant requirements regarding the qualities of AI technologies. They should be safe, they should be explainable or interpretable, they should be reliable, and they should be sustainable. These all have the advantage of not being agentially loaded, and so do not bring the baggage or the dangers that we have been discussing thus far. Just as important, taken together they get us all that we can reasonably demand of the AI technologies themselves. What is not necessary is for AI to be saddled with agentially loaded qualifiers, which serve only to potentially mislead and even obscure malicious harm. Thus, the development, regulation, and application of these technologies should meet these richer moral standards, and those undertaking such development, purveyance, regulation, and use should indeed be responsible, trustworthy, ethical, and moral. But AI technology itself is not, and cannot (yet) be, any of those things.

Recall the ethical concerns that originally sparked the urgent need for AI policies: unfairness and morally problematic bias, privacy violations, techno-responsibility gaps, and sustainability. Insofar as notions legitimately applicable to AI technology itself can combat these worries, the ones we should aim for are indeed those that have already been identified by the experts: reliability, safety, explainability or interpretability, and sustainability. Without meaning to claim the final word on any of these understandings - each of which undoubtedly deserves its own treatment - it is fair to say that the legitimate applicability of none of these notions is exclusively or even paradigmatically limited to agents.

To work in reverse order: by sustainable AI is meant, in line with the World Commission on Environment and Development’s definition of sustainability as “meeting the needs of the present without compromising the ability of future generations to meet their own needs” (WCED, 1987: 27), an AI technology that either allows for the meeting of the needs of the present without compromising the ability of future generations to meet their own needs, or helps to promote this outcome. Vitally, this is not to say that the development and manufacturing process that results in the creation of the technology must not itself aim to be sustainable – it surely must. But the technology itself, as an object in the world, should be designed to be as sustainable as possible in the course of its use. Explainability and interpretability are hot topics in the discourse surrounding AI technologies and are often seen as important ingredients in meeting the challenges of unfairness and morally problematic bias (Stevens et al., 2020; Zhou et al., 2022) and techno-responsibility gaps (Baum et al., 2022). Both are ways to combat the epistemic opacity of AI technologies, though the usage of the terms is confused across the literature: at times they are employed interchangeably, while at others they designate two distinct approaches. We follow the second usage here, taking explainability and interpretability to be two different ways to combat the epistemic opacity of AI technologies. Explainability aims to explain how the system arrived at its outcome according to its own inner workings - which features of the stimulus were deemed relevant, which nodes in the system responded, and what those nodes represent. This is often done by introducing a new meta-algorithm designed to explain the original model (for an example of such, see Begley et al., 2020). Interpretability requires that the technology be constrained in such a way as to allow the user to understand the causal relationship (according to domain-specific structural knowledge) between the stimulus and the outcome of the model in question, where how this is to be done is domain-sensitive (Rudin, 2019). Safety is something that we expect from all products, with contextual variation. To require an AI technology to be safe is not to say that it should be innocuous, any more than a safe train or safe drain cleaner would be, but rather to require that it minimizes unexpected risks and harms, where the expectation in question is bounded by the intended purpose of the technology and the facts about its use context. A safe gun, for example, is a killing tool that is designed to avoid misfires or other events or states that could cause risk or harm other than that which fulfils its intended function.
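To make the explainability/interpretability contrast concrete, the following is a minimal, purely illustrative sketch of the two strategies as they might look in code. It is not drawn from the policies or papers discussed above; the use of scikit-learn, the example dataset, and all parameter choices are our own assumptions.

```python
# Illustrative sketch only: post-hoc explainability vs. built-in interpretability.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Explainability: an opaque model is paired with a separate, after-the-fact
# procedure (here, permutation importance) that estimates which input features
# its outputs depended on -- an explanation added on top of the black box.
black_box = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
result = permutation_importance(black_box, X_test, y_test, n_repeats=10, random_state=0)
top_features = sorted(zip(X.columns, result.importances_mean), key=lambda t: -t[1])[:5]
print("Post-hoc feature relevance:", top_features)

# Interpretability: the model itself is constrained (a shallow decision tree)
# so that the path from stimulus to outcome can be read off directly.
glass_box = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
print(export_text(glass_box, feature_names=list(X.columns)))
```

The contrast tracks the distinction drawn in the text: the first block leaves the model untouched and explains it from the outside, while the second builds the constraint into the model itself.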

Finally, reliability (which we take to be interchangeable in this context with robustness) is the rational predictability and rate of success with which the technology brings about certain states or outcomes. Higher reliability minimizes unexpected states and outcomes. This is always determined at a particular level of description, as how we pick out the states and outcomes we are interested in will vary from agent to agent. To illustrate, when a person is engaging with ChatGPT, if the sorts of outcomes they expect are best described as “responses that resemble natural language”, then they are quite likely to find the system to be reliable (i.e. its responses will be of this sort). On the other hand, if they are a developer with an intimate knowledge of the system and they are engaging with it in order to see if it generates certain particular word combinations that the developer has reason to think it would given the training data, the assessment of reliability could be quite different. Deciding on the appropriate level of description at which reliability should be demanded is an important part of the ethical assessment of any AI system. This can be a challenging task, especially when dealing with AI systems the purpose of which is to generate unexpected outcomes. Of course, we would rarely want a system like this to produce any possible outcome, so this is almost always a matter of bounded unexpectedness. In such a case, we can speak of its reliability at arriving at outcomes within these bounds. But even in cases where the outcome is quite unbounded from expectation, reliability would still be important due to its interrelations with the other notions at work here; for example, for a technology to be counted as safe for the purposes that matter here, it should be reliably safe, and so forth. Even if a system is meant to produce unexpected outcomes, we would still want – indeed should probably demand – that these be safe outcomes, which immediately introduces bounds on how unexpected a scope of outcomes we should permit. Technology that is reliably safe, explainable or interpretable, and sustainable helps to combat (though not resolve) all the ethical concerns that we started with. And since these ingredients are present in some form and to greater and lesser extents in almost all of the national policies we have considered, the cost of ditching the responsible and trustworthy AI labelling in favour of any one or preferably a combination of these is fortunately very low.
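Returning to the level-of-description point about reliability made above, the following toy illustration (our own, with invented sample outputs rather than real model data) shows how the same set of outputs can yield different reliability figures depending on which description of success is applied.

```python
# Toy illustration: "reliability" as a rate of success relative to a chosen
# level of description of the desired outcome. All data here is invented.
outputs = [
    {"text": "The capital of France is Paris.", "well_formed": True,  "exact_expected": True},
    {"text": "Paris is France's capital city!", "well_formed": True,  "exact_expected": False},
    {"text": "The capital of France is Lyon.",  "well_formed": True,  "exact_expected": False},
    {"text": "capital France of the is",        "well_formed": False, "exact_expected": False},
]

def reliability(samples, success):
    """Rate at which outcomes satisfy a given description of success."""
    return sum(1 for s in samples if success(s)) / len(samples)

# Coarse description: "responses that resemble natural language".
print(reliability(outputs, lambda s: s["well_formed"]))     # 0.75
# Finer description: the particular output a developer was probing for.
print(reliability(outputs, lambda s: s["exact_expected"]))  # 0.25
```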

6 Being X about AI

None of the above is intended to mean that responsibility or trustworthiness are irrelevant when we consider the ethical challenges posed by AI technologies. But they should be applied to the correct targets, which are not the technologies themselves but rather other elements of the wider socio-technical contexts in which these technologies are embedded (Dignum, 2019: 53–54; Selbst et al., 2019: 63–64). With this in mind, we would urge that rather than using the formula of [agentially loaded notion + ‘AI’], it would be better to employ the formulation [agentially loaded notion about AI]. This linguistically easy shift both avoids the various risks discussed above and focuses attention on the correct targets of these agentially loaded notions. We want, and are justified in demanding, that the developers and purveyors of AI technology be responsible and trustworthy, as well as ethical, just, and so on for a variety of similar notions. When we are focussed on these individuals meeting such demands regarding AI technology, what we are demanding is that they be “X” about AI, where X is the agentially loaded notion.

It could be objected that a problem remains: the elements of the socio-technical context to which we might attribute X include institutions such as corporations, and these are often taken to be acceptable targets of such attributions. This may open a dilemma, the objector could contend: either we must give up the extension of X to institutions, if we demand that it be given up for AI technologies, or we should be prepared to extend X to the technologies as well. Our response is to remind the objector that X is a stand-in for agentially loaded notions, notions that we paradigmatically ascribe to agents, and to point out that there is a rich literature defending collective agency (French, 1984; List & Pettit, 2011; Hess, 2014; Tollefsen, 2015; Björnsson & Hess, 2016; Himmelreich, 2019b).Footnote 6 Institutions such as corporations are often given as paradigm examples of such collective agents. By attributing X to institutions and persons but not to AI technologies, we can still retain the meaningful divide between ascription to agents and ascription to non-agents.

To make this suggestion more palatable, we would also point out that the extant AI policies already adopt this approach in much of their content, though crucially not in their framing and labelling. They stress the need for developers and purveyors to meet demands of responsibility and trustworthiness, for example, and place limitations on the sorts of contexts into which it is ethically acceptable to introduce certain AI technologies. All of this is being responsible or being trustworthy about AI. The only point of tension is the extension of these notions to the technologies themselves, together with the use of encapsulating labels that give this problematic attribution a highlighted and prominent position, where it stands to induce misattributions and, as we have shown, brings deeply morally worrisome risks. Given that there is no obvious cost to making this change, and serious moral benefits to doing so, we strongly urge policymakers to take this suggestion seriously.

To end this section on a positive note, an indication that the concerns we have developed here might be being taken seriously, at least in some circles, can be found in the labelling of President Biden’s 2023 Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence. Though we may quibble about precise content, both safety and security are non-agentially loaded notions that we are glad to see given an important role in the framing. Similarly, we are fully supportive of the phrasing “Trustworthy Development and Use”, as in this case the agentially loaded notion is aimed at the relevant socio-technical contexts and not the technologies themselves. Labelling of this sort addresses our concerns, and we hope that it is indicative of a wider and sustained trend going forward, rather than a one-off in a landscape of [agentially loaded notion + ‘AI’] labelled policies. However, and this is an unfortunate irony, whereas a number of the other policies have a troublesome label but content that avoids the [agentially loaded notion + ‘AI’] combination, President Biden’s Executive Order does the opposite: its label evades our concerns, but its content contains four uses of the phrase “trustworthy AI systems [our emphasis]” to refer to what is aimed for, and one reference to the previous Executive Order 13960, Promoting the Use of Trustworthy Artificial Intelligence in the Federal Government. Given that how to achieve this “trustworthy AI” is cashed out throughout the Order in terms of safety, privacy, and, to a lesser degree, the ensuring of fair outcomes, it seems strikingly unnecessary to use this language at all. It is also somewhat curious and disharmonious that, while the title explicitly speaks of trustworthy development and use, the content at times speaks of the development of trustworthy AI, without this reversal being explained. However, since the sort of person likely to read and engage with the details of the Order is not the audience about which we are primarily concerned in this work, we still take this to be a major improvement.

7 Concluding Remarks

The need for effective AI policies is surely undeniable. If anything, as with much new technology, we are late to the party in this regard. But how we label and communicate these policies is also important. There is good reason for us to avoid the recipe, currently in vogue, of branding these policies with the formula of [agentially loaded notion + ‘AI’]. As long as AI systems remain non-agential, we contend there is a reasonable worry that branding of this sort may induce misattributions, as well as bring other dangers, though time and future research will tell how accurate this concern turns out to be. Fortunately, the possible remedies are low-cost, and so, though this concern is currently speculative, there is little to lose in trying to get ahead of it. The first remedy could be to employ labelling and communication that refers to ‘reliable AI’, ‘safe AI’, ‘explainable or interpretable AI’, and ‘sustainable AI’: notions that are not agentially loaded. Alternatively, trustworthiness or responsibility as regards AI could be better communicated through labelling and framing that more clearly targets other elements of the wider socio-technical context in which these technologies are embedded. These would include developers, users, and regulators, as well as, perhaps, institutions, of which we can make these richer, more agential demands. Better that we focus on being X about AI, rather than on X AI.