1 AI anthropomorphism or trustworthiness anthropocentrism?

When a philosophically loaded concept, like trustworthiness, gets adopted into an emerging domain of practical ethics, such as the debates surrounding the design and governance of AI, there are different kinds of questions philosophers might ask.

Some question whether it makes sense to ascribe trustworthiness to AI systems. For instance, Ryan (2020) draws on extant accounts of trust to argue that trustworthiness requires the trustee to have the capacity either to feel goodwill towards the trustor or to be held appropriately responsible for their actions. Since AI systems lack these capacities, they should not be described as trustworthy. Doing so, argues Ryan, would either inappropriately anthropomorphise AI systems (since it misleadingly suggests they have capacities they in fact lack) or deflate the concept of trustworthiness to the extent that it undermines the value of interpersonal trust.

Simion and Kelp (2023) frame their paper differently, asking instead what it means for AI to be trustworthy. Thus, they presuppose that it does make sense to call AI systems trustworthy, aiming instead to develop an account of trustworthiness capable of making sense of such ascriptions. They are of course well aware that the classical accounts of trustworthiness would entail ascribing contentious anthropomorphic capacities to AI systems. However, they regard this as a problem for the classical accounts, rather than a reason to reject the emerging practice of ascribing trustworthiness to AI systems. In their diagnosis, the problem is that the classical accounts are too anthropocentric: being based on paradigm cases involving interpersonal trust, it should not surprise us that they generalise poorly to non-human cases. Consequently, they propose a context-sensitive account of trustworthiness, designed to accommodate both artefacts and persons.

I tend to agree with Simion and Kelp on this point. Descriptions of non-human entities as trustworthy are not novel or unique to AI governance and design. For instance, the entry for ‘trustworthy’ on the Cambridge Dictionary webpage includes the following examples [Footnote 1]:

  • ‘Consumers believe in the basic reliability and trustworthiness of our products’.

  • ‘Robust regression must be used in order to compute trustworthy solutions’.

  • ‘The field will not progress until researcher deficiencies no longer interfere with the ability to provide solid and trustworthy data’.

  • ‘Fanatics have long sought to chase down the more obscure parts of the list, and it is not entirely trustworthy for this purpose’.

Those who enjoy fantasy novels will also be familiar with the more archaic form ‘trusty’ being applied to swords and steeds. Trustworthy AI seems a natural continuation of these existing uses, none of which involve inappropriate anthropomorphism or threaten the value of interpersonal trust.

Now, critics like Ryan might object that the above examples do involve anthropomorphism but that it is an innocent kind, since everyone knows not to take such trustworthiness ascriptions literally. There is no danger that a longsword will be assumed capable of feeling goodwill or that data will be considered morally responsible. What makes trustworthy AI particularly problematic is that AI systems are designed to carry out (parts of) tasks that would otherwise be handled by humans, such as diagnosing tumours or estimating creditworthiness. This (the objection goes) makes it especially likely that inappropriate anthropomorphic assumptions will be inferred from trustworthiness ascriptions.

I do not think this objection succeeds. The risk of misunderstanding is a reason to develop more robust theories of trustworthiness, rather than to arbitrarily exclude AI from their domain of application. Nonetheless, I argue in this commentary that this observation—that AI systems differ from other artefacts, in that they are designed to accomplish characteristically human tasks—still holds some important lessons. Specifically, I argue that it can help address some otherwise pressing counterexamples to Simion and Kelp’s account. Thus, we should allow a modest degree of anthropocentrism in our account of trustworthy AI.

Here is the plan: Section 2 summarises the relevant features of Simion and Kelp’s account, Section 3 introduces and discusses the counterexamples, and Section 4 outlines my positive proposal. I conclude in Section 5.

2 Simion and Kelp’s account

Simion and Kelp construe trustworthiness as a disposition to fulfil one’s contextually relevant obligations. The idea is that the conversational context within which a given trustworthiness ascription is made picks out a set (or sets) of actions that are salient to the trustee’s obligations in that context. The trustee counts as trustworthy in that context if and only if the strength of the trustee’s dispositions to execute those actions surpasses a threshold, also set by the trustee’s contextually relevant obligations. Since the obligations that apply to different kinds of entities differ, this account can accommodate artefacts and persons alike, without either implying that artefacts have implausibly anthropomorphic capacities or lowering the standards for interpersonal trust. Thus, while persons are typically obligated to take responsibility for the consequences of their actions, artefacts are usually only obligated to reliably carry out their proper functions.
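Schematically, and on my reading (the notation is mine, not Simion and Kelp’s own), the account can be put as follows, letting $A_c$ be the set of contextually salient actions and $\theta_c$ the contextually set threshold:

$$\mathrm{Trustworthy}(t, c) \iff \forall a \in A_c:\ \mathrm{disp}(t, a) \geq \theta_c,$$

where $\mathrm{disp}(t, a)$ denotes the strength of the trustee $t$’s disposition to perform action $a$ in context $c$.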

For this account to be practically applicable, we need to be able to answer two kinds of questions. First, how do we determine the contextually relevant obligations of an AI system? Second, how do we infer the relevant set(s) of actions and threshold, given those obligations? While much can be said about either, I will focus on Simion and Kelp’s answer to the former.

Their answer is that artefacts acquire obligations based on their functions. Given some specification of what the functions of an artefact are, it is considered ‘properly functioning’ when, under normal circumstances, it fulfils those functions reliably enough, whereas it is considered malfunctioning, and thus in breach of the norms governing that type of artefact, when it fails to function with sufficient reliability.

Simion and Kelp distinguish two ways that artefacts acquire functions. The first is based on the designers’ intentions (d-functions): an artefact d-functions properly when it is able to do what its designers intended it to do, reliably enough under normal circumstances. However, artefacts can also acquire functions etiologically (e-functions), based on the history of that type of artefact. An artefact acquires an e-function when past tokens of that artefact type had a certain ability, and this fact contributes to explaining the continuous existence of artefacts of that type. To use their example, a knife has the e-function to cut because the ability of past knives to cut explains the fact that knives continue to be manufactured.

The distinction between d-functions and e-functions allows Simion and Kelp to account for cases where d-functions and e-functions come apart. A dull knife functions poorly (qua knife), even if it was deliberately designed to be dull. This is because, considered as a token of the type knife, it still has the e-function to cut. (It might, however, e-function properly when considered as a token of another type, say, a theatre prop.) Similarly, they take this to explain why a cancer-diagnostic AI system that has been designed not to recognise common tumours would not count as trustworthy. According to Simion and Kelp, this is because cancer-diagnostic AIs have the e-function to reliably recognise such tumours, since this ability contributes to explaining their continuous existence.

3 Untrustworthy but properly functioning AI

The problem for Simion and Kelp’s account is that there are cases of untrustworthy AI systems that nonetheless reliably fulfil their d-function and e-function.

Consider recommender systems, i.e. the type of machine learning algorithms that curate social media feeds, suggest the next song or video on entertainment platforms, and recommend products when we shop online. These are designed to do things like increase user engagement or generate sales, and they often do so quite well. Automatically analysing the tremendous amounts of data on user behaviour that large digital platforms generate, recommender systems can continuously optimise their recommendations to most effectively induce the desired behaviours (Yeung, 2017). The effectiveness of recommender systems is a large part of the explanation for the success of major platforms such as TikTok, YouTube, and Amazon, and hence for the continuous existence of recommender systems. Thus, the d-function as well as the e-function of these systems is to increase user engagement and sales. Since they reliably fulfil this function, they count as trustworthy on Simion and Kelp’s account. However, the way they fulfil this function is often troubling: they can foster addictive behaviours (‘just one more video’), exploit negative but highly motivating emotions (doomscrolling, hate clicks), or drive polarisation by creating echo chambers and filter bubbles or by gradually radicalising users towards more and more extreme content.
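To make the optimisation logic concrete, here is a minimal, purely illustrative sketch of an engagement-optimising recommender, modelled as an epsilon-greedy bandit. It is my own toy construction, not drawn from Yeung (2017) or from any actual platform; the item names and engagement rates are invented for illustration.

```python
import random

# Toy sketch (my illustration, not any platform's actual system) of how an
# engagement-optimising recommender adapts purely to observed behaviour:
# items that historically produced more engagement get shown more often,
# regardless of *why* they engage.

class EngagementOptimiser:
    def __init__(self, items, epsilon=0.1):
        self.items = items
        self.epsilon = epsilon                        # exploration rate
        self.clicks = {item: 0 for item in items}
        self.shows = {item: 0 for item in items}

    def recommend(self):
        # Occasionally explore; otherwise exploit the historically most engaging item.
        if random.random() < self.epsilon:
            return random.choice(self.items)
        return max(self.items,
                   key=lambda i: self.clicks[i] / self.shows[i] if self.shows[i] else 0.0)

    def update(self, item, engaged):
        # The only feedback signal is engagement; nothing tracks whether the
        # engagement was healthy, informative, or polarising.
        self.shows[item] += 1
        self.clicks[item] += int(engaged)


# Hypothetical engagement rates: assume outrage-bait engages most often.
engagement_rates = {"calm_news": 0.05, "outrage_bait": 0.20, "cat_videos": 0.10}
rec = EngagementOptimiser(list(engagement_rates))
for _ in range(10_000):
    item = rec.recommend()
    rec.update(item, random.random() < engagement_rates[item])

print({i: rec.shows[i] for i in rec.items})  # typically dominated by 'outrage_bait'
```

The point of the sketch is that the system d-functions and e-functions exactly as intended: engagement goes up, and nothing in the optimisation loop registers the manner in which it goes up.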

My second example involves algorithmic bias. Now, there are many potential causes of bias in machine learning systems. Some relate to data quality, such as when data are unrepresentative, are less accurate for some groups, or encode social injustices (Barocas & Selbst, 2016). However, bias can also arise from trade-offs between different properties that designers may want to optimise for. For example, the most efficient way for a machine learning model to optimise its predictive accuracy is sometimes to capture very accurately the patterns that characterise the majority of data points, thereby effectively treating small subsets of the data as noise (Hardt, 2014). This can result in models that have high average accuracy but systematically underperform for minority populations represented by these ‘outlying’ data points. Importantly, this can occur even if the training data are completely representative and accurate. In fact, under some conditions, it can be mathematically impossible to increase accuracy for minority populations without decreasing overall accuracy [Footnote 2]. For many types of predictive models, their continuous existence is explained by their ability to optimise overall accuracy, not unbiasedness. Take creditworthiness scoring algorithms: these are used to minimise the number of customers who default on loans. As long as they achieve high overall accuracy in predicting applicants’ risk of default on a given loan type, lower accuracy for small sub-populations is unimportant. Thus, their e-function and (at least in some cases) their d-function are to achieve high accuracy, not unbiasedness. On Simion and Kelp’s account, a biased creditworthiness scoring algorithm of this type would count as trustworthy.
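To see the arithmetic behind this point, consider a toy worked example with invented numbers (not taken from any real system): a model scores 1000 applicants, of whom only 5% belong to a minority group whose patterns the model treats as noise.

```python
# Toy numerical illustration (hypothetical numbers): high overall accuracy can
# coexist with systematically poor accuracy for a small minority group.

majority_total, minority_total = 950, 50        # minority is 5% of applicants
majority_correct, minority_correct = 930, 20    # minority patterns treated as noise

overall_accuracy = (majority_correct + minority_correct) / (majority_total + minority_total)
majority_accuracy = majority_correct / majority_total
minority_accuracy = minority_correct / minority_total

print(f"Overall accuracy:  {overall_accuracy:.1%}")   # 95.0%
print(f"Majority accuracy: {majority_accuracy:.1%}")  # 97.9%
print(f"Minority accuracy: {minority_accuracy:.1%}")  # 40.0%
```

Because the minority group contributes so little to the overall figure, the model can score 95% overall while getting less than half of the minority cases right, which is precisely the profile that fulfils the relevant d-function and e-function.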

Biting the bullet on these cases is a non-starter. They are exactly the kind of cases that motivate calls for trustworthy AI. Any account that seeks to clarify the notion of trustworthiness as it is used in AI design and governance must exclude them. Simion and Kelp might respond by arguing that avoiding polarisation, bias, etc. is also among the functions of these systems. However, none of the obvious strategies for arguing this seem promising.

Suppose they argued that they are d-functions, since many designers do want to ensure their AI systems avoid these problems. That would still leave the cases where designers lack such noble intentions as counterexamples. Furthermore, suppose that a recommender system, S1, has been designed to avoid polarisation, but only reduces it slightly, while S2 has been designed solely to optimise engagement. In this case, S1 would fail to fulfil one of its d-functions (avoiding polarisation), whereas S2 fulfils all of its functions. Thus, S2 would be more trustworthy than S1. But it should of course be the other way around: any success in reducing polarisation should count favourably towards the trustworthiness of an AI system.

Suppose, instead, they argued that they are e-functions, since people are more likely to use and accept unbiased and non-polarising AI systems, and this contributes to explaining their continuous existence. However, it is unclear whether this is empirically true. Polarisation and algorithmic bias have only recently become widely recognised concerns. Whether they have significantly slowed the emergence of AI is at least an open question. Moreover, from an explanatory point of view, this response puts the cart before the horse. A theory of trustworthiness is supposed to explain when an AI system is a good candidate for trust. Defining trustworthiness in terms of the features that make users more likely to trust AI would be circular. It also seems normatively problematic: if a type of AI continues to exist because of its ability to convince users that it is unbiased, regardless of the actual behaviour of the system, would that count towards its trustworthiness?

Here is my diagnosis of why Simion and Kelp’s account delivers the wrong verdict. Their account is based on simple cases of tool use, where the functions of an artefact are usually well aligned with the interests of the person who needs to trust it, namely the user. The ability to cut contributes to explaining the existence of knives because it helped, and continues to help, knife users achieve important goals. Since knife users are plausibly the main type of trustor we have in mind when describing a knife as trustworthy, a properly functioning knife is also a trustworthy knife [Footnote 3].

However, the relationship between artefacts and human interests can be significantly more complex. Technologies can be deliberately designed with functions that are contrary to the interests of those who would trust them; in fact, many artefacts are designed, and continue to exist, exactly because they serve interests antagonistic to the user and other potential trustors. While this is not a recent phenomenon, the contemporary digital and increasingly AI-driven economy makes it highly salient. AI is used to identify and to target vulnerable people, e.g. with adverts for diploma mills or gambling websites (O’Neil, 2016), to imbue bureaucratic control mechanisms with a veneer of apolitical objectivity (Eubanks, 2018) and to harvest vast amounts of sensitive data for the benefit of companies and governments (Véliz, 2020; Zuboff, 2019).

All of this is problematic for Simion and Kelp’s account because it breaks the intuitive link between proper functioning and trustworthiness. When trusting a properly functioning artefact cannot be assumed to benefit the trustor, proper functioning does not in itself make the trustee worthy of trust. Indeed, why should users trust technologies whose function is to exploit their trust?

Part of the problem, I suspect, is that trustworthiness within the design and governance literature is often used in a moralised sense. Designers and policymakers are not just looking to ensure that AI is trustworthy in an instrumental sense (like a sword might be considered a trustworthy tool for committing horrible deeds). Rather, the point is to ensure that people trust AI systems when, and only when, it would be morally good if they were trusted. I am sceptical that an account of trustworthy AI can deliver this without building in some substantive moral commitments.

However, this is not the main point I will press in the following. Rather, I want to develop a more constructive proposal, which I think goes at least some way towards accommodating the above examples, independently of whether a non-moralised approach to trustworthy AI proves viable.

4 A modestly anthropocentric proposal

While I will not be able to develop a fully fledged theory, I want to sketch a way forward, building on Simion and Kelp’s general approach. Specifically, I want to retain their account of trustworthiness as a disposition to fulfil one’s obligations, whilst jettisoning the idea that the obligations of AI should be sourced from purely functional norms. To achieve this, I propose that, for the purposes of thinking about trustworthiness, AI systems should not be considered primarily as tools, but as technological participants in social practices.

Let me first motivate this proposal. Recall the objection, discussed in Section 1, that there is a greater risk of inappropriately anthropomorphising AI systems, because they (unlike most other artefacts) are designed to carry out human-like tasks. Simion and Kelp seek to block this argument by proposing a non-anthropocentric theory of trustworthiness. This allows them to treat trustworthy AI in exactly the same terms as other trustworthy tools. What I am proposing is to instead partially embrace the objection: AI systems are indeed more similar to humans than many other artefacts, and trustworthy AI should therefore be construed in more anthropocentric terms. However, it is only a partial embrace: AI systems are not subject to the exact same trustworthiness criteria as humans. It will not entail that AI systems have capacities involving consciousness, moral responsibility, or similar.

Instead, my proposal is, roughly, that because AI systems take up human-like roles within important social practices, they inherit some of the same obligations that would apply to humans occupying those roles. Again, care should be taken not to overemphasise the analogy: humans are rarely replaced 1-to-1 by AI systems. Rather, AI is usually implemented to provide decision support in the form of predictions, suggestions, reminders, red flags, etc., with a human (e.g. a judge or physician) still making the final decision. Even in cases where human decision-makers are displaced, the broader social practice is not left unchanged—and more importantly, should not be left unchanged. Exactly because AI systems lack crucial capacities, like empathy, care, and moral responsibility, new practices of accountability, oversight, reason-giving, etc. should, and often do, emerge [Footnote 4].

To cash out these ideas, I propose the following:

Modest anthropocentrism: an AI system’s obligations to φ are sourced in the norms that should govern the role it plays within the social practices it participates in, taking into account any changes to the social practices that its participation may bring about.

As it is beyond the scope of this commentary to defend this proposal in depth, I restrict myself to briefly explaining the core ideas.

First, following Haslanger (2018), I define social practices as patterns of learned behaviour that allow agents to coordinate action and hold each other accountable by reference to a body of shared social meanings. The social meanings make behaviours intelligible as meaningful actions (or ‘moves’) within that practice. They provide norms for justifying such actions and for judging their success, and they govern what inferences and expectations others may draw from them. For example, the words ‘I’m afraid the biopsy came back positive’ only count as a diagnosis when uttered by the right kind of person in the right kind of setting. Classifying it as a diagnosis in turn imposes certain evidential norms on the utterer, grants the person receiving the diagnosis access to certain kinds of resources, and so on.

Second, an AI system participates in a social practice, P, if and only if it performs one or more tasks that either (i) play the same functional role as a meaningful action in P or (ii) contribute to the actions of agents participating in P. Thus, a fully automated cancer-diagnostic AI and a medical decision-support system would both count as participating in the practice of medical diagnosis.

Finally, given this, how are the obligations to φ of an AI system determined? It would be tempting to source these directly in the norms that ordinarily govern the role it plays in P. However, as mentioned, even if the AI system itself is unable to fulfil such norms, other agents may adapt their behaviour to compensate. Similarly, even if the AI system fulfils the norms that apply to its own role, it may negatively impact other agents’ ability to fulfil the norms they would ordinarily be expected to follow. Therefore, the obligations of an AI system should instead be sourced in the norms that should govern the role the AI system plays in P, taking into account any changes to P that the implementation of the AI system may bring about. To a first approximation (though I will have a bit more to say about this shortly), these potentially revised norms are those that would best allow P to achieve its overall social function.

Here is an example. The practice of medical diagnosis serves several important functions, including guiding the distribution of healthcare resources and granting individuals exemptions from normal social or legal expectations. However, at least within modern Western medicine, it also plays an important role in supporting patient autonomy: being ‘given’ a diagnosis, including appropriate explanations of its meaning, likely causes, and potential future consequences, is crucial for patients to be able to give informed consent to proposed treatments, as well as to competently shape their lives and identities going forward [Footnote 5]. Now, the mere fact that a diagnostic AI itself is unable to provide such explanations may not be a problem. As long as there are healthcare professionals at hand with sufficient understanding of the patient’s clinical situation, implementing the system would not degrade the overall functioning of the practice. By contrast, if healthcare professionals off-load so much of their cognitive labour to the system that they lose this clinical understanding, they would no longer be able to fulfil the autonomy-supporting function of medical diagnosis (Keeling & Nyrup, 2022). In this case, the AI system would have an obligation to provide explanations.

5 Conclusion

To close, a final remark: I suggested that the norms governing AI systems should be determined by the overall social function of the practices they participate in. This seems to be in the spirit of Simion and Kelp’s general framework, especially since they (briefly) outline an etiological account of the functions of social practices. However, I also called this a first approximation. The reason for my hesitation is as follows: like Haslanger, I worry that many social practices, in their current formation, in part function to serve unjust ends—e.g. to facilitate and reinforce the exploitation of marginalised populations and scarce natural resources for the benefit of the already well off. For those who share this worry, calling AI systems that merely preserve the current functions of social practices ‘trustworthy’ will be deeply unsatisfactory (to put it mildly).

In the final analysis, then, I would want my account to build in more substantive moral commitments. One promising approach might be to adopt a normative theory of social practices. For example, one might define social practices as patterns of learned behaviour that allow agents to coordinate their actions in ways that benefit everyone or that allow them to hold each other appropriately accountable [Footnote 6]. How best to work out these details, I am not yet sure. Meanwhile, though, I hope to have made a convincing case for modest anthropocentrism as a general approach to trustworthy AI.