1 Introduction

If we have confidence in a human agent A performing an action, e.g., delivering a message to B for us, we can say that we ‘trust’ A to do so, but can you also ‘trust’ the postal service, as an institution, to deliver a message? The issue is not whether we should have confidence in the postal service or not, but whether there is something about the selectional restrictions of the verb trust that rules out institutions as trustees. Or how about: can you trust A to deliver a message while sleepwalking, or does trust require the trustee’s conscious awareness? And, most importantly for our purposes: can you trust a piece of artificial intelligence (AI) technology to deliver a message?

Trust involves a relationship between two parties (the trustor and the trustee), in which the former places trust in the latter. Trustworthiness, on the other hand, refers to the quality of being deserving of the trustor’s confidence, which is a characteristic of the trustee, but subject to the perception of the trustor. It is possible for an individual to be considered trustworthy by some but not by others. Trusting a trustee involves a certain degree of risk or vulnerability on the part of the trustor, as they are placing their reliance on the trustee’s assistance. When someone places trust in a trustee, they are taking a leap of faith.

Philosophers have debated whether machines can be trusted [4, 7, 10]—including whether the concept of trustworthiness applies to machines. This often skeptical debate—which I shall not try to summarize here—has not stopped the AI community, as well as AI regulators in law and governance, from using the term.

The trustworthiness of AI technologies is also discussed in the general public. Public ’trust’ in self-driving cars, for example, took a blow on 18 March 2018 in Arizona, when an Uber self-driving vehicle killed a pedestrian. This incident showed that self-driving cars were more prone to error and less reliable than expected. I will argue that it is important to distinguish between error proneness (performance) and reliability (robustness). Both seem implied by how we talk about the trustworthiness of AI, and both are analogous to prerequisites for trust between people.

I shall, however, reject the conclusion in [4], also advocated by many others [7, 10, 11], that trust is a uniquely human relation that machines cannot enter. This conclusion, I will show, is erroneous and stems from a petitio principii: in brief, the conclusion is already a premise of Jones’ definition of trustworthiness. Jones’ definition is, in other words, anthropomorphic. I revise Jones’ definition of trustworthiness to avoid the circular reasoning, and show how the revised (less anthropomorphic) definition reduces to requirements of performance and robustness. The revised definition highlights the analogy between trust in humans, institutions, and artifacts. Jones’ definition does not explicitly require privacy and value alignment. Privacy and value alignment are important, as suggested by Annette Baier’s seminal account of trust, but I will argue that, in spite of the amoral nature of many unaligned models, these objectives are secondary and task dependent. Finally, I will discuss how transparency relates to trustworthiness on my account. Transparency is widely assumed to modulate trust [13], and I will show that, on the revised definition, transparency can provide sufficient, but not necessary, conditions for trustworthiness.

2 Definitions of trustworthy humans

I begin by identifying the criteria for trustworthiness in canonical definitions thereof. I discuss three common definitions of trust between humans, of which the third (D3) is the most commonly agreed upon (Footnote 1):

D1. A is trustworthy for B if A performs reliably for B.

D2. A is trustworthy for B if A has discretion and goodwill toward B [1].

D3. A is trustworthy for B if A is competent and directly responsive to the fact that B is counting on A [4].

In Sect. 3, I will argue for a revision of D3 that goes as follows:

D4. A is trustworthy for B if A is reliably competent.

Note how D1–D4 have a similar structure. The necessary precondition for the binary (Footnote 2) trustworthiness relation between A and B is expressed as a conjunction of one or two relations:

$$\begin{aligned} Trustworthy(A,B) \rightarrow \phi \wedge \psi \end{aligned}$$

The above reads: if A is trustworthy for B, then \(\phi\) holds (e.g., A is competent) and \(\psi\) holds (e.g., A is directly responsive to the fact that B is counting on A). Let me briefly summarize what the preconditions are taken to mean.
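Schematically, and using predicate names merely as shorthand for the preconditions just listed (the names are shorthand of mine, not part of the original definitions), D1–D4 instantiate this pattern roughly as follows:

$$\begin{aligned} \text {D1:}\quad&Trustworthy(A,B) \rightarrow Reliable(A,B)\\ \text {D2:}\quad&Trustworthy(A,B) \rightarrow Discreet(A,B) \wedge Goodwill(A,B)\\ \text {D3:}\quad&Trustworthy(A,B) \rightarrow Competent(A) \wedge Responsive(A,B)\\ \text {D4:}\quad&Trustworthy(A,B) \rightarrow Competent(A) \wedge Reliable(A) \end{aligned}$$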

2.1 Reliability

Jones mentions that reliability ’applies to forces of nature and non-reflective agents as well as to fellow human beings’, but beyond that, she only defines it negatively. Dictionaries are of little use here and tend to treat ’reliability’ and ’trustworthiness’ as synonymous. The Macmillan dictionary, for example, defines reliability by saying that ’a reliable person is someone who you can trust to behave well, work hard, or do what you expect them to do’. The disjunction could indicate that trustworthiness refers to particular kinds of reliability. Most seem to agree that reliability requires some form of consistency.

[1] claims trust is distinct from reliability, in that trust can be betrayed, whereas reliance can merely be disappointed. I want to add that reliability seems to (always) be triggered by past observations of a specific behavior. The behavior does not have to be good or bad. You can, in other words, be reliably bad. In contrast, trustworthiness does not require past observations of a specific behavior. I can find you trustworthy as an investor without any knowledge of your past investments, but simply because I find you competent, or because I saw your university diploma. Reliability, thus, differs from trustworthiness in requiring past experiences and in not requiring competence.

I will assume that D1 means something like ’A is trustworthy for B if A behaves consistently for B’. One possible definition of reliability would then be: ’A is reliable for B if A behaves consistently for B’. Our revision of Jones’ definition in Sect. 3 (D4) will then amount to something like: ’A is trustworthy for B if A can be expected to behave consistently well for B’ (Footnote 3). Since reliability presupposes a series of past observations, and trustworthiness—on our definition below, call it D4-trustworthiness—no longer entails direct responsiveness, D4-trustworthiness is no longer a form of reliability (with bells and whistles). Nor is reliability a form of trustworthiness (with bells and whistles), since D4-trustworthiness entails competence. In the context of AI, I will refer to reliability as robustness (see Sect. 4).

2.2 Discretion

[1] defines trustworthiness in terms of discretion. Agent A, in other words, has to ‘avoid causing offense or revealing confidential information’ to be trustworthy. This is generally good advice for someone who wants the trust of others, but I will argue that discretion or privacy is best grouped under competence. A competent psychologist will not share confidential information. A trusted friend will not go tell everyone about your anxieties without your approval. This is simply part of what it means to be a competent psychologist or friend. My trust in my mechanic is rarely affected by how talkative he is. Similarly, my trust in my fellow clients in a group therapy setting will often be orthogonal to their discretion.

Similarly, for some AI applications (but not others), privacy is simply part of what it means to perform well. Imagine a system that transcribes legal proceedings. The quality of such a speech recognition system will naturally depend on its ability to capture as much of what is said during the proceedings as possible. It will also have to produce fluent, readable output. Sometimes these two objectives are evaluated separately, e.g., by word error rate and perplexity. We can discuss them separately, but a good system will have to strike a good balance between the two objectives. We can, in the same way, discuss the extent to which the system protects the highly sensitive data it processes. We can even do this somewhat independently of the other objectives. Still, however, a good system has to protect the data. A transcription system for legal proceedings that leaks participant data is no good, irrespective of its error rate and perplexity.
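For concreteness, word error rate can be computed as the word-level edit distance between a system output and a reference transcript, divided by the length of the reference. A minimal sketch (with invented example sentences; real evaluations would also handle normalization, casing, and punctuation) might look as follows:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance (substitutions, insertions,
    deletions) divided by the number of words in the reference."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming table for Levenshtein distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution / match
    return d[len(ref)][len(hyp)] / len(ref)

# Invented example: one word dropped against a 6-word reference.
print(word_error_rate("the witness was sworn in today",
                      "the witness was sworn today"))  # 1 deletion -> ~0.17
```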

For other applications, however, non-privacy (leakage of information) may be a feature and not a bug. Consider an AI system for determining the genre of a painting, trained on historical collections of public artwork. Privacy is not a crucial factor in this context. In fact, it may get in our way. If we want to know why the system thinks a painting is surrealist, i.e., what artwork seen during training led it to make this inference, we need the system to memorize its training data. In general, privacy and interpretability are at odds [9], and this leads to scenarios in which we are forced to choose between such AI virtues (Footnote 4).

2.3 Goodwill

Goodwill is Baier’s way of saying ’without malicious intent’. Goodwill, thus, presupposes an intention. Apples or pears cannot have goodwill, for example. Does goodwill, as a concept, apply to software or technology? Baier never considers this question. One way D2 could apply to machines, however, is by extension to their designers (derived intentionality). On such a reading of Baier’s definition, any AI not designed with malicious intent is trustworthy.

Note that D2 does not make competence a prerequisite for trustworthiness, a criticism that has often been leveled against Baier. While goodwill or value alignment is not a necessary condition for all trustworthy AI, it certainly helps. Like discretion, goodwill can be a component of competence.

Goodwill is important for some applications, e.g., scientific investigation and educational applications, but in these cases, it should be integrated in our quality metrics. AI for scientific investigation, for example, must align with established scientific methodology, and educational technologies must align with regulations and pedagogical principles to be considered any good (at what they were designed for). Chatbots and translation systems must be non-toxic, safe, and unbiased (to the extent possible). Other technologies are not subject to such constraints. This, for example, holds true for a range of core technologies such as image segmentation and part-of-speech tagging, as well as for generative models developed for artistic purposes, e.g., poetry.

2.4 Competence

Reliability (D1) does not presuppose competence, since you can be reliably bad at something. In a discussion of a legal case, for example, an attorney once accused another attorney of ’picking on the most vulnerable members of society and callously relying on their poor memories’. The inability of some people to recollect past events was reliable enough that the attorney made it a strategy to exploit this inability. So, in sum: if I trust that you lose a game of chess, I expect you to willfully lose. If I rely on you losing a game of chess, I just expect you to lose—not necessarily willfully so.

But what is competence? One way of putting this is to say that if A is competent at performing some action \(\phi\), e.g., playing chess, predicting the polarity of a product review, or juggling three balls, she will (almost always) be able to perform \(\phi\) under standard conditions. What are standard conditions? These would be representative of the conditions previously observed. Performance under standard conditions, thus, says nothing about the general distribution of conditions. It refers to the distribution of past or observed conditions, not the conditions we are likely to face in the future. Say A lives in a culture where juggling is always performed indoors in a very bright room. A can be a competent juggler (in this culture) if she (almost always) can juggle under such conditions. The fact that she fails when conditions change ever so slightly does not threaten her perceived competence, only her perceived reliability. In AI terminology, she performs well, but fails to exhibit robust performance across distribution shifts.

2.5 Responsiveness

In contrast to Baier, who simply ignores technology, Jones [4] explicitly argues that D3, because of its emphasis on direct responsiveness, is a reason why trustworthiness only applies to humans and human institutions. Other philosophers have presented related arguments, e.g., pointing to A’s attribution of motivation to B as what distinguishes B’s trustworthiness from B’s reliability [7]. Trusting someone is going out on a limb, making yourself vulnerable to the willingness of the other to assist you. You trust that the other—knowing that you trust her—will be willing to assist you. For this reason, the argument goes, trustworthiness is not (for now) a property of technology.

The main argument I will make here is that Jones’ responsiveness precondition in D3 is an ad hoc fix, which is only relevant for human trustees, and should be seen as an appendix to D3 when applied to humans, rather than as part of the definition of trustworthiness itself. The failure of [4] to make this distinction leads her into a form of circular reasoning, which I will flesh out in Sect. 3.

3 Jones’s circularity

I will make the following argument: direct responsiveness is only a necessary precondition for trustworthiness in D3 because Karen Jones (understandably) wants trustworthiness to apply to humans. Jones is eager to define trustworthiness for humans and feels she needs direct responsiveness to avoid situations in which reliably competent trustees fail to respond to trust. Note how this is subtly different from goodwill, even if goodwill also often implies direct responsiveness (Footnote 5).

Jones’s argument that trustworthiness is uniquely human, is therefore, in my view, circular (a petitio principii). In short, her argument, as I read it, is:

a. Trustworthiness requires competence.

b. Because humans can abstain from using their competence, trustworthiness also requires direct responsiveness (for humans).

c. Machines cannot be directly responsive.

d. Machines, therefore, cannot be trustworthy.

The problem with this argument is that the responsiveness clause (b) was inserted into Jones's definition of trustworthiness to prevent human deflection from blocking competent behavior. For machines, whose behavior is not at the mercy of willful or unwillful deflection (Footnote 6), this clause is no longer needed. The logical fallacy is, in this way, a result of baking the conclusion into one of the premises (direct responsiveness). The conclusion becomes a premise, and the deduction that leads to the conclusion that machines cannot be trustworthy becomes an instance of circular reasoning (petitio principii) (Footnote 7).

To see this, consider the example of a trustworthy car driver C. Say we do not know whether C is human or machine. What does it mean for C to be a trustworthy driver? C will have to be a competent driver to be deemed trustworthy. The competence of C amounts to C driving well under usual conditions. Competence is a necessary, but not a sufficient, condition, however. When your competence as a driver is evaluated during a driver’s license road test, your ability to drive error-free under standard conditions is evaluated; your reliability under changing conditions is not. If C is trustworthy, this means more than that C is competent: it means that we can expect C to always perform at its best, and reliably so, under changing conditions. If C is human, this would be guaranteed by C’s responsiveness to a trustor’s trust, but if C is not, such reliability can be established without responsiveness (beyond the system’s being up and running). With the revised version of Jones’ definition of trustworthiness, we can now talk about the trustworthiness of an agent in the absence of knowledge of whether the agent is human or not. We can, for example, compare human taxi drivers (and their cars) to driverless taxis. This should be an advantage, as AI systems become part of our everyday lives.

In my revision of Jones’ definition of trustworthiness, repeated here for convenience, the precondition of reliability removes the need for responsiveness:

D4. A is trustworthy for B if A is reliably competent.

The superiority of D4 over D3 becomes particularly clear in discussions of trust in institutions. Annette Baier and Karen Jones both happily extend the trustworthiness predicate to institutions, but institutions are generally not responsive. Instead, their behavior is regulated by rules, and it is the rule-abiding behavior of institutions that makes them trustworthy. If institutions fail to follow the rules, we lose trust in them. It is, in other words, part of the reliable competence of institutions that they serve reliably and do not discriminate between trustors.

4 Definitions of trustworthy machines

Definitions of trustworthiness refer to concepts such as reliable performance, discretion and goodwill, competence, and responsiveness. How do such concepts translate into AI and machine learning? My tentative mapping into the vocabulary of this literature is as follows:

Humans         Machines
Competence     Accuracy
Reliability    Robustness
Discretion     Privacy
Goodwill       Aligned purpose

Accuracy here is a placeholder for whatever is the commonly accepted performance measure for a task, e.g., \(F_n\)-score, precision@k, or correlation strength. Robustness can be robustness to different kinds of shift, including noise conditions, domain shifts, subgroup shifts, or temporal drift.

If we remove the responsiveness clause from D3, leading to D4, we are thus left with three definitions of the trustworthiness of machines:

D1’.:

Machine A is trustworthy for B if A exhibits robust performance across B’s use cases.

D2’.:

Machine A is trustworthy for B if A was designed with privacy and aligned purpose.

D4’.:

Machine A is trustworthy for B if A exhibits high accuracy and robust performance across B’s use cases.

Note that D2’—derived from the definition of trust in [1]—is independent of the use cases. Trustworthiness is an intrinsic property of AI models, which is not evaluated relative to specific applications. The D2’ is, so to speak, a thick conception of trustworthiness, whereas D1’ and D4’ present thin or specific conceptions. Note how D4 translates directly into D4’. I will argue that for AI, D4’—our revision of Jones’s definition—is the most useful definition of trustworthiness —and that under D4’, it is possible to design trustworthy AI.

4.1 Accuracy

The accuracy of a machine learning model is the number of true positives and true negatives divided by the total number of evaluation data points. As already mentioned, this is just a placeholder in our case. Some AI applications call for other metrics. Performance on very skewed classification problems, for example, is sometimes better measured by minority-class \(F_1\)-score, since true negatives inflate accuracy numbers. Regression models are often evaluated by distance metrics, and generation models by (say, text or image) similarity metrics.
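For concreteness, here is a minimal sketch (the class distribution and the predictions are invented for illustration) of how true negatives can inflate accuracy on a skewed problem, while minority-class \(F_1\) exposes the failure:

```python
# Toy illustration: accuracy vs. minority-class F1 on a skewed label distribution.
# All numbers are invented for illustration only.

def accuracy(gold, pred):
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

def f1(gold, pred, positive=1):
    tp = sum(g == positive and p == positive for g, p in zip(gold, pred))
    fp = sum(g != positive and p == positive for g, p in zip(gold, pred))
    fn = sum(g == positive and p != positive for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# 95 negative (majority) instances, 5 positive (minority) instances.
gold = [0] * 95 + [1] * 5
# A degenerate classifier that always predicts the majority class.
pred = [0] * 100

print(accuracy(gold, pred))        # 0.95 -- true negatives inflate accuracy
print(f1(gold, pred, positive=1))  # 0.0  -- minority-class F1 exposes the failure
```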

4.2 Robustness

In the context of AI, reliability translates into the concept of performance robustness. Robustness refers to the property that a model’s performance—say, accuracy or \(F_1\)—is not (too) sensitive to distributional shift (changing conditions), e.g., moving a driverless car to a new environment. A robust system is typically assumed to be robustly accurate, but the two terms can also be used in orthogonal ways: A can be more robust than B, with B more accurate than A, for example.

Note that performance and robustness may, in some situations, be at odds. When discussing the fairness of AI systems, for example, we often consider the performance–robustness trade-off, exchanging high accuracy on a majority group for slightly lower, but more even, performance across all demographic groups. Robustness can be quantified as the variance in performance, or as the min–max difference, across performance estimates from multiple conditions. If we focus on temporal drift, we can estimate the performance of AI systems A and B on multiple time slices. If we equate robustness with (inverse) variance across these time slices, it is easy to see why accuracy and robustness are orthogonal.

Suppose these are performance estimates (say, \(F_1\)-scores) of a system A across three time slices of data on some task, e.g., authorship attribution:

18th century   19th century   20th century
0.2            0.3            0.2

System A exhibits lower performance variance (higher robustness), yet lower accuracy, than a system B with the following performance estimates over the same time slices:

18th century   19th century   20th century
0.7            0.3            0.7

System A is more reliable than B, but B is more accurate. On D4’, neither of them is trustworthy, however: system A fails to be accurate (competent), and system B fails to be robust (reliable). In sum, there is a trade-off between performance and robustness, but trustworthiness requires both. Occasionally, this will lead to scenarios in which trustworthy AI is not possible.
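For concreteness, the accuracy and robustness of the two hypothetical systems above could be summarized roughly as follows, a minimal sketch using mean performance, variance, and the min–max gap over the per-slice scores:

```python
# Hypothetical per-slice F1 scores, taken from the two tables above.
from statistics import mean, pvariance

system_a = [0.2, 0.3, 0.2]  # System A: low accuracy, low variance
system_b = [0.7, 0.3, 0.7]  # System B: higher accuracy, high variance

for name, scores in [("A", system_a), ("B", system_b)]:
    print(
        f"System {name}: "
        f"mean={mean(scores):.2f}, "
        f"variance={pvariance(scores):.4f}, "
        f"min-max gap={max(scores) - min(scores):.2f}"
    )

# System A: mean=0.23, variance=0.0022, min-max gap=0.10  (robust, not competent)
# System B: mean=0.57, variance=0.0356, min-max gap=0.40  (more competent, not robust)
```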

4.3 Privacy

Baier argues that discretion is a necessary precondition for trustworthiness, which arguably translates into privacy in the context of machine learning models. AI systems can violate user privacy in many ways. Private information can be leaked when users interact with the model, but machine learning models that are made available for commercial or research purposes may also leak private information about people that contributed to or were talked about in the training data. Models can be trained with differential privacy guarantees to prevent the latter form of leakage. While privacy is crucial for some applications, e.g., automatic transcription of legal proceedings, it is immaterial for other applications, including a range of core technologies, e.g., image segmentation or part-of-speech tagging, and applications such as movie plot generation or handwriting recognition. Since privacy is, thus, an application-specific precondition for trustworthiness, I think the most parsimonious way of thinking of privacy is as part of the definition of what it means to be competent for some tasks. Just like competent psychologists do not violate confidentiality, competent speech recognition models for legal proceedings do not leak private data.
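To make the reference to differentially private training slightly more concrete, here is a minimal, schematic sketch of the core step of differentially private SGD (per-example gradient clipping followed by calibrated Gaussian noise). The function name, toy gradients, clipping norm, and noise multiplier are all invented for illustration, and a real implementation would also have to track the cumulative privacy budget:

```python
import numpy as np

def dp_sgd_step(per_example_grads, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """One schematic DP-SGD update direction: clip each per-example gradient to
    `clip_norm`, average, and add Gaussian noise scaled by the clipping norm."""
    rng = rng or np.random.default_rng(0)
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))
    avg = np.mean(clipped, axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm / len(per_example_grads),
                       size=avg.shape)
    return avg + noise

# Arbitrary toy per-example gradients (one row per training example).
grads = [np.array([0.5, -2.0]), np.array([0.1, 0.3]), np.array([3.0, 1.0])]
print(dp_sgd_step(grads))
```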

4.4 Aligned purpose

Baier’s notion of ’goodwill’ relates to what in the AI community is called value or purpose alignment. If the purpose of a technology does not align with the values and expected purposes of the end users, this signals either a conflict of interests, i.e., a violation of the ’goodwill’ precondition, or poor communication about the purpose of the technology. Poor communication can itself be a violation of ’goodwill’, making model transparency and data sheets instrumental in establishing a sense of aligned purpose. Aligned purpose, like privacy, is only relevant for some AI applications, however; it is, for example, not relevant for many core technologies. Thinking of aligned purpose as part of the competence profile for these tasks seems, to me, the most principled approach to take. Aligned purpose (or goodwill) also relates, in subtle ways, to direct responsiveness; see Footnote 5 for discussion.

5 How transparency modulates trust

[13] focus on the role of transparency in establishing trust in technology. On their view, transparency plays the role of goodwill, as ‘an expression of good faith on the trustee’s part’. Transparency includes (but is not limited to) ‘explanation, which can enhance a user’s understanding of how an algorithm works.’ It is intuitive that understanding how a competent algorithm works would increase trust in it. Transparency lets us see that the algorithm is competent. On my view, however, good faith is not a prerequisite for trustworthiness (Footnote 8)—so how can we account for how transparency modulates trust?

My account, I think, is relatively straightforward. Trustworthiness is premised on robust competence, and robustness estimates for a new piece of technology are usually limited by first-hand experience or second-hand reports of experience with this technology. Increasing transparency may, however, compensate for experience in some cases. To see this, consider the following anecdote about ALVINN, one of the first self-driving cars, from a 1993 PhD thesis by Dean Pomerleau [8, p. 182]. The technology in ALVINN is a neural network trained to steer the wheel based on input from a camera mounted to the roof of the car. Pomerleau discusses how neural networks’ ability to utilize subtle image features can be dangerous:

This danger was demonstrated when ALVINN was trained to drive on a dirt road with a small but distinct ditch on its right side. The network had no problem learning and then driving autonomously in one direction, but when the vehicle was turned around, the network stayed on the road, but was erratic, swerving from one edge of the road to the other. After analyzing the network’s hidden units, the reason for its difficulty became clear. It had developed detectors not only for the position of the road, but also for the position of the ditch on the right side during training. When tested in the opposite direction, the network was able to keep the vehicle on the road using its road detectors but was somewhat confused because the ditch it had learned to look for on the right side was now on the left.

Obviously, we should not trust a driverless car that relies on there being a ditch along the road. Such a car is not robustly competent. We see this when evaluating the neural network steering the wheel on sufficient data (‘when tested in the opposite direction’)—but, as Pomerleau points out, we can also detect such behavior by ‘analyzing the network’s hidden units’.

Such analysis is not always trivial, and some methods for performing such analysis may even be misleading [3]. Nevertheless, most feature attribution methods, for example, would be able to pick up on a driverless car’s reliance on the existence of a ditch on the right-hand side of a road (Footnote 9). That is, feature attribution methods would tell us something that more data could have told us, but in the absence of such data.
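To illustrate the kind of analysis I have in mind, here is a minimal sketch of gradient-based feature attribution (input saliency) for an invented toy model. The model, input values, and feature names are purely illustrative, and actual attribution methods (e.g., integrated gradients or SHAP) are typically more involved:

```python
import torch
import torch.nn as nn

# Invented toy "steering" model: maps a 4-dimensional input to a steering score.
torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 8), nn.Tanh(), nn.Linear(8, 1))

# Invented input features, e.g., [road_curvature, lane_offset, ditch_right, brightness].
x = torch.tensor([[0.2, -0.1, 0.9, 0.5]], requires_grad=True)

score = model(x).sum()
score.backward()

# Gradient magnitude per input feature: a crude proxy for how much each feature
# influences the prediction. A large value on the `ditch_right` feature would
# flag the kind of reliance Pomerleau describes.
saliency = x.grad.abs().squeeze()
print(saliency)
```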

When a neural network correctly predicts a phenomenon—say, when to turn a wheel or whether an X-ray scan suggests excess fluid in the lungs—it provides evidence that there is some learnable relation between the input (images) and a target variable, e.g., \(\{\text{ left }\mid \text{ right }\}\) or \(\{\text{ liquid }\mid \text{ no-liquid }\}\). What exactly the relation is, is initially unclear, except that it is something available in the input (images) the model was evaluated on. Some hypotheses may be ruled out in light of more data, but feature attribution methods can also be used to rule out hypotheses, e.g., by showing that the underlying relation does or does not involve particular variables.

Such transparency-increasing methods do not have unique epistemic value. What they tell us, more data could also have told us. Instead, they compensate for data in situations where data are inaccessible, e.g., because they have not been generated yet, or because they are private. Transparency, in this sense, modulates trust by potentially providing us with evidence of the robust competence of a technology.

6 Intentionality strikes back?

Finally, I consider a possible response to what I have said so far. I have introduced a revised version of Jones’ definition of trustworthiness and argued that AI can be trustworthy under this definition. Jones, in contrast, says ’one can only trust things that have wills.’ This is intended to rule out things like trustworthy AI, but how about unconscious human agents? Can you, for example, ‘trust your best friend to sleepwalk tonight’? Or is that merely a form of reliance? You do not will to sleepwalk, so for Karen Jones, you cannot trust someone to sleepwalk. In D4, I removed the responsiveness criterion. If B reliably sleepwalks, A can now—under our revised definition of trustworthiness—trust that B will sleepwalk. Is this reasonable, though? If not, this seems a serious objection to what I have proposed.

If A relies on B’s sleepwalking, there is a presupposition that B will maintain the level of sleepwalking B has exhibited in the past, irrespective of changing conditions. A can rely on B sleepwalking fortnightly, for example. If A, in contrast, trusts B’s sleepwalking, its periodicity is irrelevant. To be a trustworthy sleepwalker, you need to be a reliably competent sleepwalker. What that means would depend on what the appropriate performance metrics for sleepwalking are. If good sleepwalking is fast sleepwalking, someone with the ability to walk fast under diverse conditions, and who sleeps lightly, may be thought of as a trustworthy sleepwalker, even if that person has never sleepwalked before. The question, though, is whether being a reliably competent sleepwalker requires some form of intentionality.

Jones reserves trustworthiness for people and institutions. I have argued that her conservative position on trustworthy technology is a result of circular reasoning, and that it would be more consistent to extend the predicate also to technologies and artifacts—but how about natural kinds and unwilled actions like sleepwalking? I have already indicated that I do not hesitate to extend trustworthiness to unconscious humans. Consider, for example, the difference between:

(1) I trust you will vote.

(2) I trust you vote.

While (1) asks about the intentions of the addressee, (2) is satisfied by the addressee sleepwalking to the election booth and filling out the form. No intentionality is needed. I do, however, think that there is an important difference between technologies and natural kinds.

Artifacts that are put on the market were presumably tested before release. Under D4’, the necessary conditions for trust are competence and reliability (accuracy and robustness), but as end users, we have little opportunity to evaluate the quality and robustness of products. We trust that safety checks and proper evaluations have been performed. This is a form of derived intentionality.

Derived intentionality introduces a possible guarantor for the accuracy and robustness of B: someone to say, yes, we tested it; it works. It is this derived intentionality that motivates the feeling of betrayal (rather than mere disappointment). The main role of derived intentionality, and the reason it (almost) feels like a necessary precondition, is not that it may come with the promise of reliable competence, but that it defines what (\(\phi\)) the trustworthiness is with respect to. A sleepwalker can be reliably competent at sleepwalking, but in the absence of intentionality, we are left without a definition of exactly what a competent sleepwalker is supposed to be good at. Car drivers have a telos. We know what car driving competence entails. Sleepwalking is a bit different.

Similarly, an apple can be reliably competent at appling, but as long as apples are not for something, we cannot evaluate an apple’s competence. Only in the context of particular usage can we talk about the trustworthiness of apples, e.g., in the context of eating or serving as a still-life motif. An apple can be reliable and trustworthy relative to such a usage, but it is unclear what it would mean for an apple to be a reliable or trustworthy apple, simply because \(\phi\) is unspecified for apples.

But what, then, happens when (2) is satisfied by a sleepwalking voter? In this case, the addressee takes on the responsibility of the system developer, saying: I know how this system (i.e., I) works, and I will, competently and reliably, sleepwalk to the election booth and fill out the form. Derived intentionality is established, and we know exactly what trustworthiness implies.

So where does this leave us? D4 says there are two criteria for trustworthiness, and that satisfying both is sufficient to be trustworthy: competence (accuracy) and reliability (robustness). Trustworthiness is, in this way, easier to verify than reliability, which requires historical data. On the other hand, evaluating whether someone is competent and reliable in the absence of data may also be much harder.

If you tell me before going to bed that you will sleepwalk tonight, how would I evaluate the trustworthiness of your claim? If you were a robot, designed to sleepwalk, and the company I bought you from had told me to trust you to sleepwalk, the situation would be different: the guarantor could, in theory, provide sufficient evidence for competence and reliability, and the guarantor could specify performance metrics for evaluating your competence, as well as the distributions they evaluated on, before establishing your reliability.

Knowing who designed you and for what purpose gives me a prior on your ability to competently and reliably sleepwalk (or whatever), and makes it easier to evaluate this ability.

7 Conclusion

Most public-facing AI aims to be ‘trustworthy’, but several philosophers have argued that trustworthiness is exclusively a human property. I first identified three competing definitions of what trustworthiness means. One equated trustworthiness with reliability; Baier’s and Jones’ definitions are the other two. Baier, in my view, touches on important but auxiliary aspects of what it means to be trustworthy, whereas Jones’s definition of trustworthiness seems closer to what we want. Unfortunately, Jones’s definition reinforces the paradox by leading to the conclusion that trustworthiness only applies to humans. I showed that such an argument is circular, however, and identified a premise in Jones’ definition that seems rather anthropomorphic. Simply leaving this premise out resolves the paradox.

I offered a tentative revision of Jones’ definition: A is trustworthy for B if A is reliably competent. In AI, this definition translates into: machine A is trustworthy for B if A exhibits high accuracy and robust performance across B’s use cases (D4’). I have, I think, shown that the notion of trustworthiness can be extended beyond people or institutions, to artifacts and even natural kinds. AI directly impacts us and our societies. If we roll out AI at scale, we need to make sure it works, and reliably so. And as AI becomes part of our daily lives, we need a language, a terminology that applies to both humans and machines.