
1 Introduction

Given what we find in the case of human cognition, the following principle (Principle ACU, or just — read to rhyme with “pack-ooo” — PACU) appears to be quite plausible:

  • PACU An artificial agent that is autonomous (A) and creative (C) will tend to be, from the viewpoint of a rational, fully informed agent, (U) untrustworthy.

After briefly explaining the intuitive internal structure of this disturbing (in the context of the human sphere) principle, we provide a more formal rendition of it designed to apply to the realm of intelligent artificial agents. The more-formal version makes use of some of the basic structures available in a dialect of one of our cognitive-event calculi (viz. \(\mathcal {D}^{e}\mathcal {CEC}\)),Footnote 1 and can be expressed as a (confessedly — for reasons explained — naïve) theorem (Theorem ACU; TACU — pronounced to rhyme with “tack-ooo”, for short). We prove the theorem, and then provide a trio of demonstrations of it in action, using a novel theorem prover (ShadowProver) custom-designed to power our highly expressive calculi. We then end by gesturing toward some future defensive engineering measures that should be taken in light of the theorem.

In a bit more detail, the plan for the present chapter is as follows. We begin by providing an intuitive explanation of PACU, in part by appealing to empirical evidence and explanation from psychology for its holding in the human sphere (Sect. 17.2). Next, we take aim at establishing the theorem (TACU), which, as we’ve explained, is the formal counterpart of Principle ACU (Sect. 17.3). Reaching this aim requires that we take a number of steps, in order: briefly explain the notion of an “ideal-observer” viewpoint (Sect. 17.3.1); summarize the form of creativity we employ for C (Sect. 17.3.2), and then the form of autonomy we employ for A (Sect. 17.3.3); very briefly describe the cognitive calculus \(\mathcal {D}^{e}\mathcal {CEC}\) in which we couch the elements of TACU, and the novel automated prover (ShadowProver) by which this theorem and supporting elements are automatically derived (Sect. 17.3.4); explain the concept of collaborative situations, a concept that is key to TACU (Sect. 17.3.5); and then, finally, establish TACU (Sect. 17.3.6). The next section provides an overview of three simulations in which Theorem ACU and its supporting concepts are brought to concrete, implemented life with help from ShadowProver (Sect. 17.4). We conclude the chapter, as promised, with remarks about a future in which TACU can rear up in AI technology different from what we have specifically employed herein, and the potential need to ward such a future off (Sect. 17.5).

2 The Distressing Principle, Intuitively Put

The present chapter was catalyzed by a piece of irony: It occurred to us, first, that maybe, just maybe, something like PACU was at least plausible, from a formal point of view in which, specifically, highly expressive computational logics are used to model, in computing machines, human-level cognition.Footnote 2 We then wondered whether PACU, in the human sphere, just might be at least plausible, empirically speaking. After some study, we learned that PACU isn’t merely plausible when it refers to humans; it seems to be flat-out true, supported by a large amount of empirical data in psychology. For example, in the provocative The (Honest) Truth About Dishonesty: How We Lie to Everyone — Especially Ourselves, Ariely explains, in “Chapter 7: Creativity and Dishonesty,” that because most humans are inveterate and seemingly uncontrollable storytellers, dishonesty is shockingly routine, even in scenarios in which there is apparently no utility to be gained from mendacity. Summing the situation up, Ariely writes:

[H]uman beings are torn by a fundamental conflict—our deeply ingrained propensity to lie to ourselves and to others, and the desire to think of ourselves as good and honest people. So we justify our dishonesty by telling ourselves stories about why our actions are acceptable and sometimes even admirable. (Chap. 7 in [1])

This summation is supported by countless experiments in which human subjects deploy their ability to spin stories on the spot in support of propositions that are simply and clearly false.Footnote 3

Whereas Ariely identifies a form of creativity that consists in the generation of narrative, as will soon be seen, we base our formal analysis and constructions upon a less complicated form of creativity that is subsumed by narratological creativity: what we call theory-of-mind creativity. It isn’t that we find creativity associated with narrative uninteresting or unworthy of investigation from the perspective of logicist computational cognitive modeling or AI or robotics (on the contrary, we have investigated it with considerable gusto; see e.g. [7]), it’s simply that such things as story generation are fairly narrow in the overall space of creativity (and indeed very narrow in AI), and we seek to cast a wider net with TACU than would be enabled by our use herein of such narrow capability.

3 The Distressing Principle, More Formally Put

3.1 The Ideal-Observer Point of View

In philosophy, ideal-observer theory is nearly invariably restricted to the sub-discipline of ethics, and arguably was introduced in that regard by Adam Smith [42].Footnote 4 The basic idea, leaving aside nuances that needn’t detain us, is that actions are morally obligatory (or morally permissible, or morally forbidden) for humans just in case an ideal observer, possessed of perfect knowledge and perfectly rational, would regard them to be so. We are not concerned with ethics herein (at least not directly; we do end with some brief comments along the ethics dimension); we instead apply the ideal-observer concept to epistemic and decision-theoretic phenomena.

For the epistemic case, we stipulate that, for every time t, an ideal observer knows the propositional attitudes of all “lesser” agents at t. In particular, for any agent a, if a believes, knows, desires, intends, says/communicates, perceives \(\ldots \) \(\phi \) at t (all these propositional attitudes are captured in the formal language of \(\mathcal {D}^{e}\mathcal {CEC}\)), the ideal observer knows that this is the case at t; and if an agent a fails to have some propositional attitude with respect to \(\phi \) at a time t, an ideal observer also knows this. For instance, if in some situation or simulation covered by one of our cognitive calculi (including specifically \(\mathcal {D}^{e}\mathcal {CEC}\)) an artificial agent \(a_a\) knows that a human agent \(a_h\) knows that two plus two equals four (\(=\phi \)), and o is the ideal observer, the following formula would hold:

$$\mathbf {K}(o, t, \mathbf {K}(a_a, t, \mathbf {K}(a_h, t, \phi ))).$$

It is convenient and suggestive to view the ideal observer as an omniscient overseer of a system in which particular agents, of the AI and human variety, live and move and think.

We have explained the epistemic power of the ideal observer. What about rationality? How is the supreme rationality of the ideal observer captured? We say that an ideal observer enters into a cognitive state on the basis only of what it knows directly, or on the basis of what it can unshakably derive from what it knows, and we say it knows all that is in the “derivation” closure of what it knows directly.Footnote 5 One important stipulation (whose role will become clear below) regarding the ideal observer is that its omniscience isn’t unbounded; specifically, it doesn’t have hypercomputational power: it can’t decide arbitrary Turing-undecidable problems.Footnote 6
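To make the closure condition slightly more vivid, a schematic rendering (in which \(\Gamma _o\) is notation we introduce here, purely for exposition, for the set of propositions the ideal observer knows directly, and \(\vdash \) is derivability in the relevant calculus) is:

$$\Gamma _o \vdash \phi \;\Rightarrow \; \mathbf {K}(o, t, \phi ).$$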

3.2 Theory-of-Mind-Creativity

In AI, the study and engineering of creative artificial agents is extensive and varied. We have already noted above that narratological creativity has been an object of study and engineering in AI. For another example, considerable toil has gone into imbuing artificial agents with musical creativity (e.g. see [20, 24]). Yet another sort of machine creativity that has been explored in AI is mathematical creativity.Footnote 7 But what these and other forays into machine creativity have in common is that, relative to the knowledge and belief present in those agents in whose midst the creative machine in question operates, the machine (if successful) performs some action that is a surprising deviation from this knowledge and belief.Footnote 8 In short, what the creative machine does is perform an action that, relative to the knowledge, beliefs, desires, and expectations of the agents composing its audience, is a surprise.Footnote 9 We refer to this generic, underlying form of creativity as theory-of-mind-creativity. Our terminology reflects the fact that for one agent to have a “theory of mind” of another agent is for the first agent to have beliefs (etc.) about the beliefs (etc.) of that other agent. An early, if not the first, use of the phrase ‘theory of mind’ in this sense can be found in [39] — but there the discussion is non-computational, based as it is on experimental psychology, entirely separate from AI. Early modeling of a classic theory-of-mind experiment in psychology, using the tools of logicist AI, can be found in [3]. For a presentation of an approach to achieving literary creativity specifically by performing actions that manipulate the intensional attitudes of readers, including actions that specifically violate what readers believe is going to happen, see [23].
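As a rough illustration of the “surprise” pattern at the heart of theory-of-mind-creativity (a sketch only; \( happens \) and \( action \) are the usual event-calculus symbols, and the particular rendering is not part of any official definition): a ToM-creative machine \(a_m\) performs at \(t'\) an action \(\alpha \) that a member \(a_h\) of its audience believed at \(t\) would not be performed; i.e., the following conjunction holds:

$$\mathbf {B}(a_h, t, \lnot happens ( action (a_m, \alpha ), t')) \;\wedge \; happens ( action (a_m, \alpha ), t').$$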

3.3 Autonomy

The term ‘autonomous’ is now routinely ascribed to various artifacts that are based on computing machines. Unfortunately, such ascriptions are — as of the typing of the present sentence in late 2016 — issued in the absence of a formal definition of what autonomy is.Footnote 10 What might a formal definition of autonomy look like? Presumably such an account would be developed along one or both of two trajectories. On the one hand, autonomy might be cashed out as a formalization of the kernel idea that agent a is autonomous at a given time t just in case, at that time, a can (perhaps at some immediate-successor time \(t'\)) perform some action \(\alpha _1\) or some incompatible action \(\alpha _2\). In keeping with this intuitive picture, if the past tense is used, and accordingly the definiendum is ‘a autonomously performed action \(\alpha _1\) at time t,’ then the idea would be that, at t, or perhaps at an immediately preceding time \(t''\), a could have, unto itself, performed alternative action \(\alpha _2\). (There may of course be many alternatives.) Of course, all of this is quite informal. This picture is an intuitive springboard for deploying formal logic to work out matters in sufficient detail to allow meaningful and substantive conjectures to be devised, and either confirmed (proof) or refuted (disproof). Doing this is well outside our purposes in the present chapter.

Our solution is a “trick” in which we simply employ a standard move long made in recursion theory, specifically in relative computability. In relative computability, one can progress by assuming that an oracle can be consulted by an idealized computing machine, and then one can ask the formal question as to what functions from \(\mathbb {N}\) to \(\mathbb {N}\) become computable under that assumption. This technique is for example used in a lucid manner in [22].Footnote 11 The way we use the trick herein is as follows. To formalize the concept of an autonomous action, we suppose,

  • first, that the action in question is performed if and only if it produces the most utility into the future for the agent considering whether to carry it out or not;

  • then suppose, second, that the utility accruing from competing actions can be deduced from some formal theoryFootnote 12;

  • then suppose, third, that a given deductive question of this type (i.e., of the general form \(\varPhi \vdash \psi (u,\alpha ,>)\)) is an intensional-logic counterpart of the Entscheidungsproblem Footnote 13;

  • and finally assume that such a question, which is of course Turing-uncomputable in the arbitrary case, can be solved only by an oracle.

This quartet constitutes the definition of an autonomous action for an artificial agent, in the present chapter.
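Compressed into a single schematic (in which \( performs \) is a placeholder predicate, u the utility function of the second supposition, and \(\varPhi \) the background formal theory), the quartet says that

$$ performs (a, \alpha , t) \;\iff \; \varPhi \vdash \psi (u,\alpha ,>),$$

where the right-hand entailment question (roughly: does \(\alpha \) yield more future utility for a than every competing alternative?) is Turing-undecidable in the arbitrary case, and hence can be settled only by consulting an oracle.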

3.4 The Deontic Cognitive Event Calculus (\(\mathcal {D}^{e}\mathcal {CEC}\))

The Deontic Cognitive Event Calculus (\(\mathcal {D}^{e}\mathcal {CEC}\)) is a sub-family within a wide family of cognitive calculi that subsume multi-sorted, quantified, computational modal logics [14]. \(\mathcal {D}^{e}\mathcal {CEC}\) contains operators for belief, knowledge, intention, obligation, and for capture of other propositional attitudes and intensional constructs; these operators allow the representation of doxastic (belief) and deontic (obligation) formulae. Recently, Govindarajulu has been developing ShadowProver, a new automated theorem prover for \(\mathcal {D}^{e}\mathcal {CEC}\) and other cognitive calculi, an early version of which is used in the simulations featured in Sect. 17.4. The current syntax and rules of inference for the simple dialect of \(\mathcal {D}^{e}\mathcal {CEC}\) used herein are shown in Figs. 17.1 and 17.2.

Fig. 17.1: \(\mathcal {D}^{e}\mathcal {CEC}\) Syntax (“core” dialect)

Fig. 17.2: \(\mathcal {D}^{e}\mathcal {CEC}\) Inference schema (“core” dialect)

\(\mathcal {D}^{e}\mathcal {CEC}\) differs from so-called Belief-Desire-Intention (BDI) logics [40] in many important ways (see [35] for a discussion). For example, \(\mathcal {D}^{e}\mathcal {CEC}\) explicitly rejects possible-worlds semantics and model-based reasoning, instead opting for a proof-theoretic semantics and the associated type of reasoning commonly referred to as natural deduction [26, 28, 33, 38]. In addition, as far as we know, \(\mathcal {D}^{e}\mathcal {CEC}\) is in the only family of calculi/logics in which desiderata regarding the personal pronoun \(I^*\) laid down by deep theories of self-consciousness (e.g., see [37]) are provable theorems. For instance, it is a theorem that if some agent a has a first-person belief that \(I_a^*\) has some attribute R, then no formula expressing that some term t has R can be proved. This is a requirement because, as [37] explains, the distinctive nature of first-person consciousness is that one can have beliefs about oneself in the complete absence of bodily sensations. For a discussion of these matters in more detail, with simulations of self-consciousness in robots, see [11].

3.5 Collaborative Situations; Untrustworthiness

We define a collaborative situation to consist in an agent a seeking, at t, a goal \(\gamma \) to be reached at some point in the future, and enlisting at t an agent \(a'\ (a \not = a')\) toward the reaching of \(\gamma \). In turn, we have:

Definition 1

\(\mathbf {enlists}\left( a, a', t\right) \): Enlisting of \(a'\) by a at t consists in three conditions holding, viz.

  • a informs \(a'\) at t that a desires goal \(\gamma \);

  • a asks \(a'\) to contribute some action \(\alpha _k\) to a sequence \(\mathcal {A}\) of actions that, if performed, will secure \(\gamma \); and

  • \(a'\) agrees.
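A schematic \(\mathcal {D}^{e}\mathcal {CEC}\)-style rendering of these three conditions (a sketch only: the request and the agreement are modeled, as one choice among several, by a communicated desire and a communicated intention, and the sequence \(\mathcal {A}\) is suppressed for readability) is the conjunction:

$$\begin{aligned}&\mathbf {S}(a, a', t, \mathbf {D}(a, t, \gamma )) \;\wedge \\&\mathbf {S}(a, a', t, \mathbf {D}(a, t, happens ( action (a', \alpha _k), t'))) \;\wedge \\&\mathbf {S}(a', a, t, \mathbf {I}(a', t, happens ( action (a', \alpha _k), t'))). \end{aligned}$$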

In order to regiment the concept of untrustworthiness (specifically the concept of one agent being untrustworthy with respect to another agent), a concept central to both PACU and TACU, we begin by simply deploying a straightforward, generic, widely known definition of dyadic trust between a pair of agents. Here we follow [18]; or more carefully put, we extract one part of the definition of dyadic trust given by this pair of authors. The part in question is the simple conditional that (here T is a mnemonic for trust, and B a mnemonic for belief)

  • T \(\rightarrow \) B If agent a trusts agent \(a'\) with respect to action \(\alpha \) in service of goal \(\gamma \), then a believes that (i) \(a'\) desires to obtain or help obtain \(\gamma \), and that (ii) \(a'\) desires to perform \(\alpha \) in service of \(\gamma \).
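Rendered schematically (with \( Trusts \) a meta-level predicate introduced here purely for exposition, and \(t'\) a future time), the conditional T \(\rightarrow \) B reads:

$$ Trusts (a, a', \alpha , \gamma ) \;\rightarrow \; \mathbf {B}\bigl (a, t, \mathbf {D}(a', t, \gamma ) \wedge \mathbf {D}(a', t, happens ( action (a', \alpha ), t'))\bigr ).$$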

We now move to the contrapositive of our conditional (i.e. to \(\lnot \) B \(\rightarrow \) \(\lnot \) T), namely that if it’s not the case that a believes that both (i) and (ii) hold, then it’s not the case that a trusts agent \(a'\) with respect to action \(\alpha \) in service of goal \(\gamma \). We shall say, quite naturally, that if it’s not the case that an agent trusts another agent with respect to an action-goal pair, then the first agent finds the second untrustworthy with respect to the pair in question. At this point, we introduce an extremely plausible, indeed probably an analytic,Footnote 14 principle, one that — so to speak — “transfers” a failure of dyadic trust between two agents a and \(a'\) to a third observing agent \(a''\). Here is the principle:

  • TRANS If rational agent \(a''\) knows that it’s counterbalancedFootnote 15 that both \(\phi \) and \(\psi \) hold, and knows as well that (if a doesn’t believe that both \(\phi \) and \(\psi \) hold it follows that a doesn’t trust \(a'\) w.r.t. \(\alpha \) in service of \(\gamma \)), and \(a''\) has no other rational basis for trusting \(a'\) w.r.t. \(\langle \alpha , \gamma \rangle \), then \(a''\) will find \(a'\) untrustworthy w.r.t. this action-goal pair.

3.6 Theorem ACU

We are now in a position to prove Theorem ACU. The proof is entirely straightforward, and follows immediately below. Note that this is an informal proof, and as such not susceptible of mechanical proof and verification. (Elements of a formal proof, which underlie our simulation experiments, are employed in Sect. 17.4.)

Theorem ACU: In a collaborative situation involving agents a (as the “trustor”) and \(a'\) (as the “trustee”), if \(a'\) is at once both autonomous and ToM-creative, \(a'\) is untrustworthy from an ideal-observer o’s viewpoint, with respect to the action-goal pair \(\langle \alpha , \gamma \rangle \) in question.

Proof: Let a and \(a'\) be agents satisfying the hypothesis of the theorem in an arbitrary collaborative situation. Then, by definition, a (where \(a \not = a'\)) desires to obtain some goal \(\gamma \) in part by way of a contributed action \(\alpha _k\) from \(a'\); \(a'\) knows this; and moreover \(a'\) knows that a believes that this contribution will succeed. Since \(a'\) is by supposition ToM-creative, \(a'\) may desire to surprise a with respect to a’s belief regarding \(a'\)’s contribution; and because \(a'\) is autonomous, attempts to ascertain whether such surprise will come to pass are fruitless, since what will happen is locked inaccessibly in the oracle that decides the case. Hence it follows by TRANS that an ideal observer o will regard \(a'\) as untrustworthy with respect to the pair \(\langle \alpha , \gamma \rangle \). QED

4 Computational Simulations

In this section, we simulate TACU in action by building up three micro-simulations encoded in \(\mathcal {D}^{e}\mathcal {CEC}\). As discussed above, \(\mathcal {D}^{e}\mathcal {CEC}\) is a first-order modal logic that has a proof-theoretic semantics rather than the usual possible-worlds semantics. This means that the meaning of a modal operator is specified using computations and proofs rather than possible worlds. This can be seen most clearly in the case of \( Proves \left( \varPhi , \phi \right) \), whose meaning is given immediately below: if \(\phi \) can be proved from \(\varPhi \), then \( Proves \left( \varPhi , \phi \right) \) can itself be proved from the empty set of assumptions.

$$\begin{aligned} \varPhi \vdash \phi \Rightarrow \{\}\vdash Proves (\varPhi , \phi ) \end{aligned}$$

4.1 ShadowProver

We now discuss the dedicated engine used in our simulations, a theorem prover tailor-made for \(\mathcal {D}^{e}\mathcal {CEC}\) and other highly expressive cognitive calculi that form the foundation of AI pursued in our lab. In the parlance of computational logic and logicist AI, the closest thing to such calculi is implemented quantified modal logics. Such logics traditionally operate via encoding a given problem in first-order logic; this approach is in fact followed by [3] in the first and simplest cognitive-event calculus used in our laboratory. A major motivation in such enterprises is to leverage decades of research and development on first-order theorem provers when building first-order modal-logic theorem provers. Unfortunately, such approaches usually lead to inconsistencies (see Fig. 17.3), unless one encodes the entire proof theory elaborately [8]; and approaches based on elaborate and complete encodings are, in our experience and that of many others, unusably slow.

Fig. 17.3: Naïve encodings lead to inconsistency

Our approach combines the best of both worlds via a technique that we call shadowing; hence the name of our automated prover: ShadowProver. A full description of the prover is beyond the scope of this chapter. At a high level, for every modal formula \(\phi ^2\) there exists a unique first-order formula \(\phi ^1\), called its first-order shadow, and a unique propositional formula \(\phi ^0\), called the propositional shadow (of \(\phi ^2\)). See Fig. 17.4 for an example. ShadowProver operates by iteratively applying modal-level rules; then converting all formulae into their first-order shadows; and then using a first-order theorem prover. These steps are repeated until the goal formula is derived, or until the search space is exhausted. This approach preserves consistency while securing workable speed.
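For concreteness, here is a minimal sketch of the loop just described, written in the same Clojure-flavored idiom as our problem encodings; the helper functions shadow, apply-modal-rules, and fo-prove? are hypothetical stand-ins for ShadowProver’s internals, not its actual API.

```clojure
;; Minimal sketch of the shadowing loop (illustrative only; shadow,
;; apply-modal-rules, and fo-prove? are hypothetical helpers).
(defn shadow-prove? [assumptions goal]
  (loop [phi (set assumptions)]
    ;; 1. apply modal-level rules to expand the formula set
    (let [phi' (into phi (apply-modal-rules phi))]
      ;; 2. convert all formulae to their first-order shadows, and
      ;; 3. hand the shadowed problem to a first-order prover
      (if (fo-prove? (map shadow phi') (shadow goal))
        true
        (if (= phi' phi)
          false            ; nothing new derived: search space exhausted
          (recur phi'))))))
```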

Fig. 17.4: Various shadows of a formula

4.2 The Simulation Proper

We demonstrate TACU (and the concepts supporting it) in action using three micro-situations. We use parts of the time-honored Blocks World (see Fig. 17.5), with three blocks, \(b_1\), \(b_2\), and \(b_3\), and two agents, \(a_1\) and \(a_2\). Initially, \(b_2\) is on top of \(b_1\). Agent \(a_1\) desires to have \(b_3\) on top of \(b_1\), and \(a_1\) knows that it is necessary to remove \(b_2\) to achieve this goal; agent \(a_2\) knows all of the foregoing. Agent \(a_1\) requests that \(a_2\) remove \(b_2\) to help achieve the goal. The simulations are cast as theorems to be proved from a set of assumptions, and are shown in Figs. 17.6, 17.7, and 17.8. The problems are written in Clojure syntax; the assumptions are written as maps from names to formulae.

Fig. 17.5: A simple blocks world
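For a flavor of the encoding, an assumption map of the sort described above might look roughly as follows; the modal-operator spellings and term names here are schematic stand-ins, and should not be read as the exact input syntax of ShadowProver or as the contents of Figs. 17.6–17.8.

```clojure
;; Schematic assumption map (illustrative only; operator spellings and
;; term names are stand-ins, not ShadowProver's exact input syntax).
{:b2-on-b1        '(On b2 b1)
 :a1-desires-goal '(Desires! a1 t (On b3 b1))
 :a1-knows-need   '(Knows! a1 t (Necessary (remove b2) (On b3 b1)))
 :a2-knows-above  '(Knows! a2 t (Knows! a1 t (Necessary (remove b2) (On b3 b1))))
 :a1-requests     '(Says! a1 a2 t (Desires! a1 t (Performs a2 (remove b2))))}
```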

In the first simulation, we define what it means for an agent to be non-autonomous, namely that such an agent performs an action in pursuit of a goal if: (1) it is controlled by another agent; (2) it believes that the controlling agent desires the goal; (3) it believes that the action is necessary for the goal; and (4) it is requested to perform the action by its controlling agent.

In this scenario, if the ideal observer can prove that the agent will perform the action for the goal based on the conditions above, the ideal observer can trust the agent.
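Schematically (with \( controls \), \( requested \), and \( necessary \) placeholder predicates standing in for the encoding actually used in Fig. 17.6), the non-autonomy rule has the shape:

$$\begin{aligned}&\bigl [ controls (a_1, a_2) \wedge \mathbf {B}(a_2, t, \mathbf {D}(a_1, t, \gamma )) \wedge \mathbf {B}(a_2, t, necessary (\alpha , \gamma )) \wedge requested (a_1, a_2, \alpha , t)\bigr ]\\&\quad \rightarrow happens ( action (a_2, \alpha ), t'). \end{aligned}$$

Given assumptions instantiating the antecedent, the ideal observer can derive the consequent, and so, by the trust conditional of Sect. 17.3.5, can trust \(a_2\).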

The second simulation is chiefly distinguished by one minor modification: The system does not know or believe that agent \(a_2\) believes that the action requested of it is necessary for the goal. In this setting, the ideal observer cannot prove that agent \(a_2\) will perform the required action. Hence, the ideal observer does not trust the agent.

The third and final simulation mirrors TACU and its proof more closely. In Simulation 3, if there is no action for which the system can prove both that \(a_1\) believes \(a_2\) will perform it and that \(a_2\) will in fact perform it, then the system cannot trust agent \(a_2\).Footnote 16 Next steps along this line, soon to come, include demonstrating these simulations in embodied robots, in real time, with a physicalized Blocks World in our lab.Footnote 17

Fig. 17.6: Simulation 1

Fig. 17.7: Simulation 2

Fig. 17.8: Simulation 3

5 Toward the Needed Engineering

The chief purpose of the present chapter has been to present the general proposition that supports an affirmative reply to the question that is the chapter’s title, and to make a case, albeit a gentle, circumspect one, for its plausibility. We consider this purpose to have been met by way of the foregoing. We end by making two rather obvious points, and reacting to each.

First, TACU is of course enabled by a number of specific assumptions, some of which will be regarded as idiosyncratic by other thinkers; indeed, we anticipate that some readers, outright skeptics, will see both PACU and TACU (and the ingredients used to prove the latter, e.g. TRANS) as flat-out ad hoc, despite the fact that both are rooted in the human psyche. For example, there are no doubt some forms of creativity that are radically different from ToM-creativity, and which therefore block the reasoning needed to obtain TACU. (We confess to being unaware of forms of creativity that in no way entail a concept of general “cognitive surprise” on the part of audiences that behold the fruit of such creativity, but at the same time it may well be that we are either under-informed or insufficiently imaginative.) The same can of course be said for our particular regimentation of autonomy. (On the other hand, our oracle-based formalization of autonomy, like ToM-creativity, seems to us to be a pretty decent stand-in for the kernel of any fleshed-out form of autonomy.) In reaction, we say that our work can best be viewed as an invitation to others to investigate whether the background PACU carries over to alternative formal frameworks.Footnote 18 We look forward to attempts on the part of others either to sculpt, from the rough-hewn PACU and its empirical support in the human sphere, formal propositions that improve upon (or perhaps mark outright rejection of) elements of TACU, or to go in radically different formal directions from the one we have propaedeutically pursued herein.

The second concluding point is that if in fact, as we believe, the background PACU is reflective of a deep, underlying conceptual “flow” from autonomy (our A) and creativity (our C) to untrustworthiness (our U), so that alternative formal frameworks,Footnote 19 once developed, would present counterparts to TACU, then clearly some engineering will be necessary in the future to protect humans from the relevant class of artificial agents: viz. the class of agents that are A and C, and which we wish to enlist in collaborative situations to our benefit.

Though the formalism that we have used to state our principle and theorem is explicitly logicist, we note that the form of the underlying AI system is not relevant to our theorem. Future explorations of this thread of research can look at more specific AI formalisms such as the AIXI formalism (see Sect. 1.3) and state similar but more specific theorems. For instance, goal reasoning systems are systems that can reason over their goals and come up with new goals for a variety of reasons (see Sect. 3.7). Johnson et al. argue in Sect. 3.7 that trust in such situations must also include trust in the system’s ability to reason over goals. We assert that this adds support to our contention that PACU is reflective of a deep, underlying conceptual “flow” from autonomy (our A) and creativity (our C) to untrustworthiness (our U).

If we assume that the future will bring not only artificial agents that are A and C, but agents that are powerful as well, the resulting U in these agents is a most unsettling prospect. Our view is that while TACU, as expressed and proved, is by definition idiosyncratic (not everyone in the AI world pursues logicist AI, and not everyone who does uses our cognitive calculi), it is symptomatic of a fundamental vulnerability afflicting the human race as time marches on, and as the A and C in AI agents continue to increase in tandem with an increase in the power of these agents.Footnote 20

So what should be done now to ensure that such a prospect is controlled to humanity’s benefit? The answer, in a nutshell, is that ethical and legal control must be in force that allows autonomy and creativity in AI systems (since it seems both undeniable and universally agreed that both A and C in intelligent machines have the potential to bring about a lot of good, even in mundane and easy domains like self-driving vehicles) to be developed without endangering humanity.Footnote 21 The alert and observant reader will have noticed that \(\mathcal {D}^{e}\mathcal {CEC}\) includes an obligation operator O (see again Figs. 17.1 and 17.2), and this operator would need to be used to express binding principles that say that violating the desires of humans under certain circumstances is strictly forbidden (i.e. it ought/O to be that no machine violates the desires of humans in these circumstances). For how to do this (using the very same cognitive calculus, \(\mathcal {D}^{e}\mathcal {CEC}\), used in our three simulations), put in broad strokes, see for instance [6, 29] in our own case,Footnote 22 and the work of others who, fearing the sting of future intelligent but immoral machines, also seek answers in computational logic (e.g. [36]).