Turing Interrogative Games
DOI: 10.1007/s11023-011-9245-z
Abstract
The issue of adequacy of the Turing Test (TT) is addressed. The concept of Turing Interrogative Game (TIG) is introduced. We show that if some conditions hold, then each machine, even a thinking one, loses a certain TIG and thus an instance of TT. If, however, the conditions do not hold, the success of a machine need not constitute a convincing argument for the claim that the machine thinks.
Keywords
Turing test, Logic of questions, Interrogative games, Recursion theory
The Turing Test
Despite its age, the Turing Test (TT) still raises new issues, and the dispute over it is far from closed; see, for example, a recent collection of papers (Epstein et al. 2009). One of the central topics in this area is the adequacy of TT as a tool for investigating whether artificial agents think.^{1} Our aim is to address this issue by means of a formal approach.
Papers devoted to TT are most often philosophical in nature and in the methods used. Notable exceptions among recent papers are: the Turing machine approach (Sato and Ikegami 2004), an exploration of the computational complexity of the TT setting (Hernandez-Orallo 2000), and the application of interactive proof theory to modelling TT (Bradford and Wollowski 1995; Shieber 2006, 2007). In our paper we make use of some elementary concepts of recursion theory and of a recent result in the (metatheory of the) logic of questions (cf. Wiśniewski and Pogonowski 2010). Our approach differs from those mentioned not only in the formal method used. We attempt to stay as close to Turing’s original proposal as possible. To achieve this we first focus on the core of Turing’s proposal and then, on this basis, introduce the concept of a Turing Interrogative Game (TIG for short). We do not claim, however, that TT can be performed only via TIG(s). What we claim is: once a TIG is performed, an instance of TT takes place. A comparison of the basic properties of TIGs (which are characterized in the section "Turing Interrogative Games") with the features of TT described in this section justifies this claim.

TT resembles the so-called imitation game, but the analogy is not complete. The imitation game involves three players: a man (A), a woman (B), and an interrogator; the objective of the interrogator is to identify which of the players, A and B, is a man, and which is a woman. The objective of an interrogator in TT is different: he/she aims at identifying a machine that thinks. Therefore only two parties are needed: an interrogator (sometimes referred to as a judge or a jury)^{3} and a tested agent. In his seminal paper “Computing Machinery and Intelligence” Turing uses the term viva voce for the version of the game with player B omitted (cf. Turing 1950, p. 446). In his later works Turing straightforwardly refers to TT as a two-party enterprise. For example, in “Digital Computers\(\ldots\)” one reads:
“I am imagining something like a viva-voce examination, but with the questions and answers all typewritten in order that we need not consider irrelevant matters as the faithfulness with which the human voice can be imitated.” (Turing 1951a, pp. 4–5)

The interrogator is not supposed to know the identity of the tested agent. The parties cannot see or hear each other. As Turing puts it:
“A considerable proportion of jury, who should not be expert about machines, must be taken in by the pretence. They aren’t allowed to see the machine itself—that would make it too easy. So the machine is kept in a far away room and the jury are allowed to ask it questions, which are transmitted through to it: it sends back a typewritten answer.” (Newman et al. 1952, p. 4)

The game is played by means of questions and answers. It is only the interrogator who asks questions. The second party never asks questions, but is supposed to answer them. This feature plays an important role in our paper, so the claim deserves a fuller justification. First, we find some indirect evidence in “Computing Machinery\(\ldots\)”. TT resembles the imitation game, and in the latter questions are asked only by the interrogator, whereas answers are given by the players A and B (cf. Turing 1950, p. 433). Second, the examples of dialogues given by Turing in “Computing Machinery\(\ldots\)” show that questions are asked only by the interrogator (cf. Turing 1950, pp. 434–435 and p. 446). Finally, in “Can Automatic Calculating Machines\(\ldots\)” one reads:
“The idea of the test is that the machine has to try and pretend to be a man, by answering questions put to it, and it will only pass if the pretence is reasonably convincing.” (Newman et al. 1952, p. 4)

The aim of a tested AI agent is to mislead the interrogator in such a way that he/she is unable to make an accurate identification (cf. Turing 1950, p. 434). The agent should attempt to answer questions in a human-like manner, and can even use some tricks to achieve this, e.g. it can make mistakes in calculations, spelling mistakes, etc.
“Likewise the machine would be permitted all sorts of tricks so as to appear more man-like, such as waiting a bit before giving the answer, or making spelling mistakes\(\ldots\)” (Newman et al. 1952, p. 5)

An interrogator is free in his/her choice of questions asked; Turing does not impose any restrictions here. He writes in “Computing Machinery\(\ldots\)”:
“The question and answer method seems to be suitable for introducing almost any one of the fields of human endeavor that we wish to include.” (Turing 1950, p. 435)
In “Can Automatic Calculating Machines\(\ldots\)” one reads:
“BRAITHWAITE: Would the questions have to be sums, or could I ask it what it had had for breakfast?
TURING: Oh yes, anything. And the questions don’t really have to be questions, any more than questions in a law court are really questions. You know the sort of thing. “I put it to you that you are only pretending to be a man” would be quite in order.” (Newman et al. 1952, p. 5)

An agent should be tested long enough to yield reliable results. As is clearly stated in “Can Automatic Calculating Machines\(\ldots\)”:
“We had better suppose that each jury has to judge quite a number of times, and that sometimes they really are dealing with a man and not a machine. That will prevent them saying “It must be a machine” every time without proper consideration.” (Newman et al. 1952, p. 5)

The main objective of TT is to differentiate between AI agents that think and AI agents that do not. In this paper we, purposely, do not address the issue of what concept of thinking, if any, is involved in the TT’s setting. Discussions on this issue are still open (cf. e.g. Copeland 2000; Moor 1976). Turing himself writes:
“I don’t want to give a definition of thinking, but if I had to I should probably be unable to say anything more about it than that it was a sort of buzzing that went on inside my head. But I don’t really see that we need to agree on a definition at all. The important thing is to try to draw a line between the properties of a brain, or of a man, that we want to discuss, and those that we don’t.” (Newman et al. 1952, pp. 3–4)
Generally speaking, what matters for an AI agent in TT is to evince relevance in a dialogue at the same level as humans do (cf. Ginzburg 2010). It is claimed that the success of an AI agent would provide good reasons to believe that the AI agent thinks (see Stalker 1976).^{4} It is presupposed that an AI agent that thinks is able to pass the test. It is assumed that the result of TT is neither predetermined by the structure of the test nor by the fact that the tested party is an AI agent.
Turing Interrogative Games
In this section we introduce the concept of Turing Interrogative Game (TIG). TIGs display the key features of TT (in its viva voce version) specified in the previous section. However, for simplicity we plainly assume that the second party of a TIG is a machine, and we impose some conditions which explicate and/or supplement the TT rules.
TIGs are characterized as follows.
1. There are two parties, Int (after ‘Interrogator’) and M (after ‘Machine’), which play a TIG.
“I have K at my K1, and no other pieces. You have only K at K6 and R at R1. It is your move. What do you play?” (Turing 1950, p. 435)
One can use more than one sentence to express a condition. In such a case the condition is a conjunction of the sentences used.
Specifying a condition is not mandatory in the original TT setting. Yet the following justifies our stipulation that each question asked in a TIG is to be accompanied by a condition. First, one can argue that questions asked in TT but not accompanied by explicit conditions are, by default, questions about the actual state of the world or the domain just investigated. In TIGs, we make this feature visible by formulating the relevant condition. Second, TT is a kind of dialogue, and a question in a dialogue that pertains to a given topic sometimes arises from previous claims about the topic, answers to previous questions (if any) included. This can be made explicit in TIGs; the condition accompanying a raised question is made up of the relevant items of information.
2. As in the case of TT, we assume that Int is free in the choice of questions and conditions. This mirrors the active role of an interrogator in TT. Moreover, M is not permitted to ask any question.
3. It is assumed that for each question Q and each condition γ there exist(s) some correct answer(s) to Q with respect to γ. However, we do not claim that, given a question Q and a condition γ, there exists exactly one correct answer to Q with respect to γ. In some cases there may be more such answers. Consider e.g.:
Take arithmetic. Which prime is greater than 7?
Moreover, we assume that any answer to Q that is correct with respect to some condition(s) is either a direct answer to Q or a sentence which says ‘there is no correct answer’; we express the latter by \(\odot\). By a direct answer to a question we mean an expression which is a possible and just-sufficient answer to the question, i.e. gives neither more nor less information than is called for by the question itself.^{5} We use dQ for the set of all the direct answers to question Q.
Who was the last king of the United States of America?
Suppose that this question is asked and accompanied by a condition to the effect that it pertains to the real world. Now it becomes a contextually ‘tricky’ question, and the answer of the form ‘there is no correct answer’ seems to be the correct one in the context and within a TIG. In an everyday conversation the above question would probably be responded to with a comment or a clarification request, but since in TT and TIGs a tested agent is not permitted to ask questions, contextually ‘tricky’ questions are to be correctly answered by ‘there is no correct answer’. The second reason for which we need \(\odot\) in TIGs is the following. According to (2) above and (5) below, Int is free in the choice of questions and conditions. So, in principle, Int can ask a question which is not relevant with regard to the condition just set, or can set a condition which is not relevant to the question just asked. In both cases the answer which is correct with respect to the condition is ‘there is no correct answer’ and for this reason we need \(\odot\). Here is an illustration:
Take arithmetic. Does Pegasus exist?
Finally, we do not forejudge that for each question Q, the set dQ of direct answers to Q exists and is nonempty. Another application of \(\odot\) is: we assume that if dQ is empty or no direct answer to Q is defined/exists, then \(\odot\) is the unique correct answer to Q with respect to any condition.^{6}
4. We also assume that the following hold:

\((\spadesuit)\) if β is a direct answer to Q, then there exists a condition \(\gamma_{\beta}\) such that β is the unique correct answer to Q with respect to \(\gamma_{\beta}\),

\((\clubsuit)\) there exists a condition ϑ such that \(\odot\) is the only correct answer to Q with respect to ϑ.
5. We assume that a TIG is played in a language such that: (i) the set of expressions of the language includes declarative sentences (sentences for short) and questions, (ii) expressions of the language can be coded by natural numbers, i.e. there exists a coding method according to which each sentence is coded by a unique natural number, and similarly for questions, (iii) the set of sentences of the language is denumerable and at least recursively enumerable (r.e. for short), and (iv) the set of questions of the language is r.e. (By ‘denumerable’ we mean, here and below, ‘countably infinite’.) Moreover, \(\odot\) is supposed to occur in the language.
Clause (ii) allows us to use tools taken from (classical) recursion theory.
Each question of the language can occur in TIGs, and each sentence of the language can be used to set a condition in some TIG(s). We also assume that each sentence of the language is a direct answer to some question(s) of the language.
Since sentences and questions have unique numerical codes, in what follows we will sometimes disregard the distinction between the relevant expressions and (their) numerical codes. In particular, this pertains to the considerations below.
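Clause (ii) only requires that some effective coding exists. As a minimal sketch, assuming a hypothetical finite alphabet (the paper does not fix one), expressions can be coded by bijective base-k numeration, which assigns a unique natural number to each string and can be effectively inverted. The names `ALPHABET`, `encode` and `decode` are illustrative only.

```python
# Hypothetical finite alphabet; the paper does not fix one.
ALPHABET = "abcdefghijklmnopqrstuvwxyz ?.'"


def encode(expr: str) -> int:
    """Map a string over ALPHABET to a unique natural number
    (bijective base-k numeration; the empty string is coded by 0)."""
    k = len(ALPHABET)
    code = 0
    for ch in expr:
        # Each character contributes a "digit" in the range 1..k.
        code = code * k + ALPHABET.index(ch) + 1
    return code


def decode(code: int) -> str:
    """Inverse of encode: recover the string from its numerical code."""
    k = len(ALPHABET)
    chars = []
    while code > 0:
        code, r = divmod(code - 1, k)
        chars.append(ALPHABET[r])
    return "".join(reversed(chars))
```

Because the coding is a bijection between strings and natural numbers, decoding a code always returns the expression that was encoded; this is what licenses disregarding the distinction between expressions and their codes.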
6. M is a machine and thus proceeds only algorithmically. On the other hand, M is supposed to provide those answers to questions asked by Int which are correct with respect to conditions that are set by Int. Without loss of generality we may assume that the correct answers (if any) provided by M are outputs of question-resolving algorithms which are accessible to M; the outputs are sent to Int.^{7} For our purposes it is convenient to view algorithms as partial recursive functions.
Let \(\Upphi\) be the set of sentences of the language in which a TIG is played, and let \(\Uppsi\) be the set of questions of the language. Domain and range of a function ϕ will be designated by dom(ϕ) and rng(ϕ), respectively.
By a question-resolving algorithm (qr-algorithm for short) we mean a partial recursive function ϕ such that \(dom(\phi) \subseteq \Uppsi \times \Upphi\), and for any \((Q, \gamma) \in dom(\phi)\): ϕ(Q, γ) is a correct answer to Q with respect to γ.^{8}
Let us stress that qr-algorithms do not define what is the correct answer in a given context; this is already pre-established.
We say that a qr-algorithm ϕ pertains to question Q if \((Q, \beta) \in dom(\phi)\) for some \(\beta \in \Upphi\). Similarly, a qr-algorithm ϕ pertains to sentence γ if \((Q, \gamma) \in dom(\phi)\) for some \(Q \in \Uppsi\).
A qr-algorithm ϕ is accessible to M if M is able to compute the values of ϕ for the whole domain of ϕ. We assume that the class of qr-algorithms that are accessible to M is nonempty and finite.^{9} We use \(\Upsigma_{{\bf M}}\) for the set of all the qr-algorithms accessible to M, and \(Rng(\Upsigma_{{\bf M}})\) for the union of ranges of all the functions in \(\Upsigma_{{\bf M}}\). Clearly, \(Rng(\Upsigma_{{\bf M}})\) is r.e., as a union of a finite number of r.e. sets.
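The closing observation, that a union of finitely many r.e. sets is r.e., can be illustrated by a round-robin enumeration. The sketch below is only an illustration: it assumes that each range is already given by an enumerator (here a Python generator), and the two sample ranges `rng_phi1` and `rng_phi2` are hypothetical, not taken from the paper.

```python
from itertools import count, islice
from typing import Iterable, Iterator, List


def enumerate_union(enumerators: List[Iterable[str]]) -> Iterator[str]:
    """Round-robin over finitely many enumerators, so that every element of
    every enumerated set is eventually produced (duplicates are skipped)."""
    active = [iter(e) for e in enumerators]
    seen = set()
    while active:
        still_active = []
        for it in active:
            try:
                item = next(it)
            except StopIteration:
                continue  # this enumerator is exhausted; drop it
            still_active.append(it)
            if item not in seen:
                seen.add(item)
                yield item
        active = still_active


# Hypothetical ranges of two qr-algorithms accessible to M.
def rng_phi1() -> Iterator[str]:
    for n in count(0):
        yield f"{2 * n} is even."  # an infinite range


def rng_phi2() -> Iterator[str]:
    yield "⊙"  # a finite range


first = list(islice(enumerate_union([rng_phi1(), rng_phi2()]), 4))
```

Taking one step of each enumerator in turn guarantees that an infinite range cannot starve the others; this would fail if the enumerators were simply run one after another.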
M wins a round iff M provides a certain expression that is a correct answer to the question asked with regard to the condition set; otherwise M loses the round and Int wins the round. Int loses a round if M wins the round. Observe that M can lose a round in two ways: (a) the expression provided is not an answer of the required kind, or (b) no output is provided.
Int wins the game iff there is a round of the game which is lost by M; otherwise Int loses the game and M wins the game. M loses the game if M loses a round of the game.
We say that a TIG \(\tau^{\prime}\) extends a TIG τ if \(\tau^{\prime}\) comprises, besides the rounds already played in τ, also some new round(s).
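The rules for rounds, games and extensions stated above can be sketched in a toy model. Everything concrete here is an assumption made for illustration: the correctness relation `CORRECT` is a small finite table standing in for the pre-established notion of correct answer, the machine's qr-algorithms in `SIGMA_M` are finite lookup tables rather than arbitrary partial recursive functions, and the short answer strings are placeholders for sentences.

```python
from typing import Dict, List, Optional, Set, Tuple

Round = Tuple[str, str]  # (question, condition), both chosen by Int
NO_CORRECT_ANSWER = "⊙"  # stands for 'there is no correct answer'

# Hypothetical, pre-established correctness relation (not defined by M).
# Toy finite fragment: in arithmetic there are infinitely many such primes.
CORRECT: Dict[Round, Set[str]] = {
    ("Which prime is greater than 7?", "Take arithmetic."): {"11", "13"},
    ("Does Pegasus exist?", "Take arithmetic."): {NO_CORRECT_ANSWER},
}

# Hypothetical machine: finitely many qr-algorithms, here lookup tables.
SIGMA_M: List[Dict[Round, str]] = [
    {("Which prime is greater than 7?", "Take arithmetic."): "11"},
]


def machine_output(rnd: Round) -> Optional[str]:
    """Output of the first accessible algorithm defined on rnd, else None."""
    for phi in SIGMA_M:
        if rnd in phi:
            return phi[rnd]
    return None  # case (b): no output is provided


def m_wins_round(rnd: Round) -> bool:
    """M wins a round iff it provides a correct answer for that round."""
    out = machine_output(rnd)
    return out is not None and out in CORRECT.get(rnd, set())


def int_wins_game(rounds: List[Round]) -> bool:
    """Int wins the game iff M loses at least one round."""
    return any(not m_wins_round(r) for r in rounds)


tau = [("Which prime is greater than 7?", "Take arithmetic.")]
tau_ext = tau + [("Does Pegasus exist?", "Take arithmetic.")]  # extends tau
```

In this toy setting M wins `tau` but loses the extension `tau_ext`, because none of its tables is defined on the added round.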
A general comment is in order now. TIGs preserve the key features of TT specified in the first section of this paper. But we do not claim that TT can be performed only via TIG(s).^{10} We only claim that once a TIG is performed, an instance of TT takes place. This, however, is sufficient for our purposes.
The Trap
One may expect that the success of a machine in a TIG depends only on how “strong” the machine is, that is, what qr-algorithms are accessible to the machine. However, in this section we will show that there are systematic reasons for which each machine loses some TIG(s).
We need some auxiliary notions; they will be introduced step by step.
A question is called effective iff the set of (all the) direct answers to the question is nonempty and r.e. By an ω-question we mean a question whose set of direct answers is a denumerable (i.e. countably infinite) set of sentences.
Proposition 1
Each TIG won by a machine, but played in a language in which there occurs at least one non-effective ω-question, has an extension which is won by the interrogator.
Proof
Let Q be a non-effective ω-question. Hence dQ (i.e. the set of direct answers to Q) is nonempty (actually, denumerable), but not r.e. Thus the set \(dQ \cup \{\odot \}\) is not r.e. either; otherwise dQ, which differs from it by at most one element, would be r.e. as well.
Let \(\Updelta_{Q}\) be the set of all the functions satisfying the above conditions and defined by means of the qr-algorithms that pertain to Q and are accessible to M. Let \(Rng(\Updelta_{Q})\) be the union of ranges of all the functions in \(\Updelta_{Q}\). Thus \(Rng(\Updelta_{Q})\) equals, intuitively speaking, the set of all the possible outputs with respect to Q of the qr-algorithms pertaining to Q which are accessible to M. Clearly, \(Rng(\Updelta_{Q}) \subset dQ \cup \{ \odot \}\). Yet \(Rng(\Updelta_{Q})\) is r.e., as a union of a finite number of r.e. sets (recall that only finitely many qr-algorithms are accessible to M). On the other hand, \(dQ \cup \{ \odot \}\) is not r.e. Hence there exists an element, β, of \(dQ \cup \{ \odot \}\) which does not belong to \(Rng(\Updelta_{Q})\).
A remark is in order. The fact that β does not belong to \(Rng(\Updelta_{Q})\) only means that there is no qr-algorithm accessible to M that gives β on an input consisting of Q and (any!) condition. Of course, β can be an output of a certain qr-algorithm on an input involving a question different from Q, but this is unimportant to the argument to follow.
There are two possibilities: (i) \(\beta \in dQ\) or (ii) \(\beta = \odot\).
Let τ be a TIG won by M.
Suppose that (i) holds. By \((\spadesuit)\), there exists a condition \(\gamma_{\beta}\) such that β is the only correct answer to Q with respect to \(\gamma_{\beta}\). But Int is free in the choice of questions and conditions, and any question of the language may occur in a TIG. Recall that although each game is finite, there is no upper limit on the number of rounds that pertains to all games. So Int is permitted to extend τ step by step, and each extension will be a TIG which is an extension of the initial game τ. Hence there exists a TIG \(\tau^{\prime}\) which is an extension of τ and whose last round involves \(\gamma_{\beta}\) and Q. But there is no qr-algorithm accessible to M which generates β out of a condition and Q. On the other hand, β is the correct answer to Q with respect to \(\gamma_{\beta}\). So M loses the round and thus Int wins the game \(\tau^{\prime}\).
Now suppose that (ii) holds. The situation is analogous, due to \((\clubsuit)\). \(\square\)
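The core of the argument can be summarized schematically. The display below merely restates steps already given in the proof, in the same notation, and adds no new assumptions:

```latex
\begin{align*}
& Rng(\Updelta_{Q}) \subset dQ \cup \{\odot\}, \\
& Rng(\Updelta_{Q}) \text{ is r.e.}, \qquad dQ \cup \{\odot\} \text{ is not r.e.}, \\
& \text{hence } Rng(\Updelta_{Q}) \neq dQ \cup \{\odot\}, \\
& \text{hence there is } \beta \in \bigl(dQ \cup \{\odot\}\bigr) \setminus Rng(\Updelta_{Q}).
\end{align*}
```

Depending on whether \(\beta \in dQ\) or \(\beta = \odot\), condition \((\spadesuit)\) or \((\clubsuit)\) then supplies a condition under which β is the only correct answer to Q, and this is the round that M loses.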
Remark 1

Int always wins in the long run, but in order to achieve this Int need not know in advance which condition and/or which question will trigger the effect.

What is crucial is the presence of a non-effective ω-question. Recall that a non-effective ω-question has direct answers (actually, denumerably many of them), but the trouble is that the set of direct answers to the question is not r.e. One may expect that Int’s asking a question “obscure” enough so that no direct (i.e. possible and just-sufficient) answer to it is defined/exists automatically results in M’s failure. But we have blocked such easy wins. According to what has been said in the section "Turing Interrogative Games", footnote 6, the correct answer to an “obscure” question, with respect to any condition, is \(\odot\), and qr-algorithms can mimic this.

M loses the last round of the extension by providing no output. By the way, this raises an interesting question: how long should an interrogator wait in order to conclude that there will be no answer at all?

Of course, the continuity of conversation can be retained by some tricks, e.g. by allowing the machine to respond with questions and/or clarification requests. This, however, would not change the general picture: a round is lost anyway.
Remark 2
So far we have considered unrestricted TIGs. The situation would be different if we imposed an upper limit on the number of rounds in TIGs. Now for games of maximal permitted length won by M there are no extensions won by Int. But the question arises: what is the limit? Moreover, it is obvious that, due to the fact that M proceeds only algorithmically (with the consequences of this fact, see (6) above) and the properties of the language, M would not be able to win each game of a maximal permitted length.
By an \(\Upomega\)-sentence we mean a sentence of the form:
‘Q’ is an ω-question.
where Q is a question.
A language permits \(\Upomega\)-sentences if for each question of the language, the corresponding \(\Upomega\)-sentence belongs to the language.
Proposition 2
Each TIG won by a machine, but played in a language that permits \(\Upomega\)-sentences and whose set of ω-questions is not r.e., has an extension which is won by the interrogator.
Proof
The set \(Rng(\Upsigma_{{\bf M}}) \cap \Upgamma_{\Uppsi}\) is r.e., as an intersection of r.e. sets. However, the set \(\Upgamma_{\Uppsi_{\omega}}\) is not r.e., for if it were, \(\Uppsi_{\omega}\) would be r.e. Hence \(Rng(\Upsigma_{{\bf M}}) \cap \Upgamma_{\Uppsi} \ne \Upgamma_{\Uppsi_{\omega}}\). Since \((\heartsuit)\) holds, it follows that there exists an \(\Upomega\)-sentence belonging to \(\Upgamma_{\Uppsi_{\omega}}\) which is not an output of any qr-algorithm accessible to M that gives \(\Upomega\)-sentences as outputs. But each sentence, the relevant \(\Upomega\)-sentence included, is a direct answer to some question, and, by \((\spadesuit)\), for each direct answer to a question there exists a condition such that the answer is the unique correct answer to the question with respect to the condition. So the relevant \(\Upomega\)-sentence is the unique correct answer to some question with respect to a certain condition. On the other hand, there is no qr-algorithm accessible to M which generates the \(\Upomega\)-sentence out of the question and the condition. Now we reason similarly as in the proof of Proposition 1. \(\square\)
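Therefore we get:
Proposition 3
Let a TIG won by a machine be played in a language such that at least one of the following holds:
(1) the language includes some non-effective ω-question(s),
(2) the language permits \(\Upomega\)-sentences, but the set of ω-questions of the language is not r.e.
Then the TIG has an extension which is won by the interrogator.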
Thus, generally speaking, if the game is played in a language that fulfils any of the above conditions, a machine always loses to an interrogator stubborn enough. This is the good news. But there is also the bad news: each machine loses, regardless of whether the machine thinks or not. This is The Trap.
What is the Price of Getting Out From the Trap
At first sight a way out of The Trap seems simple: perform TIG(s) in a language that does not have the properties (1) and (2) specified above. Yet this is an ad hoc strategy, and ad hoc strategies are never satisfactory. Moreover, and this is more important, adopting it gives rise to some new difficulties, which we point out in this section.
Suppose that a TIG and its extensions are played in a language L* which, besides the conditions imposed in the section "Turing Interrogative Games" (cf. (5) above), satisfies the following conditions as well: the set of sentences of L* is recursive, and L* permits \(\Upomega\)-sentences. L* can be viewed as a (formalization of a) fragment of a natural language or as a formalized language.
One can prove the following:
Theorem 1

\((\dag)\) each infinite recursive set of sentences of L is the set of direct answers to some interrogative of L

\((\dag \dag)\) At least one infinite recursive set of sentences of L* is not the set of direct answers to any question of L*.
Theorem 2
(Wiśniewski and Pogonowski 2010) Let L be a language such that: (a) among expressions of the language there are sentences and interrogatives, (b) both sentences and interrogatives of L can be coded by natural numbers, (c) the set of sentences of L is denumerable and recursive, and (d) the set of all effective ω-interrogatives of L is r.e. There exists an infinite family of infinite recursive sets of sentences of L which are not sets of direct answers to any interrogative of L.
Thus if each ω-question of L* is effective, and the set of ω-questions of L* is r.e., the set of all effective ω-questions of L* is r.e. and therefore there are denumerably many infinite recursive sets of sentences of L* which are not sets of direct answers to questions of L*.
(*) Which formula of the language of Peano Arithmetic is undecidable?
(**) Which English sentence has the same meaning as the Polish sentence ‘Śnieg jest biały’?

(∇) Which ξ is such that \(\varsigma\)?
TT is supposed to be run in a natural language. If the above analysis is correct, we are justified in saying that the analogue of condition \((\dag)\) of Theorem 1 holds for a natural language. Clearly natural languages permit \(\Upomega\)-sentences. Thus a natural language (or a part of it) which, in addition, satisfies the assumptions of Theorem 1 displays at least one of the undesired properties, (1) or (2), specified in the antecedent of Proposition 3. The initial assumptions of Theorem 1, in turn, express desired properties of a language of TT. Therefore The Trap is inescapable.
Yet applying logical concepts and theorems to natural languages is always somewhat risky. So let us be more cautious.

(∇) Which ξ is such that \(\varsigma\)?
Conclusions
Let us conclude. The main objective of TT is to differentiate between machines that think and machines that do not: it is presupposed that a thinking machine is able to pass the test, and if a machine passes TT, there are good reasons to believe that it thinks. TT can (although need not) be performed in the form of TIGs. We have shown that, given some conditions, each machine loses a certain TIG, and thus so does each thinking machine (in any rational sense of the word). But once a TIG is performed, an instance of TT takes place. So TT does not reach its main objective as long as the conditions hold. The conditions characterize some properties of a language in which a TIG/TT is played/performed. If, however, they do not hold, one can argue that the success of a machine does not constitute a convincing argument for the claim that the machine thinks, because there exist denumerably many well-defined issues which are a priori banned from the game.
Notes
We prefer ‘thinks’ over ‘is intelligent’, since the concept of intelligence is currently used in a way that does not presuppose the presence of mental processes.
For an overview of the discussion on TT rules, see e.g. Copeland and Proudfoot (2009), Saygin et al. (2001).
Although, it seems, it is not claimed that a failure provides an argument for saying that a tested agent, either a machine or a man, does not think. The (abductive!) reasoning goes from ‘an agent passes the test’ to ‘the agent thinks’.
At first sight this may seem non-intuitive. However, as we will see in "The Trap" (cf. Remark 1), this blocks a possibility of Int’s unfair success, i.e. a success achieved by asking a question which cannot be answered with a direct answer due to the lack of such answer(s).
In order to get a more realistic picture one can assume that there is some intermediate device which “translates” the conditions and questions provided by Int into their numerical codes and “translates” the outputs sent by M into expressions of the language. However, this assumption is inessential for the claims of this paper.
Recall that there may exist many answers which are correct with respect to a given condition. In such a case there may exist many qr-algorithms pertaining to the question and the condition; of course, qr-algorithms pertaining to a given question agree at the “point” referred to by \((\spadesuit)\). Let us stress that we do not presuppose that direct answers to a question are always mutually exclusive.
Note that this does not imply that M can cope only with finitely many questions, or that questions with infinite sets of direct answers cannot be dealt with!
For example, an analysis of Turing’s writings devoted to TT shows that he uses the term ‘question’ somewhat loosely. In particular, requests for a manifestation of a complex linguistic activity (“Write a poem!”, or “I put it to you that you are only pretending to be a man.”, etc.) can be issued in TT by an interrogator. However, we do not model this in TIGs. We also simplified the answerhood issue. One can argue that answers allowed in TT include replies of kinds which are not taken into account in TIGs. Moreover, the second party of a TIG is plainly supposed to be a machine.
Open Access
This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.