Varieties of Self-Reference in Metamathematics

This paper investigates the conditions under which diagonal sentences can be taken to constitute paradigmatic cases of self-reference. We put forward well-motivated constraints on the diagonal operator and the coding apparatus which separate paradigmatic self-referential sentences, for instance obtained via Gödel’s diagonalization method, from accidental diagonal sentences. In particular, we show that these constraints successfully exclude refutable Henkin sentences, as constructed by Kreisel.

Logicians routinely make use of definite descriptions of sentences: By "the Gödel sentence" we mean the sentence stating its own unprovability, by "the Henkin sentence" the sentence asserting its own provability, by "the liar sentence" the sentence stating its own falsity, and so on.
In each case, the use of the definite article is questionable. In the case of "the Gödel sentence", it is obvious that there are many different ways of defining an unprovability predicate ¬Bew(x) from which the Gödel sentence is constructed. The formula Bew(x) will depend on the particular theory and its language, on the chosen Gödel coding, and then on how the provability predicate is defined relative to that coding. The problems of choosing a reasonable coding relative to the theory under consideration, and of then defining a suitable provability predicate relative to the coding and theory, are well known, thoroughly studied, and increasingly better understood.
When logicians talk about the Gödel sentence, they often assume that a language and a sound theory T (or family of theories) have been fixed, and that a "natural" provability predicate has been chosen. They often assume that this ensures that Bew(x) satisfies the Löb derivability conditions. The usual justification for speaking about the Gödel sentence is then that all diagonal or fixed-point sentences of ¬Bew(x) are provably equivalent. That is, if T is some suitable system of arithmetic, T ⊢ ϕ_1 ↔ ¬Bew(⌜ϕ_1⌝) and T ⊢ ϕ_2 ↔ ¬Bew(⌜ϕ_2⌝) imply T ⊢ ϕ_1 ↔ ϕ_2. Moreover, all these sentences are provably equivalent to the consistency statement for T. Provable equivalence, however, is a very coarse-grained kind of equivalence, too coarse-grained to justify the definite article in "the Gödel sentence".
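For readers who want the details, the standard argument behind this equivalence claim can be sketched as follows. We write D1-D3 for the usual Löb derivability conditions and Con for ¬Bew(⌜0=1⌝); the labels and the numbering of the steps are ours, not notation fixed by this paper. Suppose T ⊢ ϕ ↔ ¬Bew(⌜ϕ⌝).

```latex
\begin{align*}
&\text{(a)}\ T \vdash 0{=}1 \to \varphi
  && \text{logic} \\
&\text{(b)}\ T \vdash \mathrm{Bew}(\ulcorner 0{=}1\urcorner) \to \mathrm{Bew}(\ulcorner \varphi\urcorner)
  && \text{from (a) by D1, D2} \\
&\text{(c)}\ T \vdash \varphi \to \mathrm{Con}
  && \text{fixed point, contraposition of (b)} \\
&\text{(d)}\ T \vdash \mathrm{Bew}(\ulcorner\varphi\urcorner) \to \neg\varphi
  && \text{fixed point} \\
&\text{(e)}\ T \vdash \mathrm{Bew}(\ulcorner\varphi\urcorner) \to \mathrm{Bew}(\ulcorner\neg\varphi\urcorner)
  && \text{from (d) by D1, D2, D3} \\
&\text{(f)}\ T \vdash \mathrm{Bew}(\ulcorner\varphi\urcorner) \to \mathrm{Bew}(\ulcorner 0{=}1\urcorner)
  && \text{from (e) by D1, D2, since } T \vdash \varphi \to (\neg\varphi \to 0{=}1) \\
&\text{(g)}\ T \vdash \mathrm{Con} \to \varphi
  && \text{contraposition of (f), fixed point}
\end{align*}
```

Since any two fixed points of ¬Bew(x) are thus provably equivalent to Con, they are provably equivalent to each other.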
Finding a diagonal sentence of ¬Bew(x) is not trivial, and this might give the impression that these diagonal sentences are all mere trivial variations of each other. However, this is not the case. Whether a sentence is a diagonal sentence of ¬Bew(x) is not decidable; in fact, the set of diagonal sentences of any formula is undecidable [10, Observation 2.2]. Of course, the provable equivalence of all diagonal sentences of ¬Bew(x) means that we do not have to distinguish between them as long as we are only interested in their provability and in those of their properties that are analyzable in standard (propositional) provability logic. However, if this were taken as a justification for talking about the Gödel sentence, one could equally talk about the theorem of Peano arithmetic, as there is only one such theorem up to provable equivalence.
In the case of "the Henkin sentence", the problems are more blatant: By Löb's theorem, all diagonal sentences of the provability predicate are provable and thus provably equivalent to each other and to every theorem of Peano arithmetic. Clearly, they are not just trivial variations of each other, and the definite description "the Henkin sentence" cannot apply to them.
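The proof of Löb's theorem underlying this claim can be sketched as follows, with notation as above; δ is any fixed point of Bew(⌜·⌝) → ψ, and the step numbering is ours. Assume T ⊢ Bew(⌜ψ⌝) → ψ.

```latex
\begin{align*}
&\text{(1)}\ T \vdash \delta \leftrightarrow (\mathrm{Bew}(\ulcorner\delta\urcorner) \to \psi)
  && \text{diagonalization} \\
&\text{(2)}\ T \vdash \mathrm{Bew}(\ulcorner\delta\urcorner) \to
   \mathrm{Bew}(\ulcorner \mathrm{Bew}(\ulcorner\delta\urcorner) \to \psi\urcorner)
  && \text{from (1) by D1, D2} \\
&\text{(3)}\ T \vdash \mathrm{Bew}(\ulcorner\delta\urcorner) \to
   (\mathrm{Bew}(\ulcorner\mathrm{Bew}(\ulcorner\delta\urcorner)\urcorner) \to \mathrm{Bew}(\ulcorner\psi\urcorner))
  && \text{from (2) by D2} \\
&\text{(4)}\ T \vdash \mathrm{Bew}(\ulcorner\delta\urcorner) \to
   \mathrm{Bew}(\ulcorner\mathrm{Bew}(\ulcorner\delta\urcorner)\urcorner)
  && \text{D3} \\
&\text{(5)}\ T \vdash \mathrm{Bew}(\ulcorner\delta\urcorner) \to \mathrm{Bew}(\ulcorner\psi\urcorner)
  && \text{from (3), (4)} \\
&\text{(6)}\ T \vdash \mathrm{Bew}(\ulcorner\delta\urcorner) \to \psi
  && \text{from (5) and the hypothesis} \\
&\text{(7)}\ T \vdash \delta,\ \text{hence}\ T \vdash \mathrm{Bew}(\ulcorner\delta\urcorner)
  && \text{from (6) by (1); then D1} \\
&\text{(8)}\ T \vdash \psi
  && \text{from (6), (7)}
\end{align*}
```

For a Henkin sentence ψ with T ⊢ ψ ↔ Bew(⌜ψ⌝), the hypothesis of the theorem holds trivially, so T ⊢ ψ.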
If formulas other than provability predicates satisfying the Löb conditions are considered, the definite article is even less plausible, because diagonal sentences of a given formula may behave in very different ways and, in particular, fail to be provably equivalent. For instance, we can look at the truth teller sentence constructed from the Σ_n-truth predicate Tr_n(x) for some n ≥ 1. The formula Tr_n(x) itself is a Σ_n formula. Applying some canonical diagonalization procedure to Tr_n(x) thus yields a Σ_n-sentence (see [10] for a more detailed discussion). The point of a truth predicate for a class C of sentences is that all sentences in C are fixed-point sentences. In particular, 0 = 0 and 0 = 1 will be diagonal sentences of any partial truth predicate Tr_n(x). Thus, if we are interested in the question whether the Σ_n-truth teller sentence is provable, refutable, or independent for a given n, we have to ask about very specific diagonal sentences, in contrast to the case of Gödel and Henkin sentences with a provability predicate satisfying the Löb conditions.

Self-Reference
In the case of Σ_n-truth we cannot dodge the question of what the truth teller sentence is by proving a general theorem about all diagonal sentences of the Σ_n-truth predicate, because all Σ_n-sentences are fixed points, and some are provable while others are not. The same applies to provability predicates for which the Löb derivability conditions may fail. Such provability predicates need not be highly contrived: we can consider cut-free provability or Rosser provability. We may also consider further formulas expressing other properties, possibly formulated in proper extensions of the language of arithmetic, for instance with a primitive truth or necessity predicate.
We do not expect that we can narrow down the class of diagonal sentences until only one sentence is left that deserves to be called "the sentence asserting its own P", where P is the property expressed by the formula. However, we may still hope to narrow down the class so that all remaining sentences behave in the same way. We may even hope, for some formulas, to narrow down the class to a point where all remaining sentences are so similar (given certain restrictive choices) that we are warranted in using the singular and talking about the sentence asserting its own P. Perhaps we will never arrive at a suitable class of sentences, but rather realize that the status of the sentences depends on accidental choices in the coding or in the definition of the formula in some haphazard way. In other cases we may arrive at a fairly stable result, although we may not be as lucky as in the case of canonical provability with the Löb conditions, where all diagonal sentences behave in the same way; but we may be able to establish a result that applies to a sufficiently interesting class of diagonal sentences.
In the case of the formula ∃y (SS0 × y = x), expressing that x is even, for instance, there is little hope of obtaining a stable result about sentences stating their own evenness. Even the status of diagonal sentences obtained in some canonical way will depend on the coding and the method of diagonalization in a haphazard way. "Metatheoretic" properties such as provability and truth will generally yield more stable results. By metatheoretic properties we mean here properties expressible in some non-arithmetized metatheory, such as the theories described in [9]; but we do not attempt to make this distinction sharp here.
Continuing to ask about self-referential sentences, even when the equivalence of all fixed points fails, can lead to important insights. Ironically, the most striking example is the discussion that led up to the discovery of Löb's theorem. Kreisel [16] replied to Henkin's [13] question: "[. . .] the answer to Henkin's question depends on which formula is used to 'express' the notion of provability". Kreisel regretted that he did not keep asking. Löb, in contrast, continued asking and proved his celebrated theorem [18] by imposing further restrictions on the formula expressing provability. Hence, by specifying such restrictions on how a property may be expressed and how self-reference is obtained, one can achieve definitely noteworthy results.
In the present paper we do not delve into the intricacies of what it means for a formula of arithmetic to express some property P; instead, we ask on which diagonal sentences we should focus once a formula expressing P is given. These fixed-point sentences should be self-referential.
Defining what it means for a sentence to be self-referential is notoriously difficult. Self-reference may be thought to be reducible to aboutness by the following definition: a sentence is self-referential if, and only if, it is about itself. But then presumably the sentence ∀x x = x is self-referential, because it states the self-identity of everything, including the sentence itself. Halbach and Leigh [9] and Picollo [20] provide more comments on self-reference via quantification.¹ Following [10] and [12], we consider only self-reference via a closed term. That is, to be self-referential, more precisely to ascribe a property to itself, a sentence must contain a closed term that refers to that sentence.²
Before we provide a precise definition of self-reference via terms, some technical preliminaries are in order. Let L_0 contain the logical symbols =, ¬, ∧, ∀ together with the constant symbol 0, the unary function symbol S and the binary function symbols + and ×. Let L be an effective extension of L_0 that contains a function symbol for each p.r. function and may also contain further constants, function symbols or predicates which we do not specify explicitly. Let T be a consistent recursively enumerable L-theory which contains R together with all true identities of closed L-terms.
The name of a string of L is given by a numbering and a numeral function. We call an injective and effective function which maps L-expressions to numbers a numbering, and we write # for standard numberings. We call an injective function ν : ω → ClTerm that maps each number to a closed L-term with that number as its value a numeral function. A numbering α and a numeral function ν induce a naming function ⌜−⌝, which is the composition ν ∘ α. In order to make α and ν explicit, we sometimes write ⌜ϕ⌝_{α,ν} for ⌜ϕ⌝. If ν is the standard numeral function, i.e., ν(n) = n̄ for all n ∈ ω, we sometimes also write ⌜ϕ⌝_α. Finally, ≡ denotes syntactic identity between expressions.

Definition 1.1 (Kreisel-Henkin Criterion for Self-Reference) Under a specific coding, a sentence says of itself that it has the property expressed by the formula ϕ(x) if, and only if, it is of the form ϕ(t), where t is a closed term that has the code of ϕ(t) as its value, that is, a closed term t satisfying the condition that t = ⌜ϕ(t)⌝ is true.

Therefore, if t refers to ϕ(t), that is, if it has the code of ϕ(t) as its value, T will prove t = ⌜ϕ(t)⌝. Of course, T ⊢ t = ⌜ϕ(t)⌝ implies T ⊢ ϕ(t) ↔ ϕ(⌜ϕ(t)⌝), and ϕ(t) is thus also a diagonal sentence of ϕ(x).

¹ Perhaps different forms of self-reference via quantification are somehow reducible to self-reference via a closed term. In standard first-order logic, quantifiers range over the entire domain, and restrictions are expressed in Frege's way with connectives. Frege's insight that the binary quantifiers of syllogistic logic are expressible with a unary quantifier makes the notion of aboutness difficult to capture. Here we remain agnostic about self-reference via quantification and concentrate on the Kreisel-Henkin criterion below.
² For discussions and applications of the criterion see [9][10][11].
We say that a sentence has the Kreisel-Henkin property if, and only if, it is of the form ϕ(t) and ascribes to itself the property expressed by ϕ(x) in the sense of the Kreisel-Henkin criterion. We also call such a term t a fixed-point term of ϕ(x).

Accidental Self-Reference
In the presence of suitable function symbols, Gödel's diagonalization method enables us to find a suitable closed term t_ϕ for each ϕ(x) such that t_ϕ = ⌜ϕ(t_ϕ)⌝. "Suitability" needs to be understood relative to the coding scheme used. However, instead of using a systematic method for arriving at diagonal sentences that satisfy the Kreisel-Henkin criterion for any given ϕ(x), we could also try a brute-force method: enumerate all closed terms of the language and browse through them until we have found the first term t_ϕ with t_ϕ = ⌜ϕ(t_ϕ)⌝. We may strike it lucky, and the first such t_ϕ could be one that would also have been generated by a systematic method; but we could also stumble upon some diagonal sentence with the Kreisel-Henkin property "accidentally". We could even rig the game and set up the coding in such a way that there is an easy-to-find t_ϕ. We can even use numerals S · · · S0 as the only closed terms and use the coding schema from [26]. The question is whether all these diagonal sentences with the Kreisel-Henkin property are also obtainable with the usual Gödel diagonalization or some similar systematic method and, if there are other such diagonal sentences, whether they differ in their properties from the diagonal sentences obtained by some reasonable systematic method.
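The difference between the systematic route and the rigged route can be played through in a toy model. Everything below (the string syntax, `code_of`, `val`, the choice of even codes) is an illustrative stand-in for the real coding apparatus, not notation from this paper:

```python
# Toy illustration: the Kreisel-Henkin property only asks that the closed
# term t denote the code of phi(t).  The coding here is a mutable table,
# so codes can either be assigned generically or stipulated ("rigged").

codes = {}                                  # expression -> number

def code_of(e: str) -> int:
    if e not in codes:
        codes[e] = 2 * (len(codes) + 100)   # fresh even code, >= 200
    return codes[e]

def val(t: str) -> int:                     # closed terms are numerals num(k)
    return int(t[len("num("):-1])

def kreisel_henkin(phi: str, t: str) -> bool:
    sentence = phi.replace("x", t)          # the sentence phi(t)
    return val(t) == code_of(sentence)

# Rig the game: stipulate that the odd number 7 is the code of the very
# sentence built from the term num(7).  Injectivity is preserved, since
# all generically assigned codes are even.
codes["~Bew(num(7))"] = 7
assert kreisel_henkin("~Bew(x)", "num(7)")      # an easy-to-find t
assert not kreisel_henkin("~Bew(x)", "num(4)")  # generic terms fail
```

The stipulated entry plays the role of a coding tweak; finding such a term without tampering with the coding is precisely what Gödel's diagonalization method accomplishes.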
Let us call a fixed-point sentence not obtained by a systematic method an accidental diagonal sentence. Of course, this is not (yet) a precise definition, and we have given no evidence that there are indeed any accidental diagonal sentences.³ Before trying to make the distinction between accidental and non-accidental diagonal sentences precise, we provide examples of clearly accidental fixed-point sentences that behave in ways that are very different from those generated by the usual systematic methods.
If we ask about the Σ_n-truth teller or the Henkin sentence, we will select, from the diagonal sentences of a given predicate, a sentence that ascribes to itself the relevant property by the Kreisel-Henkin criterion. But we may then still be left with accidental and non-accidental diagonal sentences with rather different properties. In this case we would select those obtained by some systematic method and not the accidental diagonal sentences. If there is such a thing as the Σ_n-truth teller or the Henkin sentence, it will have been arrived at by a systematic method, not by some quirk in the coding or by some clever trick that does not work generally but only for the predicate in question.
Accidental diagonal sentences may satisfy the Kreisel-Henkin criterion because of some very specific feature of the formula that is being diagonalized. An accidental diagonal sentence may also be chosen in a very ad hoc way in order to obtain a specific result. However, if we ask about self-referential sentences such as truth teller sentences, we are more interested in the properties of diagonal sentences that have been constructed in a straightforward way than in those obtained by some trickery.
In this paper we do not attempt to provide a thorough defence of the preference for non-accidental fixed-point sentences. Before one can enter that discussion, we need to show that it is possible to come up with a precise distinction; we also need to provide examples of accidental diagonal sentences with the Kreisel-Henkin property that behave differently from all non-accidental ones.
First we provide some examples of accidental diagonal sentences taken from the literature. The first example is Kreisel's [16] refutable Henkin sentence.⁴ Of course, Kreisel had to employ a deviant provability predicate. He claimed that whether the Henkin sentence is provable or not depends on the way provability is expressed. However, it also depends on how the formula expressing provability is diagonalized. Only accidental diagonal sentences of Kreisel's provability predicate are refutable; those obtained in a systematic and uniform way are provable, as we are going to show.
Observation 2.1 Let Bew(x) be a provability predicate that weakly represents provability, and let t be a closed term such that t = ⌜t ≠ t ∧ Bew(t)⌝ (such a t can be found by applying diagonalization to the formula x ≠ x ∧ Bew(x)). Define another predicate Bew_K(x) to be the following one:

Bew_K(x) :≡ x ≠ t ∧ Bew(x)

Then Bew_K(x) also weakly represents provability, and Bew_K(t) is a refutable sentence stating its own provability with respect to the Kreisel-Henkin criterion.
According to the definition, Bew_K(t) ≡ t ≠ t ∧ Bew(t); hence, by the assumption on t, we have t = ⌜Bew_K(t)⌝. This shows that t is also a fixed-point term with respect to Bew_K(x), and obviously Bew_K(t) is refutable.
Intuitively, the deviant Henkin sentence Bew_K(t) is not the result of some systematic fixed-point construction. Rather, t is already contained in Bew_K(x) and merely "happens" to be its fixed-point term [11, p. 701].
Alternatively, we can start from a provability predicate Bew(x) satisfying the Löb derivability conditions. It is not hard to see that applying the usual canonical diagonalization method to Bew_K(x) yields a term s distinct from t, and that the resulting Henkin sentence Bew_K(s) is provable [10, Observation 4.1].
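Kreisel's trick can be mimicked in the same kind of toy setting. The syntax and the stipulated coding below are illustrative stand-ins; in particular, the diagonalization that produces t in the real construction is short-circuited here by an explicit stipulation:

```python
# Toy version of the Kreisel construction: the predicate Bew_K(x) contains
# its own (future) fixed-point term t, and the coding is set up so that t
# indeed names Bew_K(t).  The first conjunct t != t then makes Bew_K(t)
# refutable, although the Kreisel-Henkin property holds.

t = "num(9)"
bew_k = f"x != {t} & Bew(x)"            # Bew_K(x) := x != t & Bew(x)
sentence = bew_k.replace("x", t)        # Bew_K(t) == t != t & Bew(t)

codes = {sentence: 9}                   # stipulation: t names Bew_K(t)

def val(term: str) -> int:              # closed terms are numerals num(k)
    return int(term[len("num("):-1])

assert val(t) == codes[sentence]           # Kreisel-Henkin property
assert sentence.startswith(f"{t} != {t}")  # first conjunct is refutable
```

The point carried over from the real construction is structural: t occurs inside the predicate itself, so its fixed-point status is an artefact of that occurrence rather than the output of a uniform diagonal operator.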
We can generalize the above method of obtaining fixed-point sentences:

Observation 2.2 Let t be a fixed-point term of ϕ(x), let n be the number of free occurrences of x in ϕ(x), and let S be a proper subset of these occurrences. Let ϕ^t_S be the formula that results from ϕ(x) by substituting t for the occurrences of x in S. Then t is also a fixed-point term of ϕ^t_S(x).

Proof Since S is a proper subset, ϕ^t_S still contains at least one free occurrence of x and is thus still a formula with the single free variable x. According to our definition, it is easy to see that ϕ(t) ≡ ϕ^t_S(t). This in particular means that the codes of ϕ^t_S(t) and ϕ(t) coincide. Since t satisfies the Kreisel-Henkin criterion with respect to ϕ(x), we also have t = ⌜ϕ^t_S(t)⌝. Hence, t is also a fixed-point term of ϕ^t_S(x).
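The observation can be checked mechanically in a toy syntax; the strings below are illustrative, with `replace` playing the role of substitution:

```python
# phi(x) has two free occurrences of x.  Substituting the fixed-point term t
# for a proper subset S of them (here: only the first occurrence) yields a
# different formula phi_S whose diagonal sentence phi_S(t) is literally
# identical to phi(t), so t is a fixed-point term of phi_S as well.

phi = "Bew(x) & Tr(x)"
t = "num(3)"

phi_S = phi.replace("x", t, 1)          # substitute only the 1st occurrence
assert phi_S == "Bew(num(3)) & Tr(x)"   # still one free occurrence of x

# same sentence, hence same code, hence the same fixed-point term works
assert phi_S.replace("x", t) == phi.replace("x", t)
```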
Kreisel's original refutable Henkin sentence in [16] (not Henkin's simplified version Bew_K(t) above) can be obtained from this observation with n = 2.
Observation 2.2 has some crucial implications for the question whether the Kreisel-Henkin criterion alone can be used as a sufficient condition for genuine self-reference. Firstly, if for different subsets S the formulas ϕ^t_S express different syntactical properties, then according to the Kreisel-Henkin criterion the sentence ϕ(t) self-ascribes several different properties. The number of different proper subsets, in other words the number of different self-ascribed properties, is equal to 2^n − 1, which grows exponentially. Secondly, for most proper subsets S, applying the usual diagonal construction directly to the formula ϕ^t_S will in general yield a closed term t_S different from t, and the two sentences ϕ^t_S(t_S) and ϕ^t_S(t) will in general not be provably equivalent either. In this sense, ϕ(t) is an accidental fixed point for most of these formulas ϕ^t_S.

We now introduce examples of accidental diagonal sentences which result from contrived codings. In particular, we provide a counterpart of Observation 2.1 on the level of Gödel numberings. That is, by suitably tweaking the coding, we obtain a deviant provability predicate such that the fixed-point property of the resulting Henkin sentence is directly implemented in the coding. As in the case of Kreisel's provability predicate above, only the accidental diagonal sentences thus obtained are refutable, while those constructed in a systematic and uniform manner are provable.

Observation 2.3
Let # be a standard numbering such that each #-code is positive and even. Let Bew(x) be a provability predicate that weakly represents provability, i.e., for every sentence ψ, T ⊢ ψ iff T ⊢ Bew(⌜ψ⌝_#). We assume that T ⊢ ¬Bew(n̄) for each odd n (e.g., this holds for Feferman's [5] standard provability predicate). Let 1̄ :≡ S0 and let the numeral of n + 1 be (n̄ + S0). Let m be the smallest odd number such that the numeral m̄ does not occur in Bew(x).⁵ We now change our old standard numbering # to a new numbering α defined as follows:

α(e) := m, if e ≡ Bew(m̄); α(e) := #e, otherwise.

Then Bew(x) weakly represents provability relative to α, i.e., Bew(x) weakly represents the set of α-codes of theorems. Moreover, Bew(m̄) is a refutable sentence which states its own provability with respect to the Kreisel-Henkin criterion.
Intuitively, the refutable Henkin sentence Bew(m̄) is accidental: its fixed-point term m̄ is not obtained by a systematic method, but rather by a contrived numbering which is specifically tailored for this purpose. Even if Bew(x) is a canonical provability predicate w.r.t. the numbering #, Bew(x) does not satisfy Löb's conditions w.r.t. the numbering α.⁶ As in the case above, assume that Bew(x) satisfies Löb's conditions w.r.t. the numbering #. Let s be a fixed-point term of Bew(x) which is obtained by the usual canonical diagonalization method relative to the coding α, that is, with T ⊢ Bew(s) ↔ Bew(⌜Bew(s)⌝_α). It is then easy to see that s is different from m̄, and hence that Bew(s) is a provable Henkin sentence.
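The coding tweak can likewise be played through in miniature. The byte-based numbering below is a toy stand-in for a standard numbering with positive even codes; the names and syntax are illustrative:

```python
# A "standard" numbering with positive even codes, and a tweak alpha that
# agrees with it everywhere except on one sentence Bew(m-bar), which is
# assigned the odd number m itself.  Under alpha, the numeral num(m) names
# Bew(num(m)): a Kreisel-Henkin fixed point built directly into the coding.

def hash_code(e: str) -> int:                  # toy standard numbering #
    return 2 * int.from_bytes(e.encode(), "big")

m = 5                                          # an odd number
target = f"Bew(num({m}))"                      # the sentence Bew(m-bar)

def alpha(e: str) -> int:
    return m if e == target else hash_code(e)

assert alpha(target) == m                      # num(m) now names Bew(num(m))
# injectivity is preserved: every other alpha-code is even, m is odd
assert all(alpha(e) % 2 == 0 for e in ["0=0", "Bew(num(4))", "~Bew(x)"])
```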
Numberings which are designed to immediately provide fixed points with the Kreisel-Henkin property are sometimes said to have "built-in diagonalization". Paradigmatic examples of such numberings are so-called "self-referential numberings" [8, Definition 3.3]. It is to be expected that results about axiomatic theories of truth are the most stable, because their axioms are formulated relative to a fixed coding, while defined notions such as the usual provability predicate are highly relative to the coding. Heck [12, p. 14ff] showed that even axiomatic theories of truth are sensitive to the chosen coding (and the language). Again, one has to be very careful about what an axiomatic truth theory is, independently of a fixed coding. For semantic, nonclassical theories of truth, sensitivities to the coding had already been observed in [2]. See also [22, Section 2.2] and [8, Section 9] for more recent examples of intensionality with respect to truth theories resulting from numberings with built-in diagonalization. Whether all numberings with built-in diagonalization yield accidental fixed points is a delicate question which we will briefly address in Section 8.2.

Plan of the Paper
The main goal of this paper is to make the distinction between accidental and non-accidental fixed-points precise. The basic idea is that non-accidental fixed-points are constructed in a uniform way. A precise notion of uniformity is introduced in Section 3, where we also show that the canonical fixed-point constructions found in the literature are uniform in our sense. In the remainder of the paper, we examine the extent to which the uniformity constraint rules out accidental fixed-points.
We start with Kreisel's construction as a paradigmatic case of an accidental fixed-point. According to our analysis, this fixed-point construction is accidental since Kreisel's provability predicate Bew_K(x) already contains its own fixed-point term. We ask whether, and more generally under which additional assumptions, the uniformity requirement rules out the possibility that a predicate contains its own fixed-point term (Question 3.6).
In Section 4, we show that uniformity alone is not sufficient to exclude refutable Henkin sentences which contain their own fixed-point terms. Rather, accidental fixed-points can also result from contrived choices of the numbering or the numeral function. However, in Section 5 we show that Kreisel-like constructions can be successfully excluded by 1) requiring uniformity of the diagonal operator and 2) requiring the numbering and the numeral function to induce a non-circular weak naming relation. As we argue in Section 6, the constraint of well-foundedness, which implies non-circularity, is natural and well-motivated. This provides a satisfactory answer to Question 3.6 and completes the main part of the paper.
In Section 7, we introduce a new construction of refutable Henkin sentences which are accidental, but do not contain their own fixed-point terms. We show that the constraints of uniformity and the non-circularity of the weak naming relation taken together do not rule out this construction, but uniformity plus the well-foundedness of the weak naming relation do. In Section 8, we provide a different metamathematical context in which the uniformity constraint successfully singles out non-accidental fixed-points. Moreover, we briefly address the question whether all numberings with built-in diagonalization yield accidental fixed-points. Finally, in Section 9 we extract some conclusions.

Uniformity
In this section the distinction between accidental and non-accidental fixed points is made precise. The non-accidental fixed points are obtained in a uniform and systematic way. These systematic methods can be extracted from the usual textbook proofs of the diagonal lemma; they apply uniformly to all formulas. The precise definition of a uniform diagonal construction allows us to distinguish the non-accidental fixed-point constructions, including the usual Gödel diagonal method and the like, from the more accidental ones such as Kreisel's. The use of uniformity to distinguish the two kinds of fixed-point constructions can be traced back to [11]. But the definition of uniformity given there is defective, since, according to it, Gödel's diagonal construction would not be uniform.
It is our task here to give a more adequate definition of uniformity and to provide an extensive study of its implications for self-reference. Several considerations motivate and shape our formulation of uniformity below. First of all, as already mentioned, the intuition behind uniformity is that a uniform construction should not produce a fixed point by exploiting very specific syntactical features of the formula it diagonalizes; it should diagonalize all formulas with a designated free variable by similar means.
Secondly, since the notion of uniformity essentially restricts the class of constructions we are allowed to perform on syntactical objects, it is natural to define it recursively: we specify basic operations that are uniform in a very intuitive sense, and a uniform construction is then a finite composition of these basic uniform operations. Besides requiring the basic operations to be intuitively uniform, we also want them to be operations that can be carried out in a syntax theory, e.g., in the sense of Halbach and Leigh [9]. This reflects our general view of the subject: if an arithmetical sentence is to refer to a syntactical object at all, then in constructing such a sentence we must mimic what we can do in a syntax theory. The operations provided in a syntax theory include substitution, quotation, and concatenation. As will soon become clear, these are indeed the basic constructions we allow, except for a modification of concatenation, which links to our final motivation.
The final consideration is that we want our constructions always to yield well-defined syntactical objects. This leads us to a typed approach to defining uniformity. The unrestricted form of concatenation does not always result in well-formed formulas or terms; we have therefore replaced concatenation with three collections of well-typed operations, associated with logical connectives, function symbols, and predicates. We also distinguish operations which differ only with respect to their domain or codomain, such as the substitution or naming functions. As we will see below, these distinctions permit a conceptually more refined introduction of the basic constructions and lead to technically important results (see Lemma 5.9).
After spelling out all these motives, we now provide the precise formulation of the notion of uniformity. Since our aim is to analyse sentences that self-ascribe properties expressible by unary predicates, we restrict ourselves to terms and formulas with a single (designated) free variable x. Let Fml_x and Term_x be the sets of all L-formulas and L-terms, respectively, which contain at most x as a free variable. As usual, for any set A and n ≥ 1, let A^n denote the Cartesian product A × · · · × A consisting of n factors; A^1 is simply A. We set A^0 := 1, where 1 denotes a designated singleton set, which remains fixed throughout this paper. Let * denote the unique element of 1, i.e., 1 = {*}. To be precise, the binary product is not strictly associative; the products (A × B) × C and A × (B × C) are only canonically isomorphic. The fact that we have these canonical isomorphisms, and that they interact in a coherent way, justifies the usual sloppy way of writing A × · · · × A and permits us to freely view A^n as A^m × A^l whenever m + l = n. Precisely speaking, however, we will take A^n to be the product A × (· · · (A × (A × A)) · · · ). This level of precision will only affect the precise form of Definition 3.1 below and the material in Section 5, where a more careful treatment is needed. Elsewhere in this paper we will suppress this level of precision, as usual.
The following meta-linguistic operations will serve as the basic constituents of uniform constructions:⁷

(1) Two meta-linguistic substitution functions Sub_f : Fml_x × Term_x → Fml_x and Sub_t : Term_x × Term_x → Term_x. Given any formula ϕ(x) ∈ Fml_x and any term t(x) ∈ Term_x, application of Sub_f yields the result of substituting the term t(x) for x in ϕ(x), i.e., Sub_f(ϕ(x), t(x)) ≡ ϕ(t(x)). Similarly, for any two terms t(x) and s(x), Sub_t(t(x), s(x)) ≡ t(s(x)).
(2) The naming functions for formulas and terms, ⌜−⌝_f : Fml_x → Term_x and ⌜−⌝_t : Term_x → Term_x.⁸
(3) Given any n-ary logical connective ∘ (for quantifiers we only consider ones that bind x), the meta-linguistic function ∘ : Fml_x^n → Fml_x with (ϕ_1(x), . . . , ϕ_n(x)) ↦ ∘(ϕ_1(x), . . . , ϕ_n(x)). In our language L, ∘ ranges over {¬, ∧, ∀x}.
(4) Given any n-ary function symbol f of L,⁹ the meta-linguistic function f : Term_x^n → Term_x with (t_1(x), . . . , t_n(x)) ↦ f(t_1(x), . . . , t_n(x)).
(5) Given any n-ary predicate symbol R of L, the meta-linguistic function R : Term_x^n → Fml_x with (t_1(x), . . . , t_n(x)) ↦ R(t_1(x), . . . , t_n(x)).
(6) The function x : 1 → Term_x which introduces the variable term x, where x sends the unique element * of 1 to the term x.

⁷ It will be evident from the definition below that uniformity applies more generally to a wider range of languages, which we do not consider in this paper.
⁸ Recall from Section 1.2 that the naming functions ⌜−⌝ are induced by a numbering and a numeral function and can be applied to any string. However, for reasons which will become clear at a later stage of this paper, it is useful to distinguish naming functions for well-formed formulas and terms respectively (cf. Section 5). When this distinction is not important, or when we consider a naming function for all strings, we also write ⌜−⌝ as usual, without a subscript.
⁹ We identify constants with 0-ary function symbols.
While the functions given in (1) and (3)-(6) are fixed, we treat the naming function ⌜−⌝ as a parameter which has to be specified. To be fully precise and explicit, we call the above functions basic operations containing ⌜−⌝. We define uniform functions based on such a class as follows:

Definition 3.1 Let D be the smallest collection of sets containing 1, Fml_x and Term_x which is closed under binary products. Let A, B be sets in D. A function f : A → B is called uniform for ⌜−⌝ if it is contained in the smallest class of functions that includes the C-basic functions containing ⌜−⌝, namely the basic operations (1)-(6) above together with the canonical maps induced by the Cartesian product structure, and which is closed under composition and under forming product maps; that is, if f : A → B and g : C → D′ are in the class, then so are g ∘ f (whenever the composition is defined) and f × g : A × C → B × D′.

If we want to be explicit about both the numbering α and the numeral function ν that constitute the naming function ⌜−⌝, we also say that a function is uniform for ν ∘ α. If ν is the standard numeral function that takes n to its numeral n̄ for all n ∈ ω, we also omit mentioning it explicitly and say that a function is uniform for α. Of course, if the naming function is implicitly understood as determined by the context, we will often suppress this parameter. In fact, the explicit version will only play a major role in Section 8.2.

Now suppose we have fixed a naming function ⌜−⌝. Note that all the canonically induced functions associated with the Cartesian product structure are uniform in the above sense. Given any two uniform functions f : A → B and g : C → D′, the induced function f × g : A × C → B × D′ is uniform by definition. All identity functions on sets in D are also uniform. A simple induction on D shows this fact: the identity functions on 1, Fml_x and Term_x are uniform (the identity function on 1 is !_1). Given A, B in D such that id_A and id_B are uniform, the identity function on A × B can be expressed as id_A × id_B; hence id_{A×B} is uniform.
The associators, i.e., the canonically induced isomorphisms (A × B) × C ≅ A × (B × C), are uniform. So are the canonical isomorphisms between A, A × 1 and 1 × A. The inverses of these canonical isomorphisms are uniform as well, which we leave for the reader to check. Since all these canonical maps are uniform, we are free to use them in the remaining parts of this paper, and we will usually not mention them explicitly, as is common when dealing with Cartesian products, except in Section 5 where a more careful treatment is needed.
Importantly, none of the functions belonging to the C-basic class makes any distinctions regarding its input; each obtains its result in an intuitively uniform way. The recursive definition of uniformity above thus captures this intuitive sense of uniformity. This finally leads us to the definition of a uniform diagonal operator. Given a function u with codomain Fml_x or Term_x, we say that u is closed if im u ⊆ Sent or im u ⊆ ClTerm, where im u denotes the image of u and Sent and ClTerm denote the sets of sentences and closed terms respectively. For example, both ⌜−⌝_f and ⌜−⌝_t are closed.
Definition 3.2 A uniform diagonal operator is a diagonal operator which is uniform (for some naming function) in the sense of Definition 3.1.
Note that the definition of a diagonal operator depends on the chosen coding and the interpretation of the language. Since we only consider theories which prove all true identity statements between closed terms, instead of requiring that the identity dϕ = ⌜ϕ(dϕ)⌝ is provable in the given theory, we could equivalently require that dϕ = ⌜ϕ(dϕ)⌝ is true with respect to the given interpretation.

Remark 3.3
In this paper we focus on self-reference via a term, but the definition of uniformity we gave can also be applied to study "weak" diagonal sentences that do not satisfy the Kreisel-Henkin condition and thus to languages lacking the required (or indeed any) function symbols. In particular, we can define the uniformity of such a weak diagonal operator d : Fml x → Fml x along similar lines. For instance, the diagonal operator which underlies the diagonal lemma introduced in [3, §35] will of course be uniform.
We now show that the canonical diagonalization methods found in the literature are uniform. More specifically, we show that Gödel's standard diagonal construction (which can be found e.g. in Smoryński [23]), Jeroslow's [15] diagonal operator and some further methods of diagonalization are all uniform. We thereby hope to convince the reader that our definition sufficiently captures the intuitive sense of a uniform diagonal construction.

Gödel's Construction
Let sub_G be the function symbol representing the primitive recursive function that takes (the code of) a formula ϕ(x) and (the code of) an expression e, and outputs (the code of) the formula obtained by substituting e for x in ϕ(x). By definition, the characteristic equations for sub_G are provable. Here is an explicit construction of Gödel's diagonal operator d_G: first note that, using the function symbol sub_G, we can construct the term sub_G(x, x). Composing the basic operations then yields d_G: if we unwrap the definition, we obtain for any input ϕ(x) ∈ Fml_x a sequence of constructions whose last step delivers the desired fixed-point term d_G(ϕ(x)) for ϕ(x). This shows that the usual Gödel construction is uniform.
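The effect of Gödel's construction can be mimicked at the level of strings. The following Python sketch is our own illustration, not the paper's formal apparatus: `code` stands in for a Gödel numbering, the string `#n` for the numeral of n, `SUB` for the function represented by sub_G, and `Provable(x)` is a placeholder formula.

```python
# Illustrative string-level analogue of Goedel's diagonal construction.
# code/decode play the role of a Goedel numbering, "#n" of the numeral
# of n, and SUB of the substitution function represented by sub_G.

def code(e: str) -> int:
    return int.from_bytes(e.encode(), "big")

def decode(n: int) -> str:
    return n.to_bytes((n.bit_length() + 7) // 8, "big").decode()

def name(n: int) -> str:
    # the "numeral" of n: a closed term denoting n (contains no "x")
    return "#" + str(n)

def SUB(a: int, b: int) -> int:
    # substitute the numeral of b for the free variable x in the
    # formula coded by a, and return the code of the result
    return code(decode(a).replace("x", name(b)))

# Diagonalization: from phi(x), form psi = phi(sub(x, x)), then
# substitute psi's own numeral into psi.
phi = "Provable(x)"
psi = phi.replace("x", "sub(x, x)")   # phi(sub(x, x))
N = code(psi)
S = decode(SUB(N, N))                 # the diagonal sentence

# S is phi applied to the very term sub(#N, #N), and that term
# denotes the code of S itself:
assert S == phi.replace("x", "sub({0}, {0})".format(name(N)))
assert SUB(N, N) == code(S)
```

The two assertions express indirect self-reference: the sentence S contains a term which denotes the code of S, rather than a quotation of S.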

Jeroslow's Diagonal Operator
We now reconstruct Jeroslow's [15] diagonalization method as a uniform diagonal operator. To begin with, we observe that there is a binary function symbol sub_J which represents the primitive recursive function mapping (the code of) a formula ϕ(x) and (the code of) a term t(x) to (the code of) ϕ(t(⌜t(x)⌝)). It follows that the term sub_J(⌜ϕ(x)⌝, ⌜sub_J(⌜ϕ(x)⌝, x)⌝) satisfies the Kreisel-Henkin criterion with respect to ϕ. Let d_J denote Jeroslow's diagonal operator, which maps ϕ(x) to this term. Chasing the basic operations involved in this construction shows that d_J is uniform: 11 starting with ϕ(x), we obtain a sequence of constructions whose last step once again delivers the desired fixed-point term d_J(ϕ(x)) for ϕ(x). Hence, the diagonal operator d_J is uniform.

Note that the construction of d_J is based on the basic functions ⌜−⌝_f, ⌜−⌝_t and sub_J. In particular, we have not used Sub_f or Sub_t, while the construction of Gödel's diagonal operator requires the use of Sub_f. This is reflected in the resulting fixed-point terms: d_J(ϕ) only contains names of ϕ(x), but not of expressions of the form ϕ(s) with s ≢ x, while d_G(ϕ) contains a name of ϕ(sub_G(x, x)), which requires the substitution of a term in ϕ(x).

11 We have implicitly used associators to make the composite maps well-defined.
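Jeroslow's method, too, admits a string-level sketch. Again this is our own illustration: `SUB_J` stands in for the function represented by sub_J, `code`/`name` for the numbering and numerals, and `Provable(x)` is a placeholder.

```python
# Illustrative string-level analogue of Jeroslow's diagonalization.
# SUB_J maps the codes of phi(x) and t(x) to the code of phi(t(#t)),
# where #t is the "numeral" naming the code of t.

def code(e: str) -> int:
    return int.from_bytes(e.encode(), "big")

def decode(n: int) -> str:
    return n.to_bytes((n.bit_length() + 7) // 8, "big").decode()

def name(n: int) -> str:
    return "#" + str(n)

def SUB_J(a: int, b: int) -> int:
    phi, t = decode(a), decode(b)
    # phi(t(name of t)): substitute t's own name into t, then into phi
    return code(phi.replace("x", t.replace("x", name(b))))

phi = "Provable(x)"
A = code(phi)
t = "sub_J({}, x)".format(name(A))            # the term t(x)
B = code(t)
d = "sub_J({}, {})".format(name(A), name(B))  # Jeroslow's fixed-point term

# Kreisel-Henkin criterion: the term d denotes the code of phi(d).
assert SUB_J(A, B) == code(phi.replace("x", d))
```

Note that, in line with the remark above, the fixed-point term `d` contains only names of `phi` and of the term `t`, never a name of an instance `phi(s)` with `s` different from `x`.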

Other Uniform Diagonal Constructions
In addition to the usual canonical diagonal constructions, our framework is sufficiently robust to also accommodate several variants thereof, which intuitively qualify as uniform.
Example 3.4 We can slightly tweak Gödel's construction to obtain a uniform diagonal operator which inserts a "dummy" conjunct. Let ⌜−⌝ be based on some standard numbering #, and let Sub_∧ be a suitable primitive recursive function, represented by a binary function symbol sub_∧. Let d_A be the diagonal operator which maps a formula ϕ(x) to the corresponding term built from sub_∧. By the definition of sub_∧, d_A is a diagonal operator. Moreover, it is easy to see that d_A is uniform: d_A can be obtained by adding to the construction of d_G the operation =, which yields the formula x = x, and ∧, which yields the conjunction of ϕ(sub_∧(x, x)) and the formula x = x. Variations of d_A involving other connectives or expressions other than x = x can be introduced along similar lines.
Example 3.5 Our notion of uniformity also subsumes the original definition of uniformity introduced in [11]. This definition relies on a function symbol ḋ such that, for every ϕ ∈ Fml_x, the value of the term ḋ(⌜ϕ(x)⌝) is the code of ϕ(ḋ(⌜ϕ(x)⌝)). Let now d_B be a diagonal operator which is uniform in the sense of [11], i.e., for every ϕ ∈ Fml_x, d_B(ϕ(x)) ≡ ḋ(⌜ϕ(x)⌝). If our language contains such a function symbol ḋ, then d_B is also uniform in the sense of Definition 3.2. We close by showing how such a function symbol can be specified. First, we fix a unary function symbol f (other than S) of our language. Let F denote the primitive recursive function implementing the above condition on ḋ. We now possibly change our base theory so that F is represented by the function symbol f in our language, at least on all relevant formulae (see Section 4 for a more precise formulation). We then have, for every ϕ ∈ Fml_x, that f may play the role of ḋ. This provides us with a uniform diagonal construction for the resulting theory.
The initial motivation for uniformity, as introduced in [11, pp. 700], is to provide a condition on self-referential sentences with the Kreisel-Henkin property that is satisfied by fixed-point sentences obtained in a systematic, "canonical" way, but not by contrived fixed points. We have shown above that the canonical fixed-point constructions are uniform. We now turn to the question to what extent uniformity can rule out deviant fixed-point constructions, such as the refutable Henkin sentences constructed by Kreisel and variations and generalizations thereof, as in Observation 2.2 and the examples below. In particular, if ϕ(t) satisfies the Kreisel-Henkin criterion and some natural assumptions are made, uniformity should rule out the possibility that the self-referential term t already occurs in the formula ϕ(x), as it does in the refutable Henkin sentence above. Thus, the usefulness of uniformity depends on the answer to the following question:

Question 3.6 Let ϕ(x) be a formula and d be a uniform diagonal operator. Under which assumptions can we rule out the possibility that the term d(ϕ(x)) occurs in ϕ(x)?
It will be shown that a natural assumption on the naming function is sufficient to eliminate this possibility. However, we first show that extra assumptions are required and that d(ϕ(x)) can occur in ϕ(x), even if d is uniform, in a carefully chosen theory w.r.t. some specifically tailored numbering and numeral functions.

Uniform Kreisel-Like Constructions
To establish our claim that Question 3.6 cannot be trivially answered and that some assumption is required, we construct a provability predicate Bew′(x) which contains the term d(Bew′(x)) (Eq. 1), where d is a uniform diagonal operator and Bew(x) is a given provability predicate weakly representing provability. The application of d to Bew′(x) results in the term d(Bew′(x)), which occurs in Bew′(x) itself. Clearly, the resulting self-referential sentence Bew′(d(Bew′(x))) is refutable. Hence, by the same reasoning as for Observation 2.1, Bew′(x) is a provability predicate and Bew′(d(Bew′(x))) is a refutable Henkin sentence.

The construction of the provability predicate Bew′(x) relies on some self-referential trickery. This is because Bew′(x) contains the term d(Bew′(x)), which depends on the definition of Bew′(x) itself. In order to make the definition of Bew′(x) explicit, let d be a uniform diagonal operator which serves as a parameter. We define a meta-linguistic operator k_d : Fml_x → Fml_x which maps a given formula ϕ(x) to a formula containing the term d(ϕ(x)). Any meta-linguistic fixed point of k_d will serve as the desired provability predicate. This is because every fixed point Bew′(x) of k_d remains unchanged under the application of k_d and thus satisfies Eq. 1. As it turns out, whether or not fixed points of k_d exist crucially depends on specific features of the naming function and the interpretation of our language.
In what follows we need to be more precise about the exact way our language L extends L_0. Let L be the result of adding a (k+1)-ary function symbol f^k_n, for each n, k ∈ ω, to L_0. For simplicity, we assume that L does not contain any further non-logical symbols. Let Pr denote the set of primitive recursive functions. We call an interpretation I of L standard if (1) I interprets the symbols of L_0 as usual; (2) I(f^k_n) is a (k+1)-ary function in Pr, for every k, n ∈ ω; (3) each (k+1)-ary function in Pr is represented by some f^k_n in L under I. In particular, if I is standard then its domain is ω. Thus, standard interpretations differ only with respect to the p.r. functions they assign to a given function symbol.
If I is standard, we use Basic(I) to denote the deductive closure of the theory R extended with all I-true identities of the form t = n, where t is a closed term and n ∈ ω. In general, different standard interpretations I yield different theories Basic(I).
Recall that the definition of a diagonal operator depends on the numbering and the interpretation of the language. Moreover, we defined uniformity relative to a given naming function. The following definition makes this explicit:

Definition 4.1 Let α be a numbering, ν a numeral function and I a standard interpretation.
(1) We say that d is a diagonal operator with respect to α and I if d is a closed meta-linguistic function of type Fml_x → Term_x such that for each ϕ ∈ Fml_x, the I-value of the closed term dϕ is the α-code of ϕ(dϕ).
(2) We say that d is a uniform diagonal operator with respect to α, ν and I if d is a diagonal operator with respect to α and I and d is uniform for the naming function ν ∘ α.
According to the next lemma, a fixed-point term d(ϕ(x)) can occur in ϕ(x) for some particular numberings and standard interpretations, even if d is uniform.

Lemma 4.2
There is a numbering α, a numeral function ν, a standard interpretation I, a uniform diagonal operator d with respect to α, ν and I, and there are formulas Bew(x), Bew′(x) which weakly represent Basic(I) such that Bew′(x) satisfies Eq. 1; in particular, the term d(Bew′(x)) occurs in Bew′(x).

We sketch two straightforward constructions of provability predicates Bew′(x) and Bew†(x) which both satisfy the conditions of Lemma 4.2. Recall that the uniform diagonal operator d_B introduced in Example 3.5 maps each formula ϕ(x) to the fixed-point term ḋ(⌜ϕ(x)⌝).
Our first construction is based on a peculiar choice of the numbering, with Bew(x) weakly representing Basic(I). Given a standard numbering # (such that each code is positive), let α be the new numbering which assigns 0 to Bew′(x) and #ϕ to every other expression ϕ. Hence, 0 ≡ ⌜Bew′(x)⌝_α. Therefore, the fixed-point term of Bew′(x), based on the operator d_B and the numbering α, is simply ḋ(0). That is, ḋ(0) ≡ d_B(Bew′(x)). Hence, the term d_B(Bew′(x)) occurs in Bew′(x).

Instead of using a contrived numbering, our second construction is based on a peculiar choice of the numeral function, where c is some fresh constant symbol and Bew(x) weakly represents Basic(I).
We choose I such that c denotes (the code of) Bew†(x). Moreover, let ν be the numeral function which maps the code of Bew†(x) to c and each other number to its standard numeral. According to these choices, c is the ν-name of Bew†(x), that is, c ≡ ⌜Bew†(x)⌝_ν. Therefore, the fixed-point term of Bew†(x), based on the operator d_B and the numeral function ν, is simply ḋ(c). That is, ḋ(c) ≡ d_B(Bew†(x)). Hence, the term d_B(Bew†(x)) occurs in Bew†(x).

Remark 4.3
The vigilant reader will complain that our constructions contain subtle but persistent circularities. In the second construction, we defined the provability predicate Bew(x), and therefore also Bew†(x), in dependence on the interpretation I. But I in turn depends on the choice of Bew†(x). In other words, we have assumed without proof that such an interpretation and such provability predicates exist. Similarly, in the first construction we defined the provability predicate Bew(x), and therefore also Bew′(x), in dependence on I. But the interpretation I in turn depends on Bew′(x). This is because the function symbol ḋ represents the function mapping the α-code of Bew′(x) (i.e., the number 0) to the α-code of Bew′(ḋ(⌜Bew′(x)⌝)), which of course depends on Bew′(x).
As we show in Appendix B, we can use the recursion theorem to provide the missing details, thereby turning our proof sketches into rigorous arguments. See B.1 and B.2 for explicit and detailed constructions of Bew′(x) and Bew†(x) respectively. Inspection of these constructions reveals that they rely on circular features of some of the involved formalisation choices. In particular, both constructions yield provability predicates which contain their own names: the predicate Bew′(x) contains its own α-name 0, while Bew†(x) contains its own ν-name c.
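The way the recursion theorem resolves such circular definitions has a familiar programming analogue: a program may legitimately contain (a name of) its own source, provided the self-reference is channelled through a template. The following quine-style Python sketch is our own illustration, not one of the appendix constructions:

```python
# A quine-style fixed point: 'program' is a piece of Python source
# which, when executed, rebuilds exactly the string 'program'.
# This mirrors how the recursion theorem allows Bew'(x) to contain
# a term whose definition depends on Bew'(x) itself.

template = "template = {!r}; program = template.format(template)"
program = template.format(template)

ns = {}
exec(program, ns)                 # run the constructed program
assert ns["program"] == program   # it reproduces itself exactly
```

The template plays the role of the index manipulated by the recursion theorem: the circular-looking definition is obtained by applying a non-circular operation to its own description.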

Circularity of Naming
In the previous section we have given examples of formulas ϕ(x) that already contain d(ϕ(x)), even if d is a uniform diagonal operator. However, our examples rely either on contrived numeral functions or on contrived codings. In Question 3.6 we asked which additional assumptions can be made to rule out the possibility that the diagonal term d(ϕ(x)) occurs already in the formula ϕ(x) which is diagonalized. Remark 4.3 hints at a possible answer: if we rule out the deviant numeral functions and codings, or more specifically, if the naming function does not exhibit any circular features, we may hope to obtain the additional natural assumptions we are seeking. To make this precise, we define a binary relation ◁ on the set of expressions. This relation will play an essential role in the formulation of an answer to Question 3.6.

Definition 5.1 We say that an expression e is weakly named in e′, in symbols e ◁ e′, if there exists an expression e″ such that e is a subexpression of e″ and the name ⌜e″⌝ occurs in e′. We call ◁ the weak naming relation for ⌜−⌝. 12

In order to make the dependency of ◁ on the underlying naming function explicit, we sometimes write ◁_⌜−⌝ or ◁_α,ν for the weak naming relation for ⌜−⌝ = ν ∘ α. If ν is the standard numeral function we also simply write ◁_α instead of ◁_α,ν. Finally, let ◁* denote the transitive closure of ◁. The following useful facts follow immediately from Definition 5.1.

Fact 5.3
If e ◁* e′, then there is a subexpression t of e′ such that t is a closed term and e ◁* t.

12 The symbol ◁ denotes a slightly different relation in [9]. There, ◁ is defined by setting e ◁ e′ iff the name ⌜e⌝ occurs in e′. In our terminology, this may be called a strong naming relation. Caveat lector!

The relation ◁ allows us to formalise what we mean by the circularity of naming functions. What we will show is the following: if ◁ does not exhibit any loops, viz. if its transitive closure ◁* is irreflexive, then this suffices to yield a positive answer to Question 3.6.
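Over a finite pool of expressions, the weak naming relation and its transitive closure can be computed directly, which makes the irreflexivity condition easy to test. The following sketch is our own illustration: the subexpression relation is approximated by the substring relation, and the sample expressions and naming functions are chosen for this example only.

```python
# Weak naming over a finite pool: e is weakly named in e2 iff some
# expression mid in the pool contains e and the name of mid occurs
# in e2. "Subexpression" is approximated by the substring relation.

def weak_naming(pool, name):
    return {(e, e2)
            for e in pool for e2 in pool
            if any(e in mid and name(mid) in e2 for mid in pool)}

def transitive_closure(rel):
    closure = set(rel)
    while True:
        step = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if step <= closure:
            return closure
        closure |= step

def is_irreflexive(rel):
    return all(a != b for (a, b) in rel)

# Quotation-like naming: every name properly extends the named string,
# so no loops can arise.
quote = lambda e: "'" + e + "'"
pool = ["snow", "'snow' is white"]
assert is_irreflexive(transitive_closure(weak_naming(pool, quote)))

# A contrived naming function whose "name" occurs inside the named
# expression immediately produces a loop (compare Bew'(x), which
# contains its own alpha-name 0).
short = lambda e: "s"
assert not is_irreflexive(transitive_closure(weak_naming(["snow"], short)))
```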
To prove this, our strategy is to first inspect the meta-linguistic properties of fixed-point terms obtained from uniform constructions; to this end, we provide a more systematic study of the structure of uniform functions. It is evident from Definition 3.1 that every uniform operation can be constructed by successively composing C-basic functions together with canonical maps of the Cartesian product structure. To make this intuition precise, we introduce a representation system for constructions of uniform functions. This development will allow us to prove, by induction and case distinction, Lemma 5.11, according to which all diagonal terms obtained uniformly share a particular meta-linguistic feature. This lemma directly implies our main result, viz. Proposition 5.12.
We start by using a term algebra to represent uniform functions. Let (B_n)_{n∈ω} be a fixed bijective (and effective, if you prefer) enumeration of all C-basic functions, and let UniFct be the set of all uniform functions.

Definition 5.4
Let Σ be the signature that contains a constant symbol b_n for each n ∈ ω, and two binary function symbols ⟨·,·⟩ and ∘. Let T denote the term algebra generated over Σ (with no variables). We recursively define a subset R ⊂ T and an evaluation function ev : R → UniFct:

• b_n ∈ R and ev(b_n) = B_n, for any n ∈ ω;
• If p, q ∈ R and dom(ev(p)) = dom(ev(q)), then ⟨p, q⟩ ∈ R and ev(⟨p, q⟩) = ⟨ev(p), ev(q)⟩;
• If p, q ∈ R and dom(ev(p)) = cod(ev(q)), then p ∘ q ∈ R and ev(p ∘ q) = ev(p) ∘ ev(q);

where dom(f) and cod(f) denote the domain and codomain of a function f, respectively. We call a term r ∈ R a representation of a uniform function u if ev(r) = u.
Obviously, terms in R are well-typed, and in what follows we simply use dom(r), cod(r) to denote dom(ev(r)), cod(ev(r)), respectively. We also call a term r ∈ R closed (resp. basic) if ev(r) is closed (resp. basic).
Since the codomain of each of our basic operations is either Fml_x or Term_x, the following fact is immediate:

Fact 5.5 If r ∈ R is reduced (in the sense of the reduction process introduced below) and cod(r) is a binary product, then r is of the form ⟨q_1, q_2⟩.

From the definition of uniformity and the representation of uniform functions, it is easy to see that ev : R → UniFct is surjective, which means that every uniform function has some representation. But ev is not injective. Suppose that ev(b_m) = id_Fml_x and ev(b_n) = π_1^{Fml_x,Fml_x}. Then ev(b_n ∘ ⟨b_m, b_m⟩) = id_Fml_x, which shows that both b_m and b_n ∘ ⟨b_m, b_m⟩ are representations of id_Fml_x. This example shows that some representations contain redundant information which is irrelevant to the actual uniform function represented. To reduce such redundancies, we define a reduction process for terms in R:

Definition 5.6 The reduction relation is the smallest binary relation −→ ⊆ R × R satisfying the following clauses: for every p, q, r ∈ R, (1) if ev(r) = id_A and r ∘ p, q ∘ r ∈ R, then r ∘ p −→ p and q ∘ r −→ q; together with clauses (2)-(5), which eliminate redundant projections and pairings, and clause (6), which closes −→ under subterms.

It is easy to verify that we have a well-defined notion of reduction among terms in R, i.e. if p ∈ R and p −→ q, then q must also be a term in R. Let −→* denote the transitive closure of −→. Clearly, the function that a term p ∈ R represents remains invariant under the reduction process. Thus, if p −→* q then ev(p) = ev(q).
We say a representation p is reduced if there is no q ∈ R such that p −→ q. By clause (6), if p ∈ R is reduced then so is every subterm of p. The following holds for the reduction process described above:

Fact 5.7 Given any term r ∈ R, successive application of reductions to r will always yield a reduced representation after finitely many steps.
Proof None of the clauses of the reduction relation increases the length of terms, and only clause (5) fails to strictly decrease it. Hence, we only need to verify that clause (5) alone cannot generate an infinite chain of reductions, which is clearly the case.
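To illustrate the flavour of such a terminating reduction process, here is a toy Python sketch of the identity-elimination clause (1) alone; the tuple encoding, the constant names, and the restriction to a single clause are simplifications of our own.

```python
# Toy fragment of the reduction process: representations are nested
# tuples ("comp", p, q) / ("pair", p, q) over constant names "b<i>".
# We implement only identity elimination: comp(id, p) -> p and
# comp(q, id) -> q, applied anywhere inside a term.

ID = "b_id"  # a constant assumed to represent an identity function

def reduce_once(t):
    if isinstance(t, tuple):
        op, p, q = t
        if op == "comp" and p == ID:
            return q
        if op == "comp" and q == ID:
            return p
        return (op, reduce_once(p), reduce_once(q))
    return t

def reduce_term(t):
    # iterate to a fixed point (a reduced representation); termination
    # is guaranteed since every successful step shortens the term
    while True:
        t2 = reduce_once(t)
        if t2 == t:
            return t
        t = t2
```

For instance, `reduce_term(("comp", ID, ("comp", "b3", ID)))` returns `"b3"`: both identity factors are successively stripped away, leaving a reduced representation of the same function.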
Hence, without any loss of generality, we can work only with reduced representations of a uniform function. Note that even though the reduction process is terminating, it does not necessarily enjoy unique normalisation. That is, a representation may give rise to different reduced representations, and the same uniform function may have different reduced representations. 13

With reduced representations at hand, we may commence studying the behaviour of uniform functions. For our purposes, we are mainly concerned with those whose codomain is Fml_x or Term_x. The following simple observation will be used later:

Fact 5.8 Let r ∈ R be reduced such that cod(r) is either Fml_x or Term_x. Then r is either a constant b_n for some n ∈ ω, or of the form b_m ∘ q for some m ∈ ω such that b_m is basic.
Proof Suppose r is a composite term. Since r is reduced and cod(r) is not a binary product, we have r = b_m ∘ q for some m ∈ ω. Clearly, ev(b_m) cannot be id_Fml_x, id_Term_x or !_A for any A ∈ D. Moreover, ev(b_m) cannot be a projection; otherwise, cod(q) would be a binary product and by Fact 5.5 q would be of the form ⟨q_1, q_2⟩, which would imply that b_m ∘ q is not reduced. Hence, r has the form b_m ∘ q with b_m basic.
This fact holds essentially because none of our basic functions has a product of Fml_x or Term_x as its codomain; hence, if the codomain of a reduced representation is simply Fml_x or Term_x, the final map of the composite cannot be a projection.
Uniform diagonal operators are of type Fml_x → Term_x. The following lemma shows that any such operator must involve an essential use of the function ⌜−⌝_f. This is simply because the uniformity constraint does not allow any other way to obtain a function of this type.

Lemma 5.9 Let b_n be such that ev(b_n) = ⌜−⌝_f. If u : Fml_x → Term_x is uniform and not a constant function, then every representation of u contains b_n.

Proof Suppose r ∈ R is a reduced term representing u. We show by induction on the complexity of terms that if r does not contain b_n then ev(r) is a constant function.
Suppose r is b_m for some m ∈ ω. There is no C-basic function other than ⌜−⌝_f that is of type Fml_x → Term_x; hence the base case is settled. For the inductive step, suppose r is a composite term. By Fact 5.8, r is of the form b_m ∘ q with b_m basic. From the codomain of ev(b_m) and the fact that m ≠ n we conclude that b_m must represent Sub_t, ⌜−⌝_t, f or x. Also, q is reduced and does not contain b_n. The domains of these basic functions are all of the form Term_x^k, for k ∈ ω. If k = 0, then ev(r) is obviously constant. If k = 1, then ev(q) : Fml_x → Term_x. By the induction hypothesis, ev(q) is constant, and so is ev(r). Finally, if k ≥ 2, then according to our choice of the product structure and by Fact 5.5, q must be of the form ⟨q_1, ⟨q_2, · · · ⟨q_{k−1}, q_k⟩ · · · ⟩⟩, where every q_i is reduced, does not contain b_n and ev(q_i) : Fml_x → Term_x. By the induction hypothesis again, every ev(q_i) is constant, and so is ev(r).
Note that diagonal operators are not constant functions. Hence, according to this lemma, all the uniform diagonal constructions provided in Section 3 include the function ⌜−⌝_f. Note that the constructions of Gödel's diagonal operator and of the two operators presented in Example 3.4 and Example 3.5 do not employ the function ⌜−⌝_t, while Jeroslow's operator does. Hence, Lemma 5.9 in particular shows that, at least in the context of uniform diagonalization, the function ⌜−⌝_f is more fundamental than the function ⌜−⌝_t. This is one of the reasons why we explicitly distinguish these two naming functions.
Uniform functions also preserve the occurrence of free variables:

Lemma 5.10 Let u : Fml_x → Fml_x or u : Fml_x → Term_x be uniform and not closed. Then for every formula ϕ(x) ∈ Fml_x containing the free variable x, the expression u(ϕ(x)) contains x as a free variable as well.

Proof Let r ∈ R be a reduced representation of u. We prove the claim by induction on the complexity of r. For the base case, suppose r is some constant b_m. If u has type Fml_x → Fml_x then ev(b_m) = id_Fml_x, which satisfies the condition. If u has type Fml_x → Term_x, the only C-basic function with the right type is ⌜−⌝_f, which is closed. For the inductive step we can assume by Fact 5.8 that r is of the form b_m ∘ q with b_m basic.
Suppose first that u is of type Fml_x → Fml_x. Then ev(b_m) is Sub_f, a connective operation which is not ∀x, or an atomic operation R with ar(R) ≥ 1. We check the claim for each of these cases: (1) If ev(b_m) = Sub_f, then by Fact 5.5 q must be of the form ⟨q_1, q_2⟩, with ev(q_1) : Fml_x → Fml_x and ev(q_2) : Fml_x → Term_x. Since ev(r) is not closed, neither ev(q_1) nor ev(q_2) is closed. By the induction hypothesis, ev(q_1)(ϕ) and ev(q_2)(ϕ) are a formula ϕ_1(x) and a term t_2(x) respectively, both with free variable x. Hence, ϕ_1(t_2(x)) contains x as a free variable. (2) If ev(b_m) is a connective, i.e. ¬ or ∧, then either ev(q) : Fml_x → Fml_x or q = ⟨q_1, q_2⟩, where ev(q_1), ev(q_2) : Fml_x → Fml_x. In the former case, ev(q) cannot be closed; in the latter case, at least one of the ev(q_i) is not closed, and the claim follows from the induction hypothesis. The case of an atomic operation R with ar(R) ≥ 1 is analogous.
The case where u : Fml x → Term x is completely similar. Here, ev(b m ) is Sub t , f with ar(f ) ≥ 1, or x, and a proof by case distinction proceeds in almost the same manner as shown above.
The above proof is a bit tedious, since it relies on an induction with several case distinctions. But the statement of Lemma 5.10 is to be expected, since each C-basic function either preserves the presence of free variables or maps everything to closed expressions; composition does not change this fact.
With all this preliminary work in place, we can finally show the following result about uniform diagonalization: if d(ϕ(x)) is the result of uniform diagonalization, then an expression of the form ϕ(s) is weakly named in an expression, which is weakly named in an expression, . . . , which is weakly named in d(ϕ(x)):

Lemma 5.11 Let d be a uniform diagonal operator. Then for every formula ϕ(x) ∈ Fml_x that contains a free variable x, there is a term s such that ϕ(s) ◁* d(ϕ(x)).
The idea behind Lemma 5.11 is very simple. Intuitively, we may view a reduced representation r of a uniform diagonal operator d as an instruction for carrying out the diagonalization process for each formula ϕ(x). We have shown in Lemma 5.9 that d must make an essential use of the function ⌜−⌝_f. Before that use, all we can essentially do is construct terms and substitute them into ϕ(x), which results in a formula of the form ϕ(s) for some term s. Moreover, we may combine ϕ(s) with other formulas using connectives to form a longer expression, which we temporarily denote by ψ. Note that ϕ(s) is a subexpression of ψ; hence, after applying ⌜−⌝_f, we have ϕ(s) ◁* ⌜ψ⌝. Note that ⌜ψ⌝ is a closed term, which means that we can only substitute it into other expressions, but not the other way around. This implies that further applications of the other basic functions to ⌜ψ⌝ retain the ◁*-relation with ϕ(s), which is exactly what we want.
Of course, making this rough proof sketch precise requires a rigorous argument. Since the detailed proof is quite tedious and technical, we omit it here. Its structure resembles the proof of Lemma 5.10, in that it also proceeds by induction together with several case distinctions. The enthusiastic reader can find the proof in full detail in Appendix A.
We can now formulate an answer to our Question 3.6: to make it impossible for d(ϕ(x)) to occur in ϕ(x) for a uniform diagonal operator d, it is sufficient to rule out loops in the weak naming relation or, equivalently, to demand that ◁* is irreflexive. We maintain that this assumption on ◁* is natural: the usual Gödel codings and numeral functions make ◁* irreflexive.

Proposition 5.12 Let ◁* be irreflexive. Then for every uniform diagonal operator d and formula ϕ(x), the fixed-point term d(ϕ(x)) cannot occur in ϕ(x).
Proof Assume that there is a uniform diagonal operator d such that d(ϕ(x)) occurs in ϕ(x). By Lemma 5.11, there exists a term s such that ϕ(s) ◁* d(ϕ(x)). Since d(ϕ(x)) is a subterm of ϕ(s), we obtain d(ϕ(x)) ◁* d(ϕ(x)) by Fact 5.2. Hence, ◁* is not irreflexive.
Recall that the deviant Henkin sentences introduced in Section 4 are based on provability predicates Bew′(x) which satisfy condition Eq. 1, i.e., which contain their own fixed-point term d(Bew′(x)). It follows immediately from Proposition 5.12 that every predicate Bew′(x) satisfying this condition involves a circular weak naming relation.

Remark 5.13
While the construction of Bew′(x) in B.1 employs a canonical numeral function, namely standard numerals, the circularity of the weak naming relation results from a contrived choice of the numbering. To analyse this situation further, we say that a numbering α is monotonic if, for any expressions e, e′, the fact that e is a subexpression of e′ implies α(e) ≤ α(e′). Clearly, the numbering function α used to construct Bew′(x) in B.1 is not monotonic. Can we do better and base this construction on a monotonic numbering instead of α? We note that this is not possible. In order to see this, we call a numbering α strongly monotonic for ν-numerals if α(e) < α(⌜e⌝_α,ν) for all expressions e.
We observe that every monotonic numbering is strongly monotonic for standard numerals (see also [8, Section 6]). Moreover, if α is strongly monotonic for ν-numerals, then the weak naming relation ◁_α,ν induced by α and ν is well-founded. In particular, the relation ◁*_α,ν cannot be circular. Hence, by Proposition 5.12 we cannot construct a uniform diagonal operator d and a provability predicate Bew′(x) satisfying Eq. 1 whenever we use a numbering α and a numeral function ν such that α is strongly monotonic for ν-numerals.
As we have seen in B.2, the circularity of the weak naming relation can also result from a non-standard numeral function, even if we fix a standard monotonic numbering. Hence, an answer to Question 3.6 involves a constraint on both the numbering and the numeral function. Of course, this is precisely what we do in Proposition 5.12 when we require the irreflexivity of ◁*.
We have said that the constraint of non-circularity is natural. The next section will provide more detailed conceptual and philosophical grounds on which this requirement can be based.

Quotation and the Well-Foundedness of Naming
A naming function maps every expression e to a closed term ⌜e⌝, which serves as its name. Since we work in an arithmetical framework, a naming function consists of a Gödel numbering and a numeral function. Except for requiring effectiveness, we have thus far not placed any constraints on numberings and numeral functions. Gödel numerals are often conceived of as arithmetical counterparts of quotational names. However, there are codings and numeral functions that make it very implausible to think of the resulting numerals as quotations. In this section, we introduce precise constraints that single out certain coding schemata and numeral functions as adequate counterparts of quotation devices.
In order to do so, we conceive of arithmetical naming functions as particular instances of string-theoretical naming devices. Let A be an alphabet and let A* denote the set of finite strings over A, including the empty string ε. For strings e, f in A*, let ef denote the result of concatenating e with f. Let E ⊆ A* be a set of expressions. We call any injective function N : E → E a string-theoretical naming function for E. For example, let A consist of English letters together with a pair of single quotation marks. The function Q which maps each string s ∈ A* to its proper quotation 's' is a canonical example of a string-theoretical naming function for A*. 14

Let ⌜−⌝ : Term_x ∪ Fml_x → ClTerm be a naming function as introduced in Section 3, i.e., ⌜−⌝ is the composition of a numbering and a numeral function. Then ⌜−⌝ can also be conceived of as a string-theoretical naming function. For example, Q and ⌜−⌝ name the letter "x" by the strings " 'x' " and " ⌜x⌝ " respectively.
In the philosophical literature, ⌜−⌝ is often viewed as an arithmetical proxy of the quotation function Q. Heck [12, pp. 27], for example, takes the "disquotation" schema T(⌜ϕ⌝) ↔ ϕ to be an arithmetical formalisation of the informal schema 'S' is true iff S (see also [24, pp. 156]). On this view, it is plausible to require that ⌜−⌝ satisfies certain quotation-like features. In particular, we require that ⌜−⌝ behaves similarly to quotation with respect to the induced weak naming relation. In order to make this precise, we first generalize Definition 5.1 to string-theoretical naming functions.

Definition 6.1 Let N be a string-theoretical naming function for E. We say that an expression e ∈ E is weakly named in e′ by N, in symbols e ◁_N e′, if there exists an expression e″ ∈ E such that e is a subexpression of e″ and N(e″) occurs in e′. We also call ◁_N the weak naming relation for N. Let ◁*_N denote the transitive closure of ◁_N.
Clearly, Fact 5.2 also holds in the more general setting. The following useful observation follows from the above definition and Fact 5.2.

Fact 6.2 Let N be a string-theoretical naming function for
Since each quotation properly contains the expression it names, no quotation q can denote an expression containing q itself. More generally, the weak naming relation ◁_Q is well-founded. From this observation, we can extract the following necessary condition for a naming function to mimic or resemble quotation:

Well-Foundedness: Every naming function which resembles quotation induces a well-founded weak naming relation.
Thus, in our answer to Question 3.6 via Proposition 5.12, we can justify the assumption of ◁*'s irreflexivity by drawing on the conception of ⌜−⌝ as resembling quotation.
While proper quotation is perhaps the most common naming function, the specific method of enclosing expressions in quotation marks is by no means theoretically essential. 15 Alternatively, we may name strings by describing their constituent symbols, e.g. using Tarski's [24] structural-descriptive names, or by a Kripkean act of baptism [17, p. 693]. The reader may wonder to what extent the well-foundedness requirement depends on the specifics of the quotation function. In other words, can we maintain the requirement of ◁'s well-foundedness if we conceive of ⌜−⌝ as resembling naming functions other than proper quotation? In the remainder of this section we show that the well-foundedness criterion can be based on a broad conception of quotation which encompasses several canonical naming devices found in the literature.
To delineate this broad conception, consider the expressions " 'snow' " , "the word which consists of the following letters: es, en, o, double-u, following one another", "the 4354th word of Chants Democratic" and "Jack". There is an important difference in the way these expressions serve as names of strings. The first two preserve the literal information of the named string "snow", i.e., for each letter of "snow", they contain a designated corresponding string. For example, "s" and "es" correspond to the first letter of "snow" respectively. This preservation of literal information enables us to read off the designated words from their names. The last two expressions do not preserve literal information. As opposed to the situation above, their referents can only be determined by reference to an external source of information or act of baptism.
In what follows, we confine ourselves to naming devices which preserve literal information. The following definitions are an attempt to make this precise.

Definition 6.3 Let e, f, g ∈ A*. We write (e, f) ⊴ g if g contains non-overlapping occurrences of e and f. More precisely, (e, f) ⊴ g iff there are (possibly empty) a, b, c ∈ A* such that g = aebfc or g = afbec. We call a function G : S × S → S weakly ⊴-increasing if (e, f) ⊴ G(e, f), for all e, f ∈ S.

Definition 6.4 We call a function N′ : A* → A* a literal pre-naming function for A*, if N′ can be recursively defined by
• N′(s) = G(N′(a_{π_s(1)} ··· a_{π_s(n−1)}), L(a_{π_s(n)}, s)), where s ≡ a_1 ··· a_n such that n > 0 and a_i ∈ A for each i ≤ n;
for some function L : (A ∪ {ε}) × A* → A* \ {ε}, a weakly ⊴-increasing function G : A* × A* → A* and a function π_− which maps each string s ∈ A* to a permutation π_s of the set {1, ..., lh(s)}. We also call L a literal function. The second argument of L serves as a parameter, permitting alphabetical symbols to be named by L in dependence on the full string in which they occur. If L is defined without parameters, we sometimes suppress the second argument of L for better readability.

Let E ⊆ A* be a set of expressions. We call a function N : E → E a literal naming function for E, if there is a literal pre-naming function N′ for A* and functions B, E : E → A* such that N(e) ≡ B(e) N′(e) E(e), for every e ∈ E. We also call B(e) and E(e) the begin marker and the end marker of the name N(e) of e respectively.

This definition accommodates a large class of naming devices found in the literature:

A. Let A consist of the English lower case letters together with a pair of single quotation marks. Let E be given by
The functions Q_1, Q_2, Q_3 : E → E, given by
Q_1(a_1 a_2 ··· a_n) :≡ 'a_1 a_2 ··· a_n'
Q_2(a_1 a_2 ··· a_n) :≡ 'a_n ··· a_2 a_1'
Q_3(a_1 a_2 ··· a_n) :≡ 'a_1 a_1 a_2 a_2 ··· a_n a_n'
are literal naming functions. The literal function of Q_3 duplicates the symbol a to the string aa. For Q_2, the family of permutations is non-trivial: for each s of length k ≥ 2, π_s(j) = k + 1 − j, for any 1 ≤ j ≤ k.
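The three quotation functions can be sketched in Python (our own encoding; the function names mirror Q_1, Q_2 and Q_3 above):

```python
def Q1(s: str) -> str:
    """Plain quotation: identity permutation, identity literal function."""
    return "'" + s + "'"

def Q2(s: str) -> str:
    """Quotation of the reversed string: the permutation pi_s reverses
    the order of the symbols."""
    return "'" + s[::-1] + "'"

def Q3(s: str) -> str:
    """Quotation with a duplicating literal function: each symbol a is
    named by the two-symbol string aa."""
    return "'" + "".join(a + a for a in s) + "'"

assert Q1("snow") == "'snow'"
assert Q2("snow") == "'wons'"
assert Q3("snow") == "'ssnnooww'"
# All three preserve literal information: the named string can be read off.
```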
We can also duplicate or permute depending on whether or not the input string contains a designated marker d:

Q_4(a_1 a_2 ··· a_n) :≡ 'a_1 a_1 a_2 a_2 ··· a_n a_n', if a_i ≡ d for some i ≤ n; 'a_1 a_2 ··· a_n' otherwise.
where a_1, a_2, ..., a_n ∈ A. Also Q_4 and Q_5 are literal naming functions. Note that the definitions of Q_4 and Q_5 essentially rely on parameters for L and π_− respectively.

B. Let A consist of the English lower case letters together with the symbols ⌐ and •. For any α ∈ A*, let B(α) be the shortest string of the form ⌐ ··· • which does not occur in α. Boolos' quotation function Q_B : A* → A*, presented in [1], is a literal naming function. Here the non-trivial bit lies in the begin and end markers. A variant of Boolos' construction was communicated to us by Albert Visser and is given as follows. For any α ∈ A*, let B(α) :≡ • ··· •, where the length of ··· equals the length of α. The resulting quotation function Q_V : A* → A* is a literal naming function.

C. We now show that the quotation device introduced by Halbach and Leigh [9, Chapter 8] can be accommodated in our framework. Let the alphabet A_HL consist of the following symbols (see [9]). Halbach and Leigh's quotation function Q_HL : A*_HL → A*_HL can now be recursively defined as follows:
• Q_HL(e) :≡ L(e), if lh(e) ≤ 1;
• Q_HL(a_1 ··· a_n) :≡ Q_HL(a_1 ··· a_{n−1}) L(a_n), where n > 1 and a_i ∈ A_HL for each i ≤ n.
Hence, Q_HL is a literal naming function.

D. Let A consist of the English lower case letters together with the space character " ".
Let E be the set of A-strings other than the empty string ε. Let L map ε to the string "empty" and each letter of A to its ICAO spelling name. For instance, L maps the letter "b" to the string "bravo" and the space character to the string "space". Let G : A* × A* → A* insert the infix string " concatenated with " between its arguments, and let B(s) and E(s) be empty. The resulting structural-descriptive naming device SD_1, which maps any expression a_1 a_2 ··· a_n to the A-string L(a_1) " concatenated with " ··· " concatenated with " L(a_n), where a_1, a_2, ..., a_n ∈ A, is a literal naming function.

E. Let the alphabet A be given by
Here, ā is conceived of as an alphabetical symbol of length 1. The structural-descriptive naming device SD_2 : A* → A*, given by SD_2(a_1 a_2 ··· a_n) :≡ ā_1 ā_2 ··· ā_n, where a_1, a_2, ..., a_n ∈ A, is a literal naming function.

F. Let A = {0, ..., 9, a, b, ..., z, " "} be an ordered alphabet containing the Arabic numerals, the English lower case letters and the space character " ". We specify a base 37 notation system for ω by using the k-th alphabetical symbol of A as the base 37 digit for k (with 0 ≤ k < 37). We write (a_1 ··· a_n)_37 for the number with base 37 notation a_1 ··· a_n. For example, since 2 and b are the 2nd and the 11th symbol of A respectively, we have (2b)_37 = 11 + (2 · 37) = 85.
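The base 37 notation can be checked numerically (a Python sketch of our own; the alphabet string below encodes the ordered alphabet A):

```python
# Ordered 37-symbol alphabet: Arabic numerals, lower case letters, space.
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz "

def base37(s: str) -> int:
    """(a_1 ... a_n)_37: read s as a base 37 numeral, where the digit
    value of a symbol is its position in the ordered alphabet."""
    value = 0
    for a in s:
        value = value * 37 + ALPHABET.index(a)
    return value

# The worked example from the text: 2 and b are the 2nd and 11th symbols,
# so (2b)_37 = 11 + (2 * 37) = 85.
assert base37("2b") == 85
```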
We now order the strings of A* using the length-first ordering (α_i)_{i∈ω}, in which we enumerate the strings according to increasing length, where strings of the same length are ordered alphabetically. We have α_m ≡ a_1 ··· a_n iff m = (a_1 ··· a_n)_37. The list (α_i)_{i∈ω} can be seen as a lexicon for strings over A. We define a naming function D : A* → A* by mapping each string a_1 ··· a_n to its descriptive name

"the word in the lexicon whose index is " a_1 ··· a_n " in base 37 notation".

Clearly, D is a literal naming function.

G. We now transfer the descriptive device D from the previous example to an arithmetical setting. Let A = {a_1, ..., a_k} be an alphabet for our arithmetical language L (including parentheses) and let (α_i)_{i∈ω} be a length-first ordering of A*. We now define for each i ≤ k
We now define the "efficient" naming function E : A* → ClTerm by setting E(ε) :≡ 0 and E(a_1 ··· a_n) :≡ S_{a_n}(··· S_{a_1}(0) ···).
Note that the value of E(a 1 · · · a n ) is the number m with a 1 · · · a n ≡ α m .
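This correspondence can be checked numerically, assuming each S_{a_i} is interpreted as the digit step x ↦ k·x + i (an assumption on our part, since the displayed definition is not reproduced above; a toy Python sketch over a 3-symbol alphabet):

```python
K = 3
ALPHA = "abc"  # toy 3-symbol alphabet standing in for the alphabet of L

def S(a: str, x: int) -> int:
    # Assumed interpretation of the function symbol S_a: one digit step.
    return K * x + ALPHA.index(a) + 1

def E_value(s: str) -> int:
    """Value of E(a_1 ... a_n) = S_{a_n}( ... S_{a_1}(0) ... )."""
    x = 0
    for a in s:            # innermost application S_{a_1} is computed first
        x = S(a, x)
    return x

def length_first_index(s: str) -> int:
    """Index of s in the length-first ordering of ALPHA* (empty string = 0):
    this is exactly the bijective base-k value of s."""
    value = 0
    for a in s:
        value = value * K + ALPHA.index(a) + 1
    return value

# The value of the efficient name of s is s's position in the lexicon.
for s in ["", "a", "c", "ab", "cab"]:
    assert E_value(s) == length_first_index(s)
```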
In order to show that E is a literal naming function, we define L (without parameters) by setting L(ε) :≡ 0 and L(a) :≡ S_a, for a ∈ A. We set G(e, f) :≡ f(k × e), for e, f ∈ A*. We then have E(ε) ≡ L(ε) ≡ 0 and E(a_1 ··· a_n) ≡ L(a_n)(k × E(a_1 ··· a_{n−1})) ≡ G(E(a_1 ··· a_{n−1}), L(a_n, a_1 ··· a_n)), where n > 0 and a_i ∈ A for each i ≤ n. Hence, E is a literal naming function.

H. Let A be an alphabet for our arithmetical language L and let # : A* → ω be a monotonic numbering, i.e., #e ≤ #e′ for all e, e′ ∈ A* with e a substring of e′. We set G(e, f) :≡ fe, for e, f ∈ A*. We define the literal function L by setting

L(a, e) :≡ ⌜e⌝_#, if lh(e) ≤ 1;
L(a, e) :≡ S ··· S (ℓ times), if e ≡ a_1 a_2 ··· a_n a for some n ≥ 1 with a_i ∈ A for all i ≤ n, where ℓ = #(a_1 a_2 ··· a_n a) − #(a_1 a_2 ··· a_n);
L(a, e) :≡ 0, otherwise.
Note that ℓ ∈ ω, since # is monotonic. We then have

⌜e⌝_# ≡ L(e, e), if lh(e) ≤ 1;
⌜a_1 ··· a_n⌝_# ≡ L(a_n, a_1 ··· a_n) ⌜a_1 ··· a_{n−1}⌝_# ≡ G(⌜a_1 ··· a_{n−1}⌝_#, L(a_n, a_1 ··· a_n)),

where n > 1 and a_i ∈ A for each i ≤ n. Hence, ⌜·⌝_# is a literal naming function. 16 This shows that every naming function which is based on a monotonic numbering and standard numerals is a literal naming function.
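This can be illustrated with a concrete monotonic numbering (our own toy example in Python; the length-first index over a two-letter alphabet is monotonic with respect to the substring order):

```python
ALPHA = "ab"

def code(e: str) -> int:
    """Length-first index of e (bijective base 2): strings of smaller
    length get smaller codes, so the numbering is monotonic."""
    value = 0
    for a in e:
        value = value * len(ALPHA) + ALPHA.index(a) + 1
    return value

def numeral(n: int) -> str:
    """The standard numeral of n: n occurrences of S applied to 0."""
    return "S" * n + "0"

def name(e: str) -> str:
    """The naming function induced by the numbering: e |-> numeral(#e)."""
    return numeral(code(e))

# Monotonicity: substrings receive smaller codes, hence shorter names,
# so the induced weak naming relation admits no infinite descending chain.
words = ["a", "b", "ab", "ba", "aab"]
for e in words:
    for f in words:
        if e in f:
            assert code(e) <= code(f)
            assert len(name(e)) <= len(name(f))
```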
Note that unique readability is not satisfied by every literal naming function. However, all of the above examples permit unique readability. We now provide sufficient conditions for the well-foundedness of weak naming relations.

Lemma 6.5 Let N be a literal naming function for E ⊆ A* which satisfies at least one of the following conditions:
(1) N can be defined using markers B and E such that B(e) ≢ ε or E(e) ≢ ε, for each e ∈ E;
(2) N can be defined using a literal function L such that L(a, e) ∉ A, for each a ∈ A and e ∈ A*;
(3) N can be defined using a well-founded literal function L, i.e., there are no sequences (a_i)_{i∈ω} and (e_i)_{i∈ω} of alphabetical symbols and A-strings respectively such that L(a_{i+1}, e_{i+1}) ≡ a_i for every i ∈ ω;
(4) N satisfies the following two conditions: (i) there is no sequence (a_i)_{i∈ω} of alphabetical symbols such that a_{i+1} ◁_N a_i for every i ∈ ω; (ii) for every two N-names e, f ∈ im(N), if e ≺ f then lh(e) + 1 < lh(f).
Then ◁_N is well-founded.
Proof (1) & (2) follow from the fact that in each case we have lh(N(e)) > lh(e), for every e ∈ E.
We now show (3). Assume that there is a sequence (e_i)_{i∈ω} of expressions of E such that e_{i+1} ◁_N e_i, for each i ∈ ω. Since e_{i+1} ◁_N e_i implies lh(e_{i+1}) ≤ lh(e_i), there is a number k such that lh(e_{i+1}) = lh(e_i), for each i ≥ k. Hence, we have N(e_{i+1}) ≡ e_i, for each i ≥ k. Let N be defined by means of a literal pre-naming function N′. We then have lh(e_{i+1}) ≤ lh(N′(e_{i+1})) ≤ lh(N(e_{i+1})) = lh(e_{i+1}), for each i ≥ k. Inspection of the recursive construction of N′ then shows that each symbol of e_i is of the form L(a, e_{i+1}) for some symbol a occurring in e_{i+1}. Choosing such symbols successively, we obtain an infinite sequence (a_i)_{i∈ω} of alphabetical symbols of A such that L(a_{i+1}, e_{i+1}) ≡ a_i. Thus, L is ill-founded.
We now show (4). Let N be defined by some literal pre-naming function N′. We first show that lh(s) < lh(N′(s)) for all s ∈ E with lh(s) > 1. Let s ≡ a_1 ··· a_n, for some n > 1 and a_1, ..., a_n ∈ A. Let N′(s) ≡ G(N′(a_{π_s(1)} ··· a_{π_s(n−1)}), L(a_{π_s(n)}, s)), where G, L and π_− are given as in Definition 6.4. Since G is weakly ⊴-increasing, we have N′(a_{π_s(1)} ··· a_{π_s(n−1)}) ≺ N′(s). Hence, using (ii), we have lh(s) = n ≤ lh(N′(a_{π_s(1)} ··· a_{π_s(n−1)})) + 1 < lh(N′(s)). Now, assume that there is a sequence (e_i)_{i∈ω} of expressions of E such that e_{i+1} ◁_N e_i, for each i ∈ ω. As we have seen in the proof of clause (3), there is a number k such that lh(e_{i+1}) = lh(e_i) and N(e_{i+1}) ≡ e_i, for each i ≥ k. Since lh(e) < lh(N(e)) for all composite expressions e, we have lh(e_i) = 1 for each i ≥ k. Hence, there is a sequence (a_i)_{i∈ω} of alphabetical symbols such that a_{i+1} ◁_N a_i for each i ∈ ω. But this contradicts (i).
Corollary 6.6 All literal naming functions introduced above give rise to well-founded weak naming relations.
These examples suggest that the well-foundedness principle is grounded in a rather robust and general conception of quotation.

Remark 6.7 Instead of requiring well-foundedness, one may require naming functions which resemble quotation to be strongly monotonic (cf. Section 5). Here, the essential assumption is that each quotation properly contains its quoted expression (as a string). It can then be argued that numberings which mimic quotation are required to code the Gödel numeral of an expression e by a larger number than the expression e itself (see [8, Section 6] for an elaboration of this view). Note that only the naming functions Q_1, Q_B, Q_V and D satisfy this assumption. Hence, the justification of strong monotonicity seems to require a much narrower conception of quotation than in the case of well-foundedness.
Moreover, we immediately obtain the following result from clause (4) of Lemma 6.5.

Remark 6.9
The philosophical significance of the above corollary is limited by the fact that it depends on subtleties of the notation system employed for arithmetical terms. For example, by Lemma 6.5(4) the corollary also holds if each complex term is enclosed in parentheses, or if each function symbol consists of a composite string. However, we can easily construct a counterexample to Corollary 6.8 if complex terms are of the form f u_1 ... u_k, where f is an alphabetical symbol. These considerations suggest that the choice of the notation system is yet another source of intensionality in the context of self-reference.
We now return to our study of Kreisel-like constructions of refutable Henkin sentences. In Section 5, we have seen that Kreisel-like fixed-points can be uniformly constructed with respect to circular naming relations. In the next section we will introduce another variant of Kreisel-like constructions of refutable Henkin sentences, based on an ill-founded but non-circular naming relation. By slightly generalising our results of Section 5, we will show that the requirement of well-foundedness of the naming relation, together with uniformity, also rules out this new variety of deviant fixed-point constructions.

Ill-Foundedness Without Circles
Recall that the Kreisel-like Henkin sentences introduced in Section 4 are built from a provability predicate Bew′(x) of the form
where d is a diagonal operator. That is, Bew′(x) contains its own fixed-point term d(Bew′(x)). As we have seen in Section 5, if d is uniform then d(Bew′(x)) induces a circle with regard to the underlying naming relation (Proposition 5.12). Hence, the constraints of uniformity and non-circularity are sufficient to rule out the refutable Henkin sentences considered thus far.
However, we can tweak the construction of Bew′(x) so that it no longer contains its own fixed-point term but still yields refutable Henkin sentences. In order to do so, we construct an ω-chain of formulas (Bew_n(x))_{n∈ω} such that Bew_n(x) is of the form
where Bew(x) is some fixed provability predicate and jump represents a function mapping d(Bew_n(x)) to d(Bew_{n+1}(x)), for every n ∈ ω. As in the case of Bew′(x) above, the Henkin sentence of each provability predicate Bew_n(x) is refutable. As opposed to Bew′(x), however, d(Bew_n(x)) is not contained in Bew_n(x) itself. Hence, there are refutable Henkin sentences which are based on provability predicates which do not contain their own fixed-point terms and thus evade Proposition 5.12:

Proposition 7.1 There is a numbering α, a numeral function ν, a standard interpretation I, a uniform diagonal operator d with respect to α, ν and I, and for each n ∈ ω a formula Bew_n(x), such that
(1) Bew_n(x) weakly represents Basic(I);
(2) Basic(I) ⊢ ¬Bew_n(t_n);
(3) t_n does not occur in Bew_n(x);
where t_n is the fixed-point term d(Bew_n(x)) of Bew_n(x) w.r.t. α and I.

Proof The details of the construction sketched above can be found in Appendix B.3.
Inspection of the ω-chain (Bew_n(x))_{n∈ω} constructed in Appendix B.3 shows that the fixed-point term t_{n+1} of Bew_{n+1}(x) occurs in Bew_n(x), for each n ∈ ω. We now ask under which additional assumptions we can rule out both Kreisel's original construction and its variant based on ω-chains as given above:

Question 7.2 Let d be a uniform diagonal operator and (ϕ_n(x))_{n∈ω} and (t_n)_{n∈ω} sequences of formulas and closed terms respectively. Under which assumptions can we rule out the possibility that for every n ∈ ω we have d(ϕ_n(x)) ≡ t_n, where ϕ_n(x) contains t_m for some m ≥ n?
An answer to Question 7.2 also yields an answer to Question 3.6, by setting ϕ_n(x) :≡ ϕ(x) and t_n :≡ t, for each n ∈ ω.
We first observe that requiring uniformity together with the non-circularity of the weak naming relation is not sufficient to rule out the construction of ω-chains of refutable Henkin sentences as given above. However, once we require uniformity together with the well-foundedness of the weak naming relation, deviant Henkin sentences such as those constructed above can be successfully excluded. More generally, we obtain the following answer to Question 7.2:

Proposition 7.4 Let (ϕ_n(x))_{n∈ω} and (t_n)_{n∈ω} be sequences of formulas and closed terms respectively. If the ◁^* relation induced by the naming function ⌜−⌝ is well-founded, there is no uniform diagonal operator d such that for every n ∈ ω we have d(ϕ_n(x)) ≡ t_n, where ϕ_n(x) contains t_m for some m ≥ n.
Proof Suppose for contradiction that d is a uniform diagonal operator such that for every n ∈ ω we have d(ϕ_n(x)) ≡ t_n, where ϕ_n(x) contains t_m for some m ≥ n. Let n ∈ ω. By Lemma 5.11, there exists a term s such that ϕ_n(s) ◁^* t_n. Since t_m is a subterm of the formula ϕ_n(s), we obtain t_m ◁^* t_n by Fact 5.2. Hence, there is an infinite subsequence (u_n)_{n∈ω} of (t_n)_{n∈ω} such that u_{n+1} ◁^* u_n, for each n ∈ ω. This contradicts our assumption that ◁^* is well-founded.

Limitations
At this point we should stress that the constraints of uniformity and well-foundedness by no means rule out every deviant construction of a Henkin sentence. After all, in this paper we have only investigated constraints on the fixed-point operator and the naming function, while imposing no constraints whatsoever on provability predicates, except that they should weakly represent the set of theorems of the theory. It is therefore hardly surprising that there exists a contrived provability predicate whose canonical diagonalization, say via Gödel's method, yields a refutable Henkin sentence (see [10, Section 5] for an example).
We conclude this section by providing another concrete example showing that uniformity and well-foundedness are not sufficient to rule out every accidental diagonal sentence. Recall that the Kreisel-like constructions considered in this paper are built from a provability predicate Bew•(x) of the form
where Bew(x) is a provability predicate weakly representing provability and χ(x) is a formula such that χ(d(Bew•(x))) is refutable and χ(⌜ϕ⌝) is provable for all sentences ϕ distinct from Bew•(d(Bew•(x))). The conjunct χ of Bew′(x) (see Section 4) contains its fixed-point term d(Bew′(x)), and the conjunct χ of Bew_n(x) (cf. Section 7) contains the fixed-point term d(Bew_{n+1}(x)) of the subsequent provability predicate in the given ω-chain. As we have seen, this is precisely the reason why uniform versions of these constructions force the naming relation to be circular and ill-founded respectively. We now provide an example of a Kreisel-like construction which is based on a provability predicate whose conjunct χ does not contain any fixed-point term. Hence, this construction can be given with respect to a uniform diagonal operator d and a well-founded naming relation. To do so, let Bew•(x) be of the form
where Bew(x) is a provability predicate weakly representing provability and f^0_0 represents the function which maps the code of d(Bew•(x)) to 0 and each other number to itself. Clearly, f^0_0(n) ≠ 0 holds for all positive numbers n which are not the code of d(Bew•(x)). Assuming that all sentences have positive codes, we therefore obtain a refutable Henkin sentence d(Bew•(x)) with respect to the uniform diagonal operator d and a standard naming function.
Finally, we can even construct refutable Henkin sentences without any additional conjunct (see Appendix B.5 for a detailed construction).

Applications
In this section, we present various applications of uniformity in distinguishing and identifying accidental fixed-points constructed by various means in the literature.

Logical Derivability
We first provide an example of accidental self-reference from a setting closer to natural language. For a given English sentence ϕ, we may ask about the status of the sentence that says of itself that it is logically derivable from ϕ. The status of such sentences depends on how self-reference is obtained:

(1) The sentence (1) is logically derivable from (1).
Clearly, sentence (1) is true, while sentence (2) is false. In the metamathematical study of self-reference we would like to rule out diagonal operators which mirror the accidental self-referential character of sentence (1). While the KH-property is not sufficient to rule out such diagonal operators, the requirement of uniformity successfully excludes metamathematical counterparts of sentence (1).
Proof We first show (1)-(3). Let t :≡ d_J(Bew_x(x)). Hence, t = ⌜Bew_t(t)⌝. Let d_0 be a diagonal operator which maps Bew_t(x) to t and any other formula of the form ϕ(x) to d_J(ϕ(x)). Clearly, d_0 satisfies the KH-property. Moreover, we have Bew_t(t) ⊢ Bew_t(t) and hence PA ⊢ Bew_t(d_0(Bew_t(x))).

Codings with Built-in Diagonalization
We now turn to the question to what extent uniformity excludes fixed-points which are obtained by codings with built-in diagonalization. Recall the construction of the refutable Henkin sentence Bew(m̄) in Observation 2.3. Intuitively, Bew(m̄) is an accidental diagonal sentence, since it relies on a numbering which is constructed in a highly ad hoc fashion. This intuition can be grounded in mathematical facts as follows. While the employed numbering α together with standard numerals induces a well-founded naming relation, the fixed-point term m̄ cannot be constructed uniformly. Recall that if no numeral function is mentioned as a parameter of a naming function, standard numerals are assumed.

(1) The naming relation ◁_α is well-founded;
(2) No diagonal operator which maps Bew(x) to m̄ is uniform for α.
Proof The relation ◁_# is well-founded by Corollary 6.6. Using Fact 6.

We now turn to other codings with built-in diagonalization. Let β be a monotonic numbering of strings such that for each formula ϕ(x) with x free there is a number n_ϕ such that n_ϕ = β(ϕ(n_ϕ)), where the inner n_ϕ denotes the efficient numeral of n_ϕ (e.g., take β to be the numbering gn_1 constructed in [8, Section 5]).
Clearly, β gives rise to an ill-founded naming relation if we use efficient numerals. This is because the diagonal sentence ϕ(n_ϕ) contains its own name n_ϕ. However, β together with standard numerals induces a well-founded relation ◁_β. Yet, no diagonal operator which maps ϕ(x) to n̄_ϕ can be uniform for β.
Proof By Corollary 6.6, ◁_β is well-founded. Assume that d is uniform. By Lemma 5.11 there is a term s such that ϕ(s) ◁_β^* n̄_ϕ. But n̄_ϕ does not contain any standard numeral which is the β-code of an expression. Hence, ϕ(s) ◁_β^* n̄_ϕ cannot hold.
We close by showing that there is a coding with built-in diagonalization which induces a well-founded naming relation and yields a uniform diagonal operator. Let δ be a numbering of the well-formed expressions of L such that for any given ϕ(x) with x free there is a number n_ϕ with n_ϕ = δ(ϕ(x)) and (n_ϕ)² = δ(ϕ(n̄_ϕ × n̄_ϕ)). 17 Set ⌜−⌝ = ·̄ ∘ δ, i.e., the composition of δ with the standard numeral function.
Proof In order to show that ◁ is well-founded, it is sufficient to show that for every expression e and all numbers m, n: e ◁ n̄ and m̄ a substring of e implies m < n. If the antecedent holds, then there is an expression f such that e is a substring of f and ⌜f⌝ is a substring of n̄. By [8, Lemma 6.10] we have m < δ(m̄). Since δ is monotonic, we have m < δ(m̄) ≤ δ(f) ≤ n. Now let any formula ϕ(x) with x free be given. By definition of δ there is a number n_ϕ with n_ϕ = δ(ϕ(x)) and (n_ϕ)² = δ(ϕ(n̄_ϕ × n̄_ϕ)). Let × be the basic meta-linguistic operation which maps two terms s, t to s × t. We then have
Hence, d is uniform. 17 δ can be obtained by slightly tweaking the construction of the numbering in [8, Section 6.2].
If not all diagonal sentences for a formula expressing a property P behave in the same way, we can exclude those diagonal sentences that are not self-referential by the Kreisel-Henkin criterion, assuming we are interested in sentences ascribing P to themselves. Maybe there is no single such sentence, but any diagonal sentence ascribing P to itself must be self-referential.
However, self-referential diagonal sentences ascribing provability to themselves via Kreisel provability predicates still vary in their properties, as they may be provable, refutable, or independent. Refutable Henkin sentences are obtained by plugging a specific term into the formula, a term that happens to yield a self-referential Henkin sentence in virtue of the cunning construction. If the usual diagonal constructions are applied to the provability predicate, provable sentences are obtained. If we are interested in the sentence ascribing provability to itself via this provability predicate, it must be among the provable ones. The refutable ones can only be obtained via a trick very specific to the provability predicate in question. Thus, we single out those self-referential diagonal sentences that have been obtained in a uniform way. This is sufficient to eliminate refutable Henkin sentences, as long as we employ a canonical coding and numeral function.
Of course, appealing to "canonical" codings and numeral functions is as unsatisfactory as appealing to "canonical" diagonal sentences. Hence, we replace this vague condition with a precise condition on the naming relation: Ruling out illfounded naming relations is then sufficient to obtain only provable Henkin sentences from Kreisel-style provability predicates. Generally, the well-foundedness of the naming relation is another constraint for narrowing down the class of diagonal sentences.
We do not maintain that these constraints are the final word. Section 7.1 contains an example hinting at the need for further constraints. However, in some cases our constraints suffice to answer the question about the sentence ascribing some property to itself. Of course, our constraints on diagonal sentences interact with other constraints on the language, the coding, the axiomatization of the theory, and the formula expressing the property. All these interrelate and there is much scope for future work.

a constant function. Hence by induction hypothesis ϕ(s) ◁ ev(q_i)(ϕ) or ϕ(s) ◁^* ev(q_i)(ϕ). Since ev(q_i)(ϕ) is a substring of ev(q)(ϕ), we are done. (c) The case for ev(b_k) = R, where ar(R) ≥ 1, proceeds similarly to (4.b).
The proof is complete since diagonal operators cannot be constant functions.

Appendix B. Uniform Constructions of Deviant Henkin Sentences
This part of the appendix contains several explicit constructions of deviant provability predicates which yield refutable Henkin sentences. Since all of these constructions will rely on the recursion theorem, we start by briefly introducing this important recursion theoretic result.
We start by assigning indices to p.r. functions. 18 We write F_a for the p.r. function with index a. Note that each p.r. function has infinitely many indices. Moreover, the set of indices is p.r. The following recursion theorem for p.r. functions shows that we can construct p.r. functions in a self-referential way, by using their indices in their own definitions. 19

Theorem B.1 (Primitive Recursion Theorem) For every k+1-ary p.r. function G(x⃗, y), there is an index a such that F_a(x⃗) = G(x⃗, a), where x⃗ is a k-tuple of variables.
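The classical proof idea behind Theorem B.1, an s-m-n style self-application, can be sketched in Python, with source strings standing in for indices (a hypothetical illustration of our own, not the paper's formalism; `smn`, `fix` and the example `G` are our names):

```python
def smn(src: str, a: str) -> str:
    """Source-level s-m-n: fix the first argument of a two-argument program."""
    return "lambda x: (" + src + ")(" + repr(a) + ", x)"

def fix(G_src: str) -> str:
    """Kleene-style fixed point: return the source of a program p with
    p(x) == G(p_source, x).  Classical trick: p = smn(q, q), where
    q(y, x) = G(smn(y, y), x)."""
    q = "lambda y, x: (" + G_src + ")(smn(y, y), x)"
    return smn(q, q)

# G uses its own index (here: its own source text) in its definition.
G_src = "lambda p, x: len(p) + x"   # hypothetical example G
p_src = fix(G_src)
p = eval(p_src, {"smn": smn})       # "run" the program with index p_src
assert p(0) == len(p_src)           # p computes G applied to p's own source
```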
In this paper we only consider standard interpretations which are intuitively "effective" (for the definition of a standard interpretation see Section 4). More precisely, we require that for every standard interpretation I there is a recursive function which maps each pair (n, k) of numbers to an index a such that I(f_n^k) = F_a. Before we provide our constructions, we fix some more notation. Let L be given as in Section 4 and let # be a standard elementary numbering of L such that #e > 0 for all well-formed expressions e of L. Let P_i^k be the projection function which maps a k+1-tuple to its i+1-th component (where i ≤ k).
Let the function I : {f_n^k | n ≥ 0, k ≥ 0} → Pr be given by

I(f_n^k) = F_n, if n is the index of a k+1-ary function;
I(f_n^k) = P_0^k, otherwise.
Clearly, I is surjective. For each index a of a unary function, let I_a be the extension of I which maps the function symbol f_0^0 to F_a. Each such I_a is a standard interpretation function of L. We observe:

Fact B.2 For each index a of a unary function, there is an L_0-formula Bew_a(x) which weakly represents Basic(I_a) relative to #. Moreover, there is a p.r. function H which maps each such a to the #-code of Bew_a(x).
We now prove Lemma 4.2 by providing two examples of uniform diagonal operators which satisfy the fixed-point property of Lemma 4.2. The first example employs a canonical numeral function, namely standard numerals, but is based on a contrived numbering. The second example uses a standard numbering, but relies on an artificial numeral function. These examples can be adapted to all uniform diagonal operators introduced in this paper. For the sake of simplicity, however, we base the exposition on diagonal operators which are particularly suitable.

B.1 First Proof of Lemma 4.2
We base the first construction on the diagonal operator d_B introduced in Example 3.5. Similar but slightly more complicated constructions can be given for d_G and d_J.
Clearly, G is p.r. Using Theorem B.1, we find an index a such that F_a(p) = G(p, a), for each p ∈ ω.
We now define a numbering α as follows: The numbering α is injective and elementary. Moreover, Bew_a(x) also weakly represents Basic(I_a) relative to α. We set . By Eq. 2 and the fact that , we have for each ϕ(x) ∈ Fml_x that . Hence, d_B given by is a uniform diagonal operator with respect to α, standard numerals and I_a (see also Definition 4.1 and Example 3.5). Moreover, Bew*(x) is a fixed-point of d_B, i.e., This completes our first proof of Lemma 4.2. Note that while our construction employs the standard numeral function, the numbering α is contrived. We now show that if we leave the numeral function unconstrained, we can construct a deviant provability predicate satisfying Lemma 4.2 for any given standard numbering.

B.2 Second Proof of Lemma 4.2
We base the second construction on Jeroslow's operator d_J introduced in Section 3.2. Once again, similar but slightly more complicated constructions can be given for d_B and d_G.
Let sub_J be f_n^1 for some n such that F_n maps the #-codes of ϕ(x) and t(x) to the #-code of ϕ(t(⌜t(x)⌝_#)). Let z be the #-code of the formula We now define a function G : ω² → ω by setting G(p, q) := ∧_#(z, H(q)). By Theorem B.1, there is an index a such that F_a(p) = G(p, a), for all p ∈ ω. We now define the formula Hence, the mapping ν : ω → ClTerm given by is a numeral function (for I_a). We moreover have that Basic(I_a) ⊢ sub_J(⌜ϕ(x)⌝_{#,ν}, ⌜t(x)⌝_{#,ν}) = ⌜ϕ(t(⌜t(x)⌝_{#,ν}))⌝_{#,ν}.

B.5 A Henkin Sentence Without an Additional Conjunct
We now construct a Henkin sentence which is not of the form χ(x) ∧ Bew(x). Let sub_J be given as in Section B.2. We assume that for each index a of a unary function, Bew_a(x) also satisfies Basic(I_a) ⊢ ¬Bew_a(⌜ϕ⌝_#), for all non-formulas ϕ. Let H be a p.r. function which maps each such a to the #-code of Bew_a(x) (see Fact B.2). Let K be a p.r. function which maps each #-code of ϕ to the #-code of ⌜ϕ⌝_#. Let Sub_# denote the #-tracking function of the substitution function Sub_f defined in Section 3. Let sub_J# denote the #-tracking function of the binary function sub_J introduced in Section 3. Let the function L : ω → ω be given by L(q) := Sub_#(H(q), #f_0^0(x)). We define G : ω² → ω by setting

G(p, q) := #0, if p = Sub_#(L(q), sub_J#(K(L(q)), sub_J#(K(L(q)), #x)));
G(p, q) := p, otherwise.
To sum up, d J (Bew (x)) is a refutable Henkin sentence, where d J is a uniform diagonal operator and the underlying naming relation is given by a standard numbering and numeral function. Hence, in particular, the corresponding naming relation is well-founded.