Six Problems in Pure Inductive Logic

We present six significant open problems in Pure Inductive Logic, together with their background and current status, with the intention of raising awareness and leading ultimately to their resolution.


Introduction
Imagine an agent being asked to assign belief values, which we take to be subjective probabilities, to some events in such a way that this assignment is 'rational'. In the possible absence of any precise definition of what exactly 'rational' means, the agent might formulate guiding principles, such as respecting existing symmetries, which they feel any other like-minded agent would adopt, or at least would not ridicule as 'irrational'. Pure Inductive Logic, PIL, as referred to here and in line with Carnap's original concept, see [1, Page 69], is an attempt to formulate and investigate the consequences of such purportedly rational principles. PIL focuses on a very simple context where events are identified with sentences of a finite relational language which is otherwise completely uninterpreted as far as the agent is concerned, in other words a 'blank sheet'. The justification for such a restriction is that if we cannot come to an answer in this simple case, we should not expect to do better in the much broader settings that philosophers would wish to consider. The monograph [18] describes much of what has been done in the area so far.
To date a number of such principles have been proposed and doubtless more will be forthcoming in the future. If all these principles were consistent with each other it would provide some support for the argument that, in this simple context at least, there is a single rational answer that all truly rational agents will agree on; in other words, support for a special, vacuous, case of Roger White's [24] Rational Uniqueness Thesis that any set of evidence permits only one rationally acceptable attitude towards a given proposition. Unfortunately that is not the case, even for unary languages, see for example [13, 17]. Furthermore even if we adjust our view of what is rational by choosing a subset of the available principles which are compatible with each other, this only exceptionally produces a unique assignment (though for unary languages Johnson's Sufficientness Principle produces Carnap's Continuum of Inductive Methods, which comes close, to within a single parameter λ ∈ [0, ∞]).
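For orientation, the members c_λ of Carnap's Continuum for a unary language with 2^q atoms are determined by the successor probabilities c_λ(α_j(a_{n+1}) | α_{h_1}(a_1) ∧ ... ∧ α_{h_n}(a_n)) = (n_j + λ/2^q)/(n + λ), where n_j is the number of the h_i equal to j. A minimal sketch (the function name is ours):

```python
def carnap_successor(lam, q, evidence, j):
    """c_lambda's probability that a_{n+1} satisfies atom j, given the
    evidence: a list of the atom indices h_1,...,h_n observed so far."""
    n = len(evidence)
    n_j = evidence.count(j)          # occurrences of atom j in the evidence
    return (n_j + lam / 2 ** q) / (n + lam)
```

For λ > 0 these values sum to 1 over the 2^q atoms, and as λ grows the influence of the evidence shrinks towards the uniform value 2^{-q}.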
In order to throw more light on these issues we need to gain a better understanding of the consequences of some of these principles, how they relate to each other and the structures of the probability functions satisfying them. The six problems described in this paper are in our opinion central to this programme.
In some cases, for example Problems 2 and 4, much effort has already been expended on solving them with only fragmentary success, whilst others, for example Problems 1 and 6, have only arisen comparatively recently and may in the event succumb with less resistance. In all cases however the solutions to these questions would significantly enhance our understanding of PIL and in turn suggest further intriguing avenues of research, both mathematical and philosophical.
For each of these problems we will explain (or reference) the necessary relevant concepts and related results. First however we will briefly cover some background and motivation which is common to all.
Throughout, L, possibly with prefixes, will denote a first order language with finitely many relation symbols R_1, R_2, ..., R_q, of arities r_1, r_2, ..., r_q respectively, and countably many constant symbols a_i, i ∈ N^+ = {1, 2, 3, ...}, which we intend to name all members of the domain. With the exception of Problem 6, L will include neither equality nor any function symbols.
We let SL/FL denote the set of first order sentences/formulae of L and QFSL/QFFL the quantifier free versions. For θ ∈ SL and ε ∈ {0, 1} we set θ^ε to be θ if ε = 1 and ¬θ if ε = 0. Since we will very largely only be interested in semantics rather than syntax we will take the liberty of treating logically equivalent sentences and formulae as being actually equal.
For b_1, ..., b_n distinct members from the set of constant symbols we say that a sentence of the form

⋀_{i=1}^q ⋀_{j_1, ..., j_{r_i} = 1}^n R_i(b_{j_1}, ..., b_{j_{r_i}})^{ε_i(j_1, ..., j_{r_i})}   (1)

is a state description for b_1, ..., b_n, where the ε_i(j_1, ..., j_{r_i}) ∈ {0, 1}. In other words a state description for b_1, ..., b_n specifies for each of the R_i and each choice b_{j_1}, ..., b_{j_{r_i}} of constants (not necessarily distinct) from amongst the b_1, ..., b_n whether or not R_i holds for this tuple. We shall use upper case Greek letters to denote state descriptions. We say that a function w : SL → [0, 1] is a probability function on SL if it satisfies, for all θ, φ, ∃x ψ(x) ∈ SL:

(P1) If ⊨ θ then w(θ) = 1.
(P2) If ⊨ ¬(θ ∧ φ) then w(θ ∨ φ) = w(θ) + w(φ).
(P3) w(∃x ψ(x)) = lim_{n→∞} w(ψ(a_1) ∨ ψ(a_2) ∨ ... ∨ ψ(a_n)).

From (P1-3) all the 'expected' properties of a probability function follow, see for example [18, Prop. 3.1].
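A state description thus amounts to one binary choice per relation symbol per tuple, so there are 2 raised to the sum of the n^{r_i} state descriptions for b_1, ..., b_n; a quick sketch of this count (our own encoding):

```python
def num_state_descriptions(arities, n):
    """Number of state descriptions for n constants in a language whose
    relation symbols have the given arities: one bit per relation per
    (not necessarily distinct) tuple of constants."""
    return 2 ** sum(n ** r for r in arities)
```

For instance a single binary relation already gives 2^{n^2} state descriptions for n constants, which is one reason polyadic PIL is so much harder than the unary case.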
Given a probability function w on SL and φ ∈ SL with w(φ) > 0 we define, as usual, the conditional probability function w(· | φ) on SL by

w(θ | φ) = w(θ ∧ φ) / w(φ).

A useful device which we shall employ throughout this paper, and which circumvents the problems when w(φ) may be zero, is identifying an assertion such as w(θ | φ) ≥ w(θ) with w(θ ∧ φ) ≥ w(θ) · w(φ), in other words multiplying out by the potentially zero denominator(s). PIL is primarily concerned with the issue of understanding and investigating what it might mean for a probability function w on SL to be rational (or logical) in the circumstance that no intended particular interpretation or meaning is given to the relation and constant symbols. To date the method adopted to glean such insight has been to propose various principles which w might arguably be expected to obey if it is to somehow warrant the description 'rational' and then to investigate the further constraints on w that this imposes. Of particular value here are representation theorems which describe the family of w satisfying a principle in a simple way, usually in terms of convex mixtures of some comparatively elementary probability functions. Several of the problems presented here are aimed at providing some such results.

The Strength of Constant Exchangeability
The one principle which is almost invariably assumed in the context of PIL is:
The Constant Exchangeability Principle, Ex
For θ(a_1, ..., a_n) ∈ SL and (distinct) constants a_{i_1}, ..., a_{i_n},

w(θ(a_{i_1}, ..., a_{i_n})) = w(θ(a_1, ..., a_n)).

While this widespread acceptance is primarily based on the principle's evident, one might even say undeniable, rationality, Ex also has an extremely useful consequence in the following principle (where b, c, d are tuples of constants and b, c have the same length):

The Principle of Instantial Relevance, PIR
For θ(b, d), φ(d) ∈ SL with w(φ(d) ∧ θ(c, d)) > 0, and b, c, d having (pairwise) no constants in common,

w(θ(b, d) | θ(c, d) ∧ φ(d)) ≥ w(θ(b, d) | φ(d)).   (2)

This version of PIR as given here is in some respects more general than that originally proved by Gaifman, under the name Nonnegative Instantial Relevance, in [6] (see also [10] and [23, Footnote 7]) where the language was unary, b, c had length 1 and d was not an argument of θ.
However, assuming that w satisfies Ex, the form given here follows directly from it: if w is a probability function for a general language L with θ, φ as above and we define v on state descriptions of the unary language L_P with the single unary relation symbol P by

v( ⋀_{m=1}^n P^{ε_m}(a_m) ) = w( ⋀_{m=1}^n θ^{ε_m}(b_m, d) ∧ φ(d) ) / w(φ(d)),

where the b_m are tuples of constants (with b = b_1, c = b_2) disjoint from each other and from d, then v extends to a probability function on SL_P also satisfying Ex and hence the special, unary, form of PIR. Consequently v(P(a_2) ∧ P(a_1)) ≥ v(P(a_2)) · v(P(a_1)), and Eq. 2 follows.
The fact that Ex implies PIR suggests the following wider question:

Problem 1
Under what conditions on ψ, ξ ∈ SL must we have that

w(ψ ∧ ξ) ≥ w(ψ) · w(ξ)   (3)

for all probability functions w on SL satisfying Ex?
In other words under what conditions on ψ, ξ is the value given to ψ by a probability function w on SL satisfying Ex enhanced (at least not decreased) by conditioning on ξ ?
A sufficient condition here of course is when ψ, ξ are of the form given in PIR (that is, ψ = θ(b, d), ξ = θ(c, d)). Another, as one can readily check, is when ψ ⊨ ξ or ξ ⊨ ψ. But does this essentially exhaust the possibilities or are there genuine further principles here like PIR still awaiting discovery?
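In the multiplied-out form w(ψ ∧ ξ) ≥ w(ψ) · w(ξ), the entailment cases can be checked in one line, with no use of Ex even needed:

```latex
\xi \models \psi \;\Longrightarrow\; \psi\wedge\xi \equiv \xi
  \;\Longrightarrow\; w(\psi\wedge\xi) = w(\xi) \geq w(\psi)\,w(\xi),
\qquad
\psi \models \xi \;\Longrightarrow\; \psi\wedge\xi \equiv \psi
  \;\Longrightarrow\; w(\psi\wedge\xi) = w(\psi) \geq w(\psi)\,w(\xi),
```

the final inequalities holding simply because w(ψ), w(ξ) ≤ 1.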
Several points are worth mentioning here. Firstly PIR itself, even with the additional assumption of the Constant Irrelevance Principle, IP (see [18, Page 52]), does not imply Ex, see e.g. [20]. (However, as also shown in [20], with the further assumption of the Regularity Principle (see [18, Page 61]) PIR with IP does imply Ex. Whilst IP is a very strong principle and plays a vital role in the proof, at this point it has not been ruled out that the same result could be shown even without the assumption of IP.) Secondly if ψ = ψ(a_1) and ξ = ξ(a_1), that is ψ and ξ mention just one and the same constant, then Eq. 3 holds just if ξ(a_1) ⊨ ψ(a_1) or ψ(a_1) ⊨ ξ(a_1). To show this suppose neither held and let M_1, M_2 be structures for L with universe {a_1, a_2, a_3, ...} such that M_1 ⊨ ψ(a_1) ∧ ¬ξ(a_1) and M_2 ⊨ ξ(a_1) ∧ ¬ψ(a_1), and let V = (V_{M_1} + V_{M_2})/2, where V_M(θ) = 1 if M ⊨ θ and V_M(θ) = 0 otherwise. Then V(ψ(a_1) ∧ ξ(a_1)) = 0 < 1/4 = V(ψ(a_1)) · V(ξ(a_1)).
Unfortunately this is not yet enough to refute (3) since V may not satisfy Ex.
However a suitable averaging of V over permutations of the constants yields a probability function V* on SL which does satisfy Ex and which still fails (3). Unfortunately a similar approach does not seem to work when we have more than one constant.
At this obstacle then one might feel tempted, in the light of PIR, into trying the simple special case of showing that w(ψ(a_2) | ξ(a_1)) ≥ w(ψ(a_2)) when ξ(a_1) ⊨ ψ(a_1), whenever w satisfies Ex. Unfortunately there are counter-examples to this. Precisely, for a unary language L with at least two predicate symbols there are probability functions w satisfying Ex but failing to satisfy this Generalised Principle of Instantial Relevance (see [13], [18, Chapter 18]).
By [14, Theorem 1] such a w gives rise to a probability function v satisfying Ex, and the failure of the Generalised Principle of Instantial Relevance for w translates into the failure of the above inequality for v.

Characterizing the Probability Functions Satisfying SDSAP
Since the early work by Carnap in [2] and Carnap-Stegmüller in [3] there has been a string of papers (for more details see [7, 8]) aimed at capturing and explicating the idea of 'analogical support' within the wider context of Inductive Logic. One particular idea, which we will now describe, is formalized within Unary Inductive Logic, that is where all the relation symbols R_1, ..., R_q of the language L are unary. For this language the atoms of L are the 2^q formulae

α(x) = R_1^{ε_1}(x) ∧ R_2^{ε_2}(x) ∧ ... ∧ R_q^{ε_q}(x)

where the ε_s ∈ {0, 1}, and every state description for b_1, ..., b_n is (equivalent to one) of the form ⋀_{i=1}^n α_{h_i}(b_i) where the 1 ≤ h_1, ..., h_n ≤ 2^q. So a state description for b_1, ..., b_n tells us exactly which of the R_j(b_i) hold for each i = 1, ..., n and j = 1, ..., q. We define the distance |α_i(x) − α_j(x)| between atoms α_i(x), α_j(x) given respectively by ε_1, ..., ε_q and δ_1, ..., δ_q by

|α_i(x) − α_j(x)| = |{s : ε_s ≠ δ_s}|,

that is, as the number of predicate symbols on which they differ. The aforementioned idea is that given 'background evidence' φ(a_1, ..., a_n) ∈ QFSL the extent to which further evidence α_i(a_{n+1}) provides analogical support for α_j(a_{n+2}) decreases as the distance |α_i(x) − α_j(x)| increases. There are various ways that this might be formalized but the most appropriate in the context seems to be the following principle, SAP: whenever |α_j(x) − α_i(x)| ≤ |α_j(x) − α_k(x)|,

w(α_j(a_{n+2}) | α_i(a_{n+1}) ∧ φ(a_1, a_2, ..., a_n)) ≥ w(α_j(a_{n+2}) | α_k(a_{n+1}) ∧ φ(a_1, a_2, ..., a_n))

for any consistent φ(a_1, a_2, ..., a_n) ∈ QFSL.
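Concretely, identifying each atom with its sign vector (ε_1, ..., ε_q), the distance between atoms is just the Hamming distance between these vectors; a sketch (the encoding and names are ours):

```python
from itertools import product

def atoms(q):
    """The 2**q atoms of a unary language with predicates R_1,...,R_q,
    each represented by its sign vector (eps_1,...,eps_q)."""
    return list(product([0, 1], repeat=q))

def distance(alpha, beta):
    """Number of predicate symbols on which the two atoms differ."""
    return sum(e != d for e, d in zip(alpha, beta))
```

So for q = 3 the atoms form the vertices of a cube and the distance is the usual graph distance on it, ranging from 0 to q.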
In [8] the status of this principle was investigated under the additional assumptions that w satisfies Constant Exchangeability (see Section 2), Predicate Exchangeability and Strong Negation, where for the unary language we are currently considering:

The Predicate Exchangeability Principle, Px
For predicate symbols R_i, R_j of L and θ ∈ SL,

w(θ) = w(θ'),

where θ' is the result of transposing R_i and R_j throughout θ.

The Strong Negation Principle, SN
For R_i a predicate symbol of L and θ ∈ SL,

w(θ) = w(θ'),

where θ' is the result of replacing R_i by ¬R_i throughout θ.

Both of Px and SN are natural assumptions in this context in that the distance between atoms is invariant under these transformations. Indeed the converse holds: any permutation of atoms which preserves distances is a composition of permutations licensed by Px and SN, see [8, Theorem 2].
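Writing atoms as sign vectors, the invariance of the distance is immediate: Px permutes the coordinates of both vectors simultaneously by some π, and SN flips one fixed coordinate i in both, neither of which changes the number of coordinates on which the vectors disagree (the primed notation here is ours):

```latex
|\alpha_\pi - \beta_\pi| \;=\; |\{s : \varepsilon_{\pi(s)} \neq \delta_{\pi(s)}\}|
  \;=\; |\{s : \varepsilon_s \neq \delta_s\}| \;=\; |\alpha - \beta|,
\qquad
|\alpha' - \beta'| \;=\; |\{s : \varepsilon'_s \neq \delta'_s\}| \;=\; |\alpha - \beta|,
```

where α_π, β_π denote the atoms with coordinates permuted by π and α′, β′ the atoms with the i-th coordinate flipped in both, since flipping a coordinate in both vectors preserves whether they agree on it.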
By the note following Proposition 4 from [8], if q = 1 (i.e. the unary language with just one predicate symbol) then SAP holds for all probability functions satisfying Ex+Px+SN with the exception of convex combinations of the c_0, c_∞ from Carnap's Continuum for this language. When q = 2 however the situation changes radically: as shown in the note following the proof of Theorem 11 of [8], there are now only a handful of probability functions on SL satisfying Ex+Px+SN+SAP, and these seem to otherwise lack any obviously attractive features. Finally it is shown in [8] that once q ≥ 3 there are no probability functions satisfying Ex+Px+SN+SAP.
A key feature of these proofs from [8] however is that they make use of the free choice of background evidence φ(a_1, ..., a_n) ∈ QFSL rather than restricting it to being a state description, as Carnap and most of the subsequent discussion had assumed. Taking SDSAP to be SAP as above but with φ(a_1, ..., a_n) restricted to being a state description perhaps addresses analogical influence just as well and leads us to:

Problem 2
Characterise the probability functions on the unary language L satisfying Ex+Px+SN and SDSAP.

Given that the conditioning method employed in the SAP case apparently can no longer be used here, the obvious alternative strategy is to assemble a collection of probability functions satisfying SDSAP, together with operations on these that preserve SDSAP, and then show that every probability function satisfying SDSAP must arise in this way.
Concerning the first step in this approach, finding probability functions satisfying SDSAP and operations preserving them, [4] gives some examples for the case q = 2 though unfortunately their very disparity seems to provide little encouragement for proceeding to the second step and showing that these cover all the possibilities for this language. Furthermore for q ≥ 3 we currently know of no probability functions satisfying SDSAP (with Ex+Px+SN) and we might even be led to hazard the conjecture that in fact there are none.
With hindsight a more reasonable analogy principle than SDSAP might be that for any state description Θ(a_1, a_2, ..., a_n), whenever Δ(i, j) ⊋ Δ(i, k),

w(α_i(a_{n+2}) | α_j(a_{n+1}) ∧ Θ(a_1, a_2, ..., a_n)) > w(α_i(a_{n+2}) | α_k(a_{n+1}) ∧ Θ(a_1, a_2, ..., a_n)),   (7)

where Δ(i, j) is the set of s ∈ {1, 2, ..., q} for which α_i(x), α_j(x) give the same parity to R_s. With this revision we can find probability functions satisfying the principle for all q. For example let v be a probability function on the sentences of the unary language with a single predicate symbol P satisfying Ex+SN and define w on state descriptions of L from v as in [7]. In this case w will satisfy Ex+Px+SN and Eq. 7, see [7, Proposition 41]. A full representation theorem for the probability functions satisfying these conditions currently awaits explication. It is perhaps worth pointing out here, however, that, as shown in [5], if we replace the strict inequality in Eq. 7 by non-strict then for q ≥ 2 the probability functions satisfying this condition together with Ex, Atom Exchangeability and Regularity (see Sections 4, 5 respectively, or [18] for definitions) are precisely the members c_λ for 0 < λ ≤ ∞ of Carnap's Continuum of Inductive Methods (so arguably giving an alternative characterization of this continuum in terms of 'analogy').

The Principle of Induction
There are various ways in which to capture the desideratum that the probability given to an event by a rational probability function should reflect how often this event has occurred in the past. One instance of this is PIR, which we considered in Section 2. Another possible formulation is the Principle of Induction, PI, which has been proposed primarily for probability functions that already satisfy the Principle of Spectrum Exchangeability. To explain it, we need some definitions and notation. Let Θ(b_1, ..., b_n) be a state description as in Eq. 1. The equivalence relation ∼_Θ on {b_1, b_2, ..., b_n} is defined as follows: b_s ∼_Θ b_t just when for all i ∈ {1, ..., q}, 0 ≤ u ≤ r_i − 1 and not necessarily distinct k_1, ..., k_u, k_{u+2}, ..., k_{r_i} from {1, 2, ..., n},

ε_i(k_1, ..., k_u, s, k_{u+2}, ..., k_{r_i}) = ε_i(k_1, ..., k_u, t, k_{u+2}, ..., k_{r_i}).
In other words, if the equality relation were added to the language, b_s = b_t would be consistent with Θ(b_1, ..., b_n) (plus the axioms of equality).
Let E(Θ) denote the set of equivalence classes of ∼_Θ. The Spectrum of Θ, S(Θ), is defined to be the multiset of sizes of the equivalence classes in E(Θ).

The Spectrum Exchangeability Principle, Sx
For state descriptions Θ(b_1, b_2, ..., b_n) and Φ(b_1, b_2, ..., b_n), if S(Θ) = S(Φ) then w(Θ) = w(Φ).

We remark that for a unary language and in the presence of Ex, Sx is equivalent to the Principle of Atom Exchangeability, Ax, which, combined with Ex, says that the probability of a state description depends only on the multiset of the numbers of occurrences of the individual atoms in it. That is,

w(α_{h_1}(a_1) ∧ α_{h_2}(a_2) ∧ ... ∧ α_{h_n}(a_n))

depends only on the multiset {|{i : h_i = j}| : 1 ≤ j ≤ 2^q}. The Principle of Induction has been proved for probability functions that satisfy the following principle:
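To make the spectrum concrete, consider a language with a single binary relation R, so that a state description for b_1, ..., b_n is determined by the n×n 0/1 array ε with ε[s][t] recording whether R(b_s, b_t) holds; in the unary case the spectrum reduces to exactly the multiset of atom counts that Ax speaks of. A sketch under this encoding (the names are ours):

```python
from collections import Counter

def equivalent(eps, s, t):
    """b_s ~ b_t for a single binary relation: substituting t for s in
    either argument slot never changes the truth value recorded in eps."""
    n = len(eps)
    return all(eps[s][k] == eps[t][k] and eps[k][s] == eps[k][t]
               for k in range(n))

def spectrum(eps):
    """The spectrum: multiset (as a sorted list) of ~-equivalence class sizes."""
    classes = []
    for s in range(len(eps)):
        for cls in classes:
            if equivalent(eps, s, cls[0]):  # ~ is an equivalence relation,
                cls.append(s)              # so one representative suffices
                break
        else:
            classes.append([s])
    return sorted(len(c) for c in classes)

def atom_signature(atom_indices):
    """Unary analogue: multiset of atom counts, the invariant Ax refers to."""
    return sorted(Counter(atom_indices).values())
```

Sx then says that w must assign the same probability to any two state descriptions with the same spectrum.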

Language Invariance with Sx, Li+Sx
A probability function w for a language L satisfies Language Invariance with Sx if there is a family of probability functions w_{L'}, one on each (finite, possibly polyadic) language L', each satisfying Sx, such that w_L = w and whenever L' ⊆ L'', w_{L'} = w_{L''} ↾ SL' (that is, w_{L''} restricted to SL' agrees with w_{L'}).
It has also been proved for homogeneous probability functions satisfying Sx, where homogeneity of a probability function w means that for all t,

lim_{n→∞} Σ_{|S(Θ(a_1, a_2, ..., a_n))| = t} w(Θ(a_1, a_2, ..., a_n)) = 0,

the sum being over all state descriptions Θ(a_1, a_2, ..., a_n) with spectrum of size t.
It is not known if PI holds for t-heterogeneous probability functions. In view of the Ladder Theorem (see [11] or [18,Chapter 30]), which states that any probability function satisfying Sx can be expressed as a convex combination of homogeneous and t-heterogeneous functions (with t ∈ N + ), this question represents the missing mosaic stone needed to complete the Sx-and-PI picture.

Problem 3
Show that PI holds for any t-heterogeneous probability function satisfying Sx.

An Ultimate Symmetry Principle
There are various principles in Pure Inductive Logic based on the idea that it is rational to respect symmetry when assigning beliefs, in particular the above mentioned principles of Constant and Predicate Exchangeability (Ex, Px) and Strong Negation (SN), and of Variable Exchangeability (Vx), see [18]. It is natural then to ask if there is a common source of these principles. Understanding symmetry as that which exists between a structure and its image under an automorphism has led to the formulation of the Invariance Principle, INV, which does provide such a source, as we now explain. (For more details see [15, 16], [18, Chapters 23, 38].)
Let T be the set of structures for L each with universe {a_1, a_2, a_3, ...}, with the convention that the constant a_i of L is interpreted in M ∈ T by the element a_i ∈ M. For θ ∈ SL let [θ] = {M ∈ T | M ⊨ θ}, and let BL be the two-sorted structure consisting of T together with the sets [θ] for θ ∈ SL, equipped with the membership relation between them. An automorphism η of BL is a bijection of T onto itself such that for each θ ∈ SL there is a ψ ∈ SL such that

η[θ] = [ψ],   (8)

and also for each ψ ∈ SL there is a θ ∈ SL satisfying (8).
We write ηθ or η(θ) for the sentence ψ ∈ SL (up to logical equivalence) for which η[θ] = [ψ]. We remark that η induces an automorphism of the Lindenbaum algebra of SL, but not all automorphisms of the Lindenbaum algebra of SL arise in this way, see [21]. Now imagine an agent whose task it is to assign probabilities w(θ) to the θ ∈ SL in a rational way and who knows that he is inhabiting one of the M in T. He is aware of BL but knows nothing about which particular M from T he is in. He tries to judge how probable it is that θ holds in his particular M. Then it appears rational for him to propose the same value for any two sentences that look the same within BL in the sense that one is the image of the other under an automorphism of BL. This leads to:

The Invariance Principle, INV
For θ ∈ SL and η an automorphism of BL, w(θ) = w(ηθ).
INV covers other symmetry principles in the sense that restricting the automorphisms used in INV to certain special classes produces many individual well-established symmetry principles: for example the above mentioned Ex, Px, SN, Vx.
However, in its generality, INV is a strong principle and it is natural to wonder if there are any probability functions that satisfy it.
One such can be described as follows. Assuming as usual that the relation symbols of L are R_1, R_2, ..., R_q with arities r_1, ..., r_q respectively, the atoms of the Boolean Algebra of sets [θ] in BL are (just) the singleton sets

[ ⋀_{i=1}^q ∀x_1, ..., x_{r_i} R_i(x_1, ..., x_{r_i})^{ε_i} ]

where the ε_i ∈ {0, 1}. The probability function c_0^L defined by

c_0^L( ⋀_{i=1}^q ∀x_1, ..., x_{r_i} R_i(x_1, ..., x_{r_i})^{ε_i} ) = 2^{-q}

for each choice of the ε_i ∈ {0, 1} does satisfy INV.
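In the unary case c_0^L is just the λ = 0 member of Carnap's Continuum: it concentrates, with weight 2^{-q} each, on the 2^q structures in which every individual satisfies one and the same atom. A sketch of its values on unary state descriptions (encoding atoms by indices as before; names ours):

```python
def c0_state_description(q, atom_indices):
    """Value of c_0 on the unary state description assigning atom h_i to a_i:
    2**-q if all constants receive the same atom, and 0 otherwise."""
    if len(set(atom_indices)) == 1:
        return 2.0 ** -q
    return 0.0
```

In particular every state description mentioning two distinct atoms gets probability 0, which is one reason c_0^L is regarded as a 'trivial' witness for INV.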
In the case of L containing just unary predicates, c_0^L is the only probability function on SL that satisfies INV, see [15] or [18, Chapter 23]. The proof that this is the case relies heavily on the fact that for a unary language, any sentence θ (mentioning only constants included amongst a_{k_1}, a_{k_2}, ..., a_{k_m}) is logically equivalent to a disjunction of finitely many sentences of the form

Θ(a_{k_1}, ..., a_{k_m}) ∧ ⋀_{j=1}^{2^q} (∃x α_j(x))^{ε_j}

with Θ a state description and the ε_j ∈ {0, 1}. Using this, it is possible to demonstrate the existence of enough automorphisms to force such restrictions on any potential candidate for a probability function satisfying INV as to rule them all out, except c_0^L. The unary automorphisms can be modified to derive some information about the general non-unary case, too; for example to demonstrate that under INV certain consistent sentences must have probability 0 (and hence that INV and the Principle of Super Regularity are incompatible), see [22] and [18, Chapter 40]. However, the following tantalising problem remains unanswered:

Problem 4
Are there non-trivial (i.e. not c_0^L) probability functions satisfying INV?
The difficulty lies in our present relative ignorance regarding possible automorphisms of BL in the general case.
In the unary case, special classes of automorphisms either lead to previously well known symmetry principles or 'go too far' by eliminating too many probability functions, see [15] or [18]. In the polyadic case, we find that one special class of relatively well understood and well behaved automorphisms (those that 'permute state formulae') does lead to a new symmetry principle satisfied by numerous interesting probability functions. The next section will focus on the problem of representing all the probability functions satisfying this new principle. Notwithstanding, Problem 4 requires us to understand more deeply than we currently do what automorphisms other than those permuting state formulae there can be.

Let r = max{r_1, ..., r_q}. A state formula for x_1, ..., x_n is defined exactly as a state description in Eq. 1 but with the (distinct) variables x_1, ..., x_n in place of the constants b_1, ..., b_n. An atom of L is a state formula for the r variables x_1, ..., x_r. Note that the definition of atom given above just for a unary language is in agreement with this general definition. An atom is thus determined by the corresponding map specifying, for each i and each r_i-tuple of variables from amongst x_1, ..., x_r, whether or not R_i holds of that tuple.

Representation Theorem(s) for the Permutation Invariance Principle
It can be shown that those automorphisms η of BL that map state descriptions to state descriptions have a uniform structure, see [16, 18, 22]. Namely, in that case there is a permutation of atoms that generates the automorphism in the sense we will shortly explain. First we need to introduce some notation.
We say that a permutation σ of the atoms of L satisfies condition (C), a compatibility condition whose precise formulation we omit here (see [16, 18, 22]), if it respects the ways in which atoms can consistently extend state formulae. The result which we have referred to above can now be stated. If η is an automorphism of BL that maps each state description to a state description then there is a unique permutation σ of the atoms satisfying (C) such that for all atoms α and (distinct) constants b_1, ..., b_r,

η(α(b_1, ..., b_r)) = (σα)(b_1, ..., b_r).   (10)

Conversely, if σ is a permutation of the atoms satisfying (C) then there is a unique automorphism η of BL such that Eq. 10 holds.
These automorphisms of BL appear to be the 'more reasonable' ones since they respect state descriptions. They are referred to as automorphisms that permute state formulae, see [16, 18]. Any automorphism of BL that arises (in the obvious way) for example via permuting the constants, permuting relations of the same arity, exchanging the roles of R and ¬R for some relation symbol R, or permuting the order of arguments within a relation symbol, permutes state formulae.
Restricting the automorphisms in INV to those that permute state formulae leads to:

The Permutation Invariance Principle, PIP
If η is an automorphism of BL that permutes state formulae then w(θ) = w(ηθ) for θ ∈ SL.
When L is unary, PIP is equivalent to the well known Atom Exchangeability Principle Ax, which, if Ex is assumed, is also the unary form of Sx. However, for polyadic languages PIP is genuinely a new principle. Apart from its justification inherited from INV, it has two other quite different claims to rationality: as shown in [19] and [16] (or [18]) respectively, it can be expressed equivalently as the Translation Invariance Principle TIP and as Nathanial's Invariance Principle NIP.
TIP asserts that any sentence should get the same probability as its 'translations', see [19]. NIP asserts that state descriptions with the same 'structures', see [16,18], should have the same probabilities. It follows that PIP is an uncommonly well recommended principle of Pure Inductive Logic. It is less restrictive than Sx since any function satisfying Sx has to satisfy PIP (see [18,Corollary 40.3]) but not conversely. Naturally, the problem arises of finding a representation theorem for PIP.

Problem 5
Characterise the probability functions satisfying PIP.
In [18, Chapter 42] a class of probability functions u^{p̄,L}_E is defined and it is conjectured there that some modifications of these functions would yield such a representation theorem. However, the example with which we now conclude this section appears to render the conjecture wrong, so the problem is wide open.
Consider the language L with one binary predicate R. For M ∈ T, let V_M be the probability function on SL such that V_M(θ) = 1 if M ⊨ θ and V_M(θ) = 0 otherwise.

Adding Functions to the Language of Pure Inductive Logic
As explained in [9] there have been to date few attempts, and perhaps too scant philosophical motivation, to expand the language and methods of Inductive Logic to include function symbols (and at the same time, equality). In particular the breadth of probability functions on such languages which satisfy the most basic of the symmetry principles, Constant Exchangeability, Ex, had apparently remained a mystery up till that point. That paper however initiated such a development by providing a representation theorem for the probability functions on the language L with equality and a single unary function symbol F which satisfied Ex. We shall briefly describe the form of this theorem in what follows but the immediate problem it leads to is:

Problem 6
Give a Representation Theorem for the probability functions satisfying Ex on a language for Pure Inductive Logic with the equality relation and finitely many (possibly polyadic) function symbols added.
What we are seeking here is a theorem in the style of de Finetti's Representation Theorem. In other words, we wish to describe every such function as a convex mixture of basic probability functions of some elementary form (and necessarily conversely).
The simplest case here, of L having one unary function symbol F , the equality symbol and no relation symbols, was considered in [9] and we now briefly outline the main constructs and result of that paper.
In the case of the main theorem of [9] these basic functions, denoted v_{g,h}, are specified in terms of two functions g : Z → {r ∈ R | r ≥ 0} and h : S → S, where S = {j ∈ Z | g(j) > 0}, satisfying that (i) Σ_{j ∈ Z} g(j) = 1 and, for n ∈ N^+, g(n + 1) ≤ g(n).
In the style of the representation theorems in [12] we think of the j ∈ Z as colours, with 0 being black, and g(j ) being the probability of picking colour j .
In terms of Problem 6 it would be reasonable to suppose that a construction along similar lines might work in the case of more functions and additional relations. Actually, mimicking ideas in [12], adding further relations would seem to be straightforward (it was left out of [9] in order not to make that paper any more complicated than it already was), and similarly adding further unary functions. The main challenge will surely be to add binary, ternary, etc., functions, not so much in guessing what the basic functions might be but in then proving that they do the job.