Determining Maximal Entropy Functions for Objective Bayesian Inductive Logic

According to the objective Bayesian approach to inductive logic, premisses inductively entail a conclusion just when every probability function with maximal entropy, from all those that satisfy the premisses, satisfies the conclusion. When premisses and conclusion are constraints on probabilities of sentences of a first-order predicate language, however, it is by no means obvious how to determine these maximal entropy functions. This paper makes progress on the problem in the following ways. Firstly, we introduce the concept of a limit in entropy and show that, if the set of probability functions satisfying the premisses contains a limit in entropy, then this limit point is unique and is the maximal entropy probability function. Next, we turn to the special case in which the premisses are categorical sentences of the logical language. We show that if the uniform probability function gives the premisses positive probability, then the maximal entropy function can be found by simply conditionalising this uniform prior on the premisses. We generalise our results to demonstrate agreement between the maximal entropy approach and Jeffrey conditionalisation in the case in which there is a single premiss that specifies the probability of a sentence of the language. We show that, after learning such a premiss, certain inferences are preserved, namely inferences to inductive tautologies. Finally, we consider potential pathologies of the approach: we explore the extent to which the maximal entropy approach is invariant under permutations of the constants of the language, and we discuss some cases in which there is no maximal entropy probability function.


Introduction
Inference under uncertainty remains one of the challenges of our time. While there is widespread agreement that probabilities are well suited to capture uncertainty and that Bayesian and Jeffrey conditionalisation are key principles of rationality, there is significant disagreement about the proper choice of probabilities and their use. One prominent approach to uncertain inference appeals to the Maximum Entropy Principle of Jaynes (1957). This selects a probability function, from all those that agree with the available evidence, that is as equivocal as possible in the sense that it has maximum Shannon entropy (Shannon, 1948). The Maximum Entropy Principle is often employed as part of an objective Bayesian approach to inference (Jaynes, 2003; Williamson, 2010).
The use of the Maximum Entropy Principle on finite domains is well understood. A number of axiomatic characterisations highlight some of its most important properties, such as irrelevance of extraneous information, independence in the absence of evidence of dependence, and invariance under uniform refinements of the underlying finite domain (Paris and Vencovská, 1990, 1997; Paris, 1994, 1998). Furthermore, MaxEnt inference is known to agree on finite domains with what one might call 'baseline rationality': Bayesian and Jeffrey conditionalisation turn out to be special cases of MaxEnt inference (Williams, 1980). While Jeffrey conditionalisation can only deal with a single uncertain premiss at a time, of the form $P(F) = c$, MaxEnt inference can handle multiple uncertain premisses of more complex forms simultaneously. Given a fixed finite domain and premisses of a suitable form, MaxEnt inference introduces an objective relation between premisses and conclusions, independent of the inferring agent. This objectivity facilitates the implementation of MaxEnt inferences in algorithms and automated systems.
The application of MaxEnt to infinite domains is much less well understood. Firstly, axiomatic characterisations have yet to be put forward. Secondly, MaxEnt inference is only known to agree with Jeffrey conditionalisation on certain infinite domains that lack a logical structure (Caticha and Giffin, 2006). The focus of this paper is to shed some light on the application of MaxEnt to infinite domains, and in particular on its use as semantics for objective Bayesian inductive logic on infinite predicate languages.
There are two different explications of MaxEnt on infinite predicate languages. One, due to Jeff Paris and his co-workers, takes limits of maximum entropy functions on finite sublanguages (Barnett and Paris, 2008; Rafiee Rad, 2009; Paris and Rafiee Rad, 2010; Rafiee Rad, 2018, 2021). The second explication considers maximal entropy probability functions defined on the infinite language as a whole (Williamson, 2008; Landes and Williamson, 2015; Williamson, 2017; Rafiee Rad, 2017; Landes, 2021a). The limit approach provides a means to determine the probabilities for MaxEnt inference. However, this construction has problems: in some cases, it does not yield an answer at all (Rafiee Rad, 2009; Paris and Rafiee Rad, 2010); in other cases the constructed probabilities fail to satisfy the given premisses (Landes, 2021b). The maximal entropy approach can be used in a wider range of situations (Rafiee Rad, 2009, 2017), but it is less constructive and it is less clear how to determine maximal entropy probability functions. It has, however, been conjectured that the two approaches agree wherever the limit approach is well defined (Williamson, 2017; Landes et al., 2021).
In this paper we study the second of these two approaches: the maximal entropy approach. We first give a method for determining the maximal entropy probability function in many general scenarios, by introducing the concept of an entropy limit point (Theorem 12). Then we show that the approach generalises both Bayesian conditionalisation (Theorem 30) and Jeffrey conditionalisation (Theorem 37). This not only clarifies which probabilities the maximal entropy approach picks out, but also gives a simple way to determine these probabilities and shows that the maximal entropy approach agrees with baseline rationality.
We then turn to general features of the maximal entropy approach. We see that certain inferences drawn in the absence of any premisses, namely inferences to inductive tautologies, are preserved when a premiss is added (§7). We show that while the notion of comparative entropy used to define the maximal entropy probability functions can depend on the order of the constant symbols (Proposition 44), this order is rendered irrelevant in all cases in which the maximal entropy approach simplifies to Bayesian or Jeffrey conditionalisation (Theorem 45, Corollary 46). Finally, it becomes clear why the maximal entropy approach fails to provide probabilities in some cases. These are cases in which the premiss has zero prior probability, and updating on events of zero prior probability is notoriously problematic. We investigate the extent of these failures in §9, show that they arise at all levels of the arithmetic hierarchy at and above $\Sigma_2$ (Theorem 48), and provide a refinement of the approach to handle these problematic cases.
It is worth noting the relation between this approach and perhaps the best-known approach to inductive logic, namely that of Rudolf Carnap (see, e.g., Carnap, 1952). In common with Carnap's approach, we consider the problem of developing an inductive logic involving sentences of a first-order predicate language. However, the maximal entropy approach differs in two key respects. Firstly, our setting is more general, as it considers premiss statements which attach probabilities or sets of probabilities to sentences of the logical language, while Carnap considered only the sentences themselves. Secondly, our approach is based on the idea of entropy maximisation, while Carnap's approach appeals to Bayesian conditionalisation involving exchangeable prior probability functions. The latter approach is susceptible to serious objections (Williamson, 2017, Chapter 4).

Objective Bayesian Inductive Logic
An important class of probabilistic logics considers entailment relationships of the following form (Haenni et al., 2011):
$$\varphi_1^{X_1}, \ldots, \varphi_k^{X_k} \;|\!\!\approx\; \psi^Y.$$
Here, $\varphi_1, \ldots, \varphi_k, \psi$ are sentences of a logical language $L$ and $X_1, \ldots, X_k, Y$ are sets of probabilities. This entailment relationship should be interpreted as saying: $\varphi_1, \ldots, \varphi_k$ having probabilities in $X_1, \ldots, X_k$ respectively inductively entails that $\psi$ has probability in $Y$.
The objective Bayesian approach to inductive logic interprets probabilities as rational degrees of belief. It takes the premisses on the left-hand side of the entailment relationship to capture all the constraints on rational degrees of belief that are inferred from evidence, and it uses Jaynes' Maximum Entropy Principle to determine a rational belief function with which to calculate the probability of a conclusion statement $\psi$. Thus if $L$ is a finite propositional language, $X_1, \ldots, X_k$ are closed convex sets of probabilities (i.e., closed intervals), and the premisses are consistent, an entailment relationship holds just when the probability function with maximum entropy, amongst all those that satisfy the premisses, gives a probability in $Y$ to $\psi$ (Williamson, 2010, Chapter 7).
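To make the finite propositional case concrete, here is a minimal sketch (an illustration of ours, not drawn from the cited works) for the special case of a single point-valued premiss $P(\varphi) = c$: the maximum entropy function spreads mass $c$ equally over the models of $\varphi$ and $1 - c$ equally over the remaining truth assignments, since entropy is maximised by equivocating within each block.

```python
import itertools
import math

def maxent_single_premiss(atoms, phi, c):
    """Maximum entropy function on a finite propositional language for the
    single premiss P(phi) = c: mass c is spread equally over the models of
    phi, and 1 - c equally over the non-models."""
    worlds = list(itertools.product([False, True], repeat=len(atoms)))
    models = [w for w in worlds if phi(dict(zip(atoms, w)))]
    others = [w for w in worlds if not phi(dict(zip(atoms, w)))]
    assert models and others, "premiss must be contingent for this sketch"
    P = {w: c / len(models) for w in models}
    P.update({w: (1 - c) / len(others) for w in others})
    return P

def entropy(P):
    return -sum(p * math.log(p) for p in P.values() if p > 0)

# Premiss: P(a or b) = 0.8.  The conclusion probability P(a) can be read off.
P = maxent_single_premiss(['a', 'b'], lambda v: v['a'] or v['b'], 0.8)
print(entropy(P))
print(sum(p for w, p in P.items() if dict(zip(['a', 'b'], w))['a']))
```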
This approach has been extended to the case in which $L$ is a first-order predicate language in the following way. Suppose $L$ has countably many constant symbols $t_1, t_2, \ldots$ and finitely many relation symbols $U_1, \ldots, U_l$. Let $a_1, a_2, \ldots$ run through the atomic sentences of the form $U_i t_{i_1} \cdots t_{i_{k_i}}$ in such a way that those atomic sentences involving only $t_1, \ldots, t_n$ occur before those involving $t_{n+1}$, for each $n$. Consider the finite sublanguages $L_n$, containing only the constant symbols $t_1, \ldots, t_n$.
Definition 1 (n-states). $\Omega_n$ is the set of n-states of $L$, i.e., sentences of the form $\pm a_1 \wedge \ldots \wedge \pm a_{r_n}$ involving the atomic sentences $a_1, \ldots, a_{r_n}$ of $L_n$, which only feature the constants $t_1, \ldots, t_n$. The n-states for $L$ are thus the sentences
$$\bigwedge_{i=1}^{l} \;\bigwedge_{t_{j_1}, \ldots, t_{j_{k_i}} \in \{t_1, \ldots, t_n\}} U_i^{\epsilon_{t_{j_1}, \ldots, t_{j_{k_i}}}} t_{j_1} \cdots t_{j_{k_i}},$$
where $k_i$ is the arity of $U_i$, $\epsilon_{t_{j_1}, \ldots, t_{j_{k_i}}} \in \{0, 1\}$, $U_i^1 := U_i$ and $U_i^0 := \neg U_i$. Let $SL$, $SL_n$ be the sets of sentences of $L$, $L_n$ respectively.
Definition 2 ($N_\varphi$). For a single given sentence $\varphi$ we use $N_\varphi$ to denote the greatest index of the constants appearing in $\varphi$, i.e., the greatest number $n$ such that $t_n$ occurs in $\varphi$. If $\varphi$ contains no constants, we adopt the convention that $N_\varphi = 1$.
Definition 3 (Probability Function). A probability function on $SL$ is a function $P : SL \to [0, 1]$ such that: (P1) if $\models \theta$ then $P(\theta) = 1$; (P2) if $\models \neg(\theta \wedge \psi)$ then $P(\theta \vee \psi) = P(\theta) + P(\psi)$; and (P3) $P(\exists x\, \theta(x)) = \lim_{n \to \infty} P(\bigvee_{i=1}^{n} \theta(t_i))$. We use $\mathbb{P}$ to denote the set of all probability functions on $SL$.

Of particular importance will be the equivocator function, $P_=$, which gives the same probability to each n-state, for each $n$: $P_=(\omega_n) := 1/|\Omega_n|$ for all $\omega_n \in \Omega_n$.
Definition 4 (Measure). The measure of a sentence $\theta$ is the probability given to it by the equivocator function. In particular, $\theta$ has positive measure if and only if $P_=(\theta) > 0$.
Definition 5 (Feasible Region). We use $E$ to refer to the set of probability functions that satisfy the premisses $\varphi_1^{X_1}, \ldots, \varphi_k^{X_k}$, i.e.,
$$E := \{P \in \mathbb{P} : P(\varphi_1) \in X_1, \ldots, P(\varphi_k) \in X_k\}.$$
Two special cases will be particularly important in this paper. To distinguish the case of a single categorical premiss, $\varphi$, we often write $E_\varphi$ instead of $E$. In the case of a single uncertain premiss, $\varphi^X$, we write $E_{\varphi^X}$. Throughout, we shall assume that the $X$ are intervals and that the feasible region is non-empty, $E \neq \emptyset$.
The n-entropies, which only take into account the probabilities of the finitely many n-states,
$$H_n(P) := -\sum_{\omega \in \Omega_n} P(\omega) \log P(\omega),$$
are then used to define a notion of comparative entropy on the infinite language $L$ as a whole:

Definition 7 (Comparative Entropy). We say that the probability function $P \in \mathbb{P}$ has greater entropy than $Q \in \mathbb{P}$ if and only if the n-entropy of $P$ dominates that of $Q$ for sufficiently large $n$, i.e., if and only if there is an $N \in \mathbb{N}$ such that $H_n(P) > H_n(Q)$ for all $n \geq N$.

The greater entropy relation defines a partial order on the probability functions on $L$. We will focus on the maximal elements in $E$ of this partial ordering:

Definition 8 (Maximal Entropy Functions). The set of maximal entropy functions, maxent $E$, is defined as
$$\text{maxent } E := \{P \in E : \text{there is no } Q \in E \text{ that has greater entropy than } P\}.$$
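For intuition, n-entropies and the 'greater entropy' relation can be computed directly for simple families of probability functions. The following sketch (an illustration of ours, assuming a single unary predicate so that $|\Omega_n| = 2^n$) compares the equivocator with a function that concentrates probability $1/2$ on the n-state $Ut_1 \wedge \ldots \wedge Ut_n$; the equivocator's n-entropy strictly dominates from $n = 2$ on, so it has greater entropy in the sense of Definition 7.

```python
import math

def H(probs):
    """n-entropy: Shannon entropy of a distribution over the n-states."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def Hn_equivocator(n):
    # P= gives each of the 2**n n-states probability 2**-n.
    return H([2**-n] * 2**n)

def Hn_concentrated(n):
    # Q gives U(t1) & ... & U(tn) probability 1/2 and spreads the rest
    # equally over the other 2**n - 1 n-states.
    rest = 2**n - 1
    return H([0.5] + [0.5 / rest] * rest)

for n in [1, 2, 5, 10]:
    print(n, round(Hn_equivocator(n), 4), round(Hn_concentrated(n), 4))
# At n = 1 the two n-entropies coincide; from n = 2 onwards P= strictly
# dominates, witnessing "greater entropy" per Definition 7.
```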
In the absence of any premisses, maxent $E$ = maxent $\mathbb{P} = \{P_=\}$. In this paper, we invoke the objective Bayesian notion of inductive entailment, denoted by $|\!\!\approx^\circ$ (Williamson, 2017, §5.3):
$$\varphi_1^{X_1}, \ldots, \varphi_k^{X_k} \;|\!\!\approx^\circ\; \psi^Y \quad \text{if and only if} \quad P(\psi) \in Y \text{ for all } P \in \text{maxent } E.$$
Note that this definition applies where maxent $E$ is non-empty. We consider the case in which maxent $E$ is empty in §9.
We will say that $\psi$ is an inductive tautology if $|\!\!\approx^\circ \psi$, i.e., if $P_=(\psi) = 1$. While the objective Bayesian approach provides appropriate semantics for inductive logic, it is not obvious how to determine the maximal entropy functions in order to ascertain whether a given entailment relationship holds. This is because the definition of maxent $E$ seems to require a search through the members of $E$ in order to find those with maximal entropy, a process that would be infeasible in practice. This paper aims to address this problem.
§3 introduces the concept of an entropy limit point in order to characterise maxent E in terms of certain limits of n-entropy maximisers.This gives a constructive procedure for determining maxent E when it contains an entropy limit point.
In §4 and §5 we consider an important special case: that in which the premisses are categorical sentences $\varphi_1, \ldots, \varphi_k$ (without attached probabilities) and the maximal entropy function can be obtained simply by conditionalising the equivocator function.

Entropy Limit Points
This section adapts the techniques of Landes et al. (2021, §5) in order to characterise maxent $E$ in terms of certain limits of n-entropy maximisers. Landes et al. (2021) were concerned with a very different question: that of showing that the above objective Bayesian semantics for inductive logic in terms of maximal entropy functions yields the same inferences as those produced by Paris' limit approach discussed in §1. Nevertheless, the results of Landes et al. (2021, §5) can be straightforwardly adapted to the present problem. The proofs of the two results in this section, which are close to those of Landes et al. (2021, Proposition 36) and Landes et al. (2021, Theorem 39), are provided in Appendix 1.
We will consider the set of n-entropy maximisers for each $n$:
$$H_n := \{P \in E : H_n(P) \geq H_n(Q) \text{ for all } Q \in E\}.$$
We now introduce the key concept of this section:

Definition 10 (Entropy Limit Point). $P \in \mathbb{P}$ is an entropy limit point of $P_1, P_2, \ldots \subseteq \mathbb{P}$ if, for each $n$, $H_n(P_m) \longrightarrow H_n(P)$ as $m \longrightarrow \infty$. $P \in \mathbb{P}$ will be called an entropy limit point of $E$ if it is an entropy limit point of some sequence $P_1, P_2, \ldots$ with $P_m \in H_m$ for each $m$.
Entropy limit points of $E$ are of special interest because they are also limit points in terms of the $L_1$ distance on the finite sublanguages, $|P - Q|_n := \sum_{\omega \in \Omega_n} |P(\omega) - Q(\omega)|$:

Proposition 11. If $P$ is an entropy limit point of $E$, then there are functions $P_m \in H_m$ such that, for each $n$, $|P_m - P|_n \longrightarrow 0$ as $m \longrightarrow \infty$.

This property enables us to characterise the set of maximal entropy functions more constructively, in terms of a limit of n-entropy maximisers:

Theorem 12 (Entropy Limit Point). If $E$ contains an entropy limit point $P$, then maxent $E = \{P\}$.
Note that there can be at most one entropy limit point $P$ of $E$. This is because $E$ is convex (by the convexity of $X_1, \ldots, X_k$) and the n-entropy maximiser of a convex set is uniquely determined on $L_n$. Thus, the $H_n$ can have at most one $L_1$ limit point.
Theorem 12 provides a simple procedure for showing that a hypothesised function $P$ is in fact a maximal entropy function: show that it is an entropy limit point of the n-entropy maximisers, and show that it is in $E$. (Note that this is only a sufficient condition: if $E$ contains no entropy limit point, then Theorem 12 does not allow us to infer anything about maxent $E$.) Landes et al. (2021, Lemmas 40, 44) provide some tools for demonstrating that a hypothesised function is an entropy limit point of $E$.
Example 13. Suppose we have a single premiss $(\forall x\, Ux)^{\{c\}}$, where $L$ has a single unary predicate $U$ and $c \in [0, 1]$. (We will often omit the curly braces and write $\varphi^c$ instead of $\varphi^{\{c\}}$ in such cases.) In this case, the number $r_n$ of atomic sentences of $L_n$ is $n$. Any n-entropy maximiser gives probability $c$ to the n-state $Ut_1 \wedge \ldots \wedge Ut_n$, which we abbreviate by $\theta_n$, and divides probability $1 - c$ equally amongst all other n-states:
$$P_n(\omega_n) = \begin{cases} c & : \omega_n = \theta_n \\ \frac{1 - c}{2^n - 1} & : \omega_n \models \neg\theta_n. \end{cases}$$
By the argument of Landes et al. (2021, Example 42), the following probability function is an entropy limit point:
$$P(\omega_n) = \begin{cases} c + (1 - c)\,2^{-n} & : \omega_n = \theta_n \\ (1 - c)\,2^{-n} & : \omega_n \models \neg\theta_n. \end{cases}$$
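The entropy limit point in Example 13 can be checked numerically: the gap between the n-entropy of the n-entropy maximiser and that of the limit function shrinks to zero as $n$ grows. A quick sketch of ours, using the two formulas displayed above:

```python
import math

c = 0.3  # premiss: P(forall x Ux) = c

def H(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

def Hn_maximiser(n):
    # n-entropy maximiser: c on theta_n, (1-c)/(2**n - 1) on each other n-state
    rest = 2**n - 1
    return H([c] + [(1 - c) / rest] * rest)

def Hn_limit(n):
    # candidate entropy limit point: c + (1-c)*2**-n on theta_n,
    # (1-c)*2**-n on each of the other n-states
    rest = 2**n - 1
    return H([c + (1 - c) * 2**-n] + [(1 - c) * 2**-n] * rest)

for n in [1, 2, 5, 10, 15]:
    print(n, Hn_maximiser(n) - Hn_limit(n))  # tends to 0 as n grows
```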
Example 14. Consider a single categorical premiss $U_1 t_1 \vee \exists x \forall y\, U_2 xy$. In this case, $H_n = \{P \in E : P{\restriction}_{L_n} = P_={\restriction}_{L_n}\}$ for all $n$. Thus the equivocator function is the unique entropy limit point of $E$. However, the equivocator function is not in $E$, so it cannot be the maximal entropy function. Indeed, as will become apparent later (Theorem 30), maxent $E = \{P_=(\cdot \mid U_1 t_1 \vee \exists x \forall y\, U_2 xy)\}$.

Categorical Premisses and Bayesian Conditionalisation
We now consider an important special case: that in which the premisses are categorical sentences $\varphi_1, \ldots, \varphi_k$ of $L$, i.e., there are no attached sets of probabilities $X_1, \ldots, X_k$, or equivalently, $X_1 = \ldots = X_k = \{1\}$. Let $\varphi$ be the sentence $\varphi_1 \wedge \ldots \wedge \varphi_k$. In this section and the next, we consider $E = E_\varphi := \{P \in \mathbb{P} : P(\varphi) = 1\}$ and we show that there are several cases in which maxent $E$ can be found simply by conditionalising the equivocator function on $\varphi$.
Our first result directly applies Theorem 12:

Corollary 15. If $P_=(\cdot \mid \varphi)$ is an entropy limit point of $E_\varphi$, then maxent $E_\varphi = \{P_=(\cdot \mid \varphi)\}$.

Proof: Since $P_=(\varphi \mid \varphi) = 1$, we have $P_=(\cdot \mid \varphi) \in E_\varphi$. Hence, Theorem 12 applies. ■

Corollary 16. If $P_=(\cdot \mid \varphi) \in H_n$ for all sufficiently large $n$, then maxent $E_\varphi = \{P_=(\cdot \mid \varphi)\}$.

Proof: If $P_=(\cdot \mid \varphi) \in H_n$ for all sufficiently large $n$, then it is an entropy limit point of $E_\varphi$, so Corollary 15 applies. ■
Corollary 16 is useful because, where it applies, it provides a particularly simple procedure for determining maxent $E_\varphi$. It also shows that the move to the infinite does not disrupt agreement between the Maximum Entropy Principle and conditionalisation: as long as conditionalising on $\varphi$ maximises n-entropy for each sufficiently large $n$, it maximises entropy on the language as a whole. Because of its interest, we provide an alternative, more direct proof of Corollary 16 in Appendix 2.
Example 18. Suppose we have categorical premisses $Ut_2 \rightarrow Vt_3$ and $\forall x \exists y\, Wxy$, where $L$ has unary predicate symbols $U$ and $V$ and a binary relation symbol $W$. Now $P_=(Ut_2 \rightarrow Vt_3) = 0.75$ and $P_=(\forall x \exists y\, Wxy) = 1$. So $P_=((Ut_2 \rightarrow Vt_3) \wedge \forall x \exists y\, Wxy) = 0.75$, and we may consider $P_=(\cdot \mid (Ut_2 \rightarrow Vt_3) \wedge \forall x \exists y\, Wxy)$. This latter function is in $H_3, H_4, \ldots$, so Corollary 16 applies and maxent $E_\varphi = \{P_=(\cdot \mid (Ut_2 \rightarrow Vt_3) \wedge \forall x \exists y\, Wxy)\}$.

Finally, we note an important consequence of Corollary 16:

Corollary 19. If $\varphi$ is satisfiable and logically equivalent to a quantifier-free sentence, then maxent $E_\varphi = \{P_=(\cdot \mid \varphi)\}$.

Proof: In this case $P_=(\cdot \mid \varphi) \in H_n$ for all sufficiently large $n$, so Corollary 16 applies. ■
This result can be thought of as an analogue of Seidenfeld (1986, Result 1), which demonstrates agreement between the Maximum Entropy Principle and conditionalisation in the case of a finite domain. In the next section, we show that this result can be extended to the situation in which $\varphi$ is not quantifier-free.
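The measure computations in Example 18 can be reproduced by brute force: under the equivocator, distinct atomic sentences behave like independent fair coins, so the measure of a quantifier-free sentence is the proportion of truth assignments to its atomic sentences that satisfy it. A small sketch of ours, computing both the measure and a conditional probability (the $\forall\exists$ conjunct has measure 1 and so does not affect these values):

```python
import itertools

def measure(atoms, sentence):
    """P= of a quantifier-free sentence: the proportion of truth
    assignments to its atomic sentences that satisfy it."""
    worlds = list(itertools.product([False, True], repeat=len(atoms)))
    return sum(sentence(dict(zip(atoms, w))) for w in worlds) / len(worlds)

atoms = ['Ut2', 'Vt3']
phi = lambda v: (not v['Ut2']) or v['Vt3']          # U t2 -> V t3
print(measure(atoms, phi))                          # 0.75

# Conditionalising the equivocator on phi:
joint = measure(atoms, lambda v: v['Ut2'] and phi(v))
print(joint / measure(atoms, phi))                  # P=(U t2 | phi) = 1/3
```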

An Alternative Route to Conditionalisation
This section demonstrates agreement between the maximal entropy approach and conditionalisation without appeal to entropy limit points.
As above, we consider categorical sentences $\varphi_1, \ldots, \varphi_k$ and abbreviate $\varphi_1 \wedge \ldots \wedge \varphi_k$ by $\varphi$. The following definition will be central to several of the results in this section:

Definition 20 (n-support $\varphi^n$). Let the sentence $\varphi^n$ be the disjunction of those n-states $\omega$ that are inductively consistent with $\varphi$, i.e., n-states $\omega$ such that $\not|\!\!\approx^\circ \neg(\omega \wedge \varphi)$. Equivalently, these are the n-states $\omega$ such that $\varphi \wedge \omega$ has positive measure. Thus,
$$\varphi^n := \bigvee \{\omega \in \Omega_n : P_=(\varphi \wedge \omega) > 0\}.$$
If there are no n-states inductively consistent with $\varphi$, we take $\varphi^n$ to be an arbitrary contradiction on $L_n$. We call $\varphi^n$ the inductive support of $\varphi$ on $L_n$, or simply the n-support of $\varphi$. $\varphi^{N_\varphi}$ will be referred to as the support of $\varphi$. We use $|\varphi^n|$ to denote the number of n-states in the n-support $\varphi^n$, i.e., the number of n-states inductively consistent with $\varphi$.
Our main result of this section, Theorem 30, will show that when $\varphi$ has positive measure, the maximal entropy function is the equivocator function conditional on $\varphi$, or, equivalently, the equivocator conditional on the support of $\varphi$. This provides a straightforward way of determining the maximal entropy function in that case.
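For a quantifier-free premiss $\varphi$ and $n \geq N_\varphi$, an n-state is in the n-support exactly when it entails $\varphi$, so the support and the conditionalised equivocator can be computed by enumeration. A minimal sketch of ours (the helper names are illustrative; for quantified premisses one would instead have to decide $P_=(\omega \wedge \varphi) > 0$, which the n-state alone does not settle):

```python
import itertools
from fractions import Fraction

def n_support(atoms, phi):
    """The n-states (as truth assignments) entailing quantifier-free phi."""
    worlds = itertools.product([False, True], repeat=len(atoms))
    return [w for w in worlds if phi(dict(zip(atoms, w)))]

def equivocator_given_support(atoms, phi):
    """P=(. | phi^n): equivocates over the n-states in the support."""
    supp = n_support(atoms, phi)
    return {w: Fraction(1, len(supp)) for w in supp}

# phi = U t1 v V t1: three of the four 1-states are in the support,
# so each receives probability 1/3 after conditionalisation.
P = equivocator_given_support(['Ut1', 'Vt1'], lambda v: v['Ut1'] or v['Vt1'])
print(P)
```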
We will first prove some technical lemmas to which the main result will appeal. The first lemma invokes the concept of exchangeability:

Definition 21 (Constant Exchangeability). Let $\theta(x_1, x_2, \ldots, x_l)$ be a formula of $L$ that does not contain constants. A probability function $P$ on $SL$ satisfies constant exchangeability if and only if, for all such $\theta$ and all sets of pairwise distinct constants $t_1, t_2, \ldots, t_l$ and $t'_1, t'_2, \ldots, t'_l$, it holds that $P(\theta(t_1, t_2, \ldots, t_l)) = P(\theta(t'_1, t'_2, \ldots, t'_l))$. Equivalently, for all $n \in \mathbb{N}$ and all n-states $\omega_n, \nu_n \in \Omega_n$: if $\omega_n$ can be obtained from $\nu_n$ by a permutation of the first $n$ constants, then $P(\omega_n) = P(\nu_n)$.
We are obliged to Jeff Paris for pointing out the following lemma and Proposition 24, which follows from it.
Proof: The proof follows by a straightforward adaptation of the proof of Paris and Vencovská (2015, Corollary 6.2) and proceeds by induction on the quantifier complexity of $\varphi \wedge \psi$ when written in Prenex Normal Form.

The result holds by assumption when $\varphi \wedge \psi$ is quantifier-free. For the induction step it is sufficient to consider the case in which all constants appearing in both $\vec{t}$ and $\vec{t}'$ are included in $\{t_1, \ldots, t_l\}$.

Let $u_1, u_2, u_3, \ldots$ be distinct constants containing those in $\vec{t}$ and $\vec{t}'$, chosen so that the two lists are disjoint except for the constants shared by $\vec{t}$ and $\vec{t}'$. By Paris and Vencovská (2015, Lemma 6.1), for every $\epsilon > 0$ there is $N$ large enough such that, for all $n \geq N$, the relevant probabilities are within $\epsilon$; applying Paris and Vencovská (2015, Lemma 3.7), the induction hypothesis, and taking $n$ large enough, we obtain the required result. ■

Lemma 23. For every sentence $\varphi \in SL$ that does not mention any constants, $P_=(\varphi) \in \{0, 1\}$.

Proof:
Paris and Vencovská (2015, Corollary 6.2) show the following: if a probability function $P$ on $SL$ satisfies constant exchangeability and $P(\varphi \wedge \psi) = P(\varphi) \cdot P(\psi)$ whenever $\varphi, \psi$ are quantifier-free sentences of $L$ that mention no constants in common, then $P(\varphi \wedge \psi) = P(\varphi) \cdot P(\psi)$ for any sentences $\varphi, \psi$ of the language $L$ which do not mention any constants in common.
Note that $P_=$ satisfies constant exchangeability, so the assumption of Paris and Vencovská (2015, Corollary 6.2) is satisfied. Let $\varphi$ be a sentence that does not mention any constants. Then $\varphi, \varphi$ are two sentences that do not mention any constants in common. Since probability functions assign logically equivalent sentences the same probability, we now easily find
$$P_=(\varphi) = P_=(\varphi \wedge \varphi) = P_=(\varphi) \cdot P_=(\varphi).$$
So, $P_=(\varphi) = P_=(\varphi)^2$. This means that $P_=(\varphi)$ has to be either zero or one. ■

Hence, every inductively consistent constant-free sentence is an inductive tautology: $P_=(\varphi) > 0$ for constant-free $\varphi$ implies that $P_=(\varphi) = 1$.
Proposition 24. For all $\varphi \in SL$ and all $n \geq N_\varphi$, $P_=(\varphi) = P_=(\varphi^n)$ and, for all $\omega_n \in \Omega_n$,
$$P_=(\varphi \wedge \omega_n) \in \{0, P_=(\omega_n)\}. \qquad (1)$$

Proof: Consider two sentences $\varphi, \psi \in SL$ which mention at most the first $N := N_{\varphi \wedge \psi}$ constants. From Lemma 22 we obtain a corresponding product identity for $P_=(\varphi \wedge \psi \wedge \omega_n)$ for all $\omega_n \in \Omega_n$. Using the trick on Paris and Vencovská (2015, p. 53) and putting $\psi = \varphi$, the definition of conditional probability yields (1): every $\omega_n \in \Omega_n$ either satisfies $P_=(\varphi \mid \omega_n) = 1$ (these are the n-states inductively consistent with $\varphi$) or $P_=(\varphi \wedge \omega_n) = 0$. Summing (1) over all $\omega_n \in \Omega_n$ then gives $P_=(\varphi) = P_=(\varphi^n)$. ■

Lemma 25. For all $\varphi \in SL$ with $P_=(\varphi) > 0$ and all $n \geq N_\varphi$, $P_=(\cdot \mid \varphi) = P_=(\cdot \mid \varphi^n)$.

Proof: Notice that by Proposition 24, $P_=(\varphi) = P_=(\varphi^n)$ and, since $P_=(\varphi \wedge \neg\varphi^n) = 0$, also $P_=(\varphi \wedge \varphi^n) = P_=(\varphi)$. Exploiting the law of total probability twice and the above observation, we find for all $\psi \in SL$ that $P_=(\psi \mid \varphi) = P_=(\psi \mid \varphi^n)$. ■

Corollary 26. If $\varphi$ has positive measure, then $\varphi^{N_\varphi}$ and $\varphi^n$ are logically equivalent for all $n \geq N_\varphi$.

Proof: By Lemma 25, for all $k \geq 0$, $P_=(\cdot \mid \varphi^{N_\varphi + k}) = P_=(\cdot \mid \varphi)$. This entails $P_=(\cdot \mid \varphi^{N_\varphi + k}) = P_=(\cdot \mid \varphi^{N_\varphi})$. Note that $\varphi^{N_\varphi + k}$ is quantifier-free. Let $\chi, \psi$ be quantifier-free and satisfiable; then the probability function $P_=(\cdot \mid \psi)$ is equal to the probability function $P_=(\cdot \mid \chi)$ if and only if $\psi$ and $\chi$ are logically equivalent. Clearly, if $\psi$ and $\chi$ are logically equivalent, then these probability functions are equal. Furthermore, if $\psi$ and $\chi$ are not logically equivalent, then without loss of generality $\psi$ does not entail $\chi$, and $P_=(\psi \mid \psi) = 1 > P_=(\chi \mid \psi)$ follows. ■

Letting $m := N_\varphi + k$, it follows that every m-state $\omega_m$ extending a state in $\varphi^{N_\varphi}$ is such that $P_=(\omega_m \mid \varphi) = P_=(\omega_m \mid \varphi^m) > 0$, where the first equality is given by Corollary 26 and Lemma 25; for any m-state $\omega'_m$ with $\omega'_m \nvDash \varphi^m$ we have $P_=(\omega'_m \mid \varphi) = 0$.

Consider a sentence $\psi$ with zero measure, $P_=(\psi) = 0$. Intuitively, $\psi$ is only true in few possible worlds. (More precisely, consider the set of term structures for $L$ that have a countably infinite domain. Zero measure means that the proportion of those term structures that satisfy $\psi$ is negligible. But the term structures on a countably infinite domain can be determined as the limiting extensions of term structures on finite subsets of the domain. This means that for asymptotically large $n$, there are only few term structures with a domain of size $n$ that can be extended to a term structure that satisfies $\psi$. Dividing the probability mass between the term structures on the full domain in such a way as to assign a probability of $c > 0$ to $\psi$ should then inevitably distribute a probability mass close to $c$ between few term structures on a finite subdomain of size $n$, for large $n$.) One way to approach this intuition is by exploiting probability axiom P3, according to which the probability of a quantified sentence is the limit of probabilities of quantifier-free sentences. This suggests that, in the limit, only few n-states 'converge' to $\psi$. So, if $P(\psi) = c > 0$, then $P$ has to assign a joint probability close to $c$ to few n-states. That is, for $n$ large enough, there exists a set of n-states $S_n$, with joint probability of almost $c$, that is arbitrarily small in comparison to the number of all n-states. The following result, for which we are obliged to Alena Vencovská, makes this precise.
Lemma 28 (Concentration of probability on few n-states). Let $\psi$ be such that $P_=(\psi) = 0$ and $P(\psi) = c > 0$. Then for any $\epsilon > 0$ there exists some $M \in \mathbb{N}$ such that for all $m \geq M$ there exists a set of m-states, $S_m$, such that
$$P\Big(\bigvee_{\omega \in S_m} \omega\Big) \geq (1 - \epsilon)\,c \quad \text{and} \quad \frac{|S_m|}{|\Omega_m|} < \epsilon.$$

Proof: First notice that if the result holds for some $m \in \mathbb{N}$ and a set of m-states $S_m$, then it also holds for the set of $(m+1)$-states $S_{m+1}$ defined as the extensions of $S_m$ to $L_{m+1}$. Therefore, it is enough to show that the result holds for some $m \in \mathbb{N}$. Let $\mathcal{P} = \{P, P_=\}$. We first show that there exists some $m \in \mathbb{N}$ and a quantifier-free sentence $\chi \in SL_m$ that approximates $\psi$ in probability for all $Q \in \mathcal{P}$. (We can think of $\chi$ as a finite approximation of $\psi$.) We proceed by induction on the quantifier complexity; that is, we proceed by induction on $n$ for $\psi \in \Sigma_n$ and $\psi \in \Pi_n$.
For the base case, $n = 0$, $\psi$ is quantifier-free, and we can simply pick $\chi := \psi$.
For the induction step, consider $\psi = \forall x_1 \ldots \forall x_r\, \xi(x_1, \ldots, x_r)$ with $\xi$ of lower quantifier complexity (the existential case is analogous), and let $n \in \mathbb{N}$ be large enough. Notice that $\psi$ logically entails $\bigwedge_{k_1, \ldots, k_r = 1}^{n} \xi(t_{k_1}, \ldots, t_{k_r})$. By the induction hypothesis, for each $k_1, \ldots, k_r \in \{1, \ldots, n\}$ there is a quantifier-free sentence $\lambda_{\vec{k}}$ approximating $\xi(\vec{t}_{\vec{k}})$ in probability, where we write $\xi(\vec{t}_{\vec{k}})$ for $\xi(t_{k_1}, \ldots, t_{k_r})$. Let $\Xi = \bigwedge_{k_1, \ldots, k_r = 1}^{n} \xi(t_{k_1}, \ldots, t_{k_r})$ and $\Lambda = \bigwedge_{k_1, \ldots, k_r = 1}^{n} \lambda_{\vec{k}}$. Then $Q(\Lambda)$ approximates $Q(\Xi)$, and hence $Q(\psi)$, for each $Q \in \mathcal{P}$. Since this holds for all $Q \in \mathcal{P} = \{P_=, P\}$ and $P_=(\psi) = 0$, a suitable choice of approximations gives $P_=(\Lambda) < \epsilon$ while $P(\Lambda) \geq (1 - \epsilon)c$. Taking $S_m$ to be the set of m-states entailing $\Lambda$, we have $|S_m|/|\Omega_m| = P_=(\Lambda) < \epsilon$ and $P(\bigvee_{\omega \in S_m} \omega) \geq (1 - \epsilon)c$. ■

There is a sense in which the states in $S_m$ simulate $\psi$ on the sublanguage $L_m$. Consider an underlying domain with $m$ elements, $t_1, t_2, \ldots, t_m$. Universal (respectively, existential) quantification over a variable $x$ can be understood as a finite conjunction (disjunction) over the finitely many elements. Replace all the quantifications in $\psi$ by finite conjunctions and disjunctions over these $m$ elements. On this finite domain, the resulting quantifier-free sentence is equivalent to the original sentence. It is in this sense that $S_m$ simulates $\psi$ on a finite domain.

The next lemma shows that any maximal entropy function must assign probability one to the support $\varphi^{N_\varphi}$ of $\varphi$ (and thus to the n-support $\varphi^n$ for $n \geq N_\varphi$). Note that this lemma does not prove the existence of a maximal entropy function.
Lemma 29. Suppose $\varphi$ has positive measure and let $N := N_\varphi$. If $P \in E_\varphi$ and $P(\varphi^N) < 1$, then $P \notin$ maxent $E_\varphi$.

Proof: If $P_=(\varphi) = 1$ then every n-state is inductively consistent with $\varphi$, so $\varphi^N$ is a tautology on $L_N$ and the hypothesis $P(\varphi^N) < 1$ is vacuous. Now consider $0 < P_=(\varphi) < 1$. Since $\varphi^N$ and $\varphi^n$ are logically equivalent for $n \geq N$ (Corollary 26), and since probability functions respect logical equivalence, it follows from the assumption that $P(\varphi^n) < 1$ that $P(\varphi^N) < 1$ holds.
So, let $P$ be such that $P(\varphi) = 1$ and $P(\varphi^N) < 1$. Then $P(\varphi \wedge \neg\varphi^N) = c > 0$ for some $1 \geq c > 0$. Let $\psi := \varphi \wedge \neg\varphi^N$ and notice that, by the definition of $\varphi^N$, $P_=(\psi) = 0$. Let $\epsilon > 0$ and take $M$ and $S_M$ as given by Lemma 28 applied to $\psi$, and let $K_M$ be the set of M-states in $\Omega_M \setminus S_M$ that extend states in $\varphi^N$. Write $b_M$ for the probability that $P$ assigns to $S_M$; by Lemma 28, $b_M \geq (1 - \epsilon)c > 0$. By convexity, $H_M(P)$ is at most the M-entropy of the function that divides $b_M$ equally over $S_M$ and the remaining mass equally over $K_M$, while $P_=(\cdot \mid \varphi^N)$ equivocates over all M-states extending $\varphi^N$. The difference $H_M(P) - H_M(P_=(\cdot \mid \varphi^N))$ can thus be bounded by three summands, which we consider in turn. Since $|S_M|/|\Omega_M|$ becomes arbitrarily small by Lemma 28 and $1 \geq b_M > 0$, the first term is eventually less than zero. The second term goes to zero, since $|K_M|$ increases without bound. Finally, $b_M \geq (1 - \epsilon)c > 0$. This means that for all large enough $M$ it holds that $H_M(P) - H_M(P_=(\cdot \mid \varphi^N)) < 0$ and hence $H_M(P) < H_M(P_=(\cdot \mid \varphi^N))$. This entails that $P_=(\cdot \mid \varphi^N)$ has greater entropy than $P$. Thus, $P \notin$ maxent $E_\varphi$. In particular, we note for later use that the sequence $f_n(P) := H_n(P_=(\cdot \mid \varphi^N)) - H_n(P)$ is strictly positive and never decreasing for all large enough $n$.

■

One might think that the following construction can play a similar role to that played by $S_m$ in the above proof. Let $\varphi^n_0$ be the disjunction of the n-states that are deductively but not inductively consistent with $\varphi$. (If there are no such states, take $\varphi^n_0$ to be an arbitrary contradiction on $L_n$.) Now suppose that $\varphi$ has measure zero and that $P(\varphi) = c$. Since $\varphi$ has measure zero, $\varphi^n$ is a contradiction on $L_n$. Hence $P(\varphi^n_0) \geq c$: $P$ must concentrate probability at least $c$ on $\varphi^n_0$. Thus the question arises as to whether $P_=(\varphi^n_0) \longrightarrow 0$ as $n \longrightarrow \infty$, in which case $\varphi^n_0$ would represent an increasingly negligible number of states. However, it turns out that while this last condition holds true for some measure-zero $\varphi$, e.g., $\forall x\, U_1 x$, it does not hold true for all such sentences. For example, in the case of $\exists x \forall y\, U_2 xy$, which also has zero measure, $P_=(\varphi^n_0) = 1$ for all $n$.
We are now in a position to present the main result of this section:

Theorem 30 (Agreement with Bayesian Conditionalisation). For all $\varphi \in SL$ with $P_=(\varphi) > 0$,
$$\text{maxent } E_\varphi = \{P_=(\cdot \mid \varphi)\} = \{P_=(\cdot \mid \varphi^{N_\varphi})\}.$$

Proof: We prove a stronger property, namely that $P_=(\cdot \mid \varphi^{N_\varphi})$ has greater entropy than every other probability function $P \in E_\varphi$.
We let $N := N_\varphi$. Given the previous lemma, it suffices to show that all $P \in E_\varphi$ with $P(\varphi^N) = 1$ and $P \neq P_=(\cdot \mid \varphi^N)$ have less entropy than $P_=(\cdot \mid \varphi^N)$; this means that $P_=(\cdot \mid \varphi^N)$ has greater entropy than all other members of $E_\varphi$.

Consider first the case $P_=(\varphi) = 1$. In this case, the equivocator $P_=$ is in $E_\varphi$, and, since it is the probability function in $\mathbb{P}$ with maximal entropy, it is the unique member of maxent $E_\varphi$. By Lemma 25, the equivocator is also $P_=(\cdot \mid \varphi)$; by Lemma 29, it is $P_=(\cdot \mid \varphi^N)$.

Now consider $0 < P_=(\varphi) < 1$. By Lemma 25, $P_=(\cdot \mid \varphi) = P_=(\cdot \mid \varphi^N)$. This establishes the two last equalities in the statement of the theorem.
If $P(\varphi^N) = 1$ but $P \neq P_=(\cdot \mid \varphi^N)$, then there is some $M \geq N$ such that for all $m \geq M$, $P \neq P_=(\cdot \mid \varphi^N)$ on $\Omega_m$. For all $m \geq N$, $P_=(\cdot \mid \varphi^N)$ equivocates over all m-states in $\varphi^m$ and has strictly greater m-entropy than every other probability function $Q \in \mathbb{P}$ with $Q(\varphi^m) = 1$ and $Q(\omega_m) \neq P_=(\omega_m \mid \varphi^N)$ for some m-state $\omega_m$. This entails that for all $n \geq M$ it holds that $H_n(P) < H_n(P_=(\cdot \mid \varphi^N))$. Hence, $P_=(\cdot \mid \varphi^N)$ has greater entropy than every other probability function $P \in E_\varphi$ with $P(\varphi^N) = 1$.

■
The following observation shows that the maximal entropy function not only has greatest entropy in the sense defined above, but also in a cumulative sense.
Corollary 32. If $\varphi$ has positive measure, then for all $P \in E_\varphi \setminus \{P_=(\cdot \mid \varphi)\}$,
$$\lim_{m \to \infty} \sum_{n=1}^{m} \big( H_n(P_=(\cdot \mid \varphi)) - H_n(P) \big) = +\infty.$$

Proof: The proof shows a slightly stronger property: for all $P \in E_\varphi \setminus \{P_=(\cdot \mid \varphi)\}$, the sequence $f_n(P) := H_n(P_=(\cdot \mid \varphi)) - H_n(P)$ is such that there exists some $M \geq N_\varphi$ such that $f_n(P)$ is strictly positive and never decreasing for all $n \geq M$.
Let us first consider the case $P(\varphi^{N_\varphi}) < 1$. The claim of this corollary follows directly from the final observation in the proof of Lemma 29.
The second and final case is $P(\varphi^{N_\varphi}) = 1$. Since $P_=(\cdot \mid \varphi) \neq P$, there has to exist some $M \geq N_\varphi$ such that for all $m \geq M$ the probability functions $P$ and $P_=(\cdot \mid \varphi)$ disagree on some quantifier-free sentence of $L_m$. Since both functions assign non-zero probability to, at most, the m-states extending those in $\varphi^{N_\varphi}$, and $P_=(\cdot \mid \varphi)$ is maximally equivocal on this set of m-states, it follows that $H_m(P_=(\cdot \mid \varphi)) > H_m(P)$ for all $m \geq M$.
For all $m \geq M$ we have, since $f_n(P)$ is never decreasing for $n \geq M$,
$$\sum_{n=M}^{m} f_n(P) \geq (m - M + 1) \cdot f_M(P).$$
Since the last difference, $f_M(P)$, is strictly positive, this limit is $+\infty$. The Corollary follows trivially by adding the first $M - 1$ bounded terms to the above limit.

■
Given a finite set of premisses of the form $\varphi_1^{X_1}, \ldots, \varphi_k^{X_k}$, we showed in Theorem 12 how a maximal entropy function can be characterised in terms of an entropy limit point. In the case of a single categorical premiss, $\varphi$, if $P_=(\cdot \mid \varphi)$ is an entropy limit point then it is the unique maximal entropy function (Corollary 15). In particular, this is the case when $\varphi$ is equivalent to a quantifier-free sentence (Corollary 19). Theorem 30 shows that for any inductively consistent premiss $\varphi$, there exists a unique maximal entropy function, which can be determined by conditionalising the equivocator on the support of $\varphi$, the quantifier-free sentence $\varphi^{N_\varphi}$ expressible in the sublanguage $L_{N_\varphi}$. For example, for $\varphi = U_1 t_1 \vee \exists x \forall y\, U_2 xy$, every 1-state is consistent with $\varphi$; however, only the 1-states entailing $U_1 t_1$ are in the support of $\varphi$. These 1-states have the feature that almost all their extensions contribute to the probability $P_=(\varphi)$ via probability axiom P3. What is more, Theorem 30 shows that the maximal entropy probability function equivocates between these $N_\varphi$-states, and also between their extensions. That is, the unique maximal entropy probability function divides the full probability mass equally between these $N_\varphi$-states and similarly between their extensions to any $L_n$ with $n \geq N_\varphi$.
Given Theorem 30, conditionalising the equivocator function is a simple method for determining the maximal entropy probabilities in objective Bayesian inductive logic. Although this approach to inductive logic is Bayesian, conditionalisation is not taken here as a principle that is constitutive of or core to the Bayesian method, but rather as an inference tool that is appropriate in certain specific circumstances. Indeed, conditionalisation has been criticised as being problematic outside an appropriate range of circumstances (Howson, 2014; Williamson, 2010). The fact that it agrees with the maximal entropy approach can be taken to justify the use of conditionalisation on learning $\varphi$, in the circumstances in which $\varphi$ has positive measure and is 'simple' in the sense that it only imposes the constraint $P(\varphi) = 1$ (Williamson, 2017, Definition 5.14).

Jeffrey Conditionalisation
In this section, we generalise our results for conditionalisation from the case in which the premiss is a categorical sentence $\varphi$ to the case in which the premiss is a sentence of the language with a specific probability attached, $\varphi^c$, with $c \in (0, 1)$. Thus in this section, $E = E_{\varphi^c} := \{P \in \mathbb{P} : P(\varphi) = c\}$.
Definition 33 (Jeffrey Update of the Equivocator). Where $P_=(\varphi) \in (0, 1)$ we can define the Jeffrey update of the equivocator function:
$$P_{\varphi^c} := c \cdot P_=(\cdot \mid \varphi) + (1 - c) \cdot P_=(\cdot \mid \neg\varphi).$$

First, we have a straightforward generalisation of Corollary 15:

Proposition 34. If $P_{\varphi^c}$ is an entropy limit point of $E_{\varphi^c}$, then maxent $E_{\varphi^c} = \{P_{\varphi^c}\}$.

Proof: $P_{\varphi^c}(\varphi) = c$, so $P_{\varphi^c} \in E_{\varphi^c}$. Hence, Theorem 12 applies. ■
We also have an analogue of Corollary 16:

Corollary 35. If $P_{\varphi^c} \in H_n$ for all sufficiently large $n$, then maxent $E_{\varphi^c} = \{P_{\varphi^c}\}$.

Proof: If $P_{\varphi^c} \in H_n$ for sufficiently large $n$, then $P_{\varphi^c}$ is an entropy limit point of $E_{\varphi^c}$. Hence, Proposition 34 applies. ■

Thus (cf. Corollary 19), if $\varphi$ is quantifier-free and $P_{\varphi^c}$ is well defined ($1 > P_=(\varphi) > 0$), then maxent $E_{\varphi^c} = \{P_{\varphi^c}\}$. Interestingly, as we show shortly, this holds true even when $\varphi$ contains quantifiers. First we make the following observation:

Proposition 36. $\neg\varphi^n = (\neg\varphi)^n$.
Proof: Recall from (1) that for all $n \geq N_\varphi$ and all $\omega_n \in \Omega_n$ it is true that $P_=(\varphi \wedge \omega_n) \in \{0, P_=(\omega_n)\}$. Hence an n-state is inductively consistent with $\neg\varphi$ just in case it is not inductively consistent with $\varphi$. Since $\varphi^n$ is the disjunction of the latter n-states, and in particular $\varphi^n$ is quantifier-free, we have $\neg\varphi^n = (\neg\varphi)^n$, and $\langle \varphi^n, (\neg\varphi)^n \rangle$ is a partition.

■
We are now in a position to provide the main result of this section.
Theorem 37 (Agreement with Jeffrey Conditionalisation). For all $c \in (0, 1)$ and all $\varphi \in SL$ such that $P_=(\varphi) \in (0, 1)$, the maximal entropy function for the premiss $\varphi^c$ is obtained by Jeffrey updating the equivocator function:
$$\text{maxent } E_{\varphi^c} = \{P_{\varphi^c}\} = \{c \cdot P_=(\cdot \mid \varphi) + (1 - c) \cdot P_=(\cdot \mid \neg\varphi)\}.$$
Theorem 30 covers the borderline cases of $c = 0$ and $c = 1$, in which the maximal entropy function is unique and given by Bayesian conditionalisation.

Proof: The main idea of the proof comes from the intuition that it is always beneficial in terms of entropy to take probability mass away from those n-states that have few extensions to m-states that simulate $\varphi$, as $m$ increases to infinity, and to divide it (equally) between the extensions of those n-states for which almost all extensions to an m-state simulate $\varphi$ as $m$ increases to infinity.
Let $N := N_\varphi$ and note that by Theorem 30, $P_=(\cdot \mid \varphi) = P_=(\cdot \mid \varphi^N)$ and $P_=(\cdot \mid \neg\varphi) = P_=(\cdot \mid \neg\varphi^N)$. We show that $c P_=(\cdot \mid \varphi) + (1 - c) P_=(\cdot \mid \neg\varphi)$ has strictly greater entropy than every other function in $E_{\varphi^c}$. Note that it assigns all n-states extending $\varphi^N$ the same probability, and it also assigns all n-states extending $(\neg\varphi)^N = \neg\varphi^N$ (Proposition 36) the same probability.
The n-entropy of $P_{\varphi^c}$ is given by splitting the sum defining $H_n$ into the n-states extending $\varphi^N$ and those extending $\neg\varphi^N$: on the first set $P_{\varphi^c}$ equivocates mass $c$, on the second it equivocates mass $1 - c$.

Now assume $P \in E_{\varphi^c} \setminus \{P_{\varphi^c}\}$. Then there has to exist some state $\nu_N \models \neg\varphi^N$ (recall that this means that $P_=(\varphi \wedge \nu_N) = 0$) on which $P$ differs from $P_{\varphi^c}$. Let $Q$ equivocate probability mass $c$ over $\varphi^N$ and all its extensions, and, furthermore, equivocate probability mass $1 - c$ over $\neg\varphi^N$ and all its extensions, and, furthermore, equivocate over all n-states in a set $S \subset \Omega_n$ of extensions of $\neg\varphi^N$ as in Lemma 28. So, overall, $H_n(P) - H_n(P_{\varphi^c}) < 0$ for all large enough $n$: we proved that this expression is strictly less than zero for all large enough $n$ in Theorem 30 for $c = 1$. The general case for $1 > c > 0$ follows from Landes et al. (2021, Proposition 5): letting $H_n(P) > H_n(Q)$ for $P, Q \in \mathbb{P}$ and denoting by $c \bullet P$ the result of multiplying all probabilities of n-states by $1 > c > 0$, we have $H_n(c \bullet P) > H_n(c \bullet Q)$ if and only if $H_n(P) > H_n(Q)$. This is so since $H_n(c \bullet P)$ is an affine function of $H_n(P)$ with a strictly positive slope. It hence follows that $P_{\varphi^c}$ has greater entropy than every other function in $E_{\varphi^c}$. This completes the proof. ■
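The Jeffrey update itself is easy to compute for a quantifier-free $\varphi$. The following sketch of ours evaluates $P_{\varphi^c}(\psi) = c \cdot P_=(\psi \mid \varphi) + (1 - c) \cdot P_=(\psi \mid \neg\varphi)$ by enumeration and confirms the two behaviours Theorem 37 leads us to expect: $\varphi$ itself receives probability $c$, while a sentence sharing no atomic sentences with $\varphi$ keeps its equivocator probability.

```python
import itertools

def measure(atoms, s):
    worlds = list(itertools.product([False, True], repeat=len(atoms)))
    return sum(s(dict(zip(atoms, w))) for w in worlds) / len(worlds)

def jeffrey_update(atoms, phi, c, psi):
    """P_{phi^c}(psi) = c * P=(psi | phi) + (1 - c) * P=(psi | ~phi)."""
    p = measure(atoms, phi)
    assert 0 < p < 1, "the Jeffrey update requires 0 < P=(phi) < 1"
    psi_and_phi = measure(atoms, lambda v: psi(v) and phi(v))
    psi_and_not = measure(atoms, lambda v: psi(v) and not phi(v))
    return c * psi_and_phi / p + (1 - c) * psi_and_not / (1 - p)

atoms = ['Ut1', 'Vt1']
phi = lambda v: v['Ut1']
print(jeffrey_update(atoms, phi, 0.9, phi))                  # 0.9
print(jeffrey_update(atoms, phi, 0.9, lambda v: v['Vt1']))   # 0.5
```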
The result extends to a single premiss with an attached interval of probabilities:

Corollary 38. Let $X \subseteq [0, 1]$ be an interval with $E_{\varphi^X} \neq \emptyset$ and $0 < P_=(\varphi) < 1$. If $P_=(\varphi) \in X$, then maxent $E_{\varphi^X} = \{P_=\}$; otherwise maxent $E_{\varphi^X} = \{P_{\varphi^c}\}$, where $c$ is the point of $X$ closest to $P_=(\varphi)$.

Proof: If $P_=(\varphi) \in X$, then $P_= \in E_{\varphi^X}$. Since $P_=$ has greater entropy than every other probability function, the result follows.

If $P_=(\varphi) \notin X$, then for all $P \in E_{\varphi^X}$ it holds that $x := P(\varphi) \neq P_=(\varphi)$. By the proof of Theorem 37 we see that $P_x := x P_=(\cdot \mid \varphi) + (1 - x) P_=(\cdot \mid \neg\varphi)$ has greatest entropy among all functions with $x = P(\varphi)$. Computing the n-entropies of these probability functions for $n \geq N$, it holds for all $x, y \in [0, 1]$ and all $n > N$ that $H_n(P_x) > H_n(P_y)$ if and only if $H_N(P_x) > H_N(P_y)$. (9)

Let us next note that $P_{P_=(\varphi^N)} = P_=$. Furthermore, every $P_x$ is a convex combination of $P_=(\cdot \mid \varphi)$ and $P_=(\cdot \mid \neg\varphi)$. Along this line from $P_=(\cdot \mid \varphi)$ to $P_=(\cdot \mid \neg\varphi)$, N-entropy is maximised by $P_{P_=(\varphi^N)} = P_=$, since it is the equivocator (on $\Omega_N$).

Since the $P_x$ (on $\Omega_N$) all lie on a line segment and $H_N$ is strictly concave, it follows that N-entropy is uniquely maximised by the equivocator and strictly decreases the further one moves in either direction from the equivocator. Hence, $P_c$ has strictly the greatest N-entropy among all $P_x$ for $x \in X \setminus \{c\}$.

Applying the above equivalence (9), we find that $P_c$ (since $c \in X$ is the closest to $P_=(\varphi)$) also has the greatest n-entropy among all $P_x$ for $x \in X$ for large enough $n$. $P_c$ hence has greater entropy than every other probability function $P \in E_{\varphi^X} \setminus \{P_c\}$. ■

7 Preservation of Inductive Tautologies
Having developed the entropy limit point method for determining maximal entropy functions, and having demonstrated concordance with Bayesian and Jeffrey conditionalisation, we will now discuss the maximal entropy approach from a general perspective. In this section, we outline some logical features of objective Bayesian inductive logic; in §8 we explore the extent to which inferences are invariant under permutations of the constants, and in §9 we investigate some cases involving categorical premisses with zero measure.
First we show that, in objective Bayesian inductive logic, inductive tautologies (i.e., probability 1 inferences in the absence of any premisses) are preserved after learning the probability of any proposition that is inductively consistent:

Theorem 39 (Preservation of Inductive Tautologies, PIT). Suppose $|\!\!\approx^\circ \psi$ and let $F := \{P \in \mathbb{P} : P(\psi) = 1\}$. Then maxent $E \subseteq$ maxent$(E \cap F)$ whenever maxent $E \subseteq F$; in particular, $\varphi^c \;|\!\!\approx^\circ\; \psi$ for every $\varphi$ of positive measure and every $c > 0$.
Proof: If $P \in$ maxent $E$ then no function in $E$ dominates $P$ in n-entropy for sufficiently large $n$. In particular, no function in $E \cap F$ dominates $P$ in n-entropy for sufficiently large $n$. Thus, $P \in$ maxent$(E \cap F)$ and maxent $E \subseteq$ maxent$(E \cap F)$. For the final claim, note that by Theorems 30 and 37 the unique maximal entropy function for $\varphi^c$ is a mixture of $P_=(\cdot \mid \varphi)$ and $P_=(\cdot \mid \neg\varphi)$; since $P_=(\psi) = 1$, both conditional functions give $\psi$ probability 1, so this maximal entropy function lies in $F$.
■

PIT can also be thought of as a variant of the Rational Monotonicity rule of inference in non-monotonic logic (Lehmann and Magidor, 1992, §3.4): PIT specialises Rational Monotonicity to the case in which $\psi$ is an inductive tautology and then generalises it to the case in which $\varphi$ is uncertain.
PIT can also be interpreted as an absolute continuity condition (Billingsley, 1979, p. 422): if $\neg\theta$ has zero measure, i.e., $P_=(\neg\theta) = 0$, then any $P^\dagger \in$ maxent $E_{\varphi^c}$ also gives zero probability to $\neg\theta$, where $\varphi$ has positive measure and $c > 0$. Note that the equivocator function $P_=$ corresponds to Lebesgue measure when probability functions on $L$ are mapped to probability measures on the unit interval (Williamson, 2017, §2.6.3). Thus, 'zero measure' in the present sense (Definition 4) corresponds to zero Lebesgue measure.
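For a single premiss $\varphi^c$ covered by Theorem 37, the preservation of an inductive tautology $\psi$ can also be verified directly (a short check, assuming $0 < P_=(\varphi) < 1$): since $P_=(\psi) = 1$ forces $P_=(\psi \mid \varphi) = P_=(\psi \mid \neg\varphi) = 1$, we have
$$P_{\varphi^c}(\psi) = c \cdot P_=(\psi \mid \varphi) + (1 - c) \cdot P_=(\psi \mid \neg\varphi) = c \cdot 1 + (1 - c) \cdot 1 = 1,$$
so the unique maximal entropy function for $\varphi^c$ gives $\psi$ probability 1.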
8 Invariance under Permutations

Williamson (2010, Proposition 5.10) shows that the maximal entropy approach is invariant under those finite and infinite permutations of the atomic sentences that list atomic sentences involving only $t_1, \ldots, t_n$ before those involving $t_{n+1}$, for each $n$. In this section, we explore invariance under permutations of the constants themselves.
Definition 41. Let $f$ be a reordering of the constants, i.e., a bijection on their indices. For $\varphi \in SL$ we write $f(\varphi)$ for the result of reordering the constants in $\varphi$ according to $f$. We use $f(P)$ to denote the probability function obtained from $P$ by permuting the constants according to $f$: $f(P)(\varphi(\vec{t}\,)) := P(\varphi(f(\vec{t}\,)))$ for all $\varphi \in SL$.
Lemma 42. If $P \in \mathbb{P}$ and $f$ is a reordering of constants, then $f(P) \in \mathbb{P}$.
Proof: It is clear that f (P ) satisfies P1 and P2.
Concerning P3, we need to show that $f(P)(\exists x\, \theta(x, \vec{t}\,)) = \lim_{m \to \infty} f(P)(\bigvee_{i=1}^{m} \theta(t_i, \vec{t}\,))$; the remaining equalities follow from the definition of $f(P)$.

As usual, put $N := \max\{i : t_i \in \theta(\vec{t}\,)\}$ and also let $N_f := \max\{j : t_j \in \theta(f(\vec{t}\,))\}$. Let us now fix $m$ and consider the disjunction $\bigvee_{i=1}^{m} \theta(t_i, f(\vec{t}\,))$: each of its disjuncts occurs in some sufficiently long initial disjunction $\bigvee_{i=1}^{m'} \theta(t_{f(i)}, f(\vec{t}\,))$, and conversely. We next note that $(P(\bigvee_{i=1}^{m} \theta(t_i, f(\vec{t}\,))))_{m \in \mathbb{N}}$ is an increasing non-negative sequence which converges by P3 to $P(\exists x\, \theta(x, f(\vec{t}\,)))$. This entails that $\sup_m f(P)(\bigvee_{i=1}^{m} \theta(t_i, \vec{t}\,)) = P(\exists x\, \theta(x, f(\vec{t}\,))) = f(P)(\exists x\, \theta(x, \vec{t}\,))$, where the last equality is just the definition of $f(P)$. Hence, $f(P)$ satisfies P3. ■

The concept of 'greater entropy' is well defined in the sense that it is preserved under any permutation that preserves the probability functions that it permutes:

Proposition 43 (Independence of ordering of constant symbols). For any reordering of constants $f$ and probability functions $P, Q$ such that $f(P) = P$ and $f(Q) = Q$: $P$ has greater entropy than $Q$ if and only if $f(P)$ has greater entropy than $f(Q)$.

Proof:
Since $f(P) = P$ and $f(Q) = Q$, we have $H_n(f(P)) = H_n(P)$ and $H_n(f(Q)) = H_n(Q)$ for all $n$. Hence, $P$ has greater n-entropy than $Q$ for sufficiently large $n$ if and only if $f(P)$ has greater n-entropy than $f(Q)$ for sufficiently large $n$.

■
On the other hand, if a permutation $f$ changes the two probability functions of interest, then the permuted functions can compare differently with respect to which has greater entropy:

Proposition 44 (Dependence on ordering of constant symbols). There exists an infinite reordering of constants $f$ and probability functions $P, Q$ such that $P$ has greater entropy than $Q$ but $f(Q)$ has greater entropy than $f(P)$.
Proof: To simplify matters, we consider a language containing only a single relation symbol, $U$, which is unary. The proof strategy clearly applies to all languages in our sense.
Let $f$ be the following bijection on $\mathbb{N}$: $f(2n + 1) := 2n - 1$ for all $n \geq 1$, $f(1) := 2$ and $f(2n) := 2n + 2$. Intuitively, the even numbers and 1 are postponed to the future, while the odd numbers, with the exception of 1, are brought forward.
It is important in the following that, for all $n$, $f$ does not map $\{1, \ldots, n\}$ onto itself. For all even $n$ and for $n = 1$ it holds that $f(n) > n$; for all other odd $n$ it holds that $f^{-1}(n) = n + 2 > n$. This fact will be used without further mention.
Next define a probability function $P \in \mathbb{P}$ by making all constant symbols independent of each other. This entails that n-entropies can be written as a sum of $n$ 1-entropies; this follows from, for example, Landes and Williamson (2016, Equation 1).

For all $n \geq 1$ we now let
$$P(Ut_n) := \begin{cases} \frac{1}{2} & : n \text{ even or } n = 1 \\ 1 & : \text{otherwise.} \end{cases}$$
We can then compute the n-entropies for all $n \geq 1$ as
$$H_n(P) = \Big( \big\lfloor \tfrac{n}{2} \big\rfloor + 1 \Big) \log 2,$$
since the even $n$ are maximally equivocal, as is 1, and all other odd $n$ are deterministic.
Next define $Q \in \mathbb{P}$, again with all constants independent, by choosing $Q(Ut_i) \in [0, 0.5]$ so that the 1-entropies take suitable intermediate values. We note that such a $Q$ is well defined under the assumption that $Q(Ut_i) \leq 0.5$, since i) 1-entropy is strictly concave and strictly increasing for $Q(Ut_i) \in [0, 0.5]$, ii) $H_1(P) \in [0, \log 2]$ for all $P \in \mathbb{P}$, iii) $H_1$ is continuous, iv) $H_1$ is a bijective map from $[0, 0.5]$ onto $[0, \log 2]$, and finally v) the intermediate value theorem holds.
By construction, for $i \in \{1, 2, 3\}$ it holds that $H_i(P) > H_i(Q)$. That $H_i(P) > H_i(Q)$ holds for all greater $i$, too, follows from the definition of $Q$.

■
The proof in fact shows that for any language there exist probability functions $P, Q \in \mathbb{P}$ such that $P$ has greater entropy than $f(P)$ and $P$ has greater entropy than $Q$, yet $f(Q)$ has greater entropy than $f(P)$.
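The mechanism behind Proposition 44 can be seen in a toy computation. With independent constants, $H_n$ is the sum of the first $n$ 1-entropies, so a reordering that makes a function's equivocal constants sparser among the early indices lowers all of its large-$n$ partial sums. The marginals and the reordering below are assumed for illustration; they are not the exact functions used in the proof.

```python
import math

def h(p):
    """1-entropy of a coordinate with P(U t_i) = p."""
    return 0.0 if p in (0, 1) else -(p * math.log(p) + (1 - p) * math.log(1 - p))

def Hn(marginal, n):
    # with independent constants, H_n is the sum of the first n 1-entropies
    return sum(h(marginal(i)) for i in range(1, n + 1))

# P: equivocal on the even-indexed constants, deterministic elsewhere.
P = lambda i: 0.5 if i % 2 == 0 else 1.0
# f(P): a reordering placing P's equivocal constants at the multiples of 3,
# a sparser set (density 1/3 instead of 1/2).
fP = lambda i: 0.5 if i % 3 == 0 else 1.0
# Q: the same mildly equivocal marginal everywhere, hence f(Q) = Q.
Q = lambda i: 0.92  # h(0.92) ~ 0.279, between log(2)/3 and log(2)/2

for n in [30, 300, 3000]:
    print(n, Hn(P, n) > Hn(Q, n), Hn(fP, n) < Hn(Q, n))
# True True: P dominates Q, yet after the reordering f(Q) dominates f(P).
```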
Interestingly, despite the possibility exposed by Proposition 44, our results show that in many natural cases the function that has maximal entropy is invariant under reordering the constants:

Theorem 45 (Invariance under Permutations of Constant Symbols). Let $f$ be a reordering of constants, $\varphi \in SL$ with $0 < P_=(\varphi) < 1$ and $c \in (0, 1)$. Then maxent $E_{f(\varphi)^c} = \{f(P_{\varphi^c})\} = \{P_{f(\varphi)^c}\}$.
Proof: Let us first recall that by Lemma 42 we have $f(P) \in \mathbb{P}$. Furthermore, from the definition of $f(P)$ we immediately obtain that $f(P_=) = P_=$. After observing that $\models f(\varphi^m) \leftrightarrow f(\varphi)^m$ and that $\models f(\neg\varphi^m) \leftrightarrow f(\neg\varphi)^m$ for all large enough $m$, we apply Theorem 37 and find
$$\text{maxent } E_{f(\varphi)^c} = \{c \cdot P_=(\cdot \mid f(\varphi)) + (1 - c) \cdot P_=(\cdot \mid \neg f(\varphi))\} = \{f(P_{\varphi^c})\}.$$
It now suffices to note that the equivocator function is as symmetrical as can be: for all $\chi, \rho \in QFSL$ it holds that $P_=(\chi \mid \rho) = P_=(f(\chi) \mid f(\rho))$.

■
As expected, this result generalises easily to a single premiss with an attached uncertainty interval:

Corollary 46. Let $f$ be a reordering of constants, $\varphi \in SL$ with $0 < P_=(\varphi) < 1$, and let $X$ be an interval with $E_{\varphi^X} \neq \emptyset$. If maxent $E_{\varphi^X} = \{P\}$, then maxent $E_{f(\varphi)^X} = \{f(P)\}$.
Proof: For both premisses a unique maximal entropy function exists which is equal to a Jeffrey update of the equivocator. These Jeffrey (or simply Bayesian) updates are with respect to $\varphi^{N_\varphi}$ and, respectively, the logically equivalent $f(\varphi^{N_\varphi})$ and $(f(\varphi))^{N_\varphi}$. Furthermore, both Jeffrey updates are with respect to the same $x \in X$ (Corollary 38).
Finally, let us apply the proof of Theorem 45 to conclude that, for all $\psi \in SL$, maxent $E_{f(\varphi)^X}$ assigns $f(\psi)$ the probability that maxent $E_{\varphi^X}$ assigns $\psi$. ■

9 Zero-Measure Premisses

As Example 13 illustrates, there are cases of zero-measure premisses that are entirely unproblematic and that can be handled using the entropy limit point techniques introduced in §3. However, some zero-measure premisses are more problematic, in that they generate sets $E$ of probability functions in which there is no function with maximal entropy. We will focus on these pathological cases in this section. We first provide some examples of such cases and then we discuss how best to proceed when they arise. We argue that these cases suggest a refinement to the definition of maximal entropy and that they motivate drawing inferences from any function with sufficiently great entropy.
To simplify the exposition, we assume in this section that the underlying language $L$ contains only the single relation symbol employed in the respective propositions. The general case follows from the fact that entropy maximisation is language invariant (Paris, 1994, Chapter 6), because maximal entropy functions equivocate over all sentences mentioning only relation symbols that are not mentioned by any premiss.
Proposition 47. For $\varphi = \exists x \forall y\, Uxy$ and any $P \in E_\varphi$, there exists a probability function $Q \in E_\varphi$ which has greater entropy than $P$. Hence, maxent $E_\varphi = \emptyset$.
Proof: Suppose for contradiction that maxent $E_\varphi \neq \emptyset$ and let $P \in$ maxent $E_\varphi$. We now show that this entails a contradiction. This is achieved by first defining a probability function $P' \in E_\varphi \setminus \{P\}$ such that $H_n(P') \geq H_n(P)$ for all large enough $n$. It is not necessarily the case that $P'$ has greater entropy than $P$. However, all probability functions that are a proper convex combination of $P$ and $P'$ are in $E_\varphi$ ($E_\varphi$ is convex) and have strictly greater n-entropy than $P$ for all large enough $n$ ($H_n(\cdot)$ is concave). Hence, all these convex combinations are in $E_\varphi$ and have greater entropy than $P$. Contradiction.
Note that $P_=(\varphi) = 0 < 1 = P(\varphi)$. Hence, $P \neq P_=$. Let us now define a probability function $P' \in E_\varphi$ by shifting all witnessing of $\exists x \forall y\, Uxy$ by one and then adding a constant $t_1$ such that $Ut_1 t_*$ is independent of all other literals, for all $t_*$. Intuitively, the literals $\pm U t_i t_k$ are replaced by $\pm U t_{i+1} t_k$.
Formally, let $\omega_n \in \Omega_n$, $\omega_n = \bigwedge_{i,k=1}^{n} U^{\epsilon_{i,k}} t_i t_k$, be an arbitrary n-state. Then define $P'$ by
$$P'\Big(\bigwedge_{i,k=1}^{n} U^{\epsilon_{i,k}} t_i t_k\Big) := 2^{-n} \cdot P\Big(\bigwedge_{i=1}^{n-1} \bigwedge_{k=1}^{n} U^{\epsilon_{i+1,k}} t_i t_k\Big).$$
Firstly, we note $P'(\forall y\, U t_1 y) = \lim_{n \to \infty} P'(\bigwedge_{k=1}^{n} U t_1 t_k) = \lim_{n \to \infty} 2^{-n} = 0$. So, according to $P'$, the constant $t_1$ is not a witness of the existential premiss sentence $\varphi$.
We next show that $P \neq P'$. Firstly, note that since $\lim_{n \to \infty} P(\bigvee_{i=1}^{n} \forall y\, U t_i y) = P(\exists x \forall y\, Uxy) = 1$, the minimum $\min\{i \in \mathbb{N} : P(\forall y\, U t_i y) > 0\}$ is over a non-empty set and is thus a finite number. Armed with this observation, we note, secondly and finally, that $\min\{i \in \mathbb{N} : P'(\forall y\, U t_i y) > 0\} = 1 + \min\{i \in \mathbb{N} : P(\forall y\, U t_i y) > 0\}$. So, $P \neq P'$. We also observe that for all $i \in \mathbb{N}$, $P'(\forall y\, U t_{i+1} y) = P(\forall y\, U t_i y)$ and, furthermore, $P'(\bigwedge_{i \in I} \forall y\, U t_{i+1} y) = P(\bigwedge_{i \in I} \forall y\, U t_i y)$ for all finite index sets $I$. So,
$$P'(\exists x \forall y\, Uxy) = \lim_{n \to \infty} P'\Big(\bigvee_{i=1}^{n+1} \forall y\, U t_i y\Big) = \lim_{n \to \infty} P\Big(\bigvee_{i=1}^{n} \forall y\, U t_i y\Big) = 1.$$
This means that $P'(\exists x \forall y\, Uxy) = 1$ and thus, as advertised, $P' \in E_\varphi$.
We now calculate the n-entropies of $P$ and $P'$. Splitting the sum over n-states into the configuration of the literals $U^{\epsilon_{1,k}} t_1 t_k$ ($1 \leq k \leq n$), on which $P'$ equivocates, and the configuration of the remaining literals, we find for $n \geq 1$ that
$$H_n(P') = n \log 2 \;-\; \sum_{\vec{\epsilon}} P\Big(\bigwedge_{i=1}^{n-1} \bigwedge_{k=1}^{n} U^{\epsilon_{i,k}} t_i t_k\Big) \log P\Big(\bigwedge_{i=1}^{n-1} \bigwedge_{k=1}^{n} U^{\epsilon_{i,k}} t_i t_k\Big).$$
Holding the second summation fixed, we note, since n-entropy is maximised by maximally equivocating, that $H_n(P) \leq H_n(P')$. For example, if $P$ is flat on $\bigwedge_{k=1}^{n} U^{\epsilon_{n,k}} t_n t_k$, i.e., $P(\bigwedge_{k=1}^{n} U^{\epsilon_{n,k}} t_n t_k) = 2^{-n}$ for all $\epsilon_{n,k}$ with $1 \leq k \leq n$, and all these conjunctions are independent of $\bigwedge_{i=1}^{n-1} \bigwedge_{k=1}^{n} U^{\epsilon_{i,k}} t_i t_k$ for all $\epsilon$, then $H_n(P) = H_n(P')$. Now define $Q := \frac{P + P'}{2}$. Since $E_\varphi$ is convex and $P, P' \in E_\varphi$, we observe that $Q \in E_\varphi$.
Since n-entropy is a strictly concave function, we conclude that $H_n(Q) > H_n(P)$ whenever $P$ and $P'$ disagree on $L_n$. Since $P \neq P'$, there has to exist some finite $M$ and quantifier-free sentence $\psi \in QFSL_M$ such that $P(\psi) \neq P'(\psi)$ (Gaifman's Theorem). Since $L_m \subset L_{m+1}$ for all $m$, we have that $P$ disagrees with $P'$ on $L_m$ for all $m \geq M$. We have hence found a $Q \in E_\varphi$ such that $H_n(Q) > H_n(P)$ for all large enough $n$. Hence, $P \notin$ maxent $E_\varphi$. Contradiction.

■
We generalise this result to higher quantifier complexity in Appendix 3. These results are summarised in the following theorem.
Theorem 48 (Zero-Measure Premisses). For all $n \geq 2$ there are zero-measure premisses $\varphi$ of quantifier complexity $\Sigma_n$ such that for all $P \in E_\varphi$ there exists a probability function $Q \in E_\varphi$ which has greater entropy. Hence, maxent $E_\varphi = \emptyset$.
Having introduced some pathological cases in which there is no maximal entropy function, we now turn to the question of what to do in such cases.
For simplicity of exposition, we focus on the case in which we have a single premiss, $\varphi = \exists x \forall y\, Uxy$, considered in Proposition 47. We call a proposition of the form $\forall y\, U t_i y$ a witness proposition. A probability function $P$ that satisfies $\varphi$ distributes probability 1 to the witness propositions: $\lim_{k \to \infty} P(\bigvee_{i=1}^{k} \forall y\, U t_i y) = P(\exists x \forall y\, Uxy) = 1$. We call a constant $t_i$ a witness if $P$ gives positive probability to the corresponding witness proposition $\forall y\, U t_i y$. Now, the equivocator function, which is the probability function with maximal entropy, gives $\varphi$ measure zero, $P_=(\exists x \forall y\, Uxy) = 0$, and thus it has no witnesses. Given $P$, one can construct a function $Q$ that has greater entropy than $P$ by making $Q$ 'closer to' the equivocator in one or both of two ways:

1. Delaying the witnesses. If there are infinitely many witnesses, then one can create $Q$ by increasing the index of each witness in an appropriate way in order to make $Q$ more like the equivocator than $P$ for each fixed $n$. For example, if $t_{i_1}, t_{i_2}, \ldots$ are the witnesses for $P$, one can construct $Q$ with witnesses $t_{i_2}, t_{i_3}, \ldots$, ensuring that $Q(\forall y\, U t_{i_1} y) = 0$ and $Q(\forall y\, U t_{i_j} y) = P(\forall y\, U t_{i_{j-1}} y)$ for each $j > 1$.
2. Flattening the distribution over witness propositions. Entropy can be increased by increasing the number of witnesses, if there are finitely many, and by distributing probability more equally to the witnesses, decreasing the rate at which the probability of $\bigvee_{i=1}^{k} \forall y\, U t_i y$ converges to 1.

The approach taken in the proof of Proposition 47 involved a mixture of these strategies: delaying witnesses to give $P'$, and then flattening the distribution by taking a convex combination of $P$ and $P'$, to yield $Q$.
One might argue that although the first of these two strategies increases n-entropy for sufficiently large $n$, it does not on its own lead to a function that is more equivocal in an intuitive sense. Hence, this seems to be a case in which the formal concept of maximal entropy fails to adequately explicate the concept of being maximally equivocal. (In contrast, the second strategy is unproblematic: flattening the distribution over witness propositions does seem to be a genuine way of generating a more equivocal probability function.) The explication of maximal entropy can, however, be refined to avoid this problem: we can deem $P$ to have greater entropy than $Q$ just when, for every reordering $f$ of the constants that do not appear in the premisses, $f(P)$ dominates $f(Q)$ in n-entropy for sufficiently large $n$. Note that this refinement relativises the greater-entropy relation to the premisses.
This refinement eradicates the first of the two strategies: delaying witnesses no longer increases entropy, because there are reorderings with respect to which the witnesses are not delayed. The refinement leaves intact the second kind of strategy.
If we accept this refinement, the question then becomes: what policy should be adopted when there is no maximal entropy function because of increases in entropy of the second kind? Williamson (2010, pp. 29-30) suggests a pragmatic policy: to take inferences to be determined by probability functions with sufficiently great entropy. Here, the cut-off between functions that have sufficiently great entropy and those that do not may depend on features of the problem or on the users of the logic, and may not be precise. Choosing a probability function with sufficiently great entropy amounts to a choice of $P$ such that $P(\bigvee_{i=1}^{k} \forall y\, U t_i y)$ converges to 1 sufficiently slowly.
Further desiderata may be imposed. For example, one might suggest equivocating between the constants by treating them equally. The thought here is that each constant should be a witness, because the premiss gives no grounds for discriminating between constants that are witnesses and those that are not. This line of reasoning motivates giving each witness proposition the same probability $s > 0$ and making witness propositions probabilistically independent. In that case, $P(\bigvee_{i=1}^{k} \forall y\, U t_i y) = 1 - (1 - s)^k$, which converges to 1 as required. Now, decreasing $s$ (and distributing the corresponding probability equally amongst n-states) will lead to a probability function with greater entropy; this is an application of the second of the two strategies outlined above. The pragmatic policy then amounts to drawing inferences from probability functions that correspond to values of $s$ that are sufficiently small. One approach here is to take $s$ to be sufficiently small just when taking $s$ any smaller would not make a significant difference with respect to practical purposes.
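The convergence behaviour underlying this policy is elementary and can be sketched as follows (an illustration of ours): with each witness proposition given probability $s$ independently, the probability that one of the first $k$ constants is a witness is $1 - (1 - s)^k$, which tends to 1 for every $s > 0$, but ever more slowly as $s$ shrinks.

```python
# Probability that some of the first k constants witnesses the premiss,
# when each witness proposition has probability s independently.
def prob_some_witness(s, k):
    return 1 - (1 - s) ** k

for s in [0.5, 0.1, 0.01]:
    print(s, [round(prob_some_witness(s, k), 4) for k in (10, 100, 1000)])
# Smaller s: the same limit 1, approached more slowly, i.e. a more
# equivocal function with greater entropy.
```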
In sum, we see that although these pathological examples require refinements to the overall approach, there is scope to devise policies that allow one to extend objective Bayesian inductive logic even to these difficult measure-zero cases.

Conclusion
Objective Bayesian inductive logic defines inductive entailment from a set of (possibly probabilistic) premisses in terms of maximal entropy probability functions that satisfy the given premisses. To be more precise, a set of premisses inductively entails a conclusion if every probability function with maximal entropy that satisfies the premisses also satisfies the conclusion. This is a very natural approach to inductive logic that has been studied extensively in the literature in the context of reasoning with propositional languages. An immediate task that arises with this approach is to find these maximal entropy probability functions in order to perform inference. This is a straightforward, although possibly computationally expensive, problem when working with propositional languages. For more expressive languages, however, it is not clear how one should proceed to determine these maximal entropy probability functions. In this paper, we have studied this problem for premisses and conclusions that are given in terms of constraints on the probabilities of sentences of a first-order language.
To do so, we first introduced the notion of an entropy limit point and discussed its use for determining maximal entropy probability functions. Next we distinguished what we call the measure-zero sentences from those that have positive measure. Measure-zero sentences are sentences that are assigned probability zero by the equivocator function $P_=$. Intuitively, measure-zero sentences are those that have very few models; more precisely, these are sentences for which the proportion of term structures with a countably infinite domain that satisfy them is negligible. We showed that for categorical premisses with positive measure, the maximal entropy approach agrees with Bayesian conditionalisation. This then generalises to Jeffrey conditionalisation when dealing with a non-categorical premiss that is given in terms of a constraint on the probability of some sentence. With these results in place, we then showed that although the comparative entropy of probability functions does in general depend on the ordering of constants in the language, the probability functions with maximal entropy remain invariant under such permutations in the cases where the approach agrees with Bayesian or Jeffrey conditionalisation.
These results not only clarify which probabilities the maximal entropy probability functions assign for inductive inference but also give a constructive method for calculating the maximal entropy probabilities. On the one hand, this shows that the maximal entropy approach agrees with standard conceptions of baseline rationality. On the other, it witnesses the stability and generality of Bayesian conditionalisation as a process of probabilistic learning.
Finally, we turned our attention to inference from zero-measure premisses and identified a certain class of zero-measure sentences for which there is no maximal entropy probability function. This leaves the question of inductive inference from these pathological zero-measure premisses open. The issue is then to understand which inferences from zero-measure premisses are rational and how to systematically characterise such inferences in terms of a unified inference process, and we developed a strategy for doing this.
Another interesting question is what more can be said about inductive inference from multiple non-categorical premisses. So far, our results on objective Bayesian inductive logic have concerned languages containing only relation symbols. It is natural to extend these considerations to languages also containing a symbol for equality and function symbols, which have already been studied in Pure Inductive Logic (Landes, 2009; Landes et al., 2009; Paris and Vencovská, 2015; Howarth and Paris, 2019; Paris and Vencovská, 2019). Finally, our hope here is that these results can also suggest new avenues for investigating the open cases of the entropy limit conjecture, which concerns the equivalence of the two main approaches to inductive inference introduced in §1.
Appendix 1

Proof of Theorem 12: First we shall show that $P \in$ maxent $E$; later we shall see that there is no other member of maxent $E$. First, then, assume for contradiction that $P \notin$ maxent $E$. Then there is some $Q \in E$ such that $Q$ has greater entropy than $P$. That is, for sufficiently large $n$, $H_n(Q_n) \geq H_n(Q) > H_n(P)$, where the $Q_n \in H_n$ converge in entropy (and, by Proposition 11, in $L_1$) to $P$. N.b., $Q \neq P$. Hence, for sufficiently large $n$, the n-entropies of the $Q_n$ are bounded below by those of $Q$. Since the $Q_n$ converge in entropy to $P$, they then also converge in $L_1$ to $Q$. By the uniqueness of $L_1$ limit points, $Q = P$: a contradiction. Hence $P \in$ maxent $E$, as required.
Next we shall see that $P$ is the unique member of maxent $E$. Suppose for contradiction that there is some $P^\dagger \in$ maxent $E$ such that $P^\dagger \neq P$. Then $P$ cannot eventually dominate $P^\dagger$ in n-entropy, i.e., there is some infinite set $J \subseteq \mathbb{N}$ such that for $n \in J$, $H_n(P^\dagger) \geq H_n(P)$.
In particular, since $\psi$ is quantifier-free, $Q_n(\psi) - P(\psi) \leq \max_{\varphi \in SL_n}(Q_n(\varphi) - P(\varphi)) < \lambda\delta/2$. For any such $n$,
$$f_n(\rho_n) \geq f_n(\psi) = P(\psi) - Q_n(\psi) + \lambda(P^\dagger(\psi) - P(\psi)).$$
Putting the above parts together, we have that, for sufficiently large $n \in J$, the differences $H_n(Q_n) - H_n(P)$ are bounded away from zero. However, this contradicts the assumption that the $Q_n$ converge in entropy to $P$. Hence, $P$ is the unique member of maxent $E$, as required. ■
Appendix 3

We generalise Proposition 47 to higher quantifier complexity; consider, for instance, $\varphi = \forall x \exists y \forall z\, Sxyz$ for a ternary relation symbol $S$. Assume for contradiction that $P \in$ maxent $E_\varphi$. Since $P_=(\varphi) = 0$, $P$ cannot be the equivocator. However, since $P \in E_\varphi$, it must also hold that for all $t_i$ ($i \in \mathbb{N}$) there has to exist some minimal $t_{k_i^*}$ ($k_i^* \geq 1$) such that $P(\forall z\, S t_i t_{k_i^*} z) > 0$. We now define a probability function $Q \in E_\varphi$ which has greater entropy than $P$, which contradicts $P \in$ maxent $E_\varphi$. First, we postpone, for each $i$, the witnessing (see Proposition 47) to $k_i^* + 1$. This is again achieved by first defining a probability function $P' \in E_\varphi \setminus \{P\}$ such that $H_n(P') \geq H_n(P)$ for all large enough $n$. As we saw in Proposition 47, it holds that $P'(\exists y \forall z\, S t_i yz) = 1$ for all $i \in \mathbb{N}$.
Furthermore, for all $i \in \mathbb{N}$ there exist an $n_i \in \mathbb{N}$ and $\epsilon_{k,l} \in \{0, 1\}^{n_i \times n_i}$ such that $P'(\bigwedge_{k=1}^{n_i} \bigwedge_{l=1}^{n_i} S^{\epsilon_{k,l}} t_i t_k t_l) \neq P(\bigwedge_{k=1}^{n_i} \bigwedge_{l=1}^{n_i} S^{\epsilon_{k,l}} t_i t_k t_l)$. Given the way we wrote $E_\varphi$ (see (11)), we see that every extension of $P'$ to a probability function (which has so far not been defined on the entire language) will be in $E_\varphi$, since membership in $E_\varphi$ solely depends on sub-states in which the first constant is fixed to some $t_i$.
We now define $P'$ on an arbitrary n-state $\omega_n$ of the language, and hence via Gaifman's Theorem on the entire language, by
$$P'(\omega_n) := \prod_{i=1}^{n} P'\Big(\bigwedge_{k=1}^{n} \bigwedge_{l=1}^{n} S^{\epsilon_{i,k,l}} t_i t_k t_l\Big),$$
i.e., by making the sub-states with different first constants independent.
Because of the additivity of the entropy function (Csiszár, 2008, p. 63), we also find for all $n \in \mathbb{N}$ that $H_n(P')$ decomposes into the sum of the entropies of the sub-states $\bigwedge_{k,l=1}^{n} S^{\epsilon_{i,k,l}} t_i t_k t_l$ for $i = 1, \ldots, n$. Since the entropy function is maximised for independent variables, we also find $H_n(P) \leq H_n(P')$ for all $n$.

Taking $Q$ to be any proper convex combination of $P$ and $P'$, we see that $H_n(Q) > H_n(P)$ for all large enough $n$. This entails that $Q$ has greater entropy than $P$. ■