Abstract
According to the objective Bayesian approach to inductive logic, premisses inductively entail a conclusion just when every probability function with maximal entropy, from all those that satisfy the premisses, satisfies the conclusion. When premisses and conclusion are constraints on probabilities of sentences of a first-order predicate language, however, it is by no means obvious how to determine these maximal entropy functions. This paper makes progress on the problem in the following ways. Firstly, we introduce the concept of a limit in entropy and show that, if the set of probability functions satisfying the premisses contains a limit in entropy, then this limit point is unique and is the maximal entropy probability function. Next, we turn to the special case in which the premisses are categorical sentences of the logical language. We show that if the uniform probability function gives the premisses positive probability, then the maximal entropy function can be found by simply conditionalising this uniform prior on the premisses. We generalise our results to demonstrate agreement between the maximal entropy approach and Jeffrey conditionalisation in the case in which there is a single premiss that specifies the probability of a sentence of the language. We show that, after learning such a premiss, certain inferences are preserved, namely inferences to inductive tautologies. Finally, we consider potential pathologies of the approach: we explore the extent to which the maximal entropy approach is invariant under permutations of the constants of the language, and we discuss some cases in which there is no maximal entropy probability function.
Data Availability
This manuscript does not use data.
References
Balestrino, A., Caiti, A., & Crisostomi, E. (2006). Efficient numerical approximation of maximum entropy estimates. International Journal of Control, 79(9), 1145–1155.
Barnett, O., & Paris, J. B. (2008). Maximum entropy inference with quantified knowledge. Logic Journal of the IGPL, 16(1), 85–98.
Billingsley, P. (1979). Probability and measure (3rd (1995) edn). New York: Wiley.
Carnap, R. (1952). The continuum of inductive methods. Chicago: University of Chicago Press.
Caticha, A., & Giffin, A. (2006). Updating probabilities. In Proceedings of MaxEnt (Vol. 872, pp. 31–42).
Chen, B., Hu, J., & Zhu, Y. (2010). Computing maximum entropy densities: a hybrid approach. Signal Processing: An International Journal, 4(2), 114–122.
Cover, T. M., & Thomas, J. A. (1991). Elements of information theory (2nd (2006) edn). New York: Wiley.
Csiszár, I. (2008). Axiomatic characterizations of information measures. Entropy, 10(3), 261–273.
Gaifman, H. (1964). Concerning measures in first order calculi. Israel Journal of Mathematics, 2(1), 1–18.
Goldman, S. A. (1987). Efficient methods for calculating maximum entropy distributions. Master’s thesis, Electrical Engineering and Computer Science, Massachusetts Institute of Technology.
Goldman, S. A., & Rivest, R. (1988). A non-iterative maximum entropy algorithm. In L. Kanal & J. Lemmer (Eds.) Uncertainty in Artificial Intelligence 2 (pp. 133–148). Amsterdam: North-Holland.
Haenni, R., Romeijn, J. -W., Wheeler, G., & Williamson, J. (2011). Probabilistic logics and probabilistic networks. Synthese Library. Dordrecht: Springer.
Howarth, E., & Paris, J. B. (2019). Pure inductive logic with functions. Journal of Symbolic Logic, 84(4), 1382–1402.
Howson, C. (2014). Finite additivity, another lottery paradox and conditionalisation. Synthese, 191(5), 989–1012.
Jaynes, E. T. (1957). Information theory and statistical mechanics. The Physical Review, 106(4), 620–630.
Jaynes, E. T. (2003). Probability theory: the logic of science. Cambridge: Cambridge University Press.
Landes, J. (2009). The principle of spectrum exchangeability within inductive logic. PhD thesis, Manchester Institute for Mathematical Sciences.
Landes, J. (2021a). A triple uniqueness of the maximum entropy approach. In J. Vejnarová & N. Wilson (Eds.) Proceedings of ECSQARU, volume 12897 of LNAI (pp. 644–656). Cham: Springer.
Landes, J. (2021b). The entropy-limit (conjecture) for Σ2-premisses. Studia Logica, 109(2), 423–442.
Landes, J., & Williamson, J. (2015). Justifying objective Bayesianism on predicate languages. Entropy, 17(4), 2459–2543.
Landes, J., & Williamson, J. (2016). Objective Bayesian nets from consistent datasets. In A. Giffin & K. H. Knuth (Eds.) Proceedings of MaxEnt (Vol. 1757, pp. 020007-1–020007-8). AIP.
Landes, J., & Williamson, J. (2022). Objective Bayesian nets for integrating consistent datasets. Journal of Artificial Intelligence Research, 74, 393–458.
Landes, J., Paris, J. B., & Vencovská, A. (2009). Representation theorems for probability functions satisfying spectrum exchangeability in inductive logic. International Journal of Approximate Reasoning, 51(1), 35–55.
Landes, J., Rafiee Rad, S., & Williamson, J. (2021). Towards the entropy-limit conjecture. Annals of Pure and Applied Logic, 172(2), 102870.
Lehmann, D., & Magidor, M. (1992). What does a conditional knowledge base entail? Artificial Intelligence, 55(1), 1–60.
Ormoneit, D., & White, H. (1999). An efficient algorithm to compute maximum entropy densities. Econometric Reviews, 18(2), 127–140.
Paris, J. B. (1994). The uncertain reasoner’s companion. Cambridge: Cambridge University Press.
Paris, J. B. (1998). Common sense and maximum entropy. Synthese, 117(1), 75–93.
Paris, J. B., & Rafiee Rad, S. (2010). A note on the least informative model of a theory. In F. Ferreira, B. Löwe, E. Mayordomo, L. Mendes Gomes, J. B. Paris, & S. Rafiee Rad (Eds.) Proceedings of CiE (pp. 342–351). Berlin: Springer.
Paris, J. B., & Vencovská, A. (1990). A note on the inevitability of maximum entropy. International Journal of Approximate Reasoning, 4(3), 183–223.
Paris, J. B., & Vencovská, A. (1997). In defense of the maximum entropy inference process. International Journal of Approximate Reasoning, 17(1), 77–103.
Paris, J., & Vencovská, A. (2015). Pure inductive logic. Cambridge: Cambridge University Press.
Paris, J. B., & Vencovská, A. (2019). Six problems in pure inductive logic. Journal of Philosophical Logic, 48(4), 731–747.
Pearl, J. (1988). Probabilistic reasoning in intelligent systems: networks of plausible inference. San Mateo: Morgan Kaufmann.
Rafiee Rad, S. (2009). Inference processes for probabilistic first order languages. PhD thesis, Manchester Institute for Mathematical Sciences.
Rafiee Rad, S. (2017). Equivocation axiom on first order languages. Studia Logica, 105(1), 121–152.
Rafiee Rad, S. (2018). Maximum entropy models for Σ1 sentences. Journal of Applied Logics - IfCoLoG Journal of Logics and their Applications, 5(1), 287–300.
Rafiee Rad, S. (2021). On probabilistic characterisation of models of first order theories. Annals of Pure and Applied Logic, 172(1).
Seidenfeld, T. (1986). Entropy and uncertainty. Philosophy of Science, 53(4), 467–491.
Shannon, C. (1948). A mathematical theory of communication. The Bell System Technical Journal, 27, 379–423.
Williams, P. M. (1980). Bayesian conditionalisation and the principle of minimum information. British Journal for the Philosophy of Science, 31(2), 131–144.
Williamson, J. (2008). Objective Bayesian probabilistic logic. Journal of Algorithms, 63(4), 167–183.
Williamson, J. (2010). In defence of objective Bayesianism. Oxford: Oxford University Press.
Williamson, J. (2017). Lectures on inductive logic. Oxford: Oxford University Press.
Acknowledgments
We are grateful to Jeff Paris and Alena Vencovská for very helpful advice.
Funding
Open access funding provided by Università degli Studi di Milano within the CRUI-CARE Agreement. Jürgen Landes is funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation), grants 405961989 and 432308570, and by NextGenerationEU funding for the project “Practical Reasoning for Human-Centred Artificial Intelligence”. Soroush Rafiee Rad’s work is supported by the Dutch Institute for Emergent Phenomena (DIEP) cluster at the University of Amsterdam and partly by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation), grant 432308570. Jon Williamson is funded by the Leverhulme Trust (grant RPG-2019-059) and the Deutsche Forschungsgemeinschaft (DFG, grant 432308570).
Author information
Contributions
All authors significantly contributed to this manuscript. All authors have read and approved the uploaded version of the manuscript.
Ethics declarations
Conflict of Interest
No author reports a conflict of interest.
Appendices
Appendix 1: Proofs of Proposition 15 and Theorem 16
First let us recall some basic information-theoretic facts.
The n-divergence of two probability functions P and Q is defined as the Kullback-Leibler divergence of P from Q on \({\mathcal L}_{n}\):
$$d_{n}(P,Q) \stackrel{\text{df}}{=} {\sum}_{\omega\in{\Omega}_{n}} P(\omega)\log\frac{P(\omega)}{Q(\omega)}.$$
A Pythagorean theorem holds for the n-divergence \(d_{n}\) [7, Theorem 11.6.1]: for any convex \(\mathbb F\subseteq\mathbb P\), if \(P\in\mathbb F\) and \(Q\notin\mathbb F\),
$$d_{n}(P,Q) \geq d_{n}(P,R_{n}) + d_{n}(R_{n},Q),$$
where \(R_{n}\in\arg\inf_{S\in\mathbb F} d_{n}(S,Q)\).
Consequently, for any P ∈ E and Qn ∈ ℍn [24, Corollary 32]:
$$d_{n}(P,Q_{n}) \leq H_{n}(Q_{n}) - H_{n}(P).$$
Pinsker’s inequality connects the L1 distance to n-divergence (see, e.g., [7, Lemma 11.6.1]):
$$d_{n}(P,Q) \geq \frac{1}{2\ln 2}\,\|P-Q\|_{n}^{2},$$
where \(\|P-Q\|_{n} \stackrel{\text{df}}{=} {\sum}_{\omega\in{\Omega}_{n}} |P(\omega)-Q(\omega)|\).
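These facts are easy to check numerically. The following Python sketch (ours, purely illustrative, assuming nothing beyond numpy) computes entropy and divergence on a toy four-state space, with the equivocator in the role of the entropy maximiser, so that the divergence-entropy bound holds with equality:

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits; zero-probability states contribute nothing."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def divergence(p, q):
    """Kullback-Leibler divergence of p from q, in bits (q > 0 wherever p > 0)."""
    mask = p > 0
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

p = np.array([0.4, 0.3, 0.2, 0.1])      # an arbitrary probability function on four states
q = np.array([0.25, 0.25, 0.25, 0.25])  # the equivocator on those states

# Pinsker's inequality: d(p, q) >= ||p - q||_1^2 / (2 ln 2)
l1 = np.sum(np.abs(p - q))
assert divergence(p, q) >= l1**2 / (2 * np.log(2))

# Against the uniform maximiser, the divergence equals the entropy gap:
# d(p, q) = log2(4) - H(p) = H(q) - H(p)
assert np.isclose(divergence(p, q), entropy(q) - entropy(p))
```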
Proposition 15
If P is a limit in entropy of \(\mathbb E\) then there are Qn ∈ ℍn such that ∥Qn − P∥n→0 as n→∞.
Proof
Putting our last two information-theoretic facts together we have that
$$\frac{1}{2\ln 2}\,\|P-Q_{n}\|_{n}^{2} \leq d_{n}(P,Q_{n}) \leq H_{n}(Q_{n}) - H_{n}(P)$$
for Qn ∈ ℍn and P ∈ E.
Now, if P is a limit in entropy of \(\mathbb E\) then there are Qn ∈ ℍn such that |Hn(Qn) − Hn(P)|→0 as n→∞. Hence \(\|P-Q_{n}\|_{n}^{2}\) also converges to zero, as required.□
Theorem 16
If \(\mathbb E\) contains a limit in entropy P then
$$\operatorname{maxent}\mathbb E = \{P\}.$$
Proof
First we shall show that P ∈ maxent E; later we shall see that there is no other member of maxent E.
First, then, assume for contradiction that P ∉ maxent E. Then there is some Q ∈ E such that Q has greater entropy than P. That is, for sufficiently large n, Hn(Qn) ≥ Hn(Q) > Hn(P), where the Qn ∈ ℍn converge in entropy (and, by Proposition 15, in L1) to P. N.b., Q ≠ P. Hence, for sufficiently large n,
$$\frac{1}{2\ln 2}\,\|Q-Q_{n}\|_{n}^{2} \leq d_{n}(Q,Q_{n}) \leq H_{n}(Q_{n}) - H_{n}(Q) < H_{n}(Q_{n}) - H_{n}(P).$$
Since the Qn converge in entropy to P, they converge in L1 to Q. By the uniqueness of L1 limit points, Q = P: a contradiction. Hence P ∈ maxent E, as required.
Next we shall see that P is the unique member of maxent E. Suppose for contradiction that there is some P‡ ∈ maxent E such that P‡ ≠ P. Then P cannot eventually dominate P‡ in n-entropy; that is, there is some infinite set J ⊆ ℕ such that for n ∈ J,
$$H_{n}(P^{\ddagger}) \geq H_{n}(P).$$
Let \(R \stackrel{\text{df}}{=} \lambda P^{\ddagger} + (1-\lambda)P\) for some λ ∈ (0, 1). Now by the log-sum inequality [7, Theorem 2.7.1], for all n ∈ J large enough that P‡(ωn)≠P(ωn) for some ωn ∈ Ωn,
$$H_{n}(R) > \lambda H_{n}(P^{\ddagger}) + (1-\lambda)H_{n}(P).$$
Hence,
$$H_{n}(R) > H_{n}(P)$$
for large enough n ∈ J.
Now by Pinsker’s inequality and the definition of R,
$$d_{n}(R,Q_{n}) \geq \frac{1}{2\ln 2}\,\|R-Q_{n}\|_{n}^{2} = \frac{1}{2\ln 2}\Big({\sum}_{\omega_{n}\in{\Omega}_{n}}\big|P(\omega_{n})-Q_{n}(\omega_{n})+\lambda(P^{\ddagger}(\omega_{n})-P(\omega_{n}))\big|\Big)^{2}.$$
Let \(f_{n}(\varphi) \stackrel{\text{df}}{=} P(\varphi) - Q_{n}(\varphi) + \lambda(P^{\ddagger}(\varphi) - P(\varphi))\) and \(\rho _{n} \stackrel {\text {df}}{=} \bigvee _{f_{n}(\omega _{n})>0} \omega _{n}\). Then,
$${\sum}_{\omega_{n}\in{\Omega}_{n}} |f_{n}(\omega_{n})| = f_{n}(\rho_{n}) - f_{n}(\neg\rho_{n}) = 2f_{n}(\rho_{n}),$$
after substituting P(¬ρn) = 1 − P(ρn) etc.
Let us consider the behaviour of
$$f_{n}(\rho_{n}) = P(\rho_{n}) - Q_{n}(\rho_{n}) + \lambda\big(P^{\ddagger}(\rho_{n}) - P(\rho_{n})\big)$$
as n→∞. Now, P(ρn) − Qn(ρn)→0 as n→∞, because Qn converges in L1 to P. However, λ(P‡(ρn) − P(ρn))↛0 as n→∞, as we shall now see. P‡ ≠ P by assumption, so they must differ on some quantifier-free sentence ψ, a sentence of \({\mathcal L}_{m}\), say. Suppose without loss of generality that P‡(ψ) > P(ψ) (otherwise take ¬ψ instead) and let δ = P‡(ψ) − P(ψ) > 0. Now for n ≥ m,
$$f_{n}(\rho_{n}) \geq f_{n}(\psi) = P(\psi) - Q_{n}(\psi) + \lambda\delta.$$
Since Qn converges in L1 to P we can consider n > m large enough that [7, Equation 11.137]:
$$\max_{\varphi\in S{\mathcal L}_{n}} (Q_{n}(\varphi) - P(\varphi)) = \frac{1}{2}\,\|Q_{n}-P\|_{n} < \frac{\lambda\delta}{2}.$$
In particular, since ψ is quantifier-free, \( Q_{n}(\psi )-P (\psi ) \leq \max \limits _{\varphi \in S{\mathcal L}_{n}} (Q_{n}(\varphi ) - P (\varphi )) < \lambda \delta /2\). For any such n,
$$f_{n}(\rho_{n}) \geq P(\psi) - Q_{n}(\psi) + \lambda\delta > -\frac{\lambda\delta}{2} + \lambda\delta = \frac{\lambda\delta}{2}.$$
Putting the above parts together, we have that for sufficiently large n ∈ J,
$$H_{n}(Q_{n}) - H_{n}(P) > H_{n}(Q_{n}) - H_{n}(R) \geq d_{n}(R,Q_{n}) \geq \frac{\big(2f_{n}(\rho_{n})\big)^{2}}{2\ln 2} > \frac{(\lambda\delta)^{2}}{2\ln 2} > 0.$$
However, that these Hn(Qn) − Hn(P) are bounded away from zero contradicts the assumption that the Qn converge in entropy to P. Hence, P is the unique member of maxent E, as required.□
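The crucial step in both halves of this proof, that a proper mixture of two distinct probability functions has strictly greater n-entropy than the corresponding mixture of their n-entropies, is just the strict concavity of entropy, which the log-sum inequality delivers. A minimal numerical sketch (ours, not part of the proof; the three-state space is an arbitrary toy assumption):

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

p     = np.array([0.7, 0.2, 0.1])  # in the role of P
p_dag = np.array([0.1, 0.6, 0.3])  # in the role of P-double-dagger
lam   = 0.5
r = lam * p_dag + (1 - lam) * p    # the mixture R from the proof

# Strict concavity: the mixture's entropy strictly exceeds the mixed entropies
assert entropy(r) > lam * entropy(p_dag) + (1 - lam) * entropy(p)
```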
Appendix 2: Alternative Proof of Corollary 20
This appendix provides a more direct proof of Corollary 20, which identifies an important scenario in which the equivocator function conditioned on a categorical constraint is the maximal entropy function.
Corollary 20
If ℍn contains P=(⋅|φ) for sufficiently large n then
$$\operatorname{maxent}\mathbb E_{\varphi} = \{P_{=}(\cdot|\varphi)\}.$$
Proof
There are two cases: either P=(φ) = 1 or P=(φ) < 1.
If P=(φ) = 1 then P= ∈ Eφ and P=(⋅|φ) = P=(⋅). P= is the unique member of maxent Eφ because the equivocator function has greater entropy than any other probability function, so maxent Eφ = {P=(⋅|φ)}, as required.
If P=(φ) < 1 then we can proceed as follows.
Since P=(φ) > 0, P=(⋅|φ) is well defined. P=(φ|φ) = 1 so P=(⋅|φ) ∈ E. Thus Eφ≠∅.
Suppose for contradiction that maxent Eφ≠{P=(⋅|φ)}. Then in Eφ there must be some P‡≠P=(⋅|φ) that is not eventually dominated in entropy by P=(⋅|φ). That is, there is some infinite J ⊆ ℕ such that Hn(P‡) ≥ Hn(P=(⋅|φ)) for all n ∈ J. (To see this consider that there are three cases: (i) if maxent Eφ = ∅ then every member of Eφ is eventually dominated by some other in entropy, so P=(⋅|φ) is dominated by some P‡ and P‡ is not dominated by P=(⋅|φ); (ii) if P=(⋅|φ)∉ maxent Eφ = {P‡,…} then P‡ is not dominated by P=(⋅|φ); (iii) if maxent Eφ = {P=(⋅|φ),P‡,…} then P‡ is not dominated by P=(⋅|φ).)
Define a probability function \(Q \stackrel{\text{df}}{=} \lambda P^{\ddagger} + (1-\lambda)P_{=}(\cdot|\varphi)\) for some λ ∈ (0, 1). By the log-sum inequality [7, Theorem 2.7.1], for all n ∈ J large enough that P‡(ω)≠P=(ω|φ) for some ω ∈ Ωn,
$$H_{n}(Q) > \lambda H_{n}(P^{\ddagger}) + (1-\lambda)H_{n}(P_{=}(\cdot|\varphi)) \geq H_{n}(P_{=}(\cdot|\varphi)).$$
However, that Hn(Q) > Hn(P=(⋅|φ)) for sufficiently large n ∈ J contradicts the assumption that ℍn contains P=(⋅|φ) for sufficiently large n. Hence maxent Eφ = {P=(⋅|φ)}, as required.□
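To see the corollary in miniature, here is a small illustrative computation (ours; the one-predicate language and the premiss φ are toy assumptions). Conditionalising the equivocator on a categorical premiss yields the uniform function on the satisfying state descriptions, and no other member of Eφ has greater entropy:

```python
import itertools
import numpy as np

def entropy(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * np.log2(p) for p in probs if p > 0)

# State descriptions for one unary predicate U and two constants t1, t2;
# the premiss phi = "Ut1 or Ut2" rules out exactly one of the four states.
states = list(itertools.product([0, 1], repeat=2))  # (Ut1, Ut2) truth values
sat = [s for s in states if s[0] or s[1]]           # states satisfying phi

equivocator = {s: 0.25 for s in states}
z = sum(equivocator[s] for s in sat)                # P=(phi) = 3/4 > 0
conditioned = [equivocator[s] / z for s in sat]     # P=(.|phi): 1/3 each

# Random members of E_phi (functions giving phi probability 1) never beat it:
rng = np.random.default_rng(0)
for _ in range(1000):
    q = rng.dirichlet(np.ones(len(sat)))
    assert entropy(q) <= entropy(conditioned) + 1e-12
```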
Appendix 3: Zero Measure Premisses of Higher Quantifier Complexity
Proposition 55 (Σ2m)
For \(\varphi =\exists x_{2m}\forall x_{2m-1}{\dots } \forall x_{1} Ux_{2m}x_{2m-1}{\ldots } x_{1}\in {\Sigma }_{2m}\) it holds that for all \(P\in \mathbb E_{\varphi }\) there exists a probability function \(Q\in \mathbb E_{\varphi }\) which has greater entropy. Hence, \(\operatorname {maxent}\mathbb E_{\varphi }=\emptyset \).
Proof
For ease of notation we will write \(U t_{i} \vec {t}\) for \(U t_{i}t_{k_{2m-1}}{\ldots } t_{k_{1}}\) and \(\bigwedge _{t=1}^{n} Ut_{i}\vec {t}\) for \(\bigwedge _{k_{2m-1}=1}^{n} \ldots \bigwedge _{k_{1}=1}^{n} Ut_{i}t_{k_{2m-1}}{\dots } t_{k_{1}}\).
Suppose for contradiction that \(\operatorname {maxent}\mathbb E\neq \emptyset \) and let \(P\in \operatorname {maxent}\mathbb E\). Note that P=(φ) = 0 < 1 = P(φ). Hence, P≠P=.
Let us now define a probability function \(P^{\prime }\in \mathbb E\) by shifting all witnesses of \(\exists x_{2m} \forall x_{2m-1} \exists x_{2m-2}.... \forall x_{1} U\vec {x}\) by one and then adding a constant t1 such that \(Ut_{1}\vec {t}\) is independent of all other literals for all \(\vec {t}\). Intuitively, the literals \(\pm Ut_{i}\vec {t}\) are replaced by \(\pm Ut_{i+1}\vec {t}\).
Formally, let \(\omega _{n} = \bigwedge _{i, \vec t=1}^{n} U^{\epsilon _{i,\vec t}}t_{i}\vec {t} \in {\Omega }_{n}\) be an arbitrary n-state. Then define \(P^{\prime }\) by
Firstly, we note that
So, according to \(P^{\prime }\) the constant t1 is not a witness of the existential premiss sentence φ.
We next show that \(P\neq P^{\prime }\). Firstly, note that
and thus there is a smallest \(i \in \mathbb {N}\) for which \(P(\forall x_{2m-1} \exists x_{2m-2}.... \forall x_{1} Ut_{i}\vec {x})>0\). With this and Eq. 11, we have
So, \(P\neq P^{\prime }\).
We also observe that for all i ≥ 2,
and furthermore,
for all finite index sets I. So,
This means that \(P^{\prime }(\exists x\forall y Uxy)=1\) and thus, as advertised, \(P^{\prime }\in \mathbb E\).
We now calculate n-entropies of P and \(P^{\prime }\) and find for n ≥ 1 that:
Holding the first summation fixed, we note that, since n-entropy is maximised by maximally equivocating, \(H_{n}(P)\leq H_{n}(P^{\prime })\). Now define \(Q:=\frac {P+P^{\prime }}{2}\). Since \(\mathbb E\) is convex and \(P,P^{\prime }\in \mathbb E\), we observe that \(Q\in \mathbb E\).
Since n-entropy is a strictly concave function we conclude that Hn(Q) > Hn(P) whenever P and \(P^{\prime }\) disagree on \(\mathcal L_{n}\). Since \(P\neq P^{\prime }\) there has to exist some finite M and quantifier-free sentence \(\psi \in QFS\mathcal L_{M}\) such that \(P(\psi )\neq P^{\prime }(\psi )\) (Gaifman’s Theorem). Since \(\mathcal L_{m}\subset \mathcal L_{m+1}\) for all m we have that P disagrees with \(P^{\prime }\) on \(\mathcal L_{m}\) for all m ≥ M. We have hence found a \(Q\in \mathbb E\) such that Hn(Q) > Hn(P) for all large enough n. Hence, \(P\notin \operatorname {maxent}\mathbb E\). Contradiction. □
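The entropy comparisons in this proof (and the next two) lean on a standard fact: holding marginal distributions fixed, the joint distribution that makes its components independent has maximal entropy, since H(X,Y) ≤ H(X) + H(Y) with equality exactly under independence. A toy check (ours, assuming only numpy):

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a flattened distribution."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

joint = np.array([[0.3, 0.1],
                  [0.2, 0.4]])  # a correlated joint distribution
px, py = joint.sum(axis=1), joint.sum(axis=0)
product = np.outer(px, py)     # same marginals, components independent

assert np.allclose(product.sum(axis=1), px) and np.allclose(product.sum(axis=0), py)
assert entropy(joint.ravel()) < entropy(product.ravel())
```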
Proposition 56 (Π3)
For φ = ∀x∃y∀zSxyz ∈ Π3 it holds that for all \(P\in \mathbb E_{\varphi }\) there exists a probability function \(Q\in \mathbb E_{\varphi }\) which has greater entropy. Hence, \(\operatorname {maxent}\mathbb E_{\varphi }=\emptyset \).
Proof
Let us first note that
Assume for contradiction that \(P\in \operatorname {maxent}\mathbb E_{\varphi }\). Since P=(φ) = 0, P cannot be the equivocator. However, since \(P\in \mathbb E_{\varphi }\), it must also hold that for all ti (\(i\in \mathbb N\)) there has to exist some minimal \(t_{k^{*}_{i}}\) (\(k^{*}_{i}\geq 1\)) such that \(P(\forall zSt_{i}t_{k^{*}_{i}}z)>0\).
We now define a probability function \(Q\in \mathbb E_{\varphi }\) which has greater entropy than P, which contradicts that \(P\in \operatorname {maxent}\mathbb E_{\varphi }\). First, we postpone for all i the witnessing (see Proposition 53) to \(k^{*}_{i}+1\). This is again achieved by first defining a probability function \(P^{\prime }\in \mathbb E_{\varphi }\setminus \{P\}\) such that \(H_{n}(P^{\prime })\geq H_{n}(P)\) for all large enough n:
As we saw in Proposition 53, \(P^{\prime }(\exists y\forall z St_{i}yz)=1\) for all \(i\in \mathbb N\). Furthermore, for all \(i\in \mathbb N\) there exists an \(n_{i}\in \mathbb N\) and \(\epsilon _{k,l}\in \{0,1\}^{n_{i}\times n_{i}}\) such that \(P^{\prime }(\bigwedge _{k=1}^{n_{i}}\bigwedge _{l=1}^{n_{i}} S^{\epsilon _{k,l}}t_{i}t_{k}t_{l})\neq P(\bigwedge _{k=1}^{n_{i}}\bigwedge _{l=1}^{n_{i}} S^{\epsilon _{k,l}}t_{i}t_{k}t_{l})\).
Given the way we wrote \(\mathbb E_{\varphi }\) (see Eq. 12), we see that every extension of \(P^{\prime }\) to a probability function—which so far has not been defined on the entire language—will be in \(\mathbb E_{\varphi }\) since membership in \(\mathbb E_{\varphi }\) solely depends on sub-states where the first constant is fixed to some ti.
We now define \(P^{\prime }\) on an arbitrary n-state ωn of the language, and hence on the entire language, by
Because of the additivity of the entropy function [8, p. 63], we also find for all \(n\in \mathbb N\) that
Since the entropy function is maximised for independent variables we also find:
Now recall that we saw in Proposition 53 that the following inequality holds for all large enough fixed \(i\in \mathbb N\):
So, we have for all large enough \(n\in \mathbb N\) that
We again put \(Q:=\frac {P+P^{\prime }}{2}\) and note that since \(P\neq P^{\prime }\), Q≠P. Since \(P^{\prime }\in \mathbb E_{\varphi }\) we easily find by applying the convexity of \(\mathbb E_{\varphi }\) that \(Q\in \mathbb E_{\varphi }\). Furthermore, Hn(Q) > Hn(P) for all large enough \(n\in \mathbb N\), since Q is a proper convex combination of the distinct functions P and \(P^{\prime }\), \(H_{n}(P^{\prime })\geq H_{n}(P)\) for all \(n\in \mathbb N\), and n-entropy is strictly concave. □
Proposition 57 (Π2m+3)
For \(\varphi =\forall v_{1}\exists w_{1}\dots \forall v_{m}\exists w_{m}\forall x\exists y\forall z Rv_{1}w_{1}{\dots } v_{m}w_{m}xyz \in {\Pi }_{2m+3}\) it holds that for all \(P\in \mathbb E_{\varphi }\) there exists a probability function \(Q\in \mathbb E_{\varphi }\) which has greater entropy than P. Hence, \(\operatorname {maxent}\mathbb E_{\varphi }=\emptyset \).
Proof
The proof proceeds by induction on the quantifier complexity m.
The base case m = 0 is Proposition 56.
The induction step for m ≥ 1 assumes the result for m − 1 ≥ 0. The proof follows the blueprint laid out in the base case.
Let us first note that
Assume for contradiction that \(P\in \operatorname {maxent}\mathbb E_{\varphi }\). Since P=(φ) = 0, P cannot be the equivocator. However, since \(P\in \mathbb E_{\varphi }\), it must also hold that for all ti (\(i\in \mathbb N\)) there has to exist some minimal \(t_{k^{*}_{i}}\) (\(k^{*}_{i}\geq 1\)) such that \(P(\forall v_{2}\exists w_{2}\dots \forall v_{m}\exists w_{m}\forall x\exists y\forall z Rt_{i}t_{k^{*}_{i}}v_{2}w_{2}{\dots } v_{m}w_{m}xyz)>0\). We now postpone this witnessing as usual.
We begin by assigning probabilities to substates fixing ti:
Again, \(P^{\prime }(\exists w_{1}\dots \forall v_{m}\exists w_{m}\forall x\exists y\forall z Rt_{i}w_{1}\dots v_{m}w_{m}xyz)=1\) for all \(i\in \mathbb N\). Furthermore, for all \(i\in \mathbb N\) there exists an \(n_{i}\in \mathbb N\) and \(\vec \epsilon \in \{0,1\}^{n_{i}^{2m+2}}\) such that
In particular, \(P^{\prime }\neq P\).
We now define \(P^{\prime }\) on an arbitrary n-state ωn of the language, and hence on the entire language, by fixing \(\vec \epsilon _{i}\in \{0,1\}^{n^{2m+2}}\) for 1 ≤ i ≤ n and letting
Because of the additivity of the entropy function [8, p. 63], we also find for all \(n\in \mathbb N\) that
We now use the proof of Proposition 55 to obtain that for all i and all large enough n (depending on i),
\({}^{i}H_{n,2m}(P)\) is the n-entropy of a probability function P on a language containing one (2m + 2)-ary relation symbol U, \(\varphi =\exists w_{1}\forall v_{2}\exists w_{2}\dots \exists w_{m+1}\forall v_{m+2} Uw_{1}v_{2}w_{2}{\dots } w_{m+1}v_{m+2}\in {\Pi }_{2m+2}\) and \(P\in \mathbb E_{\varphi }\).
Since n-entropy is maximised by probability functions with as many probabilistic independences as possible, we again have:
which overall gives the inequality:
Taking Q to be any proper convex combination of P and \(P^{\prime }\), we see that Hn(Q) > Hn(P) for all large enough n. This entails that Q has greater entropy than P. □