Abstract
According to the objective Bayesian approach to inductive logic, premisses inductively entail a conclusion just when every probability function with maximal entropy, from all those that satisfy the premisses, satisfies the conclusion. When premisses and conclusion are constraints on probabilities of sentences of a first-order predicate language, however, it is by no means obvious how to determine these maximal entropy functions. This paper makes progress on the problem in the following ways. Firstly, we introduce the concept of a limit in entropy and show that, if the set of probability functions satisfying the premisses contains a limit in entropy, then this limit point is unique and is the maximal entropy probability function. Next, we turn to the special case in which the premisses are categorical sentences of the logical language. We show that if the uniform probability function gives the premisses positive probability, then the maximal entropy function can be found by simply conditionalising this uniform prior on the premisses. We generalise our results to demonstrate agreement between the maximal entropy approach and Jeffrey conditionalisation in the case in which there is a single premiss that specifies the probability of a sentence of the language. We show that, after learning such a premiss, certain inferences are preserved, namely inferences to inductive tautologies. Finally, we consider potential pathologies of the approach: we explore the extent to which the maximal entropy approach is invariant under permutations of the constants of the language, and we discuss some cases in which there is no maximal entropy probability function.
Data Availability
This manuscript does not use data.
References
Balestrino, A., Caiti, A., & Crisostomi, E. (2006). Efficient numerical approximation of maximum entropy estimates. International Journal of Control, 79(9), 1145–1155.
Barnett, O., & Paris, J. B. (2008). Maximum entropy inference with quantified knowledge. Logic Journal of the IGPL, 16(1), 85–98.
Billingsley, P. (1979). Probability and measure (3rd (1995) edn). New York: Wiley.
Carnap, R. (1952). The continuum of inductive methods. Chicago: University of Chicago Press.
Caticha, A., & Giffin, A. (2006). Updating probabilities. In Proceedings of MaxEnt (Vol. 872, pp. 31–42).
Chen, B., Hu, J., & Zhu, Y. (2010). Computing maximum entropy densities: a hybrid approach. Signal Processing: An International Journal, 4(2), 114–122.
Cover, T. M., & Thomas, J. A. (1991). Elements of information theory (2nd (2006) edn). New York: Wiley.
Csiszár, I. (2008). Axiomatic characterizations of information measures. Entropy, 10(3), 261–273.
Gaifman, H. (1964). Concerning measures in first order calculi. Israel Journal of Mathematics, 2(1), 1–18.
Goldman, S. A. (1987). Efficient methods for calculating maximum entropy distributions. Master’s thesis, Electrical Engineering and Computer Science, Massachusetts Institute of Technology.
Goldman, S. A., & Rivest, R. (1988). A non-iterative maximum entropy algorithm. In L. Kanal & J. Lemmer (Eds.) Uncertainty in Artificial Intelligence 2 (pp. 133–148). Amsterdam: North-Holland.
Haenni, R., Romeijn, J. -W., Wheeler, G., & Williamson, J. (2011). Probabilistic logics and probabilistic networks. Synthese Library. Dordrecht: Springer.
Howarth, E., & Paris, J. B. (2019). Pure inductive logic with functions. Journal of Symbolic Logic, 84(4), 1382–1402.
Howson, C. (2014). Finite additivity, another lottery paradox and conditionalisation. Synthese, 191(5), 989–1012.
Jaynes, E. T. (1957). Information theory and statistical mechanics. The Physical Review, 106(4), 620–630.
Jaynes, E. T. (2003). Probability theory: the logic of science. Cambridge: Cambridge University Press.
Landes, J. (2009). The principle of spectrum exchangeability within inductive logic. PhD thesis, Manchester Institute for Mathematical Sciences.
Landes, J. (2021a). A triple uniqueness of the maximum entropy approach. In J. Vejnarová & N. Wilson (Eds.) Proceedings of ECSQARU, volume 12897 of LNAI (pp. 644–656). Cham: Springer.
Landes, J. (2021b). The entropy-limit (conjecture) for Σ2-premisses. Studia Logica, 109(2), 423–442.
Landes, J., & Williamson, J. (2015). Justifying objective Bayesianism on predicate languages. Entropy, 17(4), 2459–2543.
Landes, J., & Williamson, J. (2016). Objective Bayesian nets from consistent datasets. In A. Giffin & K. H. Knuth (Eds.) Proceedings of MaxEnt (Vol. 1757, pp. 020007-1–020007-8). AIP.
Landes, J., & Williamson, J. (2022). Objective Bayesian nets for integrating consistent datasets. Journal of Artificial Intelligence Research, 74, 393–458.
Landes, J., Paris, J. B., & Vencovská, A. (2009). Representation theorems for probability functions satisfying spectrum exchangeability in inductive logic. International Journal of Approximate Reasoning, 51(1), 35–55.
Landes, J., Rafiee Rad, S., & Williamson, J. (2021). Towards the entropy-limit conjecture. Annals of Pure and Applied Logic, 172(2), 102870.
Lehmann, D., & Magidor, M. (1992). What does a conditional knowledge base entail? Artificial Intelligence, 55(1), 1–60.
Ormoneit, D., & White, H. (1999). An efficient algorithm to compute maximum entropy densities. Econometric Reviews, 18(2), 127–140.
Paris, J. B. (1994). The uncertain reasoner’s companion. Cambridge: Cambridge University Press.
Paris, J. B. (1998). Common sense and maximum entropy. Synthese, 117(1), 75–93.
Paris, J. B., & Rafiee Rad, S. (2010). A note on the least informative model of a theory. In F. Ferreira, B. Löwe, E. Mayordomo, L. Mendes Gomes, J. B. Paris, & S. Rafiee Rad (Eds.) Proceedings of CiE (pp. 342–351). Berlin: Springer.
Paris, J. B., & Vencovská, A. (1990). A note on the inevitability of maximum entropy. International Journal of Approximate Reasoning, 4(3), 183–223.
Paris, J. B., & Vencovská, A. (1997). In defense of the maximum entropy inference process. International Journal of Approximate Reasoning, 17(1), 77–103.
Paris, J., & Vencovská, A. (2015). Pure inductive logic. Cambridge: Cambridge University Press.
Paris, J. B., & Vencovská, A. (2019). Six problems in pure inductive logic. Journal of Philosophical Logic, 48(4), 731–747.
Pearl, J. (1988). Probabilistic reasoning in intelligent systems: networks of plausible inference. San Mateo: Morgan Kaufmann.
Rafiee Rad, S. (2009). Inference processes for probabilistic first order languages. PhD thesis, Manchester Institute for Mathematical Sciences.
Rafiee Rad, S. (2017). Equivocation axiom on first order languages. Studia Logica, 105(1), 121–152.
Rafiee Rad, S. (2018). Maximum entropy models for Σ1 sentences. Journal of Applied Logics - IfCoLoG Journal of Logics and their Applications, 5(1), 287–300.
Rafiee Rad, S. (2021). On probabilistic characterisation of models of first order theories. Annals of Pure and Applied Logic, 172(1).
Seidenfeld, T. (1986). Entropy and uncertainty. Philosophy of Science, 53(4), 467–491.
Shannon, C. (1948). A mathematical theory of communication. The Bell System Technical Journal, 27, 379–423.
Williams, P. M. (1980). Bayesian conditionalisation and the principle of minimum information. British Journal for the Philosophy of Science, 31(2), 131–144.
Williamson, J. (2008). Objective Bayesian probabilistic logic. Journal of Algorithms, 63(4), 167–183.
Williamson, J. (2010). In defence of objective Bayesianism. Oxford: Oxford University Press.
Williamson, J. (2017). Lectures on inductive logic. Oxford: Oxford University Press.
Acknowledgments
We are grateful to Jeff Paris and Alena Vencovská for very helpful advice.
Funding
Open access funding provided by Università degli Studi di Milano within the CRUI-CARE Agreement. Jürgen Landes is funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation), grants 405961989 and 432308570, and by NextGenerationEU funding for the project “Practical Reasoning for Human-Centred Artificial Intelligence”. Soroush Rafiee Rad’s work is supported by the Dutch Institute for Emergent Phenomena (DIEP) cluster at the University of Amsterdam and partly by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation), grant 432308570. Jon Williamson is funded by the Leverhulme Trust (grant RPG-2019-059) and the Deutsche Forschungsgemeinschaft (DFG, grant 432308570).
Author information
Contributions
All authors significantly contributed to this manuscript. All authors have read and approved the uploaded version of the manuscript.
Ethics declarations
Conflict of Interest
No author reports a conflict of interest.
Appendices
Appendix 1: Proofs of Proposition 15 and Theorem 16
First let us recall some basic information-theoretic facts.
The n-divergence of two probability functions P and Q is defined as the Kullback-Leibler divergence of P from Q on \({\mathcal L}_{n}\):
$$d_{n}(P,Q) \stackrel{\text{df}}{=} {\sum}_{\omega\in{\Omega}_{n}} P(\omega)\log\frac{P(\omega)}{Q(\omega)}.$$
A Pythagorean theorem holds for the n-divergence \(d_{n}\) [7, Theorem 11.6.1]: for any convex \(\mathbb F\subseteq\mathbb P\), if \(P\in\mathbb F\) and \(Q\notin\mathbb F\),
$$d_{n}(P,Q) \geq d_{n}(P,R_{n}) + d_{n}(R_{n},Q),$$
where \(R_{n}\in\arg\inf_{S\in\mathbb F} d_{n}(S,Q)\).
Consequently, for any P ∈ E and Qn ∈ ℍn [24, Corollary 32]:
$$d_{n}(P,Q_{n}) \leq H_{n}(Q_{n}) - H_{n}(P).$$
Pinsker’s inequality connects the L1 distance to n-divergence (see, e.g., [7, Lemma 11.6.1]):
$$d_{n}(P,Q) \geq \frac{1}{2\ln 2}\,\|P-Q\|_{n}^{2},$$
where \(\|P-Q\|_{n} \stackrel{\text{df}}{=} {\sum}_{\omega\in{\Omega}_{n}} |P(\omega)-Q(\omega)|\).
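These facts are easy to check numerically. The following Python sketch (ours, purely illustrative, assuming nothing beyond numpy) computes entropy and divergence on a toy four-state space, with the equivocator in the role of the entropy maximiser, so that the divergence-entropy bound holds with equality:

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits; zero-probability states contribute nothing."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def divergence(p, q):
    """Kullback-Leibler divergence of p from q, in bits (q > 0 wherever p > 0)."""
    mask = p > 0
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

p = np.array([0.4, 0.3, 0.2, 0.1])      # an arbitrary probability function on four states
q = np.array([0.25, 0.25, 0.25, 0.25])  # the equivocator on those states

# Pinsker's inequality: d(p, q) >= ||p - q||_1^2 / (2 ln 2)
l1 = np.sum(np.abs(p - q))
assert divergence(p, q) >= l1**2 / (2 * np.log(2))

# Against the uniform maximiser, the divergence equals the entropy gap:
# d(p, q) = log2(4) - H(p) = H(q) - H(p)
assert np.isclose(divergence(p, q), entropy(q) - entropy(p))
```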
Proposition 15
If P is a limit in entropy of \(\mathbb E\) then there are Qn ∈ ℍn such that ∥Qn − P∥n→0 as n→∞.
Proof
Putting our last two information-theoretic facts together we have that
$$\frac{1}{2\ln 2}\,\|P-Q_{n}\|_{n}^{2} \leq d_{n}(P,Q_{n}) \leq H_{n}(Q_{n}) - H_{n}(P)$$
for Qn ∈ ℍn and P ∈ E.
Now, if P is a limit in entropy of \(\mathbb E\) then there are Qn ∈ ℍn such that |Hn(Qn) − Hn(P)|→0 as n→∞. Hence \(\|P-Q_{n}\|_{n}^{2}\) also converges to zero, as required.□
Theorem 16
If \(\mathbb E\) contains a limit in entropy P then
$$\operatorname{maxent}\mathbb E = \{P\}.$$
Proof
First we shall show that P ∈ maxent E; later we shall see that there is no other member of maxent E.
First, then, assume for contradiction that P ∉ maxent E. Then there is some Q ∈ E such that Q has greater entropy than P. That is, for sufficiently large n, Hn(Qn) ≥ Hn(Q) > Hn(P), where the Qn ∈ ℍn converge in entropy (and, by Proposition 15, in L1) to P. N.b., Q ≠ P. Hence, for sufficiently large n,
$$\frac{1}{2\ln 2}\,\|Q-Q_{n}\|_{n}^{2} \leq d_{n}(Q,Q_{n}) \leq H_{n}(Q_{n}) - H_{n}(Q) < H_{n}(Q_{n}) - H_{n}(P).$$
Since the Qn converge in entropy to P, they converge in L1 to Q. By the uniqueness of L1 limit points, Q = P: a contradiction. Hence P ∈ maxent E, as required.
Next we shall see that P is the unique member of maxent E. Suppose for contradiction that there is some P‡ ∈ maxent E such that P‡ ≠ P. Then P cannot eventually dominate P‡ in n-entropy; that is, there is some infinite set J ⊆ ℕ such that for n ∈ J,
$$H_{n}(P^{\ddagger}) \geq H_{n}(P).$$
Let \(R \stackrel{\text{df}}{=} \lambda P^{\ddagger} + (1-\lambda)P\) for some λ ∈ (0, 1). Now by the log-sum inequality [7, Theorem 2.7.1], for all n ∈ J large enough that P‡(ωn)≠P(ωn) for some ωn ∈ Ωn,
$$H_{n}(R) > \lambda H_{n}(P^{\ddagger}) + (1-\lambda)H_{n}(P).$$
Hence,
$$H_{n}(R) > H_{n}(P)$$
for large enough n ∈ J.
Now by Pinsker’s inequality and the definition of R,
$$d_{n}(R,Q_{n}) \geq \frac{1}{2\ln 2}\,\|R-Q_{n}\|_{n}^{2} = \frac{1}{2\ln 2}\Big({\sum}_{\omega_{n}\in{\Omega}_{n}}\big|P(\omega_{n})-Q_{n}(\omega_{n})+\lambda(P^{\ddagger}(\omega_{n})-P(\omega_{n}))\big|\Big)^{2}.$$
Let \(f_{n}(\varphi) \stackrel{\text{df}}{=} P(\varphi) - Q_{n}(\varphi) + \lambda(P^{\ddagger}(\varphi) - P(\varphi))\) and \(\rho _{n} \stackrel {\text {df}}{=} \bigvee _{f_{n}(\omega _{n})>0} \omega _{n}\). Then,
$${\sum}_{\omega_{n}\in{\Omega}_{n}} |f_{n}(\omega_{n})| = f_{n}(\rho_{n}) - f_{n}(\neg\rho_{n}) = 2f_{n}(\rho_{n}),$$
after substituting P(¬ρn) = 1 − P(ρn) etc.
Let us consider the behaviour of
$$f_{n}(\rho_{n}) = P(\rho_{n}) - Q_{n}(\rho_{n}) + \lambda\big(P^{\ddagger}(\rho_{n}) - P(\rho_{n})\big)$$
as n→∞. Now, P(ρn) − Qn(ρn)→0 as n→∞, because Qn converges in L1 to P. However, λ(P‡(ρn) − P(ρn))↛0 as n→∞, as we shall now see. P‡ ≠ P by assumption, so they must differ on some quantifier-free sentence ψ, a sentence of \({\mathcal L}_{m}\), say. Suppose without loss of generality that P‡(ψ) > P(ψ) (otherwise take ¬ψ instead) and let δ = P‡(ψ) − P(ψ) > 0. Now for n ≥ m,
$$f_{n}(\rho_{n}) \geq f_{n}(\psi) = P(\psi) - Q_{n}(\psi) + \lambda\delta.$$
Since Qn converges in L1 to P we can consider n > m large enough that [7, Equation 11.137]:
$$\max_{\varphi\in S{\mathcal L}_{n}} (Q_{n}(\varphi) - P(\varphi)) = \frac{1}{2}\,\|Q_{n}-P\|_{n} < \frac{\lambda\delta}{2}.$$
In particular, since ψ is quantifier-free, \( Q_{n}(\psi )-P (\psi ) \leq \max \limits _{\varphi \in S{\mathcal L}_{n}} (Q_{n}(\varphi ) - P (\varphi )) < \lambda \delta /2\). For any such n,
$$f_{n}(\rho_{n}) \geq P(\psi) - Q_{n}(\psi) + \lambda\delta > -\frac{\lambda\delta}{2} + \lambda\delta = \frac{\lambda\delta}{2}.$$
Putting the above parts together, we have that for sufficiently large n ∈ J,
$$H_{n}(Q_{n}) - H_{n}(P) > H_{n}(Q_{n}) - H_{n}(R) \geq d_{n}(R,Q_{n}) \geq \frac{\big(2f_{n}(\rho_{n})\big)^{2}}{2\ln 2} > \frac{(\lambda\delta)^{2}}{2\ln 2} > 0.$$
However, that these Hn(Qn) − Hn(P) are bounded away from zero contradicts the assumption that the Qn converge in entropy to P. Hence, P is the unique member of maxent E, as required.□
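The crucial step in both halves of this proof, that a proper mixture of two distinct probability functions has strictly greater n-entropy than the corresponding mixture of their n-entropies, is just the strict concavity of entropy, which the log-sum inequality delivers. A minimal numerical sketch (ours, not part of the proof; the three-state space is an arbitrary toy assumption):

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

p     = np.array([0.7, 0.2, 0.1])  # in the role of P
p_dag = np.array([0.1, 0.6, 0.3])  # in the role of P-double-dagger
lam   = 0.5
r = lam * p_dag + (1 - lam) * p    # the mixture R from the proof

# Strict concavity: the mixture's entropy strictly exceeds the mixed entropies
assert entropy(r) > lam * entropy(p_dag) + (1 - lam) * entropy(p)
```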
Appendix 2: Alternative Proof of Corollary 20
This appendix provides a more direct proof of Corollary 20, which identifies an important scenario in which the equivocator function conditioned on a categorical constraint is the maximal entropy function.
Corollary 20
If ℍn contains P=(⋅|φ) for sufficiently large n then
$$\operatorname{maxent}\mathbb E_{\varphi} = \{P_{=}(\cdot|\varphi)\}.$$
Proof
There are two cases: either P=(φ) = 1 or P=(φ) < 1.
If P=(φ) = 1 then P= ∈ Eφ and P=(⋅|φ) = P=(⋅). P= is the unique member of maxent Eφ because the equivocator function has greater entropy than any other probability function, so maxent Eφ = {P=(⋅|φ)}, as required.
If P=(φ) < 1 then we can proceed as follows.
Since P=(φ) > 0, P=(⋅|φ) is well defined. P=(φ|φ) = 1 so P=(⋅|φ) ∈ E. Thus Eφ≠∅.
Suppose for contradiction that maxent Eφ≠{P=(⋅|φ)}. Then in Eφ there must be some P‡≠P=(⋅|φ) that is not eventually dominated in entropy by P=(⋅|φ). That is, there is some infinite J ⊆ ℕ such that Hn(P‡) ≥ Hn(P=(⋅|φ)) for all n ∈ J. (To see this consider that there are three cases: (i) if maxent Eφ = ∅ then every member of Eφ is eventually dominated by some other in entropy, so P=(⋅|φ) is dominated by some P‡ and P‡ is not dominated by P=(⋅|φ); (ii) if P=(⋅|φ)∉ maxent Eφ = {P‡,…} then P‡ is not dominated by P=(⋅|φ); (iii) if maxent Eφ = {P=(⋅|φ),P‡,…} then P‡ is not dominated by P=(⋅|φ).)
Define a probability function \(Q \stackrel{\text{df}}{=} \lambda P^{\ddagger} + (1-\lambda)P_{=}(\cdot|\varphi)\) for some λ ∈ (0, 1). By the log-sum inequality [7, Theorem 2.7.1], for all n ∈ J large enough that P‡(ω)≠P=(ω|φ) for some ω ∈ Ωn,
$$H_{n}(Q) > \lambda H_{n}(P^{\ddagger}) + (1-\lambda)H_{n}(P_{=}(\cdot|\varphi)) \geq H_{n}(P_{=}(\cdot|\varphi)).$$
However, that Hn(Q) > Hn(P=(⋅|φ)) for sufficiently large n ∈ J contradicts the assumption that ℍn contains P=(⋅|φ) for sufficiently large n. Hence maxent Eφ = {P=(⋅|φ)}, as required.□
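To see the corollary in miniature, here is a small illustrative computation (ours; the one-predicate language and the premiss φ are toy assumptions). Conditionalising the equivocator on a categorical premiss yields the uniform function on the satisfying state descriptions, and no other member of Eφ has greater entropy:

```python
import itertools
import numpy as np

def entropy(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * np.log2(p) for p in probs if p > 0)

# State descriptions for one unary predicate U and two constants t1, t2;
# the premiss phi = "Ut1 or Ut2" rules out exactly one of the four states.
states = list(itertools.product([0, 1], repeat=2))  # (Ut1, Ut2) truth values
sat = [s for s in states if s[0] or s[1]]           # states satisfying phi

equivocator = {s: 0.25 for s in states}
z = sum(equivocator[s] for s in sat)                # P=(phi) = 3/4 > 0
conditioned = [equivocator[s] / z for s in sat]     # P=(.|phi): 1/3 each

# Random members of E_phi (functions giving phi probability 1) never beat it:
rng = np.random.default_rng(0)
for _ in range(1000):
    q = rng.dirichlet(np.ones(len(sat)))
    assert entropy(q) <= entropy(conditioned) + 1e-12
```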
Appendix 3: Zero Measure Premisses of Higher Quantifier Complexity
Proposition 55 (Σ2m)
For \(\varphi =\exists x_{2m}\forall x_{2m-1}{\dots } \forall x_{1} Ux_{2m}x_{2m-1}{\ldots } x_{1}\in {\Sigma }_{2m}\) it holds that for all \(P\in \mathbb E_{\varphi }\) there exists a probability function \(Q\in \mathbb E_{\varphi }\) which has greater entropy. Hence, \(\operatorname {maxent}\mathbb E_{\varphi }=\emptyset \).
Proof
For ease of notation we will write \(U t_{i} \vec {t}\) for \(U t_{i}t_{k_{2m-1}}{\ldots } t_{k_{1}}\) and \(\bigwedge _{t=1}^{n} Ut_{i}\vec {t}\) for \(\bigwedge _{k_{2m-1}=1}^{n} \ldots \bigwedge _{k_{1}=1}^{n} Ut_{i}t_{k_{2m-1}}{\dots } t_{k_{1}}\).
Suppose for contradiction that \(\operatorname {maxent}\mathbb E\neq \emptyset \) and let \(P\in \operatorname {maxent}\mathbb E\). Note that P=(φ) = 0 < 1 = P(φ). Hence, P≠P=.
Let us now define a probability function \(P^{\prime }\in \mathbb E\) by shifting all witnesses of \(\exists x_{2m} \forall x_{2m-1} \exists x_{2m-2}.... \forall x_{1} U\vec {x}\) by one and then adding a constant t1 such that \(Ut_{1}\vec {t}\) is independent of all other literals for all \(\vec {t}\). Intuitively, the literals \(\pm Ut_{i}\vec {t}\) are replaced by \(\pm Ut_{i+1}\vec {t}\).
Formally, let \(\omega _{n} = \bigwedge _{i, \vec t=1}^{n} U^{\epsilon _{i,\vec t}}t_{i}\vec {t} \in {\Omega }_{n}\) be an arbitrary n-state. Then define \(P^{\prime }\) by
Firstly, we note that
So, according to \(P^{\prime }\) the constant t1 is not a witness of the existential premiss sentence φ.
We next show that \(P\neq P^{\prime }\). Firstly, note that
and thus there is a smallest \(i \in \mathbb {N}\) for which \(P(\forall x_{2m-1} \exists x_{2m-2}.... \forall x_{1} Ut_{i}\vec {x})>0\). With this and Eq. 11, we have
So, \(P\neq P^{\prime }\).
We also observe that for all i ≥ 2,
and furthermore,
for all finite index sets I. So,
This means that \(P^{\prime }(\exists x\forall y Uxy)=1\) and thus, as advertised, \(P^{\prime }\in \mathbb E\).
We now calculate n-entropies of P and \(P^{\prime }\) and find for n ≥ 1 that:
Holding the first summation fixed, we note that, since n-entropy is maximised by maximally equivocating, \(H_{n}(P)\leq H_{n}(P^{\prime })\). Now define \(Q:=\frac {P+P^{\prime }}{2}\). Since \(\mathbb E\) is convex and \(P,P^{\prime }\in \mathbb E\), we observe that \(Q\in \mathbb E\).
Since n-entropy is a strictly concave function we conclude that Hn(Q) > Hn(P) whenever P and \(P^{\prime }\) disagree on \(\mathcal L_{n}\). Since \(P\neq P^{\prime }\) there has to exist some finite M and quantifier-free sentence \(\psi \in QFS\mathcal L_{M}\) such that \(P(\psi )\neq P^{\prime }(\psi )\) (Gaifman’s Theorem). Since \(\mathcal L_{m}\subset \mathcal L_{m+1}\) for all m we have that P disagrees with \(P^{\prime }\) on \(\mathcal L_{m}\) for all m ≥ M. We have hence found a \(Q\in \mathbb E\) such that Hn(Q) > Hn(P) for all large enough n. Hence, \(P\notin \operatorname {maxent}\mathbb E\). Contradiction. □
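The entropy comparisons in this proof (and the next two) lean on a standard fact: holding marginal distributions fixed, the joint distribution that makes its components independent has maximal entropy, since H(X,Y) ≤ H(X) + H(Y) with equality exactly under independence. A toy check (ours, assuming only numpy):

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a flattened distribution."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

joint = np.array([[0.3, 0.1],
                  [0.2, 0.4]])  # a correlated joint distribution
px, py = joint.sum(axis=1), joint.sum(axis=0)
product = np.outer(px, py)     # same marginals, components independent

assert np.allclose(product.sum(axis=1), px) and np.allclose(product.sum(axis=0), py)
assert entropy(joint.ravel()) < entropy(product.ravel())
```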
Proposition 56 (Π3)
For φ = ∀x∃y∀zSxyz ∈ Π3 it holds that for all \(P\in \mathbb E_{\varphi }\) there exists a probability function \(Q\in \mathbb E_{\varphi }\) which has greater entropy. Hence, \(\operatorname {maxent}\mathbb E_{\varphi }=\emptyset \).
Proof
Let us first note that
Assume for contradiction that \(P\in \operatorname {maxent}\mathbb E_{\varphi }\). Since P=(φ) = 0, P cannot be the equivocator. However, since \(P\in \mathbb E_{\varphi }\), it must also hold that for all ti (\(i\in \mathbb N\)) there has to exist some minimal \(t_{k^{*}_{i}}\) (\(k^{*}_{i}\geq 1\)) such that \(P(\forall zSt_{i}t_{k^{*}_{i}}z)>0\).
We now define a probability function \(Q\in \mathbb E_{\varphi }\) which has greater entropy than P, which contradicts that \(P\in \operatorname {maxent}\mathbb E_{\varphi }\). First, we postpone for all i the witnessing (see Proposition 53) to \(k^{*}_{i}+1\). This is again achieved by first defining a probability function \(P^{\prime }\in \mathbb E_{\varphi }\setminus \{P\}\) such that \(H_{n}(P^{\prime })\geq H_{n}(P)\) for all large enough n:
As we saw in Proposition 53, \(P^{\prime }(\exists y\forall z St_{i}yz)=1\) for all \(i\in \mathbb N\). Furthermore, for all \(i\in \mathbb N\) there exists an \(n_{i}\in \mathbb N\) and \(\epsilon _{k,l}\in \{0,1\}^{n_{i}\times n_{i}}\) such that \(P^{\prime }(\bigwedge _{k=1}^{n_{i}}\bigwedge _{l=1}^{n_{i}} S^{\epsilon _{k,l}}t_{i}t_{k}t_{l})\neq P(\bigwedge _{k=1}^{n_{i}}\bigwedge _{l=1}^{n_{i}} S^{\epsilon _{k,l}}t_{i}t_{k}t_{l})\).
Given the way we wrote \(\mathbb E_{\varphi }\) (see Eq. 12), we see that every extension of \(P^{\prime }\) to a probability function—which so far has not been defined on the entire language—will be in \(\mathbb E_{\varphi }\) since membership in \(\mathbb E_{\varphi }\) solely depends on sub-states where the first constant is fixed to some ti.
We now define \(P^{\prime }\) on an arbitrary n-state ωn of the language, and hence on the entire language, by
Because of the additivity of the entropy function [8, p. 63], we also find for all \(n\in \mathbb N\) that
Since the entropy function is maximised for independent variables we also find:
Now recall that we saw in Proposition 53 that the following inequality holds for all large enough fixed \(i\in \mathbb N\):
So, we have for all large enough \(n\in \mathbb N\) that
We again put \(Q:=\frac {P+P^{\prime }}{2}\) and note that since \(P\neq P^{\prime }\), Q≠P. Since \(P^{\prime }\in \mathbb E_{\varphi }\) we easily find by applying the convexity of \(\mathbb E_{\varphi }\) that \(Q\in \mathbb E_{\varphi }\). Furthermore, Hn(Q) > Hn(P) for all large enough \(n\in \mathbb N\), since Q is a proper convex combination of the distinct functions P and \(P^{\prime }\), \(H_{n}(P^{\prime })\geq H_{n}(P)\) for all \(n\in \mathbb N\), and n-entropy is strictly concave. □
Proposition 57 (Π2m+3)
For \(\varphi =\forall v_{1}\exists w_{1}\dots \forall v_{m}\exists w_{m}\forall x\exists y\forall z Rv_{1}w_{1}{\dots } v_{m}w_{m}xyz \in {\Pi }_{2m+3}\) it holds that for all \(P\in \mathbb E_{\varphi }\) there exists a probability function \(Q\in \mathbb E_{\varphi }\) which has greater entropy than P. Hence, \(\operatorname {maxent}\mathbb E_{\varphi }=\emptyset \).
Proof
The proof proceeds by induction on the quantifier complexity m.
The base case m = 0 is Proposition 56.
The induction step for m ≥ 1 assumes the result for m − 1 ≥ 0. The proof follows the blueprint laid out in the base case.
Let us first note that
Assume for contradiction that \(P\in \operatorname {maxent}\mathbb E_{\varphi }\). Since P=(φ) = 0, P cannot be the equivocator. However, since \(P\in \mathbb E_{\varphi }\), it must also hold that for all ti (\(i\in \mathbb N\)) there has to exist some minimal \(t_{k^{*}_{i}}\) (\(k^{*}_{i}\geq 1\)) such that \(P(\forall v_{2}\exists w_{2}\dots \forall v_{m}\exists w_{m}\forall x\exists y\forall z Rt_{i}t_{k^{*}_{i}}v_{2}w_{2}{\dots } v_{m}w_{m}xyz)>0\). We now postpone this witnessing as usual.
We begin by assigning probabilities to substates fixing ti:
Again, \(P^{\prime }(\exists w_{1}\dots \forall v_{m}\exists w_{m}\forall x\exists y\forall z Rt_{i}w_{1}\dots v_{m}w_{m}xyz)=1\) for all \(i\in \mathbb N\). Furthermore, for all \(i\in \mathbb N\) there exists an \(n_{i}\in \mathbb N\) and \(\vec \epsilon \in \{0,1\}^{n_{i}^{2m+2}}\) such that
In particular, \(P^{\prime }\neq P\).
We now define \(P^{\prime }\) on an arbitrary n-state ωn of the language, and hence on the entire language, by fixing \(\vec \epsilon _{i}\in \{0,1\}^{n^{2m+2}}\) for 1 ≤ i ≤ n and letting
Because of the additivity of the entropy function [8, p. 63], we also find for all \(n\in \mathbb N\) that
We now use the proof of Proposition 55 to obtain that for all i and all large enough n (depending on i),
\({}^{i}H_{n,2m}(P)\) is the n-entropy of a probability function P on a language containing one (2m + 2)-ary relation symbol U, \(\varphi =\exists w_{1}\forall v_{2}\exists w_{2}\dots \exists w_{m+1}\forall v_{m+2} Uw_{1}v_{2}w_{2}{\dots } w_{m+1}v_{m+2}\in {\Pi }_{2m+2}\) and \(P\in \mathbb E_{\varphi }\).
Since n-entropy is maximised by probability functions with as many probabilistic independences as possible, we again have:
which overall gives the inequality:
Taking Q to be any proper convex combination of P and \(P^{\prime }\), we see that Hn(Q) > Hn(P) for all large enough n. This entails that Q has greater entropy than P. □