Abstract
We study the extent to which divisors of a typical integer n are concentrated. In particular, defining \(\Delta (n) := \max _t \# \{d | n, \log d \in [t,t+1]\}\), we show that \(\Delta (n) \geqslant (\log \log n)^{0.35332277\ldots }\) for almost all n, a bound we believe to be sharp. This disproves a conjecture of Maier and Tenenbaum. We also prove analogs for the concentration of divisors of a random permutation and of a random polynomial over a finite field. Most of the paper is devoted to a study of the following much more combinatorial problem of independent interest. Pick a random set \({\textbf{A}} \subset {\mathbb {N}}\) by selecting i to lie in \({\textbf{A}}\) with probability 1/i. What is the supremum of all exponents \(\beta _k\) such that, almost surely as \(D \rightarrow \infty \), some integer is the sum of elements of \({\textbf{A}} \cap [D^{\beta _k}, D]\) in k different ways? We characterise \(\beta _k\) as the solution to a certain optimisation problem over measures on the discrete cube \(\{0,1\}^k\), and obtain lower bounds for \(\beta _k\) which we believe to be asymptotically sharp.
Part I. Main results and overview of the paper
1 Introduction
1.1 The concentration of divisors
Given an integer n, we define the Delta function
$$\begin{aligned} \Delta (n) := \max _t \# \{d \mid n,\ \log d \in [t,t+1]\}, \end{aligned}$$
that is to say, the maximum number of divisors n has in any interval of logarithmic length 1. Its normal order (almost sure behaviour) has proven quite mysterious, and indeed it was a celebrated achievement of Maier and Tenenbaum [20], answering a question of Erdős from 1948 [9], to show that \(\Delta (n) > 1\) for almost all n.
Work on the distribution of \(\Delta \) began in the 1970s with Erdős and Nicolas [7, 8]. However, it was not until the work of Hooley [16] that the Delta function received proper attention. Among other things, Hooley showed how bounds on the average size of \(\Delta \) can be used to count points on certain algebraic varieties. Further work on the normal and average behaviour of \(\Delta \) can be found in the papers of Tenenbaum [23, 24], Hall and Tenenbaum [12,13,14], and of Maier and Tenenbaum [20,21,22]. See also [15, Ch. 5,6,7]. Finally, Tenenbaum’s survey paper [26, p. 652–658] includes a history of the Delta function and a description of many applications in number theory.
The best bounds for \(\Delta (n)\) for “normal” n currently known were obtained in a more recent paper of Maier and Tenenbaum [22].
Theorem MT (Maier–Tenenbaum [22]) Let \(\varepsilon >0\) be fixed. Then
for almost all n, where
It is conjectured in [22] that the lower bound is optimal.
One of the main results of this paper is a disproof of this conjecture.
Theorem 1
Let \(\varepsilon >0\) be fixed. Then
$$\begin{aligned} \Delta (n) \geqslant (\log \log n)^{\eta - \varepsilon } \end{aligned}$$
for almost all n, where \(\eta = 0.35332277270132346711\ldots \).
The constant \(\eta \), which we believe to be sharp, is described in relation (1.3) below, just after the statement of Theorem 2.
1.2 Packing divisors
Let us briefly attempt to explain, without details, why it was natural for Maier and Tenenbaum to make their conjecture, and what it is that allows us to find even more tightly packed divisors.
We start with a simple observation. Let n be an integer, and suppose we can find pairs of divisors \(d_i, d'_i\) of n, \(i = 1,\ldots , k\), such that
- \(1 < d_i/d'_i \leqslant 2^{1/k}\);
- The sets of primes dividing \(d_id'_i\) are disjoint, as i varies in \(\{1,\ldots , k\}\).
Then we can find \(2^k\) different divisors of n in a dyadic interval, namely all products \(a_1\ldots a_k\) where \(a_i\) is either \(d_i\) or \(d'_i\).
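This observation is elementary enough to check mechanically. The following sketch uses hypothetical pairs of divisors of \(n = 5\cdot 7\cdot 9\cdot 11 = 3465\), with disjoint prime supports \(\{5,7\}\) and \(\{3,11\}\), and verifies that all \(2^k\) products land in a single dyadic interval:

```python
from itertools import product
from math import prod

# Hypothetical data: pairs (d_i', d_i) of divisors of n = 3465 with
# 1 < d_i/d_i' <= 2^(1/k) and pairwise disjoint prime supports.
n = 3465
pairs = [(5, 7), (9, 11)]
k = len(pairs)
assert all(1 < d / dp <= 2 ** (1 / k) for dp, d in pairs)

# The 2^k products a_1 * a_2, with a_i in {d_i', d_i}, are all divisors
# of n and all lie within a multiplicative factor of 2 of each other.
divs = sorted(prod(choice) for choice in product(*pairs))
assert all(n % d == 0 for d in divs)
assert max(divs) < 2 * min(divs)
```

Here the four products are 45, 55, 63 and 77, all divisors of 3465 lying in the dyadic interval [45, 90).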
In [22], Maier and Tenenbaum showed how to find many such pairs of divisors \(d_i, d'_i\). To begin with, they look only at the large prime factors of n. They first find one pair \(d_1, d'_1\) using the technique of [20]. Then, using a modification of the argument, they locate a further pair \(d_2\) and \(d'_2\), but with these divisors not having any primes in common with \(d_1, d'_1\). They continue in this fashion to find \(d_3, d'_3\), \(d_4,d_4'\), etc., until essentially all the large prime divisors of n have been used. After this, they move on to a smaller range of prime factors of n, and so on.
By contrast, we eschew an iterative approach and select \(2^k\) close divisors from amongst the large prime divisors of n in one go, in a manner that is combinatorially quite different to that of Maier and Tenenbaum. We then apply a similar technique to a smaller range of prime factors of n, and so on. This turns out to be a more efficient way of locating proximal divisors.
In fact, we provide a general framework that encapsulates all possible combinatorial constructions one might use to pack many divisors close to each other. To work in this generality it is necessary to use a probabilistic formalism. One effect of this is that, even though our work contains that of Maier and Tenenbaum as a special case, the arguments here will look totally different.
1.3 Random sets and equal sums
For most of the paper we do not talk about integers and divisors, but rather about the following model setting. Throughout the paper, \({\textbf{A}}\) will denote a random set of positive integers in which i is included in \({\textbf{A}}\) with probability 1/i, these choices being independent for distinct values of i. We refer to \({\textbf{A}}\) as a logarithmic random set.
A large proportion of our paper will be devoted to understanding conditions under which there is an integer which can be represented as a sum of elements of \({\textbf{A}}\) in (at least) k different ways. In particular, we wish to obtain bounds on the quantities \(\beta _k\) defined in the following problem.
Problem 1
Let \(k \geqslant 2\) be an integer. Determine \(\beta _k\), the supremum of all exponents \(c < 1\) for which the following is true: with probability tending to 1 as \(D \rightarrow \infty \), there are distinct sets \(A_1, \ldots , A_k \subset {\textbf{A}}\cap [D^c, D]\) with equal sums, i.e., \(\sum _{a \in A_1} a = \cdots = \sum _{a \in A_k} a\).
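Problem 1 can be made concrete, if only at toy scale, by direct simulation. The sketch below (with illustrative parameters D, c and k = 2, far below any asymptotic regime; all names are ours) draws a logarithmic random set and searches exhaustively for equal subset sums:

```python
import random
from collections import Counter

def logarithmic_set(lo, hi, rng):
    # include each integer i in [lo, hi] independently with probability 1/i
    return [i for i in range(lo, hi + 1) if rng.random() < 1 / i]

def has_equal_subset_sums(A, k=2):
    # brute force: are there k distinct subsets of A with the same sum?
    sums = Counter()
    for mask in range(1 << len(A)):
        s = sum(a for j, a in enumerate(A) if mask >> j & 1)
        sums[s] += 1
        if sums[s] >= k:
            return True
    return False

rng = random.Random(0)
D, c, trials = 1000, 0.3, 100
hits = 0
for _ in range(trials):
    A = logarithmic_set(int(D ** c), D, rng)
    # cap |A| to keep the enumeration cheap; oversized draws are rare
    # and are simply counted as misses
    if len(A) <= 16 and has_equal_subset_sums(A):
        hits += 1
# hits / trials is a crude finite-size proxy for the probability in Problem 1
```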
The motivation for the random set \({\textbf{A}}\) comes from our knowledge of the anatomy of integers, permutations and polynomials. For a random integer \(m\leqslant x\), with x large, let \(U_k\) be the event that m has a prime factor in the interval \((e^{k},e^{k+1}]\). For a random permutation \(\sigma \in S_n\), let \(V_k\) be the event that \(\sigma \) has a cycle of size k, and for a random monic polynomial f of degree n over \({\mathbb {F}}_q\), with n large, let \(W_k\) be the event that f has an irreducible factor of degree k. Then it is known (see e.g., [2, 3, 15]) that \(U_k\), \(V_k\) and \(W_k\) each occur with probability close to 1/k, and also that the \(U_k\) are close to independent for \(k=o(\log x)\), the \(V_k\) are close to independent for \(k=o(n)\), and the \(W_k\) are close to independent for k large and \(k=o(n)\). Thus, the model set \({\textbf{A}}\) captures the factorization structure of random integers, random permutations and random polynomials over a finite field. It is then relatively straightforward to transfer results about subset sums of \({\textbf{A}}\) to divisors of integers, permutations and polynomials. Section 2 below contains details of the transference principle.
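The heuristic \({\mathbb {P}}(U_k) \approx 1/k\) is easy to probe numerically. The following Monte Carlo sketch (illustrative parameters \(x = 10^6\), k = 3; all names are ours) factors uniform random integers by trial division and records how often a prime factor falls in \((e^k, e^{k+1}]\). For such small k the true probability is somewhat below 1/k, but of the same order:

```python
import random
from math import exp

def has_prime_factor_in(m, lo, hi):
    # exact check by trial division: does m have a prime factor p, lo < p <= hi?
    d = 2
    while d * d <= m:
        if m % d == 0:
            while m % d == 0:
                m //= d
            if lo < d <= hi:
                return True
        d += 1
    return lo < m <= hi  # any leftover m > 1 is prime

rng = random.Random(1)
x, k, trials = 10 ** 6, 3, 4000
lo, hi = exp(k), exp(k + 1)
hits = sum(has_prime_factor_in(rng.randrange(1, x + 1), lo, hi)
           for _ in range(trials))
freq = hits / trials  # of the same order of magnitude as 1/k
```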
The main result of this paper is an asymptotic lower bound on \(\beta _k\).
Theorem 2
We have \(\liminf _{r\rightarrow \infty } (\beta _{2^r})^{1/r} \geqslant \rho /2\), where
is a specific constant defined as the unique solution in [0, 1/3] of
where the sequence \(a_j\) is defined by
The proof of Theorem 2 will occupy the bulk of this paper, and has three basic parts:
- (a) Showing that for every \(r\geqslant 1\), \(\beta _{2^r} \geqslant \theta _r\) for a certain explicitly defined constant \(\theta _r\);
- (b) Showing that \(\lim _{r\rightarrow \infty } \theta _r^{1/r}\) exists;
- (c) Showing that (1.1) has a unique solution \(\rho \in [0,1/3]\) and that
$$\begin{aligned} \rho =2\lim _{r\rightarrow \infty } \theta _r^{1/r}. \end{aligned}$$
In the sequel we shall refer to “Theorem 2 (a)”, “Theorem 2 (b)” and “Theorem 2 (c)”. Parts (a), (b) and (c) are quite independent of one another, with the proof of (a) (given in Sect. 9.2) being by far the longest of the three. The definition of \(\theta _r\), while somewhat complicated, is fairly self-contained: see Definition 9.6. Parts (b) and (c) are then problems of an analytic and combinatorial flavour which can be addressed largely independently of the main arguments of the paper. The formula (1.1) allows for a quick computation of \(\rho \) to many decimal places, as the limit on the right side converges extremely rapidly. See Sect. 12 for details.
Let us now state an important corollary of Theorem 2.
Corollary 1
Define
Then
Proof
Evidently, \(\zeta _+\geqslant \zeta _-\). In addition, observe the trivial bound \(\beta _{k+1} \leqslant \beta _k\): any \(k+1\) distinct subsets with equal sums contain k distinct subsets with equal sums. Hence,
We then use Theorem 2 to find that \(\zeta _-\geqslant \eta \). \(\square \)
We conjecture that our lower bounds on \(\beta _k\) are asymptotically sharp, so that the following holds:
Conjecture 1
We have \(\zeta _+=\zeta _- = \eta \).
We will address the exact values of \(\beta _k\) in a future paper; in particular, we will show that
and
where
1.4 Application to divisors of integers, permutations and polynomials
The link between Problem 1 and the concentration of divisors is given by the following theorems. The proofs are relatively straightforward and given in the next section. Recall from (1.2) the definition of \(\zeta _+\).
Theorem 3
For any \(\varepsilon >0\), we have
for almost every n.
Remark
In principle, the proof of Theorem 3 yields an explicit bound on the size of the set of integers n with \(\Delta (n)\leqslant (\log \log n)^{\zeta _+-\varepsilon }\). However, incorporating such an improvement is a very complicated task. In addition, the obtained bound will presumably be rather weak without a better understanding of the theoretical tools we develop (cf. Sect. 3).
The same probabilistic setup allows us to quickly make similar conclusions about the distribution of divisors (product of cycles) of permutations and of polynomials over finite fields.
Theorem 4
For a permutation \(\sigma \in S_n\), denote by
where d denotes a generic divisor of \(\sigma \); that is, d is the product of a subset of the cycles of \(\sigma \).
Let \(\varepsilon >0\) be fixed. If n is sufficiently large in terms of \(\varepsilon \), then for at least \((1-\varepsilon )(n!)\) of the permutations \(\sigma \in S_n\), we have
Theorem 5
Let q be any prime power. For a polynomial \(f\in {\mathbb {F}}_q[t]\), let
Let \(\varepsilon >0\) be fixed. If n is sufficiently large in terms of \(\varepsilon \), then at least \((1-\varepsilon ) q^n\) monic polynomials of degree n satisfy
Conjecture 2
The lower bounds given in Theorems 3, 4 and 5 are sharp. That is, corresponding upper bounds with exponent \(\zeta _+ + \varepsilon \) hold.
If both Conjectures 1 and 2 hold, then we deduce that the optimal exponent in the above theorems is equal to \(\eta \).
Remark
The exponent \(\zeta _+-\varepsilon \) in Theorems 3, 4 and 5 depends only on accurate asymptotics for \(\beta _k\) as \(k\rightarrow \infty \) or, even more weakly, for \(\beta _{2^r}\) as \(r\rightarrow \infty \) (cf. (1.4)). In this work, however, we develop a framework for determining \(\beta _k\) exactly for each k.
The quantity \(\beta _k\) is also closely related to the densest packing of k divisors of a typical integer. To be specific, we define \(\alpha _k\) to be the supremum of all real numbers \(\alpha \) such that for almost every \(n\in {\mathbb {N}}\), n has k divisors \(d_1<\cdots <d_k\) with \(d_k \leqslant d_1 (1+ (\log n)^{-\alpha })\). In 1964, Erdős [10] conjectured that \(\alpha _2 = \log 3 -1\), and this was confirmed by Erdős and Hall [6] (upper bound) and Maier and Tenenbaum [20] (lower bound). The best bounds on \(\alpha _k\) for \(k\geqslant 3\) are given by Maier and Tenenbaum [22], who showed that
and (this is not stated explicitly in [22])
See also [26, p. 655–656]. In particular, it is not known if \(\alpha _3 > \alpha _4\), although Tenenbaum [26] conjectures that the sequence \((\alpha _k)_{k\geqslant 2}\) is strictly decreasing.
We can quickly deduce a lower bound for \(\alpha _k\) in terms of \(\beta _k\).
Theorem 6
For all \(k\geqslant 2\) we have \(\alpha _k \geqslant \beta _k/(1-\beta _k)\).
In particular,
which is substantially larger than the bound \(\alpha _3 \geqslant 0.0127069\ldots \) from (1.5).
Combining Theorem 6 with the bounds on \(\beta _k\) given in Theorem 2, we have improved the lower bounds (1.5) for large k.
The upper bound on \(\alpha _k\) is more delicate, and a subject which we will return to in a future paper. For now, we record our belief that the lower bound in Theorem 6 is sharp.
Conjecture 3
For all \(k\geqslant 2\) we have \(\alpha _k = \beta _k/(1-\beta _k)\).
2 Application to random integers, random permutations and random polynomials
In this section we assume the validity of Theorem 2 and use it to prove Theorems 3, 4, 5 and 6. The two main ingredients in this deduction are a simple combinatorial device (Lemma 2.1), of a type often known as a “tensor power trick”, used for building a large collection of equal subset sums, and transference results (Lemmas 2.2, 2.3 and 2.4) giving a correspondence between the random set \({\textbf{A}}\) and the prime factors of a random integer, the cycle structure of a random permutation, and the factorization of a random polynomial over a finite field. In the integer setting, this is a well-known principle, following for example from the Kubilius model of the integers (Kubilius, Elliott [4, 5], Tenenbaum [25]). We give a self-contained (modulo using the sieve) proof below.
Throughout this section, \({\textbf{A}}\) denotes a logarithmic random set.
2.1 A “tensor power” argument
In this section we give a simple combinatorial argument, first used in a related context in the work of Maier–Tenenbaum [20], which shows how to use equal subset sums in multiple intervals \(((D')^c,D']\) to create many more equal subset sums in \({\textbf{A}}\).
Lemma 2.1
Let \(k \in {\mathbb {Z}}_{\geqslant 2}\) and \(\varepsilon >0\) be fixed. Let \(D_1,D_2\) be parameters depending on D with \(3 \leqslant D_1 < D_2 \leqslant D\), \(\log \log D_1 = o(\log \log D)\) and \(\log \log D_2 = (1 - o(1)) \log \log D\) as \(D\rightarrow \infty \). Then, with probability \(\rightarrow 1\) as \(D\rightarrow \infty \), there are distinct \(A_1,\ldots , A_M \subset {\textbf{A}}\cap [D_1, D_2]\) with \(\sum _{a \in A_1} a = \cdots = \sum _{a \in A_M} a\) and \(M \geqslant (\log D)^{(\log k)/\log (1/\beta _k) - \varepsilon }\).
Remark
In particular, the result applies when \(D_1 = 3\) and \(D_2 = D\), in which case it has independent combinatorial interest, giving a (probably tight) lower bound on the growth of the representation function for a random set.
Proof
Since increasing the value of \(D_1\) only makes the proposition stronger, we may assume that \(D_1 \rightarrow \infty \) as \(D\rightarrow \infty \). Let \(0<\delta <\beta _k\), and set \(\alpha := \beta _k - \delta \). Set
and consider the intervals \([D_2^{\alpha ^{i+1}}, D_2^{\alpha ^i})\), \(i = 0,1,\ldots , m - 1\). Due to the choice of m, these all lie in \([D_1, D_2]\).
For \(i = 0,1,\ldots ,m-1\), let \(E_i\) be the event that there are distinct \(A^{(i)}_1,\ldots , A^{(i)}_k \subset {\textbf{A}}\cap [D_2^{\alpha ^{i+1}}, D_2^{\alpha ^i})\) with \(\sum _{a \in A^{(i)}_1} a = \cdots = \sum _{a \in A^{(i)}_k} a\). Then, by the definition of \(\beta _k\) and the fact that \(D_1 \rightarrow \infty \), we have \({\mathbb {P}}(E_i) = 1 - o(1)\), uniformly in \(i=0,1,\ldots ,m-1\). Here and throughout the proof, o(1) means a function tending to zero as \(D\rightarrow \infty \), at a rate which may depend on k and \(\delta \). The events \(E_i\) are independent, since they concern disjoint ranges of \({\textbf{A}}\). The Law of Large Numbers then implies that, with probability \(1 - o(1)\), at least \((1 - o(1))m\) of them occur, let us say for \(i \in I\), \(|I| = (1 - o(1))m\).
From the above discussion, we have found \(M := k^{|I|} = k^{(1 - o(1))m}\) distinct sets \(B_{\varvec{j}} = \bigcup _{i \in I} A_{j_i}^{(i)}\), \({\varvec{j}} \in [k]^{I}\), such that all of the sums \(\sum _{a \in B_{\varvec{j}}} a\) are the same. Note that
Taking \(\delta \) small enough and D large enough, the result follows. \(\square \)
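The mechanism of the proof can be replayed on toy data: given (hypothetical) equal-sum k-tuples of sets found in disjoint ranges, forming the union of one set per range produces \(k^m\) distinct sets with a common sum. A minimal sketch, with k = 2 and m = 3 hypothetical blocks:

```python
from itertools import product

# Hypothetical equal-sum pairs (k = 2) found in m = 3 disjoint ranges.
blocks = [
    [{3, 4}, {7}],          # both choices sum to 7
    [{10, 15}, {11, 14}],   # both choices sum to 25
    [{20, 21}, {41}],       # both choices sum to 41
]

# One set per block, unioned: k^m = 8 distinct sets, all of sum 7 + 25 + 41.
unions = [frozenset().union(*choice) for choice in product(*blocks)]
assert len(set(unions)) == 2 ** len(blocks)
assert {sum(u) for u in unions} == {73}
```

Distinctness of the unions uses exactly the disjointness of the ranges, as in the proof.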
2.2 Modeling prime factors with a logarithmic random set
Let X be a large parameter, suppose that
and let \(I=[i_1,i_2] \cap {\mathbb {N}}\), where
For a uniformly random positive integer \({\textbf{n}}\leqslant X\), let \({\textbf{n}}=\prod _p p^{v_p}\) be the prime factorization of \({\textbf{n}}\), where the product is over all primes. Let \({\mathscr {P}}_i\) be the set of primes in \((e^{i/K}, e^{(i+1)/K}]\), and define the random set
that is, the set of i for which \({\textbf{n}}\) has a prime factor in \({\mathscr {P}}_i\). By the sieve, it is known that the random variables \(v_p\) are nearly independent for \(p=X^{o(1)}\), and thus the probability that \(b_i\geqslant 1\) is roughly
The next lemma makes this precise.
Recall the notion of total variation distance \(d_{{\text {TV}}}(X,Y)\) between two discrete real random vectors X, Y defined on the same probability space \((\Omega ,{\mathcal {F}},{\mathbb {P}})\):
We have
provided that the random variables \(X_j,Y_j\) live on the same probability space for each j, that \(X_1,\ldots ,X_k\) are independent, and \(Y_1,\ldots ,Y_k\) are also independent. Although we believe this is a standard inequality, we could not find a good reference for it and give a proof of (2.4) in Lemma A.8. In addition, recall the identity
when X and Y take values in a countable set \(\Omega \), with \({\mathcal {F}}\) the power set of \(\Omega \). See, e.g., [19, Proposition 4.2].
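Both facts are easy to check numerically on finite toy distributions (represented below as dicts of point masses; the example marginals are ours). The sketch computes total variation via the identity (2.5) and verifies the subadditivity inequality (2.4) for a two-coordinate independent vector:

```python
from itertools import product

def tv(p, q):
    # identity (2.5): d_TV(X, Y) = (1/2) * sum_w |P(X = w) - P(Y = w)|
    return 0.5 * sum(abs(p.get(w, 0.0) - q.get(w, 0.0))
                     for w in set(p) | set(q))

def product_dist(marginals):
    # joint law of an independent vector with the given marginals
    out = {}
    for combo in product(*(m.items() for m in marginals)):
        key = tuple(w for w, _ in combo)
        pr = 1.0
        for _, v in combo:
            pr *= v
        out[key] = out.get(key, 0.0) + pr
    return out

X = [{0: 0.5, 1: 0.5}, {0: 0.9, 1: 0.1}]    # marginals of X_1, X_2
Y = [{0: 0.6, 1: 0.4}, {0: 0.85, 1: 0.15}]  # marginals of Y_1, Y_2
lhs = tv(product_dist(X), product_dist(Y))
rhs = sum(tv(x, y) for x, y in zip(X, Y))
assert lhs <= rhs + 1e-12  # inequality (2.4)
```

In this example the left side is 0.11 and the right side is 0.15.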
Lemma 2.2
Uniformly for any collection \({\mathscr {I}}\) of subsets of I, we have
Proof
For \(i_1 \leqslant i\leqslant i_2\), let \(\omega _i\) be the indicator function of the event that \({{\textbf{n}}}\) has a prime factor from \({\mathscr {P}}_i\), let \(Q_i\) be a Poisson random variable with parameter \(R_i\), with the different \(Q_i\) independent, and let \(Z_i=1_{Q_i\geqslant 1}\). Also, let \(Y_i\) be a Bernoulli random variable with \({\mathbb {P}}(Y_i=1)=1/i\), again with the \(Y_i\) independent. Let \(\varvec{\omega }, {\textbf{Z}}, {\textbf{Y}}\) denote the vectors of the variables \(\omega _i,Z_i,Y_i\), respectively. By assumption, each \({\mathscr {P}}_i \subset [\log X, X^{1/3\log \log \log X}]\). Hence, Theorem 1 of [11] implies that
In addition, note that \(d_{{\text {TV}}}(Z_i,Y_i)\ll 1/i^2\) for all i, something that can be easily proven using (2.5). Combining this estimate with (2.4), we find that
The triangle inequality then implies that \(d_{{\text {TV}}}(\varvec{\omega },{\textbf{Y}})\ll 1/\log \log X\), as desired. \(\square \)
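The estimate \(d_{{\text {TV}}}(Z_i,Y_i)\ll 1/i^2\) can be checked directly. For the sketch below we assume the Poisson parameter is exactly 1/i (in the lemma it is \(R_i\), which is only approximately 1/i); both variables take values in \(\{0,1\}\), so (2.5) reduces to a two-term sum, and the distance behaves like \(1/(2i^2)\):

```python
from math import exp

def tv_indicator_vs_bernoulli(i):
    # Z = 1_{Q >= 1} with Q ~ Poisson(1/i), versus Y ~ Bernoulli(1/i)
    pz1 = 1 - exp(-1 / i)   # P(Z = 1)
    py1 = 1 / i             # P(Y = 1)
    return 0.5 * (abs(pz1 - py1) + abs((1 - pz1) - (1 - py1)))

for i in (10, 100, 1000):
    d = tv_indicator_vs_bernoulli(i)
    # Taylor expansion: d = 1/i - (1 - e^{-1/i}) = 1/(2i^2) + O(1/i^3)
    assert 0.4 / i ** 2 < d < 0.6 / i ** 2
```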
2.3 The concentration of divisors of integers
In this section we prove Theorems 3 and 6. Recall from (1.2) the definition of \(\zeta _+\).
Proof of Theorem 3
Fix \(\varepsilon >0\), let X be large enough in terms of \(\varepsilon \), and let \({\textbf{n}} \leqslant X\) be a uniformly sampled random integer. Generate a logarithmic random set \({\textbf{A}}\). Set \(K=10 \log \log X\), \(D_1 = i_1\), \(D=D_2 = i_2\), where \(i_1\) and \(i_2\) are defined by (2.2). With our choice of parameters, the hypotheses of Lemma 2.1 hold and hence, with probability \(1 - o(1)\) as \(X\rightarrow \infty \), there are distinct sets \(A_1,\ldots , A_M \subset {\textbf{A}}\cap [D_1, D_2]\) with \(\sum _{a \in A_1} a = \cdots = \sum _{a \in A_M} a\) and \(M:=\lceil (\log \log X)^{\zeta _+ - \varepsilon }\rceil \). By Lemma A.2, with probability \(1 - o(1)\), we have
Write F for the event that both of these happen.
Recall that \({{\textbf{n}}}\) is a random integer chosen uniformly in [1, X], and let \({\textbf{I}}\) be the random set associated to \({{\textbf{n}}}\) via (2.3). By Lemma 2.2, the corresponding event \(F'\) for \({\textbf{I}}\) also holds with probability \(1-o(1)\); that is, \(F'\) is the event that \(|{\textbf{I}} \cap [D_1,D_2]| \leqslant 2\log D_2\) and that there are distinct subsets \(I_1,\ldots ,I_M\) with equal sums. Assume we are in the event \(F'\). For each \(i\in {\textbf{I}}\), \({\textbf{n}}\) is divisible by some prime \(p_i\in {\mathscr {P}}_i\). In addition, for each \(r,s\in \{1,2,\ldots ,M\}\), we have
Writing \(d_r := \prod _{i \in I_r} p_i\) for each r, we thus see that the \(d_r\)’s are all divisors of \({\textbf{n}}\) and their logarithms all lie in an interval of length 1. It follows that \({\mathbb {P}}(\Delta ({\textbf{n}}) \geqslant M) = 1 - o(1)\) when \({{\textbf{n}}}\) is a uniformly sampled random integer from [1, X], as required for Theorem 3. \(\square \)
Proof of Theorem 6
Fix \(0<c < \beta _k/(1-\beta _k)\), let X be large and set \(K= (\log X)^{c}\). Define \(i_1,i_2\) by (2.2), let \(D=i_2\) and define \(c'\) by \(D^{c'} = i_1\). Let \({{\textbf{n}}}\) be a random integer chosen uniformly in [1, X]. We have
and therefore \(c' \leqslant \beta _k-\delta \) for some \(\delta >0\), which depends only on c. By the definition of \(\beta _k\) and Lemma 2.2, it follows that with probability \(1-o(1)\), the set \({\textbf{I}}\) defined in (2.3) has k distinct subsets \(I_1,\ldots ,I_k\) with equal sums, and moreover (cf. the proof of Theorem 3 above), \(|{\textbf{I}}| \leqslant 2\log i_2\), so that \(|I_j|\leqslant 2\log i_2\) for each j. Thus, with probability \(1-o(1)\), there are primes \(p_i\in {\mathscr {P}}_i\) (\(i\in {\textbf{I}}\)) such that for any \(r,s\in \{1,\ldots ,k\}\) we have
Thus, setting \(d_r = \prod _{i\in I_r} p_i\), we see that \(d_r \leqslant d_s \exp \big \{O\big (\frac{\log \log X}{(\log X)^c}\big ) \big \}\) for any \(r,s\in \{1,\ldots ,k\}\). Since c is arbitrary subject to \(c<\beta _k/(1-\beta _k)\), we conclude that \(\alpha _k \geqslant \beta _k/(1-\beta _k)\). \(\square \)
2.4 Permutations and polynomials over finite fields
The connection between random logarithmic sets, random permutations and random polynomials is more straightforward, owing to the well-known approximations of these objects by a vector of Poisson random variables.
For each j, let \(Z_j\) be a Poisson random variable with parameter 1/j, and such that \(Z_1,Z_2,\ldots ,\) are independent. The next proposition states that, apart from the very longest cycles, the cycle lengths of a random permutation have a joint Poisson distribution.
Lemma 2.3
For a random permutation \(\sigma \in S_n\), let \(C_j(\sigma )\) denote the number of cycles in \(\sigma \) of length j. Then for \(r = o(n)\) as \(n\rightarrow \infty \) we have
Proof
In fact there is a bound \(\ll e^{-n/r}\) uniformly in n and r; see [3]. \(\square \)
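Lemma 2.3 is easy to probe empirically. The sketch below (illustrative parameters; all names are ours) compares the frequency of the event \(C_j(\sigma )\geqslant 1\) over random permutations in \(S_{50}\) with the Poisson prediction \({\mathbb {P}}(Z_j\geqslant 1) = 1 - e^{-1/j}\):

```python
import random
from math import exp

def cycle_lengths(perm):
    # counts of cycle lengths of a permutation of {0, ..., n-1}
    seen, counts = [False] * len(perm), {}
    for s in range(len(perm)):
        length, j = 0, s
        while not seen[j]:
            seen[j] = True
            j, length = perm[j], length + 1
        if length:
            counts[length] = counts.get(length, 0) + 1
    return counts

rng = random.Random(0)
n, trials = 50, 4000
hits = {2: 0, 3: 0, 5: 0}
for _ in range(trials):
    perm = list(range(n))
    rng.shuffle(perm)
    ct = cycle_lengths(perm)
    for j in hits:
        hits[j] += ct.get(j, 0) >= 1
freqs = {j: h / trials for j, h in hits.items()}
for j, f in freqs.items():
    # empirical P(C_j >= 1) versus the Poisson prediction 1 - e^{-1/j}
    assert abs(f - (1 - exp(-1 / j))) < 0.04
```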
The next proposition states a similar phenomenon for the degrees of the irreducible factors of a random polynomial over \({\mathbb {F}}_q\), except that now one must also exclude the very smallest degrees as well.
Lemma 2.4
Let q be a prime power. Let f be a random, monic polynomial in \({\mathbb {F}}_q[t]\) of degree n. Let \(Y_d(f)\) denote the number of monic, irreducible factors of f which have degree d. Suppose that \(10\log n \leqslant r \leqslant s\leqslant \frac{n}{10\log n}\). Then
as \(n \rightarrow \infty \).
Proof
For \(r \leqslant i \leqslant s\), let \({\hat{Z}}_i\) be a negative binomial random variable \(\textrm{NB}(\frac{1}{i}\sum _{j|i} \mu (i/j) q^{j},q^{-i})\). Corollary 3.3 in [2] implies that
uniformly in q, n, r, s as in the statement of the lemma. Note that
for \(i\geqslant r\geqslant 10\log n\). A routine if slightly lengthy calculation with (2.5) gives
Combining this with (2.4), we arrive at
The conclusion follows from this, (2.6) and the triangle inequality. \(\square \)
Proof of Theorem 4
Fix \(\varepsilon >0\), let n be large enough in terms of \(\varepsilon \), let \(u=\log n\) and \(v=n/\log n\). For a random permutation \(\sigma \in S_n\), let \({\textbf{C}} = \{ j: C_j(\sigma )\geqslant 1 \}\), and define the random set \({\tilde{{\textbf{A}}}} = \{ j: Z_j \geqslant 1 \}\). As in the proof of Lemma 2.2, (2.4) and (2.5) imply that
Lemma 2.3 implies that
Hence,
as \(n \rightarrow \infty \). By Lemma 2.1, with probability \(\rightarrow 1\) as \(n\rightarrow \infty \), \({\textbf{A}}\cap (u,v]\) has M distinct subsets \(A_1,\ldots ,A_M\) with equal sums, where \(M=\lceil (\log n)^{\zeta _+-\varepsilon }\rceil \). Hence, \({\textbf{C}}\) has distinct subsets \(S_1,\ldots ,S_M\) with equal sums with probability \(\rightarrow 1\) as \(n\rightarrow \infty \). Each subset \(S_j\) corresponds to a distinct divisor of \(\sigma \), the size of the divisor being the sum of elements of \(S_j\). \(\square \)
Proof of Theorem 5
The proof is essentially the same as that of Theorem 4, except now we take \(u=10\log n\), \(v=\frac{n}{10\log n}\), \({\textbf{C}} = \{j: Y_j(f)\geqslant 1 \}\) and use Lemma 2.4 in place of Lemma 2.3. \(\square \)
3 Overview of the paper
The purpose of this section is to explain the main ideas that go into the proof of Theorem 2 in broad strokes, as well as to outline the structure of the rest of the paper. The remainder of the paper splits into three parts, and we devote a subsection to each of these. Finally, in Sect. 3.4, we make some brief comments about the relationship of our work to previous work of Maier and Tenenbaum [20, 22]. Further comments on this connection are made in Appendix C.
3.1 Part II: equal sums and the optimization problem
Part II provides a very close link between the key quantity \(\beta _k\) (which is defined in Problem 1 and appears in all four of Theorems 2, 3, 4 and 5) and a quantity \(\gamma _k\), which on the face of it appears to be of a completely different nature, being the solution to a certain optimization problem (Problem 3.7 below) involving the manner in which linear subspaces of \({\mathbb {Q}}^k\) intersect the cube \(\{0,1\}^k\).
At the heart of this connection is a fairly simple way of associating a flag to k distinct sets \(A_1,\ldots , A_k \subset A\), where A is a given set of integers (that we typically generate logarithmically).
Definition 3.1
(Flags) Let \(k \in {\mathbb {N}}\). By an r-step flag we mean a nested sequence
of vector spaces. Here \({\textbf{1}} = (1,1,\ldots , 1) \in {\mathbb {Q}}^k\). A flag is complete if \(\dim V_{i+1} = \dim V_i + 1\) for \(i = 0,1,\ldots , r-1\).
To each choice of distinct sets \(A_1,\ldots , A_k \subset A\), we associate a flag as follows. The Venn diagram of the subsets \(A_1,\ldots ,A_k\) produces a natural partition of A into \(2^k\) subsets, which we denote by \(B_\omega \) for \(\omega \in \{0,1\}^k\). Here \(A_i = \sqcup _{\omega :\omega _i=1} B_\omega \). We iteratively select vectors \(\omega ^1,\ldots ,\omega ^r\) to maximize \(\prod _{j=1}^r (\max B_{\omega ^j})\) subject to the constraint that \({\textbf{1}},\omega ^1,\ldots ,\omega ^r\) are linearly independent over \({\mathbb {Q}}\). We then define \(V_j = {\text {Span}}({\textbf{1}},\omega ^1,\ldots ,\omega ^j)\) for \(j = 0,1,\ldots , r\).
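This construction can be carried out mechanically. The following sketch (our own illustrative implementation, with exact rational rank computations) forms the Venn cells \(B_\omega \) and selects the \(\omega ^j\) greedily by decreasing \(\max B_\omega \), which maximizes the product since the linear independence constraint is a matroid constraint:

```python
from fractions import Fraction

def rank(vectors):
    # rank over Q by Gaussian elimination with exact fractions
    rows = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    for c in range(len(rows[0])):
        piv = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def venn_cells(A, subsets):
    # B_omega: elements of A keyed by membership pattern in A_1, ..., A_k
    B = {}
    for a in A:
        B.setdefault(tuple(int(a in S) for S in subsets), []).append(a)
    return B

def greedy_flag_directions(A, subsets):
    # pick omega^1, omega^2, ... with max B_omega as large as possible,
    # keeping {1, omega^1, ..., omega^j} linearly independent over Q
    k = len(subsets)
    B = venn_cells(A, subsets)
    one, chosen = [1] * k, []
    for om in sorted(B, key=lambda w: -max(B[w])):
        if rank([one] + chosen + [list(om)]) == len(chosen) + 2:
            chosen.append(list(om))
    return chosen

# toy example: A = {2,3,5,7,11}, A_1 = {2,3,11}, A_2 = {3,5,11}
dirs = greedy_flag_directions({2, 3, 5, 7, 11}, [{2, 3, 11}, {3, 5, 11}])
assert dirs == [[0, 1]]  # (1,1), (0,0) are rejected; (0,1) beats (1,0)
```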
The purpose of making this construction is difficult to describe precisely in a short paragraph. However, the basic idea is that the vectors \(\omega ^1,\ldots , \omega ^r\) and the flag \({\mathscr {V}}\) provide a natural frame of reference for studying the equal sums equation
Suppose now that \(A_1,\ldots , A_k \subset [D^c,D]\). Then the construction just described naturally leads, in addition to the flag \({\mathscr {V}}\), to the following further data: thresholds \(c_j\) defined by \(\max B_{\omega ^j} \approx D^{c_j}\), and measures \(\mu _j\) on \(\{0,1\}^k\), which capture the relative sizes of the sets \(B_{\omega } \cap (D^{c_{j+1}},D^{c_j}]\), \(\omega \in \{0,1\}^k\). Full details of these constructions are given in Sect. 4.
The above discussion motivates the following definition, which will be an important one in our paper.
Definition 3.2
(Systems) Let \(({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\) be a triple such that:
- (a) \({\mathscr {V}}\) is an r-step flag whose members \(V_j\) are distinct and spanned by elements of \(\{0,1\}^k\);
- (b) \({\mathscr {V}}\) is nondegenerate, which means that \(V_r\) is not contained in any of the subspaces \(\{ x \in {\mathbb {Q}}^k : x_i = x_j\}\), \(i \ne j\);
- (c) \({{\textbf{c}}}=(c_1,\ldots ,c_r,c_{r+1})\) with \(1\geqslant c_1 \geqslant \cdots \geqslant c_{r+1} \geqslant 0\);
- (d) \({\varvec{\mu }}=(\mu _1,\ldots ,\mu _r)\) is an r-tuple of probability measures;
- (e) \({\text {Supp}}(\mu _i)\subset V_i \cap \{0,1\}^k\) for all i.
Then we say that \(({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\) is a system. We say that a system is complete if its underlying flag is, in the sense of Definition 3.1.
Remark
The nondegeneracy condition (b) arises naturally from the construction described previously, provided one assumes the sets \(A_1,\ldots , A_k\) are distinct.
We have sketched how a system \(({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\) may be associated to any k distinct sets \(A_1,\ldots , A_k \subset [D^c, D]\). Full details are given in Sect. 4.1. There is certainly no canonical way to reverse this and associate sets \(A_i\) to a system \(({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\), even if the numbers \(\mu _j(\omega )\) are all rational. However, given a set \({\textbf{A}}\subset [D^c,D]\) (which, in our paper, will be a logarithmic random set) and a system \(({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\), there is a natural probabilistic way to construct subsets \(A_1,\ldots , A_k \subset {\textbf{A}}\) via their Venn diagram \((B_{\omega })_{\omega \in \{0,1\}^k}\): if \(a \in {\textbf{A}}\cap (D^{c_{j+1}}, D^{c_j}]\), then we put a in \(B_{\omega }\) with probability \(\mu _j(\omega )\), these choices being independent for distinct a.
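A minimal sketch of this randomized assignment, assuming concretely represented thresholds and measures (the dictionaries and names below are ours, and the degenerate point-mass measures are chosen only to make the outcome predictable):

```python
import random

def random_subsets_from_system(A, D, c, mus, rng):
    """Assign each a in A to a Venn cell B_omega and read off A_1, ..., A_k.

    c = (c_1, ..., c_{r+1}) are the thresholds, and mus[j] is a dict
    omega -> mu_j(omega); an element a in (D^{c_{j+1}}, D^{c_j}] lands in
    cell omega with probability mus[j][omega], independently over a.
    (Illustrative only; the paper's construction carries more bookkeeping.)
    """
    k = len(next(iter(mus[0])))
    B = {}
    for a in A:
        j = next((j for j in range(len(mus))
                  if D ** c[j + 1] < a <= D ** c[j]), None)
        if j is None:
            continue  # a falls outside all ranges
        omegas = list(mus[j])
        om = rng.choices(omegas, weights=[mus[j][w] for w in omegas])[0]
        B.setdefault(om, []).append(a)
    # A_i is the union of the cells B_omega with omega_i = 1
    return [sorted(x for om, xs in B.items() if om[i] for x in xs)
            for i in range(k)]

# degenerate (deterministic) measures make the outcome easy to predict
A_sets = random_subsets_from_system(
    [5, 50], 100, (1.0, 0.5, 0.0),
    [{(1, 0): 1.0}, {(1, 1): 1.0}], random.Random(0))
assert A_sets == [[5, 50], [5]]
```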
This will indeed be, roughly, our strategy for constructing, given a logarithmic random set \({\textbf{A}}\subset [D^c, D]\), distinct subsets \(A_1,\ldots , A_k \subset {\textbf{A}}\cap [D^c, D]\) satisfying the equal sums condition (3.1). Very broadly speaking, we will enact this plan in two stages, described in Sects. 5 and 6 respectively. In Sect. 5, which is by far the deeper part of the argument, we will show that (almost surely in \({\textbf{A}}\)) the distribution of tuples \((\sum _{a \in A_i} a)_{i = 1}^k\) is dense in a certain box adapted to the flag \({\mathscr {V}}\), as the \(A_i\) range over the random choices just described. Then, in Sect. 6, we will show that (almost surely) one of these tuples can be “corrected” to give the equal sums condition (3.1). This general mode of argument has its genesis in the paper [20] of Maier and Tenenbaum, but the details here will look very different. In addition to the fact that linear algebra and entropy play no role in Maier and Tenenbaum’s work, they use a second moment argument which does not work in our setting. Instead we use an \(\ell ^p\) estimate with \(p\approx 1\), building on ideas in [17, 18].
In analysing the distribution of tuples \((\sum _{a \in A_i} a)_{i = 1}^k\), the notion of entropy comes to the fore.
Definition 3.3
(Entropy of a subspace) Suppose that \(\nu \) is a finitely supported probability measure on \({\mathbb {Q}}^k\) and that \(W \leqslant {\mathbb {Q}}^k\) is a vector subspace. Then we define
Remark
This is the (Shannon) entropy of the distribution on cosets \(W + x\) induced by \(\nu \). Entropy will play a key role in our paper, and basic definitions and properties of it are collected in Appendix B.
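In concrete terms, \({\mathbb {H}}_{\nu }(W)\) can be computed by grouping the support of \(\nu \) into cosets of W. The sketch below (our own illustration) does this with exact rational linear algebra; we use entropy base 2 here purely for illustration (the paper's convention is fixed in Appendix B):

```python
from fractions import Fraction
from math import log2

def rank(vectors):
    # rank over Q via exact Gaussian elimination
    rows = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    for c in range(len(rows[0])):
        piv = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def subspace_entropy(nu, basis):
    # H_nu(W): entropy of the distribution induced by nu on the cosets W + x
    in_W = lambda v: rank(basis + [list(v)]) == rank(basis)
    classes = []  # pairs (coset representative, total mass)
    for x, p in nu.items():
        for t, (rep, mass) in enumerate(classes):
            if in_W([a - b for a, b in zip(x, rep)]):
                classes[t] = (rep, mass + p)
                break
        else:
            classes.append((x, p))
    return -sum(p * log2(p) for _, p in classes if p > 0)

# uniform measure on {0,1}^2; W = span{(1,1)} meets the support in three
# cosets of masses 1/2, 1/4, 1/4, so the entropy is 1.5 bits
nu = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
assert abs(subspace_entropy(nu, [[1, 1]]) - 1.5) < 1e-9
```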
More important than the entropy itself will be a certain quantity \(\textrm{e}({\mathscr {V}}',{{\textbf{c}}},{\varvec{\mu }})\), assigned to subflags of \({\mathscr {V}}\). We give the relevant definitions now.
Definition 3.4
(Subflags) Suppose that
is a flag. Then another flag
is said to be a subflag of \({\mathscr {V}}\) if \(V'_i \leqslant V_i\) for all i. In this case we write \({\mathscr {V}}' \leqslant {\mathscr {V}}\). It is a proper subflag if it is not equal to \({\mathscr {V}}\).
Definition 3.5
(\(\textrm{e}\)-value) Let \(({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\) be a system, and let \({\mathscr {V}}' \leqslant {\mathscr {V}}\) be a subflag. Then we define the \(\textrm{e}\)-value
Remark
Note that
since condition (e) of Definition 3.2 implies that \({\mathbb {H}}_{\mu _j}(V_j)=0\) for \(1\leqslant j\leqslant r\).
Definition 3.6
(Entropy condition) Let \(({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\) be a system. We say that this system satisfies the entropy condition if
and the strict entropy condition if
In this overview we cannot give a meaningful discussion of exactly why these definitions are the right ones to make. Indeed, it took the authors over a year of working on the problem to arrive at them. Let us merely say that
-
If a random logarithmic set \({\textbf{A}}\cap [D^c, D]\) almost surely admits distinct subsets \(A_1,\ldots , A_k\) satisfying the equal sums condition (3.1), then some associated system \(({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\) satisfies the entropy condition (3.4). For detailed statements and proofs, see Sect. 4.
-
If a system \(({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\) satisfies the strict entropy condition (3.5) then the details of the construction of sets \(A_1,\ldots , A_k\) satisfying the equal sums condition, outlined above, can be made to work. For detailed statements and proofs, see Sects. 5 and 6.
With the above definitions and discussion in place, we are finally ready to introduce the key optimization problem, the study of which will occupy a large part of our paper.
Problem 3.7
(The optimisation problem) Determine the value of \(\gamma _k\), defined to be the supremum of all constants c for which there is a system \(({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\) such that \(c_{r+1}=c\) and the entropy condition (3.4) holds.
Similarly, determine \({\tilde{\gamma }}_k\), defined to be the supremum of all constants c for which there is a system \(({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\) such that \(c_{r+1}=c\) and the strict entropy condition (3.5) holds.
The precise content of the two bullet points above, and the main result of Part II of the paper, is then the following theorem.
Theorem 7
For every \(k\geqslant 2\), we have
Remark 3.1
(a) Presumably \(\gamma _k = \beta _k = {{\tilde{\gamma }}}_k\). Indeed, it is natural to think that any system satisfying (3.4) can be perturbed an arbitrarily small amount to satisfy (3.5). However, we have not been able to show that this is possible in general.
(b) It is not a priori clear that \(\gamma _k\) and \({\tilde{\gamma }}_k\) exist and are positive. This will follow, e.g., from our work on “binary systems” in Part IV of the paper, although there is an easier way to see this using the original Maier–Tenenbaum argument, adapted to our setting; see Appendix C for a sketch of the details.
3.2 Part III: the optimization problem
Part III of the paper is devoted to the study of Problem 3.7 in as much generality as we can manage. Unfortunately we have not yet been able to completely resolve this problem, and indeed numerical experiments suggest that a complete solution, for all k, could be very complicated.
The main achievement of Part III is to provide a solution of sorts when the flag \({\mathscr {V}}\) is fixed, but one is free to choose \({\textbf{c}}\) and \({\varvec{\mu }}\). Write \(\gamma _k({\mathscr {V}})\) (or \({{\tilde{\gamma }}}_k({\mathscr {V}})\)) for the solution to this problem, that is, the supremum of values \(c=c_{r+1}\geqslant 0\) for which a system \(({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\) exists satisfying (3.4) (or (3.5)).
Our solution applies only to rather special flags \({\mathscr {V}}\), but this is unsurprising: for “generic” flags \({\mathscr {V}}\), one would not expect there to be any choice of \({\textbf{c}}\), \({\varvec{\mu }}\), for which \(c_{r+1} > 0\), and so \(\gamma _k({\mathscr {V}})= 0\) in these cases. Such flags are of no interest in this paper.
We begin, in Sect. 7, by solving an even more specific problem in which the entropy condition (3.4) is only required to hold for certain very special subflags \({\mathscr {V}}'\) of \({\mathscr {V}}\), which we call basic flags. These are flags of the form
We call this the restricted entropy condition; to spell it out, this is the condition that
for \(m = 0,1,\ldots , r-1\) (the case \(m = r\) being vacuous).
We write \(\gamma _k^{{\text {res}}}({\mathscr {V}})\) for the maximum value of \(c_{r+1}\) (over all choices of \({\textbf{c}}\) and \({\varvec{\mu }}\) such that \(({\mathscr {V}}, {{\textbf{c}}}, {\varvec{\mu }})\) is a system) subject to this condition. Clearly
The main result of Sect. 7 is Proposition 7.7, which states that under certain conditions we have
for certain parameters \(\rho _1,\ldots , \rho _{r-1}\) depending on the flag \({\mathscr {V}}\).
To define these, one considers the “tree structure” on \(\{0,1\}^k \cap V_r\) induced by the flag \({\mathscr {V}}\): the “cells at level j” are simply intersections with cosets of \(V_j\), and we join a cell C at level j to a “child” cell \(C'\) at level \(j-1\) iff \(C' \subset C\). The \(\rho _i\) are then defined by setting up a certain recursively-defined function on this tree and then solving what we term the \(\rho \)-equations. The details may be found in Sect. 7.2. Proposition 7.7 also describes the measures \({\varvec{\mu }}\) and the parameters \({{\textbf{c}}}\) for which this optimal value is attained.
In Sect. 8, we relate the restricted optimisation problem to the real one, giving fairly general conditions under which we in fact have equality in (3.7), that is to say \(\gamma _k^{{\text {res}}}({\mathscr {V}}) = \gamma _k({\mathscr {V}})\). The basic strategy of this section is to show that for the \({\textbf{c}}\) and \({\varvec{\mu }}\) which are optimal for the restricted optimisation problem, the full entropy condition (3.4) is in fact a consequence of the restricted condition (3.6).
The arguments of this section make heavy use of the submodularity inequality for entropy, using this to drive a kind of “symmetrisation” argument. In this way one can show that an arbitrary \(\textrm{e}({\mathscr {V}}', {{\textbf{c}}}, {\varvec{\mu }})\) is greater than or equal to one in which \({\mathscr {V}}'\) is almost a basic flag; these “semi-basic” flags are then dealt with by hand.
To add an additional layer of complexity, we build a perturbative device into this argument so that our results also apply to \(\tilde{\gamma }_k({\mathscr {V}})\).
3.3 Part IV: binary systems
The final part of the paper is devoted to a discussion of a particular type of flag \({\mathscr {V}}\), the binary flags, and the associated optimal systems \(({\mathscr {V}}, {{\textbf{c}}}, {\varvec{\mu }})\), which we call binary systems.
Definition 3.8
(Binary flag of order r) Let \(k = 2^r\) be a power of two. Identify \({\mathbb {Q}}^k\) with \({\mathbb {Q}}^{{\mathcal {P}}[r]}\) (where \({\mathcal {P}}[r]\) means the power set of \([r] = \{1,\ldots , r\}\)) and define an r-step flag \({\mathscr {V}}\), \(\langle {\textbf{1}} \rangle = V_0 \leqslant V_1 \leqslant \cdots \leqslant V_r = {\mathbb {Q}}^{{\mathcal {P}}[r]}\), as follows: \(V_i\) is the subspace of all \((x_S)_{S \subset [r]}\) for which \(x_S = x_{S \cap [i]}\) for all \(S \subset [r]\).
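To make the indexing in Definition 3.8 concrete, here is a short brute-force enumeration in Python (the function name is ours; the enumeration runs over all of \(\{0,1\}^k\) with \(k = 2^r\), so it is only usable for small r). For \(r = 2\) it confirms that the number of 0-1 points of \(V_i\) is \(2^{2^i}\), consistent with \(\dim V_i = 2^i\) and with equality in the bound \(\# (V \cap \{0,1\}^k) \leqslant 2^{\dim V}\) of Lemma 5.1 below.

```python
from itertools import product

def binary_flag_cube_points(r, i):
    r"""The 0-1 points of V_i in the binary flag of order r.

    Coordinates are indexed by the subsets S of [r] = {1, ..., r};
    V_i consists of the vectors with x_S = x_{S \cap [i]} for all S.
    Brute force over {0,1}^(2^r), so only usable for small r.
    """
    subsets = [frozenset(s for s in range(1, r + 1) if (m >> (s - 1)) & 1)
               for m in range(2 ** r)]
    points = []
    for x in product((0, 1), repeat=len(subsets)):
        vec = dict(zip(subsets, x))
        if all(vec[S] == vec[frozenset(t for t in S if t <= i)]
               for S in subsets):
            points.append(x)
    return points

print([len(binary_flag_cube_points(2, i)) for i in (0, 1, 2)])  # [2, 4, 16]
```

For \(i = 0\) only the two constant vectors survive, as they must, since \(V_0 = \langle {\textbf{1}} \rangle \).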
Whilst the definition is, in hindsight, rather simple and symmetric, it was motivated by extensive numerical experimentation. We believe these flags to be asymptotically optimal for Problem 3.7, though we currently lack a proof.
There are two main tasks in Part IV. First, we must verify that the various conditions necessary for the results of Part III hold for the binary flags. This is accomplished in Sect. 10, the main statements being given in Sect. 9. At the end of Sect. 9 we give the proof (and complete statement) of Theorem 2(a), conditional upon the results of Sect. 10. This is the deepest result in the paper.
Following this we turn to Theorem 2(b). There are two tasks here. First, we prove that the parameters \(\rho _i\) for the binary flags (which do not depend on r) tend to a limit \(\rho \). This is not at all straightforward, and is accomplished in Sect. 11.
After that, in Sect. 12, we describe this limit in terms of certain recurrence relations, which also provide a useful means of calculating it numerically. Theorem 2(b) is established at the very end of the paper.
Most of Part IV could, if desired, be read independently of the rest of the paper.
3.4 Relation to previous work
Previous lower bounds for the a.s. behaviour of \(\Delta \) are contained in two papers of Maier and Tenenbaum [20, 22]. Both of these bounds can be understood within the framework of our paper.
The main result of [20] follows from the fact that
Indeed by Theorem 7 it then follows that \(\beta _2 \geqslant 1 - \frac{1}{\log 3}\), and then from Theorem 3 it follows that for almost every n we have
The exponent appearing here is \(0.28754048957\ldots \) and is exactly the one in [20, Theorem 2].
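As a numerical sanity check, the quoted exponent is recovered from \(\beta _2 = 1 - \frac{1}{\log 3}\) via the conversion \(\log 2/\log (1/\beta _2)\); this identity is our reading of the \(k = 2\) case of Theorem 3, and the snippet below only checks the arithmetic.

```python
import math

# the lower bound (3.9) gives beta_2 >= 1 - 1/log 3
beta_2 = 1 - 1 / math.log(3)
# the conversion from beta_2 to the almost-sure exponent for Delta(n)
exponent = math.log(2) / math.log(1 / beta_2)
print(f"beta_2   = {beta_2:.10f}")     # 0.0897607733...
print(f"exponent = {exponent:.10f}")   # 0.2875404895...
```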
The bound (3.9) is very easy to establish, and a useful exercise in clarifying the notation we have set up. Take \(k = 2\), \(r = 1\) and let \({\mathscr {V}}\) be the flag \(\langle {\textbf{1}}\rangle = V_0 \leqslant V_1 = {\mathbb {Q}}^2\). Let \({{\textbf{c}}}= (c_1, c_2)\) with \(c_1 = 1\) and
Let \(\mu _1\) be the measure which assigns weight \(\frac{1}{3}\) to the points \({\textbf{0}} = (0,0)\), (0, 1) and (1, 0) in \(\{0,1\}^2\) (this being a pullback of the uniform measure on \(\{0,1\}^2 / V_0\)).
There are only two subflags \({\mathscr {V}}'\) of \({\mathscr {V}}\), namely \({\mathscr {V}}\) itself and the basic flag \({\mathscr {V}}'_{{\text {basic}}(0)} : \langle {\textbf{1}}\rangle = V'_0 \leqslant V'_1\) with \(V'_0 = V'_1 = V_0 = \langle {\textbf{1}}\rangle \). The entire content of the strict entropy condition (3.5) is therefore that
which translates to
We have \({\mathbb {H}}_{\mu _1}(V_0) = \log 3\) and \(c_1 = 1\), and so this translates to precisely condition (3.11).
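The computation \({\mathbb {H}}_{\mu _1}(V_0) = \log 3\) can be checked in a couple of lines: the cosets of \(V_0 = \langle (1,1) \rangle \) in \({\mathbb {Q}}^2\) are indexed by the difference of the two coordinates, and \(\mu _1\) induces the uniform distribution on three such cosets (a minimal sketch; variable names are ours).

```python
from math import log

# mu_1 puts mass 1/3 on (0,0), (0,1), (1,0) in {0,1}^2, and
# V_0 = <1> is spanned by (1,1).  Two points x, y lie in the same
# coset of V_0 exactly when x - y is a multiple of (1,1), i.e. when
# x_2 - x_1 = y_2 - y_1, so that difference indexes the cosets.
mu_1 = {(0, 0): 1 / 3, (0, 1): 1 / 3, (1, 0): 1 / 3}

coset_mass = {}
for (x1, x2), m in mu_1.items():
    coset_mass[x2 - x1] = coset_mass.get(x2 - x1, 0.0) + m

# Shannon entropy of the induced coset distribution (Definition 3.3)
H = -sum(m * log(m) for m in coset_mass.values())
print(H, log(3))   # both equal log 3 = 1.0986...
```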
Remark
(a) With very little more effort (appealing to Lemma B.2) one can show that \(\gamma _2 = \beta _2 = {{\tilde{\gamma }}}_2 = 1 - \frac{1}{\log 3}\).
(b) This certainly does not provide a shorter proof of Theorem 3.10 than the one Maier and Tenenbaum gave, since our deductions are reliant on the material in Sects. 5 and 6, which constitute a significant elaboration of the ideas from [20].
The main result of [22] (Theorem 1.4 there) follows from the lower bound
which of course includes (3.9) as the special case \(r = 1\). Applying Theorem 7 and Theorem 3, then letting \(r \rightarrow \infty \), we recover [22, Theorem 1.4] (quoted as Theorem MT in Sect. 1), namely the bound
for almost all n. The exponent here is \(0.33827824168\ldots \).
To explain how (3.12) may be seen within our framework requires a little more setting up. Since it is not directly relevant to our main arguments, we defer this to Appendix C.
Part II. Equal sums and the optimisation problem
4 The upper bound \(\beta _k \leqslant \gamma _k\)
In this section we establish the bound in the title. We recall the definitions of \(\beta _k\) (Problem 1) and \(\gamma _k\) (Problem 3.7). We will in fact show a bit more, that if \(c>\gamma _k\) then
as \(D\rightarrow \infty \).
4.1 Venn diagrams and linear algebra
Let \(0< c < 1\) be some fixed quantity, and let D be a real number, large in terms of c. Suppose that \(A_1,\ldots , A_k \subset [D^c, D]\) are distinct sets. In this section we show that there is a rather natural way to associate a complete system \(({\mathscr {V}},{\textbf{c}},{\varvec{\mu }})\) (in the sense of Definition 3.2) to these sets. This system encodes the “linear algebra of the Venn diagram of the \(A_i\)” in a way that turns out to be extremely useful.
The Venn diagram of the \(A_i\) has \(2^k\) cells, indexed by \(\{0,1\}^k\) in a natural way. Thus for each \(\omega =(\omega _1,\ldots ,\omega _k)\in \{0,1\}^k\), we define
The flag \({\mathscr {V}}\). Set \(\Omega := \{ \omega : B_{\omega } \ne \emptyset \}\). We may put a total order \(\prec \) on \(\Omega \) by writing \(\omega ' \prec \omega \) if and only if \(\max B_{\omega '} < \max B_\omega \). We now select r special vectors \(\omega ^1,\ldots ,\omega ^r \in \Omega \), with \(r\leqslant k-1\), in the following manner. Let \(\omega ^1 = \max _{\prec }(\Omega {\setminus } \{\varvec{0}, \varvec{1} \} )\). Assuming we have chosen \(\omega ^1,\ldots ,\omega ^j\) such that \({\textbf{1}},\omega ^1,\ldots ,\omega ^j\) are linearly independent over \({\mathbb {Q}}\), let \(\omega ^{j+1} = \max ( \Omega {\setminus } {\text {Span}}(\varvec{1}, \omega ^1,\ldots , \omega ^j) )\), as long as such a vector exists.
Let \({\textbf{1}}, \omega ^1,\ldots , \omega ^r\) be the set of vectors produced when this algorithm terminates. By construction, \(\Omega \subset {\text {Span}}({\textbf{1}},\omega ^1,\ldots ,\omega ^r)\), or in other words \(B_\omega =\emptyset \) whenever \(\omega \in \{0,1\}^k{\setminus } {\text {Span}}({\textbf{1}},\omega ^1,\ldots ,\omega ^r)\).
Now define an r-step flag \({\mathscr {V}}: \langle {\textbf{1}}\rangle = V_0< V_1< \cdots < V_r\) by setting \(V_j := {\text {Span}}({\textbf{1}}, \omega ^1,\ldots , \omega ^j)\) for \(1 \leqslant j \leqslant r\).
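The greedy construction of the flag can be phrased in a few lines of code; the sketch below (names ours) works over \({\mathbb {Q}}\) with exact fractions. Note that \({\textbf{0}}\) and \({\textbf{1}}\) are skipped automatically, since both lie in \({\text {Span}}({\textbf{1}})\), and that ties in the ordering \(\prec \) cannot occur, because the cells are disjoint and so their maxima are distinct.

```python
from fractions import Fraction

def in_span(v, basis):
    """Decide whether v lies in the Q-span of `basis`, by exact
    Gaussian elimination with fractions."""
    pivots = []                          # pairs (reduced row, pivot column)
    for row in basis:
        row = list(map(Fraction, row))
        for prow, p in pivots:
            if row[p]:
                f = row[p] / prow[p]
                row = [a - f * b for a, b in zip(row, prow)]
        for p, a in enumerate(row):
            if a:
                pivots.append((row, p))
                break
    v = list(map(Fraction, v))
    for prow, p in pivots:
        if v[p]:
            f = v[p] / prow[p]
            v = [a - f * b for a, b in zip(v, prow)]
    return not any(v)

def select_omegas(cells):
    """Greedy selection of omega^1, ..., omega^r as in Sect. 4.1:
    run through the nonempty cells in decreasing order of max(B_omega),
    keeping each label that is linearly independent of the all-ones
    vector and of the labels already kept.  `cells` maps a 0-1 tuple
    omega to the nonempty set B_omega."""
    k = len(next(iter(cells)))
    chosen, basis = [], [(1,) * k]
    for w in sorted(cells, key=lambda w: max(cells[w]), reverse=True):
        if not in_span(w, basis):
            basis.append(w)
            chosen.append(w)
    return chosen

cells = {(1, 1, 1): {200}, (1, 0, 0): {100}, (0, 1, 0): {50}, (1, 1, 0): {30}}
print(select_omegas(cells))   # [(1, 0, 0), (0, 1, 0)]
```

In the toy example, \((1,1,1)\) is skipped (it lies in \(\langle {\textbf{1}} \rangle \)) and \((1,1,0)\) is skipped because it is the sum of the two labels already chosen, leaving \(r = 2\).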
The parameters \({\textbf{c}}\). Now we construct the parameters \({\textbf{c}} : 1 \geqslant c_1 \geqslant c_2 \geqslant \cdots \geqslant c_{r+1}\). For \(j = 1,\ldots , r\), we define
Thus
for \(j = 1,\ldots , r\). Also set \(c_{r+1}=c\). (The ceiling function \(\lceil \cdot \rceil \) produces a “coarse” or discretised set of possible thresholds \(c_i\), suitable for use in a union bound later on; see Lemma 4.2 below. The offset of \(-\log D\) is to ensure that \(c_1 \leqslant 1\).)
The measures \({\varvec{\mu }}\). Set
Define
with the convention that if the denominator vanishes, then \(\mu _j(\omega ) = 1_{\omega = {\textbf{0}}}\).
Remark
It is important that we use the \(B'_{\omega }\) here, rather than the \(B_{\omega }\), for technical reasons that will become apparent in the proof of Proposition 4.4 below.
Lemma 4.1
\(({\mathscr {V}},{\textbf{c}},{\varvec{\mu }})\) is a complete system (in the sense of Definition 3.2).
Proof
We need to check that \({\text {Supp}}(\mu _j) \subset V_j\) for \(j = 1,\ldots , r\). By definition, if \(\mu _j(\omega ) > 0\) then \(B_{\omega } \cap (D^{c_{j+1}}, D] \ne \emptyset \). This implies that \(\max B_{\omega } > D^{c_{j+1}}\). On the other hand, (4.4) implies that \(D^{c_{j+1}} \geqslant \max B_{\omega ^{j+1}}\), and thus \(\max B_{\omega } > \max B_{\omega ^{j+1}}\). By the construction of the vectors \(\omega ^i\), we must have \(\omega \in {\text {Span}}({\textbf{1}}, \omega ^1,\ldots , \omega ^j) = V_j\).
We also need to check that \({\mathscr {V}}\) is nondegenerate, also in the sense of Definition 3.2, that is to say \(V_r\) is not contained in any hyperplane \(\{\omega \in {\mathbb {Q}}^k : \omega _i = \omega _j\}\). This follows immediately from the fact that the \(A_i\) are distinct: if \(A_i \ne A_j\), then some element of \(A_i \cup A_j\) lies in exactly one of the two sets, and so there is certainly some \(\omega \) with \(\omega _i \ne \omega _j\) and \(B_\omega \ne \emptyset \). \(\square \)
Note that, in addition to the system \(({\mathscr {V}},{\textbf{c}},{\varvec{\mu }})\), the procedure described above outputs a sequence \(\omega ^1,\ldots , \omega ^r\) of elements of \(\{0,1\}^k\). We call the ensemble consisting of the system and the \(\omega ^i\) the linear data associated to \(A_1,\ldots , A_k\). We will only consider the event \({\textbf{A}}\in {\mathcal {E}}\), where
By Lemma A.5, \({\mathbb {P}}({\textbf{A}}\in {\mathcal {E}})=1-o(1)\) as \(D\rightarrow \infty \). In particular, if \(A \in {\mathcal {E}}\), we have \(|A \cap [D^c,D]| \leqslant 2\log D\) for large enough D.
Lemma 4.2
Fix \(k\in {\mathbb {Z}}_{\geqslant 2}\) and suppose that \(A \in {\mathcal {E}}\). The number of different ensembles of linear data arising from distinct sets \(A_1,\ldots , A_k \subset A\) is \(\ll (\log D)^{O(1)}\).
Proof
The number of choices for \(\omega ^1,\ldots , \omega ^r\) is O(1), and hence the number of \({\mathscr {V}}\) is also \(O_k(1)\). The thresholds \(c_j\) are drawn from a fixed set of size \(\log D\), and the numerators and denominators of the \(\mu _j(\omega )\) are all integers \(\leqslant 2\log D\). \(\quad \square \)
Remark 4.1
The O(1) and the \(\ll \) here both depend on k. However we regard k as fixed here and do not indicate this dependence explicitly. If one is more careful then one can obtain results that are effective up to about \(k \sim \log \log D\).
4.2 A local-to-global estimate
Our next step towards establishing the bound \(\beta _k \leqslant \gamma _k\) is to pass from the “local” event that a random logarithmic set \({\textbf{A}}\) possesses a k-tuple of equal subsums \((\sum _{a\in A_1}a,\ldots ,\sum _{a\in A_k}a)\) to the “global” distribution of such subsums (with the subtlety that we must mod out by \({\textbf{1}}\)). The latter is controlled by the set \({\mathscr {L}}_{{\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }}}({\textbf{A}})\) defined below.
Definition 4.3
Given a set of integers A and a system \(({\mathscr {V}},{\textbf{c}},{\varvec{\mu }})\), we write \( {\mathscr {L}}_{{\mathscr {V}},{\textbf{c}},{\varvec{\mu }}}(A)\) for the set of vectors
where \((B_\omega )_{\omega \in \{0,1\}^k}\) runs over all partitions of A such that
Proposition 4.4
Fix an integer \(k\geqslant 2\) and a parameter \(0<c<1\). Let D be large in terms of c and k, and let \({\textbf{A}}\subset [D^c,D]\) be a logarithmic random set. Let
Then we have
Here, the supremum is over all complete systems \(({\mathscr {V}},{\textbf{c}},{\varvec{\mu }})\) with \(c_{r+1}=c\).
Proof
Recall the definition of the set \({\mathcal {E}}\), given in Eq. (4.7). We have
where, given linear data \(\{({\mathscr {V}},{\textbf{c}},{\varvec{\mu }}), \omega ^1,\ldots , \omega ^r\}\), we write \({\mathscr {S}}({\mathscr {V}},{\textbf{c}},{\varvec{\mu }},(\omega ^i))\) to denote the set of all \(A\in {\mathcal {E}}\) that have \(k\) distinct subsets \((A_1,\ldots ,A_k)\) with equal sums-of-elements and associated linear data \(\{({\mathscr {V}},{\textbf{c}},{\varvec{\mu }}),\omega ^1,\ldots , \omega ^r\}\). (The set \({\textbf{A}}\) appearing in (4.10) will be constructed below by removing certain elements from the logarithmic set \({\textbf{A}}\) we started with; this new set belongs to \({\widetilde{{\mathcal {E}}}}\), but not necessarily to \({\mathcal {E}}\).)
Let us fix a choice of linear data \(\{({\mathscr {V}},{\textbf{c}},{\varvec{\mu }}), \omega ^1,\ldots , \omega ^r\}\), and let us write \({\mathscr {S}}\) as an abbreviation for the set \({\mathscr {S}}({\mathscr {V}},{\textbf{c}},{\varvec{\mu }},(\omega ^i))\). An elementary probability calculation gives
For each \(A \in {\mathscr {S}}\), fix a choice of \((A_1,\ldots ,A_k)\) with equal sums and such that the linear data associated to \((A_1,\ldots , A_k)\) is \(\{ ({\mathscr {V}}, {\textbf{c}}, {\varvec{\mu }}), \omega ^1,\ldots , \omega ^r\}\). Let \(B_\omega \) be the cells of the Venn diagram corresponding to the \(A_i\), as in (4.2), and then define the \(B_\omega '\) as in (4.5). Recall that (4.6) holds, and define \(K_j = \max B_{\omega ^j}\) for \(1 \leqslant j\leqslant r\). In particular, \(K_1> \cdots > K_r\). Let \(A' = A {\setminus } \{K_1,\ldots ,K_r\}\). Note that \(A'\in {\tilde{{\mathcal {E}}}}\) if D is large enough in terms of k. Moreover, we have
Therefore, the equal sums condition is equivalent to
and hence
Since \({\textbf{1}}, \omega ^1,\ldots , \omega ^r\) are linearly independent, the value of the right-hand side of (4.12) uniquely determines the numbers \(K_j\), which themselves uniquely determine A in terms of the sets \(B_\omega '\). Therefore, given \(A' \in {\tilde{{\mathcal {E}}}}\), the number of possible sets A is, by Definition 4.3, at most \(|{\mathscr {L}}_{{\mathscr {V}},{\textbf{c}},{\varvec{\mu }}}(A')|\). Moreover by (4.4) we have \(K_j > \frac{1}{e} D^{c_j}\) for every j, and therefore
We sum over \(A'\), and reinterpret the product on the right-hand side of (4.13) in terms of \({\mathbb {P}}({\textbf{A}}=A')\). This gives
By Lemma 4.2 there are \((\log D)^{O(1)}\) possible choices for the linear data \(\{({\mathscr {V}},{\textbf{c}},{\varvec{\mu }}), \omega ^1,\ldots , \omega ^r\}\), and the proof is complete. \(\square \)
4.3 Upper bounds in terms of entropies
Having established Proposition 4.4, we turn to the study of the sets \({\mathscr {L}}_{{\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }}}(A)\). We will bound their cardinality in terms of the quantities \(\textrm{e}({\mathscr {V}}',{\textbf{c}},\mathbf {\mu })\) from Definition 3.2 with \({\mathscr {V}}'\) a subflag of \({\mathscr {V}}\).
Lemma 4.5
Let \(({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\) be a system and let \(A\in {\tilde{{\mathcal {E}}}}\), where \({\tilde{{\mathcal {E}}}}\) is defined in (4.9). Then, for any subflag \({\mathscr {V}}'\) of \({\mathscr {V}}\),
Remark
The implied constant in the \(\ll _{{\mathscr {V}}'}\) could be made explicit if desired (in terms of the quantitative rationality of a basis for the spaces in \({\mathscr {V}}'\)) but we have no need to do this.
Proof of Lemma 4.5
Given a set \(X\subset [D^c,D]\), write \(X^{(j)}:=X\cap (D^{c_{j+1}},D^{c_j}]\) for \(j = 1,\ldots , r\). Throughout the proof, we will assume that A is a set of integers and that \((B_{\omega })_{\omega \in \{0,1\}^k}\) runs over all partitions of A such that (4.8) is satisfied. In our new notation, this may be rewritten as
For each j, \(1 \leqslant j \leqslant r\), fix a linear projection \(P_j : V_j \rightarrow V'_j\), and set \(Q_j := {\text {id}}_{V_j} - P_j\), so that \(Q_j\) maps \(V_j\) to itself. Set
and
Since
it follows immediately from the definition of \({\mathscr {L}}_{{\mathscr {V}},{\textbf{c}},{\varvec{\mu }}}(A)\) (Definition 4.3) that
We claim that
and that
These bounds, substituted into (4.16), immediately imply Lemma 4.5.
It remains to establish (4.17) and (4.18), which are proven in quite different ways. We begin with (4.18), which is a “combinatorial” bound, in that there cannot be too many choices for the data making up the sums in \({\mathscr {L}}^Q(A)\). For this, observe that \(Q_j\) vanishes on \(V'_j\) and hence is constant on cosets of \(V'_j\). Therefore the elements of \({\mathscr {L}}^Q(A)\) are determined by the sets \(\bigcup _{\omega \in v_j + V'_j} B^{(j)}_{\omega }\), over all \(v_j \in V_j/V'_j\) and \(1 \leqslant j \leqslant r\). By (4.15),
and by Lemma B.1 the number of ways of partitioning \(A^{(j)}\) into sets of these sizes is bounded above by \(e^{{\mathbb {H}}({\textbf{p}}^{(j)}) |A^{(j)}|}\), where \({\textbf{p}}^{(j)} = (\mu _j(v_j + V'_j))_{v_j \in V_j/V'_j}\). By Definition 3.3, \({\mathbb {H}}({\textbf{p}}^{(j)}) = {\mathbb {H}}_{\mu _j}(V'_j)\). Taking the product over \(j = 1,\ldots , r\) gives
From the assumption that \(A\in {\widetilde{{\mathcal {E}}}}\), where \({\widetilde{{\mathcal {E}}}}\) is defined in (4.9), we have
Using this, and the trivial bound \({\mathbb {H}}_{\mu _j}(V'_j) \leqslant \log |{\text {Supp}}(\mu _j)| \leqslant \log (2^k)\), (4.18) follows.
Now we prove (4.17), which is a “metric” bound, the point being that none of the sums in \({\mathscr {L}}^P(A)\) can be too large in an appropriate sense. Pick a basis for \({\mathbb {Q}}^k\) adapted to \({\mathscr {V}}'\): that is, a basis \(e_1,\ldots , e_k\) such that \(V'_j = {\text {Span}}(e_1,\ldots , e_{\dim V'_j})\) for each j, and \(e_1 = {\textbf{1}}\). There are positive integers \(M,N = O_{{\mathscr {V}}',{\mathscr {V}}}(1)\) such that, in this basis, the \(e_i\)-coordinates of \(P_j(\omega )\) are all rationals with denominator M and absolute value at most N.
Now for fixed j and \(\omega \), if D is large then \( \sum _{a \in B_{\omega }^{(j)}} a \leqslant D^{c_j} \log D, \) since \(B_{\omega }^{(j)} \subset (D^{c_{j+1}}, D^{c_j}]\) and by the assumption that \(A\in {\widetilde{{\mathcal {E}}}}\). Thus
and so the expression \( \sum \limits _{j=1}^r \sum \limits _{{\omega \in \{0,1\}^k \cap V_j}} P_j(\omega ) \sum _{a \in B_{\omega }^{(j)}} a \) belongs to the set
We must bound the number of different values that the expression \(\sum _{i=1}^k x_ie_i\) can take mod \({\textbf{1}}\) when the coefficients \(x_1,\ldots ,x_k\) are as above. Since \(e_1 = {\textbf{1}}\) and \(x_1 M\in {\mathbb {Z}}\), given \(x_2,\ldots ,x_k\) there are at most M possibilities for \(x_1\) mod \({\textbf{1}}\). In addition, there are
possibilities for \(x_2,\ldots ,x_k\), thereby concluding the proof of (4.17) and hence of Lemma 4.5. \(\square \)
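The combinatorial bound (4.18) above rests, via Lemma B.1, on the standard fact that a set of size \(n\) admits at most \(e^{n{\mathbb {H}}({\textbf{p}})}\) partitions into labelled parts of sizes \(p_i n\). The following snippet (our formulation of the inequality, not a restatement of Lemma B.1) checks this numerically in a small case.

```python
from math import comb, exp, log

def multinomial(sizes):
    """Number of partitions of a set of size sum(sizes) into labelled
    parts of the prescribed sizes (a multinomial coefficient)."""
    n, total = sum(sizes), 1
    for s in sizes:
        total *= comb(n, s)
        n -= s
    return total

def entropy_bound(sizes):
    """The bound exp(n * H(p)) with p_i = sizes_i / n."""
    n = sum(sizes)
    return exp(-n * sum(s / n * log(s / n) for s in sizes if s))

# 10! / (5! 3! 2!) = 2520, comfortably below the entropy bound
print(multinomial((5, 3, 2)), entropy_bound((5, 3, 2)))
```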
A potential problem with applying Lemma 4.5 is that there may be infinitely many subflags \({\mathscr {V}}'\) to consider, and the constant implied by the \(\ll \)-symbol depends on \({\mathscr {V}}'\). As we shall see in the next lemma, however, we may reduce the problem to the consideration of finitely many subflags, a reduction which will be used in several parts of this paper.
Lemma 4.6
For a given k, the set of all flags
may be partitioned into \(O_k(1)\) equivalence classes such that any two flags \(\mathscr {V}',\mathscr {V}''\) in the same equivalence class satisfy \(\dim V'_j = \dim V''_j\) for all j, and for any thresholds \({{\textbf{c}}}\) satisfying \(c_1\geqslant c_2 \geqslant \cdots \geqslant c_{r+1}\) and probability measures \({\varvec{\mu }}\) supported on \(\{0,1\}^k\), we have \({\mathbb {H}}_{\mu _j}(V'_j) = {\mathbb {H}}_{\mu _j}(V''_j)\) for all j and \(\textrm{e}({\mathscr {V}}',{\textbf{c}}, {\varvec{\mu }}) = \textrm{e}({\mathscr {V}}'',{\textbf{c}}, {\varvec{\mu }})\).
Proof
We say that two subflags \({\mathscr {V}}',{\mathscr {V}}''\) are equivalent if \(V'_j, V''_j\) have the same intersection with \(\{0,1\}^k\) and \(\dim V'_j = \dim V''_j\), for all \(j = 1,\ldots , r\). There are clearly only \(O_k(1)\) equivalence classes, and the desired properties hold for members of the same equivalence class by the definition of \({\mathbb {H}}_{\mu _j}(V'_j)\) and \(\textrm{e}(\mathscr {V}',{{\textbf{c}}},{\varvec{\mu }})\). \(\square \)
Armed with Lemma 4.6, we immediately obtain from Lemma 4.5, applied to one representative from each class, the following corollary.
Corollary 4.7
Let \(({\mathscr {V}},{\textbf{c}},{\varvec{\mu }})\) be a system and suppose that \(A\in {\tilde{{\mathcal {E}}}}\). Then
4.4 The upper bound in Theorem 7
We can now establish the upper bound in Theorem 7, that is to say the inequality \(\beta _k \leqslant \gamma _k\).
We start by applying Proposition 4.4. Together with Lemma A.5, it implies that
Here, the supremum is over complete systems \(({\mathscr {V}},{\textbf{c}},{\varvec{\mu }})\) with \(c_{r+1} = c\), and we made the observation that for such systems we have
an immediate consequence of the definition of \(\textrm{e}({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\) and the fact that \({\mathbb {H}}_{\mu _j}(V_j) = 0\) for all j and that \(\dim V_j = j+1\). Thus we may apply Corollary 4.7, concluding that
where
the supremum is over all complete systems \(({\mathscr {V}},{\textbf{c}},{\varvec{\mu }})\) with \(c_{r+1} = c\), and the minimum is over all subflags \(\mathscr {V}' \leqslant {\mathscr {V}}\). Note that the minimum exists by Lemma 4.6, since we may restrict attention to a finite set of subflags \(\mathscr {V}'\). Moreover, the supremum is realised, meaning there is a system \(({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\) for which the right side of (4.19) equals \(\theta \). Indeed, there are O(1) choices for \({\mathscr {V}}\), and with \({\mathscr {V}}\) fixed the quantities \({\textbf{c}},{\varvec{\mu }}\) range over compact subsets of Euclidean space, with the right side of (4.19) continuous in these variables.
Now, if we assume that \(c>\gamma _k\), then the definition of \(\gamma _k\) in Problem 3.7 implies that there is no system \(({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\) with \(c_{r+1}=c\) and that satisfies the entropy condition (3.4). Equivalently, if \(c_{r+1}=c\), then \(\min _{{\mathscr {V}}'\leqslant {\mathscr {V}}} \big (\textrm{e}({\mathscr {V}}', {\textbf{c}}, {\varvec{\mu }}) - \textrm{e}({\mathscr {V}},{\textbf{c}},{\varvec{\mu }}))\big )<0\). In particular, we have \(\theta <0\). We have thus established (4.1), as required.
Remark
In the above proof, \(({\mathscr {V}},{\textbf{c}},{\varvec{\mu }})\) is a complete system. However, for other aspects of our problem it is not natural to focus on the completeness condition, for which reason we omit it from the definition of \(\gamma _k\).
5 The lower bound \(\beta _k \geqslant {{\tilde{\gamma }}}_k\)
5.1 Introduction and simple reductions
The aim of this section and the next is to establish the lower bound \(\beta _k \geqslant {{\tilde{\gamma }}}_k\). We begin, in Lemma 5.3 below, by showing that we may restrict our attention to certain systems satisfying some additional regularity conditions.
We isolate from the proof a “folklore” lemma for which it is not easy to find a good reference. The authors thank Carla Groenland for a helpful conversation on this topic.
Lemma 5.1
Let V be a subspace of \({\mathbb {Q}}^k\). Then \(\# (V \cap \{0,1\}^k) \leqslant 2^{\dim V}\).
Proof
We outline two quite different short proofs. Let \(d := \dim V\).
Proof 1. We claim that there is a projection from \({\mathbb {Q}}^k\) onto some set of d coordinates which is injective on V. From this, the result is obvious, since the image of \(\{0,1\}^k\) under any such projection has size \(2^d\). To prove the claim, let \(e_1,\ldots ,e_n\) denote the standard basis of \({\mathbb {Q}}^n\). Note that if \(W \leqslant {\mathbb {Q}}^n\) and if none of the quotient maps \({\mathbb {Q}}^n \rightarrow {\mathbb {Q}}^n/\langle e_i\rangle \) is injective on W, then W must contain a multiple of each \(e_i\), and therefore \(W = {\mathbb {Q}}^n\). Thus if W is a proper subspace of \({\mathbb {Q}}^n\) then there is a projection onto some set of \((n-1)\) coordinates which is injective on W. Repeated use of this fact establishes the claim.
Proof 2. Suppose that \(V \cap \{0,1\}^k\) contains \(2^d + 1\) points. These points remain distinct under the natural ring homomorphism \(\pi : {\mathbb {Z}}^k \rightarrow {\mathbb {F}}_2^k\), and so their images cannot lie in a subspace (over \({\mathbb {F}}_2\)) of dimension d. Hence there are \(v_1,\ldots , v_{d+1} \in V\) such that \(\pi (v_1),\ldots , \pi (v_{d+1})\), are linearly independent over \({\mathbb {F}}_2\). The \((d +1) \times k\) matrix formed by these \(\pi (v_i)\) therefore has a \((d+1) \times (d+1)\)-subminor which is nonzero in \({\mathbb {F}}_2\). The corresponding subminor of the matrix formed by the \(v_i\) is therefore an odd integer, and in particular not zero. This means that \(v_1,\ldots , v_{d+1}\) are linearly independent over \({\mathbb {Q}}\), contrary to the assumption that \(\dim (V)=d\). \(\square \)
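Lemma 5.1 is easily tested by brute force in small dimensions. The sketch below (names ours) decides membership \(x \in V\) by checking that appending \(x\) to a spanning set does not increase the rank, computed exactly over \({\mathbb {Q}}\); for \(V = {\text {Span}}((1,1,1,1),(1,1,0,0))\) the bound is attained with equality.

```python
from fractions import Fraction
from itertools import product

def rank_Q(rows):
    """Rank over Q, by exact Gaussian elimination with Fractions."""
    rows = [list(map(Fraction, r)) for r in rows if any(r)]
    rk = 0
    for c in range(len(rows[0]) if rows else 0):
        piv = next((i for i in range(rk, len(rows)) if rows[i][c]), None)
        if piv is None:
            continue
        rows[rk], rows[piv] = rows[piv], rows[rk]
        for i in range(len(rows)):
            if i != rk and rows[i][c]:
                f = rows[i][c] / rows[rk][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[rk])]
        rk += 1
    return rk

def count_cube_points(k, basis):
    """Return (#(V ∩ {0,1}^k), dim V) for V = span(basis): a point x
    lies in V exactly when appending it does not increase the rank."""
    d = rank_Q(basis)
    return sum(1 for x in product((0, 1), repeat=k)
               if rank_Q(basis + [list(x)]) == d), d

count, d = count_cube_points(4, [[1, 1, 1, 1], [1, 1, 0, 0]])
assert count == 4 and d == 2 and count <= 2 ** d
```

Here the four 0-1 points are \(0\), \((1,1,1,1)\), \((1,1,0,0)\) and \((0,0,1,1)\), the last arising from the rational combination \((1,1,1,1) - (1,1,0,0)\).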
We now record an immediate corollary of Lemma 4.6, which provides a “gap condition” on the \(\textrm{e}\)-quantities.
Lemma 5.2
If the system \(({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\) satisfies (3.5) then there is an \(\varepsilon >0\) such that for all proper subflags \({\mathscr {V}}'\),
For future reference, the next two lemmas record more information about optimal systems for \({\tilde{\gamma }}_k\) and for \(\gamma _k\), respectively.
Lemma 5.3
Let \(k\in {\mathbb {Z}}_{\geqslant 2}\). We have that \({\tilde{\gamma }}_k\) is the supremum of all \(c>0\) for which there is a system \(({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\) such that \(c_{r+1}=c\), (3.5) holds and we further have:
-
(a)
\(1=c_1> c_2> \cdots >c_{r+1}=c\);
-
(b)
\({\mathbb {H}}_{\mu _j}(V_{j-1}) > \dim (V_j/V_{j-1})\) for \(1\leqslant j\leqslant r-1\) and
$$\begin{aligned} {\mathbb {H}}_{\mu _r}(V_{r-1}) > \frac{c_r}{c_r-c_{r+1}} \dim (V_r/V_{r-1}); \end{aligned}$$ -
(c)
\(\dim (V_1/V_0)=1\);
-
(d)
\({\text {Supp}}(\mu _j) = V_j \cap \{0,1\}^k\) for \(j=1,2,\ldots ,r\);
-
(e)
for all j and \(\omega \), \(\mu _j(\omega )=\mu _j({\textbf{1}}-\omega )\).
Proof
First of all, we show that we may assume that \(c>0\) and that statement (d) holds. Indeed, if a system \(({\mathscr {V}},{\varvec{\mu }},{{\textbf{c}}})\) satisfies (3.5), then Lemma 5.2 implies that (5.1) holds for some \(\varepsilon >0\). As the difference between the left and right sides of (5.1) is continuous in the quantities \(c_j\) and \(\mu _j(\omega )\), we may increase \(c_{r+1}\) (and possibly some of the other \(c_j\)’s) a tiny bit and we may also adjust the measures \(\mu _j\) by a small amount, so that \(c_{r+1}>0\), statement (d) holds, and we also have that
for every proper subflag \(\mathscr {V}'\).
Next, we show that we may take \(c_1=1\). Indeed, condition (3.5) implies that \(\textrm{e}({\mathscr {V}}',{{\textbf{c}}},{\varvec{\mu }})\geqslant \textrm{e}({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\geqslant 0\) for all \({\mathscr {V}}'\leqslant {\mathscr {V}}\) by (3.3). Now if \(c_1<1\) and \({\tilde{c}}_j=c_j/c_1\) for each j, then the rescaled system \(({\mathscr {V}},{\tilde{{{\textbf{c}}}}},{\varvec{\mu }})\) has a larger value of \(c_{r+1}\), and moreover also satisfies (3.5), since for any subflag \(\mathscr {V}'\) we have
Next, consider a system \(({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\) satisfying \(c_1=1\) and \(c_{r+1}=c>0\), and consider the subflag \({\mathscr {V}}' : \langle {\textbf{1}}\rangle = V'_0 \leqslant V_1'\leqslant \cdots \leqslant V_r'\), where \(V_i' = V_i\) for \(i\ne j\), and \(V_{j}'=V_{j-1}\); that is, \({\mathscr {V}}'\) has two consecutive copies of \(V_{j-1}\). By assumption (Definition 3.2), we have \(V_{j-1}\ne V_j\), and thus \({\mathscr {V}}'\) is a proper subflag of \({\mathscr {V}}\). Thus
Since the left-hand side is positive, we conclude that (a) and (b) hold.
(c) Let \(d=\dim (V_1/V_0)\). By Lemma 5.1, \(|V_1 \cap \{0,1\}^k| \leqslant 2^{\dim V_1} = 2^{d+1}\) and hence \(\mu _1\) is supported on at most \(2^{d+1}-1\) cosets of \(V_0\) (since \({\textbf{1}} \in V_0\), the points \({\textbf{0}}\) and \({\textbf{1}}\) lie in the same coset). In particular, by Lemma B.2, \({\mathbb {H}}_{\mu _1}(V_0) \leqslant \log (2^{d+1}-1)\). On the other hand, \({\mathbb {H}}_{\mu _1}(V_0) > d\) by statement (b). We must thus have \(d = 1\), which is exactly statement (c).
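The dimension count in the proof of (c) can be verified numerically. Assuming, as the comparison of \({\mathbb {H}}_{\mu _1}(V_0)\) with \(\dim (V_1/V_0)\) suggests, that entropy is measured in natural-log units, the two bounds \(d < {\mathbb {H}}_{\mu _1}(V_0) \leqslant \log (2^{d+1}-1)\) are compatible only for \(d=1\) (illustrative check, not from the paper):

```python
import math

# statement (b) forces H_{mu_1}(V_0) > d, while the coset count gives
# H <= log(2^{d+1} - 1); both can hold only when d < log(2^{d+1} - 1)
feasible = [d for d in range(1, 50) if d < math.log(2 ** (d + 1) - 1)]
print(feasible)  # only d = 1 survives
```

Indeed \(\log 3 \approx 1.0986 > 1\), while already \(\log 7 \approx 1.9459 < 2\), and for \(d \geqslant 3\) we have \(\log (2^{d+1}-1) < (d+1)\log 2 < d\).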
(e) Assume the system \(({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\) satisfies (3.5) and (a). For every j and \(\omega \in V_j\), we define
We then consider the system \(({\mathscr {V}},{{\textbf{c}}},{\tilde{{\varvec{\mu }}}})\), and must show that it also satisfies (3.5). For this, it is enough to show that
for all j. Indeed, we then have, for every proper subflag \(\mathscr {V}'\),
To prove (5.2), write
where the sum is over all cosets C of \(V'_j\) and \(L(t) = -t \log t\). Thus, since \(-C\) runs over all cosets as C does, we have
By the concavity of L, we have
Claim (5.2) then readily follows. \(\square \)
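The symmetrization step in the proof of (e) rests on the concavity of \(L(t) = -t\log t\): averaging \(\mu _j\) with its reflection \(\omega \mapsto {\textbf{1}}-\omega \) cannot decrease entropy. The sketch below (illustrative, not from the paper; it uses full Shannon entropy on points in place of the conditional coset entropy \({\mathbb {H}}_{\mu _j}(V'_j)\), an assumption made for simplicity) checks this on random measures on \(\{0,1\}^3\):

```python
import math
import random

def L(t):
    # L(t) = -t log t, with L(0) = 0; concave on [0, 1]
    return 0.0 if t == 0 else -t * math.log(t)

def entropy(mu):
    return sum(L(p) for p in mu.values())

def symmetrize(mu):
    # mu-tilde(omega) = (mu(omega) + mu(1 - omega)) / 2
    flip = lambda w: tuple(1 - x for x in w)
    keys = set(mu) | {flip(w) for w in mu}
    return {w: (mu.get(w, 0.0) + mu.get(flip(w), 0.0)) / 2 for w in keys}

random.seed(1)
for _ in range(200):
    support = [tuple(random.randint(0, 1) for _ in range(3)) for _ in range(4)]
    weights = [random.random() for _ in support]
    total = sum(weights)
    mu = {}
    for w, x in zip(support, weights):
        mu[w] = mu.get(w, 0.0) + x / total
    # concavity of L => symmetrization cannot decrease the entropy
    assert entropy(symmetrize(mu)) >= entropy(mu) - 1e-12
```

The inequality holds because \(\omega \mapsto {\textbf{1}}-\omega \) is a bijection, so \(H(\mu \circ \text {flip}) = H(\mu )\) and concavity gives \(H({\tilde{\mu }}) \geqslant \tfrac12 H(\mu ) + \tfrac12 H(\mu \circ \text {flip})\).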
Lemma 5.4
Let \(k\in {\mathbb {Z}}_{\geqslant 2}\) be such that \(\gamma _k>0\). Then we have that \(\gamma _k\) is the supremum of all \(c>0\) for which there is a system \(({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\) such that \(c_{r+1}=c\), (3.4) holds and we further have:
-
(a)
\(1=c_1> c_2> \cdots >c_{r+1}=c\);
-
(b)
\({\mathbb {H}}_{\mu _j}(V_{j-1}) \geqslant \dim (V_j/V_{j-1})\) for \(1\leqslant j\leqslant r-1\) and
$$\begin{aligned} {\mathbb {H}}_{\mu _r}(V_{r-1}) \geqslant \frac{c_r}{c_r-c_{r+1}} \dim (V_r/V_{r-1}) ; \end{aligned}$$ -
(c)
\(\dim (V_1/V_0)=1\);
-
(d)
\(\bigcup _{i=1}^j {\text {Supp}}\mu _i\) spans \(V_j\) for \(j=1,2,\ldots ,r\);
-
(e)
for all j and \(\omega \), \(\mu _j(\omega )=\mu _j({\textbf{1}}-\omega )\).
Remark
As we will see in Part IV, we always have \(\gamma _k>0\).
Proof
The proof that we may take \(c_1=1\) is the same as in Lemma 5.3.
Next, consider a system \(({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\) satisfying \(c_1=1\) and \(c_{r+1}=c>0\), and consider the subflag \({\mathscr {V}}' : \langle {\textbf{1}}\rangle = V'_0 \leqslant V_1'\leqslant \cdots \leqslant V_r'\), where \(V_i' = V_i\) for \(i\leqslant r-1\), and \(V_r'=V_{r-1}\). Thus
Since the left-hand side is \(\geqslant 0\) and we have assumed that \(c_{r+1}=c>0\) and that \(V_{r-1}\ne V_r\), the latter being true from Definition 3.2, we conclude that
This proves part of statements (a) and (b). We shall now prove them fully.
(a) There are always indices \(1=i_1<i_2<\cdots<i_s<i_{s+1}=r+1\) such that
Crucially, note that \(i_{s+1}=r+1\) because \(c_r>c_{r+1}\) by (5.3). Next, we define the system \(({\mathscr {W}},\varvec{\nu },{\textbf{d}})\), where \({\mathscr {W}}\) is an s-step flag and, for all \(j\in \{1,\ldots ,s\}\), we have
In particular, \(W_s=V_{i_{s+1}-1}=V_r\) because \(i_{s+1}=r+1\), and thus \({\mathscr {W}}\) is a non-degenerate flag system as per Definition 3.2 (b). Clearly, \(1=d_1>d_2>\cdots>d_s>d_{s+1}=c\), so in order to prove part (a), all that remains to show is that the system \(({\mathscr {W}},\varvec{\nu },{\textbf{d}})\) satisfies the entropy condition (3.4). This follows by a simple computation. Indeed, let \({\mathscr {W}}'\) be a subflag of \({\mathscr {W}}\). We then define \({\mathscr {V}}'\leqslant {\mathscr {V}}\) by letting \(V_m'=W_j\) whenever \(i_j\leqslant m<i_{j+1}\). Hence,
Consequently, since the system \(({\mathscr {V}},{\varvec{\mu }},{{\textbf{c}}})\) satisfies condition (3.4), so does \(({\mathscr {W}},\varvec{\nu },{\textbf{d}})\). This proves that we may always assume condition (a).
(b) Consider a system \(({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\) satisfying (a). We then argue as in Lemma 5.3, by considering the subflag \({\mathscr {V}}'\) with \(V_i' = V_i\) for \(i\ne j\), and \(V_{j}'=V_{j-1}\). We then have
Since the left-hand side is \(\geqslant 0\) and \(c_j-c_{j+1}>0\) for all \(j=1,\ldots ,r\), statement (b) follows.
(c) Assuming statement (b), we may prove statement (c) by arguing as in Lemma 5.3.
(d) Suppose that (a) holds. Consider the flag \({\mathscr {V}}': \langle {\textbf{1}}\rangle \leqslant V_1'\leqslant \cdots \leqslant V_r'\), where
It is easy to see from the definition of a system (Definition 3.2) that \({\mathscr {V}}'\) is a subflag of \({\mathscr {V}}\). We have \({\mathbb {H}}_{\mu _j}(V'_j)=0\) for all j, and hence
by (3.4). Since \(c_i-c_{i+1}>0\) for all \(i\leqslant r-1\), and \(c_r>c_{r+1}\geqslant 0\), we must have that \(V_i'=V_i\) for all i, which is precisely statement (d).
(e) This statement is proven as in Lemma 5.3. \(\square \)
The bound \(\beta _k \geqslant {\tilde{\gamma }}_k\) will follow from the next proposition, provided we can show that the quantity \({\tilde{\gamma }}_k\) is well-defined and positive. The latter will be accomplished in Sect. 9, where we construct a system satisfying the strict entropy condition (3.5). An alternative construction is given in Appendix C.
As usual, \({\textbf{A}}\) is a logarithmic random set.
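The logarithmic random set model (each integer i lies in \({\textbf{A}}\) independently with probability 1/i) is easy to simulate; in particular, \({\mathbb {E}}|{\textbf{A}}\cap (D^c,D]| = \sum _{D^c < i \leqslant D} 1/i \approx (1-c)\log D\). The following sketch (illustrative code, with parameter values chosen for the demonstration, not from the paper) checks this numerically:

```python
import math
import random

def logarithmic_count(lo, hi, rng):
    # each integer i lies in the random set A independently with probability 1/i;
    # return |A ∩ (lo, hi]|
    return sum(1 for i in range(lo + 1, hi + 1) if rng.random() < 1 / i)

rng = random.Random(0)
D, c = 10 ** 4, 0.25
trials = 200
mean = sum(logarithmic_count(int(D ** c), D, rng) for _ in range(trials)) / trials
print(mean, (1 - c) * math.log(D))  # the empirical mean should be close to (1-c) log D
```

With these parameters \((1-c)\log D \approx 6.9\), so on average only a handful of elements of \({\textbf{A}}\) land in \((D^c,D]\), which is why extracting many equal subset sums from them is delicate.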
Proposition 5.5
Let \(c>0\) and suppose that there is a system \(({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\) such that:
-
(i)
\(1=c_1> c_2> \cdots >c_{r+1}=c\);
-
(ii)
There is some \(\varepsilon >0\) such that \(\textrm{e}({\mathscr {V}}',{{\textbf{c}}},{\varvec{\mu }}) \geqslant \textrm{e}({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }}) + \varepsilon \) for all proper subflags \({\mathscr {V}}'\) of \({\mathscr {V}}\).
-
(iii)
\({\text {Supp}}(\mu _j) = V_j \cap \{0,1\}^k\) for \(j=1,2,\ldots ,r\).
Let \(\delta >0\), and assume that D is large enough in terms of \(\delta ,\varepsilon \) and \(({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\). Then the probability that \({\textbf{A}}\cap [D^c,D]\) has k distinct subsets with equal sums is \(\geqslant 1-\delta \).
The proof of Proposition 5.5 is perhaps the most difficult part of this paper, and will occupy this section and the next. Throughout the remainder of this section and throughout the next one, we fix a system \(({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\) with \(c_{r+1}=c\) satisfying conditions (i)–(iii) of Proposition 5.5. Constants implied by the \(O\)- and \(\ll \)-symbols may depend on this system.
The main result, which we will prove in this section and the next, is Proposition 5.7 below.
Definition 5.6
(Nondegenerate maps) A map \(\psi : X \rightarrow \{0,1\}^k\) is said to be nondegenerate if the image of \(\psi \) is not contained in any of the subspaces \(\{x\in {\mathbb {Q}}^k : x_i=x_j\}\).
The map \(\psi \) is a “Venn diagram selection function”: the value of \(\psi (a)\) specifies the piece of the Venn diagram of the k subsets \(X_1,\ldots ,X_k\) of X to which a belongs. In the notation (4.6) of the previous section, \(\psi (a)=\omega \) means that \(a\in B_\omega \). The condition that \(\psi \) is nondegenerate is equivalent to \(X_1,\ldots ,X_k\) being distinct, and is analogous to the property of a flag \({\mathscr {V}}\) being nondegenerate.
Proposition 5.7
With probability tending to 1 as \(D\rightarrow \infty \), there exists a nondegenerate map \(\psi : {\textbf{A}}\cap (D^c,D] \rightarrow \{0,1\}^k\) such that \(\sum _{a \in {\textbf{A}}} a \psi (a) \in \langle {\textbf{1}} \rangle \).
The map \(\psi \) will be constructed using the data from the system \(({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\). Before we embark on the proof of this result, we show how to deduce Proposition 5.5 from it.
Proof of Proposition 5.5, assuming Proposition 5.7
By Proposition 5.7, we know that with probability \(1 - o_{D \rightarrow \infty }(1)\) there is a nondegenerate map \(\psi : {\textbf{A}}\cap (D^c,D] \rightarrow \{0,1\}^k\) such that \(\sum _{a \in {\textbf{A}}} a \psi (a)\) lies in \(\langle {\textbf{1}} \rangle \), that is to say, it is a constant vector. We will show that this map induces k distinct subsets of \({\textbf{A}}\) with equal sums.
Let \(\psi _i:{\textbf{A}}\cap (D^c,D]\rightarrow {\mathbb {Q}}\), \(i=1,\ldots ,k\), denote the projection of \(\psi \) onto the i-th coordinate of \({\mathbb {Q}}^k\), so that \(\psi =(\psi _1,\ldots ,\psi _k)\). Define
These sets are distinct because if \(A_i = A_j\), then the image of \(\psi \) would take values in the hyperplane \(\{x \in {\mathbb {Q}}^k : x_i = x_j\}\), contrary to the fact that \(\psi \) is nondegenerate. Moreover, for all i, j we have
and so \(A_1,\ldots ,A_k\) do indeed have equal sums. \(\square \)
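The deduction above can be made concrete on a toy instance (the values here are illustrative, not from the paper). With \(k=2\), take a map \(\psi \) whose weighted sum \(\sum _a a\,\psi (a)\) is a constant vector, and read off the equal-sum subsets \(A_i = \{a : \psi _i(a)=1\}\):

```python
# psi maps each element of A to a vector in {0,1}^2, and
# sum_a a * psi(a) = (3, 3), a constant vector, i.e. it lies in <1>
A = [1, 2, 3]
psi = {1: (0, 1), 2: (0, 1), 3: (1, 0)}
k = 2

# A_i = {a in A : psi_i(a) = 1}
subsets = [{a for a in A if psi[a][i] == 1} for i in range(k)]
sums = [sum(s) for s in subsets]

assert len(set(map(frozenset, subsets))) == k   # nondegenerate => distinct subsets
assert len(set(sums)) == 1                      # equal sums
print(subsets, sums)  # [{3}, {1, 2}] with sums [3, 3]
```

Here \(\psi \) is nondegenerate since its image contains both \((1,0)\) and \((0,1)\), which do not lie in the hyperplane \(\{x_1=x_2\}\); accordingly \(A_1=\{3\}\) and \(A_2=\{1,2\}\) are distinct with equal sums.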
5.2 Many values of \(\sum _{a \in {\textbf{A}}} a \psi (a)\), and a moment bound
We turn now to the task of proving Proposition 5.7. We will divide the proof of Proposition 5.7 into two parts. The first and more difficult part, which we prove in this section, states that (with high probability) \(\sum _{a \in {\textbf{A}}} a \psi (a)\) takes many different values modulo \(\langle {\textbf{1}} \rangle \) as \(\psi \) ranges over all nondegenerate maps \(\psi :{\textbf{A}}\cap (D^c,D]\rightarrow \{0,1\}^k\). The precise statement is Proposition 5.9 below. The deduction of Proposition 5.7 from Proposition 5.9 will occupy Sect. 6.
Let \(0<\kappa \leqslant \min _{1\leqslant j\leqslant r} (c_j-c_{j+1})-2/\log D\) be a small quantity, which may depend on D. Let
The purpose of working with \({\textbf{A}}'\) rather than \({\textbf{A}}\) is to ensure that some gaps are left for the subsequent argument in the next section (based on ideas of Maier and Tenenbaum [20]), in which we show that one of the many sums \(\sum _{a \in {\textbf{A}}'} a \psi (a)\) guaranteed by Proposition 5.9 may be modified, using the elements of \({\textbf{A}}\cap (D^c,D] {\setminus } {\textbf{A}}'\), to be in \(\langle {\textbf{1}} \rangle \).
Definition 5.8
(Compatible functions) We say that a map \(\psi : {\textbf{A}}' \rightarrow \{0,1\}^k\) is compatible if, for all j, \(a\in {\textbf{A}}^j\) implies \(\psi (a)\in V_j\).
Remark
Recall that \({\text {Supp}}(\mu _j)=V_j\cap \{0,1\}^k\) for all j by condition (iii) of Proposition 5.5. Setting \(B_\omega ^{(j)}=\{a\in {\textbf{A}}^j:\psi (a)=\omega \}\), we see that \(\psi \) is compatible precisely when \(B_\omega ^{(j)} \ne \emptyset \) only for \(\omega \) with \(\mu _j(\omega )>0\); this is consistent with the earlier notation (4.6).
Proposition 5.9
There exist real numbers \(\kappa ^*>0\), \(p>1\) and \(t>0\) (which depend on the system \(({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\)) so that the following is true. Let \(\delta > 0\) and suppose that D is sufficiently large as a function of \(\delta \). Uniformly for \(0\leqslant \kappa \leqslant \kappa ^*\), we have, with probability at least \(1 - \delta \), that \(\sum _{a \in {\textbf{A}}'}a \psi (a)\) takes at least
different values modulo \(\langle {\textbf{1}} \rangle \), as \(\psi \) ranges over all nondegenerate, compatible maps \(\psi \).
Remark
By (5.4), it clearly suffices to prove Proposition 5.9 for \(\kappa =\kappa ^*\).
We will deduce Proposition 5.9 from a moment bound. Firstly, define the representation function \(r_{{\textbf{A}}'}: {\mathbb {Q}}^k/\langle {\textbf{1}} \rangle \rightarrow {\mathbb {R}}\) by
where the summation is over all maps \(\psi : {\textbf{A}}'\rightarrow \{0,1\}^k\), and where
This weight function \(w_{{\textbf{A}}'}\) is chosen so that it is large only when \(\psi \) is balanced, that is, when for all j and \(\omega \), the set \({\textbf{A}}^j\) has about \(\mu _j(\omega )|{\textbf{A}}^j|\) elements a with \(\psi (a)=\omega \). Observe that if \(\psi (a)\not \in {\text {Supp}}(\mu _j)\) for some j and some \(a\in {\textbf{A}}^j\), then \(w_{{\textbf{A}}'}(\psi )=0\), and thus only compatible \(\psi \) contribute to the sum \(r_{{\textbf{A}}'}(x)\). However, \(w_{{\textbf{A}}'}(\psi )\) might be non-zero for some degenerate maps \(\psi \), and these will be removed by a separate argument below.
The crucial moment bound for the deduction of Proposition 5.9 is given below.
Proposition 5.10
Let
There is a \(p > 1\) and \(\kappa ^*>0\) so that uniformly for \(0\leqslant \kappa \leqslant \kappa ^*\) and for all \(D\geqslant e^{100/c}\) we have the moment bound
Proof of Proposition 5.9, assuming Proposition 5.10
Define also
We have
for any \({\textbf{A}}'\). On the other hand, when \(\psi \) is not compatible, then \(w_{{\textbf{A}}'}(\psi )=0\) because we know that \({\text {Supp}}(\mu _j)=V_j\cap \{0,1\}^k\) for all j by our assumption of condition (iii) of Proposition 5.5. In addition, if \(\psi \) is degenerate, then its image is contained in \(\{x\in {\mathbb {Q}}^k:x_i=x_j\}\cap \{0,1\}^k\) for some \(i\ne j\). Since \(V_r\not \subset \{x\in {\mathbb {Q}}^k:x_i=x_j\}\), there must exist some \(\omega \in V_r\cap \{0,1\}^k={\text {Supp}}(\mu _r)\) that is not in the image of \(\psi \). Therefore,
Since \(c_r>c_{r+1}\) by our assumption of condition (i) of Proposition 5.5, Lemma A.5 implies \(|{\textbf{A}}^r| \geqslant \frac{1}{2}(c_{r}-c_{r+1}) \log D\) with probability \(> 1 - O(e^{-(1/4)\log ^{1/2} D})\), and thus the right side above is o(1) with this same probability. The same lemma also implies that \({\textbf{A}}'\in {\mathcal {E}}^*\) with probability \(> 1 - O(e^{-(1/4)\log ^{1/2} D})\).
Now fix a small \(\delta >0\). The above discussion implies that, with probability at least \(1 - \delta /2\) (for D sufficiently large), we have
On the other hand, Markov’s inequality and Proposition 5.10 imply that, with probability at least \(1 - \delta /2\), we have
By Hölder’s inequality,
With probability at least \(1 - \delta \), both (5.5) and (5.6) hold, and in this case (5.7) gives
This completes the proof of Proposition 5.9. \(\square \)
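The mechanism behind the deduction above is a standard Hölder argument: a lower bound on the first moment \(\sum _x r(x)\) together with an upper bound on the p-th moment \(\sum _x r(x)^p\) forces the support of r to be large. A numerical illustration (not from the paper; the values of p and r are arbitrary):

```python
import random

# Hölder: R = sum r <= M^(1/p) * |supp r|^((p-1)/p), where M = sum r^p,
# hence |supp r| >= R^(p/(p-1)) / M^(1/(p-1))
random.seed(2)
p = 1.5
r = [random.random() + 0.01 for _ in range(1000)]   # strictly positive values
R = sum(r)
M = sum(x ** p for x in r)
support_bound = R ** (p / (p - 1)) / M ** (1 / (p - 1))
assert len(r) >= support_bound   # guaranteed by Hölder's inequality
```

In the proof, R is controlled from below via (5.5) and M from above via the moment bound of Proposition 5.10, so the number of values taken modulo \(\langle {\textbf{1}}\rangle \) is large.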
The rest of the section is devoted to the proof of Proposition 5.10.
5.3 An entropy condition for adapted systems
For reasons that will become apparent, in the proof of Proposition 5.10 we will need to apply the entropy gap condition not only with subflags \({\mathscr {V}}'\) of \({\mathscr {V}}\), but with a more general type of system.
Definition 5.11
(Adapted system) Given a system \(({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\), the pair \(({\mathscr {W}},{\textbf{b}})\) is adapted to \(({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\) if \({\mathscr {W}}: \langle {\textbf{1}} \rangle = W_0 \leqslant W_1 \leqslant \cdots \leqslant W_s\) is a complete flag with \(W_s\leqslant V_r\), and \({{\textbf{b}}}=(b_1,\ldots ,b_s)\) satisfies \(1\geqslant b_1 \geqslant \cdots \geqslant b_s\geqslant 0\) and the condition
We say that \(({\mathscr {W}},{\textbf{b}})\) is saturated if \(s=\dim (V_r)-1\) and if for all \(j\leqslant r\), there are exactly \(\dim V_j-1\) values of i with \(b_i>c_{j+1}\). Otherwise, we call \(({\mathscr {W}},{{\textbf{b}}})\) unsaturated.
Remark
For the definition of complete flag, see Definition 3.1. We make a few comments to motivate the term saturated. Let
so that the \(b_i\)’s belonging to the interval \((c_{j+1},c_j]\) are precisely \(b_{m_{j-1}+1},\ldots ,b_{m_j}\). Since \(W_i\leqslant V_j\) whenever \(b_i>c_{j+1}\), we infer that
Since \({\mathscr {W}}\) is complete, we have \(\dim (W_i)=i+1\), and thus \(m_j\leqslant \dim (V_j)-1\). In particular, \(({\mathscr {W}},{{\textbf{b}}})\) is saturated if, and only if, we have equality in (5.9) for all j. \(\square \)
We need some further notation, which reflects that \({\textbf{A}}'\) is supported on intervals with gaps. For \(1\leqslant j\leqslant r\), let
Recall that we take \(\kappa \) small enough so that each \(I_j\) has length \(\geqslant 2/\log D\), that is, \(\kappa \leqslant \min _j (c_j-c_{j+1})-2/\log D\).
There is a natural analogue of the \(\textrm{e}\)-value (cf. Definition 3.5) for adapted systems.
Definition 5.12
Given an adapted system \(({\mathscr {W}}, {\textbf{b}})\), we define
where \(\lambda \) denotes the Lebesgue measure on \({\mathbb {R}}\).
Finally, we define
that is to say \(\delta ({{\textbf{b}}})\) is the smallest non-negative real number with the property that
Adapted systems \(({\mathscr {W}},{{\textbf{b}}})\) can, in a certain sense, be interpreted in terms of convex superpositions of pairs \(({\mathscr {V}}',{{\textbf{c}}})\), \({\mathscr {V}}' \leqslant {\mathscr {V}}\) a subflag. The next lemma gives us a strict inequality analogous to condition (ii) of Proposition 5.5, unless \({\mathscr {W}}\) is saturated and has a small value of \(\delta ({{\textbf{b}}})\), which corresponds to the convex superposition which gives rise to \(({\mathscr {W}},{\textbf{b}})\) having weight \(\approx 1\) on the trivial subflag \(({\mathscr {V}},{{\textbf{c}}})\).
Lemma 5.13
Let \(({\mathscr {V}},{\varvec{\mu }},{{\textbf{c}}})\) be a system satisfying conditions (i)–(ii) of Proposition 5.5. Let \(\varepsilon \) be as in condition (ii). Suppose that \(({\mathscr {W}}, {\textbf{b}})\) is an adapted system to \(({\mathscr {V}},{\varvec{\mu }},{{\textbf{c}}})\) such that \(b_i\) lies in some set \(I_j\) for each i. Suppose, further, that \(\kappa \) is small enough in terms of \(\varepsilon \), and that \(\kappa \leqslant \frac{1}{2} \min _j (c_j-c_{j+1}).\)
-
(a)
If \(({\mathscr {W}},{{\textbf{b}}})\) is unsaturated, then \(\textrm{e}({\mathscr {W}}, {\textbf{b}}) \geqslant \textrm{e}({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }}) + \varepsilon /2\).
-
(b)
If \(({\mathscr {W}},{\textbf{b}})\) is saturated, then \(\textrm{e}({\mathscr {W}},{{\textbf{b}}}) \geqslant \textrm{e}({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }}) + \varepsilon \delta ({{\textbf{b}}})/2\).
Proof
We treat both parts together for most of the proof. Let \(m_j\) be defined by (5.8). In particular, \(m_0=0\) because \(c_1=1\). Note that
and let h be such that
Without loss of generality, we may assume that \(b_{m_{h}}<c_{h}\); the case \(b_{m_{h}}=c_{h}\) will then follow by continuity.
Set \(b=b_{m_{h}}\) and note that
The quantity \(\textrm{e}({\mathscr {W}},{{\textbf{b}}}')\) is linear in each variable \(b_i'\) and the region over which we consider the above minimum is a polytope. As a consequence, the minimum of \(\textrm{e}({\mathscr {W}},{{\textbf{b}}}')\) must occur at one of the vertices of the polytope. In particular, there are indices \(\ell _j\in (m_{j-1},m_j]\) for \(j=1,\ldots ,r\) such that
In fact, note that we must have \(\ell _{h} <m_{h}\) because \(b_{m_{h}}^*=b\) and we have assumed that \(b<c_{h}\).
Using the linearity of \(\textrm{e}({\mathscr {W}},\cdot )\) once again, we find that
where \(b_i^{(1)} = b_i^{(2)} = b_i^*\) for \(i\in \{1,\ldots ,s\}{\setminus } (\ell _{h},m_{h}]\), \(b_i^{(1)} = c_{h+1}+\kappa \) for \(i\in (\ell _{h},m_{h}]\) and \(b_i^{(2)} = c_{h}\) for \(i\in (\ell _{h},m_{h}]\).
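The vertex argument above uses the elementary fact that a function which is linear in each variable attains its minimum over a polytope at a vertex. A minimal numerical sketch (illustrative only; it uses a product of intervals, whereas the actual constraint region also carries the ordering constraints on the \(b_i'\)):

```python
import itertools
import random

random.seed(3)
coeffs = [random.uniform(-1, 1) for _ in range(4)]
box = [(random.uniform(0, 1), random.uniform(1, 2)) for _ in range(4)]

def f(b):
    # a linear function, standing in for e(W, b') as a function of b'
    return sum(c * x for c, x in zip(coeffs, b))

# minimizing vertex, chosen coordinate-wise by the sign of each coefficient
vertex = [lo if c > 0 else hi for c, (lo, hi) in zip(coeffs, box)]
brute = min(f(v) for v in itertools.product(*box))
assert abs(f(vertex) - brute) < 1e-12
# the vertex also beats random interior points
for _ in range(100):
    pt = [random.uniform(lo, hi) for lo, hi in box]
    assert f(vertex) <= f(pt) + 1e-12
```

For a general polytope the same conclusion holds, but the minimizing vertex is no longer given coordinate-wise; in the proof it is encoded by the choice of the indices \(\ell _j\).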
Fix \({{\textbf{b}}}'\in \{ {{\textbf{b}}}^{(1)}, {{\textbf{b}}}^{(2)} \}\). In addition, define the indices \(i_1,\ldots ,i_r\) by letting \(i_j=\ell _j\) when \(j\ne h\) or \({{\textbf{b}}}'={{\textbf{b}}}^{(1)}\), while letting \(i_{h}=m_{h}\) when \({{\textbf{b}}}'={{\textbf{b}}}^{(2)}\). We then have
A straightforward calculation implies that
where \({\mathscr {V}}'\) is the subflag of \({\mathscr {V}}\) with \(V_j'=W_{i_j}\) and
(Note that \({\mathscr {V}}'\) is indeed a subflag since \(W_{i_j}\leqslant W_{m_j}\leqslant V_j\) by (5.9).)
If \({\mathscr {V}}'={\mathscr {V}}\), we must have that \(W_{i_j}=V_j\) for all j. Since \(W_{i_j}\leqslant W_{m_j}\leqslant V_j\), we infer that \(W_{m_j}=V_j\), as well as that \(i_j=m_j\) for all j. In particular, the flag \(({\mathscr {W}},{{\textbf{b}}})\) we started with must be saturated and \(S=0\) (since \(i_j=m_j\) and \({\mathbb {H}}_{\mu _j}(W_{i_j})={\mathbb {H}}_{\mu _j}(V_j)=0\) for all j).
We are now ready to complete the proof of both parts of the lemma.
(a) By the above discussion, if \(({\mathscr {W}},{{\textbf{b}}})\) is unsaturated, then \({\mathscr {V}}'\ne {\mathscr {V}}\). Therefore, by assumption of condition (ii) of Proposition 5.5, we have
\(\text {for } {{\textbf{b}}}'\in \{{{\textbf{b}}}^{(1)},{{\textbf{b}}}^{(2)}\}\). Inserting this inequality into (5.13) implies that
Since \(\textrm{e}({\mathscr {W}},{{\textbf{b}}})\geqslant \textrm{e}({\mathscr {W}},{{\textbf{b}}}^*)\), part (a) follows, provided that \(\kappa \) is small enough in terms of \(\varepsilon \).
(b) Assume that \(({\mathscr {W}},{{\textbf{b}}})\) is saturated. We can only have \({\mathscr {V}}'={\mathscr {V}}\) if \(i_{h}=m_{h}\). Since \(\ell _{h}<m_{h}\), this can only happen when \({{\textbf{b}}}'={{\textbf{b}}}^{(2)}\). As a consequence, assuming again that \(\kappa \) is small enough in terms of \(\varepsilon \), we have that
Inserting this into (5.13) yields the inequality
Since \(b=c_{h}-\delta ({{\textbf{b}}})\), \(0<c_{h}-c_{h+1}-\kappa \leqslant 1\), and \(\textrm{e}({\mathscr {W}},{{\textbf{b}}})\geqslant \textrm{e}({\mathscr {W}},{{\textbf{b}}}^*)\), we find that \(\textrm{e}({\mathscr {W}},{{\textbf{b}}})\geqslant \textrm{e}({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})+\varepsilon \delta ({{\textbf{b}}})/2\). This completes the proof of part (b) of the lemma. \(\square \)
5.4 Proof of the moment bound
In this subsection we prove Proposition 5.10. For a vector
define the event
When \({\textbf{A}}'\) lies in \(S({\textbf{n}})\), we write
so that
We may define, for any compatible \(\psi \), the auxiliary function
The salient property of \(\theta \) is that it is determined by the ordering of the elements in \({\textbf{A}}^j\) and not by the elements themselves. We denote by \(\varvec{\Theta }_{{{\textbf{n}}}}\) the set of compatible functions \(\theta \), that is, those functions satisfying
In the event \(S({\textbf{n}})\), if \(\psi \) is a compatible function and \(\theta \) is defined by (5.16), we have
where the notation \(w_{{\textbf{n}}}\) (in place of \(w_{{\textbf{A}}}\)) reflects the fact that w only depends on \(\theta \), and not otherwise on \({\textbf{A}}\). In this notation,
Writing \(r_{{\textbf{A}}'}^p = r_{{\textbf{A}}'}^{p-1} r_{{\textbf{A}}'}\) and interchanging the order of summation, it follows that if \({\textbf{A}}'\) lies in \(S({\textbf{n}})\), then
where the inner summation is over all compatible functions \(\theta '\) satisfying
As in the argument of Sect. 4.2, we find a flag \({\mathscr {W}}\) and special values of i which have the effect of isolating terms in the relation (5.20). With \(\theta , \theta ',{\textbf{n}}\) fixed, let
and
We now choose a special basis of \({\text {Span}}({\textbf{1}}, \Omega )\). For each \(\omega \in \Omega \), let
and place a total ordering on \(\Omega \) by saying that \(\omega \prec \omega '\) if \(K_{\omega } < K_{\omega '}\). Let \(\omega ^1\) be the minimum element in \(\Omega {\setminus } \langle {\textbf{1}} \rangle \),
where s is such that \(\Omega \subset {\text {Span}}({\textbf{1}},\omega ^1,\ldots ,\omega ^s)\). Finally, let
and form the flag
We note that in the special case \(\theta =\theta '\), we have \(s=0\) and \({\mathscr {W}}\) is a trivial flag with only one space \(W_0\).
Now we divide up the sample space of \({\textbf{A}}'\) into events describing the rough size of the critical elements \(a_{\tau _j}\). By construction,
Similarly to Sect. 4, for \(1\leqslant i\leqslant s\) let
The definition of \({\textbf{A}}'\) implies that for each i, there is some j with \(b_i\in I_j=(c_{j+1}+\kappa ,c_j]\). Moreover, we have the implications
where we used (5.17) to obtain the second implication. Since \(b_1\geqslant b_2\geqslant \cdots \geqslant b_i\), we infer the stronger relation
Therefore, the pair \(({\mathscr {W}},{{\textbf{b}}})\) is adapted to \(({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\).
Using the inequality \((x+ y)^{p-1} \leqslant x^{p-1} + y^{p-1}\) repeatedly, we may partition (5.19) according to the values of \({\mathscr {W}}(\theta ,\theta ')\) and \(\varvec{\tau }(\theta ,\theta ')\), obtaining (still assuming \(S({\textbf{n}})\))
We need to separately consider other elements of \({\textbf{A}}'\) that lie in the intervals \((D^{b_i}/e,D^{b_i}]\), and so we define
By assumption, \(\sum _b \ell _b \geqslant s\). It may happen that \(b_i=b_{i+1}\) for some i, in which case \(|{\mathcal {B}}| < s\). With \({{\textbf{n}}}, \varvec{\tau }, {{\textbf{b}}}, \varvec{\ell }\) all fixed, consider the event
defined as the intersection of
-
\(S({{\textbf{n}}})\);
-
\(a_{\tau _i} \in (D^{b_i}/e,D^{b_i}]\) for all i;
-
\(|{\textbf{A}}' \cap (D^b/e,D^b]| = \ell _b\) for all \(b\in {\mathcal {B}}\).
Taking expectations over \({\textbf{A}}'\), we get
where the condition that \(\ell _b\leqslant D^{b/2}/100\) comes from the fact that we are taking expectations over \({\textbf{A}}'\in {\mathcal {E}}^*\). By Hölder’s inequality with exponents \(\frac{1}{p-1}\), \(\frac{1}{2-p}\), this implies that
Claim. Let \(\ell _b\leqslant D^{b/2}/100\) for all \(b\in {\mathcal {B}}\). Then we have
Proof of Claim
Let us begin by analyzing the event \(E({{\textbf{b}}},\varvec{\tau },{{\textbf{n}}}, \varvec{\ell })\) we are conditioning on. Consider the set \(\bigcup _j(D^{c_{j+1}+\kappa },D^{c_j}]{\setminus } \bigcup _{b\in {\mathcal {B}}} (D^b/e,D^b]\). There is a unique way to write it as \(\bigcup _{m=1}^M I_m\), where the sets \(I_m\) are intervals of the form (A, B] with their closures \({\bar{I}}_m\) mutually disjoint. Now, the event \(E({{\textbf{b}}},\varvec{\tau },{{\textbf{n}}}, \varvec{\ell })\) is equivalent to there being mutually disjoint sets of consecutive integers \({\mathcal {I}}_m\) (\(1\leqslant m\leqslant M\)) and \({\mathcal {J}}_b\) (\(b\in {\mathcal {B}}\)) such that:
-
The sets \({\mathcal {I}}_m\) \((1\leqslant m\leqslant M)\) and \({\mathcal {J}}_b\) \((b\in {\mathcal {B}})\) together form a partition of the set \([n_r]\);
-
For all \(m\in \{1,\ldots ,M\}\), we have \(a_n\in I_m\) if and only if \(n\in {\mathcal {I}}_m\);
-
For all \(b\in {\mathcal {B}}\), we have \(a_n\in (D^b/e,D^b]\) if and only if \(n\in {\mathcal {J}}_b\);
-
\(\tau _i\in {\mathcal {J}}_{b_i}\) for all i;
-
\(|{\mathcal {J}}_b|=\ell _b\) for all \(b\in {\mathcal {B}}\).
The above discussion allows us to describe the distribution law of \({\textbf{A}}'\) under the event \(E({{\textbf{b}}},\varvec{\tau },{{\textbf{n}}}, \varvec{\ell })\): given a choice of the intervals \({\mathcal {I}}_m\) and \({\mathcal {J}}_b\), we construct independent logarithmic random sets \({\textbf{A}}^*_m\) on \(I_m\) and \({\tilde{{\textbf{A}}}}_b\) on \((D^b/e,D^b]\) such that \(\#{\textbf{A}}^*_m=\#{\mathcal {I}}_m\) for all m and \(\#{\tilde{{\textbf{A}}}}_b=\ell _b\) for all b. Then \({\textbf{A}}'\) is the union of all the \({\textbf{A}}^*_m\) and all the \({\tilde{{\textbf{A}}}}_b\).
Having described the distribution of \({\textbf{A}}'\) under the event \(E({{\textbf{b}}},\varvec{\tau },{{\textbf{n}}},\varvec{\ell })\), let us now prove our claim. We argue as in the proof of Proposition 4.4. Relation (5.20) implies
for some \(a_0\in {\mathbb {Z}}\). Since \({\textbf{1}}, \omega ^1,\ldots ,\omega ^s\) are linearly independent, this uniquely determines their coefficients \(a_0, a_{\tau _1},\ldots , a_{\tau _s}\) in terms of the other \(a_i\)’s. For each \(b\in {\mathcal {B}}\), let
Then, given \({\textbf{A}}_m^*\) for all m and \(b\in {\mathcal {B}}\), there are at most
choices for \({\tilde{{\textbf{A}}}}_b\) (since \(m_b\) of its elements are determined by the remaining \(\ell _b-m_b\) elements and by the elements of the \({\textbf{A}}_m^*\) that we have fixed), where we used that \(\ell _b^{m_b}\leqslant \ell _b^k\ll (1-1/e)^{-\ell _b}\). In addition, Lemma A.4 implies that the probability of occurrence of a given set \(X_b\subset {\mathbb {Z}}\cap (D^b/e,D^b]\) as the set \({\tilde{{\textbf{A}}}}_b\), conditionally on the event that \(\#{\tilde{{\textbf{A}}}}_b=\ell _b\), is
Putting the above estimates together, we conclude that
upon noticing that \(\sum _{b\in {\mathcal {B}}} m_b b = \sum _i b_i\). This proves our claim that (5.24) holds. \(\square \)
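The application of Hölder's inequality above, with exponents \(\frac{1}{p-1}\) and \(\frac{1}{2-p}\), works because these are conjugate: \((p-1)+(2-p)=1\) for any \(1<p<2\). A quick numerical sanity check (illustrative code, not from the paper):

```python
import random

random.seed(4)
p = 1.3   # any 1 < p < 2 works
q1, q2 = 1 / (p - 1), 1 / (2 - p)
assert abs(1 / q1 + 1 / q2 - 1) < 1e-12   # conjugate exponents, since (p-1)+(2-p)=1
for _ in range(100):
    x = [random.random() for _ in range(10)]
    y = [random.random() for _ in range(10)]
    lhs = sum(a * b for a, b in zip(x, y))
    # Hölder: sum x*y <= (sum x^{1/(p-1)})^{p-1} * (sum y^{1/(2-p)})^{2-p}
    rhs = (sum(a ** q1 for a in x) ** (1 / q1)
           * sum(b ** q2 for b in y) ** (1 / q2))
    assert lhs <= rhs + 1e-12
```

Note that both exponents exceed 1 exactly when \(1<p<2\), which is one reason the moment bound is stated for p slightly larger than 1.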
In the light of (5.24), relation (5.23) becomes
To evaluate the bracketed expression, first recall the definition (5.18) of \(w_{{\textbf{n}}}(\theta ')\), and note that the conditions \({\mathscr {W}}(\theta ,\theta ',{{\textbf{n}}}) = {\mathscr {W}}\), \(\varvec{\tau }(\theta ,\theta ',{{\textbf{n}}}) = \varvec{\tau }\) together imply that
where we have defined \(\tau _0:=0\) and \(\tau _{s+1} := n_r+1\). For brevity, write
Some of these sets are empty. In any case, we have
From (5.18), and the fact that the discrete intervals \(T_{i,j}\) are disjoint and cover \([n_r]\), we have
With these observations, we conclude that
where
Substituting into (5.25), and summing over \({\textbf{n}}\), we get
If \(V_j \leqslant W_i\), then \(\mu _j(W_i+\omega )=1\) for all \(\omega \in {\text {Supp}}(\mu _j)\) and thus \(\eta (i,j,p,{\mathscr {W}})=1\). For all \(i,j,p,{\mathscr {W}}\) we have \(\eta (i,j,p,{\mathscr {W}})\leqslant 1\). Thus, we require lower bounds on \(|T_{i,j}|\) in the case \(V_j \not \leqslant W_i\).
Claim. Assume that \(E({{\textbf{b}}},\varvec{\tau },{{\textbf{n}}},\varvec{\ell })\) holds. Given i such that \(b_{i+1}<b_i\) and \(j\in \{1,\ldots ,r\}\), define
Then
Proof of Claim
Let t be such that \(a_t\in M_{i,j}\). In particular,
This relation and the definition of \(b_i\) in (5.21) imply that \(a_{\tau _{i+1}}<a_t<a_{\tau _i}\) and hence \(\tau _i<t<\tau _{i+1}\), where we used that \(a_1>a_2>\cdots >a_{n_r}\). In addition, since \(D^{c_{j+1}+\kappa }<a_t\leqslant D^{c_j}\), we have that \(a_t\in {\textbf{A}}^j\). Thus, \(n_{j-1}<t\leqslant n_j\) by (5.15). This completes the proof of the claim. \(\square \)
A direct consequence of (5.30) is that
Combining this inequality with (5.29), we get
Fix \({{\textbf{b}}}\) and \({\mathscr {W}}\), and let \(E'({{\textbf{b}}},\varvec{\ell })\) be the event that \(|{\textbf{A}}'\cap (D^b/e,D^b]|=\ell _b\) for all \(b\in {\mathcal {B}}\). Given \({\textbf{A}}'\in E'({{\textbf{b}}},\varvec{\ell })\), we have at most \(\prod _b \ell _b\leqslant e^{\sum _b\ell _b}\) choices for \(\tau _1,\ldots ,\tau _s\). Hence,
Since the events \(S({{\textbf{n}}})\) are mutually disjoint, we arrive at the inequality
Next, we estimate the right hand side of (5.31). The intervals \(M_{i,j}\) and \((D^b/e,D^b]\) are mutually disjoint by (5.30), hence the quantities \(|{\textbf{A}}' \cap M_{i,j}|\) and \(|{\tilde{{\textbf{A}}}}_b|\) are independent. Using Lemma A.3, we obtain
Recall that \(I_j=(c_{j+1}+\kappa ,c_j]\), define
and recall that \(\lambda \) denotes the Lebesgue measure on \({\mathbb {R}}\). Then, by the definition of \(M_{i,j}\), we have
Substituting into the definition of \(\textrm{e}()\) (Definition 5.12), this gives
where
Recall the definition (5.28) of \(\eta (i,j,p,{\mathscr {W}})\). If \(W_i \geqslant V_j\), then \(\mu _j(W_i + x) \!=\! 1\) whenever \(x \in {\text {Supp}}(\mu _j)\), and so in this case \(\eta (i,j,p,{\mathscr {W}}) = 1.\) Since \({\mathbb {H}}_{\mu _j}(W_i)=0\) in this case, we have
For any fixed \(i,j,{\mathscr {W}}\), we have
and so
We deduce from (5.32), (5.33) and (5.34) that
To continue, we separate two cases.
Case 1. \(({\mathscr {W}},{{\textbf{b}}})\) is unsaturated.
In the above case, Lemma 5.13(a) implies that \(\textrm{e}({\mathscr {W}},{{\textbf{b}}}) \geqslant \textrm{e}(\mathscr {V},{{\textbf{c}}},{\varvec{\mu }}) + \varepsilon /2\). Consequently,
provided that \(p-1\) is small enough in terms of \(\varepsilon \) (and k).
Since there are O(1) choices for \({\mathscr {W}}\) and \(\log ^{O(1)} D\) choices for \({{\textbf{b}}}\), the contribution of such flags to the right hand side of (5.32) is
Case 2. \(({\mathscr {W}},{{\textbf{b}}})\) is saturated. (Recall from Definition 5.11 that \(({\mathscr {W}},{{\textbf{b}}})\) is called saturated when \(s=\dim (V_r)-1\) and for all \(j\leqslant r\), there are exactly \(\dim V_j-1\) values of i with \(b_i>c_{j+1}\).)
Fix for the moment a pair (i, j) such that
The second condition is equivalent to knowing that
In particular, we have \(W_i\leqslant V_j\) by (5.22). Note though that we have assumed \(V_j\not \leqslant W_i\). Therefore, \(W_i<V_j\). Since \(\dim (W_i)=i+1\), we infer that
Since we have assumed that \(({\mathscr {W}},{{\textbf{b}}})\) is saturated, the above inequality implies that \(b_{i+1}>c_{j+1}\). Recalling the definition (5.11) of \(\delta ({{\textbf{b}}})\), we conclude that
This implies that \(G_i\cap I_j\subset [c_j-\delta ({{\textbf{b}}}),c_j]\) for any pair (i, j) satisfying (5.37). As a consequence,
Since we also have that \(\textrm{e}({\mathscr {W}}, {{\textbf{b}}}) \geqslant \textrm{e}({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }}) + \varepsilon \delta ({{\textbf{b}}})/2\) by Lemma 5.13(b), it follows that
provided that \(p-1\) is small enough compared to \(\varepsilon \).
Using (5.38), we see that the contribution of saturated flags to the right hand side of (5.32) is
where we used that there are O(1) choices for \({\mathscr {W}}\). Recall (5.21), which implies that the numbers \(b_i\) are restricted to the set \(\{m/\log D: m\in {\mathbb {N}}\}\). Thus the number of \({{\textbf{b}}}\) with \(\delta ({{\textbf{b}}})=m/\log D\) is at most \((m+1)^s\) and
We thus conclude that
If we combine the above inequality with (5.36) and (5.32), we establish Proposition 5.10. \(\square \)
6 An argument of Maier and Tenenbaum
The aim of this section is to prove Proposition 5.7. The reader may care to recall the statement of that proposition now, as well as the definition of a compatible map (Definition 5.8). As in the previous section, the system \(({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\) is fixed, and satisfies conditions (i)–(iii) of Proposition 5.5. We also fix a basis \(\{ {\textbf{1}},\omega ^1,\ldots ,\omega ^d \}\) of \(V_r\) such that \(V_j={\text {Span}}({\textbf{1}},\omega ^1,\ldots ,\omega ^{\dim (V_j)-1})\) for each j and such that \(\omega ^i\in \{0,1\}^k\) for each i. Denote \(\Omega = {\text {Supp}}(\mu _r) = V_r \cap \{0,1\}^k\).
We begin with an observation related to the solvability of (4.12), which we recall here for the convenience of the reader:
Let \(\Lambda \) denote the \({\mathbb {Z}}\)-span of \({\textbf{1}},\omega ^1,\ldots ,\omega ^d\) (that is, the lattice generated by \({\textbf{1}},\omega ^1,\ldots ,\omega ^d\)). Every vector \(\omega \in \Omega \) is a rational combination of the basis elements \({\textbf{1}},\omega ^1,\ldots ,\omega ^d\). Hence, there is some \(M\in {\mathbb {N}}\) such that \(M\omega \in \Lambda \) for each \(\omega \in \Omega \). In particular, note that the right-hand side of (6.1) lies a priori in the lattice \(\Lambda /M=\{x/M: x\in \Lambda \}\). However, we must ensure that (6.1) is solvable with \(K_1,\ldots ,K_r\in {\mathbb {Z}}\). Equivalently, the right-hand side of (6.1) must lie in \(\Lambda \), which is guaranteed when the coefficients of all vectors \(\omega \) in it lie in \(M{\mathbb {Z}}\).
In this section, implied constants in O() and \(\ll \) notations may depend on the system \(({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\) and basis \(\omega ^1,\ldots ,\omega ^d\); in particular, on k, d and M.
6.1 The sets \({\mathscr {L}}_i({\textbf{A}})\) and lower bounds for their size
The main statement of this subsection, Proposition 6.2, is a variant of Proposition 5.9, where we stipulate that all elements lie in \(\Lambda \). This will later ensure that (6.1) is solvable with \(K_1,\ldots ,K_r\in {\mathbb {Z}}\).
Fix \(\kappa >0\) satisfying \(\kappa \leqslant \frac{\kappa ^*}{2}\), where \(\kappa ^*\) is the constant from Proposition 5.9. In particular, \(\kappa \leqslant 1/2\). We introduce the sets
Thus each \(I_i(D)\) is simply a union of r intervals in \(\Lambda \), and we have the nesting
For any \(\omega \in V_r\) we denote by \({\overline{\omega }}\) the projection of \(\omega \) onto
In addition let \({\overline{\psi }}(a)=\overline{\psi (a)}\) for \(a\in {\textbf{A}}\).
The reader may wish to recall the definition of nondegenerate (Definition 5.6) and compatible (Definition 5.8) maps.
Definition 6.1
Write \({\mathscr {L}}_{i}({\textbf{A}})\) for the set of all \(\sum _{a \in {\textbf{A}}} a{\overline{\psi }}(a)\) that lie in \(\Lambda \), where \(\psi \) ranges over all nondegenerate, compatible maps supported on \(I_i(D)\).
Proposition 6.2
Let \(\delta > 0\) and \(i\in {\mathbb {N}}\), and let D be sufficiently large in terms of \(\delta \). Then with probability at least \(1 - \delta \) in the choice of \({\textbf{A}}\cap I_i(D)\),
where \(\alpha \) is a positive constant depending at most on \(({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\).
Proof
Let
where the first inclusion follows by noticing that
for \(c_{j+1}\in [0,1]\), \(0\leqslant \kappa \leqslant \kappa ^*/2\leqslant 1/2\) and \(i\geqslant 1\). Write \({\mathscr {L}}'_i({\textbf{A}})\) for the set of all \(\sum _{a \in {\textbf{A}}} a {\overline{\psi }}(a)\), where \(\psi \) ranges over all nondegenerate, compatible maps supported on \(I'_i(D)\), but without the stipulation that the sum is in \(\Lambda \). We now apply Proposition 5.9 with D replaced by \(D^{1-\kappa /i}\) and \(\delta \) replaced by \(\delta /2\) to conclude that
with probability at least \(1-\delta /2\), where \(\alpha =1/(p-1)\) with p as in Proposition 5.9.
We now use the elements of \({\textbf{A}}\cap (I_i(D){\setminus } I_i'(D))\) to create many sums \(\sum _{a \in {\textbf{A}}} a{\overline{\psi }}(a)\) which do lie in \(\Lambda \). Let \(G:= (D^{c_{r+1}(1-\kappa /i)},\delta ^{-1}D^{c_{r+1}(1-\kappa /i)}]\), which is a subset of \(I_i(D){\setminus } I'_i(D)\). Let \({\mathcal {E}}\) be the event that \({\textbf{A}}\cap G\) contains at least \(2^k\) elements that are \(\equiv m\pmod {M}\) for each \(m\in \{1,\ldots ,M\}\). Lemma A.2 (applied with \(B=\{b\in {\mathbb {Z}}\cap G : b\equiv m\pmod M\}\) and \(\varepsilon =1/3\)) implies that if \(\delta \) is sufficiently small then \({\mathbb {P}}({\mathcal {E}}) \geqslant 1 - \delta /2\).
Assume now that we are in the event \({\mathcal {E}}\). Let us fix a set \({\mathcal {K}}\subset {\textbf{A}}\cap G\) that contains exactly \(2^k\) elements that are \(\equiv m\pmod {M}\) for each \(m\in \{1,\ldots ,M\}\). Take any nondegenerate, compatible function \(\psi : {\textbf{A}}\rightarrow \{0,1\}^k\) supported on \(I'_i(D)\), and write
Recall that \({\text {Supp}}(\mu _r)=V_r\cap \{0,1\}^k\) by condition (iii) of Proposition 5.5. Hence, for each \(\omega \in \Omega \), we may find an element \(a_\omega \in {\mathcal {K}}\) satisfying \(a_\omega \equiv -N_\omega \pmod {M}\). Set \(\psi _0(a_{\omega })=\omega \) for each \(\omega \), \(\psi _0(a)=\psi (a)\) for \(a\in I'_i(D)\), and \(\psi _0(a)={\textbf{0}}\) for all other \(a\in I_i(D)\). We then have
since \(M|(a_\omega +N_\omega )\) for all \(\omega \). Moreover, \(\psi _0\) is nondegenerate and compatible by construction. Consequently, \(\sum _a a{\overline{\psi }}_0(a) \in \Lambda \) (by removing the coefficient of \({\textbf{1}}\)). Since there are at most \(2^{|{\mathcal {K}}|} \leqslant 2^{M2^k}\) choices for \(\{a_\omega : \omega \in \Omega \}\), the map from \(\sum _{a\in I'_i(D)} a{\overline{\psi }}(a)\) to \(\sum _{a\in I_i(D)} a{\overline{\psi }}_0(a)\) is at most \(2^{M2^k}\)-to-1. We conclude that with probability \(\geqslant 1-\delta \),
where the implied constant depends only on k, M and \(\alpha \), which are all fixed. \(\square \)
6.2 Putting \({\mathscr {L}}_i({\textbf{A}})\) in a box
In the last section, we showed that (with high probability) \({\mathscr {L}}_i({\textbf{A}})\) is large. In this section we show that with high probability it is contained in a box (in coordinates \(\omega ^1,\ldots ,\omega ^d\)); putting these results together one then sees that \({\mathscr {L}}_i({\textbf{A}})\) occupies a positive proportion of lattice points in the box, the bound being independent of D.
For \(t \in \{1,\ldots , d\}\), write j(t) for the unique j such that
In addition, let \(C\) be the largest coordinate in absolute value of any element in \(V_r\cap \{0,1\}^k\) when written with respect to the basis \({\textbf{1}},\omega ^1,\ldots ,\omega ^d\). We then set
Lemma 6.3
Assume \(\delta >0\) is small enough so that \( r e^{-2/\delta }\leqslant \delta \). Then, we have
with probability at least \(1 -\delta \) in the choice of \({\textbf{A}}\cap I_i(D)\).
Proof
This follows quickly from the fact that \(\psi \) is compatible and by Lemma A.6, the latter implying that
with probability \(\geqslant 1- r e^{-2/\delta } \geqslant 1-\delta \). \(\square \)
Proposition 6.4
Let \(\delta \) and \(\alpha \) be as in Proposition 6.2 and in Lemma 6.3. With probability at least \(1 - 2 \delta \) in the choice of \({\textbf{A}}\cap I_i(D)\), \({\mathscr {L}}_i({\textbf{A}})\) is a subset, of size \(\gg \delta ^{d+\alpha } N^{(i)}\), of the box \(\bigoplus _{t = 1}^d [-N_{j(t)}^{(i)}, N_{j(t)}^{(i)}] \omega ^t\).
Proof
This follows immediately upon combining Proposition 6.2 and Lemma 6.3. \(\square \)
6.3 Zero sums with positive probability
Lemma 6.5
Let \(\delta \) and \(\alpha \) be as in Proposition 6.2 and Lemma 6.3, and let D be large enough in terms of \(\delta \) and \(({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\). Let \(i\in {\mathbb {Z}}\cap [1,(\log D)^{1/3}]\). In addition, let \(S\subset \bigoplus _{t = 1}^d [-N_{j(t)}^{(i)}, N_{j(t)}^{(i)}] \omega ^t\) with \(|S| \gg \delta ^{d+\alpha } N^{(i)}\) and with \(S \subset \Lambda \). Then
Proof
We condition on a fixed choice of \({\textbf{A}}\cap I_i(D)\) for which \({\mathscr {L}}_i({\textbf{A}}) = S\). Note that
Then it is enough to show that with probability \(\gg \delta ^{2d(d+\alpha )}\), the set \({\textbf{A}}\) contains 2d distinct elements \(a_t\) and \(a_t'\), \(1\leqslant t\leqslant d\), such that
To see why this is sufficient, let \(s=\sum _t(a_t'-a_t)\omega ^t\), which we know belongs to \(S={\mathscr {L}}_i({\textbf{A}})\). In particular, there is a compatible map \(\psi \) supported on \(I_i(D)\) such that \(\sum _{a \in {\textbf{A}}} a {\overline{\psi }}(a) = s\). Now, consider the function
with \({\psi '}(a) = \psi (a)\) for \(a\in {\textbf{A}}\cap I_i(D)\), \(\psi '(a_t')={\textbf{1}}-\omega ^t\) and \(\psi '(a_t)=\omega ^t\) for \(1\leqslant t\leqslant d\), and \(\psi '(a)={\textbf{0}}\) for all other values of \(a\in {\textbf{A}}\cap I_{i+1}(D)\). Notice that \(\psi '\) is compatible according to Definition 5.8 by the second part of (6.7). It is now clear that \(0\in {\mathscr {L}}_{i+1}({\textbf{A}})\). Hence, if the conditional probability that (6.7) holds is \(\gg \delta ^{2d(d+\alpha )}\), so is the probability that \(0\in {\mathscr {L}}_{i+1}({\textbf{A}})\).
To find \(a_t\) and \(a_t'\) satisfying (6.7), let
The number of elements \(\sum _t s_t \omega ^t \in S\) with \(n|s_t\) for some t is
as long as D is large enough in terms of \(\delta \) and \(({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\). Thus, there is a subset \(S'\subset S\) of size at least |S|/2 whose elements \(\sum _t s_t\omega ^t\) satisfy \(n\not \mid s_t\) for all t. We will choose the sets \(\{a_t: 1\leqslant t\leqslant d\}\) and \(\{a_t': 1\leqslant t\leqslant d\}\) independently, by selecting \(a_t \equiv 0\pmod {n}\) and \(a_t'\not \equiv 0\pmod {n}\).
Note that
provided that \(i\leqslant (\log D)^{1/3}\). For each given t, i and j, the probability that the interval \([4t N^{(i)}_{j}, (4t+2) N^{(i)}_{j}]\) contains no element \(a_t \equiv 0\pmod {n}\) of \({\textbf{A}}\) equals
for some small positive constant \(\gamma =\gamma (d)\). Thus, the probability that, for each \(t=1,2,\ldots ,d\), the set \({\textbf{A}}\) contains some \(a_t\equiv 0\pmod n\) in the interval \([4t N^{(i)}_{j(t)}, (4t+2) N^{(i)}_{j(t)}]\) is \(\gg 1/n^d\gg \delta ^{d(d+\alpha )}\).
Fix a choice of \(a_1,\ldots , a_d\) as described above, and set
By construction, every coordinate of \(x\in X\) is \(\not \equiv 0\pmod {n}\). Also,
Now the intervals on the right-hand side above are disjoint, and
Thus, by Lemma A.7, with probability \(\gg (\delta ^{d+\alpha })^d\), there are \(a_1',\ldots ,a_d' \in {\textbf{A}}\) such that \((a_1',\ldots ,a_d')\in X\). The relation (6.7) follows for such \(a_t,a_t'\), which exist with probability \(\gg \delta ^{d(d+\alpha )} \cdot \delta ^{d(d+\alpha )}\). \(\square \)
6.4 An iterative argument
To complete the proof of Proposition 5.7, we apply Lemma 6.5 iteratively. Let \({\mathscr {S}}\) be the set of sets S satisfying the assumptions of Lemma 6.5. We say that \({\mathscr {L}}_i({\textbf{A}})\) is large if it satisfies the conclusions of Proposition 6.4, or equivalently if \({\mathscr {L}}_i({\textbf{A}}) = S\) with \(S\in {\mathscr {S}}\). Thus Lemma 6.5 implies that
We conclude that there is some \(\varepsilon = \delta ^{O(1)}\) such that
For brevity, write \(E_i\) for the event that \(0 \notin {\mathscr {L}}_i({\textbf{A}})\), and \(F_i\) for the event that \({\mathscr {L}}_{i}({\textbf{A}})\) is large. In this notation, (6.10) becomes
Moreover, Proposition 6.4 implies that
Lastly, note that \(E_1 \supset E_2 \supset \cdots \) because \({\mathscr {L}}_1({\textbf{A}}) \subset {\mathscr {L}}_2({\textbf{A}}) \subset \cdots \)
We claim that \({\mathbb {P}}(E_i)< 4\delta \) for some \(i\leqslant I:=\lfloor (\log D)^{1/3}\rfloor \). Indeed, for each \(i\leqslant I\), we have
Thus, if \({\mathbb {P}}(E_i) \geqslant 4 \delta \), then \({\mathbb {P}}(E_{i+1}) \leqslant (1 - \varepsilon /2) {\mathbb {P}}(E_i)\). If this holds for all \(i\leqslant I\), then \({\mathbb {P}}(E_I)\leqslant (1-\varepsilon /2)^{I-1}<4\delta \), a contradiction. Therefore, \({\mathbb {P}}(E_{i^*}) < 4\delta \) for some \(i^*\leqslant I\), as long as D is large enough in terms of \(\delta \) and the (fixed) system \(({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\). This completes the proof of Proposition 5.7.
Part III. The optimisation problem
7 The optimisation problem—basic features
In this section we consider Problem 3.7, the optimisation problem on the cube, which is a key feature of our paper. We will give some kind of solution to this for a fixed nondegenerate flag \({\mathscr {V}}\), leaving aside the question of how to choose \({\mathscr {V}}\) optimally.
Let us refresh ourselves on the main elements of the setup of Problem 3.7. We have a nondegenerate, r-step flag
of distinct vector spaces. In light of Lemma 5.4, we may restrict our attention to flags such that
which we henceforth assume. With the flag \({\mathscr {V}}\) fixed, we wish to find \(\gamma _k({\mathscr {V}})\), the supremum of numbers \(c\geqslant 0\) such that there are thresholds
(we may assume that \(c_1=1\) by arguing as in Lemmas 5.3 and 5.4) and probability measures \(\mu _1,\ldots , \mu _r\) on \(\{0,1\}^k\) satisfying \({\text {Supp}}(\mu _j) \subset V_j\) for each j, and such that the entropy condition (3.4) holds, that is to say
for all subflags \({\mathscr {V}}' \leqslant {\mathscr {V}}\). We recall that
Remarks. (a) It is easy to see that \(\gamma _k({\mathscr {V}})\) always exists by considering the following example with \(c=0\). Take \(c_1=1\) and \(c_2=\cdots =c_{r+1}=0\) and recall that \(\dim (V_1/V_0)=1\). Suppose that \(V_1={\text {Span}}({\textbf{1}},\omega )\) with \(\omega \in \{0,1\}^k\). Thus, \(\textrm{e}({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})=1\) for any choice of \({\varvec{\mu }}\). If \(V_1'=V_1\) then likewise we have \(\textrm{e}({\mathscr {V}}',{{\textbf{c}}},{\varvec{\mu }})=1\), and if \(V_1'=V_0\) then \(\textrm{e}({\mathscr {V}}',{{\textbf{c}}},{\varvec{\mu }})={\mathbb {H}}_{\mu _1}(V_0)\). Now \(V_0+{\textbf{1}}\), \(V_0+\omega \) and \(V_0+({\textbf{1}}-\omega )\) are three different cosets. Taking
we have \(\textrm{e}({\mathscr {V}}',{{\textbf{c}}},{\varvec{\mu }})=\log 3\). Thus, (3.4) holds. As we shall see in this section, this choice of \(\mu _1\) is the optimal choice for a very general class of flags, including those of interest to us.
(b) A simple compactness argument shows that the supremum is realised, that is, there is a choice of \({{\textbf{c}}}\) and \({\varvec{\mu }}\) satisfying the entropy condition (3.4) and with \(c_{r+1}=\gamma _k({\mathscr {V}})\).
(c) As long as we can show that \(\gamma _k>0\) (which will be taken care of in Part IV), we can always find an optimal system \(({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\) that also has \(c_j>c_{j+1}\) for each j (cf. Lemma 5.4(a)).
7.1 A restricted optimisation problem
It turns out to be very useful to consider a restricted variant of the problem in which the entropy condition (7.1) is only required to be satisfied for certain “basic” subflags \({\mathscr {V}}'\), rather than all of them.
Definition 7.1
(Basic subflag) Given a flag \({\mathscr {V}}: \langle {\textbf{1}}\rangle = V_0 \leqslant V_1 \leqslant \cdots \leqslant V_r\), the basic subflags \({\mathscr {V}}'_{{\text {basic}}(m)}\) are the ones in which \(V'_i = V_{\min (m,i)}\), for \(m = 0,1,\ldots , r-1\) (note that when \(m = r\) we would recover \({\mathscr {V}}\) itself).
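In other words, the basic subflag \({\mathscr {V}}'_{{\text {basic}}(m)}\) increases with \({\mathscr {V}}\) up to level m and is constant afterwards. A one-line illustration (our hypothetical helper, for intuition only; it records which \(V_j\) occupies each position of the subflag):

```python
def basic_subflag_indices(r, m):
    """Indices j with V'_i = V_j for the basic subflag V'_basic(m):
    the subflag follows the original flag up to level m, then stays put."""
    return [min(m, i) for i in range(r + 1)]

# For a 3-step flag, basic_subflag_indices(3, 1) gives [0, 1, 1, 1]:
# V'_0 = V_0, V'_1 = V_1, V'_2 = V_1, V'_3 = V_1,
# while m = r returns [0, 1, 2, 3], recovering the full flag.
```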
Here is the restricted version of Problem 3.7. Recall that a flag is nondegenerate if the top space \(V_r\) is not contained in any of the subspaces \(\{x\in {\mathbb {R}}^k: x_i=x_j \}\). The restriction to nondegenerate flags ensures that the subsets \(A_1,\ldots ,A_k\) in our main problem are distinct.
Problem 7.2
Let \({\mathscr {V}}\) be a nondegenerate flag of distinct spaces in \({\mathbb {Q}}^k\). Define \(\gamma _k^{{\text {res}}}({\mathscr {V}})\) to be the supremum of all constants \(c\geqslant 0\) for which there are measures \(\mu _1,\ldots , \mu _r\) such that \({\text {Supp}}(\mu _i) \subset V_i\), and parameters
such that the restricted entropy condition
holds for all \(m = 0,1,\ldots , r-1\).
It is clear that
In general there is absolutely no reason to suppose that the two quantities are equal, since after all the restricted entropy condition (7.2) apparently only captures a small portion of the full condition (7.1).
Our reason for studying the restricted problem is that we do strongly believe that
One might think of this unproven assertion, on an intuitive level, in two (roughly equivalent) ways:
-
for those flags optimal for Problem 3.7, the critical cases of (7.1) are those for which \({\mathscr {V}}'\) is basic;
-
for those flags optimal for Problem 3.7, and for the critical choice of the \(c_i, \mu _i\), the restricted condition (7.2) in fact implies the more general condition (7.1).
7.2 The \(\rho \)-equations, optimal measures and optimal parameters
The definitions and constructions of this section will appear unmotivated at first sight. They are forced upon us by the analysis of Sect. 7.5 below.
Let the flag \({\mathscr {V}}\) be fixed.
It is convenient to call the intersection of a coset \(x + V_i\) with the cube \(\{0,1\}^k\) a cell at level i, and to denote the cells at various levels by the letter C. (The terminology comes from the fact that it can be useful to think of \(V_i\) as defining a \(\sigma \)-algebra (partition) on \(\{0,1\}^k\), the equivalence relation being given by \(\omega \sim \omega '\) iff \(\omega - \omega ' \in V_i\); however, we will not generally use the language of \(\sigma \)-algebras in what follows.)
If C is a cell at level i, then it will be a union of cells \(C'\) at level \(i-1\). These cells we call the children of C, and we write \(C \rightarrow C'\).
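The partition into cells is easy to compute mechanically. The following sketch (ours, not the paper's; all function names are hypothetical) groups the vertices of \(\{0,1\}^k\) into cosets of a rational subspace, doing exact linear algebra over \({\mathbb {Q}}\) with `fractions.Fraction`:

```python
from fractions import Fraction
from itertools import product

def reduce_vec(vec, echelon):
    """Reduce vec against rows kept in (pivot, row) echelon form; the
    result is a canonical representative of vec modulo their span."""
    vec = list(vec)
    for piv, row in echelon:
        if vec[piv] != 0:
            c = vec[piv] / row[piv]
            vec = [a - c * b for a, b in zip(vec, row)]
    return vec

def echelon_form(basis):
    """Bring a list of rational vectors into (pivot, row) echelon form."""
    ech = []
    for b in basis:
        v = reduce_vec([Fraction(x) for x in b], ech)
        for piv, x in enumerate(v):
            if x != 0:
                ech.append((piv, v))
                break
    return ech

def cells(k, basis):
    """Partition the vertices of {0,1}^k into the cells (x + V) cap {0,1}^k,
    where V is the rational span of `basis`: two vertices w, w' lie in the
    same cell iff w - w' is in V."""
    ech = echelon_form(basis)
    classes = {}
    for w in product((0, 1), repeat=k):
        key = tuple(reduce_vec([Fraction(x) for x in w], ech))
        classes.setdefault(key, []).append(w)
    return list(classes.values())
```

For \(V_0 = \langle {\textbf{1}}\rangle \) in \({\mathbb {Q}}^4\), `cells(4, [(1, 1, 1, 1)])` produces the cell \(\{{\textbf{0}},{\textbf{1}}\}\) together with fourteen singletons, \(2^4-1=15\) cells in total.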
Let \({\varvec{\rho }} = (\rho _1,\ldots , \rho _{r-1})\) be real parameters in (0, 1), and for each cell C define functions \(f^C({\varvec{\rho }})\) by the following recursive recipe:
-
If C has level 0, then \(f^C({\varvec{\rho }}) = 1\);
-
If C has level i, then
$$\begin{aligned} f^C({\varvec{\rho }}) = \sum _{C \rightarrow C'} f^{C'}({\varvec{\rho }})^{\rho _{i-1}},\end{aligned}$$(7.4)
with the convention that \(\rho _0 = 0\).
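The recursive recipe above translates directly into code. In the sketch below (ours; the nested-list tree encoding is an assumption, not the paper's notation), a cell is represented by the list of its children and \(f^C({\varvec{\rho }})\) is evaluated by the two rules, with the convention \(\rho _0 = 0\), so that a level-1 cell simply receives the number of its children as its value:

```python
def f(cell, rho, level):
    """Evaluate f^C(rho) recursively: a level-0 cell has f = 1, and a
    level-i cell sums its children's values raised to the power rho[i-1].
    `cell` is a nested list whose depth equals its level; rho[0] = 0."""
    if level == 0:
        return 1.0
    return sum(f(child, rho, level - 1) ** rho[level - 1] for child in cell)

# A level-2 cell with nine level-1 children containing 3, 2, 2, 2, 2,
# 1, 1, 1, 1 leaves respectively (the shape arising for the binary flag
# in Q^4 discussed below) gives f = 3^{rho_1} + 4*2^{rho_1} + 4:
gamma2 = [[None] * 3, [None] * 2, [None] * 2, [None] * 2, [None] * 2,
          [None], [None], [None], [None]]
```

Note that `f` depends only on the shape of the tree of cells, a point that recurs in Lemma 7.6 below.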
Write
for the cell at level i which contains \({\textbf{0}}\). Note that
Definition 7.3
(\(\rho \)-equations) The \(\rho \)-equations are the system of equations
We say that they have a solution if they are satisfied with \(\rho _1,\ldots ,\rho _{r-1} \in (0,1)\).
Example
Figure 1 illustrates these definitions for the so-called binary flag in \({\mathbb {Q}}^4\), which will be a key object of study from Sect. 9 onwards. Here
and \(V_2 = {\mathbb {Q}}^4\). The \(\rho \)-equations consist of a single equation, namely \(3^{\rho _1} + 4 \cdot 2^{\rho _1} +4 = 3^{\rho _1} e^2\). This has the unique solution \(\rho _1 \approx 0.306481\).
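The numerical value of \(\rho _1\) is easy to verify; here is a short bisection script (our check, not part of the paper):

```python
import math

def solve_rho1():
    """Find the root in (0,1) of 3^r + 4*2^r + 4 = 3^r * e^2, the single
    rho-equation of the binary flag in Q^4, by bisection."""
    g = lambda r: 3.0 ** r + 4 * 2.0 ** r + 4 - 3.0 ** r * math.e ** 2
    lo, hi = 0.0, 1.0          # g(0) > 0 > g(1), so a root lies between
    for _ in range(60):
        mid = (lo + hi) / 2
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(solve_rho1())  # approximately 0.306481
```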
In general the \(\rho \)-equations may or may not have a solution, but for flags \({\mathscr {V}}\) of interest to us, it turns out that they have a unique such solution. In this case, we make the following definition.
Definition 7.4
(Optimal measures) Suppose that \({\mathscr {V}}\) is a flag for which the \(\rho \)-equations have a solution. Then the corresponding optimal measure \(\mu ^*\) on \(\{0,1\}^k\) with respect to \({\mathscr {V}}\) is defined as follows: we set \(\mu ^*(\Gamma _r) = 1\), and
for any cell C at level \(i \geqslant 1\) and any child \(C \rightarrow C'\). We also set \(\mu ^*({\textbf{0}})=\mu ^*({\textbf{1}}) = \mu ^*(\Gamma _0)/2\). Lastly, we define the restrictions \(\mu ^*_j(\omega ) := \mu ^*(\Gamma _j)^{-1}\mu ^*(\omega )1_{\omega \in \Gamma _j}\) for \(j = 1,2,\ldots , r\) (thus \(\mu ^*_r = \mu ^*\)). We call these optimal measures (on \(\{0,1\}^k\), with respect to \({\mathscr {V}}\)). Finally, we write \({\varvec{\mu }}^* = (\mu _1^*,\mu _2^*,\ldots , \mu _r^*)\).
Remark 7.1
(a) By taking telescoping products of (7.6) for \(i = r, r-1,\ldots , 0\), we see that \(\mu ^*\) is uniquely defined on all cells at level 0, and these are the cell \(\{ {\textbf{0}}, {\textbf{1}}\}\) and singletons \(\{\omega \}\) for all \(\omega \in \{0,1\}^k{\setminus } \{{\textbf{0}},{\textbf{1}}\}\). Since we also specified \(\mu ^*({\textbf{0}})=\mu ^*({\textbf{1}}) = \mu ^*(\Gamma _0)/2\), we see that \(\mu ^*(\omega )\) is completely and uniquely determined by these rules, for all \(\omega \). In particular, the \(\rho \)-equations (7.5) are equivalent to
and thus
In addition, we have
(b) By construction, the measures \(\mu _j^*\) satisfy statements (d) and (e) of Lemma 5.3 for all j:
(c) At the moment, the term “optimal measure” is just a name. We will establish the sense in which (in situations of interest) the measures \(\mu ^*_j\) are optimal in Proposition 7.7 below.
(d) Note that \({\varvec{\mu }}^*\) and \(\mu ^*\) are two different (but closely related) objects. The former is an r-tuple of measures \(\mu _j^*\), all of which are induced from the single measure \(\mu ^*\).
Definition 7.5
(Optimal parameters) Suppose that \({\mathscr {V}}\) is a flag for which the \(\rho \)-equations have a solution. Let \(\mu ^*\) be the corresponding optimal measure on \(\{0,1\}^k\) with respect to \({\mathscr {V}}\). Suppose additionally that
for \(m = 0,1,\ldots , r-1\). Then the corresponding optimal parameters with respect to \({\mathscr {V}}\) and the solution \(\varvec{\rho }\) are the unique choice of
if it exists, such that
The equations (7.11), written out in full, are
\(m = 0,1,\ldots , r-1.\)
By (7.10), this uniquely determines \(c^*_{m+1} \in {\mathbb {R}}\) in terms of \(c_{m+2}^*,\ldots ,c^*_{r+1}\). Hence, we recursively determine \(c^*_1,\ldots ,c^*_r\) in terms of \(c^*_{r+1}\). Since we must further have \(c^*_1=1\), this implicitly determines \(c^*_{r+1}\) as well, and thus the entire vector \({{\textbf{c}}}^*\).
Remark. By Lemma 5.3 (ii), a stronger form of the condition (7.10) is required in order for the entropy gap condition to hold, and so in practice this assumption is not at all restrictive.
We conclude this subsection with a characterization of the optimal measure \(\mu ^*\) and parameters \({{\textbf{c}}}^*\). Given an r-step flag \({\mathscr {V}}\), there is an associated rooted tree \({\mathscr {T}}({\mathscr {V}})\), which captures the structure of the cells at different levels \(0,\ldots ,r-1\). In particular, this tree always has exactly \(2^k-1\) leaves at level 0, corresponding to the cell \(\Gamma _0=\{{\textbf{0}},{\textbf{1}}\}\) and the singletons \(\{\omega \}\) for each \(\omega \in \{0,1\}^k {\setminus } \{{\textbf{0}},{\textbf{1}}\}\).
Lemma 7.6
The optimal constant \(\gamma _k^{{\text {res}}}({\mathscr {V}})\), associated measures \(\mu ^*_i(C)\) and optimal parameters \(c_i^*\) depend only on the tree \({\mathscr {T}}({\mathscr {V}})\) and the sequence of dimensions \(\dim (V_j)\), \(0\leqslant j\leqslant r\).
Proof
Let \({\mathscr {V}}\) and \({\widetilde{{\mathscr {V}}}}\) be different flags with the same tree structure, that is, \({\mathscr {T}}({\mathscr {V}})\) is isomorphic to \({\mathscr {T}}({\widetilde{{\mathscr {V}}}})\), and with the same sequence of dimensions, so that \(\dim (V_j)=\dim ({\widetilde{V}}_j)\) for each j. By an easy induction on the level and the definition of \(f^C(\varvec{\rho })\), if \(C\in {\mathscr {T}}({\mathscr {V}})\) and \({\tilde{C}}\in {\mathscr {T}}({\widetilde{{\mathscr {V}}}})\) correspond, we find that \(f^C(\varvec{\rho }) = f^{{\tilde{C}}}(\varvec{\rho })\). The statements now follow from Definitions 7.4 and 7.5. \(\square \)
7.3 Solution of the optimisation problem: statement
Here is the main result of this section, which explains the introduction of the various concepts above, as well as their names.
Proposition 7.7
Suppose that \({\mathscr {V}}: {\textbf{1}} = V_0 \leqslant V_1 \leqslant \cdots \leqslant V_r \leqslant {\mathbb {Q}}^k\) is a nondegenerate flag such that \(\dim (V_1/V_0)=1\) and the \(\rho \)-equations have a solution. Let \({\varvec{\mu }}^*\) be the corresponding optimal measures, and suppose that the corresponding optimal parameters \({{\textbf{c}}}^*\) exist. Then
Moreover, the optimal measures \({\varvec{\mu }}^*\) and optimal parameters \({{\textbf{c}}}^*\) provide the solution to Problem 7.2; in particular, \(c^*_{r+1}\) is precisely the right-hand side of (7.13).
For this result to be of any use, we need methods for establishing, for flags \({\mathscr {V}}\) of interest, that the \(\rho \)-equations have a solution, and also that the optimal parameters exist. The former is a very delicate matter, highly dependent on the specific structure of the flags of interest. Once this is sorted out, the latter problem is less serious, at least in situations relevant to us.
7.4 Linear forms in entropies
In the next section we will prove Proposition 7.7. In this section we isolate some lemmas from the proof.
Let \({\mathscr {V}}: \langle {\textbf{1}} \rangle = V_0 \leqslant \cdots \leqslant V_r \leqslant {\mathbb {Q}}^k\) be a flag. We use the terminology of cells C at level i, introduced at the beginning of Sect. 7.2.
Lemma 7.8
Let \({\textbf{y}}= (y_0,\ldots , y_{r-1})\) be real numbers with the property that all the partial sums \(y_{< i} := y_0 + \cdots + y_{i-1}\) are positive. If C is a cell (at some level i), then we write
where the supremum is over all probability measures \(\mu _C\) supported on C.
-
(a)
The quantities \(h^C({\textbf{y}})\) are completely determined by the following rules:
-
If C has level 0, then \(h^C({\textbf{y}}) = 0\);
-
If C has level i, then
$$\begin{aligned} h^C({\textbf{y}}) = y_{< i}\log \Big (\sum _{C':\, C \rightarrow C'} e^{h^{C'}({\textbf{y}})/y_{< i}}\Big ). \end{aligned}$$(7.15)
-
(b)
For any C, the maximum in (7.14) occurs for a unique measure \(\mu ^*_{C,{\textbf{y}}}\). Furthermore, all of the \(\mu ^*_{C,{\textbf{y}}}\) are restrictions of the “top” measure \(\mu ^*_{{\textbf{y}}} := \mu ^*_{\Gamma _r,{\textbf{y}}}\), that is to say \(\mu ^*_{C,{\textbf{y}}}(x) = \mu ^*_{{\textbf{y}}}(x)/\mu ^*_{{\textbf{y}}}(C)\) for all \(x \in C\), and
$$\begin{aligned} \frac{\mu ^*_{{\textbf{y}}}(C')}{\mu ^*_{{\textbf{y}}}(C)} = \frac{e^{h^{C'}({\textbf{y}}/y_{<i})}}{e^{h^{C}({\textbf{y}}/y_{<i})}}. \end{aligned}$$(7.16)
Remark
As will be apparent from the proof, we do not use the linear structure of the cells C (that is, the fact that they come from cosets). We leave it to the reader to formulate a completely general version of this lemma in which the cells at level i are the atoms in a \(\sigma \)-algebra \({\mathscr {F}}_i\), with \({\mathscr {F}}_{i}\) being a refinement of \({\mathscr {F}}_{i+1}\) for all i.
Proof
We prove both parts simultaneously. Let us temporarily write \({{\tilde{h}}}^C({\textbf{y}})\) for the function defined by (7.15); thus the aim is to prove that \(h^C({\textbf{y}}) = {{\tilde{h}}}^C({\textbf{y}})\), where \(h^C({\textbf{y}})\) is defined in (7.14). We do this by induction on i, the \(i=0\) case being trivial since, in this case, all the entropies \({\mathbb {H}}_{\mu _C}(V_m)\) are zero: each cell of level 0 lies in a single coset mod \(V_0\), and thus in a single coset mod \(V_m\) for \(m=0,1,\ldots ,r-1\).
Suppose now that we know the result for cells of level \(i -1\). Note that both \(h^C\) and \({{\tilde{h}}}^C\) satisfy a homogeneity property
This is obvious for \(h^C\), and can be proven very easily for \({{\tilde{h}}}^C\) by induction. Therefore we may assume that \(y_{< i} = 1\). This does not affect the measure \(\mu ^*_{{\textbf{y}}}\), which does not depend on the scaling of the parameters \(y_m\).
Suppose that C is a cell at level i. A probability measure \(\mu _C\) on C is completely determined by probability measures \(\mu _{C'}\) on the children \(C'\) of C (at level \(i - 1\)) together with the probabilities \(\mu _C(C')\), which must sum to 1, with the relation being that \(\mu _{C'}(x) = \mu _C(x)/\mu _C(C')\) for \(x\in C'\).
Suppose that \(0\leqslant m < i\). Let the random variables X, Y be random cosets of \(V_m, V_{i-1}\) respectively, sampled according to the measure \(\mu _C\). Then X determines Y and so, by Lemma B.5, \({\mathbb {H}}(X,Y) = {\mathbb {H}}(X)\). The chain rule for entropy, Lemma B.4, then yields
Translated back to the language we are using, this implies that
Therefore
(Here we used our assumption that \(y_{< i} = 1\).) Since \({\mathbb {H}}_{\mu _C}(V_m) = 0\) for \(m \geqslant i\), and \({\mathbb {H}}_{\mu _{C'}}(V_m) = 0\) for \(m \geqslant i - 1\), we may extend the sums over all \(m\in \{0,1,\ldots ,r-1\}\) thereby obtaining
Since the \(\mu _{C'}\) can be arbitrary probability measures, and \({\mathbb {H}}_{\mu _C}(V_{i-1})\) depends only on the value of \(\mu _C(C')\), it follows from the inductive hypothesis that
with equality when going from (7.18) to (7.19) when \(\mu _{C'} = \mu ^*_{C',{\textbf{y}}}\) for all \(C'\). Applying Lemma B.3 with the \(p_j\) being the \(\mu _C(C')\) and the \(a_j\) being the \({{\tilde{h}}}^{C'}({\textbf{y}})\), and noting that \({\mathbb {H}}_{\mu _C}(V_{i-1}) = {\mathbb {H}}({\textbf{p}})\) (where \({\textbf{p}} = (p_1,p_2,\ldots )\)), it follows that
In addition, Lemma B.3 implies that equality occurs in (7.20) precisely when \(p_j = e^{a_j}/\sum _{j'} e^{a_{j'}}\), that is to say when
(Here we used again that \(y_{<i}=1\).) Recalling that \(\mu _{C'} = \mu ^*_{C',{\textbf{y}}}\) for all \(C'\), we see that the measure \(\mu _C\) for which equality occurs in (7.17) is the restriction of \(\mu ^*_{{\textbf{y}}} = \mu ^*_{\Gamma _r,{\textbf{y}}}\) to C. This completes the inductive step. \(\square \)
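The fact from Lemma B.3 used here is the standard Gibbs variational principle: over probability vectors \({\textbf{p}}\), the quantity \({\mathbb {H}}({\textbf{p}}) + \sum _j p_j a_j\) is maximised at the "softmax" weights \(p_j = e^{a_j}/\sum _{j'} e^{a_{j'}}\), with maximum value \(\log \sum _j e^{a_j}\). As a quick numerical illustration (not part of the proof, and using arbitrary illustrative values of the \(a_j\)):

```python
import math, random

def score(p, a):
    # H(p) + sum_j p_j * a_j, with natural-log entropy
    return sum(-x * math.log(x) for x in p if x > 0) + sum(x * y for x, y in zip(p, a))

random.seed(0)
a = [0.3, -1.2, 2.0, 0.5]
Z = sum(math.exp(x) for x in a)
softmax = [math.exp(x) / Z for x in a]
best = score(softmax, a)
# the optimum value is log(sum of e^{a_j})
assert abs(best - math.log(Z)) < 1e-12

# any other probability vector scores no better
for _ in range(1000):
    w = [random.random() for _ in a]
    s = sum(w)
    p = [x / s for x in w]
    assert score(p, a) <= best + 1e-9
```
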
7.5 Solution of the optimisation problem: proof
This section is devoted to the proof of Proposition 7.7. Strictly speaking, for our main theorems we only need a lower bound on \(\gamma _k^{{\text {res}}}({\mathscr {V}})\), and for this it suffices to show that \(c_{r+1}^*\) is given by the right-hand side of (7.13). This could, in principle, be phrased as a calculation, but it would look complicated and unmotivated. Instead, we present it in the way we discovered it, by showing that the RHS of (7.13) is an upper bound on \(\gamma _k^{{\text {res}}}({\mathscr {V}})\), and then observing that equality does occur when \(\mu = \mu ^*\) is the optimal measure (Definition 7.4) and \({{\textbf{c}}}= {{\textbf{c}}}^*\) the optimal parameters (Definition 7.5). We establish this upper bound using the duality argument from linear programming and Lemma 7.8.
To ease the notation, we use the shorthand \(d_i := \dim (V_i)\) throughout this subsection. Let us, then, consider the restricted optimisation problem, namely Problem 7.2. The condition (7.2) may be rewritten as
for \(m = 0,1,\ldots , r-1\). Therefore for any choice of “dual variables” \({\textbf{y}}= (y_0,y_1,\ldots ,\) \(y_{r-1})\), \(y_0,\ldots , y_{r-1} \geqslant 0\), we have
which, upon rearranging, gives
where
for \(j = 1,\ldots , r\), and
Since the \(c_j - c_{j+1}\), \(j = 1,\ldots , r\), and \(c_{r+1}\) are nonnegative and sum to 1, this implies that
By Lemma 7.8, this implies that
where
for \(j = 1,\ldots , r\), and \(\mu ^*_{\Gamma _j,{\textbf{y}}}\) is the measure \(\nu \) supported on \(\Gamma _j = V_j \cap \{0,1\}^k\) for which the sum \(\sum _m y_m {\mathbb {H}}_{\nu }(V_m)\) is maximal, as defined in Lemma 7.8.
Now we specify a choice of \({\textbf{y}}\). To do this, we make a change of variables, defining \(\rho _i = y_{< i}/y_{< i+1}\). Note that for fixed \(y_0 > 0\), choices of \(y_1,\ldots , y_{r-1}> 0\) are in one-to-one correspondence with choices of \(\rho _1,\ldots , \rho _{r-1}\) with \(0< \rho _i < 1\). We must then have that
for the cells C at level i, which may easily be proven by induction on the level i, using the defining equations for the \(h^C\) and \(f^C\) (see (7.15), (7.4) respectively).
Now choose the \(\rho _i\) to satisfy the \(\rho \)-equations (7.5). By virtue of (7.27), the j-th \(\rho \)-equation
with \(j\in \{1,2,\ldots ,r-1\}\) is equivalent to
with \(E'_j({\textbf{y}})\) defined as in (7.26) above.
Recall that \(d_1-d_0=\dim (V_1/V_0)=1\). Thus, if we choose
a short calculation confirms that
With this choice of \({\textbf{y}}\) we therefore have, from (7.28) with \(j = 1,\ldots , r-1\), (7.29) and (7.25),
In the above analysis, the \(\mu _i\) and the \(c_i\) were arbitrary subject to the conditions of Problem 7.2, thus \({\text {Supp}}(\mu _i) \subset V_i\) and \(1 = c_1> c_2> \cdots > c_{r+1}\). Therefore, recalling the definition of \(\gamma _k^{{\text {res}}}({\mathscr {V}})\) (see Problem 7.2), we have proven that
Proposition 7.7 asserts that equality occurs in this bound when \(c_j = c^*_j\) and \(\mu _j = \mu ^*_j\), where \({{\textbf{c}}}^* = (c_1^*,\ldots , c^*_{r+1})\) are the optimal parameters defined in Definition 7.5, and \(\mu ^*\) and its restrictions \(\mu ^*_j\) are the optimal measures defined in Definition 7.4. To establish this, we must go back through the argument showing that equality occurs at every stage with these choices.
First note that (7.21) is equivalent (as we stated at the time) to \(\textrm{e}({\mathscr {V}}'_{{\text {basic}}(m)},{{\textbf{c}}},{\varvec{\mu }}) \geqslant \textrm{e}({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\). The fact that equality occurs here when \({{\textbf{c}}}= {{\textbf{c}}}^*\) and \({\varvec{\mu }}= {\varvec{\mu }}^*\) is essentially the definition of the optimal parameters \({{\textbf{c}}}^*\) (Definition 7.5). That equality occurs in (7.22) and (7.23) is then automatic.
Working from the other end of the proof, the choice of \({\textbf{y}}\) was made so that \(E'_1({\textbf{y}}) = \cdots = E'_r({\textbf{y}}) = E_{r+1}({\textbf{y}})\). We claim that, with this choice of \({\textbf{y}}\),
By (7.16), it suffices to check that
This follows immediately from (7.6) and (7.27).
Since \(\mu ^*_j\) is defined to be the restriction of \(\mu ^*\) to \(\Gamma _j\), it follows from (7.31) that \(\mu ^*_j = \mu ^*_{\Gamma _j,{\textbf{y}}}\), and hence that \(E_j({\textbf{y}}) = E'_j({\textbf{y}})\) for \(j = 1,\ldots , r\).
Thus all \(2r + 1\) of the quantities \(E'_j({\textbf{y}})\) (\(j = 1,\ldots , r\)) and \(E_j({\textbf{y}})\) (\(j = 1,\ldots , r+1\)) are equal. It follows from this and the fact that equality occurs in (7.23) that equality occurs in (7.24), (7.25) and (7.30) as well. This concludes the proof of Proposition 7.7. \(\square \)
8 The strict entropy condition
8.1 Introduction
Fix an r-step, nondegenerate flag \({\mathscr {V}}\). In the previous section, we studied a restricted optimisation problem (Problem 7.2) asking for the supremum of \(c_{r+1}\) when ranging over all systems \(({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\) satisfying the “restricted entropy condition”
The aim of the present section is two-fold: we wish to establish, under general conditions, that an “optimal system” with respect to (8.1) satisfies the more general entropy condition
In addition, we want to show that if we slightly perturb such a system, we may guarantee the strict entropy condition (3.5), which is a version of (8.2) with strict inequalities for all proper subflags \({\mathscr {V}}'\) of \({\mathscr {V}}\).
Before stating our result, we need to define the notion of the automorphism group of a flag.
Definition 8.1
(Automorphism group) For a permutation \(\sigma \in S_k\) and \(\omega =(\omega _1,\ldots ,\omega _k)\in {\mathbb {Q}}^k\), denote by \(\sigma \omega \) the usual coordinate permutation action \(\sigma \omega = (\omega _{\sigma (1)},\ldots ,\omega _{\sigma (k)})\). The automorphism group \({\text {Aut}}({\mathscr {V}})\) is the group of all \(\sigma \) that satisfy \(\sigma V_i=V_i\) for all i.
Proposition 8.2
Let \({\mathscr {V}}\) be an r-step, nondegenerate flag of distinct spaces. Assume that the \(\rho \)-equations (7.5) have a solution, and define the optimal measures \({\varvec{\mu }}^*\) on \(\{0,1\}^k\) as in Definition 7.4. Furthermore, assume that:
(a) no intermediate subspace is fixed by \({\text {Aut}}({\mathscr {V}})\), that is to say there is no space W that is invariant under the action of \({\text {Aut}}({\mathscr {V}})\) and such that \(V_{i-1}< W <V_i\) (the inclusions being strict);
(b) the optimal parameters \({{\textbf{c}}}^*\) exist and they are distinct and positive, that is to say the system of Eq. (7.12) has a unique solution \({{\textbf{c}}}^*\) satisfying \(1=c_1^*> c_2^*> \cdots> c_{r+1}^*>0\);
(c) the following “positivity inequalities” hold:
(i) \({\mathbb {H}}_{\mu ^*_{m+1}}(V_m)>\dim (V_{m+1}/V_m)\) for \(0\leqslant m\leqslant r-1\);
(ii) \({\mathbb {H}}_{\mu ^*_i}(V_{m-1})-{\mathbb {H}}_{\mu ^*_i}(V_m) < \dim (V_m/V_{m-1})\) for \(1\leqslant m<i\leqslant r\).
Then, for every \(\varepsilon >0\), there exists a perturbation \({\tilde{{{\textbf{c}}}}}\) of \({{\textbf{c}}}^*\) such that \(1 = {{\tilde{c}}}_1> {{\tilde{c}}}_2>\cdots > {{\tilde{c}}}_{r+1}\geqslant c^*_{r+1}-\varepsilon \) and such that we have the strict entropy condition
We assume throughout the rest of the section that (a), (b) and (c) of Proposition 8.2 are satisfied, and we now fix the system \(({\mathscr {V}},{{\textbf{c}}}^*,{\varvec{\mu }}^*)\). For notational brevity in what follows, we write
Our strategy is as follows. First, we show the weaker “unperturbed” statement that
noting that we have strict inequality for certain subflags \({\mathscr {V}}'\) along the way. Then, in Sect. 8.8, we show how to perturb \({{\textbf{c}}}^*\) to \({\tilde{{{\textbf{c}}}}}\) so that the strict inequality (8.3) is satisfied. We also sketch a second way of effecting the perturbation which is in a sense more robust, but which in essence requires a perturbation of the whole proof of (8.4).
8.2 Analysis of non-basic flags
We turn now to the task of proving (8.4). We will prove it for progressively wider sets of subflags \({\mathscr {V}}'\), each time using the previous statement. In order, we will prove it for subflags \({\mathscr {V}}'\) which we call:
(a) semi-basic: flags
$$\begin{aligned} {\mathscr {V}}':V_0\leqslant V_1 \leqslant V_2 \leqslant \cdots \leqslant V_{m-1} \leqslant \cdots \leqslant V_{m-1} \leqslant V_m \leqslant \cdots \leqslant V_m \end{aligned}$$
with \(m\geqslant 1\) (that is, \({\mathscr {V}}'\) is like a basic flag, but there can be more than one copy of \(V_{m-1}\));
(b) standard: each \(V'_i\) is one of the spaces \(V_j\);
(c) invariant: this means that \(\sigma V'_i = V'_i\) for all automorphisms \(\sigma \in {\text {Aut}}({\mathscr {V}})\) and all i;
(d) general subflags, i.e. we assume no restriction on the \(V'_i\) other than that \(V'_i \leqslant V_i\).
Note that a semi-basic flag is standard, a standard flag is invariant, and of course an invariant flag is general.
We introduce some notation for standard flags. Let \(J \subset {\mathbb {N}}_0^r\) be the set of all r-tuples \(\varvec{j} = (j_1,\ldots , j_r)\) such that \(j_1 \leqslant \cdots \leqslant j_r\) and \(j_i \leqslant i\) for all i. Then we define the flag \({\mathscr {V}}'_{\varvec{j}} = {\mathscr {V}}'_{(j_1,\ldots , j_r)}\) to be the one with \(V'_i = V_{j_i}\). This is a standard flag, and conversely every standard flag is of this form. If we define \({\text {basic}}(m)\) to be the tuple \((1,2,\ldots ,m,m,\ldots ,m)\) with \(j_i = \min (i,m)\), then \({\text {basic}}(m) \in J\), and \({\mathscr {V}}'_{{\text {basic}}(m)}\) agrees with our previous notation.
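As a small combinatorial sanity check (taking \({\text {basic}}(m)\) to be the tuple with \(j_i = \min (i,m)\), consistent with the boundary cases of Definition 8.4 below), one can enumerate J for small r; the count appears to match the Catalan numbers, a standard ballot-sequence fact which we do not need in the paper.

```python
from itertools import product
from math import comb

def J(r):
    # nondecreasing r-tuples (j_1,...,j_r) with 0 <= j_i <= i
    return [j for j in product(range(r + 1), repeat=r)
            if all(j[i] <= i + 1 for i in range(r))
            and all(j[i] <= j[i + 1] for i in range(r - 1))]

def basic(m, r):
    # the standard tuple (1,2,...,m,m,...,m), i.e. j_i = min(i, m)
    return tuple(min(i, m) for i in range(1, r + 1))

for r in range(1, 5):
    Jr = J(r)
    # |J| equals the Catalan number C_{r+1}
    assert len(Jr) == comb(2 * (r + 1), r + 1) // (r + 2)
    for m in range(r + 1):
        assert basic(m, r) in Jr
```
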
8.3 Semi-basic subflags
In this subsection we prove the following result, establishing that (8.4) holds for semi-basic subflags, and with strict inequality for those which are not basic.
Lemma 8.3
(Assuming that (a), (b) and (c) of Proposition 8.2 hold) we have \(\textrm{e}({\mathscr {V}}') > \textrm{e}({\mathscr {V}})\) for all non-basic, semi-basic flags \({\mathscr {V}}'\).
We begin by setting a small amount of notation for semi-basic flags. We note that the idea of a semi-basic flag, which looks rather ad hoc, will only be used here and in Sect. 8.5.
Definition 8.4
(Semi-basic flags that are not basic) Suppose that \(1 \leqslant m \leqslant r - 1\) and that \(m \leqslant s \leqslant r-1\). Then we define the element \({\text {semi}}(m,s) \in J\) to be the tuple \(\varvec{j} = (1,2,\ldots , m-1, m-1, \ldots , m-1, m,\ldots , m)\) given by \(j_i = i\) for \(i \leqslant m-1\), \(j_i = m-1\) for \(m \leqslant i \leqslant s\) and \(j_i = m\) for \(i > s\).
It is convenient and natural to extend the notation to \(s = m-1\) and \(s = r\), by defining \({\text {semi}}(m,m-1) := {\text {basic}}(m)\) and \({\text {semi}}(m,r) := {\text {basic}}(m-1)\), consistently with the formula above.
One can think of the semi-basic flags as interpolating between the basic flags.
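The boundary cases of this interpolation can be checked mechanically; the following sketch (with \({\text {basic}}(m)\) taken to be the tuple \(j_i = \min (i,m)\)) verifies that \(s = m-1\) recovers \({\text {basic}}(m)\) and \(s = r\) recovers \({\text {basic}}(m-1)\).

```python
def basic(m, r):
    # the standard tuple with j_i = min(i, m)
    return tuple(min(i, m) for i in range(1, r + 1))

def semi(m, s, r):
    # j_i = i for i <= m-1, j_i = m-1 for m <= i <= s, j_i = m for i > s
    return tuple(i if i <= m - 1 else (m - 1 if i <= s else m)
                 for i in range(1, r + 1))

for r in range(2, 7):
    for m in range(1, r):
        assert semi(m, m - 1, r) == basic(m, r)   # s = m-1 gives basic(m)
        assert semi(m, r, r) == basic(m - 1, r)   # s = r gives basic(m-1)
```
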
Example
When \(r = 3\) there are three semi-basic flags \({\mathscr {V}}_{\varvec{j}}\) that are not basic, corresponding to \(\varvec{j} = {\text {semi}}(1,1) = (0,1,1)\), \(\varvec{j} = {\text {semi}}(1,2) = (0,0,1)\) and \(\varvec{j} = {\text {semi}}(2,2) = (1,1,2)\).
Proof of Lemma 8.3
Assume that \({\mathscr {V}}'\) is semi-basic but not basic. We will show that
for \(m \leqslant s \leqslant r-1\). Since \({\mathscr {V}}'_{{\text {semi}}(m ,r)} = {\mathscr {V}}'_{{\text {basic}}(m-1)}\) is basic, this establishes Lemma 8.3.
To prove (8.6), we simply compute that
when \(m \leqslant s \leqslant r-2\), and
In both cases, the result follows from part (ii) of condition (c) of Proposition 8.2; in the second case, we also need to use our assumption that \(c_{r+1}^*\geqslant 0\). \(\square \)
8.4 Submodularity inequalities
To proceed further, we make heavy use of a submodularity property of the expressions \(\textrm{e}()\).
Suppose that \({\mathscr {V}}', {\tilde{{\mathscr {V}}}}'\) are two subflags of \({\mathscr {V}}\). We can define the sum \({\mathscr {V}}' + {\tilde{{\mathscr {V}}}}'\) and intersection \({\mathscr {V}}' \cap {\tilde{{\mathscr {V}}}}'\) by
and
Both of these are indeed subflags of \({\mathscr {V}}\).
Lemma 8.5
We have
Proof
We first note that the entropies \({\mathbb {H}}_{\mu }(W)\) satisfy a submodularity inequality. Namely, if \(W_1, W_2\) are subspaces of \({\mathbb {Q}}^k\) and \(\mu \) is a probability measure then
To prove this, consider the following three random variables:
- X is a random coset of \(W_1 + W_2\), sampled according to the measure \(\mu \);
- Y is a random coset of \(W_1\), sampled according to the measure \(\mu \);
- Z is a random coset of \(W_2\), sampled according to the measure \(\mu \).
Then, more-or-less by definition,
Note also that Y determines X and so \({\mathbb {H}}(Y) = {\mathbb {H}}(X,Y)\), and similarly \({\mathbb {H}}(Z) = {\mathbb {H}}(X,Z)\). Finally, (Y, Z) uniquely defines a random coset of \(W_1 \cap W_2\), and so
The inequality to be proven, (8.7), is therefore equivalent to
which is a standard entropy inequality (Lemma B.6; usually known as “submodularity of entropy” or “Shannon’s inequality” in the literature).
Lemma 8.5 is essentially an immediate consequence of (8.7) and the formula
(It is very important that this formula holds with equality, as compared to (8.7), which holds only with an inequality.) \(\square \)
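The entropy inequality invoked above, \({\mathbb {H}}(X,Y) + {\mathbb {H}}(X,Z) \geqslant {\mathbb {H}}(X) + {\mathbb {H}}(X,Y,Z)\), can be tested numerically for arbitrary joint distributions; the following sketch (purely illustrative, with randomly generated joint laws) checks it over a few hundred trials.

```python
import itertools, math, random

def H(pmf):
    # Shannon entropy (natural log) of a pmf given as {outcome: probability}
    return -sum(p * math.log(p) for p in pmf.values() if p > 0)

def marginal(joint, keep):
    # marginal pmf of the coordinates listed in `keep`
    out = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out

random.seed(1)
outcomes = list(itertools.product(range(3), repeat=3))
for _ in range(200):
    # a random joint pmf of (X, Y, Z), each variable taking 3 values
    w = [random.random() for _ in outcomes]
    s = sum(w)
    joint = {o: x / s for o, x in zip(outcomes, w)}
    lhs = H(marginal(joint, (0, 1))) + H(marginal(joint, (0, 2)))
    rhs = H(marginal(joint, (0,))) + H(joint)
    # submodularity of entropy: H(X,Y) + H(X,Z) >= H(X) + H(X,Y,Z)
    assert lhs >= rhs - 1e-9
```
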
This has the following immediate corollary when applied to standard subflags. Here, the \(\max \) and \(\min \) are taken coordinatewise.
Corollary 8.6
Suppose that \(\varvec{j}_1, \varvec{j}_2 \in J\). Then
8.5 Standard subflags
Now we extend the result of Sect. 8.3 to all standard subflags.
Lemma 8.7
(Assuming that (a), (b) and (c) of Proposition 8.2 hold) we have \(\textrm{e}({\mathscr {V}}') > \textrm{e}({\mathscr {V}})\) for all standard, non-basic subflags \({\mathscr {V}}'\leqslant {\mathscr {V}}\).
Proof
Let \(\varvec{j} \in J\) with \(\varvec{j}\) non-basic, and let \({\mathscr {V}}'={\mathscr {V}}'_{\varvec{j}}\). Then \(r\geqslant 3\), since when \(r\leqslant 2\) all standard flags are basic. We proceed by induction on \(\Vert \varvec{j}\Vert _\infty \), the case \(\Vert \varvec{j}\Vert _\infty =1\) being trivial, since then \({\mathscr {V}}'\) is semi-basic and we may invoke Lemma 8.3. Now suppose we have proved \(\textrm{e}({\mathscr {V}}') > \textrm{e}({\mathscr {V}})\) for all non-basic standard flags \({\mathscr {V}}'={\mathscr {V}}'_{\varvec{j}}\) with \(\Vert \varvec{j}\Vert _\infty < m\), and let \(\varvec{j} \in J\) be non-basic with \(\Vert \varvec{j}\Vert _\infty =m\). We apply Corollary 8.6 with \(\varvec{j}_1 = \varvec{j}\) and \(\varvec{j}_2 = {\text {basic}}(j_r - 1)\). Noting that \(\max (\varvec{j}, {\text {basic}}(j_r - 1)) = {\text {semi}}(j_r, s)\), where s is the largest index such that \(j_s < j_r\), we see that
where
Suppose that both of the flags on the right of (8.8) are basic. If \({\text {semi}}(j_r, s)\) is basic then it must be \({\text {basic}}(j_r)\), which means that \(s = j_r - 1\). But then \(\varvec{j}_* = (j_1,\ldots , j_s, j_r - 1,\ldots , j_r - 1)\) which, if it is basic, must be \({\text {basic}}(j_r - 1)\); this then implies that \(j_i = i\) for \(1 \leqslant i \leqslant s\), and hence that \(\varvec{j} = {\text {basic}}(j_r)\), a contradiction. Thus, at least one of the two flags \(\varvec{j}_*, {\text {semi}}(j_r,s)\) on the right of (8.8) is not basic. Since \(\Vert \varvec{j}_* \Vert _{\infty } < \Vert \varvec{j} \Vert _{\infty }=m\), the induction hypothesis together with Lemma 8.3 implies that \(\textrm{e}({\mathscr {V}}') > \textrm{e}({\mathscr {V}})\), as desired. \(\square \)
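The coordinatewise max/min identities used in this induction are easy to verify on a concrete example. The sketch below (with \({\text {basic}}\) and \({\text {semi}}\) as in Definition 8.4 and its boundary cases, and an illustrative choice of \(\varvec{j}\)) checks them for \(r = 4\).

```python
def basic(m, r):
    # the standard tuple with j_i = min(i, m)
    return tuple(min(i, m) for i in range(1, r + 1))

def semi(m, s, r):
    # j_i = i for i <= m-1, j_i = m-1 for m <= i <= s, j_i = m for i > s
    return tuple(i if i <= m - 1 else (m - 1 if i <= s else m)
                 for i in range(1, r + 1))

r = 4
j = (1, 1, 2, 3)          # a non-basic element of J
jr = j[-1]                # j_r = 3
b = basic(jr - 1, r)      # basic(2) = (1, 2, 2, 2)
jmax = tuple(max(a, c) for a, c in zip(j, b))
jmin = tuple(min(a, c) for a, c in zip(j, b))
s = max(i for i in range(1, r + 1) if j[i - 1] < jr)  # largest index with j_s < j_r
assert s == 3
assert jmax == semi(jr, s, r)               # coordinatewise max is semi(j_r, s)
assert jmin == j[:s] + (jr - 1,) * (r - s)  # j_* = (j_1,...,j_s, j_r-1,...,j_r-1)
```
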
8.6 Invariant subflags
We now extend our results to all invariant subflags, though without the strict inequality.
Lemma 8.8
(Assuming that (a), (b) and (c) of Proposition 8.2 hold) we have \(\textrm{e}({\mathscr {V}}') \geqslant \textrm{e}({\mathscr {V}})\) for all invariant subflags \({\mathscr {V}}'\leqslant {\mathscr {V}}\).
Proof
We associate to \({\mathscr {V}}'\) a pair \((i,\ell )\) of integers with \(i\geqslant \ell \), which we call the signature, in the following manner. If \({\mathscr {V}}'\) is standard, then set \((i,\ell )=(-1,-1)\). Otherwise, let i be maximal so that \(V'_i\) is not a standard space \(V_t\), and then let \(\ell \) be minimal such that \(V'_i \leqslant V_{\ell }\). The fact that \(\ell \leqslant i\) is immediate from the definition of a subflag. We put a partial ordering on signatures as follows: \((i', \ell ') \preceq (i,\ell )\) iff \(i' < i\), or \(i' = i\) and \(\ell ' \leqslant \ell \). We proceed by induction on the pair \((i,\ell )\) with respect to this ordering, the case \((i,\ell )=(-1,-1)\) being handled by Lemma 8.7.
For the inductive step, suppose \({\mathscr {V}}'\) is nonstandard with signature \((i,\ell )\). By submodularity,
where
Suppose that \({\mathscr {V}}_1, {\mathscr {V}}_2\) have signatures \((i_1,\ell _1),(i_2,\ell _2)\), respectively. We show that
Both \({\mathscr {V}}_1\) and \({\mathscr {V}}_2\) are invariant flags. Thus, if (8.10) holds, then both flags on the right-hand side of (8.9) have strictly smaller signature than \({\mathscr {V}}'\), and the lemma follows by induction.
Finally, we prove (8.10). Note that if \(j>i\), then \(V_j'\) is a standard space \(V_m\) and thus so are \(({\mathscr {V}}_1)_j\) and \(({\mathscr {V}}_2)_j\). In particular, \(i_1\leqslant i\) and \(i_2\leqslant i\). We have that \(({\mathscr {V}}_2)_i\) contains \(V_{\ell -1}\), is not equal to \(V_{\ell -1}\), and is contained in \(V_\ell \). But \(({\mathscr {V}}_2)_i\) is invariant, and hence by our assumption that (a) of Proposition 8.2 holds, \(({\mathscr {V}}_2)_i=V_\ell \). Consequently, \(i_2<i\) if \({\mathscr {V}}_2\) is nonstandard. In the case that \({\mathscr {V}}_1\) is nonstandard, we also have that \(\ell _1<\ell \) because every space in the flag \({\mathscr {V}}_1\) is contained in \(V_{\ell - 1}\). This proves (8.10). \(\square \)
8.7 General subflags
In this section we establish (8.4), that is to say the inequality \(\textrm{e}({\mathscr {V}}')\geqslant \textrm{e}({\mathscr {V}})\) for all subflags \({\mathscr {V}}'\), of course subject to our standing assumption that (a), (b) and (c) of Proposition 8.2 hold. We need a simple lemma about the action of the automorphism group \({\text {Aut}}({\mathscr {V}})\) on subflags.
Lemma 8.9
Let \(\sigma \in {\text {Aut}}({\mathscr {V}})\) and let \({\mathscr {V}}'\) be a subflag of \({\mathscr {V}}\). Then one may define a new subflag \(\sigma ({\mathscr {V}}')\), setting \(\sigma ({\mathscr {V}}')_i := \sigma (V'_i)\). Moreover, \(\textrm{e}(\sigma ({\mathscr {V}}')) = \textrm{e}({\mathscr {V}}')\).
Proof
Since \({\mathscr {V}}'\) is a subflag, \(V'_i \leqslant V_i\). Applying \(\sigma \), and recalling that \(V_i\) is invariant under \(\sigma \), we see that \(\sigma (V'_i) \leqslant V_i\). Therefore \(\sigma ({\mathscr {V}}')\) is also a subflag. To see that \(\textrm{e}(\sigma ({\mathscr {V}}')) = \textrm{e}({\mathscr {V}}')\), recall Lemma 7.6, which implies that \(\mu _i\) is invariant under \(\sigma \), since the trees \({\mathscr {T}}({\mathscr {V}}')\) and \({\mathscr {T}}(\sigma ({\mathscr {V}}'))\) are isomorphic and we have \(\dim (V_j')=\dim (\sigma (V_j'))\) for all j. It follows that, for any subspace \(W \leqslant {\mathbb {Q}}^k\),
This completes the proof of the lemma. \(\square \)
Proof of (8.4)
Let m be the minimum of \(\textrm{e}({\mathscr {V}}')\) over all subflags \({\mathscr {V}}' \leqslant {\mathscr {V}}\), and among the flags with \(\textrm{e}({\mathscr {V}}')=m\), take the one with \(\sum _i \dim V'_i\) minimal. Let \(\sigma \in {\text {Aut}}({\mathscr {V}})\) be an arbitrary automorphism. By Lemma 8.9, \(\textrm{e}({\mathscr {V}}') = \textrm{e}(\sigma ({\mathscr {V}}'))\), and hence submodularity implies that
In particular, we have \(\textrm{e}({\mathscr {V}}\cap \sigma ({\mathscr {V}}')) = m\) (and also \(\textrm{e}({\mathscr {V}}' + \sigma ({\mathscr {V}}')) = \textrm{e}({\mathscr {V}})\), but we will not need this). Moreover, by the minimality of \(\sum _i \dim V'_i\),
which means that \({\mathscr {V}}'\) is invariant. Invoking Lemma 8.8, we conclude that \(m=\textrm{e}({\mathscr {V}}')\geqslant \textrm{e}({\mathscr {V}})\). \(\square \)
8.8 The strict entropy condition
In this section we complete the proof of Proposition 8.2 by showing how to perturb (8.4) to the desired strict inequality (8.3).
First argument. Consider first the collection \({\mathcal {U}}\) of all subflags \({\mathscr {V}}'\) which satisfy, for some \(1\leqslant j\leqslant r-1\), the relations
These are flags which differ from \({\mathscr {V}}\) in exactly one space. Our first task will be to establish the strict inequality
for all \({\mathscr {V}}' \in {\mathcal {U}}\), by elaborating upon the argument of the previous subsection. We already know that \(\textrm{e}({\mathscr {V}}') \geqslant \textrm{e}({\mathscr {V}})\), so suppose as a hypothesis for contradiction that \(\textrm{e}({\mathscr {V}}') = \textrm{e}({\mathscr {V}})\) for some \({\mathscr {V}}' \in {\mathcal {U}}\). Amongst all such flags, take one with minimal \(\sum \dim (V_i')\). By submodularity, we have (8.11) and hence \(\textrm{e}({\mathscr {V}}' \cap \sigma ({\mathscr {V}}')) = \textrm{e}({\mathscr {V}})\) for any automorphism \(\sigma \in {\text {Aut}}({\mathscr {V}})\). But
is evidently in \({\mathcal {U}}\) as well, and by our minimality assumption it follows that \(\dim (V_j' \cap \sigma (V_j'))=\dim (V_{j}')\). Thus, \({\mathscr {V}}'\) is invariant, and by assumption (a) of Proposition 8.2, it follows that \(V_j'=V_{j-1}\). In other words, \({\mathscr {V}}'\) is a standard flag, which is not basic since \(j\leqslant r-1\). Hence, \(\textrm{e}({\mathscr {V}}')>\textrm{e}({\mathscr {V}})\) by Lemma 8.7. This contradiction establishes (8.12).
Let \(1\leqslant j\leqslant r-1\) and let V be a space satisfying \(V_{j-1}\leqslant V<V_j\). Let \({\mathscr {V}}'\) be the subflag \(\langle {\textbf{1}}\rangle = V_0\leqslant \cdots \leqslant V_{j-1} \leqslant V \leqslant V_{j+1}\leqslant \cdots \leqslant V_r\). Then one easily computes that
and so (8.12) implies that
Now let \(\varepsilon >0\) be sufficiently small and consider the perturbation \({\tilde{{{\textbf{c}}}}}\) given by
Evidently, \(1={\tilde{c}}_1> {\tilde{c}}_2>\cdots > {\tilde{c}}_{r+1} \geqslant c^*_{r+1}-\varepsilon \), as needed. For any proper subflag \({\mathscr {V}}' \leqslant {\mathscr {V}}\),
Let \(J=\min \{j : V_j' \ne V_j\}\). If \(J=r\), then \(\dim (V_r/V_r')\geqslant 1\) and the right side above is at least \(\varepsilon /2 + O(\varepsilon ^r)\), which is positive for small enough \(\varepsilon \). If \(J\leqslant r-1\), then \(V_{J-1} \leqslant V_{J}' < V_J\) and we see that the right side above is at least
which is also positive for sufficiently small \(\varepsilon \) by (8.4) and (8.12).
Second argument. We now sketch a second approach to the proof of Proposition 8.2. The idea is to introduce a small perturbation of our fundamental quantity \(\textrm{e}()\), namely
where \(\lambda \approx 1\). Note that \(\textrm{e}_1({\mathscr {V}}',{{\textbf{c}}},{\varvec{\mu }}) = \textrm{e}({\mathscr {V}}',{{\textbf{c}}},{\varvec{\mu }})\), and also that \(\textrm{e}_{\lambda }({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\) does not depend on \(\lambda \), since all the entropies \({\mathbb {H}}_{\mu _j}(V_j)\) vanish. Define the \(\lambda \)-perturbed optimal parameters \({{\textbf{c}}}^*(\lambda )\) to be the unique solution to the \(\lambda \)-perturbed version of (7.11), that is to say the equations
By a continuity argument, these exist for \(\lambda \) sufficiently close to 1 and they satisfy \(\lim _{\lambda \rightarrow 1} {\textbf{c}}^*(\lambda ) = {\textbf{c}}^*(1) = {\textbf{c}}^*\).
Now, assume that \(\lambda \) is close enough to 1 so that
and we have the following “positivity inequalities”:
(i) \(\lambda {\mathbb {H}}_{\mu ^*_{m+1}}(V_m)>\dim (V_{m+1}/V_m)\) for \(0\leqslant m\leqslant r-1\);
(ii) \(\lambda \cdot \big ({\mathbb {H}}_{\mu ^*_i}(V_{m-1})-{\mathbb {H}}_{\mu ^*_i}(V_m) \big )< \dim (V_m/V_{m-1})\) for \(1\leqslant m<i\leqslant r\).
These conditions can clearly be guaranteed by a continuity argument and our assumption that they hold when \(\lambda =1\). For a parameter \(\lambda \) satisfying (i) and (ii) above, the proof of (8.4) goes through verbatim for the \(\lambda \)-perturbed quantities \(\textrm{e}_\lambda \), allowing one to conclude that
for all subflags \({\mathscr {V}}'\) of \({\mathscr {V}}\).
Now suppose that \(\lambda < 1\). Then we have
with equality if and only if \({\mathscr {V}}' = {\mathscr {V}}\) because \({\text {Supp}}(\mu _j^*)=V_j\cap \{0,1\}^k\) for all j. Therefore if \({\mathscr {V}}'\) is a proper subflag of \({\mathscr {V}}\) we have
Taking \({\tilde{{{\textbf{c}}}}} = {{\textbf{c}}}^*(\lambda )\) for \(\lambda \) sufficiently close to 1, Proposition 8.2 follows.
Part IV. Binary systems
9 Binary systems and a lower bound for \(\beta _k\)
In this section we define certain special flags \({\mathscr {V}}\) on \({\mathbb {Q}}^k\), \(k = 2^r\), which we call the binary systems of order r. It is these systems which lead to the lower bound on \(\beta _k\) given in Theorem 2, which is one of the main results of the paper.
In this section we will define these flags (which is easy) and state their basic properties. The proofs of these properties, some of which are quite lengthy, are deferred to Sect. 10.
We are then in a position to prove part of one of our main theorems, Theorem 2 (a), which we do in Sect. 9.2.
For the convenience of the reader, let us recall here the three parts of Theorem 2, as stated at the end of Sect. 1.3:
(a) Showing that for every \(r\geqslant 1\), \(\beta _{2^r} \geqslant \theta _r\) for a certain explicitly defined constant \(\theta _r\);
(b) Showing that \(\lim _{r\rightarrow \infty } \theta _r^{1/r}\) exists;
(c) Showing that (1.1) has a unique solution \(\rho \in [0,1/3]\) and that \(\rho =2\lim _{r\rightarrow \infty } \theta _r^{1/r}\).
9.1 Binary flags and systems: definitions and properties
Definition 9.1
(Binary flag of order r) Let \(k = 2^r\) be a power of two. Identify \({\mathbb {Q}}^k\) with \({\mathbb {Q}}^{{\mathcal {P}}[r]}\) (where \({\mathcal {P}}[r]\) means the power set of \([r] = \{1,\ldots , r\}\)) and define a flag \({\mathscr {V}}\), \(\langle {\textbf{1}} \rangle = V_0 \leqslant V_1 \leqslant \cdots \leqslant V_r = {\mathbb {Q}}^{{\mathcal {P}}[r]}\), as follows: \(V_i\) is the subspace of all \((x_S)_{S \subset [r]}\) for which \(x_S = x_{S \cap [i]}\) for all \(S \subset [r]\).
Remark
We have \(\dim (V_i) = 2^i\), and \(V_r = {\mathbb {Q}}^{{\mathcal {P}}[r]}\), so the system is trivially nondegenerate. Note that, throughout the paper, we have been using the letter r to denote the number of spaces \(V_i\) in the flag \({\mathscr {V}}\). It just so happens that, in this example, this is the same r as in the definition of \(k = 2^r\).
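The dimension count \(\dim (V_i) = 2^i\) follows because a vector in \(V_i\) is determined by the values \(x_{S \cap [i]}\), and the map \(S \mapsto S \cap [i]\) takes exactly \(2^i\) distinct values. A small sketch verifying this for \(r = 3\):

```python
from itertools import chain, combinations

def subsets(r):
    # all subsets of [r] = {1,...,r}, as frozensets
    base = list(range(1, r + 1))
    return [frozenset(c) for c in chain.from_iterable(
        combinations(base, n) for n in range(r + 1))]

r = 3
coords = subsets(r)             # coordinates of Q^{P[r]}; there are k = 2^r of them
assert len(coords) == 2 ** r
for i in range(r + 1):
    # V_i = {x : x_S = x_{S ∩ [i]}}; a vector in V_i is determined by one value
    # per equivalence class of the map S -> S ∩ [i], so dim(V_i) = #classes
    classes = {S & frozenset(range(1, i + 1)) for S in coords}
    assert len(classes) == 2 ** i   # dim(V_i) = 2^i
```
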
One major task is to show that optimal measures and optimal parameters, as described in Sect. 7, may be defined on the binary flags. Since we will be seeing them so often, let us write down the \(\rho \)-equations (7.5) for the binary flags explicitly:
Proposition 9.2
Let \({\mathscr {V}}\) be the binary flag of order r. Then
(a) the \(\rho \)-equations (9.1) have a solution with \(0< \rho _i < 1\) for \(i \geqslant 1\), and consequently we may define the optimal measures \({\varvec{\mu }}^*\) on \(\{0,1\}^k\) as in Definition 7.4;
(b) the optimal parameters \({\textbf{c}}^*\) (in the sense of Definition 7.5) exist.
We call the binary flag \({\mathscr {V}}\) (of order r) together with the additional data of the optimal measures \(\mu = \mu ^*\) and optimal parameters \({\textbf{c}} = {\textbf{c}}^*\), the binary system (of order r). We caution that for fixed i (such as \(i = 2\)) the parameters \(c_i\) do depend on r, although not very much.
The second major task is to show that the binary systems satisfy the entropy condition (3.4), or more accurately that arbitrarily small perturbations of them satisfy the strict entropy condition (3.5). In the last section we provided a tool for doing this in somewhat general conditions, namely Proposition 8.2. That proposition has four conditions, (a), (b), (c)(i) and (c)(ii) which must be satisfied. Of these, (b) (the existence of the optimal parameters \({\textbf{c}}^*\)) has already been established, assuming the validity of Proposition 9.2. We state the other three conditions separately as lemmas.
Lemma 9.3
Suppose that \(V_{i-1} \leqslant W \leqslant V_i\) and that W is invariant under \({\text {Aut}}({\mathscr {V}})\). Then W is either \(V_{i-1}\) or \(V_i\). Thus, the binary flags satisfy Proposition 8.2 (a).
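Since \(V_i\) is the space of vectors constant on the blocks of the partition induced by \(S \mapsto S \cap [i]\), a coordinate permutation preserves \(V_i\) exactly when it preserves that partition (this equivalence is our observation, not stated in the lemma). Granting it, \({\text {Aut}}({\mathscr {V}})\) can be counted by brute force for small r; for \(r = 2\) it should be the automorphism group of the depth-2 binary tree, of order \(2^3 = 8\).

```python
from itertools import permutations, chain, combinations

r = 2
base = list(range(1, r + 1))
coords = [frozenset(c) for c in chain.from_iterable(
    combinations(base, n) for n in range(r + 1))]

def partition(i):
    # partition of the coordinate indices by the value of S ∩ [i]
    blocks = {}
    for idx, S in enumerate(coords):
        blocks.setdefault(S & frozenset(range(1, i + 1)), set()).add(idx)
    return {frozenset(b) for b in blocks.values()}

parts = [partition(i) for i in range(r + 1)]
count = 0
for sigma in permutations(range(len(coords))):
    # sigma preserves V_i iff it maps the blocks of partition(i) to blocks
    if all({frozenset(sigma[x] for x in block) for block in P} == P for P in parts):
        count += 1
assert count == 8  # automorphisms of the depth-2 binary tree: 2 * 2 * 2 = 8
```
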
Lemma 9.4
We have \({\mathbb {H}}_{\mu _{m+1}^*}(V_m) > 2^m\) for \(0 \leqslant m \leqslant r - 1\). Thus, the binary flags satisfy Proposition 8.2 (c)(i).
Lemma 9.5
We have \({\mathbb {H}}_{\mu _i^*}(V_{m-1})-{\mathbb {H}}_{\mu ^*_i}(V_m) < 2^{m-1}\) for \(1\leqslant m< i\leqslant r\). Thus, the binary flags satisfy Proposition 8.2 (c)(ii).
The proofs of these various facts are given in Sect. 10.
9.2 Proof of Theorems 2 (a) and 7
We are now in a position to complete the proof of Theorem 2 (a), modulo the results stated above. First, we define the constants \(\theta _r\).
Definition 9.6
Let \(\rho _1,\rho _2,\ldots \) be the solution to the \(\rho \)-equations (9.1) for the binary flag. Then we define
Proof of Theorem 2 (a)
By Proposition 7.7, \(\theta _r\) is equal to \(c^*_{r+1}\), where \({\textbf{c}}^*\) are the optimal parameters on the binary flag \({\mathscr {V}}\) of order r, the existence of which is Proposition 9.2 (b) above.
Fix \(\delta \in (0,\theta _r/2]\). By Proposition 8.2 (the hypotheses of which are satisfied by Lemma 9.3, Proposition 9.2 (b) and Lemmas 9.4 and 9.5), there exists a perturbation \({\tilde{{{\textbf{c}}}}}\) of \({{\textbf{c}}}^*\) such that
and \(({\mathscr {V}},{\tilde{{{\textbf{c}}}}},{{\varvec{\mu }}}^*)\) satisfies the strict entropy condition (3.5). By Lemma 5.2, there exists some \(\varepsilon >0\) such that the “entropy gap” condition (5.1) holds. Finally, by Remark 7.1 (b), we have that \({\text {Supp}}(\mu _j^*)=\Gamma _j\) for all j. Hence, Proposition 5.5 implies that \(\beta _{2^r} \geqslant {\tilde{c}}_{r+1}=\theta _r-\delta \). Since \(\delta \) is arbitrary, this proves Theorem 2 (a). \(\square \)
Proof of Theorem 7
The upper bound \(\beta _k\leqslant \gamma _k\) is established in Sect. 4. The lower bound \(\beta _k\geqslant {\tilde{\gamma }}_k\) follows by Lemma 5.3, Proposition 5.5 and the fact that there exists at least one system satisfying the strict entropy condition (3.5), as per the proof of Theorem 2 (a) above. \(\square \)
9.3 Remarks on Theorem 2 (b)
Theorem 2 (b) is a problem of a combinatorial and analytic nature which can be considered more-or-less completely independently of the first three parts of the paper.
To get a feel for it, and a sense of why it is difficult, let us write down the first two \(\rho \)-equations (9.1) for the binary flags. The equation with \(j = 1\) is
This has the numerical solution \(\rho _1 \approx 0.306481\).
To write down the \(\rho \)-equation for \(j = 2\), one must compute \(f^{\Gamma _3}(\rho )\), and without any additional theory the only means we have to do this is to draw the full tree structure for the binary flag \({\mathscr {V}}\) of order 3 (on \({\mathbb {Q}}^8\)). This is a tractable exercise and one may confirm that
The \(\rho \)-equation with \(j = 2\) is then
where (recall from Fig. 1) \(f^{\Gamma _2}(\rho ) = 3^{\rho _1} + 4 \cdot 2^{\rho _1} + 4\). This may be solved numerically using Mathematica, yielding the value \(\rho _2 \approx 0.2796104\ldots \).
Such a numerical procedure, however, is already quite an unappetising prospect if one wishes to compute \(\rho _3\).
Consequently, we must develop more theory to understand the \(\rho _i\) and to prove Theorem 2 (b). This is the task of the last two sections of the paper.
10 Binary systems: proofs of the basic properties
In this section, we prove the various statements in Sect. 9.1.
We begin, in Sect. 10.2, by proving Lemma 9.3. This is a relatively simple and self-contained piece of combinatorics.
In Sect. 10.3 we introduce the concept of genotype, which allows us to describe the tree structure induced on \(\{0,1\}^k\) by the binary flag \({\mathscr {V}}\). In Sect. 10.4 we show how to compute the quantities \(f^C(\varvec{\rho })\) in terms of the genotype.
We are then, in Sect. 10.5, in a position to prove Proposition 9.2 (a), guaranteeing that the \(\rho _i\) exist and allowing us to define the optimal measures \({\varvec{\mu }}^*\).
In Sect. 10.6 we establish the two entropy inequalities, Lemmas 9.4 and 9.5.
Finally, in Sect. 10.7 we prove Proposition 9.2 (b), which confirms the existence of the optimal parameters \({{\textbf{c}}}^*\).
10.1 Basic terminology
Throughout the section, \({\mathscr {V}}\) will denote the binary flag of order r, as defined in Definition 9.1. That is, we take \(k = 2^r\), identify \({\mathbb {Q}}^k\) with \({\mathbb {Q}}^{{\mathcal {P}}[r]}\), and take \(V_i\) to be the subspace of all \((x_S)_{S \subset [r]}\) for which \(x_S = x_{S \cap [i]}\) for all \(S \subset [r]\).
In addition, we will write \({\textbf{0}}_j, {\textbf{1}}_j\) for the vectors in \(\{0,1\}^{{\mathcal {P}}[j]}\) consisting of all 0s (respectively all 1s). We call these (or any multiples of them) constant vectors.
Finally, we introduce the notion of a block of a vector \(x = (x_S)_{S \subset [r]} \in {\mathbb {Q}}^{{\mathcal {P}}[r]}\). For each \(A \subset [i]\) we consider the \(2^{r-i}\)-tuple \(x(A,i) := (x_S)_{S \subset [r],\ S \cap [i] = A}\). We call these the i-blocks of x.
Remark 10.1
(a) One should note carefully that the i-blocks are strings of length \(2^{r-i}\). In this language, \(V_i\) is the space of vectors x, all of whose i-blocks are constant.
(b) If we put together the coordinates of the i-blocks x(A, i) and \(x(A\triangle \{i\},i)\), then we obtain the \((i-1)\)-block \(x(A\cap [i-1],i-1)\).
In order to visualize the structure of the flag \({\mathscr {V}}\) and of the partition of \(\{0,1\}^{{\mathcal {P}}[r]}\) by the cosets of \(V_j\), it will often be useful to write elements of \(\{0,1\}^{{\mathcal {P}}[r]}\) as strings of 0s and 1s of length \(2^r\). When we do this we use the reverse binary order, which is the one induced from \({\mathbb {N}}\) via the map \(f(S) = \sum _{s \in S} 2^{r - s}\).
Example 10.2
For concreteness, let us consider the case \(r = 3\). In this case, the ordering of the coordinates of x is \(x_{\emptyset }, x_{\{3\}}, x_{\{2\}}, x_{\{2,3\}}, x_{\{1\}}, x_{\{1,3\}}, x_{\{1,2\}}, x_{\{1,2,3\}}\).
If \(x = 01001110\) then its 2-blocks are 01, 00, 11, 10, and its 1-blocks are 0100, 1110.
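In the reverse binary order, the i-blocks of a string of length \(2^r\) are simply its consecutive substrings of length \(2^{r-i}\). The following few lines (our own illustration; the helper name `blocks` is ours) reproduce Example 10.2.

```python
def blocks(x, i, r):
    # In the reverse binary order, the i-blocks of x are its consecutive
    # substrings of length 2^(r-i), indexed by the positions A subset of [i].
    assert len(x) == 2**r and 0 <= i <= r
    size = 2**(r - i)
    return [x[j:j + size] for j in range(0, len(x), size)]

x = "01001110"          # the vector of Example 10.2, with r = 3
print(blocks(x, 2, 3))  # ['01', '00', '11', '10']
print(blocks(x, 1, 3))  # ['0100', '1110']
```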
10.2 Automorphisms of the binary system
Proof of Lemma 9.3
We begin by defining some permutations of \({\mathcal {P}}[r]\) for which, we claim, the corresponding coordinate permutations give elements of \({\text {Aut}}({\mathscr {V}})\). Suppose that \(1 \leqslant j \leqslant r\) and that \(A \subset [j-1]\). Then we may consider the permutation \(\pi (A,j)\) defined by \(\pi (A,j)(S) = S \triangle \{j\}\) if \(S \cap [j-1] = A\), and \(\pi (A,j)(S) = S\) otherwise.
To visualize the action of this permutation on the coordinates of a vector x, it is useful to order its coordinates as we explained above. The action of \(\pi (A,j)\) is then to permute the two adjacent j-blocks x(A, j) and \(x(A\sqcup \{j\},j)\), which together form the \((j-1)\)-block \(x(A,j-1)\), as per Remark 10.1(b). More concretely, below are some examples of the action of the permutations \(\pi (A,j)\) in the setting of Example 10.2:
If they wish, readers may translate the arguments below into this more visual language.
Claim. \(\pi (A,j)\) preserves \(V_i\) for all i, and therefore \(\pi (A,j) \in {\text {Aut}}({\mathscr {V}})\).
Proof
Suppose that \(x= (x_S)_{S \subset [r]} \in V_i\) and let us write for simplicity \(\pi \) instead of \(\pi (A,j)\).
Suppose first that \(j > i\). Then \(\pi (S) \cap [i] = S \cap [i]\) for all S, and so \(x_{\pi (S)} = x_{\pi (S) \cap [i]} = x_{S \cap [i]} = x_S,\) where the first and last steps used the fact that \(x \in V_i\). Thus the claim follows in this case.
Suppose that \(j \leqslant i\). Let \(t > i\). Then the conditions \((S \triangle \{t\}) \cap [j-1] = A\) and \(S \cap [j-1] = A\) are equivalent. Hence, if \(S \cap [j-1] = A\), then we find that \(x_{\pi (S \triangle \{t\})} = x_{(S \triangle \{j\}) \triangle \{t\}} = x_{S \triangle \{j\}} = x_{\pi (S)},\) where we used that \(x\in V_i\) and that \(t>i\) at the second step. Similarly, if \(S \cap [j-1] \ne A\), then \(x_{\pi (S \triangle \{t\})} = x_{S \triangle \{t\}} = x_S = x_{\pi (S)}.\)
In all cases, we have found that \(x_{\pi (S \triangle \{t\})} = x_{\pi (S)}\). Since this is true for all \(t > i\), \(\pi (x)\) indeed lies in \(V_i\). This completes the proof of the claim. \(\square \)
Suppose now that W is an invariant subspace of \({\mathscr {V}}\) satisfying the inclusions \(V_{i-1}<W \leqslant V_i\). We want to conclude that \(W=V_i\). To accomplish this, we introduce some auxiliary notation.
For each \(A\subset [i-1]\), we consider the vector \(y^A = (y^A_S)_{S \subset [r]}\in V_i\) that is uniquely determined by the relations \(y^A_{A} = 1\), \(y^A_{A \cup \{i\}} = -1\) and \(y^A_S = 0\) for all other \(S \subset [i]\). There are \(2^{i-1}\) such vectors \(y^A\). They are mutually orthogonal, hence linearly independent. In addition, together with \(V_{i-1}\), they generate all of \(V_i\). Since \(V_{i-1}<W\leqslant V_i\), there must exist \(A\subset [i-1]\) such that \(y^A\in W\).
Now, it is easy to check that for any \(j<i\) and any \(A\subset [i-1]\), we have \(\pi (A \cap [j-1], j)\big (y^A\big ) = y^{A \triangle \{j\}}.\)
From the above relation and the invariance of W under \({\text {Aut}}({\mathscr {V}})\), it is clear that if W contains at least one vector \(y^A\) with \(A\subset [i-1]\), then it contains all such vectors. Since we also know that \(V_{i-1}\leqslant W\leqslant V_i\), we must have that \(W=V_i\), which completes the proof of Lemma 9.3. \(\square \)
Remark
A minor elaboration of the above argument in fact allows one to show that the subspaces of \({\mathbb {Q}}^{{\mathcal {P}}[r]}\) invariant under \({\text {Aut}}({\mathscr {V}})\) are the \(V_i\), the orthogonal complements of \(V_{i-1}\) in \(V_i\), and all direct sums of these spaces. However, we will not need the classification in this explicit form.
10.3 Cell structure and genotype
The cosets of \(V_i\) partition \(\{0,1\}^{{\mathcal {P}}[r]}\) into sets which we call the cells at level i. Our first task is to describe these explicitly.
Consider \(\omega , \omega ' \in \{0,1\}^{{\mathcal {P}}[r]}\). It is easy to see that \(\omega - \omega ' \in V_i\) (and so \(\omega , \omega '\) lie in the same cell at level i) if and only if for every \(A \subset [i]\) one of the following is true:
(a) Both \(\omega (A,i)\) and \(\omega '(A,i)\) are constant blocks (that is, they both lie in \(\{ {\textbf{0}}_{r-i}, {\textbf{1}}_{r-i}\}\)).

(b) \(\omega (A, i) = \omega '(A,i)\), and neither of these blocks is constant (that is, neither is \({\textbf{0}}_{r-i}\) nor \({\textbf{1}}_{r-i}\)).
Thus a cell C at level i is completely specified by the positions A of its constant i-blocks, and by the values \(\omega (A,i)\) (for an arbitrary \(\omega \in C\)) of its non-constant i-blocks.
Example
With \(r=3\) and \(\omega = 01001110\), the level 2 cell that contains \(\omega \) is the set \(\{01000010, 01001110, 01110010, 01111110\}\).
Its constant 2-blocks are at \(A = \{2\}\) and \(A = \{1\}\). Its non-constant 2-blocks are at \(A = \emptyset \) (taking the value \(\omega (A,2) = 01\)) and at \(A = \{1,2\}\) (taking the value \(\omega (A, 2) = 10\)). The level 1 cell containing \(\omega \) is just \(\{\omega \}\).
The positions of the constant i-blocks play an important role, and we introduce the name genotype to describe these.Footnote 8
Definition 10.1
(Genotype) If C is a cell at level i, its genotype \(g(C) \subset {\mathcal {P}}[i]\) is defined to be the collection of \(A \subset [i]\) for which \(\omega (A,i)\in \{{\textbf{0}}_{r-i},{\textbf{1}}_{r-i}\}\) for all \(\omega \in C\). We refer to any subset of \({\mathcal {P}}[i]\) as an i-genotype. If \(g, g'\) are two i-genotypes, then we write \(g \leqslant g'\) to mean the same as \(g \subseteq g'\). We write |g| for the cardinality of g.
Example
If C is the cell at level 2 containing \(\omega = 01001110\), the genotype g(C) is equal to \(\big \{\{2\}, \{1\}\big \}\). (We have listed these sets in the reverse binary ordering once again.)
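Since the constant i-blocks of any member of a cell are exactly the constant i-blocks of the cell, the genotype can be read off from a single representative. A small sketch (the helper name `genotype_positions` is ours) recovers the genotype of the example above.

```python
def genotype_positions(omega, i, r):
    # Indices j of the constant i-blocks of omega, listed in the reverse
    # binary order; the index j encodes the position A subset of [i].
    size = 2**(r - i)
    blks = [omega[k:k + size] for k in range(0, len(omega), size)]
    return [j for j, b in enumerate(blks) if b in ("0" * size, "1" * size)]

# For omega = 01001110 at level 2 the constant blocks sit at indices 1 and 2,
# i.e. (binary 01 and 10) at A = {2} and A = {1}, matching the example.
print(genotype_positions("01001110", 2, 3))  # [1, 2]
```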
Definition 10.2
(Consolidations) If g is an i-genotype, then its consolidation is the \((i-1)\)-genotype \(g^*\) defined by \(g^* := \{A' \subset [i-1] : A'\in g, A' \cup \{i\} \in g\}\) (cf. Remark 10.1 (b)).
Let us pause to note the easy inequality \(|g| - 2^{i-1} \leqslant |g^*| \leqslant \frac{1}{2}|g|\) (10.2), valid for all i-genotypes g.
The genotype is intimately connected to the cell structure on \(\{0,1\}^k\) induced by \({\mathscr {V}}\), as the following lemma shows.
Lemma 10.3
We have the following statements.
(a) If C is a cell, we have \(|C| = 2^{| g(C)|}\).

(b) Suppose that g is an i-genotype. There are \((2^{2^{r-i}}-2)^{2^i-| g |}\) cells C (at level i) with \(g(C) = g\).

(c) If \(g(C) = g\), and if \(C'\) is a child of C, then \(g(C') \leqslant g^*\). In particular, \(|g(C')| \leqslant \frac{1}{2}|g(C)|\).

(d) Suppose that \(g(C) = g\). Suppose that \(g'\) is an \((i-1)\)-genotype and that \(g' \leqslant g^*\). Then the number of children \(C'\) of C with \(g(C') = g'\) is \(2^{| g | - | g^* | - | g' |}\).

(e) Suppose that C is a cell at level i with \(g(C) = g\). Then the number of children of C (at level \(i-1\)) is \(2^{| g | - 2 |g^* |} 3^{|g^* |}\).
Proof
(a) This is almost immediate: for each position \(A \in g(C)\) of a constant block, there are two choices (\({\textbf{0}}_{r - i}\) or \({\textbf{1}}_{r - i}\)) for \(\omega (A,i)\).
(b) To determine C completely (given g), one must specify the value of each of \(2^i - | g |\) non-constant i-blocks. For each such block, there are \(2^{2^{r-i}}-2\) possible non-constant values.
(c) A set \(A' \subset [i-1]\) can only possibly be the position of a constant block in some child cell of C if both \(A'\) and \(A' \cup \{i\}\) are the positions of constant blocks in C, or in other words \(A', A' \cup \{i\} \in g\), which is precisely what it means for \(A'\) to lie in \(g^*\).
Note that the child cell \(C'\) containing \(\omega \) has a constant \((i-1)\)-block at position \(A'\) only if \(\omega (A', i) = \omega (A' \cup \{i\}, i)\), which may or may not happen.
The second statement is an immediate consequence of the first and (10.2).
(d) Let \(A \in g\). We say that A is productive if \(A' := A \cap [i-1] \in g^*\), or equivalently if \(A'\) and \(A' \cup \{i\}\) both lie in g (or, more succinctly, \(A \triangle \{i\} \in g\)). These are the positions which can give rise to constant \((i-1)\)-blocks in children of C. There are \(2|g^*|\) such positions, coming in \(|g^*|\) pairs. To create a child \(C'\) with genotype \(g'\), we have a binary choice at \(|g^*| - |g'|\) of these pairs: at each of them either \(\omega (A', i) = {\textbf{0}}_{r - i}\) and \(\omega (A' \cup \{i\}, i) = {\textbf{1}}_{r - i}\), or the other way around. There are \(|g| - 2|g^*|\) non-productive positions \(A \in g\), and for each of these there is also a binary choice, either \(\omega (A,i) = {\textbf{0}}_{r - i}\) or \(\omega (A,i) = {\textbf{1}}_{r - i}\). The total number of choices is therefore \(2^{|g^*| - |g'|} \times 2^{|g| - 2|g^*|}\), which is exactly as claimed.
(e) This is immediate from part (d), upon summing over \(g' \subseteq g^*\). \(\square \)
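Parts (a) and (b) can be verified by brute force in the smallest interesting case \(r = 2\), \(i = 1\) (strings of length 4, blocks of length 2). The script below is our own check, using the cell criterion of Sect. 10.3.

```python
from collections import Counter
from itertools import product

r, i = 2, 1
size = 2**(r - i)
CONST = {"0" * size, "1" * size}

def blocks(w):
    return [w[k:k + size] for k in range(0, len(w), size)]

def same_cell(w1, w2):
    # Criterion of Sect. 10.3: at every position, either both blocks are
    # constant, or they are equal and non-constant.
    return all((b1 in CONST and b2 in CONST) or (b1 == b2 and b1 not in CONST)
               for b1, b2 in zip(blocks(w1), blocks(w2)))

cells = []
for w in ("".join(t) for t in product("01", repeat=2**r)):
    for cell in cells:
        if same_cell(w, cell[0]):
            cell.append(w)
            break
    else:
        cells.append([w])

for cell in cells:
    gsize = sum(b in CONST for b in blocks(cell[0]))  # |g(C)|
    assert len(cell) == 2**gsize                      # part (a): |C| = 2^{|g|}

# Part (b) predicts (2^{2^{r-i}} - 2)^{2^i - |g|} = 2^{2-|g|} cells per fixed
# genotype; summing over the C(2,|g|) genotypes of each size gives 1, 4, 4
# cells with |g| = 2, 1, 0 respectively.
counts = Counter(sum(b in CONST for b in blocks(c[0])) for c in cells)
print(sorted(counts.items()))  # [(0, 4), (1, 4), (2, 1)]
```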
10.4 The \(f^C(\rho )\) and genotype
We begin by recalling from (7.4) the definition of the functions \(f^C({\varvec{\rho }})\). Here \({\varvec{\rho }} = (\rho _1,\ldots , \rho _{r-1})\) is a sequence of parameters, and we define \(\rho _0 = 0\). If C has level 0, we set \(f^C({\varvec{\rho }}) = 1\), whilst for C at level \(i \geqslant 1\) we apply the recursion \(f^C({\varvec{\rho }}) = \sum _{C'} f^{C'}({\varvec{\rho }})^{\rho _{i-1}}\), where the sum is over the children \(C'\) of C.
Proposition 10.4
The quantities \(f^C\) depend only on the genotype of C, and thus for any i-genotype g we may define \(F(g) := f^{C}({\varvec{\rho }})\), where C is any cell with \(g(C) = g\). We have the recursion \(F(g) = \sum _{g' \leqslant g^*} 2^{|g| - |g^*| - |g'|} F(g')^{\rho _{i-1}}.\)
Remark
The F(g) depend on \({\varvec{\rho }}\), as well as on i (where g is an i-genotype) but we suppress explicit mention of this. For example, it should be clear from context that g on the left is an i-genotype, but the sum on the right is over \((i-1)\)-genotypes, since \(g^*\) is an \((i-1)\)-genotype by definition.
Proof
This is a simple induction on the level i using the definition of the \(f^C({\varvec{\rho }})\), and parts (c) and (d) of Lemma 10.3.\(\square \)
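The recursion of Proposition 10.4 is straightforward to implement, which gives an independent check of the value \(f^{\Gamma _2} = 3^{\rho _1} + 4 \cdot 2^{\rho _1} + 4\) from Sect. 9.3. This is our own sketch: genotypes are stored as frozensets of frozensets, and the numerical value of \(\rho _1\) is the rounded one from the text.

```python
import math
from itertools import combinations

def subsets(s):
    s = list(s)
    return [frozenset(c) for n in range(len(s) + 1) for c in combinations(s, n)]

def consolidation(g, i):
    # g* = {A' subset of [i-1] : A' in g and A' + {i} in g}
    return frozenset(A for A in g if i not in A and (A | {i}) in g)

def F(g, i, rho):
    # Proposition 10.4: F(g) = sum over (i-1)-genotypes g' <= g* of
    # 2^(|g| - |g*| - |g'|) * F(g')^{rho_{i-1}}, with F = 1 at level 0
    # and rho_0 = 0.
    if i == 0:
        return 1.0
    gstar = consolidation(g, i)
    e = rho[i - 2] if i >= 2 else 0.0
    return sum(2**(len(g) - len(gstar) - len(gp)) * F(gp, i - 1, rho)**e
               for gp in subsets(gstar))

rho1 = 0.306481               # numerical value from Sect. 9.3
P1 = frozenset(subsets({1}))  # the full genotype P[1]
P2 = frozenset(subsets({1, 2}))
print(F(P1, 1, [rho1]))       # 3.0, i.e. f^{Gamma_1}
# f^{Gamma_2} = 3^{rho_1} + 4*2^{rho_1} + 4, and the first rho-equation
# f^{Gamma_2} = e^2 * (f^{Gamma_1})^{rho_1} holds up to rounding of rho_1:
assert abs(F(P2, 2, [rho1]) - math.exp(2) * 3**rho1) < 1e-3
```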
Let us pause to record two corollaries which we will need later.
Corollary 10.5
Suppose that \(g_1, g_2\) are two i-genotypes with \(g_1 \leqslant g_2\). Then \(F(g_1) \leqslant F(g_2)\).
Proof
Note that \(g_1^* \leqslant g_2^*\), and also that \(|g_1| - |g_1^*| \leqslant |g_2| - |g_2^*|\), since \(|g| - |g^*|\) equals the number of pairs \(\{A', A' \cup \{i\}\}\) with \(A' \subset [i-1]\) that intersect g, a quantity that is monotone in g.
Hence, by two applications of Proposition 10.4,
\(\square \)
Recall that \(\Gamma _i\) is the cell at level i containing \({\textbf{0}}\). Note that \(g(\Gamma _i) ={\mathcal {P}}[i]\).
Corollary 10.6
If \(C \ne \Gamma _i\) is a cell of level i, then \( f^C({\varvec{\rho }}) < f^{\Gamma _i}({\varvec{\rho }})\).
Proof
This is simply the special case \(g_2 = {\mathcal {P}}[i]\) of the preceding corollary. The inequality is strict because if \(g < {\mathcal {P}}[i]\), then \(g^* < {\mathcal {P}}[i-1]\).\(\square \)
10.5 Existence of the \(\rho _i\)
In this section we prove Proposition 9.2 (a), which asserts that for the binary flags there is a unique solution \({\varvec{\rho }} = (\rho _1,\rho _2,\ldots )\) to the \(\rho \)-equations (9.1). In fact, we will prove the following more general fact which treats the jth \(\rho \)-equation in isolation, irrespective of whether the earlier ones have already been solved.
Proposition 10.7
Let \(j\in {\mathbb {N}}\) and let \(\rho _1,\ldots , \rho _{j-1} \in (0,1)\). Then there is a unique \(\rho _j \in (0,1)\) such that the jth \(\rho \)-equation for the binary flag, \(f^{\Gamma _{j+1}}(\varvec{\rho }) = e^{2^j} f^{\Gamma _j}(\varvec{\rho })^{\rho _{j}}\), is satisfied.
Remark
We will prove in the next section (Lemma 11.2) that for the solution \(\rho _1,\rho _2,\ldots \) to the full set of \(\rho \)-equations we have \(\rho _j \leqslant \rho _1 = 0.30648\ldots \) for all j. For a table of numerical values of the \(\rho _j\), see Table 1 in Sect. 12.
Before beginning the proof of Proposition 10.7, we isolate a lemma.
Lemma 10.8
Fix a \((j-1)\)-genotype \(g'\). Then \(\sum _{g :\, g^* \geqslant g'} 2^{|g| - |g^*| - |g'|} = 7^{2^{j-1} - |g'|},\) where the sum is over all j-genotypes g whose consolidation \(g^*\) contains \(g'\).
Proof
In order to determine g, we must determine for each \(A\subset [j-1]\) whether A and/or \(A\cup \{j\}\) lie in g. Since we are only summing over g whose consolidation \(g^*\) contains \(g'\), we must have that A and \(A\cup \{j\}\) belong to g for all \(A\in g'\), so the membership of A and \(A\cup \{j\}\) in g is fully determined for all \(A\in g'\). For any \(A\subset [j-1]\) with \(A\notin g'\), we have four choices, according to whether \(A\in g\) and whether \(A\cup \{j\}\in g\). If both of these conditions hold, then we further have \(A\in g^*\); in the other three cases, we have \(A\notin g^*\). We conclude that \(\sum _{g :\, g^* \geqslant g'} 2^{|g| - |g^*| - |g'|} = 7^{2^{j-1} - |g'|}\): each \(A \in g'\) contributes a factor \(2^{2-1-1} = 1\) to the sum, whereas each \(A \subset [j-1]\) with \(A \notin g'\) contributes a factor \(2^{2-1} + 2 \cdot 2^{1-0} + 2^0 = 7\). This completes the proof. \(\square \)
Proof of Proposition 10.7
For \(j=1\), the equation to be satisfied is \(3^{\rho _1} + 4 \cdot 2^{\rho _1} + 4 = e^2 3^{\rho _1}\). It may easily be checked numerically that this has a unique solution \(\rho _1 \approx 0.306481\ldots \) in (0, 1). One may also proceed analytically as follows. Define \(G(x) := e^2 3^x - 3^x - 4 \cdot 2^x - 4 = 3^x \big (e^2 - (1 + 4 \cdot (2/3)^x + 4/3^x)\big ).\)
In particular, the roots of G are in correspondence with the roots of \(H(x)=e^2 - (1 + 4 \cdot (2/3)^x + 4/3^x)\). This is clearly a continuous and strictly increasing function. In addition, \(H(0)=e^2-9<0\) and \(H(1)=e^2-5>0\). Thus, H has a unique root \(\rho _1\in (0,1)\), and so does G.
Now assume \(j\geqslant 2\). It turns out that much the same argument works, although the details are more elaborate. Assume that \(0<\rho _i<1\) for \(1\leqslant i<j\). Define
Proposition 10.4 implies that
where
and the sums over g run over all genotypes \(g \subset {\mathcal {P}}[j]\) at level j. Since (by an easy induction) \(F({\mathcal {P}}[j])>0\), it follows that G and H have the same roots. The latter is a continuous and strictly increasing function because Corollary 10.6 implies that \(F(g)/F({\mathcal {P}}[j]) \leqslant 1\), with equality only when \(g={\mathcal {P}}[j]\). Moreover, \(H(0) = e^{2^j} - 3^{2^j} < 0\). Therefore to complete the proof it suffices to show that \(H(1) > 0\).
To show this, we use (10.4). First note that
where the sum is over all genotypes \(g'\) of level \((j-1)\).
Next, by Proposition 10.4 and Lemma 10.8 we have
Putting (10.4), (10.5) and (10.6) together we obtain
Since \(e^2>7\), we have \(\sqrt{14} < e\sqrt{2}\), and thus \(H(1)>0\). This completes the proof. \(\square \)
10.6 Entropy inequalities for the binary systems
We begin with a lemma which will be used a few times in what follows.
Lemma 10.9
Let \(C'\) be one of the children of \(\Gamma _i\), thus \(C'\) is a cell at level \((i-1)\). Then \(\mu _i(C') \leqslant e^{-2^{i-1}}\), and equality occurs only when \(C' = \Gamma _{i-1}\).
Proof
We showed in Corollary 10.6 that \(f^{C'}({\varvec{\rho }}) < f^{\Gamma _{i-1}}({\varvec{\rho }})\), for any choice of \({\varvec{\rho }} = (\rho _1,\ldots , \rho _{r-1})\), and for any child \(C'\) of \(\Gamma _i\) with \(C' \ne \Gamma _{i-1}\). Now that we know that the \(\rho \)-equations have a solution, it follows immediately from the definition of the optimal measures \({\varvec{\mu }}^*\) in (7.6), applied with \(C = \Gamma _i\), that \(\mu _i(C') < \mu _i(\Gamma _{i-1})\), again for any child \(C'\) of \(\Gamma _i\) with \(C' \ne \Gamma _{i-1}\). Finally, observe that \(\mu _i(\Gamma _{i-1}) = e^{-2^{i-1}}\) by (7.7). \(\square \)
Proof of Lemma 9.4
This follows almost immediately from Lemma 10.9 with \(i = m+1\). Indeed since \(\mu _{m+1}(C) \leqslant e^{-2^m}\) for all cells C at level m, with equality only for \(C = \Gamma _m\), we have
This concludes the proof. \(\square \)
Proof of Lemma 9.5
Let \(\mu = \mu _i\) with \(m<i\leqslant r\). We must show that
Let C denote a cell at level m and \(C'\) a child of C at level \((m-1)\). In addition, let the notations g(C) and \(g(C)^*\) refer to the genotype of C and its consolidation, as defined in Definitions 10.1 and 10.2. By the definition of entropy, Lemma 10.3 (e), and the concavity of \(L(x)=-x\log x\) we find that
Now by (10.2) we have \(|g(C)^*| \geqslant |g(C)| - 2^{m-1}\), whence
Since we also have that \(|g(C)|\leqslant 2^m\), we infer that
This and (10.8) already imply the bound
which is only very slightly weaker than Lemma 9.5.
To make the crucial extra saving, write S for the union of all cells C at level m with \(|g(C)| > \frac{3}{4} 2^m\). We claim that
We postpone the proof of this inequality momentarily and show how to use it to complete the proof of Lemma 9.5.
Observe that if C is not one of the cells making up S, that is to say if \(|g(C)| \leqslant \frac{3}{4} 2^m\), then
where we used (10.9) to obtain the first inequality. Assuming the claim (10.11), it follows from this, (10.8) and (10.10) that
which is the statement of Lemma 9.5.
It remains to prove (10.11). Recall that \(1\leqslant m<i\leqslant r\).
When \(1\leqslant m\leqslant 2\), the only integer in \((\frac{3}{4}2^m,2^m]\) is \(2^m\). Hence, if a cell C at level m satisfies the inequality \(|g(C)|>\frac{3}{4}2^m\), we must have \(|g(C)|=2^m\). The only cell with this property is \(\Gamma _m\). Since we have \(\mu (\Gamma _m)=e^{2^m-2^i} \leqslant e^{-1}\) by (7.7), our claim (10.11) follows in this case.
Assume now that \(m\geqslant 3\). Let \({\tilde{S}}\) be the union of all children \({{\tilde{C}}}\) of \(\Gamma _i\) (thus these are cells at level \(i-1 \geqslant m\)) which contain a cell C in S. By repeated applications of Lemma 10.3 (c) we have \(|g(\tilde{C})| > 2^{i-1-m} (\frac{3}{4} 2^{m})=\frac{3}{4} 2^{i-1}\) for any such \({{\tilde{C}}}\). Lemma 10.3 (d), applied with \(C = \Gamma _i\), implies that the number of such cells \({\tilde{C}}\) is at most
By Lemma 10.9 and our assumption that \(i-1\geqslant m\geqslant 3\), it follows that
This completes the proof of the claim (10.11) and hence of Lemma 9.5. \(\square \)
10.7 Existence of the optimal parameters \({{\textbf{c}}}^*\)
Proof of Proposition 9.2 (b)
We have \({\text {Supp}}(\mu _j^*) =\Gamma _j\) by Remark 7.1 (b), and hence \(|{\text {Supp}}(\mu _j^*)| =2^{2^j}\) by Lemma 5.1. By Lemma B.2, when \(j\geqslant m+2\) we deduce the inequality
Now recall (Definition 7.5) that the optimal parameters should satisfy the conditions (7.12) (which are the fully written out version of (7.11)). We wish to show that there is a solution with \(1 = c_1^*> c^*_2> \cdots> c^*_{r+1} > 0\). Rearranging (7.12) and recalling \(\dim (V_j)=2^j\), we find that
for \(0\leqslant m\leqslant r-1\). By Lemma 9.4 and (10.12), we may apply a downwards induction on \(m = r-1, r-2,\ldots \) to solve these equations with \(0< c^*_{r+1}< c^*_r< \cdots < c^*_1\). Rescaling, we may additionally ensure that \(c^*_1 = 1\).\(\square \)
11 The limit of the \(\rho _i\)
In the last section we showed that there is a unique solution \(\varvec{\rho }= (\rho _1,\rho _2,\ldots )\) to the \(\varvec{\rho }\)-equations (9.1) for the binary system with \(0< \rho _j < 1\) for all j. In this section, we show that the limit \(\lim _{j \rightarrow \infty } \rho _j \) exists.
Proposition 11.1
\(\rho = \lim _{j \rightarrow \infty } \rho _j\) exists.
11.1 \(\rho _1\) is the largest \(\rho _j\)
The estimates required in the proof of Proposition 11.1 are rather delicate, and to make them usable for our purposes we need the following a priori bound on the \(\rho _j\).
Lemma 11.2
For all \(j\geqslant 1\), we have \(\rho _j \leqslant \rho _1 = 0.30648\ldots \)
The reader should recall the notion of genotype g (Definition 10.1) and of the function F(g) (Proposition 10.4).
The next lemma is a stronger version of Corollary 10.5, whose proof uses that result as an ingredient.
Lemma 11.3
For any \(j\geqslant 1\) and \(g_1 \leqslant g_2\) at level j, we have
Proof
We have
This concludes the proof. \(\square \)
Proof of Lemma 11.2
We begin by observing that
The \(\rho \)-equations (9.1), translated into the language of genotypes, are \(F({\mathcal {P}}[j+1]) = e^{2^j} F({\mathcal {P}}[j])^{\rho _j}\). Therefore, by Proposition 10.4 (with \(g = {\mathcal {P}}[j+1]\)) followed by Lemma 11.3 (with \(g_2 = {\mathcal {P}}[j]\)), we have
Dividing through by \(F({\mathcal {P}}[j])^{\rho _j}\), and applying (11.1) with \(c_1 = 2^{\rho _j - 1}\) and \(c_2 = (3/4)^{\rho _j}\), we find that
Therefore
However, the first \(\rho \)-equation (9.2) is precisely that
The result follows immediately (using the monotonicity of the function \(1 + 4(2/3)^t + 4(1/3)^t\); see the proof of Proposition 10.7). \(\square \)
11.2 Preamble to the proof
In this section, we set up some notation and structure necessary for the proof of Proposition 11.1. Since we wish to let \(r\rightarrow \infty \), it is convenient to embed all binary r-step systems into a universal infinite binary system. To this end, and with a slight abuse of notation, we let
for \(j=0,1,\ldots .\) Clearly, \(V_j\simeq {\mathbb {Q}}^{2^j}\) for all j, and the flag
is isomorphic to the flag of the r-step binary system.
In this notation, we have
where
is the discrete unit cube. We further set
Lastly, for each \(j\geqslant 0\), we say that C is a cell at level j if \(C\subset \Gamma _\infty \) and there exists some \(x=(x_A)_{A\subset {\mathbb {N}}}\) such that \(x_A\in {\mathbb {Q}}\) for all A and \(C=\Omega \cap (x+V_j)\). We may easily check that the collection of cells lying in \(\Gamma _r\) forms the tree corresponding to the r-step binary system.
We may now define the functions \(f^C\) for our infinite binary flag. It is convenient to reverse the indices in \(f^C\). Specifically, let \({\textbf{x}}= (x_1,x_2,\ldots )\in [0,1]^{\mathbb {N}}\). If C is a cell at level \(j\geqslant 0\), then we define
In particular, \(\psi ^C({\textbf{x}})=0\) when \(j=0\), and \(\psi ^C({\textbf{x}})=\log |C{\setminus }\{{\textbf{0}}\}|\) when \(j=1\).
In the special case \(C = \Gamma _j\) we also define \(\phi _j({\textbf{x}}) := 2^{-j}\, \psi ^{\Gamma _j}({\textbf{x}}).\)
Thus \(\phi _1({\textbf{x}}) = \frac{1}{2} \log 3\) and \(\phi _2({\textbf{x}}) = \frac{1}{4} \log (3^{x_1} + 4 \cdot 2^{x_1} + 4)\).
Note that \(\psi ^C, \phi _j\) are increasing in each variable. Moreover we have the following simple bounds.
Lemma 11.4
(Simple bounds) We have \(\frac{1}{2}\log 3 \leqslant \phi _j({\textbf{x}}) < \log 2\).
Proof
For the upper bound, note that \(f^{\Gamma _j}({\textbf{x}}) \leqslant f^{\Gamma _j}({\textbf{1}})\). By the definition of \(f^C\) (see (7.4)), we have that \(f^{\Gamma _j}({\textbf{1}})\) is equal to the number of children of \(\Gamma _j\) at level 0, which, in turn, is equal to \(2^{2^j} - 1\). This proves the claimed upper bound on \(\phi _j({\textbf{x}})\).
For the lower bound, observe that \(f^{\Gamma _j}({\textbf{x}}) \geqslant f^{\Gamma _j}({\textbf{0}})\). Using again the definition of \(f^C\), we find that \(f^{\Gamma _j}({\textbf{0}})\) equals the number of children of \(\Gamma _j\) at level \(j-1\). Thus \(f^{\Gamma _j}({\textbf{0}}) = 3^{2^{j-1}}\) by Lemma 10.3. This proves the claimed lower bound of \(\phi _j({\textbf{x}})\), thus completing the proof of the lemma. \(\square \)
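For \(j = 2\) the bounds of Lemma 11.4 can also be checked directly from the explicit formula \(\phi _2({\textbf{x}}) = \frac{1}{4}\log (3^{x_1} + 4 \cdot 2^{x_1} + 4)\) given above. The grid check below is our own illustration, not part of the proof.

```python
import math

def phi2(x1):
    # phi_2 from the text: (1/4) * log(3^{x_1} + 4*2^{x_1} + 4); it depends
    # only on the first coordinate of x.
    return 0.25 * math.log(3**x1 + 4 * 2**x1 + 4)

lo, hi = 0.5 * math.log(3), math.log(2)
for k in range(101):
    # Lemma 11.4 at j = 2 (lower bound attained at x = 0, so allow a small
    # floating-point slack there).
    assert lo - 1e-12 <= phi2(k / 100) < hi
print("Lemma 11.4 bounds hold for phi_2 on a grid in [0,1]")
```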
The \(\rho \)-equations (9.1) may be expressed in terms of the \(\phi _j\) in the following simple form: \(\phi _{j+1}(\rho _j, \rho _{j-1}, \ldots , \rho _1, 0, 0, \ldots ) = \frac{1}{2}\big (1 + \rho _j \, \phi _j(\rho _{j-1}, \ldots , \rho _1, 0, 0, \ldots )\big )\) (11.2).
11.3 Product structure of cells and self-similarity of the functions \(\phi _j\)
There is a natural bijection \(\pi : {\mathbb {Q}}^{{\mathcal {P}}({\mathbb {N}})} \times {\mathbb {Q}}^{{\mathcal {P}}({\mathbb {N}})} \rightarrow {\mathbb {Q}}^{{\mathcal {P}}({\mathbb {N}})}\) defined by \(\pi ((x, x')) = y\), where \(y_{A} = x_{A-1}\) and \(y_{\{1\} \cup A} = x'_{A-1}\), for all \(A \subset \{2,3,\ldots \}\). Here, we write \(A-1\) for the set \(\{a-1:a\in A\}\). There is a finite version of this map that can be visualized as a concatenation map. For each r, let \(\pi _r : {\mathbb {Q}}^{{\mathcal {P}}[r-1]} \times {\mathbb {Q}}^{{\mathcal {P}}[r-1]} \rightarrow {\mathbb {Q}}^{{\mathcal {P}}[r]}\) be defined by \(\pi _r((x, x')) = y\), where \(y_{A} = x_{A-1}\) and \(y_{\{1\} \cup A} = x'_{A-1}\), for all \(A \subset \{2,3,\ldots ,r\}\). If we place the coordinates of x and \(x'\) in reverse binary order, as per the map \(\{2,\ldots ,r\}\supset A\rightarrow \sum _{a\in A}2^{r-a}\in \{0,1,\ldots ,2^{r-1}-1\}\), then \(\pi _r\) is the concatenation map that generates y by placing first all coordinates of x, followed by all coordinates of \(x'\).
Now one may easily check that \(\pi (V_{j-1} \times V_{j-1}) = V_j\) for all \(j=1,2,\ldots \) Therefore if \(C_1, C_2\) are two cells at level \((j-1)\) in the infinite binary system, then \(\pi (C_1 \times C_2)\) is a cell at level j, and conversely every cell of level j is of this form. The children \(C'\) of C are precisely \(\pi (C'_1 \times C'_2)\) where \(C_1 \rightarrow C'_1\), \(C_2 \rightarrow C'_2\).
The product structure established above manifests itself in a self-similarity property \(\phi _j\approx \phi _{j-1}\). In this section, we will establish the following precise version of this.
Proposition 11.5
Let \(\alpha \in (0,1]\) and consider a vector \({\textbf{x}}=(x_1,x_2,\ldots )\in [0,\alpha ]^{\mathbb {N}}\). In addition, let \(C = \pi (C_1 \times C_2)\) be a cell of level \(j \geqslant 2\). Then we have
In particular, taking \(C = \Gamma _j = \pi (\Gamma _{j-1} \times \Gamma _{j-1})\), we have
Proof
We proceed by induction on j. When \(j = 2\), we proceed by hand. Notice that at level 1, there are three different types of cells, having 4, 2 and 1 elements, respectively. There is only one cell with 4 elements, the cell \(\Gamma _1\); it splits into three cells at level 0: one with two elements, and two unicells (singletons). All other cells at level 1 split into unicells at level 0. Hence, at level 2, there are six different types of cells \(C = \pi (C_1 \times C_2)\) corresponding to the six possibilities for the unordered pair \(\{|C_1|, |C_2|\}\). Their subcells are in 1-1 correspondence with the cells \(\pi (C_1'\times C_2')\), where \(C_1'\) is a subcell of \(C_1\) (at level 0) and \(C_2'\) is a subcell of \(C_2\) (also at level 0).
The three cases with \(\max (|C_1|,|C_2|)\leqslant 2\) are trivial, because we then have that all the cells at level 1 are unicells, and thus we readily find that \(f^C=f^{C_1}f^{C_2}=|C_1|\cdot |C_2|\).
The two other cases with \(|C_1|\leqslant 2\) and \(|C_2|=4\) (so that \(C_2=\Gamma _1\)) are only slightly harder: if \(|C_1|=2\), then \(f^C({\textbf{x}}) = 2 \cdot 2^{x_1} + 4\), \(f^{C_1} = 2\), \(f^{C_2} = 3\) and so the desired inequalities are \(\log 6\leqslant \log (2 \cdot 2^{x_1} + 4) \leqslant \log 6 + x_1 \log 2\), which are immediately seen to be true for all \(x_1 \geqslant 0\). Similarly, if \(|C_1|=1\), then \(f^C({\textbf{x}}) = 2^{x_1} + 2\), \(f^{C_1} = 1\), \(f^{C_2} = 3\), and so the desired inequalities are \(\log 3\leqslant \log (2^{x_1} + 2) \leqslant \log 3 + x_1 \log 2\), which are again true for all \(x_1\geqslant 0\).
A little trickier is the case \(|C_1| = |C_2| = 3\), corresponding to \(C = \Gamma _2 = \pi (\Gamma _1 \times \Gamma _1)\). In this case \(f^C({\textbf{x}}) = 3^{x_1} + 4 \cdot 2^{x_1} + 4\), \(f^{C_1} = f^{C_2} = 3\), so the desired inequalities are \(2\log 3\leqslant \log (3^x + 4 \cdot 2^x + 4) \leqslant 2 \log 3 + x \log 2\). The lower bound is evident. For the upper bound, we must equivalently show that \(g(x) := 5 \cdot 2^x - 3^x - 4\geqslant 0\) for \(x\in [0,1]\). Since \(g(0) = 0\) and \(g'(x) = 5\log 2 \cdot 2^x - \log 3 \cdot 3^x > 0\) for \(x\leqslant 1\), the desired inequality follows.
Now suppose that \(j \geqslant 3\), and assume the result is true for cells at level \((j-1)\). By the recursive definition of \(f^C\), if C is a cell at level j, we have the recurrence
where \(T{\textbf{x}}\) denotes the shift operator \(T{\textbf{x}} := (x_2, x_3, \ldots ).\)
For the upper bound, note that
Recalling that \(x_1\leqslant \alpha \), we conclude that
The lower bound is proven similarly. The result thus follows. \(\square \)
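The inequalities in the case \(j = 2\), \(|C_1| = |C_2| = 3\) of the proof above, including the key fact \(g(x) = 5 \cdot 2^x - 3^x - 4 \geqslant 0\) on [0, 1], are easy to confirm numerically. The grid check below is our own sanity check, not part of the proof.

```python
import math

def g(x):
    # g(x) = 5*2^x - 3^x - 4, from the case |C_1| = |C_2| = 3; g(x) >= 0 on
    # [0,1] is equivalent to log(3^x + 4*2^x + 4) <= 2 log 3 + x log 2.
    return 5 * 2**x - 3**x - 4

for k in range(101):
    x = k / 100
    assert g(x) >= 0
    f = math.log(3**x + 4 * 2**x + 4)
    # two-sided inequality of the j = 2 case (equality at x = 0, so allow
    # a small floating-point slack)
    assert 2 * math.log(3) - 1e-12 <= f <= 2 * math.log(3) + x * math.log(2) + 1e-12
print("j = 2 inequalities of Proposition 11.5 verified on a grid")
```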
11.4 Derivatives and the limit of the \(\rho _i\)
Because of the implicit definition of the parameters \(\rho _i\), the self-similarity property (11.4) is not enough for us by itself. We will also require the following (rather ad hoc) derivative bounds.
Here, and in what follows, \(\partial _m F(y_1,\ldots ) := \frac{\partial F}{\partial y_m}(y_1,\ldots )\), that is to say the derivative of the function F with respect to its mth variable. Thus, for instance,
Proposition 11.6
Set \(\Delta _m := \sup _{j \geqslant 2} \sup _{{\textbf{x}}\in [0,0.31]^{\mathbb {N}}}| \partial _m \phi _j({\textbf{x}})|\). Then \(\Delta _1 < 0.17\), \(\Delta _2 < 0.05\), \(\sum _{m \geqslant 3} \Delta _m < 0.01\) and \(\Delta _m \ll 0.155^m\).
The proof of this proposition is given in Sect. 11.5. Let us now show how this proposition, together with (11.4), implies Proposition 11.1.
Proof of Proposition 11.1
Write \(\varepsilon _i := \rho _{i+1}- \rho _{i}\), \(i = 1,2,3,\ldots \) The \(\rho \)-equation at level \((j+1)\) is
by (11.2). Recall that \(\rho _j\leqslant \rho _1\leqslant 0.31\) for all j, by Lemma 11.2. Hence, two applications of (11.4) (with \(\alpha = 0.31\)) yield the asymptotic formula
Subtracting (11.2), the \(\rho \)-equation at level j, from this gives
Now by the mean value theorem,
and
Therefore, from (11.7), the triangle inequality and the fact that
we have
Now by Lemma 11.4 and Proposition 11.6,
Also, by Proposition 11.6 we have
Assuming that \(j\geqslant j_0\) with \(j_0\) large enough, (11.10) implies a bound
where \(c_1,c_2,\ldots \) are fixed nonnegative constants with \(\sum _{i} c_i< \frac{0.096}{0.104} < 0.93\) and, by Proposition 11.6, \(c_i \leqslant 2^{-i}\) for all \(i \geqslant i_0\) for some \(i_0\). It is convenient to assume that \(i_0, j_0 \geqslant 10\), which we clearly may.
We claim that (11.11) implies exponential decay of the \(\varepsilon _j\), which of course immediately implies Proposition 11.1. To see this, take \(\delta \in (0, \frac{1}{4})\) so small that \(0.94 (1 - \delta )^{-i_0} < 0.99\), and then take \(A \geqslant 100\) large enough that \(|\varepsilon _j| \leqslant A(1- \delta )^j\) for all \(j \leqslant j_0\). We claim that the same bound holds for all j, which follows immediately by induction using (11.11) provided one can show that
for \(j \geqslant j_0\). Since \(\delta < \frac{1}{2}\) and \(A \geqslant 100\), it is enough to show that
The contribution to this sum from \(i \leqslant i_0\) is at most \(0.93 (1 - \delta )^{-i_0}\), whereas the contribution from \(i > i_0\) is (by summing the geometric series) at most
Therefore the desired bound follows from our choice of \(\delta \). \(\square \)
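The contraction mechanism behind this induction is easy to see numerically. In the sketch below the coefficients \(c_i\) are synthetic stand-ins (the true values come from Proposition 11.6) with \(\sum _i c_i = 0.9 < 0.93\); any such choice forces geometric decay of the \(\varepsilon _j\).

```python
# Illustration of the contraction argument: if |eps_{j+1}| is at most
# sum_i c_i |eps_{j-i}| with sum_i c_i < 0.93, the eps_j decay
# geometrically.  The c_i below are synthetic stand-ins.
c = [0.4, 0.3, 0.1, 0.05, 0.05]   # sum = 0.9 < 0.93
eps = [1.0] * len(c)              # arbitrary initial values
for _ in range(300):
    prev = eps[-len(c):]          # the last len(c) values, most recent last
    eps.append(sum(ci * abs(e) for ci, e in zip(c, reversed(prev))))
```

Since each new term is at most 0.9 times the maximum over a bounded window, the sequence shrinks by a fixed factor every few steps, exactly as in the \(A(1-\delta )^j\) bound above.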
11.5 Self-similarity for derivatives
Our remaining task is to prove Proposition 11.6. Once again we use self-similarity of the \(\phi _j\), but now for their derivatives, the key point being that \(\partial _m \phi _j \approx \partial _m \phi _{j-1}\). Here is a precise statement.
Proposition 11.7
Suppose that \(C = \pi (C_1 \times C_2)\) is a cell at level \(j\geqslant 1\). Let \(\alpha \in [0,1)\) and \(m \geqslant 1\), and suppose that \({\textbf{x}}\in [0,\alpha ]^{\mathbb {N}}\). Then we have
In particular, taking \(C = \Gamma _j = \pi (\Gamma _{j-1} \times \Gamma _{j-1})\), we have
Proof
The lower bound follows by noticing that \(\psi ^C\) is increasing in each variable. For the upper bound, we may assume that \(m\leqslant j-1\), for when \(m\geqslant j\), \(\partial _m \phi _j({\textbf{x}})\) is identically zero. We proceed by induction on m, first establishing the case \(m = 1\). Differentiating (11.5) gives
By two applications of the upper bound in Proposition 11.5 (applied to \(C' = \pi (C'_1 \times C'_2)\)), we obtain
On the other hand, for \(i = 1,2\) we get by differentiating the recurrence
with respect to \(x_1\) that
Substituting (11.15) and (11.16) into (11.14) gives
Finally, Proposition 11.5 implies that \(e^{\psi ^{C_1}({\textbf{x}}) + \psi ^{C_2}({\textbf{x}})}\leqslant e^{\psi ^C({\textbf{x}})}\). Dividing both sides by \(e^{\psi ^C({\textbf{x}})}\) gives the result when \(m = 1\).
Now suppose that \(m \geqslant 2\). Differentiating (11.5) with respect to \(x_{m}\) and applying (11.6) gives
By the inductive hypothesis, if \(C' = \pi (C'_1 \times C'_2)\) we have
Also, by the upper bound in Proposition 11.5, we have
Substituting (11.18) and (11.19) into (11.17) and using the assumption that \(0\leqslant x_{1} \leqslant \alpha \) gives
Now, differentiating the recurrence (11.15) with respect to \(x_{m}\) (using (11.6)) gives, for \(i = 1, 2\),
Substituting (11.15) and (11.21) into (11.20), and using once again that \(x_1\leqslant \alpha \), gives
Again, Proposition 11.5 implies that \( e^{\psi ^{C_1}({\textbf{x}}) + \psi ^{C_2}({\textbf{x}}) } \leqslant e^{\psi ^C({\textbf{x}})}\), and so by dividing both sides by \(e^{\psi ^C({\textbf{x}})}\), we obtain the stated result. \(\square \)
Before proving Proposition 11.6, we isolate a lemma.
Lemma 11.8
For \(0 \leqslant x_1\leqslant 0.31\) we have \(0 \leqslant 4\partial _1\phi _2({\textbf{x}}) \leqslant 0.481\).
Proof
We have \(e^{4\phi _2({\textbf{x}})}= 3^{x_1} + 4\cdot 2^{x_1} + 4\), and thus
The lemma is therefore equivalent to
The left-hand side here is increasing in \(x_1\) and, when \(x_1 = 0.31\), it is equal to \(0.480052\ldots \). \(\square \)
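The computation in this proof can be checked mechanically. From \(e^{4\phi _2({\textbf{x}})}= 3^{x_1} + 4\cdot 2^{x_1} + 4\), the quantity \(4\partial _1\phi _2\) is the logarithmic derivative below; it is increasing in \(x_1\) (the derivative of a log-sum-exp is increasing by convexity) and stays below 0.481 on \([0,0.31]\).

```python
import math

# 4 * d/dx phi_2 = d/dx log(3**x + 4*2**x + 4), from the displayed identity
# e^{4 phi_2} = 3^{x1} + 4*2^{x1} + 4.
def four_d1_phi2(x):
    num = math.log(3) * 3**x + 4 * math.log(2) * 2**x
    den = 3**x + 4 * 2**x + 4
    return num / den

# Sample on a grid of [0, 0.31].
vals = [four_d1_phi2(0.31 * k / 100) for k in range(101)]
```

The grid confirms monotonicity and that the value at \(x_1 = 0.31\) is below the claimed threshold 0.481.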
Proof of Proposition 11.6
Henceforth, set \(\alpha := 0.31\) and fix two integers \(m\geqslant 1\) and \(j\geqslant 2\). Our goal is to bound \(\partial _m\phi _j({\textbf{x}})\) uniformly for \({\textbf{x}}\in [0,\alpha ]^{\mathbb {N}}\). We may assume that \(j\geqslant m+1\), as \(\partial _m \phi _j({\textbf{x}})=0\) when \(j\leqslant m\).
Now, let us define
Then, if we apply (11.13) \(\ell \) times, we obtain
Here, we observed that all the \(B_m^{\alpha ^t}\) terms in (11.22) have \(t \geqslant s + 1 - m\); bounding them all above by \(B_m^{\alpha ^{s + 1 - m}}\) then allowed us to sum a geometric series.
Let us fix some \(s\in \{1,2,\ldots ,m+1\}\) independent of j. Then the number \(j-s\) lies in \(\{0,1,\ldots ,j-1\}\). Hence, applying (11.22) with \(\ell = j - s\), and then taking the supremum over all \(j\geqslant m+1\) and all \({\textbf{x}}\in [0,\alpha ]^{\mathbb {N}}\), we find that
When \(m = 1\), we take \(s = 2\). Then Lemma 11.8 and relation (11.23) give
as required. When \(m \geqslant 2\), we take \(s = m\). Then \(\partial _{m}\phi _{s} \equiv 0\) and so (11.23) degenerates to
This gives \(\Delta _2 < 0.05\), and also confirms that \(\Delta _m \ll 0.155^m\). To bound \(\sum _{m \geqslant 3} \Delta _m\) we use (11.24) and the uniform bound \(B_m \leqslant 2^{1/(1 - \alpha )^2}\), obtaining
This completes the proof of Proposition 11.6. \(\square \)
12 Calculating the \(\rho _i\) and \(\rho \)
In this section we conclude our analysis of the parameters \(\rho _1,\rho _2,\ldots \) for the binary flags. The situation so far is that we have shown that these parameters exist, are unique and lie in (0, 0.31). Moreover, their limit \(\rho = \lim _{i \rightarrow \infty } \rho _i\) exists (Proposition 11.1).
None of this helps with actually computing the limit numerically or giving any kind of closed form for it, and the objective of this section is to provide tools for doing that. We prove two main results, Propositions 12.1 and 12.2 below. Recall the convention that \(\rho _0 = 0\).
Proposition 12.1
Recall the convention that \(\rho _0 = 0\). Define a sequence \((a_{i,j})_{i\geqslant 1,\,1\leqslant j\leqslant i+1}\) by the relations \(a_{i,1}=2\), \(a_{i,2}=2+2^{\rho _{i-1}}\) and
Then
In practice, these relations are enough to calculate the \(\rho _j\) to high precision. Indeed, a short computer program produced the data in Table 1. (We suppress any discussion of the numerical precision of our routines.)
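The displayed recurrence (12.1) and equation (12.2) are not reproduced in this excerpt, so the routine below is only a sketch of the scheme such a computation uses: at each level, the relevant equation is monotone in the unknown and is solved by bisection. The equation \(2^t + t = 2\) here is a stand-in, not the actual \(\rho \)-equation.

```python
# Generic bisection solver of the kind used to compute the rho_j to high
# precision from monotone equations of the shape (12.2).  The equation
# solved below is a stand-in for illustration only.
def bisect(f, lo, hi, iters=60):
    """Root of the increasing function f on [lo, hi], assuming f(lo) < 0 < f(hi)."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

root = bisect(lambda t: 2**t + t - 2, 0.0, 1.0)
```

Sixty halvings of the initial interval already give far more precision than the digits reported in Table 1 would require.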
Using Proposition 12.1 we may obtain the following reasonably satisfactory description of \(\rho \), which is equivalent to the statement of Theorem 2 (c).
Proposition 12.2
For each \(t \in (0,1)\), define a sequence \(a_j(t)\) by
Then the limit \(\rho = \lim _{i \rightarrow \infty } \rho _i\) is a solution (in the variable t) to the equation
Furthermore, \(\rho \) is the unique solution to (12.4) in the interval \(0\leqslant t\leqslant 1/3\).
Remark. This is easily seen to be equivalent to Theorem 2 (c), but we have introduced t as a dummy variable since \(\rho \) now has the specific meaning \(\rho = \lim _{i \rightarrow \infty } \rho _i\), and this will avoid confusion in the proof.
Before starting the proofs of Propositions 12.1 and 12.2, let us pause to observe a simple link between the sequences \(a_{i,j}\) and \(a_j(t)\) defined in (12.1) and (12.3) respectively.
Lemma 12.3
For each fixed \(j\geqslant 1\), the limit \(\lim _{i \rightarrow \infty } a_{i,j}\) exists and equals \(a_j(\rho )\).
Proof
The existence of the limit follows by induction on j, using Proposition 11.1, noting that the result is trivial for \(j = 1\) and immediate from Proposition 11.1 when \(j = 2\). The fact that the limit equals \(a_j(\rho )\) then follows immediately by letting \(i \rightarrow \infty \) in (12.1) and comparing with (12.3). \(\square \)
12.1 Product formula for \(f^C(\rho )\) and a double recursion for the \(\rho _i\)
Proposition 12.1 is a short deduction from a product formula for F(g), or equivalently for \(f^C(\varvec{\rho })\), given in Proposition 12.5 below. Whilst it would be a stretch to say that this formula is of independent interest, it is certainly a natural result to prove in the context of our work.
Before we state the formula, the reader should recall the notion of genotype g (Definition 10.1) and of the function F(g) (Proposition 10.4). We require the following further small definition.
Definition 12.4
(Defects) Let \(i,m\in {\mathbb {Z}}_{\geqslant 0}\) and let g be an i-genotype.
(a) If \(m\leqslant i\), then we define the mth consolidation
Otherwise, if \(m \geqslant i+1\), then by convention we define \(g^{(m)}\) to be empty.
(b) For \(m\geqslant 1\), we set
Remark
Note that \(g^{(0)} = g\), \(g^{(1)} = g^*\) and \(g^{(m)} = (g^{(m-1)})^*\). It is easy to see that \(\Delta ^m(g)\) is always a nonnegative integer. Observe that \(\Delta ^{i+1}(g) = 0\) unless \(g = {\mathcal {P}}[i]\), in which case \(\Delta ^{i+1}(g) = 1\), and that \(\Delta ^m(g) = 0\) whenever \(m > i+1\).
Proposition 12.5
Let \(i\in {\mathbb {N}}\) and suppose that g is an i-genotype. Then
with the \(a_{i,m}\) defined as in Proposition 12.1 above.
Proof of Proposition 12.1, given Proposition 12.5
Note that we have \(\Delta ^m({\mathcal {P}}[i]) = 1_{m=i+1}\) for \(1\leqslant m\leqslant i+1\). Together with Proposition 12.5, this implies that \(F({\mathcal {P}}[i]) = a_{i, i+1}\). Thus \(f^{\Gamma _i}({\varvec{\rho }}) = F({\mathcal {P}}[i]) = a_{i, i+1}\). The Eq. (12.2) is then an immediate consequence of the \(\rho \)-equations (9.1). \(\square \)
Before turning to the proof of Proposition 12.5, we isolate a couple of lemmas from the proof.
Lemma 12.6
Let \(\alpha \in {\mathbb {R}}\) and \(i\in {\mathbb {N}}\). Let g be an i-genotype, and suppose that k is an \((i-1)\)-genotype with \(k \leqslant g^*\). Then
Proof
We have \(g=\{A\subset [i-1]:A\in g\}\cup \{A\subset [i-1]:A\cup \{i\}\in g\}\). Hence, if we let
and
then we have \(|g|=2|g^*|+|X|+|Y|\), and thus \(\Delta ^1(g)=|X|+|Y|\).
Now, in order to choose \(g'\leqslant g\) with \((g')^*=k\), we must decide independently for each \(A\subset [i-1]\) whether \(A\in g'\) and/or \(A\cup \{i\}\in g'\). The condition that \(g'\leqslant g\) means that if \(A\notin g\) (resp. if \(A\cup \{i\}\notin g\)), then we are forced to have \(A\notin g'\) (resp. \(A\cup \{i\}\notin g'\)). Let us now examine all admissible options for the conditions “\(A\in g'\)” and “\(A\cup \{i\}\in g'\)”:
- \(A\in k\): since \((g')^*=k\), we are forced to have \(A,A\cup \{i\}\in g'\).
- \(A\in g^*{\setminus } k\): we know in this case that \(A,A\cup \{i\}\in g\), so the condition \(g'\leqslant g\) imposes no further restrictions on the membership of A and of \(A\cup \{i\}\) in \(g'\). On the other hand, we know that \(A\notin k=(g')^*\), and thus at most one out of A and of \(A\cup \{i\}\) may belong to \(g'\).
- \(A\in X\): the condition \(g'\leqslant g\) implies the restriction that \(A\cup \{i\}\notin g'\), and we may then choose freely among the two options of having \(A\in g'\) or \(A\notin g'\).
- \(A\in Y\): the condition \(g'\leqslant g\) implies the restriction that \(A\notin g'\), and we may then choose freely among the two options of having \(A\cup \{i\}\in g'\) or \(A\cup \{i\}\notin g'\).
By the above discussion, we have
Since \(|X|+|Y|=\Delta ^1(g)\), the proof is complete. \(\square \)
For \({\textbf{a}} = (a_1,a_2,\ldots )\), and for some (i-)genotype g, write
(Note that the \(a_m\) here are just parameters, not related to the recursion (12.3), which does not feature in this subsection.) If \(\theta \in {\mathbb {R}}_{> 0}\), define
Lemma 12.7
We have the functional equation
As before, \(T{\textbf{a}}\) denotes the shift operator \(T{\textbf{a}} = (a_2,a_3,\ldots )\).
Proof
Using the relation \(P_{{\textbf{a}}}(g') = a_1^{\Delta ^1(g')} P_{T{\textbf{a}}}((g')^*)\), we have
The result now follows from Lemma 12.6 and a routine short calculation. \(\square \)
We are now in a position to prove Proposition 12.5.
Proof of Proposition 12.5
Let \(a_{i,m}\) be as in the statement of Proposition 12.5, and write \({\textbf{a}}_i = (a_{i,1},a_{i,2},\ldots )\). In the notation introduced above (cf. (12.5)) the claim of Proposition 12.5 is then that
We proceed by induction on i. Let us first consider the base case when \(i=1\).
- If \(g={\mathcal {P}}[1]\), we have \(F(g)=f^{\Gamma _1}(\varvec{\rho })=3\). On the other hand, \(P_{{\textbf{a}}_1}({\mathcal {P}}[1])=a_{1,2}=3\) in this case by the convention that \(\rho _0=0\).
- If \(g\subsetneqq {\mathcal {P}}[1]\), then \(g^*=\emptyset \) and thus \(\Delta ^1(g)=|g|\) and \(\Delta ^2(g)=0\). So we conclude that \(P_{{\textbf{a}}_1}(g)=2^{|g|}\). On the other hand, for all such genotypes, the corresponding cell contains \(2^{|g|}\) elements that all split into unicells at level 0. Consequently, \(F(g)=2^{|g|}=P_{{\textbf{a}}_1}(g)\) in this case too.
Next, suppose that we have the result for \((i-1)\)-genotypes for some \(i\geqslant 2\), and let g be an i-genotype. We know from (10.3) that
By the induction hypothesis, we have \(F(g')^{\rho _{i-1}} = P_{{\textbf{a}}_{i-1}^{\rho _{i-1}}}(g')\) for all \(g'\leqslant g^*\), where \({\textbf{a}}_{i-1}^{\rho _{i-1}}\) is shorthand for \((a_{i-1,1}^{\rho _{i-1}}, a_{i-1,2}^{\rho _{i-1}},\ldots )\). Hence, it follows immediately that
with \(\Phi \) defined in (12.6). The fact that the right-hand side of (12.8) is a product \(P_{*}(g)\) is now clear by an iterated application of Lemma 12.7. To get a handle on exactly which product, suppose that the result of applying Lemma 12.7 a total of \(j-1\) times is that
Thus \(b_{i,1} = \theta _{i,1} = 2\), and we have the relations
and
for \(j\in \{1,\ldots ,i\}\). We claim that \(b_{i,j}=a_{i,j}\) for all \(j\leqslant i+1\). This will complete the proof of Proposition 12.5, because we may then apply (12.9) with \(j=i+1\) to show that
because \(g^{(i+1)}=\emptyset \) for all i-genotypes g.
Let us now prove our claim that \(b_{i,j}=a_{i,j}\) for all \(j\leqslant i+1\). We shall use induction on j. We have \(b_{i,1}=2=a_{i,1}\). In addition, \(b_{i,2}=2+2^{\rho _{i-1}}=a_{i,2}\) by (12.10) with \(j=1\) and by the fact that \(\theta _{i,1}=2\). Now, assume that we have proven that \(b_{i,j}=a_{i,j}\) for some \(j\in \{2,\ldots ,i\}\). Relation (12.11) applied with \(j-1\) in place of j implies that
The right-hand side equals \(b_{i,j}^2=a_{i,j}^2\) by applying (12.10) followed by the induction hypothesis. Thus, \(\theta _{i,j}=a_{i,j}^2-a_{i-1,j-1}^{2\rho _{i-1}}\). Inserting this relation into (12.10) and using the recursive formula (12.1) shows that \(b_{i,j+1}=a_{i,j+1}\). This completes the inductive step and thus the proof of Proposition 12.5. \(\square \)
12.2 A single recurrence for \(\rho \)
In this section we deduce Proposition 12.2 from Proposition 12.1 by a limiting argument.
To carry this out, we will need the following fairly crude estimates for the \(a_{i,j}\) and the \(a_j(t)\), defined in (12.1) and (12.3) respectively.
Lemma 12.8
We have
and
Proof
Since \(\rho _{i-1}<1\) for all \(i\geqslant 1\) (cf. Lemma 11.2), we have \(a_{i,2}<4=a_{i,1}^2\). Hence, the inequality (12.12) follows from a simple induction using (12.1).
Using another simple induction, we readily confirm the inequality \(a_{i,j}\leqslant a_{i,2}^{2^{j-2}}\) in (12.13).
For the lower bound in (12.13), we know from (12.10) and (12.11) and from the fact that \(b_{i,j}=a_{i,j}\) for all \(j\leqslant i+1\) that
and that
for \(j\in \{1,\ldots ,i\}\). By a simple induction, these formulas imply that \(a_{i,j}>1\) and \(\theta _{i,j}>0\) for all \(j\leqslant i+1\), and thus \(\theta _{i,j+1} + 1 \geqslant (\theta _{i,j} + 1)^2\) for \(j=1,2,\ldots ,i\). By yet another induction, we find \(\theta _{i,j} \geqslant 3^{2^{j - 1}} - 1\). Finally, the lower bound on the \(a_{i,j}\) in (12.13) follows from this and (12.14). \(\square \)
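The two inductions at the end of this proof can be checked concretely: starting from \(\theta _{i,1} = 2\), the extremal case of the inequality \(\theta _{i,j+1} + 1 \geqslant (\theta _{i,j} + 1)^2\) is the recurrence \(\theta _{j+1} = (\theta _j + 1)^2 - 1\), which gives exactly \(\theta _j = 3^{2^{j-1}} - 1\).

```python
# Check of the doubly exponential lower bound: with theta_1 = 2 and the
# extremal recurrence theta_{j+1} = (theta_j + 1)^2 - 1, equality
# theta_j = 3^(2^(j-1)) - 1 holds at every step (exact integer arithmetic).
theta = 2
for j in range(1, 8):
    assert theta == 3 ** (2 ** (j - 1)) - 1
    theta = (theta + 1) ** 2 - 1
```

Any admissible \(\theta _{i,j}\) dominates this extremal sequence, which is the content of the lower bound in (12.13).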
Lemma 12.9
Let \(t \in (0,1)\). We have
and
Proof
The inequality (12.16) follows from a simple induction using (12.3), and the upper bound in (12.17) follows with a further induction.
For the lower bound, we first set up relations analogous to (12.14) and (12.15), defining \(\theta _j(t)\) for \(j\geqslant 1\) via the relation
We then note that we also have
Indeed, on the one hand, we have
by (12.3). On the other hand,
by (12.18).
Having proven (12.19), we now proceed analogously to the proof of Lemma 12.8. We have \(a_j(t)>1\) and \(\theta _j(t)>0\) for all \(j\geqslant 1\), by a simple induction using (12.18) and (12.19). Therefore, from (12.19), we have that
By induction, this implies that \(\theta _{j}(t) \geqslant 3^{2^{j - 1}} - 1\). Finally, the lower bound on the \(a_{j}(t)\) in (12.17) follows from this and (12.18). \(\square \)
We are now in a position to prove that the relation
holds with \(t=\rho \), which is one of the main statements of Proposition 12.2. Iterating (12.2) gives
By Proposition 11.1, we have \(\rho _i\rightarrow \rho \). In addition, by Lemma 11.2, we have \(0\leqslant \rho _i\leqslant \rho _1<0.31\) for all i. Thus, taking limits as \(i \rightarrow \infty \) gives
We now derive another expression for the left-hand side of (12.21). A telescoping argument gives
The terms on the right-hand side of (12.22) are rapidly decreasing. Indeed, by (12.12) we have \(1\geqslant a_{i,j+1}/a_{i,j}^2\) for all \(j\geqslant 1\). On the other hand, by (12.1) (with j replaced by \(j+1\) there) and by (12.13), we have
for all \(j\in \{2,\ldots ,i\}\). Since \(\rho _{i-1}\leqslant \rho _1\leqslant 0.31\), we have \(2^{\rho _{i-1}}/3<1/2\). In conclusion,
for all \(j\in \{1,\ldots ,i\}\). By a simple limiting argument using relation (12.22) and Lemma 12.3, we thus find that
Here, we used (12.23) to bound the terms with j large. Comparing this with (12.21) confirms that indeed (12.20) is satisfied with \(t=\rho \).
We turn now to the final statement in Proposition 12.2, the statement that (12.20) has a unique solution in \(t \in [0,\frac{1}{3}]\) (which must, by the above discussion, be \(\rho \)). This is a purely analytic problem. Write
We must show that there is only one solution to \(W(t) = 0\). We already know \(W(\rho )=0\), so it would suffice to show that W is strictly increasing in [0, 1/3]. This would certainly follow if we could show that
for all \(j\geqslant 2\) and all \(0\leqslant t\leqslant t'\leqslant 1/3\). Since the derivative of \(\frac{1}{1 - t/2}\) is bounded below by \(\frac{1}{2}\) on \([0,\frac{1}{3}]\), it is enough to establish the derivative bound
for all \(j \geqslant 2\) and all \(t \in (0,\frac{1}{3})\). The remainder of the section is devoted to proving this bound, which it is convenient to write in the form
where \(\ell _j(t) := a'_j(t)/a_j(t)\).
We begin by observing that, since \(t \in (0,\frac{1}{3})\), we have \(a_2(t) \leqslant 2 + 2^{1/3}\) and so we may upgrade the upper bound in (12.17) to
for \(j \geqslant 2\). Note also that, by induction using (12.18) and (12.19), both \(a_j(t)\) and \(\theta _j(t)\) are increasing functions of t. In particular, \(a_j(t)\) is an increasing function of t so the derivative \(a'_j(t)\) is positive.
Differentiating (12.3) gives
where here and in the next few lines we have omitted the argument (t) from the functions for brevity. The term in parentheses is non-positive by (12.16), and the final term \(- 2t a_{j-1}^{2t} \frac{a_{j-1}'}{a_{j-1}}\) is negative since the derivative \(a'_{j-1}\) is positive. It follows from (12.26) that
A little computation using (12.3) shows that this may equivalently be written as
where we used our notation \(\ell _j = a'_j/a_j\).
Denote
Then (12.27) implies that \(\ell _{j+1}(t) < 2 \ell _j(t) \xi _j\) for all \(t\in [0,1/3]\) and all \(j\geqslant 2\). Telescoping this inequality gives
We have
for all \(t\in [0,1/3]\). Hence, in order to obtain the desired bound (12.24), it is enough to show
The \(\xi _i\) tend to 1 exceptionally rapidly, and crude bounds (together with a little computation) turn out to suffice, as follows.
First, by (12.17) and the fact that \(a_2(t)^{2-t} = (2 + 2^t)^{2 - t} \leqslant 9\) for \(t \in [0,1]\) (a calculus exercise), we have
Second, by the lower bound in (12.17) and by (12.25) we have
We may also check by hand that \(a_1(t)^{2t}/a_2(t)^2=(2^{1-t}+1)^{-2}<1/6\) for all \(t\in [0,1/3]\). Hence,
Third, again by the lower bound in (12.17) and by (12.25), we have
Substituting (12.30), (12.31) and (12.32) into the definition (12.28) gives
Using this bound, one may verify that \(\prod _{j = 2}^{\infty } \xi _j \leqslant 10/9\), which is stronger than the desired bound (12.29), on a pocket calculator or even by hand. For example, we have \(\xi _2\xi _3 \leqslant \frac{46751495}{42169248}\) and can use very crude bounds for the higher terms. Since \(\frac{1}{1 - x} + \frac{x}{6} \leqslant e^{2x}\) for \(0\leqslant x\leqslant 0.1\), taking \(x = 6^{-2^{j-2}}\) gives
for \(j \geqslant 4\). Therefore
This concludes the proof of the final statement in Proposition 12.2.
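The elementary inequality \(\frac{1}{1-x} + \frac{x}{6} \leqslant e^{2x}\) for \(0 \leqslant x \leqslant 0.1\), used for the tail terms above, can itself be checked mechanically; a minimal grid check:

```python
import math

# Check 1/(1-x) + x/6 <= e^(2x) on a fine grid of [0, 0.1].  Near x = 0 the
# two sides behave like 1 + 7x/6 and 1 + 2x, so the inequality has room to
# spare away from the endpoint x = 0 (where it is an equality).
xs = [0.1 * k / 1000 for k in range(1001)]
ok = all(1 / (1 - x) + x / 6 <= math.exp(2 * x) + 1e-12 for x in xs)
```

The small additive tolerance only absorbs floating-point rounding at \(x = 0\), where both sides equal 1.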
12.3 Proof of parts (b) and (c) of Theorem 2
To conclude this paper, we complete the proof of parts (b) and (c) of Theorem 2, as stated at the end of Sect. 1.3. In fact, all of the ingredients have already been assembled and we must simply remark on how they fit together.
First, recall from Definition 9.6 that
Now, it is an easy exercise to see that if \(x_1,x_2,\ldots \) is a sequence of positive real numbers for which \(x = \lim _{i \rightarrow \infty } x_i\) exists and is positive, then
Applying this with \(x_i = 2/\rho _i\) gives, by Proposition 11.1, that
This, together with Proposition 12.2, completes the proof of Theorem 2.
Notes
A property of natural numbers is said to occur for almost all n if the number of exceptions below x is o(x) as \(x\rightarrow \infty \).
The factor \(3^{m-1}\) is missing in the stated lower bounds for \(\alpha _k\) in [26].
We use \(1_E\) for the indicator function of a statement E; that is, \(1_E=1\) if E is true and \(1_E=0\) if E is false.
We say that the random variable X has the distribution \(\textrm{NB}(r,p)\) with \(r\in {\mathbb {N}}\) and \(p\in (0,1]\) if X takes values in \({\mathbb {Z}}_{\geqslant 0}\) with the following frequency: \({\mathbb {P}}(X=k)=\left( {\begin{array}{c}k+r-1\\ r-1\end{array}}\right) (1-p)^kp^r\) for each \(k\in {\mathbb {Z}}_{\geqslant 0}\).
In the literature, the term “flag” means that the inclusions are proper, i.e., \(\dim (V_{i+1}) > \dim V_i\) for all i. In this paper, we will use the term more broadly to refer to an arbitrary nested sequence of subspaces.
Here and throughout the paper, \({\text {Span}}(v_1,\ldots )\) denotes the \({\mathbb {Q}}\)-span of vectors \(v_1,\ldots \).
Note that we have not said that the \(\rho _i\) are unique. However, in cases of interest to us this will turn out to be the case.
The term genotype is appropriate, as each component in g acts like a recessive gene with respect to child cells.
References
Alon, N., Spencer, J.H.: The Probabilistic Method. Wiley Series in Discrete Mathematics and Optimization, fourth edn. John Wiley & Sons Inc, Hoboken, NJ (2016)
Arratia, R., Barbour, A.D., Tavaré, S.: On random polynomials over finite fields. In: Mathematical Proceedings of the Cambridge Philosophical Society, vol. 114, pp. 347–368 (1993)
Arratia, R., Tavaré, S.: The cycle structure of random permutations. Ann. Probab. 20, 1567–1591 (1992)
Elliott, P.D.T.A.: Probabilistic Number Theory. I. Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], vol. 239. Springer-Verlag, New York (1979). (Mean-value theorems)
Elliott, P.D.T.A.: Probabilistic Number Theory. II. Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], vol. 240. Springer-Verlag, New York (1980). (Central limit theorems)
Erdős, P., Hall, R.R.: The propinquity of divisors. Bull. Lond. Math. Soc. 11, 304–307 (1979)
Erdős, P., Nicolas, J.-L.: Répartition des nombres superabondants. Bull. Soc. Math. Fr. 103, 65–90 (1975)
Erdős, P., Nicolas, J.-L.: Méthodes probabilistes et combinatoires en théorie des nombres. Bull. Sci. Math. (2) 100, 301–320 (1976)
Erdős, P.: On the density of some sequences of integers. Bull. Am. Math. Soc. 54, 685–692 (1948)
Erdős, P.: On some applications of probability to analysis and number theory. J. Lond. Math. Soc. 39, 692–696 (1964)
Ford, K.: Joint Poisson distribution of prime factors in sets. Math. Proc. Camb. Philos. Soc. 173, 189–200 (2022)
Hall, R.R., Tenenbaum, G.: On the average and normal orders of Hooley’s \(\Delta \)-function. J. Lond. Math. Soc. (2) 25, 392–406 (1982)
Hall, R.R., Tenenbaum, G.: The average orders of Hooley’s \(\Delta _{r}\)-functions. Mathematika 31, 98–109 (1984)
Hall, R.R., Tenenbaum, G.: The average orders of Hooley’s \(\Delta _r\)-functions. II. Compos. Math. 60, 163–186 (1986)
Hall, R.R., Tenenbaum, G.: Divisors. Cambridge Tracts in Mathematics, vol. 90. Cambridge University Press, Cambridge (1988)
Hooley, C.: On a new technique and its applications to the theory of numbers. Proc. Lond. Math. Soc. (3) 38, 115–151 (1979)
Koukoulopoulos, D.: Localized factorizations of integers. Proc. Lond. Math. Soc. 101, 392–426 (2010)
Koukoulopoulos, D.: On the number of integers in a generalized multiplication table. J. Reine Angew. Math. 689, 33–99 (2014)
Levin, D.A., Peres, Y., Wilmer, E.L.: Markov Chains and Mixing Times. American Mathematical Society, Providence, RI (2009). With a chapter by James G. Propp and David B. Wilson
Maier, H., Tenenbaum, G.: On the set of divisors of an integer. Invent. Math. 76, 121–128 (1984)
Maier, H., Tenenbaum, G.: On the normal concentration of divisors. J. Lond. Math. Soc. (2) 31, 393–400 (1985)
Maier, H., Tenenbaum, G.: On the normal concentration of divisors. II. Math. Proc. Camb. Philos. Soc. 147, 513–540 (2009)
Tenenbaum, G.: Sur la concentration moyenne des diviseurs. Comment. Math. Helv. 60, 411–428 (1985)
Tenenbaum, G.: Fonctions \(\Delta \) de Hooley et applications. In: Séminaire de théorie des nombres, Paris 1984–85. Progress in Mathematics, vol. 63, pp. 225–239. Birkhäuser, Boston, MA (1986)
Tenenbaum, G.: Crible d’ératosthène et modèle de Kubilius. In: Number Theory in Progress (Zakopane-Kościelisko, 1997), vol. 2, pp. 1099–1129. de Gruyter, Berlin (1999)
Tenenbaum, G.: Some of Erdős’ unconventional problems in number theory, thirty-four years later. In: Erdős centennial. Bolyai Society Mathematical Studies, vol. 25, pp. 651–681. János Bolyai Mathematical Society, Budapest (2013)
Acknowledgements
This collaboration began at the MSRI program on Analytic Number Theory, which took place in the first half of 2017 and which was supported by the National Science Foundation under Grant No. DMS-1440140. All three authors are grateful to MSRI for allowing us the opportunity to work together. The project was completed during a visit of KF and DK to Oxford in the first half of 2019. Both authors are grateful to the University of Oxford for its hospitality. KF is supported by the National Science Foundation Grants DMS-1501982 and DMS-1802139. In addition, his stay at Oxford in early 2019 was supported by a Visiting Fellowship at Magdalen College Oxford. BG is supported by a Simons Investigator Grant, which also funded DK’s visit to Oxford. DK is also supported by the Courtois Chair II in fundamental research, by the Natural Sciences and Engineering Research Council of Canada (RGPIN-2018-05699) and by the Fonds de recherche du Québec - Nature et technologies (2019-PR-256442 and 2022-PR-300951).
Appendices
Appendix A. Some probabilistic lemmas
Throughout this section, \({\textbf{A}}\subset {\mathbb {N}}\) will be a random set, with \({\mathbb {P}}(i \in {\textbf{A}}) = 1/i\) and these choices being independent for different values of i.
Lemma A.1
For any finite subset \(B \subset {\mathbb {Z}}_{\geqslant 4}\) and any \(k\in {\mathbb {Z}}_{\geqslant 0}\), we have
where
Proof
The result follows by a standard inclusion-exclusion argument. We have
For the lower bound, we note that
Since \(\sum _{a\in B}1/(a-1)^2 < 1/(\min B-2)\leqslant 4/\min B\), the proof is complete. \(\square \)
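The displayed statement of Lemma A.1 is not reproduced above, but its shape is a Poisson approximation for \(\# ({\textbf{A}}\cap B)\). The sketch below compares the exact law (computed by the standard Poisson-binomial recurrence) with a Poisson distribution of parameter \(\lambda = \sum _{a\in B} 1/(a-1)\), a natural choice here since \((1/a)/(1-1/a) = 1/(a-1)\); the particular set B and the tolerance are illustrative.

```python
import math

# P(i in A) = 1/i, independently.  dist[k] = exact P(#(A ∩ B) = k), built
# up one element of B at a time (Poisson-binomial recurrence).
B = list(range(200, 1000))
dist = [1.0]
for a in B:
    p = 1 / a
    new = [0.0] * (len(dist) + 1)
    for k, q in enumerate(dist):
        new[k] += q * (1 - p)      # a not selected
        new[k + 1] += q * p        # a selected
    dist = new

# Poisson parameter suggested by (1/a)/(1 - 1/a) = 1/(a - 1).
lam = sum(1 / (a - 1) for a in B)
```

For this B, the exact probabilities for small k agree with the Poisson values to within a few percent, consistent with an error term controlled by \(\sum _{a\in B} 1/(a-1)^2\).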
Lemma A.2
Uniformly for \(B\subset {\mathbb {N}}\) with \(\lambda :=\sum _{m\in B}1/m\geqslant 1\) and \(0\leqslant \varepsilon \leqslant 1\), we have
Proof
This follows by the upper bound in Lemma A.1 with standard bounds on the tails of the Poisson distribution, e.g. Norton’s bounds [15, Theorem 09]. \(\square \)
Lemma A.3
For any \(x>0\) and finite set \(B\subset {\mathbb {N}}\),
Proof
The random variable \(\# ({\textbf{A}}\cap B)\) is the sum of independent Bernoulli random variables and thus
Note that all factors are positive because \(x>0\). The lemma now follows from the inequality \(1+y\leqslant e^y\), valid for all real y. \(\square \)
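The two steps of this proof can be checked numerically: since \(\# ({\textbf{A}}\cap B)\) is a sum of independent Bernoulli\((1/b)\) variables, its moment generating function is the exact product below, and the exponential bound follows from \(1+y\leqslant e^y\). The set B and test points x are illustrative.

```python
import math

# E[x^{#(A ∩ B)}] = prod_{b in B} ((1 - 1/b) + x/b) = prod (1 + (x-1)/b),
# which is at most exp((x - 1) * sum_{b in B} 1/b) by 1 + y <= e^y.
B = range(10, 100)
lam = sum(1 / b for b in B)
checks = []
for x in (0.5, 1.0, 2.0, 5.0):
    mgf = math.prod(1 + (x - 1) / b for b in B)
    checks.append(mgf <= math.exp((x - 1) * lam) * (1 + 1e-12))
```

Note that for \(x = 1\) both sides equal 1, matching the equality case \(y = 0\) of \(1+y\leqslant e^y\).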
Lemma A.4
Let \(k\in {\mathbb {N}}\), and let B and G be finite sets such that \(B\subset G\subset {\mathbb {Z}}_{\geqslant 4}\) and
Then
Proof
Since \(|B|=k\), we have
The denominator is estimated using Lemma A.1, whereas for the numerator we simply note that
This completes the proof of the lemma. \(\square \)
Lemma A.5
Given \(0<c<1\) and \(D \geqslant e^{100/c}\), the probability that \({\textbf{A}}\subset (D^c,D]\) satisfies
is \(\geqslant 1 - O(e^{-(1/4) (\log D)^{1/2}})\).
Proof
It suffices to bound the probability that
whenever \(\alpha \log D, \beta \log D \in {\mathbb {N}}\). The random variable \(N=N(\alpha ,\beta ):= \# ({\textbf{A}}\cap (D^{\alpha },D^{\beta }])\) is the sum of Bernoulli random variables and has expectation \({\mathbb {E}}N = M+O(1)\), where
By Lemma A.3, \({\mathbb {E}}\lambda ^{ N}\leqslant e^{(\lambda -1) {\mathbb {E}}N}\). Thus, for \(y = (\log D)^{3/4}\) and \(\lambda _j = 1+ (-1)^j \frac{y}{\log D}\) we have
Summing over all possible \(\alpha ,\beta \) completes the proof. \(\square \)
Lemma A.6
Uniformly for \(X \geqslant 2\) and \(K\geqslant 2\) we have
with probability \(\geqslant 1-e^{2 - K}\).
Proof
We use Chernoff’s inequality, often called Rankin’s trick in this context:
because \(e^t\leqslant 1+2t\) for all \(t\in [0,1]\). This concludes the proof. \(\square \)
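The elementary bound \(e^t \leqslant 1 + 2t\) on \([0,1]\) used in the final step can be confirmed on a grid; the gap \(1 + 2t - e^t\) vanishes at \(t = 0\) and is maximal at \(t = \log 2\).

```python
import math

# Check e^t <= 1 + 2t for t in [0, 1] on a fine grid; the tiny tolerance
# only absorbs floating-point rounding at the equality point t = 0.
ok_exp = all(math.exp(k / 1000) <= 1 + 2 * (k / 1000) + 1e-12 for k in range(1001))
```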
Lemma A.7
Let \(\eta \in [0,1]\) and let \(J_1,\ldots , J_d \subset {\mathbb {N}}\) be mutually disjoint intervals. Suppose that \(X \subset J_1 \times \cdots \times J_d\) is a set of size at least \(\eta \prod _i \max J_i\). If \(\min _i |J_i|\) is sufficiently large in terms of \(\eta \) and d, then with probability \(\geqslant (\eta /4)^d\), there are distinct elements \(a_i \in {\textbf{A}}\) with \((a_1,\ldots , a_d) \in X\).
Proof
Let \(M_i = \max J_i\) for each i. We will prove the lemma by induction on d.
The case \(d = 1\) follows by direct calculation: Suppose that \(X \subset J_1\) has size \(\geqslant \eta M_1\). Then
Let us now assume we have proven the lemma for \(d-1\) intervals, and let us prove it for d intervals \(J_1,\ldots ,J_d\). For each \(j_1\in J_1\), we set
Let \(Y=\{j_1\in J_1:|X_{j_1}|\geqslant (\eta /2)M_1\}\). Then \(|Y|\geqslant (\eta /2)M_1\), because otherwise we would have \(|X|<\eta \prod _i M_i\), a contradiction to our hypotheses. By the case \(d = 1\) (just described), \({\textbf{A}}\cap Y\) is nonempty with probability \(\geqslant \eta /4\). Fix some \(a_1 \in {\textbf{A}}\cap Y\). Then, by the inductive hypothesis and the fact that the \(J_i\) are disjoint, with probability \(\geqslant (\eta /4)^{d-1}\), independent of the choice of \(a_1\), there are elements \(a_i \in {\textbf{A}}\cap J_i\), \(i = 2,\ldots , d\) with \((a_2,\ldots , a_d) \in X_{a_1}\), and therefore \((a_1,\ldots , a_d) \in X\). The disjointness of the \(J_i\) of course guarantees that the \(a_i\) are all distinct. This completes the proof. \(\square \)
Lemma A.8
If \(X_j,Y_j\) live on the same discrete probability space for \(1\leqslant j\leqslant k\), and furthermore \(X_1,\ldots ,X_k\) are independent, and \(Y_1,\ldots ,Y_k\) are also independent, then
Proof
We begin with the following identity
Denoting by \(\Omega \) the domain of \((X_1,\ldots ,X_k)\), and writing \(a_i={\mathbb {P}}(X_i=\omega _i)\), \(b_i={\mathbb {P}}(Y_i=\omega _i)\), we then have
\(\square \)
Appendix B. Basic properties of entropy
The notion of entropy plays a key role in our paper. In this appendix we record the key facts about it that we need. Proofs may be found in many places. One convenient resource is [1].
If X is a random variable taking values in a finite set then we define
where the log is to base e and the summation runs over the range of X.
If \({\textbf{p}} = (p_1,\ldots , p_n)\) is a vector of probabilities (that is, if \(p_1,\ldots , p_n \geqslant 0\) and \(p_1 + \cdots + p_n = 1\)), then we write
There should be no danger of confusing the two slightly different usages.
Our first lemma gives a simple upper bound for multinomial coefficients in terms of entropies.
Lemma B.1
Let \(n, n_1,\ldots , n_k\) be non-negative integers with \(\sum n_i = n\). Then
where \({\textbf{p}} = (p_1,\ldots , p_k)\) with \(p_i := n_i/n\).
Proof
The right-hand side is \((n/n_1)^{n_1} \ldots (n/n_k)^{n_k}\). Now simply observe that
\(\square \)
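Lemma B.1, in the form \(\binom {n}{n_1,\ldots ,n_k} \leqslant e^{n {\mathbb {H}}({\textbf{p}})}\), is easy to test numerically. The following Python sketch (illustrative; the choice \(n = 20\), \((n_1,n_2,n_3) = (5,7,8)\) is arbitrary) compares the two sides.

```python
from math import log, exp, factorial

def multinomial(n, parts):
    """The multinomial coefficient n! / (n_1! ... n_k!)."""
    out = factorial(n)
    for m in parts:
        out //= factorial(m)
    return out

def entropy(p):
    """Entropy H(p) = -sum p_i log p_i (natural log), ignoring zeros."""
    return -sum(x * log(x) for x in p if x > 0)

n, parts = 20, [5, 7, 8]
lhs = multinomial(n, parts)                      # the multinomial coefficient
rhs = exp(n * entropy([m / n for m in parts]))   # the entropy upper bound
```

With these values \(\mathrm {lhs} \approx 10^8\) while \(\mathrm {rhs} \approx 2.4 \times 10^9\), consistent with the lemma.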
Our next lemma is a simple and well-known upper bound for the entropy.
Lemma B.2
Let X be a random variable taking values in a set of size N. Then \({\mathbb {H}}(X) \leqslant \log N\).
Proof
This follows immediately from the convexity of the function \(L(x) = -x \log x\) and Jensen’s inequality. See [1, Lemma 14.6.1 (i)]. \(\square \)
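Numerically, the bound \({\mathbb {H}}(X) \leqslant \log N\) is attained exactly by the uniform distribution and is strict otherwise. A minimal Python check (ours; the skewed distribution is an arbitrary example):

```python
from math import log

def entropy(p):
    """Entropy of a probability vector, natural logarithm."""
    return -sum(x * log(x) for x in p if x > 0)

N = 5
uniform = [1 / N] * N                       # attains H = log N
skewed = [0.5, 0.2, 0.15, 0.1, 0.05]        # strictly below log N
```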
The next lemma is simple and has no doubt appeared elsewhere, but we do not know an explicit reference. In its statement, we use the notation \(\langle {\textbf{a}},{\textbf{p}}\rangle = \sum _{i = 1}^n a_i p_i\).
Lemma B.3
Let \({\textbf{p}} = (p_1,\ldots , p_n)\) be a vector of probabilities, and let \({\textbf{a}} = (a_1,\ldots , a_n)\) be a vector of real numbers. Then
and equality occurs if and only if \(p_j = e^{a_j} / \sum _{i=1}^n e^{a_i}\) for all j.
Proof
Let us begin by recalling that if \(t_1,\ldots ,t_n>0\) are such that
then the concavity of the logarithm implies that
for all \(x_1,\ldots ,x_n>0\). In addition, equality occurs in (B.1) if and only if \(x_1=\cdots =x_n\). One may also prove this fact by induction on n, and by noticing that the case \(n=2\) is equivalent to having \(u^t\leqslant tu+ 1-t\) for all \(u>0\) and all \(t\in (0,1)\), with equality occurring if and only if \(u=1\).
Let us now prove the lemma. If \(p_j=1\) for some j, then \({\mathbb {H}}({\textbf{p}})+\langle {\textbf{a}}, {\textbf{p}}\rangle = a_j\). If \(n=1\), then this is equal to \(\log (\sum _{i=1}^ne^{a_i})\), whereas if \(n\geqslant 2\), then we have \(a_j<\log (\sum _{i=1}^n e^{a_i})\), so that the lemma holds in both cases. Assume now that \(p_j\in (0,1)\) for all j. We then have
We may then use (B.1) with \(t_j=p_j\) and \(x_j=e^{a_j}/p_j\) to complete the proof of the lemma. \(\square \)
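Lemma B.3 is the Gibbs variational principle: \({\mathbb {H}}({\textbf{p}}) + \langle {\textbf{a}},{\textbf{p}}\rangle \leqslant \log \sum _i e^{a_i}\), with equality exactly at the "softmax" weights \(p_j = e^{a_j}/\sum _i e^{a_i}\). The following Python sketch (ours, with an arbitrary vector \({\textbf{a}}\)) verifies both the equality case and the strict inequality for a different distribution.

```python
from math import log, exp

def entropy(p):
    return -sum(x * log(x) for x in p if x > 0)

a = [0.3, -1.2, 2.0, 0.0]
Z = sum(exp(ai) for ai in a)
log_Z = log(Z)

def objective(p):
    """H(p) + <a, p>, the left-hand side of Lemma B.3."""
    return entropy(p) + sum(ai * pi for ai, pi in zip(a, p))

gibbs = [exp(ai) / Z for ai in a]   # the maximiser predicted by the lemma
other = [0.25] * 4                  # any other distribution falls short
```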
The next lemma, known as the chain rule for entropy, is nothing more than a short computation.
Lemma B.4
Let X, Y be random variables taking values in finite sets. Then
Remark
The sum over y is usually written \({\mathbb {H}}(X | Y)\) and called the conditional entropy.
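The chain rule \({\mathbb {H}}(X,Y) = {\mathbb {H}}(Y) + \sum _y {\mathbb {P}}(Y=y)\,{\mathbb {H}}(X \,|\, Y = y)\) can be verified on any small joint distribution; the following Python sketch (illustrative, with an arbitrary joint law) does so.

```python
from math import log
from collections import defaultdict

def entropy(dist):
    """Entropy of a distribution given as a dict of probabilities."""
    return -sum(p * log(p) for p in dist.values() if p > 0)

# A small joint distribution P(X = x, Y = y).
joint = {(0, 'a'): 0.1, (0, 'b'): 0.3, (1, 'a'): 0.2, (1, 'b'): 0.4}

marg_Y = defaultdict(float)
for (x, y), p in joint.items():
    marg_Y[y] += p

# Conditional entropy H(X | Y) = sum_y P(Y=y) H(X | Y=y).
cond = 0.0
for y, py in marg_Y.items():
    cond_dist = {x: p / py for (x, yy), p in joint.items() if yy == y}
    cond += py * entropy(cond_dist)

chain_lhs = entropy(joint)              # H(X, Y)
chain_rhs = entropy(marg_Y) + cond      # H(Y) + H(X | Y)
```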
We will apply the preceding result together with the following observation.
Lemma B.5
Suppose that X, Y are random variables with finite ranges and that Y is a deterministic function of X. Then \({\mathbb {H}}(X,Y) = {\mathbb {H}}(X)\).
Proof
This follows from Lemma B.4 with the role of X and Y reversed, since all the entropies \({\mathbb {H}}(Y | X = x)\) are zero. \(\square \)
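Concretely, if \(Y = f(X)\) then the joint law of (X, Y) is just a relabelling of the law of X, so the two entropies agree. A two-line Python check (ours; \(f(x) = x \bmod 2\) is an arbitrary deterministic function):

```python
from math import log

def entropy(dist):
    return -sum(p * log(p) for p in dist.values() if p > 0)

pX = {0: 0.2, 1: 0.3, 2: 0.5}
f = lambda x: x % 2                          # Y = f(X) is determined by X
joint = {(x, f(x)): p for x, p in pX.items()}  # law of (X, Y)
```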
The next result, known as the submodularity property of entropy, is a crucial ingredient in our paper.
Lemma B.6
Let X, Y, Z be any random variables taking values in finite sets. Then
Proof
This is [1, Lemma 14.6.1 (iv)].\(\square \)
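Submodularity, in the form \({\mathbb {H}}(X,Y,Z) + {\mathbb {H}}(Z) \leqslant {\mathbb {H}}(X,Z) + {\mathbb {H}}(Y,Z)\), holds for every joint distribution, so it can be checked on a randomly generated one. The following Python sketch (ours, illustrative) draws a random law on \(\{0,1\}^3\) and compares the two sides.

```python
import random
from math import log
from collections import defaultdict

def entropy(dist):
    return -sum(p * log(p) for p in dist.values() if p > 0)

def marginal(joint, coords):
    """Marginal of a joint dict-distribution onto the given coordinates."""
    m = defaultdict(float)
    for key, p in joint.items():
        m[tuple(key[i] for i in coords)] += p
    return m

rng = random.Random(1)
keys = [(x, y, z) for x in range(2) for y in range(2) for z in range(2)]
weights = [rng.random() for _ in keys]
total = sum(weights)
joint = {k: w / total for k, w in zip(keys, weights)}  # random law of (X,Y,Z)

lhs = entropy(joint) + entropy(marginal(joint, (2,)))               # H(X,Y,Z)+H(Z)
rhs = entropy(marginal(joint, (0, 2))) + entropy(marginal(joint, (1, 2)))  # H(X,Z)+H(Y,Z)
```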
Appendix C. Maier–Tenenbaum flags
The purpose of this appendix is to say a little more about the bound (3.12), which corresponds in the language of this paper to [22, Theorem 1.4]. Numerically, this bound is \({{\tilde{\gamma }}}_{2^r} \gg (0.12885796477\ldots )^r\), which is a little weaker than the bound leading to Theorem 2, which is \({{\tilde{\gamma }}}_{2^r} \gg (0.140605674848\ldots )^r\). What is interesting, however, is that the flags \({\mathscr {V}}\) which lead to (3.12) are completely different to the binary flags which have been the main focus of our paper. The fact that these very different flags – the “Maier–Tenenbaum flags” – lead to a result which appears to be within 10 % of optimal suggests that they will have a key role to play in any future upper bound arguments for these questions.
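The "within 10%" claim is a quick arithmetic check on the two constants quoted above (Python, purely illustrative):

```python
c_mt = 0.12885796477      # constant from the Maier–Tenenbaum flags, cf. (3.12)
c_bin = 0.140605674848    # constant from the binary flags, cf. Theorem 2

gap = 1 - c_mt / c_bin    # relative shortfall of the Maier–Tenenbaum bound
```

The shortfall works out to roughly 8.4%, i.e. comfortably within 10%.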
Definition C.1
(Maier–Tenenbaum flag of order r) Let \(k = 2^r\) be a power of two. Identify \({\mathbb {Q}}^k\) with \({\mathbb {Q}}^{{\mathcal {P}}[r]}\) and define a flag \({\mathscr {V}}\), \(\langle {\textbf{1}}\rangle = V_0 \leqslant V_1 \leqslant \cdots \leqslant V_r \leqslant {\mathbb {Q}}^{{\mathcal {P}}[r]}\), as follows: \(V_i = {\text {Span}}({\textbf{1}}, \omega ^1,\ldots , \omega ^{i})\), where \(\omega ^i_S = 1_{i \in S}\) for \(S \subset [r]\).
Remark
We have \(\dim (V_i) = i+1\) and in particular \(V_r\) is much smaller than \({\mathbb {Q}}^k\), in contrast to the situation for binary systems. We leave it to the reader to check that \({\mathscr {V}}\) is nondegenerate.
Recall that \({\mathscr {V}}\) gives rise to a tree structure, with the cells at level i being the intersections of cosets \(x + V_i\) with the cube \(\{0,1\}^k\) (cf. Sect. 7.2). It is easy to check that this tree structure has a very simple form: the cell \(\Gamma _i = V_i \cap \{0,1\}^k\) is \(\{{\textbf{0}},{\textbf{1}}, \omega ^1, {\textbf{1}} - \omega ^1,\ldots , \omega ^i, {\textbf{1}} - \omega ^i\}\), and it divides into three children at level \(i-1\): the cell \(\Gamma _{i-1}\) together with the two singletons \(\{ \omega ^i\}\) and \(\{{\textbf{1}} - \omega ^i\}\).
The recursive definition of the quantities \(f^C({\varvec{\rho }})\) (see (7.4)) therefore becomes \(f^{\Gamma _1}({\varvec{\rho }}) = 3\),
In addition, the \(\rho \)-equations (7.5) become
On the one hand, iterating (C.2) yields that
for all \(j\geqslant 1\). On the other hand, combining (C.1) and (C.2), we find that
and thus
for all \(j\geqslant 1\). Hence, we obtain the formulas
Let us also note that the above discussion implies that
Now, assuming that the conditions of Proposition 7.7 hold, we therefore have
Now it can be shown by explicit calculation that the conditions of Proposition 7.7 do hold. The optimal measures \(\mu _i^*\) are all induced from the measure \(\mu ^*\) in which
In addition, we have
We may then prove by a slightly lengthy computation whose details we leave to the reader that the optimal parameters \({{\textbf{c}}}^*\) are given by
It can also be shown that \(\gamma ^{{\text {res}}}_k({\mathscr {V}}) = \gamma _k({\mathscr {V}})\), by showing that the full entropy condition (3.6) follows from the restricted conditions (7.11). This is a little involved, but a fairly direct inductive argument can be made to work and this is certainly less subtle than the arguments of Sect. 8. In this way one may establish the bound
Finally, a relatively routine perturbative argument yields the same bound for \({{\tilde{\gamma }}}_{2^r}\).
It will be noted that (C.4) is strictly stronger than (3.12), the bound obtained in [22]. This is because, in essence, Maier and Tenenbaum chose slightly suboptimal measures and parameters on the system \({\mathscr {V}}\), roughly corresponding to \(\mu (\omega ^j) \sim 3^{j - r-1}\), which then leads to \(c_j \sim \big ( \frac{1 - 1/\log 3}{1 - 1/\log 27} \big )^j\).
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Ford, K., Green, B. & Koukoulopoulos, D. Equal sums in random sets and the concentration of divisors. Invent. math. 232, 1027–1160 (2023). https://doi.org/10.1007/s00222-022-01177-y