Abstract
We study the extent to which divisors of a typical integer n are concentrated. In particular, defining \(\Delta (n) := \max _t \# \{d | n, \log d \in [t,t+1]\}\), we show that \(\Delta (n) \geqslant (\log \log n)^{0.35332277\ldots }\) for almost all n, a bound we believe to be sharp. This disproves a conjecture of Maier and Tenenbaum. We also prove analogs for the concentration of divisors of a random permutation and of a random polynomial over a finite field. Most of the paper is devoted to a study of the following much more combinatorial problem of independent interest. Pick a random set \({\textbf{A}} \subset {\mathbb {N}}\) by selecting i to lie in \({\textbf{A}}\) with probability 1/i. What is the supremum of all exponents \(\beta _k\) such that, almost surely as \(D \rightarrow \infty \), some integer is the sum of elements of \({\textbf{A}} \cap [D^{\beta _k}, D]\) in k different ways? We characterise \(\beta _k\) as the solution to a certain optimisation problem over measures on the discrete cube \(\{0,1\}^k\), and obtain lower bounds for \(\beta _k\) which we believe to be asymptotically sharp.
Part I. Main results and overview of the paper
1 Introduction
1.1 The concentration of divisors
Given an integer n, we define the Delta function
$$\begin{aligned} \Delta (n) := \max _t \# \{d \mid n,\ \log d \in [t,t+1]\}, \end{aligned}$$
that is to say, the maximum number of divisors n has in any interval of logarithmic length 1. Its normal order (almost sure behaviour) has proven quite mysterious, and indeed it was a celebrated achievement of Maier and Tenenbaum [20], answering a question of Erdős from 1948 [9], to show that \(\Delta (n) > 1\) for almost all n.
Work on the distribution of \(\Delta \) began in the 1970s with Erdős and Nicolas [7, 8]. However, it was not until the work of Hooley [16] that the Delta function received proper attention. Among other things, Hooley showed how bounds on the average size of \(\Delta \) can be used to count points on certain algebraic varieties. Further work on the normal and average behaviour of \(\Delta \) can be found in the papers of Tenenbaum [23, 24], Hall and Tenenbaum [12,13,14], and of Maier and Tenenbaum [20,21,22]. See also [15, Ch. 5,6,7]. Finally, Tenenbaum’s survey paper [26, p. 652–658] includes a history of the Delta function and a description of many applications in number theory.
The best bounds for \(\Delta (n)\) for “normal” n currently known were obtained in a more recent paper of Maier and Tenenbaum [22].
Theorem MT (Maier–Tenenbaum [22]) Let \(\varepsilon >0\) be fixed. Then
for almost all n, where
It is conjectured in [22] that the lower bound is optimal.
One of the main results of this paper is a disproof of this conjecture.
Theorem 1
Let \(\varepsilon >0\) be fixed. Then
$$\begin{aligned} \Delta (n) \geqslant (\log \log n)^{\eta - \varepsilon } \end{aligned}$$
for almost all n, where \(\eta = 0.35332277270132346711\ldots \).
The constant \(\eta \), which we believe to be sharp, is described in relation (1.3) below, just after the statement of Theorem 2.
1.2 Packing divisors
Let us briefly attempt to explain, without details, why it was natural for Maier and Tenenbaum to make their conjecture, and what it is that allows us to find even more tightly packed divisors.
We start with a simple observation. Let n be an integer, and suppose we can find pairs of divisors \(d_i, d'_i\) of n, \(i = 1,\ldots , k\), such that
- \(1 < d_i/d'_i \leqslant 2^{1/k}\);
- The sets of primes dividing \(d_id'_i\) are disjoint, as i varies in \(\{1,\ldots , k\}\).
Then we can find \(2^k\) different divisors of n in a dyadic interval, namely all products \(a_1\ldots a_k\) where \(a_i\) is either \(d_i\) or \(d'_i\).
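This observation is elementary enough to check mechanically. The following sketch uses hypothetical pairs of divisors of \(n = 5\cdot 7\cdot 9\cdot 11 = 3465\), with disjoint prime supports \(\{5,7\}\) and \(\{3,11\}\), and verifies that all \(2^k\) products land in a single dyadic interval:

```python
from itertools import product
from math import prod

# Hypothetical data: pairs (d_i', d_i) of divisors of n = 3465 with
# 1 < d_i/d_i' <= 2^(1/k) and pairwise disjoint prime supports.
n = 3465
pairs = [(5, 7), (9, 11)]
k = len(pairs)
assert all(1 < d / dp <= 2 ** (1 / k) for dp, d in pairs)

# The 2^k products a_1 * a_2, with a_i in {d_i', d_i}, are all divisors
# of n and all lie within a multiplicative factor of 2 of each other.
divs = sorted(prod(choice) for choice in product(*pairs))
assert all(n % d == 0 for d in divs)
assert max(divs) < 2 * min(divs)
```

Here the four products are 45, 55, 63 and 77, all divisors of 3465 lying in the dyadic interval [45, 90).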
In [22], Maier and Tenenbaum showed how to find many such pairs of divisors \(d_i, d'_i\). To begin with, they look only at the large prime factors of n. They first find one pair \(d_1, d'_1\) using the technique of [20]. Then, using a modification of the argument, they locate a further pair \(d_2\) and \(d'_2\), but with these divisors not having any primes in common with \(d_1, d'_1\). They continue in this fashion to find \(d_3, d'_3\), \(d_4,d_4'\), etc., until essentially all the large prime divisors of n have been used. After this, they move on to a smaller range of prime factors of n, and so on.
By contrast, we eschew an iterative approach and select \(2^k\) close divisors from amongst the large prime divisors of n in one go, in a manner that is combinatorially quite different to that of Maier and Tenenbaum. We then apply a similar technique to a smaller range of prime factors of n, and so on. This turns out to be a more efficient way of locating proximal divisors.
In fact, we provide a general framework that encapsulates all possible combinatorial constructions one might use to pack many divisors close to each other. To work in this generality it is necessary to use a probabilistic formalism. One effect of this is that, even though our work contains that of Maier and Tenenbaum as a special case, the arguments here will look totally different.
1.3 Random sets and equal sums
For most of the paper we do not talk about integers and divisors, but rather about the following model setting. Throughout the paper, \({\textbf{A}}\) will denote a random set of positive integers in which i is included in \({\textbf{A}}\) with probability 1/i, these choices being independent for distinct values of i. We refer to \({\textbf{A}}\) as a logarithmic random set.
A large proportion of our paper will be devoted to understanding conditions under which there is an integer which can be represented as a sum of elements of \({\textbf{A}}\) in (at least) k different ways. In particular, we wish to obtain bounds on the quantities \(\beta _k\) defined in the following problem.
Problem 1
Let \(k \geqslant 2\) be an integer. Determine \(\beta _k\), the supremum of all exponents \(c < 1\) for which the following is true: with probability tending to 1 as \(D \rightarrow \infty \), there are distinct sets \(A_1, \ldots , A_k \subset {\textbf{A}}\cap [D^c, D]\) with equal sums, i.e., \(\sum _{a \in A_1} a = \cdots = \sum _{a \in A_k} a\).
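Problem 1 can be made concrete, if only at toy scale, by direct simulation. The sketch below (with illustrative parameters D, c and k = 2, far below any asymptotic regime; all names are ours) draws a logarithmic random set and searches exhaustively for equal subset sums:

```python
import random
from collections import Counter

def logarithmic_set(lo, hi, rng):
    # include each integer i in [lo, hi] independently with probability 1/i
    return [i for i in range(lo, hi + 1) if rng.random() < 1 / i]

def has_equal_subset_sums(A, k=2):
    # brute force: are there k distinct subsets of A with the same sum?
    sums = Counter()
    for mask in range(1 << len(A)):
        s = sum(a for j, a in enumerate(A) if mask >> j & 1)
        sums[s] += 1
        if sums[s] >= k:
            return True
    return False

rng = random.Random(0)
D, c, trials = 1000, 0.3, 100
hits = 0
for _ in range(trials):
    A = logarithmic_set(int(D ** c), D, rng)
    # cap |A| to keep the enumeration cheap; oversized draws are rare
    # and are simply counted as misses
    if len(A) <= 16 and has_equal_subset_sums(A):
        hits += 1
# hits / trials is a crude finite-size proxy for the probability in Problem 1
```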
The motivation for the random set \({\textbf{A}}\) comes from our knowledge of the anatomy of integers, permutations and polynomials. For a random integer \(m\leqslant x\), with x large, let \(U_k\) be the event that m has a prime factor in the interval \((e^{k},e^{k+1}]\). For a random permutation \(\sigma \in S_n\), let \(V_k\) be the event that \(\sigma \) has a cycle of size k, and for a random monic polynomial f of degree n over \({\mathbb {F}}_q\), with n large, let \(W_k\) be the event that f has an irreducible factor of degree k. Then it is known (see e.g., [2, 3, 15]) that \(U_k\), \(V_k\) and \(W_k\) each occur with probability close to 1/k, and also that the \(U_k\) are close to independent for \(k=o(\log x)\), the \(V_k\) are close to independent for \(k=o(n)\), and the \(W_k\) are close to independent for k large and \(k=o(n)\). Thus, the model set \({\textbf{A}}\) captures the factorization structure of random integers, random permutations and random polynomials over a finite field. It is then relatively straightforward to transfer results about subset sums of \({\textbf{A}}\) to divisors of integers, permutations and polynomials. Section 2 below contains details of the transference principle.
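The heuristic \({\mathbb {P}}(U_k) \approx 1/k\) is easy to probe numerically. The following Monte Carlo sketch (illustrative parameters \(x = 10^6\), k = 3; all names are ours) factors uniform random integers by trial division and records how often a prime factor falls in \((e^k, e^{k+1}]\). For such small k the true probability is somewhat below 1/k, but of the same order:

```python
import random
from math import exp

def has_prime_factor_in(m, lo, hi):
    # exact check by trial division: does m have a prime factor p, lo < p <= hi?
    d = 2
    while d * d <= m:
        if m % d == 0:
            while m % d == 0:
                m //= d
            if lo < d <= hi:
                return True
        d += 1
    return lo < m <= hi  # any leftover m > 1 is prime

rng = random.Random(1)
x, k, trials = 10 ** 6, 3, 4000
lo, hi = exp(k), exp(k + 1)
hits = sum(has_prime_factor_in(rng.randrange(1, x + 1), lo, hi)
           for _ in range(trials))
freq = hits / trials  # of the same order of magnitude as 1/k
```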
The main result of this paper is an asymptotic lower bound on \(\beta _k\).
Theorem 2
We have \(\liminf _{r\rightarrow \infty } (\beta _{2^r})^{1/r} \geqslant \rho /2\), where
is a specific constant defined as the unique solution in [0, 1/3] of
where the sequence \(a_j\) is defined by
The proof of Theorem 2 will occupy the bulk of this paper, and has three basic parts:
- (a) Showing that for every \(r\geqslant 1\), \(\beta _{2^r} \geqslant \theta _r\) for a certain explicitly defined constant \(\theta _r\);
- (b) Showing that \(\lim _{r\rightarrow \infty } \theta _r^{1/r}\) exists;
- (c) Showing that (1.1) has a unique solution \(\rho \in [0,1/3]\) and that
$$\begin{aligned} \rho =2\lim _{r\rightarrow \infty } \theta _r^{1/r}. \end{aligned}$$
In the sequel we shall refer to “Theorem 2 (a)”, “Theorem 2 (b)” and “Theorem 2 (c)”. Parts (a), (b) and (c) are quite independent of one another, with the proof of (a) (given in Sect. 9.2) being by far the longest of the three. The definition of \(\theta _r\), while somewhat complicated, is fairly self-contained: see Definition 9.6. Parts (b) and (c) are then problems of an analytic and combinatorial flavour which can be addressed largely independently of the main arguments of the paper. The formula (1.1) allows for a quick computation of \(\rho \) to many decimal places, as the limit on the right side converges extremely rapidly. See Sect. 12 for details.
Let us now state an important corollary of Theorem 2.
Corollary 1
Define
Then
Proof
Evidently, \(\zeta _+\geqslant \zeta _-\). In addition, observe the trivial bound \(\beta _{k+1} \leqslant \beta _k\): any \(k+1\) distinct subsets with equal sums contain k distinct subsets with equal sums. Hence,
We then use Theorem 2 to find that \(\zeta _-\geqslant \eta \). \(\square \)
We conjecture that our lower bounds on \(\beta _k\) are asymptotically sharp, so that the following holds:
Conjecture 1
We have \(\zeta _+=\zeta _- = \eta \).
We will address the exact values of \(\beta _k\) in a future paper; in particular, we will show that
and
where
1.4 Application to divisors of integers, permutations and polynomials
The link between Problem 1 and the concentration of divisors is given by the following theorems. The proofs are relatively straightforward and given in the next section. Recall from (1.2) the definition of \(\zeta _+\).
Theorem 3
For any \(\varepsilon >0\), we have
for almost every n.
Remark
In principle, the proof of Theorem 3 yields an explicit bound on the size of the set of integers n with \(\Delta (n)\leqslant (\log \log n)^{\zeta _+-\varepsilon }\). However, incorporating such an improvement is a very complicated task. In addition, the obtained bound will presumably be rather weak without a better understanding of the theoretical tools we develop (cf. Sect. 3).
The same probabilistic setup allows us to quickly make similar conclusions about the distribution of divisors (product of cycles) of permutations and of polynomials over finite fields.
Theorem 4
For a permutation \(\sigma \in S_n\), denote by
where d denotes a generic divisor of \(\sigma \); that is, d is the product of a subset of the cycles of \(\sigma \).
Let \(\varepsilon >0\) be fixed. If n is sufficiently large in terms of \(\varepsilon \), then for at least \((1-\varepsilon )(n!)\) of the permutations \(\sigma \in S_n\), we have
Theorem 5
Let q be any prime power. For a polynomial \(f\in {\mathbb {F}}_q[t]\), let
Let \(\varepsilon >0\) be fixed. If n is sufficiently large in terms of \(\varepsilon \), then at least \((1-\varepsilon ) q^n\) monic polynomials of degree n satisfy
Conjecture 2
The lower bounds given in Theorems 3, 4 and 5 are sharp. That is, corresponding upper bounds with exponent \(\zeta _+ + \varepsilon \) hold.
If both Conjectures 1 and 2 hold, then we deduce that the optimal exponent in the above theorems is equal to \(\eta \).
Remark
The exponent \(\zeta _+-\varepsilon \) in Theorems 3, 4 and 5 depends only on accurate asymptotics for \(\beta _k\) as \(k\rightarrow \infty \) or, even more weakly, for \(\beta _{2^r}\) as \(r\rightarrow \infty \) (cf. (1.4)). In this work, however, we develop a framework for determining \(\beta _k\) exactly for each k.
The quantity \(\beta _k\) is also closely related to the densest packing of k divisors of a typical integer. To be specific, we define \(\alpha _k\) to be the supremum of all real numbers \(\alpha \) such that for almost every \(n\in {\mathbb {N}}\), n has k divisors \(d_1<\cdots <d_k\) with \(d_k \leqslant d_1 (1+ (\log n)^{-\alpha })\). In 1964, Erdős [10] conjectured that \(\alpha _2 = \log 3 -1\), and this was confirmed by Erdős and Hall [6] (upper bound) and Maier and Tenenbaum [20] (lower bound). The best bounds on \(\alpha _k\) for \(k\geqslant 3\) are given by Maier and Tenenbaum [22], who showed that
and (this is not stated explicitly in [22])
See also [26, p. 655–656]. In particular, it is not known if \(\alpha _3 > \alpha _4\), although Tenenbaum [26] conjectures that the sequence \((\alpha _k)_{k\geqslant 2}\) is strictly decreasing.
We can quickly deduce a lower bound for \(\alpha _k\) in terms of \(\beta _k\).
Theorem 6
For all \(k\geqslant 2\) we have \(\alpha _k \geqslant \beta _k/(1-\beta _k)\).
In particular,
which is substantially larger than the bound \(\alpha _3 \geqslant 0.0127069\ldots \) from (1.5).
Combining Theorem 6 with the bounds on \(\beta _k\) given in Theorem 2, we have improved the lower bounds (1.5) for large k.
The upper bound on \(\alpha _k\) is more delicate, and a subject which we will return to in a future paper. For now, we record our belief that the lower bound in Theorem 6 is sharp.
Conjecture 3
For all \(k\geqslant 2\) we have \(\alpha _k = \beta _k/(1-\beta _k)\).
2 Application to random integers, random permutations and random polynomials
In this section we assume the validity of Theorem 2 and use it to prove Theorems 3, 4, 5 and 6. The two main ingredients in this deduction are a simple combinatorial device (Lemma 2.1), of a type often known as a “tensor power trick”, used for building a large collection of equal subset sums, and transference results (Lemmas 2.2, 2.3 and 2.4) giving a correspondence between the random set \({\textbf{A}}\) and the prime factors of a random integer, the cycle structure of a random permutation, and the factorization of a random polynomial over a finite field. In the integer setting, this is a well-known principle, following for example from the Kubilius model of the integers (Kubilius, Elliott [4, 5], Tenenbaum [25]). We give a self-contained (modulo using the sieve) proof below.
Throughout this section, \({\textbf{A}}\) denotes a logarithmic random set.
2.1 A “tensor power” argument
In this section we give a simple combinatorial argument, first used in a related context in the work of Maier–Tenenbaum [20], which shows how to use equal subset sums in multiple intervals \(((D')^c,D']\) to create many more equal subset sums in \({\textbf{A}}\).
Lemma 2.1
Let \(k \in {\mathbb {Z}}_{\geqslant 2}\) and \(\varepsilon >0\) be fixed. Let \(D_1,D_2\) be parameters depending on D with \(3 \leqslant D_1 < D_2 \leqslant D\), \(\log \log D_1 = o(\log \log D)\) and \(\log \log D_2 = (1 - o(1)) \log \log D\) as \(D\rightarrow \infty \). Then, with probability \(\rightarrow 1\) as \(D\rightarrow \infty \), there are distinct \(A_1,\ldots , A_M \subset {\textbf{A}}\cap [D_1, D_2]\) with \(\sum _{a \in A_1} a = \cdots = \sum _{a \in A_M} a\) and \(M \geqslant (\log D)^{(\log k)/\log (1/\beta _k) - \varepsilon }\).
Remark
In particular, the result applies when \(D_1 = 3\) and \(D_2 = D\), in which case it has independent combinatorial interest, giving a (probably tight) lower bound on the growth of the representation function for a random set.
Proof
Since increasing the value of \(D_1\) only makes the proposition stronger, we may assume that \(D_1 \rightarrow \infty \) as \(D\rightarrow \infty \). Let \(0<\delta <\beta _k\), and set \(\alpha := \beta _k - \delta \). Set
and consider the intervals \([D_2^{\alpha ^{i+1}}, D_2^{\alpha ^i})\), \(i = 0,1,\ldots , m - 1\). Due to the choice of m, these all lie in \([D_1, D_2]\).
For \(i = 0,1,\ldots ,m-1\), let \(E_i\) be the event that there are distinct \(A^{(i)}_1,\ldots , A^{(i)}_k \subset {\textbf{A}}\cap [D_2^{\alpha ^{i+1}}, D_2^{\alpha ^i})\) with \(\sum _{a \in A^{(i)}_1} a = \cdots = \sum _{a \in A^{(i)}_k} a\). Then, by the definition of \(\beta _k\) and the fact that \(D_1 \rightarrow \infty \), we have \({\mathbb {P}}(E_i) = 1 - o(1)\), uniformly in \(i=0,1,\ldots ,m-1\). Here and throughout the proof, o(1) means a function tending to zero as \(D\rightarrow \infty \), at a rate which may depend on k and \(\delta \). The events \(E_i\) are independent, since they concern disjoint ranges of \({\textbf{A}}\). The Law of Large Numbers then implies that, with probability \(1 - o(1)\), at least \((1 - o(1))m\) of them occur, let us say for \(i \in I\), \(|I| = (1 - o(1))m\).
From the above discussion, we have found \(M := k^{|I|} = k^{(1 - o(1))m}\) distinct sets \(B_{\varvec{j}} = \bigcup _{i \in I} A_{j_i}^{(i)}\), \({\varvec{j}} \in [k]^{I}\), such that all of the sums \(\sum _{a \in B_{\varvec{j}}} a\) are the same. Note that
Taking \(\delta \) small enough and D large enough, the result follows. \(\square \)
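The mechanism of the proof can be replayed on toy data: given (hypothetical) equal-sum k-tuples of sets found in disjoint ranges, forming the union of one set per range produces \(k^m\) distinct sets with a common sum. A minimal sketch, with k = 2 and m = 3 hypothetical blocks:

```python
from itertools import product

# Hypothetical equal-sum pairs (k = 2) found in m = 3 disjoint ranges.
blocks = [
    [{3, 4}, {7}],          # both choices sum to 7
    [{10, 15}, {11, 14}],   # both choices sum to 25
    [{20, 21}, {41}],       # both choices sum to 41
]

# One set per block, unioned: k^m = 8 distinct sets, all of sum 7 + 25 + 41.
unions = [frozenset().union(*choice) for choice in product(*blocks)]
assert len(set(unions)) == 2 ** len(blocks)
assert {sum(u) for u in unions} == {73}
```

Distinctness of the unions uses exactly the disjointness of the ranges, as in the proof.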
2.2 Modeling prime factors with a logarithmic random set
Let X be a large parameter, suppose that
and let \(I=[i_1,i_2] \cap {\mathbb {N}}\), where
For a uniformly random positive integer \({\textbf{n}}\leqslant X\), let \({\textbf{n}}=\prod _p p^{v_p}\) be the prime factorization of \({\textbf{n}}\), where the product is over all primes. Let \({\mathscr {P}}_i\) be the set of primes in \((e^{i/K}, e^{(i+1)/K}]\), and define the random set
that is, the set of i for which \({\textbf{n}}\) has a prime factor in \({\mathscr {P}}_i\). By the sieve, it is known that the random variables \(v_p\) are nearly independent for \(p=X^{o(1)}\), and thus the probability that \(b_i\geqslant 1\) is roughly
The next lemma makes this precise.
Recall the notion of total variation distance \(d_{{\text {TV}}}(X,Y)\) between two discrete real random vectors X, Y defined on the same probability space \((\Omega ,{\mathcal {F}},{\mathbb {P}})\):
We have
provided that the random variables \(X_j,Y_j\) live on the same probability space for each j, that \(X_1,\ldots ,X_k\) are independent, and \(Y_1,\ldots ,Y_k\) are also independent. Although we believe this is a standard inequality, we could not find a good reference for it and give a proof of (2.4) in Lemma A.8. In addition, recall the identity
when X and Y take values in a countable set \(\Omega \), with \({\mathcal {F}}\) the power set of \(\Omega \). See, e.g., [19, Proposition 4.2].
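Both facts are easy to check numerically on finite toy distributions (represented below as dicts of point masses; the example marginals are ours). The sketch computes total variation via the identity (2.5) and verifies the subadditivity inequality (2.4) for a two-coordinate independent vector:

```python
from itertools import product

def tv(p, q):
    # identity (2.5): d_TV(X, Y) = (1/2) * sum_w |P(X = w) - P(Y = w)|
    return 0.5 * sum(abs(p.get(w, 0.0) - q.get(w, 0.0))
                     for w in set(p) | set(q))

def product_dist(marginals):
    # joint law of an independent vector with the given marginals
    out = {}
    for combo in product(*(m.items() for m in marginals)):
        key = tuple(w for w, _ in combo)
        pr = 1.0
        for _, v in combo:
            pr *= v
        out[key] = out.get(key, 0.0) + pr
    return out

X = [{0: 0.5, 1: 0.5}, {0: 0.9, 1: 0.1}]    # marginals of X_1, X_2
Y = [{0: 0.6, 1: 0.4}, {0: 0.85, 1: 0.15}]  # marginals of Y_1, Y_2
lhs = tv(product_dist(X), product_dist(Y))
rhs = sum(tv(x, y) for x, y in zip(X, Y))
assert lhs <= rhs + 1e-12  # inequality (2.4)
```

In this example the left side is 0.11 and the right side is 0.15.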
Lemma 2.2
Uniformly for any collection \({\mathscr {I}}\) of subsets of I, we have
Proof
For \(i_1 \leqslant i\leqslant i_2\), let \(\omega _i\) be the indicator function of the event that \({{\textbf{n}}}\) has a prime factor from \({\mathscr {P}}_i\), let \(Q_i\) be a Poisson random variable with parameter \(R_i\), with the different \(Q_i\) independent, and let \(Z_i=1_{Q_i\geqslant 1}\). Also, let \(Y_i\) be a Bernoulli random variable with \({\mathbb {P}}(Y_i=1)=1/i\), again with the \(Y_i\) independent. Let \(\varvec{\omega }, {\textbf{Z}}, {\textbf{Y}}\) denote the vectors of the variables \(\omega _i,Z_i,Y_i\), respectively. By assumption, each \({\mathscr {P}}_i \subset [\log X, X^{1/3\log \log \log X}]\). Hence, Theorem 1 of [11] implies that
In addition, note that \(d_{{\text {TV}}}(Z_i,Y_i)\ll 1/i^2\) for all i, something that can be easily proven using (2.5). Combining this estimate with (2.4), we find that
The triangle inequality then implies that \(d_{{\text {TV}}}(\varvec{\omega },{\textbf{Y}})\ll 1/\log \log X\), as desired. \(\square \)
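The estimate \(d_{{\text {TV}}}(Z_i,Y_i)\ll 1/i^2\) can be checked directly. For the sketch below we assume the Poisson parameter is exactly 1/i (in the lemma it is \(R_i\), which is only approximately 1/i); both variables take values in \(\{0,1\}\), so (2.5) reduces to a two-term sum, and the distance behaves like \(1/(2i^2)\):

```python
from math import exp

def tv_indicator_vs_bernoulli(i):
    # Z = 1_{Q >= 1} with Q ~ Poisson(1/i), versus Y ~ Bernoulli(1/i)
    pz1 = 1 - exp(-1 / i)   # P(Z = 1)
    py1 = 1 / i             # P(Y = 1)
    return 0.5 * (abs(pz1 - py1) + abs((1 - pz1) - (1 - py1)))

for i in (10, 100, 1000):
    d = tv_indicator_vs_bernoulli(i)
    # Taylor expansion: d = 1/i - (1 - e^{-1/i}) = 1/(2i^2) + O(1/i^3)
    assert 0.4 / i ** 2 < d < 0.6 / i ** 2
```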
2.3 The concentration of divisors of integers
In this section we prove Theorems 3 and 6. Recall from (1.2) the definition of \(\zeta _+\).
Proof of Theorem 3
Fix \(\varepsilon >0\), let X be large enough in terms of \(\varepsilon \), and let \({\textbf{n}} \leqslant X\) be a uniformly sampled random integer. Generate a logarithmic random set \({\textbf{A}}\). Set \(K=10 \log \log X\), \(D_1 = i_1\), \(D=D_2 = i_2\), where \(i_1\) and \(i_2\) are defined by (2.2). With our choice of parameters, the hypotheses of Lemma 2.1 hold and hence, with probability \(1 - o(1)\) as \(X\rightarrow \infty \), there are distinct sets \(A_1,\ldots , A_M \subset {\textbf{A}}\cap [D_1, D_2]\) with \(\sum _{a \in A_1} a = \cdots = \sum _{a \in A_M} a\) and \(M:=\lceil (\log \log X)^{\zeta _+ - \varepsilon }\rceil \). By Lemma A.2, with probability \(1 - o(1)\), we have
Write F for the event that both of these happen.
Recall that \({{\textbf{n}}}\) is a random integer chosen uniformly in [1, X], and let \({\textbf{I}}\) be the random set associated to \({{\textbf{n}}}\) via (2.3). By Lemma 2.2, the corresponding event \(F'\) for \({\textbf{I}}\) also holds with probability \(1-o(1)\); that is, \(F'\) is the event that \(|{\textbf{I}} \cap [D_1,D_2]| \leqslant 2\log D_2\) and that there are distinct subsets \(I_1,\ldots ,I_M\) with equal sums. Assume we are in the event \(F'\). For each \(i\in {\textbf{I}}\), \({\textbf{n}}\) is divisible by some prime \(p_i\in {\mathscr {P}}_i\). In addition, for each \(r,s\in \{1,2,\ldots ,M\}\), we have
Writing \(d_r := \prod _{i \in I_r} p_i\) for each r, we thus see that the \(d_r\)’s are all divisors of \({\textbf{n}}\) and their logarithms all lie in an interval of length 1. It follows that \({\mathbb {P}}(\Delta ({\textbf{n}}) \geqslant M) = 1 - o(1)\) when \({{\textbf{n}}}\) is a uniformly sampled random integer from [1, X], as required for Theorem 3. \(\square \)
Proof of Theorem 6
Fix \(0<c < \beta _k/(1-\beta _k)\), let X be large and set \(K= (\log X)^{c}\). Define \(i_1,i_2\) by (2.2), let \(D=i_2\) and define \(c'\) by \(D^{c'} = i_1\). Let \({{\textbf{n}}}\) be a random integer chosen uniformly in [1, X]. We have
and therefore \(c' \leqslant \beta _k-\delta \) for some \(\delta >0\), which depends only on c. By the definition of \(\beta _k\) and Lemma 2.2, it follows that with probability \(1-o(1)\), the set \({\textbf{I}}\) defined in (2.3) has k distinct subsets \(I_1,\ldots ,I_k\) with equal sums, and moreover (cf. the proof of Theorem 3 above), \(|{\textbf{I}}| \leqslant 2\log i_2\), so that \(|I_j|\leqslant 2\log i_2\) for each j. Thus, with probability \(1-o(1)\), there are primes \(p_i\in {\mathscr {P}}_i\) (\(i\in {\textbf{I}}\)) such that for any \(r,s\in \{1,\ldots ,k\}\) we have
Thus, setting \(d_r = \prod _{i\in I_r} p_i\), we see that \(d_r \leqslant d_s \exp \big \{O\big (\frac{\log \log X}{(\log X)^c}\big ) \big \}\) for any \(r,s\in \{1,\ldots ,k\}\). Since c is arbitrary subject to \(c<\beta _k/(1-\beta _k)\), we conclude that \(\alpha _k \geqslant \beta _k/(1-\beta _k)\). \(\square \)
2.4 Permutations and polynomials over finite fields
The connection between random logarithmic sets, random permutations and random polynomials is more straightforward, owing to the well-known approximations of these objects by a vector of Poisson random variables.
For each j, let \(Z_j\) be a Poisson random variable with parameter 1/j, and such that \(Z_1,Z_2,\ldots ,\) are independent. The next proposition states that, apart from the very longest cycles, the cycle lengths of a random permutation have a joint Poisson distribution.
Lemma 2.3
For a random permutation \(\sigma \in S_n\), let \(C_j(\sigma )\) denote the number of cycles in \(\sigma \) of length j. Then for \(r = o(n)\) as \(n\rightarrow \infty \) we have
Proof
In fact there is a bound \(\ll e^{-n/r}\) uniformly in n and r; see [3]. \(\square \)
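Lemma 2.3 is easy to probe empirically. The sketch below (illustrative parameters; all names are ours) compares the frequency of the event \(C_j(\sigma )\geqslant 1\) over random permutations in \(S_{50}\) with the Poisson prediction \({\mathbb {P}}(Z_j\geqslant 1) = 1 - e^{-1/j}\):

```python
import random
from math import exp

def cycle_lengths(perm):
    # counts of cycle lengths of a permutation of {0, ..., n-1}
    seen, counts = [False] * len(perm), {}
    for s in range(len(perm)):
        length, j = 0, s
        while not seen[j]:
            seen[j] = True
            j, length = perm[j], length + 1
        if length:
            counts[length] = counts.get(length, 0) + 1
    return counts

rng = random.Random(0)
n, trials = 50, 4000
hits = {2: 0, 3: 0, 5: 0}
for _ in range(trials):
    perm = list(range(n))
    rng.shuffle(perm)
    ct = cycle_lengths(perm)
    for j in hits:
        hits[j] += ct.get(j, 0) >= 1
freqs = {j: h / trials for j, h in hits.items()}
for j, f in freqs.items():
    # empirical P(C_j >= 1) versus the Poisson prediction 1 - e^{-1/j}
    assert abs(f - (1 - exp(-1 / j))) < 0.04
```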
The next proposition states a similar phenomenon for the degrees of the irreducible factors of a random polynomial over \({\mathbb {F}}_q\), except that now one must also exclude the very smallest degrees as well.
Lemma 2.4
Let q be a prime power. Let f be a random, monic polynomial in \({\mathbb {F}}_q[t]\) of degree n. Let \(Y_d(f)\) denote the number of monic, irreducible factors of f which have degree d. Suppose that \(10\log n \leqslant r \leqslant s\leqslant \frac{n}{10\log n}\). Then
as \(n \rightarrow \infty \).
Proof
For \(r \leqslant i \leqslant s\), let \({\hat{Z}}_i\) be a negative binomial random variable \(\textrm{NB}(\frac{1}{i}\sum _{j|i} \mu (i/j) q^{j},q^{-i})\). Corollary 3.3 in [2] implies that
uniformly in q, n, r, s as in the statement of the lemma. Note that
for \(i\geqslant r\geqslant 10\log n\). A routine if slightly lengthy calculation with (2.5) gives
Combining this with (2.4), we arrive at
The conclusion follows from this, (2.6) and the triangle inequality. \(\square \)
Proof of Theorem 4
Fix \(\varepsilon >0\), let n be large enough in terms of \(\varepsilon \), let \(u=\log n\) and \(v=n/\log n\). For a random permutation \(\sigma \in S_n\), let \({\textbf{C}} = \{ j: C_j(\sigma )\geqslant 1 \}\), and define the random set \({\tilde{{\textbf{A}}}} = \{ j: Z_j \geqslant 1 \}\). As in the proof of Lemma 2.2, (2.4) and (2.5) imply that
Lemma 2.3 implies that
Hence,
as \(n \rightarrow \infty \). By Lemma 2.1, with probability \(\rightarrow 1\) as \(n\rightarrow \infty \), \({\textbf{A}}\cap (u,v]\) has M distinct subsets \(A_1,\ldots ,A_M\) with equal sums, where \(M=\lceil (\log n)^{\zeta _+-\varepsilon }\rceil \). Hence, \({\textbf{C}}\) has distinct subsets \(S_1,\ldots ,S_M\) with equal sums with probability \(\rightarrow 1\) as \(n\rightarrow \infty \). Each subset \(S_j\) corresponds to a distinct divisor of \(\sigma \), the size of the divisor being the sum of elements of \(S_j\). \(\square \)
Proof of Theorem 5
The proof is essentially the same as that of Theorem 4, except now we take \(u=10\log n\), \(v=\frac{n}{10\log n}\), \({\textbf{C}} = \{j: Y_j(f)\geqslant 1 \}\) and use Lemma 2.4 in place of Lemma 2.3. \(\square \)
3 Overview of the paper
The purpose of this section is to explain the main ideas that go into the proof of Theorem 2 in broad strokes, as well as to outline the structure of the rest of the paper. The remainder of the paper splits into three parts, and we devote a subsection to each of these. Finally, in Sect. 3.4, we make some brief comments about the relationship of our work to previous work of Maier and Tenenbaum [20, 22]. Further comments on this connection are made in Appendix C.
3.1 Part II: equal sums and the optimization problem
Part II provides a very close link between the key quantity \(\beta _k\) (which is defined in Problem 1 and appears in all four of Theorems 2, 3, 4 and 5) and a quantity \(\gamma _k\), which on the face of it appears to be of a completely different nature, being the solution to a certain optimization problem (Problem 3.7 below) involving the manner in which linear subspaces of \({\mathbb {Q}}^k\) intersect the cube \(\{0,1\}^k\).
At the heart of this connection is a fairly simple way of associating a flag to k distinct sets \(A_1,\ldots , A_k \subset A\), where A is a given set of integers (that we typically generate logarithmically).
Definition 3.1
(Flags) Let \(k \in {\mathbb {N}}\). By an r-step flag we mean a nested sequence
of vector spaces. Here \({\textbf{1}} = (1,1,\ldots , 1) \in {\mathbb {Q}}^k\). A flag is complete if \(\dim V_{i+1} = \dim V_i + 1\) for \(i = 0,1,\ldots , r-1\).
To each choice of distinct sets \(A_1,\ldots , A_k \subset A\), we associate a flag as follows. The Venn diagram of the subsets \(A_1,\ldots ,A_k\) produces a natural partition of A into \(2^k\) subsets, which we denote by \(B_\omega \) for \(\omega \in \{0,1\}^k\). Here \(A_i = \sqcup _{\omega :\omega _i=1} B_\omega \). We iteratively select vectors \(\omega ^1,\ldots ,\omega ^r\) to maximize \(\prod _{j=1}^r (\max B_{\omega ^j})\) subject to the constraint that \({\textbf{1}},\omega ^1,\ldots ,\omega ^r\) are linearly independent over \({\mathbb {Q}}\). We then define \(V_j = {\text {Span}}({\textbf{1}},\omega ^1,\ldots ,\omega ^j)\) for \(j = 0,1,\ldots , r\).
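This construction can be carried out mechanically. The following sketch (our own illustrative implementation, with exact rational rank computations) forms the Venn cells \(B_\omega \) and selects the \(\omega ^j\) greedily by decreasing \(\max B_\omega \), which maximizes the product since the linear independence constraint is a matroid constraint:

```python
from fractions import Fraction

def rank(vectors):
    # rank over Q by Gaussian elimination with exact fractions
    rows = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    for c in range(len(rows[0])):
        piv = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def venn_cells(A, subsets):
    # B_omega: elements of A keyed by membership pattern in A_1, ..., A_k
    B = {}
    for a in A:
        B.setdefault(tuple(int(a in S) for S in subsets), []).append(a)
    return B

def greedy_flag_directions(A, subsets):
    # pick omega^1, omega^2, ... with max B_omega as large as possible,
    # keeping {1, omega^1, ..., omega^j} linearly independent over Q
    k = len(subsets)
    B = venn_cells(A, subsets)
    one, chosen = [1] * k, []
    for om in sorted(B, key=lambda w: -max(B[w])):
        if rank([one] + chosen + [list(om)]) == len(chosen) + 2:
            chosen.append(list(om))
    return chosen

# toy example: A = {2,3,5,7,11}, A_1 = {2,3,11}, A_2 = {3,5,11}
dirs = greedy_flag_directions({2, 3, 5, 7, 11}, [{2, 3, 11}, {3, 5, 11}])
assert dirs == [[0, 1]]  # (1,1), (0,0) are rejected; (0,1) beats (1,0)
```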
The purpose of making this construction is difficult to describe precisely in a short paragraph. However, the basic idea is that the vectors \(\omega ^1,\ldots , \omega ^r\) and the flag \({\mathscr {V}}\) provide a natural frame of reference for studying the equal sums equation
Suppose now that \(A_1,\ldots , A_k \subset [D^c,D]\). Then the construction just described naturally leads, in addition to the flag \({\mathscr {V}}\), to the following further data: thresholds \(c_j\) defined by \(\max B_{\omega ^j} \approx D^{c_j}\), and measures \(\mu _j\) on \(\{0,1\}^k\), which capture the relative sizes of the sets \(B_{\omega } \cap (D^{c_{j+1}},D^{c_j}]\), \(\omega \in \{0,1\}^k\). Full details of these constructions are given in Sect. 4.
The above discussion motivates the following definition, which will be an important one in our paper.
Definition 3.2
(Systems) Let \(({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\) be a triple such that:
- (a) \({\mathscr {V}}\) is an r-step flag whose members \(V_j\) are distinct and spanned by elements of \(\{0,1\}^k\);
- (b) \({\mathscr {V}}\) is nondegenerate, which means that \(V_r\) is not contained in any of the subspaces \(\{ x \in {\mathbb {Q}}^k : x_i = x_j\}\), \(i \ne j\);
- (c) \({{\textbf{c}}}=(c_1,\ldots ,c_r,c_{r+1})\) with \(1\geqslant c_1 \geqslant \cdots \geqslant c_{r+1} \geqslant 0\);
- (d) \({\varvec{\mu }}=(\mu _1,\ldots ,\mu _r)\) is an r-tuple of probability measures;
- (e) \({\text {Supp}}(\mu _i)\subset V_i \cap \{0,1\}^k\) for all i.
Then we say that \(({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\) is a system. We say that a system is complete if its underlying flag is, in the sense of Definition 3.1.
Remark
The nondegeneracy condition (b) arises naturally from the construction described previously, provided one assumes the sets \(A_1,\ldots , A_k\) are distinct.
We have sketched how a system \(({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\) may be associated to any k distinct sets \(A_1,\ldots , A_k \subset [D^c, D]\). Full details are given in Sect. 4.1. There is certainly no canonical way to reverse this and associate sets \(A_i\) to a system \(({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\), even if the numbers \(\mu _j(\omega )\) are all rational. However, given a set \({\textbf{A}}\subset [D^c,D]\) (which, in our paper, will be a logarithmic random set) and a system \(({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\), there is a natural probabilistic way to construct subsets \(A_1,\ldots , A_k \subset {\textbf{A}}\) via their Venn diagram \((B_{\omega })_{\omega \in \{0,1\}^k}\): if \(a \in {\textbf{A}}\cap (D^{c_{j+1}}, D^{c_j}]\), then we put a in \(B_{\omega }\) with probability \(\mu _j(\omega )\), these choices being independent for distinct a.
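A minimal sketch of this randomized assignment, assuming concretely represented thresholds and measures (the dictionaries and names below are ours, and the degenerate point-mass measures are chosen only to make the outcome predictable):

```python
import random

def random_subsets_from_system(A, D, c, mus, rng):
    """Assign each a in A to a Venn cell B_omega and read off A_1, ..., A_k.

    c = (c_1, ..., c_{r+1}) are the thresholds, and mus[j] is a dict
    omega -> mu_j(omega); an element a in (D^{c_{j+1}}, D^{c_j}] lands in
    cell omega with probability mus[j][omega], independently over a.
    (Illustrative only; the paper's construction carries more bookkeeping.)
    """
    k = len(next(iter(mus[0])))
    B = {}
    for a in A:
        j = next((j for j in range(len(mus))
                  if D ** c[j + 1] < a <= D ** c[j]), None)
        if j is None:
            continue  # a falls outside all ranges
        omegas = list(mus[j])
        om = rng.choices(omegas, weights=[mus[j][w] for w in omegas])[0]
        B.setdefault(om, []).append(a)
    # A_i is the union of the cells B_omega with omega_i = 1
    return [sorted(x for om, xs in B.items() if om[i] for x in xs)
            for i in range(k)]

# degenerate (deterministic) measures make the outcome easy to predict
A_sets = random_subsets_from_system(
    [5, 50], 100, (1.0, 0.5, 0.0),
    [{(1, 0): 1.0}, {(1, 1): 1.0}], random.Random(0))
assert A_sets == [[5, 50], [5]]
```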
This will indeed be, roughly, our strategy for constructing, given a logarithmic random set \({\textbf{A}}\subset [D^c, D]\), distinct subsets \(A_1,\ldots , A_k \subset {\textbf{A}}\cap [D^c, D]\) satisfying the equal sums condition (3.1). Very broadly speaking, we will enact this plan in two stages, described in Sects. 5 and 6 respectively. In Sect. 5, which is by far the deeper part of the argument, we will show that (almost surely in \({\textbf{A}}\)) the distribution of tuples \((\sum _{a \in A_i} a)_{i = 1}^k\) is dense in a certain box adapted to the flag \({\mathscr {V}}\), as the \(A_i\) range over the random choices just described. Then, in Sect. 6, we will show that (almost surely) one of these tuples can be “corrected” to give the equal sums condition (3.1). This general mode of argument has its genesis in the paper [20] of Maier and Tenenbaum, but the details here will look very different. In addition to the fact that linear algebra and entropy play no role in Maier and Tenenbaum’s work, they use a second moment argument which does not work in our setting. Instead we use an \(\ell ^p\) estimate with \(p\approx 1\), building on ideas in [17, 18].
In analysing the distribution of tuples \((\sum _{a \in A_i} a)_{i = 1}^k\), the notion of entropy comes to the fore.
Definition 3.3
(Entropy of a subspace) Suppose that \(\nu \) is a finitely supported probability measure on \({\mathbb {Q}}^k\) and that \(W \leqslant {\mathbb {Q}}^k\) is a vector subspace. Then we define
Remark
This is the (Shannon) entropy of the distribution on cosets \(W + x\) induced by \(\nu \). Entropy will play a key role in our paper, and basic definitions and properties of it are collected in Appendix B.
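In concrete terms, \({\mathbb {H}}_{\nu }(W)\) can be computed by grouping the support of \(\nu \) into cosets of W. The sketch below (our own illustration) does this with exact rational linear algebra; we use entropy base 2 here purely for illustration (the paper's convention is fixed in Appendix B):

```python
from fractions import Fraction
from math import log2

def rank(vectors):
    # rank over Q via exact Gaussian elimination
    rows = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    for c in range(len(rows[0])):
        piv = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def subspace_entropy(nu, basis):
    # H_nu(W): entropy of the distribution induced by nu on the cosets W + x
    in_W = lambda v: rank(basis + [list(v)]) == rank(basis)
    classes = []  # pairs (coset representative, total mass)
    for x, p in nu.items():
        for t, (rep, mass) in enumerate(classes):
            if in_W([a - b for a, b in zip(x, rep)]):
                classes[t] = (rep, mass + p)
                break
        else:
            classes.append((x, p))
    return -sum(p * log2(p) for _, p in classes if p > 0)

# uniform measure on {0,1}^2; W = span{(1,1)} meets the support in three
# cosets of masses 1/2, 1/4, 1/4, so the entropy is 1.5 bits
nu = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
assert abs(subspace_entropy(nu, [[1, 1]]) - 1.5) < 1e-9
```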
More important than the entropy itself will be a certain quantity \(\textrm{e}({\mathscr {V}}',{{\textbf{c}}},{\varvec{\mu }})\), assigned to subflags of \({\mathscr {V}}\). We give the relevant definitions now.
Definition 3.4
(Subflags) Suppose that
is a flag. Then another flag
is said to be a subflag of \({\mathscr {V}}\) if \(V'_i \leqslant V_i\) for all i. In this case we write \({\mathscr {V}}' \leqslant {\mathscr {V}}\). It is a proper subflag if it is not equal to \({\mathscr {V}}\).
Definition 3.5
(\(\textrm{e}\)-value) Let \(({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\) be a system, and let \({\mathscr {V}}' \leqslant {\mathscr {V}}\) be a subflag. Then we define the \(\textrm{e}\)-value
Remark
Note that
since condition (e) of Definition 3.2 implies that \({\mathbb {H}}_{\mu _j}(V_j)=0\) for \(1\leqslant j\leqslant r\).
Definition 3.6
(Entropy condition) Let \(({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\) be a system. We say that this system satisfies the entropy condition if
and the strict entropy condition if
In this overview we cannot give a meaningful discussion of exactly why these definitions are the right ones to make. Indeed, it took the authors over a year of working on the problem to arrive at them. Let us merely say that
-
If a random logarithmic set \({\textbf{A}}\cap [D^c, D]\) almost surely admits distinct subsets \(A_1,\ldots , A_k\) satisfying the equal sums condition (3.1), then some associated system \(({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\) satisfies the entropy condition (3.4). For detailed statements and proofs, see Sect. 4.
-
If a system \(({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\) satisfies the strict entropy condition (3.5) then the details of the construction of sets \(A_1,\ldots , A_k\) satisfying the equal sums condition, outlined above, can be made to work. For detailed statements and proofs, see Sects. 5 and 6.
With the above definitions and discussion in place, we are finally ready to introduce the key optimization problem, the study of which will occupy a large part of our paper.
Problem 3.7
(The optimisation problem) Determine the value of \(\gamma _k\), defined to be the supremum of all constants c for which there is a system \(({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\) such that \(c_{r+1}=c\) and the entropy condition (3.4) holds.
Similarly, determine \({\tilde{\gamma }}_k\), defined to be the supremum of all constants c for which there is a system \(({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\) such that \(c_{r+1}=c\) and the strict entropy condition (3.5) holds.
The precise content of the two bullet points above, and the main result of Part II of the paper, is then the following theorem.
Theorem 7
For every \(k\geqslant 2\), we have
Remark 3.1
(a) Presumably \(\gamma _k = \beta _k = {{\tilde{\gamma }}}_k\). Indeed, it is natural to think that any system satisfying (3.4) can be perturbed an arbitrarily small amount to satisfy (3.5). However, we have not been able to show that this is possible in general.
(b) It is not a priori clear that \(\gamma _k\) and \({\tilde{\gamma }}_k\) exist and are positive. This will follow, e.g., from our work on “binary systems” in Part IV of the paper, although there is an easier way to see this using the original Maier–Tenenbaum argument, adapted to our setting; see Appendix C for a sketch of the details.
3.2 Part III: the optimization problem
Part III of the paper is devoted to the study of Problem 3.7 in as much generality as we can manage. Unfortunately we have not yet been able to completely resolve this problem, and indeed numerical experiments suggest that a complete solution, for all k, could be very complicated.
The main achievement of Part III is to provide a solution of sorts when the flag \({\mathscr {V}}\) is fixed, but one is free to choose \({\textbf{c}}\) and \({\varvec{\mu }}\). Write \(\gamma _k({\mathscr {V}})\) (or \({{\tilde{\gamma }}}_k({\mathscr {V}})\)) for the solution to this problem, that is, the supremum of values \(c=c_{r+1}\geqslant 0\) for which a system \(({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\) exists satisfying (3.4) (or (3.5)).
Our solution applies only to rather special flags \({\mathscr {V}}\), but this is unsurprising: for “generic” flags \({\mathscr {V}}\), one would not expect there to be any choice of \({\textbf{c}}\), \({\varvec{\mu }}\), for which \(c_{r+1} > 0\), and so \(\gamma _k({\mathscr {V}})= 0\) in these cases. Such flags are of no interest in this paper.
We begin, in Sect. 7, by solving an even more specific problem in which the entropy condition (3.4) is only required to hold for certain very special subflags \({\mathscr {V}}'\) of \({\mathscr {V}}\), which we call basic flags. These are flags of the form
We call this the restricted entropy condition; to spell it out, this is the condition that
for \(m = 0,1,\ldots , r-1\) (the case \(m = r\) being vacuous).
We write \(\gamma _k^{{\text {res}}}({\mathscr {V}})\) for the maximum value of \(c_{r+1}\) (over all choices of \({\textbf{c}}\) and \({\varvec{\mu }}\) such that \(({\mathscr {V}}, {{\textbf{c}}}, {\varvec{\mu }})\) is a system) subject to this condition. Clearly
The main result of Sect. 7 is Proposition 7.7, which states that under certain conditions we have
for certain parameters \(\rho _1,\ldots , \rho _{r-1}\) depending on the flag \({\mathscr {V}}\).
To define these, one considers the “tree structure” on \(\{0,1\}^k \cap V_r\) induced by the flag \({\mathscr {V}}\): the “cells at level j” are simply intersections with cosets of \(V_j\), and we join a cell C at level j to a “child” cell \(C'\) at level \(j-1\) iff \(C' \subset C\). The \(\rho _i\) are then defined by setting up a certain recursively-defined function on this tree and then solving what we term the \(\rho \)-equations. The details may be found in Sect. 7.2. Proposition 7.7 also describes the measures \({\varvec{\mu }}\) and the parameters \({{\textbf{c}}}\) for which this optimal value is attained.
In Sect. 8, we relate the restricted optimisation problem to the real one, giving fairly general conditions under which we in fact have equality in (3.7), that is to say \(\gamma _k^{{\text {res}}}({\mathscr {V}}) = \gamma _k({\mathscr {V}})\). The basic strategy of this section is to show that for the \({\textbf{c}}\) and \({\varvec{\mu }}\) which are optimal for the restricted optimisation problem, the full entropy condition (3.4) is in fact a consequence of the restricted condition (3.6).
The arguments of this section make heavy use of the submodularity inequality for entropy, using this to drive a kind of “symmetrisation” argument. In this way one can show that an arbitrary \(\textrm{e}({\mathscr {V}}', {{\textbf{c}}}, {\varvec{\mu }})\) is greater than or equal to one in which \({\mathscr {V}}'\) is almost a basic flag; these “semi-basic” flags are then dealt with by hand.
To add an additional layer of complexity, we build a perturbative device into this argument so that our results also apply to \(\tilde{\gamma }_k({\mathscr {V}})\).
3.3 Part IV: binary systems
The final part of the paper is devoted to a discussion of a particular type of flag \({\mathscr {V}}\), the binary flags, and the associated optimal systems \(({\mathscr {V}}, {{\textbf{c}}}, {\varvec{\mu }})\), which we call binary systems.
Definition 3.8
(Binary flag of order r) Let \(k = 2^r\) be a power of two. Identify \({\mathbb {Q}}^k\) with \({\mathbb {Q}}^{{\mathcal {P}}[r]}\) (where \({\mathcal {P}}[r]\) means the power set of \([r] = \{1,\ldots , r\}\)) and define an r-step flag \({\mathscr {V}}\), \(\langle {\textbf{1}} \rangle = V_0 \leqslant V_1 \leqslant \cdots \leqslant V_r = {\mathbb {Q}}^{{\mathcal {P}}[r]}\), as follows: \(V_i\) is the subspace of all \((x_S)_{S \subset [r]}\) for which \(x_S = x_{S \cap [i]}\) for all \(S \subset [r]\).
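To make the indexing in Definition 3.8 concrete, here is a short brute-force enumeration in Python (the function name is ours; the enumeration runs over all of \(\{0,1\}^k\) with \(k = 2^r\), so it is only usable for small r). For \(r = 2\) it confirms that the number of 0-1 points of \(V_i\) is \(2^{2^i}\), consistent with \(\dim V_i = 2^i\) and with equality in the bound \(\# (V \cap \{0,1\}^k) \leqslant 2^{\dim V}\) of Lemma 5.1 below.

```python
from itertools import product

def binary_flag_cube_points(r, i):
    r"""The 0-1 points of V_i in the binary flag of order r.

    Coordinates are indexed by the subsets S of [r] = {1, ..., r};
    V_i consists of the vectors with x_S = x_{S \cap [i]} for all S.
    Brute force over {0,1}^(2^r), so only usable for small r.
    """
    subsets = [frozenset(s for s in range(1, r + 1) if (m >> (s - 1)) & 1)
               for m in range(2 ** r)]
    points = []
    for x in product((0, 1), repeat=len(subsets)):
        vec = dict(zip(subsets, x))
        if all(vec[S] == vec[frozenset(t for t in S if t <= i)]
               for S in subsets):
            points.append(x)
    return points

print([len(binary_flag_cube_points(2, i)) for i in (0, 1, 2)])  # [2, 4, 16]
```

For \(i = 0\) only the two constant vectors survive, as they must, since \(V_0 = \langle {\textbf{1}} \rangle \).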
Whilst the definition is, in hindsight, rather simple and symmetric, it was motivated by extensive numerical experimentation. We believe these flags to be asymptotically optimal for Problem 3.7, though we currently lack a proof.
There are two main tasks in Part IV. First, we must verify that the various conditions necessary for the results of Part III hold for the binary flags. This is accomplished in Sect. 10, the main statements being given in Sect. 9. At the end of Sect. 9 we give the proof (and complete statement) of Theorem 2(a), conditional upon the results of Sect. 10. This is the deepest result in the paper.
Following this we turn to Theorem 2(b). There are two tasks here. First, we prove that the parameters \(\rho _i\) for the binary flags (which do not depend on r) tend to a limit \(\rho \). This is not at all straightforward, and is accomplished in Sect. 11.
After that, in Sect. 12, we describe this limit in terms of certain recurrence relations, which also provide a useful means of calculating it numerically. Theorem 2(b) is established at the very end of the paper.
Most of Part IV could, if desired, be read independently of the rest of the paper.
3.4 Relation to previous work
Previous lower bounds for the a.s. behaviour of \(\Delta \) are contained in two papers of Maier and Tenenbaum [20, 22]. Both of these bounds can be understood within the framework of our paper.
The main result of [20] follows from the fact that
Indeed by Theorem 7 it then follows that \(\beta _2 \geqslant 1 - \frac{1}{\log 3}\), and then from Theorem 3 it follows that for almost every n we have
The exponent appearing here is \(0.28754048957\ldots \) and is exactly the one in [20, Theorem 2].
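As a numerical sanity check, the quoted exponent is recovered from \(\beta _2 = 1 - \frac{1}{\log 3}\) via the conversion \(\log 2/\log (1/\beta _2)\); this identity is our reading of the \(k = 2\) case of Theorem 3, and the snippet below only checks the arithmetic.

```python
import math

# the lower bound (3.9) gives beta_2 >= 1 - 1/log 3
beta_2 = 1 - 1 / math.log(3)
# the conversion from beta_2 to the almost-sure exponent for Delta(n)
exponent = math.log(2) / math.log(1 / beta_2)
print(f"beta_2   = {beta_2:.10f}")     # 0.0897607733...
print(f"exponent = {exponent:.10f}")   # 0.2875404895...
```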
The bound (3.9) is very easy to establish, and a useful exercise in clarifying the notation we have set up. Take \(k = 2\), \(r = 1\) and let \({\mathscr {V}}\) be the flag \(\langle {\textbf{1}}\rangle = V_0 \leqslant V_1 = {\mathbb {Q}}^2\). Let \({{\textbf{c}}}= (c_1, c_2)\) with \(c_1 = 1\) and
Let \(\mu _1\) be the measure which assigns weight \(\frac{1}{3}\) to the points \({\textbf{0}} = (0,0)\), (0, 1) and (1, 0) in \(\{0,1\}^2\) (this being a pullback of the uniform measure on \(\{0,1\}^2 / V_0\)).
There are only two subflags \({\mathscr {V}}'\) of \({\mathscr {V}}\), namely \({\mathscr {V}}\) itself and the basic flag \({\mathscr {V}}'_{{\text {basic}}(0)} : \langle {\textbf{1}}\rangle = V'_0 \leqslant V'_1\) with \(V'_0 = V'_1 = V_0 = \langle {\textbf{1}}\rangle \). The entire content of the strict entropy condition (3.5) is therefore that
which translates to
We have \({\mathbb {H}}_{\mu _1}(V_0) = \log 3\) and \(c_1 = 1\), and so this translates to precisely condition (3.11).
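The computation \({\mathbb {H}}_{\mu _1}(V_0) = \log 3\) can be checked in a couple of lines: the cosets of \(V_0 = \langle (1,1) \rangle \) in \({\mathbb {Q}}^2\) are indexed by the difference of the two coordinates, and \(\mu _1\) induces the uniform distribution on three such cosets (a minimal sketch; variable names are ours).

```python
from math import log

# mu_1 puts mass 1/3 on (0,0), (0,1), (1,0) in {0,1}^2, and
# V_0 = <1> is spanned by (1,1).  Two points x, y lie in the same
# coset of V_0 exactly when x - y is a multiple of (1,1), i.e. when
# x_2 - x_1 = y_2 - y_1, so that difference indexes the cosets.
mu_1 = {(0, 0): 1 / 3, (0, 1): 1 / 3, (1, 0): 1 / 3}

coset_mass = {}
for (x1, x2), m in mu_1.items():
    coset_mass[x2 - x1] = coset_mass.get(x2 - x1, 0.0) + m

# Shannon entropy of the induced coset distribution (Definition 3.3)
H = -sum(m * log(m) for m in coset_mass.values())
print(H, log(3))   # both equal log 3 = 1.0986...
```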
Remark
(a) With very little more effort (appealing to Lemma B.2) one can show that \(\gamma _2 = \beta _2 = {{\tilde{\gamma }}}_2 = 1 - \frac{1}{\log 3}\).
(b) This certainly does not provide a shorter proof of Theorem 3.10 than the one Maier and Tenenbaum gave, since our deductions are reliant on the material in Sects. 5 and 6, which constitute a significant elaboration of the ideas from [20].
The main result of [22] (Theorem 1.4 there) follows from the lower bound
which of course includes (3.9) as the special case \(r = 1\). Applying Theorem 7 and Theorem 3, then letting \(r \rightarrow \infty \), we recover [22, Theorem 1.4] (quoted as Theorem MT in Sect. 1), namely the bound
for almost all n. The exponent here is \(0.33827824168\ldots \).
To explain how (3.12) may be seen within our framework requires a little more setting up. Since it is not directly relevant to our main arguments, we defer this to Appendix C.
Part II. Equal sums and the optimisation problem
4 The upper bound \(\beta _k \leqslant \gamma _k\)
In this section we establish the bound in the title. We recall the definitions of \(\beta _k\) (Problem 1) and \(\gamma _k\) (Problem 3.7). We will in fact show a bit more, that if \(c>\gamma _k\) then
as \(D\rightarrow \infty \).
4.1 Venn diagrams and linear algebra
Let \(0< c < 1\) be some fixed quantity, and let D be a real number, large in terms of c. Suppose that \(A_1,\ldots , A_k \subset [D^c, D]\) are distinct sets. In this section we show that there is a rather natural way to associate a complete system \(({\mathscr {V}},{\textbf{c}},{\varvec{\mu }})\) (in the sense of Definition 3.2) to these sets. This system encodes the “linear algebra of the Venn diagram of the \(A_i\)” in a way that turns out to be extremely useful.
The Venn diagram of the \(A_i\) has \(2^k\) cells, indexed by \(\{0,1\}^k\) in a natural way. Thus for each \(\omega =(\omega _1,\ldots ,\omega _k)\in \{0,1\}^k\), we define
The flag \({\mathscr {V}}\). Set \(\Omega := \{ \omega : B_{\omega } \ne \emptyset \}\). We may put a total order \(\prec \) on \(\Omega \) by writing \(\omega ' \prec \omega \) if and only if \(\max B_{\omega '} < \max B_\omega \). We now select r special vectors \(\omega ^1,\ldots ,\omega ^r \in \Omega \), with \(r\leqslant k-1\), in the following manner. Let \(\omega ^1 = \max _{\prec }(\Omega {\setminus } \{\varvec{0}, \varvec{1} \} )\). Assuming we have chosen \(\omega ^1,\ldots ,\omega ^j\) such that \({\textbf{1}},\omega ^1,\ldots ,\omega ^j\) are linearly independent over \({\mathbb {Q}}\), let \(\omega ^{j+1} = \max ( \Omega {\setminus } {\text {Span}}(\varvec{1}, \omega ^1,\ldots , \omega ^j) )\), as long as such a vector exists.
Let \({\textbf{1}}, \omega ^1,\ldots , \omega ^r\) be the set of vectors produced when this algorithm terminates. By construction, \(\Omega \subset {\text {Span}}({\textbf{1}},\omega ^1,\ldots ,\omega ^r)\), or in other words \(B_\omega =\emptyset \) whenever \(\omega \in \{0,1\}^k{\setminus } {\text {Span}}({\textbf{1}},\omega ^1,\ldots ,\omega ^r)\).
Now define an r-step flag \({\mathscr {V}}: \langle {\textbf{1}}\rangle = V_0< V_1< \cdots < V_r\) by setting \(V_j := {\text {Span}}({\textbf{1}}, \omega ^1,\ldots , \omega ^j)\) for \(1 \leqslant j \leqslant r\).
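The greedy construction of the flag can be phrased in a few lines of code; the sketch below (names ours) works over \({\mathbb {Q}}\) with exact fractions. Note that \({\textbf{0}}\) and \({\textbf{1}}\) are skipped automatically, since both lie in \({\text {Span}}({\textbf{1}})\), and that ties in the ordering \(\prec \) cannot occur, because the cells are disjoint and so their maxima are distinct.

```python
from fractions import Fraction

def in_span(v, basis):
    """Decide whether v lies in the Q-span of `basis`, by exact
    Gaussian elimination with fractions."""
    pivots = []                          # pairs (reduced row, pivot column)
    for row in basis:
        row = list(map(Fraction, row))
        for prow, p in pivots:
            if row[p]:
                f = row[p] / prow[p]
                row = [a - f * b for a, b in zip(row, prow)]
        for p, a in enumerate(row):
            if a:
                pivots.append((row, p))
                break
    v = list(map(Fraction, v))
    for prow, p in pivots:
        if v[p]:
            f = v[p] / prow[p]
            v = [a - f * b for a, b in zip(v, prow)]
    return not any(v)

def select_omegas(cells):
    """Greedy selection of omega^1, ..., omega^r as in Sect. 4.1:
    run through the nonempty cells in decreasing order of max(B_omega),
    keeping each label that is linearly independent of the all-ones
    vector and of the labels already kept.  `cells` maps a 0-1 tuple
    omega to the nonempty set B_omega."""
    k = len(next(iter(cells)))
    chosen, basis = [], [(1,) * k]
    for w in sorted(cells, key=lambda w: max(cells[w]), reverse=True):
        if not in_span(w, basis):
            basis.append(w)
            chosen.append(w)
    return chosen

cells = {(1, 1, 1): {200}, (1, 0, 0): {100}, (0, 1, 0): {50}, (1, 1, 0): {30}}
print(select_omegas(cells))   # [(1, 0, 0), (0, 1, 0)]
```

In the toy example, \((1,1,1)\) is skipped (it lies in \(\langle {\textbf{1}} \rangle \)) and \((1,1,0)\) is skipped because it is the sum of the two labels already chosen, leaving \(r = 2\).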
The parameters \({\textbf{c}}\). Now we construct the parameters \({\textbf{c}} : 1 \geqslant c_1 \geqslant c_2 \geqslant \cdots \geqslant c_{r+1}\). For \(j = 1,\ldots , r\), we define
Thus
for \(j = 1,\ldots , r\). Also set \(c_{r+1}=c\). (The ceiling function \(\lceil \cdot \rceil \) produces a “coarse” or discretised set of possible thresholds \(c_i\), suitable for use in a union bound later on; see Lemma 4.2 below. The offset of \(-\log D\) is to ensure that \(c_1 \leqslant 1\).)
The measures \({\varvec{\mu }}\). Set
Define
with the convention that if the denominator vanishes, then \(\mu _j(\omega ) = 1_{\omega = {\textbf{0}}}\).
Remark
It is important that we use the \(B'_{\omega }\) here, rather than the \(B_{\omega }\), for technical reasons that will become apparent in the proof of Proposition 4.4 below.
Lemma 4.1
\(({\mathscr {V}},{\textbf{c}},{\varvec{\mu }})\) is a complete system (in the sense of Definition 3.2).
Proof
We need to check that \({\text {Supp}}(\mu _j) \subset V_j\) for \(j = 1,\ldots , r\). By definition, if \(\mu _j(\omega ) > 0\) then \(B_{\omega } \cap (D^{c_{j+1}}, D] \ne \emptyset \). This implies that \(\max B_{\omega } > D^{c_{j+1}}\). On the other hand, (4.4) implies that \(D^{c_{j+1}} \geqslant \max B_{\omega ^{j+1}}\), and thus \(\max B_{\omega } > \max B_{\omega ^{j+1}}\). By the construction of the vectors \(\omega ^i\), we must have \(\omega \in {\text {Span}}({\textbf{1}}, \omega ^1,\ldots , \omega ^j) = V_j\).
We also need to check that \({\mathscr {V}}\) is nondegenerate, also in the sense of Definition 3.2, that is to say \(V_r\) is not contained in any hyperplane \(\{\omega \in {\mathbb {Q}}^k : \omega _i = \omega _j\}\). This follows immediately from the fact that the \(A_i\) are distinct: if \(A_i \ne A_j\), then some element of \(A_i \cup A_j\) lies in exactly one of the two sets, and so there is certainly some \(\omega \) with \(\omega _i \ne \omega _j\) and \(B_\omega \ne \emptyset \). \(\square \)
Note that, in addition to the system \(({\mathscr {V}},{\textbf{c}},{\varvec{\mu }})\), the procedure described above outputs a sequence \(\omega ^1,\ldots , \omega ^r\) of elements of \(\{0,1\}^k\). We call the ensemble consisting of the system and the \(\omega ^i\) the linear data associated to \(A_1,\ldots , A_k\). We will only consider the event \({\textbf{A}}\in {\mathcal {E}}\), where
By Lemma A.5, \({\mathbb {P}}({\textbf{A}}\in {\mathcal {E}})=1-o(1)\) as \(D\rightarrow \infty \). In particular, if \(A \in {\mathcal {E}}\), we have \(|A \cap [D^c,D]| \leqslant 2\log D\) for large enough D.
Lemma 4.2
Fix \(k\in {\mathbb {Z}}_{\geqslant 2}\) and suppose that \(A \in {\mathcal {E}}\). The number of different ensembles of linear data arising from distinct sets \(A_1,\ldots , A_k \subset A\) is \(\ll (\log D)^{O(1)}\).
Proof
The number of choices for \(\omega ^1,\ldots , \omega ^r\) is O(1), and hence the number of \({\mathscr {V}}\) is also \(O_k(1)\). The thresholds \(c_j\) are drawn from a fixed set of size \(\log D\), and the numerators and denominators of the \(\mu _j(\omega )\) are all integers \(\leqslant 2\log D\). \(\quad \square \)
Remark 4.1
The O(1) and the \(\ll \) here both depend on k. However we regard k as fixed here and do not indicate this dependence explicitly. If one is more careful then one can obtain results that are effective up to about \(k \sim \log \log D\).
4.2 A local-to-global estimate
Our next step towards establishing the bound \(\beta _k \leqslant \gamma _k\) is to pass from the “local” event that a random logarithmic set \({\textbf{A}}\) possesses a k-tuple of equal subsums \((\sum _{a\in A_1}a,\ldots ,\sum _{a\in A_k}a)\) to the “global” distribution of such subsums (with the subtlety that we must mod out by \({\textbf{1}}\)). The latter is controlled by the set \({\mathscr {L}}_{{\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }}}({\textbf{A}})\) defined below.
Definition 4.3
Given a set of integers A and a system \(({\mathscr {V}},{\textbf{c}},{\varvec{\mu }})\), we write \( {\mathscr {L}}_{{\mathscr {V}},{\textbf{c}},{\varvec{\mu }}}(A)\) for the set of vectors
where \((B_\omega )_{\omega \in \{0,1\}^k}\) runs over all partitions of A such that
Proposition 4.4
Fix an integer \(k\geqslant 2\) and a parameter \(0<c<1\). Let D be large in terms of c and k, and let \({\textbf{A}}\subset [D^c,D]\) be a logarithmic random set. Let
Then we have
Here, the supremum is over all complete systems \(({\mathscr {V}},{\textbf{c}},{\varvec{\mu }})\) with \(c_{r+1}=c\).
Proof
Recall the definition of the set \({\mathcal {E}}\), given in Eq. (4.7). We have
where, given linear data \(\{({\mathscr {V}},{\textbf{c}},{\varvec{\mu }}), \omega ^1,\ldots , \omega ^r\}\), we write \({\mathscr {S}}({\mathscr {V}},{\textbf{c}},{\varvec{\mu }},(\omega ^i))\) to denote the set of all \(A\in {\mathcal {E}}\) that have \(k\) distinct subsets \((A_1,\ldots ,A_k)\) with equal sums-of-elements and associated linear data \(\{({\mathscr {V}},{\textbf{c}},{\varvec{\mu }}),\omega ^1,\ldots , \omega ^r\}\). (The set \({\textbf{A}}\) appearing in (4.10) will be constructed below by removing certain elements from the logarithmic set \({\textbf{A}}\) we started with; this new set belongs to \({\widetilde{{\mathcal {E}}}}\), but not necessarily to \({\mathcal {E}}\).)
Let us fix a choice of linear data \(\{({\mathscr {V}},{\textbf{c}},{\varvec{\mu }}), \omega ^1,\ldots , \omega ^r\}\), and let us write \({\mathscr {S}}\) as an abbreviation for the set \({\mathscr {S}}({\mathscr {V}},{\textbf{c}},{\varvec{\mu }},(\omega ^i))\). An elementary probability calculation gives
For each \(A \in {\mathscr {S}}\), fix a choice of \((A_1,\ldots ,A_k)\) with equal sums and such that the linear data associated to \((A_1,\ldots , A_k)\) is \(\{ ({\mathscr {V}}, {\textbf{c}}, {\varvec{\mu }}), \omega ^1,\ldots , \omega ^r\}\). Let \(B_\omega \) be the cells of the Venn diagram corresponding to the \(A_i\), as in (4.2), and then define the \(B_\omega '\) as in (4.5). Recall that (4.6) holds, and define \(K_j = \max B_{\omega ^j}\) for \(1 \leqslant j\leqslant r\). In particular, \(K_1> \cdots > K_r\). Let \(A' = A {\setminus } \{K_1,\ldots ,K_r\}\). Note that \(A'\in {\tilde{{\mathcal {E}}}}\) if D is large enough in terms of k. Moreover, we have
Therefore, the equal sums condition is equivalent to
and hence
Since \({\textbf{1}}, \omega ^1,\ldots , \omega ^r\) are linearly independent, the value of the right-hand side of (4.12) uniquely determines the numbers \(K_j\), which themselves uniquely determine A in terms of the sets \(B_\omega '\). Therefore, given \(A' \in {\tilde{{\mathcal {E}}}}\), the number of possible sets A is, by Definition 4.3, at most \(|{\mathscr {L}}_{{\mathscr {V}},{\textbf{c}},{\varvec{\mu }}}(A')|\). Moreover by (4.4) we have \(K_j > \frac{1}{e} D^{c_j}\) for every j, and therefore
We sum over \(A'\), and reinterpret the product on the right-hand side of (4.13) in terms of \({\mathbb {P}}({\textbf{A}}=A')\). This gives
By Lemma 4.2 there are \((\log D)^{O(1)}\) possible choices for the linear data \(\{({\mathscr {V}},{\textbf{c}},{\varvec{\mu }}), \omega ^1,\ldots , \omega ^r\}\), and the proof is complete. \(\square \)
4.3 Upper bounds in terms of entropies
Having established Proposition 4.4, we turn to the study of the sets \({\mathscr {L}}_{{\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }}}(A)\). We will bound their cardinality in terms of the quantities \(\textrm{e}({\mathscr {V}}',{\textbf{c}},\mathbf {\mu })\) from Definition 3.2 with \({\mathscr {V}}'\) a subflag of \({\mathscr {V}}\).
Lemma 4.5
Let \(({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\) be a system and let \(A\in {\tilde{{\mathcal {E}}}}\), where \({\tilde{{\mathcal {E}}}}\) is defined in (4.9). Then, for any subflag \({\mathscr {V}}'\) of \({\mathscr {V}}\),
Remark
The implied constant in the \(\ll _{{\mathscr {V}}'}\) could be made explicit if desired (in terms of the quantitative rationality of a basis for the spaces in \({\mathscr {V}}'\)) but we have no need to do this.
Proof of Lemma 4.5
Given a set \(X\subset [D^c,D]\), write \(X^{(j)}:=X\cap (D^{c_{j+1}},D^{c_j}]\) for \(j = 1,\ldots , r\). Throughout the proof, we will assume that A is a set of integers and that \((B_{\omega })_{\omega \in \{0,1\}^k}\) runs over all partitions of A such that (4.8) is satisfied. In our new notation, this may be rewritten as
For each j, \(1 \leqslant j \leqslant r\), fix a linear projection \(P_j : V_j \rightarrow V'_j\), and set \(Q_j := {\text {id}}_{V_j} - P_j\), so that \(Q_j\) maps \(V_j\) to itself. Set
and
Since
it follows immediately from the definition of \({\mathscr {L}}_{{\mathscr {V}},{\textbf{c}},{\varvec{\mu }}}(A)\) (Definition 4.3) that
We claim that
and that
These bounds, substituted into (4.16), immediately imply Lemma 4.5.
It remains to establish (4.17) and (4.18), which are proven in quite different ways. We begin with (4.18), which is a “combinatorial” bound, in that there cannot be too many choices for the data making up the sums in \({\mathscr {L}}^Q(A)\). For this, observe that \(Q_j\) vanishes on \(V'_j\) and hence is constant on cosets of \(V'_j\). Therefore the elements of \({\mathscr {L}}^Q(A)\) are determined by the sets \(\bigcup _{\omega \in v_j + V'_j} B^{(j)}_{\omega }\), over all \(v_j \in V_j/V'_j\) and \(1 \leqslant j \leqslant r\). By (4.15),
and by Lemma B.1 the number of ways of partitioning \(A^{(j)}\) into sets of these sizes is bounded above by \(e^{{\mathbb {H}}({\textbf{p}}^{(j)}) |A^{(j)}|}\), where \({\textbf{p}}^{(j)} = (\mu _j(v_j + V'_j))_{v_j \in V_j/V'_j}\). By Definition 3.3, \({\mathbb {H}}({\textbf{p}}^{(j)}) = {\mathbb {H}}_{\mu _j}(V'_j)\). Taking the product over \(j = 1,\ldots , r\) gives
From the assumption that \(A\in {\widetilde{{\mathcal {E}}}}\), where \({\widetilde{{\mathcal {E}}}}\) is defined in (4.9), we have
Using this, and the trivial bound \({\mathbb {H}}_{\mu _j}(V'_j) \leqslant \log |{\text {Supp}}(\mu _j)| \leqslant \log (2^k)\), (4.18) follows.
Now we prove (4.17), which is a “metric” bound, the point being that none of the sums in \({\mathscr {L}}^P(A)\) can be too large in an appropriate sense. Pick a basis for \({\mathbb {Q}}^k\) adapted to \({\mathscr {V}}'\): that is, a basis \(e_1,\ldots , e_k\) such that \(V'_j = {\text {Span}}(e_1,\ldots , e_{\dim V'_j})\) for each j, and \(e_1 = {\textbf{1}}\). There are positive integers \(M,N = O_{{\mathscr {V}}',{\mathscr {V}}}(1)\) such that, in this basis, the \(e_i\)-coordinates of \(P_j(\omega )\) are all rationals with denominator M and absolute value at most N.
Now for fixed j and \(\omega \), if D is large then \( \sum _{a \in B_{\omega }^{(j)}} a \leqslant D^{c_j} \log D, \) since \(B_{\omega }^{(j)} \subset (D^{c_{j+1}}, D^{c_j}]\) and by the assumption that \(A\in {\widetilde{{\mathcal {E}}}}\). Thus
and so the expression \( \sum \limits _{j=1}^r \sum \limits _{{\omega \in \{0,1\}^k \cap V_j}} P_j(\omega ) \sum _{a \in B_{\omega }^{(j)}} a \) belongs to the set
We must bound the number of different values that the expression \(\sum _{i=1}^k x_ie_i\) can take mod \({\textbf{1}}\) when the coefficients \(x_1,\ldots ,x_k\) are as above. Since \(e_1 = {\textbf{1}}\) and \(x_1 M\in {\mathbb {Z}}\), given \(x_2,\ldots ,x_k\) there are at most M possibilities for \(x_1\) mod \({\textbf{1}}\). In addition, there are
possibilities for \(x_2,\ldots ,x_k\), thereby concluding the proof of (4.17) and hence of Lemma 4.5. \(\square \)
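The combinatorial bound (4.18) above rests, via Lemma B.1, on the standard fact that a set of size \(n\) admits at most \(e^{n{\mathbb {H}}({\textbf{p}})}\) partitions into labelled parts of sizes \(p_i n\). The following snippet (our formulation of the inequality, not a restatement of Lemma B.1) checks this numerically in a small case.

```python
from math import comb, exp, log

def multinomial(sizes):
    """Number of partitions of a set of size sum(sizes) into labelled
    parts of the prescribed sizes (a multinomial coefficient)."""
    n, total = sum(sizes), 1
    for s in sizes:
        total *= comb(n, s)
        n -= s
    return total

def entropy_bound(sizes):
    """The bound exp(n * H(p)) with p_i = sizes_i / n."""
    n = sum(sizes)
    return exp(-n * sum(s / n * log(s / n) for s in sizes if s))

# 10! / (5! 3! 2!) = 2520, comfortably below the entropy bound
print(multinomial((5, 3, 2)), entropy_bound((5, 3, 2)))
```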
A potential problem with applying Lemma 4.5 is that there may be infinitely many subflags \({\mathscr {V}}'\) to consider, and the constant implied by the \(\ll \)-symbol depends on \({\mathscr {V}}'\). As we shall see in the next lemma, however, we may reduce the problem to the consideration of finitely many subflags, a reduction which will be used in several parts of this paper.
Lemma 4.6
For a given k, the set of all flags
may be partitioned into \(O_k(1)\) equivalence classes such that any two flags \(\mathscr {V}',\mathscr {V}''\) in the same equivalence class satisfy \(\dim V'_j = \dim V''_j\) for all j, and for any thresholds \({{\textbf{c}}}\) satisfying \(c_1\geqslant c_2 \geqslant \cdots \geqslant c_{r+1}\) and probability measures \({\varvec{\mu }}\) supported on \(\{0,1\}^k\), we have \({\mathbb {H}}_{\mu _j}(V'_j) = {\mathbb {H}}_{\mu _j}(V''_j)\) for all j and \(\textrm{e}({\mathscr {V}}',{\textbf{c}}, {\varvec{\mu }}) = \textrm{e}({\mathscr {V}}'',{\textbf{c}}, {\varvec{\mu }})\).
Proof
We say that two subflags \({\mathscr {V}}',{\mathscr {V}}''\) are equivalent if \(V'_j, V''_j\) have the same intersection with \(\{0,1\}^k\) and \(\dim V'_j = \dim V''_j\), for all \(j = 1,\ldots , r\). There are clearly only \(O_k(1)\) equivalence classes, and the desired properties hold for members of the same equivalence class by the definition of \({\mathbb {H}}_{\mu _j}(V'_j)\) and \(\textrm{e}(\mathscr {V}',{{\textbf{c}}},{\varvec{\mu }})\). \(\square \)
Armed with Lemma 4.6, we immediately obtain from Lemma 4.5, applied to one representative from each class, the following corollary.
Corollary 4.7
Let \(({\mathscr {V}},{\textbf{c}},{\varvec{\mu }})\) be a system and suppose that \(A\in {\tilde{{\mathcal {E}}}}\). Then
4.4 The upper bound in Theorem 7
We can now establish the upper bound in Theorem 7, that is to say the inequality \(\beta _k \leqslant \gamma _k\).
We start by applying Proposition 4.4. Together with Lemma A.5, it implies that
Here, the supremum is over complete systems \(({\mathscr {V}},{\textbf{c}},{\varvec{\mu }})\) with \(c_{r+1} = c\), and we made the observation that for such systems we have
an immediate consequence of the definition of \(\textrm{e}({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\) and the fact that \({\mathbb {H}}_{\mu _j}(V_j) = 0\) for all j and that \(\dim V_j = j+1\). Thus we may apply Corollary 4.7, concluding that
where
the supremum is over all complete systems \(({\mathscr {V}},{\textbf{c}},{\varvec{\mu }})\) with \(c_{r+1} = c\), and the minimum is over all subflags \(\mathscr {V}' \leqslant {\mathscr {V}}\). Note that the minimum exists by Lemma 4.6, since we may restrict attention to a finite set of subflags \(\mathscr {V}'\). Moreover, the supremum is realised, meaning there is a system \(({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\) for which the right side of (4.19) equals \(\theta \). Indeed, there are O(1) choices for \({\mathscr {V}}\), and with \({\mathscr {V}}\) fixed the quantities \({\textbf{c}},{\varvec{\mu }}\) range over compact subsets of Euclidean space, with the right side of (4.19) continuous in these variables.
Now, if we assume that \(c>\gamma _k\), then the definition of \(\gamma _k\) in Problem 3.7 implies that there is no system \(({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\) with \(c_{r+1}=c\) and that satisfies the entropy condition (3.4). Equivalently, if \(c_{r+1}=c\), then \(\min _{{\mathscr {V}}'\leqslant {\mathscr {V}}} \big (\textrm{e}({\mathscr {V}}', {\textbf{c}}, {\varvec{\mu }}) - \textrm{e}({\mathscr {V}},{\textbf{c}},{\varvec{\mu }}))\big )<0\). In particular, we have \(\theta <0\). We have thus established (4.1), as required.
Remark
In the above proof, \(({\mathscr {V}},{\textbf{c}},{\varvec{\mu }})\) is a complete system. However, for other aspects of our problem it is not natural to focus on the completeness condition, for which reason we omit it from the definition of \(\gamma _k\).
5 The lower bound \(\beta _k \geqslant {{\tilde{\gamma }}}_k\)
5.1 Introduction and simple reductions
The aim of this section and the next is to establish the lower bound \(\beta _k \geqslant {{\tilde{\gamma }}}_k\). We begin, in Lemma 5.3 below, by showing that we may restrict our attention to certain systems satisfying some additional regularity conditions.
We isolate from the proof a “folklore” lemma for which it is not easy to find a good reference. The authors thank Carla Groenland for a helpful conversation on this topic.
Lemma 5.1
Let V be a subspace of \({\mathbb {Q}}^k\). Then \(\# (V \cap \{0,1\}^k) \leqslant 2^{\dim V}\).
Proof
We outline two quite different short proofs. Let \(d := \dim V\).
Proof 1. We claim that there is a projection from \({\mathbb {Q}}^k\) onto some set of d coordinates which is injective on V. From this, the result is obvious, since the image of \(\{0,1\}^k\) under any such projection has size \(2^d\). To prove the claim, let \(e_1,\ldots ,e_n\) denote the standard basis of \({\mathbb {Q}}^n\). Note that if \(W \leqslant {\mathbb {Q}}^n\) and if none of the quotient maps \({\mathbb {Q}}^n \rightarrow {\mathbb {Q}}^n/\langle e_i\rangle \) is injective on W, then W must contain a multiple of each \(e_i\), and therefore \(W = {\mathbb {Q}}^n\). Thus if W is a proper subspace of \({\mathbb {Q}}^n\) then there is a projection onto some set of \((n-1)\) coordinates which is injective on W. Repeated use of this fact establishes the claim.
Proof 2. Suppose that \(V \cap \{0,1\}^k\) contains \(2^d + 1\) points. These points remain distinct under the natural ring homomorphism \(\pi : {\mathbb {Z}}^k \rightarrow {\mathbb {F}}_2^k\), and so their images cannot lie in a subspace (over \({\mathbb {F}}_2\)) of dimension d. Hence there are \(v_1,\ldots , v_{d+1} \in V\) such that \(\pi (v_1),\ldots , \pi (v_{d+1})\), are linearly independent over \({\mathbb {F}}_2\). The \((d +1) \times k\) matrix formed by these \(\pi (v_i)\) therefore has a \((d+1) \times (d+1)\)-subminor which is nonzero in \({\mathbb {F}}_2\). The corresponding subminor of the matrix formed by the \(v_i\) is therefore an odd integer, and in particular not zero. This means that \(v_1,\ldots , v_{d+1}\) are linearly independent over \({\mathbb {Q}}\), contrary to the assumption that \(\dim (V)=d\). \(\square \)
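Lemma 5.1 is easily tested by brute force in small dimensions. The sketch below (names ours) decides membership \(x \in V\) by checking that appending \(x\) to a spanning set does not increase the rank, computed exactly over \({\mathbb {Q}}\); for \(V = {\text {Span}}((1,1,1,1),(1,1,0,0))\) the bound is attained with equality.

```python
from fractions import Fraction
from itertools import product

def rank_Q(rows):
    """Rank over Q, by exact Gaussian elimination with Fractions."""
    rows = [list(map(Fraction, r)) for r in rows if any(r)]
    rk = 0
    for c in range(len(rows[0]) if rows else 0):
        piv = next((i for i in range(rk, len(rows)) if rows[i][c]), None)
        if piv is None:
            continue
        rows[rk], rows[piv] = rows[piv], rows[rk]
        for i in range(len(rows)):
            if i != rk and rows[i][c]:
                f = rows[i][c] / rows[rk][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[rk])]
        rk += 1
    return rk

def count_cube_points(k, basis):
    """Return (#(V ∩ {0,1}^k), dim V) for V = span(basis): a point x
    lies in V exactly when appending it does not increase the rank."""
    d = rank_Q(basis)
    return sum(1 for x in product((0, 1), repeat=k)
               if rank_Q(basis + [list(x)]) == d), d

count, d = count_cube_points(4, [[1, 1, 1, 1], [1, 1, 0, 0]])
assert count == 4 and d == 2 and count <= 2 ** d
```

Here the four 0-1 points are \(0\), \((1,1,1,1)\), \((1,1,0,0)\) and \((0,0,1,1)\), the last arising from the rational combination \((1,1,1,1) - (1,1,0,0)\).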
We now record an immediate corollary of Lemma 4.6, which provides a “gap condition” on the \(\textrm{e}\)-quantities.
Lemma 5.2
If the system \(({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\) satisfies (3.5) then there is an \(\varepsilon >0\) such that for all proper subflags \({\mathscr {V}}'\),
For future reference, the next two lemmas record more information about optimal systems for \({\tilde{\gamma }}_k\) and for \(\gamma _k\), respectively.
Lemma 5.3
Let \(k\in {\mathbb {Z}}_{\geqslant 2}\). We have that \({\tilde{\gamma }}_k\) is the supremum of all \(c>0\) for which there is a system \(({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\) such that \(c_{r+1}=c\), (3.5) holds and we further have:
-
(a)
\(1=c_1> c_2> \cdots >c_{r+1}=c\);
-
(b)
\({\mathbb {H}}_{\mu _j}(V_{j-1}) > \dim (V_j/V_{j-1})\) for \(1\leqslant j\leqslant r-1\) and
$$\begin{aligned} {\mathbb {H}}_{\mu _r}(V_{r-1}) > \frac{c_r}{c_r-c_{r+1}} \dim (V_r/V_{r-1}); \end{aligned}$$ -
(c)
\(\dim (V_1/V_0)=1\);
-
(d)
\({\text {Supp}}(\mu _j) = V_j \cap \{0,1\}^k\) for \(j=1,2,\ldots ,r\);
-
(e)
for all j and \(\omega \), \(\mu _j(\omega )=\mu _j({\textbf{1}}-\omega )\).
Proof
First of all, we show that we may assume that \(c>0\) and that statement (d) holds. Indeed, if a system \(({\mathscr {V}},{\varvec{\mu }},{{\textbf{c}}})\) satisfies (3.5), then Lemma 5.2 implies that (5.1) holds for some \(\varepsilon >0\). As the difference between the left and right sides of (5.1) is continuous in the quantities \(c_j\) and \(\mu _j(\omega )\), we may increase \(c_{r+1}\) (and possibly some of the other \(c_j\)’s) a tiny bit and we may also adjust the measures \(\mu _j\) by a small amount, so that \(c_{r+1}>0\), statement (d) holds, and we also have that
for every proper subflag \(\mathscr {V}'\).
Next, we show that we may take \(c_1=1\). Indeed, condition (3.5) implies that \(\textrm{e}({\mathscr {V}}',{{\textbf{c}}},{\varvec{\mu }})\geqslant \textrm{e}({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\geqslant 0\) for all \({\mathscr {V}}'\leqslant {\mathscr {V}}\) by (3.3). Now if \(c_1<1\) and \({\tilde{c}}_j=c_j/c_1\) for each j, then the rescaled system \(({\mathscr {V}},{\tilde{{{\textbf{c}}}}},{\varvec{\mu }})\) has a larger value of \(c_{r+1}\), and moreover also satisfies (3.5), since for any subflag \(\mathscr {V}'\) we have
Next, consider a system \(({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\) satisfying \(c_1=1\) and \(c_{r+1}=c>0\), and consider the subflag \({\mathscr {V}}' : \langle {\textbf{1}}\rangle = V'_0 \leqslant V_1'\leqslant \cdots \leqslant V_r'\), where \(V_i' = V_i\) for \(i\ne j\), and \(V_{j}'=V_{j-1}\); that is, \({\mathscr {V}}'\) has two consecutive copies of \(V_{j-1}\). By assumption (Definition 3.2), we have \(V_{j-1}\ne V_j\), and thus \({\mathscr {V}}'\) is a proper subflag of \({\mathscr {V}}\). Thus
Since the left-hand side is positive, we conclude that (a) and (b) hold.
(c) Let \(d=\dim (V_1/V_0)\). By Lemma 5.1, \(|V_1 \cap \{0,1\}^k| \leqslant 2^{\dim V_1} = 2^{d+1}\) and hence \(\mu _1\) is supported on at most \(2^{d+1}-1\) cosets of \(V_0\) (since \({\textbf{1}} \in V_0\), the points \({\textbf{0}}\) and \({\textbf{1}}\) lie in the same coset). In particular, by Lemma B.2, \({\mathbb {H}}_{\mu _1}(V_0) \leqslant \log (2^{d+1}-1)\). On the other hand, \({\mathbb {H}}_{\mu _1}(V_0) > d\) by statement (b). We must thus have \(d = 1\), which is exactly statement (c).
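The dimension count in the proof of (c) can be verified numerically. Assuming, as the comparison of \({\mathbb {H}}_{\mu _1}(V_0)\) with \(\dim (V_1/V_0)\) suggests, that entropy is measured in natural-log units, the two bounds \(d < {\mathbb {H}}_{\mu _1}(V_0) \leqslant \log (2^{d+1}-1)\) are compatible only for \(d=1\) (illustrative check, not from the paper):

```python
import math

# statement (b) forces H_{mu_1}(V_0) > d, while the coset count gives
# H <= log(2^{d+1} - 1); both can hold only when d < log(2^{d+1} - 1)
feasible = [d for d in range(1, 50) if d < math.log(2 ** (d + 1) - 1)]
print(feasible)  # only d = 1 survives
```

Indeed \(\log 3 \approx 1.0986 > 1\), while already \(\log 7 \approx 1.9459 < 2\), and for \(d \geqslant 3\) we have \(\log (2^{d+1}-1) < (d+1)\log 2 < d\).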
(e) Assume the system \(({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\) satisfies (3.5) and (a). For every j and \(\omega \in V_j\), we define
We then consider the system \(({\mathscr {V}},{{\textbf{c}}},{\tilde{{\varvec{\mu }}}})\), and must show that it also satisfies (3.5). For this, it is enough to show that
for all j. Indeed, we then have, for every proper subflag \(\mathscr {V}'\),
To prove (5.2), write
where the sum is over all cosets C of \(V'_j\) and \(L(t) = -t \log t\). Thus, since \(-C\) runs over all cosets as C does, we have
By the concavity of L, we have
Claim (5.2) then readily follows. \(\square \)
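The symmetrization step in the proof of (e) rests on the concavity of \(L(t) = -t\log t\): averaging \(\mu _j\) with its reflection \(\omega \mapsto {\textbf{1}}-\omega \) cannot decrease entropy. The sketch below (illustrative, not from the paper; it uses full Shannon entropy on points in place of the conditional coset entropy \({\mathbb {H}}_{\mu _j}(V'_j)\), an assumption made for simplicity) checks this on random measures on \(\{0,1\}^3\):

```python
import math
import random

def L(t):
    # L(t) = -t log t, with L(0) = 0; concave on [0, 1]
    return 0.0 if t == 0 else -t * math.log(t)

def entropy(mu):
    return sum(L(p) for p in mu.values())

def symmetrize(mu):
    # mu-tilde(omega) = (mu(omega) + mu(1 - omega)) / 2
    flip = lambda w: tuple(1 - x for x in w)
    keys = set(mu) | {flip(w) for w in mu}
    return {w: (mu.get(w, 0.0) + mu.get(flip(w), 0.0)) / 2 for w in keys}

random.seed(1)
for _ in range(200):
    support = [tuple(random.randint(0, 1) for _ in range(3)) for _ in range(4)]
    weights = [random.random() for _ in support]
    total = sum(weights)
    mu = {}
    for w, x in zip(support, weights):
        mu[w] = mu.get(w, 0.0) + x / total
    # concavity of L => symmetrization cannot decrease the entropy
    assert entropy(symmetrize(mu)) >= entropy(mu) - 1e-12
```

The inequality holds because \(\omega \mapsto {\textbf{1}}-\omega \) is a bijection, so \(H(\mu \circ \text {flip}) = H(\mu )\) and concavity gives \(H({\tilde{\mu }}) \geqslant \tfrac12 H(\mu ) + \tfrac12 H(\mu \circ \text {flip})\).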
Lemma 5.4
Let \(k\in {\mathbb {Z}}_{\geqslant 2}\) be such that \(\gamma _k>0\). Then we have that \(\gamma _k\) is the supremum of all \(c>0\) for which there is a system \(({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\) such that \(c_{r+1}=c\), (3.4) holds and we further have:
-
(a)
\(1=c_1> c_2> \cdots >c_{r+1}=c\);
-
(b)
\({\mathbb {H}}_{\mu _j}(V_{j-1}) \geqslant \dim (V_j/V_{j-1})\) for \(1\leqslant j\leqslant r-1\) and
$$\begin{aligned} {\mathbb {H}}_{\mu _r}(V_{r-1}) \geqslant \frac{c_r}{c_r-c_{r+1}} \dim (V_r/V_{r-1}) ; \end{aligned}$$ -
(c)
\(\dim (V_1/V_0)=1\);
-
(d)
\(\bigcup _{i=1}^j {\text {Supp}}\mu _i\) spans \(V_j\) for \(j=1,2,\ldots ,r\);
-
(e)
for all j and \(\omega \), \(\mu _j(\omega )=\mu _j({\textbf{1}}-\omega )\).
Remark
As we will see in Part IV, we always have \(\gamma _k>0\).
Proof
The proof that we may take \(c_1=1\) is the same as in Lemma 5.3.
Next, consider a system \(({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\) satisfying \(c_1=1\) and \(c_{r+1}=c>0\), and consider the subflag \({\mathscr {V}}' : \langle {\textbf{1}}\rangle = V'_0 \leqslant V_1'\leqslant \cdots \leqslant V_r'\), where \(V_i' = V_i\) for \(i\leqslant r-1\), and \(V_r'=V_{r-1}\). Thus
Since the left-hand side is \(\geqslant 0\) and we have assumed that \(c_{r+1}=c>0\) and that \(V_{r-1}\ne V_r\), the latter being true from Definition 3.2, we conclude that
This proves part of statements (a) and (b). We shall now prove them fully.
(a) There are always indices \(1=i_1<i_2<\cdots<i_s<i_{s+1}=r+1\) such that
Crucially, note that \(i_{s+1}=r+1\) because \(c_r>c_{r+1}\) by (5.3). Next, we define the system \(({\mathscr {W}},\varvec{\nu },{\textbf{d}})\), where \({\mathscr {W}}\) is an s-step flag and, for all \(j\in \{1,\ldots ,s\}\), we have
In particular, \(W_s=V_{i_{s+1}-1}=V_r\) because \(i_{s+1}=r+1\), and thus \({\mathscr {W}}\) is a non-degenerate flag system as per Definition 3.2 (b). Clearly, \(1=d_1>d_2>\cdots>d_s>d_{s+1}=c\), so in order to prove part (a), all that remains to show is that the system \(({\mathscr {W}},\varvec{\nu },{\textbf{d}})\) satisfies the entropy condition (3.4). This follows by a simple computation. Indeed, let \({\mathscr {W}}'\) be a subflag of \({\mathscr {W}}\). We then define \({\mathscr {V}}'\leqslant {\mathscr {V}}\) by letting \(V_m'=W_j\) whenever \(i_j\leqslant m<i_{j+1}\). Hence,
Consequently, since the system \(({\mathscr {V}},{\varvec{\mu }},{{\textbf{c}}})\) satisfies condition (3.4), so does \(({\mathscr {W}},\varvec{\nu },{\textbf{d}})\). This proves that we may always assume condition (a).
(b) Consider a system \(({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\) satisfying (a). We then argue as in Lemma 5.3, by considering the subflag \({\mathscr {V}}'\) with \(V_i' = V_i\) for \(i\ne j\), and \(V_{j}'=V_{j-1}\). We then have
Since the left-hand side is \(\geqslant 0\) and \(c_j-c_{j+1}>0\) for all \(j=1,\ldots ,r\), statement (b) follows.
(c) Assuming statement (b), we may prove statement (c) by arguing as in Lemma 5.3.
(d) Suppose that (a) holds. Consider the flag \({\mathscr {V}}': \langle {\textbf{1}}\rangle \leqslant V_1'\leqslant \cdots \leqslant V_r'\), where
It is easy to see from the definition of a system (Definition 3.2) that \({\mathscr {V}}'\) is a subflag of \({\mathscr {V}}\). We have \({\mathbb {H}}_{\mu _j}(V'_j)=0\) for all j, and hence
by (3.4). Since \(c_i-c_{i+1}>0\) for all \(i\leqslant r-1\), and \(c_r>c_{r+1}\geqslant 0\), we must have that \(V_i'=V_i\) for all i, which is precisely statement (d).
(e) This statement is proven as in Lemma 5.3. \(\square \)
The bound \(\beta _k \geqslant {\tilde{\gamma }}_k\) will follow from the next proposition, provided we can show that the quantity \({\tilde{\gamma }}_k\) is well-defined and positive. The latter will be accomplished in Sect. 9, where we construct a system satisfying the strict entropy condition (3.5). An alternative construction is given in Appendix C.
As usual, \({\textbf{A}}\) is a logarithmic random set.
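The logarithmic random set model (each integer i lies in \({\textbf{A}}\) independently with probability 1/i) is easy to simulate; in particular, \({\mathbb {E}}|{\textbf{A}}\cap (D^c,D]| = \sum _{D^c < i \leqslant D} 1/i \approx (1-c)\log D\). The following sketch (illustrative code, with parameter values chosen for the demonstration, not from the paper) checks this numerically:

```python
import math
import random

def logarithmic_count(lo, hi, rng):
    # each integer i lies in the random set A independently with probability 1/i;
    # return |A ∩ (lo, hi]|
    return sum(1 for i in range(lo + 1, hi + 1) if rng.random() < 1 / i)

rng = random.Random(0)
D, c = 10 ** 4, 0.25
trials = 200
mean = sum(logarithmic_count(int(D ** c), D, rng) for _ in range(trials)) / trials
print(mean, (1 - c) * math.log(D))  # the empirical mean should be close to (1-c) log D
```

With these parameters \((1-c)\log D \approx 6.9\), so on average only a handful of elements of \({\textbf{A}}\) land in \((D^c,D]\), which is why extracting many equal subset sums from them is delicate.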
Proposition 5.5
Let \(c>0\) and suppose that there is a system \(({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\) such that:
-
(i)
\(1=c_1> c_2> \cdots >c_{r+1}=c\);
-
(ii)
There is some \(\varepsilon >0\) such that \(\textrm{e}({\mathscr {V}}',{{\textbf{c}}},{\varvec{\mu }}) \geqslant \textrm{e}({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }}) + \varepsilon \) for all proper subflags \({\mathscr {V}}'\) of \({\mathscr {V}}\).
-
(iii)
\({\text {Supp}}(\mu _j) = V_j \cap \{0,1\}^k\) for \(j=1,2,\ldots ,r\).
Let \(\delta >0\), and assume that D is large enough in terms of \(\delta ,\varepsilon \) and \(({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\). Then the probability that \({\textbf{A}}\cap [D^c,D]\) has k distinct subsets with equal sums is \(\geqslant 1-\delta \).
The proof of Proposition 5.5 is perhaps the most difficult part of this paper, and will occupy this section and the next. Throughout the remainder of this section and throughout the next one, we fix a system \(({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\) with \(c_{r+1}=c\) satisfying conditions (i)–(iii) of Proposition 5.5. Constants implied by the \(O\)- and \(\ll \)-symbols may depend on this system.
The main result, which we will prove in this section and the next, is Proposition 5.7 below.
Definition 5.6
(Nondegenerate maps) A map \(\psi : X \rightarrow \{0,1\}^k\) is said to be nondegenerate if the image of \(\psi \) is not contained in any of the subspaces \(\{x\in {\mathbb {Q}}^k : x_i=x_j\}\).
The map \(\psi \) is a “Venn diagram selection function”: the value of \(\psi (a)\) specifies the piece of the Venn diagram of the k subsets \(X_1,\ldots ,X_k\) of X to which a belongs. In the notation (4.6) of the previous section, \(\psi (a)=\omega \) means that \(a\in B_\omega \). The condition that \(\psi \) is nondegenerate is equivalent to \(X_1,\ldots ,X_k\) being distinct, and is analogous to the property of a flag \({\mathscr {V}}\) being nondegenerate.
Proposition 5.7
With probability tending to 1 as \(D\rightarrow \infty \), there exists a nondegenerate map \(\psi : {\textbf{A}}\cap (D^c,D] \rightarrow \{0,1\}^k\) such that \(\sum _{a \in {\textbf{A}}} a \psi (a) \in \langle {\textbf{1}} \rangle \).
The map \(\psi \) will be constructed using the data from the system \(({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\). Before we embark on the proof of this result, we show how to deduce Proposition 5.5 from it.
Proof of Proposition 5.5, assuming Proposition 5.7
By Proposition 5.7, we know that with probability \(1 - o_{D \rightarrow \infty }(1)\) there is a nondegenerate map \(\psi : {\textbf{A}}\cap (D^c,D] \rightarrow \{0,1\}^k\) such that \(\sum _{a \in {\textbf{A}}} a \psi (a)\) lies in \(\langle {\textbf{1}} \rangle \), that is to say, it is a constant vector. We will show that this map induces k distinct subsets of \({\textbf{A}}\) with equal sums.
Let \(\psi _i:{\textbf{A}}\cap (D^c,D]\rightarrow {\mathbb {Q}}\), \(i=1,\ldots ,k\), denote the projection of \(\psi \) onto the i-th coordinate of \({\mathbb {Q}}^k\), so that \(\psi =(\psi _1,\ldots ,\psi _k)\). Define
These sets are distinct because if \(A_i = A_j\), then the image of \(\psi \) would take values in the hyperplane \(\{x \in {\mathbb {Q}}^k : x_i = x_j\}\), contrary to the fact that \(\psi \) is nondegenerate. Moreover, for all i, j we have
and so \(A_1,\ldots ,A_k\) do indeed have equal sums. \(\square \)
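The deduction above can be made concrete on a toy instance (the values here are illustrative, not from the paper). With \(k=2\), take a map \(\psi \) whose weighted sum \(\sum _a a\,\psi (a)\) is a constant vector, and read off the equal-sum subsets \(A_i = \{a : \psi _i(a)=1\}\):

```python
# psi maps each element of A to a vector in {0,1}^2, and
# sum_a a * psi(a) = (3, 3), a constant vector, i.e. it lies in <1>
A = [1, 2, 3]
psi = {1: (0, 1), 2: (0, 1), 3: (1, 0)}
k = 2

# A_i = {a in A : psi_i(a) = 1}
subsets = [{a for a in A if psi[a][i] == 1} for i in range(k)]
sums = [sum(s) for s in subsets]

assert len(set(map(frozenset, subsets))) == k   # nondegenerate => distinct subsets
assert len(set(sums)) == 1                      # equal sums
print(subsets, sums)  # [{3}, {1, 2}] with sums [3, 3]
```

Here \(\psi \) is nondegenerate since its image contains both \((1,0)\) and \((0,1)\), which do not lie in the hyperplane \(\{x_1=x_2\}\); accordingly \(A_1=\{3\}\) and \(A_2=\{1,2\}\) are distinct with equal sums.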
5.2 Many values of \(\sum _{a \in {\textbf{A}}} a \psi (a)\), and a moment bound
We turn now to the task of proving Proposition 5.7. We will divide the proof of Proposition 5.7 into two parts. The first and more difficult part, which we prove in this section, states that (with high probability) \(\sum _{a \in {\textbf{A}}} a \psi (a)\) takes many different values modulo \(\langle {\textbf{1}} \rangle \) as \(\psi \) ranges over all nondegenerate maps \(\psi :{\textbf{A}}\cap (D^c,D]\rightarrow \{0,1\}^k\). The precise statement is Proposition 5.9 below. The deduction of Proposition 5.7 from Proposition 5.9 will occupy Sect. 6.
Let \(0<\kappa \leqslant \min _{1\leqslant j\leqslant r} (c_j-c_{j+1})-2/\log D\) be a small quantity, which may depend on D. Let
The purpose of working with \({\textbf{A}}'\) rather than \({\textbf{A}}\) is to ensure that some gaps are left for the subsequent argument in the next section (based on ideas of Maier and Tenenbaum [20]), in which we show that one of the many sums \(\sum _{a \in {\textbf{A}}'} a \psi (a)\) guaranteed by Proposition 5.9 may be modified, using the elements of \({\textbf{A}}\cap (D^c,D] {\setminus } {\textbf{A}}'\), to be in \(\langle {\textbf{1}} \rangle \).
Definition 5.8
(Compatible functions) We say that a map \(\psi : {\textbf{A}}' \rightarrow \{0,1\}^k\) is compatible if, for all j, \(a\in {\textbf{A}}^j\) implies \(\psi (a)\in V_j\).
Remark
Recall that \({\text {Supp}}(\mu _j)=V_j\cap \{0,1\}^k\) for all j by condition (iii) of Proposition 5.5. Setting \(B_\omega ^{(j)}=\{a\in {\textbf{A}}^j:\psi (a)=\omega \}\), we see that \(\psi \) is compatible precisely when \(B_\omega ^{(j)} \ne \emptyset \) only for \(\omega \) with \(\mu _j(\omega )>0\); this is consistent with the earlier notation (4.6).
Proposition 5.9
There exist real numbers \(\kappa ^*>0\), \(p>1\) and \(t>0\) (which depend on the system \(({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\)) so that the following is true. Let \(\delta > 0\) and suppose that D is sufficiently large as a function of \(\delta \). Uniformly for \(0\leqslant \kappa \leqslant \kappa ^*\), we have, with probability at least \(1 - \delta \), that \(\sum _{a \in {\textbf{A}}'}a \psi (a)\) takes at least
different values modulo \(\langle {\textbf{1}} \rangle \), as \(\psi \) ranges over all nondegenerate, compatible maps \(\psi \).
Remark
By (5.4), it clearly suffices to prove Proposition 5.9 for \(\kappa =\kappa ^*\).
We will deduce Proposition 5.9 from a moment bound. Firstly, define the representation function \(r_{{\textbf{A}}'}: {\mathbb {Q}}^k/\langle {\textbf{1}} \rangle \rightarrow {\mathbb {R}}\) by
where the summation is over all maps \(\psi : {\textbf{A}}'\rightarrow \{0,1\}^k\), and where
This weight function \(w_{{\textbf{A}}'}\) is chosen so that it is large only when \(\psi \) is balanced, that is, when for all j and \(\omega \), the set \({\textbf{A}}^j\) has about \(\mu _j(\omega )|{\textbf{A}}^j|\) elements a with \(\psi (a)=\omega \). Observe that if \(\psi (a)\not \in {\text {Supp}}(\mu _j)\) for some j and some \(a\in {\textbf{A}}^j\), then \(w_{{\textbf{A}}'}(\psi )=0\), and thus only compatible \(\psi \) contribute to the sum \(r_{{\textbf{A}}'}(x)\). However, \(w_{{\textbf{A}}'}(\psi )\) might be non-zero for some degenerate maps \(\psi \), and these will be removed by a separate argument below.
The crucial moment bound for the deduction of Proposition 5.9 is given below.
Proposition 5.10
Let
There is a \(p > 1\) and \(\kappa ^*>0\) so that uniformly for \(0\leqslant \kappa \leqslant \kappa ^*\) and for all \(D\geqslant e^{100/c}\) we have the moment bound
Proof of Proposition 5.9, assuming Proposition 5.10
Define also
We have
for any \({\textbf{A}}'\). On the other hand, when \(\psi \) is not compatible, then \(w_{{\textbf{A}}'}(\psi )=0\) because we know that \({\text {Supp}}(\mu _j)=V_j\cap \{0,1\}^k\) for all j by our assumption of condition (iii) of Proposition 5.5. In addition, if \(\psi \) is degenerate, then its image is contained in \(\{x\in {\mathbb {Q}}^k:x_i=x_j\}\cap \{0,1\}^k\) for some \(i\ne j\). Since \(V_r\not \subset \{x\in {\mathbb {Q}}^k:x_i=x_j\}\), there must exist some \(\omega \in V_r\cap \{0,1\}^k={\text {Supp}}(\mu _r)\) that is not in the image of \(\psi \). Therefore,
Since \(c_r>c_{r+1}\) by our assumption of condition (i) of Proposition 5.5, Lemma A.5 implies \(|{\textbf{A}}^r| \geqslant \frac{1}{2}(c_{r}-c_{r+1}) \log D\) with probability \(> 1 - O(e^{-(1/4)\log ^{1/2} D})\), and thus the right side above is o(1) with this same probability. The same lemma also implies that \({\textbf{A}}'\in {\mathcal {E}}^*\) with probability \(> 1 - O(e^{-(1/4)\log ^{1/2} D})\).
Now fix a small \(\delta >0\). The above discussion implies that, with probability at least \(1 - \delta /2\) (for D sufficiently large), we have
On the other hand, Markov’s inequality and Proposition 5.10 imply that, with probability at least \(1 - \delta /2\), we have
By Hölder’s inequality,
With probability at least \(1 - \delta \), both (5.5) and (5.6) hold, and in this case (5.7) gives
This completes the proof of Proposition 5.9. \(\square \)
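The mechanism behind the deduction above is a standard Hölder argument: a lower bound on the first moment \(\sum _x r(x)\) together with an upper bound on the p-th moment \(\sum _x r(x)^p\) forces the support of r to be large. A numerical illustration (not from the paper; the values of p and r are arbitrary):

```python
import random

# Hölder: R = sum r <= M^(1/p) * |supp r|^((p-1)/p), where M = sum r^p,
# hence |supp r| >= R^(p/(p-1)) / M^(1/(p-1))
random.seed(2)
p = 1.5
r = [random.random() + 0.01 for _ in range(1000)]   # strictly positive values
R = sum(r)
M = sum(x ** p for x in r)
support_bound = R ** (p / (p - 1)) / M ** (1 / (p - 1))
assert len(r) >= support_bound   # guaranteed by Hölder's inequality
```

In the proof, R is controlled from below via (5.5) and M from above via the moment bound of Proposition 5.10, so the number of values taken modulo \(\langle {\textbf{1}}\rangle \) is large.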
The rest of the section is devoted to the proof of Proposition 5.10.
5.3 An entropy condition for adapted systems
For reasons that will become apparent, in the proof of Proposition 5.10 we will need to apply the entropy gap condition not only with subflags \({\mathscr {V}}'\) of \({\mathscr {V}}\), but with a more general type of system.
Definition 5.11
(Adapted system) Given a system \(({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\), the pair \(({\mathscr {W}},{\textbf{b}})\) is adapted to \(({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\) if \({\mathscr {W}}: \langle {\textbf{1}} \rangle = W_0 \leqslant W_1 \leqslant \cdots \leqslant W_s\) is a complete flag with \(W_s\leqslant V_r\), and \({{\textbf{b}}}=(b_1,\ldots ,b_s)\) satisfies \(1\geqslant b_1 \geqslant \cdots \geqslant b_s\geqslant 0\) and the condition
We say that \(({\mathscr {W}},{\textbf{b}})\) is saturated if \(s=\dim (V_r)-1\) and if for all \(j\leqslant r\), there are exactly \(\dim V_j-1\) values of i with \(b_i>c_{j+1}\). Otherwise, we call \(({\mathscr {W}},{{\textbf{b}}})\) unsaturated.
Remark
For the definition of complete flag, see Definition 3.1. We make a few comments to motivate the term saturated. Let
so that the \(b_i\)’s belonging to the interval \((c_{j+1},c_j]\) are precisely \(b_{m_{j-1}+1},\ldots ,b_{m_j}\). Since \(W_i\leqslant V_j\) whenever \(b_i>c_{j+1}\), we infer that
Since \({\mathscr {W}}\) is complete, we have \(\dim (W_i)=i+1\), and thus \(m_j\leqslant \dim (V_j)-1\). In particular, \(({\mathscr {W}},{{\textbf{b}}})\) is saturated if, and only if, we have equality in (5.9) for all j. \(\square \)
We need some further notation, which reflects that \({\textbf{A}}'\) is supported on intervals with gaps. For \(1\leqslant j\leqslant r\), let
Recall that we take \(\kappa \) small enough so that each \(I_j\) has length \(\geqslant 2/\log D\), that is, \(\kappa \leqslant \min _j (c_j-c_{j+1})-2/\log D\).
There is a natural analogue of the \(\textrm{e}\)-value (cf. Definition 3.5) for adapted systems.
Definition 5.12
Given an adapted system \(({\mathscr {W}}, {\textbf{b}})\), we define
where \(\lambda \) denotes the Lebesgue measure on \({\mathbb {R}}\).
Finally, we define
that is to say \(\delta ({{\textbf{b}}})\) is the smallest non-negative real number with the property that
Adapted systems \(({\mathscr {W}},{{\textbf{b}}})\) can, in a certain sense, be interpreted in terms of convex superpositions of pairs \(({\mathscr {V}}',{{\textbf{c}}})\), \({\mathscr {V}}' \leqslant {\mathscr {V}}\) a subflag. The next lemma gives us a strict inequality analogous to condition (ii) of Proposition 5.5, unless \({\mathscr {W}}\) is saturated and has a small value of \(\delta ({{\textbf{b}}})\), which corresponds to the convex superposition which gives rise to \(({\mathscr {W}},{\textbf{b}})\) having weight \(\approx 1\) on the trivial subflag \(({\mathscr {V}},{{\textbf{c}}})\).
Lemma 5.13
Let \(({\mathscr {V}},{\varvec{\mu }},{{\textbf{c}}})\) be a system satisfying conditions (i)–(ii) of Proposition 5.5. Let \(\varepsilon \) be as in condition (ii). Suppose that \(({\mathscr {W}}, {\textbf{b}})\) is an adapted system to \(({\mathscr {V}},{\varvec{\mu }},{{\textbf{c}}})\) such that \(b_i\) lies in some set \(I_j\) for each i. Suppose, further, that \(\kappa \) is small enough in terms of \(\varepsilon \), and that \(\kappa \leqslant \frac{1}{2} \min _j (c_j-c_{j+1}).\)
-
(a)
If \(({\mathscr {W}},{{\textbf{b}}})\) is unsaturated, then \(\textrm{e}({\mathscr {W}}, {\textbf{b}}) \geqslant \textrm{e}({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }}) + \varepsilon /2\).
-
(b)
If \(({\mathscr {W}},{\textbf{b}})\) is saturated, then \(\textrm{e}({\mathscr {W}},{{\textbf{b}}}) \geqslant \textrm{e}({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }}) + \varepsilon \delta ({{\textbf{b}}})/2\).
Proof
We treat both parts together for most of the proof. Let \(m_j\) be defined by (5.8). In particular, \(m_0=0\) because \(c_1=1\). Note that
and let h be such that
Without loss of generality, we may assume that \(b_{m_{h}}<c_{h}\); the case \(b_{m_{h}}=c_{h}\) will then follow by continuity.
Set \(b=b_{m_{h}}\) and note that
The quantity \(\textrm{e}({\mathscr {W}},{{\textbf{b}}}')\) is linear in each variable \(b_i'\) and the region over which we consider the above minimum is a polytope. As a consequence, the minimum of \(\textrm{e}({\mathscr {W}},{{\textbf{b}}}')\) must occur at one of the vertices of the polytope. In particular, there are indices \(\ell _j\in (m_{j-1},m_j]\) for \(j=1,\ldots ,r\) such that
In fact, note that we must have \(\ell _{h} <m_{h}\) because \(b_{m_{h}}^*=b\) and we have assumed that \(b<c_{h}\).
Using the linearity of \(\textrm{e}({\mathscr {W}},\cdot )\) once again, we find that
where \(b_i^{(1)} = b_i^{(2)} = b_i^*\) for \(i\in \{1,\ldots ,s\}{\setminus } (\ell _{h},m_{h}]\), \(b_i^{(1)} = c_{h+1}+\kappa \) for \(i\in (\ell _{h},m_{h}]\) and \(b_i^{(2)} = c_{h}\) for \(i\in (\ell _{h},m_{h}]\).
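The vertex argument above uses the elementary fact that a function which is linear in each variable attains its minimum over a polytope at a vertex. A minimal numerical sketch (illustrative only; it uses a product of intervals, whereas the actual constraint region also carries the ordering constraints on the \(b_i'\)):

```python
import itertools
import random

random.seed(3)
coeffs = [random.uniform(-1, 1) for _ in range(4)]
box = [(random.uniform(0, 1), random.uniform(1, 2)) for _ in range(4)]

def f(b):
    # a linear function, standing in for e(W, b') as a function of b'
    return sum(c * x for c, x in zip(coeffs, b))

# minimizing vertex, chosen coordinate-wise by the sign of each coefficient
vertex = [lo if c > 0 else hi for c, (lo, hi) in zip(coeffs, box)]
brute = min(f(v) for v in itertools.product(*box))
assert abs(f(vertex) - brute) < 1e-12
# the vertex also beats random interior points
for _ in range(100):
    pt = [random.uniform(lo, hi) for lo, hi in box]
    assert f(vertex) <= f(pt) + 1e-12
```

For a general polytope the same conclusion holds, but the minimizing vertex is no longer given coordinate-wise; in the proof it is encoded by the choice of the indices \(\ell _j\).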
Fix \({{\textbf{b}}}'\in \{ {{\textbf{b}}}^{(1)}, {{\textbf{b}}}^{(2)} \}\). In addition, define the indices \(i_1,\ldots ,i_r\) by letting \(i_j=\ell _j\) when \(j\ne h\) or \({{\textbf{b}}}'={{\textbf{b}}}^{(1)}\), while letting \(i_{h}=m_{h}\) when \({{\textbf{b}}}'={{\textbf{b}}}^{(2)}\). We then have
A straightforward calculation implies that
where \({\mathscr {V}}'\) is the subflag of \({\mathscr {V}}\) with \(V_j'=W_{i_j}\) and
(Note that \({\mathscr {V}}'\) is indeed a subflag since \(W_{i_j}\leqslant W_{m_j}\leqslant V_j\) by (5.9).)
If \({\mathscr {V}}'={\mathscr {V}}\), we must have that \(W_{i_j}=V_j\) for all j. Since \(W_{i_j}\leqslant W_{m_j}\leqslant V_j\), we infer that \(W_{m_j}=V_j\), as well as that \(i_j=m_j\) for all j. In particular, the flag \(({\mathscr {W}},{{\textbf{b}}})\) we started with must be saturated and \(S=0\) (since \(i_j=m_j\) and \({\mathbb {H}}_{\mu _j}(W_{i_j})={\mathbb {H}}_{\mu _j}(V_j)=0\) for all j).
We are now ready to complete the proof of both parts of the lemma.
(a) By the above discussion, if \(({\mathscr {W}},{{\textbf{b}}})\) is unsaturated, then \({\mathscr {V}}'\ne {\mathscr {V}}\). Therefore, by assumption of condition (ii) of Proposition 5.5, we have
\(\text {for } {{\textbf{b}}}'\in \{{{\textbf{b}}}^{(1)},{{\textbf{b}}}^{(2)}\}\). Inserting this inequality into (5.13) implies that
Since \(\textrm{e}({\mathscr {W}},{{\textbf{b}}})\geqslant \textrm{e}({\mathscr {W}},{{\textbf{b}}}^*)\), part (a) follows, provided that \(\kappa \) is small enough in terms of \(\varepsilon \).
(b) Assume that \(({\mathscr {W}},{{\textbf{b}}})\) is saturated. We can only have \({\mathscr {V}}'={\mathscr {V}}\) if \(i_{h}=m_{h}\). Since \(\ell _{h}<m_{h}\), this can only happen when \({{\textbf{b}}}'={{\textbf{b}}}^{(2)}\). As a consequence, assuming again that \(\kappa \) is small enough in terms of \(\varepsilon \), we have that
Inserting this into (5.13) yields the inequality
Since \(b=c_{h}-\delta ({{\textbf{b}}})\), \(0<c_{h}-c_{h+1}-\kappa \leqslant 1\), and \(\textrm{e}({\mathscr {W}},{{\textbf{b}}})\geqslant \textrm{e}({\mathscr {W}},{{\textbf{b}}}^*)\), we find that \(\textrm{e}({\mathscr {W}},{{\textbf{b}}})\geqslant \textrm{e}({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})+\varepsilon \delta ({{\textbf{b}}})/2\). This completes the proof of part (b) of the lemma. \(\square \)
5.4 Proof of the moment bound
In this subsection we prove Proposition 5.10. For a vector
define the event
When \({\textbf{A}}'\) lies in \(S({\textbf{n}})\), we write
so that
We may define, for any compatible \(\psi \), the auxiliary function
The salient property of \(\theta \) is that it is determined by the ordering of the elements in \({\textbf{A}}^j\) and not by the elements themselves. We denote by \(\varvec{\Theta }_{{{\textbf{n}}}}\) the set of compatible functions \(\theta \), that is, those functions satisfying
In the event \(S({\textbf{n}})\), if \(\psi \) is a compatible function and \(\theta \) is defined by (5.16), we have
where the notation \(w_{{\textbf{n}}}\) (in place of \(w_{{\textbf{A}}}\)) reflects the fact that w only depends on \(\theta \), and not otherwise on \({\textbf{A}}\). In this notation,
Writing \(r_{{\textbf{A}}'}^p = r_{{\textbf{A}}'}^{p-1} r_{{\textbf{A}}'}\) and interchanging the order of summation, it follows that if \({\textbf{A}}'\) lies in \(S({\textbf{n}})\), then
where the inner summation is over all compatible functions \(\theta '\) satisfying
As in the argument of Sect. 4.2, we find a flag \({\mathscr {W}}\) and special values of i which have the effect of isolating terms in the relation (5.20). With \(\theta , \theta ',{\textbf{n}}\) fixed, let
and
We now choose a special basis of \({\text {Span}}({\textbf{1}}, \Omega )\). For each \(\omega \in \Omega \), let
and place a total ordering on \(\Omega \) by saying that \(\omega \prec \omega '\) if \(K_{\omega } < K_{\omega '}\). Let \(\omega ^1\) be the minimum element in \(\Omega {\setminus } \langle {\textbf{1}} \rangle \),
where s is such that \(\Omega \subset {\text {Span}}({\textbf{1}},\omega ^1,\ldots ,\omega ^s)\). Finally, let
and form the flag
We note that in the special case \(\theta =\theta '\), we have \(s=0\) and \({\mathscr {W}}\) is a trivial flag with only one space \(W_0\).
Now we divide up the sample space of \({\textbf{A}}'\) into events describing the rough size of the critical elements \(a_{\tau _j}\). By construction,
Similarly to Sect. 4, for \(1\leqslant i\leqslant s\) let
The definition of \({\textbf{A}}'\) implies that for each i, there is some j with \(b_i\in I_j=(c_{j+1}+\kappa ,c_j]\). Moreover, we have the implications
where we used (5.17) to obtain the second implication. Since \(b_1\geqslant b_2\geqslant \cdots \geqslant b_i\), we infer the stronger relation
Therefore, the pair \(({\mathscr {W}},{{\textbf{b}}})\) is adapted to \(({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\).
Using the inequality \((x+ y)^{p-1} \leqslant x^{p-1} + y^{p-1}\) repeatedly, we may partition (5.19) according to the values of \({\mathscr {W}}(\theta ,\theta ')\) and \(\varvec{\tau }(\theta ,\theta ')\), obtaining (still assuming \(S({\textbf{n}})\))
We need to separately consider other elements of \({\textbf{A}}'\) that lie in the intervals \((D^{b_i}/e,D^{b_i}]\), and so we define
By assumption, \(\sum _b \ell _b \geqslant s\). It may happen that \(b_i=b_{i+1}\) for some i, in which case \(|{\mathcal {B}}| < s\). With \({{\textbf{n}}}, \varvec{\tau }, {{\textbf{b}}}, \varvec{\ell }\) all fixed, consider the event
defined as the intersection of
-
\(S({{\textbf{n}}})\);
-
\(a_{\tau _i} \in (D^{b_i}/e,D^{b_i}]\) for all i;
-
\(|{\textbf{A}}' \cap (D^b/e,D^b]| = \ell _b\) for all \(b\in {\mathcal {B}}\).
Taking expectations over \({\textbf{A}}'\), we get
where the condition that \(\ell _b\leqslant D^{b/2}/100\) comes from the fact that we are taking expectations over \({\textbf{A}}'\in {\mathcal {E}}^*\). By Hölder’s inequality with exponents \(\frac{1}{p-1}\), \(\frac{1}{2-p}\), this implies that
Claim. Let \(\ell _b\leqslant D^{b/2}/100\) for all \(b\in {\mathcal {B}}\). Then we have
Proof of Claim
Let us begin by analyzing the event \(E({{\textbf{b}}},\varvec{\tau },{{\textbf{n}}}, \varvec{\ell })\) we are conditioning on. Consider the set \(\bigcup _j(D^{c_{j+1}+\kappa },D^{c_j}]{\setminus } \bigcup _{b\in {\mathcal {B}}} (D^b/e,D^b]\). There is a unique way to write it as \(\bigcup _{m=1}^M I_m\), where the sets \(I_m\) are intervals of the form (A, B] with their closures \({\bar{I}}_m\) mutually disjoint. Now, the event \(E({{\textbf{b}}},\varvec{\tau },{{\textbf{n}}}, \varvec{\ell })\) is equivalent to there being mutually disjoint sets of consecutive integers \({\mathcal {I}}_m\) (\(1\leqslant m\leqslant M\)) and \({\mathcal {J}}_b\) (\(b\in {\mathcal {B}}\)) such that:
-
The sets \({\mathcal {I}}_m\) \((1\leqslant m\leqslant M)\) and \({\mathcal {J}}_b\) \((b\in {\mathcal {B}})\) together form a partition of the set \([n_r]\);
-
For all \(m\in \{1,\ldots ,M\}\), we have \(a_n\in I_m\) if and only if \(n\in {\mathcal {I}}_m\);
-
For all \(b\in {\mathcal {B}}\), we have \(a_n\in (D^b/e,D^b]\) if and only if \(n\in {\mathcal {J}}_b\);
-
\(\tau _i\in {\mathcal {J}}_{b_i}\) for all i;
-
\(|{\mathcal {J}}_b|=\ell _b\) for all \(b\in {\mathcal {B}}\).
The above discussion allows us to describe the distribution law of \({\textbf{A}}'\) under the event \(E({{\textbf{b}}},\varvec{\tau },{{\textbf{n}}}, \varvec{\ell })\): given a choice of the intervals \({\mathcal {I}}_m\) and \({\mathcal {J}}_b\), we construct independent logarithmic random sets \({\textbf{A}}^*_m\) on \(I_m\) and \({\tilde{{\textbf{A}}}}_b\) on \((D^b/e,D^b]\) such that \(\#{\textbf{A}}^*_m=\#{\mathcal {I}}_m\) for all m and \(\#{\tilde{{\textbf{A}}}}_b=\ell _b\) for all b. Then \({\textbf{A}}'\) is the union of all the \({\textbf{A}}^*_m\) and all the \({\tilde{{\textbf{A}}}}_b\).
Having described the distribution of \({\textbf{A}}'\) under the event \(E({{\textbf{b}}},\varvec{\tau },{{\textbf{n}}},\varvec{\ell })\), let us now prove our claim. We argue as in the proof of Proposition 4.4. Relation (5.20) implies
for some \(a_0\in {\mathbb {Z}}\). Since \({\textbf{1}}, \omega ^1,\ldots ,\omega ^s\) are linearly independent, this uniquely determines their coefficients \(a_0, a_{\tau _1},\ldots , a_{\tau _s}\) in terms of the other \(a_i\)’s. For each \(b\in {\mathcal {B}}\), let
Then, given \({\textbf{A}}_m^*\) for all m and \(b\in {\mathcal {B}}\), there are at most
choices for \({\tilde{{\textbf{A}}}}_b\) (since \(m_b\) of its elements are determined by the remaining \(\ell _b-m_b\) elements and by the elements of the \({\textbf{A}}_m^*\) that we have fixed), where we used that \(\ell _b^{m_b}\leqslant \ell _b^k\ll (1-1/e)^{-\ell _b}\). In addition, Lemma A.4 implies that the probability of occurrence of a given set \(X_b\subset {\mathbb {Z}}\cap (D^b/e,D^b]\) as the set \({\tilde{{\textbf{A}}}}_b\), conditionally on the event that \(\#{\tilde{{\textbf{A}}}}_b=\ell _b\), is
Putting the above estimates together, we conclude that
upon noticing that \(\sum _{b\in {\mathcal {B}}} m_b b = \sum _i b_i\). This proves our claim that (5.24) holds. \(\square \)
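The application of Hölder's inequality above, with exponents \(\frac{1}{p-1}\) and \(\frac{1}{2-p}\), works because these are conjugate: \((p-1)+(2-p)=1\) for any \(1<p<2\). A quick numerical sanity check (illustrative code, not from the paper):

```python
import random

random.seed(4)
p = 1.3   # any 1 < p < 2 works
q1, q2 = 1 / (p - 1), 1 / (2 - p)
assert abs(1 / q1 + 1 / q2 - 1) < 1e-12   # conjugate exponents, since (p-1)+(2-p)=1
for _ in range(100):
    x = [random.random() for _ in range(10)]
    y = [random.random() for _ in range(10)]
    lhs = sum(a * b for a, b in zip(x, y))
    # Hölder: sum x*y <= (sum x^{1/(p-1)})^{p-1} * (sum y^{1/(2-p)})^{2-p}
    rhs = (sum(a ** q1 for a in x) ** (1 / q1)
           * sum(b ** q2 for b in y) ** (1 / q2))
    assert lhs <= rhs + 1e-12
```

Note that both exponents exceed 1 exactly when \(1<p<2\), which is one reason the moment bound is stated for p slightly larger than 1.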
In the light of (5.24), relation (5.23) becomes
To evaluate the bracketed expression, first recall the definition (5.18) of \(w_{{\textbf{n}}}(\theta ')\), and note that the conditions \({\mathscr {W}}(\theta ,\theta ',{{\textbf{n}}}) = {\mathscr {W}}\), \(\varvec{\tau }(\theta ,\theta ',{{\textbf{n}}}) = \varvec{\tau }\) together imply that
where we have defined \(\tau _0:=0\) and \(\tau _{s+1} := n_r+1\). For brevity, write
Some of these sets are empty. In any case, we have
From (5.18), and the fact that the discrete intervals \(T_{i,j}\) are disjoint and cover \([n_r]\), we have
With these observations, we conclude that
where
Substituting into (5.25), and summing over \({\textbf{n}}\), we get
If \(V_j \leqslant W_i\), then \(\mu _j(W_i+\omega )=1\) for all \(\omega \in {\text {Supp}}(\mu _j)\) and thus \(\eta (i,j,p,{\mathscr {W}})=1\). For all \(i,j,p,{\mathscr {W}}\) we have \(\eta (i,j,p,{\mathscr {W}})\leqslant 1\). Thus, we require lower bounds on \(|T_{i,j}|\) in the case \(V_j \not \leqslant W_i\).
Claim. Assume that \(E({{\textbf{b}}},\varvec{\tau },{{\textbf{n}}},\varvec{\ell })\) holds. Given i such that \(b_{i+1}<b_i\) and \(j\in \{1,\ldots ,r\}\), define
Then
Proof of Claim
Let t be such that \(a_t\in M_{i,j}\). In particular,
This relation and the definition of \(b_i\) in (5.21) imply that \(a_{\tau _{i+1}}<a_t<a_{\tau _i}\) and hence \(\tau _i<t<\tau _{i+1}\), where we used that \(a_1>a_2>\cdots >a_{n_r}\). In addition, since \(D^{c_{j+1}+\kappa }<a_t\leqslant D^{c_j}\), we have that \(a_t\in {\textbf{A}}^j\). Thus, \(n_{j-1}<t\leqslant n_j\) by (5.15). This completes the proof of the claim. \(\square \)
A direct consequence of (5.30) is that
Combining this inequality with (5.29), we get
Fix \({{\textbf{b}}}\) and \({\mathscr {W}}\), and let \(E'({{\textbf{b}}},\varvec{\ell })\) be the event that \(|{\textbf{A}}'\cap (D^b/e,D^b]|=\ell _b\) for all \(b\in {\mathcal {B}}\). Given \({\textbf{A}}'\in E'({{\textbf{b}}},\varvec{\ell })\), we have at most \(\prod _b \ell _b\leqslant e^{\sum _b\ell _b}\) choices for \(\tau _1,\ldots ,\tau _s\). Hence,
Since the events \(S({{\textbf{n}}})\) are mutually disjoint, we arrive at the inequality
Next, we estimate the right hand side of (5.31). The intervals \(M_{i,j}\) and \((D^b/e,D^b]\) are mutually disjoint by (5.30), hence the quantities \(|{\textbf{A}}' \cap M_{i,j}|\) and \(|{\tilde{{\textbf{A}}}}_b|\) are independent. Using Lemma A.3, we obtain
Recall that \(I_j=(c_{j+1}+\kappa ,c_j]\), define
and recall that \(\lambda \) denotes the Lebesgue measure on \({\mathbb {R}}\). Then, by the definition of \(M_{i,j}\), we have
Substituting into the definition of \(\textrm{e}()\) (Definition 5.12), this gives
where
Recall the definition (5.28) of \(\eta (i,j,p,{\mathscr {W}})\). If \(W_i \geqslant V_j\), then \(\mu _j(W_i + x) \!=\! 1\) whenever \(x \in {\text {Supp}}(\mu _j)\), and so in this case \(\eta (i,j,p,{\mathscr {W}}) = 1.\) Since \({\mathbb {H}}_{\mu _j}(W_i)=0\) in this case, we have
For any fixed \(i,j,{\mathscr {W}}\), we have
and so
We deduce from (5.32), (5.33) and (5.34) that
To continue, we separate two cases.
Case 1. \(({\mathscr {W}},{{\textbf{b}}})\) is unsaturated.
In the above case, Lemma 5.13(a) implies that \(\textrm{e}({\mathscr {W}},{{\textbf{b}}}) \geqslant \textrm{e}(\mathscr {V},{{\textbf{c}}},{\varvec{\mu }}) + \varepsilon /2\). Consequently,
provided that \(p-1\) is small enough in terms of \(\varepsilon \) (and k).
Since there are O(1) choices for \({\mathscr {W}}\) and \(\log ^{O(1)} D\) choices for \({{\textbf{b}}}\), the contribution of such flags to the right hand side of (5.32) is
Case 2. \(({\mathscr {W}},{{\textbf{b}}})\) is saturated. (Recall from Definition 5.11 that \(({\mathscr {W}},{{\textbf{b}}})\) is called saturated when \(s=\dim (V_r)-1\) and for all \(j\leqslant r\), there are exactly \(\dim V_j-1\) values of i with \(b_i>c_{j+1}\).)
Fix for the moment a pair (i, j) such that
The second condition is equivalent to knowing that
In particular, we have \(W_i\leqslant V_j\) by (5.22). Note though that we have assumed \(V_j\not \leqslant W_i\). Therefore, \(W_i<V_j\). Since \(\dim (W_i)=i+1\), we infer that
Since we have assumed that \(({\mathscr {W}},{{\textbf{b}}})\) is saturated, the above inequality implies that \(b_{i+1}>c_{j+1}\). Recalling the definition (5.11) of \(\delta ({{\textbf{b}}})\), we conclude that
This implies that \(G_i\cap I_j\subset [c_j-\delta ({{\textbf{b}}}),c_j]\) for any pair (i, j) satisfying (5.37). As a consequence,
Since we also have that \(\textrm{e}({\mathscr {W}}, {{\textbf{b}}}) \geqslant \textrm{e}({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }}) + \varepsilon \delta ({{\textbf{b}}})/2\) by Lemma 5.13(b), it follows that
provided that \(p-1\) is small enough compared to \(\varepsilon \).
Using (5.38), we see that the contribution of saturated flags to the right hand side of (5.32) is
where we used that there are O(1) choices for \({\mathscr {W}}\). Recall (5.21), which implies that the numbers \(b_i\) are restricted to the set \(\{m/\log D: m\in {\mathbb {N}}\}\). Thus the number of \({{\textbf{b}}}\) with \(\delta ({{\textbf{b}}})=m/\log D\) is at most \((m+1)^s\) and
We thus conclude that
If we combine the above inequality with (5.36) and (5.32), we establish Proposition 5.10. \(\square \)
6 An argument of Maier and Tenenbaum
The aim of this section is to prove Proposition 5.7. The reader may care to recall the statement of that proposition now, as well as the definition of a compatible map (Definition 5.8). As in the previous section, the system \(({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\) is fixed, and satisfies conditions (i)–(iii) of Proposition 5.5. We also fix a basis \(\{ {\textbf{1}},\omega ^1,\ldots ,\omega ^d \}\) of \(V_r\) such that \(V_j={\text {Span}}({\textbf{1}},\omega ^1,\ldots ,\omega ^{\dim (V_j)-1})\) for each j and such that \(\omega ^i\in \{0,1\}^k\) for each i. Denote \(\Omega = {\text {Supp}}(\mu _r) = V_r \cap \{0,1\}^k\).
We begin with an observation related to the solvability of (4.12), which we recall here for the convenience of the reader:
Let \(\Lambda \) denote the \({\mathbb {Z}}\)-span of \({\textbf{1}},\omega ^1,\ldots ,\omega ^d\) (that is, the lattice generated by \({\textbf{1}},\omega ^1,\ldots ,\omega ^d\)). Every vector \(\omega \in \Omega \) is a rational combination of the basis elements \({\textbf{1}},\omega ^1,\ldots ,\omega ^d\). Hence, there is some \(M\in {\mathbb {N}}\) such that \(M\omega \in \Lambda \) for each \(\omega \in \Omega \). In particular, note that the right-hand side of (6.1) lies a priori in the lattice \(\Lambda /M=\{x/M: x\in \Lambda \}\). However, we must ensure that (6.1) is solvable with \(K_1,\ldots ,K_r\in {\mathbb {Z}}\). Equivalently, the right-hand side of (6.1) must lie in \(\Lambda \), which is guaranteed when the coefficients of all vectors \(\omega \) in it lie in \(M{\mathbb {Z}}\).
In this section, implied constants in O() and \(\ll \) notations may depend on the system \(({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\) and basis \(\omega ^1,\ldots ,\omega ^d\); in particular, on k, d and M.
6.1 The sets \({\mathscr {L}}_i({\textbf{A}})\) and lower bounds for their size
The main statement of this subsection, Proposition 6.2, is a variant of Proposition 5.9, where we stipulate that all elements lie in \(\Lambda \). This will later ensure that (6.1) is solvable with \(K_1,\ldots ,K_r\in {\mathbb {Z}}\).
Fix \(\kappa >0\) satisfying \(\kappa \leqslant \frac{\kappa ^*}{2}\), where \(\kappa ^*\) is the constant from Proposition 5.9. In particular, \(\kappa \leqslant 1/2\). We introduce the sets
Thus each \(I_i(D)\) is simply a union of r intervals in \(\Lambda \), and we have the nesting
For any \(\omega \in V_r\) we denote by \({\overline{\omega }}\) the projection of \(\omega \) onto
In addition let \({\overline{\psi }}(a)=\overline{\psi (a)}\) for \(a\in {\textbf{A}}\).
The reader may wish to recall the definition of nondegenerate (Definition 5.6) and compatible (Definition 5.8) maps.
Definition 6.1
Write \({\mathscr {L}}_{i}({\textbf{A}})\) for the set of all \(\sum _{a \in {\textbf{A}}} a{\overline{\psi }}(a)\) that lie in \(\Lambda \), where \(\psi \) ranges over all nondegenerate, compatible maps supported on \(I_i(D)\).
Proposition 6.2
Let \(\delta > 0\) and \(i\in {\mathbb {N}}\), and let D be sufficiently large in terms of \(\delta \). Then with probability at least \(1 - \delta \) in the choice of \({\textbf{A}}\cap I_i(D)\),
where \(\alpha \) is a positive constant depending at most on \(({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\).
Proof
Let
where the first inclusion follows by noticing that
for \(c_{j+1}\in [0,1]\), \(0\leqslant \kappa \leqslant \kappa ^*/2\leqslant 1/2\) and \(i\geqslant 1\). Write \({\mathscr {L}}'_i({\textbf{A}})\) for the set of all \(\sum _{a \in {\textbf{A}}} a {\overline{\psi }}(a)\), where \(\psi \) ranges over all nondegenerate, compatible maps supported on \(I'_i(D)\), but without the stipulation that the sum is in \(\Lambda \). We now apply Proposition 5.9 with D replaced by \(D^{1-\kappa /i}\) and \(\delta \) replaced by \(\delta /2\) to conclude that
with probability at least \(1-\delta /2\), where \(\alpha =1/(p-1)\) with p as in Proposition 5.9.
We now use the elements of \({\textbf{A}}\cap (I_i(D){\setminus } I_i'(D))\) to create many sums \(\sum _{a \in {\textbf{A}}} a{\overline{\psi }}(a)\) which do lie in \(\Lambda \). Let \(G:= (D^{c_{r+1}(1-\kappa /i)},\delta ^{-1}D^{c_{r+1}(1-\kappa /i)}]\), which is a subset of \(I_i(D){\setminus } I'_i(D)\). Let \({\mathcal {E}}\) be the event that \({\textbf{A}}\cap G\) contains at least \(2^k\) elements that are \(\equiv m\pmod {M}\) for each \(m\in \{1,\ldots ,M\}\). Lemma A.2 (applied with \(B=\{b\in {\mathbb {Z}}\cap G : b\equiv m\pmod M\}\) and \(\varepsilon =1/3\)) implies that if \(\delta \) is sufficiently small then \({\mathbb {P}}({\mathcal {E}}) \geqslant 1 - \delta /2\).
Assume now that we are in the event \({\mathcal {E}}\). Let us fix a set \({\mathcal {K}}\subset {\textbf{A}}\cap G\) that contains exactly \(2^k\) elements that are \(\equiv m\pmod {M}\) for each \(m\in \{1,\ldots ,M\}\). Take any nondegenerate, compatible function \(\psi : {\textbf{A}}\rightarrow \{0,1\}^k\) supported on \(I'_i(D)\), and write
Recall that \({\text {Supp}}(\mu _r)=V_r\cap \{0,1\}^k\) by condition (iii) of Proposition 5.5. Hence, for each \(\omega \in \Omega \), we may find an element \(a_\omega \in {\mathcal {K}}\) satisfying \(a_\omega \equiv -N_\omega \pmod {M}\). Set \(\psi _0(a_{\omega })=\omega \) for each \(\omega \), \(\psi _0(a)=\psi (a)\) for \(a\in I'_i(D)\), and \(\psi _0(a)={\textbf{0}}\) for all other \(a\in I_i(D)\). We then have
since \(M|(a_\omega +N_\omega )\) for all \(\omega \). Moreover, \(\psi _0\) is nondegenerate and compatible by construction. Consequently, \(\sum _a a{\overline{\psi }}_0(a) \in \Lambda \) (by removing the coefficient of \({\textbf{1}}\)). Since there are at most \(2^{|{\mathcal {K}}|} \leqslant 2^{M2^k}\) choices for \(\{a_\omega : \omega \in \Omega \}\), the map from \(\sum _{a\in I'_i(D)} a{\overline{\psi }}(a)\) to \(\sum _{a\in I_i(D)} a{\overline{\psi }}_0(a)\) is at most \(2^{M2^k}\)-to-1. We conclude that with probability \(\geqslant 1-\delta \),
where the implied constant depends only on k, M and \(\alpha \), which are all fixed. \(\square \)
6.2 Putting \({\mathscr {L}}_i({\textbf{A}})\) in a box
In the last section, we showed that (with high probability) \({\mathscr {L}}_i({\textbf{A}})\) is large. In this section we show that with high probability it is contained in a box (in coordinates \(\omega ^1,\ldots ,\omega ^d\)); putting these results together one then sees that \({\mathscr {L}}_i({\textbf{A}})\) occupies a positive proportion of lattice points in the box, the bound being independent of D.
For \(t \in \{1,\ldots , d\}\), write j(t) for the unique j such that
In addition, let \(C\) be the largest coordinate in absolute value of any element in \(V_r\cap \{0,1\}^k\) when written with respect to the basis \({\textbf{1}},\omega ^1,\ldots ,\omega ^d\). We then set
Lemma 6.3
Assume \(\delta >0\) is small enough so that \( r e^{-2/\delta }\leqslant \delta \). Then, we have
with probability at least \(1 -\delta \) in the choice of \({\textbf{A}}\cap I_i(D)\).
Proof
This follows quickly from the fact that \(\psi \) is compatible and by Lemma A.6, the latter implying that
with probability \(\geqslant 1- r e^{-2/\delta } \geqslant 1-\delta \). \(\square \)
Proposition 6.4
Let \(\delta \) and \(\alpha \) be as in Proposition 6.2 and in Lemma 6.3. With probability at least \(1 - 2 \delta \) in the choice of \({\textbf{A}}\cap I_i(D)\), \({\mathscr {L}}_i({\textbf{A}})\) is a subset, of size \(\gg \delta ^{d+\alpha } N^{(i)}\), of the box \(\bigoplus _{t = 1}^d [-N_{j(t)}^{(i)}, N_{j(t)}^{(i)}] \omega ^t\).
Proof
This follows immediately upon combining Proposition 6.2 and Lemma 6.3. \(\square \)
6.3 Zero sums with positive probability
Lemma 6.5
Let \(\delta \) and \(\alpha \) be as in Proposition 6.2 and Lemma 6.3, and let D be large enough in terms of \(\delta \) and \(({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\). Let \(i\in {\mathbb {Z}}\cap [1,(\log D)^{1/3}]\). In addition, let \(S\subset \bigoplus _{t = 1}^d [-N_{j(t)}^{(i)}, N_{j(t)}^{(i)}] \omega ^t\) with \(|S| \gg \delta ^{d+\alpha } N^{(i)}\) and with \(S \subset \Lambda \). Then
Proof
We condition on a fixed choice of \({\textbf{A}}\cap I_i(D)\) for which \({\mathscr {L}}_i({\textbf{A}}) = S\). Note that
Then it is enough to show that with probability \(\gg \delta ^{2d(d+\alpha )}\), the set \({\textbf{A}}\) contains 2d distinct elements \(a_t\) and \(a_t'\), \(1\leqslant t\leqslant d\), such that
To see why this is sufficient, let \(s=\sum _t(a_t'-a_t)\omega ^t\), which we know belongs to \(S={\mathscr {L}}_i({\textbf{A}})\). In particular, there is a compatible map \(\psi \) supported on \(I_i(D)\) such that \(\sum _{a \in {\textbf{A}}} a {\overline{\psi }}(a) = s\). Now, consider the function
with \({\psi '}(a) = \psi (a)\) for \(a\in {\textbf{A}}\cap I_i(D)\), \(\psi '(a_t')={\textbf{1}}-\omega ^t\) and \(\psi '(a_t)=\omega ^t\) for \(1\leqslant t\leqslant d\), and \(\psi '(a)={\textbf{0}}\) for all other values of \(a\in {\textbf{A}}\cap I_{i+1}(D)\). Notice that \(\psi '\) is compatible according to Definition 5.8 by the second part of (6.7). It is now clear that \(0\in {\mathscr {L}}_{i+1}({\textbf{A}})\). Hence, if the conditional probability that (6.7) holds is \(\gg \delta ^{2d(d+\alpha )}\), so is the probability that \(0\in {\mathscr {L}}_{i+1}({\textbf{A}})\).
To find \(a_t\) and \(a_t'\) satisfying (6.7), let
The number of elements \(\sum _t s_t \omega ^t \in S\) with \(n|s_t\) for some t is
as long as D is large enough in terms of \(\delta \) and \(({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\). Thus, there is a subset \(S'\subset S\) of size at least |S|/2 whose elements \(\sum _t s_t\omega ^t\) satisfy \(n\not \mid s_t\) for all t. We will choose the sets \(\{a_t: 1\leqslant t\leqslant d\}\) and \(\{a_t': 1\leqslant t\leqslant d\}\) independently, by selecting \(a_t \equiv 0\pmod {n}\) and \(a_t'\not \equiv 0\pmod {n}\).
Note that
provided that \(i\leqslant (\log D)^{1/3}\). For each given t, i and j, the probability that the interval \([4t N^{(i)}_{j}, (4t+2) N^{(i)}_{j}]\) contains no element \(a_t \equiv 0\pmod {n}\) of \({\textbf{A}}\) equals
for some small positive constant \(\gamma =\gamma (d)\). Thus, the probability that, for each \(t=1,2,\ldots ,d\), the set \({\textbf{A}}\) contains some \(a_t\equiv 0\pmod n\) in the interval \([4t N^{(i)}_{j(t)}, (4t+2) N^{(i)}_{j(t)}]\) is \(\gg 1/n^d\gg \delta ^{d(d+\alpha )}\).
Fix a choice of \(a_1,\ldots , a_d\) as described above, and set
By construction, every coordinate of \(x\in X\) is \(\not \equiv 0\pmod {n}\). Also,
Now the intervals on the right-hand side above are disjoint, and
Thus, by Lemma A.7, with probability \(\gg (\delta ^{d+\alpha })^d\), there are \(a_1',\ldots ,a_d' \in {\textbf{A}}\) such that \((a_1',\ldots ,a_d')\in X\). The relation (6.7) follows for such \(a_t,a_t'\), which exist with probability \(\gg \delta ^{d(d+\alpha )} \cdot \delta ^{d(d+\alpha )}\). \(\square \)
6.4 An iterative argument
To complete the proof of Proposition 5.7, we apply Lemma 6.5 iteratively. Let \({\mathscr {S}}\) be the set of sets S satisfying the assumptions of Lemma 6.5. We say that \({\mathscr {L}}_i({\textbf{A}})\) is large if it satisfies the conclusions of Proposition 6.4, or equivalently if \({\mathscr {L}}_i({\textbf{A}}) = S\) with \(S\in {\mathscr {S}}\). Thus Lemma 6.5 implies that
We conclude that there is some \(\varepsilon = \delta ^{O(1)}\) such that
For brevity, write \(E_i\) for the event that \(0 \notin {\mathscr {L}}_i({\textbf{A}})\), and \(F_i\) for the event that \({\mathscr {L}}_{i}({\textbf{A}})\) is large. In this notation, (6.10) becomes
Moreover, Proposition 6.4 implies that
Lastly, note that \(E_1 \supset E_2 \supset \cdots \) because \({\mathscr {L}}_1({\textbf{A}}) \subset {\mathscr {L}}_2({\textbf{A}}) \subset \cdots \)
We claim that \({\mathbb {P}}(E_i)< 4\delta \) for some \(i\leqslant I:=\lfloor (\log D)^{1/3}\rfloor \). Indeed, for each \(i\leqslant I\), we have
Thus, if \({\mathbb {P}}(E_i) \geqslant 4 \delta \), then \({\mathbb {P}}(E_{i+1}) \leqslant (1 - \varepsilon /2) {\mathbb {P}}(E_i)\). If this holds for all \(i\leqslant I\), then \({\mathbb {P}}(E_I)\leqslant (1-\varepsilon /2)^{I-1}<4\delta \), a contradiction. Therefore, \({\mathbb {P}}(E_{i^*}) < 4\delta \) for some \(i^*\leqslant I\), as long as D is large enough in terms of \(\delta \) and the (fixed) system \(({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\). This completes the proof of Proposition 5.7.
Part III. The optimisation problem
7 The optimisation problem—basic features
In this section we consider Problem 3.7, the optimisation problem on the cube, which is a key feature of our paper. We will give some kind of solution to this for a fixed nondegenerate flag \({\mathscr {V}}\), leaving aside the question of how to choose \({\mathscr {V}}\) optimally.
Let us refresh ourselves on the main elements of the setup of Problem 3.7. We have a nondegenerate, r-step flag
of distinct vector spaces. In light of Lemma 5.4, we may restrict our attention to flags such that
which we henceforth assume. With the flag \({\mathscr {V}}\) fixed, we wish to find \(\gamma _k({\mathscr {V}})\), the supremum of numbers \(c\geqslant 0\) such that there are thresholds
(we may assume that \(c_1=1\) by arguing as in Lemmas 5.3 and 5.4) and probability measures \(\mu _1,\ldots , \mu _r\) on \(\{0,1\}^k\) satisfying \({\text {Supp}}(\mu _j) \subset V_j\) for each j, and such that the entropy condition (3.4) holds, that is to say
for all subflags \({\mathscr {V}}' \leqslant {\mathscr {V}}\). We recall that
Remarks. (a) It is easy to see that \(\gamma _k({\mathscr {V}})\) always exists by considering the following example with \(c=0\). Take \(c_1=1\) and \(c_2=\cdots =c_{r+1}=0\) and recall that \(\dim (V_1/V_0)=1\). Suppose that \(V_1={\text {Span}}({\textbf{1}},\omega )\) with \(\omega \in \{0,1\}^k\). Thus, \(\textrm{e}({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})=1\) for any choice of \({\varvec{\mu }}\). If \(V_1'=V_1\) then likewise we have \(\textrm{e}({\mathscr {V}}',{{\textbf{c}}},{\varvec{\mu }})=1\), and if \(V_1'=V_0\) then \(\textrm{e}({\mathscr {V}}',{{\textbf{c}}},{\varvec{\mu }})={\mathbb {H}}_{\mu _1}(V_0)\). Now \(V_0+{\textbf{1}}\), \(V_0+\omega \) and \(V_0+({\textbf{1}}-\omega )\) are three different cosets. Taking
we have \(\textrm{e}({\mathscr {V}}',{{\textbf{c}}},{\varvec{\mu }})=\log 3\). Thus, (3.4) holds. As we shall see in this section, this choice of \(\mu _1\) is the optimal choice for a very general class of flags, including those of interest to us.
(b) A simple compactness argument shows that the supremum is realised, that is, there is a choice of \({{\textbf{c}}}\) and \({\varvec{\mu }}\) satisfying the entropy condition (3.4) and with \(c_{r+1}=\gamma _k({\mathscr {V}})\).
(c) As long as we can show that \(\gamma _k>0\) (which will be taken care of in Part IV), we can always find an optimal system \(({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\) that also has \(c_j>c_{j+1}\) for each j (cf. Lemma 5.4(a)).
7.1 A restricted optimisation problem
It turns out to be very useful to consider a restricted variant of the problem in which the entropy condition (7.1) is only required to be satisfied for certain “basic” subflags \({\mathscr {V}}'\), rather than all of them.
Definition 7.1
(Basic subflag) Given a flag \({\mathscr {V}}: \langle {\textbf{1}}\rangle = V_0 \leqslant V_1 \leqslant \cdots \leqslant V_r\), the basic subflags \({\mathscr {V}}'_{{\text {basic}}(m)}\) are the ones in which \(V'_i = V_{\min (m,i)}\), for \(m = 0,1,\ldots , r-1\) (note that when \(m = r\) we would recover \({\mathscr {V}}\) itself).
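In other words, the basic subflag \({\mathscr {V}}'_{{\text {basic}}(m)}\) increases with \({\mathscr {V}}\) up to level m and is constant afterwards. A one-line illustration (our hypothetical helper, for intuition only; it records which \(V_j\) occupies each position of the subflag):

```python
def basic_subflag_indices(r, m):
    """Indices j with V'_i = V_j for the basic subflag V'_basic(m):
    the subflag follows the original flag up to level m, then stays put."""
    return [min(m, i) for i in range(r + 1)]

# For a 3-step flag, basic_subflag_indices(3, 1) gives [0, 1, 1, 1]:
# V'_0 = V_0, V'_1 = V_1, V'_2 = V_1, V'_3 = V_1,
# while m = r returns [0, 1, 2, 3], recovering the full flag.
```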
Here is the restricted version of Problem 3.7. Recall that a flag is nondegenerate if the top space \(V_r\) is not contained in any of the subspaces \(\{x\in {\mathbb {R}}^k: x_i=x_j \}\). The restriction to nondegenerate flags ensures that the subsets \(A_1,\ldots ,A_k\) in our main problem are distinct.
Problem 7.2
Let \({\mathscr {V}}\) be a nondegenerate flag of distinct spaces in \({\mathbb {Q}}^k\). Define \(\gamma _k^{{\text {res}}}({\mathscr {V}})\) to be the supremum of all constants \(c\geqslant 0\) for which there are measures \(\mu _1,\ldots , \mu _r\) such that \({\text {Supp}}(\mu _i) \subset V_i\), and parameters
such that the restricted entropy condition
holds for all \(m = 0,1,\ldots , r-1\).
It is clear that
In general there is absolutely no reason to suppose that the two quantities are equal, since after all the restricted entropy condition (7.2) apparently only captures a small portion of the full condition (7.1).
Our reason for studying the restricted problem is that we do strongly believe that
One might think of this unproven assertion, on an intuitive level, in two (roughly equivalent) ways:
-
for those flags optimal for Problem 3.7, the critical cases of (7.1) are those for which \({\mathscr {V}}'\) is basic;
-
for those flags optimal for Problem 3.7, and for the critical choice of the \(c_i, \mu _i\), the restricted condition (7.2) in fact implies the more general condition (7.1).
7.2 The \(\rho \)-equations, optimal measures and optimal parameters
The definitions and constructions of this section will appear unmotivated at first sight. They are forced upon us by the analysis of Sect. 7.5 below.
Let the flag \({\mathscr {V}}\) be fixed.
It is convenient to call the intersection of a coset \(x + V_i\) with the cube \(\{0,1\}^k\) a cell at level i, and to denote the cells at various levels by the letter C. (The terminology comes from the fact that it can be useful to think of \(V_i\) as defining a \(\sigma \)-algebra (partition) on \(\{0,1\}^k\), the equivalence relation being given by \(\omega \sim \omega '\) iff \(\omega - \omega ' \in V_i\); however, we will not generally use the language of \(\sigma \)-algebras in what follows.)
If C is a cell at level i, then it will be a union of cells \(C'\) at level \(i-1\). These cells we call the children of C, and we write \(C \rightarrow C'\).
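The partition into cells is easy to compute mechanically. The following sketch (ours, not the paper's; all function names are hypothetical) groups the vertices of \(\{0,1\}^k\) into cosets of a rational subspace, doing exact linear algebra over \({\mathbb {Q}}\) with `fractions.Fraction`:

```python
from fractions import Fraction
from itertools import product

def reduce_vec(vec, echelon):
    """Reduce vec against rows kept in (pivot, row) echelon form; the
    result is a canonical representative of vec modulo their span."""
    vec = list(vec)
    for piv, row in echelon:
        if vec[piv] != 0:
            c = vec[piv] / row[piv]
            vec = [a - c * b for a, b in zip(vec, row)]
    return vec

def echelon_form(basis):
    """Bring a list of rational vectors into (pivot, row) echelon form."""
    ech = []
    for b in basis:
        v = reduce_vec([Fraction(x) for x in b], ech)
        for piv, x in enumerate(v):
            if x != 0:
                ech.append((piv, v))
                break
    return ech

def cells(k, basis):
    """Partition the vertices of {0,1}^k into the cells (x + V) cap {0,1}^k,
    where V is the rational span of `basis`: two vertices w, w' lie in the
    same cell iff w - w' is in V."""
    ech = echelon_form(basis)
    classes = {}
    for w in product((0, 1), repeat=k):
        key = tuple(reduce_vec([Fraction(x) for x in w], ech))
        classes.setdefault(key, []).append(w)
    return list(classes.values())
```

For \(V_0 = \langle {\textbf{1}}\rangle \) in \({\mathbb {Q}}^4\), `cells(4, [(1, 1, 1, 1)])` produces the cell \(\{{\textbf{0}},{\textbf{1}}\}\) together with fourteen singletons, \(2^4-1=15\) cells in total.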
Let \({\varvec{\rho }} = (\rho _1,\ldots , \rho _{r-1})\) be real parameters in (0, 1), and for each cell C define functions \(f^C({\varvec{\rho }})\) by the following recursive recipe:
-
If C has level 0, then \(f^C({\varvec{\rho }}) = 1\);
-
If C has level i, then
$$\begin{aligned} f^C({\varvec{\rho }}) = \sum _{C \rightarrow C'} f^{C'}({\varvec{\rho }})^{\rho _{i-1}},\end{aligned}$$(7.4)
with the convention that \(\rho _0 = 0\).
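The recursive recipe above translates directly into code. In the sketch below (ours; the nested-list tree encoding is an assumption, not the paper's notation), a cell is represented by the list of its children and \(f^C({\varvec{\rho }})\) is evaluated by the two rules, with the convention \(\rho _0 = 0\), so that a level-1 cell simply receives the number of its children as its value:

```python
def f(cell, rho, level):
    """Evaluate f^C(rho) recursively: a level-0 cell has f = 1, and a
    level-i cell sums its children's values raised to the power rho[i-1].
    `cell` is a nested list whose depth equals its level; rho[0] = 0."""
    if level == 0:
        return 1.0
    return sum(f(child, rho, level - 1) ** rho[level - 1] for child in cell)

# A level-2 cell with nine level-1 children containing 3, 2, 2, 2, 2,
# 1, 1, 1, 1 leaves respectively (the shape arising for the binary flag
# in Q^4 discussed below) gives f = 3^{rho_1} + 4*2^{rho_1} + 4:
gamma2 = [[None] * 3, [None] * 2, [None] * 2, [None] * 2, [None] * 2,
          [None], [None], [None], [None]]
```

Note that `f` depends only on the shape of the tree of cells, a point that recurs in Lemma 7.6 below.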
Write
for the cell at level i which contains \({\textbf{0}}\). Note that
Definition 7.3
(\(\rho \)-equations) The \(\rho \)-equations are the system of equations
We say that they have a solution if they are satisfied with \(\rho _1,\ldots ,\rho _{r-1} \in (0,1)\).
Example
Figure 1 illustrates these definitions for the so-called binary flag in \({\mathbb {Q}}^4\), which will be a key object of study from Sect. 9 onwards. Here
and \(V_2 = {\mathbb {Q}}^4\). The \(\rho \)-equations consist of a single equation, namely \(3^{\rho _1} + 4 \cdot 2^{\rho _1} +4 = 3^{\rho _1} e^2\). This has the unique solution \(\rho _1 \approx 0.306481\).
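The numerical value of \(\rho _1\) is easy to verify; here is a short bisection script (our check, not part of the paper):

```python
import math

def solve_rho1():
    """Find the root in (0,1) of 3^r + 4*2^r + 4 = 3^r * e^2, the single
    rho-equation of the binary flag in Q^4, by bisection."""
    g = lambda r: 3.0 ** r + 4 * 2.0 ** r + 4 - 3.0 ** r * math.e ** 2
    lo, hi = 0.0, 1.0          # g(0) > 0 > g(1), so a root lies between
    for _ in range(60):
        mid = (lo + hi) / 2
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(solve_rho1())  # approximately 0.306481
```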
In general the \(\rho \)-equations may or may not have a solution, but for flags \({\mathscr {V}}\) of interest to us, it turns out that they have a unique such solution. In this case, we make the following definition.
Definition 7.4
(Optimal measures) Suppose that \({\mathscr {V}}\) is a flag for which the \(\rho \)-equations have a solution. Then the corresponding optimal measure \(\mu ^*\) on \(\{0,1\}^k\) with respect to \({\mathscr {V}}\) is defined as follows: we set \(\mu ^*(\Gamma _r) = 1\), and
for any cell C at level \(i \geqslant 1\) and any child \(C \rightarrow C'\). We also set \(\mu ^*({\textbf{0}})=\mu ^*({\textbf{1}}) = \mu ^*(\Gamma _0)/2\). Lastly, we define the restrictions \(\mu ^*_j(\omega ) := \mu ^*(\Gamma _j)^{-1}\mu ^*(\omega )1_{\omega \in \Gamma _j}\) for \(j = 1,2,\ldots , r\) (thus \(\mu ^*_r = \mu ^*\)). We call these optimal measures (on \(\{0,1\}^k\), with respect to \({\mathscr {V}}\)). Finally, we write \({\varvec{\mu }}^* = (\mu _1^*,\mu _2^*,\ldots , \mu _r^*)\).
Remark 7.1
(a) By taking telescoping products of (7.6) for \(i = r, r-1,\ldots , 0\), we see that \(\mu ^*\) is uniquely defined on all cells at level 0, and these are the cell \(\{ {\textbf{0}}, {\textbf{1}}\}\) and singletons \(\{\omega \}\) for all \(\omega \in \{0,1\}^k{\setminus } \{{\textbf{0}},{\textbf{1}}\}\). Since we also specified \(\mu ^*({\textbf{0}})=\mu ^*({\textbf{1}}) = \mu ^*(\Gamma _0)/2\), we see that \(\mu ^*(\omega )\) is completely and uniquely determined by these rules, for all \(\omega \). In particular, the \(\rho \)-equations (7.5) are equivalent to
and thus
In addition, we have
(b) By construction, the measures \(\mu _j^*\) satisfy statements (d) and (e) of Lemma 5.3 for all j:
(c) At the moment, the term “optimal measure” is just a name. We will establish the sense in which (in situations of interest) the measures \(\mu ^*_j\) are optimal in Proposition 7.7 below.
(d) Note that \({\varvec{\mu }}^*\) and \(\mu ^*\) are two different (but closely related) objects. The former is an r-tuple of measures \(\mu _j^*\), all of which are induced from the single measure \(\mu ^*\).
Definition 7.5
(Optimal parameters) Suppose that \({\mathscr {V}}\) is a flag for which the \(\rho \)-equations have a solution. Let \(\mu ^*\) be the corresponding optimal measure on \(\{0,1\}^k\) with respect to \({\mathscr {V}}\). Suppose additionally that
for \(m = 0,1,\ldots , r-1\). Then the corresponding optimal parameters with respect to \({\mathscr {V}}\) and the solution \(\varvec{\rho }\) are the unique choice of
if it exists, such that
The equations (7.11), written out in full, are
\(m = 0,1,\ldots , r-1.\)
By (7.10), this uniquely determines \(c^*_{m+1} \in {\mathbb {R}}\) in terms of \(c_{m+2}^*,\ldots ,c^*_{r+1}\). Hence, we recursively determine \(c^*_1,\ldots ,c^*_r\) in terms of \(c^*_{r+1}\). Since we must further have \(c^*_1=1\), this implicitly determines \(c^*_{r+1}\) as well, and thus the entire vector \({{\textbf{c}}}^*\).
Remark. By Lemma 5.3 (ii), a stronger form of the condition (7.10) is required in order for the entropy gap condition to hold, and so in practice this assumption is not at all restrictive.
We conclude this subsection with a characterization of the optimal measure \(\mu ^*\) and parameters \({{\textbf{c}}}^*\). Given an r-step flag \({\mathscr {V}}\), there is an associated rooted tree \({\mathscr {T}}({\mathscr {V}})\), which captures the structure of the cells at different levels \(0,\ldots ,r-1\). In particular, this tree always has exactly \(2^k-1\) leaves at level 0, corresponding to the cell \(\Gamma _0=\{{\textbf{0}},{\textbf{1}}\}\) and the singletons \(\{\omega \}\) for each \(\omega \in \{0,1\}^k {\setminus } \{{\textbf{0}},{\textbf{1}}\}\).
Lemma 7.6
The optimal constant \(\gamma _k^{{\text {res}}}({\mathscr {V}})\), associated measures \(\mu ^*_i(C)\) and optimal parameters \(c_i^*\) depend only on the tree \({\mathscr {T}}({\mathscr {V}})\) and the sequence of dimensions \(\dim (V_j)\), \(0\leqslant j\leqslant r\).
Proof
Let \({\mathscr {V}}\) and \({\widetilde{{\mathscr {V}}}}\) be different flags with the same tree structure, that is, \({\mathscr {T}}({\mathscr {V}})\) is isomorphic to \({\mathscr {T}}({\widetilde{{\mathscr {V}}}})\), and with the same sequence of dimensions, so that \(\dim (V_j)=\dim ({\widetilde{V}}_j)\) for each j. By an easy induction on the level and the definition of \(f^C(\varvec{\rho })\), if \(C\in {\mathscr {T}}({\mathscr {V}})\) and \({\tilde{C}}\in {\mathscr {T}}({\widetilde{{\mathscr {V}}}})\) correspond, we find that \(f^C(\varvec{\rho }) = f^{{\tilde{C}}}(\varvec{\rho })\). The statements now follow from Definitions 7.4 and 7.5. \(\square \)
7.3 Solution of the optimisation problem: statement
Here is the main result of this section, which explains the introduction of the various concepts above, as well as their names.
Proposition 7.7
Suppose that \({\mathscr {V}}: {\textbf{1}} = V_0 \leqslant V_1 \leqslant \cdots \leqslant V_r \leqslant {\mathbb {Q}}^k\) is a nondegenerate flag such that \(\dim (V_1/V_0)=1\) and the \(\rho \)-equations have a solution. Let \({\varvec{\mu }}^*\) be the corresponding optimal measures, and suppose that the corresponding optimal parameters \({{\textbf{c}}}^*\) exist. Then
Moreover, the optimal measures \({\varvec{\mu }}^*\) and optimal parameters \({{\textbf{c}}}^*\) provide the solution to Problem 7.2; in particular, \(c^*_{r+1}\) is precisely the right-hand side of (7.13).
For this result to be of any use, we need methods for establishing, for flags \({\mathscr {V}}\) of interest, that the \(\rho \)-equations have a solution, and also that the optimal parameters exist. The former is a very delicate matter, highly dependent on the specific structure of the flags of interest. Once this is sorted out, the latter problem is less serious, at least in situations relevant to us.
7.4 Linear forms in entropies
In the next section we will prove Proposition 7.7. In this section we isolate some lemmas from the proof.
Let \({\mathscr {V}}: \langle {\textbf{1}} \rangle = V_0 \leqslant \cdots \leqslant V_r \leqslant {\mathbb {Q}}^k\) be a flag. We use the terminology of cells C at level i, introduced at the beginning of Sect. 7.2.
Lemma 7.8
Let \({\textbf{y}}= (y_0,\ldots , y_{r-1})\) be real numbers with the property that all the partial sums \(y_{< i} := y_0 + \cdots + y_{i-1}\) are positive. If C is a cell (at some level i), then we write
where the supremum is over all probability measures \(\mu _C\) supported on C.
-
(a)
The quantities \(h^C({\textbf{y}})\) are completely determined by the following rules:
-
If C has level 0, then \(h^C({\textbf{y}}) = 0\);
-
If C has level i, then
$$\begin{aligned} h^C({\textbf{y}}) = y_{< i}\log \Big (\sum _{C':\, C \rightarrow C'} e^{h^{C'}({\textbf{y}})/y_{< i}}\Big ). \end{aligned}$$(7.15)
-
(b)
For any C, the maximum in (7.14) occurs for a unique measure \(\mu ^*_{C,{\textbf{y}}}\). Furthermore, all of the \(\mu ^*_{C,{\textbf{y}}}\) are restrictions of the “top” measure \(\mu ^*_{{\textbf{y}}} := \mu ^*_{\Gamma _r,{\textbf{y}}}\), that is to say \(\mu ^*_{C,{\textbf{y}}}(x) = \mu ^*_{{\textbf{y}}}(x)/\mu ^*_{{\textbf{y}}}(C)\) for all \(x \in C\), and
$$\begin{aligned} \frac{\mu ^*_{{\textbf{y}}}(C')}{\mu ^*_{{\textbf{y}}}(C)} = \frac{e^{h^{C'}({\textbf{y}}/y_{<i})}}{e^{h^{C}({\textbf{y}}/y_{<i})}}. \end{aligned}$$(7.16)
Remark
As will be apparent from the proof, we do not use the linear structure of the cells C (that is, the fact that they come from cosets). We leave it to the reader to formulate a completely general version of this lemma in which the cells at level i are the atoms in a \(\sigma \)-algebra \({\mathscr {F}}_i\), with \({\mathscr {F}}_{i}\) being a refinement of \({\mathscr {F}}_{i+1}\) for all i.
Proof
We prove both parts simultaneously. Let us temporarily write \({{\tilde{h}}}^C({\textbf{y}})\) for the function defined by (7.15); thus the aim is to prove that \(h^C({\textbf{y}}) = {{\tilde{h}}}^C({\textbf{y}})\), where \(h^C({\textbf{y}})\) is defined in (7.14). We do this by induction on i, the \(i=0\) case being trivial since, in this case, all the entropies \({\mathbb {H}}_{\mu _C}(V_m)\) are zero: each cell of level 0 lies in a single coset mod \(V_0\), and thus in a single coset mod \(V_m\) for \(m=0,1,\ldots ,r-1\).
Suppose now that we know the result for cells of level \(i -1\). Note that both \(h^C\) and \({{\tilde{h}}}^C\) satisfy a homogeneity property
This is obvious for \(h^C\), and can be proven very easily for \({{\tilde{h}}}^C\) by induction. Therefore we may assume that \(y_{< i} = 1\). This does not affect the measure \(\mu ^*_{{\textbf{y}}}\), which does not depend on the scaling of the parameters \(y_m\).
Suppose that C is a cell at level i. A probability measure \(\mu _C\) on C is completely determined by probability measures \(\mu _{C'}\) on the children \(C'\) of C (at level \(i - 1\)) together with the probabilities \(\mu _C(C')\), which must sum to 1, with the relation being that \(\mu _{C'}(x) = \mu _C(x)/\mu _C(C')\) for \(x\in C'\).
Suppose that \(0\leqslant m < i\). Let the random variables X, Y be random cosets of \(V_m, V_{i-1}\) respectively, sampled according to the measure \(\mu _C\). Then X determines Y and so, by Lemma B.5, \({\mathbb {H}}(X,Y) = {\mathbb {H}}(X)\). The chain rule for entropy, Lemma B.4, then yields
Translated back to the language we are using, this implies that
Therefore
(Here we used our assumption that \(y_{< i} = 1\).) Since \({\mathbb {H}}_{\mu _C}(V_m) = 0\) for \(m \geqslant i\), and \({\mathbb {H}}_{\mu _{C'}}(V_m) = 0\) for \(m \geqslant i - 1\), we may extend the sums over all \(m\in \{0,1,\ldots ,r-1\}\) thereby obtaining
Since the \(\mu _{C'}\) can be arbitrary probability measures, and \({\mathbb {H}}_{\mu _C}(V_{i-1})\) depends only on the value of \(\mu _C(C')\), it follows from the inductive hypothesis that
with equality when going from (7.18) to (7.19) when \(\mu _{C'} = \mu ^*_{C',{\textbf{y}}}\) for all \(C'\). Applying Lemma B.3 with the \(p_j\) being the \(\mu _C(C')\) and the \(a_j\) being the \({{\tilde{h}}}^{C'}({\textbf{y}})\), and noting that \({\mathbb {H}}_{\mu _C}(V_{i-1}) = {\mathbb {H}}({\textbf{p}})\) (where \({\textbf{p}} = (p_1,p_2,\ldots )\)), it follows that
In addition, Lemma B.3 implies that equality occurs in (7.20) precisely when \(p_j = e^{a_j}/\sum _{j'} e^{a_{j'}}\), that is to say when
(Here we used again that \(y_{<i}=1\).) Recalling that \(\mu _{C'} = \mu ^*_{C',{\textbf{y}}}\) for all \(C'\), we see that the measure \(\mu _C\) for which equality occurs in (7.17) is the restriction of \(\mu ^*_{{\textbf{y}}} = \mu ^*_{\Gamma _r,{\textbf{y}}}\) to C. This completes the inductive step. \(\square \)
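The fact from Lemma B.3 used here is the standard Gibbs variational principle: over probability vectors \({\textbf{p}}\), the quantity \({\mathbb {H}}({\textbf{p}}) + \sum _j p_j a_j\) is maximised at the "softmax" weights \(p_j = e^{a_j}/\sum _{j'} e^{a_{j'}}\), with maximum value \(\log \sum _j e^{a_j}\). As a quick numerical illustration (not part of the proof, and using arbitrary illustrative values of the \(a_j\)):

```python
import math, random

def score(p, a):
    # H(p) + sum_j p_j * a_j, with natural-log entropy
    return sum(-x * math.log(x) for x in p if x > 0) + sum(x * y for x, y in zip(p, a))

random.seed(0)
a = [0.3, -1.2, 2.0, 0.5]
Z = sum(math.exp(x) for x in a)
softmax = [math.exp(x) / Z for x in a]
best = score(softmax, a)
# the optimum value is log(sum of e^{a_j})
assert abs(best - math.log(Z)) < 1e-12

# any other probability vector scores no better
for _ in range(1000):
    w = [random.random() for _ in a]
    s = sum(w)
    p = [x / s for x in w]
    assert score(p, a) <= best + 1e-9
```
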
7.5 Solution of the optimisation problem: proof
This section is devoted to the proof of Proposition 7.7. Strictly speaking, for our main theorems we only need a lower bound on \(\gamma _k^{{\text {res}}}({\mathscr {V}})\), and for this it suffices to show that \(c_{r+1}^*\) is given by the right-hand side of (7.13). This could, in principle, be phrased as a calculation, but it would look complicated and unmotivated. Instead, we present it in the way we discovered it, by showing that the RHS of (7.13) is an upper bound on \(\gamma _k^{{\text {res}}}({\mathscr {V}})\), and then observing that equality does occur when \(\mu = \mu ^*\) is the optimal measure (Definition 7.4) and \({{\textbf{c}}}= {{\textbf{c}}}^*\) the optimal parameters (Definition 7.5). We establish this upper bound using the duality argument from linear programming and Lemma 7.8.
To ease the notation, we use the shorthand \(d_i := \dim (V_i)\) throughout this subsection. Let us, then, consider the restricted optimisation problem, namely Problem 7.2. The condition (7.2) may be rewritten as
for \(m = 0,1,\ldots , r-1\). Therefore for any choice of “dual variables” \({\textbf{y}}= (y_0,y_1,\ldots ,\) \(y_{r-1})\), \(y_0,\ldots , y_{r-1} \geqslant 0\), we have
which, upon rearranging, gives
where
for \(j = 1,\ldots , r\), and
Since the \(c_j - c_{j+1}\), \(j = 1,\ldots , r\), and \(c_{r+1}\) are nonnegative and sum to 1, this implies that
By Lemma 7.8, this implies that
where
for \(j = 1,\ldots , r\), and \(\mu ^*_{\Gamma _j,{\textbf{y}}}\) is the measure \(\nu \) supported on \(\Gamma _j = V_j \cap \{0,1\}^k\) for which the sum \(\sum _m y_m {\mathbb {H}}_{\nu }(V_m)\) is maximal, as defined in Lemma 7.8.
Now we specify a choice of \({\textbf{y}}\). To do this, we make a change of variables, defining \(\rho _i = y_{< i}/y_{< i+1}\). Note that for fixed \(y_0 > 0\), choices of \(y_1,\ldots , y_{r-1}> 0\) are in one-to-one correspondence with choices of \(\rho _1,\ldots , \rho _{r-1}\) with \(0< \rho _i < 1\). We must then have that
for the cells C at level i, which may easily be proven by induction on the level i, using the defining equations for the \(h^C\) and \(f^C\) (see (7.15), (7.4) respectively).
Now choose the \(\rho _i\) to satisfy the \(\rho \)-equations (7.5). By virtue of (7.27), the j-th \(\rho \)-equation
with \(j\in \{1,2,\ldots ,r-1\}\) is equivalent to
with \(E'_j({\textbf{y}})\) defined as in (7.26) above.
Recall that \(d_1-d_0=\dim (V_1/V_0)=1\). Thus, if we choose
a short calculation confirms that
With this choice of \({\textbf{y}}\) we therefore have, from (7.28) with \(j = 1,\ldots , r-1\), (7.29) and (7.25),
In the above analysis, the \(\mu _i\) and the \(c_i\) were arbitrary subject to the conditions of Problem 7.2, thus \({\text {Supp}}(\mu _i) \subset V_i\) and \(1 = c_1> c_2> \cdots > c_{r+1}\). Therefore, recalling the definition of \(\gamma _k^{{\text {res}}}({\mathscr {V}})\) (see Problem 7.2), we have proven that
Proposition 7.7 asserts that equality occurs in this bound when \(c_j = c^*_j\) and \(\mu _j = \mu ^*_j\), where \({{\textbf{c}}}^* = (c_1^*,\ldots , c^*_{r+1})\) are the optimal parameters defined in Definition 7.5, and \(\mu ^*\) and its restrictions \(\mu ^*_j\) are the optimal measures defined in Definition 7.4. To establish this, we must go back through the argument showing that equality occurs at every stage with these choices.
First note that (7.21) is equivalent (as we stated at the time) to \(\textrm{e}({\mathscr {V}}'_{{\text {basic}}(m)},{{\textbf{c}}},{\varvec{\mu }}) \geqslant \textrm{e}({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\). The fact that equality occurs here when \({{\textbf{c}}}= {{\textbf{c}}}^*\) and \({\varvec{\mu }}= {\varvec{\mu }}^*\) is essentially the definition of the optimal parameters \({{\textbf{c}}}^*\) (Definition 7.5). That equality occurs in (7.22) and (7.23) is then automatic.
Working from the other end of the proof, the choice of \({\textbf{y}}\) was made so that \(E'_1({\textbf{y}}) = \cdots = E'_r({\textbf{y}}) = E_{r+1}({\textbf{y}})\). We claim that, with this choice of \({\textbf{y}}\),
By (7.16), it suffices to check that
This follows immediately from (7.6) and (7.27).
Since \(\mu ^*_j\) is defined to be the restriction of \(\mu ^*\) to \(\Gamma _j\), it follows from (7.31) that \(\mu ^*_j = \mu ^*_{\Gamma _j,{\textbf{y}}}\), and hence that \(E_j({\textbf{y}}) = E'_j({\textbf{y}})\) for \(j = 1,\ldots , r\).
Thus all \(2r + 1\) of the quantities \(E'_j({\textbf{y}})\) (\(j = 1,\ldots , r\)) and \(E_j({\textbf{y}})\) (\(j = 1,\ldots , r+1\)) are equal. It follows from this and the fact that equality occurs in (7.23) that equality occurs in (7.24), (7.25) and (7.30) as well. This concludes the proof of Proposition 7.7. \(\square \)
8 The strict entropy condition
8.1 Introduction
Fix an r-step, nondegenerate flag \({\mathscr {V}}\). In the previous section, we studied a restricted optimisation problem (Problem 7.2) asking for the supremum of \(c_{r+1}\) when ranging over all systems \(({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\) satisfying the “restricted entropy condition”
The aim of the present section is two-fold: we wish to establish, under general conditions, that an “optimal system” with respect to (8.1) satisfies the more general entropy condition
In addition, we want to show that if we slightly perturb such a system, we may guarantee the strict entropy condition (3.5), which is a version of (8.2) with strict inequalities for all proper subflags \({\mathscr {V}}'\) of \({\mathscr {V}}\).
Before stating our result, we need to define the notion of the automorphism group of a flag.
Definition 8.1
(Automorphism group) For a permutation \(\sigma \in S_k\) and \(\omega =(\omega _1,\ldots ,\omega _k)\in {\mathbb {Q}}^k\), denote by \(\sigma \omega \) the usual coordinate permutation action \(\sigma \omega = (\omega _{\sigma (1)},\ldots ,\omega _{\sigma (k)})\). The automorphism group \({\text {Aut}}({\mathscr {V}})\) is the group of all \(\sigma \) that satisfy \(\sigma V_i=V_i\) for all i.
Proposition 8.2
Let \({\mathscr {V}}\) be an r-step, nondegenerate flag of distinct spaces. Assume that the \(\rho \)-equations (7.5) have a solution, and define the optimal measures \({\varvec{\mu }}^*\) on \(\{0,1\}^k\) as in Definition 7.4. Furthermore, assume that:
(a) no intermediate subspace is fixed by \({\text {Aut}}({\mathscr {V}})\), that is to say there is no space W that is invariant under the action of \({\text {Aut}}({\mathscr {V}})\) and such that \(V_{i-1}< W <V_i\) (the inclusions being strict);
(b) the optimal parameters \({{\textbf{c}}}^*\) exist and they are distinct and positive, that is to say the system of Eq. (7.12) has a unique solution \({{\textbf{c}}}^*\) satisfying \(1=c_1^*> c_2^*> \cdots> c_{r+1}^*>0\);
(c) the following “positivity inequalities” hold:
(i) \({\mathbb {H}}_{\mu ^*_{m+1}}(V_m)>\dim (V_{m+1}/V_m)\) for \(0\leqslant m\leqslant r-1\);
(ii) \({\mathbb {H}}_{\mu ^*_i}(V_{m-1})-{\mathbb {H}}_{\mu ^*_i}(V_m) < \dim (V_m/V_{m-1})\) for \(1\leqslant m<i\leqslant r\).
Then, for every \(\varepsilon >0\), there exists a perturbation \({\tilde{{{\textbf{c}}}}}\) of \({{\textbf{c}}}^*\) such that \(1 = {{\tilde{c}}}_1> {{\tilde{c}}}_2>\cdots > {{\tilde{c}}}_{r+1}\geqslant c^*_{r+1}-\varepsilon \) and such that we have the strict entropy condition
We assume throughout the rest of the section that (a), (b) and (c) of Proposition 8.2 are satisfied, and we now fix the system \(({\mathscr {V}},{{\textbf{c}}}^*,{\varvec{\mu }}^*)\). For notational brevity in what follows, we write
Our strategy is as follows. First, we show the weaker “unperturbed” statement that
noting that we have strict inequality for certain subflags \({\mathscr {V}}'\) along the way. Then, in Sect. 8.8, we show how to perturb \({{\textbf{c}}}^*\) to \({\tilde{{{\textbf{c}}}}}\) so that the strict inequality (8.3) is satisfied. We also sketch a second way of effecting the perturbation which is in a sense more robust, but which in essence requires a perturbation of the whole proof of (8.4).
8.2 Analysis of non-basic flags
We turn now to the task of proving (8.4). We will prove it for progressively wider sets of subflags \({\mathscr {V}}'\), each time using the previous statement. In order, we will prove it for subflags \({\mathscr {V}}'\) which we call:
(a) semi-basic: flags
$$\begin{aligned} {\mathscr {V}}':V_0\leqslant V_1 \leqslant V_2 \leqslant \cdots \leqslant V_{m-1} \leqslant \cdots \leqslant V_{m-1} \leqslant V_m \leqslant \cdots \leqslant V_m \end{aligned}$$
with \(m\geqslant 1\) (that is, \({\mathscr {V}}'\) is like a basic flag, but there can be more than one copy of \(V_{m-1}\));
(b) standard: each \(V'_i\) is one of the spaces \(V_j\);
(c) invariant: this means that \(\sigma V'_i = V'_i\) for all automorphisms \(\sigma \in {\text {Aut}}({\mathscr {V}})\) and all i;
(d) general subflags, i.e. we assume no restriction on the \(V'_i\) other than that \(V'_i \leqslant V_i\).
Note that a semi-basic flag is standard, a standard flag is invariant, and of course an invariant flag is general.
We introduce some notation for standard flags. Let \(J \subset {\mathbb {N}}_0^r\) be the set of all r-tuples \(\varvec{j} = (j_1,\ldots , j_r)\) such that \(j_1 \leqslant \cdots \leqslant j_r\) and \(j_i \leqslant i\) for all i. Then we define the flag \({\mathscr {V}}'_{\varvec{j}} = {\mathscr {V}}'_{(j_1,\ldots , j_r)}\) to be the one with \(V'_i = V_{j_i}\). This is a standard flag, and conversely every standard flag is of this form. If we define \({\text {basic}}(m)\) to be the tuple \((1,2,\ldots ,m,m,\ldots ,m)\) with \(j_i = \min (i,m)\), then \({\text {basic}}(m) \in J\), and \({\mathscr {V}}'_{{\text {basic}}(m)}\) agrees with our previous notation.
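As a small combinatorial sanity check (taking \({\text {basic}}(m)\) to be the tuple with \(j_i = \min (i,m)\), consistent with the boundary cases of Definition 8.4 below), one can enumerate J for small r; the count appears to match the Catalan numbers, a standard ballot-sequence fact which we do not need in the paper.

```python
from itertools import product
from math import comb

def J(r):
    # nondecreasing r-tuples (j_1,...,j_r) with 0 <= j_i <= i
    return [j for j in product(range(r + 1), repeat=r)
            if all(j[i] <= i + 1 for i in range(r))
            and all(j[i] <= j[i + 1] for i in range(r - 1))]

def basic(m, r):
    # the standard tuple (1,2,...,m,m,...,m), i.e. j_i = min(i, m)
    return tuple(min(i, m) for i in range(1, r + 1))

for r in range(1, 5):
    Jr = J(r)
    # |J| equals the Catalan number C_{r+1}
    assert len(Jr) == comb(2 * (r + 1), r + 1) // (r + 2)
    for m in range(r + 1):
        assert basic(m, r) in Jr
```
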
8.3 Semi-basic subflags
In this subsection we prove the following result, establishing that (8.4) holds for semi-basic subflags, and with strict inequality for those which are not basic.
Lemma 8.3
(Assuming that (a), (b) and (c) of Proposition 8.2 hold) we have \(\textrm{e}({\mathscr {V}}') > \textrm{e}({\mathscr {V}})\) for all non-basic, semi-basic flags \({\mathscr {V}}'\).
We begin by setting a small amount of notation for semi-basic flags. We note that the idea of a semi-basic flag, which looks rather ad hoc, will only be used here and in Sect. 8.5.
Definition 8.4
(Semi-basic flags that are not basic) Suppose that \(1 \leqslant m \leqslant r - 1\) and that \(m \leqslant s \leqslant r-1\). Then we define the element \({\text {semi}}(m,s) \in J\) to be the tuple \(\varvec{j} = (1,2,\ldots , m-1, m-1, \ldots , m-1, m,\ldots , m)\) given by \(j_i = i\) for \(i \leqslant m-1\), \(j_i = m-1\) for \(m \leqslant i \leqslant s\) and \(j_i = m\) for \(i > s\).
It is convenient and natural to extend the notation to \(s = m-1\) and \(s = r\), by defining \({\text {semi}}(m,m-1) := {\text {basic}}(m)\) and \({\text {semi}}(m,r) := {\text {basic}}(m-1)\), consistently with the formula above.
One can think of the semi-basic flags as interpolating between the basic flags.
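The boundary cases of this interpolation can be checked mechanically; the following sketch (with \({\text {basic}}(m)\) taken to be the tuple \(j_i = \min (i,m)\)) verifies that \(s = m-1\) recovers \({\text {basic}}(m)\) and \(s = r\) recovers \({\text {basic}}(m-1)\).

```python
def basic(m, r):
    # the standard tuple with j_i = min(i, m)
    return tuple(min(i, m) for i in range(1, r + 1))

def semi(m, s, r):
    # j_i = i for i <= m-1, j_i = m-1 for m <= i <= s, j_i = m for i > s
    return tuple(i if i <= m - 1 else (m - 1 if i <= s else m)
                 for i in range(1, r + 1))

for r in range(2, 7):
    for m in range(1, r):
        assert semi(m, m - 1, r) == basic(m, r)   # s = m-1 gives basic(m)
        assert semi(m, r, r) == basic(m - 1, r)   # s = r gives basic(m-1)
```
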
Example
When \(r = 3\) there are three semi-basic flags \({\mathscr {V}}_{\varvec{j}}\) that are not basic, corresponding to \(\varvec{j} = {\text {semi}}(1,1) = (0,1,1)\), \(\varvec{j} = {\text {semi}}(1,2) = (0,0,1)\) and \(\varvec{j} = {\text {semi}}(2,2) = (1,1,2)\).
Proof of Lemma 8.3
Assume that \({\mathscr {V}}'\) is semi-basic but not basic. We will show that
for \(m \leqslant s \leqslant r-1\). Since \({\mathscr {V}}'_{{\text {semi}}(m ,r)} = {\mathscr {V}}'_{{\text {basic}}(m-1)}\) is basic, this establishes Lemma 8.3.
To prove (8.6), we simply compute that
when \(m \leqslant s \leqslant r-2\), and
In both cases, the result follows from part (ii) of condition (c) of Proposition 8.2; in the second case, we also need to use our assumption that \(c_{r+1}^*\geqslant 0\). \(\square \)
8.4 Submodularity inequalities
To proceed further, we make heavy use of a submodularity property of the expressions \(\textrm{e}()\).
Suppose that \({\mathscr {V}}', {\tilde{{\mathscr {V}}}}'\) are two subflags of \({\mathscr {V}}\). We can define the sum \({\mathscr {V}}' + {\tilde{{\mathscr {V}}}}'\) and intersection \({\mathscr {V}}' \cap {\tilde{{\mathscr {V}}}}'\) by
and
Both of these are indeed subflags of \({\mathscr {V}}\).
Lemma 8.5
We have
Proof
We first note that the entropies \({\mathbb {H}}_{\mu }(W)\) satisfy a submodularity inequality. Namely, if \(W_1, W_2\) are subspaces of \({\mathbb {Q}}^k\) and \(\mu \) is a probability measure then
To prove this, consider the following three random variables:
- X is a random coset of \(W_1 + W_2\), sampled according to the measure \(\mu \);
- Y is a random coset of \(W_1\), sampled according to the measure \(\mu \);
- Z is a random coset of \(W_2\), sampled according to the measure \(\mu \).
Then, more-or-less by definition,
Note also that Y determines X and so \({\mathbb {H}}(Y) = {\mathbb {H}}(X,Y)\), and similarly \({\mathbb {H}}(Z) = {\mathbb {H}}(X,Z)\). Finally, (Y, Z) uniquely defines a random coset of \(W_1 \cap W_2\), and so
The inequality to be proven, (8.7), is therefore equivalent to
which is a standard entropy inequality (Lemma B.6; usually known as “submodularity of entropy” or “Shannon’s inequality” in the literature).
Lemma 8.5 is essentially an immediate consequence of (8.7) and the formula
(It is very important that this formula holds with equality, as compared to (8.7), which holds only with an inequality.) \(\square \)
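The entropy inequality invoked above, \({\mathbb {H}}(X,Y) + {\mathbb {H}}(X,Z) \geqslant {\mathbb {H}}(X) + {\mathbb {H}}(X,Y,Z)\), can be tested numerically for arbitrary joint distributions; the following sketch (purely illustrative, with randomly generated joint laws) checks it over a few hundred trials.

```python
import itertools, math, random

def H(pmf):
    # Shannon entropy (natural log) of a pmf given as {outcome: probability}
    return -sum(p * math.log(p) for p in pmf.values() if p > 0)

def marginal(joint, keep):
    # marginal pmf of the coordinates listed in `keep`
    out = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out

random.seed(1)
outcomes = list(itertools.product(range(3), repeat=3))
for _ in range(200):
    # a random joint pmf of (X, Y, Z), each variable taking 3 values
    w = [random.random() for _ in outcomes]
    s = sum(w)
    joint = {o: x / s for o, x in zip(outcomes, w)}
    lhs = H(marginal(joint, (0, 1))) + H(marginal(joint, (0, 2)))
    rhs = H(marginal(joint, (0,))) + H(joint)
    # submodularity of entropy: H(X,Y) + H(X,Z) >= H(X) + H(X,Y,Z)
    assert lhs >= rhs - 1e-9
```
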
This has the following immediate corollary when applied to standard subflags. Here, the \(\max \) and \(\min \) are taken coordinatewise.
Corollary 8.6
Suppose that \(\varvec{j}_1, \varvec{j}_2 \in J\). Then
8.5 Standard subflags
Now we extend the result of Sect. 8.3 to all standard subflags.
Lemma 8.7
(Assuming that (a), (b) and (c) of Proposition 8.2 hold) we have \(\textrm{e}({\mathscr {V}}') > \textrm{e}({\mathscr {V}})\) for all standard, non-basic subflags \({\mathscr {V}}'\leqslant {\mathscr {V}}\).
Proof
Let \(\varvec{j} \in J\) with \(\varvec{j}\) non-basic, and let \({\mathscr {V}}'={\mathscr {V}}'_{\varvec{j}}\). Then \(r\geqslant 3\), since when \(r\leqslant 2\) all standard flags are basic. We proceed by induction on \(\Vert \varvec{j}\Vert _\infty \), the case \(\Vert \varvec{j}\Vert _\infty =1\) being trivial, since then \({\mathscr {V}}'\) is semi-basic and we may invoke Lemma 8.3. Now suppose we have proved \(\textrm{e}({\mathscr {V}}') > \textrm{e}({\mathscr {V}})\) for all non-basic standard flags \({\mathscr {V}}'={\mathscr {V}}'_{\varvec{j}}\) with \(\Vert \varvec{j}\Vert _\infty < m\), and let \(\varvec{j} \in J\) be non-basic with \(\Vert \varvec{j}\Vert _\infty =m\). We apply Corollary 8.6 with \(\varvec{j}_1 = \varvec{j}\) and \(\varvec{j}_2 = {\text {basic}}(j_r - 1)\). Noting that \(\max (\varvec{j}, {\text {basic}}(j_r - 1)) = {\text {semi}}(j_r, s)\), where s is the largest index such that \(j_s < j_r\), we see that
where
Suppose that both of the flags on the right of (8.8) are basic. If \({\text {semi}}(j_r, s)\) is basic then it must be \({\text {basic}}(j_r)\), which means that \(s = j_r - 1\). But then \(\varvec{j}_* = (j_1,\ldots , j_s, j_r - 1,\ldots , j_r - 1)\) which, if it is basic, must be \({\text {basic}}(j_r - 1)\); this then implies that \(j_i = i\) for \(1 \leqslant i \leqslant s\), and hence that \(\varvec{j} = {\text {basic}}(j_r)\), a contradiction. Thus, at least one of the two flags \(\varvec{j}_*, {\text {semi}}(j_r,s)\) on the right of (8.8) is not basic. Since \(\Vert \varvec{j}_* \Vert _{\infty } < \Vert \varvec{j} \Vert _{\infty }=m\), the induction hypothesis together with Lemma 8.3 implies that \(\textrm{e}({\mathscr {V}}') > \textrm{e}({\mathscr {V}})\), as desired. \(\square \)
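The coordinatewise max/min identities used in this induction are easy to verify on a concrete example. The sketch below (with \({\text {basic}}\) and \({\text {semi}}\) as in Definition 8.4 and its boundary cases, and an illustrative choice of \(\varvec{j}\)) checks them for \(r = 4\).

```python
def basic(m, r):
    # the standard tuple with j_i = min(i, m)
    return tuple(min(i, m) for i in range(1, r + 1))

def semi(m, s, r):
    # j_i = i for i <= m-1, j_i = m-1 for m <= i <= s, j_i = m for i > s
    return tuple(i if i <= m - 1 else (m - 1 if i <= s else m)
                 for i in range(1, r + 1))

r = 4
j = (1, 1, 2, 3)          # a non-basic element of J
jr = j[-1]                # j_r = 3
b = basic(jr - 1, r)      # basic(2) = (1, 2, 2, 2)
jmax = tuple(max(a, c) for a, c in zip(j, b))
jmin = tuple(min(a, c) for a, c in zip(j, b))
s = max(i for i in range(1, r + 1) if j[i - 1] < jr)  # largest index with j_s < j_r
assert s == 3
assert jmax == semi(jr, s, r)               # coordinatewise max is semi(j_r, s)
assert jmin == j[:s] + (jr - 1,) * (r - s)  # j_* = (j_1,...,j_s, j_r-1,...,j_r-1)
```
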
8.6 Invariant subflags
We now extend our results to all invariant subflags, though without the strict inequality.
Lemma 8.8
(Assuming that (a), (b) and (c) of Proposition 8.2 hold) we have \(\textrm{e}({\mathscr {V}}') \geqslant \textrm{e}({\mathscr {V}})\) for all invariant subflags \({\mathscr {V}}'\leqslant {\mathscr {V}}\).
Proof
We associate to \({\mathscr {V}}'\) a pair \((i,\ell )\) of integers with \(i\geqslant \ell \), which we call the signature, in the following manner. If \({\mathscr {V}}'\) is standard, then set \((i,\ell )=(-1,-1)\). Otherwise, let i be maximal so that \(V'_i\) is not a standard space \(V_t\), and then let \(\ell \) be minimal such that \(V'_i \leqslant V_{\ell }\). The fact that \(\ell \leqslant i\) is immediate from the definition of a subflag. We put a partial ordering on signatures as follows: \((i', \ell ') \preceq (i,\ell )\) iff \(i' < i\), or \(i' = i\) and \(\ell ' \leqslant \ell \). We proceed by induction on the pair \((i,\ell )\) with respect to this ordering, the case \((i,\ell )=(-1,-1)\) being handled by Lemma 8.7.
For the inductive step, suppose \({\mathscr {V}}'\) is nonstandard with signature \((i,\ell )\). By submodularity,
where
Suppose that \({\mathscr {V}}_1, {\mathscr {V}}_2\) have signatures \((i_1,\ell _1),(i_2,\ell _2)\), respectively. We show that
Both \({\mathscr {V}}_1\) and \({\mathscr {V}}_2\) are invariant flags. Thus, if (8.10) holds, then both flags on the right-hand side of (8.9) have strictly smaller signature than \({\mathscr {V}}'\), and the lemma follows by induction.
Finally, we prove (8.10). Note that if \(j>i\), then \(V_j'\) is a standard space \(V_m\) and thus so are \(({\mathscr {V}}_1)_j\) and \(({\mathscr {V}}_2)_j\). In particular, \(i_1\leqslant i\) and \(i_2\leqslant i\). We have that \(({\mathscr {V}}_2)_i\) contains \(V_{\ell -1}\), is not equal to \(V_{\ell -1}\), and is contained in \(V_\ell \). But \(({\mathscr {V}}_2)_i\) is invariant, and hence by our assumption that (a) of Proposition 8.2 holds, \(({\mathscr {V}}_2)_i=V_\ell \). Consequently, \(i_2<i\) if \({\mathscr {V}}_2\) is nonstandard. In the case that \({\mathscr {V}}_1\) is nonstandard, we also have that \(\ell _1<\ell \) because every space in the flag \({\mathscr {V}}_1\) is contained in \(V_{\ell - 1}\). This proves (8.10). \(\square \)
8.7 General subflags
In this section we establish (8.4), that is to say the inequality \(\textrm{e}({\mathscr {V}}')\geqslant \textrm{e}({\mathscr {V}})\) for all subflags \({\mathscr {V}}'\), of course subject to our standing assumption that (a), (b) and (c) of Proposition 8.2 hold. We need a simple lemma about the action of the automorphism group \({\text {Aut}}({\mathscr {V}})\) on subflags.
Lemma 8.9
Let \(\sigma \in {\text {Aut}}({\mathscr {V}})\) and let \({\mathscr {V}}'\) be a subflag of \({\mathscr {V}}\). Then one may define a new subflag \(\sigma ({\mathscr {V}}')\), setting \(\sigma ({\mathscr {V}}')_i := \sigma (V'_i)\). Moreover, \(\textrm{e}(\sigma ({\mathscr {V}}')) = \textrm{e}({\mathscr {V}}')\).
Proof
Since \({\mathscr {V}}'\) is a subflag, \(V'_i \leqslant V_i\). Applying \(\sigma \), and recalling that \(V_i\) is invariant under \(\sigma \), we see that \(\sigma (V'_i) \leqslant V_i\). Therefore \(\sigma ({\mathscr {V}}')\) is also a subflag. To see that \(\textrm{e}(\sigma ({\mathscr {V}}')) = \textrm{e}({\mathscr {V}}')\), recall Lemma 7.6, which implies that \(\mu _i\) is invariant under \(\sigma \), since the trees \({\mathscr {T}}({\mathscr {V}}')\) and \({\mathscr {T}}(\sigma ({\mathscr {V}}'))\) are isomorphic and we have \(\dim (V_j')=\dim (\sigma (V_j'))\) for all j. It follows that, for any subspace \(W \leqslant {\mathbb {Q}}^k\),
This completes the proof of the lemma. \(\square \)
Proof of (8.4)
Let m be the minimum of \(\textrm{e}({\mathscr {V}}')\) over all subflags \({\mathscr {V}}' \leqslant {\mathscr {V}}\), and among the flags with \(\textrm{e}({\mathscr {V}}')=m\), take the one with \(\sum _i \dim V'_i\) minimal. Let \(\sigma \in {\text {Aut}}({\mathscr {V}})\) be an arbitrary automorphism. By Lemma 8.9, \(\textrm{e}({\mathscr {V}}') = \textrm{e}(\sigma ({\mathscr {V}}'))\), and hence submodularity implies that
In particular, we have \(\textrm{e}({\mathscr {V}}\cap \sigma ({\mathscr {V}}')) = m\) (and also \(\textrm{e}({\mathscr {V}}' + \sigma ({\mathscr {V}}')) = \textrm{e}({\mathscr {V}})\), but we will not need this). Moreover, by the minimality of \(\sum _i \dim V'_i\),
which means that \({\mathscr {V}}'\) is invariant. Invoking Lemma 8.8, we conclude that \(m=\textrm{e}({\mathscr {V}}')\geqslant \textrm{e}({\mathscr {V}})\). \(\square \)
8.8 The strict entropy condition
In this section we complete the proof of Proposition 8.2 by showing how to perturb (8.4) to the desired strict inequality (8.3).
First argument. Consider first the collection \({\mathcal {U}}\) of all subflags \({\mathscr {V}}'\) which satisfy, for some \(1\leqslant j\leqslant r-1\), the relations
These are flags which differ from \({\mathscr {V}}\) in exactly one space. Our first task will be to establish the strict inequality
for all \({\mathscr {V}}' \in {\mathcal {U}}\), by elaborating upon the argument of the previous subsection. We already know that \(\textrm{e}({\mathscr {V}}') \geqslant \textrm{e}({\mathscr {V}})\), so suppose as a hypothesis for contradiction that \(\textrm{e}({\mathscr {V}}') = \textrm{e}({\mathscr {V}})\) for some \({\mathscr {V}}' \in {\mathcal {U}}\). Amongst all such flags, take one with minimal \(\sum \dim (V_i')\). By submodularity, we have (8.11) and hence \(\textrm{e}({\mathscr {V}}' \cap \sigma ({\mathscr {V}}')) = \textrm{e}({\mathscr {V}})\) for any automorphism \(\sigma \in {\text {Aut}}({\mathscr {V}})\). But
is evidently in \({\mathcal {U}}\) as well, and by our minimality assumption it follows that \(\dim (V_j' \cap \sigma (V_j'))=\dim (V_{j}')\). Thus, \({\mathscr {V}}'\) is invariant, and by assumption (a) of Proposition 8.2, it follows that \(V_j'=V_{j-1}\). In other words, \({\mathscr {V}}'\) is a standard flag, which is not basic since \(j\leqslant r-1\). Hence, \(\textrm{e}({\mathscr {V}}')>\textrm{e}({\mathscr {V}})\) by Lemma 8.7. This contradiction establishes (8.12).
Let \(1\leqslant j\leqslant r-1\) and let V be a space satisfying \(V_{j-1}\leqslant V<V_j\). Let \({\mathscr {V}}'\) be the subflag \(\langle {\textbf{1}}\rangle = V_0\leqslant \cdots \leqslant V_{j-1} \leqslant V \leqslant V_{j+1}\leqslant \cdots \leqslant V_r\). Then one easily computes that
and so (8.12) implies that
Now let \(\varepsilon >0\) be sufficiently small and consider the perturbation \({\tilde{{{\textbf{c}}}}}\) given by
Evidently, \(1={\tilde{c}}_1> {\tilde{c}}_2>\cdots > {\tilde{c}}_{r+1} \geqslant c^*_{r+1}-\varepsilon \), as needed. For any proper subflag \({\mathscr {V}}' \leqslant {\mathscr {V}}\),
Let \(J=\min \{j : V_j' \ne V_j\}\). If \(J=r\), then \(\dim (V_r/V_r')\geqslant 1\) and the right side above is at least \(\varepsilon /2 + O(\varepsilon ^r)\), which is positive for small enough \(\varepsilon \). If \(J\leqslant r-1\), then \(V_{J-1} \leqslant V_{J}' < V_J\) and we see that the right side above is at least
which is also positive for sufficiently small \(\varepsilon \) by (8.4) and (8.12).
Second argument. We now sketch a second approach to the proof of Proposition 8.2. The idea is to introduce a small perturbation of our fundamental quantity \(\textrm{e}()\), namely
where \(\lambda \approx 1\). Note that \(\textrm{e}_1({\mathscr {V}}',{{\textbf{c}}},{\varvec{\mu }}) = \textrm{e}({\mathscr {V}}',{{\textbf{c}}},{\varvec{\mu }})\), and also that \(\textrm{e}_{\lambda }({\mathscr {V}},{{\textbf{c}}},{\varvec{\mu }})\) does not depend on \(\lambda \), since all the entropies \({\mathbb {H}}_{\mu _j}(V_j)\) vanish. Define the \(\lambda \)-perturbed optimal parameters \({{\textbf{c}}}^*(\lambda )\) to be the unique solution to the \(\lambda \)-perturbed version of (7.11), that is to say the equations
By a continuity argument, these exist for \(\lambda \) sufficiently close to 1 and they satisfy \(\lim _{\lambda \rightarrow 1} {\textbf{c}}^*(\lambda ) = {\textbf{c}}^*(1) = {\textbf{c}}^*\).
Now, assume that \(\lambda \) is close enough to 1 so that
and we have the following “positivity inequalities”:
(i) \(\lambda {\mathbb {H}}_{\mu ^*_{m+1}}(V_m)>\dim (V_{m+1}/V_m)\) for \(0\leqslant m\leqslant r-1\);
(ii) \(\lambda \cdot \big ({\mathbb {H}}_{\mu ^*_i}(V_{m-1})-{\mathbb {H}}_{\mu ^*_i}(V_m) \big )< \dim (V_m/V_{m-1})\) for \(1\leqslant m<i\leqslant r\).
These conditions can clearly be guaranteed by a continuity argument and our assumption that they hold when \(\lambda =1\). For a parameter \(\lambda \) satisfying (i) and (ii) above, the proof of (8.4) goes through verbatim for the \(\lambda \)-perturbed quantities \(\textrm{e}_\lambda \), allowing one to conclude that
for all subflags \({\mathscr {V}}'\) of \({\mathscr {V}}\).
Now suppose that \(\lambda < 1\). Then we have
with equality if and only if \({\mathscr {V}}' = {\mathscr {V}}\) because \({\text {Supp}}(\mu _j^*)=V_j\cap \{0,1\}^k\) for all j. Therefore if \({\mathscr {V}}'\) is a proper subflag of \({\mathscr {V}}\) we have
Taking \({\tilde{{{\textbf{c}}}}} = {{\textbf{c}}}^*(\lambda )\) for \(\lambda \) sufficiently close to 1, Proposition 8.2 follows.
Part IV. Binary systems
9 Binary systems and a lower bound for \(\beta _k\)
In this section we define certain special flags \({\mathscr {V}}\) on \({\mathbb {Q}}^k\), \(k = 2^r\), which we call the binary systems of order r. It is these systems which lead to the lower bound on \(\beta _k\) given in Theorem 2, which is one of the main results of the paper.
In this section we will define these flags (which is easy) and state their basic properties. The proofs of these properties, some of which are quite lengthy, are deferred to Sect. 10.
We are then in a position to prove part of one of our main theorems, Theorem 2 (a), which we do in Sect. 9.2.
For the convenience of the reader, let us recall here the three parts of Theorem 2, as stated at the end of Sect. 1.3:
(a) Showing that for every \(r\geqslant 1\), \(\beta _{2^r} \geqslant \theta _r\) for a certain explicitly defined constant \(\theta _r\);
(b) Showing that \(\lim _{r\rightarrow \infty } \theta _r^{1/r}\) exists;
(c) Showing that (1.1) has a unique solution \(\rho \in [0,1/3]\) and that \(\rho =2\lim _{r\rightarrow \infty } \theta _r^{1/r}\).
9.1 Binary flags and systems: definitions and properties
Definition 9.1
(Binary flag of order r) Let \(k = 2^r\) be a power of two. Identify \({\mathbb {Q}}^k\) with \({\mathbb {Q}}^{{\mathcal {P}}[r]}\) (where \({\mathcal {P}}[r]\) means the power set of \([r] = \{1,\ldots , r\}\)) and define a flag \({\mathscr {V}}\), \(\langle {\textbf{1}} \rangle = V_0 \leqslant V_1 \leqslant \cdots \leqslant V_r = {\mathbb {Q}}^{{\mathcal {P}}[r]}\), as follows: \(V_i\) is the subspace of all \((x_S)_{S \subset [r]}\) for which \(x_S = x_{S \cap [i]}\) for all \(S \subset [r]\).
Remark
We have \(\dim (V_i) = 2^i\), and \(V_r = {\mathbb {Q}}^{{\mathcal {P}}[r]}\), so the system is trivially nondegenerate. Note that, throughout the paper, we have been using the letter r to denote the number of spaces \(V_i\) in the flag \({\mathscr {V}}\). It just so happens that, in this example, this is the same r as in the definition of \(k = 2^r\).
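The dimension count \(\dim (V_i) = 2^i\) follows because a vector in \(V_i\) is determined by the values \(x_{S \cap [i]}\), and the map \(S \mapsto S \cap [i]\) takes exactly \(2^i\) distinct values. A small sketch verifying this for \(r = 3\):

```python
from itertools import chain, combinations

def subsets(r):
    # all subsets of [r] = {1,...,r}, as frozensets
    base = list(range(1, r + 1))
    return [frozenset(c) for c in chain.from_iterable(
        combinations(base, n) for n in range(r + 1))]

r = 3
coords = subsets(r)             # coordinates of Q^{P[r]}; there are k = 2^r of them
assert len(coords) == 2 ** r
for i in range(r + 1):
    # V_i = {x : x_S = x_{S ∩ [i]}}; a vector in V_i is determined by one value
    # per equivalence class of the map S -> S ∩ [i], so dim(V_i) = #classes
    classes = {S & frozenset(range(1, i + 1)) for S in coords}
    assert len(classes) == 2 ** i   # dim(V_i) = 2^i
```
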
One major task is to show that optimal measures and optimal parameters, as described in Sect. 7, may be defined on the binary flags. Since we will be seeing them so often, let us write down the \(\rho \)-equations (7.5) for the binary flags explicitly:
Proposition 9.2
Let \({\mathscr {V}}\) be the binary flag of order r. Then
(a) the \(\rho \)-equations (9.1) have a solution with \(0< \rho _i < 1\) for \(i \geqslant 1\), and consequently we may define the optimal measures \({\varvec{\mu }}^*\) on \(\{0,1\}^k\) as in Definition 7.4;
(b) the optimal parameters \({\textbf{c}}^*\) (in the sense of Definition 7.5) exist.
We call the binary flag \({\mathscr {V}}\) (of order r) together with the additional data of the optimal measures \(\mu = \mu ^*\) and optimal parameters \({\textbf{c}} = {\textbf{c}}^*\), the binary system (of order r). We caution that for fixed i (such as \(i = 2\)) the parameters \(c_i\) do depend on r, although not very much.
The second major task is to show that the binary systems satisfy the entropy condition (3.4), or more accurately that arbitrarily small perturbations of them satisfy the strict entropy condition (3.5). In the last section we provided a tool for doing this in somewhat general conditions, namely Proposition 8.2. That proposition has four conditions, (a), (b), (c)(i) and (c)(ii) which must be satisfied. Of these, (b) (the existence of the optimal parameters \({\textbf{c}}^*\)) has already been established, assuming the validity of Proposition 9.2. We state the other three conditions separately as lemmas.
Lemma 9.3
Suppose that \(V_{i-1} \leqslant W \leqslant V_i\) and that W is invariant under \({\text {Aut}}({\mathscr {V}})\). Then W is either \(V_{i-1}\) or \(V_i\). Thus, the binary flags satisfy Proposition 8.2 (a).
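Since \(V_i\) is the space of vectors constant on the blocks of the partition induced by \(S \mapsto S \cap [i]\), a coordinate permutation preserves \(V_i\) exactly when it preserves that partition (this equivalence is our observation, not stated in the lemma). Granting it, \({\text {Aut}}({\mathscr {V}})\) can be counted by brute force for small r; for \(r = 2\) it should be the automorphism group of the depth-2 binary tree, of order \(2^3 = 8\).

```python
from itertools import permutations, chain, combinations

r = 2
base = list(range(1, r + 1))
coords = [frozenset(c) for c in chain.from_iterable(
    combinations(base, n) for n in range(r + 1))]

def partition(i):
    # partition of the coordinate indices by the value of S ∩ [i]
    blocks = {}
    for idx, S in enumerate(coords):
        blocks.setdefault(S & frozenset(range(1, i + 1)), set()).add(idx)
    return {frozenset(b) for b in blocks.values()}

parts = [partition(i) for i in range(r + 1)]
count = 0
for sigma in permutations(range(len(coords))):
    # sigma preserves V_i iff it maps the blocks of partition(i) to blocks
    if all({frozenset(sigma[x] for x in block) for block in P} == P for P in parts):
        count += 1
assert count == 8  # automorphisms of the depth-2 binary tree: 2 * 2 * 2 = 8
```
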
Lemma 9.4
We have \({\mathbb {H}}_{\mu _{m+1}^*}(V_m) > 2^m\) for \(0 \leqslant m \leqslant r - 1\). Thus, the binary flags satisfy Proposition 8.2 (c)(i).
Lemma 9.5
We have \({\mathbb {H}}_{\mu _i^*}(V_{m-1})-{\mathbb {H}}_{\mu ^*_i}(V_m) < 2^{m-1}\) for \(1\leqslant m< i\leqslant r\). Thus, the binary flags satisfy Proposition 8.2 (c)(ii).
The proofs of these various facts are given in Sect. 10.
9.2 Proof of Theorems 2 (a) and 7
We are now in a position to complete the proof of Theorem 2 (a), modulo the results stated above. First, we define the constants \(\theta _r\).
Definition 9.6
Let \(\rho _1,\rho _2,\ldots \) be the solution to the \(\rho \)-equations (9.1) for the binary flag. Then we define
Proof of Theorem 2 (a)
By Proposition 7.7, \(\theta _r\) is equal to \(c^*_{r+1}\), where \({\textbf{c}}^*\) are the optimal parameters on the binary flag \({\mathscr {V}}\) of order r, the existence of which is Proposition 9.2 (b) above.
Fix \(\delta \in (0,\theta _r/2]\). By Proposition 8.2 (the hypotheses of which are satisfied by Lemma 9.3, Proposition 9.2 (b) and Lemmas 9.4 and 9.5), there exists a perturbation \({\tilde{{{\textbf{c}}}}}\) of \({{\textbf{c}}}^*\) such that
and \(({\mathscr {V}},{\tilde{{{\textbf{c}}}}},{{\varvec{\mu }}}^*)\) satisfies the strict entropy condition (3.5). By Lemma 5.2, there exists some \(\varepsilon >0\) such that the “entropy gap” condition (5.1) holds. Finally, by Remark 7.1 (b), we have that \({\text {Supp}}(\mu _j^*)=\Gamma _j\) for all j. Hence, Proposition 5.5 implies that \(\beta _{2^r} \geqslant {\tilde{c}}_{r+1}=\theta _r-\delta \). Since \(\delta \) is arbitrary, this proves Theorem 2 (a). \(\square \)
Proof of Theorem 7
The upper bound \(\beta _k\leqslant \gamma _k\) is established in Sect. 4. The lower bound \(\beta _k\geqslant {\tilde{\gamma }}_k\) follows by Lemma 5.3, Proposition 5.5 and the fact that there exists at least one system satisfying the strict entropy condition (3.5), as per the proof of Theorem 2 (a) above. \(\square \)
9.3 Remarks on Theorem 2 (b)
Theorem 2 (b) is a problem of a combinatorial and analytic nature which can be considered more-or-less completely independently of the first three parts of the paper.
To get a feel for it, and a sense of why it is difficult, let us write down the first two \(\rho \)-equations (9.1) for the binary flags. The equation with \(j = 1\) is
This has the numerical solution \(\rho _1 \approx 0.306481\).
To write down the \(\rho \)-equation for \(j = 2\), one must compute \(f^{\Gamma _3}(\rho )\), and without any additional theory the only means we have to do this is to draw the full tree structure for the binary flag \({\mathscr {V}}\) of order 3 (on \({\mathbb {Q}}^8\)). This is a tractable exercise and one may confirm that
The \(\rho \)-equation with \(j = 2\) is then
where (recall from Fig. 1) \(f^{\Gamma _2}(\rho ) = 3^{\rho _1} + 4 \cdot 2^{\rho _1} + 4\). This may be solved numerically using Mathematica, yielding the value \(\rho _2 \approx 0.2796104\ldots \).
Such a numerical procedure, however, is already quite an unappetising prospect if one wishes to compute \(\rho _3\).
Consequently, we must develop more theory to understand the \(\rho _i\) and to prove Theorem 2 (b). This is the task of the last two sections of the paper.
10 Binary systems: proofs of the basic properties
In this section, we prove the various statements in Sect. 9.1.
We begin, in Sect. 10.2, by proving Lemma 9.3. This is a relatively simple and self-contained piece of combinatorics.
In Sect. 10.3 we introduce the concept of genotype, which allows us to describe the tree structure induced on \(\{0,1\}^k\) by the binary flag \({\mathscr {V}}\). In Sect. 10.4 we show how to compute the quantities \(f^C(\varvec{\rho })\) in terms of the genotype.
We are then, in Sect. 10.5, in a position to prove Proposition 9.2 (a), guaranteeing that the \(\rho _i\) exist and allowing us to define the optimal measures \({\varvec{\mu }}^*\).
In Sect. 10.6 we establish the two entropy inequalities, Lemmas 9.4 and 9.5.
Finally, in Sect. 10.7 we prove Proposition 9.2 (b), which confirms the existence of the optimal parameters \({{\textbf{c}}}^*\).
10.1 Basic terminology
Throughout the section, \({\mathscr {V}}\) will denote the binary flag of order r, as defined in Definition 9.1. That is, we take \(k = 2^r\), identify \({\mathbb {Q}}^k\) with \({\mathbb {Q}}^{{\mathcal {P}}[r]}\), and take \(V_i\) to be the subspace of all \((x_S)_{S \subset [r]}\) for which \(x_S = x_{S \cap [i]}\) for all \(S \subset [r]\).
In addition, we will write \({\textbf{0}}_j, {\textbf{1}}_j\) for the vectors in \(\{0,1\}^{{\mathcal {P}}[j]}\) consisting of all 0s (respectively all 1s). We call these (or any multiples of them) constant vectors.
Finally, we introduce the notion of a block of a vector \(x = (x_S)_{S \subset [r]} \in {\mathbb {Q}}^{{\mathcal {P}}[r]}\). For each \(A \subset [i]\) we consider the \(2^{r-i}\)-tuple \(x(A,i) := (x_S)_{S \subset [r],\ S \cap [i] = A}\). We call these the i-blocks of x.
Remark 10.1
(a) One should note carefully that the i-blocks are strings of length \(2^{r-i}\). In this language, \(V_i\) is the space of vectors x, all of whose i-blocks are constant.
(b) If we put together the coordinates of the i-blocks x(A, i) and \(x(A\triangle \{i\},i)\), then we obtain the \((i-1)\)-block \(x(A\cap [i-1],i-1)\).
In order to visualize the structure of the flag \({\mathscr {V}}\) and of the partition of \(\{0,1\}^{{\mathcal {P}}[r]}\) by the cosets of \(V_j\), it will often be useful to write elements of \(\{0,1\}^{{\mathcal {P}}[r]}\) as strings of 0s and 1s of length \(2^r\). When we do this we use the reverse binary order, which is the one induced from \({\mathbb {N}}\) via the map \(f(S) = \sum _{s \in S} 2^{r - s}\).
Example 10.2
For concreteness, let us consider the case \(r = 3\). In this case, the ordering of the coordinates of x is \(x_{\emptyset }, x_{\{3\}}, x_{\{2\}}, x_{\{2,3\}}, x_{\{1\}}, x_{\{1,3\}}, x_{\{1,2\}}, x_{\{1,2,3\}}\).
If \(x = 01001110\) then its 2-blocks are 01, 00, 11, 10, and its 1-blocks are 0100, 1110.
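In the reverse binary order, the i-blocks of a string of length \(2^r\) are simply its consecutive substrings of length \(2^{r-i}\). The following few lines (our own illustration; the helper name `blocks` is ours) reproduce Example 10.2.

```python
def blocks(x, i, r):
    # In the reverse binary order, the i-blocks of x are its consecutive
    # substrings of length 2^(r-i), indexed by the positions A subset of [i].
    assert len(x) == 2**r and 0 <= i <= r
    size = 2**(r - i)
    return [x[j:j + size] for j in range(0, len(x), size)]

x = "01001110"          # the vector of Example 10.2, with r = 3
print(blocks(x, 2, 3))  # ['01', '00', '11', '10']
print(blocks(x, 1, 3))  # ['0100', '1110']
```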
10.2 Automorphisms of the binary system
Proof of Lemma 9.3
We begin by defining some permutations of \({\mathcal {P}}[r]\) for which, we claim, the corresponding coordinate permutations give elements of \({\text {Aut}}({\mathscr {V}})\). Suppose that \(1 \leqslant j \leqslant r\) and that \(A \subset [j-1]\). Then we may consider the permutation \(\pi (A,j)\) defined by \(\pi (A,j)(S) = S \triangle \{j\}\) if \(S \cap [j-1] = A\), and \(\pi (A,j)(S) = S\) otherwise.
To visualize the action of this permutation on the coordinates of a vector x, it is useful to order its coordinates as we explained above. The action of \(\pi (A,j)\) is then to permute the two adjacent j-blocks x(A, j) and \(x(A\sqcup \{j\},j)\), which together form the \((j-1)\)-block \(x(A,j-1)\), as per Remark 10.1(b). More concretely, below are some examples of the action of the permutations \(\pi (A,j)\) in the setting of Example 10.2:
If they wish, readers may translate the arguments below into this more visual language.
Claim. \(\pi (A,j)\) preserves \(V_i\) for all i, and therefore \(\pi (A,j) \in {\text {Aut}}({\mathscr {V}})\).
Proof
Suppose that \(x= (x_S)_{S \subset [r]} \in V_i\) and let us write for simplicity \(\pi \) instead of \(\pi (A,j)\).
Suppose first that \(j > i\). Then \(\pi (S) \cap [i] = S \cap [i]\) for all S, and so \(x_{\pi (S)} = x_{\pi (S) \cap [i]} = x_{S \cap [i]} = x_S,\) where the first and last steps used the fact that \(x \in V_i\). Thus the claim follows in this case.
Suppose that \(j \leqslant i\). Let \(t > i\). Then the conditions \((S \triangle \{t\}) \cap [j-1] = A\) and \(S \cap [j-1] = A\) are equivalent. Hence, if \(S \cap [j-1] = A\), then we find that \(x_{\pi (S \triangle \{t\})} = x_{(S \triangle \{j\}) \triangle \{t\}} = x_{S \triangle \{j\}} = x_{\pi (S)},\) where we used that \(x\in V_i\) and that \(t>i\) at the second step. Similarly, if \(S \cap [j-1] \ne A\), then \(x_{\pi (S \triangle \{t\})} = x_{S \triangle \{t\}} = x_S = x_{\pi (S)}.\)
In all cases, we have found that \(x_{\pi (S \triangle \{t\})} = x_{\pi (S)}\). Since this is true for all \(t > i\), \(\pi (x)\) indeed lies in \(V_i\). This completes the proof of the claim. \(\square \)
Suppose now that W is an invariant subspace of \({\mathscr {V}}\) satisfying the inclusions \(V_{i-1}<W \leqslant V_i\). We want to conclude that \(W=V_i\). To accomplish this, we introduce some auxiliary notation.
For each \(A\subset [i-1]\), we consider the vector \(y^A = (y^A_S)_{S \subset [r]}\in V_i\) that is uniquely determined by the relations \(y^A_{A} = 1\), \(y^A_{A \cup \{i\}} = -1\) and \(y^A_S = 0\) for all other \(S \subset [i]\). There are \(2^{i-1}\) such vectors \(y^A\). They are mutually orthogonal, hence linearly independent. In addition, together with \(V_{i-1}\), they generate all of \(V_i\). Since \(V_{i-1}<W\leqslant V_i\), there must exist \(A\subset [i-1]\) such that \(y^A\in W\).
Now, it is easy to check that for any \(j<i\) and any \(A\subset [i-1]\), we have \(\pi (A \cap [j-1], j)\big (y^A\big ) = y^{A \triangle \{j\}}.\)
From the above relation and the invariance of W under \({\text {Aut}}({\mathscr {V}})\), it is clear that if W contains at least one vector \(y^A\) with \(A\subset [i-1]\), then it contains all such vectors. Since we also know that \(V_{i-1}\leqslant W\leqslant V_i\), we must have that \(W=V_i\), which completes the proof of Lemma 9.3. \(\square \)
Remark
A minor elaboration of the above argument in fact allows one to show that the subspaces of \({\mathbb {Q}}^{{\mathcal {P}}[r]}\) invariant under \({\text {Aut}}({\mathscr {V}})\) are the \(V_i\), the orthogonal complements of \(V_{i-1}\) in \(V_i\), and all direct sums of these spaces. However, we will not need the classification in this explicit form.
10.3 Cell structure and genotype
The cosets of \(V_i\) partition \(\{0,1\}^{{\mathcal {P}}[r]}\) into sets which we call the cells at level i. Our first task is to describe these explicitly.
Consider \(\omega , \omega ' \in \{0,1\}^{{\mathcal {P}}[r]}\). It is easy to see that \(\omega - \omega ' \in V_i\) (and so \(\omega , \omega '\) lie in the same cell at level i) if and only if for every \(A \subset [i]\) one of the following is true:
(a) Both \(\omega (A,i)\) and \(\omega '(A,i)\) are constant blocks (that is, they both lie in \(\{ {\textbf{0}}_{r-i}, {\textbf{1}}_{r-i}\}\)).

(b) \(\omega (A, i) = \omega '(A,i)\), and neither of these blocks is constant (that is, neither is \({\textbf{0}}_{r-i}\) nor \({\textbf{1}}_{r-i}\)).
Thus a cell C at level i is completely specified by the positions A of its constant i-blocks, and by the values \(\omega (A,i)\) (for an arbitrary \(\omega \in C\)) of its non-constant i-blocks.
Example
With \(r=3\) and \(\omega = 01001110\), the level 2 cell that contains \(\omega \) is the set \(\{01000010, 01001110, 01110010, 01111110\}\).
Its constant 2-blocks are at \(A = \{2\}\) and \(A = \{1\}\). Its non-constant 2-blocks are at \(A = \emptyset \) (taking the value \(\omega (A,2) = 01\)) and at \(A = \{1,2\}\) (taking the value \(\omega (A, 2) = 10\)). The level 1 cell containing \(\omega \) is just \(\{\omega \}\).
The positions of the constant i-blocks play an important role, and we introduce the name genotype to describe these.Footnote 8
Definition 10.1
(Genotype) If C is a cell at level i, its genotype \(g(C) \subset {\mathcal {P}}[i]\) is defined to be the collection of \(A \subset [i]\) for which \(\omega (A,i)\in \{{\textbf{0}}_{r-i},{\textbf{1}}_{r-i}\}\) for all \(\omega \in C\). We refer to any subset of \({\mathcal {P}}[i]\) as an i-genotype. If \(g, g'\) are two i-genotypes, then we write \(g \leqslant g'\) to mean the same as \(g \subseteq g'\). We write |g| for the cardinality of g.
Example
If C is the cell at level 2 containing \(\omega = 01001110\), the genotype g(C) is equal to \(\big \{\{2\}, \{1\}\big \}\). (We have listed these sets in the reverse binary ordering once again.)
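Since the constant i-blocks of any member of a cell are exactly the constant i-blocks of the cell, the genotype can be read off from a single representative. A small sketch (the helper name `genotype_positions` is ours) recovers the genotype of the example above.

```python
def genotype_positions(omega, i, r):
    # Indices j of the constant i-blocks of omega, listed in the reverse
    # binary order; the index j encodes the position A subset of [i].
    size = 2**(r - i)
    blks = [omega[k:k + size] for k in range(0, len(omega), size)]
    return [j for j, b in enumerate(blks) if b in ("0" * size, "1" * size)]

# For omega = 01001110 at level 2 the constant blocks sit at indices 1 and 2,
# i.e. (binary 01 and 10) at A = {2} and A = {1}, matching the example.
print(genotype_positions("01001110", 2, 3))  # [1, 2]
```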
Definition 10.2
(Consolidations) If g is an i-genotype, then its consolidation is the \((i-1)\)-genotype \(g^*\) defined by \(g^* := \{A' \subset [i-1] : A'\in g, A' \cup \{i\} \in g\}\) (cf. Remark 10.1 (b)).
Let us pause to note the easy inequality \(|g| - 2^{i-1} \leqslant |g^*| \leqslant \frac{1}{2}|g|\) (10.2), valid for all i-genotypes g.
The genotype is intimately connected to the cell structure on \(\{0,1\}^k\) induced by \({\mathscr {V}}\), as the following lemma shows.
Lemma 10.3
We have the following statements.
(a) If C is a cell, we have \(|C| = 2^{| g(C)|}\).

(b) Suppose that g is an i-genotype. There are \((2^{2^{r-i}}-2)^{2^i-| g |}\) cells C (at level i) with \(g(C) = g\).

(c) If \(g(C) = g\), and if \(C'\) is a child of C, then \(g(C') \leqslant g^*\). In particular, \(|g(C')| \leqslant \frac{1}{2}|g(C)|\).

(d) Suppose that \(g(C) = g\). Suppose that \(g'\) is an \((i-1)\)-genotype and that \(g' \leqslant g^*\). Then the number of children \(C'\) of C with \(g(C') = g'\) is \(2^{| g | - | g^* | - | g' |}\).

(e) Suppose that C is a cell at level i with \(g(C) = g\). Then the number of children of C (at level \(i-1\)) is \(2^{| g | - 2 |g^* |} 3^{|g^* |}\).
Proof
(a) This is almost immediate: for each position \(A \in g(C)\) of a constant block, there are two choices (\({\textbf{0}}_{r - i}\) or \({\textbf{1}}_{r - i}\)) for \(\omega (A,i)\).
(b) To determine C completely (given g), one must specify the value of each of \(2^i - | g |\) non-constant i-blocks. For each such block, there are \(2^{2^{r-i}}-2\) possible non-constant values.
(c) A set \(A' \subset [i-1]\) can only possibly be the position of a constant block in some child cell of C if both \(A'\) and \(A' \cup \{i\}\) are the positions of constant blocks in C, or in other words \(A', A' \cup \{i\} \in g\), which is precisely what it means for \(A'\) to lie in \(g^*\).
Note that the child cell \(C'\) containing \(\omega \) has a constant \((i-1)\)-block at position \(A'\) only if \(\omega (A', i) = \omega (A' \cup \{i\}, i)\), which may or may not happen.
The second statement is an immediate consequence of the first and (10.2).
(d) Let \(A \in g\). We say that A is productive if \(A' := A \cap [i-1] \in g^*\), or equivalently if \(A'\) and \(A' \cup \{i\}\) both lie in g (or, more succinctly, \(A \triangle \{i\} \in g\)). These are the positions which can give rise to constant \((i-1)\)-blocks in children of C. There are \(2|g^*|\) such positions, coming in \(|g^*|\) pairs. To create a child \(C'\) with genotype \(g'\), we have a binary choice at \(|g^*| - |g'|\) of these pairs: at each of them either \(\omega (A', i) = {\textbf{0}}_{r - i}\) and \(\omega (A' \cup \{i\}, i) = {\textbf{1}}_{r - i}\), or the other way around. There are \(|g| - 2|g^*|\) non-productive positions \(A \in g\), and for each of these there is also a binary choice, either \(\omega (A,i) = {\textbf{0}}_{r - i}\) or \(\omega (A,i) = {\textbf{1}}_{r - i}\). The total number of choices is therefore \(2^{|g^*| - |g'|} \times 2^{|g| - 2|g^*|}\), which is exactly as claimed.
(e) This is immediate from part (d), upon summing over \(g' \subseteq g^*\). \(\square \)
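Parts (a) and (b) can be verified by brute force in the smallest interesting case \(r = 2\), \(i = 1\) (strings of length 4, blocks of length 2). The script below is our own check, using the cell criterion of Sect. 10.3.

```python
from collections import Counter
from itertools import product

r, i = 2, 1
size = 2**(r - i)
CONST = {"0" * size, "1" * size}

def blocks(w):
    return [w[k:k + size] for k in range(0, len(w), size)]

def same_cell(w1, w2):
    # Criterion of Sect. 10.3: at every position, either both blocks are
    # constant, or they are equal and non-constant.
    return all((b1 in CONST and b2 in CONST) or (b1 == b2 and b1 not in CONST)
               for b1, b2 in zip(blocks(w1), blocks(w2)))

cells = []
for w in ("".join(t) for t in product("01", repeat=2**r)):
    for cell in cells:
        if same_cell(w, cell[0]):
            cell.append(w)
            break
    else:
        cells.append([w])

for cell in cells:
    gsize = sum(b in CONST for b in blocks(cell[0]))  # |g(C)|
    assert len(cell) == 2**gsize                      # part (a): |C| = 2^{|g|}

# Part (b) predicts (2^{2^{r-i}} - 2)^{2^i - |g|} = 2^{2-|g|} cells per fixed
# genotype; summing over the C(2,|g|) genotypes of each size gives 1, 4, 4
# cells with |g| = 2, 1, 0 respectively.
counts = Counter(sum(b in CONST for b in blocks(c[0])) for c in cells)
print(sorted(counts.items()))  # [(0, 4), (1, 4), (2, 1)]
```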
10.4 The \(f^C(\rho )\) and genotype
We begin by recalling from (7.4) the definition of the functions \(f^C({\varvec{\rho }})\). Here \({\varvec{\rho }} = (\rho _1,\ldots , \rho _{r-1})\) is a sequence of parameters, and we define \(\rho _0 = 0\). If C has level 0, we set \(f^C({\varvec{\rho }}) = 1\), whilst for C at level \(i \geqslant 1\) we apply the recursion \(f^C({\varvec{\rho }}) = \sum _{C'} f^{C'}({\varvec{\rho }})^{\rho _{i-1}}\), where the sum is over the children \(C'\) of C.
Proposition 10.4
The quantities \(f^C\) depend only on the genotype of C, and thus for any i-genotype g we may define \(F(g) := f^{C}({\varvec{\rho }})\), where C is any cell with \(g(C) = g\). We have the recursion \(F(g) = \sum _{g' \leqslant g^*} 2^{|g| - |g^*| - |g'|} F(g')^{\rho _{i-1}}.\)
Remark
The F(g) depend on \({\varvec{\rho }}\), as well as on i (where g is an i-genotype) but we suppress explicit mention of this. For example, it should be clear from context that g on the left is an i-genotype, but the sum on the right is over \((i-1)\)-genotypes, since \(g^*\) is an \((i-1)\)-genotype by definition.
Proof
This is a simple induction on the level i using the definition of the \(f^C({\varvec{\rho }})\), and parts (c) and (d) of Lemma 10.3.\(\square \)
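The recursion of Proposition 10.4 is straightforward to implement, which gives an independent check of the value \(f^{\Gamma _2} = 3^{\rho _1} + 4 \cdot 2^{\rho _1} + 4\) from Sect. 9.3. This is our own sketch: genotypes are stored as frozensets of frozensets, and the numerical value of \(\rho _1\) is the rounded one from the text.

```python
import math
from itertools import combinations

def subsets(s):
    s = list(s)
    return [frozenset(c) for n in range(len(s) + 1) for c in combinations(s, n)]

def consolidation(g, i):
    # g* = {A' subset of [i-1] : A' in g and A' + {i} in g}
    return frozenset(A for A in g if i not in A and (A | {i}) in g)

def F(g, i, rho):
    # Proposition 10.4: F(g) = sum over (i-1)-genotypes g' <= g* of
    # 2^(|g| - |g*| - |g'|) * F(g')^{rho_{i-1}}, with F = 1 at level 0
    # and rho_0 = 0.
    if i == 0:
        return 1.0
    gstar = consolidation(g, i)
    e = rho[i - 2] if i >= 2 else 0.0
    return sum(2**(len(g) - len(gstar) - len(gp)) * F(gp, i - 1, rho)**e
               for gp in subsets(gstar))

rho1 = 0.306481               # numerical value from Sect. 9.3
P1 = frozenset(subsets({1}))  # the full genotype P[1]
P2 = frozenset(subsets({1, 2}))
print(F(P1, 1, [rho1]))       # 3.0, i.e. f^{Gamma_1}
# f^{Gamma_2} = 3^{rho_1} + 4*2^{rho_1} + 4, and the first rho-equation
# f^{Gamma_2} = e^2 * (f^{Gamma_1})^{rho_1} holds up to rounding of rho_1:
assert abs(F(P2, 2, [rho1]) - math.exp(2) * 3**rho1) < 1e-3
```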
Let us pause to record two corollaries which we will need later.
Corollary 10.5
Suppose that \(g_1, g_2\) are two i-genotypes with \(g_1 \leqslant g_2\). Then \(F(g_1) \leqslant F(g_2)\).
Proof
Note that \(g_1^* \leqslant g_2^*\), and also that \(|g_1| - |g_1^*| \leqslant |g_2| - |g_2^*|\), since \(|g| - |g^*|\) equals the number of pairs \(\{A', A' \cup \{i\}\}\) with \(A' \subset [i-1]\) that intersect g, a quantity that is monotone in g.
Hence, by two applications of Proposition 10.4,
\(\square \)
Recall that \(\Gamma _i\) is the cell at level i containing \({\textbf{0}}\). Note that \(g(\Gamma _i) ={\mathcal {P}}[i]\).
Corollary 10.6
If \(C \ne \Gamma _i\) is a cell of level i, then \( f^C({\varvec{\rho }}) < f^{\Gamma _i}({\varvec{\rho }})\).
Proof
This is simply the special case \(g_2 = {\mathcal {P}}[i]\) of the preceding corollary. The inequality is strict because if \(g < {\mathcal {P}}[i]\), then \(g^* < {\mathcal {P}}[i-1]\).\(\square \)
10.5 Existence of the \(\rho _i\)
In this section we prove Proposition 9.2 (a), which asserts that for the binary flags there is a unique solution \({\varvec{\rho }} = (\rho _1,\rho _2,\ldots )\) to the \(\rho \)-equations (9.1). In fact, we will prove the following more general fact which treats the jth \(\rho \)-equation in isolation, irrespective of whether the earlier ones have already been solved.
Proposition 10.7
Let \(j\in {\mathbb {N}}\) and let \(\rho _1,\ldots , \rho _{j-1} \in (0,1)\). Then there is a unique \(\rho _j \in (0,1)\) such that the jth \(\rho \)-equation for the binary flag, \(f^{\Gamma _{j+1}}(\varvec{\rho }) = e^{2^j} f^{\Gamma _j}(\varvec{\rho })^{\rho _{j}}\), is satisfied.
Remark
We will prove in the next section (Lemma 11.2) that for the solution \(\rho _1,\rho _2,\ldots \) to the full set of \(\rho \)-equations we have \(\rho _j \leqslant \rho _1 = 0.30648\ldots \) for all j. For a table of numerical values of the \(\rho _j\), see Table 1 in Sect. 12.
Before beginning the proof of Proposition 10.7, we isolate a lemma.
Lemma 10.8
Fix a \((j-1)\)-genotype \(g'\). Then \(\sum _{g :\, g^* \geqslant g'} 2^{|g| - |g^*| - |g'|} = 7^{2^{j-1} - |g'|},\) where the sum is over all j-genotypes g whose consolidation \(g^*\) contains \(g'\).
Proof
In order to determine g, we must determine for each \(A\subset [j-1]\) whether A and/or \(A\cup \{j\}\) lie in g. Since we are only summing over g whose consolidation \(g^*\) contains \(g'\), we must have that A and \(A\cup \{j\}\) belong to g for all \(A\in g'\), so the membership of A and \(A\cup \{j\}\) in g is fully determined for all \(A\in g'\). For any \(A\subset [j-1]\) with \(A\notin g'\), we have four choices, according to whether \(A\in g\) and whether \(A\cup \{j\}\in g\). If both of these conditions hold, then we further have \(A\in g^*\); in the other three cases, we have \(A\notin g^*\). We conclude that \(\sum _{g :\, g^* \geqslant g'} 2^{|g| - |g^*| - |g'|} = 7^{2^{j-1} - |g'|}\): each \(A \in g'\) contributes a factor \(2^{2-1-1} = 1\) to the sum, whereas each \(A \subset [j-1]\) with \(A \notin g'\) contributes a factor \(2^{2-1} + 2 \cdot 2^{1-0} + 2^0 = 7\). This completes the proof. \(\square \)
Proof of Proposition 10.7
For \(j=1\), the equation to be satisfied is \(3^{\rho _1} + 4 \cdot 2^{\rho _1} + 4 = e^2 3^{\rho _1}\). It may easily be checked numerically that this has a unique solution \(\rho _1 \approx 0.306481\ldots \) in (0, 1). One may also proceed analytically as follows. Define \(G(x) := e^2 3^x - 3^x - 4 \cdot 2^x - 4 = 3^x \big (e^2 - (1 + 4 \cdot (2/3)^x + 4/3^x)\big ).\)
In particular, the roots of G are in correspondence with the roots of \(H(x)=e^2 - (1 + 4 \cdot (2/3)^x + 4/3^x)\). This is clearly a continuous and strictly increasing function. In addition, \(H(0)=e^2-9<0\) and \(H(1)=e^2-5>0\). Thus, H has a unique root \(\rho _1\in (0,1)\), and so does G.
Now assume \(j\geqslant 2\). It turns out that much the same argument works, although the details are more elaborate. Assume that \(0<\rho _i<1\) for \(1\leqslant i<j\). Define
Proposition 10.4 implies that
where
and the sums over g run over all genotypes \(g \subset {\mathcal {P}}[j]\) at level j. Since (by an easy induction) \(F({\mathcal {P}}[j])>0\), it follows that G and H have the same roots. The latter is a continuous and strictly increasing function because Corollary 10.6 implies that \(F(g)/F({\mathcal {P}}[j]) \leqslant 1\), with equality only when \(g={\mathcal {P}}[j]\). Moreover, \(H(0) = e^{2^j} - 3^{2^j} < 0\). Therefore to complete the proof it suffices to show that \(H(1) > 0\).
To show this, we use (10.4). First note that
where the sum is over all genotypes \(g'\) of level \((j-1)\).
Next, by Proposition 10.4 and Lemma 10.8 we have
Putting (10.4), (10.5) and (10.6) together we obtain
Since \(e^2>7\), we have \(\sqrt{14} < e\sqrt{2}\), and thus \(H(1)>0\). This completes the proof. \(\square \)
10.6 Entropy inequalities for the binary systems
We begin with a lemma which will be used a few times in what follows.
Lemma 10.9
Let \(C'\) be one of the children of \(\Gamma _i\), thus \(C'\) is a cell at level \((i-1)\). Then \(\mu _i(C') \leqslant e^{-2^{i-1}}\), and equality occurs only when \(C' = \Gamma _{i-1}\).
Proof
We showed in Corollary 10.6 that \(f^{C'}({\varvec{\rho }}) < f^{\Gamma _{i-1}}({\varvec{\rho }})\), for any choice of \({\varvec{\rho }} = (\rho _1,\ldots , \rho _{r-1})\), and for any child \(C'\) of \(\Gamma _i\) with \(C' \ne \Gamma _{i-1}\). Now that we know that the \(\rho \)-equations have a solution, it follows immediately from the definition of the optimal measures \({\varvec{\mu }}^*\) in (7.6), applied with \(C = \Gamma _i\), that \(\mu _i(C') < \mu _i(\Gamma _{i-1})\), again for any child \(C'\) of \(\Gamma _i\) with \(C' \ne \Gamma _{i-1}\). Finally, observe that \(\mu _i(\Gamma _{i-1}) = e^{-2^{i-1}}\) by (7.7). \(\square \)
Proof of Lemma 9.4
This follows almost immediately from Lemma 10.9 with \(i = m+1\). Indeed since \(\mu _{m+1}(C) \leqslant e^{-2^m}\) for all cells C at level m, with equality only for \(C = \Gamma _m\), we have
This concludes the proof. \(\square \)
Proof of Lemma 9.5
Let \(\mu = \mu _i\) with \(m<i\leqslant r\). We must show that
Let C denote a cell at level m and \(C'\) a child of C at level \((m-1)\). In addition, let the notations g(C) and \(g(C)^*\) refer to the genotype of C and its consolidation, as defined in Definitions 10.1 and 10.2. By the definition of entropy, Lemma 10.3 (e), and the concavity of \(L(x)=-x\log x\) we find that
Now by (10.2) we have \(|g(C)^*| \geqslant |g(C)| - 2^{m-1}\), whence
Since we also have that \(|g(C)|\leqslant 2^m\), we infer that
This and (10.8) already imply the bound
which is only very slightly weaker than Lemma 9.5.
To make the crucial extra saving, write S for the union of all cells C at level m with \(|g(C)| > \frac{3}{4} 2^m\). We claim that
We postpone the proof of this inequality momentarily and show how to use it to complete the proof of Lemma 9.5.
Observe that if C is not one of the cells making up S, that is to say if \(|g(C)| \leqslant \frac{3}{4} 2^m\), then
where we used (10.9) to obtain the first inequality. Assuming the claim (10.11), it follows from this, (10.8) and (10.10) that
which is the statement of Lemma 9.5.
It remains to prove (10.11). Recall that \(1\leqslant m<i\leqslant r\).
When \(1\leqslant m\leqslant 2\), the only integer in \((\frac{3}{4}2^m,2^m]\) is \(2^m\). Hence, if a cell C at level m satisfies the inequality \(|g(C)|>\frac{3}{4}2^m\), we must have \(|g(C)|=2^m\). The only cell with this property is \(\Gamma _m\). Since we have \(\mu (\Gamma _m)=e^{2^m-2^i} \leqslant e^{-1}\) by (7.7), our claim (10.11) follows in this case.
Assume now that \(m\geqslant 3\). Let \({\tilde{S}}\) be the union of all children \({{\tilde{C}}}\) of \(\Gamma _i\) (thus these are cells at level \(i-1 \geqslant m\)) which contain a cell C in S. By repeated applications of Lemma 10.3 (c) we have \(|g(\tilde{C})| > 2^{i-1-m} (\frac{3}{4} 2^{m})=\frac{3}{4} 2^{i-1}\) for any such \({{\tilde{C}}}\). Lemma 10.3 (d), applied with \(C = \Gamma _i\), implies that the number of such cells \({\tilde{C}}\) is at most
By Lemma 10.9 and our assumption that \(i-1\geqslant m\geqslant 3\), it follows that
This completes the proof of the claim (10.11) and hence of Lemma 9.5. \(\square \)
10.7 Existence of the optimal parameters \({{\textbf{c}}}^*\)
Proof of Proposition 9.2 (b)
We have \({\text {Supp}}(\mu _j^*) =\Gamma _j\) by Remark 7.1 (b), and hence \(|{\text {Supp}}(\mu _j^*)| =2^{2^j}\) by Lemma 5.1. By Lemma B.2, when \(j\geqslant m+2\) we deduce the inequality
Now recall (Definition 7.5) that the optimal parameters should satisfy the conditions (7.12) (which are the fully written out version of (7.11)). We wish to show that there is a solution with \(1 = c_1^*> c^*_2> \cdots> c^*_{r+1} > 0\). Rearranging (7.12) and recalling \(\dim (V_j)=2^j\), we find that
for \(0\leqslant m\leqslant r-1\). By Lemma 9.4 and (10.12), we may apply a downwards induction on \(m = r-1, r-2,\ldots \) to solve these equations with \(0< c^*_{r+1}< c^*_r< \cdots < c^*_1\). Rescaling, we may additionally ensure that \(c^*_1 = 1\).\(\square \)
11 The limit of the \(\rho _i\)
In the last section we showed that there is a unique solution \(\varvec{\rho }= (\rho _1,\rho _2,\ldots )\) to the \(\varvec{\rho }\)-equations (9.1) for the binary system with \(0< \rho _j < 1\) for all j. In this section, we show that the limit \(\lim _{j \rightarrow \infty } \rho _j \) exists.
Proposition 11.1
\(\rho = \lim _{j \rightarrow \infty } \rho _j\) exists.
11.1 \(\rho _1\) is the largest \(\rho _j\)
The estimates required in the proof of Proposition 11.1 are rather delicate, and to make them usable for our purposes we need the following a priori bound on the \(\rho _j\).
Lemma 11.2
For all \(j\geqslant 1\), we have \(\rho _j \leqslant \rho _1 = 0.30648\ldots \)
The reader should recall the notion of genotype g (Definition 10.1) and of the function F(g) (Proposition 10.4).
The next lemma is a stronger version of Corollary 10.5, whose proof uses that result as an ingredient.
Lemma 11.3
For any \(j\geqslant 1\) and \(g_1 \leqslant g_2\) at level j, we have
Proof
We have
This concludes the proof. \(\square \)
Proof of Lemma 11.2
We begin by observing that
The \(\rho \)-equations (9.1), translated into the language of genotypes, are \(F({\mathcal {P}}[j+1]) = e^{2^j} F({\mathcal {P}}[j])^{\rho _j}\). Therefore, by Proposition 10.4 (with \(g = {\mathcal {P}}[j+1]\)) followed by Lemma 11.3 (with \(g_2 = {\mathcal {P}}[j]\)), we have
Dividing through by \(F({\mathcal {P}}[j])^{\rho _j}\), and applying (11.1) with \(c_1 = 2^{\rho _j - 1}\) and \(c_2 = (3/4)^{\rho _j}\), we find that
Therefore
However, the first \(\rho \)-equation (9.2) is precisely that
The result follows immediately (using the monotonicity of the function \(1 + 4(2/3)^t + 4(1/3)^t\); see the proof of Proposition 10.7). \(\square \)
11.2 Preamble to the proof
In this section, we set up some notation and structure necessary for the proof of Proposition 11.1. Since we wish to let \(r\rightarrow \infty \), it is convenient to embed all binary r-step systems into a universal infinite binary system. To this end, and with a slight abuse of notation, we let
for \(j=0,1,\ldots .\) Clearly, \(V_j\simeq {\mathbb {Q}}^{2^j}\) for all j, and the flag
is isomorphic to the flag of the r-step binary system.
In this notation, we have
where
is the discrete unit cube. We further set
Lastly, for each \(j\geqslant 0\), we say that C is a cell at level j if \(C\subset \Gamma _\infty \) and there exists some \(x=(x_A)_{A\subset {\mathbb {N}}}\) such that \(x_A\in {\mathbb {Q}}\) for all A and \(C=\Omega \cap (x+V_j)\). We may easily check that the collection of cells lying in \(\Gamma _r\) forms the tree corresponding to the r-step binary system.
We may now define the functions \(f^C\) for our infinite binary flag. It is convenient to reverse the indices in \(f^C\). Specifically, let \({\textbf{x}}= (x_1,x_2,\ldots )\in [0,1]^{\mathbb {N}}\). If C is a cell at level \(j\geqslant 0\), then we define
In particular, \(\psi ^C({\textbf{x}})=0\) when \(j=0\), and \(\psi ^C({\textbf{x}})=\log |C{\setminus }\{{\textbf{0}}\}|\) when \(j=1\).
In the special case \(C = \Gamma _j\) we also define \(\phi _j({\textbf{x}}) := 2^{-j}\, \psi ^{\Gamma _j}({\textbf{x}}).\)
Thus \(\phi _1({\textbf{x}}) = \frac{1}{2} \log 3\) and \(\phi _2({\textbf{x}}) = \frac{1}{4} \log (3^{x_1} + 4 \cdot 2^{x_1} + 4)\).
Note that \(\psi ^C, \phi _j\) are increasing in each variable. Moreover we have the following simple bounds.
Lemma 11.4
(Simple bounds) We have \(\frac{1}{2}\log 3 \leqslant \phi _j({\textbf{x}}) < \log 2\).
Proof
For the upper bound, note that \(f^{\Gamma _j}({\textbf{x}}) \leqslant f^{\Gamma _j}({\textbf{1}})\). By the definition of \(f^C\) (see (7.4)), we have that \(f^{\Gamma _j}({\textbf{1}})\) is equal to the number of children of \(\Gamma _j\) at level 0, which, in turn, is equal to \(2^{2^j} - 1\). This proves the claimed upper bound on \(\phi _j({\textbf{x}})\).
For the lower bound, observe that \(f^{\Gamma _j}({\textbf{x}}) \geqslant f^{\Gamma _j}({\textbf{0}})\). Using again the definition of \(f^C\), we find that \(f^{\Gamma _j}({\textbf{0}})\) equals the number of children of \(\Gamma _j\) at level \(j-1\). Thus \(f^{\Gamma _j}({\textbf{0}}) = 3^{2^{j-1}}\) by Lemma 10.3. This proves the claimed lower bound of \(\phi _j({\textbf{x}})\), thus completing the proof of the lemma. \(\square \)
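For \(j = 2\) the bounds of Lemma 11.4 can also be checked directly from the explicit formula \(\phi _2({\textbf{x}}) = \frac{1}{4}\log (3^{x_1} + 4 \cdot 2^{x_1} + 4)\) given above. The grid check below is our own illustration, not part of the proof.

```python
import math

def phi2(x1):
    # phi_2 from the text: (1/4) * log(3^{x_1} + 4*2^{x_1} + 4); it depends
    # only on the first coordinate of x.
    return 0.25 * math.log(3**x1 + 4 * 2**x1 + 4)

lo, hi = 0.5 * math.log(3), math.log(2)
for k in range(101):
    # Lemma 11.4 at j = 2 (lower bound attained at x = 0, so allow a small
    # floating-point slack there).
    assert lo - 1e-12 <= phi2(k / 100) < hi
print("Lemma 11.4 bounds hold for phi_2 on a grid in [0,1]")
```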
The \(\rho \)-equations (9.1) may be expressed in terms of the \(\phi _j\) in the following simple form: \(\phi _{j+1}(\rho _j, \rho _{j-1}, \ldots , \rho _1, 0, 0, \ldots ) = \frac{1}{2}\big (1 + \rho _j \, \phi _j(\rho _{j-1}, \ldots , \rho _1, 0, 0, \ldots )\big )\) (11.2).
11.3 Product structure of cells and self-similarity of the functions \(\phi _j\)
There is a natural bijection \(\pi : {\mathbb {Q}}^{{\mathcal {P}}({\mathbb {N}})} \times {\mathbb {Q}}^{{\mathcal {P}}({\mathbb {N}})} \rightarrow {\mathbb {Q}}^{{\mathcal {P}}({\mathbb {N}})}\) defined by \(\pi ((x, x')) = y\), where \(y_{A} = x_{A-1}\) and \(y_{\{1\} \cup A} = x'_{A-1}\), for all \(A \subset \{2,3,\ldots \}\). Here, we write \(A-1\) for the set \(\{a-1:a\in A\}\). There is a finite version of this map that can be visualized as a concatenation map. For each r, let \(\pi _r : {\mathbb {Q}}^{{\mathcal {P}}[r-1]} \times {\mathbb {Q}}^{{\mathcal {P}}[r-1]} \rightarrow {\mathbb {Q}}^{{\mathcal {P}}[r]}\) be defined by \(\pi _r((x, x')) = y\), where \(y_{A} = x_{A-1}\) and \(y_{\{1\} \cup A} = x'_{A-1}\), for all \(A \subset \{2,3,\ldots ,r\}\). If we place the coordinates of x and \(x'\) in reverse binary order, as per the map \(\{2,\ldots ,r\}\supset A\rightarrow \sum _{a\in A}2^{r-a}\in \{0,1,\ldots ,2^{r-1}-1\}\), then \(\pi _r\) is the concatenation map that generates y by placing first all coordinates of x, followed by all coordinates of \(x'\).
Now one may easily check that \(\pi (V_{j-1} \times V_{j-1}) = V_j\) for all \(j=1,2,\ldots \) Therefore if \(C_1, C_2\) are two cells at level \((j-1)\) in the infinite binary system, then \(\pi (C_1 \times C_2)\) is a cell at level j, and conversely every cell of level j is of this form. The children \(C'\) of C are precisely \(\pi (C'_1 \times C'_2)\) where \(C_1 \rightarrow C'_1\), \(C_2 \rightarrow C'_2\).
The product structure established above manifests itself in a self-similarity property \(\phi _j\approx \phi _{j-1}\). In this section, we will establish the following precise version of this.
Proposition 11.5
Let \(\alpha \in (0,1]\) and consider a vector \({\textbf{x}}=(x_1,x_2,\ldots )\in [0,\alpha ]^{\mathbb {N}}\). In addition, let \(C = \pi (C_1 \times C_2)\) be a cell of level \(j \geqslant 2\). Then we have
In particular, taking \(C = \Gamma _j = \pi (\Gamma _{j-1} \times \Gamma _{j-1})\), we have
Proof
We proceed by induction on j. When \(j = 2\), we proceed by hand. Notice that at level 1, there are three different types of cells, having 4, 2 and 1 elements, respectively. There is only one cell with 4 elements, the cell \(\Gamma _1\); it splits into three cells at level 0: one with two elements, and two unicells (singletons). All other cells at level 1 split into unicells at level 0. Hence, at level 2, there are six different types of cells \(C = \pi (C_1 \times C_2)\) corresponding to the six possibilities for the unordered pair \(\{|C_1|, |C_2|\}\). Their subcells are in 1-1 correspondence with the cells \(\pi (C_1'\times C_2')\), where \(C_1'\) is a subcell of \(C_1\) (at level 0) and \(C_2'\) is a subcell of \(C_2\) (also at level 0).
The three cases with \(\max (|C_1|,|C_2|)\leqslant 2\) are trivial, because we then have that all the cells at level 1 are unicells, and thus we readily find that \(f^C=f^{C_1}f^{C_2}=|C_1|\cdot |C_2|\).
The two other cases with \(|C_1|\leqslant 2\) and \(|C_2|=4\) (so that \(C_2=\Gamma _1\)) are only slightly harder: if \(|C_1|=2\), then \(f^C({\textbf{x}}) = 2 \cdot 2^{x_1} + 4\), \(f^{C_1} = 2\), \(f^{C_2} = 3\) and so the desired inequalities are \(\log 6\leqslant \log (2 \cdot 2^{x_1} + 4) \leqslant \log 6 + x_1 \log 2\), which are immediately seen to be true for all \(x_1 \geqslant 0\). Similarly, if \(|C_1|=1\), then \(f^C({\textbf{x}}) = 2^{x_1} + 2\), \(f^{C_1} = 1\), \(f^{C_2} = 3\), and so the desired inequalities are \(\log 3\leqslant \log (2^{x_1} + 2) \leqslant \log 3 + x_1 \log 2\), which are again true for all \(x_1\geqslant 0\).
A little trickier is the case \(|C_1| = |C_2| = 3\), corresponding to \(C = \Gamma _2 = \pi (\Gamma _1 \times \Gamma _1)\). In this case \(f^C({\textbf{x}}) = 3^{x_1} + 4 \cdot 2^{x_1} + 4\), \(f^{C_1} = f^{C_2} = 3\), so the desired inequalities are \(2\log 3\leqslant \log (3^x + 4 \cdot 2^x + 4) \leqslant 2 \log 3 + x \log 2\). The lower bound is evident. For the upper bound, we must equivalently show that \(g(x) := 5 \cdot 2^x - 3^x - 4\geqslant 0\) for \(x\in [0,1]\). Since \(g(0) = 0\) and \(g'(x) = 5\log 2 \cdot 2^x - \log 3 \cdot 3^x > 0\) for \(x\leqslant 1\), the desired inequality follows.
Now suppose that \(j \geqslant 3\), and assume the result is true for cells at level \((j-1)\). By the recursive definition of \(f^C\), if C is a cell at level j, we have the recurrence
where \(T{\textbf{x}}\) denotes the shift operator \(T{\textbf{x}} := (x_2, x_3, \ldots ).\)
For the upper bound, note that
Recalling that \(x_1\leqslant \alpha \), we conclude that
The lower bound is proven similarly. The result thus follows. \(\square \)
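The inequalities in the case \(j = 2\), \(|C_1| = |C_2| = 3\) of the proof above, including the key fact \(g(x) = 5 \cdot 2^x - 3^x - 4 \geqslant 0\) on [0, 1], are easy to confirm numerically. The grid check below is our own sanity check, not part of the proof.

```python
import math

def g(x):
    # g(x) = 5*2^x - 3^x - 4, from the case |C_1| = |C_2| = 3; g(x) >= 0 on
    # [0,1] is equivalent to log(3^x + 4*2^x + 4) <= 2 log 3 + x log 2.
    return 5 * 2**x - 3**x - 4

for k in range(101):
    x = k / 100
    assert g(x) >= 0
    f = math.log(3**x + 4 * 2**x + 4)
    # two-sided inequality of the j = 2 case (equality at x = 0, so allow
    # a small floating-point slack)
    assert 2 * math.log(3) - 1e-12 <= f <= 2 * math.log(3) + x * math.log(2) + 1e-12
print("j = 2 inequalities of Proposition 11.5 verified on a grid")
```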
11.4 Derivatives and the limit of the \(\rho _i\)
Because of the implicit definition of the parameters \(\rho _i\), the self-similarity property (11.4) is not enough for us by itself. We will also require the following (rather ad hoc) derivative bounds.
Here, and in what follows, \(\partial _m F(y_1,\ldots ) := \frac{\partial F}{\partial y_m}(y_1,\ldots )\), that is to say the derivative of the function F with respect to its mth variable. Thus, for instance,
Proposition 11.6
Set \(\Delta _m := \sup _{j \geqslant 2} \sup _{{\textbf{x}}\in [0,0.31]^{\mathbb {N}}}| \partial _m \phi _j({\textbf{x}})|\). Then \(\Delta _1 < 0.17\), \(\Delta _2 < 0.05\), \(\sum _{m \geqslant 3} \Delta _m < 0.01\) and \(\Delta _m \ll 0.155^m\).
The proof of this proposition is given in Sect. 11.5. Let us now show how this proposition, together with (11.4), implies Proposition 11.1.
Proof of Proposition 11.1
Write \(\varepsilon _i := \rho _{i+1}- \rho _{i}\), \(i = 1,2,3,\ldots \) The \(\rho \)-equation at level \((j+1)\) is
by (11.2). Recall that \(\rho _j\leqslant \rho _1\leqslant 0.31\) for all j, by Lemma 11.2. Hence, two applications of (11.4) (with \(\alpha = 0.31\)) yield the asymptotic formula
Subtracting (11.2), the \(\rho \)-equation at level j, from this gives
Now by the mean value theorem,
and
Therefore, from (11.7), the triangle inequality and the fact that
we have
Now by Lemma 11.4 and Proposition 11.6,
Also, by Proposition 11.6 we have
Assuming that \(j\geqslant j_0\) with \(j_0\) large enough, (11.10) implies a bound
where \(c_1,c_2,\ldots \) are fixed nonnegative constants with \(\sum _{i} c_i< \frac{0.096}{0.104} < 0.93\) and, by Proposition 11.6, \(c_i \leqslant 2^{-i}\) for all \(i \geqslant i_0\) for some \(i_0\). It is convenient to assume that \(i_0, j_0 \geqslant 10\), which we clearly may.
We claim that (11.11) implies exponential decay of the \(\varepsilon _j\), which of course immediately implies Proposition 11.1. To see this, take \(\delta \in (0, \frac{1}{4})\) so small that \(0.94 (1 - \delta )^{-i_0} < 0.99\), and then take \(A \geqslant 100\) large enough that \(|\varepsilon _j| \leqslant A(1- \delta )^j\) for all \(j \leqslant j_0\). We claim that the same bound holds for all j, which follows immediately by induction using (11.11) provided one can show that
for \(j \geqslant j_0\). Since \(\delta < \frac{1}{2}\) and \(A \geqslant 100\), it is enough to show that
The contribution to this sum from \(i \leqslant i_0\) is at most \(0.93 (1 - \delta )^{-i_0}\), whereas the contribution from \(i > i_0\) is (by summing the geometric series) at most
Therefore the desired bound follows from our choice of \(\delta \). \(\square \)
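The contraction mechanism behind this induction is easy to see numerically. In the sketch below the coefficients \(c_i\) are synthetic stand-ins (the true values come from Proposition 11.6) with \(\sum _i c_i = 0.9 < 0.93\); any such choice forces geometric decay of the \(\varepsilon _j\).

```python
# Illustration of the contraction argument: if |eps_{j+1}| is at most
# sum_i c_i |eps_{j-i}| with sum_i c_i < 0.93, the eps_j decay
# geometrically.  The c_i below are synthetic stand-ins.
c = [0.4, 0.3, 0.1, 0.05, 0.05]   # sum = 0.9 < 0.93
eps = [1.0] * len(c)              # arbitrary initial values
for _ in range(300):
    prev = eps[-len(c):]          # the last len(c) values, most recent last
    eps.append(sum(ci * abs(e) for ci, e in zip(c, reversed(prev))))
```

Since each new term is at most 0.9 times the maximum over a bounded window, the sequence shrinks by a fixed factor every few steps, exactly as in the \(A(1-\delta )^j\) bound above.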
11.5 Self-similarity for derivatives
Our remaining task is to prove Proposition 11.6. Once again we use self-similarity of the \(\phi _j\), but now for their derivatives, the key point being that \(\partial _m \phi _j \approx \partial _m \phi _{j-1}\). Here is a precise statement.
Proposition 11.7
Suppose that \(C = \pi (C_1 \times C_2)\) is a cell at level \(j\geqslant 1\). Let \(\alpha \in [0,1)\) and \(m \geqslant 1\), and suppose that \({\textbf{x}}\in [0,\alpha ]^{\mathbb {N}}\). Then we have
In particular, taking \(C = \Gamma _j = \pi (\Gamma _{j-1} \times \Gamma _{j-1})\), we have
Proof
The lower bound follows by noticing that \(\psi ^C\) is increasing in each variable. For the upper bound, we may assume that \(m\leqslant j-1\), for when \(m\geqslant j\), \(\partial _m \phi _j({\textbf{x}})\) is identically zero. We proceed by induction on m, first establishing the case \(m = 1\). Differentiating (11.5) gives
By two applications of the upper bound in Proposition 11.5 (applied to \(C' = \pi (C'_1 \times C'_2)\)), we obtain
On the other hand, for \(i = 1,2\) we get by differentiating the recurrence
with respect to \(x_1\) that
Substituting (11.15) and (11.16) into (11.14) gives
Finally, Proposition 11.5 implies that \(e^{\psi ^{C_1}({\textbf{x}}) + \psi ^{C_2}({\textbf{x}})}\leqslant e^{\psi ^C({\textbf{x}})}\). Dividing both sides by \(e^{\psi ^C({\textbf{x}})}\) gives the result when \(m = 1\).
Now suppose that \(m \geqslant 2\). Differentiating (11.5) with respect to \(x_{m}\) and applying (11.6) gives
By the inductive hypothesis, if \(C' = \pi (C'_1 \times C'_2)\) we have
Also, by the upper bound in Proposition 11.5, we have
Substituting (11.18) and (11.19) into (11.17) and using the assumption that \(0\leqslant x_{1} \leqslant \alpha \) gives
Now, differentiating the recurrence (11.15) with respect to \(x_{m}\) (using (11.6)) gives, for \(i = 1, 2\),
Substituting (11.15) and (11.21) into (11.20), and using once again that \(x_1\leqslant \alpha \), gives
Again, Proposition 11.5 implies that \( e^{\psi ^{C_1}({\textbf{x}}) + \psi ^{C_2}({\textbf{x}}) } \leqslant e^{\psi ^C({\textbf{x}})}\), and so by dividing both sides by \(e^{\psi ^C({\textbf{x}})}\), we obtain the stated result. \(\square \)
Before proving Proposition 11.6, we isolate a lemma.
Lemma 11.8
For \(0 \leqslant x_1\leqslant 0.31\) we have \(0 \leqslant 4\partial _1\phi _2({\textbf{x}}) \leqslant 0.481\).
Proof
We have \(e^{4\phi _2({\textbf{x}})}= 3^{x_1} + 4\cdot 2^{x_1} + 4\), and thus
The lemma is therefore equivalent to
The left-hand side here is increasing in \(x_1\) and, when \(x_1 = 0.31\), it is equal to \(0.480052\ldots \). \(\square \)
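The computation in this proof can be checked mechanically. From \(e^{4\phi _2({\textbf{x}})}= 3^{x_1} + 4\cdot 2^{x_1} + 4\), the quantity \(4\partial _1\phi _2\) is the logarithmic derivative below; it is increasing in \(x_1\) (the derivative of a log-sum-exp is increasing by convexity) and stays below 0.481 on \([0,0.31]\).

```python
import math

# 4 * d/dx phi_2 = d/dx log(3**x + 4*2**x + 4), from the displayed identity
# e^{4 phi_2} = 3^{x1} + 4*2^{x1} + 4.
def four_d1_phi2(x):
    num = math.log(3) * 3**x + 4 * math.log(2) * 2**x
    den = 3**x + 4 * 2**x + 4
    return num / den

# Sample on a grid of [0, 0.31].
vals = [four_d1_phi2(0.31 * k / 100) for k in range(101)]
```

The grid confirms monotonicity and that the value at \(x_1 = 0.31\) is below the claimed threshold 0.481.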
Proof of Proposition 11.6
Henceforth, set \(\alpha := 0.31\) and fix two integers \(m\geqslant 1\) and \(j\geqslant 2\). Our goal is to bound \(\partial _m\phi _j({\textbf{x}})\) uniformly for \({\textbf{x}}\in [0,\alpha ]^{\mathbb {N}}\). We may assume that \(j\geqslant m+1\), as \(\partial _m \phi _j({\textbf{x}})=0\) when \(j\leqslant m\).
Now, let us define
Then, if we apply (11.13) \(\ell \) times, we obtain
Here, we observed that all the \(B_m^{\alpha ^t}\) terms in (11.22) have \(t \geqslant s + 1 - m\); bounding them all above by \(B_m^{\alpha ^{s + 1 - m}}\) then allowed us to sum a geometric series.
Let us fix some \(s\in \{1,2,\ldots ,m+1\}\) independent of j. Then the number \(j-s\) lies in \(\{0,1,\ldots ,j-1\}\). Hence, applying (11.22) with \(\ell = j - s\), and then taking the supremum over all \(j\geqslant m+1\) and all \({\textbf{x}}\in [0,\alpha ]^{\mathbb {N}}\), we find that
When \(m = 1\), we take \(s = 2\). Then Lemma 11.8 and relation (11.23) give
as required. When \(m \geqslant 2\), we take \(s = m\). Then \(\partial _{m}\phi _{s} \equiv 0\) and so (11.23) degenerates to
This gives \(\Delta _2 < 0.05\), and also confirms that \(\Delta _m \ll 0.155^m\). To bound \(\sum _{m \geqslant 3} \Delta _m\) we use (11.24) and the uniform bound \(B_m \leqslant 2^{1/(1 - \alpha )^2}\), obtaining
This completes the proof of Proposition 11.6. \(\square \)
12 Calculating the \(\rho _i\) and \(\rho \)
In this section we conclude our analysis of the parameters \(\rho _1,\rho _2,\ldots \) for the binary flags. The situation so far is that we have shown that these parameters exist, are unique and lie in (0, 0.31). Moreover, their limit \(\rho = \lim _{i \rightarrow \infty } \rho _i\) exists (Proposition 11.1).
None of this helps with actually computing the limit numerically or giving any kind of closed form for it, and the objective of this section is to provide tools for doing that. We prove two main results, Propositions 12.1 and 12.2 below. Recall the convention that \(\rho _0 = 0\).
Proposition 12.1
Recall the convention that \(\rho _0 = 0\). Define a sequence \((a_{i,j})_{i\geqslant 1,\,1\leqslant j\leqslant i+1}\) by the relations \(a_{i,1}=2\), \(a_{i,2}=2+2^{\rho _{i-1}}\) and
Then
In practice, these relations are enough to calculate the \(\rho _j\) to high precision. Indeed, a short computer program produced the data in Table 1. (We suppress any discussion of the numerical precision of our routines.)
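The displayed recurrence (12.1) and equation (12.2) are not reproduced in this excerpt, so the routine below is only a sketch of the scheme such a computation uses: at each level, the relevant equation is monotone in the unknown and is solved by bisection. The equation \(2^t + t = 2\) here is a stand-in, not the actual \(\rho \)-equation.

```python
# Generic bisection solver of the kind used to compute the rho_j to high
# precision from monotone equations of the shape (12.2).  The equation
# solved below is a stand-in for illustration only.
def bisect(f, lo, hi, iters=60):
    """Root of the increasing function f on [lo, hi], assuming f(lo) < 0 < f(hi)."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

root = bisect(lambda t: 2**t + t - 2, 0.0, 1.0)
```

Sixty halvings of the initial interval already give far more precision than the digits reported in Table 1 would require.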
Using Proposition 12.1 we may obtain the following reasonably satisfactory description of \(\rho \), which is equivalent to the statement of Theorem 2 (c).
Proposition 12.2
For each \(t \in (0,1)\), define a sequence \(a_j(t)\) by
Then the limit \(\rho = \lim _{i \rightarrow \infty } \rho _i\) is a solution (in the variable t) to the equation
Furthermore, \(\rho \) is the unique solution to (12.4) in the interval \(0\leqslant t\leqslant 1/3\).
Remark. This is easily seen to be equivalent to Theorem 2 (c), but we have introduced t as a dummy variable since \(\rho \) now has the specific meaning \(\rho = \lim _{i \rightarrow \infty } \rho _i\), and this will avoid confusion in the proof.
Before starting the proofs of Propositions 12.1 and 12.2, let us pause to observe a simple link between the sequences \(a_{i,j}\) and \(a_j(t)\) defined in (12.1) and (12.3) respectively.
Lemma 12.3
For each fixed \(j\geqslant 1\), the limit \(\lim _{i \rightarrow \infty } a_{i,j}\) exists and equals \(a_j(\rho )\).
Proof
The existence of the limit follows by induction on j, using Proposition 11.1, noting that the result is trivial for \(j = 1\) and immediate from Proposition 11.1 when \(j = 2\). The fact that the limit equals \(a_j(\rho )\) then follows immediately by letting \(i \rightarrow \infty \) in (12.1) and comparing with (12.3). \(\square \)
12.1 Product formula for \(f^C(\rho )\) and a double recursion for the \(\rho _i\)
Proposition 12.1 is a short deduction from a product formula for F(g), or equivalently for \(f^C(\varvec{\rho })\), given in Proposition 12.5 below. Whilst it would be a stretch to say that this formula is of independent interest, it is certainly a natural result to prove in the context of our work.
Before we state the formula, the reader should recall the notion of genotype g (Definition 10.1) and of the function F(g) (Proposition 10.4). We require the following further small definition.
Definition 12.4
(Defects) Let \(i,m\in {\mathbb {Z}}_{\geqslant 0}\) and let g be an i-genotype.
(a) If \(m\leqslant i\), then we define the mth consolidation
Otherwise, if \(m \geqslant i+1\), then by convention we define \(g^{(m)}\) to be empty.
(b) For \(m\geqslant 1\), we set
Remark
Note that \(g^{(0)} = g\), \(g^{(1)} = g^*\) and \(g^{(m)} = (g^{(m-1)})^*\). It is easy to see that \(\Delta ^m(g)\) is always a nonnegative integer. Observe that \(\Delta ^{i+1}(g) = 0\) unless \(g = {\mathcal {P}}[i]\), in which case \(\Delta ^{i+1}(g) = 1\), and that \(\Delta ^m(g) = 0\) whenever \(m > i+1\).
Proposition 12.5
Let \(i\in {\mathbb {N}}\) and suppose that g is an i-genotype. Then
with the \(a_{i,m}\) defined as in Proposition 12.1 above.
Proof of Proposition 12.1, given Proposition 12.5
Note that we have \(\Delta ^m({\mathcal {P}}[i]) = 1_{m=i+1}\) for \(1\leqslant m\leqslant i+1\). Together with Proposition 12.5, this implies that \(F({\mathcal {P}}[i]) = a_{i, i+1}\). Thus \(f^{\Gamma _i}({\varvec{\rho }}) = F({\mathcal {P}}[i]) = a_{i, i+1}\). The Eq. (12.2) is then an immediate consequence of the \(\rho \)-equations (9.1). \(\square \)
Before turning to the proof of Proposition 12.5, we isolate a couple of lemmas from the proof.
Lemma 12.6
Let \(\alpha \in {\mathbb {R}}\) and \(i\in {\mathbb {N}}\). Let g be an i-genotype, and suppose that k is an \((i-1)\)-genotype with \(k \leqslant g^*\). Then
Proof
We have \(g=\{A\subset [i-1]:A\in g\}\cup \{A\subset [i-1]:A\cup \{i\}\in g\}\). Hence, if we let
and
then we have \(|g|=2|g^*|+|X|+|Y|\), and thus \(\Delta ^1(g)=|X|+|Y|\).
Now, in order to choose \(g'\leqslant g\) with \((g')^*=k\), we must decide independently for each \(A\subset [i-1]\) whether \(A\in g'\) and/or \(A\cup \{i\}\in g'\). The condition that \(g'\leqslant g\) means that if \(A\notin g\) (resp. if \(A\cup \{i\}\notin g\)), then we are forced to have \(A\notin g'\) (resp. \(A\cup \{i\}\notin g'\)). Let us now examine all admissible options for the conditions “\(A\in g'\)” and “\(A\cup \{i\}\in g'\)”:
- \(A\in k\): since \((g')^*=k\), we are forced to have \(A,A\cup \{i\}\in g'\).
- \(A\in g^*{\setminus } k\): we know in this case that \(A,A\cup \{i\}\in g\), so the condition \(g'\leqslant g\) imposes no further restrictions on the membership of A and of \(A\cup \{i\}\) in \(g'\). On the other hand, we know that \(A\notin k=(g')^*\), and thus at most one out of A and of \(A\cup \{i\}\) may belong to \(g'\).
- \(A\in X\): the condition \(g'\leqslant g\) implies the restriction that \(A\cup \{i\}\notin g'\), and we may then choose freely among the two options of having \(A\in g'\) or \(A\notin g'\).
- \(A\in Y\): the condition \(g'\leqslant g\) implies the restriction that \(A\notin g'\), and we may then choose freely among the two options of having \(A\cup \{i\}\in g'\) or \(A\cup \{i\}\notin g'\).
By the above discussion, we have
Since \(|X|+|Y|=\Delta ^1(g)\), the proof is complete. \(\square \)
For \({\textbf{a}} = (a_1,a_2,\ldots )\), and for some (i-)genotype g, write
(Note that the \(a_m\) here are just parameters, not related to the recursion (12.3), which does not feature in this subsection.) If \(\theta \in {\mathbb {R}}_{> 0}\), define
Lemma 12.7
We have the functional equation
As before, \(T{\textbf{a}}\) denotes the shift operator \(T{\textbf{a}} = (a_2,a_3,\ldots )\).
Proof
Using the relation \(P_{{\textbf{a}}}(g') = a_1^{\Delta ^1(g')} P_{T{\textbf{a}}}((g')^*)\), we have
The result now follows from Lemma 12.6 and a routine short calculation. \(\square \)
We are now in a position to prove Proposition 12.5.
Proof of Proposition 12.5
Let \(a_{i,m}\) be as in the statement of Proposition 12.5, and write \({\textbf{a}}_i = (a_{i,1},a_{i,2},\ldots )\). In the notation introduced above (cf. (12.5)) the claim of Proposition 12.5 is then that
We proceed by induction on i. Let us first consider the base case when \(i=1\).
- If \(g={\mathcal {P}}[1]\), we have \(F(g)=f^{\Gamma _1}(\varvec{\rho })=3\). On the other hand, \(P_{{\textbf{a}}_1}({\mathcal {P}}[1])=a_{1,2}=3\) in this case by the convention that \(\rho _0=0\).
- If \(g\subsetneqq {\mathcal {P}}[1]\), then \(g^*=\emptyset \) and thus \(\Delta ^1(g)=|g|\) and \(\Delta ^2(g)=0\). So we conclude that \(P_{{\textbf{a}}_1}(g)=2^{|g|}\). On the other hand, for all such genotypes, the corresponding cell contains \(2^{|g|}\) elements that all split into unicells at level 0. Consequently, \(F(g)=2^{|g|}=P_{{\textbf{a}}_1}(g)\) in this case too.
Next, suppose that we have the result for \((i-1)\)-genotypes for some \(i\geqslant 2\), and let g be an i-genotype. We know from (10.3) that
By the induction hypothesis, we have \(F(g')^{\rho _{i-1}} = P_{{\textbf{a}}_{i-1}^{\rho _{i-1}}}(g')\) for all \(g'\leqslant g^*\), where \({\textbf{a}}_{i-1}^{\rho _{i-1}}\) is shorthand for \((a_{i-1,1}^{\rho _{i-1}}, a_{i-1,2}^{\rho _{i-1}},\ldots )\). Hence, it follows immediately that
with \(\Phi \) defined in (12.6). The fact that the right-hand side of (12.8) is a product \(P_{*}(g)\) is now clear by an iterated application of Lemma 12.7. To get a handle on exactly which product, suppose that the result of applying Lemma 12.7 a total of \(j-1\) times is that
Thus \(b_{i,1} = \theta _{i,1} = 2\), and we have the relations
and
for \(j\in \{1,\ldots ,i\}\). We claim that \(b_{i,j}=a_{i,j}\) for all \(j\leqslant i+1\). This will complete the proof of Proposition 12.5, because we may then apply (12.9) with \(j=i+1\) to show that
because \(g^{(i+1)}=\emptyset \) for all i-genotypes g.
Let us now prove our claim that \(b_{i,j}=a_{i,j}\) for all \(j\leqslant i+1\). We shall use induction on j. We have \(b_{i,1}=2=a_{i,1}\). In addition, \(b_{i,2}=2+2^{\rho _{i-1}}=a_{i,2}\) by (12.10) with \(j=1\) and by the fact that \(\theta _{i,1}=2\). Now, assume that we have proven that \(b_{i,j}=a_{i,j}\) for some \(j\in \{2,\ldots ,i\}\). Relation (12.11) applied with \(j-1\) in place of j implies that
The right-hand side equals \(b_{i,j}^2=a_{i,j}^2\) by applying (12.10) followed by the induction hypothesis. Thus, \(\theta _{i,j}=a_{i,j}^2-a_{i-1,j-1}^{2\rho _{i-1}}\). Inserting this relation into (12.10) and using the recursive formula (12.1) shows that \(b_{i,j+1}=a_{i,j+1}\). This completes the inductive step and thus the proof of Proposition 12.5. \(\square \)
12.2 A single recurrence for \(\rho \)
In this section we deduce Proposition 12.2 from Proposition 12.1 by a limiting argument.
To carry this out, we will need the following fairly crude estimates for the \(a_{i,j}\) and the \(a_j(t)\), defined in (12.1) and (12.3) respectively.
Lemma 12.8
We have
and
Proof
Since \(\rho _{i-1}<1\) for all \(i\geqslant 1\) (cf. Lemma 11.2), we have \(a_{i,2}<4=a_{i,1}^2\). Hence, the inequality (12.12) follows from a simple induction using (12.1).
Using another simple induction, we readily confirm the inequality \(a_{i,j}\leqslant a_{i,2}^{2^{j-2}}\) in (12.13).
For the lower bound in (12.13), we know from (12.10) and (12.11) and from the fact that \(b_{i,j}=a_{i,j}\) for all \(j\leqslant i+1\) that
and that
for \(j\in \{1,\ldots ,i\}\). By a simple induction, these formulas imply that \(a_{i,j}>1\) and \(\theta _{i,j}>0\) for all \(j\leqslant i+1\), and thus \(\theta _{i,j+1} + 1 \geqslant (\theta _{i,j} + 1)^2\) for \(j=1,2,\ldots ,i\). By yet another induction, we find \(\theta _{i,j} \geqslant 3^{2^{j - 1}} - 1\). Finally, the lower bound on the \(a_{i,j}\) in (12.13) follows from this and (12.14). \(\square \)
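The two inductions at the end of this proof can be checked concretely: starting from \(\theta _{i,1} = 2\), the extremal case of the inequality \(\theta _{i,j+1} + 1 \geqslant (\theta _{i,j} + 1)^2\) is the recurrence \(\theta _{j+1} = (\theta _j + 1)^2 - 1\), which gives exactly \(\theta _j = 3^{2^{j-1}} - 1\).

```python
# Check of the doubly exponential lower bound: with theta_1 = 2 and the
# extremal recurrence theta_{j+1} = (theta_j + 1)^2 - 1, equality
# theta_j = 3^(2^(j-1)) - 1 holds at every step (exact integer arithmetic).
theta = 2
for j in range(1, 8):
    assert theta == 3 ** (2 ** (j - 1)) - 1
    theta = (theta + 1) ** 2 - 1
```

Any admissible \(\theta _{i,j}\) dominates this extremal sequence, which is the content of the lower bound in (12.13).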
Lemma 12.9
Let \(t \in (0,1)\). We have
and
Proof
The inequality (12.16) follows from a simple induction using (12.3), and the upper bound in (12.17) follows with a further induction.
For the lower bound, we first set up relations analogous to (12.14) and (12.15), defining \(\theta _j(t)\) for \(j\geqslant 1\) via the relation
We then note that we also have
Indeed, on the one hand, we have
by (12.3). On the other hand,
by (12.18).
Having proven (12.19), we now proceed analogously to the proof of Lemma 12.8. We have \(a_j(t)>1\) and \(\theta _j(t)>0\) for all \(j\geqslant 1\), by a simple induction using (12.18) and (12.19). Therefore, from (12.19), we have that
By induction, this implies that \(\theta _{j}(t) \geqslant 3^{2^{j - 1}} - 1\). Finally, the lower bound on the \(a_{j}(t)\) in (12.17) follows from this and (12.18). \(\square \)
We are now in a position to prove that the relation
holds with \(t=\rho \), which is one of the main statements of Proposition 12.2. Iterating (12.2) gives
By Proposition 11.1, we have \(\rho _i\rightarrow \rho \). In addition, by Lemma 11.2, we have \(0\leqslant \rho _i\leqslant \rho _1<0.31\) for all i. Thus, taking limits as \(i \rightarrow \infty \) gives
We now derive another expression for the left-hand side of (12.21). A telescoping argument gives
The terms on the right-hand side of (12.22) are rapidly decreasing. Indeed, by (12.12) we have \(1\geqslant a_{i,j+1}/a_{i,j}^2\) for all \(j\geqslant 1\). On the other hand, by (12.1) (with j replaced by \(j+1\) there) and by (12.13), we have
for all \(j\in \{2,\ldots ,i\}\). Since \(\rho _{i-1}\leqslant \rho _1\leqslant 0.31\), we have \(2^{\rho _{i-1}}/3<1/2\). In conclusion,
for all \(j\in \{1,\ldots ,i\}\). By a simple limiting argument using relation (12.22) and Lemma 12.3, we thus find that
Here, we used (12.23) to bound the terms with j large. Comparing this with (12.21) confirms that indeed (12.20) is satisfied with \(t=\rho \).
We turn now to the final statement in Proposition 12.2, the statement that (12.20) has a unique solution in \(t \in [0,\frac{1}{3}]\) (which must, by the above discussion, be \(\rho \)). This is a purely analytic problem. Write
We must show that there is only one solution to \(W(t) = 0\). We already know \(W(\rho )=0\), so it would suffice to show that W is strictly increasing in [0, 1/3]. This would certainly follow if we could show that
for all \(j\geqslant 2\) and all \(0\leqslant t\leqslant t'\leqslant 1/3\). Since the derivative of \(\frac{1}{1 - t/2}\) is bounded below by \(\frac{1}{2}\) on \([0,\frac{1}{3}]\), it is enough to establish the derivative bound
for all \(j \geqslant 2\) and all \(t \in (0,\frac{1}{3})\). The remainder of the section is devoted to proving this bound, which it is convenient to write in the form
where \(\ell _j(t) := a'_j(t)/a_j(t)\).
We begin by observing that, since \(t \in (0,\frac{1}{3})\), we have \(a_2(t) \leqslant 2 + 2^{1/3}\) and so we may upgrade the upper bound in (12.17) to
for \(j \geqslant 2\). Note also that, by induction using (12.18) and (12.19), both \(a_j(t)\) and \(\theta _j(t)\) are increasing functions of t. In particular, \(a_j(t)\) is an increasing function of t so the derivative \(a'_j(t)\) is positive.
Differentiating (12.3) gives
where here and in the next few lines we have omitted the argument (t) from the functions for brevity. The term in parentheses is non-positive by (12.16), and the final term \(- 2t a_{j-1}^{2t} \frac{a_{j-1}'}{a_{j-1}}\) is negative since the derivative \(a'_{j-1}\) is positive. It follows from (12.26) that
A little computation using (12.3) shows that this may equivalently be written as
where we used our notation \(\ell _j = a'_j/a_j\).
Denote
Then (12.27) implies that \(\ell _{j+1}(t) < 2 \ell _j(t) \xi _j\) for all \(t\in [0,1/3]\) and all \(j\geqslant 2\). Telescoping this inequality gives
We have
for all \(t\in [0,1/3]\). Hence, in order to obtain the desired bound (12.24), it is enough to show
The \(\xi _i\) tend to 1 exceptionally rapidly, and crude bounds (together with a little computation) turn out to suffice, as follows.
First, by (12.17) and the fact that \(a_2(t)^{2-t} = (2 + 2^t)^{2 - t} \leqslant 9\) for \(t \in [0,1]\) (a calculus exercise), we have
Second, by the lower bound in (12.17) and by (12.25) we have
We may also check by hand that \(a_1(t)^{2t}/a_2(t)^2=(2^{1-t}+1)^{-2}<1/6\) for all \(t\in [0,1/3]\). Hence,
Third, again by the lower bound in (12.17) and by (12.25), we have
Substituting (12.30), (12.31) and (12.32) into the definition (12.28) gives
Using this bound, one may verify that \(\prod _{j = 2}^{\infty } \xi _j \leqslant 10/9\), which is stronger than the desired bound (12.29), on a pocket calculator or even by hand. For example, we have \(\xi _2\xi _3 \leqslant \frac{46751495}{42169248}\) and can use very crude bounds for the higher terms. Since \(\frac{1}{1 - x} + \frac{x}{6} \leqslant e^{2x}\) for \(0\leqslant x\leqslant 0.1\), taking \(x = 6^{-2^{j-2}}\) gives
for \(j \geqslant 4\). Therefore
This concludes the proof of the final statement in Proposition 12.2.
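The elementary inequality \(\frac{1}{1-x} + \frac{x}{6} \leqslant e^{2x}\) for \(0 \leqslant x \leqslant 0.1\), used for the tail terms above, can itself be checked mechanically; a minimal grid check:

```python
import math

# Check 1/(1-x) + x/6 <= e^(2x) on a fine grid of [0, 0.1].  Near x = 0 the
# two sides behave like 1 + 7x/6 and 1 + 2x, so the inequality has room to
# spare away from the endpoint x = 0 (where it is an equality).
xs = [0.1 * k / 1000 for k in range(1001)]
ok = all(1 / (1 - x) + x / 6 <= math.exp(2 * x) + 1e-12 for x in xs)
```

The small additive tolerance only absorbs floating-point rounding at \(x = 0\), where both sides equal 1.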
12.3 Proof of parts (b) and (c) of Theorem 2
To conclude this paper, we complete the proof of parts (b) and (c) of Theorem 2, as stated at the end of Sect. 1.3. In fact, all of the ingredients have already been assembled and we must simply remark on how they fit together.
First, recall from Definition 9.6 that
Now, it is an easy exercise to see that if \(x_1,x_2,\ldots \) is a sequence of positive real numbers for which \(x = \lim _{i \rightarrow \infty } x_i\) exists and is positive, then
Applying this with \(x_i = 2/\rho _i\) gives, by Proposition 11.1, that
This, together with Proposition 12.2, completes the proof of Theorem 2.
Notes
A property of natural numbers is said to occur for almost all n if the number of exceptions below x is o(x) as \(x\rightarrow \infty \).
The factor \(3^{m-1}\) is missing in the stated lower bounds for \(\alpha _k\) in [26].
We use \(1_E\) for the indicator function of a statement E; that is, \(1_E=1\) if E is true and \(1_E=0\) if E is false.
We say that the random variable X has the distribution \(\textrm{NB}(r,p)\) with \(r\in {\mathbb {N}}\) and \(p\in (0,1]\) if X takes values in \({\mathbb {Z}}_{\geqslant 0}\) with the following frequency: \({\mathbb {P}}(X=k)=\left( {\begin{array}{c}k+r-1\\ r-1\end{array}}\right) (1-p)^kp^r\) for each \(k\in {\mathbb {Z}}_{\geqslant 0}\).
In the literature, the term “flag” means that the inclusions are proper, i.e., \(\dim (V_{i+1}) > \dim V_i\) for all i. In this paper, we will use the term more broadly to refer to an arbitrary nested sequence of subspaces.
Here and throughout the paper, \({\text {Span}}(v_1,\ldots )\) denotes the \({\mathbb {Q}}\)-span of vectors \(v_1,\ldots \).
Note that we have not said that the \(\rho _i\) are unique. However, in cases of interest to us this will turn out to be the case.
The term genotype is appropriate, as each component in g acts like a recessive gene with respect to child cells.
References
Alon, N., Spencer, J.H.: The Probabilistic Method. Wiley Series in Discrete Mathematics and Optimization, fourth edn. John Wiley & Sons Inc, Hoboken, NJ (2016)
Arratia, R., Barbour, A.D., Tavaré, S.: On random polynomials over finite fields. In: Mathematical Proceedings of the Cambridge Philosophical Society, vol. 114, pp. 347–368 (1993)
Arratia, R., Tavaré, S.: The cycle structure of random permutations. Ann. Probab. 20, 1567–1591 (1992)
Elliott, P.D.T.A.: Probabilistic Number Theory. I. Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], vol. 239. Springer-Verlag, New York (1979). (Mean-value theorems)
Elliott, P.D.T.A.: Probabilistic Number Theory. II. Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], vol. 240. Springer-Verlag, New York (1980). (Central limit theorems)
Erdős, P., Hall, R.R.: The propinquity of divisors. Bull. Lond. Math. Soc. 11, 304–307 (1979)
Erdős, P., Nicolas, J.-L.: Répartition des nombres superabondants. Bull. Soc. Math. Fr. 103, 65–90 (1975)
Erdős, P., Nicolas, J.-L.: Méthodes probabilistes et combinatoires en théorie des nombres. Bull. Sci. Math. (2) 100, 301–320 (1976)
Erdős, P.: On the density of some sequences of integers. Bull. Am. Math. Soc. 54, 685–692 (1948)
Erdős, P.: On some applications of probability to analysis and number theory. J. Lond. Math. Soc. 39, 692–696 (1964)
Ford, K.: Joint Poisson distribution of prime factors in sets. Math. Proc. Camb. Philos. Soc. 173, 189–200 (2022)
Hall, R.R., Tenenbaum, G.: On the average and normal orders of Hooley’s \(\Delta \)-function. J. Lond. Math. Soc. (2) 25, 392–406 (1982)
Hall, R.R., Tenenbaum, G.: The average orders of Hooley’s \(\Delta _{r}\)-functions. Mathematika 31, 98–109 (1984)
Hall, R.R., Tenenbaum, G.: The average orders of Hooley’s \(\Delta _r\)-functions. II. Compos. Math. 60, 163–186 (1986)
Hall, R.R., Tenenbaum, G.: Divisors. Cambridge Tracts in Mathematics, vol. 90. Cambridge University Press, Cambridge (1988)
Hooley, C.: On a new technique and its applications to the theory of numbers. Proc. Lond. Math. Soc. (3) 38, 115–151 (1979)
Koukoulopoulos, D.: Localized factorizations of integers. Proc. Lond. Math. Soc. 101, 392–426 (2010)
Koukoulopoulos, D.: On the number of integers in a generalized multiplication table. J. Reine Angew. Math. 689, 33–99 (2014)
Levin, D.A., Peres, Y., Wilmer, E.L.: Markov Chains and Mixing Times. American Mathematical Society, Providence, RI (2009). With a chapter by James G. Propp and David B. Wilson
Maier, H., Tenenbaum, G.: On the set of divisors of an integer. Invent. Math. 76, 121–128 (1984)
Maier, H., Tenenbaum, G.: On the normal concentration of divisors. J. Lond. Math. Soc. (2) 31, 393–400 (1985)
Maier, H., Tenenbaum, G.: On the normal concentration of divisors. II. Math. Proc. Camb. Philos. Soc. 147, 513–540 (2009)
Tenenbaum, G.: Sur la concentration moyenne des diviseurs. Comment. Math. Helv. 60, 411–428 (1985)
Tenenbaum, G.: Fonctions \(\Delta \) de Hooley et applications. In: Séminaire de théorie des nombres, Paris 1984–85. Progress in Mathematics, vol. 63, pp. 225–239. Birkhäuser, Boston, MA (1986)
Tenenbaum, G.: Crible d’ératosthène et modèle de Kubilius. In: Number Theory in Progress (Zakopane-Kościelisko, 1997), vol. 2, pp. 1099–1129. de Gruyter, Berlin (1999)
Tenenbaum, G.: Some of Erdős’ unconventional problems in number theory, thirty-four years later. In: Erdős centennial. Bolyai Society Mathematical Studies, vol. 25, pp. 651–681. János Bolyai Mathematical Society, Budapest (2013)
Acknowledgements
This collaboration began at the MSRI program on Analytic Number Theory, which took place in the first half of 2017 and which was supported by the National Science Foundation under Grant No. DMS-1440140. All three authors are grateful to MSRI for allowing us the opportunity to work together. The project was completed during a visit of KF and DK to Oxford in the first half of 2019. Both authors are grateful to the University of Oxford for its hospitality. KF is supported by the National Science Foundation Grants DMS-1501982 and DMS-1802139. In addition, his stay at Oxford in early 2019 was supported by a Visiting Fellowship at Magdalen College Oxford. BG is supported by a Simons Investigator Grant, which also funded DK’s visit to Oxford. DK is also supported by the Courtois Chair II in fundamental research, by the Natural Sciences and Engineering Research Council of Canada (RGPIN-2018-05699) and by the Fonds de recherche du Québec - Nature et technologies (2019-PR-256442 and 2022-PR-300951).
Appendices
Appendix A. Some probabilistic lemmas
Throughout this section, \({\textbf{A}}\subset {\mathbb {N}}\) will be a random set, with \({\mathbb {P}}(i \in {\textbf{A}}) = 1/i\) and these choices being independent for different values of i.
Lemma A.1
For any finite subset \(B \subset {\mathbb {Z}}_{\geqslant 4}\) and any \(k\in {\mathbb {Z}}_{\geqslant 0}\), we have
where
Proof
The result follows by a standard inclusion-exclusion argument. We have
For the lower bound, we note that
Since \(\sum _{a\in B}1/(a-1)^2 < 1/(\min B-2)\leqslant 4/\min B\), the proof is complete. \(\square \)
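The displayed statement of Lemma A.1 is not reproduced above, but its shape is a Poisson approximation for \(\# ({\textbf{A}}\cap B)\). The sketch below compares the exact law (computed by the standard Poisson-binomial recurrence) with a Poisson distribution of parameter \(\lambda = \sum _{a\in B} 1/(a-1)\), a natural choice here since \((1/a)/(1-1/a) = 1/(a-1)\); the particular set B and the tolerance are illustrative.

```python
import math

# P(i in A) = 1/i, independently.  dist[k] = exact P(#(A ∩ B) = k), built
# up one element of B at a time (Poisson-binomial recurrence).
B = list(range(200, 1000))
dist = [1.0]
for a in B:
    p = 1 / a
    new = [0.0] * (len(dist) + 1)
    for k, q in enumerate(dist):
        new[k] += q * (1 - p)      # a not selected
        new[k + 1] += q * p        # a selected
    dist = new

# Poisson parameter suggested by (1/a)/(1 - 1/a) = 1/(a - 1).
lam = sum(1 / (a - 1) for a in B)
```

For this B, the exact probabilities for small k agree with the Poisson values to within a few percent, consistent with an error term controlled by \(\sum _{a\in B} 1/(a-1)^2\).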
Lemma A.2
Uniformly for \(B\subset {\mathbb {N}}\) with \(\lambda :=\sum _{m\in B}1/m\geqslant 1\) and \(0\leqslant \varepsilon \leqslant 1\), we have
Proof
This follows by the upper bound in Lemma A.1 with standard bounds on the tails of the Poisson distribution, e.g. Norton’s bounds [15, Theorem 09]. \(\square \)
Lemma A.3
For any \(x>0\) and finite set \(B\subset {\mathbb {N}}\),
Proof
The random variable \(\# ({\textbf{A}}\cap B)\) is the sum of independent Bernoulli random variables and thus
Note that all factors are positive because \(x>0\). The lemma now follows from the inequality \(1+y\leqslant e^y\), valid for all real y. \(\square \)
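The two steps of this proof can be checked numerically: since \(\# ({\textbf{A}}\cap B)\) is a sum of independent Bernoulli\((1/b)\) variables, its moment generating function is the exact product below, and the exponential bound follows from \(1+y\leqslant e^y\). The set B and test points x are illustrative.

```python
import math

# E[x^{#(A ∩ B)}] = prod_{b in B} ((1 - 1/b) + x/b) = prod (1 + (x-1)/b),
# which is at most exp((x - 1) * sum_{b in B} 1/b) by 1 + y <= e^y.
B = range(10, 100)
lam = sum(1 / b for b in B)
checks = []
for x in (0.5, 1.0, 2.0, 5.0):
    mgf = math.prod(1 + (x - 1) / b for b in B)
    checks.append(mgf <= math.exp((x - 1) * lam) * (1 + 1e-12))
```

Note that for \(x = 1\) both sides equal 1, matching the equality case \(y = 0\) of \(1+y\leqslant e^y\).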
Lemma A.4
Let \(k\in {\mathbb {N}}\), and let B and G be finite sets such that \(B\subset G\subset {\mathbb {Z}}_{\geqslant 4}\) and
Then
Proof
Since \(|B|=k\), we have
The denominator is estimated using Lemma A.1, whereas for the numerator we simply note that
This completes the proof of the lemma. \(\square \)
Lemma A.5
Given \(0<c<1\) and \(D \geqslant e^{100/c}\), the probability that \({\textbf{A}}\subset (D^c,D]\) satisfies
is \(\geqslant 1 - O(e^{-(1/4) (\log D)^{1/2}})\).
Proof
It suffices to bound the probability that
whenever \(\alpha \log D, \beta \log D \in {\mathbb {N}}\). The random variable \(N=N(\alpha ,\beta ):= \# ({\textbf{A}}\cap (D^{\alpha },D^{\beta }])\) is the sum of Bernoulli random variables and has expectation \({\mathbb {E}}N = M+O(1)\), where
By Lemma A.3, \({\mathbb {E}}\lambda ^{ N}\leqslant e^{(\lambda -1) {\mathbb {E}}N}\). Thus, for \(y = (\log D)^{3/4}\) and \(\lambda _j = 1+ (-1)^j \frac{y}{\log D}\) we have
Summing over all possible \(\alpha ,\beta \) completes the proof. \(\square \)
Lemma A.6
Uniformly for \(X \geqslant 2\) and \(K\geqslant 2\) we have
with probability \(\geqslant 1-e^{2 - K}\).
Proof
We use Chernoff’s inequality, often called Rankin’s trick in this context:
because \(e^t\leqslant 1+2t\) for all \(t\in [0,1]\). This concludes the proof. \(\square \)
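The elementary bound \(e^t \leqslant 1 + 2t\) on \([0,1]\) used in the final step can be confirmed on a grid; the gap \(1 + 2t - e^t\) vanishes at \(t = 0\) and is maximal at \(t = \log 2\).

```python
import math

# Check e^t <= 1 + 2t for t in [0, 1] on a fine grid; the tiny tolerance
# only absorbs floating-point rounding at the equality point t = 0.
ok_exp = all(math.exp(k / 1000) <= 1 + 2 * (k / 1000) + 1e-12 for k in range(1001))
```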
Lemma A.7
Let \(\eta \in [0,1]\) and let \(J_1,\ldots , J_d \subset {\mathbb {N}}\) be mutually disjoint intervals. Suppose that \(X \subset J_1 \times \cdots \times J_d\) is a set of size at least \(\eta \prod _i \max J_i\). If \(\min _i |J_i|\) is sufficiently large in terms of \(\eta \) and d, then with probability \(\geqslant (\eta /4)^d\), there are distinct elements \(a_i \in {\textbf{A}}\) with \((a_1,\ldots , a_d) \in X\).
Proof
Let \(M_i = \max J_i\) for each i. We will prove the lemma by induction on d.
The case \(d = 1\) follows by direct calculation: Suppose that \(X \subset J_1\) has size \(\geqslant \eta M_1\). Then
Let us now assume we have proven the lemma for \(d-1\) intervals, and let us prove it for d intervals \(J_1,\ldots ,J_d\). For each \(j_1\in J_1\), we set
Let \(Y=\{j_1\in J_1:|X_{j_1}|\geqslant (\eta /2)M_1\}\). Then \(|Y|\geqslant (\eta /2)M_1\), because otherwise we would have \(|X|<\eta \prod _i M_i\), a contradiction to our hypotheses. By the case \(d = 1\) (just described), \({\textbf{A}}\cap Y\) is nonempty with probability \(\geqslant \eta /4\). Fix some \(a_1 \in {\textbf{A}}\cap Y\). Then, by the inductive hypothesis and the fact that the \(J_i\) are disjoint, with probability \(\geqslant (\eta /4)^{d-1}\), independent of the choice of \(a_1\), there are elements \(a_i \in {\textbf{A}}\cap J_i\), \(i = 2,\ldots , d\) with \((a_2,\ldots , a_d) \in X_{a_1}\), and therefore \((a_1,\ldots , a_d) \in X\). The disjointness of the \(J_i\) of course guarantees that the \(a_i\) are all distinct. This completes the proof. \(\square \)
Lemma A.8
If \(X_j,Y_j\) live on the same discrete probability space for \(1\leqslant j\leqslant k\), and furthermore \(X_1,\ldots ,X_k\) are independent, and \(Y_1,\ldots ,Y_k\) are also independent, then
Proof
We begin with the following identity
Denoting by \(\Omega \) the domain of \((X_1,\ldots ,X_k)\), and writing \(a_i={\mathbb {P}}(X_i=\omega _i)\), \(b_i={\mathbb {P}}(Y_i=\omega _i)\), we then have
\(\square \)
Appendix B. Basic properties of entropy
The notion of entropy plays a key role in our paper. In this appendix we record the key facts about it that we need. Proofs may be found in many places. One convenient resource is [1].
If X is a random variable taking values in a finite set then we define
where the log is to base e and the summation runs over the range of X.
If \({\textbf{p}} = (p_1,\ldots , p_n)\) is a vector of probabilities (that is, if \(p_1,\ldots , p_n \geqslant 0\) and \(p_1 + \cdots + p_n = 1\)), then we write
There should be no danger of confusing the two slightly different usages.
Our first lemma gives a simple upper bound for multinomial coefficients in terms of entropies.
Lemma B.1
Let \(n, n_1,\ldots , n_k\) be non-negative integers with \(\sum n_i = n\). Then
where \({\textbf{p}} = (p_1,\ldots , p_k)\) with \(p_i := n_i/n\).
Proof
The right-hand side is \((n/n_1)^{n_1} \ldots (n/n_k)^{n_k}\). Now simply observe that
\(\square \)
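Lemma B.1, in the form \(\binom {n}{n_1,\ldots ,n_k} \leqslant e^{n {\mathbb {H}}({\textbf{p}})}\), is easy to test numerically. The following Python sketch (illustrative; the choice \(n = 20\), \((n_1,n_2,n_3) = (5,7,8)\) is arbitrary) compares the two sides.

```python
from math import log, exp, factorial

def multinomial(n, parts):
    """The multinomial coefficient n! / (n_1! ... n_k!)."""
    out = factorial(n)
    for m in parts:
        out //= factorial(m)
    return out

def entropy(p):
    """Entropy H(p) = -sum p_i log p_i (natural log), ignoring zeros."""
    return -sum(x * log(x) for x in p if x > 0)

n, parts = 20, [5, 7, 8]
lhs = multinomial(n, parts)                      # the multinomial coefficient
rhs = exp(n * entropy([m / n for m in parts]))   # the entropy upper bound
```

With these values \(\mathrm {lhs} \approx 10^8\) while \(\mathrm {rhs} \approx 2.4 \times 10^9\), consistent with the lemma.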
Our next lemma is a simple and well-known upper bound for the entropy.
Lemma B.2
Let X be a random variable taking values in a set of size N. Then \({\mathbb {H}}(X) \leqslant \log N\).
Proof
This follows immediately from the convexity of the function \(L(x) = -x \log x\) and Jensen’s inequality. See [1, Lemma 14.6.1 (i)]. \(\square \)
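Numerically, the bound \({\mathbb {H}}(X) \leqslant \log N\) is attained exactly by the uniform distribution and is strict otherwise. A minimal Python check (ours; the skewed distribution is an arbitrary example):

```python
from math import log

def entropy(p):
    """Entropy of a probability vector, natural logarithm."""
    return -sum(x * log(x) for x in p if x > 0)

N = 5
uniform = [1 / N] * N                       # attains H = log N
skewed = [0.5, 0.2, 0.15, 0.1, 0.05]        # strictly below log N
```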
The next lemma is simple and has no doubt appeared elsewhere, but we do not know an explicit reference. In its statement, we use the notation \(\langle {\textbf{a}},{\textbf{p}}\rangle = \sum _{i = 1}^n a_i p_i\).
Lemma B.3
Let \({\textbf{p}} = (p_1,\ldots , p_n)\) be a vector of probabilities, and let \({\textbf{a}} = (a_1,\ldots , a_n)\) be a vector of real numbers. Then
and equality occurs if and only if \(p_j = e^{a_j} / \sum _{i=1}^n e^{a_i}\) for all j.
Proof
Let us begin by recalling that if \(t_1,\ldots ,t_n>0\) are such that
then the concavity of the logarithm implies that
for all \(x_1,\ldots ,x_n>0\). In addition, equality occurs in (B.1) if and only if \(x_1=\cdots =x_n\). One may also prove this fact by induction on n, and by noticing that the case \(n=2\) is equivalent to having \(u^t\leqslant tu+ 1-t\) for all \(u>0\) and all \(t\in (0,1)\), with equality occurring if and only if \(u=1\).
Let us now prove the lemma. If \(p_j=1\) for some j, then \({\mathbb {H}}({\textbf{p}})+\langle {\textbf{a}}, {\textbf{p}}\rangle = a_j\). If \(n=1\), then this is equal to \(\log (\sum _{i=1}^ne^{a_i})\), whereas if \(n\geqslant 2\), then we have \(a_j<\log (\sum _{i=1}^n e^{a_i})\), so that the lemma holds in both cases. Assume now that \(p_j\in (0,1)\) for all j. We then have
We may then use (B.1) with \(t_j=p_j\) and \(x_j=e^{a_j}/p_j\) to complete the proof of the lemma. \(\square \)
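Lemma B.3 is the Gibbs variational principle: \({\mathbb {H}}({\textbf{p}}) + \langle {\textbf{a}},{\textbf{p}}\rangle \leqslant \log \sum _i e^{a_i}\), with equality exactly at the "softmax" weights \(p_j = e^{a_j}/\sum _i e^{a_i}\). The following Python sketch (ours, with an arbitrary vector \({\textbf{a}}\)) verifies both the equality case and the strict inequality for a different distribution.

```python
from math import log, exp

def entropy(p):
    return -sum(x * log(x) for x in p if x > 0)

a = [0.3, -1.2, 2.0, 0.0]
Z = sum(exp(ai) for ai in a)
log_Z = log(Z)

def objective(p):
    """H(p) + <a, p>, the left-hand side of Lemma B.3."""
    return entropy(p) + sum(ai * pi for ai, pi in zip(a, p))

gibbs = [exp(ai) / Z for ai in a]   # the maximiser predicted by the lemma
other = [0.25] * 4                  # any other distribution falls short
```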
The next lemma, known as the chain rule for entropy, is nothing more than a short computation.
Lemma B.4
Let X, Y be random variables taking values in finite sets. Then
Remark
The sum over y is usually written \({\mathbb {H}}(X | Y)\) and called the conditional entropy.
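The chain rule \({\mathbb {H}}(X,Y) = {\mathbb {H}}(Y) + \sum _y {\mathbb {P}}(Y=y)\,{\mathbb {H}}(X \,|\, Y = y)\) can be verified on any small joint distribution; the following Python sketch (illustrative, with an arbitrary joint law) does so.

```python
from math import log
from collections import defaultdict

def entropy(dist):
    """Entropy of a distribution given as a dict of probabilities."""
    return -sum(p * log(p) for p in dist.values() if p > 0)

# A small joint distribution P(X = x, Y = y).
joint = {(0, 'a'): 0.1, (0, 'b'): 0.3, (1, 'a'): 0.2, (1, 'b'): 0.4}

marg_Y = defaultdict(float)
for (x, y), p in joint.items():
    marg_Y[y] += p

# Conditional entropy H(X | Y) = sum_y P(Y=y) H(X | Y=y).
cond = 0.0
for y, py in marg_Y.items():
    cond_dist = {x: p / py for (x, yy), p in joint.items() if yy == y}
    cond += py * entropy(cond_dist)

chain_lhs = entropy(joint)              # H(X, Y)
chain_rhs = entropy(marg_Y) + cond      # H(Y) + H(X | Y)
```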
We will apply the preceding result together with the following observation.
Lemma B.5
Suppose that X, Y are random variables with finite ranges and that Y is a deterministic function of X. Then \({\mathbb {H}}(X,Y) = {\mathbb {H}}(X)\).
Proof
This follows from Lemma B.4 with the role of X and Y reversed, since all the entropies \({\mathbb {H}}(Y | X = x)\) are zero. \(\square \)
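Concretely, if \(Y = f(X)\) then the joint law of (X, Y) is just a relabelling of the law of X, so the two entropies agree. A two-line Python check (ours; \(f(x) = x \bmod 2\) is an arbitrary deterministic function):

```python
from math import log

def entropy(dist):
    return -sum(p * log(p) for p in dist.values() if p > 0)

pX = {0: 0.2, 1: 0.3, 2: 0.5}
f = lambda x: x % 2                          # Y = f(X) is determined by X
joint = {(x, f(x)): p for x, p in pX.items()}  # law of (X, Y)
```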
The next result, known as the submodularity property of entropy, is a crucial ingredient in our paper.
Lemma B.6
Let X, Y, Z be any random variables taking values in finite sets. Then
Proof
This is [1, Lemma 14.6.1 (iv)].\(\square \)
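Submodularity, in the form \({\mathbb {H}}(X,Y,Z) + {\mathbb {H}}(Z) \leqslant {\mathbb {H}}(X,Z) + {\mathbb {H}}(Y,Z)\), holds for every joint distribution, so it can be checked on a randomly generated one. The following Python sketch (ours, illustrative) draws a random law on \(\{0,1\}^3\) and compares the two sides.

```python
import random
from math import log
from collections import defaultdict

def entropy(dist):
    return -sum(p * log(p) for p in dist.values() if p > 0)

def marginal(joint, coords):
    """Marginal of a joint dict-distribution onto the given coordinates."""
    m = defaultdict(float)
    for key, p in joint.items():
        m[tuple(key[i] for i in coords)] += p
    return m

rng = random.Random(1)
keys = [(x, y, z) for x in range(2) for y in range(2) for z in range(2)]
weights = [rng.random() for _ in keys]
total = sum(weights)
joint = {k: w / total for k, w in zip(keys, weights)}  # random law of (X,Y,Z)

lhs = entropy(joint) + entropy(marginal(joint, (2,)))               # H(X,Y,Z)+H(Z)
rhs = entropy(marginal(joint, (0, 2))) + entropy(marginal(joint, (1, 2)))  # H(X,Z)+H(Y,Z)
```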
Appendix C. Maier–Tenenbaum flags
The purpose of this appendix is to say a little more about the bound (3.12), which corresponds in the language of this paper to [22, Theorem 1.4]. Numerically, this bound is \({{\tilde{\gamma }}}_{2^r} \gg (0.12885796477\ldots )^r\), which is a little weaker than the bound leading to Theorem 2, which is \({{\tilde{\gamma }}}_{2^r} \gg (0.140605674848\ldots )^r\). What is interesting, however, is that the flags \({\mathscr {V}}\) which lead to (3.12) are completely different to the binary flags which have been the main focus of our paper. The fact that these very different flags – the “Maier–Tenenbaum flags” – lead to a result which appears to be within 10 % of optimal suggests that they will have a key role to play in any future upper bound arguments for these questions.
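The "within 10%" claim is a quick arithmetic check on the two constants quoted above (Python, purely illustrative):

```python
c_mt = 0.12885796477      # constant from the Maier–Tenenbaum flags, cf. (3.12)
c_bin = 0.140605674848    # constant from the binary flags, cf. Theorem 2

gap = 1 - c_mt / c_bin    # relative shortfall of the Maier–Tenenbaum bound
```

The shortfall works out to roughly 8.4%, i.e. comfortably within 10%.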
Definition C.1
(Maier–Tenenbaum flag of order r) Let \(k = 2^r\) be a power of two. Identify \({\mathbb {Q}}^k\) with \({\mathbb {Q}}^{{\mathcal {P}}[r]}\) and define a flag \({\mathscr {V}}\), \(\langle {\textbf{1}}\rangle = V_0 \leqslant V_1 \leqslant \cdots \leqslant V_r \leqslant {\mathbb {Q}}^{{\mathcal {P}}[r]}\), as follows: \(V_i = {\text {Span}}({\textbf{1}}, \omega ^1,\ldots , \omega ^{i})\), where \(\omega ^i_S = 1_{i \in S}\) for \(S \subset [r]\).
Remark
We have \(\dim (V_i) = i+1\) and in particular \(V_r\) is much smaller than \({\mathbb {Q}}^k\), in contrast to the situation for binary systems. We leave it to the reader to check that \({\mathscr {V}}\) is nondegenerate.
Recall that \({\mathscr {V}}\) gives rise to a tree structure, with the cells at level i being the intersections of cosets \(x + V_i\) with the cube \(\{0,1\}^k\) (cf. Sect. 7.2). It is easy to check that this tree structure has a very simple form: the cell \(\Gamma _i = V_i \cap \{0,1\}^k\) is \(\{{\textbf{0}},{\textbf{1}}, \omega ^1, {\textbf{1}} - \omega ^1,\ldots , \omega ^i, {\textbf{1}} - \omega ^i\}\), and it divides into three children at level \(i-1\): the cell \(\Gamma _{i-1}\) together with the two singletons \(\{ \omega ^i\}\) and \(\{{\textbf{1}} - \omega ^i\}\).
The recursive definition of the quantities \(f^C({\varvec{\rho }})\) (see (7.4)) therefore becomes \(f^{\Gamma _1}({\varvec{\rho }}) = 3\),
In addition, the \(\rho \)-equations (7.5) become
On the one hand, iterating (C.2) yields that
for all \(j\geqslant 1\). On the other hand, combining (C.1) and (C.2), we find that
and thus
for all \(j\geqslant 1\). Hence, we obtain the formulas
Let us also note that the above discussion implies that
Now, assuming that the conditions of Proposition 7.7 hold, we therefore have
Now it can be shown by explicit calculation that the conditions of Proposition 7.7 do hold. The optimal measures \(\mu _i^*\) are all induced from the measure \(\mu ^*\) in which
In addition, we have
We may then prove by a slightly lengthy computation whose details we leave to the reader that the optimal parameters \({{\textbf{c}}}^*\) are given by
It can also be shown that \(\gamma ^{{\text {res}}}_k({\mathscr {V}}) = \gamma _k({\mathscr {V}})\), by showing that the full entropy condition (3.6) follows from the restricted conditions (7.11). This is a little involved, but a fairly direct inductive argument can be made to work and this is certainly less subtle than the arguments of Sect. 8. In this way one may establish the bound
Finally, a relatively routine perturbative argument yields the same bound for \({{\tilde{\gamma }}}_{2^r}\).
It will be noted that (C.4) is strictly stronger than (3.12), the bound obtained in [22]. This is because, in essence, Maier and Tenenbaum chose slightly suboptimal measures and parameters on the system \({\mathscr {V}}\), roughly corresponding to \(\mu (\omega ^j) \sim 3^{j - r-1}\), which then leads to \(c_j \sim \big ( \frac{1 - 1/\log 3}{1 - 1/\log 27} \big )^j\).
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Ford, K., Green, B. & Koukoulopoulos, D. Equal sums in random sets and the concentration of divisors. Invent. math. 232, 1027–1160 (2023). https://doi.org/10.1007/s00222-022-01177-y