Variants of the Selberg sieve, and bounded intervals containing many primes

Polymath, DHJ

doi:10.1186/s40687-014-0012-7

Variants of the Selberg sieve, and bounded intervals containing many primes

Research Article
Open access
Published: 17 October 2014

Volume 1, article number 12, (2014)
Cite this article

Download PDF

You have full access to this open access article

Research in the Mathematical Sciences Aims and scope Submit manuscript

Variants of the Selberg sieve, and bounded intervals containing many primes

Download PDF

DHJ Polymath¹

14k Accesses
83 Citations
20 Altmetric
1 Mention
Explore all metrics

An Erratum to this article was published on 30 July 2015

Abstract

For any m≥1, let H_m denote the quantity ${liminf}_{n \to \infty} (p_{n + m} - p_{n})$ . A celebrated recent result of Zhang showed the finiteness of H₁, with the explicit bound H₁≤70,000,000. This was then improved by us (the Polymath8 project) to H₁≤4680, and then by Maynard to H₁≤600, who also established for the first time a finiteness result for H_m for m≥2, and specifically that H_m≪m³e^4m. If one also assumes the Elliott-Halberstam conjecture, Maynard obtained the bound H₁≤12, improving upon the previous bound H₁≤16 of Goldston, Pintz, and Yıldırım, as well as the bound H_m≪m³e^2m.

In this paper, we extend the methods of Maynard by generalizing the Selberg sieve further and by performing more extensive numerical calculations. As a consequence, we can obtain the bound H₁≤246 unconditionally and H₁≤6 under the assumption of the generalized Elliott-Halberstam conjecture. Indeed, under the latter conjecture, we show the stronger statement that for any admissible triple (h₁,h₂,h₃), there are infinitely many n for which at least two of n+h₁,n+h₂,n+h₃ are prime, and also obtain a related disjunction asserting that either the twin prime conjecture holds or the even Goldbach conjecture is asymptotically true if one allows an additive error of at most 2, or both. We also modify the ‘parity problem’ argument of Selberg to show that the H₁≤6 bound is the best possible that one can obtain from purely sieve-theoretic considerations. For larger m, we use the distributional results obtained previously by our project to obtain the unconditional asymptotic bound $H_{m} ≪ m e^{(4 - \frac{28}{157}) m}$ or H_m≪m e^2m under the assumption of the Elliott-Halberstam conjecture. We also obtain explicit upper bounds for H_m when m=2,3,4,5.

Bounded intervals containing many primes

Article 03 November 2016

A higher rank Selberg sieve and applications

Article 12 December 2017

Large prime gaps and probabilistic models

Article Open access 14 June 2023

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Background

For any natural number m, let H_m denote the quantity

H_{m} : = \underset{n \to \infty}{liminf} (p_{n + m} - p_{n}),

where p_n denotes the n th prime. The twin prime conjecture asserts that H₁=2; more generally, the Hardy-Littlewood prime tuples conjecture [1] implies that H_m=H(m+1) for all m≥1, where H(k) is the diameter of the narrowest admissible k-tuple (see the ‘Outline of the key ingredients’ section for a definition of this term). Asymptotically, one has the bounds

(\frac{1}{2} + o (1)) k log k \leq H (k) \leq (1 + o (1)) k log k

as k→∞ (see Theorem 17 below); thus, the prime tuples conjecture implies that H_m is comparable to m logm as m→∞.

Until very recently, it was not known if any of the H_m were finite, even in the easiest case m=1. In the breakthrough work of Goldston et al. [2], several results in this direction were established, including the following conditional result assuming the Elliott-Halberstam conjecture EH[ 𝜗] (see Claim 8 below) concerning the distribution of the prime numbers in arithmetic progressions:

Theorem 1(GPY theorem).

Assume the Elliott-Halberstam conjecture EH[ 𝜗] for all 0<𝜗<1. Then, H₁≤16.

Furthermore, it was shown in [2] that any result of the form $EH [\frac{1}{2} + 2 ϖ]$ for some fixed 0<ϖ<1/4 would imply an explicit finite upper bound on H₁ (with this bound equal to 16 for ϖ>0.229855). Unfortunately, the only results of the type EH[ 𝜗] that are known come from the Bombieri-Vinogradov theorem (Theorem 9), which only establishes EH[ 𝜗] for 0<𝜗<1/2.

The first unconditional bound on H₁ was established in a breakthrough work of Zhang [3]:

Theorem 2(Zhang’s theorem).

H₁≤70,000,000.

Zhang’s argument followed the general strategy from [2] on finding small gaps between primes, with the major new ingredient being a proof of a weaker version of $EH [\frac{1}{2} + 2 ϖ]$ , which we call MPZ[ ϖ,δ] (see Claim 10) below. It was quickly realized that Zhang’s numerical bound on H₁ could be improved. By optimizing many of the components in Zhang’s argument, we were able (Polymath, DHJ: New equidistribution estimates of Zhang type, submitted), [4] to improve Zhang’s bound to

H_{1} \leq 4, 680 .

Very shortly afterwards, a further breakthrough was obtained by Maynard [5] (with related work obtained independently in an unpublished work of Tao), who developed a more flexible ‘multidimensional’ version of the Selberg sieve to obtain stronger bounds on H_m. This argument worked without using any equidistribution results on primes beyond the Bombieri-Vinogradov theorem, and among other things was able to establish finiteness of H_m for all m, not just for m=1. More precisely, Maynard established the following results.

Theorem 3(Maynard’s theorem).

Unconditionally, we have the following bounds:

(i) H₁≤600

(ii) H_m≤C m³e^4m for all m≥1 and an absolute (and effective) constant C

Assuming the Elliott-Halberstam conjecture EH[ 𝜗] for all 0<𝜗<1, we have the following improvements:

(iii) H₁≤12

(iv) H₂≤600

(v) H_m≤C m³e^2m for all m≥1 and an absolute (and effective) constant C

For a survey of these recent developments, see [6].

In this paper, we refine Maynard’s methods to obtain the following further improvements.

Theorem 4.

Unconditionally, we have the following bounds:

(i) H₁≤246

(ii) H₂≤398,130

(iii) H₃≤24,797,814

(iv) H₄≤1,431,556,072

(v) H₅≤80,550,202,480

(vi) $H_{m} \leq Cm exp ((4 - \frac{28}{157}) m)$ for all m≥1 and an absolute (and effective) constant C

Assume the Elliott-Halberstam conjecture EH[ 𝜗] for all 0<𝜗<1. Then, we have the following improvements:

(vii) H₂≤270

(viii) H₃≤52,116

(ix) H₄≤474,266.

(x) H₅≤4,137,854.

(xi) H_m≤C m e^2m for all m≥1 and an absolute (and effective) constant C

Finally, assume the generalized Elliott-Halberstam conjecture GEH[ 𝜗] (see Claim 12 below) for all 0<𝜗<1. Then,

(xii) H₁≤6

(xiii) H₂≤252

In the ‘Outline of the key ingredients’ section, we will describe the key propositions that will be combined together to prove the various components of Theorem 4. As with Theorem 1, the results in (vii)-(xiii) do not require EH[ 𝜗] or GEH[ 𝜗] for all 0<𝜗<1, but only for a single explicitly computable 𝜗 that is sufficiently close to 1.

Of these results, the bound in (xii) is perhaps the most interesting, as the parity problem [7] prohibits one from achieving any better bound on H₁ than 6 from purely sieve-theoretic methods; we review this obstruction in the ‘The parity problem’ section. If one only assumes the Elliott-Halberstam conjecture EH[ 𝜗] instead of its generalization GEH[ 𝜗], we were unable to improve upon Maynard’s bound H₁≤12; however, the parity obstruction does not exclude the possibility that one could achieve (xii) just assuming EH[ 𝜗] rather than GEH[ 𝜗], by some further refinement of the sieve-theoretic arguments (e.g. by finding a way to establish Theorem 20(ii) below using only EH[ 𝜗] instead of GEH[ 𝜗]).

The bounds (ii)-(vi) rely on the equidistribution results on primes established in our previous paper. However, the bound (i) uses only the Bombieri-Vinogradov theorem, and the remaining bounds (vii)-(xiii) of course use either the Elliott-Halberstam conjecture or a generalization thereof.

A variant of the proof of Theorem 4(xii), which we give in ‘Additional remarks’ section, also gives the following conditional ‘near miss’ to (a disjunction of) the twin prime conjecture and the even Goldbach conjecture:

Theorem 5(Disjunction).

Assume the generalized Elliott-Halberstam conjecture GEH[ 𝜗] for all 0<𝜗<1. Then, at least one of the following statements is true:

(a) (Twin prime conjecture) H₁=2.

(b) (near-miss to even Goldbach conjecture) If n is a sufficiently large multiple of 6, then at least one of n and n−2 is expressible as the sum of two primes, similarly with n−2 replaced by n+2. (In particular, every sufficiently large even number lies within 2 of the sum of two primes.)

We remark that a disjunction in a similar spirit was obtained in [8], which established (prior to the appearance of Theorem 2) that either H₁ was finite or that every interval [x,x+x^ε] contained the sum of two primes if x was sufficiently large depending on ε>0.

There are two main technical innovations in this paper. The first is a further generalization of the multidimensional Selberg sieve introduced by Maynard and Tao, in which the support of a certain cutoff function F is permitted to extend into a larger domain than was previously permitted (particularly under the assumption of the generalized Elliott-Halberstam conjecture). As in [5], this largely reduces the task of bounding H_m to that of efficiently solving a certain multidimensional variational problem involving the cutoff function F. Our second main technical innovation is to obtain efficient numerical methods for solving this variational problem for small values of the dimension k, as well as sharpened asymptotics in the case of large values of k.

The methods of Maynard and Tao have been used in a number of subsequent applications [9]-[21]. The techniques in this paper should be able to be used to obtain slight numerical improvements to such results, although we did not pursue these matters here.

1.1 Organization of the paper

The paper is organized as follows. After some notational preliminaries, we recall in the ‘Distribution estimates on arithmetic functions’ section the known (or conjectured) distributional estimates on primes in arithmetic progressions that we will need to prove Theorem 4. Then, in the section ‘Outline of the key ingredients’, we give the key propositions that will be combined together to establish this theorem. One of these propositions, Lemma 18, is an easy application of the pigeonhole principle. Two further propositions, Theorem 19 and Theorem 20, use the prime distribution results from the ‘Distribution estimates on arithmetic functions’ section to give asymptotics for certain sums involving sieve weights and the von Mangoldt function; they are established in the ‘Multidimensional Selberg sieves’ section. Theorems 22, 24, 26, and 28 use the asymptotics established in Theorems 19 and 20, in combination with Lemma 18, to give various criteria for bounding H_m, which all involve finding sufficiently strong candidates for a variety of multidimensional variational problems; these theorems are proven in the ‘Reduction to a variational problem’ section. These variational problems are analysed in the asymptotic regime of large k in the ‘Asymptotic analysis’ section, and for small and medium k in the ‘The case of small and medium dimension’ section, with the results collected in Theorems 23, 25, 27, and 29. Combining these results with the previous propositions gives Theorem 16, which, when combined with the bounds on narrow admissible tuples in Theorem 17 that are established in the ‘Narrow admissible tuples’ section, will give Theorem 4. (See also Table 1 for more details of the logical dependencies between the key propositions.)

Table 1 Results used to prove various components of Theorem 16

Full size table

Finally, in the ‘The parity problem’ section, we modify an argument of Selberg to show that the bound H₁≤6 may not be improved using purely sieve-theoretic methods, and in the ‘Additional remarks’ section, we establish Theorem 5 and make some miscellaneous remarks.

1.2 Notation

The notation used here closely follows the notation in our previous paper.

We use |E| to denote the cardinality of a finite set E, and 1_E to denote the indicator function of a set E; thus, 1_E(n)=1 when n∈E and 1_E(n)=0 otherwise.

All sums and products will be over the natural numbers $ℕ : = {1, 2, 3, \dots}$ unless otherwise specified, with the exceptions of sums and products over the variable p, which will be understood to be over primes.

The following important asymptotic notation will be in use throughout the paper.

Definition 6(Asymptotic notation).

We use x to denote a large real parameter, which one should think of as going off to infinity; in particular, we will implicitly assume that it is larger than any specified fixed constant. Some mathematical objects will be independent of x and referred to as fixed; but unless otherwise specified, we allow all mathematical objects under consideration to depend on x (or to vary within a range that depends on x, e.g. the summation parameter n in the sum $\sum_{x \leq n \leq 2 x} f (n)$ ). If X and Y are two quantities depending on x, we say that X=O(Y) or X≪Y if one has |X|≤C Y for some fixed C (which we refer to as the implied constant), and X=o(Y) if one has |X|≤c(x)Y for some function c(x) of x (and of any fixed parameters present) that goes to zero as x→∞ (for each choice of fixed parameters). We use X⪻ ⪻Y to denote the estimate X≤x^o(1)Y, X∼Y to denote the estimate Y≪X≪Y, and X≈Y to denote the estimate Y⪻ ⪻X⪻ ⪻Y. Finally, we say that a quantity n is of polynomial size if one has n=O(x^O(1)).

If asymptotic notation such as O() or ⪻ ⪻ appears on the left-hand side of a statement, this means that the assertion holds true for any specific interpretation of that notation. For instance, the assertion $\sum_{n = O (N)} | α (n) | ⪻ ⪻ N$ means that for each fixed constant C>0, one has $\sum_{| n | \leq CN} | α (n) | ⪻ ⪻ N$ .

If q and a are integers, we write a|q if a divides q. If q is a natural number and $a \in ℤ$ , we use a (q) to denote the residue class

a (q) : = \{a + nq : n \in ℤ\}

and let $ℤ / qℤ$ denote the ring of all such residue classes a(q). The notation b=a (q) is synonymous to b∈ a (q). We use (a,q) to denote the greatest common divisor of a and q, and [ a,q] to denote the least common multiple^a. We also let

{(ℤ / qℤ)}^{\times} : = \{a (q) : (a, q) = 1\}

denote the primitive residue classes of $ℤ / qℤ$ .

We use the following standard arithmetic functions:

(i)
$φ (q) : = | {(ℤ / qℤ)}^{\times} |$
denotes the Euler totient function of q.
(ii)
$τ (q) : = \sum_{d | q} 1$
denotes the divisor function of q.
(iii)
Λ(q) denotes the von Mangoldt function of q; thus, Λ(q)= logp if q is a power of a prime p, and Λ(q)=0 otherwise.
(iv)
θ(q) is defined to equal logq when q is a prime, and θ(q)=0 otherwise.
(v)
μ(q) denotes the Möbius function of q; thus, μ(q)=(−1)^k if q is the product of k distinct primes for some k≥0, and μ(q)=0 otherwise.
(vi)
Ω(q) denotes the number of prime factors of q (counting multiplicity).

We recall the elementary divisor bound

τ (n) ⪻ ⪻ 1

(1)

whenever n≪x^O(1), as well as the related estimate

\sum_{n ≪ x} \frac{τ {(n)}^{C}}{n} ≪ \overset{O (1)}{log} x

(2)

for any fixed C>0 (see, e.g. [Lemma 1.5]).

The Dirichlet convolution $α ⋆ β : ℕ \to ℂ$ of two arithmetic functions $α, β : ℕ \to ℂ$ is defined in the usual fashion as

α ⋆ β (n) : = \sum_{d | n} α (d) β (\frac{n}{d}) = \sum_{ab = n} α (a) β (b) .

Distribution estimates on arithmetic functions

As mentioned in the introduction, a key ingredient in the Goldston-Pintz-Yıldırım approach to small gaps between primes comes from distributional estimates on the primes, or more precisely on the von Mangoldt function Λ, which serves as a proxy for the primes. In this work, we will also need to consider distributional estimates on more general arithmetic functions, although we will not prove any new such estimates in this paper, relying instead on estimates that are already in the literature.

More precisely, we will need averaged information on the following quantity:

Definition 7(Discrepancy).

For any function $α : ℕ \to ℂ$ with finite support (that is, α is non-zero only on a finite set) and any primitive residue class a (q), we define the (signed) discrepancy Δ(α;a (q)) to be the quantity

Δ (α; a (q)) : = \sum_{n = a (q)} α (n) - \frac{1}{φ (q)} \sum_{(n, q) = 1} α (n) .

(3)

For any fixed 0<𝜗<1, let EH[ 𝜗] denote the following claim:

Claim 8(Elliott-Halberstam conjecture, EH[ 𝜗]).

If Q⪻ ⪻x^𝜗 and A≥1 is fixed, then

\sum_{q \leq Q} sup_{a \in {(ℤ / qℤ)}^{\times}} |Δ (Λ 1_{[x, 2 x]}; a (q))| ≪ x \overset{- A}{log} x.

(4)

In [22], it was conjectured that EH[ 𝜗] held for all 0<𝜗<1. (The conjecture fails at the endpoint case 𝜗=1; see [23],[24] for a more precise statement.) The following classical result of Bombieri [25] and Vinogradov [26] remains the best partial result of the form EH[ 𝜗]:

Theorem 9(Bombieri-Vinogradov theorem).

[25],[26] EH[ 𝜗] holds for every fixed 0<𝜗<1/2.

In [2], it was shown that any estimate of the form EH[ 𝜗] with some fixed 𝜗>1/2 would imply the finiteness of H₁. While such an estimate remains unproven, it was observed by Motohashi and Pintz [27] and by Zhang [3] that a certain weakened version of EH[ 𝜗] would still suffice for this purpose. More precisely (and following the notation of our previous paper), let ϖ,δ>0 be fixed, and let MPZ[ ϖ,δ] be the following claim:

Claim 10(Motohashi-Pintz-Zhang estimate, MPZ[ ϖ,δ]).

Let I⊂[1,x^δ] and Q⪻ ⪻x^1/2+2ϖ. Let P_I denote the product of all the primes in I, and let S_I denote the square-free natural numbers whose prime factors lie in I. If the residue class a (P_I) is primitive (and is allowed to depend on x), and A≥1 is fixed, then

\sum_{\begin{matrix} q \leq Q \\ q \in S_{I} \end{matrix}} |Δ (Λ 1_{[x, 2 x]}; a (q))| ≪ x \overset{- A}{log} x,

(5)

where the implied constant depends only on the fixed quantities (A,ϖ,δ), but not on a.

It is clear that $EH [\frac{1}{2} + 2 ϖ]$ implies MPZ[ ϖ,δ] whenever ϖ,δ≥0. The first non-trivial estimate of the form MPZ[ ϖ,δ] was established by Zhang [3], who (essentially) obtained MPZ[ ϖ,δ] whenever $0 \leq ϖ, δ < \frac{1}{1, 168}$ . In [Theorem 2.17], we improved this result to the following.

Theorem 11.

MPZ[ ϖ,δ] holds for every fixed ϖ,δ≥0 with 600ϖ+180δ<7.

In fact, a stronger result was established, in which the moduli q were assumed to be densely divisible rather than smooth, but we will not exploit such improvements here. For our application, the most important thing is to get ϖ as large as possible; in particular, Theorem 11 allows one to get ϖ arbitrarily close to $\frac{7}{600} \approx 0.01167$ .

In this paper, we will also study the following generalization of the Elliott-Halberstam conjecture:

Claim 12(Generalized Elliott-Halberstam conjecture, GEH[ 𝜗]).

Let ε>0 and A≥1 be fixed. Let N,M be quantities such that x^ε⪻ ⪻N,M⪻ ⪻x^1−ε with N M≍x, and let $α, β : ℕ \to ℝ$ be sequences supported on [ N,2N] and [ M,2M], respectively, such that one has the pointwise bound

| α (n) | ≪ τ {(n)}^{O (1)} \overset{O (1)}{log} x; | β (m) | ≪ τ {(m)}^{O (1)} \overset{O (1)}{log} x

(6)

for all natural numbers n,m. Suppose also that β obeys the Siegel-Walfisz type bound

|Δ (β 1_{(\cdot, r) = 1}; a (q))| ≪ τ {(qr)}^{O (1)} M \overset{- A}{log} x

(7)

for any q,r≥1, any fixed A, and any primitive residue class a (q). Then for any Q⪻ ⪻x^𝜗, we have

\sum_{q \leq Q} sup_{a \in {(ℤ / qℤ)}^{\times}} |Δ (α ⋆ β; a (q))| ≪ x \overset{- A}{log} x.

(8)

In [28], Conjecture 1], it was essentially conjectured^b that GEH[ 𝜗] was true for all 0<𝜗<1. This is stronger than the Elliott-Halberstam conjecture:

Proposition 13.

For any fixed 0<𝜗<1, GEH[ 𝜗] implies EH[ 𝜗].

Proof.

(Sketch) As this argument is standard, we give only a brief sketch. Let A>0 be fixed. For n∈[ x,2x], we have Vaughan’s identity^c[29]

Λ (n) = μ_{<} ⋆ L (n) - μ_{<} ⋆ Λ_{<} ⋆ 1 (n) + μ_{\geq} ⋆ Λ_{\geq} ⋆ 1 (n),

where L(n):= log(n) and

\begin{align} Λ_{\geq} (n) : = Λ (n) 1_{n \geq x^{1 / 3}}, & Λ_{<} (n) : = Λ (n) 1_{n < x^{1 / 3}} \end{align}

(9)

\begin{align} μ_{\geq} (n) : = μ (n) 1_{n \geq x^{1 / 3}}, & μ_{<} (n) : = μ (n) 1_{n < x^{1 / 3}} . \end{align}

(10)

By decomposing each of the functions μ_<, μ_≥, 1, Λ_<, Λ_≥ into O(logA+1x) functions supported on intervals of the form [ N,(1+ log−A x)N], and discarding those contributions which meet the boundary of [ x,2x] (cf. [3],[28],[30],[31]), and using GEH[ 𝜗] (with A replaced by a much larger fixed constant A^′) to control all remaining contributions, we obtain the claim (using the Siegel-Walfisz theorem; see, e.g. [32], Satz 4] or [33], Th. 5.29]).

By modifying the proof of the Bombieri-Vinogradov theorem, Motohashi [34] established the following generalization of that theorem:

Theorem 14(Generalized Bombieri-Vinogradov theorem).

[34] GEH[ 𝜗] holds for every fixed 0<𝜗<1/2.

One could similarly describe a generalization of the Motohashi-Pintz-Zhang estimate MPZ[ ϖ,δ], but unfortunately, the arguments in [3] or Theorem 11 do not extend to this setting unless one is in the ‘Type I/Type II’ case in which N,M are constrained to be somewhat close to x^1/2, or if one has ‘Type III’ structure to the convolution α⋆β, in the sense that it can refactored as a convolution involving several ‘smooth’ sequences. In any event, our analysis would not be able to make much use of such incremental improvements to GEH[ 𝜗], as we only use this hypothesis effectively in the case when 𝜗 is very close to 1. In particular, we will not directly use Theorem 14 in this paper.

Outline of the key ingredients

In this section, we describe the key subtheorems used in the proof of Theorem 4, with the proofs of these subtheorems mostly being deferred to later sections.

We begin with a weak version of the Dickson-Hardy-Littlewood prime tuples conjecture [1], which (following Pintz [35]) we refer to as [ k,j]. Recall that for any $k \in ℕ$ , an admissible k-tuple is a tuple $ℋ = (h_{1}, \dots, h_{k})$ of k increasing integers h₁<…<h_k which avoids at least one residue class $a_{p} (p) : = {a_{p} + np : n \in ℤ}$ for every p. For instance, (0,2,6) is an admissible 3-tuple, but (0,2,4) is not.

For any k≥j≥2, we let DHL[ k;j] denote the following claim:

Claim 15(Weak Dickson-Hardy-Littlewood conjecture, DHL[ k;j]).

For any admissible k-tuple $ℋ = (h_{1}, \dots, h_{k})$ , there exist infinitely many translates $n + ℋ = (n + h_{1}, \dots, n + h_{k})$ of which contain at least j primes.

The full Dickson-Hardy-Littlewood conjecture is then the assertion that DHL[ k;k] holds for all k≥2. In our analysis, we will focus on the case when j is much smaller than k; in fact, j will be of the order of logk.

For any k, let H(k) denote the minimal diameter h_k−h₁ of an admissible k-tuple; thus for instance, H(3)=6. It is clear that for any natural numbers m≥1 and k≥m+1, the claim DHL[k;m+1] implies that H_m≤H(k) (and the claim DHL[ k;k] would imply that H_k−1=H(k)). We will therefore deduce Theorem 4 from a number of claims of the form DHL[ k;j]. More precisely, we have

Theorem 16.

Unconditionally, we have the following claims:

(i) DHL[50;2].

(ii) DHL[35,410;3].

(iii) DHL[1,649,821;4].

(iv) DHL[75,845,707;5].

(v) DHL[3,473,955,908;6].

(vi) DHL[k;m+1] whenever m≥1 and $k \geq C exp ((4 - \frac{28}{157}) m)$ for some sufficiently large absolute (and effective) constant C.

Assume the Elliott-Halberstam conjecture EH[ θ] for all 0<θ<1. Then, we have the following improvements:

(vii) DHL[54;3].

(viii) DHL[5,511;4].

(ix) DHL[41,588;5].

(x) DHL[309,661;6].

(xi) DHL[k;m+1] whenever m≥1 and k≥C exp(2m) for some sufficiently large absolute (and effective) constant C.

Assume the generalized Elliott-Halberstam conjecture GEH[ θ] for all 0<θ<1. Then

(xii) DHL[3;2].

(xiii) DHL[51;3].

Theorem 4 then follows from Theorem 16 and the following bounds on H(k) (ordered by increasing value of k):

Theorem 17(Bounds on H(k)).

(xii) H(3)=6.

(i) H(50)=246.

(xiii) H(51)=252.

(vii) H(54)=270.

(viii) H(5,511)≤52,116.

(ii) H(35,410)≤398,130.

(ix) H(41,588)≤474,266.

(x) H(309,661)≤4,137,854.

(iii) H(1,649,821)≤24,797,814.

(iv) H(75,845,707)≤1,431,556,072.

(v) H(3,473,955,908)≤80,550,202,480.

(vi), (xi) In the asymptotic limit k→∞, one has H(k)≤k logk+k log logk−k+o(k), with the bounds on the decay rate o(k) being effective.

We prove Theorem 17 in the ‘Narrow admissible tuples’ section. In the opposite direction, an application of the Brun-Titchmarsh theorem gives $H (k) \geq (\frac{1}{2} + o (1)) k log k$ as k→∞ (see [4], §3.9] for this bound, as well as with some slight refinements).

The proof of Theorem 16 follows the Goldston-Pintz-Yıldırım strategy that was also used in all previous progress on this problem (e.g. [2],[3],[5],[27]), namely that of constructing a sieve function adapted to an admissible k-tuple with good properties. More precisely, we set

w : = log log log x

and

W : = \prod_{p \leq w} p,

and observe the crude bound

W ≪ log \overset{O (1)}{log} x.

(11)

We have the following simple ‘pigeonhole principle’ criterion for DHL[k;m+1] (cf. [Lemma 4.1], though the normalization here is slightly different):

Lemma 18(Criterion for DHL).

Let k≥2 and m≥1 be fixed integers and define the normalization constant

B : = \frac{φ (W)}{W} log x.

(12)

Suppose that for each fixed admissible k-tuple (h₁,…,h_k) and each residue class b (W)such that b+h_i is coprime to W for all i=1,…,k, one can find a non-negative weight function $ν : ℕ \to ℝ^{+}$ and fixed quantities α>0 and β₁,…,β_k≥0, such that one has the asymptotic upper bound

\sum_{\begin{matrix} x \leq n \leq 2 x \\ n = b (W) \end{matrix}} ν (n) \leq (α + o (1)) B^{- k} \frac{x}{W},

(13)

the asymptotic lower bound

\sum_{\begin{matrix} x \leq n \leq 2 x \\ n = b (W) \end{matrix}} ν (n) θ (n + h_{i}) \geq (β_{i} - o (1)) B^{1 - k} \frac{x}{φ (W)}

(14)

for all i=1,…,k, and the key inequality

\frac{β_{1} + \dots + β_{k}}{α} > m.

(15)

Then, DHL[ k;m+1] holds.

Proof.

Let (h₁,…,h_k) be a fixed admissible k-tuple. Since it is admissible, there is at least one residue class b (W) such that (b+h_i,W)=1 for all $h_{i} \in ℋ$ . For an arithmetic function ν as in the lemma, we consider the quantity

N : = \sum_{\begin{matrix} x \leq n \leq 2 x \\ n = b (W) \end{matrix}} ν (n) (\sum_{i = 1}^{k} θ (n + h_{i}) - m log 3 x) .

Combining (13) and (14), we obtain the lower bound

N \geq (β_{1} + \dots + β_{k} - o (1)) B^{1 - k} \frac{x}{φ (W)} - (mα + o (1)) B^{- k} \frac{x}{W} log 3 x.

From (12) and the crucial condition (15), it follows that N>0 if x is sufficiently large.

On the other hand, the sum

\sum_{i = 1}^{k} θ (n + h_{i}) - m log 3 x

can be positive only if n+h_i is prime for at least m+1 indices i=1,…,k. We conclude that, for all sufficiently large x, there exists some integer n∈[ x,2x] such that n+h_i is prime for at least m+1 values of i=1,…,k.

Since (h₁,…,h_k) is an arbitrary admissible k-tuple, DHL[ k;m+1] follows.

The objective is then to construct non-negative weights ν whose associated ratio $\frac{β_{1} + \dots + β_{k}}{α}$ has provable lower bounds that are as large as possible. Our sieve majorants will be a variant of the multidimensional Selberg sieves used in [5]. As with all Selberg sieves, the ν are constructed as the square of certain (signed) divisor sums. The divisor sums we will use will be finite linear combinations of products of ‘one-dimensional’ divisor sums. More precisely, for any fixed smooth compactly supported function $F : [0, + \infty) \to ℝ$ , define the divisor sum $λ_{F} : ℤ \to ℝ$ by the formula

λ_{F} (n) : = \sum_{d | n} μ (d) F (\underset{x}{log} d)

(16)

where logx denotes the base x logarithm

\underset{x}{log} n : = \frac{log n}{log x} .

(17)

One should think of λ_F as a smoothed out version of the indicator function to numbers n which are ‘almost prime’ in the sense that they have no prime factors less than x^ε for some small fixed ε>0 (see Proposition 14 for a more rigorous version of this heuristic).

The functions ν we will use will take the form

ν (n) = {(\sum_{j = 1}^{J} c_{j} λ_{F_{j, 1}} (n + h_{1}) \dots λ_{F_{j, k}} (n + h_{k}))}^{2}

(18)

for some fixed natural number J, fixed coefficients $c_{1}, \dots, c_{J} \in ℝ$ and fixed smooth compactly supported functions $F_{j, i} : [0, + \infty) \to ℝ$ with j=1,…,J and i=1,…,k. (One can of course absorb the constant c_j into one of the F_j,i if one wishes.) Informally, ν is a smooth restriction to those n for which n+h₁,…,n+h_k are all almost prime.

Clearly, ν is a (positive-definite) linear combination of functions of the form

n \mapsto \prod_{i = 1}^{k} λ_{F_{i}} (n + h_{i}) λ_{G_{i}} (n + h_{i})

for various smooth functions $F_{1}, \dots, F_{k}, G_{1}, \dots, G_{k} : [0, + \infty) \to ℝ$ . The sum appearing in (13) can thus be decomposed into linear combinations of sums of the form

\sum_{\begin{matrix} x \leq n \leq 2 x \\ n = b (W) \end{matrix}} \prod_{i = 1}^{k} λ_{F_{i}} (n + h_{i}) λ_{G_{i}} (n + h_{i}) .

(19)

Also, since from (16) we clearly have

λ_{F} (n) = F (0)

(20)

when n≥x is prime and F is supported on [ 0,1], the sum appearing in (14) can be similarly decomposed into linear combinations of sums of the form

\sum_{\begin{matrix} x \leq n \leq 2 x \\ n = b (W) \end{matrix}} θ (n + h_{i}) \prod_{1 \leq i^{'} \leq k; i^{'} \neq i} λ_{F_{i^{'}}} (n + h_{i^{'}}) λ_{G_{i^{'}}} (n + h_{i^{'}}) .

(21)

To estimate the sums (21), we use the following asymptotic, proven in the ‘Multidimensional Selberg sieves’ section. For each compactly supported $F : [0, + \infty) \to ℝ$ , let

S (F) : = sup {x \geq 0 : F (x) \neq 0}

(22)

denote the upper range of the support of F (with the convention that S(0)=0).

Theorem 19(Asymptotic for prime sums).

Let k≥2 be fixed, let (h₁,…,h_k) be a fixed admissible k-tuple, and let b (W) be such that b+h_i is coprime to W for each i=1,…,k. Let 1≤i₀≤k be fixed, and for each 1≤i≤k distinct from i₀, let $F_{i}, G_{i} : [0, + \infty) \to ℝ$ be fixed smooth compactly supported functions. Assume one of the following hypotheses:

(i) (Elliott-Halberstam) There exists a fixed 0<𝜗<1 such that EH[ 𝜗] holds and such that

\sum_{1 \leq i \leq k; i \neq i_{0}} (S (F_{i}) + S (G_{i})) < 𝜗.

(23)

(ii) (Motohashi-Pintz-Zhang) There exists fixed 0≤ϖ<1/4 and δ>0 such that MPZ[ϖ,δ] holds and such that

\sum_{1 \leq i \leq k; i \neq i_{0}} (S (F_{i}) + S (G_{i})) < \frac{1}{2} + 2 ϖ

(24)

and

max_{1 \leq i \leq k; i \neq i_{0}} \{S (F_{i}), S (G_{i})\} < δ.

(25)

Then, we have

\sum_{\begin{matrix} x \leq n \leq 2 x \\ n = b (W) \end{matrix}} θ (n + h_{i_{0}}) \prod_{1 \leq i \leq k; i \neq i_{0}} λ_{F_{i}} (n + h_{i}) λ_{G_{i}} (n + h_{i}) = (c + o (1)) B^{1 - k} \frac{x}{φ (W)}

(26)

where

c : = \prod_{1 \leq i \leq k; i \neq i_{0}} (\int_{0}^{1} F_{i}^{'} (t_{i}) G_{i}^{'} (t_{i}) d t_{i}) .

Here of course F^′ denotes the derivative of F.

To estimate the sums (19), we use the following asymptotic, also proven in the ‘Multidimensional Selberg sieves’ section.

Theorem 20(Asymptotic for non-prime sums).

Let k≥1 be fixed, let (h₁,…,h_k) be a fixed admissible k-tuple, and let b (W) be such that b+h_i is coprime to W for each i=1,…,k. For each fixed 1≤i≤k, let $F_{i}, G_{i} : [0, + \infty) \to ℝ$ be fixed smooth compactly supported functions. Assume one of the following hypotheses:

(i) (Trivial case) One has

\sum_{i = 1}^{k} (S (F_{i}) + S (G_{i})) < 1 .

(27)

(ii) (Generalized Elliott-Halberstam) There exists a fixed 0<𝜗<1 and i₀∈{1,…,k} such that GEH[ 𝜗] holds, and

\sum_{1 \leq i \leq k; i \neq i_{0}} (S (F_{i}) + S (G_{i})) < 𝜗.

(28)

Then, we have

\sum_{\begin{matrix} x \leq n \leq 2 x \\ n = b (W) \end{matrix}} \prod_{i = 1}^{k} λ_{F_{i}} (n + h_{i}) λ_{G_{i}} (n + h_{i}) = (c + o (1)) B^{- k} \frac{x}{W},

(29)

where

c : = \prod_{i = 1}^{k} (\int_{0}^{1} F_{i}^{'} (t_{i}) G_{i}^{'} (t_{i}) d t_{i}) .

(30)

A key point in (ii) is that no upper bound on $S (F_{i_{0}})$ or $S (G_{i_{0}})$ is required (although, as we will see in the ‘The generalized Elliott-Halberstam case’ section, the result is a little easier to prove when one has $S (F_{i_{0}}) + S (G_{i_{0}}) < 1$ ). This flexibility in the $F_{i_{0}}, G_{i_{0}}$ functions will be particularly crucial to obtain part (xii) of Theorem 16 and Theorem 4.

Remark 21.

Theorems 19 and 20 can be viewed as probabilistic assertions of the following form: if n is chosen uniformly at random from the set {x≤n≤2x:n=b (W)}, then the random variables θ(n+h_i) and $λ_{F_{j}} (n + h_{j}) λ_{G_{j}} (n + h_{j})$ for i,j=1,…,k have mean $(1 + o (1)) \frac{W}{φ (W)}$ and $(\int_{0}^{1} F_{j}^{'} (t) G_{j}^{'} (t) dt + o (1)) B^{- 1}$ , respectively, and furthermore, these random variables enjoy a limited amount of independence, except for the fact (as can be seen from (20)) that θ(n+h_i) and $λ_{F_{i}} (n + h_{i}) λ_{G_{i}} (n + h_{i})$ are highly correlated. Note though that we do not have asymptotics for any sum which involves two or more factors of θ, as such estimates are of a difficulty at least as great as that of the twin prime conjecture (which is equivalent to the divergence of the sum $\sum_{n} θ (n) θ (n + 2)$ ).

Theorems 19 and 20 may be combined with Lemma 18 to reduce the task of establishing estimates of the form DHL[ k;m+1] to that of establishing certain variational problems. For instance, in the ‘Proof of Theorem 22’ section, we reprove the following result of Maynard ([5], Proposition 4.2]):

Theorem 22(Sieving on the standard simplex).

Let k≥2 and m≥1 be fixed integers. For any fixed compactly supported square-integrable function $F : [0, + \infty)^{k} \to ℝ$ , define the functionals

I (F) : = \int_{[0, + \infty)^{k}} F {(t_{1}, \dots, t_{k})}^{2} d t_{1} \dots t_{k}

(31)

and

J_{i} (F) : = \int_{[0, + \infty)^{k - 1}} {(\int_{0}^{\infty} F (t_{1}, \dots, t_{k}) d t_{i})}^{2} d t_{1} \dots d t_{i - 1} d t_{i + 1} \dots d t_{k}

(32)

for i=1,…,k, and let M_k be the supremum

M_{k} : = sup \frac{\sum_{i = 1}^{k} J_{i} (F)}{I (F)}

(33)

over all square integrable functions F that are supported on the simplex

R_{k} : = \{(t_{1}, \dots, t_{k}) \in [0, + \infty)^{k} : t_{1} + \dots + t_{k} \leq 1\}

and are not identically zero (up to almost everywhere equivalence, of course). Suppose that there is a fixed 0<𝜗<1 such that EH[ 𝜗] holds and such that

M_{k} > \frac{2 m}{𝜗} .

Then, DHL[ k;m+1] holds.

Parts (vii)-(xi) of Theorem 16 (and hence Theorem 4) are then immediate from the following results, proven in the ‘Asymptotic analysis’ and ‘The case of small and medium dimension’ sections, and ordered by increasing value of k:

Theorem 23(Lower bounds on M_k).

(vii) M₅₄>4.00238.

(viii) M_5,511>6.

(ix) M_41,588>8.

(x) M_309,661>10.

(xi) One has M_k≥ logk−C for all k≥C, where C is an absolute (and effective) constant.

For the sake of comparison, in ([5], Proposition 4.3]), it was shown that M₅>2, M₁₀₅>4, and M_k≥ logk−2 log logk−2 for all sufficiently large k. As remarked in that paper, the sieves used on the bounded gap problem prior to the work in [5] would essentially correspond, in this notation, to the choice of functions F of the special form F(t₁,…,t_k):=f(t₁+⋯+t_k), which severely limits the size of the ratio in (33) (in particular, the analogue of M_k in this special case cannot exceed 4, as shown in [36]).

In the converse direction, in Corollary 37, we will also show the upper bound $M_{k} \leq \frac{k}{k - 1} log k$ for all k≥2, which shows in particular that the bounds in (vii) and (xi) of the above theorem cannot be significantly improved. We remark that Theorem 23(vii) and the Bombieri-Vinogradov theorem also give a weaker version DHL[ 54;2] of Theorem 16(i).

We also have a variant of Theorem 22 which can accept inputs of the form MPZ[ ϖ,δ]:

Theorem 24(Sieving on a truncated simplex).

Let k≥2 and m≥1 be fixed integers. Let 0<ϖ<1/4 and 0<δ<1/2 be such that MPZ[ ϖ,δ] holds. For any α>0, let $M_{k}^{[α]}$ be defined as in (33), but where the supremum now ranges over all square-integrable F supported in the truncated simplex

\{(t_{1}, \dots, t_{k}) \in {[0, α]}^{k} : t_{1} + \dots + t_{k} \leq 1\}

(34)

and are not identically zero. If

M_{k}^{[\frac{δ}{1 / 4 + ϖ}]} > \frac{m}{1 / 4 + ϖ},

then DHL[ k;m+1] holds.

In the ‘Asymptotic analysis’ section, we will establish the following variant of Theorem 23, which when combined with Theorem 11, allows one to use Theorem 24 to establish parts (ii)-(vi) of Theorem 16 (and hence Theorem 4):

Theorem 25(Lower bounds on $M_{k}^{[α]}$ ).

(ii) There exist δ,ϖ>0 with 600ϖ+180δ<7 and $M_{35 410}^{[\frac{δ}{1 / 4 + ϖ}]} > \frac{2}{1 / 4 + ϖ}$ .

(iii) There exist δ,ϖ>0 with 600ϖ+180δ<7 and $M_{1 649 821}^{[\frac{δ}{1 / 4 + ϖ}]} > \frac{3}{1 / 4 + ϖ}$ .

(iv) There exist δ,ϖ>0 with 600ϖ+180δ<7 and $M_{75 845 707}^{[\frac{δ}{1 / 4 + ϖ}]} > \frac{4}{1 / 4 + ϖ}$ .

(v) There exist δ,ϖ>0 with 600ϖ+180δ<7 and $M_{3 473 955 908}^{[\frac{δ}{1 / 4 + ϖ}]} > \frac{5}{1 / 4 + ϖ}$ .

(vi) For all k≥C, there exist δ,ϖ>0 with 600ϖ+180δ<7, $ϖ \geq \frac{7}{600} - \frac{C}{log k}$ , and $M_{k}^{[\frac{δ}{1 / 4 + ϖ}]} \geq log k - C$ for some absolute (and effective) constant C.

The implication is clear for (ii)-(v). For (vi), observe that from Theorem 25(vi), Theorem 11, and Theorem 24, we see that DHL[ k;m+1] holds whenever k is sufficiently large and

m \leq (log k - C) (\frac{1}{4} + \frac{7}{600} - \frac{C}{log k})

which is in particular implied by

m \leq \frac{log k}{4 - \frac{28}{157}} - C^{'}

for some absolute constant C^′, giving Theorem 16(vi).

Now we give a more flexible variant of Theorem 22, in which the support of F is enlarged, at the cost of reducing the range of integration of the J_i.

Theorem 26(Sieving on an epsilon-enlarged simplex).

Let k≥2 and m≥1 be fixed integers, and let 0<ε<1 be fixed also. For any fixed compactly supported square-integrable function $F : [0, + \infty)^{k} \to ℝ$ , define the functionals

J_{i, 1 - ε} (F) : = \int_{(1 - ε) \cdot R_{k - 1}} {(\int_{0}^{\infty} F (t_{1}, \dots, t_{k}) d t_{i})}^{2} d t_{1} \dots d t_{i - 1} d t_{i + 1} \dots d t_{k}

for i=1,…,k, and let M_k,ε be the supremum

M_{k, ε} : = sup \frac{\sum_{i = 1}^{k} J_{i, 1 - ε} (F)}{I (F)}

over all square-integrable functions F that are supported on the simplex

(1 + ε) \cdot R_{k} = \{(t_{1}, \dots, t_{k}) \in [0, + \infty)^{k} : t_{1} + \dots + t_{k} \leq 1 + ε\}

and are not identically zero. Suppose that there is a fixed 0<𝜗<1, such that one of the following two hypotheses hold:

(i) EH[𝜗] holds, and $1 + ε < \frac{1}{𝜗}$ .

(ii) GEH[𝜗] holds, and $ε < \frac{1}{k - 1}$ .

If

M_{k, ε} > \frac{2 m}{𝜗}

then DHL[ k;m+1] holds.

We prove this theorem in the ‘Proof of Theorem 26’ section. We remark that due to the continuity of M_k,ε in ε, the strict inequalities in (i) and (ii) of this theorem may be replaced by non-strict inequalities. Parts (i) and (xiii) of Theorem 16, and a weaker version DHL[ 4;2] of part (xii), then follow from Theorem 9 and the following computations, proven in the ‘Bounding M_k,ε for medium k’ and ‘Bounding M_4,ε’ sections:

Theorem 27(Lower bounds on M_k,ε).

(i) M_50,1/25>4.0043.

(xii’) M_4,0.168>2.00558.

(xiii) M_51,1/50>4.00156.

We remark that computations in the proof of Theorem 27(xii’) are simple enough that the bound may be checked by hand, without use of a computer. The computations used to establish the full strength of Theorem 16(xii) are however significantly more complicated.

In fact, we may enlarge the support of F further. We give a version corresponding to part (ii) of Theorem 26; there is also a version corresponding to part (i), but we will not give it here as we will not have any use for it.

Theorem 28(Going beyond the epsilon enlargement).

Let k≥2 and m≥1 be fixed integers, let 0<𝜗<1 be a fixed quantity such that GEH[ 𝜗] holds, and let $0 < ε < \frac{1}{k - 1}$ be fixed also. Suppose that there is a fixed non-zero square-integrable function $F : [0, + \infty)^{k} \to ℝ$ supported in $\frac{k}{k - 1} \cdot R_{k}$ , such that for i=1,…,k, one has the vanishing marginal condition

\int_{0}^{\infty} F (t_{1}, \dots, t_{k}) d t_{i} = 0

(35)

whenever t₁,…,t_i−1,t_i+1,…,t_k≥0 are such that

t_{1} + \dots + t_{i - 1} + t_{i + 1} + \dots + t_{k} > 1 + ε.

Suppose that we also have the inequality

\frac{\sum_{i = 1}^{k} J_{i, ε} (F)}{I (F)} > \frac{2 m}{𝜗} .

Then DHL[ k;m+1] holds.

This theorem is proven in the ‘Proof of Theorem 28’ section. Theorem 16(xii) is then an immediate consequence of Theorem 28 and the following numerical fact, established in the ‘Three-dimensional cutoffs’ section.

Theorem 29(A piecewise polynomial cutoff).

Set $ε : = \frac{1}{4}$ . Then, there exists a piecewise polynomial function $F : [0, + \infty)^{3} \to ℝ$ supported on the simplex

\frac{3}{2} \cdot R_{3} = \{(t_{1}, t_{2}, t_{3}) \in [0, + \infty)^{3} : t_{1} + t_{2} + t_{3} \leq \frac{3}{2}\}

and symmetric in the t₁,t₂,t₃ variables, such that F is not identically zero and obeys the vanishing marginal condition

\int_{0}^{\infty} F (t_{1}, t_{2}, t_{3}) d t_{3} = 0

whenever t₁,t₂≥0 with t₁+t₂>1+ε and such that

\frac{3 \int_{t_{1} + t_{2} \leq 1 - ε} {(\int_{0}^{\infty} F (t_{1}, t_{2}, t_{3}) d t_{3})}^{2} d t_{1} d t_{2}}{\int_{[0, \infty)^{3}} F {(t_{1}, t_{2}, t_{3})}^{2} d t_{1} d t_{2} d t_{3}} > 2 .

There are several other ways to combine Theorems 19 and 20 with equidistribution theorems on the primes to obtain results of the form DHL[k;m+1], but all of our attempts to do so either did not improve the numerology or else were numerically infeasible to implement.

Multidimensional Selberg sieves

In this section, we prove Theorems 19 and 20. A key asymptotic used in both theorems is the following:

Lemma 30(Asymptotic).

Let k≥1 be a fixed integer, and let N be a natural number coprime to W with logN=O(logO(1)x). Let $F_{1}, \dots, F_{k}, G_{1}, \dots, G_{k} : [0, + \infty) \to ℝ$ be fixed smooth compactly supported functions. Then,

\sum_{\begin{matrix} d_{1}, \dots, d_{k}, d_{1}^{'}, \dots, d_{k}^{'} \\ [d_{1}, d_{1}^{'}], \dots, [d_{k}, d_{k}^{'}], W, N coprime \end{matrix}} \prod_{j = 1}^{k} \frac{μ (d_{j}) μ (d_{j}^{'}) F_{j} (\underset{x}{log} d_{j}) G_{j} (\underset{x}{log} d_{j}^{'})}{[d_{j}, d_{j}^{'}]} = (c + o (1)) B^{- k} \frac{N^{k}}{φ {(N)}^{k}}

(36)

where B was defined in (12), and

c : = \prod_{j = 1}^{k} \int_{0}^{\infty} F_{j}^{'} (t_{j}) G_{j}^{'} (t_{j}) d t_{j} .

The same claim holds if the denominators $[d_{j}, d_{j}^{'}]$ are replaced by $φ ([d_{j}, d_{j}^{'}])$ .

Such asymptotics are standard in the literature (see, e.g. [37] for some similar computations). In older literature, it is common to establish these asymptotics via contour integration (e.g. via Perron’s formula), but we will use the Fourier analytic approach here. Of course, both approaches ultimately use the same input, namely the simple pole of the Riemann zeta function at s=1.

Proof.

We begin with the first claim. For j=1,…,k, the functions t↦e^tF_j(t), t↦e^tG_j(t) may be extended to smooth compactly supported functions on all of , and so we have Fourier expansions

e^{t} F_{j} (t) = \int_{ℝ} e^{- itξ} f_{j} (ξ) dξ

(37)

and

e^{t} G_{j} (t) = \int_{ℝ} e^{- itξ} g_{j} (ξ) dξ

for some fixed functions $f_{j}, g_{j} : ℝ \to ℂ$ that are smooth and rapidly decreasing in the sense that f_j(ξ),g_j(ξ)=O((1+|ξ|)^−A) for any fixed A>0 and all $ξ \in ℝ$ (here the implied constant is independent of ξ and depends only on A).

We may thus write

\begin{matrix} F_{j} (\underset{x}{log} d_{j}) = \int_{ℝ} \frac{f_{j} (ξ_{j})}{d_{j}^{\frac{1 + i ξ_{j}}{log x}}} d ξ_{j} \end{matrix}

and

\begin{matrix} G_{j} (\underset{x}{log} d_{j}^{'}) = \int_{ℝ} \frac{g_{j} (ξ_{j}^{'})}{{(d_{j}^{'})}^{\frac{1 + i ξ_{j}^{'}}{log x}}} d ξ_{j}^{'} \end{matrix}

for all $d_{j}, d_{j}^{'} \geq 1$ . We note that

\begin{matrix} \sum_{d_{j}, d_{j}^{'}} \frac{| μ (d_{j}) μ (d_{j}^{'}) |}{[d_{j}, d_{j}^{'}] d_{j}^{1 / log x} {(d_{j}^{'})}^{1 / log x}} = \prod_{p} (1 + \frac{2}{p^{1 + 1 / log x}} + \frac{1}{p^{1 + 2 / log x}}) \leq exp (O (log log x)) . \end{matrix}

Therefore, if we substitute the Fourier expansions into the left-hand side of (36), the resulting expression is absolutely convergent. Thus, we can apply Fubini’s theorem, and the left-hand side of (36) can thus be rewritten as

\begin{matrix} \int_{ℝ} \dots \int_{ℝ} K (ξ_{1}, \dots, ξ_{k}, ξ_{1}^{'}, \dots, ξ_{k}^{'}) \prod_{j = 1}^{k} f_{j} (ξ_{j}) g_{j} (ξ_{j}^{'}) d ξ_{j} d ξ_{j}^{'}, \end{matrix}

(38)

where

\begin{matrix} K (ξ_{1}, \dots, ξ_{k}, ξ_{1}^{'}, \dots, ξ_{k}^{'}) : = \sum_{\begin{matrix} d_{1}, \dots, d_{k}, d_{1}^{'}, \dots, d_{k}^{'} \\ [d_{1}, d_{1}^{'}], \dots, [d_{k}, d_{k}^{'}], W, N coprime \end{matrix}} \prod_{j = 1}^{k} \frac{μ (d_{j}) μ (d_{j}^{'})}{[d_{j}, d_{j}^{'}] d_{j}^{\frac{1 + i ξ_{j}}{log x}} {(d_{j}^{'})}^{\frac{1 + i ξ_{j}^{'}}{log x}}} . \end{matrix}

This latter expression factorizes as an Euler product

K = \prod_{p ∤ WN} K_{p},

where the local factors K_p are given by

\begin{matrix} K_{p} (ξ_{1}, \dots, ξ_{k}, ξ_{1}^{'}, \dots, ξ_{k}^{'}) : = 1 + \frac{1}{p} \sum_{\begin{matrix} d_{1}, \dots, d_{k}, d_{1}^{'}, \dots, d_{k}^{'} \\ [d_{1}, \dots, d_{k}, d_{1}^{'}, \dots, d_{k}^{'}] = p \\ [d_{1}, d_{1}^{'}], \dots, [d_{k}, d_{k}^{'}] coprime \end{matrix}} \prod_{j = 1}^{k} \frac{μ (d_{j}) μ (d_{j}^{'})}{d_{j}^{\frac{1 + i ξ_{j}}{log x}} {(d_{j}^{'})}^{\frac{1 + i ξ_{j}^{'}}{log x}}} . \end{matrix}

(39)

We can estimate each Euler factor as

\begin{matrix} K_{p} (ξ_{1}, \dots, ξ_{k}, ξ_{1}^{'}, \dots, ξ_{k}^{'}) = (1 + O (\frac{1}{p^{2}})) \prod_{j = 1}^{k} \frac{(1 - p^{- 1 - \frac{1 + i ξ_{j}}{log x}}) (1 - p^{- 1 - \frac{1 + i ξ_{j}^{'}}{log x}})}{1 - p^{- 1 - \frac{2 + i ξ_{j} + i ξ_{j}^{'}}{log x}}} . \end{matrix}

(40)

Since

\begin{matrix} \prod_{p : p > w} (1 + O (\frac{1}{p^{2}})) = 1 + o (1), \end{matrix}

we have

\begin{matrix} K (ξ_{1}, \dots, ξ_{k}, ξ_{1}^{'}, \dots, ξ_{k}^{'}) = (1 + o (1)) \prod_{j = 1}^{k} \frac{ζ_{WN} (1 + \frac{2 + i ξ_{j} + i ξ_{j}^{'}}{log x})}{ζ_{WN} (1 + \frac{1 + i ξ_{j}}{log x}) ζ_{WN} (1 + \frac{1 + i ξ_{j}^{'}}{log x})} \end{matrix}

where the modified zeta function ζ_WN is defined by the formula

\begin{matrix} ζ_{WN} (s) : = \prod_{p ∤ WN} {(1 - \frac{1}{p^{s}})}^{- 1} \end{matrix}

for ℜ(s)>1.

For $ℜ (s) \geq 1 + \frac{1}{log x}$ , we have the crude bounds

\begin{matrix} | ζ_{WN} (s) |, | ζ_{WN} (s) |^{- 1} & \leq \prod_{p} (1 + \frac{1}{p^{1 + 1 / log x}} + O (\frac{1}{p^{2}})) \\ ≪ exp (\sum_{p} \frac{1}{p^{1 + 1 / log x}}) \\ \leq exp (log log x + O (1)) \\ ≪ log x. \end{matrix}

Thus,

K (ξ_{1}, \dots, ξ_{k}, ξ_{1}^{'}, \dots, ξ_{k}^{'}) = O (\overset{3 k}{log} x) .

Combining this with the rapid decrease of f_j,g_j, we see that the contribution to (38) outside of the cube $\{max (ξ_{1}, \dots, ξ_{k}, ξ_{1}^{'}, \dots, ξ_{k}^{'}) \leq \sqrt{log x}\}$ (say) is negligible. Thus, it will suffice to show that

\begin{matrix} \int_{- \sqrt{log x}}^{\sqrt{log x}} \dots \int_{- \sqrt{log x}}^{\sqrt{log x}} K (ξ_{1}, \dots, ξ_{k}, ξ_{1}^{'}, \dots, ξ_{k}^{'}) \prod_{j = 1}^{k} f_{j} (ξ_{j}) g_{j} (ξ_{j}^{'}) d ξ_{j} d ξ_{j}^{'} = (c + o (1)) B^{- k} \frac{N^{k}}{φ {(N)}^{k}} . \end{matrix}

When $| ξ_{j} | \leq \sqrt{log x}$ , we see from the simple pole of the Riemann zeta function $ζ (s) = \prod_{p} {(1 - \frac{1}{p^{s}})}^{- 1}$ at s=1 that

ζ (1 + \frac{1 + i ξ_{j}}{log x}) = (1 + o (1)) \frac{log x}{1 + i ξ_{j}} .

For $- \sqrt{log x} \leq ξ_{j} \leq \sqrt{log x}$ , we see that

1 - \frac{1}{p^{1 + \frac{1 + i ξ_{j}}{log x}}} = 1 - \frac{1}{p} + O (\frac{log p}{p \sqrt{log x}}) .

Since logW N≪ logO(1)x, this gives

\begin{align} \prod_{p | WN} (1 - \frac{1}{p^{1 + \frac{1 + i ξ_{j}}{log x}}}) & = \frac{φ (WN)}{WN} exp (O (\sum_{p | WN} \frac{log p}{p \sqrt{log x}})) = (1 + o (1)) \frac{φ (WN)}{WN}, \end{align}

since the sum is maximized when WN is composed only of primes p≪ logO(1)x. Thus,

ζ_{WN} (1 + \frac{1 + i ξ_{j}}{log x}) = \frac{(1 + o (1)) Bφ (N)}{(1 + i ξ_{j}) N},

similarly with 1+i ξ_j replaced by $1 + i ξ_{j}^{'}$ or $2 + i ξ_{j} + i ξ_{j}^{'}$ . We conclude that

K (ξ_{1}, \dots, ξ_{k}, ξ_{1}^{'}, \dots, ξ_{k}^{'}) = (1 + o (1)) B^{- k} \frac{N^{k}}{φ {(N)}^{k}} \prod_{j = 1}^{k} \frac{(1 + i ξ_{j}) (1 + i ξ_{j}^{'})}{2 + i ξ_{j} + i ξ_{j}^{'}} .

(41)

Therefore, it will suffice to show that

\int_{ℝ} \dots \int_{ℝ} \prod_{j = 1}^{k} \frac{(1 + i ξ_{j}) (1 + i ξ_{j}^{'})}{2 + i ξ_{j} + i ξ_{j}^{'}} f_{j} (ξ_{j}) g_{j} (ξ_{j}^{'}) d ξ_{j} d ξ_{j}^{'} = c,

since the errors caused by the 1+o(1) multiplicative factor in (41) or the truncation $| ξ_{j} |, | ξ_{j}^{'} | \leq \sqrt{log x}$ can be seen to be negligible using the rapid decay of f_j,g_j. By Fubini’s theorem, it suffices to show that

\int_{ℝ} \int_{ℝ} \frac{(1 + iξ) (1 + i ξ^{'})}{2 + iξ + i ξ^{'}} f_{j} (ξ) g_{j} (ξ^{'}) dξd ξ^{'} = \int_{0}^{+ \infty} F_{j}^{'} (t) G_{j}^{'} (t) dt

for each j=1,…,k. But from dividing (37) by e^t and differentiating under the integral sign, we have

F_{j}^{'} (t) = - \int_{ℝ} (1 + iξ) e^{- t (1 + iξ)} f_{j} (ξ) dξ,

and the claim then follows from Fubini’s theorem.

Finally, suppose that we replace $[d_{j}, d_{j}^{'}]$ with $φ ([d_{j}, d_{j}^{'}])$ . An inspection of the above argument shows that the only change that occurs is that the $\frac{1}{p}$ term in (39) is replaced by $\frac{1}{p - 1}$ ; but this modification may be absorbed into the $1 + O (\frac{1}{p^{2}})$ factor in (40), and the rest of the argument continues as before.

4.1 The trivial case

We can now prove the easiest case of the two theorems, namely case (i) of Theorem 20; a closely related estimate also appears in ([5], Lemma 6.2]). We may assume that x is sufficiently large depending on all fixed quantities. By (16), the left-hand side of (29) may be expanded as

\sum_{d_{1}, \dots, d_{k}, d_{1}^{'}, \dots, d_{k}^{'}} (\prod_{i = 1}^{k} μ (d_{i}) μ (d_{i}^{'}) F_{i} (\underset{x}{log} d_{i}) G_{i} (\underset{x}{log} d_{i}^{'})) S (d_{1}, \dots, d_{k}, d_{1}^{'}, \dots, d_{k}^{'})

(42)

where

S (d_{1}, \dots, d_{k}, d_{1}^{'}, \dots, d_{k}^{'}) : = \sum_{\begin{matrix} x \leq n \leq 2 x \\ n = b (W) \\ n + h_{i} = 0 ([d_{i}, d_{i}^{'}]) \forall i \end{matrix}} 1 .

By hypothesis, b+h_i is coprime to W for all i=1,…,k, and |h_i−h_j|<w for all distinct i,j. Thus, $S (d_{1}, \dots, d_{k}, d_{1}^{'}, \dots, d_{k}^{'})$ vanishes unless the $[d_{i}, d_{i}^{'}]$ are coprime to each other and to W. In this case, $S (d_{1}, \dots, d_{k}, d_{1}^{'}, \dots, d_{k}^{'})$ is summing the constant function 1 over an arithmetic progression in [ x,2x] of spacing $W [d_{1}, d_{1}^{'}] \dots [d_{k}, d_{k}^{'}]$ , and so

S (d_{1}, \dots, d_{k}, d_{1}^{'}, \dots, d_{k}^{'}) = \frac{x}{W [d_{1}, d_{1}^{'}] \dots [d_{k}, d_{k}^{'}]} + O (1) .

By Lemma 30, the contribution of the main term $\frac{x}{W [d_{1}, d_{1}^{'}] \dots [d_{k}, d_{k}^{'}]}$ to (29) is $(c + o (1)) B^{- k} \frac{x}{W}$ ; note that the restriction of the integrals in (30) to [ 0,1] instead of [ 0,+∞) is harmless since S(F_i),S(G_i)<1 for all i. Meanwhile, the contribution of the O(1) error is then bounded by

O (\sum_{d_{1}, \dots, d_{k}, d_{1}^{'}, \dots, d_{k}^{'}} (\prod_{i = 1}^{k} | F_{i} (\underset{x}{log} d_{i}) | | G_{i} (\underset{x}{log} d_{i}^{'}) |)) .

By the hypothesis in Theorem 20(i), we see that for $d_{1}, \dots, d_{k}, d_{1}^{'}, \dots, d_{k}^{'}$ contributing a non-zero term here, one has

[d_{1}, d_{1}^{'}] \dots [d_{k}, d_{k}^{'}] ⪻ ⪻ x^{1 - ε}

for some fixed ε>0. From the divisor bound (1), we see that each choice of $[d_{1}, d_{1}^{'}] \dots [d_{k}, d_{k}^{'}]$ arises from ⪻ ⪻1 choices of $d_{1}, \dots, d_{k}, d_{1}^{'}, \dots, d_{k}^{'}$ . We conclude that the net contribution of the O(1) error to (29) is ⪻ ⪻x^1−ε, and the claim follows.

4.2 The Elliott-Halberstam case

Now we show case (i) of Theorem 19. For the sake of notation, we take i₀=k, as the other cases are similar. We use (16) to rewrite the left-hand side of (26) as

\sum_{d_{1}, \dots, d_{k - 1}, d_{1}^{'}, \dots, d_{k - 1}^{'}} (\prod_{i = 1}^{k - 1} μ (d_{i}) μ (d_{i}^{'}) F_{i} (\underset{x}{log} d_{i}) G_{i} (\underset{x}{log} d_{i}^{'})) \tilde{S} (d_{1}, \dots, d_{k - 1}, d_{1}^{'}, \dots, d_{k - 1}^{'})

(43)

where

\tilde{S} (d_{1}, \dots, d_{k - 1}, d_{1}^{'}, \dots, d_{k - 1}^{'}) : = \sum_{\begin{matrix} x \leq n \leq 2 x \\ n = b (W) \\ n + h_{i} = 0 ([d_{i}, d_{i}^{'}]) \forall i = 1, \dots, k - 1 \end{matrix}} θ (n + h_{k}) .

As in the previous case, $\tilde{S} (d_{1}, \dots, d_{k - 1}, d_{1}^{'}, \dots, d_{k - 1}^{'})$ vanishes unless the $[d_{i}, d_{i}^{'}]$ are coprime to each other and to W, and so the summand in (43) vanishes unless the modulus $q_{W, d_{1}, \dots, d_{k - 1}^{'}}$ defined by

q_{W, d_{1}, \dots, d_{k - 1}^{'}} : = W [d_{1}, d_{1}^{'}] \dots [d_{k - 1}, d_{k - 1}^{'}]

(44)

is square-free. In that case, we may use the Chinese remainder theorem to concatenate the congruence conditions on n into a single primitive congruence condition

n + h_{k} = a_{W, d_{1}, \dots, d_{k - 1}^{'}} (q_{W, d_{1}, \dots, d_{k - 1}^{'}})

for some $a_{W, d_{1}, \dots, d_{k - 1}^{'}}$ depending on $W, d_{1}, \dots, d_{k - 1}, d_{1}^{'}, \dots, d_{k - 1}^{'}$ , and conclude using (3) that

\begin{matrix} \tilde{S} (d_{1}, \dots, d_{k - 1}, d_{1}^{'}, \dots, d_{k - 1}^{'}) & = \frac{1}{φ (q_{W, d_{1}, \dots, d_{k - 1}^{'}})} \sum_{x + h_{k} \leq n \leq 2 x + h_{k}} θ (n) \\ + Δ (1_{[x + h_{k}, 2 x + h_{k}]} θ; a_{W, d_{1}, \dots, d_{k - 1}^{'}} (q_{W, d_{1}, \dots, d_{k - 1}^{'}})) . \end{matrix}

(45)

From the prime number theorem, we have

\sum_{x + h_{k} \leq n \leq 2 x + h_{k}} θ (n) = (1 + o (1)) x

and this expression is clearly independent of $d_{1}, \dots, d_{k - 1}^{'}$ . Thus, by Lemma 30, the contribution of the main term in (45) is $(c + o (1)) B^{1 - k} \frac{x}{φ (W)}$ . By (11) and (12), it thus suffices to show that for any fixed A we have

\sum_{d_{1}, \dots, d_{k - 1}, d_{1}^{'}, \dots, d_{k - 1}^{'}} (\prod_{i = 1}^{k - 1} |F_{i} (\underset{x}{log} d_{i})| |G_{i} (\underset{x}{log} d_{i}^{'})|) |Δ (1_{[x + h_{k}, 2 x + h_{k}]} θ; a (q))| ≪ x \overset{- A}{log} x,

(46)

where $a = a_{W, d_{1}, \dots, d_{k - 1}^{'}}$ and $q = q_{W, d_{1}, \dots, d_{k - 1}^{'}}$ . For future reference, we note that we may restrict the summation here to those $d_{1}, \dots, d_{k - 1}^{'}$ for which $q_{W, d_{1}, \dots, d_{k - 1}^{'}}$ is square-free.

From the hypotheses of Theorem 19(i), we have

q_{W, d_{1}, \dots, d_{k - 1}^{'}} ⪻ ⪻ x^{𝜗}

whenever the summand in (43) is non-zero, and each choice q of $q_{W, d_{1}, \dots, d_{k - 1}^{'}}$ is associated to O(τ(q)^O(1)) choices of $d_{1}, \dots, d_{k - 1}, d_{1}^{'}, \dots, d_{k - 1}^{'}$ . Thus, this contribution is

≪ \sum_{q ⪻ ⪻ x^{𝜗}} τ {(q)}^{O (1)} sup_{a \in {(ℤ / qℤ)}^{\times}} |Δ (1_{[x + h_{k}, 2 x + h_{k}]} θ; a (q))| .

Using the crude bound

|Δ (1_{[x + h_{k}, 2 x + h_{k}]} θ; a (q))| ≪ \frac{x}{q} \overset{O (1)}{log} x

and (2), we have

\sum_{q ⪻ ⪻ x^{𝜗}} τ {(q)}^{C} sup_{a \in {(ℤ / qℤ)}^{\times}} |Δ (1_{[x + h_{k}, 2 x + h_{k}]} θ; a (q))| ≪ x \overset{O (1)}{log} x

for any fixed C>0. By the Cauchy-Schwarz inequality, it suffices to show that

\sum_{q ⪻ ⪻ x^{𝜗}} sup_{a \in {(ℤ / qℤ)}^{\times}} |Δ (1_{[x + h_{k}, 2 x + h_{k}]} θ; a (q))| ≪ x \overset{- A}{log} x

for any fixed A>0. However, since θ only differs from Λ on powers p^j of primes with j>1, it is not difficult to show that

|Δ (1_{[x + h_{k}, 2 x + h_{k}]} θ; a (q)) - Δ (1_{[x + h_{k}, 2 x + h_{k}]} Λ; a (q))| ⪻ ⪻ \sqrt{\frac{x}{q}},

so the net error in replacing θ here by Λ is ⪻ ⪻x^{1−(1−𝜗)/2}, which is certainly acceptable. The claim now follows from the hypothesis EH[ 𝜗], thanks to Claim 8.

4.3 The Motohashi-Pintz-Zhang case

Now we show case (ii) of Theorem 19. We repeat the arguments from the ‘The Elliott-Halberstam case’ section, with the only difference being in the derivation of (46). As observed previously, we may restrict $q_{W, d_{1}, \dots, d_{k - 1}^{'}}$ to be square-free. From the hypotheses in Theorem 19(ii), we also see that

q_{W, d_{1}, \dots, d_{k - 1}^{'}} ⪻ ⪻ x^{𝜗}

and that all the prime factors of $q_{W, d_{1}, \dots, d_{k - 1}^{'}}$ are at most x^δ. Thus, if we set I:= [ 1,x^δ], we see (using the notation from Claim 10) that $q_{W, d_{1}, \dots, d_{k - 1}^{'}}$ lies in $S_{I}$ and is thus a factor of P_I. If we then let $A \subset ℤ / P_{I} ℤ$ denote all the primitive residue classes a (P_I) with the property that a=b (W), and such that for each prime w<p≤x^δ, one has a+h_i=0 (p) for some i=1,…,k, then we see that $a_{W, d_{1}, \dots, d_{k - 1}^{'}}$ lies in the projection of to $ℤ / q_{W, d_{1}, \dots, d_{k - 1}^{'}} ℤ$ . Each $q \in S_{I}$ is equal to $q_{W, d_{1}, \dots, d_{k - 1}^{'}}$ for O(τ(q)^O(1)) choices of $d_{1}, \dots, d_{k - 1}^{'}$ . Thus, the left-hand side of (46) is

≪ \sum_{q \in S_{I} : q ⪻ ⪻ x^{𝜗}} τ {(q)}^{O (1)} sup_{a \in A} |Δ (1_{[x + h_{k}, 2 x + h_{k}]} θ; a (q))| .

Note from the Chinese remainder theorem that for any given q, if one lets a range uniformly in , then a (q) is uniformly distributed among O(τ(q)^O(1)) different moduli. Thus, we have

sup_{a \in A} |Δ (1_{[x + h_{k}, 2 x + h_{k}]} θ; a (q))| ≪ \frac{τ {(q)}^{O (1)}}{| A |} \sum_{a \in A} |Δ (1_{[x + h_{k}, 2 x + h_{k}]} θ; a (q))|,

and so it suffices to show that

\sum_{q \in S_{I} : q ⪻ ⪻ x^{𝜗}} \frac{τ {(q)}^{O (1)}}{| A |} \sum_{a \in A} |Δ (1_{[x + h_{k}, 2 x + h_{k}]} θ; a (q))| ≪ x \overset{- A}{log} x

for any fixed A>0. We see it suffices to show that

\sum_{q \in S_{I} : q ⪻ ⪻ x^{𝜗}} τ {(q)}^{O (1)} |Δ (1_{[x + h_{k}, 2 x + h_{k}]} θ; a (q))| ≪ x \overset{- A}{log} x

for any given $a \in A$ . But this follows from the hypothesis MPZ[ ϖ,δ] by repeating the arguments of the ‘The Elliott-Halberstam case’ section.

4.4 Crude estimates on divisor sums

To proceed further, we will need some additional information on the divisor sums λ_F (defined in (16)), namely that these sums are concentrated on ‘almost primes’; results of this type have also appeared in [38].

Proposition 14(Almost primality).

Let k≥1 be fixed, let (h₁,…,h_k) be a fixed admissible k-tuple, and let b (W)be such that b+h_i is coprime to W for each i=1,…,k. Let $F_{1}, \dots, F_{k} : [0, + \infty) \to ℝ$ be fixed smooth compactly supported functions, and let m₁,…,m_k≥0 and a₁,…,a_k≥1 be fixed natural numbers. Then,

\sum_{x \leq n \leq 2 x : n = b (W)} \prod_{j = 1}^{k} (| λ_{F_{j}} (n + h_{j}) |^{a_{j}} τ {(n + h_{j})}^{m_{j}}) ≪ B^{- k} \frac{x}{W} .

(47)

Furthermore, if 1≤j₀≤k is fixed and p₀ is a prime with $p_{0} \leq x^{\frac{1}{10 k}}$ , then we have the variant

\sum_{x \leq n \leq 2 x : n = b (W)} \prod_{j = 1}^{k} (| λ_{F_{j}} (n + h_{j}) |^{a_{j}} τ {(n + h_{j})}^{m_{j}}) 1_{p_{0} | n + h_{j_{0}}} ≪ \frac{\underset{x}{log} p_{0}}{p_{0}} B^{- k} \frac{x}{W} .

(48)

As a consequence, we have

\sum_{x \leq n \leq 2 x : n = b (W)} \prod_{j = 1}^{k} (| λ_{F_{j}} (n + h_{j}) |^{a_{j}} τ {(n + h_{j})}^{m_{j}}) 1_{p (n + h_{j_{0}}) \leq x^{ε}} ≪ ε B^{- k} \frac{x}{W},

(49)

for any ε>0, where p(n) denotes the least prime factor of n.

The exponent $\frac{1}{10 k}$ can certainly be improved here, but for our purposes, any fixed positive exponent depending only on k will suffice.

Proof.

The strategy is to estimate the alternating divisor sums $λ_{F_{j}} (n + h_{j})$ by non-negative expressions involving prime factors of n+h_j, which can then be bounded combinatorially using standard tools.

We first prove (47). As in the proof of Proposition 30, we can use Fourier expansion to write

F_{j} (\underset{x}{log} d) = \int_{ℝ} \frac{f_{j} (ξ)}{d^{\frac{1 + iξ}{log x}}} dξ

for some rapidly decreasing $f_{j} : ℝ \to ℂ$ and all natural numbers d. Thus,

λ_{F_{j}} (n) = \int_{ℝ} (\sum_{d | n} \frac{μ (d)}{d^{\frac{1 + iξ}{log x}}}) f_{j} (ξ) dξ,

which factorizes using Euler products as

λ_{F_{j}} (n) = \int_{ℝ} \prod_{p | n} (1 - \frac{1}{p^{\frac{1 + iξ}{log x}}}) f_{j} (ξ) dξ.

The function $s \mapsto p^{\frac{- s}{log x}}$ has a magnitude of O(1) and a derivative of O(logx p) when ℜ(s)>1, and thus

1 - \frac{1}{p^{\frac{1 + iξ}{log x}}} = O (min ((1 + | ξ |) \underset{x}{log} p, 1)) .

From the rapid decrease of f_j and the triangle inequality, we conclude that

| λ_{F_{j}} (n) | ≪ \int_{ℝ} (\prod_{p | n} O (min ((1 + | ξ |) \underset{x}{log} p, 1))) \frac{dξ}{{(1 + | ξ |)}^{A}}

for any fixed A>0. Thus, noting that $\prod_{p | n} O (1) ≪ τ {(n)}^{O (1)}$ , we have

\begin{matrix} | λ_{F_{j}} (n) |^{a_{j}} ≪ τ {(n)}^{O (1)} \int_{ℝ} \dots \int_{ℝ} (\prod_{p | n} \prod_{l = 1}^{a_{j}} min ((1 + | ξ_{l} |) \underset{x}{log} p, 1)) \\ \frac{d ξ_{1} \dots d ξ_{a_{j}}}{{(1 + | ξ_{1} |)}^{A} \dots {(1 + | ξ_{a_{j}} |)}^{A}} \end{matrix}

for any fixed a_j,A. However, we have

\prod_{i = 1}^{a_{j}} min ((1 + | ξ_{i} |) \underset{x}{log} p, 1) ≪ min ((1 + | ξ_{1} | + \dots + | ξ_{a_{j}} |) \underset{x}{log} p, 1),

and so

| λ_{F_{j}} (n) |^{a_{j}} ≪ τ {(n)}^{O (1)} \int_{ℝ} \dots \int_{ℝ} \frac{(\prod_{p | n} min ((1 + | ξ_{1} | + \dots + | ξ_{a_{j}} |) \underset{x}{log} p, 1)) d ξ_{1} \dots d ξ_{a_{j}}}{{(1 + | ξ_{1} | + \dots + | ξ_{a_{j}} |)}^{A}} .

Making the change of variables $σ : = 1 + | ξ_{1} | + \dots + | ξ_{a_{j}} |$ , we obtain

| λ_{F_{j}} (n) |^{a_{j}} ≪ τ {(n)}^{O (1)} \int_{1}^{\infty} (\prod_{p | n} min (σ \underset{x}{log} p, 1)) \frac{dσ}{σ^{A}}

for any fixed A>0. In view of this bound and the Fubini-Tonelli theorem, it suffices to show that

\sum_{x \leq n \leq 2 x : n = b (W)} \prod_{j = 1}^{k} (τ {(n + h_{j})}^{O (1)} \prod_{p | n} min (σ_{j} \underset{x}{log} p, 1)) ≪ B^{- k} \frac{x}{W} {(σ_{1} + \dots + σ_{k})}^{O (1)}

for all σ₁,…,σ_k≥1. By setting σ:=σ₁+⋯+σ_k, it suffices to show that

\sum_{x \leq n \leq 2 x : n = b (W)} \prod_{j = 1}^{k} (τ {(n + h_{j})}^{O (1)} \prod_{p | n + h_{j}} min (σ \underset{x}{log} p, 1)) ≪ B^{- k} \frac{x}{W} σ^{O (1)}

(50)

for any σ≥1.

To proceed further, we factorize n+h_j as a product

n + h_{j} = p_{1} \dots p_{r}

of primes p₁≤⋯≤p_r in increasing order and then write

n + h_{j} = d_{j} m_{j}

where $d_{j} : = p_{1} \dots p_{i_{j}}$ and i_j is the largest index for which $p_{1} \dots p_{i_{j}} < x^{\frac{1}{10 k}}$ , and $m_{j} : = p_{i_{j} + 1} \dots p_{r}$ . By construction, we see that 0≤i_j<r, $d_{j} \leq x^{\frac{1}{10 k}}$ . Also, we have

p_{i_{j} + 1} \geq {(p_{1} \dots p_{i_{j} + 1})}^{\frac{1}{i_{j} + 1}} \geq x^{\frac{1}{10 k (i_{j} + 1)}} .

Since n≤2x, this implies that

r = O (i_{j} + 1)

and so

τ (n + h_{j}) \leq 2^{O (1 + Ω (d_{j}))},

where we recall that Ω(d_j)=i_j denotes the number of prime factors of d_j, counting multiplicity. We also see that

p (m_{j}) \geq x^{\frac{1}{10 k (1 + Ω (d_{j}))}} \geq x^{\frac{1}{10 k (1 + Ω (d_{1} \dots d_{k}))}} = : R,

where p(n) denotes the least prime factor of n. Finally, we have that

\prod_{p | n + h_{j}} min (σ \underset{x}{log} p, 1) \leq \prod_{p | d_{j}} min (σ \underset{x}{log} p, 1),

and we see that the d₁,…,d_k,W are coprime. We may thus estimate the left-hand side of (50) by

≪ \sum_{*} (\prod_{j = 1}^{k} 2^{O (1 + Ω (d_{j})} \prod_{p | d_{j}} min (σ \underset{x}{log} p, 1)) \sum_{* *} 1

where the outer sum $\sum_{*}$ is over $d_{1}, \dots, d_{k} \leq x^{\frac{1}{10 k}}$ with d₁,…,d_k,W coprime, and the inner sum $\sum_{* *}$ is over x≤n≤2x with n=b (W) and n+h_j=0 (d_j) for each j, with $p (\frac{n + h_{j}}{d_{j}}) \geq R$ for each j.

We bound the inner sum $\sum_{* *} 1$ using a Selberg sieve upper bound. Let G be a smooth function supported on [ 0,1] with G(0)=1, and let d=d₁…d_k. We see that

\sum_{* *} 1 \leq \sum_{\begin{matrix} x \leq n \leq 2 x \\ n + h_{i} \equiv 0 (d_{i}) \\ n \equiv b (W) \end{matrix}} \prod_{i = 1}^{k} {(\sum_{\begin{matrix} e | n + h_{i} \\ (e, dW) = 1 \end{matrix}} μ (e) G (\underset{R}{log} e))}^{2},

since the product is G(0)^2k=1 if $p (\frac{n + h_{j}}{d_{j}}) \geq R$ , and non-negative otherwise. The right-hand side may be expanded as

\sum_{\begin{matrix} e_{1}, \dots, e_{k}, e_{1}^{'}, \dots, e_{k}^{'} \\ (e_{i} e_{i}^{'}, dW) = 1 \forall i \end{matrix}} (\prod_{i = 1}^{k} μ (e_{i}) μ (e_{i}^{'}) G (\underset{R}{log} e_{i}) G (\underset{R}{log} e_{i}^{'})) \sum_{\begin{matrix} x \leq n \leq 2 x \\ n + h_{i} \equiv 0 (d_{i} [e_{i}, e_{i}^{'}]) \\ n \equiv b (W) \end{matrix}} 1 .

As in the ‘The trivial case’ section, the inner sum vanishes unless the $e_{i} e_{i}^{'}$ are coprime to each other and dW, in which case it is

\frac{x}{dW [e_{1}, e_{1}^{'}] \dots [e_{k}, e_{k}^{'}]} + O (1) .

The O(1) term contributes ⪻ ⪻R^k⪻ ⪻x^1/10, which is negligible. By Lemma 30, if Ω(d)≪ log1/2x, then the main term contributes

≪ {(\frac{d}{φ (d)})}^{k} \frac{x}{dW} {(log R)}^{- k} ≪ 2^{Ω (d)} B^{- k} \frac{x}{dW} .

We see that this final bound applies trivially if Ω(d)≫ log1/2x. The bound (50) thus reduces to

\sum_{*} (\prod_{j = 1}^{k} \frac{2^{O (1 + Ω (d_{j}))}}{d_{j}} \prod_{p | d_{j}} min (σ \underset{x}{log} p, 1)) ≪ σ^{O (1)} .

(51)

Ignoring the coprimality conditions on the d_j for an upper bound, we see this is bounded by

\begin{matrix} \prod_{w < p \leq x^{\frac{1}{10 k}}} {(1 + \frac{O (min (σ \underset{x}{log} (p), 1))}{p} \sum_{j \geq 0} \frac{O {(1)}^{j}}{p^{j}})}^{k} \\ ≪ exp (O (\sum_{p \leq x} \frac{(min (σ \underset{x}{log} (p), 1))}{p})) . \end{matrix}

But from Mertens’ theorem, we have

\sum_{p \leq x} \frac{min (σ \underset{x}{log} p, 1)}{p} = O (log \frac{1}{σ}),

and the claim (47) follows.

The proof of (48) is a minor modification of the argument above used to prove (47). Namely, the variable $d_{j_{0}}$ is now replaced by [ d₀,p₀]<x^1/5k, which upon factoring out p₀ has the effect of multiplying the upper bound for (51) by $O (\frac{σ \underset{x}{log} p_{0}}{p_{0}})$ (at the negligible cost of deleting the prime p₀ from the sum $\sum_{p \leq x})$ , giving the claim; we omit the details.

Finally, (49) follows immediately from (47) when $ε > \frac{1}{10 k}$ , and from (48) and Mertens’ theorem when $ε \leq \frac{1}{10 k}$ .

Remark 32.

As in [38], one can use Proposition 14, together with the observation that the quantity λ_F(n) is bounded whenever n=O(x) and p(n)≥x^ε, to conclude that whenever the hypotheses of Lemma 18 are obeyed for some ν of the form (18), then there exists a fixed ε>0 such that for all sufficiently large x, there are $≫ \frac{x}{\overset{k}{log} x}$ elements n of [x,2x] such that n+h₁,…,n+h_k have no prime factor less than x^ε, and that at least m of the n+h₁,…,n+h_k are prime.

4.5 The generalized Elliott-Halberstam case

Now we show case (ii) of Theorem 20. For the sake of notation, we shall take i₀=k, as the other cases are similar; thus, we have

\sum_{i = 1}^{k - 1} (S (F_{i}) + S (G_{i})) < 𝜗.

(52)

The basic idea is to view the sum (29) as a variant of (26), with the role of the function θ now being played by the product divisor sum $λ_{F_{k}} λ_{G_{k}}$ , and to repeat the arguments in the ‘The Elliott-Halberstam case’ section. To do this, we rely on Proposition 14 to restrict n+h_i to the almost primes.

We turn to the details. Let ε>0 be an arbitrary fixed quantity. From (49) and Cauchy-Schwarz, one has

\sum_{\begin{matrix} x \leq n \leq 2 x \\ n = b (W) \end{matrix}} (\prod_{i = 1}^{k} λ_{F_{i}} (n + h_{i}) λ_{G_{i}} (n + h_{i})) 1_{p (n + h_{k}) \leq x^{ε}} = O (ε B^{- k} \frac{x}{W})

with the implied constant uniform in ε, so by the triangle inequality and a limiting argument as ε→0, it suffices to show that

\sum_{\begin{matrix} x \leq n \leq 2 x \\ n = b (W) \end{matrix}} (\prod_{i = 1}^{k} λ_{F_{i}} (n + h_{i}) λ_{G_{i}} (n + h_{i})) 1_{p (n + h_{k}) > x^{ε}} = (c_{ε} + o (1)) B^{- k} \frac{x}{W}

(53)

where c_ε is a quantity depending on ε but not on x, such that

{lim}_{ε \to 0} c_{ε} = \prod_{i = 1}^{k} \int_{0}^{1} F_{i}^{'} (t) G_{i}^{'} (t) dt.

We use (16) to expand out $λ_{F_{i}}, λ_{G_{i}}$ for i=1,…,k−1, but not for i=k, so that the left-hand side of (29) becomes

\sum_{d_{1}, \dots, d_{k - 1}, d_{1}^{'}, \dots, d_{k - 1}^{'}} (\prod_{i = 1}^{k} μ (d_{i}) μ (d_{i}^{'}) F_{i} (\underset{x}{log} d_{i}) G_{i} (\underset{x}{log} d_{i}^{'})) S^{'} (d_{1}, \dots, d_{k - 1}, d_{1}^{'}, \dots, d_{k - 1}^{'})

(54)

where

S^{'} (d_{1}, \dots, d_{k - 1}, d_{1}^{'}, \dots, d_{k - 1}^{'}) : = \sum_{\begin{matrix} x \leq n \leq 2 x \\ n = b (W) \\ n + h_{i} = 0 ([d_{i}, d_{i}^{'}]) \forall i = 1, \dots, k - 1 \end{matrix}} λ_{F_{k}} (n + h_{k}) λ_{G_{k}} (n + h_{k}) 1_{p (n + h_{k}) > x^{ε}} .

As before, the summand in (54) vanishes unless the modulus^d $q_{W, d_{1}, \dots, d_{k - 1}^{'}}$ defined in (44) is square-free, in which case we have the analogue

\begin{align} S^{'} (d_{1}, \dots, d_{k - 1}, d_{1}^{'}, \dots, d_{k - 1}^{'}) & = \frac{1}{φ (q)} \sum_{\begin{matrix} x + h_{k} \leq n \leq 2 x + h_{k} \\ (n, q) = 1 \end{matrix}} λ_{F_{k}} (n) λ_{G_{k}} (n) 1_{p (n) > x^{ε}} \\ + Δ (1_{[x + h_{k}, 2 x + h_{k}]} λ_{F_{k}} λ_{G_{k}} 1_{p (\cdot) > x^{ε}}; a (q)) \end{align}

(55)

of (45). Here we have put $q = q_{W, d_{1}, \dots, d_{k - 1}^{'}}$ and $a = a_{W, d_{1}, \dots, d_{k - 1}^{'}}$ for convenience. We thus split

S^{'} = S_{1}^{'} - S_{2}^{'} + S_{3}^{'},

where,

\begin{align} S_{1}^{'} (d_{1}, \dots, d_{k - 1}, d_{1}^{'}, \dots, d_{k - 1}^{'}) & = \frac{1}{φ (q)} \sum_{x + h_{k} \leq n \leq 2 x + h_{k}} λ_{F_{k}} (n) λ_{G_{k}} (n) 1_{p (n) > x^{ε}}, \end{align}

(56)

\begin{align} S_{2}^{'} (d_{1}, \dots, d_{k - 1}, d_{1}^{'}, \dots, d_{k - 1}^{'}) & = \frac{1}{φ (q)} \sum_{x + h_{k} \leq n \leq 2 x + h_{k}; (n, q) > 1} λ_{F_{k}} (n) λ_{G_{k}} (n) 1_{p (n) > x^{ε}}, \end{align}

(57)

\begin{align} S_{3}^{'} (d_{1}, \dots, d_{k - 1}, d_{1}^{'}, \dots, d_{k - 1}^{'}) & = Δ (1_{[x + h_{k}, 2 x + h_{k}]} λ_{F_{k}} λ_{G_{k}} 1_{p (\cdot) > x^{ε}}; a (q)), \end{align}

(58)

when $q = q_{W, d_{1}, \dots, d_{k - 1}^{'}}$ is square-free, with $S_{1}^{'} = S_{2}^{'} = S_{3}^{'} = 0$ otherwise.

For j∈{1,2,3}, let

\begin{matrix} Σ_{j} = & \sum_{d_{1}, \dots, d_{k - 1}, d_{1}^{'}, \dots, d_{k - 1}^{'}} (\prod_{i = 1}^{k} μ (d_{i}) μ (d_{i}^{'}) F_{i} (\underset{x}{log} d_{i}) G_{i} (\underset{x}{log} d_{i}^{'})) \\ S_{j}^{'} (d_{1}, \dots, d_{k - 1}, d_{1}^{'}, \dots, d_{k - 1}^{'}) . \end{matrix}

(59)

To show (53), it thus suffices to show the main term estimate

Σ_{1} = (c_{ε} + o (1)) B^{- k} \frac{x}{W},

(60)

the first error term estimate

Σ_{2} ⪻ ⪻ x^{1 - ε},

(61)

and the second error term estimate

Σ_{3} ≪ x \overset{- A}{log} x

(62)

for any fixed A>0.

We begin with (61). Observe that if p(n)>x^ε, then the only way that $(n, q_{W, d_{1}, \dots, d_{k - 1}^{'}})$ can exceed 1 is if there is a prime x^ε<p≪x which divides both n and one of $d_{1}, \dots, d_{k - 1}^{'}$ ; in particular, this case can only occur when k>1. For the sake of notation, we will just consider the contribution when there is a prime that divides p and d₁, as the other 2k−3 cases are similar. By (57), this contribution to Σ₂ can then be crudely bounded (using (1)) by

\begin{align} Σ_{2} & ⪻ ⪻ \sum_{x^{ε} < p ≪ x} \sum_{d_{1}, \dots, d_{k - 1}, d_{1}^{'}, \dots, d_{k - 1}^{'} \leq x; p | d_{1}} \frac{1}{[d_{1}, d_{1}^{'}] \dots [d_{k - 1}, d_{k - 1}^{'}]} \sum_{n ≪ x : p | n} 1 \\ ⪻ ⪻ \sum_{x^{ε} < p ≪ x} \frac{x}{p} (\sum_{e_{1} \leq x^{2}; p | e_{1}} \frac{τ (e_{1})}{e_{1}}) \prod_{i = 2}^{k - 1} (\sum_{e_{i} \leq x^{2}} \frac{τ (e_{i})}{e_{i}}) \\ ⪻ ⪻ \sum_{x^{ε} < p ≪ x} \frac{x}{p^{2}} \\ ⪻ ⪻ x^{1 - ε} \end{align}

as required, where we have made the change of variables $e_{i} : = [d_{i}, d_{i}^{'}]$ , using the divisor bound to control the multiplicity.

Now we show (62). From the hypothesis (28), we have $q_{W, d_{1}, \dots, d_{k - 1}^{'}} ⪻ ⪻ x^{θ}$ whenever the summand in (62) is non-zero. From the divisor bound, for each q⪻ ⪻x^θ, there are O(τ(q)^O(1)) choices of $d_{1}, \dots, d_{k - 1}^{'}$ with $q_{W, d_{1}, \dots, d_{k - 1}^{'}} = q$ . We see that the product in (59) is O(1). Thus, by (58), we may bound Σ₃ by

Σ_{3} ≪ \sum_{q ⪻ ⪻ x^{θ}} τ {(q)}^{O (1)} sup_{a \in {(ℤ / qℤ)}^{\times}} |Δ (1_{[x + h_{k}, 2 x + h_{k}]} λ_{F_{k}} λ_{G_{k}} 1_{p (\cdot) > x^{ε}}; a (q))| .

From (2), we easily obtain the bound

Σ_{3} ≪ \sum_{q ⪻ ⪻ x^{θ}} τ {(q)}^{O (1)} sup_{a \in {(ℤ / qℤ)}^{\times}} |Δ (1_{[x + h_{k}, 2 x + h_{k}]} λ_{F_{k}} λ_{G_{k}} 1_{p (\cdot) > x^{ε}}; a (q))| ≪ x \overset{O (1)}{log} x,

so by Cauchy-Schwarz, it suffices to show that

\sum_{q ⪻ ⪻ x^{θ}} sup_{a \in {(ℤ / qℤ)}^{\times}} |Δ (1_{[x + h_{k}, 2 x + h_{k}]} λ_{F_{k}} λ_{G_{k}} 1_{p (\cdot) > x^{ε}}; a (q))| ≪ x \overset{- A}{log} x

(63)

for any fixed A>0.

If we had the additional hypothesis S(F_k)+S(G_k)<1, then this would follow easily from the hypothesis GEH[ 𝜗] thanks to Claim 12, since one can write $λ_{F_{k}} λ_{G_{k}} 1_{p (\cdot) > x^{ε}} = α ⋆ β$ with

α (n) : = 1_{p (n) > x^{ε}} \sum_{d, d^{'} : [d, d^{'}] = n} μ (d) F_{k} (\underset{x}{log} d) μ (d^{'}) G_{k} (\underset{x}{log} d^{'})

and

β (n) : = 1_{p (n) > x^{ε}} .

But even in the absence of the hypothesis S(F_k)+S(G_k)<1, we can still invoke GEH[ 𝜗] after appealing to the fundamental theorem of arithmetic. Indeed, if n∈[x+h_k,2x+h_k] with p(·)>ε, then we have

n = p_{1} \dots p_{r}

for some primes x^ε<p₁≤⋯≤p_r≤2x+h_k, which forces $r \leq \frac{1}{ε} + 1$ . If we then partition [ x^ε,2x+h_k] by O(logA+1x) intervals I₁,…,I_m, with each I_j contained in an interval of the form [N,(1+ log−A x)N], then we have $p_{i} \in I_{j_{i}}$ for some 1≤j₁≤⋯≤j_r≤m, with the product interval $I_{j_{1}} \cdot \dots \cdot I_{j_{r}}$ intersecting [ x+h_k,2x+h_k]. For fixed r, there are O(logA r+r x) such tuples (j₁,…,j_r), and a simple application of the prime number theorem with classical error term (and crude estimates on the discrepancy Δ) shows that each tuple contributes O(x log−A r+O(1)x) to (63) (here, and for the rest of this section, implied constants will be independent of A unless stated otherwise). In particular, the O(logA(r−1)x) tuples (j₁,…,j_r) with one repeated j_i, or for which the interval $I_{j_{1}} \cdot \dots \cdot I_{j_{r}}$ meets the boundary of [x+h_k,2x+h_k], contributes O(log−A+O(1)x). This is an acceptable error to (63), and so these tuples may be removed. Thus, it suffices to show that

\sum_{q ⪻ ⪻ x^{θ}} sup_{a \in {(ℤ / qℤ)}^{\times}} |Δ (λ_{F_{k}} λ_{G_{k}} 1_{A_{j_{1}, \dots, j_{r}}}; a (q))| ≪ x \overset{- A (r + 1) + O (1)}{log} x

for any $1 \leq r \leq \frac{1}{ε} + 1$ and 1≤j₁<⋯<j_r≤m with $I_{j_{1}} \cdot \dots \cdot I_{j_{r}}$ contained in [ x+h_k,x+2h_k], where $A_{j_{1}, \dots, j_{r}}$ is the set of all products p₁…p_r with $p_{i} \in I_{j_{i}}$ for i=1,…,r, and where we allow implied constants in the ≪ notation to depend on ε. But for n in $A_{j_{1}, \dots, j_{r}}$ , the 2^r factors of n are just the products of subsets of {p₁,…,p_r}, and from the smoothness of F_k,G_k, we see that $λ_{F_{k}} (n)$ is equal to some bounded constant (depending on j₁,…,j_r, but independent of p₁,…,p_r), plus an error of O(log−A x). As before, the contribution of this error is O(log−A(r+1)+O(1)x), so it suffices to show that

\sum_{q ⪻ ⪻ x^{θ}} sup_{a \in {(ℤ / qℤ)}^{\times}} |Δ (1_{A_{j_{1}, \dots, j_{r}}}; a (q))| ≪ x \overset{- A (r + 1) + O (1)}{log} x.

But one can write $1_{A_{j_{1}, \dots, j_{r}}}$ as a convolution $1_{A_{j_{1}}} ⋆ \dots ⋆ 1_{A_{j_{r}}}$ , where $A_{j_{i}}$ denotes the primes in $I_{j_{i}}$ ; assigning $A_{j_{r}}$ (for instance) to be β and the remaining portion of the convolution to be α, the claim now follows from the hypothesis GEH[ 𝜗], thanks to the Siegel-Walfisz theorem (see, e.g. [32], Satz 4] or [33], Th. 5.29]).

Finally, we show (60). By Lemma 30, we have

\sum_{\begin{matrix} d_{1}, \dots, d_{k - 1}, d_{1}^{'}, \dots, d_{k - 1}^{'} \\ d_{1} d_{1}^{'}, \dots, d_{k - 1} d_{k - 1}^{'}, W coprime \end{matrix}} \frac{\prod_{i = 1}^{k - 1} μ (d_{i}) μ (d_{i}^{'}) F_{i} (\underset{x}{log} d_{i}) G_{i} (\underset{x}{log} d_{i}^{'})}{φ (q_{W, d_{1}, \dots, d_{k - 1}^{'}})} = \frac{1}{φ (W)} (C^{'} + o (1)) B^{- k + 1},

where

C^{'} : = \prod_{i = 1}^{k - 1} \int_{0}^{1} F_{i}^{'} (t) G_{i}^{'} (t) dt

(note that F_i,G_i are supported on [ 0,1] by hypothesis), so by (56) it suffices to show that

\sum_{x + h_{k} \leq n \leq 2 x + h_{k}} λ_{F_{k}} (n) λ_{G_{k}} (n) 1_{p (n) > x^{ε}} = (C_{ε}^{'^{'}} + o (1)) \frac{x}{log x},

(64)

where $C_{ε}^{′′}$ is a quantity depending on ε but not on x such that

{lim}_{ε \to 0} C_{ε}^{′′} = \int_{0}^{1} F_{k}^{'} (t) G_{k}^{'} (t) dt.

In the case S(F_k)+S(G_k)<1, this would follow easily from (the k=1 case of) Theorem 20(i) and Proposition 14. In the general case, we may appeal once more to the fundamental theorem of arithmetic. As before, we may factor n=p₁…p_r for some x^ε≤p₁≤⋯≤p_r≤2x+h_k and $r \leq \frac{1}{ε} + 1$ . The contribution of those n with a repeated prime factor p_i=p_i+1 can easily be shown to be ⪻ ⪻x^1−ε in the same manner we dealt with Σ₂, so we may restrict attention to the square-free n, for which the p_i are strictly increasing. In that case, one can write

λ_{F_{k}} (n) = {(- 1)}^{r} \partial_{(\underset{x}{log} p_{1})} \dots \partial_{(\underset{x}{log} p_{r})} F_{k} (0)

and

λ_{G_{k}} (n) = {(- 1)}^{r} \partial_{(\underset{x}{log} p_{1})} \dots \partial_{(\underset{x}{log} p_{r})} G_{k} (0)

where ∂_(h)F(x):=F(x+h)−F(x). On the other hand, a standard application of Mertens’ theorem and the prime number theorem (and an induction on r) shows that for any fixed r≥1 and any fixed continuous function $f : ℝ^{r} \to ℝ$ , we have

\sum_{x^{ε} \leq p_{1} < \dots < p_{r} : x + h_{k} \leq p_{1} \leq \dots p_{r} \leq 2 x + h_{k}} f (\underset{x}{log} p_{1}, \dots, \underset{x}{log} p_{r}) = (c_{f} + o (1)) \frac{x}{log x}

where c_f is the quantity

c_{f} : = \int_{ε \leq t_{1} < \dots < t_{r} : t_{1} + \dots + t_{r} = 1} f (t_{1}, \dots, t_{r}) \frac{n 1 \dots d t_{r - 1}}{t_{1} \dots t_{r}}

where we lift Lebesgue measure d t₁…d t_r−1 up to the hyperplane t₁+⋯+t_r=1, and thus

\int_{t_{1} + \dots + t_{r} = 1} F (t_{1}, \dots, t_{r}) d t_{1} \dots d t_{r - 1} : = \int_{ℝ^{r - 1}} F (t_{1}, \dots, t_{r - 1}, 1 - t_{1} - \dots - t_{r - 1}) d t_{1} \dots d t_{r - 1} .

Putting all these together, we see that we obtain an asymptotic (64) with

C_{ε}^{′′} : = \sum_{1 \leq r \leq \frac{1}{ε} + 1} \int_{ε \leq t_{1} < \dots < t_{r} : t_{1} + \dots + t_{r} = 1} \partial_{(t_{1})} \dots \partial_{(t_{r})} F_{k} (0) \partial_{(t_{1})} \dots \partial_{(t_{r})} G_{k} (0) \frac{d t_{1} \dots d t_{r - 1}}{t_{1} \dots t_{r}} .

By Proposition 14, we have $C_{ε}^{′′} + O (ε) = O (1)$ . In the case F_k=G_k, we see that this implies $_{ε}^{'}$ converges to a limit as ε→0, and the general case F_k≠G_k then follows from using the Cauchy-Schwarz inequality. Therefore, we have the absolute convergence

\sum_{r > 0} \int_{0 < t_{1} < \dots < t_{r} : t_{1} + \dots + t_{r} = 1} | \partial_{t_{1}} \dots \partial_{t_{r}} F_{k} (0) | | \partial_{t_{1}} \dots \partial_{t_{r}} G_{k} (0) | \frac{d t_{1} \dots d t_{r - 1}}{t_{1} \dots t_{r}} < \infty,

(65)

and so, by the dominated convergence theorem, it suffices to establish the identity

\sum_{r > 0} \int_{0 < t_{1} < \dots < t_{r} : t_{1} + \dots + t_{r} = 1} \partial_{t_{1}} \dots \partial_{t_{r}} F_{k} (0) \partial_{t_{1}} \dots \partial_{t_{r}} G_{k} (0) \frac{d t_{1} \dots d t_{r - 1}}{t_{1} \dots t_{r}} = \int_{0}^{1} F_{k}^{'} (t) G_{k}^{'} (t) dt.

(66)

It will suffice to show the identity

\sum_{r > 0} \int_{0 < t_{1} < \dots < t_{r} : t_{1} + \dots + t_{r} = 1} | \partial_{t_{1}} \dots \partial_{t_{r}} F (0) |^{2} \frac{d t_{1} \dots d t_{r - 1}}{t_{1} \dots t_{r}} = \int_{0}^{1} | F^{'} (t) |^{2} dt

(67)

for any smooth $F : [0, + \infty) \to ℝ$ , since (66) follows by replacing F with F_k+G_k and F_k−G_k and then subtracting.

At this point, we use the following identity:

Lemma 33.

For any positive reals t₁,…,t_r with r≥1, we have

\frac{1}{t_{1} \dots t_{r}} = \sum_{σ \in S_{r}} \frac{1}{\prod_{i = 1}^{r} (\sum_{j = i}^{r} t_{σ (j)})} .

(68)

Thus, for instance, when r=2, we have

\frac{1}{t_{1} t_{2}} = \frac{1}{(t_{1} + t_{2}) t_{1}} + \frac{1}{(t_{1} + t_{2}) t_{2}} .

Proof.

If the right-hand side of (68) is denoted f_r(t₁,…,t_r), then one easily verifies the identity

f_{r} (t_{1}, \dots, t_{r}) = \frac{1}{t_{1} + \dots + t_{r}} \sum_{i = 1}^{r} f_{r - 1} (t_{1}, \dots, t_{i - 1}, t_{i + 1}, \dots, t_{r})

for any r>1; but the left-hand side of (68) also obeys this identity, and the claim then follows from induction.

From this lemma and symmetrisation, we may rewrite the left-hand side of (67) as

\sum_{r > 0} \int_{\begin{matrix} t_{1}, \dots, t_{r} \geq 0 \\ t_{1} + \dots + t_{r} = 1 \end{matrix}} | \partial_{(t_{1})} \dots \partial_{(t_{r})} F (0) |^{2} \frac{d t_{1} \dots d t_{r - 1}}{\prod_{i = 1}^{r} (\sum_{j = i}^{r} t_{i})} .

Let

I_{a} (F) : = \int_{0}^{a} F^{'} {(t)}^{2} dt,

and

J_{a} (F) : = {(\partial_{(a)} F (0))}^{2} .

One can then rewrite (67) as the identity

I_{1} (F) = \sum_{r = 1}^{\infty} K_{1, r} (F),

(69)

where

K_{a, r} (F) : = \int_{\begin{matrix} t_{1}, \dots, t_{r} \geq 0 \\ t_{1} + \dots + t_{r} = a \end{matrix}} J_{t_{r}} (\partial_{(t_{1})} \dots \partial_{(t_{r - 1})} F) \frac{d t_{1} \dots d t_{r - 1}}{a (a - t_{1}) \dots (a - t_{1} - \dots - t_{r - 1})} .

To prove this, we first observe the identity

I_{a} (F) = \frac{1}{a} J_{a} (F) + \int_{0 \leq t \leq a} I_{a - t} (\partial_{(t)} F) \frac{dt}{a}

for any a>0; indeed, we have

\begin{align} \int_{0 \leq t \leq a} I_{a - t} (\partial_{(t)} F) \frac{dt}{a} & = \int_{0 \leq t \leq a; 0 \leq u \leq a - t} | F^{'} (t + u) - F^{'} (t) |^{2} \frac{dudt}{a} \\ = \int_{0 \leq t \leq s \leq a} | F^{'} (s) - F^{'} (t) |^{2} \frac{dsdt}{a} \\ = \frac{1}{2} \int_{0}^{a} \int_{0}^{a} | F^{'} (s) - F^{'} (t) |^{2} \frac{dsdt}{a} \\ = \int_{0}^{a} | F^{'} (s) |^{2} ds - \frac{1}{a} (\int_{0}^{a} F^{'} (s) ds) (\int_{0}^{a} F^{'} (t) dt) \\ = I_{a} (F) - \frac{1}{a} J_{a} (F), \end{align}

and the claim follows. Iterating this identity k times, we see that

I_{a} (F) = \sum_{r = 1}^{k} K_{a, r} (F) + L_{a, k} (F)

(70)

for any k≥1, where

L_{a, k} (F) : = \int_{\begin{matrix} t_{1}, \dots, t_{k} \geq 0 \\ t_{1} + \dots + t_{k} \leq a \end{matrix}} I_{1 - t_{1} - \dots - t_{k}} (\partial_{(t_{1})} \dots \partial_{(t_{k})} F) \frac{d t_{1} \dots d t_{k}}{a (a - t_{1}) \dots (a - t_{1} - \dots - t_{k - 1})} .

In particular, dropping the L_a,k(F) term and sending k→∞ yields the lower bound

\sum_{r = 1}^{\infty} K_{a, r} (F) \leq I_{a} (F) .

(71)

On the other hand, we can expand L_a,k(F) as

\int_{\begin{matrix} t_{1}, \dots, t_{k}, t \geq 0 \\ t_{1} + \dots + t_{k} + t \leq a \end{matrix}} | \partial_{(t_{1})} \dots \partial_{(t_{k})} F^{'} (t) |^{2} \frac{d t_{1} \dots d t_{k} dt}{a (a - t_{1}) \dots (a - t_{1} - \dots - t_{k - 1})} .

Writing s:=t₁+⋯+t_k, we obtain the upper bound

L_{a, k} (F) \leq \int_{s, t \geq 0 : s + t \leq a} K_{s, k} (F_{t}^{'}) dt,

where F_t(x):=F(x+t). Summing this and using (71) and the monotone convergence theorem, we conclude that

\sum_{k = 1}^{\infty} L_{a, k} (F) \leq \int_{s, t \geq 0 : s + t \leq a} I_{s} (F_{t}) dt < \infty,

and in particular L_a,k(F)→0 as k→∞. Sending k→∞ in (70), we obtain (69) as desired.

Reduction to a variational problem

Now that we have proven Theorems 19 and 20, we can now establish Theorems 22, 24, 26 and 28. The main technical difficulty is to take the multidimensional measurable functions F appearing in these functions and approximate them by tensor products of smooth functions, for which Theorems 19 and 20 may be applied.

5.1 Proof of Theorem 22

We now prove Theorem 22. Let k,m,𝜗 obey the hypotheses of that theorem, and thus we may find a fixed square-integrable function $F : [0, + \infty)^{k} \to ℝ$ supported on the simplex

R_{k} : = \{(t_{1}, \dots, t_{k}) \in [0, + \infty)^{k} : t_{1} + \dots + t_{k} \leq 1\}

and not identically zero and with

\frac{\sum_{i = 1}^{k} J_{i} (F)}{I (F)} > \frac{2 m}{𝜗} .

(72)

We now perform a number of technical steps to further improve the structure of F. Our arguments here will be somewhat convoluted and are not the most efficient way to prove Theorem 22 (which in any event was already established in [5]), but they will motivate the similar arguments given below to prove the more difficult results in Theorems 24, 26 and 28. In particular, we will use regularisation techniques which are compatible with the vanishing marginal condition (35) that is a key hypothesis in Theorem 28.

We first need to rescale and retreat a little bit from the slanted boundary of the simplex $R_{k}$ . Let δ₁>0 be a sufficiently small fixed quantity, and write $F_{1} : [0, + \infty)^{k} \to ℝ$ to be the rescaled function

F_{1} (t_{1}, \dots, t_{k}) : = F (\frac{t_{1}}{𝜗 / 2 - δ_{1}}, \dots, \frac{t_{k}}{𝜗 / 2 - δ_{1}}) .

Thus, F₁ is a fixed square-integrable measurable function supported on the rescaled simplex

(𝜗 / 2 - δ_{1}) \cdot R_{k} = \{(t_{1}, \dots, t_{k}) \in [0, + \infty)^{k} : t_{1} + \dots + t_{k} \leq 𝜗 / 2 - δ_{1}\} .

From (72), we see that if δ₁ is small enough, then F₁ is not identically zero and

\frac{\sum_{i = 1}^{k} J_{i} (F_{1})}{I (F_{1})} > m.

(73)

Let δ₁ and F₁ be as above. Next, let δ₂>0 be a sufficiently small fixed quantity (smaller than δ₁), and write $F_{2} : [0, + \infty)^{k} \to ℝ$ to be the shifted function, defined by setting

F_{2} (t_{1}, \dots, t_{k}) : = F_{1} (t_{1} - δ_{2}, \dots, t_{k} - δ_{2})

when t₁,…,t_k≥δ₂, and F₂(t₁,…,t_k)=0 otherwise. As F₁ was square-integrable, compactly supported, and not identically zero, and because spatial translation is continuous in the strong operator topology on L², it is easy to see that we will have F₂ not identically zero and that

\frac{\sum_{i = 1}^{k} J_{i} (F_{2})}{I (F_{2})} > m

(74)

for δ₂ small enough (after restricting F₂ back to [ 0,+∞)^k, of course). For δ₂ small enough, this function will be supported on the region

\{(t_{1}, \dots, t_{k}) \in ℝ^{k} : t_{1} \dots + t_{k} \leq 𝜗 / 2 - δ_{2}; t_{1}, \dots, t_{k} \geq δ_{2}\},

and thus F₂ stays away from all the boundary faces of $R_{k}$ .

By convolving F₂ with a smooth approximation to the identity that is supported sufficiently close to the origin, one may then find a smooth function $F_{3} : [0, + \infty)^{k} \to ℝ$ , supported on

\{(t_{1}, \dots, t_{k}) \in ℝ^{k} : t_{1} \dots + t_{k} \leq 𝜗 / 2 - δ_{2} / 2; t_{1}, \dots, t_{k} \geq δ_{2} / 2\},

which is not identically zero and such that

\frac{\sum_{i = 1}^{k} J_{i} (F_{3})}{I (F_{3})} > m.

(75)

We extend F₃ by zero to all of $ℝ^{k}$ and then define the function $f_{3} : ℝ^{k} \to ℝ$ by

f_{3} (t_{1}, \dots, t_{k}) : = \int_{s_{1} \geq t_{1}, \dots, s_{k} \geq t_{k}} F_{3} (s_{1}, \dots, s_{k}) d s_{1} \dots d s_{k},

and thus f₃ is smooth, not identically zero and supported on the region

\{(t_{1}, \dots, t_{k}) \in ℝ^{k} : \sum_{i = 1}^{k} max (t_{i}, δ_{2} / 2) \leq 𝜗 / 2 - δ_{2} / 2\} .

(76)

From the fundamental theorem of calculus, we have

F_{3} (t_{1}, \dots, t_{k}) : = {(- 1)}^{k} \frac{\partial^{k}}{\partial t_{1} \dots \partial t_{k}} f_{3} (t_{1}, \dots, t_{k}),

(77)

and so $I (F_{3}) = Ĩ (f_{3})$ and $J_{i} (F_{3}) = {\tilde{J}}_{i} (f_{3})$ for i=1,…,k, where

Ĩ (f_{3}) : = \int_{[0, + \infty)^{k}} {|\frac{\partial^{k}}{\partial t_{1} \dots \partial t_{k}} f_{3} (t_{1}, \dots, t_{k})|}^{2} d t_{1} \dots d t_{k}

(78)

and

\begin{matrix} {\tilde{J}}_{i} (f_{3}) : = & \int_{[0, + \infty)^{k - 1}} {|\frac{\partial^{k - 1}}{\partial t_{1} \dots \partial t_{i - 1} \partial t_{i + 1} \dots \partial t_{k}} f_{3} (t_{1}, \dots, t_{i - 1}, 0, t_{i + 1}, \dots, t_{k})|}^{2} \\ d t_{1} \dots d t_{i - 1} d t_{i + 1} \dots d t_{k} . \end{matrix}

(79)

In particular,

\frac{\sum_{i = 1}^{k} {\tilde{J}}_{i} (f_{3})}{Ĩ (f_{3})} > m.

(80)

Now we approximate f₃ by linear combinations of tensor products. By the Stone-Weierstrass theorem, we may express f₃ as the uniform limit of functions of the form

(t_{1}, \dots, t_{k}) \mapsto \sum_{j = 1}^{J} c_{j} f_{1, j} (t_{1}) \dots f_{k, j} (t_{k})

(81)

where c₁,…,c_J are real scalars, and $f_{i, j} : ℝ \to ℝ$ are smooth compactly supported functions. Since f₃ is supported in (76), we can ensure that all the components f_1,j(t₁)…f_k,j(t_k) are supported in the slightly larger region

\{(t_{1}, \dots, t_{k}) \in ℝ^{k} : \sum_{i = 1}^{k} max (t_{i}, δ_{2} / 4) \leq 𝜗 / 2 - δ_{2} / 4\} .

Observe that if one convolves a function of the form (81) by a smooth approximation to the identity which is of tensor product form (t₁,…,t_k)↦φ₁(t₁)…φ₁(t_k), one obtains another function of this form. Such a convolution converts a uniformly convergent sequence of functions to a uniformly smoothly convergent sequence of functions (that is to say, all derivatives of the functions converge uniformly). From this, we conclude that f₃ can be expressed as the smooth limit of functions of the form (81), with each component f_1,j(t₁)…f_k,j(t_k) supported in the region

\{(t_{1}, \dots, t_{k}) \in ℝ^{k} : \sum_{i = 1}^{k} max (t_{i}, δ_{2} / 8) \leq 𝜗 / 2 - δ_{2} / 8\} .

Thus, we may find such a linear combination

f_{4} (t_{1}, \dots, t_{k}) = \sum_{j = 1}^{J} c_{j} f_{1, j} (t_{1}) \dots f_{k, j} (t_{k})

(82)

with J, c_j, f_i,j fixed and f₄ not identically zero, with

\frac{\sum_{i = 1}^{k} {\tilde{J}}_{i} (f_{4})}{Ĩ (f_{4})} > m.

(83)

Furthermore, by construction we have

S (f_{1, j}) + \dots + S (f_{k, j}) < \frac{𝜗}{2} \leq \frac{1}{2}

(84)

for all j=1,…,J, where S() was defined in (22).

Now we construct the sieve weight $ν : ℕ \to ℝ$ by the formula

ν (n) : = {(\sum_{j = 1}^{J} c_{j} λ_{f_{1, j}} (n + h_{1}) \dots λ_{f_{k, j}} (n + h_{k}))}^{2},

(85)

where the divisor sums λ_f were defined in (16).

Clearly ν is non-negative. Expanding out the square and using Theorem 20(i) and (84), we see that

\sum_{\begin{matrix} x \leq n \leq 2 x \\ n = b (W) \end{matrix}} ν (n) = (α + o (1)) B^{- k} \frac{x}{log x}

where

α : = \sum_{j = 1}^{J} \sum_{j^{'} = 1}^{J} c_{j} c_{j^{'}} \prod_{i = 1}^{k} \int_{0}^{\infty} f_{i, j}^{'} (t_{i}) f_{i, j^{'}}^{'} (t_{i}) d t_{i}

which factorizes using (82) and (78) as

\begin{align} α & = \int_{[0, + \infty)^{k}} {|\frac{\partial^{k}}{\partial t_{1} \dots \partial t_{k}} f_{4} (t_{1}, \dots, t_{k})|}^{2} d t_{1} \dots d t_{k} \\ = Ĩ (f_{4}) . \end{align}

Now consider the sum

\sum_{\begin{matrix} x \leq n \leq 2 x \\ n = b (W) \end{matrix}} ν (n) θ (n + h_{k}) .

By (20), one has

λ_{f_{k, j}} (n + h_{k}) = f_{k, j} (0)

whenever n gives a non-zero contribution to the above sum. Expanding out the square in (85) again and using Theorem 19(i) and (84) (and the hypothesis EH[ 𝜗]), we thus see that

\sum_{\begin{matrix} x \leq n \leq 2 x \\ n = b (W) \end{matrix}} ν (n) θ (n + h_{k}) = (β_{k} + o (1)) B^{1 - k} \frac{x}{φ (W)}

where

β_{k} : = \sum_{j = 1}^{J} \sum_{j^{'} = 1}^{J} c_{j} c_{j^{'}} f_{i, j} (0) f_{i, j^{'}} (0) \prod_{i = 1}^{k - 1} \int_{0}^{\infty} f_{i, j}^{'} (t_{i}) f_{i, j^{'}}^{'} (t_{i}) d t_{i}

which factorizes using (82) and (79) as

\begin{align} β_{k} & = \int_{[0, + \infty)^{k}} {|\frac{\partial^{k}}{\partial t_{1} \dots \partial t_{k - 1}} f_{4} (t_{1}, \dots, t_{k - 1}, 0)|}^{2} d t_{1} \dots d t_{k - 1} \\ = {\tilde{J}}_{k} (f_{4}) . \end{align}

More generally, we see that

\sum_{\begin{matrix} x \leq n \leq 2 x \\ n = b (W) \end{matrix}} ν (n) θ (n + h_{k}) = (β_{i} + o (1)) B^{1 - k} \frac{x}{φ (W)}

for i=1,…,k, with $β_{i} : = {\tilde{J}}_{i} (f_{4})$ . Applying Lemma 19 and (75), we obtain DHL[ k;m+1] as required.

5.2 Proof of Theorem 24

Now we prove Theorem 24, which uses a very similar argument to that of the previous section. Let k,m,ϖ,δ,F be as in Theorem 24. By performing the same rescaling as in the previous section (but with 1/2+2ϖ playing the role of 𝜗), we see that we can find a fixed square-integrable measurable function F₁ supported on the rescaled truncated simplex

\{(t_{1}, \dots, t_{k}) \in [0, + \infty)^{k} : t_{1} + \dots + t_{k} \leq \frac{1}{4} + ϖ - δ_{1}; t_{1}, \dots, t_{k} < δ - δ_{1}\}

for some sufficiently small fixed δ₁>0, such that (73) holds. By repeating the arguments of the previous section, we may eventually arrive at a smooth function $f_{4} : ℝ^{k} \to ℝ$ of the form (82), which is not identically zero and obeys (83) and such that each component f_1,j(t₁)…f_k,j(t_k) is supported in the region

\{(t_{1}, \dots, t_{k}) \in ℝ^{k} : \sum_{i = 1}^{k} max (t_{i}, δ_{2} / 8) \leq \frac{1}{4} + ϖ - δ_{2} / 8; t_{1}, \dots, t_{k} < δ - δ_{2} / 8\}

for some sufficiently small δ₂>0. In particular, one has

S (f_{1, j}) + \dots + S (f_{k, j}) < \frac{1}{4} + ϖ \leq \frac{1}{2}

and

S (f_{1, j}), \dots, S (f_{k, j}) < δ

for all j=1,…,J. If we then define ν by (85) as before, and repeat all of the above arguments (but use Theorem 19(ii) and MPZ[ ϖ,δ] in place of Theorem 19(i) and EH[ 𝜗]), we obtain the claim; we leave the details to the interested reader.

5.3 Proof of Theorem 26

Now we prove Theorem 26. Let k,m,ε,𝜗 be as in that theorem. Then, one may find a square-integrable function $F : [0, + \infty)^{k} \to ℝ$ supported on $(1 + ε) \cdot R_{k}$ which is not identically zero, and with

\frac{\sum_{i = 1}^{k} J_{i, 1 - ε} (F)}{I (F)} > \frac{2 m}{𝜗} .

By truncating and rescaling as in the ‘Proof of Theorem 22’ section, we may find a fixed bounded measurable function $F_{1} : [0, + \infty)^{k} \to ℝ$ on the simplex $(1 + ε) (\frac{𝜗}{2} - δ_{1}) \cdot R_{k}$ such that

\frac{\sum_{i = 1}^{k} J_{i, (1 - ε) \frac{𝜗}{2}} (F_{1})}{I (F_{1})} > m.

By repeating the arguments in the ‘Proof of Theorem 22’ section, we may eventually arrive at a smooth function $f_{4} : ℝ^{k} \to ℝ$ of the form (82), which is not identically zero and obeys

\frac{\sum_{i = 1}^{k} {\tilde{J}}_{i, (1 - ε) \frac{𝜗}{2}} (f_{4})}{Ĩ (f_{4})} > m

(86)

with

\begin{align} {\tilde{J}}_{i, (1 - ε) \frac{𝜗}{2}} (f_{4}) & : = \int_{(1 - ε) \frac{𝜗}{2} \cdot R_{k - 1}} {|\frac{\partial^{k - 1}}{\partial t_{1} \dots \partial t_{i - 1} \partial t_{i + 1} \dots \partial t_{k}} f_{4} (t_{1}, \dots, t_{i - 1}, 0, t_{i + 1}, \dots, t_{k})|}^{2} \\ d t_{1} \dots d t_{i - 1} d t_{i + 1} \dots d t_{k}, \end{align}

and such that each component f_1,j(t₁)…f_k,j(t_k) is supported in the region

\{(t_{1}, \dots, t_{k}) \in ℝ^{k} : \sum_{i = 1}^{k} max (t_{i}, δ_{2} / 8) \leq (1 + ε) \frac{𝜗}{2} - \frac{δ_{2}}{8}\}

for some sufficiently small δ₂>0. In particular, we have

S (f_{1, j}) + \dots + S (f_{k, j}) \leq (1 + ε) \frac{𝜗}{2} - \frac{δ_{2}}{8}

(87)

for all 1≤j≤J.

Let δ₃>0 be a sufficiently small fixed quantity (smaller than δ₁ or δ₂). By a smooth partitioning, we may assume that all of the f_i,j are supported in intervals of length at most δ₃, while keeping the sum

\sum_{j = 1}^{J} | c_{j} | | f_{1, j} (t_{1}) | \dots | f_{k, j} (t_{k}) |

(88)

bounded uniformly in t₁,…,t_k and in δ₃.

Now let ν be as in (85), and consider the expression

\sum_{\begin{matrix} x \leq n \leq 2 x \\ n = b (W) \end{matrix}} ν (n) .

This expression expands as a linear combination of the expressions

\sum_{\begin{matrix} x \leq n \leq 2 x \\ n = b (W) \end{matrix}} \prod_{i = 1}^{k} λ_{f_{i, j}} (n + h_{i}) λ_{f_{i, j^{'}}} (n + h_{i})

for various 1≤j,j^′≤J. We claim that this sum is equal to

(\prod_{i = 1}^{k} \int_{0}^{1} f_{i, j}^{'} (t_{i}) f_{i, j^{'}}^{'} (t_{i}) d t_{i} + o (1)) B^{- k} \frac{x}{W} .

To see this, we divide into two cases. First, suppose that hypothesis (i) from Theorem 26 holds, then from (87) we have

\sum_{i = 1}^{k} S (f_{i, j}) + S (f_{i, j^{'}}) < (1 + ε) 𝜗 < 1

and the claim follows from Theorem 20(i). Now suppose instead that hypothesis (ii) from Theorem 26 holds, then from (87) one has

\sum_{i = 1}^{k} S (f_{i, j}) + S (f_{i, j^{'}}) < (1 + ε) 𝜗 < \frac{k}{k - 1} 𝜗,

and so from the pigeonhole principle, we have

\sum_{1 \leq i \leq k : i \neq i_{0}} S (f_{i, j}) + S (f_{i, j^{'}}) < 𝜗

for some i₀=1,…,k. The claim now follows from Theorem 20(ii).

Putting these together as in the ‘Proof of Theorem 22’ section, we conclude that

\sum_{\begin{matrix} x \leq n \leq 2 x \\ n = b (W) \end{matrix}} ν (n) = (α + o (1)) B^{- k} \frac{x}{W}

where

α : = Ĩ (f_{4}) .

Now we consider the sum

\sum_{\begin{matrix} x \leq n \leq 2 x \\ n = b (W) \end{matrix}} ν (n) θ (n + h_{k}) .

(89)

From Proposition 13, we see that we have EH[ 𝜗] as a consequence of the hypotheses of Theorem 26. However, this and Theorem 19 are not strong enough to obtain an asymptotic for the sum (89), as there is an epsilon loss in (87). But observe that Lemma 18 only requires a lower bound on the sum (89), rather than an asymptotic.

To obtain this lower bound, we partition {1,…,J} into $J_{1} \cup J_{2}$ , where $J_{1}$ consists of those indices j∈{1,…,J} with

S (f_{1, j}) + \dots + S (f_{k - 1, j}) < (1 - ε) \frac{𝜗}{2}

(90)

and $J_{2}$ is the complement. From the elementary inequality

{(x_{1} + x_{2})}^{2} = x_{1}^{2} + 2 x_{1} x_{2} + x_{2}^{2} \geq (x_{1} + 2 x_{2}) x_{1},

we obtain the pointwise lower bound

\begin{matrix} ν (n) \geq & ((\sum_{j \in J_{1}} + 2 \sum_{j \in J_{2}}) c_{j} λ_{f_{1, j}} (n + h_{1}) \dots λ_{f_{k, j}} (n + h_{k})) \\ \times (\sum_{j^{'} \in J_{1}} c_{j^{'}} λ_{f_{1, j^{'}}} (n + h_{1}) \dots λ_{f_{k, j^{'}}} (n + h_{k})) . \end{matrix}

The point of performing this lower bound is that if $j \in J_{1} \cup J_{2}$ and $j^{'} \in J_{1}$ , then from (87) and (90) one has

\sum_{i = 1}^{k - 1} S (f_{i, j}) + S (f_{i, j^{'}}) < 𝜗

which makes Theorem 19(i) available for use. Indeed, for any j∈{1,…,J} and i=1,…,k, we have from (87) that

S (f_{i, j}) \leq (1 + ε) \frac{𝜗}{2} < 𝜗 < 1,

and so by (20), we have

\begin{matrix} \begin{matrix} ν (n) θ (n + h_{k}) & \geq ((\sum_{j \in J_{1}} + 2 \sum_{j \in J_{2}}) c_{j} λ_{f_{1, j}} (n + h_{1}) \dots λ_{f_{k - 1, j}} (n + h_{k - 1}) f_{k, j} (0)) \\ \times (\sum_{j^{'} \in J_{1}} c_{j^{'}} λ_{f_{1, j^{'}}} (n + h_{1}) \dots λ_{f_{k - 1, j^{'}}} (n + h_{k - 1}) f_{k, j^{'}} (0)) θ (n + h_{k}) \end{matrix} \end{matrix}

(91)

for x≤n≤2x. If we then apply Theorem 19(i) and the hypothesis EH[ 𝜗], we obtain the lower bound

\sum_{\begin{matrix} x \leq n \leq 2 x \\ n = b (W) \end{matrix}} ν (n) θ (n + h_{k}) \geq (β_{k} - o (1)) B^{1 - k} \frac{x}{φ (W)}

with

β_{k} : = (\sum_{j \in J_{1}} + 2 \sum_{j \in J_{2}}) \sum_{j^{'} \in J_{1}} c_{j} c_{j^{'}} f_{k, j} (0) f_{k, j^{'}} (0) \prod_{i = 1}^{k - 1} \int_{0}^{\infty} f_{i, j}^{'} (t_{i}) f_{i, j^{'}}^{'} (t_{i}) d t_{i}

which we can rearrange as

\begin{align} β_{k} & = \int_{[0, + \infty)^{k - 1}} (\frac{\partial^{k - 1}}{\partial t_{1} \dots \partial t_{k - 1}} f_{4, 1} (t_{1}, \dots, t_{k - 1}, 0) + 2 \frac{\partial^{k - 1}}{\partial t_{1} \dots \partial t_{k - 1}} f_{4, 2} (t_{1}, \dots, t_{k - 1}, 0)) \\ \frac{\partial^{k - 1}}{\partial t_{1} \dots \partial t_{k - 1}} f_{4, 1} (t_{1}, \dots, t_{k - 1}, 0) d t_{1} \dots d t_{k - 1} \end{align}

where

f_{4, l} (t_{1}, \dots, t_{k}) : = \sum_{j \in J_{l}} c_{j} f_{1, j} (t_{1}) \dots f_{k, j} (t_{k})

for l=1,2. Note that f_4,1,f_4,2 are both bounded pointwise by (88), and their supports only overlap on a set of measure O(δ₃). We conclude that

β_{k} = {\tilde{J}}_{k} (f_{4, 1}) + O (δ_{3})

with the implied constant independent of δ₃, and thus

β_{k} = {\tilde{J}}_{k, (1 - ε) \frac{𝜗}{2}} (f_{4}) + O (δ_{3}) .

A similar argument gives

\sum_{\begin{matrix} x \leq n \leq 2 x \\ n = b (W) \end{matrix}} ν (n) θ (n + h_{i}) \geq (β_{i} - o (1)) B^{1 - k} \frac{x}{φ (W)}

for i=1,…,k with

β_{i} = {\tilde{J}}_{i, (1 - ε) \frac{𝜗}{2}} (f_{4}) + O (δ_{3}) .

If we choose δ₃ small enough, then the claim DHL[ k;m+1] now follows from Lemma 18 and (86).

5.4 Proof of Theorem 28

Finally, we prove Theorem 28. Let k,m,ε,F be as in that theorem. By rescaling as in previous sections, we may find a square-integrable function $F_{1} : [0, + \infty)^{k} \to ℝ$ supported on $(\frac{k}{k - 1} \frac{𝜗}{2} - δ_{1}) \cdot R_{k}$ for some sufficiently small fixed δ₁>0, which is not identically zero, which obeys the bound

\frac{\sum_{i = 1}^{k} J_{i, (1 - ε) \frac{𝜗}{2}} (F_{1})}{I (F_{1})} > m

and also obeys the vanishing marginal condition (35) whenever t₁,…,t_i−1,t_i+1,…,t_k≥0 are such that

t_{1} + \dots + t_{i - 1} + t_{i + 1} + \dots + t_{k} > (1 + ε) \frac{𝜗}{2} - δ_{1} .

As before, we pass from F₁ to F₂ by a spatial translation, and from F₂ to F₃ by a regularisation; crucially, we note that both of these operations interact well with the vanishing marginal condition (35), with the end product being that we obtain a smooth function $F_{3} : [0, + \infty)^{k} \to ℝ$ , supported on the region

\{(t_{1}, \dots, t_{k}) \in ℝ^{k} : t_{1} \dots + t_{k} \leq \frac{k}{k - 1} \frac{𝜗}{2} - \frac{δ_{2}}{2}; t_{1}, \dots, t_{k} \geq \frac{δ_{2}}{2}\}

for some sufficiently small δ₂>0, which is not identically zero, obeying the bound

\frac{\sum_{i = 1}^{k} J_{i, (1 - ε) \frac{𝜗}{2}} (F_{3})}{I (F_{3})} > m

and also obeying the vanishing marginal condition (35) whenever t₁,…,t_i−1,t_i+1,…,t_k≥0 are such that

t_{1} + \dots + t_{i - 1} + t_{i + 1} + \dots + t_{k} > (1 + ε) \frac{𝜗}{2} - \frac{δ_{2}}{2} .

As before, we now define the function $f_{3} : ℝ^{k} \to ℝ$ by

f_{3} (t_{1}, \dots, t_{k}) : = \int_{s_{1} \geq t_{1}, \dots, s_{k} \geq t_{k}} F_{3} (s_{1}, \dots, s_{k}) d s_{1} \dots d s_{k},

and thus, f₃ is smooth, not identically zero and supported on the region

\{(t_{1}, \dots, t_{k}) \in ℝ^{k} : \sum_{i = 1}^{k} max (t_{i}, δ_{2} / 2) \leq \frac{k}{k - 1} \frac{𝜗}{2} - \frac{δ_{2}}{2}\} .

Furthermore, from the vanishing marginal condition, we see that we also have

f_{3} (t_{1}, \dots, t_{k}) = 0

whenever we have some 1≤i≤k for which t_i≤δ₂/2 and

t_{1} + \dots + t_{i - 1} + t_{i + 1} + \dots + t_{k} \geq (1 + ε) \frac{𝜗}{2} - \frac{δ_{2}}{2} .

From the fundamental theorem of calculus as before, we have

\frac{\sum_{i = 1}^{k} {\tilde{J}}_{i, (1 - ε) \frac{𝜗}{2}} (f_{3})}{Ĩ (f_{3})} > m.

Using the Stone-Weierstrass theorem as before, we can then find a function f₄ of the form

(t_{1}, \dots, t_{k}) \mapsto \sum_{j = 1}^{J} c_{j} f_{1, j} (t_{1}) \dots f_{k, j} (t_{k})

(92)

where c₁,…,c_J are real scalars, and $f_{i, j} : ℝ \to ℝ$ are smooth functions supported of intervals of length at most δ₃>0 for some sufficiently small δ₃>0, with the support of each component f_1,j(t₁)…f_k,j(t_k) supported in the region

\{(t_{1}, \dots, t_{k}) \in ℝ^{k} : \sum_{i = 1}^{k} max (t_{i}, δ_{2} / 8) \leq \frac{k}{k - 1} \frac{𝜗}{2} - δ_{2} / 8\}

and avoiding the regions

\{(t_{1}, \dots, t_{k}) \in ℝ^{k} : t_{i} \leq δ_{2} / 8; t_{1} + \dots + t_{i - 1} + t_{i + 1} + \dots + t_{k} \geq (1 + ε) \frac{𝜗}{2} - δ_{2} / 8\}

for each i=1,…,k, and such that

\frac{\sum_{i = 1}^{k} {\tilde{J}}_{i, (1 - ε) \frac{𝜗}{2}} (f_{4})}{Ĩ (f_{4})} > m.

In particular, for any j=1,…,J we have

S (f_{1, j}) + \dots + S (f_{k, j}) < \frac{k}{k - 1} \frac{𝜗}{2} < \frac{1}{2} \frac{k}{k - 1} \leq 1

(93)

and for any i=1,…,k with f_k,i not vanishing at zero, we have

S (f_{1, j}) + \dots + S (f_{k, i - 1}) + S (f_{k, i + 1}) + \dots + S (f_{k, j}) < (1 + ε) \frac{𝜗}{2} .

(94)

Let ν be defined by (85). From (93), the hypothesis GEH[ 𝜗], and the argument from the previous section used to prove Theorem 26(ii), we have

\sum_{\begin{matrix} x \leq n \leq 2 x \\ n = b (W) \end{matrix}} ν (n) = (α + o (1)) B^{- k} \frac{x}{W}

where

α : = Ĩ (f_{4}) .

Similarly, from (94) (and the upper bound S(f_i,j)<1 from (93)), the hypothesis EH[ 𝜗] (which is available by Proposition 13), and the argument from the previous section, we have

\sum_{\begin{matrix} x \leq n \leq 2 x \\ n = b (W) \end{matrix}} ν (n) θ (n + h_{i}) \geq (β_{i} - o (1)) B^{1 - k} \frac{x}{φ (W)}

for i=1,…,k with

β_{i} = {\tilde{J}}_{i, (1 - ε) \frac{𝜗}{2}} (f_{4}) + O (δ_{3}) .

Setting δ₃ small enough, the claim DHL[ k;m+1] now follows from Lemma 18.

Asymptotic analysis

We now establish upper and lower bounds on the quantity M_k defined in (33), as well as for the related quantities appearing in Theorem 24.

To obtain an upper bound on M_k, we use the following consequence of the Cauchy-Schwarz inequality.

Lemma 34(Cauchy-Schwarz).

Let k≥2, and suppose that there exist positive measurable functions $G_{i} : R_{k} \to (0, + \infty)$ for i=1,…,k such that

\int_{0}^{\infty} G_{i} (t_{1}, \dots, t_{k}) d t_{i} \leq 1

(95)

for all t₁,…,t_i−1,t_i+1,…,t_k≥0, where we extend G_i by zero to all of [ 0,+∞)^k. Then, we have

M_{k} \leq ess sup_{(t_{1}, \dots, t_{k}) \in R_{k}} \sum_{i = 1}^{k} \frac{1}{G_{i} (t_{1}, \dots, t_{k})} .

(96)

Here ess sup refers to essential supremum (thus, we may ignore a subset of $R_{k}$ of measure zero in the supremum).

Proof.

Let $F : [0, + \infty)^{k} \to ℝ$ be a square-integrable function supported on $R_{k}$ . From the Cauchy-Schwarz inequality and (95), we have

{(\int_{0}^{\infty} F (t_{1}, \dots, t_{k}) d t_{i})}^{2} \leq \int_{0}^{\infty} \frac{F {(t_{1}, \dots, t_{k})}^{2}}{G_{i} (t_{1}, \dots, t_{k})} d t_{i}

for any t₁,…,t_i−1,t_i+1,…,t_k≥0. Inserting this into (32) and integrating, we conclude that

J_{i} (F) \leq \int_{R_{k}} \frac{F {(t_{1}, \dots, t_{k})}^{2}}{G_{i} (t_{1}, \dots, t_{k})} d t_{1} \dots d t_{k} .

Summing in i and using (31), (33), and (96), we obtain the claim.

As a corollary, we can compute M_k exactly if we can locate a positive eigenfunction:

Corollary 35.

Let k≥2, and suppose that there exists a positive function $F : R_{k} \to (0, + \infty)$ obeying the eigenfunction equation

λF (t_{1}, \dots, t_{k}) = \sum_{i = 1}^{k} \int_{0}^{\infty} F (t_{1}, \dots, t_{i - 1}, t_{i}^{'}, t_{i + 1}, \dots, t_{k}) d t_{i}^{'}

(97)

for some λ>0 and all $(t_{1}, \dots, t_{k}) \in R_{k}$ , where we extend F by zero to all of [ 0,+∞)^k. Then, λ=M_k.

Proof.

On the one hand, if we integrate (97) against F and use (31) and (32), we see that

λI (F) = \sum_{i = 1}^{k} J_{i} (F),

and thus by (33), we see that M_k≥λ. On the other hand, if we apply Lemma 34 with

G_{i} (t_{1}, \dots, t_{k}) : = \frac{F (t_{1}, \dots, t_{k})}{\int_{0}^{\infty} F (t_{1}, \dots, t_{i - 1}, t_{i}^{'}, t_{i + 1}, \dots, t_{k}) d t_{i}^{'}},

we see that M_k≤λ, and the claim follows.

This allows for an exact calculation of M₂:

Corollary 36(Computation of M₂).

We have

M_{2} = \frac{1}{1 - W (1 / e)} = 1.38593 \dots

where the Lambert W-function W(x) is defined for positive x as the unique positive solution to x=W(x)e^W(x).

Proof.

If we set $λ : = \frac{1}{1 - W (1 / e)} = 1.38593 \dots$ , then a brief calculation shows that

2 λ - 1 = λ log λ - λ log (λ - 1) .

(98)

Now if we define the function f: [ 0,1]→[ 0,+∞) by the formula

f (x) : = \frac{1}{λ - 1 + x} + \frac{1}{2 λ - 1} log \frac{λ - x}{λ - 1 + x},

then a further brief calculation shows that

\int_{0}^{1 - x} f (y) dy = \frac{λ - 1 + x}{2 λ - 1} log \frac{λ - x}{λ - 1 + x} + \frac{λ log λ - λ log (λ - 1)}{2 λ - 1}

for any 0≤x≤1, and hence by (98) that

\int_{0}^{1 - x} f (y) dy = (λ - 1 + x) f (x) .

If we then define the function $F : R_{2} \to (0, + \infty)$ by F(x,y):=f(x)+f(y), we conclude that

\int_{0}^{1 - x} F (x^{'}, y) d x^{'} + \int_{0}^{1 - y} F (x, y^{'}) d y^{'} = λF (x, y)

for all $(x, y) \in R_{2}$ , and the claim now follows from Corollary 35.

We conjecture that a positive eigenfunction for M_k exists for all k≥2, not just for k=2; however, we were unable to produce any such eigenfunctions for k>2. Nevertheless, Lemma 34 still gives us a general upper bound:

Corollary 37.

We have $M_{k} \leq \frac{k}{k - 1} log k$ for any k≥2.

Thus, for instance, one has M₂≤2 log2=1.38629…, which compares well with Corollary 36. On the other hand, Corollary 37 also gives

M_{4} \leq \frac{4}{3} log 4 = 1.8454 \dots,

so that one cannot hope to establish DHL[ 4;2] (or DHL[ 3;2]) solely through Theorem 22 even when assuming GEH, and must rely instead on more sophisticated criteria for DHL[ k;m] such as Theorem 26 or Theorem 28.

Proof.

If we set $G_{i} : R_{k} \to (0, + \infty)$ for i=1,…,k to be the functions

G_{i} (t_{1}, \dots, t_{k}) : = \frac{k - 1}{log k} \frac{1}{1 - t_{1} - \dots - t_{k} + k t_{i}}

then direct calculation shows that

\int_{0}^{\infty} G_{i} (t_{1}, \dots, t_{k}) d t_{i} \leq 1

for all t₁,…,t_i−1,t_i+1,…,t_k≥0, where we extend G_i by zero to all of [ 0,+∞)^k. On the other hand, we have

\sum_{i = 1}^{k} \frac{1}{G_{i} (t_{1}, \dots, t_{k})} = \frac{k}{k - 1} log k

for all $(t_{1}, \dots, t_{k}) \in R_{k}$ . The claim now follows from Lemma 34.

The upper bound arguments for M_k can be extended to other quantities such as M_k,ε, although the bounds do not appear to be as sharp in that case. For instance, we have the following variant of Lemma 37, which shows that the improvement in constants when moving from M_k to M_k,ε is asymptotically modest:

Proposition 38.

For any k≥2 and ε≥0, we have

M_{k, ε} \leq \frac{k}{k - 1} log (2 k - 1) .

Proof.

Let $F : [0, + \infty)^{k} \to ℝ$ be a square-integrable function supported on $(1 + ε) \cdot R_{k}$ . If i=1,…,k and $(t_{1}, \dots, t_{i - 1}, t_{i + 1}, \dots, t_{k}) \in (1 - ε) \cdot R_{k}$ , then if we write s:=1−t₁−⋯−t_i−1−t_i+1−⋯−t_k, we have s≥ε and hence

\begin{align} \int_{0}^{1 - t_{1} - \dots - t_{i - 1} - t_{i + 1} - \dots - t_{k} + ε} \frac{1}{1 - t_{1} - \dots - t_{k} + k t_{i}} d t_{i} & = \int_{0}^{s + ε} \frac{1}{s + (k - 1) t_{i}} d t_{i} \\ = \frac{1}{k - 1} log \frac{ks + (k - 1) ε}{s} \\ \leq \frac{1}{k - 1} log (2 k - 1) . \end{align}

By Cauchy-Schwarz, we conclude that

{(\int_{0}^{\infty} F (t_{1}, \dots, t_{k}) d t_{i})}^{2} \leq \frac{1}{k - 1} log (2 k - 1) \int_{0}^{\infty} (1 - t_{1} - \dots - t_{k} + k t_{i}) F {(t_{1}, \dots, t_{k})}^{2} d t_{i} .

Integrating in t₁,…,t_i−1,t_i+1,…,t_k and summing in i, we obtain the claim.

Remark 39.

The same argument, using the weight 1+a(−t₁−⋯−t_k+k t_i), gives the more general inequality

M_{k, ε} \leq \frac{k}{a (k - 1)} log (k + \frac{(a (1 + ε) - 1) (k - 1)}{1 - a (1 - ε)})

whenever $\frac{1}{1 + ε} < a < \frac{1}{1 - ε}$ ; the case a=1 is Proposition 38, and the limiting case $a = \frac{1}{1 + ε}$ recovers Lemma 37 when one sends ε to zero.

One can also adapt the computations in Corollary 36 to obtain exact expressions for M_2,ε, although the calculations are rather lengthy and will only be summarized here. For fixed 0<ε<1, the eigenfunctions F one seeks should take the form

F (x, y) : = f (x) + f (y)

for x,y≥0 and x+y≤1+ε, where

f (x) : = 1_{x \leq 1 - ε} \int_{0}^{1 + ε - x} F (x, t) dt.

In the regime 0<ε<1/3, one can calculate that f will (up to scalar multiples) take the form

\begin{align} f (x) & = 1_{x \leq 2 ε} \frac{C_{1}}{λ - 1 - ε + x} \\ + 1_{2 ε \leq x \leq 1 - ε} (\frac{log (λ - x) - log (λ - 1 - ε + x)}{2 λ - 1 - ε} + \frac{1}{λ - 1 - ε + x}) \end{align}

where

C_{1} : = \frac{log (λ - 2 ε) - log (λ - 1 + ε)}{1 - log (λ - 1 + ε) + log (λ - 1 - ε)}

and λ is the largest root of the equation

\begin{align} 1 & = C_{1} (log (λ - 1 + ε) - log (λ - 1 - ε)) - log (λ - 1 + ε) \\ + \frac{(λ - 1 + ε) log (λ - 1 + ε) - (λ - 2 ε) log (λ - 2 ε)}{2 λ - 1 - ε} . \end{align}

In the regime 1/3≤ε<1, the situation is significantly simpler, and one has the exact expressions

f (x) = \frac{1_{x \leq 1 - ε}}{λ - 1 - ε + x}

and

λ = \frac{e (1 + ε) - 2 ε}{e - 1} .

In both cases, a variant of Corollary 35 can be used to show that M_2,ε will be equal to λ; thus, for instance,

M_{2, ε} = \frac{e (1 + ε) - 2 ε}{e - 1}

for 1/3≤ε<1. In particular, M_2,ε increases to 2 in the limit ε→1; the lower bound ${liminf}_{ε \to 1} M_{2, ε} \geq 2$ can also be established by testing with the function F(x,y):=1_{x≤δ,y≤1+ε−δ}+1_{y≤δ,x≤1+ε−δ} for some sufficiently small δ>0.

Now we turn to lower bounds on M_k, which are of more relevance for the purpose of establishing results such as Theorem 23. If one restricts attention to those functions $F : R_{k} \to ℝ$ of the special form F(t₁,…,t_k)=f(t₁+⋯+t_k) for some function $f : [0, 1] \to ℝ$ , then the resulting variational problem has been optimized in previous works [39] (and originally in an unpublished work of Conrey), giving rise to the lower bound

M_{k} \geq \frac{4 k (k - 1)}{j_{k - 2}^{2}}

where j_k−2 is the first positive zero of the Bessel function J_k−2. This lower bound is reasonably strong for small k; for instance, when k=2 it shows that

M_{2} \geq 1.383 \dots

which compares well with Corollary 36, and also shows that M₆>2, recovering the result of Goldston, Pintz, and Yıldırım that DHL[ 6;2] (and hence H₁≤16) was true on the Elliott-Halberstam conjecture. However, one can show that $\frac{4 k (k - 1)}{j_{k - 2}^{2}} < 4$ for all k (see [36]), so this lower bound cannot be used to force M_k to be larger than 4.

In [5], the lower bound

M_{k} \geq log k - 2 log log k - 2

(99)

was established for all sufficiently large k. In fact, the arguments in [5] can be used to show this bound for all k≥200 (for k<200, the right-hand side of (99) is either negative or undefined). Indeed, if we use the bound ([5], (7.19)) with A chosen so that A²e^A=k, then 3<A< logk when k≥200, hence e^A=k/A²>k/ log2k and so A≥ logk−2 log logk. By using the bounds $\frac{A}{e^{A} - 1} < \frac{1}{6}$ (since A>3) and e^A/k=1/A²<1/9, we see that the right-hand side of ([5], (8.17)) exceeds $A - \frac{1}{{(1 - 1 / 6 - 1 / 9)}^{2}} \geq A - 2$ , which gives (99).

We will remove the log logk term in (99) via the following explicit estimate.

Theorem 40.

Let k≥2, and let c,T,τ>0 be parameters. Define the function $g : [0, T] \to ℝ$ by

g (t) : = \frac{1}{c + (k - 1) t}

(100)

and the quantities

\begin{align} m_{2} & : = \int_{0}^{T} g {(t)}^{2} dt \end{align}

(101)

\begin{align} μ & : = \frac{1}{m_{2}} \int_{0}^{T} tg {(t)}^{2} dt \end{align}

(102)

\begin{align} σ^{2} & : = \frac{1}{m_{2}} \int_{0}^{T} t^{2} g {(t)}^{2} dt - μ^{2} . \end{align}

(103)

Assume the inequalities

\begin{align} kμ & \leq 1 - τ \end{align}

(104)

\begin{align} kμ & < 1 - T \end{align}

(105)

\begin{align} k σ^{2} & < {(1 + τ - kμ)}^{2} . \end{align}

(106)

Then, one has

\frac{k}{k - 1} log k - M_{k}^{[T]} \leq \frac{k}{k - 1} \frac{Z + Z_{3} + WX + VU}{(1 + τ / 2) (1 - \frac{k σ^{2}}{{(1 + τ - kμ)}^{2}})}

(107)

where Z,Z₃,W,X,V,U are the explicitly computable quantities

\begin{align} Z & : = \frac{1}{τ} \int_{1}^{1 + τ} (r (log \frac{r - kμ}{T} + \frac{k σ^{2}}{4 {(r - kμ)}^{2} log \frac{r - kμ}{T}}) + \frac{r^{2}}{4 kT}) dr \end{align}

(108)

\begin{align} Z_{3} & : = \frac{1}{m_{2}} \int_{0}^{T} kt log (1 + \frac{t}{T}) g {(t)}^{2} dt \end{align}

(109)

\begin{align} W & : = \frac{1}{m_{2}} \int_{0}^{T} log (1 + \frac{τ}{kt}) g {(t)}^{2} dt \end{align}

(110)

\begin{align} X & : = \frac{log k}{τ} c^{2} \end{align}

(111)

\begin{align} V & : = \frac{c}{m_{2}} \int_{0}^{T} \frac{1}{2 c + (k - 1) t} g {(t)}^{2} dt \end{align}

(112)

\begin{align} U & : = \frac{log k}{c} \int_{0}^{1} ({(1 + uτ - (k - 1) μ - c)}^{2} + (k - 1) σ^{2}) du. \end{align}

(113)

Of course, since $M_{k}^{[T]} \leq M_{k}$ , the bound (107) also holds with $M_{k}^{[T]}$ replaced by M_k.

Proof.

From (33), we have

\sum_{i = 1}^{k} J_{i} (F) \leq M_{k}^{[T]} I (F)

whenever $F : [0, + \infty)^{k} \to ℝ$ is square-integrable and supported on ${[0, T]}^{k} \cap R_{k}$ . By rescaling, we conclude that

\sum_{i = 1}^{k} J_{i} (F) \leq r M_{k}^{[T]} I (F)

whenever r>0 and $F : [0, + \infty)^{k} \to ℝ$ is square-integrable and supported on ${[0, rT]}^{k} \cap r \cdot R_{k}$ . We apply this inequality with the function

F (t_{1}, \dots, t_{k}) : = 1_{t_{1} + \dots + t_{k} \leq r} g (t_{1}) \dots g (t_{k})

where r>1 is a parameter which we will eventually average over, and g is extended by zero to [ 0,+∞). We thus have

I (F) = m_{2}^{k} \int_{0}^{\infty} \dots \int_{0}^{\infty} 1_{t_{1} + \dots + t_{k} \leq r} \prod_{i = 1}^{k} \frac{g {(t_{i})}^{2} d t_{i}}{m_{2}} .

We can interpret this probabilistically as

I (F) = m_{2}^{k} ℙ (X_{1} + \dots + X_{k} \leq r)

where X₁,…,X_k are independent random variables taking values in [ 0,T] with probability distribution $\frac{1}{m_{2}} g {(t)}^{2} dt$ . In a similar fashion, we have

J_{k} (F) = m_{2}^{k - 1} \int_{0}^{\infty} \dots \int_{0}^{\infty} {(\int_{[0, r - t_{1} - \dots - t_{k - 1}]} g (t) dt)}^{2} \prod_{i = 1}^{k - 1} \frac{g {(t_{i})}^{2} d t_{i}}{m_{2}},

where we adopt the convention that $\int_{[a, b]}$ vanishes when b<a. In probabilistic language, we thus have

J_{k} (F) = m_{2}^{k - 1} 𝔼 {(\int_{[0, r - X_{1} - \dots - X_{k - 1}]} g (t) dt)}^{2} .

Also by symmetry, we see that J_i(F)=J_k(F) for all i=1,…,k. Putting all these together, we conclude that

𝔼 {(\int_{0}^{r - X_{1} - \dots - X_{k - 1}} g (t) dt)}^{2} \leq \frac{m_{2} M_{k}^{[T]} r}{k} ℙ (X_{1} + \dots + X_{k} \geq r)

for all r>1. Writing S_i:=X₁+⋯+X_i, we abbreviate this as

𝔼 {(\int_{[0, r - S_{k - 1}]} g (t) dt)}^{2} \leq \frac{m_{2} M_{k}^{[T]} r}{k} ℙ (S_{k} \geq r) .

(114)

Now we run a variant of the Cauchy-Schwarz argument used to prove Corollary 37. If, for fixed r>0, we introduce the random function $h : (0, + \infty) \to ℝ$ by the formula

h (t) : = \frac{1}{r - S_{k - 1} + (k - 1) t} 1_{S_{k - 1} < r}

(115)

and observe that whenever S_k−1<r, we have

\int_{[0, r - S_{k - 1}]} h (t) dt = \frac{log k}{k - 1}

(116)

and thus by the Legendre identity, we have

\begin{matrix} {(\int_{[0, r - S_{k - 1}]} g (t) dt)}^{2} = & \frac{log k}{k - 1} \int_{[0, r - S_{k - 1}]} \frac{g {(t)}^{2}}{h (t)} dt - \frac{1}{2} \int_{[0, r - S_{k - 1}]} \int_{[0, r - S_{k - 1}]} \\ \frac{{(g (s) h (t) - g (t) h (s))}^{2}}{h (s) h (t)} dsdt \end{matrix}

for S_k−1<r; but the claim also holds when r≤S_k−1 since all integrals vanish in that case. On the other hand, we have

\begin{align} 𝔼 \int_{[0, r - S_{k - 1}]} \frac{g {(t)}^{2}}{h (t)} dt & = m_{2} 𝔼 (r - S_{k - 1} + (k - 1) X_{k}) 1_{X_{k} \leq r - S_{k - 1}} \\ = m_{2} 𝔼 (r - S_{k} + k X_{k}) 1_{S_{k} \leq r} \\ = m_{2} 𝔼r 1_{S_{k} \leq r} \\ = m_{2} rℙ (S_{k} \leq r) \end{align}

where we have used symmetry to get the third equality. We conclude that

\begin{matrix} 𝔼 {(\int_{[0, r - S_{k - 1}]} g (t) dt)}^{2} = & \frac{log k}{k - 1} m_{2} rℙ (S_{k} \leq r) - \frac{1}{2} 𝔼 \int_{[0, r - S_{k - 1}]} \int_{[0, r - S_{k - 1}]} \\ \frac{{(g (s) h (t) - g (t) h (s))}^{2}}{h (s) h (t)} dsdt. \end{matrix}

Combining this with (114), we conclude that

Δrℙ (S_{k} \leq r) \leq \frac{k}{2 m_{2}} 𝔼 \int_{[0, r - S_{k - 1}]} \int_{[0, r - S_{k - 1}]} \frac{{(g (s) h (t) - g (t) h (s))}^{2}}{h (s) h (t)} dsdt

where

Δ : = \frac{k}{k - 1} log k - M_{k}^{[T]} .

Splitting into regions where s,t are less than T or greater than T, and noting that g(s) vanishes for s>T, we conclude that

Δrℙ (S_{k} \leq r) \leq Y_{1} (r) + Y_{2} (r)

where

Y_{1} (r) : = \frac{k}{m_{2}} 𝔼 \int_{[0, T]} \int_{[T, r - S_{k - 1}]} \frac{g {(t)}^{2}}{h (t)} h (s) dsdt

and

Y_{2} (r) : = \frac{k}{2 m_{2}} 𝔼 \int_{[0, min (T, r - S_{k - 1})]} \int_{[0, min (T, r - S_{k - 1})]} \frac{{(g (s) h (t) - g (t) h (s))}^{2}}{h (s) h (t)} dsdt.

We average this from r=1 to r=1+τ, to conclude that

Δ (\frac{1}{τ} \int_{1}^{1 + τ} rℙ (S_{k} \leq r) dr) \leq \frac{1}{τ} \int_{1}^{1 + τ} Y_{1} (r) dr + \frac{1}{τ} \int_{1}^{1 + τ} Y_{2} (r) dr.

Thus, to prove (107), it suffices (by (106)) to establish the bounds

\frac{1}{τ} \int_{1}^{1 + τ} rℙ (S_{k} \leq r) dr \geq (1 + τ / 2) (1 - \frac{k σ^{2}}{{(1 + τ - kμ)}^{2}}),

(117)

\frac{k}{k - 1} Y_{1} (r) \leq Z + Z_{3}

(118)

for all 1<r≤1+τ, and

\frac{1}{τ} \int_{1}^{1 + τ} Y_{2} (r) dr \leq \frac{k}{k - 1} (WX + VU) .

(119)

We begin with (117). Since

\frac{1}{τ} \int_{1}^{1 + τ} r dr = 1 + \frac{τ}{2},

it suffices to show that

ℙ (S_{k} > 1 + τ) \leq \frac{k σ^{2}}{{(1 + τ - kμ)}^{2}} .

But from (102) and (103), we see that each X_i has mean μ and variance σ², so S_k has mean k μ and variance k σ². The claim now follows from Chebyshev’s inequality and (104).

Now we show (118). The quantity Y₁(r) is vanishing unless r−S_k−1≥T. Using the crude bound $h (s) \leq \frac{1}{(k - 1) s}$ from (115), we see that

\int_{[T, r - S_{k - 1}]} h (s) ds \leq \frac{1}{k - 1} \underset{+}{log} \frac{r - S_{k - 1}}{T}

where log+(x):= max(logx,0). We conclude that

Y_{1} (r) \leq \frac{k}{k - 1} \frac{1}{m_{2}} 𝔼 \int_{[0, T]} \frac{g {(t)}^{2}}{h (t)} dt \underset{+}{log} \frac{r - S_{k - 1}}{T} .

We can rewrite this as

Y_{1} (r) \leq \frac{k}{k - 1} 𝔼 \frac{1_{S_{k} \leq r}}{h (X_{k})} \underset{+}{log} \frac{r - S_{k - 1}}{T} .

By (115), we have

\frac{1_{S_{k} \leq r}}{h (X_{k})} = (r - S_{k} + k X_{k}) 1_{S_{k} \leq r} .

Also, from the elementary bound log+(x+y)≤ log+x+ log(1+y) for any x,y≥0, we see that

\underset{+}{log} \frac{r - S_{k - 1}}{T} \leq \underset{+}{log} \frac{r - S_{k}}{T} + log (1 + \frac{X_{k}}{T}) .

We conclude that

\begin{align} Y_{1} (r) & \leq \frac{k}{k - 1} 𝔼 (r - S_{k} + k X_{k}) (\underset{+}{log} \frac{r - S_{k}}{T} + log (1 + \frac{X_{k}}{T})) 1_{S_{k} \leq r} \\ \leq \frac{k}{k - 1} (𝔼 (r - S_{k} + k X_{k}) \underset{+}{log} \frac{r - S_{k}}{T} + max (r - S_{k}, 0) \frac{X_{k}}{T} + k X_{k} log (1 + \frac{X_{k}}{T})) \end{align}

using the elementary bound log(1+y)≤y. Symmetrizing in the X₁,…,X_k, we conclude that

Y_{1} (r) \leq \frac{k}{k - 1} (Z_{1} (r) + Z_{2} (r) + Z_{3})

(120)

where

\begin{align} Z_{1} (r) & : = 𝔼r \underset{+}{log} \frac{r - S_{k}}{T} \\ Z_{2} (r) & : = 𝔼 (r - S_{k}) 1_{S_{k} \leq r} \frac{S_{k}}{kT} \end{align}

and Z₃ was defined in (109).

For the minor error term Z₂, we use the crude bound $(r - S_{k}) 1_{S_{k} \leq r} S_{k} \leq \frac{r^{2}}{4}$ , so

Z_{2} (r) \leq \frac{r^{2}}{4 kT} .

(121)

For Z₁, we upper bound log+x by a quadratic expression in x. More precisely, we observe the inequality

\underset{+}{log} x \leq \frac{{(x - 2 a log a - a)}^{2}}{4 a^{2} log a}

for any a>1 and $x \in ℝ$ , since the left-hand side is concave in x for x≥1, while the right-hand side is convex in x, non-negative, and tangent to the left-hand side at x=a. We conclude that

\underset{+}{log} \frac{r - S_{k}}{T} \leq \frac{{(r - S_{k} - 2 aT log a - aT)}^{2}}{4 a^{2} T^{2} log a} .

On the other hand, from (102) and (103), we see that each X_i has mean μ and variance σ², so S_k has mean k μ and variance k σ². We conclude that

Z_{1} (r) \leq r \frac{{(r - kμ - 2 aT log a - aT)}^{2} + k σ^{2}}{4 a^{2} T^{2} log a}

for any a>1.

From (105) and the assumption r>1, we may choose $a : = \frac{r - kμ}{T}$ here, leading to the simplified formula

Z_{1} (r) \leq r (log \frac{r - kμ}{T} + \frac{k σ^{2}}{4 {(r - kμ)}^{2} log \frac{r - kμ}{T}}) .

(122)

From (120), (121), (122), and (108) we conclude (118).

Finally, we prove (119). Here, we finally use the specific form (100) of the function g. Indeed, from (100) and (115), we observe the identity

g (t) - h (t) = (r - S_{k - 1} - c) g (t) h (t)

for t∈[0, min(r−S_k−1,T)]. Thus,

\begin{align} Y_{2} (r) & = \frac{k}{2 m_{2}} 𝔼 \int_{[0, min (r - S_{k - 1}, T)]} \int_{[0, min (r - S_{k - 1}, T)]} \frac{{((g - h) (s) h (t) - (g - h) (t) h (s))}^{2}}{h (s) h (t)} dsdt \\ = \frac{k}{2 m_{2}} 𝔼 {(r - S_{k - 1} - c)}^{2} \int_{[0, min (r - S_{k - 1}, T)]} \int_{[0, min (r - S_{k - 1}, T)]} {(g (s) - g (t))}^{2} h (s) h (t) dsdt. \end{align}

Using the crude bound (g(s)−g(t))²≤g(s)²+g(t)² and using symmetry, we conclude

Y_{2} (r) \leq \frac{k}{m_{2}} 𝔼 {(r - S_{k - 1} - c)}^{2} \int_{[0, min (r - S_{k - 1}, T)]} \int_{[0, min (r - S_{k - 1}, T)]} g {(s)}^{2} h (s) h (t) dsdt.

From (116) and (115), we conclude that

Y_{2} (r) \leq \frac{k}{k - 1} Z_{4} (r)

where

Z_{4} (r) : = \frac{log k}{m_{2}} 𝔼 ({(r - S_{k - 1} - c)}^{2} \int_{[0, min (r - S_{k - 1}, T)]} \frac{g {(s)}^{2}}{r - S_{k - 1} + (k - 1) s} ds) .

To prove (119), it thus suffices (after making the change of variables r=1+u τ) to show that

\int_{0}^{1} Z_{4} (1 + uτ) du \leq WX + VU.

(123)

We will exploit the averaging in u to deal with the singular nature of the factor $\frac{1}{r - S_{k - 1} + (k - 1) s}$ . By Fubini’s theorem, the left-hand side of (123) may be written as

\frac{log k}{m_{2}} 𝔼 \int_{0}^{1} Q (u) du

where Q(u) is the random variable

Q (u) : = {(1 + uτ - S_{k - 1} - c)}^{2} \int_{[0, min (1 + uτ - S_{k - 1}, T)]} \frac{g {(s)}^{2}}{1 + uτ - S_{k - 1} + (k - 1) s} ds.

Note that Q(u) vanishes unless 1+u τ−S_k−1>0. Consider first the contribution of those Q(u) for which

0 < 1 + uτ - S_{k - 1} \leq 2 c.

In this regime, we may bound

{(1 + uτ - S_{k - 1} - c)}^{2} \leq c^{2},

so this contribution to (123) may be bounded by

\frac{log k}{m_{2}} c^{2} 𝔼 \int_{[0, T]} g {(s)}^{2} (\int_{0}^{1} \frac{1_{1 + uτ - S_{k - 1} \geq s}}{1 + uτ - S_{k - 1} + (k - 1) s} du) ds.

Observe on making the change of variables v:=1+u τ−S_k−1+(k−1)s that

\begin{align} \int_{0}^{1} \frac{1_{1 + uτ - S_{k - 1} \geq s}}{1 + uτ - S_{k - 1} + (k - 1) s} du & = \frac{1}{τ} \int_{[max (ks, 1 - S_{k - 1} + (k - 1) s), 1 - S_{k - 1} + τ + (k - 1) s]} \frac{dv}{v} \\ \leq \frac{1}{τ} log \frac{ks + τ}{ks} \end{align}

and so this contribution to (123) is bounded by WX, where W,X are defined in (110) and (111).

Now we consider the contribution to (123) when^e

1 + uτ - S_{k - 1} > 2 c.

In this regime, we bound

\frac{1}{1 + uτ - S_{k - 1} + (k - 1) s} \leq \frac{1}{2 c + (k - 1) t},

and so this portion of $\int_{0}^{1} Z_{4} [1 + uτ] du$ may be bounded by

\int_{0}^{1} \frac{log k}{c} 𝔼 {(1 + uτ - S_{k - 1} - c)}^{2} V du = VU

where V,U are defined in (112) and (113). The proof of the theorem is now complete.

We can now perform an asymptotic analysis in the limit k→∞ to establish Theorem 23(xi) and Theorem 25(vi). For k sufficiently large, we select the parameters

\begin{align} c & : = \frac{1}{log k} + \frac{α}{\overset{2}{log} k} \\ T & : = \frac{β}{log k} \\ τ & : = \frac{γ}{log k} \end{align}

for some real parameters $α \in ℝ$ and β,γ>0 independent of k to be optimized in later. From (100) and (101), we have

\begin{align} m_{2} & = \frac{1}{k - 1} (\frac{1}{c} - \frac{1}{c + (k - 1) T}) \\ = \frac{log k}{k} (1 - \frac{α}{log k} + o (\frac{1}{log k})) \end{align}

where we use o(f(k)) to denote a function g(k) of k with g(k)/f(k)→0 as k→∞. On the other hand, we have from (100) and (102) that

\begin{align} m_{2} (c + (k - 1) μ) & = \int_{0}^{T} (c + (k - 1) t) g {(t)}^{2} dt \\ = \frac{1}{k - 1} log \frac{c + (k - 1) T}{c} \\ = \frac{log k}{k} (1 + \frac{log β}{log k} + o (\frac{1}{log k})) \end{align}

and thus

\begin{align} kμ & = \frac{k}{k - 1} (1 + \frac{log β + α}{log k} + o (\frac{1}{log k})) - \frac{kc}{k - 1} \\ = 1 + \frac{log β + α}{log k} + o (\frac{1}{log k}) - (\frac{1}{log k} + o (\frac{1}{log k})) \\ = 1 + \frac{log β + α - 1}{log k} + o (\frac{1}{log k}) . \end{align}

Similarly, from (100), (102), and (103), we have

\begin{align} m_{2} (c^{2} + 2 c (k - 1) μ + {(k - 1)}^{2} (μ^{2} + σ^{2})) & = \int_{0}^{T} {(c + (k - 1) t)}^{2} g {(t)}^{2} dt \\ = T \end{align}

and thus

\begin{align} k σ^{2} & = \frac{k}{{(k - 1)}^{2}} (\frac{T}{m_{2}} - c^{2} - 2 c (k - 1) μ) - k μ^{2} \\ = \frac{β}{\overset{2}{log} k} + o (\frac{1}{\overset{2}{log} k}) . \end{align}

We conclude that the hypotheses (104), (105), and (106) will be obeyed for sufficiently large k if we have

\begin{align} log β + α + γ & < 1 \\ log β + α + β & < 1 \\ β & < {(1 + γ - α - log β)}^{2} . \end{align}

These conditions can be simultaneously obeyed, for instance by setting β=γ=1 and α=−1.

Now we crudely estimate the quantities Z,Z₃,W,X,V,U in (108)-(113). For 1≤r≤1+τ, we have r−k μ∼1/ logk, and so

\frac{r - kμ}{T} ≍ 1; \frac{k σ^{2}}{{(r - kμ)}^{2}} ≍ 1; \frac{r^{2}}{4 kT} = o (1)

and so by (108) Z=O(1). Using the crude bound $log (1 + \frac{t}{T}) = O (1)$ for 0≤t≤T, we see from (109) and (102) that Z₃=O(k μ)=O(1). It is clear that X=O(1), and using the crude bound $\frac{1}{2 c + (k - 1) t} \leq \frac{1}{c}$ we see from (112) and (101) that V=O(1). For 0≤u≤1 we have 1+u τ−(k−1)μ−c=O(1/ logk), so from (113) we have U=O(1). Finally, from (110) and the change of variables $t = \frac{s}{k log k}$ , we have

\begin{align} W & = \frac{log k}{k m_{2}} \int_{0}^{kT log k} log (1 + \frac{γ}{s}) \frac{ds}{{(1 + \frac{α}{log k} + \frac{k - 1}{k} s)}^{2}} \\ = O (\int_{0}^{\infty} log (1 + \frac{γ}{s}) \frac{ds}{(1 + o (1)) {(1 + s)}^{2}}) \\ = O (1) . \end{align}

Finally, we have

1 - \frac{k σ^{2}}{{(1 + τ - kμ)}^{2}} \sim 1 .

Putting all these together, we see from (107) that

M_{k} \geq M_{k}^{[T]} \geq \frac{k}{k - 1} log k - O (1)

giving Theorem 23(xi). Furthermore, if we set

ϖ : = \frac{7}{600} - \frac{C}{log k}

and

δ : = (\frac{1}{4} + \frac{7}{600}) \frac{β}{log k},

then we will have 600ϖ+180δ<7 for C large enough, and Theorem 25(vi) also follows (as one can verify from inspection that all implied constants here are effective).

Finally, Theorem 23(viii), (ix), and (x) follow by setting

\begin{align} c & : = \frac{θ}{log k} \\ T & : = \frac{β}{log k} \\ τ & = 1 - kμ \end{align}

with θ,β given by Table 2, with (107) then giving the bound $M_{k}^{[T]} > M$ with M as given by the table, after verifying of course that the conditions (104), (105), and (106) are obeyed. Similarly, Theorem 25 (ii), (iii), (iv), and (v) follows with θ,β given by the same table, with ϖ chosen so that

M = \frac{m}{\frac{1}{4} + ϖ}

with m=2,3,4,5 for (ii), (iii), (iv), (v), respectively, and δ chosen by the formula

δ : = T (\frac{1}{4} + ϖ) .

Table 2 Parameter choices for Theorems 23 and 25

Full size table

The case of small and medium dimension

In this section, we establish lower bounds for M_k (and related quantities, such as M_k,ε) both for small values of k (in particular, k=3 and k=4) and medium values of k (in particular, k=50 and k=54). Specifically, we will establish Theorem 23(vii), Theorem 27, and Theorem 29.

7.1 Bounding M_kfor medium k

We begin with the problem of lower bounding M_k. We first formalize an observation^f of Maynard [5] that one may restrict without loss of generality to symmetric functions:

Lemma 41.

For any k≥2, one has

M_{k} : = sup \frac{k J_{1} (F)}{I (F)}

where F ranges over symmetric square-integrable functions on $R_{k}$ that are not identically zero.

Proof.

Firstly, observe that if one replaces a square-integrable function $F : [0, + \infty)^{k} \to ℝ$ with its absolute value |F|, then I(|F|)=I(F) and J_i(|F|)≥J_i(F). Thus, one may restrict the supremum in (33) to non-negative functions without loss of generality. We may thus find a sequence F_n of square-integrable non-negative functions on $R_{k}$ , normalized so that I(F_n)=1, and such that $\sum_{i = 1}^{k} J_{i} (F_{n}) \to M_{k}$ as n→∞.

Now let

\bar{F_{n}} (t_{1}, \dots, t_{k}) : = \frac{1}{k!} \sum_{σ \in S_{k}} F_{n} (t_{σ (1)}, \dots, t_{σ (k)})

be the symmetrisation of F_n. Since the F_n are non-negative with I(F_n)=1, we see that

I (\bar{F_{n}}) \geq I (\frac{1}{k!} F_{n}) = \frac{1}{{(k!)}^{2}}

and so $I (\bar{F_{n}})$ is bounded away from zero. Also, from (33), we know that the quadratic form

Q (F) : = M_{k} I (F) - \sum_{i = 1}^{k} J_{i} (F)

is positive semi-definite and is also invariant with respect to symmetries, and so from the triangle inequality for inner product spaces, we conclude that

Q (\bar{F_{n}}) \leq Q (F_{n}) .

By construction, Q(F_n) goes to zero as n→∞, and thus $Q (\bar{F_{n}})$ also goes to zero. We conclude that

\frac{k J_{1} (\bar{F_{n}})}{I (\bar{F_{n}})} = \frac{\sum_{i = 1}^{k} J_{i} (\bar{F_{n}})}{I (\bar{F_{n}})} \to M_{k}

as n→∞, and so

M_{k} \geq sup \frac{k J_{1} (F)}{I (F)} .

The reverse inequality is immediate from (33), and the claim follows.

To establish a lower bound of the form M_k>C for some C>0, one thus seeks to locate a symmetric function $F : [0, + \infty)^{k} \to ℝ$ supported on $R_{k}$ such that

k J_{1} (F) > CI (F) .

(124)

To do this numerically, we follow [5] (see also [2] for some related ideas) and can restrict attention to functions F that are linear combinations

F = \sum_{i = 1}^{n} a_{i} b_{i}

of some explicit finite set $b_{1}, \dots, b_{n} : [0, + \infty)^{k} \to ℝ$ supported on $R_{k}$ and some real scalars a₁,…,a_n that we may optimize in. The condition (124) then may be rewritten as

a^{T} M_{2} a - C a^{T} M_{1} a > 0

(125)

where a is the vector

a : = (\begin{array}{l} a_{1} \\ ⋮ \\ a_{n} \end{array})

and M₁,M₂ are the real symmetric and positive semi-definite n×n matrices

\begin{align} M_{1} & = {(\int_{ℝ^{k}} b_{i} (t_{1}, \dots, t_{k}) b_{j} (t_{1}, \dots, t_{k}) d t_{1} \dots d t_{k})}_{1 \leq i, j \leq n} \end{align}

(126)

\begin{align} M_{2} & = {(k \int_{ℝ^{k + 1}} b_{i} (t_{1}, \dots, t_{k}) b_{j} (t_{1}, \dots, t_{k - 1}, t_{k}^{'}) d t_{1} \dots d t_{k} d t_{k}^{'})}_{1 \leq i, j \leq n} . \end{align}

(127)

If the b₁,…,b_n are linearly independent in $L^{2} (R_{k})$ , then M₁ is strictly positive definite, and (as observed in [5], Lemma 8.3]), one can find a obeying (125) if and only if the largest eigenvalue of $M_{2} M_{1}^{- 1}$ exceeds C. This is a criterion that can be numerically verified for medium-sized values of n, if the b₁,…,b_n are chosen so that the matrix coefficients of M₁,M₂ are explicitly computable.

In order to facilitate computations, it is natural to work with bases b₁,…,b_n of symmetric polynomials. We have the following basic integration identity:

Lemma 42(Beta function identity).

For any non-negative a,a₁,…,a_k, we have

\int_{R_{k}} {(1 - t_{1} - \dots - t_{k})}^{a} t_{1}^{a_{1}} \dots t_{k}^{a_{k}} d t_{1} \dots d t_{k} = \frac{Γ (a + 1) Γ (a_{1} + 1) \dots Γ (a_{k} + 1)}{Γ (a_{1} + \dots + a_{k} + k + a + 1)}

where $Γ (s) : = \int_{0}^{\infty} t^{s - 1} e^{- t} dt$ is the Gamma function. In particular, if a₁,…,a_k are natural numbers, then

\int_{R_{k}} {(1 - t_{1} - \dots - t_{k})}^{a} t_{1}^{a_{1}} \dots t_{k}^{a_{k}} d t_{1} \dots d t_{k} = \frac{a! a_{1}! \dots a_{k}!}{(a_{1} + \dots + a_{k} + k + a)!} .

Proof.

Since

\int_{R_{k}} {(1 - t_{1} - \dots - t_{k})}^{a} t_{1}^{a_{1}} \dots t_{k}^{a_{k}} d t_{1} \dots d t_{k} = a \int_{R_{k + 1}} t_{1}^{a_{1}} \dots t_{k}^{a_{k}} t_{k + 1}^{a - 1} d t_{1} \dots d t_{k + 1},

we see that to establish the lemma, it suffices to do so in the case a=0.

If we write

X : = \int_{t_{1} + \dots + t_{k} = 1} t_{1}^{a_{1}} \dots t_{k}^{a_{k}} d t_{1} \dots d t_{k - 1},

then by homogeneity we have

r^{a_{1} + \dots + a_{k} + k - 1} X = \int_{t_{1} + \dots + t_{k} = r} t_{1}^{a_{1}} \dots t_{k}^{a_{k}} d t_{1} \dots d t_{k - 1}

for any r>0, and hence on integrating r from 0 to 1, we conclude that

\frac{X}{a_{1} + \dots + a_{k} + k} = \int_{R_{k}} t_{1}^{a_{1}} \dots t_{k}^{a_{k}} d t_{1} \dots d t_{k} .

On the other hand, if we multiply by e^−r and integrate r from 0 to ∞, we obtain instead

\int_{0}^{\infty} r^{a_{1} + \dots + a_{k} + k - 1} X e^{- r} dr = \int_{[0, + \infty)^{k}} t_{1}^{a_{1}} \dots t_{k}^{a_{k}} e^{- t_{1} - \dots - t_{k}} d t_{1} \dots d t_{k} .

Using the definition of the Gamma function, this becomes

Γ (a_{1} + \dots + a_{k} + k) X = Γ (a_{1} + 1) \dots Γ (a_{k} + 1)

and the claim follows.

Define a signature to be a non-increasing sequence α=(α₁,α₂,…,α_k) of natural numbers; for brevity, we omit zeroes; thus, for instance if k=6, then (2,2,1,1,0,0) will be abbreviated as (2,2,1,1). The number of non-zero elements of α will be called the length of the signature α, and as usual the degree of α with be α₁+⋯+α_k. For each signature α, we then define the symmetric polynomials $P_{α} = P_{α}^{(k)}$ by the formula

P_{α} (t_{1}, \dots, t_{k}) = \sum_{a : s (a) = α} t_{1}^{a_{1}} \dots t_{k}^{a_{k}}

where the summation is over all tuples a=(a₁,…,a_k) whose non-increasing rearrangement s(a) is equal to α. Thus, for instance

\begin{align} P_{(1)} (t_{1}, \dots, t_{k}) & = t_{1} + \dots + t_{k} \\ P_{(2)} (t_{1}, \dots, t_{k}) & = t_{1}^{2} + \dots + t_{k}^{2} \\ P_{(1, 1)} (t_{1}, \dots, t_{k}) & = \sum_{1 \leq i < j \leq k} t_{i} t_{j} \\ P_{(2, 1)} (t_{1}, \dots, t_{k}) & = \sum_{1 \leq i < j \leq k} t_{i}^{2} t_{j} + t_{i} t_{j}^{2} \end{align}

and so forth. Clearly, the P_α form a linear basis for the symmetric polynomials of t₁,…,t_k. Observe that if α=(α^′,1) is a signature containing 1, then one can express P_α as $P_{(1)} P_{α^{'}}$ minus a linear combination of polynomials P_β with the length of β less than that of α. This implies that the functions $P_{(1)}^{a} P_{α}$ , with a≥0 and α avoiding 1, are also a basis for the symmetric polynomials. Equivalently, the functions (1−P₍₁₎)^aP_α with a≥0 and α avoiding 1 form a basis.

After extensive experimentation, we have discovered that a good basis b₁,…,b_n to use for the above problem comes by setting the b_i to be all the symmetric polynomials of the form (1−P₍₁₎)^aP_α, where a≥0 and α consists entirely of even numbers, whose total degree a+α₁+⋯+α_k is less than or equal to some chosen threshold d. For such functions, the coefficients of M₁,M₂ can be computed exactly using Lemma 42.

More explicitly, first we quickly compute a look-up table for the structure constants $c_{α, β, γ} \in ℤ$ derived from simple products of the form

P_{α} P_{β} = \sum_{γ} c_{α, β, γ} P_{γ}

where deg(α)+ deg(β)≤d. Using this look-up table, we rewrite the integrands of the entries of the matrices in (126) and (127) as integer linear combinations of nearly ‘pure’ monomials of the form ${(1 - P_{(1)})}^{a} t_{1}^{a_{1}} \dots t_{k}^{a_{k}}$ . We then calculate the entries of M₁ and M₂, as exact rational numbers, using Lemma 42.

We next run a generalized eigenvector routine on (real approximations to) M₁ and M₂ to find a vector a^′ which nearly maximize the quantityC in (125). Taking a rational approximation a to a^′, we then do the quick (and exact) arithmetic to verify that (125) holds for some constant C>4. This generalized eigenvector routine is time-intensive when the sizes of M₁ and M₂ are large (say, bigger than 1,500×1,500) and in practice is the most computationally intensive step of our calculation. When one does not care about an exact arithmetic proof that C>4, instead one can run a test for positive-definiteness for the matrix C M₁−M₂, which is usually much faster and less RAM intensive.

Using this method, we were able to demonstrate M₅₄>4.00238, thus establishing Theorem 23(vii). We took d=23 and imposing the restriction on signatures α that they be composed only of even numbers. It is likely that d=22 would suffice in the absence of this restriction on signatures, but we found that the gain in M₅₄ from lifting this restriction is typically only in the region of 0.005, whereas the execution time is increased by a large factor. We do not have a good understanding of why this particular restriction on signatures is so inexpensive in terms of the trade-off between the accuracy of M-values and computational complexity. The total run-time for this computation was under 1 h.

We now describe a second choice for the basis elements b₁,…,b_n, which uses the Krylov subspace method; it gives faster and more efficient numerical results than the previous basis, but does not seem to extend as well to more complicated variational problems such as M_k,ε. We introduce the linear operator $ℒ : L^{2} (R_{k}) \to L^{2} (R_{k})$ defined by

ℒf (t_{1}, \dots, t_{k}) : = \sum_{i = 1}^{k} \int_{0}^{1 - t_{1} - \dots - t_{i - 1} - t_{i + 1} - \dots - t_{k}} f (t_{1}, \dots, t_{i - 1}, t_{i}^{'}, t_{i + 1}, \dots, t_{k}) d t_{i}^{'} .

This is a self-adjoint and positive semi-definite operator on $L^{2} (R_{k})$ . For symmetric $b_{1}, \dots, b_{n} \in L^{2} (R_{k})$ , one can then write

\begin{align} M_{1} & = {(〈 b_{i}, b_{j} 〉)}_{1 \leq i, j \leq n} \\ M_{2} & = {(〈 ℒ b_{i}, b_{j} 〉)}_{1 \leq i, j \leq n} . \end{align}

If we then choose

b_{i} : = ℒ^{i - 1} 1

where 1 is the unit constant function on $R_{k}$ , then the matrices M₁,M₂ take the Hankel form

\begin{align} M_{1} & = {(〈 ℒ^{i + j - 2} 1, 1 〉)}_{1 \leq i, j \leq n} \\ M_{2} & = {(〈 ℒ^{i + j - 1} 1, 1 〉)}_{1 \leq i, j \leq n}, \end{align}

and so can be computed entirely in terms of the 2n numbers $〈 ℒ^{i} 1, 1 〉$ for i=0,…,2n−1.

The operator maps symmetric polynomials to symmetric polynomials; for instance, one has

\begin{align} ℒ 1 & = k - (k - 1) P_{(1)} \\ ℒ P_{(1)} & = \frac{k}{2} - \frac{k - 1}{2} P_{(2)} - (k - 2) P_{(1, 1)} \end{align}

and so forth. From this and Lemma 42, the quantities $〈 ℒ^{i} 1, 1 〉$ are explicitly computable rational numbers; for instance, one can calculate

\begin{align} 〈 1, 1 〉 & = \frac{1}{k!} \\ 〈 ℒ 1, 1 〉 & = \frac{2 k}{(k + 1)!} \\ 〈 ℒ^{2} 1, 1 〉 & = \frac{k (5 k + 1)}{(k + 2)!} \\ 〈 ℒ^{3} 1, 1 〉 & = \frac{2 k^{2} (7 k + 5)}{(k + 3)!} \end{align}

and so forth.

With Maple, we were able to compute $〈 ℒ^{i} 1, 1 〉$ for i≤50 and k≤100, leading to lower bounds on M_k for these values of k, a selection of which is given in Table 3.

Table 3 Selected lower bounds on M _k obtained from the Krylov subspace method, with $\frac{k}{k - 1} log k$ upper bound displayed for comparison

Full size table

7.2 Bounding M_k,εfor medium k

When bounding M_k,ε, we have not been able to implement the Krylov method because the analogue of $ℒ^{i} 1$ in this context is piecewise polynomial instead of polynomial, and we were only able to compute it explicitly for very small values of i, such as i=1,2,3, which are insufficient for good numerics. Thus, we rely on the previously discussed approach, in which symmetric polynomials are used for the basis functions. Instead of computing integrals over the region $R_{k}$ , we pass to the regions $(1 \pm ε) R_{k}$ . In order to apply Lemma 42 over these regions, this necessitates working with a slightly different basis of polynomials. We chose to work with those polynomials of the form (1+ε−P₍₁₎)^aP_α, where α is a signature with no 1’s. Over the region $(1 + ε) R_{k}$ , a single change of variables converts the needed integrals into those of the form in Lemma 42, and we can then compute the entries of M₁.

On the other hand, over the region $(1 - ε) R_{k}$ , we instead want to work with polynomials of the form (1−ε−P₍₁₎)^aP_α. Since (1+ε−P₍₁₎)^a=(2ε+(1−ε−P₍₁₎))^a, an expansion using the binomial theorem allows us to convert from our given basis to polynomials of the needed form.

With these modifications, and calculating as in the previous section, we find that M_50,1/25>4.00124 if d=25 and M_50,1/25>4.0043 if d=27, thus establishing Theorem 27(i). As before, we found it optimal to restrict signatures to contain only even entries, which greatly reduced execution time while only reducing M by a few thousandths.

One surprising additional computational difficulty introduced by allowing ε>0 is that the ‘complexity’ of ε as a rational number affects the run-time of the calculations. We found that choosing ε=1/m (where $m \in ℤ$ has only small prime factors) reduces this effect.

A similar argument gives M_51,1/50>4.00156, thus establishing Theorem 27(xiii). In this case, our polynomials were of maximum degree d=22.

Code and data for these calculations may be found at http://www.dropbox.com/sh/0xb4xrsx4qmua7u/WOhuo2Gx7f/Polymath8b.

7.3 Bounding M_4,ε

We now prove Theorem 27(xii’), which can be established by a direct numerical calculation. We introduce the explicit function $F : [0, + \infty)^{4} \to ℝ$ defined by

F (t_{1}, t_{2}, t_{3}, t_{4}) : = (1 - α (t_{1} + t_{2} + t_{3} + t_{4})) 1_{t_{1} + t_{2} + t_{3} + t_{4} \leq 1 + ε}

with ε:=0.168 and α:=0.784. As F is symmetric in t₁,t₂,t₃,t₄, we have J_i,1−ε(F)=J_1,1−ε(F), so to show Theorem 27(xii’) it will suffice to show that

\frac{4 J_{1, 1 - ε} (F)}{I (F)} > 2.00558 .

(128)

By making the change of variables s=t₁+t₂+t₃+t₄, we see that

\begin{align} I (F) & = \int_{t_{1} + t_{2} + t_{3} + t_{4} \leq 1 + ε} {(1 - α (t_{1} + t_{2} + t_{3} + t_{4}))}^{2} d t_{1} d t_{2} d t_{3} d t_{4} \\ = \int_{0}^{1 + ε} {(1 - αs)}^{2} \frac{s^{3}}{3!} ds \\ = α^{2} \frac{{(1 + ε)}^{6}}{36} - α \frac{{(1 + ε)}^{5}}{15} + \frac{{(1 + ε)}^{4}}{24} \\ = 0.00728001347 \dots \end{align}

and similarly by making the change of variables u=t₁+t₂+t₃

\begin{align} J_{1, 1 - ε} (F) & = \int_{t_{1} + t_{2} + t_{3} \leq 1 - ε} {(\int_{0}^{1 + ε - t_{1} - t_{2} - t_{3}} (1 - α (t_{1} + t_{2} + t_{3} + t_{4})) d t_{4})}^{2} d t_{1} d t_{2} d t_{3} \\ = \int_{0}^{1 - ε} {(\int_{0}^{1 + ε - u} (1 - α (u + t_{4})) d t_{4})}^{2} \frac{u^{2}}{2!} du \\ = \int_{0}^{1 - ε} {(1 + ε - u)}^{2} {(1 - α \frac{1 + ε + u}{2})}^{2} \frac{u^{2}}{2} du \\ = 0.003650160667 \dots \end{align}

and so (128) follows.

Remark 43.

If we use the truncated function

\tilde{F} (t_{1}, t_{2}, t_{3}, t_{4}) : = F (t_{1}, t_{2}, t_{3}, t_{4}) 1_{t_{1}, t_{2}, t_{3}, t_{4} \leq 1}

in place of F and set ε to 0.18 instead of 0.168, one can compute that

\frac{4 J_{1, 1 - ε} (\tilde{F})}{I (\tilde{F})} > 2.00235 .

Thus, it is possible to establish Theorem 27(xii’) using a cutoff function F^′ that is also supported in the unit cube [ 0,1]⁴. This allows for a slight simplification to the proof of DHL[ 4;2] assuming GEH, as one can add the additional hypothesis $S (F_{i_{0}}) + S (G_{i_{0}}) < 1$ to Theorem 20(ii) in that case.

Remark 44.

By optimizing in ε and taking F to be a symmetric polynomial of degree higher than 1, one can get slightly better lower bounds for M_4,ε; for instance, setting ε=5/21 and choosing F to be a cubic polynomial, we were able to obtain the bound M_4,ε≥2.05411. On the other hand, the best lower bound for M_3,ε that we were able to obtain was 1.91726 (taking ε=56/113 and optimizing over cubic polynomials). Again, see http://www.dropbox.com/sh/0xb4xrsx4qmua7u/WOhuo2Gx7f/Polymath8b for the relevant code and data.

7.4 Three-dimensional cutoffs

In this section, we establish Theorem 29. We relabel the variables (t₁,t₂,t₃) as (x,y,z); thus, our task is to locate a piecewise polynomial function $F : [0, + \infty)^{3} \to ℝ$ supported on the simplex

R : = \{(x, y, z) \in [0, + \infty)^{3} : x + y + z \leq \frac{3}{2}\}

and symmetric in the x,y,z variables, obeying the vanishing marginal condition

\int_{0}^{\infty} F (x, y, z) dz = 0

(129)

whenever x,y≥0 with x+y>1+ε, and such that

J (F) > 2 I (F)

(130)

where

J (F) : = 3 \int_{x + y \leq 1 - ε} {(\int_{0}^{\infty} F (x, y, z) dz)}^{2} dxdy

(131)

and

I (F) : = \int_{R} F {(x, y, z)}^{2} dxdydz

(132)

and

ε : = 1 / 4 .

Our strategy will be as follows. We will decompose the simplex R (up to null sets) into a carefully selected set of disjoint open polyhedra P₁,…,P_m (in fact m will be 60), and on each P_i we will take F(x,y,z) to be a low-degree polynomial F_i(x,y,z) (indeed, the degree will never exceed 3). The left-hand and right-hand sides of (130) become quadratic functions in the coefficients of the F_i. Meanwhile, the requirement of symmetry, as well as the marginal requirement (129), imposes some linear constraints on these coefficients. In principle, this creates a finite-dimensional quadratic program, which one can try to solve numerically. However, to make this strategy practical, one needs to keep the number of linear constraints imposed on the coefficients to be fairly small, as compared with the total number of coefficients. To achieve this, the following properties on the polynomials P_i are desirable:

(Symmetry) If P_i is a polytope in the partition, then every reflection of P_i formed by permuting the x,y,z coordinates should also lie in the partition.

(Graph structure) Each polytope P_i should be of the form

\{(x, y, z) : z \in Q_{i}; a_{i} (x, y) < z < b_{i} (x, y)\},

(133)

where a_i(x,y),b_i(x,y) are linear forms and Q_i is a polygon.

(Epsilon splitting) Each Q_i is contained in one of the regions {(x,y):x+y<1−ε}, {(x,y):1−ε<x+y<1+ε}, or {(x,y):1+ε<x+y<3/2}.

Observe that the vanishing marginal condition (129) now takes the form

\sum_{i : (x, y) \in Q_{i}} \int_{a_{i} (x, y)}^{b_{i} (x, y)} F_{i} (x, y, z) dz = 0

(134)

for every x,y>0 with x+y>1+ε. If the set {i:(x,y)∈Q_i} is fixed, then the left-hand side of (134) is a polynomial in x,y whose coefficients depend linearly on the coefficients on the F_i, and thus (134) imposes a set of linear conditions on these coefficients for each possible set {i:(x,y)∈Q_i} with x+y>1+ε.

Now we describe the partition we will use. This partition can in fact be used for all ε in the interval [ 1/4,1/3], but the endpoint ε=1/4 has some simplifications which allowed for reasonably good numerical results. To obtain the symmetry property, it is natural to split R (modulo null sets) into six polyhedra R_xyz,R_xzy,R_yxz,R_yzx,R_zxy,R_zyx, where

\begin{align} R_{xyz} & : = \{(x, y, z) \in R : x + y < y + z < z + x\} \\ = \{(x, y, z) : 0 < y < x < z; x + y + z \leq 3 / 2\} \end{align}

and the other polyhedra are obtained by permuting the indices x,y,z, thus for instance

\begin{align} R_{yxz} & : = \{(x, y, z) \in R : y + x < x + z < z + y\} \\ = \{(x, y, z) : 0 < x < y < z; x + y + z \leq 3 / 2\} . \end{align}

To obtain the epsilon splitting property, we decompose R_xyz (modulo null sets) into eight sub-polytopes

\begin{align} A_{xyz} & = \{(x, y, z) \in R : x + y < y + z < z + x < 1 - ε\}, \\ B_{xyz} & = \{(x, y, z) \in R : x + y < y + z < 1 - ε < z + x < 1 + ε\}, \\ C_{xyz} & = \{(x, y, z) \in R : x + y < 1 - ε < y + z < z + x < 1 + ε\}, \\ D_{xyz} & = \{(x, y, z) \in R : 1 - ε < x + y < y + z < z + x < 1 + ε\}, \\ E_{xyz} & = \{(x, y, z) \in R : x + y < y + z < 1 - ε < 1 + ε < z + x\}, \\ F_{xyz} & = \{(x, y, z) \in R : x + y < 1 - ε < y + z < 1 + ε < z + x\}, \\ G_{xyz} & = \{(x, y, z) \in R : x + y < 1 - ε < 1 + ε < y + z < z + x\}, \\ H_{xyz} & = \{(x, y, z) \in R : 1 - ε < x + y < y + z < 1 + ε < z + x\}; \end{align}

the other five polytopes R_xzy,R_yxz,R_yzx,R_zxy,R_zyx are decomposed similarly, leading to a partition of R into 6×8=48 polytopes. This is almost the partition we will use; however, there is a technical difficulty arising from the fact that some of the permutations of F_xyz do not obey the graph structure property. So we will split F_xyz further into the three pieces

\begin{align} S_{xyz} & = \{(x, y, z) \in F_{xyz} : z < 1 / 2 + ε\}, \\ T_{xyz} & = \{(x, y, z) \in F_{xyz} : z > 1 / 2 + ε; x > 1 / 2 - ε\}, \\ U_{xyz} & = \{(x, y, z) \in F_{xyz} : x < 1 / 2 - ε\} . \end{align}

Thus, R_xyz is now partitioned into ten polytopes A_xyz,B_xyz,C_xyz,D_xyz, E_xyz, S_xyz, T_xyz, U_xyz, G_xyz, H_xyz, and similarly for permutations of R_xyz, leading to a decomposition of R into 6×10=60 polytopes.

A symmetric piecewise polynomial function F supported on R can now be described (almost everywhere) by specifying a polynomial function $F ⇂_{P} : P \to ℝ$ for the ten polytopes P=A_xyz,B_xyz,C_xyz,D_xyz,E_xyz,S_xyz,T_xyz,U_xyz,G_xyz,H_xyz, and then extending by symmetry, thus for instance

F ⇂_{A_{yzx}} (x, y, z) = F ⇂_{A_{xyz}} (z, x, y) .

As discussed earlier, the expressions I(F),J(F) can now be written as quadratic forms in the coefficients of the $F ⇂_{P}$ , and the vanishing marginal condition (129) imposes some linear constraints on these coefficients.

Observe that the polytope D_xyz and all of its permutations make no contribution to either the functional J(F) or to the marginal condition (129), and give a non-negative contribution to I(F). Thus, without loss of generality we may assume that

F ⇂_{D_{xyz}} = 0 .

However, the other nine polytopes A_xyz,B_xyz,C_xyz,E_xyz,S_xyz,T_xyz,U_xyz,G_xyz,H_xyz have at least one permutation which gives a non-trivial contribution to either J(F) or to (129), and cannot be easily eliminated.

Now we compute I(F). By symmetry, we have

I (F) = 3! I (F ⇂_{R_{xyz}}) = 6 \sum_{P} I (F ⇂_{P})

where P ranges over the nine polytopes A_xyz,B_xyz,C_xyz,E_xyz,S_xyz,T_xyz,U_xyz,G_xyz,H_xyz. A tedious but straightforward computation shows that

\begin{align} I (F ⇂_{A_{xyz}}) & = \int_{x = 0}^{1 / 2 - ε / 2} \int_{y = 0}^{x} \int_{z = x}^{1 - ε - x} F ⇂_{A_{xyz}}^{2} dz dy dx \\ I (F ⇂_{B_{xyz}}) & = (\int_{z = 1 / 2 - ε / 2}^{1 / 2 + ε / 2} \int_{x = 1 - ε - z}^{z} + \int_{z = 1 / 2 + ε / 2}^{1 - ε} \int_{x = 1 - ε - z}^{1 + ε - z}) \int_{y = 0}^{1 - ε - z} F ⇂_{B_{xyz}}^{2} dy dx dz \\ I (F ⇂_{C_{xyz}}) & = (\int_{y = 0}^{1 / 2 - 3 ε / 2} \int_{x = y}^{y + 2 ε} + \int_{y = 1 / 2 - 3 ε / 2}^{1 / 2 - ε} \int_{x = y}^{1 - ε - y}) \int_{z = 1 - ε - y}^{1 + ε - x} \\ + \int_{y = 1 / 2 - ε}^{1 / 2 - ε / 2} \int_{x = y}^{1 - ε - y} \int_{z = 1 - ε - y}^{3 / 2 - x - y} F ⇂_{C_{xyz}}^{2} dz dx dy \\ I (F ⇂_{E_{xyz}}) & = \int_{z = 1 / 2 + ε / 2}^{1 - ε} \int_{x = 1 + ε - z}^{z} \int_{y = 0}^{1 - ε - z} F ⇂_{E_{xyz}}^{2} dy dx dz \\ I (F ⇂_{S_{xyz}}) & = (\int_{y = 0}^{1 / 2 - 3 ε / 2} \int_{z = 1 - ε - y}^{1 / 2 + ε} + \int_{y = 1 / 2 - 3 ε / 2}^{1 / 2 - ε} \int_{z = y + 2 ε}^{1 / 2 + ε}) \int_{x = 1 + ε - z}^{1 - ε - y} F ⇂_{S_{xyz}} dx dz dy \\ I (F ⇂_{T_{xyz}}) & = (\int_{z = 1 / 2 + ε}^{1 / 2 + 2 ε} \int_{x = 1 + ε - z}^{3 / 2 - z} + \int_{z = 1 / 2 + 2 ε}^{1 + ε} \int_{x = 1 / 2 - ε}^{3 / 2 - z}) \int_{y = 0}^{3 / 2 - x - z} F ⇂_{T_{xyz}}^{2} dy dz dx \\ I (F ⇂_{U_{xyz}}) & = \int_{x = 0}^{1 / 2 - ε} \int_{y = 0}^{x} \int_{z = 1 + ε - x}^{1 + ε - y} F ⇂_{U_{xyz}} dz dy dx \\ I (F ⇂_{G_{xyz}}) & = \int_{x = 0}^{1 / 2 - ε} \int_{y = 0}^{x} \int_{z = 1 + ε - y}^{3 / 2 - x - y} F ⇂_{G_{xyz}}^{2} dx dz dy \end{align}

and

\begin{align} I (F ⇂_{H_{xyz}}) & = (\int_{x = 1 / 2 + ε / 2}^{1 - ε} \int_{y = 1 - ε - x}^{3 / 2 - 2 x} + \int_{x = 1 - ε}^{3 / 4} \int_{y = 0}^{3 / 2 - 2 x}) \int_{z = x}^{3 / 2 - x - y} \\ + \int_{x = 1 / 2}^{1 / 2 + ε / 2} \int_{y = 1 - ε - x}^{1 / 2 - ε} \int_{z = 1 + ε - x}^{3 / 2 - x - y} F ⇂_{H_{xyz}}^{2} dz dy dx. \end{align}

Now we consider the quantity J(F). Here we only have the symmetry of swapping x and y, so that

J (F) = 6 \int_{0 < y < x; x + y < 1 - ε} {(\int_{0}^{3 / 2 - x - y} F (x, y, z) dz)}^{2} dxdy.

The region of integration meets the polytopes A_xyz, A_yzx, A_zyx, B_xyz, B_zyx, C_xyz, E_xyz, E_zyx, S_xyz, T_xyz, U_xyz, and G_xyz.

Projecting these regions to the (x,y)-plane, we have the diagram:

This diagram is drawn to scale in the case when ε=1/4; otherwise, there is a separation between the J₅ and J₇ regions. For each of these eight regions, there are eight corresponding integrals J₁,J₂,…,J₈, and thus

J = 2 (J_{1} + \dots + J_{8}) .

We have

\begin{align} J_{1} & = \int_{x = 0}^{1 / 2 - ε} \int_{y = 0}^{x} (\int_{z = 0}^{y} F ⇂_{A_{yzx}} + \int_{z = y}^{x} F ⇂_{A_{zyx}} + \int_{z = x}^{1 - ε - x} F ⇂_{A_{xyz}} + \int_{z = 1 - ε - x}^{1 - ε - y} F ⇂_{B_{xyz}} \\ {+ \int_{z = 1 - ε - y}^{1 + ε - x} F ⇂_{C_{xyz}} + \int_{z = 1 + ε - x}^{1 + ε - y} F ⇂_{U_{xyz}} + \int_{z = 1 + ε - y}^{3 / 2 - x - y} F ⇂_{G_{xyz}} dz)}^{2} dy dx. \end{align}

Next comes

\begin{align} J_{2} & = \int_{x = 1 / 2 - ε}^{1 / 2 - ε / 2} \int_{y = 1 / 2 - ε}^{x} (\int_{z = 0}^{y} F ⇂_{A_{yzx}} + \int_{z = y}^{x} F ⇂_{A_{zyx}} + \int_{z = x}^{1 - ε - x} F ⇂_{A_{xyz}} + \int_{z = 1 - ε - x}^{1 - ε - y} F ⇂_{B_{xyz}} \\ {+ \int_{z = 1 - ε - y}^{3 / 2 - x - y} F ⇂_{C_{xyz}} dz)}^{2} dy dx. \end{align}

Third is the piece

\begin{align} J_{3} & = \int_{x = 1 / 2 - ε}^{1 / 2 - ε / 2} \int_{y = 0}^{1 / 2 - ε} (\int_{z = 0}^{y} F ⇂_{A_{yzx}} + \int_{z = y}^{x} F ⇂_{A_{zyx}} + \int_{z = x}^{1 - ε - x} F ⇂_{A_{xyz}} + \int_{z = 1 - ε - x}^{1 - ε - y} F ⇂_{B_{xyz}} \\ {+ \int_{z = 1 - ε - y}^{1 + ε - x} F ⇂_{C_{xyz}} + \int_{z = 1 + ε - x}^{3 / 2 - x - y} F ⇂_{T_{xyz}} dz)}^{2} dy dx. \end{align}

We now have dealt with all integrals involving A_xyz, and all remaining integrals pass through B_zyx. Continuing, we have

\begin{align} J_{4} & = \int_{x = 1 / 2 - ε / 2}^{1 / 2} \int_{y = 1 / 2 - ε}^{1 - ε - x} (\int_{z = 0}^{y} F ⇂_{A_{yzx}} + \int_{z = y}^{1 - ε - x} F ⇂_{A_{zyx}} + \int_{z = 1 - ε - x}^{x} F ⇂_{B_{zyx}} \\ {+ \int_{z = x}^{1 - ε - y} F ⇂_{B_{xyz}} + \int_{z = 1 - ε - y}^{3 / 2 - x - y} F ⇂_{C_{xyz}} dz)}^{2} dy dx. \end{align}

Another component is

\begin{align} J_{5} & = \int_{x = 1 / 2 - ε / 2}^{1 / 2} \int_{y = 0}^{1 / 2 - ε} (\int_{z = 0}^{y} F ⇂_{A_{yzx}} + \int_{z = y}^{1 - ε - x} F ⇂_{A_{zyx}} \\ + \int_{z = 1 - ε - x}^{x} F ⇂_{B_{zyx}} + \int_{z = x}^{1 - ε - y} F ⇂_{B_{xyz}} + \int_{z = 1 - ε - y}^{1 + ε - x} F ⇂_{C_{xyz}} \\ {+ \int_{z = 1 + ε - x}^{3 / 2 - x - y} F ⇂_{T_{xyz}} dz)}^{2} dy dx. \end{align}

The most complicated piece is

\begin{align} J_{6} & = (\int_{x = 1 / 2}^{2 ε} \int_{y = 0}^{1 - ε - x} + \int_{x = 2 ε}^{1 / 2 + ε / 2} \int_{y = x - 2 ε}^{1 - ε - x}) (\int_{z = 0}^{y} F ⇂_{A_{yzx}} + \int_{z = y}^{1 - ε - x} F ⇂_{A_{zyx}} \\ + \int_{z = 1 - ε - x}^{x} F ⇂_{B_{zyx}} + \int_{z = x}^{1 - ε - y} F ⇂_{B_{xyz}} + \int_{z = 1 - ε - y}^{1 + ε - x} F ⇂_{C_{xyz}} + \int_{z = 1 + ε - x}^{1 / 2 + ε} F ⇂_{S_{xyz}} \\ {+ \int_{z = 1 / 2 + ε}^{3 / 2 - x - y} F ⇂_{T_{xyz}} dz)}^{2} dy dx. \end{align}

Here we use $(\int_{x = 1 / 2}^{2 ε} \int_{y = 0}^{1 - ε - x} + \int_{x = 2 ε}^{1 / 2 + ε / 2} \int_{y = x - 2 ε}^{1 - ε - x}) f (x, y) dydx$ as an abbreviation for

\int_{x = 1 / 2}^{2 ε} \int_{y = 0}^{1 - ε - x} f (x, y) dydx + \int_{x = 2 ε}^{1 / 2 + ε / 2} \int_{y = x - 2 ε}^{1 - ε - x} f (x, y) dydx.

We have now exhausted C_xyz. The seventh piece is

\begin{align} J_{7} & = \int_{x = 2 ε}^{1 / 2 + ε / 2} \int_{y = 0}^{x - 2 ε} (\int_{z = 0}^{y} F ⇂_{A_{yzx}} + \int_{z = y}^{1 - ε - x} F ⇂_{A_{zyx}} + \int_{z = 1 - ε - x}^{x} F ⇂_{B_{zyx}} \\ + \int_{z = x}^{1 + ε - x} F ⇂_{B_{xyz}} + \int_{z = 1 + ε - x}^{1 - ε - y} F ⇂_{E_{xyz}} + \int_{1 - ε - y}^{1 / 2 + ε} F ⇂_{S_{xyz}} \\ {+ \int_{1 / 2 + ε}^{3 / 2 - x - y} F ⇂_{T_{xyz}} dz)}^{2} dy dx. \end{align}

Finally, we have

\begin{align} J_{8} & = \int_{x = 1 / 2 + ε / 2}^{1 - ε} \int_{y = 0}^{1 - ε - x} (\int_{z = 0}^{y} F ⇂_{A_{yzx}} + \int_{z = y}^{1 - ε - x} F ⇂_{A_{zyx}} + \int_{z = 1 - ε - x}^{1 + ε - x} F ⇂_{B_{zyx}} \\ + \int_{z = 1 + ε - x}^{x} F ⇂_{E_{zyx}} + \int_{z = x}^{1 - ε - y} F ⇂_{E_{xyz}} + \int_{1 - ε - y}^{1 / 2 + ε} F ⇂_{S_{xyz}} \\ {+ \int_{1 / 2 + ε}^{3 / 2 - x - y} F ⇂_{T_{xyz}} dz)}^{2} dy dx. \end{align}

In the case ε=1/4, the marginal conditions (129) reduce to requiring

\begin{align} \int_{z = 0}^{3 / 2 - x - y} F ⇂_{G_{yzx}} dz & = 0 \end{align}

(135)

\begin{align} \int_{z = 0}^{y} F ⇂_{G_{yzx}} + \int_{z = y}^{3 / 2 - x - y} F ⇂_{G_{zyx}} dz & = 0 \end{align}

(136)

\begin{align} \int_{z = 0}^{1 + ε - x} F ⇂_{U_{yzx}} + \int_{z = 1 + ε - x}^{y} F ⇂_{G_{yzx}} + \int_{z = y}^{3 / 2 - x - y} F ⇂_{G_{zyx}} dz & = 0 \end{align}

(137)

\begin{align} \int_{z = 0}^{1 + ε - x} F ⇂_{U_{yzx}} + \int_{z = 1 + ε - x}^{3 / 2 - x - y} F ⇂_{G_{yzx}} dz & = 0 \end{align}

(138)

\begin{align} \int_{z = 0}^{3 / 2 - x - y} F ⇂_{T_{yzx}} dz & = 0 \end{align}

(139)

\begin{align} \int_{z = 0}^{1 - ε - x} F ⇂_{E_{yzx}} + \int_{z = 1 - ε - x}^{1 - ε - y} F ⇂_{S_{yzx}} + \int_{z = 1 - ε - y}^{3 / 2 - x - y} F ⇂_{H_{yzx}} dz & = 0 . \end{align}

(140)

Each of these constraints is only required to hold for some portion of the parameter space {(x,y):1+ε≤x+y≤3/2}, but as the left-hand sides are all polynomial functions in x,y (using the signed definite integral $\int_{b}^{a} = - \int_{a}^{b}$ ), it is equivalent to require that all coefficients of these polynomial functions vanish.

Now we specify F. After some numerical experimentation, we have found that the simplest choice of F which still achieves the desired goal comes by taking F(x,y,z) to be a polynomial of degree 1 on each of E_xyz, S_xyz, and H_xyz; degree 2 on T_xyz, vanishing on D_xyz; and degree 3 on the remaining five relevant components of R_xyz. After solving the quadratic program, rounding, and clearing denominators, we arrive at the choice

\begin{align} F ⇂_{A_{xyz}} & : = - 66 + 96 x - 147 x^{2} + 125 x^{3} + 128 y - 122 xy + 104 x^{2} y - 275 y^{2} + 394 y^{3} \\ + 99 z - 58 xz + 63 x^{2} z - 98 yz + 51 xyz + 41 y^{2} z - 112 z^{2} + 24 x z^{2} + 72 y z^{2} \\ + 50 z^{3} \\ F ⇂_{B_{xyz}} & : = - 41 + 52 x - 73 x^{2} + 25 x^{3} + 108 y - 66 xy + 71 x^{2} y - 294 y^{2} + 56 x y^{2} \\ + 363 y^{3} + 33 z + 15 xz + 22 x^{2} z - 40 yz - 42 xyz + 75 y^{2} z - 36 z^{2} - 24 x z^{2} \\ + 26 y z^{2} + 20 z^{3} \\ F ⇂_{C_{xyz}} & : = - 22 + 45 x - 35 x^{2} + 63 y - 99 xy + 82 x^{2} y - 140 y^{2} + 54 x y^{2} + 179 y^{3} \\ F ⇂_{E_{xyz}} & : = - 12 + 8 x + 32 y \\ F ⇂_{S_{xyz}} & : = - 6 + 8 x + 16 y \\ F ⇂_{T_{xyz}} & : = 18 - 30 x + 12 x^{2} + 42 y - 20 xy - 66 y^{2} - 45 z + 34 xz + 22 z^{2} \\ F ⇂_{U_{xyz}} & : = 94 - 1, 823 x + 5, 760 x^{2} - 5, 128 x^{3} + 54 y - 168 x^{2} y + 105 y^{2} + 1, 422 xz \\ - 2, 340 x^{2} z - 192 y^{2} z - 128 z^{2} - 268 x z^{2} + 64 z^{3} \\ F ⇂_{G_{xyz}} & : = 5, 274 - 19, 833 x + 18, 570 x^{2} - 5, 128 x^{3} - 18, 024 y + 44, 696 xy \\ - 20, 664 x^{2} y + 16, 158 y^{2} - 19, 056 x y^{2} - 4, 592 y^{3} - 10, 704 z \\ + 26, 860 xz - 12, 588 x^{2} z + 24, 448 yz - 30, 352 xyz - 10, 980 y^{2} z + 7, 240 z^{2} \\ - 9, 092 x z^{2} - 8, 288 y z^{2} - 1, 632 z^{3} \\ F ⇂_{H_{xyz}} & : = 8 z. \end{align}

One may compute that

I (F) = \frac{62, 082, 439, 864, 241}{507, 343, 011, 840}

and

J (F) = \frac{9, 933, 190, 664, 926, 733}{40, 587, 440, 947, 200}

with all the marginal conditions (135)-(140) obeyed, and thus

\frac{J (F)}{I (F)} = 2 + \frac{286, 648, 173}{4, 966, 595, 189, 139, 280}

and (130) follows.

The parity problem

In this section, we argue why the ‘parity barrier’ of Selberg [7] prohibits sieve-theoretic methods, such as the ones in this paper, from obtaining any bound on H₁ that is stronger than H₁≤6, even on the assumption of strong distributional conjectures such as the generalized Elliott-Halberstam conjecture GEH[ 𝜗] and even if one uses sieves other than the Selberg sieve. Our discussion will be somewhat informal and heuristic in nature.

We begin by briefly recalling how the bound H₁≤6 on GEH (i.e., Theorem 4(xii)) was proven. This was deduced from the claim DHL[ 3;2], or more specifically from the claim that the set

A : = \{n \in ℕ : atleasttwoofn, n + 2, n + 6 areprime\}

(141)

was infinite.

To do this, we (implicitly) established a lower bound

\sum_{n} ν (n) 1_{A} (n) > 0

for some non-negative weight $ν : ℕ \to ℝ^{+}$ supported on [ x,2x] for a sufficiently large x. This bound was in turn established (after a lengthy sieve-theoretic analysis, and with a carefully chosen weight ν) from upper bounds on various discrepancies. More precisely, one required good upper bounds (on average) for the expressions

|\sum_{x \leq n \leq 2 x : x = a (q)} f (n + h) - \frac{1}{φ (q)} \sum_{x \leq n \leq 2 x : (n + h, q) = 1} f (n + h)|

(142)

for all h∈{0,2,6} and various residue classes a (q) with q≤x^1−ε and arithmetic functions f, such as the constant function f=1, the von Mangoldt function f=Λ, or Dirichlet convolutions f=α⋆β of the type considered in Claim 12. (In the presentation of this argument in previous sections, the shift by h was eliminated using the change of variables n^′=n+h, but for the current discussion, it is important that we do not use this shift.) One also required good asymptotic control on the main terms

\sum_{x \leq n \leq 2 x : (n + h, q) = 1} f (n + h) .

(143)

An inspection of these arguments (which no longer exploit change of variables such as n^′=n+h in the n variable) shows that they would be equally valid if one inserted a further non-negative weight $ω : ℕ \to ℝ^{+}$ in the summation over n. More precisely, the above sieve-theoretic argument would also deduce the lower bound

\sum_{n} ν (n) 1_{A} (n) ω (n) > 0

if one had control on the weighted discrepancies

|\sum_{x \leq n \leq 2 x : x = a (q)} f (n + h) ω (n) - \frac{1}{φ (q)} \sum_{x \leq n \leq 2 x : (n + h, q) = 1} f (n + h) ω (n)|

(144)

and on the weighted main terms

\sum_{x \leq n \leq 2 x : (n + h, q) = 1} f (n + h) ω (n)

(145)

that were of the same form as in the unweighted case ω=1.

Now suppose for instance that one was trying to prove the bound H₁≤4. A natural way to proceed here would be to replace the set A in (141) with the smaller set

A^{'} : = {n \in ℕ : n, n + 2 arebothprime} \cup {n \in ℕ : n + 2, n + 6 arebothprime}

(146)

and hope to establish a bound of the form

\sum_{n} ν (n) 1_{A^{'}} (n) > 0

for a well-chosen function $ν : ℕ \to ℝ^{+}$ supported on [ x,2x], by deriving this bound from suitable (averaged) upper bounds on the discrepancies (142) and control on the main terms (143). If the arguments were sieve-theoretic in nature, then (as in the H₁≤6 case) one could then also deduce the lower bound

\sum_{n} ν (n) 1_{A^{'}} (n) ω (n) > 0

(147)

for any non-negative weight $ω : ℕ \to ℝ^{+}$ , provided that one had the same control on the weighted discrepancies (144) and weighted main terms (145) that one did on (142) and (143).

We apply this observation to the weight

\begin{align} ω (n) & : = (1 - λ (n) λ (n + 2)) (1 - λ (n + 2) λ (n + 6)) \\ = 1 - λ (n) λ (n + 2) - λ (n + 2) λ (n + 6) + λ (n) λ (n + 6) \end{align}

where λ(n):=(−1)^Ω(n) is the Liouville function. Observe that ω vanishes for any n∈A^′, and hence

\sum_{n} ν (n) 1_{A^{'}} (n) ω (n) = 0

(148)

for any ν. On the other hand, the ‘Möbius randomness law’ (see, e.g. [33]) predicts a significant amount of cancellation for any non-trivial sum involving the Möbius function μ or the closely related Liouville function λ. For instance, the expression

\sum_{x \leq n \leq 2 x : n = a (q)} λ (n + h)

is expected to be very small (of size^g $O (\frac{x}{q} \overset{- A}{log} x)$ for any fixed A) for any residue class a (q) with q≤x^1−ε, and any h∈{0,2,6}; similarly for more complicated expressions such as

\sum_{x \leq n \leq 2 x : n = a (q)} λ (n + 2) λ (n + 6)

or

\sum_{x \leq n \leq 2 x : n = a (q)} Λ (n) λ (n + 2) λ (n + 6)

or more generally

\sum_{x \leq n \leq 2 x : n = a (q)} f (n) λ (n + 2) λ (n + 6)

where f is a Dirichlet convolution α⋆β of the form considered in Claim 12. Similarly for expressions such as

\sum_{x \leq n \leq 2 x : n = a (q)} f (n) λ (n) λ (n + 2);

note from the complete multiplicativity of λ that (α⋆β)λ=(α λ)⋆(β λ), so if f is of the form in Claim 12, then f λ is also. In view of these observations (and similar observations arising from permutations of {0,2,6}), we conclude (heuristically, at least) that all the bounds that are believed to hold for (142) and (143) should also hold (up to minor changes in the implied constants) for (144) and (145). Thus, if the bound H₁≤4 could be proven in a sieve-theoretic fashion, one should be able to conclude the bound (147), which is in direct contradiction to (148).

Remark 45.

Similar arguments work for any set of the form

A_{H} : = {n \in ℕ : \exists n \leq p_{1} < p_{2} \leq n + H; p_{1}, p_{2} bothprime, p_{2} - p_{1} \leq 4}

and any fixed H>0, to prohibit any non-trivial lower bound on $\sum_{n} ν (n) 1_{A_{H}} (n)$ from sieve-theoretic methods. Indeed, one uses the weight

ω (n) : = \prod_{0 \leq i \leq i^{'} \leq H; (n + i, 3) = (n + i^{'}, 3) = 1; i^{'} - i \leq 4} (1 - λ (n + i) λ (n + i^{'}));

we leave the details to the interested reader. This seems to block any attempt to use any argument based only on the distribution of the prime numbers and related expressions in arithmetic progressions to prove H₁≤4.

The same arguments of course also prohibit a sieve-theoretic proof of the twin prime conjecture H₁=2. In this case, one can use the simpler weight ω(n)=1−λ(n)λ(n+2) to rule out such a proof, and the argument is essentially due to Selberg [7].

Of course, the parity barrier could be circumvented if one were able to introduce stronger sieve-theoretic axioms than the ‘linear’ axioms currently available (which only control sums of the form (142) or (143)). For instance, if one were able to obtain non-trivial bounds for ‘bilinear’ expressions such as

\sum_{x \leq n \leq 2 x} f (n) Λ (n + 2) = \sum_{d} \sum_{m} α (d) β (m) 1_{[x, 2 x]} (dm) Λ (dm + 2)

for functions f=α⋆β of the form in Claim 12, then (by a modification of the proof of Proposition 13) one would very likely obtain non-trivial bounds on

\sum_{x \leq n \leq 2 x} Λ (n) Λ (n + 2)

which would soon lead to a proof of the twin prime conjecture. Unfortunately, we do not know of any plausible way to control such bilinear expressions. (Note however that there are some other situations in which bilinear sieve axioms may be established, for instance in the argument of Friedlander and Iwaniec [40] establishing an infinitude of primes of the form a²+b⁴.)

Additional remarks

The proof of Theorem 16(xii) may be modified to establish the following variant:

Proposition 46.

Assume the generalized Elliott-Halberstam conjecture GEH[θ] for all 0<θ<1. Let 0<ε<1/2 be fixed. Then, if x is a sufficiently large multiple of 6, there exists a natural number n with ε x≤n≤(1−ε)x such that at least two of n,n−2,x−n are prime, and similarly if n−2 is replaced by n+2.

Note that if at least two of n,n−2,x−n are prime, then either n,n+2 are twin primes or else at least one of x,x−2 is expressible as the sum of two primes, and Theorem 5 easily follows.

Proof.

(Sketch) We just discuss the case of n−2, as the n+2 case is similar. Observe from the Chinese remainder theorem (and the hypothesis that x is divisible by 6) that one can find a residue class b (W) such that b,b−2,x−b are all coprime to W (in particular, one has b=1 (6)). By a routine modification of the proof of Lemma 18, it suffices to find a non-negative weight function $ν : ℕ \to ℝ^{+}$ and fixed quantities α>0 and β₁,β₂,β₃≥0, such that one has the asymptotic upper bound

\sum_{\begin{matrix} εx \leq n \leq (1 - ε) x \\ n = b (W) \end{matrix}} ν (n) \leq S (α + o (1)) B^{- k} \frac{(1 - 2 ε) x}{W},

the asymptotic lower bounds

\begin{align} \sum_{\begin{matrix} εx \leq n \leq (1 - ε) x \\ n = b (W) \end{matrix}} ν (n) θ (n) & \geq S (β_{1} - o (1)) B^{1 - k} \frac{(1 - 2 ε) x}{φ (W)} \\ \sum_{\begin{matrix} εx \leq n \leq (1 - ε) x \\ n = b (W) \end{matrix}} ν (n) θ (n + 2) & \geq S (β_{2} - o (1)) B^{1 - k} \frac{(1 - 2 ε) x}{φ (W)} \\ \sum_{\begin{matrix} εx \leq n \leq (1 - ε) x \\ n = b (W) \end{matrix}} ν (n) θ (x - n) & \geq S (β_{3} - o (1)) B^{1 - k} \frac{(1 - 2 ε) x}{φ (W)} \end{align}

and the inequality

β_{1} + β_{2} + β_{3} > 2 α,

where is the singular series

S : = \prod_{p | x (x - 2); p > w} \frac{p}{p - 1} .

We select ν to be of the form

ν (n) = {(\sum_{j = 1}^{J} c_{j} λ_{F_{j, 1}} (n) λ_{F_{j, 2}} (n + 2) λ_{F_{j, 3}} (x - n))}^{2}

for various fixed coefficients $c_{1}, \dots, c_{J} \in ℝ$ and fixed smooth compactly supported functions $F_{j, i} : [0, + \infty) \to ℝ$ with j=1,…,J and i=1,…,3. It is then routine^h to verify that analogues of Theorem 19 and Theorem 20 hold for the various components of ν, with the role of x in the right-hand side replaced by (1−2ε)x, and the claim then follows by a suitable modification of Theorem 28, taking advantage of the function F constructed in Theorem 29.

It is likely that the bounds in Theorem 4 can be improved further by refining the sieve-theoretic methods employed in this paper, with the exception of part (xii) for which the parity problem prevents further improvement, as discussed in the ‘The parity problem’ section. We list some possible avenues to such improvements as follows:

1.
In Theorem 27, the bound M _k,ε>4 was obtained for some ε>0 and k=50. It is possible that k could be lowered slightly, for instance to k=49, by further numerical computations, but we were only barely able to establish the k=50 bound after 2 weeks of computation. However, there may be a more efficient way to solve the required variational problem (e.g. by selecting a more efficient basis than the symmetric monomial basis) that would allow one to advance in this direction; this would improve the bound H ₁≤246 slightly. Extrapolation of existing numerics also raises the possibility that M ₅₃ exceeds 4, in which case the bound of 270 in Theorem 4(vii) could be lowered to 264.
2.
To reduce k (and thus H ₁) further, one could try to solve another variational problem, such as the one arising in Theorem 24 or in Theorem 28, rather than trying to lower bound M _k or M _k,ε. It is also possible to use the more complicated versions of MPZ[ ϖ,δ] established (in which the modulus q is assumed to be densely divisible rather than smooth) to replace the truncated simplex appearing in Theorem 24 with a more complicated region (such regions also appear implicitly in [§4.5]). However, in the medium-dimensional setting k≈50, we were not able to accurately and rapidly evaluate the various integrals associated to these variational problems when applied to a suitable basis of functions. One key difficulty here is that whereas polynomials appear to be an adequate choice of basis for the M _k, an analysis of the Euler-Lagrange equation reveals that one should use piecewise polynomial basis functions instead for more complicated variational problems such as the M _k,ε problem (as was done in the three-dimensional case in the ‘Three-dimensional cutoffs’ section), and these are difficult to work with in medium dimensions. From our experience with the low k problems, it looks like one should allow these piecewise polynomials to have relatively high degree on some polytopes and low degree on other polytopes, and vanish completely on yet further polytopesⁱ, but we do not have a systematic understanding of what the optimal placement of degrees should be.
3.
In Theorem 28, the function F was required to be supported in the simplex $\frac{k}{k - 1} \cdot R_{k}$ . However, one can consider functions F supported in other regions R, subject to the constraint that all elements of the sumset R+R lie in a region treatable by one of the cases of Theorem 20. This could potentially lead to other optimization problems that lead to superior numerology, although again it appears difficult to perform efficient numerics for such problems in the medium k regime k≈50. One possibility would be to adopt a ‘free boundary’ perspective, in which the support of F is not fixed in advance, but is allowed to evolve by some iterative numerical scheme.
4.
To improve the bounds on H _m for m=2,3,4,5, one could seek a better lower bound on M _k than the one provided by Theorem 40; one could also try to lower bound more complicated quantities such as M _k,ε.
5.
One could attempt to improve the range of ϖ,δ for which estimates of the form MPZ[ ϖ,δ] are known to hold, which would improve the results of Theorem 4(ii)-(vi). For instance, we believe that the condition 600ϖ+180δ<7 in Theorem 11 could be improved slightly to 1,080ϖ+330δ<13 by refining the arguments, but this requires a hypothesis of square root cancellation in a certain four-dimensional exponential sum over finite fields, which we have thus far been unable to establish rigorously. Another direction to pursue would be to improve the δ parameter, or to otherwise relax the requirement of smoothness in the moduli, in order to reduce the need to pass to a truncation of the simplex $R_{k}$ , which is the primary reason why the m=1 results are currently unable to use the existing estimates of the form MPZ[ ϖ,δ]. Another speculative possibility is to seek MPZ[ ϖ,δ] type estimates which only control distribution for a positive proportion of smooth moduli, rather than for all moduli, and then to design a sieve ν adapted to just that proportion of moduli (cf. [41]). Finally, there may be a way to combine the arguments currently used to prove MPZ[ ϖ,δ] with the automorphic forms (or ‘Kloostermania’) methods used to prove nontrivial equidistribution results with respect to a fixed modulus, although we do not have any ideas on how to actually achieve such a combination.
6.
It is also possible that one could tighten the argument in Lemma 18, for instance by establishing a non-trivial lower bound on the portion of the sum $\sum_{n} ν (n)$ when n+h ₁,…,n+h _k are all composite, or a sufficiently strong upper bound on the pair correlations $\sum_{n} θ (n + h_{i}) θ (n + h_{j})$ (see [9], §6] for a recent implementation of this latter idea). However, our preliminary attempts to exploit these adjustements suggested that the gain from the former idea would be exponentially small in k, whereas the gain from the latter would also be very slight (perhaps reducing k by O(1) in large k regimes, e.g. k≥5,000).
7.
All of our sieves used are essentially of Selberg type, being the square of a divisor sum. We have experimented with a number of non-Selberg type sieves (for instance trying to exploit the obvious positivity of $1 - \sum_{p \leq x : p | n} \frac{log p}{log x}$ when n≤x); however, none of these variants offered a numerical improvement over the Selberg sieve. Indeed it appears that after optimizing the cutoff function F, the Selberg sieve is in some sense a ‘local maximum’ in the space of non-negative sieve functions, and one would need a radically different sieve to obtain numerically superior results.
8.
Our numerical bounds for the diameter H(k) of the narrowest admissible k-tuple are known to be exact for k≤342, but there is scope for some slight improvement for larger values of k, which would lead to some improvements in the bounds on H _m for m=2,3,4,5. However, we believe that our bounds on H _m are already fairly close (e.g. within 10%) of optimal, so there is only a limited amount of gain to be obtained solely from this component of the argument.

Narrow admissible tuples

In this section, we outline the methods used to obtain the numerical bounds on H(k) given by Theorem 17, which are reproduced below:

1.
H(3)=6,
2.
H(50)=246,
3.
H(51)=252,
4.
H(54)=270,
5.
H(5,511)≤52,116,
6.
H(35,410)≤398,130,
7.
H(41,588)≤474,266,
8.
H(309,661)≤4,137,854,
9.
H(1,649,821)≤24,797,814,
10.
H(75,845,707)≤1,431,556,072,
11.
H(3,473,955,908)≤80,550,202,480.

10.1 H(k) values for small k

The equalities in the first four bounds (1)-(4) were previously known. The case H(3)=6 is obvious: the admissible 3-tuples (0,2,6) and (0,4,6) have diameter 6 and no 3-tuple of smaller diameter is admissible. The cases H(50)=246, H(51)=252, and H(54)=270 follow from results of Clark and Jarvis [42]. They define ϱ^∗(x) to be the largest integer k for which there exists an admissible k-tuple that lies in a half-open interval (y,y+x] of length x. For each integer k>1, the largest x for which ϱ^∗(x)=k is precisely H(k+1). Table 1 of [42] lists these largest x values for 2≤k≤170, and we find that H(50)=246, H(51)=252, and H(54)=270. Admissible tuples that realize these bounds are shown in Subsubsections “Admissible 50-tuple realizing H(50) = 246”, “Admissible 51-tuple realizing H(51) = 252” and “Admissible 54-tuple realizing H(54) = 270”.

10.1.1 Admissible 50-tuple realizing H(50) = 246

10.1.2 Admissible 51-tuple realizing H(51) = 252

10.1.3 Admissible 54-tuple realizing H(54) = 270

10.2 H(k) bounds for mid-range k

As previously noted, exact values for H(k) are known only for k≤342. The upper bounds on H(k) for the five cases (5)-(9) were obtained by constructing admissible k-tuples using techniques developed during the first part of the Polymath8 project. These are described in detail in section 3 of [4], but for the sake of completeness, we summarize the most relevant methods here.

10.2.1 Fast admissibility testing

A key component of all our constructions is the ability to efficiently determine whether a given k-tuple $ℋ = (h_{1}, \dots, h_{k})$ is admissible. We say that is admissible modulo p if its elements do not form a complete set of residues modulo p. Any k-tuple is automatically admissible modulo all primes p>k, since a k-tuple cannot occupy more than k residue classes; thus, we only need to test admissibility modulo primes p<k.

A simple way to test admissibility modulo p is to enumerate the elements of modulo p and keep track of which residue classes have been encountered in a table with p boolean-valued entries. Assuming the elements of have absolute value bounded by O(k logk) (true of all the tuples we consider), this approach yields a total bit-complexity of O(k²/ logk M(logk)), where M(n) denotes the complexity of multiplying two n-bit integers, which, up to a constant factor, also bounds the complexity of division with remainder. Applying the Schönhage-Strassen bound M(n)=O(n logn log logn) from [43], this is O(k² log logk log log logk), essentially quadratic in k.

This approach can be improved by observing that for most of the primes p<k, there are likely to be many unoccupied residue classes modulo p. In order to verify admissibility at p, it is enough to find one of them, and we typically do not need to check them all in order to do so. Using a heuristic model that assumes the elements of are approximately equidistributed modulo p, one can determine a bound m<p such that k random elements of $ℤ / pℤ$ are unlikely to occupy all of the residue classes in [ 0,m]. By representing the k-tuple as a boolean vector $ℬ = (b_{0}, \dots, b_{h_{k} - h_{1}})$ in which b_i=1 if and only if i=h_j−h₁ for some $h_{j} \in ℋ$ , we can efficiently test whether occupies every residue class in [ 0,m] by examining the the entries

b_{0}, \dots, b_{m}, b_{p}, \dots, b_{p + m}, b_{2 p}, \dots, b_{2 p + m}, \dots

of . The key point is that when p<k is large, say p>(1+ε)k/ logk, we can choose m so that we only need to examine a small subset of the entries in . Indeed, for primes p>k/c (for any constant c), we can take m=O(1) and only need to examine O(logk) elements of (assuming its total size is O(k logk), which applies to all the tuples we consider here).

Of course it may happen that occupies every residue class in [ 0,m] modulo p. In this case, we revert to our original approach of enumerating the elements of modulo p, but we expect this to happen for only a small proportion of the primes p<k. Heuristically, this reduces the complexity of admissibility testing by a factor of O(logk), making it sub-quadratic. In practice, we find this approach to be much more efficient than the straightforward method when k is large (see [§3.1] for further details.

10.2.2 Sieving methods

Our techniques for constructing admissible k-tuples all involve sieving an integer interval [ s,t] of residue classes modulo primes p<k and then selecting an admissible k-tuple from the survivors. There are various approaches one can take, depending on the choice of interval and the residue classes to sieve. We list four of these below, starting with the classical sieve of Eratosthenes and proceeding to more modern variations.

Sieve of Eratosthenes. We sieve an interval [ 2,x] to obtain admissible k-tuples

p_{m + 1}, \dots, p_{m + k} .

with m as small as possible. If we sieve the the residue class 0(p) for all primes p≤k, we have m=π(k) and p_m+1>k. In this case, no admissibility testing is required, since the residue class 0(p) is unoccupied for all p≤k. Applying the Prime Number Theorem in the forms

\begin{align} p_{k} & = k log k + k log log k - k + O (k \frac{log log k}{log k}), \\ π (x) & = \frac{x}{log x} + O (\frac{x}{\overset{2}{log} x}), \end{align}

this construction yields the upper bound

H (k) \leq k log k + k log log k - k + o (k) .

(149)

As an optimization, rather than sieving modulo every prime p≤k, we instead sieve modulo increasing primes p and as soon as the first k survivors form an admissible tuple. This will typically happen for for some p_m<k.

Hensley-Richards sieve. The bound in (149) was improved by Hensley and Richards [44]-[46], who observed that rather than sieving [ 2,x] it is better to sieve the interval [ −x/2,x/2] to obtain admissible k-tuples of the form

- p_{m + ⌊ k / 2 ⌋ - 1}, \dots, p_{m + 1}, \dots, - 1, 1, \dots, p_{m + 1}, \dots, p_{m + ⌊ (k + 1) / 2 ⌋ - 1},

where we again wish to make m as small as possible. It follows from Lemma 5 of [45] that one can take m=o(k/ logk), leading to the improved upper bound

H (k) \leq k log k + k log log k - (1 + log 2) k + o (k) .

(150)

Shifted Schinzel sieve. As noted by Schinzel in [47], in the Hensley-Richards sieve, it is slightly better to sieve 1(2) rather than 0(2); this leaves unsieved powers of 2 near the center of the interval [ −x/2,x/2] that would otherwise be removed (more generally, one can sieve 1(p) for many small primes p, but we did not). Additionally, we find that shifting the interval [ −x/2,x/2] can yield significant improvements (one can also view this as changing the choices of residue classes).

This leads to the following approach: we sieve an interval [ s,s+x] of odd integers and multiples of odd primes p≤p_m, where x is large enough to ensure at least k survivors, and m is large enough to ensure that the survivors form an admissible tuple, with x and m minimal subject to these constraints. A tuple of exactly k survivors is then chosen to minimize the diameter. By varying s and comparing the results, we can choose a starting point s∈[ −x/2,x/2] that yields the smallest final diameter. For large k, we typically find s≈k is optimal, as opposed to s≈−(k/2) logk in the Hensley-Richards sieve.

Shifted greedy sieve. As a further optimization, we can allow greater freedom in the choice of residue class to sieve. We begin as in the shifted Schinzel sieve, but for primes p≤p_m that exceed $2 \sqrt{k log k}$ , rather than sieving 0(p), we choose a minimally occupied residue class a(p). As above, we sieve the interval [ s,s+x] for varying values of s∈[ −x/2,x/2] and select the best result, but unlike the shifted Schinzel sieve, for large k, we typically choose s≈−(k/ logk−k)/2.

We remark that while one might suppose that it would be better to choose a minimally occupied residue class at all primes, not just the larger ones, we find that this is generally not the case. Fixing a structured choice of residue classes for the small primes avoids the erratic behavior that can result from making greedy choices to soon (see [48], Fig. 1] for an illustration of this).

Table 4 lists the bounds obtained by applying each of these techniques (in the online version of this paper, each table entry includes a link to the constructed tuple). To the admissible tuples obtained using the shifted greedy sieve, we additionally applied various local optimizations that are detailed in ([§3.6]). As can be seen in the table, the additional improvement due to these local optimizations is quite small compared to that gained by using better sieving algorithms, especially when k is large.

Table 4 Upper bounds on H ( k ) for selected values of k

Full size table

Table 4 also lists the value ⌊k logk+k⌋ that we conjecture as an upper bound on H(k) for all sufficiently large k.

10.3 H(k) bounds for large k

The upper bounds on H(k) for the last two cases (10) and (11) were obtained using modified versions of the techniques described above that are better suited for handling very large values of k. These entail three types of optimizations that are summarized in the subsections below.

10.3.1 Improved time complexity

As noted above, the complexity of admissibility testing is quasi-quadratic in k. Each of the techniques listed in the ‘H(k) bounds for mid-range k’ section involves optimizing over a parameter space whose size is at least quasi-linear in k, leading to an overall quasi-cubic time complexity for constructing a narrow admissible k-tuple; this makes it impractical to handle k>10⁹. We can reduce this complexity in a number of ways.

First, we can combine parameter optimization and admissibility testing. In both the sieve of Eratosthenes and Hensley-Richards sieves, taking m=k guarantees an admissible k-tuple. For m<k, if the corresponding k-tuple is inadmissible, it is typically because it is inadmissible modulo the smallest prime p_m+1 that appears in the tuple. This suggests a heuristic approach in which we start with m=k, and then iteratively reduce m, testing the admissibility of each k-tuple modulo p_m+1 as we go, until we can proceed no further. We then verify that the last k-tuple that was admissible modulo p_m+1 is also admissible modulo all primes p>p_m+1 (we know it is admissible at all primes p≤p_m because we have sieved a residue class for each of these primes). We expect this to be the case, but if not we can increase m as required. Heuristically, this yields a quasi-quadratic running time, and in practice, it takes less time to find the minimal m than it does to verify the admissibility of the resulting k-tuple.

Second, we can avoid a complete search of the parameter space. In the case of the shifted Schinzel sieve, for example, we find empirically that taking s=k typically yields an admissible k-tuple whose diameter is not much larger than that achieved by an optimal choice of s; we can then simply focus on optimizing m using the strategy described above. Similar comments apply to the shifted greedy sieve.

10.3.2 Improved space complexity

We expect a narrow admissible k-tuple to have diameter d=(1+o(1))k logk. Whether we encode this tuple as a sequence of k integers, or as a bitmap of d+1 bits, as in the fast admissibility testing algorithm, we will need approximately k logk bits. For k>10⁹, this may be too large to conveniently fit in memory. We can reduce the space to O(k log logk) bits by encoding the k-tuple as a sequence of k−1 gaps; the average gap between consecutive entries has size logk and can be encoded in O(log logk) bits. In practical terms, for the sequences we constructed, almost all gaps can be encoded using a single 8-bit byte for each gap.

One can further reduce space by partitioning the sieving interval into windows. For the construction of our largest tuples, we used windows of size $O (\sqrt{d})$ and converted to a gap-sequence representation only after sieving at all primes up to an $O (\sqrt{d})$ bound.

10.3.3 Parallelization

With the exception of the greedy sieve, all the techniques described above are easily parallelized. The greedy sieve is more difficult to parallelize because the choice of a minimally occupied residue class modulo p depends on the set of survivors obtained after sieving modulo primes less than p. To address this issue, we modified the greedy approach to work with batches of consecutive primes of size n, where n is a multiple of the number of parallel threads of execution. After sieving fixed residue classes modulo all small primes $p < 2 \sqrt{k log k}$ , we determine minimally occupied residue classes for the next n primes in parallel, sieve these residue classes, and then proceed to the next batch of n primes.

In addition to the techniques described above, we also considered a modified Schinzel sieve in which we check admissibility modulo each successive prime p before sieving multiples of p, in order to verify that sieving modulo p is actually necessary. For values of p close to but slightly less than p_m, it will often be the case that the set of survivors is already admissibility modulo p, even though it does contain multiples of p (because some other residue class is unoccupied). As with the greedy sieve, when using this approach, we sieve residue classes in batches of size n to facilitate parallelization.

10.3.4 Results for large k

Table 5 lists the bounds obtained for the two largest values of k. For k=75,845,707, the best results were obtained with a shifted greedy sieve that was modified for parallel execution as described above, using the fixed shift parameter s=−(k logk−k)/2. A list of the sieved residue classes is available at http://math.mit.edu/~drew/n75845707_1431556072.txt. This file contains values of k, s, d, and m, along with a list of prime indices n_i>m and residue classes r_i such that sieving the interval [ s,s+d] of odd integers, multiples of p_n for 1<n≤m, and at r_i modulo $p_{n_{i}}$ yields an admissible k-tuple.

Table 5 Upper bounds on H ( k ) for selected values of k

Full size table

For k=3,473,955,908, we did not attempt any form of greedy sieving due to practical limits on the time and computational resources available. The best results were obtained using a modified Schinzel sieve that avoids unnecessary sieving, as described above, using the fixed shift parameter s=k 0. A list of the sieved residue classes is available at http://math.mit.edu/~drew/n75845707_1431556072.txt.

This file contains values of k, s, d, and m, along with a list of prime indices n_i>m such that sieving the interval [ s,s+d] of odd integers, multiples of p_n for 1<n≤m, and multiples of $p_{n_{i}}$ yields an admissible k-tuple.

Source code for our implementation is available at http://math.mit.edu/~drew/ompadm_v0.5.tar; this code can be used to verify the admissibility of both the tuples listed above.

Endnotes

^a When a,b are real numbers, we will also need to use (a,b) and [ a,b] to denote the open and closed intervals, respectively, with endpoints a,b. Unfortunately, this notation conflicts with the notation given above, but it should be clear from the context which notation is in use.

^b Actually, there are some differences between Conjecture 1 of [28] and the claim here. Firstly, we need an estimate that is uniform for all a, whereas in [28] only the case of a fixed modulus a was asserted. On the other hand, α,β were assumed to be controlled in ℓ² instead of via the pointwise bounds (6), and Q was allowed to be as large as x log−C x for some fixed C (although, in view of the negative results in [23],[24], this latter strengthening may be too ambitious).

^c One could also use the Heath-Brown identity [49] here if desired.

^d In the k=1 case, we of course just have $q_{W, d_{1}, \dots, d_{k - 1}^{'}} = W$ .

^e One could obtain a small improvement to the bounds here by replacing the threshold 2c with a parameter to be optimized over.

^f The arguments in [5] are rigorous under the assumption of a positive eigenfunction as in Corollary 35, but the existence of such an eigenfunction remains open for k≥3.

^g Indeed, one might be even more ambitious and conjecture a square-root cancellation $⪻ ⪻ \sqrt{x / q}$ for such sums (see [50] for some similar conjectures), although such stronger cancellations generally do not play an essential role in sieve-theoretic computations.

^h One new technical difficulty here is that some of the various moduli $[d_{j}, d_{j}^{'}]$ arising in these arguments are not required to be coprime at primes p>w dividing x or x−2; this requires some modification to Lemma 30 that ultimately leads to the appearance of the singular series . However, these modifications are quite standard, and we do not give the details here.

ⁱ In particular, the optimal choice F for M_k,ε should vanish on the polytope ${(t_{1}, \dots, t_{k}) \in (1 + ε) \cdot R_{k} : \sum_{i \neq i_{0}} t_{i} \geq 1 - ε forall i_{0} = 1, \dots, k}$ .

References

Hardy GH, Littlewood JE: Some problems of “Partitio Numerorum”, III: on the expression of a number as a sum of primes. Acta Math 1923, 44: 1–70. 10.1007/BF02403921
Article MATH MathSciNet Google Scholar
Goldston D, Pintz J, Yıldırım C: Primes in tuples. I. Ann. Math 2009,170(2):819–862. 10.4007/annals.2009.170.819
Article MATH Google Scholar
Zhang Y: Bounded gaps between primes. Annals Math 2014, 179: 1121–1174. 10.4007/annals.2014.179.3.7
Article MATH Google Scholar
Polymath, DHJ: New equidistribution estimates of Zhang type, and bounded gaps between primes., [http://arxiv.org/abs/1402.0811v2]
Maynard, J: Small gaps between primes. Annals Math. to appear.
Granville, A: Bounded gaps between primes, preprint.
Selberg A: On elementary methods in prime number-theory and their limitations. In Proc. 11th Scand. Math. Cong. Trondheim (1949), Collected Works, vol. I,. Springer-Verlag, Berlin-Göttingen-Heidelberg; 1989.
Google Scholar
Pintz J: The bounded gap conjecture and bounds between consecutive Goldbach numbers. Acta Arith 2012,155(4):397–405. 10.4064/aa155-4-4
Article MATH MathSciNet Google Scholar
Banks, WD, Freiberg, T, Maynard, J: On limit points of the sequence of normalized prime gaps, preprint.
Banks, WD, Freiberg, T, Turnage-Butterbaugh, CL: Consecutive primes in tuples, preprint.
Benatar, J: The existence of small prime gaps in subsets of the integers, preprint.
Castillo, A, Hall, C, Lemke Oliver, RJ, Pollack, P, Thompson, L: Bounded gaps between primes in number fields and function fields, preprint.
Chua, L, Park, S, Smith, G. D: Bounded gaps between primes in special sequences, preprint.
Freiberg, T: A note on the theorem of Maynard and Tao, preprint.
Li, H, Pan, H: Bounded gaps between primes of the special form, preprint.
Maynard, J: Dense clusters of primes in subsets, preprint.
Pintz, J: On the ratio of consecutive gaps between primes, preprint.
Pintz, J: On the distribution of gaps between consecutive primes, preprint.
Pollack, P: Bounded gaps between primes with a given primitive root, preprint.
Pollack, P, Thompson, L: Arithmetic functions at consecutive shifted primes, preprint.
Thorner, J: Bounded Gaps Between Primes in Chebotarev Sets, preprint.
Elliott PDTA, Halberstam H: A conjecture in prime number theory. Symp. Math 1968, 4: 59–72.
Google Scholar
Friedlander, J, Granville, A: Relevance of the residue class to the abundance of primes. In: Proceedings of the Amalfi Conference on Analytic Number Theory (Maiori, 1989), pp. 95–103. Univ. Salerno, Salerno (1992)
Friedlander J, Granville A, Hildebrand A, Maier H: Oscillation theorems for primes in arithmetic progressions and for sifting functions. J. Amer. Math. Soc 1991,4(1):25–86. 10.1090/S0894-0347-1991-1080647-5
Article MATH MathSciNet Google Scholar
Bombieri, E: Le Grand Crible dans la Théorie Analytique des Nombres (Seconde ed.)Astérisque. 18 (1987).
Vinogradov AI: The density hypothesis for Dirichlet L-series. Izv. Akad. Nauk SSSR Ser. Mat. (in Russian) 1956, 29: 903–934.
Google Scholar
Motohashi Y, Pintz J: A smoothed GPY sieve. Bull. Lond. Math. Soc 2008,40(2):298–310. 10.1112/blms/bdn023
Article MATH MathSciNet Google Scholar
Bombieri E, Friedlander J, Iwaniec H: Primes in arithmetic progressions to large moduli. Acta Math 1986,156(3–4):203–251. 10.1007/BF02399204
Article MATH MathSciNet Google Scholar
Vaughan RC: Sommes trigonométriques sur les nombres premiers. C. R. Acad. Sci. Paris Sér. A 1977, 285: 981–983.
MATH MathSciNet Google Scholar
Autour du théorème de Bombieri-Vinogradov Acta Math 1984,152(3–4):219–244. 10.1007/BF02392198
Fouvry É, Iwaniec H: Primes in arithmetic progressions. Acta Arith 1983,42(2):197–218.
MATH MathSciNet Google Scholar
Siebert, H: Einige Analoga zum Satz von Siegel-Walfisz. In: Zahlentheorie (Tagung, Math. Forschungsinst., Oberwolfach, 1970), pp. 173–184. Bibliographisches Inst., Mannheim (1971).
Iwaniec, H, Kowalski, E: Analytic number theory, Vol. 53. American Mathematical Society Colloquium Publications (2004).
Motohashi Y: An induction principle for the generalization of Bombieri’s prime number theorem. Proc. Japan Acad 1976, 52: 273–275. 10.3792/pja/1195518296
Article MATH MathSciNet Google Scholar
Pintz, J: Polignac Numbers, Conjectures of Erdős on Gaps between Primes, Arithmetic Progressions in Primes, and the Bounded Gap Conjecture, preprint.
Soundararajan K: Small gaps between prime numbers: the work of Goldston-Pintz-Yıldırım. Bull. Amer. Math. Soc. (N.S.) 2007,44(1):1–18. 10.1090/S0273-0979-06-01142-6
Article MATH MathSciNet Google Scholar
Goldston D, Yıldırım C: Higher correlations of divisor sums related to primes. I. Triple correlations. Integers 2003,3(A5):66.
Google Scholar
Pintz J: Are there arbitrarily long arithmetic progressions in the sequence of twin primes? An irregular mind. Szemerédi is 70. Bolyai Soc. Math. Stud. Springer 2010, 21: 525–559. 10.1007/978-3-642-14444-8_15
Article MathSciNet Google Scholar
Farkas, B, Pintz, J, Révész, S: On the optimal weight function in the Goldston-Pintz-Yıldırım method for finding small gaps between consecutive primes. In: Paul Turán Memorial Volume: Number Theory, Analysis and Combinatorics. de Gruyter, Berlin (2013).
Friedlander, J, Iwaniec, H: The polynomial X²+Y⁴ captures its primes. Ann. Math. 148(3), 945–1040 (1998).
Fouvry E: Théorḿe de Brun-Titchmarsh: application au théorème de Fermat. Invent. Math 1985,79(2):383–407. 10.1007/BF01388980
Article MATH MathSciNet Google Scholar
Clark D, Jarvis N: Dense admissible sequences. Math. Comp 2001,70(236):1713–1718. 10.1090/S0025-5718-01-01348-5
Article MATH MathSciNet Google Scholar
Schönhage A, Strassen V: Schnelle Multiplikation großer Zahlen. Comput. Arch. Elektron. Rechnen 1971, 7: 281–292.
MATH Google Scholar
Hensley, D, Richards, I: On the incompatibility of two conjectures concerning primes. In: Analytic Number Theory (Proc. Sympos. Pure Math., Vol. XXIV, St. Louis Univ., St. Louis, Mo. 1972), pp. 123–127. Amer. Math. Soc., Providence, R.I. (1973).
Hensley, D, Richards, I: Primes in intervals. Acta Arith. 25(1973/74), 375–391.
Richards I: On the incompatibility of two conjectures concerning primes; a discussion of the use of computers in attacking a theoretical problem. Bull. Amer. Math. Soc 1974, 80: 419–438. 10.1090/S0002-9904-1974-13434-8
Article MATH MathSciNet Google Scholar
Schinzel A: Remarks on the paper “Sur certaines hypothèses concernant les nombres premiers”. Acta Arith 1961/1962, 7: 1–8.
MathSciNet Google Scholar
Gordon, D, Rodemich, G: Dense admissible sets. In: Algorithmic Number Theory (Portland, OR, 1998), Lecture Notes in Comput. Sci, pp. 216–225. Springer, Berlin (1998). Gordon, D, Rodemich, G: Dense admissible sets. In: Algorithmic Number Theory (Portland, OR, 1998), Lecture Notes in Comput. Sci, pp. 216–225. Springer, Berlin (1998).
Heath-Brown DR: Prime numbers in short intervals and a generalized Vaughan identity. Canad. J. Math 1982,34(6):1365–1377. 10.4153/CJM-1982-095-9
Article MATH MathSciNet Google Scholar
Montgomery HL: Topics in Multiplicative Number Theory, volume 227. Lecture Notes in Math. Springer, New York; 1971.
Book Google Scholar

Download references

Acknowledgements

This paper is part of the Polymath project, which was launched by Timothy Gowers in February 2009 as an experiment to see if research mathematics could be conducted by a massive online collaboration. The current project (which was administered by Terence Tao) is the eighth project in this series, and this is the second paper arising from that project. Further information on the Polymath project can be found on the web site http://michaelnielsen.org/polymath1. Information about this specific project may be found at http://michaelnielsen.org/polymath1/index.php?title=Bounded_gaps_between_primes, and a full list of participants and their grant acknowledgments may be found at http://michaelnielsen.org/polymath1/index.php?title=Polymath8_grant_acknowledgments. We thank Thomas Engelsma for supplying us with his data on narrow admissible tuples and Henryk Iwaniec for useful suggestions. We also thank the anonymous referees for some suggestions in improving the content and exposition of the paper.

Author information

Authors and Affiliations

http://michaelnielsen.org/polymath1/index.php
DHJ Polymath

Authors

DHJ Polymath
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to DHJ Polymath.

Additional information

An erratum to this article is available at http://dx.doi.org/10.1186/s40687-015-0033-x.

Authors’ original submitted files for images

Below are the links to the authors’ original submitted files for images.

Authors’ original file for figure 1

Authors’ original file for figure 2

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0), which permits use, duplication, adaptation, distribution, and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Reprints and permissions

About this article

Cite this article

Polymath, D. Variants of the Selberg sieve, and bounded intervals containing many primes. Mathematical Sciences 1, 12 (2014). https://doi.org/10.1186/s40687-014-0012-7

Download citation

Received: 18 July 2014
Accepted: 19 July 2014
Published: 17 October 2014
DOI: https://doi.org/10.1186/s40687-014-0012-7

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Variants of the Selberg sieve, and bounded intervals containing many primes

Abstract

Similar content being viewed by others

Bounded intervals containing many primes

A higher rank Selberg sieve and applications

Large prime gaps and probabilistic models

Background

Theorem 1(GPY theorem).

Theorem 2(Zhang’s theorem).

Theorem 3(Maynard’s theorem).

Theorem 4.

Theorem 5(Disjunction).

1.1 Organization of the paper

1.2 Notation

Definition 6(Asymptotic notation).

Distribution estimates on arithmetic functions

Definition 7(Discrepancy).

Claim 8(Elliott-Halberstam conjecture, EH[ 𝜗]).

Theorem 9(Bombieri-Vinogradov theorem).

Claim 10(Motohashi-Pintz-Zhang estimate, MPZ[ ϖ,δ]).

Theorem 11.

Claim 12(Generalized Elliott-Halberstam conjecture, GEH[ 𝜗]).

Proposition 13.

Proof.

Theorem 14(Generalized Bombieri-Vinogradov theorem).

Outline of the key ingredients

Claim 15(Weak Dickson-Hardy-Littlewood conjecture, DHL[ k;j]).

Theorem 16.

Theorem 17(Bounds on H(k)).

Lemma 18(Criterion for DHL).

Proof.

Theorem 19(Asymptotic for prime sums).

Theorem 20(Asymptotic for non-prime sums).

Remark 21.

Theorem 22(Sieving on the standard simplex).

Theorem 23(Lower bounds on M k ).

Theorem 24(Sieving on a truncated simplex).

Theorem 25(Lower bounds on M k [ α ] ).

Theorem 26(Sieving on an epsilon-enlarged simplex).

Theorem 27(Lower bounds on Mk,ε).

Theorem 28(Going beyond the epsilon enlargement).

Theorem 29(A piecewise polynomial cutoff).

Multidimensional Selberg sieves

Lemma 30(Asymptotic).

Proof.

4.1 The trivial case

4.2 The Elliott-Halberstam case

4.3 The Motohashi-Pintz-Zhang case

4.4 Crude estimates on divisor sums

Proposition 14(Almost primality).

Proof.

Remark 32.

4.5 The generalized Elliott-Halberstam case

Lemma 33.

Proof.

Reduction to a variational problem

5.1 Proof of Theorem 22

5.2 Proof of Theorem 24

5.3 Proof of Theorem 26

5.4 Proof of Theorem 28

Asymptotic analysis

Lemma 34(Cauchy-Schwarz).

Proof.

Corollary 35.

Proof.

Corollary 36(Computation of M2).

Proof.

Corollary 37.

Proof.

Proposition 38.

Proof.

Remark 39.

Theorem 40.

Proof.

The case of small and medium dimension

7.1 Bounding M k for medium k

Lemma 41.

Proof.

Lemma 42(Beta function identity).

Proof.

Theorem 23(Lower bounds on M_k).

Theorem 25(Lower bounds on $M_{k}^{[α]}$ ).

Theorem 27(Lower bounds on M_k,ε).

Corollary 36(Computation of M₂).

7.1 Bounding M_kfor medium k

7.2 Bounding M_k,εfor medium k

7.3 Bounding M_4,ε