Abstract
Let \(a_0\in \{0,\ldots ,9\}\). We show there are infinitely many prime numbers which do not have the digit \(a_0\) in their decimal expansion. The proof is an application of the Hardy–Littlewood circle method to a binary problem, and rests on obtaining suitable ‘Type I’ and ‘Type II’ arithmetic information for use in Harman’s sieve to control the minor arcs. This is obtained by decorrelating Diophantine conditions which dictate when the Fourier transform of the primes is large from digital conditions which dictate when the Fourier transform of numbers with restricted digits is large. These estimates rely on a combination of the geometry of numbers, the large sieve and moment estimates obtained by comparison with a Markov process.
1 Introduction
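Let \(a_0\in \{0,\ldots ,9\}\) and let
\[\mathcal {A}_1=\Bigl\{\sum _{0\le i\le k}n_i10^i:\, n_i\in \{0,\ldots ,9\}\backslash \{a_0\},\,k\ge 0\Bigr\}\]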
be the set of numbers which have no digit equal to \(a_0\) in their decimal expansion. The number of elements of \(\mathcal {A}_1\) which are less than x is \(O(x^{1-c})\), where \(c=\log {(10/9)}/\log {10}\approx 0.046>0\). In particular, \(\mathcal {A}_1\) is a sparse subset of the natural numbers. A set being sparse in this way presents several analytic difficulties if one tries to answer arithmetic questions such as whether the set contains infinitely many primes. Typically we can only show that sparse sets contain infinitely many primes when the set in question possesses some additional multiplicative structure.
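The sparsity claim can be checked numerically on small ranges. The following is a minimal sketch, assuming a brute-force digit test is acceptable at this scale; the function name `count_missing_digit` is illustrative and does not come from the paper.

```python
import math


def count_missing_digit(X, a0):
    """Count integers 0 <= n < X whose decimal expansion avoids the digit a0."""
    forbidden = str(a0)
    return sum(1 for n in range(X) if forbidden not in str(n))


if __name__ == "__main__":
    # Exponent 1 - c, where c = log(10/9)/log(10) as in the text.
    exponent = math.log(9) / math.log(10)
    for k in range(1, 6):
        X = 10 ** k
        count = count_missing_digit(X, 7)
        # Print k, the count, and the count normalized by X^(log 9 / log 10).
        print(k, count, round(count / X ** exponent, 4))
```

For a nonzero excluded digit and X a power of 10 the count is exactly a power of 9, which is the source of the exponent \(\log {9}/\log {10}\).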
The set \(\mathcal {A}_1\) has unusually nice structure in that its Fourier transform has a convenient explicit analytic description, and is often unusually small in size. There has been much previous work [1, 2, 4, 5, 6, 11, 13] studying \(\mathcal {A}_1\) and related sets by exploiting this Fourier structure. In particular the work of Dartyge and Mauduit [7, 8] shows the existence of infinitely many integers in \(\mathcal {A}_1\) with at most 2 prime factors, this result relying on the fact that \(\mathcal {A}_1\) is well-distributed in arithmetic progressions [7, 12, 16]. We also mention the related work of Mauduit and Rivat [17] who showed the sum of digits of primes is well-distributed, and the work of Bourgain [3] which showed the existence of primes in the sparse set created by prescribing a positive proportion of the binary digits.
We show that there are infinitely many primes in \(\mathcal {A}_1\). Our proof is based on a combination of the circle method, Harman’s sieve, the method of bilinear sums, the large sieve, the geometry of numbers and a comparison with a Markov process. In particular, we make key use of the Fourier structure of \(\mathcal {A}_1\), in the same spirit as the aforementioned works. Somewhat surprisingly, the Fourier structure allows us to successfully apply the circle method to a binary problem.
Theorem 1.1
Let \(X\ge 4\) and \(\mathcal {A}=\{\sum _{0\le i\le k}n_i10^i< X:\, n_i\in \{0,\ldots ,9\}\backslash \{a_0\},\,k\ge 0\}\) be the set of numbers less than X with no digit in their decimal expansion equal to \(a_0\). Then we have
\[\#\{p\in \mathcal {A}\}\asymp \frac{\#\mathcal {A}}{\log {X}}\asymp \frac{X^{\log {9}/\log {10}}}{\log {X}}.\]
Here, and throughout the paper, \(f\asymp g\) means that there are absolute constants \(c_1,c_2>0\) such that \(c_1f<g<c_2f\).
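Thus there are infinitely many primes with no digit \(a_0\) when written in base 10. Since \(\#\mathcal {A}/X^{\log {9}/\log {10}}\) oscillates as \(X\rightarrow \infty \), we cannot expect an asymptotic formula of the form \((c+o(1))X^{\log {9}/\log {10}}/\log {X}\). Nonetheless, we expect that
\[\#\{p\in \mathcal {A}\}=(\kappa _\mathcal {A}+o(1))\frac{\#\mathcal {A}}{\log {X}},\]
where
\[\kappa _\mathcal {A}=\begin{cases}\dfrac{10(\phi (10)-1)}{9\phi (10)}, &\text{if }\gcd (10,a_0)=1,\\[1ex] \dfrac{10}{9}, &\text{otherwise.}\end{cases}\tag{1.1}\]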
Indeed, there are \((\phi (10)\kappa _\mathcal {A}/10+o(1))\#\mathcal {A}\) elements of \(\mathcal {A}\) which are coprime to 10, and \((1+o(1))X/\log {X}\) primes less than X which are coprime to 10, and \((\phi (10)/10+o(1))X\) integers less than X coprime to 10. Thus if the properties ‘being in \(\mathcal {A}\)’ and ‘being prime’ were independent for integers \(n< X\) coprime to 10, we would expect \((\kappa _\mathcal {A}+o(1)) \#\mathcal {A}/\log {X}\) primes in \(\mathcal {A}\). Theorem 1.1 shows this heuristic guess is within a constant factor of the truth, and we would be able to establish such an asymptotic formula if we had stronger ‘Type II’ information.
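The independence heuristic above can be compared with actual counts on small ranges. The sketch below is illustrative only: the helper names are hypothetical, and the constant is computed directly from the reasoning in the text, as the proportion of allowed final digits coprime to 10 divided by \(\phi (10)/10\). At these small scales only rough agreement should be expected.

```python
import math


def primes_below(X):
    """Sieve of Eratosthenes: list of primes p < X."""
    if X <= 2:
        return []
    sieve = bytearray([1]) * X
    sieve[0] = sieve[1] = 0
    for i in range(2, math.isqrt(X - 1) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, X, i)))
    return [n for n in range(X) if sieve[n]]


def kappa(a0):
    """Heuristic constant: (proportion of A coprime to 10) / (phi(10)/10)."""
    allowed_last = [d for d in (1, 3, 7, 9) if d != a0]
    return (len(allowed_last) / 9) / (4 / 10)


def compare(X, a0):
    """Return (number of primes below X avoiding digit a0, heuristic prediction)."""
    digit = str(a0)
    size_A = sum(1 for n in range(X) if digit not in str(n))
    primes_in_A = sum(1 for p in primes_below(X) if digit not in str(p))
    predicted = kappa(a0) * size_A / math.log(X)
    return primes_in_A, predicted


if __name__ == "__main__":
    for k in range(2, 7):
        actual, predicted = compare(10 ** k, 7)
        print(k, actual, round(predicted, 1))
```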
One can consider the same problem in bases other than 10, and with more than one excluded digit. The set of numbers less than X missing s digits in base q has \(\asymp X^{c}\) elements, where \(c=\log (q-s)/\log {q}\). For fixed s, the density becomes larger as q increases, and so the problem becomes easier. Our methods are not powerful enough to show the existence of infinitely many primes with two digits not appearing in their decimal expansion, but they can show that there are infinitely many primes with s digits excluded in base q provided q is large enough in terms of s. Moreover, if the set of excluded digits possesses some additional structure this can apply to very thin sets formed in this way.
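To see how the density exponent \(c=\log (q-s)/\log {q}\) behaves as q grows, one can tabulate it and compare with a brute-force count. This is a minimal sketch; the function names `density_exponent` and `count_avoiding` are illustrative and not taken from the paper.

```python
import math


def density_exponent(q, s):
    """Exponent c = log(q - s)/log(q) for integers missing s digits in base q."""
    if not 0 <= s < q:
        raise ValueError("need 0 <= s < q")
    return math.log(q - s) / math.log(q)


def count_avoiding(X, q, excluded):
    """Brute-force count of 0 <= n < X with no base-q digit in `excluded`."""
    excluded = set(excluded)
    count = 0
    for n in range(X):
        m, ok = n, True
        while True:
            m, d = divmod(m, q)
            if d in excluded:
                ok = False
                break
            if m == 0:
                break
        count += ok
    return count


if __name__ == "__main__":
    # For a fixed number of excluded digits, the exponent increases with q.
    for q in (10, 12, 100, 1000):
        print(q, round(density_exponent(q, 1), 4), round(density_exponent(q, 2), 4))
```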
Theorem 1.2
Let q be sufficiently large, and let \(X\ge q\).
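For any choice of \(\mathcal {B}\subseteq \{0,\ldots ,q-1\}\) with \(\#\mathcal {B}=s\le q^{23/80}\), let
\[\mathcal {A}'=\Bigl\{\sum _{0\le i\le k}n_iq^i< X:\, n_i\in \{0,\ldots ,q-1\}\backslash \mathcal {B},\,k\ge 0\Bigr\}\]
be the set of integers less than X with no digit in base q in the set \(\mathcal {B}\). Then we have
\[\#\{p\in \mathcal {A}'\}\asymp \frac{X^{\log {(q-s)}/\log {q}}}{\log {X}}.\]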
In the special case when \(\mathcal {B}=\{0,\ldots ,s-1\}\) or \(\mathcal {B}=\{q-s,\ldots ,q-1\}\), this holds in the wider range \(0\le s\le q-q^{57/80}\).
The final case of Theorem 1.2 when \(\mathcal {B}=\{0,\ldots ,s-1\}\) and \(s\approx q-q^{57/80}\) shows the existence of many primes in a set of integers \(\mathcal {A}'\) with \(\#\mathcal {A}'\approx X^{57/80}=X^{0.7125}\), a rather thin set. The exponent here can be improved slightly with more effort.
The estimates in Theorem 1.2 can be improved to asymptotic formulae if we restrict s slightly further. For general \(\mathcal {B}\) with \(s=\#\mathcal {B}\le q^{1/4-\delta }\) and any q sufficiently large in terms of \(\delta >0\) we obtain
\[\#\{p\in \mathcal {A}'\}=(\kappa _\mathcal {B}+o(1))\frac{\#\mathcal {A}'}{\log {X}},\]
where if \(\mathcal {B}\) contains exactly t elements coprime to q, we have
\[\kappa _\mathcal {B}=\frac{q(\phi (q)-t)}{(q-s)\phi (q)}.\]
In the case of just one excluded digit, we can obtain this asymptotic formula for \(q\ge 12\). In the case of \(\mathcal {B}=\{0,\ldots ,s-1\}\), we obtain the above asymptotic formula provided \(s\le q-q^{3/4+\delta }\).
We expect several of the techniques introduced in this paper might be useful more generally in other digit-related questions about arithmetic sequences. Our general approach to counting primes in \(\mathcal {A}\) and our analysis of the minor arc contribution might also be of independent interest, with potential application to other questions on primes involving sets whose Fourier transform is unrelated to Diophantine properties of the argument.
2 Outline
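Our argument is fundamentally based on an application of the circle method. Clearly for the purposes of Theorem 1.1 we can restrict X to a power of 10 for convenience. The number of primes in \(\mathcal {A}\) is the number of solutions of the binary equation \(p-a=0\) over primes p and integers \(a\in \mathcal {A}\), and so is given by
\[\#\{p\in \mathcal {A}\}=\frac{1}{X}\sum _{0\le a<X}S_{\mathcal {A}}\Bigl (\frac{a}{X}\Bigr )S_{\mathbb {P}}\Bigl (\frac{-a}{X}\Bigr ),\]
where
\[S_{\mathcal {A}}(\theta )=\sum _{n\in \mathcal {A}}e(n\theta ),\qquad S_{\mathbb {P}}(\theta )=\sum _{p<X}e(p\theta ).\]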
We then separate the contribution from the a in the ‘major arcs’ which give our expected main term for \(\#\{p\in \mathcal {A}\}\), and the a in the ‘minor arcs’ which we bound for an error term.
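The identity underlying this application of the circle method is orthogonality of additive characters modulo X: averaging \(S_{\mathcal {A}}(a/X)S_{\mathbb {P}}(-a/X)\) over \(0\le a<X\) counts pairs (n, p) with n in \(\mathcal {A}\), p prime and \(n=p\). The following sketch verifies this numerically for a small X. The function names are illustrative, and the direct \(O(X^2)\) evaluation is only intended for tiny examples.

```python
import cmath


def e(x):
    """The complex exponential e(x) = exp(2*pi*i*x)."""
    return cmath.exp(2j * cmath.pi * x)


def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def direct_count(X, a0):
    """Number of primes p < X with no decimal digit equal to a0."""
    digit = str(a0)
    return sum(1 for p in range(X) if is_prime(p) and digit not in str(p))


def circle_method_count(X, a0):
    """(1/X) * sum over 0 <= a < X of S_A(a/X) * S_P(-a/X)."""
    digit = str(a0)
    A = [n for n in range(X) if digit not in str(n)]
    P = [p for p in range(X) if is_prime(p)]
    total = 0
    for a in range(X):
        S_A = sum(e(n * a / X) for n in A)
        S_P = sum(e(-p * a / X) for p in P)
        total += S_A * S_P
    return total / X


if __name__ == "__main__":
    print(direct_count(100, 7), circle_method_count(100, 7))
```

Up to floating-point error the second value is real and agrees with the direct count.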
The reader might be (justifiably) somewhat surprised by this, since it is well known that the circle method typically cannot be applied to binary problems. Indeed, one cannot generally hope for bounds better than ‘square-root cancellation’
\[S_{\mathcal {A}}(\theta )\ll \#\mathcal {A}^{1/2},\qquad S_{\mathbb {P}}(\theta )\ll X^{1/2}\]
for ‘generic’ \(\theta \in [0,1]\). Thus if one cannot exploit cancellation amongst the different terms in the minor arcs, we would expect that the \(\gg X\) different ‘generic’ a in the sum above would contribute an error term which we can only bound as \(O(X^{1/2}\#\mathcal {A}^{1/2})\), and this would dominate the expected main term.
It turns out that the Fourier transform \(S_{\mathcal {A}}(\theta )\) has some somewhat remarkable features which cause it to typically have better than square-root cancellation. (A closely related phenomenon is present and crucial in the work of Mauduit and Rivat [17] and Bourgain [3].) Indeed, we establish the \(\ell ^1\) bound
\[\sum _{0\le a<X}\Bigl |S_{\mathcal {A}}\Bigl (\frac{a}{X}\Bigr )\Bigr |\ll \#\mathcal {A}\,X^{0.36},\]
which shows that for ‘generic’ a we have \(S_{\mathcal {A}}(a/X)\ll \#\mathcal {A}/X^{0.64}\ll X^{0.32}\). This gives us a (small) amount of room for a possible successful application of the circle method, since now we might hope the ‘generic’ a would contribute a total \(O(X^{0.82})\) if the bound \(S_{\mathbb {P}}(a/X)\ll X^{1/2+\epsilon }\) held for all a in the minor arcs, and this \(O(X^{0.82})\) error term is now smaller than the expected main term of size \(\#\mathcal {A}^{1+o(1)}\).
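The \(\ell ^1\) norm of \(S_{\mathcal {A}}(a/X)\) can be explored numerically for small X. The sketch below assumes \(X=10^k\) and \(a_0\ne 0\), in which case the elements of \(\mathcal {A}\) are exactly the k-digit strings (leading zeros allowed) avoiding \(a_0\), and the exponential sum factors as a product over digit positions. The function names are illustrative, and the printed exponents at such small scales need not match the value 0.36 from the text.

```python
import cmath
import math


def e(x):
    """The complex exponential e(x) = exp(2*pi*i*x)."""
    return cmath.exp(2j * cmath.pi * x)


def S_A_product(theta, k, a0):
    """S_A(theta) for X = 10**k as a product over digit positions (assumes a0 != 0)."""
    result = 1
    for i in range(k):
        result *= sum(e(d * 10 ** i * theta) for d in range(10) if d != a0)
    return result


def S_A_direct(theta, k, a0):
    """S_A(theta) for X = 10**k by direct summation, for comparison."""
    digit = str(a0)
    return sum(e(n * theta) for n in range(10 ** k) if digit not in str(n))


def l1_exponent(k, a0):
    """Return alpha with sum_a |S_A(a/X)| = #A * X**alpha, where X = 10**k."""
    X = 10 ** k
    l1 = sum(abs(S_A_product(a / X, k, a0)) for a in range(X))
    return math.log(l1 / 9 ** k) / math.log(X)


if __name__ == "__main__":
    for k in range(2, 6):
        print(k, round(l1_exponent(k, 7), 4))
```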
We actually get good asymptotic control over all moments (including fractional ones) of \(S_{\mathcal {A}}(a/X)\) rather than just the first. By making a suitable approximation to \(S_{\mathcal {A}}(\theta )\), we can re-interpret moments of this approximation as the average probability of restricted paths in a Markov process, and obtain asymptotic estimates via a finite eigenvalue computation.
By combining an \(\ell ^2\) bound for \(S_{\mathbb {P}}(a/X)\) with an \(\ell ^{1.526}\) bound for \(S_{\mathcal {A}}(a/X)\), we are able to show that it is indeed the case that ‘generic’ \(a<X\) make a negligible contribution, and that we may restrict ourselves to \(a\in \mathcal {E}\), some set of size \(O(X^{0.36})\).
We expect that \(S_{\mathbb {P}}(\theta )\) is large only when \(\theta \) is close to a rational with small denominator, and \(S_{\mathcal {A}}(\theta )\) is large when \(\theta \) has a decimal expansion containing many 0’s or 9’s. Thus we expect the product to be large only when both of these conditions hold, which is essentially when \(\theta \) is well approximated by a rational whose denominator is a small power of 10.
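The dependence of \(S_{\mathcal {A}}(\theta )\) on the digits of \(\theta \) can be seen in a small experiment. This sketch again assumes \(X=10^k\) and \(a_0\ne 0\), and uses the product over digit positions; the name `normalized_size` is hypothetical. It compares a frequency whose decimal expansion consists mostly of 0's with a frequency having a small denominator coprime to 10.

```python
import cmath


def e(x):
    """The complex exponential e(x) = exp(2*pi*i*x)."""
    return cmath.exp(2j * cmath.pi * x)


def normalized_size(theta, k, a0=7):
    """|S_A(theta)| / #A for X = 10**k, via the digit product (assumes a0 != 0)."""
    product = 1
    for i in range(k):
        product *= sum(e(d * 10 ** i * theta) for d in range(10) if d != a0)
    return abs(product) / 9 ** k


if __name__ == "__main__":
    k = 3
    # theta = 0.001 has mostly zero digits; theta = 1/7 has small denominator
    # but a decimal expansion 0.142857... with no runs of 0's or 9's.
    for theta in (0.001, 0.999, 1 / 7, 1 / 3):
        print(theta, normalized_size(theta, k))
```

In this example the normalized sum at \(\theta =0.001\) is far larger than at \(\theta =1/7\), even though 1/7 has the smaller denominator.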
By obtaining suitable estimates for \(\mathcal {A}\) in arithmetic progressions via the large sieve, one can verify that amongst all a in the major arcs \(\mathcal {M}\) where a/X is well-approximated by a rational of small denominator we obtain our expected main term, and this comes from when a/X is well-approximated by a rational with denominator 10.
Thus we are left to show when \(a\in \mathcal {E}\) and a/X is not close to a rational with small denominator, the product \(S_{\mathcal {A}}(a/X)S_{\mathbb {P}}(-a/X)\) is small on average. By using an expansion of the indicator function of the primes as a sum of bilinear terms (similar to Vaughan’s identity), we are led to bound expressions such as
which is a weighted and averaged form of the typical expressions one encounters when obtaining an \(\ell ^\infty \) bound for exponential sums over primes. Here \(\Vert \cdot \Vert \) is the distance to the nearest integer.
The double sum over \(n_1,n_2\) in (2.2) is of size \(O(N^2)\) for ‘typical’ pairs \((a_1,a_2)\), and if it is noticeably larger than this then \(a_1\) and \(a_2\) must share some Diophantine structure. We find that the pair \((a_1,a_2)\) must lie close to the projection from \(\mathbb {Z}^3\) to \(\mathbb {Z}^2\) of some low height plane or low height line if this quantity is large, where the arithmetic height of the line or plane is bounded in terms of the size of the double sum. (For example, the diagonal terms \(a_1=a_2\) give a large contribution and lie on a low height line, and \(a_1,a_2\) which are both small give a large contribution and lie in a low height plane.)
This restricts the number and nature of pairs \((a_1,a_2)\) which can give a large contribution. Since we expect the size of \(S_{\mathcal {A}}(a_1/X)S_{\mathcal {A}}(a_2/X)\) to be determined by digital rather than Diophantine conditions on \(a_1,a_2\), we expect to have a smaller total contribution when restricted to these sets. By using the explicit description of such pairs \((a_1,a_2)\) we succeed in obtaining such a superior bound on the sum over these pairs. It is vital here that we are restricted to \(a_1,a_2\) lying in the small set \(\mathcal {E}\) (for points on a line) and outside of the set \(\mathcal {M}\) of major arcs (for points in a lattice).
This ultimately allows us to get suitable bounds for (2.2) provided \(N\in [X^{0.36},X^{0.425}]\). If this ‘Type II range’ were larger, we would be able to express the indicator function of the primes as a combination of such bilinear expressions and easily controlled terms. We would then obtain an asymptotic estimate for \(\#\{p\in \mathcal {A}\}\). Unfortunately our range is not large enough to do this. Instead we work with a minorant for the indicator function of the primes throughout our argument, which is chosen such that it is essentially a combination of bilinear expressions which do fall into this range. It is this feature which means we obtain a lower bound rather than an asymptotic estimate for the number of primes in \(\mathcal {A}\).
Such a minorant is constructed via Harman’s sieve, and, since it is essentially a combination of Type II terms and easily handled terms, we can obtain an asymptotic formula for elements of \(\mathcal {A}\) weighted by it. This gives a lower bound
\[\#\{p\in \mathcal {A}\}\ge (c+o(1))\frac{\#\mathcal {A}}{\log {X}}\]
for some constant c. We use numerical integration to verify that we (just) have \(c>0\), and so we obtain our asymptotic lower bound for \(\#\{p\in \mathcal {A}\}\). The upper bound is a simple sieve estimate.
Remark
For the method used to prove Theorem 1.1, strong assumptions such as the Generalized Riemann Hypothesis appear to be only of limited benefit. In particular, even under GRH one only gets pointwise bounds of the strength \(S_\mathbb {P}(\theta )\ll X^{3/4+o(1)}\) for ‘generic’ \(\theta \), which is not strong enough to give a non-trivial minor arc bound on its own. The assumption of GRH and the above pointwise bound is sufficient to deal with the entire minor arc contribution in the regime where we obtain asymptotic formulae (i.e. when the base is sufficiently large).
3 Notation
We use the asymptotic notation \(\ll ,\gg \), \(O(\cdot )\), \(o(\cdot )\) throughout, denoting a dependence of the implied constant on a parameter t by a subscript. As mentioned earlier, we use \(f \asymp g\) to denote that both \(f\ll g\) and \(g\ll f\) hold. Throughout the paper \(\epsilon \) will denote a single fixed positive constant which is sufficiently small; \(\epsilon =10^{-100}\) would probably suffice. In particular, any implied constants may depend on \(\epsilon \). We will assume that X is always a suitably large integral power of 10 throughout. We will exclusively use the letter p to denote a prime number, without always making this restriction explicit.
We use the nonstandard notation \(n\sim X\) to mean that n lies in the interval (X/10, X] throughout the paper.
Several variables will be assumed to be non-negative integers, without directly specifying this. Thus sums such as \(\sum _{n<X}\) will be assumed to be over integers n with \(0\le n<X\), for example. The usage should be clear from the context.
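It will be convenient to normalize the Fourier transform of \(\mathcal {A}\), and to be able to view it at different scales. With this in mind, we define
\[F_Y(\theta )=Y^{-\log {9}/\log {10}}\Bigl |\sum _{n<Y}\mathbf {1}_{\mathcal {A}_1}(n)e(n\theta )\Bigr |.\]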
Whenever we encounter the function \(F_Y\) we assume that Y is a positive integral power of 10 (or a power of q in Sect. 16). We use \(\Vert \cdot \Vert \) to denote the distance to the nearest integer, and \(\Vert \cdot \Vert _2\) to denote the standard Euclidean norm. We use \(\mathbf {1}_{\mathcal {A}_1}\) for the indicator function of the set \(\mathcal {A}_1\) of integers with restricted digits. Here \(e(x)=e^{2\pi i x}\) is the complex exponential function.
We need to make use of various numerical estimates throughout the paper, some of which succeed only by a small margin. We have endeavored to avoid too many explicit calculations and we encourage the reader to not pay too much attention to the numerical constants appearing on a first reading.
4 Structure of the paper
In Sect. 6, we use a sieve decomposition to reduce the proof of Theorem 1.1 to the proof of Propositions 6.1 and 6.2, which are asymptotic estimates for particular types of terms arising from sieve decompositions. These propositions are established in Sect. 7.
In Sect. 7, we use sieve theory to reduce the proof of Propositions 6.1 and 6.2 to the proof of Propositions 7.1 and 7.2, which are our ‘Type I’ and ‘Type II’ estimates. These will be established in Sects. 8 and 9 respectively.
In Sect. 8 we use a large sieve argument to reduce the proof of our Type I estimate Proposition 7.1 to that of Lemmas 8.1 and 8.2, which are Fourier \(\ell ^\infty \) and \(\ell ^1\) bounds. These will be established in Sect. 10.
In Sect. 9 we use the circle method and geometric decompositions to reduce the proof of our Type II estimate Proposition 7.2 to that of Propositions 9.1, 9.2 and 9.3, which are our estimates for the ‘major arcs’, the ‘generic minor arcs’ and the ‘exceptional minor arcs’. These will be established in Sects. 11, 12 and 13 respectively.
In Sect. 10 we establish various Fourier estimates. In particular we establish Lemmas 8.1 and 8.2, as well as several auxiliary lemmas which will be used in later sections.
In Sect. 11 we use results on primes in arithmetic progressions to establish our major arc estimate Proposition 9.1, making use of the estimates of Sect. 10.
In Sect. 12 we use Fourier moment bounds from Sect. 10 to establish our generic minor arc estimate Proposition 9.2.
In Sect. 13 we use the geometry of numbers to reduce the proof of the exceptional minor arc estimate Proposition 9.3 to the proof of Propositions 13.3 and 13.4, which are estimates for frequencies constrained to lie in low height lattices or low height lines. These will be established in Sects. 14 and 15.
In Sect. 14 we establish our estimate for low height lattices Proposition 13.3, using the estimates of Sect. 10.
In Sect. 15 we establish our estimate for low height lines Proposition 13.4, using the geometric counting estimates and the results of Sect. 10. This completes the proof of Theorem 1.1.
In Sect. 16, we sketch the modifications in the argument required to establish Theorem 1.2.
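In particular, the dependency graph between the main statements in the proof of Theorem 1.1 is as follows, where each statement is deduced from those to the right of the arrow:
Theorem 1.1 \(\Leftarrow \) Propositions 6.1, 6.2
Propositions 6.1, 6.2 \(\Leftarrow \) Propositions 7.1, 7.2
Proposition 7.1 \(\Leftarrow \) Lemmas 8.1, 8.2
Proposition 7.2 \(\Leftarrow \) Propositions 9.1, 9.2, 9.3
Proposition 9.3 \(\Leftarrow \) Propositions 13.3, 13.4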
5 Basic estimates
We will make frequent use of some well-known facts in analytic number theory without extra comment. In particular, we make use of the Prime Number Theorem in short intervals and arithmetic progressions with error term (see [10, Chapter 22], for example). This states that for any \(A>0\) we have
provided \(\Delta \ge (\log {Y})^{-A}\) and \(q\le (\log {Y})^A\) and \(\gcd (a,q)=1\).
We recall the following sieve estimate (see, for example, [18, Theorem 7.11]): For \(u>1+1/(\log {Y})^{1/2}\)
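where \(\omega (u)\) is the Buchstab function defined by the delay-differential equation
\[\omega (u)=\frac{1}{u}\quad (1\le u\le 2),\qquad \frac{d}{du}\bigl (u\,\omega (u)\bigr )=\omega (u-1)\quad (u>2).\]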
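The Buchstab function is standardly given by \(\omega (u)=1/u\) for \(1\le u\le 2\) and \((u\omega (u))'=\omega (u-1)\) for \(u>2\), and is easy to tabulate numerically. The following sketch integrates the delay-differential equation in the form \(u\omega (u)=1+\int _1^{u-1}\omega (t)\,dt\) using the trapezoid rule on a uniform grid. The function names and the step size are illustrative choices and are not the numerical scheme used in the paper.

```python
import math


def buchstab_table(u_max, steps_per_unit=1000):
    """Tabulate omega on the grid u_j = 1 + j/steps_per_unit for 1 <= u_j <= u_max."""
    m = steps_per_unit
    h = 1.0 / m
    n = int(round((u_max - 1) * m))
    omega = [0.0] * (n + 1)
    cumulative = [0.0] * (n + 1)  # cumulative[j] approximates the integral of omega over [1, u_j]
    for j in range(n + 1):
        u = 1 + j * h
        if j <= m:
            omega[j] = 1 / u
        else:
            # Integrated form of the equation: u*omega(u) = 1 + integral of omega over [1, u-1].
            omega[j] = (1 + cumulative[j - m]) / u
        if j > 0:
            cumulative[j] = cumulative[j - 1] + h * (omega[j - 1] + omega[j]) / 2
    return omega


def buchstab(u, table, steps_per_unit=1000):
    """Look up omega(u) at the nearest grid point of a precomputed table."""
    return table[int(round((u - 1) * steps_per_unit))]


if __name__ == "__main__":
    table = buchstab_table(6)
    for u in (1.5, 2, 2.5, 3, 4, 6):
        print(u, round(buchstab(u, table), 6))
```

On \([2,3]\) the equation integrates in closed form to \(\omega (u)=(1+\log (u-1))/u\), which gives a convenient check on the numerical values.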
We recall some results from the geometry of numbers and Minkowski’s theory of successive minima (see, for example, [9, p. 110]). A lattice in \(\mathbb {R}^k\) is a discrete subgroup of the additive group \(\mathbb {R}^k\). For any lattice \(\Lambda \) there is a Minkowski-reduced basis \(\{\mathbf {v}_1,\ldots ,\mathbf {v}_r\}\) of linearly independent vectors in \(\mathbb {R}^k\) such that
and for any \(x_1,\ldots ,x_r\in \mathbb {R}\) we have
and with \(\Vert \mathbf {v}_1\Vert _2\cdots \Vert \mathbf {v}_r\Vert _2\asymp \det (\Lambda )\), where these implied constants depend only on the ambient dimension k. Here \(\det (\Lambda )\) is the r-dimensional volume of the fundamental parallelepiped, given by
We say r is the rank of the lattice. We see the properties of the Minkowski-reduced basis above indicate that each generating vector \(\mathbf {v}_i\) has a positive proportion of its length in a direction orthogonal to all the other basis vectors.
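In rank 2 a reduced basis with these properties can be computed explicitly. The sketch below uses the classical Lagrange-Gauss reduction algorithm for an integer lattice in \(\mathbb {Z}^2\); this is a stand-in for illustration rather than the general Minkowski reduction referred to above, and the function name is hypothetical. For the output basis one has \(\det (\Lambda )\le \Vert \mathbf {v}_1\Vert _2\Vert \mathbf {v}_2\Vert _2\le (2/\sqrt{3})\det (\Lambda )\).

```python
def lagrange_gauss_reduce(v1, v2):
    """Reduce a basis (v1, v2) of a rank-2 lattice in Z^2.

    Assumes v1 and v2 are linearly independent integer vectors.
    Returns a basis with |v1| <= |v2| and |<v1, v2>| <= |v1|^2 / 2.
    """
    def dot(a, b):
        return a[0] * b[0] + a[1] * b[1]

    if dot(v1, v1) > dot(v2, v2):
        v1, v2 = v2, v1
    while True:
        n1 = dot(v1, v1)
        # Nearest integer to <v1, v2>/<v1, v1>, computed in exact integer arithmetic.
        m = (2 * dot(v1, v2) + n1) // (2 * n1)
        v2 = (v2[0] - m * v1[0], v2[1] - m * v1[1])
        if dot(v2, v2) >= n1:
            return v1, v2
        v1, v2 = v2, v1


if __name__ == "__main__":
    v1, v2 = lagrange_gauss_reduce((3, 1), (10, 7))
    det = abs(v1[0] * v2[1] - v1[1] * v2[0])
    print(v1, v2, det)
```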
6 Sieve decomposition and proof of Theorem 1.1
First, we prove Theorem 1.1 assuming two key propositions, given below. This reduces the problem to establishing Propositions 6.1 and 6.2 which we do over the remaining sections.
As remarked in Sect. 2, it suffices to consider X as a power of 10. If \(X=10^k\) we will think of all elements of \(\mathcal {A}\) as having k digits, none of which is equal to \(a_0\). This is equivalent to slightly changing the definition of \(\mathcal {A}\) in the case when \(a_0=0\) (since it restricts \(\mathcal {A}\) to (X/10, X]), but by considering X, X/10, \(X/100, \ldots \) we see that we can easily recover Theorem 1.1 for the original set \(\mathcal {A}\) from this situation.
We will make a decomposition of \(\#\{p\in \mathcal {A}{}\}\) into various terms following Harman’s sieve (see [15] for more details). Each of these terms can then be asymptotically estimated by Propositions 6.1 or 6.2 (given below), or can be trivially bounded below by 0. To keep track of the terms in this decomposition we apply the same decomposition to the set
\[\mathcal {B}=\{n\in \mathbb {Z}:\, 0\le n< X\}\]
by considering a weighted sequence \(w_n\).
Let \(w_n\) be weights supported on non-negative integers \(n< X\) given by
\[w_n=\mathbf {1}_{\mathcal {A}}(n)-\frac{\kappa _\mathcal {A}\#\mathcal {A}}{X}.\tag{6.1}\]
[We recall that \(\mathbf {1}_{\mathcal {A}}\) is the indicator function of \(\mathcal {A}\), and \(\kappa _\mathcal {A}\) is the constant given by (1.1).] For a set \(\mathcal {C}\) we define
Given an integer \(d>0\) and a real number \(z>0\), let
We expect that \(S_d(z)\) is typically small for a wide range of d and z. The following two propositions show that this is the case for certain d, z.
Proposition 6.1
(Sieve asymptotic terms) Fix an integer \(\ell \ge 0\). Let \(\theta _1=9/25+2\epsilon \) and \(\theta _2=17/40-2\epsilon \). Let \(\mathcal {L}\) be a set of O(1) affine linear functions \(L:\mathbb {R}^\ell \rightarrow \mathbb {R}\). Then we have
where \(\sum ^*\) indicates the summation is restricted by the conditions
for all \(L\in \mathcal {L}\).
Proposition 6.1 includes the case \(\ell =0\), where we interpret the statement as
Proposition 6.2
(Type II terms) Fix an integer \(\ell \ge 1\). Let \(\theta _1,\theta _2,\mathcal {L}\) be as in Proposition 6.1, and let \(\mathcal {I}\subseteq \{1,\ldots ,\ell \}\) and \(j\in \{1,\ldots ,\ell \}\). Then we have
and
where \(\sum ^*\) indicates the same restriction of summation to \(L\ge 0\) for all \(L\in \mathcal {L}\) as in Proposition 6.1.
We note that by inclusion-exclusion the same result holds if some of the inequalities \(L\ge 0\) are replaced by the strict inequality \(L>0\).
Proof of Theorem 1.1 assuming Proposition 6.1 and Proposition 6.2
Let \(\theta _1=9/25+2\epsilon \) and \(\theta _2=17/40-2\epsilon \) as in Proposition 6.1.
We first consider the upper bound for Theorem 1.1, which is essentially a standard sieve upper bound. Since \(\theta _2-\theta _1<1/2\), we have
Thus, using (6.3) and the fact (5.2) that there are \(O(X/\log {X})\) integers in [0, X] with no prime factors smaller than \(X^{\theta _2-\theta _1}\), we have
Thus it suffices to establish the lower bound.
To simplify notation, we let \(z_1\le z_2\le z_3\le z_4\le z_5\le z_6\) be given by
We have
Thus we wish to bound \(S_1(z_4)\) from below. By Buchstab’s identity (i.e. inclusion-exclusion on the least prime factor) we have
The term \(S_1(z_1)\) is \(o(\#\mathcal {A}{}/\log {X})\) by (6.3) from Proposition 6.1. We split the sum over p into ranges \((z_i,z_{i+1}]\), and see that all the terms with \(p\in (z_2,z_3]\) are also negligible by Proposition 6.2. This gives
We wish to replace \(S_p(p)\) by \(S_p(\min (p,(X/p)^{1/2}))\). We note that these are the same when \(p\le X^{1/3}\), but if \(p>X^{1/3}\) then there are additional terms in \(S_p((X/p)^{1/2})\) from primes in the interval \(((X/p)^{1/2},p]\). For \(\delta =1/(\log {X})^{1/2}\), by the prime number theorem and Proposition 6.1, we have
Here, and throughout this section, q is restricted to being a prime number. Similarly, we get corresponding bounds for \(S(\mathcal {B}{}_p,\min (p,(X/p)^{1/2}))\), and so we can replace \(S_p(p)\) with \(S_p(\min (p,(X/p)^{1/2}))\) at the cost of a small error.
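Buchstab's identity is simply a partition of integers according to their least prime factor, and can be checked by brute force on the full set of integers below X. The sketch below is illustrative only: the helper names are hypothetical, the sifted set is taken to be all integers \(1\le n<X\) rather than the weighted sums \(S_d(z)\) of this section, and the convention chosen for repeated prime factors (allowing cofactors with prime factors \(\ge p\)) is an assumption of the sketch.

```python
def least_prime_factor(n):
    """Least prime factor of n >= 2 by trial division (returns n if n is prime)."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n


def sifted(limit, z, strict=True):
    """Count 1 <= n < limit whose prime factors are all > z (or >= z if not strict)."""
    count = 0
    for n in range(1, limit):
        if n == 1:
            count += 1  # n = 1 has no prime factors
            continue
        lpf = least_prime_factor(n)
        if (lpf > z) if strict else (lpf >= z):
            count += 1
    return count


def buchstab_rhs(X, z1, z2):
    """sifted(X, z1) minus the integers whose least prime factor p lies in (z1, z2]."""
    total = sifted(X, z1)
    for p in range(2, int(z2) + 1):
        if p > z1 and least_prime_factor(p) == p:
            # n = p*m < X with least prime factor p: m <= (X-1)//p, prime factors of m >= p.
            total -= sifted((X - 1) // p + 1, p, strict=False)
    return total


if __name__ == "__main__":
    X, z1, z2 = 10000, 3, 30
    print(sifted(X, z2), buchstab_rhs(X, z1, z2))
```

Both printed numbers count the integers below X with no prime factor at most \(z_2\), so they agree.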
Using this, and applying Buchstab’s identity again, we have
The first two terms above are asymptotically negligible by Proposition 6.1, and so this simplifies to
We perform further decompositions to the remaining terms in (6.5). We first concentrate on the first term on the right hand side. Splitting the ranges of pq into intervals, and recalling that those with pq in the interval \([z_2,z_3]\) or \([z_5,z_6]\) make a negligible contribution by Proposition 6.2, we obtain
Here we have dropped the condition \(q\le (X/p)^{1/2}\) in the final sum, since this is implied by \(q\le p\) and \(p q\le z_2\). On recalling the definition (6.1) of \(w_n\), we can lower bound the first term of (6.6) by dropping the non-negative contribution from the set \(\mathcal {A}{}\) via \(w_n\ge -\kappa _{\mathcal {A}}\#\mathcal {A}{}/X\). By partial summation, and using the estimate (5.2), this gives
Here \(\omega (u)\) is Buchstab’s function, and \(P^-(n)\) denotes the least prime factor of n.
We perform further decompositions to the second term of (6.6), first splitting according to the size of \(q^2 p\) compared with \(z_6\).
For the second term of (6.8) when \(q^2p\) is large, we first separate the contribution from products of three primes. By an essentially identical argument to when we replaced \(S_p(p)\) by \(S_p(\min (p,(X/p)^{1/2}))\) in (6.4), we may replace \(S_{p q}(q)\) by \(S_{p q}(\min (q,(X/p q)^{1/2}))\) at the cost of a negligible error term (since \(p q<z_6\)). By Buchstab’s identity we have (with r restricted to being prime)
The first term above is counting products of exactly three primes, and for these terms we drop the contribution from \(\mathcal {A}{}\) for a lower bound. By partial summation and the prime number theorem, this gives
For the terms not coming from products of 3 primes, we split our summation according to the size of qr, noting that this is negligible if \(qr\in [z_2,z_3]\) by Proposition 6.2. For the terms with \(qr\notin [z_2,z_3]\) we just take the trivial lower bound. Thus
where \(\mathcal {R}_1\) and \(\mathcal {R}_2\) are given by

Together (6.9), (6.10) and (6.11) give a suitable lower bound for the terms in (6.8) with \(q^2p\ge z_6\).
When \(q^2p<z_6\) we can apply two further Buchstab iterations, since then we can evaluate terms \(S_{p q r}(z_1)\) with \(r\le q\le p\) using Proposition 6.1 as \(p q r\le p q^2<z_6\). As before, we may replace \(S_{p q}(q)\) by \(S_{p q}(\min (q,(X/p q)^{1/2}))\) and \(S_{p q r}(r)\) with \(S_{p q r}(\min (r,(X/p q r)^{1/2}))\) at the cost of negligible error terms (since \(p q r<z_6\)). This gives
where r, s are restricted to primes in the sums above. Finally we see that any part of the final sum with a product of two of p, q, r, s in \([z_2,z_3]\) can be discarded by Proposition 6.2. Trivially lower bounding the remaining terms as we did before yields
where \(\mathcal {R}_3\) is given by

This completes our decomposition of the terms from (6.8), coming from the second term of (6.6). We note that we could have imposed various further restrictions such as \(u+v+w\notin [\theta _1,\theta _2]\) in \(\mathcal {R}_3\), but for ease of calculation we do not include these.
We perform decompositions to the third term of (6.6) in a similar way to how we dealt with the second term. We have \(q^2 p<(q p)^{3/2}< z_2^{3/2}< z_6\) so, as above, we can apply two Buchstab iterations and use Proposition 6.1 to evaluate the terms \(S_{p q r}(z_1)\) since we have \(p q r\le p q^2<z_6\). Furthermore, we notice that terms with any of pqr, pqs, prs, or qrs in \([z_2,z_3]\cup [z_5,z_6]\) are negligible by Proposition 6.2. This gives

where

We note that for \(\mathcal {R}_4\) we have dropped different constraints to those we dropped in \(\mathcal {R}_3\).
Together (6.7), (6.9), (6.10), (6.11), (6.12) and (6.13) give our lower bound for all the terms occurring in (6.6), and so give a lower bound for the first term from (6.5), which covers all terms with \(p\le z_2\).
We are left to consider the second term from (6.5), which is the remaining terms with \(p\in (z_3,z_4]\). We treat these in a similar manner to those with \(p\le z_2\). We first split the sum according to the size of qp. Terms with \(q p\in [z_5,z_6]\) are negligible by Proposition 6.2, so we are left to consider \(q p\in (z_3,z_5)\) or \(q p>z_6\). We then split the terms with \(q p\in (z_3,z_5)\) according to the size of \(q^2 p\) compared with \(z_6\). This gives
where
and where
We apply two further Buchstab iterations to \(S_3\) (we can handle the intermediate terms using Proposition 6.1 as before since \(q^2p<z_6\)). As before, we may replace \(S_{p q}(q)\) by \(S_{p q}(\min (q,(X/p q)^{1/2}))\) and \(S_{p q r}(r)\) by \(S_{p q r}(\min (r,(X/p q r)^{1/2}))\) at the cost of a negligible error term (since \(p q r<z_6\)). This gives

where

Together (6.14), (6.15), (6.16) give our lower bound for the second term from (6.5), which is all the terms with \(p\in [z_3,z_4]\). This completes our lower bound for \(S_1(z_4)\).
Let \(I_1,\ldots ,I_9\) denote the integrals in (6.7), (6.9), (6.10), (6.11), (6.12), (6.13), (6.14), (6.15) and (6.16) respectively. Putting everything together, we obtain
In particular, we have
provided that \(I_1+\cdots +I_9\le 0.999\). Numerical integration then gives the following bounds on \(I_1,\ldots ,I_9\) in the case when \(\theta _1\) and \(\theta _2\) in the definition of \(I_1,\ldots ,I_9\) are replaced by 9/25 and 17/40 respectively.
Thus in this case we have \(I_1+\cdots +I_9< 0.996\), and so by continuity we have \(I_1+\cdots +I_9< 0.996+O(\epsilon )\) when \(\theta _1=9/25+2\epsilon \) and \(\theta _2=17/40-2\epsilon \). Thus, taking \(\epsilon \) suitably small, we see that (6.17) holds, and so we have completed the proof of Theorem 1.1 for X sufficiently large. If \(X\ge 4\) is bounded by a constant, then Theorem 1.1 follows (after potentially adjusting the implied constants) on noting that either 2 or 3 is a prime in \(\mathcal {A}\). \(\square \)
We note that there are various ways in which one can improve the numerical estimates, but we have restricted ourselves to the above decomposition in the interests of clarity. Judiciously employing further Buchstab decompositions would give small numerical improvements, for example.
7 Sieve asymptotics
In this section we prove Propositions 6.1 and 6.2 assuming Propositions 7.1 and 7.2, given below. This reduces the problem to proving standard ‘Type I’ and ‘Type II’ estimates. These propositions will then be proven in Sects. 8 and 9.
Before we state the propositions, we set up some extra notation. Let
By a closed convex polytope in \(\mathbb {R}^\ell \) we mean a region \(\mathcal {R}\) defined by a finite number of non-strict affine linear inequalities in the coordinates (equivalently, this is the convex hull of a finite set of points in \(\mathbb {R}^\ell \)). Given a closed convex polytope \(\mathcal {R}\subseteq \mathcal {Q}_\ell (\eta )\), we let
We caution that \(\mathbf {1}_{\mathcal {R}}\) counts numbers with a particular type of prime factorization, and should not be confused with \(\mathbf {1}_{\mathcal {A}}\), the indicator function of the set \(\mathcal {A}\). We recall \(\mathcal {B}=\{n\in \mathbb {Z}:\, 0\le n< X\}\).
Our two key propositions that we will use are given below.
Proposition 7.1
(Type I estimate) Let \(A>0\) and \(Q\le X^{50/77}(\log {X})^{-2A-2}\). Then we have
where
Proposition 7.2
(Type II estimate) Let \(\eta >0\), and let \(\ell \le 2\eta ^{-1}\). Let \(\mathcal {R}\subseteq \mathcal {Q}_\ell (\eta )\) be a closed convex polytope in \(\mathbb {R}^\ell \) which has the property that
for some set \(\mathcal {I}\subseteq \{1,\ldots ,\ell \}\). Then we have
where
Proposition 6.2 follows quickly from Proposition 7.2, but it will be convenient to establish a slightly more general version where the primes can be as small as \(X^\eta \).
Lemma 7.3
(Type II terms, alternative formulation) Fix an integer \(\ell \ge 1\) and a quantity \(\eta >0\). Let \(\theta _1=9/25+2\epsilon \), \(\theta _2=17/40-2\epsilon \), and \(\mathcal {L}\) be as in Proposition 6.2, and let \(\mathcal {I}\subseteq \{1,\ldots ,\ell \}\) and \(j\in \{1,\ldots ,\ell \}\). Then we have
and
where \(\sum ^*\) indicates the same restriction of summation to \(L\ge 0\) for all \(L\in \mathcal {L}\) as in Proposition 6.2.
As before, we note that by inclusion-exclusion the same result holds if some of the constraints \(L\ge 0\) are replaced with \(L>0\). We see Proposition 6.2 follows immediately from Lemma 7.3 on choosing \(\eta =\theta _2-\theta _1\).
Proof of Lemma 7.3 assuming Proposition 7.2
We just deal with the case when \(\prod _{i\in \mathcal {I}}p_i\in [X^{\theta _1},X^{\theta _2}]\); the other case is entirely analogous with \(\theta _1\) and \(\theta _2\) simply replaced with \(1-\theta _2\) and \(1-\theta _1\) throughout. (Notice that if \(\mathbf {e}\in \mathcal {R}\subseteq \mathcal {Q}_\ell (\eta )\) satisfies \(\sum _{i\in \mathcal {I}}e_i\in [23/40+\epsilon ,16/25-\epsilon ]\), then \(\sum _{i\notin \mathcal {I}}e_i\in [9/25+\epsilon ,17/40-\epsilon ]\). Thus the interval \([9/25+\epsilon ,17/40-\epsilon ]\) in Proposition 7.2 can be replaced by the interval \([23/40+\epsilon ,16/25-\epsilon ]\), and so Proposition 7.2 applies similarly in both cases.)
Recall the definition (6.2) of \(S_{d}(z)\). We see that \(S_{p_1\cdots p_\ell }(p_j)\) is a sum of \(w_n\) only involving integers n with at most \(1/\eta \) prime factors, since all prime factors are of size at least \( X^{\eta }\). The terms with exactly r prime factors (for some \(r\le 1/\eta \)) are a sum of \(w_{p_1\cdots p_r}\) over \(p_1,\ldots ,p_r\) with the summation only restricted by a bounded number of linear inequalities on \(\log {p_1}/\log {X},\ldots ,\log {p_r}/\log {X}\). (These are the previous restrictions on \(p_1,\ldots ,p_\ell \), and the restriction \(p_j\le p_{\ell +1}\le \cdots \le p_r\)). We may write the condition \(X^{\eta }\le p_1\) and the restriction on the size of \(\prod _{i\in \mathcal {I}}p_i\) and \(\prod _{i=1}^\ell p_i\) as linear conditions only involving \(\log {p_1}/\log {X},\ldots ,\log {p_\ell }/\log {X}\) with coefficients having constants depending only on \(\eta \). Thus, after increasing \(\mathcal {L}\) to include these conditions, it suffices to show that
where \(\sum ^*\) indicates that the summation is restricted by the conditions
for all \(L\in \mathcal {L}\).
Let \(\delta =1/\log \log {X}\). We first trivially discard the contribution from \(n=p_1\cdots p_{r}<X^{1-\delta }\). Each n appears \(O_\eta (1)\) times in (7.1), so recalling the definition (6.1) of \(w_n\) and dropping the other constraints, the total contribution from such terms is
Thus it is sufficient to show
Since we have the constraint \(p_1\cdots p_\ell \le X/p_j\le X^{1-\eta }\), the result follows immediately if \(r=\ell \) (if \(\eta <\delta \) the result is trivial). Thus we may assume that \(r>\ell \), so none of the constraints involve all the \(p_i\). We now wish to replace \(\log {p_i}/\log {X}\) with \(\log {p_i}/\log {n}\) in the conditions (7.2). For \(n\in [X^{1-\delta },X]\), we have
and so if exactly one of \(L\left(\frac{\log {p_1}}{\log {X}},\ldots ,\frac{\log {p_\ell }}{\log {X}}\right)\) and \(L\left(\frac{\log {p_1}}{\log {n}},\ldots ,\frac{\log {p_\ell }}{\log {n}}\right)\) is non-negative, we must have
To bound the contribution of such terms, let \(\gamma >0\) be a parameter and
(Here the summation is over all choices of primes \(p_1,\ldots ,p_r\), and for any such choice \(n=p_1\cdots p_r\). We do not restrict to \(n\ge X^{1-\delta }\) in the summation.) We wish to show that if \(\gamma =o_{L,\eta }(1)\) then \(G(\gamma ,L)=o_{L,\eta }(\#\mathcal {A}/\log {X})\), and we will do this by first thinking of \(\gamma \) fixed but very small.
We split the sum into at most \(r!=O_\eta (1)\) subsums where the variables are ordered (we potentially double-count the contribution from \(p_i=p_{i'}\) for an upper bound). Thus, after relabelling the \(p_i\), we see that
for some set \(\mathcal {I}'\subseteq \{1,\ldots ,r\}\). Let \(\mathcal {R}=\mathcal {R}(\gamma ,L,\eta )\subseteq \mathcal {Q}_{r}(\eta )\) be given by
Then \(\mathcal {R}\) satisfies the conditions of Proposition 7.2, so
Thus
By the Prime Number Theorem and partial summation, we have
Since all components of elements of \(\mathcal {R}\) are at least \(\eta \), the integral is bounded by \(\eta ^{-r}\) times the \((r-1)\)-dimensional volume of \(\mathcal {R}\). Since L involves at most \(\ell \le r-1\) coordinates and \(\mathcal {R}\subseteq [\eta ,1]^r\), this volume is \(O_{L,\eta }(\gamma )\). Thus
If \(\gamma \rightarrow 0\) as \(X\rightarrow \infty \) suitably slowly, we see that this shows that \(G(\gamma ,L)=o_{L,\eta }(\#\mathcal {A}/\log {X})\). But from the definition of G, we see that \(G(\gamma ,L)\) is non-decreasing in \(\gamma \), so in fact we deduce that for any \(\gamma =o_{L,\eta }(1)\) we have \(G(\gamma ,L)=o_{L,\eta }(\#\mathcal {A}/\log {X})\).
We see from (7.5) that the error introduced to (7.4) by replacing \(\log {p_i}/\log {X}\) with \(\log {p_i}/\log {n}\) in the conditions (7.2) is \(O(\sum _{L\in \mathcal {L}}G(\gamma ,L))\) for some \(\gamma \ll _\mathcal {L}\delta =o_\mathcal {L}(1)\). By the above discussion, this is \(o_{\mathcal {L},\eta }(\#\mathcal {A}/\log {X})\), which is negligible.
After making this change, we may reintroduce the terms with \(n<X^{1-\delta }\) at the cost of a negligible error by using the bound (7.3) again. Thus
where \(\sum ^{**}\) indicates the sum is constrained to
for all \(L\in \mathcal {L}\). Moreover, since we had the constraint \(\prod _{i\in \mathcal {I}}p_i\in [X^{\theta _1},X^{\theta _2}]\) in (7.2), this second sum includes the constraint \(\prod _{i\in \mathcal {I}}p_i\in [n^{\theta _1},n^{\theta _2}]\). We now split the summation into \(O_\eta (1)\) subsums where the \(p_i\) are totally ordered. After relabelling the coordinates, Proposition 7.2 applies to each of these sums, since the linear constraints \(L\ge 0\) for \(L\in \mathcal {L}\) define a closed convex polytope (depending only on \(\mathcal {L}\)), and the ordering of the variables ensures that this lies within \(\mathcal {Q}_r(\eta )\) (recall that the constraint \(X^\eta \le p_1\) becomes \(n^\eta \le p_1\), so all primes are at least \(n^\eta \)). The constraint \(\prod _{i\in \mathcal {I}}p_i\in [n^{\theta _1},n^{\theta _2}]\) corresponds to the sum of a subset of the coordinates of all points in the polytope lying in \([\theta _1,\theta _2]\). Proposition 7.2 shows that the contribution from each such sum is \(o_{\mathcal {L},\eta }(\#\mathcal {A}/\log {X})\). Since there are \(O_\eta (1)\) such sums, the total contribution is \(o_{\mathcal {L},\eta }(\#\mathcal {A}/\log {X})\), giving the result. \(\square \)
Our aim for the remainder of this section is to establish Proposition 6.1 using Propositions 7.1 and 7.2. We first establish an auxiliary lemma.
Lemma 7.4
(Fundamental Lemma) For \(\delta >0\) we have
The implied constant is independent of \(\delta \).
Proof of Lemma 7.4 assuming Proposition 7.1
If \(\delta >\epsilon ^4\) then since \(S(\mathcal {C},X^t)\) is nonnegative and decreasing in t for any set \(\mathcal {C}\), we have
Since \(S(\mathcal {B}{}_d,X^{\epsilon ^4})\ll X/(d\log {X})\) for \(d<X^{1-\epsilon }\) by (5.2), this gives
By the rough number estimate (5.2) again, we see that the sum of \(1/d\) over \(d<X\) with all prime factors bigger than \(X^\delta \) is \(O_\delta (1)\). Thus the result for \(\delta >\epsilon ^4\) follows from the result for \(\delta =\epsilon ^4\), so we may assume without loss of generality that \(\delta \le \epsilon ^4\).
Let
Then \(\#\mathcal {A}'=\kappa \#\mathcal {A}{}\), where \(\kappa \) is the constant given in Proposition 7.1. Let \(R_d(e)\) be defined by
We put \(q=d e\) and see from Proposition 7.1 that for any \(A>0\) the error terms \(R_{d}(e)\) satisfy
By the fundamental lemma of sieve methods (see, for example, [14, Theorem 6.9]) we have
Summing over d and using the bound (7.6), we obtain
The product in the final bound is \(O(\delta ^{-1}(\log {X})^{-1})\), and the inner sum over d is seen to be \(O(\delta ^{-1})\) by an Euler product upper bound. Finally, since we are assuming that \(\delta \le \epsilon ^4\), we have that \(\delta ^{-2}\exp (-\epsilon /(2\delta ))\ll \exp (-\delta ^{-2/3})\). Thus
An identical argument works for the set \(\mathcal {B}'=\{n< X:\,(n,10)=1\}\) instead of \(\mathcal {A}'\). This gives
We see that for \((d,10)=1\) we have \(S(\mathcal {A}'_d,X^{\delta })=S(\mathcal {A}{}_d,X^{\delta })\), that \(S(\mathcal {B}'_d,X^{\delta })=S(\mathcal {B}{}_d,X^{\delta })\), and that \(\#\mathcal {B}'=\phi (10)\#\mathcal {B}{}/10\). Thus, by the triangle inequality
We bound the first summation by (7.7), the second summation by (7.8), and note that since \(\#\mathcal {B}'=\phi (10)\#\mathcal {B}{}/10\), the third summation is zero. Since \(\kappa _\mathcal {A}=10\kappa /\phi (10)\), this gives
\(\square \)
Using Lemma 7.4 we can now prove Proposition 6.1.
Proof of Proposition 6.1 assuming Lemma 7.3 and Lemma 7.4
Recall that \(\theta _1=9/25+2\epsilon \), \(\theta _2=17/40-2\epsilon \). Let \(\theta :=\theta _2-\theta _1\), and let \(\delta \ge 1/\log \log {X}\) be a small quantity which we will eventually choose to tend to 0 in a suitable manner. In particular, \(\delta \) will be small compared with \(\epsilon \).
We first consider the contribution from \(p_1\cdots p_\ell < X^{\theta _1}\). Given a set \(\mathcal {C}\) and an integer d, we let
Buchstab’s identity shows that
We define \(T_0(\mathcal {C};d)=S(\mathcal {C};X^{\delta })\) and \(V_0(\mathcal {C};d)=0\). This gives for \(d\le X^{\theta _1}\)
We apply the above decomposition to \(\mathcal {A}{}_d\). This gives an expression with \(O(\delta ^{-1})\) terms since trivially \(T_m(\mathcal {A}{}_{d};d)=U_m(\mathcal {A}{}_{d};d)=V_m(\mathcal {A}{}_{d};d)=0\) if \(m>1/\delta \). Applying the same decomposition to \(\mathcal {B}{}_{d}\), taking the weighted difference, and summing over \(d=p_1\cdots p_\ell \) we obtain
Here \(\sum '\) indicates we are summing over all choices of \(p_1,\ldots ,p_\ell \) which appear in the summation in Proposition 6.1 with the additional condition that \(d=p_1\cdots p_\ell < X^{\theta _1}\).
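The decomposition above is built by iterating Buchstab's identity. As a sanity check of a single Buchstab step, the following minimal sketch verifies the identity numerically on a small digit-restricted set. It assumes the convention that \(S(\mathcal {C},z)\) counts elements of \(\mathcal {C}\) all of whose prime factors are at least z (the paper's convention may use a strict inequality), and the function names and the test set are illustrative choices, not taken from the paper.

```python
def least_prime_factor(n):
    """Smallest prime factor of an integer n >= 2 (trial division)."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n


def sift(C, z):
    """S(C, z): number of n in C whose prime factors are all >= z.

    Assumed convention: n = 1 has no prime factors, so it is always counted.
    """
    return sum(1 for n in C if n == 1 or least_prime_factor(n) >= z)


def buchstab_rhs(C, z1, z2):
    """S(C, z1) - sum over primes z1 <= p < z2 of S(C_p, p),
    where C_p = {m : p*m in C}."""
    total = sift(C, z1)
    for p in range(max(2, z1), z2):
        if least_prime_factor(p) == p:  # p is prime
            C_p = [n // p for n in C if n % p == 0]
            total -= sift(C_p, p)
    return total


# Illustrative set: positive integers below 1000 with no digit 7.
C = [n for n in range(1, 1000) if "7" not in str(n)]
print(sift(C, 11), buchstab_rhs(C, 3, 11))
```

Each integer counted by `sift(C, z1)` but not by `sift(C, z2)` has a unique least prime factor p in `[z1, z2)`, which is why the two printed values agree.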
We note that \(p_1,\ldots ,p_\ell \ge X^\theta \), so d has O(1) prime factors and any integer e can be represented O(1) times as \(d p_1'\cdots p_m'\) for some primes \(p_m'\le \dots \le p_1'\) and some choice of \(p_1,\ldots ,p_\ell \) defining d. Thus, expanding the definition of \(T_m\), if \(\delta \le \epsilon \) we have
Here we applied Lemma 7.4 in the last line, using \(\delta \ge 1/\log \log {X}\).
We now consider the \(V_m\) terms. We expand the definition of \(V_m\) as a sum. We note that \(p_m'\le X^\theta =X^{\theta _2-\theta _1}\), so the summation is constrained by \(X^{\theta _1}\le d p_1'\cdots p_m'\le X^{\theta _2}\), which is our Type II constraint. We see that all terms have \(d p_1'\cdots p_m'\le X/p_m'\), so we can insert this condition without changing the sum. We recall \(p_1,\ldots ,p_\ell \) are constrained only by some linear conditions on \(\log {p_1}/\log {X},\ldots ,\log {p_\ell }/\log {X}\). Thus we see that the sum is of the form considered in Lemma 7.3 with \(\eta =\delta \), since all the conditions in the summation can be written as linear constraints on \(\log {p_i}/\log {X}\) for \(1\le i \le \ell \) and \(\log {p_j'}/\log {X}\) for \(1\le j\le m\). Thus, by Lemma 7.3, we have
Putting together (7.9), (7.10) and (7.11), we obtain
Letting \(\delta \rightarrow 0\) sufficiently slowly then gives the result for \(d<X^{\theta _1}\).
The contribution from d with \(X^{\theta _2}< d< X^{1-\theta _2}\) can be handled by an identical argument, where instead of restricting to \(d p_1'\cdots p_m'\le X^{\theta _1}\) and \(X^{\theta _1}<d p_1'\cdots p_m'\le X^{\theta _1}p_m'\) in \(T_m\), \(U_m\) and \(V_m\), we instead restrict to \(d p_1'\cdots p_m'\le X^{1-\theta _2}\) and \(X^{1-\theta _2}<d p_1'\cdots p_m'\le X^{1-\theta _2}p_m'\) respectively. The terms corresponding to \(V_m\) involve \(a\in \mathcal {A}{}_{d p_1'\cdots p_m'}\) with \(X^{1-\theta _2}<d p_1'\cdots p_m'\le X^{1-\theta _1}\le X/p_m'\), so can be handled by the second part of Lemma 7.3 instead of the first part. Since \(50/77>1-17/40+2\epsilon =1-\theta _2\), the terms corresponding to \(T_m\) can still be handled by Lemma 7.4.
Finally, the contribution from d with \(X^{\theta _1}\le d\le X^{\theta _2}\) or \(X^{1-\theta _2}\le d\le X^{1-\theta _1}\) can be bounded almost immediately by Lemma 7.3. One Buchstab iteration gives
We put \(d=p_1\cdots p_\ell \) and sum over \(p_1,\ldots ,p_\ell \) satisfying the constraints imposed by \(\mathcal {L}\) and such that \(d\in [X^{1-\theta _2},X^{1-\theta _1}]\). The first term makes a negligible total contribution by Lemma 7.4 since \(d\le X^{1-\theta _1}<X^{50/77-\epsilon }\). The second term makes negligible total contribution by Lemma 7.3 (noting that \(d p\le X^{1-\theta _1+\theta }\le X^{1-\theta }\le X/p\)). This gives the result when \(d\in [X^{1-\theta _2}, X^{1-\theta _1}]\). The argument for \(d\in [X^{\theta _1},X^{\theta _2}]\) is completely analogous.
Together these cover the whole range \(p_1\cdots p_\ell \le X^{1-\theta _1}\), giving the result. \(\square \)
Thus, since Lemmas 7.3 and 7.4 follow from Propositions 7.1 and 7.2, it suffices to establish Propositions 7.1 and 7.2.
8 Type I estimate
In this section we establish our ‘Type I’ estimate Proposition 7.1, assuming the more technical Lemmas 8.1 and 8.2, which we will establish later in Sect. 10. We recall that Proposition 7.1 describes the number of elements of \(\mathcal {A}\) in arithmetic progressions to modulus up to \(X^{50/77-\epsilon }\approx X^{0.65}\) on average.
Our Type I estimate is based on suitable bounds on the Fourier Transform
of the set \(\mathcal {A}\). We recall our definition of the function \(F_Y\) from (3.1), which is a normalized version of \(S_\mathcal {A}\). In particular, \(|S_\mathcal {A}(\theta )|=\#\mathcal {A}\cdot F_X(\theta )\). The two key lemmas which we use in this section are the following.
Lemma 8.1
(Large sieve estimate) We have
Lemma 8.2
(\(\ell ^\infty \) bound) Let \(q<Y^{1/3}\) be of the form \(q=q_1q_2\) with \((q_1,10)=1\) and \(q_1>1\), and let \(|\eta |<Y^{-2/3}/2\). Then for any integer a coprime with q we have
for some absolute constant \(c>0\).
Proof of Proposition 7.1 assuming Lemma 8.1 and Lemma 8.2
By Möbius inversion and using additive characters, we have for \((q,10)=1\)
We write \(b/d q=b'/d q'\) with \((b',q')=1\), and separate the terms with \(q'=1\). We then let \(b'/d q'=b''/d' q'\) with \((b'',d' q')=1\). For \((q,10)=1\) we see that this representation is unique for all b, d under consideration. Thus
We note that \(\#\{a\in \mathcal {A}:(a,10)=1\}=\kappa \#\mathcal {A}\). Summing over \(q<Q\) with \((q,10)=1\) and letting \(q=q' q''\), we obtain
Here we recall our notation that \(q'\sim Q_1\) means \(q'\in (Q_1/10,Q_1]\). By Lemma 8.1 we have for any d|10
which gives the required bound if \(Q_1>(\log {X})^{4A+8}\) on recalling that \(Q_1\le Q\le X^{50/77}(\log {X})^{-2A-2}\). In the case \(Q_1\le (\log {X})^{4A+8}\) we instead use Lemma 8.2, which gives
Thus we see that the bound (8.1) is \(O_A(\#\mathcal {A}/(\log {X})^A)\) in either case, as required. \(\square \)
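The first step of the proof above detects divisibility by q with additive characters. The sketch below checks this numerically for a small case, and also computes the proportion of elements of \(\mathcal {A}\) coprime to 10. It assumes that \(\mathcal {A}\) consists of all \(\sum _{i<k}n_i10^i\) with every digit \(n_i\ne a_0\) (leading zeros counted as digits) and that \(S_\mathcal {A}(\theta )=\sum _{a\in \mathcal {A}}e(a\theta )\); the function names are hypothetical.

```python
import cmath
from itertools import product
from math import gcd


def restricted_digit_set(k, a0):
    """All n = sum n_i 10^i (0 <= i < k) with every digit n_i != a0.

    Assumption: leading zeros count as digits, so the set has 9**k elements.
    """
    digits = [d for d in range(10) if d != a0]
    return [sum(d * 10**i for i, d in enumerate(ds))
            for ds in product(digits, repeat=k)]


def S_A(A, theta):
    """Exponential sum over A at frequency theta."""
    return sum(cmath.exp(2j * cmath.pi * a * theta) for a in A)


def count_multiples_via_characters(A, q):
    """#{a in A : q | a} computed as (1/q) * sum_{b mod q} S_A(b/q)."""
    return (sum(S_A(A, b / q) for b in range(q)) / q).real


def coprime_to_10_proportion(A):
    """Proportion of elements of A coprime to 10."""
    return sum(1 for a in A if gcd(a, 10) == 1) / len(A)


A = restricted_digit_set(3, 7)
q = 13
print(count_multiples_via_characters(A, q), sum(1 for a in A if a % q == 0))
print(coprime_to_10_proportion(A))
```

Counting admissible last digits suggests the proportion is 3/9 when \(a_0\in \{1,3,7,9\}\) and 4/9 otherwise; this is our own count under the stated assumptions, not a formula quoted from Proposition 7.1.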
We are left to establish Proposition 7.2 and Lemmas 8.1 and 8.2.
9 Type II estimate
In this section we reduce our ‘Type II’ estimate to various major arc and minor arc estimates. In particular, we will reduce the proof of Proposition 7.2 to the proof of Propositions 9.1, 9.2 and 9.3. We first recall the statement of Proposition 7.2, which allows us to count integers in \(\mathcal {A}\) with a specific type of prime factorization provided such numbers always have a ‘conveniently sized’ factor.
Proposition
(Type II estimate Proposition 7.2 restated) Let \(\eta >0\), and let \(\ell \le 2\eta ^{-1}\). Let \(\mathcal {R}\subseteq \mathcal {Q}_\ell (\eta )\) be a closed convex polytope in \(\mathbb {R}^\ell \) which has the property that
for some set \(\mathcal {I}\subseteq \{1,\ldots ,\ell \}\). Then we have
where
To avoid technical issues due to the fact that \(\sum _{n<Y}\mathbf {1}_{\mathcal {A}}(n)\) can fluctuate with Y, we will replace our counts \(\mathbf {1}_{\mathcal {R}}(n)\) with a weight \(\Lambda _{\mathcal {R}}\), where for a set \(\mathcal {R}\subseteq [\eta ,1]^\ell \) we define
We note that in \(\Lambda _\mathcal {R}\) the conditions are on \(\log {p_i}/\log {X}\), whereas in \(\mathbf {1}_{\mathcal {R}}\) the conditions are on \(\log {p_i}/\log {n}\). If every \(\mathbf {e}\in \mathcal {R}\) has \(e_1\le \cdots \le e_\ell \) then at most one term occurs in the summation, so \(\Lambda _{\mathcal {R}}\) simplifies to
We prove Proposition 7.2 by an application of the Hardy–Littlewood circle method, whereby we study the functions
Proposition 7.2 then relies on the following three components.
Proposition 9.1
(Major arcs) Fix \(\eta >0\) and let \(\ell \in \mathbb {Z}\) satisfy \(1\le \ell \le 2/\eta \). Let \(\delta =(\log \log {X})^{-1}\), and let \(\mathcal {R}_X=\mathcal {R}_X(a_1,\ldots ,a_{\ell -1})\) be given by
for some \(a_1,\ldots ,a_{\ell -1}\in \mathbb {R}\) satisfying \(\min _i a_i\ge \eta /2\) and \(\sum _{i=1}^{\ell -1}a_i<1-\eta /2\).
Let \(\mathcal {M}=\mathcal {M}(C)\) be given by
Then
Here \(\kappa _\mathcal {A}\) is the constant given in Proposition 7.2. The implied constant depends on C and \(\eta \), but not on \(\mathcal {R}_X\) or \(a_1,\ldots ,a_{\ell -1}\).
Proposition 9.2
(Generic minor arcs) Fix \(\eta >0\) and let \(\ell \in \mathbb {Z}\) satisfy \(1\le \ell \le 2/\eta \). Let \(\mathcal {R}\subseteq \mathbb {R}^\ell \) be a closed convex polytope. Let \(\mathcal {M}=\mathcal {M}(C)\) be as in Proposition 9.1.
Then there is some exceptional set \(\mathcal {E}\subseteq [0,X]\) with
such that
The implied constant depends on \(\eta \), but not on \(\mathcal {R}\).
Proposition 9.3
(Exceptional minor arcs) Let \(A>0\). Let \(\eta \), \(\ell \), \(\mathcal {R}_X=\mathcal {R}_X(a_1,\ldots ,a_{\ell -1})\) and \(\mathcal {M}=\mathcal {M}(C)\) be as given in Proposition 9.1. Let \(a_1,\ldots ,a_{\ell -1}\) in the definition of \(\mathcal {R}_X\) satisfy \(\sum _{i\in \mathcal {I}}a_i\in [9/25+\epsilon /2,17/40-\epsilon /2]\cup [23/40+\epsilon /2,16/25-\epsilon /2]\) for some \(\mathcal {I}\subseteq \{1,\ldots ,\ell -1\}\), and let \(C=C(A,\eta )\) in the definition of \(\mathcal {M}\) be sufficiently large in terms of A and \(\eta \). Let \(\mathcal {E}\subseteq [0,X]\) be any set such that \(\#\mathcal {E}\le X^{23/40}\). Then we have
The implied constant depends on \(\eta \) and A, but not on \(\mathcal {R}_X\) or \(a_1,\ldots ,a_{\ell -1}\).
We expect the contribution from the major arcs \(\mathcal {M}\) to give the main contribution. Proposition 9.1 shows that we can get an asymptotic formula from frequencies in \(\mathcal {M}\). Proposition 9.2 shows that most frequencies contribute negligibly, and that any significant contribution must come from some small exceptional set \(\mathcal {E}\). (In view of Proposition 9.1, we must have \(\mathcal {E}\) contains elements of \(\mathcal {M}\) and so \(\mathcal {E}\) is non-empty). We would expect that we can take \(\mathcal {E}=\mathcal {M}\), but cannot quite show this. However, Proposition 9.3 shows that \(\mathcal {E}{\setminus }\mathcal {M}\) contributes negligibly to our sum, which is sufficient for our purposes.
Proof of Proposition 7.2 assuming Propositions 9.1, 9.2 and 9.3 and Lemma 7.4
Let \(\delta =(\log \log {X})^{-1}\). Clearly we may assume that \(\delta \) is sufficiently small in terms of \(\eta \), since otherwise the result is trivial. We note that \(\ell \ge 2\), since the sum of coordinates of points in \(\mathcal {R}\) is 1 but the sum over a non-trivial subset of them lies in \([9/25,17/40]\). Given reals \(a_1,\ldots ,a_{\ell -1}\ge 0\) and \(\gamma >0\) and a set \(\mathcal {S}\subseteq \mathbb {R}^\ell \), let
We see that \(\mathbf {1}_{\mathcal {S}}\) and \(\tilde{\mathbf {1}}_{\mathcal {S}}\) differ in that the denominators of the fractions are \(\log {n}\) and \(\log {X}\) respectively.
We cover \([\eta ,1]^{\ell -1}\) by \(O(\delta ^{-(\ell -1)})\) disjoint hypercubes \(\mathcal {C}(\mathbf {a};\delta )\) of side length \(\delta \) (for example, we can take all \(\mathbf {a}\in \{0,\delta ,2\delta ,\ldots ,\lceil \delta ^{-1}\rceil \delta \}^{\ell -1}\)). Let \(\overline{\mathcal {R}}\subseteq [\eta ,1]^{\ell -1}\) denote the projection of \(\mathcal {R}\) onto the first \(\ell -1\) coordinates (which is also a closed convex polytope). We see that if \(n\in [X^{1-\delta ^2},X]\) then \(\log {n}\) and \(\log {X}\) differ by a factor of at most \(1-\delta ^2\). In particular, if \(\log {p_j}/\log {X}\in [a_j,a_j+\delta ]\) then certainly \(\log {p_j}/\log {n}\in [a_j,a_j+2\delta ]\). This means that if \(\mathcal {C}(\mathbf {a};2\delta )\subseteq \overline{\mathcal {R}}\) and \(\log {p_j}/\log {X}\in [a_j,a_j+\delta ]\) for all \(j\le \ell -1\), then \(\mathbf {1}_{\mathcal {R}}(p_1\cdots p_\ell )=1\) for all \(p_\ell \in [X^{1-\delta ^2}/p_1\cdots p_{\ell -1},X/p_1\cdots p_{\ell -1}]\). Thus for \(n\in [X^{1-\delta ^2},X]\)
If \(\mathcal {C}(\mathbf {a};2\delta )\cap \overline{\mathcal {R}}\ne \emptyset \) but \(\mathcal {C}(\mathbf {a};2\delta )\not \subseteq \overline{\mathcal {R}}\) then \(\mathcal {C}(\mathbf {a};2\delta )\) intersects the boundary \(\partial \overline{\mathcal {R}}\) of \(\overline{\mathcal {R}}\).
Since \(\mathbf {1}_{\mathcal {R}}(n)\) is supported on n with \(\ell \) prime factors all at least \(n^\eta \), if \(n=p_1\cdots p_\ell \ge X^{1-\delta ^2}\) and \(\mathbf {1}_{\mathcal {R}}(n)=1\) then there is an \(\mathbf {a}\) with \(a_i\ge \eta /2\) such that \(\tilde{\mathbf {1}}_{\mathcal {C}(\mathbf {a};\delta )}(p_1\cdots p_{\ell -1})=1\). Moreover, since \(n\ge X^{1-\delta ^2}\) we have \(p_{\ell }\ge X^{1-\delta ^2}/p_1\cdots p_{\ell -1}\ge X^{1-\sum _{i=1}^{\ell -1}a_i-\ell \delta }\), so in fact \(\tilde{\mathbf {1}}_{\mathcal {C}^+(\mathbf {a};\delta )}(n)=1\). Since the cubes are disjoint, this happens for exactly one choice of \(\mathbf {a}\). Therefore we have for any \(n\in [X^{1- \delta ^2},X]\)
Using this with (9.2) to split the summation over hypercubes \(\mathcal {C}\), we find
Re-inserting terms with \(m\le X^{1-\delta ^2}\) and \(n\le X^{1-\delta ^2}\), we obtain
The final two terms above satisfy
We now consider the contribution to (9.3) from \(\mathcal {C}(\mathbf {a};2\delta )\cap \partial \overline{\mathcal {R}}\ne \emptyset \). Since \(\mathcal {R}\subseteq [\eta ,1]^\ell \), we must have \(a_i\ge \eta /2\) and since the coordinates of points in \(\mathcal {R}\) sum to 1 we also have \(\sum _{i=1}^{\ell -1}a_i\le 1-\eta /2\). Since \(\tilde{\mathbf {1}}_{\mathcal {C}^+(\mathbf {a};\delta )}(n)\) and \(\Lambda _{\mathcal {C}^+(\mathbf {a};\delta )}(n)\) have the same support, which is restricted to integers with no factor less than \(X^{\eta /4}\), we have \(\tilde{\mathbf {1}}_{\mathcal {C}^+(\mathbf {a};\delta )}(n)\ll _\eta (\log {X})^{-\ell } \Lambda _{\mathcal {C}^+(\mathbf {a};\delta )}(n)\). Thus we have
Here we used the triangle inequality in the final line. By the prime number theorem, for any choice of \(\mathbf {a}\in [0,2]^{\ell -1}\) we have
Since \(\mathcal {R}\) is a closed convex polytope, so is \(\overline{\mathcal {R}}\subseteq \mathbb {R}^{\ell -1}\). Therefore there are \(O_\mathcal {R}(\delta ^{-(\ell -2)})\) hypercubes \(\mathcal {C}(\mathbf {a};2\delta )\) which intersect \(\partial \overline{\mathcal {R}}\). Thus the contribution to (9.3) from the final term of (9.5) is
We now consider the terms with \(\mathcal {C}(\mathbf {a};2\delta )\subseteq \overline{\mathcal {R}}\). Since \(\mathcal {R}\subseteq \mathcal {Q}_\ell (\eta )\), if \(\mathbf {e}\in \mathcal {R}\) then \(e_1\le \cdots \le e_\ell \), so if \(\mathbf {e}'\in \overline{\mathcal {R}}\) then \(e_1'\le \cdots \le e_{\ell -1}'\). Therefore, since \(\mathcal {C}(\mathbf {a};2\delta )\subseteq \overline{\mathcal {R}}\),
Since \(\sum _{i=1}^\ell e_i=1\) and \(e_{\ell -1}\le e_\ell \) for \(\mathbf {e}\in \mathcal {R}\), if \(\mathbf {e}'\in \overline{\mathcal {R}}\) then \(e_{\ell -1}'\le 1-\sum _{i=1}^{\ell -1}e_i'\). Therefore, since \((a_1+2\delta ,\ldots ,a_{\ell -1}+2\delta )\in \mathcal {C}(\mathbf {a};2\delta )\subseteq \overline{\mathcal {R}}\), we have
Together (9.7) and (9.8) imply that at most one term occurs in the summation in \(\Lambda _{\mathcal {C}^+(\mathbf {a};\delta )}\). Thus for such \(\mathcal {C}(\mathbf {a};2\delta )\), since the coordinates are localized, we have
Thus
Since any \(n=p_1\cdots p_\ell \) contributing to the second term above is counted at most once and has all prime factors at least \(X^{\eta /4}\), we have
Here we used Lemma 7.4 and (5.2) in the final line. Combining (9.4), (9.5), (9.6), (9.10) and (9.11), we find (9.3) is bounded by
Thus to establish Proposition 7.2 it is sufficient to show that for any \(A>0\), we have
uniformly for every hypercube \(\mathcal {C}(\mathbf {a};\delta )\) of side length \(\delta \) with \(\mathcal {C}(\mathbf {a};2\delta )\cap \overline{\mathcal {R}}\ne \emptyset \).
Since \(\sum _{i\in \mathcal {I}}e_i\in [9/25+\epsilon ,17/40-\epsilon ]\) if \(\mathbf {e}\in \mathcal {R}\), by taking \(\mathcal {J}=\mathcal {I}\) or \(\mathcal {J}=\{1,\ldots ,\ell \}\backslash \mathcal {I}\), we must have that \(\sum _{i\in \mathcal {J}}a_i\in [9/25+\epsilon /2,17/40-\epsilon /2]\cup [23/40+\epsilon /2,16/25-\epsilon /2]\) for some \(\mathcal {J}\subseteq \{1,\ldots ,\ell -1\}\) for any \(\mathbf {a}\) such that \(\mathcal {C}(\mathbf {a};2\delta )\cap \mathcal {R}\ne \emptyset \). Since \(\mathcal {R}\subseteq [\eta ,1]^\ell \), we have \(\min _i a_i\ge \eta /2\) and \(\sum _{i=1}^{\ell -1}a_i<1-\eta /2\) if \(\mathcal {C}(\mathbf {a};2\delta )\cap \mathcal {R}\ne \emptyset \). Thus all hypercubes under consideration satisfy the assumptions on \(\mathcal {R}_X\) of Propositions 9.1–9.3.
By Fourier expansion we have
We split the summation over b into the sets \(\mathcal {M}\), \([0,X)\backslash (\mathcal {E}\cup \mathcal {M})\) and \(\mathcal {E}\backslash \mathcal {M}\), where \(\mathcal {M}\) is as given by Proposition 9.1, and \(\mathcal {E}\) is the set whose existence is asserted by Proposition 9.2. We then apply Propositions 9.1, 9.2 and 9.3 respectively to each set in turn. Let \(H_{\mathcal {C}^+}(\theta )=S_{\mathcal {A}}(\theta )S_{\mathcal {C}^+(\mathbf {a};\delta )}(-\theta )\). For C in the definition of \(\mathcal {M}\) sufficiently large in terms of A and \(\eta \), this gives
This gives (9.12), and hence completes the proof of Proposition 7.2. \(\square \)
Since Lemma 7.4 follows from Proposition 7.1, which in turn follows from Lemmas 8.1 and 8.2, we are left to establish Lemmas 8.1, 8.2, Propositions 9.1, 9.2 and 9.3.
10 Fourier estimates
In this section we collect various distributional bounds on the Fourier transform
which will underpin our later analysis. In particular, we establish Lemma 8.1 and Lemma 8.2, as well as several other related estimates. Specifically, Lemma 8.1 is a special case of Lemma 10.5, and Lemma 8.2 is the same as Lemma 10.1.
We recall our normalized version of \(S_{\mathcal {A}}(\theta )\) from (3.1)
We recall that we assume Y is an integral power of ten whenever we encounter \(F_Y\) to avoid some unimportant technicalities. In particular,
for all \(\theta \) and Y. The key property of \(F_Y\) which we exploit is that it has an exceptionally nice product form. If \(Y=10^k\), then letting \(n=\sum _{i=0}^{k-1}n_i 10^i\) have decimal digits \(n_{k-1},\ldots , n_0\), we find
We note that \(F_Y\) is periodic modulo 1, and that the above product formula gives the identity
(We recall that we assume that U and V are both powers of 10 in such a statement.)
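The product structure described above can be checked numerically for small k. The following sketch assumes the normalization \(F_{10^k}(\theta )=9^{-k}\bigl |\sum _{n\in \mathcal {A},\,n<10^k}e(n\theta )\bigr |\) (consistent with \(|S_\mathcal {A}(\theta )|=\#\mathcal {A}\cdot F_X(\theta )\)) and that the product has one factor \(\frac{1}{9}\bigl |\sum _{n_i\ne a_0}e(n_i10^i\theta )\bigr |\) per digit position; the function names are illustrative.

```python
import cmath
from itertools import product


def e(x):
    """e(x) = exp(2*pi*i*x)."""
    return cmath.exp(2j * cmath.pi * x)


def F_direct(k, a0, theta):
    """F_{10^k}(theta) from the definition: normalized sum over the digit set."""
    digits = [d for d in range(10) if d != a0]
    total = 0
    for ds in product(digits, repeat=k):
        n = sum(d * 10**i for i, d in enumerate(ds))
        total += e(n * theta)
    return abs(total) / 9**k


def F_product(k, a0, theta):
    """F_{10^k}(theta) as a product of k single-digit factors."""
    out = 1.0
    for i in range(k):
        out *= abs(sum(e(d * 10**i * theta) for d in range(10) if d != a0)) / 9
    return out


theta = 0.123
print(F_direct(3, 7, theta), F_product(3, 7, theta))
# Multiplicativity F_{UV}(theta) = F_U(theta) * F_V(U*theta) with U = 10^2, V = 10^3.
print(F_product(5, 7, theta), F_product(2, 7, theta) * F_product(3, 7, 100 * theta))
```

The product form costs O(k) operations per frequency rather than \(9^k\), which is what makes the later moment computations feasible.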
Lemma 10.1
(\(\ell ^\infty \) bound, Lemma 8.2 restated) Let \(q<Y^{1/3}\) be of the form \(q=q_1q_2\) with \((q_1,10)=1\) and \(q_1>1\), and let \(|\eta |<Y^{-2/3}/2\). Then for any integer a coprime with q we have
for some absolute constant \(c>0\).
Proof
From the bounds coming from truncated Taylor expansions, we have that
We recall that \(\Vert \cdot \Vert \) denotes the distance to the nearest integer. This implies that
For the final inequality we used the convexity of \(\exp (-x^2)\). We substitute this bound into our expression (10.2) for \(F_Y\), which gives for \(Y=10^k\)
If \(t=a/q_1q_2\) with \(q_1>1\), \((q_1,10)=1\) and \((a,q_1)=1\), then \(\Vert 10^i t\Vert \ge 1/q_1q_2\) for all i. Similarly, if \(t=a/q_1q_2+\eta \) with \(a,q_1,q_2\) as above, with \(|\eta |<Y^{-2/3}/2\) and with \(q=q_1q_2<Y^{1/3}\) then for \(i\le k/3\) we have \(\Vert 10^i t\Vert \ge 1/q-10^i|\eta |\ge 1/2q\). However, if \(\Vert 10^i t\Vert <1/20\) then \(\Vert 10^{i+1}t\Vert =10\Vert 10^i t\Vert \). Thus, for any interval \(\mathcal {I}\subseteq [0,k/3]\) of length \(\log {q}/\log {10}\), there must be some integer \(i\in \mathcal {I}\) such that \(\Vert 10^i (a/q+\eta )\Vert >1/200\). This implies that
Substituting this into the bound for F, and recalling we assume \(q<Y^{1/3}\) gives the result. \(\square \)
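The key step of the proof is that \(\Vert 10^i t\Vert \) cannot stay small over a long run of consecutive i. The sketch below checks a simplified version of this in exact arithmetic: for \(t=a/q\) with no perturbation (\(\eta =0\)), every window of m consecutive integers i, where m is the least integer with \(10^m\ge q\), contains some i with \(\Vert 10^i t\Vert \ge 1/20\). The window length, the threshold 1/20 and the function names are simplifying assumptions for illustration; the lemma itself works with the perturbed frequency and the threshold 1/200.

```python
from fractions import Fraction


def norm_to_nearest_integer(num, q):
    """||num/q||, the distance from num/q to the nearest integer, as an exact fraction."""
    r = num % q
    return Fraction(min(r, q - r), q)


def window_length(q):
    """Smallest m >= 1 with 10**m >= q."""
    m = 1
    while 10**m < q:
        m += 1
    return m


def worst_window(a, q, n_windows=40):
    """Minimum over windows [s, s+m) of max_i ||10^i a/q||."""
    m = window_length(q)
    norms = [norm_to_nearest_integer(pow(10, i, q) * a, q)
             for i in range(n_windows + m)]
    return min(max(norms[s:s + m]) for s in range(n_windows))


# q = q1*q2 with q1 > 1 coprime to 10 and a coprime to q1, so 10^i a/q is never an integer.
for a, q in [(1, 3), (2, 7), (7, 30), (13, 77), (1, 999)]:
    print(a, q, worst_window(a, q))
```

The reasoning is that if \(\Vert 10^i t\Vert <1/20\) throughout a window, then the norm is multiplied by 10 at each step, which forces the first norm in the window below \(1/q\), contradicting \(\Vert 10^i t\Vert \ge 1/q\).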
Lemma 10.2
(Markov moment bound) Let J be a positive integer. Let \(\lambda _{t,J}\) be the largest eigenvalue of the \(10^J\times 10^J\) matrix \(M_{t}\), given by
where
Then we have that
Proof
We recall the product formula (10.3) with \(Y=10^k\)
where we interpret the term in parentheses as 9 if \(\Vert 10^{i-1}\theta \Vert =0\). Writing \(\theta =\sum _{i=1}^k t_i 10^{-i}\) for \(t_i\in \{0,\ldots ,9\}\), we see that the \((k-j){\mathrm{th}}\) term in the product depends only on \(t_{k-j},\ldots ,t_k\). Moreover, the value of the term is mainly dependent on the first few of these digits by continuity. Thus we may approximate the absolute value of \(F_Y(\theta )\) by a product where the \(j{\mathrm{th}}\) term depends only on \(t_{j},\ldots ,t_{j+J}\) for some constant J. Explicitly, we have
where we put \(t_j=0\) for \(j>k\).
With this formulation we can interpret the above bound in terms of the probability of a walk on \(\{0,\ldots ,9,\infty \}^k\). Let \(t\in \mathbb {R}\) be given. Consider an order-J Markov chain \(X_1,X_2,\ldots \) where for \(a,a_1,\ldots ,a_n\in \{0,\ldots ,9\}\) we have for \(n>J\)
for some suitably small constant c (so that the probability that \(X_n\in \{0,\ldots ,9\}\) is less than 1). To make this a genuine Markov chain we choose the probability that \(X_n=\infty \) given \(X_{n-1},\ldots ,X_{n-J}\) to be such that the probabilities add up to 1, and if \(X_n=\infty \) then we have that \(X_{n+1}=\infty \) with probability 1.
Then we have that
The sum (over all paths in \(\{0,\ldots ,9\}^k\)) of the probabilities of paths is a linear combination of the entries in the \(k{\mathrm{th}}\) power of the transition matrix restricted to \(\{0,\ldots ,9\}\). Thus such a moment estimate is a linear combination of the \(k{\mathrm{th}}\) power of the eigenvalues of this matrix. This allows us to estimate any moment of \(F_{Y}(a/Y)\) over \(a\in [0,Y)\) uniformly for all k by performing a finite eigenvalue calculation. In particular, this gives us a numerical approximation to the distribution function of \(F_Y\), which becomes arbitrarily good as J increases.
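The mechanism used here, that a sum over paths of products of transition weights equals a sum of entries of a matrix power and hence grows like the \(k{\mathrm{th}}\) power of the largest eigenvalue, can be illustrated on a toy example. The \(2\times 2\) matrix below is a hypothetical stand-in chosen for exact integer arithmetic; it is not the matrix \(M_t\) of the lemma.

```python
from itertools import product


def mat_mul(A, B):
    """Product of two square matrices given as lists of lists."""
    n = len(A)
    return [[sum(A[i][l] * B[l][j] for l in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(M, k):
    """k-th power of M by repeated multiplication (k >= 1)."""
    out = M
    for _ in range(k - 1):
        out = mat_mul(out, M)
    return out


def path_sum_bruteforce(M, k):
    """Sum over all paths (i_0, ..., i_k) of the product of the k step weights."""
    n = len(M)
    total = 0
    for path in product(range(n), repeat=k + 1):
        weight = 1
        for u, v in zip(path, path[1:]):
            weight *= M[u][v]
        total += weight
    return total


def path_sum_matrix(M, k):
    """The same path sum, read off as the sum of all entries of M**k."""
    return sum(sum(row) for row in mat_pow(M, k))


M = [[2, 1], [1, 1]]  # toy positive matrix; largest eigenvalue is (3 + sqrt(5)) / 2
print(path_sum_bruteforce(M, 6), path_sum_matrix(M, 6))
print(path_sum_matrix(M, 30) / path_sum_matrix(M, 29), (3 + 5**0.5) / 2)
```

The brute-force sum has exponentially many terms in k, while the matrix form needs only k multiplications of a fixed-size matrix.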
Explicitly, let \(M_{t}\) be the \(10^J\times 10^J\) matrix given by
and let \(\lambda _{t,J}\) be the absolute value of the largest eigenvalue of \(M_t\). Since \(G(t_1,\ldots ,t_{J+1})>0\) for all \(t_1,\ldots ,t_{J+1}\), we have that \(M_t\) is irreducible, and so each eigenspace corresponding to an eigenvalue of modulus \(\lambda _{t,J}\) has dimension 1 by the Perron-Frobenius Theorem. Let \((M_t)_{i,j}=m_{i,j}\). By expanding out the \(k{\mathrm{th}}\) power, we have
We recall that \(m_{i,j}=0\) unless there are \(a_1,\ldots ,a_{J+1}\in \{0,\ldots ,9\}\) such that
Thus the product \(m_{i,i_1}m_{i_1,i_2}\cdots m_{i_{k-1},j}\) is non-zero only if there are \(a_1,\ldots ,a_{k+J}\in \{0,\ldots ,9\}\) such that
If this is the case then we have
Thus, fixing \(i=1\) so that \(a_{k+1}=\dots =a_{J+k}=0\), and summing over j, we have that
On the other hand, by the eigenvalue expansion of \(M_{t}\), we have
This gives the result. \(\square \)
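As a toy illustration of the mechanism used in this proof, the following Python sketch checks two facts on a small positive matrix: that the sum over paths of products of entries equals a row sum of the \(k{\mathrm{th}}\) power, and that these sums grow like the \(k{\mathrm{th}}\) power of the largest eigenvalue. The matrix `M` below is a hypothetical \(3\times 3\) example, not the matrix \(M_t\) (whose entries depend on the function G), and all function names are assumptions made for illustration.

```python
from itertools import product

# Hypothetical positive matrix (NOT the paper's M_t); every column sums to 0.9.
M = [[0.5, 0.2, 0.1],
     [0.3, 0.4, 0.2],
     [0.1, 0.3, 0.6]]

def mat_vec(A, v):
    """Multiply the square matrix A by the column vector v."""
    return [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]

def path_sum_bruteforce(A, i, k):
    """Sum of A[i][i1]*A[i1][i2]*...*A[i_{k-1}][j] over all i1,...,i_{k-1},j."""
    total = 0.0
    for path in product(range(len(A)), repeat=k):
        weight, prev = 1.0, i
        for nxt in path:
            weight *= A[prev][nxt]
            prev = nxt
        total += weight
    return total

def path_sum_matrix_power(A, i, k):
    """The same quantity computed as the i-th entry of A^k applied to (1,...,1)."""
    v = [1.0] * len(A)
    for _ in range(k):
        v = mat_vec(A, v)
    return v[i]

def largest_eigenvalue(A, iters=200):
    """Power iteration; converges for a matrix with strictly positive entries."""
    v, lam = [1.0] * len(A), 0.0
    for _ in range(iters):
        w = mat_vec(A, v)
        lam = max(w)
        v = [x / lam for x in w]
    return lam

if __name__ == "__main__":
    lam = largest_eigenvalue(M)
    for k in (5, 10, 20, 40):
        print(k, path_sum_matrix_power(M, 0, k) / lam ** k)
```

The printed ratios stabilise as k grows, which is the toy analogue of bounding a moment by a constant multiple of \(\lambda _{t,J}^k\).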
Lemma 10.3
(\(\ell ^1\) bound) We have for any \(k\in \mathbb {N}\)
In particular, we have for \(Y_1\asymp Y_2\asymp Y_3\)
and
Here \(27/77\approx 0.35\) is slightly larger than 1/3, and \(50/77\approx 0.65\).
Proof
This follows from Lemma 10.2 and a numerical bound on \(\lambda _{1,4}\). Specifically, by Lemma 10.2 taking \(J=4\) we find
A numerical calculation (see Footnote 2) reveals that
for all choices of \(a_0\in \{0,\ldots ,9\}\). Thus, letting \(Y=10^k\) we have \(\lambda _{1,4}^k<Y^{27/77}\), which gives the first result.
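For small k the \(\ell ^1\) sum can also be computed directly, which gives a rough sanity check on the size of the exponent. The sketch below assumes the normalisation \(F_Y(\theta )=Y^{-\log {9}/\log {10}}|\sum _{n<Y}\mathbf {1}_{\mathcal {A}}(n)e(n\theta )|\) together with the digit product formula (10.3), neither of which is restated in this section; it works with k-digit strings (leading zeros allowed) avoiding the digit \(a_0\). The function names are hypothetical, and the empirical exponent for such small k need not be below 27/77, since the lemma carries an implied constant.

```python
import cmath
import math

def F_at(a, k, a0):
    """Assumed product form of F_Y(a/Y) for Y = 10**k:
    prod_{i<k} (1/9) |sum_{d != a0} e(d * 10**i * a / Y)|."""
    Y = 10 ** k
    value = 1.0
    for i in range(k):
        theta_i = ((a * 10 ** i) % Y) / Y  # reduce mod 1 before exponentiating
        s = sum(cmath.exp(2j * math.pi * d * theta_i)
                for d in range(10) if d != a0)
        value *= abs(s) / 9
    return value

def l1_sum(k, a0):
    """Sum of F_Y(a/Y) over 0 <= a < Y = 10**k."""
    return sum(F_at(a, k, a0) for a in range(10 ** k))

def l2_sum(k, a0):
    """Sum of F_Y(a/Y)**2 over 0 <= a < Y; Parseval predicts (10/9)**k."""
    return sum(F_at(a, k, a0) ** 2 for a in range(10 ** k))

if __name__ == "__main__":
    a0 = 7  # illustrative choice of excluded digit
    for k in (1, 2, 3, 4):
        s = l1_sum(k, a0)
        print(k, s, math.log(s) / math.log(10 ** k), 27 / 77)
```

The Parseval identity in `l2_sum` is an independent check that the product form has been implemented consistently.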
For the second bound, let \(U_1=\max (1,Y_3/Y_2)\). Since \(Y_3\asymp Y_2\), we have \(U_1\ll 1\). Any \(a<Y_1\) can be written as \(a=a_1+U_1a_2+Y_3 a_3\) for some \(0\le a_1< U_1\ll 1\), \(0\le a_2<Y_3/U_1=\min (Y_3,Y_2)\) and \(0\le a_3< Y_1/Y_3\ll 1\). Since there are O(1) choices of \(a_1,a_3\) and these can be absorbed into the supremum over \(\beta \), we see that it suffices to show
Since \(F_{Y_2}\ge 0\) we can extend the summation to \(a_2<Y_2\). Thus without loss of generality we may assume that \(Y_1=Y_2=Y_3=Y=10^k\). We see that
Here we used the fact that \(G(t_i,\ldots ,t_{i+4})\) is bounded away from 0 for all \(t_1,\ldots ,t_{k}\in \{0,\ldots ,9\}\) since it is the maximal absolute value of a trigonometric polynomial over an interval. Since F is periodic modulo 1 we see that
and so the second bound of the lemma follows from (10.6), (10.4) and (10.5) on letting \(a=\sum _{i=1}^k t_i/10^i\). For the final bound we integrate (10.6) over \(\eta \in [0,Y^{-1}]\) and sum over \(t_1,\ldots ,t_k\in \{0,\ldots ,9\}\), giving
\(\square \)
Lemma 10.4
(\(235/154{\mathrm{th}}\) moment bound) We have that
Here \(235/154\approx 1.5\) and \(59/433\approx 0.14\). We recall that \(n\sim X\) means that \(X/10<n\le X\).
Proof
This follows from Lemma 10.2 and a numerical bound for \(\lambda _{235/154,4}\). Explicitly, we take \(J=4\) and \(Y=10^k\). By Lemma 10.2 we have
A numerical calculation (see Footnote 3) reveals that
for all choices of \(a_0\in \{0,\ldots ,9\}\). Substituting this in the bound above gives the result. \(\square \)
Lemma 10.5
(Large sieve estimates) We have
and for any integer d, we have
Proof
For each \(a\le q\), let \(|\eta _{a}|\) maximize \(F_U(a/q+\eta )\) over \(|\eta |<\delta \). Since the fractions a / q are all separated from one another by at least 1 / q, we have for any t
Thus, considering \(t=b/q-\beta \), we see that
We have that
Thus integrating over \(s\in [t-\gamma ,t+\gamma ]\) for some \(\gamma >0\), we have
This implies that
Taking \(\gamma =1/2q\), we obtain
Writing \(U=10^u\) and \(n=\sum _{i=0}^{u-1}n_i 10^i\), we see that
Writing \(n=\sum _{j=0}^{u-1}n_j10^{j-1}\) and using the triangle inequality, we have
We recall the function G from Lemma 10.2. Since \(G(t_1,\ldots ,t_{1+J})\) is bounded away from 0, we see that for \(\eta \ll U^{-1}\)
Thus, integrating over \(\eta \in [0,U^{-1}]\), taking \(J=4\), and using Lemma 10.3, we obtain
By Lemma 10.3 we have
Combining (10.10), (10.9), (10.8) and (10.7), we obtain
Combining this with the trivial bound
for \(U\le Y\), and choosing U maximally subject to \(U\le q\) and \(U\le Y\) gives the first result of the lemma.
The other bounds follow from entirely analogous arguments. In particular we note that for \((a,q)=1\), \(q<Q\), the numbers a / q are separated from one another by \(1/Q^2\), and those with d|q are separated from each other by \(d/Q^2\), so we have the equivalent of (10.7) with \(\delta q\) replaced by \(\delta Q^2\) or \(\delta Q^2/d\) and \(|\eta |\le 1/2q\) replaced by \(|\eta |\le 1/2Q^2\) or \(|\eta |\le d/2Q^2\). \(\square \)
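The spacing claims in the last paragraph of this proof are easy to check exhaustively for small Q. The following sketch (function names are our own, chosen for illustration) lists the reduced fractions a / q with \(q<Q\) and \(d|q\), and computes the smallest gap between neighbours modulo 1 in exact rational arithmetic.

```python
from fractions import Fraction
from math import gcd

def reduced_fractions(Q, d=1):
    """Sorted distinct a/q in [0,1) with gcd(a,q)=1, 1 <= q < Q and d | q."""
    fracs = {Fraction(a, q)
             for q in range(d, Q, d)
             for a in range(q)
             if gcd(a, q) == 1}
    return sorted(fracs)

def min_circular_gap(fracs):
    """Smallest distance between neighbouring fractions, measured modulo 1."""
    gaps = [b - a for a, b in zip(fracs, fracs[1:])]
    gaps.append(1 - fracs[-1] + fracs[0])  # wrap-around gap
    return min(gaps)

if __name__ == "__main__":
    Q = 30
    for d in (1, 2, 3, 5):
        gap = min_circular_gap(reduced_fractions(Q, d))
        print(d, gap, Fraction(d, Q * Q), gap >= Fraction(d, Q * Q))
```

The reason is that two distinct fractions with denominators \(q,q'\) both divisible by d differ by a non-zero multiple of \(d/qq'\).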
Lemma 10.6
(Hybrid Bounds) Let \(E\ge 1\). Then we have
In the above lemma, we emphasize that a, q, d are all integers, but the summation over \(\eta \) is over real numbers which are well-spaced from the condition \(Y(\eta +a/q)\in \mathbb {Z}\).
Proof
We first note that the summand \(a/q+\eta \) runs through fractions b / Y with \(|b|\le E+Y\) since we have the condition \((\eta +a/q)Y\in \mathbb {Z}\). Each fraction b / Y is represented \(O(1+\min (q E/Y,q))\) times, since if \(a_1/q+\eta _1=a_2/q+\eta _2\) then \(a_2=a_1+O(q E/Y)\) and \(\eta _2\) is determined by \(a_1,a_2,\eta _1\). There are \(O(1+E/Y)\) choices of b giving the same fraction \(\ (\mathrm {mod}\ 1)\), and since \(F_Y\) is periodic \(\ (\mathrm {mod}\ 1)\) these all give the same value of \(F_Y(b/Y)\). Thus we may consider only \(b<Y\) with each fraction b / Y occurring \(O((1+E/Y)\min (q E/Y,q))\) times. Thus we see that if \(10 q E\ge Y\) then
In this case the result now follows from Lemma 10.3. Thus we may assume \(q E<Y/10\).
Using the product formula (10.3), we have for \(Y\ge UV\) powers of 10
We also have the trivial bound \(F_{V}(U\theta )\le 1\) of (10.1). For \(UV\le Y\) and \(|\eta |<E/Y\) these give
We choose V and then U to be the largest powers of 10 such that \(V\le Y/q E\) and \(U\le Y/V E\). Note that this choice gives \(U,V\ge 1\) since \(q E<Y/10\) and \(q,E\ge 1\). Thus
where
Since we chose U and V maximally, we have \(V\ge Y/10q E\), so \(q/100\le U\le 10q\). Since \(q E<Y/10\), we may extend the supremum in \(\Sigma _1\) to \(\gamma \le 1/10q\) for an upper bound. Thus, by Lemma 10.5 we have
Similarly, since \(Y/UV\asymp E\), by Lemma 10.3 we have
Putting this together gives the first result.
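The choice of U and V in this argument can be made concrete as follows. The sketch assumes, for simplicity, that E is a positive integer (the lemma allows any real \(E\ge 1\)) and that \(10qE<Y\); the helper names are hypothetical. It confirms the size relations \(V>Y/10qE\), \(q/100\le U\le 10q\) and \(E\le Y/UV<10E\) used in the proof.

```python
from fractions import Fraction

def largest_power_of_10_at_most(x):
    """Largest 10**j with j >= 0 and 10**j <= x, for a rational x >= 1."""
    p = 1
    while p * 10 <= x:
        p *= 10
    return p

def choose_U_V(Y, q, E):
    """V, then U, maximal powers of 10 with V <= Y/(q*E) and U <= Y/(V*E).
    Assumes integers Y (a power of 10), q >= 1, E >= 1 with 10*q*E < Y."""
    V = largest_power_of_10_at_most(Fraction(Y, q * E))
    U = largest_power_of_10_at_most(Fraction(Y, V * E))
    return U, V

if __name__ == "__main__":
    Y = 10 ** 12
    for q, E in [(1, 1), (7, 3), (250, 40), (9999, 12345)]:
        U, V = choose_U_V(Y, q, E)
        print(q, E, U, V, Fraction(Y, U * V))
```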
The second bound follows from an entirely analogous argument. We first split the argument depending on whether \(Q^2E/d\ge Y/10\) or not, and use the final bound of Lemma 10.5 instead of the first bound to handle \(\Sigma _2\). \(\square \)
The argument giving the first bound of Lemma 10.6 is essentially sharp if the \(\ell ^1\) bounds used in the proof are sharp and if q is a divisor of a power of 10 or if \(Q E\ge Y\). When \(Q E\le Y^{1-\epsilon }\) and q is not a divisor of a power of 10, however, we trivially bounded a factor \(F_V(U(a/q+\eta ))\) by 1 in the proof, which we expect not to be tight. Lemma 10.7 below allows us to obtain superior bounds (in certain ranges) provided the denominators do not have large powers of 2 or 5 dividing them.
Lemma 10.7
(Alternative Hybrid Bound) Let \(D,E,Y,Q_1\ge 1\) be integral powers of 10 with \(DE\ll Y\). Let \(q_1\sim Q_1\) with \((q_1,10)=1\) and let \(d\sim D\) satisfy \(d|10^u\) for some \(u\ge 0\). Let
Then we have
In particular, if \(q=d q'\) with \((q',10)=1\) and \(d|10^u\) for some integer \(u\ge 0\), then we have
For example, if \((q,10)=1\) and qE is a sufficiently small power of Y, then we improve the first bound \((q E)^{27/77}\) of Lemma 10.6 in the q-aspect to \(E^{27/77}q^{1/21}\). This improvement is important for our later estimates.
Proof
Choose \(E'\asymp E\) and \(D'\asymp D\) with \(E',D'\ge 1\) integral powers of 10 such that \(E' D'\le Y\). Let V be the largest integral power of 10 such that \(V^2\le Y/D' E'\). Since \(D' E'\le Y\) we have that \(V\ge 1\). Let \(d=d_1d_2d_3\) where \(d_3=(d,D')\) and \(d_2d_3=(d,VD')\).
By the periodicity of F modulo one, the fact \((q_1q_2,d)=1\), and the Chinese remainder theorem, we have
where the dash on \(\sum '\) indicates that \(\eta \) is summed over all reals satisfying
By (10.3), we have \(F_{E' D' V^2}(t)=F_{D'}(t)F_{V^2}(D' t)F_{E'}(D' V^2t)\). Since \(D' E' V^2\le Y\), we have \(F_Y(t)\le F_{D' E' V^2}(t)\). Thus, since F is periodic modulo 1 and \(d_3|D'\) and \(d_2d_3|VD'\), we have
where
Moreover, by (10.3) and Cauchy–Schwarz, we have
Since \(d_2d_3|D' V\), this gives
where
These give
where
Since \((d_1d_2d_3,D')=d_3\) and \((q_1q_2,d)=1\), as \(a'\), \(b_1\) and \(b_2\) go through all residue classes \(\ (\mathrm {mod}\ q_1q_2)\), \(\ (\mathrm {mod}\ d_1)\) and \(\ (\mathrm {mod}\ d_2)\) respectively subject to \((a',q_1q_2)=(b_1+d_1b_2,d_1d_2)=1\), we see that \(D'\beta _2\) goes through all values of \(c/q_1q_2d_1d_2\ (\mathrm {mod}\ 1)\) for \(0< c< q_1q_2d_1d_2\) with \((c,q_1q_2d_1d_2)=1\), and each value is attained exactly once. Similarly, since \((d_1d_2d_3,D' V)=d_2d_3\), we see that \(\beta _3\) goes through every value of \(c/q_1q_2d_1\ (\mathrm {mod}\ 1)\) with \(0< c< q_1q_2d_1\) and \((c,q_1q_2d_1)=1\) exactly once as a goes through the values \(\ (\mathrm {mod}\ q_1q_2)\) and \(b_1\) goes through the values \(\ (\mathrm {mod}\ d_1)\) with \((a,q_1q_2)=(b_1,d_1)=1\).
Thus we have
where
We note that only \(\Sigma _3\) and \(\Sigma _5\) depend on \(q_2\). Thus, summing over \(q_2\sim Q_2\) with \((q_2,10)=1\) we obtain
where \(\Sigma _1\), \(\Sigma _2\) and \(\Sigma _4\) are as above and \(\Sigma _3'\) and \(\Sigma _5'\) are given by
Since \(Y/D' V^2\asymp E\asymp E'\), by Lemma 10.3 we have
We have \(d_2d_3\le d\le D\) and \(DE\ll Y\), so \(E/Y\ll 1/d_2d_3\). Thus, by Lemma 10.5, we have
We are left to bound \(\Sigma _3'\) and \(\Sigma _5'\), which are very similar. Let
We note that \(\Sigma '(q_1,d_1,d_2)\) is the same as \(\Sigma _3'\) except we have increased the range of the supremum, and so we have \(\Sigma _3'\le \Sigma '(q_1,d_1,d_2)\). Moreover, we see that \(\Sigma _5'\) is a special case of \(\Sigma '\) with \(d_2=1\), so \(\Sigma _5'=\Sigma '(q_1,d_1,1)\). Thus it will suffice to get suitable bounds on \(\Sigma '\).
Since \(F_R(\theta )\ge F_V(\theta )\) for \(R\le V\), we may replace \(F_V\) with \(F_R\) where \(R=10^r\) is the largest power of 10 less than \(\min (V,d_1d_2Q_1Q_2^2)\). Since \(R\le V\) and \(D' E V/Y\ll 1/V\), we see all quantities \(\gamma \) occurring in the supremum are of size at most O(1 / R). Given any choice of reals \(\eta _{a,q_2}\ll 1/R\) for \(a\le d_1d_2q_1q_2\) and \(q_2\sim Q_2\) with \((a,d_1d_2q_1q_2)=1\), the numbers \(a/d_1d_2q_1q_2+\eta _{a,q_2}\) can be arranged into \(O(d_1d_2Q_1Q_2^2/R)\) sets such that all numbers in any set are separated by \(\gg 1/R\). (Recall that r is chosen such that \(R\le d_1d_2Q_1Q_2^2\).) Thus, as in the proof of Lemma 10.5 (specifically the argument leading up to (10.8)), we find that
By Parseval we have
and
Using Cauchy–Schwarz and the above bounds, we obtain
Putting this together gives
We recall that \(R=10^{r}\sim \min (V,d_1d_2Q_1Q_2^2)\) and \(V\asymp (Y/DE)^{1/2}\), and note that \(20/21<\log {9}/\log {10}\). This gives
This gives a bound for \(\Sigma _3'\) since \(\Sigma _3'\le \Sigma '\), and we obtain an analogous bound for \(\Sigma _5'\) with \(d_2\) replaced by 1. Combining (10.16) with our earlier bounds (10.13), (10.14) and (10.15) and substituting these into (10.12) gives
Simplifying the exponents by noting \(1+10/21<3/2\) and \(27/77+10/21<5/6\) then gives the result.
The second statement of the lemma is simply the case when \(Q_2=1\) and \(q=d q_1\). \(\square \)
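The three numerical inequalities used at the end of this proof can be confirmed in exact arithmetic. The short check below is only a convenience; the comparison of 20/21 with \(\log {9}/\log {10}\) is rewritten as the equivalent integer inequality \(10^{20}<9^{21}\).

```python
from fractions import Fraction as Fr

def exponent_checks():
    """Return the truth value of each inequality used to simplify exponents."""
    return {
        "1 + 10/21 < 3/2": 1 + Fr(10, 21) < Fr(3, 2),
        "27/77 + 10/21 < 5/6": Fr(27, 77) + Fr(10, 21) < Fr(5, 6),
        # 20/21 < log 9 / log 10  is equivalent to  10**20 < 9**21
        "20/21 < log9/log10": 10 ** 20 < 9 ** 21,
    }

if __name__ == "__main__":
    for name, ok in exponent_checks().items():
        print(name, ok)
```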
We see that Lemma 8.1 follows immediately from Lemma 10.5, and Lemma 8.2 is the same as Lemma 10.1. Thus we are left to establish Propositions 9.1, 9.2 and 9.3, which we do over the next few sections.
11 Major arcs
In this section we establish Proposition 9.1 using the prime number theorem in arithmetic progressions and short intervals, making use of Lemma 10.1.
Proof of Proposition 9.1
We split \(\mathcal {M}\) up as three disjoint sets
where
By Lemma 10.1 and recalling X is a power of 10, we have
Using the trivial bound \(S_{\mathcal {R}_X}(\theta )\ll X(\log {X})^{\ell }\), where \(\ell \le 2/\eta \) and noting \(\#\mathcal {M}_1\ll (\log {X})^{3C}\), we obtain
This gives the result for \(\mathcal {M}_1\).
We now consider \(\mathcal {M}_2\). Recalling the definition of \(\mathcal {R}_X\), we have that for \(n<X\)
where \(\mathcal {C}=(a_1,a_1+\delta ]\times \dots \times (a_{\ell -1},a_{\ell -1}+\delta ]\) is the projection of \(\mathcal {R}_X\) onto the first \(\ell -1\) coordinates. We note the crude bound
Let \(\Delta =\lceil \log {X}\rceil ^{-10C-10\ell }\). We note that if \(a\in \mathcal {M}_2\) then \(a/X=b/q+c/X\) for some integers b, q, c with \(|c|\le (\log {X})^C\) (c is an integer since q|X for the set \(\mathcal {M}_2\)). We separate the sum \(S_{\mathcal {R}_X}(a/X)\) by putting the prime variable p occurring in (11.2) in short intervals of length \(\Delta X/m\) and in arithmetic progressions \(\ (\mathrm {mod}\ q)\). We note that \(\Lambda _{\mathcal {C}}\) is supported on \(m\le X^{\sum _i a_i+(\ell -1)\delta }< X^{1-\eta /3}\), so we can drop the constraints \(p\ge X^{\eta /4},X^{1-\sum _{i}a_i-\ell \delta }\) at the cost of some terms with \(mp<X^{1-\eta /12}+X^{1-\delta }\). Thus we have
If \(mp=j\Delta X+O(\Delta X)\) and \(p\equiv r\ (\mathrm {mod}\ q)\) we have
By the prime number theorem in short intervals and arithmetic progressions (5.1), for \(m<X^{1-\eta /3}\) and \((r,q)=1\) we have
Thus
Finally, since \(c\in \mathbb {Z}\) and \(c\ne 0\) and \(\Delta ^{-1}\in \mathbb {Z}\), we have
Using (11.3), this gives
Note that in the above argument for us to be able to save an arbitrary power of log it was important that we are counting elements with weight \(\Lambda _{\mathcal {R}_X}(n)\) rather than \(\mathbf {1}_{\mathcal {R}_X}(n)\), and that \(X\nu \in \mathbb {Z}\) for \(a\in \mathcal {M}_2\).
Using the trivial bounds \(S_{\mathcal {A}}(\theta )\le \#\mathcal {A}\) and \(\#\mathcal {M}_2\ll (\log {X})^{3C}\) along with (11.4), we obtain
Finally, we consider \(\mathcal {M}_3\). By the prime number theorem in arithmetic progressions as above, we have for \((r,q)=1\) and \(q\le (\log {X})^C\) that
Thus, for \((a,q)=1\)
Since \(\mu (q)=0\) for \(q|10^k=X\) unless \(q\in \{1,2,5,10\}\), using the trivial bounds \(\#\mathcal {M}_3\ll (\log {X})^{2C}\) and \(|S_\mathcal {A}(a/X)|\le \#\mathcal {A}\), we obtain
Thus (11.1), (11.5) and (11.6) give the result. \(\square \)
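The fact used for \(\mathcal {M}_3\), that \(\mu (q)\ne 0\) for a divisor q of \(10^k\) only when \(q\in \{1,2,5,10\}\), can be checked mechanically. The sketch below uses a plain trial-division Möbius function; the function names are our own.

```python
def mobius(n):
    """Moebius function of a positive integer n, by trial division."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0  # p**2 divides the original n
            result = -result
        p += 1
    if n > 1:
        result = -result  # one remaining prime factor
    return result

def squarefree_divisors_of_power_of_10(k):
    """Divisors q = 2**u * 5**v of 10**k with mobius(q) != 0, in increasing order."""
    divisors = [2 ** u * 5 ** v for u in range(k + 1) for v in range(k + 1)]
    return sorted(q for q in divisors if mobius(q) != 0)

if __name__ == "__main__":
    print(squarefree_divisors_of_power_of_10(6))
```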
Remark
We have only needed to use the prime number theorem in arithmetic progressions when the modulus is a small divisor of X, and so has no large prime factors. This means that our implied constants can be taken to be effectively computable since for such moduli we do not need to appeal to Siegel’s theorem.
12 Generic minor arcs
In this section we establish Proposition 9.2 and obtain some bounds on the exceptional set \(\mathcal {E}\) by using the distributional estimates of Lemma 10.4.
Lemma 12.1
(\(\ell ^2\) bound for primes) We have that
Proof
This follows from the \(\ell ^2\) bound coming from Parseval’s identity.
\(\square \)
Lemma 12.2
(Generic frequency bounds) Let
Then
and
Proof
The first bound on the size of \(\mathcal {E}\) follows from using Lemma 10.4 with \(B=X^{23/80}\) and verifying that \((23\times 235)/(80\times 154)+59/433<23/40\). For the second bound we see from Lemma 10.4 that
and so the calculation above gives the result.
It remains to bound the sum over \(a\notin \mathcal {E}\). We divide the sum into \(O(\log {X})^2\) subsums where we restrict to those a such that \(F_X(a/X)\sim 1/B\) and \(|S_{\mathcal {R}}(a/X)|\sim X/C\) for some \(B\ge X^{23/80}\) and \(C\le X^2\) (terms with \(C>X^2\) make a contribution O(1 / X)). This gives
We concentrate on the inner sum. Using Lemmas 10.4 and 12.1 we see that the sum contributes
Here we used the bound \(\min (x,y)\le x^{1/2}y^{1/2}\) in the last line. In particular, we see this is \(O_\eta (X^{1-2\epsilon })\) if \(B\ge X^{23/80}\) on verifying that \(23/80\times 73/308>59/866\). Substituting this into our bound above gives the result. \(\square \)
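Both numerical verifications in this proof hold only by a narrow margin, so it is worth confirming them in exact rational arithmetic rather than in floating point. The following check is a convenience sketch and the function name is hypothetical.

```python
from fractions import Fraction as Fr

def lemma_12_2_margins():
    """Exact slack in the two inequalities verified in the proof of Lemma 12.2."""
    # (23 * 235) / (80 * 154) + 59/433 < 23/40
    first = Fr(23, 40) - (Fr(23 * 235, 80 * 154) + Fr(59, 433))
    # (23/80) * (73/308) > 59/866
    second = Fr(23, 80) * Fr(73, 308) - Fr(59, 866)
    return first, second

if __name__ == "__main__":
    first, second = lemma_12_2_margins()
    print(float(first), float(second))  # both margins are positive but small
```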
13 Exceptional minor arcs
In this section we reduce Proposition 9.3 to the task of establishing Propositions 13.3 and 13.4, given below. We do this by making use of the bilinear structure of \(\Lambda _{\mathcal {R}_X}(n)\) which is supported on integers of the form \(n_1n_2\) with \(n_1\) of convenient size, and then showing that if these resulting bilinear expressions are large then the Fourier frequencies must lie in a smaller additively structured set. Propositions 13.3 and 13.4 then show that we have superior Fourier distributional estimates inside such sets. Thus we conclude that the bilinear sums are always small. To make the bilinear bound explicit, we establish the following lemma, from which Proposition 9.3 follows quickly.
Lemma 13.1
(Bilinear sum bound) Let \(N,M,Q\ge 1\) and E satisfy \(X^{9/25}\le N\le X^{17/40}\), \(Q\le X^{1/2}\), \(NM\le 1000X\) and \(E\le 100X^{1/2}/Q\), and either \(E\ge 1/X\) or \(E=0\). Let \(\mathcal {F}=\mathcal {F}(Q,E)\) be given by
Then for any 1-bounded complex sequences \(\alpha _n,\beta _m,\gamma _a\) we have
Proof of Proposition 9.3 assuming Lemma 13.1
By symmetry, we may assume that \(\mathcal {I}=\{1,\ldots ,\ell _1\}\) for some \(\ell _1< \ell \). By Dirichlet’s theorem on Diophantine approximation, any \(a\in [0, X)\) has a representation
for some integers \((b,q)=1\) with \(q\le X^{1/2}\) and some real \(|\nu |\le 1/X^{1/2}q\). Thus we can divide [0, X) into \(O(\log {X})^2\) sets \(\mathcal {F}(Q,E)\) as defined by Lemma 13.1 for different parameters Q, E satisfying \(1\le Q\le X^{1/2}\) and \(E=0\) or \(1/X\le E\le 100 X^{1/2}/Q\). Moreover, if \(a\notin \mathcal {M}\) then \(a\in \mathcal {F}=\mathcal {F}(Q,E)\) for some Q, E, with \(Q+E\ge (\log {X})^C\). Thus, provided C is sufficiently large compared with A and \(\eta \), we see it is sufficient to show that
From the definition (9.1) of \(\Lambda _{\mathcal {R}_X}\) and shape of \(\mathcal {R}_X\) given by Proposition 9.3, we have that for \(n<X\)
where \(\mathcal {R}_1\) is the projection of \(\mathcal {R}_X\) onto the first \(\ell _1\) coordinates, and \(\mathcal {R}_2\) is the projection onto the subsequent \(\ell -\ell _1-1\) coordinates.
Since \(n_1\), \(n_2\), p and X are integers, \(|\log {((X-1/2)/n_1n_2p)}|\gg 1/X\). Thus, by Perron’s formula (see, for example, [10, Chapter 17]), we have for \(n_1,n_2,p<X\)
We will use this to remove the constraint \(n=n_1n_2p<X\) in \(S_{\mathcal {R}_X}(-a/X)\). We first put \(n_1,n_2,p\) into one of \(O(\log {X})^3\) intervals of the form (Y / 10, Y], and then apply the above estimate. The \(O(X^{-2})\) error term trivially makes a negligible contribution to (13.1). Thus, we see that for C sufficiently large, it suffices to show uniformly over all s with \(\mathfrak {R}(s)=1/\log {X}\) and all choices of \(N_1,N_2,P\) with \(N_1N_2P\le 1000 X\) and \(P\ge X^{1-\sum _{i=1}^{\ell -1} a_i-\ell \delta }\) that
where \(c_p=\log {p}\) if \(p\ge X^{\eta /4},X^{1-\sum _i a_i-\ell \delta }\) and 0 otherwise. (The integral over s and the choices of \(N_1,N_2,P\) contribute a factor of \(O(\log {X})^4\), which is acceptable for establishing (13.1) if C is sufficiently large.)
Since \(\Lambda _{\mathcal {R}_1}(n_1)\) is supported on \(n_1\in [X^{\sum _{i=1}^{\ell _1}a_i},X^{\sum _{i=1}^{\ell _1}a_i+\ell \delta }]\) and \(\Lambda _{\mathcal {R}_2}(n_2)\) is supported on \(n_2\ge X^{\sum _{\ell _1+1}^{\ell -1}a_i}\), we only need to consider \(N_1N_2P\ge X^{1-\ell \delta }\) and \(N_1\in [X^{\sum _{i=1}^{\ell _1}a_i},X^{\sum _{i=1}^{\ell _1}a_i+\epsilon /6}]\). But, by assumption,
so either \(N_1\) or \(N_2 P\) lies in \([X^{9/25},X^{17/40}]\). Since \(\Lambda _{\mathcal {R}_1}(n_1),\Lambda _{\mathcal {R}_2}(n_2),\log {p}\ll _\ell (\log {X})^{\ell -1}\), for C sufficiently large in terms of \(\ell \) we see that it suffices to show that
uniformly over all choices of \(N\in [X^{9/25},X^{17/40}]\) and \(M\le 1000 X/N\) and uniformly over all 1-bounded complex sequences \(\alpha _n,\beta _m\). (Setting \(\alpha _n=\Lambda _{\mathcal {R}_1}(n)/(\log {X})^{\ell }\) and \(\beta _m=\sum _{p n_2=m, p\sim P, n_2\sim N_2}\Lambda _{\mathcal {R}_2}(n_2)c_p/(\log {X})^{\ell }\) gives the bound when \(\sum _{i=1}^{\ell _1}a_i\in [9/25+\epsilon /2,17/40-\epsilon /2]\); the other case is analogous with \(\alpha _n\) and \(\beta _m\) swapped.)
Finally, let \(\gamma _a\) be the 1-bounded sequence satisfying \(S_{\mathcal {A}}(a/X)=\#\mathcal {A}\gamma _a F_X(a/X)\). After substituting this expression for \(S_\mathcal {A}\), we see that (13.2) follows immediately from Lemma 13.1 for C sufficiently large in terms of \(\eta \), thus giving the result. \(\square \)
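The Dirichlet approximation step at the start of the proof above can be made explicit for integer X. The sketch below finds, by exhaustive search over denominators, a reduced fraction b / q with \(q\le X^{1/2}\) and \(|a/X-b/q|\le 1/X^{1/2}q\); it works entirely with integers to avoid rounding. The function name is our own, and the search is an illustration of the existence statement rather than an efficient algorithm (continued fractions would be the usual choice).

```python
from math import gcd, isqrt

def dirichlet_approximation(a, X):
    """Return (b, q) with gcd(b, q) = 1, 1 <= q <= sqrt(X) and
    |a/X - b/q| <= 1/(q * sqrt(X)), for integers 0 <= a < X."""
    Q = isqrt(X)
    for q in range(1, Q + 1):
        b = (2 * q * a + X) // (2 * X)  # nearest integer to q*a/X
        # Test |q*a/X - b| <= 1/(Q+1) without dividing.
        if abs(q * a - b * X) * (Q + 1) <= X:
            g = gcd(b, q)  # gcd(0, q) = q, so b = 0 reduces to 0/1
            return b // g, q // g
    raise AssertionError("no approximation found; contradicts Dirichlet's theorem")

if __name__ == "__main__":
    X = 10 ** 4
    for a in (0, 1, 3333, 5000, 7071, 9999):
        print(a, dirichlet_approximation(a, X))
```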
Thus it remains to establish Lemma 13.1. The key estimate constraining Fourier frequencies to additively structured sets is the following lemma.
Lemma 13.2
(Geometry of numbers) Let \(K_0\) be a sufficiently large constant, let \(\mathbf {t}\in \mathbb {R}^3\) with \(\Vert \mathbf {t}\Vert _2=1\) and let \(N>1>\delta >0\). Let
satisfy \(\#\mathcal {R}\cap \mathbb {Z}^3\ge \delta K N^2\) for some \(K>K_0\). Then there exists a lattice \(\Lambda \subset \mathbb {Z}^3\) of rank at most 2 such that
If a cuboid \(\mathcal {R}\subseteq \mathbb {R}^3\) of volume V lies in the region \(|z|\le \epsilon \), then it can easily contain rather more than V lattice points from the plane \(z=0\). Lemma 13.2 says that such a situation is essentially the only way a cuboid can contain many lattice points; if any cuboid has substantially more than V lattice points in \(\mathcal {R}\cap \mathbb {Z}^3\), then these lattice points must come from some lower dimensional linear subspace. The region \(\mathcal {R}\) which we are interested in is a slightly thickened disc through the origin in the plane orthogonal to \(\mathbf {t}\).
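The phenomenon described here is visible already in the simplest example. The sketch below counts lattice points in the thin cuboid \([-N,N]^2\times [-\epsilon ,\epsilon ]\) by brute force and compares the count with the volume; the specific values of N and \(\epsilon \) are illustrative choices.

```python
from math import floor

def lattice_points_in_slab(N, eps):
    """Count integer points (x, y, z) with |x|, |y| <= N and |z| <= eps."""
    r, h = floor(N), floor(eps)
    return sum(1
               for x in range(-r, r + 1)
               for y in range(-r, r + 1)
               for z in range(-h, h + 1))

def slab_volume(N, eps):
    """Volume of the cuboid [-N, N]^2 x [-eps, eps]."""
    return (2 * N) ** 2 * (2 * eps)

if __name__ == "__main__":
    N, eps = 20, 0.01
    print(lattice_points_in_slab(N, eps), slab_volume(N, eps))
```

For \(\epsilon <1\) every counted point has \(z=0\), so the excess over the volume comes entirely from a rank 2 sublattice.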
Proof of Lemma 13.2
Let \(\phi :\mathbb {R}^3\rightarrow \mathbb {R}^3\) be the linear map which is a dilation by a factor \(N/\delta \) in the \(\mathbf {t}\)-direction (i.e. \(\phi (\mathbf {v})=\mathbf {v}+\mathbf {t}(N/\delta -1)(\mathbf {v}\cdot \mathbf {t})\).) Let \(\Lambda _1=\phi (\mathbb {Z}^3)\subset \mathbb {R}^3\) be the lattice which is the image of \(\mathbb {Z}^3\) under \(\phi \). Since the determinant of a lattice is the volume of the fundamental parallelepiped, we see that \(\det (\Lambda _1)=N/\delta \).
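The determinant claim can be checked numerically: in coordinates \(\phi \) is the matrix \(I+(N/\delta -1)\mathbf {t}\mathbf {t}^T\), which fixes the plane orthogonal to \(\mathbf {t}\) and scales \(\mathbf {t}\) by \(N/\delta \). The sketch below uses an illustrative unit vector and illustrative values of N and \(\delta \); the function names are hypothetical.

```python
def dilation_matrix(t, factor):
    """Matrix of v -> v + t*(factor - 1)*(v . t) for a unit vector t in R^3."""
    return [[(1.0 if i == j else 0.0) + (factor - 1.0) * t[i] * t[j]
             for j in range(3)]
            for i in range(3)]

def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

def apply(M, v):
    """Apply the 3x3 matrix M to the vector v."""
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]

if __name__ == "__main__":
    t = (1 / 3, 2 / 3, 2 / 3)   # illustrative unit vector
    N, delta = 50.0, 0.25       # illustrative parameters, N/delta = 200
    phi = dilation_matrix(t, N / delta)
    print(det3(phi), apply(phi, (2.0, -1.0, 0.0)))
```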
Let \(\{\mathbf {v}_1,\mathbf {v}_2,\mathbf {v}_3\}\) be a Minkowski-reduced basis of \(\Lambda _1\). We recall that this means that any \(\mathbf {v}\in \Lambda _1\) can be written uniquely as \(n_1\mathbf {v}_1+n_2\mathbf {v}_2+n_3\mathbf {v}_3\) for some \(n_1,n_2,n_3\in \mathbb {Z}\), and for any \(n_1,n_2,n_3\in \mathbb {Z}\) we have
and that \(\Vert \mathbf {v}_1\Vert _2\Vert \mathbf {v}_2\Vert _2\Vert \mathbf {v}_3\Vert _2\asymp \det (\Lambda _1)=N/\delta \). Without loss of generality let \(\Vert \mathbf {v}_1\Vert _2\le \Vert \mathbf {v}_2\Vert _2\le \Vert \mathbf {v}_3\Vert _2\).
We now notice that any element of \(\mathcal {R}\cap \mathbb {Z}^3\) is mapped injectively by \(\phi \) to an element of \(\{\mathbf {x}\in \Lambda _1:\,\Vert \mathbf {x}\Vert _2\le 2N\}\). Thus for a sufficiently large constant C, we have
If \(\Vert \mathbf {v}_3\Vert _2>C N\), then there are no \(\mathbf {n}\in \mathbb {Z}^3\) counted above with \(n_3\ne 0\). If instead \(\Vert \mathbf {v}_3\Vert _2\le C N\) then since \(\Vert \mathbf {v}_1\Vert _2\le \Vert \mathbf {v}_2\Vert _2\le \Vert \mathbf {v}_3\Vert _2\), the number of \(\mathbf {n}\) is
Thus in either case there are \(O(\delta N^2)\) points with \(n_3\ne 0\). However, by assumption of the lemma we have that K is sufficiently large and
This means that most of the contribution must come from terms with \(n_3=0\). Indeed, we have
We may choose \(K_0\) such that if \(K\ge K_0\) then the right hand side is at least \(\delta KN^2/2\). Thus, we see if \(\Lambda \) is the lattice \(\phi ^{-1}(\mathbf {v}_1)\mathbb {Z}+\phi ^{-1}(\mathbf {v}_2)\mathbb {Z}\) then \(\Lambda \subseteq \mathbb {Z}^3\) and
\(\square \)
We establish Lemma 13.1 assuming two key propositions, Proposition 13.3 and Proposition 13.4, given below. These propositions will be proven over the next two sections.
Proposition 13.3
(Bound for angles generating lattices) Let \(X,K,N,Q\ge 1\) and \(\delta >0\), \(E\ge 0\) satisfy \(X^{17/40}\le N K\), \(\delta \ge N/X\), \(E\le 100X^{1/2}/Q\) and \(Q\le X^{1/2}\). Let \(\mathcal {B}_1=\mathcal {B}_1(N,K,\delta )\subseteq [0,X)^2\) be the set of pairs \((a_1,a_2)\in \mathbb {Z}^2\) such that there is a lattice \(\Lambda \subseteq \mathbb {Z}^3\) of rank 2 such that
and not all of these points lie on a line through the origin. Let \(\mathcal {F}=\mathcal {F}(Q,E)\) be given by
Then we have
Proposition 13.4
(Bound for angles generating lines) Let \(N\ge X^{9/25}\), \(\delta \ge N/X\) and \(K\ge 1\). Let \(\mathcal {B}_2=\mathcal {B}_2(N,K,\delta )\subseteq [0,X)^2\) be the set of pairs \((a_1,a_2)\in \mathbb {Z}^2\) such that there exists a line L through the origin such that
Given \(B\le X^{23/80}\), let \(\mathcal {E}'=\mathcal {E}'(B)\) be given by
Then we have
Proof of Lemma 13.1 assuming Propositions 13.3 and 13.4
We split \(\mathcal {E}\) into \(O(\log {X})\) subsets of the form
for some \(B\in [1,X^{23/80}]\). By Cauchy–Schwarz, we have
where
Thus it suffices to show
provided \(X^{9/25}\le N\le X^{17/40}\), \(Q\le X^{1/2}\) and \(E\le 100X^{1/2}/Q\).
Let \(\mathcal {G}(K)\) denote the set of pairs \((a_1,a_2)\in \mathcal {F}\cap \mathcal {E}'\) such that
We consider \(1\le K\le X/N\) taking values which are integral powers of 10, and split the contribution of our sum according to these sets. We see it is therefore sufficient to show that for each K
Let \(\mathcal {G}(K,\delta )\) denote the set of pairs \((a_1,a_2)\in \mathcal {F}\cap \mathcal {E}'\) such that
By considering \(\delta =2^{-j}\) and using the pigeonhole principle, we see that if
then there is some \(\delta \ge N/X\) and some \(K/\log {X} \ll K'\le K\) such that
Thus it suffices to show for all \(K',\delta \) that
From Lemma 12.2, we have the bound
which gives (13.3) in the case when \(N K'\ll X^{17/40+\epsilon }\). Thus we may assume that \(N K'\gg X^{17/40+\epsilon }\). By assumption, we also have that \(N\le X^{17/40}\), so we only consider \(K'\gg X^{\epsilon }\). In particular, we may use Lemma 13.2 to conclude that either there is a rank 2 lattice \(\Lambda \subseteq \mathbb {Z}^3\) such that
and not all of these points lie on a line through the origin, or there is a line \(L\subseteq \mathbb {Z}^3\) such that
In either case (13.3) follows from Proposition 13.3 or Proposition 13.4 (taking ‘N’ and ‘K’ in the propositions to be 10N and \(K'/1000\ge 1\) in our notation here). \(\square \)
14 Lattice estimates
In this section we establish Proposition 13.3, which controls the contribution from pairs of angles which cause a large contribution to the bilinear sums considered in Sect. 13 to come from a lattice. A low height lattice \(\Lambda \) makes a significant contribution only if \((a_1,a_2,X)\) is approximately orthogonal to the plane of the lattice, and so only if \((a_1,a_2,X)\) lies close to the line through the origin orthogonal to this lattice. We note that we only make small use of the fact that these angles lie in a small set, but it is vital that the angles lie outside the major arcs.
Lemma 14.1
(Lattice generating angles have simultaneous approximation) Let \(\delta >0\) and \(X,N,K\ge 1\) be such that \(\delta \ge N/X\). Let \(\mathcal {B}_1=\mathcal {B}_1(N,K,\delta )\subseteq [0,X)^2\) be the set of pairs \((a_1,a_2)\in \mathbb {Z}^2\) such that there is a lattice \(\Lambda \subseteq \mathbb {Z}^3\) of rank 2 such that
and moreover the points counted above do not all lie on a line through the origin.
Then all pairs \((a_1,a_2)\in \mathcal {B}_1\) have the simultaneous rational approximations
for some integer \(q\ll X/N K\).
We see Lemma 14.1 restricts the pair \((a_1,a_2)\) to lie in a set of size \(O(X/N K)^3\), which is noticeably smaller than \(X^2\) for the range of NK under consideration. This allows us to obtain superior bounds for the sum over \(a_1,a_2\), by exploiting the estimates of Lemma 10.6 which show F is not abnormally large on such a set.
Proof
Clearly we may assume that NK is sufficiently large, since otherwise the result is trivial. By assumption of the lemma, for any pair \((a_1,a_2)\in \mathcal {B}_1\) there is a rank 2 lattice \(\Lambda =\Lambda _{a_1,a_2}\) such that \(\#(\Lambda \cap \mathcal {H})\ge \delta K N^2\) where
Moreover, not all the points in \(\Lambda \cap \mathcal {H}\) lie in a line through the origin. Let \(\mathbf {a}=(a_1,a_2,X)\), and let \(\phi :\mathbb {R}^3\rightarrow \mathbb {R}^3\) be a dilation by a factor \(N/\delta \) in the \(\mathbf {a}\)-direction, and let \(\Lambda '=\phi (\Lambda )\). Then we see that
Moreover, not all the points on the right hand side lie in a line through the origin, since \(\phi ^{-1}\) preserves lines through the origin. Let \(\Lambda '\) have a Minkowski-reduced basis \(\{\mathbf {v}_1,\mathbf {v}_2\}\), and let \(V_1=\Vert \mathbf {v}_1\Vert _2\) and \(V_2=\Vert \mathbf {v}_2\Vert _2\). Since \(\Vert m_1\mathbf {v}_1+m_2\mathbf {v}_2\Vert _2\asymp |m_1|V_1+|m_2|V_2\), for a suitably large constant C we have
Since not all of the points in the final set lie in a line through the origin, we see that \(V_1,V_2\le C N\). Thus
In particular, \(V_1V_2\ll 1/\delta K\).
Let \(\mathbf {w}_1=\phi ^{-1}(\mathbf {v}_1)\) and \(\mathbf {w}_2=\phi ^{-1}(\mathbf {v}_2)\), so \(\mathbf {w}_1\) and \(\mathbf {w}_2\) are linearly independent vectors in \(\Lambda \subseteq \mathbb {Z}^3\). Since \(\phi \) can only increase the length of vectors, \(\Vert \mathbf {w}_1\Vert _2\le V_1\) and \(\Vert \mathbf {w}_2\Vert _2\le V_2\). Let \(\epsilon _1=|\mathbf {w}_1\cdot \mathbf {a}|\) and \(\epsilon _2=|\mathbf {w}_2\cdot \mathbf {a}|\). Trivially we have \(|\mathbf {v}_1\cdot \mathbf {a}|\ll V_1X\) and \(|\mathbf {v}_2\cdot \mathbf {a}|\ll V_2X\), and so recalling that \(\phi \) is a dilation by a factor \(N/\delta \) in the \(\mathbf {a}\)-direction, we see that \(\epsilon _1\ll \delta X V_1/N\) and \(\epsilon _2\ll \delta X V_2/N\).
Putting this together, we see that for any pair \((a_1,a_2)\in \mathcal {B}_1\) there are linearly independent vectors \(\mathbf {w}_1,\mathbf {w}_2\in \mathbb {Z}^3\) and quantities \(V_1,V_2\) such that
This puts considerable constraints on the possibilities for \((a_1,a_2)\), since it must lie in an infinite cylinder with axis parallel to \(\mathbf {w}_1\times \mathbf {w}_2\) with short radius, for some low height vectors \(\mathbf {w}_1,\mathbf {w}_2\). (Here \(\times \) is the standard cross product on \(\mathbb {R}^3\).) Explicitly, let \(\mathbf {e}_1,\mathbf {e}_2,\mathbf {e}_3\) be an orthonormal basis of \(\mathbb {R}^3\) with \(\mathbf {e}_1\) orthogonal to \(\mathbf {w}_1\) and \(\mathbf {w}_2\), and with \(\mathbf {e}_2\) orthogonal to \(\mathbf {w}_2\). Then we see that \(\mathbf {e}_1\propto \mathbf {w}_1\times \mathbf {w}_2\), \(\mathbf {e}_2\propto \mathbf {w}_2\times \mathbf {e}_1\) and \(\mathbf {e}_3\propto \mathbf {w}_2\). In particular, we have that \(|\mathbf {e}_3\cdot \mathbf {w}_2|=\Vert \mathbf {w}_2\Vert _2\), and
(Here we used the identity \(\mathbf {a}\cdot (\mathbf {b}\times \mathbf {c})=\mathbf {c}\cdot (\mathbf {a}\times \mathbf {b})\).) Thus, if \(\mathbf {x}=x_1\mathbf {e}_1+x_2\mathbf {e}_2+x_3\mathbf {e}_3\) has \(|\mathbf {x}\cdot \mathbf {w}_1|\ll \delta X V_1/N\) and \(|\mathbf {x}\cdot \mathbf {w}_2|\ll \delta X V_2/N\), then
Since \(\Vert \mathbf {w}_1\Vert _2\ll V_1\), \(\Vert \mathbf {w}_2\Vert _2\ll V_2\) and \(\Vert \mathbf {w}_1\times \mathbf {w}_2\Vert _2\le \Vert \mathbf {w}_1\Vert _2\Vert \mathbf {w}_2\Vert _2\), this implies that
Thus, since \(V_1V_2\ll 1/\delta K\), we see that any vector \(\mathbf {x}\) with \(|\mathbf {x}\cdot \mathbf {w}_1|\ll \delta X V_1/N\) and \(|\mathbf {x}\cdot \mathbf {w}_2|\ll \delta X V_2/N\) satisfies
for some \(\lambda \in \mathbb {R}\). We note that the error term is o(X) since \(\mathbf {w}_1,\mathbf {w}_2\) are linearly independent integer vectors and NK is assumed sufficiently large. Let the components of \(\mathbf {w}_1\times \mathbf {w}_2\) be \(c_1,c_2,c_3\) (with respect to the standard basis of \(\mathbb {R}^3\)). Since \(\mathbf {w}_1,\mathbf {w}_2\in \mathbb {Z}^3\), we have \(c_1,c_2,c_3\in \mathbb {Z}\). Thus if \(\mathbf {a}\) is of the above form we must have \(\mathbf {a}=\lambda (\mathbf {w}_1\times \mathbf {w}_2)+o(X)\) for some \(\lambda \). Since \(\Vert \mathbf {a}\Vert _2\ge X\) and \(a_1,a_2\le a_3=X\), we must have that \(|c_1|,|c_2|\ll |c_3|\). In particular, \(|c_3|\asymp \Vert \mathbf {w}_1\times \mathbf {w}_2\Vert _2\). Dividing through by \(X=\lambda c_3+O(X/N K |c_3|)\) then gives
Finally, we note that since \(\delta \ge N/X\) and \(V_1 V_2\ll 1/\delta K\) we have
Thus, we see that for any pair \((a_1,a_2)\in \mathcal {B}_1\) there must be integers \(c_1,c_2,c_3\ll X/N K\) such that (14.1) holds. This gives the result. \(\square \)
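The proof above uses two elementary facts about the cross product: that \(\mathbf {w}_1\times \mathbf {w}_2\) has integer components and is orthogonal to both factors when \(\mathbf {w}_1,\mathbf {w}_2\in \mathbb {Z}^3\), and the identity \(\mathbf {a}\cdot (\mathbf {b}\times \mathbf {c})=\mathbf {c}\cdot (\mathbf {a}\times \mathbf {b})\). The following Python sketch checks these on sample vectors. The vectors are hypothetical values chosen for illustration and do not come from the paper.

```python
# Illustrative check only: the sample vectors below are assumptions.

def cross(u, v):
    """Standard cross product on R^3, applied here to integer 3-tuples."""
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )

def dot(u, v):
    """Euclidean inner product of two vectors of equal length."""
    return sum(x * y for x, y in zip(u, v))

# Hypothetical low-height integer vectors standing in for w_1 and w_2.
w1 = (2, -1, 3)
w2 = (1, 4, -2)
c = cross(w1, w2)  # integer components c_1, c_2, c_3

# Hypothetical third vector used to test the triple product identity.
a = (5, 7, -3)
triple_lhs = dot(a, cross(w1, w2))   # a . (w1 x w2)
triple_rhs = dot(w2, cross(a, w1))   # w2 . (a x w1)

# ||w1 x w2||^2 <= ||w1||^2 ||w2||^2, the inequality used in the proof.
norm_bound_holds = dot(c, c) <= dot(w1, w1) * dot(w2, w2)
```

Working with exact integers avoids any rounding issues, which matters because the argument relies on \(c_1,c_2,c_3\in \mathbb {Z}\).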
Lemma 14.2
(Size of rational approximations) Let \(\mathcal {B}_1(N,K,\delta )\) and \(\mathcal {F}=\mathcal {F}(Q,E)\) be as in Proposition 13.3. If \(\mathcal {B}_1(N,K,\delta )\cap \mathcal {F}^2\ne \emptyset \) then
\[Q+E\ll (X/N K)^2.\]
Proof
By Lemma 14.1, if \((a_1,a_2)\in \mathcal {B}_1(N,K,\delta )\) then
\[\frac{a_1}{X}=\frac{b_1}{q}+\nu _1,\qquad \frac{a_2}{X}=\frac{b_2}{q}+\nu _2\]
for some \(q\ll X/N K\) and \(|\nu _1|,|\nu _2|\ll 1/N K q\). By clearing common factors we may assume that \((b_1,b_2,q)=1\).
If \(N K > X^{2/3}\) (and X is sufficiently large) then we see that \(b_1/q\) and \(b_2/q\) are the best rational approximations to \(a_1/X\) and \(a_2/X\) with denominator \(O(X^{1/3})\), since the error in the approximation is \(O(1/(qX^{2/3}))\). Thus if we also have \(a_1,a_2\in \mathcal {F}(Q,E)\) then we must have \(q\gg Q\) and \(|\nu _1|,|\nu _2|\sim E/X\). In particular, we must have \(Q+E\ll X/NK\). If instead \(N K\le X^{2/3}\) then since \(Q+E\ll X^{1/2}\) we have \(Q+E\ll (X/NK)^{2}\). Thus in either case we have that there are no such pairs \((a_1,a_2)\) in both \(\mathcal {B}_1(N,K,\delta )\) and in \(\mathcal {F}\times \mathcal {F}\) unless \(Q+E\ll (X/NK)^2\). \(\square \)
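The notion of a best rational approximation with bounded denominator used in this proof can be illustrated numerically. The sketch below uses Python's standard `fractions.Fraction.limit_denominator`, which returns the closest fraction with denominator at most a given bound. The values of `X` and `a1` are hypothetical and chosen only so that \(a_1/X\) is close to \(1/3\).

```python
from fractions import Fraction

# Hypothetical sample values (assumptions, not taken from the paper).
X = 10**6
a1 = 333_334                      # a1 / X is within about 7e-7 of 1/3

# Denominators of size O(X^{1/3}); here the bound is 100.
max_den = round(X ** (1 / 3))

# Closest fraction b1/q to a1/X with q <= max_den.
best = Fraction(a1, X).limit_denominator(max_den)

# The approximation error nu_1 = a1/X - b1/q, computed exactly.
nu1 = Fraction(a1, X) - best
```

Two distinct fractions with denominators at most \(D\) differ by at least \(1/D^2\), so an approximation with error much smaller than \(1/D^2\) is necessarily the unique best one. This is the mechanism behind the claim in the case \(N K>X^{2/3}\).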
Lemma 14.3
Let \(N K\ge X^{17/40}\), and let \(\mathcal {B}_1(N,K,\delta )\), \(\mathcal {F}=\mathcal {F}(Q,E)\) and \(\mathcal {E}\) be as in Proposition 13.3. Then we have
where \(\mathcal {V}=\{2^u 5^v:u,v\in \mathbb {Z}_{\ge 0}\}\), the supremum is over all choices of \(Q_1,G_1,G_2,D_0,D_1,E_0\ge 1\) which are powers of 10 and satisfy \(Q_1 G_1 G_2 D_0 D_1 E_0\ll X/N K\) and \(G_1\ll G_2\), and \(S_1,S_2,S_3\) are given by
Proof
By Lemma 14.1 we are considering pairs \((a_1,a_2)\in \mathcal {B}_1(N,K,\delta )\) such that
\[\frac{a_1}{X}=\frac{b_1}{q}+\nu _1,\qquad \frac{a_2}{X}=\frac{b_2}{q}+\nu _2\]
for some \(q\ll X/N K\) and \(|\nu _1|,|\nu _2|\ll 1/N K q\).
By clearing common factors we may assume that \((b_1,b_2,q)=1\). We let \(g_1=(b_1,q)\) and \(g_2=(b_2,q)\). By symmetry we may assume that \(g_1\le g_2\). We let \(d_1\) be the part of \(g_1\) not coprime to 10 (i.e. \(d_1|10^u\) for some integer u, and \(g_1=g_1'd_1\) for some \((g_1',10)=1\)). Similarly we let \(d_0\) be the part of \(q/g_1g_2\) which is not coprime to 10. To ease notation we let \(b_1'=b_1/g_1\), \(b_2'=b_2/g_2\), \(q'=q/g_1g_2d_0\) and \(g_1'=g_1/d_1\). Thus \(q=g_1'g_2d_0d_1q'\), \(b_1=b_1'd_1g_1'\) and \(b_2=b_2'g_2\) with \((b_1',d_0 q' g_2)=(b_2',d_0 d_1 q' g_1')=1\) and \((q',10)=(g_1',10){=}1\).
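The factorization \(q=g_1'g_2d_0d_1q'\) described above can be made concrete. The following Python sketch computes the quantities \(g_1',g_2,d_0,d_1,q',b_1',b_2'\) from a triple \((b_1,b_2,q)\) with \((b_1,b_2,q)=1\). The function names and the sample triple are hypothetical and serve only as an illustration.

```python
from math import gcd

def ten_smooth_part(n):
    """Largest divisor of the positive integer n built from the primes 2 and 5."""
    d = 1
    for p in (2, 5):
        while n % p == 0:
            n //= p
            d *= p
    return d

def decompose(b1, b2, q):
    """Split (b1, b2, q) with gcd(b1, b2, q) = 1 into the factors used in the proof."""
    assert gcd(gcd(b1, b2), q) == 1
    g1, g2 = gcd(b1, q), gcd(b2, q)
    if g1 > g2:                        # by symmetry, arrange g1 <= g2
        b1, b2, g1, g2 = b2, b1, g2, g1
    d1 = ten_smooth_part(g1)           # part of g1 not coprime to 10
    g1p = g1 // d1                     # g1' with (g1', 10) = 1
    # gcd(g1, g2) = 1 because gcd(b1, b2, q) = 1, so g1 * g2 divides q.
    d0 = ten_smooth_part(q // (g1 * g2))
    qp = q // (g1 * g2 * d0)           # q' with (q', 10) = 1
    return {"b1p": b1 // g1, "b2p": b2 // g2,
            "g1p": g1p, "g2": g2, "d0": d0, "d1": d1, "qp": qp}

# Hypothetical example triple.
parts = decompose(132, 455, 840)
```

For this sample triple one has \(g_1=12\) and \(g_2=35\), and the returned factors multiply back to \(q=840\).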
We split the contribution of pairs \((a_1,a_2)\in \mathcal {B}_1\) into \(O(\log {X})^5\) subsets. We consider terms where we have the restrictions \(q'\sim Q_1\), \(g_1'\sim G_1\), \(g_2\sim G_2\), \(d_0\sim D_0\) and \(d_1\sim D_1\) for some \(Q_1,G_1,G_2,D_0,D_1 \ge 1\) all integer powers of 10 with \(Q_0:=Q_1 G_1 G_2 D_0 D_1\ll X / N K\). Since \(g_1=g_1'd_1\le g_2\) we have \(G_1D_1\ll G_2\). We relax the restriction \(|\nu _1|,|\nu _2|\ll 1/N K q\) to \(|\nu _1|,|\nu _2|\le E_0/X\) for a suitable power of 10 \(E_0\asymp X/N K Q_0\) with \(E_0\ge 1\). We see there are \(O(\log {X})^5\) sets with such restrictions which cover all possible \((b_1,b_2,q,\nu _1,\nu _2)\) and hence all \((a_1,a_2)\in \mathcal {B}_1\). For simplicity, the reader might like to consider the special case \(G_1=G_2=D_0=D_1=1\) on a first reading.
To ease notation we let \(\mathcal {V}=\{2^u5^v:\,u,v\in \mathbb {Z}_{\ge 0}\}\), and note that we have \(d_0,d_1\in \mathcal {V}\). By summing over all possibilities of \(q',g_1',g_2,d_0,d_1,b_1',b_2'\), we see that
where the supremum is over all choices of \(Q_1,G_1,G_2,D_0,D_1,E_0\ge 1\) which are powers of 10 and satisfy \(Q_1G_1G_2D_0D_1E_0\ll X/N K\) and \(G_1D_1\ll G_2\) and \(S_0\) is given by
In \(S_0\), we have used \(\sum '\) to indicate that the summation is further constrained by the conditions
which we suppressed for notational simplicity. We see that \(g_1',g_2,b_1',b_2',\nu _1,\nu _2\) each occur in only one of the two \(F_X\) terms, and so given \(d_0,d_1,q'\) the remaining summation in \(S_0\) factors into a product of two sums. Taking a supremum over all choices of \(q'\) in the first of these then gives
where
The bound (14.2) will be useful when \(Q_0\) is small, but when \(Q_0\) is large it is wasteful to sum over all these possibilities since we have not made use of the fact that \(a_1,a_2\in \mathcal {E}\), a small set. To obtain an alternative bound we first sum over all \(a_1\in \mathcal {E}\), then all possibilities of q, \(b_2\), \(\nu _2\). This shows that
where the supremum has the same constraints as before, and \(S_0'\) is given by
Here the summation in \(S_0'\) is constrained by
Again, taking a supremum over \(q'\) and factorizing the summation, we find that
where \(S_1\) is as given by (14.3) above, and \(S_3\) is given by
where
Putting together (14.2), (14.5), (14.6) we obtain
as required. \(\square \)
Lemma 14.4
Let \(N K\ge X^{17/40}\) and let \(S_1,S_2,S_3\) be as in Lemma 14.3. Let \(Q_1,G_1,G_2,D_0,D_1,E_0\ge 1\) be powers of 10 which satisfy \(Q_1 G_1 G_2 D_0 D_1 E_0\ll X/N K\) and \(G_1\ll G_2\). Then we have
where \(Q_0=Q_1 G_1 G_2 D_0 D_1\).
Proof
We first bound \(S_1,S_2,S_3\) individually using Lemmas 12.2, 10.6 and 10.7. We will then combine these bounds to give the desired result.
We first consider the quantity \(N(a_1,d_0)\) occurring in \(S_3\). If q and \(q'\) are both counted by N(a, d) then there exists b, g and \(b',g'\) such that \((b,q d g)=(b',q' d g')=1\) and
Here we used the fact that \(E_0/X\ll 1/N K Q_0\). The variables we consider satisfy \(q,q'\sim Q_1\ll Q_0/G_1G_2D_0D_1\) and \(g,g'\sim G_2\) and \(d\sim D_0\). Thus
Let \(h\ll Q_0 / D_0 D_1 N K\) be such that \(b q' g'-b' q g=h\). There are \(O(1+Q_0/D_0D_1 N K)\) such choices of h. Given q, g, b, h with \((q g,b)=1\), we then see
Since \(q' g'\asymp q g\) and \(b'\asymp b\), there are O(1) choices of \(b'\) and \(q' g'\). Thus there are \(O(Q_0^\epsilon )\) such choices of \(q',g',b'\) by the divisor bound. Thus we find that
Combining this with Lemma 12.2 gives the bound
We recall \(Q_0=Q_1G_1G_2D_0D_1\) is the approximate size of q and that \(G_1\ll G_2\), \(E_0Q_0\ll X/N K\ll X\). By Lemma 10.6 we have
Alternatively, we may bound \(S_1\) using Lemma 10.7, which gives
If the first term in (14.11) dominates, then since \(E_0\ll X/N K Q_0\), the bounds (14.11) and (14.10) give
This shows \(S_1 S_2\ll Q_0^{1-\epsilon }E_0^{1-\epsilon }\) in this case by recalling that \(N K\gg X^{17/40}\) and verifying that \(22/21\times 23/40<50/77\).
If instead the second term in (14.11) dominates, then by (14.9) and (14.11) (using \(G_1\ll G_2\) and replacing \(E_0^{5/6}\) with \(E_0\) to simplify the expression), we have
Combining this with (14.10), we obtain
Here we have simplified the exponents appearing for an upper bound. We recall that \(Q_0 E_0\ll X/N K\) and (by assumption of the lemma) \(N K\gg X^{17/40}\). These give
Thus this term is \(O(Q_0^{1-\epsilon }E_0^{1-\epsilon })\), and so
Similarly, we find that combining (14.12) and (14.8) gives
Here we used \(10/21-23/80>3/16\). Since \(Q_0 E_0\ll X/N K\) and \(N K\gg X^{17/40}\gg X^{13/32+\epsilon }\), we see that
Thus we have
Combining (14.13) and (14.14), we obtain
We find that
Thus we have \(\min (S_1S_2,S_3S_2)\ll Q_0^{1-\epsilon }E_0^{1-\epsilon }\) in all cases, as desired. \(\square \)
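The proof above relies on a few numerical inequalities between rational exponents. A short Python sketch with exact rational arithmetic confirms them. The dictionary name is a hypothetical label, and the inequalities themselves are exactly those quoted in the proof.

```python
from fractions import Fraction as F

# Exponent inequalities quoted in the proof of Lemma 14.4, checked exactly.
lemma_14_4_checks = {
    "22/21 * 23/40 < 50/77": F(22, 21) * F(23, 40) < F(50, 77),
    "10/21 - 23/80 > 3/16": F(10, 21) - F(23, 80) > F(3, 16),
    "17/40 > 13/32": F(17, 40) > F(13, 32),
}

all_hold = all(lemma_14_4_checks.values())
```

Using `Fraction` rather than floating point matters here because some of these margins are small. For instance \(10/21-23/80=317/1680\), which exceeds \(3/16=315/1680\) by only \(2/1680\).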
Having established the technical Lemmas 14.3 and 14.4, we are now in a position to prove Proposition 13.3.
Proof of Proposition 13.3
We wish to show that
in the region \(X^{17/40}\le N K\). Since \(\mathcal {B}_1(N,K,\delta )\cap \mathcal {F}^2=\emptyset \) unless \(Q+E\ll (X/N K)^2\) by Lemma 14.2, we may assume that \(Q+E\ll (X/NK)^2\).
By Lemmas 14.3 and 14.4 we have
There are \(O(Q_0^{\epsilon /2})\) elements \(d_0,d_1\in \mathcal {V}\) with \(d_0,d_1\ll Q_0\). Thus, recalling that \(Q_0E_0\ll X/N K\), we have
We recall that \(Q+E\ll (X/N K)^2\), and so this gives
as required. \(\square \)
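The proof above uses the fact that the set \(\mathcal {V}=\{2^u5^v\}\) is very sparse: the number of its elements up to \(Q_0\) is \(O((\log Q_0)^2)\), which is \(O(Q_0^{\epsilon /2})\). The following Python sketch enumerates these elements. The function name is hypothetical.

```python
def smooth_numbers_up_to(bound):
    """All d = 2^u * 5^v with u, v >= 0 and d <= bound, in increasing order."""
    found = []
    p2 = 1
    while p2 <= bound:
        d = p2
        while d <= bound:
            found.append(d)
            d *= 5
        p2 *= 2
    return sorted(found)

small = smooth_numbers_up_to(100)
large_count = len(smooth_numbers_up_to(10**6))
```

Each element corresponds to a pair \((u,v)\) with \(u\le \log _2 Q_0\) and \(v\le \log _5 Q_0\), which gives the \(O((\log Q_0)^2)\) count directly.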
15 Line estimates
In this section we establish Proposition 13.4, which controls the contribution from pairs of angles which cause a large contribution to the bilinear sums considered in Sect. 13 to come from a line. If a line L makes a large contribution, then \((a_1,a_2,X)\) must lie close to the low height plane orthogonal to this line. We note that we do not make use of the fact that these angles lie outside the major arcs, but it is vital that the angles are restricted to the small set \(\mathcal {E}\).
Lemma 15.1
(Line angles lie in low height plane) Let \(0<\delta <1\) and \(K,N,X>1\) be reals with \(\delta \ge N/X\) and \(N K\ge X^{17/40}\). Let \(\mathcal {B}_2=\mathcal {B}_2(N,K,\delta )\) be the set of integer pairs \((a_1,a_2)\in [0,X)^2\) such that there is a line L through the origin such that
Then all pairs \((a_1,a_2)\in \mathcal {B}_2\) satisfy
\[v_1a_1+v_2a_2+v_3X+v_4=0\]
for some integers \(v_1,v_2,v_3,v_4\ll X/N^2K\) not all zero.
Proof
Let \(\mathbf {v}=(v_1,v_2,v_3)\) be a non-zero element of \(\mathbb {Z}^3\cap L\) of smallest norm, and let \(V=\Vert \mathbf {v}\Vert _2\) and \(\epsilon _1=|v_1a_1+v_2a_2+v_3X|\). Then all of \(\mathbb {Z}^3\cap L\) is generated by \(\mathbf {v}\), and so
By assumption, this is also \(\gg \delta N^2K\), and so we obtain
Letting \(v_4=-(v_1a_1+v_2a_2+v_3X)\in \{\pm \epsilon _1\}\) gives the result. \(\square \)
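The two elementary steps in this proof, namely picking the generator of \(\mathbb {Z}^3\cap L\) and defining \(v_4\) so that a linear relation holds exactly, can be sketched in Python as follows. The function names and sample values are hypothetical. The sketch assumes one already has some non-zero integer point on \(L\).

```python
from math import gcd

def primitive_generator(v):
    """Generator (up to sign) of the integer points on the line through 0 and v."""
    g = gcd(gcd(abs(v[0]), abs(v[1])), abs(v[2]))
    if g == 0:
        raise ValueError("v must be non-zero")
    return tuple(x // g for x in v)

def plane_relation(v, a1, a2, X):
    """Return (v1, v2, v3, v4) with v1*a1 + v2*a2 + v3*X + v4 = 0."""
    v4 = -(v[0] * a1 + v[1] * a2 + v[2] * X)
    return (v[0], v[1], v[2], v4)

# Hypothetical integer point on a line L and hypothetical (a1, a2, X).
v = primitive_generator((6, -9, 15))
rel = plane_relation(v, a1=700, a2=450, X=1000)
residual = rel[0] * 700 + rel[1] * 450 + rel[2] * 1000 + rel[3]
```

The substance of the lemma is the size bound on \(v_1,\ldots ,v_4\), which this sketch does not address. It only illustrates how the relation itself is formed.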
Lemma 15.2
(Sparse sets restricted to low height planes) Let \(\mathcal {C}\subseteq [0,X)\) be a set of integers. Then we have for any \(V\ge 1\)

Proof
Trivially there are \(O(\#\mathcal {C}^2)\) choices of \(a_1,a_2\in \mathcal {C}\), which gives the required bound if \(V>\#\mathcal {C}^{3/8}\). In particular, we may assume that \(V< \#\mathcal {C}\le X\). There are \(O(\#\mathcal {C})\) points with \(a_1=0\) or \(a_2=0\), so we may assume that \(a_1,a_2\ne 0\).
We first claim that there are
choices of \(v_1\), \(v_2\), \(v_3\), \(v_4\), \(a_1\), and \(a_2\) satisfying \(v_1a_1+v_2a_2+v_3X+v_4=0\) with at least one of \(v_1,v_2,v_3,v_4\) equal to 0 and at least one of \(v_1,v_2,v_3,v_4\) non-zero. For example, if \(v_1=0\) then there are \(O(\#\mathcal {C}V^2)\) choices of \(a_1,v_3,v_4\), which then determines \(v_2a_2\). Since there are no non-zero solutions to \(v_3X+v_4=0\), this is non-zero and so there are \(O(X^\epsilon )\) choices of \(v_2,a_2\). The other cases are entirely analogous. Thus it suffices to consider pairs \((a_1,a_2)\) such that \(v_1a_1+v_2a_2+v_3X+v_4=0\) for some \(v_1,v_2,v_3,v_4\) all non-zero. We let \(\mathcal {C}_2\) denote the set of such pairs.
Given \(a\in \mathbb {Z}\), let \(M_a\) be the smallest value of \((c_1^2+c_2^2)^{1/2}\) over all non-zero integers \(c_1,c_2\) such that \(c_1\equiv c_2 X\ (\mathrm {mod}\ a)\). We divide \(\mathcal {C}\) into \(O(\log {X})^2\) subsets localizing the size of \(a<X\) and \(M_a<X\) by considering the sets
There are \(O(M^2)\) choices of \(c_1,c_2\) with \((c_1^2+c_2^2)^{1/2}\le M\), and given any such choice with \(M<X\) there are \(X^{o(1)}\) choices of \(a|c_1-c_2 X\) from the divisor bound (noting that this must be non-zero). Thus we have that
By Cauchy–Schwarz we have

where

We wish to bound \(N_2\). Given \(v_1,v_1'\), let \(d=\gcd (v_1,v_1')\) and \(v_1=d\tilde{v}_1\), \(v_1'=d\tilde{v}_1'\) so \(\gcd (\tilde{v}_1,\tilde{v}_1')=1\). We split the count \(N_2\) by considering \(\max (\tilde{v}_1,\tilde{v}_1')\sim V_1\) for different choices of \(V_1\). Since \(V< X\), there are \(O(\log {X})\) choices of \(V_1\) we need to consider. This gives
where

We wish to show that \(N_3(V_1)\ll X^{o(1)}(\#\mathcal {C}^{3/2}V^4+\#\mathcal {C}^2V^6/X)\) for any choice of \(0<V_1<V\). By symmetry we may assume \(|\tilde{v}_1|\ge |\tilde{v}_1'|\), so \(|\tilde{v}_1|\sim V_1\). Let \(b_1=\tilde{v}_1'v_2\), \(b_2=-\tilde{v}_1v_2'\), \(b_3=\tilde{v}_1'v_3-\tilde{v}_1v_3'\) and \(b_4=\tilde{v}_1'v_4-\tilde{v}_1v_4'\). We see that any solution counted by \(N_3(V_1)\) must give a solution to
with \(0\le |b_1|,|b_2|,|b_3|,|b_4|\le 2V_1V\) and \(b_1,b_2\ne 0\).
There are \(O(V_1^3V^3)\) choices of \(b_2,b_3,b_4\) and \(O(\#\mathcal {C})\) choices of \(a_2'\). Given such a choice of \(b_2,b_3,b_4,a_2'\), there are \(O(X^{o(1)})\) choices of \(b_1\) and \(a_2\) by the divisor bound, since \(b_1a_2=-b_2a_2'-b_3X-b_4\) and \(b_1a_2\) is non-zero. Given \(b_1,b_2\) there are \(O(X^{o(1)})\) choices of \(\tilde{v}_1,\tilde{v}_1',v_2,v_2'\) by the divisor bound (recall \(b_1,b_2\ne 0\)). Given \(\tilde{v}_1,\tilde{v}_1'\) and \(b_3\) we see that
Thus there are \(O(V/V_1)\) choices of \(v_3\) (here we use the fact that \(\gcd (\tilde{v}_1,\tilde{v}_1')=1\)). Given \(\tilde{v}_1,\tilde{v}_1',b_3\) and such a choice of \(v_3\) there is just one choice of \(v_3'\). Similarly, there are \(O(V/V_1)\) choices of \(v_4,v_4'\) given \(\tilde{v}_1,\tilde{v}_1'\) and \(b_4\). Given \(\tilde{v}_1,v_2,v_3,v_4,a_2\), there are \(O(X^{o(1)})\) choices of \(d,a_1\) since \(d a_1\tilde{v}_1=-(v_2a_2+v_3 X+v_4)\) and \(d a_1 \tilde{v}_1\ne 0\). Putting this all together, we have
This bound will be good for us if \(V_1\) is small, but we need a different argument if \(V_1\) is large.
We note that
We make a choice of \(a_2,a_2',b_1\), for which there are \(\ll V V_1 X^{o(1)}\min (M^4,\#\mathcal {C}^2)\) possibilities counted by \(N_3(V_1)\). We see that \(b_3,b_4\) satisfy
Let \(b_{3,0},b_{4,0}\) be a solution to this congruence with \(b_{3,0}^2+b_{4,0}^2\) minimal. We may assume that \(b_{3,0}\ll VV_1A/X\) and \(b_{4,0}\ll VV_1\) since otherwise there are no possible \(b_3,b_4\). All pairs \(b_3,b_4\) satisfying the congruence are then of the form \((b_3,b_4)=(b_{3,0}+b_3',b_{4,0}+b_4')\) for some integers \(b_3',b_4'\) satisfying \(b_3'X+b_4'\equiv 0\ (\mathrm {mod}\ a_2')\) and \(b_3'\ll V V_1A/X\), \(b_4'\ll V V_1\). This forces \(b_3'\mathbf {e}_1+b_4'\mathbf {e}_2\) to lie in a lattice \(\Lambda \subset \mathbb {Z}^2\) of determinant \(a_2'\), where \(\mathbf {e}_1,\mathbf {e}_2\) are the standard basis vectors of \(\mathbb {Z}^2\). Let \(\phi :\mathbb {R}^2\rightarrow \mathbb {R}^2\) be the linear map which is a dilation by a factor X / A in the \(\mathbf {e}_1\) direction, and \(\Lambda '=\phi (\Lambda )\), a lattice in \(\mathbb {R}^2\) of determinant \(a_2'X/A\asymp X\).
Let \(\Lambda '\) have a Minkowski-reduced basis \(\{\mathbf {v}_1,\mathbf {v}_2\}\). We recall this means that \(\Vert \mathbf {v}_1\Vert _2\cdot \Vert \mathbf {v}_2\Vert _2\asymp \det (\Lambda ')=a_2'X/A\asymp X\) and \(\Vert n_1\mathbf {v}_1+n_2\mathbf {v}_2\Vert _2\asymp \Vert n_1\mathbf {v}_1\Vert _2+\Vert n_2\mathbf {v}_2\Vert _2\). From the definition of \(M_a\), we see that the smallest non-zero vector in \(\Lambda \) has length at least M / 10, and so since \(\phi \) can only increase the length of vectors we have \(\Vert \mathbf {v}_1\Vert _2,\Vert \mathbf {v}_2\Vert _2\ge M/10\).
The set of vectors \(b_3'\mathbf {e}_1+b_4'\mathbf {e}_2\) in \(\Lambda \) inside the bounded region \(|b_3'|\ll V V_1 A/X\), \(|b_4'|\ll V V_1\) can be injected by \(\phi \) into the set \(\{\mathbf {x}\in \Lambda ':\,\Vert \mathbf {x}\Vert _2\le CV V_1\}\) for some suitably large constant C. Thus, provided C is sufficiently large so that we also have \(\Vert n_1\mathbf {v}_1+n_2\mathbf {v}_2\Vert _2\ge \max _i\Vert n_i\mathbf {v}_i\Vert _2/C\), we see that the number of pairs \((b_3',b_4')\) is bounded by
Here we used the fact that \(\Vert \mathbf {v}_1\Vert _2,\Vert \mathbf {v}_2\Vert _2\gg M\) and \(\Vert \mathbf {v}_1\Vert _2\cdot \Vert \mathbf {v}_2\Vert _2\asymp \det (\Lambda ')\) in the penultimate line, and \(\det (\Lambda ')\asymp X\) in the final line.
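The property \(\Vert \mathbf {v}_1\Vert _2\cdot \Vert \mathbf {v}_2\Vert _2\asymp \det \) of a reduced basis can be checked on a small example. The Python sketch below applies the classical Lagrange–Gauss reduction to the undilated lattice \(\Lambda =\{(b_3',b_4')\in \mathbb {Z}^2:\,b_3'X+b_4'\equiv 0\ (\mathrm {mod}\ a_2')\}\). The function names and the sample values of \(a_2'\) and \(X\) are hypothetical. For simplicity the dilation \(\phi \) from the proof is omitted, so this illustrates the reduction step only.

```python
def norm2(w):
    """Squared Euclidean length of an integer 2-vector."""
    return w[0] * w[0] + w[1] * w[1]

def lagrange_reduce(u, v):
    """Lagrange-Gauss reduction of a basis (u, v) of a rank-2 integer lattice."""
    if norm2(u) > norm2(v):
        u, v = v, u
    while True:
        n = norm2(u)
        dot = u[0] * v[0] + u[1] * v[1]
        m = (2 * dot + n) // (2 * n)           # nearest integer to dot / n
        v = (v[0] - m * u[0], v[1] - m * u[1])
        if norm2(v) >= n:
            return u, v                        # now |u.v| <= n/2 and |u| <= |v|
        u, v = v, u

# Hypothetical sample values for the modulus a_2' and for X.
a2p, X = 97, 1000
# Basis of {(b3, b4) : b3*X + b4 = 0 (mod a2p)}, a lattice of determinant a2p.
basis = ((1, (-X) % a2p), (0, a2p))
v1, v2 = lagrange_reduce(*basis)
det = abs(v1[0] * v2[1] - v1[1] * v2[0])
```

A reduced basis satisfies \(\det ^2\le \Vert \mathbf {v}_1\Vert _2^2\Vert \mathbf {v}_2\Vert _2^2\le \tfrac{4}{3}\det ^2\), which is one precise form of the relation \(\Vert \mathbf {v}_1\Vert _2\cdot \Vert \mathbf {v}_2\Vert _2\asymp \det \) recalled in the proof.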
Given any choice of \(a_2,a_2',b_1,b_3,b_4\), we see that \(b_2\) is then determined uniquely by \(b_1a_2+b_2a_2'+b_3X+b_4=0\), since we have already chosen all the other terms. As before, given \(a_2\), \(a_2'\), \(b_1\), \(b_2\), \(b_3\), \(b_4\) there are \(O(X^{o(1)}V^2/V_1^2)\) choices of \(\tilde{v}_1\), \(\tilde{v}_1'\), \(v_2\), \(v_3\), \(v_4\), \(v_2'\), \(v_3'\), \(v_4'\), d, \(a_1\). Putting this all together, we obtain the bound
Since \(\min (M^4,\#\mathcal {C}^2)\le \min (M\#\mathcal {C}^{3/2},\#\mathcal {C}^2)\) this gives
Combining (15.4) and (15.5), we obtain
We substitute (15.3) and (15.6) into (15.2), and obtain
We recall from (15.1) that terms with \(v_1v_2v_3v_4a_1a_2=0\) contribute a total \(O(\#\mathcal {C}V^2X^{o(1)})\), which is negligible compared with the \(\#\mathcal {C}^{5/4}V^2\) term above. Thus we obtain the result. \(\square \)
We see that Lemma 15.2 improves on the trivial bound \(O(X^{o(1)}\min (V^3\#\mathcal {C},\#\mathcal {C}^2))\) if \(V^{8/3+\epsilon }\ll \#\mathcal {C}\ll V^{4-\epsilon }+X^{1-\epsilon }\).
Proof of Proposition 13.4
We wish to show that
in the region \(N\gg X^{9/25}\). We recall that
for some \(B\ll X^{23/80}\). Trivially, we have that
By Lemma 10.4, we have
This gives
on verifying that \(4/77\times 23/80+118/433<23/80\). This gives the required bound if \(N K\ll X^{57/80}/B\).
Alternatively, if \(N K\gg X^{57/80}/B\), we use Lemmas 15.1 and 15.2 to bound \(\#(\mathcal {B}_2\cap (\mathcal {E}')^2)\), and obtain
Here we have written \(\mathbf {a}\) for the vector \((a_1,a_2,X,1)\in \mathbb {Z}^4\).
Since \(N K\gg X^{57/80}/B\), we have \(X/N K\ll X^{23/80}B\). Combining this bound with (15.7), we obtain bounds for \((\#\mathcal {E}')^{5/4}B^{-2}X/N K\) and \((\#\mathcal {E}')^{3/2}B^{-2}X^{-1/2}(X/N K)^2\) of the form \(X^a B^b\) for some \(b>0\). Since we are only considering \(B\ll X^{23/80}\), these expressions are maximized when \(B\asymp X^{23/80}\). When \(B\asymp X^{23/80}\) we have \(\#\mathcal {E}'\ll X^{23/40}\) and \(X/N K\ll X^{23/40}\). Thus we obtain the bounds
Substituting these bounds into (15.8) gives
We can then verify that \(2\times 9/25>23/32\) and that \(3\times 9/25>15/16\), so for \(N\gg X^{9/25}\) this is \(O(X^{1-\epsilon }/NK)\), as required. \(\square \)
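The exponent inequalities quoted in this proof can be confirmed with exact rational arithmetic. The Python sketch below does so. The dictionary name is a hypothetical label, and the inequalities are exactly those stated in the proof.

```python
from fractions import Fraction as F

# Exponent inequalities quoted in the proof of Proposition 13.4, checked exactly.
prop_13_4_checks = {
    "4/77 * 23/80 + 118/433 < 23/80":
        F(4, 77) * F(23, 80) + F(118, 433) < F(23, 80),
    "2 * 9/25 > 23/32": 2 * F(9, 25) > F(23, 32),
    "3 * 9/25 > 15/16": 3 * F(9, 25) > F(15, 16),
}

all_hold = all(prop_13_4_checks.values())
```

The first inequality holds only with a margin of roughly \(5\times 10^{-5}\), so exact fractions are preferable to floating point for this check.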
16 Modifications for Theorem 1.2
Theorem 1.2 follows from essentially the same overall approach as in Theorem 1.1. We only provide a brief sketch of the proof, leaving the complete details to the interested reader. When q is large, there is negligible benefit from using the \(235/154{\mathrm{th}}\) moment, so we just use \(\ell ^1\) bounds. For \(Y=q^k\) a power of q, we let
The inner sum is \(\le \min (q-s,\,s+2/\Vert q^i\theta \Vert )\). Thus, similarly to Lemma 10.3, we find
In particular, for q large enough in terms of \(\epsilon \) and \(s\le q^{23/80}\), this is \(O(Y^{23/80+\epsilon })\). We can use this bound in place of Lemmas 10.3 and 10.4 throughout the argument with the same (or stronger) consequences. This gives the first part of Theorem 1.2.
For the second part of Theorem 1.2, we see that in the special case \(\mathcal {B}=\{0,\ldots ,s-1\}\) we have
Using this bound, we get a corresponding improvement on (16.1), which gives
If \(s\le q-q^{57/80}\) and q is sufficiently large in terms of \(\epsilon \), this gives a bound \(Y^{23/80+\epsilon }\). As before, using this bound in place of Lemmas 10.3 and 10.4 throughout gives the result.
For the results mentioned after Theorem 1.2, we find that in the further restricted ranges \(s\le q^{1/4-\delta }\) (or \(s\le q-q^{3/4+\delta }\) if \(\mathcal {B}=\{0,\ldots ,s-1\}\)), the bound (16.1) [or (16.2)] gives an \(\ell ^1\) bound of \(Y^{1/4-\delta /2}\). Following this through the argument, we obtain a wider Type II range and can estimate bilinear sums provided \(N\in [X^{5/16},X^{1/2}]\) instead of \([X^{9/25},X^{17/40}]\). By symmetry, we can then also estimate terms in \(N\in [X^{1/2},X^{11/16}]\). This allows us to obtain asymptotic estimates for all the terms in the right hand side of the identity
by the equivalents of Propositions 6.1 and 6.2 adapted to this larger Type II range.
References
Banks, William D., Conflitti, Alessandro, Shparlinski, Igor E.: Character sums over integers with restricted \(g\)-ary digits. Ill. J. Math. 46(3), 819–836 (2002)
Banks, William D., Shparlinski, Igor E.: Arithmetic properties of numbers with restricted digits. Acta Arith. 112(4), 313–332 (2004)
Bourgain, Jean: Prescribing the binary digits of primes, II. Isr. J. Math. 206(1), 165–182 (2015)
Col, Sylvain: Diviseurs des nombres ellipséphiques. Period. Math. Hung. 58(1), 1–23 (2009)
Coquet, Jean: On the uniform distribution modulo one of some subsequences of polynomial sequences. J. Number Theory 10(3), 291–296 (1978)
Coquet, Jean: On the uniform distribution modulo one of subsequences of polynomial sequences. II. J. Number Theory 12(2), 244–250 (1980)
Dartyge, Cécile, Mauduit, Christian: Nombres presque premiers dont l’écriture en base \(r\) ne comporte pas certains chiffres. J. Number Theory 81(2), 270–291 (2000)
Dartyge, Cécile, Mauduit, Christian: Ensembles de densité nulle contenant des entiers possédant au plus deux facteurs premiers. J. Number Theory 91(2), 230–255 (2001)
Davenport, Harold: Indefinite quadratic forms in many variables. II. Proc. Lond. Math. Soc. 3(8), 109–126 (1958)
Davenport, H.: Multiplicative Number Theory. Graduate Texts in Mathematics, vol. 74, 3rd edn. Springer, New York (2000). (Revised and with a preface by Hugh L. Montgomery)
Drmota, Michael, Mauduit, Christian: Weyl sums over integers with affine digit restrictions. J. Number Theory 130(11), 2404–2427 (2010)
Erdős, Paul, Mauduit, Christian, Sárközy, András: On arithmetic properties of integers with missing digits. I. Distribution in residue classes. J. Number Theory 70(2), 99–120 (1998)
Erdős, Paul, Mauduit, Christian, Sárközy, András: On arithmetic properties of integers with missing digits. II. Prime factors. Discrete Math. 200(1–3), 149–164 (1999). Paul Erdős memorial collection
Friedlander, J., Iwaniec, H.: Opera de cribro. American Mathematical Society Colloquium Publications, vol. 57. American Mathematical Society, Providence (2010)
Harman, G.: Prime-Detecting Sieves. London Mathematical Society Monographs Series, vol. 33. Princeton University Press, Princeton (2007)
Konyagin, Sergei: Arithmetic properties of integers with missing digits: distribution in residue classes. Period. Math. Hung. 42(1–2), 145–162 (2001)
Mauduit, Christian, Rivat, Joël: Sur un problème de Gelfond: la somme des chiffres des nombres premiers. Ann. Math. (2) 171(3), 1591–1646 (2010)
Montgomery, H.L., Vaughan, R.C.: Multiplicative Number Theory. I. Classical Theory. Cambridge Studies in Advanced Mathematics, vol. 97. Cambridge University Press, Cambridge (2007)
Acknowledgements
We thank Ben Green for introducing the author to this problem, Xuancheng Shao for useful discussions and Fabian Karwatowski for some important corrections. We also thank the anonymous referee for many helpful suggestions and corrections. The author is supported by a Clay Research Fellowship and a Fellowship by Examination of Magdalen College, Oxford. Part of this work was performed whilst the author was visiting Stanford University, whose hospitality is gratefully acknowledged.
Rights and permissions
OpenAccess This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Maynard, J. Primes with restricted digits. Invent. math. 217, 127–218 (2019). https://doi.org/10.1007/s00222-019-00865-6
DOI: https://doi.org/10.1007/s00222-019-00865-6