Primes with restricted digits

Let $a_0\in\{0,\ldots,9\}$. We show there are infinitely many prime numbers which do not have the digit $a_0$ in their decimal expansion. The proof is an application of the Hardy–Littlewood circle method to a binary problem, and rests on obtaining suitable ‘Type I’ and ‘Type II’ arithmetic information for use in Harman’s sieve to control the minor arcs. This is obtained by decorrelating Diophantine conditions which dictate when the Fourier transform of the primes is large from digital conditions which dictate when the Fourier transform of numbers with restricted digits is large. These estimates rely on a combination of the geometry of numbers, the large sieve and moment estimates obtained by comparison with a Markov process.


Introduction
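Let $a_0\in\{0,\ldots,9\}$ and let $\mathcal{A}_1$ be the set of non-negative integers with no digit equal to $a_0$ in their decimal expansion. The number of elements of $\mathcal{A}_1$ which are less than $x$ is $O(x^{\log 9/\log 10})$, an exponent which falls short of 1 by $\log(10/9)/\log 10\approx 0.046>0$. In particular, $\mathcal{A}_1$ is a sparse subset of the natural numbers. A set being sparse in this way presents several analytic difficulties if one tries to answer arithmetic questions such as whether the set contains infinitely many primes. Typically we can only show that sparse sets contain infinitely many primes when the set in question possesses some additional multiplicative structure.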
The set $\mathcal{A}_1$ has unusually nice structure in that its Fourier transform has a convenient explicit analytic description, and is often unusually small in size. There has been much previous work [1,2,4–6,11,13] studying $\mathcal{A}_1$ and related sets by exploiting this Fourier structure. In particular the work of Dartyge and Mauduit [7,8] shows the existence of infinitely many integers in $\mathcal{A}_1$ with at most 2 prime factors, this result relying on the fact that $\mathcal{A}_1$ is well-distributed in arithmetic progressions [7,12,16]. We also mention the related work of Mauduit and Rivat [17] who showed the sum of digits of primes is well-distributed, and the work of Bourgain [3] which showed the existence of primes in the sparse set created by prescribing a positive proportion of the binary digits.
We show that there are infinitely many primes in $\mathcal{A}_1$. Our proof is based on a combination of the circle method, Harman's sieve, the method of bilinear sums, the large sieve, the geometry of numbers and a comparison with a Markov process. In particular, we make key use of the Fourier structure of $\mathcal{A}_1$, in the same spirit as the aforementioned works. Somewhat surprisingly, the Fourier structure allows us to successfully apply the circle method to a binary problem.

Theorem 1.1 Let $X\ge 4$ and
$$\mathcal{A}=\Bigl\{\sum_{0\le i\le k}n_i10^i<X:\ n_i\in\{0,\ldots,9\}\setminus\{a_0\},\ k\ge 0\Bigr\}$$
be the set of numbers less than $X$ with no digit in their decimal expansion equal to $a_0$. Then we have
$$\#\{p\in\mathcal{A}\}\asymp\frac{\#\mathcal{A}}{\log X}\asymp\frac{X^{\log 9/\log 10}}{\log X}.$$
Here, and throughout the paper, $f\asymp g$ means that there are absolute constants $c_1,c_2>0$ such that $c_1f<g<c_2f$. Thus there are infinitely many primes with no digit $a_0$ when written in base 10. Since $\#\mathcal{A}/X^{\log 9/\log 10}$ oscillates as $X\to\infty$, we cannot expect an asymptotic formula of the form $(c+o(1))X^{\log 9/\log 10}/\log X$. Nonetheless, we expect that
$$\#\{p\in\mathcal{A}\}=(\kappa_{\mathcal{A}}+o(1))\frac{\#\mathcal{A}}{\log X},$$
where
$$\kappa_{\mathcal{A}}=\begin{cases}\dfrac{10(\phi(10)-1)}{9\phi(10)},&\text{if }(10,a_0)=1,\\[2mm]\dfrac{10}{9},&\text{otherwise.}\end{cases}\qquad(1.1)$$
Indeed, there are $(\phi(10)\kappa_{\mathcal{A}}/10+o(1))\#\mathcal{A}$ elements of $\mathcal{A}$ which are coprime to 10, and $(1+o(1))X/\log X$ primes less than $X$ which are coprime to 10, and $(\phi(10)/10+o(1))X$ integers less than $X$ coprime to 10. Thus if the properties 'being in $\mathcal{A}$' and 'being prime' were independent for integers $n<X$ coprime to 10, we would expect $(\kappa_{\mathcal{A}}+o(1))\#\mathcal{A}/\log X$ primes in $\mathcal{A}$. Theorem 1.1 shows this heuristic guess is within a constant factor of the truth, and we would be able to establish such an asymptotic formula if we had stronger 'Type II' information.
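The heuristic can be compared with actual counts for small $X$. The following is a minimal sketch, not part of the original argument: the function names are illustrative, and the constant is the one in (1.1) with $\phi(10)=4$. At such small $X$ the $o(1)$ terms are not negligible, so only rough agreement should be expected.

```python
from fractions import Fraction
from math import gcd, log


def primes_below(x):
    """Return the list of primes p < x via the sieve of Eratosthenes."""
    if x < 2:
        return []
    is_prime = bytearray([1]) * x
    is_prime[0] = is_prime[1] = 0
    for i in range(2, int(x ** 0.5) + 1):
        if is_prime[i]:
            # Cross out the multiples i*i, i*i + i, ... below x.
            is_prime[i * i :: i] = bytearray(len(range(i * i, x, i)))
    return [n for n in range(x) if is_prime[n]]


def avoids_digit(n, a0):
    """True if the usual decimal expansion of n does not contain the digit a0."""
    return str(a0) not in str(n)


def kappa(a0):
    """The constant kappa_A of (1.1), as an exact fraction."""
    phi10 = 4  # Euler's phi(10)
    if gcd(10, a0) == 1:
        return Fraction(10 * (phi10 - 1), 9 * phi10)
    return Fraction(10, 9)


def compare(X, a0):
    """Return (#primes in A, #A, heuristic prediction kappa_A * #A / log X)."""
    size_A = sum(1 for n in range(X) if avoids_digit(n, a0))
    prime_count = sum(1 for p in primes_below(X) if avoids_digit(p, a0))
    heuristic = float(kappa(a0)) * size_A / log(X)
    return prime_count, size_A, heuristic


if __name__ == "__main__":
    for X in (10 ** 3, 10 ** 4, 10 ** 5):
        print(X, compare(X, 7))
```

Calling `compare(X, a0)` for increasing powers of 10 gives a quick sanity check that the prime count and the prediction have the same order of magnitude.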
One can consider the same problem in bases other than 10, and with more than one excluded digit. The set of numbers less than $X$ missing $s$ digits in base $q$ has about $X^c$ elements, where $c=\log(q-s)/\log q$. For fixed $s$, the density becomes larger as $q$ increases, and so the problem becomes easier. Our methods are not powerful enough to show the existence of infinitely many primes with two digits not appearing in their decimal expansion, but they can show that there are infinitely many primes with $s$ digits excluded in base $q$ provided $q$ is large enough in terms of $s$. Moreover, if the set of excluded digits possesses some additional structure this can apply to very thin sets formed in this way. In particular, Theorem 1.2 shows the existence of many primes in a set of integers $\mathcal{A}$ with $\#\mathcal{A}\approx X^{57/80}=X^{0.7125}$, a rather thin set. The exponent here can be improved slightly with more effort.

The estimates in Theorem 1.2 can be improved to asymptotic formulae if we restrict $s$ slightly further. For a general set $\mathcal{B}$ of excluded digits with $s=\#\mathcal{B}\le q^{1/4-\delta}$ and any $q$ sufficiently large in terms of $\delta>0$ we obtain an asymptotic formula for the number of primes in the set, with a leading constant depending on the number $t$ of elements of $\mathcal{B}$ coprime to $q$. In the case of just one excluded digit, we can obtain this asymptotic formula for $q\ge 12$. In the case of $\mathcal{B}=\{0,\ldots,s-1\}$, we obtain the above asymptotic formula provided $s\le q-q^{3/4+\delta}$.

We expect several of the techniques introduced in this paper might be useful more generally in other digit-related questions about arithmetic sequences. Our general approach to counting primes in $\mathcal{A}$ and our analysis of the minor arc contribution might also be of independent interest, with potential application to other questions on primes involving sets whose Fourier transform is unrelated to Diophantine properties of the argument.
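To make the density exponent $c=\log(q-s)/\log q$ concrete, here is a small illustrative helper (the function name is an assumption, not notation from the paper). It shows how, for a fixed number $s$ of excluded digits, the exponent moves towards 1 as the base $q$ grows.

```python
from math import log


def density_exponent(q, s):
    """Exponent c = log(q - s) / log(q) for integers missing s digits in base q."""
    if q < 2 or not 0 <= s < q:
        raise ValueError("need a base q >= 2 and 0 <= s < q excluded digits")
    return log(q - s) / log(q)


if __name__ == "__main__":
    for q in (10, 12, 100, 1000):
        print(q, round(density_exponent(q, 1), 4), round(density_exponent(q, 2), 4))
```

For base 10 with one excluded digit this recovers the exponent $\log 9/\log 10$ appearing in Theorem 1.1.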

Outline
Our argument is fundamentally based on an application of the circle method. Clearly for the purposes of Theorem 1.1 we can restrict $X$ to a power of 10 for convenience. The number of primes in $\mathcal{A}$ is the number of solutions of the binary equation $p-a=0$ over primes $p$ and integers $a\in\mathcal{A}$, and so is given by
$$\#\{p\in\mathcal{A}\}=\frac{1}{X}\sum_{0\le a<X}S_{\mathcal{A}}\Bigl(\frac{a}{X}\Bigr)S_{\mathbb{P}}\Bigl(\frac{-a}{X}\Bigr),\qquad S_{\mathcal{A}}(\theta)=\sum_{n\in\mathcal{A}}e(n\theta),\quad S_{\mathbb{P}}(\theta)=\sum_{p<X}e(p\theta).$$
We then separate the contribution from the $a$ in the 'major arcs', which give our expected main term for $\#\{p\in\mathcal{A}\}$, and the $a$ in the 'minor arcs', which we bound for an error term. The reader might be (justifiably) somewhat surprised by this, since it is well known that the circle method typically cannot be applied to binary problems. Indeed, one cannot generally hope for bounds better than 'square-root cancellation' for 'generic' $\theta\in[0,1]$. Thus if one cannot exploit cancellation amongst the different terms in the minor arcs, we would expect that the $X$ different 'generic' $a$ in the sum above would contribute an error term which we can only bound as $O(X^{1/2}\#\mathcal{A}^{1/2})$, and this would dominate the expected main term.
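The starting identity is orthogonality of additive characters modulo $X$: for sets of integers in $[0,X)$, the average over $0\le a<X$ of $S_{\mathcal{A}}(a/X)S_{\mathbb{P}}(-a/X)$ counts pairs with $p=a$. The sketch below checks this numerically for a small $X$. The helper names are illustrative, and the exponential sums are evaluated naively, which is only sensible for tiny examples.

```python
import cmath


def e(x):
    """The complex exponential e(x) = exp(2*pi*i*x)."""
    return cmath.exp(2j * cmath.pi * x)


def exp_sum(support, theta):
    """Exponential sum over a finite set of integers: sum of e(n * theta)."""
    return sum(e(n * theta) for n in support)


def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def count_via_fourier(A, P, X):
    """(1/X) * sum over 0 <= a < X of S_A(a/X) * S_P(-a/X)."""
    total = sum(exp_sum(A, a / X) * exp_sum(P, -a / X) for a in range(X))
    return total / X


def demo(X=100, a0=7):
    """Return (Fourier-side count, direct count) of primes below X avoiding digit a0."""
    A = [n for n in range(X) if str(a0) not in str(n)]
    P = [n for n in range(X) if is_prime(n)]
    direct = len(set(A) & set(P))
    return count_via_fourier(A, P, X), direct


if __name__ == "__main__":
    print(demo())
```

The Fourier-side value is a complex number equal to the direct count up to floating-point rounding; the difficulty in the paper lies in bounding the terms with $a$ in the minor arcs, not in the identity itself.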
It turns out that the Fourier transform $S_{\mathcal{A}}(\theta)$ has some remarkable features which cause it typically to have better than square-root cancellation. (A closely related phenomenon is present and crucial in the work of Mauduit and Rivat [17] and Bourgain [3].) Indeed, we establish the $\ell^1$ bound
$$\sum_{0\le a<X}\Bigl|S_{\mathcal{A}}\Bigl(\frac{a}{X}\Bigr)\Bigr|\ll\#\mathcal{A}\,X^{0.36},\qquad(2.1)$$
which shows that for 'generic' $a$ we have $S_{\mathcal{A}}(a/X)\ll\#\mathcal{A}/X^{0.64}\ll X^{0.32}$. This gives us a (small) amount of room for a possible successful application of the circle method, since now we might hope the 'generic' $a$ would contribute a total $O(X^{0.82})$ if the bound $S_{\mathbb{P}}(a/X)\ll X^{1/2+\epsilon}$ held for all $a$ in the minor arcs, and this $O(X^{0.82})$ error term is now smaller than the expected main term of size $\#\mathcal{A}^{1+o(1)}$.
We actually get good asymptotic control over all moments (including fractional ones) of S A (a/ X ) rather than just the first. By making a suitable approximation to S A (θ ), we can re-interpret moments of this approximation as the average probability of restricted paths in a Markov process, and obtain asymptotic estimates via a finite eigenvalue computation.
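The explicit description of the Fourier transform can be illustrated as follows. When $X=10^k$ and $\mathcal{A}$ is taken to be the set of digit strings of length $k$ avoiding $a_0$ (leading zeros allowed, which is an assumption made here for simplicity), the exponential sum factorises as a product over digit positions. The sketch below checks this factorisation against the direct sum and computes moments $\sum_a|S_{\mathcal{A}}(a/X)|^t$. By Parseval the second moment is exactly $X\cdot\#\mathcal{A}$, which the code can be used to confirm. All function names are illustrative.

```python
import cmath
from itertools import product


def e(x):
    """The complex exponential e(x) = exp(2*pi*i*x)."""
    return cmath.exp(2j * cmath.pi * x)


def S_A_product(a, k, a0):
    """S_A(a / 10^k) as a product over the k digit positions."""
    X = 10 ** k
    digits = [n for n in range(10) if n != a0]
    value = 1
    for i in range(k):
        # Reduce modulo X first so the argument of e() stays in [0, 1).
        value *= sum(e((n * 10 ** i * a % X) / X) for n in digits)
    return value


def S_A_direct(a, k, a0):
    """S_A(a / 10^k) summed directly over all admissible digit strings."""
    X = 10 ** k
    digits = [n for n in range(10) if n != a0]
    total = 0
    for ds in product(digits, repeat=k):
        n = sum(d * 10 ** i for i, d in enumerate(ds))
        total += e((n * a % X) / X)
    return total


def moment(k, a0, t):
    """Sum over 0 <= a < 10^k of |S_A(a / 10^k)|^t."""
    return sum(abs(S_A_product(a, k, a0)) ** t for a in range(10 ** k))


if __name__ == "__main__":
    k, a0 = 4, 7
    size_A, X = 9 ** k, 10 ** k
    print("first moment / #A :", moment(k, a0, 1) / size_A)
    print("second moment / (X * #A):", moment(k, a0, 2) / (X * size_A))
```

Comparing the first moment with $\#\mathcal{A}\,X^{0.36}$ for several $k$ gives a feel for the bound (2.1), although small $k$ says little about the asymptotic exponent.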
By combining an $\ell^2$ bound for $S_{\mathbb{P}}(a/X)$ with an $\ell^{1.526}$ bound for $S_{\mathcal{A}}(a/X)$, we are able to show that it is indeed the case that 'generic' $a<X$ make a negligible contribution, and that we may restrict ourselves to $a\in\mathcal{E}$, some set of size $O(X^{0.36})$.
We expect that S P (θ ) is large only when θ is close to a rational with small denominator, and S A (θ ) is large when θ has a decimal expansion containing many 0's or 9's. Thus we expect the product to be large only when both of these conditions hold, which is essentially when θ is well approximated by a rational whose denominator is a small power of 10. By obtaining suitable estimates for A in arithmetic progressions via the large sieve, one can verify that amongst all a in the major arcs M where a/ X is well-approximated by a rational of small denominator we obtain our expected main term, and this comes from when a/ X is well-approximated by a rational with denominator 10.
Thus we are left to show that when $a\in\mathcal{E}$ and $a/X$ is not close to a rational with small denominator, the product $S_{\mathcal{A}}(a/X)S_{\mathbb{P}}(-a/X)$ is small on average. By using an expansion of the indicator function of the primes as a sum of bilinear terms (similar to Vaughan's identity), we are led to bound expressions such as (2.2), which is a weighted and averaged form of the typical expressions one encounters when obtaining an $\ell^\infty$ bound for exponential sums over primes. Here $\|\cdot\|$ is the distance to the nearest integer.
The double sum over $n_1,n_2$ in (2.2) is of size $O(N^2)$ for 'typical' pairs $(a_1,a_2)$, and if it is noticeably larger than this then $a_1$ and $a_2$ must share some Diophantine structure. We find that the pair $(a_1,a_2)$ must lie close to the projection from $\mathbb{Z}^3$ to $\mathbb{Z}^2$ of some low height plane or low height line if this quantity is large, where the arithmetic height of the line or plane is bounded in terms of the size of the double sum. (For example, the diagonal terms $a_1=a_2$ give a large contribution and lie on a low height line, and $a_1,a_2$ which are both small give a large contribution and lie in a low height plane.)
This restricts the number and nature of pairs (a 1 , a 2 ) which can give a large contribution. Since we expect the size of S A (a 1 / X )S A (a 2 / X ) to be determined by digital rather than Diophantine conditions on a 1 , a 2 , we expect to have a smaller total contribution when restricted to these sets. By using the explicit description of such pairs (a 1 , a 2 ) we succeed in obtaining such a superior bound on the sum over these pairs. It is vital here that we are restricted to a 1 , a 2 lying in the small set E (for points on a line) and outside of the set M of major arcs (for points in a lattice).
This ultimately allows us to get suitable bounds for (2.2) provided $N\in[X^{0.36},X^{0.425}]$. If this 'Type II range' were larger, we would be able to express the indicator function of the primes as a combination of such bilinear expressions and easily controlled terms. We would then obtain an asymptotic estimate for $\#\{p\in\mathcal{A}\}$. Unfortunately our range is not large enough to do this. Instead we work with a minorant for the indicator function of the primes throughout our argument, which is chosen such that it is essentially a combination of bilinear expressions which do fall into this range. It is this feature which means we obtain a lower bound rather than an asymptotic estimate for the number of primes in $\mathcal{A}$.
Such a minorant is constructed via Harman's sieve, and, since it is essentially a combination of Type II terms and easily handled terms, we can obtain an asymptotic formula for elements of $\mathcal{A}$ weighted by it. This gives a lower bound $\#\{p\in\mathcal{A}\}\ge(c+o(1))\#\mathcal{A}/\log X$ for some constant $c$. We use numerical integration to verify that we (just) have $c>0$, and so we obtain our asymptotic lower bound for $\#\{p\in\mathcal{A}\}$. The upper bound is a simple sieve estimate.
Remark For the method used to prove Theorem 1.1, strong assumptions such as the Generalized Riemann Hypothesis appear to be only of limited benefit. In particular, even under GRH one only gets pointwise bounds of the strength $S_{\mathbb{P}}(\theta)\ll X^{3/4+o(1)}$ for 'generic' $\theta$, which is not strong enough to give a nontrivial minor arc bound on its own. The assumption of GRH and the above pointwise bound is sufficient to deal with the entire minor arc contribution in the regime where we obtain asymptotic formulae (i.e. when the base is sufficiently large).

Notation
We use the asymptotic notation $\ll$, $\gg$, $O(\cdot)$, $o(\cdot)$ throughout, denoting a dependence of the implied constant on a parameter $t$ by a subscript. As mentioned earlier, we use $f\asymp g$ to denote that both $f\ll g$ and $g\ll f$ hold. Throughout the paper $\epsilon$ will denote a single fixed positive constant which is sufficiently small; $\epsilon=10^{-100}$ would probably suffice. In particular, any implied constants may depend on $\epsilon$. We will assume that $X$ is always a suitably large integral power of 10 throughout. We will exclusively use the letter $p$ to denote a prime number, without always making this restriction explicit.
We will use the nonstandard notation that n ∼ X to mean that n lies in the interval (X/10, X ] throughout the paper.
Several variables will be assumed to be non-negative integers, without directly specifying this. Thus sums such as $\sum_{n<X}$ will be assumed to be over integers $n$ with $0\le n<X$, for example. The usage should be clear from the context.

It will be convenient to normalize the Fourier transform of $\mathcal{A}$, and to be able to view it at different scales. With this in mind, we define
$$F_Y(\theta)=Y^{-\log 9/\log 10}\Bigl|\sum_{n<Y}\mathbf{1}_{\mathcal{A}_1}(n)e(n\theta)\Bigr|.\qquad(3.1)$$
Whenever we encounter the function $F_Y$ we assume that $Y$ is a positive integral power of 10 (or a power of $q$ in Sect. 16). We use $\|\cdot\|$ to denote the distance to the nearest integer, and $\|\cdot\|_2$ to denote the standard Euclidean norm. We use $\mathbf{1}_{\mathcal{A}_1}$ for the indicator function of the set $\mathcal{A}_1$ of integers with restricted digits. Here $e(x)=e^{2\pi ix}$ is the complex exponential function.

We need to make use of various numerical estimates throughout the paper, some of which succeed only by a small margin. We have endeavored to avoid too many explicit calculations and we encourage the reader not to pay too much attention to the numerical constants appearing on a first reading.

Structure of the paper
In Sect. 6, we use a sieve decomposition to reduce the proof of Theorem 1.1 to the proof of Propositions 6.1 and 6.2, which are asymptotic estimates for particular types of terms arising from sieve decompositions. These propositions are established in Sect. 7.
In Sect. 7, we use sieve theory to reduce the proof of Propositions 6.1 and 6.2 to the proof of Propositions 7.1 and 7.2, which are our 'Type I' and 'Type II' estimates. These will be established in Sects. 8 and 9 respectively.
In Sect. 8 we use a large sieve argument to reduce the proof of our Type I estimate Proposition 7.1 to that of Lemmas 8.1 and 8.2, which are Fourier $\ell^\infty$ and $\ell^1$ bounds. These will be established in Sect. 10.
In Sect. 9 we use the circle method and geometric decompositions to reduce the proof of our Type II estimate Proposition 7.2 to that of Propositions 9.1, 9.2 and 9.3, which are our estimates for the 'major arcs', the 'generic minor arcs' and the 'exceptional minor arcs'. These will be established in Sects. 11, 12 and 13 respectively.
In Sect. 10 we establish various Fourier estimates. In particular we establish Lemmas 8.1 and 8.2, as well as several auxiliary lemmas which will be used in later sections.
In Sect. 11 we use results on primes in arithmetic progressions to establish our major arc estimate Proposition 9.1, making use of the estimates of Sect. 10.
In Sect. 12 we use Fourier moment bounds from Sect. 10 to establish our generic minor arc estimate Proposition 9.2.
In Sect. 13 we use the geometry of numbers to reduce the proof of the exceptional minor arc estimate Proposition 9.3 to the proof of Propositions 13.3 and 13.4, which are estimates for frequencies constrained to lie in low height lattices or low height lines. These will be established in Sects. 14 and 15.
In Sect. 14 we establish our estimate for low height lattices Proposition 13.3, using the estimates of Sect. 10.
In Sect. 15 we establish our estimate for low height lines Proposition 13.4, using the geometric counting estimates and the results of Sect. 10. This completes the proof of Theorem 1.1.
In Sect. 16, we sketch the modifications in the argument required to establish Theorem 1.2.
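In particular, the dependency graph between the main statements in the proof of Theorem 1.1 is as follows:

Theorem 1.1 ⇐ Propositions 6.1 and 6.2 (Sect. 6)
Propositions 6.1 and 6.2 ⇐ Propositions 7.1 and 7.2 (Sect. 7)
Proposition 7.1 ⇐ Lemmas 8.1 and 8.2 (Sect. 8; lemmas proved in Sect. 10)
Proposition 7.2 ⇐ Propositions 9.1, 9.2 and 9.3 (Sect. 9; the first two proved in Sects. 11 and 12)
Proposition 9.3 ⇐ Propositions 13.3 and 13.4 (Sect. 13; proved in Sects. 14 and 15)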

Basic estimates
We will make frequent use of some well-known facts in analytic number theory without extra comment. In particular, we make use of the Prime Number Theorem in short intervals and arithmetic progressions with error term (see [10, Chapter 22], for example). For any $A>0$ this gives the expected asymptotic count of primes in such intervals and progressions, provided the relative length of the interval is at least $(\log Y)^{-A}$, and $q\le(\log Y)^A$ and $\gcd(a,q)=1$.

We recall the following sieve estimate (5.2) (see, for example, [18, Theorem 7.11]), which counts integers with no small prime factors in terms of $\omega(u)$, the Buchstab function defined by the delay-differential equation
$$\omega(u)=\frac{1}{u}\quad(1\le u\le 2),\qquad\frac{d}{du}\bigl(u\,\omega(u)\bigr)=\omega(u-1)\quad(u>2).$$

We recall some results from the geometry of numbers and Minkowski's theory of successive minima (see, for example, [9, p. 110]). A lattice in $\mathbb{R}^k$ is a discrete subgroup of the additive group $\mathbb{R}^k$. For any lattice $\Lambda$ there is a Minkowski-reduced basis $\{v_1,\ldots,v_r\}$ of linearly independent vectors in $\mathbb{R}^k$ such that $\Lambda=v_1\mathbb{Z}+\cdots+v_r\mathbb{Z}$, and for any $x_1,\ldots,x_r\in\mathbb{R}$ we have
$$\|x_1v_1+\cdots+x_rv_r\|_2\asymp\sum_{i=1}^r|x_i|\,\|v_i\|_2,$$
and with $\|v_1\|_2\cdots\|v_r\|_2\asymp\det(\Lambda)$, where these implied constants depend only on the ambient dimension $k$. Here $\det(\Lambda)$ is the $r$-dimensional volume of the fundamental parallelepiped, given by $\det(\Lambda)^2=\det(V^TV)$, where $V$ is the $k\times r$ matrix with columns $v_1,\ldots,v_r$. We say $r$ is the rank of the lattice. The properties of the Minkowski-reduced basis above indicate that each generating vector $v_i$ has a positive proportion of its length in a direction orthogonal to all the other basis vectors.
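The Buchstab function can be tabulated numerically, which is the kind of computation underlying the numerical integrations later in the argument. The following is a minimal sketch assuming the standard normalisation $\omega(u)=1/u$ for $1\le u\le 2$ and $(u\,\omega(u))'=\omega(u-1)$ for $u>2$; the function name and the trapezoidal discretisation are choices made here, not taken from the paper.

```python
from math import log


def buchstab(u, steps_per_unit=1000):
    """Approximate the Buchstab function omega at the grid point nearest to u.

    Uses omega(u) = 1/u on [1, 2] and integrates
    d/du (u * omega(u)) = omega(u - 1) for u > 2 with the trapezoidal rule
    on the grid u_j = 1 + j / steps_per_unit.
    """
    if u < 1:
        raise ValueError("this sketch only handles u >= 1")
    M = steps_per_unit
    last = round((u - 1) * M)
    omega = [0.0] * (last + 1)
    f = 1.0  # f(u) = u * omega(u), which equals 1 at u = 2
    for j in range(last + 1):
        grid_u = 1 + j / M
        if j <= M:
            omega[j] = 1 / grid_u
        else:
            # Integral of omega(t - 1) over one grid step ending at grid_u.
            f += (omega[j - 1 - M] + omega[j - M]) / (2 * M)
            omega[j] = f / grid_u
    return omega[last]


if __name__ == "__main__":
    for u in (1.5, 2, 3, 4, 6):
        print(u, buchstab(u))
    # On [2, 3] the exact value is (1 + log(u - 1)) / u.
    print("exact omega(3):", (1 + log(2)) / 3)
```

On $[2,3]$ the equation integrates to $u\,\omega(u)=1+\log(u-1)$, which gives an exact value to compare against; for larger $u$ the function converges rapidly to $e^{-\gamma}$.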

Sieve decomposition and proof of Theorem 1.1
First, we prove Theorem 1.1 assuming two key propositions, given below. This reduces the problem to establishing Propositions 6.1 and 6.2 which we do over the remaining sections.
As remarked in Sect. 2, it suffices to consider X as a power of 10. If X = 10 k we will think of all elements of A as having k digits, none of which is equal to a 0 . This is equivalent to slightly changing the definition of A in the case when a 0 = 0 (since it restricts A to (X/10, X ]), but by considering X , X/10, X/100 . . . we see that we can easily recover Theorem 1.1 for the original set A from this situation.
We will make a decomposition of $\#\{p\in\mathcal{A}\}$ into various terms following Harman's sieve (see [15] for more details). Each of these terms can then be asymptotically estimated by Propositions 6.1 or 6.2 (given below), or can be trivially bounded below by 0. To keep track of the terms in this decomposition we apply the same decomposition to the set $\mathcal{B}=\{n\in\mathbb{Z}:0\le n<X\}$ by considering a weighted sequence $w_n$.
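Let $w_n$ be weights supported on non-negative integers $n<X$ given by
$$w_n=\mathbf{1}_{\mathcal{A}}(n)-\kappa_{\mathcal{A}}\frac{\#\mathcal{A}}{X}.\qquad(6.1)$$
Given an integer $d>0$ and a real number $z>0$, let
$$S_d(z)=\sum_{\substack{n<X/d\\P^-(n)>z}}w_{nd},\qquad(6.2)$$
where $P^-(n)$ denotes the least prime factor of $n$. We expect that $S_d(z)$ is typically small for a wide range of $d$ and $z$. The following two propositions show that this is the case for certain $d,z$.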
where * indicates the summation is restricted by the conditions Proposition 6.1 includes the case = 0, where we interpret the statement as where * indicates the same restriction of summation to L ≥ 0 for all L ∈ L as in Proposition 6.1.
We note that by inclusion-exclusion the same result holds if some of the inequalities L ≥ 0 are replaced by the strict inequality L > 0.
Proof of Theorem 1.1 assuming Proposition 6.1 and Proposition 6.2. Let $\theta_1=9/25+2\epsilon$ and $\theta_2=17/40-2\epsilon$ as in Proposition 6.1. We first consider the upper bound for Theorem 1.1, which is essentially a standard sieve upper bound. Since $\theta_2-\theta_1<1/2$, we have Thus, using (6.3) and the fact (5.2) that there are $O(X/\log X)$ integers in $[0,X]$ with no prime factors smaller than $X^{\theta_2-\theta_1}$, we have Thus it suffices to establish the lower bound.
To simplify notation, we let z 1 ≤ z 2 ≤ z 3 ≤ z 4 ≤ z 5 ≤ z 6 be given by We have Thus we wish to bound S 1 (z 4 ) from below. By Buchstab's identity (i.e. inclusion-exclusion on the least prime factor) we have The term S 1 (z 1 ) is o(#A/ log X ) by (6.3) from Proposition 6.1. We split the sum over p into ranges (z i , z i+1 ], and see that all the terms with p ∈ (z 2 , z 3 ] are also negligible by Proposition 6.2. This gives We wish to replace S p ( p) by S p (min( p, (X/ p) 1/2 )). We note that these are the same when p ≤ X 1/3 , but if p > X 1/3 then there are additional terms in S p ((X/ p) 1/2 ) from primes in the interval ((X/ p) 1/2 , p]. For δ = 1/(log X ) 1/2 , by the prime number theorem and Proposition 6.1, we have Here, and throughout this section, q is restricted to being a prime number. Similarly, we get corresponding bounds for S(B p , min( p, (X/ p) 1/2 )), and so we can replace S p ( p) with S p (min( p, (X/ p) 1/2 )) at the cost of a small error. Using this, and applying Buchstab's identity again, we have The first two terms above are asymptotically negligible by Proposition 6.1, and so this simplifies to We perform further decompositions to the remaining terms in (6.5). We first concentrate on the first term on the right hand. Splitting the ranges of pq into intervals, and recalling those with a pq in the interval [z 2 , z 3 ] or [z 5 , z 6 ] make a negligible contribution by Proposition 6.2, we obtain Here we have dropped the condition q ≤ (X/ p) 1/2 in the final sum, since this is implied by q ≤ p and pq ≤ z 2 . On recalling the definition (6.1) of w n , we can lower bound the first term of (6.6) by dropping the non-negative contribution from the set A via w n ≥ −κ A #A/ X . By partial summation, and using the estimate (5.2), this gives Here ω(u) is Buchstab's function, and P − (n) denotes the least prime factor of n.
We perform further decompositions to the second term of (6.6), first splitting according to the size of $q^2p$ compared with $z_6$.
(6.9) For the terms not coming from products of 3 primes, we split our summation according to the size of qr, noting that this is negligible if qr ∈ [z 2 , z 3 ] by Proposition 6.2. For the terms with qr / ∈ [z 2 , z 3 ] we just take the trivial lower bound. Thus where R 1 and R 2 are given by Together (6.9), (6.10) and (6.11) give a suitable lower bound for the terms in (6.8) with q 2 p ≥ z 6 . When q 2 p < z 6 we can apply two further Buchstab iterations, since then we can evaluate terms S pqr (z 1 ) with r ≤ q ≤ p using Proposition 6.1 as pqr ≤ pq 2 < z 6 . As before, we may replace S pq (q) by S pq (min(q, (X/ pq) 1/2 )) and S pqr (r ) with S pqr (min(r, (X/ pqr) 1/2 )) at the cost of negligible error terms (since pqr < z 6 ). This gives where r, s are restricted to primes in the sums above. Finally we see that any part of the final sum with a product of two of p, q, r, s in [z 2 , z 3 ] can be discarded by Proposition 6.2. Trivially lower bounding the remaining terms as we did before yields z 1 <s≤r ≤q≤ p≤z 2 q 2 p<z 6 z 3 ≤ pq<z 5 r 2 pq,s 2 pqr≤X S pqrs (s) where R 3 is given by This completes our decomposition of the terms from (6.8), coming from the second term of (6.6). We note that we could have imposed various further restrictions such as u + v + w / ∈ [θ 1 , θ 2 ] in R 3 , but for ease of calculation we do not include these.
We perform decompositions to the third term of (6.6) in a similar way to how we dealt with the second term. We have q 2 p < (qp) 3/2 < z 3/2 2 < z 6 so, as above, we can apply two Buchstab iterations and use Proposition 6.1 to evaluate the terms S pqr (z 1 ) since we have pqr ≤ pq 2 < z 6 . Furthermore, we notice that terms with any of pqr, pqs, pr s, or qrs in [z 2 , z 3 ] ∪ [z 5 , z 6 ] are negligible by Proposition 6.2. This gives where We note that for R 4 we have dropped different constraints to those we dropped in R 3 .
Thus in this case we have $I_1+\cdots+I_9<0.996$, and so by continuity we have $I_1+\cdots+I_9<0.996+O(\epsilon)$ when $\theta_1=9/25+2\epsilon$ and $\theta_2=17/40-2\epsilon$. Thus, taking $\epsilon$ suitably small, we see that (6.17) holds, and so we have completed the proof of Theorem 1.1 for $X$ sufficiently large. If $X\ge 4$ is bounded by a constant, then Theorem 1.1 follows (after potentially adjusting the implied constants) on noting that either 2 or 3 is a prime in $\mathcal{A}$.
We note that there are various ways in which one can improve the numerical estimates, but we have restricted ourselves to the above decomposition in the interests of clarity. Judiciously employing further Buchstab decompositions would give small numerical improvements, for example.
Thus it suffices to establish Propositions 6.1 and 6.2.
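Buchstab's identity, used repeatedly in the decomposition above, is inclusion-exclusion on the least prime factor, and it can be checked directly on a small set. The sketch below uses the convention that $S(\mathcal{C}_d,z)$ counts $m$ with $dm\in\mathcal{C}$ and no prime factor below $z$; this boundary convention and all function names are assumptions made for the illustration and may differ from the paper's at the endpoint. With this convention, for $w<z$,
$S(\mathcal{C},z)=S(\mathcal{C},w)-\sum_{w\le p<z}S(\mathcal{C}_p,p)$.

```python
def least_prime_factor_table(N):
    """lpf[n] is the least prime factor of n for 2 <= n < N (lpf[0], lpf[1] unused)."""
    lpf = list(range(N))
    for p in range(2, int(N ** 0.5) + 1):
        if lpf[p] == p:  # p is prime
            for m in range(p * p, N, p):
                if lpf[m] == m:
                    lpf[m] = p
    return lpf


def sifted_count(C, d, z, lpf):
    """Number of m >= 1 with d*m in C and every prime factor of m at least z."""
    count = 0
    for c in C:
        if c % d == 0:
            m = c // d
            if m == 1 or lpf[m] >= z:
                count += 1
    return count


def buchstab_rhs(C, w, z, lpf):
    """S(C, w) minus the sum over primes w <= p < z of S(C_p, p)."""
    primes = [p for p in range(2, len(lpf)) if lpf[p] == p and w <= p < z]
    return sifted_count(C, 1, w, lpf) - sum(sifted_count(C, p, p, lpf) for p in primes)


def restricted_digit_set(N, a0):
    """Positive integers below N whose decimal expansion avoids the digit a0."""
    return [n for n in range(1, N) if str(a0) not in str(n)]


if __name__ == "__main__":
    N = 2000
    lpf = least_prime_factor_table(N)
    C = restricted_digit_set(N, 7)
    for w, z in [(2, 5), (3, 20), (5, 45)]:
        print(w, z, sifted_count(C, 1, z, lpf), buchstab_rhs(C, w, z, lpf))
```

The two printed counts agree for every pair $w<z$, since each element removed when passing from $w$ to $z$ is counted exactly once according to its least prime factor.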

Sieve asymptotics
In this section we prove Propositions 6.1 and 6.2 assuming Propositions 7.1 and 7.2, given below. This reduces the problem to proving standard 'Type I' and 'Type II' estimates. These propositions will then be proven in Sects. 8 and 9. Before we state the propositions, we set up some extra notation, including a region $\mathcal{Q}_\ell(\eta)\subseteq\mathbb{R}^\ell$. By a closed convex polytope in $\mathbb{R}^\ell$ we mean a region $\mathcal{R}$ defined by a finite number of non-strict affine linear inequalities in the coordinates (equivalently, this is the convex hull of a finite set of points in $\mathbb{R}^\ell$). Given a closed convex polytope $\mathcal{R}\subseteq\mathcal{Q}_\ell(\eta)$, we let
$$\mathbf{1}_{\mathcal{R}}(a)=\begin{cases}1,&a=p_1\cdots p_\ell\text{ with }\Bigl(\dfrac{\log p_1}{\log a},\ldots,\dfrac{\log p_\ell}{\log a}\Bigr)\in\mathcal{R},\\0,&\text{otherwise.}\end{cases}$$

(Footnote 1) A Mathematica® file detailing this computation is included with this article on arxiv.org.
We caution that 1 R counts numbers with a particular type of prime factorization, and should not be confused with 1 A , the indicator function of the set A. We recall B = {n ∈ Z : 0 ≤ n < X }.
Our two key propositions that we will use are given below.
Proposition 6.2 follows quickly from Proposition 7.2, but it will be convenient to establish a slightly more general version where the primes can be as small as X η .
where * indicates the same restriction of summation to L ≥ 0 for all L ∈ L as in Proposition 6.2.
As before, we note that by inclusion-exclusion the same result holds if some of the constraints L ≥ 0 are replaced with L > 0. We see Proposition 6.2 follows immediately from Lemma 7.3 on choosing η = θ 2 − θ 1 .
Proof of Lemma 7.3 assuming Proposition 7. 2 We just deal with the case when i∈I p i ∈ [X θ 1 , X θ 2 ]; the other case is entirely analogous with θ 1 and θ 2 simply replaced with 1 − θ 2 and 1 − Recall the definition (6.2) of S d (z). We see that S p 1 ··· p ( p j ) is a sum of w n only involving integers n with at most 1/η prime factors, since all prime factors are of size at least X η . The terms with exactly r prime factors (for some r ≤ 1/η) are a sum of w p 1 ··· p r over p 1 , . . . , p r with the summation only restricted by a bounded number of linear inequalities on log p 1 / log X, . . . , log p r / log X . (These are the previous restrictions on p 1 , . . . , p , and the restriction p j ≤ p +1 ≤ · · · ≤ p r ). We may write the condition X η ≤ p 1 and the restriction on the size of i∈I p i and i=1 p i as linear conditions only involving log p 1 / log X, . . . , log p / log X with coefficients having constants depending only on η. Thus, after increasing L to include these conditions, it suffices to show that * where * indicates that the summation is restricted by the conditions for all L ∈ L. Let δ = 1/ log log X . We first trivially discard the contribution from n = p 1 · · · p r < X 1−δ . Each n appears O η (1) times in (7.1), so recalling the definition (6.1) of w n and dropping the other constraints, the total contribution from such terms is η n∈A Thus it is sufficient to show * Since we have the constraint p 1 · · · p ≤ X/ p j ≤ X 1−η , the result follows immediately if r = (if η < δ the result is trivial). Thus we may assume that r > , so none of the constraints involve all the p i . We now wish to replace log p i / log X with log p i / log n in the conditions (7.2). For n ∈ [X 1−δ , X ], we have and so if exactly one of L log p 1 log X , . . . , log p log X and L log p 1 log n , . . . 
, log p log n is non-negative, we must have To bound the contribution of such terms, let γ > 0 be a parameter and (Here the summation is over all choices of primes p 1 , . . . , p r , and for any such choice n = p 1 · · · p r . We do not restrict to n ≥ X 1−δ in the summation.) We wish to show that if γ = o L ,η (1) then G(γ , L) = o L ,η (#A/ log X ), and we will do this by first thinking of γ fixed but very small. We split the sum into at most r ! = O η (1) subsums where the variables are ordered (we potentially double-count the contribution from p i = p i for an upper bound). Thus, after relabelling the p i , we see that Then R satisfies the conditions of Proposition 7.2, so By the Prime Number Theorem and partial summation, we have Since all components of elements of R are at least η, the integral is bounded by η −r times the (r − 1)-dimensional volume of R. Since L involves at most If γ → 0 as X → ∞ suitably slowly, we see that this shows that We see from (7.5) that the error introduced to (7.4) by replacing After making this change, we may reintroduce the terms with n < X 1−δ at the cost of a negligible error by using the bound (7.3) again. Thus * where * * indicates the sum is constrained to for all L ∈ L. Moreover, since we had the constraint i∈I p i ∈ [X θ 1 , X θ 2 ] in (7.2), this second sum includes the constraint i∈I p i ∈ [n θ 1 , n θ 2 ]. We now split the summation into O η (1) subsums where the p i are totally ordered. After relabelling the coordinates, Proposition 7.2 applies to each of these sums, since the linear constraints L ≥ 0 for L ∈ L define a closed convex polytope (depending only on L), and the ordering of the variables ensures that this lies within Q r (η) (recall that the constraint X η ≤ p 1 becomes n η ≤ p 1 , so all primes are at least n η ). The constraint i∈I p i ∈ [n θ 1 , n θ 2 ] corresponds to the sum of a subset of the coordinates of all points in the polytope lying in [θ 1 , θ 2 ]. 
Proposition 7.2 shows that the contribution from each such sum is o L,η (#A/ log X ). Since there are O η (1) such sums, the total contribution is o L,η (#A/ log X ), giving the result.
Our aim for the remainder of this section is to establish Proposition 6.1 using Propositions 7.1 and 7.2. We first establish an auxiliary lemma.

Lemma 7.4 (Fundamental Lemma) For δ > 0 we have
The implied constant is independent of δ.
Proof of Lemma 7.4 assuming Proposition 7.1 If δ > 4 then since S(C, X t ) is nonnegative and decreasing in t for any set C, we have By the rough number estimate (5.2) again, we see that the sum of 1/d over d < X with all prime factors bigger that X δ is O δ (1). Thus the result for δ > 4 follows from the result for δ = 4 , so we may assume without loss of generality that δ ≤ 4 . Let Then #A = κ#A, where κ is the constant given in Proposition 7.1. Let R d (e) be defined by We put q = de and see from Proposition 7.1 that for any A > 0 the error terms R d (e) satisfy By the fundamental lemma of sieve methods (see, for example, [14, Theorem 6.9]) we have Summing over d and using the bound (7.6), we obtain The product in the final bound is O(δ −1 (log X ) −1 ), and the inner sum over d is seen to be O(δ −1 ) by an Euler product upper bound. Finally, since we are assuming that δ ≤ 4 , we have that An identical argument works for the set B = {n < X : (n, 10) = 1} instead of A . This gives , and that #B = φ(10)#B/10. Thus, by the triangle inequality We bound the first summation by (7.7), the second summation by (7.8), and note that since #B = φ(10)#B/10, the third summation is zero. Since κ A = 10κ/φ(10), this gives Using Lemma 7.4 we can now prove Proposition 6.1.
We first consider the contribution from p 1 · · · p < X θ 1 . Given a set C and an integer d, we let Buchstab's identity shows that We apply the above decomposition to A d . This gives an expression with Applying the same decomposition to B d , taking the weighted difference, and summing over d = p 1 · · · p we obtain Here indicates we are summing over all choices of p 1 , . . . , p which appear in the summation in Proposition 6.1 with the additional condition that d = p 1 · · · p < X θ 1 .
We note that p 1 , . . . , p ≥ X θ , so d has O(1) prime factors and any integer e can be represented O(1) times as dp 1 · · · p m for some primes p m ≤ · · · ≤ p 1 and some choice of p 1 , . . . , p defining d. Thus, expanding the definition of Here we applied Lemma 7.4 in the last line, using δ ≥ 1/ log log X . We now consider the V m terms. We expand the definition of V m as a sum. We note that p m ≤ X θ = X θ 2 −θ 1 , so the summation is constrained by X θ 1 ≤ dp 1 · · · p m ≤ X θ 2 , which is our Type II constraint. We see that all terms have dp 1 · · · p m ≤ X/ p m , so we can insert this condition without changing the sum. We recall p 1 , . . . , p are constrained only by some linear conditions on log p 1 / log X, . . . , log p / log X . Thus we see that the sum is of the form considered in Lemma 7.3 with η = δ, since all the conditions in the summation can be written as linear constraints on log p i / log X for 1 ≤ i ≤ and log p j / log X for 1 ≤ j ≤ m. Thus, by Lemma 7.3, we have (7.11) Putting together (7.9), (7.10) and (7.11), we obtain Letting δ → 0 sufficiently slowly then gives the result for d < X θ 1 .
The contribution from d with X θ 2 < d < X 1−θ 2 can be handled by an identical argument, where instead of restricting to dp 1 · · · p m ≤ X θ 1 and X θ 1 < dp 1 · · · p m ≤ X θ 1 p m in T m , U m and V m , we instead restrict to dp 1 · · · p m ≤ X 1−θ 2 and X 1−θ 2 < dp 1 · · · p m ≤ X 1−θ 2 p m respectively. The terms corresponding to V m involve a ∈ A dp 1 ··· p m with X 1−θ 2 < dp 1 · · · p m ≤ X 1−θ 1 ≤ X/ p m , so can be handled by the second part of Lemma 7.3 instead of the first part. Since 50/77 > 1−17/40+2 = 1−θ 2 , the terms corresponding to T m can still be handled by Lemma 7.4. Finally, the contribution from d with 1 can be bounded almost immediately by Lemma 7.3. One Buchstab iteration gives We put d = p 1 · · · p and sum over p 1 , . . . , p satisfying the constraints imposed by L and such that d ∈ [X 1−θ 2 , X 1−θ 1 ]. The first term makes a negligible total contribution by Lemma 7.4 since d ≤ X 1−θ 1 < X 50/77− . The second term makes negligible total contribution by Lemma 7.3 (noting that dp ≤ Together these cover the whole range p 1 · · · p ≤ X 1−θ 1 , giving the result. of the set A. We recall our definition of the function F Y from (3.1), which is a normalized version of S A . In particular, |S A (θ )| = #A · F X (θ ). The two key lemmas which we use in this section are the following.
be of the form q = q 1 q 2 with (q 1 , 10) = 1 and q 1 > 1, and let |η| < Y −2/3 /2. Then for any integer a coprime with q we have Proof of Proposition 7.1 assuming Lemma 8.1 and Lemma 8.2 By Möbius inversion and using additive characters, we have for (q, 10) = 1 We note that #{a ∈ A : (a, 10) = 1} = κ#A. Summing over q < Q with (q, 10) = 1 and letting q = q q , we obtain Here we recall our notation that q ∼ Q 1 means q ∈ (Q 1 /10, Q 1 ]. By Lemma 8.1 we have for any d|10 Thus we see that the bound (8.1) is O A (#A/(log X ) A ) in either case, as required.
We are left to establish Proposition 7.2 and Lemmas 8.1 and 8.2.

Type II estimate
In this section we reduce our 'Type II' estimate to various major arc and minor arc estimates. In particular, we will reduce the proof of Proposition 7.2 to the proof of Propositions 9.1, 9.2 and 9.3. We first recall the statement of Proposition 7.2, which allows us to count integers in A with a specific type of prime factorization provided such numbers always have a 'conveniently sized' factor.
To avoid technical issues due to the fact that n<Y 1 A (n) can fluctuate with Y , we will replace our counts We note that in R the conditions are on log p i / log X , whereas in 1 R the conditions are on log p i / log n. If every e ∈ R has e 1 ≤ · · · ≤ e then at most one term occurs in the summation, so R simplifies to We prove Proposition 7.2 by an application of the Hardy-Littlewood circle method, whereby we study the functions Proposition 7.2 then relies on the following three components.
Proposition 9.1 (Major arcs) Fix η > 0 and let ∈ Z satisfy 1 ≤ ≤ 2/η. Let δ = (log log X ) −1 , and let R X = R X (a 1 , . . . , a −1 ) be given by Here κ A is the constant given in Proposition 7.2. The implied constant depends on C and η, but not on R X or a 1 . . . , a −1 . Then there is some exceptional set E ⊆ [0, X ] with The implied constant depends on η, but not on R.
The implied constant depends on η and A, but not on R X or a 1 , . . . , a −1 .
We expect the contribution from the major arcs M to give the main contribution. Proposition 9.1 shows that we can get an asymptotic formula from frequencies in M. Proposition 9.2 shows that most frequencies contribute negligibly, and that any significant contribution must come from some small exceptional set E. (In view of Proposition 9.1, we must have E contains elements of M and so E is non-empty). We would expect that we can take E = M, but cannot quite show this. However, Proposition 9.3 shows that E\M contributes negligibly to our sum, which is sufficient for our purposes. We see that 1 S and1 S differ in that the denominators of the fractions are log n and log X respectively. We cover [η, 1] −1 by O(δ −( −1) ) disjoint hypercubes C(a, δ) of side length δ (for example, we can take all a ∈ {0, δ, 2δ, . . . , δ −1 δ} −1 ). Let R ⊆ [η, 1] −1 denote the projection of R onto the first − 1 coordinates (which is also a closed convex polytope). We see that if n ∈ [X 1−δ 2 , X ] then log n and log X differ by a factor of at most 1 − δ 2 . In particular, if log p j / log X ∈ [a j , a j + δ] then certainly log p j / log n ∈ [a j , a j + 2δ]. This means that if C(a; 2δ) ⊆ R and log p j / log X ∈ [a j , a j + δ] for all j ≤ − 1, then Since 1 R (n) is supported on n with prime factors all at least n η , if n = p 1 · · · p ≥ X 1−δ 2 and 1 R (n) = 1 then there is an a with a i ≥ η/2 such that1 C(a;δ) ( p 1 · · · p −1 ) = 1. Moreover, since n ≥ X 1−δ 2 we have p ≥ Since the cubes are disjoint, this happens for exactly one choice of a. Therefore we have for any n ∈ [X 1−δ 2 , X ] Using this with (9.2) to split the summation over hypercubes C, we find Re-inserting terms with m ≤ X 1−δ 2 and n ≤ X 1−δ 2 , we obtain 3) The final two terms above satisfy We now consider the contribution to (9.3) from C(a; 2δ) ∩ ∂R = ∅. 
Since R ⊆ [η, 1] , we must have a i ≥ η/2 and since the coordinates of points in R sum to 1 we also have and C + (a;δ) (n) have the same support, which is restricted to integers with no factor less than X η/4 , we have1 C + (a;δ) (n) η (log X ) − C + (a;δ) (n). Thus we have Here we used the triangle inequality in the final line. By the prime number theorem, for any choice of a ∈ [0, 2] −1 we have Since R is a closed convex polytope, so is R ⊆ R −1 . Therefore there are O R (δ −( −2) ) hypercubes C(a; 2δ) which intersect ∂R. Thus the contribution to (9.3) from the final term of (9.5) is We now consider the terms with C(a; 2δ) ⊆ R. Since R ⊆ Q (η), if e ∈ R then e 1 ≤ · · · ≤ e , so if e ∈ R then e 1 ≤ · · · ≤ e −1 . Therefore, since Since i=1 e i = 1 and e −1 ≤ e for e ∈ R, if e ∈ R then e −1 ≤ 1 − −1 i=1 e i . Therefore, since (a 1 + 2δ, . . . , a −1 + 2δ) ∈ C(a; 2δ) ⊆ R, we have Together (9.7) and (9.8) imply that at most one term occurs in the summation in C + (a;δ) . Thus for such C(a; 2δ), since the coordinates are localized, we havẽ Thus a C(a;2δ)⊆R m∈A1 Since any n = p 1 · · · p contributing to the second term above is counted at most once and has all prime factors at least X η/4 , we have Here we used Lemma 7.4 and (5.2) in the final line. Combining (9.4), (9.5), (9.6), (9.10) and (9.11), we find (9.3) is bounded by Thus to establish Proposition 7.2 it is sufficient to show that for any A > 0, we have (9.12) uniformly for every hypercube C(a; δ) of side length δ with C(a; 2δ) ∩ R = ∅.
Since i∈I e i ∈ [9/25 + , 17 We split the summation over b into the sets M, [0, X )\(E ∪ M) and E\M, where M is as given by Proposition 9.1, and E is the set whose existence is asserted by Proposition 9.2. We then apply Propositions 9.1, 9.2 and 9.3 respectively to each set in turn.
For C in the definition of M sufficiently large in terms of A and η, this gives This gives (9.12), and hence completes the proof of Proposition 7.2.

Fourier estimates
In this section we collect various distributional bounds on the Fourier transform which will underpin our later analysis. In particular, we establish Lemma 8.1 and Lemma 8.2, as well as several other related estimates. Specifically, Lemma 8.1 is a special case of Lemma 10.5, and Lemma 8.2 is the same as Lemma 10.1.
We recall our normalized version of S A (θ ) from (3.1) We recall that we assume Y is an integral power of ten whenever we encounter F Y to avoid some unimportant technicalities. In particular, for all θ and Y . The key property of F Y which we exploit is that it has an exceptionally nice product form. If Y = 10 k , then letting n = k−1 i=0 n i 10 i have decimal digits n k−1 , . . . , n 0 , we find We note that F Y is periodic modulo 1, and that the above product formula gives the identity 3) (We recall that we assume that U and V are both powers of 10 in such a statement.) Lemma 10.1 ( ∞ bound, Lemma 8.2 restated) Let q < Y 1/3 be of the form q = q 1 q 2 with (q 1 , 10) = 1 and q 1 > 1, and let |η| < Y −2/3 /2. Then for any integer a coprime with q we have for some absolute constant c > 0.
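As a numerical sanity check of the product form, the following Python sketch compares a brute-force evaluation of the normalized exponential sum with the digit-by-digit product. It assumes the normalization F_Y(θ) = |S_A(θ)|/#A with #A = 9^k for Y = 10^k (consistent with the relation |S_A(θ)| = #A · F_X(θ) recalled earlier), takes a_0 ≠ 0, and pads n with leading zeros. The function names are illustrative and do not come from the paper.

```python
import cmath
import math


def e(x):
    """e(x) = exp(2*pi*i*x)."""
    return cmath.exp(2j * math.pi * x)


def F_direct(theta, k, a0):
    """Brute force: |sum of e(n*theta) over n < 10^k with no digit a0| / 9^k.

    Assumption of this sketch: n is read as a k-digit string with leading
    zeros, so for a0 != 0 there are exactly 9^k admissible n.
    """
    total = 0
    for n in range(10 ** k):
        if str(a0) not in str(n).zfill(k):
            total += e(n * theta)
    return abs(total) / 9 ** k


def F_product(theta, k, a0):
    """Product form: one factor (1/9)|sum_{d != a0} e(d * 10^i * theta)| per digit."""
    prod = 1.0
    for i in range(k):
        s = sum(e(d * 10 ** i * theta) for d in range(10) if d != a0)
        prod *= abs(s) / 9
    return prod
```

The two functions should agree up to rounding for any θ, since each digit n_i ranges independently over the nine allowed values.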
Then we have that where we interpret the term in parentheses as 9 if 10 i−1 θ = 0. Writing θ = k i=1 t i 10 −i for t i ∈ {0, . . . , 9}, we see that the (k − j)th term in the product depends only on t k− j , . . . , t k . Moreover, the value of the term is mainly dependent on the first few of these digits by continuity. Thus we may approximate the absolute value of F Y (θ ) by a product where the jth term depends only on t j , . . . , t j+J for some constant J . Explicitly, we have where we put t j = 0 for j > k.
With this formulation we can interpret the above bound in terms of the probability of a walk on {0, . . . , 9, ∞} k . Let t ∈ R be given. Consider an order-J Markov chain X 1 , X 2 , . . . where for a, a 1 , . . . , a n ∈ {0, . . . , 9} we have for n > J cG(a, a 1 , a 2 , . . . , a J ) t for some suitably small constant c (so that the probability that X n ∈ {0, . . . , 9} is less than 1). To make this a genuine Markov chain we choose the probability that X n = ∞ given X n−1 , . . . , X n−J to be such that the probabilities add up to 1, and if X n = ∞ then we have that X n+1 = ∞ with probability 1.
Then we have that The sum (over all paths in {0, . . . , 9} k ) of the probabilities of paths is a linear combination of the entries in the kth power of the transition matrix restricted to {0, . . . , 9}. Thus such a moment estimate is a linear combination of the kth power of the eigenvalues of this matrix. This allows us to estimate any moment of F Y (a/Y ) over a ∈ [0, Y ) uniformly for all k by performing a finite eigenvalue calculation. In particular, this gives us a (arbitrarily good as J increases) numerical approximation to the distribution function of F Y .
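The path-sum idea can be sketched numerically as follows. This is only an illustration under simplifying assumptions: each factor is bounded using a window of two consecutive digits of θ, the supremum is replaced by a maximum over a finite grid (so the output is not a rigorous bound), and the excluded digit `A0 = 7` is a hypothetical choice. The sum over digit paths is then evaluated by repeated multiplication with a 10 × 10 matrix, and compared with the directly computed moment.

```python
import cmath
import math

A0 = 7  # excluded digit; a hypothetical choice for illustration


def g(x):
    """One factor of the product formula: (1/9)|sum_{d != A0} e(d*x)|."""
    return abs(sum(cmath.exp(2j * math.pi * d * x) for d in range(10) if d != A0)) / 9


def G(t1, t2, grid=100):
    """Grid approximation to the sup of g over [0.t1t2, 0.t1t2 + 0.01]."""
    base = (10 * t1 + t2) / 100
    return max(g(base + j / (100 * grid)) for j in range(grid + 1))


def moment_bound(k, t):
    """Sum over digit paths t_1..t_k of prod_i G(t_i, t_{i+1})^t, with t_{k+1} = 0."""
    M = [[G(a, b) ** t for b in range(10)] for a in range(10)]
    v = [1.0 if b == 0 else 0.0 for b in range(10)]  # digit after t_k is 0
    for _ in range(k):
        v = [sum(M[a][b] * v[b] for b in range(10)) for a in range(10)]
    return sum(v)


def moment_direct(k, t):
    """Sum of F_Y(a/Y)^t over 0 <= a < Y = 10^k, via the product formula."""
    Y = 10 ** k
    total = 0.0
    for a in range(Y):
        F = 1.0
        for i in range(k):
            F *= g((a * 10 ** i % Y) / Y)
        total += F ** t
    return total
```

For small k the direct moment should not exceed the path sum, and the growth rate of the path sum in k is governed by the largest eigenvalue of the matrix `M`. Longer digit windows would sharpen the approximation at the cost of a larger matrix.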
If this is the case then we have Thus, fixing i = 1 so that a k+1 = · · · = a J +k = 0, and summing over j, we have that On the other hand, by the eigenvalue expansion of M t , we have This gives the result. In particular, we have for For the second bound, let Since there are O(1) choices of a 1 , a 3 and these can be absorbed into the supremum over β, we see that it suffices to show Since F Y 2 ≥ 0 we can extend the summation to a 2 < Y 2 . Thus without loss of generality we may assume that Here we used the fact that G(t i , . . . , t i+4 ) is bounded away from 0 for all t 1 , . . . , t k ∈ {0, . . . , 9} since it is the maximal absolute value of a trigonometric polynomial over an interval. Since F is periodic modulo 1 we see that   Proof For each a ≤ q, let |η a | maximize F U (a/q + η) over |η| < δ. Since the fractions a/q are all separated from one another by at least 1/q, we have for any t We have that Thus integrating over s ∈ [t − γ, t + γ ] for some γ > 0, we have This implies that |F U (s)|ds.

n)e(nt) .
Writing n = u−1 j=0 n j 10 j−1 and using the triangle inequality, we have We recall the function G from Lemma 10.2. Since G(t 1 , . . . , t 1+J ) is bounded away from 0, we see that for η for U ≤ Y , and choosing U maximally subject to U ≤ q and U ≤ Y gives the first result of the lemma.
The other bounds follow from entirely analogous arguments. In particular we note that for (a, q) = 1, q < Q, the numbers a/q are separated from one another by 1/Q 2 , and those with d|q are separated from each other by d/Q 2 , so we have the equivalent of (10.7) with δq replaced by δ Q 2 or δ Q 2 /d and |η| ≤ 1/2q replaced by |η| ≤ 1/2Q 2 or |η| ≤ d/2Q 2 .

Lemma 10.6 (Hybrid Bounds) Let E ≥ 1. Then we have
In the above lemma, we emphasize that a, q, d are all integers, but the summation over η is over real numbers which are well-spaced from the condition Y (η + a/q) ∈ Z.
Proof We first note that the summand a/q +η runs through fractions b/Y with |b| ≤ E +Y since we have the condition /Y, q)) times, since if a 1 /q + η 1 = a 2 /q + η 2 then a 2 = a 1 +O(q E/Y ) and η 2 is determined by a 1 , a 2 , η 1 . There are O(1+ E/Y ) choices of b giving the same fraction (mod 1), and since F Y is periodic (mod 1) these all give the same value of F Y (b/Y ). Thus we may consider only In this case the result now follows from Lemma 10.3. Thus we may assume q E < Y /10. Using the product formula (10.3), we have for Y ≥ U V powers of 10 We also have the trivial bound F V (U θ) ≤ 1 of (10.1). For U V ≤ Y and |η| < E/Y these give We choose V and then U to be the largest powers of 10 such that V ≤ Y /q E and U ≤ Y /V E. Note that this choice gives U, V ≥ 1 since q E < Y /10 and q, E ≥ 1. Thus Since we chose U and V maximally, we have V ≥ Y /10q E, so q/100 ≤ U ≤ 10q. Since q E < Y /10, we may extend the supremum in 1  Putting this together gives the first result. The second bound follows from an entirely analogous argument. We first split the argument depending on whether Q 2 E/d ≥ Y /10 or not, and use the final bound of Lemma 10.5 instead of the first bound to handle 2 .
The argument giving the first bound of Lemma 10.6 is essentially sharp if the 1 bounds used in the proof are sharp and if q is a divisor of a power of 10 or if Q E ≥ Y . When Q E ≤ Y 1− and q is not a divisor of a power of 10, however, we trivially bounded a factor F V (U (a/q + η)) by 1 in the proof, which we expect not to be tight. Lemma 10.7 below allows us to obtain superior bounds (in certain ranges) provided the denominators do not have large powers of 2 or 5 dividing them.
Then we have In particular, if q = dq with (q , 10) = 1 and d|10 u for some integer u ≥ 0, then we have For example, if (q, 10) = 1 and q E is a sufficiently small power of Y , then we improve the first bound (q E) 27/77 of Lemma 10.6 in the q-aspect to E 27/77 q 1/21 . This improvement is important for our later estimates.
Proof Choose E E and D D with E , D ≥ 1 integral powers of 10 such that E D ≤ Y . Let V be the largest integral power of 10 such that By the periodicity of F modulo one, the fact (q 1 q 2 , d) = 1, and the Chinese remainder theorem, we have (10.11) where the dash on indicates that η is summed over all reals satisfying Thus, since F is periodic modulo 1 and d 3 |D and d 2 d 3 |V D , we have Moreover, by (10.3) and Cauchy-Schwarz, we have Since d 2 d 3 |D V , this gives where These give Since (d 1 d 2 d 3 , D ) = d 3 and (q 1 q 2 , d) = 1, as a , b 1 and b 2 go through all residue classes (mod q 1 q 2 ), (mod d 1 ) and (mod d 2 ) respectively subject to (a , q 1 q 2 ) = (b 1 +d 1 b 2 , d 1 d 2 ) = 1, we see that D β 2 goes through all values of c/q 1 q 2 d 1 d 2 (mod 1) for 0 < c < q 1 q 2 d 1 d 2 with (c, q 1 q 2 d 1 d 2 ) = 1, and each value is attained exactly once. Similarly, since (d 1 d 2 d 3 , D V ) = d 2 d 3 , we see that β 3 goes through every value of c/q 1 q 2 d 1 (mod 1) with 0 < c < q 1 q 2 d 1 and (c, q 1 q 2 d 1 ) = 1 exactly once as a goes through the values (mod q 1 q 2 ) and b 1 goes through the values (mod d 1 ) with (a, q 1 q 2 ) = (b 1 , d 1 ) = 1.
We note that only 3 and 5 depend on q 2 . Thus, summing over q 2 ∼ Q 2 with (q 2 , 10) = 1 we obtain (10.12) where 1 , 2 and 4 are as above and 3 and 5 are given by We are left to bound 3 and 5 , which are very similar. Let We note that (q 1 , d 1 , d 2 ) is the same as 3 except we have increased the range of the supremum, and so we have 3 ≤ (q 1 , d 1 , d 2 ). Moreover, we see that 5 is a special case of with d 2 = 1, so 5 = (q 1 , d 1 , 1). Thus it will suffice to get suitable bounds on . Since 1/V , we see all quantities γ occurring in the supremum are of size at most O(1/R). Given any choice of reals η a,q 2 1/R for a ≤ d 1 d 2 q 1 q 2 and q 2 ∼ Q 2 with (a, d 1 d 2 q 1 q 2 ) = 1, the numbers a/d 1 d 2 q 1 q 2 + η a,q 2 can be arranged into O(d 1 d 2 Q 1 Q 2 2 /R) sets such that all numbers in any set are separated by 1/R. (Recall that r is chosen such that R ≤ d 1 d 2 Q 1 Q 2 2 .) Thus, as in the proof of Lemma 10.5 (specifically the argument leading up to (10.8)), we find that By Parseval we have Using Cauchy-Schwarz and the above bounds, we obtain Putting this together gives We recall that R = 10 r ∼ min(V, d 1 d 2 Q 1 Q 2 2 ) and V (Y /D E) 1/2 , and note that 20/21 < log 9/ log 10. This gives This gives a bound for 3 since 3 ≤ , and we obtain an analogous bound for 5 with d 2 replaced by 1. Combining (10.16) with our earlier bounds (10.13), (10.14) and (10.15) and substituting these into (10.12) gives Simplifying the exponents by noting 1 + 10/21 < 3/2 and 27/77 + 10/21 < 5/6 then gives the result. The second statement of the lemma is simply the case when Q 2 = 1 and q = dq 1 .
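The three numerical inequalities used at the end of this proof can be checked exactly. The short Python sketch below uses rational arithmetic, and rewrites 20/21 < log 9/ log 10 as the equivalent integer inequality 10^20 < 9^21; the variable names are illustrative only.

```python
from fractions import Fraction as Fr

# 1 + 10/21 < 3/2, compared as exact rationals.
check_a = 1 + Fr(10, 21) < Fr(3, 2)

# 27/77 + 10/21 < 5/6, compared as exact rationals.
check_b = Fr(27, 77) + Fr(10, 21) < Fr(5, 6)

# 20/21 < log 9 / log 10  <=>  20 log 10 < 21 log 9  <=>  10^20 < 9^21.
check_c = 10 ** 20 < 9 ** 21
```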
We see that Lemma 8.1 follows immediately from Lemma 10.5, and Lemma 8.2 is the same as Lemma 10.1. Thus we are left to establish Propositions 9.1, 9.2 and 9.3, which we do over the next few sections.

Major arcs
In this section we establish Proposition 9.1 using the prime number theorem in arithmetic progressions and short intervals, making use of Lemma 10.1.
Proof of Proposition 9.1 We split M up as three disjoint sets By Lemma 10.1 and recalling X is a power of 10, we have Using the trivial bound S R X (θ ) X (log X ) , where ≤ 2/η and noting #M 1 (log X ) 3C , we obtain This gives the result for M 1 . We now consider M 2 . Recalling the definition of R X , we have that for n < X R X (n) = n= p 1 ··· p p j ∈(X a j ,X a j +δ ] for j< (11.2) where C = (a 1 , a 1 + δ] × · · · × (a −1 , a −1 + δ] is the projection of R X onto the first − 1 coordinates. We note the crude bound We note that if a ∈ M 2 then a/ X = b/q +c/ X for some integers b, q, |c| ≤ (log X ) C (c is an integer since q|X for the set M 2 ). We separate the sum S R X (a/ X ) by putting the prime variable p occurring in (11.2) in short intervals of length x/m and in arithmetic progressions (mod q). We note that C is supported on m ≤ X i a i +( −1)δ < X 1−η/3 , so we can drop the constraints p ≥ X η/4 , X 1− i a i − δ at the cost of some terms with mp < X 1−η/12 + X 1−δ . Thus we have If mp = j X + O( X ) and p ≡ r (mod q) we have By the prime number theorem in short intervals and arithmetic progressions (5.1), for m < X 1−η/3 and (r, q) = 1 we have Finally, since c ∈ Z and c = 0 and −1 ∈ Z, we have  (11.4) Note that in the above argument for us to be able to save an arbitrary power of log it was important that we are counting elements with weight R X (n) rather than 1 R X (n), and that X ν ∈ Z for a ∈ M 2 . Using the trivial bounds S A (θ ) ≤ #A and #M 2 (log X ) 3C along with (11.4), we obtain Finally, we consider M 3 . By the prime number theorem in arithmetic progressions as above, we have for (r, q) = 1 and q ≤ (log X ) C that n<X n≡r (mod q) Thus, for (a, q) = 1 Since μ(q) = 0 for q|10 k = X unless q ∈ {1, 2, 5, 10}, using the trivial bounds #M 3 (log X ) 2C and |S A (a/ X )| ≤ #A, we obtain (11.6) Thus (11.1), (11.5) and (11.6) give the result.
Remark We have only needed to use the prime number theorem in arithmetic progressions when the modulus is a small divisor of X , and so has no large prime factors. This means that our implied constants can be taken to be effectively computable since for such moduli we do not need to appeal to Siegel's theorem.

Generic minor arcs
In this section we establish Proposition 9.2 and obtain some bounds on the exceptional set E by using the distributional estimates of Lemma 10.4.

Lemma 12.1 ( 2 bound for primes) We have that
Proof This follows from the 2 bound coming from Parseval's identity.

Lemma 12.2 (Generic frequency bounds) Let
Proof The first bound on the size of E follows from using Lemma  It remains to bound the sum over a / ∈ E. We divide the sum into O(log X ) 2 subsums where we restrict to those a such that F X (a/ X ) ∼ 1/B and |S R (a/ X )| ∼ X/C for some B ≥ X 23/80 and C ≤ X 2 (terms with C > X 2 make a contribution O(1/ X )). This gives We concentrate on the inner sum. Using Lemmas 10.4 and 12.1 we see that the sum contributes Here we used the bound min(x, y) ≤ x 1/2 y 1/2 in the last line. In particular, we see this is O η (X 1−2 ) if B ≥ X 23/80 on verifying that 23/80 × 73/308 > 59/866. Substituting this into our bound above gives the result.
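The numerical inequality 23/80 × 73/308 > 59/866 quoted in this proof is tight, so we record the exact arithmetic here as a worked check by cross-multiplication.

```latex
\[
\frac{23}{80}\cdot\frac{73}{308}=\frac{1679}{24640},
\qquad
1679\cdot 866 = 1454014 \;>\; 1453760 = 59\cdot 24640,
\]
\[
\text{so}\quad \frac{1679}{24640}\approx 0.068141 \;>\; \frac{59}{866}\approx 0.068129 .
\]
```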

Exceptional minor arcs
In this section we reduce Proposition 9.3 to the task of establishing Propositions 13.3 and 13.4, given below. We do this by making use of the bilinear structure of R X (n) which is supported on integers of the form n 1 n 2 with n 1 of convenient size, and then showing that if these resulting bilinear expressions are large then the Fourier frequencies must lie in a smaller additively structured set. Propositions 13.3 and 13.4 then show that we have superior Fourier distributional estimates inside such sets. Thus we conclude that the bilinear sums are always small. To make the bilinear bound explicit, we establish the following lemma, from which Proposition 9.3 follows quickly.
Then for any 1-bounded complex sequences α n , β m , γ a we have Proof of Proposition 9.3 assuming Lemma 13.1 By symmetry, we may assume that I = {1, . . . , 1 } for some 1 < . By Dirichlet's theorem on Diophantine approximation, any a ∈ [0, X ) has a representation for some integers (b, q) = 1 with q ≤ X 1/2 and some real |ν| ≤ 1/ X 1/2 q. Thus we can divide [0, X ) into O(log X ) 2 sets F(Q, E) as defined by Lemma 13.1 for different parameters Q, E satisfying 1 ≤ Q ≤ X 1/2 and Thus, provided C is sufficiently large compared with A and η, we see it is sufficient to show that From the definition (9.1) of R X and shape of R X given by Proposition 9.3, we have that for n < X where R 1 is the projection of R X onto the first 1 coordinates, and R 2 is the projection onto the subsequent − 1 − 1 coordinates.
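The Dirichlet approximation step used in this proof can be made concrete. The following Python sketch searches for a representation a/X = b/q + ν with (b, q) = 1, q ≤ X^{1/2} and |ν| ≤ 1/(X^{1/2} q) by trying each denominator in turn, using exact rational arithmetic. The function name and the brute-force search are illustrative assumptions; a continued-fraction expansion would be the more efficient route.

```python
from fractions import Fraction
from math import isqrt


def dirichlet_approx(a, X):
    """Return (b, q, nu) with a/X = b/q + nu, gcd(b, q) = 1,
    1 <= q <= sqrt(X) and |nu| <= 1/(sqrt(X) * q)."""
    Q = isqrt(X)
    alpha = Fraction(a, X)
    for q in range(1, Q + 1):
        b = (2 * a * q + X) // (2 * X)  # nearest integer to a*q/X
        nu = alpha - Fraction(b, q)
        # |nu| <= 1/(sqrt(X) q)  <=>  nu^2 * X * q^2 <= 1, tested exactly
        if nu * nu * X * q * q <= 1:
            frac = Fraction(b, q)  # Fraction reduces b/q to lowest terms
            return frac.numerator, frac.denominator, nu
    raise AssertionError("no approximation found; excluded by Dirichlet's theorem")
```

Reducing b/q to lowest terms can only decrease q, so the bound on |ν| remains valid for the returned denominator.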
Since n 1 , n 2 , p and X are integers, | log ((X − 1/2)/n 1 n 2 p)| 1/ X . Thus, by Perron's formula (see, for example, [10, Chapter 17]), we have for n 1 , n 2 , p < X We will use this to remove the constraint n = n 1 n 2 p < X in S R X (−a/ X ). We first put n 1 , n 2 , p into one of O(log X ) 3 intervals of the form (Y /10, Y ], and then apply the above estimate. The O(X −2 ) error term trivially makes a negligible contribution to (13.1). Thus, we see that for C sufficiently large, it suffices to show uniformly over all s with (s) = 1/ log X and all choices of where c p = log p if p ≥ X η/4 , X 1− i a i − δ and 0 otherwise. (The integral over s and the choices of N 1 , N 2 , P contribute a factor of O(log X ) 4 , which is acceptable for establishing (13.1) if C is sufficiently large.) Finally, let γ a be the 1-bounded sequence satisfying S A (a/ X ) = #Aγ a F X (a/ X ). After substituting this expression for S A , we see that (13.2) follows immediately from Lemma 13.1 for C sufficiently large in terms of η, thus giving the result.
Thus it remains to establish Lemma 13.1. The key estimate constraining Fourier frequencies to additively structured sets is the following lemma. Lemma 13.2 (Geometry of numbers) Let K 0 be a sufficiently large constant, let t ∈ R 3 with t 2 = 1 and let N > 1 > δ > 0. Let Then there exists a lattice ⊂ Z 3 of rank at most 2 such that If a cuboid R ⊆ R 3 of volume V lies in the region |z| ≤ , then it can easily contain rather more than V lattice points from the plane z = 0. Lemma 13.2 says that such a situation is essentially the only way a cuboid can contain many lattice points; if any cuboid has substantially more than V lattice points in R ∩ Z 3 , then these lattice points must come from some lower dimensional linear subspace. The region R which we are interested in is a slightly thickened disc through the origin in the plane orthogonal to t.
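The phenomenon described above can be observed by brute force in a small example. The Python sketch below counts integer points n with ‖n‖_2 ≤ N in a thin slab |n · a| ≤ width and computes the rank of their span. The specific region, the vector a = (500, 250, 1000) and the helper names are assumptions made for illustration; they are modelled on the sets appearing later in this section rather than taken from the statement of Lemma 13.2.

```python
from itertools import product


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def slab_points(a, N, width):
    """All integer n with ||n||_2 <= N and |n . a| <= width, by brute force."""
    return [n for n in product(range(-N, N + 1), repeat=3)
            if dot(n, n) <= N * N and abs(dot(n, a)) <= width]


def rank(pts):
    """Dimension (0..3) of the linear span of the given integer points."""
    v1 = next((p for p in pts if any(p)), None)
    if v1 is None:
        return 0
    normal = next((cross(v1, p) for p in pts if any(cross(v1, p))), None)
    if normal is None:
        return 1
    return 3 if any(dot(normal, p) for p in pts) else 2
```

With a = (500, 250, 1000), N = 10 and width = 10, the slab has volume of order 5, yet it contains many more integer points than that, and all of them lie in the plane 2x + y + 4z = 0, so their span has rank 2.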
Proof of Lemma 13.2 Let φ : R 3 → R 3 be the linear map which is a dilation by a factor N /δ in the t-direction Since the determinant of a lattice is the volume of the fundamental parallelepiped, we see that det( 1 ) = N /δ.
Let {v 1 , v 2 , v 3 } be a Minkowski-reduced basis of 1 . We recall that this means that any v ∈ 1 can be written uniquely as n 1 v 1 + n 2 v 2 + n 3 v 3 for some n 1 , n 2 , n 3 ∈ Z, and for any n 1 , n 2 , n 3 ∈ Z we have and that v 1 2 v 2 2 v 3 2 det( 1 ) = N /δ. Without loss of generality let v 1 2 ≤ v 2 2 ≤ v 3 2 . We now notice that any element of R ∩ Z 3 is mapped injectively by φ to an element of {x ∈ 1 : x 2 ≤ 2N }. Thus for a sufficiently large constant C, we have If v 3 2 > C N , then there are no n ∈ Z 3 counted above with n 3 = 0. If instead v 3 2 3 2 , the number of n is Thus in either case there are O(δ N 2 ) points with n 3 = 0. However, by assumption of the lemma we have that K is sufficiently large and This means that most of the contribution must come from terms with n 3 = 0. Indeed, we have #{(n 1 , n 2 ) ∈ Z 2 : We may choose K 0 such that if K ≥ K 0 then the right hand side is at least We establish Lemma 13.1 assuming two key propositions, Proposition 13.3 and Proposition 13.4, given below. These propositions will be proven over the next two sections.

Proposition 13.3 (Bound for angles generating lattices)
such that there is a lattice ⊆ Z 3 of rank 2 such that #{n ∈ : |n 1 a 1 + n 2 a 2 + n 3 X | ≤ δ X, and not all of these points lie on a line through the origin. Let F = F(Q, E) be given by Then we have #{n ∈ L ∩ Z 3 : |n 1 a 1 + n 2 a 2 + n 3 X | ≤ δ X, n 2 ≤ N } ≥ δ N 2 K .
Then we have Proof of Lemma 13.1 assuming Propositions 13.3 and 13.4 We split E into O(log X ) subsets of the form for some Thus it suffices to show We consider 1 ≤ K ≤ X/N taking values which are integral powers of 10, and split the contribution of our sum according to these sets. We see it is therefore sufficient to show that for each K By considering δ = 2 − j and using the pigeonhole principle, we see that if then there is some δ ≥ N / X and some K / log X K ≤ K such that (a 1 , a 2 ) ∈ G(K , δ).
Thus it suffices to show for all K , δ that which gives (13.3) in the case when N K X 17/40+ . Thus we may assume that N K X 17/40+ . By assumption, we also have that N ≤ X 17/40 , so we only consider K X . In particular, we may use Lemma 13.2 to conclude that either there is a rank 2 lattice ⊆ Z 3 such that #{n ∈ : n 2 ≤ 10N , |n 1 a 1 + n 2 a 2 + n 3 X | ≤ δ X } ≥ δ K N 2 /2, and not all of these points lie on a line through the origin, or there is a line L ⊆ Z 3 such that #{n ∈ L : n 2 ≤ 10N , |n 1 a 1 + n 2 a 2 + n 3 X | ≤ δ X } ≥ δ K N 2 /2.
In either case (13.3) follows from Proposition 13.3 or Proposition 13.4 (taking 'N ' and 'K ' in the propositions to be 10N and K /1000 ≥ 1 in our notation here).
Thus it remains to establish Propositions 13.3 and 13.4.

Lattice estimates
In this section we establish Proposition 13.3, which controls the contribution from pairs of angles which cause a large contribution to the bilinear sums considered in Sect. 13 to come from a lattice. A low height lattice makes a significant contribution only if (a 1 , a 2 , X ) is approximately orthogonal to the plane of the lattice, and so only if (a 1 , a 2 , X ) lies close to the line through the origin orthogonal to this lattice. We note that we only make small use of the fact that these angles lie in a small set, but it is vital that the angles lie outside the major arcs.

Lemma 14.1 (Lattice generating angles have simultaneous approximation)
Let δ > 0 and X, N , K ≥ 1 be such that δ ≥ N / X . Let B 1 = B 1 (N , K , δ) ⊆ [0, X ) 2 be the set of pairs (a 1 , a 2 ) ∈ Z 2 such that there is a lattice ⊆ Z 3 of rank 2 such that #{n ∈ : |n 1 a 1 + n 2 a 2 + n 3 X | ≤ δ X, n 2 ≤ N } ≥ δ K N 2 , and moreover the points counted above do not all lie on a line through the origin.
Then all pairs (a 1 , a 2 ) ∈ B 1 have the simultaneous rational approximations for some integer q X/N K .
We see Lemma 14.1 restricts the pair (a 1 , a 2 ) to lie in a set of size O(X/N K ) 3 , which is noticeably smaller than X 2 for the range of N K under consideration. This allows us to obtain superior bounds for the sum over a 1 , a 2 , by exploiting the estimates of Lemma 10.6 which show F is not abnormally large on such a set.
Proof Clearly we may assume that N K is sufficiently large, since otherwise the result is trivial. By assumption of the lemma, for any pair (a 1 , a 2 ) ∈ B 1 there is a rank 2 lattice = a 1 ,a 2 such that #( ∩ H) ≥ δ K N 2 where Moreover, not all the points in ∩ H lie in a line through the origin. Let a = (a 1 , a 2 , X ), and let φ : R 3 → R 3 be a dilation by a factor N /δ in the a-direction, and let = φ( ). Then we see that Moreover, not all the points on the right-hand side lie in a line through the origin, since φ −1 preserves lines through the origin. Let have a Minkowski-reduced basis {v 1 , v 2 }, and let V 1 = v 1 2 and V 2 = v 2 2 . Since m 1 v 1 + m 2 v 2 2 |m 1 |V 1 + |m 2 |V 2 , for a suitably large constant C we have Since not all of the points in the final set lie in a line through the origin, we see that In particular, , so w 1 and w 2 are linearly independent vectors in ⊆ Z 3 . Since φ can only increase the length of vectors, w 1 2 ≤ V 1 and w 2 2 ≤ V 2 . Let 1 = |w 1 · a| and 2 = |w 2 · a|. Trivially we have |v 1 · a| V 1 X and |v 2 · a| V 2 X , and so recalling that φ is a dilation by a factor N /δ in the a-direction, we see that 1 δ X V 1 /N and Putting this together, we see that for any pair (a 1 , a 2 ) ∈ B 1 there are linearly independent vectors w 1 , w 2 ∈ Z 3 and quantities V 1 , V 2 such that This puts considerable constraints on the possibilities for (a 1 , a 2 ), since it must lie in an infinite cylinder with axis parallel to w 1 × w 2 with short radius, for some low height vectors w 1 , w 2 . (Here × is the standard cross product on R 3 .) Explicitly, let e 1 , e 2 , e 3 be an orthonormal basis of R 3 with e 1 orthogonal to w 1 and w 2 , and with e 2 orthogonal to w 2 . Then we see that e 1 ∝ w 1 × w 2 , e 2 ∝ w 2 × e 1 and e 3 ∝ w 2 .
In particular, we have that |e 3 · w 2 | = w 2 2 , and Since w 1 2 V 1 , w 2 2 V 2 and w 1 × w 2 2 ≤ w 1 2 w 2 2 , this implies that Thus, since V 1 V 2 1/δ K , we see that any vector x with |x · w 1 | δ X V 1 /N and |x · w 2 | δ X V 2 /N satisfies 2 2 for some λ ∈ R. We note that the error term is o(X ) since w 1 , w 2 are linearly independent integer vectors and N K is assumed sufficiently large. Let the components of w 1 × w 2 be c 1 , c 2 , c 3 (with respect to the standard basis of R 3 ). Since w 1 , w 2 ∈ Z 3 , we have c 1 , c 2 , c 3 ∈ Z. Thus if a is of the above form we must have a = λ(w 1 × w 2 ) + o(X ) for some λ. Since a 2 ≥ X and a 1 , a 2 ≤ a 3 = X , we must have that |c 1 |, |c 2 | |c 3 |. In particular, |c 3 | w 1 × w 2 2 . Dividing through by X = λc 3 + O(X/N K |c 3 |) then gives Finally, we note that since δ ≥ N / X and Thus, we see that for any pair (a 1 , a 2 ) ∈ B 1 there must be integers c 1 , c 2 , c 3 X/N K such that (14.1) holds. This gives the result.
If N K > X 2/3 (and X is sufficiently large) then we see that b 1 /q and b 2 /q are the best rational approximations to a 1 / X and a 2 / X with denominator O(X 1/3 ), since the error in the approximation is O(1/(q X 2/3 )). Thus if we also have a 1 , a 2 ∈ F(Q, E) then we must have q Q and |ν 1 |, |ν 2 | ∼ E/ X . In particular, we must have Q + E X/N K . If instead N K ≤ X 2/3 then since Q + E X 1/2 we have Q + E (X/N K ) 2 . Thus in either case we have that there are no such pairs (a 1 , a 2 ) in both B 1 (N , K , δ) where V = {2 u 5 v : u, v ∈ Z ≥0 }, the supremum is over all choices of Q 1 , G 1 , G 2 , D 0 , D 1 , E 0 ≥ 1 which are powers of 10 and satisfy Q 1 G 1 G 2 D 0 D 1 E 0 X/N K and G 1 G 2 , and S 1 , S 2 , S 3 are given by Proof By Lemma 14.1 we are considering pairs (a 1 , a 2 ) ∈ B 1 (N , K , δ) such that for some q X/N K and |ν 1 |, |ν 2 | 1/N K q. By clearing common factors we may assume that (b 1 , b 2 , q) = 1. We let g 1 = (b 1 , q) and g 2 = (b 2 , q). By symmetry we may assume that g 1 ≤ g 2 . We let d 1 be the part of g 1 not coprime to 10 (i.e. d 1 |10 u for some integer u, and g 1 = g 1 d 1 for some (g 1 , 10) = 1). Similarly we let d 0 be the part of q/g 1 g 2 which is not coprime to 10. To ease notation we let b 1 = b 1 /g 1 , b 2 = b 2 /g 2 , q = q/g 1 g 2 d 0 and We split the contribution of pairs (a 1 , a 2 ) ∈ B 1 into O(log X ) 5 subsets. We consider terms where we have the restrictions q ∼ Q 1 , We relax the restriction |ν 1 |, |ν 2 | 1/N K q to |ν 1 |, |ν 2 | ≤ E 0 / X for a suitable power of 10 E 0 X/N K Q 0 with E 0 ≥ 1. We see there are O(log X ) 5 sets with such restrictions which cover all possible (b 1 , b 2 , q, ν 1 , ν 2 ) and hence all (a 1 , a 2 ) ∈ B 1 . For simplicity, the reader might like to consider the special case G 1 = G 2 = D 0 = D 1 = 1 on a first reading.
To ease notation we let V = {2 u 5 v : u, v ∈ Z ≥0 }, and note that we have d 0 , d 1 ∈ V. By summing over all possibilities of q , g 1 , where the supremum is over all choices of Q 1 , G 1 , G 2 , D 0 , D 1 , E 0 ≥ 1 which are powers of 10 and satisfy Q 1 G 1 G 2 D 0 D 1 E 0 X/N K and G 1 D 1 G 2 and S 0 is given by In S 0 , we have used to indicate that the summation is further constrained by the conditions which we suppressed for notational simplicity. We see that g 1 , g 2 , b 1 , b 2 , ν 1 , ν 2 each occur in only one of the two F X terms, and so given d 0 , d 1 , q the remaining summation in S 0 factors into a product of two sums. Taking a supremum over all choices of q in the first of these then gives

(14.2)
where
(14.3)
The bound (14.2) will be useful when $Q_0$ is small, but when $Q_0$ is large it is wasteful to sum over all these possibilities, since we have not made use of the fact that $a_1, a_2 \in E$, a small set. To obtain an alternative bound we first sum over all $a_1 \in E$, then over all possibilities of $q$, $b_2$, $\nu_2$. This shows that where the supremum has the same constraints as before, and $S_0$ is given by Here the summation in $S_0$ is constrained by Again, taking a supremum over $q$ and factorizing the summation, we find that where $S_1$ is as given by (14.3) above, and $S_3$ is given by where $N(a_1, d_0)$ Putting together (14.2), (14.5), (14.6) we obtain as required.
Proof We first bound S 1 , S 2 , S 3 individually using Lemmas 12.2, 10.6 and 10.7. We will then combine these bounds to give the desired result. We first consider the quantity N (a 1 , d 0 ) occurring in S 3 . If q and q are both counted by N (a, d) then there exists b, g and b , g such that (b, qdg) = (b , q dg ) = 1 and Here we used the fact that E 0 / X 1/N K Q 0 . The variables we consider satisfy q, q ∼ Q 1 Q 0 /G 1 G 2 D 0 D 1 and g, g ∼ G 2 and d ∼ D 0 . Thus Given q, g, b, h with (qg, b) = 1, we then see We recall that Q + E (X/N K ) 2 , and so this gives as required.

Line estimates
In this section we establish Proposition 13.4, which controls the contribution from pairs of angles for which a large contribution to the bilinear sums considered in Sect. 13 comes from a line. If a line $L$ makes a large contribution, then $(a_1, a_2, X)$ must lie close to the low height plane orthogonal to this line. We note that we do not make use of the fact that these angles lie outside the major arcs, but it is vital that the angles are restricted to the small set $E$.
Lemma 15.1 (Line angles lie in low height plane) Let 0 < δ < 1 and K , N , X > 1 be reals with δ ≥ N / X and N K ≥ X 17/40 . Let B 2 = B 2 (N , K , δ) be the set of integer pairs (a 1 , a 2 ) ∈ [0, X ) 2 such that there is a line L through the origin such that Then all pairs (a 1 , be a non-zero element of Z 3 ∩ L of smallest norm, and let V = v 2 and 1 = |v 1 a 1 + v 2 a 2 + v 3 X |. Then all of Z 3 ∩ L is generated by v, and so #{n ∈ L ∩ Z 3 : |n 1 a 1 + n 2 a 2 + n 3 X | ≤ δ X, By assumption, this is also δ N 2 K , and so we obtain Letting Proof Trivially there are O(#C 2 ) choices of a 1 , a 2 ∈ C, which gives the required bound if V > #C 3/8 . In particular, we may assume that V < #C ≤ X . There are O(#C) points with a 1 = 0 or a 2 = 0, so we may assume that a 1 , a 2 = 0. We first claim that there are O(#CV 2 X o(1) ) (15.1) choices of v 1 , v 2 , v 3 , v 4 , a 1 , and a 2 satisfying v 1 a 1 +v 2 a 2 +v 3 X +v 4 = 0 with at least one of v 1 , v 2 , v 3 , v 4 equal to 0 and at least one of v 1 , v 2 , v 3 , v 4 non-zero. For example, if v 1 = 0 then there are O(#CV 2 ) choices of a 1 , v 3 , v 4 , which then determines v 2 a 2 . Since there are no non-zero solutions to v 3 X + v 4 = 0, this is non-zero and so there are O(X ) choices of v 2 , a 2 . The other cases are entirely analogous. Thus it suffices to consider pairs (a 1 , a 2 ) such that v 1 a 1 + v 2 a 2 + v 3 X + v 4 = 0 for some v 1 , v 2 , v 3 , v 4 all non-zero. We let C 2 denote the set of such pairs. Given a ∈ Z, let M a be the smallest value of (c 2 1 + c 2 2 ) 1/2 over all nonzero integers c 1 , c 2 such that c 1 ≡ c 2 X (mod a). We divide C into O(log X ) 2 subsets localizing the size of a < X and M a < X by considering the sets  where N 2 = # (a 2 , a 2 , a 1 ) ∈ C(A, M) 2 × C : for some integers 0 < |v 1 |, |v 1 |, |v 2 |, |v 2 |, |v 3 |, |v 3 |, |v 4 |, |v 4 | ≤ V, a 1 a 2 a 2 = 0 .
There This bound will be good for us if $V_1$ is small, but we need a different argument if $V_1$ is large. We note that We make a choice of $a_2$, $a_2'$, $b_1$, for which there are $\ll V V_1 X^{o(1)} \min(M^4, \#C^2)$ possibilities counted by $N_3(V_1)$. We see that $b_3$, $b_4$ satisfy Let $b_{3,0}$, $b_{4,0}$ be a solution to this congruence with $b_{3,0}^2 + b_{4,0}^2$ minimal. We may assume that $b_{3,0} \ll V V_1 A/X$ and $b_{4,0} \ll V V_1$, since otherwise there are no possible $b_3$, $b_4$. All pairs $b_3$, $b_4$ satisfying the congruence are then of the form $(b_3, b_4) = (b_{3,0} + b_3', b_{4,0} + b_4')$ for some integers $b_3'$, $b_4'$ satisfying $b_3' X + b_4' \equiv 0 \pmod{a_2}$ and $b_3' \ll V V_1 A/X$, $b_4' \ll V V_1$. This forces $b_3' e_1 + b_4' e_2$ to lie in a lattice $\Lambda \subset \mathbb{Z}^2$ of determinant $a_2$, where $e_1$, $e_2$ are the standard basis vectors of $\mathbb{Z}^2$. Let $\varphi : \mathbb{R}^2 \to \mathbb{R}^2$ be the linear map which is a dilation by a factor $X/A$ in the $e_1$ direction, and $\Lambda' = \varphi(\Lambda)$, a lattice in $\mathbb{R}^2$ of determinant $a_2 X/A \asymp X$.
Let $\Lambda'$ have a Minkowski-reduced basis $\{v_1, v_2\}$. We recall this means that $\|v_1\|_2 \cdot \|v_2\|_2 \asymp \det(\Lambda') = a_2 X/A \asymp X$ and $\|n_1 v_1 + n_2 v_2\|_2 \asymp \|n_1 v_1\|_2 + \|n_2 v_2\|_2$. From the definition of $M_a$, we see that the smallest non-zero vector in $\Lambda$ has length at least $M/10$, and so, since $\varphi$ can only increase the length of vectors, we have $\|v_1\|_2, \|v_2\|_2 \ge M/10$.
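To make the notion of a reduced basis concrete, the following sketch applies the classical Lagrange-Gauss reduction to the integer lattice of pairs $(b_3, b_4)$ with $b_3 X + b_4 \equiv 0 \pmod{a}$, before any dilation. The sample values of `X` and `a` and all function names are hypothetical, and Lagrange-Gauss reduction is used here as a stand-in for whichever reduction procedure one prefers; the paper itself only uses the existence of such a basis.

```python
def norm2(w):
    """Squared Euclidean length of a 2-vector."""
    return w[0] * w[0] + w[1] * w[1]


def lattice_basis(X, a):
    """A basis of {(b3, b4) in Z^2 : b3*X + b4 = 0 (mod a)}; determinant a."""
    return (1, (-X) % a), (0, a)


def gauss_reduce(u, v):
    """Lagrange-Gauss reduction of a basis (u, v) of a rank-2 integer lattice.

    Returns a basis (u, v) of the same lattice with |u| <= |v| and
    |<u, v>| <= |u|^2 / 2, so that u is a shortest non-zero lattice vector.
    """
    if norm2(u) > norm2(v):
        u, v = v, u
    while True:
        inner = u[0] * v[0] + u[1] * v[1]
        n = norm2(u)
        m = (2 * inner + n) // (2 * n)  # nearest integer to inner / n
        v = (v[0] - m * u[0], v[1] - m * u[1])
        if norm2(v) >= norm2(u):
            return u, v
        u, v = v, u


# Hypothetical example values, chosen only for illustration.
X, a = 37, 101
u, v = gauss_reduce(*lattice_basis(X, a))
det = abs(u[0] * v[1] - u[1] * v[0])  # equals a
```

For a reduced basis the product of the two lengths is within a constant factor of the determinant, which is the property quoted above; the shortest vector `u` plays the role of the quantity measured by $M_a$, up to the relabelling of coordinates.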
The set of vectors $b_3' e_1 + b_4' e_2$ in $\Lambda$ inside the bounded region $|b_3'| \ll V V_1 A/X$, $|b_4'| \ll V V_1$ can be injected by $\varphi$ into the set $\{x \in \Lambda' : \|x\|_2 \le C V V_1\}$ for some suitably large constant $C$. Thus, provided $C$ is sufficiently large so that we also have $\|n_1 v_1 + n_2 v_2\|_2 \ge \max_i \|n_i v_i\|_2 / C$, we see that the number of pairs $(b_3, b_4)$ is bounded by $\#\{x \in \Lambda' : \|x\|_2 \le C V V_1\}$. Here we used the fact that $\|v_1\|_2, \|v_2\|_2 \gg M$ and $\|v_1\|_2 \cdot \|v_2\|_2 \asymp \det(\Lambda')$ in the penultimate line, and $\det(\Lambda') \asymp X$ in the final line.
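The counting step can be sketched as follows. This is a hedged reconstruction that uses only the stated properties of the reduced basis $\{v_1, v_2\}$, writing $\Lambda'$ for the dilated lattice (a label assumed here); the intermediate lines are one plausible route and need not match the authors' exact chain of inequalities.

```latex
\begin{align*}
\#\{x \in \Lambda' : \|x\|_2 \le C V V_1\}
 &\le \#\{(n_1, n_2) \in \mathbb{Z}^2 : \|n_i v_i\|_2 \le C^2 V V_1 \text{ for } i = 1, 2\} \\
 &\ll \Bigl(1 + \frac{V V_1}{\|v_1\|_2}\Bigr)\Bigl(1 + \frac{V V_1}{\|v_2\|_2}\Bigr) \\
 &\ll 1 + \frac{V V_1}{M} + \frac{V^2 V_1^2}{\det(\Lambda')} \\
 &\ll 1 + \frac{V V_1}{M} + \frac{V^2 V_1^2}{X}.
\end{align*}
```

The first line uses $\|n_1 v_1 + n_2 v_2\|_2 \ge \max_i \|n_i v_i\|_2 / C$, and the second counts the integers $n_i$ in an interval of length $O(V V_1 / \|v_i\|_2)$.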

Modifications for Theorem 1.2
Theorem 1.2 follows from essentially the same overall approach as in Theorem 1.1. We only provide a brief sketch of the proof, leaving the complete details to the interested reader. When $q$ is large, there is negligible benefit from using the $235/154$th moment, so we just use $\ell^1$ bounds. For $Y = q^k$ a power of $q$, we let The inner sum is $\le \min(q - s, s + 2/\|q^i \theta\|)$. Thus, similarly to Lemma 10.3, we find Using this bound, we get a corresponding improvement on (16.1), which gives If $s \le q - q^{57/80}$ and $q$ is sufficiently large in terms of $\epsilon$, this gives a bound $Y^{23/80+\epsilon}$. As before, using this bound in place of Lemmas 10.3 and 10.4 throughout gives the result.

For the results mentioned after Theorem 1.2, we find that in the further restricted ranges $s \le q^{1/4-\delta}$ (or $s \le q - q^{3/4+\delta}$ if $B = \{0, \ldots, s-1\}$), the bound (16.1) [or (16.2)] gives an $\ell^1$ bound of $Y^{1/4-\delta/2}$. Following this through the argument, we obtain a wider Type II range and can estimate bilinear sums provided $N \in [X^{5/16}, X^{1/2}]$ instead of $[X^{9/25}, X^{17/40}]$. By symmetry, we can then also estimate terms with $N \in [X^{1/2}, X^{11/16}]$. This allows us to obtain asymptotic estimates for all the terms on the right hand side of the identity
$$S(A, X^{1/2}) = S(A, X^{3/8-2\epsilon}) - \sum_{X^{3/8-2\epsilon} \le p < X^{1/2}} S(A_p, p),$$
by the equivalents of Propositions 6.1 and 6.2 adapted to this larger Type II range.
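The identity above is an instance of Buchstab's identity, and it can be checked numerically on a small example. The sketch below is only an illustration: it takes the set of positive integers below a hypothetical cutoff $X = 10^4$ with no decimal digit 7, and uses the sifting levels 10 and 100 in place of $X^{3/8-2\epsilon}$ and $X^{1/2}$. All function names are assumptions made for this example.

```python
def least_prime_factor(n):
    """Smallest prime factor of n >= 2, or None when n == 1."""
    if n < 2:
        return None
    p = 2
    while p * p <= n:
        if n % p == 0:
            return p
        p += 1
    return n


def is_prime(n):
    return n >= 2 and least_prime_factor(n) == n


def sifted_count(A, z):
    """S(A, z): number of a in A with no prime factor below z."""
    count = 0
    for a in A:
        lpf = least_prime_factor(a)
        if lpf is None or lpf >= z:
            count += 1
    return count


def buchstab_rhs(A, z1, z2):
    """S(A, z1) minus the sum of S(A_p, p) over primes z1 <= p < z2."""
    total = sifted_count(A, z1)
    for p in range(z1, z2):
        if is_prime(p):
            A_p = [a for a in A if a % p == 0]  # multiples of p in A
            total -= sifted_count(A_p, p)
    return total


# Hypothetical small instance: integers below 10^4 avoiding the digit 7.
X = 10 ** 4
A = [n for n in range(1, X) if "7" not in str(n)]
lhs = sifted_count(A, 100)
rhs = buchstab_rhs(A, 10, 100)
```

Since every composite below $10^4$ has a prime factor below 100, `lhs` counts exactly the primes in the set that are at least 100, together with the integer 1.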