Primes with restricted digits

Let $a_0\in\{0,\dots,9\}$. We show there are infinitely many prime numbers which do not have the digit $a_0$ in their decimal expansion. The proof is an application of the Hardy-Littlewood circle method to a binary problem, and rests on obtaining suitable 'Type I' and 'Type II' arithmetic information for use in Harman's sieve to control the minor arcs. This is obtained by decorrelating Diophantine conditions which dictate when the Fourier transform of the primes is large from digital conditions which dictate when the Fourier transform of numbers with restricted digits is large. These estimates rely on a combination of the geometry of numbers, the large sieve and moment estimates obtained by comparison with a Markov process.


Introduction
Let $a_0\in\{0,\dots,9\}$ and let
$$A_1 = \Big\{\sum_{0\le i\le k} n_i 10^i : n_i\in\{0,\dots,9\}\setminus\{a_0\},\ k\ge 0\Big\}$$
be the set of numbers which have no digit equal to $a_0$ in their decimal expansion. The number of elements of $A_1$ which are less than $x$ is $O(x^{1-c})$, where $c = \log(10/9)/\log 10\approx 0.046>0$. In particular, $A_1$ is a sparse subset of the natural numbers. A set being sparse in this way presents several analytic difficulties if one tries to answer arithmetic questions such as whether the set contains infinitely many primes. Typically we can only show that sparse sets contain infinitely many primes when the set in question possesses some additional multiplicative structure.
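A brute-force sketch makes the counting claim concrete. The helper name `count_avoiding` is hypothetical.

When $a_0\neq 0$, padding with leading zeros identifies the elements of $A_1$ below $10^k$ with strings of length $k$ over the 9 permitted digits. So there are exactly $9^k=(10^k)^{1-c}$ of them. When $a_0=0$ the count is $9+9^2+\dots+9^k\le\frac98 9^k$, which is still $O(x^{1-c})$ at $x=10^k$.

```python
from math import log

def count_avoiding(a0: int, k: int) -> int:
    """Number of integers 0 <= n < 10**k whose decimal expansion has no digit a0.

    n is written in the usual way, without leading zeros (so 0 is written "0").
    """
    digit = str(a0)
    return sum(1 for n in range(10 ** k) if digit not in str(n))

# The exponent c in #{n in A_1 : n < x} = O(x^(1 - c)).
c = log(10 / 9) / log(10)

if __name__ == "__main__":
    print(f"c = {c:.4f}")
    for k in range(1, 6):
        # count for a nonzero excluded digit, 9**k, count for excluded digit 0
        print(k, count_avoiding(7, k), 9 ** k, count_avoiding(0, k))
```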
The set $A_1$ has unusually nice structure in that its Fourier transform has a convenient explicit analytic description, and is often unusually small in size. There has been much previous work [1,2,4,5,6,11,13] studying $A_1$ and related sets by exploiting this Fourier structure. In particular the work of Dartyge and Mauduit [7,8] shows the existence of infinitely many integers in $A_1$ with at most 2 prime factors, this result relying on the fact that $A_1$ is well-distributed in arithmetic progressions [7,12,16]. We also mention the related work of Mauduit and Rivat [17] who showed the sum of digits of primes is well-distributed, and the work of Bourgain [3] which showed the existence of primes in the sparse set created by prescribing a positive proportion of the binary digits.
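One concrete form of this explicit description comes from writing every element with exactly $k$ digits, leading zeros allowed. The digits then vary independently, so $\sum_n e(n\theta)$ over such $n$ factorizes as $\prod_{i=0}^{k-1}\sum_{d\neq a_0}e(d\,10^i\theta)$. For $a_0\neq0$ this set is exactly $A_1\cap[0,10^k)$.

The sketch below checks the factorization against the direct sum. It takes $e(x)=e^{2\pi i x}$, and the function names are hypothetical.

```python
import cmath
from itertools import product

def e(x: float) -> complex:
    """Complex exponential e(x) = exp(2*pi*i*x)."""
    return cmath.exp(2j * cmath.pi * x)

def allowed_digits(a0: int) -> list:
    return [d for d in range(10) if d != a0]

def S_direct(theta: float, a0: int, k: int) -> complex:
    """Sum of e(n*theta) over all k-digit strings n (leading zeros allowed) avoiding a0."""
    total = 0j
    for digits in product(allowed_digits(a0), repeat=k):
        n = sum(d * 10 ** i for i, d in enumerate(digits))
        total += e(n * theta)
    return total

def S_product(theta: float, a0: int, k: int) -> complex:
    """The same exponential sum, computed as a product of k single-digit sums."""
    result = 1 + 0j
    for i in range(k):
        result *= sum(e(d * 10 ** i * theta) for d in allowed_digits(a0))
    return result

if __name__ == "__main__":
    for theta in (0.1234, 1 / 7, 0.5):
        print(theta, abs(S_direct(theta, 3, 4)), abs(S_product(theta, 3, 4)))
```

The product form costs $9k$ exponentials instead of $9^k$, which is what makes it usable at every scale.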
We show that there are infinitely many primes in $A_1$. Our proof is based on a combination of the circle method, Harman's sieve, the method of bilinear sums, the large sieve, the geometry of numbers and a comparison with a Markov process. In particular, we make key use of the Fourier structure of $A_1$, in the same spirit as the aforementioned works. Somewhat surprisingly, the Fourier structure allows us to successfully apply the circle method to a binary problem.

Theorem 1.1. Let $X\ge 4$ and let
$$A = \Big\{\sum_{0\le i\le k} n_i 10^i < X : n_i\in\{0,\dots,9\}\setminus\{a_0\},\ k\ge 0\Big\}$$
be the set of numbers less than $X$ with no digit in their decimal expansion equal to $a_0$. Then we have
$$\#\{p\in A\}\asymp\frac{\#A}{\log X}\asymp\frac{X^{\log 9/\log 10}}{\log X}.$$
Here, and throughout the paper, $f\asymp g$ means that there are absolute constants $c_1,c_2>0$ such that $c_1 f<g<c_2 f$.
Thus there are infinitely many primes with no digit a 0 when written in base 10.
Indeed, there are $(\varphi(10)\kappa_A/10+o(1))\#A$ elements of $A$ which are coprime to 10, $(1+o(1))X/\log X$ primes less than $X$ which are coprime to 10, and $(\varphi(10)/10+o(1))X$ integers less than $X$ coprime to 10. Thus if the properties 'being in $A$' and 'being prime' were independent for integers $n<X$ coprime to 10, we would expect $(\kappa_A+o(1))\#A/\log X$ primes in $A$. Theorem 1.1 shows this heuristic guess is within a constant factor of the truth, and we would be able to establish such an asymptotic formula if we had stronger 'Type II' information.
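For $X=10^k$, the last digit of an element of $A$ is equidistributed over the 9 permitted digits. For $a_0=0$ this holds separately for each number of digits. So the proportion of $A$ coprime to 10 is exactly $3/9$ if $(10,a_0)=1$ and $4/9$ otherwise. Through the relation $\#\{n\in A:(n,10)=1\}\approx(\varphi(10)\kappa_A/10)\#A$, this corresponds to $\kappa_A=5/6$ and $\kappa_A=10/9$ respectively.

A brute-force sketch follows; the helper names are hypothetical.

```python
from math import gcd

def coprime_fraction(a0: int, k: int) -> float:
    """Proportion of n < 10**k with no digit a0 (usual expansion) that are coprime to 10."""
    digit = str(a0)
    A = [n for n in range(10 ** k) if digit not in str(n)]
    return sum(1 for n in A if gcd(n, 10) == 1) / len(A)

def kappa_A(a0: int, k: int = 4) -> float:
    """kappa_A read off from #{n in A : (n, 10) = 1} = (phi(10) * kappa_A / 10) * #A."""
    phi_10 = 4  # Euler's totient of 10
    return 10 * coprime_fraction(a0, k) / phi_10

if __name__ == "__main__":
    for a0 in range(10):
        print(a0, coprime_fraction(a0, 4), kappa_A(a0))
```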
One can consider the same problem in bases other than 10, and with more than one excluded digit. The set of numbers less than $X$ missing $s$ digits in base $q$ has $\asymp X^c$ elements, where $c=\log(q-s)/\log q$. For fixed $s$, the density becomes larger as $q$ increases, and so the problem becomes easier. Our methods are not powerful enough to show the existence of infinitely many primes with two digits not appearing in their decimal expansion, but they can show that there are infinitely many primes with $s$ digits excluded in base $q$ provided $q$ is large enough in terms of $s$. Moreover, if the set of excluded digits possesses some additional structure, our methods can apply to very thin sets formed in this way.
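The exponent is easy to tabulate. The small sketch below uses a hypothetical helper. It shows that for fixed $s$ the exponent increases towards 1 as $q$ grows. It also shows that the exponent equals $57/80$ when $q-s=q^{57/80}$, as in the thin-set example discussed after Theorem 1.2.

```python
from math import log

def density_exponent(q: int, s: int) -> float:
    """Exponent c = log(q - s) / log(q) for integers < X missing s fixed base-q digits."""
    if not (q >= 2 and 0 <= s < q):
        raise ValueError("need q >= 2 and 0 <= s < q")
    return log(q - s) / log(q)

if __name__ == "__main__":
    for q in (10, 100, 1000, 10 ** 6):
        print(q, density_exponent(q, 2))
    # Thin-set example: q - s = q^(57/80) holds exactly for q = 2**80, s = 2**80 - 2**57.
    print(density_exponent(2 ** 80, 2 ** 80 - 2 ** 57))
```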
Theorem 1.2. Let $q$ be sufficiently large, and let $X\ge q$.
For any choice of $B\subseteq\{0,\dots,q-1\}$ with $\#B=s\le q^{23/80}$, let
$$A' = \Big\{\sum_{0\le i\le k} n_i q^i < X : n_i\in\{0,\dots,q-1\}\setminus B,\ k\ge 0\Big\}$$
be the set of integers less than $X$ with no digit in base $q$ in the set $B$. Then we have
$$\#\{p\in A'\}\asymp\frac{X^{\log(q-s)/\log q}}{\log X}.$$
The final case of Theorem 1.2, when $B=\{0,\dots,s-1\}$ and $s\approx q-q^{57/80}$, shows the existence of many primes in a set of integers $A'$ with $\#A'\approx X^{57/80}=X^{0.7125}$, a rather thin set. The exponent here can be improved slightly with more effort.
The estimates in Theorem 1.2 can be improved to asymptotic formulae if we restrict $s$ slightly further. For general $B$ with $s=\#B\le q^{1/4-\delta}$ and any $q$ sufficiently large in terms of $\delta>0$, we obtain an asymptotic formula for $\#\{p\in A'\}$ whose leading constant depends on the number $t$ of elements of $B$ coprime to $q$. In the case of just one excluded digit, we can obtain this asymptotic formula for $q\ge 12$. In the case of $B=\{0,\dots,s-1\}$, we obtain this asymptotic formula provided $s\le q-q^{3/4+\delta}$.
We expect several of the techniques introduced in this paper might be useful more generally in other digit-related questions about arithmetic sequences. Our general approach to counting primes in A and our analysis of the minor arc contribution might also be of independent interest, with potential application to other questions on primes involving sets whose Fourier transform is unrelated to Diophantine properties of the argument.

Outline
Our argument is fundamentally based on an application of the circle method. Clearly for the purposes of Theorem 1.1 we can restrict $X$ to a power of 10 for convenience. The number of primes in $A$ is the number of solutions of the binary equation $p-a=0$ over primes $p$ and integers $a\in A$, and so is given by
$$\#\{p\in A\}=\frac{1}{X}\sum_{0\le a<X}S_A\Big(\frac{a}{X}\Big)S_P\Big(\frac{-a}{X}\Big),\qquad S_A(\theta)=\sum_{n\in A}e(n\theta),\quad S_P(\theta)=\sum_{p<X}e(p\theta).$$
We then separate the contribution from the $a$ in the 'major arcs', which gives our expected main term for $\#\{p\in A\}$, from the $a$ in the 'minor arcs', which we bound as an error term.
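The identity rests on orthogonality of additive characters: for integers $0\le n,p<X$, $\frac1X\sum_{0\le a<X}e(a(n-p)/X)$ is 1 if $n=p$ and 0 otherwise.

Below is a brute-force sketch for small $X$. It assumes the definitions $S_A(\theta)=\sum_{n\in A}e(n\theta)$ and $S_P(\theta)=\sum_{p<X}e(p\theta)$, and the helper names are hypothetical. It only checks the identity; it is not a practical way of counting primes.

```python
import cmath
from math import isqrt

def e(x: float) -> complex:
    """Complex exponential e(x) = exp(2*pi*i*x)."""
    return cmath.exp(2j * cmath.pi * x)

def primes_below(X: int) -> list:
    """Primes p < X, by the sieve of Eratosthenes."""
    if X < 3:
        return []
    is_prime = [True] * X
    is_prime[0] = is_prime[1] = False
    for p in range(2, isqrt(X - 1) + 1):
        if is_prime[p]:
            for m in range(p * p, X, p):
                is_prime[m] = False
    return [p for p in range(X) if is_prime[p]]

def primes_in_A_via_circle_method(X: int, a0: int) -> complex:
    """(1/X) * sum_{0 <= a < X} S_A(a/X) * S_P(-a/X); orthogonality leaves #{p in A}."""
    A = [n for n in range(X) if str(a0) not in str(n)]
    P = primes_below(X)
    total = 0j
    for a in range(X):
        # Reduce n*a modulo X with integer arithmetic before dividing, for accuracy.
        S_A = sum(e((n * a % X) / X) for n in A)
        S_P = sum(e(-(p * a % X) / X) for p in P)
        total += S_A * S_P
    return total / X

def primes_in_A_directly(X: int, a0: int) -> int:
    return sum(1 for p in primes_below(X) if str(a0) not in str(p))

if __name__ == "__main__":
    X, a0 = 1000, 7
    print(primes_in_A_via_circle_method(X, a0), primes_in_A_directly(X, a0))
```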
The reader might be (justifiably) somewhat surprised by this, since it is well known that the circle method typically cannot be applied to binary problems. Indeed, one cannot generally hope for bounds better than 'square-root cancellation' $S_P(\theta)\ll X^{1/2}$, $S_A(\theta)\ll\#A^{1/2}$ for 'generic' $\theta\in[0,1]$. Thus if one cannot exploit cancellation amongst the different terms in the minor arcs, we would expect that the $\gg X$ different 'generic' $a$ in the sum above would contribute an error term which we can only bound as $O(X^{1/2}\#A^{1/2})$, and this would dominate the expected main term.
It turns out that the Fourier transform $S_A(\theta)$ has some rather remarkable features which cause it typically to exhibit better than square-root cancellation. (A closely related phenomenon is present and crucial in the work of Mauduit and Rivat [17] and Bourgain [3].) Indeed, we establish an $\ell^1$ bound for $S_A(a/X)$ over $0\le a<X$ which shows that for 'generic' $a$ we have $S_A(a/X)\ll\#A/X^{0.64}\ll X^{0.32}$. This gives us a (small) amount of room for a possible successful application of the circle method. If the bound $S_P(a/X)\ll X^{1/2+\epsilon}$ held for all $a$ in the minor arcs, we might now hope that the 'generic' $a$ would contribute a total of $O(X^{0.82})$, and this error term is smaller than the expected main term of size $\#A^{1+o(1)}$.
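The size comparison can be explored numerically for tiny $X=10^k$. By Parseval, $\sum_{a<X}|S_A(a/X)|^2=X\#A$, so the root-mean-square of $|S_A(a/X)|$ is exactly the square-root benchmark $\#A^{1/2}$. By Cauchy-Schwarz, the $\ell^1$ average can never exceed it.

The sketch below computes both quantities. It assumes $a_0\neq0$ so that the digit-product formula describes $A$ exactly, and the helper names are hypothetical. At such small $X$ the printed exponents are only indicative and are not the asymptotic exponents quoted in the text.

```python
import cmath
from math import log, sqrt

def e(x: float) -> complex:
    """Complex exponential e(x) = exp(2*pi*i*x)."""
    return cmath.exp(2j * cmath.pi * x)

def S_A_at(a: int, a0: int, k: int) -> complex:
    """S_A(a / 10**k) via the digit-product formula (requires a0 != 0)."""
    if a0 == 0:
        raise ValueError("this sketch assumes a0 != 0")
    X = 10 ** k
    result = 1 + 0j
    for i in range(k):
        # d * 10**i * a / X is reduced modulo 1 exactly using integer arithmetic.
        result *= sum(e((d * 10 ** i * a % X) / X) for d in range(10) if d != a0)
    return result

def l1_and_l2(a0: int, k: int):
    """Return (sum_a |S_A(a/X)|, sum_a |S_A(a/X)|^2) over 0 <= a < X = 10**k."""
    X = 10 ** k
    values = [abs(S_A_at(a, a0, k)) for a in range(X)]
    return sum(values), sum(v * v for v in values)

if __name__ == "__main__":
    a0 = 7
    for k in range(1, 5):
        X, size_A = 10 ** k, 9 ** k
        l1, l2 = l1_and_l2(a0, k)
        # l1 average, square-root benchmark, Parseval ratio (should be 1), l1 exponent
        print(k, l1 / X, sqrt(size_A), l2 / (X * size_A), log(l1 / X) / log(X))
```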
We actually get good asymptotic control over all moments (including fractional ones) of $S_A(a/X)$ rather than just the first. By making a suitable approximation to $S_A(\theta)$, we can re-interpret moments of this approximation as the average probability of restricted paths in a Markov process, and obtain asymptotic estimates via a finite eigenvalue computation.
By combining an $\ell^2$ bound for $S_P(a/X)$ with an $\ell^{1.526}$ bound for $S_A(a/X)$, we are able to show that 'generic' $a<X$ do indeed make a negligible contribution, and that we may restrict ourselves to $a\in E$, some set of size $O(X^{0.36})$.
We expect that $S_P(\theta)$ is large only when $\theta$ is close to a rational with small denominator, and $S_A(\theta)$ is large when $\theta$ has a decimal expansion containing many 0's or 9's. Thus we expect the product to be large only when both of these conditions hold, which is essentially when $\theta$ is well approximated by a rational whose denominator is a small power of 10.
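The digital condition can be seen directly. Normalize as $F(\theta)=|S_A(\theta)|/\#A$; this is a hypothetical normalization for this sketch, not necessarily the paper's $F_Y$. Take $X=10^k$ and $a_0\neq0$.

The $i$-th digit factor depends only on $10^i\theta \bmod 1$, and equals 1 when that number is 0. Since $\sum_{d=0}^{9}e(dj/10)=0$ for $10\nmid j$, the last factor is exactly $1/9$ whenever the last digit of $a$ is nonzero. Hence $F(a/X)\le1/9$ for such $a$, while $F(0)=1$ and $F(1/10)=1/9$.

```python
import cmath

def e(x: float) -> complex:
    """Complex exponential e(x) = exp(2*pi*i*x)."""
    return cmath.exp(2j * cmath.pi * x)

def F(a: int, a0: int, k: int) -> float:
    """|S_A(a / 10**k)| / #A for A = k-digit strings avoiding a0 (assumes a0 != 0)."""
    if a0 == 0:
        raise ValueError("this sketch assumes a0 != 0")
    X = 10 ** k
    value = 1.0
    for i in range(k):
        # Normalized single-digit factor |sum_{d != a0} e(d * 10**i * a / X)| / 9.
        value *= abs(sum(e((d * 10 ** i * a % X) / X) for d in range(10) if d != a0)) / 9
    return value

if __name__ == "__main__":
    k, a0 = 6, 4
    for a in (0, 100000, 99990, 10000, 123456, 271828):
        print(f"{a / 10 ** k:.6f}", F(a, a0, k))
```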
By obtaining suitable estimates for $A$ in arithmetic progressions via the large sieve, one can verify that, amongst all $a$ in the major arcs $M$ (where $a/X$ is well-approximated by a rational of small denominator), we obtain our expected main term, and this comes from when $a/X$ is well-approximated by a rational with denominator 10.
Thus we are left to show that when $a\in E$ and $a/X$ is not close to a rational with small denominator, the product $S_A(a/X)S_P(-a/X)$ is small on average. By using an expansion of the indicator function of the primes as a sum of bilinear terms (similar to Vaughan's identity), we are led to bound expressions such as (2.2), which is a weighted and averaged form of the typical expressions one encounters when obtaining an $\ell^\infty$ bound for exponential sums over primes. Here $\|\cdot\|$ is the distance to the nearest integer.
The double sum over $n_1,n_2$ in (2.2) is of size $O(N^2)$ for 'typical' pairs $(a_1,a_2)$, and if it is noticeably larger than this then $a_1$ and $a_2$ must share some Diophantine structure. We find that if this quantity is large, the pair $(a_1,a_2)$ must lie close to the projection from $\mathbb{Z}^3$ to $\mathbb{Z}^2$ of some low height plane or low height line, where the arithmetic height of the line or plane is bounded in terms of the size of the double sum. (For example, the diagonal terms $a_1=a_2$ give a large contribution and lie on a low height line, and pairs $a_1,a_2$ which are both small give a large contribution and lie in a low height plane.) This restricts the number and nature of pairs $(a_1,a_2)$ which can give a large contribution. Since we expect the size of $S_A(a_1/X)S_A(a_2/X)$ to be determined by digital rather than Diophantine conditions on $a_1,a_2$, we expect to have a smaller total contribution when restricted to these sets. By using the explicit description of such pairs $(a_1,a_2)$ we succeed in obtaining such a superior bound on the sum over these pairs. It is vital here that we are restricted to $a_1,a_2$ lying in the small set $E$ (for points on a line) and outside of the set $M$ of major arcs (for points in a lattice).
This ultimately allows us to get suitable bounds for (2.2) provided $N\in[X^{0.36},X^{0.425}]$. If this 'Type II range' were larger, we would be able to express the indicator function of the primes as a combination of such bilinear expressions and easily controlled terms. We would then obtain an asymptotic estimate for $\#\{p\in A\}$. Unfortunately our range is not large enough to do this. Instead we work with a minorant for the indicator function of the primes throughout our argument, which is chosen such that it is essentially a combination of bilinear expressions which do fall into this range. It is this feature which means we obtain a lower bound rather than an asymptotic estimate for the number of primes in $A$.
Such a minorant is constructed via Harman's sieve, and, since it is essentially a combination of Type II terms and easily handled terms, we can obtain an asymptotic formula for elements of $A$ weighted by it. This gives a lower bound $\#\{p\in A\}\ge(c+o(1))\#A/\log X$ for some constant $c$. We use numerical integration to verify that we (just) have $c>0$, and so we obtain our asymptotic lower bound for $\#\{p\in A\}$. The upper bound is a simple sieve estimate.
Remark. For the method used to prove Theorem 1.1, strong assumptions such as the Generalized Riemann Hypothesis appear to be only of limited benefit. In particular, even under GRH one only gets pointwise bounds of the strength $S_P(\theta)\ll X^{3/4+o(1)}$ for 'generic' $\theta$, which is not strong enough to give a non-trivial minor arc bound on its own. The assumption of GRH and the above pointwise bound is sufficient to deal with the entire minor arc contribution in the regime where we obtain asymptotic formulae (i.e. when the base is sufficiently large).

Notation
We use the asymptotic notation $\ll,\gg,O(\cdot),o(\cdot)$ throughout, denoting a dependence of the implied constant on a parameter $t$ by a subscript. As mentioned earlier, we use $f\asymp g$ to denote that both $f\ll g$ and $g\ll f$ hold. Throughout the paper $\epsilon$ will denote a single fixed positive constant which is sufficiently small; $\epsilon=10^{-100}$ would probably suffice. In particular, any implied constants may depend on $\epsilon$. We will assume that $X$ is always a suitably large integral power of 10 throughout. We will exclusively use the letter $p$ to denote a prime number, without always making this restriction explicit.
We will use the nonstandard notation that n ∼ X to mean that n lies in the interval (X/10, X] throughout the paper.
Several variables will be assumed to be non-negative integers, without directly specifying this. Thus sums such as $\sum_{n<X}$ will be assumed to be over integers $n$ with $0\le n<X$, for example. The usage should be clear from the context.
It will be convenient to normalize the Fourier transform of $A$, and to be able to view it at different scales. With this in mind, we define a normalized Fourier transform $F_Y$. Whenever we encounter the function $F_Y$ we assume that $Y$ is a positive integral power of 10 (or of $q$ in Section 16). We use $\|\cdot\|$ to denote the distance to the nearest integer, and $\|\cdot\|_2$ to denote the standard Euclidean norm. We use $1_{A_1}$ for the indicator function of the set $A_1$ of integers with restricted digits.
Here $e(x)=e^{2\pi i x}$ is the complex exponential function.
We need to make use of various numerical estimates throughout the paper, some of which succeed only by a small margin. We have endeavored to avoid too many explicit calculations and we encourage the reader to not pay too much attention to the numerical constants appearing on a first reading.

Structure of the paper
In Section 6, we use a sieve decomposition to reduce the proof of Theorem 1.1 to the proof of Proposition 6.1 and Proposition 6.2, which are asymptotic estimates for particular types of terms arising from sieve decompositions. These propositions are established in Section 7.
In Section 7, we use sieve theory to reduce the proof of Proposition 6.1 and Proposition 6.2 to the proof of Proposition 7.1 and Proposition 7.2, which are our 'Type I' and 'Type II' estimates. These will be established in Section 8 and Section 9 respectively.
In Section 8 we use a large sieve argument to reduce the proof of our Type I estimate Proposition 7.1 to that of Lemma 8.1 and Lemma 8.2, which are Fourier ℓ ∞ and ℓ 1 bounds. These will be established in Section 10.
In Section 9 we use the circle method and geometric decompositions to reduce the proof of our Type II estimate Proposition 7.2 to that of Proposition 9.1, Proposition 9.2 and Proposition 9.3, which are our estimates for the 'major arcs', the 'generic minor arcs' and the 'exceptional minor arcs'. These will be established in Sections 11, 12 and 13 respectively.
In Section 10 we establish various Fourier estimates. In particular we establish Lemma 8.1 and Lemma 8.2, as well as several auxiliary lemmas which will be used in later sections.
In Section 11 we use results on primes in arithmetic progressions to establish our major arc estimate Proposition 9.1, making use of the estimates of Section 10.
In Section 12 we use Fourier moment bounds from Section 10 to establish our generic minor arc estimate Proposition 9.2.
In Section 13 we use the geometry of numbers to reduce the proof of the exceptional minor arc estimate Proposition 9.3 to the proof of Proposition 13.3 and Proposition 13.4, which are estimates for frequencies constrained to lie in low height lattices or on low height lines. These will be established in Section 14 and Section 15 respectively.
In Section 14 we establish our estimate for low height lattices Proposition 13.3, using the estimates of Section 10.
In Section 15 we establish our estimate for low height lines Proposition 13.4, using the geometric counting estimates and the results of Section 10. This completes the proof of Theorem 1.1.
In Section 16, we sketch the modifications in the argument required to establish Theorem 1.2.
In particular, the dependency graph between the main statements in the proof of Theorem 1.1 is as follows:

Basic estimates
We will make frequent use of some well-known facts in analytic number theory without extra comment. In particular, we make use of the Prime Number Theorem in short intervals and arithmetic progressions with error term (see [10, Chapter 22], for example). This states that for any $A>0$ we have

We recall the following sieve estimate (see, for example, [18, Theorem 7.11]), stated in terms of the Buchstab function $\omega(u)$, which is defined by the delay-differential equation
$$\omega(u)=\frac{1}{u}\quad(1\le u\le 2),\qquad \frac{d}{du}\big(u\,\omega(u)\big)=\omega(u-1)\quad(u>2).$$

We recall some results from the geometry of numbers and Minkowski's theory of successive minima (see, for example, [9, Page 110]). A lattice in $\mathbb{R}^k$ is a discrete subgroup of the additive group $\mathbb{R}^k$. For any lattice $\Lambda$ there is a Minkowski-reduced basis $\{v_1,\dots,v_r\}$ of linearly independent vectors in $\mathbb{R}^k$ such that
$$\Lambda=\{n_1v_1+\dots+n_rv_r : n_1,\dots,n_r\in\mathbb{Z}\},$$
and for any $x_1,\dots,x_r\in\mathbb{R}$ we have
$$\Big\|\sum_{i=1}^r x_iv_i\Big\|_2\asymp\sum_{i=1}^r|x_i|\,\|v_i\|_2,$$
and with $\|v_1\|_2\cdots\|v_r\|_2\asymp\det(\Lambda)$, where these implied constants depend only on the ambient dimension $k$. Here $\det(\Lambda)$ is the $r$-dimensional volume of the fundamental parallelepiped, given by
$$\det(\Lambda)=\det\big(\langle v_i,v_j\rangle\big)_{1\le i,j\le r}^{1/2}.$$
We say $r$ is the rank of the lattice. The properties of the Minkowski-reduced basis above indicate that each generating vector $v_i$ has a positive proportion of its length in a direction orthogonal to all the other basis vectors.
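In rank 2, a Minkowski-reduced basis can be computed by the classical Lagrange-Gauss reduction; this is a standard algorithm, not one used explicitly in the paper. A reduced pair $u,v$ satisfies $\|u\|_2\le\|v\|_2$ and $|\langle u,v\rangle|\le\|u\|_2^2/2$. This forces the angle between them to lie between 60 and 120 degrees, and gives $3\|u\|_2^2\|v\|_2^2\le4\det(\Lambda)^2$. That inequality is the 'positive proportion of length in an orthogonal direction' in its simplest form.

The sketch below handles integer lattices in $\mathbb{Z}^2$; the helper names are hypothetical.

```python
from fractions import Fraction

def dot(x, y) -> int:
    return x[0] * y[0] + x[1] * y[1]

def det2(u, v) -> int:
    """Area of the parallelogram spanned by u and v, i.e. det of the lattice they generate."""
    return abs(u[0] * v[1] - u[1] * v[0])

def lagrange_gauss_reduce(u, v):
    """Reduce a basis (u, v) of a rank-2 lattice in Z^2 (u, v linearly independent)."""
    if dot(u, u) > dot(v, v):
        u, v = v, u
    while True:
        # Subtract the nearest integer multiple of u from v, rounding exactly.
        m = round(Fraction(dot(u, v), dot(u, u)))
        v = (v[0] - m * u[0], v[1] - m * u[1])
        if dot(v, v) >= dot(u, u):
            return u, v
        u, v = v, u  # v became shorter than u: swap and continue

if __name__ == "__main__":
    print(lagrange_gauss_reduce((1, 0), (1000, 1)))
    u, v = lagrange_gauss_reduce((101, 37), (303, 112))
    print(u, v, det2(u, v))
```

Each step is a unimodular change of basis, so the determinant is preserved while the vector lengths strictly decrease until the loop stops.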
Sieve Decomposition and proof of Theorem 1.1

First, we prove Theorem 1.1 assuming two key propositions, given below. This reduces the problem to establishing Proposition 6.1 and Proposition 6.2, which we do over the remaining sections.
As remarked in Section 2, it suffices to consider $X$ as a power of 10. If $X=10^k$ we will think of all elements of $A$ as having $k$ digits, none of which is equal to $a_0$. This is equivalent to slightly changing the definition of $A$ in the case when $a_0=0$ (since it restricts $A$ to $(X/10,X]$), but by considering $X,X/10,X/100,\dots$ we see that we can easily recover Theorem 1.1 for the original set $A$ from this situation.
We will make a decomposition of $\#\{p\in A\}$ into various terms following Harman's sieve (see [15] for more details). Each of these terms can then be asymptotically estimated by Proposition 6.1 or Proposition 6.2 (given below), or can be trivially bounded below by 0. To keep track of the terms in this decomposition we apply the same decomposition to the set $B=\{n\in\mathbb{Z}:0\le n<X\}$ by considering a weighted sequence $w_n$.
Let $w_n$ be weights supported on non-negative integers $n<X$ given by

(We recall that $1_A$ is the indicator function of $A$, and $\kappa_A$ is the constant given by (1.1).) For a set $C$ we define

Given an integer $d>0$ and a real number $z>0$, let

We expect that $S_d(z)$ is typically small for a wide range of $d$ and $z$. The following two propositions show that this is the case for certain $d,z$.
Proposition 6.1 includes the case $\ell=0$, where we interpret the statement as

Proposition 6.2 (Type II terms). Fix an integer $\ell\ge1$. Let $\theta_1,\theta_2,\mathcal{L}$ be as in Proposition 6.1, and let $I\subseteq\{1,\dots,\ell\}$ and $j\in\{1,\dots,\ell\}$. Then we have

We note that by inclusion-exclusion the same result holds if some of the inequalities $L\ge0$ are replaced by the strict inequality $L>0$.
We first consider the upper bound for Theorem 1.1, which is essentially a standard sieve upper bound. Since $\theta_2-\theta_1<1/2$, we have

Thus, using (6.3) and the fact (5.2) that there are $O(X/\log X)$ integers in $[0,X]$ with no prime factors smaller than $X^{\theta_2-\theta_1}$, we have

Thus it suffices to establish the lower bound.
To simplify notation, we let $z_1\le z_2\le z_3\le z_4\le z_5\le z_6$ be given by

We have

Thus we wish to bound $S_1(z_4)$ from below. By Buchstab's identity (i.e. inclusion-exclusion on the least prime factor) we have

The term $S_1(z_1)$ is $o(\#A/\log X)$ by (6.3) from Proposition 6.1. We split the sum over $p$ into ranges $(z_i,z_{i+1}]$, and see that all the terms with $p\in(z_2,z_3]$ are also negligible by Proposition 6.2. This gives

We wish to replace $S_p(p)$ by $S_p(\min(p,(X/p)^{1/2}))$. We note that these are the same when $p\le X^{1/3}$, but if $p>X^{1/3}$ then there are additional terms in $S_p((X/p)^{1/2})$ from primes in the interval $((X/p)^{1/2},p]$. For $\delta=1/(\log X)^{1/2}$, by the prime number theorem and Proposition 6.1, we have

Here, and throughout this section, $q$ is restricted to being a prime number. Similarly, we get corresponding bounds for $S(B_p,\min(p,(X/p)^{1/2}))$, and so we can replace $S_p(p)$ with $S_p(\min(p,(X/p)^{1/2}))$ at the cost of a small error.
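Buchstab's identity rests on a simple case split. An integer with no prime factor below $w$ either has no prime factor below $z$, or has least prime factor some $p\in[w,z)$. In the second case it is $p$ times an integer with no prime factor below $p$.

The sketch below verifies this on unweighted counts of integers $1\le n\le N$. The paper applies the same decomposition to the weights $w_n$ and its sums $S_d(z)$; the helper names here are hypothetical.

```python
from math import isqrt

def least_prime_factor_table(N: int) -> list:
    """lpf[n] = least prime factor of n for 2 <= n <= N (lpf[1] = 1 is unused)."""
    lpf = list(range(N + 1))
    for p in range(2, isqrt(N) + 1):
        if lpf[p] == p:  # p is prime
            for m in range(p * p, N + 1, p):
                if lpf[m] == m:
                    lpf[m] = p
    return lpf

def S(lpf: list, M: int, z: int) -> int:
    """#{1 <= n <= M : every prime factor of n is >= z}; n = 1 always counts."""
    return sum(1 for n in range(1, M + 1) if n == 1 or lpf[n] >= z)

def buchstab_rhs(lpf: list, N: int, w: int, z: int) -> int:
    """S(N, w) - sum over primes w <= p < z of #{1 <= m <= N/p : prime factors of m >= p}."""
    primes = [p for p in range(max(w, 2), min(z, N + 1)) if lpf[p] == p]
    return S(lpf, N, w) - sum(S(lpf, N // p, p) for p in primes)

if __name__ == "__main__":
    N = 10 ** 4
    lpf = least_prime_factor_table(N)
    print(S(lpf, N, 100), buchstab_rhs(lpf, N, 10, 100))
```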
Using this, and applying Buchstab's identity again, we have

The first two terms above are asymptotically negligible by Proposition 6.1, and so this simplifies to

We perform further decompositions to the remaining terms in (6.5). We first concentrate on the first term on the right-hand side. Splitting the ranges of $pq$ into intervals, and recalling that those with $pq$ in the interval $[z_2,z_3]$ or $[z_5,z_6]$ make a negligible contribution by Proposition 6.2, we obtain

Here we have dropped the condition $q\le(X/p)^{1/2}$ in the final sum, since this is implied by $q\le p$ and $pq\le z_2$. On recalling the definition (6.1) of $w_n$, we can lower bound the first term of (6.6) by dropping the non-negative contribution from the set $A$ via $w_n\ge-\kappa_A\#A/X$. By partial summation, and using the estimate (5.2), this gives

Here $\omega(u)$ is Buchstab's function, and $P^-(n)$ denotes the least prime factor of $n$.
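Lower bounds of this kind are evaluated numerically with values of Buchstab's function. $\omega$ can be tabulated from the integrated form of its defining equation: $\omega(u)=1/u$ for $1\le u\le2$, and $u\,\omega(u)=1+\int_1^{u-1}\omega(t)\,dt$ for $u>2$.

The minimal sketch below assumes a fixed grid and the trapezoidal rule; this is a simple choice made here, not the scheme behind the paper's numerical integration. It recovers $\omega(3)=(1+\log2)/3$ and the limiting value $e^{-\gamma}$.

```python
from math import exp, log

EULER_GAMMA = 0.5772156649015329

def buchstab_omega_table(u_max: float, steps_per_unit: int = 2000):
    """Tabulate Buchstab's omega on [1, u_max] with grid step h = 1/steps_per_unit.

    Uses omega(u) = 1/u on [1, 2] and, for u > 2, the integrated form
    u * omega(u) = 1 + integral_1^{u-1} omega(t) dt, with the trapezoidal rule.
    """
    h = 1.0 / steps_per_unit
    n = int(round((u_max - 1) * steps_per_unit))
    omega = [0.0] * (n + 1)
    cumulative = [0.0] * (n + 1)  # cumulative[m] ~ integral_1^{1 + m h} omega(t) dt
    for j in range(n + 1):
        u = 1 + j * h
        if j <= steps_per_unit:
            omega[j] = 1 / u
        else:
            # u - 1 corresponds to grid index j - steps_per_unit, already integrated.
            omega[j] = (1 + cumulative[j - steps_per_unit]) / u
        if j > 0:
            cumulative[j] = cumulative[j - 1] + h * (omega[j - 1] + omega[j]) / 2
    return h, omega

def omega_at(u: float, h: float, omega: list) -> float:
    """Look up omega(u) at the nearest grid point (u must lie in the tabulated range)."""
    return omega[int(round((u - 1) / h))]

if __name__ == "__main__":
    h, omega = buchstab_omega_table(10.0)
    for u in (1.5, 2.0, 3.0, 4.0, 6.0, 10.0):
        print(u, omega_at(u, h, omega))
    print("(1 + log 2)/3 =", (1 + log(2)) / 3, " exp(-gamma) =", exp(-EULER_GAMMA))
```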
We perform further decompositions to the second term of (6.6), first splitting according to the size of $q^2p$ compared with $z_6$.
The first term above is counting products of exactly three primes, and for these terms we drop the contribution from $A$ for a lower bound. By partial summation and the prime number theorem, this gives
For the terms not coming from products of 3 primes, we split our summation according to the size of $qr$, noting that this is negligible if $qr\in[z_2,z_3]$ by Proposition 6.2. For the terms with $qr\notin[z_2,z_3]$ we just take the trivial lower bound. Thus

where $R_1$ and $R_2$ are given by

Together (6.9), (6.10) and (6.11) give a suitable lower bound for the terms in (6.8) with $q^2p\ge z_6$.
When $q^2p<z_6$ we can apply two further Buchstab iterations, since then we can evaluate terms $S_{pqr}(z_1)$ with $r\le q\le p$ using Proposition 6.1, as $pqr\le pq^2<z_6$. As before, we may replace $S_{pq}(q)$ by $S_{pq}(\min(q,(X/pq)^{1/2}))$ and $S_{pqr}(r)$ by $S_{pqr}(\min(r,(X/pqr)^{1/2}))$ at the cost of negligible error terms (since $pqr<z_6$). This gives

where $r,s$ are restricted to primes in the sums above. Finally we see that any part of the final sum with a product of two of $p,q,r,s$ in $[z_2,z_3]$ can be discarded by Proposition 6.2. Trivially lower bounding the remaining terms as we did before yields
$$\sum_{\substack{z_1<s\le r\le q\le p\le z_2\\ q^2p<z_6,\ z_3\le pq<z_5\\ r^2pq,\ s^2pqr\le X}} S_{pqrs}(s)$$
where $R_3$ is given by

This completes our decomposition of the terms from (6.8), coming from the second term of (6.6). We note that we could have imposed various further restrictions such as $u+v+w\notin[\theta_1,\theta_2]$ in $R_3$, but for ease of calculation we do not include these.
We perform decompositions to the third term of (6.6) in a similar way to how we dealt with the second term. We have $q^2p<(qp)^{3/2}<z<z_6$ so, as above, we can apply two Buchstab iterations and use Proposition 6.1 to evaluate the terms $S_{pqr}(z_1)$, since we have $pqr\le pq^2<z_6$. Furthermore, we notice that terms with any of $pqr$, $pqs$, $prs$ or $qrs$ in $[z_2,z_3]\cup[z_5,z_6]$ are negligible by Proposition 6.2. This gives
$$\sum_{\substack{z_1<q\le p\le z_2\\ z_1\le pq<z_2}} S_{pqr}(z_1)+\cdots$$
We note that for $R_4$ we have dropped different constraints to those we dropped in $R_3$.
We are left to consider the second term from (6.5), which consists of the remaining terms with $p\in(z_3,z_4]$. We treat these in a similar manner to those with $p\le z_2$. We first split the sum according to the size of $qp$. Terms with $qp\in[z_5,z_6]$ are negligible by Proposition 6.2, so we are left to consider $qp\in(z_3,z_5)$ or $qp>z_6$. We then split the terms with $qp\in(z_3,z_5)$ according to the size of $q^2p$ compared with $z_6$. This gives an expression, (6.14), in terms of sums of $S_{pq}(q)$.
We apply two further Buchstab iterations to $S_3$ (we can handle the intermediate terms using Proposition 6.1 as before, since $q^2p<z_6$). As before, we may replace $S_{pq}(q)$ by $S_{pq}(\min(q,(X/pq)^{1/2}))$ and $S_{pqr}(r)$ by $S_{pqr}(\min(r,(X/pqr)^{1/2}))$ at the cost of a negligible error term (since $pqr<z_6$). This gives

where

Together (6.14), (6.15) and (6.16) give our lower bound for the second term from (6.5), which consists of all the terms with $p\in(z_3,z_4]$. This completes our lower bound for $S_1(z_4)$.
Thus in this case we have $I_1+\dots+I_9<0.996$, and so by continuity we have $I_1+\dots+I_9<0.996+O(\epsilon)$ when $\theta_1=9/25+2\epsilon$ and $\theta_2=17/40-2\epsilon$. Thus, taking $\epsilon$ suitably small, we see that (6.17) holds, and so we have completed the proof of Theorem 1.1 for $X$ sufficiently large. If $X\ge4$ is bounded by a constant, then Theorem 1.1 follows (after potentially adjusting the implied constants) on noting that at least one of 2 and 3 is a prime in $A$.
We note that there are various ways in which one can improve the numerical estimates, but we have restricted ourselves to the above decomposition in the interests of clarity. Judiciously employing further Buchstab decompositions would give small numerical improvements, for example.
Thus it suffices to establish Propositions 6.1 and 6.2.

Sieve Asymptotics
In this section we prove Proposition 6.1 and Proposition 6.2 assuming Proposition 7.1 and Proposition 7.2, given below. This reduces the problem to proving standard 'Type I' and 'Type II' estimates. These propositions will then be proven in Sections 8 and 9.
Before we state the propositions, we set up some extra notation. Let

By a closed convex polytope in $\mathbb{R}^\ell$ we mean a region $R$ defined by a finite number of non-strict affine linear inequalities in the coordinates (when such a region is bounded, this is equivalent to it being the convex hull of a finite set of points in $\mathbb{R}^\ell$). Given a closed convex polytope $R\subseteq Q_\ell(\eta)$, we let

We caution that $1_R$ counts numbers with a particular type of prime factorization, and should not be confused with $1_A$, the indicator function of the set $A$. We recall $B=\{n\in\mathbb{Z}:0\le n<X\}$.
Our two key propositions that we will use are given below.
be a closed convex polytope in $\mathbb{R}^\ell$ which has the property that

if $(10,a_0)=1$, and $10/9$ otherwise.
Proposition 6.2 follows quickly from Proposition 7.2, but it will be convenient to establish a slightly more general version where the primes can be as small as $X^\eta$.
As before, we note that by inclusion-exclusion the same result holds if some of the constraints $L\ge0$ are replaced with $L>0$. We see Proposition 6.2 follows immediately from Lemma 7.3 on choosing $\eta=\theta_2-\theta_1$.
Proof of Lemma 7.3 assuming Proposition 7.2. We just deal with the case when $\prod_{i\in I}p_i\in[X^{\theta_1},X^{\theta_2}]$; the other case is entirely analogous, with $\theta_1$ and $\theta_2$ simply replaced with $1-\theta_2$ and $1-\theta_1$ throughout. (Notice that if $e\in R\subseteq Q_\ell(\eta)$ satisfies $\sum_{i\in I}e_i\in[23/40+\epsilon,16$

Recall the definition (6.2) of $S_d(z)$. We see that $S_{p_1\cdots p_\ell}(p_j)$ is a sum of $w_n$ only involving integers $n$ with at most $1/\eta$ prime factors, since all prime factors are of size at least $X^\eta$. The terms with exactly $r$ prime factors (for some $r\le1/\eta$) are a sum of $w_{p_1\cdots p_r}$ over $p_1,\dots,p_r$, with the summation only restricted by a bounded number of linear inequalities on $\log p_1/\log X,\dots,\log p_r/\log X$. (These are the previous restrictions on $p_1,\dots,p_\ell$, and the restriction $p_j\le p_{\ell+1}\le\dots\le p_r$.) We may write the condition $X^\eta\le p_1$ and the restrictions on the sizes of $\prod_{i\in I}p_i$ and $\prod_{i=1}^{\ell}p_i$ as linear conditions only involving $\log p_1/\log X,\dots,\log p_\ell/\log X$, with coefficients depending only on $\eta$. Thus, after enlarging $\mathcal{L}$ to include these conditions, it suffices to show that

where $\sum^*$ indicates that the summation is restricted by the conditions

Let $\delta=1/\log\log X$. We first trivially discard the contribution from $n=p_1\cdots p_r<X^{1-\delta}$. Each $n$ appears $O_\eta(1)$ times in (7.1), so recalling the definition (6.1) of $w_n$ and dropping the other constraints, the total contribution from such terms is

Thus it is sufficient to show

Since we have the constraint $p_1\cdots p_\ell\le X/p_j\le X^{1-\eta}$, the result follows immediately if $r=\ell$ (if $\eta<\delta$ the result is trivial). Thus we may assume that $r>\ell$, so none of the constraints involve all the $p_i$. We now wish to replace $\log p_i/\log X$ with $\log p_i/\log n$ in the conditions (7.2). For $n\in[X^{1-\delta},X]$, we have

and so if exactly one of $L\big(\frac{\log p_1}{\log X},\dots,\frac{\log p_\ell}{\log X}\big)$ and $L\big(\frac{\log p_1}{\log n},\dots,\frac{\log p_\ell}{\log n}\big)$ is non-negative, we must have

To bound the contribution of such terms, let $\gamma>0$ be a parameter and

(Here the summation is over all choices of primes $p_1,\dots,p_r$, and for any such choice $n=p_1\cdots p_r$. We do not restrict to $n\ge X^{1-\delta}$ in the summation.) We wish to show that if $\gamma=o_{\mathcal{L},\eta}(1)$ then $G(\gamma,L)=o_{\mathcal{L},\eta}(\#A/\log X)$, and we will do this by first thinking of $\gamma$ as fixed but very small.
We split the sum into at most $r!=O_\eta(1)$ subsums where the variables are ordered (we potentially double-count the contribution from $p_i=p_{i'}$ for an upper bound). Thus, after relabelling the $p_i$, we see that

Then $R$ satisfies the conditions of Proposition 7.2, so

By the Prime Number Theorem and partial summation, we have

Since all components of elements of $R$ are at least $\eta$, the integral is bounded by $\eta^{-r}$ times the $(r-1)$-dimensional volume of $R$. Since $L$ involves at most $\ell\le r-1$ coordinates and $R\subseteq[\eta,1]^r$, this volume is $O_{\mathcal{L},\eta}(\gamma)$. Thus

If $\gamma\to0$ as $X\to\infty$ suitably slowly, we see that this shows that $G(\gamma,L)=o_{\mathcal{L},\eta}(\#A/\log X)$. But from the definition of $G$, we see that $G(\gamma,L)$ is non-decreasing in $\gamma$, so in fact we deduce that for any $\gamma=o_{\mathcal{L},\eta}(1)$ we have $G(\gamma,L)=o_{\mathcal{L},\eta}(\#A/\log X)$.
We see from (7.5) that the error introduced to (7.4) by replacing $\log p_i/\log X$ with $\log p_i/\log n$ in the conditions (7.2) is $O\big(\sum_{L\in\mathcal{L}}G(\gamma,L)\big)$ for some $\gamma\ll_{\mathcal{L}}\delta=o_{\mathcal{L}}(1)$. By the above discussion, this is $o_{\mathcal{L},\eta}(\#A/\log X)$, which is negligible.
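One way to see why the error is controlled by $\delta$ is the following one-line estimate. Since $p_i\mid n$ and $X^{1-\delta}\le n\le X$, we have $\log p_i\le\log n\le\log X$ and $\log n\ge(1-\delta)\log X$, so

```latex
0 \le \frac{\log p_i}{\log n} - \frac{\log p_i}{\log X}
  = \frac{\log p_i}{\log X}\Big(\frac{\log X}{\log n} - 1\Big)
  \le \frac{1}{1-\delta} - 1
  = \frac{\delta}{1-\delta}.
```

Hence, for an affine form $L(x)=c_0+\sum_i c_ix_i$, the two evaluations differ by at most $\big(\sum_i|c_i|\big)\delta/(1-\delta)\ll_{\mathcal{L}}\delta$. If exactly one of them is non-negative, the two values have opposite signs. This forces $|L(\log p_1/\log X,\dots,\log p_\ell/\log X)|\ll_{\mathcal{L}}\delta$, which is why $\gamma\ll_{\mathcal{L}}\delta$ suffices.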
After making this change, we may reintroduce the terms with $n<X^{1-\delta}$ at the cost of a negligible error by using the bound (7.3) again. Thus

where $\sum^{**}$ indicates the sum is constrained to

for all $L\in\mathcal{L}$. Moreover, since we had the constraint $\prod_{i\in I}p_i\in[X^{\theta_1},X^{\theta_2}]$ in (7.2), this second sum includes the constraint $\prod_{i\in I}p_i\in[n^{\theta_1},n^{\theta_2}]$. We now split the summation into $O_\eta(1)$ subsums where the $p_i$ are totally ordered. After relabelling the coordinates, Proposition 7.2 applies to each of these sums, since the linear constraints $L\ge0$ for $L\in\mathcal{L}$ define a closed convex polytope (depending only on $\mathcal{L}$), and the ordering of the variables ensures that this lies within $Q_r(\eta)$ (recall that the constraint $X^\eta\le p_1$ becomes $n^\eta\le p_1$, so all primes are at least $n^\eta$). The constraint $\prod_{i\in I}p_i\in[n^{\theta_1},n^{\theta_2}]$ corresponds to the sum of a subset of the coordinates of all points in the polytope lying in $[\theta_1,\theta_2]$. Proposition 7.2 shows that the contribution from each such sum is $o_{\mathcal{L},\eta}(\#A/\log X)$. Since there are $O_\eta(1)$ such sums, the total contribution is $o_{\mathcal{L},\eta}(\#A/\log X)$, giving the result.
Our aim for the remainder of this section is to establish Proposition 6.1 using Proposition 7.1 and Proposition 7.2. We first establish an auxiliary lemma.
The implied constant is independent of δ.
Proof of Lemma 7.4 assuming Proposition 7.1. If $\delta>\epsilon^4$ then, since $S(C,X^t)$ is non-negative and decreasing in $t$ for any set $C$, we have

By the rough number estimate (5.2) again, we see that the sum of $1/d$ over $d<X$ with all prime factors bigger than $X^\delta$ is $O_\delta(1)$. Thus the result for $\delta>\epsilon^4$ follows from the result for $\delta=\epsilon^4$, so we may assume without loss of generality that $\delta\le\epsilon^4$. Let

Then $\#A'=\kappa\#A$, where $\kappa$ is the constant given in Proposition 7.1. Let $R_d(e)$ be defined by

We put $q=de$ and see from Proposition 7.1 that for any $A>0$ the error terms $R_d(e)$ satisfy

By the fundamental lemma of sieve methods (see, for example, [14, Theorem 6.9]) we have

Summing over $d$ and using the bound (7.6), we obtain

The product in the final bound is $O(\delta^{-1}(\log X)^{-1})$, and the inner sum over $d$ is seen to be $O(\delta^{-1})$ by an Euler product upper bound. Finally, since we are assuming

An identical argument works for the set $B'=\{n<X:(n,10)=1\}$ instead of $A'$. This gives

We see that for $(d,10$, and that $\#B'=\varphi(10)\#B/10$. Thus, by the triangle inequality

We bound the first summation by (7.7), the second summation by (7.8), and note that since $\#B'=\varphi(10)\#B/10$, the third summation is zero. Since $\kappa_A=10\kappa/\varphi(10)$, this gives

Using Lemma 7.4 we can now prove Proposition 6.1.
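The size of the product comes from Mertens' third theorem, $\prod_{p<z}(1-1/p)\sim e^{-\gamma}/\log z$. With $z=X^\delta$ this gives $\asymp1/(\delta\log X)$. This assumes, as is usual in the fundamental lemma, that the product runs over primes below the sifting level $X^\delta$.

A quick numerical sanity check (a sketch; the helper name is hypothetical):

```python
from math import exp, isqrt, log

EULER_GAMMA = 0.5772156649015329

def mertens_product(y: int) -> float:
    """prod_{p < y} (1 - 1/p) over primes p < y, using a sieve of Eratosthenes."""
    if y < 3:
        return 1.0
    is_prime = [True] * y
    is_prime[0] = is_prime[1] = False
    for p in range(2, isqrt(y - 1) + 1):
        if is_prime[p]:
            for m in range(p * p, y, p):
                is_prime[m] = False
    product = 1.0
    for p in range(2, y):
        if is_prime[p]:
            product *= 1 - 1 / p
    return product

if __name__ == "__main__":
    for y in (10 ** 3, 10 ** 4, 10 ** 5, 10 ** 6):
        # Ratio to the Mertens asymptotic e^{-gamma} / log y; it tends to 1.
        print(y, mertens_product(y) * log(y) * exp(EULER_GAMMA))
```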
We first consider the contribution from $p_1\cdots p_\ell < X^{\theta_1}$. Given a set $\mathcal{C}$ and an integer $d$, we let Buchstab's identity shows that We define $T_0(\mathcal{C}; d) = S(\mathcal{C}; X^\delta)$ and $V_0(\mathcal{C}; d) = 0$. This gives for $d \le X^{\theta_1}$ We apply the above decomposition to $\mathcal{A}_d$. This gives an expression with Applying the same decomposition to $\mathcal{B}_d$, taking the weighted difference, and summing over Here $\sum'$ indicates we are summing over all choices of $p_1, \dots, p_\ell$ which appear in the summation in Proposition 6.1 with the additional condition that $d = p_1\cdots p_\ell < X^{\theta_1}$.
We note that $p_1, \dots, p_\ell \ge X^\theta$, so $d$ has $O(1)$ prime factors and any integer $e$ can be represented $O(1)$ times as $dp'_1\cdots p'_m$ for some primes $p'_m \le \cdots \le p'_1$ and some choice of $p_1, \dots, p_\ell$ defining $d$. Thus, expanding the definition of $T_m$, if $\delta \le \epsilon$ we have Here we applied Lemma 7.4 in the last line, using $\delta \ge 1/\log\log X$.
We now consider the $V_m$ terms. We expand the definition of $V_m$ as a sum. We note that $p'_m \le X^\theta = X^{\theta_2-\theta_1}$, so the summation is constrained by $X^{\theta_1} \le dp'_1\cdots p'_m \le X^{\theta_2}$, which is our Type II constraint. We see that all terms have $dp'_1\cdots p'_m \le X/p'_m$, so we can insert this condition without changing the sum. We recall $p_1, \dots, p_\ell$ are constrained only by some linear conditions on $\log p_1/\log X, \dots, \log p_\ell/\log X$. Thus the sum is of the form considered in Lemma 7.3 with $\eta = \delta$, since all the conditions in the summation can be written as linear constraints on $\log p_i/\log X$ for $1 \le i \le \ell$ and $\log p'_j/\log X$ for $1 \le j \le m$. Thus, by Lemma 7.3, we have Putting together (7.9), (7.10) and (7.11), we obtain Letting $\delta \to 0$ sufficiently slowly then gives the result for $d < X^{\theta_1}$.
The contribution from $d$ with $X^{\theta_2} < d < X^{1-\theta_2}$ can be handled by an identical argument, where instead of restricting to $dp'_1\cdots p'_m \le X^{\theta_1}$ and $X^{\theta_1} < dp$ We put $d = p_1\cdots p_\ell$ and sum over $p_1, \dots, p_\ell$ satisfying the constraints imposed by $\mathcal{L}$ and such that $d \in [X^{1-\theta_2}, X^{1-\theta_1}]$. The first term makes a negligible total contribution by Lemma 7.4, since $d \le X^{1-\theta_1} < X^{50/77-\epsilon}$. The second term makes a negligible total contribution by Lemma 7.3 (noting that $dp \le$ Together these cover the whole range $p_1\cdots p_\ell \le X^{1-\theta_1}$, giving the result.
Thus, since Lemma 7.3 and Lemma 7.4 follow from Proposition 7.1 and Proposition 7.2, it suffices to establish Proposition 7.1 and Proposition 7.2.
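Buchstab's identity, used in the proof of Proposition 6.1 above, is the standard recursion $S(\mathcal{C}, z) = S(\mathcal{C}, w) - \sum_{w \le p < z} S(\mathcal{C}_p, p)$ for $2 \le w \le z$. Here $S(\mathcal{C}, z)$ counts the elements of $\mathcal{C}$ with no prime factor below $z$. The minimal Python sketch below checks the identity by brute force on small sets. It assumes that $\mathcal{C}_p$ denotes the multiples of $p$ in $\mathcal{C}$; this is a common convention and not necessarily the exact normalization of the text. The helper names `primes_below`, `sift` and `buchstab_rhs` are ours.

```python
from math import isqrt


def primes_below(z):
    """Primes p < z, via a simple sieve of Eratosthenes."""
    if z <= 2:
        return []
    sieve = [True] * z
    sieve[0] = sieve[1] = False
    for p in range(2, isqrt(z - 1) + 1):
        if sieve[p]:
            sieve[p * p::p] = [False] * len(range(p * p, z, p))
    return [p for p in range(z) if sieve[p]]


def sift(C, z):
    """S(C, z): the number of n in C with no prime factor p < z."""
    small = primes_below(z)
    return sum(1 for n in C if all(n % p for p in small))


def buchstab_rhs(C, w, z):
    """S(C, w) - sum_{w <= p < z} S(C_p, p), with C_p the multiples of p in C."""
    total = sift(C, w)
    for p in primes_below(z):
        if p >= w:
            C_p = [n for n in C if n % p == 0]
            total -= sift(C_p, p)
    return total
```

For example, `sift(range(1, 101), 11)` counts 1 together with the 21 primes between 11 and 100, giving 22. Each $n$ counted by $S(\mathcal{C}, w)$ but not by $S(\mathcal{C}, z)$ is removed exactly once, namely at its least prime factor $p \in [w, z)$.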

Type I estimate
In this section we establish our 'Type I' estimate, Proposition 7.1, assuming the more technical Lemmas 8.1 and 8.2, which we will establish later in Section 10. We recall that Proposition 7.1 describes the number of elements of $\mathcal{A}$ in arithmetic progressions to moduli up to $X^{50/77-\epsilon} \approx X^{0.65}$ on average.
Our Type I estimate is based on suitable bounds on the Fourier transform of the set $\mathcal{A}$. We recall our definition of the function $F_Y$ from (3.1), which is a normalized version of $S_{\mathcal{A}}$. In particular, $|S_{\mathcal{A}}(\theta)| = \#\mathcal{A}\cdot F_X(\theta)$. The two key lemmas which we use in this section are the following.
Lemma 8.2 ($\ell^\infty$ bound). Let $q < Y^{1/3}$ be of the form $q = q_1q_2$ with $(q_1, 10) = 1$ and $q_1 > 1$, and let $|\eta| < Y^{-2/3}/2$. Then for any integer $a$ coprime with $q$ we have Proof of Proposition 7.1 assuming Lemma 8.1 and Lemma 8.2. By Möbius inversion and using additive characters, we have for $(q, 10) = 1$ We note that $\#\{a \in \mathcal{A} : (a, 10) = 1\} = \kappa\#\mathcal{A}$. Summing over $q < Q$ with $(q, 10) = 1$ and letting $q = q'q''$, we obtain Here we recall our notation that $q' \sim Q_1$ means $q' \in (Q_1/10, Q_1]$. By Lemma 8.1 we have for any $d \mid 10$ Thus we see that the bound (8.1) is $O_A(\#\mathcal{A}/(\log X)^A)$ in either case, as required.
We are left to establish Proposition 7.2 and Lemma 8.1 and Lemma 8.2.

Type II estimate
In this section we reduce our 'Type II' estimate to various major arc and minor arc estimates. In particular, we will reduce the proof of Proposition 7.2 to the proof of Propositions 9.1, 9.2 and 9.3. We first recall the statement of Proposition 7.2, which allows us to count integers in $\mathcal{A}$ with a specific type of prime factorization, provided such numbers always have a 'conveniently sized' factor.
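To avoid technical issues due to the fact that $\sum_{n<Y}\mathbf{1}_{\mathcal{A}}(n)$ can fluctuate with $Y$, we will replace our counts $\mathbf{1}_{\mathcal{R}}(n)$ with a weight $\Lambda_{\mathcal{R}}$, where for a set $\mathcal{R} \subseteq [\eta, 1]^\ell$ we define We note that in $\Lambda_{\mathcal{R}}$ the conditions are on $\log p_i/\log X$, whereas in $\mathbf{1}_{\mathcal{R}}$ the conditions are on $\log p_i/\log n$. If every $\mathbf{e} \in \mathcal{R}$ has $e_1 \le \cdots \le e_\ell$ then at most one term occurs in the summation, so $\Lambda_{\mathcal{R}}$ simplifies to $\Lambda_{\mathcal{R}}(n) = \prod_{i=1}^{\ell}\log p_i$ if $n = p_1\cdots p_\ell$ and $\bigl(\frac{\log p_1}{\log X}, \dots, \frac{\log p_\ell}{\log X}\bigr) \in \mathcal{R}$, and $\Lambda_{\mathcal{R}}(n) = 0$ otherwise.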
We prove Proposition 7.2 by an application of the Hardy-Littlewood circle method, whereby we study the functions Proposition 7.2 then relies on the following three components.
The implied constant depends on $\eta$, but not on $\mathcal{R}$.
The implied constant depends on $\eta$ and $A$, but not on $\mathcal{R}_X$ or $a_1, \dots, a_{\ell-1}$.
We expect the major arcs $\mathcal{M}$ to give the main contribution. Proposition 9.1 shows that we can get an asymptotic formula from frequencies in $\mathcal{M}$. Proposition 9.2 shows that most frequencies contribute negligibly, and that any significant contribution must come from some small exceptional set $\mathcal{E}$. (In view of Proposition 9.1, $\mathcal{E}$ must contain elements of $\mathcal{M}$, and so $\mathcal{E}$ is nonempty.) We would expect that we can take $\mathcal{E} = \mathcal{M}$, but cannot quite show this. However, Proposition 9.3 shows that $\mathcal{E}\setminus\mathcal{M}$ contributes negligibly to our sum, which is sufficient for our purposes.
Proof of Proposition 7.2 assuming Propositions 9.1, 9.2 and 9.3 and Lemma 7.4. Let $\delta = (\log\log X)^{-1}$. Clearly we may assume that $\delta$ is sufficiently small in terms of $\eta$, since otherwise the result is trivial. We note that $\ell \ge 2$, since the sum of coordinates of points in $\mathcal{R}$ is 1 but a non-trivial subset of them lies in $[9/25, 17/40]$. Given reals $a_1, \dots, a_{\ell-1} \ge 0$ and $\gamma > 0$ and a set $\mathcal{S} \subseteq \mathbb{R}^\ell$, let $C(\mathbf{a}; \gamma) := (a_1, a_1+\gamma] \times \cdots \times (a_{\ell-1}, a_{\ell-1}+\gamma]$, We see that $\mathbf{1}_{\mathcal{S}}$ and $\tilde{\mathbf{1}}_{\mathcal{S}}$ differ in that the denominators of the fractions are $\log n$ and $\log X$ respectively.

By Fourier expansion we have
We split the summation over $b$ into the sets $\mathcal{M}$, $[0, X)\setminus(\mathcal{E}\cup\mathcal{M})$ and $\mathcal{E}\setminus\mathcal{M}$, where $\mathcal{M}$ is as given by Proposition 9.1, and $\mathcal{E}$ is the set whose existence is asserted by Proposition 9.2. We then apply Propositions 9.1, 9.2 and 9.3 respectively to each set in turn. Let $H_{C^+}(\theta) = S_{\mathcal{A}}(\theta)S_{C^+(\mathbf{a};\delta)}(-\theta)$. For $C$ in the definition of $\mathcal{M}$ sufficiently large in terms of $A$ and $\eta$, this gives We recall that we assume $Y$ is an integral power of ten whenever we encounter $F_Y$, to avoid some unimportant technicalities. In particular, for all $\theta$ and $Y$. The key property of $F_Y$ which we exploit is that it has an exceptionally nice product form. If $Y = 10^k$, then letting $n = \sum_{i=0}^{k-1} n_i 10^i$ have decimal digits $n_{k-1}, \dots, n_0$, we find We note that $F_Y$ is periodic modulo 1, and that the above product formula gives the identity (We recall that we assume that $U$ and $V$ are both powers of 10 in such a statement.)
Lemma 10.1 ($\ell^\infty$ bound, Lemma 8.2 restated). Let $q < Y^{1/3}$ be of the form $q = q_1q_2$ with $(q_1, 10) = 1$ and $q_1 > 1$, and let $|\eta| < Y^{-2/3}/2$. Then for any integer $a$ coprime with $q$ we have for some absolute constant $c > 0$.
For the final inequality we used the convexity of $\exp(-x^2)$. We substitute this bound into our expression (10.2) for $F_Y$, which gives for $Y = 10^k$ If $t = a/q_1q_2$ with $q_1 > 1$, $(q_1, 10) = 1$ and $(a, q_1) = 1$, then $\|10^i t\| \ge 1/q_1q_2$ for all $i$. Similarly, if $t = a/q_1q_2 + \eta$ with $a, q_1, q_2$ as above, with $|\eta| < Y^{-2/3}/2$ and with $q = q_1q_2 < Y^{1/3}$, then for $i \le k/3$ we have $\|10^i t\| \ge 1/q - 10^i|\eta| \ge 1/2q$. However, if $\|10^i t\| < 1/20$ then $\|10^{i+1} t\| = 10\|10^i t\|$. Thus, for any interval $I \subseteq [0, k/3]$ of length $\log q/\log 10$, there must be some integer $i \in I$ such that $\|10^i(a/q + \eta)\| > 1/200$. This implies that Substituting this into the bound for $F$, and recalling we assume $q < Y^{1/3}$, gives the result. where we interpret the term in parentheses as 9 if $\|10^{i-1}\theta\| = 0$. Writing $\theta = \sum_{i=1}^k t_i 10^{-i}$ for $t_i \in \{0, \dots, 9\}$, we see that the $(k-j)$th term in the product depends only on $t_{k-j}, \dots, t_k$. Moreover, by continuity the value of the term depends mainly on the first few of these digits. Thus we may approximate the absolute value of $F_Y(\theta)$ by a product where the $j$th term depends only on $t_j, \dots, t_{j+J}$ for some constant $J$. Explicitly, we have where we put $t_j = 0$ for $j > k$.
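Two facts used in this part of the argument can be checked numerically. The first is the digit-by-digit product formula for $F_Y$, together with the factorization $F_{UV}(\theta) = F_U(\theta)F_V(U\theta)$ that it implies (our reading of the identity stated after the product formula). The second is the digit-shifting step in the proof of Lemma 10.1.

The Python sketch below assumes the normalization $F_Y(\theta) = 9^{-k}\bigl|\sum_n e(n\theta)\bigr|$ for $Y = 10^k$, with $n$ running over the $9^k$ strings of $k$ decimal digits (leading zeros allowed) that avoid $a_0$. This is our reading of (3.1), not a quotation of it. For the digit-shifting step the sketch treats only $\eta = 0$. In that case every run of $\lceil\log q/\log 10\rceil$ consecutive exponents $i$ contains one with $\|10^i a/q\| \ge 1/20$; the text uses $1/200$ to absorb $\eta$. All function names are hypothetical.

```python
import cmath
from itertools import product
from math import gcd


def e(x):
    """Additive character e(x) = exp(2*pi*i*x)."""
    return cmath.exp(2j * cmath.pi * x)


def F_direct(theta, k, a0):
    """9^{-k} |sum e(n*theta)| over k-digit strings n (leading zeros allowed)
    with no digit equal to a0 (assumed normalization)."""
    digits = [d for d in range(10) if d != a0]
    total = 0
    for ds in product(digits, repeat=k):
        n = sum(d * 10 ** i for i, d in enumerate(ds))
        total += e(n * theta)
    return abs(total) / 9 ** k


def F_product(theta, k, a0):
    """Same quantity digit by digit: prod_{i<k} (1/9)|sum_{d != a0} e(d 10^i theta)|."""
    digits = [d for d in range(10) if d != a0]
    value = 1.0
    for i in range(k):
        value *= abs(sum(e(d * 10 ** i * theta) for d in digits)) / 9
    return value


def window_length(q):
    """Smallest m >= 0 with 10**m >= q, i.e. ceil(log10 q) for q > 1."""
    m = 0
    while 10 ** m < q:
        m += 1
    return m


def check_digit_shift(a, q, steps=60):
    """For t = a/q, check ||10^i t|| >= 1/q for all i < steps, and that every
    window of window_length(q) consecutive i has some ||10^i t|| >= 1/20."""
    if q <= 1 or gcd(q, 10) != 1 or gcd(a, q) != 1:
        raise ValueError("need q > 1, (q, 10) = 1 and (a, q) = 1")
    m = window_length(q)
    num = []  # num[i] = q * ||10^i a / q||, computed exactly
    r = a % q
    for _ in range(steps):
        num.append(min(r, q - r))
        r = (10 * r) % q
    lower_ok = all(x >= 1 for x in num)
    window_ok = all(
        any(20 * num[j] >= q for j in range(i, i + m))
        for i in range(steps - m + 1)
    )
    return lower_ok and window_ok
```

The window property follows from the text's observation. If $\|10^j t\| < 1/20$ for $m$ consecutive $j$, then the next value would be $10^m\|10^i t\| \ge 10^m/q \ge 1$, which is impossible.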
With this formulation we can interpret the above bound in terms of the probability of a walk on $\{0, \dots, 9, \infty\}^k$. Let $t \in \mathbb{R}$ be given. Consider an order-$J$ Markov chain $X_1, X_2, \dots$ where for $a, a_1, \dots, a_J \in \{0, \dots, 9\}$ and $n > J$ we have $\mathbb{P}(X_n = a \mid X_{n-i} = a_i \text{ for } 1 \le i \le J) = cG(a, a_1, a_2, \dots, a_J)^t$ for some suitably small constant $c$ (so that the probability that $X_n \in \{0, \dots, 9\}$ is less than 1). To make this a genuine Markov chain we choose the probability that $X_n = \infty$, given $X_{n-1}, \dots, X_{n-J}$, so that the probabilities add up to 1, and if $X_n = \infty$ then $X_{n+1} = \infty$ with probability 1.
Then we have that The sum (over all paths in $\{0, \dots, 9\}^k$) of the probabilities of paths is a linear combination of the entries in the $k$th power of the transition matrix restricted to $\{0, \dots, 9\}$. Thus such a moment estimate is a linear combination of the $k$th powers of the eigenvalues of this matrix. This allows us to estimate any moment of $F_Y(a/Y)$ over $a \in [0, Y)$ uniformly for all $k$ by performing a finite eigenvalue calculation. In particular, this gives us an (arbitrarily good as $J$ increases) numerical approximation to the distribution function of $F_Y$.
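The transfer-matrix principle described above can be illustrated on a toy case. The Python sketch below takes $J = 1$ and a hypothetical weight $G(x, y)$ that evaluates the single-digit factor at the point $0.xy$ instead of taking a supremum over an interval, so it is not the weight used in the text. It only checks that summing $\prod_{j=1}^{k} G(t_j, t_{j+1})^s$ over all digit strings (with $t_{k+1} = 0$) equals $\mathbf{1}^T M^k \mathbf{e}_0$ for the $10\times 10$ matrix $M_{xy} = G(x, y)^s$. The excluded digit `A0 = 5` is an arbitrary choice, and the exponent is called `s` here to avoid a clash with the digits $t_j$.

```python
import cmath
from functools import lru_cache
from itertools import product

A0 = 5  # hypothetical choice of the excluded digit a_0, for illustration only


def f(theta):
    """Single-digit factor (1/9)|sum_{d != A0} e(d*theta)|."""
    return abs(sum(cmath.exp(2j * cmath.pi * d * theta)
                   for d in range(10) if d != A0)) / 9


@lru_cache(maxsize=None)
def G(x, y):
    """Hypothetical J = 1 weight on two consecutive digits (x, y).
    Simplification: f is evaluated at 0.xy, not maximised over an interval."""
    return f((10 * x + y) / 100)


def moment_brute(k, s):
    """Sum over t in {0..9}^k of prod_{j=1}^{k} G(t_j, t_{j+1})^s, with t_{k+1} = 0."""
    total = 0.0
    for t in product(range(10), repeat=k):
        padded = t + (0,)
        term = 1.0
        for j in range(k):
            term *= G(padded[j], padded[j + 1]) ** s
        total += term
    return total


def moment_transfer(k, s):
    """The same sum computed as 1^T M^k e_0, where M[x][y] = G(x, y)^s."""
    M = [[G(x, y) ** s for y in range(10)] for x in range(10)]
    vec = [1.0 if y == 0 else 0.0 for y in range(10)]  # e_0 encodes t_{k+1} = 0
    for _ in range(k):
        vec = [sum(M[x][y] * vec[y] for y in range(10)) for x in range(10)]
    return sum(vec)  # contract with the all-ones vector
```

The brute-force sum costs $10^k$ terms, while the matrix version costs $O(k)$ small matrix-vector products. For large $k$ the growth of the moment is governed by the eigenvalues of $M$, which is the finite eigenvalue calculation referred to above.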
If this is the case then we have Thus, fixing $i = 1$ so that $a_{k+1} = \cdots = a_{J+k} = 0$, and summing over $j$, we have that On the other hand, by the eigenvalue expansion of $M_t$, we have This gives the result. In particular, we have for Here $27/77 \approx 0.35$ is slightly larger than $1/3$, and $50/77 \approx 0.65$.
Proof. This follows from Lemma 10.2 and a numerical bound on $\lambda_{1,4}$. Specifically, by Lemma 10.2 taking $J = 4$ we find A numerical calculation$^2$ reveals that For the second bound, let choices of $a_1, a_3$ and these can be absorbed into the supremum over $\beta$, we see that it suffices to show $\sup_{\beta\in\mathbb{R}}\sum_{a_2<\min(Y_2,Y_3)}$ Since $F_{Y_2} \ge 0$ we can extend the summation to $a_2 < Y_2$. Thus without loss of generality we may assume that Here we used the fact that $G(t_i, \dots, t_{i+4})$ is bounded away from 0 for all $t_1, \dots, t_k \in \{0, \dots, 9\}$, since it is the maximal absolute value of a trigonometric polynomial over an interval. Since $F$ is periodic modulo 1 we see that
Proof. This follows from Lemma 10.2 and a numerical bound for $\lambda_{235/154,4}$. Explicitly, we take $J = 4$ and $Y = 10^k$. By Lemma 10.2 we have Proof. For each $a \le q$, let $\eta_a$ maximize $F_U(a/q + \eta)$ over $|\eta| < \delta$. Since the fractions $a/q$ are all separated from one another by at least $1/q$, we have for any $t$ Thus, considering $t = b/q - \beta$, we see that We have that Thus integrating over $s \in [t - \gamma, t + \gamma]$ for some $\gamma > 0$, we have This implies that $|F'_U(s)|\,ds$.

Combining this with the trivial bound
for $U \le Y$, and choosing $U$ maximally subject to $U \le q$ and $U \le Y$ gives the first result of the lemma.
The other bounds follow from entirely analogous arguments. In particular, we note that for $(a, q) = 1$ and $q < Q$ the numbers $a/q$ are separated from one another by $1/Q^2$, and those with $d \mid q$ are separated from each other by $d/Q^2$. So we have the equivalent of (10.7) with $\delta q$ replaced by $\delta Q^2$ or $\delta Q^2/d$, and $|\eta| \le 1/2q$ replaced by $|\eta| \le 1/2Q^2$ or $|\eta| \le d/2Q^2$.
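The spacing facts used here are elementary. If $a/q \ne a'/q'$ are reduced with $q, q' < Q$, then $|a/q - a'/q'| = |aq' - a'q|/qq' \ge 1/qq' > 1/Q^2$. If moreover $d \mid q$ and $d \mid q'$, then $d$ divides the nonzero integer $aq' - a'q$, so the gap exceeds $d/Q^2$. The short Python sketch below confirms this with exact rational arithmetic for a few small $Q$ and $d$. The function names are ours.

```python
from fractions import Fraction
from math import gcd


def reduced_fractions(Q, d=1):
    """All reduced fractions a/q in [0, 1) with 1 <= q < Q and d | q, sorted."""
    return sorted({Fraction(a, q)
                   for q in range(1, Q) if q % d == 0
                   for a in range(q) if gcd(a, q) == 1})


def min_gap(fracs):
    """Smallest difference between consecutive elements of a sorted list."""
    return min(b - a for a, b in zip(fracs, fracs[1:]))
```

For instance, with $Q = 10$ and $d = 7$ only the denominator $7$ survives, and the gaps are $1/7 > 7/100$.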
Lemma 10.6 (Hybrid Bounds). Let $E \ge 1$. Then we have In the above lemma, we emphasize that $a, q, d$ are all integers, but the summation over $\eta$ is over real numbers, which are well-spaced because of the condition $Y(\eta + a/q) \in \mathbb{Z}$.
Proof. We first note that the summand $a/q + \eta$ runs through fractions $b/Y$ with $|b| \le E + Y$, since we have the condition $(\eta + a/q)Y \in \mathbb{Z}$. Each fraction $b/Y$ is represented $O(1 + \min(qE/Y, q))$ times, since if $a_1/q + \eta_1 = a_2/q + \eta_2$ then $a_2 = a_1 + O(qE/Y)$ and $\eta_2$ is determined by $a_1, a_2, \eta_1$. There are $O(1 + E/Y)$ choices of $b$ giving the same fraction (mod 1), and since $F_Y$ is periodic (mod 1) these all give the same value of $F_Y(b/Y)$. Thus we may consider only $b < Y$, with each fraction $b/Y$ occurring $O((1 + E/Y)\min(qE/Y, q))$ times. Thus we see that if $10qE \ge Y$ then In this case the result now follows from Lemma 10.3. Thus we may assume $qE < Y/10$.
Using the product formula (10.3), we have for $Y \ge UV$ powers of 10 We also have the trivial bound $F_V(U\theta) \le 1$ of (10.1). For $UV \le Y$ and $|\eta| < E/Y$ these give We choose $V$ and then $U$ to be the largest powers of 10 such that $V \le Y/qE$ and $U \le Y/VE$. Note that this choice gives $U, V \ge 1$, since $qE < Y/10$ and $q, E \ge 1$.
Since we chose $U$ and $V$ maximally, we have $V \ge Y/10qE$, so $q/100 \le U \le 10q$.
Since $qE < Y/10$, we may extend the supremum in $\Sigma_1$ to $\gamma \le 1/10q$ for an upper bound. Thus, by Lemma 10.5 we have Similarly, since $Y/UV \asymp E$, by Lemma 10.3 we have Putting this together gives the first result.
The second bound follows from an entirely analogous argument. We first split the argument depending on whether $Q^2E/d \ge Y/10$ or not, and use the final bound of Lemma 10.5 instead of the first bound to handle $\Sigma_2$.
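The choice of $U$ and $V$ in the proof of Lemma 10.6 can be sanity-checked with exact arithmetic. The Python sketch below picks $V$, then $U$, as the largest powers of 10 with $V \le Y/qE$ and $U \le Y/VE$. For a few sample values with $qE < Y/10$ it confirms the consequences used above:

- $U, V \ge 1$;
- $q/100 \le U \le 10q$;
- $E \le Y/UV < 10E$, which is the relation $Y/UV \asymp E$.

The function names and sample values are ours.

```python
from fractions import Fraction


def largest_power_of_ten_at_most(x):
    """Largest 10**j (j >= 0) with 10**j <= x; assumes x >= 1."""
    p = 1
    while 10 * p <= x:
        p *= 10
    return p


def choose_U_V(Y, q, E):
    """Choose V, then U, maximal powers of 10 with V <= Y/(qE) and U <= Y/(VE)."""
    V = largest_power_of_ten_at_most(Fraction(Y) / (q * E))
    U = largest_power_of_ten_at_most(Fraction(Y) / (V * E))
    return U, V
```

Maximality gives $10V > Y/qE$ and $10U > Y/VE$. Combined with the defining upper bounds, these yield the two-sided estimates checked here.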
The argument giving the first bound of Lemma 10.6 is essentially sharp if the $\ell^1$ bounds used in the proof are sharp and if $q$ is a divisor of a power of 10 or if $QE \ge Y$. When $QE \le Y^{1-\epsilon}$ and $q$ is not a divisor of a power of 10, however, we trivially bounded a factor $F_V(U(a/q + \eta))$ by 1 in the proof, which we expect not to be tight. Lemma 10.7 below allows us to obtain superior bounds (in certain ranges), provided the denominators do not have large powers of 2 or 5 dividing them.
Then we have In particular, if $q = dq'$ with $(q', 10) = 1$ and $d \mid 10^u$ for some integer $u \ge 0$, then we have For example, if $(q, 10) = 1$ and $qE$ is a sufficiently small power of $Y$, then we improve the first bound $(qE)^{27/77}$ of Lemma 10.6 in the $q$-aspect to $E^{27/77}q^{1/21}$. This improvement is important for our later estimates.
By the periodicity of $F$ modulo one, the fact $(q_1q_2, d) = 1$, and the Chinese remainder theorem, we have where the dash on $\sum'$ indicates that $\eta$ is summed over all reals satisfying Thus, since $F$ is periodic modulo 1 and $d_3 \mid D'$ and Moreover, by (10.3) and Cauchy-Schwarz, we have Since $d_2d_3 \mid D'V$, this gives These give Since $(d_1d_2d_3, D') = d_3$ and $(q_1q_2, d) = 1$, as $a'$, $b_1$ and $b_2$ go through all residue classes (mod $q_1q_2$), (mod $d_1$) and (mod $d_2$) respectively, subject to $(a', q_1q_2) = (b_1 + d_1b_2, d_1d_2) = 1$, we see that $D'\beta_2$ goes through all values of $c/q_1q_2d_1d_2$ (mod 1) for $0 < c < q_1q_2d_1d_2$ with $(c, q_1q_2d_1d_2) = 1$, and each value is attained exactly once. Similarly, since $(d_1d_2d_3, D'V) = d_2d_3$, we see that $\beta_3$ goes through every value of $c/q_1q_2d_1$ (mod 1) with $0 < c < q_1q_2d_1$ and $(c, q_1q_2d_1) = 1$ exactly once as $a$ goes through the values (mod $q_1q_2$) and $b_1$ goes through the values (mod $d_1$) with $(a, q_1q_2) = (b_1, d_1) = 1$.
Since $F_R(\theta) \ge F_V(\theta)$ for $R \le V$, we may replace $F_V$ with $F_R$, where $R = 10^r$ is the largest power of 10 less than $\min(V, d_1d_2Q_1Q_2^2)$. Since $R \le V$ and $D'EV/Y \ll 1/V$, we see all quantities $\gamma$ occurring in the supremum are of size at most $O(1/R)$. Given any choice of reals $\eta_{a,q_2} \ll 1/R$ for $a \le d_1d_2q_1q_2$ and $q_2 \sim Q_2$ with $(a, d_1d_2q_1q_2) = 1$, the numbers $a/d_1d_2q_1q_2 + \eta_{a,q_2}$ can be arranged into $O(d_1d_2Q_1Q_2^2/R)$ sets such that all numbers in any set are separated by $\gg 1/R$. (Recall that $r$ is chosen such that $R \le d_1d_2Q_1Q_2^2$.) Thus, as in the proof of Lemma 10.5 (specifically the argument leading up to (10.8)), we find that By Parseval we have $\sum_{a\in\mathcal{A}_1,\, a\le R} 4\pi^2a^2 \ll 10^{2r}9^r$.

Major arcs
In this section we establish Proposition 9.1 using the prime number theorem in arithmetic progressions and short intervals, making use of Lemma 10.1.
Proof of Proposition 9.1. We split $\mathcal{M}$ up as three disjoint sets By Lemma 10.1 and recalling $X$ is a power of 10, we have Using the trivial bound $S_{\mathcal{R}_X}(\theta) \ll X(\log X)^\ell$, where $\ell \le 2/\eta$, and noting $\#\mathcal{M}_1 \ll (\log X)^{3C}$, we obtain This gives the result for $\mathcal{M}_1$.
We now consider $\mathcal{M}_2$. Recalling the definition of $\mathcal{R}_X$, we have that for $n < X$ where $C = (a_1, a_1+\delta] \times \cdots \times (a_{\ell-1}, a_{\ell-1}+\delta]$ is the projection of $\mathcal{R}_X$ onto the first $\ell - 1$ coordinates. We note the crude bound Let $\Delta = \lceil\log X\rceil^{-10C-10\ell}$. We note that if $a \in \mathcal{M}_2$ then $a/X = b/q + c/X$ for some integers $b, q$ and $|c| \le (\log X)^C$ ($c$ is an integer since $q \mid X$ for the set $\mathcal{M}_2$). We separate the sum $S_{\mathcal{R}_X}(a/X)$ by putting the prime variable $p$ occurring in (11.2) in short intervals of length $\Delta X/m$ and in arithmetic progressions (mod $q$). We note that $\Lambda_C$ is supported on $m \le X^{\sum_i a_i + (\ell-1)\delta} < X^{1-\eta/3}$, so we can drop the constraints $p \ge X^{\eta/4}, X^{1-\sum_i a_i-\ell\delta}$ at the cost of some terms with $mp < X^{1-\eta/12} + X^{1-\delta}$. Thus If $mp = j\Delta X + O(\Delta X)$ and $p \equiv r \pmod q$ we have By the prime number theorem in short intervals and arithmetic progressions (5.1), for $m < X^{1-\eta/3}$ and $(r, q) = 1$ we have Finally, since $c \in \mathbb{Z}$, $c \ne 0$ and $\Delta^{-1} \in \mathbb{Z}$, we have

Using (11.3), this gives
Note that in the above argument, for us to be able to save an arbitrary power of $\log X$, it was important that we are counting elements with weight $\Lambda_{\mathcal{R}_X}(n)$ rather than $\mathbf{1}_{\mathcal{R}_X}(n)$, and that $X\nu \in \mathbb{Z}$ for $a \in \mathcal{M}_2$.

Remark.
We have only needed to use the prime number theorem in arithmetic progressions when the modulus is a small divisor of X, and so has no large prime factors. This means that our implied constants can be taken to be effectively computable since for such moduli we do not need to appeal to Siegel's theorem.

Generic minor arcs
In this section we establish Proposition 9.2 and obtain some bounds on the exceptional set $\mathcal{E}$ by using the distributional estimates of Lemma 10.4. Lemma 12.1 ($\ell^2$ bound for primes). We have that Proof. This follows from the $\ell^2$ bound coming from Parseval's identity.

Lemma 12.2 (Generic frequency bounds). Let
Proof. The first bound on the size of $\mathcal{E}$ follows from using Lemma 10.4 with $B = X^{23/80}$ and verifying that $(23\times 235)/(80\times 154) + 59/433 < 23/40$. For the second bound we see from Lemma 10.4 that It remains to bound the sum over $a \notin \mathcal{E}$. We divide the sum into $O(\log X)^2$ subsums where we restrict to those $a$ such that $F_X(a/X) \sim 1/B$ and $|S_{\mathcal{R}}(a/X)| \sim X/C$ for some $B \ge X^{23/80}$ and $C \le X^2$ (terms with $C > X^2$ make a contribution $O(1/X)$).

This gives
We concentrate on the inner sum. Using Lemmas 10.4 and 12.1 we see that the sum contributes Here we used the bound $\min(x, y) \le x^{1/2}y^{1/2}$ in the last line. In particular, we see this is $O_\eta(X^{1-2\epsilon})$ if $B \ge X^{23/80}$, on verifying that $23/80 \times 73/308 > 59/866$. Substituting this into our bound above gives the result.
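Both numerical inequalities in the proof of Lemma 12.2 hold with very little room, so an exact check may be reassuring. The Python snippet below uses `fractions.Fraction` to avoid any rounding. Both comparisons should come out `True`, with margins of about $2.4\times 10^{-5}$ and $1.2\times 10^{-5}$ respectively.

```python
from fractions import Fraction as Fr

# Size of E: (23*235)/(80*154) + 59/433 < 23/40
first = Fr(23 * 235, 80 * 154) + Fr(59, 433)
# Generic frequencies: (23/80)*(73/308) > 59/866
second = Fr(23, 80) * Fr(73, 308)

print(first < Fr(23, 40), float(Fr(23, 40) - first))
print(second > Fr(59, 866), float(second - Fr(59, 866)))
```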

Exceptional minor arcs
In this section we reduce Proposition 9.3 to the task of establishing Proposition 13.3 and Proposition 13.4, given below. We do this by making use of the bilinear structure of $\Lambda_{\mathcal{R}_X}(n)$, which is supported on integers of the form $n_1n_2$ with $n_1$ of convenient size, and then showing that if these resulting bilinear expressions are large then the Fourier frequencies must lie in a smaller additively structured set. Propositions 13.3 and 13.4 then show that we have superior Fourier distributional estimates inside such sets. Thus we conclude that the bilinear sums are always small. To make the bilinear bound explicit, we establish the following lemma, from which Proposition 9.3 follows quickly.
Lemma 13.1 (Bilinear sum bound). Let $N, M, Q \ge 1$ and $E$ satisfy $X^{9/25} \le N \le X^{17/40}$, $Q \le X^{1/2}$, $NM \le 1000X$ and $E \le 100X^{1/2}/Q$, and either $E \ge 1/X$ or $E = 0$. Let $\mathcal{F} = \mathcal{F}(Q, E)$ be given by Then for any 1-bounded complex sequences $\alpha_n, \beta_m, \gamma_a$ we have Proof of Proposition 9.3 assuming Lemma 13.1. By symmetry, we may assume that $I = \{1, \dots, \ell_1\}$ for some $\ell_1 < \ell$. By Dirichlet's theorem on Diophantine approximation, any $a \in [0, X)$ has a representation for some integers $(b, q) = 1$ with $q \le X^{1/2}$ and some real $|\nu| \le 1/(X^{1/2}q)$. Thus we can divide $[0, X)$ into $O(\log X)^2$ sets $\mathcal{F}(Q, E)$ as defined by Lemma 13.1, for different parameters $Q, E$ satisfying $1 \le Q \le X^{1/2}$ and $E = 0$ or $1/X \le E \le 100X^{1/2}/Q$. Moreover, if $a \notin \mathcal{M}$ then $a \in \mathcal{F} = \mathcal{F}(Q, E)$ for some $Q, E$ with $Q + E \ge (\log X)^C$. Thus, provided $C$ is sufficiently large compared with $A$ and $\eta$, we see it is sufficient to show that From the definition (9.1) of $\Lambda_{\mathcal{R}_X}$ and shape of $\mathcal{R}_X$ given by Proposition 9.3, we have that for $n < X$ where $\mathcal{R}_1$ is the projection of $\mathcal{R}_X$ onto the first $\ell_1$ coordinates, and $\mathcal{R}_2$ is the projection onto the subsequent $\ell - \ell_1 - 1$ coordinates.
Since $n_1, n_2, p$ and $X$ are integers, $|\log((X - 1/2)/n_1n_2p)| \gg 1/X$. Thus, by Perron's formula (see, for example, [10, Chapter 17]), we have for $n_1, n_2, p < X$ We will use this to remove the constraint $n = n_1n_2p < X$ in $S_{\mathcal{R}_X}(-a/X)$. We first put $n_1, n_2, p$ into one of $O(\log X)^3$ intervals of the form $(Y/10, Y]$, and then apply the above estimate. The $O(X^{-2})$ error term trivially makes a negligible contribution to (13.1). Thus, we see that for $C$ sufficiently large, it suffices to show uniformly over all $s$ with $\Re(s) = 1/\log X$ and all choices of $N_1, N_2, P$ with where $c_p = \log p$ if $p \ge X^{\eta/4}, X^{1-\sum_i a_i-\ell\delta}$ and $c_p = 0$ otherwise. (The integral over $s$ and the choices of $N_1, N_2, P$ contribute a factor of $O(\log X)^4$, which is acceptable for establishing (13.1) if $C$ is sufficiently large.) so either $N_1$ or $N_2P$ lies in $[X^{9/25}, X^{17/40}]$. Since $\Lambda_{\mathcal{R}_1}(n_1), \Lambda_{\mathcal{R}_2}(n_2), \log p \ll_\ell (\log X)^{\ell-1}$, uniformly over all choices of $N \in [X^{9/25}, X^{17/40}]$ and $M \le 1000X/N$, and uniformly over all 1-bounded complex sequences $\alpha_n, \beta_m$. (Setting $\alpha_n = \Lambda_{\mathcal{R}_1}(n)/(\log X)^\ell$ and $\beta_m = \sum_{pn_2=m,\, p\sim P,\, n_2\sim N_2}\Lambda_{\mathcal{R}_2}(n_2)c_p/(\log X)^\ell$ gives the bound when $\sum_{i=1}^{\ell_1} a_i \in [9/25 + \epsilon/2, 17/40 - \epsilon/2]$; the other case is analogous, with $\alpha_n$ and $\beta_m$ swapped.) Finally, let $\gamma_a$ be the 1-bounded sequence satisfying $S_{\mathcal{A}}(a/X) = \#\mathcal{A}\gamma_aF_X(a/X)$. After substituting this expression for $S_{\mathcal{A}}$, we see that (13.2) follows immediately from Lemma 13.1 for $C$ sufficiently large in terms of $\eta$, thus giving the result.
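The Diophantine decomposition at the start of the proof of Proposition 9.3 rests on Dirichlet's theorem: for any integer $Q \ge 1$, some $1 \le q \le Q$ has $\|qa/X\| \le 1/(Q+1)$. The brute-force Python sketch below finds such a $q$ for $Q = X^{1/2}$, reduces $b/q$ to lowest terms, and returns $\nu = a/X - b/q$. It assumes $X$ is a perfect square (for example $X = 10^4$), and the function name is ours. It only illustrates the shape of the representation; continued fractions would be the usual efficient method.

```python
from fractions import Fraction
from math import gcd, isqrt


def dirichlet_approximation(a, X):
    """Return (b, q, nu) with a/X = b/q + nu, gcd(b, q) = 1, 1 <= q <= sqrt(X)
    and |nu| <= 1/(sqrt(X) q). Brute force over q; assumes X is a perfect square."""
    Q = isqrt(X)
    best_dist, best_q, best_r = None, None, None
    for q in range(1, Q + 1):
        r = (q * a) % X
        dist = min(r, X - r)  # equals X * ||q a / X||
        if best_dist is None or dist < best_dist:
            best_dist, best_q, best_r = dist, q, r
    q, r = best_q, best_r
    # b is the integer nearest to q*a/X
    b = (q * a - r) // X if r <= X - r else (q * a - r) // X + 1
    g = gcd(b, q)
    b, q = b // g, q // g  # reducing b/q leaves nu unchanged
    return b, q, Fraction(a, X) - Fraction(b, q)
```

Reducing $b/q$ only shrinks $q$, so the bound $|\nu| \le 1/(X^{1/2}q)$ survives the reduction.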
Thus it remains to establish Lemma 13.1. The key estimate constraining Fourier frequencies to additively structured sets is the following lemma.
Lemma 13.2 (Geometry of numbers). Let $K_0$ be a sufficiently large constant, let $\mathbf{t} \in \mathbb{R}^3$ with $\|\mathbf{t}\|_2 = 1$, and let $N > 1 > \delta > 0$. Let If a cuboid $\mathcal{R} \subseteq \mathbb{R}^3$ of volume $V$ lies in the region $|z| \le \epsilon$, then it can easily contain rather more than $V$ lattice points from the plane $z = 0$. Lemma 13.2 says that such a situation is essentially the only way a cuboid can contain many lattice points: if any cuboid has substantially more than $V$ lattice points in $\mathcal{R} \cap \mathbb{Z}^3$, then these lattice points must come from some lower dimensional linear subspace. The region $\mathcal{R}$ which we are interested in is a slightly thickened disc through the origin in the plane orthogonal to $\mathbf{t}$.
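Proof of Lemma 13.2. Let $\phi : \mathbb{R}^3 \to \mathbb{R}^3$ be the linear map which is a dilation by a factor $N/\delta$ in the $\mathbf{t}$-direction (i.e. $\phi(\mathbf{v}) = \mathbf{v} + \mathbf{t}(N/\delta - 1)(\mathbf{v}\cdot\mathbf{t})$). Let $\Lambda_1 = \phi(\mathbb{Z}^3) \subset \mathbb{R}^3$ be the lattice which is the image of $\mathbb{Z}^3$ under $\phi$. Since the determinant of a lattice is the volume of the fundamental parallelepiped, we see that $\det(\Lambda_1) = N/\delta$.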
We now notice that any element of $\mathcal{R} \cap \mathbb{Z}^3$ is mapped injectively by $\phi$ to an element of $\{\mathbf{x} \in \Lambda_1 : \|\mathbf{x}\|_2 \le 2N\}$. Thus for a sufficiently large constant $C$, we have If $\|\mathbf{v}_3\|_2 > CN$, then there are no $\mathbf{n} \in \mathbb{Z}^3$ counted above with Thus in either case there are $O(\delta N^2)$ points with $n_3 \ne 0$. However, by assumption of the lemma we have that $K$ is sufficiently large and This means that most of the contribution must come from terms with $n_3 = 0$. Indeed, we have We may choose $K_0$ such that if $K \ge K_0$ then the right hand side is at least $\delta KN^2/2$.
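The determinant computation in the proof of Lemma 13.2 is an instance of the matrix determinant lemma. The map $\phi$ has matrix $I + (N/\delta - 1)\mathbf{t}\mathbf{t}^T$, whose determinant is $1 + (N/\delta - 1)\|\mathbf{t}\|_2^2 = N/\delta$ when $\|\mathbf{t}\|_2 = 1$. Hence $\det(\Lambda_1) = |\det\phi|\cdot\det(\mathbb{Z}^3) = N/\delta$. The Python sketch below checks this numerically for one sample unit vector. The values of $N$, $\delta$ and $\mathbf{t}$ used in the check are arbitrary choices of ours.

```python
def dilation_matrix(t, factor):
    """Matrix of v -> v + t (factor - 1)(v . t): a dilation along the unit vector t."""
    return [[(1.0 if i == j else 0.0) + (factor - 1) * t[i] * t[j] for j in range(3)]
            for i in range(3)]


def apply(m, v):
    """Matrix-vector product for a 3x3 matrix."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))
```

The map stretches $\mathbf{t}$ by $N/\delta$ and fixes the plane orthogonal to $\mathbf{t}$. That is why the volume, and hence the lattice determinant, scales by exactly $N/\delta$.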
We establish Lemma 13.1 assuming two key propositions, Proposition 13.3 and Proposition 13.4, given below. These propositions will be proven over the next two sections.
Given $B \le X^{23/80}$, let $\mathcal{E}' = \mathcal{E}'(B)$ be given by Then we have Proof of Lemma 13.1 assuming Propositions 13.3 and 13.4. We split $\mathcal{E}$ into $O(\log X)$ subsets of the form Thus it suffices to show We consider $1 \le K \le X/N$ taking values which are integral powers of 10, and split the contribution of our sum according to these sets. We see it is therefore sufficient to show that for each $K$ Let $G(K, \delta)$ denote the set of pairs $(a_1, a_2) \in \mathcal{F}\cap\mathcal{E}'$ such that $\#\bigl\{\mathbf{n} \in \mathbb{Z}^3 : \bigl|\frac{n_1a_1 - n_2a_2 - n_3X}{X}\bigr| \le \delta,\ \|\mathbf{n}\|_2 \le 10N\bigr\} \ge \delta N^2K$.
By considering $\delta = 2^{-j}$ and using the pigeonhole principle, we see that if then there is some $\delta \ge N/X$ and some $K/\log X \ll K' \le K$ such that $(a_1, a_2) \in G(K', \delta)$.
Thus it suffices to show for all $K', \delta$ that From Lemma 12.2, we have the bound which gives (13.3) in the case when $NK' \ll X^{17/40+\epsilon}$. Thus we may assume that $NK' \gg X^{17/40+\epsilon}$. By assumption, we also have that $N \le X^{17/40}$, so we only consider $K' \gg X^\epsilon$. In particular, we may use Lemma 13.2 to conclude that one of two cases holds. Either there is a rank 2 lattice $\Lambda \subseteq \mathbb{Z}^3$ such that $\#\{\mathbf{n} \in \Lambda : \|\mathbf{n}\|_2 \le 10N,\ |n_1a_1 + n_2a_2 + n_3X| \le \delta X\} \ge \delta K'N^2/2$ and not all of these points lie on a line through the origin, or there is a line $L \subseteq \mathbb{Z}^3$ such that $\#\{\mathbf{n} \in L : \|\mathbf{n}\|_2 \le 10N,\ |n_1a_1 + n_2a_2 + n_3X| \le \delta X\} \ge \delta K'N^2/2$.
In either case (13.3) follows from Proposition 13.3 or Proposition 13.4 (taking '$N$' and '$K$' in the propositions to be $10N$ and $K'/1000 \ge 1$ in our notation here).
Thus it remains to establish Proposition 13.3 and Proposition 13.4.

Lattice Estimates
In this section we establish Proposition 13.3, which controls the contribution from pairs of angles which cause a large contribution to the bilinear sums considered in Section 13 to come from a lattice. A low height lattice $\Lambda$ makes a significant contribution only if $(a_1, a_2, X)$ is approximately orthogonal to the plane of the lattice, and so only if $(a_1, a_2, X)$ lies close to the line through the origin orthogonal to this lattice. We note that we only make small use of the fact that these angles lie in a small set, but it is vital that the angles lie outside the major arcs.
Lemma 14.1 (Lattice generating angles have simultaneous approximation). Let $\delta > 0$ and $X, N, K \ge 1$ be such that $\delta \ge N/X$. Let $\mathcal{B}_1 = \mathcal{B}_1(N, K, \delta) \subseteq [0, X)^2$ be the set of pairs $(a_1, a_2) \in \mathbb{Z}^2$ such that there is a lattice $\Lambda \subseteq \mathbb{Z}^3$ of rank 2 with $\#\{\mathbf{n} \in \Lambda : |n_1a_1 + n_2a_2 + n_3X| \le \delta X,\ \|\mathbf{n}\|_2 \le N\} \ge \delta KN^2$, and such that the points counted above do not all lie on a line through the origin.
Then all pairs $(a_1, a_2) \in \mathcal{B}_1$ have the simultaneous rational approximations for some integer $q \ll X/NK$.
We see Lemma 14.1 restricts the pair $(a_1, a_2)$ to lie in a set of size $O(X/NK)^3$, which is noticeably smaller than $X^2$ for the range of $NK$ under consideration. This allows us to obtain superior bounds for the sum over $a_1, a_2$, by exploiting the estimates of Lemma 10.6, which show $F$ is not abnormally large on such a set.
Proof. Clearly we may assume that $NK$ is sufficiently large, since otherwise the result is trivial. By assumption of the lemma, for any pair $(a_1, a_2) \in \mathcal{B}_1$ there is a rank 2 lattice $\Lambda = \Lambda_{a_1,a_2}$ such that $\#(\Lambda \cap H) \ge \delta KN^2$ where Moreover, not all the points in $\Lambda \cap H$ lie in a line through the origin. Let $\mathbf{a} = (a_1, a_2, X)$, let $\phi : \mathbb{R}^3 \to \mathbb{R}^3$ be a dilation by a factor $N/\delta$ in the $\mathbf{a}$-direction, and let $\Lambda' = \phi(\Lambda)$. Then we see that Moreover, not all the points on the right hand side lie in a line through the origin, since $\phi^{-1}$ preserves lines through the origin. Let $\Lambda'$ have a Minkowski-reduced basis $\{\mathbf{v}_1, \mathbf{v}_2\}$, and let $V_1 = \|\mathbf{v}_1\|_2$ and $V_2 = \|\mathbf{v}_2\|_2$. Since $\|m_1\mathbf{v}_1 + m_2\mathbf{v}_2\|_2 \asymp |m_1|V_1 + |m_2|V_2$, for a suitably large constant $C$ we have Since not all of the points in the final set lie in a line through the origin, we see that $V_1, V_2 \le CN$. Thus In particular, $V_1V_2 \ll 1/\delta K$.
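Putting this together, we see that for any pair $(a_1, a_2) \in \mathcal{B}_1$ there are linearly independent vectors $\mathbf{w}_1, \mathbf{w}_2 \in \mathbb{Z}^3$ and quantities $V_1, V_2$ such that This puts considerable constraints on the possibilities for $(a_1, a_2)$, since it must lie in an infinite cylinder with axis parallel to $\mathbf{w}_1 \times \mathbf{w}_2$ with short radius, for some low height vectors $\mathbf{w}_1, \mathbf{w}_2$. (Here $\times$ is the standard cross product on $\mathbb{R}^3$.) Explicitly, let $\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3$ be an orthonormal basis of $\mathbb{R}^3$ with $\mathbf{e}_1$ orthogonal to $\mathbf{w}_1$ and $\mathbf{w}_2$, and with $\mathbf{e}_2$ orthogonal to $\mathbf{w}_2$. Then we see that $\mathbf{e}_1 \propto \mathbf{w}_1 \times \mathbf{w}_2$, $\mathbf{e}_2 \propto \mathbf{w}_2 \times \mathbf{e}_1$ and $\mathbf{e}_3 \propto \mathbf{w}_2$. In particular, we have that $|\mathbf{e}_3\cdot\mathbf{w}_2| = \|\mathbf{w}_2\|_2$, and (Here we used the identity $\mathbf{a}\cdot(\mathbf{b}\times\mathbf{c}) = \mathbf{c}\cdot(\mathbf{a}\times\mathbf{b})$.) Thus, if $\mathbf{x} = x_1\mathbf{e}_1 + x_2\mathbf{e}_2 + x_3\mathbf{e}_3$ has $|\mathbf{x}\cdot\mathbf{w}_1| \ll \delta XV_1/N$ and $|\mathbf{x}\cdot\mathbf{w}_2| \ll \delta XV_2/N$, then Since $\|\mathbf{w}_1\|_2 \ll V_1$, $\|\mathbf{w}_2\|_2 \ll V_2$ and $\|\mathbf{w}_1 \times \mathbf{w}_2\|_2 \le \|\mathbf{w}_1\|_2\|\mathbf{w}_2\|_2$, this implies that Thus, since $V_1V_2 \ll 1/\delta K$, we see that any vector $\mathbf{x}$ with $|\mathbf{x}\cdot\mathbf{w}_1| \ll \delta XV_1/N$ and $|\mathbf{x}\cdot\mathbf{w}_2| \ll \delta XV_2/N$ satisfies for some $\lambda \in \mathbb{R}$. We note that the error term is $o(X)$, since $\mathbf{w}_1, \mathbf{w}_2$ are linearly independent integer vectors and $NK$ is assumed sufficiently large. Let the components of $\mathbf{w}_1 \times \mathbf{w}_2$ be $c_1, c_2, c_3$ (with respect to the standard basis of $\mathbb{R}^3$). Since $\mathbf{w}_1, \mathbf{w}_2 \in \mathbb{Z}^3$, we have $c_1, c_2, c_3 \in \mathbb{Z}$. Thus if $\mathbf{a}$ is of the above form we must have $\mathbf{a} = \lambda(\mathbf{w}_1 \times \mathbf{w}_2) + o(X)$ for some $\lambda$. Since $\|\mathbf{a}\|_2 \ge X$ and $a_1, a_2 \le a_3 = X$, we must have that $|c_1|, |c_2| \ll |c_3|$. In particular, $|c_3| \asymp \|\mathbf{w}_1 \times \mathbf{w}_2\|_2$. Dividing through by $X = \lambda c_3 + O(X/NK|c_3|)$ then gives Finally, we note that since $\delta \ge N/X$ and $V_1V_2 \ll 1/\delta K$ we have Thus, we see that for any pair $(a_1, a_2) \in \mathcal{B}_1$ there must be integers $c_1, c_2, c_3 \ll X/NK$ such that (14.1) holds. This gives the result.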
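The vector identities used in the proof of Lemma 14.1 can be checked with exact integer arithmetic. The Python sketch below samples random integer vectors and verifies three facts:

- $\mathbf{w}_1 \times \mathbf{w}_2$ has integer components and is orthogonal to $\mathbf{w}_1$ and $\mathbf{w}_2$;
- the cyclic identity $\mathbf{a}\cdot(\mathbf{b}\times\mathbf{c}) = \mathbf{c}\cdot(\mathbf{a}\times\mathbf{b})$ holds;
- Lagrange's identity holds, which gives $\|\mathbf{w}_1 \times \mathbf{w}_2\|_2 \le \|\mathbf{w}_1\|_2\|\mathbf{w}_2\|_2$.

The function names, trial count and height bound are ours.

```python
import random


def cross(u, v):
    """Standard cross product on R^3 (exact for integer input)."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dot(u, v):
    """Standard inner product."""
    return sum(x * y for x, y in zip(u, v))


def check_identities(trials=1000, height=50, seed=0):
    """Check the facts used above on random integer vectors of height <= `height`."""
    rng = random.Random(seed)

    def rand_vec():
        return tuple(rng.randint(-height, height) for _ in range(3))

    for _ in range(trials):
        w1, w2, x = rand_vec(), rand_vec(), rand_vec()
        c = cross(w1, w2)  # integer components c_1, c_2, c_3
        assert dot(c, w1) == 0 and dot(c, w2) == 0  # orthogonal to w1 and w2
        # cyclic symmetry: a.(b x c) = c.(a x b)
        assert dot(w1, cross(w2, x)) == dot(x, cross(w1, w2))
        # Lagrange: |w1 x w2|^2 = |w1|^2 |w2|^2 - (w1.w2)^2 <= |w1|^2 |w2|^2
        assert dot(c, c) == dot(w1, w1) * dot(w2, w2) - dot(w1, w2) ** 2
    return True
```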
If $NK > X^{2/3}$ (and $X$ is sufficiently large) then we see that $b_1/q$ and $b_2/q$ are the best rational approximations to $a_1/X$ and $a_2/X$ with denominator $O(X^{1/3})$, since the error in the approximation is $O(1/(qX^{2/3}))$. Thus if we also have $a_1, a_2 \in \mathcal{F}(Q, E)$, then we must have $q \gg Q$ and $|\nu_1|, |\nu_2| \sim E/X$. In particular, we must have the supremum is over all choices of $Q_1, G_1, G_2, D_0, D_1, E_0 \ge 1$ which are powers of 10 and satisfy $Q_1G_1G_2D_0D_1E_0 \ll X/NK$ and $G_1 \ll G_2$, and $S_1, S_2, S_3$ are given by Proof. By Lemma 14.1 we are considering pairs $(a_1, a_2) \in \mathcal{B}_1(N, K, \delta)$ such that for some $q \ll X/NK$ and $|\nu_1|, |\nu_2| \ll 1/NKq$.
By clearing common factors we may assume that $(b_1, b_2, q) = 1$. We let $g_1 = (b_1, q)$ and $g_2 = (b_2, q)$. By symmetry we may assume that $g_1 \le g_2$. We let $d_1$ be the part of $g_1$ not coprime to 10 (i.e. $d_1 \mid 10^u$ for some integer $u$, and $g_1 = g'_1d_1$ for some $(g'_1, 10) = 1$). Similarly, we let $d_0$ be the part of $q/g_1g_2$ which is not coprime to 10. To ease notation we let $b = 1$ and $(q', 10) = (g'_1, 10) = 1$. We split the contribution of pairs $(a_1, a_2) \in \mathcal{B}_1$ into $O(\log X)^5$ subsets. We consider terms where we have the restrictions We relax the restriction $|\nu_1|, |\nu_2| \ll 1/NKq$ to $|\nu_1|, |\nu_2| \le E_0/X$ for a suitable power of 10 $E_0 \asymp X/NKQ_0$ with $E_0 \ge 1$. We see there are $O(\log X)^5$ sets with such restrictions which cover all possible $(b_1, b_2, q, \nu_1, \nu_2)$, and hence all $(a_1, a_2) \in \mathcal{B}_1$. For simplicity, the reader might like to consider the special case $G_1 = G_2 = D_0 = D_1 = 1$ on a first reading.
To ease notation we let $\mathcal{V} = \{2^u5^v : u, v \in \mathbb{Z}_{\ge 0}\}$, and note that we have $d_0, d_1 \in \mathcal{V}$. By summing over all possibilities of $q', g'$ where the supremum is over all choices of $Q_1, G_1, G_2, D_0, D_1, E_0 \ge 1$ which are powers of 10 and satisfy $Q_1G_1G_2D_0D_1E_0 \ll X/NK$ and $G_1D_1 \ll G_2$, and $S_0$ is given by In $S_0$, we have used $\sum'$ to indicate that the summation is further constrained by the conditions which we suppressed for notational simplicity. We see that $g'_1, g_2, b'_1, b'_2, \nu_1, \nu_2$ each occur in only one of the two $F_X$ terms, and so, given $d_0, d_1, q'$, the remaining summation in $S_0$ factors into a product of two sums. Taking a supremum over all choices of $q'$ in the first of these then gives (14.2) $\sum_{(a_1,a_2)\in\mathcal{B}_1(N,K,\delta),\ a_1,a_2\in\mathcal{F}}$ The bound (14.2) will be useful when $Q_0$ is small. When $Q_0$ is large, however, it is wasteful to sum over all these possibilities, since we have not made use of the fact that $a_1, a_2 \in \mathcal{E}$, a small set. To obtain an alternative bound we first sum over all $a_1 \in \mathcal{E}$, then over all possibilities of $q, b_2, \nu_2$. This shows that where the supremum has the same constraints as before, and $S'_0$ is given by Here the summation in $S'_0$ is constrained by Again, taking a supremum over $q'$ and factorizing the summation, we find that (14.6) $S'_0 \ll S_1S_3$, where $S_1$ is as given by (14.3) above, and $S_3$ is given by Putting together (14.2), (14.5), (14.6) we obtain as required.
Lemma 14.4. Let $NK\ge X^{17/40}$ and let $S_1,S_2,S_3$ be as in Lemma 14.3. Let $Q_1,G_1,G_2,D_0,D_1,E_0\ge 1$ be powers of 10 which satisfy $Q_1G_1G_2D_0D_1E_0\ll X/NK$ and $G_1\ll G_2$. Then we have Proof. We first bound $S_1,S_2,S_3$ individually using Lemma 12.2, Lemma 10.6 and Lemma 10.7. We will then combine these bounds to give the desired result.
We first consider the quantity $N(a_1,d_0)$ occurring in $S_3$. If $q$ and $q'$ are both counted by $N(a,d)$, then there exist $b,g$ and $b',g'$ such that $(b,qdg)=(b',q'dg')=1$ and Here we used the fact that $E_0/X\ll 1/(NKQ_0)$. The variables we consider satisfy $q,q'\sim Q_1\ll Q_0/(G_1G_2D_0D_1)$, $g,g'\sim G_2$ and $d\sim D_0$. Thus such choices of $h$. Given $q,g,b,h$ with $(qg,b)=1$, we then see Since $q'g'\asymp qg$ and $b'\asymp b$, there are $O(1)$ choices of $b'$ and $q'g'$. Thus there are $O(Q_0^{\epsilon})$ such choices of $q',g',b'$ by the divisor bound. Thus we find that We recall that $Q+E\ll (X/NK)^2$, and so this gives as required.
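The divisor-bound step above can be made concrete: once the product $m=q'g'$ is fixed, the pair $(q',g')$ is determined by the divisor $q'$ of $m$, so there are exactly $\tau(m)$ choices, and $\tau(m)\ll_\epsilon m^{\epsilon}$. The sketch below (helper names are hypothetical) counts these factorizations by brute force and checks them against a trial-division divisor count. It illustrates the counting only, not the estimate $\tau(m)\ll_\epsilon m^\epsilon$ itself.

```python
def divisor_count(m: int) -> int:
    """Number of positive divisors tau(m), by trial division up to sqrt(m)."""
    count, d = 0, 1
    while d * d <= m:
        if m % d == 0:
            count += 1 if d * d == m else 2   # counts both d and m // d
        d += 1
    return count


def factor_pairs(m: int) -> list[tuple[int, int]]:
    """All ordered pairs (x, y) of positive integers with x * y == m."""
    return [(x, m // x) for x in range(1, m + 1) if m % x == 0]


# With m = q'g' fixed, each divisor q' of m gives exactly one pair (q', g').
m = 7560  # = 2^3 * 3^3 * 5 * 7, so tau(m) = 4 * 4 * 2 * 2
print(len(factor_pairs(m)), divisor_count(m))  # 64 64
```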

Line Estimates
In this section we establish Proposition 13.4, which controls the contribution from pairs of angles which cause a large contribution to the bilinear sums considered in Section 13 to come from a line. If a line $L$ makes a large contribution, then $(a_1,a_2,X)$ must lie close to the low height plane orthogonal to this line. We note that we do not make use of the fact that these angles lie outside the major arcs, but it is vital that the angles are restricted to the small set $E$.
Lemma 15.2 (Sparse sets restricted to low height planes). Let $C\subseteq[0,X)$ be a set of integers. Then we have for any $V\ge 1$ # $(a_1,a_2)\in C^2$ : Proof. Trivially there are $O(\#C^2)$ choices of $a_1,a_2\in C$, which gives the required bound if $V>\#C^{3/8}$. In particular, we may assume that $V<\#C\le X$. There are $O(\#C)$ points with $a_1=0$ or $a_2=0$, so we may assume that $a_1,a_2\neq 0$.
We first claim that there are Since there are no non-zero solutions to $v_3X+v_4=0$, this is non-zero and so there are $O(X^{\epsilon})$ choices of $v_2,a_2$. The other cases are entirely analogous. Thus it suffices to consider pairs $(a_1,a_2)$ such that $v_1a_1+v_2a_2+v_3X+v_4=0$ for some $v_1,v_2,v_3,v_4$, all non-zero. We let $C_2$ denote the set of such pairs.
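For small examples, the pairs considered in Lemma 15.2 can be found by brute force. The sketch below assumes, consistently with the vector $(a_1,a_2,X,1)$ appearing in the proof of Proposition 13.4, that a pair lies on a plane of height at most $V$ when $v_1a_1+v_2a_2+v_3X+v_4=0$ for some non-zero $(v_1,v_2,v_3,v_4)\in\mathbb{Z}^4$ with every $|v_i|\le V$. This reading, the helper names and the example set are our own. Since $v_4$ is forced by $(v_1,v_2,v_3)$, only $(2V+1)^3$ candidates are checked per pair.

```python
from itertools import product


def on_low_height_plane(a1: int, a2: int, X: int, V: int) -> bool:
    """Return True if some non-zero v = (v1, v2, v3, v4) in Z^4 with
    max |v_i| <= V satisfies v1*a1 + v2*a2 + v3*X + v4 == 0."""
    for v1, v2, v3 in product(range(-V, V + 1), repeat=3):
        v4 = -(v1 * a1 + v2 * a2 + v3 * X)   # v4 is determined by the equation
        if abs(v4) <= V and (v1, v2, v3, v4) != (0, 0, 0, 0):
            return True
    return False


def count_low_height_pairs(C, X: int, V: int) -> int:
    """Brute-force count of pairs (a1, a2) in C x C lying on such a plane."""
    return sum(on_low_height_plane(a1, a2, X, V) for a1 in C for a2 in C)


# Hypothetical example: C = integers in [0, 100) whose two digits
# (leading zero included) avoid 7, with height bound V = 1.
X = 100
C = [n for n in range(X) if "7" not in f"{n:02d}"]
print(len(C), count_low_height_pairs(C, X, 1))
```

Pairs with $a_1=0$, $a_2=0$ or $a_1=a_2$ are always detected (via $\mathbf{v}=(1,0,0,0)$, $(0,1,0,0)$ or $(1,-1,0,0)$), matching the trivial cases removed at the start of the proof.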
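Proof of Proposition 13.4. We wish to show that
$$\sum_{\substack{(a_1,a_2)\in B_2(N,K,\delta)\\ a_1,a_2\in E'}} F_X\Big(\frac{a_1}{X}\Big)F_X\Big(\frac{a_2}{X}\Big)\ll \frac{X^{1-\epsilon}}{NK}$$
in the region $N\gg X^{9/25}$. We recall that Here we have written $\mathbf{a}$ for the vector $(a_1,a_2,X,1)\in\mathbb{Z}^4$.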
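The sum above involves $F_X(a_1/X)F_X(a_2/X)$, whose definition is not restated in this section. As an assumption for this sketch, take $X=10^k$ and $F_X(\theta)=X^{-\log 9/\log 10}\,\big|\sum_{n<X,\ n\in\mathcal{A}}e(n\theta)\big|$, where $\mathcal{A}$ consists of the integers whose $k$ decimal digits (leading zeros included) all differ from $a_0$. The explicit analytic description mentioned in the introduction is then the product formula $F_X(\theta)=\prod_{0\le i<k}\frac19\big|\sum_{d\ne a_0}e(d\,10^i\theta)\big|$, since $X^{\log 9/\log 10}=9^k$. The code (hypothetical helper names) checks this factorization numerically against a brute-force sum.

```python
import cmath
import math


def e(x: float) -> complex:
    """The additive character e(x) = exp(2*pi*i*x)."""
    return cmath.exp(2j * math.pi * x)


def avoids_digit(n: int, k: int, a0: int) -> bool:
    """True if none of the k decimal digits of n (leading zeros included) equals a0."""
    for _ in range(k):
        if n % 10 == a0:
            return False
        n //= 10
    return True


def F_direct(theta: float, k: int, a0: int) -> float:
    """9^(-k) * |sum of e(n*theta) over restricted-digit n < 10^k|, by brute force."""
    total = sum(e(n * theta) for n in range(10 ** k) if avoids_digit(n, k, a0))
    return abs(total) / 9 ** k


def F_product(theta: float, k: int, a0: int) -> float:
    """The same quantity via the digit-by-digit product formula."""
    result = 1.0
    for i in range(k):
        digit_sum = sum(e(d * 10 ** i * theta) for d in range(10) if d != a0)
        result *= abs(digit_sum) / 9
    return result


theta = math.sqrt(2) % 1
print(F_direct(theta, 4, 7), F_product(theta, 4, 7))
```

The product form costs $O(k)$ operations instead of $O(9^k)$, which is why the Fourier transform of $\mathcal{A}$ is so convenient to work with.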
Since $NK\gg X^{57/80}/B$, we have $X/NK\ll X^{23/80}B$. Combining this bound with (15.7), we obtain bounds for $(\#E')^{5/4}B^{-2}X/NK$ and $(\#E')^{3/2}B^{-2}X^{-1/2}(X/NK)^2$ of the form $X^aB^b$ for some $b>0$. Since we are only considering $B\ll X^{23/80}$, these expressions are maximized when $B\asymp X^{23/80}$. When $B\asymp X^{23/80}$ we have $\#E'\ll X^{23/40}$ and $X/NK\ll X^{23/40}$. Thus we obtain the bounds
$$(\#E')^{5/4}B^{-2}\frac{X}{NK}\ll X^{23/32},\qquad (\#E')^{3/2}B^{-2}X^{-1/2}\Big(\frac{X}{NK}\Big)^2\ll X^{15/16}.$$
We can then verify that $2\times 9/25>23/32$ and that $3\times 9/25>15/16$, so for $N\gg X^{9/25}$ this is $O(X^{1-\epsilon}/NK)$, as required.

Modifications for Theorem 1.2
Theorem 1.2 follows from essentially the same overall approach as Theorem 1.1. We only provide a brief sketch of the proof, leaving the complete details to the interested reader. When $q$ is large, there is negligible benefit from using the $235/154$-th moment, so we just use $\ell^1$ bounds. For $Y=q^k$ a power of $q$, we let The inner sum is $\le\min(q-s,\ s+2/\|q^i\theta\|)$. Thus, similarly to Lemma 10.3, we find Using this bound, we get a corresponding improvement on (16.1), which gives If $s\le q-q^{57/80}$ and $q$ is sufficiently large in terms of $\epsilon$, this gives a bound $Y^{23/80+\epsilon}$. As before, using this bound in place of Lemma 10.3 and Lemma 10.4 throughout gives the result.
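The inner-sum bound quoted above can be tested numerically. We assume (our reading) that the inner sum is $\big|\sum e(n\alpha)\big|$ over the $q-s$ permitted base-$q$ digits $n$, with $\alpha=q^i\theta$. The trivial bound gives $q-s$; otherwise, writing the sum as the complete sum over $0\le n<q$ minus the $s$ excluded terms, and using $\big|\sum_{n<q}e(n\alpha)\big|=|\sin(\pi q\alpha)/\sin(\pi\alpha)|\le 1/(2\|\alpha\|)$, one gets at most $s+1/(2\|\alpha\|)\le s+2/\|\alpha\|$. The sketch checks this on random excluded digit sets; the helper names are hypothetical.

```python
import cmath
import math
import random


def norm(x: float) -> float:
    """Distance from x to the nearest integer, ||x||."""
    return abs(x - round(x))


def restricted_digit_sum(alpha: float, q: int, excluded: set[int]) -> float:
    """|sum of e(n*alpha) over base-q digits n in [0, q) not in `excluded`|."""
    return abs(sum(cmath.exp(2j * math.pi * n * alpha)
                   for n in range(q) if n not in excluded))


def claimed_bound(alpha: float, q: int, s: int) -> float:
    """min(q - s, s + 2/||alpha||), read as q - s when alpha is an integer."""
    dist = norm(alpha)
    return q - s if dist == 0 else min(q - s, s + 2 / dist)


random.seed(0)
worst = 0.0
for _ in range(2000):
    q = random.randint(2, 60)
    s = random.randint(0, q - 1)              # s excluded digits, q - s >= 1 remain
    excluded = set(random.sample(range(q), s))
    alpha = random.random()
    ratio = restricted_digit_sum(alpha, q, excluded) / claimed_bound(alpha, q, s)
    worst = max(worst, ratio)
print(f"largest ratio sum/bound over the trials: {worst:.3f}")
```

A ratio never exceeding 1 (up to rounding) is consistent with the bound; the factor 2 in $2/\|\alpha\|$ leaves considerable room.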

Acknowledgments
We thank Ben Green for introducing the author to this problem, Xuancheng Shao for useful discussions and Fabian Karwatowski for some important corrections. We also thank the anonymous referee for many helpful suggestions and corrections. The author is supported by a Clay Research Fellowship and a Fellowship by Examination of Magdalen College, Oxford. Part of this work was performed whilst the author was visiting Stanford University, whose hospitality is gratefully acknowledged.