A variation of the prime k-tuples conjecture with applications to quantum limits

Let $\mathcal{H}^{*}=\{h_1,h_2,\ldots\}$ be an ordered set of integers. We give sufficient conditions for the existence of increasing sequences of natural numbers $a_j$ and $n_k$ such that $n_k+h_{a_j}$ is a sum of two squares for every $k\geq 1$ and $1\leq j\leq k.$ Our method uses a novel modification of the Maynard-Tao sieve together with a second moment estimate. As a special case of our result, we deduce a conjecture due to D.~Jakobson which has several implications for quantum limits on flat tori.


Introduction
We say that a set $H = \{h_1, \ldots, h_k\}$ of distinct integers is admissible if $\#\{H \pmod p\} < p$ for every prime $p$. An outstanding problem in analytic number theory is the prime $k$-tuples conjecture, which asserts the following.

Conjecture 1.1. Let $H = \{h_1, \ldots, h_k\}$ be admissible. Then there exist infinitely many integers $n$ such that the translates $n + h_1, \ldots, n + h_k$ are prime.
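Admissibility is a finite check: for a prime $p > k$ the $k$ residues $h_i \bmod p$ can never cover all $p$ classes, so only primes $p \le k$ need testing. A minimal sketch (the function name is ours, not the paper's):

```python
def is_admissible(H):
    """Return True if H occupies fewer than p residue classes mod p
    for every prime p; only primes p <= len(H) can possibly fail."""
    k = len(H)
    for p in range(2, k + 1):
        if all(p % q for q in range(2, p)):  # trial-division primality test
            if len({h % p for h in H}) == p:
                return False
    return True
```

For example, `is_admissible([0, 2, 6])` holds, while `[0, 2, 4]` fails because it covers every residue class mod 3.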
A proof of this conjecture is far out of reach of current techniques. However, various weak versions of this result have been established using sieve methods. For example, the Maynard-Tao sieve can be used to show that, when $k$ is sufficiently large, $\gg \log k$ of the translates are simultaneously prime infinitely often (cf. [9, 11]).
We extend the definition of admissibility to infinite ordered sets and say $\mathcal{H}^* = \{h_1, h_2, \ldots\}$ is admissible if the finite truncation $\{h_1, \ldots, h_k\} \subseteq \mathcal{H}^*$ is admissible for every $k \ge 1$. In this paper we are interested in the following variation of this conjecture, for numbers representable as a sum of two squares.

Conjecture 1.2. Let $\mathcal{H}^* = \{h_1, h_2, \ldots\}$ be admissible. Then there exists an increasing sequence of integers $n_k$ such that, for every $k \ge 1$, the translates $n_k + h_1, \ldots, n_k + h_k$ are sums of two squares.
We remark that if we replaced "sums of two squares" with "prime" here, then this would simply be a reformulation of Conjecture 1.1. (It is easy to show that any finite admissible set can be extended to an infinite admissible set.) Our interest in this version of the conjecture stems from a problem which appears towards the end of D. Jakobson's paper "Quantum limits on flat tori" [8]. There Jakobson is concerned with characterising the possible quantum limits that can arise on the standard flat $d$-dimensional torus $\mathbb{T}^d = \mathbb{R}^d / \mathbb{Z}^d$. A complete classification of such objects is established in two dimensions, with possible behaviours in higher dimensions described unconditionally for $d \ge 4$, and conditionally for $d = 3$ on a weak version of Conjecture 1.2 (cf. [8, Conjecture 8.2]).
In this paper we establish Jakobson's conjecture.
Theorem 1.3. There exist increasing sequences of natural numbers $a_j$ and $M_k$ such that $M_k - a_j^2$ is a sum of two squares for $1 \le j \le k$. Moreover, the sequence $a_j$ is such that:
(1) $r_2(a_j) < r_2(a_{j+1})$ for all $j \ge 1$.
(2) The even parts are uniformly bounded; that is to say, if we write $a_j = 2^{b_j} m_j$ where $(m_j, 2) = 1$, then $b_j = O(1)$ uniformly for $j \ge 1$.
Here, $r_2(n)$ denotes the number of representations of $n$ as a sum of two squares. We deduce Theorem 1.3 from the following general result (Theorem 1.4): there exist increasing sequences of natural numbers $a_j$ and $n_k$ such that $n_k + h_{a_j}$ is a sum of two squares for every $k \ge 1$ and $1 \le j \le k$.
As mentioned above, Theorem 1.3 allows us to conclude results about quantum limits on flat tori. Let $(\lambda_j)_{j \ge 1}$ be a sequence of eigenvalues of the Laplace operator $\nabla$ on $\mathbb{T}^d$ such that $\lambda_j \to \infty$, and let $\varphi_j$ be corresponding eigenfunctions with $\|\varphi_j\|_2 = 1$. If the sequence of probability measures $d\mu_j = |\varphi_j|^2 \, dx$ has a weak-* limit $dv$, then we call $dv$ a quantum limit. (Here $dx$ is the normalised Riemannian volume.) It can be shown that all limits of such sequences $d\mu_j$ are absolutely continuous with respect to the Lebesgue measure on $\mathbb{T}^d$. Among other things, Jakobson shows that in two dimensions all quantum limits are necessarily trigonometric polynomials (cf. [8, Theorem 1.2]). The same result is not true for $d \ge 4$, and conjecturally not true for $d = 3$ either (cf. [8, Conjecture 8.2] and the following discussion). With Theorem 1.3, we can now complete this aspect of the classification of quantum limits on flat tori.

Theorem 1.5. There exist quantum limits on $\mathbb{T}^3$ that are not trigonometric polynomials.
As further consequences of Theorem 1.3 we are able to show the following results for quantum limits whose Fourier expansions are described as in (1.1).

Theorem 1.6. Let $\epsilon > 0$. We have the following.
(i) For $d \ge 4$ there exist quantum limits $dv$ on $\mathbb{T}^d$ with densities that are not in $\ell^{2-\epsilon}$ (i.e. for which $\sum_\tau |c_\tau|^{2-\epsilon}$ diverges).
(ii) For $d \ge 5$ there exist quantum limits $dv$ on $\mathbb{T}^d$ satisfying the stated lower bound, where $\Sigma(\rho)$ is defined as in (1.2).

The results contained in Theorem 1.6 improve upon various results found in [8]. Part (i) was previously shown for $d \ge 5$, and has now been extended to the case $d = 4$, where it is optimal (cf. [8, Theorem 1.4]). Part (ii) improves on the weaker lower bound
$$\limsup_{\rho \to \infty} \frac{\Sigma(\rho)}{\rho^{d-5-\epsilon}} = +\infty,$$
which was shown for $d \ge 6$. The lower bound we prove is believed to be optimal for all $d \ge 5$ (cf. [8, Proposition 1.2] and the comments shortly after).
Remark 1.7. It is well-known that the eigenvalues of $\nabla$ on $\mathbb{T}^d$ are the numbers $4\pi^2 k$ for non-negative integers $k$, and that they occur with multiplicity $r_d(k)$ (the number of representations of $k$ as a sum of $d$ squares). This means various constructions associated to quantum limits on flat tori can be translated into problems in number theory involving sums of squares.

Remark 1.8. Jakobson shows how Theorem 1.3 follows from a weak form of the prime $k$-tuples conjecture, essentially by using the fact that primes $p \equiv 1 \pmod 4$ are sums of two squares (cf. the discussion at the end of [8, Section 8]). We note that the weak form of the conjecture Jakobson uses is still far out of reach of current methods.
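Both facts in the remarks above can be checked directly for small numbers with a brute-force count of $r_2(n)$ (the helper name below is ours): every prime $p \equiv 1 \pmod 4$ up to a small bound is a sum of two squares, and no prime $p \equiv 3 \pmod 4$ is.

```python
from math import isqrt

def r2(n):
    """Count representations n = x^2 + y^2 over all of Z^2
    (signs and order included), matching the paper's r_2(n)."""
    if n == 0:
        return 1
    count = 0
    for x in range(-isqrt(n), isqrt(n) + 1):
        y2 = n - x * x
        y = isqrt(y2)
        if y * y == y2:
            count += 2 if y > 0 else 1  # count y and -y, or just y = 0
    return count

# Fermat: an odd prime is a sum of two squares iff p = 1 (mod 4)
primes = [p for p in range(3, 200) if all(p % q for q in range(2, p))]
assert all(r2(p) > 0 for p in primes if p % 4 == 1)
assert all(r2(p) == 0 for p in primes if p % 4 == 3)
```

For instance `r2(5)` returns 8, from $(\pm 1)^2 + (\pm 2)^2$ and $(\pm 2)^2 + (\pm 1)^2$.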

Outline of new sieve ideas
In this section, let $A \subseteq \mathbb{N}$ denote a set of arithmetic interest, which for our purposes is the set of numbers representable as a sum of two squares (though the following discussion holds more generally). We will denote random variables by boldfaced letters, for example $\mathbf{X}$. We let $\mathbb{P}(\cdot)$ denote a probability measure and $\mathbb{E}[\cdot]$ the expectation operator.

2.1. A model problem.
Our aim is to prove Theorem 1.4. By a pigeonhole argument (see Proposition 5.1), it suffices to consider the following model problem.
Model Problem. Fix an admissible set $\mathcal{H}^* = \{h_1, h_2, \ldots\}$ of integers and a partition $\mathcal{H}^* = B_1 \cup B_2 \cup \cdots$, where each bin $B_i$ is of a fixed, finite size $k_i$. Is it the case that for every $M \ge 1$ there exist elements $h_{a_1}, \ldots, h_{a_M}$ and infinitely many integers $n$ such that $h_{a_j} \in B_j$ and $n + h_{a_j} \in A$ for $1 \le j \le M$?

We realise the above set-up as the output of a sieving process. For notational purposes we order $B_i = \{h_{k_0 + \cdots + k_{i-1} + 1}, \ldots, h_{k_0 + \cdots + k_i}\}$ for $i \ge 1$, with the convention that $k_0 = 0$. Let $k = k_0 + \cdots + k_M$ for some large $M$. Given $n \in [N, 2N)$ for some large $N$, let $\mathbf{X}_i$ denote the random variable that counts the number of $h \in B_i$ such that $n + h \in A$, and let $\mathbf{X} = \mathbf{X}_1 + \cdots + \mathbf{X}_M$.
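This set-up can be made concrete in a toy computation, with $A$ the sums of two squares and small hypothetical bins (the function names and bin contents below are ours, purely for illustration):

```python
from math import isqrt

def in_A(n):
    """Membership in A: n is representable as a sum of two squares."""
    return n >= 0 and any(isqrt(n - x * x) ** 2 == n - x * x
                          for x in range(isqrt(n) + 1))

# hypothetical bins B_1, B_2 drawn from some admissible H*
bins = [[0, 4], [8, 16, 24]]

def counts(n):
    """Return (X_1, ..., X_M): X_i counts the h in B_i with n + h in A."""
    return [sum(in_A(n + h) for h in B) for B in bins]
```

For the shift $n = 1$ every translate $1, 5, 9, 17, 25$ is a sum of two squares, so `counts(1)` returns `[2, 3]`: each bin receives an accepted translate.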
The current method used to detect primes in $k$-tuples is the GPY method. For a general set $A$, the aim is to show that the first moment inequality
$$\sum_{N \le n < 2N} w(n) \left( \sum_{i=1}^{k} 1_A(n + h_i) - m \right) > 0$$
holds for some integer $m \ge 1$, where $1_A$ denotes the indicator function of the set $A$ and the $w(n)$ are non-negative weights (cf. [4, 9, 11]). If we normalise the weights to sum to 1, then this is saying "if we choose $n$ randomly from the interval $[N, 2N)$ with probability $w(n)$, then $\mathbb{E}[\mathbf{X}] > m$." From this we can deduce the existence of an $n \in [N, 2N)$ for which $m + 1$ of the translates $n + h_i$ lie in $A$. We say such a translate has been "accepted." Exactly which translates are accepted is unknown; this is a limitation of the first moment method. It is clear that for our model problem we require more information about which translates $n + h$ appear: namely, we need to obtain an accepted translate from each of the $M$ bins $B_1, \ldots, B_M$ (recall $k = k_0 + \cdots + k_M$). This presents two obvious difficulties.
(1) For any $1 \le i \le k$ the probability of the event $n + h_i \in A$ depends on $k$, and tends to 0 as $k \to \infty$. This would mean any bin of fixed size expects to receive fewer and fewer elements as $k$ gets large. In particular, we cannot hope that the hypotheses hold for every $M \ge 1$.
(2) We cannot guarantee that each bin receives an accepted translate unless we input some information about the joint distribution of the bins.
We are able to overcome these issues by modifying the sieve weights and using a second moment estimate.

2.2. Choice of sieve weights. We solve the first problem by modifying the sieve weights to put more emphasis on the earlier bins. This way, we can guarantee that $\mathbb{P}(n + h \in A \mid h \in B_i) = c_i$, where the constant $c_i$ depends solely on the bin. This also means that we can guarantee $\mathbb{E}[\mathbf{X}_i]$ is large for each $i$ (provided $k_i$ is large enough in terms of $c_i$). We will consider Maynard-Tao sieve weights with a fixed factorisation, where each $f_i$ is a suitable smooth function supported on a simplex. Here $(\beta_i)_{i \ge 1}$ is a sequence of real numbers such that $\sum_{i=1}^{\infty} \beta_i \le 1$ (cf. the sieve weights defined in [9, Proposition 4.1]). We will take $\beta_i = 2^{-i}$, and in this instance one might say "we have allocated 50% of the sieve power to $B_1$."

2.3. Concentration of measure. We can deal with the second problem by showing the random variables $\mathbf{X}_i$ exhibit "enough" independence. This is precisely what concentration of measure arguments are used for. For example, an application of the union bound and Chebyshev's inequality controls the probability that some bin receives no accepted translate, provided we can evaluate second moment sums of the form (2.6), where $\rho_A$ is a non-negative function supported on $A$. This means our methods are limited to cases in which estimates of this type are known. In particular, we cannot deal with the case of primes, as evaluating such a sum asymptotically with $1_P$ or the von Mangoldt function $\Lambda$ (say) is equivalent to the twin prime conjecture.

2.4. A weighted representation function. One can do much better when working with sums of two squares. We cannot evaluate (2.6) asymptotically using the indicator function, but we can if we work with the representation function $r_2(n)$ instead. Unfortunately $r_2(n)$ is too large for our purposes, and it proves necessary to consider a weighted version instead. In his work [7] on the distribution of numbers representable as a sum of two squares, Hooley considers a weighted representation function $\rho(n) = t(n) r_2(n)$, where $\theta_1$ is a suitably small, fixed constant (for example Hooley takes $\theta_1 = 1/20$). Here $g_2(p)$ is the multiplicative function appearing in the definition of $t(n)$. The $t(n)$ factor acts to dampen the oscillations due to $r_2(n)$. Thus $\rho(n)$ acts as a proxy for the indicator function of sums of two squares, and moreover asymptotics for (2.6) are available for $\rho(n)$. This is the function we will be working with.

2.5. Outline of the paper. In Section 3 we deduce the results about quantum limits contained in Theorems 1.5 and 1.6 from Theorem 1.3. In Section 5 we state a few preliminary lemmas that will be needed in the sieve calculations, deferring their proofs to the Appendix. In Section 6 we state our main sieve results, and from them we deduce Theorem 1.4. We isolate a key lemma (see Lemma 6.6) from which all of our sieve estimates follow; Sections 7 and 8 are dedicated to proving this lemma.

Proofs of quantum limit results
In this section we deduce the results of Theorem 1.5 and Theorem 1.6 from Theorem 1.3. Following [8], we note that $\varphi_k$ is an eigenfunction of the Laplacian on $\mathbb{T}^d$ with eigenvalue $\lambda_k = 4\pi^2 n_k$ for some $n_k \in \mathbb{N}$ if and only if its Fourier expansion is of the form
$$\varphi_k(x) = \sum_{|\xi|^2 = n_k} a_\xi e^{2\pi i \langle \xi, x \rangle}$$
for coefficients $a_\xi \in \mathbb{C}$. Moreover $\|\varphi_k\|_2 = 1$ if and only if $\sum_\xi |a_\xi|^2 = 1$. It follows that
$$|\varphi_k(x)|^2 = \sum_{\tau} b_\tau(k) e^{2\pi i \langle \tau, x \rangle}, \qquad b_\tau(k) = \sum_{\substack{|\xi|^2 = |\xi'|^2 = n_k \\ \xi - \xi' = \tau}} a_\xi \overline{a_{\xi'}}. \tag{3.2}$$
Let $dv$ be a quantum limit on $\mathbb{T}^d$ with Fourier expansion as in (1.1). By $|\varphi_k|^2 \, dx \to dv$ weak-* as $k \to \infty$ we mean that for every $\tau \in \mathbb{Z}^d$ we have $c_\tau = \lim_{k \to \infty} b_\tau(k)$. Fix $a_1 < a_2 < \cdots$ and $M_1 < M_2 < \cdots$ as in the statement of Theorem 1.3, and let $b_\tau(k)$ be as in (3.2). Let $0 < \epsilon < 2$ and let $F = F_\epsilon : \mathbb{N} \to \mathbb{N}$ be a rapidly increasing function whose rate of growth will be specified later. As we are assuming both $a_i \to \infty$ and $r(a_i) \to \infty$, by passing to a subsequence if necessary (and relabelling the indices of the sequence $M_k$), we may suppose $a_i, r(a_i) \gg F(i)$.
We will require information about the number of integer points on the surface of the $d$-dimensional sphere. For this we recall the following results: writing $n = 2^k m$ and letting $\sigma(n) = \sum_{d \mid n} d$ denote the sum-of-divisors function, we have the relevant identities for $r_d$. Here $C_d(n^2)$ is a singular series which satisfies $C_d(n^2) \asymp_d 1$.
We prove each statement similarly -in each case we consider a suitable sequence of L 2 -normalised eigenfunctions with eigenvalues λ k = 4π 2 M k and show that the limit has the desired property.
Proof of (Theorem 1.3 ⇒ Theorem 1.5). Consider the sequence of $L^2$-normalised eigenfunctions on $\mathbb{T}^3$ that arise from a suitable choice of coefficients: because the $a_i$ are distinct, the only contribution to the sum comes from the corresponding terms, which proves the theorem.
Proof of (Theorem 1.3 ⇒ Theorem 1.6). It suffices to prove part (i) for $d = 4$, by identifying the eigenfunctions on $\mathbb{T}^d$ with the eigenfunctions on $\mathbb{T}^{d+l}$ all of whose non-zero frequencies lie in the subspace $\{(x_1, \ldots, x_{d+l}) : x_{d+1} = \cdots = x_{d+l} = 0\}$. Consider the sequence of $L^2$-normalised eigenfunctions on $\mathbb{T}^4$ that arise by choosing coefficients which are non-zero precisely at frequencies $(X, Y, \ldots)$ with $X^2 + Y^2 = a_j^2$ for some $j$, and which vanish otherwise.
Fix $i$ and suppose $k \ge i$. Given two non-zero coefficients $a_\xi, a_{\xi'}$ corresponding to vectors of the form above, the difference $\tau = \xi - \xi'$ has norm $\le 2a_i$ by the triangle inequality. From (3.2), it follows that if we sum $b_\tau(k)$ over all $|\tau| \le 2a_i$ then we pick up all such differences. There are $r(a_i^2)^2$ of them, leading to the stated lower bound. Taking the limit as $k \to \infty$, we can now choose $F = F_\epsilon$ so that the expression on the right hand side is unbounded as $i \to \infty$. It follows that $\sum_\tau |c_\tau|^{2-\epsilon}$ does not converge, proving part (i). For part (ii), fix $d \ge 5$. We proceed as in part (i), except this time because $d \ge 5$ we have the lower bound $r_{d-2}(a_i^2) \gg_d a_i^{d-4}$. We remark that to obtain this bound for $d \in \{5, 6\}$ we are using property (2) given by Theorem 1.3; for $d \ge 7$ the bound holds without this extra assumption on our sequence. Now consider the corresponding sequence of eigenfunctions on $\mathbb{T}^d$.
Fix $i$ and suppose $k \ge i$. As above, we can conclude the analogous estimate, and take the limit as $k \to \infty$.

It follows that
Choosing $F = F_\epsilon$ appropriately and letting $i \to \infty$, we see that for this choice of quantum limit the claimed lower bound holds, where $\Sigma(\rho)$ is defined as in (1.2). This proves part (ii).

Notation
We will use both Landau and Vinogradov asymptotic notation throughout the paper. $N$ will denote a large integer, and all asymptotic notation is to be understood as referring to the limit $N \to \infty$. Any dependence of the implied constants on other parameters $A$ will be denoted by a subscript, for example $X \ll_A Y$ or $X = O_A(Y)$, unless stated otherwise. We let $\epsilon$ denote a small positive constant, and we adopt the convention that it is allowed to change at each occurrence, even within a line.
We will denote the non-trivial Dirichlet character (mod 4) by $\chi_4$, and we may omit the subscript and simply write $\chi$. As usual, we let $\varphi(n)$ denote the Euler totient function, $\tau_r(n)$ the number of ways of writing $n$ as a product of $r$ natural numbers, $\mu(n)$ the Möbius function, and $r_d(n)$ the number of representations of $n$ as a sum of $d$ squares. For the rest of the paper we write $r(n)$ when $d = 2$. For integers $a, b$ we let $(a, b)$ denote their highest common factor, and $[a, b]$ their lowest common multiple.
We define the Ramanujan-Landau constant
$$A = \frac{1}{\sqrt{2}} \prod_{p \equiv 3 \ (\mathrm{mod}\ 4)} \left(1 - \frac{1}{p^2}\right)^{-1/2}, \tag{4.1}$$
which will appear in many of our results.
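For reference, the Landau-Ramanujan constant $K = \tfrac{1}{\sqrt{2}} \prod_{p \equiv 3 \,(4)} (1 - p^{-2})^{-1/2} \approx 0.7642$ can be approximated by truncating its Euler product; whether this matches the paper's exact normalisation of $A$ is an assumption here (cf. Hooley's slightly different definition noted after Lemma 5.6):

```python
from math import sqrt

def landau_ramanujan(limit=100_000):
    """Approximate K = (1/sqrt(2)) * prod_{p <= limit, p = 3 (mod 4)}
    (1 - p^{-2})^{-1/2} by truncating the Euler product."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(sieve[i * i :: i]))
    product = 1.0
    for p in range(3, limit + 1, 4):  # runs through p = 3 (mod 4)
        if sieve[p]:
            product *= (1.0 - p ** -2) ** -0.5
    return product / sqrt(2)
```

The tail of the product over $p > 10^5$ contributes a factor $1 + O(10^{-6})$, so the truncation already gives $K \approx 0.76422$.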

Preliminaries
In this section, we formalise some of the notions discussed in Section 2, and state a few key estimates that will be required later in the sieve calculations.

5.1. A pigeonhole argument. The following proposition allows us to go from the set-up in Theorem 1.4 to the model problem discussed in Section 2.

Proposition 5.1. Suppose there is a partition $\mathcal{H}^* = B_1 \cup B_2 \cup \cdots$, where each bin $B_i$ is of a fixed, finite size, such that for every $M \ge 1$ there exist infinitely many $n$ and $M$ translates $n + h_{i,M} \in A$ with $h_{i,M} \in B_i$ for $1 \le i \le M$. Then there exist increasing sequences $a_j$ and $n_k$ such that for every $k \ge 1$ we have $n_k + h_{a_j} \in A$ for $1 \le j \le k$, and moreover $h_{a_j} \in B_j$ for all $j$.
Proof. With the above set-up, obtain translates $n + h_{i,M} \in A$ with $h_{i,M} \in B_i$ for $1 \le i \le M$, for each $M \ge 1$, and record this process in an infinite table whose $M$-th row lists $h_{1,M}, \ldots, h_{M,M}$. Look at the first column. By the pigeonhole principle, since $B_1$ is finite, there must exist an element $h_{a_1} \in B_1$ which appears infinitely many times. Choose the smallest such $h_{a_1}$, and choose any $n_1 \in \mathbb{N}$ for which $n_1 + h_{a_1} \in A$. Now erase all the rows that do not start with $h_{a_1}$, and look at the remaining (infinite) table. Again, since $B_2$ is finite, some element $h_{a_2} \in B_2$ must occur infinitely many times in the second column. Choose the smallest such $h_{a_2}$, and choose any $n_2 > n_1$ such that $n_2 + h_{a_2} \in A$ (which we can do because there are infinitely many such $n_2$). By construction this $n_2$ also satisfies $n_2 + h_{a_1} \in A$. Now erase all rows that do not start with $h_{a_1}, h_{a_2}$, repeat this process for $B_3$, and so on. We end up with increasing sequences $a_j$ and $n_k$ which by construction satisfy the required conditions.
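The finite mechanics of this table argument can be sketched as follows; the function name and the toy table are ours, and "appears infinitely often" is replaced by "appears most often" in this finite analogue:

```python
from collections import Counter

def diagonal_extract(rows, depth):
    """Finite analogue of the table argument: for each column j in turn,
    keep the smallest entry occurring most often among surviving rows,
    then discard every row not carrying that entry."""
    chosen = []
    for j in range(depth):
        freq = Counter(row[j] for row in rows)
        top = max(freq.values())
        h = min(v for v, c in freq.items() if c == top)
        chosen.append(h)
        rows = [row for row in rows if row[j] == h]
    return chosen

# toy table: row M records the translates found for that value of M
table = [(0, 8), (4, 8), (0, 12), (0, 8), (4, 12), (0, 8)]
```

Here `diagonal_extract(table, 2)` returns `[0, 8]`; in the proof, finiteness of each bin guarantees that some entry genuinely occurs infinitely often at every stage.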

5.2. A second moment estimate. As discussed in Section 2, our work will require input about the joint distribution of the bins, which we will obtain via concentration of measure arguments. The following second moment estimate will suffice for our purposes.
then there exist an $n \in [N, 2N)$ and elements $h_{a_i} \in B_i$ such that $n + h_{a_i} \in A$ for $1 \le i \le M$.
Proof. By positivity we deduce the existence of an $n \in [N, 2N)$ for which the corresponding pointwise bound holds. If $n + h \notin A$ for all $h \in B_i$, then by the assumption on the support of $\rho_A$ the left hand side of the above expression is $\ge \mu_i^2 / t_i^2$, a contradiction. Thus, if the second moment estimate (5.1) holds for all $M \ge 1$ and all sufficiently large $N$, then we are in a situation where the hypotheses of Proposition 5.1 are satisfied.
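The probabilistic mechanism behind this reduction (union bound plus Chebyshev, as described in Section 2) can be illustrated numerically; the bin sizes $k_i$, acceptance probabilities $c_i$, and the independent binomial model below are hypothetical choices of ours, not the paper's actual distributions:

```python
import random

random.seed(0)

# hypothetical bins: size k_i doubles while the acceptance probability
# c_i halves, keeping each mean mu_i = k_i * c_i equal to 20
bins = [(40, 0.5), (80, 0.25), (160, 0.125), (320, 0.0625)]
mu = [k * c for k, c in bins]
var = [k * c * (1 - c) for k, c in bins]

# union bound + Chebyshev: P(some X_i = 0) <= sum_i Var(X_i) / mu_i^2
cheb = sum(v / m ** 2 for v, m in zip(var, mu))

trials = 2000
fails = 0
for _ in range(trials):
    X = [sum(random.random() < c for _ in range(k)) for k, c in bins]
    fails += any(x == 0 for x in X)
```

In this toy model the Chebyshev bound is about 0.15, while empirically every bin is hit in essentially every trial; keeping each $\mu_i$ large is what makes the union bound sum small.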

5.3. Estimates in arithmetic progressions. We require an understanding of how $\rho(n)$ and $\rho(n)\rho(n+h)$ behave in arithmetic progressions for our sieve calculations. Essentially this reduces to understanding the corresponding sums for $r(n)$ and $r(n)r(n+h)$, where the estimates we need are known with power-saving error terms. This means that the error terms in the sieve calculations can be bounded trivially (cf. the case of primes [4, 9], where one has to use equidistribution results such as the Bombieri-Vinogradov theorem to bound the error terms that arise).
We have the following lemmas. We note that the functions g 1 , . . . , g 7 defined in this section will be used frequently throughout the rest of the paper.
where $g_2$ is defined as in (2.8), $g_1$ is the multiplicative function defined on primes by $g_1(p) = 1 - \chi(p)/p$, and $g_3, g_4$ are the multiplicative functions defined on primes as stated, with $g_5(p), g_6(p)$ defined accordingly. We will prove each of these results in Appendix A. Lemma 5.3 follows from two known results. Lemma 5.4 follows by adapting the method used by Plaksin in [10], where a similar sum is considered. Finally, Lemma 5.5 can be shown using standard Perron's formula arguments, together with a fourth moment estimate for Dirichlet $L$-functions.
When finding the corresponding estimates for $\rho(n)$, the following auxiliary sums naturally appear (for the definitions of $W$, $W_1$ and $D_0$ see (6.1) below). Here $g_7$ is the multiplicative function defined on primes by $g_7(p) = p + 1$. The following lemma evaluates the auxiliary sums above.
Lemma 5.6 (Auxiliary estimates for $\rho(n)$). We have the stated asymptotic formulae, where $A$ is defined as in (4.1); in each case one may take the $o(1)$ term to be $O(D_0^{-1})$. We remark that the estimate for $X_{N,1}$ appears in Hooley's work (after correcting a misprint; cf. [7, Lemma 5], and note his slightly different definition of $A$). We prove Lemma 5.6 in Appendix B; each sum can be evaluated by the Selberg-Delange method.

The sieve set-up
We now state our sieve results and use them to deduce Theorem 1.4. For the rest of the paper $k$ is fixed, $H = \{h_1, \ldots, h_k\}$ is a fixed admissible set such that $4 \mid h_i$ for each $i$, and $N$ is sufficiently large in terms of any fixed quantity. We allow any of the constants hidden in the Landau notation to depend on $k$ without explicitly specifying so.
We will employ a $4W$-trick in our sieve calculations. Let $D_0 = (\log \log N)^3$ and $W = \prod_{p \le D_0} p$, so that $W \ll_\epsilon N^\epsilon$ for any fixed $\epsilon > 0$ by the prime number theorem. It will prove useful to define $W_1$ as in (6.1). We consider four types of sums, displayed in (6.6). Because $k$ is fixed, we may assume that $D_0$ is sufficiently large so that the required inequalities hold.

Remark 6.1. For the second moment estimate, it proves important to control the residue classes of the translates $n + h \pmod 4$; hence the condition $n \equiv 1 \pmod 4$ in our sieve sums, and also the assumption $4 \mid h$ for our admissible set. This is because of the inherent bias that numbers representable as a sum of two squares have modulo 4.
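The residue-class selection underlying the $W$-trick can be sketched as follows; the function name, the choice $D_0 = 7$, and the set $H = \{0, 4, 12\}$ are hypothetical illustrations of ours (the paper's $4W$-trick additionally imposes $n \equiv 1 \pmod 4$):

```python
from math import gcd, prod

def w_trick_residue(H, D0):
    """Find nu (mod W), with W the product of primes <= D0, such that
    every nu + h is coprime to W; admissibility of H guarantees such a
    residue class exists, by the Chinese remainder theorem."""
    primes = [p for p in range(2, D0 + 1) if all(p % q for q in range(2, p))]
    W = prod(primes)
    for nu in range(W):
        if all(gcd(nu + h, W) == 1 for h in H):
            return nu, W
    raise ValueError("H is not admissible at some prime <= D0")

nu, W = w_trick_residue([0, 4, 12], 7)  # W = 2 * 3 * 5 * 7 = 210
```

Restricting to $n \equiv \nu \pmod W$ removes the small primes from the sieve; here $\nu = 19$, and indeed $19, 23, 31$ are all coprime to $210$.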
Our first proposition evaluates these sums for general half-dimensional Maynard-Tao sieve weights. Fix $0 < \theta_1 < 1/18$ in the definition of $\rho(n)$ (see (2.7)), and define the normalisation constant $B$ as in (6.8). Then we have the evaluations stated in Proposition 6.2, from which one can deduce the corresponding results for the modification of the Maynard-Tao sieve described in Section 2.
where each $F_i$ is smooth and supported on the stated simplex. Proof. The hypotheses imply $F = \prod_{i=1}^M F_i$ is also smooth and supported on $\mathcal{R}_k$, and hence the results of Proposition 6.2 apply. It suffices to show the functionals factorise in the forms stated, because our set-up ensures that the supports agree. With the following lemma we will be in a position to prove Theorem 1.4.
Then for any $m, l$ we have the stated evaluations; in this case the functionals factorise completely and the lemma follows from the factorised form of the integrals.

Remark 6.5. We have restricted the support of our functions to the cube $[0, \beta_k]^k \subseteq \mathcal{R}_{k,\beta}$ so that the integrals can be evaluated exactly. This essentially means we are using weights of similar strength to the original GPY weights (cf. [4]). For the half-dimensional case one can show that for large $k$ these weights are essentially optimal. (In particular, following a similar optimisation process to that in [9, Section 7], one arrives at the same results as above.) We are now in a position to prove Theorem 1.4.
Proof of Theorem 1.4. Let $H = \{h_1, h_2, \ldots\}$ be a fixed admissible set. Fix real numbers $\theta_1, \theta_2$ subject to $0 < \theta_1 + \theta_2 < 1/18$ and define the corresponding constant. With notation as above, consider a partition into bins and weights $\lambda_{d_1,\ldots,d_k}$ as in Proposition 6.2. Expanding out (6.9), we can evaluate the resulting expression (where, by abuse of notation, we write $S$ for the corresponding sieve sums). Evaluating these sums using Proposition 6.3 and Lemma 6.4 yields an asymptotic, and hence (6.9) will be satisfied for all sufficiently large $N$ provided (6.10) holds. But now our choice of bins ensures that the required inequality holds, and so (6.10) is satisfied for all $M \ge 1$.
It remains to prove Proposition 6.2. Each sum can be treated similarly, and the following lemma handles all of them at once. First, given a function $F$ satisfying the hypotheses of Proposition 6.2, we define the relevant quantities. The lemma can now be stated as follows.
Lemma 6.6 (General sieve lemma). Let $J \subseteq \{1, \ldots, k\}$ (possibly empty), with weights $\lambda_{d_1,\ldots,d_k}$ defined as in Proposition 6.2. If $J = \emptyset$ we define $f(p) = 1/p$ (and there is no dependence on $g$ in the sum). Otherwise, $f$ and $g$ are non-zero multiplicative functions defined on primes as stated, and moreover we assume that $f(p) \neq 1/p$. We write $S_J$ for $S_{J,1,1,m}$. Suppose the $\lambda_{d_1,\ldots,d_k}$ satisfy the same hypotheses as in Proposition 6.2. Then for $|J| \in \{0, 1, 2\}$ we have the evaluations stated in (i)-(iii), where the integral operators are defined as in Proposition 6.2 above, and we write $L_J(F)$ as shorthand for $L_{k; j \in J}(F)$.
We now show how this implies Proposition 6.2.
Proof of (Lemma 6.6 ⇒ Proposition 6.2). We consider each sum in turn. First we note that, using the definition of $\lambda_{d_1,\ldots,d_k}$, the exact same calculation as in [9] gives a trivial bound for the weights. As mentioned in Section 5, because we can obtain power-saving in the error terms for the formulae stated there, this trivial bound will suffice for our purposes.
(i) Rewrite $S_1$ in the form of a sum of type $S_J$ with $|J| = 0$, and evaluate it according to Lemma 6.6. By definition of $\rho(n + h_m)$ this equals the displayed double sum. As $q \ll W R^2 \ll_\epsilon N^{\theta_2 + \epsilon}$ and $d \ll v R^2 \ll N^{\theta_1 + \theta_2}$, the inner sum can be evaluated. Bounding the sum over $a$ trivially by $v \log v$ and using (6.12), we see the error term contributes $O_\epsilon(N^{\frac{1}{3} + \frac{3}{2}(\theta_1 + \theta_2) + \epsilon})$, which is negligible. We obtain a main term in which $\log_v a$ is defined as in (5.2). The resulting sieve sum is of the form $S_J$ with $|J| = 1$, which we evaluate by Lemma 6.6. Recalling the definition of $B$ in (6.8) and evaluating $X_{N,W}$ according to Lemma 5.6, we obtain the claimed evaluation in terms of $L_{k;m}(F)$.
Similarly to the above, for a non-zero contribution we may restrict to the stated cases. We note that $q \ll_\epsilon N^{\theta_2 + \epsilon}$ and $d_1, d_2 \ll N^{\theta_1 + \theta_2}$. Using the fact that $\theta_1 + \theta_2 < 1/18$, we see the second error term in the definition of $R_2(N; d_1, d_2, q)$ dominates, and so the inner sum can be evaluated. Bounding the rest of the sum trivially, we obtain a total error of size $O_\epsilon(N^{\frac{5}{6} + 3\theta_1 + 2\theta_2 + \epsilon})$ which, again, is negligible in the range $\theta_1 + \theta_2 < 1/18$. We are left with a main term.
Here we have defined the relevant quantity as in (5.3). The main term is of the form $S_J$ for $|J| = 2$; by Lemma 6.6 it can be evaluated, and evaluating $Y_{N,W}$ as in Lemma 5.6, this simplifies to the claimed expression for $S_3^{(l,m)}$ in terms of $L_{k;m,l}(F)$.
(iv) Rewrite $S_4$ and expand out the definition of $\rho^2(n)$. We note that $q \ll_\epsilon N^{\theta_2 + \epsilon}$ and $d \ll N^{2\theta_1 + \theta_2}$, and so the inner sum can be evaluated. Bounding the rest of the sum trivially, we see the error term contributes $O_\epsilon(N^{\frac{3}{4} + 2(\theta_1 + \theta_2) + \epsilon})$, which is small. For the main term, let $T^{(p,i)}$ be the displayed sum; this is of the form $S_J$ for $|J| = 1$, and so by Lemma 6.6 it can be evaluated. To evaluate $T^{(p,i)}$ further, note that by inclusion-exclusion we can write it in terms of four sums $\Lambda_1, \ldots, \Lambda_4$. Estimating the sum over $D_0 < p \le v$, $p \equiv 3 \pmod 4$, we see that with our choice $D_0 = (\log \log N)^3$ the contributions from $\Lambda_2$ and $\Lambda_3$ are negligible. The only contribution to the main term therefore comes from the $\Lambda_1$ term, corresponding to $\log N$, and from $\Lambda_4$, leaving us with an expression in terms of $L_{k;m}(F)$.
Evaluating these according to Lemma 5.6 finishes the proof of Proposition 6.2.
Thus it remains to establish Lemma 6.6. First we require a few technical sieve lemmas. We list these in the following section.

Technical sieve lemmas
In the various sieve calculations that appear in the proof of Lemma 6.6, we will frequently encounter sums of the form
$$\sum_{\substack{n \le X \\ p \mid n \Rightarrow p \equiv 3 \ (\mathrm{mod}\ 4)}} f(n),$$
where $f$ is a multiplicative function satisfying $f(p) = O(1/p)$. We can evaluate sums of this type with the following lemmas. Suppose $\gamma$ satisfies the stated bounds for any $2 \le w \le z$. Let $g$ be the totally multiplicative function defined on primes by $g(p) = \frac{\gamma(p)}{p - \gamma(p)}$. Finally, let $G : [0, 1] \to \mathbb{R}$ be a piecewise differentiable function, and let $G_{\max} = \sup_{t \in [0,1]} (|G(t)| + |G'(t)|)$. Then the stated asymptotic formula holds, where the implied constant in the Landau notation is independent of $G$ and $L$.
Proof. This is [5, Lemma 4] with slight changes to notation.
To use this lemma in practice, we need to be able to evaluate the singular series $c_\gamma$ which appears. In the next lemma we do this for a function $\gamma(p)$ which covers the cases of interest to us.
where $A$ is the Ramanujan-Landau constant defined in (4.1).

Proof. The latter product is $1 + O(D_0^{-1})$ by our assumption $\alpha(p) = O(1/p)$.

The next lemma collects both of these results together. First we recall the definition of the normalising constant $B$ from (6.8).

Proof. Let $f(p) = 1/p + g(p)$ where $g(p) = O(1/p^2)$, and consider the function $\gamma$ defined on primes accordingly. With this choice of $\gamma(p)$ we can apply Lemma 7.1, taking $L \ll 1 + \log D_0$ and $A_2$ a suitable constant. We can evaluate $c_\gamma$ by Lemma 7.2. When we substitute this back into our expression, we see the error incurred here contributes $O(G_{\max} B / D_0)$ and dominates. The result follows as $\Gamma(1/2) L(1, \chi)^{1/2} = \sqrt{\pi} \cdot \sqrt{\pi/4} = \pi/2$.
We highlight the following two results, the first of which follows immediately from Lemma 7.3, and the second of which is trivial.
(1) For multiplicative functions $f$ satisfying $f(p) = 1/p + O(1/p^2)$ we have the stated upper bound.
(2) For multiplicative functions $g$ satisfying $g(p) = O(1/p^2)$ we have the stated upper bound.
These sums will appear frequently in our calculations, and we will use these bounds without comment in the arguments which follow.

Establishing Lemma 6.6
Our attention now turns to establishing Lemma 6.6. We follow the combinatorial arguments used by Maynard; the steps which follow mirror those found in [9].

8.1. Change of variables. Our first step in evaluating the sums appearing in Lemma 6.6 is to make a change of variables. We do so in the following proposition.
Proposition 8.1 (Diagonalising the sieve sum). With notation as in Lemma 6.6, denote by $f^*, g^*$ the stated convolutions, and define the diagonalising vectors $y^{(J,p,m)}_{r_1,\ldots,r_k}$ accordingly. If $m \in J$ then the (error) term $E$ satisfies the first stated bound. If $m \notin J$ then $E$ satisfies a similar estimate, obtained upon replacing all occurrences of $p_i^2$ with $p_i$ in the above expression, for $i \in \{1, 2\}$. Moreover, in both of these cases, we adopt the convention that if $p_i = 1$ then any term in our expression for $E$ involving $p_i$ in the denominator may be omitted.
Proof. Recall the definition of $S_{J,p_1,p_2,m}$ given in Lemma 6.6. Using multiplicativity of the functions $f$ and $g$, together with the fact that $[d_i, e_i]$ is square-free for each $i$, we may expand the summand. We remark that because $f, g$ are non-zero, the functions $1/f, 1/g$ are well-defined. We note the convolution identities for $f^*$ and $g^*$. Substituting these into (8.2) and swapping the order of summation, we obtain a transformed sum. From the support of the $\lambda_{d_1,\ldots,d_k}$, we see the only restriction coming from the pairwise coprimality of $W, [d_1, e_1], \ldots, [d_k, e_k]$ is the requirement $(d_i, e_j) = 1$ for $i \neq j$. We can take care of this constraint by Möbius inversion: multiplying by $\sum_{s_{i,j} \mid (d_i, e_j)} \mu(s_{i,j})$ for all $i \neq j$, we obtain a further transformed sum. We may restrict to the case where $s_{i,j}$ is coprime to $s_{i,a}$, $s_{b,j}$ and $u_i$, $u_j$, for $a \neq j$ and $b \neq i$, because the vectors $\lambda_{d_1,\ldots,d_k}$ are supported on square-free integers $d = \prod_{i=1}^k d_i$. Denote the sum over $s_{i,j}$ with these conditions by $\sum'$. Define the diagonalising vectors as in (8.4). From the support of $\lambda_{d_1,\ldots,d_k}$ we see that $y^{(J,p,m)}_{r_1,\ldots,r_k}$ is also supported on $r_1, \ldots, r_k$ with $r = \prod_{i=1}^k r_i$ square-free, $(r, W) = 1$ and $p \mid r \Rightarrow p \equiv 3 \pmod 4$. We claim this change of variables is invertible: indeed, from the definition (8.4), for $d_1, \ldots, d_k$ with $\prod_{i=1}^k d_i$ square-free, we have the inversion formula (8.5). With this transformation our sum (8.3) becomes a sum in which we have defined $a_i = u_i \prod_{j \neq i} s_{i,j}$ and $b_j = u_j \prod_{i \neq j} s_{i,j}$. Because of our constraints on the $s_{i,j}$ variables, we can use multiplicativity to factorise this expression. We now wish to reduce to the case where $(a_m, p_1) = (b_m, p_2) = 1$; we will show the contribution from the alternative cases is negligible. Of course, depending on whether or not $p_1 = 1$ and/or $p_2 = 1$, some (or all) of the analysis which follows is unnecessary, and this accounts for the convention asserted in the statement of the proposition. First let us note the estimates for $1/f^*(p)$ and $1/g^*(p)$. Now, there are three cases to consider.
(1) Suppose that $p_1 \mid a_m$ and $p_2 \mid b_m$. This occurs if and only if $p_1 \mid u_m$ or $p_1 \mid s_{m,j}$ for some $j \neq m$, and $p_2 \mid u_m$ or $p_2 \mid s_{i,m}$ for some $i \neq m$. Suppose, for example, that $p_1 \mid u_m$ and $p_2 \mid u_m$, and moreover that $m \in J$. Then one can bound the contribution as stated. It is easy to see that this bound also holds in any of the other possible cases in which $p_1 \mid a_m$, $p_2 \mid b_m$ and $m \in J$. If instead $m \notin J$, then again it is easy to see the contribution is similarly bounded in all possible cases.
(2) Suppose that $p_1 \mid a_m$ and $p_2 \nmid b_m$. If $m \in J$, then similarly to the above, one can bound the contribution. If $m \notin J$ then likewise one obtains an acceptable contribution.
(3) Finally, the case $p_1 \nmid a_m$ and $p_2 \mid b_m$ proceeds as above, interchanging the roles of $p_1$ and $p_2$.
Thus, we may now suppose that $(a_m, p_1) = (b_m, p_2) = 1$. From the support of the $y^{(J,p_1,m)}_{a_1,\ldots,a_k}$ we see there is no contribution from $(s_{i,j}, W) \neq 1$, and so either $s_{i,j} = 1$ or $s_{i,j} > D_0$. The contribution from $s_{i,j} > D_0$ with $i, j \in I$ is

This contribution will be negligible. The cases $i, j \in J$ and $i \in I$, $j \in J$ can be treated in the same way. This leaves us with a main term

To finish, we claim the contribution from $u_j > 1$ is small whenever $j \in J$. Indeed, if $u_j > 1$ then it must be divisible by a prime $p > D_0$ (with $p \equiv 3 \pmod 4$). So suppose $|J| \geq 1$ and let $j \in J$. If $u_j > 1$ we get a contribution

which is small. Putting all of these facts together establishes Proposition 8.1.
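The Möbius-inversion device used in the proof above, replacing the coprimality condition $(d_i, e_j) = 1$ by the multiplier $\sum_{s_{i,j} \mid (d_i, e_j)} \mu(s_{i,j})$, rests on the classical identity $\sum_{s \mid n} \mu(s) = \mathbf{1}_{n = 1}$. A minimal numerical sanity check (the function names below are ours, not the paper's):

```python
from math import gcd

def mobius(n):
    """Möbius function mu(n) via trial factorisation."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:   # square factor => mu(n) = 0
                return 0
            result = -result
        p += 1
    if n > 1:                # leftover prime factor
        result = -result
    return result

def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

def coprimality_detector(d, e):
    """sum_{s | gcd(d, e)} mu(s): equals 1 iff gcd(d, e) = 1, else 0."""
    return sum(mobius(s) for s in divisors(gcd(d, e)))

# The sum recovers the indicator of coprimality exactly:
for d in range(1, 40):
    for e in range(1, 40):
        assert coprimality_detector(d, e) == (1 if gcd(d, e) == 1 else 0)
```

This is exactly why multiplying the sum through by these Möbius factors removes the pairwise-coprimality constraint at the cost of the extra $s_{i,j}$ variables.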

Transformation for $y^{(J,p,m)}_{r_1,\ldots,r_k}$ and proof of Lemma 6.6 parts (i) and (ii). Define

and let $y_{\max} = \sup_{r_1,\ldots,r_k} |y_{r_1,\ldots,r_k}|$. By the inversion formula (8.5), our definition of $\lambda_{d_1,\ldots,d_k}$ in Proposition 6.2 is equivalent to taking

We now wish to relate the more complicated diagonalisation vectors $y^{(J,p,m)}_{r_1,\ldots,r_k}$ to these simpler vectors. We first deal with the case when $J = \emptyset$, which is straightforward. By inspecting the proof of Proposition 6.2, it is clear that we only need to understand this case when $f(p) = 1/p$.

(1) If $p_1 \mid r_m$ then $y^{(J,p_1,m)}_{r_1,\ldots,r_k} = y_{r_1,\ldots,r_k}$.
Proof. If $f(p) = 1/p$ then $f^*(p) = p - 1 = \varphi(p)$. Hence we are assuming

The result then follows by comparing with the definition of $y_{r_1,\ldots,r_k}$ given above.
Thus, proceeding, we may suppose that $|J| \geq 1$. We may further suppose that $f(p) \neq 1/p$, as the case $f(p) = 1/p$ is of no interest to us (again, this is clear by inspecting the proof of Proposition 6.2). The following proposition gives the result in full.
(ii) If $m \notin J$ and $(r_m, p) = 1$ we have

Here $f^{**}$ and $g^{**}$ are defined by the convolutions

where $\iota$ is the identity function, $\iota(p) = p$.
Proof. We prove (i), with the rest proved in exactly the same way. Directly from the definition (8.4) we have

From the inversion formula (8.5) and the definition of $y_{r_1,\ldots,r_k}$, we see that the right-hand side of (8.10) equals
Swapping sums, we obtain a sum over $e_1, \ldots, e_k$ with $r_i \mid e_i$ for all $i$ and $p_1 \mid e_m$. We can evaluate the inner sums using the convolution identities.
We note that

and similarly

With our assumption $f(p) \neq 1/p$, we may suppose both of these functions are non-zero. Now, recall that we are assuming $m \in J$. Using these identities transforms (8.11) into
\[
\frac{\mu(p_1)\, p_1\, g(p_1)}{g^{**}(p_1)}
\prod_{i \in I} \frac{\mu(r_i)\, r_i\, f(r_i)}{f^{**}(r_i)}
\prod_{j \in J} \frac{\mu(r_j)\, r_j\, g(r_j)}{g^{**}(r_j)}
\cdot
\sum_{\substack{e_1, \ldots, e_k \\ r_i \mid e_i\ \forall i \\ p_1 \mid e_m}}
y_{e_1,\ldots,e_k}
\prod_{i \in I} \frac{f^{**}(e_i)}{\varphi(e_i)}
\prod_{j \in J} \frac{g^{**}(e_j)}{\varphi(e_j)}.
\]
Here we are using the fact that $y_{e_1,\ldots,e_k}$ is supported on square-free integers $e = \prod_{i=1}^k e_i$. Hence the result follows from (8.10).
From the support restrictions on $y^{(J,p_1,m)}_{r_1,\ldots,r_k}$ we necessarily have $(r_i, W) = 1$. From the above, we see the first product may be replaced by $1 + O(D_0^{-1})$. This incurs an acceptable error, which gives the result stated.
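The convolution identities invoked in this proof are of the standard shape $\sum_{d \mid n} \mu(d)\, h(d) = \prod_{p \mid n} (1 - h(p))$ for multiplicative $h$. As an illustrative sanity check we take the toy choice $h(d) = 1/d$, for which the product equals $\varphi(n)/n$; this is only a stand-in, not the paper's $f$ or $g$:

```python
from fractions import Fraction

def mobius(n):
    """Möbius function mu(n) via trial factorisation."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    if n > 1:
        result = -result
    return result

def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

def prime_factors(n):
    ps, p = [], 2
    while p * p <= n:
        if n % p == 0:
            ps.append(p)
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:
        ps.append(n)
    return ps

def convolution_side(n):
    """sum_{d | n} mu(d)/d, computed term by term."""
    return sum(Fraction(mobius(d), d) for d in divisors(n))

def product_side(n):
    """prod_{p | n} (1 - 1/p) = phi(n)/n."""
    out = Fraction(1)
    for p in prime_factors(n):
        out *= 1 - Fraction(1, p)
    return out

# The two sides agree for every n:
for n in range(1, 120):
    assert convolution_side(n) == product_side(n)
```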
We record the following useful corollary.
Corollary 8.4. With notation as in the statement of Proposition 8.1, the following estimates hold.
(1) If $m \in J$ then

Proof. This follows easily from Lemma 8.2 and Proposition 8.1. (We note that in the special case $J = \emptyset$, in items (2) and (3) we could drop the extra $\log \log R$ factor if required.) We are now in a position to prove the first two parts of Lemma 6.6.
Proof of Lemma 6.6 parts (i) and (ii). We prove part (i), in the case $m \in J$. The rest of the argument proceeds along the same lines. From Proposition 8.3 we have

Here we have used Corollary 8.4 to control the various error terms in the statement of the proposition. We have also used the fact that $|I| + |J| = k$. It follows that

again using Corollary 8.4. This gives the result, as required.
From now on we are only interested in the sums $S_J = S_{J,1,1,m}$, and in particular the cases $|J| \in \{0, 1, 2\}$. For ease of notation we write $y^{(J)}_{r_1,\ldots,r_k}$ for the corresponding diagonalising vectors. Define the integral operators

Proof. First suppose $J = \{m\}$. From Proposition 8.3 we have

Consider the function $\gamma_1(p)$, supported on primes $p \nmid W \prod_{i \in I} r_i$ with $p \equiv 3 \pmod 4$, and equal to $0$ otherwise.
With this choice of $\gamma_1(p)$ we have

and one can easily check $\gamma_1(p) = 1 + O(1/p)$. By an argument identical to the proof of Lemma 7.3, we can evaluate the sum in (8.17) as

which proves (i). If $J = \{m, l\}$ then, again using Proposition 8.3, we find

Take the sum over $e_l$ first. Consider the function supported on primes $p \nmid W \prod_{i \in I} r_i e_m$ with $p \equiv 3 \pmod 4$, and equal to $0$ otherwise.
Reasoning as above, the sum on the right-hand side of (8.18) becomes

We can evaluate this sum in much the same way, this time using the function supported on primes $p \nmid W \prod_{i \in I} r_i$ with $p \equiv 3 \pmod 4$ (and equal to $0$ otherwise), to get the stated result.
If we define the (identity) operator

then the results of Corollary 8.5 can be concisely written as

for $|J| \in \{0, 1, 2\}$. Here we have used $I_{r_1,\ldots,r_k;J}(F)$ to denote $I_{r_1,\ldots,r_k;\,j : j \in J}(F)$. We are now in a position to prove the remaining part of Lemma 6.6.
Proof of Lemma 6.6 part (iii). We can write the operators in the statement of Lemma 6.6 as follows:

where we have defined

From Proposition 8.1 we see that

From Corollary 8.5, for $|J| \in \{0, 1, 2\}$, we have an expression for $(y^{(J)}_{r_1,\ldots,r_k})^2$. Substituting this into (8.19), and using (8.16), yields

The first error contributes

For the main term, if $(u_i, u_j) \neq 1$ then they must both be divisible by a prime $q > D_0$ with $q \equiv 3 \pmod 4$. In this case we get a contribution

Thus this constraint can be removed at the cost of a negligible error, and we are left with

We can evaluate this multidimensional sum by applying Lemma 7.3 $|I| = k - |J|$ times. We obtain

This completes the proof of Lemma 6.6.
Proof of Lemma 5.4. Under the additional assumption that $p \mid h \Rightarrow p \mid 2q$, we see that for the $r$ considered in the sum, $c_r(h) = \mu(r)$. But now, restricting to square-free $r$, since $d_1, d_2$ are square-free we have $\Psi(d_1, r) = \Psi(d_2, r) = 1$. Thus in this case

This is the form stated in Lemma 5.4.

Now we briefly outline the proof of Lemma A.3. In [10, Lemma 4] a similar sum to (A.6) is considered, this time under the hypotheses $p \mid d_1 d_2 \Rightarrow p \equiv 3 \pmod 4$, $q = 1$, and with the congruence $n \equiv 1 \pmod 4$ omitted. The proof of Lemma A.3 is similar to the proof found there, with a few minor changes. The key point is to note that, using the convolution identity $r = 4(\chi * 1)$ and the complete multiplicativity of $\chi$, for $n \equiv 1 \pmod 4$ we can write

Using this we may expand out $r(n)$ in the sum (A.6). We obtain (after swapping the order of summation)

and $\Delta \pmod Q$ satisfies the congruence system $\Delta \equiv a + h \pmod q$, $\Delta \equiv 1 \pmod 4$, $\Delta \equiv h \pmod m$, $\Delta \equiv h \pmod{d_1}$ and $\Delta \equiv 0 \pmod{d_2}$. The error term arises by estimating the intervals of length $h$ left over: taking absolute values, using the divisor bound $r(n) \ll n^{\epsilon}$, and noting that in this regime $h \ll N^{3/4}$, we have to estimate sums of the type

Applying the standard estimate $h/m + O(1)$ to the inner sum and then carrying out the summation over $m$ yields the desired estimate (after redefining our choice of $\epsilon$). Now the inner sums in (A.9) consist of estimating $r(n)$ in arithmetic progressions. It proves convenient to proceed using formula (A.4). We obtain

where $R_2$ is an error term. Using the facts $(\Delta, Q) \leq d_1 d_2 (m, h)$ and $Q \ll m q d_1 d_2$, one can show $R_2$ contributes

Proof of Lemma 5.5. One obtains the main term as stated in Lemma 5.5 by taking the logarithmic derivative of $H(s, d, 4q)$ (taking an appropriate branch cut) and evaluating at $s = 1$.
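The convolution identity $r = 4(\chi * 1)$ used above, i.e. $r(n) = 4 \sum_{d \mid n} \chi(d)$ with $\chi$ the non-principal character modulo $4$, is Jacobi's two-squares formula; it holds in fact for all $n \geq 1$, not only $n \equiv 1 \pmod 4$. A naive numerical check (the representation count below is brute force, purely for illustration):

```python
from math import isqrt

def r(n):
    """Number of representations n = a^2 + b^2, counting signs and order."""
    count = 0
    for a in range(-isqrt(n), isqrt(n) + 1):
        b2 = n - a * a
        b = isqrt(b2)
        if b * b == b2:
            count += 1 if b == 0 else 2  # b and -b, unless b = 0
    return count

def chi(d):
    """Non-principal Dirichlet character mod 4."""
    if d % 4 == 1:
        return 1
    if d % 4 == 3:
        return -1
    return 0  # even d

def four_chi_star_one(n):
    """4 * (chi * 1)(n) = 4 * sum_{d | n} chi(d)."""
    return 4 * sum(chi(d) for d in range(1, n + 1) if n % d == 0)

# Jacobi's formula: the two sides agree for every n >= 1.
for n in range(1, 200):
    assert r(n) == four_chi_star_one(n)
```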
We now outline the proof of Lemma A.4. The proof uses a standard application of Perron's formula. The following lemma, which combines [12, Theorem II.8.20] and [12, Theorem II.8.22], will be needed.
Lemma A.5. Let $L = \log(|t| + Q + 1)$ and let $\chi_Q \pmod Q$ be a Dirichlet character. For $\sigma \geq 1$ we have the following.

(1) If $\chi_Q^2$ is complex then $L(s, \chi_Q^2)^{-1} \ll L^7$.

(2) If $\chi_Q^2$ is real and non-trivial, then there exists an absolute constant $c_0 > 0$ such that

Write $Q = 4q$ and let $\Delta \pmod Q$ be the unique solution to the congruences $\Delta \equiv a \pmod q$ and $\Delta \equiv 1 \pmod 4$. Since $(\Delta, Q) = 1$, by character orthogonality we can write the sum over $n \leq x$ with $n \equiv \Delta \pmod Q$ and $d \mid n$ in terms of character sums. We study these sums by considering their generating series

We need bounds for the integrand in the region $\sigma \geq 1/2$ and $|t| \leq 2T$. One can easily check the trivial bound $|A(s, d, Q)| \ll 1$ (uniformly in $d$ and $q$). We require a lower bound for $|L(2s, \chi_Q^2)|$ in this region. If $\chi_Q$ is real then $\chi_Q^2$ is the trivial character, and so we have

In the region $\sigma \geq \frac{1}{2}$ we can bound the product from below by $\varphi(Q)/Q$. Standard bounds for $\zeta(s)$ on the $1$-line (see for example [12, Theorem II.3.9]) then tell us that

If $\chi_Q$ is complex then we have to be more careful. We use Lemma A.5, and see there is a minor technical complication: we must bound the contribution from $s = 1/2 + it$ with $|t| \leq 2c_0$ (say) separately (with this choice of cut-off we can use the bound $L(2s, \chi_Q^2)^{-1} \ll (QT)^{\epsilon}$ uniformly in $\chi_Q$ for all the other contours). One can easily show the contribution from these values of $s$ is $\ll Q$, and moreover if $a_n > 0$ then we also have $\sum_{n \leq x} a_n = x(\log x)^{z-1}$ (taking the positive determination of the square-root in the first instance). Both of these functions are analytic for $\sigma > 3/4$ (say). $K_1(s)$ is bounded in this region, and for the $G_1(W, s)$ estimates we note that the inner sum appearing in both is precisely $X_{N,W} - X_{N,qW}$, and from our work above it follows that

Finally, we can write

This last sum is exactly $X_{N,W}$. Using our bound from part (i), and also the fact

These four functions are analytic in the region $\sigma > 3/4$ (say), where they all satisfy the bound $O(1)$ except for $G_3$.
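The lower bound $\varphi(Q)/Q$ mentioned above comes from the following elementary computation (a sketch; $\chi_0$ denotes the trivial character modulo $Q$):

```latex
\[
  L(2s, \chi_0) \;=\; \zeta(2s) \prod_{p \mid Q} \bigl( 1 - p^{-2s} \bigr),
\]
and for $\sigma \ge \tfrac{1}{2}$ each factor satisfies
$|1 - p^{-2s}| \ge 1 - p^{-2\sigma} \ge 1 - p^{-1}$, whence
\[
  \prod_{p \mid Q} \bigl| 1 - p^{-2s} \bigr|
  \;\ge\; \prod_{p \mid Q} \Bigl( 1 - \frac{1}{p} \Bigr)
  \;=\; \frac{\varphi(Q)}{Q}.
\]
```

Combined with a lower bound for $|\zeta(2s)|$ on the relevant contour, this gives the required lower bound for $|L(2s, \chi_Q^2)|$ in the real-character case.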
Since $(a, W) = 1$, by multiplicativity we can write

$T_1$ can be evaluated similarly to part (iii) to give

where
\[
  \beta_1(q)
  = \frac{\mu(q)\, g_4(q)\, \gamma_1(q)\, G_1(q)\, G_3(q)\, G_4(q)\, G_5(q)}{g_2(q)}
  = - \frac{q (4q^2 - 3q + 1)}{2 (q - 1)(2q^2 - 2q + 1)}.
\]
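For reference, the truncated Perron formula underlying the proof of Lemma A.4 above takes the following standard shape (cf. [12, §II.2]; the notation $a_n$, $F$ is generic, not tied to the paper's specific coefficients):

```latex
\[
  \sum_{n \le x} a_n
  = \frac{1}{2\pi i} \int_{c - iT}^{c + iT} F(s)\, \frac{x^{s}}{s}\, ds
    + O\!\left( \frac{x^{c}}{T} \sum_{n \ge 1}
        \frac{|a_n|}{n^{c}\, |\log(x/n)|} \right),
  \qquad
  F(s) = \sum_{n \ge 1} \frac{a_n}{n^{s}},
\]
valid for any $c$ exceeding the abscissa of absolute convergence of $F$
(and $x$ not an integer).
```

One then shifts or truncates the contour and bounds the integrand using the $L$-function estimates recorded in Lemma A.5.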