1 Introduction

1.1 Main results.

An integer \(n \ge 1\) is squarefree if it is not divisible by the square of a prime. By analogy with questions about prime numbers, a basic problem in analytic number theory is to understand the distribution of squarefree numbers in arithmetic progressions and in short intervals. Squarefree numbers ought to form a simpler, more regular sequence than the primes, and yet they present distinct challenges; for instance, we can determine whether \(n\) is prime in polynomial time [AKS04], but no polynomial-time algorithm is known for deciding whether \(n\) is squarefree.
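As a quick illustration (our code, not part of the argument): testing squarefreeness by trial division is practical at small scales, and the density of squarefree numbers, \(6/\pi^2 = 0.6079\ldots\), is already visible at modest cutoffs.

```python
# A minimal sketch: squarefree testing by trial division, and a check that the
# density of squarefree numbers is close to 6/pi^2 already for modest N.
import math

def is_squarefree(n: int) -> bool:
    """True iff no prime square divides n (trial division up to sqrt(n))."""
    d = 2
    while d * d <= n:
        if n % (d * d) == 0:
            return False
        while n % d == 0:
            n //= d
        d += 1
    return True

N = 100_000
density = sum(1 for n in range(1, N + 1) if is_squarefree(n)) / N
# density is within about 0.001 of 6/pi^2 = 0.60792... at this N.
```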

It was conjectured by Montgomery (see [Cro75]) that for any given \(\varepsilon \in (0, 1/100)\), and \((a,q) = 1\),

$$\begin{aligned} \sum _{\begin{array}{c} n \le x \\ n \equiv a \, (\mathrm {mod}\, q) \end{array}} \mu ^2(n) = \frac{6}{\pi ^2} \cdot \frac{x}{q} \prod _{p | q} \Big ( 1 - \frac{1}{p^2} \Big )^{-1} + O_{\varepsilon } \Big ( (x / q)^{1/4 + \varepsilon } \Big ). \end{aligned}$$
(1)

uniformly in \(1 \le q \le x^{1 - \varepsilon }\). This conjecture is difficult for two reasons. In the regime of large q, of size roughly \(x^{1 - \varepsilon }\), the left-hand side contains only \(x^{\varepsilon }\) terms and even establishing an asymptotic formula is open (see the work of Nunes [Nun17] for the best result in this direction). In the regime of small q, of size about \(x^{\varepsilon }\), establishing an asymptotic formula is easy, but obtaining an error term as good as \(O_{\varepsilon }((x / q)^{1/4 + \varepsilon })\) is an open problem, even conditionally on the Generalized Riemann Hypothesis.
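As a numerical sanity check on the shape of the main term (our sketch; the choices \(q = 3\), \(a = 1\), \(x = 10^6\) are arbitrary), one can compare the count of squarefree \(n \le x\) in a reduced residue class with \(\frac{6}{\pi^2}\cdot\frac{x}{q}\prod_{p \mid q}(1 - p^{-2})^{-1}\), the local Euler factor matching the one appearing in (6) below.

```python
# Count squarefree n <= x with n = a (mod q) and compare with the predicted
# main term; q = 3 and a = 1 are illustrative choices, not values from the text.
import math

def squarefree_flags(limit):
    """flags[n] = 1 iff n is squarefree, by sieving out multiples of d^2."""
    flags = bytearray([1]) * (limit + 1)
    d = 2
    while d * d <= limit:
        flags[d * d :: d * d] = bytearray(len(flags[d * d :: d * d]))
        d += 1
    return flags

x, q, a = 10**6, 3, 1
flags = squarefree_flags(x)
count = sum(flags[n] for n in range(a, x + 1, q))
main_term = (6 / math.pi**2) * (x / q) / (1 - 1 / q**2)
# The conjecture predicts an error of size only about (x/q)^{1/4}.
```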

Analogously, we conjecture that for any given \(\varepsilon \in (0, \tfrac{1}{100})\), uniformly in \(x^{\varepsilon } \le H \le x\),

$$\begin{aligned} \sum _{\begin{array}{c} x < n \le x + H \end{array}} \mu ^2(n) = \frac{6 H}{\pi ^2} + O_{\varepsilon }(H^{1/4 + \varepsilon }). \end{aligned}$$
(2)

Similarly to the case of arithmetic progressions, when H is close to \(x^{\varepsilon }\) no asymptotic estimates are known (see the work of Tolev [Tol06] and Filaseta and Trifonov [FT92] for the best unconditional results in this direction and [CE19, Thm. A.1], [Gra98] for results conditional on the ABC conjecture). Meanwhile for large H, say \(H = x\), estimating (2) asymptotically is straightforward, but obtaining an error term \(O_{\varepsilon }(x^{1/4 + \varepsilon })\) is an open problem, even conditionally on the Riemann Hypothesis (see [Liu16] for the best result in this direction).

An important feature of both conjectures (1) and (2) is that the error term is significantly smaller than the square-root of the number of terms being summed, in contrast to what a naive probabilistic model predicts.

The conjectures (1) and (2) imply the Riemann Hypothesis, and they are almost certainly deeper than the Riemann Hypothesis. Nonetheless one can still hope to investigate them on average over residue classes for (1), or on average over short intervals for (2). Importantly, establishing (1) on average is easier when q is large than when q is small, since a large q allows for more averaging over the residue classes \(a \, (\mathrm {mod}\, q)\). Similarly, establishing (2) on average is easier when H is small, since there are more non-overlapping short intervals \([x, x + H]\) to average over than when H is large. In fact when there is little averaging (i.e. \(q\) small or H large), the averaged versions of (1) and (2) are not significantly easier than the non-averaged versions; see Theorem 3 for a concrete manifestation of this.

In our first result we compute the variance of (2) on average over short intervals. We estimate this variance asymptotically, thus making the error term in (2) more precise on average.

Theorem 1

Let \(\varepsilon \in (0, \tfrac{1}{100})\) be given. Let \(X \ge 1\) and \(1 \le H \le X^{6/11 - \varepsilon }\). Then

$$\begin{aligned} \frac{1}{X} \int _{X}^{2X} \Big | \sum _{x < m \le x + H} \mu ^2(m) - \frac{6H}{\pi ^2} \Big |^2 dx = C \sqrt{H} + O_{\varepsilon }(H^{1/2-\varepsilon /16}) \end{aligned}$$
(3)

with

$$\begin{aligned} C:= \frac{\zeta (3/2)}{\pi } \prod _p \Big (1 - \frac{3}{p^2} + \frac{2}{p^3}\Big ). \end{aligned}$$
(4)

Assuming the Lindelöf Hypothesis, (3) holds in the wider range \(H \le X^{2/3-\varepsilon }\).

We recall that the Lindelöf Hypothesis follows from the Riemann Hypothesis and asserts that for any given \(\varepsilon > 0\) we have \(|\zeta (\tfrac{1}{2} + it)| \ll _{\varepsilon } 1 + |t|^{\varepsilon }\) for all \(t \in {\mathbb {R}}\). In Theorem 3 we will show that if (3) held in the full range \(H \le X^{1 - \varepsilon }\), then the Riemann Hypothesis would follow.
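For orientation, the constant in (4) evaluates numerically to \(C \approx 0.238\). The sketch below (our code) truncates the Euler product at \(10^5\) and computes \(\zeta(3/2)\) from a truncated series with a two-term Euler–Maclaurin tail; both cutoffs are arbitrary.

```python
# Numerical evaluation of C = (zeta(3/2)/pi) * prod_p (1 - 3/p^2 + 2/p^3).
import math

def primes_up_to(limit):
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(limit) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(sieve[p * p :: p]))
    return [p for p in range(2, limit + 1) if sieve[p]]

def zeta(s, N=10**5):
    # Truncated series plus the Euler-Maclaurin correction N^{1-s}/(s-1) - N^{-s}/2.
    return sum(n ** -s for n in range(1, N + 1)) + N ** (1 - s) / (s - 1) - N ** -s / 2

euler_product = 1.0
for p in primes_up_to(10**5):
    euler_product *= 1 - 3 / p**2 + 2 / p**3

C = zeta(1.5) / math.pi * euler_product  # roughly 0.238
```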

Theorem 1 extends a theorem of Hall [Hal82] who showed that the asymptotic formula (3) holds in the range \(H \le X^{2/9-\varepsilon }\). We will now explain why the range \(H = X^{1/2}\) can be considered a threshold in this problem. It is reasonable to conjecture that given \(\varepsilon > 0\), for any \(1 \le h \le x^{1 - \varepsilon }\),

$$\begin{aligned} \sum _{n \le x} \mu ^2(n) \mu ^2(n + h) - C(h) x = O_{\varepsilon }(x^{1/4 + \varepsilon }) \end{aligned}$$
(5)

with C(h) a constant depending only on h. Summing this conjectural estimate over h recovers Theorem 1, but only in the range \(H < X^{1/2 - \varepsilon }\). Thus Theorem 1 exploits (unconditionally!) additional cancellations between the error terms in (5).
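To illustrate (5) empirically (our experiment; the parameters are arbitrary): the normalized correlation sums do settle near \(h\)-dependent constants, with for instance the classical value \(C(1) = \prod_p(1 - 2/p^2) \approx 0.3226\), and \(C(4) = \tfrac{3}{2} C(1)\) since the local factor at \(p = 2\) changes exactly when \(p^2 \mid h\).

```python
# Empirical correlations (1/x) * sum_{n<=x} mu^2(n) mu^2(n+h) for a few shifts h.
import math

def squarefree_flags(limit):
    flags = bytearray([1]) * (limit + 1)
    d = 2
    while d * d <= limit:
        flags[d * d :: d * d] = bytearray(len(flags[d * d :: d * d]))
        d += 1
    return flags

x, shifts = 10**6, (1, 2, 4)
flags = squarefree_flags(x + max(shifts))
corr = {h: sum(flags[n] & flags[n + h] for n in range(1, x + 1)) / x for h in shifts}
# corr[1] is near prod_p (1 - 2/p^2) = 0.32263..., and corr[4] near 1.5 * corr[1].
```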

We now describe the analogue of Theorem 1 for the distribution of squarefree numbers in arithmetic progressions with a given modulus. In this case, for a given modulus q, the parameter x/q plays the same role as the length H of the short interval in Theorem 1. While the results are analogous, they are harder to prove, as is often the case with q-analogues.

Theorem 2

Let \(\varepsilon \in (0, \tfrac{1}{100})\) be given. Let q be a prime with \(x^{5/11 + \varepsilon } \le q \le x\). Then

$$\begin{aligned} \begin{aligned}&\frac{1}{\varphi (q)}\sum _{(a,q) = 1} \Big | \sum _{\begin{array}{c} m \le x \\ m \,\equiv \, a \, (\mathrm {mod}\, q) \end{array}} \mu ^2(m) - \frac{6}{\pi ^2}\cdot \frac{x}{q} \prod _{p | q} \Big (1-\frac{1}{p^2}\Big )^{-1} \Big |^2 \\&\quad = C \prod _{p | q} \Big (1 + \frac{2}{p} \Big )^{-1} \cdot \sqrt{\frac{x}{q}} + O_{\varepsilon }((x/q)^{1/2-\varepsilon /16}), \end{aligned} \end{aligned}$$
(6)

where C is the same constant as in Theorem 1. Assuming the Generalized Lindelöf Hypothesis the claim holds in the wider range \(q > x^{1/3 + 30 \varepsilon }\).

We recall that the Generalized Lindelöf Hypothesis follows from the Generalized Riemann Hypothesis and asserts that for any given \(\varepsilon > 0\) we have \(|L(\tfrac{1}{2} + it, \chi )| \ll _{\varepsilon } 1 + (q + |t|)^{\varepsilon }\) for all \(t \in {\mathbb {R}}\) and all characters \(\chi \, (\mathrm {mod}\, q)\).

For simplicity we have assumed in Theorem 2 that q is prime, but our methods are amenable to handling the general case of composite q with a bit more effort.
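The following toy experiment (our code; \(x = 2 \times 10^5\) and the prime \(q = 613\) are arbitrary, chosen so that \(q \ge x^{5/11+\varepsilon}\)) illustrates the phenomenon behind Theorem 2: the variance of the counts over reduced residue classes is far smaller than the mean count, consistent with the \(\sqrt{x/q}\) prediction.

```python
# Variance of squarefree counts over reduced residue classes mod a prime q.
import math

def squarefree_flags(limit):
    flags = bytearray([1]) * (limit + 1)
    d = 2
    while d * d <= limit:
        flags[d * d :: d * d] = bytearray(len(flags[d * d :: d * d]))
        d += 1
    return flags

x, q = 200_000, 613  # q is prime
flags = squarefree_flags(x)
counts = [0] * q
for n in range(1, x + 1):
    if flags[n]:
        counts[n % q] += 1
mean = (6 / math.pi**2) * (x / q) / (1 - 1 / q**2)  # mean count per reduced class
variance = sum((counts[a] - mean) ** 2 for a in range(1, q)) / (q - 1)
# Theorem 2 predicts variance of size C * (1 + 2/q)^{-1} * sqrt(x/q), i.e. roughly 4
# here, as opposed to the Poisson-scale guess of order the mean (roughly 198).
```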

Once extended to composite q, our Theorem 2 improves on results of Warlimont [War80] and Vaughan [Vau05], who obtain an asymptotic formula with an additional averaging over \(q \le Q\) in the range \(x^{2/3} \le Q = o(x)\). Moreover, for prime values of q, Theorem 2 improves on a succession of results by Blomer [Blo08], Nunes [Nun15] (see also [Par19]) and Le Boudec [LB18], who considered the average over \((a,q) = 1\) for individual moduli q, as we do in Theorem 2. In particular Nunes showed that (6) holds in the range \(x^{31/41+\varepsilon } \le q = o(x)\), and Le Boudec showed that the left-hand side of (6) is \(O_\varepsilon ((x/q)^{1/2+\varepsilon })\) for all \(\varepsilon > 0\) in the range \(x^{1/2} \le q \le x\).

Keating and Rudnick [KR16] obtained Theorems 1 and 2 in the context of function fields, in the limit of a large field size. Their results hold in (the analogues of) the ranges \(X^{\varepsilon } \le H \le X^{1 - \varepsilon }\) and \(x^{\varepsilon } \le q \le x^{1 - \varepsilon }\). Our proofs of Theorems 1 and 2 can be adapted to the setting of a fixed base field and the large degree limit. In fact our proofs of Theorems 1 and 2 were originally motivated by analogies with the function field setting. Since we ended up obtaining equally strong results in the number field setting, we do not include the proofs in the function field setting.

Finally, the next result shows that a nearly optimal upper bound in (3), holding uniformly in the complete range of H, is equivalent to the Riemann Hypothesis.

Theorem 3

The Riemann Hypothesis holds if and only if for every \(\varepsilon \in (0, \tfrac{1}{100})\) and every \(1 \le H \le X^{1 - \varepsilon }\),

$$\begin{aligned} \frac{1}{X} \int _{X}^{2X} \Big | \sum _{x < m \le x + H} \mu ^2(m) - \frac{6 H}{\pi ^2} \Big |^2 dx \ll _{\varepsilon , \delta } H^{1/2 + \delta } \end{aligned}$$
(7)

for every \(\delta > 0\).

Following the proof of Theorem 3 one can show that for any smooth compactly supported \(\Phi \), conditionally on the Generalized Riemann Hypothesis

$$\begin{aligned} \frac{1}{\varphi (q)} \sum _{(a,q) = 1} \Big | \sum _{\begin{array}{c} n \equiv a \pmod {q} \end{array}} \mu ^2(n) \Phi \Big ( \frac{n}{x} \Big ) - \frac{6}{\pi ^2 \varphi (q)} \sum _{\begin{array}{c} (m,q) = 1 \end{array}} \Phi \Big ( \frac{m}{x} \Big ) \Big |^2 \ll _{\varepsilon , \delta } (x / q)^{1/2 + \delta } \end{aligned}$$
(8)

for all \(\delta > 0\) and uniformly in \(1 \le q \le x^{1 - \varepsilon }\) for any given \(\varepsilon \in (0, \tfrac{1}{100})\). However it is not clear whether (8) implies the Generalized Riemann Hypothesis. Moreover replacing the smoothing \(\Phi \) by sharp cut-offs appears to be difficult. For these reasons we decided not to pursue this further in the present paper.

Finally, we note that we have made no effort to optimize the exponents in the error terms \(O_{\varepsilon }(H^{1/2-\varepsilon /16})\) and \(O_{\varepsilon }((x/q)^{1/2-\varepsilon /16})\) of Theorems 1 and 2. Better power savings, in more restricted ranges, can be found in the papers [Hal82, Nun15].

1.2 Fractional Brownian motion.

One notable feature of Theorems 1 and 2 is that while the expected count of squarefree numbers in a short interval (or, likewise, an arithmetic progression) is of order H, the variance of these counts is only of order \(H^{1/2}\). For many other natural arithmetic sequences (e.g. the primes) one conjectures that the variance of counts is of the same order of magnitude as the expected count.

That the variance is of order \(H^{1/2}\) in Theorems 1 and 2 speaks to the idea that the squarefree numbers are “less random” than (for example) the primes (cf. [CS13]). One may conjecture that higher moments are Gaussian (see [ACS17] for numerical evidence). For x drawn uniformly at random from [X, 2X], one may even make the stronger conjecture that the process

$$\begin{aligned} t \mapsto \frac{1}{H^{1/4}} \sum _{x < n \le x+tH} (\mu ^2(n) - 1/\zeta (2)) \end{aligned}$$
(9)

tends weakly to a fractional Brownian motion with Hurst parameter 1/4. See Fig. 1 for an illustration of the evolution of the partial sums (9). A formulation of this perspective seems to have been first made in [GH91]. This is in contrast to the analogous process generated by prime-counting, where one may conjecture the appearance of Hurst parameter 1/2, that is, usual Brownian motion. (See [She14] for a survey on fractional Brownian motion.) The evolution of the process

$$\begin{aligned} t \mapsto \frac{1}{H^{1/2}} \sum _{x < p \le x + t H} (\log p - 1) \end{aligned}$$
(10)

is depicted in Fig. 2. Both Figs. 1 and 2 depict the same range of parameters to make the comparison easier. The dots on Figs. 1 and 2 correspond to lattice points on the positive x-axis and on the (positive and negative) y-axis and indicate the difference in scales.

Fig. 1
figure 1

Partial sums of \(\mu ^2(n)\): depiction of (9) with \(x = 2 \times 10^{15}\), \(H = 44{,}721{,}359\) and \(0 \le t \le 10\).

Fig. 2
figure 2

Partial sums of \(\log p\): depiction of (10) with \(x = 2 \times 10^{15}\), \(H = 44{,}721{,}359\) and \(0 \le t \le 10\).
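The contrast between the two figures can also be probed numerically (our experiment, at much smaller parameters than in the figures): quadrupling the window length should roughly double the variance of the centered squarefree counts (Hurst parameter 1/4, variance of order \(H^{1/2}\)), rather than quadruple it as a Poisson-like model would suggest.

```python
# Variance of centered squarefree counts in windows of length H1 and H2 = 4 * H1.
import math

def squarefree_flags(limit):
    flags = bytearray([1]) * (limit + 1)
    d = 2
    while d * d <= limit:
        flags[d * d :: d * d] = bytearray(len(flags[d * d :: d * d]))
        d += 1
    return flags

X, H1, H2 = 10**6, 100, 400
flags = squarefree_flags(2 * X + H2)
prefix = [0] * (2 * X + H2 + 1)
for n in range(1, 2 * X + H2 + 1):
    prefix[n] = prefix[n - 1] + flags[n]

def window_variance(H):
    mean = 6 * H / math.pi**2
    return sum((prefix[y + H] - prefix[y] - mean) ** 2 for y in range(X, 2 * X)) / X

v1, v2 = window_variance(H1), window_variance(H2)
ratio = v2 / v1  # fractional-Brownian scaling predicts a ratio near sqrt(4) = 2
```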

1.3 Conventions and notations.

Throughout the rest of the paper we will allow the implicit constants in \(\ll \) and \(O(\cdot )\) to depend on \(\varepsilon \). Furthermore the notation \(n \sim N\) in the subscript of a sum will mean that \(N \le n < 2N\).

2 Proofs of Theorems 1 and 2

We will show in this section how Theorems 1 and 2 follow from a number of technical propositions that are proven in Sects. 4–7.

The proof of Theorem 1 splits into two steps and depends on the identity

$$\begin{aligned} \mu ^2(m) = \sum _{nd^2 = m} \mu (d) \end{aligned}$$

and the following two propositions.
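The identity, equivalently \(\mu^2(m) = \sum_{d^2 \mid m} \mu(d)\), is easy to verify mechanically; the sketch below (our code) checks it for \(m \le 5000\).

```python
# Check mu^2(m) = sum_{d^2 | m} mu(d) directly for small m.
import math

def mobius(n):
    """Mobius function by trial division."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0  # the square p^2 divides the original n
            result = -result
        p += 1
    if n > 1:
        result = -result
    return result

def identity_holds(m):
    lhs = 1 if mobius(m) != 0 else 0  # mu^2(m)
    rhs = sum(mobius(d) for d in range(1, math.isqrt(m) + 1) if m % (d * d) == 0)
    return lhs == rhs

ok = all(identity_holds(m) for m in range(1, 5001))
```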

Proposition 1

Let \(\varepsilon \in (0, \tfrac{1}{100})\) be given. Let \(X \ge 1\) and \(X^{\varepsilon } \le H \le X^{2/3 - \varepsilon }\). Let \(H^{1+\varepsilon } \le z \le \min \{X/H^{1/2+\varepsilon }, H^{1/2-\varepsilon }X^{1/2}\}\). Then, as \(X \rightarrow \infty \),

$$\begin{aligned} \frac{1}{X} \int _{X}^{2X} \Big | \sum _{\begin{array}{c} x < n d^2 \le x + H \\ d^2 \le z \end{array}} \mu (d) - H \sum _{d^2 \le z} \frac{\mu (d)}{d^2} \Big |^2 \, dx = C \sqrt{H} + O(H^{1/2-\varepsilon /10}) \end{aligned}$$
(11)

with C as in (4).

Proposition 2

Let \(\varepsilon \in (0, \tfrac{1}{100})\) be given. Let \(X \ge 1\) and \(X^{\varepsilon } \le H \le X^{4/7 - \varepsilon }\). Let \(z \ge H^{4/3+\varepsilon }\). Then

$$\begin{aligned} \frac{1}{X} \int _{X}^{2X} \Big | \sum _{\begin{array}{c} x < n d^2 \le x + H \\ d^2> z \end{array}} \mu (d) - H \sum _{\begin{array}{c} 2X \ge d^2 > z \end{array}} \frac{\mu (d)}{d^2} \Big |^2 dx \ll H^{1/2 - \varepsilon / 8}. \end{aligned}$$
(12)

Assuming the Lindelöf Hypothesis, the claim holds in the wider range \(X^\varepsilon \le H \le X^{2/3-\varepsilon }\) and \(z \ge H^{1+\varepsilon }\).

Under the assumption of the Lindelöf Hypothesis, the above propositions cover all the possible values of \(d^2\) for \(X^{\varepsilon } \le H \le X^{2/3-\varepsilon }\). However, unconditionally they cover all the possible values of \(d^2\) only for \(X^{\varepsilon } \le H \le X^{6/(11+12\varepsilon )}\). It would be possible to improve on the exponent 4/7 in Proposition 2, but this would not help. Similarly it should be possible to prove Proposition 1 with only the condition \(H^{1 + \varepsilon } \le z \le X / H^{1/2 + \varepsilon }\) by adapting the proof of Proposition 3 below.

We note that only the terms d with \(d^2 \in [H^{1 - \varepsilon }, H^{1 + \varepsilon }]\) contribute to the main term \(C \sqrt{H}\) in Proposition 1.

Roughly speaking, Proposition 1 depends only on “convex” inputs such as a Fourier expansion and a point-counting lemma, whereas Proposition 2 exploits Huxley’s large value estimates as well as subconvexity and fourth moment estimates for the Riemann zeta-function.

Proof of Theorem 1 assuming Propositions 1 and 2. Let \(\varepsilon \in (0, \tfrac{1}{100})\). If \(H \le X^{\varepsilon }\) then the result already follows from Hall’s theorem. We can therefore assume that \(H > X^{\varepsilon }\).

For \(H \in [X^\varepsilon , X^{6/11-\varepsilon }]\), take \(z = \min \{X/H^{1/2+\varepsilon }, H^{1/2-\varepsilon }X^{1/2}\}\). Note that \(z \ge H^{4/3+\varepsilon }\). Denoting by \({\mathcal {I}}_1\) the left-hand side of (11) and by \({\mathcal {I}}_2\) the left-hand side of (12), we get, using Cauchy–Schwarz, that

$$\begin{aligned} \frac{1}{X} \int _{X}^{2X} \Big | \sum _{\begin{array}{c} x < n d^2 \le x + H \end{array}} \mu (d) - H \sum _{d^2 \le 2X} \frac{\mu (d)}{d^2} \Big |^2 dx = {\mathcal {I}}_1 + O(\sqrt{{\mathcal {I}}_1 {\mathcal {I}}_2} + {\mathcal {I}}_2). \end{aligned}$$

Using the bounds in (11) and (12), we conclude that

$$\begin{aligned} \frac{1}{X} \int _{X}^{2X} \Big | \sum _{\begin{array}{c} x < n d^2 \le x + H \end{array}} \mu (d) - H \sum _{d^2 \le 2X} \frac{\mu (d)}{d^2} \Big |^2 dx = C \sqrt{H} + O(H^{1/2 - \varepsilon / 16}). \end{aligned}$$
(13)

Notice that the tail \(H\sum _{d^2 > 2X} \mu (d)/d^2\) is \(\ll H/\sqrt{X}\). Hence the claim reduces to showing that

$$\begin{aligned} \frac{1}{X} \int _{X}^{2X} \bigg ( \Big | \sum _{\begin{array}{c} x < n d^2 \le x + H \end{array}} \mu (d) - H \sum _{d^2 \le 2X} \frac{\mu (d)}{d^2} \Big | \cdot \frac{H}{\sqrt{X}} + \Bigl (\frac{H}{\sqrt{X}}\Bigr )^2 \bigg ) dx \ll H^{1/2 - \varepsilon / 16}. \end{aligned}$$

Applying Cauchy–Schwarz and (13), we see that the left-hand side is \(\ll H^{1/4}(H/\sqrt{X}) + H^2/X \ll H^{1/2-\varepsilon /16}\) since \(H \le X^{2/3 - \varepsilon }\). \(\square \)

Likewise the proof of Theorem 2 splits into two steps and depends on the following propositions.

Proposition 3

Let \(\varepsilon \in (0, \tfrac{1}{100})\). Let q be prime with \(x^{1/3 + 30 \varepsilon } \le q \le x^{1 - \varepsilon }\) and let \((x/q)^{1 + \varepsilon } \le z \le x^{-\varepsilon } \cdot \sqrt{qx}\). Then

$$\begin{aligned}&\frac{1}{\varphi (q)}\sum _{(a,q) = 1} \Big | \sum _{\begin{array}{c} d^2 n \le x, \ d^2< z \\ d^2 n \,\equiv \, a \, (\mathrm {mod}\, q) \end{array}} \mu (d) - \frac{1}{\varphi (q)} \sum _{\begin{array}{c} d^2 n \le x, \ d^2 < z \\ (d^2 n,q)=1 \end{array}} \mu (d) \Big |^2 = C \sqrt{x /q} + O\left( (x/q)^{1/2-\varepsilon /16}\right) \end{aligned}$$
(14)

with C as in (4).

Proposition 4

Let \(\varepsilon \in (0, \tfrac{1}{100})\). Let \(x \ge 1\) and \(x^{3/7 + \varepsilon } \le q \le x^{1 - \varepsilon }\). Let \(z \ge (x/q)^{4/3+\varepsilon }\). Then

$$\begin{aligned} \frac{1}{\varphi (q)}\sum _{(a,q) = 1} \Big | \sum _{\begin{array}{c} d^2 n \le x, \ d^2 \ge z \\ d^2 n \,\equiv \, a \, (\mathrm {mod}\, q) \end{array}} \mu (d) - \frac{1}{\varphi (q)} \sum _{\begin{array}{c} d^2 n \le x, \ d^2 \ge z \\ (d^2 n, q) = 1 \end{array}} \mu (d) \Big |^2 \ll (x / q)^{1/2 - \varepsilon / 8}. \end{aligned}$$
(15)

Assuming the Generalized Lindelöf Hypothesis, the claim holds in the wider range \(x^{1/3+\varepsilon } \le q \le x^{1-\varepsilon }\) and \(z \ge (x/q)^{1+\varepsilon }\).

The proof of Proposition 3 depends once again only on “convex” inputs: in this case Poisson summation and results on integer solutions to binary quadratic forms with positive discriminant. However the proof of Proposition 3 is more intricate than that of Proposition 1 due to a number of technical issues. The proof of Proposition 4 is similar to the proof of Proposition 2 and uses hybrid versions of Huxley’s large value estimates, subconvexity estimates for \(L(s, \chi )\) and a hybrid fourth moment estimate.

The deduction of Theorem 2 from the above two propositions is identical to the deduction of Theorem 1 from Propositions 1 and 2. The only differences are that we use the result of Nunes to handle the case \(q > x^{1 - \varepsilon }\), and that we notice that, for prime q,

$$\begin{aligned} \frac{1}{\varphi (q)} \sum _{\begin{array}{c} d^2 n \le x \\ (d^2 n, q) = 1 \end{array}} \mu (d)= & {} \frac{1}{\varphi (q)} \sum _{\begin{array}{c} d^2\le x \\ (d, q) = 1 \end{array}} \mu (d) \Big (\Big \lfloor \frac{x}{d^2}\Big \rfloor - \Big \lfloor \frac{x}{qd^2}\Big \rfloor \Big ) \\= & {} \frac{x}{q} \sum _{\begin{array}{c} d^2 \le x \\ (d,q)=1 \end{array}} \frac{\mu (d)}{d^2} + O\Big (\frac{\sqrt{x}}{q}\Big ) = \frac{6}{\pi ^2} \frac{x}{q} \Big (1-\frac{1}{q^2}\Big )^{-1} + O\Big (\frac{\sqrt{x}}{q}\Big ) \end{aligned}$$

and the total error term incurred is \(x / q^2\), which is \(\le x^{-\varepsilon } \sqrt{x/ q}\) for \(q > x^{1/3 + \varepsilon }\).
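The displayed manipulation can be checked exactly in small cases; the sketch below (our code; \(x = 5 \times 10^4\) and the prime \(q = 101\) are arbitrary) verifies the first equality as an identity between integers, and checks the accuracy of the final approximation.

```python
# Verify: sum_{d^2 n <= x, (d^2 n, q) = 1} mu(d)
#       = sum_{(d,q)=1, d^2 <= x} mu(d) * (floor(x/d^2) - floor(x/(q d^2)))
# for prime q, and compare with (6/pi^2)(x/q)(1 - 1/q^2)^{-1} after dividing by phi(q).
import math

def squarefree_flags(limit):
    flags = bytearray([1]) * (limit + 1)
    d = 2
    while d * d <= limit:
        flags[d * d :: d * d] = bytearray(len(flags[d * d :: d * d]))
        d += 1
    return flags

def mobius(n):
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    if n > 1:
        result = -result
    return result

x, q = 50_000, 101  # q prime
flags = squarefree_flags(x)
lhs = sum(flags[m] for m in range(1, x + 1) if m % q != 0)  # = sum of mu^2(m), (m,q)=1
rhs = sum(
    mobius(d) * (x // (d * d) - x // (q * d * d))
    for d in range(1, math.isqrt(x) + 1)
    if d % q != 0
)
approx = (6 / math.pi**2) * (x / q) / (1 - 1 / q**2)
```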

Theorem 3 rests on principles similar to those behind Propositions 2 and 4. We prove Theorem 3 in Sect. 8.

Finally, let us make a few remarks on the bottleneck that prevents us from pushing our results further. Taking \(H = X^{6/11}\), we are unable to establish the estimate

$$\begin{aligned} \frac{1}{X} \int _{X}^{2X} \Big | \sum _{\begin{array}{c} x \le n d^2 \le x + H \\ d^2 \sim X^{8/11} \end{array}} \mu (d) - H \sum _{d^2 \sim X^{8/11}} \frac{\mu (d)}{d^2}\Big |^2 dx \ll _{A} \frac{\sqrt{H}}{\log ^{A} X}. \end{aligned}$$

Specifically, opening \(\mu (d)\) using Heath-Brown’s identity (see [HB82]), the only situation that we are not able to estimate is the one in which \(\mu (d)\) is replaced by two smooth sums of equal length. Roughly speaking, this corresponds to showing that

$$\begin{aligned} \frac{1}{X} \int _{X}^{2X} \Big | \sum _{\begin{array}{c} x \le n a^2 b^2 \le x + H \\ a, b \sim X^{2/11} \end{array}} 1 - H \sum _{a, b \sim X^{2/11}} \frac{1}{a^2 b^2} \Big |^2 dx \ll \frac{\sqrt{H}}{\log ^{A} X} \end{aligned}$$

Opening the above expression into Dirichlet polynomials this is roughly equivalent to

$$\begin{aligned} \int _{|t| \le X^{5/11}} \Big | \sum _{n \sim X^{3/11}} \frac{1}{n^{1/2 + it}} \sum _{a \sim X^{2/11}} \frac{1}{a^{1/2 + 2 it}} \sum _{b \sim X^{2/11}} \frac{1}{b^{1/2 + 2 it}} \Big |^2 dt \ll \frac{X^{6/11}}{\log ^{A} X}. \end{aligned}$$

Applying the functional equation to the Dirichlet polynomial over n, and setting \(Y = X^{12/11}\), we then see that obtaining the above estimate is equivalent to showing that

$$\begin{aligned} \int _{|t| \le Y^{5/12}} \Big | \sum _{n \sim Y^{1/6}} \frac{1}{n^{1/2 + it}} \sum _{a \sim Y^{1/6}} \frac{1}{a^{1/2 + 2 it}} \sum _{b \sim Y^{1/6}} \frac{1}{b^{1/2 + 2 it}} \Big |^2 dt \ll \frac{Y^{1/2}}{\log ^{A} Y}. \end{aligned}$$

If the \(2it\) in the Dirichlet polynomials over a and b were replaced by it, then we would be facing exactly the same bottleneck as in the case of improving Huxley’s prime number theorem in short intervals (by a variant of the computations in [HB82]; see also [Har07, Chapter 7]). In particular, to make further progress we either need to find a way to improve Huxley’s estimate or find a way to exploit the fact that the phases in two of the Dirichlet polynomials are \(2it\) rather than it. Unfortunately we do not see how to make progress on either of these questions.

3 Lemmas

3.1 Dirichlet polynomials and L-functions.

Let us first collect some standard results on large values of Dirichlet polynomials and L-functions.

Lemma 1

(Large-value theorem). Let \(N, T \ge 1\) and \(V > 0\). Let \(F(s) = \sum _{n \le N} a_n n^{-s}\) be a Dirichlet polynomial and let \(G = \sum _{n \le N} |a_n|^2\). Let \({\mathcal {T}}\) be a set of 1-spaced points \(t_r \in [-T, T]\) such that \(|F(it_r)| \ge V\). Then

$$\begin{aligned} |{\mathcal {T}}| \ll (GNV^{-2} + T \min \{GV^{-2}, G^3 N V^{-6}\})(\log 2NT)^6. \end{aligned}$$

Proof

This follows from the mean-value theorem and Huxley’s large value theorem, see e.g. [IK04, Theorem 9.7 and Corollary 9.9].

\(\square \)

We will say that a set of tuples \((t, \chi )\) with \(\chi \) a Dirichlet character and t a real number is well-spaced whenever it holds that if \((t, \chi ) \ne (u, \chi ')\) then either \(\chi \ne \chi '\) or \(|t - u| \ge 1\).

Lemma 2

(Hybrid large-value theorem). Let \(q \in {\mathbb {N}}\), \(N, T \ge 1\) and \(V > 0\). Let \(F(s, \chi ) = \sum _{n \le N} a_n \chi (n) n^{-s}\) be a Dirichlet polynomial, and let \(G = \sum _{n \le N} |a_n|^2\). Let \({\mathcal {T}}\) be a set of well-spaced tuples \((t_r, \chi )\) with \(t_r \in [-T, T]\) and with \(\chi \) a primitive character of modulus q such that \(|F(i t_r, \chi )| \ge V\). Then

$$\begin{aligned} |{\mathcal {T}}| \ll (G N V^{-2} + q T \min \{ G V^{-2}, G^3 N V^{-6} \}) \cdot (\log 2qNT)^{18}. \end{aligned}$$

Proof

This follows e.g. from [IK04, Theorems 9.16 and 9.18 with \(k = q\) and \(Q = 1\)].

\(\square \)

Lemma 3

(Fourth moment estimate). Let \(T \ge 2\). Then

$$\begin{aligned} \int _{|t| \le T} |\zeta (\tfrac{1}{2} + it)|^4 dt \ll T (\log T)^4. \end{aligned}$$

Proof

See e.g. [Tit86, formula (7.6.1)]. \(\square \)

Lemma 4

(Hybrid fourth moment estimate). Let \(T \ge 2\) and \(q \ge 2\). Then

$$\begin{aligned} \sum _{\chi } \int _{|t| \le T} |L(\tfrac{1}{2} + it, \chi )|^4 dt \ll T \varphi (q) (\log T q)^4, \end{aligned}$$

where the sum is over all characters modulo q.

Proof

See e.g. [Mon71, Theorem 10.1]. \(\square \)

Lemma 5

(Subconvexity estimate). One has, for \(|t| \ge 2\),

$$\begin{aligned} \zeta (1/2 + it) \ll |t|^{1/6} (\log |t|)^2. \end{aligned}$$

Proof

See e.g. [IK04, formula (8.22)]. \(\square \)

Lemma 6

(Hybrid Weyl subconvexity). For cube-free q, primitive characters \(\chi \, (\mathrm {mod}\, q)\) and \(|t| \ge 2\),

$$\begin{aligned} L(1/2 + it, \chi ) \ll _\varepsilon (q |t|)^{1/6 + \varepsilon } \end{aligned}$$

for any \(\varepsilon > 0\).

Proof

See [PY19, Theorem 1.1]. \(\square \)

Of course the Lindelöf and Generalized Lindelöf Hypotheses would give us respectively that for any \(\varepsilon > 0\), for \(|t| \ge 2\) and any character \(\chi \, (\mathrm {mod}\, q)\),

$$\begin{aligned} \zeta (1/2+it) \ll |t|^\varepsilon , \quad L(1/2+it,\chi ) \ll (q|t|)^\varepsilon . \end{aligned}$$

Lemma 7

(Hybrid mean-value theorem). Let a(n) be an arbitrary sequence of coefficients and \(N, q \ge 1\) be integers and \(T \ge 1\) real. Then, for any given \(\varepsilon > 0\),

$$\begin{aligned} \sum _{\chi \, (\mathrm {mod}\, q)} \int _{|t| \le T} \Big | \sum _{n \le N} a(n) \chi ^2(n) n^{it} \Big |^2 dt \ll q^{\varepsilon } (q T + N) \sum _{n \le N} |a(n)|^2. \end{aligned}$$

Proof

We notice that given a character \(\psi \, (\mathrm {mod}\, q)\) there are \(\ll q^{\varepsilon }\) characters \(\chi \) such that \(\chi ^2 = \psi \). Therefore the left-hand side of the claim is bounded by

$$\begin{aligned} \ll q^{\varepsilon } \sum _{\psi \, (\mathrm {mod}\, q)} \int _{|t| \le T} \Big | \sum _{n \le N} a(n) \psi (n) n^{it} \Big |^2 dt \end{aligned}$$

and the result follows from the standard hybrid mean-value theorem, see e.g. [Mon71, Theorem 6.4]. \(\square \)
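The counting observation used above can be made concrete for prime moduli (our sketch; the construction via a primitive root is standard): the characters mod a prime \(q\) form a cyclic group of order \(q-1\), under which squaring acts as \(j \mapsto 2j \bmod (q-1)\), so each square \(\psi\) has exactly \(\gcd(2, q-1) = 2\) square roots, comfortably \(\ll q^{\varepsilon }\).

```python
# For prime q the character group mod q is cyclic of order q - 1: fixing a
# primitive root g and writing chi_j(g^k) = exp(2 pi i j k / (q - 1)), squaring
# sends index j to 2j mod (q - 1). Count how often each psi arises as a square.
from collections import Counter

def square_multiplicities(group_order):
    return Counter((2 * j) % group_order for j in range(group_order))

mults = {q: square_multiplicities(q - 1) for q in (13, 101, 997)}
```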

3.2 Asymptotic estimates.

Lemma 8

Fix \(\varepsilon \in (0, \tfrac{1}{100})\). Let \(K_0\) be a positive constant. Suppose that \(W {:}\,{\mathbb {R}} \rightarrow {\mathbb {C}}\) is such that, for all \(k, \ell \in \{0, 1, 2, 3, 4\}\), one has

$$\begin{aligned} |W^{(k)}(y)| \le K_0 \frac{H^{\ell \varepsilon /4}}{(1+|y|)^\ell }, \quad \; \text {for all}\; y \in {\mathbb {R}}. \end{aligned}$$
(16)

Let \(z \ge H^{1+\varepsilon }\). Then

$$\begin{aligned} 2 H^2 \sum _{d_1^2, d_2^2 \le z} \frac{\mu (d_1) \mu (d_2)}{d_1^2 d_2^2} \sum _{\lambda \ge 1} \Big | W \Big ( \frac{H \lambda }{(d_1^2, d_2^2)} \Big ) \Big |^2 = C H^{1/2} \pi \int _0^\infty |W(y)|^2 \sqrt{y} dy + O(H^{1/2-\varepsilon /5}), \end{aligned}$$
(17)

where C is as in (4) and where the implied constant depends only on \(K_0\) and \(\varepsilon \).

Proof

The proof consists of two steps.

The first step is to show that the sum in (17) can be completed into a sum over all \(d_1, d_2\) without affecting the claimed asymptotic. We use the information (16) for \(k=0\) and \(\ell = 0, 1\) to see that, for any \(\nu > 0\),

$$\begin{aligned} \sum _{\lambda \ge 1} |W(\lambda /\nu )|^2 \ll \sum _{1 \le \lambda \le \nu H^{\varepsilon /4}} 1 + \sum _{\lambda > \nu H^{\varepsilon /4}} \frac{H^{\varepsilon /2} \nu ^2}{\lambda ^2} \ll \nu H^{\varepsilon /4}. \end{aligned}$$
(18)

Hence

$$\begin{aligned} 2H^2 \sum _{\begin{array}{c} d_1> z^{1/2} \\ \text {or} \\ d_2> z^{1/2} \end{array}} \frac{\mu (d_1) \mu (d_2)}{d_1^2 d_2^2} \sum _{\lambda \ge 1} \Big |W\Big (\frac{H\lambda }{(d_1^2,d_2^2)}\Big ) \Big |^2 \ll H^{1+\varepsilon /4} \sum _{\begin{array}{c} d_1> z^{1/2} \\ \text {or} \\ d_2 > z^{1/2} \end{array}} \frac{(d_1,d_2)^2}{d_1^2 d_2^2}. \end{aligned}$$

Writing \((d_1, d_2) = d_0\) and \(d_i = \delta _i d_0\) and utilizing symmetry and the lower bound for z, we see that this is

$$\begin{aligned} \ll H^{1+\varepsilon /4} \sum _{d_0 \ge 1} \frac{1}{d_0^2} \sum _{\begin{array}{c} \delta _1 \ge H^{(1+\varepsilon )/2}/d_0 \\ \delta _2 \ge 1 \end{array}} \frac{1}{\delta _1^2 \delta _2^2} \ll H^{1+\varepsilon /4} \sum _{d_0 \ge 1} \frac{1}{d_0^2} \min \left\{ 1, \frac{d_0}{H^{(1+\varepsilon )/2}}\right\} \ll H^{1/2-\varepsilon /5}. \end{aligned}$$

Thus the left-hand side of (17) is

$$\begin{aligned} 2H^2 \sum _{d_1, d_2 \ge 1} \frac{\mu (d_1) \mu (d_2)}{d_1^2 d_2^2} \sum _{\lambda \ge 1} \Big | W\Big (\frac{H\lambda }{(d_1^2,d_2^2)}\Big ) \Big |^2 + O(H^{1/2-\varepsilon /5}). \end{aligned}$$
(19)

The second step is to use contour integration to simplify (19). Define \(g(x) = |W(e^x)|^2 e^x\), which is smooth and decays exponentially as \(|x|\rightarrow \infty \). Now

$$\begin{aligned} {\hat{g}}(\xi ) = \int _{-\infty }^\infty |W(e^x)|^2 e^x e(-x\xi ) dx = \int _0^\infty |W(y)|^2 y^{-2\pi i \xi } dy, \end{aligned}$$

and standard partial integration arguments show that (i) \({\hat{g}}(\xi )\) is entire and (ii) \({\hat{g}}(\xi ) = O(H^{2\varepsilon }/(|\xi |+1)^3)\) uniformly for \(|\mathfrak {I}(\xi )| < 1/(2\pi )\). Fourier inversion implies, for \(r > 0\),

$$\begin{aligned} |W(r)|^2 = r^{-1} \frac{1}{2\pi i} \int _{(c)} r^s {\hat{g}}\Big (\frac{s}{2\pi i}\Big )\, ds, \end{aligned}$$

where the integral is over \(\mathfrak {R}(s) = c\), and \(-1< c < 1\).

Hence, taking \(c = -1/4\),

$$\begin{aligned}&2H^2 \sum _{d_1, d_2 \ge 1} \frac{\mu (d_1) \mu (d_2)}{d_1^2 d_2^2} \sum _{\lambda \ge 1} \Big | W\Big (\frac{H\lambda }{(d_1^2,d_2^2)}\Big )\Big |^2 \\&\quad = \frac{H}{i\pi } \sum _{d_1, d_2} \frac{\mu (d_1)\mu (d_2)}{d_1^2 d_2^2} \sum _{\lambda \ge 1} \frac{(d_1, d_2)^2}{\lambda } \int _{(-1/4)} H^s \lambda ^s (d_1, d_2)^{-2s} {\hat{g}}\Big (\frac{s}{2\pi i}\Big )\, ds. \end{aligned}$$

The range of s is such that the sums over both \(\lambda \) and \(d_1, d_2\) can be taken inside the integral, and the above simplifies to

$$\begin{aligned}&\frac{H}{i\pi } \int _{(-1/4)} H^s \zeta (1-s) \prod _p \Big (1 - \frac{2}{p^2} + \frac{1}{p^{2+2s}}\Big ) {\hat{g}}\Big (\frac{s}{2\pi i}\Big )\, ds \\&\quad = \frac{H}{i\pi } \int _{(-1/4)} H^s \zeta (1-s) \zeta (2+2s) \prod _p \Big (1 - \frac{2}{p^2} + \frac{2}{p^{4+2s}} - \frac{1}{p^{4+4s}}\Big ) {\hat{g}}\Big (\frac{s}{2\pi i}\Big )\, ds. \end{aligned}$$

The Euler product in the last line converges absolutely for \(\mathfrak {R}s > -3/4\). Therefore using Lemma 5 (noting that the same bound holds also for \(\zeta (c+it)\) with \(c \ge 1/2\)) and bounds on \({\hat{g}}\) we can push the contour integral above to the left to an integral over \(\mathfrak {R}(s) = -3/4+\varepsilon \). Because of the singularity from \(\zeta (2+2s)\) at \(s = -1/2\) the above then simplifies to

$$\begin{aligned} \begin{aligned}&H^{1/2}\zeta (3/2) \prod _p \Big (1 - \frac{3}{p^2} + \frac{2}{p^3}\Big ) {\hat{g}}\Big (-\frac{1}{4\pi i}\Big ) + O(H^{1/4+3\varepsilon }) \\&\quad = H^{1/2} \int _0^\infty |W(y)|^2 \sqrt{y}\, dy \zeta (3/2) \prod _p \Big (1 - \frac{3}{p^2} + \frac{2}{p^3}\Big ) + O(H^{1/2-\varepsilon /5}). \end{aligned} \end{aligned}$$

This verifies the lemma. \(\square \)

We also have a minor variant:

Lemma 9

Fix \(\varepsilon \in (0,\tfrac{1}{100})\). Let \(S(x)= \frac{\sin \pi x}{\pi x}\), defined by continuity at \(x=0\), and let \(z \ge H^{1+\varepsilon }\). Then

$$\begin{aligned} 2 H^2 \sum _{d_1^2, d_2^2 \le z} \frac{\mu (d_1) \mu (d_2)}{d_1^2 \cdot d_2^2} \sum _{\lambda \ge 1} S \Big ( \frac{H \lambda }{(d_1^2, d_2^2)} \Big )^2 = C H^{1/2} + O(H^{1/2-\varepsilon /8}). \end{aligned}$$
(20)

Proof

We first note that

$$\begin{aligned} \int _0^\infty S(y)^2 \sqrt{y}\, dy = \frac{1}{\pi }. \end{aligned}$$
(21)

This identity follows from [GR14, formula 3.823].
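The identity can also be confirmed numerically (our sketch; the cutoff \(T\) and the step size are arbitrary): Simpson's rule on \([0, T]\), plus the smoothed tail \(\frac{1}{2\pi^2}\int_T^\infty y^{-3/2}\,dy = \frac{1}{\pi^2\sqrt{T}}\) obtained by replacing \(\sin^2\) with its mean \(1/2\).

```python
# Numerical check that the integral of S(y)^2 sqrt(y) over (0, infinity)
# equals 1/pi, where S(y) = sin(pi y) / (pi y).
import math

def f(y):
    s = 1.0 if y == 0 else math.sin(math.pi * y) / (math.pi * y)
    return s * s * math.sqrt(y)

T, n = 2000.0, 400_000  # Simpson's rule with step h = 0.005; n must be even
h = T / n
total = f(0.0) + f(T)
for i in range(1, n):
    total += (4 if i % 2 else 2) * f(i * h)
integral = total * h / 3 + 1 / (math.pi**2 * math.sqrt(T))  # tail beyond T smoothed
```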

Thus (20) is a variant of (17). Lemma 8 does not apply directly because S does not decay quickly enough. To overcome this issue, we let h be a smooth bump function such that \(h(x) = 1\) for \(|x| \le 1\) and \(h(x) = 0\) for \(|x|\ge 2\). We introduce the function

$$\begin{aligned} W(y) = S(y)h(y/H^{\varepsilon /4}) \end{aligned}$$

which satisfies the hypothesis of Lemma 8 for our \(\varepsilon \). On the other hand for such W

$$\begin{aligned}&2 H^2 \sum _{d_1^2, d_2^2 \le z} \frac{\mu (d_1) \mu (d_2)}{d_1^2 \cdot d_2^2} \sum _{\lambda \ge 1} \left( S \Big ( \frac{H \lambda }{(d_1^2, d_2^2)} \Big )^2 - W \Big ( \frac{H \lambda }{(d_1^2, d_2^2)} \Big )^2 \right) \\&\quad \ll H^2 \sum _{d_1^2, d_2^2} \frac{1}{d_1^2 d_2^2} \sum _{\lambda \ge 1} \frac{1}{(H\lambda /(d_1^2,d_2^2))^2} {\mathbf {1}}\Big ( \frac{H\lambda }{(d_1^2,d_2^2)} \ge H^{\varepsilon /4}\Big ). \end{aligned}$$

We split the sum over \(d_1\) and \(d_2\) into the complementary ranges \((d_1^2, d_2^2) \le H^{1-\varepsilon /4}\) and \((d_1^2, d_2^2) > H^{1-\varepsilon /4}\). In the second case we utilize that \(\lambda > H^{\varepsilon /4-1} (d_1^2, d_2^2)\), and we see that the above is

$$\begin{aligned} \ll \sum _{(d_1,d_2)^2 \le H^{1-\varepsilon /4}} \frac{(d_1,d_2)^4}{d_1^2 d_2^2} + H^{1-\varepsilon /4}\sum _{(d_1,d_2)^2 > H^{1-\varepsilon /4}} \frac{(d_1,d_2)^2}{d_1^2 d_2^2}. \end{aligned}$$

Writing \((d_1, d_2) = d_0\) and \(d_i = \delta _i d_0\), the above is

$$\begin{aligned} \ll \sum _{d_0 \le H^{1/2-\varepsilon /8}} \sum _{\delta _1, \delta _2} \frac{1}{\delta _1^2\delta _2^2} + H^{1-\varepsilon /4} \sum _{d_0 > H^{1/2-\varepsilon /8}} \frac{1}{d_0^2} \sum _{\delta _1, \delta _2} \frac{1}{\delta _1^2\delta _2^2} \ll H^{1/2-\varepsilon /8}. \end{aligned}$$
(22)

On the other hand,

$$\begin{aligned} \int _0^\infty S(y)^2\sqrt{y}\,dy - \int _0^\infty W(y)^2\sqrt{y}\,dy \ll \int _{H^{\varepsilon /4}}^\infty y^{-3/2}\, dy \ll H^{-\varepsilon /8}. \end{aligned}$$

Combining this with the bound (22) and the identity (21) verifies (20) with error term of order \(H^{1/2-\varepsilon /5} + H^{1/2-\varepsilon /8} \ll H^{1/2-\varepsilon /8}\). \(\square \)
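Since (20) arises from Lemma 8 with \(W = S\) together with the identity (21), the constant should be \(C = \zeta (3/2) \pi ^{-1} \prod _p (1 - 3/p^2 + 2/p^3)\), which is presumably the value fixed in (4). A short pure-Python computation (a sketch; the truncation parameters are illustrative) evaluates it:

```python
import math

def primes_up_to(n):
    """Sieve of Eratosthenes."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [p for p in range(2, n + 1) if sieve[p]]

def zeta_three_halves(N=10 ** 6):
    # zeta(3/2) as a partial sum plus the integral tail 2 / sqrt(N)
    return sum(n ** -1.5 for n in range(1, N + 1)) + 2 / math.sqrt(N)

euler_product = 1.0
for p in primes_up_to(10 ** 5):
    euler_product *= 1 - 3 / p ** 2 + 2 / p ** 3

C = zeta_three_halves() / math.pi * euler_product
print(C)  # approximately 0.2384
```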

3.3 Initial reductions on second moments.

The following lemma will be used in the proof of Proposition 2.

Lemma 10

If \(F {:}\,{\mathbb {R}} \rightarrow {\mathbb {C}}\) is square-integrable and \(H \le X\), then

$$\begin{aligned} \int _{X}^{2X} |F(x + H) - F(x)|^2 dx \ll \sup _{\theta \in [\frac{H}{3X}, \frac{3 H}{X}]} \int _{X}^{3X} |F(u + \theta u) - F(u)|^2 du. \end{aligned}$$

Proof

The proof can be found in a paper by Saffari and Vaughan [SV77, Page 25] but for the convenience of the reader we include the proof here.

First note that by the triangle inequality we have, for any \(v \ge H\),

$$\begin{aligned} |F(x+H)-F(x)|^2 \ll |F(x+v)-F(x)|^2 + |F(x+v)-F(x+H)|^2. \end{aligned}$$

Integrating this over \(x \in [X,2X]\) and \(v \in [2H,3H]\),

$$\begin{aligned}&H \int _X^{2X} |F(x+H)-F(x)|^2\,dx \\&\quad \ll \int _{2H}^{3H} \int _X^{2X} |F(x+v)-F(x)|^2\, dx dv \\&\qquad +\,\int _{2H}^{3H} \int _X^{2X} |F(x+v) - F(x+H)|^2\, dx dv. \end{aligned}$$

By a change of variables the right-hand side is equal to

$$\begin{aligned}&\int _{2H}^{3H} \int _X^{2X} |F(x+v)-F(x)|^2\, dxdv + \int _H^{2H} \int _{X+H}^{2X+H} |F(y+w)-F(y)|^2\, dy dw \\&\quad \le \int _H^{3H} \int _X^{3X} |F(x+v)-F(x)|^2\, dx dv = \int _X^{3X} \int _H^{3H} |F(x+v)-F(x)|^2\, dv dx. \end{aligned}$$

Changing the order of integration was justified by Fubini’s theorem. Letting \(v = \theta x\) in the inner integral of the last expression above, we see the right-hand side is equal to

$$\begin{aligned} \int _X^{3X} \int _{H/x}^{3H/x} |F(x+\theta x) - F(x)|^2 x \, d\theta dx \ll X \int _X^{3X} \int _{H/3X}^{3H/X} |F(x+\theta x) - F(x)|^2\, d\theta dx. \end{aligned}$$

Collecting everything and swapping the order of integration again, we obtain

$$\begin{aligned} H \int _X^{2X} |F(x+H) - F(x)|^2\, dx \ll X \int _{H/3X}^{3H/X} \int _X^{3X} |F(u+\theta u) - F(u)|^2\, dud\theta , \end{aligned}$$

which immediately implies the claim. \(\square \)
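For a concrete illustration of the lemma, the sketch below evaluates both sides numerically for the test function \(F(x) = \sin x\), with an illustrative constant 10 standing in for the unspecified implied constant:

```python
import math

def riemann(f, a, b, n=20000):
    """Simple left-endpoint Riemann sum of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + i * h) for i in range(n))

F = math.sin
X, H = 100.0, 5.0

# Left-hand side of the inequality in Lemma 10
lhs = riemann(lambda x: abs(F(x + H) - F(x)) ** 2, X, 2 * X)

# The sup over theta in [H/(3X), 3H/X], approximated by a grid maximum;
# a grid maximum only under-estimates the sup, so the check is conservative.
lo, hi = H / (3 * X), 3 * H / X
rhs = max(
    riemann(lambda u, t=t: abs(F(u + t * u) - F(u)) ** 2, X, 3 * X)
    for t in [lo + k * (hi - lo) / 50 for k in range(51)]
)

print(lhs, rhs)
```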

We will frequently use the following immediate consequences of the orthogonality of characters: for any sequence \(b_n\) of complex numbers,

$$\begin{aligned} \begin{aligned} \frac{1}{\varphi (q)} \sum _{\begin{array}{c} \chi \, (\mathrm {mod}\, q) \\ \chi \ne \chi _0 \end{array}} \left| \sum _{n} b_n \chi (n)\right| ^2&= \sum _{\begin{array}{c} n_1\, \equiv \, n_2 \, (\mathrm {mod}\, q) \\ (n_1 n_2, q) = 1 \end{array}} b_{n_1} \overline{b_{n_2}} - \frac{1}{\varphi (q)} \sum _{\begin{array}{c} n_1, n_2 \\ (n_1 n_2, q) = 1 \end{array}} b_{n_1} \overline{b_{n_2}}\\&= \sum _{\begin{array}{c} a \, (\mathrm {mod}\, q) \\ (a, q) = 1 \end{array}} \left| \sum _{n\, \equiv \, a \, (\mathrm {mod}\, q)} b_n - \frac{1}{\varphi (q)} \sum _{(n, q) = 1} b_n \right| ^2 \end{aligned} \end{aligned}$$
(23)

and

$$\begin{aligned} \begin{aligned} \frac{1}{\varphi (q)} \sum _{\chi \, (\mathrm {mod}\, q)} \left| \sum _{n} b_n \chi (n)\right| ^2&= \sum _{\begin{array}{c} n_1\, \equiv \, n_2 \, (\mathrm {mod}\, q) \\ (n_1 n_2, q) = 1 \end{array}} b_{n_1} \overline{b_{n_2}} = \sum _{\begin{array}{c} a \, (\mathrm {mod}\, q) \\ (a, q) = 1 \end{array}} \left| \sum _{n\, \equiv \, a \, (\mathrm {mod}\, q)} b_n \right| ^2. \end{aligned} \end{aligned}$$
(24)

3.4 Point-counting lemmas.

Lemma 11

Let \(a,b \in {\mathbb {N}}\) be such that \(\sqrt{b / a}\) is irrational. Let \(\eta \in (0, 1]\) and \(M \ge 1\). The number of \(m \sim M\) such that

$$\begin{aligned} \Big \Vert m \sqrt{\frac{b}{a}} \Big \Vert \le \eta \end{aligned}$$
(25)

is bounded by

$$\begin{aligned} \ll \eta M + \sqrt{\eta M} (a b)^{1/4} + 1. \end{aligned}$$

Proof

We can clearly assume that \(\eta ^{1/2} (ab)^{1/4} \le M^{1/2}\) since otherwise the claim is trivial. Assume we have a (reduced) rational approximation r/q with \(r \in {\mathbb {Z}}\) and \(q \in {\mathbb {N}}\) such that

$$\begin{aligned} \left| \sqrt{\frac{b}{a}} - \frac{r}{q}\right| \le \frac{1}{q^2}. \end{aligned}$$
(26)

Now, writing each \(m \in (M, 2M]\) as \(m = kq + \ell \) with \(0 \le \ell \le q-1\), we see that the number of solutions to (25) with \(m \sim M\) is at most

$$\begin{aligned} \begin{aligned}&\sum _{\lfloor M/q \rfloor \le k \le 2M/q} \Bigl |\Bigl \{0 \le \ell \le q-1 {:}\,\Big \Vert (kq + \ell ) \sqrt{\frac{b}{a}} \Big \Vert \le \eta \Bigr \}\Bigr |\\&\quad \ll \left( \frac{M}{q} + 1\right) \max _{\xi \in [0, 1]} \Bigl |\Bigl \{0 \le \ell \le q-1 {:}\,\Big \Vert \ell \sqrt{\frac{b}{a}} + \xi \Big \Vert \le \eta \Bigr \}\Bigr | \\&\quad \ll \left( \frac{M}{q} + 1\right) \max _{\xi \in [0, 1]} \Bigl |\Bigl \{0 \le \ell \le q-1 {:}\,\Big \Vert \frac{\ell r}{q} + \xi \Big \Vert \le \eta + 1/q \Bigr \}\Bigr | \\&\quad \ll \left( \frac{M}{q} + 1\right) \left( q \cdot \eta + 1\right) \ll M\eta + \frac{M}{q} + q\eta + 1. \end{aligned} \end{aligned}$$

Now since \(\sqrt{b/a} = \sqrt{ab}/a\) is a quadratic irrational, the partial denominators in its continued fraction expansion have size at most \(2\sqrt{ab}\) (see for instance [RS92, p. 44]). In particular this means that for any given \(R \ge 1\), we can find \(q \in [R, 3\sqrt{ab} R]\) such that (26) holds for some r coprime to q. Taking \(R = M^{1/2}/(\eta ^{1/2} (ab)^{1/4}) \ge 1\), we see that the number of solutions is indeed

$$\begin{aligned} \ll M\eta + M^{1/2} \eta ^{1/2} (ab)^{1/4} + 1. \end{aligned}$$

\(\square \)
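Lemma 11 is easy to stress-test by brute force; the sketch below does so for \(\sqrt{b/a} = \sqrt{2}\), with an illustrative constant 10 in place of the implied constant:

```python
import math

def count_close(alpha, M, eta):
    """Count m in (M, 2M] with || m * alpha || <= eta."""
    return sum(
        1 for m in range(M + 1, 2 * M + 1) if abs(m * alpha - round(m * alpha)) <= eta
    )

a, b = 1, 2
alpha = math.sqrt(b / a)  # sqrt(2) is a quadratic irrational

results = []
for M in (100, 1000, 10000):
    for eta in (0.001, 0.01, 0.1):
        bound = eta * M + math.sqrt(eta * M) * (a * b) ** 0.25 + 1
        results.append((count_close(alpha, M, eta), bound))

for count, bound in results:
    print(count, bound)
```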

Lemma 12

Let \(a, b \in {\mathbb {N}}\) be such that \(\sqrt{b/a}\) is irrational, and let \(M_1, M_2, T \ge 1\). The number of solutions to

$$\begin{aligned} |a m_1^2 - b m_2^2| \le \frac{b M_2^2}{T} \quad \text {with }m_1 \sim M_1\, \mathrm{and}\, m_2 \sim M_2 \end{aligned}$$

is

$$\begin{aligned} \ll \frac{M_1 M_2}{T} + \Big ( \frac{(M_1 M_2)^{1/2}(ab)^{1/4}}{T^{1/2}} + 1 \Big ) \cdot {\mathbf {1}}_{M_2 < T}. \end{aligned}$$

Proof

Dividing by b and factoring, we see that we need to count the number of solutions to

$$\begin{aligned} \left| \left( m_1 \sqrt{\frac{a}{b}} - m_2\right) \left( m_1 \sqrt{\frac{a}{b}} + m_2\right) \right| \le \frac{M_2^2}{T}. \end{aligned}$$

Dividing by the second factor, we see that it suffices to count the number of solutions to

$$\begin{aligned} \left| m_1 \sqrt{\frac{a}{b}} - m_2\right| \le \frac{M_2}{T}. \end{aligned}$$

If \(M_2 \ge T\), we have \(M_1\) choices for \(m_1\) and, for each of them, \(O(M_2/T)\) choices for \(m_2\), so in total \(O(M_1 M_2/T)\) solutions, which is acceptable.

If \(M_2 < T\), then once \(m_1\) is chosen there are at most two choices for \(m_2\). Therefore it suffices to count the number of integers \(m_1 \sim M_1\) such that

$$\begin{aligned} \left\| m_1 \sqrt{\frac{a}{b}} \right\| \le \frac{M_2}{T}. \end{aligned}$$

The result now follows from Lemma 11, applied with the roles of a and b interchanged (note that the bound of Lemma 11 is symmetric in a and b). \(\square \)
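Lemma 12 can likewise be checked by direct enumeration; the following sketch covers both the \(M_2 \ge T\) and \(M_2 < T\) regimes, again with an illustrative constant 10:

```python
import math

def count_solutions(a, b, M1, M2, T):
    """Count pairs with m1 ~ M1, m2 ~ M2 and |a m1^2 - b m2^2| <= b M2^2 / T."""
    limit = b * M2 * M2 / T
    return sum(
        1
        for m1 in range(M1 + 1, 2 * M1 + 1)
        for m2 in range(M2 + 1, 2 * M2 + 1)
        if abs(a * m1 * m1 - b * m2 * m2) <= limit
    )

a, b = 2, 3  # sqrt(b/a) = sqrt(3/2) is irrational
checks = []
for M1, M2, T in ((200, 200, 50), (200, 200, 400), (300, 150, 100)):
    count = count_solutions(a, b, M1, M2, T)
    bound = M1 * M2 / T
    if M2 < T:
        bound += math.sqrt(M1 * M2) * (a * b) ** 0.25 / math.sqrt(T) + 1
    checks.append((count, bound))

for count, bound in checks:
    print(count, bound)
```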

4 The range \(H^{1+\varepsilon } \le z \le \min \{X/H^{1/2+\varepsilon }, H^{1/2-\varepsilon }X^{1/2}\}\) in the t-aspect: Proof of Proposition 1

In what follows we let S be the sinc function as defined in Lemma 9. Proposition 1 follows immediately upon combining the following proposition with Lemma 9.

Proposition 5

Let \(X^{\varepsilon } \le H \le X^{2/3 - \varepsilon }\) and \(H^{1+\varepsilon } \le z \le \min \{X^{1-\varepsilon }/H^{1/2}, H^{1/2-\varepsilon }X^{1/2}\}\). Then, as \(X \rightarrow \infty \),

$$\begin{aligned}&\frac{1}{X} \int _X^{2X} \Big | \sum _{d^2 \le z} \mu (d) \sum _{x/d^2 < n \le (x+H)/d^2} 1 - H \sum _{d^2 \le z} \frac{\mu (d)}{d^2}\Big |^2\, dx \\&\quad = (1+O(H^{-\varepsilon /2}))2H^2 \sum _{k_1^2,k_2^2 \le z} \frac{\mu (k_1) \mu (k_2)}{k_1^2 k_2^2} \sum _{\lambda \ge 1} S\Big (\frac{H\lambda }{(k_1^2,k_2^2)}\Big )^2 + O(H^{1/2-\varepsilon /3}). \end{aligned}$$

Proof

We prove a smoothed version of the claim first. Let \(\sigma {:}\,{\mathbb {R}}\rightarrow {\mathbb {R}}\) be an absolutely integrable function such that \({\hat{\sigma }}\) is supported in the interval \([-BH^{\varepsilon /2},B H^{\varepsilon /2}]\) for some constant B to be specified later. We first show that as \(X\rightarrow \infty \),

$$\begin{aligned}&\frac{1}{X} \int _{-\infty }^{\infty } \sigma \left( \frac{x}{X}\right) \Big | \sum _{d^2 \le z} \mu (d) \sum _{x/d^2 < n \le (x+H)/d^2} 1 - H \sum _{d^2 \le z} \frac{\mu (d)}{d^2}\Big |^2\, dx \nonumber \\&\quad = 2{\hat{\sigma }}(0) H^2 \sum _{k_1^2,k_2^2 \le z} \frac{\mu (k_1) \mu (k_2)}{k_1^2 k_2^2} \sum _{\lambda \ge 1} S\Big (\frac{H\lambda }{(k_1^2,k_2^2)}\Big )^2 + O(H^{1/2-\varepsilon /3}). \end{aligned}$$
(27)

Here

$$\begin{aligned} \sum _{x/d^2 < n \le (x+H)/d^2} 1 = H/d^2 + \psi (x/d^2) - \psi ((x+H)/d^2), \end{aligned}$$
(28)

where \(\psi (y) = y - [y] - 1/2\) with [y] the integral part of y. For \(\psi \) we have the Fourier expansion (see e.g. [IK04, (4.18)])

$$\begin{aligned} \psi (y) = - \frac{1}{2\pi i} \sum _{0 < |n| \le N} \frac{1}{n} e(y n) + O(\min \{1, 1/(N\Vert y \Vert )\}). \end{aligned}$$
(29)
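The expansion (29) is convenient to check numerically: since \(-\frac{1}{2\pi i} \sum _{0 < |n| \le N} \frac{e(yn)}{n} = -\frac{1}{\pi } \sum _{n=1}^{N} \frac{\sin (2\pi n y)}{n}\), one can compare the truncated series against \(\psi \) directly (a sketch; the sample points are arbitrary):

```python
import math

def psi(y):
    """The sawtooth function psi(y) = y - [y] - 1/2."""
    return y - math.floor(y) - 0.5

def psi_fourier(y, N):
    # -(1/(2 pi i)) sum_{0 < |n| <= N} e(yn)/n = -(1/pi) sum_{n <= N} sin(2 pi n y)/n
    return -sum(math.sin(2 * math.pi * n * y) / n for n in range(1, N + 1)) / math.pi

errs = [abs(psi(y) - psi_fourier(y, 10 ** 5)) for y in (0.3, 0.71, 2.123)]
print(errs)
```

The errors are of size \(O(1/(N \Vert y \Vert ))\), as predicted by (29).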

We take \(N = X^{10}\) and plug (29) into (28). The arising error term is \(O(1/X^5)\) unless \(\Vert x/d^2 \Vert < X^{-5}\) or \(\Vert (x+H)/d^2 \Vert < X^{-5}\). Given this, it is easy to see that the error term contributes acceptably to the left-hand side of (27).

Hence, the left hand side of (27) can be replaced by

$$\begin{aligned} \frac{1}{4\pi ^2 X} \int _{-\infty }^{\infty } \sigma \left( \frac{x}{X}\right) \Big | \sum _{d^2 \le z} \mu (d) \sum _{0 < |n| \le N} \frac{1}{n} e\left( \frac{n x}{d^2}\right) \left( 1-e\left( \frac{nH}{d^2}\right) \right) \Big |^2\, dx. \end{aligned}$$

Expanding, this equals

$$\begin{aligned}&\frac{1}{4 \pi ^2}\sum _{d_1^2, d_2^2 \le z} \sum _{0 < |n_1|, |n_2| \le N} \mu (d_1) \mu (d_2) \frac{1}{n_1 n_2}\nonumber \\&\times \left( 1-e\left( \frac{n_1 H}{d_1^2}\right) \right) \overline{\left( 1-e\left( \frac{n_2H}{d_2^2}\right) \right) } {\hat{\sigma }}\Big (-X \Big ( \frac{n_1}{d_1^2} - \frac{n_2}{d_2^2}\Big )\Big ). \end{aligned}$$
(30)

Owing to the support of \({\hat{\sigma }}\), we may restrict the sum in (30) to those integers for which

$$\begin{aligned} \Big | \frac{n_1}{d_1^2} - \frac{n_2}{d_2^2} \Big | \le \frac{BH^{\varepsilon /2}}{X}. \end{aligned}$$
(31)

We consider separately those \((n_1, n_2, d_1, d_2)\) for which \(n_1 d_2^2 = n_2 d_1^2\) and those for which this does not hold. In the first case parameterizing solutions in \(n_1\) and \(n_2\) by \(n_1 = \lambda d_1^2/(d_1^2,d_2^2)\) and \(n_2 = \lambda d_2^2 /(d_1^2,d_2^2)\) for \(\lambda \in {\mathbb {Z}}\setminus \{0\}\), we obtain

$$\begin{aligned} \frac{{\hat{\sigma }}(0)}{4\pi ^2} \sum _{d_1^2, d_2^2 \le z} \mu (d_1) \mu (d_2) \sum _{\lambda \ne 0} \frac{(d_1, d_2)^4}{d_1^2 d_2^2 \lambda ^2} \left| 1-e\left( \frac{\lambda H}{(d_1, d_2)^2}\right) \right| ^2 + O\left( \frac{1}{X^5}\right) , \end{aligned}$$

where the error term comes from adding back the terms with \(|n_i| > N\) (for which necessarily \(|\lambda | > X^8\)). Here

$$\begin{aligned} \left| 1-e\left( \frac{\lambda H}{(d_1, d_2)^2}\right) \right| = 2\left| \sin \left( \frac{\lambda \pi H}{(d_1^2, d_2^2)}\right) \right| , \end{aligned}$$

so we get the desired main term involving \(S(\lambda H/(d_1^2, d_2^2))\).

Therefore it remains to show that the contribution of terms with \(n_1d_2^2 \ne n_2 d_1^2\) is negligible. Splitting \(n_j\) and \(d_j\) dyadically, we need to bound, for any \(D_1, D_2 \le z^{1/2}\) and any \(N_1, N_2 \le N\),

$$\begin{aligned} \min \left\{ \frac{1}{N_1}, \frac{H}{D_1^2}\right\} \min \left\{ \frac{1}{N_2}, \frac{H}{D_2^2}\right\} \sum _{\begin{array}{c} n_1 \sim N_1 \\ n_2 \sim N_2 \end{array}} \# \left\{ (d_1, d_2) {:}\,d_j \sim D_j, 0 < \Big | \frac{n_1}{d_1^2} - \frac{n_2}{d_2^2} \Big | \le \frac{BH^{\varepsilon /2}}{X} \right\} \nonumber \\ \end{aligned}$$
(32)

and we need a bound that is \(O(H^{1/2 - \varepsilon /2})\). Now

$$\begin{aligned} \begin{aligned}&\# \{(d_1, d_2) {:}\,d_j \sim D_j, 0< \Big | \frac{n_1}{d_1^2} - \frac{n_2}{d_2^2} \Big | \le \frac{BH^{\varepsilon /2}}{X} \} \\&\quad \ll \# \left\{ (d_1, d_2) {:}\,d_j \sim D_j, 0 < \Big | n_1 d_2^2 - n_2 d_1^2 \Big | \le 16\frac{BD_2^2 H^{\varepsilon /2}}{X N_2} \cdot D_1^2 N_2\right\} . \end{aligned} \end{aligned}$$
(33)

Notice that there are no solutions unless

$$\begin{aligned} N_1 D_2^2 \asymp N_2 D_1^2. \end{aligned}$$
(34)

We split into two cases according to whether \(\sqrt{n_2/n_1}\) is irrational (in which case it is a quadratic irrational) or rational. In the first case we can apply Lemma 12, which shows that the number of solutions in (33) is

$$\begin{aligned} \begin{aligned}&\ll \frac{H^{\varepsilon /2} D_1 D_2^3}{X N_2} + 1+ \frac{D_1^{1/2} D_2^{3/2} N_1^{1/4} H^{\varepsilon /4}}{X^{1/2} N_2^{1/4}}. \end{aligned} \end{aligned}$$

By (34) we can multiply the first term by \((D_1/D_2) (N_2/N_1)^{1/2}\) and the third term by \((D_1/D_2)^{1/2} (N_2/N_1)^{1/4}\) to obtain

$$\begin{aligned} \ll \frac{H^{\varepsilon /2} D_1^2 D_2^2}{X (N_1N_2)^{1/2}} + 1+ \frac{D_1 D_2 H^{\varepsilon /4}}{X^{1/2}}. \end{aligned}$$

Using this bound in (32), and summing over \(n_1\) and \(n_2\), we note that the maximum is attained for \(N_j = D_j^2/H\) and thus the contribution to (32) from \(\sqrt{n_2/n_1}\) quadratic irrational is bounded by

$$\begin{aligned} \ll H^{\varepsilon /2}\left( \frac{D_1 D_2 H}{X} + 1 +\frac{D_1 D_2}{X^{1/2}} \right) = O(H^{1/2-\varepsilon /2}) \end{aligned}$$

since \(D_1 \cdot D_2 \le z \le \min \{X/H^{1/2+\varepsilon }, H^{1/2-\varepsilon } X^{1/2}\}\).

In case \(\sqrt{n_2/n_1}\) is rational, there exist \(m, \ell _1, \ell _2 \in {\mathbb {Z}}\) such that \(n_1 = m \ell _1^2\) and \(n_2 = m \ell _2^2\). Hence, writing \(r_1^2 = \ell _1^2 d_2^2\) and \(r_2^2 = \ell _2^2 d_1^2\), we see that the contribution to (32) for \(\sqrt{n_2/n_1}\) rational is bounded by

$$\begin{aligned} \begin{aligned}&\ll H^{\varepsilon /1000} \min \left\{ \frac{1}{N_1}, \frac{H}{D_1^2}\right\} \min \left\{ \frac{1}{N_2}, \frac{H}{D_2^2}\right\} \\&\qquad \times \sum _{m} \#\left\{ (r_1, r_2) {:}\,r_j \le D_j\sqrt{N_j/m}, 0 < |r_1^2 - r_2^2| \le \frac{BH^{\varepsilon /2} D_1^2 D_2^2}{m X}\right\} . \end{aligned} \end{aligned}$$
(35)

Factoring \(r_1^2-r_2^2 = (r_1-r_2)(r_1+r_2)\) and dividing by the second factor, we see that the number of solutions \((r_1, r_2)\) is

$$\begin{aligned} \ll \frac{BH^{\varepsilon /2} D_1^2 D_2^2}{m X} \log X. \end{aligned}$$

Summing over \(m \ll \min \{N_1, N_2\}\) and using this bound in (35), the maximum in the resulting bound for (35) is attained for \(N_j = D_j^2/H\). Hence we obtain that (35) is at most \(H^{2+\varepsilon /2+\varepsilon /500}/X \le H^{1/2-\varepsilon /2}\) since \(H \le X^{2/3-\varepsilon }\).

Let us now dispose of the smoothing \(\sigma \): take B to be a sufficiently large absolute constant so that there exist integrable functions \(\sigma _-\) and \(\sigma _+\) such that \({\widehat{\sigma }}_-\) and \({\widehat{\sigma }}_+\) are supported in \([-B H^{\varepsilon /2}, BH^{\varepsilon /2}]\), and

$$\begin{aligned} \sigma _- \le {\mathbf {1}}_{[1,2]} \le \sigma _+, \quad \text {and}\quad \Big | \int \sigma _{\pm }(x)\,dx -1 \Big | \le H^{-\varepsilon /2}. \end{aligned}$$

(We allow \(\sigma _-\) and \(\sigma _+\) to take negative values.) An explicit construction of such functions is given by the Beurling–Selberg majorant and minorant [Mon01, p. 273]. Applying (27) and these bounds,

$$\begin{aligned}&\frac{1}{X} \int _{-\infty }^\infty {\mathbf {1}}_{[1,2]}\Big (\frac{x}{X}\Big ) \Big | \sum _{d^2 \le z} \mu (d) \sum _{x/d^2 < n \le (x+H)/d^2} 1 - H \sum _{k^2 \le z} \frac{\mu (k)}{k^2}\Big |^2\, dx \\&\quad = (1+O(H^{-\varepsilon /2})) 2H^2 \sum _{k_1^2,k_2^2 \le z} \frac{\mu (k_1) \mu (k_2)}{k_1^2 k_2^2} \sum _{\lambda \ge 1} S\Big (\frac{H\lambda }{(k_1^2,k_2^2)}\Big )^2 + O(H^{1/2-\varepsilon /3}). \end{aligned}$$

\(\square \)

5 The range \(z \ge H^{4/3+\varepsilon }\) in the t-aspect: Proof of Proposition 2

We would like to establish that

$$\begin{aligned} \frac{1}{X} \int _{X}^{2X} \Big | \sum _{\begin{array}{c} x< n d^2 \le x + H \\ d^2 > z \end{array}} \mu (d) - H \sum _{\begin{array}{c} z < d^2 \le 2 X \end{array}} \frac{\mu (d)}{d^2} \Big |^2 dx \ll H^{1/2 - \varepsilon / 8}. \end{aligned}$$

Splitting into dyadic ranges according to the size of d, it essentially suffices to show that, for each \(D \in [z^{1/2}, (2X)^{1/2}]\), we have

$$\begin{aligned} \frac{1}{X} \int _{X}^{2X} \Big | \sum _{\begin{array}{c} x < n d^2 \le x + H \\ d \sim D \end{array}} \mu (d) - H \sum _{d \sim D} \frac{\mu (d)}{d^2} \Big |^2 dx \ll H^{1/2 - \varepsilon / 4}. \end{aligned}$$
(36)

Let

$$\begin{aligned} A(x) := \sum _{\begin{array}{c} n d^2 \le x \\ d \sim D \end{array}} \mu (d) - x \sum _{d \sim D} \frac{\mu (d)}{d^2}. \end{aligned}$$

Using this definition and Lemma 10, we see that the left-hand side of (36) is

$$\begin{aligned} \frac{1}{X} \int _{X}^{2X} |A (x + H) - A(x) |^2 dx \ll \frac{1}{X} \int _{X}^{3X} |A(u ( 1+ \theta )) - A(u)|^2 du \end{aligned}$$
(37)

for some \(\theta \in [\frac{H}{3X}, \frac{3 H}{X}]\). Choose w such that \(e^w = 1 + \theta \), so that \(w \asymp \frac{H}{X}\). By contour integration

$$\begin{aligned} A(e^y) = \frac{1}{2\pi i} \int _{2-i\infty }^{2+i\infty } \frac{e^{y s}}{s} \zeta (s) M(2s) ds - e^y \sum _{d \sim D} \frac{\mu (d)}{d^2}, \end{aligned}$$
(38)

where

$$\begin{aligned} M(s) := \sum _{d \sim D} \frac{\mu (d)}{d^{s}}. \end{aligned}$$

Moving the contour to the line \(\mathfrak {R}s= 1/2\) we notice that the residue from \(s = 1\) cancels with the second term on the right-hand side of (38), and we obtain

$$\begin{aligned} \frac{A(e^{w + x}) - A(e^x)}{e^{x / 2}} = \frac{1}{2\pi } \int _{{\mathbb {R}}} \frac{e^{w (\tfrac{1}{2} + it)} - 1}{\tfrac{1}{2} + it} e^{i t x} \zeta (\tfrac{1}{2} + it) M(1 + 2it) dt. \end{aligned}$$

Therefore, by Plancherel,

$$\begin{aligned} \int _{0}^{\infty } | A(e^{u + w}) - A(e^u) |^2 \cdot \frac{du}{e^u} \ll \int _{{\mathbb {R}}} \Big | \frac{e^{w (\tfrac{1}{2} + it)} - 1}{\tfrac{1}{2} + it} \Big |^2 \cdot |\zeta (\tfrac{1}{2} + it) M(1 + 2it)|^2 dt.\nonumber \\ \end{aligned}$$
(39)

Combining (37) and (39) we get after a change of variable,

$$\begin{aligned} \begin{aligned} \frac{1}{X} \int _{X}^{2X} |A(x + H) - A(x)|^2 dx&\ll X \int _{0}^{\infty } |A( u (1 + \theta ) ) - A(u) |^2 \frac{du}{u^2} \\&\ll X \int _{{\mathbb {R}}} \Big | \frac{e^{w (\tfrac{1}{2} + it)} - 1}{\tfrac{1}{2} + it} \Big |^2 \cdot |\zeta (\tfrac{1}{2} + it) M(1 + 2it)|^2 dt \\&\ll X \int _{{\mathbb {R}}} \min \Bigl \{ \Bigl (\frac{H}{X}\Bigr )^2, \frac{1}{|t|^2}\Bigr \} \cdot |\zeta (\tfrac{1}{2} + it) M(1 + 2it)|^2 dt. \end{aligned}\nonumber \\ \end{aligned}$$
(40)

By Lemma 5 the part with \(|t| \ge X^2\) contributes

$$\begin{aligned} \ll X \int _{X^2}^\infty |t|^{-5/3+\varepsilon } dt = O(1). \end{aligned}$$

On the other hand, the contribution of \(|t|\le X^2\) to the right-hand side of (40) is at most

$$\begin{aligned}&\ll \frac{H^2}{X} \int _{|t| \le 2X/H} |\zeta (\tfrac{1}{2} + it) M(1 + 2it)|^2 dt \nonumber \\&\qquad +\,X \int _{X/H}^{X^2} \frac{1}{T^2} \cdot \frac{1}{T} \int _{T \le |t|\le 2T} |\zeta (\tfrac{1}{2} + it) M(1 + 2it)|^2 dt dT\nonumber \\&\qquad \ll H \Big ( \sup _{X / H \le T \le X^2} \frac{1}{T} \int _{|t| \le T} |\zeta (\tfrac{1}{2} + it) M(1 + 2it)|^2 dt \Big ) + O(1). \end{aligned}$$
(41)

Let us now prove the claim on the assumption of the Lindelöf Hypothesis. Applying Lindelöf and then the mean-value theorem (Lemma 7 with \(q = 1\)), we have for any choice of \(\delta > 0\),

$$\begin{aligned} \begin{aligned} \frac{H}{T} \int _{|t|\le T} |\zeta (1/2+it) M(1+2it)|^2\, dt&\ll \frac{H T^\delta }{T}\int _{|t|\le T}|M(1+2it)|^2\,dt \\&\ll \frac{H T^\delta }{T} (T+D)\cdot \frac{1}{D} \ll \frac{HT^{\delta }}{D} + \frac{HT^\delta }{T}. \end{aligned} \end{aligned}$$

Recall that we have \(D \ge z^{1/2} \ge H^{(1+\varepsilon )/2}\), \(H \le X^{2/3-\varepsilon }\) and \(X/H \le T \le X^2\). Hence the above is

$$\begin{aligned} \ll H^{1/2-\varepsilon /2} T^{\delta } + \frac{H^{2-\delta }}{X^{1-\delta }} \ll H^{1/2 - \varepsilon /4} + \frac{H^{2-\delta }}{H^{(1-\delta )/(2/3-\varepsilon )}} \ll H^{1/2-\varepsilon /4}, \end{aligned}$$

for \(\delta \) sufficiently small. Applying this bound to (41) yields the claim.
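The classical mean value theorem for Dirichlet polynomials behind Lemma 7 predicts \(\int _{-T}^{T} |M(1+2it)|^2\, dt = (2T + O(D)) \sum _{d \sim D} \mu ^2(d)/d^2\). A small-scale Riemann-sum check (a sketch; the parameters \(D = 10\), \(T = 1000\) are illustrative):

```python
import cmath
import math

def mu(n):
    """Mobius function by trial division (fine for small n)."""
    result, m, p = 1, n, 2
    while p * p <= m:
        if m % p == 0:
            m //= p
            if m % p == 0:
                return 0
            result = -result
        p += 1
    return -result if m > 1 else result

D, T = 10, 1000.0
coeffs = {d: mu(d) / d for d in range(D + 1, 2 * D + 1) if mu(d) != 0}

def M_poly(t):
    # M(1 + 2it) = sum_{d ~ D} mu(d) d^{-1 - 2it}
    return sum(c * cmath.exp(-2j * t * math.log(d)) for d, c in coeffs.items())

# Midpoint Riemann sum of |M(1 + 2it)|^2 over [-T, T]
h = 0.02
steps = int(2 * T / h)
integral = h * sum(abs(M_poly(-T + (k + 0.5) * h)) ** 2 for k in range(steps))

diagonal = 2 * T * sum(c * c for c in coeffs.values())
print(integral, diagonal)
```

With \(T\) much larger than \(D\), the off-diagonal terms are negligible and the two printed quantities nearly coincide.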

Let us now prove the unconditional part of the proposition. First notice that, by Cauchy–Schwarz and the fourth moment bound (Lemma 3), the values of t for which \(|M(1+2it)| \le D^{-1/2 + \varepsilon /16}\) contribute \(O(H^{1+\varepsilon /16}D^{-1+\varepsilon /8}) = O(H^{1/2-\varepsilon /4})\) to (41), so their contribution is always acceptable. Writing

$$\begin{aligned} S(V) = \{t \in [-T, T] {:}\,V \le |M(1+2it)| < 2V\}, \end{aligned}$$

by dyadic splitting, it suffices to show that, for each \(V \in [D^{-1/2}, 1]\) and \(T \in [X/H, X^2]\), we have

$$\begin{aligned} \frac{H}{T} V^2 \int _{S(V)} |\zeta (1/2+it)|^2 dt \ll H^{1/2-\varepsilon /3}. \end{aligned}$$

Now by Lemma 1 we have

$$\begin{aligned} |S(V)| \ll (V^{-2} + T\min \{D^{-1}V^{-2}, D^{-2} V^{-6}\}) (\log 2X)^6. \end{aligned}$$
(42)

Consider first the case when the first term dominates here. Then by Lemma 5 we have

$$\begin{aligned} \frac{H}{T} V^2 \int _{S(V)} |\zeta (1/2+it)|^2 dt \ll \frac{H}{T} T^{1/3+\varepsilon /2} \ll \frac{H}{T^{2/3-\varepsilon /2}} \ll \frac{H^{5/3}}{X^{2/3-\varepsilon /2}} \le H^{1/2-\varepsilon /3} \end{aligned}$$

since \(H \le X^{4/7-\varepsilon }\).

Consider now the case that the second term dominates in (42). Then, by Cauchy–Schwarz and the fourth moment estimate (Lemma 3),

$$\begin{aligned}&\frac{H}{T} V^2 \int _{S(V)} |\zeta (1/2+it)|^2 dt \ll \frac{H}{T} V^2 |S(V)|^{1/2} \left( \int _{|t| \le T} |\zeta (\tfrac{1}{2} + it)|^4 dt \right) ^{1/2} \\&\quad \ll H V^2 \min \{D^{-1}V^{-2}, D^{-2} V^{-6}\}^{1/2} (\log 2X)^5 \ll H\min \{D^{-1/2} V, D^{-1} V^{-1}\} (\log 2X)^5 \\&\quad \ll H (D^{-1/2} V)^{1/2} (D^{-1} V^{-1})^{1/2} (\log 2X)^5 \ll H D^{-3/4} (\log 2X)^5 \ll H z^{-3/8} (\log 2X)^5 \ll H^{1/2-\varepsilon /3} \end{aligned}$$

since \(z \ge H^{4/3+\varepsilon }\). This finishes the proof of Proposition 2.\(\square \)

6 The range \((x/q)^{1+\varepsilon } \le z < x^{-\varepsilon } \sqrt{qx}\) in the q-aspect: Proof of Proposition 3

By (23), Proposition 3 follows immediately from the following proposition.

Proposition 6

Let \(\varepsilon \in (0, 1/100)\). Let q be prime with \(x^{1/3 + 30 \varepsilon } \le q \le x^{1 - \varepsilon }\) and let \((x / q)^{1 + \varepsilon } \le z \le x^{-\varepsilon } \sqrt{q x}\). Then

$$\begin{aligned} \frac{1}{\varphi (q)} \sum _{\begin{array}{c} \chi \, (\mathrm {mod}\, q) \\ \chi \ne \chi _0 \end{array}} \Big | \sum _{\begin{array}{c} d^2 \le z \\ n d^2 \le x \end{array}} \mu (d) \chi (d^2) \chi (n) \Big |^2 = C \sqrt{qx} + O((x/q)^{-\varepsilon /16} \sqrt{qx}) \end{aligned}$$
(43)

with C as in (4).

The proof of Proposition 6 is based on two propositions that we now describe. Proposition 7 below will be used to introduce a smoothing into (43). Note that it gives an upper bound that is \(o(\sqrt{qx})\) whenever \(z = o(\sqrt{qx}/(\log x)^{6})\) and the interval I has length \(o(x/(\log x)^{12})\).

Proposition 7

Let q be prime with \(q \le x\) and let \(z \le x\). Let \(I\subset [1,2 x]\) be an interval. Then

$$\begin{aligned} \frac{1}{\varphi (q)} \sum _{\begin{array}{c} \chi \, (\mathrm {mod}\, q) \\ \chi \ne \chi _0 \end{array}} \Big | \sum _{\begin{array}{c} d^2 \le z \\ n d^2 \in I \end{array}}\mu (d) \chi (d^2) \chi (n) \Big |^2 \ll (\log x)^6 \cdot \Big (z + \sqrt{|I| q} \Big ). \end{aligned}$$
(44)

We will use the following proposition to evaluate the smoothed analogue of (43).

Proposition 8

Let \(\varepsilon > 0\) be given. Let f be a smooth function such that f is compactly supported on [0, 1] and \(f(u) = 1\) for \((x/q)^{-\varepsilon /4} \le u \le 1 - (x/q)^{-\varepsilon /4}\) and for each integer \(k \ge 0\), we have \(f^{(k)}(u) \ll (x/q)^{\varepsilon k/4}\). Let \((x / q)^{1 + \varepsilon } \le z \le x^{-\varepsilon } \sqrt{qx}\). Then for \(x^{1/3 + 30 \varepsilon } \le q \le x^{1 - \varepsilon }\),

$$\begin{aligned} \frac{1}{\varphi (q)} \sum _{\begin{array}{c} \chi \, (\mathrm {mod}\, q) \\ \chi \ne \chi _0 \end{array}} \Big | \sum _{\begin{array}{c} n \ge 1 \\ d^2 \le z \end{array}} f \Big ( \frac{n d^2}{x} \Big ) \mu (d) \chi (d^2) \chi (n) \Big |^2 = C \sqrt{q x} + O((x/q)^{-\varepsilon /10} \sqrt{qx}), \end{aligned}$$

where C is as in (4).

One way to construct f satisfying the assumptions of the proposition is to take \(\phi (t)\) to be a smooth function which vanishes for negative t and has \(\phi (t)=1\) for t greater than 1, and then set \(f(u) = \phi ((x/q)^{\varepsilon /4} u) \phi ((x/q)^{\varepsilon /4} (1-u))\).
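One concrete choice for \(\phi \) (a sketch; any smooth step works) is \(\phi (t) = g(t)/(g(t) + g(1-t))\) with \(g(t) = e^{-1/t}\) for \(t > 0\) and \(g(t) = 0\) otherwise:

```python
import math

def g(t):
    return math.exp(-1.0 / t) if t > 0 else 0.0

def phi(t):
    """Smooth step: 0 for t <= 0, 1 for t >= 1, smooth in between."""
    return g(t) / (g(t) + g(1 - t))

K = 20.0  # stand-in for (x/q)^{eps/4}

def f(u):
    # Equals 1 on [1/K, 1 - 1/K] and vanishes outside [0, 1]
    return phi(K * u) * phi(K * (1 - u))

print(f(-0.1), f(0.02), f(0.5), f(1 - 1 / K), f(1.1))
```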

With these two propositions at hand we are ready to prove Proposition 6.

6.1 Proof of Proposition 6.

For \(m \in {\mathbb {N}}\), set

$$\begin{aligned} A_m := \sum _{\begin{array}{c} d^2 \mid m \\ d^2 \le z \end{array}} \mu (d), \end{aligned}$$

and let f be as described below Proposition 8. Then

$$\begin{aligned} \frac{1}{\varphi (q)}\sum _{\begin{array}{c} \chi \, (\mathrm {mod}\, q) \\ \chi \ne \chi _0 \end{array}} \Big | \sum _{\begin{array}{c} n \le x \end{array}} A_n \chi (n)\Big |^2 = S_1 + O(\sqrt{S_1 S_2} + S_2), \end{aligned}$$

where

$$\begin{aligned} S_1 := \frac{1}{\varphi (q)}\sum _{\begin{array}{c} \chi \, (\mathrm {mod}\, q) \\ \chi \ne \chi _0 \end{array}} \Big | \sum _{n} A_n\chi (n) f \Big (\frac{n}{x} \Big ) \Big |^2 \end{aligned}$$

and

$$\begin{aligned} S_2 := \frac{1}{\varphi (q)}\sum _{\begin{array}{c} \chi \, (\mathrm {mod}\, q) \\ \chi \ne \chi _0 \end{array}} \Big | \sum _{\begin{array}{c} n \in I \end{array}} A_n \chi (n) \Big (1 - f \Big (\frac{n}{x} \Big )\Big )\Big |^2 \end{aligned}$$

where \(I = I_1 \cup I_2\) with \(I_1 := [1, x\cdot (x/q)^{ - \varepsilon /4}]\) and \(I_2 := [x - x\cdot (x/q)^{ - \varepsilon /4}, x]\).

For \(i=1,2\), define

$$\begin{aligned} B_i(\chi ;t) = \sum _{\begin{array}{c} n \in I_i \\ n < t \end{array}} A_n \chi (n). \end{aligned}$$

By partial summation,

$$\begin{aligned} \sum _{n \in I_2} A_n \chi (n) \Big (1 - f \Big ( \frac{n}{x} \Big ) \Big )&= \int _{I_2} \Big ( 1 - f \Big ( \frac{t}{x} \Big )\Big ) d B_2(\chi ; t) \\&= \frac{1}{x} \int _{I_2} f' \Big ( \frac{t}{x} \Big ) B_2(\chi ; t) dt + B_2(\chi ; x). \end{aligned}$$

Hence,

$$\begin{aligned}&\frac{1}{\varphi (q)} \sum _{\chi \ne \chi _0} \Big | \sum _{n \in I_2} A_n \chi (n) \Big (1 - f \Big ( \frac{n}{x} \Big ) \Big ) \Big |^2 \nonumber \\&\quad \ll \frac{1}{\varphi (q)} \sum _{\chi \ne \chi _0} \Big | \frac{1}{x} \int _{I_2} f' \Big ( \frac{t}{x} \Big ) B_2(\chi ; t) dt \Big |^2 + \frac{1}{\varphi (q)} \sum _{\chi \ne \chi _0} |B_2(\chi ; x)|^2 \nonumber \\&\qquad \le (x/q)^{\varepsilon / 4} \cdot \frac{1}{\varphi (q)} \sum _{\chi \ne \chi _{0}} \frac{1}{x} \int _{I_2} |B_2(\chi ; t)|^2 dt + \frac{1}{\varphi (q)} \sum _{\chi \ne \chi _0} |B_2(\chi ; x)|^2. \end{aligned}$$
(45)

Now by Proposition 7 we have, for \(t \in I_2\),

$$\begin{aligned} \frac{1}{\varphi (q)} \sum _{\chi \ne \chi _{0}} |B_{2}(\chi ; t)|^2 \ll (\log x)^6 \cdot \Big ( x^{-\varepsilon } \sqrt{q x} + \sqrt{(t - (x - x\cdot (x/q)^{ - \varepsilon /4})) \cdot q} \Big ). \end{aligned}$$

Therefore (45) is

$$\begin{aligned}&\ll (\log x)^{6} \cdot (x/q)^{\varepsilon / 4} \cdot \frac{1}{x} \cdot \Big ( x\cdot (x/q)^{ - \varepsilon /4} \cdot x^{-\varepsilon } \sqrt{q x} + x^{3/2} (x/q)^{- 3\varepsilon / 8} \sqrt{q} \Big )\\&\quad + (\log x)^6 (x/q)^{-\varepsilon /8} \sqrt{qx} \ll (\log x)^6 (x/q)^{- \varepsilon / 8} \cdot \sqrt{q x}. \end{aligned}$$

A similar argument shows that

$$\begin{aligned} \frac{1}{\varphi (q)} \sum _{\chi \ne \chi _0} \Big | \sum _{n \in I_1} A_n \chi (n) \Big (1 - f \Big ( \frac{n}{x} \Big ) \Big ) \Big |^2 \ll (\log x)^6 (x/q)^{- \varepsilon / 8} \cdot \sqrt{q x}. \end{aligned}$$

as well. Since \(|a+b|^2 \ll |a|^2+|b|^2\), we conclude that

$$\begin{aligned} S_2 \ll (\log x)^6 (x/q)^{-\varepsilon / 8} \sqrt{qx} \end{aligned}$$

as needed. On the other hand, we can compute \(S_1\) using Proposition 8, and this yields the claimed estimate. \(\square \)

6.2 Proof of Proposition 7.

By Pólya’s formula (see [MV77, Lemma 1]) for \(I = [a,b]\) and any \(\chi \ne \chi _0\) of modulus q,

$$\begin{aligned} \sum _{n \in I / d^2} \chi (n) = \frac{\tau (\chi )}{2\pi i} \sum _{1 \le |n| \le q} {\overline{\chi }}(n) f_{I / d^2}(n) + O(\log q), \end{aligned}$$

where

$$\begin{aligned} f_{I / d^2}(n) = \frac{1}{n} \cdot \Big ( e \Big ( \frac{n a}{d^2 q} \Big ) - e \Big ( \frac{n b}{d^2 q } \Big ) \Big ) \ll g_{I / d^2}(n) := {\left\{ \begin{array}{ll} \frac{|I|}{d^2 q} &{}\quad \text { if } |n| \le \frac{d^2 q}{|I|}, \\ \frac{1}{|n|} &{}\quad \text { otherwise.} \end{array}\right. } \end{aligned}$$

We split d and n into dyadic intervals and bound the left-hand side of (44) by

$$\begin{aligned} (\log x)^2 \sup _{\begin{array}{c} D \le z^{1/2} \\ 1 \le N \le q \end{array}} \sum _{\chi \, (\mathrm {mod}\, q)} \Big | \sum _{d \sim D} \mu (d) \chi (d^2) \sum _{n \sim N} {\overline{\chi }}(n) f_{I / d^2} (n) \Big |^2 + O(z (\log q)^2). \end{aligned}$$
(46)

The error term is clearly acceptable. We bound the main term of (46) using a majorant principle: by going through the first equality in (24), we may replace the coefficients \(\mu (d)\) and \(f_{I/d^2}(n)\) by their majorants. Hence we get the bound

$$\begin{aligned} \ll (\log x)^2 \sup _{\begin{array}{c} D \le z^{1/2} \\ 1 \le N \le q \end{array}} \sum _{\chi } \Big | \sum _{d} \chi ^2(d) V \Big ( \frac{d}{D} \Big ) \cdot \sum _{n} \chi (n) V(n/N) g_{I / D^2} (N) \Big |^2 \end{aligned}$$

with V a smooth function supported on [1/2, 4].

The contribution of the principal character and the quadratic character is \(\ll z (\log x)^4\), which is acceptable. To the remaining non-principal and non-quadratic characters we apply Cauchy–Schwarz, giving the upper bound

$$\begin{aligned} \ll (\log x)^2 \sup _{\begin{array}{c} D \le z^{1/2} \\ 1 \le N \le q \end{array}} g_{I/D^2}(N)^2 \Big ( \sum _{\begin{array}{c} \chi ^2 \ne \chi _0 \end{array}} \Big |&\sum _{n} \chi ^2(n) V \Big ( \frac{n}{D}\Big ) \Big |^4 \Big )^{1/2} \Big ( \sum _{\chi \ne \chi _0} \Big | \sum _{n} \chi (n) V \Big ( \frac{n}{N}\Big ) \Big |^4 \Big )^{1/2}. \end{aligned}$$
(47)

We claim that

$$\begin{aligned}&\sum _{\begin{array}{c} \chi ^2 \ne \chi _0 \end{array}} \Big | \sum _{n} \chi ^2(n) V \Big ( \frac{n}{D} \Big ) \Big |^4 \ll q D^2 \cdot (\log x)^4 \nonumber \\&\quad \text { and }\sum _{\begin{array}{c} \chi \ne \chi _0 \end{array}} \Big | \sum _{n} \chi (n) V \Big ( \frac{n}{N} \Big ) \Big |^4 \ll q N^2 \cdot (\log x)^4. \end{aligned}$$
(48)

We explain the second bound in (48); the first bound is similar. Let \({\widetilde{V}}\) be the Mellin transform of V. Using contour integration, the decay of \({\widetilde{V}}\), and Hölder, we get, for every \(A \ge 1\),

$$\begin{aligned} \sum _{\chi \ne \chi _0} \Big | \sum _{n} \chi (n) V \Big ( \frac{n}{N} \Big ) \Big |^4&=\sum _{\chi \ne \chi _0} \Big | \int _{{\mathbb {R}}} L(1/2+it,\chi ){\widetilde{V}}(1/2+it) N^{1/2+it}\,dt \Big |^4\\&\ll _A N^2 \sum _{\chi \ne \chi _0} \Big ( \int _{{\mathbb {R}}} \Big |L(1/2+it,\chi )\Big | (1+|t|)^{-A} dt \Big )^4 \\&\ll _A N^2 \sum _{\chi \ne \chi _0} \int _{{\mathbb {R}}} \Big |L(1/2+it,\chi )\Big |^4 (1+|t|)^{-A} \,dt. \end{aligned}$$

A dyadic decomposition of the integration range and the fourth moment bound for Dirichlet L-functions (Lemma 4) yield the second part of (48).

Using (48) in (47), we obtain an upper bound

$$\begin{aligned} \begin{aligned}&\ll (\log x)^6 \sup _{\begin{array}{c} D \le z^{1/2} \\ 1 \le N \le q \end{array}} g_{I/D^2}(N)^2 q DN \\&\ll (\log x)^6 \sup _{\begin{array}{c} D \le \sqrt{|I|/q} \\ 1 \le N \le q \end{array}} g_{I/D^2}(N)^2 q DN + (\log x)^6 \sup _{\begin{array}{c} \sqrt{|I|/q} < D \le z^{1/2} \\ 1 \le N \le q \end{array}} g_{I/D^2}(N)^2 q DN. \end{aligned} \end{aligned}$$

Recalling the definition of \(g_{I/D^2}(N)\), we see that on the last line, the first N-supremum is attained for \(N=1\) and the second N-supremum is attained for \(N = D^2q/|I|\), and we get the bound

$$\begin{aligned} \ll (\log x)^6 \sup _{\begin{array}{c} D \le \sqrt{|I|/q} \end{array}} q D + (\log x)^6 \sup _{\begin{array}{c} \sqrt{|I|/q} < D \le z^{1/2} \end{array}} |I|/D \ll (\log x)^6 \sqrt{|I| q} \end{aligned}$$

and the claim follows. \(\square \)

6.3 Proof of Proposition 8.

We apply Poisson summation (see e.g. [IK04, formula (4.26)]) to the sum over n, getting

$$\begin{aligned} \sum _{n} \chi (n) f \Big ( \frac{n d^2}{x} \Big ) = \tau (\chi ) \cdot \frac{x }{q d^2} \sum _{\ell } {\overline{\chi }}(\ell ) {\hat{f}} \Big ( \frac{x \ell }{d^2 q} \Big ). \end{aligned}$$
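This twisted Poisson summation formula is equivalent to \(\sum _n \chi (n) g(n) = (\tau (\chi )/q) \sum _{\ell } {\overline{\chi }}(\ell ) {\hat{g}}(\ell /q)\) for primitive \(\chi \, (\mathrm {mod}\, q)\), with \({\hat{g}}(\xi ) = \int g(t) e^{-2\pi i \xi t}\, dt\). As a numerical sanity check of this identity (the character mod 5 and the Gaussian test function below are our own illustrative choices, not taken from the argument):

```python
import cmath
import math

q = 5
# Illustrative primitive character mod 5: chi(2^j mod 5) = i^j, chi(n) = 0 for 5 | n.
chi = {0: 0, 1: 1, 2: 1j, 3: -1j, 4: -1}

def e(x):
    # e(x) = exp(2 pi i x), the standard additive character
    return cmath.exp(2j * math.pi * x)

tau = sum(chi[a] * e(a / q) for a in range(1, q))  # Gauss sum tau(chi)

X = 7.3  # arbitrary scale for the Gaussian test function
def g(t):
    return math.exp(-math.pi * t * t / X ** 2)
def ghat(xi):
    # Fourier transform of g with the convention ghat(xi) = int g(t) e(-xi t) dt
    return X * math.exp(-math.pi * X ** 2 * xi * xi)

L = 200  # both sums are negligible beyond this range
lhs = sum(chi[n % q] * g(n) for n in range(-L, L + 1))
rhs = (tau / q) * sum(chi[l % q].conjugate() * ghat(l / q) for l in range(-L, L + 1))
assert abs(lhs - rhs) < 1e-9
```

The check also confirms \(|\tau (\chi )| = \sqrt{q}\) for this primitive character, which is the source of the \(\sqrt{q}\) savings in such applications.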

Therefore we have to asymptotically estimate

$$\begin{aligned} \frac{q}{\varphi (q)}&\cdot \frac{x^2}{q^2} \sum _{\chi \ne \chi _{0}} \Big | \sum _{\begin{array}{c} d^2 \le z \\ \ell \in {\mathbb {Z}} \end{array}} \frac{\mu (d)}{d^2} \chi (d^2) {\overline{\chi }}(\ell ) {\hat{f}} \Big ( \frac{x \ell }{d^2 q} \Big ) \Big |^2 \end{aligned}$$
(49)
$$\begin{aligned}&= \frac{x^2}{q} \sum _{\begin{array}{c} n_1, n_2 \in {\mathbb {Z}} \\ d_1^2, d_2^2 \le z \\ d_1^2 n_1\, \equiv \, d_2^2 n_2 \, (\mathrm {mod}\, q) \\ (d_1 d_2 n_1 n_2, q) = 1 \end{array}} \frac{\mu (d_1)}{d_1^2} \frac{\mu (d_2)}{d_2^2} \cdot {\hat{f}} \Big ( \frac{x n_2}{d_1^2 q} \Big ) \overline{{\hat{f}} \Big ( \frac{x n_1}{d_2^2 q} \Big )} + O ( z x^{2\varepsilon /3} ), \end{aligned}$$
(50)

and where \(O(z x^{2\varepsilon /3})\) comes from the principal character and from replacing \(\varphi (q)\) by q. We note that since \(z \le x^{-\varepsilon } \sqrt{qx}\) this contribution is acceptable. Notice that we can add and remove the restrictions \(d_1, d_2 > x^{1/2 - \varepsilon /6} / \sqrt{q}\) and \(|n_1|, |n_2| \le x^{\varepsilon /3} \cdot z q / x\) at will because they cost us a negligible error term that is \(\ll _{A} x^{-A}\) for any given \(A > 0\). Moreover note that \(n_1\) and \(n_2\) now traverse all of \({\mathbb {Z}}\).

We now separate the set of tuples \((n_1, n_2)\) into

$$\begin{aligned} {\mathcal {M}} := \left\{ (k_1^2 m, k_2^2 m){:}\,m \in {\mathbb {Z}} \text { squarefree}, k_1, k_2 \in {\mathbb {N}} \right\} \end{aligned}$$

and the complement. The pairs \((n_1, n_2) \in {\mathcal {M}}\) contribute a main term that is relatively easy to compute. On the other hand, we will bound the contribution of \((n_1, n_2) \not \in {\mathcal {M}}\).
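Membership in \({\mathcal {M}}\) amounts to \(n_1\) and \(n_2\) sharing the same squarefree part (with sign). A minimal sketch of this test (illustrative code, not part of the argument):

```python
def core(n):
    # Squarefree part of n (with sign): the unique squarefree m with n = k^2 * m.
    assert n != 0
    m = -1 if n < 0 else 1
    n = abs(n)
    d = 2
    while d * d <= n:
        e = 0
        while n % d == 0:
            n //= d
            e += 1
        if e % 2:
            m *= d
        d += 1
    return m * n  # any leftover prime factor of n occurs to the first power

def in_M(n1, n2):
    # (n1, n2) = (k1^2 m, k2^2 m) for some squarefree m iff the cores agree
    return n1 != 0 and n2 != 0 and core(n1) == core(n2)

assert in_M(8, 18)       # 8 = 2^2 * 2 and 18 = 3^2 * 2 share the core m = 2
assert not in_M(8, 12)   # cores 2 and 3 differ
assert in_M(-4, -25)     # m = -1, k1 = 2, k2 = 5
```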

6.3.1 The main term \((n_1, n_2) \in {\mathcal {M}}\).

The conditions \(d_1^2 n_1 \equiv d_2^2 n_2 \, (\mathrm {mod}\, q)\) and \((n_1 n_2, q) = 1\) in the sum in (50) imply that if \((n_1, n_2) \in {\mathcal {M}}\) then \(d_1^2 k_1^2 \,\equiv \, d_2^2 k_2^2 \, (\mathrm {mod}\, q)\) and therefore \(d_1 k_1 \,\equiv \, \pm d_2 k_2 \, (\mathrm {mod}\, q)\). This implies that \(d_1 k_1 = d_2 k_2\) since \(d_j k_j \le \sqrt{z} \cdot \sqrt{x^{\varepsilon /3} z q / x} = x^{\varepsilon /6} z \cdot \sqrt{q / x}\) and this is \(\le q/3\) because \(z \le x^{-\varepsilon } \sqrt{qx}\). We conclude that the contribution of \((n_1, n_2) \in {\mathcal {M}}\) is given by

$$\begin{aligned} \frac{x^2}{q} \sum _{k_1, k_2} \sum _{\begin{array}{c} \begin{array}{c} d_1 k_1 = d_2 k_2 \\ d_1^2, d_2^2 \le z \\ (d_1 d_2 k_1 k_2, q) = 1 \end{array} \end{array}} \frac{\mu (d_1) \mu (d_2)}{d_1^2 \cdot d_2^2} \sum _{(m,q)=1} \mu ^2(m) {\hat{f}} \Big ( \frac{x k_2^2 m}{d_1^2 q} \Big ) \overline{{\hat{f}} \Big ( \frac{x k_1^2 m}{d_2^2 q} \Big )}. \end{aligned}$$
(51)

We now parametrize the solutions of the equation \(d_1 k_1 = d_2 k_2\) by dividing both sides by \((d_1, d_2)\), so that

$$\begin{aligned} k_1 = \frac{d_2 \ell }{(d_1, d_2)} \text { and } k_2 = \frac{d_1 \ell }{(d_1, d_2)} \quad \text {with } \ell \in {\mathbb {N}}. \end{aligned}$$
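That this parametrization captures exactly the solutions of \(d_1 k_1 = d_2 k_2\) can be checked by brute force on small instances (illustrative code; the instances and ranges are arbitrary):

```python
from math import gcd

def solutions_direct(d1, d2, K):
    # All (k1, k2) with 1 <= k1, k2 <= K and d1 * k1 = d2 * k2, by exhaustion.
    return {(k1, k2) for k1 in range(1, K + 1) for k2 in range(1, K + 1)
            if d1 * k1 == d2 * k2}

def solutions_param(d1, d2, K):
    # The parametrization k1 = d2 l / (d1, d2), k2 = d1 l / (d1, d2), l = 1, 2, ...
    g = gcd(d1, d2)
    sols, l = set(), 1
    while d2 // g * l <= K and d1 // g * l <= K:
        sols.add((d2 // g * l, d1 // g * l))
        l += 1
    return sols

for d1, d2 in [(6, 10), (4, 9), (12, 18)]:
    assert solutions_direct(d1, d2, 60) == solutions_param(d1, d2, 60)
```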

Plugging this in and noticing that each positive integer can be written uniquely as \(\ell ^2 m\) with m squarefree, we can rewrite (51) as

$$\begin{aligned} \frac{2 x^2}{q} \sum _{\begin{array}{c} d_1^2, d_2^2 \le z \\ (d_1 d_2, q) = 1 \end{array}} \frac{\mu (d_1) \mu (d_2)}{d_1^2 \cdot d_2^2} \sum _{\begin{array}{c} \ell \ge 1 \\ (\ell ,q) = 1 \end{array}} \Big | {\hat{f}} \Big ( \frac{x \ell }{q (d_1^2, d_2^2)} \Big ) \Big |^2. \end{aligned}$$

Note that we can drop the condition \((d_1d_2,q) = 1\) as q is prime and \(d_1, d_2 < q\). Likewise, since the terms with \(\ell \ge q\) contribute \(O_A(x^{-A})\), we can drop the condition \((\ell , q) = 1\) and apply Lemma 8 with \(W = {\hat{f}}\) and \(H = x/q\) to see that the above is

$$\begin{aligned} C \sqrt{qx} \cdot \pi \int _0^\infty |{\hat{f}}(y)|^2\sqrt{y}\,dy + O((x/q)^{-\varepsilon /8} \sqrt{xq}). \end{aligned}$$

Let \(F(u) = {\mathbf {1}}_{[0,1]}(u)\). We have,

$$\begin{aligned} {\hat{f}}(y) - {\hat{F}}(y) \ll \min \{(x/q)^{-\varepsilon /4}, |y|^{-1}\}, \end{aligned}$$

with the bound \((x/q)^{-\varepsilon /4}\) for the difference between these two Fourier transforms following from the fact that \(\Vert f - F\Vert _{L^1} \ll (x/q)^{-\varepsilon /4}\), and the bound 1/|y| following from the fact that the total variation of the function \(f-F\) is bounded by an absolute constant. Likewise
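Both mechanisms can be illustrated numerically with a hypothetical choice of f (a piecewise-linear ramp approximating F with smoothing width w, standing in for the unspecified smooth f of the text): the \(L^1\) bound and the total-variation bound \(1/|y|\) then hold with room to spare.

```python
import cmath
import math

w = 0.1  # hypothetical smoothing width; the paper's f is left unspecified

def F(t):
    # sharp cutoff F = 1_{[0,1]}
    return 1.0 if 0 <= t <= 1 else 0.0

def f(t):
    # continuous ramp approximation of F with ||f - F||_{L^1} = w
    if t < 0 or t > 1:
        return 0.0
    if t < w:
        return t / w
    if t > 1 - w:
        return (1 - t) / w
    return 1.0

def transform(g, y, n=20000):
    # Riemann-sum approximation of int_0^1 g(t) e^{-2 pi i y t} dt
    h = 1.0 / n
    return h * sum(g((k + 0.5) * h) * cmath.exp(-2j * math.pi * y * (k + 0.5) * h)
                   for k in range(n))

for y in [0.3, 1.0, 4.0, 12.5]:
    diff = abs(transform(f, y) - transform(F, y))
    assert diff <= w + 1e-4           # the L^1 bound ||f - F||_{L^1} = w
    assert diff <= 1.0 / y + 1e-4     # the total-variation bound 1/|y|
```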

$$\begin{aligned} {\hat{f}}(y)\ll (1+|y|)^{-1} \quad \text {and} \quad {\hat{F}}(y) \ll (1+|y|)^{-1}. \end{aligned}$$

Hence

$$\begin{aligned}&\int _0^\infty |{\hat{f}}(y)|^2 \sqrt{y}\, dy - \int _0^\infty |{\hat{F}}(y)|^2\sqrt{y}\,dy \\&\quad \ll \int _0^\infty \min \{(x/q)^{-\varepsilon /4}, y^{-1}\} (1+y)^{-1} \sqrt{y}\, dy \ll (x/q)^{-\varepsilon /8}. \end{aligned}$$

Putting these estimates together, and using the relation \(|{\hat{F}}(\xi )| = |S(\xi )|\) and the integral identity (21), we see that (51) is

$$\begin{aligned} C\sqrt{qx} + O((x/q)^{-\varepsilon /8} \sqrt{xq}). \end{aligned}$$

6.3.2 The off-diagonal \((n_1, n_2) \not \in {\mathcal {M}}\).

Let us focus on bounding the contribution of \((n_1, n_2) \not \in {\mathcal {M}}\). We recall that the contribution of \(d_1 \le x^{1/2 - \varepsilon /6}/q^{1/2}\) to (49) is negligible and likewise the contribution of \(d_2 \le x^{1/2 - \varepsilon /6}/q^{1/2}\) is negligible. We now partition \(d_1, d_2\) into intervals \([D_1, 2D_1]\) and \([D_2, 2 D_2]\) with \(x^{1/2 - \varepsilon /6}/q^{1/2} \le D_1, D_2 \le \sqrt{z}\). The total contribution of \((n_1, n_2) \not \in {\mathcal {M}}\) with \(d_1 \in [D_1, 2 D_1]\) and \(d_2 \in [D_2, 2 D_2]\) to (49) is bounded by

$$\begin{aligned} \frac{x^2}{q} \cdot \frac{1}{D_1^2 D_2^2} \sum _{\begin{array}{c} (n_1, n_2) \not \in {\mathcal {M}} \end{array}} V \Big ( \frac{n_1}{N_1} \Big ) V \Big ( \frac{n_2}{N_2} \Big ) \sum _{\begin{array}{c} d_1^2 n_1 \,\equiv \, d_2^2 n_2 \, (\mathrm {mod}\, q) \\ (d_1 d_2 n_1 n_2, q) = 1 \end{array}} V \Big ( \frac{d_1}{D_1} \Big ) V \Big ( \frac{d_2}{D_2} \Big ) \end{aligned}$$
(52)

with V a smooth non-negative compactly supported function such that \(V(x) \ge 1\) for \(x \in [-2, 2]\), where \(D_1, D_2 > x^{1/2 - \varepsilon /6}/q^{1/2}\), \(N_1 \le x^{\varepsilon /3} D_2^2 q / x\), and \(N_2 \le x^{\varepsilon /3} D_1^2 q / x\).

We now split into two cases according to the size of \(D_1D_2\):

6.3.3 Case \(D_1 D_2 \ge x^{1 + 2\varepsilon } / q\).

In this case we do not use the condition \((n_1, n_2) \not \in {\mathcal {M}}\). Dropping this condition and using Dirichlet characters, we can rewrite (52) as

$$\begin{aligned}&\frac{x^2}{q \varphi (q)} \frac{1}{D_1^2 D_2^2} \sum _{\chi ^2 \ne \chi _0} \Big ( \sum _{n_1} \chi (n_1) V \Big ( \frac{n_1}{N_1} \Big ) \Big ) \Big ( \sum _{n_2} {\overline{\chi }}(n_2) V \Big ( \frac{n_2}{N_2} \Big ) \Big ) \nonumber \\&\qquad \times \Big ( \sum _{d_1} \chi ^2 (d_1) V \Big ( \frac{d_1}{D_1} \Big ) \Big ) \Big ( \sum _{d_2} {\overline{\chi }}^2(d_2) V \Big ( \frac{d_2}{D_2} \Big ) \Big ) + O \Big ( \frac{x^2}{q^2} \cdot \frac{N_1 N_2}{D_1 D_2} \Big ) \end{aligned}$$
(53)

where the \(O(\cdot )\) term corresponds to the contribution of the characters with \(\chi ^2 = \chi _0\). Note that this contribution is acceptable since

$$\begin{aligned} \frac{x^2}{q^2} \cdot \frac{N_1 N_2}{D_1 D_2} \ll x^{2\varepsilon /3} D_1 D_2 \ll x^{2\varepsilon /3} z \ll x^{-\varepsilon /3} \sqrt{qx}. \end{aligned}$$

Now we express each of the sums in (53) as a contour integral; together with Hölder’s inequality this allows us to bound (53) by

$$\begin{aligned} \frac{x^2}{q^2} \cdot \frac{\sqrt{N_1 N_2 D_1 D_2}}{D_1^2 D_2^2} \sum _{\chi } \int _{|u| \le x^{\varepsilon /3}} |L(\tfrac{1}{2} + i u, \chi )|^4 du + x^{-\varepsilon /3} \sqrt{qx}. \end{aligned}$$

By the fourth moment bound (Lemma 4) the first term is

$$\begin{aligned} \ll \frac{x^2}{q^2} \cdot \frac{\sqrt{N_1 N_2 D_1 D_2}}{D_1^2 D_2^2} q x^{\varepsilon /2} \ll x^{5\varepsilon /6} \frac{x}{\sqrt{D_1 D_2}} \ll x^{-\varepsilon /6} \sqrt{qx} \end{aligned}$$

since \(D_1 D_2 \ge x^{1 + 2\varepsilon } / q\).

6.3.4 Case \(D_1 D_2 < x^{1 + 2\varepsilon } / q\).

In this case we notice that since \(D_1, D_2 > x^{1/2 - \varepsilon /6} / \sqrt{q}\) we have \(D_1, D_2 \le (x / q)^{1/2} x^{3\varepsilon }\) and in particular \(N_1, N_2 \ll x^{7 \varepsilon }\). We notice that if \((n_1, n_2) \not \in {\mathcal {M}}\) and \(n_1 d_1^2 \,\equiv \, n_2 d_2^2 \, (\mathrm {mod}\, q)\) then \(n_1 d_1^2 = n_2 d_2^2 + q \ell \) with \(0 < |\ell | \ll x^{1+13\varepsilon }/q^2\). We now fix \(n_1, n_2, \ell \); there are \(\ll x^{1 + 27\varepsilon } / q^2\) possible choices. We shall show that the number of solutions in \(|d_1|, |d_2| \ll (x / q)^{1/2} x^{3 \varepsilon }\) to \(n_1 d_1^2 - n_2 d_2^2 = q \ell \) is bounded by \(\ll x^{9\varepsilon }\), which will be sufficient.
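For a small hypothetical instance of \(n_1 d_1^2 - n_2 d_2^2 = N\) (the values below are our own, chosen only for illustration), a brute-force count shows how sparse the solutions are inside a large box, in line with the \(x^{9\varepsilon }\) bound proved below:

```python
# Hypothetical small instance of n1*d1^2 - n2*d2^2 = N (values chosen for illustration)
n1, n2, N, B = 1, 2, 7, 100

count = sum(1 for d1 in range(-B, B + 1) for d2 in range(-B, B + 1)
            if n1 * d1 * d1 - n2 * d2 * d2 == N)

# The box contains 201^2 = 40401 lattice points, yet only a handful of solutions.
assert 0 < count <= 40
```

The handful of solutions found (e.g. \((d_1, d_2) = (3, 1)\) and \((5, 3)\), together with sign changes and Pell-unit translates) reflects the orbit structure described next.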

First of all note that we can assume that \((n_1, n_2, q \ell ) = 1\). Indeed, q cannot divide \(n_1 n_2\) as \(n_1 n_2 =o(q)\), and so letting \(g=(n_1,n_2, q\ell )\) we have \(g \mid \ell \) and the problem reduces to one where \((n_1,n_2,\ell )\) is replaced with \((n_1',n_2',\ell ') = (n_1,n_2,\ell )/g\) and now \((n_1',n_2',q\ell ')=1\).

Notice that \(f(x_1, y_1) = n_1 x_1^2 - n_2 y_1^2\) is a primitive binary quadratic form with discriminant \(d = 4 n_1 n_2 > 0\). Denote by \(\varepsilon _{n_1 n_2}\) the real number \(x_0/2 + y_0 \sqrt{n_1 n_2}\) where \((x_0, y_0)\) is the solution in positive integers to the equation \(x_0^2 - 4 n_1 n_2 y_0^2 = 4\) for which \(x_0+y_0\sqrt{d}\) is least. Note that \(\varepsilon _{n_1 n_2} \ge 3/2\).
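For illustration, the fundamental solution \((x_0, y_0)\) and the unit \(\varepsilon _D = x_0/2 + y_0 \sqrt{D}\) (here \(D = n_1 n_2\), assumed not a perfect square) can be found by brute force for tiny D, confirming the lower bound \(\varepsilon _{n_1 n_2} \ge 3/2\) used above:

```python
import math

def fundamental_epsilon(D):
    # Smallest x0/2 + y0*sqrt(D) with x0, y0 > 0 and x0^2 - 4*D*y0^2 = 4,
    # found by brute force over y0 (fine for the tiny non-square D tried here).
    y0 = 1
    while True:
        x0sq = 4 + 4 * D * y0 * y0
        x0 = math.isqrt(x0sq)
        if x0 * x0 == x0sq:
            return x0 / 2 + y0 * math.sqrt(D)
        y0 += 1

for D in [2, 3, 5, 6, 7, 10]:
    assert fundamental_epsilon(D) >= 1.5  # the lower bound used in the text

# e.g. D = 2: (x0, y0) = (6, 2) and epsilon = 3 + 2*sqrt(2)
assert abs(fundamental_epsilon(2) - (3 + 2 * math.sqrt(2))) < 1e-9
```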

Let \((x_1,y_1)\) be a solution to \(f(x_1,y_1) = q \ell \) with \(x_1, y_1 \ll (x / q)^{1/2} x^{3\varepsilon }\). We notice that in this situation

$$\begin{aligned} (x_1,y_1) \in \bigcup _{\begin{array}{c} 1 \le m \le \log x \end{array}} T_m^{+} \cup T_m^{-} \end{aligned}$$

where

$$\begin{aligned} T_{m}^{+} = \Big \{ (x,y) \in {\mathbb {Z}}^2{:}\,f(x,y) = q \ell \text { and } \sqrt{n_1} x > \sqrt{n_2} y \text { and } \varepsilon _{n_1 n_2}^{2m - 2} \le \Big | \frac{\sqrt{n_1} x + \sqrt{n_2} y}{\sqrt{n_1} x - \sqrt{n_2} y} \Big | < \varepsilon _{n_1 n_2}^{2m} \Big \} \end{aligned}$$

and

$$\begin{aligned} \begin{aligned} T_{m}^{-}&= \Big \{ (x,y) \in {\mathbb {Z}}^2{:}\,f(x,y) = q \ell \text { and } \sqrt{n_1} x< \sqrt{n_2} y \text { and } \varepsilon _{n_1 n_2}^{2m - 2} \le \Big | \frac{\sqrt{n_1} x + \sqrt{n_2} y}{\sqrt{n_1} x - \sqrt{n_2} y} \Big | < \varepsilon _{n_1 n_2}^{2m} \Big \} \\&= \{(x, y) \in {\mathbb {Z}}^2 {:}\,(-x, -y) \in T_m^+\}. \end{aligned} \end{aligned}$$

The reason for this is that \(\sqrt{n_1} x_1 + \sqrt{n_2} y_1 \ll x^{7\varepsilon } (x/q)^{1/2}\) and

$$\begin{aligned} | \sqrt{n_1} x_1 - \sqrt{n_2} y_1 | = \frac{q \ell }{\sqrt{n_1} x_1 + \sqrt{n_2} y_1} \gg \frac{q^{3/2}}{x^{1/2+7\varepsilon }} \gg 1. \end{aligned}$$

Moreover by Lemma 13 of [MW02] we have \(\# T_{m}^{+} = \# T_{1}^{+}\) for all \(m \ge 1\), and trivially \(\# T_{m}^{-} = \# T_{m}^{+}\) for all \(m \ge 1\).

The solutions belonging to \(T_{1}^{+}\) are primary for the quadratic form \(f(x, y) = n_1 x^2 - n_2 y^2\) of discriminant \(4 n_1 n_2\) (see p. 101 of [SW06] for the definition of primary). By Theorem 4.1 of [SW06] the number of \((x_1, y_1)\) for which there exists a quadratic form g of discriminant \(4 n_1 n_2\) such that \(g(x_1,y_1) = q \ell \) and such that \((x_1, y_1)\) is primary for g, is either 0 or given by

$$\begin{aligned} m \prod _{p \mid m} \left( 1-\frac{1}{p}\left( \frac{4n_1 n_2/m^2}{p} \right) \right) \cdot \sum _{k \mid \frac{q\ell }{m^2} } \left( \frac{d_0}{k}\right) , \end{aligned}$$

for particular integers m and \(d_0\) with \(m^2 \mid (q\ell ,4n_1 n_2)\). Using the divisor bound \(\# \{ k{:}\,k \mid n\} \ll _{\varepsilon } n^{\varepsilon /100}\), we find that this is

$$\begin{aligned} \ll (n_1 n_2)^{1/2 + \varepsilon /100} (q \ell )^{\varepsilon /100} \ll x^{8\varepsilon }. \end{aligned}$$

We conclude that \(\# T_{1}^{+} \ll x^{8\varepsilon }\), and therefore the number of solutions \((x_1,y_1)\) with \(|x_1|, |y_1| \ll ( x/q)^{1/2} x^{3\varepsilon }\) to the equation \(f(x_1, y_1) = q \ell \) is bounded by \(\ll \log x\cdot \# T_{1}^{+} \ll x^{9\varepsilon }\), as claimed. It follows that the total number of solutions to \(n_1 d_1^2 - n_2 d_2^2 = q \ell \) with \(n_i \sim N_i\), \(d_i \sim D_i\) for \(i = 1,2\) is \(\ll x^{1 + 36\varepsilon } / q^2\).

We conclude therefore that (52) is

$$\begin{aligned} \ll \frac{x^2}{q} \cdot \frac{1}{D_1^2 D_2^2} \cdot \frac{x^{1 + 36\varepsilon }}{q^2} \ll \frac{x^{1 + 40 \varepsilon }}{q} \ll x^{-\varepsilon } \sqrt{ qx} \end{aligned}$$

since \(q > x^{1/3 + 30\varepsilon }\). \(\square \)

7 The range \(z \ge (x / q)^{4/3 + \varepsilon }\) in the q-aspect: Proof of Proposition 4

Splitting into dyadic segments and recalling (23), we can bound the left-hand side of (15) by a constant times

$$\begin{aligned} \log x \sup _{\sqrt{z} \le D \le \sqrt{x}} \frac{1}{\varphi (q)^2} \sum _{\chi \ne \chi _0} \Big | \sum _{\begin{array}{c} n d^2 \le x \\ d \sim D \end{array}} \mu (d) \chi (n) \chi (d^2) \Big |^2. \end{aligned}$$

Expressing the condition \(n d^2 \le x\) using a contour integral (see [MV07, Cor. 5.3]) the above is bounded by

$$\begin{aligned} \ll \log x \sup _{\sqrt{z} \le D \le \sqrt{x}} \frac{1}{\varphi (q)^2} \sum _{\chi \ne \chi _0} \Big | \int _{|t| \le x} L(\tfrac{1}{2} + it, \chi ) M(1 + 2 i t, \chi ^2) \cdot \frac{x^{1/2 + it}}{1/2 + it} dt \Big |^2 + O((x/q)^{1/2-\varepsilon /8}), \end{aligned}$$

(in fact a better error term can be obtained but we do not need to keep track of it) where

$$\begin{aligned} M(1 + 2 i t, \chi ^2) = \sum _{d \sim D} \frac{\mu (d) \chi ^2(d)}{d^{1 + 2 it}}. \end{aligned}$$

Applying Cauchy–Schwarz and splitting according to the values of t we can bound the main term above as

$$\begin{aligned} \ll x (\log x)^3 \sup _{\begin{array}{c} \sqrt{z} \le D \le \sqrt{x} \\ 1 \le T \le x \end{array}} \frac{1}{\varphi (q)^2} \sum _{\chi \ne \chi _0} \frac{1}{T} \int _{-T}^{T} |L(\tfrac{1}{2} + it, \chi )|^2 \cdot |M(1 + 2it, \chi ^2)|^2 dt. \end{aligned}$$
(54)

Let us first prove the claim under the assumption of the Generalized Lindelöf Hypothesis. Applying the Lindelöf bound and then the hybrid mean-value theorem (Lemma 7), we have for any choice of \(\delta > 0\),

$$\begin{aligned} \begin{aligned}&\frac{x}{\varphi (q)^2} \sum _{\chi \ne \chi _0} \frac{1}{T} \int _{-T}^T |L(\tfrac{1}{2}+it,\chi )|^2 |M(1+2it,\chi ^2)|^2\, dt \\&\quad \ll \frac{x (qT)^\delta }{q^2 T} \sum _{\chi \ne \chi _0} \int _{-T}^T |M(1+2it,\chi ^2)|^2\,dt \\&\quad \ll \frac{x T^\delta q^{2\delta }}{q^2 T} (qT + D) \cdot \frac{1}{D} \ll T^\delta q^{2\delta } \left( \frac{x}{qD} + \frac{x}{q^2 T}\right) . \end{aligned} \end{aligned}$$
(55)

Since \(q, T \le x \le (x/q)^{O(1)}\), for sufficiently small \(\delta \) we have \(T^\delta q^{2\delta } \le (x/q)^{\varepsilon /100}\). Recalling also that \(D \ge z^{1/2} \ge (x/q)^{(1+\varepsilon )/2}\) and \(q \ge x^{1/3 + \varepsilon }\), we see that (55) is

$$\begin{aligned} \ll x^{\varepsilon /100} \biggl ( \Bigl (\frac{x}{q}\Bigr )^{1/2-\varepsilon /2} + \frac{x}{q^{2}} \biggr ) \ll (x/q)^{1/2-\varepsilon /3}. \end{aligned}$$

Applying this estimate to (54) yields the claim.

Let us now consider the unconditional part of the claim. Let

$$\begin{aligned} S_{T,q} (V) := \{ (t, \chi ){:}\,V \le | M(1 + 2 it, \chi ^2)| \le 2 V, \ |t| \le T, \ \chi \, (\mathrm {mod}\, q) \}. \end{aligned}$$

Note that for \(D \ge \sqrt{z} \ge (x/q)^{1/2+\varepsilon }\), the values of \(t \in [-T, T]\) for which \(|M(1+2it, \chi ^2)| \le D^{-1/2 + \varepsilon /4}\) contribute, by Cauchy–Schwarz and the fourth moment bound (Lemma 4), \(O((\log x)^5 x D^{-1 + \varepsilon /2}/q) = O((x / q)^{1/2 - \varepsilon /2})\) to (54). Additionally, \(|M(1+2it, \chi ^2)| \le \sum _{d \sim D} 1/d \le 2\). Therefore it suffices to show that for each \(\sqrt{x} \ge D \ge \sqrt{z}\), \(V \in [D^{-1/2}, 1]\), and \(T \in [1, x]\), we have

$$\begin{aligned} \frac{x V^2}{\varphi (q)} \sum _{\chi \ne \chi _0} \frac{1}{T} \int _{t {:}\,(t, \chi ) \in S_{T, q}(V)} |L(\tfrac{1}{2} + it, \chi )|^2 dt \ll (x/q)^{-\varepsilon /8} (\log x)^{-4} \cdot \sqrt{q x}. \end{aligned}$$
(56)

By Lemma 2 we have,

$$\begin{aligned} |S_{T, q}(V)| \ll (V^{-2} + q T \min \{ D^{-1} V^{-2}, D^{-2} V^{-6} \}) \cdot (\log x)^{18}. \end{aligned}$$
(57)

Here \(|S_{T,q}(V)|\) is the measure of \(S_{T,q}(V)\), where the set of \(\chi \, (\mathrm {mod}\, q)\) is endowed with the counting measure. Consider first the case when the first term dominates. Then, by Lemma 6, we see that the left-hand side of (56) is

$$\begin{aligned}&\ll \frac{x V^2}{\varphi (q)} \cdot \frac{|S_{T, q}(V)|}{T} \cdot (q T)^{1/3 + \varepsilon /4} \ll \frac{x}{\varphi (q)} \cdot \frac{1}{T} \cdot (q T)^{1/3 + \varepsilon /3} \\&\ll \frac{x}{q^{2/3 - \varepsilon /2}} \ll (x/q)^{-\varepsilon /8} (\log x)^{-4} \cdot \sqrt{qx} \end{aligned}$$

since \(q > x^{3/7 + \varepsilon }\). Note that the factor \((\log x)^{18}\) in (57) was absorbed in the exponent of qT.

Consider now the case that the second term dominates in (57). Then by Cauchy–Schwarz and the hybrid fourth moment estimate (Lemma 4),

$$\begin{aligned}&\frac{x V^2}{\varphi (q)} \sum _{\chi \ne \chi _0} \frac{1}{T} \int _{t {:}\,(t, \chi ) \in S_{T, q}(V)} |L(\tfrac{1}{2} + it, \chi )|^2 dt \\&\quad \ll \frac{x V^2}{T \varphi (q)} \cdot |S_{T, q}(V)|^{1/2} \cdot \Big ( \sum _{\chi \ne \chi _0} \int _{-T}^T |L(\tfrac{1}{2} + it, \chi )|^4 dt \Big )^{1/2} \\&\quad \ll (\log x)^{11} \cdot x V^2 \cdot \min \{ D^{-1} V^{-2}, D^{-2} V^{-6} \}^{1/2} \\&\quad \ll (\log x)^{11} \cdot x \min \{ D^{-1/2} V, D^{-1} V^{-1} \} \\&\quad \ll (\log x)^{11} \cdot x \cdot ( D^{-1/2} V )^{1/2} \cdot (D^{-1} V^{-1})^{1/2} \\&\quad \ll x (\log x)^{11} \cdot D^{-3 / 4} \ll (x/q)^{-\varepsilon /8} (\log x)^{-3} \sqrt{qx} \end{aligned}$$

since \(D \ge \sqrt{z} \ge (x / q)^{2/3 + \varepsilon /2}\).

8 Conditional estimates: Proof of Theorem 3

The proof of Theorem 3 splits into two parts, corresponding to the two assertions made in its statement.

8.1 Proof that the Riemann Hypothesis implies (7).

The proof follows the same ideas as the proof of Proposition 2. The claim (7) is already proved for \(H\le X^{2/3-\varepsilon }\), so we may assume \(H > X^{2/3-\varepsilon }\). We return to (41) and consider first the case \(D \ge H^{(1-\delta )/2}\). Note that the Riemann Hypothesis implies

$$\begin{aligned} M(1+2it) \ll _\delta D^{-1/2+\delta /2}, \end{aligned}$$
(58)

for \(|t| \le X^2\). Now (41), Cauchy–Schwarz and the fourth moment bound for the Riemann zeta function (Lemma 3) imply

$$\begin{aligned} \frac{1}{X} \int _{X}^{2X} \Big | \sum _{\begin{array}{c} x < n d^2 \le x + H \\ d \sim D \end{array}} \mu (d) - H \sum _{\begin{array}{c} d \sim D \end{array}} \frac{\mu (d)}{d^2} \Big |^2 dx \ll _\delta (\log X)^2 H/D^{1-\delta }. \end{aligned}$$

For \(D \ge H^{(1-\delta )/2}\), the right-hand side is \(\ll _{\delta } (\log X)^2 H^{1/2+\delta - \delta ^2/2}\). Splitting dyadically for \(D \in [H^{(1-\delta )/2}, X^{1/2}]\) and using the tail bound

$$\begin{aligned} \sum _{d^2 > 2X} \frac{\mu (d)}{d^2} \ll _{\delta } \frac{1}{X^{3/4 - \delta /10}}, \end{aligned}$$

valid under the Riemann Hypothesis, we see that the Riemann Hypothesis implies

$$\begin{aligned} \begin{aligned}&\frac{1}{X} \int _{X}^{2X} \Big | \sum _{\begin{array}{c} x< n d^2 \le x + H \\ d^2 \ge H^{1-\delta } \end{array}} \mu (d) - H \sum _{\begin{array}{c} d^2 \ge H^{1-\delta } \end{array}} \frac{\mu (d)}{d^2} \Big |^2 dx \\&\quad \ll (\log X)^2 \sup _{H^{(1-\delta )/2} \le D \le X^{1/2}} \frac{1}{X} \int _{X}^{2X} \Big | \sum _{\begin{array}{c} x < n d^2 \le x + H \\ d \sim D \end{array}} \mu (d) - H \sum _{\begin{array}{c} d \sim D \end{array}} \frac{\mu (d)}{d^2}\Big |^2 dx \\&\qquad + \frac{1}{X} \int _{X}^{2X} \Big |H \sum _{d^2 > 2X} \frac{\mu (d)}{d^2} \Big |^2 dx \ll _\delta H^{1/2+\delta }. \end{aligned} \end{aligned}$$
(59)

On the other hand, estimating the n-sum on the left-hand side by \(H/d^2 + O(1)\), we see that

$$\begin{aligned} \frac{1}{X} \int _{X}^{2X} \Big | \sum _{\begin{array}{c} x < n d^2 \le x + H \\ d^2 \le H^{1/2} \end{array}} \mu (d) - H \sum _{\begin{array}{c} d^2 \le H^{1/2} \end{array}} \frac{\mu (d)}{d^2} \Big |^2 dx \ll H^{1/2}. \end{aligned}$$

Hence the claim follows once we have shown that, for any \(D \in [H^{1/4}, H^{(1-\delta )/2}]\), we have

$$\begin{aligned} \frac{1}{X} \int _{X}^{2X} \Big | \sum _{\begin{array}{c} x < n d^2 \le x + H \\ d \sim D \end{array}} \mu (d) - H \sum _{\begin{array}{c} d \sim D \end{array}} \frac{\mu (d)}{d^2} \Big |^2 dx \ll _{\delta } H^{1/2}. \end{aligned}$$

Notice that we can attach to the n variable a dummy function \(f(n D^2 / X)\) with f a smooth function supported in [1/20, 20] and such that \(f(y) = 1\) for \(y \in [1/10, 10]\).

Similarly to the proof of Proposition 2, write

$$\begin{aligned} A(x) := \sum _{\begin{array}{c} n d^2 \le x \\ d \sim D \end{array}} f \Big ( \frac{n}{X / D^2} \Big ) \mu (d) - x \sum _{d \sim D} \frac{\mu (d)}{d^2}. \end{aligned}$$

By contour integration we have, for \(e^y \in [X, 2X]\) and \(w \le 1/100\),

$$\begin{aligned} A(e^{y + w}) - A(e^{y}) = \frac{1}{2\pi i} \int _{1/2-i\infty }^{1/2+i\infty } e^{y s} \cdot \frac{e^{w s} - 1}{s} N_1(s) M(2s) ds - e^{y} (e^{w} - 1) \sum _{d \sim D} \frac{\mu (d)}{d^2},\nonumber \\ \end{aligned}$$
(60)

where

$$\begin{aligned} M(s) := \sum _{d \sim D} \frac{\mu (d)}{d^{s}} \quad \text {and} \quad N_1(s) := \sum _{m} \frac{1}{m^s} \cdot f \Big ( \frac{m}{X / D^2} \Big ). \end{aligned}$$

Write also

$$\begin{aligned} N_2(s) := \int _{{\mathbb {R}}} \frac{1}{u^s} \cdot f \Big ( \frac{u}{X / D^2} \Big ) du \end{aligned}$$

and note that, for \(e^y \in [X, 2X]\), we have by contour integration

$$\begin{aligned} \frac{1}{2\pi i} \int _{1/2-i\infty }^{1/2+i\infty } e^{y s} \cdot \frac{e^{w s} - 1}{s} N_2(s) M(2s) ds&= \sum _{d \sim D} \mu (d) \int _{{\mathbb {R}}} f \Big ( \frac{u}{X / D^2} \Big ) 1_{e^{y} \le ud^2 \le e^{y + w}} du \\&= e^y (e^{w} - 1) \sum _{d \sim D} \frac{\mu (d)}{d^2}. \end{aligned}$$

Plugging this into (60) and arguing as in the proof of Proposition 2, we see that, for some \(w \asymp H/X\), we have

$$\begin{aligned} \begin{aligned}&\frac{1}{X} \int _{X}^{2X} |A(x + H) - A(x)|^2 dx \\&\quad \ll X \int _{{\mathbb {R}}} \Big | \frac{e^{w (\tfrac{1}{2} + it)} - 1}{\tfrac{1}{2} + it} \Big |^2 \cdot \left| \left( N_1(\tfrac{1}{2} + it)-N_2(\tfrac{1}{2}+it)\right) M(1 + 2it)\right| ^2 dt. \end{aligned} \end{aligned}$$
(61)

By Poisson summation

$$\begin{aligned} \begin{aligned} N_1(\tfrac{1}{2} + it)&= \sum _{m} \frac{1}{m^{1/2+it}} \cdot f \Big ( \frac{m}{X / D^2} \Big ) = \sum _\ell \int _{-\infty }^\infty \frac{1}{u^{1/2+it}} \cdot f \Big ( \frac{u}{X / D^2} \Big ) e(\ell u) du \\&= N_2(\tfrac{1}{2} + it) + \frac{X}{D^2} \sum _{\ell \ne 0} \int _{1/20}^{20} \left( \frac{D^2}{yX}\right) ^{1/2+it} f( y ) e\left( \frac{\ell yX}{D^2}\right) dy. \end{aligned} \end{aligned}$$

By partial integration (taking antiderivatives of \(e(\ell y X/D^2)\)), this implies that, for \(|t| < X/D^{2 + \delta /100}\),

$$\begin{aligned} |N_1(\tfrac{1}{2} + it)-N_2(\tfrac{1}{2}+it)| \ll _{A} X^{-A}, \end{aligned}$$

for any \(A > 0\). Therefore the part of the integral (61) with \(|t| < X / D^{2 +\delta /100}\) is completely negligible.

On the other hand the part with \(|t| \ge X^{10}\) contributes only O(1) to the left-hand side of (61) by estimating \(|N_j(1/2+it)|\) and \(|M(1+2it)|\) trivially.

Furthermore, assuming the Riemann Hypothesis, we have by contour integration, for \(|t| \in [X/D^{2 + \delta /100}, X^{10}]\),

$$\begin{aligned} |N_1(\tfrac{1}{2} + it)| \ll \sup _{\begin{array}{c} |t|/2 \le |u| \le 2X^{10} \end{array}} |\zeta (\tfrac{1}{2}+iu)| \ll _{\delta } X^{\delta /100} \end{aligned}$$

and

$$\begin{aligned} |M(1+2it)| \ll _{\delta } D^{-1/2+\delta /100}. \end{aligned}$$

Furthermore, by partial integration we have, for \(|t| \in [X/D^{2 + \delta /100}, X^{10}]\),

$$\begin{aligned} |N_2(\tfrac{1}{2} + it)| \ll _{\delta } X^{\delta /100}. \end{aligned}$$

Hence the part with \(|t| \in [X/D^{2 + \delta /100}, X^{10}]\) contributes to (61)

$$\begin{aligned} \ll _{\delta } X \int _{X/D^{2 + \delta /100} \le |t| \le X^{10}} \frac{1}{|t|^2} X^{\delta /50} D^{-1+\delta /50}\, dt \ll _{\delta } D X^{\delta /10} \ll _{\delta } H^{1/2} \end{aligned}$$

since \(D \le H^{(1-\delta )/2}\) and \(H \ge X^{1/2}\). \(\square \)

8.2 Proof that (7) implies the Riemann Hypothesis.

Suppose that (7) holds for \(H = X^{1 - \delta }\). Then, by Cauchy–Schwarz,

$$\begin{aligned} \int _{{\mathbb {R}}} \Phi \Big ( \frac{x}{X} \Big ) \Big ( \frac{1}{H} \sum _{x < m \le x + H} \mu ^2 (m) \Big ) dx = \frac{6 X{\widehat{\Phi }}(0)}{\pi ^2} + O_{\delta }(X^{1/4 + 3 \delta }), \end{aligned}$$

with \(\Phi \) an arbitrary, but not identically zero, smooth function compactly supported in [1/2, 3] (one could even enforce \({\widehat{\Phi }}(0) = 0\) to simplify the above expression, but we did not find any significant advantage in doing so). Therefore,

$$\begin{aligned} \frac{1}{2\pi i} \int _{2 - i \infty }^{2 + i \infty } \frac{\zeta (s)}{\zeta (2s)} X^s \cdot \Psi _{H/ X}(s) ds - \frac{6 X {\widehat{\Phi }}(0)}{\pi ^2} = O_{\delta }(X^{1/4 + 3 \delta }), \end{aligned}$$
(62)

where uniformly in \(1/100< \mathfrak {R}s < 100\), for any given \(A > 1\),

$$\begin{aligned} \begin{aligned} \Psi _{H/X}(s)&:= \frac{1}{s} \cdot \frac{X}{H} \int _{{\mathbb {R}}} \Big ( \Phi \Big (x - \frac{H}{X} \Big ) - \Phi (x) \Big ) x^s dx\\&= \frac{1}{s} \sum _{1 \le j \le A} \frac{(-1)^j}{j!} \cdot \Big ( \frac{H}{X} \Big )^{j - 1} \int _{{\mathbb {R}}} \Phi ^{(j)}(x) x^{s} dx + O_{A} ( X^{-\delta A} ) \\&= - \frac{1}{s} \int _{{\mathbb {R}}} \Phi '(x) x^{s} dx + O_{A} \Big ( \frac{H}{X} \cdot (1 + |\mathfrak {I}s|)^{-A} + X^{-\delta A} \Big ). \end{aligned} \end{aligned}$$
(63)

By integration by parts the main term is equal to \({\widetilde{\Phi }}(s)\) where \({\widetilde{\Phi }}(s)\) is the Mellin transform of \(\Phi \). The reader may also verify that we have the exact relation \(\Psi _{H/X}(1) = {\widetilde{\Phi }}(1)\).
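The integration-by-parts identity \(-\tfrac{1}{s}\int \Phi '(x) x^{s}\, dx = {\widetilde{\Phi }}(s)\) can be verified numerically for a concrete compactly supported \(\Phi \) (the \(C^1\) bump below, supported in [1, 3], is an illustrative choice, not the \(\Phi \) of the argument):

```python
import math

def phi(x):
    # C^1 bump supported in [1, 3] (illustrative choice)
    return math.sin(math.pi * (x - 1) / 2) ** 2 if 1 <= x <= 3 else 0.0

def dphi(x):
    # its derivative
    return (math.pi / 2) * math.sin(math.pi * (x - 1)) if 1 <= x <= 3 else 0.0

def integral(g, a, b, n=100000):
    # midpoint-rule approximation of int_a^b g
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))

s = 1.7  # any fixed s in the strip works for this check
lhs = -(1 / s) * integral(lambda x: dphi(x) * x ** s, 1, 3)
rhs = integral(lambda x: phi(x) * x ** (s - 1), 1, 3)  # the Mellin transform at s
assert abs(lhs - rhs) < 1e-6
```

The boundary terms vanish because \(\Phi \) vanishes at the endpoints of its support, which is exactly what the integration by parts uses.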

Suppose that the Riemann Hypothesis fails. Then \(\zeta (s) / \zeta (2s)\) has a pole in the strip \(\tfrac{1}{4}< \sigma < \tfrac{1}{2}\) (e.g. \(s = \rho / 2\), where \(\rho = \beta + i \gamma \) is the zero of \(\zeta (s)\) with smallest \(\gamma > 0\) among all zeros of \(\zeta (s)\) with \(\beta \in (\tfrac{1}{2}, 1)\)). Let \(\Theta > \tfrac{1}{4}\) denote the supremum of the real parts of the poles of \(\zeta (s) / \zeta (2s)\) lying in the strip \(\tfrac{1}{4}< \sigma < \tfrac{1}{2}\). Choose \(\delta > 0\) to be sufficiently small so that \(\tfrac{1}{4} + 3 \delta \le \Theta - \delta / 2\).

Pick now \(s_0\) a pole of \(\zeta (s) / \zeta (2s)\) with \(\mathfrak {R}s_0 \in (\Theta - \delta / 50, \Theta ]\) and the smallest positive imaginary part. We can assume without loss of generality that \(\Phi \) is chosen so that \({\widetilde{\Phi }}(s_{0}) \ne 0\). Indeed if it were the case that \({\widetilde{\Phi }}(s_0) = 0\) then pick a \(c \in (0,1)\) such that \({\widetilde{\Phi }}(c + s_0) \ne 0\) and consider \(x^{c} \Phi (x)\) in place of \(\Phi (x)\).

We shall shift the contour in (62) to the line \(\sigma = \Theta + \delta / 8\). Note that for any fixed values of X and H, from the definition (63) and integration by parts, we have \(\Psi _{H/X}(s) \ll _A (1+|\mathfrak {I}s|)^{-A}\) uniformly for \(1/100< \mathfrak {R}s < 100\) for all \(A \ge 1\). Furthermore for \(\mathfrak {R}s \in [\sigma ,2]\) and s bounded away from 1 we get the bound \(\zeta (s)/\zeta (2s) \ll _\delta (1+|\mathfrak {I}s|)^C\), where C is a constant which depends only on \(\delta \). (This follows because we may bound \(\zeta (s)\) using a convexity bound (see e.g. [Tit86, Sec. 5.1]) and we may bound \(1/\zeta (2s)\) using the estimate \(\log \zeta (2s) \ll _\delta \log (|\mathfrak {I}(2s)|+2)\) in this region, which follows from a well-known estimate on the logarithmic derivative of the zeta function (e.g. [Tit86, Thm 9.6 (A)]) and the fact that in this region \(|2s-\rho '| \ge \delta /8\) whenever \(\zeta (\rho ') = 0\).) Thus

$$\begin{aligned} \frac{1}{2\pi i} \int _{(2)} \frac{\zeta (s)}{\zeta (2s)} X^s \cdot \Psi _{H/ X}(s) ds = \frac{1}{2\pi i} \int _{(\sigma )} \frac{\zeta (s)}{\zeta (2s)} X^s \cdot \Psi _{H/ X}(s) ds + \frac{6}{\pi ^2} X \Psi _{H/X}(1). \end{aligned}$$

Applying (62) and (63) we find

$$\begin{aligned} \frac{1}{2\pi i} \int _{(\sigma )} \frac{\zeta (s)}{\zeta (2s)} \cdot X^s {\widetilde{\Phi }}(s) ds = O_{\delta } ( X^{\Theta - \delta / 2} ) + O_{\delta }(X^{1/4 + 3 \delta }). \end{aligned}$$

By choice of \(\delta > 0\) the error term is bounded by \(O_{\delta }(X^{\Theta - \delta / 2})\). Therefore setting

$$\begin{aligned} A(X) := \sum _{n} \mu ^2(n) \Phi \Big ( \frac{n}{X} \Big ) - \frac{6}{\pi ^2} {\widehat{\Phi }}(0) X\cdot {\mathbf {1}}_{[1,\infty )}(X), \end{aligned}$$

we have for \(X \ge 1\),

$$\begin{aligned} A(X) = \frac{1}{2\pi i} \int _{(\sigma )} \frac{\zeta (s)}{\zeta (2s)} \cdot X^s {\widetilde{\Phi }}(s) ds = O_{\delta }(X^{\Theta - \delta / 2}). \end{aligned}$$

Thus there exists a constant \(c = c(\Theta , \delta )\) such that,

$$\begin{aligned} |A(x)| \le c x^{\Theta - \delta / 50} \end{aligned}$$
(64)

for all \(x \ge 0\) (note that A(x) vanishes for \(0< x < 1/100\)). Let us start by observing that for \(\mathfrak {R}s > 1\),

$$\begin{aligned} \begin{aligned} \int _{0}^{\infty } A(x) x^{-s - 1} dx&= \sum _{n \ge 1} \mu ^2(n) \int _{0}^{\infty } \Phi \Big ( \frac{n}{x} \Big ) x^{-s - 1} dx - \frac{6 {\widehat{\Phi }}(0)}{\pi ^2} \cdot \frac{1}{s - 1} \\&= \sum _{n \ge 1} \mu ^2(n) \cdot n^{-s} {\widetilde{\Phi }}(s) - \frac{6 {\widetilde{\Phi }}(1)}{\pi ^2} \cdot \frac{1}{s - 1} \\&= \frac{\zeta (s)}{\zeta (2s)} \cdot {\widetilde{\Phi }}(s) - \frac{6 {\widetilde{\Phi }}(1)}{\pi ^2} \cdot \frac{1}{s - 1}. \end{aligned} \end{aligned}$$
(65)
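The Dirichlet series identity \(\sum _{n} \mu ^2(n) n^{-s} = \zeta (s)/\zeta (2s)\) used in the last step of (65) can be checked numerically at \(s = 2\), where \(\zeta (2)/\zeta (4) = 15/\pi ^2\) (a minimal sketch; the truncation point is arbitrary):

```python
import math

# Sieve the squarefree integers up to N.
N = 200000
sf = [True] * (N + 1)
d = 2
while d * d <= N:
    for m in range(d * d, N + 1, d * d):
        sf[m] = False
    d += 1

# Partial sum of sum_n mu^2(n) n^{-2}, which should approach zeta(2)/zeta(4) = 15/pi^2.
partial = sum(1.0 / (n * n) for n in range(1, N + 1) if sf[n])
assert abs(partial - 15 / math.pi ** 2) < 1e-4
```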

The function \(\int _{0}^{\infty } A(x) x^{-s - 1} dx\) is analytic in \(\mathfrak {R}s > \Theta - \delta / 50\) by (64). Therefore, by (65) and analytic continuation,

$$\begin{aligned} \frac{\zeta (s)}{\zeta (2s)} \cdot {\widetilde{\Phi }}(s) - \frac{6 {\widetilde{\Phi }}(1)}{\pi ^2} \cdot \frac{1}{s - 1} \end{aligned}$$
(66)

is analytic in the region \(\mathfrak {R}s > \Theta - \delta / 50\). This however contradicts the fact that (66) has a pole at \(s_0\) with \(\mathfrak {R}s_0 \in (\Theta - \delta / 50, \Theta ]\). \(\square \)