1 Introduction

Questions concerning the number and distribution of rational points on hypersurfaces have long attracted the interest of both number theorists and algebraic geometers. Building on work by Davenport [15], Birch wrote an influential paper [6] in which he provided a method to prove the analytic Hasse principle and establish asymptotic formulæ for the number of integer points on projective hypersurfaces under moderate non-singularity conditions, provided that the dimension of the hypersurface is sufficiently large compared to its degree. In particular, suppose that \(F \in \mathbb {Z}[x_1, \ldots , x_n]\) is a non-singular form of degree d defining a hypersurface \(\mathcal {V}\), and write N(X) for the number of points \(\mathbf {x} \in \mathcal {V}(\mathbb {Z})\) with \(|x_i| \leqslant X\) for \(1 \leqslant i \leqslant n\). In this notation, Birch’s main result [6, Theorem] states that whenever \(n > 2^d (d-1)\), there exists a positive real number \(\nu \) with the property that the number of integer points on \(\mathcal {V}\) satisfies an asymptotic formula of the shape

$$\begin{aligned} N(X) = c X^{n-d} + O(X^{n-d-\nu }). \end{aligned}$$

The constant c is non-negative and has an interpretation in terms of the density of \(\mathbb {Q}_v\)-points in \(\mathcal {V}\) for all completions \(\mathbb {Q}_v\) of \(\mathbb {Q}\).
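
In the standard interpretation familiar from the circle method (and anticipating the notation of Theorem 1.1 below), this constant factorises into local densities; one may write

$$\begin{aligned} c = \chi _{\infty } \prod _{p \text { prime}} \chi _p, \qquad \chi _p = \lim _{k \rightarrow \infty } p^{-k(n-1)} {{\,\mathrm{Card}\,}}\{\mathbf {x} \bmod {p^k} : F(\mathbf {x}) \equiv 0 \bmod {p^k}\}, \end{aligned}$$

where \(\chi _{\infty }\) denotes the corresponding real density. Granted the absolute convergence of the product, c is positive as soon as every local factor is.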

In the work at hand, we study a higher-dimensional generalisation of Birch’s result. Denote by \(N(X, Y)\) the number of pairs of points \(\mathbf {x}, \mathbf {y} \in \mathbb {Z}^n {\setminus } \{\varvec{0}\}\) satisfying \(|x_i| \leqslant X\) and \(|y_i| \leqslant Y\) for \(1 \leqslant i \leqslant n\), and having the property that

$$\begin{aligned} F(u \mathbf {x} + v \mathbf {y}) = 0 \qquad \text {identically in } u\text { and } v. \end{aligned}$$
(1.1)

This problem is related to that of counting rational lines contained in \(\mathcal {V}\): it counts all pairs \((\mathbf {x}, \mathbf {y})\) of suitably bounded height with the property that the line spanned by \(\mathbf {x}\) and \(\mathbf {y}\) is fully contained in \(\mathcal {V}\). Geometrically, it is known that the Fano scheme of lines on a generic hypersurface \(\mathcal {V}\) of degree d has dimension \(2n-d-5\) whenever that number is positive (Langer [21]; see also the classical work by Altman and Kleiman [1]). More recent results by Harris, Mazur and Pandharipande [18], Beheshti [3,4,5] and others explore under what circumstances a similar statement holds for all smooth hypersurfaces. Unfortunately, these results within algebraic geometry carry little arithmetic information and are thus of limited use if one seeks to study rational lines.
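
For orientation, the number \(2n-d-5\) is the familiar expected-dimension count: the Grassmannian of lines in \(\mathbb {P}^{n-1}\) has dimension \(2(n-2)\), and the requirement that a given line lie on \(\mathcal {V}\) imposes \(d+1\) conditions, one for each coefficient of the binary form \(F(u \mathbf {x} + v \mathbf {y})\) of degree d in u and v, so that

$$\begin{aligned} \dim \mathbb {G}(1, n-1) - (d+1) = 2(n-2)-(d+1) = 2n-d-5. \end{aligned}$$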

When F is a cubic form, recent work of the author jointly with Dietmann [12] shows that (1.1) has non-trivial rational solutions whenever \(n \geqslant 29\), but that there may not be any rational solutions when \(n=11\) or lower. For more general settings, (1.1) has been investigated in a series of papers by the present author [8,9,10,11]. We note at this point that, in order to count lines in the strict sense, we would have to exclude those solutions of (1.1) in which \(\mathbf {x}\) and \(\mathbf {y}\) are proportional. Fortunately, the contribution of such points is of a smaller order of magnitude than our eventual main term, so we lose no generality by not excluding them explicitly.

A special role in problems of this flavour is played by certain points \(\mathbf {y} \in \mathcal {V}\) that admit a disproportionate number of solutions \(\mathbf {x} \in \mathcal {V}\) satisfying (1.1). Typically, the contribution arising from these solutions is counterbalanced by the relative sparsity of such points \(\mathbf {y}\), but when Y is very small in comparison to X, such solutions might well dominate the overall count. It is therefore natural to exclude the solutions that arise from such special subvarieties. When \(\mathcal {U} \subseteq \mathcal {V}\) is a Zariski-open subset, we denote by \(N_{\mathcal {U}}(X,Y)\) the number of integral \(\mathbf {x}, \mathbf {y} \in \mathcal {U}\) with \(|x_i| \leqslant X\) and \(|y_i| \leqslant Y\) for \(1 \leqslant i \leqslant n\) that satisfy (1.1). We can now state the main result of this memoir.

Theorem 1.1

Let \(F \in \mathbb {Z}[x_1, \ldots , x_n]\) be a non-singular form of degree \(d \geqslant 5\) defining a hypersurface \(\mathcal {V}\). Let further

$$\begin{aligned} n > 2^{d-1}d^4(d+1)(d+2). \end{aligned}$$

Then there exists a non-empty Zariski-open subset \(\mathcal {U} \subseteq \mathcal {V}\) and a positive real number \(\nu \) with the property that

$$\begin{aligned} N_{\mathcal {U}}(X,Y) = (XY)^{n-\frac{1}{2} d(d+1)} \chi _{\infty } \prod _{p \text { prime}} \chi _p + O((XY)^{n-\frac{1}{2}d(d+1) - \nu }). \end{aligned}$$

The Euler product converges absolutely, and its factors have an interpretation as the density of solutions of (1.1) over the local fields \(\mathbb {R}\) and \(\mathbb {Q}_p\), respectively.
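
The exponent in the main term is consistent with the standard product heuristic: as will become apparent in (2.1) below, the condition (1.1) amounts to \(d+1\) equations, the equation with index j having bidegree \((j, d-j)\) in \((\mathbf {x}, \mathbf {y})\), so that one expects

$$\begin{aligned} N_{\mathcal {U}}(X,Y) \approx X^n Y^n \prod _{j=0}^{d} X^{-j} Y^{-(d-j)} = (XY)^{n-\frac{1}{2} d(d+1)}. \end{aligned}$$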

Note that Theorem 1.1 is a slightly simplified version of what our methods yield; by a more thorough analysis it would be possible to obtain some improvements in the lower-order terms at the expense of a significantly more complicated expression, but no easy improvement of the order of growth \(2^d d^6\) in our result. In particular, we do not expect our results to be competitive when d is small. For this reason, even though a modification of our approach would provide results for \(d\in \{2,3,4\}\) also, we refrain from including the analysis of those cases as the expected results would likely be quite weak.

Clearly, the problem is symmetric in X and Y, so in our discussion we may assume without loss of generality that \(Y \leqslant X\). In the special case when \(Y=X\), the conclusion of Theorem 1.1 follows from [8, Theorem 1.1] under the more lenient condition that \(n>3 \cdot 2^d (d-1)(d+2)\), and subsequent work [9, Theorem 2.1] establishes a conclusion similar to that of Theorem 1.1 above under the additional condition that n should be large enough in terms of \(\log X/\log Y\), which is acceptable if X is at most a bounded power of Y. The main new input in our present work is therefore our treatment of the situation when Y is vastly smaller than X. Unlike in our former work in [8, 9], where we allowed the variables \(\mathbf {x}\), \(\mathbf {y}\) to vary independently, we pursue a slicing approach inspired by [23] in which we fix a point \(\mathbf {y} \in \mathcal {U}(\mathbb {Z})\) and then investigate the number \(N_{\mathbf {y}}(X; \mathcal {U})\) of points \(\mathbf {x} \in \mathcal {U}(\mathbb {Z}) \cap [-X,X]^n\) for which (1.1) is satisfied with that particular value \(\mathbf {y}\). We then have

$$\begin{aligned} N_{\mathcal {U}}(X,Y) = \sum _{\begin{array}{c} \mathbf {y} \in \mathcal {U}(\mathbb {Z})\\ |\mathbf {y}| \leqslant Y \end{array}} N_{\mathbf {y}}(X; \mathcal {U}), \end{aligned}$$
(1.2)

and we aim to establish bounds of the shape

$$\begin{aligned} N_{\mathbf {y}}(X; \mathcal {U}) = c_{\mathbf {y}} X^{n-\frac{1}{2} d(d+1)} + O_{\mathbf {y}}(X^{n-{\frac{1}{2}} d(d+1)-\nu }) \end{aligned}$$

for some constant \(c_{\mathbf {y}}\) and some positive number \(\nu \).

For generic \(\mathbf {y}\), the quantity \(N_{\mathbf {y}}(X) = N_{\mathbf {y}}(X; \mathcal {V})\) can be understood by applying the methods of Browning and Heath-Brown [14] for systems of homogeneous equations of differing degrees, although we need to be careful to track the dependence on the coefficients, as these depend polynomially on \(\mathbf {y}\). Unfortunately, this strategy breaks down if \(\mathbf {y}\) fails to satisfy a certain second-order non-singularity condition. Writing \(H_{\mathbf {x}}\) for the Hessian of F at the point \(\mathbf {x}\), we set

$$\begin{aligned} \mathcal {V}^*_{2,\rho } = \{\mathbf {x} \in \mathcal {V}: {{\,\mathrm{rank}\,}}H_{\mathbf {x}} \leqslant n-\rho \}, \end{aligned}$$

and let \(\mathcal {V}_{2,\rho } = \mathcal {V} {\setminus } \mathcal {V}^*_{2,\rho }\). Since the rank condition in the definition of \(\mathcal {V}^*_{2,\rho }\) is expressed by the vanishing of all \((n-\rho +1) \times (n-\rho +1)\) minors of \(H_{\mathbf {x}}\), the set \(\mathcal {V}^*_{2,\rho }\) is Zariski-closed; in particular, \(\mathcal {V}_{2,\rho }\) is Zariski-open in \(\mathcal {V}\) for all \(1 \leqslant \rho \leqslant n\).

The set \(\mathcal {V}^*_{2,1}\) is the zero set of the simultaneous equations \(F(\mathbf {x}) = 0\) and \(\det H_{\mathbf {x}}=0\), and is thus of codimension one inside \(\mathcal {V}\). Here, the function \(\Delta (\mathbf {x})=\det H_{\mathbf {x}}\) is a form of degree \((d-2)n\) in n variables, and one might hope that, unless the variety defined by \(\Delta (\mathbf {x})=0\) contains high-dimensional subvarieties of low degree, the set \(\mathcal {V}^*_{2,1}(\mathbb {Z})\) is not too large. In particular, should \(\mathcal {V}^*_{2,1}(\mathbb {Z})\) consist only of the origin for some form F, it would be permissible in Theorem 1.1 to take \(\mathcal {U} = \mathcal {V} {\setminus } \{ \varvec{0}\}\). Unfortunately, our current understanding of the size of the set \(\mathcal {V}^*_{2,\rho }\) is quite weak. Nonetheless, by bounding the number of integral points in \(\mathcal {V}^*_{2,\rho }\) we are still able to establish asymptotic formulæ for our original counting function \(N(X, Y)\) that extend the admissible range of Y compared to what had previously been known in [9, Theorem 2.1], without having to exclude an exceptional subvariety.

Theorem 1.2

Let \(F \in \mathbb {Z}[x_1, \ldots , x_n]\) be a non-singular form of degree \(d \geqslant 5\), and suppose that \(Y = X^\psi \), where \(0< \psi < (2d^4)^{-1}\). Furthermore, suppose that

$$\begin{aligned} n> 2^{d-1}d^4(d+1)(d+2) + \textstyle {\frac{1}{2}}d(d-1) \psi ^{-1}. \end{aligned}$$

Then there exists a positive real number \(\nu \) with the property that

$$\begin{aligned} N(X, Y) = (XY)^{n-\frac{1}{2}d(d+1)} \chi _{\infty }\prod _{p \text { prime}} \chi _p + O((XY)^{n-\frac{1}{2}d(d+1)}X^{-\nu }), \end{aligned}$$

where the local factors are the same as in Theorem 1.1.

The reader may wonder how the lower bound on n compares with that which can be extracted from [9, Theorem 2.1]. In that result, the bound on the number of variables in the case when \(\psi \) is small can be written in terms of \(\psi \) as

$$\begin{aligned} n > 2^{d-2} d(d+1)(1+\psi ^{-1}). \end{aligned}$$

It is clear that for \(\psi \ll d^{-4}\) our new result is significantly stronger.
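
To put the two bounds side by side in a concrete instance, take \(d=5\), the smallest degree covered by our theorems. Theorem 1.2 then requires

$$\begin{aligned} n > 2^{4} \cdot 5^4 \cdot 6 \cdot 7 + 10 \psi ^{-1} = 420000 + 10\psi ^{-1}, \end{aligned}$$

whereas the bound above reads \(n > 240(1+\psi ^{-1})\). The former is the less restrictive constraint as soon as \(\psi < 230/419760 \approx 5.5 \cdot 10^{-4}\), a threshold which is indeed of the order of magnitude \(d^{-4}\).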

There are many questions of interest from a geometric point of view that use height functions different from the naive one. In particular, the conjectures of Manin and Manin–Peyre (see [2, 17, 22]) are phrased in terms of an anticanonical height function, which reflects the dimension and degree of the varieties under consideration. In bi- or multihomogeneous settings, this height forces the variables to lie in domains of a hyperbolic shape, which is hard to capture from an analytic point of view. Fortunately, a very general method of transferring information between these settings has been provided by Blomer and Brüdern [7] in the context of proving Manin’s conjecture for multihomogeneous diagonal equations. The same technology has later been used by Schindler [23] in her analysis of Manin’s conjecture for bihomogeneous equations. Here, the main criterion for the method of [7] to work is that one have a good understanding of the underlying counting function over lopsided boxes. The treatment of boxes with sides of comparable length is usually tractable by minor modifications of the strategy employed for boxes with equal sides, so the main challenge lies in controlling the extreme cases when the boxes have very unequal side lengths. In view of such potential future applications, we state two by-products of our strategy which may be of independent interest; these are simplified versions of Theorems 7.1 and 7.2 below, respectively.

Theorem 1.3

Let \(F \in \mathbb {Z}[x_1, \ldots , x_n]\) be a non-singular form of degree \(d \geqslant 5\) defining a hypersurface \(\mathcal {V}\). Let further \(\psi \in (0,1/(2 d^4)]\), and suppose that

$$\begin{aligned} n\geqslant 2^{d}d(d^2-1) + \rho . \end{aligned}$$

Then there exists a positive real number \(\nu \) with the property that

$$\begin{aligned} N_{\mathbf {y}}(X) = X^{n-\frac{1}{2} d(d+1)} \mathfrak {S}_{\mathbf {y}} \mathfrak {J}_{\mathbf {y}} + O(X^{n-\frac{1}{2} d(d+1) - \nu }) \end{aligned}$$

uniformly for all \(\mathbf {y} \in \mathcal {V}_{2,\rho }(\mathbb {Z})\) satisfying \(|\mathbf {y}| \leqslant X^{\psi }\). Moreover, the local factors satisfy \(0 \leqslant \mathfrak {S}_{\mathbf {y}} \ll _{\mathbf {y}} 1\) and \(0 \leqslant \mathfrak {J}_{\mathbf {y}} \ll _{\mathbf {y}} 1\).

The set \(\mathcal {V}^*_{2,\rho }\) is clearly algebraically defined for any \(\rho \), and it is known (see e.g. [20, Lemma 2]) that \(\dim \mathcal {V}^*_{2,\rho } \leqslant n-\rho \). Thus we have \(N_{\mathbf {y}}(X; \mathcal {V}_{2,\rho })=N_{\mathbf {y}}(X)+O(X^{n-\rho })\), and we see that when \(\rho >\frac{1}{2} d(d+1)\), the anticipated main term exceeds any error that might arise if we replace \(\mathcal {V}_{2, \rho }\) by \(\mathcal {V}\) itself. This allows us to derive a bound on \(N_{\mathcal {U}}(X,Y)\) from bounds on \(N_{\mathbf {y}}(X)\).

Theorem 1.4

Let F and \(\mathcal {V}\) be as before with \(d \geqslant 5\), and for some \(\psi \in (0,1/(2d^4)]\) set \(Y=X^{\psi }\). Suppose that

$$\begin{aligned} n \geqslant 2^d d(d^2-1) \end{aligned}$$

and set \(\mathcal {U} = \mathcal {V}_{2, \frac{1}{2} d(d+1)+1}\). Then there exists a real number \(\nu > 0\) for which

$$\begin{aligned} N_{\mathcal {U}}(X,Y) = X^{n-\frac{1}{2} d(d+1)} \sum _{\begin{array}{c} \mathbf {y} \in \mathcal {U}(\mathbb {Z}) \\ |\mathbf {y}| \leqslant Y \end{array}} \mathfrak {S}_{\mathbf {y}} \mathfrak {J}_{\mathbf {y}} + O((XY)^{n-\frac{1}{2} d(d+1)-\nu }). \end{aligned}$$

Notation Throughout the paper, the following notational conventions will be observed. Any statements containing the letter \(\varepsilon \) are asserted to hold for all sufficiently small values of \(\varepsilon \), and we make no effort to track the precise ‘value’ of \(\varepsilon \), which is consequently allowed to change from one line to the next. We will be liberal in our use of vector notation. In particular, equations and inequalities involving vectors should always be understood entrywise. In this spirit, we write \(| \mathbf {x} | = \Vert \mathbf {x} \Vert _\infty = \max |x_i|\), as well as \((\mathbf {a}, b) = \gcd (a_1, \ldots , a_n, b)\). For \(\alpha \in \mathbb {R}\) we write \(\Vert \alpha \Vert = \min _{z \in \mathbb {Z}}|\alpha - z|\). Finally, the implicit constants in the Landau and Vinogradov notations are allowed to depend on all parameters except X, Y and \(\mathbf {y}\).

2 Preliminary manoeuvres

Let \(\Phi \) denote the symmetric d-linear form associated to F, so that F can be written as \(F(\mathbf {x}) = \Phi (\mathbf {x}, \ldots , \mathbf {x})\). Upon expanding, the form F may be written as

$$\begin{aligned} F\left( u \mathbf {x}+ v \mathbf {y}\right) = \sum _{j=0}^d \left( {\begin{array}{c}d\\ j\end{array}}\right) u^j v^{d-j} \Phi (\underbrace{\mathbf {x}, \ldots , \mathbf {x}}_{j \text { entries}}, \underbrace{\mathbf {y}, \ldots , \mathbf {y}}_{d-j \text { entries}}), \end{aligned}$$

and our counting function \(N_{\mathcal {U}}(X, Y)\) counts integer solutions \(\mathbf {x}, \mathbf {y} \in \mathcal {U}\) to the system of equations

$$\begin{aligned} \Phi (\underbrace{\mathbf {x}, \ldots , \mathbf {x}}_{j \text { entries}}, \underbrace{\mathbf {y}, \ldots , \mathbf {y}}_{d-j \text { entries}}) = 0 \qquad (0 \leqslant j \leqslant d), \end{aligned}$$
(2.1)

where \(|x_i| \leqslant X\) and \(|y_i|\leqslant Y\) for \(1 \leqslant i \leqslant n\).
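
For illustration, in the simplest (here excluded) case \(d=2\) the system (2.1) consists of the three equations

$$\begin{aligned} \Phi (\mathbf {x}, \mathbf {x}) = \Phi (\mathbf {x}, \mathbf {y}) = \Phi (\mathbf {y}, \mathbf {y}) = 0, \end{aligned}$$

which record that both points lie on the quadric \(\mathcal {V}\) and are orthogonal to one another with respect to the underlying bilinear form.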

In this and the following sections we fix a value of \(\mathbf {y}\) and consider (2.1) as a system of equations in \(\mathbf {x}\) only. Eventually, we will have to consider only such choices for \(\mathbf {y}\) that lie in a suitable Zariski-open subset \(\mathcal {U}\). This allows us in particular to exclude the value \(\mathbf {y} = \varvec{0}\). For \(1 \leqslant j \leqslant d\) we write \(\Phi ^{(j)}_{\mathbf {y}}(\mathbf {x})\) for the form having j entries \(\mathbf {x}\) and \(d-j\) entries \(\mathbf {y}\). In this notation, \(N_{\mathbf {y}}(X)\) denotes the number of points \(\mathbf {x} \in \mathbb {Z}^n \cap [-X, X]^n\) satisfying

$$\begin{aligned} \Phi _{\mathbf {y}}^{(j)}(\mathbf {x})&= 0 \qquad (1 \leqslant j \leqslant d). \end{aligned}$$
(2.2)

The system (2.2) consists of forms of consecutive degrees \(1, \ldots , d\). Asymptotic formulæ for the number of solutions of such systems can be obtained by the machinery of Browning and Heath-Brown [14]. However, before embarking on that argument, it is convenient to eliminate one variable by solving the linear equation, so that all forms explicitly occurring in the system have degree two or higher. To this end, observe that the equation \(\Phi _{\mathbf {y}}^{(1)}(\mathbf {x}) = 0\) can be expressed as

$$\begin{aligned} l_1(\mathbf {y}) x_1 + \ldots + l_n(\mathbf {y}) x_n = 0, \end{aligned}$$
(2.3)

where the coefficients \(l_i = l_i(\mathbf {y})\) are given by \(l_i(\mathbf {y}) = \frac{\partial }{\partial y_i} F(\mathbf {y})\), and are therefore polynomials of degree \(d-1\) in \(\mathbf {y}\). Since F is non-singular by assumption, not all \(l_i\) can vanish simultaneously, and thus the set of \(\mathbf {x} \in \mathbb {Z}^n\) satisfying (2.3) forms an \((n-1)\)-dimensional lattice \(\Lambda _{\mathbf {y}} \subseteq \mathbb {Z}^n\). Denote by \(\mathfrak {A}_{\mathbf {y}}(X) \subseteq \mathbb {Z}^n\) the set of lattice points \(\mathbf {x} \in \Lambda _{\mathbf {y}}\) for which \(|\mathbf {x}| \leqslant X\). Thus, we may equivalently consider the quantity \(N_{\mathbf {y}}(X)\) to be given by the number of points \(\mathbf {x} \in \mathfrak {A}_{\mathbf {y}}(X)\) satisfying the system of equations

$$\begin{aligned} \Phi _{\mathbf {y}}^{(j)}(\mathbf {x}) = 0 \qquad (2 \leqslant j \leqslant d). \end{aligned}$$

In order to understand the counting function \(N_{\mathbf {y}}(X)\), we encode the summation conditions in exponential sums. Write \(\varvec{\alpha }=(\alpha _2, \ldots , \alpha _{d}) \in [0,1)^{d-1}\); then \(N_{\mathbf {y}}(X)\) is given by

$$\begin{aligned} N_{\mathbf {y}}(X) = \sum _{\mathbf {x} \in \mathfrak {A}_{\mathbf {y}}(X)} \int _{[0,1)^{d-1}} e\bigg (\sum _{j=2}^d \alpha _j \Phi ^{(j)}_{\mathbf {y}}(\mathbf {x})\bigg ) \,\mathrm {d}\varvec{\alpha }= \int _{[0,1)^{d-1}}T_{\mathbf {y}}(\varvec{\alpha }; X) \,\mathrm {d}\varvec{\alpha },\ \end{aligned}$$
(2.4)

where we introduced the exponential sum

$$\begin{aligned} T_{\mathbf {y}}(\varvec{\alpha }; P)=\sum _{\mathbf {x} \in \mathfrak {A}_{\mathbf {y}}(P)}e\bigg (\sum _{j=2}^d \alpha _j \Phi ^{(j)}_{\mathbf {y}}(\mathbf {x})\bigg ). \end{aligned}$$

In our arguments below, we will omit the parameter P from the notation whenever there is no danger of confusion. In particular, we drop it in most cases when \(P=X\), highlighting it only when we consider exponential sums of size different from X.
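
Here, (2.4) rests on nothing more than the orthogonality relation

$$\begin{aligned} \int _0^1 e(\alpha m) \,\mathrm {d}\alpha = {\left\{ \begin{array}{ll} 1 &{} \text {if } m = 0,\\ 0 &{} \text {if } m \in \mathbb {Z} {\setminus } \{0\}, \end{array}\right. } \end{aligned}$$

applied in each of the \(d-1\) coordinates of \(\varvec{\alpha }\), so that the integral detects precisely those \(\mathbf {x}\) for which all the forms \(\Phi ^{(j)}_{\mathbf {y}}(\mathbf {x})\) with \(2 \leqslant j \leqslant d\) vanish.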

For simpler notation below, we write \(s = n-1\). The following facts are all contained in [19, Lemma 1] or easy consequences thereof. By part (i) of that lemma, the lattice \(\Lambda _{\mathbf {y}}\) has discriminant

$$\begin{aligned} d(\Lambda _{\mathbf {y}})\ll | \mathbf {l}(\mathbf {y})| \ll |\mathbf {y}|^{d-1}. \end{aligned}$$
(2.5)

Since we will require \(\psi < (2d^2)^{-1}\) (see (3.4) below) and thus \(d(\Lambda _{\mathbf {y}}) \ll Y^{d-1} \ll X^{1/d}\), it follows that \({{\,\mathrm{Card}\,}}\mathfrak {A}_{\mathbf {y}}(X) \asymp X^{s}/d(\Lambda _{\mathbf {y}})\). We denote the successive minima of the lattice \(\Lambda _{\mathbf {y}}\) by \(\mu _1 \leqslant \ldots \leqslant \mu _s\), setting \(\mu _{\mathrm {max}} = \mu _s\), and recall that \(\mu _1 \cdots \mu _s \asymp d (\Lambda _{\mathbf {y}})\). Fix a basis \(\mathcal {B} = \{\mathbf {b}_1, \ldots , \mathbf {b}_s\} \subseteq \mathbb {R}^n\) of \(\Lambda _{\mathbf {y}}\), which by part (iii) of the same lemma we are free to choose in such a way that when \(\mathbf {x} \in \mathfrak {A}_{\mathbf {y}}(X)\) with \(\mathbf {x} =\xi _1 \mathbf {b}_1 + \ldots + \xi _s \mathbf {b}_s\), we have \(\xi _i \ll X/|\mathbf {b}_i|\) for \(1 \leqslant i \leqslant s\). We label the basis elements in increasing order, so that \(|\mathbf {b}_1| \leqslant |\mathbf {b}_2| \leqslant \ldots \leqslant |\mathbf {b}_s|\). With \(\mathcal {B}\) thus chosen, it is an immediate consequence of the second statement of part (iii) of the aforementioned lemma, together with the definition of the successive minima, that \(|\mathbf {b}_i| \asymp \mu _i\) for \(1 \leqslant i \leqslant s\).

Set

$$\begin{aligned} \mathfrak {B}_{\mathbf {y}}(X) = \prod _{i=1}^s [-c X/\mu _i, c X/\mu _i], \end{aligned}$$

where c is a suitably large absolute constant, chosen so that the coordinate vector \(\varvec{\xi }\) of \(\mathbf {x}\) lies in \(\mathfrak {B}_{\mathbf {y}}(X)\) whenever \(\mathbf {x} \in \mathfrak {A}_{\mathbf {y}}(X)\). Moreover, for \(2 \leqslant j \leqslant d\) set \(\Psi ^{(j)}_{\mathbf {y}}(\varvec{\xi })=\Phi ^{(j)}_{\mathbf {y}}(\mathbf {x})\) and write

$$\begin{aligned} \phi _{\mathbf {y}}(\varvec{\alpha };\varvec{\xi }) = \sum _{j=2}^d \alpha _j \Psi ^{(j)}_{\mathbf {y}}(\varvec{\xi }). \end{aligned}$$

By an argument along the lines of that of Lemma 5.2 in [8] one sees that

$$\begin{aligned} T_{\mathbf {y}}(\varvec{\alpha }) \ll X^{\varepsilon }U_{\mathbf {y}}(\varvec{\alpha }), \end{aligned}$$
(2.6)

where

$$\begin{aligned} U_{\mathbf {y}}(\varvec{\alpha })=\sup _{\varvec{\eta }\in [0,1]^{d-1}}\Bigg |\sum _{\varvec{\xi }\in \mathfrak {B}_{\mathbf {y}}(X)}e(\phi _{\mathbf {y}}(\varvec{\alpha }; \varvec{\xi }) + \varvec{\eta }\cdot \varvec{\xi })\Bigg |. \end{aligned}$$

This exponential sum is related to that considered by Schindler and Sofos [24] in their treatment of forms in many variables over lopsided boxes. In comparison with their result, however, our argument is more sensitive to the degree of the lopsidedness of the box. Fortunately, the discriminant of our lattice is fairly small. Indeed, since our methods will break down when \(\psi \gg 1/d^2\) (see (3.4) below), and for our theorems we require even \(\psi \ll 1/d^4\), we find ourselves in a situation where the discriminant of our lattice satisfies the bound \(d(\Lambda _{\mathbf {y}}) \ll X^{(d-1)\psi } \ll X^{O(1/d^3)}\).

3 Van der Corput differences

The discrete differencing operator \(\partial \) is defined by its action on a polynomial F via the relation \(\partial _{\mathbf {h}}F(\mathbf {x}) = F(\mathbf {x} + \mathbf {h})-F(\mathbf {x})\), and we write

$$\begin{aligned} \partial _{\mathbf {h}_i, \ldots , \mathbf {h}_1}F(\mathbf {x}) = \partial _{\mathbf {h}_i} \cdots \partial _{\mathbf {h}_1} F(\mathbf {x}) \end{aligned}$$

for its i-fold iteration. Before stating our basic differencing lemma, which is fairly straightforward and essentially follows from [6, Lemma 2.1], we recall the effect of \(\partial \) on a single form.
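
If \(F(\mathbf {x}) = \Phi (\mathbf {x}, \ldots , \mathbf {x})\) is a form of degree d, then every application of \(\partial _{\mathbf {h}}\) lowers the degree in \(\mathbf {x}\) by one, and the \((d-1)\)-fold difference is linear in \(\mathbf {x}\); indeed, one has the classical identity

$$\begin{aligned} \partial _{\mathbf {h}_{d-1}, \ldots , \mathbf {h}_1}F(\mathbf {x}) = d! \, \Phi (\mathbf {h}_1, \ldots , \mathbf {h}_{d-1}, \mathbf {x}) + \gamma (\mathbf {h}_1, \ldots , \mathbf {h}_{d-1}), \end{aligned}$$

where \(\gamma \) is independent of \(\mathbf {x}\). For \(d=2\) this reads \(\partial _{\mathbf {h}}F(\mathbf {x}) = 2\Phi (\mathbf {x}, \mathbf {h}) + \Phi (\mathbf {h}, \mathbf {h})\).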

Lemma 3.1

Let \(1 \leqslant i \leqslant d-1\). Then one has

$$\begin{aligned} | U_{\mathbf {y}}(\varvec{\alpha })|^{2^{i}} \ll \left( \frac{X^s}{d(\Lambda _{\mathbf {y}})}\right) ^{(2^{i}-i-1)} \sum _{\begin{array}{c} {\mathbf {h}_l \in \mathfrak {B}_{\mathbf {y}}(X)}\\ {1 \leqslant l \leqslant i} \end{array}} \Bigg |\sum _{\varvec{\xi }\in \mathfrak {C}}e\bigg (\partial _{\mathbf {h}_i, \ldots , \mathbf {h}_1} \phi _{\mathbf {y}}(\varvec{\alpha };\varvec{\xi })\bigg )\Bigg |, \end{aligned}$$

where the sets \(\mathfrak {C} = \mathfrak {C}(\mathbf {h}_1, \ldots , \mathbf {h}_i)\) are boxes (possibly empty) contained inside \(\mathfrak {B}_{\mathbf {y}}(X)\).

Proof

Upon recalling that \({{\,\mathrm{Card}\,}}\mathfrak {B}_{\mathbf {y}}(X) \asymp X^s/d(\Lambda _{\mathbf {y}})\), this is a straightforward reformulation of the standard Weyl differencing procedure as for instance in Davenport’s monograph [16, Chapter 13]. \(\square \)
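
For orientation, the basic step underlying Lemma 3.1 is the squaring identity

$$\begin{aligned} \Bigg |\sum _{\varvec{\xi }\in \mathfrak {C}_0}e(f(\varvec{\xi }))\Bigg |^2 = \sum _{\mathbf {h}} \sum _{\varvec{\xi }\in \mathfrak {C}_0 \cap (\mathfrak {C}_0 - \mathbf {h})} e(\partial _{\mathbf {h}}f(\varvec{\xi })), \end{aligned}$$

valid for any finite box \(\mathfrak {C}_0\) and any function f. Iterating this i times and bounding the number of available differencing vectors trivially produces the factor \((X^s/d(\Lambda _{\mathbf {y}}))^{2^{i}-i-1}\) occurring in the lemma.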

At this stage, the usual procedure would be to apply Lemma 3.1 with \(i=d-1\), so that the argument of the exponential function becomes linear, thus yielding either a non-trivial upper bound or good approximations to the coefficient \(\alpha _d\). In the situation at hand, however, this approach would lose all information connected to the forms \(\Psi ^{(j)}_{\mathbf {y}}\) with \(j<d\). So instead we follow the approach by Browning and Heath-Brown [14] and replace the last Weyl differencing step by a suitable van der Corput step. For \(2 \leqslant j \leqslant d\) we define functions \(B_{\mathbf {y},m}^{(j)}\) for \(1 \leqslant m \leqslant s\) via the relation

$$\begin{aligned} \Psi _{\mathbf {y}}^{(j)}(\varvec{\xi }, \mathbf {h}_1, \ldots , \mathbf {h}_{j-1}) = \sum _{m=1}^s \xi _m B_{\mathbf {y},m}^{(j)}(\mathbf {h}_1, \ldots , \mathbf {h}_{j-1}). \end{aligned}$$
(3.1)

Furthermore, let \(\theta _2, \ldots , \theta _d\) be parameters in the unit interval which will be fixed later, and define

$$\begin{aligned} \nu _j = (j-1) \theta _j \qquad \text {and} \qquad \omega _j = \sum _{i=j}^{d} \nu _i \qquad (2 \leqslant j \leqslant d). \end{aligned}$$
(3.2)

Set further \(D_j =\frac{1}{2}j(j+1)\) for \(1 \leqslant j \leqslant d\), and for integers \(q_j\) with \(2 \leqslant j \leqslant d\) put

$$\begin{aligned} Q_j = \prod _{i=j}^d q_i. \end{aligned}$$
(3.3)

For notational reasons we write

$$\begin{aligned} D = D_d, \qquad D_0 = 0, \qquad \omega _{d+1} = 0 \qquad \text {and} \qquad Q_{d+1}=1, \end{aligned}$$

and we assume

$$\begin{aligned} \psi <1/(2d^2) \end{aligned}$$
(3.4)

throughout. For fixed \(\theta _{j+1}, \ldots , \theta _{d}\) set

$$\begin{aligned} R_j = X^{1-\omega _{j+1}} |\mathbf {y}|^{-D_{d-j}} \mu _{\mathrm {max}}^{-(d-j)} \end{aligned}$$
(3.5)

and

$$\begin{aligned} \Upsilon _j=\sum _{\begin{array}{c} {\mathbf {h}_l \in \mathfrak {B}_{\mathbf {y}} (X)}\\ {1 \leqslant l \leqslant j-2} \end{array}}\sum _{\mathbf {w} \in \mathfrak {B}_{\mathbf {y}}( 2R_j)} \prod _{m=1}^s \min \Bigg \{\frac{X}{\mu _m}, \bigg \Vert j!Q_{j+1}\alpha _{j} B_{\mathbf {y},m}^{(j)}(\mathbf {h}_1, \ldots , \mathbf {h}_{j-2}, \mathbf {w}) \bigg \Vert ^{-1} \Bigg \} . \end{aligned}$$
(3.6)

We can now state one of our key iterative lemmas.

Lemma 3.2

Let \(j \in \{2, \ldots , d\}\) be fixed. When \(j < d\), suppose that \(\theta _{j+1}, \ldots , \theta _{d}\) are fixed in such a way that, observing (3.2), one has

$$\begin{aligned} \omega _{j+1} + \psi (D_{d-j} + (d-1)(d-j))< 1. \end{aligned}$$
(3.7)

Suppose that for any i with \(j < i \leqslant d\) there exists a natural number \(q_i\) satisfying \(q_i \ll X^{\nu _i}|\mathbf {y}|^{d-i} \mu _{\mathrm {max}}\) with the property that, in view of (3.3), one has

$$\begin{aligned} \big \Vert Q_i \alpha _i\big \Vert \ll X^{-i + \omega _i}|\mathbf {y}|^{D_{d-i}}\mu _{\mathrm {max}}^{d-i+1}. \end{aligned}$$

Then we have the bound

$$\begin{aligned} |U_{\mathbf {y}}(\varvec{\alpha })|^{2^{j-1}} \ll \left( \frac{X^s}{d(\Lambda _{\mathbf {y}})}\right) ^{2^{j-1}-(j-1)}\left( \frac{R_j^s}{d(\Lambda _{\mathbf {y}})}\right) ^{-1} \Upsilon _j. \end{aligned}$$

Proof

Suppose first that \(j > 2\). In this case, applying Lemma 3.1 with \(i=j-2\) followed by an application of Cauchy’s inequality gives

$$\begin{aligned} | U_{\mathbf {y}}(\varvec{\alpha }) |^{2^{j-1}} \ll \left( \frac{X^s}{d(\Lambda _{\mathbf {y}})}\right) ^{2^{j-1}-j} \sum _{\begin{array}{c} {\mathbf {h}_l \in \mathfrak {B}_{\mathbf {y}}(X)}\\ {1 \leqslant l \leqslant j-2} \end{array}} \left| \sum _{\varvec{\xi }\in \mathfrak {C}_1} e(\partial _{\mathbf {h}_1, \ldots , \mathbf {h}_{j-2}} \phi _{\mathbf {y}}(\varvec{\alpha };\varvec{\xi }) )\right| ^2 \end{aligned}$$
(3.8)

for suitable boxes \(\mathfrak {C}_1 = \mathfrak {C}_1(\mathbf {h}_1, \ldots , \mathbf {h}_{j-2}) \subseteq \mathfrak {B}_{\mathbf {y}}(X)\). Let now \(\mathbf {h}_1, \ldots , \mathbf {h}_{j-2}\) be temporarily fixed, and observe that our hypothesis concerning the size of the \(q_i\) implies via (3.3) that \(R_j Q_{j+1} \ll X\). Consequently, we have

$$\begin{aligned}&\frac{R_j^s}{d(\Lambda _{\mathbf {y}})} \sum _{\varvec{\xi }\in \mathfrak {C}_1} e(\partial _{\mathbf {h}_1, \ldots , \mathbf {h}_{j-2}} \phi _{\mathbf {y}}(\varvec{\alpha };\varvec{\xi }) )\nonumber \\&\quad \ll \sum _{\mathbf {u} \in \mathfrak {B}_{\mathbf {y}}( R_j)}\sum _{\begin{array}{c} \varvec{\xi }\\ \varvec{\xi }+ Q_{j+1}\mathbf {u} \in \mathfrak {C}_1 \end{array}} e(\partial _{\mathbf {h}_1, \ldots , \mathbf {h}_{j-2}} \phi _{\mathbf {y}}(\varvec{\alpha };\varvec{\xi }+ Q_{j+1} \mathbf {u}) ). \end{aligned}$$
(3.9)

We denote by \(\mathfrak {C}_2\) the set of \(\varvec{\xi }\) for which \(\varvec{\xi }+ Q_{j+1}\mathbf {u} \in \mathfrak {C}_1\) for some \(\mathbf {u} \in \mathfrak {B}_{\mathbf {y}}(R_j)\); this box has cardinality \({{\,\mathrm{Card}\,}}\mathfrak {C}_2 \asymp X^s/d(\Lambda _{\mathbf {y}})\). Then with another application of Cauchy’s inequality one obtains from (3.9) the bound

$$\begin{aligned}&\left( \frac{R_j^{s}}{d(\Lambda _{\mathbf {y}})}\right) ^2 \left| \sum _{\varvec{\xi }\in \mathfrak {C}_1}e(\partial _{\mathbf {h}_1, \ldots , \mathbf {h}_{j-2}} \phi _{\mathbf {y}}(\varvec{\alpha };\varvec{\xi }))\right| ^2 \\&\qquad \ll \frac{X^{s}}{d(\Lambda _{\mathbf {y}})} \sum _{\varvec{\xi }\in \mathfrak {C}_2} \Bigg | \sum _{\begin{array}{c} \mathbf {u} \in \mathfrak {B}_{\mathbf {y}}( R_j)\\ \varvec{\xi }+ Q_{j+1}\mathbf {u} \in \mathfrak {C}_1 \end{array}} e(\partial _{\mathbf {h}_1, \ldots , \mathbf {h}_{j-2}} \phi _{\mathbf {y}}(\varvec{\alpha };\varvec{\xi }+ Q_{j+1} \mathbf {u}) )\Bigg |^2 \\&\qquad \ll \frac{X^{s}}{d(\Lambda _{\mathbf {y}})} \sum _{\mathbf {u}, \mathbf {v} \in \mathfrak {B}_{\mathbf {y}}( R_j) } \left| \sum _{\varvec{\xi }} e\big (\partial _{\mathbf {h}_1, \ldots , \mathbf {h}_{j-2}} (\phi _{\mathbf {y}}(\varvec{\alpha };\varvec{\xi }+ Q_{j+1} \mathbf {u}) - \phi _{\mathbf {y}}(\varvec{\alpha };\varvec{\xi }+ Q_{j+1} \mathbf {v}))\big )\right| , \end{aligned}$$

where the inner sum runs over all \(\varvec{\xi }\) for which both \(\varvec{\xi }+ Q_{j+1}\mathbf {u}\) and \(\varvec{\xi }+ Q_{j+1}\mathbf {v}\) lie in \(\mathfrak {C}_1\).

We now make the change of variables \(\varvec{\xi }' = \varvec{\xi }+ Q_{j+1}\mathbf {v}\) and \(\mathbf {w} = \mathbf {u} - \mathbf {v}\), so that

$$\begin{aligned} \phi _{\mathbf {y}}(\varvec{\alpha };\varvec{\xi }+ Q_{j+1} \mathbf {u}) -\phi _{\mathbf {y}}(\varvec{\alpha };\varvec{\xi }+ Q_{j+1} \mathbf {v}) = \partial _{Q_{j+1}\mathbf {w}} \phi _{\mathbf {y}}(\varvec{\alpha };\varvec{\xi }'). \end{aligned}$$

Thus, upon summing trivially over \(\mathbf {v}\), we have shown that

$$\begin{aligned}&\left| \sum _{\varvec{\xi }\in \mathfrak {C}_1} e( \partial _{\mathbf {h}_1, \ldots , \mathbf {h}_{j-2}} \phi _{\mathbf {y}}(\varvec{\alpha };\varvec{\xi })) \right| ^2\\&\quad \ll \left( \frac{X}{R_j}\right) ^s \sum _{\mathbf {w} \in \mathfrak {B}_{\mathbf {y}}(2R_j)} \sup _{\mathfrak {C} \subseteq \mathfrak {B}_{\mathbf {y}}(X)}\left| \sum _{\varvec{\xi }\in \mathfrak {C}} e(\partial _{\mathbf {h}_1, \ldots , \mathbf {h}_{j-2}, Q_{j+1}\mathbf {w}} \phi _{\mathbf {y}}(\varvec{\alpha };\varvec{\xi }))\right| , \end{aligned}$$

where the supremum is over all coordinate-aligned boxes \(\mathfrak {C}\) inside \(\mathfrak {B}_{\mathbf {y}}(X)\). Thus, upon combining this bound with (3.8), it follows that the exponential sum can be bounded above via

$$\begin{aligned} | U_{\mathbf {y}}(\varvec{\alpha }) |^{2^{j-1}} \ll \left( \frac{X^s}{d(\Lambda _{\mathbf {y}})}\right) ^{2^{j-1}-(j-1)} \left( \frac{R_j^s}{d(\Lambda _{\mathbf {y}})}\right) ^{-1} \mathcal {W}_j, \end{aligned}$$

where

$$\begin{aligned} \mathcal {W}_j = \sum _{\begin{array}{c} {\mathbf {h}_l \in \mathfrak {B}_{\mathbf {y}}(X)}\\ {1 \leqslant l \leqslant j-2} \end{array}} \sum _{\mathbf {w} \in \mathfrak {B}_{\mathbf {y}}(2 R_j)} \sup _{\mathfrak {C} \subseteq \mathfrak {B}_{\mathbf {y}}(X)} \Bigg | \sum _{\varvec{\xi }\in \mathfrak {C}} e\bigg (\partial _{\mathbf {h}_1, \ldots , \mathbf {h}_{j-2}, Q_{j+1}\mathbf {w}} \sum _{i=2}^d \alpha _i \Psi _{\mathbf {y}}^{(i)}(\varvec{\xi }) \bigg )\Bigg |. \end{aligned}$$
(3.10)

An analogous bound is also derived easily in the omitted case when \(j=2\) upon interpreting the empty sum over \(\mathbf {h}_l\) and the concomitant differences as void, and noting that the phase factor in \(U_{\mathbf {y}}(\varvec{\alpha })\) disappears in the van der Corput step.

The size of the innermost exponential sum in (3.10) is dominated by the term corresponding to \(i=j\). In fact, observe that after the \(j-1\) differencing steps only the terms \(\Psi _{\mathbf {y}}^{(i)}(\varvec{\xi })\) with \( i \geqslant j\) occur explicitly in the argument of the exponential, and due to the final \(Q_{j+1}\)-van der Corput step all of these contain a factor \(Q_{j+1}\). Hence whenever \(j < d\) and \(1 \leqslant l \leqslant s\) one has

$$\begin{aligned}&\frac{\partial }{\partial \xi _l} e\bigg (\partial _{\mathbf {h}_1, \ldots , \mathbf {h}_{j-2}, Q_{j+1}\mathbf {w}} \sum _{i=j+1}^d \alpha _i \Psi _{\mathbf {y}}^{(i)}(\varvec{\xi }) \bigg ) \\&\quad \ll \sum _{i=j+1}^d \left\| Q_{j+1} \alpha _i\right\| X^{i-2} R_j |\mathbf {y}|^{d-i} \mu _l \\&\quad \ll \sum _{i=j+1}^d \left| \frac{Q_{j+1}}{Q_i}\right| \left\| Q_i\alpha _i\right\| X^{i-1-\omega _{j+1}}|\mathbf {y}|^{-D_{d-j}+d-i } \mu _{\mathrm {max}}^{-(d-j)}\mu _l \\&\quad \ll X^{-1} \mu _l, \end{aligned}$$

where in the last step we used the hypotheses of the lemma. Upon iterating this procedure, one confirms for any subset \(\{l_1, \ldots , l_k\} \subseteq \{1, \ldots , s\}\) that

$$\begin{aligned} \frac{\partial ^k}{\partial \xi _{l_1} \cdots \partial \xi _{l_k}} e\bigg (\partial _{\mathbf {h}_1, \ldots , \mathbf {h}_{j-2}, Q_{j+1}\mathbf {w}} \sum _{i=j+1}^d \alpha _i \Psi _{\mathbf {y}}^{(i)}(\varvec{\xi }) \bigg ) \ll X^{-k} \mu _{l_1} \cdots \mu _{l_k}. \end{aligned}$$

Suppose that \(\mathfrak {C} = \prod _i [C_i, C_i']\), recalling that \(\mathfrak {C} \subseteq \mathfrak {B}_{\mathbf {y}}(X)\) forces \(\max \{|C_i|, |C_i'|\} \ll X/\mu _i\) for \( 1 \leqslant i \leqslant s\). Thus, it follows from multidimensional partial summation that

$$\begin{aligned}&\sum _{\varvec{\xi }\in \mathfrak {C}} e\bigg (\partial _{\mathbf {h}_1, \ldots , \mathbf {h}_{j-2}, Q_{j+1}\mathbf {w}} \sum _{i=j}^d \alpha _i \Psi _{\mathbf {y}}^{(i)}(\varvec{\xi }) \bigg )\\&\quad \ll \bigg |\sum _{\varvec{\xi }\in \mathfrak {C}} e(\partial _{\mathbf {h}_1, \ldots , \mathbf {h}_{j-2}, Q_{j+1}\mathbf {w}} \alpha _j \Psi _{\mathbf {y}}^{(j)}(\varvec{\xi }))\bigg | \\&\qquad + \sum _{l=1}^s \frac{\mu _l}{X}\int _{C_{l}}^{C'_{l}} \bigg |\sum _{\begin{array}{c} \varvec{\xi }\in \mathfrak {C} \\ \xi _l \leqslant t \end{array}} e(\partial _{\mathbf {h}_1, \ldots , \mathbf {h}_{j-2}, Q_{j+1}\mathbf {w}} \alpha _j \Psi _{\mathbf {y}}^{(j)}(\varvec{\xi }))\bigg | \,\mathrm {d}t \\&\qquad + \ldots + \frac{\mu _1 \cdots \mu _s}{X^s} \int _{\mathfrak {C}} \bigg |\sum _{\begin{array}{c} \varvec{\xi }\in \mathfrak {C} \\ \xi _l \leqslant t_l\, (1 \leqslant l \leqslant s) \end{array}} e(\partial _{\mathbf {h}_1, \ldots , \mathbf {h}_{j-2}, Q_{j+1}\mathbf {w}} \alpha _j \Psi _{\mathbf {y}}^{(j)}(\varvec{\xi }))\bigg | \,\mathrm {d}\mathbf {t}\\&\quad \ll \sup _{\mathfrak {C}' \subseteq \mathfrak {C}}\bigg |\sum _{\varvec{\xi }\in \mathfrak {C}'} e(\partial _{\mathbf {h}_1, \ldots , \mathbf {h}_{j-2}, Q_{j+1}\mathbf {w}}\alpha _j \Psi _{\mathbf {y}}^{(j)}(\varvec{\xi }))\bigg |, \end{aligned}$$

where the supremum is over all coordinate-aligned boxes \(\mathfrak {C}' \subseteq \mathfrak {C}\). Thus, we discern that the dominant contribution arises indeed from the term of degree j, so that

$$\begin{aligned} \mathcal {W}_j \ll \sum _{\begin{array}{c} {\mathbf {h}_l \in \mathfrak {B}_{\mathbf {y}}(X)}\\ {1 \leqslant l \leqslant j-2} \end{array}} \sum _{\mathbf {w} \in \mathfrak {B}_{\mathbf {y}}(2R_j)} \sup _{\mathfrak {C} \subseteq \mathfrak {B}_{\mathbf {y}}(X)} \left| \sum _{\varvec{\xi }\in \mathfrak {C}} e\bigg (\alpha _j \partial _{\mathbf {h}_1, \ldots , \mathbf {h}_{j-2}, Q_{j+1} \mathbf {w}} \Psi _{\mathbf {y}}^{(j)}(\varvec{\xi }) \bigg )\right| . \end{aligned}$$

The argument of the exponential is now linear in \(\varvec{\xi }\). Since \(\mathfrak {C} \subseteq \mathfrak {B}_{\mathbf {y}}(X)\) is a box oriented along the coordinate axes, upon recalling the definition (3.1) the standard estimate on linear exponential sums yields the bound

$$\begin{aligned}&\left| \sum _{\varvec{\xi }\in \mathfrak {C}} e\bigg (\alpha _j \partial _{\mathbf {h}_1, \ldots , \mathbf {h}_{j-2}, Q_{j+1} \mathbf {w}} \Psi _{\mathbf {y}}^{(j)}(\varvec{\xi }) \bigg )\right| \\&\quad \ll \prod _{m=1}^s \min \Bigg \{\frac{X}{\mu _m}, \bigg \Vert j!Q_{j+1}\alpha _{j} B_{\mathbf {y},m}^{(j)}(\mathbf {h}_1, \ldots , \mathbf {h}_{j-2}, \mathbf {w}) \bigg \Vert ^{-1} \Bigg \}. \end{aligned}$$

Thus we have shown that \(\mathcal {W}_j \ll \Upsilon _j\) and the proof of the lemma is complete. \(\square \)

4 Geometry of numbers and a non-singularity condition

The next step is to estimate the quantity \(\Upsilon _j\). For positive real numbers U, V, W set

$$\begin{aligned} N_{j,\mathbf {y}}(U,V; W)= {{\,\mathrm{Card}\,}}&\bigg \{ \mathbf {h}_1, \ldots , \mathbf {h}_{j-2} \in \mathfrak {B}_{\mathbf {y}}(U), \mathbf {z} \in \mathfrak {B}_{\mathbf {y}}(V), \nonumber \\&\quad \Big \Vert j!Q_{j+1}\alpha _{j} B_{\mathbf {y}, m}^{(j)}(\mathbf {h}_1, \ldots , \mathbf {h}_{j-2}, \mathbf {z}) \Big \Vert < \frac{\mu _m}{W} \quad (1 \leqslant m \leqslant s)\bigg \}. \end{aligned}$$
(4.1)

In this notation, standard arguments similar to those in the proof of [16, Lemma 13.2] show that for any fixed \(\theta _{j+1}, \ldots , \theta _d\) one has

$$\begin{aligned} \Upsilon _j \ll \left( \frac{X^s}{d(\Lambda _{\mathbf {y}})}\right) ^{1+\varepsilon } N_{j,\mathbf {y}}(X,R_j; X). \end{aligned}$$
(4.2)

Our next goal is to bound the size of \(N_{j,\mathbf {y}}(X,R_j; X)\). For this purpose we need a generalisation of Davenport’s lemma on the geometry of numbers (see [16, Lemma 12.6]). Let \(A_{k, m}>1\) be real numbers for \(1 \leqslant k \leqslant j-1\), \(1 \leqslant m \leqslant s\), and write

$$\begin{aligned} \mathscr {A}_k = \prod _{m=1}^s [-A_{k,m}, A_{k,m}] \qquad (1 \leqslant k \leqslant j-1). \end{aligned}$$

Let further \(0 < Z_k \leqslant 1\) for \(1 \leqslant k \leqslant j-1\). For any l with \(1 \leqslant l \leqslant j-1\) write \(\mathcal {R}_l(Z)\) for the number of \(\varvec{\xi }_1, \ldots , \varvec{\xi }_{j-1} \in \mathbb {Z}^{s}\) such that \(\varvec{\xi }_k \in Z_k \mathscr {A}_k\) for all \(1 \leqslant k \leqslant j-1\) with \(k \ne l\) and \(\varvec{\xi }_l \in Z \mathscr {A}_l\), having the property that

$$\begin{aligned} \left\| j!Q_{j+1}\alpha _{j} B_{\mathbf {y}, m}^{(j)}(\varvec{\xi }_1, \ldots , \varvec{\xi }_{j-1}) \right\| \leqslant Z A_{l,m}^{-1} \qquad (1 \leqslant m \leqslant s). \end{aligned}$$

In this notation, Schindler and Sofos [24] give the following variant of Davenport’s result.

Lemma 4.1

(Lemma 2.4 in [24]). Fix \(Z_1, \ldots , Z_{j-1} \in (0,1]\) and l with \(1 \leqslant l \leqslant j-1\). For any Z, \(Z'\) in the range \(0 < Z' \leqslant Z \leqslant 1\) one has

$$\begin{aligned} \mathcal {R}_l(Z) \ll (Z / Z')^s \mathcal {R}_l(Z'). \end{aligned}$$

Suppose that \(\theta _{j+1}, \ldots , \theta _d\) are fixed in such a way that (3.7) is satisfied. For any \(\theta _j\) satisfying

$$\begin{aligned} 0< \theta _j \leqslant 1-\omega _{j+1}-\psi (D_{d-j}+(d-1)(d-j)) \end{aligned}$$
(4.3)

and all \(1 \leqslant m \leqslant s\) we set

$$\begin{aligned} A_{k,m}&= (X/ \mu _m)X^{\textstyle {\frac{(1-\theta _j)(k-1)}{2}}},&Z_k&=X^{-\textstyle {\frac{(1-\theta _j)(k-1)}{2}}},&Z_k'&= X^{- \textstyle {\frac{ (1-\theta _j)(k+1)}{2}}} \end{aligned}$$

for \(1 \leqslant k \leqslant j-2\), and

$$\begin{aligned} A_{j-1,m}&=\frac{(R_jX)^{1/2}}{\mu _m}X^{\textstyle {\frac{(1-\theta _j)(j-2)}{2}}}, \qquad Z_{j-1} =\left( \frac{R_j}{X}\right) ^{1/2} X^{-\textstyle {\frac{(1-\theta _j)(j-2)}{2}}}, \\ Z_{j-1}'&= \left( \frac{X}{R_j}\right) ^{1/2} X^{-\textstyle {\frac{(1-\theta _j)j}{2}}}. \end{aligned}$$

Thus \(0< Z_{k}' < Z_{k} \leqslant 1\) for all k, and one has

$$\begin{aligned} A_{k,m} Z_k&= X/\mu _m, \qquad A_{k,m} Z_k' = X^{\theta _j}/\mu _m, \qquad Z_k/Z_k' = X^{1-\theta _j}, \\ Z_k'/A_{k,m}&= \mu _m X^{-1-(1-\theta _j)k} = Z_{k+1}/A_{k+1,m} \end{aligned}$$

for \(1 \leqslant k \leqslant j-2\), and

$$\begin{aligned} A_{j-1,m} Z_{j-1}&= R_j/\mu _m, \qquad A_{j-1,m}Z'_{j-1} = X^{\theta _j}/\mu _m, \qquad Z_{j-1}/Z_{j-1}'= R_jX^{-\theta _j},\\ Z_{j-1}/A_{j-1,m}&=\mu _mX^{-1-(1-\theta _j)(j-2)}, \qquad Z'_{j-1}/A_{j-1,m} = \frac{\mu _m}{R_j} X^{-(j-1)(1-\theta _j)}. \end{aligned}$$

Note here that (4.3) implies via (3.5) and (2.5) that \(R_j> X^{\theta _j}\). Applying Lemma 4.1 consecutively for the indices \(k=1, \ldots , j-1\) shows that

$$\begin{aligned} N_{j,\mathbf {y}}(X,R_j; X) \ll X^{(j-2)(1-\theta _j)s} (R_j X^{-\theta _j})^s N_{j,\mathbf {y}}(X^{\theta _j}, X^{\theta _j}; X^{(j-1)(1-\theta _j)}R_j), \end{aligned}$$

and hence we infer from (4.2) that

$$\begin{aligned} \Upsilon _j \ll \frac{X^{(j-1)(1 -\theta _j)s + \varepsilon }R_j^s}{d(\Lambda _{\mathbf {y}})}N_{j,\mathbf {y}}(X^{\theta _j}, X^{\theta _j}; X^{(j-1)(1-\theta _j)}R_j). \end{aligned}$$
(4.4)

If we now make the assumption that \(|T_{\mathbf {y}}(\varvec{\alpha })| \gg (X^{s}/d(\Lambda _{\mathbf {y}}))X^{-k_j \theta _j}\) for some \(k_j>0\) and some \(\theta _j\) satisfying (4.3), we obtain from Lemma 3.2 together with (2.6) and (4.4) the bound

$$\begin{aligned} N_{j,\mathbf {y}}(X^{\theta _j}, X^{\theta _j}; X^{(j-1)(1-\theta _j)}R_j) \gg \left( \frac{X^{\theta _js}}{d(\Lambda _{\mathbf {y}})}\right) ^{j-1} X^{ -2^{j-1}k_j \theta _j - \varepsilon }. \end{aligned}$$

The diophantine approximation condition that is implicit in (4.1) is satisfied either because the functions \(B_{\mathbf {y}, m}^{(j)}\) (\(1 \leqslant m \leqslant s\)) vanish for geometric reasons, or because \(\alpha _j\) admits a good rational approximation. Suppose that \(j! B_{\mathbf {y}, m}^{(j)}(\mathbf {h}_1, \ldots , \mathbf {h}_{j-1})\) is non-zero for some m and some choice of \(\mathbf {h}_1, \ldots , \mathbf {h}_{j-1}\) counted by \(N_{j,\mathbf {y}}(X^{\theta _j}, X^{\theta _j}; X^{(j-1)(1-\theta _j)}R_j)\), and denote its absolute value by \(q_j\). Then \(q_j \ll X^{\nu _j}|\mathbf {y}|^{d-j}\mu _{\mathrm {max}}\), and the approximation condition implied by the definition (4.1) takes the shape

$$\begin{aligned} \Vert \alpha _{j}q_jQ_{j+1}\Vert \ll \mu _{\mathrm {max}} X^{(j-1)(\theta _j-1)}R_j^{-1} \ll X^{-j+\omega _j}|\mathbf {y}|^{D_{d-j}}\mu _{\mathrm {max}}^{d-j+1}. \end{aligned}$$

We summarise the conclusions of our arguments in a lemma.

Lemma 4.2

Let \(j \in \{2, \ldots , d\}\) be fixed. Recalling (3.2), when \(j < d\) assume that \(\theta _{j+1}, \ldots , \theta _d\) are such that (3.7) is satisfied. Suppose that for any i with \(j < i \leqslant d\) there are positive integers \(q_i \ll X^{\nu _i}|\mathbf {y}|^{d-i}\mu _{\mathrm {max}}\) with the property that, in view of (3.3), one has

$$\begin{aligned} \big \Vert Q_i \alpha _i\big \Vert \ll X^{-i + \omega _i}|\mathbf {y}|^{D_{d-i}}\mu _{\mathrm {max}}^{d-i+1}. \end{aligned}$$

Finally, take \(k_j>0\) and \(\theta _j>0\) to be parameters, where \(\theta _j\) satisfies (4.3). For any \(\varvec{\alpha }\in [0,1)^{d-1}\) one of the following holds.

  (A)

    The exponential sum is bounded by

    $$\begin{aligned} |T_{\mathbf {y}}(\varvec{\alpha })| \ll \frac{X^{s}}{d(\Lambda _{\mathbf {y}})} X^{-k_j \theta _j+\varepsilon }. \end{aligned}$$
  (B)

    There exist integers \(a_j\) and \(q_j\) satisfying \(1 \leqslant q_j \ll X^{\nu _j}|\mathbf {y}|^{d-j}\mu _{\mathrm {max}}\) as well as \(0 \leqslant a_j \leqslant Q_j \) such that

    $$\begin{aligned} |Q_j\alpha _j - a_j| \ll X^{-j + \omega _j}|\mathbf {y}|^{D_{d-j}}\mu _{\mathrm {max}}^{d-j+1}. \end{aligned}$$
  (C)

    The number of \(\varvec{\xi }_1, \ldots , \varvec{\xi }_{j-1} \in \mathfrak {B}_{\mathbf {y}}(X^{\theta _j})\) for which \(B_{\mathbf {y},m}^{(j)}(\varvec{\xi }_1, \ldots , \varvec{\xi }_{j-1}) = 0\) for \(1 \leqslant m \leqslant s\) is at least of order \((X^{\theta _js}/d(\Lambda _{\mathbf {y}}))^{j-1}X^{-2^{j-1}k_j\theta _j - \varepsilon }\).

Our next goal is to interpret the third case geometrically. Write \(\mathcal {M}_j(\mathbf {y})\) for the variety of all \((\varvec{\xi }_1, \ldots , \varvec{\xi }_{j-1}) \in \mathbb {A}_{\mathbb {C}}^{(j-1)s}\) that satisfy

$$\begin{aligned} B_{\mathbf {y},m}^{(j)}(\varvec{\xi }_1, \ldots , \varvec{\xi }_{j-1}) = 0 \quad (1 \leqslant m \leqslant s). \end{aligned}$$

It is clear (for instance from Theorem 3.1 in [13]) that for any positive real number Z one has

$$\begin{aligned} {{\,\mathrm{Card}\,}}\left\{ (\mathbf {h}_1, \ldots , \mathbf {h}_{j-1}) \in \mathbb {Z}^{(j-1)s} \cap \mathcal {M}_j(\mathbf {y}):\, |\mathbf {h}_i| \leqslant Z \;(1 \leqslant i \leqslant j-1) \right\} \ll Z^{\dim \mathcal {M}_j(\mathbf {y})}. \end{aligned}$$

As in the work of Schindler and Sofos [24] we cover the domain \(( {\mathfrak {B}}_{\mathbf {y}}(X^{\theta _j}))^{j-1}\) by at most \(O(\mu _{\mathrm {max}}^{s(j-1)}/d(\Lambda _{\mathbf {y}})^{j-1})\) translates of the box \([-X^{\theta _j}/\mu _{\mathrm {max}},X^{\theta _j}/\mu _{\mathrm {max}}]^{s(j-1)}\). In particular, since \(d(\Lambda _{\mathbf {y}}) \asymp \mu _1 \cdots \mu _s \ll \mu _{\mathrm {max}}^s\), the number of boxes in the covering is positive. Suppose now that

$$\begin{aligned} \psi < \varpi k_j \theta _j \end{aligned}$$
(4.5)

for all j and a suitably small parameter \(\varpi \), so that \(\mu _{\mathrm {max}} \ll X^{(d-1)\varpi k_j\theta _j}\). Since [13, Theorem 3.1] allows for translations, we infer that

$$\begin{aligned}&{{\,\mathrm{Card}\,}}\left\{ (\varvec{\xi }_1, \ldots , \varvec{\xi }_{j-1}) \in ({\mathfrak {B}}_{\mathbf {y}}(X^{\theta _j}))^{j-1} \cap \mathcal {M}_j(\mathbf {y}) \right\} \\&\qquad \ll \left( \frac{\mu _{\mathrm {max}}^{s}}{d(\Lambda _{\mathbf {y}})}\right) ^{j-1} \left( \frac{X^{\theta _j}}{\mu _{\mathrm {max}}} \right) ^{\dim \mathcal {M}_j(\mathbf {y})}\\&\qquad \ll \left( \frac{X^{(d-1)\varpi k_j\theta _j s}}{d(\Lambda _{\mathbf {y}})}\right) ^{j-1} \left( X^{\theta _j(1-(d-1)\varpi k_j)} \right) ^{\dim \mathcal {M}_j(\mathbf {y})}, \end{aligned}$$

where in the second step we used that \((j-1)s-\dim \mathcal {M}_j(\mathbf {y}) \geqslant 0\) trivially by the definition of \(\mathcal {M}_j(\mathbf {y})\) as a variety inside \(\mathbb {A}^{(j-1)s}\). We thus discern that whenever we are in case (C) of Lemma 4.2, we must have the bound

$$\begin{aligned} \left( \frac{X^{\theta _js}}{d(\Lambda _{\mathbf {y}})}\right) ^{j-1} X^{ -2^{j-1}k_j\theta _j - \varepsilon } \ll \left( \frac{X^{(d-1)\varpi k_j\theta _j s}}{d(\Lambda _{\mathbf {y}})}\right) ^{j-1} \left( X^{\theta _j(1-(d-1)\varpi k_j)} \right) ^{\dim \mathcal {M}_j(\mathbf {y})}, \end{aligned}$$

which simplifies to

$$\begin{aligned} (X^{\theta _j(1-(d-1)\varpi k_j )})^{(j-1)s-\dim \mathcal {M}_j(\mathbf {y})} \ll X^{2^{j-1}k_j \theta _j - \varepsilon }. \end{aligned}$$

It follows that for any j, the case (C) of Lemma 4.2 is excluded when

$$\begin{aligned} (j-1)s-\dim \mathcal {M}_j(\mathbf {y}) > \frac{2^{j-1}k_j}{1-(d-1)\varpi k_j }. \end{aligned}$$
(4.6)

We thus want to choose our parameters in such a way that (4.6) holds for all \(2 \leqslant j \leqslant d\).

We begin by observing that \(\mathcal {M}_{j-1}(\mathbf {y})\) is obtained from \(\mathcal {M}_j(\mathbf {y})\) by intersecting with the s hyperplanes defined by \(\mathbf {h}_{j-1}=\mathbf {y}\). This gives the inequality \(\dim \mathcal {M}_{j-1}(\mathbf {y}) \geqslant \dim \mathcal {M}_j(\mathbf {y}) - s\) for all j with \(3 \leqslant j \leqslant d\), and upon solving the recursion we deduce that

$$\begin{aligned} \dim \mathcal {M}_j(\mathbf {y})\leqslant (j-2)s + \dim \mathcal {M}_2(\mathbf {y}) \qquad (2 \leqslant j \leqslant d). \end{aligned}$$
(4.7)

It thus suffices to understand the set \(\mathcal {M}_2(\mathbf {y})\).

Lemma 4.3

Let \(\mathbf {y} \in \mathcal {V}\). We have \(\mathcal {M}_2(\mathbf {y}) = \langle \ker H_{\mathbf {y}}, \mathbf {y}\rangle \) and thus

$$\begin{aligned} \dim \mathcal {M}_2(\mathbf {y}) \leqslant \dim \ker H_{\mathbf {y}} +1. \end{aligned}$$

Proof

It follows from the definition of \(\Psi ^{(2)}\) that \(\mathcal {M}_2(\mathbf {y})\) is given by the set of all \(\mathbf {h} \in \mathbb {A}_{\mathbb {C}}^n\) satisfying \(\Phi _{\mathbf {y}}^{(1)}(\mathbf {h})=0\) and \((H_{\mathbf {y}}\mathbf {h}) \cdot \mathbf {x} = 0\) for all \(\mathbf {x}\) having \(\Phi _{\mathbf {y}}^{(1)}(\mathbf {x})=0\). In particular, \(\mathbf {h}\) has to be such that \((H_{\mathbf {y}} \mathbf {h}) \cdot \mathbf {x} = 0\) whenever \((H_{\mathbf {y}} \mathbf {y}) \cdot \mathbf {x} = 0\). This is clearly satisfied if \(\mathbf {h} \in \ker H_{\mathbf {y}}\), as then the first equation holds trivially. On the other hand, if \(\mathbf {h} \not \in \ker H_{\mathbf {y}}\), both equations define hyperplanes which coincide precisely if the vectors \(H_{\mathbf {y}}\mathbf {h}\) and \(H_{\mathbf {y}}\mathbf {y}\) are proportional, or in other words, \(\mathbf {h} - \alpha \mathbf {y} \in \ker H_{\mathbf {y}}\) for some scalar \(\alpha \). Rewriting gives \(\mathbf {h} \in \langle \ker H_{\mathbf {y}}, \mathbf {y} \rangle \), and the statement follows. \(\square \)

We now quantify the set of points \(\mathbf {y}\) for which \(\ker H_{\mathbf {y}}\) is large. For a natural number \(\rho \) set

$$\begin{aligned} \mathcal {A}(\rho ) = \{\mathbf {y} \in \mathbb {A}_{\mathbb {C}}^n : \dim \ker H_{\mathbf {y}} \leqslant \rho -1 \} \end{aligned}$$

and

$$\begin{aligned} \mathcal {B}(\rho ) = \{\mathbf {y} \in \mathbb {A}_{\mathbb {C}}^n: \dim \ker H_{\mathbf {y}} \geqslant \rho \}, \end{aligned}$$

so that the sets \(\mathcal {A}(\rho )\) and \(\mathcal {B}(\rho )\) are complementary. Observe also that with this definition we have \(\mathcal {V}^*_{2, \rho } = \mathcal {B}(\rho ) \cap \mathcal {V}\) and \(\mathcal {V}_{2, \rho } = \mathcal {A}(\rho ) \cap \mathcal {V}\). Suppose that \(\mathbf {y} \in \mathcal {A}(\rho )\) for some natural number \(\rho \). It then follows from (4.6) and (4.7) via Lemma 4.3 that case (C) of Lemma 4.2 is excluded whenever the inequalities

$$\begin{aligned} s - \rho > \frac{2^{j-1}k_j}{1-(d-1)\varpi k_j} \qquad (2 \leqslant j \leqslant d) \end{aligned}$$
(4.8)

are satisfied.

To conclude the section, we record the bound

$$\begin{aligned} {{\,\mathrm{Card}\,}}\{ \mathbf {y} \in \mathcal {V}^*_{2,\rho }(\mathbb {Z}): |\mathbf {y}| \leqslant Y\}&\leqslant {{\,\mathrm{Card}\,}}\{ \mathbf {y} \in \mathcal {B}(\rho ) \cap \mathbb {Z}^n: |\mathbf {y}| \leqslant Y\} \ll Y^{n-\rho }, \end{aligned}$$
(4.9)

which follows from the argument of [20, Lemma 2] via Theorem 3.1 in [13].
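
A concrete example calibrates this bound. For the non-singular diagonal form \(F(\mathbf {y}) = y_1^d + \ldots + y_n^d\) the Hessian is \(H_{\mathbf {y}} = d(d-1) {{\,\mathrm{diag}\,}}(y_1^{d-2}, \ldots , y_n^{d-2})\), so \(\dim \ker H_{\mathbf {y}}\) equals the number of vanishing coordinates of \(\mathbf {y}\). Consequently, \(\mathcal {B}(\rho ) \cap \mathbb {Z}^n\) consists of the integer points having at least \(\rho \) coordinates equal to zero, and of these there are \(O(Y^{n-\rho })\) in the box \(|\mathbf {y}| \leqslant Y\), in accordance with (4.9).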

5 Major and minor arcs

Lemma 4.2 is designed to inductively define a partition into major and minor arcs for the entries \(\alpha _j\) of \(\varvec{\alpha }\) as j runs from d to 2. The size of the major arcs obtained in this way is controlled by the parameters \(\theta _j\) and \(k_j\) which it is now our job to choose optimally. Throughout this section and the next we will assume that \(\mathbf {y} \in \mathcal {A}(\rho )\) for some parameter \(\rho \). Also, we will work on the assumption that (4.8) is satisfied, so that the singular case in Lemma 4.2 is excluded.

Given an index j and parameters \(\theta _{j}, \ldots , \theta _d \in (0,1]\), we define the major arcs \(\mathfrak {M}_{\mathbf {y}}(X; \theta _j, \ldots , \theta _d)\) to be the set of all \(\varvec{\alpha }\in [0,1)^{d-1}\) for which there exist integers \(q_j, \ldots , q_d\) and \(a_j, \ldots , a_d\) having the property that for all \(i \in \{j, j+1, \ldots , d\}\) one has

$$\begin{aligned} 1 \leqslant q_i \leqslant c_jX^{\nu _i}|\mathbf {y}|^{d-i}\mu _{\mathrm {max}}, \qquad 0 \leqslant a_i \leqslant Q_i, \qquad |\alpha _iQ_i-a_i| \leqslant c_jX^{-i+\omega _i}|\mathbf {y}|^{D_{d-i}} \mu _{\mathrm {max}}^{d-i+1} \end{aligned}$$
(5.1)

for some suitable constant \(c_j\). Here, we implicitly used the notation (3.2) and (3.3). Let

$$\begin{aligned} \mathfrak {m}_{\mathbf {y}}(X; \theta _j, \ldots , \theta _d) = [0,1)^{d-1} {\setminus } \mathfrak {M}_{\mathbf {y}}(X; \theta _j, \ldots , \theta _d) \end{aligned}$$

be the corresponding minor arcs. One checks that the major arcs are disjoint as soon as X is sufficiently large and

$$\begin{aligned} \omega _j < j/2- \psi (D_{d-j}+(d-1)(d-j+1)). \end{aligned}$$

The definition of the major arcs as given above is iterative in nature in that the approximation of \(\alpha _j\) involves the denominators \(q_i\) for all \(i>j\), and this reflects the fact that our work of the previous section generates an approximation for \(\alpha _j\) only in the case when all \(\alpha _i\) with \(i>j\) have already been approximated. In a sense, therefore, the major arcs \(\mathfrak {M}_{\mathbf {y}}(X; \theta _j, \ldots , \theta _d)\) are only defined inside the set \(\mathfrak {M}_{\mathbf {y}}(X; \theta _{j+1}, \ldots , \theta _d)\).

At this point, we observe that for any particular \(\varvec{\alpha }\) the quantity \(|T_{\mathbf {y}}(\varvec{\alpha })|\) has just one size, so the bounds supplied by Lemma 4.2 for the various j are used most efficiently when the savings \(X^{-k_j\theta _j}\) are all of the same order. In the light of Lemma 4.2, this means that we lose nothing by making the choice

$$\begin{aligned} k_j \theta _j = k_i \theta _i \quad (2 \leqslant i, j \leqslant d). \end{aligned}$$
(5.2)

With this assumption, as a consequence of our nested definition of the major arcs we have

$$\begin{aligned} |T_{\mathbf {y}}(\varvec{\alpha })| \ll \frac{X^s}{d(\Lambda _{\mathbf {y}}) }X^{-k_j \theta _j+\varepsilon } \quad \text { whenever } \varvec{\alpha }\in \mathfrak {m}_{\mathbf {y}}(X, \theta _j, \ldots , \theta _d), \end{aligned}$$

where the \(\varepsilon \) absorbs any possible dependence on the constants \(c_j\). As the convention (5.2) renders much of the information in our above notation superfluous, we put

$$\begin{aligned} \mathfrak {M}_{\mathbf {y}}^{(j)}(X; \theta _j)&= \mathfrak {M}_{\mathbf {y}}(X; \theta _j, (k_j/k_{j+1})\theta _j, \ldots , (k_j/k_{d})\theta _j), \end{aligned}$$

and we adopt an analogous convention for the minor arcs.

It is useful to make the definition

$$\begin{aligned} \Omega _j = \sum _{i=j}^d \omega _i = \sum _{i=j}^d (i-j+1)\nu _i \qquad (2 \leqslant j \leqslant d). \end{aligned}$$

Write further

$$\begin{aligned} \sigma _j = \sum _{i=j}^d \frac{(i-1)}{k_i} \quad \hbox { and } \quad \Sigma _j=\sum _{i=j}^d \sigma _i= \sum _{i=j}^d\frac{(i-j+1)(i-1)}{k_i}, \end{aligned}$$
(5.3)

then (5.2) implies that

$$\begin{aligned} \omega _j = \sigma _j k_j \theta _j \quad \hbox { and } \quad \Omega _j = \Sigma _j k_j \theta _j. \end{aligned}$$
(5.4)
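
Indeed, under (5.2) one has \(\theta _i = k_j\theta _j/k_i\) for all i, whence

$$\begin{aligned} \omega _j = \sum _{i=j}^d (i-1)\theta _i = k_j\theta _j \sum _{i=j}^d \frac{i-1}{k_i} = \sigma _j k_j \theta _j, \end{aligned}$$

and summing this relation over \(j \leqslant i \leqslant d\), using \(k_i\theta _i = k_j\theta _j\) once more, yields \(\Omega _j = \Sigma _j k_j\theta _j\).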

When there is no danger of confusion, we will employ the convention that

$$\begin{aligned} \Sigma = \Sigma _2, \qquad \sigma =\sigma _2, \qquad \omega = \omega _2. \end{aligned}$$

Also, define

$$\begin{aligned} \Delta _j&= \sum _{i=j}^d D_{d-i} = \textstyle {\frac{1}{6}}(d - j) (d - j+1) (d - j+2), \end{aligned}$$

noting that \(\Delta _d=0\) and

$$\begin{aligned} \Delta _j \leqslant \Delta _2 = \textstyle {\frac{1}{6}} d (d^2 - 3 d + 2)\quad \text { for all } j. \end{aligned}$$
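
Both facts follow from the elementary identity

$$\begin{aligned} \sum _{k=0}^{m} D_k = \sum _{k=0}^{m} \frac{k(k+1)}{2} = \frac{m(m+1)(m+2)}{6}, \end{aligned}$$

applied with \(m = d-j\): the right-hand side is increasing in m, so \(\Delta _j\) is decreasing in j.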

We then have the following simple lemma.

Lemma 5.1

For any j with \(2 \leqslant j \leqslant d\) the volume of the multi-dimensional major arcs is bounded by

$$\begin{aligned} {{\,\mathrm{vol}\,}}\mathfrak {M}_{\mathbf {y}}(X; \theta _j, \ldots , \theta _d) \ll X^{-(D-D_{j-1}) + \Omega _j + \omega _j}|\mathbf {y}|^{\Delta _{j} + D_{d-j}}\mu _{\mathrm {max}}^{D_{d-j+2}-1}. \end{aligned}$$

Proof

Recall the notation (3.2) and (3.3). The condition (5.1) implies that

$$\begin{aligned}&{{\,\mathrm{vol}\,}}\mathfrak {M}_{\mathbf {y}}(X; \theta _j, \ldots , \theta _d) \\&\qquad \ll \sum _{q_j=1}^{c_jX^{\nu _j}|\mathbf {y}|^{d-j}\mu _{\mathrm {max}} } \sum _{a_j = 0}^{Q_j}\left( \frac{X^{-j+\omega _j} |\mathbf {y}|^{D_{d-j}}\mu _{\mathrm {max}}^{d-j+1}}{Q_j} \right) \times \ldots \\&\qquad \qquad \qquad \times \sum _{q_d=1}^{c_jX^{\nu _d}\mu _{\mathrm {max}}} \sum _{a_d = 0}^{Q_d}\left( \frac{X^{-d+\omega _d}\mu _{\mathrm {max}}}{Q_d} \right) . \end{aligned}$$

We can perform the summations over the \(a_i\). After that, the sums disentangle, and we obtain

$$\begin{aligned} {{\,\mathrm{vol}\,}}\mathfrak {M}_{\mathbf {y}}(X; \theta _j, \ldots , \theta _d)&\ll \prod _{i=j}^d \left( \sum _{q_i=1}^{c_iX^{\nu _i}|\mathbf {y}|^{d-i}\mu _{\mathrm {max}} } X^{-i+\omega _i} |\mathbf {y}|^{D_{d-i}}\mu _{\mathrm {max}}^{d-i+1} \right) \\&\ll \prod _{i=j}^d X^{-i+\omega _i + \nu _i} |\mathbf {y}|^{D_{d-i} + d-i}\mu _{\mathrm {max}}^{d-i+2}\\&\ll X^{-(D-D_{j-1})+\Omega _j+\omega _j}|\mathbf {y}|^{\Delta _j + D_{d-j}}\mu _{\mathrm {max}}^{D_{d-j+2}-1} \end{aligned}$$

as claimed. \(\square \)

Our next task is to analyse under which conditions the contribution of the minor arcs can be controlled. We first consider the one-dimensional minor arcs \(\mathfrak {m}_{\mathbf {y}}^{(d)}(X; \theta _d)\).

Lemma 5.2

For any choice of positive parameters \(\theta _d \in (0,1]\), \(k_d\) and \(\delta _d\) suppose that

$$\begin{aligned} k_d > D-1+\delta _d \end{aligned}$$
(5.5)

and

$$\begin{aligned} (1-\Sigma _d- \sigma _d)k_d\theta _d > D_{d-1}-1 + \delta _d. \end{aligned}$$
(5.6)

Then for some \(\nu >0\) we have the bound

$$\begin{aligned} \int _{\mathfrak {m}_{\mathbf {y}}^{(d)}(X; \theta _d)} |T_{\mathbf {y}}(\varvec{\alpha })| \,\mathrm {d}\varvec{\alpha }\ll X^{s-(D-1)- \delta _d-\nu } \mu _{\mathrm {max}}. \end{aligned}$$

Proof

Let \(\theta _d\) be given. We can find a sequence \(\theta ^{(i)}_d\) with the property

$$\begin{aligned} 1=\theta _d^{(0)}> \theta _d^{(1)}> \ldots> \theta _d^{(M)}=\theta _d>0 \end{aligned}$$

and subject to the condition

$$\begin{aligned} \big (\theta _d^{(i-1)}-\theta _d^{(i)} \big ) k_d < (1-\Sigma _d-\sigma _d)k_d\theta _d - (D_{d-1}-1) - \delta _d \qquad (1 \leqslant i \leqslant M). \end{aligned}$$
(5.7)

Thanks to (5.6), this is always possible with \(M=O(1)\). We now infer from Lemma 4.2 and (5.5) that

$$\begin{aligned} \int _{\mathfrak {m}^{(d)}_{\mathbf {y}}(X; \theta _d^{(0)})}|T_{\mathbf {y}}(\varvec{\alpha })| \,\mathrm {d}\varvec{\alpha }\ll \sup _{\varvec{\alpha }\in \mathfrak {m}_{\mathbf {y}}^{(d)}(X;\theta _d^{(0)})} |T_{\mathbf {y}}(\varvec{\alpha })| \ll \frac{X^s}{d(\Lambda _{\mathbf {y}})}X^{-k_d +\varepsilon }\ll X^{s-(D-1)-\delta _d- \nu }, \end{aligned}$$

provided that \(\nu \) is small enough in terms of the other parameters. Further, if we write

$$\begin{aligned} \mathfrak {m}_{\mathbf {y}, i}^{(d)} = \mathfrak {m}^{(d)}_{\mathbf {y}}(X; \theta _{d}^{(i)}) \cap \mathfrak {M}^{(d)}_{\mathbf {y}}(X; \theta _{d}^{(i-1)}) \qquad (1 \leqslant i \leqslant M), \end{aligned}$$

one obtains via Lemma 5.1, (5.4) and Lemma 4.2 that

$$\begin{aligned} \int _{\mathfrak {m}_{\mathbf {y}, i}^{(d)}}|T_{\mathbf {y}}(\varvec{\alpha })| \,\mathrm {d}\varvec{\alpha }&\ll {{\,\mathrm{vol}\,}}\mathfrak {M}^{(d)}_{\mathbf {y}}(X; \theta _{d}^{(i-1)}) \sup _{\varvec{\alpha }\in \mathfrak {m}^{(d)}_{\mathbf {y}}(X; \theta _{d}^{(i)})} |T_{\mathbf {y}}(\varvec{\alpha })| \\&\ll \frac{X^s}{d(\Lambda _{\mathbf {y}})}X^{ -(D-D_{d-1})+(\Sigma _{d}+\sigma _d)k_{d}\theta _{d}^{(i-1)}-k_d \theta _{d}^{(i)} + \varepsilon }\mu _{\mathrm {max}}^2, \end{aligned}$$

and (5.7) ensures that in the exponent one has for every \(i=1, \ldots , M\) the relation

$$\begin{aligned} -k_d \theta _{d}^{(i)} + (\Sigma _{d}+\sigma _d)k_{d}\theta _{d}^{(i-1)}&\leqslant \big (\theta _{d}^{(i-1)} - \theta _{d}^{(i)}\big ) k_d - \big (1 - (\Sigma _{d} + \sigma _d)\big )k_d\theta _{d} \\&< -(D_{d-1}-1) - \delta _d-\nu \end{aligned}$$

for some sufficiently small \(\nu >0\). Since \(\mu _{\mathrm {max}}\ll d(\Lambda _{\mathbf {y}})\) and

$$\begin{aligned} \int _{\mathfrak {m}^{(d)}_{\mathbf {y}}(X;\theta _d)} |T_{\mathbf {y}}(\varvec{\alpha })| \,\mathrm {d}\varvec{\alpha }= \int _{\mathfrak {m}^{(d)}_{\mathbf {y}}(X; \theta _d^{(0)})}|T_{\mathbf {y}}(\varvec{\alpha })| \,\mathrm {d}\varvec{\alpha }+ \sum _{i=1}^M \int _{\mathfrak {m}_{\mathbf {y}, i}^{(d)}}|T_{\mathbf {y}}(\varvec{\alpha })| \,\mathrm {d}\varvec{\alpha }\end{aligned}$$

with \(M=O(1)\), this completes the proof. \(\square \)

We now employ an iterative argument in order to control the contribution from the nested sets of minor arcs. Fix some j in the range \(2 \leqslant j \leqslant d-1\), and suppose that the contribution arising from the sets \(\mathfrak {m}^{(i)}_{\mathbf {y}}(X; \theta _{i})\) is already bounded for all \(i>j\) and some suitable parameter \(\theta ^*_{j+1}\), where the \(\theta _i\) with \(i>j+1\) are determined by \(\theta ^*_{j+1}\) via (5.2).

Lemma 5.3

Fix an index j with \(2 \leqslant j \leqslant d-1\). Suppose that the parameters \(k_{i}\) with \(j+1 \leqslant i \leqslant d\) as well as \(\theta ^*_{j+1}\) are given in accordance with (3.7). For some \(\delta _{j+1}\geqslant 0\) assume that

$$\begin{aligned} (1-\Sigma _{j+1}- \sigma _{j+1})k_{j+1}\theta ^*_{j+1} > D_{j}-1 + \delta _{j+1}. \end{aligned}$$
(5.8)

Furthermore, for a non-negative parameter \(\delta _j\) and a positive parameter \(k_j\) suppose that \(\theta _j\) satisfies (4.3) as well as the inequalities

$$\begin{aligned} 0< \theta _j < \theta _j^{(0)}= \frac{k_{j+1}}{k_j}\theta ^*_{j+1} \end{aligned}$$
(5.9)

and

$$\begin{aligned} (1-(\Sigma _j+ \sigma _j))k_j\theta _j > D_{j-1}-1 + \delta _j. \end{aligned}$$
(5.10)

Then the j-th minor arcs contribution is bounded by

$$\begin{aligned} \int _{\mathfrak {m}^{(j)}_{\mathbf {y}}(X; \theta _j)} |T_{\mathbf {y}}(\varvec{\alpha })| \,\mathrm {d}\varvec{\alpha }\ll X^{s-(D-1)}\sum _{i=j}^d X^{-\delta _i - \nu }|\mathbf {y}|^{\Delta _{i} + D_{d-i}} \mu _{\mathrm {max}}^{D_{d-i+2}-2}, \end{aligned}$$

where \(\nu \) is some suitably small real number.

Proof

Observe first that with our notation in (5.9) we have the decomposition

$$\begin{aligned} \mathfrak {m}_{\mathbf {y}}^{(j)}(X; \theta _j^{(0)}) = \mathfrak {m}_{\mathbf {y}}^{(j+1)}(X; \theta ^*_{j+1}) \cup \left( \mathfrak {m}_{\mathbf {y}}^{(j)}(X; \theta _j^{(0)}) \cap \mathfrak {M}_{\mathbf {y}}^{(j+1)}(X; \theta ^*_{j+1}) \right) . \end{aligned}$$

Suppose that the lemma has been established for j replaced by \(j+1\) (for \(j = d-1\), the corresponding bound is supplied by Lemma 5.2), and recall (2.5) and (5.4). We infer from the inductive hypothesis and Lemmata 5.1 and 4.2 that

$$\begin{aligned}&\int _{\mathfrak {m}^{(j)}_{\mathbf {y}}(X; \theta _j^{(0)})}|T_{\mathbf {y}}(\varvec{\alpha })| \,\mathrm {d}\varvec{\alpha }\\&\quad \ll \int _{\mathfrak {m}^{(j+1)}_{\mathbf {y}}(X; \theta ^*_{j+1})}|T_{\mathbf {y}}(\varvec{\alpha })| \,\mathrm {d}\varvec{\alpha }+ {{\,\mathrm{vol}\,}}\mathfrak {M}^{(j+1)}_{\mathbf {y}}(X; \theta ^*_{j+1}) \sup _{\varvec{\alpha }\in \mathfrak {m}_{\mathbf {y}}^{(j)}(X;\theta _j^{(0)})} |T_{\mathbf {y}}(\varvec{\alpha })| \\&\quad \ll \sum _{i=j+1}^d X^{s-(D-1)-\delta _i - \nu }|\mathbf {y}|^{\Delta _i + D_{d-i} } \mu _{\mathrm {max}}^{D_{d-i+2}-2}\\&\qquad +\frac{X^s}{d(\Lambda _{\mathbf {y}})}X^{-(D-D_{j}) + (\Sigma _{j+1}+ \sigma _{j+1})k_{j+1}\theta ^*_{j+1} -k_j \theta _j^{(0)}+\varepsilon }|\mathbf {y}|^{\Delta _{j+1} + D_{d-j-1} } \mu _{\mathrm {max}}^{D_{d-j+1}-1}. \end{aligned}$$

Recall (5.9). Thus the above bound implies via (5.8) and the relation \(\mu _{\mathrm {max}} \ll d(\Lambda _{\mathbf {y}})\) that

$$\begin{aligned} \int _{\mathfrak {m}^{(j)}_{\mathbf {y}}(X; \theta _j^{(0)})}|T_{\mathbf {y}}(\varvec{\alpha })| \,\mathrm {d}\varvec{\alpha }&\ll X^{s-(D-1)} \sum _{i=j+1}^d X^{-\delta _i - \nu }|\mathbf {y}|^{\Delta _{i}+ D_{d-i}} \mu _{\mathrm {max}}^{D_{d-i+2}-2}, \end{aligned}$$

provided that \(\nu \) is small enough in terms of the other parameters.

Let now \(\theta _j\) be given according to (5.9) and (5.10). We can find a sequence \(\theta ^{(i)}_{j}\) satisfying

$$\begin{aligned} \theta _j^{(0)}> \theta _j^{(1)}> \ldots> \theta _j^{(M)}=\theta _j>0, \end{aligned}$$

and subject to the condition

$$\begin{aligned} \big (\theta _j^{(i-1)}-\theta _j^{(i)} \big ) k_j < (1-(\Sigma _j+ \sigma _j))k_j\theta _j - (D_{j-1}-1) - \delta _j \qquad (1 \leqslant i \leqslant M). \end{aligned}$$
(5.11)

This is always possible with \(M=O(1)\). For \(i \geqslant 1\) set

$$\begin{aligned} \mathfrak {m}_{\mathbf {y}, i}^{(j)} = \mathfrak {m}^{(j)}_{\mathbf {y}}(X; \theta _{j}^{(i)}) \cap \mathfrak {M}^{(j)}_{\mathbf {y}}(X; \theta _{j}^{(i-1)}). \end{aligned}$$

Then one deduces from Lemma 5.1, Lemma 4.2, (5.4) and (5.11) that

$$\begin{aligned} \int _{\mathfrak {m}_{\mathbf {y}, i}^{(j)}}|T_{\mathbf {y}}(\varvec{\alpha })| \,\mathrm {d}\varvec{\alpha }&\ll {{\,\mathrm{vol}\,}}\mathfrak {M}^{(j)}_{\mathbf {y}}(X; \theta _{j}^{(i-1)}) \sup _{\varvec{\alpha }\in \mathfrak {m}_{\mathbf {y}}^{(j)}(X;\theta _j^{(i)})} |T_{\mathbf {y}}(\varvec{\alpha })|\\&\ll \frac{X^s}{d(\Lambda _{\mathbf {y}})}X^{-(D-D_{j-1})+(\Sigma _{j}+\sigma _j)k_{j}\theta _{j}^{(i-1)}-k_j \theta _{j}^{(i)} + \varepsilon }|\mathbf {y}|^{\Delta _{j} + D_{d-j} } \mu _{\mathrm {max}}^{D_{d-j+2}-1}\\&\ll X^{s-(D-1)-\delta _j-\nu }|\mathbf {y}|^{\Delta _j + D_{d-j} } \mu _{\mathrm {max}}^{D_{d-j+2}-2} \end{aligned}$$

for each \(i \geqslant 1\), and thus altogether

$$\begin{aligned} \int _{\mathfrak {m}^{(j)}_{\mathbf {y}}(X;\theta _j)} |T_{\mathbf {y}}(\varvec{\alpha })| \,\mathrm {d}\varvec{\alpha }&= \int _{\mathfrak {m}^{(j)}_{\mathbf {y}}(X; \theta _j^{(0)})}|T_{\mathbf {y}}(\varvec{\alpha })| \,\mathrm {d}\varvec{\alpha }+ \sum _{i=1}^M \int _{\mathfrak {m}_{\mathbf {y}, i}^{(j)}}|T_{\mathbf {y}}(\varvec{\alpha })| \,\mathrm {d}\varvec{\alpha }\\&\ll M X^{s-(D-1)} \sum _{i=j}^d X^{-\delta _i - \nu }|\mathbf {y}|^{\Delta _{i} + D_{d-i}} \mu _{\mathrm {max}}^{D_{d-i+2} - 2}. \end{aligned}$$

Since \(M = O(1)\), this completes the proof. \(\square \)

We may now apply first Lemma 5.2 and then Lemma 5.3 successively to each of the \(\theta _j\). Thus, for the initial step we need to ensure that the condition (5.5) is satisfied, and after that we have to satisfy the requirements described by (5.10) for all \(2 \leqslant j \leqslant d\). On the other hand, we have to be careful to ensure that in each iteration we can take \(\theta _{j+1}^*\) small enough for Lemma 4.2 to be applicable within Lemma 5.3. The crucial requirement here is for the bound (4.3) to be satisfied for \(\theta _j^{(0)}\) for all j with \(2 \leqslant j \leqslant d-1\). Using (5.4) and our convention (5.2), the bound of (4.3) can be re-written in the form

$$\begin{aligned} (\sigma _j + 1/k_{j-1})k_j \theta _j<1 - (D_{d-j+1}+(d-1)(d-j+1))\psi \qquad (3 \leqslant j \leqslant d). \end{aligned}$$
(5.12)

For \(3 \leqslant j \leqslant d\), the condition (5.12) is compatible with the hypotheses (5.6) and (5.10) of Lemmata 5.2 and 5.3, respectively, only if

$$\begin{aligned} \left( \sigma _j +k_{j-1}^{-1}\right) \frac{D_{j-1}-1+\delta _j}{1 - (D_{d-j+1}+(d-1)(d-j+1))\psi } + \Sigma _j + \sigma _j< 1. \end{aligned}$$
(5.13)

At the same time, a comparison of (5.12) with (4.5) shows that we also require

$$\begin{aligned} \sigma _j + \frac{1}{k_{j-1}} < \frac{(1-(D_{d-j+1}+(d-1)(d-j+1))\psi )\varpi }{\psi }. \end{aligned}$$
(5.14)

Meanwhile, when \(j=2\), the bound of (5.12) does not apply, and we only have the constraints stemming from (4.5) and (5.10), which can be rewritten as

$$\begin{aligned} k_2\theta _2 > \max \left\{ \frac{\delta _2}{1-(\Sigma _2 +\sigma _2)}, \psi \varpi ^{-1} \right\} . \end{aligned}$$
(5.15)

We will attend to this bound later, but in the meantime we remark that regardless of the specific values \(\theta _2>0\) and \(\delta _2 \geqslant 0\), it implies that we must have \( \Sigma _2 + \sigma _2 < 1\). Summarising, we obtain the following intermediate result.

Proposition 5.4

Assume (4.8). Suppose that (5.13) and (5.14) are satisfied for all \(j \geqslant 3\), and that furthermore (5.5) and (5.15) hold. Then for some \(\nu > 0\) we have

$$\begin{aligned} N_{\mathbf {y}}(X) = \int _{\mathfrak {M}_{\mathbf {y}}^{(2)}(X; \theta )} T_{\mathbf {y}}(\varvec{\alpha }) \,\mathrm {d}\varvec{\alpha }+ O\left( X^{s-(D-1)-\nu }\sum _{j=2}^dX^{-\delta _j}|\mathbf {y}|^{\Delta _j + D_{d-j}}\mu _{\mathrm {max}}^{D_{d-j+2}-2}\right) . \end{aligned}$$
(5.16)

Proof

This follows from (2.4) upon applying Lemmata 5.2 and 5.3, and the discussion preceding the statement of the proposition. \(\square \)

6 Understanding the main term

In order to show that the main term of (5.16) is indeed of the expected shape, it is necessary for the approximations of all components of \(\varvec{\alpha }\) to have the same denominator. Recall that we wrote \(\omega = \omega _2\), and set

$$\begin{aligned} q=Q_2\quad \text { and }\quad b_j=(q/Q_j)a_j \qquad (2 \leqslant j \leqslant d). \end{aligned}$$

For some positive constant c set \(W=cX^{\omega }|\mathbf {y}|^{D_{d-2} + (d-1)^2}\), where \(\omega \) is as obtained in Proposition 5.4. Our final set \(\mathfrak {P}_{\mathbf {y}}(X; \omega )\) of major arcs is now the set of all \(\varvec{\alpha }\) with an approximation of the shape

$$\begin{aligned} 1 \leqslant q \leqslant W \quad \text {and} \quad |\alpha _j - b_j/q| \leqslant X^{-j}W \quad (2 \leqslant j \leqslant d). \end{aligned}$$
(6.1)

Recall (2.5). When c is sufficiently large, the set \(\mathfrak {P}_{\mathbf {y}}(X; \omega )\) is slightly larger than \(\mathfrak {M}_{\mathbf {y}}^{(2)} (X; \theta )\), so the corresponding minor arcs \(\mathfrak {p}_{\mathbf {y}}(X; \omega ) = [0,1)^{d-1} {\setminus } \mathfrak {P}_{\mathbf {y}}(X; \omega )\) are contained in \(\mathfrak {m}_{\mathbf {y}}^{(2)} (X; \theta )\). In the statement of Proposition 5.4, we may therefore replace the major arcs \(\mathfrak {M}_{\mathbf {y}}^{(2)} (X; \theta )\) by the larger set \(\mathfrak {P}_{\mathbf {y}}(X; \omega )\).

Let \(\mathcal {L}_{\mathbf {y}}\) denote the s-dimensional subspace of \(\mathbb {R}^n\) containing \(\Lambda _{\mathbf {y}}\). Furthermore, we define \(\mathcal {L}_{\mathbf {y}}(X) = \mathcal {L}_{\mathbf {y}} \cap [-X,X]^n\), and we let \(\Lambda _{\mathbf {y}}(q)\) denote the set of residue classes modulo q of lattice points \(\mathbf {x} \in \Lambda _{\mathbf {y}}\). Also, set

$$\begin{aligned} \vartheta _{\mathbf {y}}(\varvec{\alpha };\mathbf {x}) = \sum _{j=2}^d \alpha _j \Phi _{\mathbf {y}}^{(j)} (\mathbf {x}) \end{aligned}$$

for the analogue of \(\phi _{\mathbf {y}}\) in terms of the original variables \(\mathbf {x} \in \Lambda _{\mathbf {y}}\). In this notation, we can now define

$$\begin{aligned} S_{\mathbf {y}}(q, \mathbf {a}) = \sum _{\mathbf {x} \in \Lambda _{\mathbf {y}}(q)} e(\vartheta _{\mathbf {y}}(\mathbf {a}/q;\mathbf {x}) ) \quad \text { and } \quad v_{\mathbf {y}}(\varvec{\beta }, X) = \int _{\mathcal {L}_{\mathbf {y}}(X)} e(\vartheta _{\mathbf {y}}(\varvec{\beta }; \varvec{\xi })) \,\mathrm {d}\varvec{\xi }. \end{aligned}$$
(6.2)

These functions allow us to approximate the exponential sum \(T_{\mathbf {y}}(\varvec{\alpha })\) on the major arcs.

Lemma 6.1

Suppose that \(\varvec{\alpha }= \mathbf {a} /q + \varvec{\beta }\) with \(q \leqslant X^{1-\psi (d-1)}\). We have

$$\begin{aligned} \left| T_{\mathbf {y}}(\varvec{\alpha }) - \frac{S_{\mathbf {y}}(q, \mathbf {a})}{q^s} \frac{v_{\mathbf {y}}(\varvec{\beta }, X)}{d (\Lambda _{\mathbf {y}})}\right|&\ll X^{s-1}q\left( 1 + \frac{1}{d(\Lambda _{\mathbf {y}})}\sum _{j=2}^d |\beta _j|X^{j}|\mathbf {y}|^{d-j} \right) . \end{aligned}$$
(6.3)

Proof

This is essentially standard, but due to our specific setting over a lattice we prefer to provide a full proof. Sorting the terms into arithmetic progressions modulo q, we find that

$$\begin{aligned} T_{\mathbf {y}}(\varvec{\alpha }) = \sum _{\mathbf {z} \in \Lambda _{\mathbf {y}}(q)} e (\vartheta _{\mathbf {y}}(\mathbf {a}/q;\mathbf {z})) \sum _{\begin{array}{c} \mathbf {w} \in \Lambda _{\mathbf {y}} \\ q \mathbf {w} + \mathbf {z} \in \mathfrak {A}_{\mathbf {y}}(X) \end{array}} e (\vartheta _{\mathbf {y}}(\varvec{\beta };q \mathbf {w} + \mathbf {z})), \end{aligned}$$

and hence

$$\begin{aligned} \left| T_{\mathbf {y}}(\varvec{\alpha }) - \frac{S_{\mathbf {y}}(q, \mathbf {a})}{q^s} \frac{v_{\mathbf {y}}(\varvec{\beta }, X)}{d (\Lambda _{\mathbf {y}})}\right|&\ll \sum _{\mathbf {z} \in \Lambda _{\mathbf {y}}(q)} |H(q, \mathbf {z}, \varvec{\beta })|, \end{aligned}$$

where

$$\begin{aligned} H(q, \mathbf {z}, \varvec{\beta }) = \sum _{\begin{array}{c} \mathbf {w} \in \Lambda _{\mathbf {y}} \\ q \mathbf {w} + \mathbf {z} \in \mathfrak {A}_{\mathbf {y}}(X) \end{array}} e (\vartheta _{\mathbf {y}}(\varvec{\beta }; q \mathbf {w} + \mathbf {z})) - \frac{1}{q^s d(\Lambda _{\mathbf {y}})} \int _{\varvec{\xi } \in \mathcal {L}_{\mathbf {y}}(X)} e(\vartheta _{\mathbf {y}}(\varvec{\beta };\varvec{\xi })) \,\mathrm {d}\varvec{\xi }. \end{aligned}$$

Denote the fundamental domain of \(\Lambda _{\mathbf {y}}\) by \(\mathcal {F}\), and for \(\mathbf {w} \in \Lambda _{\mathbf {y}}\) write \(\mathcal {F}(\mathbf {w}) = \mathbf {w} + \mathcal {F}\) for the fundamental domain located at \(\mathbf {w}\). Moreover, we write \(\mathcal {F}_{q, \mathbf {z}}(\mathbf {w}) = q (\mathbf {w} + \mathcal {F})+\mathbf {z} \) for the domain, stretched by a factor q, that is located at \(q \mathbf {w} + \mathbf {z}\). We want to replace \(H(q, \mathbf {z}, \varvec{\beta })\) by the related quantity

$$\begin{aligned} H^*(q, \mathbf {z}, \varvec{\beta }) =\sum _{\begin{array}{c} \mathbf {w} \in \Lambda _{\mathbf {y}} \\ q \mathbf {w} + \mathbf {z} \in \mathfrak {A}_{\mathbf {y}}(X) \end{array}} \left\{ e (\vartheta _{\mathbf {y}}(\varvec{\beta }; q \mathbf {w} + \mathbf {z})) - \frac{1}{q^s d(\Lambda _{\mathbf {y}})} \int _{ \mathcal {F}_{q, \mathbf {z}}(\mathbf {w})} e(\vartheta _{\mathbf {y}}(\varvec{\beta };\varvec{\xi })) \,\mathrm {d}\varvec{\xi }\right\} . \end{aligned}$$

Clearly, we have \({{\,\mathrm{vol}\,}}\mathcal {F} = d(\Lambda _{\mathbf {y}})\) and \({{\,\mathrm{vol}\,}}\mathcal {F}_{q, \mathbf {z}}(\mathbf {w}) = q^s d(\Lambda _{\mathbf {y}})\). Thus, \(\mathcal {L}_{\mathbf {y}}(X)\) may be covered by \(O(X^s/(q^s d(\Lambda _{\mathbf {y}})))\) domains \(\mathcal {F}_{q, \mathbf {z}}(\mathbf {w})\) as \(\mathbf {w}\) varies over \(\Lambda _{\mathbf {y}}\), and the boundary intersects at most \(O((X/q)^{s-1}\mu _{\mathrm {max}}/d(\Lambda _{\mathbf {y}})) \ll (X/q)^{s-1}\) of these. The defect is therefore of volume at most \(O(X^{s-1}q\, d(\Lambda _{\mathbf {y}}) )\). With this information, we find upon partitioning the domain of integration that \(H(q, \mathbf {z}, \varvec{\beta }) - H^*(q, \mathbf {z}, \varvec{\beta })\ll (X/q)^{s-1}\), and thus

$$\begin{aligned} \left| T_{\mathbf {y}}(\varvec{\alpha }) - \frac{S_{\mathbf {y}}(q, \mathbf {a})}{q^s} \frac{v_{\mathbf {y}}(\varvec{\beta }, X)}{d (\Lambda _{\mathbf {y}})}\right|&\ll \sum _{\mathbf {z} \in \Lambda _{\mathbf {y}}(q)} |H^*(q, \mathbf {z}, \varvec{\beta })| + O(X^{s-1}q). \end{aligned}$$

Rewriting

$$\begin{aligned} H^*(q, \mathbf {z}, \varvec{\beta }) = \sum _{\begin{array}{c} \mathbf {w} \in \Lambda _{\mathbf {y}} \\ q \mathbf {w} + \mathbf {z} \in \mathfrak {A}_{\mathbf {y}}(X) \end{array}}\frac{1}{d(\Lambda _{\mathbf {y}})}\int _{\mathcal {F}(\mathbf {w})}e (\vartheta _{\mathbf {y}}(\varvec{\beta }; q \mathbf {w} + \mathbf {z})) - e(\vartheta _{\mathbf {y}}(\varvec{\beta }; q \varvec{\xi }+ \mathbf {z})) \,\mathrm {d}\varvec{\xi } \end{aligned}$$

puts us in a position to apply the mean value theorem, whereupon we see that

$$\begin{aligned} H^*(q, \mathbf {z}, \varvec{\beta })&\ll \sum _{\begin{array}{c} \mathbf {w} \in \Lambda _{\mathbf {y}} \\ q \mathbf {w} + \mathbf {z} \in \mathfrak {A}_{\mathbf {y}}(X) \end{array}}q \sum _{j=2}^d |\beta _j|X^{j-1}|\mathbf {y}|^{d-j} \\&\ll \left( \frac{X^s}{q^sd(\Lambda _{\mathbf {y}})}+1\right) q \sum _{j=2}^d |\beta _j|X^{j-1}|\mathbf {y}|^{d-j}. \end{aligned}$$

The desired bound now follows upon applying the trivial bound \(S_{\mathbf {y}}(q,\mathbf {a}) \ll q^s\). \(\square \)

In particular, when \(\varvec{\alpha }\in \mathfrak {P}_{\mathbf {y}}(X;\omega )\), inserting the conditions (6.1) into (6.3) shows that

$$\begin{aligned} \left| T_{\mathbf {y}}(\varvec{\alpha }) - \frac{S_{\mathbf {y}}(q, \mathbf {a})}{q^s} \frac{v_{\mathbf {y}}(\varvec{\beta }, X)}{d (\Lambda _{\mathbf {y}})}\right| \ll X^{s-1}W^2. \end{aligned}$$

Since

$$\begin{aligned} {{\,\mathrm{vol}\,}}\mathfrak {P}_{\mathbf {y}}(X;\omega )&\ll \sum _{q=1}^{W}\prod _{j=2}^d q X^{-j}W \ll X^{-(D-1)}W^{2d-1}, \end{aligned}$$

it follows that

$$\begin{aligned} \int _{\mathfrak {P}_{\mathbf {y}}(X; \omega )} T_{\mathbf {y}}(\varvec{\alpha }) \,\mathrm {d}\varvec{\alpha }&= \sum _{q=1}^{W} \sum _{\begin{array}{c} {\mathbf {a} = 0}\\ {(\mathbf {a}, q)=1} \end{array}}^{q-1} \frac{S_{\mathbf {y}}(q, \mathbf {a})}{q^s} \int _{\begin{array}{c} |\beta _j| \leqslant X^{-j}W \\ (2 \leqslant j \leqslant d) \end{array}} \frac{v_{\mathbf {y}}(\varvec{\beta }, X)}{d(\Lambda _{\mathbf {y}})} \,\mathrm {d}\varvec{\beta }\nonumber \\&\qquad + O \left( X^{s-D}W^{2d+1} \right) . \end{aligned}$$
(6.4)

As usual, the growth rate of the main term in the asymptotic formula comes from the contribution of \(v_{\mathbf {y}}(\varvec{\beta }, X)\). Setting \(\gamma _j = X^{j}\beta _j\) for \(2 \leqslant j \leqslant d\), the identity

$$\begin{aligned} v_{\mathbf {y}}(\varvec{\beta }, X) =X^{s} v_{\mathbf {y}}(\varvec{\gamma }, 1) \end{aligned}$$
(6.5)

follows from (6.2) by the change of variables \(\varvec{\xi } \rightarrow X \varvec{\xi }\), and in the same manner one finds further that

$$\begin{aligned} \int _{\begin{array}{c} |\beta _j| \leqslant X^{-j}W \\ (2 \leqslant j \leqslant d) \end{array} }v_{\mathbf {y}}(\varvec{\beta }, X) \,\mathrm {d}\varvec{\beta }= X^{s-(D-1)} \int _{|\varvec{\beta }| \leqslant W } v_{\mathbf {y}}(\varvec{\beta }, 1) \,\mathrm {d}\varvec{\beta }. \end{aligned}$$

Setting

$$\begin{aligned} \mathfrak {J}_{\mathbf {y}}(W) = \int _{[-W,W]^{d-1}} \frac{v_{\mathbf {y}}(\varvec{\beta }, 1)}{d(\Lambda _{\mathbf {y}})} \,\mathrm {d}\varvec{\beta }\qquad \text { and }\qquad \mathfrak {S}_{\mathbf {y}}(W) = \sum _{q=1}^{W} q^{-s} \sum _{\begin{array}{c} \mathbf {a} = 0 \\ (\mathbf {a}, q)=1 \end{array}}^{q-1} S_{\mathbf {y}}(q, \mathbf {a}), \end{aligned}$$

we can rewrite (6.4) in the shape

$$\begin{aligned} \int _{\mathfrak {P}_{\mathbf {y}}(X; \omega )} T_{\mathbf {y}}(\varvec{\alpha }) \,\mathrm {d}\varvec{\alpha }&= X^{s-D+1} \mathfrak {S}_{\mathbf {y}}(W) \mathfrak {J}_{\mathbf {y}}(W) + O\left( X^{s-D}W^{2d+1}\right) . \end{aligned}$$
(6.6)

In order to understand the main term in (6.6), we extend the truncated singular integral \(\mathfrak {J}_{\mathbf {y}}(W)\) and the truncated singular series \(\mathfrak {S}_{\mathbf {y}}(W)\) to infinity by taking the limit \(W \rightarrow \infty \) in both expressions. In our analysis of these limits, the notations \(\varvec{\beta }_j = (\beta _j, \ldots , \beta _d)\) and \(\mathbf {a}_j = (a_j, \ldots , a_d)\) (\(2 \leqslant j \leqslant d\)) will prove useful.

We start by considering the singular integral.

Lemma 6.2

We have

$$\begin{aligned} |v_{\mathbf {y}}(\varvec{\beta }, 1)| \ll \min _{2 \leqslant j \leqslant d} |\mathbf {y}|^{D_{d-j}/\sigma _j}\mu _{\mathrm {max}}^{(d-j+1)/\sigma _j}(1+ |\varvec{\beta }_j|)^{-1/\sigma _j+\varepsilon }. \end{aligned}$$

Proof

Fix j with \(2 \leqslant j \leqslant d\). For \(|\varvec{\beta }_j| \leqslant 1\) the claim is trivial, so we may assume that \(|\varvec{\beta }_j| > 1\). Choose \(P=|\varvec{\beta }|^A\) for some large parameter A to be fixed later, and write \(\varvec{\gamma }= (P^{-2} \beta _2, \ldots , P^{-d} \beta _d)\) and \(\varvec{\gamma }_j = (\gamma _j, \ldots , \gamma _d)\). Recalling (5.4), we fix \(\theta _j\) such that

$$\begin{aligned} \max _{j \leqslant i \leqslant d } \frac{|\varvec{\beta }_i|}{c_j P^{\omega _i} |\mathbf {y}|^{D_{d-i}}\mu _{\mathrm {max}}^{d-i+1}} = 1, \end{aligned}$$

so that

$$\begin{aligned} P^{ -k_j\theta _j} \ll |\varvec{\beta }_j|^{-1/\sigma _j} |\mathbf {y}|^{D_{d-j}/\sigma _j}\mu _{\mathrm {max}}^{(d-j+1)/\sigma _j}. \end{aligned}$$
(6.7)

With this choice, \(\varvec{\gamma }_j\) lies in the major arcs \(\mathfrak {M}_{\mathbf {y}}^{(j)}(P; \theta _j)\). Clearly, the major arcs are disjoint when A is sufficiently large, so \(\varvec{\gamma }_j\) is best approximated by \(q=1\) and \(\mathbf {a}_j= \varvec{0}\). We therefore have from Lemma 6.1 and (6.5) that

$$\begin{aligned} |v_{\mathbf {y}}(\varvec{\beta }, 1)| \ll \left( \frac{P^s}{d(\Lambda _{\mathbf {y}})}\right) ^{-1} |T_{\mathbf {y}}(\varvec{\gamma }; P)| + P^{-1}|\varvec{\beta }|. \end{aligned}$$
(6.8)

On the other hand, \(\varvec{\gamma }\) lies just on the boundary of the major arcs and thus by continuity the minor arcs bound continues to apply. Consequently, we obtain from Lemma 4.2 and (6.7) the complementary estimate

$$\begin{aligned} |T_{\mathbf {y}}(\varvec{\gamma }; P)| \ll \left( \frac{P^s}{d(\Lambda _{\mathbf {y}})}\right) P^{-k_j \theta _j + \varepsilon } \ll \left( \frac{P^s}{d(\Lambda _{\mathbf {y}})}\right) P^{\varepsilon }|\varvec{\beta }_j|^{-1/\sigma _j} |\mathbf {y}|^{D_{d-j}/\sigma _j}\mu _{\mathrm {max}}^{(d-j+1)/\sigma _j}. \end{aligned}$$

Inserting this into (6.8) leads to

$$\begin{aligned} |v_{\mathbf {y}}(\varvec{\beta }, 1)| \ll P^{\varepsilon }|\varvec{\beta }_j|^{-1/\sigma _j+\varepsilon } |\mathbf {y}|^{D_{d-j}/\sigma _j}\mu _{\mathrm {max}}^{(d-j+1)/\sigma _j} + P^{-1}|\varvec{\beta }|, \end{aligned}$$

and upon recalling that \(P=|\varvec{\beta }|^A \geqslant |\varvec{\beta }_j|^A\), this reproduces the desired estimate whenever A is sufficiently large. \(\square \)

It follows from Lemma 6.2 that for any tuple \(\lambda _{2}, \ldots , \lambda _{d} \in [0,1]\) satisfying the relation \(\lambda _{2} + \ldots + \lambda _{d}=1\) we have

$$\begin{aligned}&\int _{\begin{array}{c} \varvec{\beta }\in \mathbb {R}^{d-1} \\ |\varvec{\beta }|> W \end{array}}\frac{|v_{\mathbf {y}}(\varvec{\beta }, 1)|}{d(\Lambda _{\mathbf {y}})} \,\mathrm {d}\varvec{\beta }\\&\quad \ll \frac{1}{d(\Lambda _{\mathbf {y}})} \int _{\begin{array}{c} \varvec{\beta }\in \mathbb {R}^{d-1} \\ |\varvec{\beta }| > W \end{array}} \prod _{j=2}^d\left( |\mathbf {y}|^{D_{d-j}/\sigma _j}\mu _{\mathrm {max}}^{(d-j+1)/\sigma _j} (1+ |\varvec{\beta }|)^{- 1/\sigma _j+\varepsilon }\right) ^{\lambda _{j}}\,\mathrm {d}\varvec{\beta }. \end{aligned}$$

The set of all \(\varvec{\beta }\in \mathbb {R}^{d-1}\) having \(|\varvec{\beta }|=r\) has volume \(O(r^{d-2})\). Recalling that we have \(\mu _{\mathrm {max}} \ll d(\Lambda _{\mathbf {y}})\), it follows that the above integral is bounded by

$$\begin{aligned} \int _{\begin{array}{c} \varvec{\beta }\in \mathbb {R}^{d-1} \\ |\varvec{\beta }|> W \end{array}}\frac{|v_{\mathbf {y}}(\varvec{\beta }, 1)|}{d(\Lambda _{\mathbf {y}})} \,\mathrm {d}\varvec{\beta }&\ll |\mathbf {y}|^{\kappa _1 }\mu _{\mathrm {max}}^{-1+\kappa _2} \int _{r > W} (1+ r) ^{-\kappa _3+d-2+ \varepsilon }\,\mathrm {d}r, \end{aligned}$$

where

$$\begin{aligned} \kappa _1 = \sum _{j=2}^d \frac{D_{d-j}\lambda _j}{\sigma _j}, \qquad \kappa _2 = \sum _{j=2}^d \frac{(d-j+1)\lambda _j}{\sigma _j}, \qquad \kappa _3 =\sum _{j=2}^d \frac{\lambda _j}{\sigma _j} . \end{aligned}$$

The integral converges if we can pick \(\lambda _2, \ldots , \lambda _d\) in such a way that \(\kappa _3> d-1\). We take \(\lambda _j = \sigma _j\) for \(j \geqslant 3\), so that \(\lambda _2 = 1 - \Sigma _3=\sigma _2 + (1-\Sigma _2)\). With this choice, the desired inequality \(\kappa _3 > d-1\) is satisfied if \(\Sigma < 1\), and we have

$$\begin{aligned} \kappa _3 = d-1 + \frac{1-\Sigma _2}{\sigma _2} = d+\frac{1-\Sigma -\sigma }{\sigma }. \end{aligned}$$

Moreover, using these values in our expression for \(\kappa _1\) and \(\kappa _2\) we obtain

$$\begin{aligned} \kappa _1 = \Delta _2 + D_{d-2} + D_{d-2}\frac{1-\sigma - \Sigma }{\sigma } \qquad \text {and} \qquad \kappa _2 = D-1+ (d-1)\frac{1-\sigma - \Sigma }{\sigma }. \end{aligned}$$

Upon referring to (2.5), this allows us to conclude that

$$\begin{aligned} \mathfrak {J}_{\mathbf {y}} - \mathfrak {J}_{\mathbf {y}}(W)&\ll |\mathbf {y}|^{\Delta _2 + D_{d-2}+ (d-1)(D-2)+ (D_{d-2}+(d-1)^2){\textstyle \frac{1-\sigma -\Sigma }{\sigma }}} W^{-1-{\textstyle \frac{1-\sigma -\Sigma }{\sigma }}+\varepsilon } \nonumber \\&\ll |\mathbf {y}|^{\frac{1}{3}(2 d^3 - 11 d + 9) + \frac{1}{2}(3d^2-7d+4){\textstyle \frac{1-\sigma -\Sigma }{\sigma }}} W^{-1-{\textstyle \frac{1-\sigma -\Sigma }{\sigma }}+\varepsilon }, \end{aligned}$$
(6.9)

and we have the bound

$$\begin{aligned} \mathfrak {J}_{\mathbf {y}}(W)&\ll |\mathbf {y}|^{\frac{1}{3}(2 d^3 - 11 d + 9) + \frac{1}{2}(3d^2-7d+4){\textstyle \frac{1-\sigma -\Sigma }{\sigma }}} \end{aligned}$$
(6.10)

uniformly in W.
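The bookkeeping behind \(\kappa _1\) and \(\kappa _2\), as well as the two polynomial constants feeding into (6.9), can be confirmed mechanically. The following sketch is included for the reader's convenience only; it assumes \(D_m = m(m+1)/2\) and writes \(t = (1-\sigma -\Sigma )/\sigma \).

```python
# Symbolic confirmation of kappa_1, kappa_2 and the exponents in (6.9);
# a sketch assuming D_m = m(m+1)/2 and writing t = (1 - sigma - Sigma)/sigma.
from sympy import symbols, summation, simplify

d, j, t = symbols('d j t', positive=True)
D = lambda m: m * (m + 1) / 2
Delta2 = d * (d**2 - 3*d + 2) / 6

# With lambda_j = sigma_j for j >= 3 one has lambda_j/sigma_j = 1 there,
# while lambda_2/sigma_2 = 2 + t.
kappa1 = summation(D(d - j), (j, 3, d)) + D(d - 2) * (2 + t)
kappa2 = summation(d - j + 1, (j, 3, d)) + (d - 1) * (2 + t)

assert simplify(kappa1 - (Delta2 + D(d - 2) * (1 + t))) == 0
assert simplify(kappa2 - (D(d) - 1 + (d - 1) * t)) == 0

# The two polynomial constants appearing in (6.9):
assert simplify(Delta2 + D(d-2) + (d-1)*(D(d)-2) - (2*d**3 - 11*d + 9)/3) == 0
assert simplify(D(d-2) + (d-1)**2 - (3*d**2 - 7*d + 4)/2) == 0
```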

The next step is to complete the truncated singular series.

Lemma 6.3

The terms of the singular series are bounded by

$$\begin{aligned} |q^{-s}S_{\mathbf {y}}(q, \mathbf {a}) | \ll \min _{2 \leqslant j \leqslant d} q^{\varepsilon }\left( \frac{q}{(q, \mathbf {a}_j)} \right) ^{-1/\sigma _j} |\mathbf {y}|^{D_{d-j}/\sigma _j}\mu _{\mathrm {max}}^{(d-j+1)/\sigma _j}. \end{aligned}$$

Proof

For \(q=1\) the estimate is trivial, so we may suppose that \(q > 1\). Fix \(P = q^A\) for some large A to be determined later. For any j with \(2 \leqslant j \leqslant d\) fix \(\theta _j\) such that

$$\begin{aligned} \max _{j \leqslant i \leqslant d}\frac{q/(q, \mathbf {a}_i)}{ c_j^{d-i} P^{\omega _i}|\mathbf {y}|^{D_{d-i}} \mu _{\mathrm {max}}^{d-i+1}}=1, \end{aligned}$$

so that in particular

$$\begin{aligned} P^{-k_j \theta _j} \ll \left( \frac{q}{(q,\mathbf {a}_j)}\right) ^{-1/\sigma _j}|\mathbf {y}|^{D_{d-j}/\sigma _j}\mu _{\mathrm {max}}^{(d-j+1)/\sigma _j} \end{aligned}$$
(6.11)

and \(\mathbf {a}_j/q \in \mathfrak {M}_{\mathbf {y}}^{(j)}(P; \theta _j)\). Note that by taking A sufficiently large we may ensure that the major arcs \(\mathfrak {M}_{\mathbf {y}}^{(j)}(P; \theta _j)\) are disjoint, so \(\mathbf {a}_j/q\) is best approximated by itself. Applying Lemma 6.1 and (6.5) with \(\varvec{\beta }= \varvec{0}\) and observing that \(v_{\mathbf {y}}(\varvec{0}, 1) \asymp 1\), it follows that

$$\begin{aligned} q^{-s}S_{\mathbf {y}}(q, \mathbf {a}) \ll \left( \frac{P^s}{d(\Lambda _{\mathbf {y}})}\right) ^{-1} |T_{\mathbf {y}}(q^{-1} \mathbf {a}; P)| + P^{-1}q . \end{aligned}$$
(6.12)

At the same time, \(\mathbf {a}_j/q\) can be viewed as lying just on the boundary of the major arcs in the q-aspect. As before, this implies that Lemma 4.2 and (6.11) furnish the additional minor arcs bound

$$\begin{aligned} |T_{\mathbf {y}}(q^{-1} \mathbf {a}; P)|&\ll d(\Lambda _{\mathbf {y}})^{-1}P^{s-k_j \theta _j+\varepsilon } \\&\ll \frac{P^{s+\varepsilon }}{d(\Lambda _{\mathbf {y}})} \left( \frac{q}{(q, \mathbf {a}_j)}\right) ^{-1/\sigma _j}|\mathbf {y}|^{D_{d-j}/\sigma _j}\mu _{\mathrm {max}}^{(d-j+1)/\sigma _j}, \end{aligned}$$

and on substituting this into (6.12) we discern that

$$\begin{aligned} q^{-s}S_{\mathbf {y}}(q, \mathbf {a}) \ll P^{\varepsilon }\left( \frac{q}{(q, \mathbf {a}_j)}\right) ^{-1/\sigma _j}|\mathbf {y}|^{D_{d-j}/\sigma _j}\mu _{\mathrm {max}}^{(d-j+1)/\sigma _j} + P^{-1}q \qquad (2 \leqslant j \leqslant d). \end{aligned}$$

Recalling that \(P=q^A\), it is clear that for A sufficiently large the first term dominates. \(\square \)

Lemma 6.3 implies that the singular series may be extended to infinity. Let \(\tau _2, \ldots , \tau _d\) be natural numbers with the property that \(\tau _j | \tau _{j+1}\) for \(2 \leqslant j \leqslant d-1\) and \(\tau _d|q\). For any j the number of choices of \(\mathbf {a} \;(\mathrm {mod}\;{q})\) satisfying \((q, \mathbf {a}_j)= \tau _j\) is \(O(q^{d-1}/\tau _j^{d-j+1})\). It thus follows that we have

$$\begin{aligned}&\sum _{q=1}^W \sum _{\begin{array}{c} \mathbf {a}= 0 \\ (\mathbf {a}, q)=1 \end{array}}^{q-1} q^{-s} |S_{\mathbf {y}}(q, \mathbf {a})|\\&\quad \ll \sum _{q=1}^W \sum _{\tau _2| \ldots | \tau _d | q} \min _{2 \leqslant j \leqslant d} q^{j-2+\varepsilon } \left( \frac{q}{\tau _j}\right) ^{d-j+1-1/\sigma _j} |\mathbf {y}|^{D_{d-j}/\sigma _j}\mu _{\mathrm {max}}^{(d-j+1)/\sigma _j}\\&\quad \ll \sum _{q=1}^W q^{d-1+\varepsilon } \prod _{j=2}^d \left( q^{-1/\sigma _j} |\mathbf {y}|^{D_{d-j}/\sigma _j}\mu _{\mathrm {max}}^{(d-j+1)/\sigma _j}\right) ^{\lambda _j} \end{aligned}$$

for any choice of \(\lambda _2, \ldots , \lambda _d \in [0,1]\) with \(\lambda _2+ \ldots + \lambda _d=1\). Just as in the treatment of the singular integral, we can take \(\lambda _j = \sigma _j\) for \(3 \leqslant j \leqslant d\), and \(\lambda _2 = 1 - \Sigma _3\). This choice yields the bound

$$\begin{aligned} \mathfrak {S}_{\mathbf {y}} - \mathfrak {S}_{\mathbf {y}}(W)&\ll |\mathbf {y}|^{ \Delta _2 + D_{d-2}+ (d-1)(D-1)+ {\textstyle \frac{1-\sigma -\Sigma }{\sigma }}(D_{d-2}+(d-1)^2) } \sum _{q\geqslant W} q^{-1-{\textstyle \frac{1-\sigma -\Sigma }{\sigma }}+\varepsilon } \nonumber \\&\ll |\mathbf {y}|^{\frac{2}{3}(d^3 - 4 d + 3) + \frac{1}{2}(3d^2-7d+4){\textstyle \frac{1-\sigma -\Sigma }{\sigma }}} W^{-{\textstyle \frac{1-\sigma -\Sigma }{\sigma }}+\varepsilon } \end{aligned}$$
(6.13)

whenever we have \(\Sigma +\sigma < 1\). Again, we recall that this last inequality is satisfied as a consequence of the more stringent condition (5.15). In particular, we have the bound

$$\begin{aligned} \mathfrak {S}_{\mathbf {y}}(W) \ll |\mathbf {y}|^{\frac{2}{3}(d^3 - 4 d + 3) + \frac{1}{2}(3d^2-7d+4){\textstyle \frac{1-\sigma -\Sigma }{\sigma }}}, \end{aligned}$$
(6.14)

which holds uniformly in W.

We can now complete the singular series and integral. Here, from (6.9), (6.10), (6.13) and (6.14) and upon inserting our value \(W=c X^{\omega }|\mathbf {y}|^{D_{d-2}+(d-1)^2}\), we find that

$$\begin{aligned} |\mathfrak {J}_{\mathbf {y}} \mathfrak {S}_{\mathbf {y}} - \mathfrak {J}_{\mathbf {y}}(W) \mathfrak {S}_{\mathbf {y}}(W) |&\ll |\mathbf {y}|^{\frac{1}{3}(4 d^3 - 19 d + 15) + (3d^2-7d+4){\textstyle \frac{1-\sigma -\Sigma }{\sigma }}} W^{- {\textstyle \frac{1-\sigma -\Sigma }{\sigma }}+\varepsilon }\nonumber \\&\ll X^{-\omega {\textstyle \frac{1-\sigma -\Sigma }{\sigma }}+\varepsilon }|\mathbf {y}|^{\frac{1}{3}(4 d^3 - 19 d + 15) + \frac{1}{2}(3 d^2 - 7 d + 4) {\textstyle \frac{1-\sigma -\Sigma }{\sigma }}}. \end{aligned}$$
(6.15)

It remains to collect our estimates.

Proposition 6.4

Make the assumption (3.4) and suppose that the conditions (5.5), (5.13), (5.14) and (5.15) are satisfied. Moreover, assume (4.8). In this case we have the asymptotic formula

$$\begin{aligned} N_{\mathbf {y}}(X)= X^{n-D} \left( \mathfrak {S}_{\mathbf {y}} \mathfrak {J}_{\mathbf {y}} + O(E(\mathbf {y}, \theta ))\right) , \end{aligned}$$

where

$$\begin{aligned} E(\mathbf {y}, \theta )&= \sum _{j=2}^dX^{-\delta _j-\nu }|\mathbf {y}|^{\Delta _j+D_{d-j}+ (D_{d-j+1}+d-j)(d-1)} \nonumber \\&\quad + X^{-1+(2d+1)\omega } |\mathbf {y}|^{\frac{1}{2}(6d^3-11d^2+d+4)} \nonumber \\&\quad + X^{-\omega {\textstyle \frac{1-\sigma -\Sigma }{\sigma }}+\varepsilon }|\mathbf {y}|^{\frac{1}{3}(4 d^3 - 19 d + 15) + \frac{1}{2}(3 d^2 - 7 d + 4) {\textstyle \frac{1-\sigma -\Sigma }{\sigma }}}. \end{aligned}$$
(6.16)

Proof

Recall that we had \(n=s+1\). The statement now follows from Proposition 5.4 together with (6.6) and (6.15). \(\square \)

Before concluding the section, we remark that the singular series and integral can be expressed in terms of solution densities of the system (2.2) over the real and p-adic numbers. Indeed, since under the hypotheses of the proposition the singular series is absolutely convergent, by standard arguments it can be written as an absolutely convergent Euler product \(\mathfrak {S}_{\mathbf {y}} = \prod _{p} \chi _p\), where

$$\begin{aligned} \chi _p&= \sum _{h=0}^{\infty } p^{-hs}\sum _{\begin{array}{c} \mathbf {a} = 1 \\ (\mathbf {a}, p)=1 \end{array}}^{p^h} S_{\mathbf {y}}(p^h, \mathbf {a}) \\&= \lim _{H \rightarrow \infty }p^{H(D-1-s)} \#\{\mathbf {x} \in \Lambda _{\mathbf {y}}(p^H): \Phi ^{(j)}_{\mathbf {y}} (\mathbf {x}) \equiv 0 \;(\mathrm {mod}\;{p^H}) \text { for }2 \leqslant j \leqslant d\}. \end{aligned}$$

Upon recalling that \(\Lambda _{\mathbf {y}}(q)\) denotes the set of all \(\mathbf {x} \in (\mathbb {Z}/q\mathbb {Z})^n\) that satisfy the congruence \(\Phi _{\mathbf {y}}^{(1)}(\mathbf {x}) \equiv 0 \;(\mathrm {mod}\;{q})\), we see that the above can be re-written as

$$\begin{aligned} \chi _p&= \lim _{H \rightarrow \infty }p^{H(D-n)} \#\{\mathbf {x} \in (\mathbb {Z}/ p^H \mathbb {Z})^n: \Phi ^{(j)}_{\mathbf {y}} (\mathbf {x}) \equiv 0 \;(\mathrm {mod}\;{p^H}) \text { for }1 \leqslant j \leqslant d\}. \end{aligned}$$

Thus, each factor \(\chi _p\) reflects the solution density of (2.2) in \(\mathbb {Q}_p\).

For the singular integral we proceed in a similar manner. Recall that \(\Phi _{\mathbf {y}}^{(1)}\) is an invertible linear transformation. Consider the manifold

$$\begin{aligned}M(h) = \{ \varvec{\xi }\in [-1,1]^n: \Phi _{\mathbf {y}}^{(1)}(\varvec{\xi }) = h\} \end{aligned}$$

with associated measure \(\mu \), normalised such that \(\mu (M(0))=d(\Lambda _{\mathbf {y}})^{-1}\). Let now

$$\begin{aligned} g(\varvec{\xi }) = \int _{\mathbb {R}^{d-1}} e\left( \sum _{j=2}^d \eta _j \Phi ^{(j)}_{\mathbf {y}}(\varvec{\xi })\right) \,\mathrm {d}\varvec{\eta }\qquad \text {and} \qquad f(h) = \int _{M(h)} g(\varvec{\xi }) \,\mathrm {d}\mu (\varvec{\xi }), \end{aligned}$$

so that \(f(0) = \mathfrak {J}_{\mathbf {y}}\). The inverse Fourier transform of f is given by

$$\begin{aligned} \mathcal {F}^{-1} f(\alpha )&= \int _{[-1,1]^n} g(\varvec{\xi }) e(\alpha \Phi _{\mathbf {y}}^{(1)}(\varvec{\xi }))\,\mathrm {d}\varvec{\xi }, \end{aligned}$$

and upon taking the (regular) Fourier transform it follows from the Fourier inversion formula that

$$\begin{aligned} f(N)&= \int _{\mathbb {R}} \int _{[-1,1]^n} g(\varvec{\xi }) e(\alpha (\Phi _{\mathbf {y}}^{(1)}(\varvec{\xi }) - N))\,\mathrm {d}\varvec{\xi }\,\mathrm {d}\alpha . \end{aligned}$$

Thus we conclude that

$$\begin{aligned} \mathfrak {J}_{\mathbf {y}} = f(0) = \int _{[-1,1]^n}\int _{\mathbb {R}^{d}} e\left( \sum _{j=1}^d \eta _j \Phi ^{(j)}_{\mathbf {y}}(\varvec{\xi })\right) \,\mathrm {d}\varvec{\eta }\,\mathrm {d}\varvec{\xi }. \end{aligned}$$

One can now show by standard arguments (for instance Lemma 2 and §11 in [25]) that this expression indeed describes the solution density of (2.2) over the real unit hypercube.

7 Endgame

The quantities \(\sigma _j\) and \(\Sigma _j\) can be expressed in terms of s itself. It is a straightforward exercise to confirm the identities

$$\begin{aligned} \sum _{n=1}^N n2^n = 2^{N+1}(N-1)+2 \qquad \text { and } \qquad \sum _{n=1}^N n^2 2^n = 2^{N+1}(N^2-2N+3)-6. \end{aligned}$$
(7.1)
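As an illustrative aside, the identities (7.1) are easily confirmed with a computer algebra package; a minimal sympy sketch:

```python
# Minimal sympy check of the two summation identities in (7.1).
from sympy import symbols, summation, simplify

N, n = symbols('N n', integer=True, positive=True)

assert simplify(summation(n * 2**n, (n, 1, N))
                - (2**(N + 1) * (N - 1) + 2)) == 0
assert simplify(summation(n**2 * 2**n, (n, 1, N))
                - (2**(N + 1) * (N**2 - 2*N + 3) - 6)) == 0
```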

Note that (4.8) transforms into

$$\begin{aligned} \frac{1}{k_j} > \frac{2^{j-1}}{s-\rho } + (d-1)\varpi . \end{aligned}$$

Using this within (5.3), an application of (7.1) produces the bounds

$$\begin{aligned} \sigma _j > \frac{2^d(d-2)-2^{j-1}(j-3) }{s-\rho } + \varpi (d-1)\frac{d(d-1)- (j-1)(j-2)}{2} \end{aligned}$$

and

$$\begin{aligned} \Sigma _j&> \frac{2^d(d^2-2d+2-j(d-2))+2^{j-1}(j-5)}{s - \rho }\\&\quad + \varpi \frac{(d-1) (d - j+1) (d - j+2) (2 d + j-3)}{6}, \end{aligned}$$

which we require to hold for all indices j in our range \(2 \leqslant j \leqslant d\). For the sake of simplicity we replace all these bounds by

$$\begin{aligned} \frac{1}{k_j}&> \frac{2^{d-1}}{s-\rho } + (d-1)\varpi , \qquad \sigma _j> \frac{2^d (d-1)}{s-\rho } + \frac{\varpi d(d-1)^2}{2}, \nonumber \\ \Sigma _j&> \frac{2^d (d^2-4d+6)}{s-\rho }+ \frac{\varpi (2d-1)d(d-1)^2}{6}. \end{aligned}$$
(7.2)
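The passage from the j-dependent bounds to the uniform versions in (7.2) rests on the observation that each coefficient is maximised at \(j=2\) (respectively at \(j=d\) for the \(k_j\)-bound). A numeric sketch in exact rational arithmetic is given below; the range of d is illustrative only.

```python
# Numeric sanity check that the uniform coefficients in (7.2) majorise the
# j-dependent coefficients displayed above; exact rational arithmetic.
from fractions import Fraction

def per_j(d, j):
    return (2**(j - 1),                                            # 1/k_j
            2**d * (d - 2) - 2**(j - 1) * (j - 3),                 # sigma_j, 1/(s - rho)
            Fraction((d - 1) * (d*(d - 1) - (j - 1)*(j - 2)), 2),  # sigma_j, varpi
            2**d * (d**2 - 2*d + 2 - j*(d - 2)) + 2**(j-1)*(j-5),  # Sigma_j, 1/(s - rho)
            Fraction((d-1)*(d-j+1)*(d-j+2)*(2*d+j-3), 6))          # Sigma_j, varpi

for d in range(5, 40):
    uniform = (2**(d - 1),
               2**d * (d - 1),
               Fraction(d * (d - 1)**2, 2),
               2**d * (d**2 - 4*d + 6),
               Fraction((2*d - 1) * d * (d - 1)**2, 6))
    for j in range(2, d + 1):
        assert all(c <= u for c, u in zip(per_j(d, j), uniform))
```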

This allows us to state a first result.

Theorem 7.1

Let \(F \in \mathbb {Z}[x_1, \ldots , x_n]\) be a non-singular form of degree \(d \geqslant 5\) defining a hypersurface \(\mathcal {V}\). Let further \(\psi >0\) be a parameter satisfying

$$\begin{aligned} \psi ^{-1}> {\textstyle d^4 + \frac{3}{2} d^3 - \frac{11}{2} d^2 + d + 2}, \end{aligned}$$
(7.3)

and set

$$\begin{aligned} n_1(\psi )&= \frac{2^{d-1}\left( d^3 + \frac{1}{2}d^2 - \frac{11}{2} d + 10 - \psi p_6(d)\right) }{1-(d^4 + \frac{3}{2} d^3 - \frac{11}{2} d^2 + d + 2)\psi }, \end{aligned}$$

where \(p_6(d)=\frac{1}{12}(50 d^6 - 171 d^5 + 88 d^4 + 517 d^3 - 732 d^2 + 8 d - 120)\). For some integer \(\rho \in [1,n]\) suppose that \(n-\rho > n_1(\psi )\). Then there exists a positive real number \(\nu \) with the property that

$$\begin{aligned} N_{\mathbf {y}}(X) = X^{n-\frac{1}{2} d(d+1)}\mathfrak {S}_{\mathbf {y}} \mathfrak {J}_{\mathbf {y}} + O(X^{n-\frac{1}{2} d(d+1) - \nu }) \end{aligned}$$

uniformly for all \(\mathbf {y} \in \mathcal {V}_{2,\rho }(\mathbb {Z})\) satisfying \(|\mathbf {y}| \leqslant X^{\psi }\), and the factors \(\mathfrak {S}_{\mathbf {y}}\) and \(\mathfrak {J}_{\mathbf {y}}\) satisfy \(0 \leqslant \mathfrak {S}_{\mathbf {y}} \ll _{\mathbf {y}} 1\) and \(0 \leqslant \mathfrak {J}_{\mathbf {y}} \ll _{\mathbf {y}} 1\).

Proof

Our main task here is to bound the error terms given by (6.16) in the conclusion of Proposition 6.4, while at the same time ensuring that the hypotheses of said proposition are satisfied. In order to control the first term in (6.16) we choose

$$\begin{aligned} \delta _j = \psi (\Delta _j+D_{d-j}+ (D_{d-j+2} - 2)(d-1)) \end{aligned}$$

for \(2 \leqslant j \leqslant d\). Thus, we have \(\delta _d=(d-1)\psi \). With this choice, and recalling (3.4), the bound in (5.5) is certainly implied by the simpler condition \(k_d > D\). In a similar manner, upon taking into account the uniform bounds (7.2) as well as the relations \(D_j \leqslant D\) and

$$\begin{aligned} \delta _j \leqslant \delta _2=\textstyle {\frac{1}{3}}(2d^3 - 11d + 9)\psi \end{aligned}$$

for all j, a modicum of computation reveals that for all \(\psi \) satisfying (7.3) one has

$$\begin{aligned} \frac{D_{j-1}-1+\delta _j}{1 - (D_{d-j+1}+(d-1)(d-j+1))\psi } \leqslant \frac{d(d-1)}{2}, \end{aligned}$$

and hence the condition (5.13) may be simplified to

$$\begin{aligned} \textstyle {\frac{1}{2} d(d-1)}\left( \sigma _j +k_{j-1}^{-1}\right) + \Sigma _j + \sigma _j< 1. \end{aligned}$$
(7.4)
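For the reader wishing to reproduce the modicum of computation leading to the simplification (7.4), the sketch below checks the inequality preceding it at the boundary value \(\psi ^{-1} = d^4 + \frac{3}{2}d^3 - \frac{11}{2}d^2 + d + 2\); since the left hand side is increasing in \(\psi \), this covers all \(\psi \) admissible under (7.3). The range of d is illustrative.

```python
# Check of the inequality preceding (7.4) at the boundary of (7.3);
# the left hand side is increasing in psi, so this is the worst case.
from fractions import Fraction

def D(m):                       # D_m = m(m+1)/2
    return Fraction(m * (m + 1), 2)

for d in range(5, 30):
    psi = 1 / (d**4 + Fraction(3, 2)*d**3 - Fraction(11, 2)*d**2 + d + 2)
    for j in range(3, d + 1):
        Delta_j = Fraction((d - j) * (d - j + 1) * (d - j + 2), 6)
        delta_j = psi * (Delta_j + D(d - j) + (D(d - j + 2) - 2) * (d - 1))
        lhs = (D(j - 1) - 1 + delta_j) \
              / (1 - (D(d - j + 1) + (d - 1) * (d - j + 1)) * psi)
        assert lhs <= Fraction(d * (d - 1), 2)
```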

Upon inserting (7.2), we see that the conditions (5.5), (5.13) (as simplified to (7.4)) and (5.14) of Proposition 6.4 are satisfied whenever

$$\begin{aligned} s - \rho > \max \{a_0(\varpi ) , a_1(\varpi ) , a_2(\varpi ,\psi ) \}, \end{aligned}$$

where

$$\begin{aligned} a_0(\varpi )&= \frac{2^{d-2} d(d+1)}{1-\frac{1}{2} d(d^2-1)\varpi },\\ a_1(\varpi )&= \frac{2^{d-2}(2 d^3 + d^2 - 11 d + 20)}{1 - \frac{1}{12} (d - 1)^2 d (3 d^2 + d + 10)\varpi },\\ a_2(\varpi ,\psi )&= \frac{2^{d-1}(2d-1)\psi }{\varpi (1- \frac{1}{2}d(d^2+d-2)\psi )}. \end{aligned}$$

For this to be defined, we require in particular that

$$\begin{aligned} \varpi ^{-1}> {\textstyle \frac{1}{12} (d - 1)^2 d (3 d^2 + d + 10)}, \end{aligned}$$
(7.5)

which we will assume henceforth.

Meanwhile, to control the second and third terms in (6.16) we require that

$$\begin{aligned}&\frac{\frac{1}{3}(4 d^3 - 19 d + 15)\psi }{1-\Sigma -\sigma } + \frac{\frac{1}{2}(3 d^2 - 7 d + 4)\psi }{\sigma }\nonumber \\&\quad< k_2\theta _2 < \frac{1- \frac{1}{2}(6d^3-11d^2+d+4)\psi }{(2d+1)\sigma }, \end{aligned}$$
(7.6)

while simultaneously the bound (5.15) should be satisfied. Upon re-writing, we see that the interval in (7.6) is non-empty when

$$\begin{aligned} \left( 1+\frac{ \frac{1}{3} (8 d^4 + 4 d^3 - 38 d^2 + 11 d + 15)\psi }{1 - (6 d^3 - 11 d^2 + d + 4) \psi }\right) \sigma + \Sigma <1. \end{aligned}$$
(7.7)

When \(\psi \) satisfies (7.3) one can show for \(d \geqslant 5\) that

$$\begin{aligned} \frac{ \frac{1}{3} (8 d^4 + 4 d^3 - 38 d^2 + 11 d + 15)\psi }{1 - (6 d^3 - 11 d^2 + d + 4) \psi }\leqslant 8, \end{aligned}$$

and hence (7.7) may be simplified to \(9\sigma +\Sigma <1\). In combination with (7.2) this delivers the bound \(s-\rho > b_1(\varpi )\) where

$$\begin{aligned} b_1(\varpi ) = \frac{2^d (d^2+5d-3)}{1 - \frac{1}{3} d(d-1)^2(d+13)\varpi }. \end{aligned}$$
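Both the bound \(\leqslant 8\) above and the coefficient algebra behind \(b_1(\varpi )\) are amenable to mechanical verification; a hedged sketch, again at the boundary value of \(\psi \) and over an illustrative range of d:

```python
# Check that the fraction above is at most 8 at the boundary of (7.3),
# and of the coefficient algebra behind b_1; illustrative ranges only.
from fractions import Fraction
from sympy import symbols, simplify

for d in range(5, 50):
    psi = 1 / (d**4 + Fraction(3, 2)*d**3 - Fraction(11, 2)*d**2 + d + 2)
    num = Fraction(8*d**4 + 4*d**3 - 38*d**2 + 11*d + 15, 3) * psi
    den = 1 - (6*d**3 - 11*d**2 + d + 4) * psi
    assert den > 0 and num / den <= 8

# Combining 9*sigma + Sigma < 1 with (7.2) yields b_1:
x = symbols('d')
assert simplify(9*(x - 1) + (x**2 - 4*x + 6) - (x**2 + 5*x - 3)) == 0
assert simplify(9*x*(x - 1)**2/2 + (2*x - 1)*x*(x - 1)**2/6
                - x*(x - 1)**2*(x + 13)/3) == 0
```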

In order to handle the bound (5.15) one confirms that \(\delta _2/(1-\sigma -\Sigma )\) is smaller than the first term on the left hand side of (7.6), and hence (5.15) is compatible with the right hand side of (7.6) if the inequality

$$\begin{aligned} \psi \varpi ^{-1} <\frac{1- \frac{1}{2}(6d^3-11d^2+d+4)\psi }{(2d+1)\sigma } \end{aligned}$$

is satisfied. Re-arranging yields

$$\begin{aligned} \frac{\psi (2d+1)\varpi ^{-1}}{1-\frac{1}{2}(6d^3-11d^2+d+4)\psi }\sigma <1, \end{aligned}$$

which upon inserting (7.2) delivers the bound \(s-\rho > b_2(\varpi , \psi )\) where

$$\begin{aligned} b_2(\varpi , \psi )&= \frac{2^d(d-1)(2d+1) \psi }{\varpi (1- ( d^4 + \frac{3}{2} d^3 - \frac{11}{2} d^2 + d + 2)\psi )}. \end{aligned}$$

Thus, altogether we have shown that the conclusion of the theorem follows if for some suitable value of \(\varpi \) one has

$$\begin{aligned} s-\rho > \max \{a_0(\varpi ), a_1(\varpi ), a_2(\varpi , \psi ), b_1(\varpi ), b_2(\varpi , \psi ) \}. \end{aligned}$$

We see that \(b_2(\varpi , \psi )>a_2(\varpi , \psi ) \) for all admissible values of \(\psi \) and \(\varpi \). In a similar manner, when \(d \geqslant 5\) we have the inequalities \(a_1(\varpi ) \geqslant \max \{a_0(\varpi ) ,b_1(\varpi )\}\) for all admissible values of \(\varpi \). One can compute (for instance with the help of a computer algebra programme) that \(a_1(\varpi ) = b_2(\varpi , \psi )\) when \(\varpi = \varpi _0(\psi )\), where

$$\begin{aligned} \varpi _0(\psi )=\frac{(d-1)(2d+1)\psi }{d^3 + \frac{1}{2}d^2 - \frac{11}{2} d + 10 - \psi p_6(d)}, \end{aligned}$$

and \(p_6(d)\) is as in the statement of the theorem. The quantity \(\varpi _0(\psi )\) is increasing in \(\psi \), and a final computation confirms that it is admissible within (7.5) for all values of \(\psi \) satisfying (7.3). Thus, for any given value of \(\psi \) within the admissible range the bound \(s-\rho > b_2(\varpi _0(\psi ), \psi )\) dominates overall. Setting \( n_1(\psi )= b_2(\varpi _0(\psi ), \psi )\) concludes the proof of Theorem 7.1. \(\square \)
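As a supplement to the final computation just mentioned, the admissibility of \(\varpi _0(\psi )\) within (7.5) can be tested numerically at the boundary of (7.3); since \(\varpi _0\) is increasing in \(\psi \), this is the critical case. The sketch below uses the printed formulae verbatim and an illustrative range of d.

```python
# Numeric check that varpi_0(psi) satisfies (7.5) at the boundary of (7.3);
# varpi_0 is increasing in psi, so this is the critical case.
from fractions import Fraction

def p6(d):
    return Fraction(50*d**6 - 171*d**5 + 88*d**4 + 517*d**3
                    - 732*d**2 + 8*d - 120, 12)

for d in range(5, 40):
    psi = 1 / (d**4 + Fraction(3, 2)*d**3 - Fraction(11, 2)*d**2 + d + 2)
    varpi0 = ((d - 1) * (2*d + 1) * psi
              / (d**3 + Fraction(1, 2)*d**2 - Fraction(11, 2)*d + 10
                 - psi * p6(d)))
    assert varpi0 > 0
    assert 1 / varpi0 > Fraction((d - 1)**2 * d * (3*d**2 + d + 10), 12)
```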

Theorem 1.3 is a simplification of Theorem 7.1. Indeed, upon choosing \(\psi = \psi _1\) with \(\psi _1^{-1} =2d^4\) we find that

$$\begin{aligned} n_1(\psi _1)&=\frac{2^d( 24 d^7 - 38 d^6 + 39 d^5 + 152 d^4 - 517 d^3 + 732 d^2 - 8 d - 240) }{ 24 d^4 - 36 d^3 + 132 d^2 - 24 d - 48} \\&< 2^{d}d(d^2-1) \end{aligned}$$

for all admissible values of d. Since the function \(n_1(\psi )\) is increasing in \(\psi \), this bound is sufficient for all \(\psi < \psi _1\) also. This completes the proof of Theorem 1.3.
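The inequality \(n_1(\psi _1) < 2^d d(d^2-1)\) can also be checked directly from the definition of \(n_1\) in Theorem 7.1, using the printed formulae for \(n_1\) and \(p_6\) verbatim; a sketch in exact arithmetic over an illustrative range of d:

```python
# Direct check that n_1(psi_1) < 2^d d(d^2 - 1) for psi_1 = 1/(2 d^4),
# computed from the definition of n_1 in Theorem 7.1.
from fractions import Fraction

def p6(d):
    return Fraction(50*d**6 - 171*d**5 + 88*d**4 + 517*d**3
                    - 732*d**2 + 8*d - 120, 12)

def n1(d, psi):
    num = 2**(d - 1) * (d**3 + Fraction(1, 2)*d**2 - Fraction(11, 2)*d
                        + 10 - psi * p6(d))
    den = 1 - (d**4 + Fraction(3, 2)*d**3 - Fraction(11, 2)*d**2 + d + 2) * psi
    return num / den

for d in range(5, 60):
    assert n1(d, Fraction(1, 2 * d**4)) < 2**d * d * (d**2 - 1)
```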

In order to obtain an estimate for \(N_{\mathcal {U}}(X, X^\psi )\) and thus complete the proof of Theorems 1.4 and 1.2, we need to sum over all values of \(\mathbf {y} \in \mathcal {U}(\mathbb {Z})\) satisfying \(|\mathbf {y}| \leqslant X^\psi \) and \(F(\mathbf {y})=0\).

Theorem 7.2

Let \(F \in \mathbb {Z}[x_1, \ldots , x_n]\) be a non-singular form of degree \(d \geqslant 5\) defining a hypersurface \(\mathcal {V}\). Let further \(\psi >0\) be a parameter satisfying

$$\begin{aligned} \psi ^{-1}>{\textstyle d^4 + \frac{3}{2} d^3 - 5 d^2 + \frac{1}{2} d + 2}. \end{aligned}$$
(7.8)

Set

$$\begin{aligned} n_2(\psi )&=\frac{ 2^{d-1} \left( d^3 + \frac{1}{2}d^2 - \frac{11}{2} d + 10 - q_6(d) \psi \right) }{1-(d^4 + \frac{1}{2} d^3 - \frac{5}{2} d^2 - 2 d + 2)\psi } \end{aligned}$$

where \(q_6(d)=\frac{1}{12}(50 d^6 - 165 d^5 + 85 d^4 + 481 d^3 - 639 d^2 - 52 d + 240)\). For some integer \(\rho \) in the range \(\frac{1}{2}d(d+1)+1\leqslant \rho <n\) suppose that \( n-\rho > n_2(\psi )\). Then there exists a positive real number \(\nu \) for which we have the asymptotic formula

$$\begin{aligned} N_{\mathcal {V}_{2, \rho }}(X, Y) = X^{n-D} \sum _{\begin{array}{c} \mathbf {y} \in \mathcal {V}_{2, \rho }(\mathbb {Z}) \\ |\mathbf {y}| \leqslant Y \end{array}} \mathfrak {S}_{\mathbf {y}} \mathfrak {J}_{\mathbf {y}} + O((XY)^{n-D}X^{-\nu }), \end{aligned}$$
(7.9)

and the factors satisfy \(0 \leqslant \mathfrak {S}_{\mathbf {y}} \ll _{\mathbf {y}} 1\) and \(0 \leqslant \mathfrak {J}_{\mathbf {y}} \ll _{\mathbf {y}} 1\).

Proof

Recall from Birch’s theorem [6] that for \(n > 2^d(d-1)\) the number of points \(\mathbf {z} \in \mathbb {Z}^n\) with \(|\mathbf {z}| \leqslant Z\) and \(F(\mathbf {z})=0\) satisfies \(N(Z) \ll Z^{n-d}\). Upon combining (1.2) and Proposition 6.4, we find that

$$\begin{aligned} N_{\mathcal {U}}(X, X^\psi ) = X^{n-D} \sum _{\begin{array}{c} \mathbf {y} \in \mathcal {U}(\mathbb {Z}) \cap \mathcal {A}(\rho ) \\ |\mathbf {y}| \leqslant X^\psi \\ F(\mathbf {y})=0 \end{array}} \mathfrak {S}_{\mathbf {y}} \mathfrak {J}_{\mathbf {y}} + O\left( E_{\mathcal {A}}(\psi )+E_{\mathcal {B}}(\psi )+E_{\mathcal {U}}(\psi )\right) , \end{aligned}$$
(7.10)

where

$$\begin{aligned} E_{\mathcal {A}}(\psi )&= X^{n-D}\sum _{\begin{array}{c} \mathbf {y} \in \mathcal {U}(\mathbb {Z}) \cap \mathcal {A}(\rho ) \\ |\mathbf {y}| \leqslant X^\psi \\ F(\mathbf {y})=0 \end{array}} E(\mathbf {y}, \theta ),&E_{\mathcal {B}}(\psi )&= \sum _{\begin{array}{c} \mathbf {y} \in \mathcal {U}(\mathbb {Z}) \cap \mathcal {B}(\rho ) \\ 0 <|\mathbf {y}| \leqslant X^\psi \\ F(\mathbf {y})=0 \end{array}} \sum _{\begin{array}{c} \mathbf {x} \in \mathcal {U}(\mathbb {Z}) \\ |\mathbf {x}| \leqslant X \\ F(\mathbf {x})=0 \end{array}}1 \end{aligned}$$

and

$$\begin{aligned} E_{\mathcal {U}}(\psi ) = \sum _{\begin{array}{c} |\mathbf {y}| \leqslant X^\psi \\ F(\mathbf {y})=0 \end{array}} \sum _{\begin{array}{c} \mathbf {x} \in \mathcal {V}(\mathbb {Z}) {\setminus } \mathcal {U}(\mathbb {Z}) \\ |\mathbf {x}| \leqslant X \end{array}} 1 \ll X^{\dim (\mathcal {V} {\setminus } \mathcal {U}) + \psi (n-d)}. \end{aligned}$$

The choice \(\mathcal {U} = \mathcal {A}(\rho ) \cap \mathcal {V} = \mathcal {V}_{2,\rho }\) entails that \(\dim \mathcal {V} {\setminus } \mathcal {V}_{2,\rho } = \dim \mathcal {V}^*_{2, \rho } \leqslant n-\rho \), and we conclude that the error \(E_{\mathcal {U}}(\psi )\) is acceptable within (7.10) if \(\rho > D + \psi (D-d)\). In particular, it follows from (3.4) that the choice \(\rho =D+1\) is permissible. Clearly, with this choice of \(\mathcal {U}\) the set \(\mathcal {B}(\rho ) \cap \mathcal {U}\) is empty and we can disregard the error term \(E_{\mathcal {B}}(\psi )\). Thus, it suffices to bound the error \(E_{\mathcal {A}}(\psi )\). We have

$$\begin{aligned} E_{\mathcal {A}}(\psi )&\ll X^{n-D} N(X^{\psi })\sup _{ |\mathbf {y}| \leqslant X^\psi } E(\mathbf {y}, \theta ) \ll (X^{1+\psi })^{n-D}(U_1+ U_2 + U_3), \end{aligned}$$

where

$$\begin{aligned} U_1&= \sum _{j=2}^d X^{-\delta _j+\psi (\Delta _j+D_{d-j}+ (D_{d-j+2}-2)(d-1)+D-d)-\nu },\\ U_2&= X^{-1+(2d+1)\omega + (3d^3-5d^2+2)\psi }, \\ U_3&= X^{{\textstyle \frac{1-\sigma -\Sigma }{\sigma }}(-\omega + \frac{1}{2} (3 d^2 - 7 d + 4)\psi ) +\frac{1}{6}(8 d^3 + 3 d^2 - 41 d + 30)\psi }. \end{aligned}$$

Assuming that

$$\begin{aligned} \delta _j = \psi (\Delta _j+D_{d-j}+ (D_{d-j+2}-2)(d-1)+D-d) \qquad (2 \leqslant j \leqslant d), \end{aligned}$$

the exponent in the first term is negative. With this choice we have \(\delta _d = (D-1) \psi \) and

$$\begin{aligned} \delta _j \leqslant \delta _2=\textstyle {\frac{1}{6}} (4 d^3 + 3 d^2 - 25 d + 18)\psi . \end{aligned}$$

As before, this choice allows us to simplify the conditions (5.5) and (5.13), and we see that they and (5.14) are satisfied whenever \(s-\rho > \max \{a_0(\varpi ), a_1(\varpi ), a_2(\varpi , \psi )\}\), with the same values as in the proof of Theorem 7.1.

Meanwhile, the error terms \(U_2\) and \(U_3\) are acceptable if we can choose \(\theta _2\) such that

$$\begin{aligned}&\frac{\frac{1}{6}(8 d^3 + 3 d^2 - 41 d + 30)\psi }{1-\sigma - \Sigma } + \frac{ \frac{1}{2} (3 d^2 - 7 d + 4)\psi }{\sigma }\nonumber \\&\quad< k_2 \theta _2< \frac{1- (3d^3-5d^2+2)\psi }{(2d+1)\sigma }, \end{aligned}$$
(7.11)

and this interval can be seen to be non-empty if (7.8) is satisfied and further

$$\begin{aligned} \left( 1+ \frac{\frac{1}{6} (16 d^4 + 14 d^3 - 79 d^2 + 19 d + 30)\psi }{1-\frac{1}{2} (12 d^3 - 21 d^2 + d + 8)\psi }\right) \sigma + \Sigma < 1. \end{aligned}$$
(7.12)

When \(\psi \) satisfies (7.8) one can show for \(d \geqslant 5\) that

$$\begin{aligned} \frac{\frac{1}{6} (16 d^4 + 14 d^3 - 79 d^2 + 19 d + 30)\psi }{1-\frac{1}{2} (12 d^3 - 21 d^2 + d + 8)\psi }\leqslant \frac{25}{3}, \end{aligned}$$

and hence (7.12) can be simplified to \(\frac{28}{3}\sigma +\Sigma <1\). Upon recalling (7.2) this gives \(s-\rho > \beta _1(\varpi )\), where

$$\begin{aligned} \beta _1(\varpi ) = \frac{2^d (d^2+\frac{16}{3}d-\frac{10}{3})}{1 - \frac{1}{6} d(d-1)^2(2d+27)\varpi }. \end{aligned}$$

It remains to compare the right hand side of (7.11) with the bound of (5.15). As before, with our choice of \(\delta _2\) we find that the first term in the maximum in (5.15) is bounded above by the left hand side of (7.11). Thus, it suffices to ensure that the interval

$$\begin{aligned} \psi \varpi ^{-1}< k_2 \theta _2< \frac{1- (3d^3-5d^2+2)\psi }{(2d+1)\sigma } \end{aligned}$$

is non-empty. Such is the case when

$$\begin{aligned} \frac{(2d+1)\psi }{\varpi (1-(3 d^3 - 5 d^2 + 2)\psi )}\sigma <1, \end{aligned}$$

and on inserting (7.2) we obtain the bound \(s-\rho > \beta _2(\varpi , \psi )\) where

$$\begin{aligned} \beta _2(\varpi , \psi )= \frac{2^d (d-1)(2d+1) \psi }{\varpi (1- (d^4 + \frac{3}{2} d^3 - 5 d^2 + \frac{1}{2} d + 2)\psi )}. \end{aligned}$$

When \(d \geqslant 5\) one checks by a modicum of computation that \(\beta _2(\varpi , \psi ) \geqslant a_2(\varpi , \psi )\) and that \(a_1(\varpi )\) exceeds both \(\beta _1(\varpi )\) and \(a_0(\varpi )\) in the appropriate ranges for \(\varpi \) and \(\psi \). Just as before, we see that \(a_1(\varpi ) = \beta _2(\varpi , \psi )\) when \(\varpi =\varpi _1(\psi )\), where

$$\begin{aligned} \varpi _1(\psi ) = \frac{2(1+2d)(d-1)\psi }{d^3 + \frac{1}{2}d^2 - \frac{11}{2} d + 10 - q_6(d)\psi }, \end{aligned}$$

and \(q_6(d)\) is the polynomial defined in the statement of the theorem. This is in accordance with (7.5), so that just as before we obtain our final bound \(s-\rho > n_2(\psi )\) where we put \(n_2(\psi )= \beta _2(\varpi _1(\psi ), \psi )\). This completes the proof of the theorem. \(\square \)

As before, one can show that \(n_2(\psi )\) is increasing in \(\psi \), and by taking \(\psi =\psi _1\) with \(\psi _1^{-1} = 2d^4\) we see after some calculations that

$$\begin{aligned} n_2(\psi _1)&=\frac{2^d(24 d^7 - 38 d^6 + 33 d^5 + 155 d^4 - 481 d^3 + 639 d^2 + 52 d - 240)}{24 d^4 - 36 d^3 + 120 d^2 - 12 d - 48} \\&\leqslant 2^d d(d^2-1) - {\textstyle \frac{1}{2} d(d+1)}-1. \end{aligned}$$
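Again this can be verified mechanically from the definition of \(n_2\) in Theorem 7.2, taking the printed formulae for \(n_2\) and \(q_6\) verbatim; a brief sketch in exact arithmetic over an illustrative range of d:

```python
# Check that n_2(psi_1) <= 2^d d(d^2-1) - d(d+1)/2 - 1 for psi_1 = 1/(2 d^4),
# computed from the definition of n_2 in Theorem 7.2.
from fractions import Fraction

def q6(d):
    return Fraction(50*d**6 - 165*d**5 + 85*d**4 + 481*d**3
                    - 639*d**2 - 52*d + 240, 12)

def n2(d, psi):
    num = 2**(d - 1) * (d**3 + Fraction(1, 2)*d**2 - Fraction(11, 2)*d
                        + 10 - q6(d) * psi)
    den = 1 - (d**4 + Fraction(1, 2)*d**3 - Fraction(5, 2)*d**2 - 2*d + 2) * psi
    return num / den

for d in range(5, 60):
    bound = 2**d * d * (d**2 - 1) - Fraction(d * (d + 1), 2) - 1
    assert n2(d, Fraction(1, 2 * d**4)) <= bound
```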

The conclusion of Theorem 1.4 now follows upon choosing \(\rho = \frac{1}{2} d(d+1)+1\).

It thus remains to evaluate the sum over the singular integral and singular series. This task can be dispatched swiftly by invoking Theorem 2.1 in [9] and imitating arguments from [23, Section 8]. For fixed Y we set \(\psi _0 = (d^3(d+\frac{3}{2}) - 1)^{-1}\) and \(X_0 = Y^{1/\psi _0}\). Now assume that

$$\begin{aligned} n - \rho > 2^{d-1}d(d+1)(1+\psi _0^{-1}). \end{aligned}$$
(7.13)

Then by [9, Theorem 2.1] we have the alternative asymptotic formula

$$\begin{aligned} N(X_0, Y) =(X_0Y)^{n-D} \chi _{\infty }\prod _{p \text { prime}} \chi _p + O((X_0Y)^{n-D} Y^{-\nu }). \end{aligned}$$

On the other hand, one can check that the condition in (7.13) is stricter than the hypothesis of Theorem 7.2, so we may compare this bound with (7.9) and deduce that

$$\begin{aligned} \sum _{\begin{array}{c} \mathbf {y} \in \mathcal {U}(\mathbb {Z}) \\ |\mathbf {y}| \leqslant Y\\ F(\mathbf {y})=0 \end{array}} \mathfrak {S}_{\mathbf {y}} \mathfrak {J}_{\mathbf {y}}= Y^{n-D} \chi _{\infty }\prod _{p \text { prime}} \chi _p + O(Y^{n-D-\nu }). \end{aligned}$$
(7.14)

Note in particular that (7.14) no longer depends on \(X_0\). Thus, if (7.13) is satisfied, we are able to replace the sum over the singular series and integral in Theorem 1.4 by a product of local densities as in (7.14). This establishes Theorem 1.1 for all \(\psi \leqslant \psi _0\), while for \(\psi _0 \leqslant \psi \leqslant 1\) the corresponding result follows from Theorem 2.1 in [9]. Finally, we recall that we need \(\rho \geqslant \frac{1}{2}d(d+1)+1\) and note that

$$\begin{aligned} {\textstyle 2^{d-1}d^4(d+1)(d+ \frac{3}{2}) + \frac{1}{2} d(d+1)+1 }\leqslant 2^{d-1}d^4(d+1)(d+2) \end{aligned}$$

for all admissible values d. This completes the proof of Theorem 1.1.
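The final inequality is elementary; for completeness, a one-line numeric check over an illustrative range of d:

```python
# Elementary check of the final display for d = 5, 6, ...
from fractions import Fraction

for d in range(5, 60):
    lhs = 2**(d - 1) * d**4 * (d + 1) * (d + Fraction(3, 2)) \
          + Fraction(d * (d + 1), 2) + 1
    assert lhs <= 2**(d - 1) * d**4 * (d + 1) * (d + 2)
```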

In order to complete the proof of our final result in Theorem 1.2, we note that in this case \(\mathcal {U} = \mathcal {V} {\setminus } \{ \varvec{0}\}\). Thus, the error \(E_{\mathcal {U}}(\psi ) \ll X^{\psi (n-d)}\) is under control, and it remains to understand the error arising from any singular set \(\mathcal {B}(\rho )\). From (4.9) we infer that \(E_{\mathcal {B}}(\psi ) \ll X^{n-d} X^{\psi (n-\rho )}\), which is acceptable within (7.10) if \(\rho > D + d(d-1)/(2\psi )\). Picking \(\rho \) minimal in this way, we can now proceed precisely as in the proof of Theorem 1.1.