## 1 Introduction

We consider here polynomial curves in $${\mathbb {Z}}^d$$ for $$d\ge 1$$ given by

\begin{aligned} \gamma (n) = (P_1(n),\ldots , P_d(n)) \end{aligned}

with $$P_1,\ldots ,P_d$$ univariate polynomials in $${\mathbb {Z}}[X]$$ with the property that their degrees are separated, by which we mean that $$\deg P_j < \deg P_{j+1}$$ for all $$j \in \{1,\ldots , d-1\}$$. The prototypical example of such curves is the moment curve

\begin{aligned} \varGamma (n) :=(n,n^2,n^3,\ldots , n^d). \end{aligned}

To each curve $$\gamma$$ one can associate the sequence of discrete (forward) averages in $${\mathbb {Z}}^d$$ along the curve given by

\begin{aligned} {\mathcal {A}}^{\gamma }_{N}f ({\varvec{x}}) := \frac{1}{N}\sum _{n=1}^{N} f({\varvec{x}}-\gamma (n)), \end{aligned}

where $${\varvec{x}}\in {\mathbb {Z}}^d$$; we will typically omit the superscript if no confusion ensues. In analogy with the $$L^p$$-improving problem for continuous averages, one is interested in studying the $$\ell ^p \rightarrow \ell ^q$$ mapping properties of the operators $${\mathcal {A}}_N$$—in particular, one would like to explore to what extent inequalities of the form

\begin{aligned} \Vert {\mathcal {A}}_N f\Vert _{\ell ^q({\mathbb {Z}}^d)} \le C_{p,q,\gamma }(N) \Vert f\Vert _{\ell ^p({\mathbb {Z}}^d)} \end{aligned}
(1)

can hold and what is the asymptotically smallest constant $$C_{p,q,\gamma }(N)$$ for which (1) holds. In this regard, one is led to the following conjecture.

### Conjecture 1

Let $$D = D_{\gamma } := \sum _{j = 1}^{d} \mathrm {deg} P_j$$ denote the total degree of the curve $$\gamma$$. For any pair of exponents $$p, q$$ such that $$q \ge p$$ and for every $$\epsilon > 0$$ the estimate

\begin{aligned} \Vert {\mathcal {A}}_N f\Vert _{\ell ^{q}({\mathbb {Z}}^d)} \lesssim _{\epsilon } N^{\epsilon }(N^{ - D(1/p - 1/q)} + N^{-1/{q'}} + N^{-1/p}) \Vert f\Vert _{\ell ^{p}({\mathbb {Z}}^d)} \end{aligned}
(2)

holds for every $$f \in \ell ^p({\mathbb {Z}}^d)$$.

That the condition $$q \ge p$$ is necessary can be seen by a standard argument due to Hörmander using the translation invariance of the operators $${\mathcal {A}}_N$$ and the fact that they are local. To show that the best constant $$C_{p,q,\gamma }(N)$$ for which (1) can hold is at least as large as the right-hand side of (2) it suffices to test against some standard examples: in particular, testing against a Dirac delta function one obtains $$C_{p,q,\gamma }(N) \gtrsim N^{-1/{q'}}$$, while testing against the characteristic function of the image $$\gamma ([1,N])$$ one obtains $$C_{p,q,\gamma }(N) \gtrsim N^{-1/p}$$; finally, the remaining condition is obtained by testing against the characteristic function of (a suitable dilate of) a parabolic box of sidelengths $$\sim N^{\mathrm {deg} P_1} \times \ldots \times N^{\mathrm {deg} P_d}$$. Thus the conjecture states that the necessary powers of N given by such examples are also sufficient.
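These scalings are easy to check numerically. The following sketch (with the parabola $$\gamma (n) = (n,n^2)$$ and the exponents $$p = 3/2$$, $$q = p' = 3$$ chosen purely for illustration) verifies that the Dirac delta example produces exactly the $$N^{-1/{q'}}$$ decay: since $$\gamma$$ is injective on [1, N], $$\Vert {\mathcal {A}}_N \delta _0\Vert _{\ell ^q} = N^{1/q - 1} = N^{-1/q'}$$, so the log-log slope should be $$-1/q' = -2/3$$.

```python
import math
from collections import Counter

def average_delta(N, gamma):
    """A_N applied to a Dirac delta at the origin: mass 1/N at each point gamma(n), n = 1..N."""
    out = Counter()
    for n in range(1, N + 1):
        out[gamma(n)] += 1.0 / N
    return out

def lq_norm(f, q):
    return sum(abs(v) ** q for v in f.values()) ** (1.0 / q)

gamma = lambda n: (n, n * n)  # the parabola in Z^2, an illustrative choice
q = 3.0                       # q = p' for p = 3/2

# Log-log slope of N |-> ||A_N delta_0||_{l^q}; should be -1/q' = -2/3.
N1, N2 = 100, 1000
slope = (math.log(lq_norm(average_delta(N2, gamma), q))
         - math.log(lq_norm(average_delta(N1, gamma), q))) / math.log(N2 / N1)
print(slope)  # ≈ -2/3
```

The same harness, applied to the characteristic function of $$\gamma([1,N])$$, recovers the $$N^{-1/p}$$ example.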

### Remark 1

The $$\epsilon$$-loss in the exponents of N does not arise from the aforementioned examples and has been included out of an abundance of caution. It is possibly absent from the true estimates and can sometimes be removed by suitable $$\epsilon$$-removal lemmata, but we will not concern ourselves with such questions here.

If an estimate of the form (1) is shown to hold for a certain pair $$p, q$$ with constant $$C_{p,q,\gamma }(N)$$ as in (2), we will say that the estimate is optimal. In this paper we shall be concerned exclusively with optimal estimates. Moreover, the regime in which the term $$N^{- D(1/p - 1/q)}$$ on the right-hand side of (2) dominates over the other two will be called the supercritical regime: by inspection, it consists of those exponents $$p, q$$ such that

\begin{aligned} {\left\{ \begin{array}{ll} q \ge p, \\ \frac{D}{q}> \frac{D-1}{p}, \\ \frac{D}{p'} > \frac{D-1}{q'}; \end{array}\right. } \end{aligned}
(3)

on a (1/p, 1/q) diagram, it corresponds to a triangle with one side given by the $$1/p = 1/q$$ line and the vertex opposite to it given by the critical endpoint $$(1/p_c, 1/{p_c'}) = \big (\frac{D}{2D - 1}, \frac{D-1}{2D-1}\big )$$ (that is, $$p_c = 2 - \frac{1}{D}$$). The complement of the supercritical regime in the $$q \ge p$$ range will be called the subcritical regime. Estimates for exponents $$p, q$$ in either regime will be named accordingly. See Fig. 1 for a pictorial representation of the regimes on a (1/p, 1/q) diagram.
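For completeness, the coordinates of the critical vertex follow from (3) by a short computation, recorded here:

```latex
% Intersect the two critical lines D/q = (D-1)/p and D/p' = (D-1)/q'.
% By the symmetry of the system the intersection lies on the line q = p',
% so that 1/q = 1 - 1/p; substituting into the first line gives
\begin{aligned}
D\Big(1 - \frac{1}{p}\Big) = \frac{D-1}{p}
\;\Longrightarrow\; D = \frac{2D-1}{p}
\;\Longrightarrow\; \frac{1}{p_c} = \frac{D}{2D-1},
\end{aligned}
% hence p_c = (2D-1)/D = 2 - 1/D and
% 1/q_c = 1 - D/(2D-1) = (D-1)/(2D-1) = 1/p_c'.
```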

### Remark 2

There seems to be a certain difference between the discrete and continuous case: in the former the supercritical regime is conjectured to correspond to a triangle in the (1/p, 1/q) diagram, while in the latter it is known (for suitably regular curves or for suitably weighted averages) that the range of boundedness corresponds to a trapezoid instead (when $$d>2$$).

A first incarnation of the problem considered here has been the problem of studying the $$\ell ^p \rightarrow \ell ^q$$ mapping properties of discrete fractional integrals along discrete varieties (see [6] for why the two are quite related). This issue has attracted a great deal of attention over the years—see [10, 12, 13, 14, 15, 16, 17] for a selection of works in the area. The $$\ell ^p$$-improving problem for discrete averages $${\mathcal {A}}_N$$ proper is more recent but has seen a certain degree of activity lately. In particular, in [7] Han et al. have studied the $$\ell ^p$$-improving properties of the averages $${\mathcal {A}}^{\gamma }_N$$ along the polynomial $$\gamma (n) = n^2$$, while in [6] the same authors together with Kovač and Madrid have studied the case of $$\gamma = \varGamma$$ the moment curve. In the latter, using the celebrated solution to the Vinogradov Mean Value conjecture of Wooley [21] for $$d=3$$ and of Bourgain et al. [2] for arbitrary d, they have shown for the moment curve the optimal supercritical estimates (2) for exponents $$p, q$$ such that

\begin{aligned} {\left\{ \begin{array}{ll} q \ge p, \\ \frac{D+1/2}{q}> \frac{D-1/2}{p}, \\ \frac{D+1/2}{p'} > \frac{D-1/2}{q'}, \end{array}\right. } \end{aligned}

which is the (strict) subset of the supercritical regime given by interpolation of the trivial $$\ell ^p \rightarrow \ell ^p$$ inequalities with the optimal endpoint $$\ell ^{p_0} \rightarrow \ell ^{{p_0}'}$$ inequality with $$p_0 = \frac{4D}{2D+1}$$, where here $$D = D_{\varGamma } = \frac{d(d+1)}{2}$$ (see however Sect. 2.2 for why the range they obtain is actually larger). By a transference argument they also obtained some supercritical optimal estimates for the curve given by the monomial $$\gamma (n) = n^k$$; however, along the $$q = p'$$ line the transference argument yields the optimal $$\ell ^p \rightarrow \ell ^{p'}$$ inequality only for $$2 \ge p \ge 2 - O(1/k^2)$$, while the conjectured endpoint is $$2 - 1/k$$.

We also mention the recent work of Dasu et al. [4] in which they have completely solved the analogous $$\ell ^p$$-improving problem for the discrete paraboloid in $${\mathbb {Z}}^d$$ for any $$d \ge 2$$. The obvious connection with the case considered here is given by $$d=2$$, in which the paraboloid is simply the parabola $$\gamma (n) = (n,n^2)$$. Here we will reprove their endpoint result for the parabola (see (ii) of Theorem 1 below).

Paraboloids are not the only discrete hypersurfaces for which the $$\ell ^p$$-improving problem for the associated averages has been studied—see [1, 8, 11] for work on the discrete sphere.

### 1.1 Main Results

All the aforementioned papers [4, 6, 7] deal with estimates for $${\mathcal {A}}_N$$ in the supercritical regime. In this work we concentrate instead on the subcritical regime and prove some new optimal subcritical estimates for certain classes of curves. Our results can be summarised in the restricted weak-type estimates on the $$q = p'$$ line listed in the following theorem.

### Theorem 1

Let $$\gamma$$ be a polynomial curve with separated degrees. Then the following optimal inequalities hold:

1. (i)

Suppose that $$d \ge 1$$ and that $$\gamma (n)$$ is not a single linear polynomial. Then we have for any finite sets $$E,F \subset {\mathbb {Z}}^d$$

\begin{aligned} \langle {\mathcal {A}}_N {\mathbf {1}}_E, {\mathbf {1}}_F \rangle \lesssim _{\gamma ,\epsilon } N^{-2/3 + \epsilon } |E|^{2/3} |F|^{2/3}. \end{aligned}
(4)
2. (ii)

Suppose that $$d\ge 2$$ and that the first component of $$\gamma$$ is a linear polynomial. Then we have for any finite sets $$E,F \subset {\mathbb {Z}}^d$$

\begin{aligned} \langle {\mathcal {A}}_N {\mathbf {1}}_E, {\mathbf {1}}_F \rangle \lesssim _{\gamma ,\epsilon } N^{-3/5 + \epsilon } |E|^{3/5} |F|^{3/5}. \end{aligned}
(5)
3. (iii)

Suppose that $$d\ge 3$$ and the first three components $$P_1,P_2,P_3$$ of $$\gamma$$ have degrees respectively equal to 1, 2 and 3. Then we have for any finite sets $$E,F \subset {\mathbb {Z}}^d$$

\begin{aligned} \langle {\mathcal {A}}_N {\mathbf {1}}_E, {\mathbf {1}}_F \rangle \lesssim _{\gamma ,\epsilon } N^{-4/7 + \epsilon } |E|^{4/7} |F|^{4/7}. \end{aligned}
(6)

With the methods of this paper it is a priori possible to extend the list of results further to include additional higher dimensional situations; however, the success of the strategy hinges on the availability of certain solution-counting bounds for systems of diophantine equations. See Remarks 10 and 11 for details.

### Remark 3

Inspection of the proof of Theorem 1 reveals that the $$\epsilon$$-losses in the form of the factors $$N^{\epsilon }$$ above can be more precisely quantified to be of the form $$\lesssim e^{C \frac{\log N}{\log \log N}}$$ for some $$C>0$$.

### Remark 4

Observe that, if $$Q \in {\mathbb {Z}}[X]$$ is non-constant, (4) holds equally true for curves $$\gamma$$ and $$\gamma \circ Q$$ (and is always optimal). From the statement of case (ii), it appears at first sight that we no longer have this freedom of composition for (5). However, further inspection of the proof reveals that case (ii) of Theorem 1 admits the following extension. Let $${\mathfrak {X}} \subset {\mathbb {Z}}$$ be a sequence and let $${\mathfrak {X}}_N := {\mathfrak {X}} \cap [1,N]$$ for any $$N>0$$; we can define the averages along $$\gamma$$ restricted to $${\mathfrak {X}}$$ as

\begin{aligned} {\mathcal {A}}^{\gamma }_{{\mathfrak {X}},N}f({\varvec{x}}) := \frac{1}{|{\mathfrak {X}}_N|} \sum _{m\in {\mathfrak {X}}_N} f({\varvec{x}} - \gamma (m)). \end{aligned}

When $$Q \in {\mathbb {Z}}[X]$$ is a non-constant univariate polynomial and $${\mathfrak {X}}^Q$$ is the set of its values, that is $${\mathfrak {X}}^Q= \{ Q(n) : n \in {\mathbb {Z}} \}$$, the operator $${\mathcal {A}}^{\gamma }_{{\mathfrak {X}}^Q,N}$$ is essentially the same as $${\mathcal {A}}^{\gamma \circ Q}_{M}$$ with $$M \sim N^{1/\deg Q}$$ (notice that $$|{\mathfrak {X}}^Q_N| \sim N^{1/\deg Q}$$). In this situation, inspecting the proof of Theorem 1 (and in particular that of Lemma 5) we see that case (ii) of Theorem 1 extends to

\begin{aligned} \langle {\mathcal {A}}_{{\mathfrak {X}}^Q,N} {\mathbf {1}}_E, {\mathbf {1}}_F \rangle \lesssim _{\gamma ,\epsilon } |{\mathfrak {X}}^Q_N|^{-3/5 + \epsilon } |E|^{3/5} |F|^{3/5} \end{aligned}

(so that (5) holds for $$\gamma \circ Q$$ as well). Testing against the Dirac delta function (or its dual example) verifies that this estimate is optimal.

We stress that in each of the three cases in Theorem 1 the estimate is indeed in the subcritical regime: in the first case the critical exponent $$p_c$$ is at least $$2 - 1/2 = 3/2$$, in the second at least $$2 - 1/3 = 5/3$$ and in the third at least $$2 - 1/6 = 11/6 > 7/4$$.

Lorentz interpolation of each of the above estimates with the trivial $$\ell ^p \rightarrow \ell ^{\infty }$$ and $$\ell ^1 \rightarrow \ell ^q$$ inequalities yields a range of strong-type optimal subcritical estimates. If one has optimal estimates on the critical lines (that separate the super- and subcritical regimes; see Fig. 1), it is possible to interpolate with those as well and obtain an even larger range of subcritical estimates (see Sect. 2.1). In particular, when the exponents of estimates (4) and (5) coincide with the critical exponents $$p_c, {p_c}'$$ we obtain by interpolation the full Conjecture 1—see the following corollary.

### Corollary 1

Conjecture 1 holds for all $$q \ge p$$ when:

1. (i)

$$d=1$$ and $$\gamma (n)$$ is a quadratic polynomial;

2. (ii)

$$d=2$$ and $$\gamma (n)$$ is a parabola in $${\mathbb {Z}}^2$$.

Case (i) of the Corollary recovers the corresponding results of [6, 7], while case (ii) recovers the 2D case of [4].

### Remark 5

For the parabola, the $$\epsilon$$-removal technology of [4] allows one to remove the $$\epsilon$$-loss (caused by interpolation) from the interior of the supercritical regime. The reach of such technology seems currently limited to the supercritical regime. We mention here in this regard that, since (4) holds without the $$\epsilon$$-loss when $$d\ge 2$$, one obtains by interpolation $$\epsilon$$-free optimal estimates in a subset of the subcritical range.

The proof of Theorem 1 relies on an adaptation to our arithmetic setting of the method of refinements introduced by Christ in [3] to study the continuous $$L^p$$-improving problem. The problem has been completely solved by the method in the case of continuous curves, see [3, 5, 18]. Summarising briefly for the reader unfamiliar with it, the essence of the method (at least in our adaptation) consists in a combinatorial reformulation of the restricted weak-type inequalities that translates them into lower bounds for |E| in terms of parameters $$\alpha ,\beta$$ such that $$\alpha |F|= \beta |E| = \langle {\mathcal {A}}_N {\varvec{1}}_E, {\varvec{1}}_F\rangle$$; this allows one to set up a “flowing” procedure based on $$\gamma$$, taking us alternately from the set E to the set F and vice versa. One can show that the procedure yields a “large” (in terms of $$\alpha , \beta$$) set of parameters $$(n_1, m_1, \ldots , n_k,m_k) \in [1,N]^{2k}$$ such that for a carefully chosen $${\varvec{y}} \in E$$ we have for all these parameters

\begin{aligned} {\varvec{y}} + \gamma (n_1) - \gamma (m_1) + \ldots + \gamma (n_k) - \gamma (m_k) \in E. \end{aligned}

If one views this expression as a map, in our discrete context it is possible to obtain a lower bound for |E| by estimating the multiplicity of the map; this lower bound then translates back into restricted weak-type estimates. The multiplicity estimates translate quite directly into a classical number-theory problem—that of bounding the number of solutions to certain diophantine equations.

While preparing this manuscript we became aware that—not surprisingly—the method of refinements has been applied to such discrete questions before. In fact, Oberlin [13] and Kim [12] used it to prove $$\ell ^p \rightarrow \ell ^q$$ estimates for certain discrete fractional integrals along curves—[12] in particular provides somewhat general conditional statements. In contrast to [12, 13], in our adaptation of the method we additionally prune the combinatorial tower of parameters so as to ensure that in our multiplicity bounds we never encounter equations of the form (7) (see Sect. 2.2) but rather their inhomogeneous version. This pruning is crucial to us as the latter are expected to have fewer solutions than (7), since they do not admit solutions of diagonal type.

The rest of the paper is organised as follows: in Sect. 2 we review certain basic facts about the averages $${\mathcal {A}}_N$$ and describe how the aforementioned result of [6] is proven; in Sect. 3 we develop an arithmetic version of the method of refinements as needed in our case, with which the proof of Theorem 1 is reduced to proving bounds for the number of solutions to certain diophantine systems that arise in the process; in Sect. 4 we prove such bounds by elementary arguments, thus completing the proof of Theorem 1.

### 1.2 Notation and Basic Facts

Throughout this manuscript we use $$A \lesssim B$$ to denote the inequality $$A \le C B$$ for some suppressed constant $$C>0$$; $$A \sim B$$ means $$A \lesssim B$$ and $$B \lesssim A$$. When the suppressed constant depends on a certain list $${\mathcal {L}}$$ of parameters we highlight this by writing $$A \lesssim _{{\mathcal {L}}} B$$. Moreover, in conditional statements we will use $$A \gg B$$ to denote the inequality $$A \ge C B$$ for some sufficiently large constant $$C>0$$.

We use [1, N] as shorthand for the set of integers $$\{1, \ldots , N\}$$. If $$E \subset {\mathbb {Z}}^d$$ then |E| denotes its cardinality.

In Sect. 4 we will repeatedly make use of the so-called divisor bound, which states that the number of distinct divisors of $$n \ne 0$$ is bounded by $$\lesssim e^{C \frac{\log n}{\log \log n}}$$; however, we will limit ourselves to the weaker version that states that for every $$\epsilon > 0$$ the number of divisors of $$n\lesssim N$$ is $$\lesssim _\epsilon N^\epsilon$$.
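The divisor bound is easy to witness numerically (a sanity check, not part of any proof): sieving divisor counts up to a modest N shows that their maximum, attained at highly composite numbers, is far below any small power of N.

```python
def divisor_counts(N):
    """Return the list d[0..N] with d[n] = number of divisors of n, computed by a sieve."""
    d = [0] * (N + 1)
    for i in range(1, N + 1):
        for m in range(i, N + 1, i):
            d[m] += 1
    return d

N = 100_000
dmax = max(divisor_counts(N))
# The divisor bound gives dmax <= exp(C log N / log log N), which is eventually
# below N^eps for every eps > 0; already here dmax is tiny compared with N^{1/2}.
print(dmax, int(N ** 0.5))
```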

## 2 Preliminaries

In this section we record some observations about the affine structure of the problem and then discuss how one can obtain estimates of the form (1) by a simple Hausdorff-Young argument. The former will allow us to reduce case (i) of Theorem 1 to curves $$\gamma$$ of the form P(n), case (ii) of Theorem 1 to curves $$\gamma$$ of the form (nP(n)) and case (iii) to $$\gamma$$ the moment curve in $$d=3$$, that is $$\varGamma (n) = (n,n^2,n^3)$$. The latter discussion will provide some context and allow us to illustrate what is the range in which Conjecture 1 is currently known to hold for the moment curve.

### 2.1 Affine Transformations and Projections

In the discrete context an affine transformation that maps $${\mathbb {Z}}^d$$ into itself does not necessarily have an inverse, which hampers the usual change-of-variables arguments. Consider however the following special examples of linear transformations (we ignore the translations since they are harmless):

1. (i)

A linear transformation $$T : {\mathbb {Z}}^d \rightarrow {\mathbb {Z}}^d$$ such that

\begin{aligned} T(x_1, \ldots , x_d) = (a_1 x_1, \ldots , a_d x_d) \quad \forall {\varvec{x}} \in {\mathbb {Z}}^d \end{aligned}

for some non-zero integers $$a_1, \ldots , a_d$$, which we will refer to as integer dilation;

2. (ii)

A linear transformation $$T : {\mathbb {Z}}^d \rightarrow {\mathbb {Z}}^d$$ such that

\begin{aligned} T(x_1, \ldots , x_d) = (x_1, \ldots , x_{j-1}, x_j - b x_k, x_{j+1},\ldots , x_d) \quad \forall {\varvec{x}} \in {\mathbb {Z}}^d \end{aligned}

for some integer b and $$k \ne j$$, which we will refer to as integer shear.

Then the following still holds.

### Lemma 1

Let $$T : {\mathbb {Z}}^d \rightarrow {\mathbb {Z}}^d$$ be a linear transformation obtained by the composition of integer dilations and integer shears. Then for any curve $$\gamma$$ and any $$N, p, q$$ such that $$q \ge p$$ we have for the averages associated to $$T\gamma$$

\begin{aligned} \Vert {\mathcal {A}}^{T\gamma }_N \Vert _{\ell ^p({\mathbb {Z}}^d) \rightarrow \ell ^q({\mathbb {Z}}^d)} \le \Vert {\mathcal {A}}^{\gamma }_N \Vert _{\ell ^p({\mathbb {Z}}^d) \rightarrow \ell ^q({\mathbb {Z}}^d)}, \end{aligned}

and similarly for the restricted weak-type norms.

### Proof

We consider here only the strong operator norm, since the result for restricted weak-type norms requires only trivial modifications.

It clearly suffices to check for a single integer dilation and a single integer shear separately. Abandoning for a moment our convention about the ordering of the polynomial degrees in the components of $$\gamma$$, we can assume that

\begin{aligned} T(x_1, \ldots , x_d) = (a x_1, x_2, \ldots , x_d) \end{aligned}

or

\begin{aligned} T(x_1, \ldots , x_d) = (x_1 - b x_2, x_2, x_3, \ldots , x_d). \end{aligned}

In the first case, write $${\varvec{x}} = (x_1,{\varvec{x}}') \in {\mathbb {Z}} \times {\mathbb {Z}}^{d-1}$$ and let $$z, r$$ be the unique integers such that $$x_1 = az + r$$ with $$0\le r<a$$; then if we let $$g_r(s,{\varvec{y}}) := f(as + r, {\varvec{y}})$$ we have

\begin{aligned} {\mathcal {A}}^{T\gamma }_{N} f(x_1, {\varvec{x}}') = {\mathcal {A}}^{\gamma }_{N} g_r(z,{\varvec{x}}'). \end{aligned}

Therefore we have

\begin{aligned} \sum _{{\varvec{x}} \in {\mathbb {Z}}^d} |{\mathcal {A}}^{T\gamma }_{N} f({\varvec{x}})|^q&= \sum _{r=0}^{a-1} \sum _{(z,{\varvec{x}}') \in {\mathbb {Z}} \times {\mathbb {Z}}^{d-1}} |{\mathcal {A}}^{\gamma }_{N} g_r(z,{\varvec{x}}')|^q \\&\le \Vert {\mathcal {A}}^{\gamma }_{N}\Vert _{p \rightarrow q}^q \sum _{r = 0}^{a-1}\Big (\sum _{(z,{\varvec{y}}) \in {\mathbb {Z}} \times {\mathbb {Z}}^{d-1}} |g_r(z,{\varvec{y}})|^p \Big )^{q/p} \\&\le \Vert {\mathcal {A}}^{\gamma }_{N}\Vert _{p \rightarrow q}^q \Big ( \sum _{r = 0}^{a-1}\sum _{(z,{\varvec{y}}) \in {\mathbb {Z}} \times {\mathbb {Z}}^{d-1}} |g_r(z,{\varvec{y}})|^p \Big )^{q/p} \\ {}&= \Vert {\mathcal {A}}^{\gamma }_{N}\Vert _{p \rightarrow q}^q \Vert f\Vert _{\ell ^p({\mathbb {Z}}^d)}^q, \end{aligned}

where we have used the fact that for $$q \ge p$$ the $$\ell ^q$$ norm is smaller than the $$\ell ^p$$ one. This proves the Lemma for integer dilations.

In the second case, the transformation T is a linear bijection over $${\mathbb {Z}}^d$$ with well-defined inverse. A standard change of variables argument (notice that integer shears leave the $$\ell ^p$$ norms unchanged) then concludes the proof of the Lemma. $$\square$$
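The change-of-variables identity underlying the dilation case can be verified numerically. In the sketch below the curve, the dilation factor $$a = 3$$ and the test function are all illustrative choices; the assertion checks $${\mathcal {A}}^{T\gamma }_{N} f(x_1, {\varvec{x}}') = {\mathcal {A}}^{\gamma }_{N} g_r(z,{\varvec{x}}')$$ pointwise.

```python
import random

def avg(curve, N, f, x):
    """Un-normalised average sum_{n=1}^N f(x - curve(n)); f is a dict read as 0 off its support."""
    return sum(f.get((x[0] - curve(n)[0], x[1] - curve(n)[1]), 0) for n in range(1, N + 1))

a, N = 3, 20
gamma = lambda n: (n, n * n)       # an illustrative curve in Z^2
Tgamma = lambda n: (a * n, n * n)  # T = integer dilation of the first coordinate

random.seed(0)
f = {(random.randrange(-50, 50), random.randrange(-50, 450)): random.randrange(1, 5)
     for _ in range(200)}

for _ in range(100):
    x1, x2 = random.randrange(-60, 120), random.randrange(-60, 460)
    z, r = divmod(x1, a)  # x1 = a z + r with 0 <= r < a (floor division handles x1 < 0)
    g_r = {((u - r) // a, y): v for (u, y), v in f.items() if u % a == r}
    assert avg(Tgamma, N, f, (x1, x2)) == avg(gamma, N, g_r, (z, x2))
print("dilation identity verified")
```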

It is an immediate consequence of Lemma 1 that it will suffice to prove case (ii) of Theorem 1 for curves of the form $$\gamma (n) = (n, P_2(n), \ldots , P_d(n))$$; moreover, it will similarly suffice to prove case (iii) of Theorem 1 for curves of the form $$\gamma (n) = (n,n^2,n^3, P_4(n), \ldots , P_d(n))$$. However, as anticipated, a further reduction is possible—we encapsulate it in the following lemma.

### Lemma 2

Let $$\gamma _1, \gamma _2$$ be polynomial curves mapping into $${\mathbb {Z}}^{d_1}, {\mathbb {Z}}^{d_2}$$ respectively; $$(\gamma _1(n),\gamma _2(n))$$ is then a polynomial curve mapping into $${\mathbb {Z}}^{d_1} \times {\mathbb {Z}}^{d_2}$$. If $$q \ge p$$ then we have

\begin{aligned} \Vert {\mathcal {A}}^{(\gamma _1,\gamma _2)}_{N}\Vert _{\ell ^{p}({\mathbb {Z}}^{d_1} \times {\mathbb {Z}}^{d_2}) \rightarrow \ell ^q({\mathbb {Z}}^{d_1} \times {\mathbb {Z}}^{d_2})} \le \Vert {\mathcal {A}}^{\gamma _1}_{N} \Vert _{\ell ^p({\mathbb {Z}}^{d_1}) \rightarrow \ell ^q({\mathbb {Z}}^{d_1})}, \end{aligned}

and similarly for the restricted weak-type norms and for $$\gamma _2$$ in place of $$\gamma _1$$.

### Proof (outline)

We consider only the strong norms for simplicity. By Minkowski’s inequality and the fact that $$q\ge p$$, we have for any fixed $${\varvec{x}} \in {\mathbb {Z}}^{d_1}$$ that

\begin{aligned}&\Big (\sum _{{\varvec{y}} \in {\mathbb {Z}}^{d_2}} \Big | \frac{1}{N} \sum _{n=1}^{N} f({\varvec{x}} - \gamma _1(n), {\varvec{y}} - \gamma _2(n))\Big |^q \Big )^{1/q} \\&\quad \le \frac{1}{N} \sum _{n=1}^{N} \Big (\sum _{{\varvec{y}} \in {\mathbb {Z}}^{d_2}} |f({\varvec{x}} - \gamma _1(n), {\varvec{y}})|^p \Big )^{1/p}; \end{aligned}

this can be reformulated as

\begin{aligned} \Vert {\mathcal {A}}^{(\gamma _1,\gamma _2)}_{N} f({\varvec{x}}, \cdot )\Vert _{\ell ^{q}({\mathbb {Z}}^{d_2})} \le {\mathcal {A}}^{\gamma _1}_{N}(\Vert f(\cdot ,\cdot )\Vert _{\ell ^{p}({\mathbb {Z}}^{d_2})})({\varvec{x}}). \end{aligned}

Taking the $$\ell ^q({\mathbb {Z}}^{d_1})$$ norm in $${\varvec{x}}$$ on both sides and appealing to the $$\ell ^p({\mathbb {Z}}^{d_1}) \rightarrow \ell ^q({\mathbb {Z}}^{d_1})$$ boundedness of $${\mathcal {A}}^{\gamma _1}_{N}$$ we conclude. $$\square$$
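The displayed pointwise inequality can also be sanity-checked numerically; in the sketch below the choices $$d_1 = d_2 = 1$$, $$\gamma _1(n) = n$$, $$\gamma _2(n) = n^2$$ and the random non-negative data are all illustrative assumptions.

```python
import random

def col_norm(f, u, p):
    """l^p norm in the second variable of f(u, .)."""
    return sum(v ** p for (x, y), v in f.items() if x == u) ** (1.0 / p)

def lhs(f, N, x, q):
    """l^q norm in y of the average (1/N) sum_n f(x - n, y - n^2)."""
    ys = {y + n * n for (_, y) in f for n in range(1, N + 1)}  # covers the support
    tot = 0.0
    for y in ys:
        val = sum(f.get((x - n, y - n * n), 0.0) for n in range(1, N + 1)) / N
        tot += val ** q
    return tot ** (1.0 / q)

def rhs(f, N, x, p):
    """The gamma_1-average of the l^p column norms, evaluated at x."""
    return sum(col_norm(f, x - n, p) for n in range(1, N + 1)) / N

random.seed(1)
f = {(random.randrange(-10, 10), random.randrange(-10, 120)): random.random()
     for _ in range(300)}
N, p, q = 8, 1.5, 3.0  # q >= p, as in the lemma
for x in range(-12, 12):
    assert lhs(f, N, x, q) <= rhs(f, N, x, p) + 1e-9
print("pointwise Minkowski bound verified")
```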

An immediate consequence of Lemmas 1 and 2 is therefore that it will suffice:

• To prove case (i) of Theorem 1 in the case $$\gamma (n) = P(n)$$ a univariate polynomial with $$\deg P \ge 2$$;

• To prove case (ii) of Theorem 1 in the case $$\gamma (n) = (n,P(n))$$ with P a univariate polynomial with $$\deg P \ge 2$$.

• To prove case (iii) of Theorem 1 in the case $$\gamma (n) = (n,n^2,n^3)$$.

### 2.2 Estimates Using Hausdorff-Young

The operators $${\mathcal {A}}_N$$ are convolution operators; in particular, if we let

\begin{aligned} \mu _N := \frac{1}{N} \sum _{n=1}^{N} \delta _{\gamma (n)} \end{aligned}

we have explicitly $${\mathcal {A}}_N f = f *\mu _N$$. When p is of the form $$\frac{4s}{2s+1}$$ (for $$s \ge 1/2$$), so that $$1/p - 1/{p'} = 1/(2s)$$, one can then argue as follows:

\begin{aligned} \Vert {\mathcal {A}}_N f\Vert _{\ell ^{p'}({\mathbb {Z}}^d)}&= \Vert f *\mu _N \Vert _{\ell ^{p'}({\mathbb {Z}}^d)} \le \Vert {\widehat{f}} \cdot \widehat{\mu _N} \Vert _{L^{p}({\mathbb {T}}^d)} \\&\le \Vert {\widehat{f}}\Vert _{L^{p'}({\mathbb {T}}^d)} \Vert \widehat{\mu _N}\Vert _{L^{2s}({\mathbb {T}}^d)} \le \Vert f\Vert _{\ell ^{p}({\mathbb {Z}}^d)} \Vert \widehat{\mu _N}\Vert _{L^{2s}({\mathbb {T}}^d)}, \end{aligned}

where we have used the Hausdorff-Young inequality twice. A bound for $$\Vert \widehat{\mu _N}\Vert _{L^{2s}({\mathbb {T}}^d)}$$ will then result in an $$\ell ^p$$-improving inequality of the form (1). By a standard orthogonality calculation one notices that when s is an integer $$N^{2s} \Vert \widehat{\mu _N}\Vert _{L^{2s}({\mathbb {T}}^d)}^{2s}$$ coincides with the number of solutions with $$n_j,m_j$$ in [1, N] to the system of d diophantine equations

\begin{aligned} \gamma (n_1) + \ldots + \gamma (n_s) = \gamma (m_1) + \ldots + \gamma (m_s); \end{aligned}
(7)

a bound for this number will then result in an $$\ell ^p$$-improving inequality as well.

When $$\gamma = \varGamma$$ the moment curve, the system of equations (7) is the so-called Vinogradov diophantine system and it was shown in [2, 21] that the number of solutions is $$\lesssim _{\epsilon } N^{\epsilon }(N^s + N^{2s - D_{\varGamma }})$$. Picking the critical value $$s = D_{\varGamma }$$ one obtains by the above argument, with $$p_0 = 4D_{\varGamma } / (2D_{\varGamma } + 1)$$ as before,

\begin{aligned} \Vert {\mathcal {A}}^{\varGamma }_{N}f\Vert _{\ell ^{p_0'}({\mathbb {Z}}^d)} \lesssim _{\epsilon } N^{-1/2 + \epsilon } \Vert f\Vert _{\ell ^{p_0}({\mathbb {Z}}^d)}, \end{aligned}
(8)

which can be verified to be optimal. This is how the $$\ell ^p$$-improving result in [6] was obtained.
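For small N the solution count in (7) can be computed by brute force; the sketch below (for the moment curve in $$d = 2$$, where $$D_{\varGamma } = 3$$, with $$s = 3$$, an illustrative choice) is a sanity check of the $$N^{2s - D}$$ growth, not a proof.

```python
from collections import Counter
from itertools import product

def vinogradov_count(N, s):
    """Number of solutions in [1,N]^{2s} of sum n_j = sum m_j, sum n_j^2 = sum m_j^2
    (the Vinogradov system for the curve (n, n^2)), counted via matching sum-pairs."""
    sums = Counter()
    for tup in product(range(1, N + 1), repeat=s):
        sums[(sum(tup), sum(t * t for t in tup))] += 1
    return sum(c * c for c in sums.values())

# With D = 3 and s = 3 the bound predicts roughly N^eps (N^s + N^{2s-D}) = N^{3+eps};
# diagonal solutions (m a permutation of n) alone already contribute >= N^3.
for N in (8, 16, 32):
    print(N, vinogradov_count(N, 3))
```

The upper assertion in the test uses the elementary bound: fixing $$(n_1,n_2,n_3,m_1)$$ determines the unordered pair $$\{m_2, m_3\}$$, giving at most $$2N^4$$ solutions.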

We point out however that the argument can actually yield a little more than the above: indeed, by using Hausdorff-Young only once, we have for $$q = 2s/(s-1)$$

\begin{aligned} \Vert {\mathcal {A}}_N f\Vert _{\ell ^{2s/(s-1)}({\mathbb {Z}}^d)}&\le \Vert {\widehat{f}} \cdot \widehat{\mu _N}\Vert _{L^{2s/(s+1)}({\mathbb {T}}^d)} \\&\le \Vert {\widehat{f}}\Vert _{L^2({\mathbb {T}}^d)} \Vert \widehat{\mu _N}\Vert _{L^{2s}({\mathbb {T}}^d)} = \Vert f\Vert _{\ell ^2({\mathbb {Z}}^d)} \Vert \widehat{\mu _N}\Vert _{L^{2s}({\mathbb {T}}^d)}; \end{aligned}

when $$\gamma =\varGamma$$ and $$s=D_{\varGamma }$$ we obtain thus

\begin{aligned} \Vert {\mathcal {A}}^{\varGamma }_N f\Vert _{\ell ^{2D_{\varGamma }/(D_{\varGamma }-1)}({\mathbb {Z}}^d)} \lesssim _{\epsilon } N^{-1/2+\epsilon } \Vert f\Vert _{\ell ^2({\mathbb {Z}}^d)}, \end{aligned}
(9)

which implies estimate (8) by interpolation with its own dual. It can be verified that estimate (9) is not only optimal but it is on the critical line $$D/q = (D-1)/p$$ (see (3) and Fig. 1). This allows one to interpolate also with the trivial $$\ell ^p \rightarrow \ell ^\infty$$ and $$\ell ^1 \rightarrow \ell ^q$$ inequalities, thus proving Conjecture 1 not only in a subset of the supercritical regime but also in a subset of the subcritical one. If one further interpolates these inequalities with (6) of Theorem 1 (when $$d\ge 3$$) the range obtained is as in Figs. 2 and 3.

It is conjectured from standard number-theoretical arguments that for a generic diophantine system of the form (7) the number of solutions in $$[1,N]^{2s}$$ is controlled by the generalisation of the Vinogradov bound above, that is

\begin{aligned} \begin{aligned} J_{s,\gamma }(N) := |\{&(n_1,\ldots ,n_s, m_1,\ldots ,m_s) \in [1,N]^{2s} : \\&\gamma (n_1) + \ldots +\gamma (n_s) = \gamma (m_1) + \ldots + \gamma (m_s)\}| \\&\lesssim _{\epsilon } N^{\epsilon } (N^s + N^{2s-D_{\gamma }}). \end{aligned} \end{aligned}
(10)

It is easily verified that if (10) holds for a certain $$\gamma , s$$ then the $$\ell ^2 \rightarrow \ell ^{2s/(s-1)}$$ bound obtained by the argument above is optimal (although it will only be critical if $$s = D_{\gamma }$$).

We record in Table 1 below a list of optimal estimates for a few examples, obtained from known sharp number-theory estimates for $$\Vert \widehat{\mu _N}\Vert _{L^{2s}({\mathbb {T}}^d)}$$ (some in the form (10), as indicated).

### Remark 6

It is clear that such simple Hausdorff-Young arguments can never give the critical endpoint of Conjecture 1, even when using the strongest number-theory estimates available.

## 3 Arithmetic Method of Refinements for Discrete Curves

As anticipated, our plan is to adapt the method of refinements of Christ [3] to our arithmetic setting and apply it to inequalities of the form (1) (or rather, their restricted weak-type versions). In this section we will develop the setup and use it to reduce the proof of Theorem 1 to certain number-theory estimates for systems of diophantine equations. We will find it more convenient to work with the un-normalised averages from now on, so we define

\begin{aligned} {\mathscr {A}}^{\gamma }_N f({\varvec{x}}) := N {\mathcal {A}}^{\gamma }_{N} f({\varvec{x}}) = \sum _{n = 1}^{N} f({\varvec{x}} - \gamma (n)). \end{aligned}

We will denote by $${{\mathscr {A}}_N}^{*}$$ the adjoint of $${\mathscr {A}}_N$$.

### 3.1 Combinatorial Reformulation of the Estimates

In order to apply the method of refinements to our problem we need to first reformulate the desired restricted weak-type estimates in an equivalent combinatorial fashion as follows.

If we let $$E, F$$ denote finite subsets of $${\mathbb {Z}}^d$$, the restricted weak-type version of the conjectured inequality (2) is

\begin{aligned} \langle {\mathscr {A}}_N {\mathbf {1}}_E, {\mathbf {1}}_F \rangle \lesssim _{\epsilon } N^{\epsilon } (N^{1-D(1/p - 1/q)} + N^{1/q} + N^{1/{p'}}) |E|^{1/p}|F|^{1/{q'}}. \end{aligned}

If we introduce the quantities

\begin{aligned} \alpha := \frac{\langle {\mathscr {A}}_N {\mathbf {1}}_E, {\mathbf {1}}_F \rangle }{|F|}, \qquad \beta := \frac{\langle {{\mathscr {A}}_N}^{*}{\mathbf {1}}_F, {\varvec{1}}_{E} \rangle }{|E|}, \end{aligned}

and let $$\frac{1}{r} := \frac{1}{p} - \frac{1}{q}$$, we see after some calculations that the restricted weak-type inequality above in the supercritical regime is equivalently rewritten as

\begin{aligned} \alpha ^{r/{q'}} \beta ^{r/q} \lesssim _{\epsilon } N^{r-D+\epsilon } |E|. \end{aligned}

If the exponents pq are in the subcritical regime instead, then the (conjectured) restricted weak-type inequality is equivalently rewritten as

\begin{aligned} \alpha ^{r/{q'}} \beta ^{r/q} \lesssim _{\epsilon } N^{\epsilon } (N^{r/q} + N^{r/{p'}}) |E|. \end{aligned}

If we take $$p = 2 - \frac{1}{k+1}$$ for some $$k < D$$ and $$q = p'$$ the latter becomes in particular

\begin{aligned} \alpha ^{k+1} \beta ^{k} \lesssim _{\epsilon } N^{k+\epsilon } |E| \end{aligned}
(11)

(which is in the subcritical regime if $$k< D-1$$ and is the critical endpoint estimate if $$k = D-1$$). The estimates in Theorem 1 are precisely of this form.
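The calculations behind these equivalent reformulations are short; we record the supercritical one (the subcritical one is identical term by term):

```latex
% Write S := <\scr{A}_N 1_E, 1_F>, so that alpha = S/|F| and beta = S/|E|.
% Since 1/q' + 1/q = 1,
\begin{aligned}
\alpha^{r/q'} \beta^{r/q}
  = S^{r(1/q' + 1/q)}\, |F|^{-r/q'}\, |E|^{-r/q}
  = S^{r}\, |F|^{-r/q'}\, |E|^{-r/q},
\end{aligned}
% so the bound alpha^{r/q'} beta^{r/q} <~ N^{r - D + eps} |E| is equivalent to
\begin{aligned}
S \lesssim_{\epsilon} N^{1 - D/r + \epsilon}\, |E|^{1/r + 1/q}\, |F|^{1/q'},
\end{aligned}
% which, since 1/r + 1/q = 1/p, is precisely the restricted weak-type inequality
% with constant N^{\epsilon} N^{1 - D(1/p - 1/q)} for the un-normalised averages.
```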

### Remark 7

Observe that since we are considering only characteristic functions we have always $$\alpha ,\beta \le N$$.

This reformulation of the desired inequalities has turned our task into that of proving suitable lower bounds for |E| in terms of $$\alpha ,\beta , N$$. The machinery developed in this section will serve to achieve precisely this.

### 3.2 Refinements of the Sets E, F

The lemma presented in this subsection is well-known and a number of presentations exist in the literature—it first appeared in [3].

Let $$E, F$$ be finite subsets of $${\mathbb {Z}}^d$$ and $$\alpha ,\beta$$ be as defined in Sect. 3.1. Observe that if we let

\begin{aligned} F_1 := \big \{ {\varvec{x}} \in F : {\mathscr {A}}_N {\mathbf {1}}_E({\varvec{x}}) > \frac{\alpha }{2} \big \} \end{aligned}

then we have

\begin{aligned} \langle {\mathscr {A}}_N {\varvec{1}}_E, {\varvec{1}}_{F_1} \rangle \ge \frac{1}{2}\langle {\mathscr {A}}_N {\varvec{1}}_{E}, {\varvec{1}}_{F} \rangle . \end{aligned}
(12)

Indeed, one sees that

\begin{aligned} \langle {\mathscr {A}}_N {\varvec{1}}_{E}, {\varvec{1}}_{F\backslash F_1}\rangle \le \frac{\alpha }{2}|F| = \frac{1}{2}\langle {\mathscr {A}}_N {\varvec{1}}_{E}, {\varvec{1}}_{F} \rangle , \end{aligned}

from which (12) follows at once. Notice that in particular we have that $$F_1 \ne \emptyset$$. The observation extends easily to show that we can define iteratively (with $$E_0 = E, F_0 = F$$)

\begin{aligned} F_{j}&:= \big \{{\varvec{x}} \in F_{j-1} : {\mathscr {A}}_N {\mathbf {1}}_{E_{j-1}}({\varvec{x}}) \gtrsim _j \alpha \big \}, \\ E_{j}&:= \big \{{\varvec{y}} \in E_{j-1} : {{\mathscr {A}}_{N}}^{*}{\mathbf {1}}_{F_{j}}({\varvec{y}}) \gtrsim _j \beta \big \}, \end{aligned}

for implicit constants decreasing sufficiently fast and obtain a sequence of sets as per the following lemma.

### Lemma 3

Given $$E,F$$ finite subsets of $${\mathbb {Z}}^d$$, there exists a sequence of subsets $$E_j \subseteq E_0 := E$$ and $$F_j \subseteq F_0:= F$$ such that we have for every j

1. (i)

$$F_j \subseteq F_{j-1}$$;

2. (ii)

for every $${\varvec{x}} \in F_j$$ we have $${\mathscr {A}}_N {\varvec{1}}_{E_{j-1}}({\varvec{x}}) \gtrsim _j \alpha$$;

3. (iii)

$$\langle {\mathscr {A}}_N {\varvec{1}}_{E_{j-1}}, {\varvec{1}}_{F_j}\rangle \gtrsim _j \langle {\mathscr {A}}_N {\varvec{1}}_{E}, {\varvec{1}}_{F} \rangle$$;

4. (iv)

$$E_j \subseteq E_{j-1}$$;

5. (v)

for every $${\varvec{y}} \in E_j$$ we have $${{\mathscr {A}}_N}^{*}{\varvec{1}}_{F_j}({\varvec{y}}) \gtrsim _j \beta$$;

6. (vi)

$$\langle {{\mathscr {A}}_N}^{*} {\varvec{1}}_{F_j}, {\varvec{1}}_{E_j}\rangle \gtrsim _j \langle {\mathscr {A}}_N {\varvec{1}}_{E}, {\varvec{1}}_{F} \rangle$$.

We omit the easy proof—see [3] for details.
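To make the first refinement step concrete, here is a small Python sketch (a toy computation with arbitrary choices of $$\gamma , N, E, F$$; it is not part of the proof) that builds $$F_1$$ as above and verifies inequality (12), along with the non-emptiness of $$F_1$$.

```python
from fractions import Fraction

def gamma(n):                       # toy curve (n, n^2)
    return (n, n * n)

N = 5
E = {(x, y) for x in range(6) for y in range(6)}
F = {(x, y) for x in range(-2, 6) for y in range(-2, 6)}

def A(E_set, x):                    # unnormalised A_N 1_{E_set}(x)
    return sum((x[0] - gamma(n)[0], x[1] - gamma(n)[1]) in E_set
               for n in range(1, N + 1))

alpha = Fraction(sum(A(E, x) for x in F), len(F))

# one refinement step: keep the points of F where the average beats alpha/2
F1 = {x for x in F if A(E, x) > alpha / 2}

lhs = sum(A(E, x) for x in F1)                  # <A_N 1_E, 1_{F_1}>
rhs = Fraction(sum(A(E, x) for x in F), 2)      # (1/2) <A_N 1_E, 1_F>
assert F1 and lhs >= rhs                        # inequality (12); F_1 non-empty
```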

### Remark 8

Ultimately, properties (iii) and (vi) are only needed to show that the sets $$E_j, F_j$$ are not empty.

### 3.3 Flowing Back and Forth

Given the refined sets produced by Lemma 3, we now consider the parameters over which we are averaging. In the following, the parameter k is fixed (ultimately it will only take the values $$k=1,2,3$$ for us).

If $${\varvec{y}} \in E_{k}$$ we have by (v) of Lemma 3 that $${{\mathscr {A}}_N}^{*}{\varvec{1}}_{F_k}({\varvec{y}}) \gtrsim _k \beta$$; but notice that

\begin{aligned} {{\mathscr {A}}_N}^{*}{\varvec{1}}_{F_k}({\varvec{y}}) = \sum _{n=1}^{N} {\varvec{1}}_{F_k}({\varvec{y}} + \gamma (n)) = |B^{{\varvec{y}}}| \end{aligned}

with

\begin{aligned} B^{{\varvec{y}}} := \{ n_1 \in [1,N] : {\varvec{y}} + \gamma (n_1) \in F_k \}; \end{aligned}

thus (v) is a statement about the cardinality of the set of parameters $$B^{{\varvec{y}}} \subseteq [1,N]$$—namely the lower bound $$|B^{{\varvec{y}}}| \gtrsim \beta$$. Now if $${\varvec{y}}\in E_{k}$$ and $$n_1\in B^{{\varvec{y}}}$$ as above we have $${\varvec{y}}+\gamma (n_1) \in F_k$$ and therefore we have by (ii) of Lemma 3 that $${\mathscr {A}}_N {\varvec{1}}_{E_{k-1}}({\varvec{y}}+\gamma (n_1)) \gtrsim \alpha$$; but again

\begin{aligned} {\mathscr {A}}_N {\varvec{1}}_{E_{k-1}}({\varvec{y}}+\gamma (n_1)) = \sum _{m=1}^{N} {\varvec{1}}_{E_{k-1}}({\varvec{y}}+\gamma (n_1)-\gamma (m)) =|A^{{\varvec{y}}}_{n_1}| \end{aligned}

with

\begin{aligned} A^{{\varvec{y}}}_{n_1} := \{ m_1 \in [1,N] : {\varvec{y}}+\gamma (n_1)-\gamma (m_1) \in E_{k-1} \}, \end{aligned}

so that (ii) is also a statement about the cardinality of the sets of parameters $$A^{{\varvec{y}}}_{n_1} \subseteq [1,N]$$—namely that $$|A^{{\varvec{y}}}_{n_1}| \gtrsim \alpha$$ if $$n_1 \in B^{{\varvec{y}}}$$.

We can clearly continue in this fashion and obtain a collection of slices

\begin{aligned} B^{{\varvec{y}}}, A^{{\varvec{y}}}_{n_1}, B^{{\varvec{y}}}_{n_1,m_1}, A^{{\varvec{y}}}_{n_1,m_1, n_2}, \ldots ,B^{{\varvec{y}}}_{n_1,m_1,\ldots ,n_{k-1},m_{k-1}}, A^{{\varvec{y}}}_{n_1,m_1,\ldots ,m_{k-1},n_k}, \end{aligned}

(each a subset of [1, N] and each parametrised by the previous ones, which results in somewhat cumbersome notation) where we have defined iteratively

\begin{aligned} B^{{\varvec{y}}}_{n_1,m_1,\ldots ,n_{j},m_{j}} := \{n_{j+1} \in [1,N] : {\varvec{y}}&+\gamma (n_1)-\gamma (m_1)+ \ldots \\&+ \gamma (n_{j}) - \gamma (m_j) + \gamma (n_{j+1}) \in F_{k-j} \}, \\ A^{{\varvec{y}}}_{n_1,m_1,\ldots ,m_j,n_{j+1}} := \{m_{j+1} \in [1,N] : {\varvec{y}}&+\gamma (n_1)-\gamma (m_1)+ \ldots \\&- \gamma (m_j) + \gamma (n_{j+1}) - \gamma (m_{j+1}) \in E_{k-j-1} \}. \end{aligned}

By Lemma 3, the slices have the fundamental property that if one takes a chain of parameters $$n_1, m_1, \ldots , n_k, m_k$$ such that

\begin{aligned} n_1 \in B^{{\varvec{y}}},\, m_1 \in A^{{\varvec{y}}}_{n_1},\, n_2 \in B^{{\varvec{y}}}_{n_1,m_1},\, \ldots ,\, m_k \in A^{{\varvec{y}}}_{n_1,m_1, \ldots , n_k}, \end{aligned}
(13)

then we have lower bounds

\begin{aligned} |B^{{\varvec{y}}}_{n_1,\ldots ,m_{j}}| \gtrsim \beta , \qquad |A^{{\varvec{y}}}_{n_1,\ldots ,n_{j+1}}| \gtrsim \alpha , \end{aligned}

for all j and moreover we have

\begin{aligned} {\varvec{y}} \in&\, E_{k}, \\ {\varvec{y}} +\gamma (n_1) \in&\, F_{k}, \\ {\varvec{y}}+\gamma (n_1)-\gamma (m_1) \in&\, E_{k-1}, \\ \vdots &\\ {\varvec{y}}+\gamma (n_1)-\gamma (m_1) + \ldots +\gamma (n_k) \in&\, F_1, \\ {\varvec{y}}+\gamma (n_1)-\gamma (m_1) + \ldots +\gamma (n_k)-\gamma (m_{k}) \in&\, E. \end{aligned}

In particular, we see that we are “flowing” between E and F with each step.

### 3.4 Tower of Parameters

The parameter slices defined in Sect. 3.3 assemble naturally into the structure described below that is at the heart of the method of refinements. We will however prune one of the slices before assembling them, in order to enforce a certain crucial condition.

Let k be as in the previous subsection and assume that $$\alpha \gg 1$$, depending on certain parameters introduced below.Footnote 5 We leave all slices undisturbed, save for the last one, which we redefine to be

\begin{aligned} {\widetilde{A}}^{{\varvec{y}}}_{n_1,m_1,\ldots ,m_{k-1},n_{k}}&:= A^{{\varvec{y}}}_{n_1,m_1,\ldots ,m_{k-1},n_{k}} \backslash \{ m_k \in [1,N] : \\& \gamma (n_1)-\gamma (m_1) + \ldots +\gamma (n_k)-\gamma (m_{k}) = {\varvec{0}} \}. \end{aligned}

Notice that in the set we are removing, the variables $$n_1, m_1, \ldots , n_k$$ are fixed and thus the set consists of the common zeroes of certain univariate polynomials in the variable $$m_k$$. The set thus has cardinality $$\lesssim _{\gamma } 1$$ and, since $$\alpha \gg 1$$, we still have

\begin{aligned} |{\widetilde{A}}^{{\varvec{y}}}_{n_1,m_1,\ldots ,m_{k-1}, n_k}| \gtrsim \alpha . \end{aligned}

We will now define iteratively the set $${\mathcal {T}} \subseteq [1,N]^{2k}$$ of sequences of parameters $$(n_1,m_1,\ldots ,n_k,m_k)$$ obtained by flowing back and forth as per Sect. 3.3. The set $${\mathcal {T}}$$ is called the tower of parameters and is defined as follows: let $$S_1 := B^{{\varvec{y}}}$$ and $$T_1:= \bigcup _{n_1\in S_1} \{n_1\}\times A^{{\varvec{y}}}_{n_1}$$, and let iteratively

\begin{aligned} S_j := \bigcup _{{\varvec{t}}\in T_{j-1}} \{{\varvec{t}}\}\times B^{{\varvec{y}}}_{{\varvec{t}}}, \qquad T_j := \bigcup _{{\varvec{s}}\in S_{j}} \{{\varvec{s}}\}\times A^{{\varvec{y}}}_{{\varvec{s}}}, \end{aligned}

except for $$T_k$$, where we use the pruned slice $${\widetilde{A}}^{{\varvec{y}}}_{{\varvec{s}}}$$ in place of $$A^{{\varvec{y}}}_{{\varvec{s}}}$$. Then the tower of parameters is simply $${\mathcal {T}} := T_{k}$$. The elements $$(n_1,m_1, \ldots , n_k,m_k)$$ of $${\mathcal {T}}$$ are chains that satisfy (13). We will say that $${\mathcal {T}}$$ has been constructed by flowing k times in each direction.
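The construction can be illustrated in code for $$k=1$$. The following Python sketch is illustrative only: for simplicity it takes $$E_1 = E$$ and $$F_1 = F$$ (ignoring the refinement step), and the curve, the sets and the point $${\varvec{y}}$$ are arbitrary choices. It builds the pruned tower $${\mathcal {T}} = T_1$$ for a fixed $${\varvec{y}}$$ and checks that each chain flows $${\varvec{y}} \rightarrow F \rightarrow E$$ without returning to $${\varvec{y}}$$.

```python
def gamma(n):                         # toy curve (n, n^2)
    return (n, n * n)

N = 6
E = {(x, y) for x in range(10) for y in range(10)}
F = {(x, y) for x in range(-2, 10) for y in range(-2, 10)}
y0 = (4, 4)                           # a fixed point, playing the role of y

def add(p, q):
    return (p[0] + q[0], p[1] + q[1])

def sub(p, q):
    return (p[0] - q[0], p[1] - q[1])

B = [n for n in range(1, N + 1) if add(y0, gamma(n)) in F]   # slice B^y

tower = []                            # T = T_1, with the last slice pruned
for n1 in B:
    for m1 in range(1, N + 1):
        step = sub(gamma(n1), gamma(m1))
        if add(y0, step) in E and step != (0, 0):            # pruning of Sect. 3.4
            tower.append((n1, m1))

assert tower                          # the tower is non-empty here
for (n1, m1) in tower:                # each chain flows y -> F -> E, never back to y
    assert add(y0, gamma(n1)) in F
    z = add(y0, sub(gamma(n1), gamma(m1)))
    assert z in E and z != y0
```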

### Remark 9

The pruning has had the effect of enforcing the condition that whenever we follow the flow starting at $${\varvec{y}} \in E_k$$ and given by $$(n_1,m_1, \ldots , n_k,m_k)$$ we never end up back at the point $${\varvec{y}}$$. In other words, if $$(n_1,m_1, \ldots , n_k,m_k) \in {\mathcal {T}}$$ we have

\begin{aligned} {\varvec{y}}+\gamma (n_1)-\gamma (m_1) + \ldots +\gamma (n_k)-\gamma (m_{k}) \ne {\varvec{y}}. \end{aligned}

Observe that we can provide a lower bound for the cardinality of $${\mathcal {T}}$$ in terms of $$\alpha ,\beta$$: indeed, we have

\begin{aligned} |{\mathcal {T}}| = |T_{k}| = \sum _{{\varvec{s}}\in S_{k}} |{\widetilde{A}}^{{\varvec{y}}}_{{\varvec{s}}}| \gtrsim \alpha |S_k| \quad \text { and } \quad |S_k| = \sum _{{\varvec{t}}\in T_{k-1}} |B^{{\varvec{y}}}_{{\varvec{t}}}| \gtrsim \beta |T_{k-1}|; \end{aligned}

iterating all the way to $$S_1 = B^{{\varvec{y}}}$$, we obtain

\begin{aligned} |{\mathcal {T}}| \gtrsim \alpha ^{k}\beta ^{k}. \end{aligned}
(14)

### 3.5 Lower Bounds for |E| and Proof of Theorem 1

With $${\varvec{y}}$$ as above, we now let $$\varPsi$$ denote the map

\begin{aligned} \varPsi (n_1,m_1,\ldots ,n_k,m_k) := {\varvec{y}}+\gamma (n_1)-\gamma (m_1) + \ldots +\gamma (n_k)-\gamma (m_k). \end{aligned}
(15)

The definitions of the slices and of $${\mathcal {T}}$$ show that $$\varPsi ({\mathcal {T}}) \subseteq E$$ by construction, and therefore we have quite simply the lower bound

\begin{aligned} |\varPsi ({\mathcal {T}})| \le |E|. \end{aligned}

This lower bound is of limited use in this form as in general it is not easy to compute $$|\varPsi ({\mathcal {T}})|$$. One can however estimate it using the multiplicity of the map $$\varPsi$$ over $${\mathcal {T}}$$ in the following way. Given a mapping $$\Phi : [1,N]^{s} \rightarrow {\mathbb {Z}}^d$$ and a set $$S \subseteq [1,N]^{s}$$ we define the multiplicity of $$\Phi$$ over S to be

\begin{aligned} m(\Phi ;S) := \max _{z \in \Phi (S)} |\Phi ^{-1}(\{z\}) \cap S|. \end{aligned}

If we let $${\mathcal {T}}_{{\varvec{z}}} := \varPsi ^{-1}(\{{\varvec{z}}\}) \cap {\mathcal {T}}$$ for convenience, we then have

\begin{aligned} |\varPsi ({\mathcal {T}})| = \sum _{{\varvec{z}} \in \varPsi ({\mathcal {T}})} 1 = \sum _{{\varvec{z}} \in \varPsi ({\mathcal {T}})} \frac{|{\mathcal {T}}_{{\varvec{z}}}|}{|{\mathcal {T}}_{{\varvec{z}}}|} \ge \sum _{{\varvec{z}} \in \varPsi ({\mathcal {T}})} \frac{|{\mathcal {T}}_{{\varvec{z}}}|}{m(\varPsi ;{\mathcal {T}})} = \frac{|{\mathcal {T}}|}{m(\varPsi ;{\mathcal {T}})}, \end{aligned}

so that we always have the lower bound

\begin{aligned} |{\mathcal {T}}| \le m(\varPsi ;{\mathcal {T}}) |E|. \end{aligned}
(16)
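The multiplicity bound just derived is elementary and can be checked directly. The following Python sketch uses a toy stand-in for $$\varPsi$$ (corresponding to $$k=1$$, $$\gamma (n) = (n,n^2)$$ and $${\varvec{y}} = {\varvec{0}}$$, all arbitrary illustrative choices); it computes the multiplicity as a maximum preimage count and verifies the generic bound $$|\varPhi (S)| \cdot m(\varPhi ;S) \ge |S|$$.

```python
from collections import Counter

def Phi(params):                          # toy stand-in for the map Psi of (15)
    n1, m1 = params
    return (n1 - m1, n1 ** 2 - m1 ** 2)   # y + gamma(n1) - gamma(m1) with y = 0

S = [(n1, m1) for n1 in range(1, 7) for m1 in range(1, 7)]
counts = Counter(Phi(p) for p in S)       # preimage counts over the image
multiplicity = max(counts.values())       # m(Phi; S)

# the image cannot be too small: |Phi(S)| >= |S| / m(Phi; S)
assert len(counts) * multiplicity >= len(S)
```

Here the multiplicity is attained at the diagonal $$n_1 = m_1$$, whose six chains all map to the origin; off-diagonal parameters are mapped injectively.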

### Remark 10

Observe that for $$\varPsi$$ as given by (15) the quantity $$m(\varPsi ;{\mathcal {T}})$$ is the maximum number of solutions $$(n_1, m_1, \ldots , n_k, m_k) \in {\mathcal {T}}$$ to the system of diophantine equations given by

\begin{aligned} \gamma (n_1) + \ldots + \gamma (n_k) = ({\varvec{z}} - {\varvec{y}}) + \gamma (m_1) + \ldots + \gamma (m_k), \end{aligned}

as $${\varvec{z}}$$ ranges over $$\varPsi ({\mathcal {T}})$$; by construction of $${\mathcal {T}}$$ we have $${\varvec{z}} \ne {\varvec{y}}$$. We then see that we are dealing with an inhomogeneous version of (7), and therefore $$m(\varPsi ;{\mathcal {T}})$$ should be compared with the quantity $$J_{k,\gamma }(N)$$ as defined in (10).

Combining (16) and (14) one has therefore

\begin{aligned} \alpha ^k \beta ^k \lesssim m(\varPsi ;{\mathcal {T}}) |E|, \end{aligned}

which since $$\alpha \le N$$ implies

\begin{aligned} \alpha ^{k+1} \beta ^k \lesssim N\, m(\varPsi ;{\mathcal {T}}) |E|. \end{aligned}

Comparing this with the combinatorial reformulation (11), one sees that to conclude an optimal subcritical estimate on the $$q = p'$$ line it is sufficient to construct a tower $${\mathcal {T}}$$ (obtained by flowing k times in each direction) such that $$m(\varPsi ;{\mathcal {T}}) \lesssim _\epsilon N^{k-1 + \epsilon }$$. This is precisely the strategy that we adopt in the proof of Theorem 1.

### Remark 11

Comparing once again the quantities $$m(\varPsi ;{\mathcal {T}})$$ with $$J_{k,\gamma }$$, we stress the fact that we are looking for an estimate of the form $$m(\varPsi ;{\mathcal {T}}) \lesssim _\epsilon N^{k-1 + \epsilon }$$, whereas, for the corresponding homogeneous system (7), estimate (10) in this regime takes instead the form $$J_{k,\gamma }(N) \lesssim _{\epsilon } N^{k + \epsilon }$$ (and this bound clearly cannot be improved because of the presence of diagonal solutions to (7)).
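The diagonal obstruction mentioned above is easy to see concretely. In the simplest homogeneous analogue, with $$k=1$$ and $$P(n) = n^2$$ (an arbitrary illustrative choice), the equation $$n_1^2 = m_1^2$$ over $$[1,N]^2$$ already has the N diagonal solutions $$n_1 = m_1$$, so no bound better than $$N^k$$ is possible for the homogeneous count; the following Python snippet confirms this.

```python
N = 100

# homogeneous analogue of the k = 1 equation with P(n) = n^2:
# over positive integers, n1^2 = m1^2 forces n1 = m1, so the
# solution set is exactly the diagonal
solutions = [(n1, m1) for n1 in range(1, N + 1)
             for m1 in range(1, N + 1) if n1 * n1 == m1 * m1]
assert len(solutions) == N            # the N diagonal solutions n1 = m1
```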

### Proof (of Theorem 1)

By (11) of Sect. 3.1 the inequalities (4), (5), (6) of Theorem 1 can be reformulated as, respectively,

\begin{aligned} \alpha ^2 \beta&\lesssim _{\epsilon } N^{1+\epsilon } |E|, \\ \alpha ^3 \beta ^2&\lesssim _{\epsilon } N^{2+\epsilon } |E|, \\ \alpha ^4 \beta ^3&\lesssim _{\epsilon } N^{3+\epsilon } |E|. \end{aligned}

We claim that we can always assume that $$\alpha , \beta \gg 1$$. Indeed, if $$\alpha \lesssim 1$$ or $$\beta \lesssim 1$$ we have $$\alpha ^k \beta ^k \lesssim N^k$$ for any k; but since we see easilyFootnote 6 that $$\alpha \lesssim |E|$$, all the desired inequalities would immediately follow.

Assuming then $$\alpha , \beta \gg 1$$, as anticipated above we proceed to prove the sharpenedFootnote 7 inequalities

\begin{aligned} \alpha \beta&\lesssim _{\epsilon } N^{\epsilon } |E|, \\ \alpha ^2 \beta ^2&\lesssim _{\epsilon } N^{1+\epsilon } |E|, \\ \alpha ^3 \beta ^3&\lesssim _{\epsilon } N^{2+\epsilon } |E|, \end{aligned}

from which the desired ones follow immediately since $$\alpha \le N$$.

We prove these inequalities all at once, conditionally on Lemmata 4, 5 and 6, which are proven in Sect. 4. Let $$k\in \{1,2,3\}$$ and recall that by Sect. 2.1 we can assume $$\gamma (n) = P(n)$$ with $$\deg P \ge 2$$ when $$k=1$$, $$\gamma (n) = (n, P(n))$$ when $$k=2$$ and $$\gamma (n)=(n,n^2,n^3)$$ when $$k=3$$. For each k, build the tower $${\mathcal {T}}$$ as per Sect. 3.4 by flowing k times in each direction, so that by (14) and (16) we have

\begin{aligned} \alpha ^k \beta ^k \lesssim m(\varPsi ;{\mathcal {T}}) |E|, \end{aligned}

with $$\varPsi$$ given by (15). By construction the tower $${\mathcal {T}}$$ is contained in the set

\begin{aligned} \{(n_1,m_1,\ldots , n_k,m_k) \in [1,N]^{2k} : \gamma (n_1) - \gamma (m_1) + \ldots + \gamma (n_k) - \gamma (m_k) \ne 0\} \end{aligned}

(see Remark 9) and therefore $$m(\varPsi ;{\mathcal {T}})$$ is bounded by the maximum number of solutions in $$[1,N]^{2k}$$ to

\begin{aligned} \gamma (n_1) - \gamma (m_1) + \ldots + \gamma (n_k) - \gamma (m_k) = {\mathfrak {z}} \end{aligned}

when $${\mathfrak {z}} \ne 0$$ (see Remark 10). By substituting for $$\gamma$$ the respective special forms for each k we see that

• When $$k=1$$, $$m(\varPsi ;{\mathcal {T}}) \lesssim _\epsilon N^{\epsilon }$$ when $$d=1$$ and $$m(\varPsi ;{\mathcal {T}}) \lesssim 1$$ when $$d\ge 2$$ by Lemma 4 of Sect. 4 and Remark 12;

• When $$k=2$$, $$m(\varPsi ;{\mathcal {T}}) \lesssim _\epsilon N^{1 + \epsilon }$$ by Lemma 5 of Sect. 4;

• When $$k=3$$, $$m(\varPsi ;{\mathcal {T}}) \lesssim _\epsilon N^{2 + \epsilon }$$ by Lemma 6 of Sect. 4.

The proof is thus concluded, modulo the proofs of the lemmata which are presented in the next section. $$\square$$

## 4 Bounds for the Number of Solutions to Diophantine Systems of Equations

In this last section we conclude the proof of Theorem 1 by proving the lemmata that bound the number of solutions of the relevant diophantine equations employed above. Such lemmata are proven by elementary means, ultimately resting on the divisor bound; the proofs are inspired by the corresponding one in [9] by the second author and Wooley.

The lemmata are ordered by increasing number of equations and increasing number of variables. The proof of the first one already contains in essence the idea for all three proofs.

### Lemma 4

For every $$\epsilon >0$$ the following holds.

Let $$P \in {\mathbb {Z}}[X]$$ with $$\deg P \ge 2$$. The number of solutions $$(n_1,m_1) \in [1,N]^2$$ to

\begin{aligned} P(n_1) - P(m_1) = {\mathfrak {z}}_1 \end{aligned}
(17)

with $$|{\mathfrak {z}}_1|\lesssim N^{\deg P}$$ and $${\mathfrak {z}}_1 \ne 0$$ is bounded by $$\lesssim _{\epsilon } N^{\epsilon }$$.

### Proof

Observe that since P is not linear it must be that for some non-vanishing non-constant $$Q \in {\mathbb {Z}}[X,Y]$$ we have identically

\begin{aligned} P(X) - P(Y) = Q(X,Y)(X - Y); \end{aligned}

moreover, for any n the univariate polynomial Q(nY) is non-constant. But the polynomial identity implies that any solution to (17) is also a solution to one of the systems of diophantine equations

\begin{aligned} {\left\{ \begin{array}{ll} Q(n_1,m_1) &{}= d_1, \\ n_1 - m_1 &{}= d_2, \end{array}\right. } \end{aligned}

with $$d_1 d_2 = {\mathfrak {z}}_1$$. Since there are only $$\lesssim _\epsilon N^{\epsilon }$$ such factorisations of $${\mathfrak {z}}_1 \ne 0$$ (by the divisor bound) and since each distinct such system has clearly at most $$\lesssim _{\deg P} 1$$ solutions (since $$Q(n_1,Y)$$ is non-constant), we conclude that (17) has at most $$\lesssim _{\epsilon ,\gamma } N^{\epsilon }$$ solutions. $$\square$$
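The divisor-bound mechanism of this proof can be seen numerically. For the illustrative choice $$P(n) = n^2$$ (so that $$Q(X,Y) = X+Y$$), every solution of $$n_1^2 - m_1^2 = {\mathfrak {z}}_1$$ determines the factorisation $${\mathfrak {z}}_1 = (n_1-m_1)(n_1+m_1)$$, with $$n_1+m_1$$ a distinct positive divisor of $$|{\mathfrak {z}}_1|$$; hence for each $${\mathfrak {z}}_1 \ne 0$$ the solution count is bounded by the divisor function. The following Python sketch verifies this for the worst $${\mathfrak {z}}_1$$ in a brute-force search.

```python
from collections import Counter

N = 200

def P(n):                               # illustrative choice with deg P = 2
    return n * n

counts = Counter()
for n1 in range(1, N + 1):
    for m1 in range(1, N + 1):
        z = P(n1) - P(m1)
        if z != 0:                      # only z_1 != 0, as in the lemma
            counts[z] += 1

def divisors(z):                        # naive divisor count d(|z|)
    z = abs(z)
    return sum(1 for d in range(1, z + 1) if z % d == 0)

worst = max(counts, key=counts.get)     # the z_1 attaining the largest count
# each solution gives z = (n1 - m1)(n1 + m1) with n1 + m1 a distinct
# positive divisor of |z|, so the count is at most d(|z|)
assert counts[worst] <= divisors(worst)
```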

### Remark 12

When $$d \ge 2$$ in (4), we have to consider instead the system

\begin{aligned} {\left\{ \begin{array}{ll} P_1(n) - P_1(m) &{}= {\mathfrak {z}}_1, \\ P_2(n) - P_2(m) &{}= {\mathfrak {z}}_2, \end{array}\right. } \end{aligned}

obtained from the first two components of $$\gamma$$ (where $${\mathfrak {z}}_1, {\mathfrak {z}}_2$$ are not both simultaneously zero). This system can have at most $$\lesssim _\gamma 1$$ solutions, thanks to Bézout's theorem (indeed, notice that the polynomials $$P_j(n) - P_j(m) - {\mathfrak {z}}_j$$ with $$j=1,2$$ cannot have a non-trivial common factor, thanks to the fact that $$({\mathfrak {z}}_1,{\mathfrak {z}}_2) \ne (0,0)$$). Thus when $$d\ge 2$$ we obtain (4) without $$\epsilon$$-losses.

The second lemma deals with a system of two diophantine equations in four variables in which the first equation is linear.

### Lemma 5

For every $$\epsilon >0$$ the following holds.

Let $$P \in {\mathbb {Z}}[X]$$ with $$\deg P \ge 2$$. The number of solutions $$(n_1, m_1, n_2, m_2) \in [1,N]^4$$ to

\begin{aligned} {\left\{ \begin{array}{ll} n_1 - m_1 + n_2 - m_2 &{}= {\mathfrak {z}}_1, \\ P(n_1) - P(m_1) + P(n_2) - P(m_2) &{}= {\mathfrak {z}}_2, \end{array}\right. } \end{aligned}

with $$|{\mathfrak {z}}_1| \lesssim N, |{\mathfrak {z}}_2|\lesssim N^{\deg P}$$ and $${\mathfrak {z}}_1,{\mathfrak {z}}_2$$ not both simultaneously zero is bounded by $$\lesssim _{\epsilon } N^{1 + \epsilon }$$.

The proof of Lemma 5 rests on the fact that if P is non-linear then there is a non-vanishing polynomial $$Q \in {\mathbb {Z}}[X,Y,Z]$$ such that

\begin{aligned} P(X) - P(Y) + P(Z) - P(X-Y+Z) = Q(X,Y,Z)(X-Y)(Y-Z) \end{aligned}
(18)

identically (and moreover, for any nm the univariate polynomial Q(nmZ) does not vanish identically). The resulting proof is essentially a simpler version of the proof of the next lemma, and therefore we omit the details and direct the reader there.

The final lemma deals with a special system of three diophantine equations in six variables—effectively an inhomogeneous version of one of the so-called Vinogradov systems.

### Lemma 6

For every $$\epsilon > 0$$, the following holds.

The number of solutions $$(n_1, m_1, n_2, m_2, n_3, m_3) \in [1,N]^6$$ to the diophantine system of equations

\begin{aligned} {\left\{ \begin{array}{ll} n_1 - m_1 + n_2 &{} = {\mathfrak {z}}_1 + m_2 - n_3 + m_3, \\ n_1^2 - m_1^2 + n_2^2 &{} = {\mathfrak {z}}_2 + m_2^2 - n_3^2 + m_3^2, \\ n_1^3 - m_1^3 + n_2^3 &{} = {\mathfrak {z}}_3 + m_2^3 - n_3^3 + m_3^3, \end{array}\right. } \end{aligned}
(19)

with $$|{\mathfrak {z}}_1| \lesssim N, |{\mathfrak {z}}_2| \lesssim N^2, |{\mathfrak {z}}_3| \lesssim N^3$$ and $${\mathfrak {z}}_1, {\mathfrak {z}}_2, {\mathfrak {z}}_3$$ not all simultaneously zero is bounded by $$\lesssim _\epsilon N^{2+\epsilon }$$.

### Proof

For solutions of the type we want, the quantity $$u := n_1 - m_1 + n_2$$ can take at most $$\lesssim N$$ values. Fix then such a value $$u \lesssim N$$ and observe that using the polynomial identities (particular cases of (18))

\begin{aligned} X^2 - Y^2 + Z^2 - (X - Y + Z)^2&= 2(X-Y)(Y-Z), \\ X^3 - Y^3 + Z^3 - (X - Y + Z)^3&= 3(X+Z)(X-Y)(Y-Z), \end{aligned}

we can rewrite system (19) as

\begin{aligned} {\left\{ \begin{array}{ll} n_1 - m_1 + n_2 &{}= u,\\ m_2 - n_3 + m_3 &{} = u - {\mathfrak {z}}_1, \\ 2(n_1 - m_1)(m_1 - n_2) &{} = 2(m_2 - n_3)(n_3 - m_3) \\ &{} - u^2 + {\mathfrak {z}}_2 + (u-{\mathfrak {z}}_1)^2, \\ 3(n_1 + n_2)(n_1 - m_1)(m_1 - n_2) &{} = 3(m_2+m_3)(m_2 - n_3)(n_3 - m_3) \\ &{} - u^3 +{\mathfrak {z}}_3 + (u-{\mathfrak {z}}_1)^3. \end{array}\right. } \end{aligned}
(20)

We stress that if $$(n_1, m_1, n_2, m_2, n_3, m_3) \in [1,N]^6$$ is a solution to (19) then for some value of $$u\lesssim N$$ it is a solution to (20) too.
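The two polynomial identities above can be checked by expanding, or numerically, as in the following Python snippet (a sanity check only, over random integer triples).

```python
import random

# numeric check of the two polynomial identities (special cases of (18))
for _ in range(1000):
    X, Y, Z = (random.randint(-50, 50) for _ in range(3))
    assert X**2 - Y**2 + Z**2 - (X - Y + Z)**2 == 2 * (X - Y) * (Y - Z)
    assert X**3 - Y**3 + Z**3 - (X - Y + Z)**3 == 3 * (X + Z) * (X - Y) * (Y - Z)
```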

Multiplying by 2 and using the quadratic equation, we see that we can rewrite the cubic equation above as

\begin{aligned} 3(n_1 + n_2)\cdot 2&(n_1 - m_1)(m_1 - n_2) \\&= 3(m_2+m_3) [2(n_1 - m_1)(m_1 - n_2) + u^2 - {\mathfrak {z}}_2 - (u-{\mathfrak {z}}_1)^2] \\&\quad +2({\mathfrak {z}}_3 + (u-{\mathfrak {z}}_1)^3 - u^3), \end{aligned}

and rearranging

\begin{aligned} \begin{aligned} 6(n_1 + n_2 -&m_2 - m_3)(n_1 - m_1)(m_1 - n_2) \\&= 3(m_2+m_3)[u^2 - {\mathfrak {z}}_2 - (u-{\mathfrak {z}}_1)^2] +2({\mathfrak {z}}_3 + (u-{\mathfrak {z}}_1)^3 - u^3). \end{aligned} \end{aligned}
(21)

Letting $$t = m_2 + m_3$$ we observe that, once t is also fixed, if

\begin{aligned} M := 3t[u^2 - {\mathfrak {z}}_2 - (u-{\mathfrak {z}}_1)^2] +2({\mathfrak {z}}_3 + (u-{\mathfrak {z}}_1)^3 - u^3) \ne 0 \end{aligned}

then, since $$|M| \lesssim N^3$$, this number can be factorised as $$M = 6 \,d_1 d_2 d_3$$ in at most $$\lesssim _\epsilon N^\epsilon$$ ways, by the divisor bound. Solutions to (19) are therefore solutions to one of the systems

\begin{aligned} {\left\{ \begin{array}{ll} n_1 - m_1 + n_2 &{}= u,\\ m_2 - n_3 + m_3 &{} = u - {\mathfrak {z}}_1, \\ m_2 + m_3 &{}= t, \\ n_1 + n_2 - m_2 - m_3 &{}= d_1, \\ n_1 - m_1 &{}= d_2, \\ m_1 - n_2 &{}= d_3, \end{array}\right. } \end{aligned}

obtained by choosing ut and factorising M. This system is not linearly independent (it has rank 5) but it contains enough equations to fix the values of, say, $$n_1,m_1,n_2$$. By the quadratic equation of (20) we have then

\begin{aligned} 2(m_2 - n_3)(n_3 - m_3) = 2 d_2 d_3 + u^2 - {\mathfrak {z}}_2 - (u-{\mathfrak {z}}_1)^2. \end{aligned}
(22)

If the right-hand side of (22) is non-zero we can invoke the divisor bound again to factor it as $$2d_4 d_5$$ and thus reduce to the system

\begin{aligned} {\left\{ \begin{array}{ll} m_2 - n_3 + m_3 &{} = u - {\mathfrak {z}}_1, \\ m_2 + m_3 &{}= t, \\ m_2 - n_3 &{}= d_4, \\ n_3 - m_3 &{}= d_5, \end{array}\right. } \end{aligned}

which can have at most 1 solution (if any). If the right-hand side of (22) is instead zero then it must be that either $$m_2 - n_3 =0$$ or $$n_3 - m_3 = 0$$; either case gives a well-posed system of linear equations and thus at most 1 solution. This analysis has thus shown that for values of u, t such that $$M \ne 0$$ the system has at most $$\lesssim _\epsilon N^{2\epsilon }$$ solutions; since $$|u|,|t| \lesssim N$$, we obtain from these situations a contribution of at most $$\lesssim _\epsilon N^{2+2\epsilon }$$ solutions to (19).

The case in which $$M = 0$$ for some choice of parameters has to be dealt with separately, and involves a gruelling analysis of several subcases (the assumption that $${\mathfrak {z}}_1,{\mathfrak {z}}_2,{\mathfrak {z}}_3$$ are not all simultaneously zero is crucial here). However, to show that these cases contribute at most a further $$\lesssim _\epsilon N^{2+\epsilon }$$ solutions to (19) altogether, the very same arguments used above (or variations thereof) suffice; hence we will discuss them quite briefly. Let $$A := u^2 - {\mathfrak {z}}_2 - (u-{\mathfrak {z}}_1)^2$$ and $$B:= {\mathfrak {z}}_3 + (u-{\mathfrak {z}}_1)^3 - u^3$$, so that we can rewrite $$M = 3tA + B$$.

1. (i)

If $$A \ne 0$$ there can be at most a single value of t such that $$M=0$$ (saving us a factor of $$\lesssim N$$); one of the linear factors on the left-hand side of (21) must therefore vanish, which yields an additional linear equation. Combining all the linear equations obtained so far with the quadratic equation of (20) and using the divisor bound again, one accumulates at most $$\lesssim _{\epsilon } N^{2+\epsilon }$$ additional solutions (with the second factor of $$\lesssim N$$ arising from the need to choose one solution to an under-determined linear sub-system).

2. (ii)

If $$A = 0$$ notice that it must be that $$2u {\mathfrak {z}}_1 = {\mathfrak {z}}_2 + {\mathfrak {z}}_1^2$$. Moreover, since we are assuming $$M=0$$, it must be that $$B=0$$ too. There are here two further subcases to consider, according to whether $${\mathfrak {z}}_1 \ne 0$$ or $${\mathfrak {z}}_1 = 0$$. If $${\mathfrak {z}}_1 \ne 0$$ then u is fixed by the above identity, so that we do not need to choose it (this saves us a factor of $$\lesssim N$$). As before, one of the factors on the left-hand side of (21) must be zero, which yields a further linear equation. If this equation is $$n_1 - m_1 =0$$ or $$m_1 - n_2 = 0$$ then it is easy to show that there can be at most $$\lesssim N^2$$ solutions (by taking into account the quadratic equation of (20)). If the equation is instead $$n_1 + n_2 - m_2 - m_3 = 0$$, we fix a value of $$t = m_2 + m_3$$ with $$|t|\lesssim N$$ as before and observe that we have in total two independent linear equations in $$n_1, n_2, m_1$$. Fixing a solution $$(n_1,n_2,m_1)$$ to these (of which there are at most $$\lesssim N$$), the remaining equations give the system

\begin{aligned} {\left\{ \begin{array}{ll} m_2 - n_3 + m_3 &{}= u - {\mathfrak {z}}_1, \\ m_2 + m_3 &{}= t, \\ 2(m_2 - n_3)(n_3 - m_3) &{} = 2(n_1 - m_1)(m_1 - n_2), \end{array}\right. } \end{aligned}

where the right-hand sides are now all known. This system can be seen to have at most two solutions, and therefore the $$A=0, {\mathfrak {z}}_1 \ne 0$$ case contributes at most $$\lesssim N^{2}$$ solutions again. In the remaining case in which $$A=0, {\mathfrak {z}}_1 = 0$$, we see that $$A = 0$$ forces $${\mathfrak {z}}_2= 0$$ and $$B=0$$ forces $${\mathfrak {z}}_3 = 0$$ too. This is impossible by assumption and thus this case does not contribute any further solutions.

The proof of the Lemma is thus concluded. $$\square$$

In the above argument we rely in an essential way on a polynomial identity that is special to degree 3. It therefore looks unlikely that such simple divisor-bound arguments could handle higher-dimensional cases, even for the moment curve.