1 Introduction

This short note serves primarily as a commentary on the very recent topic of pointwise estimates for the operator that restricts the Fourier transform to a hypersurface. We will concentrate exclusively on the case of the two-dimensional unit sphere \({\mathbb {S}}^2\) in three-dimensional Euclidean space \({\mathbb {R}}^3\). This both simplifies the exposition and enables the formulation of more general results.

Classical Fourier restriction theory seeks a priori \(L ^p\)-estimates for the operator \(f\mapsto {\widehat{f}}|_S\), where S is a hypersurface in Euclidean space. In the case of \({\mathbb {S}}^2\subset {\mathbb {R}}^3\), the endpoint Tomas–Stein inequality [17, 19] reads as follows:

$$\begin{aligned} \big \Vert {\widehat{f}}\,\big \vert _{{\mathbb {S}}^2} \big \Vert _{L ^2({\mathbb {S}}^2,\sigma )} \lesssim \Vert f\Vert _{L ^{4/3}({\mathbb {R}}^3)}. \end{aligned}$$
(1.1)

Here \(\sigma \) denotes the standard surface measure on \({\mathbb {S}}^2\). It is well-known that 4/3 is the largest exponent that can appear on the right-hand side of (1.1) provided that we stick to the \(L ^2\)-norm on the left-hand side. Estimate (1.1) enables the Fourier restriction operator

$$\begin{aligned} {\mathcal {R}}:L ^{4/3}({\mathbb {R}}^3)\rightarrow L ^2({\mathbb {S}}^2,\sigma ) \end{aligned}$$
(1.2)

to be defined via an approximation of identity argument as follows. Fix a complex-valued Schwartz function \(\chi \) such that \(\int _{{\mathbb {R}}^3}\chi (x)\,d x=1\), and write \(\chi _{\varepsilon }\) for the \(L ^1\)-normalized dilate of \(\chi \), defined as

$$\begin{aligned} \chi _{\varepsilon }(x):=\varepsilon ^{-3}\chi (\varepsilon ^{-1}x) \end{aligned}$$

for \(x\in {\mathbb {R}}^3\) and \(\varepsilon >0\). Given \(f\in L ^{4/3}({\mathbb {R}}^3)\), thanks to (1.1), \({\mathcal {R}}f\) can then be defined as the limit

$$\begin{aligned} \lim _{\varepsilon \rightarrow 0^+}({\widehat{f}}*\chi _{\varepsilon })\big \vert _{{\mathbb {S}}^2} \end{aligned}$$

in the norm of the space \(L ^2({\mathbb {S}}^2,\sigma )\).

Maximal restriction theorems were recently inaugurated by Müller, Ricci, and Wright [13]. In that work, the authors considered general \(C ^2\) planar curves with nonnegative signed curvature equipped with the affine arclength measure, and established a maximal restriction theorem in the full range of exponents where the usual restriction estimate is known to hold. Shortly thereafter, Vitturi [20] provided an elementary argument which leads to a partial generalization to higher-dimensional spheres. In \({\mathbb {R}}^3\), Vitturi’s result covers the full Tomas–Stein range, whose endpoint estimate amounts to

$$\begin{aligned} \Big \Vert \sup _{\varepsilon >0} \big | ({\widehat{f}}*\chi _{\varepsilon })(\omega ) \big | \Big \Vert _{L ^2_{\omega }({\mathbb {S}}^2,\sigma )} \lesssim _{\chi } \Vert f\Vert _{L ^{4/3}({\mathbb {R}}^3)}. \end{aligned}$$
(1.3)

An easy consequence of (1.3), combined with the obvious convergence properties on the dense class of Schwartz functions, is that the limit

$$\begin{aligned} \lim _{\varepsilon \rightarrow 0^+}({\widehat{f}}*\chi _{\varepsilon })(\omega ) \end{aligned}$$
(1.4)

exists for each \(f\in L ^{4/3}({\mathbb {R}}^3)\) and for \(\sigma \)-almost every \(\omega \in {\mathbb {S}}^2\). This enables us to recover the operator (1.2) also in the pointwise sense, and not only in the \(L ^2\)-norm, which was the main motivation behind the paper [13].

For the elegant proof of (1.3), Vitturi [20] used the following equivalent, non-oscillatory reformulation of the ordinary restriction estimate (1.1):

$$\begin{aligned} \Big | \int \limits _{({\mathbb {S}}^2)^2} g(\omega ) \overline{g(\omega ')} h(\omega -\omega ') \,d \sigma (\omega ) \,d \sigma (\omega ') \Big | \lesssim \Vert g\Vert _{L ^2({\mathbb {S}}^2,\sigma )}^2 \Vert h\Vert _{L ^{2}({\mathbb {R}}^3)}. \end{aligned}$$
(1.5)

The proof of the equivalence between (1.1) and (1.5) amounts to passing to the adjoint operator (i.e., to a Fourier extension estimate) and expanding out the \(L ^4\)-norm using Plancherel’s identity; these steps make the exponents 4/3 and 2 the most convenient choice. The advantage of the expanded adjoint formulation (1.5) is that one can easily replace h by its iterated maximal function and then simply invoke the boundedness of the latter operator on \(L ^2({\mathbb {R}}^3)\). We refer the reader to [20] for details. In some sense, we will follow a similar route below.

In this paper, we quantify the pointwise convergence result (1.4) in terms of the so-called variational norms. These were introduced by Lépingle [11] in the context of probability theory. Various modifications were then defined and used by Bourgain for numerous problems in harmonic analysis and ergodic theory; see for instance [5]. Recall that, given a function \(a:(0,\infty )\rightarrow {\mathbb {C}}\) and an exponent \(\varrho \in [1,\infty )\), the \(\varrho \)-variation seminorm of a is defined as

$$\begin{aligned} \Vert a\Vert _{\widetilde{V }^{\varrho }} := \sup _{\begin{array}{c} m\in {\mathbb {N}}\\ 0<\varepsilon _0<\varepsilon _1<\cdots <\varepsilon _m \end{array}} \Big ( \sum _{j=1}^{m} |a(\varepsilon _{j})-a(\varepsilon _{j-1})|^{\varrho } \Big )^{1/\varrho }. \end{aligned}$$

In order to turn it into a \(\varrho \)-variation norm, we simply add the term \(|a(\varepsilon _0)|^{\varrho }\) as follows:

$$\begin{aligned} \Vert a\Vert _{V ^{\varrho }} := \sup _{\begin{array}{c} m\in {\mathbb {N}}\\ 0<\varepsilon _0<\varepsilon _1<\cdots <\varepsilon _m \end{array}} \Big ( |a(\varepsilon _0)|^{\varrho } + \sum _{j=1}^{m} |a(\varepsilon _{j})-a(\varepsilon _{j-1})|^{\varrho } \Big )^{1/\varrho }. \end{aligned}$$
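For finitely many sample points, both quantities can be evaluated by brute force over all subsequences of the samples; the following sketch (illustrative, not from the paper; the function name is ours) is the discrete analogue of the two definitions above.

```python
from itertools import combinations

def variation_norm(values, rho, seminorm=False):
    """rho-variation of the samples a(eps_0), ..., a(eps_{n-1}), listed at
    increasing eps.  The supremum in the definition runs over all
    subsequences of these sample points; seminorm=True drops the extra
    endpoint term |a(eps_0)|^rho, giving the seminorm."""
    n = len(values)
    best = 0.0
    for m in range(2, n + 1):                     # at least two points
        for idx in combinations(range(n), m):
            s = sum(abs(values[idx[j]] - values[idx[j - 1]]) ** rho
                    for j in range(1, m))
            if not seminorm:
                s += abs(values[idx[0]]) ** rho   # endpoint term of V^rho
            best = max(best, s ** (1.0 / rho))
    return best

# For rho > 1, refining a partition can DECREASE the sum, so the supremum
# need not be attained on the full sequence of samples:
print(variation_norm([0.0, 1.0, 3.0], 2.0, seminorm=True))  # prints 3.0
```

In the example, the pair \((0,3)\) alone contributes \(3\), while using all three samples only gives \(\sqrt{5}\); this is why the supremum over all increasing choices of \(\varepsilon _j\) is essential.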

Since conventions differ in the literature, we distinguish the two quantities notationally, as both will be used below. Clearly, \(\Vert a\Vert _{V ^{\varrho }}\) controls both \(\sup _{\varepsilon >0}|a(\varepsilon )|\) and the number of “jumps” of \(a(\varepsilon )\) as \(\varepsilon \rightarrow 0^+\) (and even as \(\varepsilon \rightarrow \infty \), which is not the interesting case here). A particular instance of the main result of this paper (which is Theorem 1 below) is the following variational generalization of estimate (1.3):

$$\begin{aligned} \Big \Vert \big \Vert ({\widehat{f}}*\chi _{\varepsilon })(\omega ) \big \Vert _{V ^{\varrho }_{\varepsilon }} \Big \Vert _{L ^2_{\omega }({\mathbb {S}}^2,\sigma )} \lesssim _{\chi ,\varrho } \Vert f\Vert _{L ^{4/3}({\mathbb {R}}^3)} \end{aligned}$$
(1.6)

when \(\varrho \in (2,\infty )\) and \(f\in L ^{4/3}({\mathbb {R}}^3)\). The reader can still consider \(\chi \) to be a fixed Schwartz function, but we are just about to discuss more general possible choices. Variational estimates like (1.6) for various averages and truncations of integral operators have been extensively studied; the papers [2, 6, 7, 9, 12] are just a sample from the available literature. In addition to quantifying the mere convergence, such estimates establish convergence in the whole \(L ^p\)-space in an explicit and quantitative manner, without the need for pre-existing convergence results on a dense subspace. Later in the text, we will also use the biparameter \(\varrho \)-variation seminorm, defined for a function of two variables \(b:(0,\infty )\times (0,\infty )\rightarrow {\mathbb {C}}\) via mixed second-order differences as

$$\begin{aligned} \Vert b\Vert _{\widetilde{W }^{\varrho }} := \sup _{\begin{array}{c} m,n\in {\mathbb {N}}\\ 0<\varepsilon _0<\cdots <\varepsilon _m\\ 0<\eta _0<\cdots <\eta _n \end{array}} \Big ( \sum _{j=1}^{m} \sum _{k=1}^{n} \big | b(\varepsilon _{j},\eta _{k}) - b(\varepsilon _{j-1},\eta _{k}) - b(\varepsilon _{j},\eta _{k-1}) + b(\varepsilon _{j-1},\eta _{k-1}) \big |^{\varrho } \Big )^{1/\varrho }. \end{aligned}$$

It is natural to consider more general averaging functions \(\chi \); this has already been suggested (albeit somewhat implicitly) in the papers [13, 20]. It is clear from the proof in [20] that the function \(\chi \) does not need to be smooth. One can, for instance, take \(\chi \) to be the \(L ^1\)-normalized indicator function of the unit ball in \({\mathbb {R}}^3\), in which case the \(({\widehat{f}}*\chi _{\varepsilon })(\omega )\) become the usual Hardy–Littlewood averages of \({\widehat{f}}\) over Euclidean balls \(B (\omega ,\varepsilon )\),

$$\begin{aligned} \frac{1}{|B (\omega ,\varepsilon )|} \int \limits _{B (\omega ,\varepsilon )} {\widehat{f}}(y) d y. \end{aligned}$$

Moreover, Ramos [15] showed that, for each \(1\le p\le 4/3\) and each \(f\in L ^{p}({\mathbb {R}}^3)\), \(\sigma \)-almost every point on the sphere is a Lebesgue point of \({\widehat{f}}\), i.e., for \(\sigma \)-a.e. \(\omega \in {\mathbb {S}}^2\), we have that

$$\begin{aligned} \lim _{\varepsilon \rightarrow 0^+} \frac{1}{|B (\omega ,\varepsilon )|} \int \limits _{B (\omega ,\varepsilon )} \big |{\widehat{f}}(y) - ({\mathcal {R}}f)(y) \big | \,d y = 0. \end{aligned}$$

Prior to [15], this had been confirmed by Vitturi [20] for functions \(f\in L ^{p}({\mathbb {R}}^3)\), \(1\le p\le 8/7\), by adapting the two-dimensional argument of Müller, Ricci, and Wright [13].

Subsequent papers [10] and [14], which appeared after the first version of the present paper, generalize the averaging procedure even further, by convolving \({\widehat{f}}\) with certain averaging measures \(\mu \). In light of this more recent research, we now take the opportunity to both generalize (1.6) and complete our short survey of maximal and variational Fourier restriction theories with papers that appeared in the meantime. In what follows, \(\mu \) will be a finite complex measure defined on the Borel sets in \({\mathbb {R}}^3\); its dilates \(\mu _{\varepsilon }\) are now defined as

$$\begin{aligned} \mu _{\varepsilon }(E):=\mu (\varepsilon ^{-1}E) \end{aligned}$$

for every Borel set \(E\subseteq {\mathbb {R}}^3\) and \(\varepsilon >0\). For reasons of elegance, one can additionally assume that \(\mu \) is normalized by \(\mu ({\mathbb {R}}^3)=1\) and that it is even, i.e., centrally symmetric with respect to the origin, which means that

$$\begin{aligned} \mu (-E)=\mu (E) \end{aligned}$$

for each Borel set \(E\subseteq {\mathbb {R}}^3\).
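The dilates interact with the Fourier transform via the identity \(\widehat{\mu _{\varepsilon }}(y)={\widehat{\mu }}(\varepsilon y)\), which is used repeatedly below: \(\mu _{\varepsilon }\) is the pushforward of \(\mu \) under \(x\mapsto \varepsilon x\). A minimal numerical check on a toy atomic measure (the atoms and weights are arbitrary illustrative choices, normalized so that the weights sum to 1):

```python
import cmath
import math

# A toy finite (atomic) complex measure on R^3: atoms with complex weights
atoms = [((0.5, -0.2, 0.1), 0.3 + 0.1j), ((-0.4, 0.6, 0.0), 0.7 - 0.1j)]

def fourier(atoms, y):
    # muhat(y) = \int e^{-2 pi i x.y} d mu(x); here a finite sum over atoms
    return sum(w * cmath.exp(-2j * math.pi * sum(a * b for a, b in zip(x, y)))
               for x, w in atoms)

def dilate(atoms, eps):
    # mu_eps(E) = mu(eps^{-1} E) is the pushforward under x -> eps*x,
    # so each atom x moves to eps*x with the same weight
    return [(tuple(eps * t for t in x), w) for x, w in atoms]

y, eps = (1.1, -0.3, 2.0), 0.7
lhs = fourier(dilate(atoms, eps), y)            # \widehat{mu_eps}(y)
rhs = fourier(atoms, tuple(eps * t for t in y)) # \widehat{mu}(eps*y)
```

Both quantities agree up to floating-point error, reflecting the scaling identity used, e.g., in the definition of the operator \({\mathcal {E}}\) in Sect. 2.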

In [10], one of the present authors showed analogues of (1.3) and (1.6) when \(\chi \) is replaced by a measure \(\mu \) whose Fourier transform \({\widehat{\mu }}\) is \(C ^\infty \) and satisfies the decay condition

$$\begin{aligned} |\nabla {\widehat{\mu }}(x)| \lesssim (1+|x|)^{-1-\delta } \end{aligned}$$
(1.7)

for some \(\delta >0\). The following observation is worth making; it applies only in dimensions two and three, since matters improve in higher dimensions. If one takes \(\mu \) to be the normalized spherical measure, i.e., \(\mu =\sigma /\sigma ({\mathbb {S}}^2)\), then the decay of \(|\nabla {\widehat{\mu }}(x)|\) as \(|x|\rightarrow \infty \) is only \(O(|x|^{-1})\); see [1]. Consequently, the results from [10] do not apply. This was one of the sources of motivation for Ramos [14], who reused Vitturi’s argument from [20] to conclude that the maximal estimate

$$\begin{aligned} \Big \Vert \sup _{\varepsilon >0} \big | ({\widehat{f}}*\mu _{\varepsilon })(\omega ) \big | \Big \Vert _{L ^2_{\omega }({\mathbb {S}}^2,\sigma )} \lesssim _{\mu } \Vert f\Vert _{L ^{4/3}({\mathbb {R}}^3)} \end{aligned}$$

holds as soon as the maximal operator \(h\mapsto \sup _{\varepsilon >0}(h*\mu _{\varepsilon })\) is bounded on \(L ^2({\mathbb {R}}^3)\). Relating this to the work of Rubio de Francia [16], he further deduced the following sufficient condition in terms of the decays of \({\widehat{\mu }}\) and \(\nabla {\widehat{\mu }}\):

$$\begin{aligned} |{\widehat{\mu }}(x)| \lesssim (1+|x|)^{-\alpha } \text { and } |\nabla {\widehat{\mu }}(x)| \lesssim (1+|x|)^{-\beta } \text { with } \alpha +\beta >1. \end{aligned}$$

This condition includes the spherical measure as it satisfies \(|{\widehat{\mu }}(x)|=O(|x|^{-1})\) and \(|\nabla {\widehat{\mu }}(x)|=O(|x|^{-1})\) as \(|x|\rightarrow \infty \).
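For concreteness, with the normalization of the Fourier transform fixed in Sect. 1.1, the normalized spherical measure \(\mu =\sigma /\sigma ({\mathbb {S}}^2)\) has \({\widehat{\mu }}(x)=\sin (2\pi |x|)/(2\pi |x|)\), which exhibits exactly the stated \(O(|x|^{-1})\) decay. A quick quadrature check of this closed form (an illustrative sketch):

```python
import math

def sphere_fourier_numeric(xi_norm, n=2000):
    # muhat(xi) for mu = sigma / sigma(S^2): by rotation invariance it
    # depends only on |xi|, and integrating out the azimuthal angle leaves
    #   (1/2) * int_0^pi cos(2*pi*|xi|*cos(theta)) * sin(theta) d(theta)
    # (the imaginary part vanishes by symmetry); midpoint rule below
    h = math.pi / n
    s = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        s += math.cos(2 * math.pi * xi_norm * math.cos(t)) * math.sin(t) * h
    return s / 2

def sphere_fourier_closed(xi_norm):
    # closed form sin(2*pi*|xi|) / (2*pi*|xi|), which decays only like |xi|^{-1}
    return math.sin(2 * math.pi * xi_norm) / (2 * math.pi * xi_norm)
```

The oscillation of \(\sin (2\pi |x|)\) is also the reason why \(|\nabla {\widehat{\mu }}(x)|\) cannot decay faster than \(|x|^{-1}\) in this case.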

The main result of this note is a variational estimate which generalizes (1.6) slightly beyond the previously covered cases of the averaging functions \(\chi \) or measures \(\mu \).

Theorem 1

Suppose that \(\varrho \in (1,\infty )\), and that \(\mu \) is a normalized, even complex measure, defined on the Borel subsets of \({\mathbb {R}}^3\), satisfying any of the following three conditions:

  (a)

    \(-x\cdot \nabla {\widehat{\mu }}(x)\ge 0\) for each \(x\in {\mathbb {R}}^3\), and

    $$\begin{aligned} \Big \Vert \big \Vert (h*\mu _{\varepsilon })(x) \big \Vert _{\widetilde{V }^{\varrho }_{\varepsilon }} \Big \Vert _{L _x^{2}({\mathbb {R}}^3)} \lesssim _{\mu ,\varrho } \Vert h\Vert _{L ^{2}({\mathbb {R}}^3)} \end{aligned}$$
    (1.8)

    holds for each Schwartz function h;

  (b)

    \(\varrho \in (2,\infty )\), while \({\widehat{\mu }}\) is \(C ^2\) and satisfies the decay condition (1.7); 

  (c)

    the inequality

    $$\begin{aligned} \Big \Vert \big \Vert (h*\mu _{\varepsilon }*{\overline{\mu }}_{\eta })(x) \big \Vert _{\widetilde{W }^{\varrho }_{\varepsilon ,\eta }} \Big \Vert _{L _x^{2}({\mathbb {R}}^3)} \lesssim _{\mu ,\varrho } \Vert h\Vert _{L ^{2}({\mathbb {R}}^3)} \end{aligned}$$
    (1.9)

    holds for each Schwartz function h.

Then, for each Schwartz function f, the following estimate holds:

$$\begin{aligned} \Big \Vert \big \Vert ({\widehat{f}}*\mu _{\varepsilon })(\omega ) \big \Vert _{V ^{\varrho }_{\varepsilon }} \Big \Vert _{L ^2_{\omega }({\mathbb {S}}^2,\sigma )} \lesssim _{\mu ,\varrho } \Vert f\Vert _{L ^{4/3}({\mathbb {R}}^3)}. \end{aligned}$$
(1.10)

Let us immediately clarify one minor technical issue. Theorem 1 claims estimate (1.10) for Schwartz functions f only, but it immediately extends to all \(f\in L ^{4/3}({\mathbb {R}}^3)\) whenever \(\mu \) is absolutely continuous with respect to the Lebesgue measure. Otherwise, we could run into measurability issues on the left-hand side of (1.10) for singular measures \(\mu \).

Condition (a) in Theorem 1 is quite restrictive, but it is satisfied at least when \(\varrho >2\) and the Radon–Nikodym density of \(\mu \) is a radial Gaussian function. Indeed, if \(d \mu (x)=\alpha ^3 e^{-\pi \alpha ^2|x|^2}\,d x\) for some \(\alpha \in (0,\infty )\), then

$$\begin{aligned} -x\cdot \nabla {\widehat{\mu }}(x) = 2\pi \alpha ^{-2}|x|^2 e^{-\pi \alpha ^{-2}|x|^2} \end{aligned}$$
(1.11)

is nonnegative, and (1.8) is a standard estimate by Bourgain [5, Lemma 3.28]. In fact, Bourgain formulated (1.8) for one-dimensional Schwartz averaging functions in [5], but the proof carries over to higher dimensions. Alternatively, one can invoke more general results from the subsequent literature, such as the work of Jones, Seeger, and Wright [9], which covered higher-dimensional convolutions, more general dilation structures, and both strong and weak-type variational estimates in a range of \(L ^p\)-spaces.
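As a sanity check on (1.11) (a numerical sketch, not part of the argument): the Fourier transform of the above Gaussian density is \({\widehat{\mu }}(x)=e^{-\pi \alpha ^{-2}|x|^2}\), and the closed form for \(-x\cdot \nabla {\widehat{\mu }}(x)\) can be compared against finite differences:

```python
import math

def muhat(x, alpha):
    # Fourier transform of d mu(x) = alpha^3 exp(-pi alpha^2 |x|^2) dx,
    # by the standard Gaussian identity: muhat(x) = exp(-pi |x|^2 / alpha^2)
    r2 = sum(t * t for t in x)
    return math.exp(-math.pi * r2 / alpha ** 2)

def vartheta_numeric(x, alpha, h=1e-6):
    # -x . grad muhat(x), computed via central finite differences
    g = 0.0
    for i in range(3):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g += x[i] * (muhat(xp, alpha) - muhat(xm, alpha)) / (2 * h)
    return -g

def vartheta_formula(x, alpha):
    # the right-hand side of (1.11); this also equals psi(x/alpha) for the
    # function psi appearing later in the proof under condition (b)
    r2 = sum(t * t for t in x)
    return 2 * math.pi * r2 / alpha ** 2 * math.exp(-math.pi * r2 / alpha ** 2)
```

The formula is manifestly nonnegative, which is the content of the first requirement in condition (a).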

Theorem 1 under condition (b) was covered by the paper [10], up to minor technicalities, such as the fact that here we do not need \({\widehat{\mu }}\) to be smoother than \(C ^2\). However, [10] was concerned with more general surfaces and more general measures \(\sigma \) on them, while here we are able to give a more direct proof that is specific to the sphere and to the stated choice of the Lebesgue space exponents. In fact, as we have already noted, the present proof predates [10].

Condition (c) above is somewhat artificial and difficult to verify, but we include it since the proof that it implies (1.10) will be the most straightforward.

Maximal restriction estimates have found a nice application in the very recent work of Bilz [4], who used them to show that there exists a compact set \(E\subset {\mathbb {R}}^3\) of full Hausdorff dimension that does not allow any nontrivial a priori Fourier restriction estimates for any nontrivial Borel measure on E. We do not discuss the details here, but rather refer the interested reader to [4].

1.1 Notation

If \(A,B:X\rightarrow [0,\infty )\) are two functions (or functional expressions) such that, for each \(x\in X\), \(A(x)\le CB(x)\) for some unimportant constant \(0\le C<\infty \), then we write \(A(x)\lesssim B(x)\) or \(A(x) = O(B(x))\). If the constant C depends on a set of parameters P, we emphasize it notationally by writing \(A(x)\lesssim _P B(x)\) or \(A(x) = O_P(B(x))\).

We write a variable in the subscript of the letter denoting a function space whenever we need to emphasize with respect to which variable the corresponding (semi)norm is taken. For instance, \(\Vert g(\omega )\Vert _{L ^p_\omega }\) can be written in place of \(\Vert g\Vert _{L ^p}\), while \(\Vert g(\varepsilon )\Vert _{V ^\varrho _\varepsilon }\) can be written in place of \(\Vert g\Vert _{V ^\varrho }\), whenever the variables \(\omega ,\varepsilon \) need to be written explicitly.

The Fourier transform of a function \(f\in L ^1({\mathbb {R}}^3)\) is normalized as

$$\begin{aligned} {\widehat{f}}(y) := \int \limits _{{\mathbb {R}}^3} f(x) e^{-2\pi i x\cdot y} d x \end{aligned}$$

for each \(y\in {\mathbb {R}}^3\). Here \(x\cdot y\) denotes the standard scalar product of vectors \(x,y\in {\mathbb {R}}^3\), and integration is performed with respect to the Lebesgue measure. The map \(f\mapsto {\widehat{f}}\) is then extended, as usual, by continuity to bounded linear operators \(L ^p({\mathbb {R}}^3)\rightarrow L ^{p'}({\mathbb {R}}^3)\) for each \(p\in (1,2]\) and \(p'=p/(p-1)\).

More generally, the Fourier transform of a complex measure \(\mu \) is the function \({\widehat{\mu }}\) defined as

$$\begin{aligned} {\widehat{\mu }}(y) := \int \limits _{{\mathbb {R}}^3} e^{-2\pi i x\cdot y} d \mu (x) \end{aligned}$$

for each \(y\in {\mathbb {R}}^3\).

The set of complex-valued Schwartz functions on \({\mathbb {R}}^3\) will be denoted by \({\mathcal {S}}({\mathbb {R}}^3)\).

The remainder of this paper is devoted to the proof of Theorem 1.

2 Proof of Theorem 1

We need to establish (1.10) assuming any one of the three conditions from the statement of Theorem 1. Let us start with condition (a).

Proof of Theorem 1 assuming condition (a)

Start by observing that

$$\begin{aligned} \sup _{\varepsilon _0>0} \big |{\widehat{f}}*\mu _{\varepsilon _0}\big | \le \big |{\widehat{f}}*\mu \big | + \sup _{\varepsilon _0>0} \big |{\widehat{f}}*(\mu _{\varepsilon _0}-\mu _1)\big |, \end{aligned}$$

that \({\widehat{f}}*\mu = (f{\widehat{\mu }})^{\wedge }\) (since \(\mu \) is even), and that the ordinary restriction estimate (1.1) applies to \(f{\widehat{\mu }}\) and yields

$$\begin{aligned} \big \Vert {\widehat{f}}*\mu \big \Vert _{L ^2({\mathbb {S}}^2,\sigma )} \lesssim \Vert f{\widehat{\mu }}\Vert _{L ^{4/3}({\mathbb {R}}^3)} \lesssim _{\mu } \Vert f\Vert _{L ^{4/3}({\mathbb {R}}^3)}. \end{aligned}$$

Thus, inequality (1.10) reduces to two applications of

$$\begin{aligned} \Big \Vert \big \Vert ({\widehat{f}}*\mu _{\varepsilon })(\omega ) \big \Vert _{\widetilde{V }^{\varrho }_{\varepsilon }} \Big \Vert _{L ^2_{\omega }({\mathbb {S}}^2,\sigma )} \lesssim _{\mu ,\varrho } \Vert f\Vert _{L ^{4/3}({\mathbb {R}}^3)}, \end{aligned}$$
(2.1)

which we proceed to establish. The desired estimate (2.1) unfolds as

$$\begin{aligned} \bigg \Vert \sup _{\begin{array}{c} m\in {\mathbb {N}}\\ 0<\varepsilon _0<\varepsilon _1<\cdots <\varepsilon _m \end{array}} \Big ( \sum _{j=1}^{m} \big | \big ({\widehat{f}} *(\mu _{\varepsilon _{j-1}} - \mu _{\varepsilon _{j}}) \big ) (\omega ) \big |^\varrho \Big )^{1/\varrho } \bigg \Vert _{L ^2_{\omega }({\mathbb {S}}^2,\sigma )} \lesssim _{\mu ,\varrho } \Vert f\Vert _{L ^{4/3}({\mathbb {R}}^3)}. \end{aligned}$$
(2.2)

The numbers \(\varepsilon _j\) in the above supremum can be restricted to a fixed interval \([\varepsilon _{min },\varepsilon _{max }]\) with \(0<\varepsilon _{min }<\varepsilon _{max }\), but the estimate needs to be established with a constant independent of \(\varepsilon _{min }\) and \(\varepsilon _{max }\). Afterwards, one simply applies the monotone convergence theorem, letting \(\varepsilon _{min }\rightarrow 0^+\) and \(\varepsilon _{max }\rightarrow \infty \). Moreover, by only increasing the left-hand side of (2.2), we can also achieve \(\varepsilon _0=\varepsilon _{min }\) and \(\varepsilon _m=\varepsilon _{max }\).

Next, by continuity, one may further restrict attention to rational numbers in the interval \([\varepsilon _{min },\varepsilon _{max }]\), and, by yet another application of the monotone convergence theorem, one may consider only finitely many values in that interval. In this way, no generality is lost in assuming that the supremum in (2.2) is attained for some \(m\in {\mathbb {N}}\) and for some measurable functions \(\varepsilon _k:{\mathbb {S}}^2\rightarrow [\varepsilon _{min },\varepsilon _{max }]\), \(k\in \{0,1,\ldots ,m\}\), such that \(\varepsilon _0(\omega )\equiv \varepsilon _{min }\) and \(\varepsilon _m(\omega )\equiv \varepsilon _{max }\). Estimate (2.2) then becomes

$$\begin{aligned} \Big \Vert \Big ( \sum _{j=1}^{m} \big | \big ({\widehat{f}} *(\mu _{\varepsilon _{j-1}(\omega )} - \mu _{\varepsilon _{j}(\omega )}) \big ) (\omega ) \big |^\varrho \Big )^{1/\varrho } \Big \Vert _{L ^2_{\omega }({\mathbb {S}}^2,\sigma )} \lesssim _{\mu ,\varrho } \Vert f\Vert _{L ^{4/3}({\mathbb {R}}^3)}. \end{aligned}$$

Once again, the implicit constant needs to be independent of m and the functions \(\varepsilon _k\). The reduction we just performed is an instance of the Kolmogorov–Seliverstov–Plessner linearization method used in [13].

Dualizing the mixed \(L _\omega ^2(\ell _j^\varrho )\)-norm (see [3]), we turn the latter estimate into

$$\begin{aligned} \big | \Lambda (f,{\mathbf {g}}) \big | \lesssim _{\mu ,\varrho } \Vert f\Vert _{L ^{4/3}({\mathbb {R}}^3)} \Big \Vert \Big ( \sum _{j=1}^{m} |g_j|^{\varrho '} \Big )^{1/\varrho '} \Big \Vert _{L ^2({\mathbb {S}}^2,\sigma )}, \end{aligned}$$
(2.3)

where the bilinear form \(\Lambda \) is defined via

$$\begin{aligned} \Lambda (f,{\mathbf {g}}) := \int \limits _{{\mathbb {S}}^2} \sum _{j=1}^{m} \big ({\widehat{f}} *(\mu _{\varepsilon _{j-1}(\omega )} - \mu _{\varepsilon _{j}(\omega )}) \big ) (\omega ) \overline{g_j(\omega )} \,d \sigma (\omega ). \end{aligned}$$

Here, \(\varrho '=\varrho /(\varrho -1)\) denotes the exponent conjugate to \(\varrho \) as usual, and \(g_j:{\mathbb {S}}^2\rightarrow {\mathbb {C}}\) are arbitrary measurable functions, \(j\in \{1,2,\ldots ,m\}\), gathered in a single vector-valued function \({\mathbf {g}}=(g_j)_{j=1}^{m}\). By elementary properties of the Fourier transform, \(\Lambda \) can be rewritten as

$$\begin{aligned} \Lambda (f,{\mathbf {g}}) = \int \limits _{{\mathbb {R}}^3} f(x) \overline{{\mathcal {E}}({\mathbf {g}})(x)} d x, \end{aligned}$$

where \({\mathcal {E}}\) is a certain extension-type operator given by

$$\begin{aligned} {\mathcal {E}}({\mathbf {g}})(x) := \int \limits _{{\mathbb {S}}^2} \sum _{j=1}^{m} \Big ( {\widehat{\mu }}\big (\varepsilon _{j-1}(\omega )x\big ) - {\widehat{\mu }}\big (\varepsilon _{j}(\omega )x\big ) \Big ) g_j(\omega ) e^{2\pi i x\cdot \omega } \,d \sigma (\omega ). \end{aligned}$$
(2.4)

By Hölder’s inequality, (2.3) is in turn equivalent to

$$\begin{aligned} \Vert {\mathcal {E}}({\mathbf {g}}) \Vert _{L ^{4}({\mathbb {R}}^3)} \lesssim _{\mu ,\varrho } \Big \Vert \Big ( \sum _{j=1}^{m} |g_j|^{\varrho '} \Big )^{1/\varrho '} \Big \Vert _{L ^2({\mathbb {S}}^2,\sigma )}. \end{aligned}$$
(2.5)

If we denote

$$\begin{aligned} \vartheta (x) := - x \cdot (\nabla {\widehat{\mu }})(x), \end{aligned}$$
(2.6)

then we also have

$$\begin{aligned} - t \frac{d }{d t} {\widehat{\mu }}(tx) = -(tx) \cdot (\nabla {\widehat{\mu }})(tx) = \vartheta (tx) \end{aligned}$$

for any \(x\in {\mathbb {R}}^3\), which in turn implies

$$\begin{aligned} {\widehat{\mu }}(ax) - {\widehat{\mu }}(bx) = \int \limits _{a}^{b} \vartheta (t x) \frac{d t}{t} \end{aligned}$$
(2.7)

for any \(0<a<b\). Substituting this into the definition of \({\mathcal {E}}\) yields

$$\begin{aligned} {\mathcal {E}}({\mathbf {g}})(x) = \int \limits _{\varepsilon _{min }}^{\varepsilon _{max }} \vartheta (tx) \int \limits _{{\mathbb {S}}^2} g_{j(t,\omega )}(\omega ) e^{2\pi i x\cdot \omega } \,d \sigma (\omega ) \frac{d t}{t}, \end{aligned}$$

where, for each \(t\in [\varepsilon _{min },\varepsilon _{max })\) and each \(\omega \in {\mathbb {S}}^2\), we denote by \(j(t,\omega )\) the unique index \(j\in \{1,2,\ldots ,m\}\) such that \(t\in [\varepsilon _{j-1}(\omega ),\varepsilon _{j}(\omega ))\). Since \(\vartheta (x)\ge 0\) by the standing assumption in (a), we can apply the Cauchy–Schwarz inequality in the variable t to estimate

$$\begin{aligned} | {\mathcal {E}}({\mathbf {g}})(x) |^2 \lesssim _{\mu } {\mathcal {A}}(x) {\mathcal {B}}({\mathbf {g}})(x), \end{aligned}$$
(2.8)

where

$$\begin{aligned} {\mathcal {A}}(x) := \int \limits _{\varepsilon _{min }}^{\varepsilon _{max }} \vartheta (tx) \frac{d t}{t} \end{aligned}$$

and

$$\begin{aligned} {\mathcal {B}}({\mathbf {g}})(x) := \int \limits _{\varepsilon _{min }}^{\varepsilon _{max }} \vartheta (tx) \Big | \int \limits _{{\mathbb {S}}^2} g_{j(t,\omega )}(\omega ) e^{2\pi i x\cdot \omega } \,d \sigma (\omega ) \Big |^2 \frac{d t}{t}. \end{aligned}$$
(2.9)

By (2.7),

$$\begin{aligned} {\mathcal {A}}(x) = {\widehat{\mu }}\big (\varepsilon _{min }x\big ) - {\widehat{\mu }}\big (\varepsilon _{max }x\big ) \lesssim 1. \end{aligned}$$
(2.10)

From (2.8) and (2.10), we see that (2.5) will be established once we prove

$$\begin{aligned} \Vert {\mathcal {B}}({\mathbf {g}}) \Vert _{L ^{2}({\mathbb {R}}^3)} \lesssim _{\mu ,\varrho } \Big \Vert \Big ( \sum _{j=1}^{m} |g_j|^{\varrho '} \Big )^{1/\varrho '} \Big \Vert _{L ^2({\mathbb {S}}^2,\sigma )}^2 . \end{aligned}$$
(2.11)

Expanding out the square in the definition of \({\mathcal {B}}({\mathbf {g}})(x)\) yields

$$\begin{aligned} {\mathcal {B}}({\mathbf {g}})(x) = \int \limits _{({\mathbb {S}}^2)^2} \int \limits _{\varepsilon _{min }}^{\varepsilon _{max }} g_{j(t,\omega )}(\omega ) \overline{g_{j(t,\omega ')}(\omega ')} \,e^{2\pi i x\cdot (\omega -\omega ')} \,\vartheta (tx) \frac{d t}{t} \,d \sigma (\omega ) \,d \sigma (\omega '). \end{aligned}$$
(2.12)

For fixed \(\omega ,\omega '\in {\mathbb {S}}^2\), consider

$$\begin{aligned}&J(\omega ,\omega ') \\&\quad := \big \{ (j,j')\in \{1,2,\ldots ,m\}^2 \,:\, [\varepsilon _{j-1}(\omega ),\varepsilon _{j}(\omega )) \cap [\varepsilon _{j'-1}(\omega '),\varepsilon _{j'}(\omega ')) \ne \emptyset \big \}. \end{aligned}$$

The intersection of two half-open intervals is either empty or again a half-open interval. It follows that, for each pair \((j,j')\in J(\omega ,\omega ')\), there exist unique real numbers \(a(j,j',\omega ,\omega ')\) and \(b(j,j',\omega ,\omega ')\) such that

$$\begin{aligned}{}[\varepsilon _{j-1}(\omega ), \varepsilon _{j}(\omega )) \cap [\varepsilon _{j'-1}(\omega '), \varepsilon _{j'}(\omega ')) = [a(j,j',\omega ,\omega '), b(j,j',\omega ,\omega ')). \end{aligned}$$
(2.13)

Clearly the intervals (2.13) constitute a finite partition of \([\varepsilon _{min },\varepsilon _{max })\). Using (2.7), we can rewrite (2.12) as

$$\begin{aligned} \begin{aligned} {\mathcal {B}}({\mathbf {g}})(x)&= \int \limits _{({\mathbb {S}}^2)^2} \sum _{(j,j')\in J(\omega ,\omega ')} g_{j}(\omega ) \overline{g_{j'}(\omega ')} \,e^{2\pi i x\cdot (\omega -\omega ')}\\ {}&\quad \times \left( \int \limits _{a(j,j',\omega ,\omega ')}^{b(j,j',\omega ,\omega ')} \vartheta (tx) \frac{\mathrm{d} t}{t} \right) \,\mathrm{d} \sigma (\omega ) \,\mathrm{d} \sigma (\omega ') \\ {}&= \int \limits _{({\mathbb {S}}^2)^2} \sum _{(j,j')\in J(\omega ,\omega ')} g_{j}(\omega ) \overline{g_{j'}(\omega ')} \,e^{2\pi i x\cdot (\omega -\omega ')} \\ {}&\quad \times \Big ( {\widehat{\mu }}\big (a(j,j',\omega ,\omega ')x\big ) - {\widehat{\mu }}\big (b(j,j',\omega ,\omega ')x\big ) \Big ) \,\mathrm{d} \sigma (\omega ) \,\mathrm{d} \sigma (\omega '). \end{aligned} \end{aligned}$$
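The bookkeeping behind (2.13) is just the common refinement of two partitions of \([\varepsilon _{min },\varepsilon _{max })\) into half-open intervals. A small sketch (with hypothetical breakpoints standing in for the values \(\varepsilon _j(\omega )\) and \(\varepsilon _{j'}(\omega ')\)):

```python
def common_refinement(grid1, grid2):
    """Both grids list breakpoints eps_min = g[0] < ... < g[-1] = eps_max.
    Returns {(j, jp): (a, b)} with
    [grid1[j-1], grid1[j]) \cap [grid2[jp-1], grid2[jp]) = [a, b)
    for exactly the pairs (j, jp) whose intervals meet."""
    pieces = {}
    for j in range(1, len(grid1)):
        for jp in range(1, len(grid2)):
            a = max(grid1[j - 1], grid2[jp - 1])
            b = min(grid1[j], grid2[jp])
            if a < b:                    # nonempty intersection
                pieces[(j, jp)] = (a, b)
    return pieces

pieces = common_refinement([1.0, 2.0, 4.0], [1.0, 3.0, 4.0])
# the resulting intervals partition [1, 4):
print(sorted(pieces.values()))  # prints [(1.0, 2.0), (2.0, 3.0), (3.0, 4.0)]
```

The returned pieces are pairwise disjoint and cover \([\varepsilon _{min },\varepsilon _{max })\), which is precisely what allows the telescoping in t via (2.7) above.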

Taking \(h\in {\mathcal {S}}({\mathbb {R}}^3)\) and dualizing with \({\widehat{h}}\) leads to the form

$$\begin{aligned} \Theta ({\mathbf {g}},h) := \int \limits _{{\mathbb {R}}^3} {\mathcal {B}}({\mathbf {g}})(x) {\widehat{h}}(x) \,d x. \end{aligned}$$

By Plancherel’s identity, we have

$$\begin{aligned} \Vert {\mathcal {B}}({\mathbf {g}}) \Vert _{L ^{2}({\mathbb {R}}^3)} = \sup \big \{ |\Theta ({\mathbf {g}},h)| \,:\, h\in {\mathcal {S}}({\mathbb {R}}^3),\ \Vert h\Vert _{L ^2({\mathbb {R}}^3)}=1 \big \}, \end{aligned}$$

so (2.11) will follow from

$$\begin{aligned} | \Theta ({\mathbf {g}},h) | \lesssim _{\mu ,\varrho } \Big \Vert \Big ( \sum _{j=1}^{m} |g_j|^{\varrho '} \Big )^{1/\varrho '} \Big \Vert _{L ^2({\mathbb {S}}^2,\sigma )}^2 \Vert h\Vert _{L ^{2}({\mathbb {R}}^3)}, \end{aligned}$$
(2.14)

which we proceed to establish.

Using basic properties of the Fourier transform, the form \(\Theta \) can be rewritten as

$$\begin{aligned} \Theta ({\mathbf {g}},h)&= \int \limits _{({\mathbb {S}}^2)^2} \sum _{(j,j')\in J(\omega ,\omega ')} g_{j}(\omega ) \overline{g_{j'}(\omega ')} \\&\quad \times \big (h*\mu _{a(j,j',\omega ,\omega ')} - h*\mu _{b(j,j',\omega ,\omega ')}\big )(\omega -\omega ') \,d \sigma (\omega ) \,d \sigma (\omega ') \end{aligned}$$

for any Schwartz function h. Applying Hölder’s inequality to the sum in \((j,j')\) and recalling the definition of the \(\varrho \)-variation, we obtain

$$\begin{aligned} \begin{aligned} |\Theta ({\mathbf {g}},h)|&\le \int \limits _{({\mathbb {S}}^2)^2} \Big (\sum _{j=1}^{m}|g_{j}(\omega )|^{\varrho '}\Big )^{1/\varrho '} \Big (\sum _{j=1}^{m}|g_{j}(\omega ')|^{\varrho '}\Big )^{1/\varrho '} \\ {}&\quad \times \Vert (h *\mu _\varepsilon )(\omega -\omega ') \Vert _{\widetilde{V }^\varrho _\varepsilon } \,\mathrm{d} \sigma (\omega ) \,\mathrm{d} \sigma (\omega '). \end{aligned} \end{aligned}$$

By the usual Tomas–Stein restriction theorem in the formulation (1.5), applied with g replaced by \(\big (\sum _{j=1}^{m}|g_{j}|^{\varrho '}\big )^{1/\varrho '}\) and with h replaced by \(\Vert h*\mu _\varepsilon \Vert _{\widetilde{V }^\varrho _\varepsilon }\), we obtain

$$\begin{aligned} |\Theta ({\mathbf {g}},h)| \lesssim \Big \Vert \Big ( \sum _{j=1}^{m} |g_j|^{\varrho '} \Big )^{1/\varrho '} \Big \Vert _{L ^2({\mathbb {S}}^2,\sigma )}^2 \big \Vert \Vert h *\mu _\varepsilon \Vert _{\widetilde{V }^\varrho _\varepsilon } \big \Vert _{L ^{2}({\mathbb {R}}^3)}. \end{aligned}$$

Invoking assumption (1.8) completes the proof of estimate (2.14), and therefore also that of (1.10). \(\square \)

Next, we will impose condition (b) and reduce the proof to the previous one by replacing \(\mu \) with a superposition of “nicer” measures.

Proof of Theorem 1 assuming condition (b)

We can repeat the same steps as before, reducing (1.10) to (2.11), where \({\mathcal {B}}\) is as in (2.9) and \(\vartheta \) is defined by (2.6).

We have already noted that (1.8) is satisfied for measures \(\mu \) with Gaussian densities, i.e., when \(d \mu (x)=\alpha ^3 e^{-\pi \alpha ^2|x|^2}\,d x\) for some \(\alpha \in (0,\infty )\), and that in this case (1.11) equals \(\psi (x/\alpha )\), where

$$\begin{aligned} \psi (x) := 2\pi |x|^2 e^{-\pi |x|^2}. \end{aligned}$$

Therefore, the previous proof of (2.11) specialized to this measure yields

$$\begin{aligned} \bigg \Vert \int \limits _{\varepsilon _{min }}^{\varepsilon _{max }} \psi \Big (\frac{tx}{\alpha }\Big ) \Big | \int \limits _{{\mathbb {S}}^2} g_{j(t,\omega )}(\omega ) e^{2\pi i x\cdot \omega } \,d \sigma (\omega ) \Big |^2 \frac{d t}{t} \bigg \Vert _{L ^{2}_{x}({\mathbb {R}}^3)} \lesssim _{\varrho } \Big \Vert \Big ( \sum _{j=1}^{m} |g_j|^{\varrho '} \Big )^{1/\varrho '} \Big \Vert _{L ^2({\mathbb {S}}^2,\sigma )}^2 . \end{aligned}$$
(2.15)

Also note that the \(L ^1\)-normalization of the above Gaussians guarantees that the left-hand side of (1.8) does not depend on the parameter \(\alpha \). Consequently, the previous proof makes the constant in the bound (2.15) independent of \(\alpha \) as well. In this way, estimate (2.11) for a general measure \(\mu \) satisfying condition (b) will be a consequence of (2.15) and Minkowski’s inequality for integrals if we can only dominate \(\vartheta (x)\) pointwise as follows:

$$\begin{aligned} |\vartheta (x)| \lesssim _{\mu ,\delta } \int \limits _{1}^{\infty } \psi \Big (\frac{x}{\alpha }\Big ) \frac{d \alpha }{\alpha ^{1+\delta }} \end{aligned}$$
(2.16)

for each \(x\in {\mathbb {R}}^3\).

Denote by \(\Psi (x)\) the right-hand side of (2.16), and observe that \(\Psi (0)=0\) and \(\Psi (x)>0\) for each \(x\ne 0\). By continuity and compactness, it suffices to show that the ratio \(|\vartheta (x)|/\Psi (x)\) remains bounded as \(|x|\rightarrow \infty \) or \(|x|\rightarrow 0^+\).

Substituting \(r=\pi \alpha ^{-2}|x|^2\), we may rewrite \(\Psi \) as

$$\begin{aligned} \Psi (x)&= 2\pi |x|^2 \int \limits _{1}^{\infty } e^{-\pi \alpha ^{-2} |x|^2} \frac{d \alpha }{\alpha ^{3+\delta }} \end{aligned}$$
(2.17)
$$\begin{aligned}&= \int \limits _{0}^{\pi |x|^2} e^{-r} \Big (\frac{\sqrt{r}}{\sqrt{\pi }|x|}\Big )^{\delta } \,d r. \end{aligned}$$
(2.18)
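Indeed, the change of variables \(r=\pi \alpha ^{-2}|x|^2\), i.e., \(\alpha =\sqrt{\pi }|x|\,r^{-1/2}\) and \(d \alpha = -\tfrac{1}{2}\sqrt{\pi }|x|\, r^{-3/2}\,d r\), maps \(\alpha \in (1,\infty )\) onto \(r\in (0,\pi |x|^2)\) (reversing orientation) and transforms the integrand in (2.17) according to

$$\begin{aligned} 2\pi |x|^2 e^{-r}\, \alpha ^{-(3+\delta )}\,|d \alpha | = 2\pi |x|^2 e^{-r}\, (\sqrt{\pi }|x|)^{-(3+\delta )} r^{(3+\delta )/2}\, \frac{\sqrt{\pi }|x|}{2}\, r^{-3/2}\,d r = e^{-r} \Big (\frac{\sqrt{r}}{\sqrt{\pi }|x|}\Big )^{\delta }\,d r. \end{aligned}$$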

From (2.18), we see that \(\lim _{|x|\rightarrow \infty } \Psi (x)/|x|^{-\delta } \in (0,\infty )\), while decay condition (1.7) gives \(|\vartheta (x)|=O(|x|^{-\delta })\) as \(|x|\rightarrow \infty \).
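To make the first limit explicit, (2.18) can be rewritten as

$$\begin{aligned} \Psi (x) = (\sqrt{\pi }|x|)^{-\delta } \int \limits _{0}^{\pi |x|^2} e^{-r} r^{\delta /2}\,d r, \end{aligned}$$

and the last integral increases to \(\Gamma (1+\tfrac{\delta }{2})\) as \(|x|\rightarrow \infty \), so that \(\lim _{|x|\rightarrow \infty } \Psi (x)/|x|^{-\delta } = \pi ^{-\delta /2}\,\Gamma (1+\tfrac{\delta }{2})\).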

On the other hand, using Taylor’s formula for the function \(x\mapsto e^{-\pi \alpha ^{-2} |x|^2}\) and substituting into (2.17), we easily obtain

$$\begin{aligned} \Psi (x) = 2\pi |x|^2 \Big (\frac{1}{2+\delta } + O_{\delta }(|x|^2)\Big ) \end{aligned}$$

on a neighborhood of the origin. Moreover, \({\widehat{\mu }}\) is \(C ^2\), and it is even because \(\mu \) is even, so \((\nabla {\widehat{\mu }})(0)=0\). Taylor’s formula then yields

$$\begin{aligned} \vartheta (x) = O_{\mu }(|x|^2) \end{aligned}$$

on a neighborhood of the origin. It follows that \(|\vartheta (x)|/\Psi (x) = O_{\mu ,\delta }(1)\) for sufficiently small nonzero |x|, and this completes the proof of (2.16). \(\square \)
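For completeness, let us spell out the Taylor expansion of \(\Psi \) used above: since \(e^{-\pi \alpha ^{-2}|x|^2} = 1 + O(\alpha ^{-2}|x|^2)\) uniformly in \(\alpha \ge 1\), formula (2.17) gives

$$\begin{aligned} \Psi (x) = 2\pi |x|^2 \int \limits _{1}^{\infty } \big (1+O(\alpha ^{-2}|x|^2)\big )\,\frac{d \alpha }{\alpha ^{3+\delta }} = 2\pi |x|^2 \Big (\frac{1}{2+\delta } + O_{\delta }(|x|^2)\Big ), \end{aligned}$$

using \(\int _1^\infty \alpha ^{-(3+\delta )}\,d \alpha = \frac{1}{2+\delta }\) and \(\int _1^\infty \alpha ^{-(5+\delta )}\,d \alpha = \frac{1}{4+\delta }\).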

The Gaussian domination trick which we have just used can be attributed to Stein, see [18, Chapter V, §3.1]. It was generalized and used in a slightly different context by Durcik [8].

Proof of Theorem 1 assuming condition (c)

We can repeat the same steps as before that reduce (1.10) to (2.5). This time we define the form \(\Theta \) differently via

$$\begin{aligned} \Theta ({\mathbf {g}},h) := \int \limits _{{\mathbb {R}}^3} |{\mathcal {E}}({\mathbf {g}})(x)|^2 \,{\widehat{h}}(x) \,d x, \end{aligned}$$

where \({\mathbf {g}}\) is as before and \(h\in {\mathcal {S}}({\mathbb {R}}^3)\). Again, by duality, we only need to establish (2.14).

Squaring out (2.4) and inserting that into the above definition of \(\Theta \), we obtain

$$\begin{aligned} \Theta ({\mathbf {g}},h)&= \int \limits _{{\mathbb {R}}^3} \int \limits _{({\mathbb {S}}^2)^2} \sum _{\begin{array}{c} 1\le j\le m\\ 1\le k\le m \end{array}} g_{j}(\omega ) \overline{g_{k}(\omega ')} \,\Big ( {\widehat{\mu }}\big (\varepsilon _{j-1}(\omega )x\big ) - {\widehat{\mu }}\big (\varepsilon _{j}(\omega )x\big ) \Big ) \\&\quad \times \Big ( \overline{{\widehat{\mu }}\big (\varepsilon _{k-1}(\omega ')x\big )} - \overline{{\widehat{\mu }}\big (\varepsilon _{k}(\omega ')x\big )} \Big ) \,{\widehat{h}}(x) \,e^{2\pi i x\cdot (\omega -\omega ')} \,d \sigma (\omega ) \,d \sigma (\omega ') \,d x, \end{aligned}$$

i.e.,

$$\begin{aligned} \Theta ({\mathbf {g}},h)&= \int \limits _{({\mathbb {S}}^2)^2} \sum _{\begin{array}{c} 1\le j\le m\\ 1\le k\le m \end{array}} g_{j}(\omega ) \overline{g_{k}(\omega ')} \\&\quad \times \big (h *( \mu _{\varepsilon _{j-1}(\omega )} - \mu _{\varepsilon _{j}(\omega )} ) *\overline{( \mu _{\varepsilon _{k-1}(\omega ')} - \mu _{\varepsilon _{k}(\omega ')} )} \,\big ) (\omega -\omega ') \,d \sigma (\omega ) \,d \sigma (\omega '). \end{aligned}$$

Applying Hölder’s inequality to the sum in \((j,k)\), and recalling the definition of the biparameter \(\varrho \)-variation seminorm, yields

$$\begin{aligned} |\Theta ({\mathbf {g}},h)|&\le \int \limits _{({\mathbb {S}}^2)^2} \Big (\sum _{j=1}^{m}|g_{j}(\omega )|^{\varrho '}\Big )^{1/\varrho '} \Big (\sum _{k=1}^{m}|g_{k}(\omega ')|^{\varrho '}\Big )^{1/\varrho '} \\&\quad \times \Vert (h *\mu _\varepsilon *{\overline{\mu }}_\eta )(\omega -\omega ') \Vert _{\widetilde{W }^\varrho _{\varepsilon ,\eta }} \,d \sigma (\omega ) \,d \sigma (\omega '). \end{aligned}$$

Using (1.5) we obtain

$$\begin{aligned} |\Theta ({\mathbf {g}},h)| \lesssim \Big \Vert \Big ( \sum _{j=1}^{m} |g_j|^{\varrho '} \Big )^{1/\varrho '} \Big \Vert _{L ^2({\mathbb {S}}^2,\sigma )}^2 \big \Vert \Vert h *\mu _\varepsilon *{\overline{\mu }}_\eta \Vert _{\widetilde{W }^\varrho _{\varepsilon ,\eta }} \big \Vert _{L ^{2}({\mathbb {R}}^3)}, \end{aligned}$$

and it remains to invoke the assumption (1.9). This proves (2.14) and, thus, also completes the proof of the theorem assuming condition (c). \(\square \)