1 Introduction

In this paper, we study a well-known problem on the optimal recovery of multivariate functions from n function samples. The problem turned out to be rather difficult in several relevant situations. Since we want to recover the function f from n function samples \((f(\mathbf {x}^1),\ldots ,f(\mathbf {x}^n))\), the problem boils down to the question of how to choose these sampling nodes \(\mathbf {X}= (\mathbf {x}^1,\ldots ,\mathbf {x}^n)\) and corresponding recovery algorithms. The minimal worst-case error for an optimal choice is reflected by the n-th sampling number defined by

$$\begin{aligned} g_n(\mathrm {Id}_{K,\varrho _D}) {:=} \inf \limits _{\mathbf {x}^1,\ldots ,\mathbf {x}^n \in D} \; \inf \limits _{\varphi :\mathbb {C}^n \rightarrow L_2} \; \sup \limits _{\Vert f\Vert _{H(K)}\le 1}\Vert f-\varphi (f(\mathbf {x}^1),\ldots ,f(\mathbf {x}^n))\Vert _{L_2(D,\varrho _D)}. \end{aligned}$$
(1)

The functions are modeled as elements from a separable reproducing kernel Hilbert space H(K) of functions on a set \(D \subset \mathbb {R}^d\) with finite trace kernel \(K(\cdot ,\cdot )\), i.e.,

$$\begin{aligned} {\text {tr}}(K){:}{=}\int _D K(\mathbf {x},\mathbf {x})d\varrho _D(\mathbf {x})<\infty . \end{aligned}$$
(2)

The recovery problem (in the above framework) was first addressed by Wasilkowski and Woźniakowski in [42]. The corresponding problem for certain particular cases (e.g., classes of functions with mixed smoothness properties, see [7, Sect. 5]) has been studied much earlier. Our main result is the existence of two universal constants \(C,c>0\) (specified in Remark 6.3) such that the relation

$$\begin{aligned} g_n^2 \le C \frac{\log (n)}{n}\sum \limits _{k\ge \lfloor cn \rfloor } \sigma _k^2\quad , \quad n\ge 2, \end{aligned}$$
(3)

holds true between the sampling numbers \((g_n)_{n\in \mathbb {N}}\) and the square summable singular numbers \((\sigma _k)_{k\in \mathbb {N}}\) of the compact embedding

$$\begin{aligned} \mathrm {Id}_{K,\varrho _D}:H(K) \rightarrow L_2(D,\varrho _D). \end{aligned}$$

We emphasize that, in general, the square-summability of the singular numbers \((\sigma _k)_{k\in \mathbb {N}}\) is not implied by the compactness of the embedding \(\mathrm {Id}_{K,\varrho _D}\). This is one reason why we need the additional assumption of a finite trace kernel (2) (or a Hilbert–Schmidt embedding). In addition, as observed by Hinrichs et al. [10], the lack of a finite trace may cause the sampling numbers to decay polynomially at a worse rate than the polynomially decaying singular numbers \((\sigma _k)_k\), or not to decay polynomially at all. Hence, an inequality like (3), which passes the polynomial decay of the singular numbers on to the sampling numbers, is in general impossible without the finite trace condition (2). In our main example, the recovery of multivariate functions with dominating mixed smoothness (see Sect. 7), this condition is equivalent to \(s>1/2\), where s denotes the mixed smoothness parameter. For further historical and technical comments (e.g., non-separable RKHS) we refer to Remark 6.2.

The algorithm which realizes the bound (3) in the sense of (1) is a (linear) least squares algorithm based on a specific set of sampling nodes. These are constructed from a random draw in combination with a down-sampling procedure coming from the proof of Weaver’s conjecture [26], see Sect. 2. In its original form, the result in [26] is not applicable for our purpose. That is why we have to slightly generalize it, see Theorem 2.3. Note that the result in (3) is non-constructive; we do not have a deterministic construction for a suitable set of nodes. However, we have control of the failure probability, which can be made arbitrarily small. In addition, the subspace where the least squares algorithm takes place is explicitly given and determined by the first m singular vectors.

The problem discussed in the present paper is tightly related to the problem of the Marcinkiewicz discretization of \(L_2\)-norms for functions from finite-dimensional spaces (e.g., trigonometric polynomials). In fact, constructing well-conditioned matrices for the least squares approximation is an equivalent issue. Let us emphasize that V.N. Temlyakov (and coauthors) already used the Nitzan et al. construction [26] for the Marcinkiewicz discretization problem in the context of multivariate (hyperbolic cross) polynomials, see [34, 35] and the very recent paper [21].

Compared to the result by Krieg and M.Ullrich [15], the relation (3) is stronger. In fact, the difference is mostly in the \(\log \)-exponent as the example below shows. The general relation (3) yields a significant improvement in the situation of mixed Sobolev embeddings in \(L_2\), see Sect. 7. Applied, for instance, to the situation of \(H^s_{\text {mix}}(\mathbb {T}^d)\) in \(L_2(\mathbb {T}^d)\) with \(s>1/2\) (this condition is equivalent to the finite trace condition (2)), the result in (3) yields

$$\begin{aligned} g_n \lesssim _d n^{-s}\log (n)^{(d-1)s+1/2}, \end{aligned}$$
(4)

whereas the result in [15] (see also [13, 24, 41]) implies

$$\begin{aligned} g_n \lesssim _d n^{-s}\log (n)^{(d-1)s+s}. \end{aligned}$$

The \(\log \)-gap grows with \(s>1/2\). Our new result achieves rates that are only worse by \(\sqrt{\log (n)}\) in comparison with the benchmark rates given by the singular numbers. Note that for \(d\ge 3\) and any \(s>1/2\) the bound (4) yields a better performance than any sparse grid technique is able to provide, see [2, 3, 6, 8, 31, 33] and [7, Sect. 5]. In addition, combining the above result with recent preasymptotic estimates for the \((\sigma _j)_j\), see [14, 17,18,19], we are able to obtain reasonable bounds for \(g_n\) also in the case of small n. See Sect. 7 for further comments and references in this direction.

Krieg and M.Ullrich [15] used a sophisticated random sampling strategy which allowed for establishing a new connection between sampling numbers and singular values. Let us emphasize that this can be considered major progress in this field. In addition, the result in this paper partly relies on this random sampling strategy according to a distribution built upon spectral properties of the embedding. The advantage of the pure random strategy in connection with a \(\log (n)\)-oversampling is the fact that the failure probability decays polynomially in n, which has been recently shown by M.Ullrich [41] and, independently, by Moeller together with the third named author [24]. In other words, although this approach incorporates a probabilistic ingredient, the failure probability is controlled and the algorithm may be implemented. Note that there are some obvious parallels to the field of compressed sensing, where the measurement matrix is also drawn at random and satisfies the RIP with high probability.

Notation. As usual, \(\mathbb {N}\) denotes the natural numbers, \(\mathbb {N}_0{:}{=}\mathbb {N}\cup \{0\}\), \(\mathbb {Z}\) denotes the integers, \(\mathbb {R}\) the real numbers, \(\mathbb {R}_+\) the nonnegative real numbers, and \(\mathbb {C}\) the complex numbers. For a natural number m, we set \([m] {:}{=} \{1, \ldots , m\}\). We will also use \(\dot{\cup }\) to emphasize that a union is disjoint. If not indicated otherwise, \(\log (\cdot )\) denotes the natural logarithm of its argument. \(\mathbb {C}^n\) denotes the complex n-space, whereas \(\mathbb {C}^{m\times n}\) denotes the set of all \(m\times n\)-matrices \(\mathbf {L}\) with complex entries. Vectors and matrices are usually typeset boldface with \(\mathbf {x},\mathbf {y}\in \mathbb {C}^n\). The matrix \(\mathbf {L}^{*}\) denotes the adjoint matrix. The spectral norm of a matrix \(\mathbf {L}\) is denoted by \(\Vert \mathbf {L}\Vert \) or \(\Vert \mathbf {L}\Vert _{2\rightarrow 2}\). For a complex (column) vector \(\mathbf {y}\in \mathbb {C}^n\) (or \(\ell _2\)), we will often use the tensor notation for the matrix

$$\begin{aligned} \mathbf {y}\otimes \mathbf {y}{:}{=} \mathbf {y}\cdot \mathbf {y}^*= \mathbf {y}\cdot \overline{\mathbf {y}}^\top \in \mathbb {C}^{n \times n}\;(\text {or}\,\mathbb {C}^{\mathbb {N}\times \mathbb {N}}). \end{aligned}$$

For \(0<p\le \infty \) and \(\mathbf {x}\in \mathbb {C}^n\), we denote \(\Vert \mathbf {x}\Vert _p {:}{=} (\sum _{i=1}^n |x_i|^p)^{1/p}\) with the usual modification in the case \(p=\infty \) or \(\mathbf {x}\) being an infinite sequence. As usual, we will denote with \(\mathbb {E}X\) the expectation of a random variable X on a probability space \((\varOmega , \mathcal {A}, \mathbb {P})\). Given a measurable subset \(D\subset \mathbb {R}^d\) and a measure \(\varrho \), we denote with \(L_2(D,\varrho )\) the space of all square integrable complex-valued functions (equivalence classes) on D with \(\int _D |f(\mathbf {x})|^2\, d\varrho (\mathbf {x})<\infty \). We will often use \(\varOmega = D^n\) as probability space with the product measure \(\mathbb {P}= d\varrho ^n\) if \(\varrho \) is a probability measure itself.

2 Weaver’s Theorem

In this section, we prove a modified version of Weaver’s \(\text {KS}_2\)-theorem, also known as Weaver’s \(\text {KS}_2\)-conjecture, from [43] which was shown to be equivalent to the famous Kadison–Singer conjecture [12] dating back as far as 1959. For a long time, these statements were mere conjectures and many people even believed them to be false. Since the celebrated proof given by Marcus et al. [22] in 2015, however, they have turned into actual theorems and thus into rather strong tools for various applications, and it is in fact the Weaver \(\text {KS}_2\)-conjecture that is at the heart of our argument in this article. We need it in a slightly modified form, however, formulated in Theorem 2.3. The starting point for its proof is the following reformulation of the classical Weaver statement which already occurred in [26]. We will formulate it with slightly improved constants, see [25].

Theorem 2.1

[26] Let \(0 < \varepsilon \) and \(\mathbf {u}_1, \ldots , \mathbf {u}_n \in \mathbb {C}^m\) with \(\Vert \mathbf {u}_i\Vert _2^2 \le \varepsilon \) for all \(i=1, \ldots , n\) and

$$\begin{aligned} \sum _{i=1}^n |\langle \mathbf {w}, \mathbf {u}_i\rangle |^2 = \Vert \mathbf {w}\Vert _2^2 \end{aligned}$$
(5)

for all \(\mathbf {w} \in \mathbb {C}^m\). Then, there is a partition \(S_1 \dot{\cup } S_2 = [n]\) with

$$\begin{aligned} \sum _{i \in S_j} |\langle \mathbf {w}, \mathbf {u}_i\rangle |^2 \le \frac{(1+\sqrt{2\varepsilon })^2}{2} \Vert \mathbf {w}\Vert _2^2 \end{aligned}$$

for each \(j=1, 2\) and all \(\mathbf {w} \in \mathbb {C}^m\). In particular, we have

$$\begin{aligned} \frac{1-(2+\sqrt{2})\sqrt{\varepsilon }}{2} \Vert \mathbf {w}\Vert _2^2 \le \sum _{i \in S_j} |\langle \mathbf {w}, \mathbf {u}_i\rangle |^2 \le \frac{1+(2+\sqrt{2})\sqrt{\varepsilon }}{2} \Vert \mathbf {w}\Vert _2^2 \end{aligned}$$

for each \(j=1, 2\) and all \(\mathbf {w} \in \mathbb {C}^m\).

Note that the above statement is trivial for \(\varepsilon \ge (2+\sqrt{2})^{-2}\), since in this case the lower bound is \(\le 0\) and the upper bound is \(\ge 1\). Relaxing condition (5), one obtains an analogous statement for non-tight frames.

Corollary 2.2

[26] Let \(0 < \varepsilon \) and \(\mathbf {u}_1, \ldots , \mathbf {u}_n \in \mathbb {C}^m\) with \(\Vert \mathbf {u}_i\Vert _2^2 \le \varepsilon \) for all \(i=1, \ldots , n\) and

$$\begin{aligned} \alpha \Vert \mathbf {w}\Vert _2^2 \le \sum _{i=1}^n |\langle \mathbf {w}, \mathbf {u}_i\rangle |^2 \le \beta \Vert \mathbf {w}\Vert _2^2 \end{aligned}$$

for all \(\mathbf {w} \in \mathbb {C}^m\), where \(\beta \ge \alpha > 0\) are some fixed constants. Then, there is a partition \(S_1 \dot{\cup } S_2 = [n]\), such that

$$\begin{aligned} \frac{1-(2+\sqrt{2})\sqrt{\varepsilon /\alpha }}{2} \cdot \alpha \Vert \mathbf {w}\Vert _2^2 \le \sum _{i \in S_j} |\langle \mathbf {w}, \mathbf {u}_i\rangle |^2 \le \frac{1+(2+\sqrt{2})\sqrt{\varepsilon /\alpha }}{2} \cdot \beta \Vert \mathbf {w}\Vert _2^2 \end{aligned}$$

for each \(j=1, 2\) and all \(\mathbf {w} \in \mathbb {C}^m\).

Again, the above statement is trivial for \(\varepsilon /\alpha \ge (2+\sqrt{2})^{-2}\). Now we are ready to formulate and prove the theorem which is convenient for our later purpose. The proof technique of this theorem is analogous to the one used for the proof of Lemma 2 in [26]. After the preprint of this paper was finished, V.N. Temlyakov pointed out to us that the proof of Lemma 2.2 in the recent paper [21], although that lemma is stated in a weaker form, also contains a version of the theorem below with unspecified constants.

Theorem 2.3

Let \(k_1, k_2, k_3 > 0\) and \(\mathbf {u}_1, \ldots , \mathbf {u}_n \in \mathbb {C}^m\) with \(\Vert \mathbf {u}_i\Vert _2^2 \le k_1 \frac{m}{n}\) for all \(i=1, \ldots , n\) and

$$\begin{aligned} k_2 \Vert \mathbf {w}\Vert _2^2 \le \sum _{i=1}^n |\langle \mathbf {w}, \mathbf {u}_i\rangle |^2 \le k_3 \Vert \mathbf {w}\Vert _2^2 \end{aligned}$$

for all \(\mathbf {w} \in \mathbb {C}^m\). Then, there is a \(J \subseteq [n]\) of size \(\# J \le c_1 m\) with

$$\begin{aligned} c_2 \cdot \frac{m}{n} \Vert \mathbf {w}\Vert _2^2 \le \sum _{i \in J} |\langle \mathbf {w}, \mathbf {u}_i\rangle |^2 \le c_3 \cdot \frac{m}{n} \Vert \mathbf {w}\Vert _2^2 \end{aligned}$$

for all \(\mathbf {w} \in \mathbb {C}^m\), where \(c_1, c_2, c_3\) only depend on \(k_1, k_2, k_3\). More precisely, we can choose

$$\begin{aligned} c_1 = 1642 \frac{k_1}{k_2} ,\quad c_2 = (2+\sqrt{2})^2k_1,\quad c_3 = 1642 \frac{k_1k_3}{k_2} \end{aligned}$$

in case \(\frac{n}{m}\ge 47 \frac{k_1}{k_2}\). In the regime \(1\le \frac{n}{m}< 47 \frac{k_1}{k_2}\), one may put \(c_1 = 47k_1/k_2\), \(c_2 = k_2\), \(c_3=47k_1 k_3/k_2\).

Proof

To ease the notation a bit, let us set \(\zeta {:}{=} 2+\sqrt{2}\). Put \(\delta {:}{=} k_1 \frac{m}{n}\), \(\alpha _0 {:}{=} k_2\), \(\beta _0 {:}{=} k_3\) and define recursively

$$\begin{aligned} \alpha _{\ell +1} {:}{=} \frac{1-\zeta \sqrt{\delta /\alpha _\ell }}{2} \cdot \alpha _\ell , \quad \beta _{\ell +1} {:}{=} \frac{1+\zeta \sqrt{\delta /\alpha _\ell }}{2} \cdot \beta _\ell \end{aligned}$$

for \(\ell \in \mathbb {N}_0\). Assume for the moment that \(\delta < (2\zeta )^{-2}k_2\). We want to show that there is a constant \(\gamma >0\), not depending on \(\delta \), and an \(L \in \mathbb {N}_0\), such that \(\alpha _\ell \ge (2\zeta )^2\delta \) for all \(\ell \le L\) as well as \(\zeta ^2 \delta \le \alpha _{L+1} < (2\zeta )^2\delta \) and \(\beta _{L+1}< \gamma \alpha _{L+1} < (2\zeta )^2\gamma \delta \).

Notice that

$$\begin{aligned} \frac{1-\zeta \sqrt{\delta /\alpha _\ell }}{2} \end{aligned}$$

is strictly increasing in \(\alpha _\ell > 0\). For \(\alpha _\ell \ge (2\zeta )^2\delta \), we thus have

$$\begin{aligned} \frac{1}{4} = \frac{1-\zeta \sqrt{\frac{\delta }{(2\zeta )^2\delta }}}{2} \le \frac{1-\zeta \sqrt{\delta /\alpha _\ell }}{2} < \lim \limits _{x \longrightarrow \infty } \frac{1-\zeta \sqrt{\delta /x}}{2} = \frac{1}{2}, \end{aligned}$$

and therefore

$$\begin{aligned} \frac{1}{4} \alpha _\ell \le \frac{1-\zeta \sqrt{\delta /\alpha _\ell }}{2} \cdot \alpha _\ell = \alpha _{\ell +1} < \frac{1}{2} \alpha _\ell . \end{aligned}$$

Set \(L {:}{=} \max \left\{ \ell \in \mathbb {N}_0: \alpha _\ell \ge (2\zeta )^2\delta \right\} \) so that \(\alpha _\ell \ge (2\zeta )^2\delta \) for all \(\ell \le L\). Notice that, since \(\alpha _{\ell +1} < \alpha _\ell /2\) as long as \(\alpha _\ell \ge (2\zeta )^2\delta \), we have \(L < \infty \). Since \(\alpha _0 = k_2 > (2\zeta )^2\delta \), we also have \(L \ge 0\).

The definition of L directly yields \(\alpha _{L+1} < (2\zeta )^2\delta \), but by the above also

$$\begin{aligned} \alpha _{L+1} \ge \frac{1}{4} \alpha _L \ge \zeta ^2 \delta . \end{aligned}$$

It remains to find a \(\gamma > 0\) as described above. To do so, first observe by the definition of the \(\alpha _\ell \) and \(\beta _\ell \) that

$$\begin{aligned} \frac{\beta _{L+1}}{\alpha _{L+1}}&= \frac{\beta _L}{\alpha _L} \cdot \frac{1+\zeta \sqrt{\delta /\alpha _L}}{1-\zeta \sqrt{\delta /\alpha _L}} = \frac{\beta _{L-1}}{\alpha _{L-1}} \cdot \frac{1+\zeta \sqrt{\delta /\alpha _{L-1}}}{1-\zeta \sqrt{\delta /\alpha _{L-1}}} \cdot \frac{1+\zeta \sqrt{\delta /\alpha _L}}{1-\zeta \sqrt{\delta /\alpha _L}} = \ldots \\&= \frac{\beta _0}{\alpha _0} \cdot \prod _{\ell =0}^L \frac{1+\zeta \sqrt{\delta /\alpha _\ell }}{1-\zeta \sqrt{\delta /\alpha _\ell }}. \end{aligned}$$

We have \(\alpha _L \ge (2\zeta )^2\delta \) so that \(\zeta \sqrt{\delta /\alpha _L} \le 2^{-1}\) and using

$$\begin{aligned} \alpha _{\ell +1}< \frac{1}{2} \alpha _\ell \quad \Rightarrow \quad \zeta \sqrt{\delta /\alpha _\ell } < 2^{-1/2} \cdot \zeta \sqrt{\delta /\alpha _{\ell +1}} \end{aligned}$$

inductively we get \(\zeta \sqrt{\delta /\alpha _{L-\ell }} \le 2^{-1-\ell /2}\) for \(\ell = 0, 1, \ldots , L\). Thus,

$$\begin{aligned} \frac{\beta _{L+1}}{\alpha _{L+1}} \le \frac{k_3}{k_2} \cdot \prod _{\ell =0}^L \frac{1+\zeta \sqrt{\delta /\alpha _\ell }}{1-\zeta \sqrt{\delta /\alpha _\ell }}< \frac{k_3}{k_2} \cdot \prod _{\ell =0}^\infty \frac{1+2^{-1-\ell /2}}{1-2^{-1-\ell /2}} =: \gamma < 35.21 \cdot \frac{k_3}{k_2}, \end{aligned}$$

which yields the final claim.

With this at hand, consider the situation of the theorem. Clearly, we have \(n\ge m\) due to the lower frame bound \(k_2>0\). We now distinguish two cases. Firstly, if \(1\le \frac{n}{m} < 47\frac{k_1}{k_2}\) the assertion follows directly for \(J = [n]\) and the choice \(c_1 = \frac{n}{m}\), \(c_2 = k_2 \frac{n}{m}\), and \(c_3 = k_3 \frac{n}{m}\). Incorporating the bounds for n/m gives the choice in the statement of the theorem.

In the second case, when \(\frac{n}{m}\ge 47\frac{k_1}{k_2}\), let \(\alpha _\ell \), \(\beta _\ell \) be as above and note that \(\delta = k_1 \frac{m}{n} \le k_2/47 < (2\zeta )^{-2}k_2\). The vectors \(\mathbf {u}_i\) fulfill the assumptions of Corollary 2.2 for \(\alpha = \alpha _0 = k_2\) and \(\beta = \beta _0 = k_3\), so that there is a set \(J_1 \subseteq [n]\) with

$$\begin{aligned} \alpha _1 \Vert \mathbf {w}\Vert _2^2 \le \sum _{i \in J_1} |\langle \mathbf {w}, \mathbf {u}_i\rangle |^2 \le \beta _1 \Vert \mathbf {w}\Vert _2^2 \end{aligned}$$

for all \(\mathbf {w} \in \mathbb {C}^m\). By choosing the smaller of the two partition classes \(S_1\) or \(S_2\), we may assume \(\# J_1 \le \frac{1}{2} n\). We can now apply Corollary 2.2 again, where we restrict ourselves to the indices in \(J_1\). We thus get a \(J_2 \subseteq J_1\) with

$$\begin{aligned} \alpha _2 \Vert \mathbf {w}\Vert _2^2 \le \sum _{i \in J_2} |\langle \mathbf {w}, \mathbf {u}_i\rangle |^2 \le \beta _2 \Vert \mathbf {w}\Vert _2^2 \end{aligned}$$

for all \(\mathbf {w} \in \mathbb {C}^m\). Again, by choosing the smaller partition class, we may assume \(\# J_2 \le \frac{1}{2} \# J_1 \le \frac{1}{4} n\). After \(L+1\) applications of Corollary 2.2, we get

$$\begin{aligned} \alpha _{L+1} \Vert \mathbf {w}\Vert _2^2 \le \sum _{i \in J_{L+1}} |\langle \mathbf {w}, \mathbf {u}_i\rangle |^2 \le \beta _{L+1} \Vert \mathbf {w}\Vert _2^2 \end{aligned}$$

for all \(\mathbf {w} \in \mathbb {C}^m\), where \(J_{L+1} \subseteq [n]\) with \(\#J_{L+1} \le 2^{-(L+1)} n\). By what was proven in the first part of this proof, we therefore get

$$\begin{aligned} \zeta ^2\delta \Vert \mathbf {w}\Vert _2^2 \le \sum _{i \in J_{L+1}} |\langle \mathbf {w}, \mathbf {u}_i\rangle |^2 \le (2\zeta )^2 \gamma \delta \Vert \mathbf {w}\Vert _2^2 \end{aligned}$$

for all \(\mathbf {w} \in \mathbb {C}^m\). We thus get the assertion for \(J = J_{L+1}\), \(c_2 = \zeta ^2 k_1\) and \(c_3 = (2\zeta )^2 \gamma k_1\le 35.21\cdot (2\zeta )^2 \frac{k_1k_3}{k_2}\). As for \(c_1\), look at the quantities

$$\begin{aligned} \phi _\ell {:}{=} \frac{\beta _\ell }{\# J_\ell } \end{aligned}$$

for \(\ell = 0, 1, \ldots , L+1\), where we set \(J_0 {:}{=} [n]\). Then, \(\phi _0 = \beta _0/n = k_3/n\). Since

$$\begin{aligned} \frac{\phi _{\ell +1}}{\phi _\ell } = \frac{\beta _{\ell +1}}{\beta _\ell } \cdot \frac{\# J_\ell }{\# J_{\ell +1}} = \underbrace{\frac{1+\zeta \sqrt{\delta /\alpha _\ell }}{2}}_{\ge 1/2} \cdot \underbrace{\frac{\# J_\ell }{\# J_{\ell +1}}}_{\ge 2} \ge 1, \end{aligned}$$

we see that the \(\phi _\ell \) are monotonically increasing. Thus,

$$\begin{aligned} \frac{k_3}{n} = \phi _0 \le \phi _{L+1} = \frac{\beta _{L+1}}{\# J_{L+1}} \le \frac{(2\zeta )^2\gamma \delta }{\# J_{L+1}}, \end{aligned}$$

so that

$$\begin{aligned} \# J = \# J_{L+1} \le \frac{n}{k_3} \cdot (2\zeta )^2 \gamma \cdot k_1 \frac{m}{n} \le 35.21 \cdot (2\zeta )^2 \frac{k_1}{k_2} \cdot m, \end{aligned}$$

i.e., \(c_1 \le 35.21 \cdot (2\zeta )^2k_1/k_2\). \(\square \)
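The numerical constant 35.21 used for the infinite product in the proof is easy to double-check, since the factors tend to 1 geometrically fast. The following small Python script (ours, for verification only and not part of the argument) evaluates a truncated version of the product:

```python
import math

def weaver_product(terms: int = 200) -> float:
    """Truncated evaluation of prod_{l>=0} (1 + 2^{-1-l/2}) / (1 - 2^{-1-l/2})."""
    p = 1.0
    for l in range(terms):
        x = 2.0 ** (-1.0 - l / 2.0)
        p *= (1.0 + x) / (1.0 - x)
    return p

if __name__ == "__main__":
    # 200 terms are far more than enough; the printed value stays below 35.21,
    # in accordance with the estimate for gamma in the proof above
    print(f"product = {weaver_product():.4f}")
```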

3 Reproducing Kernel Hilbert Spaces

We will work in the framework of reproducing kernel Hilbert spaces. The relevant theoretical background can be found in [1, Chapt. 1] and [4, Chapt. 4]. The papers [9] and [32] are also of particular relevance for the subject of this paper.

Let \(L_2(D,\varrho _D)\) be the space of complex-valued square-integrable functions with respect to \(\varrho _D\). Here \(D \subset \mathbb {R}^d\) is an arbitrary measurable subset and \(\varrho _D\) a measure on D. We further consider a reproducing kernel Hilbert space H(K) with a Hermitian positive definite kernel K on \(D \times D\). The crucial property of reproducing kernel Hilbert spaces is the fact that Dirac functionals are continuous, or, equivalently, the reproducing property

$$\begin{aligned} f(\mathbf {x}) = \langle f, K(\cdot ,\mathbf {x}) \rangle _{H(K)} \end{aligned}$$

holds for all \(\mathbf {x}\in D\).

We will use the notation from [4, Chapt. 4]. In the framework of this paper, the finite trace of the kernel is given by

$$\begin{aligned} {\text {tr}}(K){:}{=}\Vert K\Vert ^2_{2} = \int _{D} K(\mathbf {x},\mathbf {x})d\varrho _D(\mathbf {x}) < \infty . \end{aligned}$$
(6)

The embedding operator

$$\begin{aligned} \mathrm {Id}_{K,\varrho _D}:H(K) \rightarrow L_2(D,\varrho _D) \end{aligned}$$
(7)

is Hilbert–Schmidt under the finite trace condition (6), see [9, 32, Lemma 2.3], which we always assume from now on. We additionally assume that H(K) is at least infinite dimensional. Let \((\lambda _j)_{j\in \mathbb {N}}\) denote the (at most countable) system of strictly positive eigenvalues of \(W_{K,\varrho _D} = \mathrm {Id}_{K,\varrho _D}^{*} \circ \mathrm {Id}_{K,\varrho _D}\), arranged in non-increasing order, i.e.,

$$\begin{aligned} \lambda _1 \ge \lambda _2 \ge \lambda _3 \ge \cdots > 0. \end{aligned}$$

We will also need the left and right singular vectors \((e_k)_k \subset H(K)\) and \((\eta _k)_k \subset L_2(D,\varrho _D)\) which both represent orthonormal systems in the respective spaces related by \(e_k = \sigma _k \eta _k\) with \(\lambda _k = \sigma _k^2\) for \(k\in \mathbb {N}\). We would like to emphasize that the embedding (7) is not necessarily injective. In other words, for certain kernels there might also be a nontrivial null-space of the embedding in (7). Therefore, the system \((e_k)_k\) from above is not necessarily a basis in H(K). It would be a basis under additional restrictions, e.g., if the kernel \(K(\cdot ,\cdot )\) is continuous and bounded (i.e., a Mercer kernel). It is shown in [9, 32, Lemma 2.3] that if \({\text {tr}}(K)<\infty \) and H(K) is separable the nonnegative function

$$\begin{aligned} K(\mathbf {x},\mathbf {x}) - \sum \limits _{k=1}^{\infty } |e_k(\mathbf {x})|^2 \end{aligned}$$
(8)

vanishes almost everywhere. Let us finally define the “spectral functions”

$$\begin{aligned} N(m) {:}{=} \sup \limits _{\mathbf {x}\in D}\sum \limits _{k=1}^{m-1}|\eta _k(\mathbf {x})|^2\ \end{aligned}$$
(9)

and

$$\begin{aligned} T(m) {:}{=} \sup \limits _{\mathbf {x}\in D}\sum \limits _{k=m}^{\infty }|e_k(\mathbf {x})|^2 \end{aligned}$$
(10)

provided that they exist.
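For instance, in the situation of Sect. 7 below (periodic Sobolev spaces with dominating mixed smoothness), the functions \(\eta _k\) are the complex exponentials, so that \(|\eta _k(\mathbf {x})| = 1\) for all \(\mathbf {x}\) and, in view of \(e_k = \sigma _k \eta _k\),

$$\begin{aligned} N(m) = m-1 \quad \text {and}\quad T(m) = \sum \limits _{k=m}^{\infty }\sigma _k^2. \end{aligned}$$

In other words, (9) grows only linearly in m, while (10) is controlled by the tail of the squared singular numbers.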

4 Weighted Least Squares

Concentration inequalities for the spectral norm of sums of complex rank-1 matrices will be given in Sect. 5. Such matrices appear as \(\mathbf {L}^*\mathbf {L}\) when studying least squares solutions of over-determined linear systems

$$\begin{aligned} \mathbf {L}\cdot \mathbf {c} = \mathbf {f}, \end{aligned}$$

where \(\mathbf {L}\in \mathbb {C}^{n \times m}\) is a matrix with \(n>m\). It is well known that the above system may not have a solution. However, we can ask for the vector \(\mathbf {c}\) which minimizes the residual \(\Vert \mathbf {f}-\mathbf {L}\cdot \mathbf {c}\Vert _2\). Multiplying the system with \(\mathbf {L}^*\) gives

$$\begin{aligned} \mathbf {L}^{*}\mathbf {L}\cdot \mathbf {c} = \mathbf {L}^{*}\cdot \mathbf {f} \end{aligned}$$

which is called the system of normal equations. If \(\mathbf {L}\) has full rank, then the unique solution of the least squares problem is given by

$$\begin{aligned} \mathbf {c} = (\mathbf {L}^{*}\mathbf {L})^{-1}\mathbf {L}^{*}\cdot \mathbf {f}. \end{aligned}$$
(11)

For function recovery problems, we will use the following matrix

$$\begin{aligned} \mathbf {L}_{n,m} {:}{=} \left( \begin{array}{cccc} \eta _1(\mathbf {x}^1) &{}\eta _2(\mathbf {x}^1)&{} \cdots &{} \eta _{m-1}(\mathbf {x}^1)\\ \vdots &{} \vdots &{}&{} \vdots \\ \eta _1(\mathbf {x}^n) &{}\eta _2(\mathbf {x}^n)&{} \cdots &{} \eta _{m-1}(\mathbf {x}^n) \end{array}\right) = \left( \begin{array}{c} \mathbf {y}^1\\ \vdots \\ \mathbf {y}^n \end{array}\right) \quad \text {and}\quad \mathbf {f} = \left( \begin{array}{c} f(\mathbf {x}^1)\\ \vdots \\ f(\mathbf {x}^n) \end{array}\right) , \end{aligned}$$
(12)

for a vector \(\mathbf {X}= (\mathbf {x}^1,\ldots ,\mathbf {x}^n) \in D^n\) of distinct sampling nodes and a system \((\eta _k(\cdot ))_{k}\) of functions. Here \(\mathbf {y}^i {:}{=} (\eta _1(\mathbf {x}^i),\ldots ,\eta _{m-1}(\mathbf {x}^i))\), \(i=1,\ldots ,n\).

Lemma 4.1

[13, Proposition 3.1] Let \(\mathbf {L}\in \mathbb {C}^{n\times m}\), \(m\le n\), be a matrix with full rank and singular values \(\tau _1,\ldots ,\tau _m >0\) arranged in non-increasing order.

  1. (i)

    Then, also the matrix \((\mathbf {L}^{*}\mathbf {L})^{-1}\mathbf {L}^{*}\) has full rank and singular values \(\tau _m^{-1},\ldots ,\tau _1^{-1}\) (arranged in non-increasing order).

  2. (ii)

    In particular, it holds that

    $$\begin{aligned} (\mathbf {L}^{*}\mathbf {L})^{-1}\mathbf {L}^{*} = \mathbf {V}^{*} \tilde{{{\varvec{\varSigma }}}}\mathbf {U}\end{aligned}$$

whenever \(\mathbf {L}= \mathbf {U}^{*} {{\varvec{\varSigma }}} \mathbf {V}\), where \(\mathbf {U}\in \mathbb {C}^{n\times n}\) and \(\mathbf {V}\in \mathbb {C}^{m\times m}\) are unitary matrices and \({{\varvec{\varSigma }}} \in \mathbb {R}^{n\times m}\) is a rectangular matrix with only \((\tau _1,\ldots ,\tau _m)\) on the main diagonal. Here \(\tilde{{{\varvec{\varSigma }}}} \in \mathbb {R}^{m\times n}\) denotes the matrix with \((\tau _1^{-1},\ldots ,\tau _m^{-1})\) on the main diagonal.

  3. (iii)

    The operator norm \(\Vert (\mathbf {L}^{*}\mathbf {L})^{-1}\mathbf {L}^{*}\Vert _{2 \rightarrow 2}\) can be controlled as follows:

    $$\begin{aligned} \tau _1^{-1} \le \Vert (\mathbf {L}^{*}\mathbf {L})^{-1}\mathbf {L}^{*}\Vert _{2 \rightarrow 2} \le \tau _m^{-1}. \end{aligned}$$
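The reciprocal relation between the singular values in (i) and (iii) can be verified numerically; a minimal sketch (ours, using numpy on a random full rank matrix):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 40, 7
L = rng.standard_normal((n, m)) + 1j * rng.standard_normal((n, m))  # full rank a.s.

pinv = np.linalg.solve(L.conj().T @ L, L.conj().T)   # (L*L)^{-1} L*
tau = np.linalg.svd(L, compute_uv=False)             # tau_1 >= ... >= tau_m
tau_inv = np.linalg.svd(pinv, compute_uv=False)      # should be 1/tau_m >= ... >= 1/tau_1

assert np.allclose(tau_inv, np.sort(1.0 / tau)[::-1])
assert np.isclose(np.linalg.norm(pinv, 2), 1.0 / tau.min())
```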

Being in the RKHS setting, we compute the coefficients \(c_k\), \(k=1,\ldots ,m-1\), of the approximant

$$\begin{aligned} S^m_{\mathbf {X}}f{:}{=}\sum _{k = 1}^{m-1} c_k\, \eta _k\, \end{aligned}$$
(13)

using the least squares algorithm (11). We will also use the weighted version below, where \(\varrho _m(\cdot )\) is a density function which essentially first appeared in [15] and has been adapted in [24] to

$$\begin{aligned} \varrho _m(\mathbf {x}) = \frac{1}{2} \bigg (\frac{1}{m-1} \sum _{j = 1}^{m-1} |\eta _j(\mathbf {x})|^2 + \frac{K(\mathbf {x},\mathbf {x}) - \sum _{j = 1}^{m-1}|e_j(\mathbf {x})|^2}{\int _D K(\mathbf {x},\mathbf {x}) d\varrho _D(\mathbf {x}) - \sum _{j = 1}^{m-1}\lambda _j} \bigg ). \end{aligned}$$
(14)
Algorithm 1 (weighted least squares \(\widetilde{S}^m_{\mathbf {X}}f\))

Note that the mapping \(f \mapsto \widetilde{S}^m_{\mathbf {X}}f\) is well defined and linear for a fixed set of sampling nodes

$$\mathbf {X}= (\mathbf {x}^1,\ldots ,\mathbf {x}^n) \in D^n$$

if the matrix \(\widetilde{\mathbf {L}}_{n,m}\) has full (column) rank. The next section gives sufficient conditions when this is the case.
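For concreteness, the following Python mock-up (ours; it is not the authors' reference implementation of Algorithm 1, and the density rho_m below is a trivial stand-in for (14)) computes the coefficients of the weighted least squares approximant: every row of the system and every sample is rescaled by \(1/\sqrt{\varrho _m(\mathbf {x}^i)}\), and the resulting over-determined system is solved in the least squares sense (11).

```python
import numpy as np

def weighted_least_squares(x, samples, eta, rho_m):
    """Coefficients c_1, ..., c_{m-1} of the approximant (13) from samples f(x^i).

    x       : n sampling nodes, samples : the n values f(x^i),
    eta     : list of the m-1 basis functions eta_1, ..., eta_{m-1},
    rho_m   : density used to reweight rows and samples by 1/sqrt(rho_m(x^i)).
    """
    w = 1.0 / np.sqrt(np.array([rho_m(xi) for xi in x]))
    L_tilde = np.stack([w * np.array([eta_k(xi) for xi in x]) for eta_k in eta], axis=1)
    c, *_ = np.linalg.lstsq(L_tilde, w * samples, rcond=None)   # least squares (11)
    return c

if __name__ == "__main__":
    # toy example on T = [0,1): trigonometric system, uniform (stand-in) density
    rng = np.random.default_rng(0)
    freqs = [0, 1, -1, 2, -2, 3, -3, 4]                       # m - 1 = 8 functions
    eta = [lambda x, k=k: np.exp(2j * np.pi * k * x) for k in freqs]
    rho_m = lambda x: 1.0
    f = lambda x: np.exp(2j * np.pi * x) + 0.5 * np.exp(-4j * np.pi * x)
    x = rng.random(60)                                        # n = 60 i.i.d. nodes
    c = weighted_least_squares(x, f(x), eta, rho_m)
    print(np.round(c, 6))   # f lies in the span, so it is recovered up to round-off
```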

5 Concentration Results for Random Matrices

We start with a concentration inequality for the spectral norm of a matrix of type (12). It turns out that the complex matrix \(\mathbf {L}_{n,m}{:}{=}\mathbf {L}_{n,m}(\mathbf {X})\in \mathbb {C}^{n\times (m-1)}\) has full rank with high probability if \(\mathbf {X}= (\mathbf {x}^1,\ldots ,\mathbf {x}^n)\) is drawn at random from \(D^n\) according to a measure \(\mathbb {P}= d\varrho ^n\), the functions \((\eta _k)_{k=1}^{m-1}\) are orthonormal w.r.t. the measure \(\varrho \) and m is not too large (compared to n). We will find below that the eigenvalues of

$$\begin{aligned} \mathbf {H}_m{:}{=}\mathbf {H}_m(\mathbf {X}) = \frac{1}{n}\mathbf {L}_{n,m}^*\mathbf {L}_{n,m} = \frac{1}{n} \sum \limits _{i = 1}^n \mathbf {y}^i \otimes \mathbf {y}^i \in \mathbb {C}^{(m-1)\times (m-1)} \end{aligned}$$
(16)

are bounded away from zero with high probability if m is small enough compared to n. The following result is a consequence of [40, Thm. 1.1], see also [24, Thm. 2.3, Cor. 2.5].

Theorem 5.1

For \(n \ge m\) and \(r > 1\), the matrix \(\mathbf {H}_m\) has only eigenvalues greater than 1/2 and smaller than 3/2 with probability at least \(1 - 2n^{1-r}\), provided the nodes are sampled i.i.d. according to \(\varrho \) and

$$\begin{aligned} N(m) \le \frac{n}{ 10 \, r \log n}, \end{aligned}$$
(17)

where N(m) is the quantity defined in (9). Equivalently, we have for all \(\mathbf {w}\in \mathbb {C}^{m-1}\)

$$\begin{aligned} \frac{1}{2}\Vert \mathbf {w}\Vert _2^2 \le \frac{1}{n}\big \Vert \mathbf {L}_{n,m} \mathbf {w}\big \Vert _2^2 \le \frac{3}{2}\Vert \mathbf {w}\Vert _2^2 \end{aligned}$$

and

$$\begin{aligned} \sqrt{\frac{2}{3n}} \le \Vert (\mathbf {L}_{n,m}^{*}\mathbf {L}_{n,m})^{-1}\mathbf {L}_{n,m}^{*}\Vert _{2\rightarrow 2} \le \sqrt{\frac{2}{n}} \end{aligned}$$
(18)

with probability at least \(1-2n^{1-r}\).
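The statement can also be probed empirically. In the following small simulation (ours; the trigonometric system on \(\mathbb {T}\) serves as the orthonormal system, and the parameters are chosen for illustration only), the spectrum of \(\mathbf {H}_m\) indeed stays inside (1/2, 3/2) in essentially all trials:

```python
import numpy as np

rng = np.random.default_rng(2)
n, r = 2000, 2
freqs = list(range(-5, 6))                  # 11 orthonormal exponentials on [0,1)
m = len(freqs) + 1                          # N(m) = m - 1 = 11 since |eta_k| = 1
assert m - 1 <= n / (10 * r * np.log(n))    # condition (17)

trials, inside = 200, 0
for _ in range(trials):
    x = rng.random(n)                                   # i.i.d. uniform nodes
    L = np.exp(2j * np.pi * np.outer(x, freqs))         # matrix (12)
    eig = np.linalg.eigvalsh(L.conj().T @ L / n)        # spectrum of H_m, cf. (16)
    inside += (eig.min() > 0.5) and (eig.max() < 1.5)
print(f"{inside}/{trials} trials with all eigenvalues in (1/2, 3/2)")
```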

Let us now turn to infinite matrices. We need a result which can be applied to independent \(\ell _2\)-sequences of the form

$$\begin{aligned} \mathbf {y}^i = (e_m(\mathbf {x}^i),e_{m+1}(\mathbf {x}^i),\ldots )\quad ,\quad i=1,\ldots ,n, \end{aligned}$$

where \((e_k)_k \subset H(K)\) is the system of right singular vectors of the embedding \(\mathrm {Id}:H(K) \rightarrow L_2\) defined above.

The following infinite-dimensional concentration result is proved in [24]. There are earlier versions for the finite-dimensional framework (matrices) proved by Tropp [40], Oliveira [28], Rauhut [30] and others. Mendelson, Pajor [23] and also Oliveira [28] comment on infinite versions of their result. The key feature of the following proposition is the exact control of the constants and the decaying failure probability, see also Remark 3.10 in [24] for a more detailed comparison to earlier results.

Proposition 5.2

Let \(\mathbf {y}^i\), \(i=1,\ldots ,n\), be i.i.d. random sequences from \(\ell _2\). Let further \(n \ge 3\), \(r > 1\), \(M>0\) such that \(\Vert \mathbf {y}^i \Vert _2 \le M\) for all \(i=1,\ldots ,n\) almost surely and \( \mathbb {E}\mathbf {y}^i \otimes \mathbf {y}^i = {{\varvec{\varLambda }}}\) for all \(i=1,\ldots ,n\). Then,

$$\begin{aligned} \mathbb {P}\Big ( \Big \Vert \frac{1}{n} \sum _{i=1}^n \, \mathbf {y}^i \otimes \mathbf {y}^i - {{\varvec{\varLambda }}} \Big \Vert _{2 \rightarrow 2} \ge F \Big ) \le 2^\frac{3}{4} \, n^{1-r}, \end{aligned}$$

where \(F {:}{=} \max \Big \{ \frac{8r \log n}{n} M^2 \kappa ^2 , \Vert {{\varvec{\varLambda }}} \Vert _{2 \rightarrow 2} \Big \}\) and \( \kappa = \frac{1+\sqrt{5}}{2}\) .

This can be written in a more compact form.

Theorem 5.3

Let \(\mathbf {y}^i\), \(i=1,\ldots ,n\), be i.i.d. random sequences from \(\ell _2\). Let further \(n \ge 3\), \(M>0\) such that \(\Vert \mathbf {y}^i \Vert _2 \le M\) for all \(i=1,\ldots ,n\) almost surely and \( \mathbb {E}\mathbf {y}^i \otimes \mathbf {y}^i = {{\varvec{\varLambda }}}\) for \(i=1,\ldots ,n\) with \(\Vert \varvec{\varLambda }\Vert _{2\rightarrow 2} \le 1\). Then, for \(0<t<1\),

$$\begin{aligned} \mathbb {P}\Big ( \Big \Vert \frac{1}{n} \sum _{i=1}^n \, \mathbf {y}^i \otimes \mathbf {y}^i - {{\varvec{\varLambda }}} \Big \Vert _{2 \rightarrow 2} \ge t \Big ) \le 2^\frac{3}{4} n\exp \Big (-\frac{t^2n}{21M^2}\Big ). \end{aligned}$$

6 New Bounds for Sampling Numbers

We are interested in the question of optimal sampling recovery of functions from reproducing kernel Hilbert spaces in \(L_2(D,\varrho _D)\). The quantity we want to study is classically given by

$$\begin{aligned} g_n(\mathrm {Id}_{K,\varrho _D}) {:}{=} \inf \limits _{\mathbf {x}^1,\ldots ,\mathbf {x}^n \in D} \; \inf \limits _{\varphi :\mathbb {C}^n \rightarrow L_2} \; \sup \limits _{\Vert f\Vert _{H(K)}\le 1}\Vert f-\varphi (f(\mathbf {x}^1),\ldots ,f(\mathbf {x}^n))\Vert _{L_2(D,\varrho _D)} \end{aligned}$$

and quantifies the recovery of functions out of n function values in the worst-case setting. The goal is to get reasonable bounds for this quantity in n, preferably in terms of the singular numbers of the embedding. Results on the decay properties of this quantity in the framework of RKHS have been given by several authors, see, e.g., [15, 20, 27] and the references therein (see also Remark 6.2). For a special case in the field of hyperbolic cross approximation, we refer to [7, Outstanding Open Problem 1.4]. Here we present a new upper bound in the general framework.

The main idea. In the following theorem, we apply Weaver’s theorem to a random frame. The idea is to construct a sampling operator \(\widetilde{S}^m_J\) using O(m) sampling nodes as follows. We draw nodes \(\mathbf {X}= (\mathbf {x}^1,\ldots ,\mathbf {x}^n)\) i.i.d. at random according to some measure \(\mu _m\) specified concretely in the proof below, where n scales as \(m\log m\). At this stage, we have too many sampling nodes, but in exchange a “good” frame in the sense of a well-conditioned matrix \(\mathbf {L}_{n,m}\) in (12). To this frame (the rows of \(\mathbf {L}_{n,m}\)), we apply our modified Weaver theorem. The result is a shrunk, still well-conditioned sub-frame corresponding to a subset \((\mathbf {x}^i)_{i\in J}\) of the initial set of sampling nodes. With this sub-frame (a sub-matrix of \(\mathbf {L}_{n,m}\)), we solve the over-determined system via the least squares Algorithm 1; this represents the sampling operator. When it comes to the error analysis, we again benefit from the fact that we deal with a subset of the original nodes \(\mathbf {X}\) which is not too small compared to n. The consequence is that we do not pay too much (a factor \(\sqrt{\log n}\)) compared to the sampling operator based on the original set \(\mathbf {X}\) of nodes. However, we only use O(m) sampling nodes, which makes the difference.
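Schematically, and only to fix the data flow, the construction can be summarized as follows (a structural Python sketch of ours; the routine weaver_subsample is a placeholder for the non-constructive selection of Theorem 2.3, and the oversampling rule is stated loosely):

```python
import numpy as np

def recovery_operator(f, m, draw_node, eta, rho_m, weaver_subsample):
    """Structural sketch of the sampling operator behind Theorem 6.1 (names are ours).

    draw_node        : draws one node according to the measure rho_m(.) d rho_D,
    eta              : the m-1 functions spanning the least squares space,
    rho_m            : the density (14),
    weaver_subsample : placeholder for the non-constructive selection of a set J
                       with #J = O(m), cf. Theorem 2.3.
    """
    n = int(np.ceil(40 * m * np.log(40 * m)))        # oversampling of order m log m
    x = np.array([draw_node() for _ in range(n)])    # random draw of nodes
    rows = np.array([[eta_k(xi) / np.sqrt(rho_m(xi)) for eta_k in eta] for xi in x])
    J = weaver_subsample(rows)                       # down-sampling step (Sect. 2)
    samples = np.array([f(xi) / np.sqrt(rho_m(xi)) for xi in x[J]])
    c, *_ = np.linalg.lstsq(rows[J], samples, rcond=None)    # least squares (11)
    return c                                         # coefficients of the approximant
```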

Theorem 6.1

Let H(K) be a separable reproducing kernel Hilbert space on a set \(D \subset \mathbb {R}^d\) with a positive semidefinite kernel \(K:D\times D\rightarrow \mathbb {C}\) satisfying

$$\begin{aligned} \int _D K(\mathbf {x},\mathbf {x}) d\varrho _D(\mathbf {x}) < \infty \end{aligned}$$

for some measure \(\varrho _D\) on D. Then, \(\mathrm {Id}_{K,\varrho _D}:H(K) \rightarrow L_2(D,\varrho _D)\) is a Hilbert–Schmidt embedding and the corresponding sequence of singular numbers \((\sigma _k)_{k=1}^{\infty }\) is square-summable. For the sequence of sampling numbers \(g_n{:}{=}g_n(\mathrm {Id}_{K,\varrho _D})\), we have the general bound

$$\begin{aligned} g_n^2 \le C\frac{\log n}{n}\sum \limits _{k\ge cn} \sigma _k^2 \quad ,\quad n\ge 2, \end{aligned}$$
(19)

with two universal constants \(C,c>0\), which are specified in Remark 6.3.

Proof

Let \(m \ge 2\). Similarly, as in [13, 15] and [24], we use the density function (14) in order to consider the embedding \(\widetilde{\mathrm {Id}}:H(\widetilde{K}_m) \rightarrow L_2(D,\varrho _m(\cdot )d\varrho _D)\) instead of \(\mathrm {Id}_{K,\varrho _D}:H(K) \rightarrow L_2(D,\varrho _D)\). In fact, we define the new kernel

$$\begin{aligned} \widetilde{K}_m(\mathbf {x},\mathbf {y}) {:}{=} \frac{K(\mathbf {x},\mathbf {y})}{\sqrt{\varrho _m(\mathbf {x})}\sqrt{\varrho _m(\mathbf {y})}}. \end{aligned}$$
(20)

This yields

$$\begin{aligned} \sup _{\Vert f\Vert _{H(K)} \le 1 } \big \Vert f - \widetilde{S}_{\mathbf {X}}^m f \big \Vert _{L_2(D,\varrho _D)} \le \sup _{\Vert g\Vert _{H(\widetilde{K}_m)} \le 1 } \big \Vert g - S_{\mathbf {X}}^m g \big \Vert _{L_2(D,\varrho _m(\cdot )d\varrho _D)}\, \end{aligned}$$
(21)

and \(\widetilde{N}(m) \le 2(m-1)\), \(\widetilde{T}(m) \le 2\sum _{j=m}^{\infty }\sigma _j^2\). For the details of this, see the discussion in the proof of [13, Thm. 5.9] and [13, Thm. 5.5]. Note that the operators \(\widetilde{S}_{\mathbf {X}}^m\) and \(S_{\mathbf {X}}^m\) are defined by Algorithm 1 and (11). The number of samples will be chosen later as O(m). Choose now the smallest n such that

$$\begin{aligned} m \le \frac{n}{40\log (n)}. \end{aligned}$$

This implies \(\widetilde{N}(m) \le 2(m-1) \le n/(20 \log (n))\). Applying Theorem 5.1 with \(r=2\) gives that the rows of the matrix \(\frac{1}{\sqrt{n}}\widetilde{\mathbf {L}}_{n,m}\) represent a finite frame with frame bounds 1/2 and 3/2 with high probability (the failure probability \(2n^{-1}\) decays polynomially in n) when \(\mathbf {X}= (\mathbf {x}^1,\ldots ,\mathbf {x}^n) \in D^n\) is sampled w.r.t. the measure \(d\mu _m = \varrho _m(\cdot )d\varrho _D\). That means we have with high probability for any \(\mathbf {w}\in \mathbb {C}^{m-1}\)

$$\begin{aligned} \frac{1}{2}\Vert \mathbf {w}\Vert _2^2 \le \frac{1}{n}\big \Vert \widetilde{\mathbf {L}}_{n,m} \mathbf {w}\big \Vert _2^2 \le \frac{3}{2}\Vert \mathbf {w}\Vert _2^2. \end{aligned}$$
(22)

Let us now denote with \(P_{m-1}:H(\widetilde{K}_m) \rightarrow H(\widetilde{K}_m)\) the projection operator onto \(\text{ span }\{e_1(\cdot )/\sqrt{\varrho _m(\cdot )},\ldots ,e_{m-1}(\cdot )/\sqrt{\varrho _m(\cdot )}\}\). Following the proof in [24, Thm. 5.1], we obtain almost surely

$$\begin{aligned} \begin{aligned} \sup \limits _{\Vert g\Vert _{H(\widetilde{K}_m)}\le 1}\frac{1}{n}\sum \limits _{i=1}^n|g(\mathbf {x}^i)-P_{m-1}g(\mathbf {x}^i)|^2&\le \frac{1}{n}\Vert {{\varvec{\varPhi }}}_m^{*}{{\varvec{\varPhi }}}_m\Vert _{2\rightarrow 2} \\&\le \Big \Vert \frac{1}{n}{{\varvec{\varPhi }}}_m^{*}{{\varvec{\varPhi }}}_m-{{\varvec{\varLambda }}}\Big \Vert _{2\rightarrow 2} + \Vert {{\varvec{\varLambda }}}\Vert _{2\rightarrow 2} \end{aligned} \end{aligned}$$
(23)

with \({{\varvec{\varLambda }}} = \mathrm {diag}(\sigma _m^2,\sigma _{m+1}^2,\ldots )\) and

$$\begin{aligned} {{\varvec{\varPhi }}}_m{:}{=}\left( \begin{array}{c} \mathbf {y}^1\\ \vdots \\ \mathbf {y}^n \end{array}\right) \, \end{aligned}$$

being infinite matrices/operators with sequences \(\mathbf {y}^i = 1/\sqrt{\varrho _m(\mathbf {x}^i)}\left( e_m(\mathbf {x}^i),e_{m+1}(\mathbf {x}^i),\dots \right) \), \( i=1,\ldots ,n\). By Proposition 5.2, we finally get from this with high probability

$$\begin{aligned} \sup \limits _{\Vert g\Vert _{H(\widetilde{K}_m)}\le 1}\frac{1}{n}\sum \limits _{i=1}^n|g(\mathbf {x}^i)-P_{m-1}g(\mathbf {x}^i)|^2 \le c_1\Big (\sigma _m^2 + \frac{1}{m}\sum \limits _{k=m}^\infty \sigma _k^2\Big ), \end{aligned}$$
(24)

for some constant \(c_1>0\), where we used that

$$\begin{aligned} \Vert \mathbf {y}^i\Vert _2^2\le \widetilde{T}(m) \le 2\sum _{j=m}^{\infty } \sigma _j^2. \end{aligned}$$

Due to the high probability of both events, (22) and (24), there exists an instance of nodes \(\{\mathbf {x}^1,\ldots ,\mathbf {x}^n\}\) such that the bound (24) is true and \(\frac{1}{\sqrt{n}}\widetilde{\mathbf {L}}_{n,m}\) represents a finite frame in \(\mathbb {C}^{m-1}\). Note that all the assumptions of Theorem 2.3 are fulfilled. Indeed, the squared Euclidean norms of the rows of \(\frac{1}{\sqrt{n}}\widetilde{\mathbf {L}}_{n,m}\) are bounded by \(2(m-1)/n\). To this finite frame, we may thus apply Theorem 2.3 which constructs a sub-matrix \(\widetilde{\mathbf {L}}_{J,m}\) having \(\# J = O(m)\) rows which, properly normalized, still form a frame in \(\mathbb {C}^{m-1}\). It holds for all \(\mathbf {w}\in \mathbb {C}^{m-1}\)

$$\begin{aligned} c_2\Vert \mathbf {w}\Vert _2^2 \le \frac{1}{m}\Vert \widetilde{\mathbf {L}}_{J,m}\mathbf {w}\Vert _2^2 \le C_2\Vert \mathbf {w}\Vert _2^2\,. \end{aligned}$$
(25)

With this matrix we perform the least squares method (11) applied to the shrunk vector of function samples \((f(\mathbf {x}^i))_{i\in J}\) corresponding to the rows of \(\widetilde{\mathbf {L}}_{J,m}\). We denote the least squares operator with \(S^m_J\). Note that this operator uses only \(\# J = O(m)\) samples. Let \(\Vert g\Vert _{H(\widetilde{K}_m)}\le 1\) and again denote with \(P_{m-1}:H(\widetilde{K}_m) \rightarrow H(\widetilde{K}_m)\) the projection operator onto \(\text{ span }\{e_1(\cdot )/\sqrt{\varrho _m(\cdot )},\ldots ,e_{m-1}(\cdot )/\sqrt{\varrho _m(\cdot )}\}\). Then, it holds

$$\begin{aligned} \begin{aligned} \Vert g-S^m_Jg\Vert ^2_{L_2}&= \Vert g-P_{m-1}g + P_{m-1}g -S^m_Jg\Vert ^2_{L_2}\\&= \Vert g-P_{m-1}g\Vert _{L_2}^2 + \Vert S^m_J(P_{m-1}g -g)\Vert ^2_{L_2}\\&\le \sigma _m^2 + \Vert (\widetilde{\mathbf {L}}_{J,m}^*\widetilde{\mathbf {L}}_{J,m})^{-1}\widetilde{\mathbf {L}}_{J,m}^{*}\Vert ^2_{2\rightarrow 2}\sum \limits _{i\in J}|g(\mathbf {x}^i)-P_{m-1}g(\mathbf {x}^i)|^2\,. \end{aligned} \end{aligned}$$
(26)

Using Lemma 4.1 together with (25) gives

$$\begin{aligned} \Vert g-S^m_Jg\Vert ^2_{L_2} \le \sigma _m^2+\frac{c_3}{m}\sum \limits _{i=1}^n |g(\mathbf {x}^i) - P_{m-1}g(\mathbf {x}^i)|^2. \end{aligned}$$

By the choice of n together with (24), we may estimate further

$$\begin{aligned} \begin{aligned} \Vert g-S^m_Jg\Vert ^2_{L_2}&\le c_4\log (n)\Big (\sigma _m^2 + \frac{1}{m}\sum \limits _{k=m}^\infty \sigma _k^2\Big )\\&\le c_5\frac{\log (m)}{m}\sum \limits _{k=\lfloor m/2 \rfloor }^\infty \sigma _k^2. \end{aligned} \end{aligned}$$
(27)

Consequently, by (21) we obtain

$$\begin{aligned} \sup _{\Vert f\Vert _{H(K)} \le 1 } \big \Vert f - \widetilde{S}_{J}^m f \big \Vert ^2_{L_2(D,\varrho _D)} \le c_5\frac{\log (m)}{m}\sum \limits _{k=\lfloor m/2 \rfloor }^\infty \sigma _k^2 \end{aligned}$$

and finally, using that \(\# J = O(m)\),

$$\begin{aligned} g_{\lceil c_6 m \rceil }^2 \le c_5\frac{\log (m)}{m}\sum \limits _{k=\lfloor m/2 \rfloor }^\infty \sigma _k^2 , \end{aligned}$$

where all the involved constants are universal. This implies the statement of the theorem. \(\square \)

Remark 6.2

(i) The additional assumption on the “separability” of H(K) ensures the equality sign in the inequality

$$\begin{aligned} \sum \limits _{k=1}^{\infty } \sigma _k^2 \le \int _D K(\mathbf {x},\mathbf {x})d\varrho _D(\mathbf {x}) < \infty . \end{aligned}$$
(28)

The identity is crucial in the proof of Theorem 6.1, see also [13, Thm. 5.5]. Without an equality in (28), the best known general upper bound for the sampling numbers \(g_n\) can be found in the recent paper [24]. In fact, combining the proof of Theorem 6.1 with the one in [24, Thm. 7.1], one can prove

$$\begin{aligned} g_n^2 \le C\max \Big \{\frac{1}{n},\frac{\log n}{n}\sum \limits _{k\ge cn} \sigma _k^2\Big \}. \end{aligned}$$

The bound is worse compared to (19). However, one should rather compare to the bound in [42], since this proof also works without equality in (28). There the authors proved under the same assumptions

$$\begin{aligned} g_n^2 \le \min \Big \{\sigma _\ell ^2 + \frac{{\text {tr}}(K)\ell }{n}~:~\ell =1,2,\ldots \Big \}. \end{aligned}$$

(ii) As far as we know, Wasilkowski and Woźniakowski [42] were the first to address the sampling recovery problem in the context of reproducing kernel Hilbert spaces, in 2001. They obtained results using exclusively the finite trace condition (2). Later, in 2009, Kuo et al. [20] made further progress by determining the range for the “power of standard information” when the decay rate of the singular numbers is known. We refer to the monograph [27] for a detailed historical discussion of the development until 2012. The recent progress in the field has been initiated by Krieg and M.Ullrich [15] in 2019. The authors proved, under stronger assumptions than in Theorem 6.1, an existence result, namely

$$\begin{aligned} g_n^2 \lesssim \frac{\log n}{n}\sum \limits _{k \gtrsim n/\log (n)} \sigma _k^2, \end{aligned}$$

which represents a slightly weaker bound. Further progress (explicit constants, consequences for numerical integration) has been given in Kämmerer et al. [13], and in M.Ullrich [41] as well as Moeller and T.Ullrich [24] concerning the control of the failure probability.

(iii) To further demonstrate the usefulness of the tools developed here, let us mention that they have already been used by others. In fact, while this manuscript was under review, Krieg and M.Ullrich [16] used this technique to observe that for non-Hilbert function space embeddings the “gap” is also at most \(\sqrt{\log (n)}\) if the corresponding approximation numbers are p-summable with \(p<2\). Finally, we would like to mention a very recent result by Temlyakov [36], where the sampling numbers \(g_n\) of a function class \(\mathbf {F}\) in \(L_2\) are related to the Kolmogorov numbers in \(L_\infty \). This allows for treating the case of “small smoothness”, where the square summability does not hold [37, 38].

Remark 6.3

Note that, using Theorem 2.3, Theorem 5.1 and Proposition 5.2, we can get explicit values for the constants in the above theorem. Concretely, we can conclude that (19) holds for \(n\ge 13136\) with \(C=1.5 \cdot 10^6 \) and \(c=3.8 \cdot 10^{-5}\).

To verify this, let \(m \ge 2\) and \(n=n(m)\), as in the proof of Theorem 6.1, such that

$$\begin{aligned} \frac{n-1}{40 \log (n-1)} < m \le \frac{n}{40 \log (n)}. \end{aligned}$$

For \(m=2\) we get \(n=497\) and since n increases as m increases, we generally have \(n \ge 497\). Put \(r=2\). We then also have (22) with probability at least \(1-2/n \ge 1-2/497\) and (24) with probability at least \(1-2^{3/4}/n \ge 1-2^{3/4}/497\). Since these already sum up to \( (1-2/497) + (1-2^{3/4}/497) > 1\), we can guarantee the existence of a node set \(\{\mathbf{x}^{1}, \ldots , \mathbf{x}^{n}\}\) as in the proof of Theorem 6.1.

In the proof, we apply Theorem 2.3 with \(k_1 = 2, k_2 = 1/2, k_3 = 3/2\), so that we get the respective constants

$$\begin{aligned} \tilde{c}_1 = 6568, \quad \tilde{c}_2 = 2(2+\sqrt{2})^2, \quad \tilde{c}_3 = 9852. \end{aligned}$$

For this, note that for all \(m\ge 2\) and \(n=n(m)\) as above we have

$$\begin{aligned} \frac{n}{m} \ge 40\log (n) \ge 40\log (497) \ge 40\cdot 4.7 = 47 \frac{k_1}{k_2}. \end{aligned}$$

To calculate \(c_1\), we apply Proposition 5.2 with \(M^2 = 2 \sum _{k=m}^\infty \sigma _k^2\), so that we get

$$\begin{aligned} F = \max \left\{ \sigma _m^2, \frac{16 \log (n)}{n} M^2 \kappa ^2\right\} = \max \left\{ \sigma _m^2, 32 \left( \frac{1+\sqrt{5}}{2}\right) ^2 \frac{\log (n)}{n} \sum _{k=m}^\infty \sigma _k^2\right\} . \end{aligned}$$

Using \(\log (n)/n \le 1/(40m)\), we can then estimate further

$$\begin{aligned} F&\le \max \left\{ \sigma _m^2, \frac{32}{40} \left( \frac{1+\sqrt{5}}{2}\right) ^2 \frac{1}{m} \sum _{k=m}^\infty \sigma _k^2\right\} \\&\le \underbrace{\left( 1+\frac{4}{5}\left( \frac{1+\sqrt{5}}{2}\right) ^2\right) }_{=: c_1} \left( \sigma _m^2 + \frac{1}{m} \sum _{k=m}^\infty \sigma _k^2\right) , \end{aligned}$$

so that \(c_1 = 3.09\dots \).

After applying Theorem 2.3, we get \(c_2 = \tilde{c}_2 = 2(2+\sqrt{2})^2\) and \(C_2 = \tilde{c}_3 = 9852\), as well as \(\# J \le 6568m\).

Since

$$\begin{aligned} mc_2 \Vert \mathbf{w} \Vert _2^2 \le \Vert \widetilde{\mathbf{L }}_{J, m} \mathbf{w} \Vert _2^2 \le C_2m\Vert \mathbf{w} \Vert _2^2, \end{aligned}$$

Lemma 4.1 (iii) gives

$$\begin{aligned} \frac{1}{C_2m} \le \Vert (\widetilde{\mathbf{L }}_{J, m}^* \widetilde{\mathbf{L }}_{J, m})^{-1} \widetilde{\mathbf{L }}_{J, m}^*\Vert _{2 \rightarrow 2}^2 \le \frac{1}{c_2m}, \end{aligned}$$

so that we may choose \(c_3 = 1/c_2 = (2+\sqrt{2})^{-2}/2\).

To obtain \(c_4\), we use \(n/m \le \frac{n}{n-1} \cdot 40 \log (n-1) \le \frac{497\log (496)}{496\log (497)} \cdot 40 \log (n)\) to estimate

$$\begin{aligned} \Vert g-\widetilde{S}^m_Jg\Vert ^2_{L_2}&\le \sigma _m^2+\frac{c_3}{m}\sum \limits _{i=1}^n |g(\mathbf {x}^i) - P_{m-1}g(\mathbf {x}^i)|^2 \\&\le \sigma _m^2+ c_3 \frac{n}{m} \cdot c_1\left( \sigma _m^2 + \frac{1}{m}\sum \limits _{k=m}^\infty \sigma _k^2\right) \\&\le \sigma _m^2 + 40 \cdot \frac{497\log (496)}{496\log (497)} c_1c_3 \log (n)\left( \sigma _m^2 + \frac{1}{m}\sum \limits _{k=m}^\infty \sigma _k^2\right) \\&\le \underbrace{\left( 1+40 \cdot \frac{497\log (496)}{496\log (497)} c_1c_3\right) }_{=: c_4} \log (n) \left( \sigma _m^2 + \frac{1}{m}\sum \limits _{k=m}^\infty \sigma _k^2\right) \end{aligned}$$

and get \(c_4 = 6.31\dots \).

As for \(c_5\), we start with

$$\begin{aligned} \sigma _m^2 \le \frac{1}{(m-1)-\lfloor m/2\rfloor +1} \sum _{k=\lfloor m/2\rfloor }^{m-1} \sigma _k^2 = \frac{1}{\lceil m/2\rceil } \sum _{k=\lfloor m/2\rfloor }^{m-1} \sigma _k^2 \le \frac{2}{m} \sum _{k=\lfloor m/2\rfloor }^{m-1} \sigma _k^2, \end{aligned}$$

so that

$$\begin{aligned} \sigma _m^2 + \frac{1}{m} \sum _{k=m}^\infty \sigma _k^2 \le \frac{2}{m} \sum _{k=\lfloor m/2\rfloor }^\infty \sigma _k^2. \end{aligned}$$

Furthermore, we have \(\log (n-1)-\log (40)-\log \log (n-1) < \log (m)\) and

$$\begin{aligned} \frac{\log (n-1)-\log (40)-\log \log (n-1)}{\log (n)} \ge \underbrace{\frac{\log (496)-\log (40)-\log \log (496)}{\log (497)}}_{=: \vartheta = 0.11\dots }, \end{aligned}$$

from which we conclude \(\log (n) \le \vartheta ^{-1} \log (m)\). We then have

$$\begin{aligned} c_4 \log (n) \left( \sigma _m^2 + \frac{1}{m}\sum \limits _{k=m}^\infty \sigma _k^2\right) \le \underbrace{\frac{2c_4}{\vartheta }}_{=: c_5} \cdot \frac{\log (m)}{m} \sum _{k=\lfloor m/2\rfloor }^{\infty } \sigma _k^2 \end{aligned}$$

with \(c_5 = 113.35\dots \).

Finally observe that we can choose \(c_6=\tilde{c}_1=6568\) due to \(\# J \le \tilde{c}_1m=6568m\).

Now take \(\tilde{m}\ge 2\tilde{c}_1\) and \(m=\lfloor \tilde{m}/\tilde{c}_1 \rfloor \ge 2\). Further note that \(\lfloor \tilde{m}/2\tilde{c}_1 \rfloor = \lfloor \lfloor \tilde{m}/\tilde{c}_1 \rfloor /2\rfloor \). Then

$$\begin{aligned} g_{\tilde{m}}^2 \le g_{\tilde{c}_1 m}^2 \le c_5 \frac{\log (m)}{m}\sum \limits _{k=\lfloor m/2 \rfloor }^\infty \sigma _k^2 = c_5 \frac{\log (\lfloor \tilde{m}/\tilde{c}_1 \rfloor )}{\lfloor \tilde{m}/\tilde{c}_1 \rfloor }\sum \limits _{k=\lfloor \tilde{m}/(2\tilde{c}_1) \rfloor }^\infty \sigma _k^2. \end{aligned}$$

The asserted estimates now follow due to \(2\tilde{c}_1=13136\) and \(2 c_5 \tilde{c}_1 \le 1.5 \cdot 10^6\) and \(1/(4\tilde{c}_1) \ge 3.8 \cdot 10^{-5}\).
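All numerical values collected in this remark follow from elementary arithmetic; the short script below (ours, for verification only) reproduces them from the formulas above.

```python
import math

kappa = (1 + math.sqrt(5)) / 2
zeta = 2 + math.sqrt(2)

n = 2                                        # smallest n with 2 <= n / (40 log n)
while 2 > n / (40 * math.log(n)):
    n += 1
print("n(2) =", n)                           # 497

c1 = 1 + 0.8 * kappa ** 2                    # 3.09...
c3 = 1 / (2 * zeta ** 2)
c4 = 1 + 40 * (497 * math.log(496)) / (496 * math.log(497)) * c1 * c3   # 6.31...
theta = (math.log(496) - math.log(40) - math.log(math.log(496))) / math.log(497)  # 0.11...
c5 = 2 * c4 / theta                          # 113.35...
ct1 = 1642 * 2 / 0.5                         # tilde c_1 = 6568
print(c1, c4, theta, c5)
print("C =", 2 * c5 * ct1, "<= 1.5e6:", 2 * c5 * ct1 <= 1.5e6)
print("c =", 1 / (4 * ct1), ">= 3.8e-5:", 1 / (4 * ct1) >= 3.8e-5)
```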

Remark 6.4

The interesting question remains whether there is a situation where the above bound on sampling numbers is sharp. Let us refer to the next section for a possible candidate. Clearly, there are situations where the bound in Theorem 6.1 does not reflect the correct behavior of sampling numbers. This is, for instance, the case for the univariate Sobolev embedding \(\mathrm {Id}:H^1([0,1]) \rightarrow L_2([0,1])\), where the sampling numbers show, at least asymptotically, the same behavior as the singular numbers.

7 An Outstanding Open Problem

Let us once again comment on an important open problem for the optimal sampling recovery of multivariate functions. We consider the minimal worst-case error (sampling numbers/widths) defined by

$$\begin{aligned} g_n(\mathrm {Id}_{s,d}:H^s_{\text {mix}}(\mathbb {T}^d) \rightarrow L_2(\mathbb {T}^d)) {:}{=} \inf \limits _{\mathbf {X}= (\mathbf {x}^1,\ldots ,\mathbf {x}^n)}\inf \limits _{\varphi :\mathbb {C}^n \rightarrow L_2} \sup \limits _{\Vert f\Vert _{H^s_{\text {mix}}}\le 1}\Vert f-\varphi (f(\mathbf {X}))\Vert _{L_2(\mathbb {T}^d)}. \end{aligned}$$
(29)

Let us comment on the class \(H^s_{\text {mix}}(\mathbb {T}^d)\). That is, we consider functions on the d-dimensional torus \(\mathbb {T}^d \simeq [0,1)^d\), where \(\mathbb {T}\) stands for [0, 1] with endpoints identified. Note that the unit cube \([0,1]^d\) is preferred here since it has Lebesgue measure 1 and is therefore a probability space. We could have also worked with \([0,2\pi ]^d\) and the Lebesgue measure (which can be made a probability measure by a d-dependent rescaling).

There are many different ways to define function spaces of dominating mixed smoothness, see [7, Chapt. 3]. We choose an approach which is closely related to [18, Sect. 2.1], see also (2.6) there. In fact, \(L_2(\mathbb {T}^d)\)-norms of mixed derivatives of the multivariate function f can be written in terms of Fourier coefficients \(\hat{f}_{\mathbf {k}}\) of f. For \(\alpha \in \mathbb {N}\) we define the space \(H^{\alpha }_{\text{ mix }}(\mathbb {T}^d)\) as the Hilbert space with the inner product

$$\begin{aligned} \langle f , g\rangle _{H^{\alpha }_{\text{ mix }}} {:}{=} \sum \limits _{\mathbf {j}\in \{0,\alpha \}^d}\langle D^{(\mathbf {j})}f,D^{(\mathbf {j})}g \rangle _{L_2(\mathbb {T}^d)}. \end{aligned}$$
(30)

\(D^{(j_1,\ldots ,j_d)}=\partial _1^{j_1}\cdots \partial _d^{j_d}\) thereby denotes the weak derivative operator. Defining the weight

$$\begin{aligned} w_{\alpha }(k) = (1+(2\pi |k|)^{2\alpha })^{1/2}\quad ,\quad k\in \mathbb {Z}\, \end{aligned}$$
(31)

and the univariate kernel function

$$\begin{aligned} K^1_{\alpha }(x,y){:}{=}\sum \limits _{k\in \mathbb {Z}} \frac{\exp (2\pi \mathrm {i}k(y-x) )}{w_{\alpha }(k)^2}\quad ,\quad x,y\in \mathbb {T}, \end{aligned}$$

directly leads to

$$\begin{aligned} K^d_{\alpha }(\mathbf {x},\mathbf {y}){:}{=} K^1_{\alpha }(x_1,y_1)\cdots K^1_{\alpha }(x_d,y_d)\quad ,\quad \mathbf {x},\mathbf {y}\in \mathbb {T}^d, \end{aligned}$$
(32)

which is a reproducing kernel for \(H^\alpha _{\text{ mix }}(\mathbb {T}^d)\). In particular, for any \(f\in H^\alpha _{\text{ mix }}(\mathbb {T}^d)\) we have

$$\begin{aligned} f(\mathbf {x}) = \langle f, K^d_{\alpha }(\mathbf {x},\cdot ) \rangle _{H^{\alpha }_{\text{ mix }}}. \end{aligned}$$

The kernel defined in (32) associated with the inner product (30) can be extended to the case of fractional smoothness \(s>0\) by replacing \(\alpha \) by s in (31)–(32), which in turn leads to the inner product

$$\begin{aligned} \langle f , g\rangle _{H^{s}_{\text{ mix }}} {:}{=} \sum \limits _{\mathbf {k}\in \mathbb {Z}^d} \hat{f}_{\mathbf {k}} \, \overline{\hat{g}_{\mathbf {k}}} \, \prod _{j=1}^d w_{s}(k_j)^2\, \end{aligned}$$

in terms of the Fourier coefficients \(\hat{f}_{\mathbf {k}}\), \(\hat{g}_{\mathbf {k}}\) and the corresponding norm. The (ordered) sequence \((\lambda _j)_{j=1}^{\infty }\) of eigenvalues of the corresponding mapping \(W_{s,d} = \mathrm {Id}_{s,d}^{*}\circ \mathrm {Id}_{s,d}\), where \(\mathrm {Id}:\mathrm {H}\big (K^d_{s}\big ) \rightarrow L_2(\mathbb {T}^d)\), is the non-increasing rearrangement of the numbers

$$\begin{aligned} \Big \{\lambda _{\mathbf {k}}{:}{=}\prod \limits _{j=1}^d w_{s}(k_j)^{-2}=\prod \limits _{j=1}^d (1+(2\pi |k_j|)^{2s})^{-1}~:~\mathbf {k}\in \mathbb {Z}^d\Big \}. \end{aligned}$$
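For small d, this non-increasing rearrangement can be generated numerically by enumerating \(\mathbf {k}\) over a sufficiently large box; the following sketch (ours; the truncation parameter is ad hoc, so only the leading eigenvalues are reliable) also illustrates the finite trace condition:

```python
import numpy as np
from itertools import product

def mixed_sobolev_spectrum(s: float, d: int, kmax: int = 64):
    """Eigenvalues lambda_k = prod_j (1 + (2*pi*|k_j|)^{2s})^{-1} for k in the box
    {-kmax, ..., kmax}^d, sorted in non-increasing order."""
    w = 1.0 / (1.0 + (2 * np.pi * np.abs(np.arange(-kmax, kmax + 1))) ** (2 * s))
    lam = np.array([np.prod(t) for t in product(w, repeat=d)])
    return np.sort(lam)[::-1]

if __name__ == "__main__":
    lam = mixed_sobolev_spectrum(s=1.0, d=2)
    print(np.sqrt(lam[:5]))   # singular numbers: sigma_1 = 1 (k = 0), then repeated values
    print(lam.sum())          # partial sum of the eigenvalues, approximating tr(K); finite for s > 1/2
```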

It has been shown by various authors, see [7, Chapt. 4] and the references therein, that we have asymptotically (\(s>0\))

$$\begin{aligned} \sigma _n(\mathrm {Id}_{s,d}) \asymp _{s,d} n^{-s}(\log n)^{(d-1)s}\quad ,\quad n\ge 2. \end{aligned}$$
(33)

The correct asymptotic behavior of (29) has been addressed by several authors in the information-based complexity (IBC) community, see, e.g., [27] and also [7, Outstanding Open Problem 1.4]. It is nowadays well known, see, e.g., [6, 31, 39] and [7, Sec. 5] for some historical remarks, that for \(s>1/2\) the bound

$$\begin{aligned} c_{s,d}n^{-s}(\log n)^{(d-1)s} \le g_n(\mathrm {Id}_{s,d})\le C_{s,d}n^{-s}(\log n)^{(d-1)(s+1/2)}\, \end{aligned}$$
(34)

holds asymptotically in \(n\in \mathbb {N}\). Note that there is a d-dependent gap in the logarithm between upper and lower bound.

Recently, Krieg and M.Ullrich [15] improved this bound by using a probabilistic technique to show that for \(s>1/2\)

$$\begin{aligned} g_n(\mathrm {Id}_{s,d}) \lesssim _{s,d} n^{-s}(\log n)^{(d-1)s+s}. \end{aligned}$$

Clearly, if \(s<(d-1)/2\), then the gap in (34) is reduced to \((\log n)^s\), which is still growing in s. In particular, there is no improvement if \(d=2\). However, this result can be considered major progress for the research on the complexity of this problem. They disproved Conjecture 5.6.2 in [7] for \(p=2\) and \(1/2< s <(d-1)/2\). Indeed, the celebrated sparse grid points are now beaten by random points in a certain range of s. This again reflects the “power of random information”, see [11].

Still, it is worth mentioning that sparse grids represent the best known deterministic construction as far as the asymptotic order is concerned. Indeed, the guarantees are deterministic and only slightly worse compared to random nodes in the asymptotic regime. However, regarding preasymptotics the random constructions provide substantial advantages. The problem is somewhat related to recent efforts in compressed sensing, where the optimal RIP matrices are given as realizations of random matrices and known deterministic constructions are far from being optimal.

In the present paper, we prove that the sparse grids are beaten for the full range of \(s>1/2\) whenever \(d\ge 3\). In case \(d=2\) our approach and the sparse grids have the same performance. Clearly, inserting (33) into the bound in Theorem 6.1 gives

$$\begin{aligned} g_n(\mathrm {Id}_{s,d}) \lesssim _{s,d} n^{-s}(\log n)^{(d-1)s+1/2}, \end{aligned}$$
(35)

which shortens the gap between upper and lower bound to \(\sqrt{\log n}\). The best known lower bound is the one from (33). It is not clear whether the bound in (35) is sharp, nor whether it can be improved. So this framework might serve as a candidate for Remark 6.4. Therefore, the outstanding open question remains (see, e.g., [7, Chapt. 5] and the references therein) whether there is an intrinsic additional difficulty when restricting to algorithms based on function samples rather than Fourier coefficients. From a practical point of view, sampling algorithms are highly relevant since we are usually given discrete samples of functions. The question remains: are the asymptotic characteristics \(\sigma _n\) and \(g_n\) of the same order, or do they rather behave like \(\sigma _n = o(g_n)\)? This represents a fundamental open problem in hyperbolic cross approximation, see [7, Outstanding Open Problem 1.4].