Function Values Are Enough for $$L_2$$ -Approximation

Krieg, David; Ullrich, Mario

doi:10.1007/s10208-020-09481-w

Function Values Are Enough for $L_2$-Approximation

Open access
Published: 07 December 2020

Volume 21, pages 1141–1151, (2021)
Cite this article

Download PDF

You have full access to this open access article

Foundations of Computational Mathematics Aims and scope Submit manuscript

Function Values Are Enough for $L_2$-Approximation

Download PDF

David Krieg¹ &
Mario Ullrich^1,2

3263 Accesses
37 Citations
1 Altmetric
Explore all metrics

Abstract

We study the $L_2$-approximation of functions from a Hilbert space and compare the sampling numbers with the approximation numbers. The sampling number $e_n$ is the minimal worst-case error that can be achieved with n function values, whereas the approximation number $a_n$ is the minimal worst-case error that can be achieved with n pieces of arbitrary linear information (like derivatives or Fourier coefficients). We show that

$$\begin{aligned} e_n \,\lesssim \, \sqrt{\frac{1}{k_n} \sum _{j\ge k_n} a_j^2}, \end{aligned}$$

where $k_n \asymp n/\log (n)$. This proves that the sampling numbers decay with the same polynomial rate as the approximation numbers and therefore that function values are basically as powerful as arbitrary linear information if the approximation numbers are square-summable. Our result applies, in particular, to Sobolev spaces $H^s_\mathrm{mix}(\mathbb {T}^d)$ with dominating mixed smoothness $s>1/2$ and dimension $d\in \mathbb {N}$, and we obtain

$$\begin{aligned} e_n \,\lesssim \, n^{-s} \log ^{sd}(n). \end{aligned}$$

For $d>2s+1$, this improves upon all previous bounds and disproves the prevalent conjecture that Smolyak’s (sparse grid) algorithm is optimal.

Optimal Monte Carlo Methods for $L^2$-Approximation

Article 06 April 2018

Smooth approximation of mappings with rank of the derivative at most 1

Article 20 January 2023

Weighted $L^2$ Version of Mergelyan and Carleman Approximation

Article Open access 21 May 2020

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Let H be a reproducing kernel Hilbert space, i.e., a Hilbert space of real-valued functions on a set D such that point evaluation

$$\begin{aligned} \delta _x:H \rightarrow \mathbb {R},\quad \delta _x(f) := f(x) \end{aligned}$$

is a continuous functional for all $x\in D$. We consider numerical approximation of functions from such spaces, using only function values. We measure the error in the space $L_2=L_2(D,\mathcal {A},\mu )$ of square-integrable functions with respect to an arbitrary measure $\mu $ such that H is embedded into $L_2$. This means that the functions in H are square-integrable and two functions from H that are equal $\mu $-almost everywhere are also equal point-wise.

We are interested in the n-th minimal worst-case error

$$\begin{aligned} e_n \,:=\, e_n(H) \,:=\, \inf _{\begin{array}{c} x_1,\dots ,x_n\in D\\ \varphi _1,\dots ,\varphi _n\in L_2 \end{array}}\, \sup _{f\in H:\Vert f\Vert _H\le 1}\, \Big \Vert f - \sum _{i=1}^n f(x_i)\, \varphi _i\Big \Vert _{L_2}, \end{aligned}$$

which is the worst-case error of an optimal algorithm that uses at most n function values. These numbers are sometimes called sampling numbers. We want to compare $e_n$ with the n-th approximation number

$$\begin{aligned} a_n \,:=\, a_n(H) \,:=\, \inf _{\begin{array}{c} L_1,\dots ,L_n\in H'\\ \varphi _1,\dots ,\varphi _n\in L_2 \end{array}}\, \sup _{f\in H:\Vert f\Vert _H\le 1}\, \Big \Vert f - \sum _{i=1}^n L_i(f)\, \varphi _i\Big \Vert _{L_2}, \end{aligned}$$

where $H'$ is the space of all bounded, linear functionals on H. This is the worst-case error of an optimal algorithm that uses at most n linear functionals as information. Clearly, we have $a_n\le e_n$ since the point evaluations form a subset of $H'$.

The approximation numbers are quite well understood in many cases because they are equal to the singular values of the embedding operator $\mathrm{id}:H\rightarrow L_2$. However, the sampling numbers still resist a precise analysis. For an exposition of such approximation problems, we refer to [11,12,13], especially [13, Chapter 26 & 29], and references therein. One of the fundamental questions in the area asks for the relation of $e_n$ and $a_n$ for specific Hilbert spaces H. The minimal assumption on H is the compactness of the embedding $\mathrm{id}:H\rightarrow L_2$. It is known that

$$\begin{aligned} \lim _{n\rightarrow \infty } e_n = 0 \quad \Leftrightarrow \quad \lim _{n\rightarrow \infty } a_n = 0 \quad \quad \Leftrightarrow \quad H \hookrightarrow L_2 \text { compactly}, \end{aligned}$$

see [13, Section 26.2]. However, the compactness of the embedding is not enough for a reasonable comparison of the speed of this convergence, see [6]. If $(a_n^*)$ and $(e_n^*)$ are decreasing sequences that converge to zero and $(a_n^*)\not \in \ell _2$, one may construct H and $L_2$ such that $a_n=a_n^*$ for all $n\in \mathbb {N}$ and $e_n\ge e_n^*$ for infinitely many $n\in \mathbb {N}$. In particular, if

$$\begin{aligned} {{\,\mathrm{ord}\,}}(c_n) := \sup \left\{ s\ge 0 :\lim _{n\rightarrow \infty } c_n n^{s}=0\right\} \end{aligned}$$

denotes the (polynomial) order of convergence of a positive sequence $(c_n)$, it may happen that ${{\,\mathrm{ord}\,}}(e_n)=0$ even if ${{\,\mathrm{ord}\,}}(a_n)=1/2$.

It thus seems necessary to assume that $(a_n)$ is in $\ell _2$, i.e., that $\mathrm{id}:H\rightarrow L_2$ is a Hilbert–Schmidt operator. This is fulfilled, e.g., for Sobolev spaces defined on the unit cube; see Corollary 2. Under this assumption, it is proven in [9] that

$$\begin{aligned} {{\,\mathrm{ord}\,}}(e_n) \ge \frac{2 {{\,\mathrm{ord}\,}}(a_n)}{2{{\,\mathrm{ord}\,}}(a_n) + 1}\, {{\,\mathrm{ord}\,}}(a_n). \end{aligned}$$

In fact, the authors of [9] conjecture that the order of convergence is the same for both sequences. We give an affirmative answer to this question. Our main result can be stated as follows.

FormalPara Theorem 1

There are absolute constants $C,c>0$ and a sequence of natural numbers $(k_n)$ with $k_n\ge c n/\log (n+1)$ such that the following holds. For any $n\in \mathbb {N}$, any measure space $(D,\mathcal A,\mu )$ and any reproducing kernel Hilbert space H of real-valued functions on D that is embedded into $L_2(D,\mathcal A,\mu )$, we have

$$\begin{aligned} e_n(H)^2 \,\le \, \frac{C}{k_n} \sum _{j\ge k_n} a_j(H)^2. \end{aligned}$$

In particular, we obtain the following result on the order of convergence. This solves Open Problem 126 in [13, p. 333], see also [13, Open Problems 140 & 141].

FormalPara Corollary 1

Consider the setting of Theorem 1. If $a_n(H)\lesssim n^{-s}\log ^\alpha (n)$ for some $s>1/2$ and $\alpha \in \mathbb {R}$, then we obtain

$$\begin{aligned} e_n(H) \,\lesssim \, n^{-s}\log ^{\alpha +s}(n). \end{aligned}$$

In particular, we always have ${{\,\mathrm{ord}\,}}(e_n)={{\,\mathrm{ord}\,}}(a_n)$.

Let us now consider a specific example. Namely, we consider Sobolev spaces with (dominating) mixed smoothness defined on the d-dimensional torus $\mathbb {T}^d \cong [0,1)^d$. These spaces attracted quite a lot of attention in various areas of mathematics due to their intriguing attributes in high dimensions. For history and the state of the art (from a numerical analysis point of view) see [3, 19, 20].

Let us first define a one-dimensional and real-valued orthonormal basis of $L_2(\mathbb {T})$ by $b_0^{(1)}=1$, $b_{2m}^{(1)}=\sqrt{2}\cos (2\pi m x)$ and $b_{2m-1}^{(1)}=\sqrt{2}\sin (2\pi m x)$ for $m\in \mathbb {N}$. From this we define a basis of $L_2(\mathbb {T}^d)$ using d-fold tensor products: We set $\mathbf {b}_\mathbf{m}:=\bigotimes _{j=1}^d b_{m_j}^{(1)}$ for $\mathbf{m}=(m_1,\dots ,m_d)\in \mathbb {N}_0^d$. The Sobolev space with dominating mixed smoothness $s>0$ can be defined as

$$\begin{aligned} H := H^s_{{\mathrm{mix}}}(\mathbb {T}^d) := \Big \{ f \in L_2(\mathbb {T}^d) \,\Big |\, \Vert f\Vert _H^2 := \sum _{{\mathbf{m}}\in \mathbb {N}_0^d} \prod _{j=1}^d(1+|{m_j}|^{2s}) \left\langle f,{\mathbf{b}}_{\mathbf{m}}\right\rangle _{L_2}^2 <\infty \Big \}. \end{aligned}$$

This is a Hilbert space. It satisfies our assumptions whenever $s>1/2$. It is not hard to prove that an equivalent norm in $H^s_\mathrm{mix}(\mathbb {T}^d)$ for $s\in \mathbb {N}$ is given by

$$\begin{aligned} \Vert f\Vert _{H^s_\mathrm{mix}(\mathbb {T}^d)}^2 \,=\, \sum _{\alpha \in \{0,s\}^d} \Vert D^\alpha f\Vert _{L_2}^2. \end{aligned}$$

The approximation numbers $a_n = a_n(H)$ are known for some time to satisfy

$$\begin{aligned} a_n \,\asymp \, n^{-s} \log ^{s(d-1)}(n) \end{aligned}$$

for all $s>0$, see, e.g., [3, Theorem 4.13]. The sampling numbers $e_n = e_n(H)$, however, seem to be harder to tackle. The best bounds so far are

$$\begin{aligned} n^{-s} \log ^{s(d-1)}(n) \,\lesssim \; e_n \;\lesssim \, n^{-s} \log ^{(s+1/2)(d-1)}(n) \end{aligned}$$

for $s>1/2$. The lower bound easily follows from $e_n\ge a_n$, and the upper bound was proven in [17], see also [3, Chapter 5]. For earlier results on this prominent problem, see [15, 16, 18, 22]. Note that finding the right order of $e_n$ in this case is posed as Outstanding Open Problem 1.4 in [3]. From Corollary 1, setting $\alpha =s(d-1)$ in the second part, we easily obtain the following.

FormalPara Corollary 2

Let $H^s_\mathrm{mix}(\mathbb {T}^d)$ be the Sobolev space with mixed smoothness as defined above. Then, for $s>1/2$, we have

$$\begin{aligned} e_n\big (H^s_\mathrm{mix}(\mathbb {T}^d)\big ) \,\lesssim \, n^{-s} \log ^{sd}(n). \end{aligned}$$

The bound in Corollary 2 improves on the previous bounds if $d>2s+1$, or equivalently $s<(d-1)/2$. With this, we disprove Conjecture 5.26 from [3] and show, in particular, that Smolyak’s algorithm is not optimal in these cases. Although our techniques do not lead to an explicit deterministic algorithm that achieves the above bounds, it is interesting that n i.i.d. random points are suitable with positive probability (independent of n).

Let us conclude with a few topics for future research. While this paper was under review, Theorem 1 has already been extended to the case of complex-valued functions and non-injective operators $\mathrm{id}:H\rightarrow L_2$ in [7], including explicit values for the constants c and C, see also [21]. It remains open to generalize our results to non-Hilbert space settings. It is also quite a different question whether the sampling numbers and the approximation numbers behave similarly with respect to the dimension of the domain D. This is a subject of tractability studies. We refer to [13, Chapter 26] and especially [14, Corollary 8]. Here, we only note that the constants of Theorem 1 are, in particular, independent of the domain, and that this may be utilized for these studies, see also [7].

1 The Proof

The result follows from a combination of the general technique to assess the quality of random information as developed in [4, 5], together with bounds on the singular values of random matrices with independent rows from [10].

Before we consider algorithms that only use function values, let us briefly recall the situation for arbitrary linear functionals. In this case, the minimal worst-case error $a_n$ is given via the singular value decomposition of $\mathrm{id}: H\rightarrow L_2$ in the following way. Since $W=\mathrm{id}^*\mathrm{id}$ is positive, compact and injective, there is an orthogonal basis $\mathcal B=\left\{ b_j :j\in \mathbb {N}\right\} $ of H that consists of eigenfunctions of W. Without loss of generality, we may assume that H is infinite-dimensional. It is easy to verify that $\mathcal B$ is also orthogonal in $L_2$. We may assume that the eigenfunctions are normalized in $L_2$ and that $\Vert b_1\Vert _H \le \Vert b_2\Vert _H \le \dots $. From these properties, it is clear that the Fourier series

$$\begin{aligned} f\,=\,\sum _{j=1}^\infty f_j b_j, \qquad \text { where } \quad f_j:=\left\langle f,b_j\right\rangle _{L_2}, \end{aligned}$$

converges in H for every $f\in H$, and therefore also point-wise. The optimal algorithm based on n linear functionals is given by

$$\begin{aligned} P_n: H \rightarrow L_2, \quad P_n(f):=\sum _{j\le n} f_j b_j, \end{aligned}$$

which is the $L_2$-orthogonal projection onto $V_n:=\mathrm{span}\{b_1,\ldots ,b_n\}$. We refer to [11, Section 4.2] for details. We obtain that

$$\begin{aligned} a_n(H)=\sup _{f\in H:\Vert f\Vert _H\le 1} \big \Vert f - P_n(f) \big \Vert _{L_2} =\Vert b_{n+1}\Vert _H^{-1}. \end{aligned}$$

We now turn to algorithms using only function values. In order to bound the minimal worst-case error $e_n$ from above, we employ the probabilistic method in the following way. Let $x_1,\dots ,x_n\in D$ be i.i.d. random variables with $\mu $-density

$$\begin{aligned} \varrho : D\rightarrow \mathbb {R}, \quad \varrho (x) := \frac{1}{2} \left( \frac{1}{k} \sum _{j< k} b_{j+1}(x)^2 + \frac{1}{\sum _{j\ge k} a_j^2} \sum _{j\ge k} a_j^2 b_{j+1}(x)^2 \right) , \end{aligned}$$

where $k\le n$ will be specified later. Given these sampling points, we consider the algorithm

$$\begin{aligned} A_n: H\rightarrow L_2, \quad A_n(f):=\sum _{j=1}^k (G^+ N f)_j b_j, \end{aligned}$$

where $N:H\rightarrow \mathbb {R}^n$ with $N(f):=(\varrho (x_i)^{-1/2}f(x_i))_{i\le n}$ is the weighted information mapping and $G^+\in \mathbb {R}^{k\times n}$ is the Moore–Penrose inverse of the matrix

$$\begin{aligned} G:=(\varrho (x_i)^{-1/2} b_j(x_i))_{i\le n, j\le k} \in \mathbb {R}^{n\times k}. \end{aligned}$$

This algorithm is a weighted least-squares estimator: If G has full rank, then

$$\begin{aligned} A_n(f)=\underset{g\in V_k}{\mathrm{argmin}}\, \sum _{i=1}^n \frac{\vert g(x_i) - f(x_i) \vert ^2}{\varrho (x_i)}. \end{aligned}$$

In particular, we have $A_n(f)=f$ whenever $f\in V_k$. The worst-case error of $A_n$ is defined as

$$\begin{aligned} e(A_n) := \sup _{f\in H:\Vert f\Vert _H\le 1}\, \big \Vert f - A_n(f)\big \Vert _{L_2}. \end{aligned}$$

Clearly, we have $e_n\le e(A_n)$ for every realization of $x_1,\dots ,x_n$. Thus, it is enough to show that $e(A_n)$ obeys the desired upper bound with positive probability.

Remark 1

If $\mu $ is a probability measure and if the basis is uniformly bounded, i.e., if $\sup _{j\in \mathbb {N}}\, \Vert b_j\Vert _\infty < \infty $, we may also choose $\varrho \equiv 1$ and consider i.i.d. sampling points with distribution $\mu $.

Remark 2

Weighted least-squares estimators are widely studied in the literature. We refer to [1, 2]. In contrast to previous work, we show that we can choose a fixed set of weights and sampling points that work simultaneously for all $f\in H$. We do not need additional assumptions on the function f, the basis $(b_j)$ or the measure $\mu $. For this, we think that our modification of the weights is important.

Remark 3

The worst-case error $e(A_n)$ of the randomly chosen algorithm $A_n$ is not to be confused with the Monte Carlo error of a randomized algorithm, which can be defined by

$$\begin{aligned} e^\mathrm{ran}(A_n) \,:=\, \sup _{f\in H:\Vert f\Vert _H\le 1}\, \left( \mathbb {E}\left\| f - A_n(f)\right\| _{L_2}^2 \right) ^{1/2}. \end{aligned}$$

The Monte Carlo error is a weaker error criterion. It is shown in [8], see also [23], that the assumptions of Corollary 1 give rise to a randomized algorithm $M_n$ which uses at most n function values and satisfies

$$\begin{aligned} e^\mathrm{ran}(M_n) \,\lesssim \, n^{-s}\log ^\alpha (n). \end{aligned}$$

However, this does not imply that the worst-case error $e(M_n)$ is small for any realization of $M_n$.

To give an upper bound on $e(A_n)$, let us assume that G has full rank. For any $f\in H$ with $\Vert f\Vert _H\le 1$, we have

$$\begin{aligned} \begin{aligned} \left\| f-A_n f\right\| _{L_2} \,&\le \, a_k + \left\| P_k f - A_n f\right\| _{L_2} \,=\, a_k + \left\| A_n(f- P_k f)\right\| _{L_2} \\&=\, a_k + \left\| G^+ N(f- P_k f)\right\| _{\ell _2^k} \\&\le \, a_k +\left\| G^+:\ell _2^n \rightarrow \ell _2^k\right\| \left\| N:P_k(H)^\perp \rightarrow \ell _2^n\right\| . \end{aligned} \end{aligned}$$

The norm of $G^+$ is the inverse of the kth largest (and therefore the smallest) singular value of the matrix G. The norm of N is the largest singular value of the matrix

$$\begin{aligned} \Gamma :=\big (\varrho (x_i)^{-1/2} a_j b_{j+1}(x_i) \big )_{1\le i \le n, j\ge k} \in \mathbb {R}^{n\times \infty }. \end{aligned}$$

To see this, note that $N=\Gamma \Delta $ on $P_k(H)^\perp $, where the mapping $\Delta :P_k(H)^\perp \rightarrow \ell _2$ with $\Delta g=(g_{j+1}/a_j)_{j\ge k}$ is an isomorphism. This yields

$$\begin{aligned} e(A_n) \le a_k + \frac{s_\mathrm{max}(\Gamma )}{s_\mathrm{min}(G)}. \end{aligned}$$

(1)

It remains to bound $s_\mathrm{min}(G)$ from below and $s_\mathrm{max}(\Gamma )$ from above. Clearly, any nontrivial lower bound on $s_\mathrm{min}(G)$ automatically yields that the matrix G has full rank. To state our results, let

$$\begin{aligned} \beta _k \,:=\, \left( \frac{1}{k} \sum _{j\ge k} a_j^2\right) ^{1/2} \qquad \text { and }\qquad \gamma _{k}\,:=\,\max \Big \{a_k,\,\beta _k\Big \}. \end{aligned}$$

Note that $a_{2k}^2\le \frac{1}{k}(a_k^2+\ldots +a_{2k}^2)\le \beta _{k}^2$ for all $k\in \mathbb {N}$ and thus $\gamma _{k} \le \beta _ {\lfloor k/2 \rfloor }$. Before we continue with the proof of Theorem 1, we show that Corollary 1 follows from Theorem 1 by providing the order of $\beta _k$ in the following special case. The proof is an easy exercise.

Lemma 1

Let $a_n\asymp n^{-s}\log ^{\alpha }(n)$ for some $s,\alpha \in \mathbb {R}$. Then,

$$\begin{aligned} \beta _{k} \,\asymp \, {\left\{ \begin{array}{ll} a_k, &{} \text {if } s>1/2, \\ a_k \sqrt{\log (k)}, &{} \text {if } s=1/2 \,\text { and }\, \alpha <-1/2,\\ \end{array}\right. } \end{aligned}$$

and $\beta _k=\infty $ in all other cases.

The rest of the paper is devoted to the proof of the following two claims: There exist constants $c,C>0$ such that, for all $n\in \mathbb {N}$ and $k= \lfloor c\,n/\log n\rfloor $, we have

Claim 1

$$\begin{aligned} \mathbb {P}\Big (s_\mathrm{max}(\Gamma ) \,\le \, C\, \gamma _{k}\, n^{1/2} \Big ) > 1/2. \end{aligned}$$

Claim 2

$$\begin{aligned} \mathbb {P}\Big (s_\mathrm{min}(G) \,\ge \, n^{1/2}/2 \Big ) > 1/2. \end{aligned}$$

Together with (1), this will yield with positive probability that

$$\begin{aligned} e(A_n) \,\le \, a_k + 2C\,\gamma _{k} \le (2C+1)\, \gamma _{k} \le (2C+1)\, \beta _{\lfloor k/2 \rfloor }, \end{aligned}$$

which is the statement of Theorem 1.

Both claims are based on [10, Theorem 2.1], which we state here in a special case. Recall that, for $X\in \ell _2$, the operator $X\otimes X$ is defined on $\ell _2$ by $X\otimes X(v)= \langle X,v\rangle _2\cdot X$. By $\left\| M\right\| $ we denote the spectral norm of a matrix M.

Proposition 1

There exists an absolute constant $c>0$ for which the following holds. Let X be a random vector in $\mathbb {R}^k$ or $\ell _2$ with $\Vert X\Vert _2\le R$ with probability 1, and let $X_1,X_2,\dots $ be independent copies of X. We put

$$\begin{aligned} D:=\mathbb {E}(X\otimes X), \qquad A \,:=\, R^2\, \frac{\log n}{n} \qquad \text { and }\qquad B \,:=\, R\, \Vert D\Vert ^{1/2} \sqrt{\frac{\log n}{n}}. \end{aligned}$$

Then, for any $t>0$,

$$\begin{aligned} \mathbb {P}\left( \bigg \Vert \sum _{i=1}^n X_i\otimes X_i - nD\bigg \Vert \,\ge \, c\, t\, \max \{A, B\}\, n\right) \,\le \, 2e^{-t}. \end{aligned}$$

Proof of Proposition 1

We describe the steps needed to obtain this reformulation of [10, Theorem 2.1]. For this let $\Vert Z\Vert _{\psi _\alpha }:=\inf \{C>0:\mathbb {E}\exp (|Z|^{\alpha }/C^{\alpha })\le 2\}$ for $Z=\Vert X\Vert _2$ and

$$\begin{aligned} \rho :=\sup \left\{ \left( \mathbb {E}\left\langle X,\theta \right\rangle _2^4\right) ^{1/4}:{\theta \in \mathbb {R}^k \text { with } \Vert \theta \Vert _2=1}\right\} . \end{aligned}$$

Theorem 2.1 of [10] then states that

$$\begin{aligned} \mathbb {P}\left( \bigg \Vert \sum _{i=1}^n X_i\otimes X_i - nD\bigg \Vert \,\ge \, c\, t\, \max \{\widetilde{A}, \widetilde{B}\}\, n\right) \,\le \, 2e^{-t^{\alpha /(2+\alpha )}} \end{aligned}$$

with

$$\begin{aligned} \widetilde{A} \,:=\, \Vert Z\Vert _{\psi _\alpha }^2\frac{(\log n)^{1+\frac{2}{\alpha }} }{n} \qquad \text { and }\qquad \widetilde{B} \,:=\, \frac{\rho ^2}{\sqrt{n}} + \left( \Vert D\Vert \cdot \widetilde{A} \right) ^{1/2}, \end{aligned}$$

for all $t>0$, $\alpha \ge 1$, $n\in \mathbb {N}$ and some absolute constant $c>0$. Note that the 2 in the right-hand side of above inequality is missing in [10, Theorem 2.1], but can be found in the proof.

From $\Vert X\Vert _2\le R$ we obtain $\Vert Z\Vert _{\psi _\alpha }\le 2R$ for all $\alpha \ge 1$. Therefore, we can take the limit $\alpha \rightarrow \infty $ and obtain the result with $\widetilde{A}=\frac{R^2 \log n}{n}$ and $\widetilde{B}=\frac{\rho ^2}{\sqrt{n}} + R\Vert D\Vert ^{1/2}\sqrt{\frac{\log n}{n}}$ (and a slightly changed constant c). Moreover, we have

$$\begin{aligned} \mathbb {E}\left\langle X,\theta \right\rangle _2^4 \le R^2 \cdot \mathbb {E}\left\langle X,\theta \right\rangle _2^2 = R^2 \cdot \left\langle D\theta ,\theta \right\rangle _2 \le R^2 \cdot \Vert D\Vert \end{aligned}$$

for any $\theta \in \mathbb {R}^k$ (or $\ell _2$) with $\Vert \theta \Vert _2=1$, which implies $\rho ^2\le R\cdot \Vert D\Vert ^{1/2}$. This “trick” leads to an improvement over [10, Corollary 2.6] and yields our formulation of the result.$\square $

Proof of Claim 1

Consider independent copies $X_1,\ldots ,X_n$ of the vector

$$\begin{aligned} X:=\varrho (x)^{-1/2} (a_k b_{k+1}(x), a_{k+1} b_{k+2}(x), \ldots ), \end{aligned}$$

where x is a random variable on D with density $\varrho $. Clearly, $\sum _{i=1}^n X_i\otimes X_i = \Gamma ^* \Gamma $ with $\Gamma $ from above. First, observe

$$\begin{aligned} \left\| X\right\| _2^2 \,=\, \varrho (x)^{-1} \sum _{j\ge k} a_j^2\, b_{j+1}(x)^2 \,\le \, 2 \sum _{j\ge k} a_j^2 \,=\, 2 k\, \beta _{k}^2 \,=:\, R^2. \end{aligned}$$

Since $D:=\mathbb {E}(X\otimes X)=\mathop {\mathrm {diag}}(a_k^2, a_{k+1}^2, \ldots )$, we have $\Vert D\Vert =a_k^2$. This implies, with A and B defined as in Proposition 1, that

$$\begin{aligned} A \,\le \, 2 k\, \beta _{k}^2\, \frac{\log n}{n} \end{aligned}$$

and

$$\begin{aligned} B \,\le \, (2 k\, \beta _{k}^2\,)^{1/2} a_k\, \sqrt{\frac{\log n}{n}}. \end{aligned}$$

Choosing $k= \lfloor c\,n/\log n\rfloor $ for c small enough, we obtain

$$\begin{aligned} \mathbb {P}\Big (\left\| \Gamma ^*\Gamma - nD\right\| \ge t\,\gamma _{k}^2\, n\Big ) \le 2\exp \left( -t\right) . \end{aligned}$$

By choosing $t=2$, we obtain with probability greater 1/2 that

$$\begin{aligned} s_\mathrm{max}(\Gamma )^2 = \left\| \Gamma ^*\Gamma \right\| \le \left\| nD\right\| + \left\| \Gamma ^*\Gamma - nD\right\| \le n\, a_k^2 + 2 \gamma _{k}^2 n \le 3\, \gamma _{k}^2\, n. \end{aligned}$$

This yields Claim 1.$\square $

Proof of Claim 2

Consider $X:=\varrho (x)^{-1/2}(b_1(x), \ldots , b_k(x))$ with x distributed according to $\varrho $. Clearly, $\sum _{i=1}^n X_i\otimes X_i = G^*G$ with G from above. First, observe

$$\begin{aligned} \left\| X\right\| _2^2 \,=\, \varrho (x)^{-1} \sum _{j\le k} b_j(x)^2 \,\le \, 2 k \,=:\, R^2. \end{aligned}$$

Since $D:=\mathbb {E}(X\otimes X)=\mathop {\mathrm {diag}}(1, \ldots ,1)$, we have $\Vert D\Vert =1$. This implies, with A and B defined as in Proposition 1, that

$$\begin{aligned} A \,\le \, 2 k\, \frac{\log n}{n} \end{aligned}$$

and

$$\begin{aligned} B \,\le \, (2 k)^{1/2} \sqrt{\frac{\log n}{n}}. \end{aligned}$$

Again, choosing $k= \lfloor c\,n/\log n\rfloor $ for c small enough, we obtain

$$\begin{aligned} \mathbb {P}\left( \left\| G^*G - nD\right\| \ge \frac{t\, n}{4}\right) \le 2\exp \left( -t\right) . \end{aligned}$$

By choosing $t=2$, we obtain with probability greater 1/2 that

$$\begin{aligned} s_\mathrm{min}(G)^2 = s_\mathrm{min}(G^*G) \,\ge \, s_\mathrm{min}(nD) - \Vert G^*G - nD\Vert \,\ge \, n/2. \end{aligned}$$

This yields Claim 2.$\square $

References

Å. Björk. Numerical methods for least squares problems. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, 1996.
Book Google Scholar
A. Cohen, G. Migliorati. Optimal weighted least-squares methods. SMAI-Journal of Computational Mathematics, 3:181–203, 2017.
Article MathSciNet Google Scholar
D. Dũng, V.N. Temlyakov, and T. Ullrich. Hyperbolic Cross Approximation. Advanced Courses in Mathematics - CRM Barcelona. Springer International Publishing, 2018.
A. Hinrichs, D. Krieg, E. Novak, J. Prochno, and M. Ullrich. On the power of random information. In F. J. Hickernell and P. Kritzer, editors, Multivariate Algorithms and Information-Based Complexity, pages 43–64. De Gruyter, Berlin/Boston, 2020.
MATH Google Scholar
A. Hinrichs, D. Krieg, E. Novak, J. Prochno, and M. Ullrich. Random sections of ellipsoids and the power of random information. arXiv:1901.06639, 2019.
A. Hinrichs, E. Novak, and J. Vybíral. Linear information versus function evaluations for $\ell _2$-approximation. J. Approx. Theory, 153:97–107, 07 2008.
L. Kämmerer, T. Ullrich, T. Volkmer. Worst case recovery guarantees for least squares approximation using random samples. arXiv:1911.10111, 2019.
D. Krieg. Optimal Monte Carlo methods for $L_2$-approximation. Constr. Approx., 49:385–403, 2019.
Article MathSciNet Google Scholar
F. Kuo, G. W. Wasilkowski, and H. Woźniakowski. On the power of standard information for multivariate approximation in the worst case setting. J. Approx. Theory, 158:97–125, 2009.
Article MathSciNet Google Scholar
S. Mendelson and A. Pajor. On singular values of matrices with independent rows. Bernoulli, 12(5):761–773, 2006.
Article MathSciNet Google Scholar
E. Novak and H. Woźniakowski. Tractability of multivariate problems. Vol. 1: Linear information, volume 6 of EMS Tracts in Mathematics. European Mathematical Society (EMS), Zürich, 2008.
E. Novak and H. Woźniakowski. Tractability of multivariate problems. Volume II: Standard information for functionals, volume 12 of EMS Tracts in Mathematics. European Mathematical Society (EMS), Zürich, 2010.
E. Novak and H. Woźniakowski. Tractability of multivariate problems. Volume III: Standard information for operators, volume 18 of EMS Tracts in Mathematics. European Mathematical Society (EMS), Zürich, 2012.
E. Novak and H. Woźniakowski. Tractability of multivariate problems for standard and linear information in the worst case setting: Part I. J. Approx. Theory, 207:177–192, 2016.
Article MathSciNet Google Scholar
W. Sickel. Approximate recovery of functions and Besov spaces of dominating mixed smoothness. Constructive theory of functions, 404–411, DARBA, Sofia, 2003.
W. Sickel. Approximation from sparse grids and function spaces of dominating mixed smoothness. Approximation and probability, 271–283, Banach Center Publ., 72, Polish Acad. Sci. Inst. Math., Warsaw, 2006.
W. Sickel and T. Ullrich. The Smolyak algorithm, sampling on sparse grids and function spaces of dominating mixed smoothness. East J. Approx., 13(4):387–425, 2007.
MathSciNet Google Scholar
V. N. Temlyakov. Approximation of periodic functions, Computational Mathematics and Analysis Series, Nova Science Publishers, Inc., Commack, NY, 1993.
MATH Google Scholar
V. N. Temlyakov. Multivariate Approximation, volume 32 of Cambridge Monographs on Applied and Computational Mathematics. Cambridge University Press, 2018.
H. Triebel. Bases in Function Spaces, Sampling, Discrepancy, Numerical Integration, volume 11 of EMS Tracts in Mathematics. European Mathematical Society (EMS), Zürich, 2010.
M. Ullrich, On the worst-case error of least squares algorithms for $L_2$-approximation with high probability, J. Complexity 60, 2020, https://doi.org/10.1016/j.jco.2020.101484.
T. Ullrich. Smolyak’s algorithm, sampling on sparse grids and function spaces of dominating mixed smoothness. East J. Approx., 14(1):1–38, 2008.
MathSciNet MATH Google Scholar
G. W. Wasilkowski and H. Woźniakowski. The power of standard information for multivariate approximation in the randomized setting. Math. Comp., 76:965–988, 2006.

Download references

Acknowledgements

D. Krieg is supported by the Austrian Science Fund (FWF) Project F5513-N26, which is a part of the Special Research Program Quasi-Monte Carlo Methods: Theory and Applications.

Funding

Open access funding provided by Johannes Kepler University Linz.

Author information

Authors and Affiliations

Institut für Analysis, Johannes Kepler Universität, Linz, Austria
David Krieg & Mario Ullrich
Moscow Center for Fundamental and Applied Mathematics, Lomonosov Moscow State University, Moscow, Russia
Mario Ullrich

Authors

David Krieg
View author publications
You can also search for this author in PubMed Google Scholar
Mario Ullrich
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Mario Ullrich.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Communicated by Frances Kuo.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Krieg, D., Ullrich, M. Function Values Are Enough for $L_2$-Approximation. Found Comput Math 21, 1141–1151 (2021). https://doi.org/10.1007/s10208-020-09481-w

Download citation

Received: 02 May 2020
Revised: 28 September 2020
Accepted: 05 October 2020
Published: 07 December 2020
Issue Date: August 2021
DOI: https://doi.org/10.1007/s10208-020-09481-w

Keywords

Mathematics Subject Classification

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Function Values Are Enough for \(L_2\)-Approximation

Abstract

Similar content being viewed by others

Optimal Monte Carlo Methods for \(L^2\)-Approximation

Smooth approximation of mappings with rank of the derivative at most 1

Weighted \(L^2\) Version of Mergelyan and Carleman Approximation

1 The Proof

Remark 1

Remark 2

Remark 3

Lemma 1

Claim 1

Claim 2

Proposition 1

Proof of Proposition 1

Proof of Claim 1

Proof of Claim 2

References

Acknowledgements

Funding

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Keywords

Mathematics Subject Classification

Navigation

Function Values Are Enough for \(L_2\)-Approximation

Abstract

Similar content being viewed by others

Optimal Monte Carlo Methods for \(L^2\)-Approximation

Smooth approximation of mappings with rank of the derivative at most 1

Weighted \(L^2\) Version of Mergelyan and Carleman Approximation

1 The Proof

Remark 1

Remark 2

Remark 3

Lemma 1

Claim 1

Claim 2

Proposition 1

Proof of Proposition 1

Proof of Claim 1

Proof of Claim 2

References

Acknowledgements

Funding

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Mathematics Subject Classification

Search

Navigation