Exponential tractability of $L_2$-approximation with function values

We study the complexity of high-dimensional approximation in the $L_2$-norm when different classes of information are available; we compare the power of function evaluations with the power of arbitrary continuous linear measurements. Here, we discuss the situation when the number of linear measurements required to achieve an error $\varepsilon \in (0,1)$ in dimension $d\in\mathbb{N}$ depends only poly-logarithmically on $\varepsilon^{-1}$. This corresponds to an exponential order of convergence of the approximation error, which often happens in applications. However, it does not mean that the high-dimensional approximation problem is easy; the main difficulty usually lies in the dependence on the dimension $d$. We determine to what extent the required amount of information changes if we allow only function evaluations instead of arbitrary linear information. It turns out that in this case we lose very little, and we can even restrict to linear algorithms. In particular, several notions of tractability hold simultaneously for both types of available information.

which might be understood as a continuous embedding into $L_2$. The class of all spaces $F$ satisfying the assumptions above will be denoted by $\mathcal{A}$. In particular, for each $F \in \mathcal{A}$ we have some associated nonempty set $D$, measure $\mu$ on $D$ and continuous embedding $\mathrm{APP}$.
We approximate $\mathrm{APP}$ by using functionals from the class $\Lambda^{\rm std}$ consisting of all function evaluations, or from the class $\Lambda^{\rm all} = F^*$ of all continuous linear functionals.
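Below, $B_F$ denotes the closed unit ball in $F$. Let us define, for $n \in \mathbb{N}$, the
• $n$-th linear sampling width as
$$e_n(F, L_2) := \inf_{\substack{x_1,\dots,x_n \in D \\ \varphi_1,\dots,\varphi_n \in L_2}} \ \sup_{f \in B_F} \Big\| f - \sum_{i=1}^{n} f(x_i)\,\varphi_i \Big\|_{L_2},$$
• $n$-th sampling width as
$$g_n(F, L_2) := \inf_{\substack{x_1,\dots,x_n \in D \\ \varphi\colon \mathbb{R}^n \to L_2}} \ \sup_{f \in B_F} \big\| f - \varphi\big(f(x_1),\dots,f(x_n)\big) \big\|_{L_2},$$
• $n$-th linear width as
$$a_n(F, L_2) := \inf_{\substack{T\colon F \to L_2 \text{ linear} \\ \operatorname{rank}(T) \le n}} \ \sup_{f \in B_F} \| f - Tf \|_{L_2},$$
• $n$-th Gelfand width as
$$c_n(F, L_2) := \inf_{\substack{\psi\colon \mathbb{R}^n \to L_2 \\ N\colon F \to \mathbb{R}^n \text{ linear and continuous}}} \ \sup_{f \in B_F} \| f - \psi(N(f)) \|_{L_2}.$$
These quantities represent the minimal worst case errors that can be achieved with linear or nonlinear algorithms using at most $n$ function values or linear measurements, respectively.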
We also define the information-based complexity of the problem $\mathrm{APP}$ for the classes $\Lambda^{\rm std}$ and $\Lambda^{\rm all}$, respectively, as the minimal number of evaluations from $\Lambda^{\rm std}$ or $\Lambda^{\rm all}$ necessary to obtain an absolute precision of approximation of at most $\varepsilon$, i.e., as
$$n^{\rm std}(\varepsilon, F) := \min\{ n : g_n(F, L_2) \le \varepsilon \} \quad\text{and}\quad n^{\rm all}(\varepsilon, F) := \min\{ n : c_n(F, L_2) \le \varepsilon \}.$$
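To make the definition concrete, the following minimal sketch computes the smallest $n$ at which a given error sequence drops to $\varepsilon$. The function name `information_complexity`, the search starting at $n = 1$, the cap `n_max`, and the geometric sequence $2^{-n}$ used in the example are illustrative assumptions and not taken from the paper.

```python
from typing import Callable


def information_complexity(width: Callable[[int], float],
                           eps: float,
                           n_max: int = 10**6) -> int:
    """Return min{ n >= 1 : width(n) <= eps }.

    `width` plays the role of a width sequence such as g_n or c_n.
    Starting at n = 1 and stopping at n_max are assumptions of this sketch.
    """
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie in (0, 1)")
    for n in range(1, n_max + 1):
        if width(n) <= eps:
            return n
    raise RuntimeError("precision eps not reached within n_max evaluations")


# Hypothetical width sequence with exponential decay 2^{-n}.
example_n = information_complexity(lambda n: 2.0 ** (-n), 0.1)
```

For exponentially decaying widths the returned value grows only logarithmically in $\varepsilon^{-1}$, which is the regime studied here.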
Note that, since $g_n(F, L_2) \le e_n(F, L_2)$, we have
$$n^{\rm std}(\varepsilon, F) \le n^{\text{std-lin}}(\varepsilon, F) := \min\{ n : e_n(F, L_2) \le \varepsilon \},$$
and all our upper bounds are proven for $n^{\text{std-lin}}(\varepsilon, F)$. There is a large body of literature on the size of these quantities for specific classes $F$. We refer to the monographs [10,33,35,36,42,43,45] for more details and literature on the subject.
Here, we are specifically interested in the comparison of these quantities for general classes $F$. That is, since $n^{\rm all}(\varepsilon, F) \le n^{\rm std}(\varepsilon, F)$ is obvious for all $F \in \mathcal{A}$, we ask for an upper bound on $n^{\rm std}(\varepsilon, F)$ based on knowledge of the function $n^{\rm all}(\varepsilon, F)$. However, it is known that such a bound cannot hold without certain assumptions on $F$, see [36, Chapter 26] and references therein, and even then, the involved "constants" depend in a non-trivial way on $F$. One approach to obtain qualitative statements on the relation of the complexities is to consider a whole sequence of spaces $(F_d)_{d\in\mathbb{N}}$, where $d$ can be interpreted as the dimension of the underlying domain. We then assume a certain bound on $n^{\rm all}(\varepsilon, F_d)$, depending only on $\varepsilon \in (0, 1)$ and $d \in \mathbb{N}$, and ask for an upper bound on $n^{\rm std}(\varepsilon, F_d)$, hopefully not much worse than the bound on $n^{\rm all}(\varepsilon, F_d)$.
In the present paper, we allow arbitrary Banach spaces of functions $F_d$, but we assume that $n^{\rm all}(\varepsilon, F_d)$ depends only poly-logarithmically on $\varepsilon^{-1}$. That is, we assume
$$n^{\rm all}(\varepsilon, F_d) \le c\, d^{q}\, (1 + \ln \varepsilon^{-1})^{p} \tag{1}$$
and study how this translates into bounds on $n^{\rm std}(\varepsilon, F_d)$. Note that the above bound (1) on the complexity implies that
$$c_n(F_d, L_2) \le e \cdot \exp\Big( -\Big(\frac{n}{c\, d^{q}}\Big)^{1/p} \Big), \tag{2}$$
whereas (2) implies that (1) holds with $+1$ added on the right hand side. The assumption (1) is therefore equivalent to the existence of a (possibly non-linear) algorithm based on arbitrary linear information that converges exponentially fast. We will show that in this case, we do not lose much when we only allow linear algorithms and function evaluations as information. One of our main results may be stated as follows.
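The passage between a poly-logarithmic complexity bound and an exponentially decaying error bound can be checked numerically. The sketch below assumes that the error bound has the form $e\cdot\exp(-(n/(c\,d^q))^{1/p})$, which is what one obtains by solving $c\,d^q(1+\ln\varepsilon^{-1})^p = n$ for $\varepsilon$; the function names and the sample parameters are hypothetical.

```python
import math


def complexity_bound(eps: float, c: float, d: int, q: float, p: float) -> float:
    """Poly-logarithmic bound c * d^q * (1 + ln(1/eps))^p."""
    return c * d**q * (1.0 + math.log(1.0 / eps)) ** p


def error_bound(n: float, c: float, d: int, q: float, p: float) -> float:
    """Error level eps solving complexity_bound(eps) = n (assumed form)."""
    return math.e * math.exp(-((n / (c * d**q)) ** (1.0 / p)))


def n_from_error_bound(eps: float, c: float, d: int, q: float, p: float) -> int:
    """Smallest integer n with error_bound(n) <= eps, found by linear search."""
    n = 0
    while error_bound(n, c, d, q, p) > eps:
        n += 1
    return n


# Hypothetical parameters: c = 2, d = 3, q = 1, p = 2, eps = 0.01.
params = (2.0, 3, 1.0, 2.0)
n_needed = n_from_error_bound(0.01, *params)
n_allowed = complexity_bound(0.01, *params) + 1.0  # the "+1" from rounding up
```

The `+ 1.0` reflects that the smallest admissible integer $n$ is the ceiling of the real-valued bound.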
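Theorem (see Corollary 5). Assume that $F_d \in \mathcal{A}$ for every $d \in \mathbb{N}$ and
$$n^{\rm all}(\varepsilon, F_d) \le c\, d^{q}\, (1 + \ln \varepsilon^{-1})^{p}$$
for some $p, c > 0$, $q \ge 0$, and all $\varepsilon \in (0, 1)$. Then
$$n^{\text{std-lin}}(\varepsilon, F_d) \le C\, d^{q}\, (1 + \ln d)^{p}\, (1 + \ln \varepsilon^{-1})^{p}$$
for all $\varepsilon \in (0, 1)$ and $d \in \mathbb{N}$, and some $C > 0$ that depends only on $c$, $p$ and $q$.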
This shows that whenever a sequence of Banach spaces can be approximated in high dimensions (in the above sense) with an exponential order of convergence by some algorithm using arbitrary linear information, essentially the same can already be achieved by linear algorithms based on function values. In particular, this improves upon Theorem 26.21 from [36] and solves Open Problem 128 therein. Let us add that we do not know if the additional factor $(1 + \ln d)^{p}$ is necessary.
There are many appearances of the assumption (1) in the literature. Besides the detailed study of certain weighted Hilbert spaces of analytic functions [8,24,25,30,47], it appears naturally in the context of approximation with (increasingly flat) Gaussian kernels [12,17,26,41], or in tensor product approximations [14,16], or for certain smoothness spaces on complex spheres [7]. Moreover, it is a typical assumption for the construction of greedy bases [4,5,15]. Let us also add that there is quite some study on the stability of algorithms that can achieve an exponential convergence, see [1,2,3,39] for details.
When it comes to the study of the tractability of the problem, i.e., the precise dependence of the error on the dimension, especially when we only allow function evaluations, there is much less to cite and we are only aware of the Hilbert space references from above. As an explicit example, let us mention the Gaussian space, which satisfies a relation of the form (1) for $L_2$-approximation with respect to the Gaussian probability measure $\mu$, see [41]. In the Hilbert case, there are some general results which make the situation somewhat simpler. For example, it is known that linear algorithms are always optimal and one may work with the singular value decomposition of the embedding $\mathrm{APP}$. We refer to [33, Chapter 4] and [36, Chapter 26].
A bit more is known in the case of algebraic tractability, i.e., when the complexity depends polynomially on $\varepsilon^{-1}$ instead of $\ln \varepsilon^{-1}$. In addition to general Hilbert space results, see [36, Chapter 26], and characterizations for weighted Korobov spaces, see [11], there are also quite sharp results for the classical smoothness spaces $C^{k}(\Omega_d)$ of $k$-times differentiable functions on certain $d$-dimensional domains, possibly for $k = \infty$. See [27,34,46] for details on approximation, and [20,21,22,23] for numerical integration in the same classes. However, a comparison as proven here in the case of exponential convergence is not possible in this case, see the end of Section 2. In any case, it is open to determine the precise behavior of $n^{\rm std}(\varepsilon, F_d)$ for most classical spaces, while $n^{\rm all}(\varepsilon, F_d)$ is more often known.
Our results are based on the following (special case of) Theorem 3 from [9], see also [18,28,29,32,44], which allows us to treat more general classes of functions.

Theorem 1. For each $0 < r < 2$, there is a universal constant $b \in \mathbb{N}$, depending only on $r$, such that the following holds. For all $F \in \mathcal{A}$ and $n \ge 2$, we have

Additionally, we use the following fundamental result from [38], see also [6,31].

Theorem 2.
For all F ∈ A and n ≥ 1, we have a n (F,

Remark 3.
We would like to stress that the proof of Theorem 1 in [9] is nonconstructive, and we do not know how to explicitly construct evaluation points that achieve this bound. However, for problems with known operators $T$ achieving the infimum in the definition of the linear widths $a_n(F, L_2)$, we are able to specify algorithms that use i.i.d. sampling of evaluation points from some known distribution and satisfy inequalities similar to the one above with high probability, see Theorem 8 of [29].

Exponential tractability of approximation
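The notions of tractability are defined as follows. Let us fix, for every $d \in \mathbb{N}$, some space $F_d \in \mathcal{A}$. For each $F_d \in \mathcal{A}$ we have some associated set $D_d$ equipped with a measure $\mu_d$, and a continuous embedding $\mathrm{APP}_d \colon F_d \to L_2(D_d, \mu_d)$. The index $d \in \mathbb{N}$ is an arbitrary parameter, but it usually stands for the dimension of the domain $D_d$. A multivariate approximation problem is simply a sequence of embeddings $\mathrm{APP} = (\mathrm{APP}_d)_{d\in\mathbb{N}}$. Moreover, tractability notions are defined relative to the considered class of information operations, i.e., we can consider tractability for $\Lambda^{\rm std}$ or $\Lambda^{\rm all}$. Therefore, for $x \in \{\mathrm{std}, \mathrm{all}\}$, we say that $\mathrm{APP}$ is
• exponentially strongly polynomially tractable (EXP-SPT) for the class $\Lambda^{x}$ if and only if
$$n^{x}(\varepsilon, F_d) \le C\, (1 + \ln \varepsilon^{-1})^{p}$$
for some $C, p > 0$ and for all $d \in \mathbb{N}$ and $\varepsilon \in (0, 1)$,
• exponentially polynomially tractable (EXP-PT) for the class $\Lambda^{x}$ if and only if
$$n^{x}(\varepsilon, F_d) \le C\, d^{q}\, (1 + \ln \varepsilon^{-1})^{p}$$
for some $C, p > 0$, $q \ge 0$ and for all $d \in \mathbb{N}$ and $\varepsilon \in (0, 1)$,
• exponentially quasi-polynomially tractable (EXP-QPT) for the class $\Lambda^{x}$ if and only if
$$n^{x}(\varepsilon, F_d) \le C \exp\big( t\, (1 + \ln d)\, (1 + \ln(1 + \ln \varepsilon^{-1})) \big)$$
for some $C, t > 0$ and for all $d \in \mathbb{N}$ and $\varepsilon \in (0, 1)$,
• exponentially uniformly weakly tractable (EXP-UWT) for the class $\Lambda^{x}$ if and only if for all $\alpha, \beta > 0$ we have
$$\lim_{d + \varepsilon^{-1} \to \infty} \frac{\ln n^{x}(\varepsilon, F_d)}{d^{\alpha} + (\ln \varepsilon^{-1})^{\beta}} = 0,$$
• exponentially weakly tractable (EXP-WT) for the class $\Lambda^{x}$ if and only if
$$\lim_{d + \varepsilon^{-1} \to \infty} \frac{\ln n^{x}(\varepsilon, F_d)}{d + \ln \varepsilon^{-1}} = 0.$$
It is easy to see that we have the following logical relation between the tractability notions defined above:
EXP-SPT $\Longrightarrow$ EXP-PT $\Longrightarrow$ EXP-QPT $\Longrightarrow$ EXP-UWT $\Longrightarrow$ EXP-WT.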
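The implication EXP-QPT $\Longrightarrow$ EXP-UWT can be illustrated numerically. The sketch below assumes that EXP-QPT means $n \le C\exp(t(1+\ln d)(1+\ln(1+\ln\varepsilon^{-1})))$ and that EXP-UWT asks for $\ln n/(d^{\alpha} + (\ln\varepsilon^{-1})^{\beta}) \to 0$; the function names, the choice $C = t = 1$, $\alpha = \beta = 1/2$, and the test sequence $d = 10^{k}$, $\varepsilon = 10^{-k}$ are hypothetical. A decreasing quotient along one sequence is only an illustration, not a proof of the limit.

```python
import math


def ln_qpt_bound(eps: float, d: int, C: float = 1.0, t: float = 1.0) -> float:
    """Logarithm of the assumed EXP-QPT bound on the complexity."""
    return math.log(C) + t * (1.0 + math.log(d)) * (
        1.0 + math.log(1.0 + math.log(1.0 / eps)))


def uwt_quotient(eps: float, d: int, alpha: float, beta: float) -> float:
    """Quotient ln n / (d^alpha + (ln 1/eps)^beta) for the bound above."""
    return ln_qpt_bound(eps, d) / (d**alpha + math.log(1.0 / eps) ** beta)


# Let d grow and eps shrink simultaneously: d = 10^k, eps = 10^(-k).
quotients = [uwt_quotient(10.0 ** (-k), 10**k, 0.5, 0.5) for k in range(1, 7)]
```

The numerator grows like $\ln d \cdot \ln\ln\varepsilon^{-1}$, while the denominator grows polynomially in $d$ and in $\ln\varepsilon^{-1}$, so the quotient shrinks.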
For a multivariate approximation problem we prove that exponential strong polynomial tractability (EXP-SPT), exponential polynomial tractability (EXP-PT), exponential uniform weak tractability (EXP-UWT) and exponential weak tractability (EXP-WT) for the class $\Lambda^{\rm all}$ are each equivalent to the corresponding tractability property for the class $\Lambda^{\rm std}$. Moreover, exponential quasi-polynomial tractability (EXP-QPT) for $\Lambda^{\rm all}$ implies exponential uniform weak tractability (EXP-UWT) for $\Lambda^{\rm std}$, i.e., the next tractability notion in the tractability hierarchy considered here. Whether EXP-QPT for $\Lambda^{\rm all}$ is equivalent to EXP-QPT for $\Lambda^{\rm std}$ remains an open problem.
These equivalences are in sharp contrast to the results for algebraic tractability. See, e.g., [19,37,40] for examples where the problem is algebraically tractable for $\Lambda^{\rm all}$ but the curse of dimensionality holds for $\Lambda^{\rm std}$. In particular, [37, Example 5] shows that for the tensor product $W^{s}_{2,d}$ of certain univariate periodic Sobolev spaces, $s > 1/2$, we have QPT for $\Lambda^{\rm all}$, but the curse of dimensionality for $\Lambda^{\rm std}$.

Results
We now present our results. The first results are concerned with EXP-(S)PT and EXP-QPT. Both are direct corollaries of the following theorem.

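Theorem 4. Assume that $F \in \mathcal{A}$ satisfies
$$n^{\rm all}(\varepsilon, F) \le A\, (1 + \ln \varepsilon^{-1})^{B}$$
for some $B > 0$ and $A \ge 1$ and all $\varepsilon \in (0, 1)$. Then and $b$ is the absolute constant from Theorem 1 in the case $r = 1$.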
Applying first Lemma 8 and then Lemma 9 from the Appendix, we deduce that for all n ≥ n 0 (A, B) := A max(3B/2, 1) B + 1. In particular, (a n (F, L 2 )) ∈ ℓ 1 . It follows from Theorem 1 that there exists an absolute constant b ∈ N such that If we put B 0 := max{B/2, 1}, we have for all B > 0 and n ≥ n 0 (A, B) the bound Thus which gives the desired estimate.

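Corollary 5. Assume that $F_d \in \mathcal{A}$ for every $d \in \mathbb{N}$ and
$$n^{\rm all}(\varepsilon, F_d) \le c\, d^{q}\, (1 + \ln \varepsilon^{-1})^{p}$$
for some $p, c > 0$, $q \ge 0$, and all $\varepsilon \in (0, 1)$. Then
$$n^{\text{std-lin}}(\varepsilon, F_d) \le C\, d^{q}\, (1 + \ln d)^{p}\, (1 + \ln \varepsilon^{-1})^{p}$$
for all $\varepsilon \in (0, 1)$ and $d \in \mathbb{N}$, and some $C > 0$ that depends only on $c$, $p$ and $q$.
In particular, if $\mathrm{APP}$ is exponentially (strongly) polynomially tractable for the class $\Lambda^{\rm all}$, then it is exponentially (strongly) polynomially tractable for $\Lambda^{\rm std}$.
Proof. We use Theorem 4 with $A = c\, d^{q} + 1$ and $B = p$.
We now turn to the assumption that $\mathrm{APP}$ is exponentially quasi-polynomially tractable for the class $\Lambda^{\rm all}$. This is the only case where we do not know if it implies the same property for $\Lambda^{\rm std}$.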

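Corollary 6. Assume that $F_d \in \mathcal{A}$ for every $d \in \mathbb{N}$ and
$$n^{\rm all}(\varepsilon, F_d) \le c \exp\big( t\, (1 + \ln d)\, (1 + \ln(1 + \ln \varepsilon^{-1})) \big)$$
for some $c, t > 0$ and all $\varepsilon \in (0, 1)$. Then for all $\varepsilon \in (0, 1)$ and $d > (e + \tfrac{1}{c})^{1/t}\, e^{-1}$, and some $C > 0$ that depends only on $c$.
In particular, if $\mathrm{APP}$ is exponentially quasi-polynomially tractable for the class $\Lambda^{\rm all}$, then it is exponentially uniformly weakly tractable for the class $\Lambda^{\rm std}$.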
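Hence, we can apply Theorem 4 with $A = c\, e^{t} d^{t}$ and $B = t \ln^{+} d$, where $\ln^{+} d = 1 + \ln d$, i.e., $A = c\, e^{B}$. Note that $A, B \ge 1$ for $d > (e + \tfrac{1}{c})^{1/t}\, e^{-1}$. We obtain that there exists an absolute constant $b > 0$ such that for all $\varepsilon \in (0, 1)$, with where $c' > 0$ only depends on $c$. This proves the bound. Now, since $\ln n^{\rm std}(\varepsilon, F_d)$ depends only logarithmically on $d$ and double-logarithmically on $\varepsilon^{-1}$, we obtain
$$\lim_{d + \varepsilon^{-1} \to \infty} \frac{\ln n^{\rm std}(\varepsilon, F_d)}{d^{\alpha} + (\ln \varepsilon^{-1})^{\beta}} = 0$$
for all $\alpha, \beta > 0$, i.e., $\mathrm{APP}$ is exponentially uniformly weakly tractable for the class $\Lambda^{\rm std}$.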
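The choice of $A$ and $B$ in this proof rests on a simple identity, which the following sketch checks numerically. It assumes that the quasi-polynomial bound has the form $c\exp(t(1+\ln d)(1+\ln(1+\ln\varepsilon^{-1})))$ and that $\ln^{+} d = 1 + \ln d$; all function names and sample values are hypothetical.

```python
import math


def qpt_bound(eps: float, d: int, c: float, t: float) -> float:
    """Assumed quasi-polynomial bound c*exp(t(1+ln d)(1+ln(1+ln(1/eps))))."""
    return c * math.exp(
        t * (1.0 + math.log(d)) * (1.0 + math.log(1.0 + math.log(1.0 / eps))))


def theorem4_parameters(d: int, c: float, t: float) -> tuple:
    """Return (A, B) with B = t(1 + ln d) and A = c*e^B = c*e^t*d^t."""
    B = t * (1.0 + math.log(d))
    A = c * math.exp(B)
    return A, B


def polylog_form(eps: float, d: int, c: float, t: float) -> float:
    """The same bound rewritten as A * (1 + ln(1/eps))^B."""
    A, B = theorem4_parameters(d, c, t)
    return A * (1.0 + math.log(1.0 / eps)) ** B


def dimension_threshold(c: float, t: float) -> float:
    """Threshold (e + 1/c)^(1/t) / e above which A >= 1 and B >= 1."""
    return (math.e + 1.0 / c) ** (1.0 / t) / math.e


# Hypothetical constants c = 0.1, t = 0.5; take the first integer d above the threshold.
d_min = math.floor(dimension_threshold(0.1, 0.5)) + 1
A_min, B_min = theorem4_parameters(d_min, 0.1, 0.5)
```

Since $(ed)^{t} > e + 1/c$ gives both $c\,(ed)^{t} > 1$ and $t(1+\ln d) > 1$, the threshold is sufficient for $A, B \ge 1$, though not necessary.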
We finally discuss EXP-UWT and EXP-WT.

Theorem 7.
Assume that $F_d \in \mathcal{A}$ for every $d \in \mathbb{N}$. If the problem $\mathrm{APP}$ is exponentially (uniformly) weakly tractable for the class $\Lambda^{\rm all}$, then it is exponentially (uniformly) weakly tractable for the class $\Lambda^{\rm std}$.
Proof. Assume that there are 0 < α, β ≤ 1 such that It is enough to show that By assumption, for every for all n ≥ exp(hv 0 ). From Theorem 2, we get For all n ≥ exp(2hd α ) and h ≤ 1/16, we have ln n h and hence we have for all n ≥ max{exp(hv 0 ), exp(2hd α )} that It follows from Theorem 1 that for some absolute constant b ∈ N and all n ≥ max{exp(hv 0 ), exp(2hd α )}, we have where we again used that h ≤ 1/16. It follows that for some absolute constant D > 0 and all ε ∈ (0, 1) and d ∈ N such that d + ε −1 is sufficiently large. This implies Since h ∈ (0, 1/16) can be chosen arbitrarily close to 0, we obtain (3). This allows us to conclude our statement. Indeed, for uniform weak tractability we take arbitrary α and β from (0, 1), and for weak tractability we take α = β = 1.

Appendix: technical lemmas
The following lemmas are used in the proofs of our results.

Proof. Using integration by substitution, with $u = (t/A)^{1/B}$, we obtain that where, for $a \in \mathbb{R}$ and $x > 0$,
$$\Gamma(a, x) = \int_{x}^{\infty} v^{a-1} \exp(-v)\, dv$$
is the incomplete gamma function.
If, on the other hand, $0 < a < 1$ and $x > 1$, then, since $v^{a-1} \le x^{a-1}$ for $v \ge x$, we have
$$\Gamma(a, x) \le x^{a-1} \int_{x}^{\infty} \exp(-v)\, dv = x^{a-1} \exp(-x).$$
Therefore, for every $a > 0$ and $x > \max(a, 1)$, the following bound holds:
$$\Gamma(a, x) \le \max(a, 1)\, x^{a-1} \exp(-x).$$
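The final bound on the incomplete gamma function can be sanity-checked numerically. The sketch below uses the substitution $v = x + s$, which gives $\Gamma(a,x) / (x^{a-1}e^{-x}) = \int_0^\infty (1 + s/x)^{a-1} e^{-s}\, ds$, and evaluates this integral with a composite Simpson rule. The function name `gamma_ratio`, the truncation point `s_max` and the number of steps are assumptions of this illustration, not part of the paper.

```python
import math


def gamma_ratio(a: float, x: float, s_max: float = 60.0, steps: int = 20000) -> float:
    """Approximate Gamma(a, x) / (x^(a-1) * exp(-x)).

    Uses Simpson's rule for int_0^s_max (1 + s/x)^(a-1) * exp(-s) ds;
    the tail beyond s_max is neglected, and `steps` must be even.
    """
    if steps % 2 != 0:
        raise ValueError("steps must be even for Simpson's rule")
    h = s_max / steps

    def integrand(s: float) -> float:
        return (1.0 + s / x) ** (a - 1.0) * math.exp(-s)

    total = integrand(0.0) + integrand(s_max)
    for i in range(1, steps):
        weight = 4.0 if i % 2 == 1 else 2.0
        total += weight * integrand(i * h)
    return total * h / 3.0


# The bound states that this ratio is at most max(a, 1) whenever x > max(a, 1).
ratio_small_a = gamma_ratio(0.5, 2.0)   # case 0 < a < 1, x > 1
ratio_large_a = gamma_ratio(3.0, 4.0)   # case a >= 1, x > a
```

For $a = 3$ the ratio has the closed form $1 + 2/x + 2/x^{2}$, which provides an independent check of the quadrature.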

Declarations
Conflict of interest. The authors declare no competing interests.