Constructive Approximation

Volume 39, Issue 2, pp 385–395

Approximation of High-Dimensional Rank One Tensors

  • Markus Bachmayr
  • Wolfgang Dahmen
  • Ronald DeVore
  • Lars Grasedyck

Abstract

Many real world problems are high-dimensional in that their solution is a function which depends on many variables or parameters. This presents a computational challenge since traditional numerical techniques are built on model classes for functions based solely on smoothness. It is known that the approximation of smoothness classes of functions suffers from the so-called ‘curse of dimensionality’. Avoiding this curse requires new model classes for real world functions that match applications. This has led to the introduction of notions such as sparsity, variable reduction, and reduced modeling. One theme that is particularly common is to assume a tensor structure for the target function. This paper investigates how well a rank one function \(f(x_1,\dots,x_d)=f_1(x_1)\cdots f_d(x_d)\), defined on Ω=[0,1]^d, can be captured through point queries. It is shown that such a rank one function with component functions fj in \(W^{r}_{\infty}([0,1])\) can be captured (in \(L_\infty\)) to accuracy \(O(C(d,r)N^{-r})\) from N well-chosen point evaluations. The constant C(d,r) scales like \(d^{dr}\). The queries in our algorithms have two ingredients, a set of points built on the results from discrepancy theory and a second adaptive set of queries dependent on the information drawn from the first set. Under the assumption that a point \(z\in\varOmega\) with nonvanishing f(z) is known, the accuracy improves to \(O(dN^{-r})\).

Keywords

Query algorithms · High-dimensional approximation · Separable functions · Rate of approximation

Mathematics Subject Classification (2000)

41A25 · 65D15

1 Introduction

A recurring model in certain high-dimensional application domains is that the target function is a low rank tensor, or can be approximated well by a linear combination of such tensors. For an overview of numerical methods based on this concept and their applications, we refer to [3] and the references therein. We consider a fundamental question concerning the computational complexity of such low rank tensors: If we know that a given function has such a tensor structure, to what accuracy can we approximate it using only a certain number of deterministically chosen point queries? In this paper, we treat this problem in the simplest setting where the tensors are of rank one.

Given an integer r, we denote by \(W^{r}_{\infty}[0,1]\) the set of all univariate functions on [0,1] which have r weak derivatives in \(L_\infty\), with the semi-norm
$$ |f|_{W^r_\infty[0,1]}:= \big\| f^{(r)}\big\| _{L_\infty}. $$
We shall study the following classes of rank one tensor functions defined on Ω:=[0,1]d. If r is a positive integer and M>0, we consider the class of functions
$$\begin{aligned} &\mathcal {F}^{r}(M) := \Biggl\{ f\in C(\varOmega) \colon f(x) = \prod _{i=1}^d f_i(x_i) \\ & \phantom{\mathcal {F}^{r}(M) :=\Bigg\{ } \mathrm{with}\ \|f_i\|_{L_\infty[0,1]} \leq1 , |f_i|_{W^r_\infty[0,1]} \leq M,\ i=1,\dots,d \Biggr\} . \end{aligned}$$
Note that we could equally well replace the bound 1 appearing in the definition by an arbitrary positive value and arrive at the above class by simple rescaling. Note also that whenever \(\|f\|_{L_{\infty}(\varOmega )}\le1\), the restriction on the \(\|f_{i}\|_{L_{\infty}[0,1]}\) in this definition can always be met by rescaling the individual factors.
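For instance, if f is not identically zero, one concrete such rescaling (a small illustration of the remark above, with notation ci, c of our own choosing) is obtained by setting \(c_i:=\|f_i\|_{L_\infty[0,1]}\) and \(c:=\prod_{i=1}^d c_i=\|f\|_{L_\infty(\varOmega)}\), and replacing each factor by
$$ \tilde{f}_i := c^{1/d}\,\frac{f_i}{c_i},\quad i=1,\dots,d, \qquad\text{so that}\quad \prod_{i=1}^d \tilde{f}_i = f \quad\text{and}\quad \|\tilde{f}_i\|_{L_\infty[0,1]} = c^{1/d}\le1 . $$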

For ease of presentation, we assume from now on that M≥1.

Let us note at the outset that \(\mathcal {F}^{r}\) is closely related to a class of functions with bounded mixed derivatives. We use the notation \(D^{\nu}=D^{\nu_{1}}_{x_{1}}\cdots D^{\nu_{d}}_{x_{d}}\) for multivariate derivatives. Then, the class of functions \(MW^r(L_\infty)\) consists of all functions f(x1,…,xd) for which
$$ |f|_{MW^r(L_\infty(\varOmega))}:= \sum_{\nu\in\varLambda_r\setminus\{0\} }\big\| D^\nu f \big\| _{L_\infty(\varOmega)} < \infty, $$
where \(\varLambda_r:=\{\nu=(\nu_1,\dots,\nu_d)\colon 0\le\nu_i\le r,\ i=1,\dots,d\}\). We define the norm on this space by adding \(\|f\|_{L_{\infty}(\varOmega)}\) to the above semi-norm. This is a well-studied class of functions, especially for the analysis of cubature formulae. These function spaces can also be characterized as tensor products of univariate Sobolev spaces, see [9]. Clearly, we have that \(\mathcal {F}^{r}(M)\) is contained in a finite ball of \(MW^{r}(L_{\infty}(\varOmega))\) (see Chaps. III and V of [11]). It is known (see, e.g., [10], [11, IV.5], [1, Lemma 4.9]) that one can sample functions in \(MW^{r}(L_{\infty}(\varOmega))\) on a set of points (called sparse grids) with cardinality N and use these point values to construct an approximation to f with accuracy \(C(d,r)\|f\|_{MW^{r}(L_{\infty})}N^{-r} [\log N]^{(r+1)(d-1)}\) in \(L_\infty(\varOmega)\).

The main result of the present paper is a query algorithm for functions \(f\in \mathcal {F}^{r}\). The query algorithm works without knowledge of M, but requires a bound on r. We show that we can query such a function f at O(N) suitably chosen points and, from these queries, construct an approximation \(\tilde{f}_{N}\) that approximates f to accuracy \(C(r,d)N^{-r}\). Thus, for rank one tensors, the \([\log N]^{(r+1)(d-1)}\) factor appearing for mixed norm classes can be removed. Moreover, \(\tilde{f}_{N}\) is again separable; that is, the algorithm preserves this structural property of the original function f.

Given a budget N, our queries of f will have two stages. The first queries of f occur at a set of O(N) points built from discrepancy theory. If f(z)≠0 for one of the points z of the initial query, then we continue and sample f at O(N) points built from z. We then show how to build an approximation \(\tilde{f}_{N}\) to f from these query values which will provide the required accuracy.

2 Univariate Approximation

Our construction of approximations of multivariate functions in \(\mathcal {F}^{r}(M)\) is based on the approximation of univariate functions. It is well known that for \(g\in W^{r}_{\infty}[0,1]\), given the values g(i/N), i=1,…,N, we can construct an approximation \(\mathcal {I}_{N}((g(i/N))_{i=1}^{N})\) that satisfies
$$\begin{aligned} &\big\| g-\mathcal {I}_N\bigl(\bigl(g(i/N)\bigr)_{i=1}^N \bigr)\big\| _{L_\infty[0,1]}\le C_1(r) \min\bigl\{ \|g\|_{L_\infty[0,1]}, |g|_{W^r_\infty[0,1]} N^{-r}\bigr\} , \\ & \quad N=1,2\dots. \end{aligned}$$
(2.1)
There are many ways to construct such an approximation operator \(\mathcal {I}_{N}\). One is to use a quasi-interpolation operator built on univariate splines of order r. Another is to simply take, for each interval I=[(j−1)/N, j/N), j=1,…,N, a set Sj of r consecutive integers i+1,…,i+r that contains j−1 and j, and then define the approximation on I as the polynomial of order r that interpolates g at the points i/N, i∈Sj.

In going further, we use any such construction of an operator \(\mathcal {I}_{N}\). We note that \(\mathcal {I}_{N}\) accepts as input an arbitrary vector y=(y1,…,yN); the yi are usually taken as function values, such as yi=g(i/N) above.
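To make this concrete, the following is a minimal sketch (our own, not taken from the paper) of the second construction above: piecewise local Lagrange interpolation of order r from the samples yi=g(i/N). The function name make_interpolant and the clipping of the index sets Sj at the boundary are our choices.

```python
import numpy as np

def make_interpolant(y, r):
    """One possible realization of I_N: given y[i-1] = g(i/N), i = 1..N,
    return a piecewise polynomial of order r (degree r-1) on [0, 1]."""
    N = len(y)
    assert N >= r, "need at least r samples"
    nodes = np.arange(1, N + 1) / N
    def I_N(t):
        # index j of the interval [(j-1)/N, j/N) containing t
        j = min(max(int(np.floor(t * N)) + 1, 1), N)
        # r consecutive sample indices near j, clipped to {1, ..., N}
        lo = min(max(j - r // 2, 1), N - r + 1)
        S = np.arange(lo, lo + r)
        # polynomial of order r through the points (i/N, y[i-1]), i in S
        coeffs = np.polyfit(nodes[S - 1], y[S - 1], r - 1)
        return float(np.polyval(coeffs, t))
    return I_N
```

For r=1 this reduces to a piecewise constant approximant, and the error bound (2.1) then follows from the mean value theorem.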

We need a second result about univariate functions summarized in the following lemma.

Lemma 2.1

Suppose \(g\in W^{r}_{\infty}[0,1]\) is a univariate function that vanishes at r points t1,…,tr∈[0,1]. If J is the smallest interval that contains all of the tj, j=1,…,r, then
$$ \big|g(t)\big|\le\big\| g^{(r)}\big\| _{L_\infty[0,1]} \bigl(|J|+\operatorname{dist}(t,J) \bigr)^r,\quad t\in[0,1]. $$
(2.2)

Proof

Note that each weak derivative g(k) for k=0,…,r−1 is in \(W^{1}_{\infty}[0,1]\) and can thus be identified with a continuous function. From Rolle’s theorem, applied repeatedly, for each k=0,…,r−1 there is a point ξk in J such that g(k)(ξk)=0. This gives the bound
$$ \big|g^{(r-1)}(t)\big|\le\big\| g^{(r)}\big\| _{L_\infty[0,1]}|t- \xi_{r-1}|\le\big\| g^{(r)}\big\| _{L_\infty[0,1]} \bigl(|J|+\operatorname{dist}(t,J) \bigr), \quad t\in[0,1]. $$
From this, we obtain the bound
$$\begin{aligned} \big|g^{(r-2)}(t)\big| \le& \big\| g^{(r)}\big\| _{L_\infty[0,1]} \bigl(|J|+ \operatorname{dist}(t,J) \bigr) |t-\xi_{r-2}| \\ \le& \big\| g^{(r)}\big\| _{L_\infty[0,1]} \bigl(|J|+\operatorname{dist}(t,J) \bigr)^2, \quad t\in[0,1]. \end{aligned}$$
Continuing in this way, we arrive at (2.2). □
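For orientation, in the simplest case r=1 the lemma says: if g vanishes at a single point t1, then J={t1}, |J|=0, and (2.2) is just
$$ \big|g(t)\big| = \biggl| \int_{t_1}^t g'(s)\,ds \biggr| \le\big\| g'\big\| _{L_\infty[0,1]}\,|t-t_1|, \quad t\in[0,1]. $$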

3 Low-Discrepancy Point Sequences

The first set of query points that we shall employ is a low-discrepancy sequence that is commonly used in quasi-Monte Carlo methods for high-dimensional integration. Roughly speaking, stopping at any place in the sequence gives a well scattered set of points in Ω. The particular property we are interested in here is that no d-dimensional rectangle contained in Ω can have large measure without containing at least one of these points. We shall adopt a method for constructing such a sequence given in [4, 5] which rests on base q expansions. For any prime number q and any positive integer n, we have a unique base q representation
$$n = \sum_{j\geq0} b_j q^j,\quad b_j=b_j(q,n) \in\{ 0,\ldots, q-1\}. $$
The bj are the ‘bits’ of n in base q. For any \(n<q^k\), one has bj(q,n)=0 for \(j\ge k\).
With the bit sequence (bj)=(bj(n)) in hand, we define
$$ \gamma_q(n) := \sum_{j\geq0} b_j q^{-j-1} . $$
If q is fixed, the set of points Γq(m):={γq(n):1≤n<m} is contained in (0,1), and any point x∈(0,1) satisfies
$$ \operatorname{dist}\bigl(x,\varGamma_q(m)\bigr)\le q/m. $$
(3.1)
Indeed, if \(m=q^k\) for some positive integer k, then Γq(m) contains all points j/m, j=1,…,m−1, and so the distance in (3.1) does not exceed 1/m. For arbitrary m, take the largest k with \(q^k\le m\); then \(q^k>m/q\), and hence \(\operatorname{dist}(x,\varGamma_q(m))\le q^{-k}<q/m\).

Definition 3.1

(Halton Sequence)

Given the space dimension d≥1, we choose the first d prime numbers p1,…,pd. The sequence of points \((\hat{x}_{k})_{k\in \mathbb{N}}\) in [0,1]d is then defined by
$$ \hat{x}_{k} := \bigl( \gamma_{p_1}(k), \ldots, \gamma_{p_{d}}(k) \bigr) . $$
(3.2)
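For illustration, a short sketch of γq and of the points (3.2); the function names radical_inverse and halton_point are ours.

```python
def radical_inverse(n, q):
    """gamma_q(n): mirror the base-q digits of n about the radix point."""
    x, scale = 0.0, 1.0 / q
    while n > 0:
        n, b = divmod(n, q)   # peel off the lowest base-q digit b
        x += b * scale
        scale /= q
    return x

def halton_point(k, primes):
    """The k-th point of the Halton sequence (3.2) for the given primes."""
    return tuple(radical_inverse(k, p) for p in primes)

# usage: the first five Halton points in dimension d = 3
points = [halton_point(k, [2, 3, 5]) for k in range(1, 6)]
```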

The following theorem (see [8] and [2]) shows that this sequence of points is well scattered in the sense that we need.

Theorem 3.2

Let \(\hat{x}_{k}\), k=1,2,…, be defined as in (3.2). For any d-dimensional rectangle R=(α1,β1)×⋯×(αd,βd) with 0≤αi<βi≤1 that does not contain any of the points \(\hat{x}_{k}\), k=1,…,N, we have the following bound for the measure |R| of R:
$$ |R| \leq\frac{C_H(d)}{N}, $$
where \(C_{H}(d):= 2^{d} \prod_{i=1}^{d} p_{i} \).

Proof

For completeness, we give the short proof of this theorem, following essentially the presentation in [2, Thm. 3]. We first consider any d-dimensional rectangle \(R_0\subset\varOmega\) of the form
$$ R_0:= I_1\times\cdots\times I_d, \quad I_i:= p_i^{-\nu_i} \big[t_i, (t_i+1) \big),\ i=1,\dots,d, $$
(3.3)
where the \(\nu_{i}\in \mathbb{N}\) satisfy \(p_{1}^{\nu_{1}}\cdots p_{d}^{\nu_{d}} \leq N\) and the ti are nonnegative integers. Such a rectangle obviously has volume \(p_1^{-\nu_1}\cdots p_d^{-\nu_d}\ge1/N\). We shall show that such a rectangle always contains a point \(\hat{x}_{k}\) for some 1≤k≤N and thus obtain the theorem for rectangles of this special type.
Since \(R_0\subset\varOmega\), each ti is in \(\{0,\dots,p_{i}^{\nu_{i}}-1\} \) and therefore has a unique expansion
$$t_i = \sum_{j=0}^{\nu_i-1} a_{i,j} p_i^j, $$
with ai,j∈{0,…,pi−1}. We introduce the integers
$$ m_i:=\sum_{j=0}^{\nu_i-1} a_{i,\nu_i-j-1} p_i^{j},\quad i=1,\dots,d, $$
which satisfy
$$\gamma_{p_i}(m_i)=t_ip_i^{-\nu_i},\quad i=1,\dots,d. $$
From the Chinese remainder theorem, there is an integer \(k<p_{1}^{\nu _{1}}\cdots p_{d}^{\nu_{d}}\le N\) such that
$$ k \equiv m_i \mod p_i^{\nu_i}, \quad i=1,\dots,d. $$
It follows that
$$\gamma_{p_i}(k)= t_ip_i^{-\nu_i}+\epsilon_i,\quad i=1,\dots,d, $$
where \(0\le\epsilon_{i}< p_{i}^{-\nu_{i}}\), i=1,…,d. Therefore \(\hat{x}_{k}=(\gamma_{p_{1}}(k),\dots,\gamma_{p_{d}}(k))\) is in R0, and we have proved the theorem in this special case.

We now consider the general rectangle R in the statement of the theorem. We claim that R contains a special rectangle R0 of the form (3.3) of volume larger than \(C_H(d)^{-1}|R|\). Indeed, for the given αi<βi, we define νi to be the smallest integer such that there exists an integer ti with \([t_{i} p_i^{-\nu_{i}} , (t_{i}+1) p_i^{-\nu_{i}} ) \subset(\alpha_{i}, \beta_{i})\). Then, \(\beta_{i} -\alpha_{i} < 2 p_i^{-\nu_{i} + 1}\), since otherwise νi would not be minimal. This means that R contains a special rectangle R0 with volume \(|R_0|=\prod_{i=1}^d p_i^{-\nu_i} > \prod_{i=1}^d \frac{\beta_i-\alpha_i}{2p_i} = C_H(d)^{-1}|R|\). Since R does not contain any of the \(\hat{x}_{k}\), k=1,…,N, the same is true of R0; by the special case proved above, R0 must then violate \(p_1^{\nu_1}\cdots p_d^{\nu_d}\le N\). Hence \(|R_0|\le N^{-1}\), and so \(|R|\le C_H(d)N^{-1}\). □
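As a concrete instance of the constant: in dimension d=2, the first two primes are p1=2 and p2=3, so
$$ C_H(2)=2^2\cdot2\cdot3=24, $$
i.e., every axis-parallel rectangle in Ω of measure greater than 24/N contains at least one of the first N points of the sequence.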

4 Query Points and the Approximation

We now describe our query points. These will depend on r. If r=1, then given our budget N of queries, it would be sufficient to simply query f at the points \(\hat{x}_{1},\hat{x}_{2}, \dots,\hat{x}_{N}\) of a Halton sequence in succession. However, when r>1, we will occasionally have to query f at a cloud of points near each \(\hat{x}_{k}\) in order to take advantage of the higher smoothness of f. We fix r≥1 in what follows. We next describe the clouds of points where we might query f. We define, for each k=1,2,… and each \(n\ge k\),
$$ \varGamma_n(\hat{x}_k):= \Biggl\{ \hat{x}_k+\sum_{i=1}^d \frac{j_i}{r2^n}e_i: \ j_i\in\{-r+1,\dots,0,\dots,r-1\} \Biggr\} \cap\varOmega, $$
where ei, i=1,…,d, is the usual coordinate basis for \(\mathbb{R}^{d}\). For each k,n, this set contains at most \((2r-1)^d\) points and at least \(r^d\) points. When asked to query f at one of the sets \(\varGamma_{n}(\hat{x}_{k})\), we traverse these points in lexicographic order.
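A sketch of this point cloud (the function name cloud is ours; the traversal follows the lexicographic order of the offset vectors \((j_1,\dots,j_d)\), as stipulated above):

```python
from itertools import product

def cloud(x_hat, n, r):
    """Gamma_n(x_hat): all points x_hat + sum_i j_i/(r 2^n) e_i with
    j_i in {-r+1, ..., r-1} that lie in [0,1]^d, in lexicographic
    order of (j_1, ..., j_d)."""
    h = 1.0 / (r * 2 ** n)
    pts = []
    for js in product(range(-r + 1, r), repeat=len(x_hat)):
        p = tuple(x + j * h for x, j in zip(x_hat, js))
        if all(0.0 <= c <= 1.0 for c in p):
            pts.append(p)
    return pts
```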
Our query algorithms first sample f at point clouds \(\varGamma _{n_{k}}(\hat{x}_{k})\), k=1,2,…. If we stipulate the budget N in advance, we can then choose once and for all a single \(n_k\) as the smallest integer such that \(2^{n_{k}} \ge N\). For a given f and fixed N, this gives rise to the basic scheme given in Algorithm 1 for determining an approximation \(\tilde{f}_{N}\) to f.
Algorithm 1

Query algorithm for prescribed N

The procedure CrossApproximation required in Query 2 is defined for any z such that f(z)≠0 and \(N\in \mathbb{N}\) as follows:
  • \(\tilde{f}_{N}\) := CrossApproximation(z,N)

  • defining \(z^j\) as the vector which agrees with z in all but the j-th coordinate and is zero in the j-th coordinate, evaluate f at the points
    $$ \tilde{z}_{j,i}:= z^j+\frac{i}{N}e_j, \quad i=1,\dots,N,\ j=1,\dots,d , $$
    (4.1)
  • and define
    $$ F_j:=\mathcal {I}_N \bigl(f(\tilde{z}_{j,i})_{i=1}^N \bigr),\quad j=1,\dots,d, $$
    (4.2)
    where \(\mathcal {I}_{N}\) is the operator of Sect. 2. Then, setting A:=f(z), return
    $$ \tilde{f}_N(x):=A^{-d+1}F_1(x_1) \cdots F_d(x_d). $$
    (4.3)
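For orientation, here is a minimal sketch of CrossApproximation and of the basic scheme of Algorithm 1, whose listing is not reproduced above. It reuses make_interpolant, halton_point, and cloud from the earlier sketches; all names are ours, and f denotes the queried black-box function.

```python
def cross_approximation(f, z, N, r):
    """(4.1)-(4.3): build a separable approximation from samples on the
    d coordinate lines through z; assumes f(z) != 0."""
    d, A = len(z), f(z)
    F = []
    for j in range(d):
        line = []
        for i in range(1, N + 1):
            zt = list(z)
            zt[j] = i / N                     # the point z^j + (i/N) e_j of (4.1)
            line.append(f(tuple(zt)))
        F.append(make_interpolant(line, r))   # F_j of (4.2)
    def f_tilde(x):                           # (4.3): A^{-d+1} F_1(x_1)...F_d(x_d)
        val = A ** (1 - d)
        for j in range(d):
            val *= F[j](x[j])
        return val
    return f_tilde

def algorithm_1(f, N, r, primes):
    """Basic scheme for a prescribed budget N."""
    n = (N - 1).bit_length()                  # smallest n with 2^n >= N
    for k in range(1, N + 1):                 # Query 1: clouds around Halton points
        for p in cloud(halton_point(k, primes), n, r):
            if f(p) != 0.0:
                return cross_approximation(f, p, N, r)   # Query 2
    return lambda x: 0.0                      # all samples vanish: f_tilde_N = 0
```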
Rather than proceeding to analyze the performance of Algorithm 1, we instead modify this algorithm so that the sampling is progressive if we increase N; i.e., when the budget N is increased, one would still like to utilize the previous samples. This requires modifying both query stages. First, we will occasionally update the assignments \(n_j\), which means that the function f has to be resampled at \(\varGamma_{n_{j}}(\hat{x}_{j})\) for some of the j<k. This leads us to the modification specified in Query 1 of Algorithm 2. Note that this query loop may be exited at any value of N. As we will show below, the asymptotic complexity of Algorithm 1 is preserved.
Algorithm 2

Query algorithm with progressively increasing N

Second, note that once a point z with f(z)≠0 is found for some N, Algorithm 2 proceeds directly to Query 2 for every subsequent value of N. To keep the complexity of Query 2 proportional to N as N increases, we should also reuse the samples of preceding calls of Query 2. This can be done by dyadic nesting and leads to the modified procedure CrossApproximation*(z,N), obtained by replacing N in (4.1) and (4.2) by \(2^{\lceil\log_{2} N\rceil}\).

To estimate the complexity of the progressive scheme, we define ΛN(f) as the set of points where we have sampled f in Algorithm 2, up to a given budget index N. We next bound the cardinality of ΛN(f). Since \(\#(\varGamma_{n}(\hat{x}_{k}))\le(2r-1)^{d}\) for all choices of n,k, the only issue in bounding the number of samples in Query 1 is how many times we have resampled f near \(\hat{x}_{j}\). Now, for a given \(\hat{x}_{j}\), we originally sample f at the points \(\varGamma_{j}(\hat{x}_{j})\). This sampling is updated to a sampling \(\varGamma_{2^{j}}(\hat{x}_{j})\) if 2^j<N. It is updated again if \(2^{2^{j}}<N\), and so on. It follows that the only \(\hat{x}_{j}\) whose sampling is updated are those with j≤log2N, and the maximum number of times each is updated is bounded by log2N. Thus, the total number of samples taken in Query 1 does not exceed (2r−1)^d[N+(log2N)^2]≤2⋅(2r−1)^dN. Adding the samples taken in Query 2 along the d coordinate lines of (4.1), we conclude that the total number of samples taken satisfies
$$ \# \bigl(\varLambda_N(f) \bigr)\le C_1(d,r) N,\quad C_1(d,r):=2(2r-1)^d +d . $$
(4.4)
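As a quick illustration of the sketches above (with an arbitrarily chosen rank one test function; not part of the paper):

```python
import math

# rank one test function f(x) = sin(pi x_1) sin(pi x_2) sin(pi x_3)
f = lambda x: math.prod(math.sin(math.pi * c) for c in x)

f_N = algorithm_1(f, N=32, r=2, primes=[2, 3, 5])
x = (0.3, 0.7, 0.55)
print(f(x), f_N(x))   # the two values should agree to roughly O(N^{-2})
```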

5 Error of Approximation

We now analyze how well \(\tilde{f}_{N}\) approximates f.

Theorem 5.1

If \(f\in \mathcal {F}^{r}(M)\), then for each N=1,2,…, we have
$$ \|f-\tilde{f}_N\|_{L_\infty(\varOmega)}\le \bigl[C_H(d)\bigr]^r (2M)^d N^{-r}, $$
(5.1)
with \(C_H(d)\) as in Theorem 3.2. If, however, Query 1 stops at a point z where f(z)≠0, and N satisfies \(C_1(r)MN^{-r}<1/(2d)\), then
$$ {\|f-\tilde{f}_N\|_{L_\infty(\varOmega)}\le \sqrt{e}C_1(r) d M N^{-r} .} $$
(5.2)

The remainder of this section is devoted to the proof of this theorem. We will consider the two cases used for the definition of \(\tilde{f}_{N}\) in Algorithm 2.

Proof

We fix an arbitrary N. We first consider:

Case 1: No z with f(z)≠0 has been found in Query 1.

In this case, \(\tilde{f}_{N} = 0\). In order to obtain the required bound for \(\|f\|_{L_{\infty}(\varOmega )}\), we begin with:

Lemma 5.2

Under the assumptions of Theorem 5.1, for each k=1,…,N, there is a j∈{1,…,d} such that fj vanishes at r distinct points in [0,1] of the form \((\hat{x}_{k})_{j}+t_{i,j}\), i∈{−r+1,…,0,…,r−1}, with \(|t_{i,j}|\le N^{-1}\).

Proof of Lemma 5.2

We know that f vanishes at all points in \(\varGamma_{n_{k}}(\hat{x}_{k})\), where nk is the last update associated to \(\hat{x}_{k}\). We also know that \(2^{-n_{k}}\le1/N\). We now prove the lemma for \(t_{i,j} = \frac{i}{r2^{n_{k}}}\). Suppose that the statement does not hold; then for this value of k and for each j=1,…,d, there is an ij∈{−r+1,…,0,…,r−1} such that \(z_{j}:=(\hat{x}_{k})_{j}+(r2^{n_{k}})^{-1} i_{j}\in[0,1]\) and fj(zj)≠0 (here we use that, for each j, at least r of the 2r−1 candidate points lie in [0,1], since \(\#\varGamma_{n_k}(\hat{x}_k)\ge r^d\)). But then \(z:=(z_{1},\dots,z_{d})\in\varGamma_{n_{k}}(\hat{x}_{k})\) and f(z)≠0, which is the desired contradiction. □

For each k, we let \(\mathcal{C}_{k}\) be the set of all such integers j∈{1,…,d} with the properties stated in Lemma 5.2. We refer to the integers j in \(\mathcal{C}_{k}\) as the colors of \(\hat{x}_{k}\).

In the case we are considering, we know that f vanishes at each of the points of Query 1 and that \(\tilde{f}_{N}=0\). Let x=(x1,…,xd)∈Ω. Our goal is to bound the value f(x). We define
$$\begin{aligned} &\delta_j:=\delta_j(x):=\inf \Bigl( \bigl\{ \big|( \hat{x}_k)_j-x_j\big| \colon k\in\{1,\dots,N\} \ \mathrm{such}\ \mathrm{that}\ j\in \mathcal{C}_k \bigr\} \cup\{ 1 \} \Bigr) ,\\ & \quad j=1,\dots, d. \end{aligned}$$
In other words, δj(x) tells us how well we can approximate xj by the numbers \((\hat{x}_{k})_{j}\) using those k for which j is in \(\mathcal{C}_{k}\).
It follows that the rectangle \(R:=\varOmega\cap\prod_{j=1}^{d}(x_{j}-\delta _{j}, x_{j}+\delta_{j})\) does not contain any points \(\hat{x}_{k}\) which have color j, and this is true for each j=1,…,d. Since, as we have already observed in Lemma 5.2, every \(\hat{x}_{k}\) has at least one color, it follows that R does not contain any of the points \(\hat{x}_{k}\), k=1,…,N. From Theorem 3.2, we have that \(|R|\le C_H(d)/N\). Since \(|R|\ge\prod_{j=1}^{d}\delta_{j}\), we obtain
$$ \prod_{j=1}^d \delta_j(x)\le C_H(d)/N. $$
(5.3)
Now fix any 1≤j≤d. In the case that the statement of Lemma 5.2 does not apply with this j for any k, we have \(\| f_{j} \|_{L_\infty[0,1]} \leq1 = \delta_{j} = \delta_{j}^{r}\). Otherwise, we know from the definition of coloring and the definition of δj that there exist r points t1,…,tr∈[0,1] contained in an interval J of length 1/N such that \(\operatorname{dist}(x_{j},J)\le\delta_{j}\) and fj vanishes at each of these points. Hence, from Lemma 2.1, we obtain
$$ \big|f_j(x_j)\big|\le\big\| f_j^{(r)} \big\| _{L_\infty[0,1]} \bigl(|J|+\delta_j \bigr)^r\le M \bigl(N^{-1}+\delta_j \bigr)^r\le2M\max\bigl\{ N^{-r},\delta_j^r \bigr\} . $$
It follows that
$$ \big|f(x)\big|= \prod_{j=1}^d\big|f_j(x_j)\big| \le2^dM^d \prod_{j=1}^d \max\bigl\{ N^{-r},\delta_j^r\bigr\} \le 2^dM^d\bigl[C_H(d)\bigr]^rN^{-r}. $$
In the last inequality, we used (5.3) and the fact that δj≤1 for all j=1,…,d. This completes the proof of the theorem in Case 1.

Case 2: Query 1 has produced z such that f(z)≠0.

In this case, \(\tilde{f}_{N}\) is obtained by CrossApproximation*. With such z=(z1,…,zd), let A:=f(z)≠0 and \(A_j:=\prod_{i\neq j}f_i(z_i)\) for j=1,…,d. Sampling f at the points \(\tilde{z}_{j,i}\) of (4.1) thus yields the values \(f(\tilde{z}_{j,i})=A_{j}f_{j}(i/N)\), i=1,2,…,N. Hence, from (2.1), we obtain
$$\big\| A_jf_j-F_j\big\| _{L_\infty[0,1]} \le C_1(r) |A_j| M N^{-r},\quad j=1,\dots,d. $$
In other words,
$$ \big\| f_j-A_j^{-1}F_j \big\| _{L_\infty[0,1]} \le C_1(r) M N^{-r},\quad j=1,\dots,d. $$
(5.4)
Since \(\prod_{j=1}^{d}A_{j}=A^{d-1}\), we can write our approximation in the form \(\tilde{f}_{N}(x)= \prod_{j=1}^{d}A_{j}^{-1}F_{j}(x_{j})\). Hence, the approximation error can be rewritten as
$$ f(x)-\tilde{f}_N(x)=\prod _{j=1}^d f_j(x_j)-\prod _{j=1}^dA_j^{-1}F_j(x_j). $$
Now, for any numbers \(y_{j},y_{j}'\in[-L,L]\), j=1,…,d, we have
$$ \big| y_1\cdots y_d-y_1' \cdots y_d'\big|= \Bigg| \sum_{j=1}^d y_1\cdots y_{j-1}\bigl(y_j-y_j' \bigr)y_{j+1}'\cdots y_d' \Bigg|\le d L^{d-1}\max_{1\le j\le d} \big|y_j-y_j'\big|. $$
We use this inequality with yj:=fj(xj) and \(y_{j}':=A_{j}^{-1}F_{j}(x_{j})\), in which case we can take L:=1+C1(r)MNr to obtain
$$ \|f-\tilde{f}_N\|_{L_\infty(\varOmega)}\le d \bigl(1+C_1(r) M N^{-r}\bigr)^{d-1} C_1(r) M N^{-r}, $$
(5.5)
where we have used (5.4).
For ε:=C1(r)MNr, we have ε<1/(2d) by our assumption, and hence
$$(1+\varepsilon)^{d-1} \le\exp(\varepsilon)^{d-1} \le e^{(d-1)/(2d)} \le\sqrt{e}. $$
Using this in (5.5), we obtain \(\|f-\tilde{f}_{N}\|_{L_{\infty}(\varOmega)}\le d \sqrt{e} \varepsilon\), completing the proof of the theorem.  □

6 Optimality of the Algorithm

It is quite easy to see that our algorithm has asymptotically optimal performance, in terms of N, on the class \(\mathcal {F}^{r}(M)\).

Theorem 6.1

Given positive integers r and d, there is a constant c(r,d)>0 such that the following holds: Given any algorithm which uses N point queries to approximate f by AN(f), there is a function \(f\in \mathcal {F}^{r}(M)\) such that
$$ \big\| f-A_N(f)\big\| _{L_\infty(\varOmega)}\ge c(r,d) M^dN^{-r}. $$
(6.1)

Proof

We can assume without loss of generality that N=m^d−1 for some positive integer m. We divide Ω into N+1 cubes of sidelength 1/m. To the proposed query algorithm, we return the value zero at each of the N query points. We can then choose a cube Q of sidelength 1/m which contains none of the N query points. There is a function \(g\in \mathcal {F}^{r}(M)\) which is supported in Q and has maximum value \([c(r)Mm^{-r}]^d=[c(r)]^dM^d(N+1)^{-r}\). Since the proposed algorithm gives AN(g)=AN(0), (6.1) follows for at least one of the two functions f=0 or f=g. □

Let us finally relate our results to what is commonly referred to as the curse of dimensionality; see [6, 7]. First note that our estimate for the computational work in Algorithms 1 and 2 may be dominated by Query 1. In fact, according to the bound (5.1), the computational complexity of our algorithms (measured in terms of the number of generated degrees of freedom) required to realize a desired target accuracy may in general increase exponentially in the spatial dimension d and is therefore still subject to that “curse.” However, (5.2) says that, once a point z for which f(z)≠0 has been found, the complexity of the additional computational work needed to recover f within a desired accuracy ε grows only like (d/ε)^{1/r}, and hence this part of the algorithm would break the curse of dimensionality. Moreover, under additional assumptions on f, Query 1 can have much lower complexity. For example, if each component function fj is a polynomial of a fixed degree p, or more generally if each component has at most a fixed number p of zeros, then Query 1 will terminate after at most p steps. Indeed, the Halton sequence never repeats a coordinate value. Even when further relaxing the assumptions on f, say to analyticity, and replacing the sampling in Query 1 by random sampling, one could formulate a result according to which the algorithm breaks the curse of dimensionality with high probability.


Acknowledgements

This research was supported by the Office of Naval Research Contracts ONR N00014-08-1-1113, ONR N00014-09-1-0107, and ONR N00014-11-1-0712; the AFOSR Contract FA95500910500; the NSF Grants DMS-0810869, and DMS 0915231; and the DFG Special Priority Program SPP-1324. This research was done when R.D. was a visiting professor at RWTH and the AICES Graduate Program. This publication is based in part on work supported by Award No. KUS-C1-016-04 made by King Abdullah University of Science and Technology (KAUST).

References

  1. Bungartz, H.-J., Griebel, M.: Sparse grids. Acta Numer. 13, 147–269 (2004)
  2. Dumitrescu, A., Jiang, M.: On the largest empty axis-parallel box amidst n points. Algorithmica (2012). doi:10.1007/s00453-012-9635-5
  3. Hackbusch, W.: Tensor Spaces and Numerical Tensor Calculus. Springer Series in Computational Mathematics, vol. 42. Springer, Berlin (2012)
  4. Halton, J.H.: On the efficiency of certain quasi-random sequences of points in evaluating multi-dimensional integrals. Numer. Math. 2, 84–90 (1960)
  5. Hammersley, J.M.: Monte Carlo methods for solving multivariable problems. Ann. N.Y. Acad. Sci. 86, 844–874 (1960)
  6. Novak, E., Woźniakowski, H.: Tractability of Multivariate Problems, Volume I: Linear Information. EMS Tracts in Mathematics, vol. 6. Eur. Math. Soc., Zurich (2008)
  7. Novak, E., Woźniakowski, H.: Approximation of infinitely differentiable multivariate functions is intractable. J. Complex. 25, 398–404 (2009)
  8. Rote, G., Tichy, R.F.: Quasi-Monte-Carlo methods and the dispersion of point sequences. Math. Comput. Model. 23, 9–23 (1996)
  9. Sickel, W., Ullrich, T.: Tensor products of Sobolev–Besov spaces and applications to approximation from the hyperbolic cross. J. Approx. Theory 161, 748–786 (2009)
  10. Smolyak, S.A.: Quadrature and interpolation formulas for tensor products of certain classes of functions. Sov. Math. Dokl. 4, 240–243 (1963)
  11. Temlyakov, V.: Approximation of Periodic Functions. Nova Science Publishers, New York (1993)

Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  • Markus Bachmayr (1)
  • Wolfgang Dahmen (1)
  • Ronald DeVore (2)
  • Lars Grasedyck (1)

  1. Institut für Geometrie und Praktische Mathematik, RWTH Aachen, Aachen, Germany
  2. Department of Mathematics, Texas A&M University, College Station, USA
