Abstract
We present a dimension-incremental algorithm for the nonlinear approximation of high-dimensional functions in an arbitrary bounded orthonormal product basis. Our goal is to detect a suitable truncation of the basis expansion of the function, where the corresponding basis support is assumed to be unknown. Our method is based on point evaluations of the considered function and adaptively builds an index set of a suitable basis support such that the approximately largest basis coefficients are still included. For this purpose, the algorithm only needs a suitable search space that contains the desired index set. Throughout the work, we also discuss various minor modifications of the algorithm, which may yield additional benefits in several situations. For the first time, we provide a proof of a detection guarantee for such an index set in the function approximation case under certain assumptions on the submethods used within our algorithm, which can serve as a foundation for similar statements in various other situations. Some numerical examples in different settings underline the effectiveness and accuracy of our method.
1 Introduction
In recent years, so-called sparse algorithms that are designed to recover sparse signals have gained significant attention. Various methods and algorithms have been developed since then, advancing the field of compressed sensing tremendously; see [14] for numerous examples and references. Especially the so-called sparse Fast Fourier Transform (sFFT) algorithms (see, e.g., [3, 7, 12, 15, 17, 18, 21, 23, 36, 42, 43, 50] for a short overview and introduction) provide efficient ways to reconstruct univariate sparse trigonometric polynomials in different settings. Of course, there are many other one-dimensional bases besides the Fourier basis for which the sparse recovery problem is also of interest. Hence, similar algorithms began to arise for bases such as the Legendre polynomial basis [19, 41, 45, 48], as well as for more general settings, cf. [16]. The aforementioned methods are often also applicable in the approximation setting, i.e., if the target function is not sparse itself but is assumed to be well approximated by some sparse quantity.
At the same time, the high-dimensional generalization of these problems became another topic of research, in particular the search for methods that circumvent the curse of dimensionality (as introduced in [5]) to some extent. While the Fourier case is again studied very well, e.g., [8, 9, 20, 22, 30, 35, 46, 47], very little is known about efficient algorithms for other high-dimensional bases.
Sparse polynomial approximation algorithms based on least squares, sparse recovery or compressed sensing have been shown to be quite effective for approximating high-dimensional functions, even in non-Fourier settings, using a relatively small number of samples. A broad overview of this topic is available in [1, 2, 13] and the numerous references therein, providing detailed analyses of the strengths and limitations of these methods. One of the main challenges of sparse polynomial approximation is the computational complexity of the matrix–vector multiplications involved in these algorithms. The size of the matrices used therein can grow exponentially with the number of input variables, making the computation time of these algorithms a bottleneck in many applications. For certain problem settings, the structure of the matrices or the particular function spaces considered allow for a speedup in computation time, since faster matrix–vector multiplication algorithms are available. However, for more general problem settings the computational complexity remains an issue. Recently, more efficient sublinear-time algorithms for bounded orthonormal product bases have been developed in [10, 11] and have shown promising results.
Another popular approach in the high-dimensional stochastic setting is the sparse polynomial chaos expansion (PCE), cf. [39] and the references therein. There, random variables are approximated using a subset of the corresponding polynomial orthonormal basis by sparse regression. After each iteration, the candidate subset or the sampling locations are modified until the sparse solution is satisfactory. Note that the concept of sparsity is only used as a tool to find robust solutions in this case and is not the main goal of sparse polynomial chaos expansions. There also exist basis-adaptive sparse PCE approaches, as for example described and compared in [40], using and combining various approaches to iteratively build a suitable candidate basis. However, the particular methods have to be chosen carefully, since the relative error varies strongly between the different methods. A final model selection after computing several sparse PCE solutions is strongly recommended therein.
Our aim here is the nonlinear approximation of a function \(f({\varvec{x}}) :=\sum _{{\varvec{k}}\in \mathbb {N}^d} c_{{\varvec{k}}}\Phi _{{\varvec{k}}}({\varvec{x}})\) using samples by a truncated sum
\(S_\textrm{I}f({\varvec{x}}) :=\sum _{{\varvec{k}}\in \textrm{I}} c_{{\varvec{k}}}\Phi _{{\varvec{k}}}({\varvec{x}})\)
with a carefully selected, finite index set \(\textrm{I}\), which is a priori unknown. Additionally, we also approximate the basis coefficients \(c_{\varvec{k}}, {\varvec{k}}\in \textrm{I},\) to derive the approximation
\(S_\textrm{I}^{\mathcal {A}}f({\varvec{x}}) :=\sum _{{\varvec{k}}\in \textrm{I}} {\hat{f}}_{{\varvec{k}}}\Phi _{{\varvec{k}}}({\varvec{x}})\)
with \({\hat{f}}_{{\varvec{k}}} \in \mathbb {C}, {\varvec{k}}\in \textrm{I}\). Throughout the whole paper, sampling is meant w.r.t. a black box algorithm that provides the function value \(f({\varvec{x}})\) for any sampling node \({\varvec{x}}\) our algorithm \(\mathcal {A}\) requires. Such a black box setting arises, for example, when solving parametric PDEs, cf. [31, 32], where each sample can be computed as the solution of the PDE w.r.t. the spatial domain for a fixed random variable. This concept also enables the algorithm to work highly adaptively, since the samples can be chosen suitably in each step, which is not the case when working with given samples.
We stress the two parts of this sparse approximation problem, which can be identified by the common error estimate
where we denote by \(\textrm{I}^\textrm{c}:=\mathbb {N}^d {\setminus } \textrm{I}\) the complement of \(\textrm{I}\). We need to compute good approximations \({\hat{f}}_{{\varvec{k}}}\) of the coefficients \(c_{\varvec{k}}\) for all \({\varvec{k}}\in \textrm{I}\) to reduce the coefficient approximation error \(\sum _{{\varvec{k}}\in \textrm{I}} \left| {\hat{f}}_{{\varvec{k}}} - c_{{\varvec{k}}}\right| \), and we need to detect a suitable sparse index set \(\textrm{I}\) containing as many indices \({\varvec{k}}\) corresponding to the largest coefficients \(c_{\varvec{k}}\) of the function f as possible, thereby minimizing the truncation error \(\sum _{{\varvec{k}}\in \textrm{I}^\textrm{c}} \left| c_{{\varvec{k}}}\right| \). While the coefficient approximation problem for a given index set \(\textrm{I}\) is well-known for many bases, the detection of a good index set \(\textrm{I}\) is rather complicated and will therefore be our primary aim.
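Such an estimate can be obtained, for instance, in the uniform norm via the triangle inequality, where \(B\) denotes the uniform bound on the basis functions introduced in Sect. 1.1 (the concrete norm here is chosen only for illustration):

```latex
\begin{aligned}
\Big\Vert f - \sum_{\boldsymbol{k}\in \mathrm{I}} \hat{f}_{\boldsymbol{k}}\, \Phi_{\boldsymbol{k}} \Big\Vert_{L_\infty}
&\le \sum_{\boldsymbol{k}\in \mathrm{I}} \big\vert \hat{f}_{\boldsymbol{k}} - c_{\boldsymbol{k}} \big\vert\, \Vert \Phi_{\boldsymbol{k}} \Vert_{L_\infty}
 + \sum_{\boldsymbol{k}\in \mathrm{I}^{\mathrm{c}}} \vert c_{\boldsymbol{k}} \vert\, \Vert \Phi_{\boldsymbol{k}} \Vert_{L_\infty} \\
&\le B \Big( \sum_{\boldsymbol{k}\in \mathrm{I}} \big\vert \hat{f}_{\boldsymbol{k}} - c_{\boldsymbol{k}} \big\vert
 + \sum_{\boldsymbol{k}\in \mathrm{I}^{\mathrm{c}}} \vert c_{\boldsymbol{k}} \vert \Big).
\end{aligned}
```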
In this paper, we present a dimension-incremental approach for the nonlinear approximation of high-dimensional functions by sparse basis representations, applicable to arbitrary bounded orthonormal product bases. The basis indices for these representations are detected adaptively by computing so-called projected coefficients, which indicate the importance of the corresponding index projections. Our algorithm utilizes suitable methods in the corresponding function spaces to determine sampling nodes and to approximate those projected coefficients using the corresponding samples, e.g., by a cubature or least squares method. Therefore, our algorithm benefits tremendously if those methods are efficient in terms of sampling or computational complexity.
The paper is organized as follows: In the remaining part of Sect. 1, we briefly introduce the function space setting and explain the concept of projected coefficients. In Sect. 2, we derive our dimension-incremental method and briefly discuss its complexity, the a priori choice of the search space and alternative increment strategies. Section 3 contains the derivation of the theoretical main result and Sect. 4 shows the application of our algorithm to periodic and non-periodic function approximations. Finally, we briefly summarize the results of this work in Sect. 5.
1.1 Bounded orthonormal product bases
We consider \(d \in \mathbb {N}\) measure spaces \((\mathcal {D}_j,\mathcal {A}_j,\mu _j), j = 1,\ldots ,d,\) with the probability measures \(\mu _j\), the \(\sigma \)-algebras \(\mathcal {A}_j\) and the Borel sets \(\mathcal {D}_j \subset \mathbb {R}\) for all \(j=1,\ldots ,d\). As usual, we denote the sets of all functions \(f: \mathcal {D}_j \rightarrow \mathbb {C}\) that are square-integrable with respect to \(\mu _j\) by \(L_2(\mathcal {D}_j,\mu _j)\) and for \(\mathcal {D}= \mathop {{\times }}_{j=1}^d \mathcal {D}_j \subset \mathbb {R}^d\) the set of all functions \(f: \mathcal {D}\rightarrow \mathbb {C}\) that are square-integrable with respect to the product measure \(\mu = \mathop {{\times }}_{j=1}^d \mu _j\) by \(L_2(\mathcal {D},\mu )\). We assume that the measure spaces \((\mathcal {D}_j,\mathcal {A}_j,\mu _j), j = 1,\ldots ,d,\) are such that the \(L_2(\mathcal {D}_j,\mu _j), j = 1,\ldots ,d,\) are separable Hilbert spaces. Hence, there exists a countable orthonormal basis \(\lbrace \phi _{j,k_j}:\mathcal {D}_j\rightarrow \mathbb {C}\,\vert \, k_j \in \mathbb {N}\rbrace \) for each \(j = 1,\ldots ,d\). Further, the space \(L_2(\mathcal {D},\mu )\) is then also a separable Hilbert space spanned by the orthonormal product basis \(\lbrace \Phi _{{\varvec{k}}}:\mathcal {D}\rightarrow \mathbb {C}\,\vert \, {\varvec{k}}\in \mathbb {N}^d \rbrace \) with
\(\Phi _{{\varvec{k}}}({\varvec{x}}) :=\prod _{j=1}^d \phi _{j,k_j}(x_j), \quad {\varvec{k}}\in \mathbb {N}^d, \; {\varvec{x}}\in \mathcal {D}.\)
Finally, we assume that there exist the finite constants
\(B_j :=\sup _{k_j \in \mathbb {N}} \Vert \phi _{j,k_j}\Vert _{L_\infty (\mathcal {D}_j)} < \infty \)
for each \(j=1,\ldots ,d\), i.e., the orthonormal basis \(\lbrace \phi _{j,k_j}:\mathcal {D}_j\rightarrow \mathbb {C}\,\vert \, k_j \in \mathbb {N}\rbrace \) is bounded for each j. Then, the orthonormal product basis \(\lbrace \Phi _{{\varvec{k}}}:\mathcal {D}\rightarrow \mathbb {C}\,\vert \, {\varvec{k}}\in \mathbb {N}^d \rbrace \) is also bounded by
\(B :=\prod _{j=1}^d B_j\)
and is therefore called Bounded Orthonormal Product Basis (BOPB) throughout this paper.
Let \(f \in L_2(\mathcal {D},\mu )\) be smooth enough such that there exist coefficients \(\lbrace c_{{\varvec{k}}}\rbrace _{{\varvec{k}}\in \mathbb {N}^d}\) with \(\sum _{{\varvec{k}}\in \mathbb {N}^d} \left| c_{{\varvec{k}}}\right| < \infty \) and the series expansion
\(f({\varvec{x}}) = \sum _{{\varvec{k}}\in \mathbb {N}^d} c_{{\varvec{k}}}\Phi _{{\varvec{k}}}({\varvec{x}})\)
holds pointwise for all \({\varvec{x}}\in \mathcal {D}\). This smoothness requirement enables the concept of approximation of f using point samples, but is different for each basis \(\lbrace \Phi _{{\varvec{k}}}\rbrace _{{\varvec{k}}\in \mathbb {N}^d}\).
An example of such a BOPB is the Fourier system on the periodic, \(d\)-variate torus \(\mathcal {D}= \mathbb {T}^d \simeq [0,1)^d\) with the common Lebesgue measure, where \(\Phi _{\varvec{k}}({\varvec{x}}) = \text {e}^{2\pi \text {i}{\varvec{k}}\cdot {\varvec{x}}}, {\varvec{k}}\in \mathbb {Z}^d,\) with constant \(B=1\). For \(f \in L_2(\mathbb {T}^d)\), the coefficients \(c_{\varvec{k}}\) are then the well-known Fourier coefficients \(\int _{\mathbb {T}^d} f({\varvec{x}}) \text {e}^{-2\pi \text {i}{\varvec{k}}\cdot {\varvec{x}}} \textrm{d}{\varvec{x}}\). The approximation of a Fourier partial sum \(S_\textrm{I}f({\varvec{x}})\) for a given frequency set \(\textrm{I}\) can be realized efficiently by specific Fast Fourier Transform (FFT) methods, while the more challenging task of identifying a suitable frequency set \(\textrm{I}\) is considered as “sparse FFT” in several works, see [35, Tbl. 1.1] for an overview.
Another example is the Chebyshev system on \(\mathcal {D}= [-1,1]^d\) with \(\Phi _{\varvec{k}}({\varvec{x}}) = T_{\varvec{k}}({\varvec{x}}) :=\sqrt{2}^{\Vert {\varvec{k}}\Vert _0}\prod _{j=1}^d \cos (k_j \arccos (x_j)), {\varvec{k}}\in \mathbb {N}^d,\) a normalized tensor product of Chebyshev polynomials of first kind. The corresponding space is \(L_2([-1,1]^d,\mu _{\textrm{Cheb}})\), where \(\mu _{\textrm{Cheb}}(\textrm{d}{\varvec{x}}) :=\prod _{j=1}^d (\pi \sqrt{1-x_j^2})^{-1} \textrm{d}{\varvec{x}}\) is the Chebyshev measure; the normalization factor \(\sqrt{2}^{\Vert {\varvec{k}}\Vert _0}\) ensures orthonormality with respect to this measure. The BOPB constant is \(B=\sqrt{2}^{d}\) in this setting.
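The orthonormality of the Chebyshev system can be checked numerically: a Gauss–Chebyshev rule with \(M\) nodes and equal weights \(1/M\) integrates polynomials up to degree \(2M-1\) exactly against the Chebyshev measure. The following one-dimensional sketch (the handling of the \(\sqrt{2}\) normalization for \(k \ge 1\) is an assumption matching the orthonormal version of the basis) verifies that the Gram matrix is the identity:

```python
import numpy as np

def cheb(k, x):
    """Chebyshev polynomial cos(k*arccos(x)), normalized by sqrt(2) for
    k >= 1 so that the family is orthonormal w.r.t. the Chebyshev measure."""
    return (np.sqrt(2.0) if k >= 1 else 1.0) * np.cos(k * np.arccos(x))

# Gauss-Chebyshev rule: equal weights 1/M, nodes cos((2i+1)pi/(2M)); this
# integrates polynomials of degree <= 2M-1 exactly against (pi*sqrt(1-x^2))^{-1} dx.
M = 32
nodes = np.cos((2 * np.arange(M) + 1) * np.pi / (2 * M))

# Gram matrix of the first 6 basis functions under the cubature rule.
gram = np.array([[np.mean(cheb(j, nodes) * cheb(k, nodes))
                  for k in range(6)] for j in range(6)])
print(np.allclose(gram, np.eye(6), atol=1e-10))  # True
```

This also illustrates the exactness condition (1.3) for the cubature rules used later: the rule reproduces the inner products of the basis functions exactly.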
We encourage the reader to keep such an example in mind. Especially for the Fourier system, the sparse FFT approaches presented in [30, 35, 46] may be seen as special cases of the algorithm we are about to present. Note that our setting above is not restricted to bases with similar structures in each dimension \(j=1,\ldots ,d\). For instance, one could also think of systems with a Fourier-type basis for only some \(j \in \{1,\ldots ,d\}\) and a Chebyshev-type basis for the remaining dimensions, see [11, Sec. 5.1.1] as an example.
Remark 1.1
As mentioned above, the smoothness of f is important to ensure the well-definedness of point evaluations of f. Obviously, a possible restriction to ensure this property is to assume the continuity of f. Additionally, the smoothness condition is also fulfilled for most function spaces with higher regularity, e.g., when using weighted Wiener spaces as considered in [24, 44] for the Fourier setting. However, depending on the BOPB, other or weaker assumptions may be possible.
Also, one could assume f to be from a reproducing kernel Hilbert space instead, cf. for example [6]. The well-definedness of point evaluations is one of the defining properties of such spaces.
1.2 Projected coefficients and cubature rules
To simplify notations in the upcoming sections, we introduce the following notations for \(\mathfrak {u}\subset \{1,\ldots ,d\}\) and its complement \(\mathfrak {u}^\textrm{c}:=\{1,\ldots ,d\} {\setminus } \mathfrak {u}\):

\(\mathcal {D}_{\mathfrak {u}} :=\mathop {{\times }}_{j \in \mathfrak {u}} \mathcal {D}_j \subset \mathbb {R}^{\left| \mathfrak {u}\right| }\), \(\mu _{\mathfrak {u}} :=\mathop {{\times }}_{j \in \mathfrak {u}} \mu _j\), \(B_\mathfrak {u}:=\prod _{j\in \mathfrak {u}} B_j\),

\(\Phi _{\mathfrak {u},{\varvec{k}}}({\varvec{\xi }}) :=\prod _{j \in \mathfrak {u}} \phi _{j,k_j}(\xi _j)\) for all \({\varvec{k}}=(k_j)_{j \in \mathfrak {u}}\in \mathbb {N}^{\left| \mathfrak {u}\right| }\) and \({\varvec{\xi }}=(\xi _j)_{j\in \mathfrak {u}}\in \mathcal {D}_{\mathfrak {u}}\),

\({\varvec{h}}_{\mathfrak {u}} :=(h_j)_{j\in \mathfrak {u}} \in \mathbb {N}^{\left| \mathfrak {u}\right| }\) for \({\varvec{h}}\in \mathbb {N}^d\),

\(({\varvec{k}},{\varvec{h}})_{\mathfrak {u}} :=(l_j)_{j=1}^d\) with \(l_j = {\left\{ \begin{array}{ll} k_j, &{} j \in \mathfrak {u}\\ h_j, &{} j \not \in \mathfrak {u}\end{array}\right. }\) for \({\varvec{k}}= (k_j)_{j\in \mathfrak {u}} \in \mathbb {N}^{\left| \mathfrak {u}\right| },\;{\varvec{h}}= (h_j)_{j\in \mathfrak {u}^\textrm{c}} \in \mathbb {N}^{d-\left| \mathfrak {u}\right| }\),

\(f({\varvec{\xi }},\tilde{{\varvec{x}}})_{\mathfrak {u}} :=f((y_j)_{j=1}^d)\) with \(y_j = {\left\{ \begin{array}{ll} \xi _j, &{} j \in \mathfrak {u}\\ \tilde{x}_j, &{} j \not \in \mathfrak {u}\end{array}\right. }\) for \({\varvec{\xi }}= (\xi _j)_{j\in \mathfrak {u}} \in \mathcal {D}_{\mathfrak {u}},\;\tilde{{\varvec{x}}} = (\tilde{x}_j)_{j\in \mathfrak {u}^\textrm{c}} \in \mathcal {D}_{\mathfrak {u}^\textrm{c}}\).
To ensure that all of these quantities using \(\mathfrak {u}\) (or \(\mathfrak {u}^\textrm{c}\)) are well-defined, we assume them to be ordered naturally. Note that the notations coincide with their one- and \(d\)-dimensional counterparts for \(\left| \mathfrak {u}\right| =1\) or \(\left| \mathfrak {u}\right| =d\), respectively, if they exist.
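The coupling notations above can be sketched in a few lines of code; the helper names `combine` and `project` are made up for this illustration (dimensions are 1-based as in the text):

```python
# (k, h)_u: take k_j for j in u (ordered naturally) and h_j for j not in u.
def combine(u, k, h, d):
    u = sorted(u)
    uc = [j for j in range(1, d + 1) if j not in u]
    out = [None] * d
    for j, kj in zip(u, k):
        out[j - 1] = kj
    for j, hj in zip(uc, h):
        out[j - 1] = hj
    return tuple(out)

# h_u: the components of a full index h belonging to u.
def project(u, h):
    return tuple(h[j - 1] for j in sorted(u))

d = 4
h = (7, 8, 9, 10)
u = {1, 3}
print(project(u, h))                  # (7, 9)
print(combine(u, (1, 3), (2, 4), d))  # (1, 2, 3, 4)
```

The same splitting applies verbatim to sampling nodes, i.e., to \(f({\varvec{\xi }},\tilde{{\varvec{x}}})_{\mathfrak {u}}\).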
Our algorithm, which we are about to present in Sect. 2, detects a suitable index set \(\textrm{I}\) by computing approximations of so-called projected coefficients \(c_{\mathfrak {u},{\varvec{k}}}\) for \(\mathfrak {u}= \{1,\ldots ,t\}\) and \(\mathfrak {u}= \{t\}\) for each \(t = 1,\ldots ,d\) and several indices \({\varvec{k}}\in \mathbb {N}^{\left| \mathfrak {u}\right| }\) using samples of the function f. In particular, we consider the projected coefficients
\(c_{\mathfrak {u},{\varvec{k}}}(\tilde{{\varvec{x}}}) :=\int _{\mathcal {D}_{\mathfrak {u}}} f({\varvec{\xi }},\tilde{{\varvec{x}}})_{\mathfrak {u}} \, \overline{\Phi _{\mathfrak {u},{\varvec{k}}}({\varvec{\xi }})} \,\textrm{d}\mu _{\mathfrak {u}}({\varvec{\xi }}) \qquad (1.1)\)
as a function w.r.t. the \((d-\left| \mathfrak {u}\right| )\)-dimensional anchor \(\tilde{{\varvec{x}}}\in \mathcal {D}_{\mathfrak {u}^\textrm{c}}\). The name is based on the fact that those \(c_{\mathfrak {u},{\varvec{k}}}(\tilde{{\varvec{x}}})\) can be interpreted as the coefficients of the basis expansion in the space \(L_2(\mathcal {D}_\mathfrak {u},\mu _\mathfrak {u})\) of the projections \(f(\cdot ,\tilde{{\varvec{x}}})\), which play an important role in the anchored version of the multivariate decomposition method (MDM), cf. for example [33].
However, using \(\sum _{{\varvec{h}}\in \mathbb {N}^d} \left| c_{{\varvec{h}}}\right| < \infty \) and Fubini’s Theorem, we proceed
\(c_{\mathfrak {u},{\varvec{k}}}(\tilde{{\varvec{x}}}) = \sum _{{\varvec{h}}\in \mathbb {N}^{d-\left| \mathfrak {u}\right| }} c_{({\varvec{k}},{\varvec{h}})_{\mathfrak {u}}} \, \Phi _{\mathfrak {u}^\textrm{c},{\varvec{h}}}(\tilde{{\varvec{x}}}). \qquad (1.2)\)
Hence, the size of the projected coefficients \(c_{\mathfrak {u},{\varvec{k}}}\) can be considered as an indicator for the importance of the set of indices \({\varvec{h}}=({\varvec{k}},{\varvec{h}}_{\mathfrak {u}^\textrm{c}})_{\mathfrak {u}}\) with fixed \({\varvec{k}}\) in the components \(\mathfrak {u}\).
In order to utilize this fact, we need a suitable way to approximate such projected coefficients. Here, one can apply various approaches, which we will call reconstruction methods throughout this paper. For our theoretical results, we restrict ourselves to a special kind of cubature approach in the following sections. Note that most of the theoretical results in Sect. 3 can be proven similarly for other reconstruction methods, cf. Remark 1.3.
We require for fixed \(\mathfrak {u}\) a suitable cubature rule Q with weights \(w_j \in \mathbb {C}, j=1,\ldots ,M,\) and cubature nodes \({\varvec{\xi }}_j \in \mathcal {D}_{\mathfrak {u}},j=1,\ldots ,M\), which is exact for some finite index set \(K \subset \mathbb {N}^{\left| \mathfrak {u}\right| }\) for the inner products \(\left\langle \Phi _{\mathfrak {u},{\varvec{k}}_1}, \Phi _{\mathfrak {u},{\varvec{k}}_2}\right\rangle _{\mathcal {D}_{\mathfrak {u}}}\) for all \({\varvec{k}}_1,{\varvec{k}}_2 \in K\), i.e.,
\(\sum _{j=1}^M w_j \, \Phi _{\mathfrak {u},{\varvec{k}}_1}({\varvec{\xi }}_j) \, \overline{\Phi _{\mathfrak {u},{\varvec{k}}_2}({\varvec{\xi }}_j)} = \left\langle \Phi _{\mathfrak {u},{\varvec{k}}_1}, \Phi _{\mathfrak {u},{\varvec{k}}_2}\right\rangle _{\mathcal {D}_{\mathfrak {u}}} = \delta _{{\varvec{k}}_1,{\varvec{k}}_2} \qquad (1.3)\)
holds. Additionally, we denote
for each such cubature rule.
We now define the approximated projected coefficients with anchor \(\tilde{{\varvec{x}}}\) as cubature of the integral (1.1) w.r.t. the cubature rule Q, i.e.,
\({\hat{f}}_{\mathfrak {u},{\varvec{k}}}^Q(\tilde{{\varvec{x}}}) :=\sum _{j=1}^M w_j \, f({\varvec{\xi }}_j,\tilde{{\varvec{x}}})_{\mathfrak {u}} \, \overline{\Phi _{\mathfrak {u},{\varvec{k}}}({\varvec{\xi }}_j)}. \qquad (1.5)\)
Note that the approximation of the projected coefficients \(c_{\mathfrak {u},{\varvec{k}}}(\tilde{{\varvec{x}}})\) with anchor \(\tilde{{\varvec{x}}}\) may also be realized in different ways, cf. Remark 1.3.
With similar arguments as above, we get
We assume now that (1.3) holds for some index set \(K \subset \mathbb {N}^{\left| \mathfrak {u}\right| }\) and consider another index set \(J \subset \mathbb {N}^d\) with \(J \subset K \times \mathbb {N}^{d-\left| \mathfrak {u}\right| }\). We split the sum \(\sum _{{\varvec{h}}\in \mathbb {N}^d} = \sum _{{\varvec{h}}\in J} + \sum _{{\varvec{h}}\in J^\textrm{c}}\), apply (1.3) in the first sum \(\sum _{{\varvec{h}}\in J}\) and continue for all \({\varvec{k}}\in K\) with
which is the same as (1.2) up to the projection error term
Note that \(\Psi _{\mathfrak {u},{\varvec{k}}}^{Q,J}\) vanishes for sparse functions f, i.e., if all the coefficients \(c_{{\varvec{h}}}, {\varvec{h}}\in J^\textrm{c},\) are zero. Formula (1.6) legitimizes the use of \({\hat{f}}_{\mathfrak {u},{\varvec{k}}}^Q\) instead of \(c_{\mathfrak {u},{\varvec{k}}}\) as an indicator for the importance of the \({\varvec{h}}=({\varvec{k}},{\varvec{h}}_{\mathfrak {u}^\textrm{c}})_\mathfrak {u}\), if the projection error term \(\Psi _{\mathfrak {u},{\varvec{k}}}^{Q,J}\) is suitably bounded, cf. Sect. 3.
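To make the indicator concrete, consider the Fourier system as an illustrative special case: for \(\mathfrak {u}=\{1\}\), the \(M\) equispaced nodes \(\xi _j = j/M\) with equal weights \(1/M\) form a cubature rule satisfying the exactness condition for \(K=\{0,\ldots ,M-1\}\). The sketch below (the test function and all parameters are made up) computes the approximated projected coefficients of a sparse bivariate trigonometric polynomial and recovers exactly the active first components:

```python
import numpy as np

# Sparse test function on T^2: two active terms with frequencies (3,5) and (1,2).
def f(x1, x2):
    return 2 * np.exp(2j * np.pi * (3 * x1 + 5 * x2)) \
         + 0.5 * np.exp(2j * np.pi * (1 * x1 + 2 * x2))

M = 16                    # equispaced nodes xi_j = j/M with weights 1/M:
xi = np.arange(M) / M     # a cubature rule exact for K = {0, ..., M-1}
rng = np.random.default_rng(0)
x_anchor = rng.random()   # random anchor for the remaining dimension

# Approximated projected coefficients: (1/M) sum_j f(xi_j, anchor) * conj(Phi_k(xi_j)).
fhat = np.array([np.mean(f(xi, x_anchor) * np.exp(-2j * np.pi * k * xi))
                 for k in range(M)])

detected = {k for k in range(M) if abs(fhat[k]) > 0.1}  # threshold delta_+
print(sorted(detected))   # [1, 3]: the active first components of f
```

Since f is sparse here, the projection error term vanishes and the magnitudes \(\vert {\hat{f}}_{\{1\},3}^Q\vert = 2\) and \(\vert {\hat{f}}_{\{1\},1}^Q\vert = 0.5\) equal the absolute values of the corresponding coefficients, independently of the anchor.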
Remark 1.2
Our exactness condition (1.3) can be extended to hold for all functions in the span of the respective basis functions \(\Phi _{\mathfrak {u},{\varvec{k}}}\) by linearity. It is shown in [4, Thm. 2.3] that such a condition is equivalent to fulfilling an \(L_2\)-Marcinkiewicz–Zygmund inequality with equal constants \(A=B\).
Unfortunately, this special kind of \(L_2\)-MZ inequality only holds for one of the reconstruction methods used in Sect. 4, namely the single rank-1 lattice (R1L) approach. Based on this observation, we assume that a generalization of the theoretical part in Sect. 3 is possible when assuming the reconstruction method Q to fulfill a relaxed version of the \(L_2\)-MZ inequality with constants \(A \le B\). Such a condition also holds for the Monte Carlo nodes (MC, cMC) and probably even for the multiple rank-1 lattice (MR1L, cMR1L) approaches from our numerical tests in Sect. 4.
Remark 1.3
We can write (1.5) as the matrix–vector equation \(\hat{{\varvec{f}}}_\mathfrak {u}^Q = \varvec{\Phi }_\mathfrak {u}^*{\varvec{f}}_w\) with \({\varvec{f}}_w :=(w_j f({\varvec{\xi }}_j,\tilde{{\varvec{x}}})_\mathfrak {u})_{j=1}^M\) and \(\varvec{\Phi }_\mathfrak {u}\in \mathbb {C}^{M\times \vert K \vert }\) containing the corresponding basis function values \(\Phi _{\mathfrak {u},{\varvec{k}}}({\varvec{\xi }}_j)\). While we stick to the presented cubature approach Q in the theoretical part of this paper, one can also apply other reconstruction approaches R to compute the approximated projected coefficients \(\hat{{\varvec{f}}}_\mathfrak {u}^R\), e.g., using a least squares or compressed sensing approach, cf. [14, Chap. 3] for some basic methods. Then, the approximated projected coefficients \({\hat{f}}_{\mathfrak {u},{\varvec{k}}}^{R} (\tilde{{\varvec{x}}})\) are still a good indicator for the importance of the corresponding indices \({\varvec{h}}=({\varvec{k}},{\varvec{h}}_{\mathfrak {u}^\textrm{c}})_\mathfrak {u}\) as long as the corresponding projection error term \(\left| {\hat{f}}_{\mathfrak {u},{\varvec{k}}}^{R} (\tilde{{\varvec{x}}}) - c_{\mathfrak {u},{\varvec{k}}}(\tilde{{\varvec{x}}})\right| \) is small enough.
Note that the theoretical results studied in Sect. 3 should be applied for the new projection error term \({\hat{f}}_{\mathfrak {u},{\varvec{k}}}^{R} (\tilde{{\varvec{x}}}) - c_{\mathfrak {u},{\varvec{k}}}(\tilde{{\varvec{x}}})\) instead of \(\Psi _{\mathfrak {u},{\varvec{k}}}^{Q,J}(\tilde{{\varvec{x}}})\) in this case and may need some modifications based on the properties of this new projection error term.
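A least squares variant of such a reconstruction approach R can be sketched as follows; the node distribution, candidate set and test signal are illustrative assumptions, not the methods used later in Sect. 4:

```python
import numpy as np

# Least squares reconstruction from oversampled random nodes (Fourier
# basis in one dimension for illustration).
rng = np.random.default_rng(1)
K = list(range(8))            # candidate index set
M = 40                        # number of random nodes (oversampling)
xi = rng.random(M)

Phi = np.exp(2j * np.pi * np.outer(xi, K))   # M x |K| basis matrix
coeffs = np.zeros(len(K), dtype=complex)
coeffs[2], coeffs[5] = 1.5, -0.75j           # a sparse test signal
samples = Phi @ coeffs                       # noiseless samples

# Solve min || Phi x - samples ||_2; recovers the coefficients exactly here.
fhat, *_ = np.linalg.lstsq(Phi, samples, rcond=None)
print(np.allclose(fhat, coeffs))             # True
```

The matrix \(\varvec{\Phi }\) here plays the role of \(\varvec{\Phi }_\mathfrak {u}\) above; with noisy samples or a non-trivial residual, the recovery is only approximate and the corresponding projection error term has to be controlled.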
2 The nonlinear approximation algorithm
In this section, we present our nonlinear approximation algorithm based on the concept of projected coefficients explained in Sect. 1.2. In Sect. 2.4 we also discuss different increment strategies and their possible advantages and disadvantages.
2.1 The dimension-incremental method
The full method is given in Algorithm 1. Additionally, Fig. 1 illustrates some of the first steps of the application of Algorithm 1 to an exemplary function f.
As already mentioned, our algorithm proceeds in a dimension-incremental way. Roughly speaking, it constructs the frequencies \({\varvec{k}}\) of the desired index set \(\textrm{I}\) component-by-component. To explain this concept properly, we denote with \(\mathscr {P}(\Omega )\) the power set of a set \(\Omega \) and introduce the projection operator
Hence, the set \(\mathcal {P}_{\mathfrak {u}}(\Omega )\) contains all indices \({\varvec{k}}\) which can be extended to at least one index \({\varvec{h}}\in \Omega \), i.e., \({\varvec{h}}= ({\varvec{k}},{\varvec{h}}_{\mathfrak {u}^\textrm{c}})_\mathfrak {u}\) for some \({\varvec{h}}\in \Omega \).
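For a finite index set, this projection operator is a one-liner; the helper name `P` is made up for this sketch:

```python
# P_u(Omega): all restrictions h_u of indices h in a finite set Omega
# (dimensions are 1-based, u ordered naturally as in the text).
def P(u, Omega):
    return {tuple(h[j - 1] for j in sorted(u)) for h in Omega}

Omega = {(0, 0, 1), (2, 0, 1), (2, 3, 4)}
print(sorted(P({2}, Omega)))     # [(0,), (3,)]
print(sorted(P({1, 3}, Omega)))  # [(0, 1), (2, 1), (2, 4)]
```

Every element of \(\mathcal {P}_{\mathfrak {u}}(\Omega )\) can indeed be extended to at least one index in \(\Omega \), as described above.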
2.1.1 Single component identification
Algorithm 1 starts by detecting one-dimensional index sets, which we denote by \(\textrm{I}_{\{t\}}\), for all \(t=1,\ldots ,d\) in step 1. To this end, it constructs a suitable cubature rule Q to compute the approximated projected coefficients \({\hat{f}}_{\{t\},k_t}^Q(\tilde{{\varvec{x}}})\) via (1.5) for all \(k_t \in \mathcal {P}_{\{t\}}(\Gamma )\), i.e., all possible values for the t-th component of the indices in \(\textrm{I}\) according to our search space \(\Gamma \), for some randomly chosen anchor \(\tilde{{\varvec{x}}} \in \mathcal {D}_{\{t\}^\textrm{c}}\). As explained in Sect. 1.2, these values are a suitable indicator to decide whether or not \(k_t\) appears as the t-th component of any index \({\varvec{h}}\in \textrm{I}\), i.e., whether \(k_t \in \mathcal {P}_{\{t\}}(\textrm{I})\). An index \(k_t\) is kept and therefore added to the set \(\textrm{I}_{\{t\}}\) if the absolute value of its approximated projected coefficient \({\hat{f}}_{\{t\},k_t}^Q(\tilde{{\varvec{x}}})\) is larger than the so-called detection threshold \(\delta _+ \in \mathbb {R}^+\), as can be seen in Fig. 1a. The right choice of \(\delta _+\) and the connection between the detection threshold \(\delta _+\) and the true size of the basis coefficients \(c_{\varvec{h}}\) with \({\varvec{h}}\in \textrm{I}\) are given in Sect. 3. To avoid the detection of unnecessarily many indices \(k_t\), we also use a so-called sparsity parameter \(s_\textrm{local}\) and consider only the \(s_\textrm{local}\)-largest approximated projected coefficients \({\hat{f}}_{\{t\},k_t}^Q(\tilde{{\varvec{x}}})\) above the detection threshold \(\delta _+\). Finally, the random choice of the anchor \(\tilde{{\varvec{x}}}\) may result in some annihilations, i.e., small approximated projected coefficients \({\hat{f}}_{\{t\},k_t}^Q(\tilde{{\varvec{x}}})\) even though the corresponding basis coefficients \(c_{\varvec{h}}\) with \({\varvec{h}}_{\{t\}} = k_t\) are large.
Therefore, we repeat the choice of \(\tilde{{\varvec{x}}}\), the computation of the approximated projected coefficients \({\hat{f}}_{\{t\},k_t}^Q(\tilde{{\varvec{x}}})\) and the addition of important indices \(k_t\) to the index set \(\textrm{I}_{\{t\}}\) now \(r \in \mathbb {N}\) times, which is also shown in Fig. 1a. Hence, we call the parameter r the number of detection iterations. Choosing r large enough, cf. Sect. 3, ensures that each index \(k_t \in \mathcal {P}_{\{t\}}(\textrm{I})\) is detected in at least one detection iteration with high probability and therefore \(\mathcal {P}_{\{t\}}(\textrm{I}) \subset \textrm{I}_{\{t\}}\).
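The detection loop of step 1 can be sketched as follows for a single dimension t, again using a univariate Fourier projection as an illustrative stand-in for the cubature method Q (the test function, parameters and thresholds are made up):

```python
import numpy as np

# Projected coefficient of the first variable via equispaced cubature nodes.
def projected_coeff(f, k, xi, anchor):
    return np.mean(f(xi, anchor) * np.exp(-2j * np.pi * k * xi))

def f(x1, x2):  # sparse test function with active first components {4, 2}
    return np.exp(2j * np.pi * (4 * x1 + 7 * x2)) \
         + 0.3 * np.exp(2j * np.pi * (2 * x1 + 1 * x2))

M, N, r, s_local, delta = 16, 16, 5, 3, 0.1
xi = np.arange(M) / M
rng = np.random.default_rng(2)

I_t = set()
for _ in range(r):                  # r detection iterations
    anchor = rng.random()           # fresh random anchor each iteration
    vals = {k: abs(projected_coeff(f, k, xi, anchor)) for k in range(N)}
    big = [k for k in sorted(vals, key=vals.get, reverse=True)[:s_local]
           if vals[k] > delta]      # s_local-largest above delta_+
    I_t |= set(big)                 # union over all detection iterations

print(sorted(I_t))                  # [2, 4]
```

The union over the r iterations mitigates the annihilation effect described above: an index missed for one anchor can still be caught for another.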
2.1.2 Coupled component identification
In each iteration \(t=2,\ldots ,d\) of step 2 of Algorithm 1, we have already detected the previous set \(\textrm{I}_{\{1,\ldots ,t-1\}}\) and consider \(\mathfrak {u}= \{1,\ldots ,t\}\). As in step 1, the aim is the construction of an index set \(\textrm{I}_\mathfrak {u}\) such that \(\mathcal {P}_\mathfrak {u}(\textrm{I}) \subset \textrm{I}_\mathfrak {u}\) holds with high probability. We construct our so-called candidate set \(K \supset \mathcal {P}_\mathfrak {u}(\textrm{I})\) from two parts. The first part is the product set \(\textrm{I}_{\{1,\ldots ,t-1\}} \times \textrm{I}_{\{t\}}\): the first factor hopefully contains all \({\varvec{k}}\in \mathcal {P}_{\{1,\ldots ,t-1\}}(\textrm{I})\) and the second factor all \(k_t \in \mathcal {P}_{\{t\}}(\textrm{I})\). Hence, the combined set \(\textrm{I}_{\{1,\ldots ,t-1\}} \times \textrm{I}_{\{t\}}\) is an obvious choice when we are looking for indices \({\varvec{k}}\in \mathcal {P}_\mathfrak {u}(\textrm{I})\). Two such combined sets are shown in Fig. 1b and 1d. The second part is the projection \(\mathcal {P}_\mathfrak {u}(\Gamma )\) as in step 1. There is no need to consider any \({\varvec{k}}\not \in \mathcal {P}_\mathfrak {u}(\Gamma )\), since in this case \(({\varvec{k}},{\varvec{h}})_\mathfrak {u}\not \in \Gamma \supset \textrm{I}\) for any \({\varvec{h}}\in \mathbb {N}^{d-t}\) anyway. Therefore, the candidate set K is chosen as the intersection of those two sets, i.e., \(K = \left( \textrm{I}_{\{1,\ldots ,t-1\}} \times \textrm{I}_{\{t\}}\right) \cap \mathcal {P}_\mathfrak {u}(\Gamma )\).
Now, we construct a suitable cubature rule Q for the set K and proceed as in the first step of Algorithm 1: We choose an anchor \(\tilde{{\varvec{x}}} \in \mathcal {D}_{\mathfrak {u}^\textrm{c}}\) at random, compute the corresponding approximated projected coefficients \({\hat{f}}_{\mathfrak {u},{\varvec{k}}}^Q(\tilde{{\varvec{x}}})\) and put the indices \({\varvec{k}}\) of the (up to) \(s_\textrm{local}\)-largest coefficients, which are still larger than the detection threshold \(\delta _+\), into the set \(\textrm{I}_\mathfrak {u}\). For \(t=2\), this step is illustrated in Fig. 1c. Finally, we again repeat this procedure r times to ensure the detection of all of the desired indices with high probability. Note that in the final iteration \(t=d\) no more than one detection iteration is needed, since the cubature nodes \({\varvec{\xi }}_j\) are already \(d\)-dimensional, so there is no randomly chosen anchor \(\tilde{{\varvec{x}}}\). Because of this, and since the output \(\textrm{I}_\mathfrak {u}\) of this final iteration is also the final output \(\textrm{I}\), one might want to use another, smaller sparsity parameter s than in the previous steps. Finally, note that the approximated projected coefficients computed in this step are already approximations of the true coefficients \(c_{\varvec{h}}, {\varvec{h}}\in \textrm{I}\), so it is not necessary to recompute these quantities in step 3 of Algorithm 1.
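The candidate set construction of step 2 amounts to a product of the previously detected sets intersected with the projected search space; a small sketch with made-up detected sets:

```python
# K = (I_{1,...,t-1} x I_{t}) ∩ P_u(Gamma)
def candidate_set(I_prev, I_t, Gamma_proj):
    return {(*k, kt) for k in I_prev for kt in I_t} & Gamma_proj

I_prev = {(0,), (3,)}      # detected set for the dimensions {1, ..., t-1}
I_t = {0, 2, 5}            # detected set for dimension t
# Projection P_u(Gamma) of a full-grid search space {0,...,4}^d onto u = {1, 2}:
Gamma_proj = {(k1, k2) for k1 in range(5) for k2 in range(5)}

K = candidate_set(I_prev, I_t, Gamma_proj)
print(sorted(K))   # [(0, 0), (0, 2), (3, 0), (3, 2)]
```

The intersection with \(\mathcal {P}_\mathfrak {u}(\Gamma )\) discards candidates like \((0,5)\) and \((3,5)\) that cannot belong to any index in \(\Gamma \).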
2.2 Complexity
The sampling complexity as well as the computational complexity of Algorithm 1 obviously depend strongly on the function space \(L_2(\mathcal {D},\mu )\) as well as on the reconstruction methods used in the corresponding function spaces \(L_2(\mathcal {D}_{\mathfrak {u}},\mu _{\mathfrak {u}})\). To simplify the brief consideration of the complexity in this section, we assume that all function spaces \(L_2(\mathcal {D}_j,\mu _j), j=1,\ldots ,d,\) are equal and that we can apply the cubature method Q for each product of those spaces.
In each part of Algorithm 1 the cubature method Q, see (1.5), chooses a number of sampling nodes, which depends on the number of candidates \(\left| K\right| \), the current dimensionality \(\left| \mathfrak {u}\right| \in \lbrace 1,\ldots ,d \rbrace \) and the cubature method Q itself. Hence, we denote this sampling amount by \(S_Q(\left| K\right| ,\left| \mathfrak {u}\right| )\). Note that this adds the implicit assumption that Q acts independently of the structure of the candidate set K.
In step 1, we have the dimensionality \(\left| \mathfrak {u}\right| =\left| \lbrace t \rbrace \right| =1\) and the candidate set \(K = \mathcal {P}_{\mathfrak {u}}(\Gamma ) = \mathcal {P}_{\lbrace t \rbrace }(\Gamma )\). For most common choices of \(\Gamma \subset \mathbb {N}^d\), cf. Sect. 2.3, the one-dimensional projections \(\mathcal {P}_{\lbrace t \rbrace }(\Gamma )\) are just the sets \(\lbrace 0,\ldots ,N-1\rbrace \) for some extension \(N-1\), so \(\left| K\right| =N\). Since we sample r times with different \(\tilde{{\varvec{x}}}\) in each dimension \(t = 1,\ldots ,d\), the number of sampling nodes in step 1 of Algorithm 1 is then \(drS_Q(N,1)\).
In step 2, we sample r times for each dimensionality \(\left| \mathfrak {u}\right| = 2,\ldots ,d-1\) and once for \(\left| \mathfrak {u}\right| =d\). The size of the t-th candidate set \(\textrm{I}_{\{1,\ldots ,t-1\}} \times \textrm{I}_{\{t\}}\) for \(t=2,\ldots ,d\) is bounded by \(r^2 s_\textrm{local}^2\), since both index sets contain at most \(r s_\textrm{local}\) indices (if the detected sets for each detection iteration were pairwise disjoint). The intersection with the projection of \(\Gamma \) may only decrease the true number of samples. Hence, we end up with at most \(S_Q(r^2s_\textrm{local}^2,d) + \sum _{t=2}^{d-1}rS_Q(r^2s_\textrm{local}^2,t)\) samples for step 2.
Finally, if \(s \sim s_\textrm{local}\), the sampling complexity of Algorithm 1 is then
\(drS_Q(N,1) + \sum _{t=2}^{d-1} rS_Q(r^2s^2,t) + S_Q(r^2s^2,d),\)
which is bounded by \(\mathcal {O}\left( drS_Q(r^2s^2,d)\right) \) if \(r^2s^2 \ge N\) and \(S_Q(*,d)\ge S_Q(*,t)\) hold for each \(t=1,\ldots ,d\).
For the computational complexity, we assume that all other steps of Algorithm 1, such as the sampling itself, the choice of the random anchors \(\tilde{{\varvec{x}}}\) or the construction of the candidate sets K, are negligible. If we denote the computational complexity of the simultaneous numerical integration of all \(\hat{f}_{\mathfrak {u},{\varvec{k}}}^Q\) using the cubature method Q by some expression \(T_Q(\left| K\right| ,\left| \mathfrak {u}\right| )\) and again assume no dependency on the structure of K, we obtain the similar expression
\(drT_Q(N,1) + \sum _{t=2}^{d-1} rT_Q(r^2s^2,t) + T_Q(r^2s^2,d),\)
or with similar assumptions as before \(\mathcal {O}\left( drT_Q(r^2s^2,d)\right) \).
2.3 A priori information
As already stated several times, we need \(\Gamma \) to be large enough such that the desired indices we want to detect are all contained in it. Still, Algorithm 1 may benefit from a better choice of \(\Gamma \), i.e., from additional a priori information about the function f, since the number of candidates, especially in the higher-dimensional steps, can be reduced significantly. While a full grid approach
with large enough n will always work, for smoother functions f with rapidly decaying basis coefficients \(c_{{\varvec{k}}}\) a weighted hyperbolic cross approach
with weight \({\varvec{\gamma }}= (\gamma _j)_{j=1}^d \in (0,\infty )^d\) is preferable. But even if the decay of the coefficients is relatively slow, an \(\ell _p\) ball approach
with weight \({\varvec{\gamma }}= (\gamma _j)_{j=1}^d \in (0,\infty )^d\) can also reduce the number of samples and the computation time needed. For \(p=\infty \), \(\Gamma \) is the (weighted) full grid.
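As a rough illustration of the three choices, the following sketch constructs them in a small two-dimensional setting. The helper names are our own and the cutoff conventions follow the standard weighted definitions, which may differ in detail from the precise ones used in this work.

```python
from itertools import product

def full_grid(d, n):
    # full grid: all indices with components in {0,...,n}
    return {k for k in product(range(n + 1), repeat=d)}

def hyperbolic_cross(d, n, gamma):
    # weighted hyperbolic cross: prod_j max(1, k_j / gamma_j) <= n
    out = set()
    for k in product(range(n + 1), repeat=d):
        w = 1.0
        for kj, gj in zip(k, gamma):
            w *= max(1.0, kj / gj)
        if w <= n:
            out.add(k)
    return out

def lp_ball(d, n, gamma, p):
    # weighted l_p ball: (sum_j (k_j / gamma_j)^p)^(1/p) <= n
    out = set()
    for k in product(range(n + 1), repeat=d):
        if sum((kj / gj) ** p for kj, gj in zip(k, gamma)) ** (1.0 / p) <= n:
            out.add(k)
    return out

g = (1.0, 1.0)
print(len(full_grid(2, 4)), len(hyperbolic_cross(2, 4, g)), len(lp_ball(2, 4, g, 1)))
```

Already in this tiny example the hyperbolic cross and \(\ell _1\) ball contain noticeably fewer candidates than the full grid, and the gap widens drastically with the dimension d.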
Another reasonable choice for practical examples comes from the sparsity-of-effects principle, which states that a system is usually dominated by main effects and low-order interactions. In our case, the principle means that the indices \({\varvec{k}}\) with a rather small number of nonzero components \(\left\Vert {\varvec{k}}\right\Vert _0\) belong to the largest basis coefficients \(c_{\varvec{k}}\), as we already noticed when working with parametric PDEs in [31]. This is also one of the main principles behind various low-order methods like the popular ANOVA decomposition, cf. [33, 44, 49] and the references therein, or the SHRIMP method, cf. [52]. For such a case, a low-order approach
with superposition dimension \(\tilde{d} \in \mathbb {N}\) should be combined with any of the previous choices.
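The low-order reduction itself is a simple filter on any of the previous search spaces; a minimal sketch, with our own helper name:

```python
from itertools import product

def low_order(gamma_set, d_tilde):
    # keep only indices with at most d_tilde nonzero components
    return {k for k in gamma_set if sum(1 for kj in k if kj != 0) <= d_tilde}

grid = set(product(range(3), repeat=3))   # 3-dimensional full grid with n = 2
print(len(grid), len(low_order(grid, 1)))
```

Under the sparsity-of-effects principle, such a filter with small \(\tilde{d}\) removes the vast majority of candidates while keeping the indices expected to carry the largest coefficients.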
Table 1 shows the size of the search space \(\Gamma \) in \(d=10\) dimensions for some examples with weights \({\varvec{\gamma }}= {\varvec{1}}= (1,\ldots ,1)^\top \) and their reduced versions \(\Gamma _{\tilde{d}}\). Figure 2 illustrates three different index sets in two dimensions.
2.4 Alternative increment strategies
One main feature of the dimension-incremental method is the combination of the detected, one-dimensional index set projections \(\textrm{I}_{\{t\}}, t=1,\ldots ,d\). Algorithm 1 realizes this in the most intuitive way by adding the dimension t to the already detected set \(\textrm{I}_{\{1,\ldots ,t-1\}}\) in each dimension increment \(t=2,\ldots ,d\). This classical approach, which we will call the one-by-one strategy, is sketched in Fig. 3a for \(d=9\). The same approach was exploited in [46], where so-called reconstructing rank-1 lattices were used for the computation of the projected coefficients \({\hat{f}}_{\mathfrak {u},{\varvec{k}}}^Q\). Therein, these lattices were computed component-by-component and hence perfectly matched this incremental strategy. Another advantage of the one-by-one strategy can be seen when we combine the for-loop over t in step 1 of Algorithm 1 with the for-loop over t in step 2. This is possible since each one-dimensional projection set \(\textrm{I}_{\{t\}}\) is only used in the corresponding t-th iteration in step 2. Hence, we only need one such one-dimensional projection set \(\textrm{I}_{\{t\}}\) at a time and may save additional memory by simply overwriting the previous one in the next iteration \(t+1\).
Obviously, there is no need to limit ourselves to this straightforward strategy in general. In the remaining part of this section, we therefore discuss some alternative increment strategies as well as possible advantages and disadvantages of these approaches. Note that all those strategies merely yield certain improvements, but are not “optimal” in any sense. Even worse, such “optimality” heavily depends on the given problem and the corresponding dimension d, the cubature rules Q and the used algorithm parameters. Hence, it is very tricky to come up with an overall good strategy for every possible setting. Also, readers need to decide on their own which kind of “optimality” they are even aiming for, e.g., small complexities of the cubature rules, memory efficiency or possible parallelizations.
2.4.1 Dyadic strategy
In step 2 of Algorithm 1, we most importantly construct the cubature rule Q and compute the projected coefficients \({\hat{f}}_{\mathfrak {u},{\varvec{k}}}^Q\). Since the values for \(\tilde{{\varvec{x}}}\) are random but fixed, these are basically \(\left\vert \mathfrak {u}\right\vert \)-dimensional problems. Depending on the used cubature rule, the size \(\left\vert \mathfrak {u}\right\vert \) might heavily influence the number of cubature nodes M as well as the computational costs.
Using the one-by-one strategy, we consider \(d-1\) dimension-incremental steps, where each dimensionality \(\left\vert \mathfrak {u}\right\vert = t \in \{2,\ldots ,d\}\) appears exactly once. The dyadic strategy now aims for more incremental steps with lower dimensionality while keeping the overall number of steps constant. This strategy combines the two projected index sets \(\textrm{I}_\mathfrak {u}\) and \(\textrm{I}_\mathfrak {v}\) with the smallest dimensionalities \(\left\vert \mathfrak {u}\right\vert \) and \(\left\vert \mathfrak {v}\right\vert \) in each step. If there are several sets of the same dimensionality, e.g., at the beginning when there are d sets with dimension \(\left\vert \mathfrak {u}\right\vert = 1\), the set is chosen randomly among these candidates. Rearranging these dimension-incremental steps into stages as in Fig. 3b, the dyadic structure can be seen. Note that for \(d\not =2^k, k\in \mathbb {N}\), some stages have to keep one projected index set untouched since there was an odd number of sets to combine, which will be the projected index set \(\textrm{I}_\mathfrak {u}\) with the highest dimensionality \(\left\vert \mathfrak {u}\right\vert \). This is the case for \(\textrm{I}_{\{9\}}\) in the first stage, \(\textrm{I}_{\{5,6\}}\) in the second stage and \(\textrm{I}_{\{1,\ldots ,4\}}\) in the third stage in Fig. 3b, visualized using the dashed arrow.
As mentioned above, this strategy reduces the dimensionalities in many steps tremendously for large d. Even for the relatively small \(d=9\) in Fig. 3b, the dyadic strategy uses four 2-dimensional steps and only one 3-, 4-, 5- and 9-dimensional step each instead of one t-dimensional step for each \(t=2,\ldots ,9\) as in the one-by-one strategy. The additional computational effort for the realization of this strategy is also relatively small, since it depends only on the particular d used and can even be precomputed. Finally, many dimension-incremental steps can be performed in parallel, as they are not dependent on each other, which allows additional time savings. Unfortunately, the dyadic strategy cannot be implemented as memory-friendly as the one-by-one strategy, since some of the projected index sets \(\textrm{I}_\mathfrak {u}\) block memory for several steps while they are not picked for the next combination.
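The dyadic combination schedule can be sketched as follows; for simplicity, ties between sets of the same dimensionality are broken here by sort stability instead of randomly, and we track only the supports \(\mathfrak {u}\), not the index sets themselves.

```python
def dyadic_schedule(d):
    # start with the d one-dimensional supports {1},...,{d}
    sets = [frozenset({t}) for t in range(1, d + 1)]
    steps = []
    while len(sets) > 1:
        sets.sort(key=len)                  # smallest dimensionalities first
        u, v = sets.pop(0), sets.pop(0)     # combine the two smallest sets
        steps.append((sorted(u), sorted(v)))
        sets.append(u | v)                  # merged support for later stages
    return steps

steps = dyadic_schedule(9)
print(len(steps))   # still d - 1 = 8 combination steps, as for one-by-one
```

For \(d=9\), the merged supports have sizes 2, 2, 2, 2, 3, 4, 5 and 9, matching the step dimensionalities read off from Fig. 3b.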
2.4.2 Data-driven one-by-one strategy
While the dyadic strategy aims for smaller dimensionalities \(\left\vert \mathfrak {u}\right\vert \) in the dimension-incremental steps, the size of the candidate set \(K \subset \mathbb {N}^{\left\vert \mathfrak {u}\right\vert }\), i.e., how many approximated projected coefficients \({\hat{f}}_{\mathfrak {u},{\varvec{k}}}^Q\) we need to compute, is probably another crucial factor influencing the performance of the cubature rules Q and hence the overall performance of our algorithm. In general, we have \(K = \left( \textrm{I}_\mathfrak {u}\times \textrm{I}_\mathfrak {v}\right) \cap \mathbb {P}_{\mathfrak {u}\cup \mathfrak {v}}(\Gamma )\), e.g., \(\textrm{I}_\mathfrak {u}= \textrm{I}_{\{1,\ldots ,t-1\}}\) and \(\textrm{I}_\mathfrak {v}= \textrm{I}_{\{t\}}\) in a dimension increment in Algorithm 1. The size \(\left\vert K\right\vert \) depends mainly on the sizes of the projected index sets \(\textrm{I}_\mathfrak {u}\) and \(\textrm{I}_\mathfrak {v}\) it is built from. Therefore, we now present two so-called data-driven strategies, where these sizes \(\left\vert \textrm{I}_\mathfrak {u}\right\vert \) are examined before each dimension-incremental step and it is then decided which \(\textrm{I}_\mathfrak {u}\) and \(\textrm{I}_\mathfrak {v}\) to choose for the next step. Note that an investigation of the sizes of all possible K instead of just the sizes of all available \(\textrm{I}_\mathfrak {u}\) might be even more favorable, especially for challenging choices of \(\Gamma \), but also needs even more computational effort and is therefore not considered herein.
The following approach is based on the classical one-by-one strategy and is therefore called the data-driven one-by-one strategy. We start with the computation of all the one-dimensional projected index sets \(\textrm{I}_{\{t\}}, t=1,\ldots ,d\). We proceed as in the one-by-one strategy, but instead of working through the sets lexicographically, i.e., from \(\textrm{I}_{\{1\}}\) to \(\textrm{I}_{\{d\}}\), we rearrange them in descending order of their size. In particular, we go from \(\textrm{I}_{\{t_1\}}\) to \(\textrm{I}_{\{t_d\}}\) with \(t_i \in \{1,\ldots ,d\}\) for all \(i \in \{1,\ldots ,d\}\) and \(\vert \textrm{I}_{\{t_j\}}\vert \ge \vert \textrm{I}_{\{t_i\}}\vert , 1\le j \le i \le d\). For instance, in Fig. 4a the set \(\textrm{I}_{\{5\}}\) is the largest one and therefore the starting point of the data-driven one-by-one strategy.
The advantage of this approach compared to the classical one-by-one strategy is that the higher-dimensional candidate sets K in the later dimension-incremental steps are probably smaller, and hence the construction of the cubature rule Q as well as the computation of the coefficients might be considerably cheaper in terms of sampling points and computation time.
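The reordering itself is a plain sort by descending set size; a minimal sketch with purely illustrative sizes, chosen so that dimension 5 carries the largest set as in Fig. 4a:

```python
# illustrative sizes |I_{t}| of the one-dimensional detections, t = 1,...,9
sizes = {1: 4, 2: 7, 3: 2, 4: 7, 5: 9, 6: 1, 7: 3, 8: 5, 9: 6}

# process dimensions in descending order of their detected set size;
# ties keep their original order (Python's sort is stable)
order = sorted(sizes, key=lambda t: sizes[t], reverse=True)
print(order)   # dimension 5 (largest set) comes first
```

The dimension increments then simply follow this order instead of \(1,2,\ldots ,d\).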
2.4.3 Data-driven dyadic strategy
Finally, the data-driven dyadic strategy combines the advantages of both the dyadic strategy and the data-driven one-by-one strategy, illustrated for \(d=9\) in Fig. 4b. While the rearrangement into stages in the dyadic strategy was just for visualization, it is an essential part of the strategy now. In the first stage, we investigate the size of all one-dimensional sets \(\textrm{I}_{\{t\}}, t=1,\ldots ,d\), and perform \(\left\lfloor d/2\right\rfloor \) dimension-incremental steps, where we combine the largest set with the smallest one, the second largest with the second smallest and so on. In the next stage, we then take a look at all those new, two-dimensional index sets (and the possibly leftover one-dimensional set from the first stage if d is odd) and perform multiple dimension-incremental steps again, using the same criterion as in the first stage. This strategy is then repeated as often as needed, i.e., until we end up with the full index set \(\textrm{I}_{\{1,\ldots ,d\}}\). Note that the dimensionality \(\left\vert \mathfrak {u}\right\vert \) of the sets is not considered explicitly in this kind of strategy, but higher-dimensional sets are more likely to be larger as well, so the dimensionality is involved implicitly. If the number of available sets is odd in any stage, the median-sized set is just kept as it is for the next stage, e.g., the sets \(\textrm{I}_{\{9\}}\), \(\textrm{I}_{\{7,8\}}\) and \(\textrm{I}_{\{3,5,9\}}\) in the first three stages in Fig. 4b.
As for the dyadic strategy, we end up with fewer high-dimensional combination steps than for the one-by-one or data-driven one-by-one strategy, and these steps can again be performed in parallel. On the other hand, the additional size criterion in each stage avoids cases where two very large sets are combined and hence the size of the corresponding candidate set K grows unnecessarily large. This can easily happen in the dyadic strategy, since it only considers the dimensionalities \(\left\vert \mathfrak {u}\right\vert \) of the sets \(\textrm{I}_{\mathfrak {u}}\) and also chooses randomly among sets with the same dimensionality.
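A single stage of this pairing rule can be sketched as follows; the concrete sets are purely illustrative, and only their sizes matter for the pairing.

```python
def stage_pairs(sets_by_size):
    # sort the currently available index sets by size, then pair the
    # largest with the smallest, the second largest with the second
    # smallest, and so on; an odd count leaves the median-sized set over
    order = sorted(sets_by_size, key=len)
    pairs, leftover = [], None
    while len(order) > 1:
        small, large = order.pop(0), order.pop()
        pairs.append((small, large))
    if order:
        leftover = order[0]    # median-sized set survives to the next stage
    return pairs, leftover

sets_ = [[1], [2, 3], [4, 5, 6, 7], [8], [9, 10, 11]]
pairs, leftover = stage_pairs(sets_)
print(len(pairs), leftover)
```

Repeating such stages until a single set remains yields the full schedule of Fig. 4b.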
3 Theoretical detection guarantee
In this section, we show a bound on the number of detection iterations r in Algorithm 1 such that we can ensure, with high probability, the successful detection of all indices \({\varvec{k}}\) belonging to basis coefficients \(c_{{\varvec{k}}}\) whose magnitude is larger than an absolute threshold \(\delta \). To this end, we follow the main steps in [30] and generalize their theoretical results. As explained in Sect. 1.1, we consider smooth enough, multivariate functions \(f \in L_2(\mathcal {D},\mu )\) of the form
for some coefficients \(\lbrace c_{{\varvec{k}}}\rbrace _{{\varvec{k}}\in \mathbb {N}^d}\). Further, we denote with \(\textrm{I}_{\delta } \subset \mathbb {N}^d\) the unknown, finite index set such that
holds and \(\Gamma \subset \mathbb {N}^d\) as before the suitable search domain containing this index set \(\textrm{I}_{\delta }\), i.e., \(\textrm{I}_\delta \subset \Gamma \).
We compute the adaptive approximation in d dimension increment steps in Algorithm 1. In each of the dimension increment steps at most three probabilistic substeps are performed:

The detection of the one-dimensional projections \(\textrm{I}_{\{t\}}\) in step 1, which is successful if the event \(E_{1,t} :=\left\{ \mathcal {P}_{\mathfrak {u}}(\textrm{I}_\delta ) \subset \textrm{I}_{\mathfrak {u}} \right\} \) with \(\mathfrak {u}= \{t\}\) occurs.

The (possibly probabilistic) construction of the cubature rule Q for some index set K in step 2, i.e., the successful computation of \(M, (w_j)_{j=1}^M\) and \(({\varvec{\xi }}_j)_{j=1}^M\). We define
$$\begin{aligned} E_{2,t} :=\left\{ \text {Successful construction of }Q \text { for } K \right\} . \end{aligned}$$ 
The detection of the multidimensional projections \(\textrm{I}_{\{1,\ldots ,t\}}\) in step 2, which is successful if the event \(E_{3,t} :=\left\{ \mathcal {P}_{\mathfrak {u}} (\textrm{I}_\delta ) \subset \textrm{I}_{\mathfrak {u}} \right\} \) with \(\mathfrak {u}= \{1,\ldots ,t\}\) occurs.
If each of these probabilistic substeps is successful, we detect all indices from \(\textrm{I}_\delta \). We use the union bound to estimate the corresponding probability
We aim for a failure probability \(\varepsilon \in (0,1)\) of the whole algorithm. We split this up such that each probabilistic substep has an equal upper bound on its failure probability of \(\varepsilon /(3d)\). Hence, we now estimate the probabilities \(\mathbb {P}(E_{1,t}^\textrm{c})\) and \(\mathbb {P}(E_{3,t}^\textrm{c})\). Upper bounds on \(\mathbb {P}(E_{2,t}^\textrm{c})\) depend on the used cubature rule Q and its construction and are therefore not considered here.
3.1 Failure probability estimates
First, we recall the definition and estimate of the approximated projected coefficients (1.5) as well as the projection error term (1.7) from Sect. 1.2 and apply them for \(J = \textrm{I}_\delta \). We use

\(\mathfrak {u}=\{t\}\) and \(K = \mathcal {P}_{\mathfrak {u}}(\Gamma ) \subset \mathbb {N}\) for the one-dimensional projections in step 1 and

\(\mathfrak {u}= \{1,\ldots ,t\}\) and \(K =(\textrm{I}^{(1,\ldots ,t-1)} \times \textrm{I}^{(t)})\cap \mathcal {P}_{\mathfrak {u}}(\Gamma ) \subset \mathbb {N}^t\) for the multidimensional projections in step 2
of Algorithm 1. In each case, we get for \({\varvec{k}}\in K\) the formula
Note that the first part of (3.3) is independent of the used cubature rule Q. Hence, the approximated projected coefficients \({\hat{f}}_{\mathfrak {u},{\varvec{k}}}^Q\) are also independent of Q up to the projection error term \(\Psi _{\mathfrak {u},{\varvec{k}}}^{Q,J}\), which depends on the basis coefficients of f in \(\textrm{I}_\delta ^\textrm{c}\), i.e., all coefficients \(c_{\varvec{k}}\) with absolute value smaller than \(\delta \).
The following lemma, based on [30, Lem. 4], gives an estimate on the probability that such sums (3.3) take small values for randomly drawn anchors \(\tilde{{\varvec{x}}}\).
Lemma 3.1
Consider \(\mathfrak {v}\subset \lbrace 1,\ldots ,d \rbrace \) and the space \(\mathcal {D}_\mathfrak {v}:=\mathop {{\times }}_{j\in \mathfrak {v}} \mathcal {D}_j \subset \mathbb {R}^{\left\vert \mathfrak {v}\right\vert }\) with the corresponding product measure \(\mu _\mathfrak {v}\) and basis functions \(\Phi _{\mathfrak {v},{\varvec{h}}}\), which are bounded by the finite constant \(B_\mathfrak {v}\), cf. Sect. 1.2.
Let a function \(g: \mathcal {D}_\mathfrak {v}\rightarrow \mathbb {C},\, g({\varvec{x}}) :=\sum _{{\varvec{h}}\in \tilde{\textrm{I}}} \hat{g}_{{\varvec{h}}} \Phi _{\mathfrak {v},{\varvec{h}}}({\varvec{x}}) \not \equiv 0,\, \tilde{\textrm{I}} \subset \mathbb {N}^{\left\vert \mathfrak {v}\right\vert }, \vert \tilde{\textrm{I}}\vert < \infty ,\) be given and let \(\Psi : \mathcal {D}_\mathfrak {v}\rightarrow \mathbb {C}\) be some function with \(\left\Vert \Psi \right\Vert _{L_1(\mathcal {D}_\mathfrak {v})} < \delta _\Psi \). Moreover, let \(X_j\in \mathcal {D}_j,\, j \in \mathfrak {v},\) be independent, \(\mu _j\)-distributed random variables and denote by \({\varvec{X}}:=(X_j)_{j\in \mathfrak {v}} \in \mathcal {D}_\mathfrak {v}\) the corresponding random vector.
If \(\max _{{\varvec{h}}\in \tilde{\textrm{I}}}\vert \hat{g}_{{\varvec{h}}}\vert > B_\mathfrak {v}(\delta _+ + \delta _\Psi )\) for some \(\delta _+ >0\), then
Choosing r random vectors \({\varvec{X}}_1,\ldots ,{\varvec{X}}_r \in \mathcal {D}_\mathfrak {v}\) independently, we observe
Proof
We proceed as in [30] and refer to the lower bound
from [38, Par. 9.3.A]. Applying this to the even function \(h(t) :=\left\vert t\right\vert \), which is non-decreasing on \([0,\infty )\), and \(Y=(g+\Psi )({\varvec{X}})\) leads to
and consequently
Since
for all \({\varvec{h}}\in \tilde{\textrm{I}}\), we have
Together with the estimate
we conclude
Using the assumption \(\max _{{\varvec{h}}\in \tilde{\textrm{I}}} \left\vert \hat{g}_{{\varvec{h}}}\right\vert > B_\mathfrak {v}(\delta _+ + \delta _\Psi )\), the estimate (3.4) holds and (3.5) then follows directly. \(\square \)
In Algorithm 1, we compute the approximated projected coefficients \({\hat{f}}_{\mathfrak {u},{\varvec{k}}}^Q (\tilde{{\varvec{x}}})\) for all \({\varvec{k}}\) in our candidate sets \(\mathcal {P}_\mathfrak {u}(\Gamma )\) in step 1 and \(\left( \textrm{I}_{\{1,\ldots ,t-1\}} \times \textrm{I}_{\{t\}} \right) \cap \mathcal {P}_{\mathfrak {u}}(\Gamma )\) in step 2. Now, we apply Lemma 3.1 to those coefficients with \({\varvec{k}}\in \mathcal {P}_\mathfrak {u}(\textrm{I}_\delta )\) and \({\varvec{k}}\in \left( \textrm{I}_{\{1,\ldots ,t-1\}} \times \textrm{I}_{\{t\}} \right) \cap \mathcal {P}_{\mathfrak {u}}(\textrm{I}_\delta )\), respectively, i.e., those projected coefficients we want to detect. This yields bounds on the probability that they are below the detection threshold \(\delta _+\) and therefore not detected by Algorithm 1.
Corollary 3.2
Let a threshold value \(\delta > 0\) and a smooth enough function \(f \in L_2(\mathcal {D},\mu )\) be given. We consider the finite index set \(\textrm{I}_\delta \) such that (3.1) holds.

For fixed \(1 \le t \le d\), \(\mathfrak {u}= \{t\}\), we denote \(K(S) :=\mathcal {P}_\mathfrak {u}(S), S \in \{\textrm{I}_\delta ,\Gamma \}\) and compute the one-dimensional approximated projected coefficients for the t-th component and

for fixed \(1< t < d\), \(\mathfrak {u}= \{1,\ldots ,t\}\), we denote \(K(S) :=\left( \textrm{I}_{\{1,\ldots ,t-1\}} \times \textrm{I}_{\{t\}} \right) \cap \mathcal {P}_{\mathfrak {u}}(S), S \in \{\textrm{I}_\delta ,\Gamma \}\) and compute the multidimensional approximated projected coefficients for the components \((1,\ldots ,t)\)
using a cubature rule Q with (1.3) and (1.4) by
where the anchors \(\tilde{{\varvec{x}}}_1,\ldots ,\tilde{{\varvec{x}}}_r \in \mathcal {D}_{\mathfrak {u}^\textrm{c}}, r \in \mathbb {N},\) are independently chosen at random according to the corresponding product measure \(\mu _{\mathfrak {u}^\textrm{c}}\). Further, we assume that there exists a \(\delta _\Psi > 0\) such that \(\left\Vert \Psi _{\mathfrak {u},{\varvec{k}}}^{Q,\textrm{I}_\delta }\right\Vert _{L_1(\mathcal {D}_{\mathfrak {u}^\textrm{c}})} < \delta _\Psi \) holds for all \({\varvec{k}}\in K(\textrm{I}_\delta )\).
Then, for \(\delta _+>0\) with \(B(\delta _+ + \delta _\Psi ) < \min _{{\varvec{k}}\in \textrm{I}_\delta } \left\vert c_{\varvec{k}}\right\vert \) and \({\varvec{k}}\in K(\textrm{I}_\delta )\), the probability estimate
holds, where
holds.
Proof
The estimate
or
holds due to (3.4) in Lemma 3.1 and the estimate
Repeating the computation of \({\hat{f}}_{\mathfrak {u},{\varvec{k}}}^Q (\tilde{{\varvec{x}}})\) for independent randomly chosen anchors \(\tilde{{\varvec{x}}} = \tilde{{\varvec{x}}}_1,\ldots ,\) \(\tilde{{\varvec{x}}}_r \in \mathcal {D}_{\mathfrak {u}^\textrm{c}}, r\in \mathbb {N},\) we estimate
Applying the union bound yields
\(\square \)
Finally, the following lemma is based on [30, Lem. 7] and gives an estimate on the choice of the number of detection iterations r such that the probabilities \(\mathbb {P}(E_{1,t}^\textrm{c})\) and \(\mathbb {P}(E_{3,t}^\textrm{c})\) in (3.2) are bounded by the desired value \(\varepsilon /(3d)\).
Lemma 3.3
Let a threshold value \(\delta > 0\) and a smooth enough function \(f \in L_2(\mathcal {D},\mu )\) be given. We consider the finite index set \(\textrm{I}_{3\delta }\) such that (3.1) holds. Further, we assume that \(3B(\delta _+ + \delta _\Psi ) < \min _{{\varvec{k}}\in \textrm{I}_{3\delta }} \left\vert c_{\varvec{k}}\right\vert \) holds for the detection threshold \(\delta _+\) and the projection error threshold \(\delta _\Psi \), cf. Corollary 3.2. Also, let \(C \in \mathbb {R}\) be given such that \(C \ge C_Q\) for all cubature rules Q used in Algorithm 1.
Then, choosing the number of detection iterations
in Algorithm 1 guarantees that each of the probabilities \(\mathbb {P}(E_{1,t}^\textrm{c})\) and \(\mathbb {P}(E_{3,t}^\textrm{c})\) is bounded from above by \(\varepsilon /(3d)\).
Proof
We use \(\mathfrak {u}\) and \(K(\textrm{I}_{3\delta })\) as in Corollary 3.2 and estimate the probabilities \(\mathbb {P}(E_{1,t}^\textrm{c})\) and \(\mathbb {P}(E_{3,t}^\textrm{c})\) by (3.6). We increase r such that
is fulfilled. Consequently, r has to be bounded from below by
Hence, we now estimate
and using \(3B(\delta _\Psi + \delta _+) < \max _{{\varvec{h}}=({\varvec{k}},{\varvec{h}}_{\mathfrak {u}^\textrm{c}})_\mathfrak {u}\in \textrm{I}_{3\delta }} \left\vert c_{{\varvec{h}}}\right\vert \) by assumption yields
Exploiting the fact that \(\max _{{\varvec{h}}=({\varvec{k}},{\varvec{h}}_{\mathfrak {u}^\textrm{c}})_\mathfrak {u}\in \textrm{I}_{3\delta }} \left\vert c_{{\varvec{h}}}\right\vert > 3\delta \), the fraction inside the minimum can be bounded from below by
which is now independent of \({\varvec{k}}\). Hence, we have
Consequently, and since \(\log (x+1)\ge \frac{x}{x+1}\) for all \(x>1\) and hence \(\frac{1}{\log (x+1)}\le \frac{x+1}{x} = 1+\frac{1}{x}\), we obtain for the denominator in (3.8) the estimate
Finally, using C instead of \(C_Q\), this bound is applicable independently of Q. Therefore, the choice (3.7) then satisfies the lower bound (3.8) and \(\mathbb {P}(E_{1,t}^\textrm{c}) \le \varepsilon / (3d)\) and \(\mathbb {P}(E_{3,t}^\textrm{c}) \le \varepsilon / (3d)\) are fulfilled. \(\square \)
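The elementary inequality \(\log (x+1)\ge \frac{x}{x+1}\) and the resulting bound \(\frac{1}{\log (x+1)}\le 1+\frac{1}{x}\) used for the denominator can be verified numerically; a quick sketch:

```python
import math

# check log(x+1) >= x/(x+1) and 1/log(x+1) <= 1 + 1/x on a few sample points
for x in [1.01, 2.0, 10.0, 1e3, 1e6]:
    assert math.log(x + 1) >= x / (x + 1)
    assert 1.0 / math.log(x + 1) <= 1.0 + 1.0 / x
print("ok")
```

The first inequality follows from \(\log (x+1)=\int _0^x \frac{\textrm{d}t}{1+t} \ge \frac{x}{1+x}\), so the check is merely a sanity test of the algebra.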
Remark 3.4
The lower bound (3.7) depends linearly on \(\sum _{{\varvec{h}}\in \textrm{I}_{3\delta }^\textrm{c}} \left c_{{\varvec{h}}}\right \), which will be small for reasonable choices of \(\delta \) as well as fast enough decaying coefficients \(c_{\varvec{h}}\). Still, we may not access this value directly and need to bound it from above by some more accessible value, e.g.,
However, the sum might be tremendously smaller than such bounds, since it does not consider the largest coefficients of f, i.e., all basis coefficients \(c_{\varvec{h}}\), whose absolute value is larger than our threshold \(3\delta \).
Additionally, note that for sparse functions f the sum \(\sum _{{\varvec{h}}\in \textrm{I}_{3\delta }^\textrm{c}} \left c_{{\varvec{h}}}\right \) vanishes completely for a large enough threshold \(3\delta \).
3.2 Main result
We have now collected all the ingredients to state and prove our main theorem:
Theorem 3.5
Let a threshold \(\delta >0\) and a failure probability \(\varepsilon \in (0,1)\) be given. We consider a smooth enough function \(f \in L_2(\mathcal {D},\mu )\) such that the corresponding finite index set \(\textrm{I}_{3\delta }\) fulfilling the condition (3.1) is nonempty. Further, we assume that the projection error terms \(\Psi _{\mathfrak {u},{\varvec{k}}}^{Q,J}(\tilde{{\varvec{x}}})\) given in (1.7) for \(\mathfrak {u}= \{t\}, t=1,\ldots ,d,\) and \(\mathfrak {u}= \{1,\ldots ,t\}, t=2,\ldots ,d-1,\) fulfill the bound
for all \({\varvec{k}}\in \mathcal {P}_{\mathfrak {u}}(\textrm{I}_{3\delta })\) or \({\varvec{k}}\in \left( \textrm{I}^{(1,\hdots ,t-1)} \times \textrm{I}^{(t)} \right) \cap \mathcal {P}_{(1,\hdots ,t)}(\textrm{I}_{3\delta })\), respectively, for some \(\delta _\Psi < \delta \), uniformly for all possibly used cubature rules Q. Finally, let \(C \in \mathbb {R}\) be given such that \(C \ge C_Q\) for all possibly used cubature rules Q.
We apply Algorithm 1 to the function f using the following parameters. We choose

the search space \(\Gamma \subset \mathbb {N}^d\) such that \(\textrm{I}_{3\delta } \subset \Gamma \),

the detection threshold \(\delta _+\) such that \(3B(\delta _+ + \delta _\Psi ) < \min _{{\varvec{k}}\in \textrm{I}_{3\delta }} \left\vert c_{\varvec{k}}\right\vert \),

the number of detection iterations \( r \ge \left( 1 + \frac{3}{2} B^2 \left\vert \textrm{I}_{3\delta }\right\vert + \frac{B^3C}{2\delta } \sum _{{\varvec{h}}\in \textrm{I}_{3\delta }^\textrm{c}} \left\vert c_{{\varvec{h}}}\right\vert \right) \left(\log 3 + \log d + \log \left\vert \textrm{I}_{3\delta }\right\vert - \log \varepsilon \right)\).
We assume that the construction of the cubature rules Q fails with probability at most \(\varepsilon /(3d)\).
Then, with probability \(1-\varepsilon \), the index set \(\textrm{I}\) in the output of Algorithm 1 contains the whole index set \(\textrm{I}_{3\delta }\).
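To get a feeling for the magnitudes involved, the following sketch evaluates the lower bound on r from the theorem for purely hypothetical values of \(B\), \(C\), \(\delta \), \(\left\vert \textrm{I}_{3\delta }\right\vert \), the tail sum and \(\varepsilon \):

```python
import math

def r_bound(B, C, delta, card_I, tail_sum, d, eps):
    # (1 + 3/2 B^2 |I| + B^3 C / (2 delta) * sum_{h in I^c} |c_h|)
    #   * (log 3 + log d + log |I| - log eps), rounded up
    factor = 1.0 + 1.5 * B**2 * card_I + (B**3 * C) / (2.0 * delta) * tail_sum
    logs = math.log(3) + math.log(d) + math.log(card_I) - math.log(eps)
    return math.ceil(factor * logs)

print(r_bound(B=1.0, C=1.0, delta=1e-3, card_I=50, tail_sum=1e-4, d=10, eps=0.01))
```

Note that the bound depends only logarithmically on the dimension d and the failure probability \(\varepsilon \), but linearly on \(\left\vert \textrm{I}_{3\delta }\right\vert \) and the tail sum.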
Proof
The assertion follows directly when inserting the probabilities \(\mathbb {P}(E_{1,t}^\textrm{c})\) and \(\mathbb {P}(E_{3,t}^\textrm{c})\) proven in Lemma 3.3 in the union bound estimate
discussed in the beginning of Sect. 3. \(\square \)
Remark 3.6
While we already discussed the smoothness assumption in Remark 1.1, inequality (3.9) adds another technical assumption to Theorem 3.5. As mentioned in Sect. 1, \(\Psi _{\mathfrak {u},{\varvec{k}}}^{Q,\textrm{I}_{3\delta }}\) will vanish for sparse functions f (given a suitable choice of Q) and hence fulfill (3.9). However, also the case of sparsity with additional noise can be covered easily depending on the magnitude and type of the noise.
For non-sparse functions f, we could consider a sufficiently fast decay of the basis coefficients \(c_{\varvec{k}}\). As an example, we refer to the Fourier case and the weighted Wiener spaces as studied in [44]. Therein, weights \(\omega ^{\alpha ,\beta }({\varvec{k}})\) are considered, which are of product and order-dependent structure and where \(\alpha \) regulates the isotropic and \(\beta \) the dominating mixed smoothness. In [44, Sec. 5], the case of scattered data as well as the black-box scenario are investigated, where the latter corresponds to our case here.
Finally, note that a large gap in the size of the basis coefficients will also have a huge effect on the size of \(\Psi _{\mathfrak {u},{\varvec{k}}}^{Q,\textrm{I}_{3\delta }}\), if \(\textrm{I}_{3\delta }\) is chosen to match that gap.
Theorem 3.5 guarantees that the output set \(\textrm{I}\) of Algorithm 1 contains the index set \(\textrm{I}_{3\delta }\) with probability \(1-\varepsilon \). This holds since we have shown that all the projections \(\mathcal {P}_\mathfrak {u}(\textrm{I}_{3\delta })\) are detected with high probability. Note that if the projection error term \(\Psi _{\mathfrak {u},{\varvec{k}}}^{Q,J}\) is large enough in Formula (3.3), the projected coefficients appear to be large enough and might be detected anyway, even if there is not a single \({\varvec{h}}\in \textrm{I}_{3\delta }\) with \({\varvec{h}}= ({\varvec{k}},{\varvec{h}}_{\mathfrak {u}^\textrm{c}})_\mathfrak {u}\), cf. Remark 3.7. This leads to additional detected index projections, which do not belong to the largest coefficients. Theorem 3.5 implicitly assumes that the cutoff at the end of each step in Algorithm 1 may reduce the number of such unnecessary detections, but never throws away the important projections of the indices \({\varvec{h}}\in \textrm{I}_{3\delta }\). This can be ensured by choosing the sparsity parameters s and \(s_{\text {local}}\) in Algorithm 1 large enough. Note that in real applications the optimal choice of these parameters is probably unknown, so they would need to be chosen roughly.
Remark 3.7
We now briefly estimate the probability of additional detections \({\varvec{k}}\not \in \mathcal {P}_{\mathfrak {u}}(\textrm{I}_{3\delta })\) to get a feeling for how many of those detections we should expect each time. We use the right-hand side inequality
from [38, Par. 9.3.A], again with \(h(t):=\left\vert t\right\vert \) as in the proof of Lemma 3.1 and \(Y = \Psi _{\mathfrak {u},{\varvec{k}}}^{Q,J}(\tilde{{\varvec{x}}})\). Hence, we have the estimate
for the probability that \(\Psi _{\mathfrak {u},{\varvec{k}}}^{Q,J}\) is smaller than the detection threshold \(\delta _+\). So we end up with the probability
that in at least one detection iteration \(\nu =1,\ldots ,r\) the projection error term \(\Psi _{\mathfrak {u},{\varvec{k}}}^{Q,J}\) passes the detection threshold \(\delta _+\). So for each \({\varvec{k}}\not \in \mathcal {P}_{\mathfrak {u}}(\textrm{I}_{3\delta })\), there is a chance of at most \(1-\left(1-\frac{\delta _\Psi }{\delta _+}\right)^r\) to be detected accidentally. This matches the intuitive idea that functions with a large \(\delta _\Psi \) behave significantly worse than functions with very small coefficients \(c_{\varvec{k}}\ll 3\delta \) for all \({\varvec{k}}\in \textrm{I}_{3\delta }^\textrm{c}\) or functions which are nearly sparse. \(\square \)
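The estimate from the remark is easy to evaluate; the following sketch compares a function with a large projection error term to a nearly sparse one (the values are purely illustrative):

```python
def accidental(delta_psi, delta_plus, r):
    # probability bound 1 - (1 - delta_Psi / delta_+)^r for an accidental
    # detection of a candidate k outside the projections of I_{3delta}
    return 1.0 - (1.0 - delta_psi / delta_plus) ** r

print(round(accidental(0.01, 0.1, 5), 4))    # large projection error: likely
print(accidental(1e-6, 0.1, 5))              # nearly sparse: very unlikely
```

For small ratios \(\delta _\Psi /\delta _+\), the bound behaves like \(r\,\delta _\Psi /\delta _+\), so the number of detection iterations r enters only linearly here.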
4 Numerics
We now investigate the performance and results of our algorithm in two different settings. The first numerical example considers the approximation of a 10-dimensional, periodic test function in the space \(L_2(\mathbb {T}^{10})\) using the Fourier basis. The second part is the approximation of a 9-dimensional, non-periodic test function in the space \(L_2([-1,1]^9,\mu _{\textrm{Cheb}})\), with \(\mu _{\textrm{Cheb}}\) the Chebyshev measure, using the Chebyshev basis, cf. Section 1.1.
While we mentioned in Sect. 2.1 that an additional recomputation of the basis coefficients on the detected index set in Step 3 of Algorithm 1 is not necessary, the error of the coefficient approximation might be significantly smaller in that case. To account for this possible lack of accuracy of the version of Algorithm 1 used herein, we investigate the precision of the coefficient approximation and the crucial aim of the algorithm, the detection of a useful sparse index set \(\textrm{I}\), separately.
Finally, note that in our numerical tests we do not control the size of the output index set \(\textrm{I}\) by the detection threshold \(\delta _+\) as in our approach in Sect. 3. Here, we follow the more common approach of choosing \(\delta _+\) relatively small, but controlling the output using the sparsity parameter s. Hence, we do not need to estimate a suitable choice for \(\delta _+\) based on the intended threshold \(\delta \), cf. Theorem 3.5, but we also no longer have a theoretical guarantee on the output index set \(\textrm{I}\).
4.1 10dimensional periodic test function
For this example, we consider the frequency domain \(\mathbb {Z}^d\) instead of \(\mathbb {N}^d\), as it is more convenient to use for the Fourier basis. Consider the multivariate test function \(f: \mathbb {T}^{10} \rightarrow \mathbb {R},\)
from e.g. [46, Sec. 3.3] and [35, Sec. 4.2.3], where \(N_m:\mathbb {T}\rightarrow \mathbb {R}\) is the B-spline of order \(m\in \mathbb {N}\),
with a constant \(C_m>0\) such that \(\left\Vert N_m\right\Vert _{L_2(\mathbb {T})}=1\). The function f has infinitely many nonzero Fourier coefficients. The largest and therefore most important coefficients are expected to be supported on a union of a three-dimensional symmetric hyperbolic cross in the dimensions 1, 3, 8, a four-dimensional symmetric hyperbolic cross in the dimensions 2, 5, 6, 10, and a three-dimensional symmetric hyperbolic cross in the dimensions 4, 7, 9, each corresponding to the important coefficients of one of the three summands due to their decay properties. Therefore, we use the search space \(\Gamma :=\tilde{H}_{N}^{10,\frac{1}{2}}\), where
is the d-dimensional symmetric hyperbolic cross with weight \(0<\gamma \le 1\) and extension \(N>0\) such that \(2^N\in \mathbb {N}\). Moreover, we set the number of detection iterations \(r = 5\). We increase the sparsity s exponentially and use the local sparsity parameter \(s_\textrm{local} = 1.2 s\). Due to the fast decay of the Fourier coefficients and the increasing sparsity s, we fix the detection threshold \(\delta _+ = 10^{-12}\).
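Since the defining formula of \(\tilde{H}_{N}^{d,\gamma }\) is not reproduced here, the following sketch assumes a common definition of the weighted symmetric hyperbolic cross, namely \(\prod _{j} \max (1, \vert k_j\vert /\gamma ) \le 2^N\); the membership test, the counting loop, and the small parameters are purely illustrative.

```python
import itertools

def in_hyperbolic_cross(k, N, gamma):
    """Membership test for a symmetric, weighted hyperbolic cross.

    Assumed definition (the paper's exact formula is not reproduced here):
    prod_j max(1, |k_j| / gamma) <= 2**N.
    """
    prod = 1.0
    for kj in k:
        prod *= max(1.0, abs(kj) / gamma)
    return prod <= 2 ** N

# Count the candidates in a small 3-dimensional example with gamma = 1/2.
N, gamma, d = 4, 0.5, 3
extent = range(-2 ** N, 2 ** N + 1)  # any admissible k_j satisfies |k_j| <= 2**N
count = sum(1 for k in itertools.product(extent, repeat=d)
            if in_hyperbolic_cross(k, N, gamma))
```

Such a loop is only feasible for small d and N; for the 10-dimensional search spaces used below, the candidate sets are handled implicitly by the dimension-incremental construction.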
The algorithm was implemented in MATLAB^{®} and tested using two six-core Intel^{®} Xeon^{®} E5-2620 v3 CPUs @ 2.40GHz and 64 GB RAM. All tests are performed 10 times and the relative \(L_2(\mathbb {T}^{10})\) approximation error
as well as the coefficient approximation error
for \(p=1,2,\infty \) are computed. Note that the relative \(L_2(\mathbb {T}^{10})\) approximation error uses the exact coefficients \(c_{{\varvec{k}}}\) and hence only depends on the computed index set \(\textrm{I}\). Therefore, this error indicates how well our detected indices \({\varvec{k}}\in \textrm{I}\) can approximate the function f without possible loss due to non-optimal approximations \({\hat{f}}_{\varvec{k}}\) of the coefficients \(c_{{\varvec{k}}}\).
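The fact that this error depends only on \(\textrm{I}\) follows from Parseval's identity; a minimal sketch, with illustrative function and variable names and a toy coefficient set, could look as follows.

```python
import math

def relative_L2_error(coeffs, index_set):
    """Relative L2 truncation error of keeping only the indices in `index_set`.

    By Parseval's identity, ||f - S_I f||_{L2}^2 = sum_{k not in I} |c_k|^2,
    so this error depends only on the detected index set I.
    `coeffs` maps frequency tuples k to the exact coefficients c_k.
    """
    total = sum(abs(c) ** 2 for c in coeffs.values())
    kept = sum(abs(c) ** 2 for k, c in coeffs.items() if k in index_set)
    return math.sqrt((total - kept) / total)

# Toy example: three coefficients, keep the two largest.
c = {(0, 0): 1.0, (1, 0): 0.5, (0, 2): 0.1}
err = relative_L2_error(c, {(0, 0), (1, 0)})
```

In the experiments, the exact coefficients \(c_{{\varvec{k}}}\) of the test function take the role of `coeffs` and the detected set \(\textrm{I}\) the role of `index_set`.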
We consider three different cubature methods Q to construct and evaluate the approximated projected (Fourier) coefficients \({\hat{f}}_{\mathfrak {u},{\varvec{k}}}^Q (\tilde{{\varvec{x}}})\) in Algorithm 1:

Monte Carlo points (MC): We draw \(M := \vert K\vert \log (\vert K\vert )\) nodes \({\varvec{\xi }}_j, j=1,\ldots ,M,\) uniformly at random in \(\mathbb {T}^t\) and set \(w_j=M^{-1}\) for all \(j=1,\ldots ,M\). To improve the accuracy, we subsequently apply the least squares method with up to 20 iterations and the tolerance \(10^{-6}\). Hence, the method is no longer an equally weighted Monte Carlo cubature.

single rank-1 lattices (R1L): We construct a rank-1 lattice which is a spatial discretization for K using [28, Algorithm 5]. An FFT approach is used in order to compute the projected coefficients simultaneously, cf. [37] and [25, Algorithm 3.2], which is equivalent to the application of a cubature rule using the M sampling nodes of the rank-1 lattice and the weights \(w_j=M^{-1}\).

multiple rank-1 lattices (MR1L): We construct a spatial discretization for K using [27, Algorithm 5]. The discretization consists of a set of rank-1 lattices whose structure allows efficient computations of the Fourier matrix and its adjoint, cf. [26], which in turn is used to apply the least squares method to compute the projected Fourier coefficients. The calculation is equivalent to applying a non-equally weighted cubature rule to calculate each projected coefficient.
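The equally weighted rank-1 lattice cubature underlying the R1L approach can be sketched directly (without the FFT acceleration). The generating vector, lattice size, and test function below are illustrative toy choices, not the output of the cited construction algorithms.

```python
import cmath

def r1l_fourier_coefficients(f, z, M, K):
    """Approximate Fourier coefficients c_k, k in K, by the equally weighted
    rank-1 lattice cubature with nodes x_j = (j*z mod M)/M and weights 1/M.

    If the lattice is a reconstructing spatial discretization for K (as the
    construction algorithms cited in the text guarantee), this cubature is
    exact for trigonometric polynomials supported on K.
    """
    d = len(z)
    nodes = [tuple((j * z[t] % M) / M for t in range(d)) for j in range(M)]
    vals = [f(x) for x in nodes]
    coeffs = {}
    for k in K:
        s = sum(v * cmath.exp(-2j * cmath.pi *
                              sum(k[t] * x[t] for t in range(d)))
                for v, x in zip(vals, nodes))
        coeffs[k] = s / M
    return coeffs

# Toy check: f(x) = exp(2*pi*i*(x1 + 2*x2)) has Fourier coefficient 1 at (1,2).
f = lambda x: cmath.exp(2j * cmath.pi * (x[0] + 2 * x[1]))
c = r1l_fourier_coefficients(f, z=(1, 3), M=11, K=[(1, 2), (0, 0)])
```

The MR1L approach replaces the single lattice by a union of such lattices and the equal weights by a least squares solve, but the per-node structure of the computation is the same.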
Table 2 states upper bounds on the sampling and computational complexity of the methods using the rough bounds \(\mathcal {O}\left( drS_Q(r^2\,s^2,d)\right) \) and \(\mathcal {O}\left( drT_Q(r^2\,s^2,d)\right) \) from Sect. 2.2. These complexities are simplified using particular assumptions. For more details on the complexities of the reconstruction methods, see the respective references.
Figure 5 illustrates the decay of the relative \(L_2(\mathbb {T}^{10})\) approximation error for the three different cubature methods used as well as the \(\ell _\infty \) coefficient approximation error, the number of samples and the computation time. Note that these are the medians of the 10 test runs. While we also performed tests with the rank-1 lattice approaches for higher extensions up to \(N=13\) and larger sparsities like \(s=2^{17}\), the MC approach already failed for smaller parameters because of the large matrices involved. Figure 5a only shows the data of those tests that could be performed successfully. The computation time of the MC approach also increases at a much faster rate, while all illustrated tests for R1L and MR1L finished in less than an hour. Nevertheless, the detected index sets \(\textrm{I}\) seem to perform very well for all three methods, and larger extensions N allow an even longer decay of the relative \(L_2(\mathbb {T}^{10})\) approximation error w.r.t. the sparsity s.
The \(\ell _\infty \) coefficient approximation error decays well in all examples, but starts to stagnate a bit earlier w.r.t. the sparsity than the relative error. This effect seems to be caused by the aliasing of the coefficients surrounding \(\Gamma \), since for larger or smaller values of N the stagnation also starts later or earlier, respectively. The \(\ell _1\) and \(\ell _2\) coefficient approximation errors decay in a similar way but are not illustrated here to preserve clarity. The number of samples illustrated in Fig. 5e grows reasonably. As expected, the R1L approach needs the most samples, while the growth for the MR1L and MC approaches is considerably slower. The number of samples tends to be very large compared to the size of our search space \(\Gamma \), e.g., \(\vert \tilde{H}_{8}^{10,\frac{1}{2}}\vert \approx 2.39 \cdot 10^6\), but other hyperbolic crosses or even the full grid should result in comparable numbers of samples as long as the extension N is of similar size, while the number of candidates grows tremendously for those search spaces.
4.2 9-dimensional non-periodic test function
Consider the multivariate test function \(f: [-1,1]^{9} \rightarrow \mathbb {R},\)
from e.g. [51, Sec. 4.2.2] and [47, Sec. III.B], where \(B_2:\mathbb {R}\rightarrow \mathbb {R}\) and \(B_4:\mathbb {R}\rightarrow \mathbb {R}\) are shifted, scaled and dilated B-Splines of order 2 and 4, respectively, see Fig. 6 for an illustration and [51] for their rigorous definition.
As in Sect. 4.1, the function f is not sparse in the Chebyshev frequency domain and we expect the significant Chebyshev coefficients \(c_{{\varvec{k}}}, {\varvec{k}}\in \mathbb {N}^9,\) of the function to be supported on a union of a four-dimensional hyperbolic-cross-like structure in the dimensions 1, 3, 4, 7, and a five-dimensional hyperbolic-cross-like structure in the dimensions 2, 5, 6, 8, 9. Hence, we restrict ourselves to the search space \(\Gamma :=\tilde{H}_n^{9,\frac{1}{2}} \cap \mathbb {N}^9\) with \(\tilde{H}_{n}^{d,\gamma }\) as defined in (4.1). Again, we increase the sparsity s while fixing the number of detection iterations \(r=5\), the detection threshold \(\delta _+=10^{-12}\) and the local sparsity parameter \(s_{\textrm{local}}=1.2 s\).
All tests are performed 10 times and the relative \(L_2([-1,1]^9,\mu _{\textrm{Cheb}})\) approximation error
as well as the coefficient approximation error
for \(p=1,2,\infty \) are computed. As before, the relative \(L_2([-1,1]^9,\mu _{\textrm{Cheb}})\) approximation error only depends on the detected index set \(\textrm{I}\) and not on the approximated coefficients \({\hat{f}}_{\varvec{k}}, {\varvec{k}}\in \textrm{I},\) cf. Sect. 4.1.
We consider two different cubature methods Q to construct and evaluate the approximated projected (Chebyshev) coefficients \({\hat{f}}_{\mathfrak {u},{\varvec{k}}}^Q (\tilde{{\varvec{x}}})\) in Algorithm 1:

Monte Carlo points (cMC): We set \(m=\sum _{{\varvec{k}}\in K} 2^{\Vert {\varvec{k}}\Vert _0}\), cf. [34, Sec. 1.2], and draw \(M := m \log (m)\) nodes \({\varvec{\xi }}_j, j=1,\ldots ,M,\) in \([-1,1]\) at random w.r.t. the Chebyshev measure \(\mu _{\textrm{Cheb}}\) and set \(w_j=M^{-1}\) for all \(j=1,\ldots ,M\). To improve the accuracy, we again apply the least squares method with up to 20 iterations and the tolerance \(10^{-6}\). Again, the method is then no longer an equally weighted Monte Carlo cubature in the classical sense.

Chebyshev multiple rank-1 lattices (cMR1L): Similar to the multiple rank-1 lattice approach in Sect. 4.1, there exists a strategy [29] for discretizing spans of multivariate Chebyshev polynomials using sets of transformed rank-1 lattices [34]. The computation of the evaluation matrix as well as its adjoint can be implemented efficiently. We compute the Chebyshev coefficients using the least squares method, which is thus equivalent to applying a non-equally weighted cubature rule to calculate each projected coefficient.
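Drawing nodes w.r.t. the Chebyshev measure, as the cMC method does, admits a simple sketch: if \(u\) is uniform on \([0,1]\), then \(\cos (\pi u)\) is distributed according to \(\mu _{\textrm{Cheb}}\). The helper names below are illustrative, and we assume the standard orthonormalization of the tensor-product Chebyshev polynomials w.r.t. \(\mu _{\textrm{Cheb}}\).

```python
import math
import random

def chebyshev_node():
    """Draw one node in [-1, 1] distributed w.r.t. the Chebyshev measure
    d(mu_Cheb) = dx / (pi * sqrt(1 - x^2)): if u ~ U[0, 1], then cos(pi * u)
    has exactly this distribution."""
    return math.cos(math.pi * random.random())

def cheb_basis(k, x):
    """Tensor-product Chebyshev polynomial, orthonormal w.r.t. mu_Cheb:
    T_0 = 1 and sqrt(2) * cos(k_j * arccos(x_j)) for k_j >= 1 per variable."""
    val = 1.0
    for kj, xj in zip(k, x):
        if kj > 0:
            val *= math.sqrt(2.0) * math.cos(kj * math.acos(xj))
    return val

# Monte Carlo estimate of the squared norm <T_k, T_k> = 1 for k = (1, 2).
random.seed(0)
M = 200000
est = sum(cheb_basis((1, 2), (chebyshev_node(), chebyshev_node())) ** 2
          for _ in range(M)) / M
```

The estimate converges to 1, illustrating why equal weights \(w_j = M^{-1}\) yield a consistent cubature for Chebyshev-distributed nodes before the least squares refinement is applied.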
Table 3 again states upper bounds on the sampling and computational complexity of those methods using the rough bounds \(\mathcal {O}\left( drS_Q(r^2\,s^2,d)\right) \) and \(\mathcal {O}\left( drT_Q(r^2\,s^2,d)\right) \) from Sect. 2.2. As before, those complexities are simplified using particular assumptions, while the detailed versions can be found in the given references.
Figure 7 again illustrates the decay of the relative \(L_2([-1,1]^9,\mu _{\textrm{Cheb}})\) approximation error for the two different methods used as well as the \(\ell _\infty \) coefficient approximation error and the computation time. Again, the \(\ell _1\) and \(\ell _2\) coefficient approximation errors decay in a similar way as the \(\ell _\infty \) error and are not shown here. As in Sect. 4.1, we could not apply the cMC method for larger parameters s and N due to the higher computation time as well as the larger number of samples needed. On the other hand, the cMR1L method transferred its efficiency to the whole dimension-incremental method, achieving significantly shorter computation times and requiring fewer samples. While both the relative \(L_2([-1,1]^9,\mu _{\textrm{Cheb}})\) approximation error and the \(\ell _\infty \) coefficient approximation error decay as expected for both approaches, the coefficient approximation error is again a little larger for the lattice approach. These results again underline the importance of the efficiency and accuracy of the underlying reconstruction method for our algorithm.
4.3 Reliability
The numerical experiments in Sects. 4.1 and 4.2 are regulated by the sparsity s instead of the detection threshold \(\delta _+\). This approach seems more natural for applications of our algorithm, but obviously lacks a proper theoretical foundation such as Theorem 3.5. However, the results in the previous sections already seem very promising. To underline this, we further investigate the reliability of Algorithm 1 by performing chosen tests 100 instead of 10 times and plotting the results for the respective relative \(L_2\) approximation error in box-and-whisker plots in Fig. 8. We stick to the typical convention of classifying results as outliers if they lie more than 1.5 interquartile ranges above the upper quartile.
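This outlier convention can be made concrete with a short sketch. The quantile computation below uses one common interpolation convention among several, and the error values are illustrative.

```python
def upper_outliers(data):
    """Classify values as outliers if they lie more than 1.5 interquartile
    ranges above the upper quartile (the usual box-and-whisker convention)."""
    xs = sorted(data)
    n = len(xs)

    def quartile(p):
        # simple linear-interpolation quantile; other conventions differ slightly
        idx = p * (n - 1)
        lo, hi = int(idx), min(int(idx) + 1, n - 1)
        return xs[lo] + (idx - lo) * (xs[hi] - xs[lo])

    q1, q3 = quartile(0.25), quartile(0.75)
    fence = q3 + 1.5 * (q3 - q1)
    return [x for x in data if x > fence]

# Four tightly clustered errors and one clearly worse run.
errors = [1.0e-8, 1.1e-8, 1.2e-8, 1.3e-8, 5.0e-6]
outs = upper_outliers(errors)
```

Note how a tiny interquartile range makes the fence very tight, which explains why runs extremely close to the median can still be flagged as outliers in our tests.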
The main observation is how small the boxes (and the extent of their whiskers) are for almost all tests, which indicates a high reliability of our algorithm. The share of outliers is below \(10\%\) in most tests; the highest detected share is \(24\%\). However, most of those outliers are still incredibly close to the average results and are most likely classified as outliers only because of the tremendously small interquartile ranges. As can be seen in Fig. 8, there are only isolated outliers where the accuracy is considerably worse than expected.
Surprisingly, our approaches in the Chebyshev setting (cMC and cMR1L) seem to behave even better in terms of reliability, as there are fewer and also less severe outliers. However, it is not entirely clear whether this is caused by the particular basis or by side effects.
5 Conclusion
The presented algorithm is capable of approximating high-dimensional functions very well by detecting a sparse truncation of their basis expansion in the corresponding space. Given a suitable coefficient reconstruction method such as the rank-1 lattice approaches, the highly adaptive algorithm can be applied to any bounded orthonormal product basis and benefits tremendously if the reconstruction method is efficient in terms of, e.g., sample complexity, memory complexity or computational complexity. If several reconstruction methods are available, one should prioritize those with the best properties for the considered situation, e.g., a sample-efficient method if sampling is very expensive compared to the rest of the algorithm, as in [31].
We provide a theoretical reconstruction guarantee for a special class of methods, which can serve as a blueprint for similar proofs for other reconstruction methods. On the other hand, Theorem 3.5 also raises various open questions suitable for further research. It is still unknown how to properly include the sparsity s as a cutoff parameter, which is far more suitable for regulating applications of the algorithm, into the theoretical results instead of the detection threshold \(\delta _+\). Also, improved bounds on the number of detection iterations r are still desired, since the choice of a small, constant number, which works well in practice, does not coincide with the theoretical bound.
Our numerical tests result in promising and reliable nonlinear approximations for the well-known Fourier case as well as the non-periodic Chebyshev case. These results strongly motivate applying the dimension-incremental algorithm to several other bounded orthonormal product bases as in [11].
Finally, we stated several modifications and improvements of the algorithm throughout the paper, which should be considered in future works to increase the power of the dimension-incremental method even further.
References
Adcock, B., Brugiapaglia, S., Webster, C.G.: Compressed Sensing Approaches for Polynomial Approximation of High-Dimensional Functions, pages 93–124. Springer International Publishing, Cham (2017)
Adcock, B., Brugiapaglia, S., Webster, C.G.: Sparse Polynomial Approximation of High-Dimensional Functions. Society for Industrial and Applied Mathematics, Philadelphia (2022)
Akavia, A.: Deterministic sparse Fourier approximation via approximating arithmetic progressions. IEEE Trans. Inform. Theory 60(3), 1733–1741 (2014)
Bartel, F., Kämmerer, L., Potts, D., Ullrich, T.: On the reconstruction of functions from values at subsampled quadrature points (2022). arXiv:2208.13597 [math.NA]
Bellman, R.E.: Adaptive control processes—a guided tour. Princeton University Press, Princeton (1961)
Berlinet, A., Thomas-Agnan, C.: Reproducing kernel Hilbert spaces in probability and statistics. Kluwer Academic Publishers, Boston (2004)
Bittens, S.: Sparse FFT for functions with short frequency support. Dolomit. Res. Notes Approx. 10, 43–55 (2017)
Choi, B., Christlieb, A., Wang, Y.: High-dimensional sparse Fourier algorithms. Numer. Algorithms (2020)
Choi, B., Christlieb, A., Wang, Y.: Multiscale high-dimensional sparse Fourier algorithms for noisy data. Math. Comput. Geom. Data 1(1), 35–58 (2021)
Choi, B., Iwen, M., Krahmer, F.: Sparse harmonic transforms: a new class of sublinear-time algorithms for learning functions of many variables. Found. Comput. Math. 21, 06 (2020)
Choi, B., Iwen, M., Volkmer, T.: Sparse harmonic transforms II: best s-term approximation guarantees for bounded orthonormal product bases in sublinear-time. Numer. Math. 148, 293–362 (2021)
Christlieb, A., Lawlor, D., Wang, Y.: A multiscale sublinear time Fourier algorithm for noisy data. Appl. Comput. Harmon. Anal. 40, 553–574 (2016)
Cohen, A., DeVore, R.: Approximation of high-dimensional parametric PDEs. Acta Numer. 24, 1–159 (2015)
Foucart, S., Rauhut, H.: A Mathematical Introduction to Compressive Sensing. Applied and Numerical Harmonic Analysis, Birkhäuser/Springer, New York (2013)
Gilbert, A., Indyk, P., Iwen, M., Schmidt, L.: Recent developments in the sparse Fourier transform: a compressed Fourier transform for big data. IEEE Signal Proc. Mag. 31(5), 91–100 (2014)
Gilbert, A.C., Gu, A., Ré, C., Rudra, A., Wootters, M.: Sparse recovery for orthogonal polynomial transforms. In: Czumaj, A., Dawar, A., Merelli, E. (eds) 47th International Colloquium on Automata, Languages, and Programming, ICALP 2020, July 8–11, 2020, Saarbrücken, Germany (Virtual Conference), volume 168 of LIPIcs, pages 58:1–58:16. Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2020)
Hassanieh, H., Indyk, P., Katabi, D., Price, E.: Nearly optimal sparse Fourier transform. In: Proceedings of the Forty-fourth Annual ACM Symposium on Theory of Computing, pages 563–578. ACM (2012)
Hassanieh, H., Indyk, P., Katabi, D., Price, E.: Simple and practical algorithm for sparse Fourier transform. In: Proceedings of the Twenty-third Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1183–1194. SIAM (2012)
Hu, X., Iwen, M., Kim, H.: Rapidly computing sparse Legendre expansions via sparse Fourier transforms. Numer. Algor. 74, 1029–1059 (2017)
Indyk, P., Kapralov, M.: Sample-optimal Fourier sampling in any constant dimension. In: Foundations of Computer Science (FOCS), 2014 IEEE 55th Annual Symposium on, pages 514–523 (2014)
Iwen, M.A.: Combinatorial sublinear-time Fourier algorithms. Found. Comput. Math. 10, 303–338 (2010)
Iwen, M.A.: Improved approximation guarantees for sublinear-time Fourier algorithms. Appl. Comput. Harmon. Anal. 34, 57–82 (2013)
Iwen, M.A., Gilbert, A., Strauss, M.: Empirical evaluation of a sublinear time sparse DFT algorithm. Commun. Math. Sci. 5, 981–998 (2007)
Jahn, T., Ullrich, T., Voigtlaender, F.: Sampling numbers of smoothness classes via \(\ell ^1\)minimization (2022). arxiv:2212.00445 [math.NA]
Kämmerer, L.: High Dimensional Fast Fourier Transform Based on Rank-1 Lattice Sampling. Dissertation. Universitätsverlag Chemnitz (2014)
Kämmerer, L.: Multiple rank-1 lattices as sampling schemes for multivariate trigonometric polynomials. J. Fourier Anal. Appl. 24, 17–44 (2018)
Kämmerer, L.: Constructing spatial discretizations for sparse multivariate trigonometric polynomials that allow for a fast discrete Fourier transform. Appl. Comput. Harmon. Anal. 47(3), 702–729 (2019)
Kämmerer, L.: A fast probabilistic component-by-component construction of exactly integrating rank-1 lattices and applications (2020). arXiv:2012.14263
Kämmerer, L.: Discretizing multivariate Chebyshev polynomials using multiple Chebyshev rank-1 lattices. In preparation (2022)
Kämmerer, L., Potts, D., Volkmer, T.: High-dimensional sparse FFT based on sampling along multiple rank-1 lattices. Appl. Comput. Harmon. Anal. 51, 225–257 (2021)
Kämmerer, L., Potts, D., Taubert, F.: The uniform sparse FFT with application to PDEs with random coefficients. Sampl. Theory Signal Process. Data Anal. 20(19) (2022)
Kempf, R., Wendland, H., Rieger, C.: Kernel-based reconstructions for parametric PDEs, pages 53–71. Springer International Publishing, Cham (2019)
Kuo, F.Y., Sloan, I.H., Wasilkowski, G.W., Woźniakowski, H.: On decompositions of multivariate functions. Math. Comput. 79(270), 953–966 (2010)
Kuo, F., Migliorati, G., Nobile, F., Nuyens, D.: Function integration, reconstruction and approximation using rank-1 lattices. Math. Comp. 90(330), 1861–1897 (2021)
Kämmerer, L., Krahmer, F., Volkmer, T.: A sample efficient sparse FFT for arbitrary frequency candidate sets in high dimensions. Numer. Algorithms, 1479–1520 (2021)
Lawlor, D., Wang, Y., Christlieb, A.: Adaptive sublinear time Fourier algorithms. Adv. Adapt. Data Anal. 5(1):1350003 (2013)
Li, D., Hickernell, F.J.: Trigonometric spectral collocation methods on lattices. In: Recent advances in scientific computing and partial differential equations (Hong Kong, 2002), volume 330 of Contemp. Math., pages 121–132. Amer. Math. Soc., Providence, RI (2003)
Loève, M.: Probability Theory I. Graduate Texts in Mathematics, 4th edn. Springer, New York (1977)
Lüthen, N., Marelli, S., Sudret, B.: Sparse polynomial chaos expansions: literature survey and benchmark. SIAM/ASA J. Uncertain. Quantif. 9(2), 593–649 (2021)
Lüthen, N., Marelli, S., Sudret, B.: Automatic selection of basis-adaptive sparse polynomial chaos expansions for engineering applications. Int. J. Uncertain. Quantif. 12(3), 49–74 (2022)
Peter, T., Plonka, G., Roşca, D.: Representation of sparse Legendre expansions. J. Symb. Comput. 50, 159–169 (2013)
Plonka, G., Wannenwetsch, K.: A deterministic sparse FFT algorithm for vectors with small support. Numer. Algorithms 71, 889–905 (2016)
Plonka, G., Wannenwetsch, K.: A sparse fast Fourier algorithm for real nonnegative vectors. J. Comput. Appl. Math. 321, 532–539 (2017)
Potts, D., Schmischke, M.: Approximation of highdimensional periodic functions with Fourierbased methods. SIAM J. Numer. Anal. 59(5), 2393–2429 (2021)
Potts, D., Tasche, M.: Reconstruction of sparse Legendre and Gegenbauer expansions. Numer. Math. 56, 1019–1043 (2016)
Potts, D., Volkmer, T.: Sparse high-dimensional FFT based on rank-1 lattice sampling. Appl. Comput. Harmon. Anal. 41(3), 713–748 (2016)
Potts, D., Volkmer, T.: Multivariate sparse FFT based on rank-1 Chebyshev lattice sampling. In: 2017 International Conference on Sampling Theory and Applications (SampTA), pages 504–508 (2017)
Rauhut, H., Ward, R.: Sparse Legendre expansions via \({\ell _1}\)minimization. J. Approx. Theory 164, 517–533 (2012)
Schmischke, M.: Interpretable Approximation of High-Dimensional Data based on the ANOVA Decomposition. Dissertation. Universitätsverlag Chemnitz (2022)
Segal, B., Iwen, M.: Improved sparse Fourier approximation results: faster implementations and stronger guarantees. Numer. Algor. 63, 239–263 (2013)
Volkmer, T.: Multivariate approximation and high-dimensional sparse FFT based on rank-1 lattice sampling. Dissertation. Universitätsverlag Chemnitz (2017)
Xie, Y., Shi, R., Schaeffer, H., Ward, R.: SHRIMP: sparser random feature models via iterative magnitude pruning. In: Dong, B., Li, Q., Wang, L., Xu, Z.-Q.J. (eds) Proceedings of Mathematical and Scientific Machine Learning, volume 190 of Proceedings of Machine Learning Research, pages 303–318. PMLR (2022)
Acknowledgements
L. Kämmerer gratefully acknowledges funding by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under project number 380648269, and Daniel Potts under project number 416228727 – SFB 1410.
Funding
Open Access funding enabled and organized by Projekt DEAL.
Additional information
Communicated by Mark Iwen.
Kämmerer, L., Potts, D. & Taubert, F. Nonlinear approximation in bounded orthonormal product bases. Sampl. Theory Signal Process. Data Anal. 21, 19 (2023). https://doi.org/10.1007/s43670-023-00057-7
Keywords
- Sparse approximation
- Nonlinear approximation
- High-dimensional approximation
- Dimension-incremental algorithm
- Bounded orthonormal product bases
- Projected coefficients