Abstract
We develop the uniform sparse Fast Fourier Transform (usFFT), an efficient, non-intrusive, adaptive algorithm for the solution of elliptic partial differential equations with random coefficients. The algorithm is an adaptation of the sparse Fast Fourier Transform (sFFT), a dimension-incremental algorithm, which tries to detect the most important frequencies in a given search domain and therefore adaptively generates a suitable Fourier basis corresponding to the approximately largest Fourier coefficients of the function. The usFFT does this w.r.t. the stochastic domain of the PDE simultaneously for multiple fixed spatial nodes, e.g., the nodes of a finite element mesh. The key idea of joining the detected frequency sets in each dimension increment results in a Fourier approximation space that fits uniformly for all these spatial nodes. This strategy allows for a faster and more efficient computation, since it requires significantly fewer samples than, e.g., applying the sFFT to each spatial node separately. We test the usFFT on different examples using periodic, affine and lognormal random coefficients in the PDE problems.
1 Introduction
Parametric operator equations have gained significant attention in recent years. In particular, partial differential equations with random coefficients play an important role in the study of uncertainty quantification, e.g., [8, 22, 23]. Therefore, computing numerical solutions of these equations in an efficient and reliable way has become more and more important.
In this work, we consider the parametric, elliptic problem of finding \(u: D_{{\varvec{x}}} \times D_{{\varvec{y}}} \rightarrow {\mathbb {R}}\) such that for every \({\varvec{y}}\in D_{{\varvec{y}}}\) there holds
describing the diffusion characteristics of inhomogeneous materials; such problems are therefore called diffusion equations with random diffusion coefficient a. Here, \({\varvec{x}}= (x_j)_{j=1}^{d_{{\varvec{x}}}} \in D_{{\varvec{x}}}\) is the spatial variable in a bounded Lipschitz domain \(D_{{\varvec{x}}} \subset {\mathbb {R}}^{d_{{\varvec{x}}}}\), typically with spatial dimension \(d_{{\varvec{x}}}=1,2\) or 3, and \({\varvec{y}}= (y_j)_{j=1}^{d_{{\varvec{y}}}} \in D_{{\varvec{y}}}\) is a high-dimensional random variable with \(D_{{\varvec{y}}} \subset {\mathbb {R}}^{d_{{\varvec{y}}}}\). For the remainder of this paper, we use d instead of \(d_{{\varvec{y}}}\) to simplify the notation. The differential operator \(\nabla \) is always used w.r.t. the spatial variable \({\varvec{x}}\), and the one-dimensional random variables \(y_j\) are assumed to be i.i.d. with a prescribed distribution.
A common way to define the random coefficient a is via
where \(a_0\) and \(\psi _j\) are assumed to be uniformly bounded on \(D_{{\varvec{x}}}\). This model is commonly used with the stochastic domain \(D_{{\varvec{y}}} = [\alpha ,\beta ]^{d}\), typically with \([\alpha ,\beta ] = [-1,1]\) or \([-\frac{1}{2}, \frac{1}{2}]\). The \(\Theta _j({\varvec{y}})\) can also be interpreted as random variables themselves and are usually chosen to have expectation value \({\mathbb {E}}[\Theta _j({\varvec{y}})]=0\), such that \({\mathbb {E}}[a({\varvec{x}},\cdot )] = a_0({\varvec{x}})\) holds and the terms of the sum model the stochastic fluctuations.
Often, the model (1.2) is used in an affine fashion, with \(\Theta _j({\varvec{y}}) = y_j\) for all \(j=1,\ldots ,d\). This so-called affine model is considered in many works on parametric differential equations with random coefficients, e.g., [2, 5, 11, 14, 15, 18, 30, 31, 37, 40, 44]. The so-called periodic model using \(\Theta _j({\varvec{y}}) = \frac{1}{\sqrt{6}}\sin (2\pi y_j)\) has recently been studied in [22, 23]. For \(y_j\) uniformly distributed on \([-\frac{1}{2},\frac{1}{2}]\) each, these \(\Theta _j\) are then distributed according to the arcsine distribution on \([-1,1]\). It turned out that this model is also worth considering in addition to the affine model. Further, this model yields some advantages for our new approach due to its periodicity w.r.t. the random variables, as we will see later.
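As a concrete illustration of the two choices of \(\Theta_j\) in the model (1.2), the coefficient can be evaluated as sketched below. The functions `a0` and `psi` are made-up toy examples, not taken from the paper; only the formulas for \(\Theta_j\) follow the text above.

```python
import numpy as np

def diffusion_coefficient(x, y, a0, psi, model="affine"):
    """Evaluate a(x, y) = a0(x) + sum_j Theta_j(y) * psi_j(x), cf. (1.2)."""
    if model == "affine":
        theta = np.asarray(y)                         # Theta_j(y) = y_j
    elif model == "periodic":
        theta = np.sin(2 * np.pi * np.asarray(y)) / np.sqrt(6)
    else:
        raise ValueError(f"unknown model: {model}")
    return a0(x) + sum(t * p(x) for t, p in zip(theta, psi))

# Toy setting with d = 2 random dimensions (illustrative choices only)
a0 = lambda x: 2.0
psi = [lambda x: 0.5 * np.cos(np.pi * x),
       lambda x: 0.25 * np.cos(2 * np.pi * x)]
val = diffusion_coefficient(0.3, [0.1, -0.2], a0, psi, model="affine")
```

Switching `model="periodic"` reuses the same `a0` and `psi` while replacing \(\Theta_j(\varvec{y}) = y_j\) by \(\frac{1}{\sqrt{6}}\sin(2\pi y_j)\).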
The second type of the random coefficient a, that is also used in many recent works, e.g., [3, 4, 6, 9, 19, 36], is the socalled lognormal form
Here, the random variables \(y_j\) are typically normally distributed, i.e., \(y_j \sim {\mathcal {N}}(0,1)\), and hence \(D_{{\varvec{y}}} = {\mathbb {R}}^{d}\). The numerical analysis as well as the computation of approximations for this model are more difficult, but this model also arises more often in real applications. A more detailed overview on parametric and stochastic PDEs can be found, e.g., in [10, Sect. 1].
In this paper, we design a numerical method for solving the aforementioned problems. To be more precise, we will compute approximations of the solutions \(u({\varvec{x}},{\varvec{y}})\) using trigonometric polynomials. A Fourier approach for ordinary differential equations, i.e., \(d_{{\varvec{x}}}=1\), with high-dimensional random coefficients has already been presented in [7]. There, a dimension-incremental method, the so-called sparse Fast Fourier Transform (sFFT), cf. [39], was used to detect the most important frequencies \({\varvec{k}}\) and corresponding approximations of the Fourier coefficients \(c_{{\varvec{k}}}(u)\) of the solution \(u({\varvec{x}},{\varvec{y}})\). These values can be used to compute an approximation of the solution u or other quantities of interest such as, e.g., the expectation value \({\mathbb {E}}[u]\). Further, the frequencies and Fourier coefficients can be used to gain detailed information about the influence of the random variables \(y_j\) on the solution u and their interaction with each other.
In this work, we present a non-intrusive approach based on the main idea of the algorithm developed in [7]. The main difference is that we do not include the spatial variable \({\varvec{x}}\) in the Fourier approach and therefore apply the sFFT only w.r.t. the random variable \({\varvec{y}}\). Hence, the sFFT only needs samples of the function values of u for fixed \({\varvec{y}}\), which can be computed by using any suitable, already available differential equation solver. In consequence, we are not restricted to particular spatial domains \(D_{{\varvec{x}}}\) or spatial dimensions \(d_{{\varvec{x}}}\). To be more precise, we consider a finite set \({{\mathcal {T}}}_G \subset D_{{\varvec{x}}}\) with finite cardinality \(\vert {{\mathcal {T}}}_G \vert = G < \infty \) as spatial discretization and aim for approximations of the functions
for each \({\varvec{x}}_g \in {{\mathcal {T}}}_G\). Also note that we might need to apply a suitable periodization w.r.t. \({\varvec{y}}\) first if the function \(u_{{\varvec{x}}_g}\) is not already periodic.
Unfortunately, we would need to apply such a pointwise approximation algorithm, in our case the sFFT, G times separately, resulting in an unnecessarily large increase in the number of samples used and therefore, since each sample implies a call of the underlying, possibly expensive differential equation solver, also in the computation time of the algorithm. Hence, we develop a modification of the sFFT to overcome this problem and compute the approximations of the functions \(u_{{\varvec{x}}_g}\) within one call of the new algorithm. In particular, our so-called uniform sparse Fast Fourier Transform (usFFT) combines certain candidate sets between the dimension-incremental steps, which allows us to use the same sampling nodes \({\varvec{y}}\) for each point \({\varvec{x}}_g \in {{\mathcal {T}}}_G\) in the next step. This strategy keeps the number of used samples at a reasonable size and hence decreases the computation time drastically compared to G separate applications of the sFFT. We summarize this in the following theorem:
Theorem 1.1
Let the sparsity parameter \(s\in {\mathbb {N}}\), a frequency candidate set \(\Gamma \subset {\mathbb {Z}}^d\), \(\vert \Gamma \vert <\infty \), the number \(G\in {\mathbb {N}}\) and a failure probability \(\delta \in (0,1)\) be given. Moreover, we define \(N_{\Gamma } :=\max _{j=1,\ldots ,d} \lbrace \max _{{\varvec{k}}\in \Gamma } k_j - \min _{{\varvec{l}}\in \Gamma } l_j \rbrace \). Then, there exists a randomized sampling strategy based on the random rank-1 lattice approach in [32] generating a set S of sampling locations with cardinality
such that the following holds.
Consider G arbitrary multivariate trigonometric polynomials \(p^{(g)}({\varvec{y}}):=\sum _{{\varvec{k}}\in I_g}\hat{p}_{{\varvec{k}}}^{(g)}\text {e}^{2\pi \text {i}{\varvec{k}}\cdot {\varvec{y}}}\), \(g=1,\ldots ,G\), where we assume \(I_g\subset \Gamma \), \(\vert I_g\vert \le s\) and \(\min _{{\varvec{k}}\in I_g}\vert \hat{p}_{{\varvec{k}}}^{(g)}\vert >0\) for each \(g=1,\ldots ,G\). We generate a random set S via this sampling strategy. Then, with probability at least \(1-\delta \) it holds that

- all frequencies \({\varvec{k}}\in I_g\) as well as
- all Fourier coefficients \(\hat{p}_{{\varvec{k}}}^{(g)}\), \({\varvec{k}}\in I_g\),
of all multivariate trigonometric polynomials \(p^{(g)}\), \(g=1,\ldots ,G\), can be reconstructed from their values at the sampling locations in S.
The simultaneous identification of all the frequencies and the computation of all the Fourier coefficients can be realized by a combination of Algorithm 1 and a modification of the approach presented in [32] in the role of Algorithm \({\text {A}}\). The suggested method has a computational complexity of
with probability at least \(1-\delta \) as well as \({\mathcal {O}}\left( d^2 \,s^3 \,G^2 \,N_{\Gamma } \,\log ^3 \frac{d \,s \,G \,N_{\Gamma }}{\delta }\right) \) in the worst case.
Remark 1.2
Note that (1.3) in Theorem 1.1 does not state anything about the sampling complexity of Algorithm 1, but only about the number of sampling locations \(\vert S \vert \). We need samples of all trigonometric polynomials \(p^{(g)}({\varvec{y}}), g=1,\ldots ,G,\) at these sampling nodes \({\varvec{y}}\in S\), so the necessary number of samples in the classical sense is G times larger. However, we aim for an algorithm where the G samples \(p^{(g)}({\varvec{y}})\) for a fixed \({\varvec{y}}\in S\) and all \(g=1,\ldots ,G\) are obtained by just a single call of some (possibly expensive) black box sampling method. In all of our numerical examples in Sect. 4, the computation time for this sampling procedure tremendously outweighs the pure computation time of the remaining steps of Algorithm 1, which we referred to as computational complexity in Theorem 1.1. Hence, we stress that the main focus of this complexity result is not the computational complexity, but rather the number of sampling nodes.
The proof of the theorem is given in Appendix B. While Theorem 1.1 is stated for trigonometric polynomials \(p^{(g)}\) only, the algorithm can also be used on the above-mentioned periodic or periodized functions \(u_{{\varvec{x}}_g}({\varvec{y}})\) to compute the support and values of the approximately largest Fourier coefficients of these functions with some thresholding technique, realized by the parameters \(s_\mathrm {local}\) and \(\theta \) in Algorithm 1, which is also the key idea when applying the sFFT for function approximation in [25, 32, 39]. More generally speaking, we could even consider G different periodic functionals \(F_g({\varvec{y}})\) and approximate them with the same approach we are about to present here. Moreover, Theorem 1.1 does not assume the frequency sets \(I_g\) to share any frequencies \({\varvec{k}}\), i.e., these sets could even be pairwise disjoint in the worst case scenario. Obviously, this will not be the case in our examples later on, as the functions \(u_{{\varvec{x}}_g}({\varvec{y}})\) and \(u_{{\varvec{x}}_{\tilde{g}}}({\varvec{y}})\) are probably very similar for \({\varvec{x}}_g\) and \({\varvec{x}}_{\tilde{g}}\) close to each other due to the smoothness of the solution \(u({\varvec{x}},{\varvec{y}})\). Hence, the given complexities, especially the quadratic dependency on G of the computational complexity, are very pessimistic and should be seen as worst-case estimates.
The crucial advantage of the presented approach is the efficient and adaptive choice of the frequency set performed by the underlying sFFT. Most of the approaches in the aforementioned works are based on certain (tensorized) basis functions [2,3,4,5,6, 9, 11, 22], Quasi-Monte Carlo methods [9, 14, 15, 18, 19, 23, 30, 31, 36, 37, 40] or collocation methods [9, 44] and often assume that the particular involved basis functions, weights, index sets or kernels need to be known, chosen or computed in advance. A common example is given by compressed sensing techniques, see, e.g., [1, Ch. 7] or [17] and the references therein for an overview, where the considered index sets need to be chosen in advance. The adaptivity of our algorithm circumvents this step and therefore provides much more freedom in finding a good sparse approximation. Also note that our method aims for the approximation of the solution \(u({\varvec{x}},{\varvec{y}})\) directly instead of, e.g., just a high-dimensional quadrature. As an example and for additional information, we refer to [28] as a short and general introduction to Quasi-Monte Carlo methods, which is also one of the most common approaches.
Another suitable approach is to use an efficient deterministic sampling strategy which guarantees the reconstruction of every s-sparse signal supported on the d-dimensional candidate set \(\Gamma \), a so-called deterministic sparse Fourier transform. Such a method allows using the same samples for approximating all signals \(u_{{\varvec{x}}_g}\), \(g=1,\ldots , G\). Several works like [21] already provide applicable multivariate sparse Fourier transform results, but the resulting complexities \({\mathcal {O}}(d^4s^2)\) (up to logarithmic factors) in a deterministic setting and \({\mathcal {O}}(d^4s)\) for a random variant scale suboptimally in the dimension d. Another fully deterministic method is stated in [34]. There, the construction of the sampling sets suffers from some minor restrictions on the considered frequency set, which result in a slightly better-scaling sampling complexity \({\mathcal {O}}(d^3s^2N)\) and computational complexity \({\mathcal {O}}(d^3s^2N^2)\) (both again up to logarithmic factors), with N the side length of the considered cube in frequency domain. Finally, in [20] multivariate sparse Fourier transforms are presented, where the corresponding complexities again scale quadratically in s for a deterministic version and linearly for a Monte Carlo version, while the dimension d only enters linearly. Unfortunately, this is only true if the considered candidate sets do not scale exponentially w.r.t. the dimension d. Otherwise, the size M of the reconstructing rank-1 lattices used also scales exponentially in d, and logarithmic factors like \(\log ^{11/2} M\) in the complexities then again result in a suboptimal dimension scaling.
Our usFFT is highly adaptive, since it only needs an arbitrary candidate set \(\Gamma \) and selects the important frequencies \({\varvec{k}}\) in this search domain on its own. The cardinality \(\vert \Gamma \vert \) of this candidate set is not as crucial as for other approaches like those mentioned above, since the number of used samples and the computation time suffer only mildly from larger candidate sets. Again, we stress that we may also extract additional information about the influence and the interactions of the random variables \(y_j\) from the output of the usFFT. For instance, we detect a maximum of only four simultaneously active dimensions in the detected frequencies in our numerical examples, i.e., the detected frequency vectors \({\varvec{k}}\) have at most four nonzero components with \(d = 10\) or even \(d =20\). Another main advantage of our algorithm is its non-intrusive and parallelizable behavior. As already mentioned, the usFFT uses existing numerical solvers of the considered differential equation. We can use suitable, reliable and efficient solvers with no need to reimplement them. Further, the different samples needed in each sampling step can be computed on multiple instances. This parallelization allows us to reduce the computation time even further and makes a higher number of used samples less time consuming.
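The count of "simultaneously active dimensions" mentioned above is simply the number of nonzero components of a detected frequency vector. A minimal sketch (the frequency set shown is made up purely for illustration, not taken from the numerical examples):

```python
import numpy as np

def active_dimensions(freqs):
    """Number of nonzero components of each frequency vector k."""
    return np.count_nonzero(np.asarray(freqs), axis=1)

# Illustrative detected frequency set with d = 10
I = np.array([[1,  0, 0, 0, 0, 0, 0, 0, 0, 0],
              [2, -1, 0, 0, 3, 0, 0, 0, 0, 0],
              [0,  0, 1, 1, 0, 0, 1, 0, 1, 0]])
max_active = int(active_dimensions(I).max())   # at most 4 dimensions interact
```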
The remainder of the paper is organized as follows:
In Sect. 2 we set up some notation and assumptions and briefly explain the key idea of the sFFT algorithm. Section 3 is devoted to the explanation of the usFFT as well as some periodizations required for the affine and lognormal cases. Finally, in Sect. 4 we test the new algorithm on different examples using periodic, affine, and lognormal random coefficients and investigate the computed approximations under different aspects.
The MATLAB^{®} source code of the algorithm as well as demos for our numerical examples can be downloaded from https://mytuc.org/fyfw.
2 Prerequisites
We consider the PDE problem (1.1). Note that we always assume f to be independent of the random variable \({\varvec{y}}\) and zero boundary conditions just for simplicity and to preserve clarity. Our algorithm (up to some minor changes) may also be applied for right-hand sides \(f({\varvec{x}},{\varvec{y}})\) as well as nonzero Dirichlet boundary conditions \(u({\varvec{x}},{\varvec{y}}) = h({\varvec{x}},{\varvec{y}})\) for all \({\varvec{x}}\in \partial D_{{\varvec{x}}}\).
2.1 Problem setting
The weak formulation of our problem reads: Given \(f \in H^{1}(D_{{\varvec{x}}})\), for every \({\varvec{y}}\in D_{{\varvec{y}}}\), find \(u(\cdot ,{\varvec{y}}) \in H_0^1(D_{{\varvec{x}}})\), such that
As usual, \(H_0^1(D_{{\varvec{x}}})\) denotes the subspace of the \(L_2\)-Sobolev space \(H^1(D_{{\varvec{x}}})\) with vanishing trace on \(\partial D_{{\varvec{x}}}\), and \(H^{-1}(D_{{\varvec{x}}})\) denotes the dual space of \(H_0^1(D_{{\varvec{x}}})\). We say that the diffusion coefficient \(a: D_{{\varvec{x}}} \times D_{{\varvec{y}}} \rightarrow {\mathbb {R}}\) fulfills the uniform ellipticity assumption if there exist two constants \(a_{\min } \in {\mathbb {R}}\) and \(a_{\max } \in {\mathbb {R}}\), such that
Then, the Lax–Milgram lemma ensures that the problem (1.1) possesses a unique solution \(u(\cdot ,{\varvec{y}}) \in H_0^1(D_{{\varvec{x}}})\) for every fixed \({\varvec{y}}\in D_{{\varvec{y}}}\), satisfying the a priori estimate
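The two displays referenced above were not reproduced in this extraction. Presumably they are the standard uniform ellipticity condition and the resulting Lax–Milgram a priori bound; the following reconstruction is a sketch consistent with the surrounding definitions, not taken verbatim from the source:

```latex
0 < a_{\min} \le a(\boldsymbol{x},\boldsymbol{y}) \le a_{\max} < \infty
\quad \text{for a.e.\ } \boldsymbol{x} \in D_{\boldsymbol{x}}
\text{ and all } \boldsymbol{y} \in D_{\boldsymbol{y}},
\qquad
\Vert u(\cdot,\boldsymbol{y}) \Vert_{H_0^1(D_{\boldsymbol{x}})}
\le \frac{\Vert f \Vert_{H^{-1}(D_{\boldsymbol{x}})}}{a_{\min}} .
```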
Some further basic information and results on approximation and smoothness of the solution u of high-dimensional parametric PDEs can be found in [10, Sects. 1 and 2]. Additionally, we also refer to the general results on best n-term approximations given in [10, Sect. 3.1], since our Fourier approach fits in this particular framework as well.
In order to compute an approximation of the solution \(u_{{\varvec{x}}_g} :=u({\varvec{x}}_g,\cdot )\) at a given point \({\varvec{x}}_g \in D_{{\varvec{x}}}\) using the dimension-incremental method explained below, we need samples of \(u_{{\varvec{x}}_g}\) at many sampling nodes \({\varvec{y}}\). We aim for a non-intrusive approach and therefore use a finite element method to solve the problem (1.1) for a given \({\varvec{y}}\in D_{{\varvec{y}}}\). A similar approach is used, e.g., in [36, 37], where the finite element method is used to solve the PDE for any \({\varvec{y}}_{\mathfrak {u}}\) with \(\mathfrak {u} \subset {\mathbb {N}}\) and \(({\varvec{y}}_{\mathfrak {u}})_j = y_j\) for \(j\in \mathfrak {u}\) and 0 otherwise. The corresponding approximations of the so-called \(\mathfrak {u}\)-truncated solution are then used for their particular method as well. In our case, we just evaluate the finite element solution \(\check{u}(\cdot ,{\varvec{y}})\) at the given point \({\varvec{x}}_g \in D_{{\varvec{x}}}\). In particular, any differential equation solver capable of computing the value \(u({\varvec{x}}_g,{\varvec{y}})\) for given \({\varvec{x}}_g\) and \({\varvec{y}}\) would fit instead of the finite element method. Hence, we also refer to this sampling method as black box sampling later on.
Note that we will use the finite element solution \(\check{u}\) also as an approximation of the true solution u, when we test the accuracy of our computed approximation \(u^{\texttt {usFFT}}\) in Sect. 4. In detail, we have
where \(\text {err}(\cdot ,\cdot )\) is a suitable metric symbolizing the error. So while we only investigate the second term \(\text {err}(\check{u},u^{\texttt {usFFT}})\) in our numerical tests later, the first term includes other error sources such as the modeling error, e.g., from a dimension truncation of an infinite-dimensional random coefficient a, or the error coming from the finite element approximation itself. For a particular example of this, we refer to the detailed error analysis for the periodic model mentioned in Sect. 1, which is given in [22, Sect. 4].
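The error splitting invoked by "In detail, we have" above was not reproduced in this extraction; presumably it is the triangle-type estimate over the finite element solution \(\check{u}\) (a hedged reconstruction, not verbatim from the source):

```latex
\text{err}\big(u,\,u^{\texttt{usFFT}}\big)
\;\le\;
\text{err}\big(u,\,\check{u}\big)
\;+\;
\text{err}\big(\check{u},\,u^{\texttt{usFFT}}\big),
```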
2.2 The dimension-incremental method for s-sparse periodic functions
The following dimension-incremental method was presented in [39]. The aim of this algorithm is to determine the nonzero Fourier coefficients \(\hat{p}_{{\varvec{k}}} \in {\mathbb {C}}, \, {\varvec{k}}\in I,\) of a multivariate trigonometric polynomial
with unknown frequency set \(I\subset {\mathbb {Z}}^d,\, \vert I \vert < \infty ,\) based on samples of the polynomial p. Obviously, p is a periodic signal and its domain is the d-dimensional torus \({\mathbb {T}}^d\), \({\mathbb {T}}\simeq [0,1)\).
The goal is not only to calculate the nonzero Fourier coefficients \(\hat{p}_{{\varvec{k}}}\) but also, and more importantly, to detect the frequencies \(\varvec{k}\) out of a possibly huge search domain \(\Gamma \subset {\mathbb {Z}}^d\) belonging to the nonzero Fourier coefficients. In particular, we define the set
and call the cardinality \(\vert {\text {supp}}\hat{p} \vert \) the sparsity of p.
First, we introduce some further notation. We consider a given search domain \(\Gamma \subset {\mathbb {Z}}^d, \, \vert \Gamma \vert < \infty ,\) that should be large enough to contain the unknown frequency set \({I\subset \Gamma }\). We denote the projection of a frequency \({\varvec{k}}:=(k_1,\ldots ,k_d) \in {\mathbb {Z}}^d\) to the components \({\varvec{i}}:=(i_1,\ldots ,i_m) \in \lbrace \iota \in \lbrace 1,\ldots ,d \rbrace ^m : \iota _t \not = \iota _{t^{\prime }} \text { for } t \not = t^{\prime } \rbrace \) by \({\mathcal {P}}_{{\varvec{i}}}({\varvec{k}}) :=(k_{i_1},\ldots ,k_{i_m}) \in {\mathbb {Z}}^m\). Correspondingly, we define the projection of a frequency set \(I\subset {\mathbb {Z}}^d\) to the components \({\varvec{i}}\) by \({\mathcal {P}}_{{\varvec{i}}}(I) :=\lbrace (k_{i_1},\ldots ,k_{i_m}): {\varvec{k}}\in I\rbrace \). Using these notations, the general approach is the following:
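The projection operators \({\mathcal P}_{\varvec{i}}\) just defined can be sketched in a few lines. The names `project_freq` and `project_set` are illustrative, with the index tuple `i` given in the 1-based convention of the text:

```python
def project_freq(k, i):
    """P_i(k): keep the components of k listed in the 1-based index tuple i."""
    return tuple(k[j - 1] for j in i)

def project_set(I, i):
    """P_i(I): project every frequency in I; duplicates merge in the set."""
    return {project_freq(k, i) for k in I}

I = {(1, 0, 2), (1, 5, 2), (-3, 0, 0)}
P_13 = project_set(I, (1, 3))      # project onto the components (k_1, k_3)
```

Note that the two frequencies \((1,0,2)\) and \((1,5,2)\) coincide after projecting onto dimensions 1 and 3, so the projected set has only two elements.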
2.3 Sketch of dimension-incremental reconstruction

1. Compute the first components of the unknown frequency set from sampling values, i.e., determine a set \(I^{(1)} \subset {\mathcal {P}}_1(\Gamma )\), such that \({\mathcal {P}}_1(\text {supp } \hat{p}) \subset I^{(1)}\) holds.

2. For dimension increment step \(t=2,\ldots ,d\), i.e., for each additional dimension:

   (a) Compute the t-th components of the unknown frequency set from sampling values, i.e., determine a set \(I^{(t)} \subset {\mathcal {P}}_t(\Gamma )\), such that \({\mathcal {P}}_t(\text {supp } \hat{p}) \subset I^{(t)}\) holds.

   (b) Construct a suitable sampling set \({\mathcal {X}}^{(1,\ldots ,t)} \subset {\mathbb {T}}^d,\, \vert {\mathcal {X}}^{(1,\ldots ,t)} \vert \ll \vert \Gamma \vert \), which allows detecting those frequencies from the set \((I^{(1,\ldots ,t-1)} \times I^{(t)}) \cap {\mathcal {P}}_{(1,\ldots ,t)}(\Gamma )\) belonging to nonzero Fourier coefficients \(\hat{p}_{{\varvec{k}}}\).

   (c) Sample the trigonometric polynomial p along the nodes of the sampling set \({\mathcal {X}}^{(1,\ldots ,t)}\).

   (d) Compute the Fourier coefficients \(\tilde{\hat{p}}_{(1,\ldots ,t),{\varvec{k}}},\, {\varvec{k}}\in (I^{(1,\ldots ,t-1)} \times I^{(t)}) \cap {\mathcal {P}}_{(1,\ldots ,t)}(\Gamma )\).

   (e) Determine the nonzero Fourier coefficients from \(\tilde{\hat{p}}_{(1,\ldots ,t),{\varvec{k}}},\, {\varvec{k}}\in (I^{(1,\ldots ,t-1)} \times I^{(t)}) \cap {\mathcal {P}}_{(1,\ldots ,t)}(\Gamma )\) and obtain the set \(I^{(1,\ldots ,t)}\) of detected frequencies. Ideally, this index set equals the projection \({\mathcal {P}}_{(1,\ldots ,t)} (\text {supp }\hat{p})\).

3. Use the set \(I^{(1,\ldots ,d)}\) and the computed Fourier coefficients \(\tilde{\hat{p}}_{(1,\ldots ,d),{\varvec{k}}},\, {\varvec{k}}\in I^{(1,\ldots ,d)}\) as an approximation for the support \(\text {supp } \hat{p}\) and the Fourier coefficients \(\hat{p}_{{\varvec{k}}},\, {\varvec{k}}\in \text {supp } \hat{p}\).
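The loop structure of the sketch above can be illustrated on a toy example. In this sketch the sampling-based computation of projected Fourier coefficients is replaced by an exact oracle on a known sparse polynomial, so cancellation effects and the repeated random projections used in practice are deliberately ignored; all names are illustrative:

```python
def detect(coeffs, Gamma_1d, d, theta=1e-12):
    """Dimension-incremental frequency detection for a known sparse
    trigonometric polynomial given by its coefficients {k: p_hat_k}."""
    def proj(dims, vals):
        # |projected Fourier coefficient|: all frequencies of the polynomial
        # agreeing with `vals` on the (1-based) dimensions `dims` add up
        return abs(sum(c for k, c in coeffs.items()
                       if all(k[j - 1] == v for j, v in zip(dims, vals))))

    I = [(k,) for k in Gamma_1d if proj((1,), (k,)) > theta]        # step 1
    for t in range(2, d + 1):                                       # step 2
        I_t = [k for k in Gamma_1d if proj((t,), (k,)) > theta]     # step 2a
        cand = [p + (k,) for p in I for k in I_t]                   # steps 2b-2d
        I = [k for k in cand
             if proj(tuple(range(1, t + 1)), k) > theta]            # step 2e
    return set(I)                                                   # step 3

# Toy polynomial with support {(1, 0, 2), (0, 3, -1)} in d = 3 dimensions
coeffs = {(1, 0, 2): 1.0, (0, 3, -1): 0.5}
I_detected = detect(coeffs, range(-4, 5), 3)
```

In the actual sFFT, `proj` is realized by sampling along suitably constructed (rank-1 lattice) node sets, and the detection is repeated for r random projections to guard against the cancellations the oracle here cannot exhibit.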
Note that this method can also be used for the numerical determination of the approximately largest Fourier coefficients
of sufficiently smooth periodic signals f using a suitable thresholding technique, cf. [25, 32, 39].
The proposed approach includes the construction of suitable sampling sets in step 2b. To this end, one assumes that an upper bound \(s\ge \vert {\text {supp}}\hat{p} \vert \) is known and one constructs the sampling sets \({\mathcal {X}}^{(1,\ldots ,t)}\) such that the Fourier coefficients \(\tilde{\hat{p}}_{(1,\ldots ,t),{\varvec{k}}}\) computed in step 2d are randomly projected ones. Due to that projection, one may observe cancellations with the effect that one misses active frequencies. For that reason, one repeats the computation of the projected Fourier coefficients for a number r of random projections and then takes the union.
Of course, there exist different methods for the computation of the projected Fourier coefficients. The algorithm works with any sampling method, which computes Fourier coefficients on a given frequency set. Preferable sampling sets combine the four properties:

- relatively low number of sampling nodes (sampling complexity),
- stability,
- efficient construction methods for the sampling set,
- fast Fourier transform-like algorithms in order to compute the projected Fourier coefficients.
Especially due to the last point, we will call the dimension-incremental method the sparse Fast Fourier Transform (sFFT) from now on. A quick sketch of the sampling techniques used in [25, 32, 39] as well as the sample complexity and computational complexity of the sFFT using these methods are given in Appendix A.
3 The uniform sparse FFT
Up to now, the sFFT algorithm is a suitable tool to compute an approximation of the solution
for a single \({\varvec{x}}_g\). When considering a whole set of points \({\varvec{x}}_g \in {{\mathcal {T}}}_G\), \(\vert {{\mathcal {T}}}_G \vert = G\), we have to call the existing method G times. But multiple, independent calls of the sFFT result in different, adaptively determined sampling sets \({\mathcal {X}}^{(1,\ldots ,t)}\) in step 2b. Hence, we cannot guarantee that the solutions of the differential equation from one run of the algorithm can be utilized in another one. So we really need G full calls of the sFFT including all sampling computations and therefore end up with unnecessarily many samples, even when using the sample-efficient rank-1 lattice (R1L) approaches. Remember that sampling means solving the differential equation with a call of the underlying differential equation solver, which might be very expensive in computation time. Therefore, we now modify the dimension-incremental method such that we can work on the set \({{\mathcal {T}}}_G\) and one call of the algorithm computes approximations of the most important Fourier coefficients \(c_{{\varvec{k}}}(u_{{\varvec{x}}_g})\), \({\varvec{k}}\in I_{{\varvec{x}}_g}\), for each \(g=1,\ldots ,G\), including a clever choice of the sampling nodes \({\varvec{y}}\).
3.1 Expanding the sFFT
The full method is stated in Algorithm 1. We force the dimension-incremental method to select a set \(I\subset \Gamma \subset {\mathbb {Z}}^d\) containing the frequencies of the s approximately largest Fourier coefficients \(c_{{\varvec{k}}}(u_{{\varvec{x}}_g})\) for each \({\varvec{x}}_g \in {{\mathcal {T}}}_G\). To this end, we compute the set of detected frequencies \(I_{{\varvec{x}}_g}^{(1,\ldots ,t)}\) for each \({\varvec{x}}_g\) in each dimension increment t, but afterwards we form the union of these sets \(\bigcup _{g=1}^{G} I_{{\varvec{x}}_g}^{(1,\ldots ,t)}\), which will be the set of detected frequencies \(I^{(1,\ldots ,t)}\) that is passed to the next dimension-incremental step \(t+1\).
Now, we start each iteration with a larger frequency candidate set \((I^{(1,\ldots ,t-1)} \times I^{(t)}) \cap {\mathcal {P}}_{(1,\ldots ,t)}(\Gamma )\), which is suitable for all \({\varvec{x}}_g \in {{\mathcal {T}}}_G\). This way, the first t components of the elements of the sampling set \({\mathcal {X}}^{(1,\ldots ,t)}\) are the same for each \({\varvec{x}}_g\), and the random part, which causes the specific random projection of the Fourier coefficients, can be chosen equally for each \({\varvec{x}}_g\) without disturbing the algorithm. Therefore, we can now take advantage of the fact that our underlying differential equation solver can evaluate the solutions \(u({\varvec{x}},{\varvec{y}})\) for a given \({\varvec{y}}\) at multiple values of \({\varvec{x}}\) in the domain \(D_{{\varvec{x}}}\). Accordingly, we only need to solve the differential equation once for each sampling node \({\varvec{y}}\) and still get all the sampling values \(u_{{\varvec{x}}_g}({\varvec{y}})\) for all \({\varvec{x}}_g\) in our finite set \({{\mathcal {T}}}_G\). Note that this also holds for the one-dimensional detections in steps 1 and 2a. Also, we might need to interpolate or approximate if some of the values \(u_{{\varvec{x}}_g}({\varvec{y}})\), \({\varvec{x}}_g\in {{\mathcal {T}}}_G\) and fixed \( {\varvec{y}}\in {\mathcal {X}}^{(1,\ldots ,t)}\), are not directly given by the differential equation solver. Obviously, the larger candidate sets \((I^{(1,\ldots ,t-1)} \times I^{(t)}) \cap {\mathcal {P}}_{(1,\ldots ,t)}(\Gamma )\), resulting from the union of the sets \(I_{{\varvec{x}}_g}^{(1,\ldots ,t-1)}\), \(g=1,\ldots ,G\), and the union of the sets \(I_{{\varvec{x}}_g}^{(t)}\), \(g=1,\ldots ,G\), will also result in larger sampling sets. The overall increase in the number of sampling locations is still very reasonable, cf. Theorem 1.1. The computational complexity suffers somewhat more from these modifications, but is not as important as the number of sampling locations, cf. Remark 1.2.
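The union step described above, in contrast to G independent sFFT runs, can be illustrated in isolation; the per-node frequency sets below are made up:

```python
# After dimension increment t, each spatial node x_g has its own set of
# detected frequencies I_xg^(1,...,t).  Merging them yields one candidate
# set, so a single sampling set can serve all nodes in step t + 1.
detected_per_node = {
    "x_1": {(1, 0), (0, 2)},
    "x_2": {(1, 0), (2, -1)},
    "x_3": {(0, 2)},
}
I_union = set().union(*detected_per_node.values())
# I_union now plays the role of I^(1,...,t) in the next dimension increment
```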
We could also think of further thresholding methods to cut the number of frequencies back to the sparsity parameter s at the end of each dimension-incremental step or at least at the end of the whole algorithm. In this work, we will not do this, but take a look at the total number of frequencies in the output of the algorithm in relation to the sparsity parameter s, cf. Remark 4.1.
Overall, the proposed method is now capable of computing approximations
for all nodes \({\varvec{x}}_g, g=1,\ldots ,G.\) Please note that step 2 does not provide \((c_{\varvec{k}}^{\texttt {usFFT}}(u_{{\varvec{x}}_g}))_{{\varvec{k}}\in I}\) but only approximations of \((c_{\varvec{k}}^{\texttt {usFFT}}(u_{{\varvec{x}}_g}))_{{\varvec{k}}\in \tilde{J}_{d,1,g}}\), where \(\tilde{J}_{d,1,g}\subsetneq I^{(1,\ldots ,d)}=I\) holds in general. In order to compute all the Fourier coefficients \(c_{\varvec{k}}^{\texttt {usFFT}}(u_{{\varvec{x}}_g})\), \({\varvec{k}}\in I\) and \(g=1,\ldots , G\), we propose an additional approximation in step 3. For this approximation, the user can apply a suitable approach of their choice. The chosen method should just compute an approximation of the projection onto the already determined space of trigonometric polynomials \(\text {span}\{\exp (2\pi \text {i}{\varvec{k}}\cdot \circ ):{\varvec{k}}\in I\}\), cf., e.g., [24, 26, 27] for different possible sampling approaches.
We will call this modified version of the sFFT the uniform sFFT, or usFFT for short, from now on, where uniform is meant w.r.t. the discrete set of points \({{\mathcal {T}}}_G\). The main differences from the sFFT algorithms in [25, 32, 39] are the loops over g and the corresponding unions of the frequency sets. Possible choices for Algorithm \({\text {A}}\) and the approximation approach used in step 3 of Algorithm 1 are a random rank-1 lattice approach and a multiple rank-1 lattice approach, respectively, which actually leads to Theorem 1.1, cf. Appendix A and Appendix B.
3.2 Periodization
The usFFT allows us to reconstruct a frequency set \(I\) and approximations \(c_{{\varvec{k}}}^{\texttt {usFFT}}(u_{{\varvec{x}}_g})\) of the corresponding Fourier coefficients \(c_{{\varvec{k}}}(u_{{\varvec{x}}_g})\) for each \({\varvec{x}}_g \in {{\mathcal {T}}}_G\). Unfortunately, this approach requires the function \(u({\varvec{x}},{\varvec{y}})\) to be 1-periodic w.r.t. each of the d stochastic components of \({\varvec{y}}\).
Since the right-hand side \(f({\varvec{x}})\) does not depend on \({\varvec{y}}\) in our considerations, the random coefficient a is the only given function involving the random variable \({\varvec{y}}\) in the problem (1.1). In periodic models, we use the random coefficient (1.2) with 1-periodic functions \(\Theta _j({\varvec{y}})\). Hence, the random coefficient \(a({\varvec{x}},{\varvec{y}})\) is 1-periodic and thus the solution \(u({\varvec{x}},{\varvec{y}})\) is also 1-periodic w.r.t. each component of \({\varvec{y}}\). Therefore, we can apply the usFFT directly for this model without any further considerations.
In order to apply the usFFT when using the affine and lognormal models, we need to apply a suitable periodization first, since the random coefficient a and therefore the solution u are not periodic in general. Note that we assume the random variable to be uniformly distributed in the affine case, i.e., \({\varvec{y}}\sim {\mathcal {U}}([\alpha ,\beta ]^{d})\), and standard normally distributed in the lognormal case, i.e., \({\varvec{y}}\sim {\mathcal {N}}(0,1)^{d}\) and recall \({\mathbb {T}}\simeq [0,1)\).
3.2.1 Affine case
We consider the function, 1-periodic in \(\tilde{{\varvec{y}}}\),
with \(\varphi \) being some suitable transformation function, i.e.,
With this approach, the usFFT is able to compute approximations of the functions \({\tilde{u}_{{\varvec{x}}_g} :=\tilde{u}({\varvec{x}}_g,\cdot )}\) for each \({\varvec{x}}_g \in {{\mathcal {T}}}_G\). We want \(\varphi \) to act componentwise on the random variable, i.e., \(\varphi (\tilde{{\varvec{y}}}) :=\left( \varphi _j(\tilde{y}_j) \right) _{j=1}^{d}\). Further, we assume that these mappings \(\varphi _j\) fulfill the following assumptions:

(A1)
Each \(\varphi _j\) is continuous, i.e., \(\varphi _j \in C({\mathbb {T}})\) for each \(j=1,\ldots ,d\).

(A2)
It holds \(\varphi _j(0) = \varphi _j(1) = \alpha \) and \(\varphi _j(\frac{1}{2}) = \beta \) for each \(j=1,\ldots ,d\).

(A3)
Each \(\varphi _j\) is symmetric, i.e., \(\varphi _j(\frac{1}{2}-\tilde{y}) = \varphi _j(\frac{1}{2}+\tilde{y})\) for \(\tilde{y} \in [0, \frac{1}{2}]\) and for each \(j=1,\ldots ,d\).

(A4)
Each \(\varphi _j\) is strictly monotonically increasing on \([0,\frac{1}{2}]\), \(j=1,\ldots ,d\).
With these restrictions, we ensure that \(\varphi \) restricted to \([0,\frac{1}{2}]^{d}\) is bijective. Hence, we can define the inverse mapping \(\varphi ^{-1}: [\alpha ,\beta ]^{d} \rightarrow [0,\frac{1}{2}]^{d}\).
With this inverse mapping, we are now able to compute approximations of the functions \(u_{{\varvec{x}}_g}({\varvec{y}})\) via
with the finite index set \(I\) and the approximated Fourier coefficients \(c_{{\varvec{k}}}^{\texttt {usFFT}}(\tilde{u}_{{\varvec{x}}_g})\) from the usFFT applied to the functions \(\tilde{u}_{{\varvec{x}}_g}\), \(g=1,\ldots ,G\).
In this work, we always consider the tent transformation, cf. [12, 33, 43], for each \(\varphi _j\), i.e.,
Although this transformation mapping fulfills the assumptions (A1)–(A4), it might not be the most favorable choice in specific applications due to its lack of smoothness. Smoother periodizations, e.g., [7, Sect. 2.2.2], might yield better approximation results in specific situations due to the faster decay of the Fourier coefficients of \(\tilde{u}\). On the other hand, the linear structure of the tent transformation on the interval \([0,\frac{1}{2}]\) allows some simplifications later on.
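For concreteness, the tent transformation and the linear branch of its inverse can be implemented as follows; a small sketch where the interval bounds \(\alpha =-1\), \(\beta =1\) are example defaults:

```python
def tent(y_tilde, alpha=-1.0, beta=1.0):
    """Tent transformation phi_j: maps T to [alpha, beta]; 1-periodic and piecewise linear."""
    y_tilde = y_tilde % 1.0                      # enforce 1-periodicity
    return alpha + (beta - alpha) * (1.0 - abs(2.0 * y_tilde - 1.0))

def tent_inv(y, alpha=-1.0, beta=1.0):
    """Inverse of the linear branch on [0, 1/2], cf. assumptions (A1)-(A4)."""
    return (y - alpha) / (2.0 * (beta - alpha))

# check assumptions (A2)-(A4) numerically
assert tent(0.0) == tent(1.0) == -1.0 and tent(0.5) == 1.0                        # (A2)
assert all(abs(tent(0.5 - t) - tent(0.5 + t)) < 1e-15 for t in (0.1, 0.25, 0.4))  # (A3)
assert abs(tent_inv(tent(0.3)) - 0.3) < 1e-15   # bijective on the branch [0, 1/2]
```

The two kinks at \(\tilde{y} = 0\) and \(\tilde{y} = \frac{1}{2}\) are the lack of smoothness referred to above.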
3.2.2 Lognormal case
As in the affine case, we need a suitable, periodic transformation mapping \({\varphi : {\mathbb {T}}^{d} \rightarrow D_{{\varvec{y}}} = {\mathbb {R}}^{d}}\) to obtain a periodization \(\tilde{u}({\varvec{x}},\tilde{{\varvec{y}}})\). Again, we choose the same function in each stochastic dimension, i.e., the same \(\varphi _j\) for all \(j=1,\ldots ,d\), but this time \(\varphi _j\) consists of two separate steps. First, we consider the transformation
with the error function
For further information on this transformation, see [35]. This mapping \(\tau _1\) seems like the ideal choice when talking about random variables \({\varvec{y}}\sim {\mathcal {N}}(0,1)\), since the error function \(\text {erf}(y)\) is closely related to its cumulative distribution function \(\Phi \). In detail, it holds
The so-called inversion method in stochastic simulation states that the cumulative distribution function \(\Phi \) and its inverse \(\Phi ^{-1}\) map random variables distributed according to \(\Phi \) to uniformly distributed random variables on [0, 1] and vice versa, cf. [13, Sect. II.2]. Thus, our transformation \(\tau _1\) maps uniformly distributed random variables \(\breve{y} \sim {\mathcal {U}}(-\frac{1}{2},\frac{1}{2})\) to standard normally distributed random variables \(y \sim {\mathcal {N}}(0,1)\) and is therefore a natural generalization when moving on from uniformly distributed random variables.
The second part is a suitable periodization \(\tau _2: {\mathbb {T}}\rightarrow (-\frac{1}{2},\frac{1}{2})\). We choose a similar approach as in the affine case and use a shifted tent transformation
with shift \(\Delta > 0\). We need this shift, since we cannot apply the transformation \(\tau _1\) if \(\tau _{2,\Delta }(\tilde{y})=\pm \frac{1}{2}\) due to the poles there. Shifting by a suitable \(\Delta \) ensures that the deterministic part of the sampling set \({\mathcal {X}}\) does not contain components equal to \(\Delta \) or \(\frac{1}{2} + \Delta \). The randomly chosen values from the interval [0, 1] for the other components almost surely differ from \(\Delta \) and \(\frac{1}{2} + \Delta \) as well. Hence, the sampling set \({\mathcal {X}}\) in Algorithm 1 almost surely contains no node with any component equal to \(\Delta \) or \(\frac{1}{2}+\Delta \).
Now we define the transformation mappings \(\varphi _{j,\Delta }\) for each \(j=1,\ldots ,d\) for the lognormal case as
The mapping \(\varphi _{j,\Delta }\) as well as its two parts \(\tau _1\) and \(\tau _{2,\Delta }\) are visualized in Fig. 1. These mappings fulfill slightly modified versions of the assumptions (A1)–(A4), taking the shift \(\Delta \) into account. Now we can use the transformation \(\varphi _\Delta :=(\varphi _{j,\Delta })_{j=1}^{d}\) to obtain the signals \(\tilde{u}({\varvec{x}}_g,\tilde{{\varvec{y}}}) = u({\varvec{x}}_g,\varphi _\Delta (\tilde{{\varvec{y}}}))\), \({\varvec{x}}_g\in {\mathcal {T}}_G\), periodic in \(\tilde{{\varvec{y}}}\), which can be approximated using our usFFT, cf. Algorithm 1. Plugging the inverse mapping \(\varphi _\Delta ^{-1}\) into the evaluation formula, which is similar to (3.1), we are now able to compute approximations of the solution functions \(u_{{\varvec{x}}_g}({\varvec{y}})\) in the lognormal case as well. Again, the periodization \(\varphi _{\Delta }\) is not smooth and therefore might yield non-optimal approximation results. In particular, the periodization mappings \(\varphi _{j,\Delta }\) possess two poles instead of two kinks, which is a considerably worse smoothness behavior than in the affine case.
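A small numerical sketch of the composite mapping: \(\tau _1\) is realized via the standard normal quantile function, and the concrete form of the shifted tent below is one plausible choice consistent with the stated pole locations at \(\tilde{y} = \Delta \) and \(\tilde{y} = \Delta + \frac{1}{2}\), since the exact formula (3.3) is not reproduced here:

```python
from statistics import NormalDist

_nd = NormalDist()  # standard normal distribution

def tau2_shifted(y_tilde, delta=0.25):
    """Shifted tent tau_{2,Delta}: T -> [-1/2, 1/2].  A plausible concrete
    form (assumption; cf. (3.3)): attains -1/2 at y_tilde = delta and
    +1/2 at y_tilde = delta + 1/2."""
    t = (y_tilde - delta) % 1.0
    return -0.5 + 2.0 * t if t <= 0.5 else 1.5 - 2.0 * t

def tau1(y_breve):
    """Inversion method: maps U(-1/2, 1/2) to N(0, 1) via the normal
    quantile, tau_1(y) = Phi^{-1}(y + 1/2) = sqrt(2) * erfinv(2 y)."""
    return _nd.inv_cdf(y_breve + 0.5)

def phi_lognormal(y_tilde, delta=0.25):
    """Composite periodization phi_{j,Delta} = tau_1 o tau_{2,Delta}."""
    return tau1(tau2_shifted(y_tilde, delta))

# tau_1 maps the median to 0 and is antisymmetric
assert abs(tau1(0.0)) < 1e-12
assert abs(tau1(0.3) + tau1(-0.3)) < 1e-12
# phi is 1-periodic in y_tilde
assert abs(phi_lognormal(0.1) - phi_lognormal(1.1)) < 1e-12
```

At the poles, the quantile function diverges, which is precisely why the sampling nodes must avoid the components \(\Delta \) and \(\frac{1}{2}+\Delta \).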
We now ask for the optimal choice of the parameter \(\Delta \) such that the deterministic components of the sampling nodes \(\tilde{y}\) of the R1Ls in the usFFT are as far away as possible from \(\Delta \) and \(\Delta +1/2\), in order to reduce problems at the poles of the transformation mapping \(\varphi _\Delta \). Let
denote the jth component of the ith R1L node of the ddimensional R1L of size M. Then, we are looking for \(\Delta \), such that the minimum of the two distances
is maximal.
Lemma 3.1
Let \(\Lambda ({\varvec{z}},M)\) be a d-dimensional R1L with prime lattice size \(M\in {\mathbb {N}}\), \(M>2,\) generating vector \({\varvec{z}}\in {\mathbb {Z}}^d\) with \(z_{j_0} \not \equiv 0 \ (\text {mod } M)\) for at least one \(j_0\in \{1,\ldots ,d\}\), and the lattice nodes \(\tilde{y}_{i,j}\) as defined in (3.4). Then, we have
The proof of Lemma 3.1 is given in Appendix C.
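For a concrete small lattice, the optimal shift can be explored by brute force; a sketch assuming the standard node definition \(\tilde{y}_{i,j} = \frac{i z_j}{M} \bmod 1\) from (3.4):

```python
from fractions import Fraction

def lattice_components(z, M):
    """All component values (i * z_j / M) mod 1 of the rank-1 lattice nodes,
    as exact fractions."""
    return {Fraction(i * zj % M, M) for i in range(M) for zj in z}

def circ_dist(a, b):
    """Distance on the torus T = [0, 1)."""
    d = abs(a - b) % 1
    return min(d, 1 - d)

def min_pole_dist(delta, comps):
    """Smallest distance of any lattice component to the poles at delta
    and delta + 1/2."""
    return min(min(circ_dist(c, delta) for c in comps),
               min(circ_dist(c, delta + Fraction(1, 2)) for c in comps))

M, z = 7, (1, 3, 5)                    # small example: prime lattice size, 3 dimensions
comps = lattice_components(z, M)       # here: all multiples ell/7
best = max((Fraction(n, 8 * M) for n in range(8 * M)),
           key=lambda delta: min_pole_dist(delta, comps))
print(min_pole_dist(best, comps))      # 1/28 for this example, i.e., 1/(4M)
```

Exact rational arithmetic via `Fraction` avoids rounding issues when comparing the candidate shifts.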
4 Numerics
We will now test the usFFT on several two-dimensional numerical examples. In particular, we consider the parametric PDE (1.1) with zero boundary conditions and different random coefficients \(a({\varvec{x}},{\varvec{y}})\) and right-hand sides \(f({\varvec{x}})\).
Since our algorithm yields an approximation \(u_{{\varvec{x}}_g}^{\texttt {usFFT}}({\varvec{y}})\) for each \({\varvec{x}}_g \in {{\mathcal {T}}}_G\) separately, we also compute the approximation error
and
for each \({\varvec{x}}_g \in {{\mathcal {T}}}_G\) separately, using \(n_{\text {test}}=10^5\) randomly drawn test variables \({\varvec{y}}^{(j)}\) from the underlying probability distribution. Here, \(\check{u}(\cdot ,{\varvec{y}}^{(j)})\) are the finite element solutions of the PDE for fixed parameters \({\varvec{y}}^{(j)}\) and \(u^{\texttt {usFFT}}({\varvec{x}}_g,\cdot )\) are our approximations from the usFFT. The parameter \(\eta \) indexes the sFFT parameter settings N, s and \(s_{\text {local}}\) given in Table 1.
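Since the displayed error formulas are not reproduced above, the following sketch assumes the natural empirical form, a p-th root of the mean p-th power of the pointwise differences over the test samples:

```python
import math
import random

def err_p(u_ref, u_approx, y_samples, p):
    """Empirical L_p-type approximation error at one spatial node x_g
    (assumed form): ((1/n) * sum_j |u_ref(y_j) - u_approx(y_j)|^p)^(1/p)."""
    n = len(y_samples)
    return (sum(abs(u_ref(y) - u_approx(y)) ** p for y in y_samples) / n) ** (1.0 / p)

rng = random.Random(0)
ys = [rng.uniform(-0.5, 0.5) for _ in range(10_000)]
u_ref = lambda y: math.cos(2.0 * math.pi * y)          # toy reference solution
u_apx = lambda y: math.cos(2.0 * math.pi * y) + 1e-3   # toy approximation, constant offset
print(err_p(u_ref, u_apx, ys, p=2))    # approximately 1e-3
```

In the actual experiments, `u_ref` would be the finite element solution \(\check{u}(\cdot ,{\varvec{y}}^{(j)})\) and `u_apx` the usFFT evaluation.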
Here, N is the extension of the full grid \([-N,N]^{d}\) that is used as the search space \({\Gamma \subset {\mathbb {Z}}^{d}}\). Note that we also choose \(s_{\text {local}} = s\). If we miss an important frequency component at one point \({\varvec{x}}_g\), it is very likely that it is contained in the detected index set of a neighboring mesh point. Therefore, the union over all points \({\varvec{x}}_g\) should be enough to avoid losing frequencies, and we do not need a larger \(s_{\text {local}}\). We choose the threshold \(\theta = 1 \cdot 10^{-12}\) and the number of detection iterations \(r=5\) for all parameter settings \(\eta \); these are common values and the same as in [32]. In particular, we choose the number of detection iterations r as well as the probabilities \(\gamma _{{\text {A}}}\) and \(\gamma _{{\text {B}}}\) as in the case \(G=1\), since we expect a huge overlap of the detected index sets and hence a small failure probability even for these parameter choices instead of the theoretical choices given in the proof of Theorem 1.1.
Further, we always use the random R1L approach in the role of Algorithm \({\text {A}}\) to recover the projected Fourier coefficients in the dimension-incremental method, cf. Sect. 2.2 and [32]. We also tested the algorithm using the single and multiple R1L approaches mentioned in Sect. 2.2, but these did not achieve significantly smaller approximation errors while requiring more samples, and therefore resulted in longer runtimes of the algorithm. Hence, it seems reasonable to stick with the random R1L approach here. We choose the target maximum edge length of the finite element mesh as \(h_{\max } = 0.075\) in the FE solver. All examples consider the spatial domain \(D_{{\varvec{x}}}=[0,1]^2\), resulting in a finite element mesh \({{\mathcal {T}}}_G \subset D_{{\varvec{x}}}\) with \(G=737\) inner and 104 boundary nodes.
Further, we also analyze the importance of and the interactions between our detected Fourier coefficients. To this end, we use the classical ANOVA decomposition of 1-periodic functions as given in [38] or [29, 44]. Note that, for instance in [38], the ANOVA decomposition is already used within the proposed methods to obtain an adaptive selection of the most important approximation terms, which we realize in our method simply by comparing the size of the projected Fourier coefficients, cf. Sects. 2.2 and 3.1.
In particular, we consider the variance of our approximation
Now we can study different subsets \(\mathrm {J}\subset I\) and estimate the variance of the approximation using only these subsets. The fraction of variance that is explained by such a subset \(\mathrm {J}\) is called the global sensitivity index (GSI), see [41, 42],
where we define \(\tilde{u}_{{\varvec{x}}_g, \mathrm {J}}^{\texttt {usFFT}}({\varvec{y}}):=\sum _{{\varvec{k}}\in \mathrm {J}}c_{{\varvec{k}}}^{\texttt {usFFT}}(\tilde{u}_{{\varvec{x}}_g})\,\text {e}^{2\pi \text {i}{\varvec{k}}\cdot \varphi ^{-1}({\varvec{y}})}\). In our examples, we will mainly consider the subsets \(\mathrm {J}_{\ell }\) of all frequencies \({\varvec{k}}\) with exactly \(\ell \) nonzero components, i.e.,
but of course several other choices of subsets \(\mathrm {J}\) might be interesting as well for different applications.
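The GSIs of the subsets \(\mathrm {J}_\ell \) can be computed directly from the approximated Fourier coefficients. The sketch below uses Parseval's identity for the periodized approximation, so its variance is \(\sum _{{\varvec{k}}\ne {\varvec{0}}}\vert c_{{\varvec{k}}}\vert ^2\):

```python
def gsi_by_order(coeffs):
    """Global sensitivity indices for the subsets J_l of frequencies with
    exactly l nonzero components, cf. (4.3)-(4.4).  By Parseval, the variance
    of the periodized approximation is sum_{k != 0} |c_k|^2."""
    variance = sum(abs(c) ** 2 for k, c in coeffs.items() if any(k))
    gsi = {}
    for k, c in coeffs.items():
        l = sum(1 for kj in k if kj != 0)   # number of active dimensions of k
        if l > 0:
            gsi[l] = gsi.get(l, 0.0) + abs(c) ** 2 / variance
    return gsi

# toy coefficients in d = 3: one mean term, two axis terms, one 2D coupling
coeffs = {(0, 0, 0): 5.0, (1, 0, 0): 2.0, (0, -1, 0): 1.0, (1, 1, 0): 1.0}
print(gsi_by_order(coeffs))   # J_1 explains 5/6, J_2 explains 1/6 of the variance
```

The indices over all \(\ell \) sum to 1, since the sets \(\mathrm {J}_\ell \) partition the nonzero frequencies.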
Finally, one can also think about evaluating various quantities of interest of the approximation. Here, we will consider the expectation value \({\mathbb {E}}(u_{{\varvec{x}}_g}^{\texttt {usFFT}})\) as one example of such quantities. We use a Monte Carlo approximation of the expectation value
of the finite element approximation using \({n_{\text {MC}}}\) random samples for comparison.
4.1 Expectation value of the approximation
Computing the expectation value of our approximation \(u_{{\varvec{x}}_g}^{\texttt {usFFT}}\) requires some additional effort, depending on the particular model and on any periodization method used. By definition, the expectation value is given by
where p is the probability density function of the random variable \({\varvec{y}}\).
For the periodic model, we do not need any periodization. Therefore, the approximation of the solution reads as
with \(I\) the frequency set and \(c_{{\varvec{k}}}^{\texttt {usFFT}}(u_{{\varvec{x}}_g})\) the corresponding approximated Fourier coefficients computed by the usFFT. The random variable \({\varvec{y}}\) is assumed to be uniformly distributed in \(D_{{\varvec{y}}} = [-\frac{1}{2},\frac{1}{2}]^{d}\) in this case. Hence, we have
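Spelled out, the exponentials integrate to \(\delta _{{\varvec{k}},{\varvec{0}}}\) over \([-\frac{1}{2},\frac{1}{2}]^{d}\), so only the zeroth coefficient survives; this is consistent with the observation in Sect. 4.2.1 that the expectation value equals \(c_{{\varvec{0}}}^{\texttt {usFFT}}(u_{{\varvec{x}}_g})\):

```latex
{\mathbb {E}}\bigl (u_{{\varvec{x}}_g}^{\texttt {usFFT}}\bigr )
  = \sum _{{\varvec{k}}\in I} c_{{\varvec{k}}}^{\texttt {usFFT}}(u_{{\varvec{x}}_g})
    \int _{[-\frac{1}{2},\frac{1}{2}]^{d}} \text {e}^{2\pi \text {i}\,{\varvec{k}}\cdot {\varvec{y}}}\,\text {d}{\varvec{y}}
  = \sum _{{\varvec{k}}\in I} c_{{\varvec{k}}}^{\texttt {usFFT}}(u_{{\varvec{x}}_g})\,\delta _{{\varvec{k}},{\varvec{0}}}
  = c_{{\varvec{0}}}^{\texttt {usFFT}}(u_{{\varvec{x}}_g}).
```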
In the affine case, we use the tent transformation (3.2), such that our approximation reads as
with \({\varvec{1}}= (1,1,\ldots ,1) \in {\mathbb {R}}^{d}\). Again, the random variable \({\varvec{y}}\) is assumed to be uniformly distributed, but for this computation we work with the more general domain \(D_{{\varvec{y}}} = [\alpha ,\beta ]^{d}\). Therefore, we have
with
Note that the parameters \(\alpha \) and \(\beta \) cancel completely. Thus, the formula is independent of the particular domain \(D_{{\varvec{y}}} = [\alpha ,\beta ]^{d}\).
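This independence can be checked numerically. The sketch below assumes the factors have the form \(D_{k_j} = {\mathbb {E}}[\text {e}^{2\pi \text {i}k_j \varphi ^{-1}(y_j)}]\) with the tent-transform inverse \(\varphi ^{-1}(y) = \frac{y-\alpha }{2(\beta -\alpha )}\) on its linear branch, since the display above is not reproduced:

```python
import cmath

def phi_inv(y, alpha, beta):
    """Linear branch of the tent-transform inverse, mapping [alpha, beta] to [0, 1/2]."""
    return (y - alpha) / (2.0 * (beta - alpha))

def D_k(k, alpha, beta, n=20_000):
    """Midpoint-rule approximation of E[exp(2 pi i k phi^{-1}(y))] for
    y ~ U(alpha, beta): our assumed form of the factors D_{k_j}."""
    h = (beta - alpha) / n
    return sum(cmath.exp(2j * cmath.pi * k * phi_inv(alpha + (i + 0.5) * h, alpha, beta))
               for i in range(n)) / n

# the factors are independent of alpha and beta, as claimed above
for k in (0, 1, 2, 3):
    assert abs(D_k(k, -1.0, 1.0) - D_k(k, 0.0, 3.0)) < 1e-6
# consistent closed form: D_0 = 1 and D_k = (e^{pi i k} - 1) / (pi i k) otherwise
assert abs(D_k(2, -1.0, 1.0)) < 1e-6                    # even k != 0 vanish
assert abs(D_k(1, -1.0, 1.0) - 2j / cmath.pi) < 1e-6    # odd k
```

The midpoint rule suffices here because the integrand is smooth on the linear branch.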
Finally, the lognormal model involves the more complicated transformation mappings \(\varphi _{j,\Delta }\) given in (3.3). Thus, the approximation reads as
Here, the random variable \({\varvec{y}}\) is standard normally distributed, i.e., \({\varvec{y}}\sim {\mathcal {N}}({\varvec{0}},{\varvec{I}})\) with \({\varvec{I}}\) the identity matrix of dimension d. Hence, the expectation value can be written as
with
Note that the factors \(D_{k_j,\Delta }\) are exactly the same as the \(D_{k_j}\) in the affine case up to the correction factor \(\text {e}^{2\pi \text {i}k_j \Delta }\) caused by the shift \(\Delta \).
4.2 Periodic example
We consider the example from [22, Sect. 6] using the domain \(D_{{\varvec{x}}} = (0,1)^2\) with right-hand side \(f({\varvec{x}}) = x_2\) and the random coefficient
with the random variables \({\varvec{y}}\sim {\mathcal {U}}\left( [-\frac{1}{2},\frac{1}{2}]^{d}\right) \) and
where \(c>0\) is a constant and \(\mu >1\) is the decay rate. Accordingly, we get
such that for \(c < \frac{\sqrt{6}}{\zeta (\mu )}\) the uniform ellipticity assumption (2.1) is fulfilled.
We test the usFFT with the stochastic dimension \(d = 10\) on the two parameter choices \(\mu = 1.2,\; c = 0.4\) and \(\mu = 3.6,\; c = 1.5\) from [22]. The first choice seems to model a more difficult PDE, since the decay of the functions \(\psi _j\) w.r.t. j is very slow and we have \(a_{\min } = 0.08690\) and \(a_{\max } = 1.91310\). This range of a is wider and \(a_{\min }\) is closer to zero than for the quickly decaying second parameter choice with \(a_{\min } = 0.31660\) and \(a_{\max } = 1.68340\).
Figure 2 illustrates the total approximation error \({\text {err}}_p^{\eta }({\varvec{x}}_g)\) for \(p=1\) and \(p=2\) as well as the Monte Carlo approximation of the expectation value \(\overline{\check{u}_{{\varvec{x}}_g}}\) using \(n_{\text {MC}}=10^6\) samples for comparison.
A more detailed insight into the decay of the error is given in Fig. 3. There, the largest approximation error \({\text {err}}_2^{\eta }\) over the nodes \({\varvec{x}}_g\) is plotted against the number of samples used in the corresponding usFFT. Note that this number scales directly with the sparsity parameter s, while the extension parameter N has nearly no impact. Hence, the data points in Fig. 3 are ordered from left to right from \(s=100\) to \(s=4000\).
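The empirical decay rate suggested by such a plot can be estimated by a least-squares fit in doubly logarithmic coordinates; a sketch with synthetic data, not with the measured values from our experiments:

```python
import math

def loglog_slope(samples, errors):
    """Least-squares slope of log(error) vs. log(#samples): the empirical
    algebraic decay rate in a doubly logarithmic plot like Fig. 3."""
    xs = [math.log(m) for m in samples]
    ys = [math.log(e) for e in errors]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# synthetic data points following err ~ M^{-2} (toy values, not measured ones)
samples = (10_000, 40_000, 160_000)
errors = (1e-2, 6.25e-4, 3.90625e-5)
print(loglog_slope(samples, errors))   # -2.0 up to rounding
```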
Finally, Fig. 4 shows the cardinality of the sets \(\mathrm {J}_\ell \), i.e., the number of frequencies detected with exactly \(\ell \) nonzero components as given in (4.4), as well as the corresponding global sensitivity indices \(\varrho (\mathrm {J}_\ell ,u_{{\varvec{x}}_g}^{\texttt {usFFT}})\) given in (4.3). Note that these values also depend on the considered point \({\varvec{x}}_g\). Therefore, the bars show the smallest and largest GSI among all nodes \({\varvec{x}}_g \in {{\mathcal {T}}}_G\) as well as their median and mean value.
4.2.1 Discussion
The absolute error \({\text {err}}_p^{\eta }\) in Fig. 2 is very small compared to the function values of \(\overline{\check{u}_{{\varvec{x}}_g}}\). Thus, \(u_{{\varvec{x}}_g}^{\texttt {usFFT}}\) is already a very good approximation for these relatively small sparsity parameters s and extension parameters N.
The periodic setting results in very quickly decaying Fourier coefficients. Obviously, the same holds for their projections computed in the dimension-incremental steps. In particular, most of the one-dimensional projections in steps 1 and 2a of Algorithm 1 at the start of each iteration, e.g., all projections with component \(k_t\) satisfying \(\vert k_t\vert >4\), are so small that they are neglected immediately. Hence, the one-dimensional index sets \(I^{(t)}\) are independent of N (for N large enough) and the choice of N has only a marginal impact. Note that we also tested our algorithm with smaller thresholds \(\theta \), but the additionally detected frequencies did not change the approximation significantly in the end.
We observe an approximately linear behavior in the doubly logarithmic plot in Fig. 3. Additional tests showed that even for smaller sparsity parameters \(1\le s<100\) the corresponding samples–error pairs fit this model, i.e., there seems to be no preasymptotic behavior of our algorithm. In [22], the theoretical decay rates are often smaller than the error decay observed in numerical experiments. We also observe a relatively fast decay compared to these theoretical rates. On the other hand, the decay of the approximation error \({\text {err}}_2^{\eta }\) for the quickly decaying random coefficient a with \(\mu = 3.6\) is not that much better than for the more complicated example with \(\mu = 1.2\). It seems that our algorithm is capable of handling the more difficult problem very well, but does not yield much further advantage when applied to easier problems, i.e., with larger \(\mu \), larger \(a_{\min }\) and a smaller range of the interval \([a_{\min }, a_{\max }]\). Note that most of the samples are required for the detection of the frequency set \(I\), and only a small fraction is used for the final computation of the corresponding Fourier coefficients, cf. Sect. 4.5 and Remark 4.2. We also computed the approximation error \(\text {err}_\infty ^{\eta }\) for different \(\eta \) for both parameter choices of \(\mu \) and c. Obviously, these errors have to be larger than the shown errors \({\text {err}}_2^{\eta }\), but the actual magnitude of \(\text {err}_\infty ^{\eta }\) is only about 10 to 15 times as large as \({\text {err}}_2^{\eta }\). Hence, the pointwise approximation error seems to remain reasonably small for any randomly drawn \({\varvec{y}}\).
As we saw in Sect. 4.1, the expectation value of our approximation \({\mathbb {E}}(u_{{\varvec{x}}_g}^{\texttt {usFFT}})\) is simply its zeroth Fourier coefficient \(c_{{\varvec{0}}}^{\texttt {usFFT}}(u_{{\varvec{x}}_g})\). Since this coefficient is included and computed for each sparsity parameter s anyway, it seems at first sight as if our different parameter choices would not influence the precision of its approximation. But for larger sparsity parameters s, we compute more Fourier coefficients in our algorithm, and possible aliasing effects should spread evenly among all of these coefficients, i.e., the particular so-called aliasing error on \(c_{{\varvec{0}}}^{\texttt {usFFT}}(u_{{\varvec{x}}_g})\), cf. [25, 32], gets smaller and the approximation improves. Unfortunately, this is not visible in our numerical tests, since the comparison value \(\overline{\check{u}_{{\varvec{x}}_g}}\) is not accurate enough. In detail, we would have to investigate very small sparsity parameters \(s<25\) to observe the described effects. For all of our parameter choices \(\eta \), the Monte Carlo approximation \(\overline{\check{u}_{{\varvec{x}}_g}}\) with \({n_{\text {MC}}} = 5\cdot 10^6\) samples is not accurate enough to give insight into the particular behavior of our approximation of the expectation value.
Figure 4 shows that no frequencies with all or nearly all components active are detected. Further, even though only 68 of the 4819 detected frequencies (excluding \(c_{{\varvec{0}}}\)) have exactly one nonzero component, i.e., are supported on the axis cross, they explain more than \(99\%\) of the variance of our approximation. The higher-dimensional frequencies with two, three or four nonzero components therefore seem to be nearly negligible for the approximation.
4.3 Affine example
For the affine case, we consider an example from [16, Sect. 11] with domain \(D_{{\varvec{x}}} = (0,1)^2\), right-hand side \(f({\varvec{x}}) \equiv 1\) and the random coefficient
with the random variables \({\varvec{y}}\sim {\mathcal {U}}([-1,1]^{d})\) and
where again \(c > 0\) is a constant and \(\mu > 1\) the decay rate. Further, \(m_1(j)\) and \(m_2(j)\) are defined as
with \(k(j) :=\lfloor 1/2 + \sqrt{1/4 + 2j} \rfloor \). Table 2 shows the numbers \(m_1(j), m_2(j)\) and k(j) for a few \(j \ge 1\).
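Since the definitions of \(m_1(j)\) and \(m_2(j)\) are not reproduced above, the sketch below uses a hypothetical reconstruction: we assume the standard diagonal enumeration of the pairs \((m_1, m_2)\) with \(m_1 + m_2 = k(j)-1\), which is consistent with the printed formula for k(j); the authoritative definitions are in [16] and Table 2:

```python
import math

def k_of(j):
    """k(j) = floor(1/2 + sqrt(1/4 + 2j)), as defined in the text."""
    return math.floor(0.5 + math.sqrt(0.25 + 2 * j))

def m1_m2(j):
    """Hypothetical reconstruction (assumption) of the omitted definitions of
    m1(j) and m2(j): diagonal enumeration of pairs with m1 + m2 = k(j) - 1,
    cf. the analogous construction in [16]."""
    m1 = j - k_of(j) * (k_of(j) - 1) // 2
    return m1, k_of(j) - 1 - m1

# the first few values, in the spirit of Table 2
for j in range(1, 7):
    print(j, k_of(j), *m1_m2(j))
```

With this enumeration, each diagonal \(m_1 + m_2 = k - 1\) is traversed once before k increases.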
As before, we get \(a_{\min } = 1-c\,\zeta (\mu )\) and \(a_{\max } = 1+c\,\zeta (\mu )\), such that for \(c < \frac{1}{\zeta (\mu )}\) the uniform ellipticity assumption (2.1) is fulfilled. Here, we use the parameter choices from [16] with \(\mu = 2\) for a relatively slow decay and \(c = \frac{0.9}{\zeta (2)} \approx 0.547\) to end up with \(a_{\min }=0.1\) and \(a_{\max }=1.9\), which is very similar to the first parameter choice in the periodic case. We choose the stochastic dimension \(d=20\) as in [16].
Figure 5 illustrates the Monte Carlo approximation of the expectation value \(\overline{\check{u}_{{\varvec{x}}_g}}\) with \({n_{\text {MC}}} = 10^6\) samples as well as the pointwise error \(\vert \overline{\check{u}_{{\varvec{x}}_g}} - {\mathbb {E}}(u_{{\varvec{x}}_g}^{\texttt {usFFT}})\vert \) for two different parameter choices \(\eta \), with \({\mathbb {E}}(u_{{\varvec{x}}_g}^{\texttt {usFFT}})\) as given in (4.5).
Figure 6 again shows the largest error \({\text {err}}_2^{\eta }\) over the nodes \({\varvec{x}}_g\) for different parameter settings \(\eta \). This time, we can observe a small increase in the number of used samples for larger extensions N, which was not visible in the periodic example. Hence, the parameter settings \(\eta = \) I to XI have monotonically increasing sampling sizes, i.e., the data points in Fig. 6 are ordered from left to right w.r.t. increasing \(\eta \) this time.
Figure 7 again illustrates the cardinality of the sets \(\mathrm {J}_\ell \) as well as their GSI.
4.3.1 Discussion
We note that even for small sparsity parameters s, the approximation \({\mathbb {E}}(u_{{\varvec{x}}_g}^{\texttt {usFFT}})\) seems to be quite accurate in Fig. 5. Unfortunately, for all parameter choices \(\eta \) except I and II, the magnitude of the errors does not decrease below about \(1.5 \cdot 10^{-5}\), which is probably caused by the limited accuracy of the Monte Carlo approximation \(\overline{\check{u}_{{\varvec{x}}_g}}\).
The magnitudes of the errors \({\text {err}}_2^{\eta }\) in Fig. 6 are already very low for small sparsity parameters s compared to the expected function values shown in Fig. 5a. We note an obvious improvement for each sparsity parameter s when we progress from \(N=32\), the first data point in each cluster, to \(N=64\), the second data point. In the periodic case, the important frequencies were very well localized around zero, such that the choice of N had almost no impact. This time, we really lose some accuracy if we choose the smaller extension \(N=32\). For the sparsity parameter \(s=2000\) we also see this effect when progressing from \(N=64\) to \(N=128\). The overall decay of the error is considerably slower than in the periodic example, probably mainly caused by the non-smooth tent transformation used, cf. Sect. 3.2.1. Again, some numerical tests determining the error \(\text {err}_\infty ^{\eta }\) revealed a similar behavior as in the periodic setting and showed that these errors are at most about 20 times the error \({\text {err}}_2^{\eta }\).
Since the Fourier coefficients do not decay as fast as in the smooth periodic case, we detect a significantly larger number of one- and two-dimensional couplings in Fig. 7. Again, the frequencies with only one nonzero entry explain the largest part of the variance of the function, but this time the minimum percentage is lower than in the periodic example, at only about \(94.5\%\). Accordingly, the importance of the two- and three-dimensional couplings grows slightly. The large number of important coefficients with only one, two or three nonzero entries also results in nearly no detected significant frequencies with more than three nonzero entries. For example, the 54 frequencies in \(\mathrm {J}_4\) vanish when working with larger extensions N, since other frequencies with fewer nonzero entries are preferred in that case. So even though we work with the moderate stochastic dimension \(d = 20\), we do not detect any frequencies where half or even a quarter of these dimensions are active simultaneously.
4.4 Lognormal example
We consider a two-dimensional problem based on the example in [9] on the domain \(D_{{\varvec{x}}}=[0,1]^2\) with right-hand side \(f({\varvec{x}}) = \sin (1.3\pi x_1 + 3.4 \pi x_2) \cos (4.3 \pi x_1 - 3.1 \pi x_2)\). The lognormal random coefficient is given by
with the functions
In [9], the stochastic dimension \(d=4\) was used. Here, we work with \(d = 10\) to obtain a more complicated and higher-dimensional problem setting. We use a standard normally distributed random variable \({\varvec{y}}\sim {\mathcal {N}}({\varvec{0}},{\varvec{I}})\) with \({\varvec{I}}\) the identity matrix of dimension d as before. Hence, for each \({\varvec{x}}\) we have \(0< a({\varvec{x}},{\varvec{y}}) < \infty \) for any \({\varvec{y}}\). However, constants \(0< a_{\min } \le a_{\max }<\infty \) do not exist in this example, since \(b({\varvec{x}},{\varvec{y}})\) can become arbitrarily small or large. Therefore, the problem is neither uniformly elliptic nor uniformly bounded, which complicates the analysis tremendously. We can still use this example for our numerical tests, since we only need the solvability of the differential equation for fixed values of \({\varvec{y}}\). Further, we have \(b({\varvec{x}},{\varvec{y}}) \in [-3,3]\) and therefore \(\exp (b({\varvec{x}},{\varvec{y}})) \in [\text {e}^{-3},\text {e}^3] \approx [0.05, 20.09]\) with a probability of more than \(99\%\) for each \({\varvec{x}}\in D_{{\varvec{x}}}\), i.e., extremely small or large values of \(a({\varvec{x}},{\varvec{y}})\) are very unlikely to appear.
Figure 8a once again illustrates the Monte Carlo approximation of the expectation value \(\overline{\check{u}_{{\varvec{x}}_g}}\) with \(n_{\text {MC}} = 10^6\) samples.
The decay of the largest error \({\text {err}}_2^{\eta }\) w.r.t. the nodes \({\varvec{x}}_g\) is shown in Fig. 9, where the data points are ordered from left to right w.r.t. increasing \(\eta \) as in Fig. 6.
Finally, Fig. 10 shows the cardinality of the sets \(\mathrm {J}_\ell \) as well as their GSI for this example.
4.4.1 Discussion
We note that the pointwise solution in Fig. 8a has a more interesting structure than in the other examples above, mainly caused by the lognormal random coefficient and the non-constant right-hand side \(f({\varvec{x}})\). Nevertheless, the approximations \({\mathbb {E}}(u_{{\varvec{x}}_g}^{\texttt {usFFT}})\) achieve small errors, as shown in Fig. 8. This time, a further increase of the sparsity parameter s and the extension N still improves the accuracy of our approximations, so the stagnation caused by the limitations of the Monte Carlo approximation \(\overline{\check{u}_{{\varvec{x}}_g}}\), which we saw in the previous examples, does not occur yet.
The pointwise errors \({\text {err}}_2^{\eta }({\varvec{x}}_g)\) behave slightly worse but are still very good, as we see in Fig. 9. Again, the increase of the extension N yields visible improvements of the approximation error \({\text {err}}_2^{\eta }\). The decay rate is lower than before, matching our expectations, since the lognormal example is far more difficult than the affine or periodic examples. Note that once again the slope takes all shown data points into account, while the specific decay for a fixed extension N might be slower or faster. Further, the size of the error \(\text {err}_\infty ^{\eta }\) is again only about 10 times the size of \({\text {err}}_2^{\eta }\), revealing a good pointwise approximation w.r.t. the random variable \({\varvec{y}}\) in this scenario as well.
We notice a similar distribution of the detected frequencies \({\varvec{k}}\) among the index sets \(\mathrm {J}_{\ell }\cap I\) as before, cf. Fig. 10. The key difference is the size of the GSI for each of these index sets. The range of the GSI for \(\mathrm {J}_1\) increases significantly; the minimal portion of explained variance is now only about \(30\%\). Obviously, the GSIs of the other index sets \(\mathrm {J}_\ell \) grow accordingly. This is probably caused by the more difficult structure of the lognormal diffusion coefficient a and of the corresponding solution, which is reflected in larger differences between the optimal frequency sets \(I_{{\varvec{x}}_g}\), \(g=1,\ldots ,G\), cf. Remark 4.1. Nevertheless, we again detect nearly no significant frequencies \({\varvec{k}}\) with four or more active dimensions, as in the previous examples.
Remark 4.1
As mentioned before, the output of the usFFT contains more than s frequencies, since we join the detected index sets in each dimension increment and use no thresholding technique to reduce the number of found frequencies afterwards. While we have no reasonably tight theoretical bounds on the output size yet, we can investigate the number of output frequencies in our numerical tests. In detail, we express the detected output sparsity \(s_{\text {real}}\) as a multiple of the given sparsity parameter s, i.e., \(s_{\text {real}} = q \cdot s\) with some factor \(q \in {\mathbb {R}}\).
In the numerical tests for the first periodic example in Sect. 4.2, i.e., \(\mu =1.2\) and \(c=0.4\), we have \(q \in [2.41, 2.74]\), where the larger values of q tend to appear for smaller sparsity parameters s. For the quickly decaying example, i.e., \(\mu =3.6\) and \(c=1.5\), we have \(q \in [1.9042, 2.45]\) and again the larger values of q are attained for small sparsity parameters s.
The affine model in Sect. 4.3 results in \(q \in [2.06, 2.186]\), where \(q < 2.1\) is only attained for \(\eta =\) I, II and IV, i.e., parameter settings with small sparsity parameters s and extension \(N=32\).
Finally, in the complicated lognormal case in Sect. 4.4, we observe \(q \in [4.776, 5.15]\). While the values above 5 only appear for \(\eta =\) I and II, the factors q are still significantly larger than before. However, the magnitude of q remains very small compared to the number of spatial nodes \(\vert {{\mathcal {T}}}_G\vert = G = 739\) in our examples.
Our observation is consistent with recent results presented in [22] which considers the periodic model only. The crucial common feature is that the pointwise approximations \(u_{{\varvec{x}}_g}\) can be regarded as elements of a joint reproducing kernel Hilbert space with uniformly good kernel approximants.
Overall, the factor q in our examples is much smaller than G, which would be the worst possible factor, attained when all \(I_{{\varvec{x}}_g}\) are pairwise disjoint. Hence, as already mentioned in Sect. 1, the complexities given in Theorem 1.1 are far too pessimistic, and the true number of sampling locations and computational steps needed is much smaller in all of our numerical examples.
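The effect behind the small factor q is easy to illustrate: when the per-node frequency sets overlap substantially, their union grows only mildly with G. A minimal sketch in Python with purely hypothetical, synthetic index sets (the core/specific split below is an assumption for illustration, not data from our experiments):

```python
def joined_sparsity(index_sets):
    """Cardinality of the union of the per-node frequency sets I_g."""
    return len(set().union(*index_sets))

# Hypothetical illustration: G = 100 nodes whose detected sets share a
# common core of 40 frequencies plus 2 node-specific frequencies each.
core = {(k, 0) for k in range(40)}
index_sets = [core | {(g, 1), (g, 2)} for g in range(100)]

s = 42                                  # per-node sparsity: 40 core + 2 specific
s_real = joined_sparsity(index_sets)    # 40 + 2 * 100 = 240
q = s_real / s                          # far below the worst case q = G = 100
```

Here the union stays much smaller than \(G \cdot s\), mirroring the moderate factors q observed in the experiments.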
4.5 Comparison to given frequency sets
The main effort of the usFFT lies in detecting the index set \(I\subset \Gamma \). The computation of the corresponding Fourier coefficients in the final step of Algorithm 1 needs significantly fewer samples than the preceding detection steps. Hence, the question arises whether an a priori choice of the index set \(I\) should be preferred to reduce the computational cost, cf. Remark 4.2. Therefore, we now consider the following kinds of index sets:

axis cross with uniform weight 1: \( I= \lbrace {\varvec{k}}\in {\mathbb {Z}}^{d}: \Vert {\varvec{k}}\Vert _0 = 1, \Vert {\varvec{k}}\Vert _1 \le N \rbrace \)

hyperbolic cross with uniform weight \(\frac{1}{4}\): \( I= \lbrace {\varvec{k}}\in {\mathbb {Z}}^{d}: \prod _{j=1}^{d} \max (1,4 \vert k_j \vert ) \le N \rbrace \)

hyperbolic cross with slowly or quickly \((q=1\) or 2) decaying weights \(\frac{1}{j^q}\): \( I= \lbrace {\varvec{k}}\in {\mathbb {Z}}^{d}: \prod _{j=1}^{d} \max (1,j^q \vert k_j \vert ) \le N \rbrace \)

\(l_1\)-ball with slowly or quickly \((q=1\) or 2) decaying weights \(\frac{1}{j^q}\): \( I= \lbrace {\varvec{k}}\in {\mathbb {Z}}^{d}: \sum _{j=1}^{d} j^q \vert k_j \vert \le N \rbrace \)
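For small dimensions d, these candidate index sets can be enumerated by brute force over \(\lbrace -N,\ldots ,N \rbrace ^d\). A minimal sketch (illustrative only; the enumeration is infeasible for large d, where the sets would be built dimension-wise, and the uniform-weight hyperbolic cross is obtained by replacing \(j^q\) with the constant 4):

```python
import itertools
from math import prod

def axis_cross(d, N):
    # { k : ||k||_0 = 1, ||k||_1 <= N }
    return {tuple(v if i == j else 0 for i in range(d))
            for j in range(d) for v in range(-N, N + 1) if v != 0}

def hyperbolic_cross(d, N, q):
    # { k : prod_j max(1, j^q |k_j|) <= N }, weights 1/j^q, j = 1,...,d
    return {k for k in itertools.product(range(-N, N + 1), repeat=d)
            if prod(max(1, (j + 1) ** q * abs(kj))
                    for j, kj in enumerate(k)) <= N}

def l1_ball(d, N, q):
    # { k : sum_j j^q |k_j| <= N }, weights 1/j^q, j = 1,...,d
    return {k for k in itertools.product(range(-N, N + 1), repeat=d)
            if sum((j + 1) ** q * abs(kj) for j, kj in enumerate(k)) <= N}
```

For example, with \(d=2\), \(N=4\) and \(q=1\), the axis cross contains 16 frequencies, the weighted hyperbolic cross 25 and the weighted \(l_1\)-ball 21.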
The Fourier coefficients \(c_{{\varvec{k}}}(u_{{\varvec{x}}_g})\) are approximated using the same multiple R1L approach as in step 3 of our Algorithm 1, i.e., we simply skip steps 1 and 2 by choosing the index set \(I\) instead of detecting it. Figure 11 illustrates the largest error \({\text {err}}_2^{\eta }\) w.r.t. the nodes \({\varvec{x}}_g\in {\mathcal {T}}_G\) for the previously considered periodic and affine examples, cf. Sects. 4.2 and 4.3, with these given frequency sets \(I\) for various refinements N.
The magnitude of the errors is considerably larger than for comparable parameter settings of the usFFT, e.g., \(\eta =\) I to III, especially for the periodic example. Further, we see that the particular structure of the index set plays an important role: a cleverly chosen index set reduces the approximation error tremendously, especially in the periodic settings. But finding a good or even optimal choice of the index set is highly nontrivial, since it requires sufficient a priori information about the PDE and the structure of its solution, or additional computational effort, e.g., to determine suitable weights for a given index set structure. This can be observed, for example, when comparing the hyperbolic cross index sets for the periodic examples. The uniform weights achieve the best results for \(\mu = 1.2\), but cannot keep up at all with the decaying weights for the faster decay rate \(\mu = 3.6\). On the other hand, even if we know that there is a certain decay in our random coefficient, it is not clear how to choose suitable decay rates for the weights in order to guarantee reasonable results, specifically in preasymptotic settings, which are the rule rather than the exception when numerically determining solutions of high-dimensional problems.
The usFFT does not depend on this kind of information, as its choice of the frequency set is fully adaptive and the only required a priori information is the search space \(\Gamma \), which can be chosen sufficiently large without disturbing the results of the algorithm. Further, the detected frequency set \(I\) provides this additional information about the structure of the solution u as well as its dependence on the random variables \({\varvec{y}}\). In other words, the additional samples needed by the usFFT render this structural information unnecessary as an input: the algorithm detects it on its own and allows it to be extracted from the output afterwards.
Remark 4.2
The computations in this section and in step 3 of the usFFT are performed using the multiple R1L approach for the efficient computation of Fourier coefficients for a given frequency set \(I\) as proposed in [24]. From [24, Cor. 3.7] we get a bound on the number of sampling nodes M used. Since we are working with \(c=2\) and \(\delta = 0.5\) as in [25, Alg. 3], we arrive at \(M \le \lceil 2 \ln (2\vert I\vert ) \rceil \, 4 (\vert I\vert - 1)\). Note that this upper bound is very rough and the actual number of sampling nodes used is much lower in almost all numerical experiments.
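The quoted bound is easy to evaluate. A small sketch, assuming the stated constants \(c=2\) and \(\delta =0.5\):

```python
from math import ceil, log

def multiple_r1l_sample_bound(I_size):
    """Upper bound M <= ceil(2 ln(2|I|)) * 4 (|I| - 1) on the number of
    sampling nodes of the multiple R1L approach (constants c = 2, delta = 0.5,
    as quoted from [24, Cor. 3.7] above)."""
    return ceil(2 * log(2 * I_size)) * 4 * (I_size - 1)
```

The bound grows like \(\vert I\vert \log \vert I\vert \), e.g., it evaluates to 4356 sampling nodes for \(\vert I\vert = 100\).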
As stated several times above, the number of samples used in step 3 of the usFFT is just a small fraction of the total number of samples. In particular, the computation of the actual Fourier coefficients \(c_{{\varvec{k}}}^{\text {\texttt {usFFT}}}(u_{{\varvec{x}}_g})\) for the detected frequency set \(I\) requires roughly \(0.4\%\) or \(0.3\%\) of the total number of samples for the two different parameter choices of the periodic example in Sect. 4.2, around \(0.1\%\) in the affine case in Sect. 4.3 and about \(0.65\%\) for the lognormal model from Sect. 4.4.
In [9, 44], a data-driven method was proposed, which is capable of computing approximations for multiple right-hand sides \(f({\varvec{x}})\) from a certain function class. Our usFFT approach can be generalized in a similar, data-driven way: For a given class of functions \(f({\varvec{x}})\) or even \(f({\varvec{x}},{\varvec{y}})\), we can use the usFFT to compute the frequency set \(I\) for one randomly selected right-hand side f, or randomly select multiple right-hand sides f and compute unions of the corresponding index sets \(I_f\) by means of (a slight modification of) the presented usFFT. In each case, we end up with a frequency set \(I\) that is probably a good choice for all functions f in the given class, since the corresponding solutions are hopefully very similar to each other. Hence, we can use this index set \(I\) as a starting point and compute approximations of the corresponding Fourier coefficients \(c_{{\varvec{k}}}(u_{{\varvec{x}}_g}), {\varvec{k}}\in I,\) as done above. The resulting approximation of \(u_{{\varvec{x}}_g}\) is then probably a lot better, i.e., the detected index set \(I\) localizes the largest Fourier coefficients better than some a priori choice.
5 Summary
The proposed dimension-incremental method provides an efficient way to adaptively compute solutions of parametric PDEs involving high-dimensional random coefficients. Its non-intrusive nature allows the use of different, suitable PDE solvers and therefore also makes our algorithm highly adaptable to different problem settings. We show that the number of PDE solutions needed can be decreased by our method compared to the naive repetitive approach, even in the worst case. The numerical experiments underline this and demonstrate the functionality of our method on practical examples.
References
Adcock, B., Brugiapaglia, S., Webster, C.G.: Sparse Polynomial Approximation of High-Dimensional Functions. Society for Industrial and Applied Mathematics, Philadelphia, PA (2022)
Bachmayr, M., Cohen, A., Dahmen, W.: Parametric PDEs: sparse or low-rank approximations? IMA J. Numer. Anal. 38(4), 1661–1708 (2018)
Bachmayr, M., Cohen, A., DeVore, R., Migliorati, G.: Sparse polynomial approximation of parametric elliptic PDEs. Part II: lognormal coefficients. ESAIM Math. Model. Numer. Anal. 51(1), 341–363 (2017)
Bachmayr, M., Cohen, A., Dũng, D., Schwab, C.: Fully discrete approximation of parametric and stochastic elliptic PDEs. SIAM J. Numer. Anal. 55(5), 2151–2186 (2017)
Bachmayr, M., Cohen, A., Migliorati, G.: Sparse polynomial approximation of parametric elliptic PDEs. Part I: Affine coefficients. ESAIM Math. Model. Numer. Anal. 51(1), 321–339 (2017)
Bachmayr, M., Cohen, A., Migliorati, G.: Representations of Gaussian random fields and approximation of elliptic PDEs with lognormal coefficients. J. Fourier Anal. Appl. 24(3), 621–649 (2018)
Bochmann, M., Kämmerer, L., Potts, D.: A sparse FFT approach for ODE with random coefficients. Adv. Comput. Math. 46(5), Paper No. 65, 21 (2020)
Bouchot, J.-L., Rauhut, H., Schwab, C.: Multi-level Compressed Sensing Petrov-Galerkin discretization of high-dimensional parametric PDEs. ArXiv e-prints (2017). arXiv:1701.01671 [math.NA]
Cheng, M., Hou, T.Y., Yan, M., Zhang, Z.: A data-driven stochastic method for elliptic PDEs with random coefficients. SIAM/ASA J. Uncertain. Quantif. 1(1), 452–493 (2013)
Cohen, A., DeVore, R.: Approximation of high-dimensional parametric PDEs. Acta Numer. 24, 1–159 (2015)
Cohen, A., DeVore, R., Schwab, C.: Convergence rates of best \(N\)-term Galerkin approximations for a class of elliptic sPDEs. Found. Comput. Math. 10(6), 615–646 (2010)
Cools, R., Kuo, F.Y., Nuyens, D., Suryanarayana, G.: Tent-transformed lattice rules for integration and approximation of multivariate non-periodic functions. J. Complexity 36, 166–181 (2016)
Devroye, L.: Non-Uniform Random Variate Generation. Springer-Verlag, New York (1986)
Dick, J., Kuo, F.Y., Le Gia, Q.T., Schwab, C.: Multilevel higher order QMC Petrov-Galerkin discretization for affine parametric operator equations. SIAM J. Numer. Anal. 54(4), 2541–2568 (2016)
Dick, J., Le Gia, Q.T., Schwab, C.: Higher order quasi-Monte Carlo integration for holomorphic, parametric operator equations. SIAM/ASA J. Uncertain. Quantif. 4(1), 48–79 (2016)
Eigel, M., Gittelson, C.J., Schwab, C., Zander, E.: Adaptive stochastic Galerkin FEM. Comput. Methods Appl. Mech. Engrg. 270, 247–269 (2014)
Foucart, S., Rauhut, H.: A Mathematical Introduction to Compressive Sensing. Applied and Numerical Harmonic Analysis. Birkhäuser/Springer, New York (2013)
Gantner, R.N., Herrmann, L., Schwab, C.: Multilevel QMC with product weights for affine-parametric, elliptic PDEs. In: Contemporary Computational Mathematics—A Celebration of the 80th Birthday of Ian Sloan, pp. 373–405. Springer, Cham (2018)
Graham, I.G., Kuo, F.Y., Nichols, J.A., Scheichl, R., Schwab, C., Sloan, I.H.: QuasiMonte Carlo finite element methods for elliptic PDEs with lognormal random coefficients. Numer. Math. 131(2), 329–368 (2015)
Gross, C., Iwen, M.A., Kämmerer, L., Volkmer, T.: Sparse Fourier transforms on rank-1 lattices for the rapid and low-memory approximation of functions of many variables. Sampl. Theory Signal Process. Data Anal. 20, 1 (2022)
Iwen, M.A.: Improved approximation guarantees for sublinear-time Fourier algorithms. Appl. Comput. Harmon. Anal. 34, 57–82 (2013)
Kaarnioja, V., Kazashi, Y., Kuo, F., Nobile, F., Sloan, I.: Fast approximation by periodic kernel-based lattice-point interpolation with application in uncertainty quantification. Numer. Math. 150, 33–77 (2022)
Kaarnioja, V., Kuo, F.Y., Sloan, I.H.: Uncertainty quantification using periodic random variables. SIAM J. Numer. Anal. 58(2), 1068–1091 (2020)
Kämmerer, L.: Constructing spatial discretizations for sparse multivariate trigonometric polynomials that allow for a fast discrete Fourier transform. Appl. Comput. Harmon. Anal. 47(3), 702–729 (2019)
Kämmerer, L., Potts, D., Volkmer, T.: High-dimensional sparse FFT based on sampling along multiple rank-1 lattices. Appl. Comput. Harmon. Anal. 51, 225–257 (2021)
Kämmerer, L., Ullrich, T., Volkmer, T.: Worst case recovery guarantees for least squares approximation using random samples. Constr. Approx. 54, 295–352 (2021)
Kuo, F., Migliorati, G., Nobile, F., Nuyens, D.: Function integration, reconstruction and approximation using rank-1 lattices. Math. Comp. 90(330), 1861–1897 (2021)
Kuo, F.Y., Nuyens, D.: Application of quasi-Monte Carlo methods to PDEs with random coefficients—an overview and tutorial. In: Monte Carlo and Quasi-Monte Carlo Methods, Springer Proc. Math. Stat., pp. 53–71. Springer, Cham (2018)
Kuo, F.Y., Nuyens, D., Plaskota, L., Sloan, I.H., Wasilkowski, G.W.: Infinite-dimensional integration and the multivariate decomposition method. J. Comput. Appl. Math. 326, 217–234 (2017)
Kuo, F.Y., Schwab, C., Sloan, I.H.: Quasi-Monte Carlo finite element methods for a class of elliptic partial differential equations with random coefficients. SIAM J. Numer. Anal. 50(6), 3351–3374 (2012)
Kuo, F.Y., Schwab, C., Sloan, I.H.: Multi-level quasi-Monte Carlo finite element methods for a class of elliptic PDEs with random coefficients. Found. Comput. Math. 15(2), 411–449 (2015)
Kämmerer, L., Krahmer, F., Volkmer, T.: A sample efficient sparse FFT for arbitrary frequency candidate sets in high dimensions. Numer. Algorithms (2021)
Li, D., Hickernell, F.J.: Trigonometric spectral collocation methods on lattices. In: Recent Advances in Scientific Computing and Partial Differential Equations (Hong Kong, 2002), vol. 330, pp. 121–132. Amer. Math. Soc., Providence, RI (2003)
Morotti, L.: Explicit universal sampling sets in finite vector spaces. Appl. Comput. Harmon. Anal. 43, 354–369 (2017)
Nasdala, R., Potts, D.: Transformed rank-1 lattices for high-dimensional approximation. Electron. Trans. Numer. Anal. 53, 239–282 (2020)
Nguyen, D.T.P., Nuyens, D.: MDFEM: multivariate decomposition finite element method for elliptic PDEs with lognormal diffusion coefficients using higher-order QMC and FEM. ESAIM Math. Model. Numer. Anal. 55(4), 1461–1505 (2021)
Nguyen, D.T.P., Nuyens, D.: MDFEM: Multivariate decomposition finite element method for elliptic PDEs with uniform random diffusion coefficients using higher-order QMC and FEM. Numer. Math. 148(3), 633–669 (2021)
Potts, D., Schmischke, M.: Approximation of high-dimensional periodic functions with Fourier-based methods. SIAM J. Numer. Anal. 59(5), 2393–2429 (2021)
Potts, D., Volkmer, T.: Sparse high-dimensional FFT based on rank-1 lattice sampling. Appl. Comput. Harmon. Anal. 41(3), 713–748 (2016)
Schwab, C.: QMC Galerkin discretization of parametric operator equations. In: Monte Carlo and Quasi-Monte Carlo Methods 2012. Springer Proc. Math. Stat., pp. 613–629. Springer, Heidelberg (2013)
Sobol, I.M.: On sensitivity estimation for nonlinear mathematical models. Keldysh Applied Mathematics Institute 1, 112–118 (1990)
Sobol, I.M.: Global sensitivity indices for nonlinear mathematical models and their Monte Carlo estimates. Math. Comput. Simulation 55(1–3), 271–280 (2001)
Suryanarayana, G., Nuyens, D., Cools, R.: Reconstruction and collocation of a class of non-periodic functions by sampling along tent-transformed rank-1 lattices. J. Fourier Anal. Appl. 22(1), 187–214 (2016)
Zhang, Z., Hu, X., Hou, T.Y., Lin, G., Yan, M.: An adaptive ANOVAbased datadriven stochastic method for elliptic PDEs with random coefficient. Commun. Comput. Phys. 16(3), 571–598 (2014)
Acknowledgements
The authors would like to thank the reviewers for their useful advice on improving the comprehensibility of the paper. L. Kämmerer gratefully acknowledges funding by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) with the project number 380648269 and Daniel Potts with the project number 416228727 (SFB 1410).
Funding
Open Access funding enabled and organized by Projekt DEAL.
Additional information
Communicated by Johannes Maly.
This article is part of the topical collection “Recent advances in computational harmonic analysis” edited by Dae Gwan Lee, Ron Levie, Johannes Maly and Hanna Veselovska.
Appendices
Appendix A Rank1 lattices in the sFFT
Popular approaches with the properties stated at the end of Sect. 2.2 are for example based on so-called rank-1 lattices. A rank-1 lattice (R1L) is a set
\(\Lambda ({\varvec{z}},M) := \left\lbrace \frac{i}{M}\,{\varvec{z}} \bmod \mathbf {1} :\, i=0,\ldots ,M-1 \right\rbrace \subset [0,1)^d\)
with a so-called generating vector \({\varvec{z}}\in {\mathbb {Z}}^d\) and lattice size \(M \in {\mathbb {N}}\). In [39], single rank-1 lattices (single R1Ls) were used as the sampling strategy in the dimension-incremental method and provided a perfectly stable, reliable and efficient way to reconstruct the projected Fourier coefficients \(\tilde{\hat{p}}_{(1,\ldots ,t),{\varvec{k}}}\). In [25] and [32], other approaches based on multiple rank-1 lattices (multiple R1Ls) and random rank-1 lattices (random R1Ls) have been studied. The main advantage of these approaches is a smaller size and a significantly faster construction of the involved sampling sets \({\mathcal {X}}\), but in exchange they involve a failure probability, which the perfectly stable single R1L approach avoids. Table 3 shows the sampling and arithmetic complexities of the dimension-incremental method when using these different sampling strategies based on R1Ls. Further notes on these approaches and their behavior when used in the dimension-incremental method can be found in the referred works.
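A rank-1 lattice is cheap to generate from \({\varvec{z}}\) and M. A minimal sketch (the generating vector and lattice size below are arbitrary illustrative choices, not those used in the experiments):

```python
from fractions import Fraction

def rank1_lattice(z, M):
    """Nodes x_i = (i * z mod M) / M in [0,1)^d of the rank-1 lattice
    with generating vector z and lattice size M."""
    return [tuple(Fraction(i * zj % M, M) for zj in z) for i in range(M)]

# illustrative example: d = 2, generating vector (1, 3), lattice size 5
nodes = rank1_lattice(z=(1, 3), M=5)
```

Since the generating vector's components are coprime to the prime lattice size, each coordinate of the five nodes runs through all values \(0, \frac{1}{5}, \ldots , \frac{4}{5}\).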
Appendix B
Proof of Theorem 1.1
Proof
Note that in the following explanations as well as in the corresponding Tables 4 and 5, 'sample complexities' always refers to the cardinality of the set of sampling locations, since we assume the black box sampling algorithm to provide the samples for all G trigonometric polynomials \(p^{(g)}\) simultaneously. In addition, we assume \(s_\mathrm {local}\lesssim s\).
Theorem 1.1 is a slight modification of [32, Thm. 2]. To be more precise, we apply Algorithm 1 using a random R1L approach in the role of Algorithm \({\text {A}}\) and a spatial discretization \({\mathcal {Y}}\) based on multiple R1Ls, cf. [24], in step 3. Accordingly, we mainly refer to the analysis of the sample complexity and computational complexity of the sFFT using random R1Ls given in [32, Sect. 3.2] as well as the therefore necessary theoretical results from [25, Sect. 4].
The crucial difference to [32, Thm. 2] is that we have to account for the modification that we demand the reconstruction of not only one but G different trigonometric polynomials with possibly differing frequency supports \(I_g\), \(g=1,\ldots ,G\). In the following, we discuss the necessary modifications of the bounds and parameter choices discussed, proved and used in [32, Sects. 3.2.2 and 3.2.3] and [25, Lem. 4.4 and Thm. 4.6], such that the corresponding results hold.
The modifications when considering the usFFT can be separated into two parts: first, the possibly larger sets \(\mathrm {J}_t\) and \(I^{(1,\ldots ,d)}\) in steps 2 and 3, respectively, and second, the modified failure probability. Note that \(r, \gamma _{{\text {A}}}\) and \(\gamma _{{\text {B}}}\) are the decisive parameters which provide estimates on the failure probability later on and, thus, are discussed in the second part of the proof. \(\square \)
1.1 Part 1: Size of the frequency sets \(\mathrm {J}_t\) and \(I^{(1,\ldots ,d)}\)
We start with the candidate sets \(\mathrm {J}_t\) in step 2b of Algorithm 1 and observe that the cardinality \(\vert \mathrm {J}_t \vert \) of the set of frequency candidates
in each dimension increment t can be simply bounded by \(\vert \mathrm {J}_t \vert \lesssim r\,s\,G\,N_\Gamma \), which now contains an additional factor G. Applying the sampling strategy suggested in [25, Sect. 2.1] together with [25, Lem. 4.5] directly yields \(\vert {\mathcal {X}}_{t,i} \vert \lesssim \max (s,N_\Gamma )\log (\vert \mathrm {J}_t \vert /\gamma _{{\text {A}}})\lesssim \max (s,N_\Gamma )\log (r\, s\, G\,N_\Gamma /\gamma _{{\text {A}}})\) with \(\gamma _{{\text {A}}}\) the failure probability of Algorithm \({\text {A}}\).
Further, the cardinality of the finally detected frequency set \(I^{(1,\ldots ,d)}\) used in step 3 of our algorithm is bounded from above by \(s\, G\) by the same argumentation, since \(\tilde{r}=1\) and \(\tilde{s}=s\) hold for \(t=d\) in step 2. Accordingly, we can apply [25, Alg. 1] in order to construct a spatial discretization \({\mathcal {Y}}\) of \(I^{(1,\ldots ,d)}\) based on multiple R1Ls whose cardinality is bounded by \(\vert {\mathcal {Y}} \vert \lesssim \max (s\,G,N_\Gamma )\log (s\,G/\gamma _{{\text {B}}})\), where \(\gamma _{{\text {B}}}\) is the failure probability of the construction of this spatial discretization, cf. [24, Thm. 4.1].
1.2 Sample and computational complexity of the usFFT
Now, we need to discuss the computational complexities of the individual steps of the usFFT. Obviously, step 1 applies \(d\,r\,G\) different one-dimensional FFTs of lengths at most \(N_\Gamma \), which yields a computational complexity in \({\mathcal {O}}\left( d\,r\,G\,N_\Gamma \log (N_\Gamma )\right) \). In step 2, we apply \(((d-2)\,r+1)\,G\) times [32, Alg. 4], where the sampling set is a union of \(L\in {\mathcal {O}}\left( \log (r\, s\,G\, N_\Gamma /\gamma _{{\text {A}}})\right) \) R1Ls of size at most in \({\mathcal {O}}\left( \max \{s,N_\Gamma \} \right) \) and the cardinality of the input set of frequencies \(\mathrm {J}_t\) is bounded from above by \({\mathcal {O}}\left( r\,s\,G\,N_\Gamma \right) \), which yields an arithmetic complexity in
in the worst case. Moreover, we observe \(\vert \mathrm {J}_t \vert \lesssim s\,G\,N_\Gamma \) with a certain probability, since the signals p are all trigonometric polynomials, and in the case where Algorithm \({\text {A}}\) does not fail at all, we have \(\bigcup _{i=1,\ldots ,\tilde{r}} \tilde{\mathrm {J}}_{t,i,g}\subset I^{(1,\ldots ,t)}_{g}\) with \(\vert I^{(1,\ldots ,t)}_{g} \vert \le \vert I_g^{(1,\ldots ,d)} \vert \le s\). As a consequence, we save a linear factor r as well as the r inside the logarithmic term compared to the worst case arithmetic complexity, cf. [32, Sect. 3.2.2] for a similar argumentation. In addition, the same argumentation saves a factor r in the logarithmic term of the upper bound on the number of sampling locations in step 2 with the same probability. Later, we specifically choose the parameters \(r, \gamma _{{\text {A}}}\) and \(\gamma _{{\text {B}}}\) such that the estimates hold with high probability; for that reason, we already call these complexities with-high-probability complexities here.
For computing the G FFTs of step 3 we apply [25, Alg. 2], which yields a computational complexity in
The sample complexities and computational complexities of the usFFT due to these modifications are given in Table 4, see [32, Tab. 3.2] for comparison to the sFFT. Here, the only changes are several appearances of the parameter G, i.e., for \(G=1\) we observe the complexities of the sFFT.
1.3 Part 2: Parameter choices
We continue with the aforementioned second big part, where we need to discuss suitable choices of r, \(\gamma _{{\text {A}}}\) and \(\gamma _{{\text {B}}}\) to obtain our desired failure probability \(\delta \).
To this end, we first consider the projection failure probability, i.e., the failure that occurs if important projected Fourier coefficients are close to zero and, thus, not detectable. The number r of detection iterations determines how many of these projections are computed. The more projections are considered, the smaller the probability that a specific projected Fourier coefficient is small for all of them and hence not detectable. Therefore, the parameter r directly controls this projection failure probability.
1.4 The number r of detection iterations
We consider a single trigonometric polynomial \(p\not \equiv 0\) with \(\min _{{\varvec{h}}\in {\text {supp}}\,{\hat{p}}}\vert \hat{p}_{{\varvec{h}}}\vert \ge 3\theta \) and \(\Gamma \supset {\text {supp}}\,{\hat{p}}\), \(\vert {\text {supp}}\,{\hat{p}}\vert \le s\). Choosing
\(r := \left\lceil 2\,s\,\log \left( \frac{3\,d\,s\,G}{\delta }\right) \right\rceil ,\)
as given in [25, Lem. 7], yields a probability of at most \(\frac{\delta }{3\,d\,s\,G}\) that all the projected Fourier coefficients are less than \(\theta \) for at least one frequency.
For G such trigonometric polynomials \(p^{(g)}\), we then apply the union bound. Hence, the probability that all the projected Fourier coefficients are less than \(\theta \) for at least one frequency and at least one signal is bounded by \(\frac{\delta }{3\,d\,s}\).
1.5 The failure probabilities \(\gamma _{{\text {A}}}\) and \(\gamma _{{\text {B}}}\)
In the remaining lines of Part 2, we investigate the choices of the failure probabilities \(\gamma _{{\text {A}}}\) and \(\gamma _{{\text {B}}}\).
We start with the parameter \(\gamma _{{\text {A}}}\), which is in fact the failure probability of [32, Alg. 4] in the role of Algorithm \({\text {A}}\). Choosing \(\gamma _{{\text {A}}}:=\frac{\delta }{3\,d\,s\,G}\), we observe that the probability that at least one of the G applications of Algorithm \({\text {A}}\) fails in step 2d (for fixed t and i) is bounded from above, due to the union bound, by \(\frac{\delta }{3\,d\,s}\), again as in [32, Sect. 3.2.3].
Last, we fix the parameter \(\gamma _B:=\frac{\delta }{3\,d}\), i.e., the failure probability of step 3 is bounded from above by \(\gamma _B\), cf. [24, Thm. 4.1]. Here, no modification is needed.
1.6 Final step: parameter insertion and union bounds
The new parameter choices for r, \(\gamma _{{\text {A}}}\) and \(\gamma _{{\text {B}}}\) lead to the same failure probabilities for the detection of projected coefficients (\(\frac{\delta }{3\,d\,s}\)), for Algorithm \({\text {A}}\) for fixed t and i (\(\frac{\delta }{3\,d\,s}\)) and for step 3 of Algorithm 1 (\(\frac{\delta }{3\,d}\)). Therefore, we can now use a union bound over the different steps of our algorithm, similar to [25, Thm. 9], which shows that the total failure probability is indeed bounded by \(\delta \).
Finally, the sample complexity and computational complexity stated in Theorem 1.1 now follow directly using the above discussed choices \(r = \lceil 2\,s\,\log (\frac{3\,d\,s\,G}{\delta })\rceil \), \(\gamma _A:=\frac{\delta }{3\,d\,s\,G}\), and \(\gamma _B:=\frac{\delta }{3\,d}\). The precise complexities for each step are given in Table 5.
Remark B.1
The sample complexity of step 2 was the dominating term for the sFFT. Hence, we could neglect the sample complexity of step 3 there completely. In the usFFT, this sample complexity now contains a linear factor G, so it is no longer negligible for arbitrarily chosen G. However, if we can bound G, for example by \(G \lesssim d\,s\), the sample complexity of step 3 is again asymptotically smaller than that of step 2. Even more, since G appears only in logarithmic terms of the sample complexity of step 2, the overall sample complexity of the usFFT is then the same as for the sFFT, i.e., the number of sampling locations is bounded in \({\mathcal {O}}\left( d\, s\, \max (s, N_{\Gamma })\, \log ^2 \frac{d\, s\, N_{\Gamma }}{\delta }\right) \) when assuming \(G \lesssim d\,s\), cf. also [32, Thm. 1.3] for comparison. This is an important observation, since the number of sampling locations is the crucial factor for the overall computational complexity of our algorithm due to the high computational cost of the underlying sampling algorithm, i.e., the PDE solver, as mentioned several times before.
Appendix C
Proof of Lemma 3.1
Proof
Since i and \(z_j\) in formula (3.4) are integers, we know that \(\tilde{y}_{i,j} \in \lbrace \frac{n}{M},\, n=0,\ldots ,M-1 \rbrace \) for all \(i=0,\ldots ,M-1\) and \(j=1,\ldots ,d\). In particular, since M is prime and \(M \nmid z_{j_0}\), we have that \(\lbrace \tilde{y}_{i,j_0}, i=0,\ldots ,M-1\rbrace = \lbrace \frac{n}{M},\, n=0,\ldots ,M-1 \rbrace \), so each \(\frac{n}{M}\) is really attained at least once for some i. Using this and the fact that we are only considering \(0 = \frac{0}{M}< \Delta < \frac{1}{2}\cdot \frac{1}{M} = \frac{1}{2M}\), we have
On the other hand, we have
where the minimum is attained for n being the closest integer number to \(M\Delta + \frac{M}{2}\). Since \(0< M\Delta < \frac{M}{2M} = \frac{1}{2}\) and M odd, we conclude
Since these two minima sum to the constant \(\frac{1}{2M}\), we have the upper bound
Finally, this upper bound is reached if and only if \(\Delta = \frac{1}{4M}\) and hence
\(\square \)
Remark C.1
Note that the \({{\,\mathrm{arg\,max}\,}}\) in Lemma 3.1 is not unique in general, since there also exist several values \(\Delta \ge \frac{1}{2M}\) attaining this maximum, e.g., \(\Delta = \frac{3}{4M}\), which can be proven analogously. In our numerical experiments in Sect. 4, we always work with \(\Delta _{\text {opt}} = \frac{1}{4M}\), which is the smallest optimal \(\Delta > 0\), as we saw in the Lemma above.
Also, if we dropped the assumption that M is prime, we could run into problems if \(z_j\) and M are not coprime for some \(j \in \lbrace 1,\ldots ,d \rbrace \), since then \(\lbrace \tilde{y}_{i,j}, i=0,\ldots ,M-1\rbrace \) is only a proper subset of \(\lbrace \frac{n}{M},\, n=0,\ldots ,M-1 \rbrace \). But this case is negligible, since our algorithm only uses prime lattice sizes M.
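The structure of the proof of Lemma 3.1 can be checked numerically. Assuming the two minima analyzed above are \(\min _{n} \vert \frac{n}{M} - \Delta \vert \) and \(\min _{n} \vert \frac{n}{M} - \Delta - \frac{1}{2} \vert \) (an assumption, since the displayed equations are not reproduced here), a short sketch in exact rational arithmetic:

```python
from fractions import Fraction

def two_minima(M, delta):
    """Assumed form of the two minima from the proof: for n = 0,...,M-1,
    compute min_n |n/M - delta| and min_n |n/M - delta - 1/2|."""
    m1 = min(abs(Fraction(n, M) - delta) for n in range(M))
    m2 = min(abs(Fraction(n, M) - delta - Fraction(1, 2)) for n in range(M))
    return m1, m2
```

For odd prime M and \(0< \Delta < \frac{1}{2M}\), the two minima sum to the constant \(\frac{1}{2M}\) and balance exactly at \(\Delta = \frac{1}{4M}\), matching the conclusion of the proof.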
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Kämmerer, L., Potts, D. & Taubert, F. The uniform sparse FFT with application to PDEs with random coefficients. Sampl. Theory Signal Process. Data Anal. 20, 19 (2022). https://doi.org/10.1007/s43670022000373
Keywords
 Partial differential equation with random coefficient
 Stochastic differential equation
 Sparse fast Fourier transform
 Sparse FFT
 Lattice rule
 Periodization
 Uncertainty quantification
 High dimensional trigonometric approximation