## 1 Introduction

Parametric operator equations have attracted significant attention in recent years. In particular, partial differential equations with random coefficients play an important role in uncertainty quantification, e.g., [8, 22, 23]. Consequently, the efficient and reliable numerical solution of these equations has become increasingly important.

In this work, we consider the parametric, elliptic problem of finding $$u: D_{{\varvec{x}}} \times D_{{\varvec{y}}} \rightarrow {\mathbb {R}}$$ such that for every $${\varvec{y}}\in D_{{\varvec{y}}}$$ there holds

\begin{aligned} \begin{aligned} -\nabla \cdot \left( a({\varvec{x}},{\varvec{y}}) \nabla u({\varvec{x}},{\varvec{y}}) \right)&= f({\varvec{x}}) \qquad \qquad {\varvec{x}}\in D_{{\varvec{x}}},\, {\varvec{y}}\in D_{{\varvec{y}}},\\ u({\varvec{x}},{\varvec{y}})&= 0 \quad \qquad \qquad {\varvec{x}}\in \partial D_{{\varvec{x}}},\, {\varvec{y}}\in D_{{\varvec{y}}}, \end{aligned} \end{aligned}
(1.1)

describing the diffusion characteristics of inhomogeneous materials; such equations are therefore called diffusion equations with random diffusion coefficient a. Here, $${\varvec{x}}= (x_j)_{j=1}^{d_{{\varvec{x}}}} \in D_{{\varvec{x}}}$$ is the spatial variable in a bounded Lipschitz domain $$D_{{\varvec{x}}} \subset {\mathbb {R}}^{d_{{\varvec{x}}}}$$, typically with spatial dimension $$d_{{\varvec{x}}}=1,2$$ or 3, and $${\varvec{y}}= (y_j)_{j=1}^{d_{{\varvec{y}}}} \in D_{{\varvec{y}}}$$ is a high-dimensional random variable with $$D_{{\varvec{y}}} \subset {\mathbb {R}}^{d_{{\varvec{y}}}}$$. For the remainder of this paper, we write d instead of $$d_{{\varvec{y}}}$$ to simplify notation. The differential operator $$\nabla$$ always acts w.r.t. the spatial variable $${\varvec{x}}$$, and the one-dimensional random variables $$y_j$$ are assumed to be i.i.d. with a prescribed distribution.

A common way to define the random coefficient a is via

\begin{aligned} a({\varvec{x}},{\varvec{y}}) = a_0({\varvec{x}}) + \sum _{j=1}^{d} \Theta _j({\varvec{y}}) \,\psi _j({\varvec{x}}), \end{aligned}
(1.2)

where $$a_0$$ and $$\psi _j$$ are assumed to be uniformly bounded on $$D_{{\varvec{x}}}$$. This model is commonly used with the stochastic domain $$D_{{\varvec{y}}} = [\alpha ,\beta ]^{d}$$, typically with $$[\alpha ,\beta ] = [-1,1]$$ or $$[-\frac{1}{2}, \frac{1}{2}]$$. The $$\Theta _j({\varvec{y}})$$ can also be interpreted as random variables themselves and are usually chosen with zero mean, i.e., $${\mathbb {E}}[\Theta _j({\varvec{y}})]=0$$, such that $${\mathbb {E}}[a({\varvec{x}},\cdot )] = a_0({\varvec{x}})$$ holds and the terms of the sum model the stochastic fluctuations.

Often, the model (1.2) takes an affine form, using $$\Theta _j({\varvec{y}}) = y_j$$ for all $$j=1,\ldots ,d$$. This so-called affine model is considered in many works on parametric differential equations with random coefficients, e.g., [2, 5, 11, 14, 15, 18, 30, 31, 37, 40, 44]. The so-called periodic model using $$\Theta _j({\varvec{y}}) = \frac{1}{\sqrt{6}}\sin (2\pi y_j)$$ has recently been studied in [22, 23]. For $$y_j$$ uniformly distributed on $$[-\frac{1}{2},\frac{1}{2}]$$ each, these $$\Theta _j$$ are then distributed according to the arcsine distribution on $$[-1,1]$$. This model has turned out to be worth considering in addition to the affine model. Further, it yields some advantages for our new approach due to its periodicity w.r.t. the random variables, as we will see later.
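To make these coefficient models concrete, the following minimal sketch evaluates (1.2) for both choices of $$\Theta _j$$. The decaying functions $$\psi _j(x) = j^{-2}\sin (j\pi x)$$ on $$D_{{\varvec{x}}}=(0,1)$$ and $$a_0 \equiv 1$$ are hypothetical choices for illustration only; the paper merely assumes $$a_0$$ and the $$\psi _j$$ to be uniformly bounded.

```python
import numpy as np

def coefficient(x, y, model="affine", a0=1.0):
    """Evaluate the random coefficient (1.2) with Theta_j(y) = y_j (affine
    model) or Theta_j(y) = sin(2*pi*y_j)/sqrt(6) (periodic model).

    The decaying choice psi_j(x) = j^{-2} * sin(j*pi*x) and constant a_0
    are illustrative assumptions, not taken from the paper.
    """
    x = np.asarray(x, dtype=float)
    theta = np.asarray(y, dtype=float)
    if model == "periodic":
        theta = np.sin(2 * np.pi * theta) / np.sqrt(6)
    a = np.full_like(x, a0)
    for j, tj in enumerate(theta, start=1):
        a = a + tj * j**-2.0 * np.sin(j * np.pi * x)
    return a
```

With this illustrative choice and $$y_j \in [-\frac{1}{2},\frac{1}{2}]$$, the affine fluctuation is bounded by $$\frac{1}{2}\sum _j j^{-2} < \frac{\pi ^2}{12} \approx 0.82$$, so the coefficient stays uniformly positive; the periodic model obeys an even tighter bound since $$|\Theta _j| \le 1/\sqrt{6}$$.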

The second type of random coefficient a, also used in many recent works, e.g., [3, 4, 6, 9, 19, 36], is the so-called lognormal form

\begin{aligned}&a({\varvec{x}},{\varvec{y}}) = a_0({\varvec{x}}) + \exp (b({\varvec{x}},{\varvec{y}})),&b({\varvec{x}},{\varvec{y}}) = b_0({\varvec{x}}) + \sum _{j=1}^{d} y_j \,\psi _j({\varvec{x}}). \end{aligned}

Here, the random variables $$y_j$$ are typically standard normally distributed, i.e., $$y_j \sim {\mathcal {N}}(0,1)$$, and hence $$D_{{\varvec{y}}} = {\mathbb {R}}^{d}$$. The numerical analysis as well as the computation of approximations is more difficult for this model, but the model also arises more often in real applications. A more detailed overview of parametric and stochastic PDEs can be found, e.g., in [10, Sect. 1].
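A sketch of the lognormal coefficient under the same illustrative $$\psi _j(x) = j^{-2}\sin (j\pi x)$$ (again a hypothetical choice, not taken from the paper):

```python
import numpy as np

def lognormal_coefficient(x, y, a0=0.0, b0=0.0):
    """Evaluate a(x,y) = a_0(x) + exp(b_0(x) + sum_j y_j * psi_j(x)) with
    y_j ~ N(0,1); psi_j(x) = j^{-2} * sin(j*pi*x) is an illustrative
    assumption, and a_0, b_0 are taken constant for simplicity.
    """
    x = np.asarray(x, dtype=float)
    b = np.full_like(x, b0)
    for j, yj in enumerate(y, start=1):
        b = b + yj * j**-2.0 * np.sin(j * np.pi * x)
    return a0 + np.exp(b)
```

Note that $$a({\varvec{x}},{\varvec{y}}) > a_0({\varvec{x}})$$ pointwise for every fixed $${\varvec{y}}$$ since the exponential is positive, but because $$D_{{\varvec{y}}} = {\mathbb {R}}^d$$, no uniform bounds $$a_{\min }$$, $$a_{\max }$$ hold over the whole parameter domain, which is one reason the analysis of this model is harder.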

In this paper, we design a numerical method for solving the aforementioned problems. More precisely, we compute approximations of the solutions $$u({\varvec{x}},{\varvec{y}})$$ using trigonometric polynomials. A Fourier approach for ordinary differential equations, i.e., $$d_{{\varvec{x}}}=1$$, with high-dimensional random coefficients has already been presented in [7]. There, a dimension-incremental method, the so-called sparse Fast Fourier Transform (sFFT), cf. [39], was used to detect the most important frequencies $${\varvec{k}}$$ and corresponding approximations of the Fourier coefficients $$c_{{\varvec{k}}}(u)$$ of the solution $$u({\varvec{x}},{\varvec{y}})$$. These values can be used to compute an approximation of the solution u or other quantities of interest such as, e.g., the expectation value $${\mathbb {E}}[u]$$. Further, the frequencies and Fourier coefficients can be used to gain detailed information about the influence of the random variables $$y_j$$ on the solution u and their interaction with each other.

In this work, we present a non-intrusive approach based on the main idea of the algorithm developed in [7]. The main difference is that we do not include the spatial variable $${\varvec{x}}$$ in the Fourier approach and therefore apply the sFFT only w.r.t. the random variable $${\varvec{y}}$$. Hence, the sFFT only needs samples of the function values of u for fixed $${\varvec{y}}$$, which can be computed using any suitable, already available differential equation solver. In consequence, we are not restricted to particular spatial domains $$D_{{\varvec{x}}}$$ or spatial dimensions $$d_{{\varvec{x}}}$$. More precisely, we consider a set $${{\mathcal {T}}}_G \subset D_{{\varvec{x}}}$$ with finite cardinality $$|{{\mathcal {T}}}_G| = G < \infty$$ as spatial discretization and aim for approximations of the functions

\begin{aligned} u_{{\varvec{x}}_g}({\varvec{y}}) :=u({\varvec{x}}_g,{\varvec{y}}) \end{aligned}

for each $${\varvec{x}}_g \in {{\mathcal {T}}}_G$$. Also note that we might need to apply a suitable periodization w.r.t. $${\varvec{y}}$$ first if the function $$u_{{\varvec{x}}_g}$$ is not already periodic.

Unfortunately, such a pointwise approximation algorithm, in our case the sFFT, would then have to be applied G times separately, resulting in an unnecessarily huge increase in the number of samples used and therefore, since each sample implies a call of the underlying, possibly expensive differential equation solver, also in the computation time of the algorithm. Hence, we develop a modification of the sFFT to overcome this problem and compute the approximations of the functions $$u_{{\varvec{x}}_g}$$ within one call of the new algorithm. In particular, our so-called uniform sparse Fast Fourier Transform (usFFT) combines certain candidate sets between the dimension-incremental steps, which allows using the same sampling nodes $${\varvec{y}}$$ for each point $${\varvec{x}}_g \in {{\mathcal {T}}}_G$$ in the next step. This strategy keeps the number of used samples at a reasonable size and hence decreases the computation time drastically compared to G applications of the sFFT itself. We summarize this in the following theorem:

### Theorem 1.1

Let the sparsity parameter $$s\in {\mathbb {N}}$$, a frequency candidate set $$\Gamma \subset {\mathbb {Z}}^d$$, $$|\Gamma |<\infty$$, the number $$G\in {\mathbb {N}}$$ and a failure probability $$\delta \in (0,1)$$ be given. Moreover, we define $$N_{\Gamma } :=\max _{j=1,\ldots ,d} \lbrace \max _{{\varvec{k}}\in \Gamma } k_j - \min _{{\varvec{l}}\in \Gamma } l_j \rbrace$$. Then, there exists a randomized sampling strategy based on the random rank-1 lattice approach in [32] generating a set S of sampling locations with cardinality

\begin{aligned} |S|\in {\mathcal {O}}\left( d\, s\, \max (s, N_{\Gamma })\, \log ^2 \frac{d\, s\, G\, N_{\Gamma }}{\delta } + \max (sG, N_\Gamma )\, \log \frac{d\,s\,G}{\delta } \right) \end{aligned}
(1.3)

such that the following holds.

Consider G arbitrary multivariate trigonometric polynomials $$p^{(g)}({\varvec{y}}):=\sum _{{\varvec{k}}\in I_g}\hat{p}_{{\varvec{k}}}^{(g)}\text {e}^{2\pi \text {i}{\varvec{k}}\cdot {\varvec{y}}}$$, $$g=1,\ldots ,G$$, where we assume $$I_g\subset \Gamma$$, $$|I_g|\le s$$ and $$\min _{{\varvec{k}}\in I_g}|\hat{p}_{{\varvec{k}}}^{(g)}|>0$$ for each $$g=1,\ldots ,G$$. We generate a random set S via this sampling strategy. Then, with probability at least $$1-\delta$$ it holds that

• all frequencies $${\varvec{k}}\in I_g$$ as well as

• all Fourier coefficients $$\hat{p}_{{\varvec{k}}}^{(g)}$$, $${\varvec{k}}\in I_g$$,

of all multivariate trigonometric polynomials $$p^{(g)}$$, $$g=1,\ldots ,G$$, can be reconstructed from their values at the sampling locations in S.

The simultaneous identification of all the frequencies and the computation of all the Fourier coefficients can be realized by a combination of Algorithm 1 and a modification of the approach presented in [32] in the role of Algorithm $${\text {A}}$$. The suggested method has a computational complexity of

\begin{aligned} {\mathcal {O}}\left( d^2\, s^2 \,G^2 \,N_{\Gamma }\, \log ^3 \frac{d \,s\, G\, N_{\Gamma }}{\delta }\right) \end{aligned}

with probability at least $$1-\delta$$ as well as $${\mathcal {O}}\left( d^2 \,s^3 \,G^2 \,N_{\Gamma } \,\log ^3 \frac{d \,s \,G \,N_{\Gamma }}{\delta }\right)$$ in the worst case.

### Remark 1.2

Note that (1.3) in Theorem 1.1 does not state anything about the sampling complexity of Algorithm 1, but rather about the number of sampling locations $$\vert S \vert$$. We need samples of all trigonometric polynomials $$p^{(g)}({\varvec{y}}), g=1,\ldots ,G,$$ at these sampling nodes $${\varvec{y}}\in S$$, so the necessary number of samples in the classical sense is G times larger. However, we aim for an algorithm where the G samples $$p^{(g)}({\varvec{y}})$$ for a fixed $${\varvec{y}}\in S$$ and all $$g=1,\ldots ,G$$ are obtained by just a single call of some (possibly expensive) black box sampling method. In all of our numerical examples in Sect. 4, the computation time for this sampling procedure tremendously outweighs the pure computation time of the remaining steps of Algorithm 1, which we referred to as computational complexity in Theorem 1.1. Hence, we stress that the main focus of this complexity result is not the computational complexity but the number of sampling nodes.

The proof of the theorem is given in Appendix B. While Theorem 1.1 is stated for trigonometric polynomials $$p^{(g)}$$ only, the algorithm can also be applied to the above-mentioned periodic or periodized functions $$u_{{\varvec{x}}_g}({\varvec{y}})$$ in order to compute the support and values of their approximately largest Fourier coefficients with a thresholding technique, realized by the parameters $$s_\mathrm {local}$$ and $$\theta$$ in Algorithm 1; this is also the key idea when applying the sFFT for function approximation in [25, 32, 39]. More generally, we could even consider G different periodic functionals $$F_g({\varvec{y}})$$ and approximate them with the same approach we are about to present here. Moreover, Theorem 1.1 does not assume the frequency sets $$I_g$$ to share any frequencies $${\varvec{k}}$$, i.e., these sets could even be pairwise disjoint in the worst case. Obviously, this will not be the case in our examples later on, as the functions $$u_{{\varvec{x}}_g}({\varvec{y}})$$ and $$u_{{\varvec{x}}_{\tilde{g}}}({\varvec{y}})$$ are probably very similar for $${\varvec{x}}_g$$ and $${\varvec{x}}_{\tilde{g}}$$ close to each other due to the smoothness of the solution $$u({\varvec{x}},{\varvec{y}})$$. Hence, the given complexities, especially the quadratic dependence on G of the computational complexity, are very pessimistic and should be seen as worst-case estimates.

The crucial advantage of the presented approach is the efficient and adaptive choice of the frequency set performed by the underlying sFFT. Most of the approaches in the aforementioned works are based on certain (tensorized) basis functions [2,3,4,5,6, 9, 11, 22], Quasi-Monte Carlo methods [9, 14, 15, 18, 19, 23, 30, 31, 36, 37, 40] or collocation methods [9, 44] and often require the involved basis functions, weights, index sets or kernels to be known, chosen or computed in advance. A common example are compressed sensing techniques, see, e.g., [1, Ch. 7] or [17] and the references therein for an overview, where the considered index sets need to be chosen in advance. The adaptivity of our algorithm circumvents this step and therefore provides much more freedom in finding a good sparse approximation. Also note that our method aims at approximating the solution $$u({\varvec{x}},{\varvec{y}})$$ directly instead of, e.g., just performing a high-dimensional quadrature. As an example and for additional information, we refer to [28] as a short and general introduction to Quasi-Monte Carlo methods, which are among the most common approaches.

Another suitable approach is an efficient deterministic sampling strategy which guarantees the reconstruction of every s-sparse signal supported on the d-dimensional candidate set $$\Gamma$$, a so-called deterministic sparse Fourier transform. Such a method allows using the same samples for approximating all signals $$u_{{\varvec{x}}_g}$$, $$g=1,\ldots , G$$. Several works like [21] already provide applicable multivariate sparse Fourier transform results, but the resulting complexities $${\mathcal {O}}(d^4s^2)$$ (up to logarithmic factors) in a deterministic setting and $${\mathcal {O}}(d^4s)$$ for a random variant scale suboptimally in the dimension d. Another fully deterministic method is stated in [34]. There, the construction of the sampling sets suffers from some minor restrictions on the considered frequency set, which result in the slightly better sampling complexity $${\mathcal {O}}(d^3s^2N)$$ and computational complexity $${\mathcal {O}}(d^3s^2N^2)$$ (both again up to logarithmic factors), with N the side length of the considered cube in the frequency domain. Finally, in [20] multivariate sparse Fourier transforms are presented whose complexities again scale quadratically in s for a deterministic version and linearly for a Monte Carlo version, while the dimension d only enters linearly. Unfortunately, this is only true if the considered candidate sets do not scale exponentially w.r.t. the dimension d. Otherwise, the size M of the reconstructing rank-1 lattices used also scales exponentially in d, and logarithmic factors like $$\log ^{11/2} M$$ in the complexities then result again in a suboptimal dimension scaling.

Our usFFT is highly adaptive, since it only needs an arbitrary candidate set $$\Gamma$$ and selects the important frequencies $${\varvec{k}}$$ in this search domain on its own. The cardinality $$|\Gamma |$$ of this candidate set is not as crucial as for other approaches like those mentioned above, since the number of used samples and the computation time suffer only mildly from larger candidate sets. Again, we stress that we may also extract additional information about the influence and the interactions of the random variables $$y_j$$ from the output of the usFFT. For instance, in our numerical examples we detect a maximum of only four simultaneously active dimensions in the detected frequencies, i.e., the detected frequency vectors $${\varvec{k}}$$ have at most four non-zero components even for $$d = 10$$ or $$d =20$$. Another main advantage of our algorithm is its non-intrusive and parallelizable nature. As already mentioned, the usFFT uses existing numerical solvers for the considered differential equation, so we can use suitable, reliable and efficient solvers with no need to re-implement them. Further, the samples needed in each sampling step can be computed in parallel on multiple instances. This parallelization reduces the computation time even further and makes a higher number of used samples less time consuming.

The remainder of the paper is organized as follows:

In Sect. 2 we set up some notation and assumptions and briefly explain the key idea of the sFFT algorithm. Section 3 is devoted to the explanation of the usFFT as well as some periodizations required for the affine and lognormal cases. Finally, in Sect. 4 we test the new algorithm on different examples using periodic, affine, and lognormal random coefficients and investigate the computed approximations under different aspects.

The MATLAB® source code of the algorithm as well as demos for our numerical examples can be downloaded from https://mytuc.org/fyfw.

## 2 Prerequisites

We consider the PDE problem (1.1). Note that we always assume f to be independent of the random variable $${\varvec{y}}$$ and impose zero boundary conditions just for simplicity and to preserve clarity. Our algorithm (up to some minor changes) may also be applied to right-hand sides $$f({\varvec{x}},{\varvec{y}})$$ as well as non-zero Dirichlet boundary conditions $$u({\varvec{x}},{\varvec{y}}) = h({\varvec{x}},{\varvec{y}})$$ for all $${\varvec{x}}\in \partial D_{{\varvec{x}}}$$.

### 2.1 Problem setting

The weak formulation of our problem reads: Given $$f \in H^{-1}(D_{{\varvec{x}}})$$, for every $${\varvec{y}}\in D_{{\varvec{y}}}$$, find $$u(\cdot ,{\varvec{y}}) \in H_0^1(D_{{\varvec{x}}})$$, such that

\begin{aligned} \int \limits _{D_{{\varvec{x}}}} a({\varvec{x}},{\varvec{y}}) \nabla u({\varvec{x}},{\varvec{y}}) \cdot \nabla v({\varvec{x}}) \,\mathrm {d}{\varvec{x}}= \int \limits _{D_{{\varvec{x}}}} f({\varvec{x}})v({\varvec{x}})\,\mathrm {d}{\varvec{x}}\quad \forall v \in H_0^1(D_{{\varvec{x}}}). \end{aligned}

As usual, $$H_0^1(D_{{\varvec{x}}})$$ denotes the subspace of the $$L_2$$-Sobolev space $$H^1(D_{{\varvec{x}}})$$ with vanishing trace on $$\partial D_{{\varvec{x}}}$$ and $$H^{-1}(D_{{\varvec{x}}})$$ denotes the dual space of $$H_0^1(D_{{\varvec{x}}})$$. We say that the diffusion coefficient $$a: D_{{\varvec{x}}} \times D_{{\varvec{y}}} \rightarrow {\mathbb {R}}$$ fulfills the uniform ellipticity assumption if there exist two constants $$a_{\min } \in {\mathbb {R}}$$ and $$a_{\max } \in {\mathbb {R}}$$ such that

\begin{aligned}&0< a_{\min } \le a({\varvec{x}},{\varvec{y}}) \le a_{\max } < \infty&\forall {\varvec{x}}\in D_{{\varvec{x}}},\, \forall {\varvec{y}}\in D_{{\varvec{y}}}. \end{aligned}
(2.1)

Then, the Lax–Milgram Lemma ensures that the problem (1.1) possesses a unique solution $$u(\cdot ,{\varvec{y}}) \in H_0^1(D_{{\varvec{x}}})$$ for every fixed $${\varvec{y}}\in D_{{\varvec{y}}}$$, satisfying the a priori estimate

\begin{aligned} \sup _{{\varvec{y}}\in D_{{\varvec{y}}}} \Vert u(\cdot ,{\varvec{y}}) \Vert _{H_0^1(D_{{\varvec{x}}})} \le \frac{1}{a_{\min }} \Vert f \Vert _{H^{-1}(D_{{\varvec{x}}})}. \end{aligned}

Some further basic information and results on approximation and smoothness of the solution u of high-dimensional parametric PDEs can be found in [10, Sects. 1 and 2]. Additionally, we also refer to the general results on best n-term approximations given in [10, Sect. 3.1], since our Fourier approach fits in this particular framework as well.

In order to compute an approximation of the solution $$u_{{\varvec{x}}_g} :=u({\varvec{x}}_g,\cdot )$$ at a given point $${\varvec{x}}_g \in D_{{\varvec{x}}}$$ using the dimension-incremental method explained below, we need samples of $$u_{{\varvec{x}}_g}$$ at many sampling nodes $${\varvec{y}}$$. We aim for a non-intrusive approach and therefore use a finite element method to solve the problem (1.1) for a given $${\varvec{y}}\in D_{{\varvec{y}}}$$. A similar approach is used, e.g., in [36, 37], where the finite element method is used to solve the PDE for any $${\varvec{y}}_{\mathfrak {u}}$$ with $$\mathfrak {u} \subset {\mathbb {N}}$$ and $$({\varvec{y}}_{\mathfrak {u}})_j = y_j$$ for $$j\in \mathfrak {u}$$ and 0 otherwise. The corresponding approximations of the so-called $$\mathfrak {u}$$-truncated solution are then used for their particular method as well. In our case, we just evaluate the finite element solution $$\check{u}(\cdot ,{\varvec{y}})$$ at the given point $${\varvec{x}}_g \in D_{{\varvec{x}}}$$. In fact, instead of the finite element method, any differential equation solver capable of computing the value $$u({\varvec{x}}_g,{\varvec{y}})$$ for given $${\varvec{x}}_g$$ and $${\varvec{y}}$$ would fit. Hence, we also refer to this sampling method as black box sampling later on.
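Such a black box sampler can be sketched as follows, assuming for illustration a one-dimensional domain $$D_{{\varvec{x}}}=(0,1)$$, the affine coefficient with the hypothetical $$\psi _j(x)=j^{-2}\sin (j\pi x)$$, right-hand side $$f\equiv 1$$, and a simple finite difference discretization in place of the finite element method used in the paper; any solver returning $$u({\varvec{x}}_g,{\varvec{y}})$$ could be substituted:

```python
import numpy as np

def solve_diffusion_1d(y, f=lambda x: 1.0, n=200):
    """Black box sampler sketch: solve -(a(x,y) u')' = f on (0,1) with
    u(0) = u(1) = 0 for one fixed parameter vector y by finite differences.
    The affine coefficient a(x,y) = 1 + sum_j y_j * j^{-2} * sin(j*pi*x)
    is an illustrative assumption. Returns the grid x and the solution u;
    evaluating u at the grid points x_g yields all samples u(x_g, y) from
    a single solve.
    """
    h = 1.0 / n
    x = np.linspace(0.0, 1.0, n + 1)
    # coefficient at the midpoints x_{i+1/2}, standard for -(a u')'
    xm = 0.5 * (x[:-1] + x[1:])
    a = np.ones_like(xm)
    for j, yj in enumerate(y, start=1):
        a += yj * j**-2.0 * np.sin(j * np.pi * xm)
    # tridiagonal system for the interior unknowns u_1, ..., u_{n-1}
    main = (a[:-1] + a[1:]) / h**2
    off = -a[1:-1] / h**2
    A = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
    rhs = np.array([f(xi) for xi in x[1:-1]], dtype=float)
    u = np.zeros(n + 1)
    u[1:-1] = np.linalg.solve(A, rhs)
    return x, u
```

For $${\varvec{y}}=\varvec{0}$$ (i.e., $$a\equiv 1$$) and $$f\equiv 1$$, the exact solution is $$u(x)=x(1-x)/2$$, which this scheme reproduces at the nodes; this gives a quick sanity check of the sampler.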

Note that we will use the finite element solution $$\check{u}$$ also as an approximation of the true solution u, when we test the accuracy of our computed approximation $$u^{\texttt {usFFT}}$$ in Sect. 4. In detail, we have

\begin{aligned} \text {err}(u,u^{\texttt {usFFT}}) \le \text {err}(u,\check{u}) + \text {err}(\check{u},u^{\texttt {usFFT}}), \end{aligned}

where $$\text {err}(\cdot ,\cdot )$$ is a suitable metric symbolizing the error. While we only investigate the second term $$\text {err}(\check{u},u^{\texttt {usFFT}})$$ in our numerical tests later, the first term includes other error sources such as the modeling error, e.g., from a dimension truncation of an infinite-dimensional random coefficient a, or the error of the finite element approximation itself. For a particular example, we refer to the detailed error analysis for the periodic model mentioned in Sect. 1, which is given in [22, Sect. 4].

### 2.2 The dimension-incremental method for s-sparse periodic functions

The following dimension-incremental method was presented in [39]. The aim of this algorithm is to determine the non-zero Fourier coefficients $$\hat{p}_{{\varvec{k}}} \in {\mathbb {C}}, \, {\varvec{k}}\in I,$$ of a multivariate trigonometric polynomial

\begin{aligned} p({\varvec{y}}) = \sum _{{\varvec{k}}\in I} \hat{p}_{{\varvec{k}}} \exp (2\pi \text {i}{\varvec{k}}\cdot {\varvec{y}}) \end{aligned}

with unknown frequency set $$I\subset {\mathbb {Z}}^d,\, |I| < \infty ,$$ based on samples of the polynomial p. Obviously, p is a periodic signal and its domain is the d-dimensional torus $${\mathbb {T}}^d$$, $${\mathbb {T}}\simeq [0,1)$$.

The goal is not only to calculate the nonzero Fourier coefficients $$\hat{p}_{{\varvec{k}}}$$ but also, and more importantly, to detect the frequencies $$\varvec{k}$$ belonging to the nonzero Fourier coefficients out of a possibly huge search domain $$\Gamma \subset {\mathbb {Z}}^d$$. In particular, we define the set

\begin{aligned} {\text {supp}}\hat{p}:=\{{\varvec{k}}\in \Gamma :\hat{p}_{\varvec{k}}\ne 0\} \end{aligned}

and call the cardinality $$|{\text {supp}}\hat{p}|$$ the sparsity of p.

First, we introduce some further notation. We consider a given search domain $$\Gamma \subset {\mathbb {Z}}^d, \, |\Gamma | < \infty ,$$ that should be large enough to contain the unknown frequency set $${I\subset \Gamma }$$. We denote the projection of a frequency $${\varvec{k}}:=(k_1,\ldots ,k_d) \in {\mathbb {Z}}^d$$ to the components $${\varvec{i}}:=(i_1,\ldots ,i_m) \in \lbrace \iota \in \lbrace 1,\ldots ,d \rbrace ^m : \iota _t \not = \iota _{t^{\prime }} \text { for } t \not = t^{\prime } \rbrace$$ by $${\mathcal {P}}_{{\varvec{i}}}({\varvec{k}}) :=(k_{i_1},\ldots ,k_{i_m}) \in {\mathbb {Z}}^m$$. Correspondingly, we define the projection of a frequency set $$I\subset {\mathbb {Z}}^d$$ to the components $${\varvec{i}}$$ by $${\mathcal {P}}_{{\varvec{i}}}(I) :=\lbrace (k_{i_1},\ldots ,k_{i_m}): {\varvec{k}}\in I\rbrace$$. Using these notations, the general approach is the following:
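The projection operators $${\mathcal {P}}_{{\varvec{i}}}$$ can be written down directly; the following lines are a small Python sketch with frequencies as tuples and components 1-based as in the text:

```python
def project(k, i):
    """P_i(k): restrict the frequency k = (k_1, ..., k_d) to the
    (1-based, pairwise distinct) components i = (i_1, ..., i_m)."""
    return tuple(k[j - 1] for j in i)

def project_set(I, i):
    """P_i(I) = {(k_{i_1}, ..., k_{i_m}) : k in I}, as a set, so
    frequencies that coincide after projection merge into one element."""
    return {project(k, i) for k in I}
```

Note that the projected set may be strictly smaller than the original set whenever two frequencies share the selected components.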

### 2.3 Sketch of dimension-incremental reconstruction

1. Compute the first components of the unknown frequency set from sampling values, i.e., determine a set $$I^{(1)} \subset {\mathcal {P}}_1(\Gamma )$$, such that $${\mathcal {P}}_1(\text {supp } \hat{p}) \subset I^{(1)}$$ holds.

2. For dimension increment step $$t=2,\ldots ,d$$, i.e., for each additional dimension:

    (a) Compute the tth components of the unknown frequency set from sampling values, i.e., determine a set $$I^{(t)} \subset {\mathcal {P}}_t(\Gamma )$$, such that $${\mathcal {P}}_t(\text {supp } \hat{p}) \subset I^{(t)}$$ holds.

    (b) Construct a suitable sampling set $${\mathcal {X}}^{(1,\ldots ,t)} \subset {\mathbb {T}}^d,\, |{\mathcal {X}}^{(1,\ldots ,t)}| \ll |\Gamma |$$, which allows to detect those frequencies from the set $$(I^{(1,\ldots ,t-1)} \times I^{(t)}) \cap {\mathcal {P}}_{(1,\ldots ,t)}(\Gamma )$$ belonging to non-zero Fourier coefficients $$\hat{p}_{{\varvec{k}}}$$.

    (c) Sample the trigonometric polynomial p along the nodes of the sampling set $${\mathcal {X}}^{(1,\ldots ,t)}$$.

    (d) Compute the Fourier coefficients $$\tilde{\hat{p}}_{(1,\ldots ,t),{\varvec{k}}},\, {\varvec{k}}\in (I^{(1,\ldots ,t-1)} \times I^{(t)}) \cap {\mathcal {P}}_{(1,\ldots ,t)}(\Gamma )$$.

    (e) Determine the non-zero Fourier coefficients from $$\tilde{\hat{p}}_{(1,\ldots ,t),{\varvec{k}}},\, {\varvec{k}}\in (I^{(1,\ldots ,t-1)} \times I^{(t)}) \cap {\mathcal {P}}_{(1,\ldots ,t)}(\Gamma )$$ and obtain the set $$I^{(1,\ldots ,t)}$$ of detected frequencies. The index set $$I^{(1,\ldots ,t)}$$ should be equal to the projection $${\mathcal {P}}_{(1,\ldots ,t)} (\text {supp }\hat{p})$$.

3. Use the set $$I^{(1,\ldots ,d)}$$ and the computed Fourier coefficients $$\tilde{\hat{p}}_{(1,\ldots ,d),{\varvec{k}}},\, {\varvec{k}}\in I^{(1,\ldots ,d)}$$ as an approximation for the support $$\text {supp } \hat{p}$$ and the Fourier coefficients $$\hat{p}_{{\varvec{k}}},\, {\varvec{k}}\in \text {supp } \hat{p}$$.
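The sketch above can be exercised on a toy example. The following Python sketch detects the support of a 3-variate sparse trigonometric polynomial dimension by dimension; it replaces the structured rank-1 lattice sampling of steps 2b-2d by plain 1D FFTs along one coordinate (with the other coordinates frozen at random anchors) and a least squares solve on random nodes, so it illustrates the dimension-incremental idea rather than the actual Algorithm 1:

```python
import numpy as np

rng = np.random.default_rng(1)

# toy sparse trigonometric polynomial in d = 3 with known support
I_true = {(1, 0, 2): 1.0, (3, 2, 0): 0.5, (0, 1, 1): 2.0}

def p(ys):
    ys = np.atleast_2d(ys)
    return sum(c * np.exp(2j * np.pi * (ys @ np.array(k, dtype=float)))
               for k, c in I_true.items())

d, N = 3, 8  # N exceeds all one-dimensional frequencies, so 1D FFTs are exact

def detect_component(t, trials=3, tol=1e-8):
    # steps 1 / 2a: candidate t-th components via a 1D FFT along coordinate
    # t with the other coordinates frozen at a random anchor; the union over
    # several anchors guards against accidental cancellations
    cand = set()
    for _ in range(trials):
        ys = np.tile(rng.random(d), (N, 1))
        ys[:, t] = np.arange(N) / N
        fhat = np.fft.fft(p(ys)) / N  # projected 1D Fourier coefficients
        cand |= {k for k in range(N) if abs(fhat[k]) > tol}
    return sorted(cand)

def projected_active(cands, tol=1e-8):
    # steps 2b-2e, simplified: projected Fourier coefficients of the
    # candidate tuples via least squares on random nodes (the paper uses
    # structured rank-1 lattice sampling here); keep the nonzero ones
    t = len(cands[0])
    ys = np.tile(rng.random(d), (4 * len(cands), 1))
    ys[:, :t] = rng.random((4 * len(cands), t))
    A = np.exp(2j * np.pi * ys[:, :t] @ np.array(cands, dtype=float).T)
    coef, *_ = np.linalg.lstsq(A, p(ys), rcond=None)
    return [k for k, c in zip(cands, coef) if abs(c) > tol]

detected = [(k,) for k in detect_component(0)]
for t in range(1, d):
    cand_t = detect_component(t)
    detected = projected_active([kk + (k,) for kk in detected for k in cand_t])
```

After the loop, `detected` holds the full frequency vectors of the toy polynomial; in this collision-free example the detection succeeds deterministically, whereas in general the random anchors only succeed with high probability.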

Note that this method can also be used for the numerical determination of the approximately largest Fourier coefficients

\begin{aligned} c_{{\varvec{k}}}(f):=\int _{{\mathbb {T}}^d}f({\varvec{y}})\text {e}^{-2\pi \text {i}{\varvec{k}}\cdot {\varvec{y}}}\mathrm {d}{\varvec{y}},\quad {\varvec{k}}\in I, \end{aligned}

of sufficiently smooth periodic signals f using a suitable thresholding technique, cf. [25, 32, 39].

The proposed approach includes the construction of suitable sampling sets in step 2b. To this end, one assumes that an upper bound $$s\ge |{\text {supp}}\hat{p}|$$ is known and constructs the sampling sets $${\mathcal {X}}^{(1,\ldots ,t)}$$ such that the Fourier coefficients $$\tilde{\hat{p}}_{(1,\ldots ,t),{\varvec{k}}}$$ computed in step 2d are randomly projected ones. Due to this projection, one may observe cancellations, with the effect that active frequencies are missed. For that reason, one repeats the computation of the projected Fourier coefficients for a number r of random projections and then takes the union.

Of course, there exist different methods for the computation of the projected Fourier coefficients. The algorithm works with any sampling method that computes Fourier coefficients on a given frequency set. Preferable sampling sets combine the following four properties:

• relatively low number of sampling nodes (sampling complexity),

• stability,

• efficient construction methods for the sampling set,

• fast Fourier transform like algorithms in order to compute the projected Fourier coefficients.

Especially due to the last point, we will call the dimension-incremental method the sparse Fast Fourier Transform (sFFT) from now on. A quick sketch of the sampling techniques used in [25, 32, 39] as well as the sample complexity and computational complexity of the sFFT using these methods are given in Appendix A.

## 3 The uniform sparse FFT

As described so far, the sFFT algorithm is a suitable tool to compute an approximation of the solution

\begin{aligned} u_{{\varvec{x}}_g}({\varvec{y}}) :=u({\varvec{x}}_g,{\varvec{y}})=\sum _{{\varvec{k}}\in {\mathbb {Z}}^d}c_{\varvec{k}}(u_{{\varvec{x}}_g})\text {e}^{2\pi \text {i}{\varvec{k}}\cdot {\varvec{y}}}\approx \sum _{{\varvec{k}}\in I_{{\varvec{x}}_g}}c_{\varvec{k}}^{\texttt {sFFT}}(u_{{\varvec{x}}_g})\text {e}^{2\pi \text {i}{\varvec{k}}\cdot {\varvec{y}}} \end{aligned}

for a single $${\varvec{x}}_g$$. When considering a whole set of points $${\varvec{x}}_g \in {{\mathcal {T}}}_G$$, $$|{{\mathcal {T}}}_G |= G$$, we would have to call the existing method G times. But multiple, independent calls of the sFFT result in different, adaptively determined sampling sets $${\mathcal {X}}^{(1,\ldots ,t)}$$ in step 2b. Hence, we cannot guarantee that the solutions of the differential equation from one run of the algorithm can be reused in another one. So we really need G full calls of the sFFT, including all sampling computations, and therefore end up with unnecessarily many samples, even when using the sample-efficient rank-1 lattice (R1L) approaches. Remember that sampling means solving the differential equation via a call of the underlying differential equation solver, which might be very expensive in computation time. Therefore, we now modify the dimension-incremental method such that we can work on the set $${{\mathcal {T}}}_G$$ and one call of the algorithm computes approximations of the most important Fourier coefficients $$c_{{\varvec{k}}}(u_{{\varvec{x}}_g})$$, $${\varvec{k}}\in I_{{\varvec{x}}_g}$$, for each $$g=1,\ldots ,G$$, including a clever choice of the sampling nodes $${\varvec{y}}$$.

### 3.1 Expanding the sFFT

The full method is stated in Algorithm 1. We force the dimension-incremental method to select a set $$I\subset \Gamma \subset {\mathbb {Z}}^d$$ containing the frequencies of the s approximately largest Fourier coefficients $$c_{{\varvec{k}}}(u_{{\varvec{x}}_g})$$ for each $${\varvec{x}}_g \in {{\mathcal {T}}}_G$$. To this end, we compute the set of detected frequencies $$I_{{\varvec{x}}_g}^{(1,\ldots ,t)}$$ for each $${\varvec{x}}_g$$ in each dimension increment t, but afterwards we form the union $$\bigcup _{g=1}^{G} I_{{\varvec{x}}_g}^{(1,\ldots ,t)}$$, which becomes the set of detected frequencies $$I^{(1,\ldots ,t)}$$ passed to the next dimension-incremental step $$t+1$$.

Now, we start each iteration with a larger frequency candidate set $$(I^{(1,\ldots ,t-1)} \times I^{(t)}) \cap {\mathcal {P}}_{(1,\ldots ,t)}(\Gamma )$$, which is suitable for all $${\varvec{x}}_g \in {{\mathcal {T}}}_G$$. This way, the first t components of the elements of the sampling set $${\mathcal {X}}^{(1,\ldots ,t)}$$ are the same for each $${\varvec{x}}_g$$, and the random part, which causes the specific random projection of the Fourier coefficients, can be chosen equally for each $${\varvec{x}}_g$$ without disturbing the algorithm. Therefore, we can now take advantage of the fact that our underlying differential equation solver can evaluate the solution $$u({\varvec{x}},{\varvec{y}})$$ for a given $${\varvec{y}}$$ at multiple values of $${\varvec{x}}$$ in the domain $$D_{{\varvec{x}}}$$. Accordingly, we only need to solve the differential equation once for each sampling node $${\varvec{y}}$$ and still get all the sampling values $$u_{{\varvec{x}}_g}({\varvec{y}})$$ for all $${\varvec{x}}_g$$ in our finite set $${{\mathcal {T}}}_G$$. Note that this also holds for the one-dimensional detections in steps 1 and 2a. Also, we might need to interpolate or approximate if some of the values $$u_{{\varvec{x}}_g}({\varvec{y}})$$, $${\varvec{x}}_g\in {{\mathcal {T}}}_G$$ and fixed $${\varvec{y}}\in {\mathcal {X}}^{(1,\ldots ,t)}$$, are not directly provided by the differential equation solver. Obviously, the larger candidate sets $$(I^{(1,\ldots ,t-1)} \times I^{(t)}) \cap {\mathcal {P}}_{(1,\ldots ,t)}(\Gamma )$$, resulting from the unions of the sets $$I_{{\varvec{x}}_g}^{(1,\ldots ,t-1)}$$, $$g=1,\ldots ,G$$, and of the sets $$I_{{\varvec{x}}_g}^{(t)}$$, $$g=1,\ldots ,G$$, also result in larger sampling sets. The overall increase in the number of sampling locations is still very reasonable, cf. Theorem 1.1. The computational complexity suffers a bit more from these modifications, but is not as important as the number of sampling locations, cf. Remark 1.2. We could also think of further thresholding methods to cut the number of frequencies back to the sparsity parameter s at the end of each dimension-incremental step or at least at the end of the whole algorithm. In this work, we will not do this, but instead look at the total number of frequencies in the output of the algorithm in relation to the sparsity parameter s, cf. Remark 4.1.

Overall, the proposed method is now capable of computing approximations

\begin{aligned} u_{{\varvec{x}}_g}^{\texttt {usFFT}}({\varvec{y}}) :=\sum _{{\varvec{k}}\in I}c_{\varvec{k}}^{\texttt {usFFT}}(u_{{\varvec{x}}_g})\text {e}^{2\pi \text {i}{\varvec{k}}\cdot {\varvec{y}}} \end{aligned}

for all nodes $${\varvec{x}}_g$$, $$g=1,\ldots ,G$$. Please note that step 2 does not provide $$(c_{\varvec{k}}^{\texttt {usFFT}}(u_{{\varvec{x}}_g}))_{{\varvec{k}}\in I}$$ but only approximations of $$(c_{\varvec{k}}^{\texttt {usFFT}}(u_{{\varvec{x}}_g}))_{{\varvec{k}}\in \tilde{J}_{d,1,g}}$$, where $$\tilde{J}_{d,1,g}\subsetneq I^{(1,\ldots ,d)}=I$$ holds in general. In order to compute all the Fourier coefficients $$c_{\varvec{k}}^{\texttt {usFFT}}(u_{{\varvec{x}}_g})$$, $${\varvec{k}}\in I$$ and $$g=1,\ldots , G$$, we propose an additional approximation in step 3. For this approximation, the user can apply a suitable approach of their choice. The chosen method only needs to compute an approximation of the projection onto the already determined space of trigonometric polynomials $$\text {span}\{\exp (2\pi \text {i}{\varvec{k}}\cdot \circ ):{\varvec{k}}\in I\}$$, cf., e.g., [24, 26, 27] for different possible sampling approaches.
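The references [24, 26, 27] propose structured sampling schemes for this projection; purely as a minimal, hypothetical illustration of what such a projection computes, one can fit the coefficients on random nodes by least squares:

```python
import numpy as np

def project_onto_span(I, nodes, samples):
    """Least-squares fit of samples u(y^(m)) by sum_{k in I} c_k e^{2 pi i k.y}.

    I:       list of frequency vectors k (tuples of ints)
    nodes:   (M, d) array of sampling nodes y^(m) in [0, 1)^d
    samples: (M,) array of sample values
    """
    A = np.exp(2j * np.pi * nodes @ np.array(I).T)  # (M, |I|) Fourier matrix
    c, *_ = np.linalg.lstsq(A, samples, rcond=None)
    return c

# toy check: recover the coefficients of a sparse trigonometric polynomial
rng = np.random.default_rng(0)
I = [(0, 0), (1, 0), (0, 2)]
c_true = np.array([1.0, 0.5 - 0.25j, -0.3])
Y = rng.random((50, 2))                       # random nodes in [0, 1)^2
u = np.exp(2j * np.pi * Y @ np.array(I).T) @ c_true
c_hat = project_onto_span(I, Y, u)
```

With exact data in the span and far more nodes than frequencies, the fit recovers the coefficients up to floating point accuracy.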

We will call this modified version of the sFFT the uniform sFFT, or usFFT for short, where uniform is meant w.r.t. the discrete set of points $${{\mathcal {T}}}_G$$. The main differences to the sFFT algorithms from [25, 32, 39] are the loops over g and the corresponding unions of the frequency sets. Possible choices for Algorithm $${\text {A}}$$ and the approximation approach used in step 3 of Algorithm 1 are a random rank-1 lattice approach and a multiple rank-1 lattice approach, respectively, which actually leads to Theorem 1.1, cf. Appendix A and Appendix B.

### 3.2 Periodization

The usFFT allows us to reconstruct a frequency set $$I$$ and approximations $$c_{{\varvec{k}}}^{\texttt {usFFT}}(u_{{\varvec{x}}_g})$$ of the corresponding Fourier coefficients $$c_{{\varvec{k}}}(u_{{\varvec{x}}_g})$$ for each $${\varvec{x}}_g \in {{\mathcal {T}}}_G$$. Unfortunately, this approach requires the function $$u({\varvec{x}},{\varvec{y}})$$ to be 1-periodic w.r.t. each of the d stochastic components of $${\varvec{y}}$$.

Since the right-hand side $$f({\varvec{x}})$$ does not depend on $${\varvec{y}}$$ in our considerations, the random coefficient a is the only given function involving the random variable $${\varvec{y}}$$ in the problem (1.1). In periodic models, we use the random coefficient (1.2) with 1-periodic functions $$\Theta _j({\varvec{y}})$$. Hence, the random coefficient $$a({\varvec{x}},{\varvec{y}})$$ is 1-periodic and thus the solution $$u({\varvec{x}},{\varvec{y}})$$ is also 1-periodic w.r.t. each component of $${\varvec{y}}$$. Therefore, we can apply the usFFT directly for this model without any further considerations.

In order to apply the usFFT when using the affine and lognormal models, we need to apply a suitable periodization first, since the random coefficient a and therefore the solution u are not periodic in general. Note that we assume the random variable to be uniformly distributed in the affine case, i.e., $${\varvec{y}}\sim {\mathcal {U}}([\alpha ,\beta ]^{d})$$, and standard normally distributed in the lognormal case, i.e., $${\varvec{y}}\sim {\mathcal {N}}(0,1)^{d}$$ and recall $${\mathbb {T}}\simeq [0,1)$$.

#### 3.2.1 Affine case

We consider the function, 1-periodic in $$\tilde{{\varvec{y}}}$$,

\begin{aligned}&\tilde{u}: D_{{\varvec{x}}} \times {\mathbb {T}}^{d} \longrightarrow {\mathbb {R}}\\&\tilde{u}({\varvec{x}},\tilde{{\varvec{y}}}) :=u({\varvec{x}},\varphi (\tilde{{\varvec{y}}})), \end{aligned}

with $$\varphi$$ being some suitable transformation function, i.e.,

\begin{aligned} \varphi : {\mathbb {T}}^{d} \longrightarrow D_{{\varvec{y}}} = [\alpha ,\beta ]^{d}. \end{aligned}

With this approach, the usFFT is able to compute approximations of the functions $${\tilde{u}_{{\varvec{x}}_g} :=\tilde{u}({\varvec{x}}_g,\cdot )}$$ for each $${\varvec{x}}_g \in {{\mathcal {T}}}_G$$. We want $$\varphi$$ to act component-wise on the random variable, i.e., $$\varphi (\tilde{{\varvec{y}}}) :=\left( \varphi _j(\tilde{y}_j) \right) _{j=1}^{d}$$. Further, we assume that these mappings $$\varphi _j$$ fulfill the following assumptions:

1. (A1)

Each $$\varphi _j$$ is continuous, i.e., $$\varphi _j \in C({\mathbb {T}})$$ for each $$j=1,\ldots ,d$$.

2. (A2)

It holds $$\varphi _j(0) = \varphi _j(1) = \alpha$$ and $$\varphi _j(\frac{1}{2}) = \beta$$ for each $$j=1,\ldots ,d$$.

3. (A3)

Each $$\varphi _j$$ is symmetric, i.e., $$\varphi _j(\frac{1}{2}-\tilde{y}) = \varphi _j(\frac{1}{2}+\tilde{y})$$ for $$\tilde{y} \in [0, \frac{1}{2}]$$ and for each $$j=1,\ldots ,d$$.

4. (A4)

Each $$\varphi _j$$ is strictly monotonically increasing on $$[0,\frac{1}{2}]$$ for each $$j=1,\ldots ,d$$.

With these restrictions we ensure that $$\varphi$$ restricted to $$[0,\frac{1}{2}]^{d}$$ is bijective. Hence, we can define the inverse mapping $$\varphi ^{-1}: [\alpha ,\beta ]^{d} \rightarrow [0,\frac{1}{2}]^{d}$$.

With this inverse mapping, we are now able to compute approximations of the functions $$u_{{\varvec{x}}_g}({\varvec{y}})$$ via

\begin{aligned} u_{{\varvec{x}}_g}^{\texttt {usFFT}}({\varvec{y}}) :=\tilde{u}_{{\varvec{x}}_g}^{\texttt {usFFT}}(\varphi ^{-1}({\varvec{y}})) = \sum _{{\varvec{k}}\in I} c_{{\varvec{k}}}^{\texttt {usFFT}}(\tilde{u}_{{\varvec{x}}_g})\,\text {e}^{2\pi \text {i}{\varvec{k}}\cdot \varphi ^{-1}({\varvec{y}})}, \end{aligned}
(3.1)

with the finite index set $$I$$ and the approximated Fourier coefficients $$c_{{\varvec{k}}}^{\texttt {usFFT}}(\tilde{u}_{{\varvec{x}}_g})$$ from the usFFT applied to the functions $$\tilde{u}_{{\varvec{x}}_g}$$, $${\varvec{x}}_g\in {\mathcal {T}}_G$$.

In this work, we always consider the tent transformation, cf. [12, 33, 43], for each $$\varphi _j$$, i.e.,

\begin{aligned}&\varphi _j: {\mathbb {T}}\longrightarrow [\alpha ,\beta ],&\varphi _j(\tilde{y}) = \beta - \left| (\beta -\alpha )\left( 1 - 2\tilde{y}\right) \right| , \end{aligned}
(3.2a)
\begin{aligned}&\varphi _j^{-1}: [\alpha ,\beta ] \longrightarrow \left[ 0,\frac{1}{2}\right] ,&\varphi _j^{-1}(y) = \frac{y-\alpha }{2(\beta -\alpha )}. \end{aligned}
(3.2b)

Although this transformation mapping fulfills the assumptions (A1)–(A4), it might not be the most favorable choice in specific applications due to its lack of smoothness. Smoother periodizations, e.g., [7, Sect. 2.2.2], might yield better approximation results in specific situations due to the faster decay of the Fourier coefficients of $$\tilde{u}$$. On the other hand, the linear structure of the tent transformation on the interval $$[0,\frac{1}{2}]$$ allows some simplifications later on.
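The tent transformation (3.2) and its inverse are straightforward to state in code; the following sketch checks the endpoint, symmetry, and round-trip properties behind (A1)–(A4):

```python
# Tent transformation (3.2) and its inverse for the affine case;
# alpha and beta are the endpoints of the parameter domain [alpha, beta].

def tent(y_tilde, alpha, beta):
    """phi_j: T -> [alpha, beta], cf. (3.2a)."""
    return beta - abs((beta - alpha) * (1.0 - 2.0 * (y_tilde % 1.0)))

def tent_inv(y, alpha, beta):
    """phi_j^{-1}: [alpha, beta] -> [0, 1/2], cf. (3.2b)."""
    return (y - alpha) / (2.0 * (beta - alpha))

# round trip on D_y = [-1, 1]: tent(tent_inv(y)) recovers y
y_back = tent(tent_inv(0.3, -1.0, 1.0), -1.0, 1.0)
```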

#### 3.2.2 Lognormal case

As in the affine case, we need a suitable periodic transformation mapping $${\varphi : {\mathbb {T}}^{d} \rightarrow D_{{\varvec{y}}} = {\mathbb {R}}^{d}}$$ to obtain a periodization $$\tilde{u}({\varvec{x}},\tilde{{\varvec{y}}})$$. Again, we choose the same function in each stochastic dimension, i.e., the same $$\varphi _j$$ for all $$j=1,\ldots ,d$$, but this time $$\varphi _j$$ consists of two separate steps. First, we consider the transformation

\begin{aligned}&\tau _1: \left( -\frac{1}{2},\frac{1}{2}\right) \longrightarrow {\mathbb {R}},&\tau _1(\breve{y}) :=\sqrt{2} \, \text {erf}^{-1}(2\breve{y}),\\&\tau _1^{-1}: {\mathbb {R}}\longrightarrow \left( -\frac{1}{2},\frac{1}{2}\right) ,&\tau _1^{-1}(y)= \frac{1}{2} \, \text {erf}\left( \frac{y}{\sqrt{2}}\right) , \end{aligned}

with the error function

\begin{aligned} \text {erf}(y) :=\frac{1}{\sqrt{\pi }} \int \limits _{-y}^{y} \text {e}^{-t^2} \,\mathrm {d}t, \quad y \in {\mathbb {R}}. \end{aligned}

For further information on this transformation, see [35]. The mapping $$\tau _1$$ is a natural choice for random variables $$y_j \sim {\mathcal {N}}(0,1)$$, since the error function $$\text {erf}$$ is closely related to their cumulative distribution function $$\Phi$$. In detail, it holds

\begin{aligned} \Phi (y) = \frac{1}{2} \left( 1+ \text {erf} \left( \frac{y}{\sqrt{2}} \right) \right) . \end{aligned}

The so-called inversion method in stochastic simulation uses the fact that the cumulative distribution function $$\Phi$$ and its inverse $$\Phi ^{-1}$$ map random variables distributed according to $$\Phi$$ to uniformly distributed random variables on [0, 1] and vice versa, cf. [13, Sect. II.2]. Thus, our transformation $$\tau _1$$ maps uniformly distributed random variables $$\breve{y} \sim {\mathcal {U}}(-\frac{1}{2},\frac{1}{2})$$ to normally distributed random variables $$y \sim {\mathcal {N}}(0,1)$$ and is therefore the natural generalization when moving on from uniformly distributed random variables.
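Since $$\sqrt{2}\,\text {erf}^{-1}(2\breve{y}) = \Phi ^{-1}(\breve{y}+\frac{1}{2})$$, the pair $$\tau _1, \tau _1^{-1}$$ can be sketched with the standard normal distribution from the Python standard library; `statistics.NormalDist` is our illustrative choice here, not part of the paper's implementation:

```python
from statistics import NormalDist

_std = NormalDist()  # standard normal, mean 0, sigma 1

def tau1(y_breve):
    """tau_1: (-1/2, 1/2) -> R, equal to sqrt(2) * erfinv(2 * y_breve)."""
    return _std.inv_cdf(y_breve + 0.5)   # Phi^{-1}(y_breve + 1/2)

def tau1_inv(y):
    """tau_1^{-1}: R -> (-1/2, 1/2), equal to erf(y / sqrt(2)) / 2."""
    return _std.cdf(y) - 0.5             # Phi(y) - 1/2
```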

The second part is a suitable periodization $$\tau _2: {\mathbb {T}}\rightarrow (-\frac{1}{2},\frac{1}{2})$$. We choose a similar approach as in the affine case and use a shifted tent transformation

\begin{aligned}&\tau _{2,\Delta }: {\mathbb {T}}\longrightarrow \left[ -\frac{1}{2},\frac{1}{2}\right] ,&\tau _{2,\Delta }(\tilde{y}) = {\left\{ \begin{array}{ll} -\frac{1}{2} - 2(\tilde{y}-\Delta ) &{} 0 \le \tilde{y}< \Delta \\ -\frac{1}{2} + 2(\tilde{y}-\Delta ) &{} \Delta \le \tilde{y}< \frac{1}{2}+\Delta \\ +\frac{3}{2} - 2(\tilde{y}-\Delta ) &{} \frac{1}{2}+\Delta \le \tilde{y} < 1 \end{array}\right. } \\&\tau _{2,\Delta }^{-1}: \left[ -\frac{1}{2},\frac{1}{2}\right] \longrightarrow \left[ \Delta ,\frac{1}{2}+\Delta \right] ,&\tau _{2,\Delta }^{-1}(\breve{y}) = \frac{\breve{y}}{2} + \Delta + \frac{1}{4} \end{aligned}

with shift $$\Delta > 0$$. We need this shift, since we cannot apply the transformation $$\tau _1$$ where $$\tau _{2,\Delta }(\tilde{y})=\pm \frac{1}{2}$$ due to the poles there. Shifting with a suitable $$\Delta$$ ensures that the deterministic part of the sampling set $${\mathcal {X}}$$ does not contain components equal to $$\Delta$$ or $$\frac{1}{2} + \Delta$$. The randomly chosen values from the interval [0, 1] for the other components are almost surely not equal to $$\Delta$$ or $$\frac{1}{2} + \Delta$$ either. Hence, the sampling set $${\mathcal {X}}$$ in Algorithm 1 almost surely does not contain any node with a component equal to $$\Delta$$ or $$\frac{1}{2}+\Delta$$.

Now we define the transformation mappings $$\varphi _{j,\Delta }$$ for each $$j=1,\ldots ,d$$ for the lognormal case as

\begin{aligned}&\varphi _{j,\Delta }: {\mathbb {T}}\setminus \left\{ \Delta ,\frac{1}{2}+\Delta \right\} \longrightarrow {\mathbb {R}},&\varphi _{j,\Delta }(\tilde{y}) = (\tau _1\circ \tau _{2,\Delta })(\tilde{y}), \end{aligned}
(3.3a)
\begin{aligned}&\varphi _{j,\Delta }^{-1}: {\mathbb {R}}\longrightarrow \left( \Delta ,\frac{1}{2}+\Delta \right) ,&\varphi _{j,\Delta }^{-1}(y) = (\tau _{2,\Delta }^{-1} \circ \tau _1^{-1})(y). \end{aligned}
(3.3b)

The mapping $$\varphi _{j,\Delta }$$ as well as its two parts $$\tau _1$$ and $$\tau _{2,\Delta }$$ are visualized in Fig. 1. These mappings fulfill slightly modified versions of the assumptions (A1)–(A4), taking into account the shift $$\Delta$$. Now we can use the transformation $$\varphi _\Delta :=(\varphi _{j,\Delta })_{j=1}^{d}$$ to obtain the signals $$\tilde{u}({\varvec{x}}_g,\tilde{{\varvec{y}}}) = u({\varvec{x}}_g,\varphi _\Delta (\tilde{{\varvec{y}}}))$$, $${\varvec{x}}_g\in {\mathcal {T}}_G$$, which are periodic in $$\tilde{{\varvec{y}}}$$ and can be approximated using our usFFT, cf. Algorithm 1. Plugging the inverse mapping $$\varphi _\Delta ^{-1}$$ into the evaluation formula, which is analogous to (3.1), we are now able to compute approximations of the solution functions $$u_{{\varvec{x}}_g}({\varvec{y}})$$ in the lognormal case as well. Again, the periodization $$\varphi _{\Delta }$$ is not smooth and therefore might yield non-optimal approximation results. In particular, the periodization mappings $$\varphi _{j,\Delta }$$ possess two poles instead of two kinks, which is a considerably worse smoothness behavior than in the affine case.
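The composition (3.3) can be sketched as follows; as above, expressing $$\tau _1$$ through `statistics.NormalDist` is our own illustrative assumption, and the poles at $$\tilde{y} = \Delta$$ and $$\tilde{y} = \frac{1}{2}+\Delta$$ must be avoided:

```python
from statistics import NormalDist

_std = NormalDist()  # standard normal

def tau2(y_tilde, delta):
    """Shifted tent tau_{2,Delta}: T -> [-1/2, 1/2]."""
    t = y_tilde % 1.0
    if t < delta:
        return -0.5 - 2.0 * (t - delta)
    if t < 0.5 + delta:
        return -0.5 + 2.0 * (t - delta)
    return 1.5 - 2.0 * (t - delta)

def phi(y_tilde, delta):
    """phi_{j,Delta} = tau_1 o tau_{2,Delta}, cf. (3.3a); poles where tau2 = +-1/2."""
    return _std.inv_cdf(tau2(y_tilde, delta) + 0.5)  # tau_1 via Phi^{-1}

def phi_inv(y, delta):
    """phi_{j,Delta}^{-1}: R -> (Delta, 1/2 + Delta), cf. (3.3b)."""
    return (_std.cdf(y) - 0.5) / 2.0 + delta + 0.25  # tau_{2,Delta}^{-1} o tau_1^{-1}

# round trip with Delta = 0.05: phi_inv(phi(0.2, 0.05), 0.05) recovers 0.2
# up to floating point accuracy, since 0.2 lies in (Delta, 1/2 + Delta)
```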

We now ask for the optimal choice of the parameter $$\Delta$$, such that the deterministic components of the sampling nodes $$\tilde{y}$$ of the R1Ls in the usFFT are as far as possible from $$\Delta$$ and $$\Delta +1/2$$ to reduce problems at the poles of the transformation mapping $$\varphi _\Delta$$. Let

\begin{aligned}&\tilde{y}_{i,j} :=\frac{i}{M} z_j \mod 1,&i=0,\ldots ,M-1 \;\text { and }\; j=1,\ldots ,d \end{aligned}
(3.4)

denote the jth component of the ith node of the d-dimensional R1L of size M. Then, we are looking for $$\Delta$$ such that the minimum of the two distances

\begin{aligned} \min _{\begin{array}{c} i=0,\ldots ,M-1\\ j=1,\ldots ,d \end{array}} \left|\tilde{y}_{i,j} - \Delta \right|\qquad \text { and } \qquad \min _{\begin{array}{c} i=0,\ldots ,M-1\\ j=1,\ldots ,d \end{array}} \left|\tilde{y}_{i,j} - \left( \Delta + \frac{1}{2} \right) \right|\end{aligned}

is maximal.

### Lemma 3.1

Let $$\Lambda ({\varvec{z}},M)$$ be a d-dimensional R1L with prime lattice size $$M\in {\mathbb {N}}$$, $$M>2$$, generating vector $${\varvec{z}}\in {\mathbb {Z}}^d$$ with $$z_{j_0} \not \equiv 0 \pmod {M}$$ for at least one $$j_0\in \{1,\ldots ,d\}$$, and the lattice nodes $$\tilde{y}_{i,j}$$ as defined in (3.4). Then, we have

\begin{aligned} \quad&\Delta _{\text {opt}} :={{\,\mathrm{arg\,max}\,}}_{0< \Delta < \frac{1}{2M}} \left\{ \min \left\{ \min _{\begin{array}{c} i=0,\ldots ,M-1\\ j=1,\ldots ,d \end{array}} \left|\tilde{y}_{i,j} - \Delta \right|, \min _{\begin{array}{c} i=0,\ldots ,M-1\\ j=1,\ldots ,d \end{array}} \left|\tilde{y}_{i,j} - \left( \Delta + \frac{1}{2} \right) \right|\right\} \right\} \\&\quad = \frac{1}{4M}. \end{aligned}

The proof of Lemma 3.1 is given in Appendix C.
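Independently of the proof, the assertion of Lemma 3.1 is easy to check by a brute-force search over $$\Delta$$; the lattice size $$M=7$$ and generating vector $${\varvec{z}}=(1,3,5)$$ below are arbitrary illustrative choices:

```python
# Brute-force verification of Lemma 3.1 for one small example lattice.

def min_dist(delta, nodes):
    """min over all lattice components of |y - delta| and |y - (delta + 1/2)|."""
    d1 = min(abs(y - delta) for y in nodes)
    d2 = min(abs(y - (delta + 0.5)) for y in nodes)
    return min(d1, d2)

M, z = 7, (1, 3, 5)   # prime lattice size, generating vector (components coprime to M)
nodes = sorted({(i * zj / M) % 1.0 for i in range(M) for zj in z})

# search Delta on a fine grid inside (0, 1/(2M)); the maximizer should be 1/(4M)
grid = [k / (4000 * M) for k in range(1, 2000)]
best = max(grid, key=lambda delta: min_dist(delta, nodes))
```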

## 4 Numerics

We will now test the usFFT on different two-dimensional numerical examples. In particular, we consider the parametric PDE (1.1) with zero boundary condition and different random coefficients $$a({\varvec{x}},{\varvec{y}})$$ and right-hand sides $$f({\varvec{x}})$$.

Since our algorithm yields an approximation $$u_{{\varvec{x}}_g}^{\texttt {usFFT}}({\varvec{y}})$$ for each $${\varvec{x}}_g \in {{\mathcal {T}}}_G$$ separately, we also compute the approximation error

\begin{aligned} \text {err}_p^{\eta }({\varvec{x}}_g)&:=\left( \frac{1}{n_{\text {test}}} \sum _{j=1}^{n_{\text {test}}} \left|\check{u}\left( {\varvec{x}}_g,{\varvec{y}}^{(j)}\right) - u^{\texttt {usFFT}}\left( {\varvec{x}}_g,{\varvec{y}}^{(j)}\right) \right|^p \right) ^{\frac{1}{p}} \end{aligned}
(4.1)

and

\begin{aligned} \text {err}_\infty ^{\eta }({\varvec{x}}_g)&:=\max _{j=1,\ldots ,n_{\text {test}}} \left|\check{u}\left( {\varvec{x}}_g,{\varvec{y}}^{(j)}\right) - u^{\texttt {usFFT}}\left( {\varvec{x}}_g,{\varvec{y}}^{(j)}\right) \right|\end{aligned}
(4.2)

for each $${\varvec{x}}_g \in {{\mathcal {T}}}_G$$ separately, using $$n_{\text {test}}=10^5$$ randomly drawn test parameters $${\varvec{y}}^{(j)}$$ from the underlying probability distribution. Here, $$\check{u}(\cdot ,{\varvec{y}}^{(j)})$$ are the finite element solutions of the PDE for fixed parameters $${\varvec{y}}^{(j)}$$ and $$u^{\texttt {usFFT}}({\varvec{x}}_g,\cdot )$$ are our approximations from the usFFT. The parameter $$\eta$$ encodes the sFFT parameters N, s and $$s_{\text {local}}$$ used, as given in Table 1.
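The error norms (4.1) and (4.2) amount to a few lines of code; the sketch below assumes the reference values and the usFFT values at one node $${\varvec{x}}_g$$ are given as plain sequences:

```python
# Empirical errors (4.1) and (4.2) at a single spatial node x_g, given the
# finite element reference values and the usFFT approximation values at the
# n_test randomly drawn test parameters y^(j).

def err_p(u_ref, u_approx, p):
    """Discrete error of (4.1): (mean of |difference|^p)^(1/p)."""
    n = len(u_ref)
    return (sum(abs(a - b) ** p for a, b in zip(u_ref, u_approx)) / n) ** (1.0 / p)

def err_max(u_ref, u_approx):
    """Maximum pointwise error, cf. (4.2)."""
    return max(abs(a - b) for a, b in zip(u_ref, u_approx))
```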

Here, N is the extension of the full grid $$[-N,N]^{d}$$ that is used as the search space $${\Gamma \subset {\mathbb {Z}}^{d}}$$. Note that we also choose $$s_{\text {local}} = s$$. If we miss an important frequency component at one point $${\varvec{x}}_g$$, it is very likely that it is contained in the detected index set of a neighboring mesh point. Therefore, the union over all points $${\varvec{x}}_g$$ should suffice to avoid losing frequencies, and we do not need a larger $$s_{\text {local}}$$. We choose the threshold $$\theta = 1 \cdot 10^{-12}$$ and the number of detection iterations $$r=5$$ for all parameter settings $$\eta$$, which are common values and the same as in [32]. In particular, we choose the number of detection iterations r as well as the probabilities $$\gamma _{{\text {A}}}$$ and $$\gamma _{{\text {B}}}$$ as in the case $$G=1$$, since we expect a huge overlap of the detected index sets and hence a small failure probability even for these parameter choices instead of the theoretical choices given in the proof of Theorem 1.1.

Further, we always use the random R1L approach in the role of Algorithm $${\text {A}}$$ to recover the projected Fourier coefficients in the dimension-incremental method, cf. Sect. 2.2 and [32]. We also tested the algorithm using the single and multiple R1L approaches mentioned in Sect. 2.2, but these did not achieve significantly smaller approximation errors, while using larger numbers of samples and therefore resulting in longer runtimes of the algorithm. Hence, it seems reasonable to stick with the random R1L approach here. We choose the target maximum edge length $$h_{\max } = 0.075$$ of the finite element mesh in the FE solver. All examples consider the spatial domain $$D_{{\varvec{x}}}=[0,1]^2$$, resulting in a finite element mesh $${{\mathcal {T}}}_G \subset D_{{\varvec{x}}}$$ with $$G=737$$ inner and 104 boundary nodes.

Further, we will also analyze the importance of and the interactions between the detected Fourier coefficients. To this end, we use the classical ANOVA decomposition of 1-periodic functions as given in [38] or [29, 44]. Note that, for instance, in [38] the ANOVA decomposition is already used within the proposed methods to obtain an adaptive selection of the most important approximation terms, which we realize in our method simply by comparing the size of the projected Fourier coefficients, cf. Sects. 2.2 and 3.1.

In particular, we consider the variance of our approximation

\begin{aligned} \sigma ^2(\tilde{u}_{{\varvec{x}}_g}^{\texttt {usFFT}}) :=\Vert \tilde{u}_{{\varvec{x}}_g}^{\texttt {usFFT}} \Vert _{L_2({\mathbb {T}}^{d})}^2 - |c_{{\varvec{0}}}^{\texttt {usFFT}}(\tilde{u}_{{\varvec{x}}_g}) |^2 = \sum _{{\varvec{k}}\in I\setminus \lbrace {\varvec{0}}\rbrace } |c_{{\varvec{k}}}^{\texttt {usFFT}}(\tilde{u}_{{\varvec{x}}_g}) |^2. \end{aligned}

Now we can study different subsets $$\mathrm {J}\subset I$$ and estimate the variance of the approximation using only these subsets. The fraction of the variance that is explained by such a subset $$\mathrm {J}$$ is then called the global sensitivity index (GSI), see [41, 42],

\begin{aligned} \varrho (\mathrm {J},\tilde{u}_{{\varvec{x}}_g}^{\texttt {usFFT}}) :=\frac{\sigma ^2(\tilde{u}_{{\varvec{x}}_g, \mathrm {J}}^{\texttt {usFFT}})}{\sigma ^2(\tilde{u}_{{\varvec{x}}_g}^{\texttt {usFFT}})} = \frac{\sum _{{\varvec{k}}\in \mathrm {J}\setminus \lbrace {\varvec{0}}\rbrace } |c_{{\varvec{k}}}^{\texttt {usFFT}}(\tilde{u}_{{\varvec{x}}_g}) |^2}{\sum _{{\varvec{k}}\in I\setminus \lbrace {\varvec{0}}\rbrace } |c_{{\varvec{k}}}^{\texttt {usFFT}}(\tilde{u}_{{\varvec{x}}_g}) |^2} \; \in \, [0,1], \end{aligned}
(4.3)

where we define $$\tilde{u}_{{\varvec{x}}_g, \mathrm {J}}^{\texttt {usFFT}}({\varvec{y}}):=\sum _{{\varvec{k}}\in \mathrm {J}}c_{{\varvec{k}}}^{\texttt {usFFT}}(\tilde{u}_{{\varvec{x}}_g})\,\text {e}^{2\pi \text {i}{\varvec{k}}\cdot \varphi ^{-1}({\varvec{y}})}$$. In our examples, we will mainly consider the subsets $$\mathrm {J}_{\ell }$$ of all frequencies $${\varvec{k}}$$ with exactly $$\ell$$ non-zero components, i.e.,

\begin{aligned} \mathrm {J}_{\ell } :=\left\{ {\varvec{k}}\in I: \Vert {\varvec{k}}\Vert _0 :=|\lbrace i \in \lbrace 1,\ldots ,d \rbrace : k_i \not = 0\rbrace |= \ell \right\} , \end{aligned}
(4.4)

but of course several other choices of the subsets $$\mathrm {J}$$ might be interesting as well for different applications.
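Given the approximated Fourier coefficients, the variance, the GSIs (4.3), and the subsets (4.4) can be computed directly; the dict keyed by frequency tuples below is an illustrative representation, not the paper's data structure:

```python
# Variance and global sensitivity indices (4.3) from the approximated Fourier
# coefficients of one node x_g; `coeffs` maps frequency vectors k to c_k.

def variance(coeffs):
    """sigma^2: sum of |c_k|^2 over all non-zero frequencies."""
    zero = (0,) * len(next(iter(coeffs)))
    return sum(abs(c) ** 2 for k, c in coeffs.items() if k != zero)

def gsi(J, coeffs):
    """rho(J, u): fraction of the variance explained by the subset J, cf. (4.3)."""
    zero = (0,) * len(next(iter(coeffs)))
    return sum(abs(coeffs[k]) ** 2 for k in J if k != zero) / variance(coeffs)

def J_ell(coeffs, ell):
    """All detected frequencies with exactly ell non-zero components, cf. (4.4)."""
    return {k for k in coeffs if sum(ki != 0 for ki in k) == ell}
```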

Finally, one can also think about evaluating various quantities of interest of the approximation. Here, we will consider the expectation value $${\mathbb {E}}(u_{{\varvec{x}}_g}^{\texttt {usFFT}})$$ as one example of such quantities. We use a Monte-Carlo approximation of the expectation value

\begin{aligned} \overline{\check{u}_{{\varvec{x}}_g}} :=\frac{1}{n_{\text {MC}}} \sum _{j=1}^{n_{\text {MC}}} \check{u}\left( {\varvec{x}}_g,{\varvec{y}}^{(j)}\right) \end{aligned}

of the finite element approximation using $${n_{\text {MC}}}$$ random samples for comparison.

### 4.1 Expectation value of the approximation

Computing the expectation value of our approximation $$u_{{\varvec{x}}_g}^{\texttt {usFFT}}$$ requires some additional effort, depending on the particular model and the periodization method used, if any. By definition, the expectation value is given by

\begin{aligned} {\mathbb {E}}(u_{{\varvec{x}}_g}^{\texttt {usFFT}}) :=\int \limits _{D_{{\varvec{y}}}} u_{{\varvec{x}}_g}^{\texttt {usFFT}}({\varvec{y}}) \,\mathrm {d}\mu ({\varvec{y}}) = \int \limits _{D_{{\varvec{y}}}} u_{{\varvec{x}}_g}^{\texttt {usFFT}}({\varvec{y}}) \,p({\varvec{y}}) \,\mathrm {d}{\varvec{y}}, \end{aligned}

where p is the probability density function of the random variable $${\varvec{y}}$$.

For the periodic model, we do not need any periodization. Therefore, the approximation of the solution reads as

\begin{aligned} u_{{\varvec{x}}_g}^{\texttt {usFFT}}({\varvec{y}}) = \sum _{{\varvec{k}}\in I} c_{{\varvec{k}}}^{\texttt {usFFT}}(u_{{\varvec{x}}_g})\,\text {e}^{2\pi \text {i}{\varvec{k}}\cdot {\varvec{y}}} \end{aligned}

with $$I$$ the frequency set and $$c_{{\varvec{k}}}^{\texttt {usFFT}}(u_{{\varvec{x}}_g})$$ the corresponding approximated Fourier coefficients computed by the usFFT. The random variable $${\varvec{y}}$$ is assumed to be uniformly distributed in $$D_{{\varvec{y}}} = [-\frac{1}{2},\frac{1}{2}]^{d}$$ in this case. Hence, we have

\begin{aligned} {\mathbb {E}}(u_{{\varvec{x}}_g}^{\texttt {usFFT}})&= \int \limits _{D_{{\varvec{y}}}} u_{{\varvec{x}}_g}^{\texttt {usFFT}}({\varvec{y}}) \,p({\varvec{y}}) \,\mathrm {d}{\varvec{y}}\\&= \int \limits _{[-\frac{1}{2},\frac{1}{2}]^{d}} 1^{-d} \, \sum _{{\varvec{k}}\in I} c_{{\varvec{k}}}^{\texttt {usFFT}}(u_{{\varvec{x}}_g})\,\text {e}^{2\pi \text {i}{\varvec{k}}\cdot {\varvec{y}}} \,\mathrm {d}{\varvec{y}}\\&= \sum _{{\varvec{k}}\in I} c_{{\varvec{k}}}^{\texttt {usFFT}}(u_{{\varvec{x}}_g}) \int \limits _{[-\frac{1}{2},\frac{1}{2}]^{d}} \text {e}^{2\pi \text {i}{\varvec{k}}\cdot {\varvec{y}}} \,\mathrm {d}{\varvec{y}}\\&= \sum _{{\varvec{k}}\in I} c_{{\varvec{k}}}^{\texttt {usFFT}}(u_{{\varvec{x}}_g}) \, \delta _{{\varvec{k}}} = c_{{\varvec{0}}}^{\texttt {usFFT}}(u_{{\varvec{x}}_g}). \end{aligned}

In the affine case, we use the tent transformation (3.2), such that our approximation reads as

\begin{aligned} u_{{\varvec{x}}_g}^{\texttt {usFFT}}({\varvec{y}}) = \sum _{{\varvec{k}}\in I} c_{{\varvec{k}}}^{\texttt {usFFT}}(\tilde{u}_{{\varvec{x}}_g})\,\text {e}^{\pi \text {i}{\varvec{k}}\cdot \frac{{\varvec{y}}- \alpha {\varvec{1}}}{\beta -\alpha }} \end{aligned}

with $${\varvec{1}}= (1,1,\ldots ,1) \in {\mathbb {R}}^{d}$$. Again, the random variable $${\varvec{y}}$$ is assumed to be uniformly distributed, but for this computation we work with the more general domain $$D_{{\varvec{y}}} = [\alpha ,\beta ]^{d}$$. Therefore, we have

\begin{aligned} {\mathbb {E}}(u_{{\varvec{x}}_g}^{\texttt {usFFT}})&= \int \limits _{D_{{\varvec{y}}}} u_{{\varvec{x}}_g}^{\texttt {usFFT}}({\varvec{y}}) \,p({\varvec{y}}) \,\mathrm {d}{\varvec{y}}\nonumber \\&= \int \limits _{[\alpha ,\beta ]^{d}} (\beta -\alpha )^{-d} \sum _{{\varvec{k}}\in I} c_{{\varvec{k}}}^{\texttt {usFFT}}(\tilde{u}_{{\varvec{x}}_g})\,\text {e}^{\pi \text {i}{\varvec{k}}\cdot \frac{{\varvec{y}}- \alpha {\varvec{1}}}{\beta -\alpha }} \,\mathrm {d}{\varvec{y}}\nonumber \\&= \sum _{{\varvec{k}}\in I} c_{{\varvec{k}}}^{\texttt {usFFT}}(\tilde{u}_{{\varvec{x}}_g}) (\beta -\alpha )^{-d} \int \limits _{[\alpha ,\beta ]^{d}} \text {e}^{\pi \text {i}{\varvec{k}}\cdot \frac{{\varvec{y}}- \alpha {\varvec{1}}}{\beta -\alpha }} \,\mathrm {d}{\varvec{y}}\nonumber \\&= \sum _{{\varvec{k}}\in I} c_{{\varvec{k}}}^{\texttt {usFFT}}(\tilde{u}_{{\varvec{x}}_g}) \, D_{{\varvec{k}}} \end{aligned}
(4.5)

with

\begin{aligned} D_{{\varvec{k}}} :=\prod _{j=1}^{d} D_{k_j} \qquad \text {and} \qquad D_{k_j} :={\left\{ \begin{array}{ll} \frac{2\text {i}}{\pi k_j} &{} k_j \equiv 1 \mod 2\\ 1 &{} k_j = 0 \\ 0 &{} \text {else}. \end{array}\right. } \end{aligned}

Note that the parameters $$\alpha$$ and $$\beta$$ vanish completely. Thus, the formula is independent of the particular domain $$D_{{\varvec{y}}} = [\alpha ,\beta ]^{d}$$.
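The closed-form expectation (4.5) can be sketched as follows; as before, the dict keyed by frequency tuples is an illustrative assumption:

```python
from math import pi

def D_k(k):
    """Product factor D_k from (4.5) for a frequency vector k."""
    out = 1.0 + 0.0j
    for kj in k:
        if kj == 0:
            continue            # D_0 = 1
        if kj % 2 == 0:
            return 0.0          # any even non-zero component annihilates the product
        out *= 2j / (pi * kj)   # odd component: D_{k_j} = 2i / (pi k_j)
    return out

def expectation(coeffs):
    """E(u^usFFT) = sum_k c_k D_k; independent of alpha and beta."""
    return sum(c * D_k(k) for k, c in coeffs.items())
```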

Finally, the lognormal model involves the more complicated transformation mappings $$\varphi _{j,\Delta }$$ given in (3.3). Thus, the approximation reads as

\begin{aligned} u_{{\varvec{x}}_g}^{\texttt {usFFT}}({\varvec{y}}) = \sum _{{\varvec{k}}\in I} c_{{\varvec{k}}}^{\texttt {usFFT}}(\tilde{u}_{{\varvec{x}}_g})\,\text {e}^{2\pi \text {i}{\varvec{k}}\cdot \varphi _{\Delta }^{-1}({\varvec{y}})}. \end{aligned}

Here, the random variable $${\varvec{y}}$$ is standard normally distributed, i.e., $${\varvec{y}}\sim {\mathcal {N}}({\varvec{0}},{\varvec{I}})$$ with $${\varvec{I}}$$ the identity matrix of dimension d. Hence, the expectation value can be written as

\begin{aligned} {\mathbb {E}}(u_{{\varvec{x}}_g}^{\texttt {usFFT}})&= \int \limits _{D_{{\varvec{y}}}} u_{{\varvec{x}}_g}^{\texttt {usFFT}}({\varvec{y}}) \,p({\varvec{y}}) \,\mathrm {d}{\varvec{y}}\\&= \int _{{\mathbb {R}}^{d}} (2\pi )^{-\frac{d}{2}} \, \text {e}^{-\frac{1}{2} \Vert {\varvec{y}}\Vert ^2} \sum _{{\varvec{k}}\in I} c_{{\varvec{k}}}^{\texttt {usFFT}}(\tilde{u}_{{\varvec{x}}_g})\,\text {e}^{2\pi \text {i}{\varvec{k}}\cdot \varphi _{\Delta }^{-1}({\varvec{y}})} \,\mathrm {d}{\varvec{y}}\\&= \sum _{{\varvec{k}}\in I} c_{{\varvec{k}}}^{\texttt {usFFT}}(\tilde{u}_{{\varvec{x}}_g}) (2\pi )^{-\frac{d}{2}} \int \limits _{{\mathbb {R}}^{d}} \text {e}^{-\frac{1}{2} \Vert {\varvec{y}}\Vert ^2} \, \text {e}^{2\pi \text {i}{\varvec{k}}\cdot \varphi _{\Delta }^{-1}({\varvec{y}})} \,\mathrm {d}{\varvec{y}}\\&= \sum _{{\varvec{k}}\in I} c_{{\varvec{k}}}^{\texttt {usFFT}}(\tilde{u}_{{\varvec{x}}_g}) D_{{\varvec{k}},\Delta } \\ \end{aligned}

with

\begin{aligned} D_{{\varvec{k}},\Delta } :=\prod _{j=1}^{d} D_{k_j,\Delta } \qquad \text {and} \qquad D_{k_j,\Delta } :={\left\{ \begin{array}{ll} \frac{2 \text {i}}{\pi k_j} \text {e}^{2\pi \text {i}k_j \Delta }&{} k_j \equiv 1 \mod 2\\ 1 &{} k_j = 0 \\ 0 &{} \text {else}. \end{array}\right. } \end{aligned}

Note that the factors $$D_{k_j,\Delta }$$ are exactly the same as the $$D_{k_j}$$ in the affine case up to the correction term $$\text {e}^{2\pi \text {i}k_j \Delta }$$ due to the shift with $$\Delta$$.

### 4.2 Periodic example

We consider the example from [22, Sect. 6] using the domain $$D_{{\varvec{x}}} = (0,1)^2$$ with right-hand side $$f({\varvec{x}}) = x_2$$ and the random coefficient

\begin{aligned}&a({\varvec{x}},{\varvec{y}}) :=1 + \frac{1}{\sqrt{6}} \sum _{j=1}^{d} \sin (2\pi y_j) \,\psi _j({\varvec{x}}),&{\varvec{x}}\in D_{{\varvec{x}}}, \, {\varvec{y}}\in D_{{\varvec{y}}}, \end{aligned}

with the random variables $${\varvec{y}}\sim {\mathcal {U}}\left( [-\frac{1}{2},\frac{1}{2}]^{d}\right)$$ and

\begin{aligned}&\psi _j({\varvec{x}}) :=c j^{-\mu } \sin (j\pi x_1) \sin (j\pi x_2),&{\varvec{x}}\in D_{{\varvec{x}}},\, j\ge 1, \end{aligned}

where $$c>0$$ is a constant and $$\mu >1$$ is the decay rate. Accordingly, we get

\begin{aligned} a_{\min } = 1-\frac{c}{\sqrt{6}}\zeta (\mu ) \qquad \text {and} \qquad a_{\max } = 1+\frac{c}{\sqrt{6}}\zeta (\mu ), \end{aligned}

such that for $$c < \frac{\sqrt{6}}{\zeta (\mu )}$$ the uniform ellipticity assumption (2.1) is fulfilled.
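These bounds are easy to evaluate numerically; the partial-sum-plus-integral-tail approximation of $$\zeta (\mu )$$ below is our own illustrative choice:

```python
# Check a_min = 1 - c*zeta(mu)/sqrt(6) and a_max = 1 + c*zeta(mu)/sqrt(6)
# for the parameter choices used in Sect. 4.2.
from math import sqrt

def zeta(mu, n=100_000):
    """Approximate zeta(mu) by a partial sum plus the integral tail bound."""
    s = sum(j ** -mu for j in range(1, n + 1))
    return s + n ** (1.0 - mu) / (mu - 1.0)   # tail: integral_n^inf x^-mu dx

def a_bounds(mu, c):
    r = c * zeta(mu) / sqrt(6.0)
    return 1.0 - r, 1.0 + r
```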

We test the usFFT with the stochastic dimension $$d = 10$$ on the two parameter choices $$\mu = 1.2,\; c = 0.4$$ and $$\mu = 3.6,\; c = 1.5$$ from [22]. The first choice seems to model a more difficult PDE, since the decay of the functions $$\psi _j$$ w.r.t. j is very slow and we have $$a_{\min } = 0.08690$$ and $$a_{\max } = 1.91310$$. This range of a is wider and $$a_{\min }$$ is closer to zero than for the quickly decaying second parameter choice with $$a_{\min } = 0.31660$$ and $$a_{\max } = 1.68340$$.

Figure 2 illustrates the total approximation error $${\text {err}}_p^{\eta }({\varvec{x}}_g)$$ for $$p=1$$ and $$p=2$$ as well as the Monte–Carlo approximation of the expectation value $$\overline{\check{u}_{{\varvec{x}}_g}}$$ using $$n_{\text {MC}}=10^6$$ samples for comparison.

A more detailed insight into the decay of the error is given in Fig. 3. There, the largest approximation error $${\text {err}}_2^{\eta }$$ w.r.t. the nodes $${\varvec{x}}_g$$ is plotted against the number of samples used in the corresponding usFFT. Note that this number scales directly with the sparsity parameter s, while the extension parameter N has nearly no impact. Hence, the data points in Fig. 3 are ordered from left to right from $$s=100$$ to $$s=4000$$.

Finally, Fig. 4 shows the cardinality of the sets $$\mathrm {J}_\ell$$, i.e., the number of frequencies detected with exactly $$\ell$$ non-zero components as given in (4.4), as well as the corresponding global sensitivity indices $$\varrho (\mathrm {J}_\ell ,u_{{\varvec{x}}_g}^{\texttt {usFFT}})$$ given in (4.3). Note that these values also depend on the considered point $${\varvec{x}}_g$$. Therefore, the bars show the smallest and largest GSI among all nodes $${\varvec{x}}_g \in {{\mathcal {T}}}_G$$ as well as their median and mean value.

#### 4.2.1 Discussion

The absolute error $${\text {err}}_p^{\eta }$$ in Fig. 2 is very small compared to the function values of $$\overline{\check{u}_{{\varvec{x}}_g}}$$. Thus, our approximation $$u_{{\varvec{x}}_g}^{\texttt {usFFT}}$$ is already a very good approximation for these relatively small sparsity parameters s and extension parameters N.

The periodic setting results in very quickly decaying Fourier coefficients. Obviously, the same holds for their projections computed in the dimension-incremental steps. In particular, most of the one-dimensional projections in steps 1 and 2a of Algorithm 1, e.g., all projections with a component $$k_t$$ with $$|k_t |>4$$, are so small at the start of each iteration that they are neglected immediately. Hence, the one-dimensional index sets $$I^{(t)}$$ are independent of N (for large enough N), and so the choice of N has only a marginal impact. Note that we also tested our algorithm with smaller thresholds $$\theta$$, but the additionally detected (and not neglected) frequencies did not change the approximation significantly in the end.

We observe an approximately linear behavior in the doubly logarithmic Fig. 3. Additional tests showed that even for smaller sparsity parameters $$1\le s<100$$ the corresponding samples–error pair fits this model, i.e., there seems to be no pre-asymptotic behavior of our algorithm. In [22], the theoretical decay rates are often smaller than the error decay observed in numerical experiments. We also observe a relatively fast decay compared to these theoretical rates. On the other hand, the decay of the approximation error $${\text {err}}_2^{\eta }$$ for the faster decaying random coefficient a with $$\mu = 3.6$$ is not much better than the decay for the more complicated example with $$\mu = 1.2$$. It seems that our algorithm is capable of handling the more difficult problem very well, but does not yield much further advantage when applied to easier problems, i.e., with larger $$\mu$$, larger $$a_{\min }$$ and a smaller range of the interval $$[a_{\min }, a_{\max }]$$. Note that most of the samples are required for the detection of the frequency set $$I$$ and only a small fraction is actually used for the final computation of the corresponding Fourier coefficients, cf. Sect. 4.5 and Remark 4.2. We also computed the approximation error $$\text {err}_\infty ^{\eta }$$ for different $$\eta$$ for both parameter choices of $$\mu$$ and c. Obviously, these errors have to be larger than the shown errors $${\text {err}}_2^{\eta }$$, but the actual magnitude of $$\text {err}_\infty ^{\eta }$$ is only about 10 to 15 times as large as the errors $${\text {err}}_2^{\eta }$$. Hence, the pointwise approximation error seems to stay at a reasonable size for any randomly drawn $${\varvec{y}}$$.

As we saw in Sect. 4.1, the expectation value of our approximation $${\mathbb {E}}(u_{{\varvec{x}}_g}^{\texttt {usFFT}})$$ is simply its zeroth Fourier coefficient $$c_{{\varvec{0}}}^{\texttt {usFFT}}(u_{{\varvec{x}}_g})$$. Since this coefficient is included and computed for each sparsity parameter s anyway, it might seem at first sight that our different parameter choices do not influence the precision of its approximation. However, for larger sparsity parameters s we compute more Fourier coefficients in our algorithm, and possible aliasing effects should spread evenly among all of these coefficients, i.e., the particular so-called aliasing error on $$c_{{\varvec{0}}}^{\texttt {usFFT}}(u_{{\varvec{x}}_g})$$, cf. [25, 32], gets smaller and the approximation improves. Unfortunately, this is not visible in our numerical tests, since the comparison value $$\overline{\check{u}_{{\varvec{x}}_g}}$$ is too inaccurate. In detail, we would have to investigate very small sparsity parameters $$s<25$$ to observe the described effects. For all of our parameter choices $$\eta$$, the Monte–Carlo approximation $$\overline{\check{u}_{{\varvec{x}}_g}}$$ with $${n_{\text {MC}}} = 5\cdot 10^6$$ samples is not accurate enough to give insight into the particular behavior of our approximation of the expectation value.

Figure 4 shows that no frequencies are detected with all or nearly all components active. Further, even though only 68 of the 4819 detected frequencies (excluding $$c_{{\varvec{0}}}$$) have exactly one non-zero component, i.e., are supported on the axis cross, they contain more than $$99\%$$ of the variance of our approximation. The higher-dimensional frequencies with two, three or four non-zero components therefore seem to be nearly negligible for the approximation.

### 4.3 Affine example

For the affine case, we consider an example from [16, Sect. 11] with domain $$D_{{\varvec{x}}} = (0,1)^2$$, right-hand side $$f({\varvec{x}}) \equiv 1$$ and the random coefficient

\begin{aligned}&a({\varvec{x}},{\varvec{y}}) :=1 + \sum _{j=1}^{d} y_j \psi _j({\varvec{x}}),&{\varvec{x}}\in D_{{\varvec{x}}}, \, {\varvec{y}}\in D_{{\varvec{y}}}, \end{aligned}

with the random variables $${\varvec{y}}\sim {\mathcal {U}}([-1,1]^{d})$$ and

\begin{aligned}&\psi _j({\varvec{x}}) :=c j^{-\mu } \cos (2\pi m_1(j)\,x_1)\, \cos (2\pi m_2(j)\,x_2),&{\varvec{x}}\in D_{{\varvec{x}}},\, j\ge 1, \end{aligned}

where again $$c > 0$$ is a constant and $$\mu > 1$$ the decay rate. Further, $$m_1(j)$$ and $$m_2(j)$$ are defined as

\begin{aligned} m_1(j) :=j-\frac{k(j) (k(j)+1)}{2} \quad \text {and} \quad m_2(j) :=k(j)-m_1(j) \end{aligned}

with $$k(j) :=\lfloor -1/2 + \sqrt{1/4 + 2j} \rfloor$$. Table 2 shows the numbers $$m_1(j), m_2(j)$$ and k(j) for a few $$j \ge 1$$.
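For illustration, the mapping $$j \mapsto (m_1(j), m_2(j), k(j))$$ can be sketched in a few lines; this is our own reimplementation of the formulas above, not code from the referenced implementation.

```python
import math

def diagonal_indices(j):
    """Map j >= 1 to (m1(j), m2(j), k(j)) as defined above.

    k(j) = floor(-1/2 + sqrt(1/4 + 2j)) equals floor((sqrt(8j + 1) - 1) / 2),
    which we evaluate in exact integer arithmetic via math.isqrt.
    """
    k = (math.isqrt(8 * j + 1) - 1) // 2
    m1 = j - k * (k + 1) // 2
    m2 = k - m1
    return m1, m2, k

# j = 1, 2, 3, ... traverses the diagonals m1 + m2 = k(j) one by one:
# (0, 1), (1, 0), (0, 2), (1, 1), (2, 0), (0, 3), ...
```

In particular, $$m_1(j) + m_2(j) = k(j)$$ always holds, so the pairs $$(m_1(j), m_2(j))$$ enumerate the integer lattice points diagonal by diagonal.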

As before, we get that $$a_{\min } = 1-c\,\zeta (\mu )$$ and $$a_{\max } = 1+c\,\zeta (\mu )$$, such that for $$c < \frac{1}{\zeta (\mu )}$$ the uniform ellipticity assumption (2.1) is fulfilled. Here, we use the parameter choices from [16] with $$\mu = 2$$ for a relatively slow decay and $$c = \frac{0.9}{\zeta (2)} \approx 0.547$$ to end up with $$a_{\min }=0.1$$ and $$a_{\max }=1.9$$, which is very similar to the first parameter choice in the periodic case. We choose the stochastic dimension $$d=20$$ as in [16].
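These parameter values can be checked directly from the formulas above; a minimal sketch, using only $$\zeta(2) = \pi^2/6$$:

```python
import math

mu = 2.0
zeta_mu = math.pi**2 / 6     # zeta(2) = pi^2 / 6 ≈ 1.6449
c = 0.9 / zeta_mu            # ≈ 0.547, satisfies c < 1/zeta(mu)

a_min = 1 - c * zeta_mu      # = 0.1
a_max = 1 + c * zeta_mu      # = 1.9
```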

Figure 5 illustrates the Monte–Carlo approximation of the expectation value $$\overline{\check{u}_{{\varvec{x}}_g}}$$ with $${n_{\text {MC}}} = 10^6$$ samples used as well as the pointwise error $$\vert \overline{\check{u}_{{\varvec{x}}_g}} - {\mathbb {E}}(u_{{\varvec{x}}_g}^{\texttt {usFFT}})\vert$$ for two different parameter choices $$\eta$$ with $${\mathbb {E}}(u_{{\varvec{x}}_g}^{\texttt {usFFT}})$$ as given in (4.5).

Figure 6 again shows the largest error $${\text {err}}_2^{\eta }$$ w.r.t. the nodes $${\varvec{x}}_g$$ for different parameter settings $$\eta$$. This time, we observe a small increase in the number of used samples for larger extensions N, which was not visible in the periodic example. Hence, the parameter settings $$\eta =$$ I to XI have monotonically increasing sampling sizes, i.e., the data points in Fig. 6 are ordered from left to right w.r.t. increasing $$\eta$$ this time.

Figure 7 again illustrates the cardinality of the sets $$\mathrm {J}_\ell$$ as well as their GSI.

#### 4.3.1 Discussion

We note that even for small sparsity parameters s the approximation $${\mathbb {E}}(u_{{\varvec{x}}_g}^{\texttt {usFFT}})$$ seems to be quite accurate in Fig. 5. Unfortunately, for all parameter choices $$\eta$$ except I and II, the magnitude of the errors does not decrease below $$1.5 \cdot 10^{-5}$$, which is probably caused by the poor performance of the Monte-Carlo approximation $$\overline{\check{u}_{{\varvec{x}}_g}}$$.

The magnitudes of the errors $${\text {err}}_2^{\eta }$$ in Fig. 6 are already very low for small sparsity parameters s compared to the expected function values shown in Fig. 5a. We note an obvious improvement for each sparsity parameter s when we progress from $$N=32$$, the first data point in each cluster, to $$N=64$$, the second data point. In the periodic case, the important frequencies are very well localized around zero, such that the choice of N had almost no impact. This time, we clearly lose some accuracy if we choose the smaller extension $$N=32$$. For the sparsity parameter $$s=2000$$ we also see this effect when progressing from $$N=64$$ to $$N=128$$. The overall decay of the error is considerably slower than in the periodic example, which is probably mainly caused by the non-smooth tent transformation used, cf. Sect. 3.2.1. Again, numerical tests determining the error $$\text {err}_\infty ^{\eta }$$ revealed a similar behavior as in the periodic setting and showed that these errors are at most about 20 times the error $${\text {err}}_2^{\eta }$$.

Since the Fourier coefficients do not decay as fast as in the smooth periodic case, we detect a significantly larger number of one- and two-dimensional couplings in Fig. 7. Again, the frequencies with only one non-zero entry explain the largest part of the variance of the function, but this time the minimum percentage is lower than in the periodic example, at only about $$94.5\%$$. Accordingly, the importance of the two- and three-dimensional pairings grew slightly. The large number of important coefficients with only one, two or three non-zero entries also results in nearly no detected significant frequencies with more than three non-zero entries. For example, the 54 frequencies in $$\mathrm {J}_4$$ vanish when working with larger extensions N, since other frequencies with fewer non-zero entries are preferred in that case. So even though we are working with the moderate stochastic dimension $$d = 20$$, we do not detect any frequencies where half or even only a quarter of these dimensions are active simultaneously.

### 4.4 Lognormal example

We consider a two-dimensional problem based on the example in [9] on the domain $$D_{{\varvec{x}}}=[0,1]^2$$ with right-hand side $$f({\varvec{x}}) = \sin (1.3\pi x_1 + 3.4 \pi x_2) \cos (4.3 \pi x_1 - 3.1 \pi x_2)$$. The lognormal random coefficient is given by

\begin{aligned} a({\varvec{x}},{\varvec{y}}) :=\exp (b({\varvec{x}},{\varvec{y}})) \qquad \text {and} \qquad b({\varvec{x}},{\varvec{y}}) :=\sum _{j=1}^{d} \frac{1}{j} y_j \psi _j({\varvec{x}}) \end{aligned}

with the functions

\begin{aligned} \psi _j({\varvec{x}}) :=\sin (2\pi j x_1)\cos (2 \pi (d+1 - j) x_2). \end{aligned}

In [9], the stochastic dimension $$d=4$$ has been used. Here, we work with $$d = 10$$ to obtain a more complicated and higher-dimensional problem setting. We use a standard normally distributed random variable $${\varvec{y}}\sim {\mathcal {N}}({\varvec{0}},{\varvec{I}})$$ with $${\varvec{I}}$$ the identity matrix of dimension d as before. Hence, for each $${\varvec{x}}$$ there holds $$0< a({\varvec{x}},{\varvec{y}}) < \infty$$ for any $${\varvec{y}}$$. However, the constants $$0< a_{\min } \le a_{\max }<\infty$$ do not exist in this example, since $$b({\varvec{x}},{\varvec{y}})$$ can become arbitrarily small or large. Therefore, the problem is neither uniformly elliptic nor uniformly bounded, which complicates the analysis tremendously. We can still use this example for our numerical tests, since we only need the solvability of the differential equation for fixed values of $${\varvec{y}}$$. Further, we have $$b({\varvec{x}},{\varvec{y}}) \in [-3,3]$$ and therefore $$\exp (b({\varvec{x}},{\varvec{y}})) \in [\text {e}^{-3},\text {e}^3] \approx [0.05, 20.09]$$ with a probability of more than $$99\%$$ for each $${\varvec{x}}\in D_{{\varvec{x}}}$$, i.e., extremely small or large values of $$a({\varvec{x}},{\varvec{y}})$$ are very unlikely to appear.
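The concentration claim can be verified empirically. The following Monte Carlo sketch (our own illustration, with an arbitrarily chosen point $${\varvec{x}} = (0.3, 0.7)$$ and a fixed seed) estimates $$P(|b({\varvec{x}},{\varvec{y}})| \le 3)$$; since $$\sum_{j=1}^{10} 1/j^2 < 1.55$$ and $$|\psi_j| \le 1$$, the standard deviation of the Gaussian $$b({\varvec{x}},\cdot)$$ is at most about 1.25 for every $${\varvec{x}}$$.

```python
import numpy as np

d = 10
j = np.arange(1, d + 1)

def b(x1, x2, Y):
    """b(x, y) = sum_j (1/j) y_j sin(2 pi j x1) cos(2 pi (d+1-j) x2)."""
    psi = np.sin(2 * np.pi * j * x1) * np.cos(2 * np.pi * (d + 1 - j) * x2)
    return Y @ (psi / j)

rng = np.random.default_rng(0)
Y = rng.standard_normal((100_000, d))            # samples of y ~ N(0, I)
frac = np.mean(np.abs(b(0.3, 0.7, Y)) <= 3.0)    # estimate of P(|b| <= 3)
```

At this particular $${\varvec{x}}$$ the estimated probability comes out well above $$99\%$$, consistent with the statement above.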

Figure 8a once again illustrates the Monte–Carlo approximation of the expectation value $$\overline{\check{u}_{{\varvec{x}}_g}}$$ with $$n_{\text {MC}} = 10^6$$ samples used.

The decay of the largest error $${\text {err}}_2^{\eta }$$ w.r.t. the nodes $${\varvec{x}}_g$$ is shown in Fig. 9, where the data points are ordered from left to right w.r.t. increasing $$\eta$$ as in Fig. 6.

Finally, Fig. 10 shows the cardinality of the sets $$\mathrm {J}_\ell$$ as well as their GSI for this example.

#### 4.4.1 Discussion

We note that the pointwise solution in Fig. 8a has a more interesting structure than in the other examples above, mainly caused by the lognormal random coefficient and the non-constant right-hand side $$f({\varvec{x}})$$. Nevertheless, the approximations $${\mathbb {E}}(u_{{\varvec{x}}_g}^{\texttt {usFFT}})$$ achieve small errors, which are shown in Fig. 8. This time, a further increase of the sparsity parameter s and the extension N still improves the accuracy of our approximations, so the stagnation due to the limitations of the Monte-Carlo approximation $$\overline{\check{u}_{{\varvec{x}}_g}}$$ that we saw in the previous examples does not occur yet.

The pointwise errors $${\text {err}}_2^{\eta }({\varvec{x}}_g)$$ are slightly worse but still very small, as we see in Fig. 9. Again, increasing the extension N visibly improves the approximation error $${\text {err}}_2^{\eta }$$. The decay rate is lower than before, matching our expectations, since the lognormal example is far more difficult than the affine or periodic examples. Note that once again the slope considers all data points shown, while the specific decay for a fixed extension N might be slower or faster. Further, the size of the error $$\text {err}_\infty ^{\eta }$$ is again about 10 times the size of $${\text {err}}_2^{\eta }$$, revealing a good pointwise approximation w.r.t. the random variable $${\varvec{y}}$$ in this scenario as well.

We notice a similar distribution of the detected frequencies $${\varvec{k}}$$ to the index sets $$\mathrm {J}_{\ell }\cap I$$ as before, cf. Fig. 10. The key difference is the size of the GSI for each of these index sets. The range of the GSI for $$\mathrm {J}_1$$ increased significantly, the minimal portion of variance is now only about $$30\%$$. Obviously, the GSI for the other index sets $$\mathrm {J}_\ell$$ grew accordingly. This is probably caused by the more difficult structure of the lognormal diffusion coefficient a and the corresponding more difficult structure of the solution which is reflected in larger differences in the optimal frequency sets $$I_{{\varvec{x}}_g}$$, $$g=1,\ldots ,G$$, cf. Remark 4.1. Nevertheless, we again detect nearly no significant frequencies $${\varvec{k}}$$ with 4 or more active dimensions as in the previous examples.

### Remark 4.1

As mentioned before, the output of the usFFT contains more than s frequencies, since we join the detected index sets in each dimension increment and apply no thresholding afterwards to reduce the number of detected frequencies. While we have no reasonably tight theoretical bounds on the size of the output yet, we can investigate the number of output frequencies in our numerical tests. In detail, we express the detected output sparsity $$s_{\text {real}}$$ as a multiple of the given sparsity parameter s, i.e., $$s_{\text {real}} = q \cdot s$$ with some factor $$q \in {\mathbb {R}}$$.

In the numerical tests for the first periodic example in Sect. 4.2, i.e., $$\mu =1.2$$ and $$c=0.4$$, we have $$q \in [2.41, 2.74]$$, where the larger values of q tend to appear for smaller sparsity parameters s. For the quickly decaying example, i.e., $$\mu =3.6$$ and $$c=1.5$$, we have $$q \in [1.9042, 2.45]$$ and again the larger values of q are attained for small sparsity parameters s.

The affine model in Sect. 4.3 results in $$q \in [2.06, 2.186]$$, where $$q < 2.1$$ is only attained for $$\eta =$$ I, II and IV, so parameter settings with small sparsity parameters s and extension $$N=32$$.

Finally, in the complicated lognormal case in Sect. 4.4, we observe $$q \in [4.776, 5.15]$$. While the values above 5 only appear for $$\eta =$$ I and II, we still have significantly larger factors q than before. However, the magnitude of q is still very small compared to the size $$\vert {{\mathcal {T}}}_G\vert = G = 739$$ in our examples.

Our observation is consistent with recent results presented in [22] which considers the periodic model only. The crucial common feature is that the pointwise approximations $$u_{{\varvec{x}}_g}$$ can be regarded as elements of a joint reproducing kernel Hilbert space with uniformly good kernel approximants.

Overall, the factor q in our examples is much smaller than G, which would be the worst factor possible in the case that all $$I_{{\varvec{x}}_g}$$ are disjoint. Hence, as already mentioned in Sect. 1, the given complexities in Theorem 1.1 are way too pessimistic and the true amount of sampling locations and computational steps needed is much smaller in all of our numerical examples.

### 4.5 Comparison to given frequency sets

The main effort of the usFFT lies in detecting the index set $$I\subset \Gamma$$. The computation of the corresponding Fourier coefficients in the final step of Algorithm 1 needs significantly fewer samples than the preceding detection steps. Hence, the question arises whether an a priori choice of the index set $$I$$ should be preferred to reduce the computational cost, cf. Remark 4.2. Therefore, we now consider the following kinds of index sets:

• axis cross with uniform weight 1: $$I= \lbrace {\varvec{k}}\in {\mathbb {Z}}^{d}: \Vert {\varvec{k}}\Vert _0 = 1, \Vert {\varvec{k}}\Vert _1 \le N \rbrace$$

• hyperbolic cross with uniform weight $$\frac{1}{4}$$: $$I= \lbrace {\varvec{k}}\in {\mathbb {Z}}^{d}: \prod _{j=1}^{d} \max (1,4 \vert k_j \vert ) \le N \rbrace$$

• hyperbolic cross with slowly or quickly $$(q=1$$ or 2) decaying weights $$\frac{1}{j^q}$$: $$I= \lbrace {\varvec{k}}\in {\mathbb {Z}}^{d}: \prod _{j=1}^{d} \max (1,j^q \vert k_j \vert ) \le N \rbrace$$

• $$l_1$$-ball with slowly or quickly $$(q=1$$ or 2) decaying weights $$\frac{1}{j^q}$$: $$I= \lbrace {\varvec{k}}\in {\mathbb {Z}}^{d}: \sum _{j=1}^{d} j^q \vert k_j \vert \le N \rbrace$$
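For small dimensions, such candidate sets can be enumerated directly. The sketch below builds the weighted hyperbolic cross by brute force; the $$l_1$$-ball variant is obtained by replacing the product with a weighted sum. This is our own illustrative construction, not the implementation used in the experiments (which must handle $$d = 20$$ more cleverly).

```python
import math
from itertools import product

def hyperbolic_cross(d, N, w):
    """All k in Z^d with prod_j max(1, w(j) * |k_j|) <= N.

    w(j) is the reciprocal of the weight in dimension j, e.g. w = lambda j: 4
    for the uniform weight 1/4, or w = lambda j: j**q for the decaying
    weights 1/j^q. Brute force, so only feasible for small d and N.
    """
    # Per-dimension candidate range: |k_j| <= N / w(j).
    ranges = [range(-int(N // w(j)), int(N // w(j)) + 1) for j in range(1, d + 1)]
    return [
        k for k in product(*ranges)
        if math.prod(max(1, w(j) * abs(kj)) for j, kj in enumerate(k, 1)) <= N
    ]
```

For example, `hyperbolic_cross(2, 4, lambda j: 4)` yields only the zero frequency and the four neighbors on the axis cross, while the decaying weights `lambda j: j` admit considerably more frequencies in the first dimension.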

The Fourier coefficients $$c_{{\varvec{k}}}(u_{{\varvec{x}}_g})$$ are approximated using the same multiple R1L approach as in step 3 of our Algorithm 1, i.e., we skip steps 1 and 2 by choosing the index set $$I$$ instead of detecting it. Figure 11 illustrates the largest error $${\text {err}}_2^{\eta }$$ w.r.t. the nodes $${\varvec{x}}_g\in {\mathcal {T}}_G$$ for the previously considered periodic and affine examples, cf. Sects. 4.2 and 4.3, with these given frequency sets $$I$$ for various refinements N.

The magnitude of the errors is considerably larger than for comparable parameter settings of the usFFT, e.g., $$\eta =$$ I to III, especially for the periodic example. Further, we also see that the particular structure of the index set plays an important role. Obviously, a cleverly chosen index set reduces the size of the approximation error tremendously, especially in the periodic settings. But finding a good or even optimal index set is highly non-trivial, since it requires sufficient a priori information about the PDE and the structure of its solution, or additional computational effort, e.g., to determine suitable weights for a given index set structure. This can be observed, for example, when comparing the hyperbolic cross index sets for the periodic examples. The uniform weights achieve the best results for $$\mu = 1.2$$, but cannot keep up at all with the decaying weights for the faster decay rate $$\mu = 3.6$$. On the other hand, even if we know that there is a certain decay in our random coefficient, it is not clear how to choose suitable decay rates for the weights in order to guarantee reasonable results, specifically in pre-asymptotic settings, which are the rule rather than the exception when numerically determining solutions of high-dimensional problems.

The usFFT does not depend on this kind of information, as its choice of the frequency set is fully adaptive and the only required a priori information is the search space $$\Gamma$$, which can be chosen sufficiently large without disturbing the results of the algorithm. Further, the detected frequency set $$I$$ provides this additional information about the structure of the solution u as well as the dependence on the random variables $${\varvec{y}}$$. In other words, the additional amount of samples needed for the usFFT makes such a priori structural information unnecessary, detects it on its own and provides a possibility to extract it afterwards from the output.

### Remark 4.2

The computations in this section and in step 3 of the usFFT are performed using the multiple R1L approach for the efficient computation of Fourier coefficients for a given frequency set $$I$$ as proposed in [24]. From [24, Cor. 3.7] we get a bound on the number of sampling nodes M used. Since we are working with $$c=2$$ and $$\delta = 0.5$$ as in [25, Alg. 3], we arrive at $$M \le \lceil 2 \,\mathrm {ln}\, (2\vert I\vert ) \rceil 4 (\vert I\vert -1)$$. Note that this upper bound is very rough and the actual number of used sampling nodes in almost all numerical experiments is much lower.
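For illustration, the bound is straightforward to evaluate; the numbers below are our own, using the detected sparsity of the first periodic example ($$|I| = 4819$$, cf. Sect. 4.2) as a stand-in.

```python
import math

def r1l_sample_bound(I_size):
    """Upper bound M <= ceil(2 ln(2|I|)) * 4 * (|I| - 1) on the number of
    sampling nodes from [24, Cor. 3.7] with c = 2 and delta = 0.5."""
    return math.ceil(2 * math.log(2 * I_size)) * 4 * (I_size - 1)

# e.g. |I| = 4819 gives M <= 19 * 4 * 4818 = 366168
```

Note that the bound grows only slightly faster than linearly in $$|I|$$, due to the logarithmic factor.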

As stated above several times, this number of samples used in step 3 of the usFFT is just a small fraction of the total number of used samples when applying the usFFT. In particular, the computation of the actual Fourier coefficients $$c_{{\varvec{k}}}^{\text {\texttt {usFFT}}}(u_{{\varvec{x}}_g})$$ for the detected frequency set $$I$$ requires roughly $$0.4\%$$ or $$0.3\%$$ of the total sampling amount for the two different parameter choices of the periodic example in Sect. 4.2, around $$0.1\%$$ in the affine case in Sect. 4.3 and about $$0.65\%$$ for the lognormal model from Sect. 4.4.

In [9, 44], a data-driven method was proposed which is capable of computing approximations for multiple right-hand sides $$f({\varvec{x}})$$ from a certain function class. Our usFFT approach can be generalized in a similar, data-driven way: for a given class of functions $$f({\varvec{x}})$$ or even $$f({\varvec{x}},{\varvec{y}})$$, we can use the usFFT to compute the frequency set $$I$$ for one randomly selected right-hand side f, or randomly select multiple right-hand sides f and compute the union of the corresponding index sets $$I_f$$ by means of (a slight modification of) the presented usFFT. In each case, we end up with a frequency set $$I$$ which is probably a good choice for all functions f in the given class, since they are hopefully very similar to each other. Hence, we can use this index set $$I$$ as a starting point and compute approximations of the corresponding Fourier coefficients $$c_{{\varvec{k}}}(u_{{\varvec{x}}_g}), {\varvec{k}}\in I,$$ as done above. This approximation of $$u_{{\varvec{x}}_g}$$ is then probably considerably better, since the detected index set $$I$$ localizes the largest Fourier coefficients better than some a priori choice would.

## 5 Summary

The proposed dimension-incremental method provides an efficient way to adaptively compute solutions of parametric PDEs involving high-dimensional random coefficients. Its non-intrusive nature allows the use of different, suitable PDE solvers and therefore makes our algorithm highly adaptable to different problem settings as well. We show that the number of PDE solutions needed can be decreased by our method compared to the naive repetitive approach, even in the worst case. The numerical experiments underline this and demonstrate the functionality of our method on practical examples.