The uniform sparse FFT with application to PDEs with random coefficients

We develop the uniform sparse Fast Fourier Transform (usFFT), an efficient, non-intrusive, adaptive algorithm for the solution of elliptic partial differential equations with random coefficients. The algorithm is an adaptation of the sparse Fast Fourier Transform (sFFT), a dimension-incremental algorithm that tries to detect the most important frequencies in a given search domain and therefore adaptively generates a suitable Fourier basis corresponding to the approximately largest Fourier coefficients of the function. The usFFT does this w.r.t. the stochastic domain of the PDE simultaneously for multiple fixed spatial nodes, e.g., nodes of a finite element mesh. The key idea of joining the detected frequency sets in each dimension increment results in a Fourier approximation space that fits uniformly for all these spatial nodes. This strategy allows for a faster and more efficient computation, since significantly fewer samples are needed than when applying other algorithms, e.g., the sFFT, to each spatial node separately. We test the usFFT on different examples using periodic, affine and lognormal random coefficients in the PDE problems.


Introduction
Parametric operator equations have gained significant attention in recent years. In particular, partial differential equations with random coefficients play an important role in the study of uncertainty quantification, e.g., [8,22,23]. Therefore, the question of how to solve these equations numerically in an efficient and reliable way has become more and more important.
In this work, we consider the parametric, elliptic problem of finding u : D_x × D_y → R such that for every y ∈ D_y there holds

−∇ · (a(x, y)∇u(x, y)) = f(x), x ∈ D_x, y ∈ D_y,
u(x, y) = 0, x ∈ ∂D_x, y ∈ D_y, (1.1)

describing the diffusion characteristics of inhomogeneous materials; such problems are therefore called diffusion equations with random diffusion coefficient a. Here, x = (x_j)_{j=1}^{d_x} ∈ D_x is the spatial variable in a bounded Lipschitz domain D_x ⊂ R^{d_x}, typically with spatial dimension d_x = 1, 2 or 3, and y = (y_j)_{j=1}^{d_y} ∈ D_y is a high-dimensional random variable with D_y ⊂ R^{d_y}. For the remainder of this paper, we use d instead of d_y to simplify notation. The differential operator ∇ is always used w.r.t. the spatial variable x, and the one-dimensional random variables y_j are assumed to be i.i.d. with a prescribed distribution.
A common way to define the random coefficient a is via

a(x, y) = a_0(x) + ∑_{j=1}^{d} Θ_j(y) ψ_j(x), (1.2)

where a_0 and the ψ_j are assumed to be uniformly bounded on D_x. This model is commonly used with the stochastic domain D_y = [−1/2, 1/2]^d. The Θ_j(y) can also be interpreted as random variables themselves and are usually chosen to have expectation E[Θ_j(y)] = 0, such that E[a(x, ·)] = a_0(x) holds and the terms of the sum model the stochastic fluctuations.
Often, the model (1.2) is used in an affine fashion, with Θ_j(y) = y_j for all j = 1, ..., d. This so-called affine model is considered in many works on parametric differential equations with random coefficients, e.g., [11,30,40,44,15,31,3,14,5,18,37]. The so-called periodic model using Θ_j(y) = (1/√6) sin(2πy_j) has recently been studied in [23,22]. For y_j uniformly distributed on [−1/2, 1/2] each, these Θ_j are then distributed according to the arcsine distribution on [−1, 1]. It turned out that this model is also worth considering in addition to the affine model. Further, this model yields some advantages for our new approach due to its periodicity w.r.t. the random variables, as we will see later.
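The periodic model can be sketched in a few lines; the concrete sample count and random seed below are illustrative assumptions, not values from the paper. The snippet draws y_j uniformly on [−1/2, 1/2] and checks empirically that Θ_j(y_j) = sin(2πy_j)/√6 has mean zero and bounded range, so that E[a(x, ·)] = a_0(x) holds for the model (1.2):

```python
import math, random

def theta(y):
    """Periodic model: Theta(y) = sin(2*pi*y) / sqrt(6)."""
    return math.sin(2.0 * math.pi * y) / math.sqrt(6.0)

random.seed(1)
# Draw y_j uniformly on [-1/2, 1/2]; the empirical mean of Theta should be
# close to 0, and its range stays within [-1/sqrt(6), 1/sqrt(6)].
samples = [theta(random.uniform(-0.5, 0.5)) for _ in range(200_000)]
mean = sum(samples) / len(samples)
print(abs(mean) < 1e-2)                       # empirical mean close to 0
print(max(samples) <= 1.0 / math.sqrt(6.0))   # range bounded by 1/sqrt(6)
```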
A third common choice is the lognormal model, in which the random coefficient is given as the exponential of an expansion of the form (1.2). Here, the random variables y_j are typically normally distributed, i.e., y_j ∼ N(0, 1), and hence D_y = R^d. The numerical analysis as well as the computation of approximations for this model is more difficult, but the model also arises more often from real applications. A more detailed overview on parametric and stochastic PDEs can be found, e.g., in [10, Sec. 1].
In this paper, we design a numerical method for solving the aforementioned problems. To be more precise, we compute approximations of the solutions u(x, y) using trigonometric polynomials. A Fourier approach for ordinary differential equations, i.e., d_x = 1, with high-dimensional random coefficients has already been presented in [7]. There, a dimension-incremental method, the so-called sparse Fast Fourier Transform (sFFT), cf. [39], was used to detect the most important frequencies k and corresponding approximations of the Fourier coefficients c_k(u) of the solution u(x, y). These values can be used to compute an approximation of the solution u or other quantities of interest such as, e.g., the expectation E[u]. Further, the frequencies and Fourier coefficients can be used to gain detailed information about the influence of the random variables y_j on the solution u and their interaction with each other.
In this work, we present a non-intrusive approach based on the main idea of the algorithm developed in [7]. The main difference is that we do not include the spatial variable x in the Fourier approach and therefore only apply the sFFT w.r.t. the random variable y. Hence, the sFFT only needs samples of the function values of u for fixed y, which can be computed using any suitable, already available differential equation solver. In consequence, we are not restricted to particular spatial domains D_x or spatial dimensions d_x. To be more precise, we consider a finite set T_G ⊂ D_x with cardinality |T_G| = G < ∞ as spatial discretization and aim for approximations of the functions u_{x_g}(y) := u(x_g, y) for each x_g ∈ T_G. Also note that we might need to apply a suitable periodization w.r.t. y first if the function u_{x_g} is not already periodic.
Unfortunately, we would then need to apply such a pointwise approximation algorithm, in our case the sFFT, G times separately, resulting in an unnecessarily huge increase in the number of samples used and therefore, since each sample implies a call of the underlying, probably expensive differential equation solver, also in the computation time of the algorithm. Hence, we develop a modification of the sFFT to overcome this problem and compute the approximations of the functions u_{x_g} within one call of the new algorithm. In particular, our so-called uniform sparse Fast Fourier Transform (usFFT) combines certain candidate sets between the dimension-incremental steps, which allows to use the same sampling nodes y for each point x_g ∈ T_G in the next step. This strategy keeps the number of used samples reasonably small and hence decreases the computation time drastically compared to G applications of the sFFT itself. We summarize this in the following theorem:

Theorem 1.1. Let the sparsity parameter s ∈ N, a frequency candidate set Γ ⊂ Z^d, |Γ| < ∞, the number G ∈ N and a failure probability δ ∈ (0, 1) be given. Moreover, we define N_Γ := max_{j=1,...,d} (max_{k∈Γ} k_j − min_{l∈Γ} l_j). Then, there exists a randomized sampling strategy based on the random rank-1 lattice approach in [32] generating a set S of sampling locations whose cardinality satisfies the bound (1.3), such that the following holds. Consider G arbitrary multivariate trigonometric polynomials p^(g)(y) := ∑_{k∈I_g} p̂_k^(g) e^{2πi k·y}, g = 1, ..., G, where we assume I_g ⊂ Γ, |I_g| ≤ s and min_{k∈I_g} |p̂_k^(g)| > 0 for each g = 1, ..., G. We generate a random set S via this sampling strategy. Then, with probability at least 1 − δ it holds that all frequencies k ∈ I_g as well as the Fourier coefficients p̂_k^(g) of all multivariate trigonometric polynomials p^(g), g = 1, ..., G, can be reconstructed from their values at the sampling locations in S.
The simultaneous identification of all the frequencies and the computation of all the Fourier coefficients can be realized by a combination of Algorithm 1 and a modification of the approach presented in [32] in the role of Algorithm A. In the worst case, the computational complexity of the suggested method depends quadratically on G, cf. Remark 1.2.
Remark 1.2. Note that (1.3) in Theorem 1.1 does not state anything about the sampling complexity of Algorithm 1, but only about the number of sampling locations |S|. We need samples of all trigonometric polynomials p^(g)(y), g = 1, ..., G, at these sampling nodes y ∈ S, so the necessary number of samples in the classical sense is G times larger. However, we aim for an algorithm where the G samples p^(g)(y) for a fixed y ∈ S and all g = 1, ..., G are obtained by just a single call of some (probably expensive) black box sampling method. In all of our numerical examples in Section 4, the computation time for this sampling procedure tremendously outweighs the pure computation time of the remaining steps of Algorithm 1, which we referred to as computational complexity in Theorem 1.1. Hence, we stress that the computational complexity is not the main focus of this complexity result, but rather the number of sampling nodes.
The proof of the theorem is given in Appendix B. While Theorem 1.1 is stated for trigonometric polynomials p^(g) only, the algorithm can also be used on the above mentioned periodic or periodized functions u_{x_g}(y) to compute the support and values of their approximately largest Fourier coefficients with some thresholding technique, realized by the parameters s_local and θ in Algorithm 1, which is also the key idea when applying the sFFT for function approximation in [39,25,32]. More generally, we could even consider G different periodic functionals F_g(y) and approximate them with the same approach we are about to present here. Moreover, Theorem 1.1 does not assume the frequency sets I_g to share any frequencies k, i.e., these sets could even be pairwise disjoint in the worst case scenario. Obviously, this will not be the case in our examples later on, as the functions u_{x_g}(y) and u_{x_{g'}}(y) are probably very similar for x_g and x_{g'} close to each other due to the smoothness of the solution u(x, y). Hence, the given complexities, especially the quadratic dependency on G of the computational complexity, are very pessimistic and should really be seen as worst case estimates.
The crucial advantage of the presented approach is the efficient and adaptive choice of the frequency set performed by the underlying sFFT. Most of the approaches in the aforementioned works are based on certain (tensorized) basis functions [11,9,3,5,4,2,6,22], Quasi-Monte Carlo methods [30,40,19,9,15,31,14,18,23,37,36] or collocation methods [9,44] and often assume that the particular involved basis functions, weights, index sets or kernels need to be known, chosen or computed in advance. A common example are compressed sensing techniques, see, e.g., [1, Ch. 7] or [17] and the references therein for an overview, where the considered index sets need to be chosen in advance. The adaptivity of our algorithm circumvents this step and therefore provides much more freedom in finding a good sparse approximation. Also note that our method aims for the approximation of the solution u(x, y) directly instead of, e.g., just a high-dimensional quadrature. As an example and for additional information, we refer to [28] as a short and general introduction to Quasi-Monte Carlo methods, which is also one of the most common approaches.
Another suitable approach is to use an efficient deterministic sampling strategy which guarantees the reconstruction of each s-sparse signal supported on the d-dimensional candidate set Γ, a so-called deterministic sparse Fourier transform. Such a method allows to use the same samples for approximating all signals u_{x_g}, g = 1, ..., G. Several works like [21] already provide applicable multivariate sparse Fourier transform results, but the resulting complexities O(d^4 s^2) (up to logarithmic factors) in a deterministic setting and O(d^4 s) for a random variant scale suboptimally in the dimension d. Another fully deterministic method is stated in [34]. There, the construction of the sampling sets suffers from some minor restrictions on the considered frequency set, which result in a slightly better scaling sampling complexity O(d^3 s^2 N) and computational complexity O(d^3 s^2 N^2) (both again up to logarithmic factors), with N the side length of the considered cube in frequency domain. Finally, in [20] multivariate sparse Fourier transforms are presented, where the corresponding complexities again scale quadratically in s for a deterministic version and linearly for a Monte Carlo version, while the dimension d only enters linearly. Unfortunately, this is only true if the considered candidate sets do not scale exponentially w.r.t. the dimension d. Otherwise, the size M of the reconstructing rank-1 lattices used also scales exponentially in d, and logarithmic factors like log^{11/2} M in the complexities then again result in a suboptimal dimension scaling.
Our usFFT is highly adaptive, since it only needs an arbitrary candidate set Γ and selects the important frequencies k in this search domain on its own. The cardinality |Γ| of this candidate set is not as crucial as for other approaches like those mentioned above, since the number of used samples and the computation time suffer only mildly from larger candidate sets. Again, we stress that we may also extract additional information about the influence and the interactions of the random variables y_j from the output of the usFFT. For instance, we detect a maximum of only 4 simultaneously active dimensions in the detected frequencies in our numerical examples, i.e., the detected frequency vectors k have at most 4 non-zero components even though d = 10 or even d = 20. Another main advantage of our algorithm is its non-intrusive and parallelizable behavior. As already mentioned, the usFFT uses existing numerical solvers for the considered differential equation. We can use suitable, reliable and efficient solvers with no need to re-implement them. Further, the different samples needed in each sampling step can be computed on multiple instances. This parallelization reduces the computation time even further and makes a higher number of used samples less time consuming.
The remainder of the paper is organized as follows: In Section 2 we set up some notation and assumptions and briefly explain the key idea of the sFFT algorithm. Section 3 is devoted to the explanation of the usFFT as well as some periodizations required for the affine and lognormal cases. Finally, in Section 4 we test the new algorithm on different examples using periodic, affine, and lognormal random coefficients and investigate the computed approximations under different aspects.
The MATLAB ® source code of the algorithm as well as demos for our numerical examples can be downloaded from https://mytuc.org/fyfw.

Prerequisites
We consider the PDE problem (1.1). Note that we always assume f to be independent of the random variable y and impose zero boundary conditions just for simplicity and to preserve clarity. Our algorithm (up to some minor changes) may also be applied for right-hand sides f(x, y) as well as non-zero Dirichlet boundary conditions u(x, y) = h(x, y) for all x ∈ ∂D_x.

Problem setting
The weak formulation of our problem reads: for every y ∈ D_y, find u(·, y) ∈ H_0^1(D_x) such that

∫_{D_x} a(x, y) ∇u(x, y) · ∇v(x) dx = ∫_{D_x} f(x) v(x) dx for all v ∈ H_0^1(D_x).

As usual, H_0^1(D_x) denotes the subspace of the L_2-Sobolev space H^1(D_x) with vanishing trace on ∂D_x, and H^{−1}(D_x) denotes the dual space of H_0^1(D_x). We say that the diffusion coefficient a : D_x × D_y → R fulfills the uniform ellipticity assumption if there exist two constants a_min ∈ R and a_max ∈ R such that

0 < a_min ≤ a(x, y) ≤ a_max < ∞ for almost all x ∈ D_x and all y ∈ D_y.

Then, the Lax-Milgram Lemma ensures that the problem (1.1) possesses a unique solution u(·, y) ∈ H_0^1(D_x) for every fixed y ∈ D_y, satisfying the a priori estimate

‖u(·, y)‖_{H_0^1(D_x)} ≤ ‖f‖_{H^{−1}(D_x)} / a_min.

Some further basic information and results on approximation and smoothness of the solution u of high-dimensional parametric PDEs can be found in [10, Sec. 1 and 2]. Additionally, we also refer to the general results on best n-term approximations given in [10, Sec. 3.1], since our Fourier approach fits in this particular framework as well.
In order to compute an approximation of the solution u_{x_g} := u(x_g, ·) at a given point x_g ∈ D_x using the dimension-incremental method explained below, we need samples of u_{x_g} at many sampling nodes y. We aim for a non-intrusive approach and therefore use a finite element method to solve the problem (1.1) for a given y ∈ D_y. A similar approach is used, e.g., in [37,36], where the finite element method is used to solve the PDE for any y_u with u ⊂ N and (y_u)_j = y_j for j ∈ u and 0 otherwise. The corresponding approximations of the so-called u-truncated solution are then used for their particular method as well. In our case, we just evaluate the finite element solution ǔ(·, y) at the given point x_g ∈ D_x. In particular, instead of the finite element method, any differential equation solver capable of computing the value u(x_g, y) for given x_g and y would fit. Hence, we also refer to this sampling method as black box sampling later on.
Note that we will also use the finite element solution ǔ as an approximation of the true solution u when we test the accuracy of our computed approximation u^usFFT in Section 4. In detail, we have err(u, u^usFFT) ≤ err(u, ǔ) + err(ǔ, u^usFFT), where err(·, ·) is a suitable metric symbolizing the error. So while we only investigate the second term err(ǔ, u^usFFT) in our numerical tests later, the first term includes other error sources such as the modeling error, e.g., caused by a dimension truncation of an infinite-dimensional random coefficient a, or the error coming from the finite element approximation itself. For a particular example of this, we refer to the detailed error analysis for the periodic model mentioned in Section 1, which is given in [22, Sec. 4].

The dimension-incremental method for s-sparse periodic functions
The following dimension-incremental method was presented in [39]. The aim of this algorithm is to determine the non-zero Fourier coefficients p̂_k ∈ C, k ∈ I, of a multivariate trigonometric polynomial p(y) = ∑_{k∈I} p̂_k e^{2πi k·y} with unknown frequency set I ⊂ Z^d, |I| < ∞, based on samples of the polynomial p. Obviously, p is a periodic signal and its domain is the d-dimensional torus T^d, T ≃ [0, 1).
The goal is not only to calculate the non-zero Fourier coefficients p̂_k but also, and more importantly, to detect the frequencies k belonging to the non-zero Fourier coefficients out of a possibly huge search domain Γ ⊂ Z^d. In particular, we define the set supp p := {k ∈ Γ : p̂_k ≠ 0} and call the cardinality |supp p| the sparsity of p.
First, we introduce some further notation. We consider a given search domain Γ ⊂ Z^d, |Γ| < ∞, that should be large enough to contain the unknown frequency set I ⊂ Γ. We denote the projection of a frequency k := (k_1, ..., k_d) ∈ Z^d to the components i := (i_1, ..., i_m) ∈ {ι ∈ {1, ..., d}^m : ι_t ≠ ι_{t'} for t ≠ t'} by P_i(k) := (k_{i_1}, ..., k_{i_m}) ∈ Z^m. Correspondingly, we define the projection of a frequency set I ⊂ Z^d to the components i by P_i(I) := {(k_{i_1}, ..., k_{i_m}) : k ∈ I}. Using these notations, the general approach is the following:

Sketch of dimension-incremental reconstruction

1. Compute the first components of the unknown frequency set from sampling values, i.e., determine a set I^(1) ⊂ P_1(Γ) such that P_1(supp p) ⊂ I^(1) holds.
2. For dimension increment step t = 2, ..., d, i.e., for each additional dimension:
a) Compute the t-th components of the unknown frequency set from sampling values, i.e., determine a set I^(t) ⊂ P_t(Γ) such that P_t(supp p) ⊂ I^(t) holds.
b) Construct a suitable sampling set X^(1,...,t) for the candidate set (I^(1,...,t−1) × I^(t)) ∩ P_(1,...,t)(Γ).
c) Sample the polynomial p at the nodes of the sampling set X^(1,...,t).
d) Compute the projected Fourier coefficients p̂_(1,...,t),k from these samples and determine the set I^(1,...,t) of frequencies belonging to the non-zero ones.
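As a toy illustration of the detection of first components (the 2D polynomial, grid sizes and seed below are hand-picked assumptions, not the paper's choices), the following sketch computes one-dimensional DFTs with a randomly fixed second component. These are exactly "randomly projected" coefficients: possible cancellations motivate repeating the detection r times and taking the union.

```python
import cmath, random

# Toy 2D sparse trigonometric polynomial (coefficients are assumptions
# for illustration only).
coeffs = {(0, 0): 1.0, (3, 1): 0.5, (-2, 4): 0.25}

def p(y1, y2):
    return sum(c * cmath.exp(2j * cmath.pi * (k1 * y1 + k2 * y2))
               for (k1, k2), c in coeffs.items())

N = 8           # search domain Gamma = [-N, N]^2
M = 2 * N + 1   # grid length, so frequencies -N..N are distinct mod M

def detect_first_components(r=3, tol=1e-8):
    """Detect candidates for P_1(supp p) via random projections: for a
    random fixed y2, the 1D coefficients over y1 are the projected values
    sum_{k2} p_(k1,k2) e^{2 pi i k2 y2}; repeating r times and taking the
    union guards against random cancellations."""
    detected = set()
    for _ in range(r):
        y2 = random.random()
        samples = [p(i / M, y2) for i in range(M)]
        for k1 in range(-N, N + 1):
            c = sum(s * cmath.exp(-2j * cmath.pi * k1 * i / M)
                    for i, s in enumerate(samples)) / M
            if abs(c) > tol:
                detected.add(k1)
    return detected

random.seed(0)
I1 = detect_first_components()
print(sorted(I1))   # contains P_1(supp p) = {-2, 0, 3}
```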
Note that this method can also be used for the numerical determination of the approximately largest Fourier coefficients of sufficiently smooth periodic signals f using a suitable thresholding technique, cf. [39,25,32].
The proposed approach includes the construction of suitable sampling sets in step 2b. To this end, one assumes that an upper bound s ≥ |supp p| is known and one constructs the sampling sets X^(1,...,t) such that the Fourier coefficients p̂_(1,...,t),k computed in step 2d are randomly projected ones. Due to that projection one may observe cancellations, with the effect that one misses active frequencies. For that reason, one repeats the computation of the projected Fourier coefficients for a number r of random projections and then takes the union.
Of course, there exist different methods for the computation of the projected Fourier coefficients. The algorithm works with any sampling method which computes Fourier coefficients on a given frequency set. Preferable sampling sets combine the following four properties:
• relatively low number of sampling nodes (sampling complexity),
• stability,
• efficient construction methods for the sampling set,
• fast Fourier transform like algorithms in order to compute the projected Fourier coefficients.
Especially due to the last point, we will call the dimension-incremental method the sparse Fast Fourier Transform (sFFT) from now on. A quick sketch of the sampling techniques used in [39,25,32] as well as the sample complexity and computational complexity of the sFFT using these methods are given in Appendix A.
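The rank-1 lattice idea behind such sampling sets can be sketched as follows; the frequency set, generating vector and lattice size below are toy assumptions, not constructions from [39,25,32]. If the inner products k·z are pairwise distinct modulo M on the frequency set, one lattice line of M samples and a single length-M DFT recover all coefficients:

```python
import cmath

# Assumed toy setup: frequency set I and a rank-1 lattice (z, M) such that
# the values k.z mod M are pairwise distinct on I ("reconstructing" lattice).
I = [(0, 0), (3, 1), (-2, 4)]
z, M = (1, 7), 31
addr = {(sum(ki * zi for ki, zi in zip(k, z)) % M): k for k in I}
assert len(addr) == len(I)   # the lattice separates the frequencies

coeffs = {(0, 0): 1.0, (3, 1): 0.5, (-2, 4): 0.25}
def p(y):
    return sum(c * cmath.exp(2j * cmath.pi * sum(ki * yi for ki, yi in zip(k, y)))
               for k, c in coeffs.items())

# Sampling along the lattice nodes (i*z/M mod 1) reduces the multivariate
# problem to one length-M DFT: p(i z / M) = sum_k p_k e^{2 pi i (k.z) i / M}.
samples = [p(tuple((i * zj / M) % 1.0 for zj in z)) for i in range(M)]
recovered = {}
for h in range(M):
    c = sum(s * cmath.exp(-2j * cmath.pi * h * i / M)
            for i, s in enumerate(samples)) / M
    if h in addr:
        recovered[addr[h]] = c

print({k: round(v.real, 6) for k, v in recovered.items()})
# -> {(0, 0): 1.0, (3, 1): 0.5, (-2, 4): 0.25}
```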

The uniform sparse FFT
Up to now, the sFFT algorithm is a suitable tool to compute an approximation of the solution for a single x_g. When considering a whole set of points, we have to call the existing method G times. But multiple, independent calls of the sFFT result in different, adaptively determined sampling sets X^(1,...,t) in step 2b. Hence, we cannot guarantee that the solutions of the differential equation from one run of the algorithm can be utilized in another one. So we really need G full calls of the sFFT including all sampling computations and therefore end up with unnecessarily many samples, even when using the sample efficient rank-1 lattice (R1L) approaches. Remember that sampling means solving the differential equation with a call of the underlying differential equation solver, which might be very expensive in computation time. Therefore, we now modify the dimension-incremental method such that we can work on the set T_G and one call of the algorithm computes approximations of the most important Fourier coefficients c_k(u_{x_g}), k ∈ I_{x_g}, for each g = 1, ..., G, including a clever choice of the sampling nodes y.

Expanding the sFFT
The full method is stated in Algorithm 1. We force the dimension-incremental method to select a set I ⊂ Γ ⊂ Z^d containing the frequencies of the s approximately largest Fourier coefficients c_k(u_{x_g}) for each x_g ∈ T_G. To this end, we compute the set of detected frequencies for each x_g in each dimension increment t, but afterwards we form the union of these sets ⋃_{g=1}^G I^(1,...,t)_{x_g}, which will be the set of detected frequencies I^(1,...,t) that is given to the next dimension-incremental step t + 1. Now, we start each iteration with a larger frequency candidate set (I^(1,...,t−1) × I^(t)) ∩ P_(1,...,t)(Γ), which is suitable for all x_g ∈ T_G. This way, the first t components of the elements of the sampling set X^(1,...,t) are the same for each x_g, and the random part, which causes the specific random projection of the Fourier coefficients, can be chosen equally for each x_g without disturbing the algorithm. Therefore, we can now take advantage of the fact that our underlying differential equation solver can evaluate the solution u(x, y) for a given y at multiple values of x in the domain D_x. Accordingly, we only need to solve the differential equation once for each sampling node y and still get all the sampling values u_{x_g}(y) for all x_g in our finite set T_G. Note that this also holds for the one-dimensional detections in steps 1 and 2a. Also, we might need to interpolate or approximate if some of the values u_{x_g}(y), x_g ∈ T_G and fixed y ∈ X^(1,...,t), are not directly given by the differential equation solver. Obviously, the larger candidate sets (I^(1,...,t−1) × I^(t)) ∩ P_(1,...,t)(Γ), resulting from the union of the sets I^(1,...,t−1)_{x_g}, g = 1, ..., G, and the union of the sets I^(t)_{x_g}, g = 1, ..., G, will also result in larger sampling sets. The overall increase in the number of sampling locations is still very reasonable, cf. Theorem 1.1. The computational complexity suffers a bit harder from these modifications, but is not as important as the number of sampling locations, cf. Remark 1.2. We could also think of further thresholding methods to cut the number of frequencies back to the sparsity parameter s at the end of each dimension-incremental step or at least at the end of the whole algorithm. In this work, we will not do this, but take a look at the total number of frequencies in the output of the algorithm in relation to the sparsity parameter s, cf. Remark 4.1.
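The joining step described above can be sketched schematically; the function names and the toy sets below are illustrative assumptions, not the actual Algorithm 1:

```python
# Sketch of the usFFT joining step: after dimension increment t, the
# per-node detections are merged so that the next candidate set, and hence
# the sampling nodes y, are shared by all spatial nodes x_g.

def join_detections(per_node_sets):
    """Union of the detected frequency sets I^(1,...,t)_{x_g} over all g."""
    joined = set()
    for s in per_node_sets:
        joined |= s
    return joined

def next_candidates(I_prev, I_t, Gamma):
    """(I^(1,...,t-1) x I^(t)) intersected with the projected search domain."""
    return {k + (kt,) for k in I_prev for kt in I_t} & Gamma

# Toy example with three spatial nodes and a small search domain:
per_node = [{(0,), (1,)}, {(0,), (2,)}, {(1,)}]
I_joined = join_detections(per_node)          # {(0,), (1,), (2,)}
Gamma = {(k1, k2) for k1 in range(3) for k2 in range(3)}
cands = next_candidates(I_joined, {0, 1}, Gamma)
print(sorted(cands))
# -> [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1)]
```

One sampling set built for `cands` then serves every spatial node at once, which is exactly what keeps the number of PDE solves reasonable.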
Overall, the proposed method is now capable of computing approximations for all nodes x_g, g = 1, ..., G. Please note that step 2 does not provide (c_k^usFFT(u_{x_g}))_{k∈I} but only approximations of (c_k^usFFT(u_{x_g}))_{k∈J̃_{d,1,g}}, where J̃_{d,1,g} ⊊ I^(1,...,d) = I holds in general. In order to compute all the Fourier coefficients c_k^usFFT(u_{x_g}), k ∈ I and g = 1, ..., G, we propose an additional approximation in step 3. For this approximation, the user can apply a suitable approach of their choice. The used method should just compute an approximation of the projection onto the already determined space of trigonometric polynomials span{exp(2πi k · ·) : k ∈ I}, cf., e.g., [27,24,26] for different possible sampling approaches.
We will call this modified version of the sFFT the uniform sFFT, or usFFT for short, from now on, where uniform is meant w.r.t. the discrete set of points T_G. The main differences to the sFFT algorithms from [39,25,32] are the loops over g and the corresponding unions of the frequency sets. Possible choices for Algorithm A and the approximation approach used in step 3 of Algorithm 1 are a random rank-1 lattice approach and a multiple rank-1 lattice approach, respectively, which actually leads to Theorem 1.1, cf. Appendix A and Appendix B.

Periodization
The usFFT allows us to reconstruct a frequency set I and approximations c_k^usFFT(u_{x_g}) of the corresponding Fourier coefficients c_k(u_{x_g}) for each x_g ∈ T_G. Unfortunately, this approach requires the function u(x, y) to be 1-periodic w.r.t. y in each of the d stochastic dimensions.
Since the right-hand side f(x) does not depend on y in our considerations, the random coefficient a is the only given function involving the random variable y in the problem (1.1). In periodic models, we use the random coefficient (1.2) with 1-periodic functions Θ_j(y). Hence, the random coefficient a(x, y) is 1-periodic and thus the solution u(x, y) is also 1-periodic w.r.t. each component of y. Therefore, we can apply the usFFT directly for this model without any further considerations.
In order to apply the usFFT when using the affine and lognormal models, we need to apply a suitable periodization first, since the random coefficient a and therefore the solution u are not periodic in general. Note that we assume the random variable to be uniformly distributed in the affine case, i.e., y ∼ U([α, β]^d), and standard normally distributed in the lognormal case, i.e., y_j ∼ N(0, 1) for each j, and recall T ≃ [0, 1).

Affine case
We consider the function ũ(x, ỹ) := u(x, ϕ(ỹ)), which is 1-periodic in ỹ, with ϕ being some suitable transformation function. With this approach, the usFFT is able to compute approximations of the functions ũ_{x_g} := ũ(x_g, ·) for each x_g ∈ T_G. We want ϕ to act component-wise on the random variable, i.e., ϕ(ỹ) := (ϕ_j(ỹ_j))_{j=1}^d. Further, we assume that these mappings ϕ_j fulfill the assumptions (A1)-(A4), the first of which reads:

(A1) Each ϕ_j is continuous, i.e., ϕ_j ∈ C(T) for each j = 1, ..., d.

With these restrictions we ensure that ϕ is bijective w.r.t. the interval [0, 1/2]. The solution can then be evaluated via

u_{x_g}(y) ≈ ∑_{k∈I} c_k^usFFT(ũ_{x_g}) e^{2πi k·ϕ^{−1}(y)}, (3.1)

with the finite index set I and the approximated Fourier coefficients c_k^usFFT(ũ_{x_g}) from the usFFT applied to the functions ũ_{x_g}, x_g ∈ T_G.
In this work, we always consider the tent transformation, cf. [33,43,12], for each ϕ_j, i.e., the 1-periodic, piecewise linear mapping that increases linearly on [0, 1/2] and decreases symmetrically on [1/2, 1). Although this transformation fulfills the assumptions (A1)-(A4), it might not be the most favorable choice in specific applications due to its lack of smoothness. Smoother periodizations, e.g., [7, Sec. 2.2.2], might yield better approximation results in specific situations due to the faster decay of the Fourier coefficients of ũ. On the other hand, the linear structure of the tent transformation on the interval [0, 1/2] allows some simplifications later on.
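A minimal sketch of the tent transformation follows; the scaling convention (to [0, 1] instead of a general [α, β]) is our assumption for illustration, not necessarily the paper's exact formula:

```python
import random

def tent(t):
    """Standard tent transformation on the torus T = [0, 1):
    linear on [0, 1/2], mirrored on [1/2, 1)."""
    t = t % 1.0
    return 2.0 * t if t < 0.5 else 2.0 - 2.0 * t

# The tent map is 1-periodic and pushes the uniform distribution on T
# forward to the uniform distribution on [0, 1] (each value has two
# preimages of equal density); an affine rescaling to [alpha, beta]
# covers the general affine case.
random.seed(2)
ys = [tent(random.random()) for _ in range(100_000)]
print(abs(sum(ys) / len(ys) - 0.5) < 5e-3)    # mean of U([0, 1])
print(tent(0.25) == tent(0.75))               # symmetry around 1/2
```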

Lognormal case
As in the affine case, we need a suitable, periodic transformation mapping ϕ : T^d → R^d to receive a periodization ũ(x, ỹ). Again, we choose the same function in each stochastic dimension, i.e., the same ϕ_j for all j = 1, ..., d, but this time ϕ_j will consist of two separate steps. First, we consider the transformation

τ_1 : (−1/2, 1/2) → R, τ_1(ỹ) := √2 erf^{−1}(2ỹ),

with the error function erf(y) := (2/√π) ∫_0^y e^{−t²} dt. For further information on this transformation, see [35]. This mapping τ_1 seems like the ideal choice when talking about random variables y ∼ N(0, 1), since the error function erf(y) is closely related to their cumulative distribution function Φ. In detail, it holds Φ(y) = (1/2)(1 + erf(y/√2)). The so-called inversion method in stochastic simulation describes that the cumulative distribution function Φ and its inverse Φ^{−1} map random variables distributed according to Φ to uniformly distributed random variables on [0, 1] and the other way around, cf. [13, Sec. II.2]. Thus, our transformation τ_1 maps uniformly distributed random variables ỹ ∼ U(−1/2, 1/2) to normally distributed random variables y ∼ N(0, 1) and is therefore a natural generalization when moving forward from uniformly distributed random variables.
The second part is a suitable periodization τ_{2,∆} : T → (−1/2, 1/2). We choose a similar approach as in the affine case and use a shifted tent transformation with shift ∆ > 0. We need this shift since we cannot apply the transformation τ_1 if τ_{2,∆}(ỹ) = ±1/2 due to the poles there. Shifting with a suitable ∆ ensures that the deterministic part of the sampling set X does not contain components equal to ∆ or 1/2 + ∆. The randomly chosen values from the interval [0, 1] for the other components will also not be equal to ∆ or 1/2 + ∆ almost surely. Hence, the sampling set X in Algorithm 1 does not contain any nodes with any component equal to ∆ or 1/2 + ∆ almost surely. Now we define the transformation mappings ϕ_{j,∆} for the lognormal case as ϕ_{j,∆} := τ_1 ∘ τ_{2,∆} for each j = 1, ..., d. The mapping ϕ_{j,∆} as well as its two parts τ_1 and τ_{2,∆} are visualized in Figure 3.1. These mappings fulfill slightly modified versions of the assumptions (A1)-(A4) taking into account the shift ∆. Now we can use the transformation ϕ_∆ := (ϕ_{j,∆})_{j=1}^d to receive the in ỹ periodic signals ũ(x_g, ỹ) = u(x_g, ϕ_∆(ỹ)), x_g ∈ T_G, that can be approximated using our usFFT, cf. Algorithm 1. Plugging the inverse mapping ϕ_∆^{−1} into the evaluation formula, which is similar to (3.1), we are now able to compute approximations of the solution functions u_{x_g}(y) in the lognormal case as well. Again, the periodization ϕ_∆ is not smooth and therefore might yield non-optimal approximation results. In particular, the periodization mappings ϕ_{j,∆} possess two poles instead of two kinks, which is a considerably worse smoothness behavior than in the affine case.
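The two-step lognormal periodization can be sketched as follows; the precise shift convention in τ_{2,∆} and the value of ∆ are assumptions for illustration. The inverse-CDF step uses Python's `statistics.NormalDist`, whose `inv_cdf` plays the role of Φ^{−1}(u) = √2 erf^{−1}(2u − 1):

```python
import random
from statistics import NormalDist

DELTA = 1e-3   # illustrative shift; the paper optimizes this choice

def tent(t):
    t = t % 1.0
    return 2.0 * t if t < 0.5 else 2.0 - 2.0 * t

def tau2(t, delta=DELTA):
    """Shifted tent periodization T -> [-1/2, 1/2]; the endpoints +-1/2
    are attained only at t = delta and t = 1/2 + delta (our guess at the
    exact shift convention)."""
    return tent(t - delta) - 0.5

def tau1(t):
    """Inverse-CDF step: maps U(-1/2, 1/2) to N(0, 1) via Phi^{-1}(t + 1/2)."""
    return NormalDist().inv_cdf(t + 0.5)

def phi(t):
    """Composed periodization phi_{j,Delta} = tau1 o tau2 for one dimension."""
    return tau1(tau2(t))

# phi is 1-periodic, and sampling t uniformly on T (which almost surely
# avoids the two pole locations) yields standard normal values:
random.seed(3)
vals = [phi(random.random()) for _ in range(100_000)]
m = sum(vals) / len(vals)
v = sum((x - m) ** 2 for x in vals) / len(vals)
print(abs(m) < 2e-2, abs(v - 1.0) < 5e-2)   # mean ~0, variance ~1
```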
We now ask for the optimal choice of the parameter ∆ such that the deterministic components of the sampling nodes ỹ of the R1Ls in the usFFT are as far away as possible from ∆ and 1/2 + ∆.

Lemma 3.1. Consider a rank-1 lattice of size M ∈ N with generating vector z ∈ Z^d, z_{j_0} ≢ 0 (mod M) for at least one j_0 ∈ {1, ..., d}, and the lattice nodes ỹ_{i,j} as defined in (3.4). Then, the minimal distance of the components ỹ_{i,j} to ∆ and 1/2 + ∆ can be bounded explicitly.

The proof of Lemma 3.1 is given in Appendix C.

Numerics
We now test the usFFT on different two-dimensional numerical examples. In particular, we consider the parametric PDE (1.1) with zero boundary condition and different random coefficients a(x, y) and right-hand sides f(x).
Since our algorithm yields an approximation u_{x_g}^usFFT(y) for each x_g ∈ T_G separately, we also compute the approximation error for each x_g ∈ T_G separately, using n_test = 10^5 different, randomly drawn test variables y^(j) from the underlying probability distribution. Here, ǔ(·, y^(j)) are the finite element solutions of the PDE for fixed parameters y^(j) and u^usFFT(x_g, ·) are our approximations from the usFFT. The parameter η denotes the used sFFT parameters as given in Table 4.1.
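A common choice for such a pointwise error (the exact metric used in the paper is not reproduced in this section, so this is an assumption) is the empirical relative ℓ2 error over the test set, sketched below with scalar stand-ins for ǔ and u^usFFT:

```python
import math, random

def rel_l2_error(u_ref, u_approx, ys):
    """Empirical relative l2 error at one spatial node x_g over the test
    parameters ys (a common metric; illustrative, not the paper's code)."""
    num = sum(abs(u_ref(y) - u_approx(y)) ** 2 for y in ys)
    den = sum(abs(u_ref(y)) ** 2 for y in ys)
    return math.sqrt(num / den)

# Toy illustration: hypothetical FE solution and usFFT approximation
# at a fixed node, differing by a small constant offset.
u_ref = lambda y: math.sin(2 * math.pi * y) + 1.0
u_approx = lambda y: math.sin(2 * math.pi * y) + 1.01
random.seed(4)
ys = [random.random() for _ in range(1000)]
err = rel_l2_error(u_ref, u_approx, ys)
print(0.0 < err < 0.02)   # small relative error for the small offset
```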
Here, N is the extension of the full grid [−N, N]^d that is used as the search space Γ ⊂ Z^d. Note that we also choose s_local = s. If we miss an important frequency component at one point x_g, it is very likely that it is contained in the detected index set of a neighboring mesh point. Therefore, the union over all points x_g should suffice to avoid losing frequencies, and we do not need a larger s_local. The choices for the threshold θ and the number of detection iterations r are common values and the same as in [32]. In particular, we choose the number of detection iterations r as well as the probabilities γ_A and γ_B as in the case G = 1, since we expect a huge overlap of the detected index sets and hence a small failure probability even for these parameter choices instead of the theoretical choices given in the proof of Theorem 1.1.
Further, we always use the random R1L approach in the role of Algorithm A to recover the projected Fourier coefficients in the dimension-incremental method, cf. Section 2.2 and [32]. We also tested the algorithm using the single and multiple R1L approaches mentioned in Section 2.2, but these did not achieve significantly smaller approximation errors while using larger numbers of samples, and therefore result in longer runtimes of the algorithm. Hence, it seems reasonable to stick with the random R1L approach here. We choose the target maximum edge length of the finite element mesh h_max = 0.075 in the FE solver. All examples consider the spatial domain D_x = [0, 1]^2, resulting in a finite element mesh T_G ⊂ D_x with G = 737 inner and 104 boundary nodes.
Further, we will also analyze the importance of and the interactions between our detected Fourier coefficients. To this end, we use the classical ANOVA decomposition of 1-periodic functions as given in [38] or [44, 29]. Note that, for instance, in [38] the ANOVA decomposition is already used in the proposed methods to obtain an adaptive selection of the most important approximation terms, which we realized in our method by simply comparing the size of the projected Fourier coefficients, cf. Sections 2.2 and 3.1.
In particular, we consider the variance of our approximation. Now we can study different subsets J ⊂ I and estimate the variance of the approximation using only these subsets. The fraction of variance that is explained using such a subset J is then called the global sensitivity index (GSI), see [41, 42], where we define ũ^usFFT_{xg,J}(y) := Σ_{k∈J} c^usFFT_k(ũ_xg) e^{2πi k·ϕ^{−1}(y)}. In our examples, we will mainly consider the subsets J_ℓ of all frequencies k with exactly ℓ non-zero components, i.e., J_ℓ := {k ∈ I : ‖k‖_0 := |{i ∈ {1, ..., d} : k_i ≠ 0}| = ℓ}, but of course several other choices of J might be interesting as well for different applications.
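By Parseval's identity, the variance of a 1-periodic Fourier sum is the sum of the squared moduli of its non-zero-frequency coefficients, so a GSI is just a ratio of such partial sums. A small sketch with made-up toy coefficients (not usFFT output):

```python
# Global sensitivity indices from approximated Fourier coefficients.
# By Parseval, Var(u) = sum_{k != 0} |c_k|^2 for a 1-periodic Fourier
# sum, and the GSI of a subset J is the fraction of variance carried
# by the coefficients with frequencies in J.

def gsi(coeffs, subset):
    """coeffs: dict mapping frequency tuple -> Fourier coefficient."""
    var = sum(abs(c) ** 2 for k, c in coeffs.items() if any(k))
    var_J = sum(abs(c) ** 2 for k, c in coeffs.items()
                if any(k) and k in subset)
    return var_J / var

coeffs = {
    (0, 0): 1.0,                            # expectation, carries no variance
    (1, 0): 0.3, (0, 1): 0.2, (2, 0): 0.1,  # one active dimension
    (1, 1): 0.05,                           # two active dimensions
}
# J_1: frequencies with exactly one non-zero component
J1 = {k for k in coeffs if sum(1 for ki in k if ki != 0) == 1}
print(gsi(coeffs, J1))
```

The same grouping by the number of non-zero components produces the GSI bars discussed for the examples below.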
Finally, one can also think about evaluating various quantities of interest of the approximation. Here, we will consider the expectation value E(u^usFFT_xg) as one example of such quantities. For comparison, we use a Monte-Carlo approximation of the expectation value of the finite element approximation using n_MC random samples.

Expectation value of the approximation
Computing the expectation value of our approximation u^usFFT_xg requires some additional effort, depending on the particular model and the possibly used periodization method. By definition, the expectation value is given by E(u^usFFT_xg) = ∫_{D_y} u^usFFT_xg(y) p(y) dy with p the probability density function of y. For the periodic model, we do not need any periodization, and the expectation value is simply the zeroth Fourier coefficient c^usFFT_0(u_xg). In the affine case, we use the tent transformation (3.2). Here, the random variable y is assumed to be uniformly distributed, but for this computation we work with the more general domain D_y = [α, β]^d. Note that the parameters α and β vanish completely in the resulting formula, which is thus independent of the particular domain. Finally, the lognormal model involves the more complicated transformation mappings ϕ_{j,∆} given in (3.3). Here, the random variable y is standard normally distributed, i.e., y ∼ N(0, I) with I the identity matrix of dimension d. Note that the resulting factors D_{k_j,∆} are exactly the same as the D_{k_j} in the affine case up to the correction term e^{2πi k_j ∆} due to the shift by ∆.
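For the periodic model, the reduction to the zeroth coefficient follows because the mean of e^{2πi k·y} over the unit cube vanishes for k ≠ 0. A toy numerical check of this fact (the coefficients are made up for illustration, not usFFT output):

```python
import random, cmath

# Expectation of a 2-dimensional Fourier sum under y uniform on
# [-1/2, 1/2]^2: the Monte-Carlo mean should approach the zeroth
# coefficient, since all other exponentials average to zero.
coeffs = {(0, 0): 0.7, (1, 0): 0.2, (0, -1): 0.1, (1, 1): 0.05}

def u(y):
    return sum(c * cmath.exp(2j * cmath.pi * (k[0] * y[0] + k[1] * y[1]))
               for k, c in coeffs.items())

random.seed(0)
mc = sum(u((random.uniform(-.5, .5), random.uniform(-.5, .5)))
         for _ in range(20000)) / 20000
print(abs(mc - coeffs[(0, 0)]))   # small Monte-Carlo error
```

For the affine and lognormal models the analogous computation picks up the model-specific factors D_{k_j} and D_{k_j,∆} from the periodization.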

Periodic example
We consider the example from [22, Sec. 6] using the domain D_x = (0, 1)^2 with right-hand side f(x) = x_2 and a random coefficient with constant c > 0 and decay rate µ > 1. Accordingly, for c < √6/ζ(µ) the uniform ellipticity assumption (2.1) is fulfilled. We test the usFFT with the stochastic dimension d = 10 on the two parameter choices µ = 1.2, c = 0.4 and µ = 3.6, c = 1.5 from [22]. The first choice seems to model a more difficult PDE, since the decay of the functions ψ_j w.r.t. j is very slow and we have a_min = 0.08690 and a_max = 1.91310. This range of a is wider and a_min is closer to zero than for the quickly decaying second parameter choice with a_min = 0.31660 and a_max = 1.68340.
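The quoted bounds can be reproduced numerically. The sketch below assumes a_min/max = 1 ∓ c ζ(µ)/√6 (an assumption consistent with the reported values, as the coefficient expansion itself is not reproduced here) and evaluates ζ via a truncated sum with Euler-Maclaurin tail terms:

```python
import math

def zeta(s, n=100000):
    """Riemann zeta via truncated sum plus Euler-Maclaurin tail terms."""
    return (sum(k ** -s for k in range(1, n + 1))
            + n ** (1 - s) / (s - 1) - 0.5 * n ** -s)

def bounds(mu, c):
    # assumed form a_min/max = 1 -/+ c*zeta(mu)/sqrt(6); it reproduces
    # the values 0.08690/1.91310 and 0.31660/1.68340 quoted in the text
    d = c * zeta(mu) / math.sqrt(6)
    return 1 - d, 1 + d

print(bounds(1.2, 0.4))   # approx (0.08690, 1.91310)
print(bounds(3.6, 1.5))   # approx (0.31660, 1.68340)
```

Under this form, the ellipticity condition a_min > 0 is exactly c < √6/ζ(µ).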
Figure 4.1 illustrates the total approximation error err^η_p(x_g) for p = 1 and p = 2 as well as the Monte-Carlo approximation of the expectation value ǔ_xg using n_MC = 10^6 samples for comparison. A more detailed insight into the decay of the error is given in Figure 4.2. There, the largest approximation error err^η_2 w.r.t. the nodes x_g is plotted against the number of samples used in the corresponding usFFT. Note that this number scales directly with the sparsity parameter s, while the extension parameter N has nearly no impact. Hence, the data points in Figure 4.2 are ordered from left to right from s = 100 to s = 4000. Finally, Figure 4.3 shows the cardinality of the sets J_ℓ, i.e., the number of detected frequencies with exactly ℓ non-zero components as given in (4.4), as well as the corresponding global sensitivity indices GSI(J_ℓ, u^usFFT_xg) given in (4.3). Note that these values also depend on the considered point x_g. Therefore, the bars show the smallest and largest GSI among all nodes x_g ∈ T_G as well as their median and mean value.

Discussion
The absolute error err^η_p in Figure 4.1 is very small compared to the function values of ǔ_xg. Thus, u^usFFT_xg is already a very good approximation for these relatively small sparsity parameters s and extension parameters N.
The periodic setting results in very quickly decaying Fourier coefficients. Obviously, the same holds for their projections computed in the dimension-incremental steps. In particular, most of the one-dimensional projections in steps 1 and 2a of Algorithm 1, e.g., all projections with a component k_t with |k_t| > 4, are so small at the start of each iteration that they are neglected immediately. Hence, the one-dimensional index sets I^(t) are independent of N (for large enough N), and so the choice of N has only a marginal impact. Note that we also tested our algorithm with smaller thresholds θ, but the additionally detected and not neglected frequencies did not change the approximation significantly in the end.
We observe an approximately linear behavior in the double-logarithmic Figure 4.2. Additional tests showed that even for smaller sparsity parameters 1 ≤ s < 100 the corresponding samples-error pairs fit into this model, i.e., there seems to be no pre-asymptotic behavior of our algorithm. In [22] the theoretical decay rates are often smaller than the error decay observed in numerical experiments. We also observe a relatively fast decay compared to these theoretical rates. On the other hand, the decay of the approximation error err^η_2 for the faster decaying random coefficient a with µ = 3.6 is not that much better than the decay for the more complicated example with µ = 1.2. It seems that our algorithm is capable of handling the more difficult problem very well, but does not yield much further advantage when applied to easier problems, i.e., with larger µ, larger a_min and a smaller range of the interval [a_min, a_max]. Note that most of the samples are required for the detection of the frequency set I and only a small fraction is really used for the final computation of the corresponding Fourier coefficients, cf. Section 4.5 and Remark 4.2. We also computed the approximation error err^η_∞ for different η for both parameter choices of µ and c. Obviously, these errors have to be larger than the shown errors err^η_2, but the actual magnitude of err^η_∞ is only about 10 or 15 times as large as the errors err^η_2. Hence, the pointwise approximation error seems to stay of a reasonable size for any randomly drawn y.
As we saw in Section 4.1, the expectation value of our approximation E(u^usFFT_xg) is simply its zeroth Fourier coefficient c^usFFT_0(u_xg). Since this coefficient is included and computed for each sparsity parameter s anyway, it seems at first sight that our different parameter choices should not influence the precision of its approximation. But for larger sparsity parameters s, we compute more Fourier coefficients in our algorithm, where possible aliasing effects should spread evenly among all of these coefficients, i.e., the particular so-called aliasing error on c^usFFT_0(u_xg), cf. [25, 32], gets smaller and the approximation improves. Unfortunately, this is not visible in our numerical tests, since the comparison value ǔ_xg behaves too poorly. In detail, we would have to investigate very small sparsity parameters s < 25 to observe the described effects. For all of our parameter choices η, the Monte-Carlo approximation ǔ_xg
with n_MC = 5 · 10^6 samples is not accurate enough to give insight into the particular behavior of our approximation of the expectation value. Figure 4.3 shows that there are no detected frequencies with all or nearly all components being active. Further, even though only 68 of the 4819 detected frequencies (excluding c_0) have exactly one non-zero component, i.e., are supported on the axis cross, they contain more than 99% of the variance of our approximation. So the higher-dimensional frequencies with two, three or four non-zero components seem to be nearly negligible for the approximation.
Affine example
As before, we get a_min = 1 − c ζ(µ) and a_max = 1 + c ζ(µ), such that for c < 1/ζ(µ) the uniform ellipticity assumption (2.1) is fulfilled. Here, we use the parameter choices from [16] with µ = 2 for a relatively slow decay and c = 0.9/ζ(2) ≈ 0.547 to end up with a_min = 0.1 and a_max = 1.9, which is very similar to the first parameter choice in the periodic case. We choose the stochastic dimension d = 20 as in [16]. Figure 4.5 shows the largest error err^η_2 w.r.t. the nodes x_g for different parameter settings η. This time, we can observe a small increase in the number of used samples for larger extensions N, which was not visible in the periodic example. Hence, the parameter settings η = I to XI have monotonously increasing sampling sizes, i.e., the data points in Figure 4.5 are ordered from left to right w.r.t. increasing η this time. Figure 4.6 again illustrates the cardinality of the sets J_ℓ as well as their GSI.

Discussion
We note that even for small sparsity parameters s the approximation E(u^usFFT_xg) seems to be quite accurate in Figure 4.4. Unfortunately, for all parameter choices η except I and II, the magnitude of the errors does not decrease any further than 1.5 · 10^−5, which is probably caused by the poor performance of the Monte-Carlo approximation ǔ_xg.
The magnitudes of the errors err^η_2 in Figure 4.5 are already very low for small sparsity parameters s compared to the expected function values shown in Figure 4.4a. We note that there is an obvious improvement for each sparsity parameter s when we progress from N = 32, the first data point in each cluster, to N = 64, the second data point. In the periodic case, the important frequencies are very well localized around zero, such that the choice of N had almost no impact. This time, we really lose some accuracy if we choose the smaller extension N = 32. For the sparsity parameter s = 2000 we also see this effect when progressing from N = 64 to N = 128. The overall decay of the error is a lot slower compared to the periodic example. This is probably mainly caused by the non-smooth tent transformation used. The errors err^η_∞ revealed a similar behavior as in the periodic setting and are again not larger than at most 20 times the error err^η_2. Since the Fourier coefficients do not decay as fast as in the smooth periodic case, we detected a significantly larger number of one- and two-dimensional couplings in Figure 4.6. Again, the frequencies with only one non-zero entry explain the largest part of the variance of the function, but this time the minimum percentage is lower than in the periodic example with only about 94.5%. Accordingly, the importance of the two- and three-dimensional pairings grew slightly. The large number of important coefficients with only one, two or three non-zero entries also results in nearly no detected significant frequencies with more than three non-zero entries. For example, the 54 frequencies in J_4 will vanish when working with larger extensions N, since other frequencies with fewer non-zero entries are preferred in that case. So even though we are working with the moderate stochastic dimension d = 20, we do not detect any frequencies where half or even only a quarter of these dimensions are active simultaneously.

Lognormal example
We consider a two-dimensional problem based on the example in [9]. In [9], the stochastic dimension d = 4 has been used. Here, we will work with d = 10 to obtain a more complicated and higher-dimensional problem setting. We use a standard normally distributed random variable y ∼ N(0, I) with I the identity matrix of dimension d as before. Hence, for each x there holds 0 < a(x, y) < ∞ for any y. However, the constants 0 < a_min ≤ a_max < ∞ do not exist in this example, since b(x, y) can become arbitrarily small or large. Therefore, the problem is neither uniformly elliptic nor uniformly bounded, which complicates the analysis of this problem tremendously. We can still use it for our numerical tests, since we only need the solvability of the differential equation for fixed values of y. Further, we have b(x, y) ∈ [−3, 3] and therefore exp(b(x, y)) ∈ [e^−3, e^3] ≈ [0.05, 20.09] with a probability of more than 99% for each x ∈ D_x, i.e., extremely small or large values of a(x, y) are very unlikely to appear.
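The "more than 99%" statement can be checked for a Gaussian: assuming b(x, ·) is normally distributed with mean 0 and some standard deviation σ(x) (as for a linear expansion in the standard normal y_j; σ here is a hypothetical stand-in, not a quantity from the text), P(|b| ≤ 3) has a closed form via the error function:

```python
import math

def prob_in_pm3(sigma):
    """P(|b| <= 3) for b ~ N(0, sigma^2)."""
    return math.erf(3.0 / (sigma * math.sqrt(2.0)))

# e.g. for sigma = 1 the classical three-sigma rule gives ~99.73%,
# consistent with the "more than 99%" statement in the text
print(prob_in_pm3(1.0))
```

Any σ below roughly 3/2.576 ≈ 1.16 keeps the probability above 99%.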
Figure 4.7a once again illustrates the Monte-Carlo approximation of the expectation value ǔ_xg with n_MC = 10^6 samples. The decay of the largest error err^η_2 w.r.t. the nodes x_g is shown in Figure 4.8, where the data points are ordered from left to right w.r.t. increasing η as in Figure 4.5. Finally, Figure 4.9 shows the cardinality of the sets J_ℓ as well as their GSI for this example.

Discussion
We note that the pointwise solution in Figure 4.7a has a more interesting structure than for the other examples above, mainly caused by the lognormal random coefficient and the non-constant right-hand side f(x). Nevertheless, the approximations E(u^usFFT_xg) achieve small errors, which are shown in Figure 4.7. This time, a further increase of the sparsity parameter s and the extension N still increases the accuracy of our approximations, so the stagnation due to the limitations of the Monte-Carlo approximation ǔ_xg, which we saw in the previous examples, does not occur yet.
The pointwise errors err^η_2(x_g) behave slightly worse but are still very good, as we see in Figure 4.8. Again, the increase of the extension N shows visible improvements of the approximation error err^η_2. The decay rate is lower than before, matching our expectations, since the lognormal example is far more difficult than the affine or periodic examples. Note that once again the slope considers all data points shown, while specific decays for fixed extensions N might be slower or faster. Further, the size of the error err^η_∞ is again about 10 times the size of err^η_2, revealing a good pointwise approximation w.r.t. the random variable y in this scenario as well.
We notice a similar distribution of the detected frequencies k to the index sets J_ℓ ∩ I as before, cf. Figure 4.9. The key difference is the size of the GSI for each of these index sets. The range of the GSI for J_1 increased significantly; the minimal portion of variance is now only about 30%. Obviously, the GSI for the other index sets J_ℓ grew accordingly. This is probably caused by the more difficult structure of the lognormal diffusion coefficient a and the correspondingly more difficult structure of the solution, which is reflected in larger differences between the optimal frequency sets I_xg, g = 1, . . ., G, cf. Remark 4.1. Nevertheless, we again detect nearly no significant frequencies k with 4 or more active dimensions, as in the previous examples.
Remark 4.1. As mentioned before, the output of the usFFT contains more than s frequencies, since we join the detected index sets in each dimension increment and use no thresholding technique to reduce the number of found frequencies afterwards. While we have no reasonably tight theoretical bounds on the size of the output yet, we can investigate the number of output frequencies in our numerical tests. In detail, we express the detected output sparsity s_real as a multiple of the given sparsity parameter s, i.e., s_real = q · s with some factor q ∈ R.
In the numerical tests for the first periodic example in Section 4.2, i.e., µ = 1.2 and c = 0.4, we have q ∈ [2.41, 2.74], where the larger values of q tend to appear for smaller sparsity parameters s. For the quickly decaying example, i.e., µ = 3.6 and c = 1.5, we have q ∈ [1.9042, 2.45] and again the larger values of q are attained for small sparsity parameters s.
The affine model in Section 4.3 results in q ∈ [2.06, 2.186], where q < 2.1 is attained only for η = I, II and IV, i.e., parameter settings with small sparsity parameters s and extension N = 32.
Finally, in the complicated lognormal case in Section 4.4, we observe q ∈ [4.776, 5.15]. While the values above 5 only appear for η = I and II, we still have significantly larger factors q than before. However, the magnitude of q is still very small compared to the size |T_G| = G = 737 in our examples.
Our observation is consistent with recent results presented in [22], which consider the periodic model only. The crucial common feature is that the pointwise approximations u_xg can be regarded as elements of a joint reproducing kernel Hilbert space with uniformly good kernel approximants.
Overall, the factor q in our examples is much smaller than G, which would be the worst possible factor in the case that all I_xg are disjoint. Hence, as already mentioned in Section 1, the complexities given in Theorem 1.1 are way too pessimistic and the true number of sampling locations and computational steps needed is much smaller in all of our numerical examples.
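The reason q stays far below G is the heavy overlap of the per-node index sets I_xg. A toy simulation of the joining step (the pool of candidate frequencies and the shared importance decay are made up for illustration, not taken from the algorithm):

```python
import random
from itertools import accumulate

# Each of G "nodes" detects s frequencies from a common candidate pool,
# strongly biased towards the same (most important) frequencies, and the
# detected sets are joined as in the usFFT.  The union stays far below
# the worst case of G disjoint sets.
random.seed(1)
G, s = 200, 100
pool = list(range(2000))
cum = list(accumulate(1.0 / (j + 1) for j in pool))   # shared importance decay

detected_union = set()
for _ in range(G):
    picks = set()
    while len(picks) < s:
        picks.add(random.choices(pool, cum_weights=cum, k=1)[0])
    detected_union |= picks

q = len(detected_union) / s
print(q)   # far below the worst case q = G
```

The more the per-node importance profiles agree, the closer q gets to 1; disjoint profiles would drive it towards G.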

Comparison to given frequency sets
The main effort of the usFFT lies in detecting the index set I ⊂ Γ. The computation of the corresponding Fourier coefficients in the final step of Algorithm 1 needs significantly fewer samples than the detection steps before. Hence, the question arises whether an a priori choice of the index set I should be preferred to reduce the computational cost, cf. Remark 4.2. Therefore, we now consider the following kinds of index sets:
• an axis cross with uniform weight 1,
• a hyperbolic cross with uniform weight 1/4,
• hyperbolic crosses with slowly or quickly (q = 1 or 2) decaying weights 1/j^q,
• l1-balls with slowly or quickly (q = 1 or 2) decaying weights 1/j^q.
The Fourier coefficients c_k(u_xg) are approximated using the same multiple R1L approach as in step 3 of our Algorithm 1, i.e., we just skip steps 1 and 2 by choosing the index set I instead of detecting it. The magnitude of the errors is considerably larger than for comparable parameter settings of the usFFT, e.g., η = I to III, especially for the periodic example. Further, we also see that the particular structure of the index set plays an important role. Obviously, a cleverly chosen index set reduces the size of the approximation error tremendously, especially in the periodic settings. But finding a good or even optimal choice of the index set is highly non-trivial, since it requires sufficient a priori information about the PDE and the structure of its solution or additional computational effort, e.g., to determine suitable weights for a given index set structure. This can be observed, for example, when comparing the hyperbolic cross index sets for the periodic examples. The uniform weights achieve the best results for µ = 1.2, but cannot keep up at all with the decaying weights for the faster decay rate µ = 3.6. On the other hand, even if we know that there is a certain decay in our random coefficient, it is not clear how to choose suitable decay rates for the weights in order to guarantee reasonable results, specifically in pre-asymptotic settings, which are the rule rather than the exception when numerically determining solutions of high-dimensional problems.
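Such candidate sets can be constructed explicitly. The sketch below uses one common convention for the weighted variants (scaling dimension j by j^q, i.e., weight 1/j^q), which is our assumption since the paper's exact set definitions are not reproduced here; d and N are kept small for illustration:

```python
from itertools import product

d, N, q = 3, 4, 1   # illustrative values only

def prod_weight(k, q):
    # product-type weight: prod_j max(1, j^q * |k_j|)
    p = 1
    for j, kj in enumerate(k, start=1):
        p *= max(1, (j ** q) * abs(kj))
    return p

def axis_cross(d, N):
    I = {tuple([0] * d)}
    for j in range(d):
        for v in range(-N, N + 1):
            k = [0] * d
            k[j] = v
            I.add(tuple(k))
    return I

def weighted_hyperbolic_cross(d, N, q):
    rng = range(-N, N + 1)
    return {k for k in product(rng, repeat=d) if prod_weight(k, q) <= N}

def weighted_l1_ball(d, N, q):
    rng = range(-N, N + 1)
    return {k for k in product(rng, repeat=d)
            if sum((j ** q) * abs(kj) for j, kj in enumerate(k, start=1)) <= N}

print(len(axis_cross(d, N)),
      len(weighted_hyperbolic_cross(d, N, q)),
      len(weighted_l1_ball(d, N, q)))
```

The cardinalities already show how strongly the chosen structure and weights restrict which frequency interactions the approximation can represent.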
The usFFT does not depend on this kind of information, as its choice of the frequency set is fully adaptive and the only required a priori information is the search space Γ, which can be chosen sufficiently large without disturbing the results of the algorithm. Further, the detected frequency set I provides this additional information about the structure of the solution u as well as about the dependence on the random variables y. In other words, the additional amount of samples needed for the usFFT makes this structural information unnecessary as an input: the algorithm detects it on its own and provides a possibility to extract it afterwards from the output.
Remark 4.2. The computations in this section and in step 3 of the usFFT are performed using the multiple R1L approach for the efficient computation of Fourier coefficients for a given frequency set I as proposed in [24]. From [24, Cor. 3.7] we get a bound on the number of sampling nodes M used. Since we are working with c = 2 and δ = 0.5 as in [25, Alg. 3], we arrive at M ≤ 2 ln(2|I|) · 4(|I| − 1). Note that this upper bound is very rough and the actual number of used sampling nodes in almost all numerical experiments is much lower.
As stated several times above, the number of samples used in step 3 of the usFFT is just a small fraction of the total number of samples used when applying the usFFT. In particular, the computation of the actual Fourier coefficients c^usFFT_k(u_xg) for the detected frequency set I requires roughly 0.4% or 0.3% of the total sampling amount for the two different parameter choices of the periodic example in Section 4.2, around 0.1% in the affine case in Section 4.3 and about 0.65% for the lognormal model from Section 4.4. In [9, 44], a data-driven method was proposed, which is capable of computing approximations for multiple right-hand sides f(x) from a certain function class. Our usFFT approach can also be generalized in a similar, data-driven way: For a given class of functions f(x) or even f(x, y), we can use the usFFT to compute the frequency set I for one randomly selected right-hand side f, or randomly select multiple right-hand sides f and compute the union of the corresponding index sets I_f by means of (a slight modification of) the presented usFFT. In each case, we end up with a frequency set I, which is probably a good choice for all the functions f in the given class, since they are hopefully very similar to each other. Hence, we can use this index set I as a starting point and compute approximations of the corresponding Fourier coefficients c_k(u_xg), k ∈ I, as done above. This approximation of u_xg is then probably a lot better, i.e., the detected index set I is a better localization of the largest Fourier coefficients, than some a priori choice.

Summary
The proposed dimension-incremental method provides an efficient possibility to adaptively compute solutions to parametric PDEs involving high-dimensional random coefficients. The non-intrusive behavior allows the use of different, suitable PDE solvers and therefore makes our algorithm highly adaptable to different problem settings as well. We show that the number of PDE solutions needed can be decreased by our method compared to the naive repetitive approach even in the worst case. The numerical experiments underline this and show the functionality of our method for practical examples.

By [25, Lem. 4.5] and the same argumentation, the cardinality of the finally detected frequency set I^(1,...,d) used in step 3 of our algorithm is bounded from above by s G, since r = 1 and s_local = s hold for t = d in step 2. Accordingly, we can apply [25, Alg. 1] in order to construct a spatial discretization Y of I^(1,...,d) based on multiple R1Ls, which has a cardinality bounded by |Y| ≲ max(s G, N_Γ) log(s G/γ_B), where γ_B is the failure probability of the construction of this spatial discretization, cf. [24, Thm. 4.1]. As a consequence, we save a linear factor r as well as the r in the log term compared to the worst case arithmetic complexity, cf. [32, Sec. 3.2.2] for a similar argumentation. In addition, the same argumentation saves a factor r in the logarithmic term of the upper bound on the number of sampling locations in step 2 with the same probability. Later, we specifically choose the parameters r, γ_A and γ_B such that the estimates hold with high probability; for that reason, we already call these complexities with-high-probability complexities here.

Sample and computational complexity of the usFFT
For computing the G FFTs of step 3 we apply [25, Alg. 2]. The sample complexities and computational complexities of the usFFT due to these modifications are given in Table B.1; see [32, Tab. 3.2] for comparison to the sFFT. Here, the only changes are several appearances of the parameter G, i.e., for G = 1 we recover the complexities of the sFFT.

Part 2: Parameter choices
We continue with the aforementioned second big part, where we need to discuss suitable choices of r, γ_A and γ_B to obtain our desired failure probability δ.
To this end, we first consider the projection failure probability, i.e., the failure that occurs if important projected Fourier coefficients are close to zero and thus not detectable. The number r of detection iterations determines how many of these projections are computed.

Table B.1: Sample complexities and computational complexities with high probability (w.h.p.) and in the worst case (w.c.) for the different steps of Algorithm 1, where the efficient identification by [32, Alg. 4] is used in step 2 and the multiple R1L approach from [25, Alg. 1] in step 3.
The more projections are considered, the smaller the probability that a specific projected Fourier coefficient is small for all of them and hence not detectable. Therefore, the parameter r directly controls this projection failure probability.

The number r of detection iterations
For G different such trigonometric polynomials p^(g), we then apply the union bound. Therefore, the probability that the projected Fourier coefficients are smaller than θ for at least one frequency and at least one signal is bounded by δ/(3 d s).
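The theoretical choice of r quoted in the appendix, r = ⌈2s(log 3 + log d + log s + log G − log δ)⌉ from [25, Lem. 7], can be evaluated directly; the parameter values below are illustrative examples, not tied to a specific run in the text:

```python
import math

# Theoretical number of detection iterations r from [25, Lem. 7]:
# r = ceil(2 s (log 3 + log d + log s + log G - log delta)).
def r_theoretical(s, d, G, delta):
    return math.ceil(2 * s * (math.log(3) + math.log(d) + math.log(s)
                              + math.log(G) - math.log(delta)))

print(r_theoretical(100, 20, 737, 0.1))
```

The resulting r is in the thousands, far above the small common values used in the numerical tests of Section 4, which illustrates why the theoretical parameter choices are considered overly pessimistic there.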
The failure probabilities γ_A and γ_B
In the remaining part of Part 2, we investigate the choices of the failure probabilities γ_A and γ_B. We start with the parameter γ_A, which is in fact the failure probability of [32, Alg. 4] in the role of Algorithm A. When choosing γ_A := δ/(3 d s G), we observe that the probability that at least one of the G applications of Algorithm A fails in step 2d (for fixed t and i) is bounded from above by δ/(3 d s) due to the union bound, again as in [32, Sec. 3.2.3]. Last, we fix the parameter γ_B := δ/(3 d), i.e., the failure probability of step 3 is bounded from above by γ_B, cf. [24, Thm. 4.1]. Here, no modification is needed.

Final Step: Parameter insertion and union bounds
The new parameter choices for r, γ_A and γ_B lead to the same failure probabilities for the detection of projected coefficients (δ/(3 d s)), for Algorithm A for fixed t and i (δ/(3 d s)) and for step 3 of Algorithm 1 (δ/(3 d)). Therefore, we can now use a union bound over the different steps of our algorithm similar to [25, Thm. 9]. It shows that the total failure probability is indeed bounded from above by δ.

Remark C.1. Note that the arg max in Lemma 3.1 is not unique in general, since there also exist several values ∆ ≥ 1/(2M) attaining this maximum, e.g., ∆ = 3/(4M), which can be proven analogously. In our numerical experiments in Section 4, we always work with ∆_opt = 1/(4M), which is the smallest optimal ∆ > 0 as we saw in the lemma above.
Also, if we dropped the assumption that M is prime, we could run into problems if z_j and M are not coprime for some j ∈ {1, ..., d}, since then {ỹ_{i,j} : i = 0, ..., M − 1} is only a proper subset of {n/M : n = 0, ..., M − 1}. But this case is irrelevant here, since our algorithm only uses prime lattice sizes M.
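The optimality of the shift can be checked numerically: for a prime lattice size M, the deterministic node components are the multiples n/M, and ∆ = 1/(4M) maximizes their minimal wrap-around distance to both ∆ and 1/2 + ∆. A small sketch verifying this for one prime M:

```python
# Numerical check of the optimal shift Delta = 1/(4M): for a prime
# lattice size M the deterministic node components are the multiples
# n/M, and their minimal wrap-around distance to Delta and to
# 1/2 + Delta equals 1/(4M), which no other shift can exceed.
def min_dist(M, delta):
    nodes = [n / M for n in range(M)]
    targets = [delta % 1.0, (0.5 + delta) % 1.0]
    def circ(a, b):
        d = abs(a - b) % 1.0
        return min(d, 1.0 - d)
    return min(circ(n, t) for n in nodes for t in targets)

M = 31
best = min_dist(M, 1 / (4 * M))
print(best, 1 / (4 * M))   # both approximately 1/(4M)
# no shift on a fine grid of candidates does better
assert all(min_dist(M, d / 1000) <= best + 1e-12 for d in range(1000))
```

The value 3/(4M) mentioned in Remark C.1 attains the same maximal distance, which the candidate scan above confirms implicitly.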
E(u^usFFT_xg) = ∫_{D_y} u^usFFT_xg(y) p(y) dy, where p is the probability density function of the random variable y. For the periodic model, u^usFFT_xg(y) = Σ_{k∈I} c^usFFT_k(u_xg) e^{2πi k·y} with I the frequency set and c^usFFT_k(u_xg) the corresponding approximated Fourier coefficients computed by the usFFT. The random variable y is assumed to be uniformly distributed in D_y = [−1/2, 1/2]^d in this case. Hence, we have E(u^usFFT_xg) = Σ_{k∈I} c^usFFT_k(u_xg) δ_{k,0} = c^usFFT_0(u_xg).

Figure 4.2: Largest error err^η_2 w.r.t. the nodes x_g for all parameter settings η displayed in Table 4.1 for the periodic example.

Figure 4.4 illustrates the Monte-Carlo approximation of the expectation value ǔ_xg with n_MC = 10^6 samples as well as the pointwise error |ǔ_xg − E(u^usFFT_xg)| for two different parameter choices η, with E(u^usFFT_xg) as given in (4.5).

Figure 4.5 again shows the largest error err^η_2 w.r.t. the nodes x_g.

Figure 4.4: The MC approximation ǔ_xg and the pointwise errors |ǔ_xg − E(u^usFFT_xg)| for η = I and II for the affine example.

Figure 4.5: Largest error err^η_2 w.r.t. the nodes x_g for the parameter settings η = I to XI displayed in Table 4.1 for the affine example.

Figure 4.8: Largest error err^η_2 w.r.t. the nodes x_g for the parameter settings η = I to XI displayed in Table 4.1 for the lognormal example.

Figure 4.9: Cardinality (left, blue, solid) of the index sets J_ℓ and the corresponding largest (right, orange, striped), smallest (right, yellow, solid), mean (dashed line) and median (dotted line) of the global sensitivity indices GSI(J_ℓ, u^usFFT_xg) w.r.t. x_g for the lognormal example with s = 2000, N = 32.

Figure 4.10 illustrates the largest error err^η_2 w.r.t. the nodes x_g ∈ T_G for the previously considered periodic and affine examples, cf. Sections 4.2 and 4.3, with these given frequency sets I for various refinements N.

Figure 4.10: Largest error err^η_2 w.r.t. the nodes x_g for the periodic and affine examples with given frequency sets.
Now, we need to discuss the computational complexities of the individual steps of the usFFT. Obviously, step 1 applies d r G different one-dimensional FFTs of lengths at most N_Γ, which yields a computational complexity in O(d r G N_Γ log(N_Γ)). In step 2, we apply [32, Alg. 4] ((d − 2) r + 1) G times, where the sampling set is a union of L ∈ O(log(r s G N_Γ/γ_A)) R1Ls of size at most O(max{s, N_Γ}) and the input set of frequencies J_t is bounded from above by O(r s G N_Γ) in its cardinality, which yields an arithmetic complexity in O(d r G (max(s, N_Γ) log(s N_Γ) + d r s G N_Γ) log(r s G N_Γ/γ_A)) ⊂ O(d^2 r^2 s G^2 N_Γ log^2(r s N_Γ G/γ_A)) in the worst case. Moreover, we observe |J_t| ≲ s G N_Γ with a certain probability, since the signals p are all trigonometric polynomials, and in the case where Algorithm A never fails we have ∪_{i=1,...,r} J_{t,i,g} ⊂ I^(1,...,t)_g with |I^(1,...,t)_g| ≤ |I^(1,...,d)_g| ≤ s. Applying [25, Alg. 2] for the G FFTs of step 3 yields a computational complexity in O(G (log(s G/γ_B) max(s G, N_Γ) log(s G N_Γ) + s G (d + log(s G)))) ⊂ O(G max(s G, N_Γ) log(s G/γ_B) (d + log(s G N_Γ))).
Step 1: samples O(d r N_Γ), computation O(d r G N_Γ log N_Γ).
Step 2 (w.h.p.): samples O(d r max(s, N_Γ) log(s G N_Γ/γ_A)), computation O(d^2 r s G^2 N_Γ log^2(s G N_Γ/γ_A)).
Step 2 (w.c.): samples O(d r max(s, N_Γ) log(r s G N_Γ/γ_A)), computation O(d^2 r^2 s G^2 N_Γ log^2(r s G N_Γ/γ_A)).
Step 3: samples O(max(s G, N_Γ) log(s G/γ_B)), computation O(G max(s G, N_Γ) log(s G/γ_B) (d + log(s G N_Γ))).

We consider a single trigonometric polynomial p ≢ 0 with min_{h∈supp p} |p̂_h| ≥ 3θ and Γ ⊃ supp p, |supp p| ≤ s. Choosing r = ⌈2s(log 3 + log d + log s + log G − log δ)⌉, as given in [25, Lem. 7], yields a probability of at most δ/(3 d s G) that all the projected Fourier coefficients are less than θ for at least one frequency.

Table 4.1: Parameter settings η for the numerical tests of Algorithm 1.