A sample efficient sparse FFT for arbitrary frequency candidate sets in high dimensions

In this paper a sublinear time algorithm is presented for the reconstruction of functions that can be represented by just few out of a potentially large candidate set of Fourier basis functions in high spatial dimensions, a so-called high-dimensional sparse fast Fourier transform. In contrast to many other such algorithms, our method works for arbitrary candidate sets and does not make additional structural assumptions on the candidate set. Our transform significantly improves upon the other approaches available for such a general framework in terms of the scaling of the sample complexity. Our algorithm is based on sampling the function along multiple rank-1 lattices with random generators. Combined with a dimension-incremental approach, our method yields a sparse Fourier transform whose computational complexity only grows mildly in the dimension and can hence be efficiently computed even in high dimensions. Our theoretical analysis establishes that any Fourier $s$-sparse function can be accurately reconstructed with high probability. This guarantee is complemented by several numerical tests demonstrating the high efficiency and versatile applicability for the exactly sparse case and also for the compressible case.


Introduction
The problem of reconstructing or approximating a function from a finite number of samples is one of the central objects of study in approximation theory. In finite dimensional function spaces (or function spaces that allow for good finite-dimensional approximations), this problem can be formulated as a linear system Ax = y. Here y is the vector of observed function values, x is the (unknown) coefficient vector of the function in a given basis B, and the matrix A represents the linear map that maps basis coefficients to function evaluations; knowing the basis B and the sampling points, A can be explicitly determined. The focus of this paper will be on the case where A is the discrete Fourier transform (DFT) matrix or some submatrix of it.
When A is left-invertible with left inverse L, as it is the case for the full DFT matrix, one can find x via the the matrix-vector product of L and y. The computational complexity of such a matrix-vector multiplication can, in general, not be improved beyond a linear scaling in the number of entries of the matrix even if the matrix inverse has been precomputed, as this is what is needed just to read the matrix. In the case of DFT matrices, further acceleration is possible via the Fast Fourier transform (FFT) [11], which improves the complexity of multiplying the (inverse) discrete Fourier matrix of size n × n with an arbitrary vector of length n from O(n 2 ) to O(n log n). An extension of this result for irregular sampling nodes with similar advantages is the nonequispaced fast Fourier transform (see e.g. [28]). Further optimization is possible if the frequencies, i.e., the multi-indices of the basis functions, belong to special structured sets such as hyperbolic crosses, cf. [13].
A natural next step is to consider functions where the frequencies of the non-zero Fourier coefficients form a relatively small set I ⊂ Z d within a possibly huge candidate set Γ ⊂ Z d , but for which the set I is unknown. Such functions are commonly referred to as Fourier sparse. Here a function is s-sparse in the given basis if it can be expressed as a linear combination of only at most s basis elements, i.e., s ≥ |I|. Naturally, this reduced model approach is assumed in order to reduce both the number of samples required and the computation time. The crucial challenge in this approach is the identification of I, which is a much harder task compared to the classical approach where only the Fourier coefficients, i.e., the entries of coefficient vector x, are unknown.
The focus in the area of compressive sensing has been on the sample complexity of this problem. Here, one seeks to recover a sparse signal from as few samples as possible. It has been shown that recovery is possible for a number of random samples that scales linearly in the sparsity s up to logarithmic factors [5,31], even when the signal is sparse in an arbitrary incoherent basis [4] or a wavelet basis [25]. The resulting methods cannot be considered to be fast transforms per se, as the computation time required for most reconstruction procedures scales at least linearly in the number of candidates |Γ| ≫ s.
For any sublinear time algorithm to recover Fourier sparse signals, significant subsampling compared to |Γ| must be an integral part. For the case of d = 1 spatial dimension, a randomized approach that achieves runtime scaling quadratically in terms of the sparsity s (up to logarithmic factors in |Γ|) was presented in [16]. A deterministic, combinatorial algorithm with runtime complexity scaling quadratically in the sparsity s was presented in [19,20]. Improved randomized algorithms with runtime scaling linearly in the sparsity were introduced in [15,19,2,1,20] and an improved deterministic algorithm in [26]. The latter one was enhanced once more in [10]  case d ≥ 2, and new algorithms have been developed. In Table 1.1, we give an overview on existing multi-dimensional sparse FFT approaches and the ones presented in this paper. We compare sample and computational complexities as well as the general settings and types of sampling strategies.
The deterministic approach in [26] and its variant for noisy samples [10] have been extended to high spatial dimensions d in [6] and [7], respectively, with runtime complexity O (d s log s) and sample complexity O (d s) on average (both with an additional factor of log N Γ for the variant addressing noisy samples), cf. Table 1.1. The method is highly efficient since the average case sample complexity is best possible, as a frequency set I ⊂ Z d of cardinality |I| = s has d s many entries, and correspondingly, the average case runtime complexity is best possible up to a logarithmic factor. However, the approach uses a random signal setting, i.e., the expectations and success probabilities are computed with respect to active frequencies collected in I that are assumed to be distributed uniformly at random in a full (finite) tensor product grid as candidate set Γ. Especially, smaller structured subsets, such as (subsets of) hyperbolic crosses, do not fit into this model assumption. In contrast to this, the other methods mentioned in Table 1.1 use an "arbitrary signal" model, i.e., the active frequencies can be an arbitrary subset of a suitable candidate set Γ.
In [27], the author presents a deterministic and noise robust high-dimensional sparse FFT approach. The corresponding complexities scale merely quadratic with respect to the sparsity s. However, the computational complexity of the Fourier transform O d 3 s 2 N 2 Γ log N Γ and the computational complexity of the construction of the sampling set O d 6 s 2 N Γ log 3 N Γ indicates that the approach has advantages only for moderate expansions N Γ and moderate dimensions d.
The randomized approach in [1] has also been generalized to Fourier transforms in constant dimension [18], but due to dimensional scaling of d O(d) in the runtime as well as sample complexity, cf. Table 1.1, the approach will not be feasible in higher spatial dimensions d.
In contrast, the deterministic approach in [20] has been shown to generalize to high spatial dimensions d with a quadratic scaling of the complexities in the sparsity s and a polynomial scaling d 4 in the spatial dimension, cf. Table 1.1, but may exhibit limitations with respect to its numerical stability on large candidate sets Γ, in particular in high spatial dimensions d. The reason for this issue is a transformation of the d-dimensional frequency domain Γ, which is assumed to be a tensor product grid, to a one-dimensional frequency domain [0,Ñ ), wherẽ N |Γ| is necessary in order to obtain a unique mapping between both frequency grids. Subsequently, one applies a one-dimensional sparse FFT approach to the new one-dimensional problem for typically hugeÑ , which suffers from numerical issues due to the fact that highly oscillating basis functions -with then possibly neighboring one-dimensional frequenciesare difficult to distinguish when using only a few samples. Remarkably, the locations of the samples and the algorithm itself are fully deterministic, and the method is guaranteed to successfully detect all active frequencies (on a machine with sufficiently high numerical precision). The related randomized version in [20] uses a random subsampling and achieves a linear scaling of the complexities in s, but also suffers from potential numerical issues for large d.
The aforementioned approaches share the main characteristic that they are specifically designed for full (finite) tensor product grids in frequency domain as candidate sets Γ. In many applications, however, the function space under consideration motivates a significantly reduced set Γ, such as a hyperbolic cross, or possibly an unstructured set Γ. So only those Fourier s-sparse functions need to be considered with active frequencies in the candidate set Γ.
High-dimensional sparse fast Fourier transforms for such scenarios have been designed, for example, using rank-1 lattices [30,23]. These methods use a dimension-incremental pairing technique, which can also be found in [32,29]. We describe the general ideas behind this technique in more detail later in this section. The required runtime complexity of the algorithm in [23] is O d 2 s 2 N Γ log 3 s with high probability, where N Γ is the expansion of Γ, cf. (2.5), and the sample complexity is O d s 2 N Γ log 2 s , cf. Table 1.1. Due to the mild dependence on the spatial dimension d and the sparsity s, this approach is well suited for high-dimensional problems.
Furthermore, a much more general approach with runtime complexity O d 6 s 5 log 4 (d s N Γ ) and sample complexity O d 5 s 3 log 4 (d s N Γ ) was presented in [8,9], which has also been applied to more general tensor product bases. It is based on a similar dimension-incremental pairing technique as in [30,23] and internally uses ideas from compressed sensing.
One main contribution of this paper is the design of a sample-efficient sparse fast Fourier transform for Fourier sparse functions in high spatial dimensions d with frequencies in a given arbitrary candidate set Γ ⊂ Z d , |Γ| < ∞, where in particular Γ may be much smaller than a full tensor product grid. The runtime complexity exhibits an additive joint dependence on s and |Γ| up to logarithmic factors and -most importantly -a sample complexity that is linear in s times a logarithmic factor in |Γ|. We stress on the fact that the algorithm succeeds with high probability, where the failure probability does not depend on the structure of the candidate set Γ or the active frequency set I of sparsity s. In particular, the presented estimates still hold true even if the frequency set I is arbitrarily clustered. More precisely, our main result reads as follows.  1) and the parameter δ ∈ (0, 1). Then, there exists a sampling strategy based on random rank-1 lattices using less than 37 |I| (ln |Γ| − ln δ) sampling values of a multivariate trigonometric polynomial p ∈ Π I in order to identify all frequencies k ∈ I belonging to the non-zero Fourier coefficientsp k = 0 with probability at least 1 − δ. This identification and the computation of the Fourier coefficientsp k can be realized by Algorithm 2 in less than arithmetic operations with an absolute constant C > 0.
Remark 1.2. In Theorem 1.1 we can relax the subset property (1.1) to with the consequence that only the frequencies k ∈ Γ ∩ I can be identified. Moreover, the constant 10.33 in (1.1) can be adapted for specific applications, see also Remark 2.8.
In addition to the significant improvement for arbitrary sampling sets, Theorem 1.1 also yields significant acceleration for sparse Fourier transforms on a full tensor grid, the setup studied in many of the works discussed above. Namely, in high spatial dimensions, our result can be combined with a so-called dimension-incremental approach, which is also a key ingredient of many of the currently best known algorithms for sparse fast transforms. The idea of such approaches is to identify the multi-indices corresponding to the frequencies of the non-zero Fourier coefficients component by component, where in each step one works with the candidate set Γ consisting of all indices that agree with the previously identified partial component vectors, see Section 3 for more details.
To identify the components of the active multi-indices, we will use Theorem 1.1, which is based on random rank-1 lattices. In that sense our work follows a similar strategy as in [23] (and its predecessor [30]), which also uses rank-1 lattices in the identification step of a dimension-incremental approach. The sparse transforms designed in these works, however, exhibit a larger runtime complexity due to a suboptimal scaling of the sample complexity required in each dimension-incremental step. More precisely, they are based on spatial discretizations of full frequency candidate sets without taking advantage of the knowledge that only a few frequencies within these candidate sets are active. With Algorithm 1 below at hand, in contrast, one can explicitly take this additional information into account to obtain a constructive sampling strategy with an improved sample complexity in combination with a comparable runtime complexity as in [23]. This is summarized in the following Theorem. Theorem 1.3. (Dimension-incremental strategy). We consider the frequency sets Γ ⊂ Z d , |Γ| < ∞, and I ⊂ Γ. Moreover, we define N Γ as in (2.5), e.g., Then, there exists a sampling strategy based on random rank-1 lattices using sampling values that allows with probability at least 1 − δ to identify all multivariate trigonometric polynomials p supported on I, that is, those p with all its non-zero Fourier coefficients corresponding to index vectors in I (in other words, trigonometric polynomials withp k = 0 for all k ∈ Z d \I). This identification and the computation of the active Fourier coefficientsp k , k ∈ I, can be realized by a combination of Algorithm 3 and Algorithm 4. This method has a computational complexity of To compare this runtime and sample complexity to those of alternative dimensionincremental strategies, we again refer to Table 1.1. In addition to the approach [23] that is based on rank-1 lattices, this includes [8,9] which use compressed sensing techniques for the identification step. These compressed sensing based approaches use the special structure of the candidate set Γ: When restricting to each entry of the index set, one just encounters a regular grid in a much lower dimensional space, so one can efficiently apply compressed sensing techniques. The entries are then combined using an appropriate pairing technique, also based on compressed sensing ideas. This approach is much more general in that it allows for much larger classes of bounded orthonormal systems than just Fourier, but for that reason, it also does not take optimal advantage of the structural properties of the Fourier transform and the possible computational speedup by FFT techniques that our approach is able to exploit. This explains the inferior dimensional scaling of the runtime complexity.
Besides these works, dimension-incremental methods have also been studied in combination with Prony's method [29]. Also, there are further Prony-like techniques available in higher dimensions, cf. [12]. There is no explicit analysis of runtime and sample complexity available, however, which is why we cannot include this approach in Table 1.1. In addition, the feasibility of Prony's method in this context is severely limited to low sparsities due to stability problems arising even for sparsity levels in the order of a few hundreds.
2 Reconstructing multivariate trigonometric polynomials from samples along random rank-1 lattices

Sparse FFT via rank-1 lattices -background and previous works
The frequency identification method we present in this work is based on a class of well established cubature methods in higher dimensions, so-called rank-1 lattice rules, which are special quasi Monte Carlo methods. Such methods consider rank-1 lattices as sampling sets, that is, sets of the form where z ∈ Z d and M ∈ N are called generating vector and lattice size of Λ(z, M ), respectively. A candidate for an approximate integral of a function over a high-dimensional cube is then computed as the average of its samples along this lattice. Naturally, it depends on the function class under consideration whether and to which accuracy this candidate approximates the true integral. In this paper, we will apply this approach for the identification of a sparse trigonometric polynomial where k · x := d j=1 k j x j denotes the usual inner product in R d . Recall that we are interested in the case that the multivariate trigonometric polynomial is sparse, i.e., the index set I ⊂ Z d -the frequency set -not only has finite cardinality, |I| < ∞, but is also small.
Note that the Fourier coefficientp k with frequency k can be computed by evaluating the integralp As it turns out, a rank-1 lattice rule applied to the integrand in (2.3) often returns the correct Fourier coefficient. Indeed, the rank-1 lattice rule with rank-1 lattice Λ(z, M ) as in (2.1) yieldŝ where δ 0 (0) := 1, δ 0 (l) := 0 for l = 0, and Λ ⊥ (z, M ) := {h ∈ Z d : h · z ≡ 0 (mod M )} is the integer dual lattice of the rank-1 lattice Λ(z, M ). As we will see, for a random lattice Λ, Λ ⊥ (z, M ) will typically intersect I − k only in h = 0 for k ∈ I or not at all for k / ∈ I, so (2.4) typically consists just of the single summandp k or even none, as desired.
When the intersection consists of more elements than just 0, we speak of aliasing. For a specific frequency, as observed recently by one of the authors in [22], this happens only with small probability for random lattices even with small lattice sizes M ≍ |I|. Consequently, one has unique reconstruction of at least some of the original Fourier coefficientsp k with a certain probability. Nevertheless each realization of such a random lattice will yield aliasing effects for some fraction of its Fourier coefficient. This means that our goal of correctly determining for all candidate frequencies h ∈ Γ whether they are active or not is typically impossible with a single realization. Even to just identify a superset of the active frequencies (with some false-positives allowed) requires a lattice of larger cardinality. More precisely, the samples along a (single) rank-1 lattice rule do not allow for the reconstruction of all original Fourier coefficients {p k } k∈I unless the rank-1 lattice size M is on the order of M |I| 2 in the worst case [21]. In addition to this disadvantage, a corresponding generating vector z needs to be constructed, e.g. using a component-by-component approach, which can lead to large computational costs, cf. [23, Page 3] for a detailed discussion on this topic.
That is why in this paper, we work with multiple rank-1 lattices. Building upon the work of [22], we develop a strategy to identify the frequencies belonging to the non-zero Fourier coefficients of the polynomial p within a frequency candidate set Γ ⊃ I in Section 2.2. To this end, we present some notation and technical basics from [22] in the following.
As we use a random approach to generate the multiple rank-1 lattices, our method will not be guaranteed to be successful in all cases but with a chosen high probability. Naturally, the success rate will depend on the candidate set Γ to some extent. In particular, a candidate set Γ that is compact and of small size yields superior performance as compared to one which is extremely large or very wide spread. To describe the nature of the frequency candidate set Γ, we will work with its cardinality |Γ| and its expansion (2.5) Here, N Γ is the smallest edge length of a cube containing the frequency candidate set Γ. That is, there exists some k ∈ Z d such that If k, k ′ ∈ Γ agree up to multiples of M , k ≡ k ′ mod M , sampling on a rank-one lattice of the type Λ(z, M ) cannot distinguish k and k ′ . In general, one can at best identify the equivalence class mod M . For that reason, it is often beneficial to represent each of these equivalence classes by an element in {0, . . . , M − 1}, i.e., to consider where we assume M ∈ N, and to aim to identify I mod M ∩ Γ mod M using the techniques introduced in this paper. Under the assumption that |I mod M ∩ Γ mod M | = |I ∩ Γ| (that will always hold for M large enough), this also allows for the identification of |I ∩ Γ|. We refer the reader to [22,Lem. 2.3] for further details. Working with the modified candidate set is preferred as for M ≤ N Γ , dealing with Γ mod M instead of Γ leads to a smaller expansion of the candidate set Γ under consideration and allows -in specific situations -for a significantly reduced number of sampling values required for the suggested approach, Algorithm 1 below, to identify the frequency support I of the multivariate trigonometric polynomial p.
Furthermore, for technical reasons, it will be beneficial to use rank-1 lattice sizes M that are prime numbers in addition to ensuring that |Γ mod M | = |Γ|, i.e., we consider This set contains at least all primes greater than N Γ .
At this point, we stress on the fact that we will exploit the advantageous aliasing formula (2.4) in order to construct our algorithm. In particular, the strategy is to randomly choose multiple suitable rank-1 lattices Λ(z 1 , M 1 ), . . . , Λ(z L , M L ) and to exploit the fact that for a fixed frequency problematic aliasing effects occur with a fixed probability for each of the rank-1 lattices under consideration. In order to control the probability that these aliasing effects occur in a reasonably large proportion of the L rank-1 lattices, we need to choose L sufficiently large. We call such a collection The separate consideration of the aliasing effects of each single rank-1 lattice Λ(z ℓ , M ℓ ), ℓ = 1, . . . , L, allows for the separate computation of possibly activep The structure of a specific rank-1 lattice Λ(z, M ) will then provide an efficient algorithm that computes all (aliased) Fourier coefficientsp . This is highly efficient compared to applications of matrix vector products, cf. [21].

New approach and reconstruction guarantee
Given the role of the candidate set Γ, it should not come as a surprise that one can only exactly identify the frequency support I of a multivariate trigonometric polynomial p provided that I ⊂ Γ . In practical scenarios, however, errors cannot always be avoided. For example when applying the methods developed in this section as frequency identification steps in a dimension-incremental approach, cf. Section 3 below, the candidate sets arise from previous estimation steps. So it may happen that this assumption is violated at some point. For that reason we do not explicitly require I ⊂ Γ and show that the method will identify the frequency support I ∩ Γ of p within Γ with high probability for a suitable choice of parameters -even when other frequencies are present. Note, however, that these parameters depend on I ∪ Γ, so they are not straight forward to determine for I ⊂ Γ and unknown I.
The main tools for our analysis are the following generalizations of [22, Lemma 3.1 and Theorem 3.2].
Lemma 2.1. Let I ⊂ Z d and Γ ⊂ Z d be frequency sets of finite cardinalities, |I| < ∞ and |Γ| < ∞. We fix a frequency k ∈ I ∪ Γ and choose a prime number M such that |(I ∪ Γ) mod M | = |(I ∪ Γ)|. In addition, we choose a generating vector z ∈ [0, M − 1] d ∩ Z d uniformly at random. Then, with probability not greater than |I| M , the frequency k aliases to at least one frequency from I \ {k}, i.e., Proof. We start with the case k ∈ Γ \ I, and we build the frequency setĨ = I ∪ {k}. Since we have |(I ∪ Γ) mod M | = |I ∪ Γ| andĨ ⊂ (I ∪ Γ), it follows that |Ĩ mod M | = |Ĩ|, and we apply [22,Lemma 3.1] with the frequency setĨ, which yields that k aliases to at least one [22,Lemma 3.1] yields that k aliases to at least one other frequency from I \ {k} with probability at most |I|−1 M < |I| M .

Remark 2.3.
In Algorithm 1, we use a comparison to zero on line 6 implemented as Kronecker delta function δ 0 . Clearly, numerical implementations should take into account numerical inaccuracies and utilize a suitable approximation of this function.
The next two lemmas estimate the failure probabilities of the proposed detection method and, subsequently, Corollary 2.6 reveals the connection to Theorem 2.2. In the following, we widely use the random variables Y k ℓ , that have already been defined in (2.6).
Proof. Since k ∈ Γ \ I,p (ℓ) k = 0 holds in each case where k does not alias to any frequency h ∈ I. Therefore, we necessarily need aliasing in order to obtainp  , which holds since aliasing is necessary in order to even obtainp As a consequence of the last two lemmas, we need upper bounds on which are themselves upper bounds on the failure probabilities of the classification of k ∈ Γ\I and k ∈ I in line 6 of Algorithm 1, respectively. In order to apply Theorem 2.2, we need to choose ν ∈ (0, 1/2], observing that This restriction on ν allows for the application of Theorem 2.2 which leads to the following statement about Algorithm 1. Corollary 2.6. Let I ⊂ Z d and Γ ⊂ Z d with |I| < ∞ and |Γ| < ∞ be given. Moreover, we choose the parameters ν ∈ (0, 1/2] and c > 1/ν. Furthermore, we fix δ ∈ (0, 1), Subsequently, we randomly choose . . , L} of the multivariate trigonometric polynomial p(x) = k∈Ip k e 2πik·x ,p k = 0 for each k ∈ I, are given. Then, the probability that the outputĨ of Algorithm 1 does not equal Γ ∩ I is less than δ.
Proof. Applying Theorem 2.2, Lemma 2.4, Lemma 2.5, and the union bound yields Up to now, we considered only the classification of the non-zero frequencies. Specific parameter choices allow for the additional computation of the unknown Fourier coefficients. To this end, we computě k·z ℓ mod M ℓ . Choosing ν = 1/2 and L odd, we obtaiň with probability at least η : since there is no aliasing for at least L/2, i.e., L+1 2 , rank-1 lattices on the frequency k with at least this probability. Accordingly, Algorithm 2 Detecting the frequency set and computing the Fourier coefficients of a trigonometric polynomial p. (Algorithm 1 with ν := 1/2 and computation of Fourier coefficients).
Input: k =p k , which implies the median is exactly this value. Algorithm 2 presents the resulting strategy for detecting the active frequencies k ∈ Γ as well as computing all medians of the sets {p (ℓ) k : ℓ = 1, . . . , L}, k ∈Ĩ, as corresponding Fourier coefficients. The differences to Algorithm 1 are the fixed parameter choice ν := 1/2 and the additional line 8 in Algorithm 2, which computes the Fourier coefficientsp k , k ∈Ĩ, as medians using the method from [3]. SinceĨ ⊂ Γ, Algorithm 2 requires at most O (L |Γ|) additional arithmetic operations. Thus, we obtain the same computational complexities for Algorithms 1 and 2.
Proof. We follow the proof of Corollary 2.6 for estimating the probability, and we take into account the argumentations on the mediansp k immediately above this Corollary.
Remark 2.8. The parameters in Corollary 2.7 need to be chosen carefully depending on the specific application. In particular, it might be useful to increase the parameter c, which leads to possibly larger lattice sizes M but smaller numbers L of used rank-1 lattices.
Theorem 1.1 states the result of Corollary 2.7 for the specific parameter choice c := 10.33 and the restriction on the frequency sets given in (1.1). We now give its proof.
Proof of Theorem. 1.1. We apply Corollary 2.7 setting the parameter c := 10.33. Since the subset property (1.1) of the frequency sets is assumed, P I∪Γ contains each prime number p larger than c|I|. The ratio M c|I| of the smallest prime number M larger than c|I| is bounded from above by M c|I| ≤ 127 11·10.33 , which can be proven using [14,Prop. 5.4] and the calculation of finitely many instances of this ratio. Accordingly, the lattice sizes are bounded from above by M ≤ 127 11 |I|. In addition, for |Γ| ≥ 8 > e 2 , the number of lattices L is bounded from above by where in the last inequality we used that δ < 1. This yields a total number of samples M L < 37|I| (ln |Γ| − ln δ). According to Corollary 2.7, we apply Algorithm 2 and obtain the non-zero ones of the Fourier coefficientsp k , k ∈ I = I ∩ Γ, with probability at least 1 − δ.
Remark 2.9. Clearly, the reconstruction guarantees in Theorem 1.1 as well as Corollary 2.7 hold with probability of at least 1−δ. However, from a practical point of view, it is reasonable to expect that the outputĨ of Algorithms 1 and 2 contains I, i.e., I ⊂Ĩ, with significantly higher probability. On the one hand, the estimate (2.8) in Lemma 2.5 is very rough since we estimate the probability of cancellations of aliasing Fourier coefficients just by the probability of aliasing frequencies, which are widely differing events in fact. On the other hand, the probability that I ⊂Ĩ holds, i.e., that we observe false negatives, is bounded from above by |I| |Γ| δ, cf. the proof of Corollary 2.6. Moreover, the probability that we will observe false positives seems to be significantly higher, cf. Section 4.1, which also fits well to the theoretical estimate on the failure probability in Corollary 2.6 since the upper bound on the probability of observing false positives contributes much more to the upper bound δ of the total failure probability when one assumes that |I| ≪ |Γ|.
From that point of view, the probability of I ⊂Ĩ in conjunction with, for instance, |Ĩ| ≤ 2 |I| may be considerably larger than the probability we estimated in Theorem 1.1. Even in these cases, one has a reasonable chance to entirely identify the trigonometric polynomial p from the already known sampling values.
An improvement strategy is to postprocess the frequency setĨ and Fourier coefficientsp k , k ∈Ĩ, obtained as output of Algorithm 2. One considersĨ as the frequency set of the polynomial p and one computes the corresponding Fourier coefficientsp k , k ∈Ĩ, from the already available sampling values along the fixed rank-1 lattices Λ(z 1 , M 1 ), . . . , Λ(z L , M L ) using [24,Algorithm 2] or, even more general, one applies a least squares method. If this set of rank-1 lattices provides a spatial discretization of the space ΠĨ of trigonometric polynomials, cf. [22] for details, andĨ ⊃ I holds, we will entirely identify the trigonometric polynomial p. Otherwise, we can use the following hybrid approach. For each rank-1 lattice Λ(z ℓ , M ℓ ), we determine the frequency setĨ (ℓ) := {k ∈Ĩ : k · z ℓ ≡ h · z ℓ (mod M ℓ )∀h ∈Ĩ \ {k}} belonging to reconstructable Fourier coefficientsp k for any p ∈ ΠĨ and setp k :=p . Whenever a frequency k ∈Ĩ is contained in more than oneĨ (ℓ) , we set the Fourier coefficientp k to the average of the corresponding coefficientsp . If there should be frequencies k ∈Ĩ \ ∪ L ℓ=1Ĩ (ℓ) , then we just use the Fourier coefficientsp k computed in line 8 of Algorithm 2 as a fall-back.
Numerical tests, cf. Section 4.1, confirm that this postprocessing strategy works very well. However, we cannot directly apply the theoretical results from [22], since the frequency setĨ already depends on the aliasing effects of the rank-1 lattices Λ(z 1 , M 1 ), . . . , Λ(z L , M L ).

Dimension-incremental sparse FFT -background and previous works
As already mentioned in the introduction, one of the main motivations for our algorithm is to apply it as a key ingredient of a dimension-incremental algorithm for the high-dimensional fast Fourier transform. Such algorithms have received considerable attention in recent years due to their excellent and reliable applicability in high-dimensional settings, cf., e.g., [30,23,8,9].
To precisely formulate the algorithmic framework of these approaches that will also form the basis of our sparse FFT, we recall some notation: We denote the multivariate trigonometric polynomial that we aim to recover by for some frequency set I = suppp ⊂ Z d , s := |I| < ∞. Furthermore, we assume that we know a (possibly very large) candidate set Γ ⊂ Z d of finite cardinality for I, i.e., I ⊂ Γ and |Γ| < ∞.
In this notation our key dimensional-incremental strategy, in analogy to a number of recent approaches in the literature, such as [8,9], amounts to first identifying I (t) as a superset of the projection In each of these pairing steps, one determines the candidates for the indices in I (1,...,t) by appropriately combining the (already identified) frequency set I (1,...,t−1) for P 1,...,t−1 (I) with the (already identified) frequency set I (t) for P t (I). That is, one aims to find |I| active elements from the larger candidate set (I (1,...,t−1) × I (t) ) ∩ P 1,...,t (Γ) =: J t . This problem is exactly of the form studied in the previous section, and basically this step is where the various approaches that have been proposed in the literature differ. Treating this step as a black box, say Algorithm A, we can formulate the dimension-incremental approach to high-dimensional fast transforms as a meta-algorithm, which is summarized in Algorithm 3.
To apply our algorithm for a wider class of scenarios, we introduced the two additional parameters θ and s local for enhanced stability and robustness. The threshold parameter θ is meant to account for scenarios where due to noise or model mismatch, some linear combination of non-significant coefficients is not evaluated to exactly zero, but just approximately zero. More precisely, one is not aiming to identify all non-zero coefficients, but rather only those with absolute values of at least θ. Naturally the parameter θ should be chosen smaller than the values to be expected, see also (3.1).
The parameter s local is related to the observation that allowing for a moderate number of false positives can significantly improve the success rate for recovery in practice. If Step 2c of Algorithm 3 is designed to identify only exactly s coefficients, then a false positive can lead to the situation where an active frequency no longer corresponds to any of the s largest coefficients, and hence is not identified.
For that reason it is useful to collect s local > s frequencies in the intermediate steps of Algorithm 3 in practice, and we incorporate this feature in Algorithm 3 by using s local := 2s as the default choice.
In general, a major computational bottleneck of such dimension-incremental approaches is the iterated signal reconstruction from sampling values, (i.e., Step 2 in the formalization of Algorithm 3). In the simplest case, each such instance can be thought of as computing a pseudoinverse of a matrix. Since the matrices to be inverted change in every step of the dimension-incremental algorithm, a procedure for highly efficient computation of such a pseudoinverse is the key to the computational efficiency of the entire method. If one has FFT-like algorithms at one's disposal, the computational complexity will be tremendously reduced as compared to algorithms entirely based on computing full matrix vector products. Such high-dimensional FFT-like algorithms, however, typically need to exploit some structure of the sampling pattern and are hence restricted to specific sampling sets. Even for highdimensional trigonometric polynomials on a regular full grid, this structure will likely get lost in the dimension-incremental procedure: As soon as previous iterations have identified an unstructured candidate set, this set will no longer have the advantageous structure of the underlying grid that can be exploited for computing the FFT.
Algorithm 2 in contrast, yields a fast transform for arbitrary candidate sets. Consequently, it can be efficiently applied in every iteration no matter how the candidate set looks like, which in turn gives rise to the superior computational complexity of the high dimensional sparse fast Fourier transform resulting from employing Algorithm 2 (together with some subsequent low-cost computations) in the role of Algorithm A in the context of Algorithm 3.

Sample complexity and computational complexity 3.2.1 Algorithm 3 using a general efficient identification Algorithm A
Since we proposed to consider Algorithm 3 as a meta-algorithm, we start by analyzing its sample complexity as well as its computational complexity in general terms.
First we consider the number of samples used by Algorithm 3 for each step separately. The general results are collected in Table 3.1. In Step 1, we use r d t=1 K t ≤ d r N Γ many sampling values. For Step 2 we construct r different sampling sets X t,i for each t = 2, . . . , d−1 for all ℓ = 0, . . . , K t − 1. Apply Algorithm A to obtain the supportJ t,i ⊂ J t , |J t,i | ≤s, of frequencies belonging to the at mosts largest Fourier coefficients, each larger than θ in absolute value, using the sampling values p(x j ), x j ∈ X t,i . Step Step 3 max(s, N Γ ) log(s/γ) max(s, N Γ ) log(s/γ)(d + log(s N Γ )) and, in addition, one sampling set X d,1 , i.e., r (d − 2) + 1 many. Since different choices for Algorithm A require different sampling sets and hence the sampling strategy must be chosen in conjunction with the algorithm substituted for this black box, we denote the sampling sets by X t,i = X t,i (A). Certainly, the sampling sets X t,i (A) may depend on the admissible failure probability γ A of Algorithm A. With this notation, the number of sampling values used in Step 2 is bounded by |X d, Step 3, we need a number of sampling nodes on the order of max(s, N Γ ) log(s/γ) for a parameter γ. Namely, this is the number of sampling nodes required by [23, Algorithm 1] to realize a spatial discretization of trigonometric polynomials with frequencies supported in I (1,...,d) with probability at least 1 − γ (see [23] for details).
Second, we consider the computational complexity, again for each step. In Step 1, we apply r one-dimensional FFTs of maximal size N Γ , d times each. Moreover, each of the d different frequency sets I (t) , t = 1, . . . , d, is constructed incrementally in r substeps, e.g., by sorting vectors (|p t,j |) j∈Pt(Γ) of length at most N Γ and handling a sorted vector of length at most N Γ . Accordingly, the computational complexity of Step 1 is in d rN Γ log N Γ .
The computational complexity of Step 2 intrinsically depends on the choice of Algorithm A. This dependency is two-fold. On the one hand, in Step 2c, the algorithm is executed, hence the runtime of this step directly corresponds to the computational complexity of Algorithm A. On the other hand, different choices for Algorithm A will require different sampling sets, which can be costly to construct if specific properties are required. Hence the computational complexity of Step 2a, in which the sampling set is constructed will also depend on Algorithm A. We will subsume the A-dependent contribution to the computational complexity arising in these two steps in a constant C(A), chosen to be a universal upper bound for these contributions over all possible choices of t and i.
Step 2b also has a mild implicit dependency on Algorithm A, as it scales with the size of the sampling set. However, it should not be seen as a part of the algorithm, as it also depends on the sampling procedure in the underlying application, which is independent of the algorithm design. In this paper, we will follow the assumption made in most other works on the topic that the sampling procedure is dominated by the other steps of the algorithms, i.e., the computational complexity of Step 2b is also of order O (C(A)). At this point, we would remind the reader once again that we tolerate a specific failure probability γ A of Algorithm A which may imply a γ A dependence of C(A).
Step 2d is about adding at most s local elements to the set I (1,...,t) ; the largest run-time contribution of this step is to avoid listing an existing element again. In analogy to the considerations of Step 1, we observe a computational complexity in O (d r s log(rs)) for Step 2d under the assumption s local ∼ s.

As
Step 3 is based on [23, Algorithms 1 and 2], the results in this paper provide bounds for the computational complexity of constructing the sampling set and computing the Fourier coefficients. We obtain O (max(s, N Γ ) log(s/γ)(d + log(s N Γ ))), with the parameter γ as introduced in the beginning of this subsection.

Algorithm 3 using a modification of Algorithm 2 in the role of Algorithm A
In this section, we analyze Algorithm 3 when using multiple random rank-1 lattices for constructing the first t components of the sampling sets X t,i in Step 2a according to Corollary 2.7 and applying Algorithm 2 as Algorithm A in Step 2c, where we use J t as the set of frequency candidates Γ for Algorithm 2. Since we assumed Algorithm A as an estimator of thes most significant frequencies of the input signal, we need to apply a slight modification of Algorithm 2 which we summarize in Algorithm 4. It is just the application of Algorithm 2 with a subsequent additional restriction of the outputĨ guaranteeing that this frequency set then fits to the requirements of Step 2c in Algorithm 3.
As above, we assume s local ∼ s, e.g., s local := 2s, in order to avoid at least this parameter. It suffices to discuss Step 2 since the complexity of the other steps is independent of Algorithm A and has already been investigated in the last section. Some parameters related to the probability of failure and the number of iterations will not be specified in this section, the next section discusses suitable choices. In particular, the failure probability γ A of Algorithm A still remains unspecified in this section. In Table 3.2, we give an overview of the sample complexities as well as computational complexities of the three steps involving these parameters r, γ A , and γ. The corresponding complexities for the suitable parameter choices will again be postponed to the next section.
Again, we start with the sample complexity. Each of the sampling sets X t,i is the union of O (log(|J t |/γ A )) rank-1 lattices in t spatial dimensions, consisting of M t,i,ℓ = O (max(s, N Γ )) lattice nodes each, which are embedded into d dimensions by concatenating all the lattice nodes by d − t fixed components (the same for all nodes in the sampling set X t,i ), which are drawn uniformly at random from T. The generating vectors z t,i,ℓ of the t-dimensional rank-1 lattices are drawn independently of these components and of each other, uniformly sample complexity computational complexity Step 3 max(s, N Γ ) log(s/γ) max(s, N Γ ) log(s/γ)(d + log(s, N Γ )) at random from [0, M t,i,ℓ − 1] t ∩ Z t . As |J t | r s N Γ , an upper bound of the size of each sampling set is given by |X t,i | max(s, N Γ ) log(r s N Γ /γ A ). As the number of these sampling sets X t,i is 1 + (d − 2) r, we obtain that the sampling complexity of Step 2 is O (d r max(s, N Γ ) log(r s N Γ /γ A )).
In order to estimate the computational complexity of Step 2 with Algorithm 4 taking the role of Algorithm A, we distinguish two different cases. First, we consider the worst case scenario. For this, we just use the estimate |J t | r s N Γ and apply Theorem 1.1 for each t and i, where J t takes the role of the set of frequency candidates Γ in Theorem 1.1. Accordingly, we observe a computational complexity in This estimate, however, is far from order optimal in many cases, as the bound on |J t | is approximately tight only when the setsJ t−1,i constructed for different values of i have very little overlap. In the event that Step 1 of Algorithm 3, i.e., P t (I) ⊂ I (t) and, in addition, all instances of Algorithm 4 are successful (in the sense of Corollary 2.7, cf. Section 3.2.3), these sets will have large overlap. As we will discuss in the next section, this event has high probability for suitable parameter choices. That is, in such scenarios, the computational complexity will be considerably smaller with high probability. More precisely, in that event one has |J t | ≤ sN Γ , so one obtains a computational complexity that is reduced by a factor of r. This yields a computational complexity in A similar argument shows that with high probability, one also observes a slight improvement of the sample complexity in the sense that the logarithmic term will no longer depend on r.

Choosing the parameters in Algorithm 3 for Algorithm 4 in the role of Algorithm A
Up to now, we have not discussed how to choose the parameters r, γ A , and γ. Step Step 3 max(s,  same fixed values, allows for the estimation of a certain projection of the Fourier coefficients. Repeating this process r times for different choices of the d − t components ensures that with a probability of at least 1 − δ 3 d , each of the active Fourier coefficients is projected to some coefficient that is not less than θ at least once among the different projections. Since we take the union of the frequency setsJ t,i , i = 1, . . . ,r, in Step 2d, it is sufficient to only regard the frequencies that belong to coefficients of at least θ in modulus in Step 2c. Clearly, Algorithm 2 will compute all projections of the active Fourier coefficients, and in particular those that are at least θ, with failure probabilities estimated in Corollary 2.7. Accordingly, the failure probabilities for detecting all the frequencies with coefficients not less than θ are also bounded by these estimates and as a consequence, we can apply the estimates on the failure probabilities in Corollary 2.7 to Algorithm 4 sinces ≥ s is fulfilled. That is why we set γ A := δ/(3 d r) in Step 2, which entails that for each fixed t and fixed i, Algorithm 4 has a failure probability of at most δ/(3 d r) for detecting all frequencies belonging to Fourier coefficients of at least θ. Similarly, we fix γ := δ/(3 d) for Step 3.
The resulting sample complexities and computational complexities for this choice of parameters are summarized in Table 3.3. For Step 2 we list a worst case (w.c.) bound as well as a bound that hold with a high probability of at least 1 − δ (w.h.p.).
For the computational complexity of Step 2, we obtain simpler expressions by bounding the worst case estimate via and the high probability estimate via

Numerical results
In this section, we validate our theoretical findings by numerical experiments. We first demonstrate the feasibility for unstructured candidate sets by choosing both the candidate set and the set of active frequencies at random. Second, as an example of highly structured candidate sets, we investigate the hyperbolic cross. Lastly we study the performance of our method in the context of a dimension-incremental sparse FFT in high dimensions.

Sparse FFT for arbitrary candidate sets as introduced in Section 2
We start by validating our approach in the framework of Section 2. That is, we consider different frequency candidate sets Γ ⊂ Z d , |Γ| < ∞, and apply our method to recover trigonometric polynomials p, cf. (2.2), with frequencies supported on small subsets I ⊂ Γ.

Identifying potential false positives and potential false negatives
As explained above, a main advantage of our recovery guarantees as compared to other results with similar sample complexity is that we do not assume a random signal model, but we show recovery with high probability for an arbitrary signal. Even stronger, we only analyze aliasing properties of the support, cf. formula (2.4) and the related discussion in Section 2.1. So for fixed support I we guarantee that with high probability all signals supported on I can be recovered.
In our numerical simulations, we also aim to illustrate this strong property. For that, we employ a worst case measure for the support detection. For a given support I, we compute all the potential false positives (PFP) and potential false negatives (PFN), that is, all index vectors j that arise as an alias of some k ∈ I. When j ∈ I, a cancellation of the true coefficient and the aliased coefficient can have the effect that the associated frequency is not detected as active. For j / ∈ I, the aliased coefficient will result in an unjustified detection of the coefficient. Especially the potential false negatives will only be realized for very specific choices of coefficients (that would be unlikely under a random model), and it is not clear if there actually are coefficients that realize multiple of them simultaneously. For this reason, the bounds on the success rates we empirically compute are somewhat conservative, but they certainly form a lower bound for the true rates.
More precisely, our empirical evaluation is based on the observation that to decide if a given frequency k is in I, our method considers a set of measurements, each of which constitutes the sum over the true (potentially zero if k / ∈ I) coefficient corresponding to k and all the aliased coefficients with respect to some random rank-1 lattice, and classifies k as a member of I if fewer than half of these measurements are zero. Thus a sufficient condition to prevent that k is wrongly classified as a member or not a member of I is that for more than half of these random lattices, k aliases to no other j ∈ I.
To count how many j ∈ I the frequency k aliases to, we use the following trick: We apply the method to the auxiliary polynomial (4.1) As all the coefficients indexed by I are 1, the aforementioned sum over the aliased coefficients for a realization of the random lattice is nothing but a counting measure applied to the intersection of I and the set of indices that k aliases to. Consequently, when no aliasing happens, the sum will be 0 for k / ∈ I and 1 for k ∈ I. Both of these values are the smallest possible, so if the medianp k in (2.10) takes this value, aliasing happens in fewer than half of the cases, as desired, and hence both false positives and false negatives are excluded for any choice of values of the active coefficients. Otherwise, we count k as a potential false positive (PFP) or potential false negative (PFN), respectively. This method to empirically identify the potential false detections is illustrated in Table 4.1.
Coefficient recovered by Algorithm 2 p k = 0p k = 1p k ≥ 2 true Fourier coefficient of pp k = 1 (i.e., k ∈ I) not possible not a PFN PFN p k = 0 (i.e., k ∈ Γ \ I) not a PFP PFP Table 4.1: Empirical detection of potential false negatives (PFN) and potential false positives (PFP) for known Γ and I via Algorithm 2 applied to the auxiliary trigonometric polynomial given in (4.1) Remark 4.1. On the one hand, the empirical detection of the PFNs and PFPs via Algorithm 2 applied to the auxiliary trigonometric polynomial (4.1) will always give correct results. On the other hand, when additionally using the postprocessing discussed in Remark 2.9, we may obtain "wrong" counts if there are any potential false negatives present. The reason for this is that the PFN frequencies will be in the outputĨ of Algorithm 2 and consequently, taken into consideration by the postprocessing discussed in Remark 2.9. Whenever the set of used rank-1 lattices provides a spatial discretization of the space ΠĨ of trigonometric polynomials, all PFNs will be filtered out. In reality, however, it may happen that a potential false negative does not appear in the outputĨ at all if aliasing Fourier coefficients cancel out each other, and consequently, the postprocessing could possibly yield incorrect Fourier coefficients and filter out energetic frequencies. As the cancellation strongly depends on the used rank-1 lattices and the Fourier coefficients of the function under consideration, the cancellation should be unlikely in practice.

Random frequency sets and candidate sets in three dimensions
Here, we investigate the accuracy of multiple random rank-1 lattice sampling for the exactly sparse case. We consider the reconstruction of the Fourier coefficientsp k of three-variate trigonometric polynomials p of sparsity |I| = 1 000. For each L ∈ {9, 11, 13, . . . , 37}, we fix a multiple random rank-1 lattice configuration Λ(z 1 , M 1 , . . . , z L , M L ), where each single rank-1 lattice is of size M ℓ = 10 331 > 10.33·1 000. Now we repeat the following computations 1 000 times. We choose an index set of possible frequencies Γ ⊂ {−1 000, −999, . . . , 1 000} 3 , |Γ| = 10 7 , and the index set of active frequencies I ⊂ Γ, |I| = 1 000, both uniformly at random. For these choices of Γ and I we identify potential false positives (PFP) and potential false negatives (PFN) as explained in Section 4.1.1 and illustrated in Table 4.1.
In Figure 4.1, we plot the success and failure rates for these experiments as solid lines and filled circles for various choices for the number L of rank-1 lattices used in the configuration.
Here we count an experiment as a success when no PFPs and PFNs are identified.
To put these success rate into perspective, recall that by Corollary 2.7, the probability that Algorithm 2 will correctly determine a trigonometric polynomial using the multiple random rank-1 lattice construction is bounded from below by max(0, 1−10 7 ·9.331 −8.331/41.324 L ). This number is positive for odd L ≥ 37; for L = 41 one already obtains a lower bound on the success probability of 0.90. As expected some estimates in the proof of the corollary are not tight. On the one hand, for a similar target failure rate, the number L of used rank-1 lattices in the numerical tests is lower by approximately one third compared to the corresponding theoretical bound in Corollary 2.7. On the other hand, for L = 37 rank-1 lattices, the empirical failure rate behaves distinctly better with a value of only 0.001 compared to the higher theoretical bound by Corollary 2.7 of 0.583. To illustrate this difference, we include our bound for the failure rate given by Corollary 2.7 for comparison as dashed line and unfilled circles. We observe that despite large constants, the exponential decay of both theory and experiments match well.
Furthermore, we observe that the empirical failure rate is less than 0.01 for L ≥ 33. For L = 33, we require 340 891 samples, which is only ≈ 1/29 of the samples required for a full discrete Fourier transform on Γ and only ≈ 1/23 500 of the samples of a fast Fourier transform on {−1 000, −999, . . . , 1 000} 3 . Another remarkable observation is that the success probability increases from less than 0.1 to more than 0.9 within a small range of L, which also reflects the logarithmic dependence of L on the failure probability.  In Figure 4.2, we consider the aliasing effects in more detail. For L ≥ 29, we observe no potential false negatives, which means that the outputĨ of Algorithm 2 contains all frequencies that belong to non-zero Fourier coefficients, i.e., I ⊂Ĩ. However, for 20 ≤ L ≤ 37, we still observe thatĨ = I in some test runs due to a small number of potential false positives. For our goal of identifying the active frequencies and their associated coefficients, this is much less severe than a false negative would be.   This observation directly yields a refined definition of success. Namely, we consider Algorithm 2 to be successful if no potential false negative and at most a predefined number of potential false positives are observed. The lower plots in Figure 4.2 visualize this refined success rate when this predefined number is chosen to be 10 and 100, respectively. We observe success in more than 99% of the test runs for L ≥ 23 and L ≥ 17 for at most ten potential false positives and for at most 100, respectively. Furthermore, we note that the success rate seems to exhibit a sharp phase transition, increasing from less than 0.02 to more than 0.95 from one odd L to the next.
Without providing any further details, we point out that even for significantly smaller L ≥ 17, we observe at most a small number of false negatives and at most a small number of false positives, which could be a good reason for iterative applications of Algorithm 2 with relatively small numbers L of rank-1 lattices and successively reduced frequency support I.
As an alternative to this idea, one can additionally use the available rank-1 lattice information to filter the false positives in a postprocessing step as discussed in Remark 2.9. In accordance with Remark 4.1, we only apply the postprocessing if there are no potential false negatives. The corresponding success rates are plotted in Figure 4.1 as dotted lines and filled circles. We observe a distinct improvement having success rates of more than 99% already for L ≥ 17.
As briefly indicated in Remark 2.9, one could also determine the Fourier coefficients by solving the linear system arising from the restriction to all identified frequencies (including false negatives). Already for minor oversampling, this system will often have a unique solution -with zero coefficients associated to the false positives. Naturally this approach will require that there are no false negatives and only a small number of false positives.

Weighted and unweighted hyperbolic crosses as frequency sets and candidate sets in eight dimensions
In the following, we investigate the detection accuracy for deterministic, structured frequency index sets. To this end, we fix an unweighted eight-dimensional hyperbolic cross |Γ| = 10 665 297, as a frequency candidate set. Furthermore, we fix the set of active frequencies I ⊂ Γ of cardinality |I| = 1 069 given by which is an eight-dimensional weighted hyperbolic cross. For each L ∈ {9, 11, 13, . . . , 37}, we repeat the following procedure 1 000 times: We draw a multiple random rank-1 lattice configuration Λ(z 1 , M 1 , . . . , z L , M L ), where each single rank-1 lattice is of size M ℓ = 11 047 > 10.33 · 1 069, and we identify potential false positives (PFP) and potential false negatives (PFN) using Algorithm 2 as explained in Section 4.1.1 and illustrated in We also observe a similar behavior of the success rates and the failure rates, in line with the decay rates predicted by Corollary 2.7. For L = 31 -this corresponds to 342 427 samples -we achieve a success rate of more than 0.99, if we do not allow for any PFNs and PFPs. For the modified notion of success with no PFNs, but 10 or 100 PFPs allowed, a success rate of more than 0.99 can be achieved already for L = 23 or L = 19, respectively.
As before, if one additionally uses the available rank-1 lattice information, cf. Remark 2.9 and 4.1, then we observe success rates of more than 99% already for L ≥ 15, see the dotted lines and filled circles in Figure 4

Dimension-incremental sparse FFT
We continue by numerically exploring the dimension-incremental sparse FFT method of [30,23], and Algorithm 3 with our multiple random rank-1 lattice sampling method described in Algorithm 4 in the role of Algorithm A.   1)i, |p k | ≥ 10 −6 , uniformly at random for all k ∈ I = suppp. For the reconstruction of the trigonometric polynomials p, we only assume that the search domain Γ :=Ĝ d N ⊃ I. We set the expansion parameter N := 32, which corresponds to N Γ = 64. Now, we compare the results of single reconstructing rank-1 lattice sampling from [23, Algorithm 5], multiple reconstructing rank-1 lattice sampling from [23,Algorithm 4], and Algorithm 3 combined with our multiple random rank-1 lattice approach, Algorithm 4. Note that [23, Algorithm 5] also follows the framework of Algorithm 3, with a single reconstructing rank-1 lattice sampling method used in Step 3 and also taking the role of Algorithm A. Likewise, [23,Algorithm 4] corresponds to Algorithm 3 with multiple reconstructing rank-1 lattice sampling taking the role of Algorithm A. For all of these approaches, we choose the absolute threshold parameter θ := 10 −12 , the number of detection iterations r := 1, and the sparsity parameter s := |I|.
For the combination of Algorithm 3 and Algorithm 4, we work with local sparsity parameter of s local := 2 |I| and parameter δ := 0.9. Motivated by the numerical results in Section 4.1 and since we can tolerate up to |I| false positives for each invocation of Algorithm 4, we distinctly reduce the number L of random rank-1 lattices used in Algorithm 4 as compared to the choice postulated by Corollary 2.7. We only use L := min n ∈ 2N + 1 : n ≥   First row: Maximal numbers of aliasing frequencies within I (PFNs) and aliasing frequencies in Γ \ I with frequencies within I (PFPs). Second row: Modified success rates with no potential false negatives, but some potential false positives allowed.
parameter L is not covered by our theory. However, the numerical tests below are not affected, which impressively corroborates that the worst-case scenarios we used for the theoretical estimates hardly ever occur simultaneously. At this point, we stress on the fact that [23,Algorithm 4] and [23, Algorithm 5] use numerically determined spatial discretizations for the candidate sets J t in Step 2, where the approaches for constructing these spatial discretizations are also optimized with respect to the number of used sampling values. Thus, both algorithms also already exploit the potential gaps between theory and practice.  [23,Algorithm 4], and Algorithm 3 using Algorithm 4. We only omit the case of sparsity |I| = 100 000 for [23, Algorithm 5] since this would have required quite a large number of samples and very long runtimes. All tests are repeated 10 times with newly chosen frequencies k ∈ Γ and Fourier coefficientsp k ∈ C. Then, for the 10 repetitions, we determine the maximum of the total number of samples. The numerical results are displayed in Table 4.2. We also determine the relative ℓ 2 -errors of the Fourier coefficients and observe that all errors are near machine precision (below 2 · 10 −15 ). In particular, all frequencies in all test runs are successfully recovered.
For [23, Algorithm 5] using single reconstructing rank-1 lattices, the results are shown in the third column of Table 4.2. Moreover, the results for [23, Algorithm 4] using multiple reconstructing rank-1 lattices are presented in the fourth column. We observe that we require significantly fewer samples than if we had used a d-dimensional FFT on a full grid, which would require |Ĝ 5 32 | = 65 5 = 1 160 290 625 already in the 5-dimensional case. Additionally, for sparsity |I| := 10 000, [23, Algorithm 4] using multiple reconstructing rank-1 lattices required only a fraction of approximately between 1/83 and 1/11 of the samples as compared to [23, Algorithm 5] using single rank-1 lattices.
The results for Algorithm 3 combined with the multiple random rank-1 lattice approach of Algorithm 4 are presented in the fifth column of Table 4.2. Here, we require only a fraction between 1/179 and 1/21 of the samples as compared to [23,Algorithm 5]. Moreover, we achieve at least a similar number of samples as compared to [23,Algorithm 4] and are able to reduce the samples by up to 2/3. The detection was successful in all considered cases and the relative ℓ 2 -errors of the Fourier coefficients near machine precision (below 2 · 10 −15 ).
Furthermore, we also re-ran all the test for an increased expansion parameter N = 256, which corresponds to N Γ = 512. The numerical results we obtained are shown in Table 4.3; the columns have the same meaning as in Table 4.2. For this scenario, we observe that Algorithm 3 using the multiple random rank-1 lattice approach from Algorithm 4 requires only a fraction between 1/17 and 1/10 of the number of samples of [23, Algorithm 4] using multiple reconstructing rank-1 lattices as well as only a fraction between 1/1590 and 1/217 of the number of samples of [23,Algorithm 5]. The relative ℓ 2 -errors are still near machine precision for all the settings and methods considered (below 2 · 10 −15 ).
Comparing the results in Table 4.3 for expansion parameter N = 256 with the results in Table 4.2 for N = 32, we observe a significantly higher reduction in the number of samples for the combination of Algorithm 3 and Algorithm 4 in the case N = 256. This observation confirms the theoretical results that we are able to reduce the factor N Γ in the sample complexity of [23,Algorithm 5] and [23,Algorithm 4] to log N Γ by employing Algorithm 4, see also

Approximation of tensor-product function by trigonometric polynomials
In the following, we will demonstrate that our method also works for functions that are only approximately sparse. For that we consider the multivariate periodic test function f : with a constant C m > 0 such that N m |L 2 (T) = 1. Each of the three summands has infinitely many non-zero Fourier coefficients, but they are exhibiting a decay pattern described by a hyperbolic cross (different for each of the summands). Thus we expect that the function is well approximated by trigonometric polynomials, namely the one corresponding to the Fourier coefficients indexed by a union of three hyperbolic crosses, each corresponding to significant coefficients of one of the summands.
We aim to demonstrate that our method allows to efficiently find such an approximation of the function f by a multivariate trigonometric polynomial p. More precisely, we apply the dimension-incremental approaches already discussed in the previous examples to determine a frequency index set I = I (1,...,10) ⊂ Γ :=Ĝ 10 N and to compute approximated Fourier coefficientŝ p k , k ∈ I. As explained, an adaquate choice for the resulting frequency index sets I is given by the union of three sets of frequencies corresponding to the significant coefficients of the three summands. In our example, these are a three-dimensional symmetric hyperbolic cross in the dimensions 1, 3, 8, a four-dimensional symmetric hyperbolic cross in the dimensions 2, 5, 6, 10, and a three-dimensional symmetric hyperbolic cross in the dimensions 4, 7, 9.
All tests are performed 10 times and the relative L 2 (T 10 ) approximation error is computed, where p =S I f := k∈Ip k e 2πik·• . We set the expansion parameter N := 16, 32, 64 and we use the full grids Γ :=Ĝ 10 N as search space. Moreover, we set the number of detection iterations r := 5. The used sparsity input parameters s and s local = 2s are specified in column 2 of Table 4.4. Furthermore, the results of [23, Algorithm 5] based on single reconstructing rank-1 lattices, of [23, Algorithm 4] based on multiple reconstructing rank-1 lattices, and of Algorithm 3 combined with our multiple random rank-1 lattice approach in Algorithm 4 are shown in columns 3-4, 5-6, and 7-8 of Table 4.4, respectively. For the combination of Algorithm 3 and 4, we set the parameter δ := 0.999 and the number L of random rank-1 lattices to (4.2), which again corresponds to a reduction of about 3/4 compared to the theoretical predictions of Corollary 2.7.
The column "max. rel. L 2 -error" contains the maximum of the relative L 2 (T 10 ) approximation errors f −S I f |L 2 (T 10 ) / f |L 2 (T 10 ) of the 10 test runs. The remaining columns have the same meaning as described in Section 4.2.1. We observe that for increasing sparsity parameter, the number of samples increases while the relative L 2 (T 10 ) approximation error decreases.
Moreoever, we observe that [23,Algorithm 5] and Algorithm 3 combined with Algorithm 4 yield similar relative L 2 (T 10 ) approximation errors, whereas [23, Algorithm 4] produces slightly higher errors. Furthermore, Algorithm 3 combined with Algorithm 4 requires the fewest samples, a fraction of between 1/9 and 1/2 of the number of samples required by [23,Algorithm 4] and a fraction between 1/148 and 1/17 of the number of samples required by [23,Algorithm 5]. For instance in the case N = 64 and s = 5 000, the maximal total number of samples for [