Deterministic Sparse Sublinear FFT with Improved Numerical Stability

In this paper, we extend the deterministic sublinear FFT algorithm for the fast reconstruction of $M$-sparse vectors of length $N = 2^J$ considered in Plonka et al. (2018). Our numerical experiments show that the proposed modification greatly improves the stability of the algorithm, while its runtime remains ${\mathcal O}(M^2 \, \log N)$.


Introduction
Sparse FFT methods can be used in many applications where it is a priori known that the signal is sparse in the time/space or frequency domain. Such algorithms have attracted considerable interest in recent years.
Many deterministic sparse FFT algorithms are based on combinatorial approaches or on phase shifts, see e.g. [1, 3, 6, 10, 11, 19]. These approaches usually need access to arbitrary values of a given function $f(x) = \sum_{j=1}^{M} a_j \, e^{2\pi i w_j x}$, assuming that the unknown frequencies $w_j$ lie in $[-N/2, N/2) \cap \mathbb{Z}$. The sparse FFT techniques in [9, 18] are based on Prony's method.
By contrast, the deterministic algorithms proposed in [12, 14, 15, 17], or in [16], Section 5.4, consider the fully discrete problem: for a given vector $\mathbf{x} \in \mathbb{C}^N$, we want to efficiently compute its discrete Fourier transform $\hat{\mathbf{x}}$ under the assumption that $\hat{\mathbf{x}}$ is $M$-sparse or has a short support of length $M$. Recently, these techniques have also been transferred to derive sparse fast algorithms for the discrete cosine transform [4, 5].
Problem statement. More precisely, we consider the following problem. Let $\mathbf{x} = (x_j)_{j=0}^{N-1} \in \mathbb{C}^N$ with $N = 2^J$ for some $J > 1$. Further, let $\mathbf{F}_N := (\omega_N^{jk})_{j,k=0}^{N-1} \in \mathbb{C}^{N \times N}$ with $\omega_N := e^{-2\pi i/N}$ denote the Fourier matrix of order $N$, with $\mathbf{F}_N^{-1} = \frac{1}{N}\, \overline{\mathbf{F}_N}$. We consider the following two scenarios, which can essentially be treated with the same algorithm.
(a) Assume that $\hat{\mathbf{x}} := \mathbf{F}_N \mathbf{x} = (\hat{x}_k)_{k=0}^{N-1}$ is given. How can we determine $\mathbf{x}$ from $\hat{\mathbf{x}}$ in a sublinear and stable way if it can be assumed that $\mathbf{x}$ is $M$-sparse with $M^2 < N$?

(b) Assume that $\mathbf{x} \in \mathbb{C}^N$ is given. How can we determine $\hat{\mathbf{x}} = \mathbf{F}_N \mathbf{x}$ from $\mathbf{x}$ in a sublinear and stable way if it can be assumed that $\hat{\mathbf{x}}$ is $M$-sparse with $M^2 < N$?
In both scenarios, $M$ need not be known beforehand. However, if $M$ is known, this knowledge can be used to simplify the algorithm. Throughout the paper, we say that a vector is $M$-sparse if only $M$ of its components have an amplitude exceeding a predetermined small threshold $\varepsilon > 0$.
This paper is organized as follows. In Section 2, we summarize the basic multi-scale idea of the algorithm used in [17] for scenario (a). Section 3 is devoted to the extension of the method in [17]. First, we present the general pseudocode of the sparse FFT algorithm. The numerical stability of this algorithm mainly depends on the condition number of special Vandermonde matrices, which are used at each iteration step for solving a linear system with at most $M$ unknowns. In Section 3.1 we give an estimate of the condition number of the occurring Vandermonde matrices, which are partial matrices of the Fourier matrix. This estimate is then used to determine the two parameters characterizing the Vandermonde matrix: one parameter stretches the nodes generating the matrix, and the second determines its number of rows. Section 4 briefly shows how the derived algorithm can be adapted to solve the sparse FFT problem (b). Finally, in Section 5 we demonstrate the large impact of the new approach that allows rectangular Vandermonde matrices. A Python implementation of the new algorithm is available at http://na.math.unigoettingen.de/index.php?section=gruppe&subsection=software.
Multi-scale Sparse Sublinear FFT Algorithm from [17]

We consider the problem stated in (a) and derive an iterative stable procedure to reconstruct $\mathbf{x}$ from adaptively chosen Fourier entries of $\hat{\mathbf{x}}$. To state the multi-scale algorithm from [17], we need the periodized vectors

$$\mathbf{x}^{(j)} := \Big( \sum_{l=0}^{2^{J-j}-1} x_{k+2^j l} \Big)_{k=0}^{2^j-1} \in \mathbb{C}^{2^j}, \qquad j = 0, \ldots, J. \qquad (1)$$

In particular, $\mathbf{x}^{(J)} = \mathbf{x}$, and $x_0^{(0)}$ is the sum of all components of $\mathbf{x}$. Observe that, if the vector $\hat{\mathbf{x}} = (\hat{x}_k)_{k=0}^{N-1}$ is known, then the Fourier transformed vectors $\hat{\mathbf{x}}^{(j)}$ are also immediately known, and we have

$$\hat{\mathbf{x}}^{(j)} = \big( \hat{x}_{2^{J-j} k} \big)_{k=0}^{2^j-1},$$

see Lemma 2.1 in [14]. Throughout the paper, we assume that no cancellation appears in the periodized vectors, i.e., for each significant component $x_k$ of $\mathbf{x}$ we have

$$\big| x^{(j)}_{k \bmod 2^j} \big| > \tilde{\varepsilon}, \qquad j = 0, \ldots, J, \qquad (2)$$

for a fixed shrinkage constant $\tilde{\varepsilon} > 0$.
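The periodization (1) and its Fourier relation can be checked numerically. The following short sketch (assuming NumPy, whose `fft` uses the same sign convention $\omega_N = e^{-2\pi i/N}$ as $\mathbf{F}_N$) verifies that $\hat{\mathbf{x}}^{(j)}$ is obtained by subsampling $\hat{\mathbf{x}}$:

```python
import numpy as np

J = 8
N = 2**J
rng = np.random.default_rng(0)
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
xhat = np.fft.fft(x)  # F_N x with omega_N = exp(-2*pi*i/N)

for j in range(J + 1):
    # periodized vector x^{(j)} of length 2^j, see (1)
    xj = x.reshape(2**(J - j), 2**j).sum(axis=0)
    # Lemma 2.1 in [14]: the DFT of x^{(j)} is every 2^{J-j}-th entry of xhat
    assert np.allclose(np.fft.fft(xj), xhat[:: 2**(J - j)])
```

This is why knowing $\hat{\mathbf{x}}$ gives free access to all vectors $\hat{\mathbf{x}}^{(j)}$ in the multi-scale iteration.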
Idea of the algorithm. The multi-scale algorithm in [17] iteratively computes $\mathbf{x}^{(j+1)}$ from $\mathbf{x}^{(j)}$, for $j = j_0, \ldots, J-1$. If the sparsity $M$ of $\mathbf{x}$ is unknown, then we start with $j_0 = 0$; if $M$ is known, we can start at the level $j_0$ with $2^{j_0} \geq M^2$ and compute $\mathbf{x}^{(j_0)}$ from $\hat{\mathbf{x}}^{(j_0)}$ using an FFT algorithm with complexity $\mathcal{O}(j_0 2^{j_0}) = \mathcal{O}(M^2 \log M)$. Let $M_j$ denote the sparsity of $\mathbf{x}^{(j)}$; then we always have $M_j \leq M$. If $M_j^2 < 2^j$, then the iteration step to compute $\mathbf{x}^{(j+1)}$ from $\mathbf{x}^{(j)}$ is based on the following theorem, see Theorem 2.2 in [17].
Theorem 2.1. Let $\mathbf{x}^{(j)}$, $j = 0, \ldots, J-1$, be the vectors defined in (1) satisfying (2). Then, for each $j = 0, \ldots, J-1$, we have: if $M_j \leq 2^j - 1$, then the vector $\mathbf{x}^{(j+1)}$ can be uniquely recovered from $\mathbf{x}^{(j)}$ and $M_j$ components $\hat{x}_k$ of $\hat{\mathbf{x}}$, where the indices $k$ are taken from the set $\{2^{J-j-1}(2l+1) : l = 0, \ldots, 2^j - 1\}$ such that the resulting coefficient matrix $\mathbf{A}^{(j)}$ in (6) below is invertible.

The proof of Theorem 2.1 is constructive. With the partition $\mathbf{x}^{(j+1)} = ((\mathbf{x}_0^{(j+1)})^T, (\mathbf{x}_1^{(j+1)})^T)^T$, where $\mathbf{x}_0^{(j+1)}, \mathbf{x}_1^{(j+1)} \in \mathbb{C}^{2^j}$, we have from (1)

$$\mathbf{x}^{(j)} = \mathbf{x}_0^{(j+1)} + \mathbf{x}_1^{(j+1)}. \qquad (4)$$

Thus, if $\mathbf{x}^{(j)}$ is known, it suffices to compute $\mathbf{x}_0^{(j+1)}$, while $\mathbf{x}_1^{(j+1)}$ then follows from (4). We can now use the factorization of the Fourier matrix $\mathbf{F}_{2^{j+1}}$, see [16], Formula (5.9), and obtain

$$\big( \hat{x}^{(j+1)}_{2k+1} \big)_{k=0}^{2^j-1} = \mathbf{F}_{2^j} \mathbf{W}_{2^j} \big( \mathbf{x}_0^{(j+1)} - \mathbf{x}_1^{(j+1)} \big),$$

where $\mathbf{W}_{2^j} := \mathrm{diag}\big( \omega_{2^{j+1}}^0, \ldots, \omega_{2^{j+1}}^{2^j-1} \big)$. Thus, we conclude

$$\mathbf{F}_{2^j} \mathbf{W}_{2^j} \big( 2 \mathbf{x}_0^{(j+1)} - \mathbf{x}^{(j)} \big) = \big( \hat{x}^{(j+1)}_{2k+1} \big)_{k=0}^{2^j-1}. \qquad (5)$$

Further, (4) together with (2) implies that $\mathbf{x}_0^{(j+1)}$ can only have significant entries on the same index set $I^{(j)} = \{n_1^{(j)}, \ldots, n_{M_j}^{(j)}\}$ as $\mathbf{x}^{(j)}$, and we have to compute only these $M_j$ entries. Introducing the correspondingly restricted vectors, we can also restrict the matrix $\mathbf{F}_{2^j} \mathbf{W}_{2^j} \in \mathbb{C}^{2^j \times 2^j}$ in the linear system (5) to its $M_j$ columns with indices $n_r^{(j)}$. Finally, it suffices to restrict the system in (5) to $M_j$ linearly independent rows, and $\mathbf{x}_0^{(j+1)}$ can still be uniquely computed. Therefore a restriction $\mathbf{A}^{(j)} \in \mathbb{C}^{M_j \times M_j}$ of the product $\mathbf{F}_{2^j} \mathbf{W}_{2^j}$ can be chosen as

$$\mathbf{A}^{(j)} = \big( \omega_{2^j}^{h_p^{(j)} n_r^{(j)}} \big)_{p,r=1}^{M_j} \; \mathrm{diag}\big( \omega_{2^{j+1}}^{n_1^{(j)}}, \ldots, \omega_{2^{j+1}}^{n_{M_j}^{(j)}} \big). \qquad (6)$$

Here, the first matrix is a restriction of $\mathbf{F}_{2^j}$ to the rows $0 \leq h_p^{(j)} < 2^j$, $p = 1, \ldots, M_j$, and the diagonal matrix is the restriction of $\mathbf{W}_{2^j}$ to the rows and columns $n_r^{(j)}$. In [17], Theorem 2.1 is applied to iteratively compute $\mathbf{x}^{(j+1)}$ from $\mathbf{x}^{(j)}$ if solving the restricted linear system

$$\mathbf{A}^{(j)} \big( 2 x_{0,n_r^{(j)}}^{(j+1)} - x_{n_r^{(j)}}^{(j)} \big)_{r=1}^{M_j} = \big( \hat{x}^{(j+1)}_{2 h_p^{(j)} + 1} \big)_{p=1}^{M_j} \qquad (7)$$

is cheaper than an FFT algorithm for vectors of length $2^j$.
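The splitting step behind (4) and (5) can be illustrated numerically. The sketch below (NumPy assumed; a random vector plays the role of $\mathbf{x}^{(j+1)}$) verifies that $\mathbf{F}_{2^j} \mathbf{W}_{2^j} (2\mathbf{x}_0^{(j+1)} - \mathbf{x}^{(j)})$ indeed yields the odd-indexed entries of $\hat{\mathbf{x}}^{(j+1)}$:

```python
import numpy as np

level = 4
n = 2**level  # n = 2^j
rng = np.random.default_rng(1)
y = rng.standard_normal(2 * n) + 1j * rng.standard_normal(2 * n)  # plays x^{(j+1)}
y0, y1 = y[:n], y[n:]
xj = y0 + y1  # relation (4): x^{(j)} = x_0^{(j+1)} + x_1^{(j+1)}

# W_{2^j} = diag(omega_{2^{j+1}}^0, ..., omega_{2^{j+1}}^{2^j - 1})
W = np.diag(np.exp(-2j * np.pi * np.arange(n) / (2 * n)))
# Fourier matrix F_{2^j}
F = np.exp(-2j * np.pi * np.outer(np.arange(n), np.arange(n)) / n)

lhs = F @ W @ (2 * y0 - xj)  # left-hand side of (5)
assert np.allclose(lhs, np.fft.fft(y)[1::2])  # odd-indexed entries of the DFT
```

Restricting rows and columns of `F @ W` as in (6) then gives exactly the small system (7).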
The further results in [17] focus on finding good choices of the row indices $(h_p^{(j)})_{p=1}^{M_j}$ at each iteration step. Thereby, the paper restricts itself to matrices $\mathbf{A}^{(j)}$ of the form

$$\mathbf{A}^{(j)} = \mathbf{V}_{M_j}(\sigma_j) \; \mathrm{diag}\big( \omega_{2^{j+1}}^{n_1^{(j)}}, \ldots, \omega_{2^{j+1}}^{n_{M_j}^{(j)}} \big), \qquad \mathbf{V}_{M_j}(\sigma_j) := \big( \omega_{2^j}^{\sigma_j (p-1) n_r^{(j)}} \big)_{p,r=1}^{M_j}, \qquad (8)$$

i.e., we choose $h_p^{(j)} = \sigma_j (p-1)$ for $p = 1, \ldots, M_j$. The first matrix in this factorization (8) is a Vandermonde matrix generated by the knots $\omega_{2^j}^{\sigma_j n_r^{(j)}}$, $r = 1, \ldots, M_j$. The iterative algorithm based on Theorem 2.1 is stable if the linear system (7) can be solved efficiently and stably at each level $j = j_0, \ldots, J$. Therefore, [17] tries to find parameters $\sigma_j \in \{0, \ldots, 2^j - 1\}$ such that $\mathbf{V}_{M_j}(\sigma_j)$ is invertible and well conditioned. Observe that $\mathbf{V}_{M_j}(\sigma_j)$ is always invertible if we choose $\sigma_j = 1$. However, $\sigma_j = 1$ can lead to a very bad condition number of $\mathbf{V}_{M_j}(\sigma_j)$ and $\mathbf{A}^{(j)}$, respectively.

Extension of the Sparse FFT Algorithm
The main contribution of this paper is an extension of the algorithm proposed in [17] which tremendously improves its stability and thus makes it applicable in practice.
We keep the approach of considering only matrices $\mathbf{A}^{(j)}$ that are given as a product of a Vandermonde matrix and a diagonal matrix (with condition number 1) as in (8), and we will also try to find a suitable parameter $\sigma_j$ to improve the numerical stability of the system. The Vandermonde structure provides the advantage that the system in (7) can be solved with computational cost $\mathcal{O}(M^2)$, see e.g. [7].
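For illustration, a generic $\mathcal{O}(M^2)$ solver of this type can be sketched as follows. The function below is a hypothetical helper in the classical Björck-Pereyra spirit referenced by [7], not the solver of the paper's implementation; it solves the dual Vandermonde system $\sum_i a_i z_j^i = f_j$ (polynomial interpolation) via Newton's divided differences and a basis conversion, without ever forming the matrix:

```python
def solve_dual_vandermonde(z, f):
    """Solve sum_i a_i * z[j]**i = f[j] for the coefficients a,
    in O(M^2) operations and O(M) memory."""
    n = len(z)
    c = list(f)
    # Newton's divided differences (coefficients of the Newton form)
    for k in range(1, n):
        for j in range(n - 1, k - 1, -1):
            c[j] = (c[j] - c[j - 1]) / (z[j] - z[j - k])
    # convert the Newton form to monomial coefficients (Horner-type recurrence)
    a = [c[n - 1]]
    for k in range(n - 2, -1, -1):
        b = [0.0] * (len(a) + 1)
        for i, ai in enumerate(a):
            b[i + 1] += ai          # multiply the current polynomial by t
            b[i] -= z[k] * ai       # ... and subtract z_k times it
        b[0] += c[k]                # add the Newton coefficient
        a = b
    return a

# p(t) = 2 + t + t^2 sampled at the nodes 1, 2, 3
print(solve_dual_vandermonde([1.0, 2.0, 3.0], [4.0, 8.0, 14.0]))  # -> [2.0, 1.0, 1.0]
```

The same idea works for complex nodes such as $\omega_{2^j}^{\sigma_j n_r^{(j)}}$; the quadratic cost is what keeps the overall algorithm at $\mathcal{O}(M^2 \log N)$.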
We however do not insist on a square matrix, but allow the Vandermonde factor to be a rectangular matrix with more rows than columns, of the form

$$\mathbf{V}_{\widetilde{M}_j, M_j}(\sigma_j) := \big( \omega_{2^j}^{\sigma_j (p-1) n_r^{(j)}} \big)_{p=1,\ldots,\widetilde{M}_j; \, r=1,\ldots,M_j}, \qquad \widetilde{M}_j \geq M_j. \qquad (9)$$

We will choose the number of rows $\widetilde{M}_j$ of the Vandermonde matrix $\mathbf{V}_{\widetilde{M}_j, M_j}(\sigma_j)$ adaptively at each iteration step, based on the obtained estimate of its condition number, where

$$M_j \leq \widetilde{M}_j \leq \tau_{\max} M_j \qquad (10)$$

for a prescribed parameter $\tau_{\max} \geq 1$. The restricted system (7) then becomes the rectangular system

$$\mathbf{A}^{(j)} \big( 2 x_{0,n_r^{(j)}}^{(j+1)} - x_{n_r^{(j)}}^{(j)} \big)_{r=1}^{M_j} = \big( \hat{x}^{(j+1)}_{2\sigma_j(p-1)+1} \big)_{p=1}^{\widetilde{M}_j}, \qquad \mathbf{A}^{(j)} := \mathbf{V}_{\widetilde{M}_j, M_j}(\sigma_j) \; \mathrm{diag}\big( \omega_{2^{j+1}}^{n_r^{(j)}} \big)_{r=1}^{M_j}. \qquad (11)$$

We start by presenting the general pseudocode for the case of unknown sparsity $M$. In the subsequent subsections we show in detail how the coefficient matrix $\mathbf{A}^{(j)}$ needs to be chosen, where we now allow a rectangular matrix.

Algorithm 3.1 (Sparse FFT for unknown sparsity).
Input: $\hat{\mathbf{x}}$, $N = 2^J$, threshold $\varepsilon > 0$.
For $j = 0, \ldots, J-1$:
solve the restricted system (11) for $\mathbf{x}_0^{(j+1)}$ (in the least squares sense if $\widetilde{M}_j > M_j$);
set $\mathbf{x}^{(j+1)} := \big( (\mathbf{x}_0^{(j+1)})^T, (\mathbf{x}^{(j)} - \mathbf{x}_0^{(j+1)})^T \big)^T$;
determine the index set $I^{(j+1)}$ by deleting all indices in $I^{(j)} \cup (I^{(j)} + 2^j)$ that correspond to entries of $\mathbf{x}^{(j+1)}$ with modulus smaller than $\varepsilon$;
set $M_{j+1} := \# I^{(j+1)}$.
Output: $I^{(J)}$, the set of active indices of $\mathbf{x}$; $\mathbf{x} = \mathbf{x}^{(J)} = (x_l)_{l \in I^{(J)}}$, the vector restricted to its nonzero entries.
To determine a suitable matrix $\mathbf{A}^{(j)}$, we have to find a well-conditioned Vandermonde matrix $\mathbf{V}_{\widetilde{M}_j, M_j}(\sigma_j)$. Our procedure consists of two steps.

1) We compute a suitable parameter $\sigma_j$ with $\mathcal{O}(M^2)$ operations.
2) We compute the number $\widetilde{M}_j$ of needed rows of the Vandermonde matrix to achieve a well-conditioned coefficient matrix in the system (11).
As already seen in [17], we can simplify the procedure of determining $\mathbf{V}_{\widetilde{M}_j, M_j}(\sigma_j)$ if the number of significant entries of $\mathbf{x}^{(j)}$ did not change in the previous iteration step, i.e., if $M_{j-1} = M_j$. In this case, we can simply choose $\sigma_{j+1} := 2\sigma_j$ and keep the number of rows, i.e., $\widetilde{M}_{j+1} := \widetilde{M}_j$; see also Subsection 3.4.

Estimation of the condition number of $\mathbf{V}_{\widetilde{M}_j, M_j}(\sigma_j)$
It is crucial for our algorithm to have a good estimate of the condition number of $\mathbf{V}_{\widetilde{M}_j, M_j}(\sigma_j)$, which strongly depends on the minimal distance between its generating nodes $\omega_{2^j}^{\sigma_j n_r^{(j)}}$. More precisely, we have the following theorem, see [13, 17], or [16], Theorem 10.23.
Theorem 3.2. Let $I^{(j)} = \{n_1^{(j)}, \ldots, n_{M_j}^{(j)}\}$ with $0 \leq n_1^{(j)} < \ldots < n_{M_j}^{(j)} < N_j$, $N_j := 2^j$, be a given set of indices. For a given $\sigma_j \in \{1, \ldots, N_j - 1\}$ we define

$$d_{\sigma_j} := \min_{l \neq k} \; \mathrm{dist}_{N_j}\big( \sigma_j n_l^{(j)}, \, \sigma_j n_k^{(j)} \big), \qquad \mathrm{dist}_{N_j}(a, b) := \min_{r \in \mathbb{Z}} |a - b + r N_j|, \qquad (12)$$

as the smallest (periodic) distance between two stretched indices $\sigma_j n_l^{(j)}$ and $\sigma_j n_k^{(j)}$, and assume that $d_{\sigma_j} > 0$. Then the condition number $\kappa_2(\mathbf{V}_{\widetilde{M}_j, M_j}(\sigma_j))$ of the Vandermonde matrix in (9) satisfies

$$\kappa_2\big( \mathbf{V}_{\widetilde{M}_j, M_j}(\sigma_j) \big) \leq \left( \frac{\widetilde{M}_j + N_j/d_{\sigma_j}}{\widetilde{M}_j - N_j/d_{\sigma_j}} \right)^{1/2}, \qquad \text{provided that } \widetilde{M}_j > N_j/d_{\sigma_j}. \qquad (13)$$

However, this estimate cannot be used for square matrices, and it is not very sharp for large $M_j$. Indeed, if $d_{\sigma_j} = N_j/M_j$, which means that the nodes $\sigma_j n_k^{(j)}$ are equidistantly distributed, then the square matrix $M_j^{-1/2} \mathbf{V}_{M_j, M_j}(\sigma_j)$ (with $\widetilde{M}_j = M_j$) is orthogonal with condition number 1, see [2], while the estimate (13) cannot be applied. On the other hand, if $\widetilde{M}_j = N_j$, then we can simply conclude that $\mathbf{V}_{N_j, M_j}(\sigma_j)^* \mathbf{V}_{N_j, M_j}(\sigma_j) = N_j \mathbf{I}_{M_j}$, such that we again obtain condition number 1, while (13) only provides $\big( \frac{N_j(1+1/d_{\sigma_j})}{N_j(1-1/d_{\sigma_j})} \big)^{1/2}$, which fails completely in the worst case $d_{\sigma_j} = 1$. Therefore, we apply another estimate, which is a simple consequence of the Theorem of Gershgorin and can be computed iteratively during the iteration steps. It is based on the following theorem.
Theorem 3.3. Let $I^{(j)} = \{n_1^{(j)}, \ldots, n_{M_j}^{(j)}\}$ with $0 \leq n_1^{(j)} < \ldots < n_{M_j}^{(j)} < N_j$, $N_j := 2^j$, be a given set of indices. Further, let

$$S_k^{(j)} := \sum_{\substack{l=1 \\ l \neq k}}^{M_j} \left| \frac{\sin\big( \pi \widetilde{M}_j \sigma_j (n_l^{(j)} - n_k^{(j)}) / N_j \big)}{\sin\big( \pi \sigma_j (n_l^{(j)} - n_k^{(j)}) / N_j \big)} \right| \qquad (14)$$

for all $k = 1, \ldots, M_j$, and assume that $\max_k S_k^{(j)} < \widetilde{M}_j$. Then the condition number of the Vandermonde matrix $\mathbf{V}_{\widetilde{M}_j, M_j}(\sigma_j)$ in (9) is bounded by

$$\kappa_2\big( \mathbf{V}_{\widetilde{M}_j, M_j}(\sigma_j) \big) \leq \left( \frac{\widetilde{M}_j + \max_k S_k^{(j)}}{\widetilde{M}_j - \max_k S_k^{(j)}} \right)^{1/2}. \qquad (15)$$

Proof. Consider the product $\mathbf{W} := \mathbf{V}_{\widetilde{M}_j, M_j}(\sigma_j)^* \, \mathbf{V}_{\widetilde{M}_j, M_j}(\sigma_j)$. Its diagonal entries satisfy $W_{k,k} = \widetilde{M}_j$, and for $k \neq l$,

$$|W_{k,l}| = \Big| \sum_{p=0}^{\widetilde{M}_j - 1} \omega_{2^j}^{\sigma_j p (n_l^{(j)} - n_k^{(j)})} \Big| = \left| \frac{\sin\big( \pi \widetilde{M}_j \sigma_j (n_l^{(j)} - n_k^{(j)}) / N_j \big)}{\sin\big( \pi \sigma_j (n_l^{(j)} - n_k^{(j)}) / N_j \big)} \right|.$$

Thus, $S_k^{(j)}$ is the sum of the absolute values of all off-diagonal components in the $k$-th row of $\mathbf{W}$. The Theorem of Gershgorin now implies that the maximal eigenvalue of $\mathbf{W}$ is bounded from above by $\widetilde{M}_j + \max_k S_k^{(j)}$, and the smallest eigenvalue is bounded from below by $\widetilde{M}_j - \max_k S_k^{(j)}$.

While the estimate (15) is quite simple to obtain, it is more accurate than (13). In particular, in the two special cases $\widetilde{M}_j = M_j$, $d_{\sigma_j} = N_j/M_j$ and $\widetilde{M}_j = N_j$, $d_{\sigma_j} = 1$, the estimate is sharp, and we recover the true condition number 1.
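The Gershgorin-type bound (15) is easy to check numerically. The following sketch (with assumed example parameters: a random index set, $\sigma_j = 101$, $\widetilde{M}_j = 16$) compares it with the exact condition number computed from the singular values:

```python
import numpy as np

jlev = 10
Nj = 2**jlev
rng = np.random.default_rng(2)
idx = np.sort(rng.choice(Nj, size=8, replace=False))  # support indices n_1 < ... < n_M
sigma, Mrows = 101, 16  # example sigma_j (odd) and row number M~_j

# rectangular Vandermonde matrix V from (9)
p = np.arange(Mrows)[:, None]
V = np.exp(-2j * np.pi * sigma * p * idx[None, :] / Nj)

# row sums S_k of the off-diagonal moduli of W = V^* V, see (14)
diff = sigma * (idx[None, :] - idx[:, None])
num = np.abs(np.sin(np.pi * Mrows * diff / Nj))
den = np.abs(np.sin(np.pi * diff / Nj))
# the diagonal has num = 0; replace its den by 1 so it contributes nothing
den_safe = np.where(np.eye(len(idx), dtype=bool), 1.0, den)
S = (num / den_safe).sum(axis=1)

Smax = S.max()
bound = np.sqrt((Mrows + Smax) / (Mrows - Smax)) if Smax < Mrows else np.inf
assert np.linalg.cond(V) <= bound + 1e-9  # (15) certified by Gershgorin
```

Since $\sigma_j$ is odd and $N_j$ is a power of two, no off-diagonal denominator vanishes.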
For our computations, we will however simplify (14) and consider instead the upper bounds

$$\widetilde{S}_k^{(j)}(\sigma_j) := \sum_{\substack{l=1 \\ l \neq k}}^{M_j} \Big| \sin\big( \pi \sigma_j (n_l^{(j)} - n_k^{(j)}) / N_j \big) \Big|^{-1} \; \geq \; S_k^{(j)}, \qquad (16)$$

which are no longer dependent on $\widetilde{M}_j$.

Efficient computation of $\sigma_j$
For a given set of indices $0 \leq n_1^{(j)} < \ldots < n_M^{(j)} < N_j = 2^j$ we want to find a suitable $\sigma_j \in \{1, \ldots, N_j - 1\}$ such that $\max_k \widetilde{S}_k^{(j)}(\sigma_j)$ is minimal. The optimal parameter $\tilde{\sigma}_j$ solves the optimization problem

$$\tilde{\sigma}_j = \operatorname*{argmin}_{\sigma \in \{1, \ldots, N_j - 1\}} \; \max_{k=1,\ldots,M} \widetilde{S}_k^{(j)}(\sigma)$$

with $\widetilde{S}_k^{(j)}(\sigma)$ defined in (16) for the given index set. We could of course just consider all possible sets $\{\sigma n_1^{(j)}, \ldots, \sigma n_M^{(j)}\}$ for $\sigma \in \{1, \ldots, N_j - 1\}$, compute the maximal sum $\widetilde{S}_k^{(j)}(\sigma)$, and compare the results to find the optimal parameter $\tilde{\sigma}_j$. However, this procedure is too expensive. To achieve a sparse FFT algorithm with the desired overall complexity of $\mathcal{O}(M^2 \log N)$, we can spend at most $\mathcal{O}(M^2)$ operations to find a suitable parameter $\sigma_j$.
To avoid vanishing distances $\sigma_j (n_l^{(j)} - n_k^{(j)}) \bmod N_j$, we will only consider odd integers $\sigma_j \geq 1$. Since $N_j = 2^j$, we then have that $N_j$ and $\sigma_j$ are coprime, such that for each odd $\sigma_j$ we at least achieve that $\max_k \widetilde{S}_k^{(j)}(\sigma_j)$ is bounded. Since our numerical tests show that prime numbers are good candidates for $\sigma_j$, we propose the following algorithm to determine $\sigma_j$.

Algorithm 3.4 (Computation of $\sigma_j$).
Input: index set $I^{(j)} = \{n_1^{(j)}, \ldots, n_M^{(j)}\}$.
Initialization: $M := \# I^{(j)}$; choose $K$ with $K \log K \leq M$; let $\Sigma$ be the set of the $K$ largest prime numbers smaller than $N/2$.
Loop: For all $\sigma \in \Sigma$:
Compute the set $\sigma I^{(j)} := \{\sigma l \bmod N : l \in I^{(j)}\}$;
order the elements of $\sigma I^{(j)}$ by size to obtain $\tilde{n}_1^{(j)} < \ldots < \tilde{n}_M^{(j)}$ and the periodic distances $d_k := \tilde{n}_{k+1}^{(j)} - \tilde{n}_k^{(j)}$, with the conventions $d_0 := d_M$ and $d_{M+1} := d_1$;
find the index of the smallest distance $k^* := \operatorname*{argmin}_{k=1,\ldots,M} d_k$ and compute $D_\sigma$, the sum of the largest term of $\widetilde{S}_{k^*}^{(j)}(\sigma)$ and its neighboring term.
Completion: Choose $\sigma \in \Sigma$ with minimal $D_\sigma$. If several parameters $\sigma$ achieve the same value $D_\sigma$, choose the $\sigma$ that minimizes the full sum $\widetilde{S}_{k^*}^{(j)}(\sigma)$.
Output: $\sigma_j := \sigma$.

The most expensive step in Algorithm 3.4 is the sorting of the elements of $\sigma I^{(j)}$; with the choice $K \log K \leq M$ for the number of candidate primes, the overall computational cost of the algorithm remains $\mathcal{O}(M^2)$. Note that we did not compute the complete sum $\widetilde{S}_k^{(j)}(\sigma)$ for all choices of $\sigma$ in Algorithm 3.4. Instead, for fixed $\sigma$, we search for an index $k^*$ that provides the smallest (periodic) distance and only compute the sum of the largest component and the neighboring component of $\widetilde{S}_{k^*}^{(j)}(\sigma)$ instead of the full sum, since $\widetilde{S}_{k^*}^{(j)}(\sigma)$ is mainly governed by these components.

Remark 3.5. Using Theorem 3.2, it is of course also possible to determine $\sigma_j$ by comparing only the minimal distances $d_\sigma$ in (12) for all $\sigma \in \Sigma$, and to choose the $\sigma \in \Sigma$ that maximizes this distance.
There are always enough odd prime numbers available in $[1, N/2)$ to choose the candidate set $\Sigma$ from.
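A compact version of this selection strategy, using the simpler max-min-distance criterion of Remark 3.5 instead of the partial sums $D_\sigma$, may be sketched as follows (the helper names and the prime sieve are our own, not part of the paper's implementation):

```python
import numpy as np

def largest_primes_below(limit, K):
    """The K largest primes below `limit` (simple Eratosthenes sieve)."""
    sieve = np.ones(limit, dtype=bool)
    sieve[:2] = False
    for q in range(2, int(limit**0.5) + 1):
        if sieve[q]:
            sieve[q * q :: q] = False
    return np.flatnonzero(sieve)[-K:]

def choose_sigma(I, N, K):
    """Pick sigma among the K largest primes below N/2 that maximizes the
    minimal periodic distance within the stretched set sigma*I mod N."""
    best_sigma, best_d = 1, -1
    for sigma in largest_primes_below(N // 2, K):
        s = np.sort((sigma * I) % N)
        gaps = np.diff(np.append(s, s[0] + N))  # periodic gaps between neighbors
        if gaps.min() > best_d:
            best_sigma, best_d = int(sigma), int(gaps.min())
    return best_sigma

I = np.array([3, 17, 38, 54, 200, 513])  # example support set for N = 2**11
sigma_j = choose_sigma(I, 2**11, K=8)
```

Replacing the `gaps.min()` criterion by the two-term sum $D_\sigma$ from Algorithm 3.4 changes only the score computed inside the loop.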

Determination of $\widetilde{M}_j$
Further, we need to fix the number $\widetilde{M}_j \geq M_j$ of rows needed to ensure that the Vandermonde matrix $\mathbf{V}_{\widetilde{M}_j, M_j}(\sigma_j)$ is well conditioned. Employing Theorem 3.3, we can compute the value $\widetilde{S}^{(j)} := \max_k \widetilde{S}_k^{(j)}(\sigma_j)$ from (16). This can be done with $\mathcal{O}(M_j^2)$ operations. Now, we fix $\widetilde{M}_j$ as the smallest integer, capped by $\tau_{\max} M_j$ as in (10), such that

$$\left( \frac{\widetilde{M}_j + \widetilde{S}^{(j)}}{\widetilde{M}_j - \widetilde{S}^{(j)}} \right)^{1/2} \leq C \qquad (18)$$

is satisfied for some pre-determined bound $C > 1$ for the condition number.
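This row-number selection can be sketched as follows; the function is a hypothetical helper with the criterion and cap as assumed above, increasing $\widetilde{M}_j$ until the simplified Gershgorin bound drops below $C$:

```python
import numpy as np

def rows_needed(idx, sigma, Nj, C=5.0, tau_max=4):
    """Smallest row number M~ in [M, tau_max*M] for which the simplified
    bound sqrt((M~ + S~)/(M~ - S~)), with S~ from (16), is at most C."""
    M = len(idx)
    diff = sigma * (idx[None, :] - idx[:, None])
    den = np.abs(np.sin(np.pi * diff / Nj))
    np.fill_diagonal(den, np.inf)       # exclude the diagonal term l = k
    S = (1.0 / den).sum(axis=1).max()   # S~ = max_k S~_k(sigma), see (16)
    for Mt in range(M, tau_max * M + 1):
        if S < Mt and np.sqrt((Mt + S) / (Mt - S)) <= C:
            return Mt
    return tau_max * M                  # fall back to the cap

# four equispaced indices in Z_256: S~ is about 3.83, so one extra row suffices
print(rows_needed(np.arange(4) * 64, sigma=1, Nj=256))  # -> 5
```

Since $\widetilde{S}^{(j)}$ does not depend on $\widetilde{M}_j$, the loop only re-evaluates the scalar bound, so the extra cost over computing (16) is negligible.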
Remark 3.6. We can also use the estimate in Theorem 3.2 for determining $\widetilde{M}_j$. In this case, we simply fix $\widetilde{M}_j$ as the smallest integer with $\widetilde{M}_j > N_j/d_{\sigma_j}$ such that the bound in (13) is at most $C$.

Adaptation of the Algorithm to Problem (b)

First, we observe that the Fourier matrix satisfies the property

$$\overline{\mathbf{F}_N} = \mathbf{F}_N \mathbf{J}_N,$$

see [16], Formula (3.34), where $\mathbf{J}_N := (\delta_{(j+k) \bmod N})_{j,k=0}^{N-1}$ is the so-called flip matrix with $\mathbf{J}_N^{-1} = \mathbf{J}_N$. Here, $\delta_j$ denotes the Kronecker symbol, i.e., $\delta_j = 1$ for $j = 0$ and $\delta_j = 0$ for $j \neq 0$. Thus, the relation $\mathbf{x} = \mathbf{F}_N^{-1} \mathbf{y}$ is equivalent to

$$N \mathbf{x} = \overline{\mathbf{F}_N} \, \mathbf{y} = \mathbf{F}_N (\mathbf{J}_N \mathbf{y}).$$

In other words, if we replace the given data $\hat{\mathbf{x}}$ in Algorithm 3.1 by $N \mathbf{x}$, then $N \mathbf{x}$ is the given Fourier transform of the desired (flipped) vector $\mathbf{J}_N \mathbf{y}$, and we can apply Algorithm 3.1 directly to compute $\mathbf{y} = \hat{\mathbf{x}}$.
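This reduction of scenario (b) to scenario (a) can be verified directly; the sketch below (NumPy assumed) checks that $N\mathbf{x}$ is the DFT of the flipped vector $\mathbf{J}_N \mathbf{y}$ with $\mathbf{y} = \hat{\mathbf{x}}$:

```python
import numpy as np

N = 64
rng = np.random.default_rng(3)
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
y = np.fft.fft(x)  # y = F_N x, the vector sought in scenario (b)

# J_N y: component j equals y_{(N - j) mod N}
flip = np.roll(y[::-1], 1)
# N x = F_N (J_N y): running Algorithm 3.1 on input N*x recovers J_N y
assert np.allclose(np.fft.fft(flip), N * x)
```

After Algorithm 3.1 returns $\mathbf{J}_N \mathbf{y}$, one more flip yields $\mathbf{y} = \hat{\mathbf{x}}$ at no extra cost.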

Numerical experiments
First, we present some numerical experiments showing that the algorithm in [17] is no longer reliable for sparsity $M > 20$. We generate randomly chosen sets of support indices $I_M \subset \{0, \ldots, 2^{15} - 1\}$ with different cardinalities $M = 20, 30, \ldots, 100$ and randomly choose values $x_k$ for $k \in I_M$ in double precision arithmetic. Then we apply our Algorithm 3.1, where access to the Fourier transform of $\mathbf{x} \in \mathbb{C}^{2^J}$ is provided. While $\sigma_j$ is optimally chosen as a prime number according to Algorithm 4.5 in [17], we only consider square Vandermonde matrices (as considered in [17]), i.e., we set $\tau_{\max} = 1$. We compare the output index set $I_{\mathrm{rec}}$ with the generated set $I_M$ of indices and count the failures over 100 tests for each $M$. The results are presented in Figure 1. The test shows that the algorithm starts to become unreliable for sparsity $M > 20$.
We now run the test with the same input data as above, but use the criterion in (18) with $\tau_{\max} = 2$. For all $M = 20, 30, \ldots, 100$, no failures occur in the computed set of indices $I_{\mathrm{rec}}$, i.e., we always find $I_M = I_{\mathrm{rec}}$. Even if we run the tests for $M = 200$, the error rate is still zero.
To understand this strong effect of enlarging the number of rows of the Vandermonde matrix, we analyze the condition numbers of the Vandermonde matrices occurring in the computations for different values of $\tau_{\max}$. We generate sets $I_M$ of indices and randomly choose the amplitudes of the components of $\mathbf{x}$ with support $I_M$. For Algorithm 3.1, we provide access to the Fourier transformed vector $\hat{\mathbf{x}}$ as input, as before, for the tuples $(J, M)$ with $J = 15, 16, \ldots, 22$ and $M = 20, 30, 40, 50$. This time we study $\tau_{\max} \in \{1, 2, 5\}$.

Figure 1: Error rate in percent for the computed set of indices for $\tau_{\max} = 1$ and $J = 15$.

In each test we compute the average over all condition numbers of the used Vandermonde matrices and repeat this 20 times for each tuple $(J, M)$. Finally, we take the mean of all 20 averages and obtain the results given in Tables 1 and 2. The results in Table 1 show that a suitable choice of the parameter $\sigma_j$ alone, as applied in [17], is not sufficient to ensure moderate condition numbers of the Vandermonde matrices involved in the sparse FFT algorithm. In Table 2, we provide some further condition numbers for larger numbers $M$ of significant vector entries up to $M = 200$ and $N = 2^{15}, \ldots, 2^{22}$. The experiments show that $\tau_{\max} = 2$, i.e., doubling the number of rows of the coefficient matrix $\mathbf{A}^{(j)}$, is usually sufficient for $M \leq 100$. For $M > 100$, a larger $\tau_{\max}$ is needed. Next, we investigate how the runtime of the algorithm depends on $\tau_{\max}$. In Figure 2 we present the average runtime over 20 tests with randomly chosen sparse vectors with sparsities $M = 10, 30$ and for $\tau_{\max} = 1$, $\tau_{\max} = 5$, and $\tau_{\max} = 20$. As Figure 2 shows, our modifications have only a very small effect on the runtime. Finally, in Figure 3 we compare the runtime of the implemented FFT of length $2^J$ with our algorithm for $\tau_{\max} = 20$. We can see that our current Python implementation starts to be faster than the FFT for $M \leq 30$ and $N \geq 2^{20}$. It is available at http://na.math.unigoettingen.de/index.php?section=gruppe&subsection=software.
