Sparse Harmonic Transforms: A New Class of Sublinear-Time Algorithms for Learning Functions of Many Variables

Foundations of Computational Mathematics

Abstract

In this paper we develop fast and memory-efficient numerical methods for learning functions of many variables that admit sparse representations in terms of general bounded orthonormal tensor product bases. Such functions appear in many applications including, e.g., various Uncertainty Quantification (UQ) problems involving the solution of parametric PDEs that are approximately sparse in Chebyshev or Legendre product bases (Chkifa et al. in Polynomial approximation via compressed sensing of high-dimensional functions on lower sets. arXiv:1602.05823, 2016; Rauhut and Schwab in Math Comput 86(304):661–700, 2017). We expect that our results provide a starting point for a new line of research on sublinear-time solution techniques for UQ applications of the type above, which will eventually be able to scale to significantly higher-dimensional problems than are currently computationally feasible. More concretely, let \({\mathcal {B}}\) be a finite Bounded Orthonormal Product Basis (BOPB) of cardinality \(|{\mathcal {B}}| = N\). Herein we will develop methods that rapidly approximate any function f that is sparse in the BOPB, that is, \(f: {\mathcal {D}} \subset {\mathbb {R}}^D \rightarrow {\mathbb {C}}\) of the form

$$\begin{aligned} f(\varvec{x}) = \sum _{b \in {\mathcal {S}}} c_b \cdot b(\varvec{x}) \end{aligned}$$

with \({\mathcal {S}} \subset {\mathcal {B}}\) of cardinality \(|{\mathcal {S}}| = s \ll N\). Our method adapts the CoSaMP algorithm (Needell and Tropp in Appl Comput Harmon Anal 26(3):301–321, 2009) to use additional function samples from f along a randomly constructed grid \({\mathcal {G}} \subset {\mathbb {R}}^D\) with universal approximation properties in order to rapidly identify the multi-indices of the most dominant basis functions in \({\mathcal {S}}\) component by component during each CoSaMP iteration. It has a runtime of just \((s \log N)^{{\mathcal {O}}(1)}\), uses only \((s \log N)^{{\mathcal {O}}(1)}\) function evaluations on the fixed and nonadaptive grid \({\mathcal {G}}\), and requires no more than \((s \log N)^{{\mathcal {O}}(1)}\) bits of memory. We emphasize that nothing about \({\mathcal {S}}\) or any of the coefficients \(c_b \in {\mathbb {C}}\) is assumed in advance other than that \({\mathcal {S}} \subset {\mathcal {B}}\) has \(|{\mathcal {S}}| \le s\). Both \({\mathcal {S}}\) and its related coefficients \(c_b\) will be learned from the given function evaluations by the developed method. For \(s\ll N\), the runtime \((s \log N)^{{\mathcal {O}}(1)}\) will be less than what is required to simply enumerate the elements of the basis \({\mathcal {B}}\); thus our method is the first approach applicable in a general BOPB framework that falls into the class referred to as sublinear-time. This and the similarly reduced sample and memory requirements set our algorithm apart from previous works based on standard compressive sensing algorithms such as basis pursuit, which typically store and utilize full intermediate basis representations of size \(\varOmega (N)\) during the solution process.
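
To make the sparsity model concrete, the following minimal Python sketch (purely illustrative; it is not the authors' implementation, and the choice of a Chebyshev product basis, the support set, and all variable names are our own assumptions) evaluates a function of the form above that is s-sparse in a tensor product basis with \(N = M^D\) elements.

```python
# Illustrative sketch only: an s-sparse function in a Chebyshev product basis,
# matching f(x) = sum_{b in S} c_b * b(x) from the abstract. Normalization
# constants needed for exact orthonormality are omitted for brevity.
import numpy as np

def chebyshev_product(n, x):
    """b(x) = T_{n_1}(x_1) * ... * T_{n_D}(x_D), with T_k(t) = cos(k*arccos(t))."""
    n, x = np.asarray(n), np.asarray(x)
    return np.prod(np.cos(n * np.arccos(x)))

def sparse_f(support, coeffs, x):
    """f(x) = sum_{b in S} c_b * b(x), where |S| = s << N = M**D."""
    return sum(c * chebyshev_product(n, x) for n, c in zip(support, coeffs))

rng = np.random.default_rng(0)
D, M, s = 10, 8, 3                           # N = 8**10 > 10^9 basis functions
support = [tuple(rng.integers(0, M, D)) for _ in range(s)]
coeffs = rng.standard_normal(s) + 1j * rng.standard_normal(s)
x = rng.uniform(-1.0, 1.0, D)                # one sample point in [-1, 1]^D
print(sparse_f(support, coeffs, x))          # a single noiseless evaluation of f
```

An algorithm in the regime of the abstract must recover `support` and `coeffs` from only \((s \log N)^{{\mathcal {O}}(1)}\) such evaluations, far fewer than the \(N\) coefficients a dense representation would require.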


Notes

  1. Additionally, we will occasionally assume that the total grid size \(|{\mathcal {G}}|\) below always satisfies \(|{\mathcal {G}}| \le N^c\) for some absolute constant \(c \ge 1\) in order to simplify some of the logarithmic factors appearing in our big-O notation. This will certainly always be the case for any commonly used (trigonometric) polynomial BOPB (such as Fourier and Chebyshev product bases) whenever \(sKDM < N\).

  2. Here we note that preconditioning and well-chosen sampling distributions are crucial for many BOP bases. For example, the BOS constant for the standard Legendre basis is \(K = \sqrt{2M+1}^D\), which implies that a naive application of Theorem 1 may require more than \(M^D\) (or \(M^d\)) samples. However, preconditioning can effectively reduce this BOS constant to \(K = \sqrt{3}^d\) in practice [56] (a one-dimensional sketch of this preconditioning follows these notes).

  3. Though the resulting \({\mathcal {O}}\left( s^5 D^2 d^4 {{\mathrm{polylog}}}(MDs\Vert \varvec{c}\Vert _2/\eta p) \right) \)-runtime achieved by Corollary 1 for the multidimensional Fourier basis is strictly worse than the best existing noise robust and deterministic sublinear-time results for that basis [40] (except perhaps when \(s^3 d^4 \ll D^2\)), we emphasize that it is achieved with a different and significantly less specialized grid \({\mathcal {G}}\) herein.

  4. Best s-term approximation guarantees imply the exact recovery of exactly s-sparse functions.

  5. The \(\min \left\{ d - \Vert {\widetilde{n}} \Vert _0, D-1 \right\} \) in the exponent of the M in (9) handles the case when \(d = D\) and \({\widetilde{n}} = 0\).

  6. When \(j = D-1\) the vector \(\varvec{z}_{j,k}\) is interpreted as a null vector satisfying \((\varvec{w}_{j,\ell },\varvec{z}_{j,k}) = \varvec{w}_{j,\ell }\) \(\forall (\ell , k)\).

  7. In practice, it suffices to approximate the least-squares solution \(b_T\) by an iterative least-squares approach such as Richardson’s iteration or conjugate gradient [17], since computing the exact least-squares solution can be expensive when s is large. The argument of [50] shows that it is enough to take three iterations of Richardson’s iteration or conjugate gradient if the initial condition is set to \(\varvec{a}^{t}\) and \(\varPhi \) has an RIP constant \(\delta _{2s}<0.025\). In fact, both of these methods have similar runtime performance (see the Richardson sketch following these notes).

  8. Note that we are generally assuming herein that \(2s < M\). In the event that \(2s \ge M\) one can proceed in at least two different ways. The first way is to not change anything, and to simply be at peace with the possibility of, e.g., occasionally returning \({\mathcal {N}}_j = [M]\). This is our default approach. The second way is to regroup the first \(g \in {\mathbb {N}}\) variables of f together into a new collective “first variable”, the second g variables together into a new collective “second variable”, etc., for some g satisfying \(M^g > 2s\). After such a regrouping the algorithm can again effectively be run as is with respect to these new collective variables (an index-regrouping sketch follows these notes).

  9. Recall that \({\mathcal {L}}'\) represents the maximum number of function evaluations one needs in order to compute \(\langle g, T_{j;{\widetilde{n}}} \rangle \) for all \({\widetilde{n}} \in [M]\) in \({\mathcal {O}}({\mathcal {L}})\)-time for any given \(j \in [D]\), and s-sparse \(g: {\mathcal {D}}_j \rightarrow {\mathbb {C}}\) in \(\mathrm{span} \left\{ T_{j;m}~\big |~m\in [M] \right\} \).

  10. In this simple example we can of course simply estimate the energy for all 18 possible index vectors. The three true index vectors in the support of \(\varvec{r}\) with nonzero energy would then be discovered and all would be well. However, this naive approach becomes spectacularly inefficient for larger \(D \gg 3\).

  11. In less optimal settings one should keep in mind that Algorithm 3 only finds the most energetic entries in general, so that \({\mathcal {P}}\supset \left\{ \varvec{n} ~\big |~ |{r}_{\varvec{n}}|^2 \ge \frac{\Vert \varvec{r}\Vert _2^2}{\alpha ^2 s} \right\} \) for a given \(\alpha >1\). This is why we need to apply it iteratively.

  12. In fact, this is unsurprising given that similar dimension-incremental strategies have been proposed as far back as the 1970s in work related to recovering sparse algebraic polynomials [65].

  13. The \(\mathcal {{\tilde{O}}}\) complexity notation here neglects all logarithmic factors while simultaneously holding D, K, d, \(\eta \), \(\Vert \mathbf{c}_f \Vert _2\) and \({\mathcal {L}}\) constant.

  14. CoSaMP always uses only \(m = {\mathcal {O}}(s \cdot D \log M)\) samples in the experiments herein, which means that its measurement matrix’s conjugate transpose, \(\varPhi ^* \in {\mathbb {C}}^{M^D \times m}\), can be naively multiplied by vectors in only \({\mathcal {O}}(s \cdot D \log M \cdot M^D)\)-time. When s is small this is comparable to the \({\mathcal {O}}( D \log M \cdot M^D)\) runtime complexity of a (nonuniform) FFT.
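
As promised in note 2, here is a hedged one-dimensional sketch of the Legendre preconditioning of [56]; it is our own illustration, not code from this paper. Weighting the orthonormal Legendre polynomials \(L_n = \sqrt{2n+1}\,P_n\) by \(w(x) = \sqrt{\pi /2}\,(1-x^2)^{1/4}\) yields a system that is orthonormal with respect to the Chebyshev measure and whose sup-norm stays below \(\sqrt{3}\) for every n, instead of growing like \(\sqrt{2n+1}\).

```python
# Our illustration of the preconditioning from [56] (Rauhut & Ward), 1-D case.
import numpy as np
from numpy.polynomial.legendre import Legendre

def preconditioned_legendre(n, x):
    """Q_n(x) = sqrt(pi/2) * (1 - x^2)**0.25 * L_n(x), where L_n = sqrt(2n+1) * P_n
    is the Legendre polynomial orthonormalized w.r.t. the measure dx/2 on [-1, 1]."""
    L_n = np.sqrt(2 * n + 1) * Legendre.basis(n)(x)
    return np.sqrt(np.pi / 2) * (1 - x**2) ** 0.25 * L_n

# |L_n| peaks at sqrt(2n+1) (at x = 1); the weighted system stays below sqrt(3).
x = np.linspace(-1 + 1e-12, 1 - 1e-12, 200_001)
for n in (1, 10, 100):
    print(n, np.abs(preconditioned_legendre(n, x)).max())   # all < 1.7320...
# Drawing x = cos(pi * U) with U ~ Uniform[0, 1] samples the matching Chebyshev measure.
```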
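
The next minimal sketch illustrates the inexact least-squares step described in note 7; it is our own toy code (the matrix, sizes, and names are hypothetical), not the authors' implementation. Each Richardson step is a unit-stepsize gradient step for \(\Vert \varPhi _T b - y\Vert _2^2\), which contracts rapidly when \(\varPhi _T^* \varPhi _T\) is close to the identity, exactly what a small RIP constant guarantees.

```python
# Toy sketch of Richardson's iteration for the restricted least-squares problem.
import numpy as np

def richardson_lsq(Phi_T, y, b0, iters=3):
    """Approximate argmin_b ||Phi_T @ b - y||_2, warm-started at b0
    (e.g., the previous CoSaMP iterate restricted to the support T)."""
    b = np.asarray(b0, dtype=complex).copy()
    for _ in range(iters):
        b += Phi_T.conj().T @ (y - Phi_T @ b)   # unit-step gradient step
    return b

# Usage with a random matrix that is well conditioned on 2s columns.
rng = np.random.default_rng(1)
m, s = 200, 5
Phi_T = rng.standard_normal((m, 2 * s)) / np.sqrt(m)   # columns ~ unit norm
b_true = rng.standard_normal(2 * s)
y = Phi_T @ b_true
for iters in (1, 3, 10):                               # error shrinks geometrically
    err = np.linalg.norm(richardson_lsq(Phi_T, y, np.zeros(2 * s), iters) - b_true)
    print(iters, err)
```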
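
Finally, to illustrate the regrouping trick from note 8, here is a hypothetical helper (not from the paper) that collapses each block of g coordinates of a multi-index in \([M]^D\) into a single index in \([M^g]\), so that the algorithm sees D/g collective variables with alphabet size \(M^g > 2s\).

```python
# Hypothetical index-regrouping helper illustrating note 8.
import numpy as np

def regroup_index(n, M, g):
    """Map a multi-index n in [M]^D to one in [M**g]^(D//g) by reading each
    length-g block of n as the base-M digits of a single collective index.
    Assumes g divides D; conceptually, pad with trivial variables otherwise."""
    n = np.asarray(n)
    assert n.size % g == 0
    return n.reshape(-1, g) @ (M ** np.arange(g))

# In the note's regime: M = 2 and s = 3 give 2s >= M, but g = 3 restores
# a collective alphabet of size M**g = 8 > 2s = 6.
print(regroup_index([1, 0, 1, 0, 1, 1], M=2, g=3))   # -> [5 6]
```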

References

  1. B. Adcock. Infinite-dimensional \(\ell ^{1}\) minimization and function approximation from pointwise data. Constr. Approx., 45(3):345–390, 2017.

  2. B. Adcock, S. Brugiapaglia, and C. G. Webster. Compressed Sensing Approaches for Polynomial Approximation of High-Dimensional Functions, pages 93–124. Springer International Publishing, Cham, 2017.

  3. R. Arratia and L. Gordon. Tutorial on large deviations for the binomial distribution. Bull. Math. Biol., 51(1):125–131, 1989.

  4. J. Bailey, M. A. Iwen, and C. V. Spencer. On the design of deterministic matrices for fast recovery of Fourier compressible functions. SIAM J. Matrix Anal. Appl., 33(1):263–289, 2012.

  5. S. Bittens, R. Zhang, and M. A. Iwen. A deterministic sparse FFT for functions with structured Fourier sparsity. Adv. Comput. Math., to appear.

  6. J.-L. Bouchot, H. Rauhut, and C. Schwab. Multi-level Compressed Sensing Petrov-Galerkin discretization of high-dimensional parametric PDEs. ArXiv e-prints, Jan. 2017.

  7. H.-J. Bungartz and M. Griebel. Sparse grids. Acta Numer., 13:147–269, 2004.

  8. R. E. Caflisch. Monte Carlo and quasi-Monte Carlo methods. Acta Numer., 7:1–49, 1998.

  9. E. J. Candès, J. Romberg, and T. Tao. Stable signal recovery from incomplete and inaccurate measurements. Comm. Pure Appl. Math., 59:1207–1223, 2006.

  10. A. Chkifa, N. Dexter, H. Tran, and C. G. Webster. Polynomial approximation via compressed sensing of high-dimensional functions on lower sets. arXiv preprint arXiv:1602.05823, 2016.

  11. B. Choi, A. Christlieb, and Y. Wang. High-dimensional sublinear sparse Fourier algorithm. Numer. Algorithms, to appear, 2020.

  12. B. Choi, A. Christlieb, and Y. Wang. Multiscale high-dimensional sparse Fourier algorithms for noisy data. Mathematics, Computation and Geometry of Data, to appear, 2020.

  13. B. Choi and M. Iwen. SHT: Sparse harmonic transforms for learning functions of many variables. https://math.msu.edu/markiwen/Code.html, Aug. 2018.

  14. B. Choi, M. Iwen, and T. Volkmer. Sparse harmonic transforms II: Best \(s\)-term approximation guarantees for bounded orthonormal product bases in sublinear time. arXiv:1909.09564, 2019.

  15. A. Christlieb, D. Lawlor, and Y. Wang. A multiscale sub-linear time Fourier algorithm for noisy data. Appl. Comput. Harmon. Anal., 40:553–574, 2016.

  16. A. Cohen, W. Dahmen, and R. DeVore. Compressed sensing and best \(k\)-term approximation. J. Amer. Math. Soc., 22(1):211–231, 2009.

  17. G. Dahlquist and Å. Björck. Numerical Methods in Scientific Computing: Volume 1. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2008.

  18. I. Daubechies, M. Defrise, and C. De Mol. An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Comm. Pure Appl. Math., 57(11):1413–1457, 2004.

  19. R. DeVore, G. Petrova, and P. Wojtaszczyk. Approximation of functions of few variables in high dimensions. Constr. Approx., 33(1):125–143, 2011.

  20. D. L. Donoho. Compressed sensing. IEEE Trans. Inform. Theory, 52(4):1289–1306, 2006.

  21. D. Dũng, V. N. Temlyakov, and T. Ullrich. Hyperbolic cross approximation. arXiv preprint arXiv:1601.03978, 2016.

  22. S. Foucart. Hard thresholding pursuit: an algorithm for compressive sensing. SIAM J. Numer. Anal., 49(6):2543–2563, 2011.

  23. S. Foucart and H. Rauhut. A mathematical introduction to compressive sensing. Springer, Berlin, 2013.

  24. A. Gilbert, Y. Li, E. Porat, and M. Strauss. Approximate sparse recovery: Optimizing time and measurements. SIAM J. Comput., 41(2):436–453, 2012.

  25. A. Gilbert, M. Strauss, J. Tropp, and R. Vershynin. Sublinear approximation of compressible signals. Proc. SPIE Intell. Integrated Microsystems (IIM), page 623, 2006.

  26. A. C. Gilbert, P. Indyk, M. Iwen, and L. Schmidt. Recent developments in the sparse Fourier transform: a compressed Fourier transform for big data. IEEE Signal Processing Magazine, 31(5):91–100, 2014.

  27. A. C. Gilbert, M. A. Iwen, and M. J. Strauss. Group testing and sparse signal recovery. In 42nd Asilomar Conference on Signals, Systems, and Computers, 2008.

  28. A. C. Gilbert, Y. Li, E. Porat, and M. J. Strauss. For-all sparse recovery in near-optimal time. ACM Trans. Algorithms, 13(3):32:1–32:26, Mar. 2017.

  29. A. C. Gilbert, S. Muthukrishnan, and M. Strauss. Improved time bounds for near-optimal sparse Fourier representations. In Proceedings of SPIE, volume 5914, page 59141A, 2005.

  30. A. C. Gilbert, M. J. Strauss, J. A. Tropp, and R. Vershynin. One sketch for all: Fast algorithms for compressed sensing. In Proceedings of the Thirty-ninth Annual ACM Symposium on Theory of Computing, STOC ’07, pages 237–246, New York, NY, USA, 2007. ACM.

  31. L. Greengard and J.-Y. Lee. Accelerating the nonuniform fast Fourier transform. SIAM Rev., 46(3):443–454, 2004.

  32. C. Gross, M. A. Iwen, L. Kämmerer, and T. Volkmer. A deterministic algorithm for constructing multiple rank-1 lattices of near-optimal size. arXiv:2003.09753, 2020.

  33. H. Hassanieh, P. Indyk, D. Katabi, and E. Price. Simple and practical algorithm for sparse Fourier transform. In Proceedings of the twenty-third annual ACM-SIAM symposium on Discrete Algorithms, pages 1183–1194. Society for Industrial and Applied Mathematics, 2012.

  34. A. Hinrichs, E. Novak, M. Ullrich, and H. Woźniakowski. The curse of dimensionality for numerical integration of smooth functions. Math. Comp., 83(290):2853–2863, 2014.

  35. X. Hu, M. Iwen, and H. Kim. Rapidly computing sparse Legendre expansions via sparse Fourier transforms. Numer. Algorithms, pages 1–31, 2015.

  36. P. Indyk and M. Kapralov. Sparse Fourier transform in any constant dimension with nearly-optimal sample complexity in sublinear time. 2014.

  37. M. Iwen, A. Gilbert, and M. Strauss. Empirical evaluation of a sub-linear time sparse DFT algorithm. Commun. Math. Sci., 5(4):981–998, 2007.

  38. M. A. Iwen. A deterministic sub-linear time sparse Fourier algorithm via non-adaptive compressed sensing methods. In Proceedings of the nineteenth annual ACM-SIAM symposium on Discrete algorithms, pages 20–29. Society for Industrial and Applied Mathematics, 2008.

  39. M. A. Iwen. Combinatorial sublinear-time Fourier algorithms. Found. Comput. Math., 10(3):303–338, 2010.

  40. M. A. Iwen. Improved approximation guarantees for sublinear-time Fourier algorithms. Appl. Comput. Harmon. Anal., 34(1):57–82, 2013.

  41. M. A. Iwen. Compressed sensing with sparse binary matrices: Instance optimal error guarantees in near-optimal time. J. Complexity, 30(1):1–15, 2014.

  42. L. Kämmerer, D. Potts, and T. Volkmer. High-dimensional sparse FFT based on sampling along multiple rank-1 lattices. arXiv preprint arXiv:1711.05152, 2017.

  43. M. Kapralov. Sparse Fourier transform in any constant dimension with nearly-optimal sample complexity in sublinear time. arXiv:1604.00845, 2016.

  44. F. Krahmer and R. Ward. Stable and robust sampling strategies for compressive imaging. IEEE Trans. Image Process., 23(2):612–622, 2014.

  45. F. Y. Kuo, G. Migliorati, F. Nobile, and D. Nuyens. Function integration, reconstruction and approximation using rank-1 lattices. arXiv:1908.01178, 2019.

  46. G. Leobacher and F. Pillichshammer. Introduction to Quasi-Monte Carlo Integration and Applications. Compact Textbooks in Mathematics. Springer International Publishing, 2014.

  47. S. Merhi, R. Zhang, M. A. Iwen, and A. Christlieb. A new class of fully discrete sparse Fourier transforms: Faster stable implementations with guarantees. J. Fourier Anal. Appl., https://doi.org/10.1007/s00041-018-9616-4, 2018.

  48. L. Morotti. Explicit universal sampling sets in finite vector spaces. Appl. Comput. Harmon. Anal., 43(2):354–369, 2017.

  49. R. Motwani and P. Raghavan. Randomized Algorithms. Cambridge University Press, Cambridge, 1995.

  50. D. Needell and J. A. Tropp. CoSaMP: Iterative signal recovery from incomplete and inaccurate samples. Appl. Comput. Harmon. Anal., 26(3):301–321, 2009.

  51. D. Needell and R. Vershynin. Signal recovery from incomplete and inaccurate measurements via regularized orthogonal matching pursuit. IEEE Journal of selected topics in signal processing, 4(2):310–316, 2010.

  52. D. Potts and T. Volkmer. Sparse high-dimensional FFT based on rank-1 lattice sampling. Appl. Comput. Harmon. Anal., 41(3):713–748, 2016.

  53. D. Potts and T. Volkmer. Multivariate sparse FFT based on rank-1 Chebyshev lattice sampling. In Sampling Theory and Applications (SampTA), 2017 International Conference on, pages 504–508. IEEE, 2017.

  54. H. Rauhut. Random sampling of sparse trigonometric polynomials. Appl. Comput. Harmon. Anal., 22(1):16–42, 2007.

  55. H. Rauhut and C. Schwab. Compressive sensing Petrov-Galerkin approximation of high-dimensional parametric operator equations. Math. Comp., 86(304):661–700, 2017.

  56. H. Rauhut and R. Ward. Sparse Legendre expansions via \(\ell _1\)-minimization. J. Approx. Theory, 164(5):517–533, 2012.

  57. C. Schwab and R. A. Todor. Karhunen–Loève approximation of random fields by generalized fast multipole methods. J. Comput. Phys., 217(1):100–122, 2006.

  58. I. Segal and M. Iwen. Improved sparse Fourier approximation results: Faster implementations and stronger guarantees. Numer. Algorithms, 63:239–263, 2013.

  59. J. Shen and L.-L. Wang. Sparse spectral approximations of high-dimensional problems based on hyperbolic cross. SIAM J. Numer. Anal., 48(3):1087–1109, 2010.

  60. R. C. Smith. Uncertainty Quantification: Theory, Implementation, and Applications. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2013.

  61. R. Vershynin. Introduction to the non-asymptotic analysis of random matrices. arXiv:1011.3027v7, 2011.

  62. T. Volkmer. Multivariate approximation and high-dimensional sparse FFT based on rank-1 lattice sampling. Dissertation (PhD thesis), Faculty of Mathematics, Technische Universität Chemnitz (Chemnitz University of Technology), 2017.

  63. D. Xiu. Numerical Methods for Stochastic Computations: A Spectral Method Approach. Princeton University Press, Princeton, NJ, USA, 2010.

  64. T. Zhang. Sparse recovery with orthogonal matching pursuit under RIP. IEEE Trans. Inform. Theory, 57(9):6215–6221, 2011.

  65. R. Zippel. Probabilistic algorithms for sparse polynomials. In International symposium on symbolic and algebraic manipulation, pages 216–226. Springer, 1979.

Acknowledgements

The authors thank Holger Rauhut for fruitful discussions on the topic. Both Mark Iwen and Felix Krahmer acknowledge support by the TUM August–Wilhelm–Scheer (AWS) Visiting Professor Program that allowed for the initiation of this project.

Author information

Correspondence to Bosu Choi.

Additional information

Communicated by Francis Bach.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Mark Iwen was supported in part by NSF DMS-1416752 and NSF CCF-1615489. Bosu Choi was supported in part by NSF DMS-1416752. Felix Krahmer was supported in part by the German Science foundation in the context of the Emmy Noether junior research group KR 4512/1-1.

Cite this article

Choi, B., Iwen, M.A. & Krahmer, F. Sparse Harmonic Transforms: A New Class of Sublinear-Time Algorithms for Learning Functions of Many Variables. Found Comput Math 21, 275–329 (2021). https://doi.org/10.1007/s10208-020-09462-z
