Constructive Approximation, Volume 50, Issue 3, pp 403–455

Learning General Sparse Additive Models from Point Queries in High Dimensions

  • Hemant Tyagi
  • Jan Vybiral


We consider the problem of learning a d-variate function f defined on the cube \([-1,1]^d\subset \mathbb {R}^d\), where the algorithm is assumed to have black-box access to samples of f within this domain. Let \({\mathcal {S}}_r \subset \binom{[d]}{r}\), \(r=1,\dots ,r_0\), be sets consisting of unknown r-wise interactions amongst the coordinate variables. We then focus on the setting where f has an additive structure; i.e., it can be represented as
$$\begin{aligned} f = \sum _{{\mathbf {j}}\in {\mathcal {S}}_1} \phi _{{\mathbf {j}}} + \sum _{{\mathbf {j}}\in {\mathcal {S}}_2} \phi _{{\mathbf {j}}} + \dots + \sum _{{\mathbf {j}}\in {\mathcal {S}}_{r_0}} \phi _{{\mathbf {j}}}, \end{aligned}$$
where each \(\phi _{{\mathbf {j}}}\) with \({\mathbf {j}}\in {\mathcal {S}}_r\) is at most r-variate for \(1 \le r \le r_0\). We derive randomized algorithms that query f at a carefully constructed set of points and exactly recover each \({\mathcal {S}}_r\) with high probability. In contrast to previous work, our analysis does not rely on numerical approximation of derivatives by finite order differences.
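
To make the model concrete, the following is a minimal sketch of a sparse additive function with black-box point-query access, assuming \(d = 10\) and \(r_0 = 2\). The index sets S1 and S2 and the component functions in phi are hypothetical choices made only for illustration; they are not taken from the paper, and the sketch does not implement the recovery algorithms themselves.

```python
import math

# Assumed toy setting: d = 10 coordinate variables, interactions up to order r_0 = 2.
d = 10

# Hypothetical index sets (0-based): S1 holds univariate terms, S2 bivariate terms.
S1 = [(0,), (3,)]
S2 = [(1, 4)]

# Hypothetical components phi_j; each one sees only the coordinates listed in j.
phi = {
    (0,): lambda t: math.sin(t[0]),
    (3,): lambda t: t[0] ** 2,
    (1, 4): lambda t: t[0] * t[1],
}

def f(x):
    """Black-box point query: returns the sum of phi_j(x_j) over j in S1 and S2."""
    if len(x) != d or any(abs(xi) > 1.0 for xi in x):
        raise ValueError("query point must lie in the cube [-1, 1]^d")
    return sum(phi[j]([x[i] for i in j]) for j in S1 + S2)

# A learner only sees values f(x) at points it chooses; S1 and S2 are hidden from it.
x = [0.5] * d
y = list(x)
y[7] = -0.9  # coordinate 7 belongs to no index set in this toy example
print(f(x), f(y))  # the two values coincide because coordinate 7 is inactive
```

In this toy instance only coordinates 0, 1, 3 and 4 influence f, which is the kind of sparsity the sets \({\mathcal {S}}_r\) encode.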


Keywords

Sparse additive models, Sampling, Hash functions, Sparse recovery

Mathematics Subject Classification

41A25 41A63 65D15 




Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. INRIA Lille - Nord Europe, Villeneuve d’Ascq, France
  2. Department of Mathematics FNSPE, Czech Technical University in Prague, Prague, Czech Republic
