Learning General Sparse Additive Models from Point Queries in High Dimensions
Abstract
We consider the problem of learning a d-variate function f defined on the cube \([-1,1]^d\subset \mathbb {R}^d\), where the algorithm is assumed to have black-box access to samples of f within this domain. Let \({\mathcal {S}}_r \subset {[d] \atopwithdelims ()r}\), \(r=1,\dots ,r_0\), be sets consisting of unknown r-wise interactions among the coordinate variables. We then focus on the setting where f has an additive structure, i.e., it can be represented as
$$\begin{aligned} f = \sum _{{\mathbf {j}}\in {\mathcal {S}}_1} \phi _{{\mathbf {j}}} + \sum _{{\mathbf {j}}\in {\mathcal {S}}_2} \phi _{{\mathbf {j}}} + \dots + \sum _{{\mathbf {j}}\in {\mathcal {S}}_{r_0}} \phi _{{\mathbf {j}}}, \end{aligned}$$
where each \(\phi _{{\mathbf {j}}}\), \({\mathbf {j}}\in {\mathcal {S}}_r\), is at most r-variate for \(1 \le r \le r_0\). We derive randomized algorithms that query f at a carefully constructed set of points and exactly recover each \({\mathcal {S}}_r\) with high probability. In contrast to previous work, our analysis does not rely on numerical approximation of derivatives by finite-order differences.
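To make the query model concrete, the following is a minimal toy sketch in Python, not the paper's algorithm. It builds a small instance of the additive model above, with \(d = 100\), \({\mathcal {S}}_1 = \{(3)\}\), and \({\mathcal {S}}_2 = \{(7,42)\}\), and uses random point queries to flag which coordinates are active. The component functions, `n_queries`, and `tol` are all illustrative assumptions.

```python
# Toy sketch (illustrative only): a sparse additive model with one
# univariate and one bivariate component, queried as a black box.
import numpy as np

rng = np.random.default_rng(0)
d = 100  # ambient dimension

def f(x):
    """f = phi_{(3)}(x_3) + phi_{(7,42)}(x_7, x_42); all other coordinates are inactive."""
    return np.sin(np.pi * x[3]) + x[7] * np.cos(x[42])

def active_coordinates(f, d, n_queries=20, tol=1e-8):
    """Flag coordinate i as active if resampling x_i ever changes f(x)."""
    active = []
    for i in range(d):
        for _ in range(n_queries):
            x = rng.uniform(-1.0, 1.0, size=d)
            y = x.copy()
            y[i] = rng.uniform(-1.0, 1.0)  # perturb only coordinate i
            if abs(f(x) - f(y)) > tol:
                active.append(i)
                break
    return active

print(active_coordinates(f, d))  # expected output: [3, 7, 42]
```

Note that this naive test only recovers the union of the \({\mathcal {S}}_r\): it cannot tell whether coordinates 7 and 42 interact within a single bivariate component or contribute through two separate univariate ones, and it spends queries on every coordinate. Recovering the interaction structure itself, with far fewer queries, is what the carefully constructed query sets in the paper are designed for.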
Keywords
Sparse additive models · Sampling · Hash functions · Sparse recovery

Mathematics Subject Classification
41A25 · 41A63 · 65D15
Copyright information
© Springer Science+Business Media, LLC, part of Springer Nature 2019