Spatially-Dimension-Adaptive Sparse Grids for Online Learning

  • Valeriy Khakhutskyy
  • Markus Hegland
Conference paper
Part of the Lecture Notes in Computational Science and Engineering book series (LNCSE, volume 109)


This paper takes a new look at regression with adaptive sparse grids. Considering sparse grid refinement as an optimisation problem, we show that it is in fact an instance of submodular optimisation with a cardinality constraint. Hence, we can directly apply results from combinatorial optimisation research on submodular maximisation to the grid refinement problem. Based on these results, we derive an efficient refinement indicator that allows the selection of new grid indices with finer granularity than was previously possible. We then implement the resulting refinement procedure using averaged stochastic gradient descent, a method commonly used in online learning. The result is a new method for training adaptive sparse grid models. We show, for both synthetic and real-life data, that the resulting models exhibit lower complexity and higher predictive power than current state-of-the-art methods.
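The submodular view of refinement mentioned above can be illustrated with the classic greedy pattern: repeatedly add the candidate with the largest marginal gain until a cardinality budget is exhausted. The sketch below is illustrative only; the `coverage` function is a toy stand-in for the paper's actual refinement indicator, and all names are hypothetical.

```python
import math

def greedy_select(candidates, f, k):
    """Greedily pick k elements, each maximizing the marginal gain of f."""
    selected = []
    remaining = list(candidates)
    for _ in range(k):
        base = f(selected)
        best, best_gain = None, -math.inf
        for c in remaining:
            gain = f(selected + [c]) - base   # marginal gain of adding c
            if gain > best_gain:
                best, best_gain = c, gain
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy monotone submodular function: set cover over a small universe.
subsets = {'a': {1, 2, 3}, 'b': {3, 4}, 'c': {4, 5, 6}, 'd': {1, 6}}

def coverage(selection):
    covered = set()
    for name in selection:
        covered |= subsets[name]
    return len(covered)

picks = greedy_select(['a', 'b', 'c', 'd'], coverage, k=2)
# picks == ['a', 'c']: 'a' covers 3 new elements, then 'c' covers 3 more.
```

For monotone submodular objectives, this greedy scheme carries the well-known (1 − 1/e) approximation guarantee of Nemhauser, Wolsey, and Fisher (1978), which is what makes the reduction useful for grid refinement.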
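The training scheme referred to in the abstract, averaged stochastic gradient descent, combines plain SGD iterates with a Polyak–Ruppert running average. A minimal sketch for a one-dimensional least-squares problem follows; the model, step size, and synthetic data are assumptions for illustration, not the authors' setup.

```python
import random

random.seed(0)
w_true = 3.0       # ground-truth coefficient of the synthetic model
eta = 0.1          # constant step size
w = 0.0            # current SGD iterate
w_avg = 0.0        # Polyak-Ruppert running average of the iterates

for t in range(5000):
    x = random.gauss(0.0, 1.0)
    y = w_true * x + 0.01 * random.gauss(0.0, 1.0)  # noisy observation
    grad = (w * x - y) * x          # gradient of 0.5 * (w*x - y)**2
    w -= eta * grad                 # plain SGD step
    w_avg += (w - w_avg) / (t + 1)  # incremental average of iterates
```

The averaged iterate `w_avg` smooths out the fluctuations of the raw SGD iterate and, under standard conditions, attains better asymptotic convergence (Polyak and Juditsky, 1992), which is why it suits the online setting considered here.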


Keywords: Conjugate Gradient · Gradient Descent · Sparse Grid · Marginal Gain · Stochastic Gradient Descent



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. Technische Universität München, München, Germany
  2. The Australian National University, Canberra, Australia
