Alternating Direction Method of Multipliers for Hierarchical Basis Approximators

  • Valeriy Khakhutskyy
  • Dirk Pflüger
Conference paper
Part of the Lecture Notes in Computational Science and Engineering book series (LNCSE, volume 97)

Abstract

Sparse grids have been successfully used for the mining of vast datasets with a moderate number of dimensions. Compared to established machine learning techniques like artificial neural networks or support vector machines, sparse grids provide an analytic approximant that is easier to analyze and to interpret. More importantly, they are based on a high-dimensional discretization of the feature space, are thus less data-dependent than conventional approaches, scale only linearly in the number of data points, and are well suited to deal with huge amounts of data. But as the datasets used for learning grow, computing times can become prohibitively large for normal use, despite the linear scaling. Thus, efficient parallelization strategies have to be found to exploit the power of modern hardware. We investigate the parallelization opportunities for solving high-dimensional machine learning problems with adaptive sparse grids using the alternating direction method of multipliers (ADMM). ADMM allows us to split the initially large problem into smaller ones. These can then be solved in parallel, and their reduced sizes can even be small enough for an explicit assembly of the system matrices. We show first results of the new approach on a set of problems and discuss the challenges that arise when applying ADMM to a hierarchical basis.
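
The data splitting described above can be illustrated with a small sketch in Python and NumPy. It is not the authors' implementation, and it rests on several assumptions. A one-dimensional hierarchical hat basis stands in for the adaptive d-dimensional sparse grids used in the paper. The loss is squared error with an identity regularization operator, and the penalty parameter `rho` is fixed. The names `hat_basis_1d` and `consensus_admm` are hypothetical. The data are split into blocks, and each block explicitly assembles its own small system matrix and solves its local problem; this is the step that could run in parallel. A consensus variable `z` then couples the blocks, following the standard scaled global-consensus form of ADMM.

```python
import numpy as np


def hat_basis_1d(x, max_level):
    """Evaluate all 1D hierarchical hat functions up to max_level at the points x.

    Level l uses the odd indices i = 1, 3, ..., 2**l - 1 with
    phi_{l,i}(x) = max(0, 1 - |2**l * x - i|) on [0, 1] (no boundary functions).
    Returns a design matrix of shape (len(x), 2**max_level - 1).
    """
    columns = []
    for level in range(1, max_level + 1):
        for index in range(1, 2**level, 2):
            columns.append(np.maximum(0.0, 1.0 - np.abs(2.0**level * x - index)))
    return np.column_stack(columns)


def consensus_admm(blocks, lam, rho=1.0, max_iter=5000, tol=1e-10):
    """Minimize sum_k ||B_k a - y_k||^2 + lam * ||a||^2 with scaled consensus ADMM.

    blocks is a list of (B_k, y_k) pairs, one per data partition.
    """
    n = blocks[0][0].shape[1]
    num_blocks = len(blocks)
    # Explicitly assemble each small local system matrix once: 2 B_k^T B_k + rho I.
    lhs = [2.0 * B.T @ B + rho * np.eye(n) for B, _ in blocks]
    rhs = [2.0 * B.T @ y for B, y in blocks]
    alphas = [np.zeros(n) for _ in blocks]
    us = [np.zeros(n) for _ in blocks]
    z = np.zeros(n)
    for _ in range(max_iter):
        # Local updates are independent per block, so they could run in parallel.
        alphas = [np.linalg.solve(A, b + rho * (z - u)) for A, b, u in zip(lhs, rhs, us)]
        # Global update: closed-form minimizer of
        # lam * ||z||^2 + sum_k (rho / 2) * ||alpha_k + u_k - z||^2.
        z_old = z
        z = rho * sum(a + u for a, u in zip(alphas, us)) / (2.0 * lam + num_blocks * rho)
        # Scaled dual updates accumulate each block's disagreement with z.
        us = [u + a - z for a, u in zip(alphas, us)]
        # Stop once primal and dual residuals are both small.
        primal = np.sqrt(sum(np.sum((a - z) ** 2) for a in alphas))
        dual = rho * np.sqrt(num_blocks) * np.linalg.norm(z - z_old)
        if primal < tol and dual < tol:
            break
    return z


# Usage on synthetic data: 200 points split into 4 blocks of 50.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 200)
y = np.sin(2.0 * np.pi * x) + 0.1 * rng.standard_normal(200)
B = hat_basis_1d(x, max_level=3)
blocks = [(B[idx], y[idx]) for idx in np.array_split(np.arange(200), 4)]
lam = 0.1
z = consensus_admm(blocks, lam)
# Reference: solve (B^T B + lam I) a = B^T y on the full data at once.
direct = np.linalg.solve(B.T @ B + lam * np.eye(B.shape[1]), B.T @ y)
print("max deviation from direct solve:", np.max(np.abs(z - direct)))
```

At a fixed point of these updates, z satisfies (B^T B + lam I) z = B^T y, so the consensus result should agree with the direct solve on the full data. Each block assembles an n×n matrix from its own points only, where n is the number of basis functions. That matrix does not grow with the dataset. The trade-off is that ADMM needs many cheap iterations, and how quickly it converges depends on the choice of `rho`.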

Keywords

Sparse Grid, Augmented Lagrangian Method, Alternate Direction Method, Memory Footprint, Shared Memory System
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  1. Institute for Advanced Study, Technische Universität München, Garching, Germany
  2. Institute for Parallel and Distributed Systems, University of Stuttgart, Stuttgart, Germany
