
Mathematical Programming Computation, Volume 6, Issue 1, pp 77–102

Block splitting for distributed optimization

  • Neal Parikh
  • Stephen Boyd
Full Length Paper

Abstract

This paper describes a general purpose method for solving convex optimization problems in a distributed computing environment. In particular, if the problem data includes a large linear operator or matrix \(A\), the method allows for handling each sub-block of \(A\) on a separate machine. The approach works as follows. First, we define a canonical problem form called graph form, in which we have two sets of variables related by a linear operator \(A\), such that the objective function is separable across these two sets of variables. Many types of problems are easily expressed in graph form, including cone programs and a wide variety of regularized loss minimization problems from statistics, like logistic regression, the support vector machine, and the lasso. Next, we describe graph projection splitting, a form of Douglas–Rachford splitting or the alternating direction method of multipliers, to solve graph form problems serially. Finally, we derive a distributed block splitting algorithm based on graph projection splitting. In a statistical or machine learning context, this allows for training models exactly with a huge number of both training examples and features, such that each processor handles only a subset of both. To the best of our knowledge, this is the only general purpose method with this property. We present several numerical experiments in both the serial and distributed settings.
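To make the method described above concrete: in graph form, the lasso reads minimize \((1/2)\Vert y - b\Vert_2^2 + \lambda \Vert x\Vert_1\) subject to \(y = Ax\), and graph projection splitting alternates between evaluating the two proximal operators separately and projecting onto the graph \(\{(x, y) : y = Ax\}\). The following NumPy sketch illustrates that iteration on this toy instance; it is not the authors' implementation, and the penalty parameter rho, the fixed iteration count, and all variable names are illustrative assumptions.

    import numpy as np

    def graph_projection_splitting_lasso(A, b, lam, rho=1.0, iters=200):
        """Minimize (1/2)||Ax - b||^2 + lam*||x||_1 via graph projection splitting (sketch)."""
        m, n = A.shape
        x, y = np.zeros(n), np.zeros(m)      # primal iterates
        xt, yt = np.zeros(n), np.zeros(m)    # scaled dual iterates
        # Cache a Cholesky factorization of I + A^T A, reused by every graph projection.
        L = np.linalg.cholesky(np.eye(n) + A.T @ A)
        for _ in range(iters):
            # 1. Evaluate the proximal operators of the two objective terms separately.
            v = x - xt
            x_half = np.sign(v) * np.maximum(np.abs(v) - lam / rho, 0.0)  # prox of lam*||.||_1
            y_half = (rho * (y - yt) + b) / (rho + 1.0)                   # prox of (1/2)||. - b||^2
            # 2. Project (x_half + xt, y_half + yt) onto the graph {(x, y) : y = A x}.
            c, d = x_half + xt, y_half + yt
            x = np.linalg.solve(L.T, np.linalg.solve(L, c + A.T @ d))
            y = A @ x
            # 3. Update the scaled dual variables with the splitting residuals.
            xt += x_half - x
            yt += y_half - y
        return x

    # Toy usage on a small random instance (dimensions chosen arbitrarily).
    rng = np.random.default_rng(0)
    A = rng.standard_normal((50, 20))
    b = rng.standard_normal(50)
    x_hat = graph_projection_splitting_lasso(A, b, lam=0.1)

A fixed iteration count is used only for brevity; a practical implementation would instead monitor primal and dual residuals to decide when to stop, and in the distributed block splitting setting each sub-block of \(A\) would be handled on its own processor.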

Keywords

Distributed optimization · Alternating direction method of multipliers · Operator splitting · Proximal operators · Cone programming · Machine learning

Mathematics Subject Classification (2000)

90C06 · 90C22 · 90C25 · 90C30

Notes

Acknowledgments

We thank Eric Chu for many helpful discussions, and for help with the IMRT example (including providing code for data generation and plotting results). Michael Grant provided advice on integrating a new cone solver into CVX. Alex Teichman and Daniel Selsam gave helpful comments on an early draft. We also thank the anonymous referees and Kim-Chuan Toh for much helpful feedback.


Copyright information

© Springer-Verlag Berlin Heidelberg and Mathematical Optimization Society 2013

Authors and Affiliations

  1. Department of Computer Science, Stanford University, Stanford, USA
  2. Department of Electrical Engineering, Stanford University, Stanford, USA
