# Block splitting for distributed optimization


## Abstract

This paper describes a general purpose method for solving convex optimization problems in a distributed computing environment. In particular, if the problem data includes a large linear operator or matrix \(A\), the method allows for handling each sub-block of \(A\) on a separate machine. The approach works as follows. First, we define a canonical problem form called *graph form*, in which we have two sets of variables related by a linear operator \(A\), such that the objective function is separable across these two sets of variables. Many types of problems are easily expressed in graph form, including cone programs and a wide variety of regularized loss minimization problems from statistics, like logistic regression, the support vector machine, and the lasso. Next, we describe *graph projection splitting*, a form of Douglas–Rachford splitting or the alternating direction method of multipliers, to solve graph form problems serially. Finally, we derive a distributed *block splitting* algorithm based on graph projection splitting. In a statistical or machine learning context, this allows for training models exactly with a huge number of both training examples and features, such that each processor handles only a subset of both. To the best of our knowledge, this is the only general purpose method with this property. We present several numerical experiments in both the serial and distributed settings.
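The graph projection splitting iteration described above can be sketched in a few lines of NumPy. This is a minimal serial illustration, not the paper's distributed implementation: it assumes a graph form problem minimize \(f(y) + g(x)\) subject to \(y = Ax\), applies ADMM with scaled dual variables and penalty \(\rho = 1\), and instantiates it on the lasso with \(f(y) = \tfrac{1}{2}\|y - b\|_2^2\) and \(g(x) = \lambda\|x\|_1\). All function and variable names are illustrative.

```python
import numpy as np

def graph_projection_splitting(A, prox_f, prox_g, iters=300):
    """ADMM (a form of Douglas-Rachford splitting) for the graph form
    problem: minimize f(y) + g(x) subject to y = A x, with rho = 1."""
    m, n = A.shape
    x, y = np.zeros(n), np.zeros(m)      # primal variables (on the graph)
    xt, yt = np.zeros(n), np.zeros(m)    # scaled dual variables
    # Cache the factorization used by the graph projection
    #   Pi(c, d) = argmin ||x - c||^2 + ||y - d||^2  s.t.  y = A x,
    # whose solution is x = (I + A^T A)^{-1} (c + A^T d), y = A x.
    L = np.linalg.cholesky(np.eye(n) + A.T @ A)
    for _ in range(iters):
        xh = prox_g(x - xt)              # proximal step on g
        yh = prox_f(y - yt)              # proximal step on f
        rhs = (xh + xt) + A.T @ (yh + yt)
        x = np.linalg.solve(L.T, np.linalg.solve(L, rhs))
        y = A @ x                        # graph projection
        xt += xh - x                     # scaled dual updates
        yt += yh - y
    return x

# Lasso instance: minimize (1/2)||Ax - b||^2 + lam * ||x||_1.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20))
x_true = np.zeros(20)
x_true[:3] = [2.0, -3.0, 1.5]
b = A @ x_true + 0.01 * rng.standard_normal(50)
lam = 0.1
prox_f = lambda v: (v + b) / 2.0                                 # prox of (1/2)||y - b||^2
prox_g = lambda v: np.sign(v) * np.maximum(np.abs(v) - lam, 0)   # soft thresholding
x_hat = graph_projection_splitting(A, prox_f, prox_g)
```

Caching the Cholesky factor of \(I + A^{\mathsf T}A\) means each iteration costs only two triangular solves and a matrix-vector multiply; in the distributed block splitting algorithm, this projection is what gets split across the sub-blocks of \(A\).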

## Keywords

Distributed optimization · Alternating direction method of multipliers · Operator splitting · Proximal operators · Cone programming · Machine learning

## Mathematics Subject Classification (2000)

90C06 · 90C22 · 90C25 · 90C30

## Notes

### Acknowledgments

We thank Eric Chu for many helpful discussions, and for help with the IMRT example (including providing code for data generation and plotting results). Michael Grant provided advice on integrating a new cone solver into CVX. Alex Teichman and Daniel Selsam gave helpful comments on an early draft. We also thank the anonymous referees and Kim-Chuan Toh for much helpful feedback.
