# Block-simultaneous direction method of multipliers: a proximal primal-dual splitting algorithm for nonconvex problems with multiple constraints

## Abstract

We introduce a generalization of the linearized Alternating Direction Method of Multipliers to optimize a real-valued function f of multiple arguments with potentially multiple constraints $$g_\circ$$ on each of them. The function f may be nonconvex as long as it is convex in every argument, while the constraints $$g_\circ$$ need to be convex but not necessarily smooth. If f is smooth, the proposed Block-Simultaneous Direction Method of Multipliers (bSDMM) can be interpreted as a proximal analog to inexact coordinate descent methods under constraints. Unlike alternative approaches for joint solvers of multiple-constraint problems, we do not require the linear operators $${{\mathsf {L}}}$$ of a constraint function $$g({{\mathsf {L}}}\ \cdot )$$ to be invertible or linked to each other. bSDMM is well-suited for a range of optimization problems, in particular for data analysis, where f is the likelihood function of a model and $${{\mathsf {L}}}$$ could be a transformation matrix describing, e.g., finite differences or basis transforms. We apply bSDMM to the Non-negative Matrix Factorization task of a hyperspectral unmixing problem and demonstrate convergence and the effectiveness of multiple constraints on both matrix factors. The algorithms are implemented in Python and released as an open-source package.
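To make the "proximal analog to coordinate descent" reading concrete, the following is a minimal sketch of alternating proximal gradient steps for a non-negative matrix factorization. It is not bSDMM itself: it handles only a single non-negativity constraint per factor and no linear operators $${{\mathsf {L}}}$$. The function names `prox_nonneg` and `nmf_prox_gradient`, the quadratic likelihood, and the synthetic data are assumptions made for illustration and are not taken from the released package.

```python
import numpy as np


def prox_nonneg(x):
    """Proximal operator of the non-negativity indicator: projection onto x >= 0."""
    return np.maximum(x, 0.0)


def nmf_prox_gradient(Y, K, n_iter=200, seed=0):
    """Alternating proximal gradient steps for min 0.5*||Y - A S||^2 with A, S >= 0.

    Illustrative only: one constraint per block, no linear operators.
    """
    rng = np.random.default_rng(seed)
    A = rng.random((Y.shape[0], K))
    S = rng.random((K, Y.shape[1]))
    eps = 1e-12  # guards against division by zero if a factor collapses to 0
    losses = []
    for _ in range(n_iter):
        # Block A: f is convex in A for fixed S; the gradient (A S - Y) S^T
        # has Lipschitz constant equal to the spectral norm of S S^T.
        step_A = 1.0 / max(np.linalg.norm(S @ S.T, 2), eps)
        A = prox_nonneg(A - step_A * (A @ S - Y) @ S.T)
        # Block S: same update with the roles of A and S exchanged.
        step_S = 1.0 / max(np.linalg.norm(A.T @ A, 2), eps)
        S = prox_nonneg(S - step_S * A.T @ (A @ S - Y))
        losses.append(0.5 * np.linalg.norm(Y - A @ S) ** 2)
    return A, S, losses


# Synthetic non-negative data of rank 4 (hypothetical, for demonstration).
rng = np.random.default_rng(1)
Y = rng.random((20, 4)) @ rng.random((4, 30))
A, S, losses = nmf_prox_gradient(Y, K=4)
```

Each block update is a gradient step on f followed by the proximal operator of that block's constraint, which is why f only needs to be convex in each argument separately. Handling several constraints per block, possibly under linear operators, is what the full method adds on top of this scheme.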

## Notes

1. Throughout this work, indices denote different variables or constraints, not elements of vectors or tensors.

2. We use $$||\cdot ||_{\mathrm {s}}$$ to denote the spectral norm and $$||\cdot ||_2$$ to denote the element-wise $$\ell _2$$ norm of vectors and tensors.

3. While it is always possible to reformulate the problem in this way, because we can set $$f({\mathbf {x}}_1) = g_l({{\mathsf {L}}}_{j 1} {\mathbf {x}}_1)$$ for any l, doing so may make the minimization of f by means of a proximal operator inefficient. This is the limitation of the algorithm we derive in this section.

4. Data set obtained from https://engineering.purdue.edu/~biehl/MultiSpec/.

5. The choice of $$K=4$$ is somewhat arbitrary, and we have not attempted to find the optimal number of components since that is not the focus of this work.
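The two norms distinguished in note 2 can be told apart numerically. The short sketch below uses NumPy, which is an assumption about tooling rather than something the notes prescribe; the arrays `A` and `T` are made up for illustration.

```python
import numpy as np

A = np.array([[3.0, 0.0],
              [0.0, 4.0]])

# Spectral norm: the largest singular value of the matrix (4 for this A).
spectral = np.linalg.norm(A, 2)

# Element-wise l2 norm: square root of the sum of squared entries (5 for this A).
# For a matrix this coincides with the Frobenius norm.
elementwise = np.linalg.norm(A)

# With no `ord` and no `axis`, the same call works for higher-order tensors,
# because NumPy then computes the 2-norm of the flattened array.
T = np.ones((2, 2, 2))
tensor_norm = np.linalg.norm(T)
```

The spectral norm never exceeds the element-wise norm, so using one where the other is intended changes any step-size bound that depends on it.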

## Acknowledgements

We would like to thank Robert Vanderbei and Jonathan Eckstein for useful discussions regarding the algorithm, and Jim Bosch and Robert Lupton for comments on its astrophysical applications.

## Author information

### Corresponding author

Correspondence to Fred Moolekamp.

## Cite this article

Moolekamp, F., Melchior, P. Block-simultaneous direction method of multipliers: a proximal primal-dual splitting algorithm for nonconvex problems with multiple constraints. Optim Eng 19, 871–885 (2018). https://doi.org/10.1007/s11081-018-9380-y

• DOI: https://doi.org/10.1007/s11081-018-9380-y

### Keywords

• Optimization
• Proximal algorithms
• Nonconvex optimization
• Block coordinate descent
• Non-negative matrix factorization