Block-simultaneous direction method of multipliers: a proximal primal-dual splitting algorithm for nonconvex problems with multiple constraints


We introduce a generalization of the linearized Alternating Direction Method of Multipliers to optimize a real-valued function f of multiple arguments with potentially multiple constraints \(g_\circ\) on each of them. The function f may be jointly nonconvex as long as it is convex in each argument separately, while the constraints \(g_\circ\) must be convex but need not be smooth. If f is smooth, the proposed Block-Simultaneous Direction Method of Multipliers (bSDMM) can be interpreted as a proximal analog to inexact coordinate descent methods under constraints. Unlike alternative approaches for joint solvers of multiple-constraint problems, we do not require the linear operators \({{\mathsf {L}}}\) of a constraint function \(g({{\mathsf {L}}}\ \cdot )\) to be invertible or linked to each other. bSDMM is well-suited for a range of optimization problems, in particular for data analysis, where f is the likelihood function of a model and \({{\mathsf {L}}}\) could be a transformation matrix describing, e.g., finite differences or basis transforms. We apply bSDMM to the Non-negative Matrix Factorization task of a hyperspectral unmixing problem and demonstrate convergence and the effectiveness of multiple constraints on both matrix factors. The algorithms are implemented in Python and released as an open-source package.
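To make the starting point concrete, the following is a minimal sketch of the standard single-variable linearized ADMM iteration that the abstract says bSDMM generalizes. It is not the bSDMM algorithm itself and not the released package. The problem (1D total-variation denoising, with f a quadratic data term, g an \(\ell_1\) penalty, and \({{\mathsf {L}}}\) a finite-difference matrix) and all function and parameter names (`linearized_admm_tv`, `prox_l1`, `tau`, `lam`) are assumptions chosen for illustration.

```python
import numpy as np

def prox_l1(v, thresh):
    """Soft thresholding: proximal operator of thresh * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - thresh, 0.0)

def linearized_admm_tv(b, tau=1.0, lam=1.0, n_iter=500):
    """Sketch: minimize 0.5*||x - b||^2 + tau*||L x||_1, L = first differences.

    Illustrative single-block linearized ADMM; L is never inverted.
    """
    n = b.size
    L = np.diff(np.eye(n), axis=0)               # (n-1, n) finite-difference matrix
    # Step size for the x-update, bounded by lam / (spectral norm of L)^2.
    mu = 0.99 * lam / np.linalg.norm(L, 2) ** 2
    x = b.copy()
    z = L @ x                                    # auxiliary variable, z ~ L x
    u = np.zeros_like(z)                         # scaled dual variable
    for _ in range(n_iter):
        # Linearized x-update: gradient step on the coupling term, then prox of f.
        v = x - (mu / lam) * (L.T @ (L @ x - z + u))
        x = (v + mu * b) / (1.0 + mu)            # prox of mu * 0.5*||x - b||^2
        Lx = L @ x
        z = prox_l1(Lx + u, lam * tau)           # prox of lam * tau*||.||_1
        u = u + Lx - z                           # dual update
    return x

def objective(x, b, tau):
    """Objective value of the illustrative problem."""
    return 0.5 * np.sum((x - b) ** 2) + tau * np.sum(np.abs(np.diff(x)))

# Usage: denoise a noisy piecewise-constant signal.
rng = np.random.default_rng(0)
truth = np.repeat([0.0, 2.0, -1.0], 20)
b = truth + 0.3 * rng.standard_normal(truth.size)
x_hat = linearized_admm_tv(b, tau=1.0)
```

Only matrix-vector products with \({{\mathsf {L}}}\) and its transpose appear in the loop, which is why no inverse of \({{\mathsf {L}}}\) is needed in this kind of scheme.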



  1. Throughout this work, indices denote different variables or constraints, not elements of vectors or tensors.

  2. We use \(||\cdot ||_{\mathrm {s}}\) to denote the spectral norm and \(||\cdot ||_2\) to denote the element-wise \(\ell _2\) norm of vectors and tensors.

  3. While it is always possible to reformulate the problem in this way, because we can set \(f({\mathbf {x}}_1) = g_l({{\mathsf {L}}}_{j 1} {\mathbf {x}}_1)\) for any l, doing so may make the minimization of f by means of a proximal operator inefficient. This is the limitation of the algorithm we derive in this section.

  4. Data set obtained from

  5. The choice of \(K=4\) is somewhat arbitrary, and we have not attempted to find the optimal number of components since that is not the focus of this work.
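The two norms distinguished in note 2 can differ noticeably for the same matrix. As a small illustration with NumPy, using an arbitrary 2×2 matrix chosen only for this example:

```python
import numpy as np

# Arbitrary example matrix (an assumption for illustration only).
L = np.array([[3.0, 0.0],
              [4.0, 5.0]])

# Spectral norm: the largest singular value of L.
spectral = np.linalg.norm(L, 2)

# Element-wise l2 norm: square root of the sum of squared entries
# (for a matrix this coincides with the Frobenius norm).
elementwise = np.linalg.norm(L.ravel())
```

The spectral norm never exceeds the element-wise \(\ell _2\) norm, so a step-size bound written in terms of the element-wise norm is more conservative than one written in terms of the spectral norm.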




We would like to thank Robert Vanderbei and Jonathan Eckstein for useful discussions regarding the algorithm, and Jim Bosch and Robert Lupton for comments on its astrophysical applications.

Corresponding author

Correspondence to Fred Moolekamp.


Cite this article

Moolekamp, F., Melchior, P. Block-simultaneous direction method of multipliers: a proximal primal-dual splitting algorithm for nonconvex problems with multiple constraints. Optim Eng 19, 871–885 (2018).


  • Optimization
  • Proximal algorithms
  • Nonconvex optimization
  • Block coordinate descent
  • Non-negative matrix factorization