Computational Optimization and Applications

, Volume 72, Issue 1, pp 1–43 | Cite as

Proximal alternating penalty algorithms for nonsmooth constrained convex optimization

  • Quoc Tran-DinhEmail author


We develop two new proximal alternating penalty algorithms to solve a wide range class of constrained convex optimization problems. Our approach mainly relies on a novel combination of the classical quadratic penalty, alternating minimization, Nesterov’s acceleration, adaptive strategy for parameters. The first algorithm is designed to solve generic and possibly nonsmooth constrained convex problems without requiring any Lipschitz gradient continuity or strong convexity, while achieving the best-known \(\mathcal {O}\left( \frac{1}{k}\right) \)-convergence rate in a non-ergodic sense, where k is the iteration counter. The second algorithm is also designed to solve non-strongly convex, but semi-strongly convex problems. This algorithm can achieve the best-known \(\mathcal {O}\left( \frac{1}{k^2}\right) \)-convergence rate on the primal constrained problem. Such a rate is obtained in two cases: (1) averaging only on the iterate sequence of the strongly convex term, or (2) using two proximal operators of this term without averaging. In both algorithms, we allow one to linearize the second subproblem to use the proximal operator of the corresponding objective term. Then, we customize our methods to solve different convex problems, and lead to new variants. As a byproduct, these algorithms preserve the same convergence guarantees as in our main algorithms. We verify our theoretical development via different numerical examples and compare our methods with some existing state-of-the-art algorithms.


Proximal alternating algorithm Quadratic penalty method Accelerated scheme Constrained convex optimization First-order methods Convergence rate 

Mathematics Subject Classification

90C25 90-08 



This work is partly supported by the NSF-grant, DMS-1619884, USA, and the Nafosted grant 101.01-2017.315 (Vietnam).


  1. 1.
    Attouch, H., Chbani, Z., Riahi, H.: Rate of convergence of the Nesterov accelerated gradient method in the subcritical case \(\alpha \le 3\). ESAIM: COCV (2017).
  2. 2.
    Auslender, A., Teboulle, M.: Interior gradient and proximal methods for convex and conic optimization. SIAM J. Optim. 16(3), 697–725 (2006)MathSciNetCrossRefzbMATHGoogle Scholar
  3. 3.
    Bauschke, H.H., Combettes, P.: Convex Analysis and Monotone Operators Theory in Hilbert Spaces, 2nd edn. Springer, Berlin (2017)CrossRefzbMATHGoogle Scholar
  4. 4.
    Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding agorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)MathSciNetCrossRefzbMATHGoogle Scholar
  5. 5.
    Becker, S., Candès, E.J., Grant, M.: Templates for convex cone problems with applications to sparse signal recovery. Math. Program. Comput. 3(3), 165–218 (2011)MathSciNetCrossRefzbMATHGoogle Scholar
  6. 6.
    Belloni, A., Chernozhukov, V., Wang, L.: Square-root LASSO: pivotal recovery of sparse signals via conic programming. Biometrika 94(4), 791–806 (2011)MathSciNetCrossRefzbMATHGoogle Scholar
  7. 7.
    Ben-Tal, A., Nemirovski, A.: Lectures on Modern Convex Optimization: Analysis, Algorithms, and Engineering Applications, Volume 3 of MPS/SIAM Series on Optimization. SIAM, Philadelphia (2001)CrossRefzbMATHGoogle Scholar
  8. 8.
    Bertsekas, D.P.: Nonlinear Programming, 2nd edn. Athena Scientific, Nashua (1999)zbMATHGoogle Scholar
  9. 9.
    Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011)CrossRefzbMATHGoogle Scholar
  10. 10.
    Boyd, S., Vandenberghe, L.: Convex Optimization. University Press, Cambridge (2004)CrossRefzbMATHGoogle Scholar
  11. 11.
    Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vis. 40(1), 120–145 (2011)MathSciNetCrossRefzbMATHGoogle Scholar
  12. 12.
    Chambolle, A., Pock, T.: On the ergodic convergence rates of a first-order primal–dual algorithm. Math. Program. 159(1–2), 253–287 (2016)MathSciNetCrossRefzbMATHGoogle Scholar
  13. 13.
    Chen, G., Teboulle, M.: A proximal-based decomposition method for convex minimization problems. Math. Program. 64, 81–101 (1994)MathSciNetCrossRefzbMATHGoogle Scholar
  14. 14.
    Condat, L.: A primal–dual splitting method for convex optimization involving Lipschitzian, proximable and linear composite terms. J. Optim. Theory Appl. 158, 460–479 (2013)MathSciNetCrossRefzbMATHGoogle Scholar
  15. 15.
    Davis, D.: Convergence rate analysis of primal–dual splitting schemes. SIAM J. Optim. 25(3), 1912–1943 (2015)MathSciNetCrossRefzbMATHGoogle Scholar
  16. 16.
    Davis, D.: Convergence rate analysis of the forward-Douglas–Rachford splitting scheme. SIAM J. Optim. 25(3), 1760–1786 (2015)MathSciNetCrossRefzbMATHGoogle Scholar
  17. 17.
    Davis, D., Yin, W.: Faster convergence rates of relaxed Peaceman–Rachford and ADMM under regularity assumptions. Math. Oper. Res. 42, 783–805 (2014)MathSciNetCrossRefzbMATHGoogle Scholar
  18. 18.
    Du, Y., Lin, X., Ruszczyński, A.: A selective linearization method for multiblock convex optimization. SIAM J. Optim. 27(2), 1102–1117 (2017)MathSciNetCrossRefzbMATHGoogle Scholar
  19. 19.
    Eckstein, J., Bertsekas, D.: On the Douglas–Rachford splitting method and the proximal point algorithm for maximal monotone operators. Math. Program. 55, 293–318 (1992)MathSciNetCrossRefzbMATHGoogle Scholar
  20. 20.
    Esser, E., Zhang, X., Chan, T.: A general framework for a class of first order primal-dual algorithms for TV-minimization. SIAM J. Imaging Sci. 3(4), 1015–1046 (2010)MathSciNetCrossRefzbMATHGoogle Scholar
  21. 21.
    Fletcher, R.: Practical Methods of Optimization, 2nd edn. Wiley, Chichester (1987)zbMATHGoogle Scholar
  22. 22.
    Goldfarb, D., Ma, S., Scheinberg, K.: Fast alternating linearization methods of minimization of the sum of two convex functions. Math. Program. Ser. A 141(1), 349–382 (2012)zbMATHGoogle Scholar
  23. 23.
    Goldstein, T., O’Donoghue, B., Setzer, S.: Fast alternating direction optimization methods. SIAM J. Imaging Sci. 7(3), 1588–1623 (2012)MathSciNetCrossRefzbMATHGoogle Scholar
  24. 24.
    Grant, M., Boyd, S., Ye, Y.: Disciplined convex programming. In: Liberti, L., Maculan, N. (eds.) Global Optimization: From Theory to Implementation, Nonconvex Optimization and Its Applications, pp. 155–210. Springer, Berlin (2006)CrossRefGoogle Scholar
  25. 25.
    He, B.S., Yuan, X.M.: On the \({O}(1/n)\) convergence rate of the Douglas–Rachford alternating direction method. SIAM J. Numer. Anal. 50, 700–709 (2012)MathSciNetCrossRefzbMATHGoogle Scholar
  26. 26.
    Jaggi, M.: Revisiting Frank–Wolfe: projection-free sparse convex optimization. JMLR W&CP 28(1), 427–435 (2013)Google Scholar
  27. 27.
    Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: Burges, C.J.C., Bottou, L., Welling, M., Ghahramani, Z., Weinberger, K.Q. (eds.), Advances in Neural Information Processing Systems (NIPS), pp. 315–323. NIPS Foundation Inc., Lake Tahoe (2013)Google Scholar
  28. 28.
    Kiwiel, K.C., Rosa, C.H., Ruszczyński, A.: Proximal decomposition via alternating linearization. SIAM J. Optim. 9(3), 668–689 (1999)MathSciNetCrossRefzbMATHGoogle Scholar
  29. 29.
    Lan, G., Monteiro, R.D.C.: Iteration complexity of first-order penalty methods for convex programming. Math. Program. 138(1), 115–139 (2013)MathSciNetCrossRefzbMATHGoogle Scholar
  30. 30.
    Necoara, I., Nesterov, Yu., Glineur, F.: Linear convergence of first order methods for non-strongly convex optimization. Math. Program. (2018).
  31. 31.
    Necoara, I., Patrascu, A., Glineur, F.: Complexity of first-order inexact Lagrangian and penalty methods for conic convex programming. Optim. Method Softw. (2017).
  32. 32.
    Necoara, I., Suykens, J.A.K.: Interior-point Lagrangian decomposition method for separable convex optimization. J. Optim. Theory Appl. 143(3), 567–588 (2009)MathSciNetCrossRefzbMATHGoogle Scholar
  33. 33.
    Nemirovskii, A., Yudin, D.: Problem Complexity and Method Efficiency in Optimization. Wiley Interscience, New York (1983)Google Scholar
  34. 34.
    Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence \({\cal{O}} (1/k^2)\). Doklady AN SSSR 269, 543–547 (1983). Translated as Soviet Math. DoklGoogle Scholar
  35. 35.
    Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course, Volume 87 of Applied Optimization. Kluwer Academic Publishers, Dordrecht (2004)CrossRefzbMATHGoogle Scholar
  36. 36.
    Nesterov, Y.: Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM J. Optim. 22(2), 341–362 (2012)MathSciNetCrossRefzbMATHGoogle Scholar
  37. 37.
    Nesterov, Y.: Gradient methods for minimizing composite objective function. Math. Program. 140(1), 125–161 (2013)MathSciNetCrossRefzbMATHGoogle Scholar
  38. 38.
    Nguyen, V.Q., Fercoq, O., Cevher, V.: Smoothing technique for nonsmooth composite minimization with linear operator. ArXiv preprint (arXiv:1706.05837) (2017)
  39. 39.
    Nocedal, J., Wright, S.J.: Numerical Optimization. Springer Series in Operations Research and Financial Engineering, 2nd edn. Springer, Berlin (2006)zbMATHGoogle Scholar
  40. 40.
    O’Donoghue, B., Candes, E.: Adaptive Restart for Accelerated Gradient Schemes. Found. Comput. Math. 15, 715–732 (2015)MathSciNetCrossRefzbMATHGoogle Scholar
  41. 41.
    Ouyang, Y., Chen, Y., Lan, G., Pasiliao, E .J.R.: An accelerated linearized alternating direction method of multiplier. SIAM J. Imaging Sci. 8(1), 644–681 (2015)MathSciNetCrossRefzbMATHGoogle Scholar
  42. 42.
    Recht, B., Fazel, M., Parrilo, P.A.: Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Rev. 52(3), 471–501 (2010)MathSciNetCrossRefzbMATHGoogle Scholar
  43. 43.
    Rockafellar, R.T.: Convex Analysis, Volume 28 of Princeton Mathematics Series. Princeton University Press, Princeton (1970)Google Scholar
  44. 44.
    Shefi, R., Teboulle, M.: On the rate of convergence of the proximal alternating linearized minimization algorithm for convex problems. EURO J. Comput. Optim. 4(1), 27–46 (2016)MathSciNetCrossRefzbMATHGoogle Scholar
  45. 45.
    Sra, S., Nowozin, S., Wright, S.J.: Optimization for Machine Learning. MIT Press, Cambridge (2012)Google Scholar
  46. 46.
    Tran-Dinh, Q., Alacaoglu, A., Fercoq, O., Cevher, V.: Self-adaptive double-loop primal–dual algorithm for nonsmooth convex optimization. ArXiv preprint (arXiv:1808.04648), pp. 1–38 (2018)
  47. 47.
    Tran-Dinh, Q., Cevher, V.: A primal–dual algorithmic framework for constrained convex minimization. ArXiv preprint (arXiv:1406.5403), Technical Report, pp. 1–54 (2014)
  48. 48.
    Tran-Dinh, Q., Cevher, V.: Constrained convex minimization via model-based excessive gap. In: Proceedings of the Neural Information Processing Systems (NIPS), vol. 27, pp. 721–729, Montreal, Canada, December (2014)Google Scholar
  49. 49.
    Tran-Dinh, Q., Fercoq, O., Cevher, V.: A smooth primal–dual optimization framework for nonsmooth composite convex minimization. SIAM J. Optim. 28, 1–35 (2018)MathSciNetCrossRefzbMATHGoogle Scholar
  50. 50.
    Tran-Dinh, Q.: Construction and iteration-complexity of primal sequences in alternating minimization algorithms. ArXiv preprint (arXiv:1511.03305) (2015)
  51. 51.
    Tran-Dinh, Q.: Adaptive smoothing algorithms for nonsmooth composite convex minimization. Comput. Optim. Appl. 66(3), 425–451 (2016)MathSciNetCrossRefzbMATHGoogle Scholar
  52. 52.
    Tran-Dinh, Q.: Non-Ergodic alternating proximal augmented Lagrangian algorithms with optimal rates. In: The 32nd Conference on Neural Information Processing Systems (NIPS), pp. 1–9, NIPS Foundation Inc., Montreal, Canada, December (2018)Google Scholar
  53. 53.
    Tseng, P.: Applications of splitting algorithm to decomposition in convex programming and variational inequalities. SIAM J. Control Optim. 29, 119–138 (1991)MathSciNetCrossRefzbMATHGoogle Scholar
  54. 54.
    Tseng, P.: On accelerated proximal gradient methods for convex-concave optimization. Submitted to SIAM J. Optim. (2008)Google Scholar
  55. 55.
    Vu, C.B.: A splitting algorithm for dual monotone inclusions involving co-coercive operators. Adv. Comput. Math. 38(3), 667–681 (2013)MathSciNetCrossRefzbMATHGoogle Scholar
  56. 56.
    Woodworth, B.E., Srebro, N.: Tight complexity bounds for optimizing composite objectives. In: Lee, D.D., Sugiyama, M., Luxburg, U.V., Guyon, I., Garnett, R. (eds.), Advances in Neural Information Processing Systems (NIPS), pp. 3639–3647. NIPS Foundation Inc., Barcelona, Spain (2016)Google Scholar
  57. 57.
    Xu, Y.: Accelerated first-order primal–dual proximal methods for linearly constrained composite convex programming. SIAM J. Optim. 27(3), 1459–1484 (2017)MathSciNetCrossRefzbMATHGoogle Scholar
  58. 58.
    Zou, H., Hastie, T.: Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B 67(2), 301–320 (2005)MathSciNetCrossRefzbMATHGoogle Scholar

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. 1.Department of Statistics and Operations ResearchUniversity of North Carolina at Chapel Hill (UNC-Chapel Hill)Chapel HillUSA

Personalised recommendations