The multiproximal linearization method for convex composite problems

  • Full Length Paper
  • Series A

Published in Mathematical Programming

Abstract

Composite minimization involves a collection of smooth functions which are aggregated in a nonsmooth manner. In the convex setting, we design an algorithm by linearizing each smooth component in accordance with its main curvature. The resulting method, called the Multiprox method, consists in successively solving simple problems (e.g., constrained quadratic problems) which can also feature proximal operators. To study the complexity and the convergence of this method, we are led to prove a new type of qualification condition and to understand the impact of multipliers on the complexity bounds. We obtain explicit complexity results of the form \(O(\frac{1}{k})\) involving new types of constant terms. A distinctive feature of our approach is its ability to cope with oracles involving moving constraints. Our method is flexible enough to include the moving balls method, the proximal Gauss–Newton method, and forward–backward splitting, for which we recover known complexity results or establish new ones. We show through several numerical experiments how the use of multiple proximal terms can be decisive for problems with complex geometries.


Notes

  1. Also known as the proximal Gauss–Newton method.

  2. Indeed, PGNM is, in a sense, a “constant step size” method.

  3. Here, hard constraints means that only feasible points can be considered, in contrast with infeasible methods (e.g., [3, 10]).

  4. We refer to constants relative to the gradients.

  5. For any such i and any m real numbers \(z_1, \ldots , z_m\), the function \(z \mapsto g(z_1,\ldots , z_{i-1}, z, z_{i+1}, \ldots , z_m)\) is nondecreasing. In particular, its domain is either the whole of \(\mathbb {R}\), a closed half-line \((-\infty , a]\) for some \(a\in \mathbb {R}\), or empty.

  6. There is a slight shift in the indices of F

  7. Observe that the subproblems are simple convex quadratic problems.

  8. For the original problem (28).

  9. Lipschitz continuity is actually superfluous for Theorem 1 to hold.

  10. Actually the inverse of our steps.

References

  1. Auslender, A., Teboulle, M.: Interior gradient and proximal methods for convex and conic optimization. SIAM J. Optim. 16(3), 697–725 (2006)

  2. Auslender, A., Shefi, R., Teboulle, M.: A moving balls approximation method for a class of smooth constrained minimization problems. SIAM J. Optim. 20(6), 3232–3259 (2010)

  3. Auslender, A.: An extended sequential quadratically constrained quadratic programming algorithm for nonlinear, semidefinite, and second-order cone programming. J. Optim. Theory Appl. 156(2), 183–212 (2013)

  4. Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer, New York (2017)

  5. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)

  6. Bolte, J., Pauwels, E.: Majorization-minimization procedures and convergence of SQP methods for semi-algebraic and tame programs. Math. Oper. Res. 41(2), 442–465 (2016)

  7. Burke, J.V.: Descent methods for composite nondifferentiable optimization problems. Math. Program. 33(3), 260–279 (1985)

  8. Burke, J.V., Ferris, M.C.: A Gauss–Newton method for convex composite optimization. Math. Program. 71(2), 179–194 (1995)

  9. Cartis, C., Gould, N.I., Toint, P.L.: On the evaluation complexity of composite function minimization with applications to nonconvex nonlinear programming. SIAM J. Optim. 21(4), 1721–1739 (2011)

  10. Cartis, C., Gould, N., Toint, P.: On the complexity of finding first-order critical points in constrained nonlinear optimization. Math. Program. 144(1), 93–106 (2014)

  11. Combettes, P.L., Wajs, V.R.: Signal recovery by proximal forward–backward splitting. Multiscale Model. Simul. 4(4), 1168–1200 (2005)

  12. Combettes, P.L., Pesquet, J.-C.: Proximal splitting methods in signal processing. In: Fixed-Point Algorithms for Inverse Problems in Science and Engineering. Springer Optimization and Its Applications, pp. 185–212. Springer, New York (2011)

  13. Combettes, P.L.: Systems of structured monotone inclusions: duality, algorithms, and applications. SIAM J. Optim. 23(4), 2420–2447 (2013)

  14. Combettes, P.L., Eckstein, J.: Asynchronous block-iterative primal-dual decomposition methods for monotone inclusions. Math. Program. 168(1–2), 645–672 (2018). https://doi.org/10.1007/s10107-016-1044-0

  15. Drusvyatskiy, D., Lewis, A.S.: Error bounds, quadratic growth, and linear convergence of proximal methods. Math. Oper. Res. 43(3), 919–948 (2018). https://doi.org/10.1287/moor.2017.0889

  16. Drusvyatskiy, D., Paquette, C.: Efficiency of minimizing compositions of convex functions and smooth maps. Math. Program. (2018). https://doi.org/10.1007/s10107-018-1311-3

  17. Eckstein, J.: Nonlinear proximal point algorithms using Bregman functions, with applications to convex programming. Math. Oper. Res. 18(1), 202–226 (1993)

  18. Fletcher, R.: A model algorithm for composite nondifferentiable optimization problems. In: Sorensen, D.C., Wets, R.J.B. (eds.) Nondifferential and Variational Techniques in Optimization. Mathematical Programming Studies, vol. 17. Springer, Berlin, Heidelberg (1982)

  19. Hiriart-Urruty, J.-B., Lemaréchal, C.: Convex Analysis and Minimization Algorithms I. Springer, New York (1993)

  20. Hiriart-Urruty, J.-B.: A note on the Legendre–Fenchel transform of convex composite functions. In: Alart, P., Maisonneuve, O., Rockafellar, R.T. (eds.) Nonsmooth Mechanics and Analysis, pp. 35–46. Springer US (2006)

  21. Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. Adv. Neural Inf. Process. Syst. 25, 2663–2671 (2012)

  22. Levitin, E.S., Polyak, B.T.: Constrained minimization methods. USSR Comput. Math. Math. Phys. 6(5), 1–50 (1966)

  23. Lewis, A.S., Wright, S.J.: A proximal method for composite minimization. Math. Program. 158(1–2), 501–546 (2016). https://doi.org/10.1007/s10107-015-0943-9

  24. Li, C., Ng, K.F.: Majorizing functions and convergence of the Gauss–Newton method for convex composite optimization. SIAM J. Optim. 18(2), 613–642 (2007)

  25. Li, C., Wang, X.: On convergence of the Gauss–Newton method for convex composite optimization. Math. Program. 91(2), 349–356 (2002)

  26. Lions, P.-L., Mercier, B.: Splitting algorithms for the sum of two nonlinear operators. SIAM J. Numer. Anal. 16(6), 964–979 (1979)

  27. Löfberg, J.: YALMIP: a toolbox for modeling and optimization in MATLAB. In: IEEE International Symposium on Computer Aided Control Systems Design (2004)

  28. Martinet, B.: Brève communication. Régularisation d'inéquations variationnelles par approximations successives. Revue française d'informatique et de recherche opérationnelle, série rouge 4(3), 154–158 (1970)

  29. Moreau, J.-J.: Proximité et dualité dans un espace hilbertien. Bulletin de la Société mathématique de France 93, 273–299 (1965)

  30. Moreau, J.-J.: Evolution problem associated with a moving convex set in a Hilbert space. J. Differ. Equ. 26(3), 347–374 (1977)

  31. MOSEK ApS: The MOSEK optimization toolbox for MATLAB manual. Version 7.1 (2016). https://www.yumpu.com/en/document/view/54768342/the-mosek-optimization-toolbox-for-matlab-manual-version-70-revision-141/32

  32. Nesterov, Y., Nemirovskii, A.: Interior-Point Polynomial Algorithms in Convex Programming. Society for Industrial and Applied Mathematics, Philadelphia (1994)

  33. Nesterov, Y.: Introductory Lectures on Convex Programming, Volume I: Basic Course. Springer, New York (2004)

  34. Nemirovskii, A., Yudin, D.: Problem Complexity and Method Efficiency in Optimization. Wiley, New York (1983)

  35. Nocedal, J., Wright, S.: Numerical Optimization. Springer, New York (2006)

  36. Ortega, J.M., Rheinboldt, W.C.: Iterative Solution of Nonlinear Equations in Several Variables. SIAM, Philadelphia (2000)

  37. Passty, G.B.: Ergodic convergence to a zero of the sum of monotone operators in Hilbert space. J. Math. Anal. Appl. 72(2), 383–390 (1979)

  38. Pauwels, E.: The value function approach to convergence analysis in composite optimization. Oper. Res. Lett. 44(6), 790–795 (2016)

  39. Pshenichnyi, B.N.: The linearization method. Optimization 18(2), 179–196 (1987)

  40. Rockafellar, R.T.: Augmented Lagrangians and applications of the proximal point algorithm in convex programming. Math. Oper. Res. 1(2), 97–116 (1976)

  41. Rockafellar, R.T., Wets, R.: Variational Analysis. Springer, New York (1998)

  42. Rosen, J.B.: The gradient projection method for nonlinear programming. Part I. Linear constraints. J. Soc. Ind. Appl. Math. 8(1), 181–217 (1960)

  43. Rosen, J.B.: The gradient projection method for nonlinear programming. Part II. Nonlinear constraints. J. Soc. Ind. Appl. Math. 9(4), 514–532 (1961)

  44. Salzo, S., Villa, S.: Convergence analysis of a proximal Gauss–Newton method. Comput. Optim. Appl. 53(2), 557–589 (2012)

  45. Schmidt, M., Le Roux, N., Bach, F.: Minimizing finite sums with the stochastic average gradient. Math. Program. 162(1–2), 83–112 (2017)

  46. Shefi, R., Teboulle, M.: A dual method for minimizing a nonsmooth objective over one smooth inequality constraint. Math. Program. 159(1–2), 137–164 (2016)

  47. Solodov, M.V.: Global convergence of an SQP method without boundedness assumptions on any of the iterative sequences. Math. Program. 118(1), 1–12 (2009)

  48. Tseng, P.: Applications of a splitting algorithm to decomposition in convex programming and variational inequalities. SIAM J. Control Optim. 29(1), 119–138 (1991)

  49. Villa, S., Salzo, S., Baldassarre, L., Verri, A.: Accelerated and inexact forward–backward algorithms. SIAM J. Optim. 23(3), 1607–1633 (2013)

  50. Ye, Y.: Interior Point Algorithms: Theory and Analysis. Wiley, New York (1997)

Acknowledgements

We thank Marc Teboulle for his suggestions and the anonymous referees for their very useful comments.

Author information

Corresponding author

Correspondence to Zheng Chen.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work is sponsored by the Air Force Office of Scientific Research under Grant FA9550-14-1-0500.

Appendices

Appendix A: Proof of Proposition 1

Let us recall a qualification condition from [41]. Given any \(x\in F^{-1}(\mathrm {dom}\,g)\), let

$$\begin{aligned} J(x,\cdot ):\left\{ \begin{array}{lll} {\mathbb {R}}^n &{} \rightarrow &{} {\mathbb {R}}^{m}\\ \omega &{}\mapsto &{} F({x}) + \nabla F({x}) \omega , \end{array}\right. \end{aligned}$$
(42)

be the linearized mapping of F at x. Proposition 1 follows immediately from the classical chain rule given in [41, Theorem 10.6] and the following proposition.
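
As a quick illustration of the object defined in Eq. (42), the sketch below evaluates the linearized mapping numerically for a small hypothetical smooth map; the functions `F` and `jacF` are assumptions introduced only for this example and are not taken from the paper.

```python
import numpy as np

def F(x):
    # Hypothetical smooth map R^2 -> R^2, used only to illustrate Eq. (42).
    return np.array([x[0] ** 2 + x[1], np.exp(x[1]) - 1.0])

def jacF(x):
    # Jacobian matrix of F at x (here 2 x 2).
    return np.array([[2.0 * x[0], 1.0],
                     [0.0, np.exp(x[1])]])

def J(x, omega):
    # Linearized mapping of F at x, cf. Eq. (42): J(x, omega) = F(x) + nabla F(x) omega.
    return F(x) + jacF(x) @ omega

x = np.array([1.0, 0.0])
omega = np.array([0.1, -0.2])
print(J(x, omega))    # first-order model of F at x + omega
print(F(x + omega))   # exact value; the gap is o(||omega||), cf. Eq. (44)
```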

Proposition 3

(Two equivalent qualification conditions) Under Assumption 1 on F and g, Assumption 4 holds if and only if

(QC) \(\mathrm {dom}\,g\) cannot be separated from \(J(x,{\mathbb {R}}^n)\) for any \(x\in F^{-1}(\mathrm {dom}\,g)\).

Proof

We first suppose that (QC) is true. We begin with a remark showing that this implies that \(\mathrm {dom}\,g\) is not empty. Let A and B be two subsets of \(\mathbb {R}^m\). The logical negation of the sentence “A and B can be separated” can be written as follows: for all \(a \in \mathbb {R}^m\) and all \(b \in \mathbb {R}\), there exists \(y \in A\) such that

$$\begin{aligned} {a^Ty} + b > 0, \end{aligned}$$

or, there exists \(z \in B\) such that

$$\begin{aligned} {a^T z} + b< 0. \end{aligned}$$

In particular, if A and B cannot be separated, then either A or B is not empty. Note that if \(\mathrm {dom}\,g\) is empty, then so is the set \(\{J(x, \mathbb {R}^n),\, x \in F^{-1}(\mathrm {dom}\,g)\}\). Hence (QC) actually implies that \(\mathrm {dom}\,g\) is not empty. Pick a point \({\tilde{x}}\in F^{-1}(\mathrm {dom}\,g)\). If \(F({\tilde{x}})\in \text {int dom}(g)\), there is nothing to prove, so we may suppose that \(F({\tilde{x}})\in \text {bd dom}\,g\). If we had \([\text {int dom}(g)] \cap J({\tilde{x}},{\mathbb {R}}^n) = \emptyset \), then \(\mathrm {dom}\,g\) and \(J({\tilde{x}},{\mathbb {R}}^n)\) could be separated by the Hahn–Banach theorem, contradicting (QC). Hence, there exists \(\tilde{\omega }\in {\mathbb {R}}^n\) such that \(J({\tilde{x}},{\tilde{\omega }})\in \text {int dom}(g)\). Note that, since \(F({\tilde{x}})\in \mathrm {dom}(g)\) and g is nondecreasing with respect to each argument, it follows that \(F({\tilde{x}}) - d\in \mathrm {dom}(g)\) for any \(d\in ({\mathbb {R}}_+^*)^m\), so that \(\mathrm {int}(\mathrm {dom}(g))\ne \emptyset \). Since \(\mathrm {dom}\,g\) is convex, a classical result yields

$$\begin{aligned} J({\tilde{x}},\lambda {\tilde{\omega }}) \in \text {int dom}(g),\ \forall \,\lambda \in (0,1]. \end{aligned}$$
(43)

On the other hand, F is differentiable, thus

$$\begin{aligned} \Vert F({\tilde{x}} + \lambda {\tilde{\omega }}) - J({\tilde{x}},\lambda {\tilde{\omega }})\Vert = o(\lambda ), \end{aligned}$$
(44)

where \(o(\lambda )/\lambda \) tends to zero as \(\lambda \) goes to zero.

After these basic observations, let us recall an important property of the signed distance (see [19, p. 154]). Let \(D\subset {\mathbb {R}}^m\) be a nonempty closed convex set. Then, the function

$$\begin{aligned} D\rightarrow {\mathbb {R}}_+,\ z\mapsto \mathrm {dist}(z,\mathrm {bd}(D)), \end{aligned}$$

is concave. Using this concavity property for \(D=\mathrm {dom}\,g\) and the fact that \(F({\tilde{x}}) = J({\tilde{x}},0)\), it holds that

$$\begin{aligned}&\lambda \text {dist}[J({\tilde{x}},{\tilde{\omega }}),\text {bd}(\mathrm {dom}\,g)] + (1-\lambda ) \text {dist}[F({\tilde{x}}),\text {bd}(\mathrm {dom}\,g)]\\&\le \text {dist}[J({\tilde{x}},\lambda {\tilde{\omega }}),\text {bd}(\mathrm {dom}\,g)],\ \forall \ \lambda \in [0,1]. \end{aligned}$$

Since \(\text {dist}[F({\tilde{x}}),\text {bd}(\mathrm {dom}\,g)] = 0\), it follows that

$$\begin{aligned} \lambda \text {dist}[J({\tilde{x}},{\tilde{\omega }}),\text {bd}(\mathrm {dom}\,g)] \le \text {dist}[J({\tilde{x}},\lambda {\tilde{\omega }}),\text {bd}(\mathrm {dom}\,g)],\ \lambda \in [0,1]. \end{aligned}$$
(45)

Note that \( \text {dist}[J({\tilde{x}},{\tilde{\omega }}),\text {bd}(\mathrm {dom}\,g)] > 0\) since \(J({\tilde{x}},{\tilde{\omega }})\in \text {int dom}(g)\). Hence, Eq. (44) indicates that there exists \(\epsilon > 0\) such that for any \(0 < \lambda \le \epsilon \), we have

$$\begin{aligned} \Vert F({\tilde{x}} + \lambda {\tilde{\omega }}) - J({\tilde{x}},\lambda {\tilde{\omega }}) \Vert < \lambda \text {dist}[J({\tilde{x}},{\tilde{\omega }}),\text {bd}(\mathrm {dom}\,g)] . \end{aligned}$$

Substituting this inequality into Eq. (45) indicates that for any \(0 < \lambda \le \epsilon \), we have

$$\begin{aligned} \Vert F({\tilde{x}} + \lambda {\tilde{\omega }}) - J({\tilde{x}},\lambda {\tilde{\omega }}) \Vert < \text {dist}[ J({\tilde{x}},\lambda {\tilde{\omega }}),\text {bd}(\mathrm {dom}\,g)]. \end{aligned}$$

Using Eq. (43), for any \(0 <\lambda \le \epsilon \), we have \(F({\tilde{x}} + \lambda {\tilde{\omega }})\in \text {int dom}(g)\). This shows the first implication of the equivalence.

Let us prove the reverse implication by contraposition and assume that (QC) does not hold, that is, there exists a point \({\tilde{x}}\in F^{-1}(\mathrm {dom}\,g)\) such that \(\mathrm {dom}\,g\) can be separated from \(J({\tilde{x}},{\mathbb {R}}^n)\). In this case, there exist \(a \in {\mathbb {R}}^{m}\setminus \{0\}\) and \(b\in {\mathbb {R}}\) such that

$$\begin{aligned} {\left\{ \begin{array}{ll} a^T z + b \le 0,\ \forall z\in \mathrm {dom}\,g,\\ a^T J({\tilde{x}},\omega ) + b \ge 0,\ \forall \omega \in {\mathbb {R}}^n. \end{array}\right. } \end{aligned}$$
(46)

Since \(J({\tilde{x}},0) = F({\tilde{x}}) \in \mathrm {dom}\,g\), it follows

$$\begin{aligned} a^TJ({\tilde{x}},0) + b = 0. \end{aligned}$$
(47)

By the coordinatewise convexity of F, for every \(i\in \{1,\ldots ,m\}\) one has

$$\begin{aligned} f_i(y) {\left\{ \begin{array}{ll} \ge f_i(x) + {\nabla f_i(x)^T (y-x)},\ L_i >0,\\ = f_i(x) + {\nabla f_i(x)^T(y-x)},\ L_i =0, \end{array}\right. }\ \ \forall (x,y)\in {\mathbb {R}}^n\times {\mathbb {R}}^n. \end{aligned}$$

We thus have the componentwise inequality

$$\begin{aligned} J({\tilde{x}},0) - [F({\tilde{x}}+\omega ) - J({\tilde{x}},\omega ) ] \le F({{\tilde{x}}}). \end{aligned}$$

The monotonicity properties of g thus imply that

$$\begin{aligned} J({\tilde{x}},0) - [F({\tilde{x}}+\omega ) - J({\tilde{x}},\omega ) ] \in \mathrm {dom}\,g,\ \forall \omega \in {\mathbb {R}}^n. \end{aligned}$$

As a result, combining Eq. (46) with Eq. (47), one has

$$\begin{aligned} a^{T} \big \{ J({\tilde{x}},0) - [F({\tilde{x}}+\omega ) - J({\tilde{x}},\omega ) ]\} + b \le a^T J({\tilde{x}},0) + b,\ \forall \omega \in {\mathbb {R}}^n, \end{aligned}$$

which reduces to \(a^T [F({\tilde{x}}+\omega ) - J({\tilde{x}},\omega ) ] \ge 0\) for all \(\omega \in {\mathbb {R}}^n\). Hence, for any \(\omega \in {\mathbb {R}}^n\) one has

$$\begin{aligned} a^T F({\tilde{x}}+\omega ) + b= & {} a^T J({\tilde{x}},\omega ) + b + a^T(F({\tilde{x}}+\omega ) - J({\tilde{x}},\omega ) )\\\ge & {} a^T J({\tilde{x}},\omega ) + b\;\ge \; 0, \end{aligned}$$

where the last inequality uses Eq. (46). Combined with the fact that \(a^T z + b < 0\) for all \(z\in \mathrm {int\ dom}\ g\) (a consequence of the first line of Eq. (46), since \(a\ne 0\)), this shows that \(F({\tilde{x}}+\omega )\not \in \text {int dom}\ g\) for all \(\omega \in {\mathbb {R}}^n\), and thus \(F^{-1}(\text {int dom}\,g)= \emptyset \), that is, Assumption 4 does not hold. This provides the reverse implication and the proof is complete. \(\square \)

Appendix B: Proof of Lemma 4

In this section, we present an explicit estimate of the condition number appearing in our complexity result. Let us first introduce some notation. For any nonempty closed set \(D\subset {\mathbb {R}}^m\), we define a signed distance function as

$$\begin{aligned} \mathrm {sdist}(\cdot ,\mathrm {bd}(D)):{\mathbb {R}}^m\rightarrow {\mathbb {R}},\ z\mapsto {\left\{ \begin{array}{ll} \mathrm {dist}(z,\mathrm {bd}(D)),\ \text {if}\ z\in \mathrm {int}(D),\\ -\mathrm {dist}(z,\mathrm {bd}(D)),\ \text {otherwise}. \end{array}\right. } \end{aligned}$$
(48)

It is worth recalling that the signed distance function is concave (see [19, p. 154]). We begin with a lemma which describes a monotonicity property of the signed distance function.
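
To make Eq. (48) concrete, consider the simple case where \(\mathrm {dom}\,g\) is a coordinatewise lower set \(D = \{z \in {\mathbb {R}}^m : z \le a\}\) (componentwise), which is consistent with g being nondecreasing in each argument. The following sketch is only an illustration for this particular geometry; the bound vector `a` is a hypothetical example and not part of the paper.

```python
import numpy as np

def sdist(z, a):
    """Signed distance of z to bd(D) for D = {z : z <= a} (componentwise), cf. Eq. (48):
    positive on int(D), negative (or zero) elsewhere."""
    gap = a - z
    if np.all(gap > 0):
        # z in int(D): the nearest boundary facet corresponds to the tightest coordinate.
        return float(np.min(gap))
    # z on bd(D) or outside D: minus the distance to D, which is attained on bd(D).
    return -float(np.linalg.norm(np.maximum(z - a, 0.0)))

a = np.array([1.0, 2.0])                 # hypothetical upper bounds defining D
print(sdist(np.array([0.0, 0.0]), a))    #  1.0      (interior point)
print(sdist(np.array([1.0, 0.0]), a))    # -0.0, i.e. zero (boundary point)
print(sdist(np.array([1.5, 3.0]), a))    # -1.118... (exterior point)
```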

Lemma 5

Given any \(z\in \mathrm {dom}\,g \) and any \(d=(d_1,\ldots ,d_m)\in {\mathbb {R}}_+^m\) with \(d_i =0\) whenever \(L_i = 0\), and assuming \(\mathrm {bd}\ \mathrm {dom}\,g \ne \emptyset \), one has

$$\begin{aligned} \mathrm {sdist}(z,\mathrm {bd}\ \mathrm {dom}\,g ) \ge \mathrm {sdist}(z+ d, \mathrm {bd}\ \mathrm {dom}\,g ). \end{aligned}$$
(49)

Proof

For the remainder of the proof, fix an arbitrary \(z\in \mathrm {dom}\,g \) and an arbitrary \(d=(d_1,\ldots ,d_m)\in {\mathbb {R}}_+^m\) such that \(d_i = 0\) whenever \(L_i = 0\). If \(z+d \not \in \mathrm {dom}\ g\), Eq. (49) holds true by the definition in Eq. (48).

From now on, we suppose \(z+d \in \mathrm {dom}\ g\). Let \({\bar{z}} \in \mathrm {bd}\ \mathrm {dom}\,g \) be a point such that

$$\begin{aligned} {\bar{z}} \in \underset{{\hat{z}}\in \mathrm {bd}\ \mathrm {dom}\,g }{\mathrm {argmin}}\ \ \Vert z - {\hat{z}}\Vert . \end{aligned}$$

Then, one has

$$\begin{aligned} \Vert z - {\bar{z}}\Vert = \mathrm {sdist}(z,\mathrm {bd}\ \mathrm {dom}\,g ). \end{aligned}$$
(50)

Since \({\bar{z}}\) lies on the boundary of \(\mathrm {dom}\,g \), it follows that \({\bar{z}}+d \not \in \mathrm {int}\ \mathrm {dom}\,g \) because of the monotonicity property of g in Assumption 1. Hence, by the definition of \(\mathrm {sdist}\) in Eq. (48), we have

$$\begin{aligned}&\mathrm {sdist}(z+d,\mathrm {bd}\ \mathrm {dom}\,g ) = \mathrm {dist}(z+d,\mathbb {R}^m \setminus \mathrm {int}\ \mathrm {dom}\,g ) \\&\quad \le \Vert (z + d) - ({\bar{z}} + d)\Vert = \Vert z - {\bar{z}}\Vert . \end{aligned}$$

Combining this inequality with Eq. (50) completes the proof. \(\square \)

The following lemma shows that one can construct a convex combination of the current point x and the Slater point \({\bar{x}}\) given in Assumption 4 which is a Slater point for the current subproblem, with a uniform control over the “degree” of qualification.

Lemma 6

Let \({\bar{x}}\) be given as in Assumption 4, let \(x \in F^{-1}(\mathrm {dom}\ g)\), and assume that \(\mathrm {bd}\ \mathrm {dom}\,g \ne \emptyset \). Set

$$\begin{aligned} \gamma ({\bar{x}},x):= \mathrm {min}\left\{ 1,\frac{\mathrm {sdist}[F({\bar{x}}),\mathrm {bd}\ \mathrm {dom}\,g ]}{2\{\mathrm {sdist}[F({\bar{x}}),\mathrm {bd}\ \mathrm {dom}\,g ]- \mathrm {sdist}[F({\bar{x}})+\varvec{L} \Vert x - {\bar{x}}\Vert ^2/2,\mathrm {bd}\ \mathrm {dom}\,g ]\}}\right\} . \end{aligned}$$

Then,

$$\begin{aligned} \mathrm {sdist}[H(x,x+\gamma ({\bar{x}},x)({\bar{x}} - x)),\mathrm {bd}\ \mathrm {dom}\,g ]&\ge \frac{\mathrm {sdist}[F({\bar{x}}),\mathrm {bd}\ \mathrm {dom}\,g ]^2 }{4 \big \{\mathrm {sdist}[F({\bar{x}}),\mathrm {bd}\ \mathrm {dom}\,g ] + \Vert \varvec{L}\Vert \Vert x - {\bar{x}}\Vert ^2/2\big \}}. \end{aligned}$$
(51)

Proof

Fix an arbitrary \(x\in K\). Then, for any \(t\in (0,1]\) one has

$$\begin{aligned} H(x,x+t({\bar{x}} - x))= & {} F(x) + t\nabla F(x)({\bar{x}} - x) + \frac{\varvec{L}}{2}\Vert {\bar{x}} - x\Vert ^2 t^2\nonumber \\\le & {} (1-t) F(x) + t F({\bar{x}}) + \frac{\varvec{L}}{2}\Vert x - {\bar{x}}\Vert ^2 t^2, \end{aligned}$$
(52)

where the last inequality is obtained by applying the coordinatewise convexity of F. Therefore, for any \(t\in (0,1]\), we have

$$\begin{aligned}&\mathrm {sdist}[H(x,x+t({\bar{x}} - x)),\mathrm {bd}\ \mathrm {dom}\,g ]\nonumber \\&\quad \overset{(a)}{\ge } \mathrm {sdist}[(1-t) F(x) + t F({\bar{x}}) + \frac{\varvec{L}}{2} \Vert x - {\bar{x}}\Vert ^2 t^2,\mathrm {bd}\ \mathrm {dom}\,g ]\nonumber \\&\quad \overset{(b)}{\ge } \mathrm {sdist} [F(x),\mathrm {bd}\ \mathrm {dom}\,g ](1-t) + \mathrm {sdist} [F({\bar{x}}) + t \frac{\varvec{L}}{2}\Vert x - {\bar{x}}\Vert ^2 ,\mathrm {bd}\ \mathrm {dom}\,g ] t \nonumber \\&\quad \overset{(c)}{\ge } \mathrm {sdist} [(1-t) F({\bar{x}}) + t ( F({\bar{x}}) + \frac{\varvec{L}}{2} \Vert x - {\bar{x}}\Vert ^2 ),\mathrm {bd}\ \mathrm {dom}\,g ] t \nonumber \\&\quad \overset{(d)}{\ge } \mathrm {sdist}[F({\bar{x}}),\mathrm {bd}\ \mathrm {dom}\,g ] t(1-t) + \mathrm {sdist}[ F({\bar{x}}) + \frac{\varvec{L}}{2} \Vert x - {\bar{x}}\Vert ^2,\mathrm {bd}\ \mathrm {dom}\,g ] t^2 \nonumber \\&\quad =: \delta (t), \end{aligned}$$
(53)

where for (a) we combine Eq. (52) with Lemma 5, for (b) we use the concavity of the signed distance function (see [19, p. 154]), for (c) we use the fact that \((1-t) \mathrm {sdist} [F(x),\mathrm {bd}\ \mathrm {dom}\,g ] \ge 0\), and for (d) we use the concavity of the signed distance function again.

It is easy to verify that \(\gamma ({\bar{x}},x)\in (0,1]\) is the maximizer of \(\delta (t)\) over the interval (0, 1]. We now consider the following inequality.

$$\begin{aligned} \mathrm {sdist}[F({\bar{x}})+\varvec{L} \Vert x - {\bar{x}}\Vert ^2/2,\mathrm {bd}\ \mathrm {dom}\,g ] \ge - \frac{\Vert \varvec{L}\Vert }{2}\Vert x - {\bar{x}}\Vert ^2, \end{aligned}$$
(54)

Inequality (54) holds true: indeed, either \(F({\bar{x}})+\varvec{L} \Vert x - {\bar{x}}\Vert ^2/2 \in \mathrm {dom}\ g\), in which case the left-hand side is nonnegative, or this point lies outside \(\mathrm {dom}\ g\), in which case the segment joining it to \(F({\bar{x}})\in \mathrm {dom}\,g\) meets \(\mathrm {bd}\ \mathrm {dom}\,g\), so that its distance to the boundary is at most \(\Vert \varvec{L}\Vert \Vert x - {\bar{x}}\Vert ^2/2\). If \(\gamma ({\bar{x}},x) = 1\), by its definition, one immediately has

$$\begin{aligned} \frac{\mathrm {sdist}[F({\bar{x}}),\mathrm {bd}\ \mathrm {dom}\,g ]}{2\{\mathrm {sdist}[F({\bar{x}}),\mathrm {bd}\ \mathrm {dom}\,g ]- \mathrm {sdist}[F({\bar{x}})+\varvec{L} \Vert x - {\bar{x}}\Vert ^2/2,\mathrm {bd}\ \mathrm {dom}\,g ]\}} \ge 1, \end{aligned}$$

which implies

$$\begin{aligned} \mathrm {sdist}[F({\bar{x}})+\varvec{L} \Vert x - {\bar{x}}\Vert ^2/2,\mathrm {bd}\ \mathrm {dom}\,g ] \ge \frac{\mathrm {sdist}[F({\bar{x}}),\mathrm {bd}\ \mathrm {dom}\,g ]}{2}. \end{aligned}$$
(55)

Substituting \(\gamma ({\bar{x}},x) = 1\) into Eq. (53) yields

$$\begin{aligned} \delta (\gamma ({\bar{x}},x))= & {} \mathrm {sdist}[ F({\bar{x}}) + \frac{\varvec{L}}{2} \Vert x - {\bar{x}}\Vert ^2,\mathrm {bd}\ \mathrm {dom}\,g ] \nonumber \\\ge & {} \frac{\mathrm {sdist}[F({\bar{x}}),\mathrm {bd}\ \mathrm {dom}\,g ]}{2}\nonumber \\\ge & {} \frac{\mathrm {sdist}[F({\bar{x}}),\mathrm {bd}\ \mathrm {dom}\,g ]}{2} \times \frac{1}{2}\frac{\mathrm {sdist}[F({\bar{x}}),\mathrm {bd}\ \mathrm {dom}\,g ]}{\mathrm {sdist}[F({\bar{x}}),\mathrm {bd}\ \mathrm {dom}\,g ] + \Vert \varvec{L}\Vert \Vert x - {\bar{x}}\Vert ^2/2}\nonumber \\= & {} \frac{\mathrm {sdist}[F({\bar{x}}),\mathrm {bd}\ \mathrm {dom}\,g ]^2 }{4 \big \{\mathrm {sdist}[F({\bar{x}}),\mathrm {bd}\ \mathrm {dom}\,g ] + \Vert \varvec{L}\Vert \Vert x - {\bar{x}}\Vert ^2/2\big \}}, \end{aligned}$$
(56)

where the first inequality is obtained by considering Eq. (55). As a result, Eq. (51) holds true if \(\gamma ({\bar{x}},x) = 1\).

From now on, let us consider \(\gamma ({\bar{x}},x)<1\). In this case, one has

$$\begin{aligned} \gamma ({\bar{x}},x) = \frac{\mathrm {sdist}[F({\bar{x}}),\mathrm {bd}\ \mathrm {dom}\,g ]}{2\{\mathrm {sdist}[F({\bar{x}}),\mathrm {bd}\ \mathrm {dom}\,g ]- \mathrm {sdist}[F({\bar{x}})+\varvec{L} \Vert x - {\bar{x}}\Vert ^2/2,\mathrm {bd}\ \mathrm {dom}\,g ]\}}. \end{aligned}$$

Substituting into Eq. (53) and using (54) leads to

$$\begin{aligned} \delta (\gamma ({\bar{x}},x))= & {} \frac{\mathrm {sdist}[F({\bar{x}}),\mathrm {bd}\ \mathrm {dom}\,g ]^2 }{4 \big \{\mathrm {sdist}[F({\bar{x}}),\mathrm {bd}\ \mathrm {dom}\,g ] -\mathrm {sdist}[F({\bar{x}})+\varvec{L} \Vert x - {\bar{x}}\Vert ^2/2,\mathrm {bd}\ \mathrm {dom}\,g ]\big \}}\\\ge & {} \frac{\mathrm {sdist}[F({\bar{x}}),\mathrm {bd}\ \mathrm {dom}\,g ]^2 }{4 \big \{\mathrm {sdist}[F({\bar{x}}),\mathrm {bd}\ \mathrm {dom}\,g ] + \Vert \varvec{L}\Vert \Vert x - {\bar{x}}\Vert ^2/2\big \}}. \end{aligned}$$

Eventually, combining this equation with (53) and (56) completes the proof. \(\square \)
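
For readers who want a feel for the quantities in Lemma 6, the small sketch below simply transcribes the formula for \(\gamma ({\bar{x}},x)\) and the lower bound in Eq. (51) as scalar computations; the numerical values are illustrative assumptions, not data from the paper.

```python
def gamma(s_bar, s_shift):
    # Step fraction of Lemma 6, where
    #   s_bar   = sdist[F(xbar), bd dom g]                          (> 0 at the Slater point),
    #   s_shift = sdist[F(xbar) + L * ||x - xbar||^2 / 2, bd dom g].
    # Lemma 5 gives s_shift <= s_bar; this sketch assumes the shift is nontrivial (s_shift < s_bar).
    return min(1.0, s_bar / (2.0 * (s_bar - s_shift)))

def slater_margin(s_bar, norm_L, dist_x_xbar):
    # Right-hand side of Eq. (51): guaranteed signed distance of
    # H(x, x + gamma * (xbar - x)) to the boundary of dom g.
    return s_bar ** 2 / (4.0 * (s_bar + norm_L * dist_x_xbar ** 2 / 2.0))

# Illustrative numbers: the Slater point is 0.8 inside dom g, the shifted point 0.3 outside.
s_bar, s_shift = 0.8, -0.3
print(gamma(s_bar, s_shift))                              # ~0.364: move this far towards xbar
print(slater_margin(s_bar, norm_L=2.0, dist_x_xbar=1.0))  # ~0.089: uniform interior margin
```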

We are now ready to give the proof of Lemma 4.

Proof of Lemma 4

  1. (i)

    As the function g is \(L_g\) Lipschitz continuous on its domain, an immediate application of the Cauchy–Schwarz inequality leads to \(\varvec{L}^T\nu \le L_g \Vert \varvec{L}\Vert \) (see also Sect. 3.2.1).

  2. (ii)

    The claim is trivial if \(\mathrm {bd\ dom}(g) = \emptyset \); hence we assume it is nonempty, so that Lemmas 5 and 6 apply. Set \(w = x + \gamma ({\bar{x}},x)({\bar{x}} - x)\) with \(\gamma ({\bar{x}},x)\) given as in Lemma 6. By Lemma 6, one has \(w \in \mathrm {dom}(g\circ H(x,\cdot ))\). Then, one obtains

    $$\begin{aligned} \frac{\varvec{L}^T\nu }{2}\Vert w - y\Vert ^2= & {} [H(x,w) - H(x,y)]^T\nu \nonumber \\\le & {} g\circ H(x,w) - g\circ H(x,y)\nonumber \\\le & {} L_g \Vert H(x,w) - H(x,y)\Vert , \end{aligned}$$
    (57)

    where the equality follows from Eq. (11), the first inequality is obtained by the convexity of g, and the last inequality is due to the assumption that g is \(L_g\) Lipschitz continuous on its domain. On the other hand, a direct calculation yields

    $$\begin{aligned} \Vert H(x,w) - H(x,y)\Vert =&\ \Vert \nabla F(x) (w - y) + \frac{\varvec{L}}{2}\Vert w - x\Vert ^2 - \frac{\varvec{L}}{2}\Vert y - x\Vert ^2 \Vert \\ =&\ \Vert \nabla F(x) (w - y) + \frac{\varvec{L}}{2} ( \Vert w - y\Vert ^2 + 2(w-y)^T(y-x) ) \Vert \\ \le&\ \Vert \nabla F(x)\Vert _{\mathrm{op}}\Vert w-y\Vert + \frac{\Vert \varvec{L}\Vert }{2} \Vert w - y\Vert ^2 + \Vert \varvec{L}\Vert \Vert w-y\Vert \Vert y - x\Vert \\ =&\ \Vert w - y\Vert \left[ \Vert \nabla F(x)\Vert _{\mathrm{op}} + \frac{\Vert \varvec{L}\Vert }{2}\Vert w - y\Vert + \Vert \varvec{L}\Vert \Vert y - x\Vert \right] \\ \le&\ \Vert w - y\Vert \left[ \Vert \nabla F(x)\Vert _{\mathrm{op}} + \frac{\Vert \varvec{L}\Vert }{2}\Vert x - y\Vert \right. \\&\left. \quad + \gamma ({\bar{x}},x)\frac{\Vert \varvec{L}\Vert }{2}\Vert {\bar{x}} - x\Vert + \Vert \varvec{L}\Vert \Vert y - x\Vert \right] \\ \le&\ \Vert w - y\Vert \left[ \Vert \nabla F(x)\Vert _{\mathrm{op}} + \frac{3\Vert \varvec{L}\Vert }{2}\Vert x - y\Vert + \frac{\Vert \varvec{L}\Vert }{2}\Vert {\bar{x}} - x\Vert \right] \end{aligned}$$

    Substituting this inequality into Eq. (57) leads to

    $$\begin{aligned} \frac{\varvec{L}^T \nu }{2}\Vert H(x,w) - H(x,y)\Vert \le L_g \left[ \Vert \nabla F(x)\Vert _{\mathrm{op}} + \frac{3\Vert \varvec{L}\Vert }{2}\Vert x - y\Vert + \frac{\Vert \varvec{L}\Vert }{2}\Vert {\bar{x}} - x\Vert \right] ^2.\nonumber \\ \end{aligned}$$
    (58)

    As \(H(x,y)\) is on the boundary of \(\mathrm {dom}\,g \), it follows that

    $$\begin{aligned} \Vert H(x,w) - H(x,y)\Vert \ge \mathrm {sdist}[H(x,w),\mathrm {bd}\ \mathrm {dom}\,g ]. \end{aligned}$$
    (59)

    Combining this inequality with Lemma 6 eventually completes the proof. \(\square \)
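
As a final illustration, the coordinatewise quadratic model \(H(x,y) = F(x) + \nabla F(x)(y-x) + \tfrac{\varvec{L}}{2}\Vert y-x\Vert ^2\) used throughout this appendix (cf. Eq. (52) with \(y = x + t({\bar{x}}-x)\)) is easy to evaluate numerically; when each \(\nabla f_i\) is \(L_i\)-Lipschitz it upper-bounds F componentwise by the descent lemma. The sketch below uses a small hypothetical map whose curvature vector `L` is exact; all names are assumptions made for this example only.

```python
import numpy as np

def F(x):
    # Hypothetical smooth map R^2 -> R^2 with coordinatewise convex entries.
    return np.array([x[0] ** 2 + x[1], 0.5 * (x[0] ** 2 + x[1] ** 2)])

def jacF(x):
    # Jacobian of F at x.
    return np.array([[2.0 * x[0], 1.0],
                     [x[0], x[1]]])

L = np.array([2.0, 1.0])  # Lipschitz constants of the gradients of f_1 and f_2

def H(x, y):
    # Coordinatewise quadratic model of F around x:
    # H(x, y) = F(x) + nabla F(x)(y - x) + (L / 2) * ||y - x||^2.
    return F(x) + jacF(x) @ (y - x) + 0.5 * L * np.dot(y - x, y - x)

x = np.array([1.0, 1.0])
y = np.array([0.5, 2.0])
print(F(y))      # [2.25, 2.125]
print(H(x, y))   # componentwise >= F(y), with equality on the purely quadratic coordinate
```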


Cite this article

Bolte, J., Chen, Z. & Pauwels, E. The multiproximal linearization method for convex composite problems. Math. Program. 182, 1–36 (2020). https://doi.org/10.1007/s10107-019-01382-3
