Primal–Dual Proximal Splitting and Generalized Conjugation in Non-smooth Non-convex Optimization

Applied Mathematics & Optimization (2021)

Abstract

We demonstrate that difficult non-convex non-smooth optimization problems, such as Nash equilibrium problems and anisotropic as well as isotropic Potts segmentation models, can be written in terms of generalized conjugates of convex functionals. These, in turn, can be formulated as saddle-point problems involving convex non-smooth functionals and a general smooth but non-bilinear coupling term. We then show through detailed convergence analysis that a conceptually straightforward extension of the primal–dual proximal splitting method of Chambolle and Pock is applicable to the solution of such problems. Under sufficient local strong convexity assumptions on the functionals—but still with a non-bilinear coupling term—we even demonstrate local linear convergence of the method. We illustrate these theoretical results numerically on the aforementioned example problems.

References

  1. Aragón Artacho, F.J., Geoffroy, M.H.: Characterization of metric regularity of subdifferentials. J. Convex Anal. 15(2), 365–380 (2008)

  2. Aragón Artacho, F.J., Geoffroy, M.H.: Metric subregularity of the convex subdifferential in Banach spaces. J. Nonlinear Convex Anal. 15(1), 35–47 (2014)

  3. Attouch, H., Bolte, J., Svaiter, B.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods. Math. Progr. 137(1–2), 91–129 (2013). https://doi.org/10.1007/s10107-011-0484-9

  4. Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. CMS Books in Mathematics, 2nd edn. Springer, New York (2017)

  5. Benning, M., Knoll, F., Schönlieb, C.B., Valkonen, T.: Preconditioned ADMM with nonlinear operator constraint. In: L. Bociu, J.A. Désidéri, A. Habbal (eds.) System Modeling and Optimization: 27th IFIP TC 7 Conference, CSMO 2015, Sophia Antipolis, France, June 29–July 3, 2015, Revised Selected Papers, pp. 117–126. Springer International Publishing (2016). https://tuomov.iki.fi/m/nonlinearADMM.pdf

  6. Borzì, A., Kanzow, C.: Formulation and numerical solution of Nash equilibrium multiobjective elliptic control problems. SIAM J. Control Optim. 51(1), 718–744 (2013). https://doi.org/10.1137/120864921

  7. Chambolle, A.: An algorithm for total variation minimization and applications. J. Math. Imaging Vis. 20(1), 89–97 (2004). https://doi.org/10.1023/B:JMIV.0000011325.36760.1e

  8. Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vis. 40, 120–145 (2011). https://doi.org/10.1007/s10851-010-0251-1

  9. Clason, C., Kunisch, K.: A convex analysis approach to multi-material topology optimization. ESAIM Math. Modell. Numer. Anal. 50(6), 1917–1936 (2016). https://doi.org/10.1051/m2an/2016012

  10. Clason, C., Valkonen, T.: Primal-dual extragradient methods for nonlinear nonsmooth PDE-constrained optimization. SIAM J. Optim. 27(3), 1313–1339 (2017). https://doi.org/10.1137/16M1080859

  11. Clason, C., Mazurenko, S., Valkonen, T.: Acceleration and global convergence of a first-order primal-dual method for nonconvex problems. SIAM J. Optim. 29, 933–963 (2019). https://doi.org/10.1137/18M1170194

  12. Clason, C., Mazurenko, S., Valkonen, T.: Julia codes for “primal-dual proximal splitting and generalized conjugation in non-smooth non-convex optimization”. Online resource on Zenodo (2020). https://doi.org/10.5281/zenodo.3647614

  13. Drori, Y., Sabach, S., Teboulle, M.: A simple algorithm for a class of nonsmooth convex-concave saddle-point problems. Oper. Res. Lett. 43(2), 209–214 (2015). https://doi.org/10.1016/j.orl.2015.02.001

  14. Ekeland, I., Temam, R.: Convex Analysis and Variational Problems. SIAM, Philadelphia (1999)

  15. Elster, K.H., Wolf, A.: Recent Results on Generalized Conjugate Functions, pp. 67–78. Springer, New York (1988)

  16. Facchinei, F., Kanzow, C.: Generalized Nash equilibrium problems. Ann. Oper. Res. 175, 177–211 (2010). https://doi.org/10.1007/s10479-009-0653-x

  17. Flåm, S.D., Antipin, A.S.: Equilibrium programming using proximal-like algorithms. Math. Progr. 78(1, Ser. A), 29–41 (1997). https://doi.org/10.1007/BF02614504

  18. Geman, S., Geman, D.: Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. Pattern Anal. Mach. Intell. 6, 721–741 (1984). https://doi.org/10.1109/TPAMI.1984.4767596

  19. Hamedani, E.Y., Aybat, N.S.: A primal-dual algorithm for general convex-concave saddle point problems. Preprint (2018)

  20. He, N., Juditsky, A., Nemirovski, A.: Mirror prox algorithm for multi-term composite minimization and semi-separable problems. Comput. Optim. Appl. 61(2), 275–319 (2015). https://doi.org/10.1007/s10589-014-9723-3

  21. He, Y., Monteiro, R.D.: An accelerated HPE-type algorithm for a class of composite convex-concave saddle-point problems. SIAM J. Optim. 26(1), 29–56 (2016). https://doi.org/10.1137/14096757X

  22. Juditsky, A., Nemirovski, A.: First order methods for nonsmooth convex large-scale optimization, I: general purpose methods. In: Optimization for Machine Learning, pp. 121–148. MIT Press, Cambridge (2011)

  23. Juditsky, A., Nemirovski, A.: First order methods for nonsmooth convex large-scale optimization, II: utilizing problem's structure. In: Optimization for Machine Learning, pp. 149–183. MIT Press, Cambridge (2011)

  24. Kolossoski, O., Monteiro, R.: An accelerated non-Euclidean hybrid proximal extragradient-type algorithm for convex-concave saddle-point problems. Optim. Methods Softw. 32(6), 1244–1272 (2017). https://doi.org/10.1080/10556788.2016.1266355

  25. Krawczyk, J.B., Uryasev, S.: Relaxation algorithms to find Nash equilibria with economic applications. Environ. Model. Assess. 5(1), 63–73 (2000). https://doi.org/10.1023/A:1019097208499

  26. Martínez-Legaz, J.E.: Generalized convex duality and its economic applications. In: Hadjisavvas, N., Komlósi, S., Schaible, S. (eds.) Handbook of Generalized Convexity and Generalized Monotonicity, pp. 237–292. Springer, New York (2005)

  27. Nemirovski, A.: Prox-method with rate of convergence \(O(1/t)\) for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems. SIAM J. Optim. 15(1), 229–251 (2004). https://doi.org/10.1137/S1052623403425629

  28. Nesterov, Y.: Smooth minimization of non-smooth functions. Math. Progr. 103(1), 127–152 (2005). https://doi.org/10.1007/s10107-004-0552-5

  29. Nikaidô, H., Isoda, K.: Note on non-cooperative convex games. Pac. J. Math. 5, 807–815 (1955). https://doi.org/10.2140/pjm.1955.5.807

  30. Rasband, W.S.: ImageJ. https://imagej.nih.gov/ij/

  31. Rosen, J.B.: Existence and uniqueness of equilibrium points for concave \(n\)-person games. Econometrica 33, 520–534 (1965). https://doi.org/10.2307/1911749

  32. Singer, I.: Duality for Nonconvex Approximation and Optimization. Springer, New York (2006). https://doi.org/10.1007/0-387-28395-1

  33. Storath, M., Weinmann, A., Demaret, L.: Jump-sparse and sparse recovery using Potts functionals. IEEE Trans. Signal Process. 62(14), 3654–3666 (2014). https://doi.org/10.1109/TSP.2014.2329263

  34. Storath, M., Weinmann, A., Frikel, J., Unser, M.: Joint image reconstruction and segmentation using the Potts model. Inverse Probl. 31(2), 025003 (2015). https://doi.org/10.1088/0266-5611/31/2/025003

  35. Valkonen, T.: A primal-dual hybrid gradient method for non-linear operators with applications to MRI. Inverse Probl. 30(5), 055012 (2014). https://doi.org/10.1088/0266-5611/30/5/055012

  36. Valkonen, T.: Testing and non-linear preconditioning of the proximal point method. Appl. Math. Optim. (2018). https://doi.org/10.1007/s00245-018-9541-6

  37. Valkonen, T., Pock, T.: Acceleration of the PDHGM on partially strongly convex functions. J. Math. Imaging Vis. 59, 394–414 (2017). https://doi.org/10.1007/s10851-016-0692-2

  38. von Heusinger, A., Kanzow, C.: Optimization reformulations of the generalized Nash equilibrium problem using Nikaido-Isoda-type functions. Comput. Optim. Appl. 43(3), 353–377 (2009). https://doi.org/10.1007/s10589-007-9145-6

Acknowledgements

In the first stages of the research T. Valkonen and S. Mazurenko were supported by the EPSRC First Grant EP/P021298/1, “PARTIAL Analysis of Relations in Tasks of Inversion for Algorithmic Leverage”. Later T. Valkonen was supported by the Academy of Finland grants 314701 and 320022. C. Clason was supported by the German Science Foundation (DFG) under grant Cl 487/2-1. We thank the anonymous reviewers for insightful comments.

Author information

Correspondence to Christian Clason.

Appendices

A Data Statement for the EPSRC

The source codes for the numerical experiments are available on Zenodo at [12].

Reductions of the Three-Point Condition

The following two propositions demonstrate that Assumption 3.2 (iv) is closely related to standard second-order optimality conditions, i.e., that the Hessian is positive definite at the solution \({\widehat{u}}\).

Proposition A.1

Suppose Assumption 3.2 (ii) (locally Lipschitz gradients of K) holds in some neighborhood \({\mathcal {U}}\) of \({\widehat{u}}\), and for some \(\xi _x\in {\mathbb {R}}\), \(\gamma _x>0\),

$$\begin{aligned} \xi _x\Vert x-{\widehat{x}}\Vert ^2 +\langle K_x(x,{\widehat{y}})-K_x({\widehat{x}},{\widehat{y}}),x-{\widehat{x}}\rangle \ge \gamma _x\Vert x-{\widehat{x}}\Vert ^2 \quad ((x,y)\in {\mathcal {U}}). \end{aligned}$$
(66)

Then (15a) holds in \({\mathcal {U}}\) with \(\theta _x=2(\gamma _x-\alpha )L_{yx}^{-1}\), and \(\lambda _x=L_x({\widehat{y}})^2(2\alpha )^{-1}\) for any \(\alpha \in (0,\gamma _x]\).

Proof

An application of Cauchy’s and Young’s inequalities with any factor \(\alpha >0\), Assumption 3.2 (ii), and (66) yields the estimate

$$\begin{aligned} \begin{aligned} \langle K_x(x',{\widehat{y}})-K_x({\widehat{x}},{\widehat{y}}),x-{\widehat{x}}\rangle +\xi _x\Vert x-{\widehat{x}}\Vert ^2&= \langle K_x(x,{\widehat{y}})-K_x({\widehat{x}},{\widehat{y}}),x-{\widehat{x}}\rangle +\xi _x\Vert x-{\widehat{x}}\Vert ^2 \\&\quad +\langle K_x(x',{\widehat{y}})-K_x(x,{\widehat{y}}),x-{\widehat{x}}\rangle \\&\ge (\gamma _x-\alpha )\Vert x-{\widehat{x}}\Vert ^2-L_x({\widehat{y}})^2(4\alpha )^{-1} \Vert x'-x\Vert ^2. \end{aligned} \end{aligned}$$

At the same time, using (16),

$$\begin{aligned} \Vert K_y({\widehat{x}},y)-K_y(x,y)-K_{yx}(x,y)({\widehat{x}}-x)\Vert \le \frac{L_{yx}}{2}\Vert x-{\widehat{x}}\Vert ^2. \end{aligned}$$

Therefore (15a) holds if we take \(\theta _x\le 2(\gamma _x-\alpha )L_{yx}^{-1}\) and \(\lambda _x=L_x({\widehat{y}})^2(2\alpha )^{-1}\). \(\square \)
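
To make the role of the Taylor-remainder bound (16) concrete, the following Julia snippet checks it numerically on the stand-in scalar coupling \(K(x,y)=y\sin x\); this is a minimal illustrative check of ours, not part of the codes [12]. Here \(K_y(x,y)=\sin x\), \(K_{yx}(x,y)=\cos x\), and \(L_{yx}=1\) since \(\cos \) is 1-Lipschitz.

```julia
# Check ||K_y(x̂,y) - K_y(x,y) - K_yx(x,y)(x̂-x)|| <= (L_yx/2)||x-x̂||^2,
# i.e., the bound (16), for the stand-in coupling K(x,y) = y*sin(x).
Ky(x)  = sin(x)
Kyx(x) = cos(x)
Lyx    = 1.0

xhat = 0.3
for x in range(xhat - 1, xhat + 1; length=101)
    rem = abs(Ky(xhat) - Ky(x) - Kyx(x)*(xhat - x))
    @assert rem <= Lyx/2*(x - xhat)^2 + 1e-12  # (16) holds on the grid
end
```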

Proposition A.2

Suppose Assumption 3.2 (ii) (locally Lipschitz gradients of K) holds in some neighborhood \({\mathcal {U}}\) of \({\widehat{u}}\) with \(L_y(x)\le {\bar{L}}_y\), and that

$$\begin{aligned} \Vert K_{xy}(x,y')-K_{xy}(x,y)\Vert \le L_{xy}\Vert y'-y\Vert \quad (u,u'\in {\mathcal {U}}) \end{aligned}$$

for some constant \(L_{xy} \ge 0\). Assume, moreover, for some \(\xi _y\in {\mathbb {R}}\), \(\gamma _y>0\) that

$$\begin{aligned} \xi _y\Vert y-{\widehat{y}}\Vert ^2 +\langle K_y({\widehat{x}},{\widehat{y}})-K_y({\widehat{x}},y),y-{\widehat{y}}\rangle \ge \gamma _y\Vert y-{\widehat{y}}\Vert ^2 \quad ((x,y)\in {\mathcal {U}}). \end{aligned}$$
(67)

Then (15b) holds in \({\mathcal {U}}\) with \(\theta _y=2(\gamma _y-\alpha _1)(1+\alpha _2)^{-1} L_{xy}^{-1}\), and \(\lambda _y=({\bar{L}}_y^2(2\alpha _1)^{-1}+(1+\alpha _2^{-1})L_{xy}\theta _y)\) for any \(\alpha _1\in (0,\gamma _y]\), \(\alpha _2>0\).

Proof

An application of Cauchy’s and Young’s inequalities with any factor \(\alpha _1>0\), Assumption 3.2 (ii), and (67) yields the estimate

$$\begin{aligned} \begin{aligned} \langle K_y(x,y)-K_y(x,y')&+K_y({\widehat{x}},{\widehat{y}})-K_y({\widehat{x}},y),y-{\widehat{y}}\rangle +\xi _y\Vert y-{\widehat{y}}\Vert ^2 \\&\ge \langle K_y(x,y)-K_y(x,y'),y-{\widehat{y}}\rangle +\gamma _y\Vert y-{\widehat{y}}\Vert ^2 \\&\ge (\gamma _y-\alpha _1)\Vert y-{\widehat{y}}\Vert ^2-\frac{L_y(x)^2}{4\alpha _1} \Vert y'-y\Vert ^2. \end{aligned}\end{aligned}$$

At the same time, using (16) and Young’s inequality for any \(\alpha _2>0\),

$$\begin{aligned}&\Vert K_x(x',{\widehat{y}})-K_x(x',y')-K_{xy}(x',y')({\widehat{y}}-y')\Vert \le \frac{L_{xy}}{2}\Vert y'-{\widehat{y}}\Vert ^2 \\&\quad \le \frac{L_{xy}}{2}(1+\alpha _2)\Vert y-{\widehat{y}}\Vert ^2 +\frac{L_{xy}}{2}(1+\alpha _2^{-1})\Vert y'-y\Vert ^2. \end{aligned}$$

Therefore (15b) holds if we take \(\theta _y\le 2\frac{\gamma _y-\alpha _1}{(1+\alpha _2)L_{xy}}\) and \(\lambda _y=\frac{{\bar{L}}_y^2}{2\alpha _1}+(1+\alpha _2^{-1})L_{xy}\theta _y\). \(\square \)

Relaxations of the Three-Point Condition

In all the results of this paper, Assumption 3.2 (iv) can be generalized to the following three-point condition similar to the one used in [11].

Assumption B.1

The functional \(K\in C^1(X\times Y)\), and there exists a neighborhood

$$\begin{aligned} {{\mathcal {U}}(\rho _x,\rho _y) :=(\mathbb {B}({\widehat{x}}, \rho _x) \cap {\mathcal {X}}_G) \times (\mathbb {B}({\widehat{y}}, \rho _y) \cap {\mathcal {Y}}_{F^*}),} \end{aligned}$$
(68)

for some \(\rho _x,\rho _y>0\) such that for all \(u',u \in {\mathcal {U}}(\rho _x,\rho _y)\), the following property holds:

  (iv*) (three-point condition) There exist \(\theta _x,\theta _y > 0\), \(\lambda _x,\lambda _y\ge 0\), \(\xi _x,\xi _y\in {\mathbb {R}}\), and \(p_x,p_y\in [1,2]\) such that

    $$\begin{aligned}&\langle K_x(x',{\widehat{y}})-K_x({\widehat{x}},{\widehat{y}}),x-{\widehat{x}}\rangle +\xi _x\Vert x-{\widehat{x}}\Vert ^2\nonumber \\&\quad \ge \theta _x\Vert K_y({\widehat{x}},y)-K_y(x,y)-K_{yx}(x,y)({\widehat{x}}-x)\Vert ^{p_x} -\frac{\lambda _x}{2}\Vert x-x'\Vert ^2, \end{aligned}$$
    (69a)
    $$\begin{aligned}&\langle K_y(x,y)-K_y(x,y')+K_y({\widehat{x}},{\widehat{y}})-K_y({\widehat{x}},y),y-{\widehat{y}}\rangle +\xi _y\Vert y-{\widehat{y}}\Vert ^2 \nonumber \\&\quad \ge \theta _y\Vert K_x(x',{\widehat{y}})-K_x(x',y')-K_{xy}(x',y')({\widehat{y}}-y')\Vert ^{p_y} -\frac{\lambda _y}{2}\Vert y-y'\Vert ^2. \end{aligned}$$
    (69b)

This assumption introduces \(p_x\) and \(p_y\) in [1, 2], while in Assumption 3.2 (iv) we had \(p_x=p_y=1\). For instance, in [11, Appendix B] we verified Assumption B.1 with \(p_x=2\) for the case \(K(x,y)=\langle A(x),y\rangle \) for the reconstruction of the phase and amplitude of a complex number. This relaxation mainly affects the proof of Step 4 in Theorem 4.2, which now requires a few intermediate derivations.

Corollary B.1

The results of Theorem 4.2 continue to hold if Assumption 3.2 (iv) is replaced with Assumption B.1 (iv*) for some \(p_x,p_y\in [1,2]\), where in case \(p_y \in (1, 2]\), (24d) is replaced by

$$\begin{aligned} \gamma _G&\ge {{\tilde{\gamma }}}_G + \xi _x + \frac{p_y-1}{(\theta _yp_y^{p_y}\rho _x^{p_y-2}{\overline{\omega }}^{-1})^{\frac{1}{p_y-1}}}, \end{aligned}$$
(70a)

and in case \(p_x \in (1, 2]\), (24e) is replaced by

$$\begin{aligned} \gamma _{F^*}&\ge {{\tilde{\gamma }}}_{F^*}+ \xi _y + \frac{p_x-1}{({\underline{\omega }}\theta _xp_x^{p_x} \rho _y^{p_x-2})^{\frac{1}{p_x-1}}}. \end{aligned}$$
(70b)

Proof

The beginning of the proof follows exactly the same steps as the proof of Theorem 4.2 up to (30). We now use Assumption B.1 (iv*) to further bound \(D_x\) and \(D_y\) similarly to (31) and (32). From (69a),

$$\begin{aligned} \begin{aligned} D_x&\ge \theta _x\Vert K_y({\widehat{x}},y^{i+1})-K_y(x^{i+1},y^{i+1})-K_{yx}(x^{i+1},y^{i+1})({\widehat{x}}-x^{i+1})\Vert ^{p_x} -\frac{\lambda _x}{2}\Vert x^{i+1}-x^i\Vert ^2 \\&-\Vert y^{i+1}-{\widehat{y}}\Vert \Vert K_y({\widehat{x}},y^{i+1})-K_y(x^{i+1},y^{i+1})-K_{yx}(x^{i+1},y^{i+1})({\widehat{x}}-x^{i+1})\Vert \omega _i^{-1}. \end{aligned} \end{aligned}$$
(71)

The following generalized Young’s inequality, which holds for any positive \(a\), \(b\), \(p\), and \(q\) with \(q^{-1}+p^{-1}=1\), allows for our choice of varying \(p_x\in [1,2]\):

$$\begin{aligned} ab=\left( ab^{\frac{2-p}{p}}\right) b^{2\frac{p-1}{p}} \le \frac{1}{p}\left( ab^{\frac{2-p}{p}}\right) ^p+\frac{1}{q}b^{2\frac{p-1}{p}q} =\frac{1}{p}a^pb^{2-p}+\biggl (1-\frac{1}{p}\biggr )b^2. \end{aligned}$$
(72)

Applying this inequality with \(p=p_x\),

$$\begin{aligned} a&:=(\zeta _x p_x)^{-1/2} \Vert K_y({\widehat{x}},y^{i+1})-K_y(x^{i+1},y^{i+1})-K_{yx}(x^{i+1},y^{i+1})({\widehat{x}}-x^{i+1})\Vert , \\ b&:=(\zeta _x p_x)^{1/2}\Vert y^{i+1}-{\widehat{y}}\Vert , \end{aligned}$$

for any \(\zeta _x>0\) to the last term of (71), we arrive at the estimate

$$\begin{aligned} \begin{aligned} D_x&\ge \theta _x\Vert K_y({\widehat{x}},y^{i+1})-K_y(x^{i+1},y^{i+1})-K_{yx}(x^{i+1},y^{i+1})({\widehat{x}}-x^{i+1})\Vert ^{p_x} -\frac{\lambda _x}{2}\Vert x^{i+1}-x^i\Vert ^2 \\&\quad -\frac{\Vert y^{i+1}-{\widehat{y}}\Vert ^{2-p_x}}{p_x^{p_x}\omega _i\zeta _x^{p_x-1}} \Vert K_y({\widehat{x}},y^{i+1})-K_y(x^{i+1},y^{i+1})-K_{yx}(x^{i+1},y^{i+1})({\widehat{x}}-x^{i+1})\Vert ^{p_x} \\&\quad -\frac{p_x-1}{\omega _i}\zeta _x \Vert y^{i+1}-{\widehat{y}}\Vert ^2. \end{aligned} \end{aligned}$$

We now use \(u^{i+1}\in {\mathcal {U}}(\rho _x,\rho _y)\) for some \(\rho _x,\rho _y> 0\), and \(\omega _i^{-1} \le {\underline{\omega }}^{-1}\) to obtain

$$\begin{aligned} \theta _x-\Vert y^{i+1}-{\widehat{y}}\Vert ^{2-p_x}(p_x^{p_x}\omega _i\zeta _x^{p_x-1})^{-1} \ge \theta _x-\rho _y^{2-p_x}(p_x^{p_x}{\underline{\omega }}\zeta _x^{p_x-1})^{-1}. \end{aligned}$$
(73)

If \(p_x=1\), we use the assumed inequality \(\theta _x \ge \rho _y{\underline{\omega }}^{-1}\) from (24e) to show that the right-hand side of (73) is non-negative for any \(\zeta _x>0\). Otherwise we take \(\zeta _x :=({\underline{\omega }}\theta _xp_x^{p_x}\rho _y^{p_x-2})^{1/(1-p_x)}\) to ensure the right-hand side of (73) is zero. In either case, \(\theta _x-\rho _y^{2-p_x}(p_x^{p_x}{\underline{\omega }}\zeta _x^{p_x-1})^{-1}\ge 0\) and hence

$$\begin{aligned} D_x \ge -\frac{\lambda _x}{2}\Vert x^{i+1}-x^i\Vert ^2 -(p_x-1)\omega ^{-1}_i\zeta _x \Vert y^{i+1}-{\widehat{y}}\Vert ^2. \end{aligned}$$
(74)

Analogously, from (69b) and Cauchy’s inequality,

$$\begin{aligned} \begin{aligned} D_y&\ge \theta _y\Vert K_x(x^i,{\widehat{y}})-K_x(x^i,y^i)-K_{xy}(x^i,y^i)({\widehat{y}}-y^i)\Vert ^{p_y} -\frac{\lambda _y}{2}\Vert y^{i+1}-y^i\Vert ^2 \\&\quad -\omega _i\Vert x^{i+1}-{\widehat{x}}\Vert \Vert K_x(x^i,{\widehat{y}})-K_x(x^i,y^i)-K_{xy}(x^i,y^i)({\widehat{y}}-y^i)\Vert . \end{aligned} \end{aligned}$$

This has a structure similar to (71) with \(\omega _i\) now as a multiplier. Hence, we apply a similar generalized Young’s inequality to the last term with any \(\zeta _y>0\). Noting that \(\omega _i\le {\overline{\omega }}\), we use the following bound similar to (73):

$$\begin{aligned} \theta _y-\Vert x^{i+1}-{\widehat{x}}\Vert ^{2-p_y}\omega _i(p_y^{p_y}\zeta _y^{p_y-1})^{-1} \ge \theta _y-\rho _x^{2-p_y}{\overline{\omega }}(p_y^{p_y}\zeta _y^{p_y-1})^{-1} \ge 0. \end{aligned}$$

The last inequality holds for any \(\zeta _y>0\) if \(p_y=1\) due to the assumed \(\theta _y \ge {\overline{\omega }}\rho _x\) from (24d); otherwise, we set \(\zeta _y :=(\theta _yp_y^{p_y}\rho _x^{p_y-2}{\overline{\omega }}^{-1})^{1/(1-p_y)}\). We then obtain that

$$\begin{aligned} D_y \ge -\frac{\lambda _y}{2}\Vert y^{i+1}-y^i\Vert ^2 -(p_y-1)\omega _i\zeta _y \Vert x^{i+1}-{\widehat{x}}\Vert ^2. \end{aligned}$$
(75)

Combining (30), (74), and (75), we can thus bound

$$\begin{aligned} \begin{aligned} D&= \eta _i D_x+\eta _{i+1}D_y+\eta _{i+1}D_\omega +\eta _i(\gamma _G-{{\tilde{\gamma }}}_G-\xi _x)\Vert x^{i+1}-{\widehat{x}}\Vert ^2 \\&\quad +\eta _{i+1}(\gamma _{F^*}-{{\tilde{\gamma }}}_{F^*}-\xi _y)\Vert y^{i+1}-{\widehat{y}}\Vert ^2 \\&\ge \eta _{i+1}(\gamma _{F^*}-{{\tilde{\gamma }}}_{F^*}-\xi _y-(p_x-1)\zeta _x) \Vert y^{i+1}-{\widehat{y}}\Vert ^2 -\eta _i\frac{\lambda _x}{2}\Vert x^{i+1}-x^i\Vert ^2 \\&\quad +\eta _{i}(\gamma _G-{{\tilde{\gamma }}}_G-\xi _x-(p_y-1)\zeta _y)\Vert x^{i+1}-{\widehat{x}}\Vert ^2 -\eta _{i+1}\frac{\lambda _y}{2}\Vert y^{i+1}-y^i\Vert ^2 \\&\quad -\eta _i \frac{L_{yx}}{2}(\omega _i+2)\rho _y\Vert x^{i+1}-x^i\Vert ^2 \\&\ge -\eta _i\frac{\lambda _x+L_{yx}(\omega _i+2)\rho _y}{2}\Vert x^{i+1}-x^i\Vert ^2 -\eta _{i+1}\frac{\lambda _y}{2}\Vert y^{i+1}-y^i\Vert ^2, \end{aligned} \end{aligned}$$
(76)

where in the final step, we have also used (70) and the selected \(\zeta _x\) and \(\zeta _y\) if \(p_x>1\) or \(p_y>1\) or both. Thus we obtain exactly the same lower bound as in (33), and we then continue along the rest of the proof of Theorem 4.2 to obtain the claim. \(\square \)
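
As a quick numerical sanity check of the generalized Young’s inequality (72) (again an illustration of ours, not part of the codes [12]), one can verify it in Julia over a grid of sample values:

```julia
# Check a*b <= (1/p)*a^p*b^(2-p) + (1 - 1/p)*b^2, i.e., inequality (72),
# for positive a, b and exponents p in (1, 2].
for p in (1.1, 1.5, 2.0), a in (0.1, 1.0, 3.0), b in (0.2, 1.0, 5.0)
    @assert a*b <= a^p*b^(2 - p)/p + (1 - 1/p)*b^2 + 1e-12
end
```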

It is worth observing that when \(p_x \in (1, 2]\) or \(p_y\in (1,2]\), the inequalities (70) do not directly bound the respective \(\rho _y\) or \(\rho _x\). Hence, we do not need to initialize the corresponding variable locally, unlike when \(p_x=1\) or \(p_y=1\). On the other hand, sufficient strong convexity is required of the corresponding \(G\) and \(F^*\).

We continue with the lemma ensuring that the iterates stay in the initial neighborhood of the saddle point.

Corollary B.2

The results of Lemma 5.2 continue to hold if the corresponding conditions of Theorem 4.2 are replaced with those in Corollary B.1.

Proof

The proof repeats that of Lemma 5.2, applying Corollary B.1 instead of Theorem 4.2 in Step 2. \(\square \)

We next extend the results of Section 6 to arbitrary choices of both \(p_x \in [1,2]\) and \(p_y \in [1,2]\). This mainly consists of verifying (70a) when \(p_y \ne 1\) and (70b) when \(p_x \ne 1\). Note that it is possible to take \(p_x=1\) and \(p_y \ne 1\), or vice versa, as long as the corresponding conditions are satisfied.

Corollary B.3

The results of Theorem 6.1 continue to hold if Assumption 3.2 (iv) is replaced with Assumption B.1 (iv*) for some \(p_x,p_y\in [1,2]\), where in case \(p_y \in (1, 2]\), (46a) is replaced with

$$\begin{aligned} \xi _x&= \gamma _G - \frac{p_y-1}{(\theta _yp_y^{p_y}(2\rho _x)^{p_y-2})^{\frac{1}{p_y-1}}}, \end{aligned}$$
(77a)

and in case \(p_x \in (1, 2]\), (46b) is replaced with

$$\begin{aligned} \xi _y&= \gamma _{F^*} - \frac{p_x-1}{(\theta _xp_x^{p_x}(2\rho _y)^{p_x-2})^{\frac{1}{p_x-1}}}. \end{aligned}$$
(77b)

Proof

Since conditions (77) are sufficient for (70) with \({\overline{\omega }}={\underline{\omega }}=1\) to hold, we can repeat the proof of Theorem 6.1 replacing the references to Theorem 4.2 by references to Corollary B.1 up to (50). If \(p_x>1\), we now obtain an upper bound on \(d_i^x\) by arguing as in (71)–(73) with \({\widehat{u}}\) replaced by \({\bar{u}}\). Specifically, using (16), Assumption B.1 (iv*) at \({\bar{u}}\), and the generalized Young’s inequality (72), we obtain for any \(\zeta _x>0\) that

$$\begin{aligned} \begin{aligned} d_i^x&\le -\theta _x\Vert K_y({\bar{x}},y^{i+1})-K_y(x^{i+1},y^{i+1})-K_{yx}(x^{i+1},y^{i+1})({\bar{x}}-x^{i+1})\Vert ^{p_x} \\&\quad +\Vert y^{i+1}-{\bar{y}}\Vert \Vert K_y({\bar{x}},y^{i+1})-K_y(x^{i+1},y^{i+1})-K_{yx}(x^{i+1},y^{i+1})({\bar{x}}-x^{i+1})\Vert \\&\quad +\frac{\lambda _x}{2}\Vert x^{i+1}-x^i\Vert ^2 -\frac{p_y-1}{(\theta _yp_y^{p_y}(2\rho _x)^{p_y-2})^{\frac{1}{p_y-1}}}\Vert x^{i+1}-{\bar{x}}\Vert ^2 \\&\le \left( \frac{\Vert y^{i+1}-{\bar{y}}\Vert ^{2-p_x}}{p_x^{p_x}\zeta _x^{p_x-1}}-\theta _x\right) \Vert K_y({\bar{x}},y^{i+1})-K_y(x^{i+1},y^{i+1})-K_{yx}(x^{i+1},y^{i+1})({\bar{x}}-x^{i+1})\Vert ^{p_x} \\&\quad +(p_x-1)\zeta _x \Vert y^{i+1}-{\bar{y}}\Vert ^2 +\frac{\lambda _x}{2}\Vert x^{i+1}-x^i\Vert ^2 -\frac{p_y-1}{(\theta _yp_y^{p_y}(2\rho _x)^{p_y-2})^{\frac{1}{p_y-1}}}\Vert x^{i+1}-{\bar{x}}\Vert ^2. \end{aligned} \end{aligned}$$

Inserting \(\zeta _x=(\theta _xp_x^{p_x}(2\rho _y)^{p_x-2})^{1/(1-p_x)}\) and \(\Vert y^{i+1}-{\bar{y}}\Vert \le 2\rho _y\), we eliminate the first term on the right-hand side. Likewise, if \(p_y>1\), similar steps applied to \(d_i^y\) result in

$$\begin{aligned} d_i^y\le (p_y-1)\zeta _y \Vert x^{i+1}-{\bar{x}}\Vert ^2 +\frac{\lambda _y}{2}\Vert y^{i+1}-y^i\Vert ^2 -\frac{p_x-1}{(\theta _xp_x^{p_x}(2\rho _y)^{p_x-2})^{\frac{1}{p_x-1}}}\Vert y^{i+1}-{\bar{y}}\Vert ^2 \end{aligned}$$

for \(\zeta _y=(\theta _yp_y^{p_y}(2\rho _x)^{p_y-2})^{1/(1-p_y)}\). Using \(\Vert u^{i+1}-u^i\Vert \rightarrow 0\) and the selection of \(\zeta _x\) and \(\zeta _y\), we then obtain the desired estimate \(\limsup _{i\rightarrow \infty }~q_i:=\limsup _{i\rightarrow \infty }~(d_i^x + d_i^y + O(\Vert u^{i+1}-u^i\Vert ))\le 0\). \(\square \)

Corollary B.4

The results of Theorem 6.3 continue to hold if Assumption 3.2 (iv) is replaced with Assumption B.1 (iv*) for some \(p_x,p_y\in [1,2]\), where in case \(p_y \in (1, 2]\), (51a) is replaced for some \({{\tilde{\gamma }}}_G > 0\) with

$$\begin{aligned} \xi _x&= \gamma _G - {{\tilde{\gamma }}}_G - \frac{p_y-1}{(\theta _yp_y^{p_y}(\rho _x)^{p_y-2})^{\frac{1}{p_y-1}}}, \end{aligned}$$
(78a)

and in case \(p_x \in (1, 2]\), (51b) is replaced with

$$\begin{aligned} \xi _y&= \gamma _{F^*} - \frac{p_x-1}{(\theta _xp_x^{p_x}(\rho _y)^{p_x-2})^{\frac{1}{p_x-1}}}. \end{aligned}$$
(78b)

Proof

Conditions (78) are sufficient for (70) with \({\overline{\omega }}={\underline{\omega }}=1\) to hold; therefore, we can repeat the proof of Theorem 6.3 replacing the references to Theorem 4.2 by references to Corollary B.1. \(\square \)

Corollary B.5

The results of Theorem 6.4 continue to hold if Assumption 3.2 (iv) is replaced with Assumption B.1 (iv*) for some \(p_x,p_y\in [1,2]\), where in case \(p_y \in (1, 2]\), (54a) is replaced for some \({{\tilde{\gamma }}}_G>0\) with

$$\begin{aligned} \xi _x&= \gamma _G - {{\tilde{\gamma }}}_G - \frac{p_y-1}{(\theta _yp_y^{p_y}(\rho _x)^{p_y-2}\omega ^{-1})^{\frac{1}{p_y-1}}}, \end{aligned}$$
(79a)

and in case \(p_x \in (1, 2]\), (54b) is replaced for some \({{\tilde{\gamma }}}_{F^*}>0\) with

$$\begin{aligned} \xi _y&= \gamma _{F^*} - {{\tilde{\gamma }}}_{F^*} - \frac{p_x-1}{(\omega \theta _xp_x^{p_x}(\rho _y)^{p_x-2})^{\frac{1}{p_x-1}}}. \end{aligned}$$
(79b)

Proof

Conditions (79) are sufficient for (70) with \({\overline{\omega }}={\underline{\omega }}=\omega \) to hold; therefore, we can repeat the proof of Theorem 6.4 replacing the references to Theorem 4.2 by references to Corollary B.1. \(\square \)

Corollary B.6

The results of Proposition 6.6 continue to hold if the corresponding conditions of Theorem 6.1, 6.3, or 6.4 are replaced with those in Corollary B.3, B.4, or B.5.

Proof

The proof repeats that of Proposition 6.6. \(\square \)

Verification of Conditions for Step Function Representation and Potts Model

Throughout this section, we set \(\rho (t) :=2t-t^2\) and \(\kappa (x, y) :=\rho (\langle x,y\rangle )\) for \(x, y \in {\mathbb {R}}^m\). Then \(\rho '(t)=2(1-t)\) so that

$$\begin{aligned} \kappa _x(x,y)=2y(1-\langle y,x\rangle )\quad \text {and}\quad \kappa _{xy}(x,y)=2(I-\langle y,x\rangle I-y \otimes x), \end{aligned}$$
(80a)
$$\begin{aligned} \kappa _y(x,y)=2x(1-\langle x,y\rangle )\quad \text {and}\quad \kappa _{yx}(x,y)=2(I-\langle x,y\rangle I-x \otimes y), \end{aligned}$$
(80b)

where \(a\otimes b\in {\mathbb {R}}^{n\times n}\) is the tensor product between two vectors a and b, producing a matrix of all the combinations of products between the entries.

The following lemma verifies Assumption 3.2 for \(K=\kappa \).

Lemma C.1

Let \(R_K>2\), and suppose \({\widehat{x}},{\widehat{y}}\in {\mathbb {R}}^m\) for \(m \ge 1\) with

$$\begin{aligned} 0 \le \langle {\widehat{x}},{\widehat{y}}\rangle I + {\widehat{x}}\otimes {\widehat{y}}\le 2I. \end{aligned}$$
(81)

Then the function \(K=\kappa \) defined above satisfies Assumption 3.2 for some \(\theta _x, \theta _y >0\) and some \(\rho _x,\rho _y>0\) dependent on \(R_K\) with

$$\begin{aligned} L_x(y)&= 2|y|_2^2,&L_y(x)&= 2|x|_2^2,&L_{yx}&= 4(|{\widehat{y}}|_2+\rho _y), \end{aligned}$$

as well as the constants \(\xi _x,\xi _y \in {\mathbb {R}}\), \(\lambda _x,\lambda _y \ge 0\) satisfying \(\lambda _x\xi _x > 2(\lambda _x+|{\widehat{y}}|_2^2)|{\widehat{y}}|_2^2\), \(\xi _y > 0\), and \(\lambda _y > |{\widehat{x}}|_2^2\).

Proof

First, Assumption 3.2 (i) holds everywhere since \(K\in C^\infty ({\mathbb {R}}^m\times {\mathbb {R}}^m)\). To verify Assumption 3.2 (ii), we observe using (80) that

$$\begin{aligned} \kappa _x(x', y)-\kappa _x(x, y)=2(y \otimes y)(x-x'), \end{aligned}$$
(82a)
$$\begin{aligned} \kappa _{xy}(x, y')-\kappa _{xy}(x, y)=2\langle y-y',x\rangle I+2(y-y') \otimes x, \end{aligned}$$
(82b)
$$\begin{aligned} \kappa _y(x, y')-\kappa _y(x, y)=2(x \otimes x)(y-y'), \end{aligned}$$
(82c)
$$\begin{aligned} \kappa _{yx}(x', y)-\kappa _{yx}(x, y)=2\langle x-x',y\rangle I+2(x-x') \otimes y. \end{aligned}$$
(82d)

Hence \(L_x\), \(L_y\), and \(L_{yx}\) are as claimed.

To verify Assumption 3.2 (iii), we first of all observe using (81) that

$$\begin{aligned} |\kappa _{xy}({\widehat{x}}, {\widehat{y}})|_2 = 2|I-\langle {\widehat{y}},{\widehat{x}}\rangle I-{\widehat{y}}\otimes {\widehat{x}}|_2 \le 2. \end{aligned}$$

Therefore \({\sup _{(x, y) \in \mathbb {B}({\widehat{x}}, \rho _x) \times \mathbb {B}({\widehat{y}}, \rho _y)} |\kappa _{xy}(x, y)|_2 \le R_K}\) for some \({\rho _x, \rho _y>0}\) dependent on \(R_K>2\).

Finally, to verify Assumption 3.2 (iv), we start with (15a), i.e.,

$$\begin{aligned}&\langle \kappa _x(x',{\widehat{y}})-\kappa _x({\widehat{x}},{\widehat{y}}),x-{\widehat{x}}\rangle +\xi _x|x-{\widehat{x}}|_2^2 \\&\quad \ge \theta _x|\kappa _y({\widehat{x}},y)-\kappa _y(x,y)-\kappa _{yx}(x,y)({\widehat{x}}-x)|_2 -\frac{\lambda _x}{2}|x-x'|_2^2. \end{aligned}$$

Expanding this inequality using (80), (82), and

$$\begin{aligned} \begin{aligned} \kappa _y({\widehat{x}},y)&-\kappa _y(x,y)-\kappa _{yx}(x,y)({\widehat{x}}-x) \\&=2{\widehat{x}}(1-\langle {\widehat{x}},y\rangle )-2x(1-\langle x,y\rangle ) -2(I-\langle x,y\rangle I-x \otimes y)({\widehat{x}}-x) \\ {}&=2[\langle x,y\rangle x-\langle {\widehat{x}},y\rangle {\widehat{x}}+(\langle x,y\rangle I+x \otimes y)({\widehat{x}}-x)] \\ {}&=2[\langle x-{\widehat{x}},y\rangle {\widehat{x}}+(x \otimes y)({\widehat{x}}-x)] \\ {}&=-2(({\widehat{x}}-x) \otimes y)({\widehat{x}}-x), \end{aligned} \end{aligned}$$

we require that

$$\begin{aligned} 2\langle {\widehat{x}}-x',x-{\widehat{x}}\rangle _{{\widehat{y}}\otimes {\widehat{y}}} +\xi _x|x-{\widehat{x}}|_2^2 \ge 2\theta _x|y|_2|x-{\widehat{x}}|_2^2 -\frac{\lambda _x}{2}|x-x'|_2^2. \end{aligned}$$
(83)

Taking any \(\alpha >0\), this will hold by Cauchy’s and Young’s inequalities if \(\xi _x \ge (2+\alpha )|{\widehat{y}}|_2^2 + 2\theta _x|y|_2\) and \(\lambda _x/2 \ge \alpha ^{-1}|{\widehat{y}}|_2^2\). If \(|{\widehat{y}}|_2=0\), clearly these hold for some \(\alpha ,\theta _x>0\). Otherwise, solving \(\alpha \) from the latter as an equality, i.e., taking \(\alpha =2\lambda ^{-1}_x|{\widehat{y}}|_2^2\), the former holds if \(\xi _x \ge 2(1+\lambda ^{-1}_x|{\widehat{y}}|_2^2)|{\widehat{y}}|_2^2 + 2\theta _x|y|_2\). If \(\lambda _x\xi _x > 2(\lambda _x+|{\widehat{y}}|_2^2)|{\widehat{y}}|_2^2\), this holds for some \(\theta _x,\rho _x,\rho _y>0\) in a neighborhood \({\mathbb {B}({\widehat{x}}, \rho _x) \times \mathbb {B}({\widehat{y}}, \rho _y)}\) of (\({\widehat{x}}, {\widehat{y}})\).

It remains to verify (15b), i.e.,

$$\begin{aligned}&\langle \kappa _y(x,y)-\kappa _y(x,y')+\kappa _y({\widehat{x}},{\widehat{y}})-\kappa _y({\widehat{x}},y),y-{\widehat{y}}\rangle +\xi _y|y-{\widehat{y}}|_2^2 \\&\quad \ge \theta _y|\kappa _x(x',{\widehat{y}})-\kappa _x(x',y')-\kappa _{xy}(x',y')({\widehat{y}}-y')|_2 -\frac{\lambda _y}{2}|y-y'|_2^2. \end{aligned}$$

Again, using (80) and (82) we expand this as

$$\begin{aligned} 2\langle y'-y,y-{\widehat{y}}\rangle _{x \otimes x}+2|y-{\widehat{y}}|_{{\widehat{x}}\otimes {\widehat{x}}}^2 +\xi _y|y-{\widehat{y}}|_2^2 \ge 2\theta _y|x'|_2|y'-{\widehat{y}}|_2^2 -\frac{\lambda _y}{2}|y-y'|_2^2. \end{aligned}$$

Rearranging the \(\theta _y\)-term, we see that this holds if

$$\begin{aligned}&2\langle y'-y,y-{\widehat{y}}\rangle _{x \otimes x-2\theta _y|x'|_2I} +2|y-{\widehat{y}}|^2_{{\widehat{x}}\otimes {\widehat{x}}}+(\xi _y-2\theta _y|x'|_2)|y-{\widehat{y}}|_2^2 \\&\quad \ge \left( 2\theta _y|x'|_2-\frac{\lambda _y}{2}\right) |y'-y|_2^2. \end{aligned}$$

Rearranging and estimating the first term as

$$\begin{aligned} \begin{aligned} 2\langle y'-y,y-{\widehat{y}}\rangle _{x \otimes x-2\theta _y|x'|_2I}&= 2\langle y'-y,x\rangle \langle y-{\widehat{y}},x\rangle -4\theta _y|x'|_2\langle y'-y,y-{\widehat{y}}\rangle \\&\ge -2|y-{\widehat{y}}|^2_{x \otimes x}-\frac{1}{2}|y'-y|^2_{x \otimes x} -4\theta _y|x'|_2|y'-y|_2^2-\theta _y|x'|_2|y-{\widehat{y}}|_2^2 \end{aligned} \end{aligned}$$

and then using Young’s inequality on both parts, we obtain the condition

$$\begin{aligned} 2\left( |y-{\widehat{y}}|^2_{{\widehat{x}}\otimes {\widehat{x}}}-|y-{\widehat{y}}|^2_{x \otimes x}\right) +(\xi _y-3\theta _y|x'|_2)|y-{\widehat{y}}|_2^2 \ge \left( \frac{1}{2}|x|_2^2 + 6\theta _y|x'|_2-\frac{\lambda _y}{2}\right) |y'-y|_2^2. \end{aligned}$$

If \(\xi _y > 0\) and \(\lambda _y > |{\widehat{x}}|_2^2\), this holds for some \(\theta _y,\rho _y,\rho _x>0\) in \({\mathbb {B}({\widehat{x}}, \rho _x) \times \mathbb {B}({\widehat{y}}, \rho _y)}\). \(\square \)

We comment on the condition (81) on the primal–dual solution pair \({\widehat{x}}, {\widehat{y}}\in {\mathbb {R}}^m\). First, for \(m=1\), this condition reduces to \({\widehat{x}}{\widehat{y}}\in [0, 1]\). This is necessarily satisfied in the case of the step function (where \(f^*=\delta _{[0, \infty )}\)) and in the case of the \(\ell ^0\) function (where \(f^*=0\)), as in both cases \({\widehat{x}}{\widehat{y}}\in \{0, 1\}\) by the dual optimality condition \(\kappa _y({\widehat{x}}, {\widehat{y}}) \in \partial f^*({\widehat{y}})\). Furthermore, if we take \(f^*_\gamma =\frac{\gamma }{2}|\,\varvec{\cdot }\,|_2^2\) for some \(\gamma \ge 0\), then for any \(m\ge 1\) the dual optimality condition reads \(2{\widehat{x}}(1-\langle {\widehat{x}},{\widehat{y}}\rangle )=\gamma {\widehat{y}}\), i.e., \({\widehat{y}}=2{\widehat{x}}(\gamma +2|{\widehat{x}}|_2^2)^{-1}\), for which (81) is easily verified.
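
For instance, the last claim is easy to confirm numerically; in Julia (our illustrative check, with arbitrary sample values for \(m\), \(\gamma \), and \({\widehat{x}}\)):

```julia
using LinearAlgebra

# Verify (81) for ŷ = 2x̂/(γ + 2|x̂|²), the dual optimality relation for
# f*_γ = (γ/2)|·|²; the operator ⟨x̂,ŷ⟩I + x̂⊗ŷ must satisfy 0 ≤ · ≤ 2I.
m, γ = 4, 0.5
xhat = randn(m)
yhat = 2xhat/(γ + 2norm(xhat)^2)

M = dot(xhat, yhat)*I + xhat*yhat'   # symmetric here, since x̂⊗ŷ ∝ x̂x̂ᵀ
λ = eigvals(Symmetric(M))
@assert all(0 .<= λ .<= 2)           # i.e., 0 ≤ M ≤ 2I, which is (81)
```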

The following lemma shows that Assumption 3.2 remains valid if we include a linear operator in the primal component.

Lemma C.2

Let \(K(x, y)={\tilde{K}}(Ax, y)\) for some \({A \in \mathbb {L}(X; Z)}\) and \({\tilde{K}} \in C^1(Z \times Y)\) on Hilbert spaces \(X\), \(Y\), \(Z\). Suppose \({\tilde{K}}\) satisfies Assumption 3.2 at \(({\widehat{z}}, {\widehat{y}}) :=(A {\widehat{x}}, {\widehat{y}})\). Mark the corresponding constants with a tilde: \({\tilde{L}}_z\), \({\tilde{R}}_K\), and so on. Then \(K\) satisfies Assumption 3.2 with \(R_K :={\tilde{R}}_K \Vert A\Vert \); \(\xi _x=\Vert A\Vert ^2{{\tilde{\xi }}}_z\), \(\xi _y={{\tilde{\xi }}}_y\); \(\lambda _x=\Vert A\Vert ^2{{\tilde{\lambda }}}_z\), \(\lambda _y={{\tilde{\lambda }}}_y\); \(\theta _x={{\tilde{\theta }}}_z\), \(\theta _y={{\tilde{\theta }}}_y\Vert A\Vert ^{-1}\); \(\rho _x=\Vert A\Vert ^{-1}{{\tilde{\rho }}}_x\), and \(\rho _y={{\tilde{\rho }}}_y\) as well as

$$\begin{aligned} L_x(y)&=\Vert A\Vert ^2{\tilde{L}}_z(y),&L_y(x)&={\tilde{L}}_y(Ax),&L_{yx}&=\Vert A\Vert ^2{\tilde{L}}_{yz}. \end{aligned}$$
(84)

Proof

Observe first of all that by the chain rule,

$$\begin{aligned} K_x(x, y)&= A^* {\tilde{K}}_z(Ax, y),&K_y(x, y)&= {\tilde{K}}_y(Ax, y),&K_{xy}(x,y)&=A^*{\tilde{K}}_{zy}(Ax, y), \end{aligned}$$

and hence Assumption 3.2 (i) holds for K if it holds for \({\tilde{K}}\).

Let now Assumption 3.2 (ii) hold for \({\tilde{K}}\) with \({\tilde{L}}_z\), \({\tilde{L}}_y\), and \({\tilde{L}}_{yz}\). Observing that

$$\begin{aligned} {A\mathbb {B}({\widehat{x}}, \rho _x) \times \mathbb {B}({\widehat{y}}, \rho _y) \subset \mathbb {B}({\widehat{z}}, {{\tilde{\rho }}}_x) \times \mathbb {B}({\widehat{y}},{{\tilde{\rho }}}_y),} \end{aligned}$$
(85)

Assumption 3.2 (ii) thus also holds with the functions of (84). Similarly, in Assumption 3.2 (iii) we can take \(R_K :={\tilde{R}}_K \Vert A\Vert \).

Finally, we expand Assumption 3.2 (iv) for K as

$$\begin{aligned} \begin{aligned}&\langle {\tilde{K}}_z(z',{\widehat{y}})-{\tilde{K}}_z({\widehat{z}},{\widehat{y}}),z-{\widehat{z}}\rangle +\xi _x\Vert x-{\widehat{x}}\Vert ^2 \\&\quad \ge \theta _x\Vert {\tilde{K}}_y({\widehat{z}},y)-{\tilde{K}}_y(z,y)-{\tilde{K}}_{yz}(z,y)({\widehat{z}}-z)\Vert -\frac{\lambda _x}{2}\Vert x-x'\Vert ^2 \end{aligned} \end{aligned}$$

and

$$\begin{aligned} \begin{aligned}&\langle {\tilde{K}}_y(z,y)-{\tilde{K}}_y(z,y') +{\tilde{K}}_y({\widehat{z}},{\widehat{y}})-{\tilde{K}}_y({\widehat{z}},y),y-{\widehat{y}}\rangle +\xi _y\Vert y-{\widehat{y}}\Vert ^2 \\&\quad \ge \theta _y\Vert A^*[{\tilde{K}}_z(z',{\widehat{y}}) -{\tilde{K}}_z(z',y')-{\tilde{K}}_{zy}(z',y')({\widehat{y}}-y')]\Vert -\frac{\lambda _y}{2}\Vert y-y'\Vert ^2, \end{aligned} \end{aligned}$$

where \(z=Ax\), \(z'=Ax'\), and \({\widehat{z}}=A{\widehat{x}}\). Since \(\Vert z-z'\Vert \le \Vert A\Vert \Vert x-x'\Vert \), etc., this follows from Assumption 3.2 (iv) for \({\tilde{K}}\) with the constants as claimed. \(\square \)

Applying this lemma to \({\tilde{K}}(z, y)=\sum _{k=1}^n \kappa (z_k, y_k)\), we can thus lift the scalar estimates for \(K=\kappa \) as in (80) to the corresponding estimates on \(K(x, y) :=\sum _{k=1}^n \kappa ([D_h x]_k, y_k)\) as used in the Potts model example.
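
For the one-dimensional Potts example, a minimal Julia sketch of this lifted coupling reads as follows; the forward-difference matrix \(D_h\) and the grid parameters are illustrative assumptions of ours, not taken from [12]. It also checks the chain rule \(K_x(x,y)=A^*{\tilde{K}}_z(Ax,y)\) from Lemma C.2 by finite differences:

```julia
using LinearAlgebra

# Lifted coupling K(x,y) = Σₖ κ([D_h x]ₖ, yₖ) with scalar components,
# where D_h is a forward-difference matrix (illustrative construction).
n, h = 8, 1.0
Dh = diagm(0 => -ones(n), 1 => ones(n - 1))[1:n-1, :]/h

ρ(t) = 2t - t^2
K(x, y)  = sum(ρ.(Dh*x .* y))              # Σₖ ρ([D_h x]ₖ yₖ)
Kx(x, y) = Dh'*(2y .* (1 .- Dh*x .* y))    # chain rule: A* K̃_z(Ax, y)

x, y = randn(n), randn(n - 1)
# central finite-difference check of the gradient in x
g = [(K(x + 1e-6*(1:n .== i), y) - K(x - 1e-6*(1:n .== i), y))/2e-6 for i in 1:n]
@assert norm(g - Kx(x, y)) < 1e-5
```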

Cite this article

Clason, C., Mazurenko, S. & Valkonen, T. Primal–Dual Proximal Splitting and Generalized Conjugation in Non-smooth Non-convex Optimization. Appl Math Optim 84, 1239–1284 (2021). https://doi.org/10.1007/s00245-020-09676-1
