
On Iteration Complexity of a First-Order Primal-Dual Method for Nonlinear Convex Cone Programming

Journal of the Operations Research Society of China

Abstract

Nonlinear convex cone programming (NCCP) models have found many practical applications. In this paper, we introduce a flexible first-order primal-dual algorithm, called the variant auxiliary problem principle (VAPP), for solving NCCP problems whose objective function and constraints are convex but possibly nonsmooth. At each iteration, VAPP generates a nonlinear approximation of the primal augmented Lagrangian model. The approximation incorporates both linearization and a distance-like proximal term, and the resulting iterations are shown to possess a decomposition property for NCCP. Motivated by recent applications in big data analytics, there has been growing interest in the convergence rate analysis of algorithms with parallel computing capabilities for large-scale optimization problems. We establish an \(O(1/t)\) convergence rate toward primal optimality, feasibility and dual optimality. By adaptively setting the parameters at different iterations, we show an \(O(1/t^2)\) rate for the strongly convex case. Finally, we discuss some issues in the implementation of VAPP.
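
For intuition, the following is a minimal Python sketch of a VAPP-style iteration in the simplest setting: linear inequality constraints \(Au\leqslant b\) (so the cone and its dual are both the nonnegative orthant), the Euclidean distance \(D(u,v)=\frac{1}{2}\Vert u-v\Vert ^2\) as the proximal term, and no nonsmooth terms. The problem data, step sizes, and function names are illustrative assumptions, not the paper's general scheme.

    import numpy as np

    # Sketch of a VAPP-style iteration for min G(u) s.t. A u <= b:
    #   q^k     = Proj_{C*}(p^k + gamma * (A u^k - b))      (auxiliary multiplier)
    #   u^{k+1} = u^k - eps * (grad G(u^k) + A^T q^k)       (linearized proximal step)
    #   p^{k+1} = Proj_{C*}(p^k + gamma * (A u^{k+1} - b))  (dual update)
    # Here C* = R^m_+, so Proj_{C*} is the componentwise positive part.
    def vapp(grad_G, A, b, u0, eps=0.05, gamma=0.5, iters=2000):
        u, p = u0.copy(), np.zeros(A.shape[0])
        for _ in range(iters):
            q = np.maximum(p + gamma * (A @ u - b), 0.0)
            u = u - eps * (grad_G(u) + A.T @ q)
            p = np.maximum(p + gamma * (A @ u - b), 0.0)
        return u, p

    # Toy instance: min ||u - c||^2 / 2 subject to A u <= b.
    rng = np.random.default_rng(0)
    A, b = rng.standard_normal((3, 5)), np.ones(3)
    c = rng.standard_normal(5) + 2.0
    u, p = vapp(lambda u: u - c, A, b, np.zeros(5))
    print("constraint violation:", np.maximum(A @ u - b, 0.0).max())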


Notes

  1. The MATLAB codes can be found at: https://github.com/lzhao-cloud/GuaranteeDisplayAdvertising.git.

  2. The MATLAB codes can be found at: https://github.com/lzhao-cloud/StructuredElasticNetSupportVectorMachine.git.

References

  1. Goberna, M.A., López, M.A.: Linear Semi-infinite Optimization, vol. 2. Wiley, London (1998)

  2. López, M., Still, G.: Semi-infinite programming. Eur. J. Oper. Res. 180(2), 491–518 (2007)

  3. Shapiro, A.: Semi-infinite programming, duality, discretization and optimality conditions. Optimization 58(2), 133–161 (2009)

  4. Alizadeh, F., Goldfarb, D.: Second-order cone programming. Math. Program. 95(1), 3–51 (2003)

  5. Fukuda, E.H., Silva, P.J., Fukushima, M.: Differentiable exact penalty functions for nonlinear second-order cone programs. SIAM J. Optim. 22(4), 1607–1633 (2012)

  6. Kanzow, C., Ferenczi, I., Fukushima, M.: On the local convergence of semismooth Newton methods for linear and nonlinear second-order cone programs without strict complementarity. SIAM J. Optim. 20(1), 297–320 (2009)

  7. Kato, H., Fukushima, M.: An SQP-type algorithm for nonlinear second-order cone programs. Optim. Lett. 1(2), 129–144 (2007)

  8. Yamashita, H., Yabe, H.: A primal–dual interior point method for nonlinear optimization over second-order cones. Optim. Methods Softw. 24(3), 407–426 (2009)

  9. Ben-Tal, A., Nemirovski, A.: Robust convex optimization. Math. Oper. Res. 23(4), 769–805 (1998)

  10. Ben-Tal, A., El Ghaoui, L., Nemirovski, A.: Robust Optimization. Princeton University Press, Princeton (2009)

  11. Lobo, M.S., Vandenberghe, L., Boyd, S., Lebret, H.: Applications of second-order cone programming. Linear Algebra Appl. 284(1), 193–228 (1998)

  12. Wu, S.P., Boyd, S., Vandenberghe, L.: FIR filter design via semidefinite programming and spectral factorization. In: Proceedings of the 35th IEEE Conference on Decision and Control, vol. 1, pp. 271–276. IEEE (1996)

  13. Candès, E.J., Romberg, J., Tao, T.: Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inf. Theory 52(2), 489–509 (2006)

  14. Patriksson, M.: A survey on the continuous nonlinear resource allocation problem. Eur. J. Oper. Res. 185(1), 1–46 (2008)

  15. Donoho, D.L.: Compressed sensing. IEEE Trans. Inf. Theory 52(4), 1289–1306 (2006)

  16. Patriksson, M., Strömberg, C.: Algorithms for the continuous nonlinear resource allocation problem–new implementations and numerical studies. Eur. J. Oper. Res. 243(3), 703–722 (2015)

  17. Hestenes, M.R.: Multiplier and gradient methods. J. Optim. Theory Appl. 4(5), 303–320 (1969)

  18. Powell, M.J.D.: A method for nonlinear constraints in minimization problems. In: Fletcher, R. (ed.) Optimization. Academic Press, London (1969)

  19. Buys, J.D.: Dual algorithms for constrained optimization problems. Brondder-Offset (1972)

  20. Rockafellar, R.T.: Augmented Lagrangians and applications of the proximal point algorithm in convex programming. Math. Oper. Res. 1(2), 97–116 (1976)

  21. Shapiro, A., Sun, J.: Some properties of the augmented Lagrangian in cone constrained optimization. Math. Oper. Res. 29(3), 479–491 (2004)

  22. Fortin, M., Glowinski, R.: Chapter III on decomposition–coordination methods using an augmented Lagrangian. Stud. Math. Its Appl. 15, 97–146 (1983)

  23. Stellato, B., Banjac, G., Goulart, P., Bemporad, A., Boyd, S.: OSQP: An operator splitting solver for quadratic programs. In: 2018 UKACC 12th International Conference on Control (CONTROL), pp. 339–339. IEEE (2018)

  24. Cohen, G., Zhu, D.L.: Decomposition coordination methods in large scale optimization problems. The nondifferentiable case and the use of augmented Lagrangians. Adv. Large Scale Syst. 1, 203–266 (1984)

  25. Contreras, J., Losi, A., Russo, M., Wu, F.F.: DistOpt: a software framework for modeling and evaluating optimization problem solutions in distributed environments. J. Parallel Distrib. Comput. 60(6), 741–763 (2000)

  26. Losi, A., Russo, M.: On the application of the auxiliary problem principle. J. Optim. Theory Appl. 117(2), 377–396 (2003)

  27. Kim, B.H., Baldick, R.: Coarse-grained distributed optimal power flow. IEEE Trans. Power Syst. 12(2), 932–939 (1997)

  28. Kim, B.H., Baldick, R.: A comparison of distributed optimal power flow algorithms. IEEE Trans. Power Syst. 15(2), 599–604 (2000)

  29. Renaud, A.: Daily generation management at Electricité de France: from planning towards real time. IEEE Trans. Autom. Control 38(7), 1080–1093 (1993)

  30. Cao, L., Sun, Y., Cheng, X., Qi, B., Li, Q.: Research on the convergent performance of the auxiliary problem principle based distributed and parallel optimization algorithm. In: 2007 IEEE International Conference on Automation and Logistics, pp. 1083–1088. IEEE (2007)

  31. Hur, D., Park, J.K., Kim, B.H.: On the convergence rate improvement of mathematical decomposition technique on distributed optimal power flow. Int. J. Electr. Power Energy Syst. 25(1), 31–39 (2003)

  32. Gabay, D., Mercier, B.: A dual algorithm for the solution of nonlinear variational problems via finite element approximation. Comput. Math. With Appl. 2(1), 17–40 (1976)

  33. Aybat, N.S., Hamedani, E.Y.: A distributed ADMM-like method for resource sharing under conic constraints over time-varying networks. arXiv:1611.07393 (2016)

  34. Li, M., Sun, D., Toh, K.C.: A majorized ADMM with indefinite proximal terms for linearly constrained convex composite optimization. SIAM J. Optim. 26(2), 922–950 (2016)

  35. He, B., Yuan, X.: On the \(O(1/n)\) convergence rate of the Douglas–Rachford alternating direction method. SIAM J. Numer. Anal. 50(2), 700–709 (2012)

  36. Monteiro, R.D., Svaiter, B.F.: Iteration-complexity of block-decomposition algorithms and the alternating direction method of multipliers. SIAM J. Optim. 23(1), 475–507 (2013)

  37. Gao, X., Zhang, S.Z.: First-order algorithms for convex optimization with nonseparable objective and coupled constraints. J. Oper. Res. Soc. China 5(2), 131–159 (2017)

  38. Deng, W., Yin, W.: On the global and linear convergence of the generalized alternating direction method of multipliers. J. Sci. Comput. 66(3), 889–916 (2016)

  39. Hong, M., Luo, Z.Q.: On the linear convergence of the alternating direction method of multipliers. Math. Program. 162(1–2), 165–199 (2017)

  40. Lin, T., Ma, S., Zhang, S.: On the global linear convergence of the ADMM with multiblock variables. SIAM J. Optim. 25(3), 1478–1497 (2015)

  41. Liu, Y., Yuan, X., Zeng, S., Zhang, J.: Partial error bound conditions and the linear convergence rate of the alternating direction method of multipliers. SIAM J. Numer. Anal. 56(4), 2095–2123 (2018)

  42. Chen, G., Teboulle, M.: A proximal-based decomposition method for convex minimization problems. Math. Program. 64(1–3), 81–101 (1994)

  43. Zhang, X., Burger, M., Osher, S.: A unified primal–dual algorithm framework based on Bregman iteration. J. Sci. Comput. 46(1), 20–46 (2011)

  44. Deng, W., Lai, M.J., Peng, Z., Yin, W.: Parallel multi-block ADMM with \(o(1/k)\) convergence. J. Sci. Comput. 71(2), 712–736 (2017)

  45. Börgens, E., Kanzow, C.: Regularized Jacobi-type ADMM-methods for a class of separable convex optimization problems in Hilbert spaces. Comput. Optim. Appl. 73(3), 755–790 (2019)

  46. Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vis. 40(1), 120–145 (2011)

  47. Chambolle, A., Pock, T.: On the ergodic convergence rates of a first-order primal-dual algorithm. Math. Program. 159(1–2), 253–287 (2016)

  48. Nemirovski, A.: Prox-method with rate of convergence \(O\)(1/t) for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems. SIAM J. Optim. 15(1), 229–251 (2004)

  49. He, N., Juditsky, A., Nemirovski, A.: Mirror prox algorithm for multi-term composite minimization and semi-separable problems. Comput. Optim. Appl. 61(2), 275–319 (2015)

  50. Juditsky, A., Nemirovski, A.: First order methods for nonsmooth convex large-scale optimization, II: utilizing problem's structure. Optim. Mach. Learn. 30(9), 149–183 (2011)

  51. Hamedani, E.Y., Aybat, N.S.: A primal–dual algorithm for general convex–concave saddle point problems. arXiv:1803.01401 (2018)

  52. Fang, Z., Li, Y., Liu, C., Zhu, W., Zhang, Y., Zhou, W.: Large-scale personalized delivery for guaranteed display advertising with real-time pacing. IEEE Int. Conf. Data Min. (ICDM) 2019, 190–199 (2019)

  53. Hojjat, A., Turner, J., Cetintas, S., Yang, J.: A unified framework for the scheduling of guaranteed targeted display advertising under reach and frequency requirements. Oper. Res. 65(2), 289–313 (2017)

  54. Turner, J.: The planning of guaranteed targeted display advertising. Oper. Res. 60(1), 18–33 (2012)

  55. Turner, J., Hojjat, A., Cetintas, S., Yang, J.: Delivering guaranteed display ADS under reach and frequency requirements. In: Twenty-Eighth AAAI Conference on Artificial Intelligence. AAAI Press (2014)

  56. Slawski, M., Zu Castell, W., Tutz, G.: Feature selection guided by structural information. Ann. Appl. Stat. 4, 1056–1080 (2010)

  57. Slawski, M.: The structured elastic net for quantile regression and support vector classification. Stat. Comput. 22(1), 153–168 (2012)

  58. Ortega, J.M., Rheinboldt, W.C.: Iterative Solution of Nonlinear Equations in Several Variables, vol. 30. SIAM, Philadelphia (1970)

  59. Shapiro, A., Scheinberg, K.: Duality and optimality conditions. Handbook of Semidefinite Programming, pp. 67–110 (2000)

  60. Cheney, W., Goldstein, A.A.: Proximity maps for convex sets. Proc. Am. Math. Soc. 10(3), 448–450 (1959)

  61. Wierzbicki, A.P., Kurcyusz, S.: Projection on a cone, penalty functionals and duality theory for problems with inequality constraints in Hilbert space. SIAM J. Control Optim. 15(1), 25–56 (1977)

  62. Beck, A., Teboulle, M.: Mirror descent and nonlinear projected subgradient methods for convex optimization. Oper. Res. Lett. 31(3), 167–175 (2003)

  63. Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004)

  64. Hiriart-Urruty, J.B., Lemaréchal, C.: Convex Analysis and Minimization Algorithms I: Fundamentals, vol. 305. Springer, Berlin (2013)

  65. Aybat, N.S., Iyengar, G.: A unified approach for minimizing composite norms. Math. Program. 144(1–2), 181–226 (2014)

  66. Vapnik, V.: Statistical Learning Theory, vol. 3. Wiley, New York (1998)

  67. Bi, J., Vapnik, V.N.: Learning with rigorous support vector machines. In: Learning Theory and Kernel Machines, pp. 243–257. Springer, Berlin (2003)

  68. Oneto, L., Ridella, S., Anguita, D.: Tikhonov, Ivanov and Morozov regularization for support vector machine learning. Mach. Learn. 103(1), 103–136 (2016)

  69. Zhao, L., Zhu, D.: First-order primal-dual method for nonlinear convex cone programs. arXiv:1801.00261v5 (2019)

Acknowledgements

The authors are grateful for valuable comments from Professor Shu-Zhong Zhang on earlier versions of this manuscript.

Author information

Corresponding author

Correspondence to Dao-Li Zhu.

Additional information

This research was supported by the National Natural Science Foundation of China (Nos. 71471112 and 71871140).

This paper is dedicated to the late Professor Quan Zheng in commemoration of his contributions to Operations Research.

Appendix

A\(_1\): Proof of Lemma 1 (Descent inequalities of the generalized distance function)

Step 1. Estimate \(L(u^{k+1},q^k)-L(u,q^k)\).

For the primal subproblem (22) of VAPP, the unique solution \(u^{k+1}\) is characterized by the following variational inequality:

$$\begin{aligned}&\langle \nabla G(u^{k}),u-u^{k+1}\rangle +J(u)-J(u^{k+1})\nonumber \\&\qquad +\langle q^k,\nabla \varOmega (u^{k})(u-u^{k+1})+\varPhi (u)-\varPhi (u^{k+1})\rangle \nonumber \\&\qquad +\frac{1}{\varepsilon ^k}\langle \nabla K(u^{k+1})-\nabla K(u^k), u-u^{k+1}\rangle \ge 0, \forall u\in {\varvec{U}}, \end{aligned}$$
(A1)

which implies that

$$\begin{aligned} L(u^{k+1},q^k)-L(u,q^k)= & {} (G+J)(u^{k+1})-(G+J)(u)+\langle q^k, \varTheta (u^{k+1})-\varTheta (u)\rangle \nonumber \\\leqslant & {} \underbrace{G(u^{k+1})-G(u)+\langle \nabla G(u^{k}),u-u^{k+1}\rangle }_{\varLambda _1}\nonumber \\&+\underbrace{\langle q^k,\varOmega (u^{k+1})-\varOmega (u)+\nabla \varOmega (u^{k})(u-u^{k+1})\rangle }_{\varLambda _2}\nonumber \\&+\underbrace{\frac{1}{\varepsilon ^k}\langle \nabla K(u^{k+1})-\nabla K(u^k), u-u^{k+1}\rangle }_{\varLambda _3}. \end{aligned}$$
(A2)

By the convexity of \(G\), we estimate the term \(\varLambda _1\) in (A2):

$$\begin{aligned} \varLambda _1= & {} G(u^k)-G(u)+\langle \nabla G(u^{k}),u-u^{k}\rangle +\big (G(u^{k+1})-G(u^{k})\nonumber \\&\quad -\langle \nabla G(u^{k}),u^{k+1}-u^{k}\rangle \big )\nonumber \\\leqslant & {} G(u^{k+1})-G(u^{k})-\langle \nabla G(u^{k}),u^{k+1}-u^{k}\rangle . \end{aligned}$$
(A3)

Since \(\varOmega (u)\) is \({\varvec{C}}\)-convex and \(q^k\in {\varvec{C}}^*\), the function \(\langle q^k,\varOmega (u)\rangle \) is convex, and

$$\begin{aligned} \varLambda _2= & {} \langle q^k,\varOmega (u^{k})-\varOmega (u)+\nabla \varOmega (u^{k})(u-u^{k})\rangle +\big (\langle q^k,\varOmega (u^{k+1})-\varOmega (u^{k})\nonumber \\&-\nabla \varOmega (u^{k})(u^{k+1}-u^{k})\rangle \big )\nonumber \\\leqslant & {} \langle q^k,\varOmega (u^{k+1})-\varOmega (u^{k})-\nabla \varOmega (u^{k})(u^{k+1}-u^{k})\rangle . \end{aligned}$$
(A4)

Since \(K(\cdot )\) satisfies Assumption 2, a simple algebraic manipulation yields

$$\begin{aligned} \varLambda _3= & {} \frac{1}{\varepsilon ^k}\langle \nabla K(u^{k+1})-\nabla K(u^k),u-u^{k+1}\rangle \nonumber \\= & {} \frac{1}{\varepsilon ^k}\big [D(u,u^k)-D(u,u^{k+1})-D(u^{k+1},u^k)\big ]. \end{aligned}$$
(A5)

Substituting the bounds on \(\varLambda _1\), \(\varLambda _2\) and \(\varLambda _3\) into (A2), we have

$$\begin{aligned} L(u^{k+1},q^k)-L(u,q^k)\leqslant & {} \frac{1}{\varepsilon ^k}D(u,u^k)-\frac{1}{\varepsilon ^{k}}D(u,u^{k+1})-\frac{1}{\varepsilon ^k}\bigg \{D(u^{k+1},u^{k})\;\\&-\varepsilon ^k\bigg [\big (G(u^{k+1})-G(u^{k})-\langle \nabla G(u^{k}),u^{k+1}-u^{k}\rangle \big )\\&+\langle q^k,\varOmega (u^{k+1})-\varOmega (u^{k})-\nabla \varOmega (u^{k})(u^{k+1}-u^{k})\rangle \bigg ]\bigg \}. \end{aligned}$$

Multiplying both sides of the above inequality by \(\varepsilon ^k\), we obtain

$$\begin{aligned}&\varepsilon ^k[L(u^{k+1},q^k)-L(u,q^k)]\nonumber \\&\leqslant D(u,u^k)-D(u,u^{k+1})-\varDelta ^k(u^k,u^{k+1})-\frac{\varepsilon ^k\gamma }{2}\Vert \varTheta (u^{k})-\varTheta (u^{k+1})\Vert ^2. \end{aligned}$$
(A6)

Step 2. Estimate \(L(u^{k+1},p)-L(u^{k+1},q^k).\)

We first derive two inequalities. By the projection property (16) with \(u=p^k+\gamma \varTheta (u^{k+1})\) and \(v=p\), for any \(p\in {\varvec{C}}^*\), we have

$$\begin{aligned} \frac{1}{\gamma }\langle p-p^{k+1}, p^k+\gamma \varTheta (u^{k+1})-p^{k+1}\rangle \leqslant 0. \end{aligned}$$
(A7)

Using Proposition 1 with \(u=\gamma \varTheta (u^{k+1})\), \(v=\gamma \varTheta (u^k)\), and \(w=p^k\), we have

$$\begin{aligned}&2\langle p^{k+1}-q^k,\gamma \varTheta (u^{k+1})\rangle \leqslant \Vert \gamma \varTheta (u^{k+1})\nonumber \\&\quad -\gamma \varTheta (u^k)\Vert ^2+\Vert p^{k+1}-p^k\Vert ^2-\Vert q^k-p^k\Vert ^2. \end{aligned}$$
(A8)

Statement (ii) follows from (A7) and (A8):

$$\begin{aligned}&L(u^{k+1},p)-L(u^{k+1},q^k)\nonumber \\= & {} \langle p-q^k,\varTheta (u^{k+1})\rangle \nonumber \\= & {} \langle p-p^{k+1},\varTheta (u^{k+1})\rangle +\langle p^{k+1}-q^k,\varTheta (u^{k+1})\rangle \nonumber \\= & {} \frac{1}{\gamma }\langle p-p^{k+1},p^k+\gamma \varTheta (u^{k+1})-p^{k+1}\rangle +\frac{1}{\gamma }\langle p-p^{k+1},p^{k+1}-p^k\rangle \nonumber \\&+\langle p^{k+1}-q^k,\varTheta (u^{k+1})\rangle \nonumber \\\leqslant & {} \frac{1}{\gamma }\langle p-p^{k+1},p^{k+1}-p^k\rangle +\langle p^{k+1}-q^k,\varTheta (u^{k+1})\rangle \quad \text{(by } \text{ inequality }~(A7))\nonumber \\\leqslant & {} \frac{1}{\gamma }\langle p-p^{k+1},p^{k+1}-p^{k}\rangle +\frac{1}{2\gamma }\Vert p^k-p^{k+1}\Vert ^2-\frac{1}{2\gamma }\Vert q^k-p^k\Vert ^2 \nonumber \\&+\frac{\gamma }{2}\Vert \varTheta (u^{k})-\varTheta (u^{k+1})\Vert ^2\quad \;\text{(by } \text{ inequality }~(A8))\nonumber \\= & {} \frac{1}{2\gamma }\big [\Vert p-p^{k}\Vert ^2-\Vert p-p^{k+1}\Vert ^2\big ]-\frac{1}{2\gamma }\Vert q^k-p^k\Vert ^2 \nonumber \\&+\frac{\gamma }{2}\Vert \varTheta (u^{k})-\varTheta (u^{k+1})\Vert ^2. \end{aligned}$$
(A9)

Then, multiplying both sides of (A9) by \(\varepsilon ^k\), we obtain

$$\begin{aligned}&\varepsilon ^k[L(u^{k+1},p)-L(u^{k+1},q^k)]\nonumber \\&\leqslant \frac{\varepsilon ^k}{2\gamma }\big [\Vert p-p^{k}\Vert ^2-\Vert p-p^{k+1}\Vert ^2\big ]-\frac{\varepsilon ^k}{2\gamma }\Vert q^k-p^k\Vert ^2+\frac{\varepsilon ^k\gamma }{2}\Vert \varTheta (u^{k})-\varTheta (u^{k+1})\Vert ^2\nonumber \\&\leqslant \frac{\varepsilon ^k}{2\gamma }\Vert p-p^{k}\Vert ^2-\frac{\varepsilon ^{k+1}}{2\gamma }\Vert p-p^{k+1}\Vert ^2-\frac{\varepsilon ^{k}}{2\gamma }\Vert q^k-p^k\Vert ^2+\frac{\varepsilon ^k\gamma }{2}\Vert \varTheta (u^{k})-\varTheta (u^{k+1})\Vert ^2\nonumber \\&\text{(since } \varepsilon ^{k+1}\leqslant \varepsilon ^k). \end{aligned}$$
(A10)
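
Both (A7) and (A8) rest on the variational characterization of the projection in (16): \(\langle v-\varPi (u),u-\varPi (u)\rangle \leqslant 0\) for all \(v\) in the closed convex set projected onto. A quick numerical sanity check in Python, taking the set to be the nonnegative orthant purely for illustration (the paper's cone \({\varvec{C}}^*\) is general):

    import numpy as np

    # Check <v - Proj(u), u - Proj(u)> <= 0 for all v in the set,
    # with the set taken as R^6_+ so that Proj is the positive part.
    rng = np.random.default_rng(1)
    u = rng.standard_normal(6)
    proj_u = np.maximum(u, 0.0)
    for _ in range(1000):
        v = np.abs(rng.standard_normal(6))  # an arbitrary point of R^6_+
        assert np.dot(v - proj_u, u - proj_u) <= 1e-12
    print("projection inequality (16) holds on all samples")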

Step 3. Estimate \(L(u^{k+1},p)-L(u,q^k)\):

Summing (A6) and (A10) gives the desired result.

A\(_2\): Proof of Theorem 1 (Convergence analysis for VAPP)

Taking \(u=u^{*}\) and \(p=p^*\) in Lemma 1, we have

$$\begin{aligned}&\big [D(u^*,u^{k+1})+\frac{\varepsilon ^{k+1}}{2\gamma }\Vert p^*-p^{k+1}\Vert ^2\big ]-\big [D(u^*,u^k)+\frac{\varepsilon ^k}{2\gamma }\Vert p^*-p^{k}\Vert ^2\big ]\nonumber \\&\leqslant \varepsilon ^k[L(u^*,q^k)-L(u^{k+1},p^*)]-\big [\varDelta ^k(u^k,u^{k+1})+\frac{\varepsilon ^k}{2\gamma }\Vert q^k-p^k\Vert ^2\big ]\nonumber \\&\leqslant -\big [\varDelta ^k(u^k,u^{k+1})+\frac{\varepsilon ^k}{2\gamma }\Vert q^k-p^k\Vert ^2\big ]\quad \text{(since } (u^*,p^*) \text{ is } \text{ a } \text{ saddle } \text{ point }~(24))\nonumber \\&\leqslant -\bigg [\frac{\beta -\varepsilon ^k(B_G+B_{\varOmega }+\gamma \tau ^2)}{2}\Vert u^{k}-u^{k+1}\Vert ^2+\frac{\varepsilon ^k}{2\gamma }\Vert q^k-p^k\Vert ^2\bigg ]\nonumber \\&\qquad \qquad \qquad \quad \;\text{(from }~(26), \varDelta ^k(u,v)\ge \frac{\beta -\varepsilon ^k(B_G+B_{\varOmega }+\gamma \tau ^2)}{2}\Vert u-v\Vert ^2)\nonumber \\&\leqslant -\bigg [\frac{\beta -{\overline{\varepsilon }}(B_G+B_{\varOmega }+\gamma \tau ^2)}{2}\Vert u^{k}-u^{k+1}\Vert ^2+\frac{{\underline{\varepsilon }}}{2\gamma }\Vert q^k-p^k\Vert ^2\bigg ]\nonumber \\&\qquad \text{(since } {\underline{\varepsilon }}\leqslant \varepsilon ^k\leqslant {\overline{\varepsilon }} \text{ satisfy }~(21)). \end{aligned}$$
(A11)

Since \(\{\varepsilon ^k\}\) satisfies (21), we conclude that the sequence \(\{D(u^*,u^{k})+\frac{\varepsilon ^{k}}{2\gamma }\Vert p^*-p^{k}\Vert ^2\}\) is strictly decreasing unless \(u^k=u^{k+1}\) and either \(p^k=q^k\) or \(p^k=p^{k+1}\). The rest of the proof is similar to that of [24].

A\(_3\): Proof of Theorem 2 (Bifunction value estimation, primal optimality and feasibility for solving (P) by VAPP)

  1. (i)

    Note that the set \({\varvec{U}}\times {\varvec{C}}^*\) is convex and the VAPP scheme guarantees that \((u^k,p^k)\in {\varvec{U}}\times {\varvec{C}}^*\) for all \(k\in {\mathbb {N}}\); thus \(({\bar{u}}_t,{\bar{p}}_t)\in {\varvec{U}}\times {\varvec{C}}^*\). Since \(\{\varepsilon ^k\}\) satisfies (21), we have \(\varDelta ^k(u^k,u^{k+1})\ge 0\). From Lemma 1, we have

    $$\begin{aligned}&\varepsilon ^k[L(u^{k+1},p)-L(u,q^k)]\leqslant \big [D(u,u^k)+\frac{\varepsilon ^k}{2\gamma }\Vert p-p^{k}\Vert ^2\big ]\\&\quad -\big [D(u,u^{k+1})+\frac{\varepsilon ^{k+1}}{2\gamma }\Vert p-p^{k+1}\Vert ^2\big ]. \end{aligned}$$

    Note that the bifunction \(L(u',p)-L(u,p')\) is convex in \(u'\) and linear in \(p'\) for given \(u\in {\varvec{U}}\) and \(p\in {\varvec{C}}^*\). Summing the above inequality over \(k=0,1,\cdots ,t\) and dividing by \(\sum _{k=0}^{t}\varepsilon ^k\), we obtain

    $$\begin{aligned} L({\bar{u}}_{t},p)-L(u,{\bar{p}}_t)\leqslant & {} \frac{1}{\sum _{k=0}^{t}\varepsilon ^k}\sum _{k=0}^{t}\varepsilon ^k[L(u^{k+1},p)-L(u,q^k)]\\\leqslant & {} \frac{1}{{\underline{\varepsilon }}(t+1)}\left[ D(u,u^0)+\frac{\varepsilon ^0}{2\gamma }\Vert p-p^{0}\Vert ^2\right] ,\;\forall u\in {\varvec{ U}},p\in {\varvec{ C}}^*. \end{aligned}$$
  2. (ii)

    If \(\Vert \varPi (\varTheta ({\bar{u}}_t))\Vert =0\), statement (ii) is obviously true.

    Otherwise, taking \(u=u^*\in {\varvec{U}}\) and \(p={\hat{p}}=\frac{(M_0+1)\varPi (\varTheta ({\bar{u}}_t))}{\Vert \varPi (\varTheta ({\bar{u}}_t))\Vert }\in {\varvec{C}}^*\) in statement (i) of this theorem, we have that

    $$\begin{aligned}&L({\bar{u}}_t,{\hat{p}})-L(u^*,{\bar{p}}_t)\nonumber \\= & {} (G+J)({\bar{u}}_{t})-(G+J)(u^*)\nonumber \\&+\langle \frac{(M_0+1)\varPi (\varTheta ({\bar{u}}_t))}{\Vert \varPi (\varTheta ({\bar{u}}_t))\Vert }, \varTheta ({\bar{u}}_{t})\rangle -\langle {\bar{p}}_t, \varTheta (u^*)\rangle \nonumber \\\ge & {} (G+J)({\bar{u}}_{t})-(G+J)(u^*)\nonumber \\&+\langle \frac{(M_0+1)\varPi (\varTheta ({\bar{u}}_t))}{\Vert \varPi (\varTheta ({\bar{u}}_t))\Vert }, \varTheta ({\bar{u}}_{t})\rangle \;\;\text{(since } \langle {\bar{p}}_t, \varTheta (u^*)\rangle \leqslant 0)\nonumber \\= & {} (G+J)({\bar{u}}_{t})-(G+J)(u^*)\nonumber \\&+\langle \frac{(M_0+1)\varPi (\varTheta ({\bar{u}}_t))}{\Vert \varPi (\varTheta ({\bar{u}}_t))\Vert }, \varPi (\varTheta ({\bar{u}}_t))+\varPi _{-{\varvec{C}}}(\varTheta ({\bar{u}}_t))\rangle \quad \text{(from }~(19))\nonumber \\= & {} (G+J)({\bar{u}}_{t})-(G+J)(u^*)+(M_0+1)\Vert \varPi (\varTheta ({\bar{u}}_t))\Vert \quad \text{(from }~(20)).\nonumber \\ \end{aligned}$$
    (A12)

    Combining (A12) with statement (i) of this theorem yields

    $$\begin{aligned} (G+J)({\bar{u}}_{t})-(G+J)(u^*)+(M_0+1)\Vert \varPi (\varTheta ({\bar{u}}_t))\Vert\leqslant & {} \frac{D(u^*,u^0)+\frac{\varepsilon ^0}{2\gamma }\Vert {\hat{p}}-p^{0}\Vert ^2}{{\underline{\varepsilon }}(t+1)}\nonumber \\\leqslant & {} \frac{d_1}{{\underline{\varepsilon }}(t+1)}, \end{aligned}$$
    (A13)

    where \(d_1=\max \nolimits _{\Vert p\Vert \leqslant M_0+1}\big [D(u^*,u^0)+\frac{\varepsilon ^0}{2\gamma }\Vert p-p^{0}\Vert ^2\big ]\). Moreover, taking \(u={\bar{u}}_t\) in the right-hand side of the saddle point inequality (5) yields

    $$\begin{aligned} (G+J)({\bar{u}}_{t})-(G+J)(u^*)\ge & {} -\langle p^*,\varTheta ({\bar{u}}_t)\rangle \nonumber \\= & {} -\langle p^*,\varPi (\varTheta ({\bar{u}}_t))+\varPi _{-{\varvec{C}}}(\varTheta ({\bar{u}}_t))\rangle \quad \text{(since }~(19))\nonumber \\\ge & {} -\langle p^*,\varPi (\varTheta ({\bar{u}}_t))\rangle \quad \text{(since } \langle p^*,\varPi _{-{\varvec{C}}}(\varTheta ({\bar{u}}_t))\rangle \leqslant 0)\nonumber \\\ge & {} -\Vert p^*\Vert \Vert \varPi (\varTheta ({\bar{u}}_t))\Vert \nonumber \\\ge & {} -M_0\Vert \varPi (\varTheta ({\bar{u}}_t))\Vert \quad \text{(by } \Vert p^*\Vert \leqslant M_0). \end{aligned}$$
    (A14)

    Taking (A13) and (A14) together, we get that \(\Vert \varPi (\varTheta ({\bar{u}}_t))\Vert \leqslant \frac{d_1}{{\underline{\varepsilon }}(t+1)}\). (A numerical illustration of the decompositions (19) and (20) used above follows this proof.)

  3. (iii)

    Since \((M_0+1)\Vert \varPi (\varTheta ({\bar{u}}_t))\Vert \ge 0\), from (A13) we have

    $$\begin{aligned} (G+J)({\bar{u}}_{t})-(G+J)(u^*)\leqslant \frac{d_1}{{\underline{\varepsilon }}(t+1)}. \end{aligned}$$

    Combining statement (ii) of this theorem and (A14), we obtain that

    $$\begin{aligned} (G+J)({\bar{u}}_{t})-(G+J)(u^*)\ge -\frac{M_0d_1}{{\underline{\varepsilon }}(t+1)}. \end{aligned}$$
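
The identities (19) and (20) invoked in (A12) and (A14) are the Moreau decomposition of a point into its projections onto \({\varvec{C}}^*\) and \(-{\varvec{C}}\), together with the orthogonality of the two parts. A numerical illustration in Python for the self-dual cone \({\varvec{C}}={\varvec{C}}^*={\mathbb {R}}^m_+\) (an assumption made only for this sketch):

    import numpy as np

    # Moreau decomposition for C = C* = R^8_+:
    #   x = Proj_{C*}(x) + Proj_{-C}(x)  and  <Proj_{C*}(x), Proj_{-C}(x)> = 0,
    # where the two projections are the positive and negative parts of x.
    rng = np.random.default_rng(2)
    x = rng.standard_normal(8)
    pos, neg = np.maximum(x, 0.0), np.minimum(x, 0.0)
    assert np.allclose(x, pos + neg) and abs(np.dot(pos, neg)) < 1e-12
    print("decompositions (19) and (20) verified")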

A\(_4\): Proof of Lemma 2

Suppose the assertion of the lemma does not hold; that is, for any \(\kappa >0\), there exists \(p^j\) with \(\Vert p^j\Vert \leqslant d_p\) such that every minimizer \({\hat{u}}(p^j)\in \arg \min \nolimits _{u\in {\varvec{U}}}L_\gamma (u,p^j)\) satisfies \(\Vert {\hat{u}}(p^j)\Vert >\kappa \). Then we can construct a sequence \(\{{\hat{u}}(p^j)\}\) such that \(\Vert {\hat{u}}(p^j)\Vert \rightarrow +\infty \).

On the other hand, we observe that

$$\begin{aligned} L_\gamma ({\hat{u}}(p^j),p^j)= & {} (G+J)({\hat{u}}(p^j))+\varphi \big (\varTheta ({\hat{u}}(p^j)),p^j\big )\\= & {} (G+J)({\hat{u}}(p^j))+\max _{q\in {\varvec{C}}^*}\langle q,\varTheta ({\hat{u}}(p^j))\rangle -\frac{1}{2\gamma }\Vert q-p^j\Vert ^2\\\ge & {} (G+J)({\hat{u}}(p^j))-\frac{1}{2\gamma }\Vert p^j\Vert ^2\\\ge & {} (G+J)({\hat{u}}(p^j))-\frac{d_p^2}{2\gamma }. \end{aligned}$$

Since \(\Vert {\hat{u}}(p^j)\Vert \rightarrow +\infty \), the coercivity of \((G+J)(u)\) gives \(\psi _\gamma (p^j)=L_\gamma ({\hat{u}}(p^j),p^j)\rightarrow +\infty \). On the other hand, the boundedness of \(\{p^j\}\) and the continuity of \(\psi _\gamma (\cdot )\) imply that \(\{\psi _\gamma (p^j)\}\) is bounded. This contradiction proves the assertion of the lemma.
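
The lower bound on \(\varphi \) used above becomes transparent in the orthant case: for \({\varvec{C}}^*={\mathbb {R}}^m_+\), the inner maximum is attained at \(q=\varPi _{{\varvec{C}}^*}(p+\gamma v)\), giving the closed form \(\varphi (v,p)=\frac{1}{2\gamma }\big (\Vert \varPi _{{\varvec{C}}^*}(p+\gamma v)\Vert ^2-\Vert p\Vert ^2\big )\), while taking \(q=0\) instead gives \(\varphi (v,p)\geqslant -\frac{1}{2\gamma }\Vert p\Vert ^2\). A short Python cross-check of both facts (the orthant and the random data are illustrative assumptions):

    import numpy as np

    # varphi(v, p) = max_{q >= 0} <q, v> - ||q - p||^2 / (2 gamma).
    # The maximizer is q = max(p + gamma*v, 0); evaluating there gives
    # the closed form, and q = 0 gives the lower bound used in the proof.
    rng = np.random.default_rng(3)
    gamma = 0.7
    for _ in range(1000):
        v, p = rng.standard_normal(5), np.abs(rng.standard_normal(5))
        q = np.maximum(p + gamma * v, 0.0)
        phi_direct = np.dot(q, v) - np.dot(q - p, q - p) / (2 * gamma)
        phi_closed = (np.dot(q, q) - np.dot(p, p)) / (2 * gamma)
        assert abs(phi_direct - phi_closed) < 1e-10
        assert phi_closed >= -np.dot(p, p) / (2 * gamma) - 1e-12
    print("closed form and lower bound for varphi verified")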

A\(_5\): Proof of Theorem 3 (Approximate saddle point and dual optimality for solving (P) by VAPP)

  1. (i)

    From statement (i) of Theorem 2, it is easy to see that, for any \((u,p)\in ({\varvec{U}}\cap {\mathfrak {B}}^{u})\times ({\varvec{C}}^*\cap {\mathfrak {B}}^{p})\),

    $$\begin{aligned} L({\bar{u}}_{t},p)-L(u,{\bar{p}}_{t})\leqslant \frac{D(u,u^0)+\frac{\varepsilon ^0}{2\gamma }\Vert p-p^{0}\Vert ^2}{{\underline{\varepsilon }}(t+1)}\leqslant \frac{d_2}{{\underline{\varepsilon }}(t+1)}, \end{aligned}$$
    (A15)

    where \(d_2=\max _{(u,p)\in ({\varvec{U}}\cap {\mathfrak {B}}^{u})\times ({\varvec{C}}^*\cap {\mathfrak {B}}^{p})}\big [D(u,u^0)+\frac{\varepsilon ^0}{2\gamma }\Vert p-p^{0}\Vert ^2\big ]\).

    Since \({\bar{u}}_t\in {\varvec{U}}\cap {\mathfrak {B}}^{u}\), taking \(u={\bar{u}}_t\) in (A15) yields

    $$\begin{aligned} L({\bar{u}}_{t},p)-L({\bar{u}}_{t},{\bar{p}}_{t})\leqslant \frac{d_2}{{\underline{\varepsilon }}(t+1)}, \forall p\in {\varvec{C}}^*\cap {\mathfrak {B}}^{p}. \end{aligned}$$
    (A16)

    Similarly, by taking \(p={\bar{p}}_t\in {\varvec{C}}^*\cap {\mathfrak {B}}^{p}\) in (A15), we obtain

    $$\begin{aligned} L({\bar{u}}_{t},{\bar{p}}_t)-L(u,{\bar{p}}_{t})\leqslant \frac{d_2}{{\underline{\varepsilon }}(t+1)}, \forall u\in {\varvec{U}}\cap {\mathfrak {B}}^{u}. \end{aligned}$$
    (A17)
  2. (ii)

    Taking \(p=0\) in the left-hand side of the inequality in statement (i), we get \(\langle {\bar{p}}_t,\varTheta ({\bar{u}}_t)\rangle \ge -\frac{d_2}{{\underline{\varepsilon }}(t+1)}\). Then, from (13), we have

    $$\begin{aligned} \varphi \big (\varTheta ({\bar{u}}_t),{\bar{p}}_t\big )\ge \langle {\bar{p}}_t,\varTheta ({\bar{u}}_t)\rangle \ge -\frac{d_2}{{\underline{\varepsilon }}(t+1)}. \end{aligned}$$
    (A18)

    On the other hand, for \(p\in {\varvec{C}}^*\cap {\mathfrak {B}}^{p}\), we have

    $$\begin{aligned} \varphi \big (\varTheta ({\bar{u}}_t),p\big )= & {} \min _{\xi \in -{\varvec{C}}}\langle p,\varTheta ({\bar{u}}_t)-\xi \rangle +\frac{\gamma }{2}\Vert \varTheta ({\bar{u}}_t)-\xi \Vert ^2\qquad \text{(from }~(12))\nonumber \\\leqslant & {} \langle p,\varTheta ({\bar{u}}_t)-\varPi _{-{\varvec{C}}}(\varTheta ({\bar{u}}_t))\rangle +\frac{\gamma }{2}\Vert \varTheta ({\bar{u}}_t)-\varPi _{-{\varvec{C}}}(\varTheta ({\bar{u}}_t))\Vert ^2\nonumber \\\leqslant & {} \Vert p\Vert \cdot \Vert \varPi (\varTheta ({\bar{u}}_t))\Vert +\frac{\gamma }{2}\Vert \varPi (\varTheta ({\bar{u}}_t))\Vert ^2\nonumber \\\leqslant & {} \frac{r^pd_1}{{\underline{\varepsilon }}(t+1)}+\frac{\gamma (d_1)^2}{2{\underline{\varepsilon }}^2(t+1)^2}\nonumber \\&\text{(from } \text{ statement } \text{(ii) } \text{ of } \text{ Theorem }~2 \,\text{ and }\, {p}\in {\varvec{C}}^*\cap {\mathfrak {B}}^{p}). \end{aligned}$$
    (A19)

    Therefore, we get the left-hand side of the inequality in statement (ii):

    $$\begin{aligned} L_{\gamma }({\bar{u}}_{t},p)-L_{\gamma }({\bar{u}}_{t},{\bar{p}}_{t})= & {} \varphi (\varTheta ({\bar{u}}_t),p)-\varphi (\varTheta ({\bar{u}}_t),{\bar{p}}_t)\nonumber \\\leqslant & {} \frac{r^pd_1+d_2}{{\underline{\varepsilon }}(t+1)}+\frac{\gamma (d_1)^2}{2{\underline{\varepsilon }}^2(t+1)^2}. \end{aligned}$$
    (A20)

    From (A18) and (A19), we also have

    $$\begin{aligned} -\frac{d_2}{{\underline{\varepsilon }}(t+1)}\leqslant \langle {\bar{p}}_t,\varTheta ({\bar{u}}_t)\rangle \leqslant \varphi (\varTheta ({\bar{u}}_t),{\bar{p}}_t)\leqslant \frac{r^p d_1}{{\underline{\varepsilon }}(t+1)}+\frac{\gamma (d_1)^2}{2{\underline{\varepsilon }}^2(t+1)^2}, \end{aligned}$$

    which implies that

    $$\begin{aligned} \varphi (\varTheta ({\bar{u}}_t),{\bar{p}}_t)-\langle {\bar{p}}_t,\varTheta ({\bar{u}}_t)\rangle\leqslant & {} \frac{r^pd_1}{{\underline{\varepsilon }}(t+1)}+\frac{\gamma (d_1)^2}{2{\underline{\varepsilon }}^2(t+1)^2}-(-\frac{d_2}{{\underline{\varepsilon }}(t+1)})\\= & {} \frac{r^pd_1+d_2}{{\underline{\varepsilon }}(t+1)}+\frac{\gamma (d_1)^2}{2{\underline{\varepsilon }}^2(t+1)^2}. \end{aligned}$$

    Then, for \(u\in {\varvec{U}}\cap {\mathfrak {B}}^{u}\), we have

    $$\begin{aligned}&L_\gamma ({\bar{u}}_{t},{\bar{p}}_{t})\leqslant L({\bar{u}}_{t},{\bar{p}}_{t})+\frac{r^pd_1+d_2}{{\underline{\varepsilon }}(t+1)}+\frac{\gamma (d_1)^2}{2{\underline{\varepsilon }}^2(t+1)^2}\nonumber \\&\quad \leqslant L(u,{\bar{p}}_{t})+\frac{r^pd_1+2d_2}{{\underline{\varepsilon }}(t+1)}+\frac{\gamma (d_1)^2}{2{\underline{\varepsilon }}^2(t+1)^2}\quad \text{(by } \text{ right-hand } \text{ side } \text{ of } \text{ statement } \text{(i)) }\nonumber \\&\quad \leqslant L_\gamma (u,{\bar{p}}_{t})+\frac{r^pd_1+2d_2}{{\underline{\varepsilon }}(t+1)}+\frac{\gamma (d_1)^2}{2{\underline{\varepsilon }}^2(t+1)^2}\quad \text{(from }~(13)), \end{aligned}$$
    (A21)

    which gives the right-hand side of the inequality in statement (ii).

  3. (iii)

    For the saddle point \((u^*,p^*)\), we have

    $$\begin{aligned} L_{\gamma }(u^*,p)\leqslant L_{\gamma }(u^*,p^*)\leqslant L_{\gamma }(u,p^*), \forall u\in {\varvec{U}}, p\in {\mathbb {R}}^m. \end{aligned}$$
    (A22)

    Taking \(u={\bar{u}}_t\), \(p={\bar{p}}_t\) in (A22), and taking \(u={\hat{u}}({\bar{p}}_t)\), \(p=p^*\) in statement (ii) of this theorem, we obtain the following two inequalities, respectively:

    $$\begin{aligned} L_{\gamma }(u^*,{\bar{p}}_t)\leqslant L_{\gamma }(u^*,p^*)\leqslant L_{\gamma }({\bar{u}}_t,p^*), \end{aligned}$$

    and

    $$\begin{aligned}&-\frac{r^pd_1+d_2}{{\underline{\varepsilon }}(t+1)}-\frac{\gamma (d_1)^2}{2{\underline{\varepsilon }}^2(t+1)^2}+L_{\gamma }({\bar{u}}_{t},p^*)\leqslant L_{\gamma }({\bar{u}}_{t},{\bar{p}}_{t})\leqslant L_{\gamma }({\hat{u}}({\bar{p}}_t),{\bar{p}}_{t})\\&\quad +\frac{r^pd_1+2d_2}{{\underline{\varepsilon }}(t+1)}+\frac{\gamma (d_1)^2}{2{\underline{\varepsilon }}^2(t+1)^2}. \end{aligned}$$

    Combining these two inequalities, the desired inequality is obtained:

    $$\begin{aligned}&-\frac{r^pd_1+d_2}{{\underline{\varepsilon }}(t+1)}-\frac{\gamma (d_1)^2}{2{\underline{\varepsilon }}^2(t+1)^2}+L_{\gamma }(u^*,p^*)\leqslant L_{\gamma }({\hat{u}}({\bar{p}}_t),{\bar{p}}_{t})\\&\quad +\frac{r^pd_1+2d_2}{{\underline{\varepsilon }}(t+1)}+\frac{\gamma (d_1)^2}{2{\underline{\varepsilon }}^2(t+1)^2}. \end{aligned}$$

    Therefore,

    $$\begin{aligned} \psi _\gamma (p^*)=L_{\gamma }(u^*,p^*)\leqslant & {} L_{\gamma }({\hat{u}}({\bar{p}}_t),{\bar{p}}_{t})+\frac{2r^pd_1+3d_2}{{\underline{\varepsilon }}(t+1)}+\frac{\gamma (d_1)^2}{{\underline{\varepsilon }}^2(t+1)^2}\nonumber \\= & {} \psi _\gamma ({\bar{p}}_t)+\frac{2r^pd_1+3d_2}{{\underline{\varepsilon }}(t+1)}+\frac{\gamma (d_1)^2}{{\underline{\varepsilon }}^2(t+1)^2}. \end{aligned}$$
    (A23)

About this article

Cite this article

Zhao, L., Zhu, DL. On Iteration Complexity of a First-Order Primal-Dual Method for Nonlinear Convex Cone Programming. J. Oper. Res. Soc. China 10, 53–87 (2022). https://doi.org/10.1007/s40305-021-00344-x
