Abstract
Nonlinear convex cone programming (NCCP) models have found many practical applications. In this paper, we introduce a flexible first-order primal-dual algorithm, called the variant auxiliary problem principle (VAPP), for solving NCCP problems whose objective function and constraints are convex but possibly nonsmooth. At each iteration, VAPP generates a nonlinear approximation of the primal augmented Lagrangian model. The approximation incorporates both linearization and a distance-like proximal term, and the resulting iterations are shown to possess a decomposition property for NCCP. Motivated by recent applications in big data analytics, there has been growing interest in the convergence rate analysis of algorithms with parallel computing capabilities for large-scale optimization problems. We establish an \(O(1/t)\) convergence rate towards primal optimality, feasibility and dual optimality. By adaptively setting the parameters at different iterations, we show an \(O(1/t^2)\) rate for the strongly convex case. Finally, we discuss some issues in the implementation of VAPP.
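To make the iteration structure concrete, the following is a minimal numerical sketch of a VAPP-style primal-dual loop on a hypothetical toy instance: a quadratic objective with one linear inequality constraint, so the cone \({\varvec{C}}\) is the nonnegative orthant and the projection onto \({\varvec{C}}^*\) is a componentwise max. The instance, step sizes, and iteration count are illustrative assumptions, not the paper's test problems, and the primal step is simplified to a single linearized gradient step.

```python
import numpy as np

# Hypothetical toy instance of (P): minimize G(u) = ||u - (2,2)||^2
# subject to Theta(u) = u1 + u2 - 2 <= 0, i.e., C = R_+ and C* = R_+.
# The known optimum is u* = (1, 1) with multiplier p* = 2.

target = np.array([2.0, 2.0])

def G_grad(u):
    return 2.0 * (u - target)          # gradient of the smooth objective

def Theta(u):
    return np.array([u[0] + u[1] - 2.0])  # constraint map Theta(u)

A = np.array([[1.0, 1.0]])             # Jacobian of Theta (constant here)

def proj_dual_cone(p):
    return np.maximum(p, 0.0)          # projection onto C* = R_+^m

gamma, eps = 0.5, 0.1                  # illustrative parameter choices
u, p = np.zeros(2), np.zeros(1)
for _ in range(2000):
    q = proj_dual_cone(p + gamma * Theta(u))      # dual estimate q^k
    u = u - eps * (G_grad(u) + A.T @ q)           # linearized primal step
    p = proj_dual_cone(p + gamma * Theta(u))      # dual update p^{k+1}

print(np.round(u, 3), np.round(p, 3))  # prints approximately [1. 1.] [2.]
```

The dual update mirrors the projection step \(p^{k+1}=\varPi_{{\varvec{C}}^*}(p^k+\gamma \varTheta (u^{k+1}))\) that appears in Step 2 of the proof of Lemma 1 in the appendix; everything else in the sketch is a simplified stand-in for the exact primal subproblem.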
Notes
The MATLAB codes can be found at: https://github.com/lzhao-cloud/GuaranteeDisplayAdvertising.git.
The MATLAB codes can be found at: https://github.com/lzhao-cloud/StructuredElasticNetSupportVectorMachine.git.
References
Goberna, M.A., López, M.A.: Linear Semi-infinite Optimization, vol. 2. Wiley, London (1998)
López, M., Still, G.: Semi-infinite programming. Eur. J. Oper. Res. 180(2), 491–518 (2007)
Shapiro, A.: Semi-infinite programming, duality, discretization and optimality conditions. Optimization 58(2), 133–161 (2009)
Alizadeh, F., Goldfarb, D.: Second-order cone programming. Math. Program. 95(1), 3–51 (2003)
Fukuda, E.H., Silva, P.J., Fukushima, M.: Differentiable exact penalty functions for nonlinear second-order cone programs. SIAM J. Optim. 22(4), 1607–1633 (2012)
Kanzow, C., Ferenczi, I., Fukushima, M.: On the local convergence of semismooth Newton methods for linear and nonlinear second-order cone programs without strict complementarity. SIAM J. Optim. 20(1), 297–320 (2009)
Kato, H., Fukushima, M.: An SQP-type algorithm for nonlinear second-order cone programs. Optim. Lett. 1(2), 129–144 (2007)
Yamashita, H., Yabe, H.: A primal–dual interior point method for nonlinear optimization over second-order cones. Optim. Methods Softw. 24(3), 407–426 (2009)
Ben-Tal, A., Nemirovski, A.: Robust convex optimization. Math. Oper. Res. 23(4), 769–805 (1998)
Ben-Tal, A., El Ghaoui, L., Nemirovski, A.: Robust Optimization. Princeton University Press, Princeton (2009)
Lobo, M.S., Vandenberghe, L., Boyd, S., Lebret, H.: Applications of second-order cone programming. Linear Algebra Appl. 284(1), 193–228 (1998)
Wu, S.P., Boyd, S., Vandenberghe, L.: FIR filter design via semidefinite programming and spectral factorization. In: Proceedings of the 35th IEEE Conference on Decision and Control, vol. 1, pp. 271–276. IEEE (1996)
Candès, E.J., Romberg, J., Tao, T.: Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inf. Theory 52(2), 489–509 (2006)
Patriksson, M.: A survey on the continuous nonlinear resource allocation problem. Eur. J. Oper. Res. 185(1), 1–46 (2008)
Donoho, D.L.: Compressed sensing. IEEE Trans. Inf. Theory 52(4), 1289–1306 (2006)
Patriksson, M., Strömberg, C.: Algorithms for the continuous nonlinear resource allocation problem–new implementations and numerical studies. Eur. J. Oper. Res. 243(3), 703–722 (2015)
Hestenes, M.R.: Multiplier and gradient methods. J. Optim. Theory Appl. 4(5), 303–320 (1969)
Powell, M.J.D.: A method for nonlinear constraints in minimization problems. In: Fletcher, R. (ed.) Optimization. Academic Press, London (1969)
Buys, J.D.: Dual algorithms for constrained optimization problems. Ph.D. Thesis, Leiden University (1972)
Rockafellar, R.T.: Augmented Lagrangians and applications of the proximal point algorithm in convex programming. Math. Oper. Res. 1(2), 97–116 (1976)
Shapiro, A., Sun, J.: Some properties of the augmented Lagrangian in cone constrained optimization. Math. Oper. Res. 29(3), 479–491 (2004)
Fortin, M., Glowinski, R.: Chapter III on decomposition–coordination methods using an augmented Lagrangian. Stud. Math. Its Appl. 15, 97–146 (1983)
Stellato, B., Banjac, G., Goulart, P., Bemporad, A., Boyd, S.: OSQP: An operator splitting solver for quadratic programs. In: 2018 UKACC 12th International Conference on Control (CONTROL), pp. 339–339. IEEE (2018)
Cohen, G., Zhu, D.L.: Decomposition coordination methods in large scale optimization problems. The nondifferentiable case and the use of augmented Lagrangians. Adv. Large Scale Syst. 1, 203–266 (1984)
Contreras, J., Losi, A., Russo, M., Wu, F.F.: DistOpt: a software framework for modeling and evaluating optimization problem solutions in distributed environments. J. Parallel Distrib. Comput. 60(6), 741–763 (2000)
Losi, A., Russo, M.: On the application of the auxiliary problem principle. J. Optim. Theory Appl. 117(2), 377–396 (2003)
Kim, B.H., Baldick, R.: Coarse-grained distributed optimal power flow. IEEE Trans. Power Syst. 12(2), 932–939 (1997)
Kim, B.H., Baldick, R.: A comparison of distributed optimal power flow algorithms. IEEE Trans. Power Syst. 15(2), 599–604 (2000)
Renaud, A.: Daily generation management at Electricité de France: from planning towards real time. IEEE Trans. Autom. Control 38(7), 1080–1093 (1993)
Cao, L., Sun, Y., Cheng, X., Qi, B., Li, Q.: Research on the convergent performance of the auxiliary problem principle based distributed and parallel optimization algorithm. In: 2007 IEEE International Conference on Automation and Logistics, pp. 1083–1088. IEEE (2007)
Hur, D., Park, J.K., Kim, B.H.: On the convergence rate improvement of mathematical decomposition technique on distributed optimal power flow. Int. J. Electr. Power Energy Syst. 25(1), 31–39 (2003)
Gabay, D., Mercier, B.: A dual algorithm for the solution of nonlinear variational problems via finite element approximation. Comput. Math. With Appl. 2(1), 17–40 (1976)
Aybat, N.S., Hamedani, E.Y.: A distributed ADMM-like method for resource sharing under conic constraints over time-varying networks. arXiv:1611.07393 (2016)
Li, M., Sun, D., Toh, K.C.: A majorized ADMM with indefinite proximal terms for linearly constrained convex composite optimization. SIAM J. Optim. 26(2), 922–950 (2016)
He, B., Yuan, X.: On the \(O(1/n)\) convergence rate of the Douglas–Rachford alternating direction method. SIAM J. Numer. Anal. 50(2), 700–709 (2012)
Monteiro, R.D., Svaiter, B.F.: Iteration-complexity of block-decomposition algorithms and the alternating direction method of multipliers. SIAM J. Optim. 23(1), 475–507 (2013)
Gao, X., Zhang, S.Z.: First-order algorithms for convex optimization with nonseparable objective and coupled constraints. J. Oper. Res. Soc. China 5(2), 131–159 (2017)
Deng, W., Yin, W.: On the global and linear convergence of the generalized alternating direction method of multipliers. J. Sci. Comput. 66(3), 889–916 (2016)
Hong, M., Luo, Z.Q.: On the linear convergence of the alternating direction method of multipliers. Math. Program. 162(1–2), 165–199 (2017)
Lin, T., Ma, S., Zhang, S.: On the global linear convergence of the ADMM with multiblock variables. SIAM J. Optim. 25(3), 1478–1497 (2015)
Liu, Y., Yuan, X., Zeng, S., Zhang, J.: Partial error bound conditions and the linear convergence rate of the alternating direction method of multipliers. SIAM J. Numer. Anal. 56(4), 2095–2123 (2018)
Chen, G., Teboulle, M.: A proximal-based decomposition method for convex minimization problems. Math. Program. 64(1–3), 81–101 (1994)
Zhang, X., Burger, M., Osher, S.: A unified primal–dual algorithm framework based on Bregman iteration. J. Sci. Comput. 46(1), 20–46 (2011)
Deng, W., Lai, M.J., Peng, Z., Yin, W.: Parallel multi-block ADMM with \(o(1/k)\) convergence. J. Sci. Comput. 71(2), 712–736 (2017)
Börgens, E., Kanzow, C.: Regularized Jacobi-type ADMM-methods for a class of separable convex optimization problems in Hilbert spaces. Comput. Optim. Appl. 73(3), 755–790 (2019)
Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vis. 40(1), 120–145 (2011)
Chambolle, A., Pock, T.: On the ergodic convergence rates of a first-order primal-dual algorithm. Math. Program. 159(1–2), 253–287 (2016)
Nemirovski, A.: Prox-method with rate of convergence \(O(1/t)\) for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems. SIAM J. Optim. 15(1), 229–251 (2004)
He, N., Juditsky, A., Nemirovski, A.: Mirror prox algorithm for multi-term composite minimization and semi-separable problems. Comput. Optim. Appl. 61(2), 275–319 (2015)
Juditsky, A., Nemirovski, A.: First order methods for nonsmooth convex large-scale optimization, II: utilizing problem's structure. Optim. Mach. Learn. 30(9), 149–183 (2011)
Hamedani, E.Y., Aybat, N.S.: A primal–dual algorithm for general convex–concave saddle point problems. arXiv:1803.01401 (2018)
Fang, Z., Li, Y., Liu, C., Zhu, W., Zhang, Y., Zhou, W.: Large-scale personalized delivery for guaranteed display advertising with real-time pacing. In: 2019 IEEE International Conference on Data Mining (ICDM), pp. 190–199. IEEE (2019)
Hojjat, A., Turner, J., Cetintas, S., Yang, J.: A unified framework for the scheduling of guaranteed targeted display advertising under reach and frequency requirements. Oper. Res. 65(2), 289–313 (2017)
Turner, J.: The planning of guaranteed targeted display advertising. Oper. Res. 60(1), 18–33 (2012)
Turner, J., Hojjat, A., Cetintas, S., Yang, J.: Delivering guaranteed display ADS under reach and frequency requirements. In: Twenty-Eighth AAAI Conference on Artificial Intelligence. AAAI Press (2014)
Slawski, M., Zu Castell, W., Tutz, G.: Feature selection guided by structural information. Ann. Appl. Stat. 4, 1056–1080 (2010)
Slawski, M.: The structured elastic net for quantile regression and support vector classification. Stat. Comput. 22(1), 153–168 (2012)
Ortega, J.M., Rheinboldt, W.C.: Iterative Solution of Nonlinear Equations in Several Variables, vol. 30. SIAM, Philadelphia (1970)
Shapiro, A., Scheinberg, K.: Duality and optimality conditions. Handbook of Semidefinite Programming, pp. 67–110 (2000)
Cheney, W., Goldstein, A.A.: Proximity maps for convex sets. Proc. Am. Math. Soc. 10(3), 448–450 (1959)
Wierzbicki, A.P., Kurcyusz, S.: Projection on a cone, penalty functionals and duality theory for problems with inequality constraints in Hilbert space. SIAM J. Control Optim. 15(1), 25–56 (1977)
Beck, A., Teboulle, M.: Mirror descent and nonlinear projected subgradient methods for convex optimization. Oper. Res. Lett. 31(3), 167–175 (2003)
Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004)
Hiriart-Urruty, J.B., Lemaréchal, C.: Convex Analysis and Minimization Algorithms I: Fundamentals, vol. 305. Springer, Berlin (2013)
Aybat, N.S., Iyengar, G.: A unified approach for minimizing composite norms. Math. Program. 144(1–2), 181–226 (2014)
Vapnik, V.: Statistical Learning Theory, vol. 3. Wiley, New York (1998)
Bi, J., Vapnik, V.N.: Learning with rigorous support vector machines. In: Learning Theory and Kernel Machines, pp. 243–257. Springer, Berlin (2003)
Oneto, L., Ridella, S., Anguita, D.: Tikhonov, Ivanov and Morozov regularization for support vector machine learning. Mach. Learn. 103(1), 103–136 (2016)
Zhao, L., Zhu, D.: First-order primal-dual method for nonlinear convex cone programs. arXiv:1801.00261v5 (2019)
Acknowledgements
The authors are grateful for valuable comments from Professor Shu-Zhong Zhang on earlier versions of this manuscript.
This research was supported by the National Natural Science Foundation of China (Nos. 71471112 and 71871140)
This paper is dedicated to the late Professor Quan Zheng in commemoration of his contributions to Operations Research.
Appendix
A\(_1\): Proof of Lemma 1 (Descent inequalities of generalized distance function)
Step 1. Estimate \(L(u^{k+1},q^k)-L(u,q^k)\).
For the primal subproblem (22) of VAPP, the unique solution \(u^{k+1}\) is characterized by the following variational inequality:
from which it follows that
By the convexity of G, we can estimate the term \(\varLambda _1\) in (A2).
Since \(\varOmega (u)\) is \({\varvec{ C}}\)-convex and \(q^k\in {\varvec{ C}}^*\), the function \(\langle q^k,\varOmega (u)\rangle \) is convex and
Since \(K(\cdot )\) satisfies Assumption 2, a simple algebraic manipulation shows that
Substituting \(\varLambda _1\), \(\varLambda _2\) and \(\varLambda _3\) into (A2), we have
Multiplying both sides of the above inequality by \(\varepsilon ^k\), we obtain
Step 2. Estimate \(L(u^{k+1},p)-L(u^{k+1},q^k).\)
We first derive two inequalities. By the property of projection (16) with \(u=p^k+\gamma \varTheta (u^{k+1})\), \(v=p\), \(\forall p\in {\varvec{C}}^*\), we have
Using Proposition 1 with \(u=\gamma \varTheta (u^{k+1})\), \(v=\gamma \varTheta (u^k)\), and \(w=p^k\), we have
Statement (ii) follows from (A7) and (A8):
Then, multiplying both sides of (A9) by \(\varepsilon ^k\), we obtain
Step 3. Estimate \(L(u^{k+1},p)-L(u,q^k)\):
Summing (A6) and (A10) yields the desired result.
A\(_2\): Proof of Theorem 1 (Convergence analysis for VAPP)
Taking \(u=u^{*}\) and \(p=p^*\) in Lemma 1, we have
Since \(\{\varepsilon ^k\}\) satisfies (21), we conclude that the sequence \(\{D(u^*,u^{k})+\frac{\varepsilon ^{k}}{2\gamma }\Vert p^*-p^{k}\Vert ^2\}\) is strictly decreasing, unless \(u^k=u^{k+1}\) and \(p^k=q^k\) or \(p^k=p^{k+1}\). The rest of the proof is similar to that of [24].
A\(_3\): Proof of Theorem 2 (Bifunction value estimation, primal optimality and feasibility for solving (P) by VAPP)
(i)
Note that the set \({\varvec{U}}\times {\varvec{C}}^*\) is convex, and the VAPP scheme guarantees that \((u^k,p^k)\in {\varvec{U}}\times {\varvec{C}}^*\), \(\forall k\in {\mathbb {N}}\); thus we have \(({\bar{u}}_t,{\bar{p}}_t)\in {\varvec{U}}\times {\varvec{C}}^*\). Since \(\{\varepsilon ^k\}\) satisfies (21), we have \(\varDelta ^k(u^k,u^{k+1})\ge 0\). From Lemma 1, we have
$$\begin{aligned}&\varepsilon ^k[L(u^{k+1},p)-L(u,q^k)]\leqslant \big [D(u,u^k)+\frac{\varepsilon ^k}{2\gamma }\Vert p-p^{k}\Vert ^2\big ]\\&\quad -\big [D(u,u^{k+1})+\frac{\varepsilon ^{k+1}}{2\gamma }\Vert p-p^{k+1}\Vert ^2\big ]. \end{aligned}$$Note that the bifunction \(L(u',p)-L(u,p')\) is convex in \(u'\) and linear in \(p'\) for given \(u\in {\varvec{U}}\), \(p\in {\varvec{C}}^*\). Summing the above inequality over \(k=0,1,\cdots ,t\), we obtain that
$$\begin{aligned} L({\bar{u}}_{t},p)-L(u,{\bar{p}}_t)\leqslant & {} \frac{1}{\sum _{k=0}^{t}\varepsilon ^k}\sum _{k=0}^{t}\varepsilon ^k[L(u^{k+1},p)-L(u,q^k)]\\\leqslant & {} \frac{1}{{\underline{\varepsilon }}(t+1)}\left[ D(u,u^0)+\frac{\varepsilon ^0}{2\gamma }\Vert p-p^{0}\Vert ^2\right] ,\;\forall u\in {\varvec{ U}},p\in {\varvec{ C}}^*. \end{aligned}$$
(ii)
If \(\Vert \varPi (\varTheta ({\bar{u}}_t))\Vert =0\), statement (ii) is obviously true.
Otherwise, taking \(u=u^*\in {\varvec{U}}\) and \(p={\hat{p}}=\frac{(M_0+1)\varPi (\varTheta ({\bar{u}}_t))}{\Vert \varPi (\varTheta ({\bar{u}}_t))\Vert }\in {\varvec{C}}^*\) in statement (i) of this theorem, we have that
$$\begin{aligned}&L({\bar{u}}_t,{\hat{p}})-L(u^*,{\bar{p}}_t)\nonumber \\= & {} (G+J)({\bar{u}}_{t})-(G+J)(u^*)\nonumber \\&+\langle \frac{(M_0+1)\varPi (\varTheta ({\bar{u}}_t))}{\Vert \varPi (\varTheta ({\bar{u}}_t))\Vert }, \varTheta ({\bar{u}}_{t})\rangle -\langle {\bar{p}}_t, \varTheta (u^*)\rangle \nonumber \\\ge & {} (G+J)({\bar{u}}_{t})-(G+J)(u^*)\nonumber \\&+\langle \frac{(M_0+1)\varPi (\varTheta ({\bar{u}}_t))}{\Vert \varPi (\varTheta ({\bar{u}}_t))\Vert }, \varTheta ({\bar{u}}_{t})\rangle \;\;\text{(since } \langle {\bar{p}}_t, \varTheta (u^*)\rangle \leqslant 0)\nonumber \\= & {} (G+J)({\bar{u}}_{t})-(G+J)(u^*)\nonumber \\&+\langle \frac{(M_0+1)\varPi (\varTheta ({\bar{u}}_t))}{\Vert \varPi (\varTheta ({\bar{u}}_t))\Vert }, \varPi (\varTheta ({\bar{u}}_t))+\varPi _{-{\varvec{C}}}(\varTheta ({\bar{u}}_t))\rangle \quad \text{(from }~(19))\nonumber \\= & {} (G+J)({\bar{u}}_{t})-(G+J)(u^*)+(M_0+1)\Vert \varPi (\varTheta ({\bar{u}}_t))\Vert \quad \text{(from }~(20)).\nonumber \\ \end{aligned}$$(A12)Combining statement (i) of this theorem with (A12) yields
$$\begin{aligned} (G+J)({\bar{u}}_{t})-(G+J)(u^*)+(M_0+1)\Vert \varPi (\varTheta ({\bar{u}}_t))\Vert\leqslant & {} \frac{D(u^*,u^0)+\frac{\varepsilon ^0}{2\gamma }\Vert {\hat{p}}-p^{0}\Vert ^2}{{\underline{\varepsilon }}(t+1)}\nonumber \\\leqslant & {} \frac{d_1}{{\underline{\varepsilon }}(t+1)}, \end{aligned}$$(A13)where \(d_1=\max \nolimits _{\Vert p\Vert \leqslant M_0+1}\big [D(u^*,u^0)+\frac{\varepsilon ^0}{2\gamma }\Vert p-p^{0}\Vert ^2\big ]\). Moreover, taking \(u={\bar{u}}_t\) in the right-hand side of saddle point inequality (5) yields that
$$\begin{aligned} (G+J)({\bar{u}}_{t})-(G+J)(u^*)\ge & {} -\langle p^*,\varTheta ({\bar{u}}_t)\rangle \nonumber \\= & {} -\langle p^*,\varPi (\varTheta ({\bar{u}}_t))+\varPi _{-{\varvec{C}}}(\varTheta ({\bar{u}}_t))\rangle \quad \text{(since }~(19))\nonumber \\\ge & {} -\langle p^*,\varPi (\varTheta ({\bar{u}}_t))\rangle \quad \text{(since } \langle p^*,\varPi _{-{\varvec{C}}}(\varTheta ({\bar{u}}_t))\rangle \leqslant 0)\nonumber \\\ge & {} -\Vert p^*\Vert \Vert \varPi (\varTheta ({\bar{u}}_t))\Vert \nonumber \\\ge & {} -M_0\Vert \varPi (\varTheta ({\bar{u}}_t))\Vert \quad \text{(by } \Vert p^*\Vert \leqslant M_0). \end{aligned}$$(A14)Taking (A13) and (A14) together, we get that \(\Vert \varPi (\varTheta ({\bar{u}}_t))\Vert \leqslant \frac{d_1}{{\underline{\varepsilon }}(t+1)}\).
(iii)
Since \((M_0+1)\Vert \varPi (\varTheta ({\bar{u}}_t))\Vert \ge 0\), from (A13) we have
$$\begin{aligned} (G+J)({\bar{u}}_{t})-(G+J)(u^*)\leqslant \frac{d_1}{{\underline{\varepsilon }}(t+1)}. \end{aligned}$$Combining statement (ii) of this theorem and (A14), we obtain that
$$\begin{aligned} (G+J)({\bar{u}}_{t})-(G+J)(u^*)\ge -\frac{M_0d_1}{{\underline{\varepsilon }}(t+1)}. \end{aligned}$$
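Combining the upper bound from (A13) with the lower bound above, the primal optimality estimate of statement (iii) can be summarized as a single two-sided \(O(1/t)\) bound; this is merely a restatement of the two displayed inequalities, not a new result:

```latex
\left|(G+J)({\bar{u}}_{t})-(G+J)(u^*)\right|
\;\leqslant\;\frac{\max \{1,M_0\}\,d_1}{{\underline{\varepsilon }}(t+1)}
\;=\;O(1/t).
```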
A\(_4\): Proof of Lemma 2
Suppose the assertion of the lemma does not hold; that is, for any \(\kappa >0\), there exists \(p^j\) with \(\Vert p^j\Vert \leqslant d_p\) such that every optimizer \({\hat{u}}(p^j)\in \arg \min \nolimits _{u\in {\varvec{U}}}L_\gamma (u,p^j)\) satisfies \(\Vert {\hat{u}}(p^j)\Vert >\kappa \). We can then construct a sequence \(\{{\hat{u}}(p^j)\}\) such that \(\Vert {\hat{u}}(p^j)\Vert \rightarrow +\infty \).
On the other hand, we observe that
Since \(\Vert {\hat{u}}(p^j)\Vert \rightarrow +\infty \), the coercivity of \((G+J)(u)\) implies \(\psi _\gamma (p^j)=L_\gamma ({\hat{u}}(p^j),p^j)\rightarrow +\infty \). However, the boundedness of \(\{p^j\}\) and the continuity of \(\psi _\gamma (\cdot )\) imply that \(\psi _\gamma (p^j)\) is bounded, a contradiction. This proves the assertion of the lemma.
A\(_5\): Proof of Theorem 3 (Approximate saddle point and dual optimality for solving (P) by VAPP)
(i)
From statement (i) of Theorem 2, it is easy to see that, for any \((u,p)\in ({\varvec{ U}}\cap {\mathfrak {B}}^{u})\times ({\varvec{C}}^*\cap {\mathfrak {B}}^{p})\),
$$\begin{aligned} L({\bar{u}}_{t},p)-L(u,{\bar{p}}_{t})\leqslant \frac{D(u,u^0)+\frac{\varepsilon ^0}{2\gamma }\Vert p-p^{0}\Vert ^2}{{\underline{\varepsilon }}(t+1)}\leqslant \frac{d_2}{{\underline{\varepsilon }}(t+1)}, \end{aligned}$$(A15)where \(d_2=\max _{(u,p)\in ({\varvec{U}}\cap {\mathfrak {B}}^{u})\times ({\varvec{C}}^*\cap {\mathfrak {B}}^{p})}\big [D(u,u^0)+\frac{\varepsilon ^0}{2\gamma }\Vert p-p^{0}\Vert ^2\big ]\).
Since \({\bar{u}}_t\in {\varvec{U}}\cap {\mathfrak {B}}^{u}\), taking \(u={\bar{u}}_t\) in (A15) we obtain
$$\begin{aligned} L({\bar{u}}_{t},p)-L({\bar{u}}_{t},{\bar{p}}_{t})\leqslant \frac{d_2}{{\underline{\varepsilon }}(t+1)}, \forall p\in {\varvec{C}}^*\cap {\mathfrak {B}}^{p}. \end{aligned}$$(A16)Similarly, by taking \(p={\bar{p}}_t\in {\varvec{C}}^*\cap {\mathfrak {B}}^{p}\) in (A15), we obtain
$$\begin{aligned} L({\bar{u}}_{t},{\bar{p}}_t)-L(u,{\bar{p}}_{t})\leqslant \frac{d_2}{{\underline{\varepsilon }}(t+1)}, \forall u\in {\varvec{U}}\cap {\mathfrak {B}}^{u}. \end{aligned}$$(A17)
(ii)
Taking \(p=0\) in the left-hand side of the inequality in statement (i), we get \(\langle {\bar{p}}_t,\varTheta ({\bar{u}}_t)\rangle \ge -\frac{d_2}{{\underline{\varepsilon }}(t+1)}\). Then, from (13), we have
$$\begin{aligned} \varphi \big (\varTheta ({\bar{u}}_t),{\bar{p}}_t\big )\ge \langle {\bar{p}}_t,\varTheta ({\bar{u}}_t)\rangle \ge -\frac{d_2}{{\underline{\varepsilon }}(t+1)}. \end{aligned}$$(A18)On the other hand, for \(p\in {\varvec{C}}^*\cap {\mathfrak {B}}^{p}\), we have
$$\begin{aligned} \varphi \big (\varTheta ({\bar{u}}_t),p\big )= & {} \min _{\xi \in -{\varvec{C}}}\langle p,\varTheta ({\bar{u}}_t)-\xi \rangle +\frac{\gamma }{2}\Vert \varTheta ({\bar{u}}_t)-\xi \Vert ^2\qquad \text{(from }~(12))\nonumber \\\leqslant & {} \langle p,\varTheta ({\bar{u}}_t)-\varPi _{-{\varvec{C}}}(\varTheta ({\bar{u}}_t))\rangle +\frac{\gamma }{2}\Vert \varTheta ({\bar{u}}_t)-\varPi _{-{\varvec{C}}}(\varTheta ({\bar{u}}_t))\Vert ^2\nonumber \\\leqslant & {} \Vert p\Vert \cdot \Vert \varPi (\varTheta ({\bar{u}}_t))\Vert +\frac{\gamma }{2}\Vert \varPi (\varTheta ({\bar{u}}_t))\Vert ^2\nonumber \\\leqslant & {} \frac{r^pd_1}{{\underline{\varepsilon }}(t+1)}+\frac{\gamma (d_1)^2}{2{\underline{\varepsilon }}^2(t+1)^2}\nonumber \\&\text{(from } \text{ statement } \text{(ii) } \text{ of } \text{ Theorem }~2 \,\text{ and }\, {p}\in {\varvec{C}}^*\cap {\mathfrak {B}}^{p}). \end{aligned}$$(A19)Therefore, we get the left-hand side of the inequality in statement (ii):
$$\begin{aligned} L_{\gamma }({\bar{u}}_{t},p)-L_{\gamma }({\bar{u}}_{t},{\bar{p}}_{t})= & {} \varphi (\varTheta ({\bar{u}}_t),p)-\varphi (\varTheta ({\bar{u}}_t),{\bar{p}}_t)\nonumber \\\leqslant & {} \frac{r^pd_1+d_2}{{\underline{\varepsilon }}(t+1)}+\frac{\gamma (d_1)^2}{2{\underline{\varepsilon }}^2(t+1)^2}. \end{aligned}$$(A20)From (A18) and (A19), we also have
$$\begin{aligned} -\frac{d_2}{{\underline{\varepsilon }}(t+1)}\leqslant \langle {\bar{p}}_t,\varTheta ({\bar{u}}_t)\rangle \leqslant \varphi (\varTheta ({\bar{u}}_t),{\bar{p}}_t)\leqslant \frac{r^p d_1}{{\underline{\varepsilon }}(t+1)}+\frac{\gamma (d_1)^2}{2{\underline{\varepsilon }}^2(t+1)^2}, \end{aligned}$$which implies that
$$\begin{aligned} \varphi (\varTheta ({\bar{u}}_t),{\bar{p}}_t)-\langle {\bar{p}}_t,\varTheta ({\bar{u}}_t)\rangle\leqslant & {} \frac{r^pd_1}{{\underline{\varepsilon }}(t+1)}+\frac{\gamma (d_1)^2}{2{\underline{\varepsilon }}^2(t+1)^2}-(-\frac{d_2}{{\underline{\varepsilon }}(t+1)})\\= & {} \frac{r^pd_1+d_2}{{\underline{\varepsilon }}(t+1)}+\frac{\gamma (d_1)^2}{2{\underline{\varepsilon }}^2(t+1)^2}. \end{aligned}$$Then, for \(u\in {\varvec{U}}\cap {\mathfrak {B}}^{u}\), we have
$$\begin{aligned}&L_\gamma ({\bar{u}}_{t},{\bar{p}}_{t})\leqslant L({\bar{u}}_{t},{\bar{p}}_{t})+\frac{r^pd_1+d_2}{{\underline{\varepsilon }}(t+1)}+\frac{\gamma (d_1)^2}{2{\underline{\varepsilon }}^2(t+1)^2}\nonumber \\&\quad \leqslant L(u,{\bar{p}}_{t})+\frac{r^pd_1+2d_2}{{\underline{\varepsilon }}(t+1)}+\frac{\gamma (d_1)^2}{2{\underline{\varepsilon }}^2(t+1)^2}\quad \text{(by } \text{ right-hand } \text{ side } \text{ of } \text{ statement } \text{(i)) }\nonumber \\&\quad \leqslant L_\gamma (u,{\bar{p}}_{t})+\frac{r^pd_1+2d_2}{{\underline{\varepsilon }}(t+1)}+\frac{\gamma (d_1)^2}{2{\underline{\varepsilon }}^2(t+1)^2}\quad \text{(from }~(13)), \end{aligned}$$(A21)which gives the right-hand side of the inequality in statement (ii).
(iii)
For saddle point \((u^*,p^*)\), we have
$$\begin{aligned} L_{\gamma }(u^*,p)\leqslant L_{\gamma }(u^*,p^*)\leqslant L_{\gamma }(u,p^*), \forall u\in {\varvec{U}}, p\in {\mathbb {R}}^m. \end{aligned}$$(A22)Taking \(u={\bar{u}}_t\), \(p={\bar{p}}_t\) in (A22), and taking \(u={\hat{u}}({\bar{p}}_t)\), \(p=p^*\) in statement (ii) of this theorem, we obtain the following two inequalities, respectively:
$$\begin{aligned} L_{\gamma }(u^*,{\bar{p}}_t)\leqslant&L_{\gamma }(u^*,p^*)&\leqslant L_{\gamma }({\bar{u}}_t,p^*), \end{aligned}$$and
$$\begin{aligned}&-\frac{r^pd_1+d_2}{{\underline{\varepsilon }}(t+1)}-\frac{\gamma (d_1)^2}{2{\underline{\varepsilon }}^2(t+1)^2}+L_{\gamma }({\bar{u}}_{t},p^*)\leqslant L_{\gamma }({\bar{u}}_{t},{\bar{p}}_{t})\leqslant L_{\gamma }({\hat{u}}({\bar{p}}_t),{\bar{p}}_{t})\\&\quad +\frac{r^pd_1+2d_2}{{\underline{\varepsilon }}(t+1)}+\frac{\gamma (d_1)^2}{2{\underline{\varepsilon }}^2(t+1)^2}. \end{aligned}$$Combining these two inequalities, the desired inequality is obtained:
$$\begin{aligned}&-\frac{r^pd_1+d_2}{{\underline{\varepsilon }}(t+1)}-\frac{\gamma (d_1)^2}{2{\underline{\varepsilon }}^2(t+1)^2}+L_{\gamma }(u^*,p^*)\leqslant L_{\gamma }({\hat{u}}({\bar{p}}_t),{\bar{p}}_{t})\\&\quad +\frac{r^pd_1+2d_2}{{\underline{\varepsilon }}(t+1)}+\frac{\gamma (d_1)^2}{2{\underline{\varepsilon }}^2(t+1)^2}. \end{aligned}$$Therefore,
$$\begin{aligned} \psi _\gamma (p^*)=L_{\gamma }(u^*,p^*)\leqslant & {} L_{\gamma }({\hat{u}}({\bar{p}}_t),{\bar{p}}_{t})+\frac{2r^pd_1+3d_2}{{\underline{\varepsilon }}(t+1)}+\frac{\gamma (d_1)^2}{{\underline{\varepsilon }}^2(t+1)^2}\nonumber \\= & {} \psi _\gamma ({\bar{p}}_t)+\frac{2r^pd_1+3d_2}{{\underline{\varepsilon }}(t+1)}+\frac{\gamma (d_1)^2}{{\underline{\varepsilon }}^2(t+1)^2}. \end{aligned}$$(A23)
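Assuming, as in the saddle point setup, that \(p^*\) maximizes the dual function \(\psi _\gamma \), the bound (A23) can be read directly as the claimed \(O(1/t)\) dual optimality rate; this is a restatement of the inequality just derived, not an additional estimate:

```latex
0\;\leqslant\;\psi _\gamma (p^*)-\psi _\gamma ({\bar{p}}_t)
\;\leqslant\;\frac{2r^pd_1+3d_2}{{\underline{\varepsilon }}(t+1)}
+\frac{\gamma (d_1)^2}{{\underline{\varepsilon }}^2(t+1)^2}
\;=\;O(1/t).
```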
Zhao, L., Zhu, DL. On Iteration Complexity of a First-Order Primal-Dual Method for Nonlinear Convex Cone Programming. J. Oper. Res. Soc. China 10, 53–87 (2022). https://doi.org/10.1007/s40305-021-00344-x