Abstract
Stochastic gradient methods (SGMs) have been widely used for solving stochastic optimization problems. A majority of existing works assume no constraints or easy-to-project constraints. In this paper, we consider convex stochastic optimization problems with expectation constraints. For these problems, it is often extremely expensive to perform projection onto the feasible set. Several SGMs in the literature can be applied to solve such expectation-constrained stochastic problems. We propose a novel primal-dual type SGM based on the Lagrangian function. Different from existing methods, our method incorporates an adaptiveness technique to speed up convergence. At each iteration, it queries an unbiased stochastic subgradient of the Lagrangian function and then updates the primal variables by an adaptive-SGM step and the dual variables by a vanilla-SGM step. We show that the proposed method has a convergence rate of \(O(1/\sqrt{k})\) in terms of the objective error and the constraint violation. Although this rate matches that of existing SGMs, we observe significantly faster convergence than an existing non-adaptive primal-dual SGM and a primal SGM on solving Neyman–Pearson classification problems and quadratically constrained quadratic programs. Furthermore, we modify the proposed method to solve convex–concave stochastic minimax problems, for which we perform adaptive-SGM updates to both primal and dual variables. A convergence rate of \(O(1/\sqrt{k})\) in terms of the primal–dual gap is also established for the modified method. Our code has been released at https://github.com/RPI-OPT/APriD.
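As a rough illustration of this update scheme, the following is a minimal sketch only (in Python, with hypothetical function and parameter names); the precise subgradient clipping, step-size schedules, and iterate averaging are those specified in the paper and in the released code.

```python
import numpy as np

def adaptive_primal_dual_sgm(stoch_subgrad, stoch_constraints, project_X,
                             x0, z0, num_iters=1000, alpha=1e-2, rho=1e-2,
                             beta1=0.9, beta2=0.99, eps=1e-8):
    """Sketch of an adaptive-primal / vanilla-dual stochastic update loop.

    stoch_subgrad(x, z)   -- unbiased stochastic subgradient of the Lagrangian in x
    stoch_constraints(x)  -- unbiased stochastic estimate of the constraint values f(x)
    project_X(x)          -- projection onto the simple set X
    """
    x, z = np.array(x0, dtype=float), np.array(z0, dtype=float)
    m = np.zeros_like(x)       # first-moment (momentum) estimate
    v = np.zeros_like(x)       # second-moment estimate
    v_hat = np.zeros_like(x)   # running max of v (keeps effective step sizes monotone)
    for _ in range(num_iters):
        u = stoch_subgrad(x, z)
        m = beta1 * m + (1.0 - beta1) * u
        v = beta2 * v + (1.0 - beta2) * u * u
        v_hat = np.maximum(v_hat, v)
        x = project_X(x - alpha * m / (np.sqrt(v_hat) + eps))  # adaptive primal step
        w = stoch_constraints(x)
        z = np.maximum(z + rho * w, 0.0)                       # vanilla projected dual ascent
    return x, z
```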
Data Availability Statement
This manuscript has associated data in a data repository at https://github.com/RPI-OPT/APriD.
Acknowledgements
The authors would like to thank three anonymous reviewers for their valuable comments and suggestions, which improved the quality of the paper, and for carefully testing our code. The authors are partly supported by NSF award 2053493 and the RPI-IBM AIRC faculty fund.
Funding
This work was partly supported by NSF Award 2053493.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Proof of Lemma 2
Proof
First, consider the case of non-constant primal step size. By \(\eta _1 = \sum _{i=1}^K \alpha _i\beta _1^{i-1}\) and the \(\eta \)-update in (2.8), we have \(\eta _k = \sum _{i=k}^K \alpha _i \beta _1^{i-k}, k\in [K]\), and thus the \(\rho \)-update becomes
By the above equation, we have for \(2\le j\le K\),
where the inequality follows from the non-increasing monotonicity of \(\{\alpha _j\}_{j= 1}^K\) and \(\beta _1\in (0,1)\). Hence, \(\{\rho _j\}_{j= 1}^K\) is a non-increasing sequence. Using (2.8) again, we have for \(2\le j\le t\le K\),
which clearly implies the inequality in (2.11).
For \(j=1\), (2.12) holds because \(\beta \in (0,1)\). To show it holds for \(2\le j\le K\), we rewrite \(\rho _j\) and obtain
where the third equation recursively applies the second equation, and the inequalities hold by the two inequalities in (2.10).
Now, consider the case of constant primal step size, i.e., \(\alpha _j = \alpha _1\) for all \(j\in [K]\). We can prove \(\eta _j = \frac{\alpha _1}{1-\beta _1}\) for all \(j\in [K]\) by induction, and thus
which completes the proof. \(\square \)
Proof of Lemma 3
Proof
As \((\mathbf {x}^*,\mathbf {z}^*)\) satisfies the KKT conditions in Assumption 3, there are \(\tilde{\nabla }f_i(\mathbf {x}^*), \forall i\in [M]\) such that
From the convexity of \(f_0\) and X, it follows that \( f_0(\mathbf {x}) - f_0(\mathbf {x}^*)\ge - \big \langle \sum _{i=1}^{M} z_i^*\tilde{\nabla }f_i(\mathbf {x}^*), \mathbf {x}-\mathbf {x}^*\big \rangle , \forall \mathbf {x}\in X\). Since \(f_i\) is convex for each \(i\in [M]\), we have \(f_i(\mathbf {x}) - f_i(\mathbf {x}^*) \ge \langle \tilde{\nabla }f_i(\mathbf {x}^*),\mathbf {x}-\mathbf {x}^*\rangle \). Noticing \(\mathbf {z}^*\ge \mathbf {0}\), we have for any \( \mathbf {x}\in X\),
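Namely, writing the combination out, for any \(\mathbf {x}\in X\),
\[ f_0(\mathbf {x}) - f_0(\mathbf {x}^*) \;\ge\; - \sum _{i=1}^{M} z_i^*\big \langle \tilde{\nabla }f_i(\mathbf {x}^*), \mathbf {x}-\mathbf {x}^*\big \rangle \;\ge\; - \sum _{i=1}^{M} z_i^*\big (f_i(\mathbf {x}) - f_i(\mathbf {x}^*)\big ). \]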
Because \(z_i^*f_i(\mathbf {x}^*)=0\) for all \(i\in [M]\), we obtain (3.7).
Furthermore, for any \(\mathbf {z}\ge \mathbf {0}\), we have \(\langle \mathbf {z}, \mathbf {f}(\mathbf {x}^*)\rangle \le 0\) from \(f_i(\mathbf {x}^*)\le 0\), \(\forall i \in [M]\). Hence, combining with (3.7), we have the inequality in (3.8). \(\square \)
Proof of Lemma 4
Proof
For \(\widehat{\mathbf {u}}^k\) given in (2.4), we have \(\Vert \widehat{\mathbf {u}}^k\Vert \le \theta \), and thus each coordinate of \(\widehat{\mathbf {u}}^k\) has absolute value at most \(\theta \), i.e., \(-\theta \mathbf {1} \le \widehat{\mathbf {u}}^k \le \theta \mathbf {1}\). Recursively rewriting the updates in (2.3), (2.5) and (2.6) gives
where \(\mathbf {m}^0=\mathbf {v}^0=\widehat{\mathbf {v}}^0=\mathbf {0}\). By (C.2) and \(-\theta \mathbf {1} \le \widehat{\mathbf {u}}^k \le \theta \mathbf {1}\), we have \( \mathbf {v}^k \le \theta ^2 (1-\beta _2)\sum _{j=1}^k \beta _2^{k-j} \mathbf {1} \le \theta ^2 \mathbf {1}\). By (C.3), we further have \( \widehat{\mathbf {v}}^k \le \theta ^2 \mathbf {1}\). Thus \( \mathbb {E}\big [\Vert (\widehat{\mathbf {v}}^k)^{1/2}\Vert _1\big ] \le n\theta \) holds.
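Here the bound \((1-\beta _2)\sum _{j=1}^k \beta _2^{k-j}\le 1\) follows from the geometric sum:
\[ (1-\beta _2)\sum _{j=1}^k \beta _2^{k-j} = (1-\beta _2)\sum _{i=0}^{k-1} \beta _2^{i} = 1-\beta _2^{k} \le 1. \]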
Notice \( \mathbb {E}\big [\Vert \mathbf {m}^{k}\Vert _{({\widehat{\mathbf {v}}}^k)^{-{1/2}}}^2\big ] = \mathbb {E}\big [\Vert \frac{\mathbf {m}^{k}}{({\widehat{\mathbf {v}}}^k)^{{1/4}}}\Vert ^2\big ]\). We can lower bound \(\mathbf {v}^k\) by keeping only the last term in (C.2) since \(({\widehat{\mathbf {u}}}^{j})^2\ge \mathbf {0}\), i.e. \( \mathbf {v}^k \ge (1-\beta _2)({\widehat{\mathbf {u}}}^k)^2 \). By (C.3), we also have \(\widehat{\mathbf {v}}^k \ge (1-\beta _2) \max _{j\in [k]} ({\widehat{\mathbf {u}}}^j)^2.\) Plugging the inequality and (C.1) into \( \mathbb {E}\big [\Vert \mathbf {m}^{k}\Vert _{({\widehat{\mathbf {v}}}^k)^{-{1/2}}}^2\big ] \) gives
Then we bound \(\left\| \frac{\sum _{j=1}^{k} \beta _{1}^{k-j} \mathbf {u}^j }{\max _{j\in [k]} \mid {\widehat{\mathbf {u}}}^j\mid ^{1/2}} \right\| ^2\) by the Cauchy-Schwarz inequality.
where we use \(\max _{j\in [k]} \mid {\widehat{\mathbf {u}}}^j\mid ^{1/2}\ge \mid {\widehat{\mathbf {u}}}^j\mid ^{1/2}\) for \(j\in [k]\) in the second inequality. For \(\left\| \frac{ \mathbf {u}^j }{\mid {\widehat{\mathbf {u}}}^j\mid ^{1/2}}\right\| ^2\), recall that \({\widehat{\mathbf {u}}}^j\) is given in (2.4) and note that \(\max \big \{1, \frac{\Vert \mathbf {u}^j\Vert }{\theta }\big \}\) is a scalar.
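Indeed, with \(c_j := \max \big \{1, \frac{\Vert \mathbf {u}^j\Vert }{\theta }\big \}\) and \(\widehat{\mathbf {u}}^j = \mathbf {u}^j/c_j\) as in (2.4) (under the coordinatewise convention \(0/0=0\)), we have
\[ \left\| \frac{ \mathbf {u}^j }{\mid {\widehat{\mathbf {u}}}^j\mid ^{1/2}}\right\| ^2 = c_j\sum _{i=1}^{n} |u_i^j| = c_j\Vert \mathbf {u}^j\Vert _1 \le \sqrt{n}\, c_j\Vert \mathbf {u}^j\Vert = \sqrt{n}\max \Big \{\Vert \mathbf {u}^j\Vert , \tfrac{\Vert \mathbf {u}^j\Vert ^2}{\theta }\Big \} \le \sqrt{n} \left( \theta + \frac{\Vert \mathbf {u}^j \Vert ^2}{\theta }\right), \]
where the first inequality is \(\Vert \cdot \Vert _1\le \sqrt{n}\,\Vert \cdot \Vert \) and the last one uses \(\Vert \mathbf {u}^j\Vert \le \theta \) whenever \(c_j=1\).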
So we get \(\left\| \frac{ \mathbf {u}^j }{\mid {\widehat{\mathbf {u}}}^j\mid ^{1/2}}\right\| ^2\le \sqrt{n} \left( \theta + \frac{\Vert \mathbf {u}^j \Vert ^2}{\theta }\right) \). Plugging this inequality back into (C.5), and then (C.5) into (C.4), gives
where we have used \( \sum _{j=1}^{k} \beta _{1}^{k-j} = \sum _{i=0}^{k-1} \beta _{1}^{i}\le \frac{1}{1-\beta _1}\) for \(\beta _1\in (0,1)\) in the last inequality. With (3.2), the proof is finished. \(\square \)
Proof of Lemma 5
Proof
From the projection step (2.7) in the primal update, we have, for all \(k\in [K]\) and \(\mathbf {x}\in X\),
The first term on the right-hand side equals
Recursively rewriting \( \big \langle \mathbf {x}^{k+1}-\mathbf {x}, \mathbf {m}^k \big \rangle \) with the update (2.3) gives
where the second equation recursively applies the first equation, and the last term \(\beta _{1}^k\big \langle \mathbf {x}^{1} - \mathbf {x},\mathbf {m}^{0}\big \rangle \) vanishes because \(\mathbf {m}^0 = \mathbf {0}\). Plugging Eqs. (D.2) and (D.4) into the inequality (D.1) gives
Now sum the above inequality (D.5) over \(k=1,\ldots ,t\). For the left-hand side, we have
For the right-hand side of the summed inequality (D.5), by \((\widehat{\mathbf {v}}^{k})^{1/2} \ge (\widehat{\mathbf {v}}^{k-1})^{1/2}\ge \mathbf {0}\) for \(k \in [t]\), which follows from the iteration (2.6), and by Assumption 1, we have
Thus, with inequalities (D.6) and (D.7), summing the inequality (D.5) over \(k=1,\ldots ,t\) gives
Eliminating the term \(\big \Vert \mathbf {x}^{j+1} -\mathbf {x}^{j}\big \Vert _{(\widehat{\mathbf {v}}^j)^{1/2}}\) on both sides and exchanging the order of sums in the first term give
Then, taking the expectation of the above inequality and using the bounds given in Lemma 4, we have
where the last inequality holds because \(\widehat{P}_\mathbf {z}^k\) defined in (3.2) is nondecreasing in \(k\). \(\square \)
Proof of Lemma 6
Proof
Since the dual variable is projected onto the nonnegative orthant in the update (2.9), it follows that for any \(\mathbf {z}\ge \mathbf {0}\) and \(j\in [K]\),
This can be rewritten as
For each term of the above inequality (E.1), we have
Plugging the above three terms into the inequality (E.1) and eliminating \(\big \Vert \mathbf {z}^{j+1}-\mathbf {z}^j\big \Vert ^2\) give
Rearranging the above inequality gives the inequality (3.10). \(\square \)
Proof of Lemma 7
Proof
For any \(j\in [K]\), we have
Here \( \tilde{\nabla }_\mathbf {x}\mathcal {L}(\mathbf {x}^j,\mathbf {z}^j) = \mathbb {E}\big [\mathbf {u}^j\mid \mathcal {H}^j\big ] \in \partial _\mathbf {x}\mathcal {L}(\mathbf {x}^j,\mathbf {z}^j)\) according to Assumption 2. By the convexity of \(f_i(\mathbf {x}), i=0,1, \ldots , M\), we know \(\mathcal {L}(\mathbf {x},\mathbf {z})\) is convex with respect to \(\mathbf {x}\) and
We plug the lower bound of \(\langle \mathbf {z}^j, \mathbf {f}(\mathbf {x}^j)\rangle \) given in Lemma 6 into the above inequality.
Summing (F.1) with weights \( \sum _{k=j}^t \alpha _k\beta _1^{k-j}\) for \(j\in [t]\) and plugging in the above inequality gives
The sum of the terms involving \(\big \Vert \mathbf {z}^{j+1} - \mathbf {z}\big \Vert ^2 - \big \Vert \mathbf {z}^j-\mathbf {z}\big \Vert ^2\) can be lower bounded as follows:
The sum of the terms involving \(\big \Vert \mathbf {w}^j\big \Vert ^2\) can also be lower bounded:
Plugging the above two inequalities (F.3) and (F.4) into the inequality (F.2) gives
Then taking the expectation on the above inequality and using Assumption 2 to bound \(\mathbb {E}\big [\big \Vert \mathbf {w}^j\big \Vert ^2\big ]\) give the result (3.11). \(\square \)
Proof of Lemma 8
Proof
If \((\mathbf {x},\mathbf {z})\) is deterministic, we can prove (3.13) by conditional expectation and Assumption 2, i.e.,
We then prove the stochastic case by treating the two terms on the left-hand side of (3.12) separately, in a similar way. Let \(\tilde{\mathbf {z}}^1 = \mathbf {z}^1\) and \(\tilde{\mathbf {z}}^{j+1} = \tilde{\mathbf {z}}^j - \gamma _j( \mathbf {w}^j - \mathbf {f}(\mathbf {x}^j))\). Then \( \mathbf {z}^j-\tilde{\mathbf {z}}^j\) is known given \(\mathcal {H}^j\), and, as in the deterministic case above, we have \(\mathbb {E}\big [ \big \langle \mathbf {z}^j-\tilde{\mathbf {z}}^j, \mathbf {w}^j - \mathbf {f}(\mathbf {x}^j) \big \rangle \big ] = 0\). Thus we have
where the first inequality holds because we drop the nonpositive term and \(\tilde{\mathbf {z}}^1 = \mathbf {z}^1\); the second inequality holds because for any random vector \(\mathbf {w}\), \(\mathbb {E}\big [\big \Vert \mathbf {w}-\mathbb {E}[\mathbf {w}]\big \Vert ^2\big ]\le \mathbb {E}\big [\big \Vert \mathbf {w}\big \Vert ^2\big ]\), and here \(\mathbb {E}\big [\mathbf {w}^j\mid \mathcal {H}^j\big ] = \mathbf {f}(\mathbf {x}^j)\) for \(j\in [t]\); and the last inequality holds by Assumption 2.
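For reference, the one-step expansion of the auxiliary sequence, which follows directly from the update \(\tilde{\mathbf {z}}^{j+1} = \tilde{\mathbf {z}}^j - \gamma _j( \mathbf {w}^j - \mathbf {f}(\mathbf {x}^j))\), is
\[ \big \Vert \tilde{\mathbf {z}}^{j+1}-\mathbf {z}\big \Vert ^2 = \big \Vert \tilde{\mathbf {z}}^{j}-\mathbf {z}\big \Vert ^2 - 2\gamma _j\big \langle \tilde{\mathbf {z}}^j-\mathbf {z}, \mathbf {w}^j - \mathbf {f}(\mathbf {x}^j)\big \rangle + \gamma _j^2\big \Vert \mathbf {w}^j - \mathbf {f}(\mathbf {x}^j)\big \Vert ^2. \]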
For the summation of \(\gamma _j\mathbb {E}\big [ \big \langle \mathbf {x}^j-\mathbf {x}, \mathbf {u}^j -\tilde{\nabla }_\mathbf {x}\mathcal {L}(\mathbf {x}^j,\mathbf {z}^j)\big \rangle \big ]\), let \(\tilde{\mathbf {x}}^1 = \mathbf {x}^1\) and \(\tilde{\mathbf {x}}^{j+1} = \tilde{\mathbf {x}}^j + \gamma _j\big (\mathbf {u}^j -\tilde{\nabla }_\mathbf {x}\mathcal {L}(\mathbf {x}^j,\mathbf {z}^j)\big )\); then \(\mathbb {E}\big [\big \langle \mathbf {x}^j-\tilde{\mathbf {x}}^j, \mathbf {u}^j -\tilde{\nabla }_\mathbf {x}\mathcal {L}(\mathbf {x}^j,\mathbf {z}^j) \big \rangle \big ] = 0\) and
where the first inequality holds because we drop the nonpositive term and use (3.1); the second inequality holds because \(\mathbb {E}\big [\mathbf {u}^j\mid \mathcal {H}^j\big ]= \tilde{\nabla }_\mathbf {x}\mathcal {L}(\mathbf {x}^j,\mathbf {z}^j)\) for \(j\in [t]\); and the last two inequalities hold by Assumption 2 and (3.2).
Adding (G.1) and (G.2) gives the result (3.12). \(\square \)
Proof of Lemma 9
Proof
Letting \(\mathbf {x}= \mathbf {x}^*\) in (3.17), we have \(\mathbf {f}(\mathbf {x}^*)\le \mathbf {0}\) and thus
Since \(f_j({\bar{\mathbf {x}}})\le [f_j({\bar{\mathbf {x}}})]_+\) and \(\mathbf {z}^*\ge 0\), we have from (3.7) that
Substituting (H.2) into (H.1) with \(\mathbf {z}\) given by \(z_j = 1 + z_j^*\) if \(f_j({\bar{\mathbf {x}}})>0\) and \(z_j = 0\) otherwise for any \(j\in [M]\) gives
Simplifying the above inequality gives (3.19).
Letting \(z_j = 3 z_j^*\) if \(f_j({\bar{\mathbf {x}}})>0\) and \(z_j = 0\) otherwise for any \(j\in [M]\) in (H.1) and adding (H.2) together gives
Hence, by the above inequality and (H.2), we obtain
Thus \(\mathbb {E}\big [[f_0({\bar{\mathbf {x}}})-f_0(\mathbf {x}^*)]_-\big ]\le \frac{\epsilon _1}{2}+\frac{9\epsilon _0}{2}\big \Vert \mathbf {z}^*\big \Vert ^2.\) In addition, from (H.1) with \(\mathbf {z}= 0\), it follows that \(\mathbb {E}\big [f_0({\bar{\mathbf {x}}})-f_0(\mathbf {x}^*)\big ]\le \epsilon _1\). Since \(|a| = a + 2[a]_-\) for any real number \(a\), we have
which gives (3.18).
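Here the elementary identity used above, with \([a]_- = \max \{0,-a\}\) denoting the negative part, can be checked directly:
\[ a\ge 0:\quad a + 2[a]_- = a = |a|; \qquad a<0:\quad a + 2[a]_- = a - 2a = -a = |a|. \]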
Furthermore, letting \(\mathbf {z}= 0\) in (3.17) and taking \({\widehat{\mathbf {x}}}\in \text{ argmin}_{\mathbf {x}\in X} f_0(\mathbf {x}) + \big \langle \bar{\mathbf {z}},\mathbf {f}(\mathbf {x})\big \rangle \), we have from (2.2) that \(\mathbb {E}\big [f_0({\bar{\mathbf {x}}})\big ]\le \mathbb {E}\big [d(\bar{\mathbf {z}})\big ] + \epsilon _1\), which together with (3.6) gives
Combining the above inequality with (H.4) gives (3.20). \(\square \)
Proof of Corollary 4
Proof
For the step sizes \(\{\alpha _j\}_{j=1}^K\), we have
Similarly, for \(\{\rho _j\}_{j=1}^K\), it holds that \( \sum _{j=1}^{K} \rho _j^2\le \rho ^2\log {(K+1)}\). Plugging these bounds into the result of Theorem 3 and noting that \(\log (K+1)\ge 1\) for \(K\ge 2\) finishes the proof. \(\square \)
About this article
Cite this article
Yan, Y., Xu, Y. Adaptive primal-dual stochastic gradient method for expectation-constrained convex stochastic programs. Math. Prog. Comp. 14, 319–363 (2022). https://doi.org/10.1007/s12532-021-00214-w
Keywords
- Stochastic gradient method
- Adaptive methods
- Expectation-constrained stochastic optimization
- Saddle-point problem