Stochastic first-order methods for convex and nonconvex functional constrained optimization

Boob, Digvijay; Deng, Qi; Lan, Guanghui

doi:10.1007/s10107-021-01742-y

Stochastic first-order methods for convex and nonconvex functional constrained optimization

Full Length Paper
Series A
Published: 21 January 2022

Volume 197, pages 215–279, (2023)
Cite this article

Mathematical Programming Submit manuscript

Digvijay Boob¹,
Qi Deng² &
Guanghui Lan¹

2460 Accesses
14 Citations
Explore all metrics

Abstract

Functional constrained optimization is becoming more and more important in machine learning and operations research. Such problems have potential applications in risk-averse machine learning, semisupervised learning and robust optimization among others. In this paper, we first present a novel Constraint Extrapolation (ConEx) method for solving convex functional constrained problems, which utilizes linear approximations of the constraint functions to define the extrapolation (or acceleration) step. We show that this method is a unified algorithm that achieves the best-known rate of convergence for solving different functional constrained convex composite problems, including convex or strongly convex, and smooth or nonsmooth problems with stochastic objective and/or stochastic constraints. Many of these rates of convergence were in fact obtained for the first time in the literature. In addition, ConEx is a single-loop algorithm that does not involve any penalty subproblems. Contrary to existing primal-dual methods, it does not require the projection of Lagrangian multipliers into a (possibly unknown) bounded set. Second, for nonconvex functional constrained problems, we introduce a new proximal point method which transforms the initial nonconvex problem into a sequence of convex problems by adding quadratic terms to both the objective and constraints. Under certain MFCQ-type assumption, we establish the convergence and rate of convergence of this method to KKT points when the convex subproblems are solved exactly or inexactly. For large-scale and stochastic problems, we present a more practical proximal point method in which the approximate solutions of the subproblems are computed by the aforementioned ConEx method. Under a strong feasibility assumption, we establish the total iteration complexity of ConEx required by this inexact proximal point method for a variety of problem settings, including nonconvex smooth or nonsmooth problems with stochastic objective and/or stochastic constraints. To the best of our knowledge, most of these convergence and complexity results of the proximal point method for nonconvex problems also seem to be new in the literature.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

A stochastic primal-dual method for a class of nonconvex constrained optimization

Article 24 June 2022

Lingzi Jin & Xiao Wang

Level constrained first order methods for function constrained optimization

Article Open access 06 March 2024

Digvijay Boob, Qi Deng & Guanghui Lan

Non-monotone derivative-free algorithm for solving optimization models with linear constraints: extensions for solving nonlinearly constrained models via exact penalty methods

Article 27 February 2020

Ubaldo M. García-Palomares

Notes

This x, y is required to be non-random because we are dropping the inner product terms of the left hand side of (2.32).

References

Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016)
Andreani, R., Haeser, G., Martínez, J.M.: On sequential optimality conditions for smooth constrained optimization. Optimization 60(5), 627–641 (2011)
Article MATH Google Scholar
Andreani, R., Martínez, J.M., Ramos, A., Silva, P.J.S.: Strict constraint qualifications and sequential optimality conditions for constrained optimization. Math. Oper. Res. 43, 693–717 (2018)
Article MATH Google Scholar
Aravkin, A.Y., Burke, J.V., Drusvyatskiy, D., Friedlander, M.P., Roy, S.: Level-set methods for convex optimization. Math. Program. 174, 359–390 (2018)
Article MATH Google Scholar
Ben-Tal, A., Nemirovski, A.: Non-Euclidean restricted memory level method for large-scale convex optimization. Math. Program. 102, 407–456 (2005)
Article MATH Google Scholar
Bertsekas, D.P.: Nonlinear Programming. Athena Scientific, Belmont (1999)
MATH Google Scholar
Bertsekas, D.P.: Convex Optimization Algorithms. Athena Scientific, Belmont (2015)
MATH Google Scholar
Cartis, C., Gould, N.I., Toint, P.L.: On the complexity of finding first-order critical points in constrained nonlinear optimization. Math. Program. 144(1), 93–106 (2014)
Article MATH Google Scholar
Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vis. 40(1), 120–145 (2011)
Article MATH Google Scholar
Chen, Y., Lan, G., Ouyang, Y.: Optimal primal-dual methods for a class of saddle point problems. SIAM J. Optim. 24(4), 1779–1814 (2014)
Article MATH Google Scholar
Davis, D., Grimmer, B.: Proximally guided stochastic subgradient method for nonsmooth, nonconvex problems. arXiv:1707.03505v4 (2017)
Dinh, Q.T., Gumussoy, S., Michiels, W., Diehl, M.: Combining convex-concave decompositions and linearization approaches for solving BMIS, with application to static output feedback. arXiv:1109.3320 (2011)
Facchinei, F., Kungurtsev, V., Lampariello, L., Scutari, G.: Ghost penalties in nonconvex constrained optimization: diminishing stepsizes and iteration complexity. arXiv:1709.03384 (2017)
Fang, C., Li, C.J., Lin, Z., Zhang, T.: Spider: Near-optimal non- convex optimization via stochastic path-integrated differential estimator. Adv. Neural Inf. Process. Syst. 687–697 (2018)
Frostig, R., Ge, R., Kakade, S., Sidford, A.: Un-regularizing: approximate proximal point and faster stochastic algorithms for empirical risk minimization. In: International Conference on Machine Learning, pp. 2540–2548 (2015)
Ghadimi, S., Lan, G.: Stochastic first- and zeroth-order methods for nonconvex stochastic programming. SIAM J. Optim. 23(4), 2341–2368 (2013)
Article MATH Google Scholar
Ghadimi, S., Lan, G.: Accelerated gradient methods for nonconvex nonlinear and stochastic programming. Math. Program. 156(1–2), 59–99 (2016)
Article MATH Google Scholar
Güler, O.: New proximal point algorithms for convex minimization. SIAM J. Optim. 2(4), 649–664 (1992)
Article MATH Google Scholar
Hamedani, E.Y., Aybat, N.S.: A primal-dual algorithm for general convex-concave saddle point problems. arXiv:1803.01401 (2018)
Kong, W., Melo, J.G., Monteiro, R.D.: Complexity of a quadratic penalty accelerated inexact proximal point method for solving linearly constrained nonconvex composite programs. arXiv:1802.03504 (2018)
Lan, G.: First-order and Stochastic Optimization Methods for Machine Learning. Springer, Berlin (2020)
Book MATH Google Scholar
Lan, G., Lee, S., Zhou, Y.: Communication-efficient algorithms for decentralized and stochastic optimization. Math. Program. 180, 237–284 (2018)
Article MATH Google Scholar
Lan, G., Monteiro, R.D.C.: Iteration-complexity of first-order penalty methods for convex programming. Math. Program. 138, 115–139 (2013)
Article MATH Google Scholar
Lan, G., Monteiro, R.D.C.: Iteration-complexity of first-order augmented Lagrangian methods for convex programming. Math. Program. 155(1–2), 511–547 (2016)
Article MATH Google Scholar
Lan, G., Yang, Y.: Accelerated stochastic algorithms for nonconvex finite-sum and multi-block optimization. arXiv:1805.05411 (2018)
Lan, G., Zhou, Y.: An optimal randomized incremental gradient method. Math. Program. 171(1–2), 167–215 (2018)
Article MATH Google Scholar
Lan, G., Zhou, Z.: Algorithms for stochastic optimization with expectation constraints. arXiv:1604.03887 (2016)
Lemaréchal, C., Nemirovski, A.S., Nesterov, Y.E.: New variants of bundle methods. Math. Program. 69, 111–148 (1995)
Article MATH Google Scholar
Lin, Q., Ma, R., Yang, T.: Level-set methods for finite-sum constrained convex optimization. In: Proceedings of the 35th International Conference on Machine Learning, vol. 80, pp. 3112–3121 (2018)
Lin, Q., Nadarajah, S., Soheili, N.: A level-set method for convex optimization with a feasible solution path. SIAM J. Optim. 28(4), 3290–3311 (2018)
Article MATH Google Scholar
Ma, R., Lin, Q., Yang, T.: Proximally constrained methods for weakly convex optimization with weakly convex constraints. arXiv:1908.01871 (2019)
Mangasarian, O., Fromovitz, S.: The fritz john necessary optimality conditions in the presence of equality and inequality constraints. J. Math. Anal. Appl. 17, 37–47 (1967)
Article MATH Google Scholar
Martínez, J.M., Svaiter, B.F.: A practical optimality condition without constraint qualifications for nonlinear programming. J. Optim. Theory Appl. 118(1), 117–133 (2003)
Article MATH Google Scholar
Nemirovski, A.: Prox-method with rate of convergence o(1/t) for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems. SIAM J. Optim. 15(1), 229–251 (2004)
Article MATH Google Scholar
Nesterov, Y.: Lectures on Convex Optimization. Springer, Berlin (2018)
Book MATH Google Scholar
Nguyen, L.M., Liu, J., Scheinberg, K., c, M.T.: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning, vol. 70, pp. 2613–2621 (2017)
Nouiehed, M., Sanjabi, M., Lee, J.D., Razaviyayn, M.: Solving a class of non-convex min-max games using iterative first order methods. arXiv:1902.08297 (2019)
Pham, N.H., Nguyen, L.M., Phan, D.T., Tran-Dinh, Q.: Proxsarah: an efficient algorithmic framework for stochastic composite nonconvex optimization. arXiv:1902.05679 (2019)
Polyak, B.: A general method of solving extremum problems. Sov. Math. Doklady 8(3), 593–597 (1967)
MATH Google Scholar
Rafique, H., Liu, M., Lin, Q., Yang, T.: Non-convex min-max optimization: provable algorithms and applications in machine learning. arXiv:1810.02060 (2018)
Reddi, S.J., Hefny, A., Sra, S., Póczós, B., Smola, A.J.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016)
Rockafellar, R.T., Uryasev, S.: Optimization of conditional value-at-risk. J. Risk 2, 21–41 (2000)
Article Google Scholar
Shapiro, A., Dentcheva, D., Ruszczynski, A.: Lectures on Stochastic Programming: Modeling and Theory, 2nd edn. Society for Industrial and Applied Mathematics, Philadelphia (2014)
Wang, X., Ma, S., Yuan, Y.: Penalty methods with stochastic approximation for stochastic nonlinear programming. Math. Comput. 86(306), 1793–1820 (2017)
Article MATH Google Scholar
Wang, Z., Ji, K., Zhou, Y., Liang, Y., Tarokh, V.: Spiderboost: A class of faster variance-reduced algorithms for nonconvex optimization. arXiv:1810.10690 (2018)
Xu, Y.: Iteration complexity of inexact augmented Lagrangian methods for constrained convex programming. Math. Program. (2019)
Yu, H., Neely, M., Wei, X.: Online convex optimization with stochastic constraints. Adv. Neural Inf. Process. Syst. pp. 1428–1438 (2017)
Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. In Proceedings of the 32nd International Conference on Neural Information Processing Systems (USA, 2018), NIPS’18, Curran Associates Inc., pp. 3925–3936

Download references

Acknowledgements

The authors would like to thank Dr. Qihang Lin for a few inspiring discussions that help to improve the initial version of this work. The authors would also like to thank an anonymous reviewer whose comments helped in a significant streamlining of the presentation of the paper.

Author information

Authors and Affiliations

Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, USA
Digvijay Boob & Guanghui Lan
School of Information Management and Engineering, Shanghai University of Finance and Economics, Shanghai, China
Qi Deng

Authors

Digvijay Boob
View author publications
You can also search for this author in PubMed Google Scholar
Qi Deng
View author publications
You can also search for this author in PubMed Google Scholar
Guanghui Lan
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Guanghui Lan.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Boob and Lan was partially supported by the NSF Grant CCF 1909298. Deng was partially supported by the NSFC Grant 11831002.

Appendices

Proof of Proposition 1

Let us denote

$$\begin{aligned} \begin{aligned} {\bar{\psi }}_i(x)&:= \psi _{i}(x) {+ \mu _iW(x, x^*)}, \quad i = 0, \ldots , m. \end{aligned} \end{aligned}$$

(A.1)

It is easy to see that ${\bar{\psi }}_0(x)$ and ${\bar{\psi }}_i(x),\ i \in [m]$, are convex functions. Moreover, their respective subdifferentials can be written as

$$\begin{aligned} \partial {\bar{\psi }}_i(x)&= \{\nabla f_i(x) + \mu _i \nabla W(x,x^*)\} + \partial \chi _i(x), \end{aligned}$$

where $\nabla W$ is the gradient in the first variable. Note that $\nabla W(x^*, x^*) = {\varvec{0}}$. Consider the constrained convex optimization problem:

$$\begin{aligned} \min _{x \in X}&\quad {\bar{\psi }}_0(x) \nonumber \\ \text {s.t.}&\quad {\bar{\psi }}_i(x) \leqslant 0, \quad i \in [m]. \end{aligned}$$

(A.2)

Note that $x^*$ is a feasible solution of this problem. For sake of this proof, define $\varPsi _k(x) := {\bar{\psi }}_0(x) + \tfrac{k}{2}\sum _{i=1}^m \left[ {\bar{\psi }}_i(x)\right] _+^2 +\tfrac{1}{2}\left\Vert x-x^*\right\Vert ^{2}_{2}$. Let $S=\{x \in X: \left\Vert x-x^*\right\Vert ^{}_{2} \leqslant \varepsilon \}$ for some $\varepsilon > 0$ such that any $x \in S$ which is feasible for (A.2) satisfies ${\bar{\psi }}_0(x) \geqslant {\bar{\psi }}_0(x^*)$. Let $x_k :=\hbox {argmin}_{x\in S} \varPsi _k(x)$. This is well-defined since $\varPsi _k$ is a strongly convex function. Note that

$$\begin{aligned} \liminf _{k\rightarrow \infty } \varPsi _k(x_k) \leqslant \limsup _{k\rightarrow \infty }\varPsi _k(x_k) \leqslant \limsup _{k\rightarrow \infty } \varPsi _k(x^*) = {\bar{\psi }}_0(x^*) < \infty , \end{aligned}$$

(A.3)

where second inequality follows from the fact that $x_k$ is optimal and $x^* \in S$ is a feasible point. Note that as $k \rightarrow \infty $, we have $\limsup _{k\rightarrow \infty } \varPsi _k(x_k) < \infty \Rightarrow \limsup _{k\rightarrow \infty }{\bar{\psi }}(x_k) \leqslant 0$.

Moreover, note that ${{\,\mathrm{dom}\,}}(\liminf _{k\rightarrow \infty } \varPsi _k) \subseteq \{x: \psi _{i}(x) \leqslant 0, i \in [m]\}$. Also note that ${{\,\mathrm{dom}\,}}(\liminf _{k\rightarrow \infty } \varPsi _k) \cap S \ne \emptyset $ since both sets contain $x^*$. Then, definition of set S implies ${\bar{\psi }}_0(x) \geqslant {\bar{\psi }}_0(x^*)$ for all $x \in {{\,\mathrm{dom}\,}}(\liminf _{k\rightarrow \infty } \varPsi _k) \cap S$. Hence, $ \liminf _{k\rightarrow \infty } \varPsi _k(x_k) \geqslant \liminf _{k\rightarrow \infty } {\bar{\psi }}_0(x_k) \geqslant {\bar{\psi }}_0(x^*)$. This inequality with (A.3) implies that $\lim _{k\rightarrow \infty } \varPsi _k(x_k) = \psi _0(x^*)$ and $x_k \rightarrow x^*$. Hence, there exists ${\bar{k}}$ such that for all $k > {\bar{k}}$, $x_k \in {{\,\mathrm{int}\,}}(S)$. So for such k we can write the following first-order criterion for convex optimization ($\left[ {\bar{\psi }}_{i}\right] _+^2$ is a convex function):

$$\begin{aligned} {\varvec{0}}\in N_X(x_k) + \partial {\bar{\psi }}_0(x_k) + k\left[ {\bar{\psi }}(x_k)\right] _+ \partial {\bar{\psi }}(x_k) + x_k-x^*. \end{aligned}$$

This implies that $x_k$ is also the optimal solution of

$$\begin{aligned} \min _{x\in X} {\bar{\psi }}_0(x)+k\,\left[ {\bar{\psi }}(x_k)\right] _+^T {\bar{\psi }}(x)+\tfrac{1}{2}\Vert x-x^*\Vert _2^2. \end{aligned}$$

For simplicity, let us denote $v_k=k\,\left[ {\bar{\psi }}(x_k)\right] _+$. Due to the optimality of $x_k$ of solving the above, we have

$$\begin{aligned} {\bar{\psi }}_0(x_k)+v_k^T{\bar{\psi }}(x_k)+\tfrac{1}{2}\Vert x_k-x^*\Vert ^2\leqslant {\bar{\psi }}_0(x)+v_k^T{\bar{\psi }}(x)+\tfrac{1}{2}\Vert x-x^*\Vert _2^2,\quad \forall x\in X.\nonumber \\ \end{aligned}$$

(A.4)

We claim that $\{v_k\}$ is a bounded sequence. Indeed, if this is true, then we can find a convergent subsequence $\{i_k\}$ with $\lim _{k\rightarrow \infty } v_{i_k}=v^*$. Taking $k\rightarrow \infty $ in (A.4), we have

$$\begin{aligned} \limsup _{k\rightarrow \infty }{\bar{\psi }}_0(x_{i_k}) + {v^*}^T{\bar{\psi }}(x^*)\leqslant {\bar{\psi }}_0(x)+ {v^*}^T{\bar{\psi }}(x)+\tfrac{1}{2}\Vert x-x^*\Vert _2^2,\quad \forall x\in X.\nonumber \\ \end{aligned}$$

(A.5)

Placing $x=x^*$, we have ${\bar{\psi }}_0(x^*)\geqslant \limsup {\bar{\psi }}_0(x_{i_k})$, thus $\lim _{k\rightarrow \infty }{\bar{\psi }}_0(x_{i_k})={\bar{\psi }}_0(x^*)$ based on the lower semicontinuity of ${\bar{\psi }}_0$. In view of this discussion, $x^*$ optimizes the right side of (A.5). Thus, applying the first order criterion, we have

$$\begin{aligned} 0\in \partial {\bar{\psi }}_0(x^*)+\sum _{i\in [m]}{v^{(i)}}^*\partial {\bar{\psi }}(x^*) + N_X(x^*). \end{aligned}$$

It remains to apply $\partial {\bar{\psi }}_0(x^*)=\partial \psi _0(x^*)$ and $\partial {\bar{\psi }}_i(x^*)=\partial \psi _i(x^*)$.

In addition, to prove complimentary slackness, it suffices to show when ${\bar{\psi }}_i(x^*)=\psi _i(x^*)<0$, we must have ${v^{(i)}}^*=0$. Since $x_k$ converges to $x^*$ and ${\bar{\psi }}_i$ is continuous, there exists some $\exists k_0>0$, such that ${\bar{\psi }}_i(x_{i_k})<0$ when $k>k_0$. Hence ${v^{(i)}_{i_k}}^*=0$ by its definition. Taking the limit, we have ${v^{(i)}}^*=0$.

It remains to show the missing piece, that $\{v_k\}$ is a bounded sequence. We will prove by contradiction. If this is not true, we may assume $\lim _{k\rightarrow \infty }\Vert v_k\Vert = \infty $, passing to a subsequence if necessary. Moreover, define $y_k=v_k/\Vert v_k\Vert $, since $y_k$ is a unit vector, it has some limit point, let us assume $\lim _{k\rightarrow \infty }y_{j_k}=y^*$ for a subsequence $\{j_k\}$. Dividing both sides of (A.4) by $\Vert v_k\Vert $ and then passing it to the subsequence $\{j_k\}$, we have

$$\begin{aligned}&{\bar{\psi }}_0(x_{j_k})/\Vert v_{j_k}\Vert +y_{j_k}^T{\bar{\psi }}(x_{j_k})+\tfrac{1}{2\Vert v_{j_k}\Vert }\Vert x_{j_k}-x^*\Vert ^2\\&\quad \leqslant {\bar{\psi }}_0(x)/\Vert v_{j_k}\Vert +y_{j_k}^T{\bar{\psi }}(x)+\tfrac{1}{2\Vert v_{j_k}\Vert }\Vert x-x^*\Vert ^2,\quad \forall x\in X. \end{aligned}$$

Taking $k\rightarrow \infty $, we have

$$\begin{aligned} {y^*}^T{\bar{\psi }}(x^*)\leqslant {y^*}^T{\bar{\psi }}(x),\quad \forall x\in X. \end{aligned}$$

Since subsequence $x_{j_k}$ converges to $x^*$ and ${\bar{\psi }}_i$ is continuous, we see that ${\bar{\psi }}_i(x_{j_k}) < 0$ for any $i \notin {\mathcal {A}}(x^*)$ for $k \geqslant k_0$. This implies $y_{j_k}= j_k \left[ {\bar{\psi }}_i(x_{j_k})\right] _+ = 0$ for all $ k \geqslant k_0$ and for all $i \notin {\mathcal {A}}(x^*)$. So we must have ${\varvec{0}}\in N_X(x^*) + \sum _{i \in {\mathcal {A}}(x^*)} y^{*\left( i\right) } \partial \psi _i(x^*)$. Here, we have used the fact that $\nabla W(x^*, x^*) = {\varvec{0}}$, implying that $\partial {\bar{\psi }}_i(x^*) = \partial \psi _{i}(x^*)$ for all $i = 0, \ldots , m$. Let $u \in N_X(x^*)$ and $g_i(x^*) \in \partial \psi _i(x^*), i \in {\mathcal {A}}(x^*)$ be such that

$$\begin{aligned} u + \sum _{i \in {\mathcal {A}}(x^*)} y^{*\left( i\right) }g_i(x^*) = {\varvec{0}}. \end{aligned}$$

Then we can derive a contradiction by using MFCQ (Definition 3). Assume that z satisfies MFCQ (3). Therefore, we have

$$\begin{aligned} 0 = z^Tu + \sum _{i \in {\mathcal {A}}(x^*)}y^{*\left( i\right) } z^Tg_i(x^*)&\leqslant \sum _{i \in {\mathcal {A}}(x^*)}y^{*\left( i\right) } z^Tg_i(x^*) \\&\leqslant \sum _{i \in {\mathcal {A}}(x^*)}y^{*\left( i\right) } \max _{v \in \partial \psi _i(x^*)}z^Tv < 0, \end{aligned}$$

where first inequality follows since $z \in -N_X^*(x^*)$ and $u \in N_X(x^*)$ hence $z^Tu \leqslant 0$, second inequality follows due to the fact that $y^{*\left( i\right) } \geqslant 0$ and $g_i(x^*) \in \partial \psi _i(x^*)$ and last strict inequality follows since (3) and $y^{*\left( i\right) } > 0$ for at least one $i \in {\mathcal {A}}(x^*)$.

Proof of Proposition 4

Let us define ${\bar{\psi }}_i, i = 0, \ldots , m$ as in (A.1) where $x^*$ is a local solution of (1.1) then,

$$\begin{aligned}&\exists \ \varepsilon> 0 \quad \text { s.t.} \quad \psi _0(x) \geqslant \psi _0(x^*) \quad&\text {for all } x \in \{x \in X: \psi _i(x) \leqslant 0, i \in [m],\ \left\Vert x-x^*\right\Vert ^{}_{}< \varepsilon \}\\ \Rightarrow&\exists \ \varepsilon> 0 \quad \text { s.t.} \quad \psi _0(x) \geqslant \psi _0(x^*) \quad&\text {for all } x \in \{x \in X: {\bar{\psi }}_i(x) \leqslant 0, i \in [m],\ \left\Vert x-x^*\right\Vert ^{}_{}< \varepsilon \}\\ \Rightarrow&\exists \ \varepsilon > 0 \quad \text { s.t.} \quad {\bar{\psi }}_0(x) \geqslant \psi _0(x^*) = {\bar{\psi }}_0(x^*) \quad&\text {for all } x \in \{x \in X: {\bar{\psi }}_i(x) \leqslant 0, i \in [m],\ \left\Vert x-x^*\right\Vert ^{}_{} < \varepsilon \}, \end{aligned}$$

where the first implication follows from the fact that ${\bar{\psi }}_{i}(x) \geqslant \psi _{i}(x)$ for all $ i \in [m]$ or equivalently $\{x \in X: {\bar{\psi }}_i(x) \leqslant 0, i \in [m],\ \left\Vert x-x^*\right\Vert ^{}_{}< \varepsilon \} \subseteq \{x \in X: \psi _i(x) \leqslant 0, i \in [m],\ \left\Vert x-x^*\right\Vert ^{}_{} < \varepsilon \}$, and second implication follows from the fact that ${\bar{\psi }}_i(x) \geqslant \psi _{i}(x)$.

The last statement implies that $x^*$ is a local optimal solution for the convex problem (A.2). Hence, it is also a global optimal solution. Based on (3.39) from Assumption 3, we have,

$$\begin{aligned} {\bar{\psi }}_{i}({\bar{x}}) = \psi _{i}({\bar{x}}) + \mu _iW({\bar{x}}, x^*) \leqslant -2\mu _iD_X^2 + \mu _iD_X^2 = -\mu _iD_X^2 < 0. \end{aligned}$$

Hence, by Slater condition, we have that there exists $y^* \geqslant {\varvec{0}}$ such that $(x^*, y^*)$ satisfy first order KKT-condition for the convex problem (A.2). Thus, we have

$$\begin{aligned}&\partial {\bar{\psi }}_0(x^*)+\sum _{i\in [m]}{y^{(i)}}^*\partial {\bar{\psi }}_i(x^*) + N_X(x^*) \ni {\varvec{0}},\\&y^{*\left( i\right) }{\bar{\psi }}_i(x^*) =0, \qquad i \in [m]. \end{aligned}$$

It remains to apply $\partial {\bar{\psi }}_i(x^*)=\partial \psi _i(x^*)$ and ${\bar{\psi }}_i(x^*) = \psi _i(x^*)$ for all $i \in 0, \ldots , m$. Hence, we conclude the proof.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Boob, D., Deng, Q. & Lan, G. Stochastic first-order methods for convex and nonconvex functional constrained optimization. Math. Program. 197, 215–279 (2023). https://doi.org/10.1007/s10107-021-01742-y

Download citation

Received: 05 October 2019
Accepted: 29 October 2021
Published: 21 January 2022
Issue Date: January 2023
DOI: https://doi.org/10.1007/s10107-021-01742-y

Keywords

Mathematics Subject Classification

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Stochastic first-order methods for convex and nonconvex functional constrained optimization

Abstract

Access this article

Similar content being viewed by others

A stochastic primal-dual method for a class of nonconvex constrained optimization

Level constrained first order methods for function constrained optimization

Non-monotone derivative-free algorithm for solving optimization models with linear constraints: extensions for solving nonlinearly constrained models via exact penalty methods

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Appendices

Proof of Proposition 1

Proof of Proposition 4

Rights and permissions

About this article

Cite this article

Keywords

Mathematics Subject Classification

Navigation

Stochastic first-order methods for convex and nonconvex functional constrained optimization

Abstract

Access this article

Similar content being viewed by others

A stochastic primal-dual method for a class of nonconvex constrained optimization

Level constrained first order methods for function constrained optimization

Non-monotone derivative-free algorithm for solving optimization models with linear constraints: extensions for solving nonlinearly constrained models via exact penalty methods

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Appendices

Proof of Proposition 1

Proof of Proposition 4

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Mathematics Subject Classification

Search

Navigation