Stochastic inexact augmented Lagrangian method for nonconvex expectation constrained optimization


Abstract

Many real-world problems not only have complicated nonconvex functional constraints but also involve a large number of data points. This motivates the design of efficient stochastic methods for finite-sum or expectation-constrained problems. In this paper, we design and analyze stochastic inexact augmented Lagrangian methods (Stoc-iALM) to solve problems involving a nonconvex composite (i.e., smooth + nonsmooth) objective and nonconvex smooth functional constraints. We adopt the standard iALM framework and design a subroutine by using the momentum-based variance-reduced proximal stochastic gradient method (PStorm) together with a postprocessing step. Under certain regularity conditions (also assumed in existing works), to reach an \(\varepsilon \)-KKT point in expectation, we establish an oracle complexity result of \(O(\varepsilon ^{-5})\), which is better than the best-known \(O(\varepsilon ^{-6})\) result. Numerical experiments on the fairness-constrained problem and the Neyman–Pearson classification problem with real data demonstrate that our proposed method outperforms an existing method with the previously best-known complexity result.


Data availability statements

The data used for numerical tests are from UCI repository at https://archive.ics.uci.edu/ml/index.php and LIBSVM at https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/.

Notes

  1. In this paper, we use \(\tilde{O}\) to suppress all logarithmic terms of \(\varepsilon \) from the big-O notation.

  2. A function f is \(\rho \)-weakly convex for some \(\rho >0\), if \(f(\cdot ) + \frac{\rho }{2}\Vert \cdot \Vert ^2\) is convex.


Acknowledgements

The authors would like to thank two anonymous reviewers for their constructive comments and suggestions. This work is partly supported by NSF Grants DMS-2053493 and DMS-2208394 and the ONR award N00014-22-1-2573, and also by the Rensselaer-IBM AI Research Collaboration, part of the IBM AI Horizons Network.

Author information


Corresponding author

Correspondence to Yangyang Xu.

Ethics declarations

Conflict of interest

The authors declare that there is no potential conflict of interest.


A regularity condition for ball-constrained fairness and Neyman–Pearson problems

In this section, we show that the regularity condition in (7) holds for the nonconvex fairness problem in (46) and the Neyman–Pearson classification problem in (48) when a ball constraint is imposed and the input data satisfy certain conditions. We consider the two problems together in the following form:

$$\begin{aligned} \min _{{\textbf{x}}, s} f_0({\textbf{x}}), \text{ s.t. } \hat{f}_1({\textbf{x}},s):=f_1({\textbf{x}}) + s = 0, s \ge 0, {\textbf{x}}\in {\mathcal {X}}:=\{{\textbf{x}}\in \mathbb {R}^d: \Vert {\textbf{x}}\Vert \le \lambda \}, \end{aligned}$$
(49)

where \(\lambda > 0\), and \(f_0\) and \(f_1\) are the functions defined in (46) for the fairness problem and in (48) for the Neyman–Pearson classification problem. The slack variable s is used to reformulate (46) and (48) as equality-constrained problems. We did not include the constraint \({\textbf{x}}\in {\mathcal {X}}\) in our experiments, but the generated iterate sequence remained bounded.

Let \({\mathcal {N}}_{\mathcal {X}}({\textbf{x}})\) be the normal cone of \({\mathcal {X}}\) at \({\textbf{x}}\in {\mathcal {X}}\) and \({\mathcal {N}}_+(s)\) the normal cone of \(\mathbb {R}_+\) at \(s \ge 0\). Then the regularity condition in (7) for the problem (49) becomes: there exists \(\nu >0\) such that

$$\begin{aligned} \nu ^2 (f_1({\textbf{x}}) + s)^2 \le \textrm{dist}\left( -(f_1({\textbf{x}}) + s) \left[ \begin{array}{c}\nabla f_1({\textbf{x}})\\ 1 \end{array}\right] ,\ {\mathcal {N}}_{\mathcal {X}}({\textbf{x}}) \otimes {\mathcal {N}}_+(s)\right) ^2, \forall \, {\textbf{x}}\in {\mathcal {X}}, s\ge 0. \end{aligned}$$
(50)

Notice that \({\mathcal {N}}_{\mathcal {X}}({\textbf{x}}) = \{\textbf{0}\}\) if \(\Vert {\textbf{x}}\Vert < \lambda \) and \({\mathcal {N}}_{\mathcal {X}}({\textbf{x}}) = \{\alpha {\textbf{x}}: \alpha \ge 0\}\) if \(\Vert {\textbf{x}}\Vert =\lambda \). Also, \({\mathcal {N}}_+(s) = \{0\}\) if \(s> 0\) and \({\mathcal {N}}_+(s) = \mathbb {R}_-\) if \(s=0\). Hence, for any \(({\textbf{x}}, s) \in {\mathcal {X}}\otimes \mathbb {R}_+\),

$$\begin{aligned} \begin{aligned}&\textrm{dist}\left( -(f_1({\textbf{x}}) + s) \left[ \begin{array}{c}\nabla f_1({\textbf{x}})\\ 1 \end{array}\right] ,\ {\mathcal {N}}_{\mathcal {X}}({\textbf{x}}) \otimes {\mathcal {N}}_+(s)\right) ^2\\&\quad = \left\{ \begin{array}{ll} (f_1({\textbf{x}}) + s)^2 + \textrm{dist}\big (-(f_1({\textbf{x}}) + s)\nabla f_1({\textbf{x}}),\, {\mathcal {N}}_{\mathcal {X}}({\textbf{x}}) \big )^2, &{} \text { if } s >0\\[0.1cm] \min \big (f_1({\textbf{x}}), 0\big )^2 + |f_1({\textbf{x}})|^2\Vert \nabla f_1({\textbf{x}})\Vert ^2, &{} \text { if } s =0, \Vert {\textbf{x}}\Vert < \lambda \\[0.1cm] \min \big (f_1({\textbf{x}}), 0\big )^2 + \min _{\alpha \ge 0} \Vert f_1({\textbf{x}}) \nabla f_1({\textbf{x}}) + \alpha {\textbf{x}}\Vert ^2, &{} \text { if } s =0, \Vert {\textbf{x}}\Vert = \lambda . \end{array} \right. \end{aligned} \end{aligned}$$
(51)
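
To make the case analysis in (51) concrete, the following Python sketch (our own illustration, not code from the paper) evaluates the squared distance on the right-hand side for \({\mathcal {X}}=\{{\textbf{x}}: \Vert {\textbf{x}}\Vert \le \lambda \}\) and \(s\ge 0\). The function name dist_sq_rhs is ours, and the inputs f1_val and g1 are assumed to equal \(f_1({\textbf{x}})\) and \(\nabla f_1({\textbf{x}})\) computed elsewhere.

import numpy as np

def dist_sq_rhs(x, s, f1_val, g1, lam, tol=1e-12):
    """Squared distance on the right-hand side of (51), i.e.,
    dist(-(f_1(x)+s) [grad f_1(x); 1], N_X(x) x N_+(s))^2,
    for X = {x : ||x|| <= lam} and s >= 0 (a sketch, not the authors' code)."""
    r = f1_val + s                        # value of hat f_1(x, s)
    v = -r * np.asarray(g1, dtype=float)  # x-block of -(f_1+s) * [grad f_1; 1]
    w = -r                                # s-block of -(f_1+s) * [grad f_1; 1]

    # s-block: N_+(s) = {0} if s > 0, and N_+(0) = R_-, so the distance is
    # |w| in the first case and max(w, 0) = -min(f_1, 0) in the second.
    ds = abs(w) if s > tol else max(w, 0.0)

    # x-block: N_X(x) = {0} in the interior of the ball, and the ray
    # {alpha * x : alpha >= 0} on the sphere ||x|| = lam.
    x = np.asarray(x, dtype=float)
    if np.linalg.norm(x) < lam - tol:
        dx_sq = float(v @ v)
    else:
        alpha = max(0.0, float(v @ x) / float(x @ x))  # projection onto the ray
        dx_sq = float(np.sum((v - alpha * x) ** 2))

    return ds ** 2 + dx_sq

On the sphere, the distance to the ray \(\{\alpha {\textbf{x}}: \alpha \ge 0\}\) is obtained by projecting onto the ray and clipping \(\alpha \) at zero, which matches the third case of (51).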

From (51), we readily see that when \(s>0\), or when \(s=0\) and \(f_1({\textbf{x}}) \le 0\), it holds that

$$\begin{aligned} (f_1({\textbf{x}}) + s)^2 \le \textrm{dist}\left( -(f_1({\textbf{x}}) + s) \left[ \begin{array}{c}\nabla f_1({\textbf{x}})\\ 1 \end{array}\right] ,\ {\mathcal {N}}_{\mathcal {X}}({\textbf{x}}) \otimes {\mathcal {N}}_+(s)\right) ^2. \end{aligned}$$

Thus we only need to show the regularity condition at \(({\textbf{x}},0)\) with \({\textbf{x}}\in {\mathcal {X}}\) such that \(f_1({\textbf{x}}) > 0\). We make the following assumption about the data involved in (46) and (48).

Assumption 6

The feature vectors in (46) and (48) satisfy:

  1. (i)

    In (46), \(\Vert {\textbf{a}}\Vert = q, \forall \, {\textbf{a}}\in S\) for some \(q>0\) and \(\langle {\textbf{a}}_1, {\textbf{a}}_2 \rangle \ge 0\) for any \({\textbf{a}}_1, {\textbf{a}}_2 \in S\). In addition,

    $$\begin{aligned} \frac{e^{\lambda q} }{(1+e^{\lambda q})^2} \sqrt{\sum _{{\textbf{a}}_1, {\textbf{a}}_2 \in S\backslash S_{\min }} {\textbf{a}}_1^\top {\textbf{a}}_2}> \frac{1-c}{4c} \sqrt{\sum _{{\textbf{a}}_1, {\textbf{a}}_2 \in S_{\min }} {\textbf{a}}_1^\top {\textbf{a}}_2}. \end{aligned}$$
    (52)
  2. (ii)

    In (48), \(\Vert {\textbf{a}}_i^-\Vert = q, \forall \, i\) for some \(q>0\), and \(\langle {\textbf{a}}_i^-, {\textbf{a}}_j^- \rangle \ge 0\) for any \(i, j\).

The above assumption holds if each data point is first normalized and then a constant 1 is appended at the end, which is equivalent to including an intercept term in the model, and if, in addition, for (46) the minority group \(S_{\min }\) is only a small fraction of S.
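
As a hedged illustration of the preprocessing just described, the short Python sketch below normalizes each data point, appends a constant 1 (so every processed vector has norm \(q=\sqrt{2}\) and all pairwise inner products are nonnegative), and numerically checks inequality (52). The function names and the matrix arguments A, A_maj, A_min are ours, not from the paper.

import numpy as np

def preprocess(A):
    """Normalize each row of the data matrix A and append a constant 1, so that
    every processed feature vector has norm q = sqrt(2) and all pairwise inner
    products are nonnegative, as required by Assumption 6."""
    A = np.asarray(A, dtype=float)
    norms = np.linalg.norm(A, axis=1, keepdims=True)
    A_unit = A / np.maximum(norms, 1e-12)
    return np.hstack([A_unit, np.ones((A.shape[0], 1))])

def check_ineq_52(A_maj, A_min, c, lam, q=np.sqrt(2.0)):
    """Numerically check inequality (52) for the fairness problem (46); A_maj
    holds the processed rows of S excluding S_min, and A_min those of S_min."""
    sig = 1.0 / (1.0 + np.exp(-lam * q))
    factor = sig * (1.0 - sig)            # = e^{lam q} / (1 + e^{lam q})^2
    lhs = factor * np.sqrt(np.sum(A_maj @ A_maj.T))
    rhs = (1.0 - c) / (4.0 * c) * np.sqrt(np.sum(A_min @ A_min.T))
    return bool(lhs > rhs)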

Claim A1

Under Assumption 6, let \(f_1\) be given in (46) or (48). Then \(\nu _1:= \min _{\Vert {\textbf{x}}\Vert \le \lambda } \Vert \nabla f_1({\textbf{x}})\Vert >0\).

Proof

We first prove the claim for (48), for which case,

$$\begin{aligned} \nabla f_1({\textbf{x}}) = -\frac{1}{n^-}\sum _{i=1}^{n^-} \phi '(-{\textbf{x}}^\top {\textbf{a}}_i^-) {\textbf{a}}_i^-. \end{aligned}$$
(53)

Suppose \(\nu _1 = \Vert \nabla f_1(\tilde{{\textbf{x}}})\Vert \), i.e., the minimum is reached at \(\tilde{{\textbf{x}}}\). Notice \(\phi '(u) = - \frac{e^u}{(1+e^u)^2} < 0\). Thus

$$\begin{aligned} \nu _1^2 = \frac{1}{(n^-)^2} \sum _{i, j =1}^{n^-} \phi '(-\tilde{{\textbf{x}}}^\top {\textbf{a}}_i^-) \phi '(-\tilde{{\textbf{x}}}^\top {\textbf{a}}_j^-) \langle {\textbf{a}}_i^-, {\textbf{a}}_j^-\rangle \ge \frac{1}{(n^-)^2} \sum _{i=1}^{n^-} \big [\phi '(-\tilde{{\textbf{x}}}^\top {\textbf{a}}_i^-)\big ]^2 \Vert {\textbf{a}}_i^- \Vert ^2, \end{aligned}$$

where the inequality follows from \(\langle {\textbf{a}}_i^-, {\textbf{a}}_j^-\rangle \ge 0 \) for all \(i, j\) and \(\phi '(\cdot )\phi '(\cdot ) > 0\), so that dropping the off-diagonal terms can only decrease the sum. Hence, \(\nu _1>0\) must hold by Assumption 6(ii), since \(\Vert {\textbf{a}}_i^-\Vert = q > 0\) and \(\phi '\) never vanishes.

Next we prove the claim for (46). When \(f_1\) is the function in (46), it holds

$$\begin{aligned} \nabla f_1({\textbf{x}}) =c \sum _{{\textbf{a}}\in S\backslash S_{\min }} \sigma '({\textbf{a}}^\top {\textbf{x}}) {\textbf{a}}- (1-c) \sum _{{\textbf{a}}\in S_{\min }} \sigma '({\textbf{a}}^\top {\textbf{x}}) {\textbf{a}}. \end{aligned}$$

Again, suppose \(\nu _1 = \Vert \nabla f_1(\tilde{{\textbf{x}}})\Vert \). Notice \(\sigma '(u) = \frac{e^u}{(1+e^u)^2}\) is decreasing on \([0, +\infty )\) and increasing on \((-\infty , 0]\). Also, by Assumption 6(i) and \(\Vert \tilde{{\textbf{x}}}\Vert \le \lambda \), we have \(|{\textbf{a}}^\top \tilde{{\textbf{x}}}| \le q\lambda \). Hence, \(\frac{e^{q\lambda }}{(1+e^{q\lambda })^2} \le \sigma '({\textbf{a}}^\top \tilde{{\textbf{x}}}) \le \frac{1}{4}\). Thus

$$\begin{aligned} \left\| c \sum _{{\textbf{a}}\in S\backslash S_{\min }} \sigma '({\textbf{a}}^\top \tilde{{\textbf{x}}}) {\textbf{a}}\right\| ^2&= c^2 \sum _{{\textbf{a}}_1, {\textbf{a}}_2 \in S\backslash S_{\min }} \sigma '({\textbf{a}}_1^\top \tilde{{\textbf{x}}}) \sigma '({\textbf{a}}_2^\top \tilde{{\textbf{x}}}) {\textbf{a}}_1^\top {\textbf{a}}_2 \\&\ge \frac{c^2 e^{2q\lambda }}{(1+e^{q\lambda })^4} \sum _{{\textbf{a}}_1, {\textbf{a}}_2 \in S\backslash S_{\min }} {\textbf{a}}_1^\top {\textbf{a}}_2, \end{aligned}$$

and

$$\begin{aligned} \left\| (1-c) \sum _{{\textbf{a}}\in S_{\min }} \sigma '({\textbf{a}}^\top \tilde{{\textbf{x}}}) {\textbf{a}}\right\| ^2 \le \frac{(1-c)^2}{16} \sum _{{\textbf{a}}_1, {\textbf{a}}_2 \in S_{\min }} {\textbf{a}}_1^\top {\textbf{a}}_2. \end{aligned}$$

By the triangle inequality, it holds that

$$\begin{aligned} \Vert \nabla f_1(\tilde{{\textbf{x}}})\Vert \ge c \left\| \sum _{{\textbf{a}}\in S\backslash S_{\min }} \sigma '({\textbf{a}}^\top \tilde{{\textbf{x}}}) {\textbf{a}}\right\| - (1-c) \left\| \sum _{{\textbf{a}}\in S_{\min }} \sigma '({\textbf{a}}^\top \tilde{{\textbf{x}}}) {\textbf{a}}\right\| . \end{aligned}$$

Therefore, from (52), we obtain \(\nu _1 > 0\) and complete the proof. \(\square \)
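
For a quick numerical sanity check of Claim A1 in the Neyman–Pearson case, one can evaluate the gradient formula (53) and probe \(\nu _1\) by sampling points in the ball. The Python sketch below does this under the assumption that the rows of A_neg are the vectors \({\textbf{a}}_i^-\); the helper names grad_f1_np and estimate_nu1 are ours, and the sampling-based estimate is only a heuristic check, not a certificate.

import numpy as np

def grad_f1_np(x, A_neg):
    """Gradient (53) of the Neyman-Pearson constraint function f_1 in (48):
    grad f_1(x) = -(1/n^-) sum_i phi'(-x^T a_i^-) a_i^-, with
    phi'(u) = -e^u / (1 + e^u)^2 = -sigma(u) (1 - sigma(u))."""
    A_neg = np.asarray(A_neg, dtype=float)
    u = -A_neg @ x                         # entries -x^T a_i^-
    sig = 1.0 / (1.0 + np.exp(-u))
    phi_prime = -sig * (1.0 - sig)         # phi'(-x^T a_i^-), always negative
    return -(phi_prime @ A_neg) / A_neg.shape[0]

def estimate_nu1(A_neg, lam, n_samples=10000, seed=0):
    """Crude Monte Carlo estimate of nu_1 = min_{||x|| <= lam} ||grad f_1(x)||
    by sampling points uniformly in the ball of radius lam."""
    rng = np.random.default_rng(seed)
    d = np.asarray(A_neg).shape[1]
    best = np.inf
    for _ in range(n_samples):
        z = rng.standard_normal(d)
        r = lam * rng.random() ** (1.0 / d)    # radius for a uniform draw in the ball
        x = r * z / np.linalg.norm(z)
        best = min(best, float(np.linalg.norm(grad_f1_np(x, A_neg))))
    return best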

Claim A2

Suppose Assumption 6(ii) holds and in addition, the origin is a feasible point of (48), i.e., \(f_1(\textbf{0}) \le 0\). Then it holds

$$\begin{aligned} \nu _2 := \min _{\alpha \ge 0, {\textbf{x}}}\big \{ \Vert f_1({\textbf{x}})\nabla f_1({\textbf{x}}) + \alpha {\textbf{x}}\Vert : \Vert {\textbf{x}}\Vert =\lambda , f_1({\textbf{x}}) \ge 0 \big \} > 0. \end{aligned}$$
(54)

Proof

Suppose that the minimum in (54) is reached at \(\tilde{{\textbf{x}}}\), i.e., \(\Vert \tilde{{\textbf{x}}}\Vert = \lambda \), \(f_1(\tilde{{\textbf{x}}}) \ge 0\), and

$$\begin{aligned} \nu _2 = \min _{\alpha \ge 0} \Vert f_1(\tilde{{\textbf{x}}})\nabla f_1(\tilde{{\textbf{x}}}) + \alpha \tilde{{\textbf{x}}}\Vert . \end{aligned}$$

If \(\nu _2 = 0\), then we must have \(\tilde{{\textbf{x}}}= -\lambda \frac{\nabla f_1(\tilde{{\textbf{x}}})}{\Vert \nabla f_1(\tilde{{\textbf{x}}})\Vert }\) and the optimal \(\alpha = \frac{ f_1(\tilde{{\textbf{x}}})\Vert \nabla f_1(\tilde{{\textbf{x}}})\Vert }{\lambda }\). By (53), we have

$$\begin{aligned} -\tilde{{\textbf{x}}}^\top {\textbf{a}}_j^- = -\frac{\lambda }{\Vert \nabla f_1(\tilde{{\textbf{x}}})\Vert } \frac{1}{n^-} \sum _{i=1}^{n^-} \phi '(-\tilde{{\textbf{x}}}^\top {\textbf{a}}_i^-) \langle {\textbf{a}}_i^-, {\textbf{a}}_j^-\rangle > 0, \end{aligned}$$

where the inequality follows from \(\phi '(u) < 0, \forall \, u\), and Assumption 6(ii). Now notice that \(\phi (u)\) is a decreasing function. Since \(-\tilde{{\textbf{x}}}^\top {\textbf{a}}_j^- > 0\) for every j, we have \(f_1(\tilde{{\textbf{x}}}) < f_1(\textbf{0}) \le 0\), which contradicts \(f_1(\tilde{{\textbf{x}}}) \ge 0\). Therefore, we must have \(\nu _2>0\), which completes the proof. \(\square \)
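
The inner minimization over \(\alpha \ge 0\) in (54) has a simple closed form (project \(-f_1({\textbf{x}})\nabla f_1({\textbf{x}})\) onto the ray spanned by \({\textbf{x}}\) and clip \(\alpha \) at zero), so \(\nu _2\) can be probed numerically by evaluating the quantity below at sphere points with \(f_1({\textbf{x}}) \ge 0\). This Python sketch and its names are our own illustration; f1_val and g1 are assumed to be \(f_1({\textbf{x}})\) and \(\nabla f_1({\textbf{x}})\) (e.g., computed via (53)).

import numpy as np

def nu2_term(x, f1_val, g1):
    """Closed-form value of min_{alpha >= 0} ||f_1(x) grad f_1(x) + alpha x||
    from (54), for a point x on the sphere ||x|| = lambda."""
    x = np.asarray(x, dtype=float)
    v = f1_val * np.asarray(g1, dtype=float)
    alpha = max(0.0, -float(v @ x) / float(x @ x))   # best alpha >= 0
    return float(np.linalg.norm(v + alpha * x))

Taking the minimum of nu2_term over sampled points with \(\Vert {\textbf{x}}\Vert = \lambda \) and \(f_1({\textbf{x}}) \ge 0\) gives a rough numerical check consistent with Claim A2.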

By Claims A1 and A2, we immediately obtain the theorem below.

Theorem 4

Suppose Assumption 6 holds and in addition, the origin is feasible in (48). Then there must exist a constant \(\nu >0\) such that if \(f_1\) is given in (46),

$$\begin{aligned} \nu |{\hat{f}}_1({\textbf{x}}, s)| \le \textrm{dist}\left( -{\hat{f}}_1({\textbf{x}}, s) \nabla {\hat{f}}_1({\textbf{x}}, s),\ {\mathcal {N}}_{\mathcal {X}}({\textbf{x}}) \otimes {\mathcal {N}}_+(s)\right) , \forall \, {\textbf{x}}\in \textrm{int}({\mathcal {X}}), s\ge 0, \end{aligned}$$
(55)

and if \(f_1\) is given in (48),

$$\begin{aligned} \nu |{\hat{f}}_1({\textbf{x}}, s)| \le \textrm{dist}\left( -{\hat{f}}_1({\textbf{x}}, s) \nabla {\hat{f}}_1({\textbf{x}}, s),\ {\mathcal {N}}_{\mathcal {X}}({\textbf{x}}) \otimes {\mathcal {N}}_+(s)\right) , \forall \, {\textbf{x}}\in {\mathcal {X}}, s\ge 0, \end{aligned}$$
(56)

where \({\hat{f}}_1\) and \({\mathcal {X}}\) are defined in (49).

Proof

From (50) and (51), we obtain (55) with \(\nu = \min \{1, \nu _1\}\), where \(\nu _1\) is defined in Claim A1, and we obtain (56) with \(\nu = \min \{1, \nu _1, \nu _2\}\), where \(\nu _1\) and \(\nu _2\) are defined in Claims A1 and A2. \(\square \)

Remark 9

From Theorem 4, we see that the regularity condition holds for the ball-constrained version of (48) under the stated data preprocessing and origin-feasibility condition. For the ball-constrained version of (46), it holds everywhere except possibly at points on the sphere of the constraint ball. For the tested instances of (46) and (48) without a ball constraint, we checked the regularity condition at the iterates (we only need the condition at the generated iterates). We found that for (46) the introduced slack variable s remained positive throughout the iterations, and for (48) the iterates became feasible after just a few iterations; see Fig. 2. Hence, for the instances tested in our experiments, the regularity condition in (7) holds with \(\nu =1\) at every generated iterate.
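
The iterate-level check described in this remark amounts to verifying, at every recorded iterate \(({\textbf{x}}_k, s_k)\), one of the two simple sufficient conditions derived after (51). A minimal Python sketch of that bookkeeping, with hypothetical input lists f1_vals and slacks holding \(f_1({\textbf{x}}_k)\) and \(s_k\), is:

def remark9_check(f1_vals, slacks):
    """For recorded iterates (x_k, s_k), return one boolean per iterate that is
    True when s_k > 0, or when s_k = 0 and f_1(x_k) <= 0 -- the two cases in
    which condition (50) holds with nu = 1 (see the discussion after (51))."""
    return [(s > 0.0) or (f1 <= 0.0) for f1, s in zip(f1_vals, slacks)]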


Cite this article

Li, Z., Chen, PY., Liu, S. et al. Stochastic inexact augmented Lagrangian method for nonconvex expectation constrained optimization. Comput Optim Appl 87, 117–147 (2024). https://doi.org/10.1007/s10589-023-00521-z
