Stochastic inexact augmented Lagrangian method for nonconvex expectation constrained optimization


Abstract

Many real-world problems not only have complicated nonconvex functional constraints but also involve a large number of data points. This motivates the design of efficient stochastic methods for finite-sum or expectation-constrained problems. In this paper, we design and analyze stochastic inexact augmented Lagrangian methods (Stoc-iALM) to solve problems involving a nonconvex composite (i.e., smooth + nonsmooth) objective and nonconvex smooth functional constraints. We adopt the standard iALM framework and design a subroutine by using the momentum-based variance-reduced proximal stochastic gradient method (PStorm) together with a postprocessing step. Under certain regularity conditions (also assumed in existing works), to reach an \(\varepsilon \)-KKT point in expectation, we establish an oracle complexity result of \(O(\varepsilon ^{-5})\), which is better than the best-known \(O(\varepsilon ^{-6})\) result. Numerical experiments on the fairness-constrained problem and the Neyman–Pearson classification problem with real data demonstrate that our proposed method outperforms an existing method with the previously best-known complexity result.


Data availability statements

The data used for numerical tests are from UCI repository at https://archive.ics.uci.edu/ml/index.php and LIBSVM at https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/.

Notes

  1. In this paper, we use \(\tilde{O}\) to suppress all logarithmic terms of \(\varepsilon \) from the big-O notation.

  2. A function f is \(\rho \)-weakly convex for some \(\rho >0\), if \(f(\cdot ) + \frac{\rho }{2}\Vert \cdot \Vert ^2\) is convex.


Acknowledgements

The authors would like to thank two anonymous reviewers for their constructive comments and suggestions. This work is partly supported by NSF Grants DMS-2053493 and DMS-2208394 and the ONR award N00014-22-1-2573, and also by the Rensselaer-IBM AI Research Collaboration, part of the IBM AI Horizons Network.

Author information


Corresponding author

Correspondence to Yangyang Xu.

Ethics declarations

Conflict of interest

The authors declare that there is no potential conflict of interest.


A regularity condition for ball-constrained fairness and Neyman–Pearson problems

In this section, we show that the regularity condition in (7) holds for the nonconvex fairness problem in (46) and the Neyman–Pearson classification problem in (48) when a ball constraint is imposed and the input data satisfy certain conditions. We consider the two problems together in the following form:

$$\begin{aligned} \min _{{\textbf{x}}, s} f_0({\textbf{x}}), \text{ s.t. } \hat{f}_1({\textbf{x}},s):=f_1({\textbf{x}}) + s = 0, s \ge 0, {\textbf{x}}\in {\mathcal {X}}:=\{{\textbf{x}}\in \mathbb {R}^d: \Vert {\textbf{x}}\Vert \le \lambda \}, \end{aligned}$$
(49)

where \(\lambda > 0\), and \(f_0\) and \(f_1\) are the functions defined in (46) for the fairness problem and in (48) for the Neyman–Pearson classification problem. The slack variable s is used to reformulate (46) and (48) as equality-constrained problems. We did not include the constraint \({\textbf{x}}\in {\mathcal {X}}\) in our experiments, but the generated iterate sequence remained bounded.

Let \({\mathcal {N}}_{\mathcal {X}}({\textbf{x}})\) be the normal cone of \({\mathcal {X}}\) at \({\textbf{x}}\in {\mathcal {X}}\) and \({\mathcal {N}}_+(s)\) the normal cone of \(\mathbb {R}_+\) at \(s \ge 0\). Then the regularity condition in (7) for the problem (49) becomes: there exists \(\nu >0\) such that

$$\begin{aligned} \nu ^2 (f_1({\textbf{x}}) + s)^2 \le \textrm{dist}\left( -(f_1({\textbf{x}}) + s) \left[ \begin{array}{c}\nabla f_1({\textbf{x}})\\ 1 \end{array}\right] ,\ {\mathcal {N}}_{\mathcal {X}}({\textbf{x}}) \otimes {\mathcal {N}}_+(s)\right) ^2, \forall \, {\textbf{x}}\in {\mathcal {X}}, s\ge 0. \end{aligned}$$
(50)

Notice that \({\mathcal {N}}_{\mathcal {X}}({\textbf{x}}) = \{\textbf{0}\}\) if \(\Vert {\textbf{x}}\Vert < \lambda \) and \({\mathcal {N}}_{\mathcal {X}}({\textbf{x}}) = \{\alpha {\textbf{x}}: \alpha \ge 0\}\) if \(\Vert {\textbf{x}}\Vert =\lambda \). Also, \({\mathcal {N}}_+(s) = \{0\}\) if \(s> 0\) and \({\mathcal {N}}_+(s) = \mathbb {R}_-\) if \(s=0\). Hence, for any \(({\textbf{x}}, s) \in {\mathcal {X}}\otimes \mathbb {R}_+\),

$$\begin{aligned} \begin{aligned}&\textrm{dist}\left( -(f_1({\textbf{x}}) + s) \left[ \begin{array}{c}\nabla f_1({\textbf{x}})\\ 1 \end{array}\right] ,\ {\mathcal {N}}_{\mathcal {X}}({\textbf{x}}) \otimes {\mathcal {N}}_+(s)\right) ^2\\&\quad = \left\{ \begin{array}{ll} (f_1({\textbf{x}}) + s)^2 + \textrm{dist}\big (-(f_1({\textbf{x}}) + s)\nabla f_1({\textbf{x}}),\, {\mathcal {N}}_{\mathcal {X}}({\textbf{x}}) \big )^2, &{} \text { if } s >0\\[0.1cm] \min \big (f_1({\textbf{x}}), 0\big )^2 + |f_1({\textbf{x}})|^2\Vert \nabla f_1({\textbf{x}})\Vert ^2, &{} \text { if } s =0, \Vert {\textbf{x}}\Vert < \lambda \\[0.1cm] \min \big (f_1({\textbf{x}}), 0\big )^2 + \min _{\alpha \ge 0} \Vert f_1({\textbf{x}}) \nabla f_1({\textbf{x}}) + \alpha {\textbf{x}}\Vert ^2, &{} \text { if } s =0, \Vert {\textbf{x}}\Vert = \lambda . \end{array} \right. \end{aligned} \end{aligned}$$
(51)
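
To make the case analysis in (51) concrete, the following Python sketch (our own illustration, not code from the paper) evaluates the squared distance on the right-hand side for \({\mathcal {X}}=\{{\textbf{x}}: \Vert {\textbf{x}}\Vert \le \lambda \}\) and \(s\ge 0\). The function name dist_sq_rhs is ours, and the inputs f1_val and g1 are assumed to equal \(f_1({\textbf{x}})\) and \(\nabla f_1({\textbf{x}})\) computed elsewhere.

import numpy as np

def dist_sq_rhs(x, s, f1_val, g1, lam, tol=1e-12):
    """Squared distance on the right-hand side of (51), i.e.,
    dist(-(f_1(x)+s) [grad f_1(x); 1], N_X(x) x N_+(s))^2,
    for X = {x : ||x|| <= lam} and s >= 0 (a sketch, not the authors' code)."""
    r = f1_val + s                        # value of hat f_1(x, s)
    v = -r * np.asarray(g1, dtype=float)  # x-block of -(f_1+s) * [grad f_1; 1]
    w = -r                                # s-block of -(f_1+s) * [grad f_1; 1]

    # s-block: N_+(s) = {0} if s > 0, and N_+(0) = R_-, so the distance is
    # |w| in the first case and max(w, 0) = -min(f_1, 0) in the second.
    ds = abs(w) if s > tol else max(w, 0.0)

    # x-block: N_X(x) = {0} in the interior of the ball, and the ray
    # {alpha * x : alpha >= 0} on the sphere ||x|| = lam.
    x = np.asarray(x, dtype=float)
    if np.linalg.norm(x) < lam - tol:
        dx_sq = float(v @ v)
    else:
        alpha = max(0.0, float(v @ x) / float(x @ x))  # projection onto the ray
        dx_sq = float(np.sum((v - alpha * x) ** 2))

    return ds ** 2 + dx_sq

On the sphere, the distance to the ray \(\{\alpha {\textbf{x}}: \alpha \ge 0\}\) is obtained by projecting onto the ray and clipping \(\alpha \) at zero, which matches the third case of (51).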

From (51), we readily see that when \(s>0\), or when \(s=0\) and \(f_1({\textbf{x}}) \le 0\), it holds that

$$\begin{aligned} (f_1({\textbf{x}}) + s)^2 \le \textrm{dist}\left( -(f_1({\textbf{x}}) + s) \left[ \begin{array}{c}\nabla f_1({\textbf{x}})\\ 1 \end{array}\right] ,\ {\mathcal {N}}_{\mathcal {X}}({\textbf{x}}) \otimes {\mathcal {N}}_+(s)\right) ^2. \end{aligned}$$

Thus we only need to show the regularity condition at \(({\textbf{x}},0)\) with \({\textbf{x}}\in {\mathcal {X}}\) such that \(f_1({\textbf{x}}) > 0\). We make the following assumption about the data involved in (46) and (48).

Assumption 6

The feature vectors in (46) and (48) satisfy:

  1. (i)

    In (46), \(\Vert {\textbf{a}}\Vert = q, \forall \, {\textbf{a}}\in S\) for some \(q>0\) and \(\langle {\textbf{a}}_1, {\textbf{a}}_2 \rangle \ge 0\) for any \({\textbf{a}}_1, {\textbf{a}}_2 \in S\). In addition,

    $$\begin{aligned} \frac{e^{\lambda q} }{(1+e^{\lambda q})^2} \sqrt{\sum _{{\textbf{a}}_1, {\textbf{a}}_2 \in S\backslash S_{\min }} {\textbf{a}}_1^\top {\textbf{a}}_2}> \frac{1-c}{4c} \sqrt{\sum _{{\textbf{a}}_1, {\textbf{a}}_2 \in S_{\min }} {\textbf{a}}_1^\top {\textbf{a}}_2}. \end{aligned}$$
    (52)
  2. (ii)

    In (48), \(\Vert {\textbf{a}}_i^-\Vert = q, \forall \, i\) for some \(q>0\), and \(\langle {\textbf{a}}_i^-, {\textbf{a}}_j^- \rangle \ge 0\) for any \(i, j\).

The above assumption holds if each data point is first normalized and then a constant 1 is appended at the end, which is equivalent to including an intercept term in the model, and if, in addition, for (46) the minority group \(S_{\min }\) is only a small fraction of S.
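
As a hedged illustration of the preprocessing just described, the short Python sketch below normalizes each data point, appends a constant 1 (so every processed vector has norm \(q=\sqrt{2}\) and all pairwise inner products are nonnegative), and numerically checks inequality (52). The function names and the matrix arguments A, A_maj, A_min are ours, not from the paper.

import numpy as np

def preprocess(A):
    """Normalize each row of the data matrix A and append a constant 1, so that
    every processed feature vector has norm q = sqrt(2) and all pairwise inner
    products are nonnegative, as required by Assumption 6."""
    A = np.asarray(A, dtype=float)
    norms = np.linalg.norm(A, axis=1, keepdims=True)
    A_unit = A / np.maximum(norms, 1e-12)
    return np.hstack([A_unit, np.ones((A.shape[0], 1))])

def check_ineq_52(A_maj, A_min, c, lam, q=np.sqrt(2.0)):
    """Numerically check inequality (52) for the fairness problem (46); A_maj
    holds the processed rows of S excluding S_min, and A_min those of S_min."""
    sig = 1.0 / (1.0 + np.exp(-lam * q))
    factor = sig * (1.0 - sig)            # = e^{lam q} / (1 + e^{lam q})^2
    lhs = factor * np.sqrt(np.sum(A_maj @ A_maj.T))
    rhs = (1.0 - c) / (4.0 * c) * np.sqrt(np.sum(A_min @ A_min.T))
    return bool(lhs > rhs)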

Claim A1

Under Assumption 6, let \(f_1\) be given in (46) or (48). Then \(\nu _1:= \min _{\Vert {\textbf{x}}\Vert \le \lambda } \Vert \nabla f_1({\textbf{x}})\Vert >0\).

Proof

We first prove the claim for (48), for which case,

$$\begin{aligned} \nabla f_1({\textbf{x}}) = -\frac{1}{n^-}\sum _{i=1}^{n^-} \phi '(-{\textbf{x}}^\top {\textbf{a}}_i^-) {\textbf{a}}_i^-. \end{aligned}$$
(53)

Suppose \(\nu _1 = \Vert \nabla f_1(\tilde{{\textbf{x}}})\Vert \), i.e., the minimum is reached at \(\tilde{{\textbf{x}}}\). Notice \(\phi '(u) = - \frac{e^u}{(1+e^u)^2} < 0\). Thus

$$\begin{aligned} \nu _1^2 = \frac{1}{(n^-)^2} \sum _{i, j =1}^{n^-} \phi '(-\tilde{{\textbf{x}}}^\top {\textbf{a}}_i^-) \phi '(-\tilde{{\textbf{x}}}^\top {\textbf{a}}_j^-) \langle {\textbf{a}}_i^-, {\textbf{a}}_j^-\rangle \ge \frac{1}{(n^-)^2} \sum _{i=1}^{n^-} \big [\phi '(-\tilde{{\textbf{x}}}^\top {\textbf{a}}_i^-)\big ]^2 \Vert {\textbf{a}}_i^- \Vert ^2, \end{aligned}$$

where the inequality follows from \(\langle {\textbf{a}}_i^-, {\textbf{a}}_j^-\rangle \ge 0 \) for all \(i, j\) and \(\phi '(\cdot )\phi '(\cdot ) > 0\), so that dropping the off-diagonal terms can only decrease the sum. Hence, \(\nu _1>0\) must hold by Assumption 6(ii), since \(\Vert {\textbf{a}}_i^-\Vert = q > 0\) and \(\phi '\) never vanishes.

Next we prove the claim for (46). When \(f_1\) is the function in (46), it holds

$$\begin{aligned} \nabla f_1({\textbf{x}}) =c \sum _{{\textbf{a}}\in S\backslash S_{\min }} \sigma '({\textbf{a}}^\top {\textbf{x}}) {\textbf{a}}- (1-c) \sum _{{\textbf{a}}\in S_{\min }} \sigma '({\textbf{a}}^\top {\textbf{x}}) {\textbf{a}}. \end{aligned}$$

Again, suppose \(\nu _1 = \Vert \nabla f_1(\tilde{{\textbf{x}}})\Vert \). Notice \(\sigma '(u) = \frac{e^u}{(1+e^u)^2}\) is decreasing on \([0, +\infty )\) and increasing on \((-\infty , 0]\). Also, by Assumption 6(i) and \(\Vert \tilde{{\textbf{x}}}\Vert \le \lambda \), we have \(|{\textbf{a}}^\top \tilde{{\textbf{x}}}| \le q\lambda \). Hence, \(\frac{e^{q\lambda }}{(1+e^{q\lambda })^2} \le \sigma '({\textbf{a}}^\top \tilde{{\textbf{x}}}) \le \frac{1}{4}\). Thus

$$\begin{aligned} \left\| c \sum _{{\textbf{a}}\in S\backslash S_{\min }} \sigma '({\textbf{a}}^\top \tilde{{\textbf{x}}}) {\textbf{a}}\right\| ^2&= c^2 \sum _{{\textbf{a}}_1, {\textbf{a}}_2 \in S\backslash S_{\min }} \sigma '({\textbf{a}}_1^\top \tilde{{\textbf{x}}}) \sigma '({\textbf{a}}_2^\top \tilde{{\textbf{x}}}) {\textbf{a}}_1^\top {\textbf{a}}_2 \\&\ge \frac{c^2 e^{2q\lambda }}{(1+e^{q\lambda })^4} \sum _{{\textbf{a}}_1, {\textbf{a}}_2 \in S\backslash S_{\min }} {\textbf{a}}_1^\top {\textbf{a}}_2, \end{aligned}$$

and

$$\begin{aligned} \left\| (1-c) \sum _{{\textbf{a}}\in S_{\min }} \sigma '({\textbf{a}}^\top \tilde{{\textbf{x}}}) {\textbf{a}}\right\| ^2 \le \frac{(1-c)^2}{16} \sum _{{\textbf{a}}_1, {\textbf{a}}_2 \in S_{\min }} {\textbf{a}}_1^\top {\textbf{a}}_2. \end{aligned}$$

By the triangle inequality, it holds that

$$\begin{aligned} \Vert \nabla f_1(\tilde{{\textbf{x}}})\Vert \ge c \left\| \sum _{{\textbf{a}}\in S\backslash S_{\min }} \sigma '({\textbf{a}}^\top \tilde{{\textbf{x}}}) {\textbf{a}}\right\| - (1-c) \left\| \sum _{{\textbf{a}}\in S_{\min }} \sigma '({\textbf{a}}^\top \tilde{{\textbf{x}}}) {\textbf{a}}\right\| . \end{aligned}$$

Therefore, from (52), we obtain \(\nu _1 > 0\) and complete the proof. \(\square \)
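
For a quick numerical sanity check of Claim A1 in the Neyman–Pearson case, one can evaluate the gradient formula (53) and probe \(\nu _1\) by sampling points in the ball. The Python sketch below does this under the assumption that the rows of A_neg are the vectors \({\textbf{a}}_i^-\); the helper names grad_f1_np and estimate_nu1 are ours, and the sampling-based estimate is only a heuristic check, not a certificate.

import numpy as np

def grad_f1_np(x, A_neg):
    """Gradient (53) of the Neyman-Pearson constraint function f_1 in (48):
    grad f_1(x) = -(1/n^-) sum_i phi'(-x^T a_i^-) a_i^-, with
    phi'(u) = -e^u / (1 + e^u)^2 = -sigma(u) (1 - sigma(u))."""
    A_neg = np.asarray(A_neg, dtype=float)
    u = -A_neg @ x                         # entries -x^T a_i^-
    sig = 1.0 / (1.0 + np.exp(-u))
    phi_prime = -sig * (1.0 - sig)         # phi'(-x^T a_i^-), always negative
    return -(phi_prime @ A_neg) / A_neg.shape[0]

def estimate_nu1(A_neg, lam, n_samples=10000, seed=0):
    """Crude Monte Carlo estimate of nu_1 = min_{||x|| <= lam} ||grad f_1(x)||
    by sampling points uniformly in the ball of radius lam."""
    rng = np.random.default_rng(seed)
    d = np.asarray(A_neg).shape[1]
    best = np.inf
    for _ in range(n_samples):
        z = rng.standard_normal(d)
        r = lam * rng.random() ** (1.0 / d)    # radius for a uniform draw in the ball
        x = r * z / np.linalg.norm(z)
        best = min(best, float(np.linalg.norm(grad_f1_np(x, A_neg))))
    return best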

Claim A2

Suppose Assumption 6(ii) holds and in addition, the origin is a feasible point of (48), i.e., \(f_1(\textbf{0}) \le 0\). Then it holds

$$\begin{aligned} \nu _2 := \min _{\alpha \ge 0, {\textbf{x}}}\big \{ \Vert f_1({\textbf{x}})\nabla f_1({\textbf{x}}) + \alpha {\textbf{x}}\Vert : \Vert {\textbf{x}}\Vert =\lambda , f_1({\textbf{x}}) \ge 0 \big \} > 0. \end{aligned}$$
(54)

Proof

Suppose that the minimum in (54) is reached at \(\tilde{{\textbf{x}}}\), i.e., \(\Vert \tilde{{\textbf{x}}}\Vert = \lambda \), \(f_1(\tilde{{\textbf{x}}}) \ge 0\), and

$$\begin{aligned} \nu _2 = \min _{\alpha \ge 0} \Vert f_1(\tilde{{\textbf{x}}})\nabla f_1(\tilde{{\textbf{x}}}) + \alpha \tilde{{\textbf{x}}}\Vert . \end{aligned}$$

If \(\nu _2 = 0\), then we must have \(\tilde{{\textbf{x}}}= -\lambda \frac{\nabla f_1(\tilde{{\textbf{x}}})}{\Vert \nabla f_1(\tilde{{\textbf{x}}})\Vert }\) and the optimal \(\alpha = \frac{ f_1(\tilde{{\textbf{x}}})\Vert \nabla f_1(\tilde{{\textbf{x}}})\Vert }{\lambda }\). By (53), we have

$$\begin{aligned} -\tilde{{\textbf{x}}}^\top {\textbf{a}}_j^- = -\frac{\lambda }{\Vert \nabla f_1(\tilde{{\textbf{x}}})\Vert } \frac{1}{n^-} \sum _{i=1}^{n^-} \phi '(-\tilde{{\textbf{x}}}^\top {\textbf{a}}_i^-) \langle {\textbf{a}}_i^-, {\textbf{a}}_j^-\rangle > 0, \end{aligned}$$

where the inequality follows from \(\phi '(u) < 0, \forall \, u\), and Assumption 6(ii). Now notice that \(\phi (u)\) is a decreasing function. Since \(-\tilde{{\textbf{x}}}^\top {\textbf{a}}_j^- > 0\) for every j, we have \(f_1(\tilde{{\textbf{x}}}) < f_1(\textbf{0}) \le 0\), which contradicts \(f_1(\tilde{{\textbf{x}}}) \ge 0\). Therefore, we must have \(\nu _2>0\), which completes the proof. \(\square \)
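
The inner minimization over \(\alpha \ge 0\) in (54) has a simple closed form (project \(-f_1({\textbf{x}})\nabla f_1({\textbf{x}})\) onto the ray spanned by \({\textbf{x}}\) and clip \(\alpha \) at zero), so \(\nu _2\) can be probed numerically by evaluating the quantity below at sphere points with \(f_1({\textbf{x}}) \ge 0\). This Python sketch and its names are our own illustration; f1_val and g1 are assumed to be \(f_1({\textbf{x}})\) and \(\nabla f_1({\textbf{x}})\) (e.g., computed via (53)).

import numpy as np

def nu2_term(x, f1_val, g1):
    """Closed-form value of min_{alpha >= 0} ||f_1(x) grad f_1(x) + alpha x||
    from (54), for a point x on the sphere ||x|| = lambda."""
    x = np.asarray(x, dtype=float)
    v = f1_val * np.asarray(g1, dtype=float)
    alpha = max(0.0, -float(v @ x) / float(x @ x))   # best alpha >= 0
    return float(np.linalg.norm(v + alpha * x))

Taking the minimum of nu2_term over sampled points with \(\Vert {\textbf{x}}\Vert = \lambda \) and \(f_1({\textbf{x}}) \ge 0\) gives a rough numerical check consistent with Claim A2.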

By Claims A1 and A2, we immediately obtain the theorem below.

Theorem 4

Suppose Assumption 6 holds and in addition, the origin is feasible in (48). Then there must exist a constant \(\nu >0\) such that if \(f_1\) is given in (46),

$$\begin{aligned} \nu |{\hat{f}}_1({\textbf{x}}, s)| \le \textrm{dist}\left( -{\hat{f}}_1({\textbf{x}}, s) \nabla {\hat{f}}_1({\textbf{x}}, s),\ {\mathcal {N}}_{\mathcal {X}}({\textbf{x}}) \otimes {\mathcal {N}}_+(s)\right) , \forall \, {\textbf{x}}\in \textrm{int}({\mathcal {X}}), s\ge 0, \end{aligned}$$
(55)

and if \(f_1\) is given in (48),

$$\begin{aligned} \nu |{\hat{f}}_1({\textbf{x}}, s)| \le \textrm{dist}\left( -{\hat{f}}_1({\textbf{x}}, s) \nabla {\hat{f}}_1({\textbf{x}}, s),\ {\mathcal {N}}_{\mathcal {X}}({\textbf{x}}) \otimes {\mathcal {N}}_+(s)\right) , \forall \, {\textbf{x}}\in {\mathcal {X}}, s\ge 0, \end{aligned}$$
(56)

where \({\hat{f}}_1\) and \({\mathcal {X}}\) are defined in (49).

Proof

From (50) and (51), we obtain (55) with \(\nu = \min \{1, \nu _1\}\), where \(\nu _1\) is defined in Claim A1, and we obtain (56) with \(\nu = \min \{1, \nu _1, \nu _2\}\), where \(\nu _1\) and \(\nu _2\) are defined in Claims A1 and A2. \(\square \)

Remark 9

From Theorem 4, we see that the regularity condition holds for the ball-constrained version of (48) under the stated data preprocessing and origin-feasibility condition. For the ball-constrained version of (46), it holds everywhere except possibly at points on the sphere of the constraint ball. For the tested instances of (46) and (48) without a ball constraint, we checked the regularity condition at the iterates (we only need the condition at the generated iterates). We found that for (46) the introduced slack variable s remained positive throughout the iterations, and for (48) the iterates became feasible after just a few iterations; see Fig. 2. Hence, for the instances tested in our experiments, the regularity condition in (7) holds with \(\nu =1\) at every generated iterate.
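
The iterate-level check described in this remark amounts to verifying, at every recorded iterate \(({\textbf{x}}_k, s_k)\), one of the two simple sufficient conditions derived after (51). A minimal Python sketch of that bookkeeping, with hypothetical input lists f1_vals and slacks holding \(f_1({\textbf{x}}_k)\) and \(s_k\), is:

def remark9_check(f1_vals, slacks):
    """For recorded iterates (x_k, s_k), return one boolean per iterate that is
    True when s_k > 0, or when s_k = 0 and f_1(x_k) <= 0 -- the two cases in
    which condition (50) holds with nu = 1 (see the discussion after (51))."""
    return [(s > 0.0) or (f1 <= 0.0) for f1, s in zip(f1_vals, slacks)]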


Cite this article

Li, Z., Chen, PY., Liu, S. et al. Stochastic inexact augmented Lagrangian method for nonconvex expectation constrained optimization. Comput Optim Appl 87, 117–147 (2024). https://doi.org/10.1007/s10589-023-00521-z
