Noisy Zeroth-Order Optimization for Non-smooth Saddle Point Problems

  • Conference paper
  • Mathematical Optimization Theory and Operations Research (MOTOR 2022)

Abstract

This paper investigates zeroth-order methods for non-smooth convex-concave saddle point problems (with an r-growth condition on the duality gap). We assume that a black-box gradient-free oracle returns an inexact function value corrupted by adversarial noise. We prove that the standard zeroth-order version of the mirror descent method is optimal both in terms of oracle call complexity and in terms of the maximum admissible noise.

The research was supported by the Russian Science Foundation (project No. 21-71-30005).

References

  1. Bartlett, P., Dani, V., Hayes, T., Kakade, S., Rakhlin, A., Tewari, A.: High-probability regret bounds for bandit online linear optimization. In: Proceedings of the 21st Annual Conference on Learning Theory, COLT 2008, pp. 335–342. Omnipress (2008)

  2. Bayandina, A.S., Gasnikov, A.V., Lagunovskaya, A.A.: Gradient-free two-point methods for solving stochastic nonsmooth convex optimization problems with small non-random noises. Autom. Remote. Control. 79(8), 1399–1408 (2018)

  3. Ben-Tal, A., Nemirovski, A.: Lectures on Modern Convex Optimization: Analysis, Algorithms, and Engineering Applications. SIAM (2013)

  4. Beznosikov, A., Sadiev, A., Gasnikov, A.: Gradient-free methods with inexact oracle for convex-concave stochastic saddle-point problem. In: Kochetov, Y., Bykadorov, I., Gruzdeva, T. (eds.) MOTOR 2020. CCIS, vol. 1275, pp. 105–119. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58657-7_11

  5. Bubeck, S.: Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Found. Trends® Mach. Learn. 5(1), 1–122 (2012). https://doi.org/10.1561/2200000024

  6. Chen, P.Y., Zhang, H., Sharma, Y., Yi, J., Hsieh, C.J.: ZOO: zeroth order optimization based black-box attacks to deep neural networks without training substitute models. In: Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pp. 15–26 (2017)

  7. Choromanski, K., Rowland, M., Sindhwani, V., Turner, R., Weller, A.: Structured evolution with compact architectures for scalable policy optimization. In: International Conference on Machine Learning, pp. 970–978. PMLR (2018)

  8. Conn, A., Scheinberg, K., Vicente, L.: Introduction to Derivative-Free Optimization. Society for Industrial and Applied Mathematics (2009). https://doi.org/10.1137/1.9780898718768. http://epubs.siam.org/doi/abs/10.1137/1.9780898718768

  9. Duchi, J.C., Jordan, M.I., Wainwright, M.J., Wibisono, A.: Optimal rates for zero-order convex optimization: the power of two function evaluations. IEEE Trans. Inf. Theor. 61(5), 2788–2806 (2015). https://doi.org/10.1109/TIT.2015.2409256. arXiv:1312.2139

  10. Flaxman, A.D., Kalai, A.T., McMahan, H.B.: Online convex optimization in the bandit setting: gradient descent without a gradient. arXiv preprint arXiv:cs/0408007 (2004)

  11. Gasnikov, A.V., Krymova, E.A., Lagunovskaya, A.A., Usmanova, I.N., Fedorenko, F.A.: Stochastic online optimization. Single-point and multi-point non-linear multi-armed bandits. Convex and strongly-convex case. Autom. Remote Control 78(2), 224–234 (2017). https://doi.org/10.1134/S0005117917020035. arXiv:1509.01679

  12. Gasnikov, A., et al.: The power of first-order smooth optimization for black-box non-smooth problems. arXiv preprint arXiv:2201.12289 (2022)

  13. Gasnikov, A.V., Nesterov, Y.E.: Universal method for stochastic composite optimization problems. Comput. Math. Math. Phys. 58(1), 48–64 (2018)

  14. Goodfellow, I., et al.: Generative adversarial nets. In: Advances in Neural Information Processing Systems 27 (2014)

  15. Gorbunov, É.A., Vorontsova, E.A., Gasnikov, A.V.: On the upper bound for the expectation of the norm of a vector uniformly distributed on the sphere and the phenomenon of concentration of uniform measure on the sphere. Math. Notes 106, 11–19 (2019)

  16. Juditsky, A., Nesterov, Y.: Deterministic and stochastic primal-dual subgradient algorithms for uniformly convex minimization. Stoch. Syst. 4(1), 44–80 (2014). https://doi.org/10.1287/10-SSY010

  17. Mania, H., Guy, A., Recht, B.: Simple random search of static linear policies is competitive for reinforcement learning. In: Proceedings of the 32nd International Conference on Neural Information Processing Systems, pp. 1805–1814 (2018)

  18. Nemirovskij, A.S., Yudin, D.B.: Problem Complexity and Method Efficiency in Optimization (1983)

  19. Nesterov, Y., Spokoiny, V.: Random gradient-free minimization of convex functions. Found. Comput. Math. 17(2), 527–566 (2017). https://doi.org/10.1007/s10208-015-9296-2. First appeared in 2011 as CORE discussion paper 2011/16

  20. Neumann, J.: Zur Theorie der Gesellschaftsspiele. Mathematische Annalen 100(1), 295–320 (1928)

  21. Polyak, B.: Introduction to Optimization. Optimization Software, New York (1987)

  22. Risteski, A., Li, Y.: Algorithms and matching lower bounds for approximately-convex optimization. Adv. Neural. Inf. Process. Syst. 29, 4745–4753 (2016)

  23. Sergeyev, Y.D., Candelieri, A., Kvasov, D.E., Perego, R.: Safe global optimization of expensive noisy black-box functions in the \(\delta \)-Lipschitz framework. Soft. Comput. 24(23), 17715–17735 (2020)

  24. Shamir, O.: An optimal algorithm for bandit and zero-order convex optimization with two-point feedback. J. Mach. Learn. Res. 18, 52:1–52:11 (2017). http://jmlr.org/papers/v18/16-632.html. First appeared in arXiv:1507.08752

  25. Shapiro, A., Dentcheva, D., Ruszczynski, A.: Lectures on Stochastic Programming: Modeling and Theory. SIAM (2021)

  26. Spall, J.C.: Introduction to Stochastic Search and Optimization, 1st edn. Wiley, New York (2003)

  27. Vasin, A., Gasnikov, A., Spokoiny, V.: Stopping rules for accelerated gradient methods with additive noise in gradient (2021)

  28. Vural, N.M., Yu, L., Balasubramanian, K., Volgushev, S., Erdogdu, M.A.: Mirror descent strikes again: optimal stochastic convex optimization under infinite noise variance (2022)

Author information

Corresponding author

Correspondence to Darina Dvinskikh.

A Auxiliary Results

This appendix presents auxiliary results to prove Theorem 1 from Sect. 3.

Lemma 1

Let \(\boldsymbol{e}\) be a random vector uniformly distributed on the Euclidean unit sphere \(\{\boldsymbol{e}:\Vert \boldsymbol{e}\Vert _2=1\}\). Then, for all \(r \in \mathbb R^d\), it holds that

$$\begin{aligned} \mathbb E_{ \boldsymbol{e}}\left[ \left| \langle \boldsymbol{e}, r \rangle \right| \right] \le {\Vert r\Vert _2}/{\sqrt{d}}. \end{aligned}$$
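
A quick numerical sanity check of this bound (an illustrative sketch, not part of the paper): sample \(\boldsymbol{e}\) uniformly on the sphere by normalizing Gaussian vectors and compare the empirical mean of \(\left| \langle \boldsymbol{e}, r \rangle \right| \) with \({\Vert r\Vert _2}/{\sqrt{d}}\).

```python
import numpy as np

rng = np.random.default_rng(0)
d = 100
r = rng.normal(size=d)                     # arbitrary test vector r in R^d

# Uniform samples on the Euclidean unit sphere: normalize Gaussian vectors.
e = rng.normal(size=(100_000, d))
e /= np.linalg.norm(e, axis=1, keepdims=True)

lhs = np.abs(e @ r).mean()                 # Monte Carlo estimate of E|<e, r>|
rhs = np.linalg.norm(r) / np.sqrt(d)       # right-hand side of Lemma 1
print(lhs <= rhs, lhs, rhs)                # empirically lhs is about 0.8 * rhs
```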

Lemma 2

Let f(z) be \(M_2\)-Lipschitz continuous. Then for \(f^\tau (z)\) from (4), it holds

$$ \sup _{z\in \mathcal {Z}}|f^\tau (z) - f(z)|\le \tau M_2. $$
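
The bound follows directly from Lipschitz continuity. A one-line proof sketch, assuming (4) is the usual ball smoothing \(f^\tau (z)=\mathbb E_{\tilde{\boldsymbol{e}}}\left[ f(z+\tau \tilde{\boldsymbol{e}})\right] \) with \(\tilde{\boldsymbol{e}}\) drawn from the Euclidean unit ball (the precise definition is given in the main text):

$$ |f^\tau (z) - f(z)| = \left| \mathbb E_{\tilde{\boldsymbol{e}}}\left[ f(z+\tau \tilde{\boldsymbol{e}}) - f(z)\right] \right| \le \mathbb E_{\tilde{\boldsymbol{e}}}\left[ M_2 \tau \Vert \tilde{\boldsymbol{e}}\Vert _2\right] \le \tau M_2, $$

and taking the supremum over \(z\in \mathcal {Z}\) gives the claim.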

Lemma 3

The function \(f^\tau (z)\) is differentiable, with gradient

$$ \nabla f^\tau (z) = \mathbb E_{{\boldsymbol{e}}}\left[ \frac{d}{\tau } f(z+\tau {\boldsymbol{e}})\boldsymbol{e}\right] . $$
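
To make the identity concrete, here is a small Monte Carlo check (an illustrative sketch; the quadratic test function and all parameters are placeholders, not taken from the paper). For \(f(z)=\frac{1}{2}\Vert z\Vert _2^2\) one has \(\nabla f^\tau (z)=z\), so averaging \(\frac{d}{\tau } f(z+\tau \boldsymbol{e})\boldsymbol{e}\) over random unit vectors should recover \(z\).

```python
import numpy as np

rng = np.random.default_rng(1)
d, tau, n = 50, 1e-2, 200_000

def f(x):
    """Placeholder smooth test function f(x) = 0.5 * ||x||_2^2 (gradient is x)."""
    return 0.5 * np.sum(x ** 2, axis=-1)

z = rng.normal(size=d)

# Random unit vectors on the Euclidean sphere.
e = rng.normal(size=(n, d))
e /= np.linalg.norm(e, axis=1, keepdims=True)

# Monte Carlo average of (d / tau) * f(z + tau * e) * e.  Subtracting the
# constant f(z) leaves the expectation unchanged (E[e] = 0) and only reduces
# the variance of this numerical check.
coef = (d / tau) * (f(z + tau * e) - f(z))
grad_est = (coef[:, None] * e).mean(axis=0)

print(np.linalg.norm(grad_est - z) / np.linalg.norm(z))   # small relative error
```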

Lemma 4

For \(g(z,\xi ,\boldsymbol{e})\) from (3) and \(f^\tau (z)\) from (4), the following holds

  1. under Assumption 2

     $$\begin{aligned} \mathbb E_{\xi , \boldsymbol{e}}\left[ \langle g(z,\xi ,\boldsymbol{e}),r\rangle \right] \ge \langle \nabla f^\tau (z),r\rangle - \frac{d\varDelta }{\tau } \mathbb E_{\boldsymbol{e}} \left[ \left| \langle \boldsymbol{e}, r \rangle \right| \right] , \end{aligned}$$

  2. under Assumption 3

     $$\begin{aligned} \mathbb E_{\xi , \boldsymbol{e}}\left[ \langle g(z,\xi ,\boldsymbol{e}),r\rangle \right] \ge \langle \nabla f^\tau (z),r\rangle - d M_{2,\delta } \mathbb E_{\boldsymbol{e}} \left[ \left| \langle \boldsymbol{e}, r \rangle \right| \right] . \end{aligned}$$
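
Combining the first bound with Lemma 1 makes the bias caused by the adversarial noise explicit (a worked step consistent with the statements above, not a quotation from the paper):

$$ \mathbb E_{\xi , \boldsymbol{e}}\left[ \langle g(z,\xi ,\boldsymbol{e}),r\rangle \right] \ge \langle \nabla f^\tau (z),r\rangle - \frac{d\varDelta }{\tau }\cdot \frac{\Vert r\Vert _2}{\sqrt{d}} = \langle \nabla f^\tau (z),r\rangle - \frac{\sqrt{d}\varDelta }{\tau }\Vert r\Vert _2, $$

so the noise level \(\varDelta \) enters the analysis through the term \(\sqrt{d}\varDelta /\tau \).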

Lemma 5

[24, Lemma 9]. For any function \(f(\boldsymbol{e})\) that is \(M_2\)-Lipschitz w.r.t. the \(\ell _2\)-norm, if \(\boldsymbol{e}\) is uniformly distributed on the Euclidean unit sphere, then

$$ \sqrt{\mathbb E\left[ (f(\boldsymbol{e}) - \mathbb Ef(\boldsymbol{e}))^4 \right] } \le cM_2^2/d $$

for some numerical constant c.

Lemma 6

For \(g(z,\xi ,\boldsymbol{e})\) from (3), the following holds under Assumption 1

  1. and Assumption 2

     $$\begin{aligned} \mathbb E_{\xi ,\boldsymbol{e}}\left[ \Vert g(z,\xi ,\boldsymbol{e})\Vert ^2_q\right] \le c a^2_q dM_2^2 + {d^2 a_q^2\varDelta ^2}/{\tau ^2}, \end{aligned}$$

  2. and Assumption 3

     $$\begin{aligned} \mathbb E_{\xi ,\boldsymbol{e}}\left[ \Vert g(z,\xi ,\boldsymbol{e})\Vert ^2_q\right] \le c a^2_q d (M_2^2+M_{2,\delta }^2), \end{aligned}$$

where c is some numerical constant and \(\sqrt{\mathbb E\left[ \Vert \boldsymbol{e}\Vert _q^4\right] } \le a_q^2\).
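
The second-moment bounds above are exactly what is needed to run mirror descent with a gradient-free estimator. Below is a minimal illustrative sketch, not the authors' scheme from (3) and (4): a Euclidean-prox mirror-descent step, i.e. projected gradient descent-ascent with iterate averaging, on a toy bilinear saddle point problem \(f(x,y)=\langle x,y\rangle \) over two unit balls, with a two-point randomized estimator and a small bounded oracle perturbation. The objective, oracle noise, step size, and estimator form are placeholders chosen only to show the mechanics.

```python
import numpy as np

rng = np.random.default_rng(3)
dx = dy = 5
d = dx + dy

def f(z):
    """Placeholder convex-concave objective f(x, y) = <x, y> on two unit balls."""
    return z[:dx] @ z[dx:]

def oracle(z, delta=1e-5):
    """Inexact zeroth-order oracle: exact value plus a small bounded perturbation."""
    return f(z) + delta * np.cos(100.0 * z.sum())

def project(v):
    """Euclidean projection onto the unit ball."""
    nrm = np.linalg.norm(v)
    return v if nrm <= 1.0 else v / nrm

tau, iters = 1e-2, 20_000
gamma = 1.0 / np.sqrt(d * iters)           # illustrative step-size choice
z = np.concatenate([project(rng.normal(size=dx)), project(rng.normal(size=dy))])
z_avg = np.zeros(d)

for _ in range(iters):
    e = rng.normal(size=d)
    e /= np.linalg.norm(e)
    # Two-point randomized estimate of (grad_x f, grad_y f) at z.
    g = (d / (2.0 * tau)) * (oracle(z + tau * e) - oracle(z - tau * e)) * e
    # Descend in x, ascend in y; the Euclidean prox step is a projection.
    z = np.concatenate([project(z[:dx] - gamma * g[:dx]),
                        project(z[dx:] + gamma * g[dx:])])
    z_avg += z / iters

# For f(x, y) = <x, y> on unit balls, the duality gap of a point (x, y) equals
# ||x||_2 + ||y||_2; for the averaged iterate it shrinks as iters grows.
print(np.linalg.norm(z_avg[:dx]) + np.linalg.norm(z_avg[dx:]))
```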

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Dvinskikh, D., Tominin, V., Tominin, I., Gasnikov, A. (2022). Noisy Zeroth-Order Optimization for Non-smooth Saddle Point Problems. In: Pardalos, P., Khachay, M., Mazalov, V. (eds) Mathematical Optimization Theory and Operations Research. MOTOR 2022. Lecture Notes in Computer Science, vol 13367. Springer, Cham. https://doi.org/10.1007/978-3-031-09607-5_2

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-09607-5_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-09606-8

  • Online ISBN: 978-3-031-09607-5

  • eBook Packages: Computer Science, Computer Science (R0)
