
Zeroth-order optimization with orthogonal random directions

  • Full Length Paper
  • Series A

Mathematical Programming

Abstract

We propose and analyze a randomized zeroth-order optimization method based on approximating the exact gradient by finite differences computed in a set of orthogonal random directions that changes with each iteration. A number of previously proposed methods are recovered as special cases, including spherical smoothing, coordinate descent, and discretized gradient descent. Our main contribution is proving convergence guarantees, as well as convergence rates, under different parameter choices and assumptions. In particular, we consider convex objectives, but also possibly non-convex objectives satisfying the Polyak-Łojasiewicz (PL) condition. Theoretical results are complemented and illustrated by numerical experiments.
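
To make the setup concrete, here is a minimal sketch of a zeroth-order step of this type: at each iteration, finite differences along \(\ell \) orthogonal random directions (scaled so that each direction has squared norm \(d/\ell \)) produce a surrogate gradient, which is then used in a gradient-type update. This is an illustration only, not the paper's exact Algorithm 1; the step size `alpha`, the finite-difference parameter `h`, and the toy objective are arbitrary choices.

```python
import numpy as np

def orthogonal_directions(d, ell, rng):
    # First ell columns of a random orthogonal matrix (QR of a Gaussian block),
    # rescaled so each column has squared norm d / ell.
    G = rng.standard_normal((d, ell))
    Q, _ = np.linalg.qr(G)
    return np.sqrt(d / ell) * Q

def zeroth_order_step(f, x, alpha, h, ell, rng):
    # Forward finite differences along ell orthogonal random directions
    # approximate the directional derivatives <grad f(x), p_j>; the surrogate
    # gradient P @ g then replaces grad f(x) in a gradient-type update.
    P = orthogonal_directions(x.size, ell, rng)
    fx = f(x)
    g = np.array([(f(x + h * P[:, j]) - fx) / h for j in range(ell)])
    return x - alpha * (P @ g)

# Toy run on a strongly convex quadratic (all parameters are illustrative).
rng = np.random.default_rng(0)
f = lambda x: 0.5 * float(x @ x)
x = rng.standard_normal(50)
for _ in range(500):
    x = zeroth_order_step(f, x, alpha=0.1, h=1e-6, ell=5, rng=rng)
print(f(x))  # close to the minimum value 0
```

Choosing \(\ell = d\) amounts to a full finite-difference gradient step in a randomly rotated coordinate frame, while a smaller \(\ell \) requires only \(\ell +1\) function evaluations per iteration.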


References

  1. Agarwal, A., Dekel, O., Xiao, L.: Optimal algorithms for online convex optimization with multi-point bandit feedback. In: Proceedings of the Twenty Third Annual Conference on Computational Learning Theory, Citeseer, pp. 28–40 (2010)

  2. Attouch, H., Bolte, J., Svaiter, B.F.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods. Math. Program. 137, 91–129 (2013)

  3. Avron, H., Maymounkov, P., Toledo, S.: Blendenpik: supercharging Lapack’s least-squares solver. SIAM J. Sci. Comput. 32, 1217–1236 (2010)

  4. Baillon, J.-B., Haddad, G.: Quelques propriétés des opérateurs angle-bornés et n-cycliquement monotones. Israel J. Math. 26, 137–150 (1977)

  5. Beck, A., Tetruashvili, L.: On the convergence of block coordinate descent type methods. SIAM J. Optim. 23, 2037–2060 (2013)

  6. Berahas, A.S., Byrd, R.H., Nocedal, J.: Derivative-free optimization of noisy functions via quasi-Newton methods. SIAM J. Optim. 29, 965–993 (2019)

  7. Berahas, A.S., Cao, L., Choromanski, K., Scheinberg, K.: A theoretical and empirical comparison of gradient approximations in derivative-free optimization, arXiv preprint arXiv:1905.01332, (2019)

  8. Berahas, A.S., Cao, L., Scheinberg, K.: Global convergence rate analysis of a generic line search algorithm with noise, arXiv preprint arXiv:1910.04055, (2019)

  9. Bertsekas, D.P., Tsitsiklis, J.N.: Gradient convergence in gradient methods with errors. SIAM J. Optim. 10, 627–642 (2000)

  10. Bollapragada R., Wild, S.M.: Adaptive sampling quasi-Newton methods for derivative-free stochastic optimization, arXiv preprint arXiv:1910.13516, (2019)

  11. Cartis, C., Roberts, L.: Scalable subspace methods for derivative-free nonlinear least-squares optimization, arXiv preprint arXiv:2102.12016, (2021)

  12. Cartis, C., Scheinberg, K.: Global convergence rate analysis of unconstrained optimization methods based on probabilistic models. Math. Program. 169, 337–375 (2018)

  13. Cauchy, A.: Méthode générale pour la résolution des systèmes d’équations simultanées. Comp. Rend. Sci. Paris 25, 536–538 (1847)

  14. Choromanski, K., Rowland, M., Sindhwani, V., Turner, R., Weller, A.: Structured evolution with compact architectures for scalable policy optimization. In: Dy, J., Krause, A. (Eds.), Proceedings of the 35th International Conference on Machine Learning, vol. 80 of Proceedings of Machine Learning Research, Stockholmsmässan, Stockholm Sweden, PMLR, pp. 970–978, (10–15 Jul 2018)

  15. Chung, K.L.: On a stochastic approximation method. Ann. Math. Stat. 25, 463–483 (1954)

  16. Combettes, P.L., Pesquet, J.-C.: Stochastic quasi-Fejér block-coordinate fixed point iterations with random sweeping. SIAM J. Optim. 25, 1221–1248 (2015)

  17. Conn, A.R., Scheinberg, K., Vicente, L.N.: Introduction to derivative-free optimization, vol. 8 of MPS/SIAM Series on Optimization, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA; Mathematical Programming Society (MPS), Philadelphia, PA (2009)

  18. Dodangeh, M., Vicente, L.N., Zhang, Z.: On the optimal order of worst case complexity of direct search. Optim. Lett. 10, 699–708 (2016)

  19. Drineas, P., Mahoney, M.W., Muthukrishnan, S., Sarlós, T.: Faster least squares approximation. Numer. Math. 117, 219–249 (2011)

  20. Duchi, J.C., Jordan, M.I., Wainwright, M.J., Wibisono, A.: Optimal rates for zero-order convex optimization: The power of two function evaluations. IEEE Trans. Inf. Theory 61, 2788–2806 (2015)

  21. Ehrhardt, M.J., Riis, E.S., Ringholm, T., Schönlieb, C.-B.: A geometric integration approach to smooth optimisation: Foundations of the discrete gradient method, arXiv preprint arXiv:1805.06444, (2018)

  22. Finkel, D.E., Kelley, C.T.: Convergence analysis of sampling methods for perturbed Lipschitz functions. Pac. J. Optim. 5, 339–349 (2009)

  23. Flaxman, A.D., Kalai, A.T., McMahan, H.B.: Online convex optimization in the bandit setting: gradient descent without a gradient. In: Proceedings of the Sixteenth Annual ACM-SIAM Symposium on Discrete Algorithms, ACM, New York, pp. 385–394 (2005)

  24. Ghadimi, S., Lan, G.: Stochastic first-and zeroth-order methods for nonconvex stochastic programming. SIAM J. Optim. 23, 2341–2368 (2013)

  25. Gilmore, P., Kelley, C.T.: An implicit filtering algorithm for optimization of functions with many local minima. SIAM J. Optim. 5, 269–285 (1995)

  26. Gilmore, P., Kelley, C.T., Miller, C.T., Williams, G.A.: Implicit filtering and optimal design problems. In: Optimal design and control (Blacksburg, VA, 1994), vol. 19 of Progr. Systems Control Theory, Birkhäuser Boston, Boston, MA, pp. 159–176 (1995)

  27. Gratton, S., Royer, C.W., Vicente, L.N., Zhang, Z.: Direct search based on probabilistic descent. SIAM J. Optim. 25, 1515–1541 (2015)

  28. Grimm, V., McLachlan, R.I., McLaren, D.I., Quispel, G., Schönlieb, C.: Discrete gradient methods for solving variational image regularisation models. J. Phys. A: Math. Theor. 50, 295201 (2017)

  29. Gupal, A.: A method for the minimization of almost-differentiable functions. Cybern. 13, 115–117 (1977)

  30. Gupal, A., Norkin, V.: Algorithm for the minimization of discontinuous functions. Cybern. 13, 220–223 (1977)

  31. Hanzely, F., Doikov, N., Nesterov, Y., Richtarik, P.: Stochastic subspace cubic Newton method. In: Proceedings of the 37th International Conference on Machine Learning, H. D. III and A. Singh, eds., vol. 119 of Proceedings of Machine Learning Research, PMLR, pp. 4027–4038 (13–18 Jul 2020)

  32. Hanzely, F., Kovalev, D., Richtarik, P.: Variance reduced coordinate descent with acceleration: New method with a surprising application to finite-sum problems. In: Proceedings of the 37th International Conference on Machine Learning, H. D. III and A. Singh, eds., vol. 119 of Proceedings of Machine Learning Research, PMLR, pp. 4039–4048 (13–18 Jul 2020)

  33. Karimi, H., Nutini, J., Schmidt, M.: Linear convergence of gradient and proximal-gradient methods under the Polyak-Łojasiewicz condition. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Springer, pp. 795–811 (2016)

  34. Kiefer, J., Wolfowitz, J.: Stochastic estimation of the maximum of a regression function. Ann. Math. Stat. 23, 462–466 (1952)

  35. Knopp, K.: Theory and Application of Infinite Series. Translated from the second German edition by R. C. Young. Blackie, London (1928)

  36. Konecný, J., Richtárik, P.: Simple complexity analysis of simplified direct search, arXiv preprint arXiv:1411.5873, (2014)

  37. Kozak, D., Becker, S., Doostan, A., Tenorio, L.: Stochastic subspace descent, arXiv preprint arXiv:1904.01145, (2019)

  38. Kozak, D., Becker, S., Doostan, A., Tenorio, L.: A stochastic subspace approach to gradient-free optimization in high dimensions. Comput. Optim. Appl. 79, 339–368 (2021)

  39. Kushner, H.J., Clark, D.S.: Stochastic approximation methods for constrained and unconstrained systems. Applied Mathematical Sciences, vol. 26. Springer-Verlag, New York-Berlin (1978)

  40. Mania, H., Guy, A., Recht, B.: Simple random search of static linear policies is competitive for reinforcement learning. In: Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., Garnett, R. (eds.) Advances in Neural Information Processing Systems 31, pp. 1800–1809. Curran Associates, Inc., New York (2018)

  41. Martinsson, P.-G., Tropp, J.A.: Randomized numerical linear algebra: foundations and algorithms. Acta Numer. 29, 403–572 (2020)

  42. Matyas, J.: Random optimization. Autom. Remote. Control. 26, 246–253 (1965)

  43. Mezzadri, F.: How to generate random matrices from the classical compact groups. Notices Amer. Math. Soc. 54, 592–604 (2007)

  44. Nesterov, Y.: Random gradient-free minimization of convex functions. LIDAM Discussion Papers CORE 2011001, Université catholique de Louvain, Center for Operations Research and Econometrics (CORE), (Jan. 2011)

  45. Nesterov, Y., Spokoiny, V.: Random gradient-free minimization of convex functions. Found. Comput. Math. 17, 527–566 (2017). (First appeared as CORE discussion paper 2011)

  46. Opial, Z.: Weak convergence of the sequence of successive approximations for nonexpansive mappings. Bull. Amer. Math. Soc. 73, 591–597 (1967)

  47. Paquette, C., Scheinberg, K.: A stochastic line search method with expected complexity analysis. SIAM J. Optim. 30, 349–376 (2020)

  48. Polyak, B.T.: Introduction to optimization, vol. 1. Optimization Software Inc., New York (1987)

  49. Rastrigin, L.A.: About convergence of random search method in extremal control of multi-parameter systems. Avtomat. i Telemekh 24, 1467–1473 (1963)

  50. Riis, E.S., Ehrhardt, M.J., Quispel, G., Schönlieb, C.-B.: A geometric integration approach to nonsmooth, nonconvex optimisation, arXiv preprint arXiv:1807.07554, (2018)

  51. Robbins, H., Monro, S.: A stochastic approximation method. Ann. Math. Stat. 22, 400–407 (1951)

  52. Robbins, H., Siegmund, D.: A convergence theorem for non negative almost supermartingales and some applications. In: Optimizing methods in statistics (Proc. Sympos., Ohio State Univ., Columbus, Ohio, 1971), pp. 233–257 (1971)

  53. Salimans, T., Ho, J., Chen, X., Sidor, S., Sutskever, I.: Evolution strategies as a scalable alternative to reinforcement learning, arXiv preprint arXiv:1703.03864, (2017)

  54. Salzo, S., Villa, S.: Parallel random block-coordinate forward-backward algorithm: A unified convergence analysis, arXiv preprint arXiv:1906.07392, (2019)

  55. Schmidt, M., Roux, N.L., Bach, F.: Convergence rates of inexact proximal-gradient methods for convex optimization, arXiv preprint arXiv:1109.2415, (2011)

  56. Spall, J.C.: Multivariate stochastic approximation using a simultaneous perturbation gradient approximation. IEEE Trans. Automat. Control 37, 332–341 (1992)

  57. Tappenden, R., Takáč, M., Richtárik, P.: On the complexity of parallel coordinate descent. Optim. Methods Softw. 33, 372–395 (2018)

  58. Williams, R.J.: Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach. Learn. 8, 229–256 (1992)

  59. Woodruff, D.P.: Sketching as a tool for numerical linear algebra. Found. Trends Theor. Comput. Sci. 10, iv+157 (2014)

  60. Wright, S.J.: Coordinate descent algorithms. Math. Program. 151, 3–34 (2015)

  61. Zavriev, S.K.: On the global optimization properties of finite-difference local descent algorithms. J. Global Optim. 3, 67–78 (1993)

Acknowledgements

L.R. acknowledges support from the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216, and the Italian Institute of Technology. L.R. also acknowledges the financial support of the European Research Council (grant SLING 819789), the AFOSR projects FA9550- 18-1-7009, FA9550-17-1-0390 and BAA-AFRL-AFOSR-2016-0007 (European Office of Aerospace Research and Development), and the EU H2020-MSCA-RISE project NoMADS - DLV-777826. This work has been supported by the ITN-ETN project TraDE-OPT funded by the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 861137. LT acknowledges funding by National Science Foundation grant DMS-1723005.

Author information

Corresponding author

Correspondence to David Kozak.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (pdf 374 KB)

Auxiliary lemmas

Here we collect the main auxiliary results used in the convergence analysis of Algorithm 1.

Proof of Lemma 1

Let \(k\in \mathbb {N}\) and denote by \(p^{(j)}\) the \(j\)-th column of \(P_k\). We want to estimate

$$\begin{aligned} \left\Vert \nabla _{\left( P_k, h_k\right) } f(x)-P_k^\top \nabla f(x)\right\Vert ^2 = \sum _{j=1}^{\ell } \left( \left[ \nabla _{\left( P_k, h_k\right) } f(x)\right] _j-\langle \nabla f(x), p^{(j)} \rangle \right) ^2. \end{aligned}$$
(9.1)

To get an upper bound for the term in parentheses, we use H.1. As \(\nabla f\) is \(\lambda \)-Lipschitz, the Descent Lemma (Lemma 7) holds: for every \(x\in \mathbb {R}^d\),

$$\begin{aligned} \left| f(x+h_k p^{(j)}) - f(x) - h_k\langle \nabla f(x), p^{(j)} \rangle \right| \le {\frac{\lambda h_k^2}{2}}\left\Vert p^{(j)}\right\Vert ^2. \end{aligned}$$
(9.2)

Then, rearranging,

$$\begin{aligned} \begin{aligned} \left| \left[ \nabla _{\left( P_k, h_k\right) } f(x)\right] _j-\langle \nabla f(x),p^{(j)} \rangle \right|&\le {\frac{\lambda h_k}{2}}\left\Vert p^{(j)}\right\Vert ^2 \quad \text {a.s.} \end{aligned} \end{aligned}$$
(9.3)

Note that P.1 implies \(\Vert p^{(j)}\Vert ^2\overset{a.s.}{=}d/\ell \) and so (9.1) and (9.3) yield

$$\begin{aligned} \left\Vert \nabla _{\left( P_k, h_k\right) } f(x)-P_k^\top \nabla f(x)\right\Vert \le \sqrt{\sum _{j=1}^{\ell } \left( {\frac{\lambda h_k d}{2\ell }}\right) ^2} = {\frac{\lambda h_k d}{2\sqrt{\ell }}} \quad \text { a.s.} \end{aligned}$$

\(\square \)
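
As an informal sanity check (not part of the proof), the bound \(\left\Vert \nabla _{(P_k,h_k)} f(x)-P_k^\top \nabla f(x)\right\Vert \le \lambda h_k d/(2\sqrt{\ell })\) can be verified numerically on a quadratic, for which the gradient is Lipschitz with constant equal to the spectral norm of the Hessian. The dimensions, matrix, and step size below are arbitrary illustrative choices.

```python
import numpy as np

# Numerical check of the Lemma 1 bound on f(x) = 0.5 x^T A x, whose gradient
# A x is lam-Lipschitz with lam = ||A||_2 (spectral norm of A).
rng = np.random.default_rng(1)
d, ell, h = 40, 8, 1e-3
M = rng.standard_normal((d, d))
A = M.T @ M / d                                  # symmetric positive semidefinite
lam = np.linalg.norm(A, 2)
f = lambda x: 0.5 * float(x @ A @ x)
grad = lambda x: A @ x

Q, _ = np.linalg.qr(rng.standard_normal((d, ell)))
P = np.sqrt(d / ell) * Q                         # columns with squared norm d / ell (P.1)
x = rng.standard_normal(d)
fd = np.array([(f(x + h * P[:, j]) - f(x)) / h for j in range(ell)])
err = np.linalg.norm(fd - P.T @ grad(x))
print(err, lam * h * d / (2 * np.sqrt(ell)))     # err should not exceed the bound
```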

Lemma 5

Let \(\left( a_k\right) \) and \(\left( c_k\right) \) be non-negative sequences with \(\lim _k c_k =0\) and let \(\eta \in \left( 0,1\right) \). If for every \(k\in \mathbb {N}\)

$$\begin{aligned} a_{k+1}\le \left( 1-\eta \right) a_k +c_k, \end{aligned}$$
(9.4)

then \(\lim _k a_k =0\). Moreover, if \(c_k=c/k^t\) for some \(c\ge 0\) and \(t>0\), then we have \(\limsup _k \ k^t a_k\le {c}/{\eta }.\)

Proof

Iterating the inequality in (9.4), we get

$$\begin{aligned} a_{k}&\le (1-\eta )^k a_0 + \left( c_{k-1}+(1-\eta ) c_{k-2} + (1-\eta )^2 c_{k-3} +\dots +(1-\eta )^{k-1}c_0 \right) \\&\le a_0 + \left( \sup _{j\ge 0} c_j \right) \sum _{j\ge 0} (1-\eta )^j =: {\tilde{C}} <+\infty , \end{aligned}$$

where we used the fact that \(\left( c_k\right) \) is convergent (and thus bounded) and that \(\eta \in \left( 0,1\right) \). In particular, the sequence \(\left( a_k\right) \) is bounded and so its \(\limsup \) is a real number. Then, again from the hypothesis that \(a_{k+1}\le (1-\eta ) a_k +c_k\) and the existence of \(\lim _k c_k\), we have \(\limsup _k a_{k} \le \limsup _k \left[ (1-\eta ) a_k + c_k \right] = (1-\eta ) \limsup _k a_k + \lim _k c_k.\) So, as \(\lim _k c_k =0\) and \(\limsup _k a_k \in \left[ 0,+\infty \right) \), \(\eta \limsup _k a_k\le 0\). Finally, as \(\eta \in \left( 0,1\right) \), \(\limsup _k a_k\le 0\), and so, as \(a_k\ge 0\), \(\lim _k a_k = 0\). Now assume that \(c_k=c/k^t\) for some \(c\ge 0\) and \(t>0\).

Let \(c'>c\). Then, following the proof of [15, Lemma 4], there exists \(k_0\in \mathbb {N}\) such that for every \(k\ge k_0\) it holds

$$\begin{aligned} \frac{c}{k^t}\le \frac{c'}{\eta }\left[ \frac{1}{(k+1)^t}-(1-\eta )\frac{1}{k^t}\right] . \end{aligned}$$

Using the previous inequality in (9.4), we get

$$\begin{aligned} a_{k+1}-\frac{c'}{\eta (k+1)^t}\le (1-\eta )\left[ a_{k}-\frac{c'}{\eta k^t}\right] . \end{aligned}$$
(9.5)

First suppose that there exists \({\bar{k}}\ge k_0\) such that \(a_{{\bar{k}}}\le \frac{c'}{\eta {\bar{k}}^{t}}\). Using (9.5), it is easy to see by recursion that, for every \(k\ge {\bar{k}}\),

$$\begin{aligned} a_{k}\le \frac{c'}{\eta k^{t}}. \end{aligned}$$

Now suppose the opposite; namely, that for every \(k\ge k_0\)

$$\begin{aligned} a_k-\frac{c'}{\eta k^t}>0. \end{aligned}$$

Then, iterating (9.5), we get

$$\begin{aligned} a_{k}-\frac{c'}{\eta k^t}\le (1-\eta )^{k-1}\left[ a_{1}-\frac{c'}{\eta }\right] . \end{aligned}$$

In either of the two previous cases,

$$\begin{aligned} \limsup _k \ k^t a_k\le \limsup _k \left\{ \frac{c'}{\eta }+k^t(1-\eta )^{k-1}\left[ a_{1}-\frac{c'}{\eta }\right] \right\} =\frac{c'}{\eta }. \end{aligned}$$

Finally, since the previous inequality holds for every \(c'>c\), we get the claim. \(\square \)
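
A short numerical illustration of Lemma 5, with the inequality (9.4) taken as an equality and \(c_k=c/k^t\): the rescaled sequence \(k^t a_k\) settles near \(c/\eta \). The parameter values below are arbitrary illustrative choices.

```python
# Iterate a_{k+1} = (1 - eta) a_k + c / k^t and monitor k^t * a_k, which
# approaches c / eta (here 2.0 / 0.1 = 20), as predicted by Lemma 5.
eta, c, t = 0.1, 2.0, 1.0
a = 5.0                                    # a_1, an arbitrary starting value
for k in range(1, 200001):
    if k % 50000 == 0:
        print(k, k**t * a, c / eta)
    a = (1 - eta) * a + c / k**t
```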

Remark 23

Consider a function \(f: {\mathbb {R}}^d \rightarrow {\mathbb {R}}\) with \(\lambda \)-Lipschitz gradient and at least one minimizer \(x_*\). Applying the recursion of Algorithm 1 to such a function with an arbitrary starting point \(x_0 \in {\mathbb {R}}^d\), we get that \(\left\Vert x_k-x_*\right\Vert ^2\), \(\langle P_kP_k^\top \nabla f_k, x_k-x_* \rangle \), and \(\left\Vert \nabla f_k\right\Vert ^2\) are bounded for all \(k>0\). In particular, \(\left\Vert \nabla f_k\right\Vert ^2\), \(\left\Vert x_k-x_*\right\Vert ^2\), and \(\langle P_kP_k^\top \nabla f_k, x_k-x_* \rangle \) are all integrable. To see this, note first that \(\nabla f_k\) and \(x_k\) are both measurable. Moreover, since \(\nabla f(x_*)=0\), the \(\lambda \)-Lipschitz continuity of \(\nabla f\) implies that, for finite \(x_k\), \(\left\Vert \nabla f_k\right\Vert ^2 \le \lambda ^2 \left\Vert x_k-x_*\right\Vert ^2\). Choose an arbitrary finite \(x_0\) to begin the recursion. Then,

$$\begin{aligned} \left\Vert x_1 -x_*\right\Vert ^2&\le 2\left\Vert x_0-x_*\right\Vert ^2 + 2\alpha _0^2\left\Vert P_0\nabla _{P_0} f_0\right\Vert ^2 \\&\le C_1 + C_2\left\Vert P_0\nabla _{P_0} f_0 - P_0P_0^\top \nabla f_0 + P_0P_0^\top \nabla f_0\right\Vert ^2 \\&\le C_1 + C_2\left( C_3 + \left( {\frac{d}{\ell }}\right) ^2 \left\Vert \nabla f_0\right\Vert ^2\right) \qquad \text {(by Lemma 1)}, \end{aligned}$$

where \(C_1, C_2, C_3\) are fixed, finite constants whose exact values are unimportant but can be computed. Repeated recursion shows that \(\left\Vert x_k-x_*\right\Vert ^2\) is bounded for all \(k>0\). Further, by the \(\lambda \)-Lipschitz continuity of the gradient, \(\left\Vert \nabla f_k\right\Vert ^2 \le \lambda ^2 \left\Vert x_k-x_*\right\Vert ^2\). The claim on the inner product follows from Young’s inequality and the previous bounds. Finally, since \(\left\Vert \nabla f_k\right\Vert ^2\), \(\left\Vert x_k-x_*\right\Vert ^2\), and \(\langle P_kP_k^\top \nabla f_k, x_k-x_* \rangle \) are bounded and measurable, they are integrable.
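
For completeness, the first inequality in the display above is the elementary bound \(\Vert u+v\Vert ^2\le 2\Vert u\Vert ^2+2\Vert v\Vert ^2\) applied to \(u=x_0-x_*\) and \(v=-\alpha _0 P_0\nabla _{P_0} f_0\). A quick numerical check is sketched below; the sizes and the stand-in vector for the finite-difference surrogate are arbitrary illustrative choices.

```python
import numpy as np

# Verify ||x_1 - x*||^2 <= 2 ||x_0 - x*||^2 + 2 alpha^2 ||P g||^2 for one step.
rng = np.random.default_rng(2)
d, ell, alpha = 30, 6, 0.5
x_star = np.zeros(d)
x0 = rng.standard_normal(d)
Q, _ = np.linalg.qr(rng.standard_normal((d, ell)))
P = np.sqrt(d / ell) * Q
g0 = rng.standard_normal(ell)                # stand-in for the finite-difference surrogate
x1 = x0 - alpha * (P @ g0)
lhs = np.linalg.norm(x1 - x_star) ** 2
rhs = 2 * np.linalg.norm(x0 - x_star) ** 2 + 2 * alpha**2 * np.linalg.norm(P @ g0) ** 2
print(lhs <= rhs + 1e-12)                    # True
```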


About this article

Cite this article

Kozak, D., Molinari, C., Rosasco, L. et al. Zeroth-order optimization with orthogonal random directions. Math. Program. 199, 1179–1219 (2023). https://doi.org/10.1007/s10107-022-01866-9
