Abstract
Composite minimization involves a collection of functions that are aggregated in a nonsmooth manner. It covers, as particular cases, smooth approximation of minimax games, minimization of max-type functions, and simple composite minimization problems in which the objective function has a nonsmooth component. We design a higher-order majorization algorithmic framework for fully composite problems (possibly nonconvex). Our framework replaces each component with a higher-order surrogate such that the corresponding error function has a higher-order Lipschitz continuous derivative. We present convergence guarantees for our method for composite optimization problems with (non)convex and (non)smooth objective functions. In particular, we prove stationary-point convergence guarantees for general nonconvex (possibly nonsmooth) problems, and under the Kurdyka–Łojasiewicz (KL) property of the objective function we derive improved rates depending on the KL parameter. For convex (possibly nonsmooth) problems we also provide sublinear convergence rates.
References
Attouch, H., Bolte, J.: On the convergence of the proximal algorithm for nonsmooth functions involving analytic features. Math. Program. 116(1–2), 5–16 (2009)
Birgin, E.G., Gardenghi, J.L., Martínez, J.M., Santos, S.A., Toint, P.L.: Worst-case evaluation complexity for unconstrained nonlinear optimization using high-order regularized models. Math. Program. 163(1), 359–368 (2017)
Birgin, E.G., Gardenghi, J.L., Martínez, J.M., Santos, S.A.: On the use of third-order models with fourth-order regularization for unconstrained optimization. Optim. Lett. 14, 815–838 (2020)
Bolte, J., Daniilidis, A., Lewis, A., Shiota, M.: Clarke subgradients of stratifiable functions. SIAM J. Optim. 18(2), 556–572 (2007)
Bolte, J., Chen, Z., Pauwels, E.: The multiproximal linearization method for convex composite problems. Math. Program. 182, 1–36 (2020)
Bolte, J., Sabach, S., Teboulle, M.: Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program. 146, 459–494 (2014)
Cartis, C., Gould, N., Toint, P.L.: A concise second-order complexity analysis for unconstrained optimization using high-order regularized models. Optim. Methods Softw. 35, 243–256 (2020)
Drusvyatskiy, D., Paquette, C.: Efficiency of minimizing compositions of convex functions and smooth maps. Math. Program. 178(1–2), 503–558 (2019)
Doikov, N., Nesterov, Yu.: Optimization methods for fully composite problems. SIAM J. Optim. 32(3), 2402–2427 (2022)
Fletcher, R.: A model algorithm for composite NDO problems. Math. Program. Stud. 17, 67–76 (1982)
Gasnikov, A., Dvurechensky, P., Gorbunov, E., Vorontsova, E., Selikhanovych, D., Uribe, C., Jiang, B., Wang, H., Zhang, S., Bubeck, S., Jiang, Q.: Near optimal methods for minimizing convex functions with Lipschitz \(p\)th derivatives. In: Conference on Learning Theory, pp. 1392–1393 (2019)
Gould, N.I.M., Rees, T., Scott, J.: Convergence and evaluation-complexity analysis of a regularized tensor-Newton method for solving nonlinear least-squares problems. Comput. Optim. Appl. 73(1), 1–35 (2019)
Grapiglia, G., Nesterov, Yu.: Tensor methods for minimizing convex functions with Hölder continuous higher-order derivatives. SIAM J. Optim. 30(4), 2750–2779 (2020)
Hiriart-Urruty, J.-B.: New concepts in nondifferentiable programming. Memoires de la Societe Mathematique de France 60, 57–85 (1979)
Li, C., Ng, K.F.: Majorizing functions and convergence of the Gauss-Newton method for convex composite optimization. SIAM J. Optim. 18(2), 613–642 (2007)
Mairal, J.: Incremental majorization-minimization optimization with application to large-scale machine learning. SIAM J. Optim. 25(2), 829–855 (2015)
Mordukhovich, B.: Variational Analysis and Generalized Differentiation. Basic Theory. Springer, Berlin (2006)
Moré, J.J., Garbow, B.S., Hillstrom, K.E.: Testing unconstrained optimization software. ACM Trans. Math. Softw. 7(1), 17–41 (1981)
Necoara, I., Nesterov, Yu., Glineur, F.: Linear convergence of first-order methods for non-strongly convex optimization. Math. Program. 175, 69–107 (2019)
Necoara, I., Lupu, D.: General higher-order majorization-minimization algorithms for (non)convex optimization. arXiv preprint arXiv:2010.13893 (2020)
Nesterov, Yu., Nemirovskii, A.: Interior-Point Polynomial Algorithms in Convex Programming. SIAM, Philadelphia (1994)
Nesterov, Yu.: Smooth minimization of non-smooth functions. Math. Program. 103, 127–152 (2005)
Nesterov, Yu., Polyak, B.T.: Cubic regularization of Newton method and its global performance. Math. Program. 108, 177–205 (2006)
Nesterov, Yu.: Implementable tensor methods in unconstrained convex optimization. Math. Program. 186, 157–183 (2021)
Nesterov, Yu.: Inexact basic tensor methods for some classes of convex optimization problems. Optim. Methods Softw. 37, 878–906 (2022)
Nesterov, Yu.: Gradient methods for minimizing composite functions. Math. Program. 140(1), 125–161 (2013)
Pauwels, E.: The value function approach to convergence analysis in composite optimization. Oper. Res. Lett. 44, 790–795 (2016)
Wächter, A., Biegler, L.T.: On the implementation of a primal-dual interior point filter line search algorithm for large-scale nonlinear programming. Math. Program. 106(1), 25–57 (2006)
Yuan, Y.: Conditions for convergence of trust-region algorithms for nonsmooth optimization. Math. Program. 31, 220–228 (1985)
Acknowledgements
The research leading to these results has received funding from: ITN-ETN project TraDE-OPT funded by the EU, H2020 Research and Innovation Programme, under the Marie Skłodowska-Curie grant agreement No. 861137; NO Grants 2014–2021, under project ELO-Hyp, contract no. 24/2020; UEFISCDI PN-III-P4-PCE-2021-0720, under project L2O-MOC, no. 70/2022.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
Proof of Lemma 3
Let us first prove that for \(p=2\), \(g(\cdot )=\max (\cdot )\) and \(h(\cdot ) = 0\), one can efficiently compute the global solution \(x_{k+1}\) of the subproblem (15). Indeed, in this particular case (15) is equivalent to the following subproblem:
\[ \min_{x\in {\mathbb {R}}^n}\ \max_{1\le i\le m}\ \Big [ F_i(x_k) + \langle \nabla F_i(x_k), x - x_k\rangle + \frac{1}{2}\langle \nabla ^2 F_i(x_k)(x - x_k), x - x_k\rangle + \frac{M_i}{6}\Vert x - x_k\Vert ^3 \Big ]. \]
Further, this is equivalent to:
\[ \min_{x\in {\mathbb {R}}^n}\ \max_{u\in \Delta _m}\ \sum _{i=1}^{m} u_i \Big [ F_i(x_k) + \langle \nabla F_i(x_k), x - x_k\rangle + \frac{1}{2}\langle \nabla ^2 F_i(x_k)(x - x_k), x - x_k\rangle + \frac{M_i}{6}\Vert x - x_k\Vert ^3 \Big ], \]
where \(u=(u_1,\ldots ,u_m)\) and \(\Delta _m:= \left\{ u\ge 0: \sum _{i=1}^{m} u_i = 1 \right\} \) is the standard simplex in \({\mathbb {R}}^m\). Further, this min–max problem can be written as follows:
\[ \max_{u\in \Delta _m}\ \min_{x\in {\mathbb {R}}^n}\ \sum _{i=1}^{m} u_i \Big [ F_i(x_k) + \langle \nabla F_i(x_k), x - x_k\rangle + \frac{1}{2}\langle \nabla ^2 F_i(x_k)(x - x_k), x - x_k\rangle + \frac{M_i}{6}\Vert x - x_k\Vert ^3 \Big ]. \]
Denote for simplicity \(H_k(u,w) = \sum _{i=1}^{m} u_i \nabla ^2 F_i(x_k) + \frac{w}{2} I\), \(g_k(u) = \sum _{i=1}^{m}u_i \nabla F_i(x_k)\), \(l_k(u) = \sum _{i=1}^{m} u_i F_i(x_k)\) and \(\tilde{M}(u) = \sum _{i=1}^{m}u_i M_i\). Then, the dual formulation of this problem takes the form:
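A plausible explicit form for this dual, written here by analogy with the dual of the cubic-regularized model in [27] (the \(w^{3}\) term and its constant are an assumption, chosen so that stationarity in \(w\) recovers the relation \(w^{*}/\tilde{M}(u^{*}) = r_k\) used below), is:
\[ \max_{(u,w)\in \Delta _m\times {\mathbb {R}}_{+}:\; H_k(u,w)\succ 0}\ \ \beta (u,w):= l_k(u) - \frac{1}{2}\, g_k(u)^{\top } H_k(u,w)^{-1} g_k(u) - \frac{w^{3}}{12\,\tilde{M}(u)^{2}}. \]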
Consider the following notations:
Below, we prove that if there exists an \(M_i >0\) for some \(i = 1:m\), then the following relation holds:
\[ \theta ^* = \beta ^*. \]
Additionally, for any \((u,w)\in D\) the direction \(x_{k+1} = x_k -H_k(u,w)^{-1}g_k(u)\) satisfies:
where \(r_k:= \Vert x_{k+1} - x_k\Vert \). Indeed, let us first show that \(\theta ^*\ge \beta ^*\). Using reasoning similar to that in [23], we have:
Let \((u,w) \in D\). Then, we have \(g_k(u)= - H_k(u,w)(x_{k+1} - x_k)\) and thus:
which proves (40). Note that we have [23]:
Therefore, if \(\beta ^*\) is attained at some \((u^*,w^*) \in D\), then we have \(\nabla \beta (u^*,w^*) = 0\). This implies \(\frac{w^*}{\tilde{M}(u^*)} = r_k\) and by (40) we conclude that \(\theta ^* = \beta ^*\).
Finally, if \(x_{k+1}\) is a global solution of the subproblem (15) (or equivalently (39)), then it satisfies the inexact condition (25) with \(\delta = 0\). Hence, using the proof of Lemma 2 with \(\delta = 0\) we can conclude that Assumption 2 holds with \(y_{k+1}\) given in (24), \(L^{1}_{p}=\left( C_{L^{e}_{p}}^{\mu _{p}}\right) ^{1/3}\) and \(L^{2}_{p}=\frac{\mu _{p}}{2}\). \(\square \)
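To make the above construction concrete, here is a minimal numerical sketch of how the dual maximization over \((u,w)\in \Delta _m\times {\mathbb {R}}_{+}\) and the recovery of \(x_{k+1} = x_k - H_k(u,w)^{-1}g_k(u)\) could be carried out. It is an illustration only, not the authors' implementation: the problem data (two convex quadratic models in \({\mathbb {R}}^2\)), the explicit expression used for \(\beta (u,w)\) (the hedged reconstruction above) and the choice of SciPy's SLSQP solver are all assumptions.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical data: m = 2 component models at the current iterate x_k (n = 2).
m, n = 2, 2
x_k = np.array([1.0, -0.5])
F_vals = np.array([0.8, 1.1])                              # F_i(x_k)
grads = [np.array([1.0, 2.0]), np.array([-0.5, 1.0])]      # grad F_i(x_k)
hessians = [np.array([[2.0, 0.3], [0.3, 1.5]]),
            np.array([[1.0, -0.2], [-0.2, 2.5]])]          # Hess F_i(x_k)
M = np.array([2.0, 3.0])                                   # cubic constants M_i

def assemble(u, w):
    """H_k(u, w) = sum_i u_i Hess F_i(x_k) + (w/2) I, g_k(u) = sum_i u_i grad F_i(x_k)."""
    Hk = sum(u[i] * hessians[i] for i in range(m)) + 0.5 * w * np.eye(n)
    gk = sum(u[i] * grads[i] for i in range(m))
    return Hk, gk

def neg_beta(z):
    """Negative of beta(u, w); the w^3/(12 M(u)^2) term is the assumed reconstruction."""
    u, w = z[:m], z[m]
    Hk, gk = assemble(u, w)
    Mu = u @ M                    # tilde M(u)
    lk = u @ F_vals               # l_k(u)
    return -(lk - 0.5 * gk @ np.linalg.solve(Hk, gk) - w**3 / (12.0 * Mu**2))

# Maximize beta over the simplex in u and w >= 0 (w > 0 keeps H_k(u, w) positive
# definite here, since the model Hessians above are positive definite).
z0 = np.concatenate([np.full(m, 1.0 / m), [1.0]])
res = minimize(neg_beta, z0,
               bounds=[(0.0, 1.0)] * m + [(1e-8, None)],
               constraints=[{'type': 'eq', 'fun': lambda z: z[:m].sum() - 1.0}],
               method='SLSQP')

u_opt, w_opt = res.x[:m], res.x[m]
Hk, gk = assemble(u_opt, w_opt)
x_next = x_k - np.linalg.solve(Hk, gk)    # x_{k+1} = x_k - H_k(u, w)^{-1} g_k(u)
r_k = np.linalg.norm(x_next - x_k)
print(f"w*/M(u*) = {w_opt / (u_opt @ M):.4f}  vs  r_k = {r_k:.4f}")
```

On this toy data the printed ratio \(w^{*}/\tilde{M}(u^{*})\) should approximately equal \(r_k\), in line with the stationarity condition derived in the proof.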
Proof of Remark 2
If \(g\) is the identity function, then taking \(y_{k+1}=x_{k+1}\) one can see that Assumption 3 holds for any nonnegative constants \(\theta _{1,p}\) and \(\theta _{2,p}\). If \(g\) is a general function, then Assumption 3 holds provided that \(x_{k+1}\) satisfies the inexact optimality condition (25). Indeed, in this case, we have:
where the last inequality follows by taking \(y = y_{k+1}\). Hence, Assumption 3 holds in this case for \(\theta _{1,p} = \dfrac{g(L^{e}_{p})}{(p+1)!}\) and \(\theta _{2,p} = \delta \). Finally, if \(p=2\) and \(g(\cdot ) = \max (\cdot )\), then \(x_{k+1}\) is the global solution of the subproblem (15) and hence, using similar arguments as above, we can prove that Assumption 3 also holds in this case. \(\square \)
Proof of Lemma 6
Note that the sequence \(\lambda _{k}\) is nonincreasing and nonnegative, thus it is convergent. Let us first consider \(\theta \le 1\). Since \(\lambda _{k}-\lambda _{k+1}\) converges to 0, there exists \(k_{0}\) such that \(\lambda _{k}-\lambda _{k+1} \le 1\) and \(\lambda _{k+1}\le (C_{1}+C_{2})\left( \lambda _{k}-\lambda _{k+1}\right) \) for all \(k \ge k_{0}\). It follows that:
\[ \lambda _{k+1} \le \frac{C_{1}+C_{2}}{1+C_{1}+C_{2}}\,\lambda _{k} \qquad \text{for all } k \ge k_{0}, \]
which proves the first statement. If \(1<\theta \le 2\), there also exists an integer \(k_{0}\) such that \(\lambda _{k}-\lambda _{k+1} \le 1\) for all \(k\ge k_{0}\). Then, we have:
Since \(1<\theta \le 2\), taking \(0 < \beta = \theta -1 \le 1\), we have:
for all \(k\ge k_{0}\). From Lemma 11 in [25], we further have:
for all \(k\ge k_{0}\) and for some \(\sigma >0\). Finally, if \(\theta > 2\), define \(h(s)=s^{-\theta }\) and let \(R>1\) be fixed. Since \(1/\theta < 1\), there exists a \(k_{0}\) such that \(\lambda _{k}-\lambda _{k+1} \le 1\) for all \(k \ge k_{0}\). Then, we have \(\lambda _{k+1}\le (C_{1}+C_{2})\left( \lambda _{k}-\lambda _{k+1}\right) ^{\frac{1}{\theta }}\), or equivalently:
If we assume that \(h(\lambda _{k+1})\le R h(\lambda _{k}) \), then:
Denote \(\mu =\frac{R(C_{1}+C_{2})^{\theta }}{\theta -1}\). Then:
If we assume that \(h(\lambda _{k+1})> R h(\lambda _{k}) \) and set \(\gamma = R^{-\frac{1}{\theta }}\), then it follows immediately that \(\lambda _{k+1}\le \gamma \lambda _{k}\). Since \(1-\theta \) is negative, we get:
Since \(1- \theta <0\), \(\gamma ^{1-\theta } > 1\) and \(\lambda _{k}\) has a nonnegative limit, there exists \({\bar{\mu }} > 0\) such that \((\gamma ^{1-\theta } - 1) \lambda _{k}^{1-\theta } > {\bar{\mu }}\) for all \(k \ge k_0\). Therefore, in this case we also obtain:
If we set \({\hat{\mu }}=\min (\mu ^{-1},{\bar{\mu }})\) and combine (41) and (42), we obtain:
\[ \lambda _{k+1}^{1-\theta } - \lambda _{k}^{1-\theta } \ge {\hat{\mu }} \qquad \text{for all } k \ge k_{0}. \]
Summing the last inequality from \(k_{0}\) to \(k\), we obtain \(\lambda _{k}^{1-\theta }-\lambda _{k_{0}}^{1-\theta }\ge {\hat{\mu }}(k-k_{0})\), i.e.:
\[ \lambda _{k} \le \left( \lambda _{k_{0}}^{1-\theta } + {\hat{\mu }}\,(k-k_{0}) \right) ^{-\frac{1}{\theta -1}} \]
for all \(k \ge k_0\). This concludes our proof. \(\square \)
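To sanity-check the \(\theta > 2\) rate, the following self-contained Python snippet simulates the extremal case of the recurrence \(\lambda _{k+1}\le (C_{1}+C_{2})(\lambda _{k}-\lambda _{k+1})^{1/\theta }\) and monitors \(\lambda _{k}\, k^{1/(\theta -1)}\), which Lemma 6 predicts to remain bounded. All constants are illustrative choices, not values from the paper.

```python
import numpy as np

# Toy check of the rate in Lemma 6 for theta > 2 (all constants are illustrative).
C, theta = 1.0, 3.0      # C plays the role of C_1 + C_2; theta is the KL-type exponent
lam = 1.0                # lambda_{k_0}
history = [lam]

# Simulate the extremal case of the recurrence
#   lambda_{k+1} = C * (lambda_k - lambda_{k+1})^{1/theta},
# solving for lambda_{k+1} by bisection on [0, lambda_k]
# (the function x - C*(lam - x)^(1/theta) is increasing in x, so the root is unique).
for k in range(10000):
    lo, hi = 0.0, lam
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if mid - C * (lam - mid) ** (1.0 / theta) > 0.0:
            hi = mid
        else:
            lo = mid
    lam = 0.5 * (lo + hi)
    history.append(lam)

# Lemma 6 predicts lambda_k = O(k^{-1/(theta-1)}); the ratio below should stabilize.
for k in (10, 100, 1000, 10000):
    print(k, history[k] * k ** (1.0 / (theta - 1)))
```

With \(\theta = 3\) the monitored quantity settles near a constant, consistent with the \({\mathcal {O}}(k^{-1/(\theta -1)})\) bound obtained above.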
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Nabou, Y., Necoara, I. Efficiency of higher-order algorithms for minimizing composite functions. Comput Optim Appl 87, 441–473 (2024). https://doi.org/10.1007/s10589-023-00533-9
DOI: https://doi.org/10.1007/s10589-023-00533-9