Abstract
Composite minimization involves a collection of functions that are aggregated in a nonsmooth manner. It covers, as particular cases, smooth approximation of minimax games, minimization of max-type functions, and simple composite minimization problems in which the objective function has a nonsmooth component. We design a higher-order majorization algorithmic framework for fully composite problems (possibly nonconvex). Our framework replaces each component with a higher-order surrogate such that the corresponding error function has a higher-order Lipschitz continuous derivative. We present convergence guarantees for our method for composite optimization problems with (non)convex and (non)smooth objective functions. In particular, we prove stationary-point convergence guarantees for general nonconvex (possibly nonsmooth) problems, and under the Kurdyka–Łojasiewicz (KL) property of the objective function we derive improved rates depending on the KL parameter. For convex (possibly nonsmooth) problems we also provide sublinear convergence rates.
References
Attouch, H., Bolte, J.: On the convergence of the proximal algorithm for nonsmooth functions involving analytic features. Math. Program. 116(1–2), 5–16 (2009)
Birgin, E.G., Gardenghi, J.L., Martínez, J.M., Santos, S.A., Toint, P.L.: Worst-case evaluation complexity for unconstrained nonlinear optimization using high-order regularized models. Math. Program. 163(1), 359–368 (2017)
Birgin, E.G., Gardenghi, J.L., Martínez, J.M., Santos, S.A.: On the use of third-order models with fourth-order regularization for unconstrained optimization. Optim. Lett. 14, 815–838 (2020)
Bolte, J., Daniilidis, A., Lewis, A., Shiota, M.: Clarke subgradients of stratifiable functions. SIAM J. Optim. 18(2), 556–572 (2007)
Bolte, J., Chen, Z., Pauwels, E.: The multiproximal linearization method for convex composite problems. Math. Program. 182, 1–36 (2020)
Bolte, J., Sabach, S., Teboulle, M.: Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program. 146, 459–494 (2014)
Cartis, C., Gould, N., Toint, P.L.: A concise second-order complexity analysis for unconstrained optimization using high-order regularized models. Optim. Methods Softw. 35, 243–256 (2020)
Drusvyatskiy, D., Paquette, C.: Efficiency of minimizing compositions of convex functions and smooth maps. Math. Program. 178(1–2), 503–558 (2019)
Doikov, N., Nesterov, Yu.: Optimization methods for fully composite problems. SIAM J. Optim. 32(3), 2402–2427 (2022)
Fletcher, R.: A model algorithm for composite NDO problems. Math. Program. Stud. 17, 67–76 (1982)
Gasnikov, A., Dvurechensky, P., Gorbunov, E., Vorontsova, E., Selikhanovych, D., Uribe, C., Jiang, B., Wang, H., Zhang, S., Bubeck, S., Jiang, Q.: Near optimal methods for minimizing convex functions with Lipschitz \(p\)th derivatives. In: Conference on Learning Theory, pp. 1392–1393 (2019)
Gould, N.I.M., Rees, T., Scott, J.: Convergence and evaluation-complexity analysis of a regularized tensor-Newton method for solving nonlinear least-squares problems. Comput. Optim. Appl. 73(1), 1–35 (2019)
Grapiglia, G., Nesterov, Yu.: Tensor methods for minimizing convex functions with Hölder continuous higher-order derivatives. SIAM J. Optim. 30(4), 2750–2779 (2020)
Hiriart-Urruty, J.-B.: New concepts in nondifferentiable programming. Memoires de la Societe Mathematique de France 60, 57–85 (1979)
Li, C., Ng, K.F.: Majorizing functions and convergence of the Gauss-Newton method for convex composite optimization. SIAM J. Optim. 18(2), 613–642 (2007)
Mairal, J.: Incremental majorization-minimization optimization with application to large-scale machine learning. SIAM J. Optim. 25(2), 829–855 (2015)
Mordukhovich, B.: Variational Analysis and Generalized Differentiation. Basic Theory. Springer, Berlin (2006)
Moré, J.J., Garbow, B.S., Hillstrom, K.E.: Testing unconstrained optimization software. ACM Trans. Math. Softw. 7(1), 17–41 (1981)
Necoara, I., Nesterov, Yu., Glineur, F.: Linear convergence of first-order methods for non-strongly convex optimization. Math. Program. 175, 69–107 (2019)
Necoara, I., Lupu, D.: General higher-order majorization-minimization algorithms for (non)convex optimization. arXiv preprint arXiv:2010.13893 (2020)
Nesterov, Yu., Nemirovskii, A.: Interior-Point Polynomial Algorithms in Convex Programming. SIAM, Philadelphia (1994)
Nesterov, Yu.: Smooth minimization of non-smooth functions. Math. Program. 103, 127–152 (2005)
Nesterov, Yu., Polyak, B.T.: Cubic regularization of Newton method and its global performance. Math. Program. 108, 177–205 (2006)
Nesterov, Yu.: Implementable tensor methods in unconstrained convex optimization. Math. Program. 186, 157–183 (2021)
Nesterov, Yu.: Inexact basic tensor methods for some classes of convex optimization problems. Optim. Methods Softw. 37, 878–906 (2022)
Nesterov, Yu.: Gradient methods for minimizing composite functions. Math. Program. 140(1), 125–161 (2013)
Pauwels, E.: The value function approach to convergence analysis in composite optimization. Oper. Res. Lett. 44, 790–795 (2016)
Wächter, A., Biegler, L.T.: On the implementation of a primal-dual interior point filter line search algorithm for large-scale nonlinear programming. Math. Program. 106(1), 25–57 (2006)
Yuan, Y.: Conditions for convergence of trust-region algorithms for nonsmooth optimization. Math. Program. 31, 220–228 (1985)
Acknowledgements
The research leading to these results has received funding from: ITN-ETN project TraDE-OPT funded by the EU, H2020 Research and Innovation Programme, under the Marie Skłodowska-Curie grant agreement No. 861137; NO Grants 2014–2021, under project ELO-Hyp, contract no. 24/2020; UEFISCDI PN-III-P4-PCE-2021-0720, under project L2O-MOC, no. 70/2022.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
Proof of Lemma 3
Let us first prove that for \(p=2\), \(g(\cdot )=\max (\cdot )\) and \(h(\cdot ) = 0\), one can efficiently compute the global solution \(x_{k+1}\) of the subproblem (15). Indeed, in this particular case (15) is equivalent to the following subproblem:
\[ \min_{x\in {\mathbb {R}}^n}\ \max_{1\le i\le m}\ \Big [ F_i(x_k) + \langle \nabla F_i(x_k), x - x_k\rangle + \frac{1}{2}\langle \nabla ^2 F_i(x_k)(x - x_k), x - x_k\rangle + \frac{M_i}{6}\Vert x - x_k\Vert ^3 \Big ]. \]
Further, this is equivalent to:
\[ \min_{x\in {\mathbb {R}}^n}\ \max_{u\in \Delta _m}\ \sum _{i=1}^{m} u_i \Big [ F_i(x_k) + \langle \nabla F_i(x_k), x - x_k\rangle + \frac{1}{2}\langle \nabla ^2 F_i(x_k)(x - x_k), x - x_k\rangle + \frac{M_i}{6}\Vert x - x_k\Vert ^3 \Big ], \]
where \(u=(u_1,\ldots ,u_m)\) and \(\Delta _m:= \left\{ u\ge 0: \sum _{i=1}^{m} u_i = 1 \right\} \) is the standard simplex in \({\mathbb {R}}^m\). Further, this min–max problem can be written as follows:
\[ \max_{u\in \Delta _m}\ \min_{x\in {\mathbb {R}}^n}\ \sum _{i=1}^{m} u_i \Big [ F_i(x_k) + \langle \nabla F_i(x_k), x - x_k\rangle + \frac{1}{2}\langle \nabla ^2 F_i(x_k)(x - x_k), x - x_k\rangle + \frac{M_i}{6}\Vert x - x_k\Vert ^3 \Big ]. \]
Denote for simplicity \(H_k(u,w) = \sum _{i=1}^{m} u_i \nabla ^2 F_i(x_k) + \frac{w}{2} I\), \(g_k(u) = \sum _{i=1}^{m}u_i \nabla F_i(x_k)\), \(l_k(u) = \sum _{i=1}^{m} u_i F_i(x_k)\) and \(\tilde{M}(u) = \sum _{i=1}^{m}u_i M_i\). Then, the dual formulation of this problem takes the form:
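A plausible explicit form for this dual, written here by analogy with the dual of the cubic-regularized model in [27] (the \(w^{3}\) term and its constant are an assumption, chosen so that stationarity in \(w\) recovers the relation \(w^{*}/\tilde{M}(u^{*}) = r_k\) used below), is:
\[ \max_{(u,w)\in \Delta _m\times {\mathbb {R}}_{+}:\; H_k(u,w)\succ 0}\ \ \beta (u,w):= l_k(u) - \frac{1}{2}\, g_k(u)^{\top } H_k(u,w)^{-1} g_k(u) - \frac{w^{3}}{12\,\tilde{M}(u)^{2}}. \]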
Consider the following notations:
Below, we prove that if there exists an \(M_i >0\) for some \(i = 1:m\), then the following relation holds:
\[ \theta ^* = \beta ^*. \]
Additionally, for any \((u,w)\in D\) the direction \(x_{k+1} = x_k -H_k(u,w)^{-1}g_k(u)\) satisfies:
where \(r_k:= \Vert x_{k+1} - x_k\Vert \). Indeed, let us first show that \(\theta ^*\ge \beta ^*\). Using reasoning similar to that in [23], we have:
Let \((u,w) \in D\). Then, we have \(g_k(u)= - H_k(u,w)(x_{k+1} - x_k)\) and thus:
which proves (40). Note that we have [23]:
Therefore, if \(\beta ^*\) is attained at some \((u^*,w^*) \in D\), then we have \(\nabla \beta (u^*,w^*) = 0\). This implies \(\frac{w^*}{\tilde{M}(u^*)} = r_k\) and by (40) we conclude that \(\theta ^* = \beta ^*\).
Finally, if \(x_{k+1}\) is a global solution of the subproblem (15) (or equivalently (39)), then it satisfies the inexact condition (25) with \(\delta = 0\). Hence, using the proof of Lemma 2 with \(\delta = 0\) we can conclude that Assumption 2 holds with \(y_{k+1}\) given in (24), \(L^{1}_{p}=\left( C_{L^{e}_{p}}^{\mu _{p}}\right) ^{1/3}\) and \(L^{2}_{p}=\frac{\mu _{p}}{2}\). \(\square \)
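To make the above construction concrete, here is a minimal numerical sketch of how the dual maximization over \((u,w)\in \Delta _m\times {\mathbb {R}}_{+}\) and the recovery of \(x_{k+1} = x_k - H_k(u,w)^{-1}g_k(u)\) could be carried out. It is an illustration only, not the authors' implementation: the problem data (two convex quadratic models in \({\mathbb {R}}^2\)), the explicit expression used for \(\beta (u,w)\) (the hedged reconstruction above) and the choice of SciPy's SLSQP solver are all assumptions.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical data: m = 2 component models at the current iterate x_k (n = 2).
m, n = 2, 2
x_k = np.array([1.0, -0.5])
F_vals = np.array([0.8, 1.1])                              # F_i(x_k)
grads = [np.array([1.0, 2.0]), np.array([-0.5, 1.0])]      # grad F_i(x_k)
hessians = [np.array([[2.0, 0.3], [0.3, 1.5]]),
            np.array([[1.0, -0.2], [-0.2, 2.5]])]          # Hess F_i(x_k)
M = np.array([2.0, 3.0])                                   # cubic constants M_i

def assemble(u, w):
    """H_k(u, w) = sum_i u_i Hess F_i(x_k) + (w/2) I, g_k(u) = sum_i u_i grad F_i(x_k)."""
    Hk = sum(u[i] * hessians[i] for i in range(m)) + 0.5 * w * np.eye(n)
    gk = sum(u[i] * grads[i] for i in range(m))
    return Hk, gk

def neg_beta(z):
    """Negative of beta(u, w); the w^3/(12 M(u)^2) term is the assumed reconstruction."""
    u, w = z[:m], z[m]
    Hk, gk = assemble(u, w)
    Mu = u @ M                    # tilde M(u)
    lk = u @ F_vals               # l_k(u)
    return -(lk - 0.5 * gk @ np.linalg.solve(Hk, gk) - w**3 / (12.0 * Mu**2))

# Maximize beta over the simplex in u and w >= 0 (w > 0 keeps H_k(u, w) positive
# definite here, since the model Hessians above are positive definite).
z0 = np.concatenate([np.full(m, 1.0 / m), [1.0]])
res = minimize(neg_beta, z0,
               bounds=[(0.0, 1.0)] * m + [(1e-8, None)],
               constraints=[{'type': 'eq', 'fun': lambda z: z[:m].sum() - 1.0}],
               method='SLSQP')

u_opt, w_opt = res.x[:m], res.x[m]
Hk, gk = assemble(u_opt, w_opt)
x_next = x_k - np.linalg.solve(Hk, gk)    # x_{k+1} = x_k - H_k(u, w)^{-1} g_k(u)
r_k = np.linalg.norm(x_next - x_k)
print(f"w*/M(u*) = {w_opt / (u_opt @ M):.4f}  vs  r_k = {r_k:.4f}")
```

On this toy data the printed ratio \(w^{*}/\tilde{M}(u^{*})\) should approximately equal \(r_k\), in line with the stationarity condition derived in the proof.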
Proof of Remark 2
If \(g\) is the identity function, then taking \(y_{k+1}=x_{k+1}\) one can see that Assumption 3 holds for any nonnegative constants \(\theta _{1,p}\) and \(\theta _{2,p}\). If \(g\) is a general function, then Assumption 3 holds provided that \(x_{k+1}\) satisfies the inexact optimality condition (25). Indeed, in this case, we have:
where the last inequality follows by taking \(y = y_{k+1}\). Hence, Assumption 3 holds in this case for \(\theta _{1,p} = \dfrac{g(L^{e}_{p})}{(p+1)!}\) and \(\theta _{2,p} = \delta \). Finally, if \(p=2\) and \(g(\cdot ) = \max (\cdot )\), then \(x_{k+1}\) is the global solution of the subproblem (15) and hence, using similar arguments as above, we can prove that Assumption 3 also holds in this case. \(\square \)
Proof of Lemma 6
Note that the sequence \(\lambda _{k}\) is nonincreasing and nonnegative, thus it is convergent. Let us first consider \(\theta \le 1\). Since \(\lambda _{k}-\lambda _{k+1}\) converges to 0, there exists \(k_{0}\) such that \(\lambda _{k}-\lambda _{k+1} \le 1\) and \(\lambda _{k+1}\le (C_{1}+C_{2})\left( \lambda _{k}-\lambda _{k+1}\right) \) for all \(k \ge k_{0}\). It follows that:
\[ \lambda _{k+1} \le \frac{C_{1}+C_{2}}{1+C_{1}+C_{2}}\,\lambda _{k} \qquad \text{for all } k \ge k_{0}, \]
which proves the first statement. If \(1<\theta \le 2\), there also exists an integer \(k_{0}\) such that \(\lambda _{k}-\lambda _{k+1} \le 1\) for all \(k\ge k_{0}\). Then, we have:
Since \(1<\theta \le 2\), taking \(0 < \beta = \theta -1 \le 1\), we have:
for all \(k\ge k_{0}\). From Lemma 11 in [25], we further have:
for all \(k\ge k_{0}\) and for some \(\sigma >0\). Finally, if \(\theta > 2\), define \(h(s)=s^{-\theta }\) and let \(R>1\) be fixed. Since \(1/\theta < 1\), there exists a \(k_{0}\) such that \(\lambda _{k}-\lambda _{k+1} \le 1\) for all \(k \ge k_{0}\). Then, we have \(\lambda _{k+1}\le (C_{1}+C_{2})\left( \lambda _{k}-\lambda _{k+1}\right) ^{\frac{1}{\theta }}\), or equivalently:
If we assume that \(h(\lambda _{k+1})\le R h(\lambda _{k}) \), then:
Denote \(\mu =\frac{R(C_{1}+C_{2})^{\theta }}{\theta -1}\). Then:
If we assume that \(h(\lambda _{k+1})> R h(\lambda _{k}) \) and set \(\gamma = R^{-\frac{1}{\theta }}\), then it follows immediately that \(\lambda _{k+1}\le \gamma \lambda _{k}\). Since \(1-\theta \) is negative, we get:
Since \(1- \theta <0\), \(\gamma ^{1-\theta } > 1\) and \(\lambda _{k}\) has a nonnegative limit, there exists \({\bar{\mu }} > 0\) such that \((\gamma ^{1-\theta } - 1) \lambda _{k}^{1-\theta } > {\bar{\mu }}\) for all \(k \ge k_0\). Therefore, in this case we also obtain:
If we set \({\hat{\mu }}=\min (\mu ^{-1},{\bar{\mu }})\) and combine (41) and (42), we obtain:
\[ \lambda _{k+1}^{1-\theta } - \lambda _{k}^{1-\theta } \ge {\hat{\mu }} \qquad \text{for all } k \ge k_{0}. \]
Summing the last inequality from \(k_{0}\) to \(k\), we obtain \(\lambda _{k}^{1-\theta }-\lambda _{k_{0}}^{1-\theta }\ge {\hat{\mu }}(k-k_{0})\), i.e.:
\[ \lambda _{k} \le \left( \lambda _{k_{0}}^{1-\theta } + {\hat{\mu }}\,(k-k_{0}) \right) ^{-\frac{1}{\theta -1}} \]
for all \(k \ge k_0\). This concludes our proof. \(\square \)
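To sanity-check the \(\theta > 2\) rate, the following self-contained Python snippet simulates the extremal case of the recurrence \(\lambda _{k+1}\le (C_{1}+C_{2})(\lambda _{k}-\lambda _{k+1})^{1/\theta }\) and monitors \(\lambda _{k}\, k^{1/(\theta -1)}\), which Lemma 6 predicts to remain bounded. All constants are illustrative choices, not values from the paper.

```python
import numpy as np

# Toy check of the rate in Lemma 6 for theta > 2 (all constants are illustrative).
C, theta = 1.0, 3.0      # C plays the role of C_1 + C_2; theta is the KL-type exponent
lam = 1.0                # lambda_{k_0}
history = [lam]

# Simulate the extremal case of the recurrence
#   lambda_{k+1} = C * (lambda_k - lambda_{k+1})^{1/theta},
# solving for lambda_{k+1} by bisection on [0, lambda_k]
# (the function x - C*(lam - x)^(1/theta) is increasing in x, so the root is unique).
for k in range(10000):
    lo, hi = 0.0, lam
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if mid - C * (lam - mid) ** (1.0 / theta) > 0.0:
            hi = mid
        else:
            lo = mid
    lam = 0.5 * (lo + hi)
    history.append(lam)

# Lemma 6 predicts lambda_k = O(k^{-1/(theta-1)}); the ratio below should stabilize.
for k in (10, 100, 1000, 10000):
    print(k, history[k] * k ** (1.0 / (theta - 1)))
```

With \(\theta = 3\) the monitored quantity settles near a constant, consistent with the \({\mathcal {O}}(k^{-1/(\theta -1)})\) bound obtained above.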
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Nabou, Y., Necoara, I. Efficiency of higher-order algorithms for minimizing composite functions. Comput Optim Appl 87, 441–473 (2024). https://doi.org/10.1007/s10589-023-00533-9
DOI: https://doi.org/10.1007/s10589-023-00533-9