
Scalable adaptive cubic regularization methods

  • Full Length Paper
  • Series A

Mathematical Programming

Abstract

Adaptive cubic regularization (ARC) methods for unconstrained optimization compute steps from linear systems involving a shifted Hessian in the spirit of the Levenberg-Marquardt and trust-region methods. The standard approach consists in performing an iterative search for the shift, akin to solving the secular equation in trust-region methods. Such a search requires computing the Cholesky factorization of a tentative shifted Hessian at each iteration, which limits the size of problems that can be reasonably considered. We propose a scalable implementation of ARC named ARC\(_q\)K in which we solve a set of shifted systems concurrently by way of an appropriate modification of the Lanczos formulation of the conjugate gradient (CG) method. At each iteration of ARC\(_q\)K to solve a problem with \(n\) variables, a range of \(m \ll n\) shift parameters is selected. The computational overhead in CG beyond the Lanczos process is thirteen scalar operations to update five vectors of length \(m\), and two \(n\)-vector updates for each value of the shift. The CG variant only requires one Hessian-vector product and one dot product per iteration, independently of \(m\). Solves corresponding to inadequate shift parameters are interrupted early. All shifted systems are solved inexactly. Such a modest cost makes our implementation scalable and appropriate for large-scale problems. We provide a new analysis of the inexact ARC method, including its worst-case evaluation complexity and its global and asymptotic convergence. We describe our implementation and report numerical experience confirming that our implementation of ARC\(_q\)K outperforms a classic Steihaug-Toint trust-region method and the ARC method of the GALAHAD library. The latter solves the subproblem in nested Krylov subspaces by a Lanczos-based method, which requires the storage of a dense matrix that can be comparable to or larger than the two dense arrays required by our approach if the problem is large or requires many Lanczos iterations. Finally, we generalize our convergence results to inexact Hessians and nonlinear least-squares problems.
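To illustrate the kind of shifted-system solves at the heart of this approach, the following minimal sketch solves a family \((H + \lambda_i I) d_i = -g\) with a single Lanczos process using Krylov.jl [26]. It is an illustration only, not the authors' implementation; it assumes Krylov.jl's `cg_lanczos_shift` solver (the exact name and signature may differ across versions) and a synthetic positive definite test matrix.

```julia
# Minimal sketch (not the paper's implementation): solve (H + λi I) di = -g
# for all shifts λi at once with one Lanczos process via Krylov.jl.
using Krylov, LinearAlgebra, SparseArrays

n, m = 1000, 8
A = sprandn(n, n, 5 / n)
H = (A + A') / 2 + 10I              # synthetic symmetric positive definite "Hessian"
g = randn(n)

ψ, λ0 = 4.0, 1e-2
shifts = [λ0 * ψ^i for i in 0:m-1]  # geometric grid of shifts, as in Sect. 3

# One Hessian-vector product and one dot product per iteration, for all shifts.
d, stats = cg_lanczos_shift(H, -g, shifts)
@show norm(H * d[1] + shifts[1] * d[1] + g)  # residual of the first shifted system
```

The point of the shifted formulation is that the Lanczos basis does not depend on the shift, so the per-iteration overhead of handling extra shifts reduces to short scalar and vector recurrences.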

Notes

  1. An iterate was generated where the objective evaluated to \(-\infty \).

References

  1. Bianconcini, T., Liuzzi, G., Morini, B., Sciandrone, M.: On the use of iterative methods in cubic regularization for unconstrained optimization. Comput. Optim. Appl. 60(1), 35–57 (2014). https://doi.org/10.1007/s10589-014-9672-x

  2. Birgin, E.G., Gardenghi, J.L., Martínez, J.M., Santos, S.A., Toint, Ph.L.: Worst-case evaluation complexity for unconstrained nonlinear optimization using high-order regularized models. Math. Program. 163(1–2), 359–368 (2016). https://doi.org/10.1007/s10107-016-1065-8

  3. Carmon, Y., Duchi, J.: Gradient descent finds the cubic-regularized nonconvex Newton step. SIAM J. Optim. 29(3), 2146–2178 (2019). https://doi.org/10.1137/17m1113898

  4. Cartis, C., Gould, N.I.M., Toint, Ph.L.: Adaptive cubic regularisation methods for unconstrained optimization. Part I: motivation, convergence and numerical results. Math. Program. 127(2), 245–295 (2011). https://doi.org/10.1007/s10107-009-0286-5

  5. Cartis, C., Gould, N.I.M., Toint, Ph.L.: Adaptive cubic regularisation methods for unconstrained optimization. Part II: worst-case function- and derivative-evaluation complexity. Math. Program. 130(2), 295–319 (2011). https://doi.org/10.1007/s10107-009-0337-y

  6. Cartis, C., Gould, N.I.M., Toint, Ph.L.: Complexity bounds for second-order optimality in unconstrained optimization. J. Complex. 28(1), 93–108 (2012). https://doi.org/10.1016/j.jco.2011.06.001

  7. Cartis, C., Gould, N.I.M., Toint, Ph.L.: On the evaluation complexity of cubic regularization methods for potentially rank-deficient nonlinear least-squares problems and its relevance to constrained nonlinear optimization. SIAM J. Optim. 23(3), 1553–1574 (2013). https://doi.org/10.1137/120869687

  8. Conn, A.R., Gould, N.I.M., Toint, Ph.L.: Trust-Region Methods, volume 1 of MPS/SIAM Series on Optimization. SIAM, Philadelphia, USA (2000). https://doi.org/10.1137/1.9780898719857

  9. Cristofari, A., Niri, T.D., Lucidi, S.: On global minimizers of quadratic functions with cubic regularization. Optim. Lett. 13(6), 1269–1283 (2018). https://doi.org/10.1007/s11590-018-1316-0

  10. Dolan, E.D., Moré, J.J.: Benchmarking optimization software with performance profiles. Math. Program. Ser. B 91(2), 201–213 (2002). https://doi.org/10.1007/s101070100

  11. Dussault, J.-P.: ARCq: a new adaptive regularization by cubics. Optim. Method Softw. 33(2), 322–335 (2018). https://doi.org/10.1080/10556788.2017.1322080

  12. Dussault, J.-P.: A unified efficient implementation of trust-region type algorithms for unconstrained optimization. Infor 58(2), 290–309 (2020). https://doi.org/10.1080/03155986.2019.1624490

  13. Fong, D.C.L., Saunders, M.A.: LSMR: an iterative algorithm for sparse least-squares problems. SIAM J. Sci. Comput. 33(5), 2950–2971 (2011). https://doi.org/10.1137/10079687X

  14. Frommer, A., Maass, P.: Fast CG-based methods for Tikhonov-Phillips regularization. SIAM J. Sci. Comput. 20(5), 1831–1850 (1999). https://doi.org/10.1137/S1064827596313310

  15. Glässner, U., Güsken, S., Lippert, T., Ritzenhöfer, G., Schilling, K., Frommer, A.: How to compute Green’s functions for entire mass trajectories within Krylov solvers. Int. J. Mod. Phys. C 07(05), 635–644 (1996)

  16. Gould, N.I.M., Simoncini, V.: Error estimates for iterative algorithms for minimizing regularized quadratic subproblems. Optim. Method Softw. 35(2), 304–328 (2019). https://doi.org/10.1080/10556788.2019.1670177

  17. Gould, N.I.M., Lucidi, S., Roma, M., Toint, Ph.L.: Solving the trust-region subproblem using the Lanczos method. SIAM J. Optim. 9(2), 504–525 (1999). https://doi.org/10.1137/S1052623497322735

  18. Gould, N.I.M., Orban, D., Toint, Ph.L.: GALAHAD, a library of thread-safe Fortran 90 packages for large-scale nonlinear optimization. ACM Trans. Math. Softw. 29(4), 353–372 (2003). https://doi.org/10.1145/962437.962438

  19. Gould, N.I.M., Porcelli, M., Toint, Ph.L.: Updating the regularization parameter in the adaptive cubic regularization algorithm. Comput. Optim. Appl. 53(1), 1–22 (2012). https://doi.org/10.1007/s10589-011-9446-7

  20. Gould, N.I.M., Orban, D., Toint, Ph.L.: CUTEst: a constrained and unconstrained testing environment with safe threads for mathematical optimization. Comput. Optim. Appl. 60, 545–557 (2015). https://doi.org/10.1007/s10589-014-9687-3

  21. Griewank, A.: The modification of Newton’s method for unconstrained optimization by bounding cubic terms. Technical Report NA/12, Department of Applied Mathematics and Theoretical Physics, University of Cambridge (1981)

  22. Hestenes, M.R., Stiefel, E.: Methods of conjugate gradients for solving linear systems. J. Res. Natl. Bur. Stand. 49(6), 409–436 (1952)

  23. Jegerlehner, B.: Krylov space solvers for shifted linear systems. Technical Report IUHET-353, Indiana University, Department of Physics, Bloomington, IN (1996). arXiv:hep-lat/9612014

  24. Jiang, R., Yue, M.-C., Zhou, Z.: An accelerated first-order method with complexity analysis for solving cubic regularization subproblems. Comput. Optim. Appl. 79(2), 471–506 (2021). https://doi.org/10.1007/s10589-021-00274-7

  25. Lanczos, C.: An iteration method for the solution of the eigenvalue problem of linear differential and integral operators. J. Res. Natl. Bur. Stand. 45(4), 255–282 (1950)

  26. Montoison, A., Orban, D., and contributors: Krylov.jl: A Julia basket of hand-picked Krylov methods (2020). https://doi.org/10.5281/zenodo.822073. https://github.com/JuliaSmoothOptimizers/Krylov.jl

  27. Moré, J.J., Sorensen, D.C.: Computing a trust region step. SIAM J. Sci. Stat. Comput. 4(3), 553–572 (1983). https://doi.org/10.1137/0904038

  28. Orban, D., Siqueira, A.S.: JuliaSmoothOptimizers: Infrastructure and solvers for continuous optimization in Julia (2019). https://doi.org/10.5281/zenodo.2655082. https://www.juliasmoothoptimizers.github.io

  29. Orban, D., Siqueira, A.S., and contributors. CUTEst.jl: Julia’s CUTEst interface (2020a). https://doi.org/10.5281/zenodo.1188851. https://github.com/JuliaSmoothOptimizers/CUTEst.jl

  30. Orban, D., Siqueira, A. S., and contributors. NLPModels.jl: Data structures for optimization models (2020b). https://doi.org/10.5281/zenodo.2558627. https://github.com/JuliaSmoothOptimizers/NLPModels.jl

  31. Orban, D., Siqueira, A. S., and contributors. SolverBenchmark.jl: Benchmark tools for solvers (2020c). https://doi.org/10.5281/zenodo.3948381. https://github.com/JuliaSmoothOptimizers/SolverBenchmark.jl

  32. Orban, D., Siqueira, A. S., and contributors. SolverTools.jl: Tools for developing nonlinear optimization solvers (2020d). https://doi.org/10.5281/zenodo.3937408. https://github.com/JuliaSmoothOptimizers/SolverTools.jl

  33. Ortega, J.M.: Numerical Analysis: A Second Course. Number 3 in Classics in Applied Mathematics. SIAM, Philadelphia, PA (1990)

  34. Paige, C.C., Saunders, M.A.: Solution of sparse indefinite systems of linear equations. SIAM J. Numer. Anal. 12(4), 617–629 (1975). https://doi.org/10.1137/0712047

  35. Paige, C.C., Saunders, M.A.: LSQR: an algorithm for sparse linear equations and sparse least squares. ACM Trans. Math. Softw. 8(1), 43–71 (1982). https://doi.org/10.1145/355984.355989

  36. Steihaug, T.: The conjugate gradient method and trust regions in large scale optimization. SIAM J. Numer. Anal. 20(3), 626–637 (1983). https://doi.org/10.1137/0720042

  37. Toint, Ph.L.: Towards an efficient sparsity exploiting Newton method for minimization. In: Duff, I.S. (ed.) Sparse Matrices and Their Uses, pp. 57–88. Academic Press (1981)

  38. van den Eshof, J., Sleijpen, G.L.: Accurate conjugate gradient methods for families of shifted systems. Appl. Numer. Math. 49(1), 17–37 (2004). https://doi.org/10.1016/j.apnum.2003.11.010

Author information

Correspondence to Dominique Orban.

Additional information

Research partially supported by an NSERC Discovery Grant; research supported by IVADO and the Canada First Research Excellence Fund / Apogée; research partially supported by an NSERC Discovery Grant.

Supplementary Information

Supplementary file 1 (pdf 135 KB)

Appendices

Appendix A. Example of an ill-conditioned problem

We examine the iterations of the ARC\(_q\)K method on problem sbrybnd from the CUTEst test set [20], which has an unfavorable eigenvalue distribution. The problem is a Broyden banded system of nonlinear equations with 5000 variables, solved in the least-squares sense. The smallest eigenvalue is of order \(-10^{12}\) at the initial point. With the parameters of Sect. 4, ARC\(_q\)K and ARC both stop because they reach the time limit. Both solvers stop close to a stationary point in the sense that \(\Vert \nabla f(x_k)\Vert \approx {\tilde{\epsilon }} \Vert \nabla f(x_0)\Vert \) with \({\tilde{\epsilon }} = 2.26 \cdot 10^{-5}\) after \(148\) iterations for ARC\(_q\)K, and \({\tilde{\epsilon }} = 2.45 \cdot 10^{-4}\) after \(299\) iterations for ARC.

Figure 5 reports the shift \(\lambda _k\) used by ARC\(_q\)K and the absolute value of the smallest eigenvalue \(\sigma _k\) of \(\nabla ^2 f(x_k)\) as functions of the iteration (left), and the norm of the search direction \(d_k\) (right). As expected, at iterations where \(\sigma _k\) is very negative, \(\lambda _k\) is very large and \(\Vert d_k\Vert \) is small, which induces slow progress. Figure 6 shows the corresponding quantities for ARC, and the conclusions are similar.

Problem sbrybnd illustrates the need to make provision for potentially large shifts on problems with an unfavorable eigenvalue distribution. Both methods make slow progress in regions dominated by very negative eigenvalues. Both manage to solve the problem to an acceptable level, though not to the prescribed stopping tolerance. Of course, large shifts may be unnecessary on many problems. One can easily imagine a refined implementation in which large shifts are only employed when smaller shifts fail to provide a satisfactory search direction. The issue would also be mitigated by an appropriate problem scaling, but that is beyond the scope of the present analysis.

Fig. 5

Left: shifts \(\lambda _k\) used by ARC\(_q\)K and the absolute value of the smallest eigenvalue \(\sigma _k\) of \(\nabla ^2 f(x_k)\) as a function of the iteration. Right: norm of the direction \(d_k\) as a function of the iteration

Fig. 6

Left: shifts used by ARC estimated as the ratio \(\alpha _k / \Vert d_k\Vert \), where \(\alpha _k\) is ARC's regularization parameter and \(d_k\) is the search direction. Right: norm of the direction \(d_k\) as a function of the iteration

Appendix B. Technical results

Denote by \(\sigma _{1,k}\le \cdots \le \sigma _{n,k}\) the eigenvalues of \(\nabla ^2 f(x_k)\), and consider an eigendecomposition \(\nabla ^2 f(x_k) = U_k^T\Sigma _k U_k\), where \(\Sigma _k = \mathrm {diag}(\sigma _{1,k}, \ldots , \sigma _{n,k})\) and \(U_k\) is an orthogonal matrix of eigenvectors.

As described in the preamble of Sect. 3, at each iteration \(k\), the sequence of shifts is defined by

$$\begin{aligned} \lambda ^k_i:= \lambda _0 \psi ^i, \quad i=0, \dots , m, \end{aligned}$$

where \(\lambda _0:= \psi ^{-e_L}\) and \(\psi > 1\).

Assume that there exists a smallest nonnegative integer \({\bar{\imath }}_k\) such that

$$\begin{aligned} \lambda ^k_{{\bar{\imath }}_k}:= \psi ^{{\bar{\imath }}_k}\lambda _0 > \max (0, -\sigma _{1,k}). \end{aligned}$$

Note that such an \({\bar{\imath }}_k\) always exists if the eigenvalues are bounded and \(m\) is sufficiently large. The goal of this section is to prove that there exists a constant \(\beta \) such that (2.2c) is satisfied for some \(i \ge {\bar{\imath }}_k\),

$$\begin{aligned} \frac{1}{\beta } \le \phi (\lambda ^k_{i}):=\frac{\alpha _k \lambda ^k_{i}}{\Vert d(\lambda ^k_{i})\Vert } \le \beta , \end{aligned}$$
(2.2c)

assuming that for all \(\lambda ^k_{i}\), there exists an oracle returning \(d(\lambda ^k_{i})\) satisfying (2.2a) with residual \(r^k_i\). First, the following lemma shows that \(\phi \) is monotonic.
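As a concrete reading of (2.2c), the following hypothetical sketch scans the geometric grid \(\lambda ^k_i = \lambda _0\psi ^i\) and returns the first shift whose ratio \(\phi (\lambda ) = \alpha _k\lambda /\Vert d(\lambda )\Vert \) lands in \([1/\beta , \beta ]\). All names are illustrative; `solve_shifted` stands in for the oracle returning \(d(\lambda )\).

```julia
# Hypothetical sketch of the shift selection implied by (2.2c): return the
# first grid shift λ = λ0 ψ^i whose ratio φ(λ) = α λ / ‖d(λ)‖ lies in [1/β, β].
using LinearAlgebra

function select_shift(solve_shifted, α, λ0, ψ, m, β)
    for i in 0:m
        λ = λ0 * ψ^i
        d = solve_shifted(λ)   # oracle returning d(λ) ≈ -(H + λI)⁻¹ g
        φ = α * λ / norm(d)
        if 1 / β <= φ <= β
            return λ, d        # admissible shift found
        end
    end
    return nothing             # no admissible shift on the grid
end

# Toy usage with an exact dense oracle (illustration only):
H = [4.0 1.0; 1.0 3.0]
g = [1.0, 2.0]
select_shift(λ -> -((H + λ * I) \ g), 1.0, 1e-3, 4.0, 30, 10.0)
```

The lemmas below make this covering argument rigorous: they show that \(\phi \) increases monotonically along the grid and that its log-increments are small enough for some grid point to land in the target interval.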

Lemma B.1

Assume that \(\nabla f(x_k) \ne 0\). Let \(\Gamma _k > 0\) be a constant such that

$$\begin{aligned} \Gamma _k \ge \frac{2}{(\sigma _{1,k} + \lambda ^k_{{\bar{\imath }}_k})^{2}}\sup _{i \ge {\bar{\imath }}_k}\Vert r^k_i\Vert ^2_2, \end{aligned}$$
(B.1)

$$\begin{aligned} \Gamma _k < \frac{2\lambda _0\psi ^{{\bar{\imath }}_k}(\psi - 1) }{(\sigma _{n,k} + \lambda ^k_{m})^3} \Vert \nabla f(x_k)\Vert ^2_2. \end{aligned}$$
(B.2)

Then, the sequence \(i \mapsto \phi (\lambda ^k_{i})\) is strictly increasing for \(i \ge {\bar{\imath }}_k\).

Conditions (B.1) and (B.2) impose a constraint on the precision used to solve the linear system, which is asymptotically less stringent than Assumption 2.6 used in the proofs.

Proof

The beginning of the proof follows the same lines as [8, Section 7.3.1]. Using the eigendecomposition of \(\nabla ^2 f(x_k)\), it follows that

$$\begin{aligned} \Vert d_k(\lambda ^k_i)\Vert _2^2 &= \Vert (\nabla ^2 f(x_k) + \lambda ^k_i I)^{-1}(\nabla f(x_k) - r^k_i) \Vert _2^2 \\ &= \Vert U_k^T(\Sigma _k + \lambda ^k_i I)^{-1}U_k(\nabla f(x_k) - r^k_i) \Vert _2^2 \\ &= \Vert (\Sigma _k + \lambda ^k_i I)^{-1}U_k(\nabla f(x_k) - r^k_i) \Vert _2^2 = \sum _{j=1}^n \frac{(\gamma _{j,k} - [U_k r^k_i]_j)^2}{(\sigma _{j,k} + \lambda ^k_i)^2}, \end{aligned}$$

where \(\gamma _{j,k} = [U_k \nabla f(x_k)]_j\) is the \(j\)-th component of \(U_k \nabla f(x_k)\). Additionally, using the reverse triangle inequality \(\Vert a - b\Vert \ge \Vert a\Vert - \Vert b\Vert \), it holds for all \(i \ge {\bar{\imath }}_k\) that

$$\begin{aligned} \left| \Vert d_k(\lambda ^k_i)\Vert _2^2 - \Vert (\Sigma _k + \lambda ^k_i I)^{-1}U_k\nabla f(x_k)\Vert _2^2 \right| &\le \Vert (\Sigma _k + \lambda ^k_i I)^{-1}U_k r^k_i \Vert _2^2 \\ &\le \Vert r^k_i\Vert ^2_2(\sigma _{1,k} + \lambda ^k_{{\bar{\imath }}_k})^{-2} \overset{(B.1)}{\le } \frac{\Gamma _k}{2}. \end{aligned}$$

We now prove that \(\Vert d_k(\lambda ^k_{i+1})\Vert _2^2 < \Vert d_k(\lambda ^k_i)\Vert _2^2\), which will conclude the proof. For \(i \ge {\bar{\imath }}_k\), using the previous inequality, we get

$$\begin{aligned} \Vert d_k(\lambda ^k_{i+1})\Vert _2^2 - \Vert d_k(\lambda ^k_i)\Vert _2^2 \le \Vert (\Sigma _k + \lambda ^k_{i+1} I)^{-1}U_k\nabla f(x_k)\Vert _2^2 - \Vert (\Sigma _k + \lambda ^k_i I)^{-1}U_k\nabla f(x_k)\Vert _2^2 + \Gamma _k. \end{aligned}$$

Considering the first part of the right-hand side of the previous inequality yields

$$\begin{aligned} &\Vert (\Sigma _k + \lambda ^k_{i+1} I)^{-1}U_k\nabla f(x_k)\Vert _2^2 - \Vert (\Sigma _k + \lambda ^k_i I)^{-1}U_k\nabla f(x_k)\Vert _2^2 \\ &\quad = \sum _{j=1}^{n} \gamma _{j,k}^2 \left( \frac{1}{(\sigma _{j,k} + \lambda ^k_{i+1})^2} - \frac{1}{(\sigma _{j,k} + \lambda ^k_i)^2} \right) \\ &\quad = \sum _{j=1}^{n} \gamma _{j,k}^2 \, \frac{(\sigma _{j,k} + \lambda ^k_i)^2 - (\sigma _{j,k} + \lambda ^k_{i+1})^2}{(\sigma _{j,k} + \lambda ^k_{i+1})^2(\sigma _{j,k} + \lambda ^k_i)^2} \\ &\quad = (\lambda ^k_{i} - \lambda ^k_{i+1})\sum _{j=1}^{n} \gamma _{j,k}^2 \, \frac{\lambda ^k_{i+1} + \lambda ^k_{i} + 2\sigma _{j,k}}{(\sigma _{j,k} + \lambda ^k_{i+1})^2(\sigma _{j,k} + \lambda ^k_i)^2} \\ &\quad \le \lambda _0\psi ^{{\bar{\imath }}_k}(1 - \psi )\sum _{j=1}^{n} \gamma _{j,k}^2 \, \frac{\lambda ^k_{i+1} + \lambda ^k_{i} + 2\sigma _{j,k}}{(\sigma _{j,k} + \lambda ^k_{i+1})^2(\sigma _{j,k} + \lambda ^k_i)^2} \\ &\quad \le \lambda _0\psi ^{{\bar{\imath }}_k}(1 - \psi )\sum _{j=1}^{n} \gamma _{j,k}^2 \, \frac{2}{(\sigma _{j,k} + \lambda ^k_{i+1})^3} \\ &\quad \le \frac{2\lambda _0\psi ^{{\bar{\imath }}_k}(1 - \psi )}{(\sigma _{n,k} + \lambda ^k_{m})^3} \Vert \nabla f(x_k)\Vert ^2_2 < 0. \end{aligned}$$

The proof is complete after combining this relation with (B.2). \(\square \)

The following lemma gives an upper bound, independent of \(i\), on

$$\begin{aligned} {\overline{\kappa }}_{k}(i) := \frac{\sigma _{n,k} + \lambda ^k_{i+1}}{\sigma _{1,k} + \lambda ^k_{i}}. \end{aligned}$$

Lemma B.2

For all \(i \ge {\bar{\imath }}_k\),

$$\begin{aligned} 1 < {\overline{\kappa }}_{k}(i) \le \max \left( {\overline{\kappa }}_{k}({\bar{\imath }}_k), \frac{|\sigma _{n,k}|}{\lambda _0\psi ^{{\bar{\imath }}_k}(\psi - 1)} + \frac{\psi }{1 - \frac{1}{\psi }}\right) =: {\overline{\kappa }}_k. \end{aligned}$$

Proof

The case \(i = {\bar{\imath }}_k\) is straightforward. Let \(i > {\bar{\imath }}_k\). Since \(\sigma _{1,k} + \psi ^{{\bar{\imath }}_k}\lambda _0 > 0\), we have

$$\begin{aligned} \sigma _{1,k} + \psi ^i \lambda _0 \ge -\psi ^{{\bar{\imath }}_k}\lambda _0 + \psi ^i\lambda _0 = \lambda _0\psi ^{{\bar{\imath }}_k}(\psi ^{i - {\bar{\imath }}_k} - 1). \end{aligned}$$

On the other hand,

$$\begin{aligned} \frac{\psi ^{i+1}}{\psi ^i - \psi ^{{\bar{\imath }}_k}} = \frac{\psi }{1 - \frac{\psi ^{{\bar{\imath }}_k}}{\psi ^i}} \le \frac{\psi }{1 - \frac{1}{\psi }}. \end{aligned}$$

Combining the two previous inequalities with the definition of \({\overline{\kappa }}_{k}(i)\) yields

$$\begin{aligned} {\overline{\kappa }}_{k}(i) \le \frac{\sigma _{n,k} + \psi ^{i+1} \lambda _0}{\lambda _0\psi ^{{\bar{\imath }}_k}(\psi ^{i - {\bar{\imath }}_k} - 1)} \le \frac{\sigma _{n,k}}{\lambda _0\psi ^{{\bar{\imath }}_k}(\psi ^{i - {\bar{\imath }}_k} - 1)} + \frac{\psi }{1 - \frac{1}{\psi }}, \end{aligned}$$

and the result follows as \(\psi ^{i - {\bar{\imath }}_k} \ge \psi \). \(\square \)
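A quick numerical spot-check of Lemma B.2, with assumed illustrative values of \(\sigma _{1,k}\), \(\sigma _{n,k}\), \(\lambda _0\) and \(\psi \), confirms that the ratio \({\overline{\kappa }}_{k}(i)\) stays below the bound \({\overline{\kappa }}_k\) along the grid. All values below are assumptions for illustration.

```julia
# Spot-check of Lemma B.2 on illustrative values: κ(i) ≤ κbar for all i ≥ ibar.
σ1, σn = -2.0, 5.0        # assumed extreme eigenvalues of the Hessian
ψ, λ0 = 4.0, 1e-3
λ(i) = λ0 * ψ^i
ibar = findfirst(i -> λ(i) > max(0, -σ1), 0:60) - 1  # smallest admissible index
κ(i) = (σn + λ(i + 1)) / (σ1 + λ(i))
κbar = max(κ(ibar), abs(σn) / (λ0 * ψ^ibar * (ψ - 1)) + ψ / (1 - 1 / ψ))
@assert all(κ(i) <= κbar for i in ibar:ibar+40)
```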

Proposition B.1

Let the assumptions of Lemma B.1 hold. Moreover, let \(\beta _k \ge 1\) be chosen such that

$$\begin{aligned} \beta _k \ge \max \left( \phi (\lambda ^k_{{\bar{\imath }}_k}), \sqrt{\psi {\overline{\kappa }}_k (1 + \Gamma _k)}\right) , \end{aligned}$$
(B.3)

where \(\Gamma _k\) and \({\overline{\kappa }}_k\) are the constants defined in Lemma B.1 and Lemma B.2, respectively. Then, there exists \(i \ge {\bar{\imath }}_k\) such that (2.2c) holds.

Proof

We can further assume that \(\phi (\lambda ^k_{{\bar{\imath }}_k}) < \frac{1}{\beta _k}\); otherwise \(\phi (\lambda ^k_{{\bar{\imath }}_k}) \ge \frac{1}{\beta _k}\) and the result follows from (B.3). Condition (2.2c) can be reformulated as

$$\begin{aligned} \frac{1}{\beta _k} \le \phi (\lambda ) \le \beta _k \Longleftrightarrow -\log (\beta _k) \le \log (\alpha _k \lambda ) - \log (\Vert d(\lambda )\Vert ) \le \log (\beta _k), \end{aligned}$$
(B.4)

so the targeted interval has length \(2\log (\beta _k)\). By Lemma B.1, the sequence \(i \mapsto \phi (\lambda ^k_{i})\) is strictly increasing, and \(\log (\phi (\lambda ^k_{{\bar{\imath }}_k})) < \log (\beta _k)\) by (B.3). We now prove that

$$\begin{aligned} 0< \log \left( \phi (\lambda ^k_{i+1}) \right) - \log \left( \phi (\lambda ^k_{i}) \right) < 2\log (\beta _k), \quad i \ge {\bar{\imath }}_k, \end{aligned}$$
(B.5)

which will conclude the proof.

Let \(i \ge {\bar{\imath }}_k\). It follows from (2.2a) that

$$\begin{aligned} (\nabla ^2 f(x_k) + \lambda ^k_{i+1} I)d_k(\lambda ^k_{i+1}) - (\nabla ^2 f(x_k) + \lambda ^k_i I)d_k(\lambda ^k_{i}) = r^k_{i+1} - r^k_i. \end{aligned}$$

Taking norms on both sides and using the triangle inequality yields

$$\begin{aligned} \Vert d_k(\lambda ^k_i)\Vert &\le \Vert (\nabla ^2 f(x_k) + \lambda ^k_{i} I)^{-1} (\nabla ^2 f(x_k) + \lambda ^k_{i+1} I)d_k(\lambda ^k_{i+1})\Vert + \Vert (\nabla ^2 f(x_k) + \lambda ^k_{i} I)^{-1}(r^k_{i+1} - r^k_i)\Vert \\ &\le \frac{\sigma _{n,k} + \lambda ^k_{i+1}}{\sigma _{1,k} + \lambda ^k_{i}} \Vert d_k(\lambda ^k_{i+1})\Vert + \frac{\Vert r^k_{i+1} - r^k_i\Vert }{\sigma _{1,k} + \lambda ^k_{{\bar{\imath }}_k}}. \end{aligned}$$

Additionally, \(\lambda ^k_i = \psi ^i \lambda _0\) gives

$$\begin{aligned} \log ( \phi (\lambda ^k_{i})) = \log \left( \frac{\alpha _k \lambda ^k_{i}}{\Vert d(\lambda ^k_{i})\Vert }\right) = \log (\alpha _k) + \log (\lambda _0) + i \log (\psi ) - \log (\Vert d(\lambda ^k_{i})\Vert ). \end{aligned}$$

Therefore, using that \(\log \) is increasing and that \(x \mapsto \log (1+x)\) is subadditive,

$$\begin{aligned} \log \left( \phi (\lambda ^k_{i+1}) \right) - \log \left( \phi (\lambda ^k_{i}) \right) &= \log (\psi ) - \log (\Vert d(\lambda ^k_{i+1})\Vert ) + \log (\Vert d(\lambda ^k_{i})\Vert ) \\ &\le \log (\psi ) + \log ({\overline{\kappa }}_{k}) + \log \left( 1 + \frac{\Vert r^k_{i+1} - r^k_i\Vert }{\sigma _{1,k} + \lambda ^k_{{\bar{\imath }}_k}} \right) \\ &\le 2\log (\beta _k), \end{aligned}$$

which proves (B.5) and the result. \(\square \)
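The covering argument can also be sanity-checked numerically: if \(\log \phi \) starts below \(\log \beta _k\) and increases by less than \(2\log \beta _k\) per grid step, the first crossing of \(1/\beta _k\) cannot overshoot \(\beta _k\). The toy strictly increasing sequence below, with assumed values, illustrates this.

```julia
# Toy check of the covering argument (B.5): a monotone φ whose log-increments
# stay below 2 log β must land in [1/β, β] at its first crossing of 1/β.
β = 3.0
φ(i) = 1e-4 * 2.0^i       # assumed toy sequence; step log 2 < 2 log β
i0 = findfirst(i -> φ(i) >= 1 / β, 0:100) - 1
@assert 1 / β <= φ(i0) <= β
```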

Proposition B.2

Let \(\{x_k\}\) be a sequence generated by Algorithm 3. Assume that the eigenvalues of \(\nabla ^2 f(x_k)\) are bounded for all \(k\). Assume further that there exists a constant \(\phi ^{\max }\) such that

$$\begin{aligned} \phi (\lambda ^k_{{\bar{\imath }}_k}) \le \phi ^{\max }. \end{aligned}$$
(B.6)

Then, there exists a constant \(\beta \ge 1\) such that (B.5) holds for all \(k\). Furthermore, for all \(k\), there exists an \(i \ge {\bar{\imath }}_k\) such that (2.2c) holds.

Proof

Using Lemma B.2 and that the eigenvalues of \(\nabla ^2 f(x_k)\) are bounded for all k, it is clear that \(\beta \) can be chosen greater than \(\phi ^{\max }\) and \(\sqrt{\psi {\overline{\kappa }}_k (1 + \Gamma _k)}\) for all k. \(\square \)

Condition (B.6) further imposes that the shifts \(\lambda ^k_i\) cover sufficiently small values.

Cite this article

Dussault, J.-P., Migot, T., Orban, D.: Scalable adaptive cubic regularization methods. Math. Program. (2023). https://doi.org/10.1007/s10107-023-02007-6