
On the geometric analysis of a quartic–quadratic optimization problem under a spherical constraint

Full Length Paper · Series A · Mathematical Programming

Abstract

This paper considers a special quartic–quadratic optimization problem with a single spherical constraint, namely, finding global and local minimizers of \(\frac{1}{2}\mathbf {z}^{*}A\mathbf {z}+\frac{\beta }{2}\sum _{k=1}^{n}|z_{k}|^{4}\) such that \(\Vert \mathbf {z}\Vert _{2}=1\). The problem arises in several domains, including quantum mechanics and chemistry, and we investigate its geometric properties. Fourth-order optimality conditions are derived to characterize local and global minima. When the matrix in the quadratic term is diagonal, the problem has no spurious local minima and its global solutions can be represented explicitly and computed in \(O(n\log {n})\) operations. When A is a rank-one matrix, the global minima of the problem are unique under certain phase shift schemes. The strict-saddle property, which can imply polynomial-time convergence of second-order-type algorithms, is established when the coefficient \(\beta \) of the quartic term is either at least \(O(n^{3/2})\) or not larger than O(1). Finally, the Kurdyka–Łojasiewicz exponent of the quartic–quadratic problem is estimated, and it is shown that the largest exponent is at least 1/4 for a broad class of stationary points.
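Since the appendix argument below is phrased for real vectors, the following minimal sketch (an illustration with placeholder data, not code from the paper) evaluates the real-valued restriction of this objective and its Riemannian gradient on the unit sphere; the symmetric matrix A, the coefficient beta, and the test point are assumptions of the example.

```python
import numpy as np

def f(y, A, beta):
    """Objective 0.5 * y^T A y + (beta/2) * sum(y_k^4) for real y on the unit sphere."""
    return 0.5 * y @ A @ y + 0.5 * beta * np.sum(y**4)

def riemannian_grad(y, A, beta):
    """Tangent-space projection of the Euclidean gradient A y + 2*beta*y^3."""
    g = A @ y + 2.0 * beta * y**3
    return g - (y @ g) * y  # P_y^perp g = g - (y^T g) y

rng = np.random.default_rng(0)
n, beta = 8, 1.0
B = rng.standard_normal((n, n))
A = 0.5 * (B + B.T)              # random symmetric matrix for the quadratic term
y = rng.standard_normal(n)
y /= np.linalg.norm(y)           # random point on the sphere S^{n-1}
print(f(y, A, beta), np.linalg.norm(riemannian_grad(y, A, beta)))
```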



Acknowledgements

The authors are grateful to Prof. Adrian Lewis, the Associate Editor and the anonymous reviewers for their valuable comments and suggestions that helped to improve the quality of our manuscript.

Author information

Correspondence to Zaiwen Wen.

Additional information


H. Zhang was partly supported by the elite undergraduate training program of School of Mathematical Sciences in Peking University. Z. Wen was supported in part by the NSFC Grant 11831002, the Key-Area Research and Development Program of Guangdong Province (No. 2019B121204008) and Beijing Academy of Artificial Intelligence. A. Milzarek was partly supported by the Beijing International Center for Mathematical Research, Peking University, the Boya Postdoctoral Fellowship Program, the Shenzhen Institute for Artificial Intelligence and Robotics for Society (AIRS), and by the Fundamental Research Fund - Shenzhen Research Institute of Big Data (SRIBD) Startup Fund JCYJ-AM20190661.

A Proof of Theorem 6.2

Proof

As in the proof of Theorem 6.1, the verification of Theorem 6.2 is mainly based on proper decompositions of \(\Delta = \mathbf{y}-\mathbf{z}\) and \(\mathrm{diag}(\tau )\mathbf{y}\) that allow us to derive appropriate bounds on \(|f(\mathbf{y}) - f(\mathbf{z})|\) and \(\Vert \mathrm {grad\!\;}{f(\mathbf{y})}\Vert \) whenever \(\mathbf{y}\) is close to \(\mathbf{z}\) or, equivalently, whenever \(\Delta \) is sufficiently small. In particular, we will discuss three cases that allow us to simplify and estimate the expressions for \(|f(\mathbf{y}) - f(\mathbf{z})|\) and \(\Vert \mathrm {grad\!\;}{f(\mathbf{y})}\Vert \) by identifying the different leading terms.

Without loss of generality we assume \(\beta =1\). Let \(\mathbf{y} \in \mathbb {S}^{n-1}\) be arbitrary and let us set \(\Delta = \mathbf{y} - \mathbf{z}\), and

$$\begin{aligned}&\gamma _{1} = \sum _{k\in \mathcal {I}}z_{k}^{3}\Delta _{k},\quad \gamma _{2} = \sum _{k\in \mathcal {I}} z_{k}^2\Delta _{k}^2, \quad \gamma _{3}=\sum _{k\in \mathcal {I}} z_{k}\Delta _{k}^{3},\quad \gamma _{4}= \Vert \Delta \Vert _4^4. \end{aligned}$$
(A.1)

Based on the representation \(\mathrm {grad\!\;}{f(\mathbf{y})} = P_\mathbf{y}^\bot [H + 2 \mathrm{diag}(\tau )]\mathbf{y}\), we now introduce the following decompositions

$$\begin{aligned}&2\mathrm{diag}(\tau )\mathbf{y} = \mathbf{w} + c_{1} \mathbf{y}, \quad \mathbf{w} = 2 P_\mathbf{y}^\bot \mathrm{diag}(\tau )\mathbf{y} , \quad c_{1} = 2 \mathbf{y}^T \mathrm{diag}(\tau )\mathbf{y} \nonumber \\&{\Delta } = \mathbf{u} + \mathbf{v}, \quad H\mathbf{u} = 0, \quad \mathbf{u}^T \mathbf{v} = 0, \quad \Vert H\mathbf{v}\Vert \ge \sigma _-(H)\Vert \mathbf{v}\Vert , \end{aligned}$$
(A.2)

where \(\sigma _-(H)\) denotes the smallest positive singular value of H. Using this decomposition, (6.5), and \(H\mathbf{y} = H\Delta \), we can express the norm of the Riemannian gradient as follows

$$\begin{aligned} \Vert \mathrm {grad\!\;}{f(\mathbf{y})}\Vert ^2&= \Vert P_\mathbf{y}^\bot [H + 2 \mathrm{diag}(\tau )]\mathbf{y}\Vert ^{2} = \Vert P_\mathbf{y}^\bot [H\mathbf{y} + \mathbf{w} + c_1\mathbf{y}]\Vert ^{2}\nonumber \\&= \Vert P_\mathbf{y}^\bot [H\Delta + \mathbf{w}]\Vert ^{2} = \Vert H\Delta + \mathbf{w}\Vert ^{2} - (\mathbf{y}^{T}H\Delta + \mathbf{y}^{T}{} \mathbf{w})^{2}\nonumber \\&= \Vert H{\Delta } + \mathbf{w}\Vert ^2 - ({\Delta }^T H{\Delta })^2 = \Vert H\mathbf{v} + \mathbf{w}\Vert ^2 - (\mathbf{v}^T H\mathbf{v})^2. \end{aligned}$$
(A.3)

Let \(\lambda _+(H)\) be the largest eigenvalue of H. Then, by definition of \(\mathbf{v}\), we obtain

$$\begin{aligned} |\mathbf{v}^{T}H\mathbf{v}|&\le \lambda _+(H) \Vert \mathbf{v}\Vert ^2 \le \lambda _+(H){\sigma _{-}(H)}^{-2} \Vert H\mathbf{v}\Vert ^2 =: {\bar{\sigma }}^{-1} \cdot \mathbf{v}^{T} H^{2} \mathbf{v}. \end{aligned}$$
(A.4)

Moreover, Lemma 6.1 yields

$$\begin{aligned} \Vert \tau \Vert ^\frac{3}{2} \le \eta _1 \Vert \mathbf{w}\Vert \end{aligned}$$
(A.5)

for some constant \(\eta _{1}>0\) and for all \(\mathbf{y} \in \mathbb {S}^{n-1}\) sufficiently close to \(\mathbf{z}\). Throughout the proof, we will also repeatedly use the following facts:

$$\begin{aligned} 2 \mathbf{y}^{T}\Delta&= \Vert \mathbf{y}\Vert ^{2}+\Vert \Delta \Vert ^{2}-\Vert \mathbf{y}-\Delta \Vert ^{2} = \Vert \Delta \Vert ^{2},\nonumber \\ \mathbf{z}^{T}\Delta&=\mathbf{y}^{T}\Delta - \Vert \Delta \Vert ^{2}=-\frac{1}{2}\Vert \Delta \Vert ^{2}. \end{aligned}$$
(A.6)
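Both identities follow solely from \(\Vert \mathbf{y}\Vert = \Vert \mathbf{z}\Vert = 1\); a quick numerical sanity check (an illustration, not part of the proof):

```python
import numpy as np

# Check 2*y^T Delta = ||Delta||^2 and z^T Delta = -||Delta||^2/2 for unit vectors y, z.
rng = np.random.default_rng(1)
n = 6
z = rng.standard_normal(n); z /= np.linalg.norm(z)
y = rng.standard_normal(n); y /= np.linalg.norm(y)
d = y - z
print(np.isclose(2 * y @ d, d @ d))        # True
print(np.isclose(z @ d, -0.5 * (d @ d)))   # True
```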

Let \(\epsilon \in (0,1]\) be an arbitrary, small positive constant. We now discuss three different cases.

Case 1. \(\Vert \mathbf {w}\Vert \ge (1+\epsilon ) \Vert H\mathbf{v}\Vert \) or \(\Vert \mathbf {w}\Vert \le (1-\epsilon )\Vert H\mathbf{v}\Vert \). We first assume \(\mathbf{w} \ne 0\) and \(\epsilon < 1\). Since the function \(x \mapsto \varrho (x) := x + 1/x\) is monotonically decreasing on the interval \((0,1-\epsilon ]\) and monotonically increasing on \([1+\epsilon ,\infty )\), it follows

$$\begin{aligned} \varrho (x) \ge 1-\epsilon + \frac{1}{1-\epsilon } \quad \text {for all } x \in (0,1-\epsilon ] \quad \text {and} \quad \varrho (x) \ge 1+\epsilon + \frac{1}{1+\epsilon } \quad \text {for all } x \ge 1+\epsilon . \end{aligned}$$

Thus, we obtain

$$\begin{aligned} \frac{\Vert \mathbf{w}\Vert }{\Vert H\mathbf{v}\Vert }+\frac{\Vert H\mathbf{v}\Vert }{\Vert \mathbf{w}\Vert } \ge \min \left\{ 1-\epsilon + \frac{1}{1-\epsilon },1+\epsilon + \frac{1}{1+\epsilon }\right\} = \frac{(1+\epsilon )^2+1}{1+\epsilon }, \end{aligned}$$

which further implies

$$\begin{aligned} \Vert H\mathbf{v} + \mathbf{w}\Vert ^{2}&\ge \Vert \mathbf{w}\Vert ^{2}+\Vert H\mathbf{v}\Vert ^{2} - 2\Vert \mathbf{w}\Vert \Vert H\mathbf{v}\Vert \\&= \left[ 1 - 2\left( \frac{\Vert \mathbf{w}\Vert }{\Vert H\mathbf{v}\Vert }+\frac{\Vert H\mathbf{v}\Vert }{\Vert \mathbf{w}\Vert }\right) ^{-1}\right] \cdot \left( \Vert \mathbf{w}\Vert ^{2}+\Vert H\mathbf{v}\Vert ^{2}\right) \\&\ge \frac{\epsilon ^{2}}{(1+\epsilon )^2+1} \left( \Vert \mathbf{w}\Vert ^{2}+\Vert H\mathbf{v}\Vert ^{2} \right) \ge \frac{\epsilon ^{2}}{5} \left( \Vert \mathbf{w}\Vert ^{2}+\Vert H\mathbf{v}\Vert ^{2} \right) . \end{aligned}$$
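The scalar inequality underlying this step, \((a-b)^2 \ge \frac{\epsilon ^{2}}{(1+\epsilon )^2+1}\,(a^{2}+b^{2})\) whenever \(a \ge (1+\epsilon )b\) or \(a \le (1-\epsilon )b\) for \(a,b > 0\), can be tested on random samples; the following sketch (illustration only) does so:

```python
import numpy as np

rng = np.random.default_rng(4)
eps = 0.3
for _ in range(10000):
    b = rng.uniform(0.01, 10.0)
    # Sample a with either a <= (1 - eps)*b or a >= (1 + eps)*b.
    a = b * rng.choice([rng.uniform(1e-3, 1.0 - eps), rng.uniform(1.0 + eps, 10.0)])
    assert (a - b)**2 >= eps**2 / ((1 + eps)**2 + 1) * (a**2 + b**2) - 1e-12
print("inequality verified on random samples")
```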

Next, we choose \(\delta _\mathbf{z} > 0\) sufficiently small such that \(\Vert \mathbf{v}\Vert ^2 \le \epsilon ^2 {\bar{\sigma }} / (10\lambda _+(H))\) and \(|\mathbf{v}^TH\mathbf{v}| \le 1\). (This is possible due to the decomposition (A.2) and \(\Vert \mathbf{v}\Vert \le \Vert \Delta \Vert \)). Using (A.4), (A.5), the estimate \([\frac{1}{2} |a + b|]^{3/2} \le [|a|^{3/2} + |b|^{3/2}] / \sqrt{2}\), \(a,b \in \mathbb {R}\), and setting \({\tilde{\eta }}_1 := \min \{\eta _1^{-2},\frac{{\bar{\sigma }}}{2}\}\), it follows

$$\begin{aligned} \Vert \mathrm {grad\!\;}{f(\mathbf {y})}\Vert ^{2}&\ge \frac{\epsilon ^{2}}{5}(\Vert \mathbf {w}\Vert ^{2}+\Vert H\mathbf{v}\Vert ^{2})-\lambda _{+}(H)\Vert \mathbf{v}\Vert ^{2} |\mathbf{v}^T H\mathbf{v}|\\&\ge \frac{\epsilon ^{2}}{10}(2\Vert \mathbf {w}\Vert ^{2}+2\Vert H\mathbf{v}\Vert ^{2})-\lambda _{+}(H) \cdot \epsilon ^2\bar{\sigma }/(10\lambda _+(H)) \cdot \bar{\sigma }^{-1}\Vert H\mathbf{v}\Vert ^2\\&=\frac{\epsilon ^{2}}{10}\left( 2\Vert \mathbf {w}\Vert ^{2}+\Vert H\mathbf{v}\Vert ^{2}\right) \ge \frac{\epsilon ^2}{5} \min \left\{ \frac{1}{\eta _{1}^{2}},\frac{{\bar{\sigma }}}{2} \right\} \left[ \Vert \tau \Vert ^3 + |\mathbf{v}^T H \mathbf{v}| \right] \\&\ge \frac{\epsilon ^2{\tilde{\eta }}_1}{5} \left[ \Vert \tau \Vert ^3 + |\mathbf{v}^T H \mathbf{v}|^\frac{3}{2} \right] \ge \frac{\epsilon ^2 {\tilde{\eta }}_1}{10} \left| \Vert \tau \Vert ^2 + \mathbf{v}^T H \mathbf{v} \right| ^{\frac{3}{2}}\\&= \frac{\epsilon ^2{\tilde{\eta }}_1\sqrt{2}}{5} | f(\mathbf{y}) - f(\mathbf{z}) |^\frac{3}{2}, \end{aligned}$$

where the last equality is a consequence of (6.4). In other words, \(\Vert \mathrm {grad\!\;}{f(\mathbf{y})}\Vert \ge c\, |f(\mathbf{y}) - f(\mathbf{z})|^{\frac{3}{4}}\) for some constant \(c>0\), i.e., the Łojasiewicz inequality holds with \(\theta = \frac{1}{4}\). Thus, we can infer that the largest KL exponent of problem (1.1) at \(\mathbf{z}\) is at least \(\frac{1}{4}\).

Case 2. \((2-{\epsilon })r_{+}^{2}\Vert \Delta _{\mathcal {I}} \Vert \ge \Vert \Delta \Vert ^2\) or \(\gamma _1 \le (2-\epsilon )r_+^2 \Vert \Delta _\mathcal {I}\Vert ^2 \Vert \Delta \Vert ^{-2}\). First, utilizing the identity \(\Delta ^T \mathbf{y} = \frac{1}{2}\Vert \Delta \Vert ^2\) in (A.6), we have \(P_{\mathbf {y}}^{\perp }\Delta =\Delta -\frac{1}{2}\Vert \Delta \Vert ^{2}\mathbf {y}\). Defining \(t_\Delta := \Delta ^T P_{\mathbf {y}}^{\perp }\Delta = \Vert \Delta \Vert ^2 - \frac{1}{4} \Vert \Delta \Vert ^4\), we will now work with the following additional decompositions

$$\begin{aligned}&H\Delta = \Delta ^{T}H\Delta \cdot \mathbf {y}+c_{2} P_{\mathbf {y}}^{\perp } \Delta +\mathbf {w}_{1}, \quad c_{2}=\frac{1-\frac{1}{2}\Vert \Delta \Vert ^{2}}{t_\Delta }\Delta ^{T}H\Delta ,\\&\mathbf {w}=c_{3} P_{\mathbf {y}}^{\perp }\Delta +\mathbf {w}_{2}, \quad c_{3}= \frac{1}{t_\Delta }[2\Delta ^T\mathrm{diag}(\tau )\mathbf{y} - \Vert \Delta \Vert ^2 \mathbf{y}^T\mathrm{diag}(\tau )\mathbf{y}]. \end{aligned}$$

We note that due to the choice of \(c_2\) and \(c_3\) and (A.2), the vectors \(\mathbf {w}_{1}\) and \(\mathbf {w}_{2}\) are orthogonal to \(\mathbf {y}\) and \(\Delta \). Hence, by (A.3), it holds that

$$\begin{aligned} \Vert \mathrm {grad\!\;}{f(\mathbf {y})}\Vert ^{2}&= \Vert H\Delta \Vert ^2 + 2\mathbf{w}^T H\Delta + \Vert \mathbf{w}\Vert ^2 - (\Delta ^T H \Delta )^2 \\&= (c_2^2 + 2 c_2c_3 + c_3^2)\Vert P_\mathbf{y}^\bot \Delta \Vert ^2 + \Vert \mathbf{w}_1\Vert ^2 + 2 \mathbf{w}_1^T\mathbf{w}_2 + \Vert \mathbf{w}_2\Vert ^2 \\&= (c_{2}+c_{3})^{2} \cdot t_\Delta +\Vert \mathbf {w}_{1}+\mathbf {w}_{2}\Vert ^{2} \ge (c_{2}+c_{3})^{2} \cdot t_\Delta . \end{aligned}$$

Recalling the definitions introduced in (A.1) and using \(\tau _k = (y_k - z_k)(y_k + z_k) = \Delta _k (\Delta _k + 2z_k)\) and \(y_k = \Delta _k + z_k\), we can express \(t_\Delta \cdot c_{3}\) via

$$\begin{aligned} t_\Delta c_{3}&= 2\sum \limits _{k\in [n]} \Delta _k^2(\Delta _k+2z_k)(\Delta _k+z_k) \nonumber \\&\quad -\Vert \Delta \Vert ^{2}\sum \limits _{k\in [n]} \Delta _k(\Delta _{k}+2z_{k})(z_{k}+\Delta _{k})^{2} \nonumber \\&= \sum \limits _{k \in [n]} 2 [\Delta _k^4 + 3 \Delta _k^3z_k + 2\Delta _k^2z_k^2] \nonumber \\&\quad - \Vert \Delta \Vert ^2 [\Delta _k^4 + 4 \Delta _k^3z_k + 5\Delta _k^2z_k^2 + 2\Delta _kz_k^3] \nonumber \\&= (2-\Vert \Delta \Vert ^2) \gamma _4 + (6 - 4 \Vert \Delta \Vert ^2) \gamma _3 + (4-5\Vert \Delta \Vert ^2) \gamma _2 - 2 \Vert \Delta \Vert ^2 \gamma _1. \end{aligned}$$
(A.7)
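The expansion behind (A.7) is purely mechanical; as a sanity check (not part of the proof), the two per-coordinate identities it relies on can be verified symbolically:

```python
import sympy as sp

d, z = sp.symbols('Delta z', real=True)
y = z + d                  # y_k = z_k + Delta_k
tau = d * (d + 2 * z)      # tau_k = y_k^2 - z_k^2 = Delta_k * (Delta_k + 2 z_k)

# Summand of 2 * Delta^T diag(tau) y in the first line of (A.7):
assert sp.expand(2 * d * tau * y) == sp.expand(2 * (d**4 + 3 * d**3 * z + 2 * d**2 * z**2))
# Summand of y^T diag(tau) y in the second line of (A.7):
assert sp.expand(y**2 * tau) == sp.expand(d**4 + 4 * d**3 * z + 5 * d**2 * z**2 + 2 * d * z**3)
print("per-coordinate expansions in (A.7) verified")
```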

Notice that we have \(\gamma _2 \ge r_+^2 \Vert \Delta _\mathcal {I}\Vert ^2\) and, by the Cauchy–Schwarz inequality and \(|z_k| \le 1\), \(\gamma _1 \le \Vert \Delta _\mathcal {I}\Vert \). Now, if \((2-{\epsilon })r_{+}^{2}\Vert \Delta _{\mathcal {I}}\Vert \ge \Vert \Delta \Vert ^{2}\), we obtain

$$\begin{aligned} (4-5\Vert \Delta \Vert ^2) \gamma _2 - 2\Vert \Delta \Vert ^2 \gamma _1&\ge (4 - 5\Vert \Delta \Vert ^2) r_+^2 \Vert \Delta _\mathcal {I}\Vert ^2 - 2\Vert \Delta \Vert ^2 \Vert \Delta _\mathcal {I}\Vert \\&\ge (2\epsilon - 5\Vert \Delta \Vert ^2) r_+^2 \Vert \Delta _\mathcal {I}\Vert ^2 \end{aligned}$$

and hence, it follows

$$\begin{aligned} t_\Delta \cdot c_{3} \ge \gamma _4 + \epsilon r_+^2 \Vert \Delta _\mathcal {I}\Vert ^2 + o(\Vert \Delta _\mathcal {I}\Vert ^2) \ge \Vert \Delta \Vert _4^4 + \frac{\epsilon r_+^2}{2} \Vert \Delta _\mathcal {I}\Vert ^2 \end{aligned}$$
(A.8)

for \(\Delta \) sufficiently small. Otherwise, if \(\gamma _1 \le (2-\epsilon )r_+^2 \Vert \Delta _\mathcal {I}\Vert ^2 \Vert \Delta \Vert ^{-2}\), then we also have \((4-5\Vert \Delta \Vert ^2) \gamma _2 - 2 \Vert \Delta \Vert ^2 \gamma _1 \ge (2\epsilon - 5\Vert \Delta \Vert ^2) r_+^2 \Vert \Delta _\mathcal {I}\Vert ^2\) and thus, (A.8) holds in both sub-cases. Consequently, due to the positive semidefiniteness of H, (A.8), and

$$\begin{aligned} \Vert \Delta \Vert ^2 = - 2 \Delta ^T \mathbf{z} = -2\Delta _\mathcal {I}^T z_\mathcal {I}\le 2 \Vert \Delta _\mathcal {I}\Vert \Vert z_\mathcal {I}\Vert \le 2 \Vert \Delta _\mathcal {I}\Vert , \end{aligned}$$
(A.9)

we can infer

$$\begin{aligned} \Vert \mathrm {grad\!\;}{f(\mathbf {y})}\Vert&\ge |c_{2}+c_{3}| \sqrt{t_\Delta } = \left[ \left( 1-\frac{1}{2}\Vert \Delta \Vert ^2\right) \Delta ^{T}H\Delta + t_\Delta c_{3}\right] t_\Delta ^{-1/2} \\&\ge [\Delta ^{T}H\Delta + 2 \Vert \Delta \Vert _4^4 +\epsilon r_{+}^{2}\Vert \Delta _{\mathcal {I}}\Vert ^{2}] \cdot (2\Vert \Delta \Vert )^{-1} \\&\ge [\Delta ^{T}H\Delta + 2 \Vert \Delta \Vert _4^4 +\epsilon r_{+}^{2}\Vert \Delta _{\mathcal {I}}\Vert ^{2}]^{\frac{3}{4}} \cdot \frac{\epsilon ^{\frac{1}{4}}\sqrt{r_{+} \Vert \Delta _{\mathcal {I}}\Vert }}{2\Vert \Delta \Vert }\\&\ge \eta _2 [\Delta ^{T}H\Delta + 2 \Vert \Delta \Vert _4^4 +\epsilon r_{+}^{2}\Vert \Delta _{\mathcal {I}}\Vert ^{2}]^{\frac{3}{4}}, \end{aligned}$$

where \(\eta _2 := \epsilon ^{\frac{1}{4}}\sqrt{r_{+}} / (2\sqrt{2})\), provided that \(\Delta \) is chosen sufficiently small. Here, we also used the estimates \(t_{\Delta }=\Vert \Delta \Vert ^{2}-\Vert \Delta \Vert ^{4}/4\le \Vert \Delta \Vert ^{2}\) and \(1- \Vert \Delta \Vert ^2/2 \ge 1/2\) in the second inequality. The third inequality follows from \([\Delta ^{T}H\Delta + 2 \Vert \Delta \Vert _4^4 +\epsilon r_{+}^{2}\Vert \Delta _{\mathcal {I}}\Vert ^{2}]^\frac{1}{4}\ge \epsilon ^\frac{1}{4} \sqrt{r_{+}\Vert \Delta _{\mathcal {I}}\Vert }\) and the last one from \(\Vert \Delta _{\mathcal {I}}\Vert \ge \Vert \Delta \Vert ^{2}/2\), see (A.9). Next, utilizing (6.4) and \(|\tau _k| \le 2 |\Delta _k|\) for all \(k \in \mathcal {I}\), we finally obtain

$$\begin{aligned} |f(\mathbf {y})-f(\mathbf {z})|&=\frac{1}{2}[ \Delta ^{T}H\Delta + \Vert \tau \Vert ^2] \\&\le \frac{1}{2}\left[ \Delta ^{T}H\Delta + \Vert \Delta _\mathcal {A}\Vert _4^4 + 4\Vert \Delta _{\mathcal {I}}\Vert ^{2} \right] \le \eta _{3} \Vert \mathrm {grad\!\;}{f(\mathbf {y})}\Vert ^{\frac{4}{3}} \end{aligned}$$

for some constant \(\eta _{3}>0\) and for all \(\mathbf{y}\) sufficiently close to \(\mathbf{z}\). Hence, the largest KL exponent is at least \(\frac{1}{4}\) in this case.

Case 3. \((1-\epsilon )\Vert H\mathbf{v}\Vert \le \Vert \mathbf {w}\Vert \le (1+\epsilon )\Vert H\mathbf{v}\Vert \), \(\gamma _{1}\ge (2-\epsilon )r_{+}^{2}\Vert \Delta _{\mathcal {I}} \Vert ^{2}\Vert \Delta \Vert ^{-2}\), and \(\left( 2-\epsilon \right) r_{+}^{2}\Vert \Delta _{\mathcal {I}}\Vert \le \Vert \Delta \Vert ^{2}\). In this case, the inequality (A.9) and the last condition together imply \(\Vert \Delta _{\mathcal {I}}\Vert =\Theta (\Vert \Delta \Vert ^{2})\). Setting \(\nu = [(2-\epsilon )r_+^2]^{-1}\), the terms \(\gamma _{i}\), \(i=1,2,3\), can be estimated as follows:

$$\begin{aligned} (4\nu )^{-1} \Vert \Delta \Vert ^2&\le (2-\epsilon )r_{+}^{2}\Vert \Delta _{\mathcal {I}}\Vert ^{2}\Vert \Delta \Vert ^{-2} \le \gamma _{1}\le \Vert \Delta _{\mathcal {I}}\Vert _{1} \le |{\mathcal {I}}|\nu \Vert \Delta \Vert ^2,\nonumber \\&\quad \frac{r_+^2}{4} \Vert \Delta \Vert ^4 \le r_{+}^{2}\Vert \Delta _{\mathcal {I}}\Vert ^{2} \le \gamma _{2}\le \Vert \Delta _{\mathcal {I}}\Vert ^{2} \le \nu ^2 \Vert \Delta \Vert ^4,\nonumber \\&\quad -\nu ^3 \Vert \Delta \Vert ^6 \le -\Vert \Delta _{\mathcal {I}}\Vert _3^{3} \le \gamma _{3}\le \Vert \Delta _{\mathcal {I}}\Vert _3^{3} \le \nu ^3 \Vert \Delta \Vert ^6. \end{aligned}$$
(A.10)

Together with \(\gamma _4 = \Vert \Delta \Vert _4^4\), this shows that

$$\begin{aligned} \gamma _{1} = \Theta (\Vert \Delta \Vert ^2) ,\quad \gamma _{2} = \Theta ( \Vert \Delta \Vert ^4), \quad \gamma _{3}=O(\Vert \Delta \Vert ^{6}),\quad \gamma _{4}=\Theta (\Vert \Delta \Vert ^{4}) \end{aligned}$$
(A.11)

for all sufficiently small \(\Delta \) satisfying the conditions of this case. Let us set \(m := |\mathcal {I}|\) and define \(\sigma _{k}= \Delta _{k} z_{k}+\frac{1}{2m}\Vert \Delta \Vert ^{2}\) for all \(k\in \mathcal {I}\) and \(\sigma _k = 0\) for all \(k \in \mathcal {A}\). Then, by (A.6), we have

$$\begin{aligned} \sum _{k \in [n]} \sigma _k = 0, \quad \Vert \sigma \Vert _1 \ge \sum _{k\in \mathcal {I}} z_{k}^{2}\sigma _{k} = \gamma _1 + \frac{1}{2m} \Vert \Delta \Vert ^2, \end{aligned}$$
(A.12)

and \(\Vert \sigma \Vert = \Theta (\Vert \Delta \Vert ^2)\). We now express \(\Vert \mathbf {w}\Vert ^{2}\) in terms of \(\gamma _{1}\), \(\gamma _2\), \(\gamma _3\), and \(\gamma _{4}\). Specifically, by utilizing (A.2), (A.11) and by mimicking the derivation of (A.7), we obtain

$$\begin{aligned} \frac{1}{4} \Vert \mathbf {w}\Vert ^{2}&= \Vert P_\mathbf{y}^\bot \mathrm{diag}(\tau )\mathbf{y}\Vert ^{2} = \mathbf{y}^T \mathrm{diag}(|\tau |^2)\mathbf{y} - (\mathbf{y}^T\mathrm{diag}(\tau )\mathbf{y})^2 \\&= \sum \limits _{k \in [n]} (z_k+\Delta _k)^2\Delta _k^2(\Delta _k+2z_k)^2 - (\Vert \Delta \Vert _4^4 + 4\gamma _3 + 5\gamma _2 + 2\gamma _1)^2 \\&= \sum \limits _{k \in [n]} \Delta _k^2 [\Delta _k^4 + 6\Delta _k^3z_k + 13\Delta _k^2z_k^2 + 12 \Delta _kz_k^3 + 4z_k^4] - 4\gamma _1^2 - 20\gamma _1\gamma _2 \\&\quad - 25\gamma _2^2 - 8\gamma _3(5\gamma _2+2\gamma _1) - 16 \gamma _3^2 - 2 \Vert \Delta \Vert _4^4(4\gamma _3+5\gamma _2+2\gamma _1) - \Vert \Delta \Vert _4^8 \\&= \sum \limits _{k \in \mathcal {I}} [4 \Delta _k^2 z_k^4 + 12 \Delta _k^3 z_k^3 + 13 \Delta _k^4 z_k^2 + 6\Delta _k^5z_k] - 4\gamma _1^2 + O(\Vert \Delta \Vert ^6) \\&= \sum \limits _{k \in \mathcal {I}} 4\Delta _k^2z_k^4 - 4\gamma _1^2 + O(\Vert \Delta \Vert ^6) = \sum \limits _{k, j \in \mathcal {I}} 4\Delta _k^2z_k^4z_j^2 - 4\gamma _1^2 + O(\Vert \Delta \Vert ^6) \\&= \sum \limits _{k, j \in \mathcal {I}} 2\Delta _k^2z_k^4z_j^2 + \sum \limits _{k, j \in \mathcal {I}} 2\Delta _j^2z_j^4z_k^2 - 4\left( \sum \limits _{k\in \mathcal {I}} \Delta _k z_k^3 \right) ^2 + O(\Vert \Delta \Vert ^6) \\&= 2 \sum \limits _{k, j \in \mathcal {I}} [z_k^4 z_j^2 \Delta _k^2 - 2 z_k^3 z_j^3 \Delta _k \Delta _j + z_k^2 z_j^4 \Delta _j^2] + O(\Vert \Delta \Vert ^6) \\&= 2 \sum \limits _{k, j \in \mathcal {I}} z_k^2 z_j^2 (\sigma _k - \sigma _j)^2 + O(\Vert \Delta \Vert ^6). \end{aligned}$$
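This expansion can also be probed numerically. The sketch below (an illustration under an assumed case-3-type perturbation with \(\Delta _{\mathcal {A}} \sim t\) and \(\Delta _{\mathcal {I}} \sim t^2\); not part of the proof) confirms that the residual of the final identity scales like \(O(\Vert \Delta \Vert ^6)\):

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 6, 3                                  # support I = {0,...,m-1}, A = {m,...,n-1}
z = np.zeros(n)
z[:m] = rng.standard_normal(m); z[:m] /= np.linalg.norm(z[:m])
e = np.zeros(n)
e[m:] = rng.standard_normal(n - m); e[m:] /= np.linalg.norm(e[m:])
for t in [1e-1, 1e-2, 1e-3]:
    y = np.cos(t) * z + np.sin(t) * e        # unit vector: Delta_A ~ t, Delta_I ~ t^2
    d = y - z
    tau = y**2 - z**2
    w = 2 * (tau * y - (y @ (tau * y)) * y)  # w = 2 * P_y^perp diag(tau) y
    sigma = np.zeros(n)
    sigma[:m] = d[:m] * z[:m] + (d @ d) / (2 * m)
    rhs = 2 * np.sum(np.outer(z[:m]**2, z[:m]**2)
                     * (sigma[:m, None] - sigma[None, :m])**2)
    print(t, (0.25 * (w @ w) - rhs) / np.linalg.norm(d)**6)  # ratio stays bounded
```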

We notice that the higher order terms \(\sum _{k \in \mathcal {I}} \Delta _k^{3+\ell } z_k^{3-\ell }\), \(\ell \in \{0,1,2\}\), can be discussed as in (A.10) and are all bounded by \(O(\Vert \Delta \Vert ^6)\). Finally, applying (A.12) and \(|z_k| \le 1\) and \(z_k^2 \ge r_+^2\) for all \(k \in \mathcal {I}\), we obtain

$$\begin{aligned} \sum \limits _{k, j \in \mathcal {I}} z_k^2 z_j^2 (\sigma _k - \sigma _j)^2 \ge r_+^4 \sum \limits _{k,j \in \mathcal {I}} (\sigma _k - \sigma _j)^2 = r_+^4 \Big [ 2 m \Vert \sigma \Vert ^2 - 2 \big ( \sum \limits _{k \in \mathcal {I}} \sigma _k \big )^2 \Big ] = 2 m r_+^4 \Vert \sigma \Vert ^2 \end{aligned}$$

and \({\sum }_{k, j \in \mathcal {I}} z_k^2 z_j^2 (\sigma _k - \sigma _j)^2 \le 2m \Vert \sigma \Vert ^2\) which implies \(\Vert \mathbf {w}\Vert =\Theta (\Vert \Delta \Vert ^{2})\) and

$$\begin{aligned} \Vert H\mathbf{v}\Vert =\Theta (\Vert \Delta \Vert ^{2}) \end{aligned}$$
(A.13)

for \(\Delta \rightarrow 0\). As a consequence, by (A.4), we also get

$$\begin{aligned} |\mathbf{v}^{T}H\mathbf{v}|=O(\Vert \Delta \Vert ^{4}). \end{aligned}$$
(A.14)

Furthermore, by (6.4) and (A.1), it holds that

$$\begin{aligned} 2 |f(\mathbf {y})-f(\mathbf {z})|&\le |\Delta ^{T}H\Delta | + \Vert \tau \Vert ^{2} = |\mathbf{v}^{T}H\mathbf{v}| + \sum \limits _{k\in [n]}[\Delta _{k}^{2}+2z_{k}\Delta _{k}]^{2}\nonumber \\&=|\mathbf{v}^{T}H\mathbf{v}| + \Vert \Delta \Vert _4^4 +4\gamma _{3}+4\gamma _{2} = O(\Vert \Delta \Vert ^{4}) \end{aligned}$$
(A.15)

for \(\Delta \rightarrow 0\). For some index sets \({\mathcal {K}}, {\mathcal {J}} \subset [n]\), let \(H_{{\mathcal {K}}{\mathcal {J}}} \in \mathbb {R}^{|{\mathcal {K}}| \times |{\mathcal {J}}|}\) denote the submatrix \(H_{{\mathcal {K}}{\mathcal {J}}} = (H_{kj})_{k \in {\mathcal {K}}, j \in {\mathcal {J}}}\). Notice that due to the positive semidefiniteness of H, we have \(H_{\mathcal {I}\mathcal {I}}\succeq 0\) and \(H_{\mathcal {A}\mathcal {A}}\succeq 0\). Moreover, due to (A.2), (A.3), and \(|\mathbf{v}^{T}H\mathbf{v}|=O(\Vert \Delta \Vert ^{4})\), we obtain

$$\begin{aligned} \Vert \mathrm {grad\!\;}{f(\mathbf{y})}\Vert ^2&= \Vert H\Delta + \mathbf{w} \Vert ^2 + O(\Vert \Delta \Vert ^8) \nonumber \\&= \Vert H_{\mathcal {A}\cdot } \Delta + {w}_\mathcal {A}\Vert ^2 + \Vert H_{\mathcal {I}\cdot } \Delta + {w}_\mathcal {I}\Vert ^2 + O(\Vert \Delta \Vert ^8)\nonumber \\&= \Vert H_{\mathcal {I}\mathcal {A}}^T\Delta _\mathcal {I}+ H_{\mathcal {A}\mathcal {A}} \Delta _\mathcal {A}+ 2\mathrm{diag}(|\Delta _\mathcal {A}|^2) \Delta _\mathcal {A}- c_1 \Delta _\mathcal {A}\Vert ^2 \nonumber \\&\quad + \Vert H_{\mathcal {I}\mathcal {I}} \Delta _\mathcal {I}+ H_{\mathcal {I}\mathcal {A}} \Delta _\mathcal {A}+ {w}_\mathcal {I}\Vert ^2 + O(\Vert \Delta \Vert ^8), \end{aligned}$$
(A.16)

where \(c_{1}=2\mathbf{y}^{T}\mathrm{diag}(\tau )\mathbf{y}\) was defined in (A.2) and we used the identities \(z_{k}=0\) for \(k\in \mathcal {A}\) and \(w_{\mathcal {A}}=2\mathrm{diag}(\tau _\mathcal {A}){y}_\mathcal {A}-c_{1}{y}_\mathcal {A}=2\mathrm{diag}(|\Delta _\mathcal {A}|^2)\Delta _\mathcal {A}- c_1 \Delta _\mathcal {A}\). We set

$$\begin{aligned} \mathbf{h}&:= H_{\mathcal {I}\mathcal {I}} \Delta _\mathcal {I}+ H_{\mathcal {I}\mathcal {A}} \Delta _\mathcal {A}, \quad \mathbf{g}_1 := \mathbf{h}+ {w}_\mathcal {I}, \nonumber \\ \mathbf{g}_2&:= H_{\mathcal {I}\mathcal {A}}^T\Delta _\mathcal {I}+ H_{\mathcal {A}\mathcal {A}} \Delta _\mathcal {A}+ 2\mathrm{diag}(|\Delta _\mathcal {A}|^2)\Delta _\mathcal {A}- c_1 \Delta _\mathcal {A}\end{aligned}$$
(A.17)

and let \(\eta _{4},\eta _{5}>0\) and \(\mu \in (0,\frac{1}{2})\) be given constants. Next, we discuss two separate sub-cases.

Sub-case 3.1. \(\Delta _{\mathcal {I}}^{T}\mathbf{h}\ge -\eta _{4}\Vert \Delta \Vert ^{4+\mu }\) or \(\Delta _\mathcal {I}^T \mathbf{h} \le -\eta _5 \Vert \Delta \Vert ^{4-\mu }\). Let us first assume \(\Delta _{\mathcal {I}}^{T}{} \mathbf{h}\ge -\eta _{4}\Vert \Delta \Vert ^{4+\mu }\). Following the derivation of (A.7) and using (A.6), (A.2), and (A.11), we obtain \(\frac{1}{2}c_1 = \Vert \Delta \Vert _4^4 + 4\gamma _3 + 5\gamma _2 + 2\gamma _1 = \Theta (\Vert \Delta \Vert ^2)\), \(\Delta _\mathcal {I}^T y_\mathcal {I}= - \frac{1}{2}\Vert \Delta \Vert ^2 + \Vert \Delta _\mathcal {I}\Vert ^2\), and

$$\begin{aligned} \Delta _\mathcal {I}^T w_\mathcal {I}= 2\Delta _\mathcal {I}^T\mathrm{diag}(\tau _\mathcal {I}){y}_{\mathcal {I}} - c_1 \Delta _\mathcal {I}^T{y}_\mathcal {I}= 2\Vert \Delta _\mathcal {I}\Vert _4^4 + 6 \gamma _3 + 4\gamma _2 + {\textstyle \frac{1}{2}} \Vert \Delta \Vert ^2 c_1 - c_1 \Vert \Delta _\mathcal {I}\Vert ^2, \end{aligned}$$

which yields

$$\begin{aligned} \Delta _{\mathcal {I}}^{T}{} \mathbf{g}_{1}&\ge - \eta _4 \Vert \Delta \Vert ^{4+\mu } + 2 \Vert \Delta _\mathcal {I}\Vert _4^4 + 4\gamma _2 + \Vert \Delta \Vert ^2 [\Vert \Delta \Vert _4^4 + 5\gamma _2 + 2\gamma _1] \\&\quad + 6\gamma _3 + 4 \Vert \Delta \Vert ^2 \gamma _3 - c_1 \Vert \Delta _\mathcal {I}\Vert ^2 \\&\ge \Theta (\Vert \Delta \Vert ^{4}), \end{aligned}$$

since the terms \(\gamma _3 = O(\Vert \Delta \Vert ^{6})\) and \(c_1 \Vert \Delta _\mathcal {I}\Vert ^2 = \Theta (\Vert \Delta \Vert ^{6})\) are dominated by the positive term \(4\gamma _2 + 2\Vert \Delta \Vert ^{2}\gamma _1 = \Theta (\Vert \Delta \Vert ^{4})\).

Similarly, in the case \(\Delta _{\mathcal {I}}^{T}{} \mathbf{h} \le -\eta _{5}\Vert \Delta \Vert ^{4-\mu }\) and if \(\Vert \Delta \Vert \) is sufficiently small, we get

$$\begin{aligned} \Delta _{\mathcal {I}}^{T}{} \mathbf{g}_{1} \le -\frac{\eta _{5}}{2}\Vert \Delta \Vert ^{4-\mu }. \end{aligned}$$

Combining both cases, we can infer \(\Vert \Delta _{\mathcal {I}}\Vert \Vert \mathbf{g}_{1}\Vert \ge |\Delta _{\mathcal {I}}^{T}{} \mathbf{g}_{1}| \ge \eta _6 \Vert \Delta \Vert ^4\) for some \(\eta _6 > 0\) and for all \(\mathbf{y}\) sufficiently close to \(\mathbf{z}\). By the assumptions of case 3, this implies \(\Vert \mathbf{g}_{1}\Vert \ge \nu ^{-1}\eta _6 \Vert \Delta \Vert ^2\) and hence, by (A.16), we have \(\Vert \mathrm {grad\!\;}f(\mathbf{y})\Vert \ge \Theta (\Vert \Delta \Vert ^2)\). Considering (A.15), the Łojasiewicz inequality holds with \(\theta = \frac{1}{2}\) in this sub-case.

Sub-case 3.2. \(-\eta _{5}\Vert \Delta \Vert ^{4-\mu }\le \Delta _{\mathcal {I}}^{T}{} \mathbf{h} \le -\eta _{4}\Vert \Delta \Vert ^{4+\mu }\). Due to \(\Delta _{\mathcal {I}}^{T}H_{\mathcal {I} \mathcal {I}}\Delta _{\mathcal {I}}\ge 0\), this directly yields \(\Delta _{\mathcal {I}}^{T}H_{\mathcal {I}\mathcal {A}} \Delta _{\mathcal {A}}=\Delta _{\mathcal {I}}^T\mathbf{h}-\Delta _{\mathcal {I}}^{T}H_{\mathcal {I}\mathcal {I}} \Delta _{\mathcal {I}}<0\). Moreover, we can estimate that

$$\begin{aligned} \Delta _{\mathcal {I}}^{T}H_{\mathcal {I}\mathcal {A}}\Delta _{\mathcal {A}}&=\Delta _{\mathcal {I}}^T\mathbf{h}-\Delta _{\mathcal {I}}^{T}H_{\mathcal {I}\mathcal {I}}\Delta _{\mathcal {I}} \ge -\eta _5\Vert \Delta \Vert ^{4-\mu } - \lambda _+(H_{\mathcal {I}\mathcal {I}})\Vert \Delta _\mathcal {I}\Vert ^2\\&\ge -\eta _5\Vert \Delta \Vert ^{4-\mu } - \lambda _+(H_{\mathcal {I}\mathcal {I}}) \cdot (2-\epsilon )^{-2}r_+^{-4}\Vert \Delta \Vert ^4 \ge -2\eta _5 \Vert \Delta \Vert ^{4-\mu }, \end{aligned}$$

where \(\lambda _+(H_{\mathcal {I}\mathcal {I}})\) denotes the largest eigenvalue of \(H_{\mathcal {I}\mathcal {I}}\). We note that the last inequality holds when \(\Vert \Delta \Vert \) is small enough, since \(\eta _5,r_+,\epsilon ,H_{\mathcal {I}\mathcal {I}}\) do not depend on \(\Delta \) and can be treated as constants as \(\Delta \rightarrow 0\). Also, we have

$$\begin{aligned} \Delta _{\mathcal {A}}^{T}H_{\mathcal {A}\mathcal {A}} \Delta _{\mathcal {A}}\ge \Delta _{\mathcal {A}}^{T} (H_{\mathcal {I}\mathcal {A}}^{T}\Delta _{\mathcal {I}} +H_{\mathcal {A}\mathcal {A}}\Delta _{\mathcal {A}})\ge \Delta ^{T}H\Delta = {\mathbf{v}^T H \mathbf{v}}. \end{aligned}$$
(A.18)

Utilizing the positive semidefiniteness of H, it holds that

$$\begin{aligned} \Vert H\mathbf{v}\Vert ^2 = (H^\frac{1}{2}\mathbf{v})^TH(H^\frac{1}{2}\mathbf{v}) \le \lambda _+(H) \Vert H^\frac{1}{2}\mathbf{v}\Vert ^2 = \lambda _+(H) \cdot \mathbf{v}^TH\mathbf{v}. \end{aligned}$$

Hence, the estimates \(\Vert H\mathbf{v}\Vert =\Theta (\Vert \Delta \Vert ^2)\) in (A.13) and \(|\mathbf{v}^TH\mathbf{v}| = O(\Vert \Delta \Vert ^4)\) in (A.14) lead to \(\mathbf{v}^TH\mathbf{v} = \Theta (\Vert \Delta \Vert ^4)\). Consequently, if \(\Delta _{\mathcal {A}}^{T}H_{\mathcal {A}\mathcal {A}} \Delta _{\mathcal {A}}\ge \eta _{7}\Vert \Delta \Vert ^{4-2\mu }\) for some constant \(\eta _{7}>0\), then we can infer

$$\begin{aligned} \Delta _{\mathcal {A}}^{T}\mathbf{g}_{2}&=\Delta _{\mathcal {A}}^{T}(H_{\mathcal {I} \mathcal {A}}^{T}\Delta _{\mathcal {I}}+H_{\mathcal {A} \mathcal {A}}\Delta _{\mathcal {A}})+2\Vert \Delta _{\mathcal {A}} \Vert _{4}^{4}-c_1\Vert \Delta _{\mathcal {A}}\Vert ^{2}\\&\ge \eta _{7}\Vert \Delta \Vert ^{4-2\mu }-2\eta _{5} \Vert \Delta \Vert ^{4-\mu }+O(\Vert \Delta \Vert ^{4})\ge \frac{\eta _{7}}{2} \Vert \Delta \Vert ^{4-2\mu } \end{aligned}$$

for \(\Delta \rightarrow 0\). As in sub-case 3.1, this allows us to show \(\Vert \mathrm {grad\!\;}{f(\mathbf{y})}\Vert \ge \Theta (\Vert \Delta \Vert ^{3-2\mu })\) and thus, by (A.15), the Łojasiewicz inequality holds with \(\theta = \frac{1+2\mu }{4} \ge \frac{1}{4}\).

Finally, let us consider \(\eta _{8}\Vert \Delta \Vert ^{4}\le \Delta _{\mathcal {A}}^{T} H_{\mathcal {A}\mathcal {A}}\Delta _{\mathcal {A}}\le \eta _{7}\Vert \Delta \Vert ^{4-2\mu }\), where \(\eta _{8} > 0\) is chosen such that \(\Delta ^{T}H\Delta \ge \eta _{8}\Vert \Delta \Vert ^{4}\). Such a choice is possible since \(\Delta ^{T}H\Delta = \mathbf{v}^{T}H\mathbf{v} = \Theta (\Vert \Delta \Vert ^{4})\), and this sub-case is then exhaustive by (A.18). Let us define the final decompositions

$$\begin{aligned} \Delta _{\mathcal {A}}=\psi _{1}+\xi _{1},\quad \psi _{1}\in \mathrm {null}~H_{\mathcal {A}\mathcal {A}},\quad \xi _{1} \in [\mathrm {null}~H_{\mathcal {A}\mathcal {A}}]^\bot ,\\ \mathrm{diag}(|\Delta _\mathcal {A}|^2)\Delta _{\mathcal {A}}=\psi _{2}+\xi _{2}, \quad \psi _{2}\in \mathrm {null}~H_{\mathcal {A}\mathcal {A}}, \quad \xi _{2} \in [\mathrm {null}~H_{\mathcal {A}\mathcal {A}}]^\bot , \end{aligned}$$

where \(\mathrm {null}~M\) is the null space of a matrix M. We then have \(\Vert \xi _{1}\Vert =\Theta (\Vert H_{\mathcal {A}\mathcal {A}}\xi _{1}\Vert )= \Theta (\sqrt{\xi _{1}^{T}H_{\mathcal {A}\mathcal {A}}\xi _{1}})=O (\Vert \Delta \Vert ^{2-\mu })\) and \(\Vert \psi _{1}\Vert =O(\Vert \Delta \Vert )\). Notice that such decompositions exist due to the symmetry of \(H_{\mathcal {A}\mathcal {A}}\).

Since H is positive semidefinite, we can show that \(\mathrm {null}\,H_{\mathcal {A}\mathcal {A}}\subset \mathrm {null}\, H_{\mathcal {I}\mathcal {A}}\). If \(H_{\mathcal {I}\mathcal {A}}=0\), then this claim is certainly true. Otherwise, if we assume that the statement is false, we can select \(\psi \in \mathrm {null}\,H_{\mathcal {A}\mathcal {A}}\) with \(H_{\mathcal {I}\mathcal {A}}\psi \ne 0\) and set \(\xi := H_{\mathcal {I}\mathcal {A}}\psi \in [\mathrm {ran}\,H_{\mathcal {I}\mathcal {A}}] \backslash \{0\}\). Then it holds that

$$\begin{aligned} \begin{bmatrix}\xi ^T&a\psi ^T\end{bmatrix}\begin{bmatrix} H_{\mathcal {I}\mathcal {I}} & H_{\mathcal {I}\mathcal {A}} \\ H_{\mathcal {I}\mathcal {A}}^T & H_{\mathcal {A}\mathcal {A}} \end{bmatrix} \begin{bmatrix}\xi \\ a\psi \end{bmatrix} =\xi ^{T}H_{\mathcal {I}\mathcal {I}}\xi +2a \psi ^{T}H_{\mathcal {I}\mathcal {A}}^{T}\xi \ge 0,\quad \forall ~a\in \mathbb {R}. \end{aligned}$$

But since \([H_{\mathcal {I}\mathcal {A}}\psi ]^{T}\xi = \Vert H_{\mathcal {I}\mathcal {A}}\psi \Vert ^{2} \ne 0\), we can choose a such that \(\xi ^{T}H_{\mathcal {I}\mathcal {I}}\xi +2a\psi ^{T}H_{\mathcal {I} \mathcal {A}}^{T}\xi <0\), which is a contradiction.
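The inclusion can also be observed numerically; the following sketch (illustration only) builds a random positive semidefinite H with a rank-deficient block \(H_{\mathcal {A}\mathcal {A}}\) and checks that a null-space basis of \(H_{\mathcal {A}\mathcal {A}}\) is annihilated by \(H_{\mathcal {I}\mathcal {A}}\):

```python
import numpy as np

rng = np.random.default_rng(3)
nI, nA, r = 3, 4, 2                 # |I| = 3, |A| = 4, rank(H) = 2 < |A|
G = rng.standard_normal((r, nI + nA))
H = G.T @ G                         # H = G^T G is positive semidefinite
H_AA, H_IA = H[nI:, nI:], H[:nI, nI:]
# Null-space basis of H_AA: right singular vectors with (numerically) zero singular value.
_, s, Vt = np.linalg.svd(H_AA)
N = Vt[s < 1e-10].T
print(np.linalg.norm(H_IA @ N))     # ~ 0, i.e., null(H_AA) is contained in null(H_IA)
```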

Hence, since \(H_{\mathcal {I}\mathcal {A}}^{T}\Delta _{\mathcal {I}} \in \mathrm {ran}\,H_{\mathcal {I}\mathcal {A}}^T = [\mathrm {null}\,H_{\mathcal {I}\mathcal {A}}]^\bot \subset [\mathrm {null}\,H_{\mathcal {A}\mathcal {A}}]^{\perp }\), we can infer \(H_{\mathcal {I}\mathcal {A}}^{T}\Delta _{\mathcal {I}} + H_{\mathcal {A}\mathcal {A}}\Delta _{\mathcal {A}}\in [\mathrm {null}\,H_{\mathcal {A}\mathcal {A}}]^{\perp }\). This implies that \(\mathbf{g}_2\) can be written as \(\mathbf{g}_2 = \mathbf{g}_3 + \mathbf{d}\), where \(\mathbf{g}_3 \in [\mathrm {null}\,H_{\mathcal {A}\mathcal {A}}]^{\perp }\) and \(\mathbf{d}=2\psi _{2}-c_{1}\psi _{1} \in \mathrm {null}\,H_{\mathcal {A}\mathcal {A}}\). If \(\Vert \mathbf{d}\Vert \ge \frac{\eta _{8}}{2}\Vert \Delta \Vert ^{3}\), we obtain \(\Vert \mathrm {grad\!\;}{f(\mathbf{y})}\Vert \ge \Vert \mathbf{g}_{2}\Vert \ge \Vert \mathbf{d}\Vert \ge \frac{\eta _{8}}{2}\Vert \Delta \Vert ^{3}\) by (A.16) and the Łojasiewicz inequality is satisfied with \(\theta =\frac{1}{4}\) due to (A.15). Otherwise, if \(\Vert \mathbf{d}\Vert \le \frac{\eta _{8}}{2}\Vert \Delta \Vert ^{3}\), it follows

$$\begin{aligned} 2\Vert \Delta _{\mathcal {A}}\Vert _{4}^{4}-c_{1}\Vert \Delta _{\mathcal {A}}\Vert ^{2}&= 2\psi _{1}^{T}\psi _{2}-c_{1}\Vert \psi _{1}\Vert ^{2}+2\xi _{1}^{T}\xi _{2}-c_{1}\Vert \xi _{1}\Vert ^{2}\\&=\psi _{1}^{T}\mathbf {d}+2\xi _{1}^{T}\xi _{2}-c_{1}\Vert \xi _{1}\Vert ^{2} \ge -\frac{\eta _{8}}{2}\Vert \Delta \Vert ^{4}+O(\Vert \Delta \Vert ^{5-\mu }), \end{aligned}$$

where we applied the estimates \(\Vert \xi _2\Vert \le \Vert \Delta _\mathcal {A}\Vert _6^3 \le \Vert \Delta \Vert ^3\) and \(c_1 = \Theta (\Vert \Delta \Vert ^2)\). Using (A.18), this shows

$$\begin{aligned} \Delta _{\mathcal {A}}^{T}{} \mathbf{g}_{2}&=\Delta _{\mathcal {A}}^{T}(H_{\mathcal {I}\mathcal {A}}^{T}\Delta _{\mathcal {I}}+H_{\mathcal {A}\mathcal {A}}\Delta _{\mathcal {A}})+2\Vert \Delta _{\mathcal {A}}\Vert _{4}^{4}-c_{1}\Vert \Delta _{\mathcal {A}}\Vert ^{2}\\&\ge \Delta ^{T}H\Delta -\frac{\eta _{8}}{2}\Vert \Delta \Vert ^{4}+O(\Vert \Delta \Vert ^{5-\mu }) \ge \frac{\eta _{8}}{2}\Vert \Delta \Vert ^{4}+O(\Vert \Delta \Vert ^{5-\mu })\ge \frac{\eta _{8}}{4}\Vert \Delta \Vert ^{4} \end{aligned}$$

for \(\Delta \rightarrow 0\). Thus, we have \(\Vert \mathrm {grad\!\;}{f(\mathbf{y})}\Vert \ge \Theta (\Vert \Delta \Vert ^{3})\) and as before we can infer that the Łojasiewicz inequality holds with \(\theta =\frac{1}{4}\) in this case. \(\square \)
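To illustrate the conclusion of Theorem 6.2, the Łojasiewicz inequality \(|f(\mathbf {y})-f(\mathbf {z})|^{3/4} \le C\, \Vert \mathrm {grad\!\;}{f(\mathbf {y})}\Vert \) can be probed numerically near a known stationary point; the sketch below (illustration only) uses the diagonal case \(A = 0\), \(\beta = 1\), for which \(\mathbf {z} = n^{-1/2}(1,\ldots ,1)^{T}\) is a global minimizer:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 10
z = np.ones(n) / np.sqrt(n)      # global minimizer of f(y) = 0.5*sum(y_k^4) on the sphere
f_star = 0.5 / n

def grad_norm(y):
    g = 2.0 * y**3               # Euclidean gradient of f
    return np.linalg.norm(g - (y @ g) * y)

ratios = []
for _ in range(1000):
    y = z + 1e-2 * rng.standard_normal(n)
    y /= np.linalg.norm(y)       # random point near z on the sphere
    fy = 0.5 * np.sum(y**4)
    ratios.append(grad_norm(y) / (fy - f_star) ** 0.75)
print(min(ratios))               # bounded away from zero, consistent with theta = 1/4
```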


Cite this article

Zhang, H., Milzarek, A., Wen, Z. et al. On the geometric analysis of a quartic–quadratic optimization problem under a spherical constraint. Math. Program. 195, 421–473 (2022). https://doi.org/10.1007/s10107-021-01702-6

