Abstract
This paper considers the problem of solving a special quartic–quadratic optimization problem with a single sphere constraint, namely, finding global and local minimizers of \(\frac{1}{2}\mathbf {z}^{*}A\mathbf {z}+\frac{\beta }{2}\sum _{k=1}^{n}|z_{k}|^{4}\) such that \(\Vert \mathbf {z}\Vert _{2}=1\). This problem arises in multiple domains, including quantum mechanics and chemistry, and we investigate its geometric properties. Fourth-order optimality conditions are derived for characterizing local and global minima. When the matrix in the quadratic term is diagonal, the problem has no spurious local minima and global solutions can be represented explicitly and calculated in \(O(n\log {n})\) operations. When A is a rank-one matrix, the global minima of the problem are unique under certain phase shift schemes. The strict-saddle property, which can imply polynomial-time convergence of second-order-type algorithms, is established when the coefficient \(\beta \) of the quartic term is either at least \(O(n^{3/2})\) or not larger than O(1). Finally, the Kurdyka–Łojasiewicz exponent of the quartic–quadratic problem is estimated and it is shown that the largest exponent is at least 1/4 for a broad class of stationary points.
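The \(O(n\log n)\) claim for a diagonal matrix can be made concrete. The following is a minimal sketch, not the paper's own derivation: assuming \(A = \mathrm{diag}(a)\) and \(\beta > 0\), the substitution \(p_k = |z_k|^2\) turns the problem into a separable convex quadratic over the probability simplex, which a sorting-based water-filling scheme solves; the helper name `diagonal_min` is ours.

```python
import numpy as np

def diagonal_min(a, beta):
    """Water-filling sketch for the diagonal case: with p_k = |z_k|^2,
    min 1/2 z* diag(a) z + beta/2 sum_k |z_k|^4  s.t. ||z||_2 = 1
    becomes  min sum_k (a_k/2) p_k + (beta/2) p_k^2  over the simplex,
    a Euclidean projection problem solvable after one sort: O(n log n).
    Assumes beta > 0."""
    a = np.asarray(a, dtype=float)
    idx = np.argsort(a)                       # ascending: cheapest entries first
    c = a[idx] / 2.0
    j = np.arange(1, len(c) + 1)
    lam = (beta + np.cumsum(c)) / j           # candidate multiplier per support size
    m = int(np.count_nonzero(lam > c))        # largest support keeping all p_k > 0
    p = np.zeros_like(c)
    p[idx[:m]] = (lam[m - 1] - c[:m]) / beta  # p_k = |z_k|^2 on the support
    return p
```

Any phases with \(|z_k|^2 = p_k\) then give a global minimizer, since for diagonal \(A\) the objective depends only on the moduli \(|z_k|\).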
Acknowledgements
The authors are grateful to Prof. Adrian Lewis, the Associate Editor and the anonymous reviewers for their valuable comments and suggestions that helped to improve the quality of our manuscript.
H. Zhang was partly supported by the elite undergraduate training program of School of Mathematical Sciences in Peking University. Z. Wen was supported in part by the NSFC Grant 11831002, the Key-Area Research and Development Program of Guangdong Province (No. 2019B121204008) and Beijing Academy of Artificial Intelligence. A. Milzarek was partly supported by the Beijing International Center for Mathematical Research, Peking University, the Boya Postdoctoral Fellowship Program, the Shenzhen Institute for Artificial Intelligence and Robotics for Society (AIRS), and by the Fundamental Research Fund - Shenzhen Research Institute of Big Data (SRIBD) Startup Fund JCYJ-AM20190661.
A Proof of Theorem 6.2
Proof
As in the proof of Theorem 6.1, the verification of Theorem 6.2 is mainly based on proper decompositions of \(\Delta = \mathbf{y}-\mathbf{z}\) and \(\mathrm{diag}(\tau )\mathbf{y}\) that allow us to derive appropriate bounds on \(|f(\mathbf{y}) - f(\mathbf{z})|\) and \(\Vert \mathrm {grad\!\;}{f(\mathbf{y})}\Vert \) if \(\mathbf{y}\) is close to \(\mathbf{z}\) or, equivalently, if \(\Delta \) is sufficiently small. In particular, we will discuss three cases that allow us to simplify and estimate the expressions for \(|f(\mathbf{y}) - f(\mathbf{z})|\) and \(\Vert \mathrm {grad\!\;}{f(\mathbf{y})}\Vert \) by identifying the different leading terms.
Without loss of generality we assume \(\beta =1\). Let \(\mathbf{y} \in \mathbb {S}^{n-1}\) be arbitrary and let us set \(\Delta = \mathbf{y} - \mathbf{z}\), and
Based on the representation \(\mathrm {grad\!\;}{f(\mathbf{y})} = P_\mathbf{y}^\bot [H + 2 \mathrm{diag}(\tau )]\mathbf{y}\), we now introduce the following decompositions
where \(\sigma _-(H)\) denotes the smallest positive singular value of H. Using this decomposition, (6.5), and \(H\mathbf{y} = H\Delta \), we can express the norm of the Riemannian gradient as follows
Let \(\lambda _+(H)\) be the largest eigenvalue of H. Then, by definition of \(\mathbf{v}\), we obtain
Moreover, Lemma 6.1 yields
for some constant \(\eta _{1}>0\) and for all \(\mathbf{y} \in \mathbb {S}^{n-1}\) sufficiently close to \(\mathbf{z}\). Throughout the proof, we will also repeatedly use the following facts:
Let \(\epsilon \in (0,1]\) be an arbitrary, small positive constant. We now discuss three different cases.
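Throughout the case discussion, \(P_\mathbf{y}^\bot \) denotes the orthogonal projector onto the tangent space of the sphere at \(\mathbf{y}\); the quantities \(H\) and \(\tau \) are as defined earlier in the paper and are not reproduced here. A minimal numerical sketch of the projector (the function name is ours):

```python
import numpy as np

def tangent_project(y, v):
    """P_y^perp v = v - (y^T v) y for a unit vector y: the orthogonal
    projection onto the tangent space {u : y^T u = 0} of the sphere at y."""
    return v - (y @ v) * y
```

All the projected quantities appearing in the decompositions are of this form and are therefore orthogonal to \(\mathbf{y}\).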
Case 1. \(\Vert \mathbf {w}\Vert \ge (1+\epsilon ) \Vert H\mathbf{v}\Vert \) or \(\Vert \mathbf {w}\Vert \le (1-\epsilon )\Vert H\mathbf{v}\Vert \). We first assume \(\mathbf{w} \ne 0\) and \(\epsilon < 1\). Since the function \(x \mapsto \varrho (x) := x + 1/x\) satisfies \(\varrho '(x) = 1 - x^{-2}\) and is thus monotonically decreasing on the interval \((0,1-\epsilon ]\) and monotonically increasing on \([1+\epsilon ,\infty )\), it follows
Thus, we obtain
which further implies
Next, we choose \(\delta _\mathbf{z} > 0\) sufficiently small such that \(\Vert \mathbf{v}\Vert ^2 \le \epsilon ^2 {\bar{\sigma }} / (10\lambda _+(H))\) and \(|\mathbf{v}^TH\mathbf{v}| \le 1\). (This is possible due to the decomposition (A.2) and \(\Vert \mathbf{v}\Vert \le \Vert \Delta \Vert \)). Using (A.4), (A.5), the estimate \([\frac{1}{2} |a + b|]^{3/2} \le [|a|^{3/2} + |b|^{3/2}] / \sqrt{2}\), \(a,b \in \mathbb {R}\), and setting \({\tilde{\eta }}_1 := \min \{\eta _1^{-2},\frac{{\bar{\sigma }}}{2}\}\), it follows
where the last equality is a consequence of (6.4). Thus, we can infer that the largest KL exponent of problem (1.1) at \(\mathbf{z}\) is at least \(\frac{1}{4}\).
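The scalar estimate \([\frac{1}{2}|a+b|]^{3/2} \le [|a|^{3/2} + |b|^{3/2}]/\sqrt{2}\) used in Case 1 follows from the convexity of \(t \mapsto t^{3/2}\) on \([0,\infty )\) together with \(|a+b| \le |a| + |b|\); a quick numerical sanity check (helper names are ours):

```python
import numpy as np

def half_sum_pow(a, b):
    """Left-hand side [ |a + b| / 2 ]^(3/2)."""
    return (0.5 * abs(a + b)) ** 1.5

def split_pow(a, b):
    """Right-hand side [ |a|^(3/2) + |b|^(3/2) ] / sqrt(2); convexity of
    t -> t^(3/2) even gives the sharper constant 1/2."""
    return (abs(a) ** 1.5 + abs(b) ** 1.5) / np.sqrt(2.0)
```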
Case 2. \((2-{\epsilon })r_{+}^{2}\Vert \Delta _{\mathcal {I}} \Vert \ge \Vert \Delta \Vert ^2\) or \(\gamma _1 \le (2-\epsilon )r_+^2 \Vert \Delta _\mathcal {I}\Vert ^2 \Vert \Delta \Vert ^{-2}\). First, utilizing the identity \(\Delta ^T \mathbf{y} = \frac{1}{2}\Vert \Delta \Vert ^2\) in (A.6), we have \(P_{\mathbf {y}}^{\perp }\Delta =\Delta -\frac{1}{2}\Vert \Delta \Vert ^{2}\mathbf {y}\). Defining \(t_\Delta := \Delta ^T P_{\mathbf {y}}^{\perp }\Delta = \Vert \Delta \Vert ^2 - \frac{1}{4} \Vert \Delta \Vert ^4\), we will now work with the following additional decompositions
We note that due to the choice of \(c_2\) and \(c_3\) and (A.2), the vectors \(\mathbf {w}_{1}\) and \(\mathbf {w}_{2}\) are orthogonal to \(\mathbf {y}\) and \(\Delta \). Hence, by (A.3), it holds that
Recalling the definitions introduced in (A.1) and using \(\tau _k = (y_k - z_k)(y_k + z_k) = \Delta _k (\Delta _k + 2z_k)\) and \(y_k = \Delta _k + z_k\), we can express \(t_\Delta \cdot c_{3}\) via
Notice that we have \(\gamma _2 \ge r_+^2 \Vert \Delta _\mathcal {I}\Vert ^2\) and \(\gamma _1 \le 1\) provided that \(\Vert \Delta \Vert \le 1\). Now, if \((2-{\epsilon })r_{+}^{2}\Vert \Delta _{\mathcal {I}}\Vert \ge \Vert \Delta \Vert ^{2}\), we obtain
and hence, it follows
for \(\Delta \) sufficiently small. Otherwise, if \(\gamma _1 \le (2-\epsilon )r_+^2 \Vert \Delta _\mathcal {I}\Vert ^2 \Vert \Delta \Vert ^{-2}\), then we also have \((4-5\Vert \Delta \Vert ^2) \gamma _2 - 2 \Vert \Delta \Vert ^2 \gamma _1 \ge (2\epsilon - 5\Vert \Delta \Vert ^2) r_+^2 \Vert \Delta _\mathcal {I}\Vert ^2\) and thus, (A.8) holds in both sub-cases. Consequently, due to the positive semidefiniteness of H, (A.8), and
we can infer
where \(\eta _2 := \epsilon ^{\frac{1}{4}}\sqrt{r_{+}} / (2\sqrt{2})\), provided that \(\Delta \) is chosen sufficiently small. Here, we also used the estimates \(t_{\Delta }=\Vert \Delta \Vert ^{2}-\Vert \Delta \Vert ^{4}/4\le \Vert \Delta \Vert ^{2}\) and \(1- \Vert \Delta \Vert ^2/2 \ge 1/2\) in the second inequality. The third inequality follows from \([\Delta ^{T}H\Delta + 2 \Vert \Delta \Vert _4^4 +\epsilon r_{+}^{2}\Vert \Delta _{\mathcal {I}}\Vert ^{2}]^\frac{1}{4}\ge \epsilon ^\frac{1}{4} \sqrt{r_{+}\Vert \Delta _{\mathcal {I}}\Vert }\). Next, utilizing (6.4) and \(|\tau _k| \le 2 |\Delta _k|\) for all \(k \in \mathcal {I}\), we finally obtain
for some constant \(\eta _{3}>0\) and for all \(\mathbf{y}\) sufficiently close to \(\mathbf{z}\). Hence, the largest KL exponent is at least \(\frac{1}{4}\) in this case.
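The identities \(\Delta ^T\mathbf{y} = \frac{1}{2}\Vert \Delta \Vert ^2\) and \(t_\Delta = \Vert \Delta \Vert ^2 - \frac{1}{4}\Vert \Delta \Vert ^4\) used at the start of Case 2 rely only on \(\Vert \mathbf{y}\Vert = \Vert \mathbf{z}\Vert = 1\); a small numerical check (helper name is ours):

```python
import numpy as np

def sphere_identities(y, z):
    """For unit vectors y, z and Delta = y - z, return the two pairs
    (Delta^T y, ||Delta||^2 / 2) and
    (Delta^T P_y^perp Delta, ||Delta||^2 - ||Delta||^4 / 4),
    which should agree pairwise."""
    Delta = y - z
    n2 = float(Delta @ Delta)
    proj = Delta - (Delta @ y) * y            # P_y^perp Delta
    return float(Delta @ y), n2 / 2.0, float(Delta @ proj), n2 - n2 ** 2 / 4.0
```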
Case 3. \((1-\epsilon )\Vert H\mathbf{v}\Vert \le \Vert \mathbf {w}\Vert \le (1+\epsilon )\Vert H\mathbf{v}\Vert \), \(\gamma _{1}\ge (2-\epsilon )r_{+}^{2}\Vert \Delta _{\mathcal {I}} \Vert ^{2}\Vert \Delta \Vert ^{-2}\) and \(\left( 2-\epsilon \right) r_{+}^{2}\Vert \Delta _{\mathcal {I}}\Vert \le \Vert \Delta \Vert ^{2}\). In this case, the inequality (A.9) implies \(\Vert \Delta _{\mathcal {I}}\Vert =\Theta (\Vert \Delta \Vert ^{2})\) and setting \(\nu = [(2-\epsilon )r_+^2]^{-1}\), the terms \(\gamma _{i}\) for \(i=1, \ldots ,3\) can be estimated as follows
Together with \(\gamma _4 = \Vert \Delta \Vert _4^4\), this shows that
for all \(\Delta \). Let us set \(m := |\mathcal {I}|\) and define \(\sigma _{k}= \Delta _{k} z_{k}+\frac{1}{2m}\Vert \Delta \Vert ^{2}\) for all \(k\in \mathcal {I}\) and \(\sigma _k = 0\) for all \(k \in \mathcal {A}\). Then, by (A.6), we have
and \(\Vert \sigma \Vert = \Theta (\Vert \Delta \Vert ^2)\). We now express \(\Vert \mathbf {w}\Vert ^{2}\) in terms of \(\gamma _{1}\), \(\gamma _2\), \(\gamma _3\), and \(\gamma _{4}\). Specifically, by utilizing (A.2), (A.11) and by mimicking the derivation of (A.7), we obtain
We notice that the higher order terms \(\sum _{k \in \mathcal {I}} \Delta _k^{3+\ell } z_k^{3-\ell }\), \(\ell \in \{0,1,2\}\), can be discussed as in (A.10) and are all bounded by \(O(\Vert \Delta \Vert ^6)\). Finally, applying (A.12) and \(|z_k| \le 1\) and \(z_k^2 \ge r_+^2\) for all \(k \in \mathcal {I}\), we obtain
and \({\sum }_{k, j \in \mathcal {I}} z_k^2 z_j^2 (\sigma _k - \sigma _j)^2 \le 2m \Vert \sigma \Vert ^2\) which implies \(\Vert \mathbf {w}\Vert =\Theta (\Vert \Delta \Vert ^{2})\) and
for \(\Delta \rightarrow 0\). As a consequence, by (A.4), we also get
Furthermore, by (6.4) and (A.1), it holds that
for \(\Delta \rightarrow 0\). For some index sets \({\mathcal {K}}, {\mathcal {J}} \subset [n]\), let \(H_{{\mathcal {K}}{\mathcal {J}}} \in \mathbb {R}^{|{\mathcal {K}}| \times |{\mathcal {J}}|}\) denote the submatrix \(H_{{\mathcal {K}}{\mathcal {J}}} = (H_{kj})_{k \in {\mathcal {K}}, j \in {\mathcal {J}}}\). Notice that due to the positive semidefiniteness of H, we have \(H_{\mathcal {I}\mathcal {I}}\succeq 0\) and \(H_{\mathcal {A}\mathcal {A}}\succeq 0\). Moreover, due to (A.2), (A.3), and \(|\mathbf{v}^{T}H\mathbf{v}|=O(\Vert \Delta \Vert ^{4})\), we obtain
where \(c_{1}=2\mathbf{y}^{T}\mathrm{diag}(\tau )\mathbf{y}\) was defined in (A.2) and we used the identities \(z_{k}=0\) for \(k\in \mathcal {A}\) and \(w_{\mathcal {A}}=2\mathrm{diag}(\tau _\mathcal {A}){y}_\mathcal {A}-c_{1}{y}_\mathcal {A}=2\mathrm{diag}(|\Delta _\mathcal {A}|^2)\Delta _\mathcal {A}- c_1 \Delta _\mathcal {A}\). We set
and let \(\eta _{4},\eta _{5}>0\) and \(\mu \in (0,\frac{1}{2})\) be given constants. Next, we discuss two separate sub-cases.
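The spread bound \(\sum _{k,j \in \mathcal {I}} z_k^2 z_j^2 (\sigma _k - \sigma _j)^2 \le 2m \Vert \sigma \Vert ^2\) used above follows from \(z_k^2 \le 1\) and the identity \(\sum _{k,j}(\sigma _k - \sigma _j)^2 = 2m\Vert \sigma \Vert ^2 - 2(\sum _k \sigma _k)^2\); a quick numerical check (helper names are ours):

```python
import numpy as np

def weighted_spread(z, sigma):
    """sum_{k,j} z_k^2 z_j^2 (sigma_k - sigma_j)^2 over the support."""
    w = z ** 2
    d = sigma[:, None] - sigma[None, :]
    return float(np.sum(np.outer(w, w) * d ** 2))

def spread_bound(sigma):
    """2 m ||sigma||^2 with m = |support|."""
    return 2.0 * len(sigma) * float(sigma @ sigma)
```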
Sub-case 3.1. \(\Delta _{\mathcal {I}}^{T}\mathbf{h}\ge -\eta _{4}\Vert \Delta \Vert ^{4+\mu }\) or \(\Delta _\mathcal {I}^T \mathbf{h} \le -\eta _5 \Vert \Delta \Vert ^{4-\mu }\). Let us first assume \(\Delta _{\mathcal {I}}^{T}\mathbf{h}\ge -\eta _{4}\Vert \Delta \Vert ^{4+\mu }\). Following the derivation of (A.7) and using (A.6), (A.2), and (A.11), we obtain \(\frac{1}{2}c_1 = \Vert \Delta \Vert _4^4 + 4\gamma _3 + 5\gamma _2 + 2\gamma _1 = \Theta (\Vert \Delta \Vert ^2)\), \(\Delta _\mathcal {I}^T y_\mathcal {I}= - \frac{1}{2}\Vert \Delta \Vert ^2\), and
which yields
Similarly, in the case \(\Delta _{\mathcal {I}}^{T}\mathbf{h} \le -\eta _{5}\Vert \Delta \Vert ^{4-\mu }\) and if \(\Vert \Delta \Vert \) is sufficiently small, we get
Combining both cases, we can infer \(\Vert \Delta _{\mathcal {I}}\Vert \Vert \mathbf{g}_{1}\Vert \ge |\Delta _{\mathcal {I}}^{T}\mathbf{g}_{1}| \ge \eta _6 \Vert \Delta \Vert ^4\) for some \(\eta _6 > 0\) and for all \(\mathbf{y}\) sufficiently close to \(\mathbf{z}\). By the assumptions of Case 3, this implies \(\Vert \mathbf{g}_{1}\Vert \ge \nu ^{-1}\eta _6 \Vert \Delta \Vert ^2\) and hence, by (A.16), we have \(\Vert \mathrm {grad\!\;}f(\mathbf{y})\Vert \ge \Theta (\Vert \Delta \Vert ^2)\). Considering (A.15), the Łojasiewicz inequality holds with \(\theta = \frac{1}{2}\) in this sub-case.
Sub-case 3.2. \(-\eta _{5}\Vert \Delta \Vert ^{4-\mu }\le \Delta _{\mathcal {I}}^{T}\mathbf{h} \le -\eta _{4}\Vert \Delta \Vert ^{4+\mu }\). Due to \(\Delta _{\mathcal {I}}^{T}H_{\mathcal {I} \mathcal {I}}\Delta _{\mathcal {I}}\ge 0\), this directly yields \(\Delta _{\mathcal {I}}^{T}H_{\mathcal {I}\mathcal {A}} \Delta _{\mathcal {A}}=\Delta _{\mathcal {I}}^T\mathbf{h}-\Delta _{\mathcal {I}}^{T}H_{\mathcal {I}\mathcal {I}} \Delta _{\mathcal {I}}<0\). Moreover, we can estimate that
where \(\lambda _-(H_{\mathcal {I}\mathcal {I}})\) is the maximal eigenvalue of \(H_{\mathcal {I}\mathcal {I}}\). We note that the last inequality holds when \(\Vert \Delta \Vert \) is small enough, since \(\eta _5,r_+,\epsilon ,H_{\mathcal {I}\mathcal {I}}\) do not depend on \(\Delta \) and can be considered as constants when \(\Delta \rightarrow 0\). Also, we have
Utilizing the positive semidefiniteness of H, it holds that
Hence, the estimates \(\Vert H\mathbf{v}\Vert =\Theta (\Vert \Delta \Vert ^2)\) in (A.13) and \(|\mathbf{v}^TH\mathbf{v}| = O(\Vert \Delta \Vert ^4)\) in (A.14) lead to \(\mathbf{v}^TH\mathbf{v} = \Theta (\Vert \Delta \Vert ^4)\). Consequently, if \(\Delta _{\mathcal {A}}^{T}H_{\mathcal {A}\mathcal {A}} \Delta _{\mathcal {A}}\ge \eta _{7}\Vert \Delta \Vert ^{4-2\mu }\) for some constant \(\eta _{7}>0\), then we can infer
for \(\Delta \rightarrow 0\). As in sub-case 3.1, this allows us to show \(\Vert \mathrm {grad\!\;}{f(\mathbf{y})}\Vert \ge \Theta (\Vert \Delta \Vert ^{3-2\mu })\) and thus, by (A.15), the Łojasiewicz inequality holds with \(\theta = \frac{1+2\mu }{4} \ge \frac{1}{4}\).
Finally, let us consider \(\eta _{8}\Vert \Delta \Vert ^{4}\le \Delta _{\mathcal {A}}^{T} H_{\mathcal {A}\mathcal {A}}\Delta _{\mathcal {A}}\le \eta _{7}\Vert \Delta \Vert ^{4-2\mu }\), where \(\eta _{8} > 0\) is chosen such that \(\Delta ^{T}H\Delta \ge \eta _{8}\Vert \Delta \Vert ^{4}\) and let us define the final decompositions
where \(\mathrm {null}~M\) is the null space of a matrix M. We then have \(\Vert \xi _{1}\Vert =\Theta (\Vert H_{\mathcal {A}\mathcal {A}}\xi _{1}\Vert )= \Theta (\sqrt{\xi _{1}^{T}H_{\mathcal {A}\mathcal {A}}\xi _{1}})=O (\Vert \Delta \Vert ^{2-\mu })\) and \(\Vert \psi _{1}\Vert =O(\Vert \Delta \Vert )\). Notice that such decompositions exist due to the symmetry of \(H_{\mathcal {A}\mathcal {A}}\).
Since H is positive semidefinite, we can show that \(\mathrm {null}\,H_{\mathcal {A}\mathcal {A}}\subset \mathrm {null}\, H_{\mathcal {I}\mathcal {A}}\). If \(H_{\mathcal {I}\mathcal {A}}=0\), then this claim is certainly true. Otherwise, if we assume that the statement is false, the set \(S = \mathrm {null}\,H_{\mathcal {A}\mathcal {A}}\cap [\mathrm {null}\,H_{\mathcal {I}\mathcal {A}}]^{\bot }\) contains a nonzero element and we can select \(\psi \in S\backslash \{0\}\) and \(\xi \in [\mathrm {null}\,H_{\mathcal {I}\mathcal {A}}^{T}]^{\bot } \backslash \{0\} = [\mathrm {ran}\,H_{\mathcal {I}\mathcal {A}}] \backslash \{0\}\). Then it holds that
But since \([H_{\mathcal {I}\mathcal {A}}\psi ]^{T}\xi \ne 0\), we can choose \(a \in \mathbb {R}\) such that \(\xi ^{T}H_{\mathcal {I}\mathcal {I}}\xi +2a\psi ^{T}H_{\mathcal {I} \mathcal {A}}^{T}\xi <0\), which is a contradiction.
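The inclusion \(\mathrm {null}\,H_{\mathcal {A}\mathcal {A}} \subset \mathrm {null}\,H_{\mathcal {I}\mathcal {A}}\) just established can also be observed numerically for any positive semidefinite block matrix; a small sketch with a random rank-deficient \(H = G^TG\) (the construction is ours):

```python
import numpy as np

rng = np.random.default_rng(0)
G = rng.normal(size=(2, 5))          # H = G^T G is PSD of rank 2
H = G.T @ G
H_IA, H_AA = H[:2, 2:], H[2:, 2:]    # blocks for I = {0,1}, A = {2,3,4}

# H_AA is 3x3 of rank <= 2, so it has a null vector; eigh returns
# eigenvalues in ascending order, hence the first eigenvector lies in
# null H_AA and must also be annihilated by H_IA.
evals, evecs = np.linalg.eigh(H_AA)
v = evecs[:, 0]
```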
Hence, due to \(H_{\mathcal {I}\mathcal {A}}^{T}\Delta _{\mathcal {I}} \in \mathrm {ran}\,H_{\mathcal {I}\mathcal {A}}^T = [\mathrm {null}\,H_{\mathcal {I}\mathcal {A}}]^\bot \), we can infer \(H_{\mathcal {I}\mathcal {A}}^{T}\Delta _{\mathcal {I}} + H_{\mathcal {A}\mathcal {A}}\Delta _{\mathcal {A}}\in [\mathrm {null}\,H_{\mathcal {A}\mathcal {A}}]^{\perp }\). This implies that \(\mathbf{g}_2\) can be written as \(\mathbf{g}_2 = \mathbf{g}_3 + \mathbf{d}\) where \(\mathbf{g}_3 \in [\mathrm {null}\,H_{\mathcal {A}\mathcal {A}}]^{\perp }\) and \(\mathbf{d}=2\psi _{2}-c_{1}\psi _{1} \in \mathrm {null}\,H_{\mathcal {A}\mathcal {A}}\). If \(\Vert \mathbf{d}\Vert \ge \frac{\eta _{8}}{2}\Vert \Delta \Vert ^{3}\), we obtain \(\Vert \mathrm {grad\!\;}{f(\mathbf{y})}\Vert \ge \Vert \mathbf{g}_{2}\Vert \ge \Vert \mathbf{d}\Vert \ge \frac{\eta _{8}}{2}\Vert \Delta \Vert ^{3}\) by (A.16) and the Łojasiewicz inequality is satisfied with \(\theta =\frac{1}{4}\) due to (A.15). Otherwise, if \(\Vert \mathbf{d}\Vert \le \frac{\eta _{8}}{2}\Vert \Delta \Vert ^{3}\), it follows
where we applied the estimates \(\Vert \xi _2\Vert \le \Vert \Delta _\mathcal {A}\Vert _6^3 \le \Vert \Delta \Vert ^3\) and \(c_1 = \Theta (\Vert \Delta \Vert ^2)\). Using (A.18), this shows
for \(\Delta \rightarrow 0\). Thus, we have \(\Vert \mathrm {grad\!\;}{f(\mathbf{y})}\Vert \ge \Theta (\Vert \Delta \Vert ^{3})\) and as before we can infer that the Łojasiewicz inequality holds with \(\theta =\frac{1}{4}\) in this case. \(\square \)
Zhang, H., Milzarek, A., Wen, Z. et al. On the geometric analysis of a quartic–quadratic optimization problem under a spherical constraint. Math. Program. 195, 421–473 (2022). https://doi.org/10.1007/s10107-021-01702-6
Keywords
- Constrained quartic–quadratic optimization
- Geometric analysis
- Strict-saddle property
- Łojasiewicz inequality