Abstract
This paper considers the problem of solving a special quartic–quadratic optimization problem with a single sphere constraint, namely, finding global and local minimizers of \(\frac{1}{2}\mathbf {z}^{*}A\mathbf {z}+\frac{\beta }{2}\sum _{k=1}^{n}|z_{k}|^{4}\) such that \(\Vert \mathbf {z}\Vert _{2}=1\). This problem arises in multiple domains, including quantum mechanics and chemistry, and we investigate its geometric properties. Fourth-order optimality conditions are derived for characterizing local and global minima. When the matrix in the quadratic term is diagonal, the problem has no spurious local minima and global solutions can be represented explicitly and calculated in \(O(n\log {n})\) operations. When A is a rank-one matrix, the global minima of the problem are unique under certain phase shift schemes. The strict-saddle property, which can imply polynomial-time convergence of second-order-type algorithms, is established when the coefficient \(\beta \) of the quartic term is either at least \(O(n^{3/2})\) or not larger than O(1). Finally, the Kurdyka–Łojasiewicz exponent of the quartic–quadratic problem is estimated and it is shown that the largest exponent is at least 1/4 for a broad class of stationary points.
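The \(O(n\log n)\) claim for a diagonal matrix can be made concrete. The following is a minimal sketch, not the paper's own derivation: assuming \(A = \mathrm{diag}(a)\) and \(\beta > 0\), the substitution \(p_k = |z_k|^2\) turns the problem into a separable convex quadratic over the probability simplex, which a sorting-based water-filling scheme solves; the helper name `diagonal_min` is ours.

```python
import numpy as np

def diagonal_min(a, beta):
    """Water-filling sketch for the diagonal case: with p_k = |z_k|^2,
    min 1/2 z* diag(a) z + beta/2 sum_k |z_k|^4  s.t. ||z||_2 = 1
    becomes  min sum_k (a_k/2) p_k + (beta/2) p_k^2  over the simplex,
    a Euclidean projection problem solvable after one sort: O(n log n).
    Assumes beta > 0."""
    a = np.asarray(a, dtype=float)
    idx = np.argsort(a)                       # ascending: cheapest entries first
    c = a[idx] / 2.0
    j = np.arange(1, len(c) + 1)
    lam = (beta + np.cumsum(c)) / j           # candidate multiplier per support size
    m = int(np.count_nonzero(lam > c))        # largest support keeping all p_k > 0
    p = np.zeros_like(c)
    p[idx[:m]] = (lam[m - 1] - c[:m]) / beta  # p_k = |z_k|^2 on the support
    return p
```

Any phases with \(|z_k|^2 = p_k\) then give a global minimizer, since for diagonal \(A\) the objective depends only on the moduli \(|z_k|\).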
Acknowledgements
The authors are grateful to Prof. Adrian Lewis, the Associate Editor and the anonymous reviewers for their valuable comments and suggestions that helped to improve the quality of our manuscript.
H. Zhang was partly supported by the elite undergraduate training program of School of Mathematical Sciences in Peking University. Z. Wen was supported in part by the NSFC Grant 11831002, the Key-Area Research and Development Program of Guangdong Province (No. 2019B121204008) and Beijing Academy of Artificial Intelligence. A. Milzarek was partly supported by the Beijing International Center for Mathematical Research, Peking University, the Boya Postdoctoral Fellowship Program, the Shenzhen Institute for Artificial Intelligence and Robotics for Society (AIRS), and by the Fundamental Research Fund - Shenzhen Research Institute of Big Data (SRIBD) Startup Fund JCYJ-AM20190661.
A Proof of Theorem 6.2
Proof
As in the proof of Theorem 6.1, the verification of Theorem 6.2 is mainly based on proper decompositions of \(\Delta = \mathbf{y}-\mathbf{z}\) and \(\mathrm{diag}(\tau )\mathbf{y}\) that allow us to derive appropriate bounds on \(|f(\mathbf{y}) - f(\mathbf{z})|\) and \(\Vert \mathrm {grad\!\;}{f(\mathbf{y})}\Vert \) if \(\mathbf{y}\) is close to \(\mathbf{z}\) or, equivalently, if \(\Delta \) is sufficiently small. In particular, we will discuss three cases that allow us to simplify and estimate the expressions for \(|f(\mathbf{y}) - f(\mathbf{z})|\) and \(\Vert \mathrm {grad\!\;}{f(\mathbf{y})}\Vert \) by identifying the different leading terms.
Without loss of generality we assume \(\beta =1\). Let \(\mathbf{y} \in \mathbb {S}^{n-1}\) be arbitrary and let us set \(\Delta = \mathbf{y} - \mathbf{z}\), and
Based on the representation \(\mathrm {grad\!\;}{f(\mathbf{y})} = P_\mathbf{y}^\bot [H + 2 \mathrm{diag}(\tau )]\mathbf{y}\), we now introduce the following decompositions
where \(\sigma _-(H)\) denotes the smallest positive singular value of H. Using this decomposition, (6.5), and \(H\mathbf{y} = H\Delta \), we can express the norm of the Riemannian gradient as follows
Let \(\lambda _+(H)\) be the largest eigenvalue of H. Then, by definition of \(\mathbf{v}\), we obtain
Moreover, Lemma 6.1 yields
for some constant \(\eta _{1}>0\) and for all \(\mathbf{y} \in \mathbb {S}^{n-1}\) sufficiently close to \(\mathbf{z}\). Throughout the proof, we will also repeatedly use the following facts:
Let \(\epsilon \in (0,1]\) be an arbitrary, small positive constant. We now discuss three different cases.
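Throughout the case discussion, \(P_\mathbf{y}^\bot \) denotes the orthogonal projector onto the tangent space of the sphere at \(\mathbf{y}\); the quantities \(H\) and \(\tau \) are as defined earlier in the paper and are not reproduced here. A minimal numerical sketch of the projector (the function name is ours):

```python
import numpy as np

def tangent_project(y, v):
    """P_y^perp v = v - (y^T v) y for a unit vector y: the orthogonal
    projection onto the tangent space {u : y^T u = 0} of the sphere at y."""
    return v - (y @ v) * y
```

All the projected quantities appearing in the decompositions are of this form and are therefore orthogonal to \(\mathbf{y}\).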
Case 1. \(\Vert \mathbf {w}\Vert \ge (1+\epsilon ) \Vert H\mathbf{v}\Vert \) or \(\Vert \mathbf {w}\Vert \le (1-\epsilon )\Vert H\mathbf{v}\Vert \). We first assume \(\mathbf{w} \ne 0\) and \(\epsilon < 1\). Since the function \(x \mapsto \varrho (x) := x + 1/x\) satisfies \(\varrho '(x) = 1 - x^{-2}\) and is thus monotonically decreasing on the interval \((0,1-\epsilon ]\) and monotonically increasing on \([1+\epsilon ,\infty )\), it follows
Thus, we obtain
which further implies
Next, we choose \(\delta _\mathbf{z} > 0\) sufficiently small such that \(\Vert \mathbf{v}\Vert ^2 \le \epsilon ^2 {\bar{\sigma }} / (10\lambda _+(H))\) and \(|\mathbf{v}^TH\mathbf{v}| \le 1\). (This is possible due to the decomposition (A.2) and \(\Vert \mathbf{v}\Vert \le \Vert \Delta \Vert \)). Using (A.4), (A.5), the estimate \([\frac{1}{2} |a + b|]^{3/2} \le [|a|^{3/2} + |b|^{3/2}] / \sqrt{2}\), \(a,b \in \mathbb {R}\), and setting \({\tilde{\eta }}_1 := \min \{\eta _1^{-2},\frac{{\bar{\sigma }}}{2}\}\), it follows
where the last equality is a consequence of (6.4). Thus, we can infer that the largest KL exponent of problem (1.1) at \(\mathbf{z}\) is at least \(\frac{1}{4}\).
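The scalar estimate \([\frac{1}{2}|a+b|]^{3/2} \le [|a|^{3/2} + |b|^{3/2}]/\sqrt{2}\) used in Case 1 follows from the convexity of \(t \mapsto t^{3/2}\) on \([0,\infty )\) together with \(|a+b| \le |a| + |b|\); a quick numerical sanity check (helper names are ours):

```python
import numpy as np

def half_sum_pow(a, b):
    """Left-hand side [ |a + b| / 2 ]^(3/2)."""
    return (0.5 * abs(a + b)) ** 1.5

def split_pow(a, b):
    """Right-hand side [ |a|^(3/2) + |b|^(3/2) ] / sqrt(2); convexity of
    t -> t^(3/2) even gives the sharper constant 1/2."""
    return (abs(a) ** 1.5 + abs(b) ** 1.5) / np.sqrt(2.0)
```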
Case 2. \((2-{\epsilon })r_{+}^{2}\Vert \Delta _{\mathcal {I}} \Vert \ge \Vert \Delta \Vert ^2\) or \(\gamma _1 \le (2-\epsilon )r_+^2 \Vert \Delta _\mathcal {I}\Vert ^2 \Vert \Delta \Vert ^{-2}\). First, utilizing the identity \(\Delta ^T \mathbf{y} = \frac{1}{2}\Vert \Delta \Vert ^2\) in (A.6), we have \(P_{\mathbf {y}}^{\perp }\Delta =\Delta -\frac{1}{2}\Vert \Delta \Vert ^{2}\mathbf {y}\). Defining \(t_\Delta := \Delta ^T P_{\mathbf {y}}^{\perp }\Delta = \Vert \Delta \Vert ^2 - \frac{1}{4} \Vert \Delta \Vert ^4\), we will now work with the following additional decompositions
We note that due to the choice of \(c_2\) and \(c_3\) and (A.2), the vectors \(\mathbf {w}_{1}\) and \(\mathbf {w}_{2}\) are orthogonal to \(\mathbf {y}\) and \(\Delta \). Hence, by (A.3), it holds that
Recalling the definitions introduced in (A.1) and using \(\tau _k = (y_k - z_k)(y_k + z_k) = \Delta _k (\Delta _k + 2z_k)\) and \(y_k = \Delta _k + z_k\), we can express \(t_\Delta \cdot c_{3}\) via
Notice that we have \(\gamma _2 \ge r_+^2 \Vert \Delta _\mathcal {I}\Vert ^2\) and \(\gamma _1 \le 1\) provided that \(\Vert \Delta \Vert \le 1\). Now, if \((2-{\epsilon })r_{+}^{2}\Vert \Delta _{\mathcal {I}}\Vert \ge \Vert \Delta \Vert ^{2}\), we obtain
and hence, it follows
for \(\Delta \) sufficiently small. Otherwise, if \(\gamma _1 \le (2-\epsilon )r_+^2 \Vert \Delta _\mathcal {I}\Vert ^2 \Vert \Delta \Vert ^{-2}\), then we also have \((4-5\Vert \Delta \Vert ^2) \gamma _2 - 2 \Vert \Delta \Vert ^2 \gamma _1 \ge (2\epsilon - 5\Vert \Delta \Vert ^2) r_+^2 \Vert \Delta _\mathcal {I}\Vert ^2\) and thus, (A.8) holds in both sub-cases. Consequently, due to the positive semidefiniteness of H, (A.8), and
we can infer
where \(\eta _2 := \epsilon ^{\frac{1}{4}}\sqrt{r_{+}} / (2\sqrt{2})\), provided that \(\Delta \) is chosen sufficiently small. Here, we also used the estimates \(t_{\Delta }=\Vert \Delta \Vert ^{2}-\Vert \Delta \Vert ^{4}/4\le \Vert \Delta \Vert ^{2}\) and \(1- \Vert \Delta \Vert ^2/2 \ge 1/2\) in the second inequality. The third inequality follows from \([\Delta ^{T}H\Delta + 2 \Vert \Delta \Vert _4^4 +\epsilon r_{+}^{2}\Vert \Delta _{\mathcal {I}}\Vert ^{2}]^\frac{1}{4}\ge \epsilon ^\frac{1}{4} \sqrt{r_{+}\Vert \Delta _{\mathcal {I}}\Vert }\). Next, utilizing (6.4) and \(|\tau _k| \le 2 |\Delta _k|\) for all \(k \in \mathcal {I}\), we finally obtain
for some constant \(\eta _{3}>0\) and for all \(\mathbf{y}\) sufficiently close to \(\mathbf{z}\). Hence, the largest KL exponent is at least \(\frac{1}{4}\) in this case.
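The identities \(\Delta ^T\mathbf{y} = \frac{1}{2}\Vert \Delta \Vert ^2\) and \(t_\Delta = \Vert \Delta \Vert ^2 - \frac{1}{4}\Vert \Delta \Vert ^4\) used at the start of Case 2 rely only on \(\Vert \mathbf{y}\Vert = \Vert \mathbf{z}\Vert = 1\); a small numerical check (helper name is ours):

```python
import numpy as np

def sphere_identities(y, z):
    """For unit vectors y, z and Delta = y - z, return the two pairs
    (Delta^T y, ||Delta||^2 / 2) and
    (Delta^T P_y^perp Delta, ||Delta||^2 - ||Delta||^4 / 4),
    which should agree pairwise."""
    Delta = y - z
    n2 = float(Delta @ Delta)
    proj = Delta - (Delta @ y) * y            # P_y^perp Delta
    return float(Delta @ y), n2 / 2.0, float(Delta @ proj), n2 - n2 ** 2 / 4.0
```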
Case 3. \((1-\epsilon )\Vert H\mathbf{v}\Vert \le \Vert \mathbf {w}\Vert \le (1+\epsilon )\Vert H\mathbf{v}\Vert \), \(\gamma _{1}\ge (2-\epsilon )r_{+}^{2}\Vert \Delta _{\mathcal {I}} \Vert ^{2}\Vert \Delta \Vert ^{-2}\) and \(\left( 2-\epsilon \right) r_{+}^{2}\Vert \Delta _{\mathcal {I}}\Vert \le \Vert \Delta \Vert ^{2}\). In this case, the inequality (A.9) implies \(\Vert \Delta _{\mathcal {I}}\Vert =\Theta (\Vert \Delta \Vert ^{2})\) and setting \(\nu = [(2-\epsilon )r_+^2]^{-1}\), the terms \(\gamma _{i}\) for \(i=1, \ldots ,3\) can be estimated as follows
Together with \(\gamma _4 = \Vert \Delta \Vert _4^4\), this shows that
for all \(\Delta \). Let us set \(m := |\mathcal {I}|\) and define \(\sigma _{k}= \Delta _{k} z_{k}+\frac{1}{2m}\Vert \Delta \Vert ^{2}\) for all \(k\in \mathcal {I}\) and \(\sigma _k = 0\) for all \(k \in \mathcal {A}\). Then, by (A.6), we have
and \(\Vert \sigma \Vert = \Theta (\Vert \Delta \Vert ^2)\). We now express \(\Vert \mathbf {w}\Vert ^{2}\) in terms of \(\gamma _{1}\), \(\gamma _2\), \(\gamma _3\), and \(\gamma _{4}\). Specifically, by utilizing (A.2), (A.11) and by mimicking the derivation of (A.7), we obtain
We notice that the higher order terms \(\sum _{k \in \mathcal {I}} \Delta _k^{3+\ell } z_k^{3-\ell }\), \(\ell \in \{0,1,2\}\), can be discussed as in (A.10) and are all bounded by \(O(\Vert \Delta \Vert ^6)\). Finally, applying (A.12) and \(|z_k| \le 1\) and \(z_k^2 \ge r_+^2\) for all \(k \in \mathcal {I}\), we obtain
and \({\sum }_{k, j \in \mathcal {I}} z_k^2 z_j^2 (\sigma _k - \sigma _j)^2 \le 2m \Vert \sigma \Vert ^2\) which implies \(\Vert \mathbf {w}\Vert =\Theta (\Vert \Delta \Vert ^{2})\) and
for \(\Delta \rightarrow 0\). As a consequence, by (A.4), we also get
Furthermore, by (6.4) and (A.1), it holds that
for \(\Delta \rightarrow 0\). For some index sets \({\mathcal {K}}, {\mathcal {J}} \subset [n]\), let \(H_{{\mathcal {K}}{\mathcal {J}}} \in \mathbb {R}^{|{\mathcal {K}}| \times |{\mathcal {J}}|}\) denote the submatrix \(H_{{\mathcal {K}}{\mathcal {J}}} = (H_{kj})_{k \in {\mathcal {K}}, j \in {\mathcal {J}}}\). Notice that due to the positive semidefiniteness of H, we have \(H_{\mathcal {I}\mathcal {I}}\succeq 0\) and \(H_{\mathcal {A}\mathcal {A}}\succeq 0\). Moreover, due to (A.2), (A.3), and \(|\mathbf{v}^{T}H\mathbf{v}|=O(\Vert \Delta \Vert ^{4})\), we obtain
where \(c_{1}=2\mathbf{y}^{T}\mathrm{diag}(\tau )\mathbf{y}\) was defined in (A.2) and we used the identities \(z_{k}=0\) for \(k\in \mathcal {A}\) and \(w_{\mathcal {A}}=2\mathrm{diag}(\tau _\mathcal {A}){y}_\mathcal {A}-c_{1}{y}_\mathcal {A}=2\mathrm{diag}(|\Delta _\mathcal {A}|^2)\Delta _\mathcal {A}- c_1 \Delta _\mathcal {A}\). We set
and let \(\eta _{4},\eta _{5}>0\) and \(\mu \in (0,\frac{1}{2})\) be given constants. Next, we discuss two separate sub-cases.
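The spread bound \(\sum _{k,j \in \mathcal {I}} z_k^2 z_j^2 (\sigma _k - \sigma _j)^2 \le 2m \Vert \sigma \Vert ^2\) used above follows from \(z_k^2 \le 1\) and the identity \(\sum _{k,j}(\sigma _k - \sigma _j)^2 = 2m\Vert \sigma \Vert ^2 - 2(\sum _k \sigma _k)^2\); a quick numerical check (helper names are ours):

```python
import numpy as np

def weighted_spread(z, sigma):
    """sum_{k,j} z_k^2 z_j^2 (sigma_k - sigma_j)^2 over the support."""
    w = z ** 2
    d = sigma[:, None] - sigma[None, :]
    return float(np.sum(np.outer(w, w) * d ** 2))

def spread_bound(sigma):
    """2 m ||sigma||^2 with m = |support|."""
    return 2.0 * len(sigma) * float(sigma @ sigma)
```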
Sub-case 3.1. \(\Delta _{\mathcal {I}}^{T}\mathbf{h}\ge -\eta _{4}\Vert \Delta \Vert ^{4+\mu }\) or \(\Delta _\mathcal {I}^T \mathbf{h} \le -\eta _5 \Vert \Delta \Vert ^{4-\mu }\). Let us first assume \(\Delta _{\mathcal {I}}^{T}\mathbf{h}\ge -\eta _{4}\Vert \Delta \Vert ^{4+\mu }\). Following the derivation of (A.7) and using (A.6), (A.2), and (A.11), we obtain \(\frac{1}{2}c_1 = \Vert \Delta \Vert _4^4 + 4\gamma _3 + 5\gamma _2 + 2\gamma _1 = \Theta (\Vert \Delta \Vert ^2)\), \(\Delta _\mathcal {I}^T y_\mathcal {I}= - \frac{1}{2}\Vert \Delta \Vert ^2\), and
which yields
Similarly, in the case \(\Delta _{\mathcal {I}}^{T}\mathbf{h} \le -\eta _{5}\Vert \Delta \Vert ^{4-\mu }\) and if \(\Vert \Delta \Vert \) is sufficiently small, we get
Combining both cases, we can infer \(\Vert \Delta _{\mathcal {I}}\Vert \Vert \mathbf{g}_{1}\Vert \ge |\Delta _{\mathcal {I}}^{T}\mathbf{g}_{1}| \ge \eta _6 \Vert \Delta \Vert ^4\) for some \(\eta _6 > 0\) and for all \(\mathbf{y}\) sufficiently close to \(\mathbf{z}\). By the assumptions of Case 3, this implies \(\Vert \mathbf{g}_{1}\Vert \ge \nu ^{-1}\eta _6 \Vert \Delta \Vert ^2\) and hence, by (A.16), we have \(\Vert \mathrm {grad\!\;}f(\mathbf{y})\Vert \ge \Theta (\Vert \Delta \Vert ^2)\). Considering (A.15), the Łojasiewicz inequality holds with \(\theta = \frac{1}{2}\) in this sub-case.
Sub-case 3.2. \(-\eta _{5}\Vert \Delta \Vert ^{4-\mu }\le \Delta _{\mathcal {I}}^{T}\mathbf{h} \le -\eta _{4}\Vert \Delta \Vert ^{4+\mu }\). Due to \(\Delta _{\mathcal {I}}^{T}H_{\mathcal {I} \mathcal {I}}\Delta _{\mathcal {I}}\ge 0\), this directly yields \(\Delta _{\mathcal {I}}^{T}H_{\mathcal {I}\mathcal {A}} \Delta _{\mathcal {A}}=\Delta _{\mathcal {I}}^T\mathbf{h}-\Delta _{\mathcal {I}}^{T}H_{\mathcal {I}\mathcal {I}} \Delta _{\mathcal {I}}<0\). Moreover, we can estimate that
where \(\lambda _-(H_{\mathcal {I}\mathcal {I}})\) is the maximal eigenvalue of \(H_{\mathcal {I}\mathcal {I}}\). We note that the last inequality holds when \(\Vert \Delta \Vert \) is small enough, since \(\eta _5,r_+,\epsilon ,H_{\mathcal {I}\mathcal {I}}\) do not depend on \(\Delta \) and can be considered as constants when \(\Delta \rightarrow 0\). Also, we have
Utilizing the positive semidefiniteness of H, it holds that
Hence, the estimates \(\Vert H\mathbf{v}\Vert =\Theta (\Vert \Delta \Vert ^2)\) in (A.13) and \(|\mathbf{v}^TH\mathbf{v}| = O(\Vert \Delta \Vert ^4)\) in (A.14) lead to \(\mathbf{v}^TH\mathbf{v} = \Theta (\Vert \Delta \Vert ^4)\). Consequently, if \(\Delta _{\mathcal {A}}^{T}H_{\mathcal {A}\mathcal {A}} \Delta _{\mathcal {A}}\ge \eta _{7}\Vert \Delta \Vert ^{4-2\mu }\) for some constant \(\eta _{7}>0\), then we can infer
for \(\Delta \rightarrow 0\). As in sub-case 3.1, this allows us to show \(\Vert \mathrm {grad\!\;}{f(\mathbf{y})}\Vert \ge \Theta (\Vert \Delta \Vert ^{3-2\mu })\) and thus, by (A.15), the Łojasiewicz inequality holds with \(\theta = \frac{1+2\mu }{4} \ge \frac{1}{4}\).
Finally, let us consider \(\eta _{8}\Vert \Delta \Vert ^{4}\le \Delta _{\mathcal {A}}^{T} H_{\mathcal {A}\mathcal {A}}\Delta _{\mathcal {A}}\le \eta _{7}\Vert \Delta \Vert ^{4-2\mu }\), where \(\eta _{8} > 0\) is chosen such that \(\Delta ^{T}H\Delta \ge \eta _{8}\Vert \Delta \Vert ^{4}\) and let us define the final decompositions
where \(\mathrm {null}~M\) is the null space of a matrix M. We then have \(\Vert \xi _{1}\Vert =\Theta (\Vert H_{\mathcal {A}\mathcal {A}}\xi _{1}\Vert )= \Theta (\sqrt{\xi _{1}^{T}H_{\mathcal {A}\mathcal {A}}\xi _{1}})=O (\Vert \Delta \Vert ^{2-\mu })\) and \(\Vert \psi _{1}\Vert =O(\Vert \Delta \Vert )\). Notice that such decompositions exist due to the symmetry of \(H_{\mathcal {A}\mathcal {A}}\).
Since H is positive semidefinite, we can show that \(\mathrm {null}\,H_{\mathcal {A}\mathcal {A}}\subset \mathrm {null}\, H_{\mathcal {I}\mathcal {A}}\). If \(H_{\mathcal {I}\mathcal {A}}=0\), then this claim is certainly true. Otherwise, if we assume that the statement is false, the set \(S = \mathrm {null}\,H_{\mathcal {A}\mathcal {A}}\cap [\mathrm {null}\,H_{\mathcal {I}\mathcal {A}}]^{\bot }\) contains a nonzero element and we can select \(\psi \in S\backslash \{0\}\) and \(\xi \in [\mathrm {null}\,H_{\mathcal {I}\mathcal {A}}^{T}]^{\bot } \backslash \{0\} = [\mathrm {ran}\,H_{\mathcal {I}\mathcal {A}}] \backslash \{0\}\). Then it holds that
But since \([H_{\mathcal {I}\mathcal {A}}\psi ]^{T}\xi \ne 0\), we can choose \(a \in \mathbb {R}\) such that \(\xi ^{T}H_{\mathcal {I}\mathcal {I}}\xi +2a\psi ^{T}H_{\mathcal {I} \mathcal {A}}^{T}\xi <0\), which is a contradiction.
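The inclusion \(\mathrm {null}\,H_{\mathcal {A}\mathcal {A}} \subset \mathrm {null}\,H_{\mathcal {I}\mathcal {A}}\) just established can also be observed numerically for any positive semidefinite block matrix; a small sketch with a random rank-deficient \(H = G^TG\) (the construction is ours):

```python
import numpy as np

rng = np.random.default_rng(0)
G = rng.normal(size=(2, 5))          # H = G^T G is PSD of rank 2
H = G.T @ G
H_IA, H_AA = H[:2, 2:], H[2:, 2:]    # blocks for I = {0,1}, A = {2,3,4}

# H_AA is 3x3 of rank <= 2, so it has a null vector; eigh returns
# eigenvalues in ascending order, hence the first eigenvector lies in
# null H_AA and must also be annihilated by H_IA.
evals, evecs = np.linalg.eigh(H_AA)
v = evecs[:, 0]
```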
Hence, due to \(H_{\mathcal {I}\mathcal {A}}^{T}\Delta _{\mathcal {I}} \in \mathrm {ran}\,H_{\mathcal {I}\mathcal {A}}^T = [\mathrm {null}\,H_{\mathcal {I}\mathcal {A}}]^\bot \), we can infer \(H_{\mathcal {I}\mathcal {A}}^{T}\Delta _{\mathcal {I}} + H_{\mathcal {A}\mathcal {A}}\Delta _{\mathcal {A}}\in [\mathrm {null}\,H_{\mathcal {A}\mathcal {A}}]^{\perp }\). This implies that \(\mathbf{g}_2\) can be written as \(\mathbf{g}_2 = \mathbf{g}_3 + \mathbf{d}\) where \(\mathbf{g}_3 \in [\mathrm {null}\,H_{\mathcal {A}\mathcal {A}}]^{\perp }\) and \(\mathbf{d}=2\psi _{2}-c_{1}\psi _{1} \in \mathrm {null}\,H_{\mathcal {A}\mathcal {A}}\). If \(\Vert \mathbf{d}\Vert \ge \frac{\eta _{8}}{2}\Vert \Delta \Vert ^{3}\), we obtain \(\Vert \mathrm {grad\!\;}{f(\mathbf{y})}\Vert \ge \Vert \mathbf{g}_{2}\Vert \ge \Vert \mathbf{d}\Vert \ge \frac{\eta _{8}}{2}\Vert \Delta \Vert ^{3}\) by (A.16) and the Łojasiewicz inequality is satisfied with \(\theta =\frac{1}{4}\) due to (A.15). Otherwise, if \(\Vert \mathbf{d}\Vert \le \frac{\eta _{8}}{2}\Vert \Delta \Vert ^{3}\), it follows
where we applied the estimates \(\Vert \xi _2\Vert \le \Vert \Delta _\mathcal {A}\Vert _6^3 \le \Vert \Delta \Vert ^3\) and \(c_1 = \Theta (\Vert \Delta \Vert ^2)\). Using (A.18), this shows
for \(\Delta \rightarrow 0\). Thus, we have \(\Vert \mathrm {grad\!\;}{f(\mathbf{y})}\Vert \ge \Theta (\Vert \Delta \Vert ^{3})\) and as before we can infer that the Łojasiewicz inequality holds with \(\theta =\frac{1}{4}\) in this case. \(\square \)
Zhang, H., Milzarek, A., Wen, Z. et al. On the geometric analysis of a quartic–quadratic optimization problem under a spherical constraint. Math. Program. 195, 421–473 (2022). https://doi.org/10.1007/s10107-021-01702-6
Keywords
- Constrained quartic–quadratic optimization
- Geometric analysis
- Strict-saddle property
- Łojasiewicz inequality