Abstract
We consider optimization problems on manifolds with equality and inequality constraints. A large body of work treats constrained optimization in Euclidean spaces. In this work, we consider extensions of existing algorithms from the Euclidean case to the Riemannian case: the variable lives on a known smooth manifold and is further constrained. In doing so, we exploit the growing literature on unconstrained Riemannian optimization. For the special case where the manifold is itself described by equality constraints, one could in principle treat the whole problem as a constrained problem in a Euclidean space. The main hypothesis we test here is whether it is sometimes better to exploit the geometry of the constraints, even if only for a subset of them. Specifically, this paper extends an augmented Lagrangian method and smoothed versions of an exact penalty method to the Riemannian case, together with some fundamental convergence results. Numerical experiments indicate gains in computational efficiency and accuracy in certain regimes for minimum balanced cut, non-negative PCA and k-means, especially in high dimensions.
Notes
Note that this condition involves \(\mathcal {L}\) as defined in Sect. 2.2, not \(\mathcal{{L}}_\rho \).
When the step size is of order \(10^{-10}\), we believe that the current point is close to convergence. We also conducted experiments with minimum step size \(10^{-7}\) for minimum balanced cut and non-negative PCA, and the performance profiles are visually similar to those displayed here.
The proof follows an argument laid out by John M. Lee: https://math.stackexchange.com/questions/2307289/parallel-transport-along-radial-geodesics-yields-a-smooth-vector-field.
References
Absil, P.-A., Hosseini, S.: A collection of nonsmooth Riemannian optimization problems. Technical Report UCL-INMA-2017.08, Université catholique de Louvain (2017)
Absil, P.-A., Mahony, R., Sepulchre, R.: Optimization Algorithms on Matrix Manifolds. Princeton University Press, Princeton (2008)
Agarwal, N., Boumal, N., Bullins, B., Cartis, C.: Adaptive regularization with cubics on manifolds (2018). arXiv preprint arXiv:1806.00065
Albert, R., Barabási, A.-L.: Statistical mechanics of complex networks. Rev. Mod. Phys. 74, 47–97 (2002)
Andreani, R., Birgin, E.G., Martínez, J.M., Schuverdt, M.L.: On augmented Lagrangian methods with general lower-level constraints. SIAM J. Optim. 18(4), 1286–1309 (2007)
Andreani, R., Haeser, G., Martínez, J.M.: On sequential optimality conditions for smooth constrained optimization. Optimization 60(5), 627–641 (2011)
Andreani, R., Haeser, G., Ramos, A., Silva, P.J.: A second-order sequential optimality condition associated to the convergence of optimization algorithms. IMA J. Numer. Anal. 37, 1902–1929 (2017)
Bento, G., Ferreira, O., Melo, J.: Iteration-complexity of gradient, subgradient and proximal point methods on Riemannian manifolds. J. Optim. Theory Appl. 173(2), 548–562 (2017)
Bergmann, R., Herzog, R.: Intrinsic formulation of KKT conditions and constraint qualifications on smooth manifolds (2018). arXiv preprint arXiv:1804.06214
Bergmann, R., Persch, J., Steidl, G.: A parallel Douglas-Rachford algorithm for minimizing ROF-like functionals on images with values in symmetric Hadamard manifolds. SIAM J. Imaging Sci. 9(3), 901–937 (2016)
Bertsekas, D.P.: Constrained Optimization and Lagrange Multiplier Methods. Athena Scientific, Belmont (1982)
Bertsekas, D.P.: Nonlinear Programming. Athena Scientific, Belmont (1999)
Birgin, E., Haeser, G., Ramos, A.: Augmented Lagrangians with constrained subproblems and convergence to second-order stationary points. Optimization Online (2016)
Birgin, E.G., Floudas, C.A., Martínez, J.M.: Global minimization using an augmented Lagrangian method with variable lower-level constraints. Math. Program. 125(1), 139–162 (2010)
Birgin, E.G., Martínez, J.M.: Practical Augmented Lagrangian Methods for Constrained Optimization. SIAM (2014)
Boumal, N., Absil, P.-A., Cartis, C.: Global rates of convergence for nonconvex optimization on manifolds. IMA J. Numer. Anal. (2018)
Boumal, N., Mishra, B., Absil, P.-A., Sepulchre, R.: Manopt, a Matlab toolbox for optimization on manifolds. J. Mach. Learn. Res. 15(1), 1455–1459 (2014)
Burer, S., Monteiro, R.D.: A nonlinear programming algorithm for solving semidefinite programs via low-rank factorization. Math. Program. 95(2), 329–357 (2003)
Byrd, R.H., Nocedal, J., Waltz, R.A.: Knitro: An integrated package for nonlinear optimization. In: Large-Scale Nonlinear Optimization, pp. 35–59. Springer (2006)
Cambier, L., Absil, P.-A.: Robust low-rank matrix completion by Riemannian optimization. SIAM J. Sci. Comput. 38(5), S440–S460 (2016)
do Carmo, M.P.: Riemannian Geometry. Birkhäuser, Boston (1992)
Carson, T., Mixon, D.G., Villar, S.: Manifold optimization for k-means clustering. In: Sampling Theory and Applications (SampTA), 2017 International Conference on, pp. 73–77. IEEE (2017)
Chatterjee, A., Govindu, V.M.: Efficient and robust large-scale rotation averaging. In: The IEEE International Conference on Computer Vision (ICCV) (2013)
Chen, C., Mangasarian, O.L.: Smoothing methods for convex inequalities and linear complementarity problems. Math. Program. 71(1), 51–69 (1995)
Clarke, F.H.: Optimization and Nonsmooth Analysis. SIAM (1990)
Conn, A.R., Gould, N.I.M., Toint, P.L.: LANCELOT: A Fortran Package for Large-Scale Nonlinear Optimization (Release A), vol. 17. Springer, New York (2013)
Dolan, E.D., Moré, J.J.: Benchmarking optimization software with performance profiles. Math. Program. 91(2), 201–213 (2002)
Dreisigmeyer, D.W.: Equality constraints, Riemannian manifolds and direct search methods. Optimization Online (2007)
Gould, N.I., Toint, P.L.: A note on the convergence of barrier algorithms to second-order necessary points. Math. Program. 85(2), 433–438 (1999)
Grohs, P., Hosseini, S.: \(\varepsilon \)-subgradient algorithms for locally Lipschitz functions on Riemannian manifolds. Adv. Comput. Math. 42(2), 333–360 (2016)
Guo, L., Lin, G.-H., Ye, J.J.: Second-order optimality conditions for mathematical programs with equilibrium constraints. J. Optim. Theory Appl. 158(1), 33–64 (2013)
Hosseini, S., Huang, W., Yousefpour, R.: Line search algorithms for locally Lipschitz functions on Riemannian manifolds. SIAM J. Optim. 28(1), 596–619 (2018)
Hosseini, S., Pouryayevali, M.: Generalized gradients and characterization of epi-Lipschitz sets in Riemannian manifolds. Nonlinear Anal. 74(12), 3884–3895 (2011)
Huang, W., Absil, P.-A., Gallivan, K., Hand, P.: ROPTLIB: an object-oriented C++ library for optimization on Riemannian manifolds. Technical Report FSU16-14.v2, Florida State University (2016)
Huang, W., Gallivan, K.A., Absil, P.-A.: A Broyden class of quasi-Newton methods for Riemannian optimization. SIAM J. Optim. 25(3), 1660–1685 (2015)
Johnstone, I.M., Lu, A.Y.: On consistency and sparsity for principal components analysis in high dimensions. J. Am. Stat. Assoc. 104(486), 682–693 (2009)
Kanzow, C., Steck, D.: An example comparing the standard and safeguarded augmented Lagrangian methods. Oper. Res. Lett. 45(6), 598–603 (2017)
Khuzani, M.B., Li, N.: Stochastic primal-dual method on Riemannian manifolds with bounded sectional curvature (2017). arXiv preprint arXiv:1703.08167
Kovnatsky, A., Glashoff, K., Bronstein, M.M.: Madmm: a generic algorithm for non-smooth optimization on manifolds. In: European Conference on Computer Vision, pp. 680–696. Springer (2016)
Lang, K.: Fixing two weaknesses of the spectral method. In: Advances in Neural Information Processing Systems, pp. 715–722 (2006)
Lee, J.: Introduction to Smooth Manifolds. Graduate Texts in Mathematics, vol. 218, 2nd edn. Springer, New York (2012)
Lewis, A.S., Overton, M.L.: Nonsmooth optimization via BFGS. SIAM J. Optim. (submitted) (2009)
Lichman, M.: UCI machine learning repository (2013)
Montanari, A., Richard, E.: Non-negative principal component analysis: message passing algorithms and sharp asymptotics. IEEE Trans. Inf. Theory 62(3), 1458–1484 (2016)
Nocedal, J., Wright, S.J.: Numerical Optimization, 2nd edn. Springer, New York (2006)
Parikh, N., Boyd, S.: Proximal Algorithms, vol. 1. Now Publishers Inc., Hanover (2014)
Pinar, M.Ç., Zenios, S.A.: On smoothing exact penalty functions for convex constrained optimization. SIAM J. Optim. 4(3), 486–511 (1994)
Ruszczyński, A.P.: Nonlinear Optimization, vol. 13. Princeton University Press, Princeton (2006)
Townsend, J., Koep, N., Weichwald, S.: Pymanopt: a Python toolbox for optimization on manifolds using automatic differentiation. J. Mach. Learn. Res. 17, 1–5 (2016)
Weber, M., Sra, S.: Frank–Wolfe methods for geodesically convex optimization with application to the matrix geometric mean (2017). arXiv preprint arXiv:1710.10770
Yang, W.H., Zhang, L.-H., Song, R.: Optimality conditions for the nonlinear programming problems on Riemannian manifolds. Pac. J. Optim. 10(2), 415–434 (2014)
Zass, R., Shashua, A.: Nonnegative sparse PCA. In: Advances in Neural Information Processing Systems, pp. 1561–1568 (2007)
Zhang, J., Ma, S., Zhang, S.: Primal-dual optimization algorithms over Riemannian manifolds: an iteration complexity analysis (2017). arXiv preprint arXiv:1710.02236
Zhang, J., Zhang, S.: A cubic regularized Newton’s method over Riemannian manifolds (2018). arXiv preprint arXiv:1805.05565
Acknowledgements
We thank an anonymous reviewer for detailed and helpful comments on the first version of this paper. NB is partially supported by NSF grant DMS-1719558.
Appendices
A Proof of Proposition 3.2
We first introduce two supporting lemmas. The first lemma is a well-known fact for which we provide a proof for completeness (Footnote 4).
Lemma A.1
Let p be a point on a Riemannian manifold \(\mathcal{{M}}\), and let v be a tangent vector at p. Let \(\mathcal{{U}}\) be a normal neighborhood of p, that is, the exponential map maps a neighborhood of the origin of \(\mathrm {T}_p\mathcal{{M}}\) diffeomorphically to \(\mathcal{{U}}\). Define the following vector field on \(\mathcal{{U}}\): \(V(q) = \mathcal{{P}}_{p\rightarrow q}\, v\), where parallel transport \(\mathcal{{P}}_{p\rightarrow q}\) is done along the (unique) minimizing geodesic from p to q. Then, V is a smooth vector field on \(\mathcal{{U}}\).
Proof
Parallel transport from p is along geodesics passing through p. To facilitate their study, set up normal coordinates \(\phi :U \subset {\mathbb {R}}^d \rightarrow \mathcal{{U}}\) around p (in particular, \(\phi (0) = p\)), where d is the dimension of the manifold. For a point \(\phi (x_1,\dots ,x_d)\), by definition of normal coordinates, the radial geodesic from p is \(c(t) = \phi (tx_1,\dots , tx_d)\). Our vector field of interest is defined by \(V(p) = v\) and the fact that it is parallel along every radial geodesic c as described.
For a choice of point \(\phi (x)\) and corresponding radial geodesic c, write \(V(c(t)) = \sum _{k=1}^d v_k(t)\, \partial _k(c(t))\) for some coordinate functions \(v_1, \ldots , v_d\), where \(\partial _k\) is the kth coordinate vector field. These coordinate functions satisfy the following ordinary differential equations (ODEs) [22, Prop. 2.6, eq. (2)]: \(\dot{v}_k(t) + \sum _{i,j=1}^d \Gamma _{ij}^k(c(t))\, x_i\, v_j(t) = 0\) for \(k = 1, \ldots , d\), where \(\Gamma \) denotes the Christoffel symbols and we used \(\dot{c}_i(t) = x_i\) in normal coordinates. Expand V(p) into the coordinate vector fields: \(v = \sum _{k=1}^d w_k \partial _k(p)\). Then, the initial conditions are \(v_k(0) = w_k\) for each k. Because these ODEs are smooth, solutions \(v_k(t; w)\) exist, and they are smooth in both t and the initial conditions w [42, Thm. D.6]. But this is not enough for our purpose.
Crucially, we wish to show smoothness also in the choice of \(x \in U\). To this end, following a classical trick, we extend the set of equations to let x be part of the variables, as follows: \(\dot{u}_i(t) = 0\) and \(\dot{v}_k(t) + \sum _{i,j=1}^d \Gamma _{ij}^k(\phi (t u_1, \ldots , t u_d))\, u_i\, v_j(t) = 0\) for \(k = 1, \ldots , d\). The extended initial conditions are \(u_k(0) = x_k\) and \(v_k(0) = w_k\) for each k. Clearly, the functions \(u_k(t)\) are constant: \(u_k(t) = x_k\). These ODEs are still smooth, hence solutions \(v_k(t; w, x)\) still exist and are identical to those of the previous set of ODEs, except we now see they are also smooth in the choice of x. Specifically, for every \(x \in U\), \(V(\phi (x)) = \sum _{k=1}^d v_k(1; w, x)\, \partial _k(\phi (x))\), and each \(v_k(1; w, x)\) depends smoothly on x. Hence, V is smooth on \(\mathcal{{U}} = \phi (U)\).
\(\square \)
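As an aside (not part of the original argument), the lemma can be checked numerically on the unit sphere, where parallel transport along geodesics from p admits a closed form. The sketch below assumes that closed form and verifies that \(V(q) = \mathcal{{P}}_{p\rightarrow q}\, v\) is tangent at q, norm-preserving, and varies continuously with q:

```python
import numpy as np

def transport_from(p, q, v):
    """Parallel transport of v in T_p S^2 to T_q S^2 along the minimizing
    geodesic (closed form on the unit sphere; assumes q is not antipodal)."""
    c = np.clip(p @ q, -1.0, 1.0)
    theta = np.arccos(c)
    if theta < 1e-14:                  # q == p: transport is the identity
        return v.copy()
    xi = (q - c * p) / np.sin(theta)   # unit tangent at p pointing toward q
    a = v @ xi
    # the component along xi rotates in the (p, xi) plane; the rest is fixed
    return v + a * ((np.cos(theta) - 1.0) * xi - np.sin(theta) * p)

rng = np.random.default_rng(0)
p = np.array([0.0, 0.0, 1.0])
v = np.array([1.0, 0.5, 0.0])          # tangent at p: <v, p> = 0

def V(q):
    """The vector field V(q) = P_{p->q} v on a normal neighborhood of p."""
    return transport_from(p, q, v)

# sanity checks: V(q) is tangent at q and transport preserves norms
for _ in range(100):
    q = rng.standard_normal(3); q /= np.linalg.norm(q)
    if p @ q < 0.1:                    # stay inside a normal neighborhood
        continue
    Vq = V(q)
    assert abs(Vq @ q) < 1e-10
    assert abs(np.linalg.norm(Vq) - np.linalg.norm(v)) < 1e-10

# continuity check by finite differences near a base point q0
q0 = np.array([0.6, 0.0, 0.8])
for h in [1e-2, 1e-4, 1e-6]:
    q1 = q0 + h * np.array([0.0, 1.0, 0.0]); q1 /= np.linalg.norm(q1)
    assert np.linalg.norm(V(q1) - V(q0)) < 10 * h  # Lipschitz-like bound
print("ok")
```

The specific points, tolerances, and the Lipschitz-style bound are illustrative choices; the smoothness established in the lemma is what makes the finite-difference check pass at every scale.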
Lemma A.2
Given a Riemannian manifold \(\mathcal{{M}}\), a continuously differentiable function \(f :\mathcal {M} \rightarrow {\mathbb {R}}\), and a point \(p\in \mathcal{{M}}\), if \(p_0, p_1, p_2, \ldots \) is a sequence of points in a normal neighborhood of p convergent to p, then \(\lim _{k\rightarrow \infty } \mathcal{{P}}_{p_k\rightarrow p}\, {\text {grad}}\, f(p_k) = {\text {grad}}\, f(p)\), where \(\mathcal{{P}}_{p_k\rightarrow p}\) is the parallel transport from \(\mathrm {T}_{p_k}\mathcal {M}\) to \(\mathrm {T}_p\mathcal {M}\) along the minimizing geodesic.
Proof of Lemma A.2
As parallel transport is an isometry, it is equivalent to show \(\lim _{k\rightarrow \infty } {\text {grad}}\, f(p_k) - \mathcal{{P}}_{p\rightarrow p_k}\, {\text {grad}}\, f(p) = 0\). Under our assumptions, \(\text {grad}f\) is a continuous vector field. Furthermore, by Lemma A.1, in a normal neighborhood of p, the vector field \(V(y) = \mathcal{{P}}_{p\rightarrow y} {\text {grad}}\, f(p)\) is a continuous vector field as well. Hence, \(\text {grad}f - V\) is a continuous vector field around p; since \(\text {grad}f(p) - V(p) = 0\), the result follows: \(\lim _{k \rightarrow \infty } \text {grad}f(p_k) - V(p_k) = \text {grad}f(p) - V(p) = 0\). \(\square \)
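A numerical illustration of the lemma (a sketch, not from the paper) on the unit sphere, with the linear function \(f(x) = \langle a, x\rangle \), whose Riemannian gradient is \({\text {grad}}\, f(x) = a - \langle a, x\rangle x\):

```python
import numpy as np

def transport(p_from, p_to, v):
    """Parallel transport along the minimizing geodesic on the unit sphere."""
    c = np.clip(p_from @ p_to, -1.0, 1.0)
    theta = np.arccos(c)
    if theta < 1e-14:
        return v.copy()
    xi = (p_to - c * p_from) / np.sin(theta)
    a = v @ xi
    return v + a * ((np.cos(theta) - 1.0) * xi - np.sin(theta) * p_from)

a_vec = np.array([0.3, -0.7, 0.2])
def grad_f(x):
    """Riemannian gradient of f(x) = <a_vec, x> on the sphere."""
    return a_vec - (a_vec @ x) * x

p = np.array([0.0, 0.6, 0.8])
d = np.array([1.0, 0.0, 0.0])            # unit tangent direction at p
errs = []
for k in range(1, 6):
    t = 10.0 ** (-k)
    pk = np.cos(t) * p + np.sin(t) * d   # geodesic points p_k -> p
    err = np.linalg.norm(transport(pk, p, grad_f(pk)) - grad_f(p))
    errs.append(err)
assert all(e2 < e1 for e1, e2 in zip(errs, errs[1:]))   # errors shrink
assert errs[-1] < 1e-4
print(errs)
```

The choices of a_vec, p, and d are arbitrary; the transported gradients converge to \({\text {grad}}\, f(p)\) at a rate governed by the Hessian of f, consistent with the lemma.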
Proof of Proposition 3.2
Restrict to a convergent subsequence if needed, so that \(\lim _{k\rightarrow \infty }x_k = {\overline{x}}\). Further exclude finitely many \(x_k\)’s so that all remaining points lie in a neighborhood of \({\overline{x}}\) on which the exponential map is a diffeomorphism. In this proof, let \(\mathcal{{A}}\) denote \(\mathcal{{A}}({\overline{x}})\) for ease of notation: this is the set of active constraints at the limit point. Then, there exist a constant \(c < 0\) and an index \(k_1\) such that \(g_i(x_k)< c\) for all \(k>k_1\) and \(i \in \mathcal{{I}} \setminus \mathcal{{A}}\).
When \(\{\rho _k\}\) is unbounded, since multipliers are bounded, there exists \(k_2>k_1\) such that \(\lambda _i^k+\rho _kg_i(x_{k+1}) < 0\) for all \(k\ge k_2\), \(i\in \mathcal{{I}}\setminus \mathcal{{A}}\). Thus, by definition, \(\lambda _i^{k+1} = 0\) for all \(k\ge k_2\), \(i\in \mathcal{{I}}\setminus \mathcal{{A}}\).
When instead \(\{\rho _k\}\) is bounded, \(\lim _{k\rightarrow \infty } |\sigma _i^k| = 0\). Thus for \(i\in \mathcal{{I}}\setminus \mathcal{{A}}\), in view of \(g_i(x_k)< c<0\) for all \(k>k_1\), we have \(\lim _{k\rightarrow \infty } \frac{-\lambda _i^k}{\rho _k}= 0\). Then, for large enough k, \(\lambda _i^k +\rho _kg_i(x_{k+1})<0\) and thus there exists \(k_2>k_1\) such that \(\lambda _i^k = 0\) for all \(k\ge k_2\). So in either case, we can find such \(k_2\).
As LICQ is satisfied at \({\overline{x}}\), by continuity of the gradients of \(\{g_i\}\) and \(\{h_j\}\), the tangent vectors \(\{{\text {grad}}\, h_j(x_k)\}_{j\in \mathcal{{E}}}\cup \{{\text {grad}}\, g_i(x_k)\}_{i\in \mathcal{{I}}\cap \mathcal{{A}}}\) are linearly independent for all \(k>k_3\), for some \(k_3>k_2\). Define the unclipped updates \({\overline{\gamma }}_j^k = \gamma _j^{k-1} + \rho _{k-1} h_j(x_k)\) and \({\overline{\lambda }}_i^k = \max \{0, \lambda _i^{k-1} + \rho _{k-1} g_i(x_k)\}\), and define \(S_k := \max \{\Vert {\overline{\gamma }}^k\Vert _\infty ,\Vert {\overline{\lambda }}^k\Vert _\infty \}\). We treat separately the cases where \(\{S_k\}\) is bounded and where it is unbounded. If it is bounded, denote a limit point of \(({\overline{\lambda }}^k, {\overline{\gamma }}^k)\) by \(({\overline{\lambda }}, {\overline{\gamma }})\). Let \(v = {\text {grad}}\, f({\overline{x}}) + \sum _{j\in \mathcal{{E}}} {\overline{\gamma }}_j {\text {grad}}\, h_j({\overline{x}}) + \sum _{i\in \mathcal{{I}}\cap \mathcal{{A}}} {\overline{\lambda }}_i {\text {grad}}\, g_i({\overline{x}})\).
In order to prove that v is zero, we compare it to a similar vector defined at \(x_k\), for all large k, and consider the limit \(k \rightarrow \infty \). Unlike the Euclidean case in the proof in [5], we cannot directly compare tangent vectors in the tangent spaces at \(x_k\) and \({\overline{x}}\): we use parallel transport to bring all tangent vectors to the tangent space at \({\overline{x}}\):
By Lemma A.2, the first term vanishes in the limit \(k \rightarrow \infty \) since \(x_k \rightarrow {\overline{x}}\). We can understand the second term using isometry of parallel transport and linearity:
Here, the second term vanishes in the limit because it is upper bounded by \(\epsilon _{k}\) (by assumption) and we let \(\lim _{k\rightarrow \infty } \epsilon _k = 0\); the last term vanishes in the limit because of the discussion in the second paragraph; and the first term attains arbitrarily small values for large k as norms of gradients are bounded in a neighbourhood of \({\overline{x}}\) and by definition of \({\overline{\lambda }}\) and \({\overline{\gamma }}\). Since v is independent of k, we conclude that \(\Vert v\Vert = 0\). Therefore, \({\overline{x}}\) satisfies KKT conditions.
On the other hand, if \(\{S_k\}\) is unbounded, then for \(k\ge k_3\), we have \(\left\Vert \frac{1}{S_k}{\text {grad}}\, f(x_{k+1}) + \sum _{j\in \mathcal{{E}}} \frac{{\overline{\gamma }}_j^k}{S_k} {\text {grad}}\, h_j(x_{k+1}) + \sum _{i\in \mathcal{{I}}\cap \mathcal{{A}}} \frac{{\overline{\lambda }}_i^k}{S_k} {\text {grad}}\, g_i(x_{k+1})\right\Vert \le \frac{\epsilon _k}{S_k}.\)
As all the coefficients on the left-hand side are bounded in \([-1,1]\) and, by definition of \(S_k\), the coefficient vector has a nonzero limit point, denote it by \(({\overline{\lambda }}, {\overline{\gamma }})\). By a similar argument as above, taking the limit in k, we obtain \(\sum _{j\in \mathcal{{E}}} {\overline{\gamma }}_j {\text {grad}}\, h_j({\overline{x}}) + \sum _{i\in \mathcal{{I}}\cap \mathcal{{A}}} {\overline{\lambda }}_i {\text {grad}}\, g_i({\overline{x}}) = 0\), which contradicts the LICQ condition at \({\overline{x}}\) since \(({\overline{\lambda }}, {\overline{\gamma }}) \ne 0\). Hence, \(\{S_k\}\) cannot be unbounded, and we are left with the case where it is bounded, for which we already showed that \({\overline{x}}\) satisfies the KKT conditions. \(\square \)
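The multiplier argument in the first part of the proof can be sketched numerically. The snippet below assumes the standard safeguarded update \(\lambda _i^{k+1} = \mathrm {clip}(\lambda _i^k + \rho _k g_i(x_{k+1}),\, 0,\, \lambda _{\max })\) (our notation for a generic safeguarded augmented Lagrangian, not the paper's exact formulas) and checks that multipliers of inactive constraints vanish in both cases:

```python
lam_max = 100.0          # safeguarding bound on the multiplier estimates
c = -0.5                 # inactive constraint: g_i(x_k) < c < 0 eventually

def update(lam, rho, g):
    """Safeguarded update: clip the unclipped estimate to [0, lam_max]."""
    return min(max(lam + rho * g, 0.0), lam_max)

# Case 1: rho_k unbounded. Since lam_k is bounded and g_i(x_k) < c < 0,
# lam_k + rho_k * g_i(x_k) < 0 for large k, so the clipped value is 0.
lam = 10.0
for k in range(50):
    rho = 2.0 ** k                     # rho_k -> infinity
    lam = update(lam, rho, c)
assert lam == 0.0

# Case 2: rho_k bounded (= rho). Then lam_k / rho -> 0 is forced, and again
# lam_k + rho * g_i(x_k) < 0 eventually, so lam_k = 0 for large k.
lam, rho = 10.0, 1.0
for k in range(50):
    lam = update(lam, rho, c)
assert lam == 0.0
print("inactive multipliers vanish")
```

The constants (lam_max, c, the initial multiplier) are illustrative; what matters is that a strictly negative constraint value eventually drives the clipped multiplier to zero.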
B Proof of Proposition 3.4
Proof
The proof is adapted from Section 3 in [7]. Define \({\overline{\gamma }}_j^k = \gamma _j^{k-1}+\rho _{k-1} h_j(x_k)\). By Proposition 3.2, \({\overline{x}}\) is a KKT point and by taking a subsequence of \(\{x_k\}\) if needed, \({\overline{\gamma }}^k\) is bounded and converges to \({\overline{\gamma }}\).
For any tangent vector \(d\in \mathcal{{C}}^W({\overline{x}})\), we have \(\langle d, {\text {grad}}\, h_j({\overline{x}})\rangle = 0\) for all \(j\in \mathcal{{E}}\). Let \(m = |\mathcal{{E}}|\) and let \(n \ge m\) be the dimension of \(\mathcal{{M}}\). Let \(\varphi \) be a chart such that \(\varphi ({\overline{x}}) = 0\). From [42, Prop. 8.1], the component functions of \(h_j\) with respect to this chart are smooth. Let \(\partial _1,\dots ,\partial _{n}\) be the basis vectors of the given local chart, and write \(d = d_1\partial _1+\dots + d_n\partial _n\). Define \(\mathcal{{F}}:{\mathbb {R}}^{n+m} \rightarrow {\mathbb {R}}^m\), for \(x\in {\mathbb {R}}^n\), \(y\in {\mathbb {R}}^m\), and \(j\in \{1,\dots , m\}\), as
If we denote by \(h_{j}^l\) the lth coordinate of \({\text {grad}}\, h_j\) in this system, and by \(G_x\) the Gram matrix of the metric, with \((G_{x})_{p,q} = \langle \partial _p,\partial _q\rangle _x\), then the above expression can be written as
and by abuse of notation where \([1\dots m]\) means extracting the first m columns, we have
Notice that \([h^1(\varphi ({\overline{x}})), \dots , h^n(\varphi ({\overline{x}}))]\) has full row rank (by LICQ), so it has rank m. As \(G_{{\overline{x}}}\) is invertible, \(\frac{\partial \mathcal{{F}}}{\partial y}({\overline{x}})\) must be invertible (reindexing the columns of this \(m\times n\) matrix if needed so that the first m columns form a full rank matrix). Then, by the implicit function theorem, in a small neighbourhood U of \(\varphi ({\overline{x}})\), we have a continuously differentiable function \(g:U\rightarrow {\mathbb {R}}^m\) with \(g(\varphi ({\overline{x}})) = [d_1, \dots , d_m]\) and
For each x locally around \({\overline{x}}\), let
These vectors form a continuously differentiable vector field such that \(\langle d_x, {\text {grad}}\, h_j(x)\rangle = 0\) for all \(j\in \mathcal{{E}}\), with \(d = d_{{\overline{x}}}\). Then we have that
where the second equality is by definition of connection; the third is by orthogonality of d with \(\{{\text {grad}}\, h_j\}\); the fourth is from the definition of Hessian and \({\overline{\gamma }}\). Therefore we have
Since the connection maps two continuously differentiable vector fields to a continuous vector field, we can take a limit and state:
which is just \(\mathrm {Hess} \mathcal{{L}}({\overline{x}}, {\overline{\gamma }})(d,d) \ge 0\). \(\square \)
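For intuition only, here is a flat (Euclidean) toy instance of the conclusion \(\mathrm {Hess}\, \mathcal{{L}}({\overline{x}}, {\overline{\gamma }})(d,d) \ge 0\) on directions orthogonal to the constraint gradients; the specific problem and multiplier below are illustrative choices, not from the paper:

```python
import numpy as np

# min f(x) = x1^2 + x2^2  subject to  h(x) = x1 + x2 - 1 = 0.
# KKT: grad f + gamma * grad h = 0 at x* = (0.5, 0.5) with gamma = -1.
x_star = np.array([0.5, 0.5])
gamma = -1.0
grad_f = 2.0 * x_star
grad_h = np.array([1.0, 1.0])
assert np.allclose(grad_f + gamma * grad_h, 0.0)       # KKT stationarity

# Hessian of the Lagrangian: Hess f + gamma * Hess h (h is linear here)
hess_L = 2.0 * np.eye(2) + gamma * np.zeros((2, 2))

# Critical-cone direction: <d, grad h> = 0
d = np.array([1.0, -1.0])
assert abs(d @ grad_h) < 1e-12
assert d @ hess_L @ d >= 0.0           # second-order necessary condition
print(d @ hess_L @ d)                  # prints 4.0
```

In the Riemannian setting of the proof, the Euclidean Hessian is replaced by the covariant derivative of the vector field \(d_x\), but the quadratic form on the critical cone plays the same role.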
C Proof of Proposition 4.1
In the proof below, we use the following notation:
Proof
Consider the function Q, defined by \(Q(x, \rho ) = f(x) + \rho \left( \sum _{i\in \mathcal{{I}}} \max \{0, g_i(x)\} + \sum _{j\in \mathcal{{E}}} |h_j(x)|\right) \). In a small enough neighbourhood of \(x^*\), the terms for inactive constraints disappear, and Q reduces to \(Q(x, \rho ) = f(x) + \rho \left( \sum _{i\in \mathcal{{A}}(x^*)\cap \mathcal{{I}}} \max \{0, g_i(x)\} + \sum _{j\in \mathcal{{E}}} |h_j(x)|\right) \).
Although Q is nonsmooth, it is easy to verify that it has directional derivative in all directions:
and since \(g_i\circ \mathrm {Exp}_{x^*}\) is sufficiently smooth, by discussing separately the sign of \(\frac{d}{d\tau }(g_i\circ \mathrm {Exp}_{x^*})(\tau d)\), the right-hand side equals \(\max \{0, \frac{d}{d\tau }(g_i\circ \mathrm {Exp}_{x^*})(\tau d)\} = \max \{0,\langle {\text {grad}}\, g_i(x^*), d\rangle \}\). Similarly, we have
Hence, the directional derivative along direction d, \(Q(x^*,\rho ; d)\), is well defined:
As \(x^*\) is a KKT point,
Thus,
Combining with equation (32), we have
For contradiction, suppose \(x^*\) is not a local minimum of Q. Then, there exists \(\{y_k\}_{k=1}^\infty \), \(\lim _{k\rightarrow \infty } y_k = x^*\) such that \(Q(y_k, \rho ) < Q(x^*, \rho ) = f(x^*)\). By restricting to a small enough neighbourhood, there exists \(\eta _k = \mathrm {Exp}^{-1}_{x^*}(y_k)\). Considering only a subsequence if needed, we have \(\lim _{k\rightarrow \infty } \frac{\eta _k}{\Vert \eta _k\Vert } = {\bar{\eta }}\). It is easy to see that \(Q(\mathrm {Exp}_{x^*}(\cdot ),\rho )\) is locally Lipschitz continuous at \(0_{x^*}\), which gives
Subtract \(Q(x^*,\rho )\) and take the limit:
Notice the left-most expression is just \(Q(x^*, \rho ; {\bar{\eta }})\). Since coefficients on the right-hand side of (33) are strictly positive, we must have \(\langle {\text {grad}}\,g_i(x^*), {\bar{\eta }}\rangle \le 0\) and \(\langle {\text {grad}}\,h_j(x^*), {\bar{\eta }}\rangle = 0\). Since the exponential mapping is of second order, we have a Taylor expansion for f,
and similarly for \(g_i\) and \(h_j\). Notice that
where \(P(y_k) = \sum _{i\in \mathcal{{A}}(x^*)\cap \mathcal{{I}}}(\rho - \lambda _i^*)\max \{0, g_i(y_k)\} + \sum _{j\in \mathcal{{E}}}(\rho - \gamma _j^*)|h_j(y_k)|\). The first inequality follows from the quadratic approximation of \(f + \sum _{i\in \mathcal{{A}}(x^*)\cap \mathcal{{I}}} \lambda _{i} g_i+\sum _{j\in \mathcal{{E}}} \gamma _j h_j\) and bilinearity of the metric. The last equality comes from the definition of KKT points. Dividing through by \(\Vert \eta _k\Vert ^2\), we obtain
If \({\bar{\eta }}\in F'\), then as \(P(y_k)\ge 0\), the first term on the right-hand side is strictly positive, contradicting \(Q(y_k,\rho ) < f(x^*)\) for all k. If \({\bar{\eta }}\in F\setminus F'\), then there exists \(g_{i'}\) such that \(\langle {\text {grad}}\,g_{i'}(x^*), {\bar{\eta }}\rangle > 0\). Then,
Hence, dividing the above expression by \(\Vert \eta _k\Vert \) gives
Notice that \(\frac{P(y_k)}{\Vert \eta _k\Vert ^2} \ge \frac{g_{i'}(y_k)}{\Vert \eta _k\Vert }\) for large enough k, and a contradiction is obtained by plugging this into (34). \(\square \)
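A one-dimensional Euclidean toy problem (an illustration, not the paper's Riemannian setting) showing the mechanism of the proposition: for \(\rho \) larger than the KKT multiplier, the KKT point is a local minimizer of the exact penalty Q, while for smaller \(\rho \) it need not be:

```python
# min f(x) = x^2  subject to  g(x) = 1 - x <= 0  (i.e. x >= 1).
# KKT point x* = 1 with multiplier lambda* = 2.
# Exact penalty: Q(x, rho) = f(x) + rho * max(0, g(x)).
def Q(x, rho):
    return x ** 2 + rho * max(0.0, 1.0 - x)

x_star, lam_star = 1.0, 2.0
for rho in [3.0, 10.0]:                  # any rho > lambda* should work
    for eps in [1e-1, 1e-2, 1e-3]:
        assert Q(x_star, rho) <= Q(x_star - eps, rho)   # infeasible side
        assert Q(x_star, rho) <= Q(x_star + eps, rho)   # feasible side

# with rho < lambda*, the penalty is not exact: Q decreases into infeasibility
assert Q(0.9, 1.0) < Q(1.0, 1.0)
print("x* is a local min of Q for rho > lambda*")
```

On the infeasible side, \(Q(1-\epsilon , \rho ) = 1 + (\rho - 2)\epsilon + \epsilon ^2\), so the threshold \(\rho > \lambda ^* = 2\) is exactly the condition in the proposition.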
D Proof of Proposition 4.2
Proof
We give a proof for \(Q^{\mathrm {lse}}\)—it is analogous for \(Q^{\mathrm {lqh}}\). For each iteration k and for each \(i\in \mathcal{{I}}\) and \(j\in \mathcal{{E}}\), define the following coefficients:
Then, a simple calculation shows that (under our assumptions, \(\rho _k = \rho _0\) for all k; we simply write \(\rho \)):
Notice that the multipliers are bounded: \(\gamma ^k_j \in [-1,1]\) and \(\lambda _i^k\in [0,1]\). Hence, as sequences indexed by k, they have a limit point: we denote them by \({\overline{\gamma }}\in [-1,1]\) and \({\overline{\lambda }}\in [0,1]\). Furthermore, since \({\overline{x}}\) is feasible, there exists \(k_1\) such that for any \(k>k_1\), \(i\in \mathcal{{I}}\setminus \mathcal{{A}}({\overline{x}})\), \(g_i(x_k) < c\) for some constant \(c<0\). Then, as \(u_k\rightarrow 0\), by definition, \(\lambda ^k_i\) goes to 0 for \(i\in \mathcal{{I}}\setminus \mathcal{{A}}({\overline{x}})\). This shows \({\overline{\lambda }}_i = 0\) for \(i\in \mathcal{{I}}\setminus \mathcal{{A}}({\overline{x}})\). Considering a convergent subsequence if needed, there exists \(k_2>k_1\) such that, for all \(k>k_2\), \(\text {dist}(x_k,{\overline{x}}) < \mathrm {inj}({\overline{x}})\) (the injectivity radius). Thus, parallel transport from each \(x_k\) to \({\overline{x}}\) is well defined. Consider
Notice that its coefficients are bounded, so \(\Vert v\Vert = 0\) follows as in the proof of Proposition 3.2. \(\square \)
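For intuition, the following sketch illustrates one common log-sum-exp smoothing of \(\max \{0, a\}\) (the exact form of \(Q^{\mathrm {lse}}\) in the paper may differ in constants): the induced multiplier estimates lie in [0, 1], the uniform smoothing error is \(u\log 2\), and multipliers of strictly inactive constraints vanish as \(u \rightarrow 0\):

```python
import numpy as np

def smooth_max0(a, u):
    """Log-sum-exp smoothing of max(0, a): u * log(1 + exp(a/u))."""
    return u * np.logaddexp(0.0, a / u)

def lam(a, u):
    """Derivative w.r.t. a: the induced multiplier, always in [0, 1].
    Written via tanh for numerical stability at large |a|/u."""
    return 0.5 * (1.0 + np.tanh(a / (2.0 * u)))

a_grid = np.linspace(-2.0, 2.0, 401)
for u in [1e-1, 1e-2, 1e-3]:
    # uniform approximation error is exactly u * log(2), attained at a = 0
    err = np.max(np.abs(smooth_max0(a_grid, u) - np.maximum(0.0, a_grid)))
    assert err <= u * np.log(2.0) + 1e-12
    # multiplier estimates stay in [0, 1]
    assert np.all((lam(a_grid, u) >= 0.0) & (lam(a_grid, u) <= 1.0))
assert lam(-0.5, 1e-3) < 1e-10       # g_i(x) < c < 0  =>  lambda_i -> 0
print("max smoothing error at u=1e-3:", err)
```

This mirrors the structure of the proof: the smoothed multipliers are bounded (here in [0, 1]), and those attached to strictly inactive constraints vanish as the smoothing parameter \(u_k \rightarrow 0\).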
Liu, C., Boumal, N. Simple Algorithms for Optimization on Riemannian Manifolds with Constraints. Appl Math Optim 82, 949–981 (2020). https://doi.org/10.1007/s00245-019-09564-3
Keywords
- Riemannian optimization
- Constrained optimization
- Differential geometry
- Augmented Lagrangian method
- Exact penalty method
- Nonsmooth optimization