
Simple Algorithms for Optimization on Riemannian Manifolds with Constraints


Abstract

We consider optimization problems on manifolds with equality and inequality constraints. A large body of work treats constrained optimization in Euclidean spaces. In this work, we consider extensions of existing algorithms from the Euclidean case to the Riemannian case. Thus, the variable lives on a known smooth manifold and is further constrained. In doing so, we exploit the growing literature on unconstrained Riemannian optimization. For the special case where the manifold is itself described by equality constraints, one could in principle treat the whole problem as a constrained problem in a Euclidean space. The main hypothesis we test here is whether it is sometimes better to exploit the geometry of the constraints, even if only for a subset of them. Specifically, this paper extends an augmented Lagrangian method and smoothed versions of an exact penalty method to the Riemannian case, together with some fundamental convergence results. Numerical experiments indicate some gains in computational efficiency and accuracy in some regimes for minimum balanced cut, non-negative PCA and k-means, especially in high dimensions.
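In the notation used throughout the appendices below, the problem class described in the abstract can be written as

$$\begin{aligned} \min _{x\in \mathcal{{M}}} f(x) \quad \text {subject to} \quad g_i(x) \le 0 \ \text { for } i\in \mathcal{{I}}, \qquad h_j(x) = 0 \ \text { for } j\in \mathcal{{E}}, \end{aligned}$$

where \(\mathcal{{M}}\) is a known smooth Riemannian manifold. This restatement is only meant to fix the symbols \(f\), \(g_i\), \(h_j\), \(\mathcal{{I}}\), \(\mathcal{{E}}\) and \(\mathcal{{M}}\) that the appendices use; the precise assumptions are stated in the main text.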


Notes

  1. Note that this condition involves \(\mathcal {L}\) as defined in Sect. 2.2, not \(\mathcal{{L}}_\rho \).

  2. https://github.com/losangle/Optimization-on-manifolds-with-extra-constraints.

  3. When the step size is of order \(10^{-10}\), we believe that the current point is close to convergence. We also conducted experiments with minimum step size \(10^{-7}\) for minimum balanced cut and non-negative PCA, and the performance profiles are visually similar to those displayed here.

  4. The proof follows an argument laid out by John M. Lee: https://math.stackexchange.com/questions/2307289/parallel-transport-along-radial-geodesics-yields-a-smooth-vector-field.

References

  1. Absil, P.-A., Hosseini, S.: A collection of nonsmooth Riemannian optimization problems. Technical Report UCL-INMA-2017.08, Université catholique de Louvain (2017)

  2. Absil, P.-A., Mahony, R., Sepulchre, R.: Optimization Algorithms on Matrix Manifolds. Princeton University Press, Princeton (2008)

  3. Agarwal, N., Boumal, N., Bullins, B., Cartis, C.: Adaptive regularization with cubics on manifolds (2018). arXiv preprint arXiv:1806.00065

  4. Albert, R., Barabási, A.-L.: Statistical mechanics of complex networks. Rev. Mod. Phys. 74, 47–97 (2002)

  5. Andreani, R., Birgin, E.G., Martínez, J.M., Schuverdt, M.L.: On augmented Lagrangian methods with general lower-level constraints. SIAM J. Optim. 18(4), 1286–1309 (2007)

  6. Andreani, R., Haeser, G., Martínez, J.M.: On sequential optimality conditions for smooth constrained optimization. Optimization 60(5), 627–641 (2011)

  7. Andreani, R., Haeser, G., Ramos, A., Silva, P.J.: A second-order sequential optimality condition associated to the convergence of optimization algorithms. IMA J. Numer. Anal. 37, 1902–1929 (2017)

  8. Bento, G., Ferreira, O., Melo, J.: Iteration-complexity of gradient, subgradient and proximal point methods on Riemannian manifolds. J. Optim. Theory Appl. 173(2), 548–562 (2017)

  9. Bento, G.C., Ferreira, O.P., Melo, J.G.: Iteration-complexity of gradient, subgradient and proximal point methods on Riemannian manifolds. J. Optim. Theory Appl. 173(2), 548–562 (2017)

  10. Bergmann, R., Herzog, R.: Intrinsic formulation of KKT conditions and constraint qualifications on smooth manifolds (2018). arXiv preprint arXiv:1804.06214

  11. Bergmann, R., Persch, J., Steidl, G.: A parallel Douglas-Rachford algorithm for minimizing ROF-like functionals on images with values in symmetric Hadamard manifolds. SIAM J. Imaging Sci. 9(3), 901–937 (2016)

  12. Bertsekas, D.P.: Constrained Optimization and Lagrange Multiplier Methods. Athena Scientific, Belmont (1982)

  13. Bertsekas, D.P.: Nonlinear Programming. Athena Scientific, Belmont (1999)

  14. Birgin, E., Haeser, G., Ramos, A.: Augmented Lagrangians with constrained subproblems and convergence to second-order stationary points. Optimization Online (2016)

  15. Birgin, E.G., Floudas, C.A., Martínez, J.M.: Global minimization using an augmented Lagrangian method with variable lower-level constraints. Math. Program. 125(1), 139–162 (2010)

  16. Birgin, E.G., Martínez, J.M.: Practical Augmented Lagrangian Methods for Constrained Optimization. SIAM (2014)

  17. Boumal, N., Absil, P.-A., Cartis, C.: Global rates of convergence for nonconvex optimization on manifolds. IMA J. Numer. Anal. (2018)

  18. Boumal, N., Mishra, B., Absil, P.-A., Sepulchre, R.: Manopt, a Matlab toolbox for optimization on manifolds. J. Mach. Learn. Res. 15(1), 1455–1459 (2014)

  19. Burer, S., Monteiro, R.D.: A nonlinear programming algorithm for solving semidefinite programs via low-rank factorization. Math. Program. 95(2), 329–357 (2003)

  20. Byrd, R.H., Nocedal, J., Waltz, R.A.: Knitro: An integrated package for nonlinear optimization. In: Large-Scale Nonlinear Optimization, pp. 35–59. Springer (2006)

  21. Cambier, L., Absil, P.-A.: Robust low-rank matrix completion by Riemannian optimization. SIAM J. Sci. Comput. 38(5), S440–S460 (2016)

  22. do Carmo, M.P.: Riemannian Geometry. Birkhäuser, Boston (1992)

  23. Carson, T., Mixon, D.G., Villar, S.: Manifold optimization for k-means clustering. In: Sampling Theory and Applications (SampTA), 2017 International Conference on, pp. 73–77. IEEE (2017)

  24. Chatterjee, A., Govindu, V.M.: Efficient and robust large-scale rotation averaging. In: IEEE International Conference on Computer Vision (ICCV) (2013)

  25. Chen, C., Mangasarian, O.L.: Smoothing methods for convex inequalities and linear complementarity problems. Math. Program. 71(1), 51–69 (1995)

  26. Clarke, F.H.: Optimization and Nonsmooth Analysis. SIAM (1990)

  27. Conn, A.R., Gould, N.I.M., Toint, P.L.: LANCELOT: A Fortran Package for Large-Scale Nonlinear Optimization (Release A), vol. 17. Springer, New York (2013)

  28. Dolan, E.D., Moré, J.J.: Benchmarking optimization software with performance profiles. Math. Program. 91(2), 201–213 (2002)

  29. Dreisigmeyer, D.W.: Equality constraints, Riemannian manifolds and direct search methods. Optimization Online (2007)

  30. Gould, N.I., Toint, P.L.: A note on the convergence of barrier algorithms to second-order necessary points. Math. Program. 85(2), 433–438 (1999)

  31. Grohs, P., Hosseini, S.: \(\varepsilon \)-subgradient algorithms for locally Lipschitz functions on Riemannian manifolds. Adv. Comput. Math. 42(2), 333–360 (2016)

  32. Guo, L., Lin, G.-H., Jane, J.Y.: Second-order optimality conditions for mathematical programs with equilibrium constraints. J. Optim. Theory. Appl. 158(1), 33–64 (2013)

  33. Hosseini, S., Huang, W., Yousefpour, R.: Line search algorithms for locally Lipschitz functions on Riemannian manifolds. SIAM J. Optim. 28(1), 596–619 (2018)

  34. Hosseini, S., Pouryayevali, M.: Generalized gradients and characterization of epi-Lipschitz sets in Riemannian manifolds. Nonlinear Anal. 74(12), 3884–3895 (2011)

  35. Huang, W., Absil, P.-A., Gallivan, K., Hand, P.: ROPTLIB: an object-oriented C++ library for optimization on Riemannian manifolds. Technical Report FSU16-14.v2, Florida State University (2016)

  36. Huang, W., Gallivan, K.A., Absil, P.-A.: A Broyden class of quasi-Newton methods for Riemannian optimization. SIAM J. Optim. 25(3), 1660–1685 (2015)

  37. Johnstone, I.M., Lu, A.Y.: On consistency and sparsity for principal components analysis in high dimensions. J. Am. Stat. Assoc. 104(486), 682–693 (2009)

  38. Kanzow, C., Steck, D.: An example comparing the standard and safeguarded augmented Lagrangian methods. Oper. Res. Lett. 45(6), 598–603 (2017)

  39. Khuzani, M.B., Li, N.: Stochastic primal-dual method on Riemannian manifolds with bounded sectional curvature (2017). arXiv preprint arXiv:1703.08167

  40. Kovnatsky, A., Glashoff, K., Bronstein, M.M.: Madmm: a generic algorithm for non-smooth optimization on manifolds. In: European Conference on Computer Vision, pp. 680–696. Springer (2016)

  41. Lang, K.: Fixing two weaknesses of the spectral method. In: Advances in Neural Information Processing Systems, pp. 715–722 (2006)

  42. Lee, J.: Introduction to Smooth Manifolds. Graduate Texts in Mathematics, vol. 218, 2nd edn. Springer, New York (2012)

  43. Lee, J.M.: Smooth manifolds. In: Introduction to Smooth Manifolds, pp. 1–29. Springer (2003)

  44. Lewis, A.S., Overton, M.L.: Nonsmooth optimization via BFGS. Submitted to SIAM J. Optim. (2009)

  45. Lichman, M.: UCI machine learning repository (2013)

  46. Montanari, A., Richard, E.: Non-negative principal component analysis: message passing algorithms and sharp asymptotics. IEEE Trans. Inf. Theory 62(3), 1458–1484 (2016)

  47. Nocedal, J., Wright, S.J.: Numerical Optimization, 2nd edn. Springer, New York (2006)

  48. Parikh, N., Boyd, S.: Proximal Algorithms, vol. 1. Now Publishers Inc., Hanover (2014)

  49. Pinar, M.Ç., Zenios, S.A.: On smoothing exact penalty functions for convex constrained optimization. SIAM J. Optim. 4(3), 486–511 (1994)

  50. Ruszczyński, A.P.: Nonlinear Optimization, vol. 13. Princeton University Press, Princeton (2006)

  51. Townsend, J., Koep, N., Weichwald, S.: Pymanopt: a Python toolbox for optimization on manifolds using automatic differentiation. J. Mach. Learn. Res. 17, 1–5 (2016)

  52. Weber, M., Sra, S.: Frank–Wolfe methods for geodesically convex optimization with application to the matrix geometric mean (2017). arXiv preprint arXiv:1710.10770

  53. Yang, W.H., Zhang, L.-H., Song, R.: Optimality conditions for the nonlinear programming problems on Riemannian manifolds. Pac. J. Optim. 10(2), 415–434 (2014)

  54. Zass, R., Shashua, A.: Nonnegative sparse PCA. In: Advances in Neural Information Processing Systems, pp. 1561–1568 (2007)

  55. Zhang, J., Ma, S., Zhang, S.: Primal-dual optimization algorithms over Riemannian manifolds: an iteration complexity analysis (2017). arXiv preprint arXiv:1710.02236

  56. Zhang, J., Zhang, S.: A cubic regularized Newton’s method over Riemannian manifolds (2018). arXiv preprint arXiv:1805.05565

Acknowledgements

We thank an anonymous reviewer for detailed and helpful comments on the first version of this paper. NB is partially supported by NSF grant DMS-1719558.

Corresponding author

Correspondence to Nicolas Boumal.


Appendices

A Proof of Proposition 3.2

We first introduce two supporting lemmas. The first lemma is a well-known fact for which we provide a proof for completeness (see Footnote 4).

Lemma A.1

Let p be a point on a Riemannian manifold \(\mathcal{{M}}\), and let v be a tangent vector at p. Let \(\mathcal{{U}}\) be a normal neighborhood of p, that is, the exponential map maps a neighbourhood of the origin of \(\mathrm {T}_p\mathcal{{M}}\) diffeomorphically to \(\mathcal{{U}}\). Define the following vector field on \(\mathcal{{U}}\):

$$\begin{aligned} \forall q \in \mathcal{{U}}, \qquad V(q)&= \mathcal{{P}}_{p \rightarrow q} v, \end{aligned}$$

where parallel transport is done along the (unique) minimizing geodesic from p to q. Then, V is a smooth vector field on \(\mathcal{{U}}\).

Proof

Parallel transport from p is along geodesics passing through p. To facilitate their study, set up normal coordinates \(\phi :U \subset {\mathbb {R}}^d \rightarrow \mathcal{{U}}\) around p (in particular, \(\phi (0) = p\)), where d is the dimension of the manifold. For a point \(\phi (x_1,\dots ,x_d)\), by definition of normal coordinates, the radial geodesic from p is \(c(t) = \phi (tx_1,\dots , tx_d)\). Our vector field of interest is defined by \(V(p) = v\) and the fact that it is parallel along every radial geodesic c as described.

For a choice of point \(\phi (x)\) and corresponding geodesic, let

$$\begin{aligned} V(c(t)) = \sum _{k=1}^d v_k(t) \partial _k(c(t)) \end{aligned}$$

for some coordinate functions \(v_1, \ldots , v_d\), where \(\partial _k\) is the kth coordinate vector field. These coordinate functions satisfy the following ordinary differential equations (ODE) [22, Prop. 2.6, eq. (2)]:

$$\begin{aligned} 0&= \frac{dv_k(t)}{dt} + \sum _{i,j}\Gamma ^k_{ij}(tx_1,\dots , tx_d) v_j(t) x_i,&k = 1,\dots , d, \end{aligned}$$

where \(\Gamma \) denotes Christoffel symbols. Expand V(p) into the coordinate vector fields: \(v = \sum _{k=1}^d w_k \partial _k(p)\). Then, the initial conditions are \(v_k(0) = w_k\) for each k. Because these ODEs are smooth, solutions \(v_k(t; w)\) exist, and they are smooth in both t and the initial conditions w [42, Thm. D.6]. But this is not enough for our purpose.

Crucially, we wish to show smoothness also in the choice of \(x \in U\). To this end, following a classical trick, we extend the set of equations to let x be part of the variables, as follows:

$$\begin{aligned} {\left\{ \begin{array}{ll} 0 = \frac{dv_k(t)}{dt} + \sum _{i,j}\Gamma ^k_{ij}(tu_1(t),\ldots , tu_d(t)) v_j(t) u_i(t), &{} k=1,\ldots , d,\\ 0 = \frac{du_k(t)}{dt}, &{} k=1, \ldots , d. \end{array}\right. } \end{aligned}$$

The extended initial conditions are:

$$\begin{aligned} v_k(0)&= w_k,&u_k(0)&= x_k,&k&= 1, \ldots , d. \end{aligned}$$

Clearly, the functions \(u_k(t)\) are constant: \(u_k(t) = x_k\). These ODEs are still smooth, hence solutions \(v_k(t; w, x)\) still exist and are identical to those of the previous set of ODEs, except we now see they are also smooth in the choice of x. Specifically, for every \(x \in U\),

$$\begin{aligned} V(\phi (x))&= V(c(1)) = \sum _{k=1}^{d} v_k(1; w, x) \partial _k(\phi (x)), \end{aligned}$$

and each \(v_k(1; w, x)\) depends smoothly on x. Hence, V is smooth on \(\mathcal{{U}} = \phi (U)\).

\(\square \)

Lemma A.2

Given a Riemannian manifold \(\mathcal{{M}}\), a continuously differentiable function \(f :\mathcal {M} \rightarrow {\mathbb {R}}\), and a point \(p\in \mathcal{{M}}\), if \(p_0, p_1, p_2, \ldots \) is a sequence of points in a normal neighborhood of p converging to p, then the following holds:

$$\begin{aligned} \lim _{k\rightarrow \infty } \left\| \mathcal{{P}}_{p_k\rightarrow p}{\text {grad}}\, f(p_k) - {\text {grad}}\,f(p)\right\| _p = 0, \end{aligned}$$

where \(\mathcal{{P}}_{p_k\rightarrow p}\) is the parallel transport from \(\mathrm {T}_{p_k}\mathcal {M}\) to \(\mathrm {T}_p\mathcal {M}\) along the minimizing geodesic.

Proof of Lemma A.2

As parallel transport is an isometry, it is equivalent to show

$$\begin{aligned} \lim _{k\rightarrow \infty } \left\| {\text {grad}}\, f(p_k) - \mathcal{{P}}_{p\rightarrow p_k} {\text {grad}}\, f(p)\right\| _{p_k} = 0. \end{aligned}$$
(30)

Under our assumptions, \({\text {grad}}\, f\) is a continuous vector field. Furthermore, by Lemma A.1, in a normal neighborhood of p, the vector field \(V(y) = \mathcal{{P}}_{p\rightarrow y} {\text {grad}}\, f(p)\) is continuous as well. Hence, \({\text {grad}}\, f - V\) is a continuous vector field around p; since \({\text {grad}}\, f(p) - V(p) = 0\), we get \(\lim _{k \rightarrow \infty } {\text {grad}}\, f(p_k) - V(p_k) = {\text {grad}}\, f(p) - V(p) = 0\), which proves the claim. \(\square \)
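Both lemmas are easy to visualize numerically on the unit sphere, where parallel transport along minimizing geodesics has a closed form. The following sketch is purely illustrative: the choice of manifold, the closed-form transport, and the test function \(f(x) = \langle c, x\rangle \) are ours, not constructions from the paper.

```python
# Numerical illustration of Lemmas A.1 and A.2 on the unit sphere S^{n-1}.
# The sphere, the test function f(x) = <c, x>, and all tolerances are
# illustrative choices, not objects taken from the paper.
import numpy as np

def transport(p, q, v):
    """Parallel transport of v in T_p S^{n-1} to T_q S^{n-1} along the
    minimizing geodesic from p to q (assumes p and q are not antipodal)."""
    w = q - np.dot(p, q) * p                    # direction of the geodesic at p
    nw = np.linalg.norm(w)
    if nw < 1e-16:                              # q == p: nothing to transport
        return v.copy()
    u = w / nw                                  # unit tangent at p pointing toward q
    theta = np.arccos(np.clip(np.dot(p, q), -1.0, 1.0))
    a = np.dot(u, v)
    # the component along u rotates in the plane span{p, u}; the rest is unchanged
    return v + a * ((np.cos(theta) - 1.0) * u - np.sin(theta) * p)

rng = np.random.default_rng(0)
n = 5
p = rng.standard_normal(n); p /= np.linalg.norm(p)
v = rng.standard_normal(n); v -= np.dot(p, v) * p        # v in T_p S^{n-1}

# Lemma A.1: V(q) = P_{p->q} v varies smoothly with q; transport is an isometry.
q1 = rng.standard_normal(n); q1 /= np.linalg.norm(q1)
q2 = q1 + 1e-6 * rng.standard_normal(n); q2 /= np.linalg.norm(q2)
print(abs(np.linalg.norm(transport(p, q1, v)) - np.linalg.norm(v)))   # ~ 0
print(np.linalg.norm(transport(p, q1, v) - transport(p, q2, v)))      # ~ 1e-6

# Lemma A.2: for f(x) = <c, x>, with Riemannian gradient c - <c, x> x,
# P_{p_k -> p} grad f(p_k) -> grad f(p) as p_k -> p.
c = rng.standard_normal(n)
grad_f = lambda x: c - np.dot(c, x) * x
for t in [1e-1, 1e-2, 1e-3, 1e-4]:
    pk = p + t * v; pk /= np.linalg.norm(pk)
    print(t, np.linalg.norm(transport(pk, p, grad_f(pk)) - grad_f(p)))  # -> 0
```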

Proof of Proposition 3.2

Restrict to a convergent subsequence if needed, so that \(\lim _{k\rightarrow \infty }x_k = {\overline{x}}\). Further exclude a (finite) number of \(x_k\)’s so that all the remaining points are in a neighborhood of \({\overline{x}}\) where the exponential map is a diffeomorphism. In this proof, let \(\mathcal{{A}}\) denote \(\mathcal{{A}}({\overline{x}})\) for ease of notation: this is the set of active constraints at the limit point. Then, there exist constants \(c, k_1\) such that \(g_i(x_k)< c < 0\) for all \(k>k_1\) with \(i \in \mathcal{{I}} \setminus \mathcal{{A}}\).

When \(\{\rho _k\}\) is unbounded, since multipliers are bounded, there exists \(k_2>k_1\) such that \(\lambda _i^k+\rho _kg_i(x_{k+1}) < 0\) for all \(k\ge k_2\), \(i\in \mathcal{{I}}\setminus \mathcal{{A}}\). Thus, by definition, \(\lambda _i^{k+1} = 0\) for all \(k\ge k_2\), \(i\in \mathcal{{I}}\setminus \mathcal{{A}}\).

When instead \(\{\rho _k\}\) is bounded, \(\lim _{k\rightarrow \infty } |\sigma _i^k| = 0\). Thus for \(i\in \mathcal{{I}}\setminus \mathcal{{A}}\), in view of \(g_i(x_k)< c<0\) for all \(k>k_1\), we have \(\lim _{k\rightarrow \infty } \frac{-\lambda _i^k}{\rho _k}= 0\). Then, for large enough k, \(\lambda _i^k +\rho _kg_i(x_{k+1})<0\) and thus there exists \(k_2>k_1\) such that \(\lambda _i^k = 0\) for all \(k\ge k_2\). So in either case, we can find such \(k_2\).

As LICQ is satisfied at \({\overline{x}}\), by continuity of the gradients of \(\{g_i\}\) and \(\{h_j\}\), there exists \(k_3>k_2\) such that the tangent vectors \(\{{\text {grad}}\, h_j(x_k)\}_{j\in \mathcal{{E}}}\cup \{{\text {grad}}\, g_i(x_k)\}_{i\in \mathcal{{I}}\cap \mathcal{{A}}}\) are linearly independent for all \(k>k_3\). Define

$$\begin{aligned} {\overline{\lambda }}_i^k&= \max \left\{ 0, \lambda _i^{k-1} + \rho _{k-1} g_i(x_{k})\right\} ,&\text { and }&{\overline{\gamma }}_j^k&= \gamma _j^{k-1}+\rho _{k-1} h_j(x_{k}) \end{aligned}$$

as the unclipped update. Define \(S_k := \max \{\Vert {\overline{\gamma }}^k\Vert _\infty ,\Vert {\overline{\lambda }}^k\Vert _\infty \}\). We treat separately the cases where \(\{S_k\}\) is bounded and where it is unbounded. If it is bounded, denote by \({\overline{\lambda }}\) and \({\overline{\gamma }}\) a limit point of \(({\overline{\lambda }}^k, {\overline{\gamma }}^k)\). Let

$$\begin{aligned} v = {\text {grad}}\, f({\overline{x}}) + \sum _{j\in \mathcal{{E}}} {\overline{\gamma }}_j{\text {grad}}\, h_j({\overline{x}}) + \sum _{i\in \mathcal{{I}}\cap \mathcal{{A}}}{\overline{\lambda }}_i {\text {grad}}\, g_i({\overline{x}}). \end{aligned}$$

In order to prove that v is zero, we compare it to a similar vector defined at \(x_k\), for all large k, and consider the limit \(k \rightarrow \infty \). Unlike the Euclidean case in the proof in [5], we cannot directly compare tangent vectors in the tangent spaces at \(x_k\) and \({\overline{x}}\): we use parallel transport to bring all tangent vectors to the tangent space at \({\overline{x}}\):

$$\begin{aligned} \Vert v\Vert\le & {} \left\| {\text {grad}}\, f({\overline{x}}) - \mathcal{{P}}_{x_k\rightarrow {\overline{x}}} {\text {grad}}\, f(x_k) {+}\!\! \sum _{j\in \mathcal{{E}}} {\overline{\gamma }}_j\left( {\text {grad}}\, h_j({\overline{x}})-\mathcal{{P}}_{x_k\rightarrow {\overline{x}}} {\text {grad}}\,h_j(x_k)\right) \right. \\&+ \left. \sum _{i\in \mathcal{{I}}\cap \mathcal{{A}}}{\overline{\lambda }}_i \left( {\text {grad}}\, g_i({\overline{x}}) - \mathcal{{P}}_{x_k\rightarrow {\overline{x}}} {\text {grad}}\, g_i(x_k)\right) \right\| \\&+ \left\| \mathcal{{P}}_{x_k\rightarrow {\overline{x}}} {\text {grad}}\, f(x_k) + \sum _{j\in \mathcal{{E}}} {\overline{\gamma }}_j\mathcal{{P}}_{x_k\rightarrow {\overline{x}}} {\text {grad}}\, h_j(x_k)\right. \\&\left. + \sum _{i\in \mathcal{{I}}\cap \mathcal{{A}}}{\overline{\lambda }}_i \mathcal{{P}}_{x_k\rightarrow {\overline{x}}} {\text {grad}}\, g_i(x_k)\right\| . \end{aligned}$$

By Lemma A.2, the first term vanishes in the limit \(k \rightarrow \infty \) since \(x_k \rightarrow {\overline{x}}\). We can understand the second term using isometry of parallel transport and linearity:

$$\begin{aligned}&\left\| \mathcal{{P}}_{x_k\rightarrow {\overline{x}}} {\text {grad}}\, f(x_k) + \sum _{j\in \mathcal{{E}}} {\overline{\gamma }}_j\mathcal{{P}}_{x_k\rightarrow {\overline{x}}} {\text {grad}}\, h_j(x_k) + \sum _{i\in \mathcal{{I}}\cap \mathcal{{A}}}{\overline{\lambda }}_i \mathcal{{P}}_{x_k\rightarrow {\overline{x}}} {\text {grad}}\, g_i(x_k)\right\| _{{\overline{x}}}\\&\quad = \left\| {\text {grad}}\, f(x_k) + \sum _{j\in \mathcal{{E}}} {\overline{\gamma }}_j{\text {grad}}\, h_j(x_k) + \sum _{i\in \mathcal{{I}}\cap \mathcal{{A}}}{\overline{\lambda }}_i {\text {grad}}\, g_i(x_k)\right\| _{x_k}\\&\quad \le \left\| \sum _{j\in \mathcal{{E}}} ({\overline{\gamma }}_j - {\overline{\gamma }}^k_j){\text {grad}}\, h_j(x_k) + \sum _{i\in \mathcal{{I}}\cap \mathcal{{A}}}({\overline{\lambda }}_i-{\overline{\lambda }}_i^k) {\text {grad}}\, g_i(x_k)\right\| _{x_k} \\&\qquad + \left\| {\text {grad}}\, f(x_k) + \sum _{j\in \mathcal{{E}}} {\overline{\gamma }}_j^k{\text {grad}}\, h_j(x_k) + \sum _{i\in \mathcal{{I}}}{\overline{\lambda }}_i^k {\text {grad}}\, g_i(x_k)\right\| _{x_k}\\&\qquad + \left\| \sum _{i\in \mathcal{{I}}\setminus \mathcal{{A}}} {\overline{\lambda }}_i^k {\text {grad}}\, g_i(x_k)\right\| _{x_k}. \end{aligned}$$

Here, the second term vanishes in the limit because it is upper bounded by \(\epsilon _{k}\) (by assumption) and we let \(\lim _{k\rightarrow \infty } \epsilon _k = 0\); the last term vanishes in the limit because of the discussion in the second paragraph; and the first term attains arbitrarily small values for large k as norms of gradients are bounded in a neighbourhood of \({\overline{x}}\) and by definition of \({\overline{\lambda }}\) and \({\overline{\gamma }}\). Since v is independent of k, we conclude that \(\Vert v\Vert = 0\). Therefore, \({\overline{x}}\) satisfies KKT conditions.

On the other hand, if \(\{S_k\}\) is unbounded, then for \(k\ge k_3\), we have

$$\begin{aligned} \left\| \frac{1}{S_k}{\text {grad}}\, f({\overline{x}}) + \sum _{j\in \mathcal{{E}}} \frac{{\overline{\gamma }}_j}{S_k}{\text {grad}}\, h_j({\overline{x}}) + \sum _{i\in \mathcal{{I}}}\frac{{\overline{\lambda }}_i}{S_k} {\text {grad}}\, g_i({\overline{x}})\right\| \le \frac{\epsilon _k}{S_k}. \end{aligned}$$

All the coefficients on the left-hand side are bounded in \([-1,1]\) and, by definition of \(S_k\), the coefficient vector has a nonzero limit point: denote it by \({\overline{\lambda }}\) and \({\overline{\gamma }}\). By an argument similar to the one above, taking the limit in k, we obtain

$$\begin{aligned} \left\| \sum _{j\in \mathcal{{E}}} {\overline{\gamma }}_j{\text {grad}}\, h_j({\overline{x}}) + \sum _{i\in \mathcal{{I}}\cap \mathcal{{A}}}{\overline{\lambda }}_i {\text {grad}}\, g_i({\overline{x}})\right\| = 0, \end{aligned}$$

which contradicts the LICQ condition at \({\overline{x}}\). Hence, \(\{S_k\}\) cannot be unbounded, and we are left with the bounded case, for which we already showed that \({\overline{x}}\) satisfies the KKT conditions. \(\square \)
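For concreteness, the following is a minimal sketch of the kind of safeguarded augmented Lagrangian outer loop to which Proposition 3.2 applies, specialized to the unit sphere and a toy non-negative-PCA-style instance. The inner subproblem is solved only crudely by a fixed number of retracted gradient steps, and all parameter values are illustrative stand-ins; this is not the authors' implementation (their code is linked in Footnote 2).

```python
# Sketch of a safeguarded Riemannian augmented Lagrangian loop on the sphere,
# for the toy problem  min -x'Ax  s.t.  ||x|| = 1,  x >= 0  (non-negative PCA
# style).  All parameters and the crude inner solver are illustrative choices.
import numpy as np

rng = np.random.default_rng(1)
n = 20
B = rng.standard_normal((n, n))
A = (B @ B.T) / n                                        # toy covariance matrix

f     = lambda x: -x @ A @ x
egrad = lambda x: -2.0 * A @ x                           # Euclidean gradient of f
g     = lambda x: -x                                     # inequality constraints g_i(x) = -x_i <= 0
proj  = lambda x, u: u - (x @ u) * x                     # projection onto T_x S^{n-1}
retr  = lambda x, u: (x + u) / np.linalg.norm(x + u)     # retraction by normalization

def alm_egrad(x, lam, rho):
    # Euclidean gradient of L_rho(x, lam) = f(x) + (rho/2) sum_i max(0, lam_i/rho + g_i(x))^2.
    w = np.maximum(0.0, lam + rho * g(x))                # clipped multiplier estimates
    return egrad(x) - w                                  # since grad g_i(x) = -e_i

def inner_solve(x, lam, rho, steps=300):
    # Crude inner solver: fixed-step retracted (Riemannian) gradient descent.
    # The step size shrinks with rho to keep the iteration stable.
    alpha = 1e-2 / (1.0 + rho)
    for _ in range(steps):
        x = retr(x, -alpha * proj(x, alm_egrad(x, lam, rho)))
    return x

# Outer loop: clipped multiplier update lam <- clip(lam + rho*g(x), 0, lam_max),
# with rho increased when the constraint violation does not shrink enough.
x = np.abs(rng.standard_normal(n)); x /= np.linalg.norm(x)
lam, rho = np.zeros(n), 1.0
lam_max, theta, tau = 100.0, 0.25, 2.0
viol_old = np.inf
for k in range(30):
    x = inner_solve(x, lam, rho)
    viol = np.linalg.norm(np.maximum(0.0, g(x)))
    lam = np.clip(lam + rho * g(x), 0.0, lam_max)        # clipped (safeguarded) update
    if viol > theta * viol_old:
        rho *= tau
    viol_old = viol
print("f(x) =", f(x), " max violation =", np.max(np.maximum(0.0, g(x))))
```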

B Proof of Proposition 3.4

Proof

The proof is adapted from Section 3 in [7]. Define \({\overline{\gamma }}_j^k = \gamma _j^{k-1}+\rho _{k-1} h_j(x_k)\). By Proposition 3.2, \({\overline{x}}\) is a KKT point and by taking a subsequence of \(\{x_k\}\) if needed, \({\overline{\gamma }}^k\) is bounded and converges to \({\overline{\gamma }}\).

For any tangent vector \(d\in \mathcal{{C}}^W({\overline{x}})\), we have \(\langle d, {\text {grad}}\, h_j({\overline{x}})\rangle = 0\) for all \(j\in \mathcal{{E}}\). Let \(m = |\mathcal{{E}}|\) and let \(n\ge m\) be the dimension of \(\mathcal{{M}}\). Let \(\varphi \) be a chart such that \(\varphi ({\overline{x}}) = 0\). From [42, Prop. 8.1], the component functions of \(h_j\) with respect to this chart are smooth. Let \(\partial _1,\dots ,\partial _{n}\) be the coordinate vector fields of this chart, and write \(d = (d_1\partial _1,\dots , d_n\partial _n)\). Define \(\mathcal{{F}}:{\mathbb {R}}^{n+m} \rightarrow {\mathbb {R}}^m\), for \(x\in {\mathbb {R}}^n\), \(y\in {\mathbb {R}}^m\) and \(j\in \{1,\dots , m\}\), by

$$\begin{aligned} \mathcal{{F}}_j(x,y) = \langle (y_1\partial _1, \dots , y_m\partial _m, d_{m+1}\partial _{m+1}, \dots , d_n\partial _n), {\text {grad}}\, h_j(\varphi ^{-1}(x))\rangle _{\varphi ^{-1}(x)}. \end{aligned}$$

If we denote by \(h_{j}^l\) the lth coordinate of \({\text {grad}}\, h_j\) in this coordinate system, and by \(G_x\) the Gram matrix of the metric, with entries \((G_x)_{p,q} = \langle \partial _p,\partial _q\rangle _x\), then the above expression can be written as

$$\begin{aligned} \mathcal{{F}}_j(x,y) = [y_1, \dots , y_m, d_{m+1}, \dots , d_n] G_{\varphi ^{-1}(x)} [h_j^1(\varphi ^{-1}(x)), \dots h_j^n(\varphi ^{-1}(x))]^T. \end{aligned}$$

and, with the abuse of notation that \([1\cdots m]\) denotes extraction of the first m columns, we have

$$\begin{aligned} \frac{\partial \mathcal{{F}}_j}{\partial y} = \left( [h_j^1(\varphi ^{-1}(x)), \dots h_j^n(\varphi ^{-1}(x))]G_{\varphi ^{-1}(x)}\right) _{[1\cdots m]}. \end{aligned}$$

Notice that the matrix \([h^1(\varphi ^{-1}({\overline{x}})), \dots , h^n(\varphi ^{-1}({\overline{x}}))]\) has full row rank m (by LICQ). As \(G_{\varphi ^{-1}({\overline{x}})}\) is invertible, \(\frac{\partial \mathcal{{F}}}{\partial y}({\overline{x}})\) must be invertible (reindexing the columns of this \(m\times n\) matrix from the top of the proof, if needed, so that its first m columns form a full-rank matrix). Then, by the implicit function theorem, there are a small neighbourhood U of \(\varphi ^{-1}({\overline{x}})\) and a continuously differentiable function \(g:U\rightarrow {\mathbb {R}}^m\) such that \(g(\varphi ^{-1}({\overline{x}})) = [d_1, \dots , d_m]\) and

$$\begin{aligned} \mathcal{{F}}(x, g(\varphi ^{-1}(x))) = 0. \end{aligned}$$

For each x locally around \({\overline{x}}\), let

$$\begin{aligned} d_x = [g(\varphi ^{-1}(x))_1\partial _1, \dots , g(\varphi ^{-1}(x))_m\partial _m, d_{m+1}\partial _{m+1}, \dots , d_n\partial _n]\in \mathrm {T}_{x}\mathcal{{M}}. \end{aligned}$$

These vectors then form a smooth vector field such that \(\langle d_x, {\text {grad}}\, h_j(x)\rangle = 0\) for all \(j\in \mathcal{{E}}\), with \(d = d_{{\overline{x}}}\). Then we have

$$\begin{aligned}&\mathrm {Hess}_x \mathcal{{L}}_{\rho _{k-1}}(x_k,\gamma ^{k-1})(d_{x_k},d_{x_k}) \\&\quad = \langle d_{x_k}, \mathrm {Hess} f(x_k) d_{x_k}\rangle + \rho _{k-1} \sum _{j\in \mathcal{{E}}} \langle d_{x_k}, \nabla _{d_{x}}(h_j(x)+\frac{\gamma _j^{k-1}}{\rho _{k-1}}){\text {grad}}\, h_j(x)\rangle _{x_k} \\&\quad = \langle d_{x_k}, \mathrm {Hess} f(x_k) d_{x_k}\rangle + \rho _{k-1}\sum _{j\in \mathcal{{E}}} d_{x}[h_j(x)+\frac{\gamma _j^{k-1}}{\rho _{k-1}}](x_k)\langle d_{x}, {\text {grad}}\, h_j(x)\rangle _{x_k} \\&\quad \quad + \sum _{j\in \mathcal{{E}}} (\rho _{k-1} h_j(x_k)+\gamma _j^{k-1})\langle d_{x}, \nabla _{d_x}{\text {grad}}\, h_j(x)\rangle _{x_k}\\&\quad = \langle d_{x_k}, \mathrm {Hess} f(x_k) d_{x_k}\rangle + \sum _{j\in \mathcal{{E}}} (\rho _{k-1} h_j(x_k)+\gamma _j^{k-1})\langle d_{x}, \nabla _{d_{x}}{\text {grad}}\, h_j(x)\rangle _{x_k}\\&\quad = \langle d_{x_k}, \nabla _{d_{x_k}} {\text {grad}}\, f(x_k)\rangle + \sum _{j\in \mathcal{{E}}} {\overline{\gamma }}_j^{k}\langle d_{x}, \nabla _{d_{x}}{\text {grad}}\, h_j(x)\rangle _{x_k} \end{aligned}$$

where the second equality is by definition of the connection; the third by orthogonality of \(d_x\) with \(\{{\text {grad}}\, h_j\}\); and the fourth from the definition of the Hessian and of \({\overline{\gamma }}^k\). Therefore we have

$$\begin{aligned} \langle d_{x}, \nabla _{d_{x}} {\text {grad}}\, f(x)\rangle _{x_k} + \sum _{j\in \mathcal{{E}}} {\overline{\gamma }}_j^k\langle d_{x}, \nabla _{d_{x}}{\text {grad}}\, h_j(x)\rangle _{x_k} \ge -\epsilon _k \Vert d_{x_k}\Vert ^2 \end{aligned}$$

Since the connection maps two continuously differentiable vector fields to a continuous vector field, we can take a limit and state:

$$\begin{aligned} \langle d, \nabla _{d} {\text {grad}}\, f({\overline{x}})\rangle + \sum _{j\in \mathcal{{E}}} {\overline{\gamma }}_j\langle d, \nabla _{d}{\text {grad}}\, h_j({\overline{x}})\rangle \ge 0 \end{aligned}$$

which is just \(\mathrm {Hess} \mathcal{{L}}({\overline{x}}, {\overline{\gamma }})(d,d) \ge 0\). \(\square \)

C Proof of Proposition 4.1

In the proof below, we use the following notation:

$$\begin{aligned} v \in F'(x^*,\lambda ^*, \gamma ^*) \Leftrightarrow {\left\{ \begin{array}{ll} v\in \mathrm {T}_{x^*}\mathcal{{M}}, &{}\\ \langle {\text {grad}}\, h_j(x^*), v\rangle = 0&{} \text { for all }j\in \mathcal{{E}},\text { and} \\ \langle {\text {grad}}\, g_i(x^*), v\rangle \le 0&{} \text { for all }i\in \mathcal{{A}}(x^*)\cap \mathcal{{I}}.\\ \end{array}\right. } \end{aligned}$$
(31)

Proof

Consider the function Q, defined by:

$$\begin{aligned} Q(x, \rho ) = f(x) + \rho \left( \sum _{i\in \mathcal{{I}}}\text {max}\{0, g_i(x)\} + \sum _{j\in \mathcal{{E}}}|h_j(x)|\right) . \end{aligned}$$

In a small enough neighbourhood of \(x^*\), terms for inactive constraints disappear and Q is just:

$$\begin{aligned} Q(x, \rho ) = f(x) + \rho \left( \sum _{i\in \mathcal{{A}}(x^*)\cap \mathcal{{I}}}\text {max}\{0, g_i(x)\} + \sum _{j\in \mathcal{{E}}}|h_j(x)|\right) . \end{aligned}$$

Although Q is nonsmooth, it is easy to verify that it admits directional derivatives in all directions:

$$\begin{aligned} \lim _{\tau \rightarrow 0 }\frac{\text {max}\{0,g_i(\mathrm {Exp}_{x^*}(\tau d))\}-\text {max}\{0,g_i(x^*)\}}{\tau }= & {} \lim _{\tau \rightarrow 0 }\frac{\text {max}\{0,g_i(\mathrm {Exp}_{x^*}(\tau d))\}}{\tau } \end{aligned}$$

and since \(g_i\circ \mathrm {Exp}_{x^*}\) is sufficiently smooth, distinguishing the sign of \(\frac{d}{d\tau }(g_i\circ \mathrm {Exp}_{x^*})(\tau d)\) at \(\tau = 0\), the right-hand side equals \(\max \{0, \frac{d}{d\tau }(g_i\circ \mathrm {Exp}_{x^*})(\tau d)\} = \max \{0,\langle {\text {grad}}\, g_i(x^*), d\rangle \}\). Similarly, we have

$$\begin{aligned} \lim _{\tau \rightarrow 0 }\frac{|h_j(\mathrm {Exp}_{x^*}(\tau d))|-|h_j(x^*)|}{\tau } = \left| \frac{d}{d\tau }(h_j\circ \mathrm {Exp}_{x^*})(\tau d)\right| = |\langle {\text {grad}}\,h_j(x^*), d\rangle |. \end{aligned}$$

Hence, the directional derivative along direction d, \(Q(x^*,\rho ; d)\), is well defined:

$$\begin{aligned} Q(x^*,\rho ;d)= & {} \langle {\text {grad}}\,f(x^*), d\rangle \nonumber \\&+ \rho \left( \sum _{i\in \mathcal{{A}}(x^*)\cap \mathcal{{I}}}\max \{0,\langle {\text {grad}}\,g_i(x^*), d\rangle \} + \sum _{j\in \mathcal{{E}}}|\langle {\text {grad}}\,h_j(x^*), d\rangle |\right) .\nonumber \\ \end{aligned}$$
(32)

As \(x^*\) is a KKT point,

$$\begin{aligned} {\text {grad}}\,f(x^*) + \sum _{i\in \mathcal{{A}}(x^*)\cap \mathcal{{I}}} \lambda _i^* {\text {grad}}\,g_i(x^*) + \sum _{j\in \mathcal{{E}}} \gamma _j^* {\text {grad}}\,h_j(x^*) = 0. \end{aligned}$$

Thus,

$$\begin{aligned} 0= & {} \langle {\text {grad}}\,f(x^*),d\rangle + \sum _{i\in \mathcal{{A}}(x^*)\cap \mathcal{{I}}} \lambda _i^* \langle {\text {grad}}\,g_i(x^*),d\rangle + \sum _{j\in \mathcal{{E}}} \gamma _j^* \langle {\text {grad}}\,h_j(x^*),d\rangle \\\le & {} \langle {\text {grad}}\,f(x^*),d\rangle + \sum _{i\in \mathcal{{A}}(x^*)\cap \mathcal{{I}}} \lambda _i^* \max \{0, \langle {\text {grad}}\,g_i(x^*),d\rangle \}\\&+ \sum _{j\in \mathcal{{E}}} \gamma _j^* |\langle {\text {grad}}\,h_j (x^*),d\rangle |. \end{aligned}$$

Combining with equation (32), we have

$$\begin{aligned} Q(x^*, \rho ;d)\ge & {} \sum _{i\in \mathcal{{A}}(x^*)\cap \mathcal{{I}}} (\rho -\lambda _i^*) \max \{0, \langle {\text {grad}}\,g_i(x^*),d\rangle \}\nonumber \\&+ \sum _{j\in \mathcal{{E}}} (\rho -\gamma _j^*) |\langle {\text {grad}}\,h_j(x^*),d\rangle |. \end{aligned}$$
(33)

For contradiction, suppose \(x^*\) is not a local minimum of Q. Then, there exists a sequence \(\{y_k\}_{k=1}^\infty \) with \(\lim _{k\rightarrow \infty } y_k = x^*\) such that \(Q(y_k, \rho ) < Q(x^*, \rho ) = f(x^*)\). Restricting to a small enough neighbourhood, \(\eta _k = \mathrm {Exp}^{-1}_{x^*}(y_k)\) is well defined. Considering only a subsequence if needed, we have \(\lim _{k\rightarrow \infty } \frac{\eta _k}{\Vert \eta _k\Vert } = {\bar{\eta }}\). It is easy to see that \(Q(\mathrm {Exp}_{x^*}(\cdot ),\rho )\) is locally Lipschitz continuous at \(0_{x^*}\), which gives

$$\begin{aligned} Q(\mathrm {Exp}_{x^*}(\Vert \eta _k\Vert {\bar{\eta }}), \rho ) = Q(\mathrm {Exp}_{x^*}(\eta _k), \rho ) + o(\Vert \eta _k\Vert ) = Q(y_k, \rho ) + o(\Vert \eta _k\Vert ). \end{aligned}$$

Subtract \(Q(x^*,\rho )\) and take the limit:

$$\begin{aligned}&\lim _{k\rightarrow \infty } \frac{Q(\mathrm {Exp}_{x^*}(\Vert \eta _k\Vert {\bar{\eta }}), \rho ) - Q(x^*,\rho )}{\Vert \eta _k\Vert } =\lim _{k\rightarrow \infty } \frac{Q(y_k, \rho ) - Q(x^*,\rho )}{\Vert \eta _k\Vert } \\&\quad + \lim _{k\rightarrow \infty } \frac{o(\Vert \eta _k\Vert )}{\Vert \eta _k\Vert }\le 0. \end{aligned}$$

Notice that the left-most expression is just \(Q(x^*, \rho ; {\bar{\eta }})\). Since the coefficients on the right-hand side of (33) are strictly positive, we must have \(\langle {\text {grad}}\,g_i(x^*), {\bar{\eta }}\rangle \le 0\) and \(\langle {\text {grad}}\,h_j(x^*), {\bar{\eta }}\rangle = 0\). Since the exponential map is a second-order retraction, we have a Taylor expansion for f,

$$\begin{aligned} f(y_k) = f(x^*) + \langle {\text {grad}}\,f(x^*), \eta _k\rangle + \frac{1}{2}\langle \eta _k, \mathrm {Hess} f(x^*)[\eta _k]\rangle + o(\Vert \eta _k\Vert ^2), \end{aligned}$$

and similarly for \(g_i\) and \(h_j\). Notice that

$$\begin{aligned} Q(y_k, \rho )= & {} f(y_k) + \rho \left( \sum _{i\in \mathcal{{A}}(x^*)\cap \mathcal{{I}}}\max \{0, g_i(y_k)\} + \sum _{j\in \mathcal{{E}}}|h_j(y_k)|\right) \\\ge & {} \left( f(y_k) + \sum _{i\in \mathcal{{A}}(x^*)\cap \mathcal{{I}}}\lambda _i^* g_i(y_k) + \sum _{j\in \mathcal{{E}}}\gamma _j^*h_j(y_k)\right) \\&+ \sum _{i\in \mathcal{{A}}(x^*)\cap \mathcal{{I}}}(\rho - \lambda _i^*)\max \{0, g_i(y_k)\} \\&+ \sum _{j\in \mathcal{{E}}}(\rho - \gamma _j^*)|h_j(y_k)|\\\ge & {} f(x^*) + \langle {\text {grad}}\,f(x^*) + \sum _{i\in \mathcal{{A}}(x^*)\cap \mathcal{{I}}} \lambda _i^* {\text {grad}}\,g_i(x^*) \\&+ \sum _{j\in \mathcal{{E}}} \gamma _j^* {\text {grad}}\,h_j(x^*), \eta _k\rangle \\&+ \frac{1}{2}\langle \eta _k, \mathrm {Hess} (f(x^*)+\sum _{i\in \mathcal{{A}}(x^*)\cap \mathcal{{I}}} \lambda _i^* g_i(x^*)\\&+\sum _{j\in \mathcal{{E}}}\gamma _{j}^* h_j(x^*))[\eta _k]\rangle + o(\Vert \eta _k\Vert ^2) + P(y_k)\\= & {} f(x^*) + 0 + \frac{1}{2}\langle \eta _k, \mathrm {Hess}(\mathcal{{L}}(x,\lambda ^*,\gamma ^*)(x ^*)[\eta _k])\rangle + o(\Vert \eta _k\Vert ^2) + P(y_k) \end{aligned}$$

where \(P(y_k) = \sum _{i\in \mathcal{{A}}(x^*)\cap \mathcal{{I}}}(\rho - \lambda _i^*)\max \{0, g_i(y_k)\} + \sum _{j\in \mathcal{{E}}}(\rho - \gamma _j^*)|h_j(y_k)|\). The second inequality follows from the quadratic approximation of \(f + \sum _{i\in \mathcal{{A}}(x^*)\cap \mathcal{{I}}} \lambda _{i} g_i+\sum _{j\in \mathcal{{E}}} \gamma _j h_j\) and bilinearity of the metric. The last equality comes from the definition of KKT points. Dividing through by \(\Vert \eta _k\Vert ^2\), we obtain

$$\begin{aligned} \lim _{k\rightarrow \infty } \frac{Q(y_k,\rho ) - f(x^*)}{\Vert \eta _k\Vert ^2}= & {} \lim _{k\rightarrow \infty }\frac{1}{2}\left\langle \frac{\eta _k}{\Vert \eta _k\Vert }, \mathrm {Hess}(\mathcal{{L}}(x,\lambda ^*,\gamma ^*)(x ^*)\left[ \frac{\eta _k}{\Vert \eta _k\Vert }\right] )\right\rangle \nonumber \\&+\, 0 + \lim _{k\rightarrow \infty }\frac{P(y_k)}{\Vert \eta _k\Vert ^2}. \end{aligned}$$
(34)

If \({\bar{\eta }}\in F'\), then as \(P(y_k)\ge 0\), the first term on the right-hand side is strictly positive, which contradicts \(Q(y_k,\rho ) < f(x^*)\) for all k. If \({\bar{\eta }}\in F-F'\), then there exists \(g_{i'}\) such that \(\langle {\text {grad}}\,g_{i'}(x^*), {\bar{\eta }}\rangle > 0\). Then,

$$\begin{aligned} g_{i'}(y_k) = g_{i'}(x^*) + \langle {\text {grad}}\,g_{i'}(x^*), \eta _k\rangle + o(\Vert \eta _k\Vert ) = \langle {\text {grad}}\,g_{i'}(x^*), \eta _k\rangle + o(\Vert \eta _k\Vert ). \end{aligned}$$

Hence, dividing the above expression by \(\Vert \eta _k\Vert \) gives

$$\begin{aligned} \lim _{k\rightarrow \infty } \frac{g_{i'}(y_k)}{\Vert \eta _k\Vert } \ge \lim _{k\rightarrow \infty } \left\langle {\text {grad}}\,g_{i'}(x^*), \frac{\eta _k}{\Vert \eta _k\Vert }\right\rangle + 0 = \langle {\text {grad}}\,g_{i'}(x^*), {\bar{\eta }}\rangle > 0. \end{aligned}$$

Notice that \(\frac{P(y_k)}{\Vert \eta _k\Vert ^2} \ge \frac{g_{i'}(y_k)}{\Vert \eta _k\Vert }\) for large enough k, so a contradiction is obtained by plugging this into (34). \(\square \)
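The directional derivative formula (32) is easy to check by finite differences along geodesics. Below is a small numerical sketch on the unit sphere; the particular constraints \(h(x) = x_1\) and \(g(x) = -x_3\), the point, and the cost are our own illustrative choices, unrelated to the paper's experiments.

```python
# Finite-difference check of the directional derivative (32) of the exact
# penalty Q on the unit sphere S^2, at a point where h(x) = x_1 = 0 and
# g(x) = -x_3 <= 0 are both active.  All choices below are illustrative.
import numpy as np

rho = 5.0
c = np.array([1.0, 2.0, 3.0])
f = lambda x: c @ x
h = lambda x: x[0]                      # equality constraint h(x) = 0
g = lambda x: -x[2]                     # inequality constraint g(x) <= 0
Q = lambda x: f(x) + rho * (max(0.0, g(x)) + abs(h(x)))

proj = lambda x, u: u - (x @ u) * x     # projection onto the tangent space at x

def Exp(x, v):                          # exponential map of the sphere
    nv = np.linalg.norm(v)
    return x if nv == 0.0 else np.cos(nv) * x + np.sin(nv) * (v / nv)

x = np.array([0.0, 1.0, 0.0])           # both constraints are active here
gf = proj(x, c)                          # Riemannian gradients at x ...
gh = proj(x, np.array([1.0, 0.0, 0.0]))
gg = proj(x, np.array([0.0, 0.0, -1.0]))

rng = np.random.default_rng(2)
d = proj(x, rng.standard_normal(3)); d /= np.linalg.norm(d)

rhs = gf @ d + rho * (max(0.0, gg @ d) + abs(gh @ d))    # right-hand side of (32)
t = 1e-6
fd = (Q(Exp(x, t * d)) - Q(x)) / t                       # finite difference along the geodesic
print(rhs, fd)   # the two values agree up to O(t)
```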

D Proof of Proposition 4.2

Proof

We give a proof for \(Q^{\mathrm {lse}}\); the proof for \(Q^{\mathrm {lqh}}\) is analogous. For each iteration k and for each \(i\in \mathcal{{I}}\) and \(j\in \mathcal{{E}}\), define the following coefficients:

$$\begin{aligned} \lambda _i^k&= \frac{e^{g_i(x_{k+1})/u_k}}{1+e^{g_i(x_{k+1})/u_k}},&\text { and }&\gamma ^k_j&= \frac{e^{h_j(x_{k+1})/u_k}-e^{-h_j(x_{k+1})/u_k}}{e^{h_j(x_{k+1})/u_k}+e^{-h_j(x_{k+1})/u_k}}. \end{aligned}$$

Then, a simple calculation shows that (under our assumptions, \(\rho _k = \rho _0\) for all k; we simply write \(\rho \)):

$$\begin{aligned} {\text {grad}}\, Q^{\text {lse}}(x_{k+1},\rho _k,u_k) =&{\text {grad}}\, f(x_{k+1}) + \rho \sum _{i\in \mathcal{{I}}} \lambda _i^k {\text {grad}}\, g_i(x_{k+1}) \\&+ \rho \sum _{j\in \mathcal{{E}}} \gamma ^k_j {\text {grad}}\,h_j(x_{k+1}). \end{aligned}$$

Notice that the multipliers are bounded: \(\gamma ^k_j \in [-1,1]\) and \(\lambda _i^k\in [0,1]\). Hence, as sequences indexed by k, they have limit points: we denote them by \({\overline{\gamma }}\in [-1,1]\) and \({\overline{\lambda }}\in [0,1]\). Furthermore, since \({\overline{x}}\) is feasible, there exists \(k_1\) such that for any \(k>k_1\) and \(i\in \mathcal{{I}}\setminus \mathcal{{A}}({\overline{x}})\), \(g_i(x_k) < c\) for some constant \(c<0\). Then, as \(u_k\rightarrow 0\), by definition, \(\lambda ^k_i\) goes to 0 for \(i\in \mathcal{{I}}\setminus \mathcal{{A}}({\overline{x}})\). This shows \({\overline{\lambda }}_i = 0\) for \(i\in \mathcal{{I}}\setminus \mathcal{{A}}({\overline{x}})\). Considering a convergent subsequence if needed, there exists \(k_2>k_1\) such that, for all \(k>k_2\), \(\text {dist}(x_k,{\overline{x}}) < \mathrm {inj}({\overline{x}})\), the injectivity radius at \({\overline{x}}\). Thus, parallel transport from each \(x_k\) to \({\overline{x}}\) is well defined. Consider

$$\begin{aligned} v = {\text {grad}}\,f({\overline{x}}) + \rho \sum _{i\in \mathcal{{I}}\cap \mathcal{{A}}({\overline{x}})}{\overline{\lambda }}_i {\text {grad}}\,g_i({\overline{x}}) + \rho \sum _{j\in \mathcal{{E}}} {\overline{\gamma }}_j{\text {grad}}\,h_j({\overline{x}}). \end{aligned}$$

Notice that its coefficients are bounded, so we can show \(\Vert v\Vert = 0\) as in the proof of Proposition 3.2. \(\square \)
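To see where the coefficients \(\lambda _i^k\) and \(\gamma ^k_j\) come from, note that they are exactly the derivatives of the smoothings \(u\log (1+e^{g/u})\) of \(\max \{0,g\}\) and \(u\log (e^{h/u}+e^{-h/u})\) of |h| (this is our reading of the coefficients above; the precise definition of \(Q^{\mathrm {lse}}\) is given in the main text). A small numerical sketch:

```python
# Sanity check: the weights lambda = sigma(g/u) and gamma = tanh(h/u) used above
# are the derivatives of the smoothed terms  u*log(1 + exp(g/u))  and
# u*log(exp(h/u) + exp(-h/u)),  assuming these log-sum-exp smoothings (our
# reading of the coefficients; the exact definition of Q^lse is in the paper).
import numpy as np

u = 0.05
smooth_max = lambda g: u * np.log1p(np.exp(g / u))                   # smooths max{0, g}
smooth_abs = lambda h: u * np.log(np.exp(h / u) + np.exp(-h / u))    # smooths |h|

lam   = lambda g: np.exp(g / u) / (1.0 + np.exp(g / u))              # d/dg of smooth_max
gamma = lambda h: np.tanh(h / u)   # equals (e^{h/u}-e^{-h/u})/(e^{h/u}+e^{-h/u})

eps = 1e-6
for gv in [-0.3, -0.01, 0.0, 0.02, 0.4]:
    fd = (smooth_max(gv + eps) - smooth_max(gv - eps)) / (2 * eps)
    print(gv, fd, lam(gv))        # finite difference matches the sigmoid weight
for hv in [-0.2, 0.0, 0.1]:
    fd = (smooth_abs(hv + eps) - smooth_abs(hv - eps)) / (2 * eps)
    print(hv, fd, gamma(hv))      # finite difference matches the tanh weight

# As u -> 0 these weights sharpen toward 1{g > 0} and sign(h), so the gradient
# of the smoothed penalty approaches (a subgradient of) the exact penalty's.
```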


Cite this article

Liu, C., Boumal, N. Simple Algorithms for Optimization on Riemannian Manifolds with Constraints. Appl Math Optim 82, 949–981 (2020). https://doi.org/10.1007/s00245-019-09564-3

