Accelerated Optimization on Riemannian Manifolds via Discrete Constrained Variational Integrators

A variational formulation for accelerated optimization on normed vector spaces was recently introduced in Wibisono et al. (PNAS 113:E7351–E7358, 2016), and later generalized to the Riemannian manifold setting in Duruisseaux and Leok (SJMDS, 2022a). This variational framework was exploited on normed vector spaces in Duruisseaux et al. (SJSC 43:A2949–A2980, 2021) using time-adaptive geometric integrators to design efficient explicit algorithms for symplectic accelerated optimization, and it was observed that geometric discretizations which respect the time-rescaling invariance and symplecticity of the Lagrangian and Hamiltonian flows were substantially less prone to stability issues, and were therefore more robust, reliable, and computationally efficient. As such, it is natural to develop time-adaptive Hamiltonian variational integrators for accelerated optimization on Riemannian manifolds. In this paper, we consider the case of Riemannian manifolds embedded in a Euclidean space that can be characterized as the level set of a submersion. We will explore how holonomic constraints can be incorporated in discrete variational integrators to constrain the numerical discretization of the Riemannian Hamiltonian system to the Riemannian manifold, and we will test the performance of the resulting algorithms by solving eigenvalue and Procrustes problems formulated as optimization problems on the unit sphere and Stiefel manifold.


Introduction
Many data analysis algorithms are designed around the minimization of a loss function or the maximization of a likelihood function. Due to the ever-growing scale of the data sets and size of the problems, there has been a lot of focus on first-order optimization algorithms because of their low cost per iteration. In 1983, Nesterov's accelerated gradient method (Nesterov 1983) was shown to converge in O(1/k 2 ) to the minimum of the convex objective function f , improving on the O(1/k) convergence rate exhibited by standard gradient descent methods. This O(1/k 2 ) convergence rate was shown in Nesterov (2004) to be optimal among first-order methods using only information about ∇ f at consecutive iterates. This phenomenon in which an algorithm displays this improved rate of convergence is referred to as acceleration, and other accelerated algorithms have been derived since Nesterov's algorithm, which was shown in Su et al. (2016) to limit to a second-order ordinary differential equation (ODE), as the timestep goes to 0, and that f (x(t)) converges to its optimal value at a rate of O(1/t 2 ) along any trajectory x(t) of this ODE. It was then shown in Wibisono et al. (2016) that in continuous time, an arbitrary convergence rate O(1/t p ) can be achieved in normed spaces, by considering flow maps generated by a family of time-dependent Bregman Lagrangian and Hamiltonian systems which is closed under time-rescaling. This variational framework and the time-rescaling property of this family were then exploited in  by using time-adaptive geometric integrators to design efficient explicit algorithms for symplectic accelerated optimization. It was observed that a careful use of adaptivity and symplecticity could result in a significant gain in computational efficiency. More generally, when applied to Hamiltonian systems, symplectic integrators yield discrete approximations of the flow that preserve the symplectic 2form (Hairer et al. 2006). The preservation of symplecticity results in the preservation of many qualitative aspects of the underlying dynamical system. In particular, when applied to conservative Hamiltonian systems, symplectic integrators exhibit excellent long-time near-energy preservation (Benettin and Giorgilli 1994;Reich 1999). Variational integrators provide a systematic method for constructing symplectic integrators of arbitrarily high-order based on the numerical discretization of Hamilton's principle (Marsden and West 2001;Hall and Leok 2015), or equivalently, by the approximation of Jacobi's solution of the Hamilton-Jacobi equation, which is a generating function for the exact symplectic flow map.
In the past few years, there has been some effort to derive accelerated optimization algorithms in the Riemannian manifold setting (Duruisseaux and Leok 2022a;Alimisis et al. 2020aAlimisis et al. , b, 2021Sra 2016, 2018;Ahn and Sra 2020;Liu et al. 2017). In Duruisseaux and Leok (2022a), it was shown that in continuous time, the convergence rate of f (x(t)) to its optimal value can be accelerated to an arbitrary convergence rate O(1/t p ) on Riemannian manifolds. This was achieved by considering a family of time-dependent Bregman Lagrangian and Hamiltonian systems on Riemannian manifolds which is closed under time-rescaling, thereby generalizing the variational framework for accelerated optimization of Wibisono et al. (2016) to Riemannian manifolds. The time-adaptivity based approach relying on a Poincaré transformation from  was also extended to the Riemannian manifold setting in Duruisseaux and Leok (2022a). Now, the Whitney embedding theorems (Whitney 1944a, b) state that any smooth manifold of dimension n ≥ 2 can be embedded in R 2n and immersed in R 2n−1 , and is thus diffeomorphic to a submanifold of R 2n . Furthermore, the Nash embedding theorems (Nash 1956) state that any Riemannian manifold can be globally isometrically embedded into some Euclidean space. As a consequence of these embedding theorems, the study of Riemannian manifolds can in principle be reduced to the study of submanifolds of Euclidean spaces. Altogether, this motivates the introduction of time-adaptive variational integrators on Riemannian manifolds which exploit the structure of the embedding Euclidean space, and in this paper, we will study how holonomic constraints can be incorporated into different types of variational integrators to constrain the numerical solutions of the Riemannian dynamical system to the Riemannian manifold. Incorporating holonomic constraints in geometric integrators has been studied extensively in the past (see Marsden and West (2001); Hairer et al. (2006); Marsden and Ratiu (1999); Holm et al. (2009) for instance), and some work has been done from the variational perspective for the Type I Lagrangian formulation in Marsden and West (2001) via augmented Lagrangians.

Outline of the Paper
Section 2 shows the equivalence between constrained variational principles and constrained Euler-Lagrange equations, both in continuous and discrete time, before deriving analogous results for both the Type II and Type III Hamiltonian formulations of mechanics in Sect. 3. In Sect. 4, we will exploit error analysis theorems for unconstrained mechanics from Marsden and West (2001); Schmitt and Leok (2017) to obtain variational error analysis results for the maps defined implicitly by the discrete constrained Euler-Lagrange and Hamilton's equations. Finally, in Sect. 5, we will exploit these constrained variational integrators and the variational formulation of accelerated optimization on Riemannian manifolds from Duruisseaux and Leok (2022a) to solve numerically generalized eigenvalue problems and Procrustes problems on the unit sphere and Stiefel manifold.

Constrained Variational Lagrangian Mechanics
Traditionally, variational integrators have been designed based on the Type I generating function known as the discrete Lagrangian, L d : Q × Q → R. The exact discrete Lagrangian is the exact generating function for the time-h flow map of Hamilton's equations, and it can be represented in boundary value form by where q(0) = q 0 , q(h) = q h , and q satisfies the Euler-Lagrange equations over the time interval [0, h]. This is closely related to Jacobi's solution of the Hamilton-Jacobi equation. A variational integrator is defined by constructing an approximation L d : Q × Q → R to the exact discrete Lagrangian L E d , and then applying the implicit discrete Euler-Lagrange equations, which implicitly define a numerical integrator, referred to as the discrete Hamiltonian mapF L d : (q 0 , p 0 ) → (q 1 , p 1 ), where D i denotes a partial derivative with respect to the i-th argument. These equations define the discrete Legendre transforms, F ± L d : 4) and the discrete Hamiltonian map can be expressed asF Such numerical methods are called variational integrators as they can be derived from a discrete Hamilton's principle, which involves extremizing a discrete action sum , subject to fixed boundary conditions on q 0 , q N . Now, suppose we are given a configuration manifold M, and a holonomic constraint function C : M → R d . Assuming that 0 ∈ R d is a regular point of C, we can constrain the dynamics to the constraint submanifold Q = C −1 (0), which is truly a submanifold of M (see Marsden and West (2001); Abraham et al. (1988)). We will now consider variational Lagrangian mechanics with holonomic constraints C(q) using Lagrange multipliers λ : [0, T ] → .

Remark 2.1
These constrained Euler-Lagrange equations can be thought of as the Euler-Lagrange equations coming from the augmented LagrangianL q, λ,q,λ = L(q,q) − λ, C(q) .
Consider the function S(q 0 , q T ) given by the extremal value of the constrained action functional S over the family of curves (q(·), λ(·)) satisfying the boundary conditions q(0) = q 0 and q(T ) = q T : (2.7) The following theorem shows that S(q 0 , q T ) is a generating function for the flow of the continuous constrained Euler-Lagrange equations: is implicitly given by the following relations: In particular, S(q 0 , q T ) is a Type I generating function that generates the exact flow of the constrained Euler-Lagrange equations (2.6).
Proof See Appendix A.4.

Discrete Constrained Variational Lagrangian Mechanics
We now introduce a discrete variational formulation of Lagrangian mechanics which includes holonomic constraints. Suppose we are given a partition 0 = t 0 < t 1 < . . . < t N = T of the interval [0, T ], and a discrete curve in Q × denoted by {(q k , λ k )} N k=0 such that q k ≈ q(t k ) and λ k ≈ λ(t k ). We will formulate discrete constrained variational Lagrangian mechanics in terms of the following discrete analogues of the constrained action functional S given by Eq. (2.5): (2.11) We can now derive discrete analogues to Theorem 2.1 relating discrete Type I variational principles to discrete Euler-Lagrange equations:

Theorem 2.3 The Type I discrete Hamilton's variational principles
are equivalent to the discrete constrained Euler-Lagrange equations Proof See Appendix A.7.

Remark 2.2
These discrete constrained Euler-Lagrange equations can be thought of as the discrete Euler-Lagrange equations coming from the augmented discrete LagrangiansL (2.15)

Constrained Variational Hamiltonian Mechanics
The boundary value formulation of the exact Type II generating function of the time-h flow of Hamilton's equations is given by the exact discrete right Hamiltonian, where (q, p) satisfies Hamilton's equations with boundary conditions q(0) = q 0 and p(h) = p h . A Type II Hamiltonian variational integrator is constructed by using an approximate discrete right Hamiltonian H + d , and applying the discrete right Hamilton's equations, which implicitly define the integrator,F H + d : (q 0 , p 0 ) → (q 1 , p 1 ). Similarly, the boundary value formulation of the exact Type III generating function of the time-h flow of Hamilton's equations is given by the exact discrete left Hamiltonian, . We now derive analogous results to those of Sect. 2 from the Hamiltonian perspective. As in the Lagrangian case, we will assume we have a configuration manifold M, a holonomic constraint function C : M → R d , and that the dynamics are constrained to the submanifold Q = C −1 (0).

Continuous Constrained Variational Hamiltonian Mechanics
The following theorem presents the equivalence between a continuous constrained variational principle and continuous constrained Hamilton's equations in the Type II case, generalizing Lemma 2.1 from Leok and Zhang (2011) to include holonomic constraints: Theorem 3.1 Consider the Type II constrained action functional S : (3.5) The condition that S(q(·), p(·), λ(·)) is stationary with respect to the boundary conditions δq(0) = 0 and δ p(T ) = 0 is equivalent to (q(·), p(·), λ(·)) satisfying Hamilton's constrained equationṡ As in the Type II case, we can derive a theorem relating a continuous constrained variational principle and continuous constrained Hamilton's equations in the Type III case: Theorem 3.2 Consider the Type III constrained action functional S : (3.7) The condition that S(q(·), p(·), λ(·)) is stationary with respect to the boundary conditions δq(T ) = 0 and δ p(0) = 0 is equivalent to (q(·), p(·), λ(·)) satisfying Hamilton's constrained equationṡ Remark 3.1 Hamilton's constrained equations are the same in the Type II and Type III formulations of Hamiltonian mechanics, and they can be thought of as the Hamilton's equations generated by the augmented Hamiltonian where p is the conjugate momentum for the variable λ. Furthermore, they are equivalent to the constrained Euler-Lagrange Eq. (2.6), provided that the Lagrangian L is hyperregular.
Remark 3.2 It is sometimes beneficial to augment the continuous equations with the equation ∂ H ∂ p (q, p), ∇C(q) = 0 (and analogously for the discrete case) to ensure that the momentum p lies in the cotangent space to the manifold, as explained and illustrated in (Hairer et al. 2006, Chapter VII).
We will now generalize Theorem 2.2 and its Type III analogue from Leok and Zhang (2011) to include holonomic constraints C(q) using Lagrange multipliers λ : In the Type II case, consider the function S(q 0 , p T ) given by the extremal value of the constrained action functional S over the family of curves (q(·), p(·), λ(·)) satisfying the boundary conditions q(0) = q 0 and p(T ) = p T : ( 3.10) The following theorem shows that S(q 0 , p T ) is a generating function for the flow of the continuous constrained Hamilton's equations:

Theorem 3.3 The exact time-T flow map of Hamilton's equations
is implicitly given by the following relations: In particular, S(q 0 , p T ) is a Type II generating function that generates the exact flow of the constrained Hamilton's Eq. (3.6).
Proof See Appendix A.5.
In the Type III case, consider the function S(q T , p 0 ) given by the extremal value of the constrained action functional S over the family of curves (q(·), p(·), λ(·)) satisfying the boundary conditions q(T ) = q T and p(0) = p 0 : ( 3.12) The following theorem shows that S(q T , p 0 ) is a generating function for the flow of the continuous constrained Hamilton's equations:

Theorem 3.4 The exact time-T flow map of Hamilton's equations
is implicitly given by the following relations: In particular, S(q T , p 0 ) is a Type III generating function that generates the exact flow of the constrained Hamilton's Eq. (3.8).
Proof See Appendix A.6.

Discrete Constrained Variational Hamiltonian Mechanics
Let us now extend the results of Sect. 3 from Leok and Zhang (2011) to introduce a discrete formulation of variational Hamiltonian mechanics which includes holonomic constraints. Suppose we are given a partition 0 = t 0 < t 1 < . .
We formulate discrete constrained variational Hamiltonian mechanics in terms of the following discrete analogues of the constrained action functional S given by Eq. (3.5): (3.14) We can now derive discrete analogues of Theorems 3.1 and 3.2 relating discrete variational principles to discrete constrained Hamilton's equations, generalizing Lemma 3.1 from Leok and Zhang (2011): is equivalent to the discrete constrained right Hamilton's equations

Theorem 3.6 The Type III discrete Hamilton's phase space variational principle
is equivalent to the discrete constrained left Hamilton's equations  Schmitt et al. (2018) for Taylor variational integrators provided the Lagrangian is hyperregular, and in Leok and Zhang (2011) for generalized Galerkin variational integrators constructed using the same choices of basis functions and numerical quadrature formula provided the Hamiltonian is hyperregular.

Unconstrained Error Analysis
Theorem 2.3.1 of Marsden and West (2001) states that if a discrete Lagrangian, L d : , viewed as a one-step method defined implicitly from the discrete Euler-Lagrange equations or equivalently in terms of the implicit discrete Euler-Lagrange equations, which involve the corresponding discrete momenta via the discrete Legendre transforms, has order of accuracy r . Theorem 2.3.1 of Marsden and West (2001) has an analogue for Hamiltonian variational integrators. Theorem 2.2 in Schmitt and Leok (2017) , viewed as a one-step method defined implicitly by the discrete right Hamilton's equations is order r accurate. As mentioned in Schmitt and Leok (2017), the proof of Theorem 2.2 in Schmitt and Leok (2017) , viewed as a one-step method defined implicitly by the discrete left Hamilton's equations is order r accurate. Many other properties of the integrator, such as momentum conservation properties of the method, can be determined by analyzing the associated discrete Lagrangian or Hamiltonian, as opposed to analyzing the integrator directly. We will exploit these error analysis results to derive analogous results for the constrained versions discussed in Sects. 2 and 3.

Constrained Error Analysis
For the Lagrangian case, we can think of the Lagrange multipliers λ as extra position coordinates and define an augmented LagrangianL viā A corresponding augmented discrete Lagrangian is given bȳ and the discrete Euler-Lagrange Eq. (4.2) yield the discrete constrained Euler-Lagrange equations (4.12) Then, the discrete map (q k , p k , λ k ) → (q k+1 , p k+1 , λ k+1 ), viewed as a one-step method defined implicitly by the discrete constrained Euler-Lagrange equations, has order of accuracy r .
For the Hamiltonian case, we can think of the Lagrange multipliers λ as extra position coordinates and define conjugate momenta p, which are constants of motion since the time-derivative of λ does not appear anywhere, and are constrained to be zero. The augmented HamiltonianH , given bȳ (4.13) yields the following augmented left and right discrete Hamiltonians 15) and the discrete left and right Hamilton's equations yield the discrete constrained left Hamilton's equations (4.18) and the discrete constrained right Hamilton's equations (4.20) Then, the discrete map (q k , p k , λ k ) → (q k+1 , p k+1 , λ k+1 ), viewed as a one-step method defined implicitly by the discrete constrained right Hamilton's equations, has order of accuracy r . (4.21) Then, the discrete map (q k , p k , λ k ) → (q k+1 , p k+1 , λ k+1 ), viewed as a one-step method defined implicitly by the discrete constrained left Hamilton's equations, has order of accuracy r .
Definition 5.1 Suppose we have a Riemannian manifold Q with Riemannian metric g(·, ·) = ·, · , represented by the positive-definite symmetric matrix (g i j ) in local coordinates. Then, we define the musical isomorphism g : and its inverse musical isomorphism g : T * Q → T Q. The Riemannian metric g(·, ·) = ·, · induces a fiber metric g * (·, ·) = ·, · on T * Q via represented by the positive-definite symmetric matrix (g i j ) in local coordinates, which is the inverse of the Riemannian metric matrix (g i j ).
Definition 5.2 Denoting the differential of f by d f , the Riemannian gradient grad f (q) ∈ T q Q at a point q ∈ Q of a smooth function f : Q → R is the tangent vector at q such that This can also be expressed in terms of the inverse musical isomorphism, grad f (q) = g (d f (q)).

Definition 5.3
A geodesic in a Riemannian manifold Q is a parametrized curve γ : [0, 1] → Q which is of minimal local length, and is a generalization of the notion of straight line from Euclidean spaces to Riemannian manifolds. The other generalization of straight lines involves curves having zero "acceleration" or constant "speed," which requires the introduction of an affine connection. These two generalizations are equivalent if the Riemannian manifold is endowed with the Levi-Civita connection. Given two points q,q ∈ Q, a vector in T q Q can be transported to Tq Q along a geodesic γ by an operation q q : T q Q → Tq Q called the parallel transport along γ .

Definition 5.4 The Riemannian Exponential map
where γ v is the unique geodesic in Q such that γ v (0) = q and γ v (0) = v, for any v ∈ T q Q. Exp q is a diffeomorphism in some neighborhood U ⊂ T q Q containing 0, so we can define its inverse map, the Riemannian Logarithm map Log p : Exp q (U ) → T q Q.
Definition 5.5 A retraction on a manifold Q is a smooth mapping R : T Q → Q such that for any q ∈ Q, the restriction R q : the tangent map of R at 0 q ∈ T q Q and I T q Q is the identity map on T q Q.
The Riemannian Exponential map is a natural example of a retraction on a Riemannian manifold.
Definition 5.6 A subset A of a Riemannian manifold Q is called geodesically uniquely convex if every two points of A are connected by a unique geodesic in A. A function f : Q → R is called geodesically convex if for any two points q,q ∈ Q and a geodesic γ connecting them, Note that if f is a smooth geodesically convex function on a geodesically uniquely convex subset A, Note that a local minimum of a geodesically convex or α-WQC function is also a global minimum.
Definition 5.7 Given a Riemannian manifold Q with sectional curvature bounded below by K min , and an upper bound D for the diameter of the domain of consideration, define Note that ζ ≥ 1 since x coth x ≥ 1 for all real values of x.

Hamiltonian Approach
Our approach consists in integrating the Riemannian Bregman Hamiltonian systems derived in Duruisseaux and Leok (2022a) which evolve on the Riemannian manifold Q, via discrete constrained variational Hamiltonian integrators which enforce the numerical solution to lie on the Riemannian manifold Q. With ζ given by Eq. (5.1), we know from Duruisseaux and Leok (2022a) that if we let λ = ζ in the geodesically convex case, and λ = ζ /α in the geodesically α-weakly quasi-convex case, we obtain the Direct approach Riemannian p-Bregman Hamiltonian It is proven in Duruisseaux and Leok (2022a) that along the trajectories of the Riemannian p-Bregman dynamics, f (Q(t)) converges to its optimal value at a rate of O(1/t p ), under suitable assumptions on Q. (5.5)

Rayleigh Quotient Minimization on the Unit Sphere
An eigenvector v corresponding to the largest eigenvalue of a symmetric n × n matrix A maximizes the Rayleigh quotient v Av v v over R n . Thus, a unit eigenvector corresponding to the largest eigenvalue of the matrix A is a minimizer of the function f (v) = −v Av over the unit sphere Q = S n−1 , which can be thought of as a Riemannian submanifold with constant positive curvature K = 1 of R n endowed with the Riemannian metric inherited from the Euclidean inner product g v (u, w) = u w. Solving the Rayleigh quotient optimization problem efficiently is challenging when the given symmetric matrix A is ill-conditioned and high-dimensional. Note that an efficient algorithm that solves the above minimization problem can also be used to find eigenvectors corresponding to the smallest eigenvalue of A by using the fact that the eigenvalues of A are the negative of the eigenvalues of −A.

When endowed with the Riemannian metric g X (A, B) = Trace(A B), the Stiefel manifold
is a Riemannian submanifold of R n×m . The tangent space at any X ∈ St(m, n) is given by T X St(m, n) = {Z ∈ R n×m |X Z + Z X = 0}, and the orthogonal projection P X onto T X St(m, n) is given by P X Z = Z − 1 2 X (X Z + Z X ). A retraction on St(m, n) is given by R X (ξ ) = qf(X + ξ), where qf(A) denotes the Q factor of the QR factorization of the matrix A ∈ R n×m as A = Q R where Q ∈ St(m, n) and R is an upper triangular n × m matrix with strictly positive diagonal elements (Absil et al. 2008).
A generalized eigenvector problem consists of finding the m smallest eigenvalues of a n × n symmetric matrix A and corresponding eigenvectors. This problem can be formulated as a Riemannian optimization problem on the Stiefel manifold St(m, n) via the Brockett cost function where N = diag(μ 1 , . . . , μ m ) for arbitrary 0 ≤ μ 1 ≤ . . . ≤ μ m . The columns of a global minimizer of f are eigenvectors corresponding to the m smallest eigenvalues of A (see Absil et al. (2008)). If we definef : R n×m → R via X →f (X ) = Trace(X AX N ), then f is the restriction off to St(m, n) so grad f (X ) = P X gradf (X ), where gradf (X ) = 2 AX N . (5.8) The unbalanced orthogonal Procrustes problem consists of minimizing the function on the Stiefel manifold St(m, n), for given matrices A ∈ R l×n and B ∈ R l×m with l ≥ n and l > m, where · F is the Frobenius norm X 2 Note that the special case where n = m is the balanced orthogonal Procrustes problem. In this case, St(m, n) = O(n) so AX 2 F = A 2 F and minimizing the function f (X ) = AX − B 2 F is replaced by the problem of maximizing Trace(X A B) over X ∈ O(n). A solution is then given by X * = U V where B A = U V is the Singular Value Decomposition of B A with square orthogonal matrices U and V , and the solution is unique provided B A is nonsingular (see Eldén and Park (1999); Golub and Van Loan (2013)).

Hamiltonian Taylor Variational Integrators (HTVIs)
HTVIs were first introduced in Schmitt et al. (2018). A discrete approximate Hamiltonian is constructed by approximating the flow map and the trajectory associated with the boundary values using a Taylor method, and approximating the integral by a quadrature rule. The Hamiltonian Taylor variational integrator is then generated by the discrete Hamilton's equations. More explicitly, Type II HTVIs are constructed as follows: (i) Construct the r -order and (r + 1)-order Taylor methods (iii) Choose a quadrature rule of order s with weights and nodes given by (vi) Apply the quadrature rule to obtain the associated discrete right Hamiltonian  Schmitt and Leok (2017), the associated discrete Hamiltonian map has the same order of accuracy.
In this paper, we will use the Direct approach and Adaptive approach r = 0 Type II HTVIs constructed in  based on the Direct and Adaptive discrete right Hamiltonians (respectively) (5.12)

Euler-Lagrange Simple Discretization
In Duruisseaux and Leok (2022a), the p-Bregman Euler-Lagrange equations were rewritten as a first-order system of differential equations, for which a Riemannian version of a semi-implicit Euler scheme was applied to obtain the following algorithm: Algorithm 2: Semi-Implicit Euler Integration of the p-Bregman Euler-Lagrange Equations Version I of Algorithm 2 corresponds to the usual update for the semi-implicit Euler scheme, while Version II is inspired by the reformulation of Nesterov's method from Sutskever et al. (2013) that uses a corrected gradient ∇ f (X k + hb k V k ) instead of the traditional gradient ∇ f (X k ).

Riemannian Gradient Descent (RGD)
This is a generalization of Gradient Descent to the setting of Riemannian manifolds which involves the Riemannian gradient and a retraction.

Algorithm 3: Riemannian Gradient Descent (RGD)
Input: A function f : Q → R, a retraction R from T Q to Q, h > 0, and

Numerical Results
It is noted in Duruisseaux and Leok (2022a) that although higher values of p in Algorithm 2 result in provably faster rates of convergence, they also appear to be more prone to stability issues under numerical discretization, which can cause the numerical optimization algorithm to diverge. Numerical experiments in  showed that in the normed vector space setting, geometric discretizations which respect the time-rescaling invariance and symplecticity of the Bregman Lagrangian and Hamiltonian flows were substantially less prone to these stability issues, and were therefore more robust, reliable, and computationally efficient. This was one of the motivations to develop time-adaptive Hamiltonian variational integrators for the Bregman Hamiltonians. Numerical experiments were conducted for the Rayleigh quotient minimization problem on S n−1 , and for the generalized eigenvalue and Procrustes problems on the Stiefel manifold St(m, n).
The results from Fig. 1 show how the Hamiltonian Taylor variational integrators compare to the Euler-Lagrange discretizations from Duruisseaux and Leok (2022a) and the standard Riemannian gradient descent. Note that for certain instances of the Procrustes problem with certain initial values, all the algorithms converged to a local minimizer, and not the global minimizer, of the objective function. We can observe from Fig. 1 that for the same value of the timestep h, the Adaptive Hamiltonian variational integrator clearly outperforms its Direct counterpart, Riemannian gradient descent and the Euler-Lagrange discretizations in terms of number of iterations required. Furthermore, unlike the Euler-Lagrange discretizations (Algorithm 2) and the Riemannian gradient descent (Algorithm 3), the HTVI methods (Algorithm 1) do not require the use of retractions or parallel transports. Note that the Rayleigh minimization results indicate that the Euler-Lagrange discretizations suffer from stability issues leading to a loss of convergence, as the polynomially growing unbounded coefficient C p 2 (kh) p−2 is multiplied with grad f , so for this product to be bounded, the gradient has to decay to zero, but due to finite numerical precision, the gradient remains bounded away from zero, thereby causing the product to grow without bound. This issue can be resolved by adding a suitable upper bound to the coefficient C p 2 (kh) p−2 in the updates, as can be seen both for the Euler-Lagrange discretizations and Hamiltonian variational integrators for the problems on St(m, n).
However, the algorithms generated by these constrained Hamiltonian variational integrators are implicit, which can significantly increase the cost per iteration as the dimension of the problem becomes very large. In this case, it might be beneficial to consider other options using the unconstrained explicit Hamiltonian Taylor variational integrator, such as incorporating the constraints within the objective function as a penalty, although this might not constrain the solution trajectory to lie exactly on the manifold, or using projections if they can be computed efficiently and accurately for the Riemannian manifold of interest . Further, note that the implementation of the Hamiltonian variational integrators needs a very careful tuning of the various parameters at play, which may be challenging and thus also motivates the development of different methods.

Conclusion
Motivated by variational formulations of optimization problems on Riemannian manifolds, we first studied the relationship between the constrained Type I/II/III variational principles and the corresponding constrained Hamilton's or Euler-Lagrange equations both in continuous and discrete time, and derived variational error analysis results for the maps defined implicitly by the resulting discrete constrained equations. We then exploited these discrete constrained variational integrators and the variational formulation of accelerated optimization on Riemannian manifolds from Duruisseaux and Leok (2022a) to numerically solve the generalized eigenvalue and Procrustes problems on S n−1 and St(m, n). The numerical experiments conducted in this paper corroborated the observation made for the vector space setting in  that the adaptive Hamiltonian variational integrator is significantly more efficient than the direct Hamiltonian variational integrator, and that it can significantly outperform the Euler-Lagrange discretizations and Riemannian gradient descent, when its parameters are tuned carefully. Furthermore, it was noted that unlike the Euler-Lagrange discretizations and Riemannian gradient descent, the Hamiltonian algorithms did not require the use of retractions or parallel transports, which could be important when the problem considered lies on a Riemannian manifold for which it might not be possible to compute or approximate these objects efficiently.
We noted however that tuning the parameters of these discrete constrained variational integrators can be challenging, and also that the resulting algorithms are implicit, which may significantly increase the cost per iteration as the dimension of the problem becomes very large, in which case it might be beneficial to consider using the unconstrained explicit HTVIs with projections  or by incorporating the constraints within the objective function as a penalty. Moreover, although the Whitney and Nash embedding theorems (Whitney 1944a, b;Nash 1956) imply that there is no loss of generality when studying Riemannian manifolds only as submanifolds of Euclidean spaces, there are limitations to the constrained integration strategy based on embeddings presented in this paper, and an approach intrinsically defined on Riemannian manifolds would be desirable. Indeed, the embedding approach usually leads to higher-dimensional computations, and requires an effective way of constructing the embedding or a natural way of writing down equations that constrain the problem and the numerical solutions to the Riemannian manifold. Furthermore, most results in Riemannian geometry or results concerning specific Riemannian manifolds are proven from an intrinsic perspective because the embedding approach tends to flood intrinsic geometric properties of the manifold with superfluous information coming from the additional dimensions of the Euclidean space. This motivates the development of intrinsic methods that would exploit the symmetries and geometric properties of the manifold and of the problem at hand.
Developing an intrinsic extension of Hamiltonian variational integrators to manifolds will require some additional work, since the current approach involves Type II/III generating functions H + d (q k , p k+1 ), H − d ( p k , q k+1 ), which depend on the position at one boundary point, and the momentum at the other boundary point. However, this does not make intrinsic sense on a manifold, since one needs the base point in order to specify the corresponding cotangent space, and one should ideally consider a Hamiltonian variational integrator construction based on discrete Dirac mechanics (Leok and Ohsawa 2011), which would yield a generating function E + d (q k , q k+1 , p k+1 ), E − d (q k , p k , q k+1 ), that depends on the position at both boundary points and the momentum at one of the boundary points. This approach can be viewed as a discretization of the generalized energy E (q, v, p) = p, v − L(q, v), in contrast to the Hamiltonian H (q, p) On the other hand, the formulation of Lagrangian variational integrators presented in the introduction of Sect. 2 makes sense intrinsically on manifolds, and a framework for variable time-stepping in Lagrangian variational integration was introduced in our subsequent work (Duruisseaux and Leok 2022b) to design intrinsic time-adaptive Lagrangian variational integrators for accelerated optimization on Riemannian manifolds.
It would also be interesting to extend the proposed approach to the problem of optimization with nonintegrable constraints, which naturally leads to the question of whether vakonomic or nonholonomic mechanics is the appropriate description (Cortés et al. 2002). In the context of optimization with nonintegrable constraints, the relevant extension will likely involve vakonomic variational integrators (Benito and Martín de Diego 2005;Jiménez and Martín de Diego 2012). However, it would be interesting to relate the methods introduced in this paper to the existing work on variational integrators applied to optimal control problems (Junge et al. 2005;de León et al. 2007), and the discrete optimal control of nonholonomic dynamical systems would likely require a combination of the methods described here and nonholonomic integrators (Cortés and Martínez 2001;de León et al. 2004;Fedorov and Zenkov 2005;McLachlan and Perlmutter 2006).

Data Availability Statement
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Proof Using integration by parts and simplifying gives
By Theorem 2.1, the extremum of the action functional S is achieved when (q, λ) satisfies the constrained Euler-Lagrange Eq. (2.6), so we get D 1 S(q 0 , q T ) = − ∂ L ∂q (q 0 ,q(0)) and D 2 S(q 0 , q T ) = ∂ L ∂q (q T ,q(T )).

A.7. Proof of Theorem 2.3
Theorem A.7 The Type I discrete Hamilton's variational principles are equivalent to the discrete constrained Euler-Lagrange equations where L d (q k , q k+1 ) is defined via Eq. (2.11).
Proof Using the fact that δq 0 = 0 and δq N = 0, we have If the discrete constrained Euler-Lagrange Eq. (A.12) are satisfied, then each term vanishes and δS ± d = 0. Conversely, if δS ± d = 0, then a discrete fundamental theorem of the calculus of variations yields the discrete constrained Euler-Lagrange Eq. (A.12).

A.9. Proof of Theorem 3.6
Theorem A.9 The Type III discrete Hamilton's phase space variational principle is equivalent to the discrete constrained left Hamilton's equations where H − d (q k+1 , p k ) is defined via Eq. (3.17).
Proof Using the fact that δq N = 0 and δ p 0 = 0 since (q N , p 0 ) is fixed, we obtain the following expression for the variations of S − d : If the discrete constrained left Hamilton's Eq. (A.16) are satisfied, then each term vanishes and δS − d = 0. Conversely, if δS − d = 0, then a discrete fundamental theorem of the calculus of variations yields the discrete constrained left Hamilton's Eq. (A.16).