A Grassmann Manifold Handbook: Basic Geometry and Computational Aspects

The Grassmann manifold of linear subspaces is important for the mathematical modelling of a multitude of applications, ranging from problems in machine learning, computer vision and image processing to low-rank matrix optimization problems, dynamic low-rank decompositions and model reduction. With this mostly expository work, we aim to provide a collection of the essential facts and formulae on the geometry of the Grassmann manifold in a fashion that is fit for tackling the aforementioned problems with matrix-based algorithms. Moreover, we expose the Grassmann geometry both from the approach of representing subspaces with orthogonal projectors and when viewed as a quotient space of the orthogonal group, where subspaces are identified as equivalence classes of (orthogonal) bases. This bridges the associated research tracks and allows for an easy transition between these two approaches. Original contributions include a modified algorithm for computing the Riemannian logarithm map on the Grassmannian that is advantageous numerically but also allows for a more elementary, yet more complete description of the cut locus and the conjugate points. We also derive a formula for parallel transport along geodesics in the orthogonal projector perspective, formulae for the derivative of the exponential map, as well as a formula for Jacobi fields vanishing at one point.

In this work, we approach the Grassmannian from a matrix-analytic perspective. The focus is on the computational aspects as well as on geometric concepts that directly or indirectly feature in matrix-based algorithmic applications. The most prominent approaches of representing points on Grassmann manifolds with matrices in computational algorithms are
• the basis perspective: A subspace U ∈ Gr(n, p) is identified with a (non-unique) matrix U ∈ R^{n×p} whose columns form a basis of U. In this way, a subspace is identified with the equivalence class of all rank-p matrices whose columns span U. For an overview of this approach, see for example the survey [1]. A brief introduction is given in [27].
• the ONB perspective: In analogy to the basis perspective above, a subspace U may be identified with the equivalence class of matrices whose columns form an orthonormal basis (ONB) of U. This is often advantageous in numerical computations. This approach is surveyed in [17].
• the projector perspective: A subspace U ∈ Gr(n, p) is identified with the (unique) orthogonal projector P ∈ R^{n×n} onto U, which in turn is uniquely represented by P = U U^T, with U from the ONB perspective above. For an approach without explicit matrices, see [38]; for the approach with matrices, see for example [26, 6, 27].
• the Lie group perspective: A subspace U ∈ Gr(n, p) is identified with an equivalence class of orthogonal n×n matrices. This perspective is for example taken in [20, 50].
These approaches are closely related and all of them rely on Lie group theory to some extent. Yet, the research literature on the basis/ONB perspective and the projector perspective is rather disjoint. The recent preprint [33] proposes yet another perspective, namely representing p-dimensional subspaces as symmetric orthogonal matrices of trace 2p − n. This approach corresponds to a scaling and translation of the projector matrices in the vector space of symmetric matrices, hence it yields very similar formulae.
Raison d'être and original contributions. We treat the Lie group approach, the ONB perspective and the projector perspective simultaneously. This may serve as a bridge between the corresponding research tracks. Moreover, we collect the essential facts and concepts that feature as generic tools and building blocks in Riemannian computing problems on the Grassmann manifold in terms of matrix formulae, fit for algorithmic calculations. This includes, among others, the Grassmannian's quotient space structure, the Riemannian metric and distance, the Riemannian connection, the Riemannian exponential and its inverse, the Riemannian logarithm, as well as the associated Riemannian normal coordinates, parallel transport of tangent vectors and the sectional curvature. Wherever possible, we provide self-contained and elementary derivations of the sometimes classical results. Here, the term elementary is to be understood as "via tools from linear algebra and matrix analysis" rather than "via tools from abstract differential geometry". Care has been taken that the quantities that are most relevant for algorithmic applications are stated in a form that allows calculations that scale in O(np^2).
As novel research results, we provide a modified algorithm for computing the Riemannian logarithm map on the Grassmannian that has favorable numerical features and additionally makes it possible to (non-uniquely) map points from the cut locus of a point to its tangent space. Therefore, any set of points on the Grassmannian can be mapped to a single tangent space. In particular, we give explicit formulae for the (possibly multiple) shortest curves between any two points on the Grassmannian, as well as the corresponding tangent vectors. Furthermore, we present a more elementary, yet more complete description of the conjugate locus of a point on the Grassmannian, which is derived in terms of principal angles between subspaces. We also derive a formula for parallel transport along geodesics in the orthogonal projector perspective, formulae for the derivative of the exponential map, as well as a formula for Jacobi fields vanishing at one point.
Organization. Section 2 introduces the quotient space structure of the Grassmann manifold and provides basic formulae for representing Grassmann points and tangent vectors via matrices. Section 3 recaps the essential Riemann-geometric aspects of the Grassmann manifold, including the Riemannian exponential, its derivative and parallel transport. In Section 4, the Grassmannian's symmetric space structure is established by elementary means and used to explore the sectional curvature and its bounds. In Section 5, the (tangent) cut locus is described and a new algorithm is proposed to calculate the pre-image of the exponential map, i.e., the Riemannian logarithm, where the pre-image is unique. Section 6 addresses normal coordinates and local parameterizations for the Grassmannian. In Section 7, questions on Jacobi fields and the conjugate locus of a point are considered. Section 8 concludes the paper.
2. The Quotient Structure of the Grassmann Manifold. In this section, we recap the definition of the Grassmann manifold and connect results from [17, 6, 38, 26]. Tools from Lie group theory establish the quotient space structure of the Grassmannian, which gives rise to efficient representations. The required Lie group background can be found in the appendix and in [23, 34].
The Grassmann manifold (also called Grassmannian) is defined as the set of all p-dimensional subspaces of the Euclidean space R^n, i.e.,
Gr(n, p) := {U ⊂ R^n | U is a subspace, dim(U) = p}.
With a slight abuse of notation, this set can be identified with the set of orthogonal rank-p projectors,
(2.1) Gr(n, p) = {P ∈ R^{n×n} | P^T = P, P^2 = P, rank(P) = p},
as is for example done in [26, 6]. Note that a projector P is symmetric as a matrix (namely, P^T = P) if and only if it is orthogonal as a projection operation (its range and null space are mutually orthogonal) [47, §3]. The identification in (2.1) associates P with the subspace U := range(P). Every P ∈ Gr(n, p) can in turn be identified with an equivalence class of orthonormal basis matrices spanning the same subspace; an approach that is for example chosen in [17]. These ONB matrices are elements of the so-called Stiefel manifold
St(n, p) := {U ∈ R^{n×p} | U^T U = I_p}.
The link between these two sets is via the projection π_SG : St(n, p) → Gr(n, p), U ↦ U U^T.
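The correspondence between ONB representatives and projectors is easy to check numerically. The following minimal numpy sketch (variable names are ours, not from the references) verifies the defining properties in (2.1) and the invariance of U U^T under orthogonal coordinate changes:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 6, 2

# ONB representative U of a random p-dimensional subspace of R^n
U = np.linalg.qr(rng.standard_normal((n, p)))[0]

# pi_SG: St(n,p) -> Gr(n,p), U -> U U^T
P = U @ U.T

# P satisfies the defining properties (2.1) of an orthogonal rank-p projector
assert np.allclose(P, P.T)
assert np.allclose(P @ P, P)
assert np.linalg.matrix_rank(P) == p

# any rotated ONB matrix U R (R in O(p)) yields the same projector
R = np.linalg.qr(rng.standard_normal((p, p)))[0]
assert np.allclose((U @ R) @ (U @ R).T, P)
```

This illustrates why the projector is a unique representative while the ONB matrix is only unique up to a rotation R ∈ O(p).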
To obtain a manifold structure on Gr(n, p) and St(n, p), we recognize these matrix sets as quotients of the orthogonal group O(n), which is a compact Lie group. For a brief introduction to Lie groups and their actions, see Appendix A. The link from O(n) to St(n, p) and Gr(n, p) is given by the projections
(2.2) π_OS : O(n) → St(n, p), Q ↦ Q(:, 1:p), and
(2.3) π_OG : O(n) → Gr(n, p), Q ↦ Q(:, 1:p) Q(:, 1:p)^T,
respectively, where Q(:, 1:p) is the matrix formed by the p first columns of Q. We can consider the following hierarchy of quotient structures: Two square orthogonal matrices Q, Q̃ ∈ O(n) determine the same rectangular, column-orthonormal matrix U ∈ R^{n×p} if both Q and Q̃ feature U as their first p columns. Two column-orthonormal matrices U, Ũ ∈ R^{n×p} determine the same subspace if they differ by an orthogonal coordinate change. This hierarchy is visualized in Figure 2.1. In anticipation of the upcoming discussion, the figure already indicates the lifting of tangent vectors according to the quotient hierarchy.
2.1. The Manifold Structure of the Grassmannian. In order to formally introduce the submanifold structure of the Grassmannian and various expressions for tangent vectors, we start with the orthogonal group O(n), which is the domain of π_OS and π_OG. The Lie algebra of O(n) is the set of skew-symmetric matrices
so(n) = {Ω ∈ R^{n×n} | Ω^T = −Ω}.

Fig. 2.1. Conceptual visualization of the quotient structure of the Grassmann manifold. The double brackets [[•]] denote an equivalence class with respect to π_OG = π_SG ∘ π_OS, while the single brackets [•] denote an equivalence class with respect to π_OS or π_SG, depending on the element inside the brackets. The tangent vectors along an equivalence class for a projection are vertical with respect to that projection, while the directions orthogonal to the vertical space are horizontal. Correspondingly, the horizontal lift of a tangent vector ∆ ∈ T_P Gr(n, p) to T_U St(n, p) or T_Q O(n) is orthogonal to all vertical tangent vectors at that point. With respect to the projection π_SG from the Stiefel to the Grassmann manifold, the green tangent vector ∆^hor_U is horizontal and the magenta tangent vector (along the equivalence class) is vertical. On the other hand, the magenta tangent vectors in O(n) (pointing to the left) are horizontal with respect to π_OS but vertical with respect to π_OG.
The tangent space at an arbitrary Q ∈ O(n) is therefore given by the left translates
T_Q O(n) = {QΩ | Ω ∈ so(n)}.
Restricting the Euclidean matrix space metric ⟨A, B⟩_0 = tr(A^T B) to the tangent spaces turns the manifold O(n) into a Riemannian manifold. We include a factor of 1/2 to obtain metrics on the Stiefel and Grassmann manifold, in Subsections 2.2 and 3.1, respectively, that comply with common conventions. This yields the Riemannian metric (termed here metric for short)
g^O_Q(QΩ_1, QΩ_2) := (1/2) tr(Ω_1^T Ω_2).
In order to obtain a smooth manifold structure on the set of orthogonal projectors Gr(n, p), define an isometric group action of O(n) on the symmetric n × n matrices Sym_n by
Φ : O(n) × Sym_n → Sym_n, (Q, S) ↦ QSQ^T,
and consider P_0 := [ I_p 0 ; 0 0 ] ∈ Gr(n, p), which is the matrix representation of the canonical projection onto the first p coordinates with respect to the Cartesian standard basis. The set of orthogonal projectors Gr(n, p) is the orbit Φ(O(n), P_0) of the element P_0 under the group action Φ: Any matrix QP_0Q^T obviously satisfies the defining properties of Gr(n, p) as stated in (2.1). Conversely, if P ∈ Gr(n, p), then P is real, symmetric and positive semidefinite with p eigenvalues equal to one and n − p eigenvalues equal to zero. Hence, the eigenvalue decomposition (EVD) P = QΛQ^T = QP_0Q^T establishes P as a point in the orbit of P_0. In other words, we have confirmed that
π_OG : O(n) → Gr(n, p), Q ↦ QP_0Q^T
maps into Gr(n, p) and is surjective. The stabilizer of Φ at P_0 is given by the subgroup of block-diagonal orthogonal matrices, which can be identified with O(p) × O(n − p). The projection π_OG is a smooth submersion, i.e., a smooth map with surjective differential at every point. Proposition A.2 in the appendix shows that Gr(n, p)

The differential of the projection π_OG at Q ∈ O(n) is given by
d(π_OG)_Q(QΩ) = QΩP_0Q^T + QP_0Ω^TQ^T = Q[Ω, P_0]Q^T, Ω ∈ so(n).
Since π_OG is a submersion, this spans the entire tangent space, i.e.,
T_{π_OG(Q)} Gr(n, p) = {Q[Ω, P_0]Q^T | Ω ∈ so(n)}.
In combination with the metric g^O_Q, the smooth submersion π_OG allows us to decompose every tangent space T_Q O(n) into a vertical and a horizontal part, cf. [35, Chapter 2]. The vertical part is the kernel of the differential d(π_OG)_Q, and the horizontal part is the orthogonal complement with respect to the metric g^O_Q. We therefore have
T_Q O(n) = Ver_Q O(n) ⊕ Hor_Q O(n),
where
Ver_Q O(n) = {Q [ A 0 ; 0 C ] | A ∈ so(p), C ∈ so(n − p)} and Hor_Q O(n) = {Q [ 0 −B^T ; B 0 ] | B ∈ R^{(n−p)×p}}.
Writing P = QP_0Q^T and making use of (2.5) shows that every ∆ ∈ T_P Gr(n, p) is of the form
∆ = Q [ 0 B^T ; B 0 ] Q^T, B ∈ R^{(n−p)×p}.
Note that for ∆ ∈ T_P Gr(n, p), there is Ω ∈ so_P(n) := {Q [ 0 −B^T ; B 0 ] Q^T | B ∈ R^{(n−p)×p}} such that ∆ = [Ω, P]. This Ω can be calculated via Ω = [∆, P] ∈ so_P(n).
Proposition 2.1 (Tangent vector characterization). Let P ∈ Gr(n, p) be the orthogonal projector onto the subspace U. For every symmetric ∆ = ∆^T ∈ R^{n×n}, the following conditions are equivalent:
a) ∆ ∈ T_P Gr(n, p);
b) ∆(U) ⊆ U^⊥ and ∆(U^⊥) ⊆ U;
c) ∆P + P∆ = ∆;
d) ∆ = [Ω, P],
where Ω := [∆, P] ∈ so_P(n). Here, ∆(U) := {∆x ∈ R^n | x ∈ U} and the orthogonal complement U^⊥ is taken with respect to the Euclidean metric in R^n.
Proof. The equivalence of a), b) and c) is from [38, Result 3.7]. To show that c) implies d), note that ∆P + P∆ = ∆ implies P∆P = 0 and therefore [[∆, P], P] = ∆P + P∆ − 2P∆P = ∆. On the other hand, if d) holds, then ∆ = ∆P + P∆ − 2P∆P, which also implies P∆P = 0 by multiplication with P from one side. Inserting P∆P = 0 into the equation shows that c) holds. The statement that Ω ∈ so_P(n) is automatically true.
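The characterization can be illustrated numerically. The following numpy sketch (our construction of a generic tangent vector, not taken from the references) checks conditions c) and d) and the skew-symmetry of Ω = [∆, P]:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 6, 2

QQ = np.linalg.qr(rng.standard_normal((n, n)))[0]
U, U_perp = QQ[:, :p], QQ[:, p:]
P = U @ U.T

# a tangent vector at P: Delta = U_perp B U^T + U B^T U_perp^T (symmetric)
B = rng.standard_normal((n - p, p))
Delta = U_perp @ B @ U.T + U @ B.T @ U_perp.T

comm = lambda X, Y: X @ Y - Y @ X   # matrix commutator [X, Y]
Omega = comm(Delta, P)

# condition c): Delta P + P Delta = Delta (equivalently P Delta P = 0)
assert np.allclose(Delta @ P + P @ Delta, Delta)
# condition d): [[Delta, P], P] = Delta
assert np.allclose(comm(Omega, P), Delta)
# Omega = [Delta, P] is skew-symmetric
assert np.allclose(Omega, -Omega.T)
```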

2.2. Horizontal Lift to the Stiefel Manifold. The elements of Gr(n, p) are n × n matrices (see the bottom level of Figure 2.1). The map π_OG makes it possible to (non-uniquely) represent elements of Gr(n, p) as elements of O(n) (the top level of Figure 2.1), which are also n × n matrices. In practical computations, however, it is often not feasible to work with n × n matrices, especially if n is large when compared to the subspace dimension p. A remedy is to resort to the middle level of Figure 2.1: the Stiefel manifold St(n, p) is an embedded submanifold of R^{n×p}, and the projection from the orthogonal group onto the Stiefel manifold is given by π_OS from (2.2). The map π_OS is a smooth submersion, which allows us to decompose every tangent space T_Q O(n) into a vertical and a horizontal part with respect to the metric g^O_Q. We therefore have
T_Q O(n) = Ver^OS_Q O(n) ⊕ Hor^OS_Q O(n),
where
Ver^OS_Q O(n) = {Q [ 0 0 ; 0 C ] | C ∈ so(n − p)} and Hor^OS_Q O(n) = {Q [ A −B^T ; B 0 ] | A ∈ so(p), B ∈ R^{(n−p)×p}}.
By the identification T_U St(n, p) ≅ Hor^OS_Q O(n), see [17], and an orthogonal completion U_⊥ ∈ R^{n×(n−p)} of U, i.e., such that (U U_⊥) ∈ O(n), the tangent spaces of the Stiefel manifold are explicitly given by either of the following expressions:
(2.9) T_U St(n, p) = {UA + U_⊥B | A ∈ so(p), B ∈ R^{(n−p)×p}} = {UA + (I_n − UU^T)T | A ∈ so(p), T ∈ R^{n×p}}.
Note that U^T U_⊥ = 0 and U_⊥^T U_⊥ = I_{n−p}, as well as U_⊥U_⊥^T = I_n − UU^T. Because different matrices T, T̃ may have the same orthogonal projection onto span(U_⊥), (I_n − UU^T)T = (I_n − UU^T)T̃, only the first representation in (2.9) has a number of degrees of freedom (dofs), namely (1/2)p(p − 1) (dofs of A ∈ so(p)) plus (n − p)p (dofs of B), equal to the dimension of St(n, p) and consequently to that of T_U St(n, p).
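As a numerical illustration of the representations in (2.9), the following numpy sketch (variable names are ours) builds a Stiefel tangent vector and checks the defining condition that U^T D is skew-symmetric:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 6, 2
QQ = np.linalg.qr(rng.standard_normal((n, n)))[0]
U, U_perp = QQ[:, :p], QQ[:, p:]

A0 = rng.standard_normal((p, p))
A = A0 - A0.T                       # A in so(p)
B = rng.standard_normal((n - p, p))
D = U @ A + U_perp @ B              # first representation in (2.9)

# D is a Stiefel tangent vector: U^T D + D^T U = 0
assert np.allclose(U.T @ D + D.T @ U, np.zeros((p, p)))

# second representation: D = U A + (I - U U^T) T, here with the choice T = U_perp B
T = U_perp @ B
assert np.allclose(U @ A + (np.eye(n) - U @ U.T) @ T, D)
```

Note that T is only determined up to its projection onto span(U_⊥), in line with the dof count above.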
The canonical metric g^St_U(•, •) on the Stiefel manifold is given via the horizontal lift. That means that for any two tangent vectors D_1, D_2 ∈ T_U St(n, p), we take horizontal lifts D^hor_{1,Q}, D^hor_{2,Q} ∈ Hor^OS_Q O(n) at some Q ∈ (π_OS)^{−1}(U). The inner product between D^hor_{1,Q} and D^hor_{2,Q} is now computed according to the metric of O(n). In practice, this leads to
g^St_U(D_1, D_2) = g^O_Q(D^hor_{1,Q}, D^hor_{2,Q}) = tr(D_1^T (I_n − (1/2) UU^T) D_2),
cf. [17]. The last equality shows that it does not matter which base point Q ∈ (π_OS)^{−1}(U) is chosen for the lift.
In order to make the transition from column-orthonormal matrices U to the associated subspaces U = span(U), another equivalence relation, this time on the Stiefel manifold, is required: Identify any matrices U ∈ St(n, p) whose column vectors span the same subspace U. For any two Stiefel matrices U, Ũ that span the same subspace, it holds that Ũ = U U^T Ũ with U^T Ũ ∈ O(p). As a consequence, the equivalence class
(2.10) [U] := {UR | R ∈ O(p)}
under this group action can be identified with a projector UU^T and vice versa. Therefore, according to [34], the projection π_SG : St(n, p) → Gr(n, p), U ↦ UU^T is surjective. By making use of (2.8), every tangent vector ∆ ∈ T_{π_SG(U)} Gr(n, p) can be written as
(2.12) ∆ = D U^T + U D^T for some D ∈ R^{n×p} with U^T D = 0.
Again, we split every tangent space T_U St(n, p) with respect to the projection π_SG and the metric g^St_U(•, •) on the Stiefel manifold. Defining the kernel of d(π_SG)_U as the vertical space and its orthogonal complement (with respect to the metric g^St_U) as the horizontal space leads to the direct sum decomposition
T_U St(n, p) = Ver_U St(n, p) ⊕ Hor_U St(n, p),
where
(2.13) Ver_U St(n, p) = {UA | A ∈ so(p)} and Hor_U St(n, p) = {U_⊥B | B ∈ R^{(n−p)×p}} = {D ∈ T_U St(n, p) | U^T D = 0}.
Since π_SG is the only projection that we use on the Stiefel manifold, the dependence of the splitting on the projection is omitted in the notation.
The tangent space T_P Gr(n, p) of the Grassmannian can be identified with the horizontal space Hor_U St(n, p). Therefore, for every tangent vector ∆ ∈ T_P Gr(n, p), there is a unique ∆^hor_U ∈ Hor_U St(n, p), called the horizontal lift of ∆ to U. By (2.13), there are matrices T ∈ R^{n×p} and B ∈ R^{(n−p)×p} such that
∆^hor_U = U_⊥B = (I_n − UU^T)T.
Note that ∆^hor_U depends only on the chosen representative U of U, while B depends on the chosen orthogonal completion U_⊥ as well.
Multiplication of (2.12) from the right with U shows that the horizontal lift of ∆ ∈ T_P Gr(n, p) to U ∈ St(n, p) can be calculated by
(2.14) ∆^hor_U = ∆U.
In conclusion, the Grassmann manifold is placed at the end of the following quotient space hierarchy with equivalence classes [•] from (2.10) and [[•]] from (2.4):
O(n) → St(n, p) ≅ O(n)/O(n − p) → Gr(n, p) ≅ St(n, p)/O(p) ≅ O(n)/(O(p) × O(n − p)).
Remark 2.2. It should be noted that there is yet another way of viewing the Grassmann manifold as a quotient. Instead of taking equivalence classes in O(n), one can take the quotient of the noncompact Stiefel manifold by the general linear group GL(p). This introduces a factor of the form (Y^T Y)^{−1} into many formulae, where Y ∈ R^{n×p} is a rank-p matrix with (not necessarily orthogonal) column vectors spanning the desired subspace. For this approach see for example [1].
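The lift formula (2.14) can be checked numerically. A minimal numpy sketch (construction of ∆ as in the tangent vector characterization above, names ours):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 6, 2
QQ = np.linalg.qr(rng.standard_normal((n, n)))[0]
U, U_perp = QQ[:, :p], QQ[:, p:]
P = U @ U.T

B = rng.standard_normal((n - p, p))
Delta = U_perp @ B @ U.T + U @ B.T @ U_perp.T   # tangent vector at P

Delta_hor = Delta @ U                            # horizontal lift (2.14)

# the lift is horizontal: no component in the vertical space {U A | A in so(p)}
assert np.allclose(U.T @ Delta_hor, np.zeros((p, p)))
# it matches the representation Delta_hor = U_perp B ...
assert np.allclose(Delta_hor, U_perp @ B)
# ... and recovers Delta via Delta = Delta_hor U^T + U Delta_hor^T
assert np.allclose(Delta_hor @ U.T + U @ Delta_hor.T, Delta)
```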
3. Riemannian Structure. In this section, we study the basic Riemannian structure of the Grassmannian. We introduce the canonical metric coming from the quotient structure, which coincides with the Euclidean metric, and the Riemannian connection. The Riemannian exponential mapping for geodesics is derived in the formulation as projectors as well as with Stiefel representatives. Lastly, we study the concept of parallel transport on the Grassmannian. Many of those results have been studied before for the projector or the ONB perspective. For the metric and the exponential, see for example [17, 1] (Stiefel perspective) and [6] (projector perspective). For the horizontal lift of the Riemannian connection, see [1]. A formula for parallel transport in the ONB perspective was given in [17]. Here we combine the approaches and provide some modifications and additions. We derive formulae for all mentioned concepts in both perspectives and also study the derivative of the exponential mapping.

3.1. Riemannian Metric.
The Riemannian metric on the Grassmann manifold that is induced by the quotient structure coincides with (one half times) the Euclidean metric. To see this, let ∆_1, ∆_2 ∈ T_P Gr(n, p) be two tangent vectors at P ∈ Gr(n, p) and let Q = (U U_⊥) ∈ O(n) be such that π_OG(Q) = P. The metric on the Grassmann manifold is then inherited from the metric on O(n) applied to the horizontal lifts, i.e.,
(3.1) g^Gr_P(∆_1, ∆_2) := g^O_Q(∆^hor_{1,Q}, ∆^hor_{2,Q}),
with ∆^hor_{i,U} = U_⊥B_i. We immediately see that
g^Gr_P(∆_1, ∆_2) = tr(B_1^T B_2) = tr((∆^hor_{1,U})^T ∆^hor_{2,U}) = (1/2) tr(∆_1 ∆_2).
The last equality can be seen by noticing ∆_i = U_⊥B_iU^T + UB_i^TU_⊥^T. The metric does not depend on the point to which we lift: Lifting to a different UR ∈ St(n, p) results in a postmultiplication of ∆^hor_{i,U} with R according to (2.15). By the invariance properties of the trace, this does not change the metric. An analogous argument holds for the lift to O(n).
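The equality between the ONB expression and the projector expression of the metric can be verified numerically. A small numpy sketch (names ours):

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 7, 3
QQ = np.linalg.qr(rng.standard_normal((n, n)))[0]
U, U_perp = QQ[:, :p], QQ[:, p:]

def tangent(B):
    # symmetric tangent vector at P = U U^T built from B
    return U_perp @ B @ U.T + U @ B.T @ U_perp.T

D1 = tangent(rng.standard_normal((n - p, p)))
D2 = tangent(rng.standard_normal((n - p, p)))

g_onb  = np.trace((D1 @ U).T @ (D2 @ U))   # metric via horizontal lifts (2.14)
g_proj = 0.5 * np.trace(D1 @ D2)           # one half times the Euclidean metric

assert np.isclose(g_onb, g_proj)
```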
With the Riemannian metric, we can define the induced norm of a tangent vector ∆ ∈ T_P Gr(n, p) by ‖∆‖ := (g^Gr_P(∆, ∆))^{1/2}.

3.2. Riemannian Connection.
The disjoint collection of all tangent spaces of a manifold M is called the tangent bundle TM = ∪_{p∈M} T_pM. A smooth vector field on M is a smooth map X from M to TM that maps a point p ∈ M to a tangent vector X(p) ∈ T_pM. The set of all smooth vector fields on M is denoted by X(M). Plugging smooth vector fields Y, Z ∈ X(M) into the metric of a Riemannian manifold gives a smooth function g(Y, Z) : M → R. It is not possible to calculate the differential of a vector field in the classical sense, since every tangent space is a separate vector space and the addition of X(p) ∈ T_pM and X(q) ∈ T_qM is not defined for p ≠ q. To this end, the abstract machinery of differential geometry provides special tools called connections. A connection acts as the derivative of a vector field in the direction of another vector field. On a Riemannian manifold (M, g), the Riemannian or Levi-Civita connection is the unique connection ∇ : X(M) × X(M) → X(M), (X, Y) ↦ ∇_X Y, that is
• symmetric (torsion-free): ∇_X Y − ∇_Y X = [X, Y] for all X, Y ∈ X(M), and
• compatible with the metric: for all vector fields X, Y, Z ∈ X(M), we have the product rule X(g(Y, Z)) = g(∇_X Y, Z) + g(Y, ∇_X Z),
where [X, Y] = X(Y) − Y(X) denotes the Lie bracket of two vector fields. The Riemannian connection can be explicitly calculated in the case of embedded submanifolds: It is the projection of the Levi-Civita connection of the ambient manifold onto the tangent space of the embedded submanifold. For details see for example [35].
The Euclidean space R^{n×p} is a vector space, which implies that every tangent space of R^{n×p} can be identified with R^{n×p} itself. Therefore, the Riemannian connection of the Euclidean space R^{n×p} with the Euclidean metric (B.1) is the usual directional derivative: For a smooth vector field F on R^{n×p} and a direction Y ∈ R^{n×p}, we have ∇_Y F = dF(Y). The same holds for the space of symmetric matrices Sym_n. When considered as the set of orthogonal projectors, the Grassmann manifold Gr(n, p) is an embedded submanifold of Sym_n. In this case, the projection onto the tangent space is
Π_{T_P Gr} : Sym_n → T_P Gr(n, p), S ↦ [[S, P], P] = SP + PS − 2PSP,
see also [38]. In order to restrict calculations to n × p matrices, we can lift to the Stiefel manifold and use the projection onto the horizontal space, which is
Π^hor_U : R^{n×p} → Hor_U St(n, p), T ↦ (I_n − UU^T)T,
see also [17, 1]. Note that Hor_U St(n, p) ≅ T_{π_SG(U)} Gr(n, p) as described in Subsection 2.2. The Riemannian connection on Gr(n, p) is now obtained via the following proposition.
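Both projections admit one-line implementations. A numpy sketch (helper names ours) that checks idempotency of the tangent space projection and horizontality of the lifted projection:

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 6, 2
U = np.linalg.qr(rng.standard_normal((n, p)))[0]
P = U @ U.T

def proj_tangent(S):
    # projection of a symmetric matrix onto T_P Gr(n,p): S -> [[S,P],P]
    return S @ P + P @ S - 2.0 * P @ S @ P

S0 = rng.standard_normal((n, n))
S = S0 + S0.T                      # a generic symmetric matrix

D = proj_tangent(S)
# the image lies in the tangent space, so projecting again changes nothing
assert np.allclose(proj_tangent(D), D)

# horizontal-space projection for n x p matrices: T -> (I - U U^T) T
T = rng.standard_normal((n, p))
H = (np.eye(n) - P) @ T
assert np.allclose(U.T @ H, np.zeros((p, p)))   # H is horizontal
```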
Proposition 3.1 (Riemannian Connection). Let X ∈ X(Gr(n, p)) be a smooth vector field on Gr(n, p), i.e., X(P) ∈ T_P Gr(n, p), with a smooth extension to an open set in the symmetric n × n matrices, again denoted by X. Let Y ∈ T_P Gr(n, p). The Riemannian connection on Gr(n, p) is then given by
(3.5) ∇_Y X = Π_{T_P Gr}(dX_P(Y)).
It can also be calculated via the horizontal lift,
(3.6) (∇_Y X)^hor_U = Π^hor_U(d(X^hor)_U(Y^hor_U)),
where X^hor is the vector field U ↦ X(π_SG(U))^hor_U.
Proof. Equation (3.5) follows directly from the preceding discussion. It can be checked that (3.6) is the horizontal lift of (3.5). Alternatively, (3.6) can be deduced from [43, Lemma 7.45] by noticing that the horizontal space of the Stiefel manifold is the same for the Euclidean and the canonical metric. Furthermore, (3.6) coincides with [1, Theorem 3.4].

3.3. Gradient.
The gradient of a real-valued function on the Grassmannian for the canonical metric was computed in [17] for the Grassmannian with Stiefel representatives, in [26] for the projector perspective and in [1] for the Grassmannian as a quotient of the noncompact Stiefel manifold. For the sake of completeness, we introduce it here as well. The gradient is dual to the differential of a function in the following sense: For a function f : Gr(n, p) → R, the gradient at P is defined as the unique tangent vector (grad f)_P ∈ T_P Gr(n, p) fulfilling df_P(∆) = g^Gr_P((grad f)_P, ∆) for all ∆ ∈ T_P Gr(n, p), where df_P denotes the differential of f at P. It is well known that the gradient for the induced Euclidean metric on a manifold is the projection of the Euclidean gradient grad_eucl to the tangent space. For the Euclidean gradient to be well-defined, f is to be understood as a smooth extension of the actual function f to an open subset of Sym_n. Therefore
(grad f)_P = Π_{T_P Gr}((grad_eucl f)_P).
The function f on Gr(n, p) can be lifted to the function f̂ := f ∘ π_SG on the Stiefel manifold. Again, when necessary, we identify f̂ with a suitable differentiable extension. These two functions are linked by a chain of equalities relating (grad f̂)_U and the horizontal lift ((grad f)_P)^hor_U. The first equality is [1, Equation (3.39)], while the second equality uses the same argument as above. The last equality is due to the fact that the gradient of f̂ has no vertical component. For further details see [17, 26, 1].
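As an illustration, the following numpy sketch computes the horizontal lift of the Riemannian gradient for the example function f̂(U) = tr(U^T A U) with symmetric A (our choice of test function; it is well defined on subspaces since it is invariant under U ↦ UR), combining the Euclidean gradient with the horizontal projection, and checks it against a finite difference:

```python
import numpy as np

rng = np.random.default_rng(6)
n, p = 6, 2
U = np.linalg.qr(rng.standard_normal((n, p)))[0]

A0 = rng.standard_normal((n, n))
A = A0 + A0.T                                   # symmetric test matrix
f = lambda V: np.trace(V.T @ A @ V)             # invariant under V -> V R, R in O(p)

grad_eucl = 2.0 * A @ U                         # Euclidean gradient of the extension
grad_hor = (np.eye(n) - U @ U.T) @ grad_eucl    # horizontal lift of the Riemannian gradient

# the lifted gradient is horizontal
assert np.allclose(U.T @ grad_hor, np.zeros((p, p)))

# duality check against a central finite difference in a horizontal direction H
H = (np.eye(n) - U @ U.T) @ rng.standard_normal((n, p))
t = 1e-6
df = (f(U + t * H) - f(U - t * H)) / (2.0 * t)
assert np.isclose(df, np.trace(grad_hor.T @ H), rtol=1e-3)
```

On horizontal vectors the canonical metric reduces to the Frobenius inner product, which is why the duality check uses a plain trace.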

3.4. Exponential Map.
The exponential map exp_p : T_pM → M on a Riemannian manifold M maps a tangent vector ∆ ∈ T_pM to the endpoint γ(1) ∈ M of the unique geodesic γ that emanates from p in the direction ∆. Thus, geodesics and the Riemannian exponential are related by γ(t) = exp_p(t∆). Under a Riemannian submersion π : M → N, geodesics with horizontal tangent vectors in M are mapped to geodesics in N, cf. [43, Corollary 7.46]. Since the projection π_OG : O(n) → Gr(n, p) defined in (2.3) is a Riemannian submersion by construction, this observation may be used to obtain the Grassmann geodesics.
We start with the geodesics of the orthogonal group. For any Lie group with bi-invariant metric, the geodesics are the one-parameter subgroups [3, §2]. Therefore, the geodesic from Q ∈ O(n) in direction ΩQ ∈ T_Q O(n), Ω ∈ so(n), is
γ(t) = exp_m(tΩ)Q,
where exp_m denotes the matrix exponential, see (B.2). If π_OG(Q) = P ∈ Gr(n, p) and ∆ ∈ T_P Gr(n, p) with ∆ = [Ω, P], the horizontal lift of the tangent vector ∆ is given by ΩQ ∈ Hor_Q O(n). This formula, while simple, is not useful for applications with large n, since it involves the matrix exponential of an n × n matrix. Evaluating the projection π_OG leads to the geodesic formula from [6]:
Proposition 3.2 (Grassmann Exponential: Projector Perspective). Let P ∈ Gr(n, p) be a point in the Grassmannian and ∆ = [Ω, P] ∈ T_P Gr(n, p). The exponential map is given by
(3.7) Exp^Gr_P(t∆) = exp_m(tΩ) P exp_m(−tΩ).
If n ≫ p, then working with Stiefel representatives reduces the computational effort immensely. The corresponding geodesic formula appears in [1, 17] and is restated in the following proposition. The bracket [•] denotes the equivalence classes from (2.10).
Proposition 3.3 (Grassmann Exponential: ONB Perspective). For a point P = UU^T ∈ Gr(n, p) and a tangent vector ∆ ∈ T_P Gr(n, p), let ∆^hor_U ∈ Hor_U St(n, p) be the horizontal lift of ∆ to Hor_U St(n, p). Let r ≤ min(p, n − p) be the number of non-zero singular values of ∆^hor_U and denote the thin singular value decomposition (SVD) of ∆^hor_U by ∆^hor_U = QΣV^T, with an orthogonal completion V_⊥ of V. The Grassmann exponential for the geodesic from P in direction ∆ is given by
(3.8) Exp^Gr_P(t∆) = [UV cos(tΣ)V^T + Q sin(tΣ)V^T + UV_⊥V_⊥^T] = [(UV cos(tΣ) + Q sin(tΣ))V^T + UV_⊥V_⊥^T] = [(UV cos(tΣ) + Q sin(tΣ), UV_⊥)],
which does not depend on the chosen orthogonal completion V_⊥.
Proof. This is essentially [17, Theorem 2.3] with a more careful consideration of the case of rank-deficient tangent velocity vectors and a reduced storage requirement for Q if r < p. The thin SVD of B, where ∆^hor_U = U_⊥B, leads to the desired result when inserted into (3.7). The second equality in (3.8) is given by a postmultiplication by (V V_⊥) ∈ O(p), which does not change the equivalence class. This postmultiplication does however change the Stiefel representative, so (UV cos(tΣ) + Q sin(tΣ), UV_⊥) is the Stiefel geodesic from (UV, UV_⊥) in direction (QΣ, 0). A different orthogonal completion of V does not change the second expression in (3.8) and results in a different representative of the same equivalence class in the third expression.
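A possible implementation of the exponential from Proposition 3.3 for the generic full-rank case (r = p, so the V_⊥ terms disappear) might look as follows; this is a numpy sketch with a function name of our choosing, not the authors' reference code:

```python
import numpy as np

def grassmann_exp(U, Delta_hor, t=1.0):
    """Stiefel representative of Exp^Gr_P(t*Delta) for P = U U^T, using the
    thin SVD Delta_hor = Q Sigma V^T of the horizontal lift (full-rank case)."""
    Q, sigma, Vt = np.linalg.svd(Delta_hor, full_matrices=False)
    return (U @ Vt.T @ np.diag(np.cos(t * sigma)) @ Vt
            + Q @ np.diag(np.sin(t * sigma)) @ Vt)

rng = np.random.default_rng(7)
n, p = 8, 3
QQ = np.linalg.qr(rng.standard_normal((n, n)))[0]
U, U_perp = QQ[:, :p], QQ[:, p:]
Delta_hor = U_perp @ rng.standard_normal((n - p, p))   # generic full-rank lift

U1 = grassmann_exp(U, Delta_hor, t=0.5)
# the endpoint is again a Stiefel matrix ...
assert np.allclose(U1.T @ U1, np.eye(p))
# ... and t = 0 recovers the base point
assert np.allclose(grassmann_exp(U, Delta_hor, t=0.0), U)
```

All operations are on n × p and p × p matrices, in line with the O(np^2) goal stated earlier.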
The formula established in [17] corresponds to the case of a tangent vector of full rank p, in which the terms containing V_⊥ disappear. By a slight abuse of notation, we also define
(3.9) Exp^Gr_U(t∆^hor_U) := UV cos(tΣ)V^T + Q sin(tΣ)V^T ∈ St(n, p)
to be the Grassmann exponential on the level of Stiefel representatives, where ∆^hor_U = QΣV^T denotes a thin SVD.
3.5. Differentiating the Grassmann Exponential. In this section, we compute explicit expressions for the differential of the Grassmann exponential d(Exp^Gr_P)_∆ at a tangent location ∆ ∈ T_P Gr(n, p). One possible motivation is the computation of Jacobi fields vanishing at a point in Subsection 7.1. Another motivation is, e.g., Hermite manifold interpolation as in [62]. Formally, the differential is the linear map
(3.11) d(Exp^Gr_P)_∆ : T_∆(T_P Gr(n, p)) → T_{Exp^Gr_P(∆)} Gr(n, p).
The tangent space to a linear space can be identified with the linear space itself, so that T_∆(T_P Gr(n, p)) ≅ T_P Gr(n, p). We also exploit this principle in practical computations. We consider the exponential in the form of (3.9). The task boils down to computing the directional derivatives
d(Exp^Gr_P)_∆(∆̃) = d/dt|_{t=0} Exp^Gr_P(∆ + t∆̃),
where ∆, ∆̃ ∈ T_P Gr(n, p). A classical result in Riemannian geometry [35, Prop. 5.19] ensures that for ∆ = 0 ∈ T_P Gr(n, p), the derivative is the identity, d(Exp^Gr_P)_0(∆̃) = ∆̃. For ∆ ≠ 0, we can proceed as follows:
Proposition 3.4 (Derivative of the Grassmann Exponential). Let P = UU^T ∈ Gr(n, p) and ∆, ∆̃ ∈ T_P Gr(n, p) be such that ∆^hor_U has mutually distinct, non-zero singular values. Furthermore, let ∆^hor_U = QΣV^T and (∆ + t∆̃)^hor_U = Q(t)Σ(t)V(t)^T be the compact SVDs of the horizontal lifts of ∆ and ∆ + t∆̃, respectively. Denote the derivative of Q(t) evaluated at t = 0 by Q̇ = d/dt|_{t=0} Q(t), and likewise for Σ(t) and V(t).
Let
Y := UV cos(Σ) + Q sin(Σ) and Γ := UV̇ cos(Σ) − UV sin(Σ)Σ̇ + Q̇ sin(Σ) + Q cos(Σ)Σ̇.
Then the derivative of the Grassmann exponential is given by
(3.13) d(Exp^Gr_P)_∆(∆̃) = ΓY^T + YΓ^T,
with horizontal lift
(3.14) (d(Exp^Gr_P)_∆(∆̃))^hor_Y = Γ + YΓ^TY.
Proof. The curve γ(t) := Exp^Gr_P(∆ + t∆̃) on the Grassmannian is given by
γ(t) = π_SG(UV(t) cos(Σ(t))V(t)^T + Q(t) sin(Σ(t))V(t)^T)
according to (3.9). Note that this is in general not a geodesic in Gr(n, p) but merely a curve through the endpoints of the geodesics from P in direction ∆ + t∆̃. That is to say, it is the mapping of the (non-radial) straight line t ↦ ∆ + t∆̃ in T_P Gr(n, p) to Gr(n, p) via the exponential map. The projection π_SG is not affected by the postmultiplication with V(t)^T ∈ O(p), because of the nature of the equivalence classes in St(n, p). Therefore we set
µ(t) := UV(t) cos(Σ(t)) + Q(t) sin(Σ(t))
and have γ(t) = π_SG(µ(t)). The derivative of γ with respect to t evaluated at t = 0 is then given by
(3.15) d/dt|_{t=0} γ(t) = μ̇(0)µ(0)^T + µ(0)μ̇(0)^T.
But with the definitions above, Y = µ(0) and Γ = μ̇(0), so (3.15) is equivalent to (3.13). The horizontal lift of (3.15) to Y is according to (2.14) given by a postmultiplication with Y, which shows (3.14). Note however that Γ ∈ T_Y St(n, p) is not necessarily horizontal, so in general 0 ≠ Γ^TY ∈ so(p).
In order to remove the "mutually distinct singular values" assumption of Proposition 3.4 and to remedy the numerical instability of the SVD in the presence of clusters of singular values, we introduce an alternative computational approach that relies on the derivative of the QR-decomposition rather than that of the SVD.Yet in this case, the "non-zero singular values" assumption is retained, and instabilities may arise for matrices that are close to being rank-deficient.
Let U, ∆^hor_U, ∆̃^hor_U be as introduced in Prop. 3.4 (now with possibly repeated singular values of ∆^hor_U) and consider the t-dependent QR-decomposition of the matrix curve (∆ + t∆̃)^hor_U = Q(t)R(t). The starting point is (3.7), which can be transformed by means of elementary matrix operations to a representation of the geodesic of the form γ(t) = (U, Q(t)) exp_m(M(t)) I_{2p,p}, with a suitable matrix curve M(t) ∈ R^{2p×2p} formed from p × p sub-blocks. By the product rule,
(3.16) d/dt|_{t=0} γ(t) = (0, Q̇(0)) exp_m(M(0)) I_{2p,p} + (U, Q(0)) d(exp_m)_{M(0)}(Ṁ(0)) I_{2p,p}.
The derivative d/dt|_{t=0} exp_m(M(t)) = d(exp_m)_{M(0)}(Ṁ(0)) can be computed according to Mathias' Theorem [28, Thm 3.6, p. 58] from
exp_m( [ M(0) Ṁ(0) ; 0 M(0) ] ) = [ exp_m(M(0)) d(exp_m)_{M(0)}(Ṁ(0)) ; 0 exp_m(M(0)) ],
which is a (4p × 4p)-matrix exponential written in sub-blocks of size (p × p). Substituting in (3.16) gives the O(np^2)-formula. This corresponds to [62, Lemma 5], which addresses the Stiefel case. The derivative matrices Q̇(0), Ṙ(0) can be obtained from Alg. B.2 in Appendix B. The final formula is obtained by taking the projection into account as in (3.15), where µ is to be replaced by γ. The horizontal lift is computed accordingly.
The derivative of the Grassmann exponential can also be computed directly in Gr(n, p) without using horizontal lifts, at the cost of a higher computational complexity, but without restrictions with regard to the singular values. The key is again to apply Mathias' Theorem to evaluate the derivative of the matrix exponential. Let P ∈ Gr(n, p) and ∆ = [Ω, P], ∆̃ = [Ω̃, P] ∈ T_P Gr(n, p) with Ω, Ω̃ ∈ so_P(n), and write Q := exp_m(Ω) and d/dt|_{t=0} exp_m(Ω + tΩ̃) = ΨQ. Here, Ψ ∈ so(n), since exp_m(Ω + tΩ̃) is a curve in O(n) through Q at t = 0. Then a computation shows that the derivative of the Grassmann exponential can be expressed in terms of Q and ΨQ. The matrices Q and ΨQ can be obtained in one calculation by evaluating the left side of
exp_m( [ Ω Ω̃ ; 0 Ω ] ) = [ exp_m(Ω) ΨQ ; 0 exp_m(Ω) ]
according to Mathias' Theorem.
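The computational building block in both variants is the Fréchet derivative of the matrix exponential via the block-triangular identity of Mathias' Theorem. A scipy-based sketch (function name ours), validated against a finite difference:

```python
import numpy as np
from scipy.linalg import expm

def dexpm(A, E):
    """Frechet derivative of expm at A in direction E via Mathias' theorem:
    expm([[A, E], [0, A]]) = [[expm(A), dexpm_A(E)], [0, expm(A)]]."""
    m = A.shape[0]
    M = np.block([[A, E], [np.zeros((m, m)), A]])
    return expm(M)[:m, m:]

rng = np.random.default_rng(8)
m = 4
A = rng.standard_normal((m, m))
E = rng.standard_normal((m, m))

# central finite-difference check of the directional derivative
t = 1e-6
fd = (expm(A + t * E) - expm(A - t * E)) / (2.0 * t)
assert np.allclose(dexpm(A, E), fd, atol=1e-5)
```

In the Grassmann setting, A and E would be the (skew-symmetric or 2p × 2p) matrices M(0) and Ṁ(0), respectively Ω and Ω̃, from the discussion above.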
3.6. Parallel Transport. On a Riemannian manifold (M, g), parallel transport of a tangent vector v ∈ T_pM along a smooth curve γ : I → M through p gives a smooth vector field V ∈ X(γ) along γ that is parallel with respect to the Riemannian connection ∇ and fulfills the initial condition V(p) = v. A vector field V ∈ X(γ) along a curve γ is a vector field that is defined on the range of the curve, i.e., V : γ(I) → TM with V(γ(t)) ∈ T_{γ(t)}M. The term "parallel" means that for all t ∈ I, the covariant derivative of V in the direction of the tangent vector of γ vanishes, i.e., ∇_{γ̇(t)} V = 0.
Parallel transport on the Grassmannian (ONB perspective) was studied in [17], where an explicit formula for the horizontal lift of the parallel transport of a tangent vector along a geodesic was derived, and in [1], where a differential equation for the horizontal lift of parallel transport along general curves was given.In the next proposition, we complete the picture by providing a formula for the parallel transport on the Grassmannian from the projector perspective.Note that this formula is similar to the parallel transport formula in the preprint [33].
Applying the horizontal lift to the parallel transport equation leads to the formula also found in [17].
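A numerical sketch of transport along a geodesic in the thin-SVD form of [17, Theorem 2.4], for a full-rank direction (numpy sketch, function name ours; the rank-deficient case would need the modifications discussed below):

```python
import numpy as np

def grassmann_transport(U, Dir_hor, H_hor, t):
    """Parallel transport of H_hor in Hor_U St(n,p) along the geodesic with
    direction Dir_hor = Q Sigma V^T, cf. [17, Theorem 2.4] (full-rank case)."""
    Q, sigma, Vt = np.linalg.svd(Dir_hor, full_matrices=False)
    M = (U @ Vt.T @ np.diag(-np.sin(t * sigma))
         + Q @ np.diag(np.cos(t * sigma))) @ Q.T
    n = U.shape[0]
    return M @ H_hor + (np.eye(n) - Q @ Q.T) @ H_hor

rng = np.random.default_rng(9)
n, p = 8, 3
QQ = np.linalg.qr(rng.standard_normal((n, n)))[0]
U, U_perp = QQ[:, :p], QQ[:, p:]
Dir = U_perp @ rng.standard_normal((n - p, p))   # geodesic direction (horizontal)
H = U_perp @ rng.standard_normal((n - p, p))     # vector to be transported

Ht = grassmann_transport(U, Dir, H, t=0.7)

# parallel transport is an isometry; on horizontal lifts the canonical metric
# is the Frobenius inner product, so the Frobenius norm is preserved
assert np.isclose(np.linalg.norm(Ht), np.linalg.norm(H))
# at t = 0 nothing moves
assert np.allclose(grassmann_transport(U, Dir, H, 0.0), H)
```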
According to (2.14), the horizontal lift of P ∆ (Exp Gr P (tΓ)) to the Stiefel geodesic representative U(t) = Q exp m (tQ^T ΩQ)I n,p is given by a post-multiplication with U(t). This formula can be simplified similarly to the proof of Proposition 3.3, with γ Γ (t) := Exp Gr P (tΓ). The difference between this formula and the one found in [17, Theorem 2.4] lies in the usage of the thin SVD and the therefore smaller matrices Q, Σ and V, depending on the problem. But the first line also shows that if r = n − p, which can happen if p ≥ n/2, the term I n−p − W W^T vanishes, and therefore also the term (I n − Q Q^T)∆ hor U.

4. Symmetry and Curvature. In this section, we establish the symmetric space structure of the Grassmann manifold by elementary means. The symmetric structure of the Grassmannian was for example shown in [30, Vol. II] and [10].
Exploiting the symmetric space structure, the curvature of the Grassmannian can be calculated explicitly. Curvature formulae for symmetric spaces can be found for example in [43] and [30, Vol. II]. To the best of the authors' knowledge, a first formula for the sectional curvature of the Grassmannian was given in [56], without making use of the symmetric structure. The bounds were studied in [57]. In [36], curvature formulae have been derived in local coordinates via differential forms. Explicit curvature formulae for a generalized version of the Grassmannian as the space of orthogonal projectors were given in [38]. For applications where curvature matters, see for example [11] and [37], and several references therein, where bounds on the curvature are required for the analysis of Riemannian optimization methods and of Riemannian centers of mass.
4.1. Symmetric Space Structure. In differential geometry, a metric symmetry at q is an isometry σ : M → M of a manifold M that fixes a certain point, σ(q) = q, with the additional property that dσ q = − id | T q M. This relates to the concept of a point reflection in Euclidean geometry. A (metric) symmetric space is a connected differentiable manifold that has a metric symmetry at every point, [43, §8]. Below, we execute an explicit construction of symmetries for the Grassmannian, which compares to the abstract course of action in [43, §11, p. 315ff].
Given any other point P ∈ Gr(n, p), we can compute the EVD P = QP 0 Q^T and define σ P : P̃ ↦ (QS 0 Q^T) P̃ (QS 0 Q^T). This isometry fixes P, i.e., σ P (P) = P. Moreover, for any curve with P(0) = P and Ṗ(0) = ∆ ∈ T P Gr(n, p), it holds that ∆ is of the form [Ω, P] with Ω skew. As a consequence, we use the transformation to move ∆ to the tangent space at P 0 and compute the differential there. Hence, we have constructed metric symmetries at every point of Gr(n, p).
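At the standard base point P 0 = diag(I p, 0), the construction above reduces to conjugation with the reflection S 0 = diag(I p, −I n−p). A minimal numerical sketch (NumPy; the explicit matrices and dimensions are our own illustrative choices) confirms that this map fixes P 0 and acts as −id on the tangent space of block matrices [0, B^T; B, 0]:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 5, 2

# base point P0 = diag(I_p, 0) and the reflection S0 = diag(I_p, -I_{n-p})
P0 = np.diag([1.0] * p + [0.0] * (n - p))
S0 = np.diag([1.0] * p + [-1.0] * (n - p))

def sigma(P):
    # candidate symmetry at P0: conjugation with S0
    return S0 @ P @ S0

# sigma fixes the base point
assert np.allclose(sigma(P0), P0)

# tangent vectors at P0 have the off-diagonal block form [[0, B^T], [B, 0]]
B = rng.standard_normal((n - p, p))
Delta = np.zeros((n, n))
Delta[p:, :p] = B
Delta[:p, p:] = B.T

# the differential of the (linear) map sigma is sigma itself,
# and it acts as minus the identity on the tangent space at P0
assert np.allclose(sigma(Delta), -Delta)
print("metric symmetry at P0 verified")
```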
The symmetric space structure of Gr(n, p) implies a number of strong properties. First of all, it follows that Gr(n, p) is geodesically complete [43, Chap. 8, Lemma 20]. This means that the maximal domain of definition for all Grassmann geodesics is the whole real line R. As a consequence, all the statements of the Hopf-Rinow Theorem [16, Chap. 7, Thm 2.8], [3, Thm 2.9] hold for the Grassmannian:
1. The Riemannian exponential Exp Gr P : T P Gr(n, p) → Gr(n, p) is globally defined.
2. (Gr(n, p), dist(•, •)) is a complete metric space, where dist(•, •) is the Riemannian distance function.
3. Every closed and bounded set in Gr(n, p) is compact.
These statements are equivalent. Any one of them additionally implies
4. For any two points P 1, P 2 ∈ Gr(n, p), there exists a geodesic γ of length L(γ) = dist(P 1, P 2) that joins P 1 to P 2; hence any two points can be joined by a minimal geodesic segment.
5. The exponential map Exp Gr P : T P Gr(n, p) → Gr(n, p) is surjective for all P ∈ Gr(n, p).
∈ T P0 Gr(n, p), etc. Then, by [43, Proposition 11.31], the curvature tensor at P 0 is given by R xy z = dπ OG P0 ([Ẑ, [X̂, Ŷ]]), since the Grassmannian is symmetric and therefore also reductive homogeneous. This formula coincides with the formula found in [38]. Explicitly, we can calculate the curvature tensor in matrix form. The sectional curvature of the Grassmannian can be calculated by the following formulae. It depends only on the plane spanned by two given tangent vectors, not the spanning vectors themselves. For a Riemannian manifold, the sectional curvature completely determines the curvature tensor, see for example [35, Proposition 8.31].
Proposition 4.1. Let P ∈ Gr(n, p) and let ∆ 1, ∆ 2 ∈ T P Gr(n, p) span a nondegenerate plane in T P Gr(n, p). The sectional curvature is then given by (4.1).

Proof. This formula can be derived from the result in [38]. For a direct proof, we proceed as follows. The tangent vectors can be expressed as ∆ 1 = [Ω 1, P], ∆ 2 = [Ω 2, P] ∈ T P Gr(n, p) for some Ω 1, Ω 2 ∈ so P (n). Using this fact and a property that holds for any two X, Y ∈ so P (n) shows the claim according to [43, Proposition 11.31].
With (2.12), every ∆ i ∈ T P Gr(n, p) can be written as above for some (U U ⊥) ∈ (π OG)^{−1}(P) and B i ∈ R^{(n−p)×p}. Since every tangent vector in T P Gr(n, p) is uniquely determined by such a B i for a chosen representative (U U ⊥), we can insert this into (4.1) and get a simplified formula. This formula is equivalent to the slightly more extended form in [56] and depends only on the factors B i. It also holds for the horizontal lifts of the ∆ i by just replacing the symbols B i by (∆ i) hor U, which can also be shown by exploiting (2.12) and (∆ i) hor U = U ⊥ B i. In summary, for two orthonormal tangent vectors ∆ i the sectional curvature is given by the simplified formula. Inserting any pair of orthonormal tangent vectors shows that for n > 2, the sectional curvature of the real projective space Gr(n, 1) = RP^{n−1} is constant, K P ≡ 1, as it is by the same calculation for Gr(n, n − 1), see also [56]. The same source also states a list of facts about the sectional curvature on Gr(n, p) without proof, especially bounds for min(p, n − p) ≥ 2. Nonnegativity follows directly from (4.1). The upper bound was proven in [57] by showing that for any two matrices A, B ∈ R^{m×n} with m, n ≥ 2, a corresponding inequality holds. Note that (4.2) can be rewritten accordingly.
The bounds of the sectional curvature (4.3) are sharp for all cases except those mentioned in the next paragraph: The lower bound zero is attained whenever ∆ 1, ∆ 2 commute. The upper curvature bound is attained, e.g., for B 1 = [1, 1; −1, 1], B 2 = [−1, 1; −1, −1], or matrices containing B 1 and B 2 as their top-left block and else only zeros, when p > 2.
In [36] it was shown that a Grassmannian Gr(n, p) features a strictly positive sectional curvature K P only if the sectional curvature is constant throughout. The sectional curvature is constant (and equal to K P ≡ 1) only in the cases p = 1, n > 2 or p = n − 1, n > 2. In the case of n = 2, p = 1, the sectional curvature is not defined, since dim(Gr(2, 1)) = 1. Hence, in this case, there are no non-degenerate two-planes in the tangent space.
5. Cut Locus and Riemannian Logarithm. We have seen in Section 4.1 that Gr(n, p) is a complete Riemannian manifold. On such manifolds, the cut locus of a point P consists of those points F beyond which the geodesics starting at P cease to be length-minimizing. It is known [49, Ch. III, Prop. 4.1] that P and F are in each other's cut locus if there is more than one shortest geodesic from P to F. We will see that, on the Grassmannian, this "if" is an "if and only if", and moreover "more than one" is always either two or infinitely many.
To get an intuitive idea of the cut locus, think of the earth as an ideal sphere. Then the cut locus of the north pole is the south pole, as it is the only point beyond which the geodesics starting at the north pole cease to be length-minimizing. In the case of the sphere, the "if and only if" statement that we just mentioned for the Grassmannian also holds; however, for the sphere, "more than one" is always infinitely many.
Given two points P, F ∈ Gr(n, p) that are not in each other's cut locus, the unique smallest tangent vector ∆ ∈ T P Gr(n, p) such that Exp Gr P (∆) = F is called the Riemannian logarithm of F at P. We propose an algorithm that calculates the Riemannian logarithm. Moreover, in the case of cut points, the algorithm is able to return any of the (two or infinitely many) smallest ∆ ∈ T P Gr(n, p) such that Exp Gr P (∆) = F. This ability comes from the indeterminacy of the SVD operation invoked by the algorithm.
The horizontal lift of the exponential map (3.9) depends explicitly on the so-called principal angles between two points and allows us to give explicit formulae for the different geodesics between P and a cut point F. We observe that the inherent ambiguity of the SVD, see Appendix B, corresponds to the different geodesics connecting the same points.
Our approach allows data processing schemes to explicitly map any set of given points on the Grassmannian to any tangent space T P Gr(n, p), with the catch that a subset of the points (namely those that are in the cut locus of P) is possibly mapped to a set of tangent vectors each, instead of just a single one.

5.1. Cut Locus.
We can introduce the cut locus of the Grassmannian by applying the definitions of [35, Chap. 10] about cut points to Gr(n, p). In the following, let P ∈ Gr(n, p), ∆ ∈ T P Gr(n, p) and γ ∆ : t ↦ Exp Gr P (t∆). Then the cut time of (P, ∆) is defined as t cut (P, ∆) := sup{ t > 0 | γ ∆ | [0,t] is minimizing }. The cut point of P along γ ∆ is given by γ ∆ (t cut (P, ∆)) and the cut locus of P is defined as Cut P := {F ∈ Gr(n, p) | F = γ ∆ (t cut (P, ∆)) for some ∆ ∈ T P Gr(n, p)}.
In [54, 48], it is shown that the cut locus of P = U U^T ∈ Gr(n, p) is the set of all (projectors onto) subspaces with at least one direction orthogonal to all directions in the subspace onto which P projects, i.e. (5.1). This means that the cut locus can be described in terms of principal angles: The principal angles θ 1, . . ., θ p ∈ [0, π/2] between two subspaces U and Ũ are defined recursively by

cos(θ k) = max { u^T v | u ∈ U, v ∈ Ũ, ‖u‖ = ‖v‖ = 1, u^T u i = 0 and v^T v i = 0 for i = 1, . . ., k − 1 },

where u k, v k denote the maximizing arguments. They can be computed via θ k = arccos(s k), where s k ≤ 1 is the k-th largest singular value of U^T Ũ ∈ R^{p×p} for any two Stiefel representatives U and Ũ. According to this definition, the principal angles are listed in ascending order: 0 ≤ θ 1 ≤ · · · ≤ θ p ≤ π/2. In other words, the cut locus of P consists of all points F ∈ Gr(n, p) with at least one principal angle between P and F being equal to π/2. Furthermore, as in [35], we introduce the tangent cut locus of P.

Proof. Since γ ∆ (t cut (P, ∆)) ∈ Cut P, by (3.9) we have a condition which is equivalent to cos(t cut (P, ∆)σ 1) = 0. Now we see that the tangent cut locus TCL P consists of those tangent vectors for which σ 1 (the largest singular value of the horizontal lift) fulfills σ 1 = π/2, and the injectivity domain ID P contains the tangent vectors with σ 1 < π/2. The geodesic distance is a natural notion of distance between two points on a Riemannian manifold. It is defined as the length of the shortest curve(s) between two points as measured with the Riemannian metric, if such a curve exists. On the Grassmannian, it can be calculated as the two-norm of the vector of principal angles between the two subspaces, cf. [54], i.e.

(5.3) dist(U, Ũ) = ‖(θ 1, . . ., θ p)‖ 2.
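The principal angles and the geodesic distance (5.3) are straightforward to compute from the SVD of U^T Y. A short sketch (NumPy/SciPy; variable names and the random test data are our own), cross-checked against SciPy's `subspace_angles`:

```python
import numpy as np
from scipy.linalg import subspace_angles

rng = np.random.default_rng(2)
n, p = 6, 3
U, _ = np.linalg.qr(rng.standard_normal((n, p)))
Y, _ = np.linalg.qr(rng.standard_normal((n, p)))

# principal angles: theta_k = arccos(s_k), with s_1 >= ... >= s_p the
# singular values of U^T Y; this lists the angles in ascending order
s = np.linalg.svd(U.T @ Y, compute_uv=False)
theta = np.arccos(np.clip(s, -1.0, 1.0))

# cross-check against SciPy (subspace_angles returns descending order)
assert np.allclose(np.sort(subspace_angles(U, Y)), theta, atol=1e-6)

# geodesic distance (5.3): the two-norm of the vector of principal angles
dist = np.linalg.norm(theta)
print(theta, dist)

# F is in the cut locus of P iff the largest angle equals pi/2
in_cut_locus = np.isclose(theta[-1], np.pi / 2)
```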
This shows that for any two points on the Grassmann manifold Gr(n, p), the geodesic distance is bounded by dist(U, Ũ) ≤ (π/2) √(min(p, n − p)), which was already stated in [54, Theorem 8].
Remark 5.2. There are other notions of distance on the Grassmannian that can also be computed from the principal angles, but which are not equal to the geodesic distance, see [17, §4.5], [44], [58, Table 2]. In the latter reference, it is also shown that all these distances can be generalized to subspaces of different dimensions by introducing Schubert varieties and adding π/2 for the "missing" angles. The injectivity radius at P ∈ Gr(n, p) is defined as the distance from P to its cut locus, or equivalently, as the supremum of the radii r for which Exp Gr P is a diffeomorphism from the open ball B r (0) ⊂ T P Gr(n, p) onto its image. The injectivity radius at every P is equal to inj(P) = π/2, since there is always a subspace F for which the principal angles between P and F are all equal to zero, except one, which is equal to π/2. For such an F it holds that dist(P, F) = π/2, cf. (5.3), and F ∈ Cut P. For all other points F with dist(P, F) < π/2, all principal angles are strictly smaller than π/2, and therefore F ∉ Cut P.
Proof. In case of a), γ ∆ is minimizing by definition of the cut locus. It is unique by [35, Thm. 10.34 c)]. In case of b), γ ∆ is still minimizing by the definition of the cut locus. For non-uniqueness, replace the singular value σ 1 = π/2 by −π/2 and observe that we get a different geodesic with the same length and the same endpoints. Case c) holds by definition of the cut locus.

5.2. Riemannian Logarithm.
For any P ∈ Gr(n, p), the restriction of Exp Gr P to the injectivity domain ID P is a diffeomorphism onto Gr(n, p) \ Cut P by [35, Theorem 10.34]. This means that for any F ∈ Gr(n, p) \ Cut P there is a unique tangent vector ∆ ∈ ID P such that Exp Gr P (∆) = F. The mapping that finds this ∆ is conventionally called the Riemannian logarithm. Furthermore, [35, Thm. 10.34] states that the restriction of Exp Gr P to the union ID P ∪ TCL P of the injectivity domain and the tangent cut locus is surjective. Therefore for any F ∈ Cut P we find a (non-unique) tangent vector which is mapped to F via the exponential map. We propose Algorithm 5.1, which computes the unique ∆ ∈ ID P ⊂ T P Gr(n, p) in case of F ∈ Gr(n, p) \ Cut P and one possible ∆ ∈ TCL P ⊂ T P Gr(n, p) for F ∈ Cut P. In the latter case, all other possible ∆̃ ∈ TCL P such that Exp Gr P (∆̃) = F can be derived explicitly from that result.
Algorithm 5.1 (Grassmann logarithm).
Input: U ∈ St(n, p) with P = U U^T, Y ∈ St(n, p) with F = Y Y^T
1: Q S R^T := SVD(Y^T U)
2: Y ∗ := Y QR^T (Procrustes processing)
3: Q̂ Ŝ R̂^T := SVD((I n − U U^T)Y ∗) (compact SVD)
4: Σ := arcsin(Ŝ) (element-wise on the diagonal)
5: ∆ hor U := Q̂ΣR̂^T
Output: smallest ∆ hor U ∈ Hor U St(n, p) such that Exp Gr P (∆) = F
Remark: In Step 1, the expression "SVD :=" is to be understood as "is an SVD". In case of F ∈ Cut P, i.e., singular values equal to zero, different choices of decompositions lead to different valid output vectors ∆ hor U. The non-uniqueness of the compact SVD in Step 3 does not matter, because Σ = arcsin(Ŝ), and arcsin maps zero to zero and repeated singular values to repeated singular values. Therefore any non-uniqueness cancels out in the definition of ∆ hor U.
Before we prove the claimed properties of Algorithm 5.1, let us state the following: An algorithm for the Grassmann logarithm with Stiefel representatives only was derived in [1]. The Stiefel representatives are however not retained by this algorithm, i.e., coupling the exponential map and the logarithm recovers the input subspace but produces a different Stiefel representative Ỹ = Exp Gr U (Log Gr U (Y)) ≠ Y as an output. Furthermore, it requires the matrix inverse of U^T Y, which also means that it only works for points not in the cut locus, see (5.1). By slightly modifying this algorithm we get Algorithm 5.1, which retains the Stiefel representative, does not require the calculation of the matrix inverse (U^T Y)^{−1} and works for all pairs of points. The computational procedure of Algorithm 5.1 was first published in the book chapter preprint [61].
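A minimal NumPy sketch of Algorithm 5.1 together with the exponential (3.10) illustrates the round-trip property discussed above. The function names `grass_exp` and `grass_log` are ours; the steps follow the algorithm as described in the text:

```python
import numpy as np

def grass_exp(U, Delta):
    # Grassmann exponential (3.10) in the ONB perspective:
    # Exp_U(Delta) = (U V cos(Sigma) + Q sin(Sigma)) V^T
    # for a compact SVD Delta = Q Sigma V^T of the horizontal lift
    Q, sig, Vt = np.linalg.svd(Delta, full_matrices=False)
    return (U @ Vt.T) @ np.diag(np.cos(sig)) @ Vt + Q @ np.diag(np.sin(sig)) @ Vt

def grass_log(U, Y):
    # Algorithm 5.1: horizontal lift of Log_P(F) for P = UU^T, F = YY^T
    Q, _, Rt = np.linalg.svd(Y.T @ U)              # step 1
    Ystar = Y @ (Q @ Rt)                           # step 2: Procrustes
    L = Ystar - U @ (U.T @ Ystar)                  # part of Y* orthogonal to span(U)
    Qh, s_hat, Rht = np.linalg.svd(L, full_matrices=False)   # step 3: compact SVD
    Sigma = np.arcsin(np.clip(s_hat, 0.0, 1.0))    # step 4, element-wise
    return Qh @ np.diag(Sigma) @ Rht               # step 5

rng = np.random.default_rng(3)
n, p = 8, 3
U, _ = np.linalg.qr(rng.standard_normal((n, p)))
Y, _ = np.linalg.qr(rng.standard_normal((n, p)))

Delta = grass_log(U, Y)
Y2 = grass_exp(U, Delta)
# Exp(Log) recovers the subspace: the projectors coincide
print(np.linalg.norm(Y2 @ Y2.T - Y @ Y.T))
```

Note that no matrix inverse of U^T Y is formed, so the sketch also runs for pairs of points with (numerically) rank-deficient U^T Y.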
In the following Theorem 5.4, we show that Algorithm 5.1 indeed produces the Grassmann logarithm for points not in the cut locus.
Proof. First, Algorithm 5.1 aligns the given subspace representatives U and Y by producing a representative of the equivalence class [Y] that is "closest" to U. To this end, the Procrustes method is used, cf. [28]. If we denote the part of Y ∗ that lies in span(U)⊥ by L := (I n − U U^T)Y ∗, we see that Ŝ := (I p − S 2)^{1/2} is the diagonal matrix of singular values of L, with the singular values in descending order. The square root is well-defined, since (I p − S 2) is diagonal with values between 0 and 1. Note also that the column vectors of R are a set of orthonormal eigenvectors of L^T L, i.e., a compact singular value decomposition of L is of the form L = Q Ŝ R^T, where again Q ∈ St(n, p). Define Σ := arccos(S), where the arccosine (and sine and cosine in the following) is applied entry-wise on the diagonal. Then S = cos(Σ) and Ŝ = sin(Σ). Inserting in (5.5) gives exactly the exponential with Stiefel representatives (3.10), i.e., Exp Gr U (∆ hor U) = Y ∗, where ∆ hor U = QΣR^T. We also see that the exact matrix representative Y ∗, and not just any equivalent representative, is computed by plugging ∆ hor U into the exponential Exp Gr U.
The singular value decomposition in (5.4) differs from the usual SVD (with singular values in descending order) only by a permutation of the columns of Q and R. But if Y^T U = QSR^T is an SVD with singular values in ascending order and Y^T U = Q̄ S̄ R̄^T is an SVD with singular values in descending order, the product QR^T = Q̄R̄^T does not change, i.e., the computation of Y ∗ is not affected. Therefore we can compute the usual SVD for an easier implementation and keep in mind that S 2 + Ŝ 2 = I p.
It remains to show that ∆ ∈ ID P, so that it is actually the Riemannian logarithm. Since F is not in the cut locus Cut P, we have rank(U^T Y) = p, which means that the smallest singular value of U^T Y is larger than zero (and smaller than or equal to one). Therefore the entries of Σ = arccos(S) are smaller than π/2, which shows the claim. The next theorem gives an explicit description of the shortest geodesics between a point and another point in its cut locus.
Theorem 5.5. For P = U U^T ∈ Gr(n, p) and some F = Y Y^T ∈ Cut P, let r denote the number of principal angles between P and F equal to π/2. Then ∆ ∈ TCL P ⊂ T P Gr(n, p) is a minimizing solution of (5.7) Exp Gr P (∆) = F if and only if the horizontal lift ∆ hor U is an output of Algorithm 5.1. Consider the compact SVD ∆ hor U = QΣR^T. Then the horizontal lifts of all other minimizing solutions of (5.7) are given by (∆ W) hor U = Q diag(W, I p−r) Σ R^T, where W ∈ O(r) and diag(W, I p−r) = [W, 0; 0, I p−r] denotes a block diagonal matrix. The shortest geodesics γ W between P and F are obtained by inserting these tangent vectors into the exponential.

Proof. Algorithm 5.1 continues to work for points in the cut locus, but the result is not unique. With an SVD of Y^T U with singular values in ascending order, the first r singular values are zero. By Proposition B.1, the SVD factors are determined up to blocks D ∈ O(p − r) with (D) ij = 0 for s i ≠ s j and W 1, W 2 ∈ O(r) arbitrary. Then Y ∗ is not unique anymore, but is given as a set of matrices. With Σ = arcsin(Ŝ) = arccos(S), every such matrix is the horizontal lift of a tangent vector at P of a geodesic towards F. For the exponential, the third equality in the corresponding computation holds since Σ = diag(π/2, . . ., π/2, σ r+1, . . ., σ p). But the geodesics γ W starting at [U] in the directions ∆ W differ, i.e.
Hence, the ambiguity factor Ŵ^T = diag(W^T, I p−r) does not vanish for 0 < t < 1. The geodesics are all of the same (minimal) length, since the singular values do not change and γ W (1) = Exp Gr P (∆ W).
To show that there are no other solutions, let ∆̃ ∈ TCL P fulfill Exp Gr P (∆̃) = F and ∆̃ hor U = Q̃ Σ̃ R̃^T. Then by (3.10) there is some M ∈ O(p) such that Q̃ sin(Σ̃)R̃^T is a compact SVD of (I n − U U^T)Ỹ ∗. Therefore ∆̃ hor U is an output of Algorithm 5.1 and the claim is shown.
Fig. 5.1. The manifold of one-dimensional subspaces of R³, i.e., Gr(3, 1), can be seen as the upper half-sphere with half of the equator removed. For points in the cut locus of a point P ∈ Gr(3, 1) (like F 3 in the figure), there is no unique velocity vector in T P Gr(3, 1) that sends a geodesic from P to the point in question, but instead a set of two starting velocities (∆ 3,+1 and ∆ 3,−1) that can be calculated according to Theorem 5.5. Since the points actually mark one-dimensional subspaces through the origin, F 3 is identical to its antipode on the equator.
Together, Theorem 5.4 and Theorem 5.5 allow us to map any set of points on Gr(n, p) to a single tangent space. The situation of multiple tangent vectors that correspond to one and the same point in the cut locus is visualized in Figure 5.1. Notice that if r = 1 in Theorem 5.5, there are only two possible geodesics γ ±1 (t). For r > 1 there is a smooth variation of geodesics.
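The two geodesics for r = 1 can be made concrete in Gr(3, 1): for P = span(e 1) and the cut point F = span(e 2), the tangent vectors ±(π/2) e 2 both reach F at t = 1 but trace out different geodesics. A small sketch (NumPy; the example data is our own choice):

```python
import numpy as np

# Gr(3, 1): P = span(e1); F = span(e2) lies in the cut locus of P,
# since the (single) principal angle between them is pi/2
e1 = np.array([[1.0], [0.0], [0.0]])
e2 = np.array([[0.0], [1.0], [0.0]])

def geodesic(U, Delta, t):
    # ONB geodesic (3.10) via a compact SVD of the horizontal lift
    Q, sig, Vt = np.linalg.svd(Delta, full_matrices=False)
    return (U @ Vt.T * np.cos(t * sig) + Q * np.sin(t * sig)) @ Vt

# Theorem 5.5 with r = 1: W ranges over O(1) = {+1, -1}, giving the two
# minimizing tangent vectors +-(pi/2) e2 (as 3x1 horizontal lifts)
for w in (+1.0, -1.0):
    Delta = w * (np.pi / 2) * e2
    end = geodesic(e1, Delta, 1.0)
    assert np.allclose(end @ end.T, e2 @ e2.T)   # both reach F = span(e2)
    print(geodesic(e1, Delta, 0.5).ravel())      # but the midpoints differ
```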
In [6, Theorem 3.3], a closed formula for the logarithm for Grassmann locations represented by orthogonal projectors was derived. We recast this result in the form of the following proposition, with Log Gr P (F) = [Ω, P] as a consequence. This proposition gives the logarithm explicitly, but it relies on n × n matrices. Lifting the problem to the Stiefel manifold reduces the computational complexity. A method to compute the logarithm that uses an orthogonal completion of the Stiefel representative U and the CS decomposition was proposed in [20].
6. Local Parameterizations of the Grassmann Manifold. In this section, we construct local parameterizations and coordinate charts of the Grassmannian. To this end, we work with the Grassmann representation as orthogonal projectors P = U U^T. The dimension of Gr(n, p) is (n − p)p. Here, we recap how explicit local parameterizations from open subsets of R^{(n−p)×p} onto open subsets of Gr(n, p) (and the corresponding coordinate charts) can be constructed.
The Grassmannian Gr(n, p) can be parameterized by the so-called normal coordinates via the exponential map, which was also done in [26]. Let P = U U^T ∈ Gr(n, p) and U ⊥ some orthogonal completion of U ∈ St(n, p). By making use of (2.12), a parameterization of Gr(n, p) around P is given via ρ : R^{(n−p)×p} → Gr(n, p). A different approach that avoids matrix exponentials, and which is also briefly introduced in [27, Appendix C.4], works as follows: Let B ⊂ R^{(n−p)×p} be an open ball around the zero-matrix 0 ∈ R^{(n−p)×p} for some induced matrix norm ‖ • ‖, and consider the associated map ϕ. Note that B ∈ B is mapped to the orthogonal projector onto colspan([I p; B]), so that indeed ϕ(B) ⊂ Gr(n, p). In particular, ϕ(0) = P 0. Let P ∈ Gr(n, p) be written block-wise as P = [A, B^T; B, C]. Next, we show that the image of ϕ is the set of such projectors P with an invertible p × p-block A and that ϕ is a bijection onto its image. To this end, assume that A ∈ R^{p×p} has full rank p. Because P is idempotent, the blocks of P satisfy coupled quadratic equations. This shows that X = A^{−1}B^T and C = BA^{−1}B^T. In summary, the tangent space at P is the image colspan(dϕ P) for a suitable parameterization ϕ around P. At P 0, we obtain a result consistent with (2.8).
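A small numerical sketch of the parameterization ϕ (NumPy; variable names are ours) verifies that ϕ(B) is an orthogonal projector of rank p and that the chart parameter is recovered from the blocks of P = [A, B^T; B, C] as described above:

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 5, 2

def phi(B):
    # orthogonal projector onto colspan([I_p; B])
    K = np.vstack([np.eye(p), B])
    return K @ np.linalg.solve(K.T @ K, K.T)

B = rng.standard_normal((n - p, p))
P = phi(B)

# phi(B) is a symmetric projector of rank p, i.e., a point of Gr(n, p)
assert np.allclose(P, P.T) and np.allclose(P @ P, P)
assert np.isclose(np.trace(P), p)

# write P block-wise as [[A, B21^T], [B21, C]]; the chart parameter is
# recovered as B21 A^{-1}, and C = B21 A^{-1} B21^T as in the text
A, B21 = P[:p, :p], P[p:, :p]
assert np.allclose(B21 @ np.linalg.inv(A), B)
assert np.allclose(P[p:, p:], B21 @ np.linalg.inv(A) @ B21.T)
print("chart round trip ok")
```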
In principle, ϕ and ψ can be used as a replacement for the Riemannian exp- and log-mappings in data processing procedures. For example, for a set of data points contained in ϕ(B) ⊂ Gr(n, p), Euclidean interpolation can be performed on the coordinate images in B ⊂ R^{(n−p)×p}. Likewise, for an objective function f : Gr(n, p) ⊃ D → R with domain D ⊂ ϕ(B), the associated function f ∘ ϕ : R^{(n−p)×p} ⊃ ϕ^{−1}(D) → R can be optimized relying entirely on standard Euclidean tools; no evaluation of matrix exponentials or matrix logarithms is required. Yet, these parameterizations do not enjoy the special metric properties of the Riemannian normal coordinates. Another reason to be wary of interpolation in coordinates is that the interpolated values on the Grassmannian never leave ϕ(B), which can be very unnatural for some data sets. Furthermore, the presence of a domain D can be unnatural, as ϕ(B) is an open subset of Gr(n, p), whereas the whole Grassmannian is compact, a desirable property for optimization. If charts are switched, the information gathered by the solver may lose its value. Nevertheless, working in charts can be a successful approach [52].
7. Jacobi Fields and Conjugate Points. In this section, we describe Jacobi fields vanishing at one point and the conjugate locus of the Grassmannian. Jacobi fields are vector fields along a geodesic fulfilling the Jacobi equation (7.1). They can be viewed as vector fields pointing towards another "close-by" geodesic, see for example [35]. The conjugate points of P are all those F ∈ Gr(n, p) such that there is a non-zero Jacobi field along a (not necessarily minimizing) geodesic from P to F which vanishes at P and F. The set of all conjugate points of P is the conjugate locus of P. In general, there are not always multiple distinct (possibly non-minimizing) geodesics between two conjugate points, but on the Grassmannian there are. The conjugate locus on the Grassmannian was first treated in [55], but the description there is not complete. This is for example pointed out in [48] and [8]. The latter gives a description of the conjugate locus in the complex case, which we show can be transferred to the real case.
Jacobi fields and conjugate points are of interest when variations of geodesics are considered. They arise for example in geodesic regression [18] and curve fitting problems on manifolds [9].

7.1. Jacobi Fields. A Jacobi field is a smooth vector field J along a geodesic γ satisfying the ordinary differential equation

(7.1) D t² J + R(J, γ̇)γ̇ = 0,

called the Jacobi equation. Here R(•, •) is the curvature tensor and D t denotes the covariant derivative along the curve γ. This means that for every extension Ĵ of J, which is to be understood as a smooth vector field on a neighborhood of the image of γ that coincides with J on γ(t) for every t, it holds that (D t J)(t) = ∇ γ̇(t) Ĵ. For a detailed introduction see for example [35, Chapter 10]. A Jacobi field is the variation field of a variation through geodesics. That means intuitively that J points from the geodesic γ to a "close-by" geodesic, and, by linearity and scaling, to a whole family of such close-by geodesics. Jacobi fields that vanish at a point can be explicitly described via [35, Proposition 10.10], which states that the Jacobi field J along the geodesic γ with γ(0) = p and γ̇(0) = v, and initial conditions J(0) = 0 ∈ T p M and D t J(0) = w, is given by J(t) = d(Exp p) tv (tw). The concept is visualized in Figure 7.1.
Fig. 7.1. The Jacobi field J points from the geodesic γ towards close-by geodesics (dotted) and vanishes at P. Note that J(t) ∈ T γ(t) Gr(3, 1) is a tangent vector and not actually the offset vector between points on the respective geodesics in R³. Nevertheless, J is the variation field of a variation of γ through geodesics, cf. [35, Proposition 10.4].
By making use of the derivative of the exponential mapping derived in Proposition 3.4, we can state the following proposition for Jacobi fields vanishing at a point on the Grassmannian.

Proposition 7.1. Let P = U U^T ∈ Gr(n, p) and let ∆ 1, ∆ 2 ∈ T P Gr(n, p) be two tangent vectors, where the singular values of (∆ 1) hor U are mutually distinct and non-zero. Define the geodesic γ by γ(t) := Exp Gr P (t∆ 1). Furthermore, let (t∆ 1) hor U = Q(tΣ)V^T and (t(∆ 1 + s∆ 2)) hor U = Q(s)(tΣ(s))V(s)^T be given via the compact SVDs of the horizontal lifts, i.e., Q(s) ∈ St(n, p), Σ(s) = diag(σ 1 (s), . . ., σ p (s)) and V(s) ∈ O(p). Then the Jacobi field J along γ fulfilling J(0) = 0 and D t J(0) = ∆ 2 is given explicitly, and the horizontal lift of J(t) to Y(t) is accordingly given. It is the variation field of the variation of γ through geodesics given by Ξ(s, t) := Exp Gr P (t(∆ 1 + s∆ 2)).

7.2. Conjugate Locus.
In the following, let p ≤ n/2. The reason for this restriction is that for p > n/2 there are automatically principal angles equal to zero, yet these do not contribute to the conjugate locus, as one can see by switching to the orthogonal complement. The conjugate locus Conj P of P ∈ Gr(n, p) is given by all F ∈ Gr(n, p) such that at least two principal angles between P and F coincide, or there is at least one principal angle equal to zero if p < n/2. This obviously includes the case of two or more principal angles equal to π/2. In the complex case, the conjugate locus also includes points with one principal angle of π/2, as is shown in [8]. Only in the cases of principal angles of π/2 is there a nontrivial Jacobi field vanishing at P and F along a shortest geodesic. It can be calculated from the variation of geodesics as above. In the other cases, the shortest geodesic is unique, but we can smoothly vary longer geodesics from P to F. This variation is possible because of the periodicity of sine and cosine and the indeterminacies of the SVD.
is a geodesic from P to γ D (1) = F. Since for 0 < t < 1 the matrix cos(tΣ̃) does not have the same number of repeated diagonal entries as cos(tΣ), not all curves γ D coincide. Then we can choose an open interval I around 0 and a smooth curve D : I → O(r) with D(0) = I r such that Γ(s, t) = γ D(s) (t) is a variation through geodesics as defined in [35, Chap. 10, p. 284]. The variation field J of Γ is a Jacobi field along γ D(0) according to [35, Theorem 10.1]. Furthermore, J vanishes at t = 0 and t = 1, as γ D(s) (1) = γ D(s̃) (1) for all s, s̃ ∈ I by Proposition B.1, and likewise for t = 0. Since J is not constantly vanishing, P and F are conjugate along γ D(0) by definition.
When p < n/2 and there is at least one principal angle equal to zero, there is some additional freedom of variation. Let the last r principal angles between P and F be zero. Then Q can be chosen such that U^T Q = 0, and there is at least one unit vector q ∈ R^n such that q is orthogonal to all column vectors in U and in Q. Let Q̃⊥ be an orthogonal completion of Q with q as its first column vector. Define Σ̃ as the matrix Σ with π added to the (p − r + 1)th diagonal entry. Then for every W ∈ O(2), we obtain a geodesic γ W from P with γ W (1) = F. With an argument as above, P and F are conjugate along γ I 2. There are no other points in the conjugate locus than those with repeated principal angles (or one zero angle in case of p < n/2), as the SVD is unique (up to the order of the singular values) for matrices with no repeated and no zero singular values. As every geodesic on the Grassmannian is of the form (3.9), the claim can be shown by contradiction.
By construction, the length of γ D(0) between P and F is longer than the length of the shortest geodesic, since ‖Σ‖ F < ‖Σ̃‖ F. The same is true for the case of a zero angle. It holds that the cut locus Cut P is not a subset of the conjugate locus Conj P, since points with just one principal angle equal to π/2 are not in the conjugate locus. Likewise, the conjugate locus is not a subset of the cut locus. The points in the conjugate locus that are conjugate along a minimizing geodesic, however, are also in the cut locus, as these are exactly the points with multiple principal angles equal to π/2.

Remark 7.3. The (incomplete) treatment in [55] covered only the cases of at least two principal angles equal to π/2 or principal angles equal to zero, but not the cases of repeated arbitrary principal angles. We can nevertheless take from there that for p > n/2 we need at least 2p − n + 1 principal angles equal to zero, instead of just one as for p < n/2. Points with repeated (nonzero) principal angles are however always in the conjugate locus, as the proof of Theorem 7.2 still holds for them.
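The characterization of the conjugate locus by principal angles translates into a simple numerical test. The following sketch (NumPy; the function names, tolerance and test data are our own) flags F as conjugate to P when two principal angles coincide, or when one angle vanishes and p < n/2:

```python
import numpy as np

def principal_angles(U, Y):
    # angles via the singular values of U^T Y, listed in ascending order
    s = np.linalg.svd(U.T @ Y, compute_uv=False)
    return np.arccos(np.clip(s, -1.0, 1.0))

def in_conjugate_locus(U, Y, tol=1e-10):
    # characterization from the text (for p <= n/2): F = YY^T is conjugate
    # to P = UU^T iff two principal angles coincide, or one angle is zero
    # when p < n/2
    n, p = U.shape
    theta = np.sort(principal_angles(U, Y))
    repeated = bool(np.any(np.diff(theta) < tol))
    zero_angle = (2 * p < n) and theta[0] < tol
    return repeated or zero_angle

rng = np.random.default_rng(5)
U, _ = np.linalg.qr(rng.standard_normal((7, 3)))
# a subspace is conjugate to itself: all principal angles are zero (repeated)
assert in_conjugate_locus(U, U)
```

A generic random pair of subspaces, in contrast, has distinct nonzero principal angles and is therefore not flagged.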

8. Conclusion.
In this work, we have collected the facts and formulae that we deem most important for Riemannian computations on the Grassmann manifold. This includes in particular explicit formulae and algorithms for computing local coordinates, the Riemannian normal coordinates (the Grassmann exponential and logarithm mappings), the Riemannian connection, the parallel transport of tangent vectors and the sectional curvature. All these concepts may appear as building blocks or tools for the theoretical analysis of, e.g., optimization problems, interpolation problems and, more generally speaking, data processing problems such as data averaging or clustering.
We have treated the Grassmannian both as a quotient manifold of the orthogonal group and the Stiefel manifold, and as the space of orthogonal projectors of fixed rank, and have exposed (and exploited) the connections between these viewpoints. While concepts from differential geometry arise naturally in the theoretical considerations, care has been taken that the final formulae are purely matrix-based and thus fit for immediate use in algorithms. Finally, the paper features an original approach to computing the Grassmann logarithm, which simplifies the theoretical analysis, extends its operational domain and features improved numerical properties. This tool allowed us to conduct a detailed investigation of the shortest curves to cut points as well as of the conjugate points on the Grassmannian by basic matrix-algebraic means. These findings are more explicit and more complete than the previous results in the research literature.
An n-dimensional differentiable manifold M is a topological space such that for every point p ∈ M, there exists a so-called coordinate chart that bijectively maps an open neighborhood of p to an open subset of R^n, with differentiable coordinate changes between overlapping charts.

Tangent Spaces. The tangent space of a submanifold M at a point p ∈ M, in symbols T_p M, is the space of velocity vectors ċ(0) of differentiable curves c : t ↦ c(t) passing through p = c(0). The tangent space is a vector space of the same dimension n as the manifold M.
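As a numerical illustration (our own, not from the paper): for the unit sphere S^2 ⊂ R^3, the velocity of any curve through p is orthogonal to p, recovering the familiar description T_p S^2 = p^⊥:

```python
import numpy as np

# A curve c(t) = (cos t, sin t, 0) on the unit sphere through p = c(0).
p = np.array([1.0, 0.0, 0.0])
v = np.array([0.0, 1.0, 0.0])   # exact velocity c'(0)

# Finite-difference check that v is indeed the velocity of c at t = 0 ...
h = 1e-6
c = lambda t: np.array([np.cos(t), np.sin(t), 0.0])
v_fd = (c(h) - c(-h)) / (2 * h)

# ... and that it lies in the tangent space T_p S^2 = p^perp.
orth = np.dot(p, v)
```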
Geodesics and the Riemannian Distance Function. Riemannian metrics measure the lengths of and angles between tangent vectors. Eventually, this allows one to measure the lengths of curves on a manifold and the Riemannian distance between two manifold locations.
A curve is said to be parameterized by arc length if L(c|_[a,t]) = t − a for all t ∈ [a, b]. Obviously, unit-speed curves with ‖ċ(t)‖_c(t) ≡ 1 are parameterized by arc length. Constant-speed curves with ‖ċ(t)‖_c(t) ≡ ν > 0 are parameterized proportional to arc length. The Riemannian distance between two points p, q ∈ M with respect to a given metric is

dist_M(p, q) = inf { L(c) | c : [a, b] → M piecewise differentiable, c(a) = p, c(b) = q },

where, by convention, inf ∅ = ∞. A shortest path between p, q ∈ M is a curve c that connects p and q such that L(c) = dist_M(p, q). Candidates for shortest curves between points are called geodesics and are characterized by a differential equation: a differentiable curve c : [a, b] → M is a geodesic (w.r.t. a given Riemannian metric) if the covariant derivative of its velocity vector field vanishes, i.e., Dċ/dt ≡ 0. Intuitively, the covariant derivative can be thought of as the standard derivative (if it exists) followed by a point-wise projection onto the tangent space. In general, a covariant derivative, also known as a linear connection, is a bilinear mapping (X, Y) ↦ ∇_X Y that maps two vector fields X, Y to a third vector field ∇_X Y in such a way that it can be interpreted as the directional derivative of Y in the direction of X, and is used to define the Riemannian curvature tensor.

A smooth left action of a Lie group G on a manifold M is a smooth map φ : G × M → M fulfilling φ(g_1, φ(g_2, p)) = φ(g_1 g_2, p) and φ(e, p) = p for all g_1, g_2 ∈ G and all p ∈ M, where e ∈ G denotes the identity element. One often writes φ(g, p) = g · p.
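The arc-length notions above can be checked numerically: a polygonal approximation of a constant-speed curve recovers L(c) = ν(b − a). A minimal sketch (plain numpy, our own illustration):

```python
import numpy as np

nu = 1.7          # constant speed
ts = np.linspace(0.0, 1.0, 20001)

# Constant-speed curve c(t) = (cos(nu t), sin(nu t)) with ||c'(t)|| = nu.
curve = np.column_stack([np.cos(nu * ts), np.sin(nu * ts)])

# The polygonal length approximates L(c) = integral of the speed = nu * (1 - 0).
length = np.sum(np.linalg.norm(np.diff(curve, axis=0), axis=1))
```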
For each p ∈ M, the orbit of p is defined as G · p = {g · p | g ∈ G}, and the stabilizer of p is defined as G_p = {g ∈ G | g · p = p}. For a detailed introduction, see for example [34]. We need the following well-known result, see for example [27, Section 2.1], where the quotient manifold G/G_p refers to the set {g G_p | g ∈ G}, endowed with the unique manifold structure that turns the quotient map g ↦ g G_p into a submersion.
The latter is well-defined for matrices that have no eigenvalues on the negative real axis R^−.
This equivalence relation collects all orthogonal matrices whose first p columns span the same subspace into an equivalence class. The manifold structure on O(n)/(O(p) × O(n − p)) is by definition the unique one that makes the quotient map a submersion. Consider next the projection onto the first p columns. It defines an equivalence relation on O(n) by collecting all orthogonal matrices that share the same first p column vectors into an equivalence class. As above, dim(St(n, p)) = dim(O(n)) − dim(O(n − p)) = np − (1/2) p(p + 1). Hence, any two such Stiefel matrices differ by a rotation/reflection R ∈ O(p). This defines a smooth right action of O(p) on St(n, p) by multiplication from the right. Every equivalence class is of the form

(2.10) [U] = {U R | R ∈ O(p)}.
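The O(p)-invariance can be verified numerically: for U ∈ St(n, p) and R ∈ O(p), the matrix U R is again in St(n, p) and yields the same projector U U^T. A small numpy sketch (our own illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 6, 2

# Orthonormal basis U in St(n, p) and a random rotation/reflection R in O(p).
U, _ = np.linalg.qr(rng.standard_normal((n, p)))
R, _ = np.linalg.qr(rng.standard_normal((p, p)))

UR = U @ R
stiefel_residual = np.linalg.norm(UR.T @ UR - np.eye(p))   # UR in St(n, p)
projector_residual = np.linalg.norm(UR @ UR.T - U @ U.T)   # same subspace
```

Since (UR)(UR)^T = U U^T, all members of the equivalence class (2.10) represent the same point of Gr(n, p).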

Algorithm 5.1 Extended Grassmann Logarithm with Stiefel representatives
Input: U, Y ∈ St(n, p) representing P = U U^T, F = Y Y^T ∈ Gr(n, p), respectively

Theorem 5.4. Let P = U U^T ∈ Gr(n, p) and F = Y Y^T ∈ Gr(n, p) \ Cut_P be two points on the Grassmannian. Then Algorithm 5.1 computes the horizontal lift of the Grassmann logarithm Log^Gr_P(F) = ∆ ∈ T_P Gr(n, p) to Hor_U St(n, p). It retains the Stiefel representative Y* when coupled with the Grassmann exponential on the level of Stiefel representatives (3.10), i.e.
by means of the SVD (5.4) Y^T U = Q S R^T, chosen here with the singular values in ascending order from the top left to the bottom right. Therefore, Y* := Y Q R^T represents the same subspace, [Y*] = [Y], but U^T Y* = R S R^T is symmetric. Now, we can split Y* with the projector P = U U^T onto span(U) and the projector I_n − U U^T onto the orthogonal complement of span(U) via (5.5)
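The alignment step (5.4) can be sketched in numpy; since numpy's SVD returns the singular values in descending order, the ascending order required here is obtained by reversing the factors (variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 7, 3
U, _ = np.linalg.qr(rng.standard_normal((n, p)))
Y, _ = np.linalg.qr(rng.standard_normal((n, p)))

# SVD (5.4): Y^T U = Q S R^T, with singular values in ascending order.
Q, s, Rt = np.linalg.svd(Y.T @ U)            # numpy: descending order
Q, s, Rt = Q[:, ::-1], s[::-1], Rt[::-1, :]  # flip to ascending

# Y* := Y Q R^T spans the same subspace as Y, but U^T Y* is symmetric.
Ystar = Y @ Q @ Rt
sym_residual = np.linalg.norm(U.T @ Ystar - (U.T @ Ystar).T)
span_residual = np.linalg.norm(Ystar @ Ystar.T - Y @ Y.T)
```

Indeed, U^T Y* = R S R^T is symmetric by construction, and Y* Y*^T = Y Y^T, so the subspace is unchanged.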

Theorem 7.2.
Let P ∈ Gr(n, p), where p ≤ n/2. The conjugate locus Conj_P of P consists of all points F ∈ Gr(n, p) with at least two identical principal angles or, when p < n/2, at least one zero principal angle between F and P.

Proof. Let P and F have r = j − i + 1 repeated principal angles σ_i = ··· = σ_j. Obtain ∆_U^hor = Q Σ R^T by Algorithm 5.1. Define Σ̂ by adding π to one of the repeated angles. Then for every D ∈ O(r) and D̂ := diag(I_{i−1}, D, I_{p−j}), the geodesic γ_D from P with horizontal lift of the initial velocity given by Q D̂ Σ̂ D̂^T R^T also connects P to F. Varying D produces a family of equal-length geodesics between P and F, and hence a Jacobi field vanishing at both endpoints, so F is conjugate to P.

A coordinate chart is a map x : M ⊃ D_p → R^n that bijectively maps an open neighborhood D_p ⊂ M of a location p to an open neighborhood D_{x(p)} ⊂ R^n around x(p) ∈ R^n, with the additional property that the coordinate change x̃ ∘ x^{−1} : x(D_p ∩ D̃_p) → x̃(D_p ∩ D̃_p) of two such charts x, x̃ is a diffeomorphism wherever their domains of definition overlap, see [19, Fig. 18.2, p. 496]. This enables the transfer of the most essential tools from calculus to manifolds. An n-dimensional submanifold of R^{n+d} is a subset M ⊂ R^{n+d} that can be locally smoothly straightened, i.e., that satisfies the local n-slice condition [34, Thm. 5.8].

Theorem A.1 ([19, Prop. 18.7, p. 500]). Let h : R^{n+d} ⊃ Ω → R^d be differentiable and c_0 ∈ R^d be such that the differential Dh_p ∈ R^{d×(n+d)} has the maximum possible rank d at every point p ∈ Ω with h(p) = c_0. Then the preimage h^{−1}(c_0) = {p ∈ Ω | h(p) = c_0} is an n-dimensional submanifold of R^{n+d}. This theorem establishes the Stiefel manifold St(n, p) = {U ∈ R^{n×p} | U^T U = I} as an embedded submanifold of R^{n×p}, since St(n, p) = F^{−1}(I) for F : U ↦ U^T U.
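Theorem A.1 can be probed numerically for the Stiefel example: the differential of F : U ↦ U^T U is DF_U(Δ) = Δ^T U + U^T Δ, and at a Stiefel point its rank as a linear map into the symmetric p × p matrices is the maximal d = p(p+1)/2, so that dim St(n, p) = np − p(p+1)/2. A sketch (our own illustration, assuming the stated differential):

```python
import numpy as np

n, p = 5, 2
rng = np.random.default_rng(2)
U, _ = np.linalg.qr(rng.standard_normal((n, p)))   # a point on St(n, p)

# Assemble the differential DF_U(Delta) = Delta^T U + U^T Delta as a
# matrix acting on vec(Delta) in R^{np}, with values in R^{p x p}.
cols = []
for i in range(n):
    for j in range(p):
        Delta = np.zeros((n, p))
        Delta[i, j] = 1.0
        cols.append((Delta.T @ U + U.T @ Delta).ravel())
J = np.column_stack(cols)

# The image is the space of symmetric p x p matrices, of dimension
# d = p(p+1)/2, the maximum possible rank in Theorem A.1.
rank = np.linalg.matrix_rank(J)
```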
A Riemannian metric on M is a family (⟨·, ·⟩_p)_{p∈M} of inner products ⟨·, ·⟩_p : T_p M × T_p M → R that is smooth in variations of the base point p. The length of a tangent vector v ∈ T_p M is ‖v‖_p := √⟨v, v⟩_p. The length of a curve c : [a, b] → M is defined as

L(c) = ∫_a^b ‖ċ(t)‖_{c(t)} dt = ∫_a^b √⟨ċ(t), ċ(t)⟩_{c(t)} dt.
[35, §4, §5]. Of importance is the Riemannian connection or Levi-Civita connection that is compatible with a Riemannian metric [2, Thm. 5.3.1], [35, Thm. 5.10]. It is determined uniquely by the Koszul formula

2⟨∇_X Y, Z⟩ = X(⟨Y, Z⟩) + Y(⟨Z, X⟩) − Z(⟨X, Y⟩) − ⟨X, [Y, Z]⟩ − ⟨Y, [X, Z]⟩ + ⟨Z, [X, Y]⟩.

Basic examples include GL(n, R) and the orthogonal group O(n). Any matrix Lie group G is automatically an embedded submanifold of C^{n×n} [23, Corollary 3.45]. The tangent space T_I G of G at the identity I ∈ G has a special role. When endowed with the bracket operator or matrix commutator [V, W] = V W − W V for V, W ∈ T_I G, the tangent space becomes an algebra, called the Lie algebra associated with the Lie group G, see [23, §3]. As such, it is denoted by g = T_I G. For any A ∈ G, the function "left-multiplication with A" is a diffeomorphism L_A : G → G, L_A(B) = A B; its differential at a point B ∈ G is the isomorphism d(L_A)_B : T_B G → T_{L_A(B)} G, d(L_A)_B(V) = A V. Using this observation at B = I shows that the tangent space at an arbitrary location A ∈ G is given by the translates (by left-multiplication) of the tangent space at the identity [21, §5.6, p. 160],

(A.3) T_A G = T_{L_A(I)} G = A g = {∆ = A V ∈ R^{n×n} | V ∈ g}.
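For O(n), the Lie algebra g = so(n) consists of the skew-symmetric matrices, and (A.3) can be checked numerically: for skew-symmetric V, the curve t ↦ A exp_m(tV) stays in O(n) and passes through A with velocity A V ∈ T_A O(n) = A so(n). A sketch using a truncated power series for the matrix exponential (our own illustration):

```python
import numpy as np

def expm_series(X, terms=40):
    """Truncated power series for the matrix exponential exp_m(X)."""
    out = np.eye(X.shape[0])
    term = np.eye(X.shape[0])
    for j in range(1, terms):
        term = term @ X / j
        out = out + term
    return out

rng = np.random.default_rng(3)
n = 4
A, _ = np.linalg.qr(rng.standard_normal((n, n)))   # a point A in O(n)
W = rng.standard_normal((n, n))
V = W - W.T                                        # V in so(n), skew-symmetric

# The curve c(t) = A exp_m(t V) stays in O(n) ...
M = A @ expm_series(V)
orth_residual = np.linalg.norm(M.T @ M - np.eye(n))

# ... and passes through A with velocity c'(0) = A V.
h = 1e-6
velocity_fd = (A @ expm_series(h * V) - A @ expm_series(-h * V)) / (2 * h)
tangent_residual = np.linalg.norm(velocity_fd - A @ V)
```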
Here, '⊙' is the element-wise matrix product, so that P_L ⊙ R selects the strictly lower triangle of the square matrix R. For brevity, we write Y = Y(t_0), Ẏ = (d/dt)|_{t=t_0} Y(t), and likewise for Q(t), R(t). By the product rule, Ẏ = Q̇ R + Q Ṙ, 0 = Q̇^T Q + Q^T Q̇, 0 = P_L ⊙ Ṙ. According to [53, Proposition 2.2], the derivatives Q̇, Ṙ can be obtained from Alg. B.2. The trick is to compute X = Q^T Q̇ first and then use this to compute Q̇ = Q Q^T Q̇ + (I_n − Q Q^T) Q̇ by exploiting that Q^T Q̇ is skew-symmetric and that Ṙ R^{−1} is upper triangular.

Algorithm B.2 Differentiating the QR-decomposition, [53, Proposition 2.2]
Input: matrices T, Ṫ ∈ R^{n×r}, (compact) QR-decomposition T = Q R
1: L := P_L ⊙ (Q^T Ṫ R^{−1})
2: X := L − L^T {now X = Q^T Q̇}
3: Ṙ := Q^T Ṫ − X R
4: Q̇ := (I_n − Q Q^T) Ṫ R^{−1} + Q X
Output: Q̇, Ṙ

Matrix Exponential and the Principal Matrix Logarithm. The matrix exponential and the principal matrix logarithm are defined by

(B.2) exp_m(X) := ∑_{j=0}^∞ X^j / j!, log_m(I + X) := ∑_{j=1}^∞ (−1)^{j+1} X^j / j.

A Riemannian manifold is flat if and only if it is locally isometric to the Euclidean space, which holds if and only if the Riemannian curvature tensor vanishes identically [35, Thm. 7.10].

Lie Groups and Orbits. A Lie group is a smooth manifold that is also a group with smooth multiplication and inversion. A matrix Lie group G is a subgroup of the general linear group GL(n, C) that is closed in GL(n, C) (but not necessarily closed in the ambient space C^{n×n}).
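Algorithm B.2 transcribes directly to numpy; the following sketch (the helper name `qr_derivative` and the test construction are ours) verifies it against prescribed factor derivatives satisfying the product rule:

```python
import numpy as np

def qr_derivative(T, Td, Q, R):
    """Algorithm B.2: given the compact QR-decomposition T = Q R and the
    derivative Td of a matrix path t -> T(t), return (Qd, Rd)."""
    n = T.shape[0]
    Rinv = np.linalg.inv(R)
    L = np.tril(Q.T @ Td @ Rinv, k=-1)   # P_L ⊙ (Q^T Td R^{-1})
    X = L - L.T                          # X = Q^T Qd, skew-symmetric
    Rd = Q.T @ Td - X @ R                # upper triangular by construction
    Qd = (np.eye(n) - Q @ Q.T) @ Td @ Rinv + Q @ X
    return Qd, Rd

# Consistency check: prescribe Qd_true with Q^T Qd_true skew-symmetric,
# Rd_true upper triangular, and form Td by the product rule.
rng = np.random.default_rng(4)
n, r = 6, 3
Q0, R0 = np.linalg.qr(rng.standard_normal((n, r)))
S = rng.standard_normal((r, r)); S = S - S.T              # skew part
Z = rng.standard_normal((n, r)); Z = Z - Q0 @ (Q0.T @ Z)  # normal part
Qd_true = Q0 @ S + Z
Rd_true = np.triu(rng.standard_normal((r, r)))
Td = Qd_true @ R0 + Q0 @ Rd_true                          # Td = Qd R + Q Rd

Qd, Rd = qr_derivative(Q0 @ R0, Td, Q0, R0)
```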