Curve Based Approximation of Measures on Manifolds by Discrepancy Minimization

The approximation of probability measures on compact metric spaces, and in particular on Riemannian manifolds, by atomic or empirical ones is a classical task in approximation and complexity theory with a wide range of applications. Instead of point measures, we are concerned with the approximation by measures supported on Lipschitz curves. Special attention is paid to push-forward measures of Lebesgue measures on the interval by such curves. Using the discrepancy as distance between measures, we prove optimal approximation rates in terms of the Lipschitz constants of the curves. Having established the theoretical convergence rates, we are interested in the numerical minimization of the discrepancy between a given probability measure and the set of push-forward measures of Lebesgue measures on the interval by Lipschitz curves. We present numerical examples for measures on the 2- and 3-dimensional torus, the 2-sphere, the rotation group on $\mathbb R^3$ and the Grassmannian of all 2-dimensional linear subspaces of $\mathbb{R}^4$. Our algorithm of choice is a conjugate gradient method on these manifolds which incorporates second-order information. For efficiently computing the gradients and the Hessians within the algorithm, we approximate the given measures by truncated Fourier series and use fast Fourier transform techniques on these manifolds.


Introduction
The approximation of probability measures by atomic or empirical ones based on their discrepancies is a well-examined problem in approximation and complexity theory [49]. On the numerical side, we are interested in finding (local) minimizers of discrepancies between a given continuous measure and those from the set of push-forward measures of the Lebesgue measure by bounded Lipschitz curves. This problem is tackled numerically on $T^2$, $T^3$, $S^2$ as well as SO(3) and $G_{2,4}$ by switching to the Fourier domain. The minimizers are computed using the method of conjugate gradients (CG) on manifolds which incorporates second order information in the form of multiplications by the Hessian. Thanks to the approach in the Fourier domain, the required gradients and the calculations involving the Hessian can be computed very efficiently by fast Fourier transform techniques at arbitrary nodes on the respective manifolds. Note that in contrast to our approach, semi-continuous OT minimization relies on Laguerre tessellations [23], which are not available in the required form on the 2-sphere, SO(3) or $G_{2,4}$. This paper is organized as follows: In Section 2 we give the necessary preliminaries on probability measure spaces. In particular, we introduce the different sets of measures supported on Lipschitz curves which are used for the approximation. Section 3 provides the notation on reproducing kernel Hilbert spaces and discrepancies, including their representation in the Fourier domain. Section 4 contains our estimates of the approximation rates for general given measures. Our main results on the approximation rates of smoother measures are contained in Section 5, where we distinguish between the approximation with respect to the push-forward of general measures $\omega \in \mathcal P[0,1]$, absolutely continuous measures, and the Lebesgue measure on $[0,1]$. In Section 6 we formulate our numerical minimization problem. Our numerical algorithms of choice are briefly described in Section 7.
For a comprehensive description of the algorithms on the different manifolds we refer to the respective papers. Section 8 contains numerical results demonstrating the practical feasibility of our findings. Finally, Appendix A briefly introduces the different manifolds X used in our numerical examples together with the Fourier representation of probability measures on X.

Probability Measures and Curves
In this section, the basic notation on measure spaces is provided, see [1,29], with focus on probability measures supported on curves. At this point, let us assume that X is a compact metric space and denote the metric by $\mathrm{dist}_X$. Further requirements on X are added along the way. By $\mathcal B(X)$ we denote the Borel $\sigma$-algebra on X and by $\mathcal M(X)$ the linear space of all finite signed Borel measures on X, i.e., the space of all $\mu: \mathcal B(X) \to \mathbb R$ satisfying $|\mu|(X) < \infty$ and, for any sequence $(B_k)_{k\in\mathbb N} \subset \mathcal B(X)$ of pairwise disjoint sets, the relation
$$\mu\Big(\bigcup_{k\in\mathbb N} B_k\Big) = \sum_{k\in\mathbb N} \mu(B_k).$$
With the norm $\|\mu\|_{\mathcal M} = |\mu|(X)$ the space $\mathcal M(X)$ becomes a Banach space. By $C(X)$ we denote the Banach space of continuous real-valued functions on X equipped with the norm $\|\varphi\|_{C(X)} := \max_{x\in X} |\varphi(x)|$. The space $\mathcal M(X)$ can be identified via Riesz' representation theorem with the dual space of $C(X)$, and the weak-* topology on $\mathcal M(X)$ gives rise to the weak convergence of measures, i.e., a sequence $(\mu_k)_k \subset \mathcal M(X)$ converges weakly to $\mu$, written $\mu_k \rightharpoonup \mu$, if $\lim_{k\to\infty} \int_X \varphi\, d\mu_k = \int_X \varphi\, d\mu$ for all $\varphi \in C(X)$.
For a non-negative, finite measure $\mu$, let $L^p(X,\mu)$, $1 \le p < \infty$, be the Banach space (of equivalence classes) of complex-valued functions with norm
$$\|f\|_{L^p(X,\mu)} := \Big( \int_X |f|^p\, d\mu \Big)^{1/p}.$$
By $\mathcal P(X)$ we denote the space of Borel probability measures on X, i.e., non-negative Borel measures with $\mu(X) = 1$. This space is weakly compact, i.e., compact with respect to the topology of weak convergence. We are interested in the approximation of measures in $\mathcal P(X)$ by probability measures supported on points and curves in X. To this end, we associate with $x \in X$ the Dirac measure $\delta_x$ with values $\delta_x(B) = 1$ if $x \in B$ and $\delta_x(B) = 0$ otherwise.
The atomic probability measures at N points are defined by
$$\mathcal P^{\mathrm{atom}}_N(X) := \Big\{ \sum_{k=1}^N w_k \delta_{x_k} : x_1,\dots,x_N \in X,\ w \in \Delta_N \Big\},$$
where $\Delta_N := \{w \in [0,1]^N : \sum_{k=1}^N w_k = 1\}$. In other words, $\mathcal P^{\mathrm{atom}}_N(X)$ is the collection of probability measures whose support consists of at most N points. Further restriction to equal mass distribution leads to the empirical probability measures at N points, denoted by
$$\mathcal P^{\mathrm{emp}}_N(X) := \Big\{ \frac1N \sum_{k=1}^N \delta_{x_k} : x_1,\dots,x_N \in X \Big\}.$$
In this paper, we are interested in the approximation by measures having their support on curves. Let $C([a,b],X)$ denote the set of closed, continuous curves $\gamma: [a,b] \to X$. We restrict our attention to closed curves and point out that all of our approximation results still hold without this requirement. The length of a curve $\gamma \in C([a,b],X)$ is given by
$$\ell(\gamma) := \sup_{a \le t_0 < \dots < t_n \le b} \sum_{i=1}^n \mathrm{dist}_X(\gamma(t_{i-1}), \gamma(t_i)).$$
The measures supported on curves of length at most L,
$$\mathcal P^{\mathrm{curv}}_L(X) := \big\{ \nu \in \mathcal P(X) : \mathrm{supp}(\nu) \subseteq \gamma([a,b]) \text{ for some closed } \gamma \text{ with } \ell(\gamma) \le L \big\}, \tag{1}$$
form quite a large space, and in order to define further meaningful subsets, we first derive an equivalent formulation in terms of push-forward measures. For $\gamma \in C([0,1],X)$, the push-forward $\gamma_*\omega \in \mathcal P(X)$ of a probability measure $\omega \in \mathcal P([0,1])$ is defined by $\gamma_*\omega(B) := \omega(\gamma^{-1}(B))$ for all $B \in \mathcal B(X)$. We directly observe $\mathrm{supp}(\gamma_*\omega) = \gamma(\mathrm{supp}(\omega))$. By the following lemma, $\mathcal P^{\mathrm{curv}}_L(X)$ coincides with the set of push-forward measures
$$\big\{ \gamma_*\omega : \omega \in \mathcal P([0,1]),\ \gamma \in \mathrm{Lip}_L(X) \big\}, \tag{2}$$
where $\mathrm{Lip}_L(X)$ denotes the set of closed Lipschitz curves $\gamma: [0,1] \to X$ with Lipschitz constant at most L.

Proof. Let $\nu \in \mathcal P^{\mathrm{curv}}_L(X)$ as in (1). If $\mathrm{supp}(\nu)$ consists of a single point $x \in X$ only, then the constant curve $\gamma \equiv x$ pushes forward an arbitrary $\delta_t$ for $t \in [a,b]$, which shows that $\nu$ is contained in (2).
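Numerically, a push-forward $\gamma_*\lambda$ is conveniently handled through samples of the curve. The following sketch (with an illustrative winding curve on $T^2$, not one from the paper) approximates $\gamma_*\lambda$ by the empirical measure at equispaced parameters:

```python
import numpy as np

# An illustrative closed Lipschitz curve on the 2-torus T^2 = [0,1)^2,
# here a simple (1,3)-winding (our choice, not from the paper).
def gamma(t):
    return np.stack([t % 1.0, (3.0 * t) % 1.0], axis=-1)

# Approximate the push-forward gamma_* lambda of the Lebesgue measure on
# [0,1] by the empirical measure at N equispaced parameters t_j = j/N:
# each curve sample carries mass 1/N.
N = 1000
t = np.arange(N) / N
samples = gamma(t)                 # support points of the empirical measure
weights = np.full(N, 1.0 / N)      # equal masses

# Integration against gamma_* lambda reduces to a Riemann sum,
# int phi d(gamma_* lambda) = int phi(gamma(t)) dt.
phi = lambda x: np.cos(2 * np.pi * x[:, 0]) * np.cos(2 * np.pi * x[:, 1])
integral = np.sum(weights * phi(samples))
```

For this particular winding the test integral vanishes, since $\int_0^1 \cos(2\pi t)\cos(6\pi t)\, dt = 0$.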
The set $\mathcal P^{\mathrm{curv}}_L(X)$ contains $\mathcal P^{\mathrm{atom}}_N(X)$ if L is sufficiently large compared to N and X is sufficiently nice, cf. Section 4. It is reasonable to ask for more restrictive sets of approximation measures, e.g., when $\omega \in \mathcal P([0,1])$ is assumed to be absolutely continuous. For the Lebesgue measure $\lambda$ on $[0,1]$, we consider the set
$$\mathcal P^{\mathrm{a\text{-}curv}}_L(X) := \big\{ \gamma_*(\rho\lambda) \in \mathcal P(X) : \gamma \in \mathrm{Lip}(X),\ \rho \text{ a Lipschitz continuous probability density with } \max\{L(\gamma), L(\rho)\} \le L \big\}.$$
In the literature [16,50], the special case of the push-forward of the Lebesgue measure $\omega = \lambda$ on $[0,1]$ by Lipschitz curves in $T^d$ was discussed and successfully used in certain applications [9,15]. Therefore, we also consider approximations from
$$\mathcal P^{\lambda\text{-}\mathrm{curv}}_L(X) := \big\{ \gamma_*\lambda : \gamma \in \mathrm{Lip}_L(X) \big\}.$$
Our probability spaces related to curves are obviously nested,
$$\mathcal P^{\lambda\text{-}\mathrm{curv}}_L(X) \subseteq \mathcal P^{\mathrm{a\text{-}curv}}_L(X) \subseteq \mathcal P^{\mathrm{curv}}_L(X).$$
Hence, one may expect that establishing good approximation rates is most difficult for $\mathcal P^{\lambda\text{-}\mathrm{curv}}_L(X)$ and easiest for $\mathcal P^{\mathrm{curv}}_L(X)$.

Discrepancies and RKHS
The aim of this section is to introduce the way we quantify the distance ("discrepancy") between two probability measures. Let X be a compact metric space endowed with a bounded non-negative Borel measure σ X ∈ M(X) such that supp(σ X ) = X.
Choose a continuous, symmetric function $K: X \times X \to \mathbb R$ that is positive definite, i.e., for any finite number $n \in \mathbb N$ of points $x_j \in X$, $j = 1,\dots,n$, the relation
$$\sum_{i=1}^n \sum_{j=1}^n a_i a_j K(x_i, x_j) \ge 0$$
is satisfied for all $a_j \in \mathbb R$, $j = 1,\dots,n$. We know by Mercer's theorem [20,53,65] that there exist an orthonormal basis $\{\varphi_k : k \in \mathbb N\}$ of $L^2(X,\sigma_X)$ and non-negative coefficients $(\alpha_k)_{k\in\mathbb N} \in \ell^1$ such that K has the Fourier expansion
$$K(x,y) = \sum_{k\in\mathbb N} \alpha_k \varphi_k(x) \varphi_k(y)$$
with absolute and uniform convergence of the right-hand side. If $\alpha_k > 0$ for some $k \in \mathbb N$, the corresponding function $\varphi_k$ is continuous. Every function $f \in L^2(X,\sigma_X)$ has a Fourier expansion
$$f = \sum_{k\in\mathbb N} \hat f_k \varphi_k, \qquad \hat f_k := \int_X f \varphi_k\, d\sigma_X. \tag{3}$$
The kernel K gives rise to a reproducing kernel Hilbert space (RKHS). More precisely, the function space
$$H_K(X) := \Big\{ f \in L^2(X,\sigma_X) : \sum_{k\in\mathbb N} \alpha_k^{-1} |\hat f_k|^2 < \infty \Big\}$$
equipped with the inner product and the corresponding norm
$$\langle f, g\rangle_{H_K(X)} := \sum_{k\in\mathbb N} \alpha_k^{-1} \hat f_k \hat g_k, \qquad \|f\|_{H_K(X)} := \Big( \sum_{k\in\mathbb N} \alpha_k^{-1} |\hat f_k|^2 \Big)^{1/2}, \tag{4}$$
forms a Hilbert space with reproducing kernel K, i.e., $f(x) = \langle f, K(x,\cdot)\rangle_{H_K(X)}$ for all $f \in H_K(X)$ and $x \in X$. Note that $f \in H_K(X)$ implies $\hat f_k = 0$ if $\alpha_k = 0$, in which case we make the convention $\alpha_k^{-1} \hat f_k = 0$ in (4). The space $H_K(X)$ is the closure of the linear span of $\{K(x,\cdot) : x \in X\}$ with respect to the norm (4), and $H_K(X)$ is continuously embedded in $C(X)$. In particular, the point evaluations on $H_K(X)$ are continuous.
The discrepancy $D_K(\mu,\nu)$ is defined as the norm of the linear functional $\varphi \mapsto \int_X \varphi\, d(\mu - \nu)$ on the dual space of $H_K(X)$, i.e.,
$$D_K(\mu,\nu) := \sup_{\|\varphi\|_{H_K(X)} \le 1} \Big| \int_X \varphi\, d\mu - \int_X \varphi\, d\nu \Big|, \tag{5}$$
see [36,57]. Note that this looks similar to the 1-Wasserstein distance, where the space of test functions consists of Lipschitz continuous functions and is larger. Since point evaluations are continuous, we obtain by Riesz's representation theorem
$$D_K^2(\mu,\nu) = \int_X \int_X K(x,y)\, d(\mu-\nu)(x)\, d(\mu-\nu)(y),$$
which yields by Fubini's theorem, (3), (4) and the symmetry of K that
$$D_K^2(\mu,\nu) = \sum_{k\in\mathbb N} \alpha_k |\hat\mu_k - \hat\nu_k|^2, \tag{6}$$
where the Fourier coefficients of $\mu, \nu \in \mathcal P(X)$ are well-defined for k with $\alpha_k \ne 0$ by
$$\hat\mu_k := \int_X \varphi_k\, d\mu.$$
Remark 3.1. The Fourier coefficients $\hat\mu_k$ and $\hat\nu_k$ depend on both K and $\sigma_X$, but the identity (6) shows that $D_K(\mu,\nu)$ only depends on K. Thus, our approximation rates do not depend on the choice of $\sigma_X$. On the other hand, our numerical algorithms in Section 7 depend on $\varphi_k$ and hence on the choice of $\sigma_X$.
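The Fourier representation (6) is what makes the discrepancy cheap to evaluate. As a numerical illustration on $X = T^1$ (where the eigenfunctions are $x \mapsto e^{2\pi i k x}$), the following sketch computes the squared discrepancy between the uniform measure and an empirical measure; the kernel coefficients $\alpha_k = (1+k^2)^{-s}$ and the truncation index are our illustrative choices:

```python
import numpy as np

def discrepancy_sq(points, s=1.5, kmax=50):
    """Squared discrepancy D_K^2(mu, nu) between the uniform measure mu on
    T^1 and the empirical measure nu of the given points, via the Fourier
    sum (6) with alpha_k = (1+k^2)^(-s) (illustrative Sobolev-type kernel;
    truncation at kmax is an approximation)."""
    d2 = 0.0
    for k in range(1, kmax + 1):
        nu_k = np.mean(np.exp(-2j * np.pi * k * points))   # hat nu_k
        # hat mu_k = 0 for k != 0, and the modes k and -k coincide in size
        d2 += 2 * (1 + k**2) ** (-s) * abs(nu_k) ** 2
    return d2

rng = np.random.default_rng(0)
N = 64
d2_equi = discrepancy_sq(np.arange(N) / N)    # equispaced points
d2_rand = discrepancy_sq(rng.random(N))       # i.i.d. uniform points
```

Equispaced points annihilate all truncated modes, so their discrepancy to the uniform measure is numerically zero, while i.i.d. points leave residuals of size $N^{-1/2}$ per mode.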
If $\mu_n \rightharpoonup \mu$ and $\nu_n \rightharpoonup \nu$ as $n \to \infty$, then also $\mu_n \otimes \nu_n \rightharpoonup \mu \otimes \nu$. Therefore, the continuity of K implies that $\lim_{n\to\infty} D_K(\mu_n,\nu_n) = D_K(\mu,\nu)$, so that $D_K$ is continuous with respect to weak convergence in both arguments. Thus, for any weakly compact subset $\mathcal P \subset \mathcal P(X)$, the infimum $\inf_{\nu\in\mathcal P} D_K(\mu,\nu)$ is actually a minimum. All of the subsets introduced in the previous section are weakly compact.
Lemma 3.2. The sets $\mathcal P^{\mathrm{atom}}_N(X)$, $\mathcal P^{\mathrm{emp}}_N(X)$, $\mathcal P^{\mathrm{curv}}_L(X)$, $\mathcal P^{\mathrm{a\text{-}curv}}_L(X)$, and $\mathcal P^{\lambda\text{-}\mathrm{curv}}_L(X)$ are weakly compact.
Example 3.3. Let $h \in L^2(T^d)$ have the Fourier expansion
$$h = \sum_{k\in\mathbb Z^d} \hat h_k e^{2\pi i \langle k, \cdot\rangle},$$
which converges in $L^2(T^d)$ so that $\sum_k |\hat h_k|^2 < \infty$. Assume that $\hat h_k \ne 0$ for all $k \in \mathbb Z^d$. We consider the special Mercer kernel
$$K(x,y) := \sum_{k\in\mathbb Z^d} |\hat h_k|^2 e^{2\pi i \langle k, x-y\rangle}. \tag{7}$$
By the convolution theorem for Fourier transforms it holds $(h * \mu)^\wedge_k = \hat h_k \hat\mu_k$, $k \in \mathbb Z^d$, and we obtain by Parseval's identity for $\mu, \nu \in \mathcal M(T^d)$ and (7) that
$$D_K(\mu,\nu) = \|h * (\mu - \nu)\|_{L^2(T^d)}.$$
In image processing, metrics of this kind were considered in [16,30,66].

Approximation of general probability measures
Given $\mu \in \mathcal P(X)$, the estimates$^1$
$$\min_{\nu \in \mathcal P^{\mathrm{atom}}_N(X)} D_K(\mu,\nu) \le \min_{\nu \in \mathcal P^{\mathrm{emp}}_N(X)} D_K(\mu,\nu) \lesssim N^{-\frac12} \tag{8}$$
are well-known, cf. [38, Corollary 2.8]. Here, the constant hidden in $\lesssim$ may depend on X and K (but is independent of $\mu$ and $N \in \mathbb N$). In this section, we are interested in approximation rates with respect to measures supported on curves. Our approximation rates for $\mathcal P^{\mathrm{curv}}_L(X)$ are based on those for $\mathcal P^{\mathrm{atom}}_N(X)$ combined with estimates for the traveling salesman problem (TSP). Let $\mathrm{TSP}_X(N)$ denote the worst case minimal cost tour in a fully connected graph G of N arbitrary nodes represented by $x_1,\dots,x_N \in X$ and edges with cost $\mathrm{dist}_X(x_i,x_j)$, $i,j = 1,\dots,N$. Similarly, let $\mathrm{MST}_X(N)$ denote the worst case cost of a minimal spanning tree of G. To derive suitable estimates, we now assume that X is also Ahlfors d-regular (sometimes also called Ahlfors-David d-regular), i.e., there exists $0 < d < \infty$ such that
$$\sigma_X(B_r(x)) \sim r^d, \qquad x \in X,\ 0 < r \le \mathrm{diam}(X), \tag{9}$$
where $B_r(x) = \{y \in X : \mathrm{dist}_X(x,y) \le r\}$ and the constants in $\sim$ do not depend on x or r. From now on, we assume that X is a compact Ahlfors d-regular metric space.
Note that d is not required to be an integer and turns out to be the Hausdorff dimension. For X being the unit cube the following lemma was proved in [64].
Lemma 4.1. If X is a compact Ahlfors d-regular metric space, then there is a constant $0 < C_{\mathrm{TSP}} < \infty$ (that may depend on X) such that
$$\mathrm{TSP}_X(N) \le C_{\mathrm{TSP}}\, N^{\frac{d-1}{d}}, \qquad N \in \mathbb N.$$

Proof. Using (9) and the same covering argument as in [63, Lemma 3.1], we see that for every choice $x_1,\dots,x_N \in X$, there exist $i \ne j$ such that $\mathrm{dist}_X(x_i,x_j) \lesssim N^{-\frac1d}$, where the constant may depend on X.
Let $S = \{x_1,\dots,x_N\}$ be an arbitrary selection of N points from X. First, we choose $x_i$ and $x_j$ with $\mathrm{dist}_X(x_i,x_j) \le c N^{-\frac1d}$. Then, we form a minimal spanning tree T of $S \setminus \{x_i\}$ and augment the tree by adding the edge between $x_i$ and $x_j$. This construction provides a spanning tree and hence we estimate $\mathrm{MST}_X(N) \le \mathrm{MST}_X(N-1) + c N^{-\frac1d}$. Iterating the argument, we deduce
$$\mathrm{MST}_X(N) \lesssim \sum_{n=1}^N n^{-\frac1d} \lesssim N^{\frac{d-1}{d}},$$
cf. [64]. Finally, the standard relation $\mathrm{TSP}_X(N) \le 2\,\mathrm{MST}_X(N)$ for edge costs satisfying the triangle inequality concludes the proof.

$^1$We use the symbols $\lesssim$ and $\gtrsim$ to indicate that the corresponding inequalities hold up to a positive constant factor on the respective right-hand side. The notation $\sim$ means that both relations $\lesssim$ and $\gtrsim$ hold. The dependence of the constants on other parameters shall either be explicitly stated or be clear from the context.
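The doubling bound $\mathrm{TSP}_X(N) \le 2\,\mathrm{MST}_X(N)$ is also algorithmic: a preorder walk of a minimal spanning tree yields a tour of at most twice the tree's cost. The following sketch (our implementation, for random points in the unit square, so $d = 2$ and the bound reads $\lesssim N^{1/2}$) illustrates this:

```python
import numpy as np

def mst_prim(pts):
    """Prim's algorithm; returns the parent array of a minimum spanning
    tree for points with Euclidean edge costs."""
    n = len(pts)
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    dist = np.linalg.norm(pts - pts[0], axis=1)   # distance to the tree
    parent = np.zeros(n, dtype=int)
    for _ in range(n - 1):
        j = int(np.argmin(np.where(in_tree, np.inf, dist)))
        in_tree[j] = True
        d_j = np.linalg.norm(pts - pts[j], axis=1)
        closer = ~in_tree & (d_j < dist)
        dist[closer] = d_j[closer]
        parent[closer] = j
    return parent

def tsp_tour_2approx(pts):
    """Closed tour from a preorder walk of the MST: the classical
    doubling argument gives tour length <= 2 * MST for metric costs."""
    parent = mst_prim(pts)
    children = [[] for _ in range(len(pts))]
    for v in range(1, len(pts)):
        children[parent[v]].append(v)
    order, stack = [], [0]
    while stack:
        v = stack.pop()
        order.append(v)
        stack.extend(reversed(children[v]))
    order.append(0)  # close the tour
    return sum(np.linalg.norm(pts[a] - pts[b]) for a, b in zip(order, order[1:]))

rng = np.random.default_rng(1)
N = 400
length = tsp_tour_2approx(rng.random((N, 2)))  # points in the unit square
ratio = length / N ** 0.5                      # stays bounded as N grows
```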
To derive a curve in X from a minimal cost tour in the graph, we make the additional assumption that X is a length space, i.e., a metric space with
$$\mathrm{dist}_X(x,y) = \inf\big\{ \ell(\gamma) : \gamma \text{ a continuous curve that connects } x \text{ and } y \big\},$$
cf. [13,14]. From now on, we assume that X is a compact Ahlfors d-regular length space.
In this case, Lemma 4.1 implies that $\mathcal P^{\mathrm{atom}}_N(X)$ is contained in $\mathcal P^{\mathrm{curv}}_L(X)$ for L sufficiently large.

Proposition 4.2. Suppose that X is a compact Ahlfors d-regular length space. Then we have $\mathcal P^{\mathrm{atom}}_N(X) \subseteq \mathcal P^{\mathrm{curv}}_L(X)$ for all $L \ge C_{\mathrm{TSP}} N^{\frac{d-1}{d}}$.

Proof. The Hopf-Rinow Theorem for metric measure spaces, see [13, Chapter I.3] and [14, Theorem 2.5.28], yields that every pair of points $x, y \in X$ can be connected by a geodesic, i.e., there is $\gamma \in \mathrm{Lip}(X)$ with constant speed and $\ell(\gamma|_{[s,t]}) = \mathrm{dist}_X(\gamma(s),\gamma(t))$ for all $0 \le s \le t \le 1$. Thus, for any pair $x, y \in X$, there is a constant speed curve $\gamma_{x,y} \in \mathrm{Lip}(X)$ of length $\ell(\gamma_{x,y}) = \mathrm{dist}_X(x,y)$ with $\gamma_{x,y}(0) = x$, $\gamma_{x,y}(1) = y$, cf. [14, Remark 2.5.29]. For $\mu_N \in \mathcal P^{\mathrm{atom}}_N(X)$, let $\{x_1,\dots,x_N\} = \mathrm{supp}(\mu_N)$. The minimal cost tour in Lemma 4.1 leads to a curve $\gamma \in \mathrm{Lip}(X)$ of length at most $C_{\mathrm{TSP}} N^{\frac{d-1}{d}}$ through these points, so that $\mu_N = \gamma_*\omega \in \mathcal P^{\mathrm{curv}}_L(X)$ for an appropriate measure $\omega \in \mathcal P^{\mathrm{atom}}_N([0,1])$.

Theorem 4.3. Suppose that X is a compact Ahlfors d-regular length space with $d \ge 2$. Then, for $\mu \in \mathcal P(X)$,
$$\min_{\nu \in \mathcal P^{\mathrm{curv}}_L(X)} D_K(\mu,\nu) \lesssim L^{-\frac{d}{2(d-1)}}, \tag{10}$$
where the constant may depend on X and K.
Proof. Choose $\alpha = \frac{d-1}{d}$. For L large enough, set $N := \lfloor (L/C_{\mathrm{TSP}})^{\frac1\alpha} \rfloor \in \mathbb N$, so that Proposition 4.2 yields $\mathcal P^{\mathrm{atom}}_N(X) \subseteq \mathcal P^{\mathrm{curv}}_L(X)$. According to (8), we obtain
$$\min_{\nu \in \mathcal P^{\mathrm{curv}}_L(X)} D_K(\mu,\nu) \le \min_{\nu \in \mathcal P^{\mathrm{atom}}_N(X)} D_K(\mu,\nu) \lesssim N^{-\frac12} \sim L^{-\frac{d}{2(d-1)}}.$$
Next, we derive approximation rates for $\mathcal P^{\mathrm{a\text{-}curv}}_L(X)$ and $\mathcal P^{\lambda\text{-}\mathrm{curv}}_L(X)$.
Theorem 4.4. Suppose that X is a compact Ahlfors d-regular length space. Then, for $\mu \in \mathcal P(X)$,
$$\min_{\nu \in \mathcal P^{\mathrm{a\text{-}curv}}_L(X)} D_K(\mu,\nu) \le \min_{\nu \in \mathcal P^{\lambda\text{-}\mathrm{curv}}_L(X)} D_K(\mu,\nu) \lesssim L^{-\frac{d}{3d-2}}, \tag{11}$$
where the constant may depend on X and K.
Proof. Choose $\alpha = \frac{d-1}{d}$. For L large enough, set $N := \lfloor L^{\frac{2}{2\alpha+1}} / \mathrm{diam}(X) \rfloor \in \mathbb N$. By (8), there is a set of points $\{x_1,\dots,x_N\} \subset X$ whose empirical measure $\mu_N$ satisfies $D_K(\mu,\mu_N) \lesssim N^{-\frac12}$. Let these points be ordered as a solution of the corresponding TSP and set $x_0 := x_N$. We construct a closed curve $\gamma_L: [0,1] \to X$ that rests in each $x_i$ for a while and then rushes from $x_i$ to $x_{i+1}$. As in the proof of Proposition 4.2, X being a compact length space ensures that we can travel between consecutive points along geodesics with speed at most L. With $\nu := (\gamma_L)_*\lambda \in \mathcal P^{\lambda\text{-}\mathrm{curv}}_L(X)$, the discrepancy can be estimated by
$$D_K(\mu,\nu) \le D_K(\mu,\mu_N) + D_K(\mu_N,\nu) \lesssim N^{-\frac12} + \frac{N^{\alpha}}{L} \lesssim L^{-\frac{1}{2\alpha+1}} = L^{-\frac{d}{3d-2}}.$$
The relation (11) for $\mathcal P^{\mathrm{a\text{-}curv}}_L(X)$ then follows from the nestedness of the sets.

Note that many compact sets in $\mathbb R^d$ are compact Ahlfors d-regular length spaces with respect to the Euclidean metric and the normalized Lebesgue measure, such as the unit ball or the unit cube. Moreover, many compact connected manifolds with or without boundary satisfy these conditions. All assumptions in this section are indeed satisfied for d-dimensional connected, compact Riemannian manifolds without boundary equipped with the Riemannian metric and the normalized Riemannian measure. The latter setting is studied in the subsequent section to refine our investigations on approximation rates.
In the special case $X = T^d$, an approximation rate of this type was derived in [16], provided that K satisfies an additional Lipschitz condition; the constant there may depend on d and K. That rate coincides with our rate in (10) for d = 2 and is worse for higher dimensions.

Approximation of probability measures having Sobolev densities
To study approximation rates in more detail, we follow the standard strategy in approximation theory and take additional smoothness properties into account. We shall therefore consider µ with a density satisfying smoothness requirements. To define suitable smoothness spaces, we make additional structural assumptions on X. Throughout the remaining part of the paper we suppose that X is a d-dimensional connected, compact Riemannian manifold without boundary equipped with the Riemannian metric dist X and the normalized Riemannian measure σ X .

Sobolev spaces and lower bounds
In order to define a smoothness class of functions on X, let $-\Delta$ denote the (negative) Laplace-Beltrami operator on X. It is self-adjoint on $L^2(X,\sigma_X)$ and has a sequence of non-negative, non-decreasing eigenvalues $(\lambda_k)_{k\in\mathbb N}$ (with multiplicities) with a corresponding orthonormal complete system of smooth eigenfunctions $\{\varphi_k : k \in \mathbb N\}$. Every function $f \in L^2(X,\sigma_X)$ has a Fourier expansion
$$f = \sum_{k\in\mathbb N} \hat f_k \varphi_k, \qquad \hat f_k = \int_X f \varphi_k\, d\sigma_X,$$
and for $s > 0$ the Sobolev space $H^s(X)$ consists of all $f \in L^2(X,\sigma_X)$ with finite norm
$$\|f\|_{H^s(X)} := \Big( \sum_{k\in\mathbb N} (1+\lambda_k)^s |\hat f_k|^2 \Big)^{1/2}.$$
For $s > d/2$, the space $H^s(X)$ is continuously embedded into the space of Hölder continuous functions of degree $s - \frac d2$, and every function $f \in H^s(X)$ has a uniformly convergent Fourier series, see [59, Theorem 5.7]. Actually, $H^s(X)$, $s > d/2$, is a RKHS with the reproducing kernel
$$K(x,y) = \sum_{k\in\mathbb N} (1+\lambda_k)^{-s} \varphi_k(x) \varphi_k(y).$$
Hence, the discrepancy $D_K(\mu,\nu)$ satisfies (5) with $H_K(X) = H^s(X)$. Clearly, each kernel of the above form with coefficients having the same decay as $(1+\lambda_k)^{-s}$ for $k \to \infty$ gives rise to a RKHS that coincides with $H^s(X)$ with an equivalent norm. Appendix A contains more details of the above discussion for the torus $T^d$, the sphere $S^d$, the special orthogonal group SO(3) and the Grassmannian manifold $G_{k,d}$. Now, we are in the position to establish lower bounds on the approximation rates. Our result still holds if we drop the requirement that the curves in the definitions of $\mathcal P^{\mathrm{curv}}_L(X)$, $\mathcal P^{\mathrm{a\text{-}curv}}_L(X)$, and $\mathcal P^{\lambda\text{-}\mathrm{curv}}_L(X)$ are closed.
Theorem 5.1. For $s > d/2$, suppose that $H_K(X) = H^s(X)$ holds with equivalent norms. Assume that $\mu$ is absolutely continuous with respect to $\sigma_X$ with a continuous density $\rho$. Then we have
$$\min_{\nu \in \mathcal P^{\mathrm{atom}}_N(X)} D_K(\mu,\nu) \gtrsim N^{-\frac sd} \qquad \text{and} \qquad \min_{\nu \in \mathcal P^{\mathrm{curv}}_L(X)} D_K(\mu,\nu) \gtrsim L^{-\frac{s}{d-1}},$$
where the constants may depend on X, K, and $\rho$.
Proof. The proof is based on the construction of a suitable fooling function to be used in (5) and follows [11, Theorem 2.16]. There exists a ball $B \subset X$ with $\rho(x) \ge \epsilon = \epsilon(B,\rho)$ for all $x \in B$ and $\sigma_X(B) > 0$, which is chosen as the support of the constructed fooling functions. We shall verify that for every $\nu \in \mathcal P^{\mathrm{atom}}_N(X)$ there is a non-negative function $\varphi \in H^s(X)$ vanishing on $\mathrm{supp}(\nu)$ with
$$\int_X \varphi\, d\mu \gtrsim N^{-\frac sd} \|\varphi\|_{H^s(X)}, \tag{13}$$
where the constant may depend on X, K, and $\rho$. For small enough $\delta$ we can choose 2N disjoint balls in B with diameters $\delta N^{-1/d}$, see also [35]. For $\nu \in \mathcal P^{\mathrm{atom}}_N(X)$, there are N of these balls that do not intersect $\mathrm{supp}(\nu)$. By putting together bump functions supported on each of the N balls, we obtain a non-negative function $\varphi$ supported in B that vanishes on $\mathrm{supp}(\nu)$ and satisfies (13), with a constant that may depend on $\epsilon$, cf. [11, Theorem 2.16]. This yields
$$D_K(\mu,\nu) \ge \frac{1}{\|\varphi\|_{H^s(X)}} \Big( \int_X \varphi\, d\mu - \int_X \varphi\, d\nu \Big) = \frac{1}{\|\varphi\|_{H^s(X)}} \int_X \varphi\, d\mu \gtrsim N^{-\frac sd}.$$
The inequality for $\mathcal P^{\mathrm{curv}}_L(X)$ is derived in a similar fashion. Given a continuous curve $\gamma: [0,1] \to X$ of length L, choose N such that $L \le \delta N \cdot N^{-1/d}$. By taking half of the radius of the above balls, there are 2N pairwise disjoint balls of radius $\frac\delta2 N^{-1/d}$ contained in B with pairwise distances at least $\delta N^{-1/d}$. Any curve of length $\delta N \cdot N^{-1/d}$ intersects at most N of those balls. Hence, there are N balls of radius $\frac\delta2 N^{-1/d}$ that do not intersect $\mathrm{supp}(\gamma)$. As above, this yields a fooling function $\varphi$ satisfying (13), which finishes the proof since $N \sim L^{\frac{d}{d-1}}$.

Approximation rates for $\mathcal P^{\mathrm{curv}}_L(X)$ and $\mathcal P^{\mathrm{a\text{-}curv}}_L(X)$

In this section we derive upper approximation bounds that match the lower bounds in Theorem 5.1 for $\mathcal P^{\mathrm{curv}}_L(X)$ and $\mathcal P^{\mathrm{a\text{-}curv}}_L(X)$. Our analysis makes use of the following theorem, which was already proved for $X = S^d$ in [44].
Theorem 5.2 ([11, Theorem 2.12]). Assume that $\nu_r \in \mathcal P(X)$ provides an exact quadrature for all eigenfunctions $\varphi_k$ of the Laplace-Beltrami operator with eigenvalues $\lambda_k \le r^2$, i.e.,
$$\int_X \varphi_k\, d\sigma_X = \int_X \varphi_k\, d\nu_r \qquad \text{for all } k \text{ with } \lambda_k \le r^2. \tag{14}$$
Then, it holds for every function $f \in H^s(X)$ with $s > d/2$ that
$$\Big| \int_X f\, d\sigma_X - \int_X f\, d\nu_r \Big| \lesssim r^{-s} \|f\|_{H^s(X)},$$
where the constant may depend on X and s.
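On $X = T^1$, the quadrature condition (14) is satisfied by the empirical measure at n equispaced nodes, which integrates the eigenfunctions $x \mapsto e^{2\pi i k x}$ exactly for all $|k| < n$; this can be checked directly:

```python
import numpy as np

# On T^1 the Laplace-Beltrami eigenfunctions are x -> exp(2*pi*i*k*x).
# The empirical measure at n equispaced nodes integrates them exactly
# for |k| < n, i.e. it satisfies the exactness condition (14) with r ~ n.
n = 16
nodes = np.arange(n) / n
errs = [abs(np.mean(np.exp(2j * np.pi * k * nodes)) - (1.0 if k == 0 else 0.0))
        for k in range(-(n - 1), n)]
max_err = max(errs)   # zero up to floating point round-off
```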
For our estimates it is important that the number of eigenfunctions of the Laplace-Beltrami operator on X belonging to eigenvalues $\lambda_k \le r^2$ is of order $r^d$, see [17, Chapter 6.4] and [45, Theorem 17.5.3, Corollary 17.5.8]. This is known as Weyl's estimate on the spectrum of an elliptic operator. For some special manifolds, the eigenfunctions are explicitly given in the appendix. In the following lemma, the result from Theorem 5.2 is rewritten in terms of discrepancies and generalized to absolutely continuous measures with densities $\rho \in H^s(X)$.
Lemma 5.3. For $s > d/2$, suppose that $H_K(X) = H^s(X)$ holds with equivalent norms and that $\nu_r \in \mathcal P(X)$ satisfies (14). Let $\mu \in \mathcal P(X)$ be absolutely continuous with respect to $\sigma_X$ with density $\rho \in H^s(X)$. For sufficiently large r, the measures $\tilde\nu_r := \frac{\rho}{\beta_r} \nu_r \in \mathcal P(X)$ with $\beta_r := \int_X \rho\, d\nu_r$ are well defined and obey
$$D_K(\mu, \tilde\nu_r) \lesssim r^{-s} \|\rho\|_{H^s(X)},$$
where the constant may depend on X and K.
Proof. Note that $H^s(X)$ is a Banach algebra with respect to addition and multiplication [19]; in particular, for $f, g \in H^s(X)$ we have $fg \in H^s(X)$ with
$$\|fg\|_{H^s(X)} \lesssim \|f\|_{H^s(X)} \|g\|_{H^s(X)}. \tag{15}$$
By Theorem 5.2, we obtain for all $\varphi \in H^s(X)$ that
$$\Big| \int_X \varphi\rho\, d\sigma_X - \int_X \varphi\rho\, d\nu_r \Big| \lesssim r^{-s} \|\varphi\rho\|_{H^s(X)} \lesssim r^{-s} \|\varphi\|_{H^s(X)} \|\rho\|_{H^s(X)}. \tag{16}$$
In particular, this implies for $\varphi \equiv 1$ that
$$|1 - \beta_r| \lesssim r^{-s} \|\rho\|_{H^s(X)}. \tag{17}$$
Then, application of the triangle inequality results in
$$\Big| \int_X \varphi\, d\mu - \int_X \varphi\, d\tilde\nu_r \Big| \le \Big| \int_X \varphi\rho\, d\sigma_X - \int_X \varphi\rho\, d\nu_r \Big| + \Big| 1 - \frac{1}{\beta_r} \Big| \, \Big| \int_X \varphi\rho\, d\nu_r \Big|.$$
According to (16), the first summand is bounded by $r^{-s} \|\varphi\|_{H^s(X)} \|\rho\|_{H^s(X)}$ up to a constant. It remains to derive matching bounds on the second term. Hölder's inequality yields
$$\Big| 1 - \frac{1}{\beta_r} \Big| \, \Big| \int_X \varphi\rho\, d\nu_r \Big| \lesssim r^{-s} \|\rho\|_{H^s(X)} \|\varphi\|_{L^\infty(X)} \lesssim r^{-s} \|\rho\|_{H^s(X)} \|\varphi\|_{H^s(X)},$$
where the last inequality is due to $H^s(X) \hookrightarrow L^\infty(X)$ and (17).
Using the previous lemma, we derive optimal approximation rates for $\mathcal P^{\mathrm{atom}}_N(X)$ and $\mathcal P^{\mathrm{curv}}_L(X)$.
Theorem 5.4. For $s > d/2$, suppose that $H_K(X) = H^s(X)$ holds with equivalent norms. Assume that $\mu$ is absolutely continuous with respect to $\sigma_X$ with density $\rho \in H^s(X)$. Then, we have
$$\min_{\nu \in \mathcal P^{\mathrm{atom}}_N(X)} D_K(\mu,\nu) \lesssim N^{-\frac sd} \tag{18}$$
and
$$\min_{\nu \in \mathcal P^{\mathrm{curv}}_L(X)} D_K(\mu,\nu) \lesssim L^{-\frac{s}{d-1}}, \tag{19}$$
where the constants may depend on X and K.
Proof. By [11, Lemma 2.11] and since the Laplace-Beltrami operator has $N \sim r^d$ eigenfunctions belonging to eigenvalues $\lambda_k \le r^2$, there exists a measure $\nu_r \in \mathcal P^{\mathrm{atom}}_N(X)$ which satisfies (14). Hence, (14) is satisfied with $r \sim N^{1/d}$, where the constants may depend on X and K. Thus, Lemma 5.3 with $\tilde\nu_r \in \mathcal P^{\mathrm{atom}}_N(X)$ leads to (18). The assumptions of Lemma 4.1 are satisfied, so that analogous arguments as in the proof of Theorem 4.3 yield $\mathcal P^{\mathrm{atom}}_N(X) \subseteq \mathcal P^{\mathrm{curv}}_L(X)$ with $N \sim L^{\frac{d}{d-1}}$, which together with (18) implies (19).
To establish approximation rates for $\mathcal P^{\mathrm{a\text{-}curv}}_L(X)$, restriction to special manifolds is necessary. The basic idea consists in the construction of a curve and a related measure $\nu_r$ such that all eigenfunctions of the Laplace-Beltrami operator belonging to eigenvalues smaller than a certain value are exactly integrated by this measure, and then applying Lemma 5.3 to estimate the minimum of the discrepancies. We begin with the torus.
Theorem 5.5. Let $X = T^d$, $s > d/2$, and suppose that $H_K(X) = H^s(X)$ holds with equivalent norms. Then, for any absolutely continuous measure $\mu \in \mathcal P(X)$ with Lipschitz continuous density $\rho \in H^s(X)$, we have
$$\min_{\nu \in \mathcal P^{\mathrm{a\text{-}curv}}_L(X)} D_K(\mu,\nu) \lesssim L^{-\frac{s}{d-1}},$$
where the constant may depend on d, K, and $\rho$.
Proof. 1. First, we construct a closed curve $\gamma_r$ such that the trigonometric polynomials from $\Pi_r(T^d)$, see (32) in the appendix, are exactly integrated along this curve. Clearly, the polynomials in $\Pi_r(T^{d-1})$ are exactly integrated at $n = 2r+1$ equispaced nodes per coordinate direction. Consider the curves running parallel to the first coordinate axis through the grid $\{0, \frac1n, \dots, \frac{n-1}n\}^{d-1}$. Then, each element in $\Pi_r(T^d)$ is exactly integrated along the union of these curves. The argument is repeated for every other coordinate direction, so that we end up with $d n^{d-1}$ curves, each mapping an interval of length $\frac{1}{d n^{d-1}}$ to $T^d$. The intersection points of these curves are considered as vertices of a graph, where each vertex has $2d$ edges. Consequently, there exists an Euler path $\gamma_r: [0,1] \to T^d$ through the vertices built from all curves. It has constant speed $d n^{d-1}$, and the polynomials in $\Pi_r(T^d)$ are exactly integrated along $\gamma_r$, i.e., $(\gamma_r)_*\lambda$ satisfies (14).

2. Next, we apply Lemma 5.3 with $\nu_r = (\gamma_r)_*\lambda$, which yields $\tilde\nu_r = (\gamma_r)_*\big( \frac{\rho \circ \gamma_r}{\beta_r} \lambda \big) \in \mathcal P^{\mathrm{a\text{-}curv}}_L(X)$ for $L \sim L(\gamma_r) = d n^{d-1} \sim r^{d-1}$ and $D_K(\mu,\tilde\nu_r) \lesssim r^{-s} \sim L^{-\frac{s}{d-1}}$.
Here, the constants may depend on d, K, and $\rho$. Now, we provide the approximation rates for $X = S^d$.
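The exactness behind the torus grid construction above can be checked numerically for $d = 2$: the normalized arc length on the union of n horizontal and n vertical lines through the grid $\{j/n\}^2$ integrates all monomials $e^{2\pi i \langle k, x\rangle}$ with $\max(|k_1|,|k_2|) < n$ exactly. In the sketch, each line is sampled at M equispaced points, which is itself exact for these frequencies:

```python
import numpy as np

# Union of n horizontal and n vertical lines on T^2 with equal weights;
# M equispaced samples per line reproduce the line integrals of all
# frequencies |k| < M exactly (here M = 32 > 2n).
n, M = 4, 32
ts = np.arange(M) / M
pts = []
for j in range(n):
    pts.append(np.stack([ts, np.full(M, j / n)], axis=1))   # horizontal line
    pts.append(np.stack([np.full(M, j / n), ts], axis=1))   # vertical line
pts = np.concatenate(pts)   # 2*n*M equally weighted samples of the grid curve

# Check exact integration of exp(2*pi*i*<k, x>) for max(|k1|, |k2|) < n:
# the true integral over T^2 is 1 for k = 0 and 0 otherwise.
max_err = 0.0
for k1 in range(-(n - 1), n):
    for k2 in range(-(n - 1), n):
        val = np.mean(np.exp(2j * np.pi * (k1 * pts[:, 0] + k2 * pts[:, 1])))
        exact = 1.0 if (k1, k2) == (0, 0) else 0.0
        max_err = max(max_err, abs(val - exact))
```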
Theorem 5.6. Let $X = S^d$ with $d \ge 2$, $s > d/2$, and suppose that $H_K(X) = H^s(X)$ holds with equivalent norms. Then, we have for any absolutely continuous measure $\mu \in \mathcal P(X)$ with Lipschitz continuous density $\rho \in H^s(X)$ that
$$\min_{\nu \in \mathcal P^{\mathrm{a\text{-}curv}}_L(X)} D_K(\mu,\nu) \lesssim L^{-\frac{s}{d-1}},$$
where the constant may depend on d, K, and $\rho$.
Proof. 1. First, we construct a constant speed curve $\gamma_r: [0,1] \to S^d$ and a probability measure $\omega_r = \rho_r \lambda$ with Lipschitz continuous density $\rho_r$ such that $(\gamma_r)_*\omega_r$ integrates all polynomials in $\Pi_r(S^d)$ exactly,
$$\int_{S^d} p\, d\sigma_{S^d} = \int_0^1 p\, d\big((\gamma_r)_*\omega_r\big), \qquad p \in \Pi_r(S^d). \tag{20}$$
Utilizing spherical coordinates $(\theta_1,\dots,\theta_{d-1},\phi)$, where $\theta_k \in [0,\pi]$, $k = 1,\dots,d-1$, and $\phi \in [0,2\pi)$, the integral over $S^d$ factorizes into one-dimensional integrals. To integrate exactly, substitute $u_k = \sin\theta_k$, $k = 2,\dots,d-1$, apply Gaussian quadrature with $(r+1)/2$ knots and corresponding weights to exactly integrate over $u_k$, and use equispaced knots and weights $1/(2r+1)$ for the integration over $\phi$ as, e.g., in [71]. Then, we define the curves $\gamma_{r,i}$, $i = 1,\dots,n$, through these quadrature knots. Since $(1,0,\dots,0) = \gamma_{r,i}(0) = \gamma_{r,i}(2\pi)$ for all $i = 1,\dots,n$, the concatenated curve is closed. Furthermore, $\gamma_r$ has constant speed. Next, the density $\rho_r$ is assembled from the quadrature weights $a_i$. We directly verify that $\rho_r$ is Lipschitz continuous with $L(\rho_r) \lesssim \max_i a_i\, n^2$, and by [31] the quadrature weights $a_i$ admit suitable bounds. By the definition of the constant $c_d$ and the weights $a_i$, we see that $\rho_r$ is indeed a probability density. For $p \in \Pi_r(S^d)$, we obtain the exactness (20) as follows. Without loss of generality, p is chosen as a homogeneous polynomial of degree $k \le r$, i.e., $p(tx) = t^k p(x)$. Then, regarding that for fixed $\alpha \in [0,2\pi]$ the function $\tilde x \mapsto p(\cos\alpha, \sin\alpha\, \tilde x)$ is a polynomial of degree at most r on $S^{d-1}$, we conclude exactness of the composed quadrature. Now, the assertion (20) follows from (22) and since $\int_{S^d} p\, d\sigma_{S^d} = 0$ if k is odd.

2. Next, we apply Lemma 5.3 for $\nu_r = (\gamma_r)_*(\rho_r\lambda)$. This yields $\tilde\nu_r = (\gamma_r)_*\big( \frac{(\rho\circ\gamma_r)\rho_r}{\beta_r} \lambda \big)$. Since all $\rho_r$ are uniformly bounded by construction and $\rho$ is bounded due to continuity, we conclude using $L(\rho_r) \lesssim r^{d-1}$ and $L(\gamma_r) \sim r^{d-1}$ that $\tilde\nu_r \in \mathcal P^{\mathrm{a\text{-}curv}}_L(S^d)$ with $L \sim r^{d-1}$, which concludes the proof.
Finally, we derive the approximation rates for X = SO(3).
Corollary 5.7. Let X = SO(3), $s > \frac32$, and suppose that $H_K(X) = H^s(X)$ holds with equivalent norms. Then, we have for any absolutely continuous measure $\mu \in \mathcal P(X)$ with Lipschitz continuous density $\rho \in H^s(X)$ that
$$\min_{\nu \in \mathcal P^{\mathrm{a\text{-}curv}}_L(X)} D_K(\mu,\nu) \lesssim L^{-\frac s2},$$
where the constant may depend on K and $\rho$.
Proof. 1. Since SO(3) is doubly covered by $S^3$, cf. Appendix A, the curve construction from the first part of the proof of Theorem 5.6 carries over to SO(3).
2. The rest follows along the lines of the second part of the proof of Theorem 5.6.

Approximation rates for $\mathcal P^{\lambda\text{-}\mathrm{curv}}_L(X)$
To derive approximation rates for $\mathcal P^{\lambda\text{-}\mathrm{curv}}_L(X)$, we need the following specification of Lemma 5.3.
Lemma 5.8. For $s > d/2$, suppose that $H_K(X) = H^s(X)$ holds with equivalent norms. Let $\mu \in \mathcal P(X)$ be absolutely continuous with respect to $\sigma_X$ with positive density $\rho \in H^s(X)$. Suppose that $\nu_r := (\gamma_r)_*\lambda$ with $\gamma_r \in \mathrm{Lip}(X)$ satisfies (14) and let $\beta_r := \int_X \rho\, d\nu_r$. Then, for sufficiently large r, the function
$$g: [0,1] \to [0,1], \qquad g(t) := \frac{1}{\beta_r} \int_0^t \rho(\gamma_r(\tau))\, d\tau,$$
is well-defined and invertible. Moreover, $\tilde\gamma_r := \gamma_r \circ g^{-1}$ satisfies $L(\tilde\gamma_r) \lesssim L(\gamma_r)$ and
$$D_K\big(\mu, (\tilde\gamma_r)_*\lambda\big) \lesssim r^{-s}, \tag{23}$$
where the constants may depend on X, K, and $\rho$.
Proof. Since $\rho$ is continuous and positive, there is $\epsilon > 0$ with $\rho \ge \epsilon$. To bound the Lipschitz constant $L(\tilde\gamma_r)$, we apply the mean value theorem together with the definition of g and the fact that $(g^{-1})'(s) = 1/g'(g^{-1}(s))$ to obtain
$$L(\tilde\gamma_r) \le L(\gamma_r) \sup_{s\in[0,1]} |(g^{-1})'(s)| \le L(\gamma_r)\, \frac{\beta_r}{\epsilon}.$$
Using (17), this can be further estimated for sufficiently large r as $L(\tilde\gamma_r) \lesssim L(\gamma_r)$. To derive (23), we aim to apply Lemma 5.3 with $\nu_r = (\gamma_r)_*\lambda$. We observe
$$(\tilde\gamma_r)_*\lambda = (\gamma_r)_*\big( (g^{-1})_*\lambda \big) = (\gamma_r)_*\Big( \frac{\rho\circ\gamma_r}{\beta_r}\, \lambda \Big) = \frac{\rho}{\beta_r}\, \nu_r = \tilde\nu_r,$$
so that Lemma 5.3 indeed implies (23).
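The reparameterization of Lemma 5.8 is easy to realize numerically. The following sketch (on $X = T^1$ with the identity curve and an illustrative density $\rho$, both our choices) computes g by cumulative quadrature, inverts it monotonically, and checks that the push-forward of $\lambda$ by $\gamma \circ g^{-1}$ reproduces the Fourier coefficient $\hat\rho_1 = 1/4$ of the target density:

```python
import numpy as np

# Target density on T^1 (illustrative choice) and the identity curve
# gamma(t) = t, so g(t) = (1/beta) * int_0^t rho(gamma(tau)) dtau.
rho = lambda x: 1.0 + 0.5 * np.cos(2 * np.pi * x)

M = 10000                                  # fine grid for g and its inverse
t = np.arange(M + 1) / M
G = np.concatenate([[0.0], np.cumsum((rho(t[:-1]) + rho(t[1:])) / 2) / M])
G /= G[-1]                                 # normalize by beta so g(1) = 1

N = 2000                                   # equispaced s-values in [0,1]
s = (np.arange(N) + 0.5) / N
gamma_tilde = np.interp(s, G, t)           # g^{-1}(s), monotone inversion

# First Fourier coefficient of the push-forward gamma_tilde_* lambda;
# for the density above it equals rho_hat_1 = 0.25.
c1 = np.mean(np.exp(-2j * np.pi * gamma_tilde))
```

Equispaced parameters s now concentrate the curve samples where $\rho$ is large, exactly the mechanism used to trade the weighted measure in $\mathcal P^{\mathrm{a\text{-}curv}}_L$ for the plain Lebesgue push-forward in $\mathcal P^{\lambda\text{-}\mathrm{curv}}_L$.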
In comparison to Theorem 5.5, we now trade the Lipschitz condition on $\rho$ for a positivity requirement, which enables us to cover $\mathcal P^{\lambda\text{-}\mathrm{curv}}_L(X)$.
Theorem 5.9. Let $X = T^d$ with $d \in \mathbb N$, $s > d/2$, and suppose that $H_K(X) = H^s(X)$ holds with equivalent norms. Then, for any absolutely continuous measure $\mu \in \mathcal P(X)$ with positive density $\rho \in H^s(X)$, we have
$$\min_{\nu \in \mathcal P^{\lambda\text{-}\mathrm{curv}}_L(X)} D_K(\mu,\nu) \lesssim L^{-\frac{s}{d-1}},$$
where the constant may depend on d, K, and $\rho$.
Proof. The first part of the proof is identical to the proof of Theorem 5.5. Instead of Lemma 5.3, however, we now apply Lemma 5.8 to $\gamma_r$ and $\rho_r \equiv 1$. Hence, $\tilde\gamma_r = \gamma_r \circ g^{-1}$ satisfies $L(\tilde\gamma_r) \lesssim L(\gamma_r) = d(2r+1)^{d-1} \sim r^{d-1}$, so that $(\tilde\gamma_r)_*\lambda$ satisfies (23) and is contained in $\mathcal P^{\lambda\text{-}\mathrm{curv}}_L(T^d)$ for $L \sim r^{d-1}$, which yields the assertion.

The construction on $X = S^d$ for $\mathcal P^{\mathrm{a\text{-}curv}}_L(X)$ in the proof of Theorem 5.6 is not compatible with $\mathcal P^{\lambda\text{-}\mathrm{curv}}_L(X)$. Thus, the situation is different from the torus, where we have used the same underlying construction and only switched from Lemma 5.3 to Lemma 5.8. Now, we present a new construction for $\mathcal P^{\lambda\text{-}\mathrm{curv}}_L(X)$, which is tailored to $X = S^2$. In this case, we can transfer the ideas of the torus, but with Gauss-Legendre quadrature points.
Theorem 5.10. Let $X = S^2$, $s > 1$, and suppose that $H_K(X) = H^s(X)$ holds with equivalent norms. Then, we have for any absolutely continuous measure $\mu \in \mathcal P(X)$ with positive density $\rho \in H^s(X)$ that
$$\min_{\nu \in \mathcal P^{\lambda\text{-}\mathrm{curv}}_L(X)} D_K(\mu,\nu) \lesssim L^{-s},$$
where the constant may depend on K and $\rho$.
Proof. 1. The curve is built from circles on $S^2$ placed according to Gauss-Legendre quadrature points, so that polynomials in $\Pi_r(S^2)$ are integrated exactly. As with the torus, we now "turn" the sphere (or switch the position of $\phi$) so that we get circles along orthogonal directions. This large collection of circles is indeed connected. As with the torus, each intersection point has an incoming and an outgoing part of a circle, so that all of this corresponds to a graph, where again each vertex has an even number of edges. Hence, there is an Euler path inducing our final curve $\gamma_r: [0,1] \to S^2$ with piecewise constant speed $L(\gamma_r) \sim r$ satisfying
$$\int_{S^2} p\, d\sigma_{S^2} = \int_0^1 p\, d\big((\gamma_r)_*\lambda\big), \qquad p \in \Pi_r(S^2).$$
2. Let r ∼ L. Analogous to the end of the proof of Theorem 5.9, Lemma 5.8 now yields the assertion.
To get the approximation rate for $X = G_{2,4}$, we make use of its double cover $S^2 \times S^2$, cf. Remark A.1.
Theorem 5.11. Let $X = G_{2,4}$, $s > 2$, and suppose that $H_K(X) = H^s(X)$ holds with equivalent norms. Then, we have for any absolutely continuous measure $\mu \in \mathcal P(X)$ with positive density $\rho \in H^s(X)$ that
$$\min_{\nu \in \mathcal P^{\lambda\text{-}\mathrm{curv}}_L(X)} D_K(\mu,\nu) \lesssim L^{-\frac s3},$$
where the constant may depend on K and $\rho$.
Proof. By Remark A.1 in the appendix we know that $G_{2,4} \cong S^2 \times S^2 / \{\pm1\}$, so that it remains to prove the assertion for $X = S^2 \times S^2$. There exist pairwise distinct points $\{x_1,\dots,x_N\} \subset S^2$ such that $\frac1N \sum_{j=1}^N \delta_{x_j}$ satisfies (14) on $S^2$ with $N \sim r^2$, cf. [7,8]. On the other hand, let $\tilde\gamma$ be the curve on $S^2$ constructed in the proof of Theorem 5.10, so that $\tilde\gamma_*\lambda$ satisfies (14) on $S^2$ with $\ell(\tilde\gamma) \le L(\tilde\gamma) \sim r$. Let us introduce the virtual point $x_{N+1} := x_1$. The curve $\tilde\gamma([0,1])$ contains a great circle. Thus, for each pair $x_j$ and $x_{j+1}$ there is $O_j \in O(3)$ such that $x_j, x_{j+1} \in \Gamma_j := O_j \tilde\gamma([0,1])$. It turns out that the set on $S^2 \times S^2$ given by $\bigcup_{j=1}^N (\{x_j\} \times \Gamma_j) \cup (\Gamma_j \times \{x_{j+1}\})$ is connected. We now choose $\gamma_j := O_j \tilde\gamma$ and know that the union of the trajectories of this set of curves is connected. Combinatorial arguments involving Euler paths, see Theorems 5.5 and 5.10, lead to a curve $\gamma$ with $\ell(\gamma) \le L(\gamma) \sim N L(\tilde\gamma) \sim r^3$, so that $\gamma_*\lambda$ satisfies (14). The remaining part follows along the lines of the proof of Theorem 5.6.
Our approximation results can be extended to diffeomorphic manifolds, e.g., from $S^2$ to ellipsoids, see also the 3d-torus example in Section 8. To this end, recall that we can describe the Sobolev space $H^s(X)$ using local charts, see [67, Section 7.2]. The exponential maps $\exp_x: T_xX \to X$ give rise to local charts $(B_x(r_0), \exp_x^{-1})$, where $B_x(r_0) := \{y \in X : \mathrm{dist}_X(x,y) < r_0\}$ denotes the geodesic ball around x with the injectivity radius $r_0$. If $\delta < r_0$ is chosen small enough, there exists a uniformly locally finite covering of X by a sequence of balls $(B_{x_j}(\delta))_j$ with a corresponding smooth resolution of unity $(\psi_j)_j$ with $\mathrm{supp}(\psi_j) \subset B_{x_j}(\delta)$, see [67, Proposition 7.2.1]. Then, an equivalent Sobolev norm is given by
$$\|f\|_{H^s(X)} \sim \Big( \sum_j \big\| (\psi_j f) \circ \exp_{x_j} \big\|^2_{H^s(\mathbb R^d)} \Big)^{1/2}, \tag{24}$$
where $(\psi_j f) \circ \exp_{x_j}$ is extended to $\mathbb R^d$ by zero, see [67, Theorem 7.4.5]. Using the definition (24), we are able to pull over results from the Euclidean setting.
Proposition 5.12. Let $X_1, X_2$ be two d-dimensional connected, compact Riemannian manifolds without boundary which are $s+1$ diffeomorphic with $s > d/2$. Assume that for $H_K(X_2) = H^s(X_2)$ and every absolutely continuous measure $\mu$ with positive density $\rho \in H^s(X_2)$ it holds that
$$\min_{\nu \in \mathcal P^{\lambda\text{-}\mathrm{curv}}_L(X_2)} D_K(\mu,\nu) \lesssim L^{-\beta} \quad \text{for some } \beta > 0,$$
where the constant may depend on $X_2$, K, and $\rho$. Then, the same property holds for $X_1$, where the constant may additionally depend on the diffeomorphism.
Proof. Let $f: X_2 \to X_1$ denote such a diffeomorphism and $\rho \in H^s(X_1)$ the density of the measure $\mu$ on $X_1$. Any curve $\tilde\gamma: [0,1] \to X_2$ gives rise to a curve $\gamma: [0,1] \to X_1$ via $\gamma = f \circ \tilde\gamma$, which for every $\varphi \in H^s(X_1)$ satisfies
$$\int_{X_1} \varphi\, d(\gamma_*\lambda) - \int_{X_1} \varphi\, d\mu = \int_{X_2} \varphi\circ f\, d(\tilde\gamma_*\lambda) - \int_{X_2} (\varphi\circ f)(\rho\circ f)\, |\det(J_f)|\, d\sigma_{X_2},$$
where $J_f$ denotes the Jacobian of f. Now, note that $\varphi\circ f,\ (\rho\circ f)|\det(J_f)| \in H^s(X_2)$, see (15) and [67, Theorem 4.3.2], which is lifted to manifolds using (24). Hence, we can define a measure $\tilde\mu$ on $X_2$ through the probability density $(\rho\circ f)|\det(J_f)|$. Choosing $\tilde\gamma_L$ as a realization of some minimizer of $\inf_{\nu \in \mathcal P^{\lambda\text{-}\mathrm{curv}}_L(X_2)} D_K(\tilde\mu,\nu)$, we can apply the approximation result for $X_2$ and estimate for $\gamma_L = f \circ \tilde\gamma_L$ that
$$\sup_{\|\varphi\|_{H^s(X_1)} \le 1} \Big| \int_{X_1} \varphi\, d\big((\gamma_L)_*\lambda\big) - \int_{X_1} \varphi\, d\mu \Big| \lesssim \sup_{\|\tilde\varphi\|_{H^s(X_2)} \le 1} \Big| \int_{X_2} \tilde\varphi\, d\big((\tilde\gamma_L)_*\lambda\big) - \int_{X_2} \tilde\varphi\, d\tilde\mu \Big| \lesssim L^{-\beta},$$
where the estimate follows from [67, Theorem 4.3.2]. Note that $L(\gamma_L) \le L(f)L$, which consequently implies the claimed rate on $X_1$.

Remark 5.13. Consider a probability measure $\mu$ on X such that the dimension $d_\mu$ of its support is smaller than the dimension d of X. Then, $\mu$ does not have any density with respect to $\sigma_X$. If $\mathrm{supp}(\mu)$ is itself a $d_\mu$-dimensional connected, compact Riemannian manifold Y without boundary, we switch from X to Y. Sobolev trace theorems and reproducing kernel Hilbert space theory imply that the assumption $H_K(X) = H^s(X)$ leads to $H_{\tilde K}(Y) = H^{\tilde s}(Y)$, where $\tilde K := K|_{Y\times Y}$ is the restricted kernel and $\tilde s = s - (d - d_\mu)/2$, cf. [33]. If, for instance, Y is diffeomorphic to $T^{d_\mu}$ (or $S^{d_\mu}$ with $d_\mu = 2$), and $\mu$ has a positive density $\rho \in H^{\tilde s}(Y)$ with respect to $\sigma_Y$, then Theorem 5.9 (or 5.10) and Proposition 5.12 eventually yield
$$\min_{\nu \in \mathcal P^{\lambda\text{-}\mathrm{curv}}_L(Y)} D_{\tilde K}(\mu,\nu) \lesssim L^{-\frac{\tilde s}{d_\mu-1}}.$$
If $\mathrm{supp}(\mu)$ is a proper subset of Y, then we are able to analyze approximations with $\mathcal P^{\mathrm{a\text{-}curv}}_L(Y)$. First, we observe that the analogue of Proposition 5.12 also holds for the sets $\mathcal P^{\mathrm{a\text{-}curv}}_L(X_1)$, $\mathcal P^{\mathrm{a\text{-}curv}}_L(X_2)$ when the positivity assumption on $\rho$ is replaced with the Lipschitz requirement as in Theorems 5.5 and 5.6.
If, for instance, Y is diffeomorphic to T^{d_µ} or S^{d_µ} and µ has a Lipschitz continuous density ρ ∈ H^{s̃}(Y) with respect to σ_Y, then Theorems 5.5, 5.6 and Proposition 5.12 eventually yield the corresponding rate for the minimum over ν ∈ P^{a-curv}_L(Y).

Discretization
In our numerical experiments we are interested in determining minimizers of (25). If X is a compact Ahlfors d-regular metric space and a length space, every pair of points x, y ∈ X can be connected by a geodesic. Hence, we approximate curves in A_L by piecewise shortest geodesics with N parts and consider the functionals J_{L,N}(γ) := D²_K(µ, γ_* e_N) + ι_{A_{L,N}}(γ). Since ess sup_{t∈[0,1]} |γ̇|(t) = L(γ), the constraint L(γ) ≤ L can be written segment-wise as N dist_X(x_{k−1}, x_k) ≤ L for all k = 1, . . . , N, see (26), which can be rewritten in the computationally more feasible form (27). This discretization is motivated by the next proposition. To this end, recall that a sequence (f_N)_{N∈N} of functions f_N : X → (−∞, +∞] on a metric space X is said to Γ-converge to f : X → (−∞, +∞] if the following two conditions are fulfilled for each x ∈ X, see [10]:
i) (lim inf-inequality) for every sequence x_N → x it holds f(x) ≤ lim inf_{N→∞} f_N(x_N);
ii) (lim sup-inequality) there exists a sequence x_N → x with lim sup_{N→∞} f_N(x_N) ≤ f(x).
The importance of Γ-convergence lies in the fact that every cluster point of minimizers of (f_N)_{N∈N} is a minimizer of f. Note that for non-compact spaces X an additional equi-coercivity condition would be required.
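On the flat torus the segment-wise constraint can be checked directly. The following is a minimal sketch under our own naming conventions (the function names `torus_dist` and `satisfies_lipschitz` are not from the paper); it treats a closed polygonal curve through N points of T² and tests whether the discretized Lipschitz bound N · dist_X(x_{k−1}, x_k) ≤ L holds for every segment:

```python
import numpy as np

def torus_dist(x, y):
    """Geodesic distance on the flat torus T^d = R^d / Z^d."""
    diff = np.abs(x - y) % 1.0
    return np.linalg.norm(np.minimum(diff, 1.0 - diff), axis=-1)

def satisfies_lipschitz(points, L):
    """Check the discretized constraint N * dist(x_{k-1}, x_k) <= L
    for a closed piecewise-geodesic curve through the given points."""
    N = len(points)
    seg = torus_dist(points, np.roll(points, -1, axis=0))
    return bool(np.all(N * seg <= L + 1e-12))

# a square of side length 1/4: each of the N = 4 segments has length 1/4,
# so the constant-speed parametrization has speed N * 1/4 = 1
pts = np.array([[0.0, 0.0], [0.25, 0.0], [0.25, 0.25], [0.0, 0.25]])
assert satisfies_lipschitz(pts, L=1.0)
assert not satisfies_lipschitz(pts, L=0.5)
```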
Proposition 6.1. If X is a compact Ahlfors d-regular metric space and a length space, then the sequence (J L,N ) N ∈N is Γ-convergent with limit J L .
Proof. 1. First, we verify the lim inf-inequality. Let (γ_N)_{N∈N} with lim_{N→∞} γ_N = γ, i.e., it holds sup_{t∈[0,1]} dist_X(γ(t), γ_N(t)) → 0. Excluding the case lim inf_{N→∞} J_{L,N}(γ_N) = ∞ and restricting to a subsequence (γ_{N_k})_{k∈N}, we may assume γ_{N_k} ∈ A_{L,N_k} ⊂ A_L. Since A_L is closed, we directly infer γ ∈ A_L. It holds e_N ⇀ λ, which is equivalent to the convergence of Riemann sums for f ∈ C[0, 1], and hence also γ_{N*} e_N ⇀ γ_* λ. By the weak continuity of D²_K, we obtain the lim inf-inequality.
2. Next, we prove the lim sup-inequality, i.e., we are searching for a sequence (γ_N)_{N∈N} with γ_N → γ and lim sup_{N→∞} J_{L,N}(γ_N) ≤ J_L(γ). First, we may exclude the trivial case J_L(γ) = ∞. Then, γ_N is defined on every interval [(i − 1)/N, i/N], i = 1, . . . , N, as a shortest geodesic from γ((i − 1)/N) to γ(i/N). By construction we have γ_N ∈ A_{L,N}.
In the numerical part, we use the penalized form of (27) and minimize

Numerical Algorithm
In order to minimize (29), we take a closer look at the discrepancy term. By (6) and (7), the discrepancy can be represented in two different ways. Both formulas have pros and cons: the first allows for an exact evaluation only if the expressions Φ(x) := ∫_X K(x, y) dµ(y) and ∫_X Φ dµ can be written in closed form; in this case the complexity scales quadratically in the number of points N. The second formula allows for exact evaluation only if the kernel has a finite expansion (3); in that case the complexity scales linearly in N.
Our approach is to use kernels fulfilling H_K(X) = H^s(X), s > d/2, and to approximate them by their truncated representation with respect to the eigenfunctions of the Laplace–Beltrami operator, K_r(x, y) := Σ_{k∈I_r} α_k ϕ_k(x) ϕ_k(y), where I_r := {k : ϕ_k ∈ Π_r(X)}.
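For a kernel with such a finite expansion, both representations of the discrepancy can be evaluated and compared directly. The following is a minimal sketch on the one-dimensional torus with eigenfunctions ϕ_k(x) = exp(2πikx), hypothetical weights α_k = (1 + k²)^{−s}, and µ the Lebesgue measure (so that μ̂_k = δ_{k,0}); the function names are ours, not from any reference implementation:

```python
import numpy as np

s, r = 1.5, 16
ks = np.arange(-r, r + 1)
alpha = (1.0 + ks.astype(float) ** 2) ** (-s)  # hypothetical Sobolev-type weights

def disc2_kernel_sum(x):
    """O(N^2) evaluation: D^2 = int int K dmu dmu - (2/N) sum_j Phi(x_j)
    + (1/N^2) sum_{j,j'} K(x_j, x_{j'}). For Lebesgue mu both leading terms
    equal alpha_0, so D^2 = (1/N^2) sum_{j,j'} K(x_j, x_{j'}) - alpha_0."""
    N = len(x)
    diff = x[:, None] - x[None, :]
    Kmat = np.real(np.exp(2j * np.pi * np.outer(diff.ravel(), ks)) @ alpha).reshape(N, N)
    return Kmat.sum() / N**2 - alpha[ks == 0][0]

def disc2_fourier(x):
    """O(N * |I_r|) evaluation: D^2 = sum_k alpha_k |mu_hat_k - nu_hat_k|^2."""
    nu_hat = np.exp(-2j * np.pi * np.outer(ks, x)).mean(axis=1)
    mu_hat = (ks == 0).astype(float)
    return float(np.sum(alpha * np.abs(mu_hat - nu_hat) ** 2))

# empirical measure nu = (1/N) sum_j delta_{x_j}: both formulas agree
x = np.random.default_rng(0).random(50)
assert np.isclose(disc2_kernel_sum(x), disc2_fourier(x))
```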

Algorithm 1 (CG Method with Restarts). Parameters: maximal iterations k_max ∈ N. Input: twice differentiable function F : X^N → R. Output: sequence of iterates x^(1), x^(2), . . . ∈ X^N.

Then, we finally aim to minimize (30), where α > 0. Our algorithm of choice is the nonlinear conjugate gradient (CG) method with Armijo line search as outlined in Algorithm 1 with the notation described below, see [22] for Euclidean spaces. The proposed method is of "exact conjugacy" and uses the second order derivative information provided by the Hessian. For the Armijo line search itself, the sophisticated initialization in Algorithm 2 is used, which also incorporates second order information via the Hessian. The main advantage of the CG method is its simplicity together with fast convergence at low computational cost. Indeed, under suitable assumptions the sequence produced by Algorithm 1 converges superlinearly, more precisely dN-step quadratically, towards a local minimum.

Remark 7.1. The objective in (30) violates the smoothness requirements whenever consecutive curve points coincide, since the distance function is not differentiable at x = y. However, we observe numerically that local minimizers of (30) do not belong to this set of measure zero. This means in turn, if a local minimizer has a positive definite Hessian, then there is a local neighborhood where the CG method (with exact line search) permits a superlinear convergence rate. We indeed observe this behavior in our numerical experiments.
Let us briefly comment on Algorithm 1 for X ∈ {T^2, T^3, S^2, SO(3), G_{2,4}}, which are considered in our numerical examples. By γ_{x,d} we denote the geodesic with γ_{x,d}(0) = x and γ̇_{x,d}(0) = d.

Algorithm 2 (Armijo Line Search). Parameters: 0 < µ < 1/2, 0 < τ < 1, maximal iterations k_max ∈ N. Input: smooth function F.

Table 1: References for implementation details of Alg. 1 (left) and arithmetic complexity for the evaluations per iteration for the different manifolds (right), e.g., O(r^4 log^2(r) + N) for G_{2,4}, cf. [24].

Besides the evaluation of geodesics γ_{x^(k),d^(k)}(α^(k)) in the first iteration step, we have to compute the parallel transport of d^(k) along the geodesics in the second step. Furthermore, we need to compute the Riemannian gradient ∇_{X^N} F and products of the Hessian H_{X^N} F with vectors d, which are approximated by finite differences of the gradient. The computation of the gradient of the penalty term in (30) is straightforward and its evaluation at a point in X^N requires only O(N) arithmetic operations. Note that for x ↦ dist_X(x, y) we have ∇_X dist_X(x, y) = log_x y / dist_X(x, y) for x ≠ y with the logarithmic map log_x on X, while the distance is not differentiable at x = y, see Remark 7.1. The analytic computation of the Riemannian gradient of the data term in (30) can essentially be realized via the gradients of the eigenfunctions ϕ_k of the Laplace–Beltrami operator. Then, the evaluation of the gradient of the whole data term at given points can be done efficiently by fast Fourier transform (FFT) techniques at nonequispaced nodes using the NFFT software package of Potts et al. [47]. The overall complexity of the algorithm and references for the computational details for the above manifolds are given in Table 1.
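The Euclidean skeleton of Algorithms 1 and 2 can be sketched as follows. This is a minimal sketch under simplifying assumptions: we work in flat R^n instead of X^N, so geodesic steps and parallel transport reduce to vector addition, and the Hessian-vector products are replaced by the finite differences of the gradient mentioned above. All names and tolerances are ours, not from the reference implementations in Table 1:

```python
import numpy as np

def hess_vec(grad, x, d, h=1e-6):
    """Finite-difference Hessian-vector product H(x) d ~ (grad(x+h d) - grad(x)) / h."""
    return (grad(x + h * d) - grad(x)) / h

def cg_armijo(F, grad, x, iters=100, mu=0.25, tau=0.5):
    d = -grad(x)
    for _ in range(iters):
        g = grad(x)
        if np.linalg.norm(g) < 1e-10:
            break
        if g @ d >= 0:              # restart with steepest descent if needed
            d = -g
        Hd = hess_vec(grad, x, d)
        denom = d @ Hd
        # Algorithm-2-style initialization of the step size via the Hessian
        t = -(g @ d) / denom if denom > 0 else 1.0
        while F(x + t * d) > F(x) + mu * t * (g @ d):  # Armijo backtracking
            t *= tau
        x = x + t * d
        g_new = grad(x)
        beta = (g_new @ Hd) / denom if denom > 0 else 0.0
        d = -g_new + beta * d       # "exact conjugacy" update via Hessian information
    return x

# usage: a strictly convex quadratic with minimizer (1, 2)
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = A @ np.array([1.0, 2.0])
F = lambda x: 0.5 * x @ A @ x - b @ x
grad = lambda x: A @ x - b
x_min = cg_armijo(F, grad, np.zeros(2))
assert np.allclose(x_min, [1.0, 2.0], atol=1e-6)
```

On a manifold, the update x + t d is replaced by the geodesic γ_{x,d}(t) and d is parallel transported along it before the conjugacy update.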

Numerical Results
In this section, we underline our theoretical results by numerical examples. We start by studying the parameter choice in our numerical model. Then, we provide examples for the approximation of absolutely continuous measures with densities in H s (X), s > d/2, by push-forward measures of the Lebesgue measure on [0, 1] by Lipschitz curves for the manifolds X ∈ {T 2 , T 3 , S 2 , SO(3), G 2,4 }. Supplementary material can be found on our webpage.

Parameter choice
In order to find reasonable (local) solutions of (25), we carefully adjust the parameters in problem (30), namely the number of points N, the polynomial degree r in the kernel truncation, and the penalty parameter α. Suppose that dim(supp(µ)) = d ≥ 2.

iii) Penalty parameter α: Intuitively, the minimizers of (30) should treat both terms almost equally, i.e., for N → ∞ both terms are of the same order. Hence, our heuristic is to choose the parameter α such that the minimal value of the discrepancy term and the value of the penalty term balance. On the other hand, assuming that for the length ℓ(γ) = Σ_{k=1}^N dist_X(x_{k−1}, x_k) of a minimizer γ we have ℓ(γ) ∼ L ∼ N^{(d−1)/d}, so that N dist_X(x_{k−1}, x_k) ∼ L, the order of the penalty term can be read off as well.

Hence, a reasonable choice of α is obtained by balancing these two orders.
Remark 8.1. In view of Remark 5.13, the relations in i)-iii) adapt accordingly. In the rest of this subsection, we aim to provide some numerical evidence for the parameter choice above. We restrict our attention to the torus X = T^2 and the kernel K given in (33) with d = 2 and s = 3/2. Choose µ as the Lebesgue measure on T^2. From (31), we should keep in mind α ∼ N^{−5/2} ∼ L^{−5}.
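For concreteness, the heuristic α ∼ N^{−5/2} with the constants used in the T^2 experiments below (base value 0.1 at N = 10) can be written as a small helper; the function name and keyword defaults are ours:

```python
# Penalty-parameter schedule for the T^2 experiment (d = 2, s = 3/2):
# alpha = alpha0 * (N / N0)^(-5/2), matching alpha ~ N^(-5/2) ~ L^(-5).
def penalty_parameter(N, alpha0=0.1, N0=10, exponent=-2.5):
    return alpha0 * (N / N0) ** exponent

# doubling N divides alpha by 2^(5/2)
assert abs(penalty_parameter(20) - 0.1 * 2.0 ** -2.5) < 1e-15
assert abs(penalty_parameter(40) / penalty_parameter(20) - 2.0 ** -2.5) < 1e-15
```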
Influence of N and α. We fix L = 4 and a large polynomial degree r = 128 for truncating the kernel. For each α_i = 0.1 · 2^{−5i/2}, i = 1, . . . , 4, we compute local minimizers with N_j = 10 · 2^j, j = 1, . . . , 4. More precisely, keeping α_i fixed we start with N_1 = 20 and successively refine the curves by inserting the midpoints of the line segments connecting consecutive points and applying a local minimization with this initialization. The results are depicted in Figure 1. For fixed α (fixed row) we clearly notice that the local minimizers converge towards a smooth curve for increasing N. Moreover, the diagonal corresponds to the choice α = 0.1(N/10)^{−5/2}, and we already observe good approximation of the curves emerging to the right of it. This provides some evidence that the choice of the penalty parameter α and of the number of points N discussed above is reasonable. Indeed, for α → ∞ we observe L(γ) → ℓ(γ) → L = 4.
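The successive refinement by midpoint insertion can be sketched as follows for a closed polygonal curve on T^2; on the flat torus the midpoint is taken along the shorter of the two arcs connecting consecutive points. This is a minimal sketch under our own naming conventions:

```python
import numpy as np

def torus_midpoint(x, y):
    """Midpoint of the shortest geodesic between x and y on T^d = [0,1)^d."""
    diff = (y - x + 0.5) % 1.0 - 0.5   # representative of y - x in [-1/2, 1/2)^d
    return (x + 0.5 * diff) % 1.0

def refine(points):
    """Insert midpoints between consecutive points of a closed curve (doubles N)."""
    mids = torus_midpoint(points, np.roll(points, -1, axis=0))
    out = np.empty((2 * len(points), points.shape[1]))
    out[0::2] = points
    out[1::2] = mids
    return out

pts = np.array([[0.0, 0.0], [0.25, 0.0], [0.25, 0.25], [0.0, 0.25]])
refined = refine(pts)
assert refined.shape == (8, 2)
assert np.allclose(refined[1], [0.125, 0.0])  # midpoint of the first segment
```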
Influence of the polynomial degree r. In Figure 2 we illustrate the local minimizers of (30) for fixed Lipschitz parameters L_i = 2^i and corresponding α_i = 0.2 · L_i^{−5}, i = 1, . . . , 4 (rows), in dependence on the polynomial degrees r_j = 8 · 2^j, j = 1, . . . , 5 (columns). According to the previous experiments, it seems reasonable to choose N = 20L^2. Note that the (numerical) choice of α leads to curves of length ℓ(γ) ≈ 2L. In Figure 2 we observe that for r = cL the corresponding local minimizers share common features. For instance, if c = 4 (i.e., r ≈ 2ℓ(γ)) the minimizers consist mostly of vertical and horizontal line segments. Furthermore, for fixed r it appears that the length of the curves increases linearly with L until L exceeds 2r, after which it remains essentially unchanged. This observation can be explained by the fact that there are curves of length bounded by cr which provide exact quadrature for polynomials of degree r.

Quasi-optimal curves on special manifolds
In this subsection, we give numerical examples for X ∈ {T^2, T^3, S^2, SO(3), G_{2,4}}. Since the objective function in (30) is highly non-convex, the main problem is to find nearly optimal curves γ_L ∈ P^{λ-curv}_L(X) for increasing L. Our heuristic is as follows: i) We start with a curve γ_{L_0} : [0, 1] → X of small length ℓ(γ_{L_0}) ≈ L_0 and solve problem (30) for increasing L_i = cL_{i−1}, c > 1, where we choose the parameters N_i, α_i and r_i in dependence on L_i as described in the previous subsection. In each step a local minimizer is computed using the CG method with 100 iterations. Then, the obtained minimizer γ_i, refined by inserting the midpoints, serves as the initial guess in the next step.
ii) In case that the resulting curves γ_i have non-constant speed, each is refined by increasing α_i and N_i. Then, the resulting problem is again solved with the CG method (with restarts).

The following examples show that this recipe indeed enables us to compute "quasi-optimal" curves, meaning that the obtained minimizers exhibit the optimal decay of the discrepancy.
2d-Torus T^2. In this example we illustrate how well a gray-valued image (considered as a probability density) can be approximated by an almost constant speed curve. The original image of size 170 × 170 is depicted in the bottom-right corner of Figure 3. Its Fourier coefficients μ̂_{k_1,k_2} are computed by a discrete Fourier transform (DFT) using the FFT algorithm and normalized appropriately. The kernel K is given by (33) with d = 2 and s = 3/2.
We start with N_0 = 96 points on a circle. Then, we apply our procedure for i = 0, . . . , 11 with parameters chosen such that the length of the local minimizer γ_i satisfies ℓ(γ_i) ≈ 2^{(i+5)/2} and the maximal speed is close to L_i.
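The exact initialization formula is not reproduced here; a generic initialization of this kind, with hypothetical center (1/2, 1/2) and radius 1/4 (both values are illustrative, not from the paper), places N_0 equispaced points on a circle in [0, 1)^2:

```python
import numpy as np

# Hypothetical circle initialization on T^2: N0 equispaced points on a circle;
# center and radius are illustrative placeholders.
def circle_init(N0=96, center=(0.5, 0.5), radius=0.25):
    t = 2 * np.pi * np.arange(N0) / N0
    return np.stack([center[0] + radius * np.cos(t),
                     center[1] + radius * np.sin(t)], axis=1)

x0 = circle_init()
assert x0.shape == (96, 2)
assert np.allclose(np.linalg.norm(x0 - [0.5, 0.5], axis=1), 0.25)
```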
To get nearly constant speed curves γ_i, see ii), we increase α_i by a factor of 100 and N_i by a factor of 2, and set L_i := 2^{(i+5)/2}. Then, we apply the CG method with at most 100 iterations and i restarts. The results are depicted in Figure 3. Note that the complexity for the evaluation of the function in (30) scales roughly as N ∼ L^2. In Figure 4 we observe that the decay rate of the squared discrepancy D²_K(µ, ν) in dependence on the Lipschitz constant L indeed matches the theoretical findings of Theorem 5.9.
3d-Torus T^3. The aim of this example is two-fold. First, it shows that the algorithm works well in three dimensions. Second, we are able to approximate any compact surface in three-dimensional space by a curve. We construct a measure µ supported around a two-dimensional surface by taking samples from Spock's head and placing small Gaussian peaks at the sampling points, i.e., the density is a normalized sum of Gaussians centered at the points of the discrete sampling set S ⊂ [−1/2, 1/2]^3. From a numerical point of view it holds dim(supp(µ)) = 2. The Fourier coefficients are again computed by a DFT and the kernel K is given by (33) with d = 3 and s = 2 so that H_K = H^2(T^3).
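A density of this type can be sketched as follows; the bandwidth σ and the two-point sample set are placeholders, and for simplicity only the nearest periodic image of each Gaussian is kept:

```python
import numpy as np

# Sketch of a density on T^3 = [-1/2, 1/2)^3: a sum of Gaussian peaks centered
# at a discrete sample set S. sigma is a placeholder bandwidth and only the
# nearest periodic image contributes (a valid truncation for small sigma).
def density(x, S, sigma=0.05):
    diff = (x[None, :] - S + 0.5) % 1.0 - 0.5   # shortest periodic differences
    d2 = np.sum(diff ** 2, axis=1)
    vals = np.exp(-d2 / (2 * sigma ** 2))
    return vals.sum() / (len(S) * (2 * np.pi * sigma ** 2) ** 1.5)

S = np.array([[0.0, 0.0, 0.0], [0.25, -0.25, 0.1]])   # placeholder sample set
# the density is largest near the peaks and nearly zero far away
assert density(np.array([0.0, 0.0, 0.0]), S) > density(np.array([0.4, 0.4, 0.4]), S)
```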
Then, we apply our procedure for i = 0, . . . , 8 with parameters chosen according to Remark 8.1.

Figure 4: The squared discrepancy between the measure of Figure 3 and the computed local minimizers (black dots) on T^2 in log-scale. The blue line corresponds to the optimal decay rate in Theorem 5.9.

To get nearly constant speed curves γ_i, we increase α_i by a factor of 100 and N_i by a factor of 2, and set L_i := 2^{(i+6)/2}. Then, we apply the CG method with at most 100 iterations and one restart to the previously found curve γ_i. The results are illustrated in Figure 5. Note that the complexity for the evaluation of the function in (30) scales roughly as N^{3/2} ∼ L^3. In Figure 6 we depict the squared discrepancy D²_K(µ, ν) of the computed curves. For small Lipschitz constants, say L(γ) ≤ 50, we observe a decrease of approximately L(γ)^{−3}, which matches the optimal decay rate for measures supported on surfaces as discussed in Remark 5.13.
2-Sphere S^2. Next, we approximate a gray-valued image on the sphere S^2 by an almost constant speed curve. The image represents the earth's elevation data provided by MATLAB, given by samples ρ_{i,j}, i = 1, . . . , 180, j = 1, . . . , 360, on the grid
x_{i,j} := (sin(iπ/180) sin(jπ/180), sin(iπ/180) cos(jπ/180), cos(iπ/180)).
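This grid can be generated directly; the assertion checks that all nodes lie on the unit sphere:

```python
import numpy as np

# The spherical sampling grid x_{i,j} of the elevation-data example.
i = np.arange(1, 181)[:, None] * np.pi / 180   # polar angles
j = np.arange(1, 361)[None, :] * np.pi / 180   # azimuthal angles
x = np.stack([np.sin(i) * np.sin(j),
              np.sin(i) * np.cos(j),
              np.cos(i) * np.ones_like(j)], axis=-1)  # shape (180, 360, 3)
assert x.shape == (180, 360, 3)
assert np.allclose(np.linalg.norm(x, axis=-1), 1.0)
```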
The Fourier coefficients are computed by discretizing the Fourier integrals, followed by a suitable normalization such that μ̂⁰₀ = 1. The corresponding sums are efficiently computed by an adjoint nonequispaced fast spherical Fourier transform (NFSFT), see [58]. The kernel K is given by (35). Similar to the previous examples, we apply our procedure for i = 0, . . . , 12. To get nearly constant speed curves, we increase α_i by a factor of 100 and N_i by a factor of 2, and set L_i := L_0 2^{i/2}. Then, we apply the CG method with at most 100 iterations and one restart to the previously constructed curves γ_i. The results for i = 6, 8, 10, 12 are depicted in Figure 7. Note that the complexity for the evaluation of the function in (30) scales roughly as N ∼ L^2. In Figure 8 we observe that the decay rate of the squared discrepancy D²_K(µ, ν) in dependence on the Lipschitz constant indeed matches the theoretical findings in Theorem 5.10.
Rotation group SO(3). Step ii) appears not to be necessary here. Note that the complexity for the evaluation of the function in (30) scales roughly as N ∼ L^{3/2}.
Setting q := (cos(α/2), sin(α/2) r) ∈ S^3 with r ∈ S^2 and α ∈ [0, 2π], see (21), we observe that the same rotation is generated by −q = (cos((2π−α)/2), sin((2π−α)/2)(−r)) ∈ S^3; in other words, SO(3) ≅ S^3/{±1}. Then, by applying the stereographic projection π(q) = (q_2, q_3, q_4)/(1 + q_1) we map the upper hemisphere onto the three-dimensional unit ball. Note that the equator of S^3 is mapped onto the sphere S^2, hence on the surface of the ball antipodal points have to be identified. In other words, the rotation R(α, r) is plotted as the point π(q) = tan(α/4) r. In Figure 10 we observe that the decay rate of D²_K(µ, ν) in dependence on the Lipschitz constant L matches the theoretical findings in Corollary 5.7.
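The visualization map just described can be sketched in a few lines (the function name is ours): axis-angle data (α, r) is converted to a quaternion, the representative with q_1 ≥ 0 is chosen, and the stereographic projection sends it into the unit ball:

```python
import numpy as np

# SO(3) visualization: axis-angle (alpha, r) -> quaternion q in the upper
# hemisphere of S^3 -> stereographic projection pi(q) = (q2, q3, q4)/(1 + q1).
def rotation_to_ball(alpha, r):
    r = np.asarray(r, dtype=float)
    q = np.concatenate([[np.cos(alpha / 2)], np.sin(alpha / 2) * r])
    if q[0] < 0:          # q and -q encode the same rotation; pick q1 >= 0
        q = -q
    return q[1:] / (1.0 + q[0])

# the identity rotation maps to the center of the ball
assert np.allclose(rotation_to_ball(0.0, [0.0, 0.0, 1.0]), 0.0)
# consistent with pi(q) = tan(alpha/4) * r
assert np.allclose(rotation_to_ball(np.pi, [1.0, 0.0, 0.0]),
                   [np.tan(np.pi / 4), 0.0, 0.0])
```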
The 4-dimensional Grassmannian G_{2,4}. Here, we aim to approximate the Haar measure of the Grassmannian G_{2,4} by a curve of almost constant speed. Since this curve samples G_{2,4} quite evenly, it could be used for the grand tour, a technique to analyze high-dimensional data via their projections onto two-dimensional subspaces, cf. [3].
The kernel K is given by (37), and µ is the Haar measure on G_{2,4}.
Here, we use a CG method with 100 iterations and one restart.
Step ii) appears not to be necessary. Note that the complexity for the evaluation of the function in (30) scales roughly as N ∼ L^{3/2}. The computed curves are illustrated in Figure 11, where we use the following visualization. By Remark A.1 there is an isometric one-to-one mapping P : S^2 × S^2/{±1} → G_{2,4}. Using this relation we plot the point P(u, v) ∈ G_{2,4} by the two antipodal points z_1 = u + v, z_2 = −u − v ∈ R^3 together with the RGB color-coded vectors ±u, more precisely R = (1 ∓ u_1)/2, G = (1 ∓ u_2)/2, B = (1 ∓ u_3)/2.³ This means a curve γ(t) ∈ G_{2,4} intersects itself only if the corresponding curve z(t) ∈ R^3 intersects itself and has the same colors at the intersection point. In Figure 12 we observe that the decay rate of the squared discrepancy D²_K(µ, ν) in dependence on the Lipschitz constant L indeed matches the theoretical findings in Theorem 5.11.

³Note that the decomposition of z ∈ R^3 with 0 < |z| < 2 into u and v is not unique: there is a one-parameter family of points u_s, v_s ∈ S^2 such that z = u_s + v_s. The point z = 0 has a two-dimensional ambiguity v = −u, u ∈ S^2, and each point z ∈ 2S^2 has the unique pre-image v = u = z/2.

Figure 12: The squared discrepancy between the Haar measure µ and the computed local minimizers (black dots) in log-scale. The blue line corresponds to the optimal decay rate, cf. Theorem 5.11.

A. Special Manifolds
In this section, we introduce the main examples which are addressed in the numerical part. The measure σ_X is always the normalized Riemannian measure on the manifold X. Note that for simplicity of notation all eigenspaces are complex in this section. We are interested in the following special manifolds.

For X = S^d, the spherical harmonics Y^k_l serve as corresponding orthonormal eigenfunctions of the Laplace–Beltrami operator [56]. The span of eigenfunctions with eigenvalues ≤ r(r + d − 1) is given by Π_r(S^d) := span{Y^k_l : k = 0, . . . , r, l = 1, . . . , Z(d, k)}.
Example 4: X = G_{2,4}. For integers 1 ≤ s < r, the (s, r)-Grassmannian is the collection of all s-dimensional linear subspaces of R^r and carries the structure of a closed Riemannian manifold. By identifying a subspace with the orthogonal projector onto it, the Grassmannian becomes G_{s,r} := {x ∈ R^{r×r} : x^⊤ = x, x^2 = x, rank(x) = s}.
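The three defining projector properties are easy to verify numerically. The following sketch builds the orthogonal projector onto a random 2-dimensional subspace of R^4 and checks symmetry, idempotence, and rank:

```python
import numpy as np

# G_{2,4} as orthogonal projectors: construct the projector onto the span of
# two random vectors in R^4 and verify the defining properties.
rng = np.random.default_rng(1)
A = rng.standard_normal((4, 2))
Q, _ = np.linalg.qr(A)        # orthonormal basis of the column span
P = Q @ Q.T                   # orthogonal projector onto the subspace
assert np.allclose(P, P.T)    # x^T = x
assert np.allclose(P @ P, P)  # x^2 = x
assert np.linalg.matrix_rank(P) == 2
```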
Remark A.1. It is well known that S^2 × S^2 is a double covering of G_{2,4}. More precisely, there is an isometric one-to-one mapping P : S^2 × S^2/{±1} → G_{2,4}, cf. [24]. Moreover, the ϕ^λ_l are essentially tensor products of spherical harmonics, which enables transferring the nonequispaced fast Fourier transform from S^2 × S^2 to G_{2,4}, see [24] for details.