Galerkin–Chebyshev approximation of Gaussian random fields on compact Riemannian manifolds

A new numerical approximation method for a class of Gaussian random fields on compact connected oriented Riemannian manifolds is introduced. This class of random fields is characterized by the Laplace–Beltrami operator on the manifold. A Galerkin approximation is combined with a polynomial approximation using Chebyshev series. This so-called Galerkin–Chebyshev approximation scheme yields efficient and generic sampling algorithms for Gaussian random fields on manifolds. Strong and weak orders of convergence for the Galerkin approximation and strong convergence orders for the Galerkin–Chebyshev approximation are shown and confirmed through numerical experiments.


Introduction
Models for random fields defined on manifolds are of key importance in many application areas such as environmental sciences, geosciences and cosmological data analysis [40]. While one area of interest is inference based on actual data that lie on surfaces, we focus in this work on the modeling and sampling of such random fields, which is needed first. More specifically, we propose a generic approach to define and numerically approximate a particular class of Gaussian random fields on (compact) Riemannian manifolds in a computationally efficient manner.
The main contributions of this work are the following. First, we propose a general approach to model and discretize a class of Gaussian random fields Z defined on compact connected oriented Riemannian manifolds M via functions of the Laplace–Beltrami operator −∆_M of the manifold. We define the random field Z through a series expansion, and derive a finite-dimensional approximation Z_n on any finite-dimensional function space V_n, e.g. a finite element space, and not necessarily the spectral representation of the series expansion. To do so, we use (functions of) the Galerkin approximation of −∆_M on V_n. This approximation of the field allows us to give a closed form for the covariance matrix of the coefficients in the basis representation of Z_n, and hence an explicit way to sample these correlated random coefficients. Secondly, we propose an approximation of the discretized field Z_n based on Chebyshev polynomials, which allows these coefficients to be sampled in a computationally efficient manner. Finally, we show convergence in mean-square and in covariance of Z_n to Z and give the associated convergence rates.
We also derive a convergence result for the root-mean-squared error induced by the Chebyshev approximation. This approach, which we call Galerkin–Chebyshev approximation, provides efficient and scalable algorithms for computing samples of the discretized field. For instance, when defining the discretized field using a linear finite element space of dimension n, we obtain sampling costs that scale linearly with n and with the order of the considered Chebyshev polynomial approximation, and storage costs that scale linearly with n. In particular, computational costs of essentially O(ε^(−2/ρ)) are then required to sample, with accuracy ε > 0, Gaussian random fields with a Matérn covariance function on a two-dimensional manifold (where ρ denotes the rate at which the root-mean-squared error between the random field and its discretization converges to zero).
So far the focus of the literature on random fields on manifolds has been on the sphere. Extensive literature on the definition, properties, and efficient use of random fields on the sphere is available (see [40] for a review). A first simulation approach aims at characterizing valid covariance functions on the sphere that model the correlation between two points using the arc length distance separating them [25,29]. A second approach relies on the fact that stationary Gaussian random fields on the sphere have a basis expansion with respect to the spherical harmonic functions [31]. The resulting Karhunen–Loève expansion is used to derive simulation methods and to characterize the covariance structure of the resulting fields [15,21,35,36,40]. Finally, models have also been proposed to deal with both space-time data [45] and anisotropy [22] on the sphere. Discretization methods that do not rely on Karhunen–Loève expansions include, for instance, methods using the existence of Parseval frames on the sphere [3] or relying on a regular discretization of the sphere, Markov properties, and fast Fourier transforms [17].
However, the work done for random fields on a sphere hardly generalizes to other spatial domains, as these methods heavily rely on the intrinsic properties of the sphere as a surface, and on the spherical harmonics. If random fields on more general manifolds are of interest, Adler and Taylor [1] provide a review of the theory used to define them, primarily focused on their geometry and excursion sets. The goal of this work is to propose and analyze an approach that generalizes the expansion approach on the sphere and results in efficient algorithms for sampling Gaussian random fields on a manifold. Examples of samples of the resulting fields on different manifolds are shown in Figure 1 and illustrate the flexibility of the approach, since it can be applied to widely different domains.
Our approach extends previous methods proposed for the numerical approximation of Gaussian random fields defined on manifolds. Several authors have worked on the approximation of Gaussian random fields seen as solutions to stochastic partial differential equations (SPDEs), and in particular Whittle–Matérn fields, which were popularized by Lindgren et al. [38]. A quadrature approximation allowed them to derive numerical approximations of such fields defined on bounded Euclidean domains [6,7] and even compact metric spaces [28]. This approach requires solving multiple (large but sparse) linear systems in order to generate samples of the random fields, and work has been done to find suitable and efficient preconditioners to tackle them [26]. In contrast, our approach does not rely on the random field being the solution of some SPDE (since we do not require the function of −∆_M to be invertible), but still includes Whittle–Matérn fields as a particular case. Moreover, the use of a Chebyshev polynomial approximation makes it possible in some cases to avoid solving any linear system while generating samples.
The idea of using functions of the Laplacian to model Gaussian random fields on manifolds was recently investigated by Borovitskiy et al. [10] and Borovitskiy et al. [11]. Contrary to Borovitskiy et al. [10], our approach does not require an explicit approximation of the eigenvalues and eigenfunctions of the Laplace–Beltrami operator. Besides, we propose a convergence analysis, both in mean-square and in covariance, of the approximations we propose. This analysis extends to the approximations in [10], as they can be seen as a particular instance of our more general framework. Finally, our work provides a theoretical justification for the use of functions of Laplacian matrices to model Gaussian fields on graphs, as proposed in Borovitskiy et al. [11]. Indeed, such matrices arise naturally when examining the discretization of random fields [43].
The outline of this paper is as follows. In Section 2, we present some background material on functional analysis on Riemannian manifolds and the class of Gaussian random fields considered in this work. Section 3 is devoted to the Galerkin approximation of these random fields. Then, in Section 4, we introduce the Chebyshev polynomial approximation used to numerically compute the weights of the Galerkin-discretized random fields. In Section 5 we present the convergence analysis of the Galerkin and Chebyshev approximations and give the corresponding error estimates, and in Section 6 we analyze the computational complexity and storage required to generate samples of a random field using its Galerkin–Chebyshev approximation. Finally, in Section 7, we confirm the error estimates through numerical experiments on the sphere and a hyperboloid.
Throughout the paper, we denote by I the identity matrix, and for any a, b ∈ N_0 we write [[a, b]] = {a, ..., b} if a ≤ b, and adopt the convention [[a, b]] = ∅ if a > b. The entries of a vector u ∈ R^n are denoted by u_1, ..., u_n, and the entries of a matrix A ∈ R^(n×n) are denoted by A_ij, 1 ≤ i, j ≤ n. If X is a Gaussian vector with mean µ and covariance matrix Σ, we write X ∼ N(µ, Σ). Finally, for any two functions f and g depending on some argument x ∈ R, and for a ∈ {0, +∞}, we write f(x) = O(g(x)) if f is asymptotically bounded by g as x → a, i.e. if there exists some constant M_a independent of x such that |f(x)| ≤ M_a |g(x)| as x → a.
Functional analysis background and random fields on manifolds

2.1. Laplace–Beltrami operator on a compact Riemannian manifold. We first introduce a few notions of Riemannian geometry, and refer the interested reader to [4,32,34] and the references therein for a more in-depth introduction to the subject.
Let (M, g) be a compact connected oriented Riemannian manifold of dimension d ≥ 1, such that M has either a smooth boundary ∂M or no boundary at all (∂M = ∅). In local coordinates (x_1, ..., x_d), the gradient of a smooth function f : M → R is given by

∇f = Σ_{i,j=1}^d g^{ij} ∂_j(f ∘ φ^{−1}) ∂_i,

where φ denotes the local chart associated with the coordinates and (g^{ij})_{1≤i,j≤d} is the inverse of the metric tensor g = (g_{ij})_{1≤i,j≤d}. Similarly, the Laplace–Beltrami operator −∆_M acting on smooth functions is given in local coordinates by

−∆_M f = −|g|^{−1/2} Σ_{i,j=1}^d ∂_i( |g|^{1/2} g^{ij} ∂_j(f ∘ φ^{−1}) ),

where |g| is the determinant of the metric tensor g. Note in particular that both definitions are independent of the choice of local charts and associated local coordinates. Let dv_g denote the canonical measure of (M, g), which is given by

dv_g = |g|^{1/2} dx_1 ⋯ dx_d,

where dx_1 ⋯ dx_d denotes the standard Lebesgue measure on R^d. We denote by H = L²(M, g) the space of square-integrable functions on (M, g), which is defined as

H = L²(M, g) = { f : M → R measurable : ∫_M f² dv_g < +∞ }.

In particular, H is a Hilbert space when equipped with the inner product (·, ·)_0 defined by

(f, h)_0 = ∫_M f h dv_g,  f, h ∈ H,

and we denote by ‖·‖_0 the norm associated with this inner product.
Consider the eigenvalue problem

−∆_M e = λ e on M,

with Dirichlet or (homogeneous) Neumann boundary conditions whenever ∂M ≠ ∅. A standard result of spectral theory [34, Theorem 4.3.1] states that this problem admits solutions in the form of a set of eigenpairs (λ_k, e_k)_{k∈N}, where λ_k ≥ 0, such that each eigenvalue has a finite multiplicity, the eigenspaces corresponding to distinct eigenvalues are H-orthogonal, and the direct sum of the eigenspaces is dense in H. Hence this theorem provides a decomposition of any function f ∈ H into an orthonormal basis {e_k}_{k∈N} of eigenfunctions of −∆_M, as

f = Σ_{k∈N} (f, e_k)_0 e_k,

where the equality is understood in the L²-sense. Without loss of generality, we assume in the remainder of this paper that the eigenpairs of −∆_M are ordered so that 0 ≤ λ_1 ≤ λ_2 ≤ ⋯. In particular we have λ_1 = 0 whenever ∂M = ∅ or Neumann boundary conditions are considered, and λ_1 > 0 when Dirichlet boundary conditions are considered [34, Proposition 4.5.6]. Hence, in this work, the multiplicity M_0 of the eigenvalue 0 satisfies M_0 ∈ {0, 1}. The following can be stated about the growth rate of the eigenvalues.

Proposition 2.1 (Weyl's asymptotic law). For α = 2/d, there exist constants c_λ > 0 and C_λ > 0 such that the positive eigenvalues {λ_j}_{j>M_0} satisfy

c_λ j^α ≤ λ_j ≤ C_λ j^α.

This property is a direct consequence of Weyl's asymptotic formula, which holds for connected compact Riemannian manifolds of dimension d; the constants c_λ and C_λ depend on d and on the volume of the manifold [34, Theorem 7.6.4].
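As a quick sanity check of Proposition 2.1, the bounds can be verified numerically in a case where the spectrum is known in closed form. The following sketch (plain Python; the sphere S², the truncation at 60 frequency levels, and the bound values printed are illustrative choices) confirms that the ratios λ_j / j^α stay within constant bounds:

```python
# Weyl's law on the unit sphere S^2: d = 2, so alpha = 2/d = 1, and the
# Laplace-Beltrami eigenvalues are l(l + 1), each with multiplicity 2l + 1.
eigvals = []
for l in range(60):
    eigvals.extend([l * (l + 1)] * (2 * l + 1))  # already non-decreasing

# Skip the zero eigenvalue (multiplicity M_0 = 1 on this closed manifold) and
# check that lambda_j / j^alpha is bounded away from 0 and from infinity.
ratios = [lam / j for j, lam in enumerate(eigvals, start=1) if lam > 0]
print(min(ratios), max(ratios))  # here c_lambda = 0.5 and C_lambda = 1.2 work
```

The ratios stabilize around 1, in line with Weyl's asymptotic formula.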

2.2. Function spaces on a compact Riemannian manifold. The Sobolev space H¹ is defined as the completion of C^∞(M) with respect to the norm ‖·‖_{H¹} defined by

‖f‖_{H¹}² = ‖f‖_0² + ‖∇f‖_0².

This space is a Hilbert space when equipped with the inner product (·, ·)_{H¹} defined by

(f, h)_{H¹} = (f, h)_0 + (∇f, ∇h)_0.

In particular, the definition of the gradient operator is here extended to functions of H¹ using a density argument. More generally, Sobolev spaces of fractional order H^σ, σ > 0, can be defined on compact Riemannian manifolds by stating that f ∈ H^σ when, for any coordinate patch (U, φ), and any function ψ with compact support in U, the function (fψ) ∘ φ^{−1} belongs to the Sobolev space H^σ(R^d) as usually defined on R^d [51, Chapter 4, Section 3]. For σ = 1, this last characterization coincides with the definition of H¹ used here. Finally, let σ ≥ 0 and let F ⊂ H be the space of finite linear combinations of the eigenfunctions {e_k}_{k∈N} of −∆_M. Following the definition of spaces of generalized functions on manifolds introduced by Taylor [51, Chapter 5, Section A], let Ḣ^σ be the completion of F under the norm ‖·‖_σ defined by

‖f‖_σ² = Σ_{k=1}^{M_0} (f, e_k)_0² + Σ_{k>M_0} λ_k^σ (f, e_k)_0²,

where by convention the first sum vanishes if M_0 = 0. In particular, we have Ḣ^0 = H and, more generally, Ḣ^σ is a Hilbert space when equipped with the inner product (·, ·)_σ defined by

(1) (f, h)_σ = Σ_{k=1}^{M_0} (f, e_k)_0 (h, e_k)_0 + Σ_{k>M_0} λ_k^σ (f, e_k)_0 (h, e_k)_0.

Remark 2.2. When manifolds without boundary are considered, the definition of Ḣ^σ given above is equivalent to the definition of the fractional Sobolev space of order σ through Bessel potentials (used for instance by Strichartz [50] or Herrmann et al. [27]). Indeed, recall that the latter is defined as the subspace of H composed of functions f ∈ H satisfying |||f|||_σ < +∞, where |||·|||_σ is the norm defined by

(2) |||f|||_σ² = Σ_{k∈N} (1 + λ_k)^σ (f, e_k)_0².

Equivalence follows from the equivalence of the norms ‖·‖_σ and |||·|||_σ on Ḣ^σ. When manifolds with boundary are considered, and σ > 0, Ḣ^σ can be seen as a subspace of a fractional Sobolev space composed of functions satisfying the same boundary conditions as the ones considered for the eigenvalue problem of the Laplace–Beltrami operator [51, Chapter 5, Section A].
For σ < 0, we define Ḣ^σ to be the dual space of Ḣ^{−σ}: these spaces are Hilbert spaces when endowed with the inner product (1), and their elements are seen as distributions [50].
2.3. Functions of the Laplacian. We now introduce a class of operators acting on H, called functions of the Laplacian. These operators are classically used to express solutions of some differential equations and to prove Weyl's asymptotic formula [12]. To define functions of the Laplacian, we first introduce the notion of power spectral density.

Definition 2.3. A power spectral density is a function γ : [0, +∞) → R with the following properties. First, there exists some ν ∈ N for which γ is ν times differentiable, with continuous derivatives up to order (ν − 1) and a derivative of order ν of bounded variation. Second, γ(λ) → 0 as λ → ∞. And finally, there exist constants L_γ, C_γ, β > 0 such that for all λ ≥ L_γ, the first derivative γ' of γ satisfies |γ'(λ)| ≤ C_γ λ^{−(1+β)}. Note in particular that these last two conditions imply that there exists C'_γ > 0 such that |γ(λ)| ≤ C'_γ λ^{−β} for all λ ≥ L_γ.

The power spectral densities considered in this work are moreover required to satisfy the relation given in the next assumption, which links the decay exponent β of γ to the growth exponent α of the eigenvalues.

Assumption 2.4. The exponents α = 2/d from Proposition 2.1 and β from Definition 2.3 satisfy 2αβ > 1, i.e., β > 1/(2α) = d/4.

This assumption allows us to define the notion of function of the Laplacian as an endomorphism of H. Indeed, given a power spectral density γ satisfying Assumption 2.4, we define the function of the Laplacian γ(−∆_M) associated with γ as the operator γ(−∆_M) : H → H given by

γ(−∆_M)f = Σ_{k∈N} γ(λ_k) (f, e_k)_0 e_k,  f ∈ H.

The next proposition extends the domain of this operator.

Proposition 2.5. Let Assumption 2.4 be satisfied. For any σ ∈ R, the function of the Laplacian γ(−∆_M) can be extended to a bounded linear operator (also denoted γ(−∆_M) with a slight abuse of notation) from Ḣ^σ to Ḣ^{σ+2β}, where α > 0 and β > 0 are defined respectively in Proposition 2.1 and Definition 2.3.

2.4. Random fields on a Riemannian manifold. Let us start by introducing some notation. Let (Ω, A, P) be a complete probability space. Let Q denote some arbitrary Hilbert space (with inner product (·, ·)_Q and associated norm ‖·‖_Q). We denote by L²(Ω; Q) the set of all Q-valued random variables Z defined on (Ω, A, P) satisfying

E[‖Z‖_Q²] < +∞.

In particular, this implies that any Z ∈ L²(Ω; Q) is almost surely in Q. Finally, note that L²(Ω; Q) is a Hilbert space when equipped with the inner product (·, ·)_{L²(Ω;Q)} (and associated norm ‖·‖_{L²(Ω;Q)}) defined by

(Y, Z)_{L²(Ω;Q)} = E[(Y, Z)_Q],  Y, Z ∈ L²(Ω; Q).

We now define the notion of Gaussian white noise on the manifold M. Let {W_j}_{j∈N} be a sequence of independent, standard Gaussian random variables. The linear functional W defined over H by

(3) ⟨W, ϕ⟩ = Σ_{j∈N} W_j (ϕ, e_j)_0,  ϕ ∈ H,

is called Gaussian white noise on M. Note that for any ϕ ∈ H, the series ⟨W, ϕ⟩ converges in quadratic mean since E[⟨W, ϕ⟩] = 0 and, by independence of the variables {W_j}_{j∈N},

E[⟨W, ϕ⟩²] = Σ_{j∈N} (ϕ, e_j)_0² = ‖ϕ‖_0² < +∞.

In particular, W satisfies, for any ϕ ∈ H, E[⟨W, ϕ⟩] = 0, and for any ϕ_1, ϕ_2 ∈ H,

E[⟨W, ϕ_1⟩⟨W, ϕ_2⟩] = (ϕ_1, ϕ_2)_0.

The next proposition details the domain of definition and regularity of W.

Proposition 2.6. For any ε > 0, W ∈ L²(Ω; Ḣ^{−(α^{−1}+ε)}), where α > 0 is given in Proposition 2.1.
Proof. Let ε > 0 and N ∈ N. Consider the truncated white noise W_N defined by

W_N = Σ_{j=1}^N W_j e_j.

By definition of M_0,

E[‖W_N‖²_{−(α^{−1}+ε)}] = Σ_{j=1}^{M_0} E[W_j²] + Σ_{j=M_0+1}^N λ_j^{−(α^{−1}+ε)} E[W_j²] ≤ M_0 + Σ_{j>M_0} λ_j^{−(α^{−1}+ε)},

which gives, using Proposition 2.1,

E[‖W_N‖²_{−(α^{−1}+ε)}] ≤ M_0 + c_λ^{−(α^{−1}+ε)} Σ_{j>M_0} j^{−(1+αε)} ≤ M_0 + c_λ^{−(α^{−1}+ε)} ζ(1 + αε) < ∞,

where ζ denotes the Riemann zeta function, which proves the claim.

We now introduce a class of random fields defined using the white noise W and functions of the Laplacian. Let γ be a power spectral density satisfying Assumption 2.4 and let Z be the random field defined by

(4) Z = γ(−∆_M)W.

By Propositions 2.5 and 2.6, for any ε > 0, Z is (a.s.) an element of Ḣ^{2β−(α^{−1}+ε)}. The next proposition links Z to H-valued random variables.
Proposition 2.7. Let γ be a power spectral density satisfying Assumption 2.4 and let Z be defined by (4). Then, Z ∈ L²(Ω; H) and Z can be decomposed as

Z = Σ_{j∈N} γ(λ_j) W_j e_j,

where the weights {W_j}_{j∈N} define a white noise as in (3).
Proof. Since Assumption 2.4 is satisfied, Propositions 2.5 and 2.6 give that Z is in H (almost surely). Recall that, by definition of functions of the Laplacian,

γ(−∆_M)e_j = γ(λ_j) e_j,  j ∈ N.

By linearity, we then have

Z = γ(−∆_M)W = Σ_{j∈N} W_j γ(−∆_M)e_j = Σ_{j∈N} γ(λ_j) W_j e_j.

The class of Gaussian random fields described in this section can be seen as an extension to arbitrary compact connected oriented Riemannian manifolds of the class of isotropic random fields on the sphere described in [35]. In this last case, the eigenfunctions {e_k}_{k∈N} of the Laplace–Beltrami operator are the spherical harmonics, and the power spectral density γ defines the angular power spectrum of the field. In this sense, the decomposition introduced in Proposition 2.7 can be seen as the Karhunen–Loève expansion of a Gaussian random field on a compact connected oriented Riemannian manifold.
In the particular case where the power spectral density γ takes the form

(5) γ(λ) = |κ² + λ|^{−β},  λ ≥ 0,

for some parameters κ > 0 and β > 1/(2α) = d/4, the resulting field Z is a solution to the fractional elliptic SPDE

(6) (κ² − ∆_M)^β Z = W.

As such, Z is an instance of a Whittle–Matérn random field on a manifold, as introduced in [38] for compact Riemannian manifolds. This class of random fields was studied in [30] for the particular case where the manifold is a sphere, and in [26,28] for compact Riemannian manifolds. More generally, the random fields defined by (4) are particular instances of regular zero-mean generalized Gaussian fields (GeGF) as defined in [39, Section 3.2.1]. To a field Z defined by (4), we can associate the continuous linear functional f ∈ H ↦ (Z, f)_0, which corresponds to a GeGF with covariance operator K : H → H given by K = γ²(−∆_M) (where the covariance operator K is defined by (Kf, h)_0 = E[(Z, f)_0 (Z, h)_0] for f, h ∈ H). The fact that this GeGF is regular stems directly from the fact that, under the assumptions used in Proposition 2.7, the operator γ²(−∆_M) is nuclear. Conversely, since −∆_M and γ²(−∆_M) have the same eigenfunctions, and since the function γ² maps the eigenvalues of −∆_M to those of γ²(−∆_M), any regular GeGF with covariance operator γ²(−∆_M) can be decomposed as in Proposition 2.7 (cf. [39, Theorem 3.2.15] and its proof).
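For intuition, the decomposition of Proposition 2.7 can be sampled directly whenever the eigenpairs are known in closed form. The following sketch (NumPy; the truncation level N, the grid size, and the Matérn parameters κ, β are illustrative choices, and the unit circle with eigenvalues k² and Fourier eigenfunctions serves as a simple stand-in for a general manifold) draws one sample path of a Whittle–Matérn field with power spectral density (5):

```python
import numpy as np

rng = np.random.default_rng(0)
kappa, beta = 1.0, 1.0            # illustrative Matern parameters (beta > d/4)
N = 64                            # truncation level of the expansion
x = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)

def gamma(lam):
    """Matern-type power spectral density (5)."""
    return (kappa**2 + lam) ** (-beta)

# Orthonormal Laplacian eigenfunctions on the unit circle (w.r.t. arclength):
# 1/sqrt(2 pi), cos(kx)/sqrt(pi), sin(kx)/sqrt(pi), eigenvalues 0, k^2, k^2.
# Truncated expansion Z = sum_j gamma(lambda_j) W_j e_j of Proposition 2.7:
Z = gamma(0.0) * rng.standard_normal() / np.sqrt(2.0 * np.pi) * np.ones_like(x)
for k in range(1, N + 1):
    g = gamma(float(k * k))
    Z += g * rng.standard_normal() * np.cos(k * x) / np.sqrt(np.pi)
    Z += g * rng.standard_normal() * np.sin(k * x) / np.sqrt(np.pi)

print(Z.shape)  # one sample path of the truncated field on the grid
```

This is exactly the spectral method that the Galerkin approach of Section 3 generalizes to spaces V_n that are not eigenspaces.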

Discretization of Gaussian random fields
We now aim at computing numerical approximations of the random fields Z defined in (4) using a discretization of the Laplace–Beltrami operator. The discretization we propose is based on a Galerkin approximation, and can be seen as an extension of the approach in [7]. It leads to an approximation of the field by a weighted sum of basis functions defined on the manifold.
For n ≥ 1, let {ψ_k}_{1≤k≤n} be a family of linearly independent functions of Ḣ¹ and denote by V_n ⊂ Ḣ¹ its linear span. In particular, V_n is an n-dimensional subspace of Ḣ¹, and we assume that the constant functions are in V_n. Examples that are included in our framework are spectral methods, where V_n is spanned by finitely many eigenfunctions of −∆_M, boundary element methods [48], and, with an extra approximation step, surface finite elements [20].
3.1. Galerkin discretization of the Laplace–Beltrami operator. We first introduce a discretization −∆_n of the Laplace–Beltrami operator over V_n by a Galerkin approximation [2, Chapter 4]. For any ϕ ∈ V_n, we set −∆_n ϕ to be the element of V_n satisfying

(−∆_n ϕ, ψ)_0 = (∇ϕ, ∇ψ)_0  for all ψ ∈ V_n.

Let C and R be the matrices called (in the context of finite element methods) mass matrix and stiffness matrix respectively, and defined by

C_ij = (ψ_i, ψ_j)_0,  R_ij = (∇ψ_i, ∇ψ_j)_0,  1 ≤ i, j ≤ n.

As defined, C is a symmetric positive definite matrix and R is a symmetric positive semi-definite matrix (cf. Lemma SM2.1 of the Supplementary Materials). Consequently, the generalized eigenvalue problem (GEP) defined by the matrix pencil (R, C), which consists in finding all so-called eigenvalues λ ∈ R and eigenvectors w ∈ R^n \ {0} such that Rw = λCw, admits a solution consisting of n nonnegative eigenvalues and n eigenvectors mutually orthogonal with respect to the inner product (·, ·)_C (and norm ‖·‖_C) defined by

(u, w)_C = u^T C w,  u, w ∈ R^n

(see [42, Theorem 15.3.3]).
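To make these objects concrete, here is a minimal sketch (Python with NumPy/SciPy; the domain, the P1 elements, and the mesh are illustrative choices, using the flat interval [0, π] with Neumann conditions so that the exact eigenvalues k² are known) that assembles C and R and solves the GEP of the pencil (R, C):

```python
import numpy as np
from scipy.linalg import eigh

# P1 finite elements on the interval [0, pi] with homogeneous Neumann
# conditions: the exact Laplacian eigenvalues are k^2, k = 0, 1, 2, ...
n = 40                      # number of nodes
h = np.pi / (n - 1)         # uniform mesh size
C = np.zeros((n, n))        # mass matrix       C_ij = (psi_i, psi_j)_0
R = np.zeros((n, n))        # stiffness matrix  R_ij = (grad psi_i, grad psi_j)_0
for e in range(n - 1):      # assemble the standard P1 element matrices
    C[e:e + 2, e:e + 2] += h / 6.0 * np.array([[2.0, 1.0], [1.0, 2.0]])
    R[e:e + 2, e:e + 2] += 1.0 / h * np.array([[1.0, -1.0], [-1.0, 1.0]])

# Generalized eigenvalue problem R w = lambda C w of the pencil (R, C)
lam, W = eigh(R, C)
print(lam[:4])              # approximately [0, 1, 4, 9]
```

The smallest GEP eigenvalue is zero (the constant functions lie in V_n), and the next ones approximate the continuous spectrum from above, as expected for a Galerkin approximation.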
We observe further that, since C is symmetric and positive definite, it admits an invertible square root, i.e., a matrix √C satisfying C = √C(√C)^T. The next result links the GEP to the operator −∆_n, and is proven in Appendix B.
Theorem 3.1. The operator −∆_n is diagonalizable and its eigenvalues are those of the GEP defined by the matrix pencil (R, C). In particular, E_0 : R^n → V_n, defined by

E_0 w = Σ_{k=1}^n w_k ψ_k,  w ∈ R^n,

is an isomorphism that maps the eigenvectors of (R, C) to eigenfunctions of −∆_n, and an isometry between (R^n, ‖·‖_C) and (V_n, ‖·‖_0).
We continue with a corollary that will be useful later on.
Corollary 3.2. The eigenvalues of −∆_n are those of the matrix

S = (√C)^{−1} R (√C)^{−T},

and the mapping E : R^n → V_n, defined by

E v = E_0((√C)^{−T} v),  v ∈ R^n,

is an isomorphism that maps the eigenvectors of S to the eigenfunctions of −∆_n and an isometry between (R^n, ‖·‖_2) and (V_n, ‖·‖_0).
Proof. Note first that S is well-defined and symmetric positive semi-definite by the properties of C and R. Recall the bijection F : R^n → R^n given by F w = (√C)^T w. If (λ, w) is an eigenpair of the GEP defined by (R, C), then v = F w satisfies

S v = (√C)^{−1} R (√C)^{−T} (√C)^T w = (√C)^{−1} R w = λ (√C)^{−1} C w = λ (√C)^T w = λ v,

and therefore (λ, v) is an eigenpair of S. Hence F maps the eigenvectors of (R, C) to those of S, and conversely F^{−1} maps the eigenvectors of S to those of (R, C). Noting that E = E_0 ∘ F^{−1} and applying Theorem 3.1 concludes the proof.
We denote by {λ_k}_{1≤k≤n} ⊂ R_+ the eigenvalues of the matrix S (cf. Corollary 3.2), ordered in non-decreasing order. Let {v_k}_{1≤k≤n} ⊂ R^n be a set of eigenvectors associated with these eigenvalues, and chosen to form an orthonormal basis of R^n. Hence, if V is the matrix whose columns are (v_1, ..., v_n), we have

V^T V = V V^T = I,  S = V diag(λ_1, ..., λ_n) V^T,

and {E v_k}_{1≤k≤n} is an orthonormal family of functions of V_n. Moreover, given that E is linear and bijective, {E v_k}_{1≤k≤n} is an orthonormal basis of V_n. Consider a power spectral density γ satisfying Assumption 2.4. Following the definition of the discretized operator −∆_n and analogously to the definition of the operator γ(−∆_M), the discretization of the operator γ(−∆_M) on V_n is defined as the endomorphism γ(−∆_n) of V_n given by

(9) γ(−∆_n)ϕ = Σ_{k=1}^n γ(λ_k) (ϕ, E v_k)_0 E v_k,  ϕ ∈ V_n.

Note that this definition does not depend on the choice of orthonormal basis (cf. Proposition SM2.2 of the Supplementary Materials).
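Numerically, S can be formed from a Cholesky factor of C with two triangular solves, and its ordinary spectrum coincides with that of the pencil (R, C), as stated in Corollary 3.2. A small sketch making this correspondence explicit (NumPy/SciPy; the matrices here are random illustrative stand-ins for a mass and a stiffness matrix):

```python
import numpy as np
from scipy.linalg import cholesky, eigh, solve_triangular

rng = np.random.default_rng(1)
n = 5
A = rng.standard_normal((n, n))
C = A @ A.T + n * np.eye(n)              # stand-in SPD "mass" matrix
B = rng.standard_normal((n, n - 1))
R = B @ B.T                              # stand-in SPSD "stiffness", rank n - 1

L = cholesky(C, lower=True)              # sqrt(C) with C = L L^T
# S = L^{-1} R L^{-T}, computed with two triangular solves
S = solve_triangular(L, solve_triangular(L, R, lower=True).T, lower=True)

lam_pencil = eigh(R, C, eigvals_only=True)   # GEP  R w = lambda C w
lam_S = eigh(S, eigvals_only=True)           # ordinary symmetric eigenproblem
print(np.allclose(lam_pencil, lam_S))        # True: same spectrum
```

Working with the symmetric matrix S rather than the pencil is what allows matrix functions such as γ(S) to be defined spectrally in the next sections.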

3.2. Galerkin discretization of Gaussian random fields. Let W_n be the V_n-valued random variable defined by

(10) W_n = Σ_{k=1}^n W_k E v_k,

where W_1, ..., W_n are independent standard Gaussian random variables. Then, W_n is called white noise on V_n and satisfies, for any ϕ, ϕ' ∈ V_n,

E[(W_n, ϕ)_0] = 0,  E[(W_n, ϕ)_0 (W_n, ϕ')_0] = (ϕ, ϕ')_0.

It can be expressed in the basis functions {ψ_k}_{1≤k≤n} of V_n, as stated in the next proposition, which leads to an expression of the white noise using a basis that does not have to be orthonormal or an eigenbasis of −∆_n.
Proposition 3.3. Let W_n be a white noise on V_n. Then, W_n can be written as

W_n = Σ_{k=1}^n W̃_k ψ_k,

where W̃ = (W̃_1, ..., W̃_n)^T is a centered Gaussian vector with covariance matrix C^{−1}.

Proof. Let W = (W_1, ..., W_n)^T be the vector containing the random weights defining W_n in (10). Using the linearity of E in Corollary 3.2, W_n ∈ V_n can be written as

W_n = E(V W),

where W ∼ N(0, I). But also, denoting by W̃ = (W̃_1, ..., W̃_n)^T the vector containing the coordinates of W_n in the basis {ψ_k}_{1≤k≤n} of V_n, we get from Corollary 3.2,

W_n = Σ_{k=1}^n W̃_k ψ_k = E_0 W̃ = E((√C)^T W̃).

Hence, since E is bijective, we get W̃ = (√C)^{−T} V W, which proves the result.
Inspired by the definition of the H-valued random field Z in (4), we introduce its Galerkin discretization Z_n as the V_n-valued random field defined by

(11) Z_n = γ(−∆_n)W_n = Σ_{k=1}^n γ(λ_k) W_k E v_k,

where W_1, ..., W_n are independent standard Gaussian random variables. Expressing Z_n in the basis functions {ψ_k}_{1≤k≤n} can then be done straightforwardly using the next theorem, leading to a first method to generate approximations of Z.
Theorem 3.4. The discretized field Z_n can be decomposed in the basis {ψ_k}_{1≤k≤n} as

(12) Z_n = Σ_{k=1}^n Z_k ψ_k,

where Z = (Z_1, ..., Z_n)^T is a centered Gaussian vector with covariance matrix given by

(13) (√C)^{−T} γ²(S) (√C)^{−1}.

Proof. Notice that Z_n is V_n-valued, hence there exists some random vector Z ∈ R^n such that

Z_n = Σ_{k=1}^n Z_k ψ_k = E_0 Z = E((√C)^T Z).

But following instead the definition of Z_n in (11) and the linearity of E, we get

Z_n = Σ_{k=1}^n γ(λ_k) W_k E v_k = E(V γ(Λ) W),

where Λ = diag(λ_1, ..., λ_n) and W = (W_1, ..., W_n)^T ∼ N(0, I). Therefore, given that E is bijective,

Z = (√C)^{−T} V γ(Λ) W,

which proves the result.
Theorem 3.4 provides an explicit expression for the covariance matrix of the weights of the V_n-valued random variable Z_n. Consequently, generating realizations of such random functions can be done by simulating a centered Gaussian random vector of weights with covariance matrix (13) and then building the weighted sum (12).
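A dense-algebra sketch of this first sampling method (NumPy/SciPy; the matrices C, R and the power spectral density are illustrative stand-ins, √C is realized by a Cholesky factor, and the sampling formula Z = (√C)^{−T} V γ(Λ) W with W ∼ N(0, I) follows the proof of Theorem 3.4):

```python
import numpy as np
from scipy.linalg import cholesky, eigh, solve_triangular

rng = np.random.default_rng(2)
n = 50
A = rng.standard_normal((n, n))
C = A @ A.T + n * np.eye(n)              # stand-in SPD mass matrix
B = rng.standard_normal((n, n))
R = B @ B.T                              # stand-in SPSD stiffness matrix

gamma = lambda lam: 1.0 / (1.0 + lam)    # illustrative Matern-type PSD

L = cholesky(C, lower=True)              # sqrt(C), with C = L L^T
S = solve_triangular(L, solve_triangular(L, R, lower=True).T, lower=True)
lam, V = eigh(S)                         # eigenpairs of S (Corollary 3.2)

# One sample of the weights: Z = sqrt(C)^{-T} V gamma(Lambda) W, W ~ N(0, I)
W = rng.standard_normal(n)
Z = solve_triangular(L.T, V @ (gamma(lam) * W), lower=False)
print(Z.shape)  # coordinates of the discretized field in the basis {psi_k}
```

The empirical covariance of repeated samples reproduces the matrix (13); the Chebyshev approach of the next section removes the need for the full eigendecomposition of S used here.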
A particular case, investigated in [10], is when V_n is spanned by the set of eigenfunctions associated with the first n eigenvalues (sorted in non-decreasing order and counted with their multiplicities) of the Laplace–Beltrami operator. Then, the discretized random field Z_n corresponds to a truncation of order n of the series in Proposition 2.7 that defines the random field Z. Hence, we have a direct extension to Riemannian manifolds of the spectral methods used to sample isotropic random fields with spectral density γ² on a bounded domain of R^d [14] or a sphere [35]. In practice though, for arbitrary compact, connected and oriented Riemannian manifolds, the eigenfunctions of the Laplace–Beltrami operator are not readily available and must be computed numerically, rendering such spectral methods potentially cumbersome. But since the only requirement on V_n was for this space to be a finite-dimensional subspace of Ḣ¹, Theorem 3.4 is applicable to more general choices of approximation spaces V_n.

Chebyshev approximation of the discretized random field
Since the weights of the discretized random field characterized in Theorem 3.4 form a centered Gaussian random vector, they are entirely characterized by their covariance matrix. We now show how the particular form of this covariance matrix can be used to derive efficient sampling methods.
Let Z be the centered Gaussian random vector generating Z_n in Theorem 3.4. Then, Z can be expressed as the solution to the linear system

(14) (√C)^T Z = X,

where X is a centered Gaussian random vector with covariance matrix γ²(S). In this section, we review ways of generating the right-hand side of this linear system. A rather straightforward way to generate samples of X would be to compute the product

X = B W,

where W ∼ N(0, I) and B is a square root of γ²(S), i.e., a matrix satisfying γ²(S) = B B^T. Suitable choices for B are the Cholesky factor of γ²(S) and the matrix γ(S). However, these choices would entail fully diagonalizing the matrix S, since they rely on matrix functions. This requires a workload of O(n³) operations and a storage space of O(n²).
To reduce these high costs, we propose to use a polynomial approximation of the square root based on Chebyshev series instead. Let X be a sample of the weights obtained through the relation

(15) X = γ(S) W,

where W ∼ N(0, I). Note that in the particular case where γ = P is a polynomial of degree K with coefficients a_0, ..., a_K ∈ R, we have

P(S) W = Σ_{k=0}^K a_k S^k W.

This means in particular that the product P(S)W can be computed iteratively, while requiring at each iteration only a single product between S and a vector. Hence, no diagonalization of the matrix is needed in this case. Building on this idea, we propose to approximate, for a general function γ, the vector X in (15) by the vector X̃ defined by

X̃ = P_{γ,K}(S) W,

where P_{γ,K} is a polynomial approximation of degree K ∈ N of γ over an interval containing all the eigenvalues of S. In particular, since S is positive semi-definite, we consider this interval to be [0, λ_max], where λ_max is some upper bound of the greatest eigenvalue of S.
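The iterative evaluation of P(S)W can be sketched as follows (NumPy; the matrix, vector and coefficients are illustrative), using Horner's scheme so that each iteration costs exactly one product between S and a vector:

```python
import numpy as np

def poly_apply(a, S, w):
    """Compute P(S) w = sum_k a[k] S^k w with Horner's scheme:
    one matrix-vector product per iteration, no diagonalization of S."""
    x = a[-1] * w
    for ak in reversed(a[:-1]):
        x = S @ x + ak * w
    return x

# Check against a direct evaluation on a small symmetric example
rng = np.random.default_rng(3)
S = rng.standard_normal((6, 6))
S = S + S.T
w = rng.standard_normal(6)
a = [0.5, -1.0, 0.25]                        # P(t) = 0.5 - t + 0.25 t^2
direct = 0.5 * w - S @ w + 0.25 * S @ (S @ w)
print(np.allclose(poly_apply(a, S, w), direct))  # True
```

When S is sparse, as for finite element discretizations, each iteration costs O(n) operations, which is the source of the linear sampling complexity discussed in Section 6.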
We choose the basis of Chebyshev polynomials (of the first kind) to compute the expression of the approximating polynomial P_{γ,K}. These polynomials are the family {T_k}_{k∈N_0} of polynomials defined over [−1, 1] by

(16) T_k(t) = cos(k arccos t),  t ∈ [−1, 1],  k ∈ N_0,

or equivalently via the recurrence relation

(17) T_0(t) = 1,  T_1(t) = t,  T_{k+1}(t) = 2t T_k(t) − T_{k−1}(t),  k ≥ 1.

Note in particular that for any k ∈ N_0, T_k is a polynomial of degree k and that |T_k(t)| ≤ 1 for any t ∈ [−1, 1]. The Chebyshev polynomials form an orthogonal basis of the space L²_c of functions that are square-integrable on [−1, 1] with respect to the weight t ↦ (1 − t²)^{−1/2}, equipped with the inner product ⟨·, ·⟩_c defined by

⟨f, h⟩_c = ∫_{−1}^{1} f(t) h(t) (1 − t²)^{−1/2} dt.

As such, the truncated Chebyshev series of order K ≥ 0 of any function f ∈ L²_c is the polynomial of degree (at most) K given by

(18) S_K f = Σ_{k=0}^K c_k T_k,

where the coefficients c_k are defined by

c_0 = ⟨f, T_0⟩_c / π,  c_k = 2⟨f, T_k⟩_c / π,  k ≥ 1.

Truncated Chebyshev series of continuous functions are convergent in the L²_c-sense [41, Theorem 5.6], and for power spectral densities they are uniformly convergent (cf. Appendix A for more details). This motivates their use to approximate a power spectral density γ. Besides, using truncated Chebyshev series also guarantees, first, that at any order of approximation K, the polynomial P_{γ,K} is near-optimal, in the sense that

‖γ − P_{γ,K}‖_∞ ≤ (1 + C) ‖γ − P*_{γ,K}‖_∞,

where P*_{γ,K} is the best polynomial approximation of γ of order K and C ≈ 1.27 is the so-called Lebesgue constant of the approximation [41, Chapter 5, Section 5]; and, second, that the coefficients of the polynomial in the Chebyshev basis can be computed very efficiently using the Fast Fourier Transform (FFT) algorithm [16], with a complexity that can be bounded by O(K log K) to compute K coefficients (see [43, Section B.4.4] for an algorithm). Since Chebyshev polynomials are defined on [−1, 1], the interval of approximation [0, λ_max] must be mapped onto [−1, 1] and vice versa, which is done with the linear change of variable

t ∈ [−1, 1] ↦ λ = λ_max (t + 1)/2 ∈ [0, λ_max].

The function γ̃ : t ↦ γ(λ_max (t + 1)/2) can then be approximated by a truncated Chebyshev series of order K, and the polynomial P_{γ,K} approximating γ on [0, λ_max] takes the form

P_{γ,K}(λ) = (S_K γ̃)(2λ/λ_max − 1),

where S_K γ̃ is the truncation of order K of the Chebyshev series of γ̃.
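A compact way to obtain such coefficients in practice (NumPy; this sketch uses interpolation at Chebyshev points, whose coefficients can be computed by a discrete cosine transform and agree with the truncated series up to aliasing; the power spectral density and λ_max are illustrative choices):

```python
import numpy as np

def cheb_coeffs(f, K):
    """First K + 1 Chebyshev coefficients of f on [-1, 1], from the values
    of f at Chebyshev points (a discrete cosine transform in disguise)."""
    theta = (np.arange(K + 1) + 0.5) * np.pi / (K + 1)
    vals = f(np.cos(theta))
    c = np.array([2.0 / (K + 1) * np.sum(vals * np.cos(k * theta))
                  for k in range(K + 1)])
    c[0] /= 2.0
    return c

# Approximate gamma(lam) = (1 + lam)^(-1) on [0, lam_max] after the linear
# change of variable t in [-1, 1] -> lam = lam_max (t + 1) / 2.
lam_max, K = 10.0, 30
g = lambda t: 1.0 / (1.0 + lam_max * (t + 1.0) / 2.0)
c = cheb_coeffs(g, K)

lam = np.linspace(0.0, lam_max, 101)
t = 2.0 * lam / lam_max - 1.0
approx = np.polynomial.chebyshev.chebval(t, c)
print(np.max(np.abs(approx - 1.0 / (1.0 + lam))))  # uniformly small error
```

For this smooth γ the coefficients decay geometrically, so a moderate order K already gives a uniformly accurate polynomial surrogate.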
Ultimately, the approximation Z_{n,K} of the discretized field Z_n that results from the polynomial approximation introduced in this subsection takes the form

(21) Z_{n,K} = Σ_{k=1}^n Z̃_k ψ_k,

where the random weights Z̃ = (Z̃_1, ..., Z̃_n)^T are given by

(22) Z̃ = (√C)^{−T} P_{γ,K}(S) W = (√C)^{−T} Σ_{k=0}^K c_k T_k(2S/λ_max − I) W,

with W ∼ N(0, I), and where c_0, ..., c_K denote the first K + 1 coefficients of the Chebyshev series of γ̃.
We call Z_{n,K} a Galerkin–Chebyshev approximation of discretization order n ∈ N and polynomial order K ∈ N of the Gaussian random field Z.
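The product P_{γ,K}(S)W in (22) can be evaluated with the Chebyshev three-term recurrence itself, so that sampling requires only one multiplication by S per polynomial degree and no diagonalization. A sketch (NumPy; the coefficients and matrix are illustrative, and the spectrum of S is mapped to [−1, 1] through λ ↦ 2λ/λ_max − 1):

```python
import numpy as np

def chebyshev_apply(c, S, lam_max, w):
    """Compute sum_k c[k] T_k(A) w for A = (2 / lam_max) S - I, using the
    recurrence T_{k+1}(t) = 2 t T_k(t) - T_{k-1}(t): one product with S
    per degree, and no diagonalization of S."""
    A = lambda v: (2.0 / lam_max) * (S @ v) - v   # spectrum mapped to [-1, 1]
    t0, t1 = w, A(w)
    x = c[0] * t0 + c[1] * t1
    for ck in c[2:]:
        t0, t1 = t1, 2.0 * A(t1) - t0
        x = x + ck * t1
    return x

# Check against a dense spectral evaluation on a small symmetric example
rng = np.random.default_rng(4)
n = 8
M = rng.standard_normal((n, n))
S = M @ M.T                                  # SPSD, spectrum in [0, lam_max]
lam_max = float(np.linalg.eigvalsh(S).max())
w = rng.standard_normal(n)
c = rng.standard_normal(12)                  # arbitrary Chebyshev coefficients

d, Q = np.linalg.eigh((2.0 / lam_max) * S - np.eye(n))
ref = Q @ (np.polynomial.chebyshev.chebval(d, c) * (Q.T @ w))
print(np.allclose(chebyshev_apply(c, S, lam_max, w), ref))  # True
```

Combined with a triangular solve against (√C)^T, this produces the random weights Z̃ of (22) with one sparse matrix-vector product per polynomial degree.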

Convergence analysis
The goal of this section is to derive the overall error between the random field Z, as defined in (4), and its Galerkin–Chebyshev approximation Z_{n,K} associated with a functional discretization space V_n of dimension n and a Chebyshev polynomial approximation of order K of the power spectral density. To derive this error, we assume for simplicity that the upper bound λ_max of the eigenvalues of the matrix S (on which the Chebyshev polynomial approximation is defined) is equal to the maximal eigenvalue of S. To prove a convergence result between Z and Z_{n,K}, we need an additional assumption on the space V_n, or more precisely on the approximation properties of the discretized operator −∆_n that this space yields. We assume the following link between the eigenpairs of −∆_n and those of −∆_M (arranged in non-decreasing order).
In Assumption 5.1, the requirement (24) states that the eigenvalues and eigenfunctions of −∆_n should asymptotically lie within a ball around the eigenvalues and eigenfunctions of −∆_M, where the radius of the ball may grow with the magnitude of the eigenvalue but, for a fixed index k, decreases as n → +∞. The requirement (25) expresses that, asymptotically, the eigenvalues of −∆_n should grow at the same rate as the eigenvalues of −∆_M. This last requirement may seem redundant with the first one, but it ensures that, even for large indices k ≈ n, the discrete eigenvalues do not stray too far from λ_k (which is not always ensured by the first requirement).
A straightforward example of a discretization space V n for which Assumption 5.1 is satisfied is the set containing the first n eigenfunctions of the Laplace-Beltrami operator, since then the eigenpairs of −∆ n coincide with those of −∆ M . The resulting Galerkin-Chebyshev approximation of the field then amounts to a classical spectral method. In this case, one can use the Galerkin approximation of the random field directly for sampling purposes, without requiring a Chebyshev polynomial approximation of the power spectral density (cf. Section 6.2.1 for more details). However, this particular discretization space V n requires the eigenfunctions of the Laplace-Beltrami operator to be known, which is seldom the case in practice.
An alternative to the spectral method consists in building the discretization space V n from basis functions of a finite element space. If the Riemannian manifold (M, g) is a bounded convex polygonal domain equipped with the Euclidean metric, and V n is the linear finite element space associated with a quasi-uniform triangulation of M with mesh size h ≃ n^(−1/d), then Assumption 5.1 is satisfied with the exponents r = s = α = 2/d and q = 2 [49, Theorems 6.1 & 6.2].
If now M is a smooth compact 2-dimensional surface without boundary, equipped with the metric g induced by the Euclidean metric on R 3 (the so-called pullback metric, see [37, Chapter 13] for more details), the surface finite element method (SFEM) provides a way to construct a finite element space on the surface M by "lifting" onto M a linear finite element space defined on a polyhedral approximation of M that lies "close" to the surface (see [19] and [18, Section 2.6] for more details). The discretization space V n can then be taken as the linear span of the lifted finite element basis functions defined on the polyhedral surface. One can show that the eigenvalue requirement of Assumption 5.1 holds in this setting (see Appendix C for more details). Proving the eigenfunction inequality is open and ongoing work, but our numerical experiments in Section 7 indicate that our error estimates hold.
Remark 5.3. In practice, when using SFEM, it is usual to consider the eigenfunctions and eigenvalues of the discrete operator defined on the polyhedral approximation of the surface M (as opposed to the original surface M). In that case, V n is not a set of functions on M but rather on its polyhedral approximation; this is the setting considered in the numerical experiments in Section 7. Then, the error on the approximation in V n of the eigenvalues and eigenfunctions of the Laplace-Beltrami operator of M can be bounded as in [8], where the explicit dependence of the constants C 1 and C 2 on λ k is given. Hence, if one can write C 1 (λ k ) ≲ λ k^q and C 2 (λ k ) ≲ λ k^q for some q ∈ [1, 2], then Assumption 5.1 is satisfied; proving this is ongoing work.
We now state the main results of this section.
Theorem 5.4. Let Assumptions 2.4 and 5.1 be satisfied. Then, the approximation error of the random field Z by its Galerkin-Chebyshev approximation Z n,K of discretization order n ∈ N large enough and polynomial order K ∈ N satisfies the following bound, where C Galer and C pol are constants independent of n and K, ρ = min {s; r; (αβ − 1/2)} > 0, α > 0 is defined in Proposition 2.1, r > 0 and s > 0 are given in Assumption 5.1, and β > 0 and ν ∈ N are as in Definition 2.3.
When the power spectral density γ is known to be analytic over [0, λ max ] (meaning in particular that any ν ∈ N works in Definition 2.3), the polynomial approximation error can be shown to decrease at an exponential rate. The resulting overall error between the random field Z and its approximation Z n,K can then be upper bounded as stated in the next result.
Corollary 5.5. Let Assumptions 2.4 and 5.1 be satisfied, and let γ be a power spectral density for which there exists some χ > 0 such that the map z ∈ C → γ(z) is holomorphic inside the ellipse E χ ⊂ C centered at z = λ max /2, with foci z 1 = 0 and z 2 = λ max , and semi-major axis a χ = λ max /2 + χ.
Then, the approximation error of the random field Z by its Galerkin-Chebyshev approximation Z n,K of discretization order n ∈ N large enough and polynomial order K ∈ N satisfies the following bound, where C Galer and C pol are constants independent of n and K, ρ = min {s; r; (αβ − 1/2)} > 0, α > 0 is defined in Proposition 2.1, r > 0 and s > 0 are given in Assumption 5.1, and β > 0 is as in Definition 2.3.
We prove these two error estimates by upper bounding the left-hand side by the sum of a discretization error and a polynomial approximation error, both of which are derived in the next two subsections. The discretization error is computed in the more general setting of the spaces Ḣσ defined in Section 2.3 (with σ = 0 giving the error on H). We also provide an interpretation of the terms composing this error estimate, as well as a result on the convergence of the covariance of the discretization scheme.

5.1. Error analysis of the discretized field. In this section, a convergence result for the discretized field Z n is derived in terms of a root-mean-squared error on the spaces Ḣσ defined in Section 2.3.
Theorem 5.6. Let Assumptions 2.4 and 5.1 be satisfied. Then, there exists N 1 ∈ N such that for any n > N 1 and σ ∈ [0, α −1 (2αβ − 1)), the approximation error of the random field Z by its discretization Z n satisfies (26), where α > 0 is defined in Proposition 2.1, q ≥ 1, r > 0 and s > 0 are given in Assumption 5.1, and β > 0 is as in Definition 2.3.
Proof. Let n > max{M 0 ; N 0 }, and let Z (n) be the truncation of Z at order n. We split the error with the triangle inequality into a truncation term and a discretization term, which leads, by Proposition 2.1 and Assumption 2.4, to a bound on the truncation term, where the last inequality is derived using a Riemann sum associated with the integration of the function t → t −α(2β−σ) and the assumption that α(2β − σ) > 1.
• Discretization error ‖Z (n) − Z n ‖ L 2 (Ω; Ḣσ ): we split the error further by the triangle inequality into

= (I) + (II).
The first term, (I), is bounded in terms of the eigenfunction errors ‖e k − e_k^(n)‖. Using the fact that αq ≤ 2s (cf. eq. (23)), we obtain a bound for (I). Bounding the sum again by the corresponding integral, we distinguish three cases, and continue with bounding (II). Following Remark 5.2, the first sum in (II) 2 is 0. We then focus on the terms composing the second sum. The mean value theorem gives a bound for any index j. We have, for n > N 0 , min{λ k ; λ_k^(n)} ≥ l λ λ k ≥ l λ c λ k^α as a consequence of Proposition 2.1 and Assumption 5.1. We can therefore find N 1 > N 0 such that the corresponding bound holds for any n > N 1 and any large enough index j. And for j < N 1 , we can take |γ(λ j ) − γ(λ_j^(n))| ≤ S γ |λ j − λ_j^(n)|, where S γ = sup [0,L γ ] |γ′|. Therefore, using the last two inequalities (and applying again Proposition 2.1 and Assumption 5.1), and using the same argument as for (I) 2 , we conclude the bound on (II). Combining the terms (I) and (II) finally gives (28) if q > 1. The proof is concluded by bounding Equations (27) to (29) by the smallest exponents.
This error estimate (26) yields the same convergence rate as the one derived in [5, 7] for the approximation of solutions to fractional elliptic SPDEs with spatial white noise, but our result differs from theirs in three aspects. First, we define our random fields on Riemannian manifolds. Second, the random fields covered by their result can be seen as those corresponding to the specific choices of γ that are non-zero over R + . Finally, we use slightly different assumptions on the discretization space: in Assumption 5.1, we do not assume that λ_k^(n) ≥ λ k . This assumption holds in particular for finite element spaces associated with conforming triangulations on domains of R d [49], and dropping it opens the way to the use of non-conforming methods.
We conclude this subsection by investigating the overall error in covariance between the random field Z and its discretized counterpart Z n . This error is described in the next theorem and is derived using the same approach as in Theorem 5.6.
Theorem 5.7. Let Assumptions 2.4 and 5.1 be satisfied. Then, there exists some N 2 ∈ N such that for any n > N 2 , the covariance error between the random field Z and its discretization Z n satisfies the following bound for any θ, ϕ ∈ H.
Proof. The proof of this theorem is similar to that of Theorem 5.6, and is available in Section SM3 of the Supplementary Materials.

5.2. Error analysis of the polynomial approximation. The Chebyshev polynomial approximation boils down to replacing the power spectral density γ by the polynomial P γ,K defined in (21), which approximates γ over a segment [0, λ n ] containing all the eigenvalues of the discretized operator −∆ n (or, equivalently, the eigenvalues of the matrix S). Hence, according to (11), Z n,K is obtained by replacing γ with P γ,K in the expansion of Z n , where {W k } 1≤k≤n are the same random weights as the ones defining Z n in (11). The next result gives the root-mean-squared error between Z n and its approximation Z n,K .
Theorem 5.8. Let Assumption 5.1 be satisfied, let ν ∈ N be defined as in Definition 2.3, and let λ max = λ_n^(n). Then, there exists N Cheb ∈ N such that for any n > N Cheb , the root-mean-squared error between the discretized field Z n and its polynomial approximation Z n,K of order K > ν satisfies the following bound, where TV(γ (ν) ) denotes the total variation over [0, λ max ] of the ν-th derivative of γ, and α > 0 and C λ > 0 are defined in Proposition 2.1.
If γ satisfies that there exists some χ > 0 such that the map z ∈ C → γ(z) is holomorphic inside the ellipse E χ ⊂ C centered at z = λ max /2, with foci z 1 = 0 and z 2 = λ max and semi-major axis a χ = λ max /2 + χ, then there exists M Cheb ∈ N such that the second bound below holds for any n > M Cheb .
Proof. Let λ max = λ_n^(n) and let K ∈ N. Using the definition of Z n and Z n,K , we observe first that the error reduces to a weighted sum of the approximation errors of γ at the eigenvalues. A rather crude upper bound of this quantity is given in terms of the uniform error between γ and S K [γ], with γ defined in (20) and S K [γ] denoting the Chebyshev series of γ truncated at order K. If we take K > ν, the convergence properties of Chebyshev series (cf. Theorem A.1) imply a bound decaying polynomially in K. Under Proposition 2.1 and Assumption 5.1, we have λ max ≲ C λ n α (1 + C 1 C λ n α(q−1)−r ), which yields λ max = O(n α ) (as n → +∞) since α(q − 1) ≤ r. Hence, by defining N Cheb = min{n ∈ N : C 1 C λ n α(q−1)−r < 1}, we obtain that for any n > N Cheb , λ max ≤ 2C λ n α , which in turn gives the first inequality.
For the second inequality, using a convergence result of Chebyshev series for analytic functions (cf. Theorem A.1) and the same reasoning as above, we get for any n, K ∈ N a bound involving χ̃ > 0 given by χ̃ = 2λ max^−1 (χ + √(χ(λ max + χ))) = h(χλ max^−1 ), where for x > 0, h(x) = 2(x + √(x(1 + x))). In particular, for x ∈ (0, 1), we have h(x) ≥ 2√x. Following Proposition 2.1 and Assumption 5.1, for any n > N Cheb we have χλ max^−1 ∈ (0, 1). Taking n > M Cheb = max{N Cheb , N′ Cheb }, and using that log(1 + x) ≥ x/2 for x ∈ (0, 1), yields for any n > M Cheb a lower bound involving C χ,λ = χ(2C λ ) −1 . This in turn gives the second inequality.
For a fixed number of degrees of freedom n in Theorem 5.8, the approximation error ‖Z n − Z n,K ‖ L 2 (Ω;H) converges to 0 as the order of the polynomial approximation K goes to infinity. Choosing K as a function of n that grows fast enough then ensures the convergence of the approximation error as n goes to infinity. For instance, let us assume that γ is once differentiable with a derivative of bounded variation (i.e., ν = 1 in Definition 2.3), and take for simplicity λ max = λ_n^(n). Assuming that Assumption 5.1 is
satisfied, following Proposition 2.1 then yields a choice of K as a function of n, where f denotes any function with lim n→∞ f (n) = +∞, which ensures that the approximation error ‖Z n − Z n,K ‖ L 2 (Ω;H) goes to 0 at least as fast as f goes to infinity. In Section 6.3, we provide another example of the choice of K, for an analytic power spectral density.
In practice though, the order K of the polynomial approximation is set differently, which allows one to work with relatively small orders. It is suggested in [44] to set K by controlling the deviation in distribution between the samples obtained with and without the polynomial approximation. We propose an approach based on the numerical properties of Chebyshev series, and show in the numerical experiments that it limits the approximation order.
Observe that the random weights (22) defining the Chebyshev polynomial approximation Z n,K are obtained by summing random vectors of the form c k T k ((2/λ max )S − I)W, where W ∼ N (0, I) and c 0 , . . ., c K are the Chebyshev series coefficients of the function γ defined in (20). The Chebyshev polynomials {T k } k∈N take values in [−1, 1], meaning in particular that the eigenvalues of the matrices T k ((2/λ max )S − I) lie in the same interval. Consequently, the norm of each summand is controlled by the magnitude of the corresponding coefficient c k . Since the coefficients c k converge to 0 at least linearly for power spectral densities (cf. Theorem A.1), the order K can be chosen to ensure that the ratio |c K |/c max ≪ 1 or that the bound |c K | n 1/2 ≪ 1. Then, in practice, adding more terms to the expansion only results in negligible perturbations of the solution.
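The coefficient-based stopping rule above can be sketched numerically. The sketch below (in Python) computes Chebyshev interpolation coefficients of a function on [0, λ max ] and selects the smallest truncation order K whose relative coefficient magnitude drops below a tolerance; the power spectral density `gamma` and the bound `lam_max` used here are hypothetical placeholders, and this is an illustration of the selection rule, not the authors' implementation.

```python
import numpy as np

def chebyshev_coeffs(f, lam_max, n_coef):
    # Chebyshev-Gauss nodes on [-1, 1], mapped to [0, lam_max]
    k = np.arange(n_coef)
    t = np.cos(np.pi * (k + 0.5) / n_coef)
    lam = 0.5 * lam_max * (t + 1.0)
    fv = f(lam)
    # discrete cosine transform of the nodal values gives the coefficients
    c = np.array([2.0 / n_coef * np.sum(fv * np.cos(j * np.pi * (k + 0.5) / n_coef))
                  for j in range(n_coef)])
    c[0] *= 0.5
    return c

def truncation_order(c, tol=1e-12):
    # smallest K such that |c_k| / max|c| < tol for all k > K
    cmax = np.max(np.abs(c))
    significant = np.nonzero(np.abs(c) / cmax >= tol)[0]
    return int(significant[-1]) if significant.size else 0

gamma = lambda lam: (1.0 + lam) ** -1.0   # hypothetical PSD (beta = 1)
c = chebyshev_coeffs(gamma, lam_max=100.0, n_coef=256)
K = truncation_order(c, tol=1e-12)
```

Since this γ is analytic on [0, 100], the coefficients decay exponentially and the selected K is far below the number of computed coefficients.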

Complexity analysis
Recall that the Galerkin-Chebyshev approximation Z n,K of discretization order n ∈ N and polynomial order K ∈ N of a random field Z is defined through random weights Ẑ = ( Ẑ 1 , . . ., Ẑ n ) T , a Gaussian random vector with mean 0 and known covariance matrix, whose samples can be computed by solving a linear system driven by W ∼ N (0, I) (cf. (32)). We now discuss the computational and storage cost of sampling a GRF using this approximation. In a first part, we derive these costs in the case where nothing further is assumed about the basis {ψ k } 1≤k≤n used to discretize the field. In a second part, we then show how particular choices of this basis can drastically improve these costs. The computational and storage costs obtained in each case are summarized in Table 1. Each time, we distinguish offline computational costs, linked to operations that can be reused to generate more samples, from online computational costs, linked to steps that are specific to the computation of a given sample. In particular, we observe that the spectral method seems to perform best; but, as we will see, this method is rarely applicable, and in practice we will prefer the method based on linear finite elements with a mass lumping approximation, which still offers overall computational costs that grow linearly with the product Kn (see Sections 6.2.1 and 6.2.2 for more details).
6.1. Efficient sampling: general case. Generating samples of the weights Ẑ in (32) requires two steps. First, one computes the vector X = P γ,K (S)W for some W ∼ N (0, I); since P γ,K is a polynomial, this step can be implemented as an iterative procedure involving at each step only one matrix-vector product between S and a vector. Then, one solves the linear system (√C) T Ẑ = X.
In order to execute these two steps, one only needs to implement the following two subalgorithms: an algorithm Π S taking as input a vector x and returning the product Π S (x) = Sx; and an algorithm Π (√C) −T taking as input a vector x and returning the solution y = Π (√C) −T (x) to the linear system (√C) T y = x.
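The first step, applying P γ,K (S) to a vector through repeated products by S, can be sketched with the standard three-term recurrence of Chebyshev polynomials. In the sketch below, `apply_S` (standing in for Π S ), the eigenvalue bound `lam_max`, and the coefficients `c` are assumed inputs, placeholders for the quantities defined in the text.

```python
import numpy as np

def cheb_matvec(apply_S, c, lam_max, w):
    # A = (2/lam_max) S - I maps the spectrum of S into [-1, 1]
    apply_A = lambda v: (2.0 / lam_max) * apply_S(v) - v
    t_prev = w                 # T_0(A) w
    t_cur = apply_A(w)         # T_1(A) w
    x = c[0] * t_prev
    if len(c) > 1:
        x = x + c[1] * t_cur
    for ck in c[2:]:
        # three-term recurrence: T_{k+1}(A) w = 2 A T_k(A) w - T_{k-1}(A) w
        t_prev, t_cur = t_cur, 2.0 * apply_A(t_cur) - t_prev
        x = x + ck * t_cur
    return x

# illustration with a diagonal S (eigenvalues 1 and 3)
apply_S = lambda v: np.array([1.0, 3.0]) * v
```

Each iteration costs one call to `apply_S` plus vector operations, in line with the O(Kn^2) (dense) or O(Kµn) (sparse) counts discussed below.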

Table 1. Comparison of computational and storage costs for computing a GRF sample from a Galerkin-Chebyshev approximation of discretization order n ∈ N and polynomial order K ∈ N, for various choices of discretization basis. The parameter µ is an upper bound for the mean number of nonzero entries per row in √C and R, and η Chol (C) is the computational cost of a Cholesky factorization of C.

                              Offline costs                      Online costs    Storage costs
General case                  O(K log K + n 3 )                  O(Kn 2 )        O(n 2 + K)
Linear finite elements
  + Cholesky                  η Chol (C) + O(µn + K log K)       O(Kµn)          O(µn + K)
  + mass lumping              O(K log K)                         O(Kµn)          O(µn)
Spectral method               —                                  O(n)            O(n)
We present in Algorithm 1 of the Supplementary Materials the overall algorithm for sampling the weights of the decomposition defined in (31) using this approach. Following the definition of S in Corollary 3.2, Π S does not require the matrix S to be computed explicitly and stored: a product by S boils down to solving a first linear system defined by (√C) T , multiplying the obtained solution by R, and then solving a second linear system defined by √C. Hence, both Π (√C) −T and Π S rely on solving linear systems involving a square-root of the mass matrix C (or its transpose). The cost associated with calls to Π (√C) −T and Π S should be kept minimal in order to reduce the overall computational complexity of the sampling algorithm. Since the choice of this square-root is free, one could take it to be the Cholesky factorization of C, i.e., √C = L for some lower-triangular matrix L. Solving a linear system involving L or L T can be done at roughly the cost of a matrix-vector product using forward or backward substitution. The algorithms Π S and Π (√C) −T resulting from this choice are presented in Algorithms 2 and 3 of the Supplementary Materials. Since solving a linear system by forward or backward substitution requires a computational cost of the same order as a matrix-vector product (namely O(n 2 ) operations), each call to Π S or Π (√C) −T amounts to O(n 2 ) operations. This means that, if implementations of these two algorithms are available, the cost of computing the weights Ẑ in (32) is of order O(Kn 2 ), where K is the order of the polynomial approximation.
Finally, recall that one needs an upper bound λ max of the largest eigenvalue of S in order to define the polynomial P γ,K . This upper bound can be obtained at a limited computational cost (namely O(n 2 ) operations) by combining the Gershgorin circle theorem [24] and a power iteration scheme (as described in Section SM4.1 of the Supplementary Materials).
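A minimal sketch of such an eigenvalue estimate is given below: a power iteration on S (accessed only through a matvec routine `apply_S`, an assumed input), inflated by a purely illustrative safety factor. The actual scheme of Section SM4.1 additionally uses a Gershgorin-type cap to make the bound guaranteed; this sketch only illustrates the power-iteration part.

```python
import numpy as np

def power_iteration_bound(apply_S, n, n_iter=50, safety=1.1, seed=0):
    # estimate the largest eigenvalue of S via power iteration, then
    # inflate the Rayleigh-quotient estimate by a safety factor
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    lam = 0.0
    for _ in range(n_iter):
        y = apply_S(x)
        lam = float(np.dot(x, y) / np.dot(x, x))  # Rayleigh quotient
        x = y / np.linalg.norm(y)
    return safety * lam
```

Each iteration costs one call to `apply_S`, so the total cost matches the O(n^2) (dense) or O(µn) (sparse) counts quoted in the text.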
Overall, the computational cost of sampling the weights of the Galerkin-Chebyshev approximation Z n,K in (31) can be summarized as follows, distinguishing between offline and online steps. The offline steps are: first, the computation of the coefficients of the Chebyshev approximation P γ,K , which requires O(K log K) operations as mentioned in the previous subsection; then, the Cholesky factorization of C, which requires O(n 3 ) operations [46, Chapter 2]; and finally, the computation of the upper bound of the eigenvalues of S, which requires O(n 2 ) operations (dominated by the power iteration scheme). The online step is the computation of the weights according to (32), which requires O(Kn 2 ) operations. Storage-wise, this workflow only requires enough space to store the Cholesky factor of the mass matrix C, the stiffness matrix R, the K + 1 coefficients of the Chebyshev polynomial approximation, and a few vectors of size n. In conclusion, the offline costs are of order O(K log K + n 3 ), the online costs are of order O(Kn 2 ), and the storage needs are of order O(n 2 + K). As we will see in the next section, both computational and storage costs can be reduced for typical choices of the discretization space V n .

6.2. Efficient sampling: particular cases. The choice of the space V n used to discretize the random field heavily impacts the mass and stiffness matrices, and can in relevant cases be leveraged to speed up the sampling process. We provide here two examples, which will be considered later on in the numerical experiments.

6.2.1. Spectral approximation. If the eigenvalues of the Laplace-Beltrami operator are known, we can use spectral methods, which correspond to the case where V n is spanned by the eigenfunctions associated with the first n eigenvalues of the Laplace-Beltrami operator. Then, since the eigenfunctions are orthonormal, the mass matrix C is equal to the identity matrix. Besides, using
Green's theorem, we have that the stiffness matrix R is also diagonal, with entries equal to the operator eigenvalues. This gives that S = R is diagonal.
Thus, sampling the weights of Z n,K can be done without any Cholesky factorization: calls to Π S are replaced by multiplications by the diagonal matrix R containing the eigenvalues of the operator, calls to Π (√C) −T are replaced by products with the identity matrix, and the upper bound λ max is given by the maximal entry of R. In particular, the offline costs reduce to the computation of the coefficients of P γ,K , and the online costs reduce to O(n). The storage needs also reduce to O(n) (since both C and R are diagonal).
In practice though, the Chebyshev polynomial approximation is not even necessary. One can directly use Theorem 3.4 to compute samples of Z n (and therefore there is no need to approximate it by Z n,K ): S being diagonal, the matrix γ 2 (S) is the diagonal matrix obtained by applying γ 2 directly to the diagonal entries of S. Samples of Z n are then obtained by taking the weights Ẑ as a sequence of independent Gaussian random variables with variances given by the diagonal entries of γ 2 (S) (since C is the identity matrix). In conclusion, no offline costs are needed for the spectral method, the online costs are of order O(n), and the storage needs are of order O(n).
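In the diagonal case, the sampling step described above reduces to a few lines. The sketch below, where `eigvals` and `gamma` are placeholders for the operator eigenvalues and the power spectral density, draws the weights as independent centered Gaussians with standard deviations γ(λ k ), as stated in the text.

```python
import numpy as np

def sample_spectral_weights(eigvals, gamma, seed=None):
    # C = I and gamma^2(S) is diagonal, so the weights are independent
    # N(0, gamma(lambda_k)^2) variables; cost and storage are O(n)
    rng = np.random.default_rng(seed)
    std = gamma(np.asarray(eigvals, dtype=float))
    return std * rng.standard_normal(len(std))
```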
These computational costs might seem ideal, but one should remember that the spectral method is only applicable when the eigenfunctions and eigenvalues of the Laplace-Beltrami operator are known. This is the case for instance when working on rectangular Euclidean domains, for which the eigenfunctions correspond to the Fourier basis (and we retrieve the classical spectral methods), or on the sphere, for which the eigenfunctions are the spherical harmonics (see Section 7 for more details). For other compact Riemannian manifolds, these are unknown, which is why we propose the next method, relying on the finite element method.

6.2.2. Linear finite element spaces. Consider the case where V n is a finite element space of (piecewise) linear functions associated with a simplicial mesh of the manifold M. In this case, the basis functions composing V n have a support limited to a few elements of the mesh, and the matrices C and R are therefore sparse. Besides, for uniform meshes, one can bound the number of nonzero entries in each row of these matrices. This sparsity can be leveraged to reduce the cost of sample generation.
The cost η Chol (C) of the Cholesky factorization now depends on the number of nonzero entries of C, and adequate permutations can be found to ensure that the factors are themselves sparse. This cost is of course upper-bounded by the cost of the Cholesky factorization of a dense matrix, i.e., O(n 3 ), but in practice the sparsity of the matrix is leveraged to achieve a lower computational cost. Consequently, the costs associated with calling Π S or Π (√C) −T are reduced to order O(µn), where µ ≪ n denotes an upper bound for the mean number of nonzero entries per row in √C and R. This means in particular that the computational cost of computing the weights through (32) drops to O(Kµn) operations. Similarly, using the same approach as the one described in Section 6.1, the upper bound λ max can be computed in O(µn) operations. In conclusion, the offline costs are of order η Chol (C) + O(µn + K log K), the online costs are of order O(Kµn), and the storage needs are of order O(µn + K).
In practice, an additional approximation can be made to further reduce the computational cost of the algorithm. As advocated by Lindgren et al. [38], the mass matrix C can be replaced by a diagonal approximation obtained by lumping the entries of C onto the diagonal. This approach results in a Markovian approximation of the random field, and is inspired by the lumped mass approximation proposed by Chen and Thomée [13] for parabolic PDEs. On Euclidean domains, this approach introduces an error in the covariance of the resulting field of order O(h 2 ), where h is the mesh size, which, for a uniform mesh, is linked to the dimension n of the finite element space by n = O(h −d ). We show in the numerical experiments in Section 7 that this additional error does not affect the theoretical convergence rates derived in Section 5.
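A sketch of this lumping shortcut is given below, assuming the row-sum variant of mass lumping (the exact lumping used in [38] may differ in details): once the mass matrix is diagonal, both its square-root and the corresponding linear solves become trivial, entrywise operations.

```python
import numpy as np

def lumped_sqrt(C):
    # row-sum lumping (an assumed variant): replace C by diag(sum_j C_ij),
    # then take the square-root entrywise; returned as a vector
    d = np.asarray(C, dtype=float).sum(axis=1)
    return np.sqrt(d)

def solve_sqrtC_T(sqrt_d, x):
    # the solve with (sqrt C)^T reduces to a diagonal solve, O(n) time
    return np.asarray(x, dtype=float) / sqrt_d
```

This replaces both the offline Cholesky factorization and the online substitution solves, in line with the cost reductions described next.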
Following the lumped mass approach, the square-root √C, previously computed as a Cholesky factor, is replaced by the square-root of the lumped matrix, i.e., the diagonal matrix obtained by taking the square-root of its diagonal entries. This completely eliminates the need for a Cholesky factorization. The linear system previously solved by substitution can also be solved trivially in linear time, since the matrix is diagonal. As for the upper bound λ max , it can be computed directly, without requiring a power iteration method. Then, the offline costs of our approach drop to O(K log K) and the online costs are of order O(Kµn). The storage needs are reduced to O(µn) (since both the lumped mass matrix and R are sparse). These costs are drastically smaller than those of the naive approach presented at the beginning of Section 4, which entailed a storage need of O(n 2 ) and a computational complexity of O(n 3 ) operations. The storage costs now grow linearly with n, and the computational costs grow linearly with K and n, hence rendering the algorithm much more scalable.

6.3. Application: simulation of Whittle-Matérn fields. To conclude this section, we apply the convergence results of Section 5 and the computational complexities derived in this section to the approximation of Whittle-Matérn random fields, i.e., fields with a power spectral density given by (5).

Corollary 6.1. Let Assumption 5.1 be satisfied, and let γ be given by (5). Then, the approximation error of the random field Z by its Galerkin-Chebyshev approximation Z n,K of order K ∈ N satisfies (33), where C Galer is a constant independent of n and K, ρ = min {s; r; (αβ − 1/2)} > 0, C κ,λ = 2C λ^1/2 κ −1 , α > 0 and C λ > 0 are defined in Proposition 2.1, κ > 0 and β > 0 are as in (5), and r > 0 and s > 0 are given in Assumption 5.1.
In particular, there exist ε 0 , C 1 , C 2 > 0 (depending only on γ, C Galer and the constants defined in Proposition 2.1 and Assumption 5.1) such that for any ε ∈ (0, ε 0 ), taking n = (C 1 ) 1/ρ ε −1/ρ and K as specified below yields a root-mean-squared error bounded by ε.
Proof. To ease the reasoning, let us consider once again that the upper bound λ max corresponds exactly to the maximal eigenvalue of S. The inequality (33) follows directly from Corollary 5.5, after noting that z ∈ C → γ(z) is holomorphic in the ellipse centered at z = λ max /2, with foci z 1 = 0 and z 2 = λ max , and semi-major axis a = λ max /2 + κ 2 /2 (i.e., χ = κ 2 /2 in Corollary 5.5), and that |γ| can be bounded in this ellipse by (κ 2 /2) −β . The error ‖Z − Z n,K ‖ L 2 (Ω;H) then satisfies the following inequality, where E Galer = C Galer n −ρ denotes the contribution to the error estimate due to the Galerkin approximation, and E Cheb the contribution due to the Chebyshev approximation, i.e.,

Table 2. Comparison of offline computational, online computational, and storage costs for computing a Galerkin-Chebyshev approximation of a Whittle-Matérn field with a root-mean-squared error bounded by ε > 0, for the same choices of discretization basis as in Table 1 (general case, linear finite elements with Cholesky factorization, linear finite elements with mass lumping, spectral method). The parameter µ is an upper bound for the mean number of nonzero entries per row in √C and R, and η Chol (C) is the computational cost of a Cholesky factorization of C. The entries are obtained by substituting n = (C 1 ) 1/ρ ε −1/ρ and K = C 2 ε −α/(2ρ) |log ε| into the estimates of Table 1.
In conclusion, let ε̃ ∈ (0, ε̃ 0 ) where ε̃ 0 = (1 + C Cheb ) ε max , and let ε = ε̃ (1 + C Cheb ) −1 . Then ε ∈ (0, ε max ), and the stated choice of n and K yields a root-mean-squared error bounded by ε̃. As a consequence, we can derive the computational cost required to sample a GRF with root-mean-squared error ‖Z − Z n,K ‖ L 2 (Ω;H) smaller than some small ε by taking n = (C 1 ) 1/ρ ε −1/ρ and K = C 2 ε −α/(2ρ) |log ε| in the estimates in Table 1. We end up with the bounds in Table 2. We also provide the computational cost associated with the choice of linear finite elements with a mass lumping approximation. This method introduces an additional error term due to the mass lumping approximation, but in practice this does not seem to affect the theoretical convergence rates of the root-mean-squared error, which leads us to believe that the analysis leading to Corollary 6.1 (and therefore to the estimates in Table 2) can still be carried out in this case. We finally observe that a Galerkin-Chebyshev approximation of a Whittle-Matérn field with a root-mean-squared error bounded by ε > 0 can be asymptotically obtained with a computational cost of O(µ ε −(α/2+1)/ρ |log ε|) using linear finite elements with a mass lumping approximation.

Numerical experiments
In this section we confirm the convergence estimates derived in Section 5 through numerical experiments. In a first subsection, we restrict ourselves to the specific case where the Riemannian manifold of interest (M, g) is the 2-sphere endowed with its canonical metric, as in this case the eigenvalues and eigenfunctions of the Laplace-Beltrami operator are known, and hence the exact solution can be computed and compared to the various approximations introduced in this work. In a second subsection, we investigate the case where the Riemannian manifold of interest is a hyperboloid, for which, even though the eigenvalues and eigenfunctions of the Laplace-Beltrami operator are not known, we are still able to retrieve the error estimate for the covariance.
where for l ∈ N and m ∈ [[0, l]], P_l^m denotes the associated Legendre polynomial with indices l and m. In the remainder of this section, we use α = 2/d = 1, by Weyl's asymptotic law in Proposition 2.1.
On the sphere, the Gaussian random fields defined using functions of the Laplacian γ(−∆ M ) as in (4) are particular instances of the class of isotropic random fields on the sphere described in [35]. The covariance C(θ) of such a field between any two points on the sphere depends only on the spherical distance θ separating the points, through the relation (34), where P l , l ∈ N 0 , denotes the Legendre polynomial of order l. Finally, we restrict our numerical experiments to Whittle-Matérn fields by considering power spectral densities of the form γ(λ) = |κ 2 + λ| −β for λ ≥ 0 and some parameters κ > 0, β > 1/2. We introduce an additional parameter a, which we call the practical range, and which is defined from the parameters ν and κ by a = 3.6527 κ −1 ν 0.4874 . In the remainder of this section, the power spectral densities γ will be characterized by choices of the parameters ν and a. The rationale behind the parameter a comes from numerical experiments conducted in [47], which showed that the correlation range of the Matérn covariance function (on R 2 ) is very well approximated by a, thus yielding a rule of thumb for choosing κ.
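The parametrization by (ν, a) can be made concrete as follows. The sketch below inverts the practical-range rule quoted above to recover κ and evaluates the Whittle-Matérn power spectral density; the relation β = (ν + d/2)/2 used here is inferred from the convergence orders αβ − 1/2 = ν/2 with α = 2/d stated in the text, and should be read as an assumption.

```python
import numpy as np

def kappa_from_range(a, nu):
    # invert the rule a = 3.6527 * kappa^(-1) * nu^0.4874 from the text
    return 3.6527 * nu ** 0.4874 / a

def matern_psd(lam, a, nu, d=2):
    # beta = (nu + d/2)/2 is an inferred relation (see lead-in)
    kappa = kappa_from_range(a, nu)
    beta = (nu + d / 2.0) / 2.0
    return (kappa ** 2 + np.asarray(lam, dtype=float)) ** (-beta)
```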
We now present the results obtained when numerically computing the truncation error and the covariance error. Results on the error due to the polynomial approximation can be found in Section SM4.3 of the Supplementary Materials.

7.1.1. Truncation error. We compute the error ‖Z − Z (n)‖ L 2 (Ω;H) between the full expansion Z and its truncation Z (n) at order n ∈ N, for various choices of n. This error corresponds to the error term derived in Theorem 5.6 when the discretization space V n is spanned by the first n eigenfunctions of the Laplace-Beltrami operator (cf. Section 6.2.1). In this case, Assumption 5.1 holds for arbitrarily large values of the exponents r, s > 0, and we therefore expect a convergence of order αβ − 1/2 = ν/2.
We compute truncation errors for the power spectral densities γ given by ν = 0.75, a ∈ {π/6, π/3}, yielding an expected convergence of order 0.375, and by ν = 1, a ∈ {π/6, π/3}, yielding an expected convergence of order 0.5, with truncation orders n ∈ {10 2 , 10 3 , 10 4 , 5 · 10 4 , 10 5 }. Samples of the corresponding truncated fields are generated using the approach presented in Section 6.2.1. The error ‖Z − Z (n)‖ L 2 (Ω;H) is approximated by a Monte Carlo estimate, where for any k, Z (N max ) k is an independent realization of the truncation of Z at a very high order N max = 10 6 , and Z (n) k is the truncation of Z (N max ) k at order n. The number of samples used for this study is N simu = 500, which is sufficient, as larger choices of N max have little impact on the results. The results are presented in Figure 2 and show that the theoretical orders of convergence are systematically retrieved.

7.1.2. Covariance error and computational cost. The covariance error refers to the absolute error in covariance between the model random field and its approximation used in practice. We take here the discretization space V n to be the finite element space of piecewise linear functions defined on a polyhedral approximation of the sphere with triangular faces, hence following the surface finite element method (SFEM) approach [19].
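The Monte Carlo estimate of the truncation error described above can be sketched as follows; `draw_pair` is a hypothetical routine returning the coefficient vectors of a reference realization Z (N max ) and of its truncation Z (n) in an orthonormal basis, so that the H-norm of the difference is a Euclidean norm.

```python
import numpy as np

def mc_rmse(draw_pair, n_simu):
    # average the squared H-norms of Z^(N_max) - Z^(n) over n_simu
    # independent realizations, then take the square-root
    sq_errors = [float(np.sum((z_ref - z_trunc) ** 2))
                 for z_ref, z_trunc in (draw_pair() for _ in range(n_simu))]
    return float(np.sqrt(np.mean(sq_errors)))
```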
We generate 10 6 samples of the random field Z n,K while considering finite element spaces defined on gradually refined polyhedral approximations of the sphere. For each choice of parameters defining the spectral density γ, we set the order K of the polynomial approximation using the approach described in Section 5.2, with the criterion |c K /c max | < 10 −12 . The covariance error is computed as an error between the covariance functions of the field Z and its approximation Z n,K : the former is given by (34), and the latter is approximated by a Monte Carlo estimator. The overall error between both covariance functions is then evaluated as the maximum absolute error between their evaluations on a grid of 500 equispaced points in (0, π) along a great circle. The covariance errors are presented in Figure 3a, and confirm the theoretical convergence rate 2αβ − 1 = ν.
Finally, we present the order of the polynomial approximation in Figure 3b and, in Figure 3c, the computation time needed to generate the samples used to compute the covariance errors. We observe that although the order of the polynomial approximation grows, the computation time remains small, at less than half a second. Finally, we compute the covariances with this same approach on a very fine polyhedral approximation of M (with 540900 nodes) and use these values as the reference solution. We then compute, for each level of discretization of M, the maximal absolute error between the covariance values and this ground truth. The result of the numerical experiment is presented in Figure 5. The parameters defining the power spectral density γ are ν = 1 and a = 0.5 (defined as in Section 7), meaning that we expect convergence of rate ν = 1. As can be observed, we retrieve that the maximal absolute error in the covariance decreases as n^(−1).
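Observed convergence rates such as the n^(−1) decay above are typically read off as the slope of a least-squares fit in log-log scale. A minimal sketch, with purely synthetic error data:

```python
import numpy as np

def observed_order(ns, errs):
    """Observed convergence rate: minus the slope of the least-squares
    line fitted to the points (log n, log err)."""
    slope = np.polyfit(np.log(ns), np.log(errs), 1)[0]
    return -slope

# Synthetic errors decaying exactly like n^(-1) (illustrative only).
ns = np.array([10.0, 20.0, 40.0, 80.0])
errs = 3.0 * ns**-1.0
rate = observed_order(ns, errs)
```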

SM3. Proof of Theorem 5.7 of the main article
We now provide a proof of Theorem 5.7 of the main article, which we first recall.
Theorem. Let Assumptions 2.4 and 5.1 be satisfied. Then there exists some N_2 ∈ N such that for any n > N_2, the covariance error between the random field Z and its discretization Z_n satisfies, for any θ, ϕ ∈ H, …

Discretization error R_D(θ, ϕ): From the triangle inequality, R_D(θ, ϕ) ≤ …, where Z^(n) is defined as …, so that R_D(θ, ϕ) ≲ ‖θ‖_0 ‖ϕ‖_0 n^(−r). If now q > 1, since α(q − 1) ≤ r, R_D …

SM4. Sampling a Galerkin-Chebyshev approximation of a random field
In Section 4 of the main article, we present an approach to generate samples of the Galerkin-Chebyshev approximation of a random field defined on a Riemannian manifold. We provide here additional implementation details and pseudo-code for this approach.

SM4.1. An upper bound for the eigenvalues of the stiffness matrix. In order to define the polynomial P_{γ,K} used to approximate the power spectral density γ defining the random field, one needs to provide an upper bound λ_max of the largest eigenvalue of the stiffness matrix S. Let us denote by λ_n^(n) this maximal eigenvalue. Recall from Theorem 3.1 and Corollary 3.2 of the main article that the eigenvalues of S are exactly those of the stencil (R, C). As such, they can be upper-bounded by the maximum of the associated Rayleigh quotient, thus giving

λ_n^(n) ≤ max_{x≠0} (xᵀRx)/(xᵀCx) ≤ (max_{‖x‖=1} xᵀRx)/(min_{‖x‖=1} xᵀCx).
We recognize on the right-hand side of the last inequality the ratio between two Rayleigh quotients. Hence, we can conclude that an upper bound λ_max of the eigenvalues of S is obtained by taking the ratio λ_max = λ_max(R)/λ_min(C), where λ_max(R) (resp. λ_min(C)) is an upper bound (resp. lower bound) on the eigenvalues of the stiffness matrix R (resp. mass matrix C). On the one hand, λ_max(R) can be obtained using the Gershgorin circle theorem, which only requires summing the entries of R row-wise (or column-wise) to get the bound. On the other hand, λ_min(C) can be taken to be the inverse of an upper bound λ_max(C^(−1)) on the eigenvalues of the inverse of C. This upper bound can in turn be obtained using a power iteration scheme, which requires solving linear systems defined by C.
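Both ingredients of the bound can be sketched as follows, assuming dense symmetric NumPy matrices; in practice R and C are sparse and the linear solves would reuse a single factorization of C.

```python
import numpy as np

def gershgorin_upper_bound(R):
    """Upper bound on the eigenvalues of a symmetric matrix R via the
    Gershgorin circle theorem: max_i ( R_ii + sum_{j != i} |R_ij| )."""
    off_diag = np.sum(np.abs(R), axis=1) - np.abs(np.diag(R))
    return float(np.max(np.diag(R) + off_diag))

def lambda_max_inv(C, n_iter=200):
    """Estimate lambda_max(C^{-1}) = 1 / lambda_min(C) by power iteration
    on C^{-1}; each step solves a linear system defined by C."""
    x = np.ones(C.shape[0])
    lam = 0.0
    for _ in range(n_iter):
        x = np.linalg.solve(C, x)   # in practice: a prefactorised solve
        lam = float(np.linalg.norm(x))
        x = x / lam
    return lam

def stiffness_eig_bound(R, C):
    # lambda_max(S) <= lambda_max(R) / lambda_min(C)
    return gershgorin_upper_bound(R) * lambda_max_inv(C)
```

Note that the power iteration returns an estimate that converges to λ_max(C^(−1)) from below, so a small safety factor can be applied before using it in the bound.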
SM4.2. Workflow and pseudo-code. We now present the workflow used to generate samples of the Galerkin-Chebyshev approximation and some pseudo-code associated with the different steps of this workflow. The overall workflow is presented in Workflow 1. The weights of the Galerkin-Chebyshev approximation can be sampled using Algorithm 1. This algorithm relies on the following two sub-algorithms: an algorithm Π_S taking as input a vector x and returning the product Π_S(x) = Sx;
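The heart of such a sampling algorithm is the evaluation of a Chebyshev matrix polynomial applied to a vector, using only the matrix-vector product Π_S and the three-term recurrence T_{k+1}(x) = 2x T_k(x) − T_{k−1}(x). A minimal sketch, in which the stiffness matrix, the Chebyshev coefficients, and the sampled Gaussian vector are all placeholders:

```python
import numpy as np

def cheb_matvec(coeffs, pi_s, x, lam_max):
    """Evaluate ( sum_k c_k T_k(S_s) ) x with the Chebyshev three-term
    recurrence, where S_s = (2/lam_max) S - I maps the spectrum of S
    from [0, lam_max] into [-1, 1], and pi_s(v) returns the product S v."""
    def shifted(v):
        return (2.0 / lam_max) * pi_s(v) - v
    t_prev, t_curr = x, shifted(x)       # T_0(S_s) x and T_1(S_s) x
    y = coeffs[0] * t_prev + coeffs[1] * t_curr
    for c in coeffs[2:]:
        # T_{k+1}(S_s) x = 2 S_s T_k(S_s) x - T_{k-1}(S_s) x
        t_prev, t_curr = t_curr, 2.0 * shifted(t_curr) - t_prev
        y = y + c * t_curr
    return y

# Sampling sketch: apply the polynomial to a standard Gaussian vector.
rng = np.random.default_rng(2)
S = np.diag([0.5, 1.5])                  # placeholder "stiffness" matrix
coeffs = [0.3, 0.5, 0.2, 0.1]            # placeholder Chebyshev coefficients
w = cheb_matvec(coeffs, lambda v: S @ v, rng.standard_normal(2), lam_max=2.0)
```

Each step costs one call to Π_S plus a few vector operations, so the overall cost is K matrix-vector products for a degree-K polynomial, which is what makes the Galerkin-Chebyshev sampler efficient on sparse stencils.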

Figure 1. Simulations of Gaussian random fields on various (compact Riemannian) manifolds.

Assumption 2.4. The power spectral density considered in this work satisfies the relation 2αβ − 1 > 0, where β > 0 is defined in Definition 2.3 and α > 0 is defined in Proposition 2.1.

Figure 4. Whittle-Matérn field on the sphere with the sampled points P used to compute the covariances (in black). The point (1, 0, 0) is colored in red, and the colors on the surface stand for the values taken by the field.