Sparse operator compression of higher-order elliptic operators with rough coefficients

We introduce the sparse operator compression to compress a self-adjoint higher-order elliptic operator with rough coefficients and various boundary conditions. The operator compression is achieved by using localized basis functions, which are energy-minimizing functions on local patches. On a regular mesh with mesh size $h$, the localized basis functions have supports of diameter $O(h\log(1/h))$ and give optimal compression rate of the solution operator. We show that by using localized basis functions with supports of diameter $O(h\log(1/h))$, our method achieves the optimal compression rate of the solution operator. From the perspective of the generalized finite element method to solve elliptic equations, the localized basis functions have the optimal convergence rate $O(h^k)$ for a $(2k)$th-order elliptic problem in the energy norm. From the perspective of the sparse PCA, our results show that a large set of Mat\'{e}rn covariance functions can be approximated by a rank-$n$ operator with a localized basis and with the optimal accuracy.


Introduction
1.1. Main objectives and the problem setting. The main purpose of this paper is to develop a general strategy to compress a class of self-adjoint higher-order elliptic operators by localized basis functions that give optimal approximation property of the solution operator. To be more specific, suppose L is a self-adjoint elliptic operator in the divergence form which is the optimal approximation error of L −1 among all positive semidefinite operators with range space spanned by Ψ . Using E oc (Ψ ; (L + λ G ) −1 ) for some λ G > 0 to quantify the compression error is useful for operators that are not invertible, such as −∆ with periodic boundary conditions. Without imposing the sparsity constraints on the basis Ψ , the compression error E oc (Ψ ; L −1 ) achieves its minimum λ n+1 (L −1 ) if we use the first n eigenfunctions of L −1 to form Ψ (λ n is the nth eigenvalue arranged in a descending order). However, the eigenfunctions are expensive to compute and do not have localized support [49,41,21]. In many cases, localized/sparse basis functions are preferred. For example, in the multiscale finite element method [13], localized basis functions lead to sparse linear systems, and thus result in more efficient algorithms; see, e.g., [1,23,45,24,2,12,11,29,40,37,5]. In quantum chemistry, localized basis functions like the Wannier functions have better interpretability of the local interactions between particles (see, e.g., [31,10,30,41,27]), and also lead to more efficient algorithms [16]. In statistics, the sparse principal component analysis (SPCA) looks for sparse vectors to span the eigenspace of the covariance matrix, which leads to better interpretability compared with the PCA; see, e.g., [26,49,8,47,46]. 1.2. Summary of our main results. In this paper, we study operator compression for higher-order elliptic operators. We assume that the self-adjoint elliptic operator L is coercive, bounded and strongly elliptic (to be made precise in Section 6.2). Under these assumptions, we construct n basis functions Ψ loc = [ψ loc 1 , . . . , ψ loc n ] that achieve nearly optimal performance on both ends in the accuracy-sparsity trade-off (1.10).
Here, |supp(ψ loc i )| denotes the area/volume of the support of the localized function ψ loc i in R d , and the constant C l is independent of n. 2. If we use a generalized finite element method [1,23,45,11] to solve the elliptic equations, we achieve the optimal convergence rate in the energy norm, i.e., (1.5) , where L n is the stiffness matrix under the basis Ψ loc , · H is the associated energy norm, and C e is independent of n. 3. For the sparse operator compression problem, we achieve the optimal approximation error up to a constant, i.e., (1.6) E oc (Ψ loc ; L −1 ) ≤ C 2 e λ n (L −1 ), where E oc (Ψ loc ; L −1 ) is the operator compression error defined in Eqn. (1.3).
We will focus on the theoretical analysis of the approximation accuracy (1.5) and the localization of the basis functions (1.4).

Our construction.
To construct such localized basis functions Ψ loc = [ψ loc 1 , . . . , ψ loc n ], we first partition the physical domain D using a regular partition {τ i } m i=1 with mesh size h. We pick {ϕ i,q } Q q=1 to be a set of orthogonal basis functions of P k−1 (τ i ), which is the space of all d-variate polynomials of degree at most k − 1 on the patch τ i ⊂ D and Q = k+d−1 d is the dimension of the space P k−1 (τ i ). For r > 0, let S r be the union of the subdomains τ j that intersect with B(x i , r) (for some x i ∈ τ i ) and let ψ loc i,q be the minimizer of the following quadratic problem: (1.7) Here, the space H = {L −1 f : f ∈ L 2 (D)} is the solution space of the operator L, and · H is the energy norm associated with L and the prescribed boundary condition. It is important to point out that the boundary condition of the elliptic problem is already incorporated in the above optimization problem through the solution space H and the definition of the energy norm · H . This variational formulation is very general and can take into account lower-order terms very easily. Collecting all the ψ loc i,q for 1 ≤ i ≤ m and 1 ≤ q ≤ Q together, we get our basis Ψ loc . We will prove that for r = O(h log(1/h)), (1) they achieve the optimal convergence rate to solve the elliptic equation, i.e., (1.8) L −1 f − Ψ loc L −1 n (Ψ loc ) T f H ≤ C e h k f 2 ∀f ∈ L 2 (D), where the constant C e is independent of n.

1.4.
Comparison with other existing methods. Our approach for operator compression originates at the MsFEM and numerical homogenization, where localized multiscale basis functions are constructed to approximate the solution space of some elliptic PDEs with multiscale coefficients; see [1,23,45,13,2,11,29,40,36,37,5]. Specifically, our work is inspired by the work presented in [29,37], in which multiscale basis functions with support size O(h log(1/h)) are constructed for second-order elliptic equations with rough coefficients and homogeneous Dirichlet boundary conditions. In this paper, we generalize the construction [37] and propose a general framework to compress higher-order elliptic operators with optimal compression accuracy and optimal localization.
We remark that although we use the framework presented in [37] as the direct template for our method, to the best of our knowledge, the local orthogonal decomposition (LOD) [29], in the context of multidimensional numerical homogenization, contains the first rigorous proof of optimal exponential decay rates with a priori estimates (leading to localization to subdomains of size h log(1/h), with basis functions derived from the Clement interpolation operator). The idea of using the preimage of some continuous or discontinuous finite element space under the partial differential operator to construct localized basis functions in Galerkin-type methods was even used earlier, e.g., in [17], although it did not provide a constructive local basis. In addition to establishing the exponential decay of the basis (for general nonconforming measurements of the solution, we will generalize the proof of this result to higher-order PDEs and measurements formed by local polynomials), a major contribution of [37] was to introduce a multiresolution operator decomposition for second-order elliptic PDEs with rough coefficients.
There are several new ingredients in our analysis that are essential for us to obtain our results for higherorder elliptic operators with rough coefficients. First of all, we prove an inverse energy estimate for functions in Ψ, which is crucial in proving the exponential decay. In particular, Lemma 4.1 is an essential step to obtaining the inverse energy estimate for higher-order PDEs that is not found in [29] nor [37]. We remark that Lemma 3.12 in [37] provides such an estimate for second-order elliptic operators, by utilizing a relation between the Laplacian operator ∆ and the d-dimensional Brownian motion. It is not straightforward to extend this probabilistic argument to higher-order cases. In contrast, our inverse energy estimate is valid for any 2kth-order elliptic operators and is tighter than the estimation in [37] for the second-order case. Secondly, we prove a projection-type polynomial approximation property in H k (D). This polynomial approximation property plays an essential role in both estimating the compression accuracy and in localizing the basis functions. Thirdly, we propose the notion of the strong ellipticity to analyze the higher-order elliptic operators and show that strong ellipticity is only slightly stronger than the standard uniform ellipticity. Very recently, the authors of [38] introduce the Gaussian cylinder measure and successfully generalize the probabilistic framework in [36,37] to a much broader class of operators, including higher-order elliptic operators without requiring the strong ellipticity.
As in [29,37], the error bound in our convergence analysis blows up for fixed oversampling ratio r/h. To achieve the desired O(h k ) accuracy in the energy norm, we require r/h = O(log(1/h)). There has been some previous attempt to study the convergence of MsFEM using oversampling techniques with r/h being fixed, see, e.g., [19,42]. In particular, the authors of [19,42] showed that if the oversampling ratio r/h is fixed, the accuracy of the numerical solution will depend on the regularity of the solution and cannot be guaranteed for problems with rough coefficients. By imposing r/h = O(log(1/h)), the authors of [19,42] proved that the the MsFEM with constrained oversampling converges with the desired accuracy O(h).
There has been some previous work for second-order elliptic PDEs by using basis functions of support size O(h), see, e.g., [2,22]. However, they need to use O(log(1/h)) basis functions associated with each coarse finite element to recover the O(h) accuracy. The computational complexity of this approach is comparable to the one that we present in this paper. It is worth mentioning that the authors of [22] use a local oversampling operator to construct the optimal local boundary conditions for the nodal multiscale basis and enrich the nodal multiscale basis with optimal edge multiscale basis. Moreover, the method in [22] allows an explicit control of the approximation accuracy in the offline stage by truncating the SVD of the oversampling operator. In [22], the authors demonstrated numerically that this method is robust to high-contrast problems and the number of basis functions per coarse element is typically small. We remark that the recently developed generalized multiscale finite element method (GMsFEM) [11,5] has provided another promising approach in constructing multiscale basis functions with support size O(h).
Another popular way to formulate the operator compression problem is to solve the following l 1 penalized variational problem: (1.10) where ψ i H is the energy norm induced by the operator L. In problem (1.10), enforcing ψ i H to be small leads to a small compression error, enforcing ψ i 1 to be small leads to a sparse basis function, and λ > 0 is a parameter to control the trade-off between the accuracy and sparsity.
The sparse PCA (SPCA) is closely related to the above l 1 -based optimization problem. Given a covariance function K(x, y), the SPCA solves a variational problem similar to Eqn. (1.10): In the SPCA (1.11), we have the minus sign in front the variational term because we are interested in the eigenspace corresponding to the largest n eigenvalues. Although the l 1 approach performs well in practice, neither Problem (1.10) nor the SPCA (1.11) is convex, and one needs to use some sophisticated techniques to solve the non-convex optimization problem or its convex relaxation; see, e.g., [49,8,41,46,27].
In comparison with the l 1 -based optimization method or the SPCA, our approach has the advantage that this construction will guarantee that ψ i,q decays exponentially fast away from τ i . This exponential decay justifies the local construction of the basis functions in Eqn. (1.7). Moroever, our construction (1.7) is a quadratic optimization with linear constraints, which can be solved as efficiently as solving an elliptic problem on the local domain S r . The computational complexity to obtain all n localized basis functions {ψ loc i } n i=1 is only of order N log 3d (N ) if a multilevel construction is employed, where N is the degree of freedom in the discretization of L; see [37]. In contrast, the orthogonality constraint in Eqn. (1.10) is not convex, which introduces additional difficulties in solving the problem. Finally, our construction of is completely decoupled, while all the basis functions in Eqn. (1.10) are coupled together. This decoupling leads to a simple parallel execution, and thus makes the computation of {ψ loc i } n i=1 even more efficient. The rest of the paper is organized as follows. In Section 2, we introduce the abstract framework of the sparse operator compression. In Section 3, we prove a projection-type polynomial approximation property for the Sobolev spaces, which can be seen as a generalization of the Poincare inequality for functions with higher regularity. This polynomial approximation property is critical in our analysis of the higher-order case. It plays a role similar to that of the Poincare inequality in the analysis of the second-order elliptic operator. In Section 4, we prove the inverse energy estimate by scaling. In Section 5, we use the second-order elliptic PDE to illustrate the main idea of our analysis. In Section 6, we first introduce the notion of strong ellipticity, and then prove the exponential decay of the constructed basis function for strongly elliptic operators. In Section 7, we localize the basis functions, and provide the convergence rate for the corresponding MsFEM and the compression rate for the corresponding operator compression. Finally, we present several numerical results to support the theoretical findings in Section 8. Some concluding remarks are made in Section 9 and a few technical proofs are deferred to the Appendix.

Operator compression
In this section, we provide an abstract and general framework to compress a bounded self-adjoint positive semidefinite operator K : X → X, where X can be any separable Hilbert space with inner product (·, ·). In the case of operator compression of an elliptic operator L, K plays the role of the solution operator L −1 and X = L 2 (D). In the case of the SPCA, K plays the role of the covariance operator. In Section 2.1, we introduce the Cameron-Martin space, which plays the role of the solution space of L. In Section 2.2, we provide our main theorem to estimate the compression error. We will use this abstract framework to compress elliptic operators in the rest of the paper.
2.1. The Cameron-Martin space. Suppose {(λ n , e n )} ∞ n=1 are the eigen pairs of the operator K with the eigenvalues {λ n } ∞ n=1 in a descending order. We have λ n ≥ 0 for all n since K is self-adjoint and positive semidefinite. From the spectral theorem of a self-adjoint operator, we know that {e n )} ∞ n=1 forms an orthonormal basis of X.
Lemma 2.1. Let K(X) be the range space of K. We have 1. K(X) is an inner product space with inner product defined by is dense in X if the null space of K only contains the origin, i.e., null(K) = {0}.
We define the Cameron-Martin space H as the completion of K(X) with respect to the norm (·, ·) H . Then, H is a separable Hilbert space and we have the following lemma.
3. For all ψ ∈ X and all f ∈ H, we have Proof.
1. By the continuous imbedding from K(X) to X, we know that a Cauchy sequence in K(X) is also a Cauchy sequence in X. Therefore, we have H ⊂ X. By Eqn. (2.2) and the the continuity of norms, we have (ψ, ψ) ≤ λ 1 (ψ, ψ) H for any ψ ∈ H. 2. It is obvious from item 3 in Lemma 2.1.

Operator compression.
Suppose H is an arbitrary separable Hilbert space and Φ ⊂ H is n-dimensional subspace in H with basis {ϕ i } n i=1 . In the rest of the paper, P (H) Φ denotes the orthogonal projection from a Hilbert space H to its subspace Φ. With this notation, we present our theorem for error estimates below.
Let Ψ be the n-dimensional subspace in H (also in X) spanned by {Kϕ i } n i=1 . Then 1. For any u ∈ K(X) and u = Kf , we have For any u ∈ K(X) and u = Kf , we have where · is the induced operator norm on B(X, X). Moreover, the rank-n operator P (H) In Theorem 2.1, by using a projection-type approximation property of Φ in H, i.e., Eqn. (2.4), we obtain the error estimates of the multiscale finite element method with finite element basis {Kϕ i } n i=1 in the energy norm, i.e., Eqn. (2.5). We will take Φ as the discontinuous piecewise polynomial space later, which is a poor finite element space for elliptic equations with rough coefficients. However, after smoothing Φ with the solution operator K, the smoothed basis functions {Kϕ i } n i=1 have the optimal convergence rate. This datadependent methodology to construct finite element spaces was pioneered by the generalized finite element (GFEM) [1,45], the multiscale finite element method (MsFEM) [23,25,13], and numerical homogenization [29,37].
Our error analysis is different from the traditional finite element error analysis in two aspects. First of all, the traditional error analysis relies on an interpolation type approximation property where higher regularity is required. For example, the error analysis for the FEM with standard linear nodal basis functions for the Poisson equation requires the following interpolation type approximation: |u − I h u| 1,2,D ≤ Ch|u| 2,2,D ∀u ∈ H 2 0 (D), where I h u is the piecewise linear interpolation of the solution u. In Eqn. (2.8), one assumes u ∈ H 2 (D), but this is not the case for elliptic operators with rough coefficients. Secondly, in our projection-type approximation property (2.4) the error is measured by the "weaker" · X norm, while in the traditional interpolation type approximation property the error is measured by the "stronger" · H norm. In this sense, our error estimate relies on weaker assumptions. As far as we know, this kind of error estimate was first introduced in Proposition 3.6 in [37].

Proof. [Proof of Theorem 2.1]
1. For an arbitrary v ∈ Ψ, due to the definition of Ψ, we can write v = K( n i=1 c i ϕ i ), and thus we get By choosing c i such that n i=1 c i ϕ i = P (X) Φ (f ), the second term vanishes. Then, we obtain We use the Aubin-Nistche duality argument to get the estimation in item 2. Let v = K(u − P (H) Ψ u). On one hand, we get On the other hand, we obtain We have used the result of item 1 in the last step. Combining these two estimates, the result follows. 3. From the last item, we obtain that Kf − P (H) Ψ Kf X ≤ k 2 n f X for any f ∈ X. Therefore, we conclude K − P Although the basis functions {Kϕ i } n i=1 have good approximation accuracy, they are typically not localized. Therefore, we construct another set of basis functions {ψ i } n i=1 for Ψ via the following variational approach, which results in basis functions with good localization properties. For any given i ∈ {1, 2, . . . , n}, consider the following quadratic optimization problem It is easy to verify that {Kϕ i } n i=1 are linearly independent if and only if Θ is invertible. We will write Θ −1 as its inverse and Θ −1 i,j as the (i, j)th entry of Θ −1 . It is not difficult to prove the following properties of ψ i , which is defined as the unique minimizer of Eqn. (2.9).
Theorem 2.2. If null(K) ∩ Φ = {0} holds true, then we have 1. The optimization problem (2.9) admits a unique minimizer ψ i , which can be written as 2. For w ∈ R n , n i=1 w i ψ i is the minimizer of ψ H subject to (ϕ j , ψ) = w j for j = 1, 2, . . . , n. Moreover, for any ψ which satisfies (ϕ j , ψ) = w j for j = 1, 2, . . . , n, we have With a good choice of the space Φ and its basis {ϕ i } n i=1 , the energy-minimizing basis ψ i , defined in Eqn. (2.9), enjoys good localization properties. We will prove that the energy-minimizing basis function ψ i decays exponentially fast away from its associated patch. The localization property justifies the following local construction of the basis functions: where S i ⊂ D is a neighborhood of the patch that ψ i is associated with. Compared with Eqn. (2.9), the localized basis ψ loc i is obtained by solving exactly the same quadratic problem but on a local domain S i . To compress elliptic operators with order 2k, we take Φ as the space of (discontinuous) piecewise polynomials, with degree no more than k − 1. We take its basis as is the dimension of the d-variate polynomial space with degree no more than k − 1 and {ϕ i,q } Q q=1 is an orthonormal basis of the polynomial space on the patch τ i . Two main theoretical results in this paper are as follows.
1. The basis function ψ i decays exponentially fast away from its associated patch; see Theorem 6.3 and Theorem 6.4. 2. The localized basis function ψ loc i approximates ψ i accurately; see Theorem 7.1. Meanwhile, the compression rate E oc (Ψ loc ; L −1 ) is the same as E oc (Ψ ; L −1 ); see Theorem 7.2 and Corollary 7.3.

A Projection-type Polynomial Approximation Property
The following projection-type polynomial approximation property in the Sobolev space H k (D) plays an essential role in both obtaining the optimal approximation error and proving the exponential decay of the energy-minimizing basis functions. It can be viewed as a generalized Poincare inequality. Suppose Ω ⊂ R d is affine equivalent to Ω, i.e., there exists an invertible affine mapping such that F ( Ω) = Ω. Let h be the diameter of Ω and δh be the maximum diameter of a ball inscribed in Ω. Let the mapping Π : H k+1 (Ω) → P k (Ω) be the projection onto the polynomial space with degree no greater than k in L 2 (Ω). Then, there exists a constant C(k, Ω) such that for any u ∈ H k+1 (Ω) and any 0 ≤ p ≤ k +1 To prove Theorem 3.1, we use a basic result about the Sobolev spaces, due to J. Deny and J.L. Lions, which pervades the mathematical analysis of the finite element method: over the quotient space H k+1 (D)/P k (D), the seminorm |·| k+1,D is a norm equivalent to the quotient norm. We will use the following theorem (Theorem 3.1.4 in [6]), to prove Theorem 3.1.
and let Π : H k+1 ( Ω) → H m ( Ω) be a continuous linear mapping such that For any open set Ω which is affine equivalent to the set Ω (see Eqn. (3.1)), let the mapping Π Ω be defined by Then, there exists a constant C( Π, Ω) such that, for all affine-equivalent sets Ω, where h = diam(Ω) and δh is the diameter of the biggest ball contained in Ω.
By specializing the operator Π to be the projection of H k+1 ( Ω) to the polynomial space P k ( Ω) in L 2 ( Ω), we can prove Theorem 3.1.
Proof of Theorem 3.1. Let Π : H k+1 ( Ω) → P k ( Ω) be the orthogonal projection in L 2 ( Ω). Let F : Ω → Ω be the invertible linear map and write F ( for all functions v ∈ H k+1 ( Ω) and v ∈ H k+1 (Ω) in the correspondence of the linear mapping. In the following, we prove that Π Ω : In the last equality, we have used the fact that p ∈ P k ( Ω) if p ∈ P k (Ω) and the fact that Π : H k+1 ( Ω) → P k ( Ω) is the orthogonal projection in L 2 ( Ω). Therefore, the kernel space of Π Ω is orthogonal to its range space, i.e., P k (Ω). With the three points above, we have proved that Π Ω is the orthogonal projection from H k+1 (Ω) to P k (Ω) in L 2 (Ω).
We also give the following theorem, which is a direct result of the Friedrichs' inequality; see, e.g., [35].
Here, C f = C f (d, k) depends only on the physical dimension d and the order of the derivative k.

An inverse energy estimation by scaling
In the sparse operator compression, we will show that for a large set of compact operators, the basis functions {ψ i } n i=1 constructed in (2.9) have exponentially decaying tails, which makes localization of these basis functions possible. The following lemma plays a key role in proving such exponential decay property.
Let P s be the space of polynomials with order not greater than s. For γ ≥ 0, there exists C(k, s, d, δ) > 0, such that Proof. Let G h be the Green's function of Eqn. (4.1). After multiplying u h on both sides of Eqn. (4.1) and integration by parts, we have Let {p 1 , p 2 , . . . , p Q } be all the monomials that span P s−1 . It is easy to see Q = s+d−1 d . For convenience, we assume that {p i } Q i=1 are in non-decreasing order with respect to its degree. Specifically, p 1 = 1. Let u h,i be the solution of Eqn. (4.1) with right-hand side p i , and S h , M h ∈ R Q×Q be defined as follows: Then, Eqn. (4.3) is equivalent to and o i is the degree of p i . Therefore, it is easy to check that Notice that both S 1 and M 1 are symmetric positive definite, and let λ max (M 1 , S 1 ) > 0 be the largest generalized eigenvalue of M 1 and S 1 . By choosing we have Combining (4.7) and (4.9), Eqn. (4.5) naturally follows. In Appendix A, we prove that C(k, s, d, Ω 1 ) can be bounded by C(k, s, d, δ), and this proves the lemma.
For the case s = k = 1, we can take as proved in Proposition (A.1). In this case, we have the estimate where |Ω h | is the volume of Ω h . The above bound is tight: when Ω h is a ball with diameter h, the equality holds true. Making use of the mean exit time of a Brownian motion, the author of [37] obtained a different bound where V d is the volume of a unit d-dimensional ball. The two estimates have the same order of δ and h, but our estimates from Lemma 4.1 is much tighter. Moreover, Lemma 4.1 give estimates for any order k and any degree s, which plays a key role in proving the exponential decay in high-order cases, but the mean exit time of a Brownian motion is difficult to generalize to get these higher-order results.

Exponential decay of basis functions: the second-order case
The analysis for a general higher-order elliptic PDE is quite technical. In this section, we will prove that the basis function ψ i for a second-order elliptic PDE has exponential decay away from τ i . When c ≡ 0, this problem has been studied in [37]. When c = 0, it has been recently studied in [39] independently of our work. The results presented in this second-order case are not new [37]. We would like to use the simpler second-order elliptic PDE example to illustrate the main ingredients in the proof of exponential decay for a higher-order elliptic PDE, namely the recursive argument, the projection-type approximation property and the inverse energy estimate.
Consider the following second-order elliptic equation: where D is an open bounded domain in R d , the potential c(x) ≥ 0 and the diffusion coefficient a(x) is a symmetric, uniformly elliptic d×d matrix with entries in L ∞ (D). For simplicity, we consider the homogeneous Dirichlet boundary condition here. We emphasize that all our analysis can be carried over for other types of homogeneous boundary conditions. We assume that there exist 0 < a min ≤ a max and c max such that To simply our notations, for any ψ ∈ H and any subdomain S ⊂ D, ψ H(S) denotes S ∇ψ · a∇ψ + cψ 2 1/2 . For the second-order case, the projection-type approximation property is simply the Poincare inequality. The following lemma provides us the inverse energy estimate. It is a special case of Lemma 6.2, and can be proved by using Lemma 4.1.
Now, we are ready to prove the exponential decay of the basis function ψ i .
, it holds true that with l = e−1 π (1 + C(d, δ)) amax amin and C(d, δ) = 8d(d Proof. Let k ∈ N, l > 0 and i ∈ {1, 2, . . . , m}. Let S 0 be the union of all the domains τ j that are contained in the closure of B(x i , klh) ∩ D, let S 1 be the union of all the domains τ j that are not contained in the closure of B(x i , (k + 1)lh) ∩ D and let S * = S c 0 ∩ S c 1 ∩ D (be the union of all the remaining elements τ j not contained in S 0 or S 1 ), as illustrated in Figure 1.
We will choose l such that C ≤ 1 e−1 and thus get b k ≤ e 1−k b 0 , which gives the result (5.4). We start from k = 1 because we want to make sure τ i ∈ S 0 ; otherwise, S 0 = ∅ and τ i ∈ S * . Now, we prove that for any k ≥ 1, there exists constant C such that lh . 1 By integration by parts, we obtain Since a 0 and c ≥ 0, the left-hand side gives an upper bound for ψ i H(S1) . Combining ∇η ≡ 0 on S 0 ∪ S 1 and the Cauchy-Schwarz inequality, we obtain We have used c ≥ 0 to get S * ∇ψ i · a∇ψ i 1/2 ≤ ψ i H(S * ) in the last inequality. By the construction of ψ i (2.9), we have D ψ i ϕ j = 0 for i = j. Thanks to (2.11), we have −∇ · (a∇ψ i ) + cψ i ∈ Φ. Therefore, we have S1 ηψ i (−∇ · (a∇ψ i ) + cψ i ) = 0. Denoting η j as the volume average of η over τ j , we have Up to now, I 1 and I 2 are some quantities of ψ i purely on S * , and we only need to prove that both of them can be bounded by ψ i 2 H(S * ) (up to a constant). By applying the Poincare inequality, we can easily do this for I 1 , as we will see soon. However, I 2 involves the high-order term Lψ i L 2 (τj ) which in general may not be bounded by the lower-order term ψ i H(S * ) . Fortunately, this can be proved since Lψ i ∈ Φ, the piecewise constant function space. For the current operator Lu = −∇ · (a(x)∇u) + c(x)u with rough coefficient a and nonzero potential c, . Then, we obtain By the construction of ψ i (2.9), we have τj ψ i = 0 for all τ j ∈ S * . By the Poincare inequality, we have ψ i L 2 (τj ) ≤ ∇ψ i L 2 (τj ) h/π, and then we obtain With the iterative argument given before, we have proved the exponential decay.
Remark 5.1. We point out that boundary conditions may be important in several applications. For example, the Robin boundary condition is useful in the application of the SPCA. The periodic boundary condition is useful in compressing a Hamiltonian with a periodic boundary condition in quantum physics.
The above proof can be applied to the operator L in (5.1) with other boundary conditions as long as the corresponding problem Lu = f has a unique solution u ∈ H k (D) for every f ∈ L 2 (D). For other homogeneous boundary condition, the Cameron-Martin space is not H 1 0 (D). Instead, we should use the solution space associated with the corresponding boundary condition. The proof of Theorem 5.1 can be easily carried over to other homogeneous boundary conditions, and the only difference is that a different boundary condition leads to slightly different integration by parts in (5.5). For the homogeneous Neumann boundary condition or the periodic boundary condition, the proof is exactly the same because the integration by parts (5.5) can be carried out in exactly the same way. For the problems with the Robin boundary condition, i.e., where α(x) ≥ 0, the Cameron-Martin space is the subspace of H 1 (D) in which all elements satisfy the Robin boundary condition and the associated energy norm is defined as In this case, for a subdomain S ⊂ D, the local energy norm on S should be modified as follows: Similarly, we can define the Cameron-Martin space and the associated energy norm for the homogeneous mixed boundary conditions.
6. Exponential decay of basis functions: the higher-order case In this section, we will study the case when K : L 2 (D) → L 2 (D) is the solution operator of the following higher-order elliptic equation: Here, we only consider the case when L (thus K) is self-adjoint, i.e., The corresponding symmetric bilinear form on H k 0 (D) is denoted as We assume that B is an inner product on H k 0 (D) and the induced norm (B(u, u)) 1/2 is equivalent to the H k 0 (D) norm, i.e., there exists 0 < a min ≤ a max such that . Thanks to the Riesz representation lemma, Eqn. (6.1) has a unique weak solution in H k 0 (D) for f ∈ L 2 (D).
6.1. Construction of basis functions and the approximation rate. Suppose D is divided into elements {τ i } 1≤i≤m , where each element τ i is a triangle or a quadrilateral in 2D, or a tetrahedron or hexahedron in 3D. Denote the maximum element diameter by h. We also assume that the subdivision is regular [6]. This means that if h i denotes the diameter of τ i and ρ i denotes the maximum diameter of a ball inscribed in τ i , there is a constant δ > 0 such that Applying Theorem 3.1 to Ω = τ j , for any u ∈ H k (D) and any 0 ≤ p ≤ k, we have are affine equivalent to an equilateral triangle or square in 2D, or a equilateral 3-simplex or cubic in 3D. Therefore, for any u ∈ H k (D), any 1 ≤ i ≤ m and any 0 ≤ p ≤ k, we have Specifically for p = 0, u ∈ L 2 (D) with u| τi = Π i u, we conclude that Let X = L 2 (D) and H = H k 0 (D). We use the standard inner product for L 2 (D) and use the inner product u, v = B(u, v) for H. Further, we denote K : L 2 (D) → L 2 (D) as the operator mapping f to the solution u in Eqn. (6.1). Let {ϕ i,q } Q q=1 be an orthogonal basis of P k−1 (τ i ) with respect to the inner product in L 2 (τ i ), where Q = k+d−1 d is the number of d-variate monomials with degree at most k − 1. We take Without loss of generality, we normalize these basis functions such that A set of basis functions of Ψ is defined by Eqn. (2.9) accordingly, i.e., Combining Eqn. (6.4) and (6.6), we have Applying Theorem 2.1 with X and H defined above, we have 1. For any u ∈ H and Lu = f , we have Here, C p plays the role of the Poincare constant 1/π. 2. For any u ∈ H and Lu = f , we have 3. We have Notice that the eigenvalues of the operator L (with the homogeneous Dirichlet boundary conditions) in (6.1) grow like λ n (L) ∼ n 2k/d (see, e.g., [33,7]), and thus, the eigenvalues of K decay like λ n (K) ∼ n −2k/d . Meanwhile, the rank of the operator P Therefore, our construction of the m-dimensional subspace Ψ approximates K at the optimal rate. In Subsection 6.2, we introduce the concept of strong ellipticity that enables us to prove exponential decay results. In Subsection 6.4, we will prove that the basis functions ψ i,q defined in Eqn. (6.9) have exponential decay away from τ i .
6.2. The strong ellipticity condition. In our proof, we need the following strong ellipticity condition of the operator L to obtain the exponential decay. where ζ σ and ζ γ are the σ'th and γ'th entry of ζ, respectively. One can check that k+d−1 k is exactly the number of all possible kth derivatives, i.e., #{D σ u : |σ| = k}.
For a 2kth-order partial differential operator Lu = (−1) k |α|≤2k a α D α u, L is strongly elliptic if there exists a strongly elliptic operator in the divergence form L such that Lu = Lu for all u ∈ C 2k (D).
Remark 6.1. For a 2kth-order partial differential operator Lu = (−1) k |α|≤2k a α D α u, its divergence form may not be unique. It is possible that it has two divergence forms, and one does not satisfy the strong ellipticity condition (6.1) while the other does. For example, the biharmonic operator L = ∆ 2 in two space dimensions have the following two different divergence forms: when {D σ u : |σ| = 2} is ordered as (∂ 2 x1 , ∂ 2 x2 , ∂ x1 ∂ x2 ). Obviously, the first one does not satisfy the strong ellipticity condition (6.1) while the second one does. These two divergence forms correspond to two bilinear forms on H 2 0 (D): The strong ellipticity condition guarantees that for any local subdomain S ⊂ D, the seminorm | · | k,2,S can be controlled by the local energy norm · H(S) .
for all 0 ≤ |σ|, |γ| ≤ k and that for any x ∈ D • L is nonnegative, i.e., • L is bounded, i.e., there exist θ 0,max ≥ 0 and θ k,max > 0 such that • and L is strongly elliptic, i.e., there exists θ k,min > 0 such that For any subdomain S ⊂ D and any ψ ∈ H k (D), define Then, the following two claims hold true.
Proof. The first point can be obtained directly from the definition of strong ellipticity. In the following, we provide the proof of the second point. For S stated in the second point and any ψ ∈ H k (D), we have k,max |ψ| k,2,S ψ k−1,2,S . Thanks to the polynomial approximation property, for any τ i ∈ S and 1 ≤ q ≤ Q, we have Combining Eqn. (6.28) and (6.29), for h 2 (1−h 2k ) Combining Eqn. (6.25), (6.26), (6.27) and (6.30), we prove the second point.
Remark 6.2. When L contains lower-order terms but there is no crossing term between D σ u (|σ| = k) and D σ u (|σ| < k), i.e., J 3 = 0, we can directly get the same bound in Eqn. (6.23) for all h > 0.
The strong ellipticity condition above is different from the standard uniformly elliptic condition (see Definition 9.2 in [43]), i.e., a linear partial differential operator Lu = (−1) k |α|≤2k a α D α u is uniformly elliptic if there exists a constant θ k,min > 0 such that On one hand, it is obvious that a strongly elliptic operator with smooth coefficients is uniformly elliptic, by taking ζ σ := ξ σ in Eqn. (6.15). On the other hand, the relation between the uniform ellipticity and the strong ellipticity turns out to be closely related to the relation between nonnegative polynomials and sum-of-square (SOS) polynomials. In fact, the strongly ellipticity condition (6.15) is equivalent to that there exists θ k,min > 0 such that |σ|=|γ|=k a σγ (x)ξ σ ξ γ − θ k,min |σ|=k |ξ| 2k = Sum-Of-Squares (SOS) polynomials.
Using the famous Hilbert's theorem (1888) on nonnegative polynomials and SOS polynomials, we have the following theorem. Readers can find the proof and more discussions in [48]. Theorem 6.2. Let a α ∈ C |α|−k (D) for k < |α| ≤ 2k, a α ∈ C(D) for |α| ≤ k, and Lu = (−1) k |α|≤2k a α D α u for all u ∈ C 2k (D). Then, in the following two cases, if L is uniformly elliptic it is also strongly elliptic.
For the case (d, k) = (3, 2), i.e., fourth-order partial differential operators in 3-dimensional physical domain, all uniformly elliptic operators with constant coefficients are also strongly elliptic.
For the case (d, k) = (3, 2), we are not able to prove that strong ellipticity is equivalent to uniform ellipticity for elliptic operators with smooth and multiscale coefficients, but we suspect that it is true. For all other cases, there are uniformly but not strongly elliptic operators. Fortunately, for small physical dimensions d and differential orders k, strongly elliptic operators approximate uniformly elliptic operators well and counter examples are difficult to construct. 6.3. Exponential decay of basis functions I. In this subsection, we prove the exponential decay of basis functions constructed in Eqn. (6.9) for higher-order elliptic operators that contain only the highest order terms. We will leave the proof for the general operators to the next subsection. The proof follows exactly the same structure as that in the second-order elliptic case. • L is bounded, i.e., there exist nonnegative θ k,max such that • and L is strongly elliptic, i.e., there exists θ k,min > 0 such that Then, for any 1 ≤ i ≤ m and 1 ≤ q ≤ Q, it holds true that θ k,min . Here, C 1 and C η only depends on k and d, C p is the constant in Eqn. (6.5) and C(k, d, δ) := C(k, k, d, δ) from Lemma 3.1.
Proof. The proof follows the same structure as that of Theorem 5.1 and [37] (Thm. 3.9). Let k ∈ N, l > 0 and i ∈ {1, 2, . . . , m}. Let S 0 be the union of all the domains τ j that are contained in the closure of B(x i , klh)∩D, let S 1 be the union of all the domains τ j that are not contained in the closure of B(x i , (k + 1)lh) ∩ D and let S * = S c 0 ∩ S c 1 ∩ D (be the union of all the remaining elements τ j not contained in S 0 or S 1 ). In the following, we will prove that for any k ≥ 1, there exists constant C such that ψ i,q 2 H(S1) ≤ C ψ i,q 2 H(S * ) . Then, the same recursive argument in the proof of Theorem 5.1 can be used to prove the exponential decay.
Let η(x) be a smooth function which satisfies (1)
6.4. Exponential decay of basis functions II. The following theorem gives the exponential decay property of ψ i,q for an operator L with lower-order terms. Similar to the proof of Theorem 6.4, we need the polynomial approximation property (6.5) and the Friedrichs' inequality (3.4) to bound the lower-order terms, and we get an extra factor of 2 in our error bound.
for all 0 ≤ |σ|, |γ| ≤ k and that for any x ∈ D • L is nonnegative, i.e., • L is bounded, i.e., there exist θ 0,max ≥ 0 and θ k,max > 0 such that • and L is strongly elliptic, i.e., there exists θ k,min > 0 such that Then, there exists h 0 > 0 such that for any h ≤ h 0 , 1 ≤ i ≤ m and 1 ≤ q ≤ Q, it holds true that θ k,min . Here, C 1 and C η depend on k and d only, C p is the constant given in Eqn. (6.5), C(k, d, δ) := C(k, k, d, δ) is given in Lemma 4.1 and θ k,max := max(θ 0,max , θ k,max ). The constant h 0 can be taken as where C f is the constant in the Friedrichs' inequality (3.4).
Proof. The proof follows the same structure as the proof of Theorem 6.3. All we need to do is to use the polynomial approximation property (6.5) and the Friedrichs' inequality (3.4) to bound the lower-order terms when they appear. First, the I 1 in Eqn. (6.35) contains all the lower-order terms and its estimation should be modified as follows: Here, θ k,max := max(θ 0,max , θ k,max ). We have used the Cauchy-Schwarz inequality and the bound (6.45) in Eqn. (6.49). We will defer the proof of the last step in Eqn. (6.50) to the Appendix. Since ψ i,q ⊥ P k−1 locally in L 2 , we obtain from Theorem 3.1 that |ψ i,q | s−s ,2,S * ≤ C p h s |ψ i,q | s,2,S * ∀ 0 ≤ s ≤ s ≤ k. Therefore, we have If we compare the above estimate with Eqn. (6.40), we conclude that Eqn. (6.52) contains all the lower-order terms. We will use the polynomial approximation property (6.5) and take h 2 −h 2k 1−h 2 ≤ 1/C 2 p to guarantee that Eqn. (6.53) is valid. When L contains lower-order terms, by Lemma 6.2, we have Lψ i,q L 2 (τj ) ≤ 2θ k,max C(k, d, δ)h −k ψ i,q H(τj ) for any h > 0 satisfying h 2 (1−h 2k ) . Therefore, using Eqn. (6.42) we get (6.54) . Finally, we need to use Eqn. (6.24) instead of Eqn. (6.23) to bound |ψ i,q | k,2,S * . We get where we have imposed another condition on h, i.e., h 2 (1−h 2k ) θ k,min , we prove the exponential decay. Remark 6.3. As we have pointed out in Remark 6.2, when L contains lower-order terms but there is no crossing term between D σ u (|σ| = k) and D σ u (|σ| < k), Eqn. (6.23) can be used to bound |ψ i,q | k,2,S * . In this case, the constraint on l is and the h 0 can be taken as

6.5.
Lemmas. In this subsection, we will prove the following lemma, which is used in the proof of Theorem 6.3 and Theorem 6.4. Lemma 6.2. L is defined in Eqn. (6.1) and the space Ψ is defined as above. Assume that for any x ∈ D .
If the operator L contains only the highest order terms, i.e., Lu = (−1) k |σ|=|γ|=k D σ (a σγ D γ u), we have We will use Lemma 4.1 to prove this result, but we need to deal with the variable coefficients a σγ and the lower-order terms a σγ with |σ| + |γ| < 2k before we can apply Lemma 4.1. Our strategy is to transfer the variable coefficients to constant ones by the variational formulation (see Lemma 6.3), and to use the polynomial approximation property to deal with the lower-order terms; see Lemma 6.4. For this purpose, we first introduce the following two lemmas. .
then for all f ∈ L 2 (Ω), Proof. Let f ∈ L 2 (Ω). Let ψ L and ψ M be the weak solutions of Lψ L = f and Mψ M = f with the homogeneous Dirichlet boundary conditions on ∂Ω. Observe that ψ L and ψ M are the unique minimizers of I L (u, f ) and I M (u, f ) with (6.60) At the minima ψ L and ψ M , we have Observe that (6.62) where the first inequality is true because ψ L is the minimizer of I L , and the second inequality is true because I L (u, f ) ≤ I M (u, f ) for any u ∈ H k 0 (Ω). Combining Eqn. (6.61) and (6.62), we obtain Ω ψ M f ≤ Ω ψ L f . This proves the lemma. h→0 Proof. Let ψ h be the solution of Lψ h = f with the homogeneous Dirichlet boundary conditions on ∂Ω h and ψ h,0 be the solution of L 0 ψ h,0 = f with the homogeneous Dirichlet boundary conditions on ∂Ω h . Let At the minima ψ h and ψ h,0 , we have Note that Eqn. (6.65) implies that I L0 (ψ h,0 , f ) < 0. By the definition of Green's function, we further have y)f (x)f (y)dx dy ≤ 1 for any h > 0. Applying the Friedrich's inequality (3.4) Here, we have used Eqn. (6.66) in the last equality. Therefore, we have where we have used I L (ψ h , f ) ≤ I L (ψ h,0 , f ) in the first inequality. By using the above upper bound, we prove the lemma. Now, we are ready to prove Lemma 6.2.
Proof of Lemma 6.2.
Due to the construction of ϕ j,q , we have . We denote G j as the Green's function of the operator L with the homogeneous Dirichlet boundary condition on τ j , then Thanks to Lemma 6.3, we have where G * j is the Green's function of the operator (−1) k |σ|=k D 2σ u + θ k,max θ0,max |σ|<k (−1) σ D 2σ u with the homogeneous Dirichlet boundary condition on ∂τ j . Thanks to Lemma 6.4, for all h > 0 such that where G * j,0 is the Green's function of the operator (−1) k |σ|=k D 2σ u with the homogeneous Dirichlet boundary condition on ∂τ j . Denote v 1,0 as the solution of (−1) k |σ|=k D 2σ v 1,0 = g j on τ j with the homogeneous Dirichlet boundary condition, i.e., v 1,0 (x) = τj G * j,0 (x, y)g j (y)dy. Since g j ∈ P k−1 in τ j in this case, Lemma 4.1 shows that Combining Eqn. (6.68), (6.69) and (6.70), we have Therefore, we have proved Lemma 6.2. We point out that when the operator L contains only the highest order terms, i.e., Lu = (−1) k |σ|=|γ|=k D σ (a σγ D γ u), we don't need to pay a factor of 2 in Eqn. (6.69), and thus, g j 2 L 2 (τj ) ≤ (C(k, k, d, δ)) 2 h −2k θ k,max v 2 H(τj ) for all h > 0 in this special case.
Let L −1 0 f ∈ H k 0 (τ i ) be the unique weak solution of the following elliptic equation with the homogeneous Dirichlet boundary condition . We define M 0 , A 0 ∈ R Q×Q as follows: Let λ max (M 0 , A 0 ) be the largest generalized eigenvalue of the eigenvalue problem M 0 α = λA 0 α, which can be written as .
The proof of Lemma 6.2 also implies that If the operator L contains only the highest order terms, we have

Localization of the basis functions
Theorem 5.1 or Theorem 6.4 allows us to localize the construction of basis functions ψ i,q as follows. For r > 0, let S r be the union of the subdomains τ j that intersect with B(x i , r) (recall that B(x i , δh i /2) ⊂ τ i ) and let ψ loc i,q be the minimizer of the following quadratic problem: ψ loc i,q = arg min (7.1) We will naturally identify ψ loc i,q with its extension to H k 0 (D) by setting ψ loc i,q = 0 outside of S r . If the elliptic operator L is given with some other homogeneous boundary condition, the localized problem (7.1) should be slightly modified as follows such that the basis function ψ i,q honors the given boundary condition on ∂D: When ∂S r ∩ ∂D = ∅, Eqn. (7.2) is equivalent to Eqn. (7.1). However, when ∂S r ∩ ∂D = ∅, Eqn. (7.2) only enforces the zero Dirichlet boundary condition on ∂S r \∂D, but honors the original boundary condition on ∂D.
From now on, to simplify the expression of constants, we will assume without loss of generality that the domain is rescaled so that diam(D) ≤ 1. , it holds true that If the operator L contains only the highest order terms, it holds true that ψ loc i,q H ≤ C(k, d, δ) Proof. Consider is the inverse of A 0 (defined in Eqn. (6.74)) and L −1 0 ϕ i,q is the weak solution of the local problem (6.71) with right-hand side ϕ i,q . From the definition of A 0 , we know that τi ϕ i,q ζ i,q = δ q,q . Notice that ζ i,q ∈ H k 0 ⊂ H k 0 (S r ). Therefore, ζ i,q satisfies all constraints of ψ loc i,q (see Eqn. (7.1)), and thus, Making use of (L −1 We have used M 0 (q, q ) = |τ i |δ i,j (due to the normalization (6.8)) in the last inequality. Combining Eqn. (6.75) (or (6.74)), (7.4) and (7.5) and |τ i | ≥ V d (δh/2) d , we complete the proof of Eqn. (7.3).
Theorem 7.1. Under the same assumptions as those in Theorem 6.4, there exists h 0 > 0 such that for any h ≤ h 0 , 1 ≤ i ≤ m and 1 ≤ q ≤ Q, it holds true that Here, all the parameters are the same as those in Theorem 6.4. When the operator L contains only the highest order terms, i.e., Lu = (−1) k |σ|=|γ|=k D σ (a σγ D γ u), Eqn. (7.6) holds true for all h > 0. In this case, the constant C 3 can be taken as Proof. Let S 0 be the union of the subdomains τ j that are not contained in S r and let S 1 be the union of the subdomains τ j that are at distance at least h from S 0 . (We will assume that S 0 = ∅ and S 1 = ∅. If S 0 = ∅, the proof is trivial. We can choose r ≥ 2h such that S 1 = ∅.) Let S * be the union of the subdomains τ j that are not contained in either S 0 or S 1 , as illustrated in Figure 2. Note that in this case, we have S 1 in the inner region and S 0 in the outer region. This is the opposite of the scenario that we consider in Figure 1. Let η be a smooth cut-off function such that 0 ≤ η ≤ 1, η| S1 ≡ 1, η| S0 ≡ 0 and D σ η L ∞ (D) ≤ Cη h |σ| for all σ. Since ψ loc i,q satisfies the same constraints as those in the definition of ψ i,q , thanks to Eqn. (2.12) we have Define ψ i,r j,q as the (unique) minimizer of the following quadratic optimization: ψ i,r j,q := arg min Sr ψϕ j ,q = δ jq,j q ∀1 ≤ j ≤ m, ∀1 ≤ q ≤ Q. Note that ψ loc i,q = ψ i,r i,q . Let w jq = D ηψ i,q ϕ j,q and ψ iq,r w = m j=1 Q q =1 w jq ψ i,r j,q . Thanks to the orthogonality between ψ i,q and ϕ j,q , i.e., the constraints in Eqn. (6.9), we have Using (3) of Theorem 2.2, we have (ψ loc i,q , ψ i,r j,q ) H = Θ i,−1 iq,jq , where Θ i is defined by Eqn. (2.10) with K : L 2 (S r ) → L 2 (S r ) being the inverse of L with the homogeneous Dirichlet boundary condition on ∂S r . Therefore, we have (7.9) ψ iq,r By (2) of Theorem 2.2, we know that ψ iq,r w is the minimizer of the following quadratic problem: (7.10) Noting that ηψ i,q satisfies the same constraint, we have ψ iq,r w 2 H . By using this estimate with (7.7) and (7.9), we obtain .
It turns out that I 1 and I 2 play almost the same role as I 1 and I 2 did in the proof of Theorem 6.4 and can be estimated in a similar way. We will estimate these two terms as follows.
Let's first deal with I 1 . Since η| S1 ≡ 1 and η| S0 ≡ 0, we have H(S * ) . In Appendix B.2, we give a bound for ηψ i,q H(S * ) using a similar technique that we used to obtain Eqn. (6.53) from Eqn. (6.48) in the proof of Theorem 6.4. With this bound, we obtain (7.12) where C 3 = C 1 C η C p 2kθ k,max . With the strong ellipticity (6.46) and the bound (6.24), we conclude (7.13) Applying the exponential decay of Theorem 6.4 to ψ i,q H(S * ) , we get (7.14) We now estimate I 2 . Combining (3) of Theorem 2.2 with the definition of H-norm (2.1), we have Thanks to Lψ loc i,q | τj ∈ span{ϕ j,q } Q q=1 and the orthogonality between Φ and ψ i,r j,q , we have Since {ϕ j,q } Q q=1 is orthogonal and normalized such that ϕ j,q ϕ j,q = |τ j |δ q,q , we get Moreover, we obtain w jq = D ηψ i,q ϕ j,q by definition, and thus we get Here, we have made use of 0 ≤ η ≤ 1 in the last step. Combining (7.15) and (7.16), we get Now, we arrive at exactly the same situation as I 2 (see (6.41)) in the proof of Theorem 6.3. With the same derivation from Eqn. (6.41) to Eqn. (6.42), i.e., applying Lemma 6.2 to Lψ loc i,q L 2 (τj ) and Theorem 3.1 to ψ i,q L 2 (τj ) , we obtain where we have used θ k,max := max(θ 0,max , θ k,max ), the strong ellipticity (6.46) and the bound (6.24) in the last step. Applying the exponential decay of Theorem 6.4 to both ψ i,q H(S * ) and ψ loc i,q H(S * ) , we obtain (7.18) Combining Eqn. (7.11), (7.14) and (7.18), and using Eqn. (7.3) to bound ψ loc i,q H(D) and ψ i,q H(D) (recall ψ i,q H(D) ≤ ψ loc i,q H(D) ), we complete the proof of Eqn. (7.6).
When the operator L contains only the highest order terms, i.e., Lu = (−1) k |σ|=|γ|=k D σ (a σγ D γ u), Eqn. (7.14) and (7.18) hold true for all h > 0. In this case, we can get rid of the factor "2" in both Eqn. (7.14) and (7.18). Therefore, we obtain the estimate on C 3 stated in the theorem.
Theorem 7.2. Let u ∈ H k 0 (D) be the weak solution of Lu = f and ψ loc i,q be the localized basis functions defined in Eqn. (7.1). Then, for r ≥ (d + 4k)lh log(1/h) + 2(1 + l log C 4 )h, we have where C 4 = C3Ce Cp (Qa min ) 1/2 , and C 3 is defined in Theorem 7.1, a min comes from the norm-equivalence (6.4), and C e is the constant such that u L 2 (D) ≤ C e f L 2 (D) holds true.
Using the Cauchy inequality, we have Thanks to the orthogonality of {ϕ i,q } Q q=1 (6.8), we have Using the energy estimation u L 2 (D) ≤ C e f L 2 (D) and Theorem 7.1, we obtain Combining Eqn. (7.20) and (7.21) together, we conclude the proof.
By applying the Aubin-Nistche duality argument, we can get the following corollary.
Corollary 7.3. Let ψ loc i,q be the localized basis functions defined in Eqn. (7.1). Then, for r ≥ (d+4k)lh log(1/h)+ 2(1 + l log C 4 )h, we have where all the constants are the same as those defined in Theorem 7.2.
Corollary 7.3 shows that we can compress the symmetric positive semidefinite operator K with the optimal rate h 2k and with the nearly optimal localized basis (with support size of order h log(1/h)).
Remark 7.1. All the results and proofs presented above can be carried over to other homogeneous boundary conditions. Given a specific homogeneous boundary condition, one only needs to modify the proof of Lemma 7.1. Specifically, when the patch τ i intersects with the boundary of D, the constructed function ζ i,q should honor the same boundary condition on ∂D. The scaling argument in the proof of Lemma 7.1 still works for other homogeneous boundary conditions.

Numerical Examples
In this section, we present several numerical results to support the theoretical findings and to show how the sparse operator compression is utilized in higher-order elliptic operators. In Section 8.1, we apply our method to compress the Matérn covariance function (8.1) with ν = 1/2. We show that our method is able to achieve the optimal compression error with nearly optimally localized basis functions, which means that we are able to get optimality on both ends of the accuracy-sparsity trade-off in the sparse PCA. In Section 8.2, we apply our method to a 1D fourth-order elliptic equation with the homogeneous Dirichlet boundary condition and show that our basis functions, when used as multiscale finite element basis, can achieve the optimal h 2 convergence rate in the energy norm. In Section 8.3, we apply our method to a 2D fourth-order elliptic equation and show that the energy-minimizing basis functions decays exponentially fast away from its associated patch.
8.1. The compression of a Matérn covariance kernel. In spatial statistics, geostatistics, machine learning and image analysis, the Matérn covariance [32] is used to model random fields with smooth samples; see, e.g., [44,18,15]. The Matérn covariance between two points x, y ∈ D ⊂ R d is given by where Γ is the gamma function, K ν is the modified Bessel function of the second kind, and ρ and ν are nonnegative parameters of the covariance. Its Fourier transform is given by where f (ω) is the Fourier transform of f . For both sampling from the random fields and performing basic computations like marginalization and conditioning, we need to compress the Matérn covariance operator K : L 2 (D) → L 2 (D), which is defined through the Hilbert-Schmidt operator with kernel K ν (x, y), by a rank-n covariance operator: where Ψ = [ψ 1 , . . . , ψ n ] spans the range space of the approximate operator Ψ K n Ψ T . Recent study [28,4] shows that the Matérn covariance and the elliptic operators are closely connected. With proper homogeneous boundary conditions, the Matérn covariance operator with ν +d/2 being an integer is the solution operator of an elliptic operator of order 2ν +d. For example, the Matérn covariance operator with ν = 1/2 is the solution operator of a second-order elliptic operator (2lσ 2 ) −1 1 − ρ 2 d 2 dx 2 when the physical dimension d = 1, and is the solution operator of a fourth-order elliptic operator (8πρ 3 σ 2 ) −1 1 − 2ρ 2 ∆ + ρ 4 ∆ 2 when d = 3. Based on Eqn. (2.10) and (2.11), we can also compute the exponentially decaying basis functions from the covariance operator K. In this example, we apply our method to compress the following exponential kernel which is exactly the Matérn covariance (8.1) with ν = 1/2, σ = 1 and ρ = 1. This problem has been studied by different groups; see, e.g., [14,9,21,3]. We remark that since the Matérn covariance function corresponds to the solution operator of an elliptic PDE with constant coefficient, one can compress the Matérn covariance kernel by using a piecewise linear polynomial or wavelets with optimal locality and accuracy. It is not necessary to use the exponential decaying basis to perform the operator compression. We use this example to illustrate that our method can be also applied to compress a general kernel function.
We partition the interval [0, 1] uniformly into m = 2 6 patches and follow our strategy to construct basis functions. By the Fourier transform, we know that it is associated with the second-order elliptic operator 1 2 1 − d 2 dx 2 . Therefore, we take Φ as piecewise constant functions and then compute Ψ by Eqn. (2.10) and (2.11). In Figure 3, we plot ϕ 32 and ψ 32 , which is associated with the patch [1/2 − h, 1/2]. We can see that the basis function ψ 32 clearly has an exponential decay. We take m = 2 i for 0 ≤ i ≤ 7 and compute the compression error E(Ψ ; K). The result is shown in Figure 4. We can see that the exponentially decaying basis functions Ψ have nearly the same compression rate as that of the eigendecomposition.   One can easily verify that the exponential kernel (8.4) is the Green's function of the following second-order elliptic equation with boundary condition u(0) − u (0) = 0, u(1) + u (1) = 0. The associated energy norm is Solving the localized variational problem (7.2), we can get localized basis functions Ψ loc . With different sizes of the support S r , we compute the compression error E(Ψ loc ; K) for m = 2 i (0 ≤ i ≤ 7). The results are summarized in Figure 5. In the left subfigure of Figure 5, we take the support with size Ch, for C = 3, 5, 7, 9 and 11. In the right subfigure of Figure 5, we take the support with size Ch log 2 (1/h), for C = 2, 2.1 and 2.4. For a support of size Ch log 2 (1/h), it contains C log 2 (1/h) patches, where C log 2 (1/h) is the smallest integer of C log 2 (1/h). We can see that the oversampling strategy with r = ch does not give the optimal convergence rate , while the oversampling strategy with r = ch log 2 (1/h) gives the optimal second-order convergence rate as guaranteed by Corollary 7.3. For m = 2 7 and r = 2.4h log 2 (1/h), the constructed localized basis functions achieves the same operator compression error as that using 128 eignefunctions. k −α (ζ 1k sin(kx) + ζ 2k cos(kx)) , where {ζ 1k } K k=1 and {ζ 2k } K k=1 are two independent random vectors with independent entries uniformly distributed in [−1/2, 1/2]. This oscillatory coefficient is also used in [23,34,40], and has no scale separation. We choose α = 0 and K = 40 in the numerical experiment. A sample coefficient is shown in Figure 6. We partition the physical space [0, 1] uniformly into m = 2 6 patches, where the ith patch I i = [(i − 1)h, ih] with h = 1/m. In this fourth-order case, our theory requires the piecewise polynomial space Φ be the space of (discontinuous) piecewise linear functions, which has dimension n = 2m. We have two ϕ's, denoted as ϕ i,1 and ϕ i,2 , associated with the patch I i . Solving the quadratic optimization problem (6.9), we obtain the exponentially decaying basis functions. We also have two ψ's, denoted as ψ i,1 and ψ i,2 , associated with the patch I i . We plot ϕ i,1 and ϕ i,2 associated with the patch I 32 = [1/2−h, 1/2] in Figure 7 A. In Figure 7(B-C), we plot the basis functions ψ 32,1 and ψ 32,2 , which clearly show exponential decay.
To demonstrate the necessity for Ψ to contain all piecewise linear functions, in the third column of Figure 7, we also plot the basis functions associated the patch I 32 when Φ is the space of piecewise constant functions. In this case, we have only one ϕ, denoted as ϕ i , associated with the patch I i . In the third column of Figure 7(A) and (B), we plot ϕ 32 and ψ 32 . Solving the quadratic optimization problem (6.9), we obtain only one basis function ψ, denoted as ψ i , associated with the patch I i . In Figure 7(C), we plot the basis function ψ 32 in the third column. Note that ψ 32 also shows an exponential decay, but its decay rate is much smaller than that of ψ 32,1 and ψ 32,2 .
We have sampled a force f ∈ L 2 (D) from the same model (8.8) as the flexural rigidity. Using the MsFEM, we use two different sets of basis functions {ψ i,q } m,2 i=1,q=1 and {ψ i } m i=1 to solve the corresponding fourth-order elliptic equation (8.7), and get solutions u h,1 and u h,0 respectively. We show their errors in the energy norm, i.e., u h,1 − u H and u h,0 − u H in Figure 8. We can see that u h,1 − u H decays quadratically with respect to the patch size h, while u h,0 − u H decays only linearly. Therefore, to obtain the optimal convergence rate h 2 in the energy norm, it is necessary to include all the piecewise linear functions in the space Φ, as we have proved in Theorem 2.1 and Eqn. (6.11).
x (a 20 (x, y)∂ 2 x u(x, y)) + ∂ 2 y (a 02 (x, y)∂ 2 y u(x, y)) + 2∂ xy (a 11 (x, y)∂ xy u(x, y)) = f (x, y), u ∈ H 2 0 (D), (8.9) which describes the vibration u of a clamped plate subject to a transverse force f ∈ L 2 (D). The coefficients in the operator are given by a 20 (x, y) = a 02 (x, y) = k −α (ζ 1k sin(kx) + ζ 2k cos(ky)) , Based on the uniform partition with grid size h x = h y = 1 8 , we construct the piecewise linear function space Φ, which has dimension n = 3m = 192. We solve the quadratic optimization problem (6.9) with the weighted extended B-splines (Web-splines [20]) of degree 3 on the uniform refined grid with grid size h x,f = h y,f = 1 32 . The 2D Gaussian quadrature with 5 points on each axis is utilized to compute the integral on each fine grid cell. The three basis functions associated with the patch [1/2 − h x , 1/2] × [1/2 − h y , 1/2] are shown in Figure 9. We also show them in the log-scale in Figure 10. We can clearly see that the basis functions decay exponentially fast away from its associated patch, which validates our Theorem 6.3. We point out that the stiffness matrix for the fourth-order elliptic operator (8.9) becomes ill-conditioned very quickly when we refine the grid size. A carefully designed numerical strategy is required to validate the optimal convergence rate. We will leave this to our future work.

Concluding Remarks
In this paper, we have developed a general strategy to compress a class of self-adjoint higher-order elliptic operators by minimizing the energy norm of the localized basis functions. These energy-minimizing localized basis functions are obtained by solving decoupled local quadratic optimization problems with linear constraints, and they give optimal approximation property of the solution operator. For a self-adjoint, bounded and strongly elliptic operator of order 2k (k ≥ 1), we have proved that with support size O(h log(1/h)), our localized basis functions can be used to compress higher-order elliptic operators with the optimal compression rate O(h 2k ). We have applied our new operator compression strategy in different applications. For elliptic equations with rough coefficients, our localized basis functions can be used as multiscale basis functions, which gives the optimal convergence rate O(h k ) in the energy norm. In the application of the sparse PCA, our localized basis functions achieve nearly optimal sparsity and the optimal approximation rate simultaneously when the covariance operator to be compressed is the solution operator of an elliptic operator. We remark that a number of Matérn covariance kernels are related to the Green's functions of some elliptic operators.
There are several directions we can explore in the future work. First of all, the constants in both the compression error and the localization depend on the contrast of the coefficients, which makes the existing methods inefficient for coefficients with high contrast. Other methods (e.g., [17,29,36,37]) also suffer from the same limitation. Our sparse operator compression framework can be used to deal with this high contrast case, and we will report our findings in our upcoming paper. Secondly, in the application of the sparse PCA, our current construction requires the knowledge of the underlying elliptic operator L. We believe that it is possible to construct these localized basis functions using only the covariance function. Moreover, given any covariance operator, which may not be the solution operator of an elliptic operator, we can still define the Cameron-Martin space and the corresponding energy-minimizing basis functions. We are interested in the localization and compression properties of these energy-minimizing basis functions in this general setting. Our preliminary results show that the energy-minimizing basis functions still enjoy fast decay rate away from its associated patch, although the exponential decay may not hold true any more. Thirdly, it is interesting to apply our framework to the graph Laplacians, which can be viewed as discretized elliptic operators. Along this direction, we would like to develop an algorithm with nearly linear complexity to solve linear systems with graph Laplacians. Finally, we are also interested in applying our method to construct localized Wannier functions and to compress the Hamiltonian in quantum chemistry. Unlike the second-order elliptic operators with multiscale diffusion coefficients, all multiscale features of the Hamiltonian H = −∆ + V (x) lie in its potential V (x). Some adaptive domain partition strategy may prove to be useful in this application.
Proof. We begin by expressing the following integral as a sum of two terms: Since ψ i,q ⊥ P k−1 locally in L 2 , from Eqn. (6.5) we have |ψ i,q | s−s ,2,S * ≤ C p h s |ψ i,q | s,2,S * .