Kernel-independent adaptive construction of H^2-matrix approximations

A method for the kernel-independent construction of H^2-matrix approximations to non-local operators is proposed. Special attention is paid to the adaptive construction of nested bases. As a side result, new error estimates for the adaptive cross approximation (ACA) are presented which have implications for the pivoting strategy of ACA.


Introduction
The fast multipole method introduced by Greengard and Rokhlin (see [24,13]) has become a very popular method for the efficient evaluation of long-range potentials and forces in the n-body problem. In a SIAM News article [12] it was named one of the top 10 algorithms of the 20th century. While the initial publications investigated two-dimensional electrostatic problems, later publications [14,11] improved the method so that three-dimensional electrostatic problems and also problems with a more general physical background can be treated efficiently. All these variants rely on explicit kernel expansions, which on the one hand allows the expansion to be tailored tightly to the respective problem, but on the other hand requires a separate analytic apparatus, including a-priori error estimates, for each kernel. In order to overcome this technical difficulty, kernel-independent generalizations [25] were introduced. While the latter keep the analytic point of view, H- and H^2-matrices (see [15,16,18]) generalize the method as far as possible from an algebraic perspective. In addition to the n-body problem, these methods can be applied to general elliptic boundary value problems in either their differential or their integral formulation; see [6,17]. Furthermore, approximate versions of the usual matrix operations such as addition, multiplication, and inversion can be carried out with logarithmic-linear complexity, which allows preconditioners to be constructed in a fairly automatic way.
Nevertheless, H^2-matrix approximations cannot be constructed without taking the analytic background into account. For instance, the construction of suitable cluster bases is a crucial task. In order to keep the method as universal as possible, polynomial spaces are frequently used; see [9]. While this choice is quite convenient due to special properties of polynomials, it is usually not the most efficient approach. To see why, keep in mind that the three-dimensional approach based on spherical harmonics [11] requires k = O(p^2) terms in a truncated expansion with precision of order p, while achieving the same order of precision with polynomials requires k = O(p^3) terms.
The number of terms k required to achieve a prescribed accuracy is crucial for the overall efficiency of the method. In addition to its dependence on the kernel, this number also depends on the underlying geometry (local patches of the geometry may have a smaller dimension). Additionally, a-priori error estimates usually lead to an overestimation of k. It is therefore helpful to find k in an automatic way, i.e. by an adaptive procedure. Such a method has been introduced by one of the authors. The adaptive cross approximation (ACA) [5] computes low-rank approximations of suitable sub-blocks using only a few of the original matrix entries. From the algorithmic point of view this procedure is similar to a rank-revealing LU factorization. Therefore, it is kernel-independent. In addition, it provably achieves asymptotically optimal convergence rates.
The aim of this article is to generalize the adaptive cross approximation method, which was introduced for H-matrices, to the kernel-independent construction of H^2-matrices for matrices A ∈ R^{M×N} with entries of the form

a_ij = ∫_Ω ∫_Ω K(x, y) ϕ_i(x) ψ_j(y) dy dx,  i = 1, …, M,  j = 1, …, N.  (1)

Here, ϕ_i and ψ_j denote locally supported ansatz and test functions. The kernel function K is of the type

K(x, y) = ξ(x) ζ(y) f(x, y)  (2)

with a singular function f(x, y) = |x − y|^{−α} and functions ξ and ζ each depending on only one of the variables x and y. Such matrices result, for instance, from a Galerkin discretization of integral operators. In particular, this includes the single layer potential operator K(x, y) = |x − y|^{−1} and the double layer potential operator of the Laplacian in R^3, for which K(x, y) = (x − y)·n_y / |x − y|^3 = x·n_y / |x − y|^3 − y·n_y / |x − y|^3. Note that collocation and Nyström methods can also be included by formally choosing ϕ_i = δ_{x_i} or ψ_j = δ_{x_j}, where δ_x denotes the Dirac distribution centered at x. In contrast to H-matrices, for which the method is applied to individual blocks, in the case of H^2-matrices cluster bases have to be constructed. If this is to be done adaptively, special properties of the kernel have to be exploited in order to guarantee that the error is controlled also outside of the cluster. Our approach relies on the harmonicity of the singular part f of the kernel function K. This article also presents a-priori error estimates which are based on interpolation by radial basis functions. The advantage of these new results is that they pave the way to a new pivoting strategy for ACA. While results based on polynomial interpolation error estimates require that the pivots be chosen such that unisolvency of the polynomial interpolation problem is guaranteed, the new estimates show that only the fill distance of the pivoting points is crucial for the convergence of ACA.
The article is organized as follows. In Sect. 2 we construct interpolants s_k to kernels f which are harmonic with respect to one variable. The system of functions in which the interpolant is constructed is defined from restrictions of f. This construction guarantees that the harmonicity of f is inherited by its interpolation error. Hence, in order to achieve a prescribed accuracy in the exterior of a domain, it is sufficient to check it on the boundary. This allows s_k to be constructed in a kernel-independent and adaptive way. The interpolant s_k is then used to construct a quadrature rule which will be used in the construction of nested bases. Sect. 2.1 presents error estimates for functions e^{−γ|x|} based on radial basis functions. These results are used in Sect. 2.2 to derive exponential error estimates (via exponential sum approximation) for s_k when interpolating f(x, y) = |x − y|^{−α} for arbitrary α > 0. The goal of Sect. 3 is the construction of uniform H- and H^2-matrix approximations to matrices (1) using the harmonic interpolants s_k. In Sect. 4 we apply the new method to boundary integral formulations of Poisson boundary value problems and to fractional diffusion problems and present numerical results which validate the presented method.

Harmonic interpolants and quadrature rules
For the construction of H^2-matrix approximations (see Sect. 3), quadrature rules for the computation of integrals ∫_X f(x, y) dx will be required which depend only on the domain of integration X ⊂ R^d and which are valid in the whole far-field of X, i.e. for y ∈ F_η(X), where

F_η(X) := {y ∈ R^d : η dist(y, X) ≥ diam X}

with given η > 0. Such quadrature formulas are usually based on polynomial interpolation together with a-priori error estimates. The aim of this section is to introduce new adaptive quadrature formulas which are controlled by a-posteriori error estimates. In the special situation that f(x, ·), x ∈ X, is harmonic in the exterior of X and vanishes at infinity, it is possible to control the quadrature error for y ∈ F_η(X) also computationally. (Applying the following arguments in R^{d+2}, one can also treat the case α = d for arbitrary d ∈ N. Fractional exponents, which appear for instance in the case of the fractional Laplacian, will be treated in a forthcoming article.) Harmonic functions u : Ω → R in an unbounded domain Ω ⊂ R^d are known to satisfy the mean value property for balls B_r(x) ⊂ Ω and the maximum principle provided u vanishes at infinity. Let Σ ⊂ R^d be an unbounded domain satisfying condition (3); see Figure 1, which depicts Σ and the far-fields F_{2η}(X) and F_η(X). A natural choice is Σ = F_η(X). Since our aim is to check the actual accuracy and we cannot afford to inspect it on an infinite set, we introduce a finite set M ⊂ ∂Σ close to ∂Σ, i.e., we assume that M satisfies

dist(y, M) ≤ δ,  y ∈ ∂Σ.  (4)
In [6] we already used the following recursive definition for the construction of an interpolating function s_k in the convergence analysis of the adaptive cross approximation [5]. Let r_0 = f and for k = 0, 1, 2, … assume that r_k has already been defined. Let the pivot x_{k+1} ∈ X be chosen according to condition (5); then set

r_{k+1}(x, y) := r_k(x, y) − r_k(x, y_{k+1}) r_k(x_{k+1}, y) / r_k(x_{k+1}, y_{k+1})  (6)

and s_{k+1} := f − r_{k+1}, where y_{k+1} ∈ M denotes a maximizer of |r_k(x_{k+1}, ·)| in M. It can be shown (see [6]) that s_k interpolates f at the chosen nodes x_i, i = 1, …, k, for all y ∈ F_η(X), i.e.,

s_k(x_i, y) = f(x_i, y),  i = 1, …, k,

and belongs to F_k := span{f(·, y_1), …, f(·, y_k)}. In addition, the choice of (x_k, y_k) ∈ X × M guarantees unisolvency, i.e. that the matrix C_k ∈ R^{k×k} with the entries (C_k)_ij = f(x_i, y_j), i, j = 1, …, k, is invertible. Hence, one can define the vector L_k ∈ R^k of Lagrange functions (7) for the system F_k and the nodes x_1, …, x_k; it admits a representation (8) in terms of C_k. Due to the uniqueness of the interpolation, s_k has the representation

s_k(x, y) = Σ_{i=1}^k (L_k(x))_i f(x_i, y).

For an adaptive procedure it remains to control the interpolation error f − s_k = r_k in X × F_η(X). The following property follows from (6) via induction.
Lemma 1. If f(x, ·) is harmonic in X^c and vanishes at infinity for all x ∈ X, then so do s_k(x, ·) and r_k(x, ·).
The following lemma shows that although M ⊂ ∂Σ is a finite set, it can be used to find an upper bound on the maximum of |r_k(x, ·)| in the unbounded domain F_η(X).

Lemma 2. Let the assumptions of Lemma 1 be valid and let δ in (4) be sufficiently small relative to dist(∂Σ, X). Then for all x ∈ X

max_{y∈F_η(X)} |r_k(x, y)| ≤ 2 max_{y'∈M} |r_k(x, y')|.
Proof. Let x ∈ X and y ∈ ∂Σ. We distinguish whether r_k(x, ·) changes sign in a neighborhood of y. If it does not, our aim is to find y' ∈ M such that |r_k(x, y)| ≤ 2 |r_k(x, y')|. In this case r_k does not change its sign and is harmonic in B_{qδ}(y) due to B_{qδ}(y) ⊂ X^c, which follows from (3). Due to assumption (4) we can find y' ∈ B_δ(y) ∩ M. Then B_{(q−2)δ}(y) ⊂ B_{(q−1)δ}(y') ⊂ B_{qδ}(y). Hence, the mean value property (applied to r_k if r_k is positive or to −r_k if r_k is negative) yields |r_k(x, y)| ≤ 2 |r_k(x, y')|. Since r_k vanishes at infinity, (3) together with the maximum principle shows that the maximum of |r_k(x, ·)| over F_η(X) is attained on ∂Σ, which proves the assertion.

Notice that due to (8) the interpolation error can also be bounded in terms of the Lebesgue function Λ_k(x). Although it seems that Λ_k(x) ∼ k in practice, there is no proof of this observation up to now. A related topic in interpolation theory is Leja points; see [19].
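The principle behind Lemma 2 can be illustrated numerically. The following sketch (our own toy setup, not the paper's experiment) takes u(y) = 1/|y − x0| with x0 inside the unit cube X, which is harmonic outside X and vanishes at infinity, and checks that its maximum over random far-field samples is controlled by its values on a finite sample M of the boundary sphere |y| = 3.

```python
import numpy as np

rng = np.random.default_rng(1)
x0 = np.array([0.1, -0.2, 0.3])          # point inside X (the unit cube)

def u(y):
    return 1.0 / np.linalg.norm(y - x0)  # harmonic in X^c, -> 0 at infinity

# finite boundary sample M on the sphere |y| = 3 (standing in for a mesh of dSigma)
M = rng.normal(size=(2000, 3))
M = 3.0 * M / np.linalg.norm(M, axis=1, keepdims=True)

# random far-field samples with |y| >= 4
F = rng.normal(size=(2000, 3))
F = F / np.linalg.norm(F, axis=1, keepdims=True) * rng.uniform(4.0, 20.0, size=(2000, 1))

max_bnd = max(u(y) for y in M)
max_far = max(u(y) for y in F)
# maximum principle: the values in the exterior region are controlled
# by the values on (a fine enough sample of) the boundary sphere
assert max_far <= max_bnd
```

This is only a plausibility check for one harmonic function; Lemma 2 makes the statement quantitative for the remainders r_k via the mean value property.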
To see that this special kind of interpolation is more efficient than polynomial interpolation, we present the following example.

Example 1. Let X ⊂ R^3 consist of 1000 points forming a uniform mesh of the unit cube centered at the origin. We choose Σ = {x ∈ R^3 : |x| > 3}; M is a discretization of ∂Σ with 768 points. We consider f(x, y) = |x − y|^{−1} and compare the quality of s_k with the quality of the interpolating tensor Chebyshev polynomial with k terms. Table 1 shows the maximum pointwise error measured at X and at three times as many points as M has.

k                       | 1       | 8       | 27     | 64      | 125
Cross approximation     | 3.28e-1 | 5.90e-2 | 5.8e-3 | 2.22e-4 | 1.12e-5
Chebyshev interpolation | 4.55e-1 | 8.73e-2 | 2.18e-2 | 5.72e-3 | 2.10e-3

Table 1: Approximation error of s_k and the tensor Chebyshev interpolant with k terms.

Exponential error estimates for multivariate interpolation
For analyzing the error of the cross approximation, the remainder r_k has to be estimated. The proof in [6] establishes a connection between r_k and the best approximation in an arbitrary system Ξ = {ξ_1, …, ξ_k} of functions. There, qualitative estimates are presented for a polynomial system Ξ. For the uniqueness of polynomial interpolation it has to be assumed that the Vandermonde matrix [ξ_j(x_i)]_ij ∈ R^{k×k} is non-singular. The goal of this section is to provide new error estimates for the convergence of cross approximation which avoid the unisolvency assumption by employing radial basis function interpolation. Furthermore, we will be able to state a rule for choosing the next pivot x_k (in addition to (5)) which leads to fast convergence rates.
Let κ : R^d → R be a continuous function. In the following we assume that κ is positive definite, i.e. that for any finite set of pairwise distinct points x_1, …, x_n the matrix [κ(x_i − x_j)]_ij is symmetric positive definite. Following [22] we define C_κ as the set of continuous functions f satisfying

|∫_{R^d} f(x) ϕ(x) dx| ≤ c ( ∫_{R^d} ∫_{R^d} κ(x − y) ϕ(x) ϕ(y) dx dy )^{1/2}  (9)

for some constant c > 0 and all ϕ ∈ C_0^∞(R^d). The smallest such constant c defines a norm ||f||_κ, and C_κ is a Hilbert space.
Given a set X_k := {x_1, …, x_k} ⊂ X consisting of k ∈ N nodes x_j, an interpolant p ∈ span{κ(· − x_j), j = 1, …, k} has to fulfill the conditions

p(x_i) = f(x_i),  i = 1, …, k.

A solution of this interpolation problem can be written in its Lagrangian form

p(x) = Σ_{j=1}^k f(x_j) L_j^κ(x),

where the L_j^κ denote the Lagrange functions satisfying L_j^κ(x_i) = δ_ij, i.e. their coefficient vectors α^{(i)} ∈ R^k are defined as the solutions of the linear systems of equations Aα^{(i)} = e_i with A := [κ(x_i − x_j)]_ij ∈ R^{k×k}. The error between a function f ∈ C_κ and its interpolant p is typically measured in terms of the fill distance

h_{X_k,X} := sup_{x∈X} dist(x, X_k).
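The interpolation problem and the fill distance can be sketched as follows. This is a generic dense implementation under our own naming, using the Gaussian kernel of Lemma 3 in the usage example; a practical code would exploit structure in the kernel matrix.

```python
import numpy as np

def rbf_interpolant(kappa, Xk, fvals):
    """Kernel interpolant p(x) = sum_j c_j kappa(x - x_j) with p(x_i) = f(x_i).
    The coefficients solve the positive definite system A c = f with
    A = [kappa(x_i - x_j)]_ij, cf. the Lagrange systems A alpha^(i) = e_i."""
    A = np.array([[kappa(xi - xj) for xj in Xk] for xi in Xk])
    c = np.linalg.solve(A, np.asarray(fvals, float))
    return lambda x: float(sum(cj * kappa(x - xj) for cj, xj in zip(c, Xk)))

def fill_distance(X, Xk):
    """h_{Xk,X} = sup_{x in X} dist(x, Xk), evaluated over a discrete sample X."""
    return max(min(np.linalg.norm(x - xj) for xj in Xk) for x in X)
```

For example, with κ(z) = exp(−β|z|^2) and a 3×3 grid of nodes in the unit square, `rbf_interpolant` reproduces f at the nodes up to the conditioning of A, and `fill_distance` of the node grid against a finer sample grid equals √2/4.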
The following result is proved in [22].
Theorem 1. Let X ⊂ R^d be a cube with side length b_0 and assume that the measure µ associated with κ satisfies (10) for some ρ > 0. Then there is 0 < λ < 1 such that for all f ∈ C_κ the corresponding interpolant p satisfies

||f − p||_{L^∞(X)} ≤ λ^{1/h_{X_k,X}} ||f||_κ.

Remark. The assumption that X is a cube can be generalized. Theorem 1 remains valid as long as X can be expressed as the union of rotations and translations of a fixed cube with side length b_0. In particular, any ball in R^d and any set X with sufficiently smooth boundary fulfills the requirements.
Elements f ∈ C_κ can be characterized (see [20,21]) by the existence of a representing function g ∈ L^2_µ, where µ is the measure associated with κ. For later purposes we prove the following lemma.

Lemma 3. Let κ(x) = exp(−β|x|^2) with β > 0. Then κ is positive definite and the measure µ associated with κ satisfies (10). Furthermore, h(x) = exp(−γ|x|) with γ > 0 belongs to C_κ.

Application to |x − y|^{−α}
We consider functions f of the form f(x, y) = |x − y|^{−α} with α > 0 on well-separated sets X and Y. The validity of the latter condition will result from a partitioning of the computational domain Ω × Ω induced by a hierarchical partitioning of the matrix (1). Let κ(x, y) = exp(−β|x − y|^2). For fixed y ∈ Y we interpolate f with this radial basis function on the data set X_k = {x_1, …, x_k}, i.e.

p_y(x) = Σ_{j=1}^k f(x_j, y) L_j^κ(x),  (13)

where L_j^κ, j = 1, …, k, are the Lagrange functions for κ and X_k.
Functions of type f are not covered by Theorem 1. Therefore, we additionally employ exponential sum approximations

g_r(t) := Σ_{j=1}^r ω_j exp(−γ_j t)

of g(t) := t^{−α} with finite r on the interval [1, R] in order to approximate f. According to [10], there are coefficients ω_j, γ_j > 0 such that the error sup_{t∈[1,R]} |g(t) − g_r(t)| decays exponentially with respect to r.
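One standard way to obtain such exponential sums, sketched below under our own naming, is to apply the trapezoidal rule to the integral representation t^{−α} = 1/Γ(α) ∫_R exp(αs − t e^s) ds (the Gamma-function integral after the substitution λ = e^s). The optimized coefficients of [10] achieve a given accuracy with far fewer terms; this quadrature-based construction is only meant to show that accurate sums with ω_j, γ_j > 0 exist and are easy to produce.

```python
import numpy as np
from math import gamma

def exponential_sum(alpha, h=0.2, n=150):
    """Coefficients (w_j, g_j) with t^{-alpha} ~ sum_j w_j exp(-g_j t) on [1, R],
    from the trapezoidal rule applied to
    t^{-alpha} = 1/Gamma(alpha) * int_R exp(alpha*s - t*exp(s)) ds
    at the nodes s_j = j*h, |j| <= n. All weights are positive."""
    s = h * np.arange(-n, n + 1)
    w = h * np.exp(alpha * s) / gamma(alpha)
    g = np.exp(s)
    return w, g

def eval_sum(w, g, t):
    """Evaluate sum_j w_j exp(-g_j t) for scalar or array t."""
    return np.sum(w * np.exp(-np.multiply.outer(t, g)), axis=-1)
```

With step h = 0.2 and 301 terms this reproduces t^{−α} on [1, 10] to relative accuracy better than 1e-5 for moderate α; the trapezoidal rule converges exponentially in 1/h for this doubly exponentially decaying integrand, and the truncation of the s-range controls the remaining error.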
According to Theorem 1 and Lemma 3, the functions h_{j,y}(x) := exp(−γ_j |x − y|) can be interpolated using the radial basis function κ on the data set X_k = {x_1, …, x_k}.
The convergence can be controlled by choosing the node x k+1 such that the fill distance h X k+1 ,X is minimized from step k to step k + 1. This minimization problem can be solved efficiently, i.e. with logarithmic-linear complexity, with the approximate nearest neighbor search described in [1,2,3]. Since we can expect that the fill distance behaves like h X k ,X ∼ k −1/d , Lemma 4 shows exponential convergence of p y with respect to k provided the Lebesgue constant grows sub-exponentially.
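The pivot rule just described amounts to farthest-point selection. The sketch below (our own dense O(kn) version; the paper uses an approximate nearest neighbor search [1,2,3] to reach log-linear complexity) greedily adds the point of X farthest from the already chosen nodes, which is exactly the choice that minimizes the fill distance h_{X_{k+1},X} in step k+1.

```python
import numpy as np

def farthest_point_pivots(X, k):
    """Greedy pivot selection: each new node is a point of X farthest from
    the current node set X_k, greedily reducing the fill distance h_{Xk,X}.
    Returns the chosen indices and the resulting fill distance."""
    X = np.asarray(X, float)
    idx = [0]                                  # start from an arbitrary point
    d = np.linalg.norm(X - X[0], axis=1)       # distances to the node set
    for _ in range(k - 1):
        j = int(np.argmax(d))                  # farthest point = next pivot
        idx.append(j)
        d = np.minimum(d, np.linalg.norm(X - X[j], axis=1))
    return idx, float(d.max())                 # d.max() = fill distance over X
```

The returned fill distance is non-increasing in k, consistent with the expectation h_{X_k,X} ∼ k^{−1/d} for quasi-uniform selections.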
Applying the results of the previous lemma to the remainder r k , we obtain the following result for interpolating f on X × Y .
Theorem 2. For y ∈ Y let p_y denote the radial basis function interpolant (13) of f_y := f(·, y) = |· − y|^{−α}. Choose y_1, …, y_k ∈ Y such that |det C_k| is maximal up to a constant factor c_M > 1 among the admissible choices of nodes from M. Then it holds that

max_{y∈Y} |r_k(x, y)| ≤ (1 + c_M k) max_{y∈Y} ||f_y − p_y||_{L^∞(X)},  x ∈ X.

Proof. Let the vector of the Lagrange functions L_i^κ, i = 1, …, k, corresponding to the radial basis function κ and the nodes x_1, …, x_k be given by L^κ(x) := [L_1^κ(x), …, L_k^κ(x)]^T. Using (8) and expanding by Cramer's rule, each entry of the vector L_k(x) of Lagrange functions from (7) is a quotient of determinants which, by the pivoting condition, is bounded by c_M. The assertion follows from the triangle inequality and Lemma 4.
Remark. Choosing the nodes y_1, …, y_k according to a stepwise (greedy) maximization condition, which is much easier to check in practice, leads to an analogous estimate with a constant that grows exponentially in k; for details see [6].

Construction of H^2-matrix approximations
The aim of this section is to construct hierarchical matrix approximations to the matrix A defined in (1). To this end, we first partition the set of indices I × J, I = {1, …, M} and J = {1, …, N}, into sub-blocks t × s, t ⊂ I and s ⊂ J, such that the associated supports X_t := ∪_{i∈t} supp ϕ_i and Y_s := ∪_{j∈s} supp ψ_j satisfy the admissibility condition

max{diam X_t, diam Y_s} ≤ η dist(X_t, Y_s),  (14)

i.e. Y_s ⊂ F_η(X_t) and X_t ⊂ F_η(Y_s). Notice that from Sect. 2.2 we know that the singular part f of the kernel function K in (1) can be approximated on the pair X_t × Y_s. The usual way of constructing such partitions is based on cluster trees; see [16,6]. A cluster tree T_I for the index set I is a binary tree with root I, where each t ∈ T_I and its nonempty set of successors S_I(t) = {t', t''} ⊂ T_I (if it exists) satisfy t = t' ∪ t'' and t' ∩ t'' = ∅. We refer to L(T_I) = {t ∈ T_I : S_I(t) = ∅} as the leaves of T_I, and we denote by the level of t its distance from the root in T_I. Once the cluster trees T_I, T_J for the index sets I and J have been computed, a partition P of I × J can be constructed from them. A block cluster tree T_{I×J} is a quad-tree with root I × J satisfying conditions analogous to those of a cluster tree. It can be constructed from the cluster trees T_I and T_J in the following way. Starting from the root I × J ∈ T_{I×J}, let the sons of a block t × s ∈ T_{I×J} be S_{I×J}(t, s) := ∅ if t × s satisfies (14) or min{|t|, |s|} ≤ n^H_min with a given constant n^H_min > 0. In the remaining case, we set S_{I×J}(t, s) := S_I(t) × S_J(s). The set of leaves of T_{I×J} defines a partition P of I × J and its cardinality |P| is of the order min{|I|, |J|}; see [6]. As usual, we partition P into admissible and non-admissible blocks, P = P_adm ∪ P_nonadm, where each t × s ∈ P_adm satisfies (14) and each t × s ∈ P_nonadm is small, i.e. satisfies min{|t|, |s|} ≤ n^H_min.
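The construction of the cluster tree and of the block partition can be sketched as follows. This is a generic geometric version under our own naming: clusters are split at the median along the longest bounding-box axis (one common choice; the paper's construction may differ), and admissibility is tested with bounding-box diameters and distances.

```python
import numpy as np

def cluster_tree(idx, pts, nmin=8):
    """Binary cluster tree: split along the longest bounding-box axis at the
    median until at most nmin indices remain (leaf clusters)."""
    if len(idx) <= nmin:
        return {"idx": idx, "sons": []}
    ax = int(np.argmax(pts[idx].max(0) - pts[idx].min(0)))
    order = idx[np.argsort(pts[idx, ax])]
    mid = len(order) // 2
    return {"idx": idx,
            "sons": [cluster_tree(order[:mid], pts, nmin),
                     cluster_tree(order[mid:], pts, nmin)]}

def admissible(t, s, pts, eta):
    """Admissibility (14): max{diam X_t, diam Y_s} <= eta * dist(X_t, Y_s),
    evaluated here via axis-aligned bounding boxes."""
    a, b = pts[t["idx"]], pts[s["idx"]]
    diam = max(np.linalg.norm(a.max(0) - a.min(0)),
               np.linalg.norm(b.max(0) - b.min(0)))
    gap = np.linalg.norm(np.maximum(0.0, np.maximum(a.min(0) - b.max(0),
                                                    b.min(0) - a.max(0))))
    return diam <= eta * gap

def partition(t, s, pts, eta, P):
    """Block cluster tree: descend until a block is admissible or small."""
    if admissible(t, s, pts, eta):
        P.append((t, s, "adm"))
    elif not t["sons"] or not s["sons"]:
        P.append((t, s, "nonadm"))          # a leaf cluster: |t| or |s| <= nmin
    else:
        for tt in t["sons"]:
            for ss in s["sons"]:
                partition(tt, ss, pts, eta, P)
    return P
```

The leaves of this recursion exactly cover I × J, with admissible blocks destined for low-rank treatment and the remaining small blocks stored densely.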
In order to approximate the matrix (1) more efficiently, we employ uniform H-matrices; see [15].
Definition 3. Let Φ = {Φ(t), t ∈ T_I} and Ψ = {Ψ(s), s ∈ T_J} be cluster bases. A matrix A is called a uniform H-matrix if each admissible block admits a factorization A|_{ts} ≈ Φ(t) F(t, s) Ψ(s)^T with a coupling matrix F(t, s) ∈ R^{k_t×k_s}.

The storage required for the coupling matrices F(t, s) is of the order k min{|I|, |J|} if for the sake of simplicity it is assumed that k_t ≤ k for all t ∈ T_I. Additionally, it is not useful to choose k_t > |t|. The cluster bases Φ and Ψ require k[|I| L(T_I) + |J| L(T_J)] units of storage, where here L(T_I) and L(T_J) denote the depths of the respective cluster trees; see [18].
In the following we employ the method from Sect. 2 to construct a uniform H-matrix approximation to an arbitrary block t × s ∈ P_adm of the matrix (1). Let ε > 0 be given. For each cluster t, the adaptive procedure of Sect. 2 applied to f on X_t × F_η(X_t) yields interpolation nodes and the vector of Lagrange functions defined in (7); here τ_t and σ_s denote index sets of cardinality k. From Theorem 2 we know that the resulting interpolation error on X_t × Y_s is of order ε. For x ∈ X_t and y ∈ Y_s this yields the dual interpolation with corresponding interpolation error and Lebesgue constant Λ^t_k ≥ 1. We define the rank-k approximation A|_{ts} ≈ Φ(t) F(t, s) Ψ(s)^T through the matrices

Φ(t) := [∫_Ω ϕ_i(x) ξ(x) L_µ(x) dx]_{i∈t, µ∈τ_t},  Ψ(s) := [∫_Ω ψ_j(y) ζ(y) L_ν(y) dy]_{j∈s, ν∈σ_s},

where ξ and ζ are the functions defined in (2). Notice that both matrices are associated only with t and s, respectively, and can be precomputed independently of each other.
Only the coupling matrix F(t, s) depends on both clusters t and s.

Remark. Since the vector of Lagrange functions satisfies the representation (8), the matrices Φ(t) ∈ R^{t×τ_t} can be found by solving a linear system with the matrix C_k. With ||ϕ_i||_{L^1} = 1 = ||ψ_j||_{L^1}, the Cauchy–Schwarz inequality yields a bound on the blockwise approximation error in terms of ε. Notice that the computation of the double integral for a single entry of the Galerkin matrix (1) is replaced with two single integrals in (18).

Nested bases
In order to reduce the amount of storage needed for the bases Φ and Ψ, one can establish a recursive relation among the basis vectors. The corresponding structure is that of H^2-matrices; see [18,8]. This sub-structure of H-matrices is even mandatory if logarithmic-linear complexity is to be achieved for high-frequency Helmholtz problems. To this end, directional H^2-matrices have been introduced in [7].
For estimating the complexity of storing a nested cluster basis U, notice that the set of leaf clusters L(T_I) constitutes a partition of I and for each leaf cluster t ∈ L(T_I) at most k|t| entries have to be stored. Hence, Σ_{t∈L(T_I)} k|t| = k|I| units of storage are required for the leaf matrices U(t), t ∈ L(T_I). The storage required for the transfer matrices is of the order k|I|, too; see [18].
Hence, the total storage required for an H^2-matrix is of the order k(|I| + |J|).
Remark. It may be advantageous to consider nested bases only for clusters t having a minimal cardinality n^{H^2}_min ≥ n^H_min. Blocks consisting of smaller clusters are treated with H-matrices.

We define the matrices U(t) ∈ R^{t×k_t}, t ∈ T_I, by the following recursion. If t ∈ T_I \ L(T_I), then the set of sons S_I(t) = {t', t''} is non-empty and we define

U(t) := [U(t') T_{t'}; U(t'') T_{t''}]

with transfer matrices T_{t'} ∈ R^{k_{t'}×k_t} and T_{t''} ∈ R^{k_{t''}×k_t}. Similarly, we define matrices V(s) ∈ R^{s×k_s}, s ∈ T_J, using transfer matrices T_{s'}, T_{s''}. The entries of the transfer matrices are obtained from evaluating the Lagrange functions of a cluster in the nodes of its sons, which can be estimated using (15) and (16) due to x_i ∈ X_t ⊂ F_η(Y_s) and y_j ∈ Y_s ⊂ F_η(X_t). Thus, the blockwise error can be bounded in terms of the levels of t and s, the maximum of the two levels entering the estimate. If both t and s are leaves, then A|_{ts} is approximated as in (19). From (20) the same kind of estimate follows if t or s is a leaf, because then U(t) = Φ(t) or V(s) = Ψ(s).
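The point of the recursion U(t) = [U(t') T_{t'}; U(t'') T_{t''}] is that products with U(t)^T never require forming U(t) for non-leaf clusters: U(t)^T x|_t = T_{t'}^T (U(t')^T x|_{t'}) + T_{t''}^T (U(t'')^T x|_{t''}). The sketch below (our own ad-hoc dict-based tree, not the paper's data structure) implements this upward pass.

```python
import numpy as np

def forward_transform(node, x):
    """Upward pass of an H^2-matrix: compute x_t = U(t)^T x|_t for every
    cluster t using only leaf matrices "U" and transfer matrices "T"
    (one k_{t'} x k_t block per son), via
    U(t)^T x|_t = sum over sons t' of T_{t'}^T (U(t')^T x|_{t'})."""
    out = {}
    def up(t):
        if not t["sons"]:
            xt = t["U"].T @ x[t["idx"]]          # leaf: use the stored basis
        else:
            xt = sum(T.T @ up(s) for s, T in zip(t["sons"], t["T"]))
        out[id(t)] = xt
        return xt
    up(node)
    return out
```

For a two-level example, the result coincides with explicitly assembling U(root) = [U(t') T_{t'}; U(t'') T_{t''}] and applying its transpose, while only the small leaf and transfer matrices are stored, which is exactly the source of the k(|I| + |J|) storage bound.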

Numerical results
The focus of the following numerical tests lies on two problems: the first is an exterior boundary value problem for the Laplace equation, the second a fractional diffusion problem. All tests compare the method presented in this article with an H-matrix approximation generated by adaptive cross approximation (ACA); see [6]. All computations were performed on a machine with two Intel Xeon E5-2630 v4 processors. The construction of the matrix approximation was done in parallel using 40 cores.

Exterior boundary value problem
We consider the Dirichlet boundary value problem for the Laplace equation in the exterior of a bounded domain Ω ⊂ R^3,

−∆u = 0 in Ω^c,  γ_0^ext u = g on ∂Ω,  (21)

where γ_0^ext denotes the exterior trace and g the given Dirichlet data in the trace space H^{1/2}(∂Ω) of the Sobolev space H^1(Ω^c). In order to guarantee that the problem is well-posed, we additionally assume suitable decay conditions at infinity.
Using the single and double layer potential operators

(Vψ)(x) := ∫_{∂Ω} S(x, y) ψ(y) ds_y,  (Ku)(x) := ∫_{∂Ω} ∂_{n_y} S(x, y) u(y) ds_y,

where S(x, y) := 1/(4π|x − y|) denotes the fundamental solution of the Laplacian, the solution of (21) is given by the representation formula. The task is to compute the missing Neumann data ψ := γ_1^ext u ∈ H^{−1/2}(∂Ω) from the boundary integral equation (22). The unique solvability of the boundary integral equation (22) or (if the L^2-scalar product is extended to a duality pairing between H^{−1/2}(∂Ω) and H^{1/2}(∂Ω)) of its variational formulation is a consequence of the mapping properties of the single layer potential, the coercivity of the bilinear form (V·, ·)_{L^2(∂Ω)}, and the Riesz–Fischer theorem. A Galerkin approach is used to compute ψ numerically. To this end, let {ψ^0_1, …, ψ^0_N} denote the basis of the space of piecewise constant functions P^0(T) ⊂ H^{−1/2}(∂Ω), where T is a regular partition of ∂Ω into N triangles. If g is replaced by a piecewise linear approximation, we obtain the discrete boundary integral equation Ax = f with A ∈ R^{N×N} and f ∈ R^N, where A has the entries (see (1))

a_ij = ∫_{∂Ω} ∫_{∂Ω} S(x, y) ψ^0_i(x) ψ^0_j(y) ds_y ds_x.

Numerical results
We choose various boundary discretizations of the ellipsoid Ω := {x ∈ R^3 : x_1^2 + x_2^2 + x_3^2/9 < 1} as the computational domain and the Dirichlet data g = |x − 10e_1|^2. We compare H-matrix approximations of A generated via ACA with H^2-matrix approximations obtained from the method introduced in this article. In both cases the same block cluster tree, generated with η = 0.8, is used. The minimum cluster sizes are denoted by n^H_min and n^{H^2}_min, respectively; see the remark after Definition 5. As Table 2 shows, both methods produce almost the same relative error e_h := ||u − u_h||_{L^2(∂Ω)} / ||u||_{L^2(∂Ω)}, but they differ in the time needed for computing the respective approximation of A and in the required amount of storage, which is presented as the compression rate, i.e. the ratio of the amount of storage required for the approximation to the amount of storage of the original matrix. The time for the construction of the matrix approximation decreases the more blocks are approximated with the H^2-matrix method. While for a small number of degrees of freedom N the H-matrix method is faster than the H^2-matrix method, the latter requires nearly 30% less CPU time for the finest discretization. Figures 2 and 3 give deeper insight. Figure 2 shows the matrix A for a coarse discretization approximated as an H-matrix. Green blocks are admissible and were generated by low-rank approximation; the numbers displayed in the blocks show the approximation rank k_H. Red blocks are not admissible and were generated entry by entry. In Figure 3, A was approximated as an H^2-matrix. The meaning of green and red blocks is the same as in Figure 2; the blue blocks were generated using the H^2-approximation. Obviously, there are several additional blocks that could be approximated with the H^2-method. These are, however, omitted due to their size in order to improve the storage requirements.
Table 3 shows the portion of time required for the precomputations and the time for constructing the matrix. For the small examples the time required for the precomputation is relatively high compared to the total time, and there are only a few blocks which are approximated with the H^2-method.

Concerning the amount of storage, the new construction of H^2-matrix approximations is more efficient also for small numbers of degrees of freedom N, as can be seen from Table 4. The larger N becomes, the more efficient the new method is. This cannot be seen directly from the compression rates, which compare the respective approximation with the dense matrix. However, inspecting the actual storage requirements, one can see that the storage benefit actually improves. For the finest discretization more than 30% of the storage (i.e. more than 2.6 GB) is saved.

Fractional Poisson problem
Let Ω ⊂ R^d be a Lipschitz domain, s ∈ (0, 1), and g ∈ H^r(Ω), r > −s. We consider the fractional Poisson problem

(−∆)^s u = g in Ω,  u = 0 in Ω^c,  (23)

where the fractional Laplacian (see [27]) is defined as

(−∆)^s u(x) := C(d, s) p.v. ∫_{R^d} (u(x) − u(y)) / |x − y|^{d+2s} dy

with a normalization constant C(d, s) > 0. Zero-trace spaces H^s_0(Ω) can be defined as the closure of C^∞_0(Ω) with respect to the H^s-norm. Due to the non-local nature of the operator, we need to define the space of test functions

H̃^s(Ω) := {u ∈ H^s(Ω) : ũ ∈ H^s(R^d)},

where ũ denotes the extension of u by zero to R^d. It is known (see [26]) that H̃^s(Ω) = H^s_0(Ω) for s ≠ 1/2, and for s = 1/2 it holds that H̃^{1/2}(Ω) ⊂ H^{1/2}_0(Ω).
The weak formulation of (23) is to find u ∈ H̃^s(Ω) satisfying a(u, v) = (g, v)_{L^2(Ω)} for all v ∈ H̃^s(Ω). Then H̃^s(Ω) can be equipped with the energy norm ||u||_{H̃^s(Ω)} = |ũ|_{H^s(R^d)} = √(a(u, u)).
Let {ϕ_1, …, ϕ_N} denote the basis of the space of piecewise linear functions V(T), where T is a regular partition of Ω into M tetrahedra with N interior vertices. The Galerkin method yields the discrete fractional Poisson problem Ax = f with A ∈ R^{N×N}, f ∈ R^N having the entries a_ij = a(ϕ_j, ϕ_i) and f_i = (g, ϕ_i)_{L^2(Ω)}. If the supports of the basis functions ϕ_i and ϕ_j are disjoint, the computation of the entry a_ij simplifies to

a_ij = −C(d, s) ∫_Ω ∫_Ω ϕ_i(x) ϕ_j(y) |x − y|^{−d−2s} dy dx.

Thus, admissible blocks t × s (which satisfy dist(X_t, X_s) > 0) are of type (1) and can be approximated by the method presented in this article. We remark that the singular part f(x, y) = |x − y|^{−d−2s}, due to its fractional exponent, is not covered by the theory of this article. Nevertheless, the following numerical results show that the method works, and a theory for fractional exponents will be presented in a forthcoming article.
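For disjoint supports the entry is a regular integral and can in principle be evaluated with standard quadrature. The following sketch (our own 1D toy with hat functions and a midpoint rule, the normalization constant C left as a parameter) is only meant to illustrate the reduced form of a_ij; production codes use quadrature adapted to the near-singular regime.

```python
import numpy as np

def a_ij_disjoint(supp_i, supp_j, s, C, q=200):
    """Entry a_ij for 1D hat functions phi_i, phi_j with disjoint supports:
    a_ij = -C * int int phi_i(x) phi_j(y) |x - y|^{-1-2s} dy dx  (d = 1),
    evaluated with a tensor midpoint rule with q points per direction.
    supp_i/supp_j = (left, peak, right) of each hat function."""
    def hat(t, a, b, c):
        return np.maximum(0.0, np.minimum((t - a) / (b - a), (c - t) / (c - b)))
    ai, bi, ci = supp_i
    aj, bj, cj = supp_j
    x = ai + (ci - ai) * (np.arange(q) + 0.5) / q   # midpoints in supp phi_i
    y = aj + (cj - aj) * (np.arange(q) + 0.5) / q   # midpoints in supp phi_j
    wx, wy = (ci - ai) / q, (cj - aj) / q
    X, Y = np.meshgrid(x, y, indexing="ij")
    val = hat(X, ai, bi, ci) * hat(Y, aj, bj, cj) * np.abs(X - Y) ** (-1 - 2 * s)
    return -C * wx * wy * val.sum()
```

The entry is negative (the kernel and both hats are positive) and symmetric in i and j, as expected from the bilinear form.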

Numerical results
The general setup and our approach are the same as in the first example in Sect. 4.1. We compare two types of hierarchical approximations of A using the same block cluster tree generated with η = 0.8. The first one is an H-matrix approximation generated via ACA, and the second one is an H^2-matrix approximation obtained from the method introduced in this article. Due to the Galerkin approach, we choose various volume discretizations of the ellipsoid Ω := {x ∈ R^3 : x_1^2 + x_2^2 + x_3^2/9 < 1} as the computational domain, the right-hand side g ≡ 1, and the order of the fractional Laplacian s = 0.2.
Since no analytical solution is known for this geometry, we cannot directly verify the accuracy of the numerical solution u h . Instead, we test the quality of A H and A H 2 when applying them to a special vector. For this purpose, we take advantage of the fact that the constant functions are in the kernel of the fractional Laplacian. This also applies to the discrete version, the stiffness matrix A.
Hence, in the following we use e_h := ||A1||_2 / √N, 1 = [1, …, 1]^T ∈ R^N, as a measure of the quality of the approximations A_H and A_{H^2}. Table 5 shows the minimum cluster sizes n^H_min and n^{H^2}_min and the corresponding numerical results: the time needed for the respective approximation of A, the compression rate, and the error e_h. As in the first example, the time for the construction of the matrix approximation decreases the more blocks are approximated with the H^2-method. The accompanying figures show the matrix A for the coarsest discretization approximated as an H-matrix and as an H^2-matrix, respectively. As in Figs. 2 and 3, the red blocks were calculated entry by entry, the green and blue blocks are low-rank approximations calculated by ACA and by the new method, respectively, and the number in a low-rank block is the rank k_H or k_{H^2}. Compared to the first example, the ranks k_H and k_{H^2} of corresponding blocks hardly differ. Therefore, n^{H^2}_min can be chosen relatively small even for a large number of degrees of freedom N in order to ensure memory efficiency and to approximate as many blocks as possible with the H^2-method. The reason for the small value of k_{H^2} is that for |x| > 1 the kernel function K(x) = |x|^{−d−2s} is quite easy to approximate due to its decay. For a small number of degrees of freedom N the condition |x| > 1 is almost automatically guaranteed by the admissibility condition of the H^2-blocks. On the other hand, we pay for this in the time it takes to assemble A, because the cost of the singular and near-singular integrals scales with |log h| per dimension; see [26, Chap. 4.2].
Of course not only the CPU time benefits from the small difference between k_H and k_{H^2}, but also the storage requirements, as can be seen from Table 7. For each selected discretization, less storage is required than for the H-matrix approximation. For example, the finest discretization requires 30% less storage (i.e. more than 7.4 GB is saved). In addition, the H^2-approximation becomes more efficient the larger the number of degrees of freedom N becomes, since the precomputations can be exploited for an increasingly larger part of the matrix.