1 Introduction

The concept of samplets has been introduced in [25] by generalizing the wavelet construction from [45] to discrete data sets in Euclidean space. A samplet basis is a multiresolution analysis of discrete signed measures, where stability is a direct consequence of the orthogonality of the basis. Samplets are data-centric and can be constructed such that their measure integrals vanish for all polynomials up to a certain degree. Thanks to this vanishing moment property in ambient space, kernel matrices, as they arise in scattered data approximation, become quasi-sparse in the samplet basis. This means that these kernel matrices are compressible in samplet coordinates, S-compressible for short, and can be replaced by sparse matrices. We call the resulting sparsity pattern the compression pattern. The latter has been characterized in [25, Section 5.3]. Given a quasi-uniform data set of cardinality N, i.e., the distance between neighboring points is uniformly bounded from below and above by constant multiples of \(N^{-1/d}\), where \(d \ge 1\) is the spatial dimension of the data, the S-compressed kernel matrix contains only \({\mathcal {O}}(N\log N)\) relevant entries, even for kernels of low regularity. A similar multiresolution approach in the reproducing kernel Hilbert space context was suggested in [30], while a geometry-oblivious compression based on local degenerate kernel expansions is considered in [51].

In this article, we develop fast arithmetic operations for S-compressed kernel matrices. Fixing the sparsity pattern, we can perform addition and multiplication of kernel matrices with high precision at essentially linear cost. The derived cost bounds assume quasi-uniformity of the data points. Even so, all algorithms can still be applied if the quasi-uniformity assumption does not hold; in this case, however, the established cost bounds may become invalid. Similar approaches for realizing arithmetic operations for nonlocal operators exist by means of hierarchical matrices, see [11, 14, 17, 21, 22], and by means of wavelets, see [6, 7, 42].

We prove that the inverses of regularized kernel matrices are compressible with respect to the original compression pattern. We can thus employ the selected inversion algorithm proposed in [36] to efficiently approximate the inverse. Our concrete implementation is based on a supernodal left-looking \(LDL^\intercal\)-factorization of the underlying matrix, which is available in the sparse direct solver Pardiso, see [31, 40]. The selected inversion computes, in the absence of rounding errors, the exact matrix inverse of the S-compressed matrix on its matrix pattern. Likewise, matrix addition and matrix multiplication are performed exactly on the prescribed compression pattern. This means that the relevant matrix coefficients are computed exactly when adding, multiplying, and inverting S-compressed kernel matrices. The only error introduced is the matrix compression error arising from the restriction to the compression pattern.

Having fast formatted matrix addition and fast matrix inversion at hand enables the fast approximate evaluation of holomorphic functions of S-compressed matrices via contour integrals, and hence the computation of more complicated matrix functions. This has been envisioned in [7] (“We conjecture and provide numerical evidence that functions of operators inherit this property”) and suggested in [23]. In the present paper we prove, using the multiresolution kernel matrix algebra under consideration, that, up to (exponentially small) contour quadrature errors, these contour integrals are computed exactly on the prescribed pattern. This is in contrast to previously proposed formats, such as hierarchical matrices, see [17].

Many applications require only the computation of a subset of the entries of a given matrix inverse. Important examples are sparse inverse covariance matrix estimation in \(\ell ^1\)-regularized Gaussian maximum likelihood estimation, see, e.g., [9, 29], or integrated nested Laplace approximations for approximate Bayesian inference, see, e.g., [48] and the references therein. Other examples of computing a subset of the inverse are electronic structure calculations of materials utilizing multipole expansions, where only the diagonal and, occasionally, sub-diagonals of the discrete Green’s function are required to determine the electron density [33, 35].

We provide a rigorous theoretical underpinning of the algorithms under consideration by means of pseudodifferential calculus [28, 46]. To this end, we focus on kernels of reproducing kernel Hilbert spaces and assume that the associated integral operators correspond, via the Schwartz kernel theorem, to classical, elliptic pseudodifferential operators from the Hörmander class \(S^m_{1,0}\), cp. [28]. A prominent example of such kernels is the Matérn class of kernels, see [38], also called Sobolev splines [16]. The latter are known to generate the Sobolev spaces of positive order and correspond to fractional powers of the shifted Laplacian. We prove that such pseudodifferential operators are S-compressible, meaning that for their numerical representation, only the coefficients in the associated compression pattern need to be computed. Admissible classes comprise in particular the smooth Hörmander class \(S^m_{1,0}\), but also considerably larger kernel classes of finite smoothness, which admit Calderón-Zygmund estimates and an appropriate operator calculus, see, e.g., [1, 47]. The corresponding operator calculus implies that sums, compositions, powers, and holomorphic functions of self-adjoint, elliptic pseudodifferential operators yield again pseudodifferential operators. As a consequence, the respective operations on kernel matrices in samplet coordinates result again in compressible matrices.

The rest of this article is structured as follows. In Sect. 2, we introduce the scattered data framework under consideration and recall the relevant theory of reproducing kernel Hilbert spaces. The construction of samplets and the samplet matrix compression from [25] are summarized in Sect. 3. The main contribution of this article is Sect. 4, where we develop and analyze arithmetic operations for compressed kernel matrices in samplet coordinates. In Sect. 5, we perform numerical experiments in order to qualify and quantify the matrix algebra. Beyond benchmarking experiments, we consider the computation of an implicit surface from scattered data using Gaussian process learning. Finally, the required details from the theory of pseudodifferential operators, especially the associated calculus, are collected in Appendix A.

Throughout this article, in order to avoid the repeated use of generic but unspecified constants, by \(C\lesssim D\) we indicate that C can be bounded by a multiple of D, independently of any parameters on which C and D may depend. Moreover, \(C\gtrsim D\) is defined as \(D\lesssim C\), and \(C\sim D\) as \(C\lesssim D\) and \(D\lesssim C\).

2 Reproducing kernel Hilbert spaces

Let \(({\mathcal {H}},\langle \cdot ,\cdot \rangle _{\mathcal {H}})\) be a Hilbert space of functions \(h:\Omega \rightarrow {\mathbb {R}}\) with dual space \({\mathcal {H}}'\). Herein, \(\Omega \subset {\mathbb {R}}^d\) is a given bounded domain or a lower-dimensional manifold. Furthermore, let \(\kappa \) be a symmetric and positive definite kernel, i.e., \([\kappa ({\varvec{x}}_i,{\varvec{x}}_j)]_{i,j=1}^N\) is a symmetric and positive semi-definite matrix for every \(N\in {\mathbb {N}}\) and any point selection \({\varvec{x}}_1,\ldots ,{\varvec{x}}_N\in \Omega \). We recall that \(\kappa \) is the reproducing kernel for \({\mathcal {H}}\), iff \(\kappa ({\varvec{x}},\cdot )\in {\mathcal {H}}\) for every \({\varvec{x}}\in \Omega \) and \(h({\varvec{x}})=\langle \kappa ({\varvec{x}},\cdot ),h\rangle _{\mathcal {H}}\) for every \(h\in {\mathcal {H}}\). In this case, we call \(({\mathcal {H}},\langle \cdot ,\cdot \rangle _{\mathcal {H}})\) a reproducing kernel Hilbert space (RKHS).

Let \(X\mathrel {\mathrel {\mathop :}=}\{{\varvec{x}}_1,\ldots ,{\varvec{x}}_N\}\subset \Omega \) denote a set of N mutually distinct points. With respect to the set \(X\), we introduce the subspace

$$\begin{aligned} {\mathcal {H}}_X\mathrel {\mathrel {\mathop :}=}{{\,\textrm{span}\,}}\{\kappa ({\varvec{x}}_1,\cdot ),\ldots ,\kappa ({\varvec{x}}_N,\cdot )\} \subset {\mathcal {H}}. \end{aligned}$$
(1)

Corresponding to \({\mathcal {H}}_X\), we consider the subspace

$$\begin{aligned} {\mathcal {X}}\mathrel {\mathrel {\mathop :}=}{{\,\textrm{span}\,}}\{\delta _{{\varvec{x}}_1}, \ldots ,\delta _{{\varvec{x}}_N}\}\subset {\mathcal {H}}', \end{aligned}$$

which is spanned by the Dirac measures supported at the points of \(X\), i.e.,

$$\begin{aligned} \delta _{{\varvec{x}}_i}(A)\mathrel {\mathrel {\mathop :}=}{\left\{ \begin{array}{ll} 1,&{}\text {if }{\varvec{x}}_i\in A,\\ 0,&{}\text {otherwise} \end{array}\right. } \end{aligned}$$

for any subset \(A\subset \Omega \). For a continuous function \(f\in C(\Omega )\), we use the notation

$$\begin{aligned} (f,\delta _{{\varvec{x}}_i})_\Omega \mathrel {\mathrel {\mathop :}=}\int _{\Omega }f({\varvec{x}})\delta _{{\varvec{x}}_i}({\text {d}}\!{\varvec{x}}) =f({\varvec{x}}_i). \end{aligned}$$

As the kernel \(\kappa ({\varvec{x}},\cdot )\) is the Riesz representer of the point evaluation \((\cdot ,\delta _{\varvec{x}})_\Omega \), we particularly have

$$\begin{aligned} (h,\delta _{\varvec{x}})_\Omega =\langle \kappa ({\varvec{x}},\cdot ),h\rangle _{\mathcal {H}}\quad \text {for every } h\in {\mathcal {H}}. \end{aligned}$$

Thus, the space \({\mathcal {X}}\) is isometrically isomorphic to the subspace \({\mathcal {H}}_X\) from (1) and we identify

$$\begin{aligned} u'=\sum _{i=1}^Nu_i\delta _{{\varvec{x}}_i}\in {\mathcal {X}}\quad \text {with}\quad u=\sum _{i=1}^Nu_i\kappa ({\varvec{x}}_i,\cdot )\in {\mathcal {H}}_X. \end{aligned}$$

Later on, we endow \({\mathcal {X}}\) with the inner product

$$\begin{aligned} \langle u',v'\rangle _{\mathcal {X}}\mathrel {\mathrel {\mathop :}=}\sum _{i=1}^N u_iv_i,\quad \text {where } u'=\sum _{i=1}^Nu_i\delta _{{\varvec{x}}_i},\ v'=\sum _{i=1}^Nv_i\delta _{{\varvec{x}}_i}. \end{aligned}$$
(2)

This inner product is different from the restriction of the canonical one in \({\mathcal {H}}\) to \({\mathcal {H}}_X\). The latter is given by

$$\begin{aligned} \langle {u},{v}\rangle _{\mathcal {H}}={\varvec{u}}^\intercal {\varvec{K}}{\varvec{v}} \end{aligned}$$

with the symmetric and positive semi-definite kernel matrix

$$\begin{aligned} {\varvec{K}}\mathrel {\mathrel {\mathop :}=}\left[ \kappa ({\varvec{x}}_i,{\varvec{x}}_j)\right] _{i,j=1}^N\in {\mathbb {R}}^{N\times N} \end{aligned}$$
(3)

and \({\varvec{u}}\mathrel {\mathrel {\mathop :}=}[u_i]_{i=1}^N\) and \({\varvec{v}}\mathrel {\mathrel {\mathop :}=}[v_i]_{i=1}^N\).

Due to the duality between \({\mathcal {H}}_X\) and \({\mathcal {X}}\), the \({\mathcal {H}}\)-orthogonal projection of a function \(h\in {\mathcal {H}}\) onto \({\mathcal {H}}_X\) is given by the interpolant

$$\begin{aligned} s_h\mathrel {\mathrel {\mathop :}=}\sum _{i=1}^N\alpha _i\kappa ({\varvec{x}}_i,\cdot ), \end{aligned}$$

which satisfies \(s_h({\varvec{x}}_i) = h({\varvec{x}}_i)\) for all \({\varvec{x}}_i\in X\). The associated coefficients \({\varvec{\alpha }}=[\alpha _i]_{i=1}^N\) are given by the solution to the linear system

$$\begin{aligned} {\varvec{K}}{\varvec{\alpha }}={\varvec{h}} \end{aligned}$$

with right hand side \({\varvec{h}}=[h({\varvec{x}}_i)]_{i=1}^N\).
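To make the interpolation step concrete, the following sketch (Python with NumPy; all function names are ours and not part of [25] or its implementation) assembles the kernel matrix (3) for a radial kernel and solves the system \({\varvec{K}}{\varvec{\alpha }}={\varvec{h}}\) for the coefficients of the interpolant.

```python
import numpy as np

def kernel_matrix(kappa, X):
    # K = [kappa(||x_i - x_j||)]_{i,j} for points X of shape (N, d)
    r = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    return kappa(r)

def interpolate(kappa, X, h):
    # coefficients alpha of the interpolant s_h = sum_i alpha_i kappa(x_i, .)
    return np.linalg.solve(kernel_matrix(kappa, X), h)

# usage: exponential kernel and samples of h(x) = sin(2 pi x_1)
X = np.random.rand(200, 2)
alpha = interpolate(lambda r: np.exp(-r), X, np.sin(2.0 * np.pi * X[:, 0]))
```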

From [49, Corollary 11.33], we have the following approximation result.

Theorem 1

Let \(\Omega \subset {\mathbb {R}}^d\) be a bounded Lipschitz domain satisfying an interior cone condition. Suppose that the Fourier transform of the kernel \(\kappa ({\varvec{x}}-{\varvec{y}})\) satisfies

$$\begin{aligned} {\widehat{\kappa }}({\varvec{\varvec{\xi }}})\sim \left( 1+\Vert {\varvec{\varvec{\xi }}}\Vert _2^2\right) ^{-\tau }, \quad {\varvec{\varvec{\xi }}}\in {\mathbb {R}}^d. \end{aligned}$$
(4)

Then for \(0\le t < \lceil \tau \rceil -d/2-1\), the error between \(f\in H^\tau (\Omega )\) and its interpolant \(s_{f,X}\) satisfies the bound

$$\begin{aligned} \Vert f-s_{f,X}\Vert _{H^{t}(\Omega )}\lesssim h_{X,\Omega }^{\tau -t}\Vert f\Vert _{H^\tau (\Omega )} \end{aligned}$$

for a sufficiently small fill distance

$$\begin{aligned} h_{X,\Omega } \mathrel {\mathrel {\mathop :}=}\sup _{{\varvec{x}}\in \Omega }\min _{{\varvec{x}}_i\in X} \Vert {\varvec{x}}-{\varvec{x}}_i\Vert _2. \end{aligned}$$
(5)

One class of kernels satisfying the conditions of Theorem 1 are the isotropic Matérn kernels, also called Sobolev splines, see [16]. These kernels play an important role in applications, such as spatial statistics [41]. They are given by

$$\begin{aligned} \kappa _\nu (r)\mathrel {\mathrel {\mathop :}=}\frac{2^{1-\nu }}{\Gamma (\nu )} \bigg (\frac{\sqrt{2\nu }r}{\ell }\bigg )^\nu K_\nu \bigg (\frac{\sqrt{2\nu }r}{\ell }\bigg ) \end{aligned}$$

with \(r\mathrel {\mathrel {\mathop :}=}\Vert {\varvec{x}}-{\varvec{y}}\Vert _2\), smoothness parameter \(\nu >0\) and length scale parameter \(\ell >0\), see [38, 41]. Furthermore, \(K_\nu \) denotes the modified Bessel function of the second kind. Specifically, property (4) holds with

$$\begin{aligned} {\widehat{\kappa }}_\nu ({\varvec{\varvec{\xi }}}) = \alpha \bigg (1+\frac{\ell ^2}{2\nu }\Vert {\varvec{\varvec{\xi }}}\Vert _2^2\bigg )^{-\nu -d/2}, \end{aligned}$$
(6)

where \(\alpha \) is a scaling factor depending on \(\nu \), \(\ell \) and d, see [38]. The Matérn kernels are the reproducing kernels of the Sobolev spaces \(H^{\nu +d/2}({\mathbb {R}}^d)\), see also [49].

For half integer values of \(\nu \), i.e., for \(\nu =p+1/2\) with \(p\in {\mathbb {N}}_0\), the Matérn kernels have an explicit representation given by

$$\begin{aligned} \kappa _{p+1/2}(r)=\exp \bigg (\frac{-\sqrt{2\nu }r}{\ell }\bigg ) \frac{p!}{(2p)!} \sum _{q=0}^p\frac{(p+q)!}{q!(p-q)!} \bigg (\frac{\sqrt{8\nu }r}{\ell }\bigg )^{p-q}. \end{aligned}$$

The limit case \(\nu \rightarrow \infty \) gives rise to the Gaussian kernel

$$\begin{aligned} \kappa _\infty (r) = \exp \bigg (\frac{-r^2}{2\ell ^2}\bigg ). \end{aligned}$$
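For concreteness, here is a short sketch of the Matérn family, using scipy.special.kv for the modified Bessel function \(K_\nu \); the function name matern is ours, and the half-integer identity above serves as a consistency check.

```python
import numpy as np
from scipy.special import gamma, kv

def matern(r, nu, ell):
    # Matern kernel kappa_nu(r) with smoothness nu and length scale ell
    r = np.asarray(r, dtype=float)
    scaled = np.sqrt(2.0 * nu) * r / ell
    safe = np.where(scaled > 0.0, scaled, 1.0)       # avoid K_nu(0) = inf
    val = 2.0 ** (1.0 - nu) / gamma(nu) * safe ** nu * kv(nu, safe)
    return np.where(scaled > 0.0, val, 1.0)          # kappa_nu(0) = 1

# consistency check: nu = 1/2 gives the exponential kernel exp(-r / ell)
r = np.linspace(0.0, 3.0, 7)
assert np.allclose(matern(r, 0.5, 1.0), np.exp(-r))
```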

Our subsequent analysis covers the Matérn family, but has considerably wider scope. Indeed, rather large classes of pseudodifferential operators will be admissible. As suitable classes of such operators are known to form an algebra, properties of arithmetic expressions of the underlying kernels, such as off-diagonal coefficient decay and matrix compressibility, can directly be inferred. Equally important, we show that these algebra properties transfer, to some extent, to the corresponding finitely represented structures, i.e., the corresponding matrix representations likewise form algebras in the compressed format. We refer to Appendix A for the details on pseudodifferential operators and their properties used in this article.

3 Samplet matrix compression

We recall in this section the concept of samplets as it has been introduced in [25].

3.1 Samplets

Samplets are defined based on a sequence of spaces \(\{{\mathcal {X}}_j\}_{j=0}^J\) forming a multiresolution analysis, i.e.,

$$\begin{aligned} {\mathcal {X}}_0\subset {\mathcal {X}}_1\subset \cdots \subset {\mathcal {X}}_J = {\mathcal {X}}. \end{aligned}$$
(7)

Rather than using a single scale from the multiresolution analysis (7), the idea of samplets is to keep track of the increment of information between two consecutive levels j and \(j+1\). Since we have \({\mathcal {X}}_{j}\subset {\mathcal {X}}_{j+1}\), we may decompose

$$\begin{aligned} {\mathcal {X}}_{j+1} ={\mathcal {X}}_j\overset{\perp }{\oplus }{\mathcal {S}}_j \end{aligned}$$
(8)

by introducing the detail space \({\mathcal {S}}_j\), where orthogonality is to be understood with respect to the (discrete) inner product defined in (2).

Let \({\varvec{\Sigma }}_j\) denote a basis of the detail space \({\mathcal {S}}_j\subset {\mathcal {X}}_{j+1}\); we call such a basis a samplet basis. By choosing a basis of scaling distributions \(\varvec{\Phi }_0\) of \({\mathcal {X}}_0\) and recursively applying the decomposition (8), we see that the set

$$\begin{aligned} \mathbf \Sigma _J = {\varvec{\Phi }}_0\cup \bigcup _{j=0}^{J-1}{\varvec{\Sigma }}_j \end{aligned}$$

forms a basis of \({\mathcal {X}}_J={\mathcal {X}}\). A visualization of a scaling distribution and two samplets on different resolution levels on a spiral data set is displayed in Fig. 1. The \((x,y)\)-components indicate the supports of the associated Dirac measures, while the z-component reflects the size of the corresponding coefficient.

Fig. 1: A scaling distribution on the coarsest scale (left plot) and samplets on levels 2 and 3 (middle and right plot)

To employ samplets for the compression of kernel matrices, it is desirable that the signed measures \(\sigma _{j,k}\in {\mathcal {X}}_j\subset {\mathcal {H}}'\) have isotropic convex hulls of their supports, are localized with respect to the corresponding discretization level j, i.e.,

$$\begin{aligned} {{\,\textrm{diam}\,}}({{\,\textrm{supp}\,}}\sigma _{j,k})\sim 2^{-j/d}, \end{aligned}$$
(9)

and that they are stable with respect to the inner product defined in (2), i.e.,

$$\begin{aligned} \langle \sigma _{j,k},\sigma _{j',k'}\rangle _{\mathcal {X}}=0 \quad \text {for }(j,k)\ne (j',k'). \end{aligned}$$

Furthermore, an essential ingredient is the vanishing moment condition of order \(q+1\), i.e.,

$$\begin{aligned} (p,\sigma _{j,k})_\Omega = 0\quad \text {for all}\ p\in {\mathcal {P}}_q(\Omega ), \end{aligned}$$
(10)

where \({\mathcal {P}}_q(\Omega )\) is the space of all polynomials of total degree at most \(q\). We then say that the samplets have vanishing moments of order \(q+1\).

Remark 1

Associated to each samplet

$$\begin{aligned} \sigma _{j,k} = \sum _{\ell =1}^N\beta _\ell \delta _{{\varvec{x}}_{i_\ell }}, \end{aligned}$$

we find a uniquely determined function

$$\begin{aligned} {\hat{\sigma }}_{j,k}\mathrel {\mathrel {\mathop :}=}\sum _{\ell =1}^N\beta _\ell \kappa ({\varvec{x}}_{i_\ell },\cdot )\in {\mathcal {H}}_X, \end{aligned}$$

which also exhibits vanishing moments, i.e.,

$$\begin{aligned} \langle {\hat{\sigma }}_{j,k},h\rangle _{\mathcal {H}}=0 \end{aligned}$$

for any \(h\in {\mathcal {H}}\) which satisfies \(h|_{O}\in {\mathcal {P}}_q(O)\) for some open set \(O\) with \({{\,\textrm{supp}\,}}\sigma _{j,k}\subset O\subset \Omega \).

3.2 Construction of samplets

The starting point for the construction of samplets is the multiresolution analysis (7). Its construction is based on a hierarchical clustering of the set \(X\).

Definition 1

Let \({\mathcal {T}}=(V,E)\) be a binary tree with vertices V and edges E. We define its set of leaves as

$$\begin{aligned} {\mathcal {L}}({\mathcal {T}})\mathrel {\mathrel {\mathop :}=}\{\nu \in V:\nu ~\text {has no children}\}. \end{aligned}$$

The tree \({\mathcal {T}}\) is a cluster tree for the set \(X=\{{\varvec{x}}_1,\ldots ,{\varvec{x}}_N\}\), iff the set X is the root of \({\mathcal {T}}\) and each \(\nu \in V{\setminus }{\mathcal {L}}({\mathcal {T}})\) is the disjoint union of its two children.

The level \(j_\nu \) of \(\nu \in {\mathcal {T}}\) is its distance from the root, i.e., the number of edges that are required for traveling from X to \(\nu \). The depth \(J\) of \({\mathcal {T}}\) is the maximum level of all clusters. We define the set of clusters on level j as

$$\begin{aligned} {\mathcal {T}}_j\mathrel {\mathrel {\mathop :}=}\{\nu \in {\mathcal {T}}:\nu ~\text {has level}~j\}. \end{aligned}$$

The cluster tree is balanced, iff \(|\nu |\sim 2^{J-j_{\nu }}\) holds for all clusters \(\nu \in {\mathcal {T}}\).

To bound the diameter of the clusters, we introduce the separation radius

$$\begin{aligned} q_X\mathrel {\mathrel {\mathop :}=}\frac{1}{2}\min _{i\ne j}\Vert {\varvec{x}}_i-{\varvec{x}}_j\Vert _2 \end{aligned}$$
(11)

and require \(X\) to be quasi-uniform.

Definition 2

The set \(X\subset \Omega \) is quasi-uniform if the fill distance (5) is proportional to the separation radius (11), i.e., there exists a constant \(c = c(X,\Omega )\in (0,1)\) such that

$$\begin{aligned} 0<c\le \frac{q_X}{h_{X,\Omega }} \le c^{-1}. \end{aligned}$$

Roughly speaking, the points \({\varvec{x}}\in X\) are equispaced if \(X\subset \Omega \) is quasi-uniform. This immediately implies the following result.

Lemma 1

Let \({\mathcal {T}}\) be a cluster tree constructed by hierarchical longest edge bisection of the bounding box \(B_{X}\), where \(B_{\nu }\), \(\nu \in {\mathcal {T}}\), is the smallest axis-parallel cuboid that contains all points of \(\nu \). If \(X\subset \Omega \) is quasi-uniform, then there holds

$$\begin{aligned} \frac{|B_\nu |}{|\Omega |} \sim \frac{|B_\nu \cap X|}{N} \end{aligned}$$

with the constant hidden in \(\sim \) depending only on the constant \(c(X,\Omega )\) in Definition 2. In particular, we have \({{\,\textrm{diam}\,}}(\nu )\sim 2^{-j_\nu /d}\) for all clusters \(\nu \in {\mathcal {T}}\).
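For illustration, the following is a minimal sketch of the cluster tree from Definition 1, built by hierarchical longest edge bisection of bounding boxes as assumed in Lemma 1 (Python/NumPy; the class and function names are ours, and the leaf size of 32 points is an arbitrary choice).

```python
import numpy as np

class Cluster:
    """Node of a binary cluster tree: point indices, bounding box, and level."""
    def __init__(self, indices, bbox, level):
        self.indices, self.bbox, self.level = indices, bbox, level
        self.children = []

def build_cluster_tree(X, indices, bbox, level=0, leaf_size=32):
    node = Cluster(indices, bbox, level)
    if len(indices) > leaf_size:
        axis = int(np.argmax(bbox[1] - bbox[0]))        # longest edge of the box
        mid = 0.5 * (bbox[0, axis] + bbox[1, axis])     # bisect it
        lbox, rbox = bbox.copy(), bbox.copy()
        lbox[1, axis], rbox[0, axis] = mid, mid
        mask = X[indices, axis] <= mid
        for child_idx, child_box in ((indices[mask], lbox), (indices[~mask], rbox)):
            node.children.append(
                build_cluster_tree(X, child_idx, child_box, level + 1, leaf_size))
    return node

X = np.random.rand(10000, 2)
root_box = np.stack([X.min(axis=0), X.max(axis=0)])
tree = build_cluster_tree(X, np.arange(len(X)), root_box)
```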

Samplets with vanishing moments are obtained recursively by employing a two-scale transform between basis elements on a cluster \(\nu \) of level j. To this end, we represent scaling distributions \(\mathbf {\Phi }_{j}^{\nu } = \{ \varphi _{j,k}^{\nu } \}\) and samplets \(\mathbf {\Sigma }_{j}^{\nu } = \{ \sigma _{j,k}^{\nu } \}\) as linear combinations of the scaling distributions \(\mathbf {\Phi }_{j+1}^{\nu }\) of \(\nu \)’s child clusters. This results in the refinement relation

$$\begin{aligned}{}[ \mathbf {\Phi }_{j}^{\nu }, \mathbf {\Sigma }_{j}^{\nu } ] \mathrel {\mathrel {\mathop :}=}\mathbf {\Phi }_{j+1}^{\nu } {\varvec{Q}}^{\nu }= \mathbf {\Phi }_{j+1}^{\nu } \big [ {\varvec{Q}}_{j,\Phi }^{\nu },{\varvec{Q}}_{j,\Sigma }^{\nu }\big ]. \end{aligned}$$

The transformation matrix \({\varvec{Q}}^{\nu }\) is computed from the QR decomposition

$$\begin{aligned} ({\varvec{M}}_{j+1}^{\nu })^\intercal = {\varvec{Q}}{\varvec{R}} \mathrel {=\mathrel {\mathop :}}\big [{\varvec{Q}}_{j,\Phi }^{\nu }, {\varvec{Q}}_{j,\Sigma }^{\nu }\big ]{\varvec{R}} \end{aligned}$$

of the moment matrix

$$\begin{aligned} {\varvec{M}}_{j+1}^{\nu }\mathrel {\mathrel {\mathop :}=}\big [({\varvec{x}}^{\varvec{\alpha }},\mathbf {\Phi }_{j+1}^{\nu })_\Omega \big ]_{|\varvec{\alpha }|\le q}, \end{aligned}$$

whose rows are indexed by the \(m_q\) monomials \({\varvec{x}}^{\varvec{\alpha }}\), \(|\varvec{\alpha }|\le q\), with

$$\begin{aligned} m_q\mathrel {\mathrel {\mathop :}=}\sum _{\ell =0}^q\left( {\begin{array}{c}\ell +d-1\\ d-1\end{array}}\right) \le (q+1)^d \end{aligned}$$

being the dimension of \({\mathcal {P}}_q(\Omega )\). There holds

$$\begin{aligned} \begin{aligned} \big [{\varvec{M}}_{j,\Phi }^{\nu }, {\varvec{M}}_{j,\Sigma }^{\nu }\big ]&= \left[ ({\varvec{x}}^{\varvec{\alpha }},[\mathbf {\Phi }_{j}^{\nu }, \mathbf {\Sigma }_{j}^{\nu }])_\Omega \right] _{|\varvec{\alpha }|\le q}\\&= \left[ ({\varvec{x}}^{\varvec{\alpha }},\mathbf {\Phi }_{j+1}^{\nu }[{\varvec{Q}}_{j,\Phi }^{\nu }, {\varvec{Q}}_{j,\Sigma }^{\nu }])_\Omega \right] _{|\varvec{\alpha }|\le q} \\&= {\varvec{M}}_{j+1}^{\nu } [{\varvec{Q}}_{j,\Phi }^{\nu }, {\varvec{Q}}_{j,\Sigma }^{\nu } ] = {\varvec{R}}^\intercal . \end{aligned} \end{aligned}$$

As \({\varvec{R}}^\intercal \) is a lower triangular matrix, the first \(k-1\) entries in its k-th column are zero. This corresponds to \(k-1\) vanishing moments for the k-th distribution generated by the transformation \([{\varvec{Q}}_{j,\Phi }^{\nu }, {\varvec{Q}}_{j,\Sigma }^{\nu } ]\). By defining the first \(m_{q}\) generated distributions as scaling distributions and the remaining ones as samplets, we obtain samplets with vanishing moments of order at least \(q+1\).
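The following sketch illustrates the QR-based construction for a single leaf cluster, where the scaling distributions are the Dirac measures at the cluster's points (Python/NumPy; all names are ours, and a full implementation would recurse through the cluster tree, treating the children's scaling distributions via the refinement relation above).

```python
import numpy as np
from itertools import combinations_with_replacement

def moment_matrix(points, q):
    # rows: monomials x^alpha with |alpha| <= q; columns: Dirac measures at the points
    cols = []
    for deg in range(q + 1):
        for combo in combinations_with_replacement(range(points.shape[1]), deg):
            col = np.ones(len(points))
            for axis in combo:
                col = col * points[:, axis]
            cols.append(col)
    return np.array(cols)

def samplet_transform(points, q):
    # QR of the transposed moment matrix; the first m_q columns of Q define the
    # scaling distributions, the remaining columns are samplets with q+1 vanishing moments
    M = moment_matrix(points, q)
    Q, _ = np.linalg.qr(M.T, mode='complete')
    return Q[:, :M.shape[0]], Q[:, M.shape[0]:]

pts = np.random.rand(50, 2)
Q_phi, Q_sigma = samplet_transform(pts, q=3)
assert np.allclose(moment_matrix(pts, 3) @ Q_sigma, 0.0)   # vanishing moments
```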

For leaf clusters, we define the scaling distributions by the Dirac measures supported at the points \({\varvec{x}}_i\in X\), i.e., \(\mathbf {\Phi }_J^{\nu }\mathrel {\mathrel {\mathop :}=}\{ \delta _{{\varvec{x}}_i}: {\varvec{x}}_i\in \nu \}\), to make up for the lack of child clusters that could provide scaling distributions. The scaling distributions of all clusters on a specific level j then generate the spaces

$$\begin{aligned} {\mathcal {X}}_{j}\mathrel {\mathrel {\mathop :}=}{{\,\textrm{span}\,}}\{ \varphi _{j,k}^{\nu }: k\in \Delta _j^\nu ,\ \nu \in {\mathcal {T}}_{j} \}, \end{aligned}$$
(12)

while the samplets span the detail spaces

$$\begin{aligned} {\mathcal {S}}_{j}\mathrel {\mathrel {\mathop :}=}{{\,\textrm{span}\,}}\{ \sigma _{j,k}^{\nu }: k\in \nabla _j^\nu ,\ \nu \in {\mathcal {T}}_{j} \} = {\mathcal {X}}_{j+1}\overset{\perp }{\ominus }{\mathcal {X}}_j. \end{aligned}$$
(13)

Combining the scaling distributions of the root cluster with all clusters’ samplets amounts to the basis

$$\begin{aligned} \mathbf {\Sigma }_{N}\mathrel {\mathrel {\mathop :}=}\mathbf {\Phi }_{0}^{X} \cup \bigcup _{\nu \in {\mathcal {T}}} \mathbf {\Sigma }_{j_{\nu }}^{\nu }. \end{aligned}$$
(14)

By construction, samplets satisfy the following properties, which are collected from [25, Theorem 3.6, Lemma 3.9, Theorem 5.4].

Theorem 2

The spaces \({\mathcal {X}}_{j}\) defined in equation (12) form the desired multiresolution analysis (7), where the corresponding detail spaces \({\mathcal {S}}_{j}\) from (13) satisfy

$$\begin{aligned} {\mathcal {X}}_{j+1}={\mathcal {X}}_j\overset{\perp }{\oplus }{\mathcal {S}}_{j}\quad \text {for all}\quad j=0,1,\ldots , J-1. \end{aligned}$$

The associated samplet basis \(\mathbf {\Sigma }_{N}\) defined in (14) constitutes an orthonormal basis of \({\mathcal {X}}\) and we have:

  1. The number of all samplets on level j behaves like \(2^j\).

  2. The samplets have vanishing moments of order \(q+1\), i.e., there holds (10).

  3. Each samplet is supported on a specific cluster \(\nu \). If the points in X are quasi-uniform, then the diameter of the cluster satisfies \({{\,\textrm{diam}\,}}(\nu )\sim 2^{-j_\nu /d}\) and there holds (9).

  4. The coefficient vector \({\varvec{\omega }}_{j,k}=\big [\omega _{j,k,i}\big ]_i\) of the samplet \(\sigma _{j,k}\) on the cluster \(\nu \) fulfills

     $$\begin{aligned} \Vert {\varvec{\omega }}_{j,k}\Vert _{1}\le \sqrt{|\nu |}. \end{aligned}$$

  5. Let \(f\in C^{q+1}(\Omega )\). Then, there holds for a samplet \(\sigma _{j,k}\) supported on the cluster \(\nu \) that

     $$\begin{aligned} |(f,\sigma _{j,k})_\Omega |\le \bigg (\frac{d}{2}\bigg )^{q+1} \frac{{{\,\textrm{diam}\,}}(\nu )^{q+1}}{(q+1)!}\Vert f\Vert _{C^{q+1}(\Omega )} \Vert {\varvec{\omega }}_{j,k}\Vert _{1}. \end{aligned}$$

Remark 2

Each samplet is a linear combination of the Dirac measures supported at the points in X. The related coefficient vectors \({\varvec{\omega }}_{j,k}\) in

$$\begin{aligned} \sigma _{j,k} = \sum _{i=1}^{N} \omega _{j,k,i} \delta _{{\varvec{x}}_i} \end{aligned}$$

are pairwise orthonormal with respect to the inner product (2).

The dual samplet basis in \({\mathcal {H}}_X\), which exhibits the Lagrange property, cp. [49], is given by

$$\begin{aligned} {\tilde{\sigma }}_{j,k}=\sum _{i=1}^N{\tilde{\omega }}_{j,k,i}\kappa ({\varvec{x}}_i,\cdot ), \quad \text {where}\quad \tilde{\varvec{\omega }}_{j,k}\mathrel {\mathrel {\mathop :}=}{\varvec{K}}^{-1}{\varvec{\omega }}_{j,k}, \end{aligned}$$

as there holds

$$\begin{aligned} \langle {\tilde{\sigma }}_{j,k},{\hat{\sigma }}_{j',k'}\rangle _{{\mathcal {H}}}= ({\tilde{\sigma }}_{j,k},\sigma _{j',k'})_\Omega&=\sum _{i,i'=1}^N{\tilde{\omega }}_{j,k,i}{\omega }_{j',k',i'} \big (\kappa ({\varvec{x}}_i,\cdot ),\delta _{{\varvec{x}}_{i'}}\big )_\Omega \\&=\tilde{\varvec{\omega }}_{j,k}^\intercal {\varvec{K}}{\varvec{\omega }}_{j',k'} =\delta _{(j,k),(j',k')}. \end{aligned}$$

3.3 Matrix compression

For the compression of the kernel matrix \({\varvec{K}}\) from (3), with samplets of vanishing moment order \(q+1\), for some integer \(q\ge 0\), we suppose that the kernel \(\kappa \) is “\(q+1\)-asymptotically smooth”. This is to say that there are constants \(c_{\kappa ,\varvec{\alpha },\varvec{\beta }}>0\) such that for all \({\varvec{x}},{\varvec{y}}\in \Omega \) with \({\varvec{x}}\ne {\varvec{y}}\) there holds

$$\begin{aligned} \bigg |\frac{\partial ^{|\varvec{\alpha }|+|\varvec{\beta }|}}{\partial {\varvec{x}}^{\varvec{\alpha }} \partial {\varvec{y}}^{\varvec{\beta }}} \kappa ({\varvec{x}},{\varvec{y}})\bigg | \le c_{\kappa ,\varvec{\alpha },\varvec{\beta }} \Vert {\varvec{x}}-{\varvec{y}}\Vert _2^{-(|\varvec{\alpha }|+|\varvec{\beta }|)}, \quad |\varvec{\alpha }|, |\varvec{\beta }| \le q+1. \end{aligned}$$
(15)

Note that such an estimate can only be valid for continuous kernels as considered here, but not for singular ones. However, we observe in passing that this condition is considerably weaker than the usual notion of asymptotic smoothness of kernels in \({{\mathcal {H}}}\)-matrix theory, cp. [21]. The condition there would correspond to infinite differentiability in (15) with analytic estimates on the constants \(c_{\kappa ,\varvec{\alpha },\varvec{\beta }}\).

Due to (15), we have in accordance with [25, Lemma 5.3] the decay estimate

$$\begin{aligned} \begin{aligned}&\big |(\kappa ,\sigma _{j,k}\otimes \sigma _{j',k'})_{\Omega \times \Omega }\big |\\&\qquad \le c_{\kappa ,q}\frac{{{\,\textrm{diam}\,}}(\nu )^{q+1}{{\,\textrm{diam}\,}}(\nu ')^{q+1}}{{{\,\textrm{dist}\,}}(\nu ,\nu ')^{2(q+1)}} \Vert {\varvec{\omega }}_{j,k}\Vert _{1}\Vert {\varvec{\omega }}_{j',k'}\Vert _{1} \end{aligned} \end{aligned}$$
(16)

for two samplets \(\sigma _{j,k}\) and \(\sigma _{j',k'}\), with the vanishing moment property of order \(q+1\) and supported on the clusters \(\nu \) and \(\nu '\) such that \({{\,\textrm{dist}\,}}(\nu ,\nu ') > 0\).

Estimate (16) holds for a wide range of kernels that obey the so-called Calderón-Zygmund estimates. It immediately results in the following compression strategy for kernel matrices in samplet representation, cp. [25, Theorem 5.4], which is well known in the context of wavelet compression of operator equations, see, e.g., [39].

Theorem 3

(\({\varvec{S}}\)-compression) Set to zero all coefficients of the kernel matrix

$$\begin{aligned} {\varvec{K}}^\Sigma \mathrel {\mathrel {\mathop :}=}\big [(\kappa ,\sigma _{j,k} \otimes \sigma _{j',k'})_{\Omega \times \Omega } \big ]_{j,j',k,k'} \end{aligned}$$

for which the supporting clusters satisfy the \(\eta \)-admissibility condition

$$\begin{aligned} {{\,\textrm{dist}\,}}(\nu ,\nu ')\ge \eta \max \{{{\,\textrm{diam}\,}}(\nu ),{{\,\textrm{diam}\,}}(\nu ')\},\quad \eta >0, \end{aligned}$$
(17)

where \(\nu \) is the cluster supporting \(\sigma _{j,k}\) and \(\nu '\) is the cluster supporting \(\sigma _{j',k'}\), respectively. Then, the resulting S-compressed matrix \({\varvec{K}}^\eta \) satisfies

$$\begin{aligned} \big \Vert {\varvec{K}}^\Sigma -{\varvec{K}}^\eta \big \Vert _F \le c {\eta ^{-2(q+1)}} N{\log (N)} \end{aligned}$$

for some constant \(c>0\) depending on the polynomial degree \(q\) and the kernel \(\kappa \).

Remark 3

We remark that Theorem 3 uses the Frobenius norm for measuring the error rather than the operator norm, as it gives control on each matrix coefficient. Estimates with respect to the operator norm would be similar.

The \(\eta \)-admissibility condition (17) is reminiscent of the one used for hierarchical matrices, compare, e.g., [11] and the references therein. However, in the present context, the clusters \(\nu \) and \(\nu '\) may also be located on different levels, i.e., \(j_\nu \ne j_{\nu '}\) in general. As a consequence, the resulting block cluster tree is the Cartesian product \({\mathcal {T}}\times {\mathcal {T}}\) rather than the level-wise Cartesian product considered in the context of hierarchical matrices.
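For reference, here is a small sketch of the admissibility check (17), evaluated on axis-parallel bounding boxes of the clusters as a common proxy for their true diameters and distances (Python/NumPy; the function names are ours).

```python
import numpy as np

def bbox_diam(bbox):
    # diameter of an axis-parallel box given as [min_corner, max_corner]
    return np.linalg.norm(bbox[1] - bbox[0])

def bbox_dist(bbox_a, bbox_b):
    # Euclidean distance between two axis-parallel boxes
    gap = np.maximum(0.0, np.maximum(bbox_a[0] - bbox_b[1], bbox_b[0] - bbox_a[1]))
    return np.linalg.norm(gap)

def is_admissible(bbox_a, bbox_b, eta=1.25):
    # eta-admissibility (17): the corresponding matrix block is dropped
    return bbox_dist(bbox_a, bbox_b) >= eta * max(bbox_diam(bbox_a), bbox_diam(bbox_b))
```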

The error bounds for S-compression hold for kernel functions \(\kappa \) with finite differentiability (especially, with derivatives of order \(q+1\), cp. [25, Lemma 5.3]), as opposed to the usual requirement of asymptotic smoothness which appears in the error analysis of the \({\mathcal {H}}\)-format, see [21] and the references therein.

For sets \(X = \{{\varvec{x}}_i\}_{i=1}^N\) that are quasi-uniform in the sense of Definition 2, there holds

$$\begin{aligned} \frac{1}{N^2}\big \Vert {\varvec{K}}^\Sigma \big \Vert _F^2 = \frac{1}{N^2}\sum _{i=1}^N\sum _{j=1}^N |\kappa ({\varvec{x}}_i,{\varvec{x}}_j)|^2 \sim \int _\Omega \int _\Omega |\kappa ({\varvec{x}},{\varvec{y}})|^2{\text {d}}\!{\varvec{x}}{\text {d}}\!{\varvec{y}}, \end{aligned}$$

i.e., \(\big \Vert {\varvec{K}}^\Sigma \big \Vert _F\sim N\). Thus, we can refine the above result, see also [25, Corollary 5.5].

Corollary 1

In case of quasi-uniform points \({\varvec{x}}_i\in X\), the S-compressed matrix \({\varvec{K}}^\eta \) has only \({\mathcal {O}}(N\log N)\) nonzero coefficients, while it satisfies the error estimate

$$\begin{aligned} \frac{\big \Vert {\varvec{K}}^\Sigma -{\varvec{K}}^\eta \big \Vert _F}{\big \Vert {\varvec{K}}^\Sigma \big \Vert _F} \le c \eta ^{-2(q+1)}. \end{aligned}$$
(18)

Here, the constant c depends on the kernel \(\kappa \) and on q, but is independent of \(\eta \) and N. In [25], an algorithm has been proposed which provides a numerical realization of the compressed matrix \({\varvec{K}}^\eta \) in work and memory \({\mathcal {O}}(N\log N)\). The key ingredient to achieve this is the use of an interpolation-based fast multipole method and \({\mathcal {H}}^2\)-matrix techniques [3, 11, 20].

4 Samplet matrix algebra

4.1 Addition and multiplication

To bound the cost for the addition of two compressed kernel matrices represented with respect to the same cluster tree, it is sufficient to assume that the points in X are quasi-uniform. Then it is straightforward to see that the cost for adding such matrices is \({\mathcal {O}}({N\log N})\). The multiplication of two compressed matrices, in turn, is motivated by the composition \({\mathcal {C}}={\mathcal {A}}\circ {\mathcal {B}}\) of two pseudodifferential operators \({\mathcal {A}}\) and \({\mathcal {B}}\). In suitable algebras, \({\mathcal {C}}\) is again a pseudodifferential operator and, hence, compressible. The respective kernel \(\kappa _{{\mathcal {C}}}(\cdot ,\cdot )\) is given by

$$\begin{aligned} \kappa _{{\mathcal {C}}}({\varvec{x}},{\varvec{y}}) = \int _\Omega \kappa _{{\mathcal {A}}}({\varvec{x}},{\varvec{z}}) \kappa _{{\mathcal {B}}}({\varvec{z}},{\varvec{y}}){\text {d}}\!{\varvec{z}}. \end{aligned}$$
(19)

Since \(\Omega \subset {\mathbb {R}}^d\) is bounded by assumption, we may without loss of generality assume \(\Omega \subset [0,1)^d\). Moreover, we assume that the data points in \(X=\{\varvec{x}_i\}_{i=1}^N\subset \Omega \) are uniformly distributed modulo one, i.e.,

$$\begin{aligned} \lim _{N\rightarrow \infty }\frac{|\Omega |}{N}\sum _{i=1}^N(f,\delta _{{\varvec{x}}_i})_\Omega = \int _\Omega f({\varvec{x}}){\text {d}}\!{\varvec{x}} \end{aligned}$$
(20)

for every Riemann integrable function \(f:\Omega \rightarrow {\mathbb {R}}\), cp. [32, Chap. 2.1]. Then, we may interpret the matrix product as a discrete version of the integral (19). In view of (20), we conclude

$$\begin{aligned} \bigg |\kappa _{{\mathcal {C}}}({\varvec{x}},{\varvec{y}}) -\frac{|\Omega |}{N}\sum _{k=1}^N \kappa _{{\mathcal {A}}}({\varvec{x}},{\varvec{x}}_k) \kappa _{{\mathcal {B}}}({\varvec{x}}_k,{\varvec{y}})\bigg |\rightarrow 0\ \text {as}\,N\rightarrow \infty . \end{aligned}$$
(21)

Consequently, the product of two kernel matrices

$$\begin{aligned} \varvec{K}_{{\mathcal {A}}} = [\kappa _{{\mathcal {A}}}(\varvec{x}_i,\varvec{x}_j)]_{i,j=1}^N,\quad \varvec{K}_{{\mathcal {B}}} = [\kappa _{{\mathcal {B}}}(\varvec{x}_i,\varvec{x}_j)]_{i,j=1}^N \end{aligned}$$

yields an S-compressible matrix \(\varvec{K}_{{\mathcal {A}}}\cdot \varvec{K}_{{\mathcal {B}}}\in {\mathbb {R}}^{N\times N}\).

Theorem 4

Let \(X=\{\varvec{x}_i\}_{i=1}^N \subset \Omega \) be uniformly distributed modulo one, see (20), and denote by \(\varvec{K}_{{\mathcal {C}}}\) the corresponding kernel matrix

$$\begin{aligned} \varvec{K}_{{\mathcal {C}}} = \frac{N}{|\Omega |} [\kappa _{{\mathcal {C}}}(\varvec{x}_i,\varvec{x}_j)]_{i,j=1}^N \end{aligned}$$

with \(\kappa _{{\mathcal {C}}}(\cdot ,\cdot )\) from (19). Then, there holds

$$\begin{aligned} \frac{\Vert \varvec{K}_{{\mathcal {C}}}-\varvec{K}_{{\mathcal {A}}}\varvec{K}_{{\mathcal {B}}}\Vert _F}{ \Vert \varvec{K}_{{\mathcal {C}}}\Vert _F} \rightarrow 0 \ \hbox { as}\ N\rightarrow \infty . \end{aligned}$$

Proof

On the one hand, we conclude from (21) that, as \(N\rightarrow \infty \),

$$\begin{aligned}&\Vert \varvec{K}_{{\mathcal {C}}}-\varvec{K}_{{\mathcal {A}}}\varvec{K}_{{\mathcal {B}}}\Vert _F^2\\&\qquad = \sum _{i,j=1}^N \bigg [\frac{N}{|\Omega |}\kappa _{{\mathcal {C}}}({\varvec{x}}_i,{\varvec{x}}_j) - \sum _{k=1}^N \kappa _{{\mathcal {A}}}({\varvec{x}}_i,{\varvec{x}}_k)\kappa _{{\mathcal {B}}}({\varvec{x}}_k,{\varvec{x}}_j) \bigg ]^2\\&\qquad \sim N^4 \int _\Omega \int _\Omega \bigg [\kappa _{{\mathcal {C}}}({\varvec{x}},{\varvec{y}}) -\frac{|\Omega |}{N}\sum _{k=1}^N \kappa _{{\mathcal {A}}}({\varvec{x}},{\varvec{x}}_k)\kappa _{{\mathcal {B}}}({\varvec{x}}_k,{\varvec{y}}) \bigg ]^2{\text {d}}\!\varvec{x}{\text {d}}\!\varvec{y}\\&\qquad = o(N^4). \end{aligned}$$

On the other hand, we find likewise

$$\begin{aligned} \Vert \varvec{K}_{{\mathcal {C}}}\Vert _F^2\sim \int _\Omega \int _\Omega N^2 \kappa _{{\mathcal {C}}}({\varvec{x}},{\varvec{y}})^2{\text {d}}\!\varvec{x}{\text {d}}\!\varvec{y} \sim N^4. \end{aligned}$$

This implies the assertion. \(\square \)

Remark 4

We mention that the consistency bound in the preceding theorem is rather crude. Under stronger regularity assumptions on the kernel functions, higher convergence rates can be achieved, provided that X is an appropriate higher-order quasi-Monte Carlo point set, see, e.g., [13] and the references therein.

Let \(\varvec{K}_{{\mathcal {A}}}^\eta ,\varvec{K}_{{\mathcal {B}}}^\eta ,\varvec{K}_{{\mathcal {C}}}^\eta \) be compressed with respect to the same compression pattern. We assume for given \(\varepsilon (\eta )>0\) that \(\eta \) in (18) is chosen such that

$$\begin{aligned} \big \Vert {\varvec{K}}^\Sigma -{\varvec{K}}^\eta \big \Vert _F \le \varepsilon (\eta ){\big \Vert {\varvec{K}}^\Sigma \big \Vert _F},\quad \text {for }{\varvec{K}}\in \{\varvec{K}_{{\mathcal {A}}},\varvec{K}_{{\mathcal {B}}},\varvec{K}_{{\mathcal {C}}}\}. \end{aligned}$$

Then, a repeated application of the triangle inequality yields

$$\begin{aligned}&\Vert \varvec{K}_{{\mathcal {C}}}^\eta -\varvec{K}_{{\mathcal {A}}}^\eta \varvec{K}_{{\mathcal {B}}}^\eta \Vert _F\\&\ \le \Vert \varvec{K}_{{\mathcal {C}}}^\Sigma -\varvec{K}_{{\mathcal {C}}}^\eta \Vert _F + \Vert \varvec{K}_{{\mathcal {A}}}^\Sigma \Vert _F\Vert \varvec{K}_{{\mathcal {B}}}^\Sigma -\varvec{K}_{{\mathcal {B}}}^\eta \Vert _F +\Vert \varvec{K}_{{\mathcal {B}}}^\eta \Vert _F\Vert \varvec{K}_{{\mathcal {A}}}^\Sigma -\varvec{K}_{{\mathcal {A}}}^\eta \Vert _F\\&\ \le \varepsilon (\eta )\big (\Vert \varvec{K}_{{\mathcal {C}}}\Vert _F +\Vert \varvec{K}_{{\mathcal {A}}}\Vert _F\Vert \varvec{K}_{{\mathcal {B}}}\Vert _F +\big (1+\varepsilon (\eta )\big )\Vert \varvec{K}_{{\mathcal {A}}}\Vert _F\Vert \varvec{K}_{{\mathcal {B}}}\Vert _F\big )\\&\ \lesssim \varepsilon (\eta )\big (\Vert \varvec{K}_{{\mathcal {C}}}\Vert _F +\Vert \varvec{K}_{{\mathcal {A}}}\Vert _F\Vert \varvec{K}_{{\mathcal {B}}}\Vert _F\big ). \end{aligned}$$

This means that we only need to compute \({\mathcal {O}}(N\log N)\) matrix entries to determine an approximate version \((\varvec{K}_{{\mathcal {A}}}^\eta \varvec{K}_{{\mathcal {B}}}^\eta )^\eta \) of the product \(\varvec{K}_{{\mathcal {A}}}^\eta \cdot \varvec{K}_{{\mathcal {B}}}^\eta \). We would like to stress that this S-formatted matrix multiplication is exact on the given compression patterns. The next theorem gives a cost bound for the matrix multiplication.

Theorem 5

Consider two kernel matrices

$$\begin{aligned} \varvec{K}_{{\mathcal {A}}}^\eta =[a_{(j,k),(j',k')}], \quad \varvec{K}_{{\mathcal {B}}}^\eta =[b_{(j,k),(j',k')}]\in {\mathbb {R}}^{N\times N} \end{aligned}$$

in samplet coordinates which are S-compressed with respect to the compression pattern induced by the \(\eta \)-admissibility condition (17).

Then, computing the matrix \(\varvec{K}_{{\mathcal {C}}}^\eta =[c_{(j,k),(j',k')}]\in {\mathbb {R}}^{N\times N}\) with respect to the same compression pattern, where the nonzero entries are given by the discrete inner product

$$\begin{aligned} c_{(j,k),(j',k')} = \sum _{\ell =0}^J\sum _{m\in \nabla _\ell } a_{(j,k),(\ell ,m)} b_{(\ell ,m),(j',k')}, \end{aligned}$$
(22)

is of cost \({\mathcal {O}}(N\log ^2 N)\).

Proof

To estimate the cost of the matrix multiplication, we shall make use of the compression rule (17). We assume for all clusters that \({{\,\textrm{diam}\,}}(\nu )\sim 2^{-j/d}\) if \(\nu \) is on level j. Thus, the samplet \(\sigma _{j,k}\) has a support of diameter approximately \(2^{-j/d}\) and, therefore, only \({\mathcal {O}}(2^{\ell -j})\) samplets \(\sigma _{\ell ,m}\) of diameter \(\sim 2^{-\ell /d}\) are found in its nearfield if \(\ell \ge j\), while only \({\mathcal {O}}(1)\) are found if \(\ell <j\). For fixed level \(0\le \ell \le J\) in (22), we thus have at most \({\mathcal {O}}(\max \{2^{\ell -\max \{j,j'\}},1\})\) nonzero products to evaluate per coefficient \(c_{(j,k),(j',k')}\). We assume without loss of generality that \(j\ge j'\) and sum over \(\ell \), which yields the cost \({\mathcal {O}}(\max \{2^{J-j},j\})\). Per target block matrix \({\varvec{C}}_{j,j'} = [c_{(j,k),(j',k')}]_{k,k'}\), we have \({\mathcal {O}}(2^{\max \{j,j'\}}) = {\mathcal {O}}(2^j)\) nonzero coefficients. Hence, the cost for computing the desired target block is \({\mathcal {O}}(2^j \max \{2^{J-j},j\})\). We shall next sum over j and \(j'\)

$$\begin{aligned} \sum _{j=0}^J \sum _{j'=0}^j {\mathcal {O}}(2^j\max \{2^{J-j},j\})&= \sum _{j=0}^J \sum _{j'=0}^j {\mathcal {O}}(\max \{N,j2^j\})\\&= \sum _{j=0}^J {\mathcal {O}}(j\max \{N,j 2^j\}) = {\mathcal {O}}(N\log ^2 N). \end{aligned}$$

\(\square \)
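As a dense illustration of the S-formatted multiplication, the entries (22) of the product can be computed only on a prescribed pattern, discarding everything off the pattern. The sketch below (Python with scipy.sparse; names are ours) is a direct, unoptimized realization; an efficient implementation exploits the level-wise structure as in the proof above.

```python
import numpy as np
import scipy.sparse as sp

def formatted_multiply(A, B, pattern):
    # entries of A @ B on the positions of the given sparsity pattern only
    A, B = sp.csr_matrix(A), sp.csc_matrix(B)
    P = sp.coo_matrix(pattern)
    data = np.array([A.getrow(i).dot(B.getcol(j)).toarray()[0, 0]
                     for i, j in zip(P.row, P.col)])
    return sp.coo_matrix((data, (P.row, P.col)), shape=P.shape).tocsr()

# usage (hypothetical matrices): restrict the product to the pattern of the first factor,
# C = formatted_multiply(K_A_eta, K_B_eta, abs(K_A_eta) > 0)
```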

4.2 Sparse selected inversion

Having addition and multiplication of kernel matrices at our disposal, we consider the matrix inversion next. To this end, observe that the inverse \({{\mathcal {A}}}^{-1}\) of a pseudodifferential operator \({{\mathcal {A}}}\) from a suitable algebra of pseudodifferential operators, provided that it exists, is again a pseudodifferential operator, see Appendix A. However, if \({{\mathcal {A}}}\) is a pseudodifferential operator of negative order, as in the present RKHS case, the operator \({{\mathcal {A}}}^{-1}\) is of positive order and hence gives rise to a singular kernel which does not satisfy condition (15). In the context of kernel matrices, however, we are rather interested in inverting regularized pseudodifferential operators, i.e., \({{\mathcal {A}}}+\mu {I}\), where \(I\) denotes the identity. For such operators, we have the following lemma.

Lemma 2

Let \({{\mathcal {A}}}\) be a pseudodifferential operator of order \(s \le 0\) with symmetric and positive semidefinite kernel function.

Then, for any \(\mu >0\), the inverse of \({{\mathcal {A}}}+\mu {I}\) can be decomposed into \(\frac{1}{\mu }{I}-{{\mathcal {B}}}\) with

$$\begin{aligned} {{\mathcal {B}}} = \frac{1}{\mu }({{\mathcal {A}}}+\mu {I})^{-1}{{\mathcal {A}}}. \end{aligned}$$
(23)

Especially, \({{\mathcal {B}}}\) is also a pseudodifferential operator of order s, which admits a symmetric and positive semidefinite kernel function.

Proof

In view of (23), we infer that

$$\begin{aligned} ({{\mathcal {A}}}+\mu {I})\bigg (\frac{1}{\mu }{I}-{{\mathcal {B}}}\bigg ) = \frac{1}{\mu }{{\mathcal {A}}} +I - ({{\mathcal {A}}}+\mu {I}){{\mathcal {B}}} = I + \frac{1}{\mu }{{\mathcal {A}}}- \frac{1}{\mu }{{\mathcal {A}}} = I. \end{aligned}$$

Therefore, \(\frac{1}{\mu }{I}-{{\mathcal {B}}}\) is the inverse operator to \({{\mathcal {A}}}+\mu {I}\). Since \({{\mathcal {A}}}+\mu {I}\) is of order 0, \(({{\mathcal {A}}} +\mu {I})^{-1}\) is of order 0, too, and thus \(({{\mathcal {A}}}+\mu {I})^{-1} {{\mathcal {A}}}\) is of the same order as \({{\mathcal {A}}}\). Finally, the symmetry and nonnegativity of \({\mathcal {B}}\) follows from the symmetry and nonnegativity of \({\mathcal {A}}\). \(\square \)

As a consequence of this lemma, the inverse \(({\varvec{K}}_{{\mathcal {A}}}+\mu {\varvec{I}})^{-1} \in {\mathbb {R}}^{N\times N}\) of the associated kernel matrix \({\varvec{K}}_{{\mathcal {A}}}+\mu {\varvec{I}} \in {\mathbb {R}}^{N\times N}\) is S-compressible with respect to the same compression pattern as \({\varvec{K}}_{{\mathcal {A}}}\). In [24], strong numerical evidence was presented that a sparse Cholesky factorization of a compressed kernel matrix can efficiently be computed by means of nested dissection, cf. [18, 37]. This suggests the computation of the inverse \(({\varvec{K}}_{{\mathcal {A}}}+\mu {\varvec{I}})^{-1}\) in samplet basis on the compression pattern of \({\varvec{K}}_{{\mathcal {A}}}\) by means of selected inversion [36] of a sparse matrix. The approach is outlined below.

Assume that \({\varvec{A}}\in {\mathbb {R}}^{N\times N}\) is symmetric and positive definite. The inversion algorithm consists of two steps. The first step factorizes the input matrix \(\varvec{A}\) into \(\varvec{A}=\varvec{LDL}^\intercal \). The factors \(\varvec{L}\) and \(\varvec{D}\) are used in the second step to compute the selected components of \(\varvec{A}^{-1}\). The first step will be referred to as factorization in the following and the second step as selected inversion. To explain the second step, let \({\varvec{A}}\) be partitioned according to

$$\begin{aligned} {\varvec{A}} = \begin{bmatrix} {\varvec{A}}_{11} &{} {\varvec{A}}_{12}\\ {\varvec{A}}_{12}^\intercal &{} {\varvec{A}}_{22} \end{bmatrix}. \end{aligned}$$

In particular, the diagonal blocks \({\varvec{A}}_{ii}\) are also symmetric and positive definite. The selected inversion is based on the identity

$$\begin{aligned} {\varvec{A}}^{-1} = \begin{bmatrix} {\varvec{A}}_{11}^{-1}+{\varvec{C}}{\varvec{S}}^{-1}{\varvec{C}}^\intercal &{} {\varvec{C}}{\varvec{S}}^{-1}\\ {\varvec{S}}^{-1}{\varvec{C}}^\intercal &{} {\varvec{S}}^{-1} \end{bmatrix} \end{aligned}$$

(24)

with the Schur complement \({\varvec{S}}\mathrel {\mathrel {\mathop :}=}{\varvec{A}}_{22}+{\varvec{A}}_{12}^\intercal {\varvec{C}}\), where \({\varvec{C}}\mathrel {\mathrel {\mathop :}=}-{\varvec{A}}_{11}^{-1}{\varvec{A}}_{12}\). For sparse matrices, this block algorithm can efficiently be realized based on the observation that for the computation of the entries of \({\varvec{A}}^{-1}\) on the pattern of \({\varvec{L}}\), only the entries on the pattern of \({\varvec{L}}\) are required, as is well known from the sparse matrix literature, cp. [15, 19, 36]. In particular, the pattern of \({\varvec{A}}\) is contained in the pattern of \({\varvec{L}}\).
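The following is a dense recursive sketch of the block identity (24) (Python/NumPy; names are ours): the inverse is assembled from \({\varvec{A}}_{11}^{-1}\), \({\varvec{C}}\), and the Schur complement \({\varvec{S}}\). In the sparse setting, only the entries on the pattern of \({\varvec{L}}\) would be formed, and all products would be carried out in S-format.

```python
import numpy as np

def block_inverse(A, min_size=64):
    # recursive inversion of a symmetric positive definite matrix via identity (24)
    n = A.shape[0]
    if n <= min_size:
        return np.linalg.inv(A)
    m = n // 2
    A11, A12, A22 = A[:m, :m], A[:m, m:], A[m:, m:]
    inv11 = block_inverse(A11, min_size)
    C = -inv11 @ A12                          # C = -A_11^{-1} A_12
    S = A22 + A12.T @ C                       # Schur complement
    invS = block_inverse(S, min_size)
    top_left = inv11 + C @ invS @ C.T
    top_right = C @ invS
    return np.block([[top_left, top_right], [top_right.T, invS]])
```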

4.3 Algorithmic aspects

A block selected inversion algorithm has at least two advantages: First, because \(\varvec{A}\) is sparse, blocks can be specified in terms of supernodes [36]. Second, this allows us to use level-3 BLAS to construct an efficient implementation by leveraging the memory hierarchy of current microprocessors. A supernode is a group of nodes with the same nonzero structure below the diagonal in their respective columns of the factor \(\varvec{L}\). The supernodal approach for sparse symmetric factorization represents the factor \(\varvec{L}\) as a set of supernodes, each of which consists of a contiguous set of columns of \(\varvec{L}\) with identical nonzero patterns, and each supernode is stored as a dense submatrix to take advantage of level-3 BLAS operations.

Given these considerations, it is natural to employ the selected inversion approach presented in [48] and available in [40] in order to directly compute the entries of the inverse matrix on the pattern. For the particular implementation of the selected inversion, we rely on Pardiso. For larger kernel matrices, whose number of nonzero entries can no longer be indexed by 32-bit integers, we combine the selected inversion with a divide-and-conquer approach based on the identity (24). The inversion of the \({\varvec{A}}_{11}\) block and of the Schur complement \({\varvec{S}}\) are performed with Pardiso (exploiting symmetry), while the remaining arithmetic operations, i.e., addition and multiplication, are performed in a formatted way, compare Theorem 5.

4.4 Matrix functions

Based on the S-formatted multiplication and inversion of operators represented in the samplet basis, certain holomorphic functions of an S-compressed operator also admit S-formatted approximations with essentially the same approximation accuracy.

To illustrate this, we recall the method in [23]. This approach employs the contour integral representation

$$\begin{aligned} f({\varvec{A}}) = \frac{1}{2\pi i} \int _\Gamma f(z) (z{\varvec{I}}-{\varvec{A}})^{-1}{\text {d}}\!z, \end{aligned}$$
(25)

where \(\Gamma \) is a closed contour contained in the region of analyticity of f and winding once around the spectrum \(\sigma ({\varvec{A}})\) in counterclockwise direction. As is well known, analytic functions f of elliptic, self-adjoint pseudodifferential operators yield again pseudodifferential operators in the same algebra, see, e.g., [46, Chap. XII.1]. Hence, \({\varvec{B}} \mathrel {\mathrel {\mathop :}=}f({\varvec{A}})\) is S-compressible provided that f is analytic. In particular, the S-compressed representation \(\big (f({\varvec{A}}^\eta )\big )^{\eta }\) satisfies

$$\begin{aligned} \begin{aligned} \big \Vert {\varvec{B}}^\Sigma -\big (f({\varvec{A}}^{\eta })\big )^{\eta }\big \Vert _F&\le \Vert {\varvec{B}}^\Sigma -{\varvec{B}}^{\eta }\Vert _F +\big \Vert \big (f({\varvec{A}}^\Sigma )-f({\varvec{A}}^{\eta })\big )^{\eta }\big \Vert _F\\&\le \varepsilon \Vert {\varvec{B}}\Vert _F + L\Vert {\varvec{A}}^\Sigma -{\varvec{A}}^{\eta }\Vert _F\\&\le \varepsilon \big (\Vert {\varvec{B}}\Vert _F+L\Vert {\varvec{A}}\Vert _F\big ). \end{aligned} \end{aligned}$$
(26)

Herein, \(L\) denotes the Lipschitz constant of the function \(f\). In other words, estimate (26) implies that the error of the S-formatted matrix function \(\big (f({\varvec{A}}^{\eta })\big )^{\eta }\) is rigorously controlled by the sum of the input error \(\Vert {\varvec{A}}^\Sigma -{\varvec{A}}^{\eta }\Vert _F\) and the compression error of the exact output \(\Vert {\varvec{B}}^\Sigma -{\varvec{B}}^{\eta }\Vert _F\). The latter is under control if the underlying pseudodifferential operator is of order \(s < -d\), since then the kernel is continuous and satisfies (15). In the other cases, some analysis is needed to control this error (see below).

For the numerical approximation of the contour integral (25), one has to apply an appropriate quadrature formula. As an example, we consider the matrix square root, i.e., \(f(z) = \sqrt{z}\) for \(\textrm{Re}\, z > 0\). This occurs, for example, in the efficient path simulation of Gaussian processes in spatial statistics. We apply here the approximation, see [23, Eq. (4.4) and the comments below it],

$$\begin{aligned} \begin{aligned} {\varvec{A}}^{-1/2}&\approx \frac{2 E \sqrt{{\underline{c}}}}{\pi K} \sum _{k=1}^K\frac{{\text {dn}} \left( t_k | 1-\varkappa _{\varvec{A}}\right) }{{\text {cn}}^2\left( t_k | 1 - \varkappa _{\varvec{A}}\right) } \left( {\varvec{A}} + w_k^2{\varvec{I}}\right) ^{-1},\\ {\varvec{A}}^{1/2}&= {\varvec{A}}\cdot {\varvec{A}}^{-1/2}. \end{aligned} \end{aligned}$$
(27)

Herein, \({\text {sn}}, {\text {cn}}\) and \({\text {dn}}\) are the Jacobian elliptic functions [2, Chapter 16], E is the complete elliptic integral of the first kind associated with the complementary parameter \(1-\varkappa _{\varvec{A}}\), where \(\varkappa _{\varvec{A}}:= {\underline{c}}/{\overline{c}}\) [2, Chapter 17], and, for \(k\in \{1,\ldots , K \}\),

$$\begin{aligned} w_k\mathrel {\mathrel {\mathop :}=}\sqrt{{\underline{c}}}\,\frac{ {\text {sn}}\left( t_k | 1 - \varkappa _{\varvec{A}}\right) }{{\text {cn}}\left( t_k | 1 - \varkappa _{\varvec{A}}\right) } \quad \text {and} \quad t_k\mathrel {\mathrel {\mathop :}=}\frac{E}{K}\big (k-\tfrac{1}{2}\big ). \end{aligned}$$

The quadrature approximation (27) of the contour integral (25) for the matrix square root is known to converge root-exponentially in the number K of quadrature nodes, see, e.g., [10, Lemma 3.4]. Hence, approximate representations with algebraic (with respect to N) consistency order can be achieved with \(K\sim |\log \varepsilon (\eta )|^2\), resulting in an overall log-linear complexity of the numerical realization of (27) in S-format. We also remark that the quadrature shifts \(w_k^2\) in the inversions occurring in (27) act as regularizing “nuggets” for a possibly ill-conditioned \({\varvec{A}}\). The input parameters \(0< {\underline{c}} < {\overline{c}}\) shall provide bounds on the spectrum of \({\varvec{A}}\), i.e., \({\underline{c}}\approx \lambda _{\min }({\varvec{A}})\) and \({\overline{c}}\approx \lambda _{\max }({\varvec{A}})\). Note that we also assume here that \({\varvec{A}}\) is symmetric and positive definite. Moreover, we should mention that, except for the quadrature error, (27) computes the inverse square root \(({\varvec{A}}^\eta )^{-1/2}\) of the compressed input \({\varvec{A}}^\eta \) exactly on the compression pattern when we use the selected inversion algorithm from Sect. 4.2.
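For illustration, here is a dense sketch of the quadrature (27) (Python/SciPy; all names are ours). We assume \({\varvec{A}}\) is symmetric positive definite with spectral bounds \({\underline{c}}\) and \({\overline{c}}\), and we evaluate E as the complete elliptic integral of the first kind at the parameter \(1-\varkappa _{\varvec{A}}\) via scipy.special.ellipk, which is the reading of (27) under which the nodes \(t_k\) cover the full quadrature interval.

```python
import numpy as np
from scipy.special import ellipj, ellipk

def inv_sqrt_quadrature(A, c_lo, c_hi, K=15):
    # approximate A^{-1/2} by the elliptic-function quadrature (27) with K nodes
    n = A.shape[0]
    m = 1.0 - c_lo / c_hi                      # elliptic parameter 1 - kappa_A
    E = ellipk(m)                              # quadrature interval length
    t = (np.arange(1, K + 1) - 0.5) * E / K    # midpoint nodes t_k
    sn, cn, dn, _ = ellipj(t, m)
    w = np.sqrt(c_lo) * sn / cn                # quadrature shifts w_k
    X = np.zeros_like(A, dtype=float)
    for wk, cnk, dnk in zip(w, cn, dn):
        X += dnk / cnk**2 * np.linalg.inv(A + wk**2 * np.eye(n))
    return 2.0 * E * np.sqrt(c_lo) / (np.pi * K) * X

# A^{1/2} is then recovered as A @ inv_sqrt_quadrature(A, c_lo, c_hi)
```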

That \(({\varvec{A}}^\eta )^{-1/2}\) is indeed S-compressible is a consequence of the following lemma.

Lemma 3

Let \({{\mathcal {A}}}\) be a pseudodifferential operator of order \(s \le 0\) with symmetric and positive semidefinite kernel function. Then, for any \(\mu >0\), the inverse square root of \({{\mathcal {A}}}+\mu {I}\) can be written as \(\frac{1}{\sqrt{\mu }}{I}-{{\mathcal {B}}}\) with \({{\mathcal {B}}}\) being also a pseudodifferential operator of order s, which admits a symmetric and positive semidefinite kernel function.

Proof

Straightforward calculation shows that the ansatz

$$\begin{aligned} ({\mathcal {A}}+\mu I)^{-1/2} = \frac{1}{\sqrt{\mu }} I-{\mathcal {B}}\end{aligned}$$
(28)

is equivalent to

$$\begin{aligned} ({\mathcal {A}}+\mu I)\bigg (\frac{1}{\mu } I-\frac{2}{\sqrt{\mu }}{\mathcal {B}}+{\mathcal {B}}^2\bigg ) = I. \end{aligned}$$

Thus,

$$\begin{aligned} {\mathcal {B}}\bigg (\frac{2}{\sqrt{\mu }}I-{\mathcal {B}}\bigg ) = \frac{1}{\mu }({\mathcal {A}}+\mu I)^{-1}{\mathcal {A}}, \end{aligned}$$

which in view of (28) is equivalent to

$$\begin{aligned} {\mathcal {B}}\bigg (\frac{1}{\sqrt{\mu }}I+({\mathcal {A}}+\mu I)^{-1/2}\bigg ) = \frac{1}{\mu }{\mathcal {A}}({\mathcal {A}}+\mu I)^{-1}. \end{aligned}$$

As both \(\frac{1}{\sqrt{\mu }}I+({\mathcal {A}}+\mu I)^{-1/2}\) and \(({\mathcal {A}}+\mu I)^{-1}\) are pseudodifferential operators of order 0, \({\mathcal {B}}\) must have the same order as \({\mathcal {A}}\). \(\square \)

An alternative to the contour integral for computing the matrix exponential of a (possibly singular) matrix \({\varvec{A}}\) is given by the direct evaluation of the power series

$$\begin{aligned} \exp ({\varvec{A}})=\sum _{k=0}^\infty \frac{1}{k!}{\varvec{A}}^k. \end{aligned}$$

As we show in the numerical results, this series converges very fast for the matrices presently under consideration which stem from reproducing kernels, since they correspond to compact operators.
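A sketch of the truncated power series follows (Python/NumPy; names are ours); in the S-formatted setting, each product would additionally be re-truncated to the prescribed compression pattern.

```python
import numpy as np

def expm_series(A, tol=1e-14, max_terms=100):
    # exp(A) via the truncated power series sum_k A^k / k!
    result = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, max_terms):
        term = term @ A / k                    # in S-format: re-truncate to the pattern here
        result = result + term
        if np.linalg.norm(term, 'fro') <= tol * np.linalg.norm(result, 'fro'):
            break
    return result
```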

5 Numerical results

The computations in this section have been performed on a single node with two Intel Xeon E5-2650 v3 @2.30GHz CPUs and up to 512GB of main memory. To achieve consistent timings, all computations have been carried out using 16 cores. The samplet compression is implemented in C++11 and relies on the Eigen template library for linear algebra operations. Moreover, the selected inversion is performed by Pardiso. Throughout this section, we employ samplets with \(q+1=4\) vanishing moments. The parameter for the admissibility condition (17) is set to \(\eta =1.25\). Together with the a priori pattern, which is obtained by neglecting admissible blocks, we also consider an a posteriori compression by setting all matrix entries smaller in modulus than \(\tau =10^{-5}/N\) to zero, resulting in the a posteriori pattern. In view of (18), there is a tradeoff between the number \(q+1\) of vanishing moments and \(\eta \): increasing either results in higher accuracy, but also in more densely populated matrices. The chosen setting results in compression errors of about \(10^{-5}\) for all shown examples. For a comprehensive study of the compression errors, we refer to [25].

5.1 S-formatted matrix multiplication

To benchmark the multiplication, we consider uniformly distributed random points on the unit hypercube \([0,1]^d\). As kernel, we consider exclusively the exponential kernel (which is the Matérn kernel with smoothness parameter \(\nu =1/2\) and correlation length \(\ell =1\))

$$\begin{aligned} \kappa ({\varvec{x}},{\varvec{y}})=\frac{1}{N}e^{-\Vert {\varvec{x}}-{\varvec{y}}\Vert _2}. \end{aligned}$$
(29)

Note that we impose the scaling 1/N of the kernel function in order to keep the largest eigenvalue of the kernel matrix bounded, as its trace then stays uniformly bounded. We would like to stress that the present approach also works for smoother kernels than (29). However, in this case, single-block low-rank approximation techniques are competitive, too, see [4, 5, 12, 26].

We compute the matrix product \({\varvec{K}}^\eta \cdot \tilde{\varvec{K}}^\eta \), where \(\tilde{\varvec{K}}^\eta \) is obtained from \({\varvec{K}}^\eta \) by perturbing each nonzero entry by 10% relative noise, uniformly distributed in \([0,1]\). This way, we rule out symmetry effects, as \(\tilde{\varvec{K}}^\eta \) will not be symmetric in general.

Fig. 2 (S-formatted matrix multiplication) Computation times for matrix multiplication (left) and multiplication errors (right)

To measure the multiplication error, we consider the estimator

$$\begin{aligned} e_F({\varvec{A}})\mathrel {\mathrel {\mathop :}=}\frac{\Vert {\varvec{A}}{\varvec{U}}\Vert _F}{\Vert {\varvec{U}}\Vert _F}, \end{aligned}$$

where \({\varvec{U}}\in {\mathbb {R}}^{N\times 10}\) is a random matrix with uniformly distributed independent entries. The left-hand side of Fig. 2 shows the computation time for a single multiplication. The dashed lines correspond to the asymptotic rates \({\mathcal {O}}(N\log ^\alpha N)\) for \(\alpha =0,1,2,3\). It can be seen that the multiplication time for \(d=2\) perfectly reflects the expected essentially linear behavior. Though the graph is steeper for \(d=3\), we expect it to flatten further for larger \(N\). The right-hand side of the figure shows the multiplication error \(e_F({\varvec{K}}^\eta \cdot \tilde{\varvec{K}}^\eta - {\varvec{K}}^\eta \boxdot \tilde{\varvec{K}}^\eta )\), where the formatted multiplication \(\boxdot \) is performed on the a posteriori pattern. Taking into account that the compression errors for \({\varvec{K}}^\eta \) are approximately \(5.6\cdot 10^{-6}\) for \(d=2\) and \(1.6\cdot 10^{-5}\) for \(d=3\), the obtained matrix product can be considered very accurate.
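A minimal sketch of this estimator for a dense error matrix, using Eigen's uniform random matrices (entries in \([-1,1]\)), may look as follows; in the benchmark, the difference \({\varvec{K}}^\eta \cdot \tilde{\varvec{K}}^\eta -{\varvec{K}}^\eta \boxdot \tilde{\varvec{K}}^\eta \) is of course applied to \({\varvec{U}}\) as a sparse operator rather than formed explicitly.

```cpp
#include <Eigen/Dense>

// e_F(A) = ||A U||_F / ||U||_F with a random test matrix U of 10 columns.
double errorEstimatorF(const Eigen::MatrixXd& A) {
  Eigen::MatrixXd U = Eigen::MatrixXd::Random(A.cols(), 10);
  return (A * U).norm() / U.norm();  // .norm() is the Frobenius norm
}
```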

5.2 S-formatted matrix inversion

In order to assess the numerical performance of the matrix inversion, we again consider uniformly distributed random points on the unit hypercube \([0,1]^d\). Since the separation radius \(q_X\) ranges between \(4.7\cdot 10^{-5}\) (\(N=5000\)) and \(2.8\cdot 10^{-7}\) (\(N=1\, 000\, 000\)) for \(d=2\), and between \(3.8\cdot 10^{-4}\) (\(N=5000\)) and \(3.2\cdot 10^{-5}\) (\(N=1\, 000\, 000\)) for \(d=3\), we do not expect \({\varvec{K}}^\eta \) to be numerically invertible. We therefore consider the regularized version \({\varvec{K}}^\eta +\mu {\varvec{I}}\) for a ridge parameter \(\mu >0\).

As our theoretical results suggest that the inverse has the same a priori pattern as the matrix itself, we first consider the inversion on the a priori pattern for \(d=2\).

Fig. 3 (S-formatted matrix inversion, \(d=2\), a priori pattern) Left panel: computation times for compressed matrix assembly and selected inversion on the a priori pattern. Dashed lines indicate linear (\(\alpha =1\)) and super-linear (\(\alpha =1.5\)) scaling, respectively. Right panel: inversion errors for ridge parameters \(\mu =10^{-6},10^{-4},10^{-2}\)

The left-hand side of Fig. 3 shows the computation times for the inverse matrix employing Pardiso. The dashed lines show the asymptotic rates \({\mathcal {O}}(N^\alpha )\) for \(\alpha =1,1.5\). For \(N=1\,000\,000\), due to the large number of nonzero entries, we use the block inversion with one subdivision, which explains the bump in the computation time caused by the formatted matrix multiplications. Apart from this, Pardiso perfectly exhibits the expected rate of \(N^{1.5}\). The right-hand side of the figure shows the error of the selected inversion on the pattern of \({\varvec{K}}^\eta \) for the ridge parameters \(\mu =10^{-6},10^{-4},10^{-2}\). The choice of the ridge parameters is exemplary; it starts at about the compression error and spans four orders of magnitude. The error decreases significantly with increasing ridge parameter, since the matrix \({\varvec{K}}^\eta +\mu {\varvec{I}}\) becomes spectrally closer to a multiple of the identity matrix.

As the a posteriori pattern typically exhibits significantly fewer nonzero entries than the a priori pattern, we also investigate the inversion on the a posteriori pattern. The corresponding results are shown in Fig. 4.

Fig. 4 (S-formatted matrix inversion, \(d=2\), a posteriori pattern) Left panel: computation times for compressed matrix assembly and selected inversion on the a posteriori pattern. Dashed lines indicate linear (\(\alpha =1\)) and super-linear (\(\alpha =1.5\)) scaling, respectively. Right panel: inversion errors for ridge parameters \(\mu =10^{-6},10^{-4},10^{-2}\)

As can be seen on the left-hand side of the figure, the selected inversion now even exhibits linear behavior, which is explained by the fixed threshold \(\tau \) resulting in successively fewer nonzero entries for increasing \(N\). On the other hand, the errors for the different ridge parameters, depicted on the right-hand side of the same figure, asymptotically exhibit the same behavior as in the a priori case.

Fig. 5 (S-formatted matrix inversion, \(d=3\), a posteriori pattern) Left panel: computation times for compressed matrix assembly and selected inversion on the a posteriori pattern. Dashed lines indicate linear (\(\alpha =1\)) and quadratic (\(\alpha =2\)) scaling, respectively. Right panel: inversion errors for ridge parameters \(\mu =10^{-6},10^{-4},10^{-2}\)

Motivated by the results for \(d=2\), we consider only the inversion on the a posteriori pattern for \(d=3\). The corresponding results are shown in Fig. 5. The left-hand side of the figure again shows the computation times. The dashed lines show the asymptotic rates \({\mathcal {O}}(N^\alpha )\) for \(\alpha =1,2\). Up to \(N=100\,000\), the expected quadratic rate is matched perfectly. Due to the large number of nonzero entries in the case \(d=3\), we have employed the block inversion with three recursion steps for \(N>100\,000\), which explains the nearly linear behavior of the corresponding values in the graph. The errors depicted on the right-hand side show a behavior similar to the case \(d=2\), with a slightly slower decay.
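For reference, recursive block inversion schemes of this kind typically rest on the standard \(2\times 2\) block-inverse identity involving the Schur complement; the formatted variant used in the experiments additionally performs all intermediate products as formatted multiplications on the compression pattern:

$$\begin{aligned} \begin{pmatrix} {\varvec{M}}_{11} &{} {\varvec{M}}_{12}\\ {\varvec{M}}_{21} &{} {\varvec{M}}_{22} \end{pmatrix}^{-1} = \begin{pmatrix} {\varvec{M}}_{11}^{-1}+{\varvec{M}}_{11}^{-1}{\varvec{M}}_{12}{\varvec{S}}^{-1}{\varvec{M}}_{21}{\varvec{M}}_{11}^{-1} &{} -{\varvec{M}}_{11}^{-1}{\varvec{M}}_{12}{\varvec{S}}^{-1}\\ -{\varvec{S}}^{-1}{\varvec{M}}_{21}{\varvec{M}}_{11}^{-1} &{} {\varvec{S}}^{-1} \end{pmatrix},\qquad {\varvec{S}}\mathrel {\mathrel {\mathop :}=}{\varvec{M}}_{22}-{\varvec{M}}_{21}{\varvec{M}}_{11}^{-1}{\varvec{M}}_{12}, \end{aligned}$$

provided \({\varvec{M}}_{11}\) and \({\varvec{S}}\) are invertible. Each recursion step thus replaces one large inverse by two smaller ones, \({\varvec{M}}_{11}^{-1}\) and \({\varvec{S}}^{-1}\), plus a fixed number of formatted matrix multiplications.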

Fig. 6 Data points from a 3D scan of the head of Michelangelo’s David. The scan is provided by the Statens Museum for Kunst under the Creative Commons CC0 license

5.3 S-formatted matrix functions

We compute the matrix square root \({\varvec{A}}^{1/2}\) and the matrix exponential \(\exp ({\varvec{A}})\) for the exponential kernel, now with correlation length \(\ell =1/2\),

$$\begin{aligned} \kappa ({\varvec{x}},{\varvec{y}})=\frac{1}{N}e^{-2\Vert {\varvec{x}}-{\varvec{y}}\Vert _2}. \end{aligned}$$

This time, the data points are randomly subsampled from a 3D scan of the head of Michelangelo’s David, cp. Fig. 6. The bounding box of the point cloud is \([-0.52,0.42]\times [-0.47,0.46]\times [-0.18,0.78]\). All other parameters are set as in the previous examples. Moreover, we set the ridge parameter to \(\mu =10^{-4}\), corresponding to the middle value from Sect. 5.2. The smallest eigenvalue of the regularized matrix is bounded from below by the ridge parameter, while the largest eigenvalue is bounded from above by 1. For the contour integral method for the computation of the matrix square root, we found that the error stagnates for \(K\ge 7\) quadrature points. The corresponding errors for different values of \(N\) are tabulated in Table 1.

Table 1 Errors for the contour integral method for \(({\varvec{K}}^\eta +\mu {\varvec{I}})^{1/2}\)
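For orientation, the following dense sketch illustrates the quadrature-of-shifted-inverses structure underlying such methods, using the elementary integral representation \({\varvec{A}}^{1/2} = \frac{2}{\pi }{\varvec{A}}\int _0^{\pi /2}\big (\sin ^2(t)\,{\varvec{I}}+\cos ^2(t)\,{\varvec{A}}\big )^{-1}\,\textrm{d}t\) for symmetric positive definite \({\varvec{A}}\) and a composite midpoint rule. This is not the contour quadrature used in the experiments, which typically converges considerably faster in the number of quadrature points; the node count below is purely illustrative.

```cpp
#include <Eigen/Dense>
#include <cmath>

// Dense, illustrative sketch of a matrix square root via the integral
// representation A^{1/2} = (2/pi) A \int_0^{pi/2} (sin^2 t I + cos^2 t A)^{-1} dt
// for symmetric positive definite A, discretized by the composite midpoint
// rule with K nodes. Each node requires one shifted solve, just as each node
// of a contour quadrature requires one (selected) inversion.
Eigen::MatrixXd sqrtmQuadrature(const Eigen::MatrixXd& A, int K = 32) {
  const int n = static_cast<int>(A.rows());
  const double pi = std::acos(-1.0);
  const double h = pi / (2.0 * K);
  const Eigen::MatrixXd I = Eigen::MatrixXd::Identity(n, n);
  Eigen::MatrixXd S = Eigen::MatrixXd::Zero(n, n);
  for (int k = 0; k < K; ++k) {
    const double t = (k + 0.5) * h;
    const double s2 = std::sin(t) * std::sin(t);
    const double c2 = std::cos(t) * std::cos(t);
    S += h * (s2 * I + c2 * A).ldlt().solve(I);  // shifted inverse at node t
  }
  return (2.0 / pi) * (A * S);
}
```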

Finally, Table 2 shows the approximation error of the matrix exponential for different values of \(N\). The reference matrix exponential is computed by a power series of length \(30\) applied directly to the matrix \({\varvec{X}}\). Here, we found that the error starts to stagnate for more than 8 terms in the expansion. The largest eigenvalue satisfies \(\Vert {\varvec{K}}^\eta \Vert _2\approx 0.337\) (estimated by a Rayleigh quotient iteration with 50 iterations), which explains the rapid convergence. Note that no regularization is required here, as only matrix products are computed.

Table 2 Errors for the approximation of \(\exp ({\varvec{K}}^\eta )\) by the power series of the exponential

5.4 Gaussian process implicit surfaces

We consider Gaussian process learning of implicit surfaces. In accordance with [50], we describe a closed surface \(S=\partial \Omega \) of dimension \(d-1\) as the 0-level set of the function

$$\begin{aligned} f:{\mathbb {R}}^d\rightarrow {\mathbb {R}},\quad f({\varvec{x}}){\left\{ \begin{array}{ll}=0,&{} {\varvec{x}}\in S,\\ >0, &{}{\varvec{x}}\in \Omega ,\\ <0,&{}{\varvec{x}}\in {\mathbb {R}}^d\setminus {\overline{\Omega }}, \end{array}\right. } \end{aligned}$$

i.e.,

$$\begin{aligned} S=\{{\varvec{x}}\in {\mathbb {R}}^d:f({\varvec{x}})=0\}. \end{aligned}$$

For the function \(f\), we impose a Gaussian process model with covariance function given by the exponential kernel

$$\begin{aligned} \kappa ({\varvec{x}},{\varvec{y}})=\frac{1}{N}e^{-6\Vert {\varvec{x}}-{\varvec{y}}\Vert _2} \end{aligned}$$

and prior mean zero. Then, given the data sites \(X\) of size \(N\mathrel {\mathrel {\mathop :}=}|X|\) and the noisy measurements \({\varvec{y}}=f(X)+{\varvec{\varepsilon }}\), where \({\varvec{\varepsilon }}\sim {\mathcal {N}}({\varvec{0}},\mu {\varvec{I}})\), the posterior distribution at the evaluation points \(Z\subset {\mathbb {R}}^3\) is determined by

$$\begin{aligned} {\mathbb {E}}[f(Z)|X,{\varvec{y}}]&={\varvec{K}}_{ZX}({\varvec{K}}_{XX}+\mu {\varvec{I}})^{-1}{\varvec{y}},\\ {\text {Cov}}[f(Z)|X,{\varvec{y}}]&= {\varvec{K}}_{ZZ}-{\varvec{K}}_{ZX}({\varvec{K}}_{XX}+\mu {\varvec{I}})^{-1}{\varvec{K}}_{ZX}^\intercal . \end{aligned}$$

Herein, setting \(M\mathrel {\mathrel {\mathop :}=}|Z|\), we have \({\varvec{K}}_{XX}=[\kappa (X,X)] \in {\mathbb {R}}^{N\times N}\), \({\varvec{K}}_{ZX}=[\kappa (Z,X)]\in {\mathbb {R}}^{M\times N}\), \({\varvec{K}}_{ZZ}=[\kappa (Z,Z)]\in {\mathbb {R}}^{M\times M}\).
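A dense reference implementation of these two formulas, illustrative only and with hypothetical names, could read as follows; the actual computation replaces the dense factorization by the S-formatted selected inversion of \({\varvec{K}}_{XX}+\mu {\varvec{I}}\) and keeps all matrices in samplet coordinates.

```cpp
#include <Eigen/Dense>

// Dense reference sketch of the posterior mean and covariance (illustrative).
struct Posterior {
  Eigen::VectorXd mean;        // E[f(Z) | X, y],   size M
  Eigen::MatrixXd covariance;  // Cov[f(Z) | X, y], size M x M
};

Posterior gpPosterior(const Eigen::MatrixXd& Kxx, const Eigen::MatrixXd& Kzx,
                      const Eigen::MatrixXd& Kzz, const Eigen::VectorXd& y,
                      double mu) {
  const int N = static_cast<int>(Kxx.rows());
  const Eigen::LDLT<Eigen::MatrixXd> ldlt(
      Kxx + mu * Eigen::MatrixXd::Identity(N, N));
  Posterior p;
  p.mean = Kzx * ldlt.solve(y);
  p.covariance = Kzz - Kzx * ldlt.solve(Kzx.transpose());
  return p;
}
```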

The matrix \({\varvec{K}}_{ZX}\) can efficiently be computed by using one samplet tree for \(Z\) and a second samplet tree for \(X\), while \(({\varvec{K}}_{XX}+\mu {\varvec{I}})^{-1}\) can be computed as in the previous examples. Hence, the computation of the posterior mean \({\mathbb {E}}[f(Z)|X,{\varvec{y}}]\) is straightforward. For \(X\), we use samplets with \(q+1=4\) vanishing moments, while samplets with \(q+1=3\) vanishing moments are applied for \(Z\). Moreover, we use an a posteriori threshold of \(\tau =10^{-4}/N\) for \({\varvec{K}}_{ZX}^{\eta }\).

Fig. 7 (Gaussian process implicit surfaces) Left panel: data points for the surface reconstruction. Red corresponds to a value of 1, green to a value of 0, and blue to a value of \(-1\). Middle panel: 0-level set of the posterior expectation evaluated at a regular grid. Right panel: standard deviation of the reconstruction (blue is small, red is large). The figure is reproduced in color in the digital version only

Similarly, we can evaluate the covariance in samplet coordinates. The evaluation of the standard deviation \(\sqrt{{\text {diag}}({\text {Cov}}[f(Z)|X,{\varvec{y}}])}\) requires more care: here, we transform \({\varvec{K}}_{ZX}\) only with respect to the points in \(X\) and evaluate the diagonal directly, resulting in a computational cost of \({\mathcal {O}}\big (MN\log N\big )\).
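The following dense sketch (again illustrative; names hypothetical) shows the diagonal-only evaluation: instead of forming the full \(M\times M\) covariance, only \([{\varvec{K}}_{ZZ}]_{ii}-{\varvec{k}}_i^\intercal ({\varvec{K}}_{XX}+\mu {\varvec{I}})^{-1}{\varvec{k}}_i\), with \({\varvec{k}}_i\) the \(i\)-th row of \({\varvec{K}}_{ZX}\), is accumulated. The stated \({\mathcal {O}}(MN\log N)\) cost refers to the samplet-compressed realization, not to this dense sketch.

```cpp
#include <Eigen/Dense>

// Diagonal-only posterior standard deviation (dense, illustrative sketch).
Eigen::VectorXd posteriorStdDev(const Eigen::MatrixXd& Kxx,
                                const Eigen::MatrixXd& Kzx,
                                const Eigen::VectorXd& KzzDiag, double mu) {
  const int N = static_cast<int>(Kxx.rows());
  const Eigen::LDLT<Eigen::MatrixXd> ldlt(
      Kxx + mu * Eigen::MatrixXd::Identity(N, N));
  Eigen::VectorXd var(Kzx.rows());
  for (Eigen::Index i = 0; i < Kzx.rows(); ++i) {
    const Eigen::VectorXd ki = Kzx.row(i).transpose();
    var(i) = KzzDiag(i) - ki.dot(ldlt.solve(ki));
  }
  return var.cwiseMax(0.0).cwiseSqrt();  // clamp round-off negatives
}
```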

The left panel of Fig. 7 shows the initial setup: 240 data points with value \(-1\) are located on a sphere within the point cloud, \(15\,507\) points with value 0 are located on its surface, and 1200 points with value 1 are located on a box enclosing it. This results in \(N=16\,947\) data points in total. The ridge parameter was set to \(\mu =2\cdot 10^{-5}\). The conditional expectation and the standard deviation have been computed on a regular grid with \(M=8\,000\,000\) points. The middle panel of Fig. 7 shows the 0-level set, while the right panel shows the standard deviation. As expected, the standard deviation is lowest close to the data sites (blue is small, red is large).

6 Conclusion

We have presented a sparse matrix algebra for kernel matrices in samplet coordinates. This algebra allows for the rapid addition, multiplication, and inversion of (regularized) kernel matrices, and its operations mimic the algebras of the corresponding pseudodifferential operators. The proposed arithmetic operations extend to S-formatted, approximate representations of holomorphic functions of S-formatted approximations of self-adjoint operators, which are likewise realized at log-linear cost. While the addition is straightforward, we have derived an error and cost analysis for the multiplication and for the approximate evaluation of holomorphic operator functions, again at log-linear cost. The S-formatted approximate inversion is realized by selected inversion for sparse matrices, which also enables the computation of general matrix functions by the contour integral approach. The numerical benchmarks corroborate the theoretical findings for data sets in two and three dimensions. As a relevant example from computer graphics, we have considered Gaussian process learning for the computation of a signed distance function from scattered data.

We expect the presently developed fast kernel matrix algebra to impact various areas in machine learning and statistics, where kernel-based approximations appear, see, e.g., [8, 34] and the references there.