Abstract
Data-dependent greedy algorithms in kernel spaces are known to provide fast converging interpolants, while being extremely easy to implement and efficient to run. Despite this experimental evidence, no detailed theory has yet been presented. This situation is unsatisfactory, especially when compared to the case of the data-independent P-greedy algorithm, for which optimal convergence rates are available, despite its performance being usually inferior to that of target data-dependent algorithms. In this work, we fill this gap by first defining a new scale of greedy algorithms for interpolation that comprises all the existing ones in a unique analysis, where the degree of dependency of the selection criterion on the functional data is quantified by a real parameter. We then prove new convergence rates where this degree is taken into account, and we show that, possibly up to a logarithmic factor, target data-dependent selection strategies provide faster convergence. In particular, for the first time we obtain convergence rates for target data adaptive interpolation that are faster than the ones given by uniform points, without the need for any special assumption on the target function. These results are made possible by refining an earlier analysis of greedy algorithms in general Hilbert spaces. The rates are confirmed by a number of numerical examples.
1 Introduction
Kernel methods are a well-understood and widely used technique for approximation, regression, and classification in machine learning and numerical analysis.
We start by collecting some notation and preliminary results, while more details are provided in Sect. 2. For a nonempty set \(\Omega \) a kernel is defined as a symmetric function \(k: \Omega \times \Omega \rightarrow \mathbb {R}\). The kernel matrix \(A_{X_n} \in \mathbb {R}^{n \times n}\) for a set of points \(X_n = \{ x_1, \dots , x_n \} \subset \Omega \) is given by \((A_{X_n})_{ij} = k(x_i, x_j)\), \(i,j=1, \dots , n\). If the kernel matrix is strictly positive definite for any set \(X_n \subset \Omega \) of n distinct points, the kernel is called strictly positive definite. Associated with every strictly positive definite kernel there is a unique Reproducing Kernel Hilbert Space (RKHS) \(\mathcal H_k (\Omega )\) with inner product \(\langle \cdot , \cdot \rangle _{\mathcal H_k (\Omega )}\), which is also called the native space of k, and which is a space of real-valued functions on \(\Omega \) where the kernel k acts as a reproducing kernel, that is

1.
\(k(\cdot , x) \in \mathcal H_k (\Omega )~ \forall x \in \Omega \),

2.
\(f(x) = \langle f, k(\cdot , x) \rangle _{\mathcal H_k (\Omega )} ~ \forall x \in \Omega , \forall f \in \mathcal H_k (\Omega )\) (reproducing property).
Strictly positive definite continuous kernels can be used for the interpolation of continuous functions. The theory is developed under the assumption that \(f \in \mathcal H_k (\Omega )\), and in this case for any set of pairwise distinct interpolation points \(X_n \subset \Omega \) there exists a unique minimum-norm interpolant \(s_n \in \mathcal H_k (\Omega )\) that satisfies
It can be shown that this interpolant is given by the orthogonal projection \(\Pi _{V(X_n)}(f)\) of f onto the linear subspace \(V(X_n) := {{\,\mathrm{\textrm{span}}\,}}\{ k(\cdot , x_i),\; x_i \in X_n \}\), i.e.,
The coefficients \(\alpha _j, j= 1, \dots ,n\), can be calculated by solving the linear system of equations arising from the interpolation conditions in Eq. (1), which is always invertible due to the assumed strict positive definiteness of the kernel.
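To make this step concrete, the following is a minimal sketch (not part of the paper) of the interpolation procedure in Python for the Gaussian kernel; the helper names and the shape parameter `eps` are our own illustrative choices:

```python
import numpy as np

def gauss_kernel(x, y, eps=4.0):
    # Gaussian RBF k(x, y) = exp(-eps^2 |x - y|^2): strictly positive
    # definite, so the kernel matrix below is invertible for distinct points.
    return np.exp(-(eps * (x[:, None] - y[None, :])) ** 2)

def fit_interpolant(X, f_vals):
    # Solve A alpha = f(X) with (A)_{ij} = k(x_i, x_j) for the coefficients
    # of the minimum-norm interpolant s_n = sum_j alpha_j k(., x_j).
    return np.linalg.solve(gauss_kernel(X, X), f_vals)

def eval_interpolant(x, X, alpha):
    return gauss_kernel(x, X) @ alpha

# usage: interpolate f(x) = sin(pi x) on 7 uniform points in [0, 1]
X = np.linspace(0.0, 1.0, 7)
alpha = fit_interpolant(X, np.sin(np.pi * X))
```

By construction, the interpolation conditions \(s_n(x_i) = f(x_i)\) hold up to the accuracy of the linear solver.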
A standard way of estimating the error between the function f and the interpolant in the \(\Vert \cdot \Vert _{L^\infty (\Omega )}\)norm makes use of the power function, which is given as
Obviously it holds \(P_{X_n}(x_i) = 0\) for all \(i=1, \dots , n\), and the standard power function estimate bounds the interpolation error as
where we denoted the residual as \(r_n := r_n(f):=f - s_n\).
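The power function itself admits the standard closed form \(P_{X_n}(x)^2 = k(x,x) - k(x, X_n) A_{X_n}^{-1} k(X_n, x)\), which follows from the projection characterization of the interpolant. A small illustrative sketch (not from the paper), assuming a Gaussian kernel so that \(k(x,x) = 1\):

```python
import numpy as np

def gauss(x, y, eps=4.0):
    # Gaussian kernel k(x, y) = exp(-eps^2 |x - y|^2), so k(x, x) = 1
    return np.exp(-(eps * (x[:, None] - y[None, :])) ** 2)

def power_function(x, X, kernel=gauss):
    # P_{X_n}(x)^2 = k(x, x) - k(x, X_n) A^{-1} k(X_n, x), the standard
    # representation of the squared power function via the kernel matrix A.
    A = kernel(X, X)
    KxX = kernel(x, X)
    P2 = kernel(x, x).diagonal() - np.einsum("ij,ji->i", KxX,
                                             np.linalg.solve(A, KxX.T))
    return np.sqrt(np.clip(P2, 0.0, None))   # clip tiny negative round-off

X = np.linspace(0.0, 1.0, 5)
grid = np.linspace(0.0, 1.0, 101)
P = power_function(grid, X)
```

The two defining properties are visible numerically: \(P_{X_n}\) vanishes at the interpolation points, and it is bounded by \(\sqrt{k(x,x)} = 1\).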
Observe that any worst-case error bound on \((f-\Pi _{V(X_n)}(f))(x)\) over the entire \(\mathcal H_k (\Omega )\) transfers to the same decay of the power function via the second equality in Eq. (2). For the large class of translational invariant kernels, which we will introduce below and which includes the notable class of radial basis function (RBF) kernels, it is possible to refine this error estimate by bounding the decay of the power function in terms of the fill distance
Depending on certain properties of the kernel, one may obtain in this way both algebraic and exponential rates in terms of \(h_{X_n}\). Especially in the case of kernels whose RKHS is norm equivalent to a Sobolev space, these algebraic rates are provably quasioptimal and may even be extended to certain functions that are outside of \(\mathcal H_k (\Omega )\) (see [17]).
These results are nevertheless limited by their dependence on the filling of the space and by their independence of the target function f. Namely, the fill distance decays at best as \(h_{X_n, \Omega } \asymp c_\Omega n^{-1/d}\) for quasi-uniform points, which are space-filling and target-independent. On the other hand, a global target-dependent optimization of the interpolation points is a combinatorial and practically infeasible task, and thus approximate strategies have been proposed, in particular greedy algorithms.
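The fill distance can be approximated numerically by taking the supremum over a fine test grid; a hypothetical sketch (the function name and grid resolution are our own choices), illustrating the optimal rate \(h \sim n^{-1/d}\) of uniform points in \(d = 1\):

```python
import numpy as np

def fill_distance(X, grid):
    # Discrete surrogate of h_{X_n, Omega} = sup_{x in Omega} min_j |x - x_j|,
    # with the supremum taken over a fine grid discretizing Omega = [0, 1].
    return np.min(np.abs(grid[:, None] - X[None, :]), axis=1).max()

grid = np.linspace(0.0, 1.0, 1001)
h6 = fill_distance(np.linspace(0.0, 1.0, 6), grid)    # 6 uniform points
h11 = fill_distance(np.linspace(0.0, 1.0, 11), grid)  # 11 uniform points
```

For n uniform points on [0, 1] the fill distance is half the spacing, i.e., \(1/(2(n-1))\), matching the \(n^{-1}\) rate.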
Greedy algorithms in general are studied in various branches of mathematics, and we point to [29] for a general treatment of their use in approximation. In kernel interpolation, a greedy algorithm starts with the empty set \(X_0 := \emptyset \) and adds points incrementally as \(X_{n+1} := X_n \cup \{ x_{n+1} \}\) according to some selection criterion \(\eta ^{(n)}\), that is
Commonly used selection criteria in the greedy kernel literature are the P-greedy [3], \(f\cdot P\)-greedy [6], f-greedy [26], and f/P-greedy [14] criteria, and they choose the next point according to the following strategies. From now on, we use the shorthand notation \(P_n(\cdot ) := P_{X_n}(\cdot )\) whenever the power function is determined by some greedy algorithm.

i.
P-greedy: \(\eta _P^{(n)}(x) = P_{n}(x)\),

ii.
\(f \cdot P\)-greedy: \(\eta _{f \cdot P}^{(n)}(x) = |r_n(x)| \cdot P_{n}(x)\),

iii.
f-greedy: \(\eta _f^{(n)}(x) = |r_n(x)|\),

iv.
f/P-greedy: \(\eta _{f/P}^{(n)}(x) = |r_n(x)|/P_{n}(x)\).
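The four criteria above can be realized in a single naive selection loop on a discretized domain. The following is an illustrative sketch, not an implementation from the paper: the function names, the Gaussian kernel, and the grid discretization of \(\Omega \) are our own assumptions, and the simplification \(P_n^2 = 1 - k(x, X_n) A^{-1} k(X_n, x)\) uses \(k(x,x)=1\):

```python
import numpy as np

def gauss(x, y, eps=4.0):
    return np.exp(-(eps * (x[:, None] - y[None, :])) ** 2)

def greedy_select(f_vals, grid, n, rule="f"):
    # Naive greedy loop on a discrete grid Omega: at each step the residual
    # |r_n| and the power function P_n are recomputed from scratch, and the
    # next point maximizes the chosen criterion eta^{(n)}.
    idx = []
    for _ in range(n):
        if idx:
            X = grid[idx]
            A = gauss(X, X)
            K = gauss(grid, X)
            res = np.abs(f_vals - K @ np.linalg.solve(A, f_vals[idx]))
            P2 = 1.0 - np.einsum("ij,ji->i", K, np.linalg.solve(A, K.T))
        else:   # X_0 is empty: s_0 = 0 and P_0(x)^2 = k(x, x) = 1
            res, P2 = np.abs(f_vals), np.ones_like(f_vals)
        P = np.sqrt(np.clip(P2, 0.0, None))
        eta = {"P": P,                          # P-greedy
               "f": res,                        # f-greedy
               "f*P": res * P,                  # f.P-greedy
               "f/P": np.divide(res, P, out=np.zeros_like(res),
                                where=P > 1e-10)}[rule]
        eta[idx] = -1.0                         # never re-select a chosen point
        idx.append(int(np.argmax(eta)))
    return idx
```

For instance, f-greedy first selects the grid point where |f| is largest, since \(s_0 = 0\).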
These algorithms have been used in a series of applications (see, e.g., [6, 9,10,11, 18, 21, 26,27,28]), and overwhelming numerical evidence points to the fact that criteria which incorporate a residual-dependent term provide faster convergence, even if sometimes at the price of stability (see [32] for a discussion of this fact for f/P-greedy, and [6] for \(f\cdot P\)-greedy).
The faster convergence is plausible, since function adaptivity should clearly be beneficial to convergence speed. Nevertheless, the theoretical results are of the opposite nature. Namely, for the P-greedy algorithm it is possible to prove quasi-optimality statements (see [20]), namely that the best known decay rate of the power function for arbitrarily optimized points transfers to the same decay of the power function associated with the points selected by P-greedy. Especially in the case of Sobolev spaces, these results can be proven to be optimal [32]. On the other hand, the convergence theory for the target data-dependent algorithms is much weaker: the known results (see Sect. 2 for a detailed account) provide convergence of order at most \(n^{-1/2}\), which generally not only falls far short of the practical observations, but is also slower than the rates proven for P-greedy.
We remark that existing techniques to prove convergence of greedy algorithms in general Hilbert spaces are not directly transferable to this setting. Indeed, the first results on similar algorithms have been obtained for Matching Pursuit, and they work for finite-dimensional spaces [2, 13]. When transferred to the kernel setting (see [26]), these require a norm equivalence between the \(\mathcal H_k (\Omega )\)-norm and the \(\infty \)-norm, which holds only for finite n. Subsequent general results on greedy algorithms (see [5]) require special assumptions on the target function, and the resulting rates are only of order \(n^{-1/2}\). Another common strategy in the greedy literature makes use of the Restricted Isometry Property (see, e.g., [1]), which in the kernel setting translates to the requirement that the smallest eigenvalue \(\lambda _n\) of the kernel matrix is bounded away from zero uniformly in n. This is not the case here, since it is known that \(\lambda _n\le \min _{1\le j\le n} P_{X_n\setminus \{x_j\}}(x_j)^2\) (see [23]), and we will see later that a fast convergence to zero of the right-hand side of this inequality is the key to our analysis. Moreover, all these results prove convergence in the Hilbert space norm, which is generally too strong (to obtain convergence rates) since the interpolation operator is an orthogonal projection in \(\mathcal H_k (\Omega )\). We work instead with the \(\infty \)-norm, which allows us to derive fast convergence, even if it introduces an additional difficulty since the norm of the error is not monotonically decreasing. Furthermore, we point to the empirical interpolation method (EIM) [12], which is also a greedy technique, but one that aims at interpolating a set of functions (instead of a single function) using a subset of these functions as basis elements (instead of kernel evaluations \(k(\cdot , x)\)).
The paper is organized as follows. After recalling additional facts on kernel greedy interpolation in Sect. 2, we derive a new analysis of general greedy algorithms in general Hilbert spaces based on [4] (Sect. 3).
In Sect. 4 we frame the four selection rules into a joint scale of greedy algorithms by introducing \(\beta \)-greedy algorithms (Definition 4), which include P-greedy (\(\beta =0\)), \(f\cdot P\)-greedy (\(\beta =1/2\)), f-greedy (\(\beta =1\)), and f/P-greedy (\(\beta =\infty \)), and we study them within a novel error analysis.
These results are combined in Sect. 5 to obtain precise convergence rates of the minimum error \(e_{\min }(f, n) := \min _{1\le i\le n} \Vert f - s_i\Vert _{L^\infty (\Omega )}\). This measure allows us to circumvent the non-monotonicity of the error, and we remark in particular that \(e_{\min }(f, n)< \varepsilon \) for some \(\varepsilon >0\) means that an error smaller than \(\varepsilon \) is achieved using at most n points. As an exemplary result, we mention here the case where the rate of worst-case convergence in \(\mathcal H_k (\Omega )\) for a fixed set of n interpolation points is \(n^{-\alpha }\) for a given \(\alpha >0\). In this case, for \(\beta \in [0,1]\) we get new convergence rates of the form
with \(c>0\). These results prove in particular that the worst-case decay of the error that can be obtained in \(\mathcal H_k (\Omega )\) with a fixed sequence of points transfers to the \(\beta \)-greedy algorithms with an additional multiplicative factor of \(\log (n)^\alpha n^{-\beta /2}\). Namely, adaptively selected points provide faster convergence than any fixed set of points.
Finally, Sect. 6 illustrates the results with analytical and numerical examples while the final Sect. 7 presents the conclusion and gives an outlook.
2 Background Results on Kernel Interpolation
We recall additional required background information on kernel-based approximation and in particular greedy kernel interpolation. For a more detailed overview we refer the reader to [7, 8, 30]. We remark that in this section no special attention is paid to the occurring constants, which can change from line to line.
2.1 Interpolation by Translational Invariant Kernels
In many applications of interest, the domain is a subset of the Euclidean space, i.e., \(\Omega \subset \mathbb {R}^d\). In this case, a special kind of kernel is given by the translational invariant kernels, i.e., kernels for which there exists a function \(\Phi : \mathbb {R}^d \rightarrow \mathbb {R}\) with a continuous Fourier transform \({\hat{\Phi }}\) such that
We remark that the wellknown radial basis function kernels are a particular instance of translational invariant kernels.
Depending on the decay of the Fourier transform of the function \(\Phi \), two classes of translational invariant kernels can be distinguished:

1.
We call the kernel k a kernel of finite smoothness \(\tau > d/2\), if there exist constants \(c_\Phi , C_\Phi > 0\) such that
$$\begin{aligned} c_\Phi (1+\Vert \omega \Vert _2^2)^{-\tau } \le {\hat{\Phi }}(\omega ) \le C_\Phi (1 + \Vert \omega \Vert _2^2)^{-\tau }. \end{aligned}$$The assumption \(\tau > d/2\) is required in order to have a Sobolev embedding in \(C^0(\Omega )\).

2.
If the Fourier transform \({\hat{\Phi }}\) decays faster than at any polynomial rate, the kernel is called infinitely smooth.
As mentioned in Sect. 1, for these two types of kernels it is possible to derive error estimates by bounding the decay of the power function in terms of the fill distance. We have the following:

1.
For kernels of finite smoothness \(\tau > d/2\), given appropriate conditions on the domain \(\Omega \subset \mathbb {R}^d\) (e.g., Lipschitz boundary and interior cone condition), the native space \(\mathcal H_k (\Omega )\) is norm equivalent to the Sobolev space \(W_2^\tau (\Omega )\). By making use of this connection, error estimates for kernel interpolation can be obtained by using Sobolev bounds [16, 31] that give
$$\begin{aligned} \Vert P_{X_n} \Vert _{L^\infty (\Omega )} \le {\hat{c}}_1 h_{X_n}^{\tau - d/2}. \end{aligned}$$(4)
2.
For kernels of infinite smoothness such as the Gaussian, the multiquadric or the inverse multiquadric, we have
$$\begin{aligned} \Vert P_{X_n} \Vert _{L^\infty (\Omega )} \le {\hat{c}}_2 \exp (-{\hat{c}}_3 h_{X_n}^{-1}), \end{aligned}$$(5)if the domain \(\Omega \) is a cube. We remark that these error estimates are not limited to these three exemplary kernels. We point to [30, Theorem 11.22], which states a sufficient condition in order to obtain this exponential kind of error estimate.
By looking at well-distributed points such that \(h_{X_n, \Omega } \le c_\Omega n^{-1/d}\), the bounds from Eqs. (4) and (5) can be cast only in terms of the number of interpolation points n, i.e.
2.2 Greedy Kernel Interpolation
We collect the motivation, a few properties, and the existing analysis of the four selection criteria introduced in Sect. 1:

i.
P-greedy: The P-greedy algorithm is the best-analyzed of the four algorithms named above. It aims at minimizing the error for all functions in the native space simultaneously, which is done by greedily minimizing the upper error bound from Eq. (3), i.e., the power function. Thus, the selection criterion of the P-greedy algorithm is target data independent. For the P-greedy algorithm it holds \(P_{n}(x_{n+1}) = \Vert P_{n} \Vert _{L^\infty (\Omega )}\). Several results on the P-greedy algorithm have been derived in [20, 32]:

(a)
Corollary 2.2 in [20] provides convergence statements for the maximal power function value \(\Vert P_n \Vert _{L^\infty (\Omega )}\) for radial basis function kernels, when \(\Omega \subset \mathbb {R}^d\) has a Lipschitz boundary and satisfies an interior cone condition. It states
$$\begin{aligned} \Vert P_{n} \Vert _{L^\infty (\Omega )}&\le c_1 \cdot n^{1/2-\tau /d} \qquad{} & {} \text {(finite smoothness} ~ \tau > d/2) \\ \Vert P_{n} \Vert _{L^\infty (\Omega )}&\le c_2 \exp (- c_3 n^{1/d}) \qquad{} & {} \text {(infinite smoothness)}. \end{aligned}$$Via the standard power function bound from Eq. (3), these bounds directly yield bounds on the approximation error \(\Vert f - s_n \Vert _{L^\infty (\Omega )}\). A few more details of the proof strategy of [20] will be recalled in Sect. 3.

(b)
The paper [32] showed further results for the case of kernels of finite smoothness \(\tau > d/2\): Theorem 12 in [32] showed that the decay rate of \(\Vert P_n \Vert _{L^\infty (\Omega )}\) is sharp. The sequence of Theorems 15, 19 and 20 of [32] further established that the resulting sequence of points is asymptotically uniformly distributed under some mild conditions. These results imply (optimal) stability statements in [32, Corollary 22].


ii.
f-greedy: The f-greedy algorithm aims at directly minimizing the residual: it places the next interpolation point where the current residual is largest, thus setting it to zero, i.e., it holds \(|(f-s_n)(x_{n+1})| = \Vert f - s_n \Vert _{L^\infty (\Omega )}\). Existing results prove convergence of order \(n^{-\ell /d}\) for kernels \(k\in C^{2\ell }(\Omega \times \Omega )\) in \(d=1\) (see Section 3.4 in [14]), while for general d only limited results are known, e.g., [14, Korollar 3.3.8] states that
$$\begin{aligned} \min _{j=1,\dots ,n} \Vert f - s_j \Vert _{L^\infty (\Omega )} \le C n^{-1/d} \end{aligned}$$if \(k\in C^2(\Omega \times \Omega )\). As mentioned before, these convergence results do not reflect the approximation speed of f-greedy that can be observed in numerical investigations. Additionally, in [22] convergence of order \(n^{-1/2}\) of the \(\mathcal H_k (\Omega )\)-norm of the error is proven, but only under additional assumptions on f.

iii.
f/P-greedy: The f/P-greedy selection aims at minimizing the native space error of the residual as much as possible, as can be seen from Eq. (7). We remark as a technical detail that the supremum of \(|(f-s_n)(x)| / P_n(x)\) over \(x \in \Omega \setminus X_n\) need not be attained, as exemplified in Example 6 of [32]. However, this can be alleviated by choosing the next point \(x_{n+1}\) such that \(\frac{|r_n(x_{n+1})|}{P_n(x_{n+1})} \ge (1-\epsilon ) \cdot \sup _{x \in \Omega \setminus X_n} \frac{|r_n(x)|}{P_n(x)}\) for any \(0 < \epsilon \ll 1\). As a convergence result, [33, Theorem 3] states
$$\begin{aligned} \Vert f - s_n \Vert _{\mathcal H_k (\Omega )} \le C n^{-1/2}, \end{aligned}$$which, however, only holds for a quite restricted set of functions f, a set which was slightly extended in [22].

iv.
\(f \cdot P\)-greedy: The idea of the recently introduced \(f \cdot P\)-greedy algorithm is to combine the power function dependence and the target data dependence, in order to balance between the stability of the P-greedy algorithm and the target data adaptivity of the f-greedy algorithm. No convergence results were given in the original publication [6].
In addition to the selection criteria, we remark that for a practical numerical implementation the greedy algorithms stop as soon as a predefined bound (on, e.g., the accuracy or the numerical stability) is reached, or if the interpolant is exact.
Finally, to analyze and implement these algorithms it is useful to consider the Newton basis \(\{ v_j \}_{j=1}^n\) of \(V_n\) (see [15, 18]), which is obtained by applying the Gram-Schmidt orthonormalization process to \(\{k(\cdot , x_j), j=1, \dots , n \}\), where \(\{x_j, j =1, \dots , n \}\) are the pairwise distinct points that are incrementally selected by the greedy procedure. We recall that we have
and it can be shown that it holds \(\langle f, v_j \rangle _{\mathcal H_k (\Omega )} = (f-s_{j-1})(x_{j}) / P_{X_{j-1}}(x_{j})\). If \(s_n {\mathop {\longrightarrow }\limits ^{n \rightarrow \infty }} f\) in \(\mathcal H_k (\Omega )\), we have
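Numerically, the Newton basis can be obtained from a Cholesky factorization \(A_{X_n} = L L^T\) of the kernel matrix, which is equivalent to the Gram-Schmidt process above. A hypothetical sketch (not from the paper; the Gaussian kernel and function names are our own choices):

```python
import numpy as np

def gauss(x, y, eps=4.0):
    return np.exp(-(eps * (x[:, None] - y[None, :])) ** 2)

def newton_basis(grid, X):
    # Values of the Newton basis v_1, ..., v_n on a grid: with A = L L^T,
    # the basis values are K(grid, X) L^{-T}, i.e. column j holds v_j.
    # Evaluated at X itself this gives exactly L (lower triangular), so
    # v_j(x_i) = 0 for i < j, as produced by Gram-Schmidt.
    L = np.linalg.cholesky(gauss(X, X))
    return np.linalg.solve(L, gauss(X, grid)).T

# usage: the power function satisfies P_n(x)^2 = k(x, x) - sum_j v_j(x)^2
X = np.linspace(0.0, 1.0, 5)
grid = np.linspace(0.0, 1.0, 101)
V = newton_basis(grid, X)
P2 = 1.0 - np.sum(V ** 2, axis=1)   # k(x, x) = 1 for the Gaussian
```

The triangular structure and the vanishing of \(P_n\) at the interpolation points can both be checked directly.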
3 Analysis of Greedy Algorithms in an Abstract Setting
This section extends the abstract analysis of greedy algorithms in Hilbert spaces introduced in [4]. For this, let \({\mathcal {H}}\) be a Hilbert space with norm \(\Vert \cdot \Vert = \Vert \cdot \Vert _{\mathcal {H}}\). Let \({\mathcal {F}} \subset {\mathcal {H}}\) be a compact subset and assume for notational convenience only that it holds \(\Vert f \Vert _{\mathcal {H}} \le 1\) for all \(f \in {\mathcal {F}}\).
We consider algorithms that select elements \(f_0, f_1, \dots \), without yet specifying any particular selection criterion. We define \(V_n := \text {span}\{f_0, \dots , f_{n-1}\}\) and the following quantities, where \(Y_n\) is any n-dimensional subspace of \({\mathcal {H}}\):
The quantities \(d_n\) and \(\sigma _n\) have already been used in [4], where \(d_n\) is the Kolmogorov n-width of \({\mathcal {F}}\), and we recall that the compactness of \({\mathcal {F}}\) is equivalent to requiring that \(\lim _n d_n = 0\) (see [19]). On the other hand, the newly introduced quantity \(\nu _n\) does not seem in itself to be an interesting quantity in the abstract setting, and it was only denoted as \(a_{n,n}\) within [4]. However, it will be the key quantity for our new analysis in the kernel setting in Sects. 4 and 5.
As we focus on Hilbert spaces, distances like \({{\,\textrm{dist}\,}}(f, V_n)\) can be computed via the orthogonal projector in \({\mathcal {H}}\) onto \(V_n\), which we denote by \(\Pi _{V_n}\). We have the following elementary properties:

1.
Estimates: \(d_n \le \sigma _n\) and \(\nu _n \le \sigma _n\) for all \(n \in \mathbb {N}\).

2.
Monotonicity: \((\sigma _n)_{n \in \mathbb {N}}\) and \((d_n)_{n \in \mathbb {N}}\) are monotonically decreasing.

3.
Initial value: \(d_0 \le \sigma _0 \le 1\).
The paper [4] considers weak greedy algorithms that choose, for some fixed \(0 < \gamma \le 1\), the elements \(f_n\) such that
and shows that, roughly speaking, an asymptotic polynomial or exponential decay of \(d_n\) yields a polynomial or exponential decay of \(\sigma _n\), i.e., the weak greedy algorithms essentially realize the Kolmogorov widths up to multiplicative constants. We remark that this analysis includes the strong greedy algorithm, i.e., \(\gamma =1\).
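As a toy illustration of this abstract setting (our own construction, not from [4]), the strong greedy algorithm (\(\gamma = 1\)) can be run in \({\mathcal {H}} = \mathbb {R}^d\) with a finite candidate set, tracking \(\sigma _n\) along the way:

```python
import numpy as np

def strong_greedy(F, n):
    # Strong greedy selection (gamma = 1 in Eq. (9)) in H = R^d: at each
    # step pick the element of F with the largest distance to
    # V_n = span of the previously selected elements. Rows of F are the
    # candidate elements (a finite stand-in for the compact set F).
    Q = np.zeros((0, F.shape[1]))      # orthonormal basis of V_n
    sel, sigmas = [], []
    for _ in range(n):
        R = F - (F @ Q.T) @ Q          # residuals f - Pi_{V_n} f
        dist = np.linalg.norm(R, axis=1)
        j = int(np.argmax(dist))
        sel.append(j)
        sigmas.append(dist[j])         # sigma_n = max_f dist(f, V_n)
        if dist[j] > 1e-12:
            Q = np.vstack([Q, R[j] / dist[j]])
    return sel, sigmas

# usage: three orthogonal candidates with norms 0.9, 0.6, 0.3 (all <= 1)
sel, sigmas = strong_greedy(np.diag([0.9, 0.6, 0.3]), 3)
```

On this example the algorithm picks the candidates in order of decreasing norm, and \((\sigma _n)_n\) is monotonically decreasing, as stated above.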
In the following, we show in Sect. 3.1 that even without using the selection of Eq. (9)—i.e., the elements \(f_0, f_1, \dots \) may even be randomly chosen within \({\mathcal {F}}\)—comparable statements hold for \(\nu _n\).
3.1 Greedy Approximation with Arbitrary Selection Rules
We start by stating a simple modification of [4, Theorem 3.2] and a subsequent corollary. The theorem is actually valid for any sequence \(\{f_i\}_i\subset {\mathcal {F}}\), but since we are interested in greedy algorithms we phrase the result by assuming that the \(f_i\) are selected in terms of an arbitrary selection rule.
Theorem 1
Consider a compact set \({\mathcal {F}}\) in a Hilbert space \({\mathcal {H}}\), and a greedy algorithm that selects elements from \({\mathcal {F}}\) according to any arbitrary selection rule.
We have the following inequalities between \(\nu _n, \sigma _n\) and \(d_n\) for any \(N \ge 0, K \ge 1, 1 \le m < K\):
Proof
The result is obtained by simply omitting the last step in the proof of Theorem 3.2 in [4]. Namely, any element in the sequence of selected functions can be represented by its coefficient representation with respect to a certain orthonormal basis, obtained by a Gram-Schmidt orthonormalization process on the previously selected functions. These coefficients are collected into an infinite-dimensional matrix (see Section 3 in [4]). It is possible to apply Lemma 2.1 in [4] in order to obtain the two bounds (3.2) and (3.3) in [4]. We follow the original proof up to this point, i.e., right before Eq. (3.4), which is the bound on the quantity \(a_{N+i, N+i}^2\). Using the second-to-last equation on p. 459 in [4] and our definition of \(\nu _n\), in our notation we have
and this gives the result. In the original paper, an additional step in Eq. (3.4) is used to obtain a bound on \(\sigma _n\) instead of \(\nu _n\). \(\square \)
Similarly to the approach used in [4], in the following corollary we make suitable choices of N, K, m to specialize the result to the case of algebraically or exponentially decaying Kolmogorov widths.
Corollary 2
Under the assumptions of Theorem 1 the following holds.

(i)
If \(d_n({\mathcal {F}}) \le C_0 n^{-\alpha }, n\ge 1\), then it holds
$$\begin{aligned} \left( \prod _{i=n+1}^{2 n} \nu _{i} \right) ^{1/n}&\le 2^{\alpha +1/2} {\tilde{C}}_0 e^\alpha \log (n)^\alpha n^{-\alpha }, \;\;n\ge 3, \end{aligned}$$(10)with \({{\tilde{C}}}_0 := \max \{1, C_0\}\).

(ii)
If \(d_n({\mathcal {F}}) \le C_0 e^{-c_0 n^\alpha }, n=1,2,\dots \), then it holds
$$\begin{aligned} \left( \prod _{i=n+1}^{2n} \nu _i \right) ^{1/n} \le \sqrt{2 {\tilde{C}}_0} \cdot e^{-c_1 n^\alpha }, \;\;n\ge 2, \end{aligned}$$(11)with \({\tilde{C}}_0 := \max \{1, C_0\}\) and \(c_1 = 2^{-(2+\alpha )}c_0 < c_0\).
Proof
First of all, we observe that for \(1 \le m < n\) we have \(0< x := m/n < 1\). Using \(x^{-x}(1-x)^{x-1} \le 2\) for \(x \in (0,1)\), which follows from \(x^{-x}(1-x)^{-(1-x)} = e^{H(x)}\) with the binary entropy \(H(x) := -x \log (x) - (1-x)\log (1-x) \le \log (2)\), we obtain
We use Theorem 1 for \(N=K=n\) and any \(1 \le m < n\), i.e. we have
where we took the 2n-th root for the second line and used the monotonicity and boundedness of \((\sigma _n)_{n \in \mathbb {N}}\) in the last step, i.e., \(\sigma _{n+1}^{m/n}\le \sigma _{1} ^{m/n} \le 1\).
In order to prove the statements (i) and (ii), we conclude now in two different ways:

(i)
For n fixed we choose a fixed \(0< \omega < 1\) and define \(m^* := \lceil \omega n \rceil \in \mathbb {N}\), i.e., \(\omega n \le m^* < \omega n + 1\). Using \(d_n \le 1\), \(d_n \le {\tilde{C}}_0 n^{-\alpha }\) with \({\tilde{C}}_0 := \max \{1, C_0\}\), and since \(d_n\) is non-increasing, we can estimate:
$$\begin{aligned} \left( \prod _{i=1}^n \nu _{n+i} \right) ^{1/n}&\le \sqrt{2} \cdot d_{m^*}^{(n-m^{*})/n} \le \sqrt{2} \cdot d_{\lceil \omega n \rceil }^{(n-\omega n - 1)/n} \\&\le \sqrt{2} {\tilde{C}}_0^{(1-\omega ) - 1/n} \lceil \omega n\rceil ^{-\alpha (1-\omega ) + \alpha /n} \\&\le \sqrt{2} {\tilde{C}}_0^{(1-\omega ) - 1/n} (\omega n)^{-\alpha (1-\omega ) + \alpha /n} \\&\le \sqrt{2} {\tilde{C}}_0^{(1-\omega )} \omega ^{-\alpha (1-\omega )} n^{-\alpha (1-\omega )} (n^{1/n})^ \alpha \\&\le \sqrt{2} {\tilde{C}}_0^{(1-\omega )} \omega ^{-\alpha (1-\omega )} n^{-\alpha (1-\omega )} 2^ \alpha . \end{aligned}$$It follows that for each \(\omega \in (0, 1)\) it holds that
$$\begin{aligned} \left( \prod _{i=n+1}^{2 n} \nu _{i} \right) ^{1/n}&\le 2^{\alpha +1/2} {\tilde{C}}_0 n^{-\alpha }\ C(\omega , n), \end{aligned}$$(13)with \(C(\omega , n) = {\tilde{C}}_0^{-\omega } \omega ^{-\alpha (1-\omega )} n^{\alpha \omega }\). For each n, the inequality holds in particular for an optimally chosen \({{\bar{\omega }}}:={{\bar{\omega }}}(n)\) in (0, 1). To find a good candidate \({{\bar{\omega }}}\) we minimize the upper bound \({{\tilde{C}}}(\omega , n):= \omega ^{-\alpha } n^{\alpha \omega }\), which satisfies \(C(\omega , n) \le {{\tilde{C}}}(\omega , n)\) since \(\omega , \alpha \ge 0\) and \(\tilde{C}_0\ge 1\). It holds
$$\begin{aligned} \partial _\omega {{\tilde{C}}}(\omega , n) = n^{\alpha \omega } \alpha \omega ^{-1-\alpha } (-1 + \omega \log (n)), \end{aligned}$$which vanishes at \({{\bar{\omega }}} = 1/\log (n)\), and is negative to the left of this value and positive to the right. It follows that if \({{\bar{\omega }}} \in (0, 1)\), i.e., \(n\ge 3\), then we can choose the constant \(C({{\bar{\omega }}}, n)\) in Eq. (13), which gives the statement since
$$\begin{aligned} C({{\bar{\omega }}}, n)&\le {{\tilde{C}}}({{\bar{\omega }}}, n) = \log (n)^{\alpha } n^{\alpha /\log (n)} = e^{\alpha } \log (n)^{\alpha }. \end{aligned}$$ 
(ii)
We pick \(m = \lceil n/2 \rceil \) and make use of the assumed decay \(d_n({\mathcal {F}}) \le {\tilde{C}}_0 e^{-c_0 n^\alpha }\) to estimate
$$\begin{aligned} \left( \prod _{i=1}^n \nu _{n+i} \right) ^{1/n}&\le \sqrt{2} \cdot d_m^{(n-m)/n} = \sqrt{2} \cdot d_{\lceil n/2 \rceil }^{(n-\lceil n/2 \rceil )/n} \\&\le \sqrt{2} \cdot {\tilde{C}}_0^{1/2} e^{-c_0/2 (n/2)^\alpha \cdot (1-1/n)} \\&= \sqrt{2} \cdot {\tilde{C}}_0^{1/2} e^{-2^{-(1+\alpha )} c_0 n^\alpha \cdot (1-1/n)} \\&{\mathop {\le }\limits ^{n \ge 2}} \sqrt{2} \cdot {\tilde{C}}_0^{1/2} e^{-2^{-(2+\alpha )} c_0 n^\alpha }\\&= \sqrt{2 {\tilde{C}}_0}\ e^{-c_1 n^\alpha }, \end{aligned}$$where \(c_1:= 2^{-(2+\alpha )} c_0\), and this concludes the proof. \(\square \)
Remark 3
Observe that the constant \({{\tilde{C}}}_0 2^{\alpha +1/2} e ^{\alpha } = {{\tilde{C}}}_0\sqrt{2} \left( 2 e\right) ^{\alpha }\) in (10) is significantly smaller than the one obtained in [4] for the algebraic rate, which is \(C_0 2^{5\alpha + 1}\). However, our bound contains an additional logarithmic factor in n, even if we presume that it may be possible to remove it with a finer analysis. This conjecture is supported by the fact that we found neither an analytical nor a numerical example which required the additional logarithmic factor in n.
4 Analysis of Greedy Algorithms in the Kernel Setting
This section introduces and analyzes \(\beta \)-greedy algorithms, a scale of greedy algorithms which generalizes the P-, \(f \cdot P\)-, f- and f/P-greedy algorithms.
We work under the assumption
4.1 A Scale of Greedy Algorithms: \(\beta \)Greedy
We start with the definition of \(\beta \)greedy algorithms.
Definition 4
A greedy kernel algorithm is called a \(\beta \)-greedy algorithm with \(\beta \in [0, \infty ]\) if the next interpolation point is chosen as follows.

1.
For \(\beta \in [0, \infty )\) according to
$$\begin{aligned} \begin{aligned} x_{n+1} = {{\,\mathrm{\textrm{arg max}}\,}}_{x \in \Omega \setminus X_n}&|(f-s_n)(x)|^\beta \cdot P_n(x)^{1-\beta } . \end{aligned} \end{aligned}$$(15)
2.
For \(\beta = \infty \) according to the f/Pgreedy algorithm.
As depicted in Fig. 1, for \(\beta = 0\) this is the P-greedy algorithm, for \(\beta = 1/2\) it is the \(f \cdot P\)-greedy algorithm, and for \(\beta = 1\) it is the f-greedy algorithm. In the limit \(\beta \rightarrow \infty \) it makes sense to define the algorithm to be the f/P-greedy algorithm.
Observe that the \(\beta \)-greedy algorithms are well defined also for \(1< \beta < \infty \). Indeed, in this case \(1 - \beta < 0\) and thus the power function part occurs as a divisor, and this may potentially be a problem since \(P_n(x_i) = 0\) for all \(1\le i \le n\). Nevertheless, the standard power function estimate gives
i.e., it holds \(\lim _{x \rightarrow x_j} |(f-s_n)(x)|^\beta \cdot P_n(x)^{1-\beta } = 0\) for all \(x_j \in X_n\).
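The scale of criteria in Eq. (15) can be evaluated in a few lines; the following sketch (an illustration with our own function name, not from the paper) masks the points of \(X_n\) where \(P_n = 0\), in line with the limit above, and reads \(\beta = \infty \) as the f/P-greedy ratio:

```python
import numpy as np

def beta_criterion(res_abs, P, beta):
    # eta_beta(x) = |r_n(x)|^beta * P_n(x)^(1 - beta), cf. Eq. (15):
    # beta = 0 gives P-greedy, 1/2 gives f.P-greedy, 1 gives f-greedy.
    # For beta > 1 the power function appears as a divisor, so grid points
    # with P_n = 0 (the already selected points) are masked to 0.
    if np.isinf(beta):   # beta = infinity: the f/P-greedy ratio
        return np.divide(res_abs, P, out=np.zeros_like(res_abs), where=P > 0)
    with np.errstate(divide="ignore", invalid="ignore"):
        out = res_abs ** beta * P ** (1.0 - beta)
    return np.where(P > 0, out, 0.0)
```

On sample residual and power function values the special cases of the scale are immediate: \(\beta = 0\) reproduces \(P_n\), \(\beta = 1\) reproduces \(|r_n|\), and \(\beta = 1/2\) gives \(\sqrt{|r_n| P_n}\).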
Remark 5
We remark that it is sufficient to consider only one parameter \(\beta > 0\) for the weighting of \((f-s_n)(x)\) and \(P_n(x)\) as done in Eq. (15), in the sense that using two different parameters would be redundant. Indeed, due to the strict monotonicity of the function \(x \mapsto x^{1/\alpha }\) for \(\alpha > 0\), for \(\gamma \in \mathbb {R}\) it holds
which shows that only the ratio \(\gamma /\alpha \) is decisive. The specific parametrization via \(\beta \) and \(1\beta \) in Eq. (15) was chosen in order to obtain f/Pgreedy as the limit case \(\beta \rightarrow \infty \).
4.2 Analysis of \(\beta \)Greedy Algorithms
We can now prove the convergence of these algorithms. So far, the analysis of greedy kernel algorithms has mainly focused on estimates of \(\Vert f - s_i \Vert _{L^\infty (\Omega )}\). Here and in the following, different quantities will be analyzed with the goal of bounding instead \(\min _{i=n+1, \dots , 2n} \Vert f - s_i \Vert _{L^\infty (\Omega )}\). We remark that no requirements on the kernel k or the set \(\Omega \) are needed for the results of this section, and especially for Theorem 8, as the proofs are based solely on RKHS theory.
We start by proving a key technical statement for greedy kernel interpolation that provides a bound on the product of the residual terms \(r_i := f - s_i\). This result holds independently of the strategy that is used to select the points, greedy or not.
Lemma 6
For any sequence \(\{ x_i \}_{i \in \mathbb {N}} \subset \Omega \) and any \(f \in \mathcal H_k (\Omega )\) it holds for all \({n=1, 2, \dots }\) that
Proof
Let
The inequality between the geometric and the arithmetic mean gives
We now use Eq. (7) applied to \(s_{2n+1}\) and \(s_{n+1}\), and the properties of orthogonal projections to obtain
It follows that \(R_n \le n^{-1/2} \cdot \Vert r_{n+1} \Vert _{\mathcal H_k (\Omega )}\), and thus
\(\square \)
In order to derive convergence statements in the \({L^\infty (\Omega )}\) norm based on Lemma 6, it is now required to find a relationship between \(r_i(x_{i+1})\) and \(\Vert r_i \Vert _{L^\infty (\Omega )}\). To this end, we have the following lemma for \(\beta \)greedy algorithms. Observe that the sequence of points depends on the value of \(\beta \), i.e. \(x_n \equiv x_n^{(\beta )}\), but for notational convenience we drop the superscript.
Lemma 7
Any \(\beta \)-greedy algorithm with \(\beta \in [0,\infty ]\) applied to a function \(f \in \mathcal H_k (\Omega )\) satisfies for \(i = 0, 1, \dots \):

(a)
In the case of \(\beta \in [0, 1]\):
$$\begin{aligned} \Vert r_i \Vert _{L^\infty (\Omega )}&\le r_i(x_{i+1})^{\beta } \cdot P_i(x_{i+1})^{1-\beta } \cdot \Vert f - s_i \Vert _{\mathcal H_k (\Omega )}^{1-\beta }. \end{aligned}$$(17) 
(b)
In the case of \(\beta \in (1, \infty ]\) with \(1/\infty := 0\):
$$\begin{aligned} \Vert r_i \Vert _{L^\infty (\Omega )}&\le \frac{r_i(x_{i+1})}{P_i(x_{i+1})^{1-1/\beta }} \cdot \Vert P_i \Vert _{L^\infty (\Omega )}^{1 - 1/\beta }. \end{aligned}$$(18)
Proof
We prove the two cases separately:

(a)
For \(\beta = 0\), i.e. the P-greedy algorithm, this is the standard power function estimate in conjunction with the P-greedy selection criterion \(P_n(x_{n+1}) = \Vert P_n \Vert _{L^\infty (\Omega )}\). For \(\beta = 1\) this holds with equality, as it is simply the selection criterion of f-greedy, since here \(r_n(x_{n+1}) = \Vert r_n \Vert _{L^\infty (\Omega )}\). We thus consider \(\beta \in (0, 1)\) and let \({\tilde{x}}_{i+1} \in \Omega \) be such that \(r_i({\tilde{x}}_{i+1}) = \Vert r_i \Vert _{L^\infty (\Omega )}\). Then, the selection criterion from Eq. (15) gives
$$\begin{aligned} r_i(x)^\beta \cdot P_i(x)^{1-\beta } \le r_i(x_{i+1})^\beta \cdot P_i(x_{i+1})^{1-\beta }\;\; \forall x\in \Omega , \end{aligned}$$and in particular
$$\begin{aligned} P_i({\tilde{x}}_{i+1}) \le \frac{r_i(x_{i+1})^{\frac{\beta }{1-\beta }}}{r_i({\tilde{x}}_{i+1})^{\frac{\beta }{1-\beta }}} \cdot P_i(x_{i+1}). \end{aligned}$$Using this bound with the standard power function estimate gives
$$\begin{aligned} \Vert r_i \Vert _{L^\infty (\Omega )}&= r_i({\tilde{x}}_{i+1}) \le P_i({\tilde{x}}_{i+1}) \cdot \Vert f - s_i \Vert _{\mathcal H_k (\Omega )} \\&\le \frac{r_i(x_{i+1})^{\frac{\beta }{1-\beta }}}{r_i({\tilde{x}}_{i+1})^{\frac{\beta }{1-\beta }}} \cdot P_i(x_{i+1}) \cdot \Vert f - s_i \Vert _{\mathcal H_k (\Omega )} \\&= \frac{r_i(x_{i+1})^{\frac{\beta }{1-\beta }}}{\Vert r_i \Vert _{L^\infty (\Omega )}^{\frac{\beta }{1-\beta }}} \cdot P_i(x_{i+1}) \cdot \Vert f - s_i \Vert _{\mathcal H_k (\Omega )}. \end{aligned}$$This can be rearranged for \(\Vert r_i \Vert _{L^\infty (\Omega )}\) to yield the final result.

(b)
For \(\beta \in (1, \infty )\), the selection criterion from Eq. (15) can be rearranged to
$$\begin{aligned} r_i(x)^\beta&\le \frac{r_i(x_{i+1})^\beta }{P_i(x_{i+1})^{\beta - 1}} \cdot P_i(x)^{\beta - 1} \;\;\forall {x \in \Omega \setminus X_i}, \end{aligned}$$and taking the supremum \(\sup _{x \in \Omega \setminus X_i}\) gives
$$\begin{aligned} \Vert r_i \Vert _{L^\infty (\Omega )}&\le \frac{r_i(x_{i+1})}{P_i(x_{i+1})^{\frac{\beta - 1}{\beta }}} \cdot \Vert P_i \Vert _{L^\infty (\Omega )}^{\frac{\beta - 1}{\beta }}. \end{aligned}$$For \(\beta = \infty \), the selection criterion of the f/P-greedy algorithm can be directly rearranged to yield the statement (when using the notation \(1/\infty = 0\)). \(\square \)
Using the results of Lemma 7 as lower bounds on \(r_i(x_{i+1})\), it is now possible to control the left-hand side of Inequality (16). This gives the main theorem of this section:
Theorem 8
Any \(\beta \)-greedy algorithm with \(\beta \in [0,\infty ]\) applied to a function \(f \in \mathcal H_k (\Omega )\) satisfies the following error bound for \(n=1, 2, \dots \):

(a)
In the case of \(\beta \in [0, 1]\):
$$\begin{aligned} \left[ \prod _{i=n+1}^{2n} \Vert r_i \Vert _{L^\infty (\Omega )} \right] ^{1/n} \le n^{\beta /2} \cdot \Vert r_{n+1} \Vert _{\mathcal H_k (\Omega )} \cdot \left[ \prod _{i=n+1}^{2n} P_i(x_{i+1}) \right] ^{1/n}. \end{aligned}$$(19) 
(b)
In the case of \(\beta \in (1, \infty ]\) with \(1 / \infty := 0\):
$$\begin{aligned} \begin{aligned} \left[ \prod _{i=n+1}^{2n} \Vert r_i \Vert _{L^\infty (\Omega )} \right] ^{1/n} \le n^{1/2}&\cdot \Vert r_{n+1} \Vert _{\mathcal H_k (\Omega )} \cdot \left[ \prod _{i=n+1}^{2n} P_i(x_{i+1})^{1/\beta } \right] ^{1/n}. \end{aligned} \end{aligned}$$(20)
Proof
We prove the two cases separately:

(a)
For \(\beta = 0\), i.e. P-greedy, Eq. (17) gives \(\Vert r_i \Vert _{L^\infty (\Omega )} \le P_i(x_{i+1}) \cdot \Vert r_i \Vert _{\mathcal H_k (\Omega )}\). Taking the product \(\prod _{i=n+1}^{2n}\) and the n-th root, in conjunction with the estimate \(\Vert r_i \Vert _{\mathcal H_k (\Omega )} \le \Vert r_{n+1} \Vert _{\mathcal H_k (\Omega )}\) for \(i = n+1, \dots , 2n\), gives the result.
For \(\beta \in (0, 1]\), we start by reorganizing the estimate (17) of Lemma 7 to get
$$\begin{aligned} r_i(x_{i+1}) \ge \left( \Vert r_i \Vert _{L^\infty (\Omega )}^{1/\beta } \right) / \left( P_i(x_{i+1})^{\frac{1-\beta }{\beta }} \cdot \Vert r_i \Vert _{\mathcal H_k (\Omega )}^{\frac{1-\beta }{\beta }}\right) , \end{aligned}$$and we use this to lower bound the left-hand side of Eq. (16) as
$$\begin{aligned} n^{1/2} \cdot&\Vert r_{n+1} \Vert _{\mathcal H_k (\Omega )} \cdot \left[ \prod _{i=n+1}^{2n} P_i(x_{i+1}) \right] ^{1/n} \ge \left[ \prod _{i=n+1}^{2n} r_i(x_{i+1}) \right] ^{1/n} \\&\ge \left[ \prod _{i=n+1}^{2n} \left( \Vert r_i \Vert _{L^\infty (\Omega )}^{1/\beta } \right) / \left( P_i(x_{i+1})^{\frac{1-\beta }{\beta }} \cdot \Vert r_i \Vert _{\mathcal H_k (\Omega )}^{\frac{1-\beta }{\beta }}\right) \right] ^{1/n} \\&= \left[ \prod _{i=n+1}^{2n} \Vert r_i \Vert _{L^\infty (\Omega )}^{1/\beta } \right] ^{1/n} \left[ \prod _{i=n+1}^{2n} P_i(x_{i+1})^{\frac{1-\beta }{\beta }} \cdot \Vert r_i \Vert _{\mathcal H_k (\Omega )}^{\frac{1-\beta }{\beta }} \right] ^{-1/n}. \end{aligned}$$Rearranging the factors, and using again the fact that \(\Vert r_i \Vert _{\mathcal H_k (\Omega )} \le \Vert r_{n+1} \Vert _{\mathcal H_k (\Omega )}\) for \(i = n+1, \dots , 2n\), gives
$$\begin{aligned}&\left[ \prod _{i=n+1}^{2n} \Vert r_i \Vert _{L^\infty (\Omega )}^{1/\beta } \right] ^{1/n} \\&\le n^{1/2} \cdot \Vert r_{n+1} \Vert _{\mathcal H_k (\Omega )} \cdot \left[ \prod _{i=n+1}^{2n} P_i(x_{i+1})^{1/\beta } \right] ^{1/n} \cdot \left[ \prod _{i=n+1}^{2n} \Vert r_i \Vert _{\mathcal H_k (\Omega )}^{\frac{1-\beta }{\beta }} \right] ^{1/n} \\&\le n^{1/2} \cdot \Vert r_{n+1} \Vert _{\mathcal H_k (\Omega )} \cdot \left[ \prod _{i=n+1}^{2n} P_i(x_{i+1})^{1/\beta } \right] ^{1/n} \cdot \Vert r_{n+1} \Vert _{\mathcal H_k (\Omega )}^{\frac{1-\beta }{\beta }} \\&\le n^{1/2} \cdot \Vert r_{n+1} \Vert _{\mathcal H_k (\Omega )}^{1/\beta } \cdot \left[ \prod _{i=n+1}^{2n} P_i(x_{i+1})^{1/\beta } \right] ^{1/n}. \end{aligned}$$Now, the inequality can be raised to the exponent \(\beta \) to give the final statement.

(b)
For \(\beta \in (1, \infty ]\) we proceed similarly by first rewriting Eq. (18) of Lemma 7 as
$$\begin{aligned} r_i(x_{i+1}) \ge \left( \Vert r_i \Vert _{L^\infty (\Omega )} \cdot P_i(x_{i+1})^{1-1/\beta }\right) /\left( \Vert P_i \Vert _{L^\infty (\Omega )}^{1-1/\beta }\right) , \end{aligned}$$and we lower bound the left-hand side of Eq. (16) as
$$\begin{aligned} n^{1/2} \cdot&\Vert r_{n+1} \Vert _{\mathcal H_k (\Omega )} \cdot \left[ \prod _{i=n+1}^{2n} P_i(x_{i+1}) \right] ^{1/n} \ge \left[ \prod _{i=n+1}^{2n} r_i(x_{i+1}) \right] ^{1/n} \\&\ge \left[ \prod _{i=n+1}^{2n} \left( \Vert r_i \Vert _{L^\infty (\Omega )} \cdot P_i(x_{i+1})^{1-1/\beta }\right) /\left( \Vert P_i \Vert _{L^\infty (\Omega )}^{1-1/\beta }\right) \right] ^{1/n}. \end{aligned}$$Rearranging for \(\left[ \prod _{i=n+1}^{2n} \Vert r_i \Vert _{L^\infty (\Omega )} \right] ^{1/n}\) yields
$$\begin{aligned}&\left[ \prod _{i=n+1}^{2n} \Vert r_i \Vert _{L^\infty (\Omega )} \right] ^{1/n} \\&\le n^{1/2} \cdot \Vert r_{n+1} \Vert _{\mathcal H_k (\Omega )} \cdot \left[ \prod _{i=n+1}^{2n} \Vert P_i \Vert _{L^\infty (\Omega )}^{1-1/\beta } \right] ^{1/n} \cdot \left[ \prod _{i=n+1}^{2n} P_i(x_{i+1})^{1/\beta } \right] ^{1/n}, \end{aligned}$$which gives the final result due to \(\Vert P_i \Vert _{L^\infty (\Omega )} \le 1\) for all \(i = 0, 1, \ldots \). \(\square \)
4.3 An Improvement of the Standard Estimate
As an additional consequence of Lemma 7, Corollary 9 gives a new inequality that can be seen as an improvement of the standard power function estimate from Eq. (3), valid for any \(\beta \)-greedy algorithm.
Corollary 9
[Improved standard power function estimate] Any \(\beta \)-greedy algorithm with \(\beta \in [0,\infty ]\) applied to a function \(f \in \mathcal H_k (\Omega )\) satisfies for \(i = 0, 1, \dots \) the following improved standard power function estimate (with \(1/\infty := 0\)):
Proof
For both \(\beta \in [0, 1]\) and \(\beta \in (1, \infty ]\) we use the upper bounds on \(\Vert r_i \Vert _{L^\infty (\Omega )}\) as stated in Lemma 7 and further estimate the quantity \(r_i(x_{i+1})\) via the standard power function estimate from Eq. (3) to get
for \(\beta \in [0,1]\), and
for \(\beta \in (1, \infty ]\) by using \(\Vert P_i \Vert _{L^\infty (\Omega )} \le \Vert P_0 \Vert _{L^\infty (\Omega )} \le 1\) for all \(i = 0, 1, \ldots \) (see Eq. (14)). \(\square \)
The estimate from Eq. (21) is an improvement over Eq. (3) in that it provides a bound on \(\Vert r_i \Vert _{L^\infty (\Omega )}\) instead of \(r_i(x_{i+1})\), which is a strictly larger quantity except in the case of the f-greedy algorithm (i.e. \(\beta =1\)), where the two coincide. Moreover, for \(\beta \in [0, 1]\) the right-hand sides of the estimates of Eq. (3) and (21) coincide, while for \(\beta >1\) this improvement comes at the price of a smaller exponent on the power function term, since \(1/\beta <1\).
Remark 10
We will see in the following how to obtain convergence rates for the term \(\min _{n+1\le i\le 2n} \Vert r_i \Vert _{L^{\infty }(\Omega )}\). From a practitioner's point of view this kind of result might be unsatisfactory, as it is unclear which interpolant \(s_i\) gives the best approximation. In this case it is possible to resort to the improved standard power function estimate of Corollary 9: this inequality suggests picking \(s_{i^*}\) with \(i^* := {{\,\mathrm{\textrm{arg min}}\,}}_{n+1\le i \le 2n} P_i(x_{i+1})\).
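For concreteness, a possible implementation of a \(\beta \)-greedy loop on a discretized domain, tracking the values \(P_i(x_{i+1})\) needed for the selection of \(s_{i^*}\) suggested above, could look as follows. This is a sketch based on the standard Newton basis recursion, not the code used for the paper; `beta_greedy` and all variable names are our own illustrative choices, and only \(\beta \in [0, 1]\) is handled:

```python
import numpy as np

def beta_greedy(kernel, X_pool, f_vals, n_max, beta=1.0):
    """beta-greedy selection on a discrete pool for beta in [0, 1].
    Returns selected indices, the values P_i(x_{i+1}), and the final residual."""
    N = len(X_pool)
    K = kernel(X_pool, X_pool)                   # kernel matrix on the pool
    P2 = np.diag(K).copy()                       # squared power function, P_0(x)^2 = k(x, x)
    r = np.asarray(f_vals, dtype=float).copy()   # residual of s_0 = 0
    V = np.zeros((N, n_max))                     # Newton basis evaluated on the pool
    idx, p_vals = [], []
    for n in range(n_max):
        # selection criterion |r_n(x)|^beta * P_n(x)^(1 - beta), cf. Eq. (15)
        crit = np.abs(r)**beta * np.sqrt(np.maximum(P2, 0.0))**(1.0 - beta)
        j = int(np.argmax(crit))
        p = np.sqrt(max(P2[j], 1e-30))           # P_n(x_{n+1}), guarded for round-off
        p_vals.append(p)
        v = (K[:, j] - V[:, :n] @ V[j, :n]) / p  # next Newton basis function on the pool
        c = r[j] / p                             # Newton coefficient of f
        r = r - c * v                            # residual update r_{n+1} = r_n - c_n v_n
        P2 = np.maximum(P2 - v**2, 0.0)          # power function update
        V[:, n] = v
        idx.append(j)
    return np.array(idx), np.array(p_vals), r
```

Remark 10 then amounts to keeping the interpolant \(s_{i^*}\) whose recorded value in `p_vals` is minimal; \(\beta = 0\) and \(\beta = 1\) reduce to P-greedy and f-greedy, respectively.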
5 Convergence Rates for Greedy Kernel Interpolation
We can finally combine the abstract Hilbert space analysis from Sect. 3 and the greedy kernel interpolation analysis from Sect. 4 and apply them to concrete classes of kernels.
First of all, we recall a convenient connection that was established in [20] between the abstract analysis of [4] and kernel interpolation. We repeat it as we need to include also the extension of Sect. 3, i.e., the new quantity \(\nu _n\). The goal is to frame the \(\beta \)greedy algorithms as particular instances of the general greedy algorithm of Sect. 3. In this view we choose \({\mathcal {H}} = \mathcal H_k (\Omega )\) and \({\mathcal {F}} = \{k(\cdot , x), x \in \Omega \}\). The fact that this set is compact is implied by the decay to zero of its Kolmogorov width, that is equivalent to the existence of a sequence of points such that the associated power function converges to zero (see Eq. (23)). This choice means that \(f = k(\cdot , x) \in {\mathcal {F}}\) can be uniquely associated with an \(x \in \Omega \) and vice versa. This yields a realization of the abstract greedy algorithm that produces an approximation set
and thus this is a greedy kernel algorithm, with an appropriate selection rule. Table 1 summarizes these assignments.
With these choices, as can be seen from the definition in Eq. (8), \(\sigma _n\) is simply the maximal power function value and \(\nu _n\) is the power function value at the selected point.
Moreover, \(d_n\) can be similarly bounded as
and thus any convergence statement on \(\Vert P_{X_n} \Vert _{L^\infty (\Omega )}\) for a given set of points \(X_n\subset \Omega \) gives via Eq. (23) a bound on \(d_n\).
Additionally, observe that the assumption \(\Vert f\Vert _{{\mathcal {H}}} \le 1\) for \(f\in {\mathcal {F}}\) implies in the kernel setting that
5.1 Convergence Rates for \(\beta \)-Greedy Algorithms
From Theorem 8, it is now easily possible to derive convergence statements and decay rates for the greedy kernel algorithms, by bounding the right-hand side via Inequality (2) and using the interpretations of \(\nu _i\) and \(d_n\) from Eq. (22) and Eq. (23).
Corollary 11
Assume that a \(\beta \)-greedy algorithm with \(\beta \in [0,\infty ]\) is applied to a function \(f \in \mathcal H_k (\Omega )\). Let \(\alpha , C_0, c_0>0\) be given constants, and set \(1 / \infty := 0\). Recall \(r_i \equiv f - s_i\):

1.
If there exists a sequence \((X_n)_{n\in \mathbb {N}}\subset \Omega \) of sets of points such that
$$\begin{aligned} \left\Vert {\tilde{f}} - \Pi _{X_n} {\tilde{f}} \right\Vert _{L^\infty (\Omega )} \le C_0 n^{-\alpha } \Vert {\tilde{f}} \Vert _{\mathcal H_k (\Omega )}\;\;\forall {\tilde{f}} \in \mathcal H_k (\Omega ), \end{aligned}$$then for all \(\beta \ge 0\) and for all \(n\ge 3\) it holds
$$\begin{aligned} \min _{n+1\le i\le 2n}\Vert r_i \Vert _{L^\infty (\Omega )} \le C \cdot n^{-\frac{\min \{1, \beta \}}{2}} (\log (n)\cdot n^{-1})^{\frac{\alpha }{\max \{1, \beta \}}} \Vert r_{n+1} \Vert _{\mathcal H_k (\Omega )}, \end{aligned}$$(25)with \(C:=\left( 2^{\alpha +1/2} \max \{1, C_0\} e^\alpha \right) ^{\frac{1}{\max \{1, \beta \}}}\). In particular
$$\begin{aligned} \min _{n+1\le i\le 2n} \Vert r_i \Vert _{L^\infty (\Omega )}&\le C \cdot \log (n)^{\alpha } \cdot \Vert r_{n+1} \Vert _{\mathcal H_k (\Omega )} \cdot \left\{ \begin{array}{ll} n^{-\alpha -1/2} &{} f\text {-greedy} \\ n^{-\alpha -1/4} &{} f \cdot P\text {-greedy} \\ n^{-\alpha } &{} P\text {-greedy} \end{array} \right. . \end{aligned}$$ 
2.
If there exists a sequence \((X_n)_{n\in \mathbb {N}}\subset \Omega \) of sets of points (see Footnote 3) such that
$$\begin{aligned} \left\Vert {\tilde{f}} - \Pi _{X_n} {\tilde{f}} \right\Vert _{L^\infty (\Omega )} \le C_0 e^{-c_0 n^\alpha } \Vert {\tilde{f}} \Vert _{\mathcal H_k (\Omega )}\;\;\forall {\tilde{f}} \in \mathcal H_k (\Omega ), \end{aligned}$$then for all \(\beta \ge 0\) and for all \(n\ge 2\) it holds
$$\begin{aligned} \min _{n+1\le i\le 2n}\Vert r_i \Vert _{L^\infty (\Omega )}&\le C \cdot n^{-\frac{\min \{1, \beta \}}{2}} e^{-c_1 n^\alpha } \Vert r_{n+1} \Vert _{\mathcal H_k (\Omega )}, \end{aligned}$$(26)with \(C:=\left( \sqrt{2\max \{1, C_0\}} \right) ^{\frac{1}{\max \{1, \beta \}}}\) and \(c_1 = 2^{-(2+\alpha )} c_0 / \max \{1, \beta \}\). In particular
$$\begin{aligned} \min _{i=n+1, \dots , 2n} \Vert r_i \Vert _{L^\infty (\Omega )} \le C \cdot e^{-c_1 n^\alpha } \cdot \Vert r_{n+1} \Vert _{\mathcal H_k (\Omega )} \cdot \left\{ \begin{array}{ll} n^{-1/2} &{} f\text {-greedy} \\ n^{-1/4} &{} f \cdot P\text {-greedy} \\ n^0 &{} P\text {-greedy} \end{array} \right. . \end{aligned}$$ 
3.
For f/P-greedy, for any kernel, and for all \(n\ge 1\) it holds
$$\begin{aligned} \min _{n+1\le i \le 2n} \Vert r_i \Vert _{L^\infty (\Omega )} \le n^{-1/2} \cdot \Vert r_{n+1} \Vert _{\mathcal H_k (\Omega )}. \end{aligned}$$
Proof
The proof is a simple combination of Corollary 2 and Theorem 8, with the addition of the following simple steps:
First, the worst case bounds in \(\mathcal H_k (\Omega )\) (either algebraic or exponential) imply the same bound on the power function via Eq. (2). Second, in all cases we use the results of Theorem 8 in combination with the bound
Then, Eq. (19) and (20) of Theorem 8 can be jointly written as
Plugging the bounds of Corollary 2 in the last inequality gives the result of the first two points. The third point directly follows from Eq. (20) for \(\beta = \infty \) due to \(P_i(x_{i+1}) \le 1\) for all \(i=1, 2, \ldots \) . \(\square \)
5.2 Translational Invariant Kernels
Strictly positive definite, translational invariant kernels are popular in applications. To specialize our results to this interesting case, in this subsection we use the following assumption.
Assumption 1
Let \(k(x,y) = \Phi (x-y)\) be a strictly positive definite, translational invariant kernel with associated reproducing kernel Hilbert space \(\mathcal H_k (\Omega )\), where the domain \(\Omega \subset \mathbb {R}^d\) is assumed to be bounded with Lipschitz boundary, satisfying an interior cone condition.
In this context, we have the following special case of Corollary 11. To highlight the results in the most relevant cases, we state them only for \(\beta \in \{0, 1/2, 1, \infty \}\) even if similar statements hold for general \(\beta >0\).
Corollary 12
Under Assumption 1, any \(\beta \)-greedy algorithm with \(\beta \in \{0, 1/2, 1, \infty \}\) applied to some function \(f \in \mathcal H_k (\Omega )\) satisfies the following error bounds for \(n=0, 1, \ldots \), where the constants are defined as in Corollary 11.

1.
In the case of kernels of finite smoothness \(\tau > d/2\)
$$\begin{aligned} \min _{i=n+1, \dots , 2n} \Vert r_i \Vert _{L^\infty (\Omega )}&\le C \cdot \log (n)^{\tau /d-1/2} \cdot \Vert r_{n+1} \Vert _{\mathcal H_k (\Omega )} \cdot \left\{ \begin{array}{ll} n^{-\tau /d} &{} f\text {-greedy} \\ n^{1/4-\tau /d} &{} f \cdot P\text {-greedy} \\ n^{1/2-\tau /d} &{} P\text {-greedy} \end{array} \right. . \end{aligned}$$ 
2.
In the case of kernels of infinite smoothness:
$$\begin{aligned} \min _{i=n+1, \dots , 2n} \Vert r_i \Vert _{L^\infty (\Omega )} \le C \cdot e^{-c_1 n^{1/d}} \cdot \Vert r_{n+1} \Vert _{\mathcal H_k (\Omega )} \cdot \left\{ \begin{array}{ll} n^{-1/2} &{} f\text {-greedy} \\ n^{-1/4} &{} f \cdot P\text {-greedy} \\ n^0 &{} P\text {-greedy} \end{array} \right. . \end{aligned}$$
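To see how the rows of case 1 arise, consider f-greedy (\(\beta = 1\)) for a kernel of finite smoothness \(\tau \): uniform points realize the algebraic rate with \(\alpha = \tau /d - 1/2\) in Corollary 11, and inserting this value (a sketch of the bookkeeping, with \(\min \{1,\beta \} = \max \{1,\beta \} = 1\)) gives
$$\begin{aligned} \min _{n+1\le i\le 2n}\Vert r_i \Vert _{L^\infty (\Omega )} \le C \cdot n^{-1/2} \left( \log (n)\cdot n^{-1}\right) ^{\tau /d-1/2} \Vert r_{n+1} \Vert _{\mathcal H_k (\Omega )} = C \cdot \log (n)^{\tau /d-1/2} \cdot n^{-\tau /d} \cdot \Vert r_{n+1} \Vert _{\mathcal H_k (\Omega )}, \end{aligned}$$
which is exactly the f-greedy row of case 1; the remaining rows follow analogously with the corresponding values of \(\min \{1,\beta \}\) and \(\max \{1,\beta \}\).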
Observe that for any \(\beta \in (0, 1]\) we have the additional convergence of order \(n^{-\beta /2}\), and of order \(n^{-1/2}\) for \(\beta > 1\). The additional decay is faster for increasing \(\beta \in (0,1]\), i.e. increasing the weight of the target data-dependent term in the selection criterion gives better decay rates. In particular, the proven decay rate for f-greedy is better than the one for \(f \cdot P\)-greedy, which is in turn better than the one for P-greedy.
This additional convergence proves in particular that the Kolmogorov barrier can be broken, i.e., approximation rates that are better than the ones provided by the Kolmogorov width can be obtained for any function in \(\mathcal H_k (\Omega )\). Indeed, as discussed above any bound on \(d_n\) turns into a bound on \(\Vert P_n\Vert _{L^\infty (\Omega )}\), which can then be used in Corollary 11 or Corollary 12.
This is particularly relevant for the kernels whose RKHS is norm equivalent to a Sobolev space. But also other general kernels of low smoothness are of interest, since it might happen that the power function is decaying at arbitrarily slow speed, while the adaptive points selected by a \(\beta \)greedy algorithm provide an additional convergence rate.
Moreover, the additional decay for \(\beta > 0\) is dimension-independent and thus does not incur the curse of dimensionality. This is of particular interest for the translational invariant kernels of Corollary 12, as both the algebraic and the exponential decay of the power function (or Kolmogorov width) degrade with the dimension d, and thus the additional term gains more importance.
Despite this notable relevance, the estimates of Corollary 11 and Corollary 12 are likely not optimal in the algebraic case. Indeed, for kernels with algebraically decaying Kolmogorov width, in the case of the P-greedy algorithm (\(\beta =0\)) bounds without the additional \(\log (n)^{\alpha }\) factor are known [20]. We thus expect that the inconvenient additional \(\log (n)^{\alpha }\) factor is not required for any of the \(\beta \)-greedy algorithms. We remark that this factor is related to the additional \(\epsilon \) within Corollary 2, but we did not find a way to get rid of it, with the exception of \(\beta = 0\), i.e. the P-greedy case. Moreover, we obtained our bounds by means of the worst-case bounds on \((\prod _{i=n+1}^{2n} P_i(x_{i+1}))^{1/n}\) from Corollary 2. Numerically, a faster decay than the worst-case bound from Corollary 2 can often be observed (see the examples in Sect. 6.1). In particular, for each value of \(\beta \) we obtain a different sequence of points, and thus a different decay of the corresponding power function values.
Remark 13
Additional convergence orders can be obtained from the decay of \(\Vert r_n \Vert _{\mathcal H_k (\Omega )}\). Even if this quantity is in general decaying at arbitrarily slow speed for a general \(f\in \mathcal H_k (\Omega )\), we mention the case of superconvergence [24, 25], which allows one to bound \(\Vert r_i \Vert _{\mathcal H_k (\Omega )} \le C_f \cdot \Vert P_i \Vert _{L^2(\Omega )}\) for special functions \(f \in \mathcal H_k (\Omega )\). The original superconvergence requirement \(f = Tg\) (where T is the kernel integral operator \((T g)(x) = \int _{\Omega } k(x,y) g(y) \textrm{d}y\) and \(g \in L^2(\Omega )\)) can be extended to functions \(f\in \mathcal H_k (\Omega )\) such that \(\vert \langle f, g \rangle _{\mathcal H_k (\Omega )} \vert \le C_f \cdot \Vert g \Vert _{L^q(\Omega )}\) for all \(g \in \mathcal H_k (\Omega )\) (see [22, Theorem 19]).
Remark 14
The stability of the greedy interpolation, as computed here by the so-called direct method, is mainly linked to the smallest eigenvalue of the kernel interpolation matrix. A standard result [23] gives the upper estimate \(\lambda _{\min }(X_n) \le P_{n-1}(x_{n})^2\). In view of the estimates of Eqs. (25) and (26), this means that a faster convergence based on a faster decay of the power function values \(P_i(x_{i+1})\) directly affects the stability negatively. This holds especially for \(\beta > 1\), because in this case the upper bound for the convergence in terms of the power function scales with the exponent \(1/\beta < 1\).
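The eigenvalue bound of [23] is straightforward to check numerically. The following sketch uses an exponential (Matérn-type) kernel and random points in 1D; `power_sq` is our own helper, not a library function:

```python
import numpy as np

def power_sq(kernel, X, x):
    """Squared power function P_X(x)^2 = k(x,x) - k_X(x)^T A_X^{-1} k_X(x)."""
    kx = kernel(X, np.array([x]))[:, 0]
    return kernel(np.array([x]), np.array([x]))[0, 0] - kx @ np.linalg.solve(kernel(X, X), kx)

# basic exponential kernel in 1d, k(x, y) = exp(-|x - y|)
kernel = lambda A, B: np.exp(-np.abs(np.subtract.outer(A, B)))

rng = np.random.default_rng(1)
X = np.sort(rng.random(8))
# compare lambda_min(A_{X_n}) with P_{n-1}(x_n)^2 for nested sets X_n = {x_1, ..., x_n}
lam_mins = [np.linalg.eigvalsh(kernel(X[:n], X[:n]))[0] for n in range(2, len(X) + 1)]
p_sq = [power_sq(kernel, X[:n - 1], X[n - 1]) for n in range(2, len(X) + 1)]
```

In each nested configuration the smallest eigenvalue is indeed dominated by the squared power function value at the last added point.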
Remark 15
The analysis above shows that the \(\gamma \)-greedy algorithms introduced in [32] are actually closer to the P-greedy algorithm than to target data-dependent algorithms, at least for kernels of finite smoothness \(\tau > d/2\). In this case, for \(\gamma \)-greedy algorithms the decay of \(P_n(x_{n+1})\) can be both lower and upper bounded by a constant times \(n^{1/2-\tau /d}\). As the point selection criterion of \(\gamma \)-stabilized greedy algorithms first of all looks at the power function value via \(P_n(x_{n+1}) \ge \gamma \cdot \Vert P_n \Vert _{L^\infty (\Omega )}\), there is no relationship as in Eq. (15) (\(\beta > 0\)). Thus, we cannot derive additional convergence rates.
Remark 16
For kernels of finite smoothness \(\tau > d/2\) on a set \(\Omega \) with Lipschitz boundary satisfying an interior cone condition, the optimal rates of \(L^p\)-convergence are of order \(\left\Vert r_n\right\Vert _{L^p(\Omega )} \le c_p n^{-\tau /d+ (1/2 - 1/p)_+}\). This rate is matched by the P-greedy algorithm (see [32, Corollary 22]), since it is proven to select asymptotically uniformly distributed points.
In the case of the f-greedy algorithm, we can use the additional factor \(n^{-1/2}\) from Corollary 12 to get rid of the conversion from the \(L^p\) to the \(L^\infty \) norm, i.e. we have
So we have almost \(L^p\)-optimal results (up to the polylogarithmic factor) for \(p \in [1, 2]\) and even improved convergence for \(p \in (2, \infty ]\). Similar statements hold for general \(\beta \)-greedy algorithms.
6 Examples
6.1 Visualization of Results of Abstract Setting
This subsection visualizes the results from the abstract analysis in Sect. 3, especially Sect. 3.1. Again we make use of the links recalled in the beginning of Sect. 5, especially in Eqs. (22) and (23).
We consider the domain \(\Omega = [0, 1]^3 \subset \mathbb {R}^3\) and the Gaussian kernel with kernel width parameter 2, i.e. \(k(x,y) = \exp (-4 \Vert x-y \Vert _2^2)\). Four different sequences of points are considered, with colors referring to Fig. 2:

i.
Blue: P-greedy algorithm on the whole domain \(\Omega \).

ii.
Red: P-greedy algorithm on the subdomain \(\Omega _2 := \{ x \in \Omega : (x)_3 = 1/2 \}\). In this way, the dimension is effectively reduced from \(d=3\) to \(d=2\).

iii.
Yellow, violet: the points are picked independently at random within \(\Omega \), according to a uniform distribution.
The results are displayed in Fig. 2:

The upper two figures display the quantities \(\sigma _n = \Vert P_n \Vert _{L^\infty (\Omega )}\) (left) and \(\nu _n = P_n(x_{n+1})\) (right).

The lower two figures display
$$\begin{aligned} n&\mapsto \left( \prod _{j=n+1}^{2n} \sigma _j \right) ^{1/n} = \left( \prod _{j=n+1}^{2n} \Vert P_j \Vert _{L^\infty (\Omega )} \right) ^{1/n} {} & {} \text {(left),} \\ n&\mapsto \left( \prod _{j=n+1}^{2n} \nu _j \right) ^{1/n} = \left( \prod _{j=n+1}^{2n} P_j(x_{j+1}) \right) ^{1/n}{} & {} \text {(right)}. \end{aligned}$$
For the numerical experiments, the domain \(\Omega \) was discretized using \(2 \cdot 10^4\) random points, and \(\Omega _2\) was discretized by projecting the random points related to \(\Omega \) onto \(\Omega _2\). The algorithms run until 300 points are selected or the next selected power function value satisfies \(P_n(x_{n+1}) < 10^{-5}\).
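The quantities \(\sigma _n\) and \(\nu _n\) from this experiment can be reproduced with a short sketch. This is our own illustrative implementation (with reduced pool size and iteration count for brevity), not the code used for the paper; it relies on the standard Newton basis recursion to update the squared power function on the pool:

```python
import numpy as np

def run_selection(kernel, X_pool, n_max, select):
    """Record sigma_n = max_x P_n(x) and nu_n = P_n(x_{n+1}) on a discrete pool."""
    N = len(X_pool)
    K = kernel(X_pool, X_pool)
    P2 = np.diag(K).copy()                 # squared power function on the pool
    V = np.zeros((N, n_max))               # Newton basis values
    sigma, nu = [], []
    for n in range(n_max):
        j = select(P2, n)                  # index of the next selected point
        sigma.append(np.sqrt(P2.max()))
        nu.append(np.sqrt(max(P2[j], 0.0)))
        v = (K[:, j] - V[:, :n] @ V[j, :n]) / np.sqrt(max(P2[j], 1e-30))
        P2 = np.maximum(P2 - v**2, 0.0)    # power function update
        V[:, n] = v
    return np.array(sigma), np.array(nu)

rng = np.random.default_rng(0)
Omega = rng.random((1000, 3))              # discretization of [0, 1]^3
gauss = lambda A, B: np.exp(-4.0 * ((A[:, None, :] - B[None, :, :])**2).sum(-1))

# P-greedy selection versus uniformly random selection (without repetition)
sig_P, nu_P = run_selection(gauss, Omega, 50, lambda P2, n: int(np.argmax(P2)))
perm = rng.permutation(len(Omega))
sig_R, nu_R = run_selection(gauss, Omega, 50, lambda P2, n: int(perm[n]))
```

For P-greedy, \(\nu _n = \sigma _n\) holds by construction, while for random points \(\nu _n\) lies below \(\sigma _n\) and fluctuates, matching the behavior discussed below.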
From the top left picture, one can infer that the displayed quantity \(\Vert P_n \Vert _{L^\infty (\Omega )}\) decays fastest for the P-greedy algorithm. This was expected, as this algorithm directly aims at minimizing this quantity. However, the displayed quantity \(\Vert P_n \Vert _{L^\infty (\Omega )}\) does not drop at all for the P-greedy algorithm on \(\Omega _2\), as it picks only points from \(\Omega _2\) and thus does not fill \(\Omega \).
In contrast, the top right picture shows that the displayed quantity \(P_n(x_{n+1})\) decays faster for the P-greedy algorithm on \(\Omega _2\), while for the P-greedy algorithm on \(\Omega \) we have exactly the same curve as before due to \(P_n(x_{n+1}) = \Vert P_n \Vert _{L^\infty (\Omega )}\). The two remaining point choices exhibit a wiggling, noisy behavior of the displayed quantity \(P_n(x_{n+1})\), which is related to the random point choice.
The two lower figures refer to the geometric means \((\prod _{j=n+1}^{2n} \cdot )^{1/n}\) of the quantities of the upper figures. In the lower left figure, we can see that only the curve related to the P-greedy algorithm on \(\Omega \) decays fast; the other curves do not decay at all, or only slowly, because the points are not chosen in a way to minimize the maximal power function value \(\Vert P_n \Vert _{L^\infty (\Omega )}\). In the lower right figure, the P-greedy algorithm on \(\Omega \) exhibits the slowest decay of the quantity \((\prod _{j=n+1}^{2n} \nu _j)^{1/n}\), which is the same curve as in the lower left figure due to \(\nu _j = P_j(x_{j+1}) = \Vert P_j \Vert _{L^\infty (\Omega )} = \sigma _j\). However, all three other choices of points provide a faster decay of the displayed quantity \((\prod _{j=n+1}^{2n} P_j(x_{j+1}))^{1/n} = (\prod _{j=n+1}^{2n} \nu _j)^{1/n}\). The theoretical reason for (at least) the same decay as the P-greedy algorithm on \(\Omega \) was proven in Corollary 2.
6.2 \(\beta \)-Greedy Algorithms Using the Wendland Kernel
We consider the application of \(\beta \)-greedy algorithms for the particular example of the Wendland \(k=0\) kernel on the domain \(\Omega = [0,1]\), which is defined as
and is thus a piecewise linear kernel. Its native space \(\mathcal H_k (\Omega )\) is norm equivalent to the Sobolev space \(W^1_2(\Omega )\). It is immediate to see that kernel interpolation using the Wendland \(k=0\) kernel on centers \(X_n \subset \Omega \) boils down to piecewise linear spline interpolation on the subinterval \([\min X_n, \max X_n] \subset [0, 1]\). On \(\Omega \setminus [\min X_n, \max X_n]\) the interpolant is still an affine function.
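This reduction to piecewise linear splines can be verified directly. The following sketch assumes the unit-support normalization \(\Phi (r) = \max (1-r, 0)\) of the Wendland \(k=0\) kernel, matching the piecewise linear description above; the centers and function names are our own illustrative choices:

```python
import numpy as np

def wendland0(X, Y):
    # Wendland k=0 kernel in 1d with assumed unit support, phi(r) = max(1 - r, 0)
    return np.maximum(1.0 - np.abs(np.subtract.outer(X, Y)), 0.0)

alpha = 0.75                                      # exponent with 1/2 < alpha < 1
X_n = np.array([0.05, 0.2, 0.35, 0.5, 0.7, 0.9])  # sorted centers in (0, 1)
f = X_n**alpha
c = np.linalg.solve(wendland0(X_n, X_n), f)       # kernel interpolation coefficients

x_test = np.linspace(X_n.min(), X_n.max(), 101)   # stay inside [min X_n, max X_n]
s_kernel = wendland0(x_test, X_n) @ c             # kernel interpolant
s_spline = np.interp(x_test, X_n, f)              # piecewise linear spline interpolant
```

On \([\min X_n, \max X_n]\) the two interpolants agree up to round-off, while outside this interval the kernel interpolant continues as an affine function.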
We consider the function \(f: \Omega \rightarrow \mathbb {R}, x \mapsto x^\alpha \) for some \(1/2< \alpha < 1\). For \(\alpha > 1/2\) it holds \(f \in W^1_2(\Omega )\), and thus \(f \in \mathcal H_k (\Omega )\). It can be shown that in the case of asymptotically uniform interpolation points, i.e. \(q_n \asymp h_n \asymp n^{-1}\), where \(q_n = \min _{x_i \ne x_j \in X_n} \Vert x_i - x_j \Vert _2\) is the so-called separation distance, it is possible to lower-bound the error as (for details see "Appendix A")
for \(C_\alpha > 0\). Furthermore, independently of the way the interpolation points \(X_N\) were chosen (i.e. even for optimally chosen points), it holds
for some \(C > 0\). Thus, we can infer:

Any (greedy) algorithm that yields asymptotically uniformly distributed points cannot have a convergence rate better than \(n^{-\alpha }\) for this particular example. This includes especially the P-greedy algorithm, but also any \(\gamma \)-stabilized greedy algorithm [32], as they are known to provide asymptotically uniform points as well, see [32, Theorem 20]. Thus, this example shows that \(\gamma \)-stabilized greedy algorithms cannot in general be expected to give a better approximation rate than the P-greedy algorithm (they were however motivated by their use in the pre-asymptotic range).
Especially for \(\alpha \rightarrow 1/2\), the convergence order can be arbitrarily close to 1/2.

For the f-greedy and \(f \cdot P\)-greedy algorithms we have convergence of at least \(\log (n)^{1/2} \cdot n^{-1}\) and \(\log (n)^{1/2} \cdot n^{-3/4}\), respectively, according to Corollary 12, which is strictly better compared to the P-greedy algorithm.
Figure 3 visualizes the convergence of several \(\beta \)-greedy algorithms for the described setting. One can observe that the error for the P-greedy algorithm (\(\beta = 0\)) decays approximately according to \(n^{-1/2}\), which is in accordance with Eq. (27). For the f-greedy algorithm (\(\beta = 1\)) the error seems to decay according to \(n^{-2}\), which is the fastest possible decay rate according to Eq. (28). For all intermediate \(\beta \) values one can observe intermediate convergence rates: for values of \(\beta \) closer to 1, the error decays faster. The f/P-greedy algorithm (\(\beta = \infty \)) seems to give a convergence in between \(n^{-1/2}\) and \(n^{-2}\).
We remark that this behavior of the error decay depending on \(\beta \) is not unique to the Wendland \(k=0\) kernel, but can also be observed for other kernels, domains and target functions f. This particular example was chosen because it allows one to derive analytically several explicit statements on convergence rates for asymptotically uniform and adapted points.
6.3 Approximation of Franke’s Test Function
As a final example in 2D, we consider the approximation of Franke's test function, which is defined on \(\Omega = [0, 1]^2\) as
For this, we use the linear Matérn kernel, which is given as
and run \(\beta \)-greedy algorithms using \(\beta \in \{0, 0.5, 1, \infty \}\). The resulting points are visualized in Fig. 4. For \(\beta =0\), i.e. P-greedy, the points are quite uniformly distributed, in accordance with the theoretical results in [32]. For \(\beta =\infty \), i.e. f/P-greedy, the points are strongly clustered around a few spots. For \(\beta = 0.5\) (\(f \cdot P\)-greedy) and \(\beta = 1\) (f-greedy), an intermediate behavior can be observed: the points are still slightly clustered, but also fill the whole domain.
7 Conclusion and Outlook
Using an abstract analysis of greedy algorithms in Hilbert spaces, it was shown that arbitrary point sequences, e.g. those generated by arbitrary greedy kernel algorithms, yield certain decay rates for specific power function quantities. Based on these results and a refined analysis of greedy kernel interpolation, it was possible to investigate and prove convergence statements for a range of greedy kernel algorithms, including the target data-dependent f-, \(f \cdot P\)- and f/P-greedy algorithms. The provided techniques and results will likely lead to further advancements, e.g. in the field of kernel quadrature.
Several points remain open, and they will be addressed in future research. First, the proven decay rate for the f/P-greedy algorithm is still not satisfactory and is likely improvable. Moreover, the results are independent of the specific choice of the function \(f \in \mathcal H_k (\Omega )\); how can properties of this function be exploited? It would be desirable to conclude a faster decay of the quantity \((\prod _{i=n+1}^{2n} P_i(x_{i+1}))^{1/n}\) based on properties of the considered function \(f \in \mathcal H_k (\Omega )\). Finally, it is still unclear whether it is possible to derive general statements on the decay of \(\Vert f-s_n \Vert _{\mathcal H_k (\Omega )}\), and what the relationship is between this fact and superconvergence.
Notes
We remark that we use the notation \(f \cdot P\)-greedy algorithm here because it fits our notation better, while in the original publication it was called psr-greedy (power scaled residual greedy).
In this framework, the f/P-greedy algorithm is a limit case. This can be seen as an explanation of why the f/P-greedy selection rule is sometimes not well defined, as discussed in Example 6 in [32].
We remark that this sequence of sets of points does not need to be nested.
References
Cohen, A., Dahmen, W., DeVore, R.: Orthogonal matching pursuit under the restricted isometry property. Constr. Approx. 45(1), 113–127 (2017)
Davis, G., Mallat, S., Avellaneda, M.: Adaptive greedy approximations. Constr. Approx. 13(1), 57–98 (1997)
De Marchi, S., Schaback, R., Wendland, H.: Near-optimal data-independent point locations for radial basis function interpolation. Adv. Comput. Math. 23(3), 317–330 (2005)
DeVore, R., Petrova, G., Wojtaszczyk, P.: Greedy algorithms for reduced bases in Banach spaces. Constr. Approx. 37(3), 455–466 (2013)
DeVore, R.A., Temlyakov, V.N.: Some remarks on greedy algorithms. Adv. Comput. Math. 5(2–3), 173–187 (1996)
Dutta, S., Farthing, M.W., Perracchione, E., Savant, G., Putti, M.: A greedy non-intrusive reduced order model for shallow water equations. J. Comput. Phys. 439, 110378 (2021)
Fasshauer, G.E.: Meshfree Approximation Methods with MATLAB, Volume 6 of Interdisciplinary Mathematical Sciences. World Scientific Publishing Co. Pte. Ltd., Hackensack (2007)
Fasshauer, G.E., McCourt, M.: Kernel-Based Approximation Methods Using MATLAB, Volume 19 of Interdisciplinary Mathematical Sciences. World Scientific Publishing Co. Pte. Ltd., Hackensack (2015)
Haasdonk, B., Santin, G.: Greedy kernel approximation for sparse surrogate modeling. In: Keiper, W., Milde, A., Volkwein, S. (eds.) Reduced-Order Modeling (ROM) for Simulation and Optimization: Powerful Algorithms as Key Enablers for Scientific Computing, pp. 21–45. Springer, Cham (2018)
Koeppl, T., Santin, G., Haasdonk, B., Helmig, R.: Numerical modelling of a peripheral arterial stenosis using dimensionally reduced models and kernel methods. Int. J. Numer. Methods Biomed. Eng. 34(8), e3095 (2018)
Köppel, M., Franzelin, F., Kröker, I., Oladyshkin, S., Santin, G., Wittwar, D., Barth, A., Haasdonk, B., Nowak, W., Pflüger, D., Rohde, C.: Comparison of datadriven uncertainty quantification methods for a carbon dioxide storage benchmark scenario. Comput. Geosci. 23(2), 339–354 (2019)
Maday, Y., Nguyen, N.C., Patera, A.T., Pau, S.H.: A general multipurpose interpolation procedure: the magic points. Commun. Pure Appl. Anal. 8(1), 383–404 (2009)
Mallat, S., Zhang, Z.: Matching pursuits with time-frequency dictionaries. IEEE Trans. Signal Process. 41(12), 3397–3415 (1993)
Müller, S.: Komplexität und Stabilität von kernbasierten Rekonstruktionsmethoden (Complexity and Stability of Kernel-based Reconstructions). PhD thesis, Fakultät für Mathematik und Informatik, Georg-August-Universität Göttingen (2009)
Müller, S., Schaback, R.: A Newton basis for kernel spaces. J. Approx. Theory 161(2), 645–655 (2009)
Narcowich, F.J., Ward, J.D., Wendland, H.: Sobolev bounds on functions with scattered zeros, with applications to radial basis function surface fitting. Math. Comput. 74(250), 743–763 (2005)
Narcowich, F.J., Ward, J.D., Wendland, H.: Sobolev error estimates and a Bernstein inequality for scattered data interpolation via radial basis functions. Constr. Approx. 24(2), 175–186 (2006)
Pazouki, M., Schaback, R.: Bases for kernel-based spaces. J. Comput. Appl. Math. 236(4), 575–588 (2011)
Pinkus, A.: \(n\)-Widths in Approximation Theory. Ergebnisse der Mathematik und ihrer Grenzgebiete (3) [Results in Mathematics and Related Areas (3)], vol. 7. Springer, Berlin (1985)
Santin, G., Haasdonk, B.: Convergence rate of the data-independent \(P\)-greedy algorithm in kernel-based approximation. Dolomites Res. Notes Approx. 10, 68–78 (2017)
Santin, G., Haasdonk, B.: Kernel methods for surrogate modeling. In: Benner, P., Grivet-Talocia, S., Quarteroni, A., Rozza, G., Schilders, W., Silveira, L.M. (eds.) Model Order Reduction, vol. 2. De Gruyter (2021)
Santin, G., Karvonen, T., Haasdonk, B.: Sampling based approximation of linear functionals in reproducing kernel Hilbert spaces. BIT Numer. Math. 62(1), 279–310 (2022)
Schaback, R.: Error estimates and condition numbers for radial basis function interpolation. Adv. Comput. Math. 3(3), 251–264 (1995)
Schaback, R.: Improved error bounds for scattered data interpolation by radial basis functions. Math. Comp. 68(225), 201–216 (1999)
Schaback, R.: Superconvergence of kernel-based interpolation. J. Approx. Theory 235, 1–19 (2018)
Schaback, R., Wendland, H.: Adaptive greedy techniques for approximate solution of large RBF systems. Numer. Algorithms 24(3), 239–254 (2000)
Schaback, R., Wendland, H.: Numerical techniques based on radial basis functions. In: Curve and Surface Fitting: Saint-Malo 1999, pp. 359–374. Vanderbilt University Press (2000)
Schmidt, A., Haasdonk, B.: Data-driven surrogates of value functions and applications to feedback control for dynamical systems. IFAC-PapersOnLine 51(2), 307–312 (2018). 9th Vienna International Conference on Mathematical Modelling
Temlyakov, V.N.: Greedy approximation. Acta Numer. 17, 235–409 (2008)
Wendland, H.: Scattered Data Approximation. Cambridge Monographs on Applied and Computational Mathematics, vol. 17. Cambridge University Press, Cambridge (2005)
Wendland, H., Rieger, C.: Approximate interpolation with applications to selecting smoothing parameters. Numer. Math. 101(4), 729–748 (2005)
Wenzel, T., Santin, G., Haasdonk, B.: A novel class of stabilized greedy kernel approximation algorithms: convergence, stability and uniform point distribution. J. Approx. Theory 262, 105508 (2021)
Wirtz, D., Haasdonk, B.: A vectorial kernel orthogonal greedy algorithm. Dolomites Res. Notes Approx. 6, 83–100 (2013)
Acknowledgements
The authors acknowledge the funding of the project by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany's Excellence Strategy - EXC 2075 - 390740016 and funding by the BMBF under contract 05M20VSA.
Funding
Open Access funding enabled and organized by Projekt DEAL.
Details on Section 6.1
Consider the \(k=0\) Wendland kernel and the domain \(\Omega = [0, 1]\). Then, the native space \(\mathcal H_k (\Omega )\) is norm equivalent to the Sobolev space \(W^1_2(\Omega )\). We remark that kernel interpolation using the Wendland \(k=0\) kernel on centers \(X_n \subset \Omega \) boils down to piecewise linear spline interpolation on the subinterval \([\min X_n, \max X_n] \subset [0, 1]\). On \(\Omega \setminus [\min X_n, \max X_n]\) the interpolant is still an affine function.
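This piecewise-linear structure can be checked numerically. The following is a small sketch (centers and target chosen arbitrarily for illustration): on \([0, 1]\) we always have \(|x - y| \le 1\), so the Wendland \(k=0\) kernel reduces to \(k(x, y) = 1 - |x - y|\), and the interpolant is a linear combination of hat-shaped kernel translates, hence piecewise linear with kinks only at the centers:

```python
import numpy as np

def wendland0(X, Y):
    # Wendland k=0 kernel in 1D: k(x, y) = max(1 - |x - y|, 0)
    return np.maximum(1.0 - np.abs(X[:, None] - Y[None, :]), 0.0)

centers = np.array([0.1, 0.35, 0.6, 0.9])        # arbitrary centers X_n in [0, 1]
fvals = centers ** 0.75                          # samples of f(x) = x^0.75
coef = np.linalg.solve(wendland0(centers, centers), fvals)

def interpolant(x):
    # kernel interpolant s_n(x) = sum_j coef_j * k(x, x_j)
    return wendland0(np.atleast_1d(np.asarray(x, dtype=float)), centers) @ coef
```

Between consecutive centers every kernel translate is affine, so the interpolant coincides with the chord through its values at those centers; outside \([\min X_n, \max X_n]\) it continues affinely, as stated above.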
We consider the function \(f: \Omega \rightarrow \mathbb {R}\), \(x \mapsto x^\alpha \), for some \(1/2< \alpha < 1\). For \(\alpha > 1/2\) it holds that \(f \in W^1_2(\Omega )\), and thus \(f \in \mathcal H_k (\Omega )\). We consider the interpolation using not yet specified points \(X_n \subset \Omega \). Define \(z_1 := z_1^{(n)} := \min X_n\) and \(z_2 := z_2^{(n)} := \min ( X_n \setminus \{ z_1 \} )\). We have for \(n \ge 2\):
We can estimate \(\Vert f  s_n \Vert _{L^\infty (\Omega )}\) via:
The integral \(\Vert f  s_n \Vert _{L^1([z_1, z_2])}\) can be computed as
Thus, we have
We proceed by setting \(k := z_2/z_1\):
We consider asymptotically uniform points, i.e. there exists \(C > 0\) such that \(h_n / q_n \le C\) for all \(n \in \mathbb {N}\). Based on the definition of the separation and fill distance we can estimate
Using the asymptotic uniformity, we have finally
Finally, an analysis of the 1D function \(h_\alpha : [1 + C^{-1}, \infty ) \rightarrow \mathbb {R}\) shows that it holds
for \(k \in [1 + C^{-1}, \infty )\). This finally implies
for some \(c > 0\) due to \(z_2 \ge q_n \ge {\tilde{c}}\, n^{-1}\).
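The resulting limitation of uniform points can also be observed numerically. A minimal sketch, using piecewise linear interpolation (np.interp) as a stand-in for the Wendland \(k=0\) interpolant on uniform centers, with the illustrative choice \(\alpha = 0.75\):

```python
import numpy as np

alpha = 0.75                                     # any 1/2 < alpha < 1 behaves similarly

def sup_error(n):
    # L-inf error of piecewise linear interpolation of x^alpha at n + 1 uniform nodes
    nodes = np.linspace(0.0, 1.0, n + 1)
    xfine = np.linspace(0.0, 1.0, 200 * n + 1)
    s = np.interp(xfine, nodes, nodes ** alpha)  # linear spline through (nodes, f(nodes))
    return np.max(np.abs(s - xfine ** alpha))

# observed convergence order: halving the mesh width should reduce the error
# by roughly a factor 2^alpha, since the error is driven by the first interval [0, 1/n]
order = np.log2(sup_error(64) / sup_error(128))
```

The observed order comes out close to \(\alpha \): the singularity of \(f'\) at the origin caps the rate attainable by uniform points for this target, which is exactly the gap that adaptively clustered points can close.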
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Wenzel, T., Santin, G. & Haasdonk, B. Analysis of Target Data-Dependent Greedy Kernel Algorithms: Convergence Rates for f-, \(f \cdot P\)- and f/P-Greedy. Constr Approx 57, 45–74 (2023). https://doi.org/10.1007/s00365-022-09592-3