High-dimensional approximation with kernel-based multilevel methods on sparse grids

Moderately high-dimensional approximation problems can successfully be solved by combining univariate approximation processes using an intelligent combination technique. While this has so far predominantly been done with either polynomials or splines, we suggest employing a multilevel kernel-based approximation scheme. In contrast to schemes built upon polynomials and splines, this new method is capable of combining arbitrary low-dimensional domains instead of just intervals, and arbitrarily distributed points in these low-dimensional domains. We introduce the method and analyse its convergence in the so-called isotropic and anisotropic cases.


Introduction
High-dimensional problems occur in various applications from physics, engineering and even business science. Typical examples comprise solutions to the Schroedinger equation, the Fokker-Planck equation, the Black-Scholes equation or the Hamilton-Jacobi-Bellman equation, to name a few. Another frequent application is in the context of uncertainty quantification, where a partial differential equation with coefficients that depend on random variables or high-dimensional parameters has to be solved over a low-dimensional domain. The solution, or more generally the quantity of interest, is then also a high-dimensional function of the parameters.
The idea behind the Smolyak construction is to combine sequences of low-dimensional approximation schemes in an orderly fashion to obtain a high-dimensional approximation method which exhibits nearly the same approximation features as the low-dimensional schemes. So far, however, Smolyak's procedure has predominantly been used within the following setting, though there are notable exceptions, see for example [21] for other functions and [22] for other point sets. The low-dimensional problems were usually defined on univariate domains and the approximation schemes were either splines or polynomials. While polynomials become more and more expensive as larger and larger data sets are used, they easily provide even spectral convergence. However, the possible point sets are usually restricted to Chebyshev or Clenshaw-Curtis points. In contrast, splines are computationally more efficient but usually only produce low approximation orders. Both have in common that a generalisation to other low-dimensional domains is not straightforward. This, however, might be desirable. For example, if the solution of a time-dependent partial differential equation is studied, then the time interval is usually univariate while the spatial domain is higher-dimensional. Also, the restriction to only one-dimensional domains limits the structure of the resulting high-dimensional domains.
In this paper, we suggest to combine Smolyak's technique with that of multilevel radial basis functions. Radial basis functions are a well-established tool for tackling multivariate interpolation and approximation problems, see for example [23,24]. They allow the user to build approximations and interpolations using arbitrarily distributed data sites in low-dimensional domains. Moreover, if compactly supported radial basis functions are combined with a multilevel strategy, see for example [25-27], they provide a stable and fast approximation method. The combination of radial basis functions with Smolyak's construction is not entirely new. For example, in [28,29] Gaussians are used in this context. However, these papers are purely numerical. We propose to use multiscale compactly supported radial basis functions. We derive the method, provide several representations of the multivariate approximation operator and give rigorous error estimates. We do this for the classical Smolyak method over isotropic index sets, as well as for the generalisation over anisotropic index sets.
The paper is organised as follows. In the next section we collect all required material on tensor product spaces to describe Smolyak's method in its most general form. In Sect. 3 we collect the required material on radial basis functions or, more generally, on kernel-based approximation methods. Here, we review the multilevel method and provide first new representations of it. In Sect. 4 we introduce the kernel-based tensor product multilevel method, derive different types of representations and provide a thorough error analysis. In Sect. 5 we give some numerical examples.

Smolyak's construction
For higher-dimensional approximation problems, the method of Smolyak, introduced in [2], is one of the standard approximation methods. The basic idea of Smolyak's construction is to combine univariate approximation processes in an orderly fashion, resulting in a multivariate approximation process which comes close to achieving the approximation power of the fully resolved high-dimensional product rule, using significantly less information. Even though the amount of required information still depends exponentially on the space dimension, the dependence is significantly better than it is for classical product rules.

Tensor product spaces and operators
We start by giving a short introduction to tensor products of spaces, functions and operators. For the convenience of the reader we collect in this subsection all the relevant material. This will allow us to describe our methods in the most general form. For more details and proofs, we refer, for example, to [30-32].
Tensor product spaces are usually first defined algebraically by the so-called universal property. The definition is then extended to normed spaces.

Definition 2.1 Let $V^{(1)}, \ldots, V^{(d)}$ be linear spaces. The linear space $T$ is called the algebraic tensor product space of $V^{(1)}, \ldots, V^{(d)}$, denoted by $T = V^{(1)} \otimes \cdots \otimes V^{(d)}$, if there is a multilinear mapping $\varphi : V^{(1)} \times \cdots \times V^{(d)} \to T$ having the following properties: the image of $\varphi$ spans $T$ and, for every linear space $W$ and every multilinear mapping $\psi : V^{(1)} \times \cdots \times V^{(d)} \to W$, there is a linear mapping $\Psi : T \to W$ with $\psi = \Psi \circ \varphi$.

The algebraic tensor product space always exists and is uniquely determined up to isomorphisms. From this, it particularly follows that it is essentially independent of the ordering of the building spaces $V^{(j)}$ and also associative. By definition, the algebraic tensor product space is the span of the elementary tensors $u^{(1)} \otimes \cdots \otimes u^{(d)} := \varphi(u^{(1)}, \ldots, u^{(d)})$ with $u^{(j)} \in V^{(j)}$. A direct consequence of this is that a basis of $T$ is given by the tensor products of bases of the $V^{(j)}$. Before discussing normed tensor product spaces, we will shortly recall the concept of operators between algebraic tensor product spaces.
Definition 2.2 Let $U^{(j)}$ and $V^{(j)}$, $1 \le j \le d$, be linear spaces and let $A^{(j)} : U^{(j)} \to V^{(j)}$ be linear operators. The tensor product operator $A^{(1)} \otimes \cdots \otimes A^{(d)} : U^{(1)} \otimes \cdots \otimes U^{(d)} \to V^{(1)} \otimes \cdots \otimes V^{(d)}$ is defined on elementary tensors by
$(A^{(1)} \otimes \cdots \otimes A^{(d)})(u^{(1)} \otimes \cdots \otimes u^{(d)}) := A^{(1)} u^{(1)} \otimes \cdots \otimes A^{(d)} u^{(d)}$
and then extended linearly.

It can be shown that this is well-defined, i.e. independent of the actual representation of $u$. Moreover, this is obviously a linear map. A specific case of this is given if the $V^{(j)} = \mathbb{R}$, i.e. if the $A^{(j)} : U^{(j)} \to \mathbb{R}$ are linear functionals. Using the fact that the tensor product of $d$ copies of $\mathbb{R}$ is $\mathbb{R}$, the above definition yields a linear functional $A^{(1)} \otimes \cdots \otimes A^{(d)} : U^{(1)} \otimes \cdots \otimes U^{(d)} \to \mathbb{R}$.

Remark 2.3
We will particularly be interested in spaces of functions. Hence, if $U^{(j)}$ and $V^{(j)}$ contain certain functions $u^{(j)} : \Omega^{(j)} \to \mathbb{R}$ and $v^{(j)} : \Omega^{(j)} \to \mathbb{R}$, then, using the notation $u = u^{(1)} \otimes \cdots \otimes u^{(d)}$, the elementary tensor products are given by
$u(x) = u^{(1)}(x^{(1)}) \cdots u^{(d)}(x^{(d)}), \qquad x = (x^{(1)}, \ldots, x^{(d)}) \in \Omega^{(1)} \times \cdots \times \Omega^{(d)}.$
This mainly follows from the uniqueness of the tensor product space and the fact that the universal property is easy to verify for the mapping $\varphi$ that maps the functions to their pointwise product. We now turn to normed spaces. Unfortunately, the norms on the building blocks do not automatically lead to a norm on the algebraic tensor product space, unless the norms are given by inner products. Hence, it is usual to make the following assumptions and requirements.
Definition 2.4 Let $S = U^{(1)} \otimes \cdots \otimes U^{(d)}$ and $T = V^{(1)} \otimes \cdots \otimes V^{(d)}$ be algebraic tensor product spaces of normed spaces.
1. A norm $\|\cdot\|_T : T \to \mathbb{R}$ is called a crossnorm if $\|v^{(1)} \otimes \cdots \otimes v^{(d)}\|_T = \|v^{(1)}\|_{V^{(1)}} \cdots \|v^{(d)}\|_{V^{(d)}}$ for all $v^{(j)} \in V^{(j)}$.
2. A crossnorm $\|\cdot\|_T$ is called reasonable if the corresponding dual norm is a crossnorm on the algebraic tensor product of the dual spaces. Here, $T^*$ denotes the dual space of $T$.
3. Two norms $\|\cdot\|_S : S \to \mathbb{R}$ and $\|\cdot\|_T : T \to \mathbb{R}$ are called compatible if for all linear and bounded operators $A^{(j)} : U^{(j)} \to V^{(j)}$ we have
$\|A^{(1)} \otimes \cdots \otimes A^{(d)}\|_{S \to T} \le \|A^{(1)}\|_{U^{(1)} \to V^{(1)}} \cdots \|A^{(d)}\|_{U^{(d)} \to V^{(d)}}. \qquad (3)$
If $\|\cdot\|_S$ and $\|\cdot\|_T$ are crossnorms, it is not too difficult to verify, see for example the comments after Definition 4.77 in [32], that (3) is equivalent to the more easily verifiable equality
$\|A^{(1)} \otimes \cdots \otimes A^{(d)}\|_{S \to T} = \|A^{(1)}\|_{U^{(1)} \to V^{(1)}} \cdots \|A^{(d)}\|_{U^{(d)} \to V^{(d)}}.$
Having a norm on an algebraic tensor product space, we can, if necessary, complete the space with respect to this norm.

Remark 2.6
As the tensor product spaces are the closure of the algebraic tensor product spaces under given norms, it is always possible to extend an operator $A^{(1)} \otimes \cdots \otimes A^{(d)} : S \to T$ to a bounded operator between the tensor product spaces $\bigotimes_{j} U^{(j)}$ and $\bigotimes_{j} V^{(j)}$ having the same norm, provided that the operator is bounded from $S$ to $T$, which is the case if the norms on $S$ and $T$ are compatible.
We are mainly interested in Hilbert spaces, in particular in Sobolev spaces.

Definition 2.7 Let $V^{(1)}, \ldots, V^{(d)}$ be (pre-)Hilbert spaces with algebraic tensor product $T = V^{(1)} \otimes \cdots \otimes V^{(d)}$ and inner products $\langle \cdot, \cdot \rangle_{V^{(j)}}$, $1 \le j \le d$. Then, the induced inner product on $T$ is defined by
$\langle u^{(1)} \otimes \cdots \otimes u^{(d)}, v^{(1)} \otimes \cdots \otimes v^{(d)} \rangle_T := \langle u^{(1)}, v^{(1)} \rangle_{V^{(1)}} \cdots \langle u^{(d)}, v^{(d)} \rangle_{V^{(d)}}$
and linear extension.
This indeed defines an inner product on the algebraic tensor product space. Moreover, the norm induced by this tensor product has all the required properties.

Proposition 2.8
The induced norm on the algebraic tensor product space of (pre)-Hilbert spaces is a reasonable crossnorm. Induced norms are always compatible.
We are particularly interested in Sobolev spaces. However, not necessarily in standard Sobolev spaces $H^m(\Omega)$, which consist of all functions $u \in L_2(\Omega)$ having weak derivatives $D^\alpha u \in L_2(\Omega)$ for all $|\alpha| \le m$, but in so-called mixed-order Sobolev spaces.
To introduce them, we will employ the following notation. Let $\Omega^{(j)} \subseteq \mathbb{R}^{n_j}$, $1 \le j \le d$, be given and set $\Omega := \Omega^{(1)} \times \cdots \times \Omega^{(d)}$. For a function $u : \Omega \to \mathbb{R}$ and multi-indices $\alpha^{(j)} \in \mathbb{N}_0^{n_j}$, where $\alpha^{(j)}$ acts on the variables from $\Omega^{(j)}$, we write $D^{\alpha^{(1)}, \ldots, \alpha^{(d)}} u$ for both weak and strong derivatives. Then, the Sobolev space of mixed regularity $m \in \mathbb{N}_0^d$ is defined as
$H^m_{mix}(\Omega) := \{ u \in L_2(\Omega) : D^{\alpha^{(1)}, \ldots, \alpha^{(d)}} u \in L_2(\Omega) \text{ for all } |\alpha^{(j)}| \le m_j, \ 1 \le j \le d \}$
with inner product given by
$\langle u, v \rangle_{H^m_{mix}(\Omega)} := \sum_{|\alpha^{(j)}| \le m_j} \langle D^{\alpha^{(1)}, \ldots, \alpha^{(d)}} u, D^{\alpha^{(1)}, \ldots, \alpha^{(d)}} v \rangle_{L_2(\Omega)}$
and norm given by $\|u\|_{H^m_{mix}(\Omega)} := \langle u, u \rangle_{H^m_{mix}(\Omega)}^{1/2}$. Following the ideas of the proof for standard Sobolev spaces, it is easy to see that this is indeed a Hilbert space. Moreover, it is the tensor product space of the Sobolev spaces $H^{m_j}(\Omega^{(j)})$ under the induced norm.

Lemma 2.10 Let $\Omega^{(j)} \subseteq \mathbb{R}^{n_j}$, $1 \le j \le d$, and $m \in \mathbb{N}_0^d$. Then, with the induced norm on the tensor product space,
$H^m_{mix}(\Omega) = H^{m_1}(\Omega^{(1)}) \otimes \cdots \otimes H^{m_d}(\Omega^{(d)}).$

Proof This is a standard result for $m = (0, \ldots, 0)^T$, i.e. for the tensor product of the $L_2(\Omega^{(j)})$, see for example [30]. However, the proof easily carries over to this more general situation.
In this paper we will consider two types of particular monotone index sets, which have been studied frequently in the literature. The first type is the one that was used originally by Smolyak [2] and in subsequent papers like [3,7,33] and is for $q, d \in \mathbb{N}$ given by
$\Lambda(q, d) := \{ i \in \mathbb{N}^d : |i|_1 \le q \}. \qquad (4)$
Obviously, this set only makes sense if $q \ge d$. It then consists of $\binom{q}{d}$ elements. To also treat functions which have a different regularity in each direction, this set was subsequently generalised, see for example [8,34-36], to
$\Lambda_\omega(\ell, d) := \{ i \in \mathbb{N}^d : \omega \cdot i \le \ell \}, \qquad (5)$
using a positive, real weight vector $\omega > 0$ and a threshold $\ell \in \mathbb{N}$. This is indeed a generalisation, as we obviously have for any constant vector $\omega = (\omega, \ldots, \omega)^T \ne 0$ the relation
$\Lambda_\omega(\ell, d) = \Lambda(\lfloor \ell/\omega \rfloor, d). \qquad (6)$
In contrast to the isotropic index set (4), it is not so easy to determine the exact number of elements in the index set (5) due to the real weight vector $\omega$. However, we will need a bound on the number of elements later on, which we quote from [35, Lemma 5.3].

Lemma 2.12
If the weight vector $\omega \in \mathbb{R}^d_+$ is non-decreasingly ordered, i.e. if $\omega_1 \le \omega_2 \le \cdots \le \omega_d$, then the cardinality of the set $\Lambda_\omega(\ell, d)$ from (5) is bounded by the estimate quoted from [35, Lemma 5.3].

We remark on several points regarding this lemma. First, the assumption that the weights $\omega$ are ordered is obviously no restriction, since we can always permute the directions. Second, in the isotropic case (4) the estimate is sharp, as can be seen with constant $\omega$, using also (6). There are several other publications with different estimates on the cardinality of $\Lambda_\omega(\ell, d)$, see for example [36, Lemma 2.8] or [34, Proposition 3.5]. However, they all exhibit the same asymptotic behaviour in $\ell$ for $\ell \to \infty$, and the bound given in Lemma 2.12 seems to be the sharpest upper bound yet obtained.
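For illustration, both index sets are easy to enumerate directly. The following sketch assumes the isotropic set $\{i \in \mathbb{N}^d : i_1 + \cdots + i_d \le q\}$ and reads the anisotropic set (5) in the weighted form $\{i \in \mathbb{N}^d : \sum_j \omega_j i_j \le \ell\}$ (our reading of the garbled formula); it also confirms the binomial count of the isotropic set.

```python
from itertools import product
from math import comb

def iso_index_set(q, d):
    """Isotropic set {i in N^d : i_1 + ... + i_d <= q}, indices starting at 1."""
    return [i for i in product(range(1, q + 1), repeat=d) if sum(i) <= q]

def aniso_index_set(weights, ell):
    """Anisotropic set {i in N^d : sum_j w_j i_j <= ell} (assumed reading of (5))."""
    ranges = [range(1, int(ell // w) + 1) for w in weights]  # w_j * i_j <= ell
    return [i for i in product(*ranges)
            if sum(w * k for w, k in zip(weights, i)) <= ell]

# The isotropic set Lambda(q, d) has exactly binomial(q, d) elements.
print(len(iso_index_set(6, 3)), comb(6, 3))  # both are 20

# For constant weights, the anisotropic set collapses to an isotropic one.
assert sorted(aniso_index_set([1.0, 1.0], 4)) == sorted(iso_index_set(4, 2))
```

The final assertion mirrors the relation (6): for constant weights $\omega = (\omega, \ldots, \omega)^T$ the weighted condition reduces to $|i|_1 \le \lfloor \ell/\omega \rfloor$.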
With all this at hand, we can now give a formal definition of Smolyak's construction. We will do this in the most general form using only arbitrary linear spaces and their algebraic tensor products.

Definition 2.13
Let $U^{(1)}, \ldots, U^{(d)}$ and $V^{(1)}, \ldots, V^{(d)}$ be linear spaces with algebraic tensor products $S = U^{(1)} \otimes \cdots \otimes U^{(d)}$ and $T = V^{(1)} \otimes \cdots \otimes V^{(d)}$, let $\Lambda$ be a monotone index set and let $A^{(j)}_i : U^{(j)} \to V^{(j)}$ be linear operators. Define the difference operators
$\Delta^{(j)}_i := A^{(j)}_i - A^{(j)}_{i-1}, \qquad A^{(j)}_0 := 0. \qquad (8)$
Then, the Smolyak operator $A_\Lambda : S \to T$ is defined as the tensor product operator
$A_\Lambda := \sum_{i \in \Lambda} \Delta^{(1)}_{i_1} \otimes \cdots \otimes \Delta^{(d)}_{i_d}. \qquad (9)$
Sometimes one also speaks of the sparse grid method. If the operators $A^{(j)}_i$ are bounded, then the Smolyak operator can uniquely be extended from the algebraic tensor products to the tensor products, i.e. to an operator $A_\Lambda : \bigotimes_{j} U^{(j)} \to \bigotimes_{j} V^{(j)}$, as long as the norms on the algebraic tensor product spaces are compatible, see Remark 2.6. In this situation, we will not distinguish between the Smolyak operator and its extension.

We now want to discuss a different representation of $A_\Lambda$, expressing it in terms of the operators $A^{(j)}_i$ rather than the difference operators. This is basically based on the identity, see [33],
$A_\Lambda = \sum_{i \in \Lambda} \Big( \sum_{z \in \{0,1\}^d : i + z \in \Lambda} (-1)^{|z|_1} \Big) A^{(1)}_{i_1} \otimes \cdots \otimes A^{(d)}_{i_d}, \qquad (10)$
where we also used the fact that $A^{(j)}_0 = 0$. This can be further simplified if the index sets mentioned above are used. For the isotropic index set $\Lambda(q, d)$, the following result can be found in [3,7,33].

Lemma 2.16
The operator $A_\Lambda$ of Definition 2.13 has for $\Lambda = \Lambda(q, d) = \{ i \in \mathbb{N}^d : |i|_1 \le q \}$ the representation
$A_{\Lambda(q,d)} = \sum_{q-d+1 \le |i|_1 \le q} (-1)^{q - |i|_1} \binom{d-1}{q - |i|_1} A^{(1)}_{i_1} \otimes \cdots \otimes A^{(d)}_{i_d}. \qquad (11)$
For the anisotropic index set of (5), i.e. $\Lambda = \Lambda_\omega(\ell, d)$, a generalisation of the previous result can be found in, e.g., [8].
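The coefficients of such combination techniques can be checked numerically: expanding the difference operators $\Delta_i = A_i - A_{i-1}$ (with $A_0 = 0$) shows that the tensor product $A^{(1)}_{i_1} \otimes \cdots \otimes A^{(d)}_{i_d}$ enters with the inclusion-exclusion coefficient $\sum_{z \in \{0,1\}^d,\, i+z \in \Lambda} (-1)^{|z|_1}$. The sketch below (index-set convention as an assumption) compares this with the classical closed form $(-1)^{q-|i|_1}\binom{d-1}{q-|i|_1}$ for the isotropic set.

```python
from itertools import product
from math import comb

def combination_coeff(i, index_set):
    """Coefficient of the tensor product operator at index i:
    sum of (-1)^|z| over z in {0,1}^d with i + z in the index set."""
    S = set(index_set)
    d = len(i)
    return sum((-1) ** sum(z)
               for z in product((0, 1), repeat=d)
               if tuple(a + b for a, b in zip(i, z)) in S)

d, q = 3, 7
Lam = [i for i in product(range(1, q + 1), repeat=d) if sum(i) <= q]
for i in Lam:
    k = q - sum(i)
    closed_form = (-1) ** k * comb(d - 1, k) if k <= d - 1 else 0
    assert combination_coeff(i, Lam) == closed_form
print("combination coefficients match the closed form")
```

In particular, indices with $|i|_1 \le q - d$ receive the coefficient zero, which is why the combination technique only involves the outermost "layers" of the index set.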
These representations are also called combination techniques, see [12].
We finally need yet another representation, which will be helpful for the error analysis later on. We will do this only for the two types of index sets we have in mind. The first one is for the isotropic set. We will formulate the result in a more general form than the one in [3,7,33]. However, it is obvious that the proof given there holds in this situation, as well.
Then, using the notation $\tilde{q} := q - d$, the error of the Smolyak operator can be expanded into directional error terms. The corresponding version for the anisotropic index sets $\Lambda_\omega(\ell, d)$ comes again from [8]. Again, we formulate it slightly more generally and also use Remark 2.15. To simplify the notation, we will introduce some additional shorthand for $\omega \in \mathbb{R}^d_+$.

The kernel-based multilevel method
In this section, we will introduce the operators that we will employ as the operators $A^{(j)}_{i_j}$ in the Smolyak formula. To simplify the notation, we omit the explicit dependency on the direction $j$ for this section.

Let $\Omega \subseteq \mathbb{R}^n$ be the domain of interest. We will use positive definite functions for building approximations to functions $f \in C(\Omega)$ which are only known at discrete points $X = \{x_1, \ldots, x_N\} \subseteq \Omega$. Details on this topic can be found in [23].
For us the following result will be particularly important. For a proof we again refer to [23].

Lemma 3.1 Let $\sigma > n/2$ and let $\Phi : \mathbb{R}^n \to \mathbb{R}$ be an integrable and continuous function having a Fourier transform satisfying
$c_1 (1 + \|\xi\|_2^2)^{-\sigma} \le \hat{\Phi}(\xi) \le c_2 (1 + \|\xi\|_2^2)^{-\sigma}, \qquad \xi \in \mathbb{R}^n, \qquad (13)$
with two fixed constants $0 < c_1 \le c_2$. Then, $K(x, y) := \Phi(x - y)$ is the reproducing kernel of a Hilbert space which coincides with the Sobolev space $H^\sigma(\mathbb{R}^n)$. The induced norm is equivalent to the standard norm on $H^\sigma(\mathbb{R}^n)$.
It is even possible to choose $\Phi$ to have compact support in the closed unit ball $B_1[0] = \{x \in \mathbb{R}^n : \|x\|_2 \le 1\}$. In this situation, we can also scale $\Phi$ as $\Phi_\delta := \delta^{-n} \Phi(\cdot/\delta)$ with $\delta > 0$. The resulting function has support in $B_\delta[0]$. Moreover, since scaling does not change the decay of the Fourier transform, $\hat{\Phi}_\delta = \hat{\Phi}(\delta\,\cdot)$ again satisfies bounds of the form (13), now with $\delta$-dependent constants. We are particularly interested in two reconstruction processes based on positive definite functions: interpolation and penalised least-squares. The following well-known result makes kernel-based approximations particularly appealing. A proof of the first part can be found in [23] and for the second part in [37].
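As a concrete example of such a scaled, compactly supported function, the sketch below uses Wendland's $C^2$ function $\phi(r) = (1-r)_+^4(4r+1)$ (positive definite on $\mathbb{R}^n$ for $n \le 3$; a standard choice, though the paper does not fix a specific $\Phi$) and checks the support of $\Phi_\delta = \delta^{-n}\Phi(\cdot/\delta)$.

```python
import numpy as np

def wendland_c2(r):
    """Wendland's C^2 function phi(r) = (1 - r)_+^4 (4r + 1),
    positive definite on R^n for n <= 3."""
    r = np.asarray(r, dtype=float)
    return np.where(r < 1.0, (1.0 - r) ** 4 * (4.0 * r + 1.0), 0.0)

def scaled_phi(x, delta, n=1):
    """Phi_delta = delta^{-n} Phi(. / delta), supported in the ball of radius delta."""
    return delta ** (-n) * wendland_c2(np.abs(x) / delta)

delta = 0.25
x = np.linspace(-1.0, 1.0, 401)
vals = scaled_phi(x, delta)
assert np.all(vals[np.abs(x) > delta] == 0.0)     # support contained in [-delta, delta]
assert float(scaled_phi(0.0, delta)) == 1.0 / delta  # peak value scales like delta^{-n}
```

The factor $\delta^{-n}$ keeps the $L_1(\mathbb{R}^n)$ mass of the scaled function constant, which is why the Fourier transform is merely dilated, $\hat{\Phi}_\delta = \hat{\Phi}(\delta\,\cdot)$.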
Lemma 3.2 Let $\Phi$ be as in Lemma 3.1 with associated kernel $K$ and let $X = \{x_1, \ldots, x_N\} \subseteq \Omega$ be given. Then, there is a unique function $s$ of the form
$s = \sum_{k=1}^N \alpha_k K(\cdot, x_k) \qquad (14)$
interpolating $f$ in the points of $X$. The coefficients can be computed via the linear system $M\alpha = f|_X$ with $M = (K(x_k, x_\ell))$ and $f|_X = (f(x_1), \ldots, f(x_N))^T$. Moreover, for every $\lambda > 0$ there is a unique function $s$ of the form (14) solving
$\min\Big\{ \sum_{k=1}^N |s(x_k) - f(x_k)|^2 + \lambda \|s\|^2_{\mathcal{N}_K(\mathbb{R}^n)} : s \in \mathcal{N}_K(\mathbb{R}^n) \Big\},$
where $\mathcal{N}_K(\mathbb{R}^n)$ is the reproducing kernel Hilbert space of the kernel $K$ induced by $\Phi$. In this case, the coefficients can be computed via the linear system $(M + \lambda I)\alpha = f|_X$.
When working with a compactly supported, positive definite and radial kernel, there is an unfortunate trade-off principle, see for example [23,27]. If the support of the kernel is kept fixed and the number of data sites is increased then the sequence of interpolants converges to the target function. However, the interpolation matrices become denser and denser, i.e. the ratio between non-zero entries and all entries remains constant, and the evaluation of the interpolant becomes more and more expensive. Eventually, the advantage of the compact support is lost. If, however, the ratio between the support radius and the "data density" is kept fixed then the interpolation matrices remain sparse and well-conditioned, evaluation can be done in constant time but the sequence of interpolants does not converge.
The remedy to this problem is described in the next subsection and will form the basis of our definition of the approximation operators in the Smolyak algorithm.

Multilevel approximation
The idea of the kernel-based multilevel approximation is to combine different kernelbased approximation spaces with a simple residual correction scheme. The latter is rather general and we start with its definition.
The residual correction scheme computes for a given $f \in V$ a sequence of local approximations $s_i \in W_i$, global approximations $f_i \in V_i$ and errors $e_i \in V$ as follows. For initialisation, $f_0 = 0$ and $e_0 = f$. Then, recursively, for $i = 1, 2, \ldots$
• Determine a local approximant $s_i \in W_i$ to $e_{i-1}$.
• Set $f_i := f_{i-1} + s_i$.
• Set $e_i := e_{i-1} - s_i$.
Throughout this paper, we will assume that the local approximants are given by an operator I i : V → W i . In our applications these operators will always be linear.
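The scheme only uses linearity of the local operators. A minimal sketch, with the operators $I_i$ represented as arbitrary matrices acting on a vector of function samples (a simplification for illustration; in this paper the $I_i$ are kernel-based):

```python
import numpy as np

def residual_correction(f_vals, local_ops):
    """f_vals: samples of f; local_ops: matrices playing the role of I_1, ..., I_L.
    Returns the global approximation f_L and the final error e_L."""
    f_approx = np.zeros_like(f_vals)  # f_0 = 0
    e = f_vals.copy()                 # e_0 = f
    for I_i in local_ops:
        s = I_i @ e                   # local approximant s_i to e_{i-1}
        f_approx = f_approx + s       # f_i = f_{i-1} + s_i
        e = e - s                     # e_i = e_{i-1} - s_i
    return f_approx, e

# By construction, f_i + e_i = f holds at every stage.
rng = np.random.default_rng(0)
ops = [0.1 * rng.standard_normal((5, 5)) for _ in range(3)]
f = rng.standard_normal(5)
fL, eL = residual_correction(f, ops)
assert np.allclose(fL + eL, f)
```

Note that the loop never needs $f_i$ to compute the next step; only the current residual $e_i$ is required, which is what makes the scheme cheap to implement.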

Lemma 3.4 If the local approximations $s_i \in W_i$ are computed with a specific operator $I_i : V \to W_i$, i.e. $s_i = I_i(e_{i-1})$, then $f_L = A_L(f)$ with the global approximation operator
$A_L := I - (I - I_L)(I - I_{L-1}) \cdots (I - I_1). \qquad (15)$

Proof This follows immediately from the definition, as the sequences of functions generated by the residual correction scheme satisfy the additional relations $e_i = (I - I_i)e_{i-1}$ and $f_i = f - e_i$.

We will now describe how we will use kernel-based approximation to build the approximation spaces $W_i$ for the residual correction scheme. As can be expected, there are two ingredients to this: point sets and kernels. We follow mainly [26]. Let $\Omega \subseteq \mathbb{R}^n$ again be given. We assume that we have a sequence of discrete, not necessarily nested data sets $X_1, X_2, \ldots$ given by $X_i := \{x_{i,1}, \ldots, x_{i,N_i}\}$. For each point set $X_i$ we define two geometric quantities
$h_i := h_{X_i, \Omega} := \sup_{x \in \Omega} \min_{1 \le k \le N_i} \|x - x_{i,k}\|_2, \qquad q_i := q_{X_i} := \tfrac{1}{2} \min_{k \ne \ell} \|x_{i,k} - x_{i,\ell}\|_2.$
The first quantity is called fill distance or mesh norm and is a measure of how well the points in $X_i$ cover the domain $\Omega$, as it is the radius of the largest ball with centre in $\Omega$ without a point from $X_i$. The second quantity is called separation radius and denotes half of the smallest distance between two data sites. We are especially interested in so-called quasi-uniform data sets, i.e. we assume there exists a constant $c_{qu} > 0$ such that $q_i \le h_i \le c_{qu} q_i$. Concerning the fill distances of the data sites $X_1, X_2, \ldots$, we assume them to be monotonically decreasing, i.e. we assume them to satisfy, for some fixed $\mu \in (0, 1)$, the relation $h_{i+1} \approx \mu h_i$.
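For finite point sets, both geometric quantities can be computed by brute force; in the sketch below the fill distance is approximated by maximising over a dense sample of $\Omega = [0,1]$ (an approximation, since the supremum runs over all of $\Omega$):

```python
import numpy as np

def fill_distance(X, omega_samples):
    """Approximate h_{X,Omega}: largest distance from a (dense) sample of
    Omega to its nearest data site."""
    D = np.linalg.norm(omega_samples[:, None, :] - X[None, :, :], axis=-1)
    return D.min(axis=1).max()

def separation_radius(X):
    """q_X: half the smallest distance between two distinct data sites."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)
    return 0.5 * D.min()

X = np.linspace(0.0, 1.0, 11)[:, None]        # uniform sites with spacing 0.1
dense = np.linspace(0.0, 1.0, 2001)[:, None]
h, q = fill_distance(X, dense), separation_radius(X)
assert abs(q - 0.05) < 1e-9   # half the spacing
assert abs(h - 0.05) < 1e-3   # midpoints between sites are farthest away
```

For this uniform grid, $h \approx q$, so the quasi-uniformity condition $q \le h \le c_{qu} q$ holds with $c_{qu}$ close to 1; for scattered points $c_{qu}$ measures how irregular the distribution is.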
Next, we define a kernel $K_i : \Omega \times \Omega \to \mathbb{R}$ for each level $i \in \mathbb{N}$. In theory, we could pick different kernels for every level $i$, choosing a smooth kernel for the first few levels and then, for higher levels, switching to a rougher one. But in this paper, we will define the kernels on all of $\mathbb{R}^n$ and build them using the same function $\Phi : \mathbb{R}^n \to \mathbb{R}$. We will assume that this function has the properties stated in Lemma 3.1, i.e. the associated reproducing kernel Hilbert space is norm-equivalent to $H^\sigma(\mathbb{R}^n)$. We will, in addition, assume that $\Phi$ is compactly supported with support given by the closed unit ball $B_1[0] = \{x \in \mathbb{R}^n : \|x\|_2 \le 1\}$. Then, we can pick a new support radius $\delta_i > 0$ for each level and define the level-dependent kernels as
$K_i(x, y) := \delta_i^{-n} \Phi\big((x - y)/\delta_i\big), \qquad x, y \in \mathbb{R}^n. \qquad (16)$
Obviously, $K_i(\cdot, y)$ has support in $B_{\delta_i}[y]$ for fixed $y$, and all these kernels are reproducing kernels of $H^\sigma(\mathbb{R}^n)$ but with different inner products.
Since we assume that the data sites $X_1, X_2, \ldots$ become denser and denser, the support radii $\delta_i$ are chosen to mimic this behaviour, i.e. we choose them smaller and smaller. Usually, we couple them to the fill distance via a constant $\nu > 0$ such that $\delta_i = \nu h_i$. This way, the number of sites in the support of $K_i$ can be bounded independently of the level.

Definition 3.5 Let $\Omega \subseteq \mathbb{R}^n$, the data sets $X_i$ and the kernels $K_i : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ defined in (16) be given as above. The kernel-based multilevel approximation method is given by the residual correction scheme with local approximation spaces
$W_i := \mathrm{span}\{ K_i(\cdot, x) : x \in X_i \}.$

Though there is some resemblance to other multiscale methods like wavelets, there are significant differences. For example, the local approximation spaces $W_i$ are in general finite-dimensional for every $i$ and built using shifts and scales of a fixed function over scattered data rather than a regular grid. Moreover, the function $\Phi$ does not satisfy any kind of refinement equation. However, there are also recent multiscale constructions, for example based on geometric harmonics, see [38], which also employ finite-dimensional spaces and scattered points, but differ significantly from our approach as they are based on eigenfunction expansions.
Nonetheless, the interpretation of the method is similar to that of other multiscale methods. The idea is that we start by computing an approximant from $V_1 = W_1$ to $f$ on $X_1$ using a large support radius $\delta_1$. In the next step, we want to add more details and hence compute an approximant from $W_2$ on $X_2$, using a smaller support radius $\delta_2$, to the error of the first step. The sum of these two approximants belongs to $V_2 = W_1 + W_2$ and should be a better approximation than the first one. Proceeding in this way, we obtain the kernel-based approximation method defined above. The method was initially suggested in [39] and has since been further investigated in [25,26,40-43].
If the local approximations $s_i \in W_i$ are computed via $s_i = I_i(e_{i-1})$ using a linear operator $I_i : C(\Omega) \to W_i$, then the kernel-based multilevel approximation scheme allows us once again to write $f_L$ as $f_L = A_L(f)$ with global approximation operators $A_L : C(\Omega) \to V_L$ given by (15).
It remains to discuss how we choose these approximation operators I i . In this paper, we will deal with two types, interpolation and penalised least-squares approximation, see Lemma 3.2.
Any operator $I_i : C(\Omega) \to W_i$ has the form
$I_i f = \sum_{k=1}^{N_i} \alpha_{i,k} K_i(\cdot, x_{i,k}) \qquad (17)$
with certain coefficients $\alpha_i = (\alpha_{i,1}, \ldots, \alpha_{i,N_i})^T \in \mathbb{R}^{N_i}$. In the case of interpolation, these coefficients are determined by the linear system $M_i \alpha_i = f|_{X_i}$ with the symmetric, positive definite and sparse matrix $M_i = (K_i(x_{i,k}, x_{i,\ell}))_{1 \le k,\ell \le N_i}$ and the data vector $f|_{X_i} = (f(x_{i,1}), \ldots, f(x_{i,N_i}))^T$. Similarly, in the case of penalised least-squares approximation, we have the local approximation as the solution of
$\min\Big\{ \sum_{k=1}^{N_i} |s(x_{i,k}) - f(x_{i,k})|^2 + \lambda_i \|s\|^2_{\mathcal{N}_{K_i}} : s \in \mathcal{N}_{K_i} \Big\},$
which is also given by a linear operator $s_i = I_i f$ of the form (17), where the coefficients are determined by solving the linear system
$(M_i + \lambda_i I)\alpha_i = f|_{X_i}. \qquad (18)$
The numerical complexity is comparable to the one for interpolation. We end this section by collecting convergence results for the multilevel method. Again, these results are proven in [26].
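Both local solves differ only by the regularising diagonal term. A sketch for $n = 1$, with Wendland's $C^2$ function as a stand-in for $\Phi$ (dense linear algebra for brevity; in practice the sparsity of $M_i$ would be exploited):

```python
import numpy as np

def wendland_c2(r):
    # Wendland's C^2 function, positive definite on R^n for n <= 3
    return np.where(r < 1.0, (1.0 - r) ** 4 * (4.0 * r + 1.0), 0.0)

def kernel_matrix(X, Y, delta, n=1):
    """K(x, y) = delta^{-n} phi(|x - y| / delta) for all pairs of points."""
    return delta ** (-n) * wendland_c2(np.abs(X[:, None] - Y[None, :]) / delta)

def local_solve(X, f_vals, delta, lam=0.0):
    """Coefficients of s = sum_k alpha_k K(., x_k):
    interpolation for lam = 0, penalised least squares for lam > 0."""
    M = kernel_matrix(X, X, delta)
    return np.linalg.solve(M + lam * np.eye(len(X)), f_vals)

X = np.linspace(0.0, 1.0, 21)
f = np.sin(2.0 * np.pi * X)
alpha = local_solve(X, f, delta=0.4)               # interpolation: s(x_k) = f(x_k)
assert np.allclose(kernel_matrix(X, X, 0.4) @ alpha, f)
alpha_ls = local_solve(X, f, delta=0.4, lam=1e-3)  # smoothed variant
```

Because the kernel vanishes for $|x - y| \ge \delta$, the matrix $M_i$ has a banded sparsity pattern here; coupling $\delta_i = \nu h_i$ keeps the bandwidth level-independent.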

Proposition 3.6
1. If the target function $f$ belongs to $H^\sigma(\Omega)$ and if the multilevel method is built using interpolation operators, then there are constants $C_1, C > 0$ such that
$\|f - f_L\|_{L_2(\Omega)} \le C_1 (C\mu^\sigma)^L \|f\|_{H^\sigma(\Omega)}.$
2. If the target function $f$ belongs to $H^\beta(\Omega)$ with $n/2 < \beta < \sigma$ and if there is a constant $c_q > 0$ such that $q_i \le h_i \le c_q q_i$ for $i = 1, 2, \ldots$, then there are constants $C_1, C > 0$ such that
$\|f - f_L\|_{L_2(\Omega)} \le C_1 (C\mu^\beta)^L \|f\|_{H^\beta(\Omega)}.$
3. If the target function $f$ belongs to $H^\sigma(\Omega)$ and if the smoothing parameters $\lambda_i$ in (18) satisfy $\lambda_i \le \kappa (h_i/\delta_i)^{2\sigma}$ with a fixed constant $\kappa > 0$, then there are constants $C_1, C > 0$ such that
$\|f - f_L\|_{L_2(\Omega)} \le C_1 (C\mu^\sigma)^L \|f\|_{H^\sigma(\Omega)}.$
In all three cases, convergence occurs if the parameter $\mu$ is chosen sufficiently small.
The first case in the theorem above corresponds to approximating functions from $H^\sigma(\Omega)$, where the smoothness matches the smoothness of the kernel. In the second case, rougher target functions $f \in H^\beta(\Omega)$ are approximated with a smoother kernel. Here, an additional stability assumption on the point distributions is required. In the third case, penalised least-squares approximation for functions from $H^\sigma(\Omega)$ is discussed. Again, the smoothness matches the smoothness of the kernel. Convergence is guaranteed if the smoothing parameters are chosen correctly. It is also possible to show convergence for less smooth target functions, but we omit the details here.
Finally, the L 2 -norm on the left-hand side can be replaced by other L p or Sobolev norms. Again, we omit the details here.

Further representations
For their application within the Smolyak construction, we need a better understanding of how the residual correction operator in general, and the more specific multilevel approximation operator, depend on the target function $f$. This is the goal of this section, and we will start with the general residual correction procedure.

Theorem 3.8 For a subset $u = \{u_1, \ldots, u_j\} \subseteq \{1, \ldots, L\}$ with $j = \#u$ elements, we will always assume that the elements are sorted as $u_1 < u_2 < \cdots < u_j$, and we define the combined operator
$I_u := I_{u_j} \circ I_{u_{j-1}} \circ \cdots \circ I_{u_1}.$
Then, the global approximation operator $A_L$ from (15) has the representation
$A_L f = \sum_{\emptyset \ne u \subseteq \{1, \ldots, L\}} (-1)^{\#u + 1} I_u f. \qquad (22)$

It is important to note that the ordering of $u$ is crucial in the definition of $I_u$, as the operators $I_i$ usually do not commute. Moreover, we stress that in the definition of $I_u$ we first apply the operator with the smallest index, then the one with the next bigger index, and so on. For the resulting product of operators it is easy to see by induction that we have the identity
$e_L = (I - I_L) \cdots (I - I_1) f = \sum_{u \subseteq \{1, \ldots, L\}} (-1)^{\#u} I_u f,$
where $I_\emptyset := I$. Finally, this, together with the identity $f_L = f - e_L$, yields the stated representation.
After this general representation, we will now derive a specific representation for the multilevel approximation method in both cases, interpolation and penalised least-squares approximation. Hence, we are in the situation described in Definition 3.5, i.e. we consider operators $I_i : C(\Omega) \to W_i$ defining the local approximation of the form (17). We first look at interpolation. A change of basis allows us to write the operators $I_i$ also in explicit form. For each $i$ and $1 \le k \le N_i$ let $\chi_{i,k} \in W_i$ be the cardinal or Lagrange function satisfying $\chi_{i,k}(x_{i,\ell}) = \delta_{k,\ell}$, $1 \le k, \ell \le N_i$, where $\delta_{k,\ell}$ is the Kronecker symbol. Then, obviously, we have
$I_i f = \sum_{k=1}^{N_i} f(x_{i,k}) \chi_{i,k}.$
Inserting this formula into the representation (22) yields the following result.
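Numerically, the Lagrange function $\chi_{i,k}$ is obtained from the same kernel matrix as the interpolant: its coefficient vector solves $M_i \beta = e_k$, so that $\chi_{i,k}(x_{i,j}) = \delta_{k,j}$. A small check (Wendland $C^2$ stand-in kernel, $n = 1$):

```python
import numpy as np

def wendland_c2(r):
    return np.where(r < 1.0, (1.0 - r) ** 4 * (4.0 * r + 1.0), 0.0)

def kernel_matrix(X, Y, delta):
    return delta ** -1 * wendland_c2(np.abs(X[:, None] - Y[None, :]) / delta)

X = np.linspace(0.0, 1.0, 9)
delta = 0.5
M = kernel_matrix(X, X, delta)

# chi_k = sum_l beta_l K(., x_l) with M beta = e_k, hence chi_k(x_j) = delta_{k,j}.
k = 3
beta = np.linalg.solve(M, np.eye(len(X))[k])
chi_at_sites = kernel_matrix(X, X, delta) @ beta
assert np.allclose(chi_at_sites, np.eye(len(X))[k], atol=1e-8)
```

Evaluating $\chi_{i,k}$ at an arbitrary point $x$ amounts to a dot product of $\beta$ with the kernel values $K_i(x, x_{i,\ell})$, which is cheap because only the sites within distance $\delta_i$ of $x$ contribute.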
where the coefficients are given by a(u, k) = 1 if #u = 1 and for #u > 1.

Moreover, the multilevel interpolation operator A L : C( ) → V L has the representation
Proof For u = {u 1 , . . . , u i } we have Inserting this into (22) gives the representation for the multilevel approximation.
Next, we are dealing with penalised least-squares approximation, see also Lemma 3.2.
To find a representation for the multilevel approximation operator like the one in Theorem 3.9, we have to replace the cardinal or Lagrange functions $\chi_{i,k}$ appropriately. To this end, we note that, in the case of interpolation, the coefficient vector of the interpolant (17) is given by $\alpha_i = M_i^{-1} f|_{X_i}$. Hence, the interpolant can also be written as
$I_i f = \sum_{\ell=1}^{N_i} \big(M_i^{-1} f|_{X_i}\big)_\ell K_i(\cdot, x_{i,\ell}).$
Thus, we can conclude that the cardinal or Lagrange function can alternatively be written as
$\chi_{i,k} = \sum_{\ell=1}^{N_i} \big(M_i^{-1} e_k\big)_\ell K_i(\cdot, x_{i,\ell}),$
where $e_k \in \mathbb{R}^{N_i}$ is the $k$-th unit vector. This idea now carries over to the penalised least-squares setting. As in this case the coefficient vector in (17) is the solution of $(M_i + \lambda_i I)\alpha_i = f|_{X_i}$, we can now write the approximant as
$I_i f = \sum_{k=1}^{N_i} f(x_{i,k}) \tilde{\chi}_{i,k}$
with the modified cardinal functions
$\tilde{\chi}_{i,k} := \sum_{\ell=1}^{N_i} \big((M_i + \lambda_i I)^{-1} e_k\big)_\ell K_i(\cdot, x_{i,\ell}).$
This gives, together with the representation (22), the following result.

The tensor product multilevel method
In this section, we will combine the kernel multilevel method of the previous section with Smolyak's formula. First, we will describe the resulting tensor-product multilevel method in more detail. In the second subsection we will then study the convergence of this method.

The method and its representations
The main idea of the method is to use the multilevel method of Sect. 3 in Smolyak's construction (9). We will discuss various representations of the resulting operator. We will start this by first looking at the residual correction scheme with general monotone index sets and then become more and more specific by looking at the isotropic and anisotropic index sets (q, d) and ω ( , d) from (4) and (5), respectively. After that, we prove representations if the kernel-based multilevel method is employed with interpolation and penalised least-squares approximation.
In the most general case, we assume that the operators $A^{(j)}_i : U^{(j)} \to V^{(j)}$ for $1 \le j \le d$ and $1 \le i \le L^{(j)}$ between linear spaces $U^{(j)}$ and $V^{(j)}$ are now given by our multilevel approximation operators from Lemma 3.4, i.e. by
$A^{(j)}_i := I - (I - I^{(j)}_i)(I - I^{(j)}_{i-1}) \cdots (I - I^{(j)}_1). \qquad (26)$

In its original form, the Smolyak construction
$A_\Lambda = \sum_{i \in \Lambda} \Delta^{(1)}_{i_1} \otimes \cdots \otimes \Delta^{(d)}_{i_d} \qquad (27)$
heavily relies on the difference operators (8) rather than the approximation operators. In our setting, the difference operators simplify to the local approximation operators, i.e.
$\Delta^{(j)}_i f = A^{(j)}_i f - A^{(j)}_{i-1} f = I^{(j)}_i\big(e^{(j)}_{i-1}\big),$
where $e^{(j)}_{i-1}$ denotes the error of the residual correction scheme in direction $j$. This is particularly helpful when approximating elements from an algebraic tensor product space, as in this case all computations can be done independently for each direction.

Proposition 4.1 Let $U^{(1)}, \ldots, U^{(d)}$ and $V^{(1)}, \ldots, V^{(d)}$ be linear spaces with algebraic tensor products $S = U^{(1)} \otimes \cdots \otimes U^{(d)}$ and $T = V^{(1)} \otimes \cdots \otimes V^{(d)}$. Assume that the operators $A^{(j)}_i : U^{(j)} \to V^{(j)}$ are given by the residual correction operators in (26). Assume finally that $f \in S$ has the form
$f = \sum_k f^{(1)}_k \otimes \cdots \otimes f^{(d)}_k.$

Let $e^{(j)}_k$ be the errors defined in the standard residual correction scheme applied to $f^{(j)}_k$. Then, the tensor product multilevel operator applied to $f$ can be evaluated by applying the one-dimensional residual correction schemes factorwise.

However, if the $U^{(j)}$ are normed spaces and we want to extend the Smolyak operator to the tensor product space $\bigotimes_j U^{(j)}$, we need the combination technique to find a representation of the tensor product multilevel method. To this end, we need to express the dependence of the multilevel operator on $f$ explicitly. This now follows immediately if we employ the representation from Theorem 3.8 in each direction $1 \le j \le d$.

Proposition 4.2 Let $U^{(1)}, \ldots, U^{(d)}$ and $V^{(1)}, \ldots, V^{(d)}$ be linear spaces with algebraic tensor products $S = U^{(1)} \otimes \cdots \otimes U^{(d)}$ and $T = V^{(1)} \otimes \cdots \otimes V^{(d)}$, and assume that the operators $A^{(j)}_i : U^{(j)} \to V^{(j)}$

are given by the residual correction operators in (26). Then, the Smolyak operator (27) has, for a general index set $\Lambda$, for the isotropic index set $\Lambda(q, d)$ and for the anisotropic index set $\Lambda_\omega(\ell, d)$, corresponding representations in terms of the combined operators $I^{(1)}_{u^{(1)}} \otimes \cdots \otimes I^{(d)}_{u^{(d)}}$.
Proof Using (22) for each direction and inserting the result into (10) yields the first stated representation for a general index set $\Lambda$. Inserting it into (11) gives the second representation for the index set $\Lambda(q, d)$, and inserting it into (12) yields the final representation for the anisotropic set $\Lambda_\omega(\ell, d)$.
After these general results for the residual correction scheme, we now turn to the kernel-based multilevel approximation method. Here, we can further specify the approximation operators.
Consequently, we fix for each direction $1 \le j \le d$ a sequence of data sites $X^{(j)}_1, \ldots, X^{(j)}_{L^{(j)}} \subseteq \Omega^{(j)}$, where $L^{(j)} = \max\{\lambda_j : \lambda \in \Lambda\}$. The corresponding mesh norms $h^{(j)}_i$ and the employed support radii $\delta^{(j)}_i$ are coupled, as explained above, by
$h^{(j)}_{i+1} \approx \mu^{(j)} h^{(j)}_i, \qquad \delta^{(j)}_i = \nu^{(j)} h^{(j)}_i,$
with fixed parameters $\mu^{(j)} \in (0, 1)$ and $\nu^{(j)} > 1$. Next, for each direction, we choose a smoothness parameter $\sigma_j > n_j/2$ and a function $\Phi^{(j)} : \mathbb{R}^{n_j} \to \mathbb{R}$ satisfying (13) with $\sigma = \sigma_j$. The kernels $K^{(j)}_i$ are then defined as in (16). Finally, for each $1 \le j \le d$ we have the local approximation operators
$I^{(j)}_i f = \sum_{k=1}^{N^{(j)}_i} \alpha^{(j)}_{i,k} K^{(j)}_i\big(\cdot, x^{(j)}_{i,k}\big),$
where the coefficients $\alpha^{(j)}_{i,k}$ are either determined by interpolation or penalised least-squares, though other methods are obviously possible as well. This finally yields the also direction-dependent multilevel approximation operators $A^{(j)}_i$ of (26), where the $e^{(j)}_k$ are the error functions defined in the multilevel process, depending now also on the direction $1 \le j \le d$.
For interpolation, we can rewrite the interpolation operator as
$I^{(j)}_i f = \sum_{k=1}^{N^{(j)}_i} f\big(x^{(j)}_{i,k}\big) \chi^{(j)}_{i,k},$
using now also direction-dependent Lagrange functions $\chi^{(j)}_{i,k}$. For penalised least-squares approximation, we have the same representation after replacing $\chi^{(j)}_{i,k}$ by the modified cardinal functions $\tilde{\chi}^{(j)}_{i,k}$. Hence, we will mainly concentrate on the interpolation case. To derive the representation of the Smolyak operator in this situation, we can employ Proposition 4.2. Hence, we only need to derive representations for the combined operators $I^{(1)}_{u^{(1)}} \otimes \cdots \otimes I^{(d)}_{u^{(d)}}$ appearing in that proposition as functions of the target function. This representation, however, follows with Theorem 3.9. This then gives the following result, which, together with Proposition 4.2, yields the combination technique for the multilevel approximation operator.

Proposition 4.3 Let $I^{(j)}_{u^{(j)}}$ be the interpolation operators previously defined. Then, for any $f \in C(\Omega)$ and sets $u^{(1)}, \ldots, u^{(d)}$, the combined operator has the form

Convergence results
We will now analyse the error of our tensor product multilevel approximation method. We will do this first for the isotropic case, i.e. for index sets of the form = (q, d) and then for the anisotropic case, i.e. for index sets = ω ( , d).
For both cases it is helpful to summarise the general set-up, though we will simplify it significantly for the isotropic case.
As outlined above, we consider function spaces over domains Ω^{(j)} ⊆ R^{n_j}. To be more precise, we will use U^{(j)} = H^{β_j}(Ω^{(j)}) with β_j > n_j/2 and V^{(j)} = L_2(Ω^{(j)}).
Then, with Ω = Ω^{(1)} × · · · × Ω^{(d)}, the associated tensor product spaces are H^β_mix(Ω) and L_2(Ω). This follows for the L_2 case and for the case β ∈ N_0^d from Lemma 2.10. For general β ∈ R_+^d it can be derived in the same way if the standard Sobolev spaces H^{β_j}(Ω^{(j)}) are defined, for example, by interpolation.
Note that the condition β_j > n_j/2 ensures U^{(j)} ⊆ C(Ω^{(j)}) by the Sobolev embedding theorem, so that all our operators are well defined.
Next, we need bounds on all involved operators.

Lemma 4.4 Assume that the local approximations A_i^{(j)} satisfy the conditions of Proposition 3.6. For a uniform notation, we simply set β_j = σ_j in the first and third case of Proposition 3.6.
Let E^{(j)} : U^{(j)} → V^{(j)} be the embedding operator. Then, for 1 ≤ j ≤ d, the following bounds on the operator norms hold. Proof The bound on the embedding is obviously true. The second bound follows directly from Proposition 3.6. The third bound follows immediately from the second one.

The isotropic case
In the isotropic case we use the index set Λ = Λ(q, d) = {i ∈ N^d : |i|_1 ≤ q}. As the index set does not distinguish between the directions, it is reasonable and customary to assume the same for everything else. Hence, in this situation, we have Ω^{(j)} = Ω^{(1)} ⊆ R^n and β_j = β for 1 ≤ j ≤ d. Thus, U^{(j)} = H^β(Ω^{(1)}) and V^{(j)} = L_2(Ω^{(1)}).
In the multilevel method we also choose the point sets direction-independent, i.e. X_i^{(j)} = X_i^{(1)}. This means that the fill distances h_i^{(j)} = h_i are independent of j, and hence we choose the support radii δ_i^{(j)} = δ_i and the parameters μ^{(j)} = μ and ν^{(j)} = ν direction-independent as well.
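The isotropic index set Λ(q, d) and the combination-technique coefficients that appear when the Smolyak sum is rewritten as a telescoping combination can be enumerated directly. The following sketch assumes the classical coefficient formula (−1)^{q−|i|} binom(d−1, q−|i|) for q − d + 1 ≤ |i|_1 ≤ q, which is not spelled out explicitly above:

```python
from itertools import product
from math import comb

def isotropic_indices(q, d):
    """Lambda(q, d) = { i in N^d : |i|_1 <= q } with i_j >= 1."""
    return [i for i in product(range(1, q + 1), repeat=d) if sum(i) <= q]

def combination_coefficients(q, d):
    """Classical Smolyak combination coefficients: only indices with
    q-d+1 <= |i|_1 <= q contribute, with weight (-1)^(q-|i|) * C(d-1, q-|i|)."""
    coeffs = {}
    for i in isotropic_indices(q, d):
        r = q - sum(i)
        if 0 <= r <= d - 1:
            coeffs[i] = (-1) ** r * comb(d - 1, r)
    return coeffs
```

For operators that reproduce constants, the combined operator reproduces constants as well, so the coefficients necessarily sum to one; this is a convenient sanity check for the enumeration.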
In this simplified situation, Lemma 4.4 reads as follows.

Lemma 4.5
In the isotropic case, for 1 ≤ j ≤ d, the following bounds on the operator norms hold, in particular ‖E^{(j)}‖_{H^β(Ω^{(1)}) → L_2(Ω^{(1)})} = 1. Using the fact that all involved norms are reasonable and compatible crossnorms, it is now possible to derive a bound on the error of the tensor product multilevel approximation, i.e. on E^{(d)} − A_{Λ(q,d)}, using Lemma 2.18. However, this has already been done in a general context in [33, Lemma 2], which reads as follows.

Lemma 4.6 Assume that, for 1 ≤ j ≤ d, bounds of the above form with constants B, c, c̃ and D hold. Then, the error of the Smolyak operator can be bounded by
Combining Lemmas 4.5 and 4.6 gives our first main error estimate.
Theorem 4.7 Let Ω^{(1)} ⊆ R^n be a bounded domain with Lipschitz boundary and let σ ≥ β > n/2. Let Ω = Ω^{(1)} × · · · × Ω^{(1)}. Assume that the low-dimensional reconstructions A_i : C(Ω^{(1)}) → V_i are given by the multilevel RBF interpolants (15). Assume further that these multilevel RBF interpolants are built using compactly supported reproducing kernels of H^σ(R^n) and support radii and data sets as outlined in Proposition 3.6. Finally, assume that the parameter μ ∈ (0, 1) is chosen in such a way that ε := C_1 μ^β < 1. Then, the error satisfies estimate (29), where C is the constant from Lemma 4.5.
Proof From Lemma 4.5 we know that the constants in Lemma 4.6 are given by B = 1, c = C, c̃ = C(1 + 1/ε) and D = ε. For C ≥ 1, Lemma 4.6 then yields the stated estimate.
As we assume q ≥ d, estimate (29) also shows convergence for ε → 0. However, this limit is misleading: we have ε = C_1 μ^β, where μ is the refinement factor of the mesh norms of the involved grids, i.e. a fixed number in (0, 1), and hence so is ε. A smaller and smaller μ would lead to a prohibitively expensive refinement of the data sets. The involved constant, unfortunately but also expectedly, depends exponentially on the space dimension d. Using the fact that the binomial coefficient q choose d−1 is bounded by q^{d−1}/(d−1)!, the error bound becomes (30), with

The anisotropic case
We will now discuss the more general case of the anisotropic index set Λ_ω(ℓ, d) from (5). The purpose here is to analyse the approximate reconstruction of functions f ∈ H^β_mix(Ω) with a possibly different smoothness β_j in each direction 1 ≤ j ≤ d. This, of course, is reflected by choosing direction-dependent functions Φ^{(j)} leading to reproducing kernels of H^{σ_j}(Ω^{(j)}), as outlined at the beginning of this section.
In the rest of this subsection we will, without restriction, assume that the elements of the weight vector ω are ordered as ω_1 ≤ ω_2 ≤ · · · ≤ ω_d, so that, in particular, min_j ω_j = ω_1.
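The anisotropic index set is only referenced here via (5). A plausible and common form, assumed purely for illustration, is Λ_ω(ℓ, d) = {i ∈ N^d : Σ_j ω_j (i_j − 1) ≤ ℓ ω_1}, which reduces to the isotropic set with q = ℓ + d for constant weights and yields decreasing maximum levels for ordered weights:

```python
from itertools import product

def anisotropic_indices(ell, omega):
    """Hypothetical form of the weighted index set Lambda_omega(ell, d):
    all i in N^d (i_j >= 1) with sum_j omega_j * (i_j - 1) <= ell * omega_1.
    For a constant weight vector this reduces to the isotropic set with q = ell + d."""
    d = len(omega)
    # in direction j the level can be at most floor(ell * omega_1 / omega_j) + 1
    L = [int(ell * omega[0] // omega[j]) + 1 for j in range(d)]
    return [i for i in product(*(range(1, Lj + 1) for Lj in L))
            if sum(w * (k - 1) for w, k in zip(omega, i)) <= ell * omega[0]]
```

With ordered weights ω_1 ≤ · · · ≤ ω_d, directions with larger weight receive fewer levels, matching the monotonicity L^{(1)} ≥ · · · ≥ L^{(d)} used in the discussion below.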
Though the representation in Lemma 2.19 was used in [8] to derive an error analysis for univariate operators A_i^{(j)} given by polynomial interpolation in Chebyshev points, a general bound like the one in Lemma 4.6 has not been provided there. Hence, we will use Lemma 2.19 directly. Accordingly, we obtain the error representation (31), whose terms are indexed by i ∈ Λ_ω(ℓ, k). We will now use this representation to bound the error in the situation that the low-dimensional interpolation operators are given by the multilevel RBF interpolant. Using the fact that all involved norms are reasonable and compatible crossnorms, we can use the estimates from Lemma 4.4 to bound the single terms in (31). To simplify the notation, we will from now on suppress the index at the occurring norms. Essentially, we are only dealing with operator norms of the type H^{β_j}(Ω^{(j)}) → L_2(Ω^{(j)}) or H^β_mix(Ω) → L_2(Ω), and it should be clear from the context which norm is meant. Defining C := max_{1≤j≤d} C^{(j)} we have

This gives the first bound (32).
To understand this estimate, we need to look at the exponent i_{k+1} − 1 and, as ε_{k+1} ∈ (0, 1), we need a lower bound on it. Such a bound can, for example, be obtained by noticing that ω_1 ≤ ω_k for all 2 ≤ k ≤ d, with which (32) simplifies accordingly. To simplify this further, we assume that the local refinement factors μ^{(j)} are chosen in such a way that ε_j = ε for all 1 ≤ j ≤ d. Such a choice seems natural, as it also reflects the different smoothness β_j in each direction: the larger β_j is, the larger we can choose μ^{(j)} ∈ (0, 1). Under the assumption ω_1 ≤ ω_2 ≤ · · · ≤ ω_d we can use (7) to bound the number of indices in Λ_ω(ℓ, k). With this, our error bound follows, where we have also used ω_{k+1} ≤ ω_d and hence ω_1/ω_{k+1} ≥ ω_1/ω_d, and bounded the final sum. Taking all this together yields the following error bound.
Theorem 4.8 Assume that the low-dimensional reconstructions A_i^{(j)} are given by the multilevel RBF interpolants (15). Assume further that these multilevel RBF interpolants are built using compactly supported reproducing kernels of H^{σ_j}(R^{n_j}) and support radii δ_i^{(j)} and data sets X_i^{(j)} as outlined in Proposition 3.6. Finally, assume that the factors μ^{(j)} are chosen so that all ε_j := C^{(j)} [μ^{(j)}]^{β_j} coincide, i.e. ε_j = ε < 1. Then the error satisfies (33). For ℓ → ∞, the asymptotic behaviour of the error bound is governed by ε^{ℓ ω_1/ω_d + 1}, with the constant c_d from (30). Hence, we have again exponential convergence for ℓ → ∞. Moreover, in the case of a constant weight vector ω = (ω, …, ω)^T ≠ 0 we have Λ_ω(ℓ, d) = Λ(q, d) with q = ℓ + d. In this case, the error bound (33) recovers exactly the same behaviour that we have seen in (30). A closer look at the proof above shows that we can rewrite one of the intermediate steps of the error accordingly. Obviously, from our assumption ω_1 ≤ ω_2 ≤ · · · ≤ ω_d it follows that the maximum number of levels in each direction is decreasing, i.e. L^{(1)} ≥ L^{(2)} ≥ · · · ≥ L^{(d)}.
Unfortunately, in the above formula we have two competing terms, and we want each of them to become small. If we want to balance them, it seems natural to assume that ε_1 ≥ ε_2 ≥ · · · ≥ ε_d, i.e. to choose a larger ε_j for smaller indices, as the corresponding exponents L^{(j)} are also larger. However, this collides with the products ∏_{j=1}^{k} (ε_j/ε_k)^{i_j − 1}, as these then become larger than one. We have resolved this issue by assuming that all ε_j are equal. This allowed us to neglect the damaging products, but it leads to a bound dominated by the worst term ε^{L^{(d)}}.

Remark 4.9
So far, we have not coupled the smoothness β of the target function to the weight vector ω of the index set. However, our arguments above suggest that we should choose ω = β. This leads to all the desired properties. Because of L^{(1)} ≥ L^{(2)} ≥ · · · ≥ L^{(d)}, we use more levels in the direction of least smoothness, as expected. Because of ε_j = C^{(j)} [μ^{(j)}]^{β_j} = ε, we also have μ^{(1)} ≤ μ^{(2)} ≤ · · · ≤ μ^{(d)}. As the number of points N_i^{(j)} is proportional to [μ^{(j)}]^{−i n_j}, this means that we also have more points in the directions of lower smoothness. The error can then be expressed using the smoothness of the target function via

Numerical tests
We will now provide numerical examples to demonstrate that the new tensor product multilevel method works as theoretically predicted and to validate the error bounds derived in the previous section. The details of the algorithmic realisation of the method have mostly been described in Sect. 4 already, but we recall the important parts here. Once we have fixed the index set Λ_ω(ℓ, d), where the weight vector ω ∈ R^d_+ and the threshold ℓ ∈ N serve as input parameters for the algorithm, we determine the low-dimensional point sets X_i^{(j)} with refinement parameter μ or μ^{(j)} such that the conditions of Theorem 4.7 or 4.8, respectively, are satisfied. After choosing a reproducing kernel K^{(j)} for every direction, we compute the Lagrange functions χ_{i,k}^{(j)} for every level 1 ≤ i ≤ L^{(j)} and every point of X_i^{(j)}. All of this can be done in an offline phase as soon as the input parameters are determined. The interpolants are then computed following the formulas of Propositions 4.2 and 4.3.
We test two different settings: first an isotropic, high-dimensional example and second a fully anisotropic one. In both cases we use the compactly supported Wendland kernels
As direction-wise point sets X_i^{(j)} we use sets with #X_i^{(j)} = 2^{i+2}, i.e. a uniform refinement parameter μ = μ^{(j)} = 0.5. The overlap parameter, which dictates how the support radius δ_i is coupled to the fill distance h_i, is chosen as ν = ν^{(j)} = 8.0.
In Fig. 1 we show an ℓ∞-error plot for the interpolation with the first two kernels given above. We follow again [3] and compute the error for both examples on the same 50 randomly drawn points. Though our theory is based on L_2-estimates, it can easily be modified for L_∞. Additionally, we give the theoretical bound (ℓ + 6)^6 ε^{ℓ+1} as the dashed lines, using ε = (C_1 μ)^{2−0.5} for the case φ_{1,1} and ε = (C_1 μ)^{3−0.5} for φ_{1,2}; in both cases we use C_1 = 1. As expected, approximating with a smoother kernel leads to smaller errors and faster convergence. We see that in the case of the less smooth kernel φ_{1,1} the method converges faster than the theory predicts, while approximating with φ_{1,2} yields a convergence order similar to the one predicted by the theory.

Two-dimensional anisotropic example
Looking at (33) in Theorem 4.8, we see that the convergence order depends only on the weights of the first and last direction, ω 1 and ω d , respectively. Hence, it suffices to test a two-dimensional problem.
Assuming that the constants C^{(j)} coincide, we choose direction-dependent refinement parameters, which leads to cardinalities #X_i^{(1)} = 2^{2+i} and #X_i^{(2)} = 3^{2+i}, i.e. μ^{(1)} = 1/2 and μ^{(2)} = 1/3. Additionally, we set the overlap parameter ν^{(j)} to 8.0 for every direction j. This results in direction-independent condition numbers of the kernel matrices.
To determine the anisotropy, characterised by the quotient of the components of the weight vector ω ∈ R^d_+, we couple ω_j to β_j, the smoothness of f_α in direction j, and normalise it with respect to ω_1. This leads to the weight vector ω = (1, 2)^T. We compute the ℓ2-errors on a uniform 15000 × 15000 grid and show the error in Fig. 2. Additionally, we again give the values of the expected behaviour ε^{ℓ/2+1} as the dashed line. To gain an impression of the convergence speed we provide a linear fit for each graph, shown as a dotted red line. We note that the obvious steps in the error graph are not surprising: the maximum level L^{(j)} per direction for a given ℓ exhibits steps due to the floor function in its defining formula. The mild but unexpected increase within the error steps is most likely due to numerical cancellation errors caused by parallelisation. Again, we can see that the method converges slightly faster than the theory suggests.
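The stepping of the maximum level can be checked directly. The closed form below is a hypothetical reconstruction of the elided formula, consistent with the normalisation ω_1 = 1 and the floor-induced plateaus described above:

```python
from math import floor

def max_level(ell, omega, j):
    """Hypothetical maximum level in direction j (0-based), assuming
    L^(j) = floor(ell * omega_1 / omega_j) + 1; the floor causes the
    plateaus visible in the error graph."""
    return floor(ell * omega[0] / omega[j]) + 1

# with omega = (1, 2), direction 2 only gains a level at every second ell
levels = [max_level(ell, (1.0, 2.0), 1) for ell in range(1, 9)]
```

The first direction refines at every step, while the weight-2 direction plateaus, which matches the stepwise error decay observed in Fig. 2.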

Summary
In this paper, we have, for the first time, combined the multilevel radial basis function method with Smolyak's algorithm. We have provided a rigorous error analysis for both the isotropic and the anisotropic situation. Some advantages of this new tensor product multilevel method are as follows. The underlying domains Ω^{(j)} have to be neither intervals nor one-dimensional. Though the error analysis requires the low-dimensional data to be quasi-uniform, this is not necessary for the actual computation. The method can deal with arbitrary point locations in the low-dimensional domains, leading to sparse grids based on scattered data. The numerical examples provided show that the method has the potential to perform well in moderately high-dimensional situations. However, more research is necessary, for example, to find computationally more efficient representations of the high-dimensional approximation operator. Here, optimised combination techniques could and should be studied. Moreover, we plan to further study the anisotropic case and make better use of the different smoothness in different directions.