Adaptive Radial Basis Function Partition of Unity Interpolation: A Bivariate Algorithm for Unstructured Data

In this article we present a new adaptive algorithm for solving 2D interpolation problems of large scattered data sets through the radial basis function partition of unity method. Unlike other time-consuming schemes this adaptive method is able to efficiently deal with scattered data points with highly varying density in the domain. This target is obtained by decomposing the underlying domain in subdomains of variable size so as to guarantee a suitable number of points within each of them. The localization of such points is done by means of an efficient search procedure that depends on a partition of the domain in square cells. For each subdomain the adaptive process identifies a predefined neighborhood consisting of one or more levels of neighboring cells, which allows us to quickly find all the subdomain points. The algorithm is further devised for an optimal selection of the local shape parameters associated with radial basis function interpolants via leave-one-out cross validation and maximum likelihood estimation techniques. Numerical experiments show good performance of this adaptive algorithm on some test examples with different data distributions. The efficacy of our interpolation scheme is also pointed out by solving real world applications.


Introduction
In kernel based approximation radial basis function (RBF) methods are effective meshfree techniques, which can be implemented to numerically solve various types of science and engineering problems. Over the last years the use of RBF methods has gained much attention in several interdisciplinary fields. Indeed, while many traditional numerical methods such as finite differences, finite elements or finite volumes have troubles with high-dimensional prob-lems, meshfree methods can often handle changes in the geometry of the domain of interest (e.g., free surfaces, moving particles and large deformations) better. Moreover, independence from a mesh is a great advantage since mesh generation is one of the most time-consuming parts of any mesh-based numerical simulation (see e.g. [16,17,20]). All these issues deserve to be considered not only when one has to model (systems of) partial differential equations (PDEs), but also when a multivariate problem of scattered data fitting needs to be faced and solved appropriately. This problem is particularly relevant in several situations in which surface reconstruction involves unstructured large data sets, requiring in some way the construction of adaptive interpolation or approximation methods. Approximating scattered data of high complexity arises in various areas of applied sciences, ranging from scanner acquisitions to geographic benchmarks, as well as for industrial and medical purposes where the processing of large random data configurations is usually carried out. In such cases scattered data can have significantly different distributions, e.g. data with highly varying density or data with voids, which demand adaptive algorithms (see e.g. [8,13]).
In this article we present a new adaptive algorithm for solving 2D interpolation problems of large scattered data sets. In doing that, though some adaptive strategies have been developed, for example, in [15,38], a global RBF interpolation method is not definitely suitable for our purposes, since it seldom turns out to be usable in practice. Indeed, a global scheme is characterized by a big interpolation matrix, and this results in a two-fold issue: first, a severe ill-conditioning of the matrix with a consequent high level of instability; second, a high computational cost when such a matrix has to be inverted (and the method applied in its entirety). For this reason, in this work, we focus on a local RBF method, such as the radial basis function partition of unity method (RBF-PUM), which allows us to decompose a big problem into several small subproblems. This interpolation method relies on a decomposition of the domain into several subdomains forming a cover of it, and constructing a local RBF interpolant on each subdomain. The original idea of PUM comes from the context of PDEs [3,27], but later it also gained much popularity in the field of numerical approximation [10,11,13] and, more in general, in various areas of applied mathematics and scientific computing (see e.g. [4,5,12,19,22,23,25,28]). However, although the RBF-PUM has some specific features that makes it particularly suitable for processing large scattered data sets, the problem of interpolating very irregularly distributed data points has been addressed only partly in [13]. In fact, the numerical method in [13] is accurate, but at the same time it is also quite costly from the computational point of view. Here we therefore propose a new adaptive scheme that allows us to define subdomains of variable size so as to guarantee in any case a suitable number of points within each of them. Such points are localized by considering a cell based structure, which preliminary partitions the underlying domain and its points in a suitable number of cells, thus enabling the use of an efficient search procedure. After completing this phase, the algorithm identifies for each subdomain optimal values of the local RBF shape parameters via leave-one-out cross validation (LOOCV) or maximum likelihood estimation (MLE). Both techniques have a statistical background and can be combined with efficient optimization routines to quickly achieve reliable predictions of the RBF shape parameters (see [16,30,31]). In our extensive numerical experiments we show the performance of this new adaptive algorithm, also comparing our results with non-adaptive and adaptive-but existing or standard routine based-algorithms. Finally, the efficacy of our scheme is also pointed out by solving interpolation problems with data sets coming from real world applications.
The paper is organized as follows. In Sect. 2 we give a brief overview on the basic notions regarding the RBF based interpolation. In Sect. 3 we focus on the construction of the RBF-PUM selecting local RBF shape parameters via either LOOCV or MLE. Section 4 is devoted to describe our adaptive algorithm, then discussing computational issues and providing a complexity analysis. In Sect. 5 we present several numerical results designed on test examples in order to illustrate the performance of our interpolation algorithm. Section 6 shows some applications to real world data sets. Finally, Sect. 7 deals with conclusions and future work.

Preliminaries on RBF Based Interpolation
In this section we give a brief overview on the basic notions of RBF methods, which are powerful and flexible tools for scattered data interpolation. To have more details on the theoretical background, we refer the reader to [9,16,36].
Given a domain Ω ⊆ R s , a set X N = {x 1 , . . . , x N } ⊆ Ω of N distinct data points or nodes and the corresponding data or function values f (x i ) ∈ R, i = 1, . . . , N , obtained by possibly sampling any (unknown) function f : Ω → R, we want to find a function s : Ω → R that satisfies the interpolation conditions We express a RBF interpolant s : Ω → R as a linear combination of RBFs, i.e., where c i , i = 1, . . . , N , are unknown real coefficients, || · || 2 denotes the Euclidean norm, and φ : R ≥0 → R is a strictly positive definite (SPD) RBF depending on a shape parameter ε > 0 such that In the following, for the sake of simplicity, we refer to φ ε as φ. In Table 1 we report a list of some SPD RBFs together with their degrees of smoothness. Note that Gaussian, Inverse MultiQuadric and Matérn functions are globally supported and SPD in R s for any s, while Wendland functions are compactly supported-with support [0, 1/ε]-and SPD in R s for s ≤ 3 [36]. Since φ is a SPD function, the matrix A = (A ki ) with the entries A ki = φ(||x k − x i || 2 ), k, i = 1, . . . , N , is positive definite for all possible sets of nodes. In this case, the coefficients c k are uniquely determined by enforcing the interpolation conditions in (1) and can be obtained by solving the symmetric linear system where c = (c 1 , . . . , c N ) T , and f = ( f 1 , . . . , f N ) T . Therefore, the interpolation problem is well-posed and, hence, its solution exists uniquely [17]. Moreover, for any SPD RBF φ we can define a symmetric and SPD kernel Φ : For the kernel Φ there exists the so-called native space, that is a Hilbert space N Φ (Ω) with inner product (·, ·) N Φ (Ω) in which the kernel Φ is reproducing, i.e., for any f ∈ N Φ (Ω) we have the identity f (x) = ( f , Φ(·, x)) N Φ (Ω) , for x ∈ Ω. Thus, the space H Φ (Ω) = span{Φ(·, x), x ∈ Ω}, equipped with the bilinear form (·, ·) H Φ (Ω) , is an inner product space with reproducing kernel Φ. The native space N Φ (Ω) of the kernel Φ is then defined as the  [17,36]).

RBF-PUM Based Interpolation
In this section we focus on the construction of the method, describing the LOOCV and MLE based approaches that can be used to select optimal values of the RBF shape parameters. Moreover, we present the main theoretical results concerning the RBF-PUM interpolation.

Construction of RBF-PUM Interpolants
Let Ω ⊆ R s be an open and bounded domain, and let {Ω j } d j=1 be an open and bounded cover of Ω that fulfills some mild overlap condition among the subdomains Ω j . Indeed, the subdomains Ω j need to form a cover of the domain such that d j=1 Ω j ⊇ Ω. Moreover, a given point x ∈ Ω must belong at most to K (independent of d) overlapping subdomains. A typical example of partition of unity (PU) subdomains using scattered data points in R 2 is shown in Fig. 1; in this case, for the sake of clarity, we show circular subdomains of fixed radius.
Given the subdomains Ω j , we consider a partition of unity where the weight w j : Ω j → R is a continuous, nonnegative and compactly supported function with supp(w j ) ⊆ Ω j . Then, we define the global RBF-PUM interpolant of the form For each subdomain Ω j , in (3) we can thus express the local RBF interpolants s j : Ω j → R as follows where N j is the number of data points in Ω j (i.e. the nodes x j i ∈ X N j = X N ∩ Ω j ), and we construct PU functions w j using the well-known Shepard's weight [1,33] where ϕ j (x) is a compactly supported function with support on Ω j such as the W2 function (see Table 1). Such functions are scaled with a shape parameter σ to get ϕ j (x) = ϕ(σ ||x − ξ j || 2 ), ξ j being the center of the weight function. If the functions s j , j = 1, . . . , d, satisfy the interpolation conditions the global interpolant (3) inherits the interpolation property of the local interpolants (4), i.e., Solving the jth interpolation problem (6) results in the linear system ⎛ or simply As for the global system (2), the use of SPD functions φ ensures (also in the local case) that the solution of the local system (7) exists uniquely, since the matrix A j is nonsingular [17].

Selection of Local RBF Shape Parameters
Since the accuracy of the global interpolant (3) strongly depends upon the choice of shape parameter ε associated with the local RBFs in (4), we need an effective approach that enables us to find suitable values of ε (and, possibly, an optimal one for each of PU subdomains). The RBF shape parameter is in fact responsible for the flatness of the basis functions. However, in the flat limit ε → 0, i.e. when the best accuracy is typically achieved, the local matrix in (7) might suffer from instability due to ill-conditioning (see [16]). As a consequence, the selection of ε may highly influence the accuracy of the RBF-PUM based interpolation. It is therefore paramount to optimally detect such values of the local RBF shape parameters.

Choice of " via LOOCV
A good way to select a shape parameter ε is to use locally the LOOCV technique [30]. The idea behind LOOCV in the RBF-PUM interpolation is to split the data of each subdomain Ω j , j = 1, . . . , d, into two distinct sets: For a fixed index k ∈ {1, . . . , N j } and a fixed shape parameter ε, we define the partial RBF interpolant whose coefficients c j i are found by interpolating the training data In order to measure the quality of this attempt, we define the error at the one validation point x j k not used to determine the interpolant. The "optimal" value of ε is found as where || · || is any norm used in the minimization problem, for instance, the ∞-norm.
The important aspect is that we can determine the error vector e j without solving N j problems, each of size (N j − 1) × (N j − 1). In fact, instead of (8), the computation of the error components can be expressed in terms of the interpolation matrix A j in (7), i.e.
where c j k is the kth coefficient in the full RBF interpolant (4) and [18]. So from (9) and (10) it follows that the LOOCV cost function to be minimized is

Choice of " via MLE
Another approach for selecting a shape parameter ε is given by the MLE, which relies on solid probabilistic and statistical foundations [18]; for further details, see e.g. [16,31,32]. As the LOOCV technique even a MLE based criterion can be applied to RBF-PUM interpolation to locally find an optimal ε-value for each subdomain Ω j , j = 1, . . . , d.
In this stochastic context we introduce the concept of random field Y j = {Y j (x)} x∈Ω j defined on the subdomain Ω j . In particular, we assume that the field Y j is a Gaussian random field, i.e., the vectors j is the process variance. Here, for simplicity we assume that μ j = 0 and σ 2 j = 1. Now, for a fixed shape parameter ε, if the probability density that occurs an event given the data f j is expressed as p( f j |ε), then the function L(ε| f j ) characterizes the likelihood of ε given the existing data values f j . Therefore, the Gaussian joint probability density function that refers to the vector Y j of N j random observations (belonging to the subdomain Ω j ) can be written as follows When we evaluate the density function (12) by using the jth data vector In order to find the optimal ε parametrization, we thus need to maximize the log-likelihood function. Equivalently, we can multiply it by −2, and then-ignoring the constant termminimize the resulting negative function. If we instead want a criterion dependent of the variance process, we can obtain the following MLE cost function which can be minimized to determine the optimal value of ε (see [16]).

Convergence and Error Estimates
In order to formulate error bounds for RBF-PUM interpolation, we assume some additional regularity conditions on the subdomains Ω j . In particular, we require that the PU weight functions w j are k-stable [35]. This property holds if-besides assumptions given in and make some further assumptions to have a regular covering {Ω j } d j=1 for (Ω, X N ) [36]. This condition demands that every subdomain Ω j satisfies the so-called interior cone condition, and the local fill distances h X N j ,Ω j are uniformly bounded by the global fill distance (14).
After defining the space C k ν (R s ) of all functions f ∈ C k whose derivatives of order for ||x|| 2 → 0, we consider the following convergence result, see [17,Theorem 29.1] and [36,Theorem 15.9].

then the error between f ∈ N φ (Ω) and its PUM interpolant (3) is bounded by
for all x ∈ Ω and all |α| ≤ k/2.
If we compare this convergence result with the global error estimates in [36], we see that the PUM preserves the local approximation order for the global fit (3). So we can efficiently compute large RBF interpolants by solving many small RBF interpolation problems and then glue them together with the global PU weights {w j } d j=1 (see [17]).
interpolation problems that are characterized by unstructured or very irregularly distributed data. In this case, indeed, besides identifying suitable values of ε, in a PU framework it is important to have the chance of constructing subdomains of variable size. Since in this work we are considering radial kernels, it is natural to consider subdomains of circular shape. The search process is therefore directed to find a sufficient (or minimum) number of points to be able to define properly a local RBF interpolant. From a practical standpoint this means first to select in an adaptive way the radii δ of our subdomains, and then determine for each subdomain an optimal value of the shape parameter ε associated with the RBF. While the choice of ε has been discussed widely in Sect. 3.2, here we focus on the selection of suitable subdomain radii δ. This phase imposes to organize all given data in an efficient way so as to be able to quickly identify all the points belonging to the given subdomains. Indeed, when the initial radius of a subdomain is not large enough, i.e. the subdomain does not contain a sufficient number of points, the subdomain size has to increased thus including more points. Now, in the following we give a description of our adaptive algorithm that enables us to find for each subdomain Ω j , j = 1, . . . , d, a couple of suitable values (ε j , δ j ), also providing an analysis of the computational cost.

Data Structures and Search Procedures
In this subsection we describe how the data are organized in the two-dimensional space to select the nodes belonging to the various subdomains in the RBF-PUM based interpolation. By doing so, for the sake of clarity we usually assume that the search of points is carried out once (i.e., for a fixed index k), taking into account that an adaptive algorithm is obviously characterized by an iterative process. Computational efficiency is indeed an essential aspect to fast assembly the local matrix A j in (7) and determine the corresponding local RBF interpolant (4). Hence here we present the search procedure used for the localization of points that lie in the subdomains Ω j , j = 1, . . . , d. Such a technique relies on a suitable partition of the domain Ω in square cells. Similar approaches have also been considered in [11,13]. Even if our partitioning structure is applicable to a generic domain, in this work we simply discuss the case of Ω = [0, 1] 2 ⊆ R 2 .
First of all, we start with considering a cover of the domain Ω in which each subdomain Ω j , j = 1, . . . , d, has initially radius where defines the number of PU centers along a single direction of Ω. The partition of the domain Ω is thus formed by d = d 2 PU subdomains, whose centers are given by a grid of points (see the red stars in Fig. 1).

Remark 1
From the definition of d PU in (16) it follows that if we take a larger (smaller) value of d, the partitioning structure becomes finer (coarser), i.e. less subdomains lead to larger ones (and vice versa). This specific connection between the parameters δ and d always ensures to be able to form a cover of Ω.
A choice as that given in (15) can be appropriate when we have a uniform or quite regular node distribution, but in case of irregularly distributed data the value (15) must be updated. In particular, we may iterate our adaptive process by computing the subdomain radii as follows where t k ∈ R >1 is a value that increases with k until every subdomain Ω j , j = 1, . . . , d, contains a number N (k) j of points, which is larger than or equal to a prescribed minimum number N min 2 . Now, in order to localize all the nodes that lie in every subdomain Ω j , we need to generate a data structure that enables us to partition suitably the domain Ω and the data points therein contained. Such a partition turns out to be particularly effective if it is combined with an efficient searching procedure. This search technique is based on a partition of Ω in b 2 square cells, where denotes the cell number along one side of the domain. Thus we assume that the side of every square cell is equal to (or at most slightly less than) the initial subdomain radii. This choice enables us to examine in the searching procedure only a reduced number of cells, thus minimizing the computational effort w.r.t. the most advanced space-partitioning data structures, such as kd-trees, which are commonly used for range and nearest neighbor searches (see e.g. [2,6]). Acting in this way, given a generic point, say x = (x 1 , x 2 ), we can compute the indexes which identify the cell in which the point x in (19) lies. Such a index is computed by taking into account the couple of cell coordinates (k 1 ,k 2 ) in (20) that move along x 1 and x 2 axes, respectively. It is therefore easy to assign to any point the corresponding cell k and partition a set of points in the domain Ω, as for instance the set X N of data points. A sketch of the routine for building the cell based structure is shown in Procedure 1.

Procedure 1: Cell based Structure
Step 1 For any point x ∈ X N (a) Compute the indexes k 1 and k 2 as in (19) (b) Find the cell k in which x is located applying (20), and associate the index of x to it Step 2 Return the indexes belonging to each of the b 2 cells, i.e., all points of X N are identified in their own cell Partitioning the domain Ω, we adopt a lexicographic order by proceeding from bottom to top and from left to right, numbering the square cells from 1 to b 2 . Now, if a given subdomain Ω j has radius (15) and its center lies in the kth cell, from (18) we deduce that the nodes belonging to Ω j must be searched in the so-called square neighborhood consisted initially 2 The number N (k) j represents the amount of data points present in the subdomain Ω j at the kth iteration; this value updates the generic definition of N j introduced in (4), which has been used so far. of nine cells, i.e. the kth square cell and its 3 2 − 1 neighboring cells. However, when the number of points in Ω j is not enough, we enlarge the radius as outlined in (17) and so the square neighborhood becomes larger as well. In so doing, in order to find all data points of Ω j , we have to explore the kth cell and its (3 + 2n) 2 − 1 neighboring cells, for n = 1, 2, . . .. In other words, for n = 0 we consider the first level (or "crown") of the neighborhood, for n = 1 the second one, and so on. An example of domain partition in square cells associated with the phase of localization and search of points within the adaptive subdomains is given in Fig. 2, left to right.
After partitioning the points in b 2 square cells, we have to answer the following inquiries, known respectively as containing query and range search. These computational issues can be briefly described as follows: (i) given the center of a subdomain Ω j , return the index k of the square cell in which that center is contained; (ii) given a subdomain Ω j and the nodes belonging to the corresponding square neighborhood, find all points that lie in that subdomain.
Therefore, the cell based containing query routine in item (i) provides the index k of the cell containing the subdomain center. Here, unlike (19), the cell coordinates k 1 and k 2 in (20) are replacing the generic point x with the specific point x c j = (x c j1 , x c j2 ), which represents the center of the subdomain Ω j (see [13]). As regards item (ii) we have to construct beforehand the square neighborhoods, since we need to consider the points that are located in the given neighborhood only, instead of all points of the domain Ω. In the above description we observe that Ω j is a generic subdomain, so we should think the subscript j is fixed. A practical example of the rule (20) for k = 80 (i.e., k 1 = 6 and k 2 = 5) and k = 160 (i.e., k 1 = 11 and k 2 = 10) is depicted in Fig. 2 (left). Then, after answering the first query, given a subdomain Ω j the search routine enables us to determine all points that are contained in Ω j . More precisely, since the center of such subdomain is located within the kth cell, the associated search technique identifies all data points which are in the kth cell and in its neighboring cells, see Fig. 2 (right). The algorithm also provides for the chance to reduce the number of neighboring cells to be examined when a cell is placed close to the boundary of the domain Ω. An outline of the routines for solving containing query and range search problems is illustrated in Procedure 2 and Procedure 3, respectively, while a summary of the whole adaptive algorithm for bivariate RBF-PUM interpolation is sketched in Algorithm 1.

Procedure 2: Containing Query
Step 1 Define the subdomain center of Ω j Step 2 Compute the cell coordinates (k 1 , k 2 ) given in (21) Step 3 Apply the rule (20) and obtain the index k of the cell containing the center of Ω j

Procedure 3: Range Search
Step 1 Define the square neighborhood associated with a subdomainΩ j Step 2 Compute the Euclidean distance between the center of Ω j and all data points belonging to the square neighborhood Step 3 Sort all computed distances and return all points contained in the subdomain Ω j

Complexity Analysis
In this subsection we discuss computational complexity of the adaptive algorithm. To keep the presentation easier, in the following we explicitly refer to the various steps of Algorithm 1. First of all, we observe that Steps 1, 2 and 3 are essentially preliminary phases, where we define the basic setup for the RBF-PUM and the related partition in cells of the domain Ω. This stage does not significantly influence the computational cost. Then, in Step 4 we build and apply the cell based structure to assign the corresponding cell to each interpolation point of X N . This step has a O(N ) cost. In the assessment of the total complexity we should however take into account the cost which derives from the storing of other points (e.g., evaluation points) or the number d of subdomains used to construct the PUM. In Step 5 for every subdomain Ω j , j = 1, . . . , d, we have to solve containing query and range search problems; in this phase, the process also needs to compute a square neighborhood consisting of nine cells, i.e. the ones at zero and one levels, respectively (see yellow and cyan cells in Fig. 2, left). Now, applying the containing query and range search routines, the complexity is of the order O(N ). The procedure is then iterated in Step 6 until the while-loop has reached the minimum number of prescribed points in each subdomain Ω j . In an adaptive process this could cause an increase of the computational cost, in particular when we have unstructured or significantly different data distributions within the domain. In fact, this stage establishes to update the subdomain size and, accordingly, to enlarge the square neighborhood. Here, we can also observe that the call to the range search routine requires a sort of all computed distances

Algorithm 1: Adaptive RBF-PUM Algorithm
Step 1 Create a cover of the domain Ω with subdomains Ω j of radius (15) Step 2 Define a partition of Ω in b 2 square cells, where b is given in (18) Step 3 Fix the minimum number N min of points required in each subdomain Ω j Step 4 Apply the cell based structure for the identification of points in the b 2 cells Step 5 For each Ω j , find the cell containing the subdomain center (containing query) and, generated the square neighborhoods formed by 3 2 cells, determine all points that lie in Ω j (range search) Step 6 While N Select the local RBF shape parameter via LOOCV or MLE Step 8 Compute local interpolants (4) and weight functions (5) Step 9 Evaluate the global interpolant (3) within the subdomains Ω j , j = 1, . . . , d (see Step 3 of Procedure 3). However, due to local use of a quicksort routine whose complexity is O (N j log N j ), the cost of this phase is estimated to be O(1). In Step 7, the algorithm finds automatically the shape parameters associated with the local RBF interpolants. As discussed in Sect. 3.2, the choice of ε can be done, either using LOOCV or MLE. Both methods require computation of the inverse matrix A −1 j , j = 1, . . . , d with a cost of O(N 3 j ). Even so, the actual selection of the ε-parameter is carried out in an efficient way by minimizing the cost functions (11) or (13)

Numerical Results
In this section we illustrate the performance of our adaptive algorithm, which is implemented in Matlab environment. All the numerical experiments have been carried out on a laptop with an Intel(R) Core i7 6500U CPU 2.50GHz processor and 8.00GB RAM, while the results are shown in tables and figures.
In the following we focus on a wide series of experiments carried out by running the adaptive algorithm for 2D RBF-PUM interpolation. This study aims to analyze the algorithm behavior when the use of an adaptive scheme is essential to get reliable results in broad sense. In doing so, in our tests we consider four different sets of irregularly distributed (or scattered) data points contained in the unit square Ω = [0, 1] 2 ⊂ R 2 . Each of these unstructured node distributions exhibits distinct features. The latter have been chosen to stress the adaptive method and see how the numerical algorithm works in various situations. For the sake of simplicity, we denote these four data sets with the names "Halton", "Ameoba", "Starfish" and "Strips", which are defined as follows: -"Halton" refers to a data set consisting of N = 4 096 low discrepancy Halton points [37] generated through the Matlab command haltonset(2,'Skip',1), see Fig. 3 top-left; -"Ameoba" is a data set characterized by five different distributions, with N = 8 419. In the middle of the domain we have a node distribution with 4 460 nodes within an Ameoba like shape region which is bounded by the parametric curve [34] r (θ ) = e sin(θ ) sin 2 (2θ) + e cos(θ ) cos 2 (2θ), θ ∈ [0, 2π), while the remaining area is split into four small "incomplete" squares containing 165, 257, 438 and 3 099 points, respectively, see Fig. 3 top-right; -"Starfish" identifies a data set containing four distributions of points with distinct densities in the domain and on the whole N = 11 436 interpolation nodes. In the central part of the domain we have a starfish like shape area with 2 547 points bounded by the parametric curve [24] r (θ ) = 0.8 + 0.1(sin(6θ) + sin(3θ )), θ ∈ [0, 2π).
The other parts of Ω are characterized by three oblique bands in which, excluding the starfish-like region, there are 1 287, 5 461 and 2 141 points, respectively, see Fig. 3 bottomleft; -"Strips" labels the last data set with N = 14 001 points. In this case we consider five vertical strips, each of them having a different node distribution. More precisely, left-toright, we move going from a low density to a high density of points, which is proved by the fact that strips of equal area contain 802, 1 800, 2 801, 3 800, and 4 798 points, respectively, see Fig. 3 bottom-right.
Besides selecting suitable subdomains of variable size in the PUM as discussed in Sect. 4, the adaptive algorithm also ensures dependable previsions of the RBF shape parameters via LOOCV or MLE (see Sect. 3.2). The detection of such parameters is completely automatic and the ε-computation is done by using the Matlab fminbnd minimization routine. Moreover, in (17) we assume that the value t k = 1 + k/8, with k = 1, 2, . . ., demanding that each subdomain contains at least N min = 15 data points. We thus show the results obtained by applying our adaptive PUM algorithm and using as local interpolants in (4) some of the SPD RBFs contained in Table 1. As a matter of fact, since we are interested in studying in depth how much our method is effective, the analysis is based on considering various local kernels, thus involving radial functions of different smoothness such as GA, IMQ, W6, M6, M4 and M2. In regard to Shepard's weight in (5) we take the compactly supported function W2.
In these experiments we analyze the performance of our algorithm taking the data values by three test functions. The former is known as Franke's function [26], and its analytic expression is The latter is a trigonometric function [29] of the form while the last one [7,21] is given by In Fig. 4 we show a graphical representation of the above functions, which are commonly used to test and validate new methods and algorithms, then making them usable in several fields of applied sciences and engineering. In order to investigate accuracy of the interpolation method, we compute the Root Mean Square Error (RMSE), whose formula is given by where the ξ i , i = 1, . . . , N eval , are a grid of evaluation points. Here, we assume N eval = 40 × 40. The target of our study is therefore two-fold: on the one hand, studying what is the level of accuracy that this adaptive interpolation method can achieve; on the other, analyzing the computational efficiency expressed in terms of CPU times (in seconds) of the proposed algorithm. In order to emphasize the benefit deriving from this new numerical code, we compare our implementation with a non-adaptive algorithm and an adaptive one characterized by the use of standard search procedures, i.e. without using the procedures discussed in Sect. 4.1.
First of all, we start with an analysis on the accuracy of our adaptive RBF-PUM interpolation scheme. Thus, in Tables 2, 3, and 4, we report the computation errors obtained by applying our algorithm on the four data sets and the three test functions previously mentioned. This study enables us to see how good the ε-predictions via LOOCV and MLE within our local method are. From this comparison it is quite clear that LOOCV seems to be able to provide a greater effectiveness, since its use usually leads to more accurate results than MLE. This fact is especially evident when a high regularity kernel such as GA is employed, instead of a limited smoothness kernel like M4. However, we can also observe that in general the LOOCV gives a slight improvement in term of precision, even if in some cases the benefit is around a order of magnitude. For this reason, in our next tests we will mainly focus on use of LOOCV technique.
Then, in Tables 5 and 6, our focus is to show the importance of our adaptive algorithm with respect to a non-adaptive one. This relevance is undeniable in terms of accuracy of the   numerical method (in these tests the PUM uses IMQ as local kernel). In fact, the benefit deriving from adaptivity is noteworthy, not only when very irregularly distributed data sets (e.g., Ameoba, Starfish and Strips) are considered but also in case of quasi-random data points (Halton), see Fig. 3. In particular, from previous tables we highlight that the adaptive algorithm gives results that-when they are computable-are about two or three orders of magnitude more accurate than the non-adaptive one. Indeed, we note as the non-adaptive interpolation algorithm can be applied successfully only in two cases (Halton and Starfish), while in the other two situations (Ameoba and Strips) is not possible to get any result. This drawback is essentially due to the fact that the non-adaptive method cannot find points in every subdomain, thus making the interpolation process not practicable. In Tables 5 and 6 we denote this computational issue with the symbol -. As regards the computational efficiency expressed in CPU times, the adaptive implementation is obviously a little more costly than the non-adaptive one. Nevertheless, as outlined from our experiments, this extra-work is quite limited and fully compensated by high level of reliability of the new adaptive algorithm.
In Fig. 5 we study the behavior of interpolation errors (RMSEs) and execution times (CPU times) by varying the minimum number N min of data points that is required to lie in each subdomain. In this analysis, for shortness, we focus our attention on a specific case, comparing the behavior of LOOCV and MLE techniques for fixed values of N min ∈ {10, 12, . . . , 30}.
Here the algorithm is tested on the "Strips" data set for f 1 , using M6 as a local kernel. Figure 5 (left) confirms once more that LOOCV results in greater accuracy than MLE; at the same time, these tests show that the highest level of precision due to LOOCV is obtained  Fig. 5 (right), as expected, we can also note that for both LOOCV and MLE the CPU times grow as N min increases, thus making the algorithm computationally more expensive. The given results are not immutable facts, because they are obviously influenced by several variables present in these numerical experiments (e.g., local kernel, test function, data set, etc.). However, this analysis offers useful information for the choice of appropriate values of N min . Moreover, it points out that the selected value N min = 15 turns out to be a good choice for our purposes. This statement is also true if we analyze results contained in Fig. 6, where we use the LOOCV based estimator and compare M6 and W6 to see how RMSEs and CPU times change by varying N min . In this case, we report graphs obtained by running our algorithm on the "Halton" data set for f 3 . From these results we notice that the accuracy of globally supported M6 is slightly better than compactly supported W6. The same considerations can be extended-even in a more pronounced way-for the execution time.
Finally, in Tables 7 and 8 we further test our adaptive interpolation scheme on the test function f 1 . More precisely, here we are interested in comparing execution times obtained by running our new adaptive algorithm and a standard one characterized by the use of the Matlab rangesearch routine, instead of the cell based procedures discussed in Sect. 4. Both algorithms make use of LOOCV for the ε-selection. In Table 7 we report the results computed with the M2 kernel and for the four data sets introduced at the beginning of this section, while in Table 8 we consider the local kernel M4 and five sets of Halton data points whose number N goes from 1 089 to 263 169. Especially looking at Table 8, we may highlight the great speed-up between the two algorithms, underlining as this gap tends to be more and more remarkable when the number of interpolation nodes increases. These results show the significant improvement in terms of computational efficiency that the use of our adaptive algorithm produces.  Execution times are obtained by comparing our new adaptive algorithm with a standard implementation characterized by the use of the Matlab rangesearch routine, instead of that discussed in Sect. 4; in the latter case, the speed-up (ratio of execution times) between the standard algorithm and the new one is given Execution times are obtained by comparing our new adaptive algorithm with a standard implementation characterized by the use of the Matlab rangesearch routine, instead of that discussed in Sect. 4; in the latter case, the speed-up (ratio of execution times) between the standard algorithm and the new one is given

Applications
In this section we test our adaptive algorithm on two real world data sets. In the first example, we consider an application oriented to approximate the Black Forest elevation data set [8,14], which consists of 15 885 data points. This data set refers to a region in the neighborhood of Freiburg (Germany). It represents a specific case of scattered data with highly varying density as shown in Fig. 7 (left). In the second example, we focus on the approximation of the so-called Gattinara topography data set, which is characterized by 10 671 data points. The latter belong to the homonymous geographic area, close to the city of Gattinara in province of Vercelli (Italy). Also in this situation, although the data set distribution is quite different from the Black Forest one, we have a typical case of very irregularly distributed data points as evident from Fig. 7 (right). Both regions are mountain areas: the differences in height are 1214 m and 309.87 m, respectively, while minimum and maximum heights for such data are gathered in Table 9. In Fig. 8 we report a 3D view for the Black Forest (left) and Gattinara (right) data sets.
Since we are working with real data (and therefore we do not have any exact or true solution), we assess reliability of our results by considering a technique that is commonly  used in applications. For Black Forest data set we have 15 885 elevation data points, which we split into two subsets: first, we randomly select N = 15 715 nodes for the RBF-PUM interpolation process; second, we reserve the remaining N eval = 170 evaluation points for the cross validation. Roughly in the same way, we act for Gattinara data set. We start from the 10 671 points, and then we subdivide this data set by taking N = 10 600 interpolation nodes and N eval = 71 evaluation points. In Table 10, we report the numerical results obtained by applying our adaptive algorithm. The latter are computed by using M2 as local RBF interpolant and selecting the local shape parameters via LOOCV or MLE, with N min = 25. From this table we can observe as LOOCV and MLE provide a very similar accuracy, with a slight prevalence of LOOCV, while-as already outlined in numerical experiments of Sect. 5-the MLE is about twice faster than LOOCV. Note that, although such errors are larger than the ones shown in Sect. 5, they turn out to be consistent with the previous results; in fact, in these real world situations, the error in (22) is measured in meters. From more extensive tests we can also point out that in the Black Forest case LOOCV and MLE seem to have a similar predictive capability because the same level of accuracy is achieved, even when N min varies between 10 and 30 (see Fig. 9, left). For the Gattinara data set we can observe that the MLE is more accurate than LOOCV for smaller values of N min , whereas this behavior is reversed for larger ones (see Fig. 9, right). Instead, as regards

Conclusions and Future Work
In this paper we proposed a new adaptive algorithm for bivariate interpolation of large scattered data points through the RBF-PUM. We showed performance and efficacy of our numerical method by solving interpolation problems with artificial and real data sets, which were very irregularly distributed or with highly varying density in the domain. Compared to non-adaptive or standard RBF-PUM implementations, this adaptive algorithm enabled us to achieve accurate solutions for problems which in some cases might be unsolvable, also significantly reducing computational cost and execution time. All these results have been obtained by mainly exploiting the meshfree nature of the RBF-PUM. We thus created an adaptive scheme with subdomains of variable size, implementing an efficient search procedure for the localization of data points. The adaptive algorithm has been devised to effectively find optimal values of the RBF shape parameters by using LOOCV or MLE based criteria. This choice makes the scheme entirely automatic.
As future work we propose to further enhance our adaptive algorithm, for example optimizing the selection of the minimal number of points within each subdomain. Further studies in this direction will be discussed in future works.