Environmental and Ecological Statistics

Volume 21, Issue 4, pp 715–731

Finite area smoothing with generalized distance splines

Abstract

Most conventional spatial smoothers smooth with respect to the Euclidean distance between observations, even though this distance may not be a meaningful measure of spatial proximity, especially when boundary features are present. When domains have complicated boundaries, leakage (the inappropriate linking of parts of the domain which are separated by physical barriers) can occur. To overcome this problem, we develop a method of smoothing with respect to generalized distances, such as within-domain distances. We obtain the generalized distances between our points and then use multidimensional scaling to find a configuration of our observations in a Euclidean space of 2 or more dimensions, such that the Euclidean distances between points in that space closely approximate the generalized distances between the points. Smoothing is performed over this new point configuration, using a conventional smoother. To mitigate the problems associated with smoothing in high dimensions we use a generalization of thin plate spline smoothers proposed by Duchon (Constructive theory of functions of several variables, pp 85–100, 1977). This general method for smoothing with respect to generalized distances improves on the performance of previous within-domain distance spatial smoothers, and often provides a more natural model than the soap film approach of Wood et al. (J R Stat Soc Ser B Stat Methodol 70(5):931–955, 2008). The smoothers are of the linear basis with quadratic penalty type, and so are easily incorporated into a range of statistical models.

Keywords

Finite area smoothing, Generalized additive model, Multidimensional scaling, Spatial modelling, Splines

1 Introduction

In ecology one would often like to create a smooth map of some noisy response as a function of geographical coordinates. In such cases, care must be taken to account for the structure of the domain which is being modelled. If the domain is bounded (finite area smoothing) then problems can occur when the smoother does not respect the boundary shape appropriately, especially when the shape of the boundary is complex. This complexity may manifest itself as some peninsula-like feature(s) in the domain with notably different observation values on either side of the feature. Features such as peninsulae give rise to a phenomenon known as leakage. The top two panels of Fig. 1 show an example of leakage (taken from Wood et al. 2008) where the high values in the upper half of the domain (top panel) leak across the gap to the lower values below and vice versa (second panel). The phenomenon is problematic since it causes the fitted surface to be mis-estimated; this can then lead to incorrect inference, in particular the mis-identification of “hot spots” in biological populations.
Fig. 1

Top row (left to right): the modified Ramsay horseshoe function from Wood et al. (2008), predictions from models using thin plate regression splines (“tprs”) and MDSDS (“mdsds”). Bottom row: predictions from the soap film smoother (“soap”) and geodesic low-rank thin plate splines (“gltps”). Predictions were made from models fitted to 600 points sampled from the horseshoe, with standard normal noise added. Note that the predicted surface from the thin plate regression spline fit shows severe leakage

The problem of leakage arises because spatial smoothers consider proximate data to be similar, but in almost all cases distance between data locations is measured using straight line (Euclidean) distance. This approach is flawed in cases in which Euclidean distance is not a meaningful measure of proximity. For example, since whales do not travel on land, the meaningful distance between sightings of two whales on either side of the Antarctic peninsula is not the straight line distance across the peninsula, but the shortest path between them that stays entirely in open water. This issue is ubiquitous in spatial ecology. Natural and man-made barriers carve up the landscape (and seascape), partitioning biological populations; spatial models should take this into account.

In this article we propose a general method for smoothing, based on generalized distances between points. We apply this to produce a finite area smoother, based on the within-area distances between points in the domain of interest. The general approach uses multidimensional scaling (MDS; e.g. Chatfield and Collins 1980, Chapter 10) to associate a location in a \(\fancyscript{D}\) dimensional Euclidean space (p-space) with each original data point. The Euclidean distances between points in p-space then approximate the original generalized distances between the points. Smoothing is then performed with respect to locations in p-space. Reasonable approximation of the generalized distances by the Euclidean distances in p-space can require \(\fancyscript{D}\) to be greater than the 2–4 dimensions in which conventional multidimensional smoothers work well. For this reason we revisit the general class of smoothers proposed in Duchon (1977), selecting a smoother that behaves well with increasing dimension. Note that when applied to the finite area problem our generalized distance smoother can be viewed as an extension of Wang and Ranalli (2007), albeit a somewhat better founded one (as we argue below).

The use of multidimensional scaling in spatial statistics is not new, especially in the kriging literature. For example, Sampson and Guttorp (1992) model spatial covariance functions by computing a distance measure based on the observed spatial covariances, then project the points using MDS before kriging (points out of sample are found using a thin plate spline). Their approach differs from ours in a number of ways: (i) they require multiple observations at each location in order to calculate the covariances, (ii) only projections in 2 dimensions are considered and (iii) non-metric MDS is used. Further applications in geostatistics are described in Sect. 5 and compared to the method proposed here.

The smoother proposed here has the attractive property of being representable using a linear basis expansion with an associated quadratic penalty. Such basis-penalty smoothers have a dual interpretation as Gaussian random fields (Rue and Held 2005), and are appealing because of the ease with which they can be incorporated as components of other models. Such components include, for example, varying-coefficient models, random/mixed effects models and signal regression models, as well as the focus for this article: generalized additive models (see e.g. Ruppert et al. 2003; Wood 2006, for overviews). This flexibility is vital in ecological applications, where a spatial smooth is usually only one part of a much larger model.

Before presenting our proposed method in detail we now briefly review spline type spatial smoothers, and previous approaches to the finite area smoothing problem in the additive model literature.

1.1 Spline smoothing for spatial data

In the simplest case, we wish to find an \(f\) which is a smooth function of spatial coordinates, \(x_1\) and \(x_2\). We model \(f\) using a basis function expansion:
$$\begin{aligned} f(x_{1}, x_{2}) = \sum _{k=1}^K \beta _k b_k(x_{1}, x_{2}), \end{aligned}$$
(1)
where the \(\beta _k\)s are coefficients to be estimated and the \(b_k\)s are flexible (known) basis functions, such as thin plate spline basis functions or tensor products of B-splines.
If \(K\) is made large enough to avoid substantial model mis-specification bias, then the estimates of \(f\) are almost certain to over-fit any data to which they are fitted. For this reason it is usual to associate a measure \(J(f)\) of function wiggliness with \(f\), and to use this to penalize overfit during model estimation. For example, consider the simple generalized linear model
$$\begin{aligned} y_i \sim \text {EF}(\mu _i, \phi ),~~~~ g(\mu _i) = \eta _i =f(x_{1i},x_{2i}) \end{aligned}$$
where EF denotes an exponential family distribution with mean \(\mu _i\) and scale parameter \(\phi \), while \(g\) is a known link function and \(\eta _i\) is known as the ‘linear predictor’ of \(y_i\). Letting \(l({\varvec{\beta }})\) be the log likelihood then estimation of \(\varvec{\beta }\) is by maximization of
$$\begin{aligned} l({\varvec{\beta }}) - \lambda /2~J(f) \end{aligned}$$
where \(\lambda \) is a tunable smoothing parameter, used to control the wiggliness of the estimate of \(f\). \(\lambda \) is typically estimated by GCV or marginal likelihood maximization (Wood 2011). A popular \(J\!(f)\) in the spatial context is the 2 dimensional thin plate spline penalty
$$\begin{aligned} J\!(f)=\int \left( \frac{\partial ^2 f}{\partial x_1^2}\right) ^2 + 2\left( \frac{\partial ^2 f}{\partial x_1 \partial x_2}\right) ^2 + \left( \frac{\partial ^2 f}{\partial x_2^2}\right) ^2 dx_1 dx_2 \end{aligned}$$
which can conveniently be written as a quadratic form in \(\varvec{\beta }\).
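To make the quadratic form explicit, one can substitute the basis expansion (1) into the penalty. The matrix name \(\mathbf{P}\) below is a label introduced here for illustration only and is not used elsewhere in the article. Since each second derivative of \(f\) is linear in \(\varvec{\beta }\),
$$\begin{aligned} J\!(f) = \varvec{\beta }^T \mathbf{P} \varvec{\beta }, \qquad P_{jk} = \int \frac{\partial ^2 b_j}{\partial x_1^2}\frac{\partial ^2 b_k}{\partial x_1^2} + 2 \frac{\partial ^2 b_j}{\partial x_1 \partial x_2}\frac{\partial ^2 b_k}{\partial x_1 \partial x_2} + \frac{\partial ^2 b_j}{\partial x_2^2}\frac{\partial ^2 b_k}{\partial x_2^2} ~dx_1 dx_2, \end{aligned}$$
so that the penalized objective becomes \(l({\varvec{\beta }}) - \lambda \varvec{\beta }^T \mathbf{P} \varvec{\beta }/2\), with \(\mathbf{P}\) depending only on the (known) basis functions.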

Within this framework it is straightforward to allow \(\eta _i\) to depend on multiple smooth functions of various predictor variables, as well as on conventional parametric terms that are linear in any unknown parameters (Hastie and Tibshirani 1990). Such models are widely used in quantitative ecology, for example in the creation of density maps which can then be integrated over the domain to obtain an abundance estimate (e.g., Williams et al. 2011; Miller et al. 2013) or as part of a larger model, taking into account nuisance spatial effects (e.g., Augustin et al. 2009).

2 Previous approaches to the problem of leakage

There are three main types of existing approaches for dealing with the finite area smoothing problem.

2.1 Partial differential equation methods

Ramsay (2002) exploited the link between smoothing with differential operator based penalties and partial differential equations to produce a smoother defined as the solution to a particular PDE problem defined only over a finite area. His FELSPINE method uses a finite element method to compute a smoother, based on the penalty
$$\begin{aligned} J\!(f)=\int \limits _\varOmega \left( \frac{\partial ^2 f}{\partial x_1^2} + \frac{\partial ^2 f}{\partial x_2^2} \right) ^2 dx_1 dx_2 \end{aligned}$$
(2)
where \(\varOmega \) is the region of the \(x_1\), \(x_2\) plane of interest. Ramsay had to use the very strong boundary condition that contours of \(f\) meet the boundary of \(\varOmega \) at right angles, which leads to artefacts when the condition does not hold (see Wood et al. 2008). The computational method also makes it awkward to include such terms in larger models.

Wood et al. (2008) use the physical analogy of a soap film to motivate an alternative which can be represented as a basis penalty smoother, and has better boundary behaviour. First consider the domain boundary to be made of wire, then dip this wire into a bucket of soapy water; a soap film with the same shape as the boundary will have then formed. If the wire lies in the spatial plane, the height of the soap film at a given point is the value of the smooth at that point. This film is then distorted smoothly toward each datum, while minimising the overall surface tension in the film. Mathematically the soap film consists of two sets of basis functions, one that is based entirely inside the domain (a set of interior knot locations are specified) and one that is induced by the (known or estimated) boundary values. These functions are found by solving Poisson and Laplace’s equations in two dimensions. The penalty associated with the former set is again (2).

The soap film approach has the basis-penalty form that is convenient for applied work and solves the boundary leakage problem, but basis setup is quite computationally expensive, and for many applications the approach is less natural than smoothing using within domain distances. A further problem with the soap film approach is that no distinction exists between ‘open’ boundaries (for example a boundary that is simply the limit of the region surveyed) and ‘hard’ boundaries (real physical barriers).

2.2 Within-area distances

Wang and Ranalli (2007) propose to replace straight-line distances with ‘geodesic’ distances in a smoother that is a sort of approximate thin plate spline (Geodesic Low-rank Thin Plate Splines, GLTPS). To calculate the geodesic distances, a graph is constructed in which each vertex is the location of an observation and is connected only to its \(k\) nearest neighbours. The within-area distance between each pair of vertices is approximated using Floyd’s algorithm (Floyd 1962) to find the shortest path through the graph. This algorithm is cubic in the number of data, making the approach costly for large datasets. At large sample sizes the geodesic distances will tend towards the ‘within-area distance’, i.e. the shortest path between two points that lies entirely within the domain of interest (Bernstein et al. 2000).
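The graph-based distance calculation described above can be sketched as follows. This is a minimal illustration rather than Wang and Ranalli's implementation: the function name `knn_graph_distances`, the symmetric treatment of edges and the toy coordinates are all assumptions made here, and no check is made that an edge stays inside the domain.

```python
import math

def knn_graph_distances(points, k):
    """Approximate geodesic distances by shortest paths on a k-nearest-neighbour graph."""
    n = len(points)
    # Start with no edges: infinite distance between distinct vertices.
    dist = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for i in range(n):
        # Euclidean distances from vertex i to every other vertex, nearest first.
        nearest = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        for length, j in nearest[:k]:
            # Assumption: edges are undirected, so i-j is kept if either end selects it.
            dist[i][j] = dist[j][i] = min(dist[i][j], length)
    # Floyd's algorithm: allow each vertex in turn as an intermediate stop (cubic in n).
    for via in range(n):
        for i in range(n):
            for j in range(n):
                through = dist[i][via] + dist[via][j]
                if through < dist[i][j]:
                    dist[i][j] = through
    return dist

# Toy U-shaped set of locations: two arms joined along the bottom (made-up values).
points = [(0, 3), (0, 2), (0, 1), (0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2), (3, 3)]
geodesic = knn_graph_distances(points, k=2)
```

In this toy configuration the two arm tips are 3 units apart in a straight line, but the graph distance `geodesic[0][9]` follows the U and is three times as long, which is the behaviour the geodesic approach relies on.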

Wang and Ranalli use their geodesic distances in place of the usual Euclidean distances in the radial basis functions used to define a thin plate spline. They leave the basis for the nullspace of the thin plate spline penalty (i.e. those functions which are not penalised; in the case of 2-dimensional smoothing, linear functions of the coordinates and the plane) unchanged, so some linkage across boundary features remains in the smoother (since nullspace functions are defined over \(\mathbb {R}^2\) in the 2-dimensional case). The principal difficulty in interpreting the results of their method is that it is unclear what their penalty term penalizes. The interpretational difficulty arises because Wang and Ranalli’s expressions (3) and (9) involve the square roots of matrices that are not positive semi-definite. In the case of their expression (3), which relates to a thin plate spline, this problem would be rectifiable if the spline coefficients had the usual thin plate spline linear constraints applied in order to force positive definiteness on the spline penalty. However in the case of (9), which defines their geodesic splines, there appears to be no sensible way to obtain positive semi-definiteness. This is a problem because matrix square roots in general only exist for positive semi-definite matrices plus some rather special cases not useful here (see e.g., Higham 1987). It appears that for computational purposes Wang and Ranalli have used the generalization of a matrix square root given in appendix A.2.11 of Ruppert et al. (2003), but this square root lacks the basic properties that would allow Wang and Ranalli’s (2) to be interpretable exactly as a (reparameterized) thin plate spline, or for it to be possible to work out what the penalty on their geodesic spline is actually penalizing.

The Complex Region Spatial Smoother (CReSS) of Scott-Hayward et al. (2013) adapts GLTPS in two ways. First, in building the graph, edges are only drawn between two points if the straight line drawn between the points lies entirely within the boundary (boundary vertices are also included, in addition to observations). Second, a set of local radial basis functions is used (with a tunable parameter controlling the locality of the basis). An AICc-weighted average over a series of models with different basis sizes, knot locations and locality of the basis functions is used for prediction. Unlike in Wang and Ranalli (2007), the nullspace of the basis is removed.

The combination of an unmodified nullspace, an opaque penalty and the \(O(n^3)\) computational cost of distance calculation is of some concern for practical work. In the case of CReSS, the necessity of running many models also creates a substantial computational burden. For both interpretive and computational reasons it seems worthwhile to investigate alternative ways of using the within-area distance idea, avoiding these difficulties.

2.3 Domain warping

Paul Eilers (in a seminar at University of Munich in 2006) suggested conformally mapping the smoothing domain to a convex one via the Schwarz-Christoffel transformation (Driscoll and Trefethen 2002). The idea is that smoothing can then be conducted on the convex domain, without leakage problems. The first author has extensively investigated such an approach (Miller 2012, Chapter 3). Although it is possible to warp the boundary of the region into a shape such as a rectangle, the resulting distortions in the positions of observations inside the region lead to observations with vastly differing response values being “squashed” together, while other areas contain no observations. These distortions in observation density make smoothing more difficult and cause artefacts that are significantly more problematic than the leakage effects that the method seeks to avoid.

The methods proposed in the next section can be viewed as an attempt to put within-area distance methods on a more interpretable foundation by using an extension of the notion of domain warping.

3 The generalized distance smoothing model

We assume that we want to model a response \(y_i\) which is dependent on covariates via a linear predictor \(\eta _i\). Our model is then
$$\begin{aligned} \eta _i = \alpha _i + f(\mathbf{d}_i) \end{aligned}$$
where \(\alpha _i\) may depend linearly on further model coefficients (or may simply be zero). \(f\) is a smooth function, dependent on \(\mathbf{d}_i\), a vector of generalized distances between the \(i^\mathrm{th}\) observation and either (i) the other observations, or (ii) some set of ‘reference points’.
We complete the model by setting
$$\begin{aligned} f(\mathbf{d}_i) = f_{\fancyscript{D}}\{\mathbf{x}(\mathbf{d}_i)\} \end{aligned}$$
where \(\mathbf{x}(\mathbf{d})\) is the location of the point with distance vector \(\mathbf d\) in the \(\fancyscript{D}\) dimensional Euclidean space determined by multidimensional scaling applied to either (i) the distance matrix for the complete data set or (ii) the distance matrix for the set of ‘reference points’ (see below). \(f_{\fancyscript{D}}\) is a \(\fancyscript{D}\) dimensional Duchon spline (Duchon 1977), a generalization of the familiar thin plate spline.

The key idea here is that we smooth over a Euclidean space in which the Euclidean inter-observation distances are approximately equal to the original generalized distances. That is, \(\Vert \mathbf{x}(\mathbf{d}_i) - \mathbf{x}(\mathbf{d}_j) \Vert \approx d_{ij}\), where \(d_{ij}\) is the generalized distance between points \(i\) and \(j\) (\(\Vert \cdot \Vert \) is the Euclidean norm). The choice of \(\fancyscript{D}\) determines the accuracy of the distance approximation. This can either be part of model specification, in which case \(\fancyscript{D}\) is chosen to achieve some specified level of approximation accuracy, or, more pragmatically, can be chosen to optimize estimated prediction error (e.g. GCV score).

In the case of finite area smoothing, the elements of \(\mathbf{d}_i\) are ‘within-area’ distances between points, that is to say the shortest path between two points, such that the path lies entirely within the domain of interest. We will refer to the original 2 dimensional data co-ordinates as being elements of the ‘o-space’ while \(\fancyscript{D}\) dimensional co-ordinates in the MDS projection will be referred to as elements of the ‘p-space’. Web Appendix B gives an algorithm for calculating within-area distances for simple polygons.

These smoothers will be henceforth referred to as MDSDS (Multi-Dimensionally Scaled Duchon Splines), and the next three subsections provide the details for the MDS, smoothing and \(\fancyscript{D}\) selection steps.

3.1 MDS as a transformation of space

In this section we consider the construction of the mapping \(\mathbf{x}(\mathbf{d})\) by multidimensional scaling (MDS; Gower 1968). We start the process by choosing a representative set of locations of size \(n_s\) within the domain of interest (i.e. in o-space). This set might be all the locations at which we have observations, but in the case of finite area smoothing we would usually choose a set of locations spread uniformly over the region of interest in order to ensure that all the important geographic features in o-space will be represented in p-space.

The generalized distances between all pairs of points in the representative set are then computed, in order to obtain a matrix \(\mathbf D\) such that \(D_{ij}\) is the squared generalized distance between points \(i\) and \(j\). MDS then finds a configuration of points in \(\fancyscript{D}\) dimensional Euclidean space such that the Euclidean distances between the points approximate the original generalized distances. The recipe for achieving this is straightforward. Defining \(\mathbf{H} = \mathbf{I} - \mathbf{1}\mathbf{1}^T/n_s\) (where \(\mathbf{1}\) is a vector of \(1\)s) we can obtain the double centred version of \(\mathbf{D}\), \(\mathbf{S} = - \mathbf{HDH}/2\), which is then eigen-decomposed
$$\begin{aligned} \mathbf{S} = \mathbf{U}{{\varvec{\varLambda }}}\mathbf{U}^T. \end{aligned}$$
The rows of \(\mathbf{X }= \mathbf{U}{{\varvec{\varLambda }}}^{1/2}\) then give locations in Euclidean space, such that the interpoint Euclidean distances in that space approximate the original generalized distances. The dimension of the resulting space is at most \(n_s\); however, in practice not all of the eigenvalues of \(\mathbf{S}\) may be positive. Using the first \(\fancyscript{D}\) columns of \(\mathbf X\) (with the eigenvalues arranged in decreasing order) gives a \(\fancyscript{D}\) dimensional p-space. See e.g. Chatfield and Collins (1980, Chapter 10) for more details.
We then require some means for finding the location in p-space of a point in o-space that was not used to calculate \(\mathbf D\). Let \(\mathbf d\) be the \(n_s\) vector of squared generalized distances (in o-space) between this new point and the points in the original representative set. Gower (1968) gives the following interpolation formula for the location of the new point in p-space
$$\begin{aligned} \mathbf{x} = - \frac{1}{2} {{\varvec{\varLambda }}}^{-1/2} \mathbf{U}^T\mathbf{d}^\prime \end{aligned}$$
where \(d^\prime _i = d_i - S_{ii}\). Again, \(\mathbf x\) would usually be truncated, retaining only its first \(\fancyscript{D} \) components.
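The MDS recipe and the insertion of a new point can be sketched in a few lines of NumPy. This is an illustrative sketch only: the function names `classical_mds` and `gower_insert` and the toy coordinates are assumptions introduced here, not part of the authors' software. The interpolation step is written with the factor of one half that results from expanding the squared distances about the centred configuration, so that a point lying in the same Euclidean space as the reference set is placed consistently with it.

```python
import numpy as np

def classical_mds(D2, ndim):
    """Classical MDS from an (n_s x n_s) matrix of squared distances D2."""
    n = D2.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n        # centring matrix H = I - 11^T / n_s
    S = -0.5 * H @ D2 @ H                      # double centred matrix S = -HDH/2
    evals, evecs = np.linalg.eigh(S)           # eigh returns eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:ndim]     # indices of the ndim largest eigenvalues
    lam, U = evals[order], evecs[:, order]
    if np.any(lam <= 0):
        raise ValueError("requested dimension uses non-positive eigenvalues")
    X = U * np.sqrt(lam)                       # rows of X are the p-space locations
    return X, U, lam, np.diag(S)

def gower_insert(d2_new, U, lam, s_diag):
    """Place a new point in p-space from its squared distances to the reference set."""
    d_prime = d2_new - s_diag                  # d'_i = d_i - S_ii
    return -0.5 * (U.T @ d_prime) / np.sqrt(lam)

# Toy example: five reference points in the plane and one new point (made-up values).
ref = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.5]])
D2 = ((ref[:, None, :] - ref[None, :, :]) ** 2).sum(axis=-1)
X, U, lam, s_diag = classical_mds(D2, ndim=2)
new = np.array([0.3, 0.7])
x_new = gower_insert(((ref - new) ** 2).sum(axis=1), U, lam, s_diag)
```

Because the toy distances here are exactly Euclidean in two dimensions, the p-space configuration reproduces them exactly (up to rotation, reflection and translation); with within-area distances the reproduction is only approximate and improves as more dimensions are retained.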

So, MDS combined with Gower’s interpolation formula provides a means for constructing and computing with \(\mathbf{x}(\mathbf{d})\). We now turn to the construction of a suitable smoother in p-space.

3.2 Smoothing with Duchon splines

In order for our smoother to have a convenient basis-penalty form, we need to smooth in p-space using a basis-penalty smoother. A thin plate spline (TPS) is the obvious choice for smoothing arbitrarily scattered data where the Euclidean distance between points determines similarity, but there is a technical problem. To achieve a smooth \(f\) requires \(2m >{\fancyscript{D}}\), where \(m\) is the order of differentiation in the TPS penalty, and the dimension of the space of unpenalized functions in a TPS basis is \(M=\left( \begin{array}{c} m + {\fancyscript{D}} - 1 \\ {\fancyscript{D}}\end{array}\right) \). As Fig. 2 shows, the minimum possible \(M\) increases rapidly with \(\fancyscript{D}\), leading to the danger of substantial undersmoothing as \(\fancyscript{D}\) increases.
Fig. 2

Relationship between smoothing dimension (\(d\)) and the nullspace dimension (\(M\)) when \(m\) (the derivative penalty order) is set to 2 for thin plate regression splines (dashed) and Duchon splines (solid). Note that as the nullspace dimension increases, the complexity of those functions in the nullspace increases too. For the thin plate splines a combination of the continuity condition that \(2m>d\) and the form of \(M\) makes the size of the nullspace increase very quickly with smoothing dimension
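The growth of the nullspace can be reproduced directly from the formula for \(M\). The short sketch below assumes that, for the thin plate spline, \(m\) is taken as the smallest integer satisfying \(2m>{\fancyscript{D}}\), and that the Duchon spline keeps \(m=2\) throughout; the function names are hypothetical.

```python
from math import comb

def tps_nullspace_dim(D):
    """Smallest possible thin plate spline nullspace dimension M in D dimensions."""
    m = D // 2 + 1                 # smallest integer m with 2m > D
    return comb(m + D - 1, D)

def duchon_nullspace_dim(D, m=2):
    """Nullspace dimension when the penalty order m is held fixed (Duchon spline)."""
    return comb(m + D - 1, D)

# Tabulate both nullspace dimensions for a range of smoothing dimensions.
for D in range(2, 9):
    print(D, tps_nullspace_dim(D), duchon_nullspace_dim(D))
```

Under these assumptions the two agree for 2 and 3 dimensions, but at 4 dimensions the thin plate spline already has 15 unpenalized functions against 5 for the Duchon spline with \(m=2\).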

To combat this problem we use a more general version of the thin plate spline from the larger class of functions considered in Duchon (1977), which will allow us to obtain a smoother for which \(M = {\fancyscript{D}}+1\). Duchon (1977) is somewhat inaccessible, and this larger class has been almost entirely ignored in the statistical literature, so we provide a brief summary here.

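The difference between general Duchon splines and thin plate splines is in the smoothing penalty used. To understand the difference it helps to start with the general TPS penalty (in the integrals and derivative indices that follow, \(d\) denotes the smoothing dimension \(\fancyscript{D}\))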
$$\begin{aligned} J_{m,{\fancyscript{D}}} = \int \limits _{\mathbb {R}^d} \sum _{\nu _1 + \dots + \nu _d=m} \frac{m!}{\nu _1! \dots \nu _d!} \left( \frac{\partial ^m f \left( \mathbf {x} \right) }{\partial x_1^{\nu _1} \ldots \partial x_d^{\nu _d}} \right) ^2 \text {d} x_1 \ldots \text {d} x_d. \end{aligned}$$
(3)
By Plancherel’s theorem (e.g. Vretblad 2003, p. 180) if we take the Fourier transform, \(\mathfrak {F}\), of the derivatives in (3) then the penalty can be re-expressed as
$$\begin{aligned} J_{m,{\fancyscript{D}}} = \int \limits _{\mathbb {R}^d} \sum _{\nu _1 + \dots + \nu _d=m} \frac{m!}{\nu _1! \dots \nu _d!} \left( \mathfrak {F} \frac{\partial ^m f}{\partial x_1^{\nu _1} \ldots \partial x_d^{\nu _d}} \left( \varvec{\tau }\right) \right) ^2 \text {d} \varvec{\tau }, \end{aligned}$$
(4)
where \(\varvec{\tau }\) are now frequencies, rather than locations. Duchon then considers weighting the Fourier transform of the derivatives by some power of frequency, effectively increasing the penalization of high frequency components in the spatial derivatives if the power is positive. The resulting penalty is
$$\begin{aligned} \breve{J}_{m,{\fancyscript{D}}} = \int \limits _{\mathbb {R}^d}\Vert \varvec{\tau } \Vert ^{2s} \sum _{\nu _1 + \dots + \nu _d=m} \frac{m!}{\nu _1!\dots \nu _d!}\left( \mathfrak {F} \frac{\partial ^m f}{\partial x_1^{\nu _1} \ldots \partial x_d^{\nu _d}} \left( \varvec{\tau }\right) \right) ^2\text {d} \varvec{\tau }, \end{aligned}$$
(5)
where \(-{\fancyscript{D}} < 2s < {\fancyscript{D}}\) and the restriction \(m + s > {\fancyscript{D}}/2\) is applied to ensure continuity of the splines that result from use of this penalty.
Duchon shows that the function minimising (5) while interpolating or smoothing data at locations \(\mathbf{x}_i\) has the form
$$\begin{aligned} f(\mathbf{x}) = \sum _i^n \gamma _i K_{2m+2s-{\fancyscript{D}}}(\Vert \mathbf{x}-\mathbf{x}_i\Vert ) + \sum _j^M \alpha _j \phi _j(\mathbf{x}) \end{aligned}$$
where the \(\phi _j(\mathbf{x})\) form an orthogonal basis of polynomials of order \(<m\), \(\mathbf{x}_i\) is the \(i\)th observation location and \(\gamma _i\) and \(\alpha _j\) are coefficients to be estimated, subject to the \(M\) linear constraints
$$\begin{aligned} \sum _i^n \gamma _i \phi _j(\mathbf{x}_i)=0. \end{aligned}$$
(6)
The other basis functions are given by
$$\begin{aligned} K_d(t) = \left\{ \begin{array}{ll} (-1)^{(d+1)/2}|t|^d &{} d \text {~~odd}\\ (-1)^{d/2}|t|^d\log |t| &{} d \text {~~even} \end{array} \right. \end{aligned}$$
Finally, given the linear constraints (6),
$$\begin{aligned} \breve{J}_{m,{\fancyscript{D}}} = {\varvec{\gamma }} ^T\mathbf{K} {\varvec{\gamma }} \end{aligned}$$
where \(K_{ij} = K_{2m+2s-{\fancyscript{D}}}(\Vert \mathbf{x}_i-\mathbf{x}_j\Vert )\). Notice that \(s=0\) gives a conventional TPS. When \(s=0\) the functions \(\phi _j\) form the nullspace of the penalty and are therefore unpenalized.
As an example of using this basis and penalty, we note that estimating \(f\) to minimize
$$\begin{aligned} \sum _i^n (y_i - f(\mathbf{x}_i))^2 + \lambda \breve{J}_{m,{\fancyscript{D}}} \end{aligned}$$
now reduces to the straightforward optimization problem of finding \(\hat{\varvec{\gamma }}\) and \(\hat{\varvec{\alpha }}\) to minimize
$$\begin{aligned} \Vert \mathbf{y} - \mathbf{K } \varvec{\gamma } - \mathbf{A}{\varvec{\alpha }} \Vert ^2 + \lambda \varvec{\gamma }^T\mathbf{K} \varvec{\gamma } ~~\text {s.t.}~~\mathbf{A}^T\varvec{\gamma } = \mathbf{0} \end{aligned}$$
where \(A_{ij} = \phi _j(\mathbf{x}_i)\). Notice that this problem has exactly the same structure as the TPS problem, so exactly the same approach to computation can be taken. More importantly optimal rank reduced versions of these Duchon splines can be produced using the methods given in Wood (2003) for the TPS. Wood (2003) uses an eigen approximation to the full spline thereby avoiding the difficult problem of \(\fancyscript{D}\) dimensional knot placement that complicates other approaches to reduced rank splines, so we use this approach in what follows.

Although MDS only produces a unique configuration up to rotation and translation, this is not problematic for smoothing with Duchon splines, as they are isotropic smoothers and hence rotation invariant.

We are now in a position to produce a spline suitable for smoothing in p-space. Specifically we choose a (reduced rank) Duchon spline with \(m=2\) and \(s = {\fancyscript{D}}/2 - 1\), which will give us a smooth \(f\) for which \(M={\fancyscript{D}}+1\) (i.e. the unpenalized component of \(f\) grows only linearly with \(\fancyscript{D}\)). We choose \(s = {\fancyscript{D}}/2 - 1\) for \(m=2\) since this is the minimum \(s\) we can use. Using a higher \(s\) will increase the weighting on the higher frequency components of the smooth in the penalty, reducing flexibility. Since our aim here is simply to minimise the effect of the nullspace, we simply choose \(s\) as small as possible to avoid any other effects.
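As a quick check of these choices, substituting \(m=2\) and \(s = {\fancyscript{D}}/2 - 1\) into the conditions and definitions given above yields
$$\begin{aligned} m + s&= 2 + \frac{{\fancyscript{D}}}{2} - 1 = \frac{{\fancyscript{D}}}{2} + 1 > \frac{{\fancyscript{D}}}{2}, \\ 2m + 2s - {\fancyscript{D}}&= 4 + ({\fancyscript{D}} - 2) - {\fancyscript{D}} = 2, \\ M&= \left( \begin{array}{c} {\fancyscript{D}} + 1 \\ {\fancyscript{D}}\end{array}\right) = {\fancyscript{D}} + 1. \end{aligned}$$
So the continuity restriction is met for every \(\fancyscript{D}\), the radial basis functions are \(K_2\) whatever the dimension of p-space, and the unpenalized functions are the constant together with the \(\fancyscript{D}\) linear terms. The range condition \(-{\fancyscript{D}} < 2s < {\fancyscript{D}}\) becomes \(-{\fancyscript{D}} < {\fancyscript{D}} - 2 < {\fancyscript{D}}\), which holds whenever \({\fancyscript{D}} > 1\).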

3.3 Selecting \(\fancyscript{D}\)

An obvious question is whether we actually need \(\fancyscript{D}\) to be larger than 2 in practice. Figure 3 provides an illustration that in general we do. It shows what happens when within-area distances over a 2 dimensional domain with peninsulae are used to obtain a 2D p-space. Some of the points in the resulting p-space configuration become quite severely squashed together. In fact truncating the projection to two dimensions can sometimes cause the basic ordering of the points to be lost, making the task of the smoother impossible. Further investigation showed that increasing the dimensionality of p-space maintains the ordering of the points; however, the number of dimensions required varied according to the shape of the domain. Increasing the dimension of p-space also makes the distances measured in p-space approximate the distance matrix, \(\mathbf {D}\), better.
Fig. 3

Left to right: test function over the peninsulae domain, points in the domain and finally their projection into 2-dimensional p-space when within-area distances are used to calculate the distance matrix. The p-space plot shows that some squashing can happen in two dimensions. The large left peninsula and some of the smaller peninsulae have lost their ‘width’ and, in fact, points within them have lost ordering

Having accepted the need for \({\fancyscript{D}}>2\), we need some means of choosing \(\fancyscript{D}\). Rather than setting a maximum difference between the distances in \(\mathbf {D}\) and the distances in the projection, we choose \(\fancyscript{D}\) in order to minimize GCV. Selecting \(\fancyscript{D}\) is typically a small part of the computational burden, since the MDS and smoothing are cheap relative to the computation of distances (at least in the finite area smoothing case). Figure 6 shows the relationship between \(\fancyscript{D}\) and GCV score for the Aral sea data analysed below.

4 Examples

To illustrate the utility of the model, two simulation studies are shown, followed by examples using real data. All concentrate on the finite area smoothing problem. In each case MDSDS was compared with thin plate splines as described in Wood (2003) (which do not account for the boundary), geodesic low-rank thin plate splines (GLTPS) and the soap film smoother (both of which account for the boundary). The GLTPS model was as described in Wang and Ranalli (2007), but with the within-area distances calculated as described in Web Appendix B (i.e. the same as for MDSDS); knots were placed using the cover.design method in the package fields (again, as in Wang and Ranalli 2007). In all cases smoothing parameters were selected by GCV. The R packages mgcv (available from CRAN) and msg (available from https://github.com/dill/msg) were used to fit the models. Code for fitting the GLTPS is available at https://github.com/dill/gltps.

In all the cases below, the basis size specified refers to the maximum basis size allowed. Since the penalty will reduce the complexity of the smoother, we simply need to specify an upper bound on the basis size.

4.1 Ramsay’s horseshoe

The horseshoe shape shown in the top panel of Fig. 1 is an obvious benchmark for techniques that wish to combat leakage. Although perhaps unrealistic (and bordering on pathological), any new method that works well on the horseshoe should have a good chance of working well in more realistic situations. A simulation experiment was run with the same setup as in Wood et al. (2008): 200 replicates were generated at each of three error levels (standard normal noise multiplied by 0.1, 1 and 10) with sample size 600. A thin plate regression spline with basis size 100 was fitted, as was a soap film smoother with 32 interior knots and a 40 knot cyclic spline to estimate the boundary. For the MDSDS model, the basis size was set to 100 and a 20 by 20 initial grid was used for the MDS projection (see Web Appendix A); the MDS projection dimension was selected by GCV from the range between 2 and the number of dimensions that explained 95% of the variation in the distance matrix of the initial grid. For the GLTPS, 40 knot locations were selected as in Wang and Ranalli (2007). For each realisation the mean squared error (MSE) was calculated between the true function and the model predictions over a grid of 720 points.
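
The range of candidate projection dimensions described above can be sketched as follows. This assumes that "variation explained" means the cumulative share of the positive eigenvalues from the classical MDS of the initial grid, which is one common convention and may differ from the exact rule used in the msg package. The function names and eigenvalues are illustrative only.

```python
def max_dimension(eigenvalues, threshold=0.95, min_dim=2):
    """Smallest number of MDS dimensions whose positive eigenvalues
    account for at least `threshold` of the total positive eigenvalue sum."""
    positive = sorted((ev for ev in eigenvalues if ev > 0), reverse=True)
    if not positive:
        raise ValueError("no positive eigenvalues to project onto")
    total = sum(positive)
    running = 0.0
    for k, ev in enumerate(positive, start=1):
        running += ev
        if running / total >= threshold:
            return max(k, min_dim)
    return max(len(positive), min_dim)


def candidate_dimensions(eigenvalues, threshold=0.95, min_dim=2):
    """Dimensions over which GCV would then be minimised."""
    upper = max_dimension(eigenvalues, threshold, min_dim)
    return list(range(min_dim, upper + 1))


# Made-up eigenvalues; a negative value can arise when the
# generalized distances are not exactly Euclidean.
example_eigenvalues = [50.0, 30.0, 10.0, 6.0, 3.0, 1.0, -0.5]
print(candidate_dimensions(example_eigenvalues))
```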

As can be seen in Fig. 4, the thin plate regression spline has rather poor performance in MSE terms while MDSDS, the soap film smoother and GLTPS perform significantly better. MDSDS performs better than the soap film smoother and GLTPS at lower noise levels, becoming indistinguishable at the highest noise level. The median number of dimensions selected for the MDS projection using GCV was 3 (max. 14, min. 2). Looking more qualitatively at the bottom three plots in Fig. 4, the predictions do not show any evidence of leakage.
Fig. 4

Top: boxplots of per-realisation log mean squared error at the three noise levels. Using a paired Wilcoxon signed-ranks test, the new approach (“mdsds”) differed significantly from the other models at the two lower noise levels (at the 0.05 level). At the highest noise level, the other two methods that accounted for the boundary (“gltps” and “soap”) were not distinguishable from MDSDS. The thin plate regression spline (“tprs”) was worse

4.2 Peninsulae domain

The results from the modified Ramsay horseshoe are encouraging. However, the domain is not particularly realistic. To further explore the performance of MDSDS, a more realistic domain was used. The domain, which attempts to mimic a coastline, is shown in the left panel of Fig. 3.

Simulations were run at signal-to-noise ratios of 0.95, 0.75 and 0.50 (equating to adding standard normal noise multiplied by 0.35, 0.9 and 1.55, respectively). The soap film smoother used 109 internal knots and 60 knots for the cyclic boundary smooth. The MDSDS models used an initial grid of 120 by 126 points and a basis size of 140. The thin plate regression spline basis size was also 140. For the GLTPS, 80 knots were selected using the space filling design.

Figure 5 shows the boxplots of the \(\log \) of the MSE per realisation for each model. In the low noise cases, a paired Wilcoxon signed-ranks test showed that the soap film smoother and MDSDS were not significantly different at the 0.05 level. In all cases MDSDS was significantly better than both GLTPS and thin plate regression splines.
Fig. 5

Boxplots of the logarithm of the per-realisation MSE for the models tested on the peninsulae domain at three noise levels. At each noise level, the median MSE was lower for MDSDS than for the thin plate regression spline, soap film smoother and GLTPS. A paired Wilcoxon signed-ranks test showed that the only non-significant difference (at the 5 % level) occurred between MDSDS and the soap film smoother at the 0.35 noise level
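
The paired comparisons reported above can be sketched with a small standard-library implementation of the Wilcoxon signed-ranks test. This is a minimal version using the large-sample normal approximation, with no continuity or tie correction; the source does not say which variant was used, and the log MSE values below are invented for illustration.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Two-sided paired Wilcoxon signed-ranks test (normal approximation).

    Zero differences are dropped and tied absolute differences receive
    average ranks. Returns (w_plus, w_minus, p_value).
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (min(w_plus, w_minus) - mean) / sd
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return w_plus, w_minus, p_value


# Invented per-realisation log MSE values for two smoothers.
log_mse_a = [-3.1, -2.9, -3.4, -3.0, -2.8, -3.3]
log_mse_b = [-2.7, -2.8, -3.0, -2.5, -2.9, -2.6]
print(wilcoxon_signed_rank(log_mse_a, log_mse_b))
```

With 200 replicates per noise level, as in the simulations here, the normal approximation should be adequate; for very small samples an exact test would be preferable.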

4.3 Aral sea

The Aral sea is located between Kazakhstan and Uzbekistan and has been steadily shrinking since the 1960s when the Soviet government diverted the sea’s two tributaries in order to irrigate the surrounding desert. The NASA SeaWiFS satellite collected data on chlorophyll levels in the Aral sea over a series of 8 day observation periods from 1998 to 2002 (Wood et al. 2008). The 496 data are averages of the \(38^\text {th}\) observation period. Smooths were fitted to the spatial coordinates (Northings and Eastings; kilometres from a specified latitude and longitude) with the logarithm of chlorophyll concentration (modelled with a Gamma distribution) as the response.

The models that were fitted to the data were: a thin plate regression spline with basis size 70, MDSDS with a basis size of 70 (a 20 by 20 initial grid was used for the MDS projection), a GLTPS with 60 knots and a soap film smoother using 49 boundary knots and 74 internal knots. Using GCV for MDS projection dimension selection led to a 5-dimensional projection. A plot of the relationship between projection dimension and GCV score can be seen in Fig. 6; there is a clear minimum at 5 dimensions.
Fig. 6

Plot of the relationship between GCV score and MDS projection dimension for the Aral sea data set. Here a clear minimum at 5 dimensions can be seen; however, there is no particular reason to believe that there will always be such a pronounced optimum

Predictions from the models over a grid of 496 points are shown in Fig. 7. The fits are broadly similar across most of the domain. MDSDS, GLTPS and the soap film smoother do not show signs of leakage around (\(-\)50, \(-\)50), as the thin plate regression spline does.
Fig. 7

Predictions from models of the Aral sea chlorophyll data. Top row, left to right: raw data, predicted surface for thin plate regression spline. Second row: predicted surfaces for the soap film smoother and MDSDS. Bottom row: predicted surface for GLTPS. The latter three avoid the leakage seen in the \((-50, 50)\) region of the thin plate regression spline fit

5 Discussion

Our MDSDS approach appears to have competitive performance compared to existing methods, while providing a number of possible advantages. Relative to the soap film smoother, the method has a more natural handling of open and closed boundaries, and is also often the more natural model when the linkage between geographic areas is via movement of organisms. Relative to Wang and Ranalli (2007), our approach is somewhat more transparent in terms of what is being penalized when smoothing, and also uses a nullspace basis that avoids leakage, unlike the Wang and Ranalli method, for which the nullspace does not respect boundary features. As mentioned above, MDSDS fits easily into a generalized additive model, which may include many other components, as is often necessary in ecological work.

As mentioned above, using MDS to build covariance functions for kriging has been investigated previously. When non-Euclidean distances are used, covariance functions may no longer be positive definite or conditionally negative definite (Curriero 2005), so MDS can be used to project the data, creating a set of Euclidean distances. For example, Løland and Høst (2003) used river network distances in the construction of a variogram, and overcame the problem of lack of positive definiteness by using MDS and then constructing the variogram in MDS space. Projection dimension selection is partially addressed in Jensen et al. (2006): the authors suggest using the proportion of variation explained or the Bayesian criterion of Oh and Raftery (2001) as possible metrics but do not fully explore the issue, resorting to 2-dimensional projections. The use of Duchon splines in MDSDS allows for high-dimensional projections, thus allowing for more accurate approximation of the distance matrix whilst ensuring that the points maintain ordering (this second point has not been addressed in the kriging literature to our knowledge).

Further work would involve considering more biologically motivated measures of distance, for example distances based on the minimum energetic cost of moving between two locations. It is also of interest to investigate the use of MDSDS for smoothing over non-geographic distances outside of ecology, such as the socio-economic similarity of parliamentary constituencies or measures of genetic relatedness.

Notes

Acknowledgments

We are especially grateful to Jean Duchon for generous help in understanding Duchon (1977). David wishes to thank EPSRC for financial support during his PhD at the University of Bath.

Supplementary material

Supplementary material 1 (pdf 140 KB)

References

  1. Augustin N, Musio M, von Wilpert K, Kublin E, Wood SN, Schumacher M (2009) Modeling spatiotemporal forest health monitoring data. J Am Stat Assoc 104(487):899–911
  2. Bernstein M, De Silva V, Langford J, Tenenbaum J (2000) Graph approximations to geodesics on embedded manifolds. Technical report, Department of Psychology, Stanford University. ftp://ftp-sop.inria.fr/prisme/boissonnat/ImageManifolds/isomap.pdf
  3. Chatfield C, Collins AJ (1980) Introduction to multivariate analysis. Science paperbacks, Chapman and Hall
  4. Curriero F (2005) On the use of non-euclidean isotropy in geostatistics. Technical report 94, Johns Hopkins University, Department of Biostatistics. http://www.bepress.com/cgi/viewcontent.cgi?article=1094&context=jhubiostat
  5. Driscoll TA, Trefethen L (2002) Schwartz-Christoffel transform. Cambridge University Press, Cambridge
  6. Duchon J (1977) Splines minimizing rotation-invariant semi-norms in Sobolev spaces. Constructive theory of functions of several variables, pp 85–100
  7. Floyd RW (1962) Algorithm 97: shortest path. Commun ACM 5(6):345
  8. Gower J (1968) Adding a point to vector diagrams in multivariate analysis. Biometrika 55(3):582
  9. Hastie TJ, Tibshirani RJ (1990) Generalized additive models. Monographs on statistics and applied probability. Taylor & Francis, New York
  10. Higham NJ (1987) Computing real square roots of a real matrix. Linear Algebra Appl 88–89:405–430. doi:10.1016/0024-3795(87)90118-2
  11. Jensen OP, Christman MC, Miller TJ (2006) Landscape-based geostatistics: a case study of the distribution of blue crab in Chesapeake Bay. Environmetrics 17(6):605–621. doi:10.1002/env.767
  12. Løland A, Høst G (2003) Spatial covariance modelling in a complex coastal domain by multidimensional scaling. Environmetrics 14(3):307–321. doi:10.1002/env.588
  13. Miller DL (2012) On smooth models for complex domains and distances. PhD thesis, University of Bath
  14. Miller DL, Burt ML, Rexstad EA (2013) Spatial models for distance sampling data: recent developments and future directions. Methods in Ecology and Evolution
  15. Oh MS, Raftery AE (2001) Bayesian multidimensional scaling and. J Am Stat Assoc 96(455):1031
  16. Ramsay T (2002) Spline smoothing over difficult regions. J R Stat Soc Ser B Stat Methodol 64(2):307–319
  17. Rue H, Held L (2005) Gaussian Markov random fields: theory and applications. Monographs on statistics and applied probability. Taylor & Francis, New York
  18. Ruppert D, Wand M, Carroll RJ (2003) Semiparametric regression. Cambridge series on statistical and probabilistic mathematics. Cambridge University Press, Cambridge
  19. Sampson PD, Guttorp P (1992) Nonparametric estimation of nonstationary spatial covariance structure. J Am Stat Assoc 87(417):108–119
  20. Scott-Hayward LAS, MacKenzie ML, Donovan CR, Walker CG, Ashe E (2013) Complex region spatial smoother (CReSS). J Comput Graph Stat. doi:10.1080/10618600.2012.762920
  21. Vretblad A (2003) Fourier analysis and its applications. Graduate texts in mathematics. Springer, Berlin
  22. Wang H, Ranalli M (2007) Low-rank smoothing splines on complicated domains. Biometrics 63(1):209–217
  23. Williams R, Hedley SL, Branch TA, Bravington MV, Zerbini AN, Findlay KP (2011) Chilean blue whales as a case study to illustrate methods to estimate abundance and evaluate conservation status of rare species. Conserv Biol 25(3):526–535. doi:10.1111/j.1523-1739.2011.01656.x
  24. Wood SN (2003) Thin plate regression splines. J R Stat Soc Ser B Stat Methodol 65(1):95–114
  25. Wood SN (2006) Generalized additive models: an introduction with R. Chapman & Hall/CRC, London
  26. Wood SN (2011) Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models. J R Stat Soc Ser B Stat Methodol 73(1):3–36
  27. Wood SN, Bravington MV, Hedley SL (2008) Soap film smoothing. J R Stat Soc Ser B Stat Methodol 70(5):931–955

Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  1. Centre for Research into Ecological and Environmental Modelling, University of St Andrews, The Observatory, Scotland
  2. Department of Mathematical Sciences, University of Bath, Claverton Down, Bath, UK
