Abstract
This paper illustrates the use of diffusion kernels to estimate smooth density and regression functions defined on highly complex domains. We generalize the two-dimensional lattice-based estimators of Barry and McIntyre (2011) and McIntyre and Barry (2018) to estimate any function defined on a domain that may be embedded in \(\mathbb {R}^d\), \(d\ge 1\). Examples include function estimation on the surface of a sphere, a sphere with boundaries and holes, a sphere over multiple time periods, a linear network, the surface of a cylinder, a three-dimensional volume with boundaries, and a union of one- and two-dimensional subregions.
References
Arya S, Mount D, Kemp SE, Jefferis G (2019) RANN: fast nearest neighbour search (Wraps ANN Library) using L2 metric. R package version 2.6.1. https://CRAN.R-project.org/package=RANN
Barry RP (2012) latticeDensity: density estimation and nonparametric regression on irregular regions. R package version 1.0.7. http://CRAN.R-project.org/package=latticeDensity
Barry RP, McIntyre J (2011) Estimating animal densities and home range in regions with irregular boundaries and holes: a lattice-based alternative to the kernel density estimator. Ecol Model 222:1666–1672
Botev ZI, Grotowski JF, Kroese DP (2010) Kernel density estimation via diffusion. Ann Stat 38:2916–2957
Brus DJ, Yang R, Zhang G (2016) Three-dimensional geostatistical modeling of soil organic carbon: a case study in the Qilian Mountains, China. Catena 141:46–55
Clark I (1986) The art of cross validation in geostatistical applications. In: Ramani RV (ed) 19th application of computers and operations research in the mineral industry. Society of Mining Engineers, Littleton CO
Cox DR, Lewis PAW (1966) Statistical analysis of series of events. Methuen, London
Deserno M (2004) How to generate equidistributed points on the surface of a sphere. http://www.cmu.edu/biolphys/deserno/pdf/sphere_equi.pdf
Duchamp T, Stuetzle W (2003) Spline smoothing on surfaces. J Comput Graph Stat 12:354–381
Ettinger B, Perotto S, Sangalli LM (2016) Spatial regression models over two-dimensional manifolds. Biometrika 103:71–88
Fisher RA (1953) Dispersion on a sphere. Proc R Soc A 217:295–305
Furrer R, Sain S (2010) spam: a sparse matrix R package with emphasis on MCMC methods for Gaussian Markov Random Fields. J Stat Softw 36:1–25. http://www.jstatsoft.org/v36/i10/
Greco F, Ventrucci M, Castelli E (2018) P-spline smoothing for spatial data collected worldwide. Spatial Stat 27:1–17
Li HY, Marchant BP, Webster R (2016) Modelling the electrical conductivity of soil in the Yangtze delta in three dimensions. Geoderma 269:119–125
Ligas M, Kulczycki M (2018) Kriging and moving window kriging on a sphere in geometric (GNSS/levelling) geoid modelling. Surv Rev 50:155–162
Maisog JM, Wang Y, Luta G, Liu J (2014) ptinpoly: point-in-polyhedron test (2D and 3D). R package version 2.4. https://CRAN.R-project.org/package=ptinpoly
McIntyre J, Barry RP (2018) A lattice-based smoother for regions with irregular boundaries and holes. J Comput Graph Stat 27:360–367
McSwiggan G, Baddeley A, Nair G (2017) Kernel density estimation on a linear network. Scand J Stat 44:324–345
Miller DL, Wood SN (2014) Finite area smoothing with generalized distance splines. Environ Ecol Stat 21:715–731
National Geophysical Data Center (1996) Bathymetry of Lake Michigan. National Geophysical Data Center, NOAA. https://www.ngdc.noaa.gov/mgg/greatlakes/michigan.html
National Oceanic and Atmospheric Administration (2019) https://www.ndbc.noaa.gov/
Pigoli D, Menafoglio A, Secchi P (2016) Kriging prediction for manifold-valued random fields. J Multivariate Anal 145:117–131
Poggio L, Gimona A (2014) National scale 3D modelling of soil organic carbon stocks with uncertainty propagation—an example from Scotland. Geoderma 232–234:284–299
R Core Team (2018) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/
Ramsay T (2002) Spline smoothing over difficult regions. J R Stat Soc Ser B 64:307–319
Sain SR, Baggerly KA, Scott DW (1994) Cross-validation of multivariate densities. J Am Stat Assoc 89:807–817
Sangalli LM, Ramsay JO, Ramsay TO (2013) Spatial spline regression models. J R Stat Soc Ser B 75:681–703
Scott-Hayward LAS, Mackenzie ML, Donovan CR, Walker CG, Ashe E (2014) Complex region spatial smoother (CReSS). J Comput Graph Stat 23:340–360
Soetaert K (2017) plot3D: plotting multi-dimensional data. R package version 1.1.1. https://CRAN.R-project.org/package=plot3D
Tsagris M, Athineou G, Sajib A, Amson E, Waldstein MJ (2019) Directional: directional statistics. R package version 3.7. https://CRAN.R-project.org/package=Directional
U.S. Environmental Protection Agency (2000) Great lakes environmental database system, Lake Michigan mass balance results. https://www.epa.gov/greatlakes
Wang H, Ranalli MG (2007) Low-rank smoothing splines on complicated domains. Biometrics 63:209–217
Wood SN, Bravington MV, Hedley SL (2008) Soap film smoothing. J R Stat Soc Ser B 70:931–955
Additional information
Handling Editor: Pierre Dutilleul.
Appendix
Here we use a toy example to illustrate the basic steps for estimating a smooth function with the lattice-based smoother. R code for reproducing this example is provided in the Supplementary Materials in the vignette ToyExample. We estimate a density function over a subregion of \(\mathbb {R}^2\) from observations of a point process. The same basic sequence of steps is followed to estimate a density or regression function on any arbitrary subregion of \(\mathbb {R}^d\).
Figure 10 shows the region of interest, representing a lake. A single observation of a point process, for example the location where a certain fish species is observed, is recorded at \(x=5.8\) and \(y=1.2\). Here the subregion of interest is a polygonal region embedded in \(\mathbb {R}^2\). The first step is to define a set of nodes over the lake. Typically nodes will be defined densely throughout the subregion, but for illustration we will use only ten nodes. The nodes are plotted in Fig. 10. The observed point-process location does not fall exactly on a node, so it is moved to the nearest node, node 10. The matrix of node coordinates \({{{\varvec{{L}}}}}\) is then the \(10\times 2\) matrix shown below.
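The relocation of an observation to its nearest node can be sketched in a few lines. This is a minimal illustration only: the node coordinates and the observation below are hypothetical and are not the matrix of node coordinates or the lake of Fig. 10, and the helper name `nearest_node` is our own.

```python
import math

# Hypothetical node coordinates on a unit grid (NOT the node matrix of the paper).
nodes = [(1, 1), (2, 1), (3, 1), (1, 2), (2, 2), (3, 2)]


def nearest_node(point, nodes):
    """Return the 0-based index of the node closest to `point` (Euclidean distance)."""
    return min(range(len(nodes)), key=lambda i: math.dist(point, nodes[i]))


obs = (2.8, 1.2)  # hypothetical observation that does not fall on a node
idx = nearest_node(obs, nodes)
print(idx + 1, nodes[idx])  # 1-based node number and its coordinates
```

For a dense lattice a brute-force search like this becomes slow, and a tree-based nearest-neighbor search would be the more practical choice.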
Next we define the neighbor matrix, \({{{\varvec{{B}}}}}\). The neighbor relationship is arbitrary, although most often, when the region of interest is spatial, it is determined by distance between nodes. For this example we declare points adjacent in the north-south and east-west directions to be neighbors. The neighbor matrix \({{{\varvec{{B}}}}}\) is shown below, along with a plot of the lattice in Fig. 11.
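As a rough sketch of the north-south and east-west neighbor rule, the following builds a 0/1 neighbor matrix from node coordinates. The five nodes are hypothetical and assumed to lie on a unit-spaced grid; they are not the ten nodes of the lake example, and `neighbor_matrix` is an illustrative helper name.

```python
# Hypothetical nodes on a unit grid with one corner missing (not the paper's lattice).
nodes = [(1, 1), (2, 1), (3, 1), (1, 2), (2, 2)]


def neighbor_matrix(nodes, spacing=1):
    """Return a symmetric 0/1 matrix marking north-south and east-west neighbors."""
    n = len(nodes)
    B = [[0] * n for _ in range(n)]
    for i, (xi, yi) in enumerate(nodes):
        for j, (xj, yj) in enumerate(nodes):
            # Neighbors differ by exactly one grid step in x or in y, never both.
            if abs(xi - xj) + abs(yi - yj) == spacing:
                B[i][j] = 1
    return B


B = neighbor_matrix(nodes)
for row in B:
    print(row)
```

The comparison with `spacing` is exact only for integer coordinates; with floating-point coordinates a small tolerance would be needed.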
Finally we specify the transition matrix \({{{\varvec{{T}}}}}\). Here we define \({{{\varvec{{T}}}}}\) to be isotropic, with the probability of movement between neighbors equal to 1/6 everywhere. Note that the diagonal entries, which give the probability of staying put at any step of the random walk, then depend on the number of neighbors of each node.
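The construction of an isotropic transition matrix from a neighbor matrix can be sketched as below. The neighbor matrix is a hypothetical five-node example rather than the one in the paper, and `transition_matrix` is an illustrative name. Each off-diagonal entry for a pair of neighbors is 1/6, and each diagonal entry is set to one minus the total probability of moving, so every row sums to one.

```python
from fractions import Fraction

# Hypothetical 0/1 neighbor matrix for five nodes (not the matrix of the paper).
B = [
    [0, 1, 0, 1, 0],
    [1, 0, 1, 0, 1],
    [0, 1, 0, 0, 0],
    [1, 0, 0, 0, 1],
    [0, 1, 0, 1, 0],
]


def transition_matrix(B, move_prob=Fraction(1, 6)):
    """Return T with T[i][j] = move_prob for neighbors and a stay-put diagonal."""
    n = len(B)
    T = [[move_prob * B[i][j] for j in range(n)] for i in range(n)]
    for i in range(n):
        # Probability of staying put: 1 minus move_prob times the neighbor count.
        T[i][i] = 1 - move_prob * sum(B[i])
    return T


T = transition_matrix(B)
print([str(T[i][i]) for i in range(len(T))])  # stay-put probabilities per node
```

Exact fractions are used so that the row sums can be checked without rounding error. The sketch requires that no node have more than six neighbors, otherwise a diagonal entry would be negative.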
The estimate of density at each point on the lattice after k steps is computed using Equation 1. For the single point-process observation at (5.8,1.2) relocated to node 10, the initial density vector is \({{{\varvec{{p}}}}}_0 = [0,0,0,0,0,0,0,0,0,1]\). Taking \(k=5\), the estimated density becomes
Note that three nodes have estimated density zero, because they are more than five steps from any observation. The estimated density at each node is plotted in Fig. 12. Commonly such an estimate would be displayed using a contour or image graphic, after estimating density on a more densely defined grid of nodes.
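The k-step computation can be sketched as follows. Equation 1 is not reproduced in this appendix, so the sketch assumes it has the form \({\varvec{p}}_k = {\varvec{T}}^k {\varvec{p}}_0\). The lattice is a hypothetical chain of ten nodes, not the lake lattice of Fig. 11, so the number of nodes left at zero differs from the example above. The function names are illustrative.

```python
from fractions import Fraction


def path_transition_matrix(n, move_prob=Fraction(1, 6)):
    """Transition matrix for a hypothetical chain of n nodes (i is adjacent to i +/- 1)."""
    T = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i + 1):
            if 0 <= j < n:
                T[i][j] = move_prob
        T[i][i] = 1 - sum(T[i])  # stay-put probability; diagonal is still 0 here
    return T


def diffuse(T, p0, k):
    """Apply T to the initial density vector k times (assumed form p_k = T^k p_0)."""
    p = list(p0)
    n = len(p)
    for _ in range(k):
        p = [sum(T[i][j] * p[j] for j in range(n)) for i in range(n)]
    return p


T = path_transition_matrix(10)
p0 = [0] * 9 + [1]  # single observation at the last node
p5 = diffuse(T, p0, 5)
print([float(v) for v in p5])
```

On this chain the observation sits at one end, so nodes that are six or more steps away keep density exactly zero after five steps, while the total mass remains one.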
Cite this article
Barry, R.P., McIntyre, J. Lattice-based methods for regression and density estimation on complicated multidimensional regions. Environ Ecol Stat 27, 571–589 (2020). https://doi.org/10.1007/s10651-020-00459-z
DOI: https://doi.org/10.1007/s10651-020-00459-z