Lattice-based methods for regression and density estimation on complicated multidimensional regions

Environmental and Ecological Statistics

Abstract

This paper illustrates the use of diffusion kernels to estimate smooth density and regression functions defined on highly complex domains. We generalize the two-dimensional lattice-based estimators of Barry and McIntyre (2011) and McIntyre and Barry (2018) to estimate any function defined on a domain that may be embedded in \(\mathbb {R}^d\), \(d\ge 1\). Examples include function estimation on the surface of a sphere, a sphere with boundaries and holes, a sphere over multiple time periods, a linear network, the surface of a cylinder, a three-dimensional volume with boundaries, and a union of one- and two-dimensional subregions.

References

  • Arya S, Mount D, Kemp SE, Jefferis G (2019) RANN: fast nearest neighbour search (Wraps ANN Library) using L2 metric. R package version 2.6.1. https://CRAN.R-project.org/package=RANN

  • Barry RP (2012) latticeDensity: density estimation and nonparametric regression on irregular regions. R package version 1.0.7. http://CRAN.R-project.org/package=latticeDensity

  • Barry RP, McIntyre J (2011) Estimating animal densities and home range in regions with irregular boundaries and holes: a lattice-based alternative to the kernel density estimator. Ecol Model 222:1666–1672

  • Botev ZI, Grotowski JF, Kroese DP (2010) Kernel density estimation via diffusion. Ann Stat 38:2916–2957

  • Brus DJ, Yang R, Zhang G (2016) Three-dimensional geostatistical modeling of soil organic carbon: a case study in the Qilian Mountains, China. Catena 141:46–55

  • Clark I (1986) The art of cross validation in geostatistical applications. In: Ramani RV (ed) 19th application of computers and operations research in the mineral industry. Society of Mining Engineers, Littleton CO

  • Cox DR, Lewis PAW (1966) Statistical analysis of series of events. Methuen, London

  • Deserno M (2004) How to generate equidistributed points on the surface of a sphere. http://www.cmu.edu/biolphys/deserno/pdf/sphere_equi.pdf

  • Duchamp T, Stuetzle W (2003) Spline smoothing on surfaces. J Comput Graph Stat 12:354–381

  • Ettinger B, Perotto S, Sangalli LM (2016) Spatial regression models over two-dimensional manifolds. Biometrika 103:71–88

  • Fisher RA (1953) Dispersion on a sphere. Proc R Soc A 217:295–305

  • Furrer R, Sain S (2010) spam: a sparse matrix R package with emphasis on MCMC methods for Gaussian Markov Random Fields. J Stat Softw 36:1–25. http://www.jstatsoft.org/v36/i10/

  • Greco F, Ventrucci M, Castelli E (2018) P-spline smoothing for spatial data collected worldwide. Spatial Stat 27:1–17

  • Li HY, Marchant BP, Webster R (2016) Modelling the electrical conductivity of soil in the Yangtze delta in three dimensions. Geoderma 269:119–125

  • Ligas M, Kulczycki M (2018) Kriging and moving window kriging on a sphere in geometric (GNSS/levelling) geoid modelling. Surv Rev 50:155–162

  • Maisog JM, Wang Y, Luta G, Liu J (2014) ptinpoly: point-in-polyhedron test (2D and 3D). R package version 2.4. https://CRAN.R-project.org/package=ptinpoly

  • McIntyre J, Barry RP (2018) A lattice-based smoother for regions with irregular boundaries and holes. J Comput Graph Stat 27:360–367

  • McSwiggan G, Baddeley A, Nair G (2017) Kernel density estimation on a linear network. Scand J Stat 44:324–345

  • Miller DL, Wood SN (2014) Finite area smoothing with generalized distance splines. Environ Ecol Stat 21:715–731

  • National Geophysical Data Center (1996) Bathymetry of Lake Michigan. National Geophysical Data Center, NOAA. https://www.ngdc.noaa.gov/mgg/greatlakes/michigan.html

  • National Oceanic and Atmospheric Administration (2019) https://www.ndbc.noaa.gov/

  • Pigoli D, Menafoglio A, Secchi P (2016) Kriging prediction for manifold-valued random fields. J Multivariate Anal 145:117–131

  • Poggio L, Gimona A (2014) National scale 3D modelling of soil organic carbon stocks with uncertainty propagation—an example from Scotland. Geoderma 232–234:284–299

  • R Core Team (2018) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/

  • Ramsay T (2002) Spline smoothing over difficult regions. J R Stat Soc Ser B 64:307–319

  • Sain SR, Baggerly KA, Scott DW (1994) Cross-validation of multivariate densities. J Am Stat Assoc 89:807–817

  • Sangalli LM, Ramsay JO, Ramsay TO (2013) Spatial spline regression models. J R Stat Soc Ser B 75:681–703

  • Scott-Hayward LAS, Mackenzie ML, Donovan CR, Walker CG, Ashe E (2014) Complex region spatial smoother (CReSS). J Comput Graph Stat 23:340–360

  • Soetaert K (2017) plot3D: plotting multi-dimensional data. R package version 1.1.1. https://CRAN.R-project.org/package=plot3D

  • Tsagris M, Athineou G, Sajib A, Amson E, Waldstein MJ (2019) Directional: directional statistics. R package version 3.7. https://CRAN.R-project.org/package=Directional

  • U.S. Environmental Protection Agency (2000) Great lakes environmental database system, Lake Michigan mass balance results. https://www.epa.gov/greatlakes

  • Wang H, Ranalli MG (2007) Low-rank smoothing splines on complicated domains. Biometrics 63:209–217

  • Wood SN, Bravington MV, Hedley SL (2008) Soap film smoothing. J R Stat Soc Ser B 70:931–955

Author information

Corresponding author

Correspondence to Ronald P. Barry.

Additional information

Handling Editor: Pierre Dutilleul.

Appendix

Here we use a toy example to illustrate the basic steps for estimating a smooth function with the lattice-based smoother. R code for reproducing this example is provided in Supplementary Materials in the vignette ToyExample. We estimate a density function over a subregion of \(\mathbb {R}^2\) from observations of a point process. The same basic sequence of steps is followed to estimate a density or regression function on any arbitrary subregion of \(\mathbb {R}^d\).

Figure 10 shows the region of interest, representing a lake. A single observation of a point process, for example the location where a certain fish species is observed, is recorded at \(x=5.8\) and \(y=1.2\). Here the subregion of interest is a polygonal region embedded in \(\mathbb {R}^2\). The first step is to define a set of nodes over the lake. Typically nodes will be defined densely throughout the subregion, but for illustration we use only ten. The nodes are plotted in Fig. 10. Because the observed point process location does not fall exactly on a node, it is moved to the nearest node, node 10. The matrix of node coordinates \({{{\varvec{{L}}}}}\) is then the \(10\times 2\) matrix whose transpose is shown below.

Fig. 10 Point process location (left) and nodes (right)

$$\begin{aligned} {{{\varvec{{L}}}}}^T = \left[ \begin{array}{cccccccccc} 1 &{} 2 &{} 1 &{} 2 &{} 3 &{} 4 &{} 5 &{} 6 &{} 5 &{} 6\\ 1 &{} 1 &{} 2 &{} 2 &{} 2 &{} 2 &{} 2 &{} 2 &{} 1 &{} 1\\ \end{array} \right] \end{aligned}$$
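The relocation of the observation to its nearest node can be checked with a minimal sketch. It assumes Euclidean distance as the measure of "nearest"; the names `NODES` and `nearest_node` are hypothetical and are not taken from the ToyExample vignette.

```python
import math

# Node coordinates read off the matrix L^T above; entry i is node i + 1.
NODES = [(1, 1), (2, 1), (1, 2), (2, 2), (3, 2),
         (4, 2), (5, 2), (6, 2), (5, 1), (6, 1)]


def nearest_node(point, nodes):
    """Return the 1-based index of the node closest to `point`.

    Assumption: closeness is measured by Euclidean distance.
    """
    distances = [math.dist(point, node) for node in nodes]
    return distances.index(min(distances)) + 1


observation = (5.8, 1.2)  # the single point-process observation
node_index = nearest_node(observation, NODES)
print(node_index)  # 10
```

With many observations and a dense lattice, a nearest-neighbour search structure would usually replace this linear scan.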

Next we define the neighbor matrix, \({{{\varvec{{B}}}}}\). The neighbor relationship is arbitrary, although most often, when the region of interest is spatial, it is determined by distance between nodes. For this example we declare points adjacent in the north-south and east-west directions to be neighbors. The neighbor matrix \({{{\varvec{{B}}}}}\) is shown below, along with a plot of the lattice in Fig. 11.

Fig. 11 Lattice structure on lake

Fig. 12 Colors represent density over the lake for the toy example

$$\begin{aligned} {{{\varvec{{B}}}}}= \left[ \begin{array}{c@{\quad }c@{\quad }c@{\quad }c@{\quad }c@{\quad }c@{\quad }c@{\quad }c@{\quad }c@{\quad }c} 0 &{}1 &{}1 &{}0 &{}0 &{}0 &{}0 &{}0 &{}0 &{}0\\ 1 &{}0 &{}0 &{}1 &{}0 &{}0 &{}0 &{}0 &{}0 &{}0\\ 1 &{}0 &{}0 &{}1 &{}0 &{}0 &{}0 &{}0 &{}0 &{}0\\ 0 &{}1 &{}1 &{}0 &{}1 &{}0 &{}0 &{}0 &{}0 &{}0\\ 0 &{}0 &{}0 &{}1 &{}0 &{}1 &{}0 &{}0 &{}0 &{}0\\ 0 &{}0 &{}0 &{}0 &{}1 &{}0 &{}1 &{}0 &{}0 &{}0\\ 0 &{}0 &{}0 &{}0 &{}0 &{}1 &{}0 &{}1 &{}1 &{}0\\ 0 &{}0 &{}0 &{}0 &{}0 &{}0 &{}1 &{}0 &{}0 &{}1\\ 0 &{}0 &{}0 &{}0 &{}0 &{}0 &{}1 &{}0 &{}0 &{}1\\ 0 &{}0 &{}0 &{}0 &{}0 &{}0 &{}0 &{}1 &{}1 &{}0\end{array} \right] \end{aligned}$$
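As a rough illustration, the neighbor matrix above can be rebuilt from the node coordinates. The sketch assumes that "adjacent in the north-south and east-west directions" means exactly one unit apart along a single axis; `neighbor_matrix` is a hypothetical helper name.

```python
# Node coordinates from the matrix L^T; entry i is node i + 1.
NODES = [(1, 1), (2, 1), (1, 2), (2, 2), (3, 2),
         (4, 2), (5, 2), (6, 2), (5, 1), (6, 1)]


def neighbor_matrix(nodes):
    """Build a 0/1 neighbor matrix for nodes on an integer grid.

    Assumption: two nodes are neighbors when they are exactly one unit
    apart in the north-south or east-west direction.
    """
    n = len(nodes)
    B = [[0] * n for _ in range(n)]
    for i, (xi, yi) in enumerate(nodes):
        for j, (xj, yj) in enumerate(nodes):
            if abs(xi - xj) + abs(yi - yj) == 1:
                B[i][j] = 1
    return B


B = neighbor_matrix(NODES)
degrees = [sum(row) for row in B]  # number of neighbors of each node
print(degrees)  # [2, 2, 2, 3, 2, 2, 3, 2, 2, 2]
```

Only nodes 4 and 7 have three neighbors, which is why their diagonal entries differ from the rest in the transition matrix.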

Finally we specify the transition matrix \({{{\varvec{{T}}}}}\). Here we define \({{{\varvec{{T}}}}}\) to be isotropic, with the probability of movement between neighbors equal to 1/6 everywhere. The diagonal entries, which give the probability of staying put at any step of the random walk, then depend on the number of neighbors of each node: a node with two neighbors stays put with probability \(1-2/6=2/3\), and a node with three neighbors with probability \(1-3/6=1/2\).

$$\begin{aligned} {{{\varvec{{T}}}}}=\left[ \begin{array}{c@{\quad }c@{\quad }c@{\quad }c@{\quad }c@{\quad }c@{\quad }c@{\quad }c@{\quad }c@{\quad }c} {2\over 3} &{}{1\over 6}&{}{1\over 6}&{}0 &{}0 &{}0 &{}0 &{}0 &{}0 &{}0\\ {1\over 6} &{}{2\over 3} &{}0 &{}{1\over 6} &{}0 &{}0 &{}0 &{}0 &{}0 &{}0\\ {1\over 6} &{}0 &{}{2\over 3} &{}{1\over 6} &{}0 &{}0 &{}0 &{}0 &{}0 &{}0\\ 0 &{}{1\over 6} &{}{1\over 6} &{}{1\over 2} &{}{1\over 6} &{}0 &{}0 &{}0 &{}0 &{}0\\ 0 &{}0 &{}0 &{}{1\over 6} &{}{2\over 3} &{}{1\over 6} &{}0 &{}0 &{}0 &{}0\\ 0 &{}0 &{}0 &{}0 &{}{1\over 6} &{}{2\over 3} &{}{1\over 6} &{}0 &{}0 &{}0\\ 0 &{}0 &{}0 &{}0 &{}0 &{}{1\over 6} &{}{1\over 2} &{}{1\over 6} &{}{1\over 6} &{}0\\ 0 &{}0 &{}0 &{}0 &{}0 &{}0 &{}{1\over 6} &{}{2\over 3} &{}0 &{}{1\over 6}\\ 0 &{}0 &{}0 &{}0 &{}0 &{}0 &{}{1\over 6} &{}0 &{}{2\over 3} &{}{1\over 6}\\ 0 &{}0 &{}0 &{}0 &{}0 &{}0 &{}0 &{}{1\over 6} &{}{1\over 6} &{}{2\over 3}\end{array} \right] \end{aligned}$$
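The construction of the transition matrix from the neighbor matrix can be sketched as follows. Exact fractions are used so that the entries can be compared directly with the matrix above; `transition_matrix` is a hypothetical helper name and the move probability 1/6 is the value chosen in this toy example.

```python
from fractions import Fraction

# Neighbor matrix B of the toy example (row i is node i + 1).
B = [
    [0, 1, 1, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 1, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 1, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 1, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 1, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 1, 1, 0],
]


def transition_matrix(B, p_move=Fraction(1, 6)):
    """Isotropic random-walk transition matrix.

    Off-diagonal entries are p_move for neighbors and 0 otherwise; each
    diagonal entry is the leftover probability of staying put.
    """
    n = len(B)
    T = [[p_move * B[i][j] for j in range(n)] for i in range(n)]
    for i in range(n):
        T[i][i] = 1 - p_move * sum(B[i])
    return T


T = transition_matrix(B)
print([str(T[i][i]) for i in range(len(T))])
```

The move probability must not exceed the reciprocal of the largest neighbor count, otherwise some diagonal entry would become negative.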

The estimate of density at each point on the lattice after k steps is computed using Equation 1. For the single point-process observation at (5.8,1.2) relocated to node 10, the initial density vector is \({{{\varvec{{p}}}}}_0 = [0,0,0,0,0,0,0,0,0,1]\). Taking \(k=5\), the estimated density becomes

$$\begin{aligned} {{{\varvec{{T}}}}}^5{{{\varvec{{p}}}}}_0= \left[ \begin{array}{c} 0.0000000000\\ 0.0000000000\\ 0.0000000000\\ 0.0002572016\\ 0.0048868313\\ 0.0378086420\\ 0.1520061728\\ 0.2443415638\\ 0.2443415638\\ 0.3163580247\\ \end{array} \right] \end{aligned}$$
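The five-step diffusion can be reproduced with a short sketch that applies the transition matrix repeatedly instead of forming its fifth power. The adjacency list `NEIGHBORS` encodes the neighbor matrix of this example; `step` and `diffuse` are hypothetical helper names, and exact fractions are used so that no rounding enters before printing.

```python
from fractions import Fraction

# Neighbors of each node, equivalent to the neighbor matrix B.
NEIGHBORS = {1: [2, 3], 2: [1, 4], 3: [1, 4], 4: [2, 3, 5], 5: [4, 6],
             6: [5, 7], 7: [6, 8, 9], 8: [7, 10], 9: [7, 10], 10: [8, 9]}
P_MOVE = Fraction(1, 6)  # probability of moving to any one neighbor


def step(p):
    """Apply T once: (T p)_i = T_ii * p_i + sum over neighbors j of p_j / 6."""
    new = {}
    for i, nbrs in NEIGHBORS.items():
        stay = 1 - P_MOVE * len(nbrs)
        new[i] = stay * p[i] + sum(P_MOVE * p[j] for j in nbrs)
    return new


def diffuse(p0, k):
    """Return T^k p0 by applying `step` k times."""
    p = dict(p0)
    for _ in range(k):
        p = step(p)
    return p


# All initial mass sits on node 10, where the observation was relocated.
p0 = {i: Fraction(int(i == 10)) for i in NEIGHBORS}
p5 = diffuse(p0, 5)
for i in sorted(p5):
    print(i, f"{float(p5[i]):.10f}")
```

Because each row of the transition matrix sums to one and the matrix is symmetric, the total mass stays equal to one at every step.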

Note that three nodes (nodes 1, 2, and 3) have estimated density zero, because each is more than five steps from the single observation at node 10. The estimated density at each node is plotted in Fig. 12. Commonly such an estimate would be displayed using a contour or image graphic, after estimating density on a more densely defined grid of nodes.
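Which nodes remain at zero after five steps can be checked by counting lattice steps from the observation. The sketch below uses a breadth-first search over the toy lattice; `steps_from` is a hypothetical helper name.

```python
from collections import deque

# Neighbors of each node, equivalent to the neighbor matrix B.
NEIGHBORS = {1: [2, 3], 2: [1, 4], 3: [1, 4], 4: [2, 3, 5], 5: [4, 6],
             6: [5, 7], 7: [6, 8, 9], 8: [7, 10], 9: [7, 10], 10: [8, 9]}


def steps_from(source, neighbors):
    """Breadth-first search: fewest lattice steps from `source` to each node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in neighbors[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


dist = steps_from(10, NEIGHBORS)
k = 5  # number of diffusion steps used in the example
unreached = sorted(i for i, d in dist.items() if d > k)
print(unreached)  # [1, 2, 3]
```

Increasing the number of steps to seven would be enough for some mass to reach every node of this lattice.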

Cite this article

Barry, R.P., McIntyre, J. Lattice-based methods for regression and density estimation on complicated multidimensional regions. Environ Ecol Stat 27, 571–589 (2020). https://doi.org/10.1007/s10651-020-00459-z
