Kernel density estimation for static backgrounds
Kernel density estimation (KDE) is an essential method for analyzing spatial point (a.k.a. ‘event’) patterns (Silverman 1986). It results in a smooth surface of density estimates by imposing a regular grid of points (‘pixels’) on the study area. The density for each grid point is computed based on neighboring events, as follows: The kernel, a circular window with radius hs (‘bandwidth’; s denotes ‘spatial’), is centered on a data point. Any grid point located within the kernel receives a contribution (a.k.a. ‘weight’) toward its density estimate. The contribution is determined by the grid point’s distance to the data point in the center, which is plugged into the kernel function ks (closer proximity results in a higher contribution). We repeat the procedure for each data point. Lastly, the weights are summed for each grid point, as multiple data points can contribute to a given grid point, which creates a density surface based on the observed point data. For a given grid point (x, y), kernel density \(\hat{f}\left( {x,y} \right)\) is calculated as follows (Eq. 1):
$$\hat{f}\left( {x,y} \right) = \frac{1}{{nh_{{\text{s}}}^{2} }} \mathop \sum \limits_{i = 1}^{n} k_{{\text{s}}} \left( {\frac{{d_{{i\left( {x,y} \right)}} }}{{h_{{\text{s}}} }}} \right)$$
(1)
where n is the number of data points within the study area, ks is the kernel function, and di(x,y) is the distance between the grid point and data point i. Kernel functions like Epanechnikov, Gaussian, or Biweight are widely used and accepted (Bowman and Azzalini 1997). We use the Epanechnikov kernel function (Epanechnikov 1969) in all of our analyses due to its popularity for spatial and spatiotemporal analysis.
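As an illustration of Eq. 1, the following is a minimal sketch in plain Python. The function names (`epanechnikov_2d`, `kde`) and the toy coordinates are hypothetical, and the kernel's normalizing constant (2/π, so that the kernel integrates to one over the unit disk) is an assumption, since the text does not state which normalization is used.

```python
import math

def epanechnikov_2d(u):
    """Bivariate Epanechnikov kernel on the unit disk.
    The constant 2/pi is an assumed normalization (integrates to 1)."""
    return (2.0 / math.pi) * (1.0 - u * u) if u < 1.0 else 0.0

def kde(grid_points, events, h_s):
    """Eq. 1: fixed-bandwidth kernel density at each grid point."""
    n = len(events)
    densities = []
    for gx, gy in grid_points:
        total = 0.0
        for ex, ey in events:
            d = math.hypot(gx - ex, gy - ey)   # distance d_i(x,y)
            total += epanechnikov_2d(d / h_s)  # contribution of event i
        densities.append(total / (n * h_s ** 2))
    return densities

# Toy data: two events, one grid point near them and one far away.
events = [(0.0, 0.0), (1.0, 0.0)]
grid = [(0.0, 0.0), (5.0, 5.0)]
density = kde(grid, events, h_s=2.0)
```

The grid point at (5, 5) lies outside both kernels and therefore receives a density of zero, whereas the grid point at (0, 0) receives contributions from both events.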
Kernel density estimation for dynamic backgrounds
For many geographical research questions, density as the distance-weighted number of points per unit area may not provide a suitable answer (Bithell 1990). In the case of estimating disease risk, mapping the distance-weighted number of disease cases per unit population-at-risk rather than per unit area might be more realistic. Mapping per unit area assumes that geographic distance is the sole determinant of the contribution of a case to the disease risk at a grid point (Shi 2010). As a result, an area of elevated risk identified by KDE might merely reflect a large local background population (Bithell 2000). Depending on the phenomenon under study, the population-at-risk can exhibit an uneven distribution in space and time, include all or only certain segments of the population (e.g. for COVID-19, elderly individuals and people with comorbidities are at increased mortality risk), and may be a sample of the full population-at-risk. This population is referred to as the background population, or simply ‘the background’ (Carlos et al. 2010). A generic method to deal with a spatially varying background is to compute the risk \(\left( {\hat{r}} \right)\) at location (x, y) by dividing the density of cases (c) by the density of the background population (p), as shown in Eq. 2 (Davies and Hazelton 2010).
$$\hat{r}\left( {x,y} \right) = \frac{c}{p}$$
(2)
We use Eq. (2) to compute the risk at any location (x, y) by centering the kernel on each data point i (disease case). We then compute i’s contribution to risk at surrounding grid points by factoring in the population within the kernel. Therefore, the contribution of a case i to the disease risk at a grid point (x, y) is controlled by the local population surrounding i, rather than by the population surrounding (x, y). This distinction is relevant as it can result in different risk estimates under spatially varying backgrounds (Shi 2010).
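A toy example may help to show why it matters whether the population is taken around the case or around the grid point. The sketch below is not the estimator itself: it assumes a uniform weight of 1 inside the kernel, and the helper `population_within` and all coordinates are hypothetical.

```python
import math

def population_within(center, people, radius):
    """Count individuals whose location lies within `radius` of `center`."""
    cx, cy = center
    return sum(1 for px, py in people if math.hypot(px - cx, py - cy) <= radius)

case = (0.0, 0.0)
grid_point = (0.8, 0.0)
radius = 1.0
# Four individuals live close to the case, one lives close to the grid point.
people = [(-0.5, 0.0), (-0.4, 0.1), (-0.6, -0.1), (-0.3, 0.0), (1.5, 0.0)]

# Assumed uniform kernel: weight 1 if the grid point is inside the kernel.
inside = math.hypot(grid_point[0] - case[0], grid_point[1] - case[1]) <= radius
weight = 1.0 if inside else 0.0

# Population taken around the case (the approach described above) ...
case_side = weight / population_within(case, people, radius)
# ... versus population taken around the grid point.
site_side = weight / population_within(grid_point, people, radius)
```

With this configuration the same case contributes 0.25 to the grid point when the population around the case is used, and 1.0 when the population around the grid point is used.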
Fixed kernels have constant bandwidth (search radius), whereas adaptive kernels allow the bandwidth to adapt to local conditions (Sain 2002; Brunsdon 1995). In our case, the kernel can adapt either to the background (Fig. 2a) or to neighboring cases (Fig. 2b). While a fixed kernel results in constant areal support for each case, an adaptive kernel establishes either a constant population or case support. Areas that exhibit a high density of disease events (‘clusters’), for example, are often sought out for prevention efforts (Coleman et al. 2009). As infectious diseases spread between individuals in close proximity (Salathé et al. 2010), an area where cases cluster may be characterized as ‘high risk,’ whereas an area where cases are sparse may be ‘low risk’ (Bhopal 2016; Riley 2007). Therefore, we choose a kernel that adapts to neighboring disease cases and adjusts the estimate to the background within.
We achieve this by centering the kernel on a disease case and increasing the bandwidth until it encircles a specified number of neighboring cases (the support threshold). Note that as the kernel expands, the case in its center expands the spatial range of its contribution to disease risk. In other words, as the circle grows outward, seeking support, more grid points receive a contribution from the disease case in its center.
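Growing the circle until it encircles the required number of neighbors is equivalent to taking the distance to the k-th nearest neighboring case, where k is the support threshold. A minimal sketch follows; the function name `adaptive_bandwidth` is hypothetical, and it assumes that the case in the center does not count toward its own support, which the text does not specify.

```python
import math

def adaptive_bandwidth(case_index, cases, support):
    """Smallest radius around a case that encircles `support` other cases.
    Assumption: the central case is not counted toward its own support."""
    cx, cy = cases[case_index]
    distances = sorted(
        math.hypot(x - cx, y - cy)
        for j, (x, y) in enumerate(cases)
        if j != case_index
    )
    if support > len(distances):
        raise ValueError("support threshold exceeds number of neighboring cases")
    return distances[support - 1]

# Toy data: three cases close together and one isolated case.
cases = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
bandwidths = [adaptive_bandwidth(i, cases, support=2) for i in range(len(cases))]
```

The isolated case at (5, 5) needs a much larger bandwidth than the clustered cases to reach the same support, so its contribution is spread over more grid points.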
Shi (2010) proposes an adaptive bandwidth kernel density estimator (Eq. 3), which corresponds to Fig. 2a:
$$\hat{r}\left( {x,y} \right) = \mathop \sum \limits_{i = 1}^{n} k_{{\text{s}}} \left( {\frac{{d_{{i, \left( {x,y} \right)}} }}{{h_{{\text{s}}} \left[ {p\left( {x_{i} ,y_{i} } \right)} \right]}}} \right)$$
(3)
where the bandwidth hs is a function of the local population density p at the location (xi, yi) of case i. The weight of i for (x, y) is divided by the population in the kernel. This method results in disease risk values that are defensible in health studies, while also being more statistically comparable (Carlos et al. 2010; Shi and Wang 2015; Shi 2010).
Equation (4) denotes S-DB, the purely spatial kernel density estimator for dynamic backgrounds that adapts to the neighboring cases, which corresponds to Fig. 2b:
$$\hat{r}_{{{\text{S}} - {\text{DB}}}} \left( {x,y} \right) = \mathop \sum \limits_{i = 1}^{n} k_{{\text{s}}} \left( {\frac{{d_{{i, \left( {x,y} \right)}} }}{{h_{{\text{s}}} \left[ {c\left( {x_{i} ,y_{i} } \right)} \right]}}} \right)$$
(4)
Here, the bandwidth hs is a function of the local case density c at the location (xi, yi) of case i. The density contribution of i to the locations within the bandwidth is then divided by the population in the kernel. Note that both estimators (Eqs. 3, 4) require choosing the support threshold value. Rather than fixing a single value, we analyze the sensitivity of the resulting risk estimates to the support threshold value.
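The S-DB procedure can be sketched as follows. This is an illustration under several assumptions rather than the implementation used in the analyses: the population is represented as a list of individual point locations, the kernel constant is 2/π, the central case does not count toward its own support, no further normalization (e.g. by the squared bandwidth) is applied, and a case whose kernel contains no population is skipped. The function names are hypothetical.

```python
import math

def epanechnikov_2d(u):
    """Bivariate Epanechnikov kernel (assumed normalization 2/pi)."""
    return (2.0 / math.pi) * (1.0 - u * u) if u < 1.0 else 0.0

def s_db_risk(grid_point, cases, population, support):
    """Risk at one grid point: each case's kernel adapts to its neighboring
    cases, and its weight is divided by the population inside that kernel."""
    gx, gy = grid_point
    risk = 0.0
    for i, (cx, cy) in enumerate(cases):
        # Bandwidth: distance to the `support`-th nearest other case.
        neighbor_dist = sorted(
            math.hypot(x - cx, y - cy)
            for j, (x, y) in enumerate(cases) if j != i
        )
        h = neighbor_dist[support - 1]
        # Population inside the kernel centered on the case.
        pop = sum(1 for px, py in population
                  if math.hypot(px - cx, py - cy) <= h)
        if h == 0.0 or pop == 0:
            continue  # assumption: skip degenerate kernels
        weight = epanechnikov_2d(math.hypot(gx - cx, gy - cy) / h)
        risk += weight / pop
    return risk

cases = [(0.0, 0.0), (1.0, 0.0)]
population = [(0.0, 0.5), (0.2, 0.0), (1.0, 0.5), (3.0, 3.0)]
risk = s_db_risk((0.5, 0.0), cases, population, support=1)
# Doubling the background population halves the estimated risk.
risk_doubled = s_db_risk((0.5, 0.0), cases, population * 2, support=1)
```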
Space–time kernel density estimation for static backgrounds
So far, we have ignored the temporal dimension in the discussion. Many geographic studies do not consider the temporal dimension or employ time-flattening: collapsing the temporal dimension into a single 2D map, which represents the entire study period (Bach et al. 2016). Another approach discretizes time into a number of time slices, which can be displayed as small multiples (Boyandin et al. 2012). However, both methods are limited in their ability to depict patterns of spatiotemporal point events: Time-flattening ignores any temporal variation in the data, and the small multiples approach is not scalable (Delmelle et al. 2014).
Space–time kernel density estimation (STKDE) extends traditional bivariate KDE with the temporal dimension and is suited for characterizing spatiotemporal patterns of spatial point events with a timestamp (Nakaya and Yano 2010). STKDE produces density estimates for a spatiotemporal grid of points (‘voxels’) based on the proximity and number of surrounding point data (Delmelle et al. 2014; Brunsdon et al. 2007). We can visualize the density estimates within a space–time cube (Hagerstrand 1970; Bach et al. 2016; Hohl et al. 2016; Desjardins et al. 2019; Gao 2015; Nakaya and Yano 2010; Demšar et al. 2015) that has two spatial dimensions (x, y) and one temporal dimension (t). STKDE is computed as follows: We center the bottom of the kernel, a cylindrical window defined by its circular base with radius hs (spatial bandwidth) and height ht (temporal bandwidth), on a data point, as a case imposes risk only on the time period after the event. Any voxel located within the kernel receives a contribution (or weight) toward its density estimate. The contribution is determined by the spatial and temporal distances between the voxel and the data point in the center, which are plugged into the spatial and temporal kernel functions (ks, kt). We repeat the procedure for each data point. Lastly, the weights are summed for each voxel, as multiple data points can contribute to a given voxel, which creates a density volume based on the observed point data. For a given voxel (x, y, t), density is calculated as follows (Eq. 5):
$$\hat{f}\left( {x,y,t} \right) = \frac{1}{{nh_{{\text{s}}}^{2} h_{{\text{t}}} }} \mathop \sum \limits_{i} k_{{\text{s}}} \left( {\frac{{d_{{i,\left( {x,y} \right)}} }}{{h_{{\text{s}}} }}} \right)k_{{\text{t}}} \left( {\frac{{d_{i,\left( t \right)} }}{{h_{{\text{t}}} }}} \right)$$
(5)
Every voxel s with coordinates (x, y, t) receives a density estimate \(\hat{f}\left( {x,y,t} \right)\), which is determined by the distance to and number of neighboring data points i. Data points in the neighborhood of s are weighted by the spatial and temporal kernel functions, ks and kt, which are computed as separate components and then multiplied to calculate the contribution of a given data point to the density estimates at surrounding voxels. Lastly, di,(x,y) and di,(t) are the spatial and temporal distances between voxel and data point, respectively.
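Eq. 5 can be sketched as a product of a spatial and a temporal kernel. The names below are hypothetical, and the kernel constants (2/π for the spatial kernel, 0.75 for the temporal kernel) are assumed standard Epanechnikov normalizations that the text does not state. Following the description of the kernel's bottom being centered on the data point, the temporal kernel here only covers voxels at or after the event time.

```python
import math

def k_s(u):
    """Spatial Epanechnikov kernel (assumed normalization 2/pi)."""
    return (2.0 / math.pi) * (1.0 - u * u) if u < 1.0 else 0.0

def k_t(u):
    """Temporal Epanechnikov kernel (assumed constant 0.75), restricted
    to u >= 0 because a case contributes only after it occurs."""
    return 0.75 * (1.0 - u * u) if 0.0 <= u < 1.0 else 0.0

def stkde(voxel, events, h_s, h_t):
    """Eq. 5: space-time kernel density at one voxel (x, y, t)."""
    x, y, t = voxel
    n = len(events)
    total = 0.0
    for ex, ey, et in events:
        d_s = math.hypot(x - ex, y - ey)  # spatial distance d_i,(x,y)
        d_t = t - et                      # temporal distance d_i,(t)
        total += k_s(d_s / h_s) * k_t(d_t / h_t)
    return total / (n * h_s ** 2 * h_t)

# Toy data: two events at the same place, ten time units apart.
events = [(0.0, 0.0, 0.0), (0.0, 0.0, 10.0)]
after_first = stkde((0.0, 0.0, 2.0), events, h_s=2.0, h_t=4.0)
before_all = stkde((0.0, 0.0, -1.0), events, h_s=2.0, h_t=4.0)
```

The voxel at t = 2 receives a contribution from the first event only, and the voxel at t = -1 precedes both events and receives none.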
Space–time kernel density estimation for dynamic backgrounds
Here, we discuss the extension of STKDE to account for spatially and temporally varying backgrounds (Eq. 6). It is the temporal extension of Shi’s case-side adaptive bandwidth kernel density estimator:
$$\hat{r}\left( {x,y,t} \right) = \mathop \sum \limits_{i} k_{{\text{s}}} \left( {\frac{{d_{{i,\left( {x,y} \right)}} }}{{h_{{\text{s}}} \left[ {p\left( {x_{i} ,y_{i} } \right)} \right]}}} \right)k_{{\text{t}}} \left( {\frac{{d_{i,\left( t \right)} }}{{h_{{\text{t}}} \left[ {p\left( {t_{i} } \right)} \right]}}} \right)$$
(6)
The spatial and temporal bandwidths hs and ht are functions of the local population density, p(xi, yi) and p(ti) respectively, at the space–time location (xi, yi, ti) of data point i. The background is assessed within a half cylinder moving through 3D space, which means that we consider the population within the kernel until the disease case occurs, but not after. A kernel that adapts to the background population is useful to establish constant population support (constant p in Eq. 2), rather than constant areal support, which is the case with fixed bandwidth kernels.
As seen in Sect. 2.2, it may make sense to adapt the bandwidth to the surrounding cases c(xi, yi, ti) instead of the local population p(xi, yi, ti) (Eq. 7). Therefore, we define the kernel density estimator for spatially and temporally dynamic backgrounds (ST-DB) as follows:
$$\hat{r}_{{{\text{ST}} - {\text{DB}}}} \left( {x,y,t} \right) = \mathop \sum \limits_{i} k_{{\text{s}}} \left( {\frac{{d_{{i,\left( {x,y} \right)}} }}{{h_{{\text{s}}} \left[ {c\left( {x_{i} ,y_{i} } \right)} \right]}}} \right)k_{{\text{t}}} \left( {\frac{{d_{i,\left( t \right)} }}{{h_{{\text{t}}} \left[ {c\left( {t_{i} } \right)} \right]}}} \right)$$
(7)
Here, the spatial and temporal bandwidths hs and ht expand until a specified number of neighboring disease cases is found within the cylindrical kernel. The density contribution of the disease case i to the voxels within the bandwidth is then divided by the population in the kernel. As population information might be available in different formats and conceptualizations (see Sect. 1), we pick the population columns model (Fig. 1d) to illustrate the utility of our approach. The within-kernel population is computed by summation of the segment length of all population columns within the cylinder (Fig. 2c, d). The sum represents the number of individuals and their length of exposure to the disease case. It is measured in people-days (analogous to the term ‘man-hours’, which quantifies the amount of work that can be done by one person within that period).
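The people-days summation can be sketched as follows, assuming each population column is stored as a tuple (x, y, t_start, t_end) with time measured in days. That data layout and the function name `people_days` are hypothetical. The temporal extent of the cylinder is passed in explicitly as `t_min` and `t_max`, so the sketch does not commit to a particular placement of the cylinder relative to the case.

```python
import math

def people_days(columns, center, radius, t_min, t_max):
    """Sum the lengths of the column segments that fall inside a cylinder
    with the given circular base and temporal extent [t_min, t_max]."""
    cx, cy = center
    total = 0.0
    for x, y, t_start, t_end in columns:
        if math.hypot(x - cx, y - cy) > radius:
            continue  # column lies outside the circular base
        # Length of the overlap between the column and the time window.
        overlap = min(t_end, t_max) - max(t_start, t_min)
        if overlap > 0.0:
            total += overlap
    return total

# Toy data: two columns inside the base, one outside.
columns = [
    (0.0, 0.0, 0.0, 10.0),
    (1.0, 0.0, 5.0, 20.0),
    (5.0, 5.0, 0.0, 30.0),
]
exposure = people_days(columns, center=(0.0, 0.0), radius=2.0,
                       t_min=4.0, t_max=8.0)
```

In this example the first column contributes 4 days and the second 3 days, giving 7 people-days in total.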
The support can be achieved in multiple ways. In searching for neighbors, we could expand exclusively the spatial bandwidth, exclusively the temporal bandwidth, or both in an alternating pattern. Therefore, ambiguity arises from the choice of search strategy. To solve this problem, we need to unify the spatial and temporal dimensions, allowing us to expand the bandwidths simultaneously. We employ the k-nearest neighbors (kNN) method (Jacquez 1996) for this task as follows:
1. Generate two ordered sets for each disease case: (1) the spatial k-nearest neighbors (Fig. 2e) and (2) the temporal k-nearest neighbors (Fig. 2f) of case i.
2. Compute the cardinality card() of the intersection between the two sets (Fig. 2g).
Starting with k = 1, we increase k and apply the procedure until card() equals the support threshold. We then compute the spatial and temporal bandwidths hs and ht as the largest spatial and the largest temporal distance, respectively, between the case and the points in the intersection set. Using this procedure, we unify the spatial and temporal dimensions, enabling the search for support in adaptive-bandwidth kernel density estimation for spatially and temporally dynamic backgrounds. This resolves the ambiguity in search strategy for ST-DB.
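The kNN-based search for support can be sketched as follows. The function name `st_bandwidths` is hypothetical, and the sketch rests on three assumptions not stated in the text: temporal distance is taken as the absolute time difference, ties are broken by input order, and the search stops once card() reaches or exceeds the support threshold, because the intersection can grow by two when k increases by one.

```python
import math

def st_bandwidths(i, cases, support):
    """Return (k, h_s, h_t) for case i, given cases as (x, y, t) tuples."""
    xi, yi, ti = cases[i]
    others = [j for j in range(len(cases)) if j != i]
    d_s = {j: math.hypot(cases[j][0] - xi, cases[j][1] - yi) for j in others}
    d_t = {j: abs(cases[j][2] - ti) for j in others}  # assumed absolute
    by_space = sorted(others, key=lambda j: d_s[j])   # spatial neighbor order
    by_time = sorted(others, key=lambda j: d_t[j])    # temporal neighbor order
    for k in range(1, len(others) + 1):
        # Cases that are both spatial and temporal k-nearest neighbors.
        common = set(by_space[:k]) & set(by_time[:k])
        if len(common) >= support:
            h_s = max(d_s[j] for j in common)
            h_t = max(d_t[j] for j in common)
            return k, h_s, h_t
    raise ValueError("support threshold cannot be reached")

# Toy data: case 1 is close in space but far in time from case 0.
cases = [
    (0.0, 0.0, 0.0),
    (1.0, 0.0, 9.0),
    (2.0, 0.0, 1.0),
    (3.0, 0.0, 2.0),
    (9.0, 9.0, 3.0),
]
k, h_s, h_t = st_bandwidths(0, cases, support=2)
```

For case 0, the intersection is empty at k = 1, contains one case at k = 2, and contains two cases at k = 3, so the search stops at k = 3 with hs = 3 and ht = 2.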