Spatial Spread Sampling Using Weakly Associated Vectors

Geographical data are generally autocorrelated. In this case, it is preferable to select spread units. In this paper, we propose a new method for selecting well spread samples from a finite spatial population with equal or unequal inclusion probabilities. The proposed method is based on the definition of a spatial structure by using stratification matrix. Our method exactly satisfies given inclusion probabilities and provides samples that are very well spread. A set of simulations shows that our method outperforms other existing methods such as the Generalized Random Tessellation Stratified (GRTS) or the Local Pivotal Method (LPM). Analysis of the variance on a real dataset shows that our method is more accurate than these two. Furthermore, a variance estimator is proposed.


Introduction
Geo-referenced data are usually autocorrelated, meaning that two close measurements are often similar. By spreading the measurements in the space, more information is collected. An important problem of spatial sampling is thus to spread at best the sampled units in space. Tillé and Wilhelm (2017), Wang et al. (2012), Benedetti et al. (2017) and Tillé (2020, Chapter 8) give a review of the main spatial sampling methods. The simplest method is systematic sampling. This method provides the best spread of the sample. Geographical stratification with one or a small number of sampled units in each stratum is also an interesting solution. Unfortunately, these two methods cannot be applied when units are selected with unequal probabilities or when units of the population are not evenly distributed in space.
Generalized Random Tessellation Stratified (GRTS) sampling is a spatial sampling method proposed by Stevens Jr. and Olsen (1999, 2003. They use a mapping 1 Introduction by means of a quadrant-recursive function to map a finite subset of a multi-dimensional space into the real line. A one-dimension systematic sampling is then applied, possibly with unequal probabilities (see also Theobald et al., 2007;Brown et al., 2015;Kincaid et al., 2019). Dickson and Tillé (2016) have simply used the Traveling Salesman Problem (TSP) in order to map the population points in one dimension. Systematic sampling is then applied. Grafström (2011) has proposed spatially correlated Poisson sampling (SCPS). This method uses weights to create strong negative correlations between the inclusion indicators of nearby units. The pivotal method is one of the particular cases of the splitting method proposed by Deville and Tillé (1998). It consists of randomly confronting two units at each step. Grafström et al. (2012) proposed the Local Pivotal Method (LPM), where two neighbouring units are confronted at each step, which produces an automatic repulsion in the selection of the neighbour units. Grafström and Tillé (2013) have generalized the LPM to obtain spread samples that are also balanced on totals of auxiliary variables. All these methods are implemented in the BalanceSampling R package (Grafström and Lisic, 2019). Stevens Jr. and Olsen (2004) have proposed to compute the Voronoï polygons around the sampled units, after which they sum the inclusion probabilities of the population units belonging to each Voronoï polygon. The variance of these sums, called "spatial balance", is an indicator of the quality of spreading. Tillé et al. (2018) have modified the index proposed by Moran (1950) so that it can be interpreted as a coefficient of correlation between the units and their neighbourhood. The index provides another measure of the quality of spreading.
In this paper, we propose a new spatial sampling method based on a completely different approach. The algorithm starts with the vector of inclusion probabilities. Like in the cube method (Deville and Tillé, 2004;Tillé, 2006) inclusion probabilities are randomly modified at each step. It can be seen as random walk that from the vector of inclusion probabilities ends up with a sample. By well choosing the modification direction at each step the sample selected is very well spread.
The paper is organized as follows. Section 2 gives the notation and a basic setup of the problem. In Section 3, we introduce the new method that we propose and the process of sample selection. In Section 4, we describe the indices that enable to evaluate the quality of the spreading: the spatial balance index and the measure based on Moran's I index. In Section 5, we present a variance estimatior for our method. In Section 6, we give simulation results of the algorithm on artificial spatial configurations while Section 7 is dedicated to simulations on real data. We used the geo-referenced "Meuse" dataset available in the R package "sp" of Pebesma and Bivand (2005) with inclusion probabilities proportional to the "cadmium" variable. Simulations show that the proposed method surpasses, LPM, GRTS and SCPS for the quality of the spreading, and the estimation accuracy.

Basic setup and notation
Consider a finite population U of size N whose units can be defined by labels k ∈ {1, 2, . . . , N }. Let S = {s|s ⊂ U } be the power set of U . These units are geo-referenced in a space that can have more than two dimensions. A sampling design is defined by a probability distribution p(.) on S such that p(s) ≥ 0 for all s ∈ S and s∈S p(s) = 1.
A random sample S is a random variable that takes elements of S as value such that P(S = s) = p(s). Define a k (S), for k = 1, . . . , N : Then a sample can be denoted by means of a vector notation: a = (a 1 , a 2 , . . . , a N ). For each unit of the population, the inclusion probability 0 ≤ π k ≤ 1 is defined as the probability that unit k is selected into sample S: Let π = (π 1 , . . . , π N ) be the vector of inclusion probabilities. Then, E(a) = π. In many applications, inclusion probabilities are such that samples have a fixed size n. Let the set of all samples that have fixed size equal to n be defined by The sample is generally selected with the aim of estimating some population parameters. Let y k denote a real number associated with unit k ∈ U , usually called the variable of interest. For example, the total Y = k∈U y k can be estimated by using the classical Horvitz-Thompson estimator of the total defined by Usually, some auxiliary information x k = (x k1 , x k2 , . . . , x kp ) regarding the population units is available. In the particular case of spatial sampling, a set of spatial coordinates z k = (z k1 , z k2 , . . . , z kp ) ∈ R p is supposed to be available, where p is the dimension of the considered space. A sampling design is saif to be balanced on the auxiliary variables x k if and only if it satisfies the balancing equations Grafström and Lundström (2013) showed that well spread probability samples are relatively well balanced.

General idea
Our sampling algorithm starts with the inclusion probability vector. At each step, this vector is randomly modified so that at least one of the components of the vector is replaced by a 0 or a 1. So, in at most N steps a sample is randomly selected. This idea is also used in the cube method proposed by Deville and Tillé (2004) to select balanced samples. By carefully choosing the direction of the modification of the working vector, we can ensure that the selection of the sample will be well spread. This choice is describe in sections 3.4.

Distance
In order to describe the spatial structure of the population, a distance is defined as a function m defined on the product set U × U such that and satisfies the property of non-negativity, symmetry, and triangular inequality. More specifically, for all x, y, z ∈ U the following properties hold: m(x, y) ≥ 0, m(x, y) = 0 ⇐⇒ x = y, m(x, y) = m(y, x), m(x, z) ≤ m(x, y)+m(y, z).
In most of applications, the usual Euclidean distance is used. It is defined by, where z k and z are the spatial coordinates of units k, ∈ U . Sometimes it could be interesting to compute the distance on auxiliary variables. In this case, the Mahalanobis distance can be more appropriate, When the population is distributed on a N 1 × N 2 regular grid of R 2 , a tore distance can be defined. With a tore distance, two units on the same column (respectively row) that are on the opposite side have a small distance. It is like seeing the grid curved such that it looks like a regular tore. The distance is then defined by: Example 3.1: Let {1, . . . , 9} be on a regular grid of size 3 × 3, then the squared distance matrices defined by Equations (3) and (4) are equal to 0 1 4 1 2 5 4 5 8 1 0 1 2 1 2 5 4 5 4 1 0 5 2 1 8 5 4 1 2 5 0 1 4 1 2 5 2 1 2 1 0 1 2 1 2 5 2 1 4 1 0 5 2 1 4 5 8 1 2 5 0 1 4 5 4 5 2 1 2 1 0 1 8 5 4 5 2 1 4 1 0 0 1 1 1 2 2 1 2 2 1 0 1 2 1 2 2 1 2 1 1 0 2 2 1 2 2 1 1 2 2 0 1 1 1 2 2 2 1 2 1 0 1 2 1 2 2 2 1 1 1 0 2 2 1 1 2 2 1 2 2 0 1 1 2 1 2 2 1 2 1 0 1 2 2 1 2 2 1 1 1 0 In spatial configuration of a regular grid, some distances between points are equal. The rank of the nearest neighbours is then assigned and duplicated values appear. In order to obtain a different rank distance for each unit, a small random quantity is added to the coordinates so that it disturbs the given units and the distances are a little bit different from each other. Let ε ∈ R 2 andz k = z k + ε the shifted coordinates, Equation (4) is then replaced by, ε is called a "shift" and m S the shifted version of m T , for example if ε = (1/12, 1/4), the distance matrix M S becomes, The matrix is no longer a distance matrix since the symmetric axiom has been dropped. A distance that has an unsatisfied symmetry axiom is called a quasi-metric.
Nevertheless, if an epsilon value is added instead of (1/12, 1/4), then the values are almost the same and the order is preserved in each row. In Figure 1, three simple configurations are presented: Euclidean, tore and shifted tore distance on a 3×3 regular grid. In shifted distance graph, all the distances from point (1, 1) to the other grid points are different.

Euclidean
Tore Shifted Tore  (3), the central graph is the tore distance given in (4) and the right one is the shifted tore distance with a shift equal to (1/12, 1/4) (the red point on the graph). It illustrates the two different patterns and the values of the grid points corresponding to the entries of the first row of the three previous matrices (5).

The stratification matrix
Let k ∈ U a unit in the population. Define G k the set of the nearest neighbours of unit k, including k, such that their inclusion probabilities are greater or equal than one by only one unit. Denote g k = #G k , the spatial weights are then defined as follows otherwise.
W denote an N × N stratification matrix and each row of matrix W represents a stratum. Each stratum is defined by a particular unit and its neighbouring units. Nearest neighbours are defined with a metric function (2). If the metric is such that there exists ties values, then we can divide the quantity w kl into the different g k nearest neighbours of the unit k that have the same distance. Or, a shifted metrics can be used (exemplified in matrix (6)) such that all the distances are different. Each row of matrix W sum to 1. Thus matrix W is a right stochastic matrix. Most of the components of matrix W are null. W can thus be encoded as a sparse matrix.
Example 3.2: Let {1, . . . , 9} be on a regular grid of size 3 × 3 with inclusion probabilities equal π k = 1/3, for all k ∈ U . Figure 2 shows different stratification matrices corresponding to M E , M T and M S with a shift randomly generated from a random variable N (0, 1/100I).

Fig. 2:
Sparsity pattern of three stratification matrices. Spatial coordinates are 3 × 3 regular grid and the inclusion probabilities are equal to π = (1/3, . . . , 1/3). Depending on the way of defining the nearest neighbours in Equation (7), different weight values are obtained. The left stratification matrix uses the classical Euclidean distance (3), the central one the tore distance (4) and the right one uses a shifted tore distance with a shift randomly generated from a random variable N (0, 1/100I).
Let now D = diag(π) the matrix with inclusion probabilities on the diagonal and define A by Matrices W and A are square but not necessarily full rank. The sum of the rows of A is equal or approximately equal to the number of elements in each stratum. The strata are represented by the rows and the contribution of a unit i in each stratum is represented by the ith column. Figure 3 shows the sparsity pattern of the two stratification matrices.
Example 3.3: Let U be a population of size N = 250 and inclusion probabilities equal to π k = 1/25, for all k ∈ U . Suppose that spatial coordinates are generated independantly from a uniform distribution on the square unit, so that with probability one there are no ties distance value. Since all 1/π k = 25 the non-zero entries of A are all equal to 1. Based on the definition (7), the weights are all equal to the inclusion probabilities or zero. Figure 3 shows the sparsity pattern of the stratification matrices and exemplifies some initial strata.

Implementation
The method is described in detail in Algorithm 1. The main idea is derived from the cube method (Deville and Tillé, 2004). At each step, vector π is randomly modified. To modify π, we choose a vector that spreads at best. Ideally, the aim consists of obtaining a sample a such that the following equality is satisfied: This linear system define an affine subspace of R N : which could also be rewrite: Depending if matrix A is full rank or not, the vector giving the direction is not selected in the same way. If matrix A is not full rank, a vector that is contained in the right null space is selected. If matrix A is full rank, we compute v,u a left and a right singular vectors associated to the smallest singular value σ of A i.e, Av = σu, A u = σv.
By choosing the modification vector v, we ensure that we select the vector which remains closest to the set A. Vector v is called the weakest associated vector to the matrix Intial strata Stratification matrix Fig. 3: Representation of the strata defined by the spatial weights Equation (7). Spatial coordinates of the units are generated randomly from a uniform distribution on the square unit [0, 1] × [0, 1]. The overall population size is equal to N = 250 and the inclusion probabilities are identical and equal to π k = 1/25 = 0.04. Meaning that the sample size is equal to n = 10. With these parameters the expected number of units in each stratum is equal to 125/4 = 25. The left graph shows the population and the selected units with its initial strata. On the right, it shows the sparsity pattern of the matrix (8). All entries of the matrix are equal to 1.
A. Vector v is then centred to ensure the fixed sample size. By using these weakest associated vectors, the initial spatial configurations are the least modified. At each step, some inclusion probabilities π are modified and at least one component is set to 0 or 1. Matrix A is updated from the new inclusion probabilities. This step is repeated until there is only one component that is not equal to 0 or 1. Algorithm 1 is implemented in the R package WaveSampling (Jauslin and Tillé, 2019), which uses the Armadillo C++ library into the R interface (Eddelbuettel and Sanderson, 2014). The implementation uses the sparse matrix class. Indeed, depending on the inclusion probabilities, matrix A given in (8) could be strongly sparse. Even if the function benefits from the C++ implementation, it could be quite time consuming as the size of the population N increases. Nevertheless, we will see in the next section that the algorithm performs better in terms of two spreading measure than those currently used for the spatial balanced sampling design.
1. From π t , extract π t vector π t restricted to the k such that 0 < π (t) k < 1. Let J be the length of π t .
2. Compute the J × J matrix A t of Equation (8) using inclusion probabilities π t .
3. Calculate the rank r of matrix A t .
(a) If matrix A t does not have full rank, J ) ∈ R J a vector in the right null space of A t .
(b) If matrix A t has full rank, compute the singular value decomposition and seek for v t a right singular vector associated to the smallest singular value σ t .
4. Next in order to ensure the fixed sample size, vector v t is centered: where 1 J is the J × 1 vector of one.
7. Return at 1. with π t+1 until no units k remains such that 0 < π (t+1) k < 1. The variance of the E[v k ] could be approximated and give a good measure of the spatial balance of the sample. The spatial balance measure based on the Voronoï polygons is defined by Two samples are compared in Fig. 4. The left one is selected with a simple random sampling without replacement and the right one is selected with WAVE sampling. The darker the Voronoï polygons, the more units it contains. An exactly well spread sample should have all polygons of the same colour.
The measure B has some limitations. It does not vary from a fixed finite range. This does not allow a clear understanding if the sample is balanced or clustered (Tillé et al., 2018). Moreover, the measure behaves sometimes wrongly and suggest a well spread sample altough it is not the case. Examples are given in Supplementary Material Section in Fig. 7 and Fig. 8. For these reasons, we suggest to use another measure based on Moran's I index.

Moran's I index
A second approach for measuring the spatial balance of a sampling design has been proposed by Tillé et al. (2018). Consider a N × N spatial weights matrix, A large value of w k indicates that is a neighbour of k. Matrix W is not necessarily symmetric. The index proposed by Tillé et al. (2018) is defined by where a is the sample andā w = a W1 1 W1 , Simple random sampling Weakly associated vectors 1 2 v k Fig. 4: Illustrated example of how the spatial balance measure based on the Voronoï is performed. The population and sample sizes are respectively equal to N = 50 and n = 20, the inclusion probabilities are identical and equal to π k = 0.4. The spatial coordinates are generated from two random uniform U(0, 1). Two sampling design are compared. The left one is the simple random sampling without replacement and the right one is the weakly associated vector sampling.
D is the diagonal matrix containing w k. = ∈U w k on its diagonal, and 1 is a column vector of N ones. Tillé et al. (2018) pointed out that I B can be interpreted as weighted correlation between a k and the average of the a that are in the neighbouring of k. We have that −1 ≤ I B ≤ 1 and I B = −1 when the sample is well spread. Tillé et al. (2018) have proposed to use the inverse of the inclusion probability h k = 1/π k to define the neighbouring of the unit k. More specifically, if the unit k is selected it seems natural to consider h k − 1 neighbours in the population. Let h k and h k be respectively the inferior and superior integers of h k . Spatial weights are then defined as follows, if unit is in the set of the h k nearest neighbour of k h k − h k if unit is the h k th nearest neighbour of k 0 otherwise .
For example, if a unit k has an inclusion probability of π k = 0.35 then h k ∼ = 2.857. Meaning that the first nearest neighbour of k has a weight equal to 1 and the second has a weight of 0.857. In case there are units that are at equal distance from each other, Tillé et al. (2018) suggests to divide the spatial weights equally among them.
We propose a new way of defining the spatial weights. It consists of using spatial weights defined in (7) rather than the weights (11). We set w kk = 0 for all k ∈ U . For the rest of the paper, I B 1 will represent the measure based on the spatial weights (11) and I B the one based on (7).

Variance estimation
If the sampling design is of fixed size, the variance of the Horvitz-Thompson estimator of the total (1) is defined by where ∆ k = π k − π k π and π k = E(a k a l ) is the joint inclusion probabilities. For complex sampling designs, quantities π k are generally impossible to compute. Many different estimators have been developed. Sen (1953) and Yates and Grundy (1953) proposed one classical estimator: This estimator can take negative values, but it is non-negative when ∆ k ≤ 0 for all k = ∈ U . A common problem with spatially balanced sampling designs is that many joint inclusion probabilities are equal to zero. Indeed the probability of selecting two close units is generally zero or very close to zero. In this case, v SY G is not an unbiaised estimator of var( Y HT ). Tillé (2020, Chapter 5) gives a general estimator based on the variance estimator of the conditional Poisson sampling. It is equal to Choosing c = (1−π k )n/n−1 we obtain the Hájek-Rosén estimator Hájek (1981) This variance estimator is simple to compute and has the advantage of using only the first-order inclusion probabilities. It is a good estimator for maximum entropy sampling design and simple random sampling without replacement. Grafström et al. (2012) pointed out that the estimator seems to overestimate the variance for spread sampling design. Stevens Jr. and Olsen (2003) proposed an estimator based on a local neighbourhood for each units in the sample. It is called the local mean variance estimator and is given by where the weights w k are computed such that they vary inversely as π and decrease as the distance between unit k and increases. Moreover, it satisfies the constraint k∈S w k = ∈S w k = 1. The set D k is the neighbourhood of the unit k and is defined by the unit itself and the three neighbourhoods of the three nearest neighbours. Meaning that D k contains at least four units and at most thirteen. This variance estimator is implemented by function localmean.var in the R package "spsurvey" Kincaid et al. (2019). It produces a good estimator for the GRTS method. As the WAVE sampling seems to select a more spatially spread sample than the GRTS method, we observe that v LM ( Y HT ) overestimates the variance. We then proposed to decrease the neighbourhood size to two (three with the included unit) for the WAVE sampling. This should produce better estimates of the variance.

Simulations on artificial spatial configurations
In this section, we propose three artificial spatial configurations to study the performance of the WAVE sampling in terms of spreading measure. To generate the three population datasets, the expected size of the population is equal to N = 225.

The dataset is generated from the Complete Spatial Randomness (CSR) that is a
Poisson process with intensity equal to N , meaning that the expected number of points in the unit square is equal to N . (Neyman and Scott, 1958) is generated with 15 circular discs of radius 0.055 with units uniformly distributed around the centre. Each cluster contains 15 units such that the population target size is equal to N .

A Neyman-Scott cluster process
3. Simple regular grid of size 15 × 15. Figure 5 shows a sample selection by the WAVE sampling design on the three different datasets. For the three configurations, the sample size is equal to n = 75 and the inclusion probabilities are all equal to π k = n/N for all k ∈ U . When units are regularly dispersed in the space and when the inverse of inclusion probabilities is equal to an integer that is a divisor of the population size N , the selected sample can be systematic, which is the optimal solution.
Complete spatial randomness Neyman-Scott process Simple regular grid Fig. 5: Example of a sample selection by the WAVE sampling on the three different spatial configurations, Complete Spatial Randomness, Neyman-Scott and regular grid. For each of them, the inclusion probabilities are equal to π k = n/N for all k ∈ U .
For each population, 10,000 samples of size n respectively equal to 25, 50 and 100 are selected. Two cases are considered for the inclusion probabilities. In the first case, all inclusion probabilities are equal for all k ∈ U, π k = π = n N .
For the second case, the inclusion probabilities are unequal and sum up to n, for all k, ∈ U, k = we have π k = π and k∈U π k = n.
In each case we calculate the spatial balance based on the Voronoï polygons (9) and measures based on Moran's I index (10). The simulations results of the CRS dataset are given in the Table 1. For the measures based on Moran's I index, the WAVE sampling design performs better than the other algorithms. Moreover, for the classical measure based on the Voronoï polygons, the WAVE sampling design performs equally and sometimes better than the local pivotal method. This can be explained by the fact that the spatial balance measure based on the Voronoï polygons is less sensitive to observe a well spread sample and sometimes suggest a well spread sample altough it is not the case (See Supplementary Material Section). For the equal probabilities designs the measures I B 1 and I B coincide. Indeed the strata based on the inverse inclusion probabilities are the same as the ones considered such that the inclusion probabilities sum to 1. For unequal sampling designs, the differences are less marked with the measure based on the inverse inclusion probabilities (11). This result comes from the heterogeneity of the strata and the randomness of the algorithm. If the inclusion probabilities of a unit is nearly zero, Tab. 1: Spreading measures results based on 10000 simulations on the Complete spatial randomness dataset. The population size is equal to 219.

Sampling design
Equal probabilities Unequal probabilities wave lpm1 scps grts srswor wave lpm1 scps grts maxent then the size of the strata will be very large. This effect can increase the spatial balance measure. Similar results for the two remaining datasets can be seen in the Supplementary Material Section. This analysis shows that the measure I B should be prefered to I B 1 .

Application to the Meuse dataset
This section investigates the application of WAVE sampling on the dataset "Meuse" available in the R package "sp" of Pebesma and Bivand (2005). It is described as follows: "This data set gives locations and topsoil heavy metal concentrations, along with a number of soil and landscape variables at the observation locations, collected in a flood plain of the river Meuse, near the village of Stein (NL). Heavy metal concentrations are from composite samples of an area of approximately 15 m x 15 m." In order to see how the WAVE sampling performs in terms of spread measures, m = 10, 000 samples of size respectively equal to 15, 30 and 75 are selected. As in the previous simulation with an artificial population, two cases are considered, equal and unequal probabilities. In the latter case, inclusion probabilities are set proportional to concentration of copper. Locations with high concentrations of copper were therefore more likely to be selected into the sample. Let Y be the total cadmium concentration over the whole population. To show that the variance of the estimated total with the WAVE sampling design is lower than the other method, we calculate the approximated variance with the following quantity:   shows sample selected with the WAVE sampling. The filled black circles are selected units while the hollow circles are those that are not selected in the sample. We observe that the dataset is partially aggregated around the river showing a strong spatial correlation.
Results of the three spatial balanced measures on 10'000 simulated samples is given in Table 2. WAVE sampling performs better than other sampling designs in terms of I B and I B 1 . In terms of spatial balance measure B, the algorithms are comparable to the artificial simulation, the differences are less marked.
Results of the simulations on the variance estimator in Table 3 shows that the WAVE sampling startegy has a lower variance than the currently used method. This suggests that the method is more efficient in cases where there is a clear spatial correlation. A design-unbiased variance estimator does not exist for the Horvitz-Tompson estimator, but the local mean variance (13) with only three neighbours seems to produce a good estimator for this dataset.  Based on these simulation results, we are confident that we propose here a new method that allows to select a sample with a really strong degree of spreading. It performs better than the other sampling method. It can be generalized to higher dimensions and respects the unequal inclusion probabilities.

Acknowledgments
We would like to thank Pierre-Yves Deléamont, Ziqing Dong, Esther Eustache, Cliona Jauslin and Lionel Qualité for their time spent to bring valuable comments on this manuscript.
Tab. 3: Results of 10000 simulations on Meuse dataset. The population size is equal to 155. v sim is equal to the variance approximated by the simulations (14). v depends on the sampling design. For the srswor and maxent methods, we used the estimator v HAJ (12) while for the other sampling designs, we use v LM (13). In the wave sampling design, the parameter for the neighbouring is set to three instead of four. Coverage rate of the 95% confidence intervals are computed as well as the ratio between averages of v and v SIM .  . . , 36} be on a regular grid 6 × 6, inclusion probabilities are all equal to π k = π = 1/6. Measure B given in Equation (9)  On the right figure, population is clustered around four units. Inclusion probabilities are all equal to π k = π = 1/9 (respectively π k = π = 1/12). Measure B (9) for the both example is equal to 0 while it should be a great value. On the other hand the Moran index (10) shows a correct behaviour with the two samples. For the Example 1 I B = 0.732 while I B = 0.232 for Example 2.