1 Introduction

Spatial sampling is a crucial issue for proper estimation of parameters in spatio-temporal dynamical models [9, 18] and for estimation of spatial fields (see [14]). For wireless sensor networks (WSNs) at least three requirements are crucial, namely efficient energy usage, coverage and connectivity. Here we concentrate on the last two from the viewpoint of sensor deployment. Coverage (or information coverage) is the ability of a WSN to cover the whole area, assuming that a single sensor can collect information from its (usually circular) neighborhood. Connectivity requires that wireless sensors can transmit information from one to another and finally to a sink. Both connectivity and coverage require sensors to be evenly placed in the area.

Our aim is to discuss advantages of equidistributed (EQD) sequences (also known as uniformly distributed (UD) or quasi-random points [8]) and of their sub-class, recently proposed by the authors [17], of well-distributed (WD) sequences. Furthermore, we shall prove that it is possible to construct a nonparametric estimator of the mean of a (possibly correlated) random field based on observations from such points.

We refer the reader to [1] and [3] for surveys on WSN and to [5, 10, 11, 14, 19] for an excerpt of approaches recently proposed for sensors placement.

2 Equidistributed Sequences – Good Candidates for Sensors’ Sites

Equidistributed sequences, also called uniformly distributed, quasi-random or quasi-Monte Carlo sequences, are well known in the theory of numerical integration (see, e.g., [7, 8]). Here we summarize some of their basic properties, putting emphasis on those which indicate that they are good candidates for sensor positions in a WSN.

2.1 Definition

Define \(I_d=[0,\,1]^d\) as the \(d\)-dimensional unit cube, which is our space for sensor deployment. Clearly, most WSNs are considered for \(d=2\), but in some applications, e.g., air pollution monitoring, it can also be of interest to consider a WSN in 3D space.

A deterministic sequence \((x_{i})_{i=1}^{n}\) is called an EQD sequence in \(I_d\) iff

$$\begin{aligned} \lim _{n\rightarrow \infty } n^{-1}\sum _{i=1}^{n}\, g(x_{i}) = \int _{I_d}\, g(x)\, dx \end{aligned}$$

holds for every \(g\) continuous on \(I_d\).

Thus, formally, EQD sequences behave like uniformly distributed random sequences, since the defining limit mimics a law of large numbers. As we shall see later, EQD sequences are in some sense “more uniform” than uniformly distributed random sequences.

The well-known EQD sequences include those of van der Corput, Halton, Hammersley, Korobov, Zaremba and Sobol (see, e.g., [6, 8]). For our purposes, the following example is important.

An important example is the Weyl sequence, defined as follows:

$$ t_{i}=\text{ fractional } \text{ part }(i\, \theta ), \quad i=1,2,\ldots , $$

where \(\theta \) is a selected irrational number. E.g., the quadratic irrational \(\theta =(\sqrt{5}-1)/2\) behaves quite well in practice. In general, the more difficult it is to approximate \(\theta \) by rational numbers, the better a candidate it is for generating the Weyl sequence.
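In code, the Weyl construction is a one-liner. A minimal sketch in Python (the function name `weyl_sequence` is ours):

```python
import math

def weyl_sequence(n, theta=(math.sqrt(5) - 1) / 2):
    """First n terms of the Weyl sequence t_i = frac(i * theta)."""
    return [math.fmod(i * theta, 1.0) for i in range(1, n + 1)]

t = weyl_sequence(1000)
```

For the golden-section ratio used as the default, the empirical mean of the first few thousand terms is already very close to \(1/2\), in line with the defining limit above.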

2.2 Discrepancy as Indicator of WSN Coverage and Connectivity

The discrepancy \(D_n\) of an EQD sequence is a well-known measure of its uniformity (see, e.g., [6, 7]). The \(D_n\) discrepancy of \((x_{i})_{i=1}^{n}\) is defined as follows:

$$\begin{aligned} D_n = \sup _{A\subset I_d}\left| \mu _d(A)-\frac{N_n(A)}{n} \right| , \end{aligned}$$

where \(A\) is any parallelepiped and the supremum is taken with respect to all such \(A\subset I_d\), \(\mu _d(A)\) is the \(d\)-dimensional Lebesgue measure (the area or volume) of \(A\), while \(N_n(A)\) is the number of \(x_i\)’s in \(A\).

As in the classical Monte Carlo method, one can expect that for fairly spaced sensors \(\mu _d(A)\) is close to the fraction of sensors in \(A\). To interpret this definition, assume that the supremum is attained for a certain set \(\hat{A}_n\subset I_d\). Then \(\hat{A}_n\) is the set that is most unevenly covered by sensors, i.e., it contains too few or too many sensors in comparison to other areas.

For this reason, in our opinion, \(D_n\) is a good measure of the connectivity and coverage of a sensor network. It is, however, not easy to find \(\hat{A}_n\) numerically, and for this reason in the theory of numerical integration a simplified, but still useful, version of discrepancy, called the \(D_n^*\) discrepancy, is more frequently used.

Let \(\Pi (x)\) be the parallelepiped in \(I_d\) with opposite vertices at \((0,\ldots ,0)\) and \(x\). The \(D_n^*\) discrepancy of \((x_{i})_{i=1}^{n}\) is defined as follows:

$$\begin{aligned} D_n^* = \sup _{x\in I_d}\left| \mu _d(\Pi (x))-\frac{N_n(x)}{n} \right| , \end{aligned}$$

where \(N_n(x)\) is the number of \(x_i\)’s in \(\Pi (x)\).

Notice that \(D_n^*\) is an analog of the Kolmogorov–Smirnov statistic for testing the uniformity of a distribution. \(D_n^*\) can be efficiently calculated for a given set of points in 1D and 2D. Furthermore, it can be proved (see [7]) that

$$\begin{aligned} D_n^* \le D_n\le 2^d\, D_n^*. \end{aligned}$$

For “good” known EQD sequences:

$$D_n^* =O\left( \frac{\log ^{d}(n)}{n}\right) . $$

This is much better than for “usual” uniformly distributed random variables, for which

$$ D_n^*\sim \frac{1}{\sqrt{n}}.$$

The same order \(1/\sqrt{n}\) is obtained when an equidistant grid is considered in \(R^2\).
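In 1D, the \(D_n^*\) discrepancy of a sample can be computed exactly: after sorting the points, \(D_n^* = \max_{1\le i\le n}\max\bigl(i/n - x_{(i)},\ x_{(i)} - (i-1)/n\bigr)\). The sketch below (function names are ours) can be used to check the rates quoted above, e.g., to compare the Weyl sequence with random points:

```python
import math
import random

def star_discrepancy_1d(points):
    """Exact D_n^* on [0,1] via the closed-form formula for sorted samples."""
    xs = sorted(points)
    n = len(xs)
    return max(max(i / n - x, x - (i - 1) / n)
               for i, x in enumerate(xs, start=1))

theta = (math.sqrt(5) - 1) / 2
weyl = [math.fmod(i * theta, 1.0) for i in range(1, 1001)]
rand = [random.random() for _ in range(1000)]
d_weyl = star_discrepancy_1d(weyl)   # decays like log(n)/n
d_rand = star_discrepancy_1d(rand)   # only like 1/sqrt(n)
```

For \(n=1000\) the Weyl value is typically an order of magnitude smaller than the random one, illustrating the "more uniform than random" behavior discussed above.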

3 Basic Properties of Space-Filling Curves

Our idea is to transform a one-dimensional EQD sequence by a space-filling curve (SFC) in order to obtain a multidimensional EQD sequence with good properties. Note that a similar construction has already been proposed and used in the theory of numerical integration, but the transformed sequence was equidistant, i.e., \(i/n\), \(i=1,\,2,\ldots ,\, n\).

Below we summarize known properties of space-filling curves that are either directly used in the rest of the paper or indicate why we can expect our construction to lead to sensor sites with good coverage and connectivity.

A space-filling curve is a mapping \(\varPhi : [0,\, 1] \mathop {\rightarrow }\limits ^{onto} I_d\) such that

  • \(\varPhi (t)\) is a continuous function on \(I_1=[0,\, 1]\),

  • \(\varPhi \) maps \(I_1=[0,1]\) onto \(I_d\) – the \(d\)-dimensional cube.

The Hilbert, Peano and Sierpiński curves are well-known examples of SFCs [15]. All these curves:

  • preserve areas and neighbors (see SFC (2), SFC (3) below),

  • fill the space uniformly and as densely as desired,

  • can be generated recursively and efficiently.

For the Peano, Hilbert and Sierpiński curves the following properties hold:

SFC (1) \(\forall g:\,I_d\rightarrow R\), g - continuous

$$\begin{aligned} \int _{I_d}\, g(x)\, dx=\int ^1_0\,g(\varPhi (t))\,dt, \end{aligned}$$

where \(x=[x^{(1)},\, x^{(2)},\ldots ,\,x^{(d)}]\)

SFC (2) Hölder continuity:

$$\begin{aligned} \exists _{C_\varPhi >0}\,\, ||\varPhi (t)-\varPhi (t')|| \le C_{\varPhi } |t-t'|^{1/d} , \end{aligned}$$

where \(||.||\) is the Euclidean norm in \(R^d\).

It is well known that \(\varPhi \) does not have an inverse. However, for any Borel set \(A\subset I_d\) we can define \(\varPhi ^{-1}(A)\) as the preimage of \(A\).

SFC (3) \(\varPhi \) preserves the Lebesgue measure in the sense that

$$\begin{aligned} \forall _{A\subset I_d}\,\,\,\, \mu _d(A)=\mu _1(\varPhi ^{-1}(A)), \end{aligned}$$

where \(\mu _1\), \(\mu _d\) are the Lebesgue measures in \(R^1\) and \(R^d\), respectively. In other words, the volume of a Borel set \(A\) in \(I_d\) is equal to the length of \(\varPhi ^{-1}(A)\) in \(I_1\).

We stress that the Peano, Hilbert and Sierpiński curves can be approximated efficiently, using \(O\left( \lceil \frac{d}{\varepsilon } \rceil \right) \) arithmetic operations, where \(\varepsilon >0\) is the approximation accuracy (see [2, 16]). Notice that for our purpose we calculate \(\varPhi (t)\) only once for \(t=t_1,\,t_2,\ldots ,\, t_n\). An approximation of the Hilbert SFC is shown in Fig. 1.

Fig. 1. Approximation of the Hilbert SFC using 1024 points

4 Proposed EQD Sequences

The following algorithm generates EQD sequences that can be used as sensors’ positions.

Algorithm 1

Step (1) Generate an EQD sequence in \([0,\,1]\) by the Weyl method: \( t_{i}=\text{ frac }(i\, \theta ),\ i=1,2,\ldots ,\, n\), where \(\theta \) is an irrational number, e.g., \(\theta =(\sqrt{5}-1)/2\).

Step (2*) Sort the \(t_i\)’s to get \(t_{(1)}<t_{(2)}<\ldots < t_{(n)}\). This step is optional; it serves mainly theoretical purposes, but it can also be used to determine the ordering of sensors along the SFC, which – in turn – can indicate the ordering of information transmission between sensors.

Step (3) Select SFC and generate \(x_i\)’s as follows: \( x_{i}=\varPhi (t_{(i)})\), \(i=1,2,\ldots ,\,n\).
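Algorithm 1 is straightforward to implement once an approximation of \(\varPhi\) is available. The sketch below (all names are ours) uses the standard public-domain index-to-coordinates recursion for the Hilbert curve on a \(2^{order}\times 2^{order}\) grid as a simple stand-in for the approximations of [2, 16]:

```python
import math

def hilbert_d2xy(order, d):
    """Map an index d in [0, 4**order) to integer cell coordinates (x, y)
    on the order-`order` Hilbert curve (classical iterative recursion)."""
    x = y = 0
    s, t = 1, d
    while s < 2 ** order:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                      # rotate the quadrant if needed
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def algorithm1(n, order=10, theta=(math.sqrt(5) - 1) / 2):
    """Steps (1)-(3): Weyl points, optional sort, image under the Hilbert SFC."""
    t = sorted(math.fmod(i * theta, 1.0) for i in range(1, n + 1))  # Steps (1), (2*)
    side = 2 ** order
    cells = side * side
    pts = []
    for ti in t:                                                    # Step (3)
        x, y = hilbert_d2xy(order, min(int(ti * cells), cells - 1))
        pts.append(((x + 0.5) / side, (y + 0.5) / side))            # cell centres
    return pts
```

Note that extendability is visible here: the set of sites produced for \(n\) sensors is a subset of the set produced for \(n+m\), since the underlying Weyl points are simply appended.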

Below we state properties of the above generated sequences that justify their usefulness as sensors’ positions.

Algorithm 1 generates sequences that are extendable, i.e., one can add points without recalculating positions of earlier sites. This property is not shared by many other methods of generating multidimensional EQD sequences.

A deterministic sequence \((x_{i})_{i=1}^{n}\) is called well distributed (WD) in \(I_d\) iff

$$ \lim _{n\rightarrow \infty } n^{-1}\sum _{i=p}^{p+n}\, g(x_{i}) = \int _{I_d}\, g(x)\, dx $$

holds uniformly in \(p\), for every \(g\in C(I_d)\), i.e., continuous on \(I_d\).

It is known that the Weyl sequence \(t_i=\text{frac}(i\,\theta )\) is WD. As far as we know, no other multidimensional, extendable WD sequences are known. The only exception is the sequence proposed above.

Theorem 1: If \(\theta \) is irrational and the SFC has property SFC (1), then the \(x_i\)’s generated by Algorithm 1 are not only EQD but also well distributed.

Proof – EQD property. \(\quad \forall g\in C(I_d)\) we have:

$$\begin{aligned}&n^{-1}\,\sum _{i=1}^n g(x_i)= n^{-1}\,\sum _{i=1}^n g(\varPhi (t_i))\, \rightarrow \nonumber \\&\quad \rightarrow \, \int ^1_0 g(\varPhi (t))dt=\int _{I_d} g(x)\,dx, \end{aligned}$$

since \(\left\{ t_i\right\} _{i=1}^n\) are EQD and \(g(\varPhi (\cdot ))\) is also continuous, while the last equality follows from SFC (1). The proof of the WD property uses the Weyl criterion, generalized to WD sequences, and is too complicated to be presented here (see [17] for details).

Corollary 1.

Under the same assumptions as in Theorem 1, for points generated by Algorithm 1 we have:

$$\begin{aligned} D_n\rightarrow 0 \quad \text{ and } \quad D^*_n\rightarrow 0 \quad \text{ as } \quad n\rightarrow \infty \end{aligned}$$

The first statement follows from the definition of equidistribution by selecting \(g\) as indicator functions of parallelepipeds (such indicators can be approximated by continuous functions). The second one is a direct consequence of the inequality \(D_n^*\le D_n\).

Fig. 2. \(n=256\) points generated by Algorithm 1, using the Hilbert SFC, linked by thin lines indicating their ordering on the SFC, which can be used as the ordering for transmitting information between sensors.

Points generated by Algorithm 1, using the Hilbert SFC and \(\theta =(\sqrt{5}-1)/2\), are shown in Fig. 2. We do not have a closed-form formula for the \(D_n^*\) of sequences generated by Algorithm 1. Instead, we have performed extensive simulations for \(n\) ranging from 50 to 30 000. Then, values of \(D^*_n\) were calculated and the least-squares method was used to fit the dependence of \(D_n^*(d)\) on \(n\) for \(d=2\) and \(d=3\). The results are the following:

$$\begin{aligned} 0.2\,\log ^2(n)/n\quad \text{ for }\quad d=2, \end{aligned}$$
$$\begin{aligned} 0.06\,\log ^3(n)/n\quad \text{ for }\quad d=3. \end{aligned}$$

They follow the general pattern of “good” EQD sequences:

$$\begin{aligned} D_n^*(d)=O\left( \frac{\log ^d(n)}{n}\right) . \end{aligned}$$

5 Nonparametric Estimation of Spatial Means of Random Fields

Our aim in this section is to show how to estimate the spatial mean of a random field using WD points generated by Algorithm 1. The proposed estimator is similar to those proposed earlier in [12, 13], but it differs in the following two respects.

  • In [12] and [13] the points were generated using the Halton–Hammersley sequences, which are EQD, but it is not known whether they are WD; our sensors’ positions are WD.

  • Here, we allow correlated observations.

Let us assume that observations \(y_i\) of a scalar random field with an unknown mean \(f(x)\) are collected at spatial points \(x_i\), generated by Algorithm 1. More precisely,

$$\begin{aligned} y_i=f(x_i)+\varepsilon _i,\quad i=1,\, 2,\ldots ,\, n, \end{aligned}$$

where \(\varepsilon _i\) are random variables (r.v.’s) for which the following conditions hold:

ERR1) \(\varepsilon _i\)’s have zero mean, finite variance \(\sigma ^2\),

ERR2) for each \(i=1,\, 2,\ldots ,\, n\) the covariance \(E(\varepsilon _i\,\varepsilon _j)\) is not equal to zero only for a finite number, \(0 \le G(n) < n\) of indices \(j=1,\, 2,\ldots ,\, n\).

This assumption means that we allow correlations between observations from the \(i\)-th sensor and observations from only a finite number \(G(n)\) of its nearest neighboring sensors, with indices in a set denoted by \(\mathcal {O}(i)\). As we shall see later, we may allow \(G(n)\) to grow with \(n\), but rather slowly. This is a reasonable assumption, since the statistical dependence between sensors fades out rapidly as a function of the distance between them. Clearly, if errors are uncorrelated, we have \(\mathcal {O}(i)=\{i\}\).

Our aim is to estimate \(f\) from \((x_i,\, y_i)\), \(i=1,\, 2,\ldots ,\, n\) in a nonparametric way, i.e., without imposing any finite parametrization of \(f\). Instead, we impose some smoothness assumptions on \(f\) and our aim is to construct an estimator \(\hat{f}_n(x)\) such that the integrated mean-square error (IMSE) \(I(f,\, \hat{f}_n)\,\mathop {=}\limits ^{def}\,\int _{I_d} E(f(x)-\hat{f}_n(x))^2\, dx\,\,\rightarrow 0\) as \(n\rightarrow \infty \).

We select the following estimator \(\hat{f}_{n}\) of \(f\).

Algorithm 2

$$\begin{aligned} \hat{f}_{n}(x)=\sum _{k=1}^{N} \hat{a}_{kn}v_{k}(x), \;\; \end{aligned}$$
$$\begin{aligned} \hat{a}_{kn}=\frac{1}{n}\sum _{i=1}^{n} y_{i}v_{k}(x_{i}), \quad k=1,2,\ldots , N, \end{aligned}$$

where \(v_{1}, v_{2}, \ldots \) is a complete sequence of orthonormal functions in the space \(L_{2}(I_d)\) of square integrable functions on \(I_d\). Denote by \(a_{k}=\int _{I_d}f(x) v_{k}(x)\,dx\), \(k=1,2,\ldots \), the coefficients of the Fourier series of \(f\) in the basis \((v_{k})\), i.e.,

$$\begin{aligned} f(x) \sim \sum _{k=1}^{\infty } a_k\,v_k(x), \end{aligned}$$

where the convergence is understood in the \(L_2\) norm. One can recognize in \(\hat{f}_{n}\) the truncated version of this series. In practice, the truncation point \(N\) is also estimated. In the asymptotic considerations below, we admit a slowly growing dependence of \(N\) on the number of sensors \(n\) and we shall write \(N(n)\).

Remark 1. Notice that \(\hat{a}_{kn}\) is an asymptotically unbiased estimator of \(a_k\). Indeed, the equidistribution of the \(x_i\)’s implies

$$\begin{aligned} E(\hat{a}_{kn})=n^{-1}\,\sum _{i=1}^n f(x_i)\,v_k(x_i)\rightarrow \int _{I_d} f\,v_k=a_k . \end{aligned}$$

But, due to the WD property of the \(x_i\)’s, we can say more, namely

$$\begin{aligned} E(\hat{a}_{kn})=n^{-1}\,\sum _{i=p}^{n+p} f(x_i)\,v_k(x_i)\rightarrow a_k , \end{aligned}$$

uniformly w.r.t. \(p\). This property seems to be important for WSNs, because it frequently happens that a group of sensors fails (due to, e.g., battery or communication faults) and is replaced by differently located sensors. From this uniform convergence it follows that a WSN based on sensors placed at WD points can still provide estimators with a small bias, provided that \(n\), i.e., the number of active sensors, is sufficiently large.

Remark 2. Notice that the coefficient estimates \(\hat{a}_{kn}\) have a form that is well suited for collecting data along a network, because they can be calculated recursively. Indeed, the \(v_k(x_i)\)’s (\(k=1,\, 2,\ldots ,\, N\)) can be pre-computed and stored in the \(i\)-th sensor. Sensors can be ordered along the SFC (see Fig. 2) and the values of the partial sums, after adding \(y_i\,v_k(x_i)\) to the previous ones, can be passed from one sensor to another. The role of the sink is to assemble the estimator \(\hat{f}_n\).
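Algorithm 2 together with the accumulation scheme of Remark 2 can be sketched as follows. For self-containedness this sketch uses the orthonormal trigonometric system \(1,\ \sqrt{2}\cos (2\pi k u),\ \sqrt{2}\sin (2\pi k u)\) on \([0,1]\), a two-dimensional Weyl sequence in place of Algorithm 1's SFC points, and noiseless observations; all function names are ours:

```python
import math

def trig_basis_2d(K):
    """Tensor products of the orthonormal trig system on [0,1]:
    1, sqrt(2)cos(2*pi*k*u), sqrt(2)sin(2*pi*k*u), k = 1..K."""
    fs = [lambda u: 1.0]
    for k in range(1, K + 1):
        fs.append(lambda u, k=k: math.sqrt(2.0) * math.cos(2 * math.pi * k * u))
        fs.append(lambda u, k=k: math.sqrt(2.0) * math.sin(2 * math.pi * k * u))
    return [lambda x, y, p=p, q=q: p(x) * q(y) for p in fs for q in fs]

def fit_coefficients(samples, basis):
    """hat a_k = (1/n) sum_i y_i v_k(x_i), kept as running partial sums --
    the quantities sensors ordered along the SFC would pass hop by hop."""
    acc = [0.0] * len(basis)
    for x, y, obs in samples:
        for k, v in enumerate(basis):
            acc[k] += obs * v(x, y)
    n = len(samples)
    return [a / n for a in acc]

def f_hat(coeffs, basis, x, y):
    """The truncated-series estimator evaluated at (x, y) (the sink's job)."""
    return sum(a * v(x, y) for a, v in zip(coeffs, basis))

# demo: recover f(x, y) = 1 + cos(2*pi*x) from noiseless observations
f = lambda x, y: 1.0 + math.cos(2 * math.pi * x)
samples = [(math.fmod(i * math.sqrt(2.0), 1.0),
            math.fmod(i * math.sqrt(3.0), 1.0)) for i in range(1, 20001)]
samples = [(x, y, f(x, y)) for x, y in samples]
basis = trig_basis_2d(1)
coeffs = fit_coefficients(samples, basis)
```

Adding zero-mean noise satisfying ERR1)–ERR2) leaves the scheme itself unchanged; only the variance of the coefficient estimates grows.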

Now, our aim is to sketch the proof of the IMSE consistency of \(\hat{f}_n\). For simplicity of formulas, we assume that \(d=2\) and ONS) \(v_k(x^{(1)},x^{(2)})\) are ordered and normalized trigonometric functions of the form: \(1, \, \sin (x^{(1)})\, \sin (x^{(2)}), \, \sin (x^{(1)})\, \cos (x^{(2)}), \cos (x^{(1)})\, \sin (x^{(2)}),\) \( \cos (x^{(1)})\, \cos (x^{(2)}), \ldots \). We have \(N=N(n)\) such products in total and they are commonly bounded by \(H>0\), say, while their derivatives are commonly bounded by \(N(n)\, H\). Notice that \(N(n)\) implicitly depends on \(d\); for \(d=2\), \(N(n)=M_1(n)\, M_2(n)\), where \(M_1(n)\) and \(M_2(n)\) are the numbers of trigonometric functions in the \(x^{(1)}\) and \(x^{(2)}\) variables, respectively. Below, when we write \(N(n)\rightarrow \infty \), we require that \(M_1(n)\rightarrow \infty \) and \(M_2(n)\rightarrow \infty \).

One can weaken this assumption by allowing \(H\) to depend on \(k\), as is the case, e.g., for the Legendre polynomials.

The orthonormality of \((v_k)\) implies

$$\begin{aligned} IMSE(\hat{f}_n,f)=W_{n}+ B_{n}^{2}+R(N,f) \end{aligned}$$

where \(R(N,f)\mathop {=}\limits ^{def}\sum _{k=N+1}^{\infty }a_{k}^{2}\),

$$\begin{aligned} W_{n}\mathop {=}\limits ^{def}\sum _{k=1}^{N}\text{ Var } (\hat{a}_{kn}), \end{aligned}$$
$$\begin{aligned} B_{n}^{2}\mathop {=}\limits ^{def}\sum _{k=1}^{N}(E(\hat{a}_{kn})-a_{k})^{2}. \end{aligned}$$

To estimate the variance, let us note that

$$\begin{aligned} \text{ Var } (\hat{a}_{kn})=\frac{1}{n^2}\, E\left( \sum _{i=1}^n \varepsilon _i\,v_k(x_i)\right) ^2= \end{aligned}$$
$$ =\frac{1}{n^2}\, \sum _{i=1}^n \sum _{j=1}^n cov(\varepsilon _i,\, \varepsilon _j)\, \,v_k(x_i)\, \,v_k(x_j)= $$
$$ =\frac{1}{n^2}\, \sum _{i=1}^n \sum _{j\in \mathcal {O}(i)} cov(\varepsilon _i,\, \varepsilon _j)\, \,v_k(x_i)\, \,v_k(x_j) $$

Due to ONS) and ERR2) we have

$$\begin{aligned} \text{ Var } (\hat{a}_{kn})\le \frac{H\, G(n)\,\sigma ^2}{n}\, \left[ \frac{1}{n} \sum _{i=1}^n |v_k(x_i)|\right] \le \frac{H^2\, G(n)\,\sigma ^2}{n}. \end{aligned}$$

Thus, \(W_{n}\le \frac{N(n)\,H^2\, G(n)\,\sigma ^2}{n}\).

Denote by \(V(I_d)\) the space of functions having bounded variation (in the sense of Hardy and Krause) and let \(\mathcal {V}(g)\) denote the total variation of \(g\in V(I_d)\). Then, by the Koksma–Hlawka inequality (see, e.g., [6]) we obtain for \(f\in V(I_d)\cap C(I_d)\)

$$\begin{aligned} \left( E(\hat{a}_{kn})-a_{k}\right) ^{2} \le \left( \mathcal{V}(f\cdot v_k)\cdot D_n^* \right) ^2 \le c_1\, N^2(n)\, (D_n^*)^2, \end{aligned}$$

where \(c_1>0\) is a constant independent of \(n\), while the second inequality can be obtained in a way similar to the one used in [12]. Thus, \(B_n^2\le c_1\, N^3(n)\, (D_n^*)^2\) and, finally, we obtain

$$\begin{aligned} IMSE(\hat{f}_n,f)\le c_1\, N^3(n)\, (D_n^*)^2 + c_2\,N(n)/n +R(N,f). \end{aligned}$$

Notice that \(f\in V(I_d)\cap C(I_d)\subset L^2(I_d)\), which implies \(R(N,f)\rightarrow 0\) as \({N(n)\rightarrow \infty }\).

Proposition: If \(f\in V(I_d)\cap C(I_d)\) and the sequence \(N(n)\) is selected in such a way that for \(n\rightarrow \infty \)

$$\begin{aligned} N(n)\rightarrow \infty ,\quad N^3(n)\, (D_n^*)^2 \rightarrow 0, \quad N(n)/n \rightarrow 0, \end{aligned}$$

then \(IMSE(\hat{f}_n,f)\rightarrow 0\).

In the case of bivariate random fields, a particular choice of \(N(n)=M_1(n)\, M_2(n)\) depends on the rate of decay of \(D_n^*\), which is typically of order \(O\left( \frac{1}{n^{1-\epsilon }}\right) \), where \(\epsilon >0\) is arbitrarily small. Thus, selecting \(M_1(n)=M_2(n)=c\,n^{\beta /2}\), \(c>0\), \(0< \beta < 2\,(1-\epsilon )/3\), we can ensure that the conditions of the Proposition hold for \(N(n)=M_1(n)\,M_2(n)\).