Model fitting
The reduced form of model (5) based on inversion (8) permits the following equation to be derived (see Footnote 2):
$$\begin{aligned} y_P =\rho W_{PP}y_P+X_P\beta +\varepsilon _P +A_{PC}\tilde{\varXi }^{-1}\left[ A_{CP}A_{PP}^{-1}\,(X_P\beta +\varepsilon _P) -(X_C\beta +\varepsilon _C)\right] .\nonumber \\ \end{aligned}$$
(10)
The left-hand side of Eq. (10), together with the first three terms of the right-hand side, exactly describes a SAR amongst correctly geo-referenced points, sharing the same parameters as the complete model (1). Unfortunately, the last term on the right-hand side makes things more complicated.
The fourth term on the right-hand side of Eq. (10) proves that, in general, a subset of observations of a SAR does not itself follow a SAR. It also makes the estimation of a SAR with coarsened points particularly tricky, since Eq. (10) includes blocks of matrix A which depend on the (unknown) coordinates of the coarsened points.
As previously stated, the estimation strategy proposed in this paper relies on a double marginalisation of the likelihood function of the SAR (1). In particular, the former marginalisation should be made with respect to \(y_P\), thus concentrating the information about coarsened points into a lower dimensional space. A similar approach to the marginalisation of the SAR has already proved to be successful in the context of variance estimation in 2-dimensional systematic sampling (see Espa et al. 2017). The latter marginalisation should instead be made with respect to the point process of non-coarsened points \(Z_P\), so as to include direct and indirect effects of positional errors in the (marginal) probability distribution of \(y_P\).
The first marginalisation can be derived in closed form from the reduced form of model (1) based on inversion (9), and equals
$$\begin{aligned} y_P =\varXi ^{-1}X_P\beta +\varXi ^{-1}\varepsilon _P -\varXi ^{-1}A_{PC}A_{CC}^{-1}(X_C\beta +\varepsilon _C) , \end{aligned}$$
which implies that:
$$\begin{aligned} {{\,\mathrm{\mathbb {E}}\,}}(y_P|Z,\varPhi )&=\varXi ^{-1}X_P\beta +\rho \,\varXi ^{-1}W_{PC}A_{CC}^{-1}X_C\beta , \end{aligned}$$
(11a)
$$\begin{aligned} {{\,\mathrm{cov}\,}}(y_P|Z,\varPhi )&=\sigma ^2\,\varXi ^{-1}(I_p+\rho ^2\,W_{PC}(A_{CC}^{\mathrm{T}}A_{CC})^{-1}W_{PC}^{\mathrm{T}})(\varXi ^{-1})^{\mathrm{T}}, \end{aligned}$$
(11b)
so that the log-likelihood function \(\ln {{\,\mathrm{\mathcal {L}}\,}}(\rho ,\beta ,\sigma ^2|y_P,X,Z,\varPhi )\) of the model (1) marginalised with respect to \(y_P\) equals:
$$\begin{aligned}&\ln {{\,\mathrm{\mathcal {L}}\,}}(\rho ,\beta ,\sigma ^2|y_P,X,Z,\varPhi ) =-\frac{p}{2}\,\ln (2\pi \sigma ^2)+\ln |\varXi |\nonumber \\&\quad -\frac{1}{2}\,\ln |I_p+\rho ^2\,W_{PC}(A_{CC}^{\mathrm{T}}A_{CC})^{-1}W_{PC}^{\mathrm{T}}|\nonumber \\&\quad -\frac{1}{2\sigma ^2}\,(\varXi y_P-X_P\beta -\rho W_{PC}A_{CC}^{-1}X_C\beta )^{\mathrm{T}}\cdot \nonumber \\&\quad \cdot (I_p+\rho ^2\,W_{PC}(A_{CC}^{\mathrm{T}}A_{CC})^{-1}W_{PC}^{\mathrm{T}})^{-1}\cdot \nonumber \\&\quad \cdot (\varXi y_P-X_P\beta -\rho W_{PC}A_{CC}^{-1}X_C\beta ) . \end{aligned}$$
(12)
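As an informal check on Eqs. (11a), (11b) and (12), the following Python sketch evaluates the marginal log-likelihood for a known weight matrix. It assumes that \(\varXi \) is the Schur complement \(A_{PP}-A_{PC}A_{CC}^{-1}A_{CP}\), which is what the reduced form above implies; the function name and the index lists `P` and `C` are illustrative and are not taken from the paper.

```python
import numpy as np

def marginal_loglik(rho, beta, sigma2, y_P, X, W, P, C):
    """Log-likelihood (12) of y_P for a known W; P and C index the two groups."""
    n, p = W.shape[0], len(P)
    A = np.eye(n) - rho * W
    A_PP, A_PC = A[np.ix_(P, P)], A[np.ix_(P, C)]
    A_CP, A_CC = A[np.ix_(C, P)], A[np.ix_(C, C)]
    # Assumption: Xi is the Schur complement of A_CC in A.
    Xi = A_PP - A_PC @ np.linalg.solve(A_CC, A_CP)
    B = W[np.ix_(P, C)] @ np.linalg.inv(A_CC)        # W_PC A_CC^{-1}
    V = np.eye(p) + rho**2 * B @ B.T                 # inner matrix of (11b)
    resid = Xi @ y_P - X[P] @ beta - rho * B @ (X[C] @ beta)
    _, logdet_Xi = np.linalg.slogdet(Xi)             # log |det Xi|
    _, logdet_V = np.linalg.slogdet(V)
    quad = resid @ np.linalg.solve(V, resid)
    return (-0.5 * p * np.log(2 * np.pi * sigma2) + logdet_Xi
            - 0.5 * logdet_V - quad / (2 * sigma2))
```

Under these assumptions the value should coincide with the Gaussian log-density of \(y_P\) obtained by taking the corresponding sub-vector and sub-block of the mean and covariance of the full SAR.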
The second marginalisation requires the intensity function \(\lambda \) to be estimated, so as to characterise the first-order properties of the spatial point process \(\{Z(s,\omega ):s\in S\}\) and, in turn, the probabilistic law of the spatial weight matrix W under coarsened geocoding.
The point process \(\{Z(s,\omega ):s\in S\}\) along with the coarsening process \(\{\varPhi _i:i=1, \ldots ,n\}\) defines a bivariate point pattern (Illian et al. 2008) (see Footnote 3). According to Zimmerman (2008), for any \(s\in S\), the intensity function \(\lambda :S\rightarrow {{\,\mathrm{\mathbb {R}}\,}}^+\) of the spatial point pattern affected by incomplete geocoding can be estimated as follows:
$$\begin{aligned} \hat{\lambda }(s)=\sum _{i=1}^n[\hat{\phi }(z_i)]^{-1}K_h(s-z_i), \end{aligned}$$
(13)
where \(K_h\) is some kernel function with bandwidth h, \(z_i\) is the observed location of unit i, and \(\hat{\phi }\) is an estimate of the geocoding propensity function \(\phi :S\rightarrow (0,1]\), whose reciprocal (\(1/\hat{\phi }\)) is used as the weighting criterion of the kernel estimator.
As for \(K_h\), Zimmerman (2008) uses a Gaussian kernel whose bandwidth is automatically selected by minimising the mean-square error statistic defined in Diggle (1985) through cross-validation (Berman and Diggle 1989).
In operative terms, Zimmerman (2008) estimates the intensity function \(\lambda \) through the R (R Core Team 2020) function density.ppp of package spatstat (Baddeley et al. 2015), whereas the bandwidth is computed by means of the function bw.diggle (of package spatstat as well). Monte Carlo simulations illustrated in Sect. 4 and the application to real data discussed in Sect. 5 use the same functions.
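For readers who prefer to see estimator (13) spelled out, a minimal Python sketch follows. It assumes an isotropic two-dimensional Gaussian kernel with a fixed, user-supplied bandwidth `h` and applies no edge correction, so it does not reproduce the bandwidth selection or any other feature of the spatstat functions named above. The function and argument names are illustrative.

```python
import numpy as np

def weighted_intensity(s, z, phi_hat, h):
    """Evaluate estimator (13) at locations s (m x 2) from points z (n x 2).

    phi_hat holds the estimated propensities at the observed points z_i.
    Assumptions: 2D Gaussian kernel, fixed bandwidth h, no edge correction.
    """
    s = np.atleast_2d(np.asarray(s, dtype=float))
    z = np.atleast_2d(np.asarray(z, dtype=float))
    w = 1.0 / np.asarray(phi_hat, dtype=float)            # inverse-propensity weights
    d2 = ((s[:, None, :] - z[None, :, :]) ** 2).sum(axis=2)   # squared distances, m x n
    K = np.exp(-d2 / (2 * h**2)) / (2 * np.pi * h**2)     # K_h(s - z_i)
    return K @ w
```

A point with a low estimated propensity receives a large weight, which compensates for the similar points that were lost to coarsening.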
The geocoding propensity function \(\phi \) can be estimated in various ways, according to the available information about the coarsening process. In this paper, the values of the coarsening probabilities in (4) are assumed to be such that \(p_i=\phi (z_i)\), given the coordinate \(z_i\in S\) of the unit i. It follows that:
$$\begin{aligned} \hat{\phi }(s)= \frac{\sum _{r=1}^R\sum _{i=1}^n\varPhi _i\mathbb {1}_{\{z_i\in S_r\}}\mathbb {1}_{\{s\in S_r\}}}{\sum _{r=1}^R\sum _{i=1}^n\mathbb {1}_{\{z_i\in S_r\}}\mathbb {1}_{\{s\in S_r\}}} , \end{aligned}$$
(14)
so that \(\hat{\phi }\) is constant over each region \(S_r\in \mathcal {S}\), and equals the proportion of non-coarsened points in \(S_r\).
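Estimator (14) amounts to a per-region proportion, as the following Python sketch illustrates. It assumes that each point carries a region label and an indicator equal to 1 when the point is not coarsened; both argument names are hypothetical.

```python
import numpy as np

def propensity_by_region(region, geocoded):
    """Estimator (14): share of non-coarsened points in each region S_r."""
    region = np.asarray(region)
    geocoded = np.asarray(geocoded, dtype=float)   # Phi_i, assumed 1 = not coarsened
    labels = np.unique(region).tolist()
    return {r: float(geocoded[region == r].mean()) for r in labels}

# Two points in region 0 (one coarsened), three in region 1 (one coarsened).
phi_hat = propensity_by_region([0, 0, 1, 1, 1], [1, 0, 1, 1, 0])
```

The value of \(\hat{\phi }(s)\) at any location s is then the entry of the region containing s.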
The solution we propose in this paper consists of five steps, which are summarised in Algorithm 1.
Algorithm 1
(Double-marginalisation estimation)
1. the geocoding propensity function \(\phi \) is estimated over \(\mathcal {S}\) through estimator (14);
2. the intensity function \(\lambda \) of the coarsened point process Z is estimated according to Zimmerman (2008) through estimator (13);
3. the likelihood of SAR (1) marginalised with respect to \(y_P\) is derived from (11); we denote that likelihood function by \({{\,\mathrm{\mathcal {L}}\,}}(\rho ,\beta ,\sigma ^2|y_P,X,Z,\varPhi )\);
4. the likelihood \({{\,\mathrm{\mathcal {L}}\,}}(\rho ,\beta ,\sigma ^2|y_P,X,Z,\varPhi )\) is marginalised with respect to \(Z_P\), that is:
$$\begin{aligned} {{\,\mathrm{\mathcal {L}}\,}}(\rho ,\beta ,\sigma ^2|y,X,Z_P,\varPhi ) =\int _{S^{n-p}}{{\,\mathrm{\mathcal {L}}\,}}(\rho ,\beta ,\sigma ^2|y_P,X,Z,\varPhi )\,\hat{\varrho }(z_C|z_P)\,\text {d}z_C , \end{aligned}$$
(15)
where \(\hat{\varrho }:S^{n-p}\rightarrow {{\,\mathrm{\mathbb {R}}\,}}^+\) is the conditional probability density function of \(Z_C|Z_P\) implied by the estimated intensity function \(\hat{\lambda }\);
5. the marginal likelihood \({{\,\mathrm{\mathcal {L}}\,}}(\rho ,\beta ,\sigma ^2|y,X,Z_P,\varPhi )\) is maximised with respect to \(\rho \), \(\beta \) and \(\sigma ^2\).
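Step 4 of Algorithm 1 averages the likelihood over the unknown locations of the coarsened points. A plain Monte Carlo version of that average can be sketched in Python as below. The two callables are hypothetical placeholders: `draw_zc` is assumed to sample coarsened locations from \(\hat{\varrho }\), and `loglik_given_zc` is assumed to return the log-likelihood of step 3 for one such draw. This is not the cross-entropy procedure adopted in the paper, only an illustration of the integral being approximated.

```python
import numpy as np

def mc_marginal_loglik(loglik_given_zc, draw_zc, n_draws, rng):
    """Log of the Monte Carlo average of the likelihood over draws of Z_C."""
    logs = np.array([loglik_given_zc(draw_zc(rng)) for _ in range(n_draws)])
    top = logs.max()
    # log-mean-exp: subtracting the maximum avoids underflow of exp(logs).
    return top + np.log(np.mean(np.exp(logs - top)))
```

Working on the log scale matters here because likelihood values for even moderately sized samples are typically far too small to average directly.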
As anticipated, marginalisation (15) has to be performed numerically, since no closed-form solution appears to be available. However, two issues may make the outlined method computationally infeasible.
Firstly, the high-dimensional integration space in (15) may substantially deteriorate the performance of Monte Carlo integration methods.
Secondly, the need to evaluate integral (15) at every step of the optimisation procedure dramatically exacerbates the problem outlined in the previous point.
In order to overcome both problems (and the second in particular), we rely on the cross-entropy algorithm for the optimisation of noisy functions (Rubinstein and Kroese 2004). Unlike other numerical optimisation methods such as the Expectation-Maximisation algorithm (Dempster et al. 1977; Robert and Casella 2004), at each iteration the cross-entropy algorithm simultaneously performs the marginalisation and the optimisation of the likelihood function \({{\,\mathrm{\mathcal {L}}\,}}(\rho ,\beta ,\sigma ^2|y,X,Z_P,Z_C,\varPhi )\). This leads to a substantial reduction of the computational burden required by the optimisation routine.
The Monte Carlo simulations discussed in the next section have been performed adopting the same parameters and instrumental distributions of the cross-entropy algorithm as described in Bee et al. (2017), where the method has been applied to maximum likelihood estimation of generalised linear multilevel models (the only exception is the number N of draws, as will be clarified later).
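To illustrate the general idea of cross-entropy optimisation of a noisy function, a generic Python sketch is given below. It assumes independent Gaussian instrumental distributions and an elite-sample update of their means and standard deviations; the defaults for the number of draws, the elite size and the number of iterations are arbitrary and are not the settings of Bee et al. (2017).

```python
import numpy as np

def cross_entropy_maximise(noisy_f, mean, std, n_draws=200, n_elite=20,
                           n_iter=50, seed=0):
    """Generic Gaussian cross-entropy search for the maximiser of noisy_f.

    noisy_f(x, rng) returns a (possibly noisy) evaluation of the objective.
    """
    rng = np.random.default_rng(seed)
    mean = np.asarray(mean, dtype=float).copy()
    std = np.asarray(std, dtype=float).copy()
    for _ in range(n_iter):
        draws = rng.normal(mean, std, size=(n_draws, mean.size))
        scores = np.array([noisy_f(d, rng) for d in draws])
        elite = draws[np.argsort(scores)[-n_elite:]]    # best-scoring draws
        mean, std = elite.mean(axis=0), elite.std(axis=0)
    return mean
```

In the setting of this paper, the noisy evaluation would correspond to a Monte Carlo approximation of the marginal likelihood at a candidate value of \((\rho ,\beta ,\sigma ^2)\), so that a fresh and fairly rough approximation at each iteration can suffice.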
Impact estimators
According to LeSage and Pace (2009), the effects of covariates on the dependent variable of a SAR do not solely depend on the regression coefficients \(\beta \), as the spatially-lagged dependent variable induces an indirect effect resulting from the autoregressive parameter \(\rho \) and the spatial weight matrix W. It follows that the overall impact of a regressor on the value of the dependent variable can be decomposed into a direct and an indirect impact, which, however, is not constant amongst all units. For these reasons, averages of total (\(T(\beta )\)), direct (\(D(\beta )\)), and indirect (\(M(\beta )\)) impacts are usually computed (LeSage and Pace 2009):
$$\begin{aligned} T(\beta )&=n^{-1}\,\iota _n^{\mathrm{T}}(I-\rho W)^{-1}\iota _n\beta , \end{aligned}$$
(16a)
$$\begin{aligned} D(\beta )&=n^{-1}\,{{\,\mathrm{tr}\,}}(I-\rho W)^{-1}\beta , \end{aligned}$$
(16b)
$$\begin{aligned} M(\beta )&=T(\beta )-D(\beta ) . \end{aligned}$$
(16c)
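The averages (16a) to (16c) can be computed directly when W is known, as in the following Python sketch (the function name is illustrative, and `beta` is meant as the coefficient of a single regressor).

```python
import numpy as np

def average_impacts(rho, W, beta):
    """Average total, direct and indirect impacts (16a)-(16c) for a known W."""
    n = W.shape[0]
    A_inv = np.linalg.inv(np.eye(n) - rho * W)
    total = A_inv.sum() / n * beta          # n^{-1} iota' (I - rho W)^{-1} iota beta
    direct = np.trace(A_inv) / n * beta     # n^{-1} tr (I - rho W)^{-1} beta
    return total, direct, total - direct
```

When W is row-standardised, every row of \((I-\rho W)^{-1}\) sums to \(1/(1-\rho )\), so the average total impact reduces to \(\beta /(1-\rho )\); this gives a quick sanity check on the output.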
According to the model we have described in Sect. 2, some elements of the spatial weight matrix W are uncertain when geocoding is not complete. It follows that impacts should be estimated via Monte Carlo simulations, where the weight matrices are defined according to the realisations of a point process Z with estimated intensity function \(\hat{\lambda }\). Thus, the Monte Carlo estimators of the impact measures (16) can be defined as follows:
$$\begin{aligned}&\widehat{(A^{-1})}=\frac{1}{N}\sum _{k=1}^N(I-\hat{\rho } W_k)^{-1} ,&\hat{T}(\hat{\beta })=n^{-1}\,\iota _n^{\mathrm{T}}\widehat{(A^{-1})}\iota _n\hat{\beta } ,\\&\hat{D}(\hat{\beta })=n^{-1}\,{{\,\mathrm{tr}\,}}\widehat{(A^{-1})}\hat{\beta } ,&\hat{M}(\hat{\beta })=\hat{T}(\hat{\beta })-\hat{D}(\hat{\beta }) . \end{aligned}$$
Since Monte Carlo estimation of matrix \(\widehat{(A^{-1})}\) may be computationally demanding, because of the inversions of the matrices \(I-\hat{\rho } W_k\), a truncated geometric series expansion of \((I-\hat{\rho } W_k)^{-1}\) may substantially reduce the computational burden of the simulation:
$$\begin{aligned} \widehat{(A^{-1})}=\frac{1}{N}\sum _{k=1}^N\sum _{h=0}^m\hat{\rho }^hW_k^h , \end{aligned}$$
where m represents the truncation order.
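A minimal Python sketch of the truncated expansion is given below. The list `W_draws` stands for simulated weight matrices \(W_k\), whose construction from the simulated point patterns is not shown; both function names are illustrative. The expansion converges when the spectral radius of \(\hat{\rho }W_k\) is below one, which holds, for instance, for row-standardised weights with \(|\hat{\rho }|<1\).

```python
import numpy as np

def truncated_inverse(rho, W, m):
    """Sum_{h=0}^{m} rho^h W^h, a truncated expansion of (I - rho W)^{-1}."""
    n = W.shape[0]
    term = np.eye(n)              # current term rho^h W^h, starting at h = 0
    total = np.eye(n)
    for _ in range(m):
        term = rho * (term @ W)   # next power, one matrix product per step
        total = total + term
    return total

def mc_mean_inverse(rho, W_draws, m):
    """Average of the truncated expansions over the simulated matrices W_k."""
    return sum(truncated_inverse(rho, W_k, m) for W_k in W_draws) / len(W_draws)
```

Each term costs one matrix product, so the saving over exact inversion depends on how small m can be kept for the required accuracy and on whether the \(W_k\) are sparse.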
Asymptotics and generalisations
As stated in the introduction, this paper aims at proposing an estimation method for spatial models à la Cliff–Ord (Cliff and Ord 1969), where a portion of the data is affected by coarsening; thus, the primary interest lies in the parameters of the model, as well as in other measures of the covariates' effects (e.g. direct, indirect and total impacts, which are discussed in Sect. 3.2).
The double marginalisation performed in Algorithm 1 derives from the following marginalisation of the probability density function of the marked point process:
$$\begin{aligned} f_{Y_P\vert Z_P}(y_P\vert z_P) =\int \left[ \int f_{Y_P,Y_C\vert Z_P,Z_C}(y_P,y_C\vert z_P,z_C)\,\text {d}y_C\right] \varrho (z_C\vert z_P)\,\text {d}z_C , \end{aligned}$$
(17)
where conditioning of all probability density functions with respect to the coarsening vector \(\varPhi \) has been omitted for notational simplicity, and \(\varrho (z_C\vert z_P)=f_{Z_C\vert Z_P}(z_C\vert z_P)\) has been denoted consistently with the notation of Eq. (15). The inner integral of Eq. (17) corresponds to the first marginalisation described in step 3 of Algorithm 1, whereas the outer integral determines the marginalisation of Eq. (15).
The inner integral of Eq. (17) is a marginalisation of a model whose maximum likelihood estimators have been proved to be consistent by Lee (2004), provided that specific requirements on the asymptotic specification of the spatial weight matrix W are satisfied. Yet, the asymptotic behaviour of the double-marginal estimator also depends on both the geocoding propensity function estimator (14) and the intensity function estimator (13). Both estimators are consistent, and their asymptotic properties are discussed in Zimmerman (2008); however, this is not enough to guarantee that the double-marginal estimator is consistent too. The reason is that the spatial weight matrix is built according to both the spatial point pattern and the coarsening process, and its asymptotic behaviour is fully determined by the properties of those two processes. To our knowledge, there are currently no theoretical results which can be exploited to prove (or refute) the consistency of the double-marginal estimator.
As for the applicability of the double-marginal estimator, it is worth pointing out that it can be easily adapted or generalised to other coarsening mechanisms, point processes, or stochastic spatial processes, as it is only required that the model can be identified and marginalised. Thus, if a spatial model other than the SAR is considered, Algorithm 1 changes in step 3, where the likelihood function of the model is marginalised with respect to non-coarsened units, whereas the rest of the algorithm does not change.
A special case is represented by the spatial Durbin model (SDM), which generalises the SAR model by including some (or all) spatially lagged covariates amongst the regressors. In this case, both Algorithm 1 and likelihood function (12) are valid without modifications, provided that the design matrix X is properly redefined so as to include extra covariates.
The family of Cliff–Ord spatial models includes further specifications, which incorporate other forms of spatial dependence or allow for other specifications of the covariate effects. An extensive review of the existing Cliff–Ord spatial models can be found in Cressie (2015), Anselin (1988), and LeSage and Pace (2009). Here it is worth recalling the general nesting spatial model (GNSM) defined in Elhorst (2014):
$$\begin{aligned} {\left\{ \begin{array}{ll} y=\rho Wy+\alpha \iota _n+X\beta +WX\theta +u\\ u=\lambda Wu+\varepsilon \\ \varepsilon \sim {{\,\mathrm{\mathcal {N}}\,}}_n(0,\sigma ^2I_n) \end{array}\right. } \end{aligned}$$
(18)
Although the GNSM (18) is not identifiable, it deserves consideration, as it includes the main Cliff–Ord spatial models as special cases when one or more restrictions are applied to its parameters. For example, the SDM is obtained when \(\lambda =0\), whereas the SAR model (1) results if \(\lambda =0\) and \(\theta =0\) (the constant vector \(\iota _n\) can be included into the design matrix X).
The full log-likelihood of the GNSM (18) can be proved to be:
$$\begin{aligned}&{{\,\mathrm{\ln {\mathcal {L}}}\,}}(\rho ,\lambda ,\alpha ,\beta ,\theta ,\sigma ^2) =-\frac{n}{2}\ln (2\pi \sigma ^2)+\ln |A_\rho |+\ln |A_\lambda |\nonumber \\&\quad -\frac{[A_\lambda (A_\rho y-\alpha \iota _n-X\beta -WX\theta )]^{\mathrm{T}}[A_\lambda (A_\rho y-\alpha \iota _n-X\beta -WX\theta )]}{2\sigma ^2} , \end{aligned}$$
(19)
where \(A_\rho =I_n-\rho W\) and \(A_\lambda =I_n-\lambda W\).
The likelihood of models nested in GNSM can be derived from (19), whereas the first analytical marginalisation can be derived through inversions (8) and (9). Once the first marginalisation has been derived, Algorithm 1 can be applied just as illustrated above.
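For reference, the full log-likelihood (19) can be evaluated as in the following Python sketch, which assumes a known weight matrix W and complete data (no coarsening). The function name is illustrative, and the determinants are taken in absolute value through `numpy.linalg.slogdet`.

```python
import numpy as np

def gnsm_loglik(rho, lam, alpha, beta, theta, sigma2, y, X, W):
    """Full log-likelihood (19) of the GNSM (18) for a known W."""
    n = W.shape[0]
    A_rho = np.eye(n) - rho * W
    A_lam = np.eye(n) - lam * W
    # alpha is a scalar, so subtracting it corresponds to alpha * iota_n.
    resid = A_lam @ (A_rho @ y - alpha - X @ beta - W @ X @ theta)
    _, logdet_rho = np.linalg.slogdet(A_rho)
    _, logdet_lam = np.linalg.slogdet(A_lam)
    return (-0.5 * n * np.log(2 * np.pi * sigma2) + logdet_rho + logdet_lam
            - resid @ resid / (2 * sigma2))
```

Setting `lam=0` gives the SDM, and additionally setting `theta` to zero gives the SAR, in line with the restrictions discussed above.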