Let us consider a target population partitioned into m small areas. In our application, estimates of confidence in policing will be produced for London wards, so m = 610. The traditional EBLUP derived from the Fay–Herriot (FH) model (Fay and Herriot 1979) assumes a linking model that linearly relates the quantity of inferential interest δi, usually an area mean or total (here, the proportion of citizens who think that the police do a good job), to p area-level auxiliary variables xi = (xi1, …, xip)′ through a random area effect vi:
$$ {\delta}_i={\boldsymbol{x}}_i^{\prime}\boldsymbol{\beta} +{v}_i,\kern0.5em i=1,\dots, m, \tag{1} $$
where β is the p × 1 vector of regression parameters and \( {v}_i\sim iid\left(0,{\sigma}_u^2\right) \). In our case, δi represents confidence in police work and xi contains covariates known to be associated with confidence in policing (e.g. unemployment, concentration of minorities, poverty). The model further assumes that a design-unbiased direct estimate yi of δi, obtained from the observed sample, is available for each area i = 1, …, m:
$$ {y}_i={\delta}_i+{e}_i,\kern0.5em i=1,\dots, m, \tag{2} $$
where the sampling errors ei ∼ N(0, ψi) are independent of vi, and ψi is the sampling variance of the direct estimate yi, which is assumed to be known (Rao and Molina 2015).
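To make the two levels of (1) and (2) concrete, the following Python sketch simulates data from the FH model and computes the FH BLUP for a known \( {\sigma}_u^2 \). This predictor is (7) below with ρ = 0: each direct estimate yi is shrunk towards a generalised least squares (GLS) synthetic estimate with weight γi = σu² / (σu² + ψi). The covariate design, β, σu² = 1 and the range of ψi are hypothetical values chosen for illustration, not quantities from our London data.

```python
import numpy as np


def simulate_fh(m, beta, sigma2_u, psi, rng):
    """Draw one data set from the FH model: linking model (1) and sampling model (2)."""
    p = len(beta)
    # Hypothetical design: an intercept plus p - 1 standard normal covariates.
    X = np.column_stack([np.ones(m), rng.standard_normal((m, p - 1))])
    v = rng.normal(0.0, np.sqrt(sigma2_u), m)   # area effects v_i ~ N(0, sigma2_u)
    delta = X @ beta + v                         # linking model (1)
    e = rng.normal(0.0, np.sqrt(psi))            # sampling errors e_i ~ N(0, psi_i)
    y = delta + e                                # direct estimates, model (2)
    return X, delta, y


def fh_blup(X, y, sigma2_u, psi):
    """FH BLUP for known sigma2_u: shrink y_i towards the GLS synthetic estimate."""
    w = 1.0 / (sigma2_u + psi)                   # Var(y_i) = sigma2_u + psi_i (diagonal)
    beta_gls = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
    gamma = sigma2_u / (sigma2_u + psi)          # shrinkage factors, all in (0, 1)
    return gamma * y + (1.0 - gamma) * (X @ beta_gls), gamma


rng = np.random.default_rng(2024)
m = 610                                          # number of wards, as in the application
psi = rng.uniform(0.5, 2.0, m)                   # hypothetical known sampling variances
X, delta, y = simulate_fh(m, np.array([0.5, 1.0]), 1.0, psi, rng)
blup, gamma = fh_blup(X, y, 1.0, psi)
print("MSE direct:", np.mean((y - delta) ** 2), "MSE BLUP:", np.mean((blup - delta) ** 2))
```

Areas with a large ψi (imprecise direct estimates) receive a small γi and therefore lean more heavily on the regression component.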
The spatial EBLUP (SEBLUP) borrows strength from neighboring areas by allowing the random area effects to be spatially correlated (Petrucci and Salvati 2006; Salvati 2004). Combining (1) with (2) gives the following model:
$$ \boldsymbol{y}=\boldsymbol{X}\boldsymbol{\beta } +\boldsymbol{v}+\boldsymbol{e}, \tag{3} $$
where y = (y1, …, ym)′ is the vector of direct estimates of confidence in policing for the m areas, X = (x1, …, xm)′ is the m × p matrix of covariates associated with the outcome measure, v = (v1, …, vm)′ is the vector of area effects and e = (e1, …, em)′ is the vector of sampling errors, independent of v. We assume that v follows a simultaneous autoregressive (SAR) process with unknown autoregression parameter ρ ∈ (−1, 1) and contiguity matrix W (Cressie 1993):
$$ \boldsymbol{v}=\rho \boldsymbol{W}\boldsymbol{v}+\boldsymbol{u}, \tag{4} $$
where ρ is the spatial autocorrelation coefficient of the outcome measure (i.e. confidence in policing) and W is a standardised m × m matrix that links each area to its neighboring areas.
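A standardised W is commonly obtained from a binary contiguity matrix by dividing each row by the number of neighbours of that area. The Python sketch below assumes this row standardisation and a hypothetical 2 × 2 grid of areas with rook contiguity; for real wards the contiguity matrix would be derived from boundary data instead.

```python
import numpy as np


def row_standardise(C):
    """Turn a binary contiguity matrix C (assumed input) into a row-standardised W."""
    C = np.asarray(C, dtype=float)
    if np.any(np.diag(C) != 0):
        raise ValueError("an area must not be its own neighbour")
    row_sums = C.sum(axis=1)
    if np.any(row_sums == 0):
        raise ValueError("every area needs at least one neighbour (no islands)")
    return C / row_sums[:, None]                 # each row now sums to one


# Hypothetical 2 x 2 grid of areas, rook contiguity:
#   area 0 - area 1
#     |        |
#   area 2 - area 3
C = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]])
W = row_standardise(C)
print(W[0])  # each of area 0's two neighbours gets weight 1/2
```

With row standardisation, every eigenvalue of W has modulus at most 1, so any ρ ∈ (−1, 1) keeps (Im − ρW) non-singular.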
We also assume that (Im − ρW) is non-singular, where Im is the m × m identity matrix, so that (4) can be written as
$$ \boldsymbol{v}={\left({\boldsymbol{I}}_m-\rho \boldsymbol{W}\right)}^{-1}\boldsymbol{u}, \tag{5} $$
where u = (u1, …, um)′ satisfies \( \boldsymbol{u}\sim N\left({\mathbf{0}}_m,{\sigma}_u^2{\boldsymbol{I}}_m\right) \). Thus,
$$ \boldsymbol{y}=\boldsymbol{X}\boldsymbol{\beta } +{\left({\boldsymbol{I}}_m-\rho \boldsymbol{W}\right)}^{-1}\boldsymbol{u}+\boldsymbol{e}. \tag{6} $$
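The SAR process in (4) and (5) can be simulated by solving (Im − ρW)v = u rather than forming the inverse explicitly. In this Python sketch the neighbourhood (20 areas on a ring, each bordering the previous and next one), ρ = 0.7 and σu² = 1 are hypothetical choices; the final line checks that the simulated v satisfies (4).

```python
import numpy as np


def simulate_sar_effects(W, rho, sigma2_u, rng):
    """Draw v = (I_m - rho W)^{-1} u with u ~ N(0, sigma2_u I_m), as in (5)."""
    m = W.shape[0]
    A = np.eye(m) - rho * W
    # (I_m - rho W) must be non-singular; flag numerically near-singular cases.
    if np.linalg.cond(A) > 1e12:
        raise np.linalg.LinAlgError("I_m - rho W is numerically singular")
    u = rng.normal(0.0, np.sqrt(sigma2_u), m)
    v = np.linalg.solve(A, u)   # solves (I_m - rho W) v = u without forming the inverse
    return v, u


# Hypothetical neighbourhood: 20 areas on a ring, row-standardised.
m = 20
idx = np.arange(m)
C = np.zeros((m, m))
C[idx, (idx + 1) % m] = 1.0
C[idx, (idx - 1) % m] = 1.0
W = C / C.sum(axis=1, keepdims=True)

rng = np.random.default_rng(1)
rho = 0.7
v, u = simulate_sar_effects(W, rho, sigma2_u=1.0, rng=rng)
# v should satisfy the SAR equation (4): v = rho W v + u (up to rounding error).
print(np.max(np.abs(v - (rho * W @ v + u))))
```

Solving the linear system is both cheaper and numerically safer than computing \( {\left({\boldsymbol{I}}_m-\rho \boldsymbol{W}\right)}^{-1} \), which matters when m is in the hundreds, as with London wards.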
The vector of variance components is denoted by \( \boldsymbol{\theta} ={\left({\theta}_1,{\theta}_2\right)}^{\prime }=\left({\sigma}_u^2,\rho \right)^{\prime } \). The Spatial Best Linear Unbiased Predictor (SBLUP) of \( {\delta}_i={\boldsymbol{x}}_i^{\prime}\boldsymbol{\beta} +{v}_i \) is then given by
$$ {\overset{\sim }{\delta}}_i^{SBLUP}\left(\boldsymbol{\theta} \right)={\boldsymbol{x}}_i^{\prime}\overset{\sim }{\boldsymbol{\beta}}\left(\boldsymbol{\theta} \right)+{\boldsymbol{b}}_i^{\prime}\boldsymbol{G}\left(\boldsymbol{\theta} \right){\boldsymbol{\varSigma}}^{-\mathbf{1}}\left(\boldsymbol{\theta} \right)\left\{\boldsymbol{y}-\boldsymbol{X}\overset{\sim }{\boldsymbol{\beta}}\left(\boldsymbol{\theta} \right)\right\}, \tag{7} $$
where \( {\boldsymbol{b}}_i^{\prime } \) is the 1 × m vector (0, …, 0, 1, 0, …, 0) with 1 in position i. The covariance matrix of v is \( \boldsymbol{G}\left(\boldsymbol{\theta} \right)={\sigma}_u^2{\left\{{\left({\boldsymbol{I}}_m-\rho \boldsymbol{W}\right)}^{\prime}\left({\boldsymbol{I}}_m-\rho \boldsymbol{W}\right)\right\}}^{-\mathbf{1}} \), and the covariance matrix of y is Σ(θ) = G(θ) + Ψ, where Ψ = diag(ψ1, …, ψm). Finally, \( \overset{\sim }{\boldsymbol{\beta}}\left(\boldsymbol{\theta} \right) \) is the generalised least squares (GLS) estimator of β, \( \overset{\sim }{\boldsymbol{\beta}}\left(\boldsymbol{\theta} \right)={\left\{{\boldsymbol{X}}^{\prime }{\boldsymbol{\varSigma}}^{-\mathbf{1}}\left(\boldsymbol{\theta} \right)\boldsymbol{X}\right\}}^{-\mathbf{1}}{\boldsymbol{X}}^{\prime }{\boldsymbol{\varSigma}}^{-\mathbf{1}}\left(\boldsymbol{\theta} \right)\boldsymbol{y} \).
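The SBLUP in (7) can be computed for all areas at once, because stacking \( {\boldsymbol{b}}_i^{\prime}\boldsymbol{G}{\boldsymbol{\varSigma}}^{-1}\left(\boldsymbol{y}-\boldsymbol{X}\overset{\sim }{\boldsymbol{\beta}}\right) \) over i = 1, …, m simply gives \( \boldsymbol{G}{\boldsymbol{\varSigma}}^{-1}\left(\boldsymbol{y}-\boldsymbol{X}\overset{\sim }{\boldsymbol{\beta}}\right) \). The Python sketch below assumes θ is known and uses dense linear algebra; the ring neighbourhood and the simulated data are hypothetical.

```python
import numpy as np


def sblup(y, X, W, psi, sigma2_u, rho):
    """Spatial BLUP (7) of delta_i for all areas, for known theta = (sigma2_u, rho)."""
    m = len(y)
    A = np.eye(m) - rho * W                          # I_m - rho W
    G = sigma2_u * np.linalg.inv(A.T @ A)            # G(theta), covariance of v
    Sigma = G + np.diag(psi)                         # Sigma(theta) = G(theta) + Psi
    Si_X = np.linalg.solve(Sigma, X)                 # Sigma^{-1} X
    beta = np.linalg.solve(X.T @ Si_X, Si_X.T @ y)   # GLS estimator beta~(theta)
    # Row i of G Sigma^{-1} (y - X beta) is b_i' G Sigma^{-1} (y - X beta).
    v_tilde = G @ np.linalg.solve(Sigma, y - X @ beta)
    return X @ beta + v_tilde, beta


# Hypothetical toy data: 30 areas on a ring, each bordering the previous and next one.
rng = np.random.default_rng(7)
m = 30
idx = np.arange(m)
C = np.zeros((m, m))
C[idx, (idx + 1) % m] = 1.0
C[idx, (idx - 1) % m] = 1.0
W = C / C.sum(axis=1, keepdims=True)
X = np.column_stack([np.ones(m), rng.standard_normal(m)])
psi = rng.uniform(0.5, 1.5, m)
v = np.linalg.solve(np.eye(m) - 0.6 * W, rng.standard_normal(m))  # SAR effects, (5)
y = X @ np.array([1.0, 0.5]) + v + rng.normal(0.0, np.sqrt(psi))  # model (6)

pred, beta = sblup(y, X, W, psi, sigma2_u=1.0, rho=0.6)
```

A useful sanity check: with rho=0.0 the function reproduces the FH shrinkage predictor γi yi + (1 − γi) xi′β̃ with γi = σu² / (σu² + ψi).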
The SEBLUP is obtained by replacing θ in (7) with a consistent estimator \( \hat{\boldsymbol{\theta}}=\left({\hat{\sigma}}_u^2,\hat{\rho}\right)^{\prime } \):
$$ {\hat{\delta}}_i^{SEBLUP}={\overset{\sim }{\delta}}_i^{SBLUP}\left(\hat{\boldsymbol{\theta}}\right)={\boldsymbol{x}}_i^{\prime}\overset{\sim }{\boldsymbol{\beta}}\left(\hat{\boldsymbol{\theta}}\right)+{\boldsymbol{b}}_i^{\prime}\boldsymbol{G}\left(\hat{\boldsymbol{\theta}}\right){\boldsymbol{\varSigma}}^{-\mathbf{1}}\left(\hat{\boldsymbol{\theta}}\right)\left\{\boldsymbol{y}-\boldsymbol{X}\overset{\sim }{\boldsymbol{\beta}}\left(\hat{\boldsymbol{\theta}}\right)\right\}. \tag{8} $$
Assuming that the random effects are normally distributed, \( {\sigma}_u^2 \) and ρ can be estimated by several procedures. In this research, we use the Restricted Maximum Likelihood (REML) estimator, which accounts for the loss in degrees of freedom resulting from the estimation of β, whereas other estimators, such as the Maximum Likelihood estimator, do not (Rao and Molina 2015). The normality assumption is reasonable when area-level direct estimates are normally distributed, as tends to be the case in criminological studies of confidence in police work (Williams et al. 2019), emotions about crime (Whitworth 2012) and rates of some crime types at large spatial scales (Fay and Diallo 2012). However, the assumption may not hold when the direct estimates are not normally distributed. This may be the case in studies of specific crime types at detailed spatial scales, whose estimates may show zero-inflated, skewed distributions, so that robust SAE techniques adjusted to non-normal distributions are needed (Dreassi et al. 2014).
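A minimal sketch of REML estimation of θ, followed by the plug-in SEBLUP (8), is given below. It minimises the negative restricted log-likelihood, which up to a constant is \( \tfrac{1}{2}\left\{\log|\boldsymbol{\varSigma}| + \log|\boldsymbol{X}^{\prime}\boldsymbol{\varSigma}^{-1}\boldsymbol{X}| + \boldsymbol{y}^{\prime}\boldsymbol{P}\boldsymbol{y}\right\} \) with \( \boldsymbol{P}=\boldsymbol{\varSigma}^{-1}-\boldsymbol{\varSigma}^{-1}\boldsymbol{X}{\left(\boldsymbol{X}^{\prime}\boldsymbol{\varSigma}^{-1}\boldsymbol{X}\right)}^{-1}\boldsymbol{X}^{\prime}\boldsymbol{\varSigma}^{-1} \), using SciPy's general-purpose L-BFGS-B optimiser rather than a dedicated scoring algorithm. The bounds on ρ (±0.99), the starting values and the simulated data are assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import minimize


def covariances(theta, W, psi):
    """Return G(theta) and Sigma(theta) = G(theta) + Psi for theta = (sigma2_u, rho)."""
    sigma2_u, rho = theta
    A = np.eye(len(psi)) - rho * W
    G = sigma2_u * np.linalg.inv(A.T @ A)
    return G, G + np.diag(psi)


def neg_reml(theta, y, X, W, psi):
    """Negative restricted log-likelihood of the spatial FH model, up to a constant."""
    _, Sigma = covariances(theta, W, psi)
    Si = np.linalg.inv(Sigma)
    XtSiX = X.T @ Si @ X
    P = Si - Si @ X @ np.linalg.solve(XtSiX, X.T @ Si)
    logdet_sigma = np.linalg.slogdet(Sigma)[1]
    logdet_xtsix = np.linalg.slogdet(XtSiX)[1]
    return 0.5 * (logdet_sigma + logdet_xtsix + y @ P @ y)


def seblup(y, X, W, psi, theta0=(1.0, 0.0)):
    """REML estimate of theta, then the plug-in predictor (8)."""
    res = minimize(neg_reml, x0=np.asarray(theta0, dtype=float),
                   args=(y, X, W, psi), method="L-BFGS-B",
                   bounds=[(1e-6, None), (-0.99, 0.99)])   # sigma2_u > 0, |rho| < 1
    G, Sigma = covariances(res.x, W, psi)
    Si_X = np.linalg.solve(Sigma, X)
    beta = np.linalg.solve(X.T @ Si_X, Si_X.T @ y)          # GLS at theta_hat
    pred = X @ beta + G @ np.linalg.solve(Sigma, y - X @ beta)
    return pred, res.x, res


# Hypothetical toy data: ring of 50 areas, true rho = 0.6, true sigma2_u = 1.
rng = np.random.default_rng(11)
m = 50
idx = np.arange(m)
C = np.zeros((m, m))
C[idx, (idx + 1) % m] = 1.0
C[idx, (idx - 1) % m] = 1.0
W = C / C.sum(axis=1, keepdims=True)
X = np.column_stack([np.ones(m), rng.standard_normal(m)])
psi = rng.uniform(0.3, 1.0, m)
v = np.linalg.solve(np.eye(m) - 0.6 * W, rng.standard_normal(m))
y = X @ np.array([1.0, 0.5]) + v + rng.normal(0.0, np.sqrt(psi))

pred, theta_hat, res = seblup(y, X, W, psi)
print("sigma2_u_hat, rho_hat:", theta_hat)
```

With only 50 areas the estimate of ρ can be quite variable from one simulated data set to another, which is one reason the number of areas matters for the SEBLUP's performance.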
Previous Studies Using the SEBLUP
The SEBLUP has not yet been used to estimate crime rates or confidence in the police. However, a series of simulation studies and applications analysing economic and agricultural outcomes have shown that the SEBLUP tends to outperform the EBLUP when ρ moves away from zero, especially when it is close to −1 or 1 (Chandra et al. 2007; Petrucci and Salvati 2006; Pratesi and Salvati 2008). Very few simulation studies investigate the impact of m, and of the interaction between m and ρ, on the SEBLUP's performance, and these show contradictory results. Salvati (2004) examined the precision of SEBLUP estimates for m = 25 and 50 and ρ ∈ {±0.25, ±0.5, ±0.75}, and concluded that the improvement in accuracy grows as the spatial autoregressive coefficient increases, but also that "benefit is bigger as the number of small areas increase" (Salvati 2004:11). In policing research, the SEBLUP is thus expected to produce more reliable estimates than the EBLUP when the values of the variable of interest cluster geographically, as observed in many studies on crime and crime perceptions (Baller et al. 2001; Williams et al. 2019), and when the number of areas for which we aim to produce estimates is large. Therefore, in cases like the one encountered by Gemmell et al. (2004), who produced estimates of drug use for ten local authorities in Greater Manchester, the EBLUP is expected to produce better estimates than the SEBLUP because of the small number of areas under study.
Asfar and Sadik (2016) analysed the SEBLUP's relative mean squared errors for m = 16, 64 and 144, and found large relative improvements of SEBLUP estimates even when ρ is very small (ρ = 0.05) or small (ρ = 0.25), and even with very few areas (m = 16). Moreover, the improvement was sometimes larger for m = 16 than for m = 64 and 144. These results are inconsistent with other simulation studies, which show that the SEBLUP's relative performance improves as the number of areas increases (Salvati 2004), and that the SEBLUP does not improve precision when ρ ≅ 0 for m = 25 and 50 (Salvati 2004), 61 (Petrucci and Salvati 2006), 23 (Chandra et al. 2007) and 42 (Pratesi and Salvati 2008). Further research is therefore needed to understand how both ρ and m affect the SEBLUP's relative precision, and we assess the performance of the SEBLUP in Section 5.
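The design of such simulation studies can be illustrated with a small Monte Carlo sketch that reports the ratio MSE(spatial predictor) / MSE(FH predictor) over a grid of m and ρ. To keep it short, it compares BLUPs computed with the true parameters rather than EBLUPs with REML estimates, so it ignores the extra error from estimating θ, which is precisely what drives the small-m effects discussed above. The ring neighbourhood, the parameter values and the choice to give the FH predictor the average marginal variance of vi are hypothetical design decisions, and the printed ratios are not a replication of any of the cited studies.

```python
import numpy as np


def ring_W(m):
    """Hypothetical row-standardised W: m areas on a ring, two neighbours each."""
    idx = np.arange(m)
    C = np.zeros((m, m))
    C[idx, (idx + 1) % m] = 1.0
    C[idx, (idx - 1) % m] = 1.0
    return C / C.sum(axis=1, keepdims=True)


def blup(y, X, W, psi, sigma2_u, rho):
    """BLUP (7) for known (sigma2_u, rho); rho = 0 gives the FH BLUP."""
    A = np.eye(len(y)) - rho * W
    G = sigma2_u * np.linalg.inv(A.T @ A)
    Sigma = G + np.diag(psi)
    Si_X = np.linalg.solve(Sigma, X)
    beta = np.linalg.solve(X.T @ Si_X, Si_X.T @ y)
    return X @ beta + G @ np.linalg.solve(Sigma, y - X @ beta)


def relative_mse(m, rho, n_rep=200, sigma2_u=1.0, seed=0):
    """Empirical MSE(spatial BLUP) / MSE(FH BLUP) over n_rep replications."""
    rng = np.random.default_rng(seed)
    W = ring_W(m)
    A = np.eye(m) - rho * W
    # FH predictor uses the average marginal variance of v_i under the SAR process.
    sigma2_fh = np.mean(np.diag(sigma2_u * np.linalg.inv(A.T @ A)))
    psi = rng.uniform(0.5, 1.5, m)                       # fixed across replications
    X = np.column_stack([np.ones(m), rng.standard_normal(m)])
    beta = np.array([1.0, 0.5])
    se_sp = se_fh = 0.0
    for _ in range(n_rep):
        v = np.linalg.solve(A, rng.normal(0.0, np.sqrt(sigma2_u), m))
        delta = X @ beta + v
        y = delta + rng.normal(0.0, np.sqrt(psi))
        se_sp += np.sum((blup(y, X, W, psi, sigma2_u, rho) - delta) ** 2)
        se_fh += np.sum((blup(y, X, W, psi, sigma2_fh, 0.0) - delta) ** 2)
    return se_sp / se_fh


for m in (16, 64):
    for rho in (0.0, 0.25, 0.75):
        print(m, rho, round(relative_mse(m, rho), 3))
```

Ratios below 1 favour the spatial predictor; at ρ = 0 the two predictors coincide, so the ratio is exactly 1. Studying the interaction between m and ρ reported in the literature would require replacing both predictors with their REML-based EBLUP and SEBLUP versions.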