Abstract
We explore how effectively spatial lag models can be estimated in the presence of missing observations. Spatial lag models are used to measure interdependence among observations of the dependent variable. If there are no missing data, the spatial autocorrelation process is easy to interpret. Empirical studies, however, sometimes use very sparsely sampled data, for which we observe only a small part of a population containing possible mutual dependencies. Simulation studies based on artificial data confirm a relation between the sampling rate and the selection ratio of spatial and non-spatial models. Our findings include the following: (1) Negative spatial autocorrelation of the data-generating process (DGP) may not be observed. (2) Positive spatial autocorrelation of the DGP may be observed, but it is downward-biased. (3) We obtain less-biased estimates if we use a non-row-standardized weight matrix. (4) Non-spatial models tend to be selected in preference to the correct model, the spatial lag model. (5) Estimates of regression coefficients remain almost unbiased.
Notes
The edge (or border) effect described in Anselin (1988, pp. 175–176) has a close relation with the missing data problem, in that it affects the expected value of the disturbance term and introduces heteroscedasticity.
In the current study, every data unit is observable with probability \(\gamma \). The counterpart of this sampling rate in Arbia et al. (2015) is ‘PMP’, defined there as the proportion of missing points, so that \(\gamma = 1 - \mathrm{PMP}\). Their PMP \(=0.05\) and PMP \(=0.25\) correspond to our \(\gamma =0.95\) and \(\gamma =0.75\), respectively. In Wang and Lee (2013a, b), the counterpart is ‘\(\alpha \)’, defined there as the missing data percentage, so that \(\gamma = 1 - \alpha /100\). Their \(\alpha =10, 20\), and 40 correspond to our \(\gamma =0.9, 0.8\), and 0.6, respectively.
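The sampling scheme in this note can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used in the study: the function names, the ring-shaped weight matrix, the regression coefficients, and the independence of the observation draws across units are all assumptions made for the example.

```python
import numpy as np

def gamma_from_pmp(pmp):
    """Sampling rate implied by a proportion of missing points (PMP)."""
    return 1.0 - pmp

def gamma_from_alpha(alpha_percent):
    """Sampling rate implied by a missing data percentage (alpha)."""
    return 1.0 - alpha_percent / 100.0

def observation_mask(n, gamma, rng):
    """Boolean mask; each unit is observed with probability gamma
    (draws assumed independent across units)."""
    return rng.random(n) < gamma

def simulate_spatial_lag(W, X, beta, rho, rng):
    """Draw y from y = rho*W*y + X*beta + eps with standard normal eps,
    i.e. solve (I - rho*W) y = X*beta + eps."""
    n = W.shape[0]
    eps = rng.standard_normal(n)
    return np.linalg.solve(np.eye(n) - rho * W, X @ beta + eps)

rng = np.random.default_rng(0)
n = 400

# Hypothetical weight matrix: units on a ring, two neighbours each,
# already row-standardized (each row sums to 1).
W = np.zeros((n, n))
idx = np.arange(n)
W[idx, (idx + 1) % n] = 0.5
W[idx, (idx - 1) % n] = 0.5

# Hypothetical regressors and coefficients for the full population.
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
y = simulate_spatial_lag(W, X, np.array([1.0, 2.0]), rho=0.5, rng=rng)

# Keep each unit with probability gamma = 1 - PMP = 0.75.
mask = observation_mask(n, gamma_from_pmp(0.25), rng)
y_obs, X_obs = y[mask], X[mask]
W_obs = W[np.ix_(mask, mask)]  # links among observed units only
```

Only `y_obs`, `X_obs`, and `W_obs` would be available to the analyst; links to unobserved neighbours are dropped from `W_obs`.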
See Anselin (1988, ch. 6) for details of these two methods.
See Stakhovych and Bijmolt (2009, p. 393) for a survey of spatial weights matrices used in simulation analyses.
The current simulation study employs \(\alpha = 2\) and \(\bar{d} = 5\).
Note that there is no spatial autocorrelation in the DGP when the true \(\rho \) value is 0.
Arbia et al. (2015) also pointed out that spatial correlation disappears when some points are missing.
A referee kindly suggested that ‘the SAR parameter \(\rho \) loses its interpretation, and its relationship with the eigen values of weight matrix is lost’ in this setting. Although this point merits further discussion, here we merely point out the possibility of using a non-row-standardized weight matrix.
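To make the distinction concrete, the following Python sketch shows how row-standardizing a weight matrix after some units are dropped rescales the remaining weights, whereas a non-row-standardized matrix keeps them unchanged. The four-unit contiguity matrix, the choice of the missing unit, and the helper `row_standardize` are hypothetical and serve only as an illustration; the example does not reproduce the weight matrices of the simulation study.

```python
import numpy as np

def row_standardize(W):
    """Divide each row by its sum; rows summing to zero are left as zeros."""
    row_sums = W.sum(axis=1, keepdims=True)
    safe = np.where(row_sums == 0, 1.0, row_sums)
    return W / safe

# Hypothetical binary contiguity matrix: four units on a line.
W = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
])

observed = np.array([0, 1, 3])          # unit 2 is missing (assumed)
W_obs = W[np.ix_(observed, observed)]   # non-row-standardized sub-matrix

full_weight = row_standardize(W)[1, 0]      # weight of unit 0 for unit 1, full data
sub_weight = row_standardize(W_obs)[1, 0]   # same link after re-standardizing
raw_weight = W_obs[1, 0]                    # same link, no standardization

print(full_weight, sub_weight, raw_weight)
```

With the full data, unit 1 gives its neighbour unit 0 a row-standardized weight of 0.5. Once unit 2 is missing, re-standardizing raises that weight to 1, while the unstandardized weight is 1 in both the full and the observed matrix. Unit 3 loses its only neighbour and ends up with an all-zero row.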
Arbia et al. (2015) performed experiments for relatively dense sampling cases. They control the spatial intensity of data deletion with a parameter \(\psi \); their \(\psi =0\) result corresponds to our experiments. A comparison between their results and ours shows the following. (1) The decline patterns of spatial parameter efficiency are similar. (2) The decline patterns of regression coefficient efficiency differ: in our sparse sampling experiments, the pattern does not depend on the true value of the spatial parameter, whereas in the dense sampling experiments of Arbia et al. (2015), it does.
References
Anselin L (1988) Spatial econometrics: methods and models. Kluwer Academic, Dordrecht
Arbia G, Espa G, Giuliani D (2015) Dirty spatial econometrics. DEM Discussion Papers, 2015/09, University of Trento, Department of Economics and Management. http://web.unitn.it/files/download/27419/dem2015_09
Freeman JR (1989) Systematic sampling, temporal aggregation, and the study of political relationships. Polit Anal 1(1):61–98
Goulard M, Laurent T, Thomas-Agnan C (2009) About predictions in spatial SAR models: optimal and almost optimal strategies. Paper presented at the 3rd World Conference of the Spatial Econometrics Association, 10 July 2009
Griffith DA, Bennett RJ, Haining RP (1989) Statistical analysis of spatial data in the presence of missing observations: a methodological guide and an application to urban census data. Environ Plan A 21(11):1511–1523
Kelejian HH, Piras G (2011) An extension of Kelejian’s J-test for non-nested spatial models. Reg Sci Urban Econ 41(3):281–292
Kelejian HH, Prucha IR (1998) A generalized spatial two-stage least squares procedure for estimating a spatial autoregressive model with autoregressive disturbances. J Real Estate Finance Econ 17(1):99–121
Kelejian HH, Prucha IR (2010) Spatial models with spatially lagged dependent variables and incomplete data. J Geogr Syst 12:241–257
LeSage JP, Pace RK (2004) Models for spatially dependent missing data. J Real Estate Finance Econ 29(2):233–254
Little RJA, Rubin DB (2002) Statistical analysis with missing data. Wiley-Interscience, New York
Stakhovych S, Bijmolt THA (2009) Specification of spatial models: a simulation study on weights matrices. Pap Reg Sci 88:389–408
Wang W, Lee L-F (2013a) Estimation of spatial autoregressive models with randomly missing data in the dependent variable. Econom J 16(1):73–102
Wang W, Lee L-F (2013b) Estimation of spatial panel data models with randomly missing data in the dependent variable. Reg Sci Urban Econ 43(3):521–538