Likelihood-based strategies for estimating unknown parameters and predicting missing data in the simultaneous autoregressive model


We attempt a three-stage comparison of several strategies for estimating parameters and predicting data in the simultaneous autoregressive model, a regression model with spatial autocorrelation in the disturbance in which locations are the units of observation. These strategies differ according to the formulation of the log-likelihood function containing a parametric weight matrix. In the first stage, a chain of logical reasoning is used to obtain theoretical findings under the assumption that the data generating model and the data fitting model coincide. We consider the possibility that a subset of locations may be included in neither the parameter estimation nor the data prediction. In the second stage, a series of Monte Carlo experiments is conducted to supplement the theoretical comparison by also considering a mismatch between the two models. The prevalent strategy is defined as an approach that is not based on the exact log-likelihood function, regardless of the setting. Under this strategy, the parameter estimators do not reflect the mutual connection between all the locations included in the prediction. In the third stage, an empirical comparison using data observed in the real world is made to confirm the findings from the experimental comparison. We conclude that the reasonable choice is not the prevalent strategy but a strategy that, depending on the setting, can be defined as an approach based on the exact log-likelihood function. The reasonable strategy tailors the parameter estimators to suit the mutual connection between all the locations included in the prediction.
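
As a rough illustration of what an exact log-likelihood for the observed data can look like, the following sketch evaluates the Gaussian log-likelihood of the observed subvector after marginalizing over the remaining locations. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names, the k-nearest-neighbor weight matrix, the simulated coordinates, and the placeholder data are all hypothetical, and the covariance form \(\sigma ^{2}[({\mathbf {I}}-c_{1}W)^{\prime }({\mathbf {I}}-c_{1}W)]^{-1}\) is the standard simultaneous autoregressive form, assumed here rather than quoted from the article.

```python
import numpy as np

def knn_weights(coords, k):
    """Row-standardized k-nearest-neighbor weight matrix (a hypothetical choice of W)."""
    n = len(coords)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)              # a location is not its own neighbor
    nearest = np.argsort(dist, axis=1)[:, :k]   # indices of the k closest locations
    W = np.zeros((n, n))
    W[np.arange(n)[:, None], nearest] = 1.0
    return W / W.sum(axis=1, keepdims=True)     # every row has k neighbors, so sums are nonzero

def sar_covariance(W, c1):
    """Disturbance covariance up to sigma^2: [(I - c1 W)'(I - c1 W)]^{-1}."""
    A = np.eye(len(W)) - c1 * W
    return np.linalg.inv(A.T @ A)

def exact_loglik(y1, X1, beta, sigma2, c1, W_all, obs):
    """Gaussian log-likelihood of the observed data y1, marginalized over all other locations."""
    R11 = sar_covariance(W_all, c1)[np.ix_(obs, obs)]   # block for observed locations
    resid = y1 - X1 @ beta
    _, logdet = np.linalg.slogdet(R11)
    quad = resid @ np.linalg.solve(R11, resid)
    n1 = len(y1)
    return -0.5 * (n1 * np.log(2 * np.pi * sigma2) + logdet + quad / sigma2)

# Hypothetical setup: 12 locations, of which the first 8 have observed data.
rng = np.random.default_rng(0)
coords = rng.uniform(size=(12, 2))
obs = np.arange(8)
X = np.column_stack([np.ones(12), rng.normal(size=12)])
beta = np.array([1.0, 0.5])
y = X @ beta + rng.normal(size=12)   # placeholder data, not drawn from the model

# Weight matrix built on all 12 locations, then marginalized to the observed 8.
W_all = knn_weights(coords, k=3)
ll_exact = exact_loglik(y[obs], X[obs], beta, 1.0, 0.5, W_all, obs)

# For contrast only: weight matrix built on the 8 observed locations alone.
W_obs = knn_weights(coords[obs], k=3)
ll_obs_only = exact_loglik(y[obs], X[obs], beta, 1.0, 0.5, W_obs, np.arange(8))
```

The two values generally differ because the first weight matrix lets unobserved locations act as neighbors of observed ones, while the second ignores them. Whether the second variant corresponds exactly to the prevalent strategy discussed above is an assumption of this sketch.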


  1. See Kato (2008) for the performance of the two predictors.

  2. Following common usage, we define abbreviations in parentheses at first mention and use them thereafter. For the three models considered in LeSage and Pace (2004), we carry over their abbreviations for the sake of expediency.

  3. See Dubin (2003) for the distinction between the two cases.

  4. In this setting, they also conducted Monte Carlo experiments to illustrate the approach for the SAM by selecting appropriate data from the same data set as we adopt in the present study, and using contiguous neighbors to define the DGM and the DFM.

  5. Using this coordinate system neither denies that the definition of space may vary, depending on the subject of study, nor affects the generality of our discussion.

  6. In the CAR model, \(R\left( {\mathbf {D}};{\mathbf {c}}\right) \) is set to \(({\mathbf {I}}-G({\mathbf {D}};{\mathbf {c}}))^{-1}\), where \(G(\circ;\,{\mathbf {c}})\) is a function of the same type. This model can also be expressed as the SAR model if \(G({\mathbf {D}};{\mathbf {c}})\) is equal to \(c_{1}W({\mathbf {D}} ;c_{2})+c_{1}W({\mathbf {D}};c_{2})^{\prime }-c_{1}^{2}W({\mathbf {D}};c_{2} )^{\prime }W({\mathbf {D}};c_{2})\). The dimensions of \({\mathbf {0}}\) and \({\mathbf {I}}\) change according to the context.

  7. The number of nearest neighbors offers an example of \(c_{2}\). Note that when contiguous neighbors are used, \({\mathbf {D}}\) and \(c_{2}\) may both be unsuitable for defining the weight matrix. We believe that the findings from our strategy comparison will also be useful in such circumstances.

  8. The exact log-likelihood is equivalent to the marginal log-likelihood considered in Suesse (2018), where parameter estimation was discussed without a view to predicting data.

  9. See Kato (2008) for the use of this approach.

  10. See Martin (1984) for the derivation of this identity.

  11. The approach based on \(M_{2}({\mathbf {a}}|{\mathbf {y}}_{1},{\hat{\mathbf{a}}})\) was proposed in LeSage and Pace (2004) and was compared with the prevalent approach in Kato (2013). These studies, however, defined intuitive estimators for \(\sigma ^{2}\) without considering the relevant first-order condition for optimizing the objective function.

  12. See Suesse and Zammit-Mangion (2017) for an empirical illustration of this algorithm.

  13. The expression \(M_{4}({\mathbf {a}}|{\mathbf {y}} _{1},{\hat{\mathbf{a}}})\) can be specified as \(\int _{S}\ln P({\mathbf {y}} _{1}|{\mathbf {a}})\cdot P({\mathbf {y}}_{2}|{\mathbf {y}}_{1},{\hat{\mathbf{a}} }){\rm d}{\mathbf {y}}_{2}\), and \(\ln P({\mathbf {y}}_{1}|{\mathbf {a}})\) can be rewritten as \(L({\mathbf {a}}|{\mathbf {y}}_{1})\).

  14. See Griffith et al. (1989) for the potential of this approach.

  15. A row is not standardized when it consists entirely of zeros.

  16. A shorthand device is used to express the components of \(R({\mathbf {D}};{\mathbf {c}})\).

  17. Negative autocorrelation may be of practical concern in certain circumstances. However, such autocorrelation lies outside the scope of our discussion.

  18. If a model is used as the DFM, then the values of the parameters chosen for defining the model as the DGM should be utilized for carrying out such a procedure, under the assumption that the DFM is the same as the DGM.

  19. If \(c_{1}\) were set to 1, then the logarithm of the determinant in the profiled objective function would be undefined.

  20. If \(c_{1}\) were set to 0, then the value of the profiled objective function would be independent of the value of \(c_{2}\).

  21. The impacts of changes in the values of the regressors on the values of the regressand can be divided into direct impacts and indirect impacts. As proposed in LeSage and Pace (2018), the two types of impacts should be taken into account in producing a design for Monte Carlo experiments. In experiments on the SAM or the SDM, each type of impact will be measured by combining the spatial autocorrelation parameters and the regression parameters, carrying out a parametric bootstrap. In experiments on the SEM, however, as indicated in LeSage and Pace (2009), the direct impacts should be measured by focusing on the regression parameters, under the assumption that the indirect impacts are zero. It follows that a bootstrap-free design can be adopted for experiments on the SAR model.

  22. It may also be possible to focus on a small number of extreme configurations and use a large number of replications in such configurations.

  23. The standard normal distribution is assumed in drawing inferences.

  24. In all eight tables, the first four experiments produce larger figures for the MSE than the last three experiments, regardless of the strategy. A plausible explanation is that unexpected values may be calculated for the disturbance and hence the regressand as a consequence of the irregularity of spatial autocorrelation if a weight matrix is used to define the DGM. See Dubin (1998) for a graphical illustration of this irregularity.

  25. Depending on the strategy, arithmetic errors crept into the execution. In both Tables 7 and 8, two replications have been replaced to obtain the results of the experiment with PWR as the DGM. In Table 8, one replication has also been replaced to obtain the results of the experiment with NNS as the DGM.

  26. In some of the experiments in Table 8, the standard error of the estimator of \(c_{2}\) is large, regardless of the strategy, compared with the value of the estimator. This suggests that detection of autocorrelation should place primary importance on the results for \(c_{1}\).

  27. Note that in the present paper, transformations of the dependent variable and the independent variables, including transformations into themselves, are labeled regressand and regressors, respectively. In certain circumstances, it will be necessary to predict missing dependent variable data. However, such prediction lies outside the scope of our discussion.

  28. Earlier examples were cited in Gilley and Pace (1996) and Pace and Gilley (1997). Recent examples appear in Kostov (2010) and Arbia (2014).
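
Several of the notes above make statements that can be checked numerically: the expression for \(G\) in note 6, the treatment of zero rows in note 15, and the behavior at \(c_{1}=1\) and \(c_{1}=0\) in notes 19 and 20. The following is a minimal sketch of such a check, assuming a row-standardized weight matrix and the covariance form \([({\mathbf {I}}-c_{1}W)^{\prime }({\mathbf {I}}-c_{1}W)]^{-1}\) for the SAR model. The five-location neighbor matrix and the function names are hypothetical and do not come from the article.

```python
import numpy as np

def row_standardize(B):
    """Divide each row by its sum; a row consisting entirely of zeros is left as zeros (note 15)."""
    B = np.asarray(B, dtype=float)
    sums = B.sum(axis=1, keepdims=True)
    return np.divide(B, sums, out=np.zeros_like(B), where=sums != 0)

def car_g_from_sar(W, c1):
    """G from note 6: c1*W + c1*W' - c1^2 * W'W."""
    return c1 * W + c1 * W.T - c1**2 * (W.T @ W)

# Hypothetical binary neighbor matrix for five locations; the last has no neighbors.
B = np.array([
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
])
W = row_standardize(B)
I = np.eye(len(W))

# Note 6: I - G equals (I - c1 W)'(I - c1 W), so the CAR and SAR forms of R coincide.
c1 = 0.6
identity_holds = np.allclose(I - car_g_from_sar(W, c1), (I - c1 * W).T @ (I - c1 * W))

# Note 19: with rows summing to one, I - W is singular, so its log determinant is undefined.
det_at_one = np.linalg.det(I - 1.0 * W)

# Note 20: at c1 = 0 the matrix I - c1 W is the identity whatever W (and hence c2) is.
independent_of_w = np.allclose(I - 0.0 * W, I)
```

The singularity at \(c_{1}=1\) arises in this sketch because the first four locations form a group whose rows all sum to one, which gives the weight matrix an eigenvalue of exactly 1. A weight matrix that is not row-standardized would need a different bound on \(c_{1}\).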


  1. Anselin L (1988) Spatial econometrics: methods and models. Kluwer Academic Publishers, Dordrecht

  2. Anselin L (2010) Thirty years of spatial econometrics. Pap Reg Sci 89(1):3–26

  3. Arbia G (2014) A primer for spatial econometrics with applications in R. Palgrave Macmillan, Hampshire

  4. Baltagi BH, Li D (2004) Prediction in the panel data model with spatial correlation. In: Anselin L, Florax RJGM, Rey SJ (eds) Advances in spatial econometrics: methodology, tools and applications. Springer, Heidelberg, pp 283–295

  5. Belsley DA, Kuh E, Welsch RE (1980) Regression diagnostics: identifying influential data and sources of collinearity. Wiley, New York

  6. Dubin RA (1998) Spatial autocorrelation: a primer. J Hous Econ 7(4):304–327

  7. Dubin R (2003) Robustness of spatial autocorrelation specifications: some Monte Carlo evidence. J Reg Sci 43(2):221–248

  8. Gilley OW, Pace RK (1996) On the Harrison and Rubinfeld data. J Environ Econ Manag 31(3):403–405

  9. Goulard M, Laurent T, Thomas-Agnan C (2017) About predictions in spatial autoregressive models: optimal and almost optimal strategies. Spat Econ Anal 12(2–3):304–325

  10. Griffith DA, Bennett RJ, Haining RP (1989) Statistical analysis of spatial data in the presence of missing observations: a methodological guide and an application to urban census data. Environ Plan A 21(11):1511–1523

  11. Haining R, Griffith D, Bennett R (1989) Maximum likelihood estimation with missing spatial data and with an application to remotely sensed data. Commun Stat Theory Methods 18(5):1875–1894

  12. Harrison D Jr, Rubinfeld DL (1978) Hedonic housing prices and the demand for clean air. J Environ Econ Manag 5(1):81–102

  13. Kato T (2008) A further exploration into the robustness of spatial autocorrelation specifications. J Reg Sci 48(3):615–639

  14. Kato T (2013) Usefulness of the information contained in the prediction sample for the spatial error model. J Real Estate Finance Econ 47(1):169–195

  15. Kelejian HH, Prucha IR (2007) The relative efficiencies of various predictors in spatial econometric models containing spatial lags. Reg Sci Urban Econ 37(3):363–374

  16. Kostov P (2010) Model boosting for spatial weighting matrix selection in spatial lag models. Environ Plan B 37(3):533–549

  17. Le Gallo J (2014) Cross-section spatial regression models. In: Fischer MM, Nijkamp P (eds) Handbook of regional science, vol 3. Springer, Heidelberg, pp 1511–1533

  18. LeSage JP, Pace RK (2004) Models for spatially dependent missing data. J Real Estate Finance Econ 29(2):233–254

  19. LeSage J, Pace RK (2009) Introduction to spatial econometrics. Chapman & Hall/CRC, Boca Raton

  20. LeSage JP, Pace RK (2018) Spatial econometric Monte Carlo studies: raising the bar. Empir Econ 55(1):17–34

  21. Martin RJ (1984) Exact maximum likelihood for incomplete data from a correlated Gaussian process. Commun Stat Theory Methods 13(10):1275–1288

  22. Pace RK (2014) Maximum likelihood estimation. In: Fischer MM, Nijkamp P (eds) Handbook of regional science, vol 3. Springer, Heidelberg, pp 1553–1569

  23. Pace RK, Gilley OW (1997) Using the spatial configuration of the data to improve estimation. J Real Estate Finance Econ 14(3):333–340

  24. Pace RK, Zhu S (2012) Separable spatial modeling of spillovers and disturbances. J Geogr Syst 14(1):75–90

  25. Schabenberger O, Gotway CA (2005) Statistical methods for spatial data analysis. Chapman & Hall/CRC, Boca Raton

  26. Suesse T (2018) Marginal maximum likelihood estimation of SAR models with missing data. Comput Stat Data Anal 120:98–110

  27. Suesse T, Zammit-Mangion A (2017) Computational aspects of the EM algorithm for spatial econometric models with missing data. J Stat Comput Simul 87(9):1767–1786

  28. Wall MM (2004) A close look at the spatial structure implied by the CAR and SAR models. J Stat Plan Inference 121(2):311–324


The Center for Spatial Data Science at the University of Chicago offers the free use of a collection of data sets for a wide range of purposes. This offer was accepted with much appreciation for the data set Bostonhsg. The quality of the present paper could not have been ensured without constructive suggestions from the editor-in-chief Manfred M. Fischer and illuminative comments from three anonymous reviewers. The usual caveats apply.

Author information

Correspondence to Takafumi Kato.

Cite this article

Kato, T. Likelihood-based strategies for estimating unknown parameters and predicting missing data in the simultaneous autoregressive model. J Geogr Syst 22, 143–176 (2020). https://doi.org/10.1007/s10109-019-00316-z

Keywords


  • Conditional autoregressive model
  • Correlation function
  • Maximum likelihood
  • Simultaneous autoregressive model
  • Weight matrix

JEL Classification

  • C13
  • C21
  • C53