Abstract
In geographically weighted regression (GWR), one must choose a window size that is used to subset the data locally. Typically, a cross-validation procedure is used to determine a globally optimal window size. Preliminary investigations indicate that the global cross-validation score is heavily influenced by a small number of observations in the dataset. At present, the ramifications of this behaviour in cross-validation are unknown. The research reported here explores the extent to which individual observations and groups of observations affect the determination of the optimal window size, and whether one can explain why some points are more influential than others. In addition, we examine the impact that neighbourhood specification has on model quality, in terms of both predictive capability and the ability of the method to retrieve spatially varying processes. The analysis uses several empirical datasets together with simulated data in order to compare and validate results. The results provide some practical guidelines for the use of cross-validation.
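To make the idea of per-observation influence on the cross-validation score concrete, the following is a minimal sketch and not the authors' software. It assumes a window defined by the k nearest neighbours, a bisquare kernel, and simulated data with a slope that drifts across space; the function name `loo_cv_contributions` and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np

def loo_cv_contributions(coords, X, y, k):
    """Squared leave-one-out prediction error of each observation
    for a GWR fitted with a k-nearest-neighbour window (illustrative)."""
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])  # design matrix with intercept
    errors = np.empty(n)
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)
        d[i] = np.inf                      # exclude point i from its own fit
        idx = np.argsort(d)[:k]            # the k nearest neighbours of i
        h = d[idx].max()                   # adaptive bandwidth (assumes h > 0)
        w = (1.0 - (d[idx] / h) ** 2) ** 2 # bisquare kernel weights (assumed)
        sw = np.sqrt(w)
        # Weighted least squares via ordinary least squares on scaled rows
        beta, *_ = np.linalg.lstsq(Xd[idx] * sw[:, None], y[idx] * sw, rcond=None)
        errors[i] = (y[i] - Xd[i] @ beta) ** 2
    return errors

# Simulated example: the slope on x increases from west to east (assumption)
rng = np.random.default_rng(0)
n = 200
coords = rng.uniform(0, 1, size=(n, 2))
x = rng.normal(size=n)
y = 0.5 + (1.0 + 2.0 * coords[:, 0]) * x + rng.normal(scale=0.3, size=n)

candidates = [10, 20, 40, 80, 160]         # window sizes to compare
scores = {k: loo_cv_contributions(coords, x, y, k) for k in candidates}
cv = {k: e.sum() for k, e in scores.items()}  # global CV score per window size
best_k = min(cv, key=cv.get)

# Share of the global CV score contributed by the 10 largest errors
top_share = np.sort(scores[best_k])[-10:].sum() / cv[best_k]
print(best_k, round(float(top_share), 3))
```

Because the global score is a plain sum of the per-observation errors, a handful of large errors can dominate it; `top_share` is one simple way to quantify that concentration for a given window size.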
Notes
For Toronto, 400 neighbours is the largest bandwidth tested, so it is likely that the frequency of local optima at 400 is inflated by points that would perform well under even larger bandwidths. We would have liked to compute cross-validation scores for larger bandwidths, but the software used to perform the GWR computations is presently incapable of processing such large matrices.
Acknowledgments
The authors would like to thank Jean Paelinck and the participants of the Spatial Statistics and Econometrics sessions in the 2006 North American Regional Science Meetings in Toronto for their feedback and suggestions. Three anonymous reviewers provided valuable comments and insights that helped improve the paper. This research was supported by NSERC grant #261872-03. Thanks are also due to Ontario’s Municipal Property Assessment Corporation and in particular Mr. Bill Bradley for their kind support regarding the use of Toronto’s housing data. All views expressed in the paper are those of the authors alone.
Cite this article
Farber, S., Páez, A. A systematic investigation of cross-validation in GWR model estimation: empirical analysis and Monte Carlo simulations. J Geograph Syst 9, 371–396 (2007). https://doi.org/10.1007/s10109-007-0051-3
DOI: https://doi.org/10.1007/s10109-007-0051-3
Keywords
- Geographically weighted regression
- Cross-validation score
- Influential points
- Goodness-of-fit
- Polarization