A systematic investigation of cross-validation in GWR model estimation: empirical analysis and Monte Carlo simulations
In geographically weighted regression (GWR), one must choose a window size that determines which observations enter each local regression. Typically, a cross-validation procedure is used to select a single, globally optimal window size. Preliminary investigations indicate that the global cross-validation score is heavily influenced by a small number of observations in the dataset. At present, the ramifications of this behaviour are unknown. The research reported here explores the extent to which individual observations and groups of observations affect the determination of the optimal window size, and whether one can explain why some points are more influential than others. In addition, we examine the impact that neighbourhood specification has on model quality, in terms of both predictive capability and the ability of the method to retrieve spatially varying processes. The analysis is based on several datasets and on simulated data, so that results can be compared and validated. The results provide some practical guidelines for the use of cross-validation.
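To make the quantity under discussion concrete, the following is a minimal sketch of a leave-one-out cross-validation score for window-size selection, together with the per-observation contributions whose concentration the abstract refers to. It is an illustration under stated assumptions, not the authors' implementation: the window is taken to be the k nearest neighbours, the kernel is bisquare, the local model has a single predictor, and the simulated data, the function name `loo_squared_errors`, and all parameter values are hypothetical.

```python
import math
import random


def loo_squared_errors(coords, x, y, k):
    """Leave-one-out squared prediction error for each observation.

    Each local model is a weighted simple regression of y on x, fitted to the
    k nearest neighbours of the focal point, which is itself left out.
    (Assumed setup: adaptive window, bisquare kernel, one predictor.)
    """
    errors = []
    for i, (ui, vi) in enumerate(coords):
        # k nearest neighbours of point i, excluding i itself.
        nearest = sorted(
            (math.hypot(u - ui, v - vi), j)
            for j, (u, v) in enumerate(coords)
            if j != i
        )[:k]
        bandwidth = nearest[-1][0]
        # Bisquare weights; the farthest neighbour in the window gets weight 0.
        weights = [
            (1.0 - (d / bandwidth) ** 2) ** 2 if bandwidth > 0 else 1.0
            for d, _ in nearest
        ]
        if sum(weights) == 0.0:  # all neighbours equidistant: weight equally
            weights = [1.0] * len(nearest)
        idx = [j for _, j in nearest]
        sw = sum(weights)
        mx = sum(w * x[j] for w, j in zip(weights, idx)) / sw
        my = sum(w * y[j] for w, j in zip(weights, idx)) / sw
        sxx = sum(w * (x[j] - mx) ** 2 for w, j in zip(weights, idx))
        sxy = sum(w * (x[j] - mx) * (y[j] - my) for w, j in zip(weights, idx))
        # Fall back to the local weighted mean if x has no local variation.
        slope = sxy / sxx if sxx > 1e-12 else 0.0
        prediction = my + slope * (x[i] - mx)
        errors.append((y[i] - prediction) ** 2)
    return errors


# Hypothetical simulated data: the slope drifts from west to east.
rng = random.Random(0)
n = 60
coords = [(rng.random(), rng.random()) for _ in range(n)]
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = [(1.0 + 2.0 * u) * xi + rng.gauss(0.0, 0.3) for (u, _), xi in zip(coords, x)]

# Global CV score for each candidate window size; the minimiser is selected.
candidates = list(range(5, 56, 5))
scores = {k: sum(loo_squared_errors(coords, x, y, k)) for k in candidates}
best_k = min(scores, key=scores.get)

# Share of the global score owed to the five largest individual contributions.
contributions = loo_squared_errors(coords, x, y, best_k)
top_share = sum(sorted(contributions, reverse=True)[:5]) / sum(contributions)
print(best_k, round(top_share, 3))
```

Because the global score is a plain sum of squared leave-one-out errors, a handful of poorly predicted observations can account for a large share of it; inspecting `contributions` rather than only the total is one simple way to see whether that is happening for a given dataset.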
Keywords: Geographically weighted regression; Cross-validation score; Influential points; Goodness-of-fit; Polarization
The authors would like to thank Jean Paelinck and the participants of the Spatial Statistics and Econometrics sessions in the 2006 North American Regional Science Meetings in Toronto for their feedback and suggestions. Three anonymous reviewers provided valuable comments and insights that helped improve the paper. This research was supported by NSERC grant #261872-03. Thanks are also due to Ontario’s Municipal Property Assessment Corporation and in particular Mr. Bill Bradley for their kind support regarding the use of Toronto’s housing data. All views expressed in the paper are those of the authors alone.