Abstract
Regression models often suffer from multicollinearity, which greatly reduces the reliability of estimated coefficients and hinders an appropriate understanding of the role of independent variables. In regional science, it occurs especially when the independent variables include distances from urban facilities. This paper proposes a new method for deriving a configuration of sample points that reduces multicollinearity in regression models with distance variables. Multicollinearity is evaluated by the maximum absolute correlation coefficient between distance variables, and a spatial optimization technique is used to calculate the optimal configuration of sample points. The method makes it possible not only to locate sample points appropriately but also to evaluate, in a systematic way, the locations of the facilities from which distances are measured, in terms of the correlation between distance variables. Numerical experiments and empirical applications are performed to test the validity of the method. The results support the technical soundness of the proposed method and provide some useful implications for the design of sample locations.
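The evaluation criterion named in the abstract, the maximum absolute correlation coefficient between distance variables, can be illustrated with a minimal sketch. The function names, the use of Euclidean distance, and the coordinates below are assumptions made for illustration only; the abstract does not specify the distance metric or the optimization procedure, and this sketch covers only the evaluation step.

```python
import numpy as np


def distance_variables(points, facilities):
    """Return an (n_points, n_facilities) matrix of Euclidean distances.

    Euclidean distance is an assumption; the abstract does not name a metric.
    """
    diff = points[:, None, :] - facilities[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))


def max_abs_correlation(points, facilities):
    """Maximum absolute pairwise correlation between the distance variables.

    Requires at least two facilities and distance columns that are not constant.
    """
    d = distance_variables(points, facilities)
    corr = np.corrcoef(d, rowvar=False)        # facilities are the variables
    upper = np.triu_indices_from(corr, k=1)    # each facility pair once
    return float(np.abs(corr[upper]).max())


# Hypothetical facility locations and two candidate sample configurations.
facilities = np.array([[0.0, 0.0], [10.0, 0.0]])

# Points on the segment between the facilities: d1 + d2 is constant,
# so the two distance variables are perfectly negatively correlated.
collinear_points = np.array([[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])

# Points scattered around both facilities (seeded for repeatability).
rng = np.random.default_rng(0)
scattered_points = rng.uniform(-5.0, 15.0, size=(200, 2))

worst = max_abs_correlation(collinear_points, facilities)
scattered = max_abs_correlation(scattered_points, facilities)
```

A lower value indicates a sample configuration whose distance variables are less collinear, so a score of this kind could serve as the objective when comparing candidate configurations.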
Cite this article
Sadahiro, Y., Wang, Y. Configuration of sample points for the reduction of multicollinearity in regression models with distance variables. Ann Reg Sci 61, 295–317 (2018). https://doi.org/10.1007/s00168-018-0868-3
DOI: https://doi.org/10.1007/s00168-018-0868-3