Abstract
A nationwide Environmental Public Health Tracking program is being created to monitor environmental impacts on human health. This and many other efforts to relate environmental and health outcomes depend largely on the synthesis of existing data sets; few new data are being generated for this purpose. More often than not, the data available for such synthesis have been collected for different geographic or spatial units, and any set of these units may differ from the one of interest. In this paper, we compare and contrast two approaches that can be used within a Geographic Information System to link spatial data from different sources. The first approach works with centroids of areal units and is commonly used in environmental health analyses. The second approach honors the spatial support (size, shape and orientation) of the data. Using traditional regression models and a spatially-varying coefficient regression model, we show that different linkage methods can lead to different inferences. We describe key ideas pertaining to the support of spatial data that are often ignored in analyses of environmental health data and present a general analytical approach to change-of-support problems.
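The contrast between the two linkage approaches can be illustrated with a minimal sketch. It is not the authors' GIS workflow: the areal units are assumed to be axis-aligned rectangles, the function names (`link_by_centroid`, `link_by_area`) and the numeric values are hypothetical, and the support-honoring step is approximated by simple area weighting, which assumes the attribute is an average that is uniform within each source unit.

```python
from typing import List, Tuple

# Assumption: each areal unit is an axis-aligned rectangle (xmin, ymin, xmax, ymax).
Rect = Tuple[float, float, float, float]
Source = Tuple[Rect, float]  # (unit geometry, value observed for that unit)


def centroid(r: Rect) -> Tuple[float, float]:
    """Return the center point of a rectangle."""
    return ((r[0] + r[2]) / 2.0, (r[1] + r[3]) / 2.0)


def contains(r: Rect, p: Tuple[float, float]) -> bool:
    """Half-open containment test so boundary points match only one unit."""
    return r[0] <= p[0] < r[2] and r[1] <= p[1] < r[3]


def overlap_area(a: Rect, b: Rect) -> float:
    """Area of the intersection of two rectangles (0 if they do not overlap)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0.0) * max(h, 0.0)


def link_by_centroid(target: Rect, sources: List[Source]) -> float:
    """Centroid linkage: take the value of the source unit holding the target's centroid."""
    c = centroid(target)
    for rect, value in sources:
        if contains(rect, c):
            return value
    raise ValueError("target centroid falls outside every source unit")


def link_by_area(target: Rect, sources: List[Source]) -> float:
    """Support-aware linkage (area weighting): average source values by overlap area."""
    weights = [overlap_area(target, rect) for rect, _ in sources]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("target does not overlap any source unit")
    return sum(w * value for w, (_, value) in zip(weights, sources)) / total


if __name__ == "__main__":
    # Two hypothetical source units with illustrative values, and one target unit
    # that covers all of the first and half of the second.
    sources = [((0.0, 0.0, 2.0, 2.0), 10.0), ((2.0, 0.0, 4.0, 2.0), 40.0)]
    target = (0.0, 0.0, 3.0, 2.0)
    print(link_by_centroid(target, sources))  # 10.0
    print(link_by_area(target, sources))      # 20.0
```

In this toy configuration the centroid rule assigns the target the value of a single source unit, while area weighting blends both units in proportion to their overlap, so the two linked values differ even though the inputs are identical.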
Abbreviations
- ASCVD: Atherosclerotic cardiovascular disease
- BRFSS: Behavioral Risk Factor Surveillance System
- CDC: Centers for Disease Control and Prevention
- EPHT: Environmental Public Health Tracking
- GEE: Generalized estimating equation
- GIS: Geographic Information System
- NAWQA: National Water Quality Assessment
- NCEH: National Center for Environmental Health
- NHANES: National Health and Nutrition Examination Survey
- US: United States
- USGS: United States Geological Survey
Acknowledgements
We are grateful to Bill Miller of the Georgia Hospital Association for compiling the hospital discharge database and granting permission to use it. We also appreciate the work of Marty Mendelson in compiling the nitrate data from the NAWQA surveillance system and managing the GHA database.
Cite this article
Young, L.J., Gotway, C.A. Linking spatial data from different sources: the effects of change of support. Stoch Environ Res Risk Assess 21, 589–600 (2007). https://doi.org/10.1007/s00477-007-0136-z
DOI: https://doi.org/10.1007/s00477-007-0136-z