Linking spatial data from different sources: the effects of change of support

  • Original Paper
Stochastic Environmental Research and Risk Assessment

Abstract

A nationwide Environmental Public Health Tracking program is being created to monitor environmental impacts on human health. This, and many other efforts to relate environmental and health outcomes, depend largely on the synthesis of existing data sets; little new data are being generated for this purpose. More often than not, the data available for such synthesis have been collected for different geographic or spatial units, and any set of these units may be different from the one of interest. In this paper, we compare and contrast two approaches that can be used within a Geographic Information System to link spatial data from different sources. The first approach works with centroids of areal units and is commonly used in environmental health analyses. The second approach honors the spatial support (size, shape and orientation) of the data. Using traditional regression models and a spatially-varying coefficient regression model, we show that different linkage methods can lead to different inference. We describe key ideas pertaining to the support of spatial data that are often ignored in many analyses of environmental health data and present a general analytical approach to change-of-support problems.
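The contrast between the two linkage approaches can be illustrated with a small sketch. It is not the authors' method: the rectangular units, the values, and the names `Rect`, `centroid_link`, and `areal_link` are all hypothetical, and simple overlap-area weighting stands in for any procedure that honors spatial support. It assumes the source value is an average that is uniform within each source unit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle standing in for an areal unit (hypothetical)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    @property
    def centroid(self):
        return ((self.xmin + self.xmax) / 2, (self.ymin + self.ymax) / 2)

    def contains(self, point):
        x, y = point
        return self.xmin <= x <= self.xmax and self.ymin <= y <= self.ymax

    def overlap_area(self, other):
        # Width and height of the intersection; negative means no overlap.
        w = min(self.xmax, other.xmax) - max(self.xmin, other.xmin)
        h = min(self.ymax, other.ymax) - max(self.ymin, other.ymin)
        return max(w, 0.0) * max(h, 0.0)


def centroid_link(target, sources):
    """Mean of source values whose centroid falls inside the target unit."""
    hits = [v for rect, v in sources if target.contains(rect.centroid)]
    return sum(hits) / len(hits) if hits else None


def areal_link(target, sources):
    """Overlap-area-weighted mean of source values over the target unit."""
    weighted = [(target.overlap_area(rect), v) for rect, v in sources]
    total = sum(w for w, _ in weighted)
    return sum(w * v for w, v in weighted) / total if total > 0 else None


# Toy data (made up): two source units with an average value, one target unit.
sources = [
    (Rect(0, 0, 4, 4), 10.0),
    (Rect(4, 0, 8, 4), 2.0),
]
target = Rect(1, 0, 5, 4)

centroid_estimate = centroid_link(target, sources)
areal_estimate = areal_link(target, sources)
print(centroid_estimate, areal_estimate)
```

In this toy layout the target overlaps both source units, but only one source centroid falls inside it, so the centroid linkage ignores the second unit entirely while the area-weighted linkage gives it a quarter of the weight. The two linked values differ, and any regression fitted to them would inherit that difference.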

Abbreviations

ASCVD: Atherosclerotic cardiovascular disease
BRFSS: Behavioral risk factor surveillance system
CDC: Centers for Disease Control and Prevention
EPHT: Environmental public health tracking
GEE: Generalized estimating equation
GIS: Geographic information system
NAWQA: National water quality assessment
NCEH: National Center for Environmental Health
NHANES: National Health and Nutrition Examination Survey
US: United States
USGS: United States Geological Survey

Acknowledgements

We are grateful to Bill Miller of the Georgia Hospital Association for compiling the hospital discharge database and granting permission to use it. We also appreciate the work of Marty Mendelson in compiling the nitrate data from the NAWQA surveillance system and managing the GHA database.

Author information

Corresponding author

Correspondence to Linda J. Young.

About this article

Cite this article

Young, L.J., Gotway, C.A. Linking spatial data from different sources: the effects of change of support. Stoch Environ Res Risk Assess 21, 589–600 (2007). https://doi.org/10.1007/s00477-007-0136-z

  • DOI: https://doi.org/10.1007/s00477-007-0136-z
