Application of the double kernel density approach to the multivariate analysis of attributeless event point datasets

  • Original Paper
  • Letters in Spatial and Resource Sciences

Abstract

Attributeless event point datasets (AEPDs) are datasets composed of discrete events or observations that are defined by their geographical location only and lack any other attributes. Examples of such datasets include observed criminal events, road accidents and residential locations of disease patients. A commonly used approach to the analysis of such datasets is to aggregate them into predefined areal units, such as neighborhoods or census tracts. However, this approach performs poorly when the events of interest are geographically localized and the number of areal units available for aggregation is small. An alternative approach to the analysis of AEPDs is based on double kernel density (DKD) smoothing, in which the events of interest are transformed into a continuous density surface that is then normalized by the density of the entire population from which the events are drawn. In the present study, the applicability of the DKD approach to multivariate analysis is tested for estimation consistency, sensitivity to the number of input observations and potential bias attributable to the spatial dependency of neighboring observations. Our analysis indicates that the DKD approach provides reasonably stable and consistent estimates if the following three preconditions are met: (a) the kernel estimation parameters are properly defined, (b) the number of reference points used to transform continuous DKD surfaces into discrete observations is sufficiently large, and (c) the spatial dependency of neighboring observations is taken into account using spatial analysis tools.
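
The DKD idea summarized above can be illustrated with a minimal sketch: estimate a kernel density for the events, estimate a second one for the whole population, and take their ratio at a set of reference points. The Gaussian kernel, the function names, the coordinates and the single fixed bandwidth below are all assumptions made for illustration; they are not taken from the study, and GIS packages may use a different kernel function.

```python
import math


def kernel_density(points, ref, bandwidth):
    """Sum of Gaussian kernels centred on `points`, evaluated at `ref`.

    Assumption: a Gaussian kernel with one fixed bandwidth (same units
    as the coordinates). The result is a density per unit area.
    """
    norm = 2.0 * math.pi * bandwidth ** 2
    total = 0.0
    for x, y in points:
        d2 = (x - ref[0]) ** 2 + (y - ref[1]) ** 2
        total += math.exp(-d2 / (2.0 * bandwidth ** 2))
    return total / norm


def double_kernel_density(events, population, ref_points, bandwidth):
    """Ratio of event density to population density at each reference point."""
    ratios = []
    for ref in ref_points:
        pop = kernel_density(population, ref, bandwidth)
        ev = kernel_density(events, ref, bandwidth)
        # Where the population density is zero the ratio is undefined.
        ratios.append(ev / pop if pop > 0 else float("nan"))
    return ratios


# Hypothetical data: residents on a 5 x 5 grid (500 m spacing),
# with three cases concentrated in one corner of the grid.
population = [(500.0 * i, 500.0 * j) for i in range(5) for j in range(5)]
events = [(0.0, 0.0), (500.0, 0.0), (0.0, 500.0)]
ref_points = [(250.0, 250.0), (1750.0, 1750.0)]

rates = double_kernel_density(events, population, ref_points, bandwidth=1000.0)
```

Because the events here are a subset of the population, each ratio lies between 0 and 1 and can be read as a smoothed local rate. The values sampled at the reference points are the discrete observations that would then enter a multivariate model.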

Notes

  1. The industrial zone in question hosts several petrochemical facilities, and, therefore, aerial proximity to this zone was suspected to be a potential determinant of the observed cancer morbidity.

  2. We implicitly assume that individual exposure is determined by the place of residence. This assumption is required since working place and duration of living in a particular location were not available in our data set.

  3. As previously mentioned, the term “double kernel density” (DKD) refers to an analytical method that performs kernel estimation twice, first for the events of interest and second for the overall population from which those events are drawn, and then calculates the ratio between the two. The term DKD itself originates in an early theoretical paper by Devroye (1989).

  4. Several GIS software packages can perform pixel-on-pixel regressions, but such regressions are unable to account for spatial dependency of regression residuals. In our study, the GeoDa\(^{\mathrm{TM}}\) software (Anselin et al. 2006) was used for this task.

  5. The cross-validation test uses all the data: it removes each data location one at a time, predicts the associated data value from the remaining locations, and compares the predicted value with the measured one (Johnston et al. 2001).
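
Note 4 refers to the spatial dependency of regression residuals. One common diagnostic for such dependency is Moran's I, sketched below under stated assumptions: binary distance-band weights (two observations are neighbors if they lie within `threshold` of each other), hypothetical residual values, and a hypothetical function name. This is an illustration only, not the GeoDa implementation used in the study.

```python
import math


def morans_i(values, coords, threshold):
    """Moran's I with binary distance-band weights (assumed scheme).

    values    -- e.g. regression residuals, one per reference point
    coords    -- (x, y) location of each value
    threshold -- maximum distance at which two points count as neighbors
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]

    cross = 0.0   # sum of w_ij * dev_i * dev_j
    w_sum = 0.0   # sum of all weights w_ij
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dist = math.hypot(coords[i][0] - coords[j][0],
                              coords[i][1] - coords[j][1])
            if dist <= threshold:
                cross += dev[i] * dev[j]
                w_sum += 1.0

    denom = sum(d * d for d in dev)
    if w_sum == 0.0 or denom == 0.0:
        raise ValueError("Moran's I is undefined: no neighbors or no variance")
    return (n / w_sum) * (cross / denom)


# Hypothetical residuals at eight points spaced one unit apart on a line.
coords = [(float(k), 0.0) for k in range(8)]
clustered = [1, 1, 1, 1, -1, -1, -1, -1]     # similar values sit together
alternating = [1, -1, 1, -1, 1, -1, 1, -1]   # neighbors always differ

i_clustered = morans_i(clustered, coords, threshold=1.0)
i_alternating = morans_i(alternating, coords, threshold=1.0)
```

In this toy case the clustered residuals give a positive value and the alternating residuals a negative one. A clearly positive value for real regression residuals would suggest that a model accounting for spatial dependency is needed.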

References

  • Anselin, L., Rey, S.: Properties of tests for spatial dependence in linear regression models. Geogr. Anal. 23(2), 112–131 (1991)

  • Anselin, L., Syabri, I., Kho, Y.: GeoDa: an introduction to spatial data analysis. Geogr. Anal. 38, 15–22 (2006)

  • Barchana, M., Liptshitz, I., Fishler, Y., Green, M.: Geographical mapping of malignant diseases in Israel 2001–2005. Israel National Cancer Registry, Jerusalem (2007)

  • Bithell, J.F.: An application of density estimation to geographical epidemiology. Stat. Med. 9, 691–701 (1990)

  • Busby, C.: Nuclear pollution, childhood leukaemia, retinoblastoma and brain tumours in Gwynedd and Anglesey Wards near the Menai Straits, North Wales 2000–2003. Green Audit, Aberystwyth, Bangor Report for HTV (2004)

  • Carlos, H.A., Shi, X., Sargent, J., Tanski, S., Berke, E.M.: Density estimation and adaptive bandwidths: a primer for public health practitioners. Int J Health Geogr 9, 39 (2010)

  • Centers for Disease Control and Prevention. Health Disparities in Cancer. (2014). http://www.cdc.gov

  • Chan, Y.C., Simpson, R.W., McTainsh, G.H., Vowles, P.D., Cohen, D.D., Bailey, G.M.: Source apportionment of PM2.5 and PM10 aerosols in Brisbane (Australia) by receptor modelling. Atmos. Environ. 33(19), 3251–3268 (1999)

  • Chow, J.C., Watson, J.G.: Review of PM2.5 and PM10 apportionment for fossil fuel combustion and other sources by the chemical mass balance receptor model. Energy Fuels 16(2), 222–260 (2002)

  • Colwell, R.K., Chao, A., Gotelli, N.J., Lin, S.Y., Mao, C.X., Chazdon, R.L., Longino, J.T.: Models and estimators linking individual-based and sample-based rarefaction, extrapolation and comparison of assemblages. J. Plant Ecol. 5(1), 3–21 (2012)

  • Cooke, T., Marchant, S.: The changing intrametropolitan location of high-poverty neighbourhoods in the US, 1990–2000. Urban Stud. 43(11), 1971–1989 (2006)

  • Devroye, L.: The double kernel method in density estimation. Annales de l’IHP Probabilités et statistiques 25(4), 533–580 (1989)

  • ESRI [Internet]. ArcGIS Desktop Help 10. (2011). http://webhelp.esri.com

  • Fan, X., Sivo, S., Keenan, S.: SAS for Monte Carlo studies: a guide for quantitative researchers. SAS Institute (2002)

  • Fuller-Thomson, E., Hulchanski, J.D., Hwang, S.: The housing/health relationship: what do we know? Rev. Environ. Health 15, 109–133 (2000)

  • Gatrell, A.C., Bailey, T.C., Diggle, P.J., Rowlingson, B.S.: Spatial point pattern analysis and its application in geographical epidemiology. Trans Inst Br Geogr N Ser 21(1), 256–274 (1996)

  • Goldberg, D.W.: Ensuring privacy and confidentiality. In: A geocoding best practices guide. Springfield. North American Association of Central Cancer Registries, Inc., Boca Raton (2008)

  • The Israel National Health Interview Survey, 2005–2006. ICDC, Jerusalem (2007)

  • Statistical abstract of Israel 2007: population, by population group, religion, sex and age. ICBS, Jerusalem (2008)

  • Jetz, W., McPherson, J.M., Guralnick, R.P.: Integrating biodiversity distribution knowledge: toward a global map of life. Trends Ecol Evol 27(3), 151–159 (2012)

  • Johnston, K., Ver Hoef, J.M., Krivoruchko, K., Lucas, N.: Using ArcGIS geostatistical analyst. ESRI (2001)

  • Kloog, I., Abraham, H., Portnov, B.A.: Using kernel density function as an urban analysis tool: Investigating the association between nightlight exposure and the incidence of breast cancer in Haifa, Israel. Comput. Environ. Urban Syst. 33, 55–63 (2009)

  • Laitila, J., Moilanen, A.: Approximating the dispersal of multi-species ecological entities such as communities, ecosystems or habitat types. Ecol. Model. 259, 24–29 (2013)

  • Li, L., Lian, Z.W.: Application of statistical power analysis–how to determine the right sample size in human health, comfort and productivity research. Build Environ. 45(5), 1202–1213 (2010)

  • Logan, J.R.: Making a place for space: spatial thinking in social science. Annu. Rev. Sociol. 38, 507–524 (2012)

  • Municipality of Haifa.: Research and statistical information department: neighborhood profile. MH, Haifa, p. 2010 (2008)

  • Nakaya, T., Yano, K.: Visualising crime clusters in a space–time cube: an exploratory data analysis approach using space–time kernel density estimation and scan statistics. Trans GIS 14(3), 223–239 (2010)

  • Openshaw, S.: The modifiable areal unit problem. GeoBooks, Norwich (1984)

  • Perchoux, C., Chaix, B., Cummins, S., Kestens, Y.: Conceptualization and measurement of environmental exposure in epidemiology: accounting for activity space related to daily mobility. Health Place 21, 86–93 (2013)

  • Pollock, D.S.J.: Maximum Likelihood Estimation. Lecture conducted from University of Leicester, England (1995)

  • Portnov, B.A., Dubnov, J., Barchana, M.: Studying the association between air-pollution and lung cancer incidence in a large metropolitan area using a kernel density function. Socio Econ. Plan. Sci. 43, 141–150 (2009)

  • Portnov, B.A., Zusman, M.: Spatial data analysis using kernel density tools. In: Wang, J. (ed.) Encyclopedia of business analytics and optimization, Montclair State University, vol. 203, pp. 2252–2264 (2014)

  • Prentice, R.L., Pyke, R.: Logistic disease incidence models and case-control studies. Biometrika 66(3), 403–411 (1979)

  • Rosu, A.: A new approach for geocoding postal code-based data in health related studies. Thesis (Master, Geography), Queen’s University (2014)

  • Rushton, G., Armstrong, M.P., Gittler, J., Greene, B.R., Pavlik, C.D., West, M.M., Zimmerman, D.L.: Geocoding in cancer research: a review. Am. J. Prev. Med. 30(2), S16–S24 (2006)

  • Sexton, K.: Cumulative risk assessment: an overview of methodological approaches for evaluating combined health effects from exposure to multiple environmental stressors. Int. J. Environ. Res. Public Health 9(2), 370–390 (2012)

  • Shi, X.: A geocomputational process for characterizing the spatial pattern of lung cancer incidence in New Hampshire. Ann. Assoc. Am. Geogr. 99(3), 521–533 (2009)

  • Shi, X.: Selection of bandwidth type and adjustment side in kernel density estimation over inhomogeneous backgrounds. Int. J. Geogr. Inf. Sci. 24(5), 643–660 (2010)

  • Silverman, B.W.: Density estimation for statistics and data analysis. Chapman and Hall, London, New York (1986)

  • Swift, A., Liu, L., Uber, J.: MAUP sensitivity analysis of ecological bias in health studies. GeoJournal 79, 137–153 (2014)

  • Tiefelsdorf, M.: Modelling spatial processes: the identification and analysis of spatial relationships in regression residuals by means of Moran’s I. PhD dissertation, Wilfrid Laurier University (1998)

  • U.S. Environmental Protection Agency, US EPA.: Particulate Matter (PM2.5) Speciation Guidance—Final Draft; Technical Report (1999)

  • Ward, E., Jemal, A., Cokkinides, V., Singh, G.K., Cardinez, C., Ghafoor, A., Thun, M.: Cancer disparities by race/ethnicity and socioeconomic status. CA Cancer J Clin 54(2), 78–93 (2004)

  • Wikle, C.K.: Modern perspectives on statistics for spatio-temporal data. Wiley Interdiscip. Rev. Comput. Stat. 7(1), 86–98 (2015)

  • Wilson, R.: Using dual kernel density estimation to examine changes in voucher density over time. Cityscape 14(3), 225–234 (2012)

  • Xie, Z., Yan, J.: Kernel density estimation of traffic accidents in a network space. Comput. Environ. Urban Syst. 32, 396–406 (2008)

  • Zusman, M., Dubnov, J., Barchana, M., Portnov, B.A.: Residential proximity to petroleum storage tanks and associated cancer risks: double kernel density approach vs. zonal estimates. Sci. Total Environ. 441, 265–276 (2012)

Author information

Corresponding author

Correspondence to Dani Broitman.

Appendices

Appendix 1: Descriptive Statistics

See Table 3.

Table 3 Descriptive statistics of the research variables

Appendix 2: Normality tests

See Figs. 10 and 11.

Fig. 10 Normality test statistics of unstandardized residuals for different kernel bandwidths: (a) Kolmogorov–Smirnov test, (b) Shapiro–Wilk test

Fig. 11 Normal Q–Q plot of unstandardized residuals for DKD of lung cancer patients (\(R^{2} = 0.79\); 1000-m bandwidth; 1000 reference points)

Cite this article

Zusman, M., Broitman, D. & Portnov, B.A. Application of the double kernel density approach to the multivariate analysis of attributeless event point datasets. Lett Spat Resour Sci 9, 363–382 (2016). https://doi.org/10.1007/s12076-015-0166-y

  • DOI: https://doi.org/10.1007/s12076-015-0166-y
