Processing aggregated data: the location of clusters in health data
 Kevin Buchin,
 Maike Buchin,
 Marc van Kreveld,
 Maarten Löffler,
 Jun Luo,
 Rodrigo I. Silveira
Abstract
Spatially aggregated data is frequently used in geographical applications. Spatial data analysis on aggregated data is often performed in the same way as on exact data, which ignores the fact that the actual locations of the data are unknown. Here we propose models and methods that take aggregation into account. We focus on the problem of locating clusters in spatially aggregated health data. The data is given as a subdivision into regions with two values per region: the number of cases and the size of the population at risk. We formulate the problem as finding a placement of a cluster window of a given shape such that a cluster function depending on the population at risk and the cases is maximized. We propose area-based models to calculate the cases (and the population at risk) within a cluster window. These models are based on the areas of intersection of the cluster window with the regions of the subdivision. We show how to compute a subdivision such that within each cell of the subdivision the areas of intersection are simple functions. We evaluate experimentally how taking aggregation into account influences the location of the clusters found.
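The area-based idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions that do not come from the paper: regions and the cluster window are axis-aligned rectangles, cases and population are spread uniformly within each region, the cluster function is the excess of estimated cases over the number expected at the global rate, and candidate placements are tried on a regular grid instead of using the subdivision the paper computes. All names (`Rect`, `Region`, `estimate_in_window`, `best_window`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle (an assumption; real regions are polygons)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def area(self) -> float:
        return max(0.0, self.x1 - self.x0) * max(0.0, self.y1 - self.y0)

    def intersection_area(self, other: "Rect") -> float:
        w = min(self.x1, other.x1) - max(self.x0, other.x0)
        h = min(self.y1, other.y1) - max(self.y0, other.y0)
        return max(0.0, w) * max(0.0, h)


@dataclass(frozen=True)
class Region:
    shape: Rect
    cases: float
    population: float


def estimate_in_window(regions, window):
    """Area-weighted estimate of (cases, population) inside the window,
    assuming a uniform distribution within each region."""
    cases = 0.0
    population = 0.0
    for r in regions:
        frac = r.shape.intersection_area(window) / r.shape.area()
        cases += frac * r.cases
        population += frac * r.population
    return cases, population


def best_window(regions, side, step):
    """Grid search over placements of a square window of the given side.
    Score (hypothetical cluster function): estimated cases minus the
    cases expected at the global rate."""
    rate = sum(r.cases for r in regions) / sum(r.population for r in regions)
    xmin = min(r.shape.x0 for r in regions)
    ymin = min(r.shape.y0 for r in regions)
    xmax = max(r.shape.x1 for r in regions)
    ymax = max(r.shape.y1 for r in regions)
    # Integer step counts avoid floating-point drift in the grid positions.
    nx = max(1, int((xmax - side - xmin) / step) + 1)
    ny = max(1, int((ymax - side - ymin) / step) + 1)
    best = None
    for i in range(nx):
        for j in range(ny):
            x, y = xmin + i * step, ymin + j * step
            window = Rect(x, y, x + side, y + side)
            cases, population = estimate_in_window(regions, window)
            score = cases - rate * population
            if best is None or score > best[1]:
                best = (window, score)
    return best


if __name__ == "__main__":
    regions = [
        Region(Rect(0, 0, 2, 2), cases=4, population=100),
        Region(Rect(2, 0, 4, 2), cases=20, population=100),
    ]
    print(estimate_in_window(regions, Rect(1, 0, 3, 2)))
    print(best_window(regions, side=2, step=1))
```

A window that straddles two regions receives a share of each region's cases and population proportional to the overlapped area, which is the point of taking aggregation into account. The grid search only approximates the maximum; an exact method would need the intersection areas as functions of the window position.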
 Open Access
 This content is freely available online to anyone, anywhere at any time.
 Journal

GeoInformatica
Volume 16, Issue 3, pp 497-521
 Cover Date
 2012-07-01
 DOI
 10.1007/s10707-011-0143-6
 Print ISSN
 1384-6175
 Online ISSN
 1573-7624
 Publisher
 Springer US
 Keywords

 Cluster
 Aggregated data
 Algorithm
 Public health
 Authors

 Kevin Buchin ^{(1)}
 Maike Buchin ^{(1)}
 Marc van Kreveld ^{(2)}
 Maarten Löffler ^{(3)}
 Jun Luo ^{(4)}
 Rodrigo I. Silveira ^{(5)}
 Author Affiliations

 1. Department of Mathematics and Computer Science, TU Eindhoven, Eindhoven, The Netherlands
 2. Department of Computer Science, Utrecht University, Utrecht, The Netherlands
 3. Computer Science Department, University of California, Irvine, CA, USA
 4. Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Beijing, China
 5. Departament de Matemàtica Aplicada II, Universitat Politècnica de Catalunya, Catalunya, Spain