, Volume 16, Issue 3, pp 497-521,
Open Access This content is freely available online to anyone, anywhere at any time.

Processing aggregated data: the location of clusters in health data

Abstract

Spatially aggregated data is frequently used in geographical applications. Often spatial data analysis on aggregated data is performed in the same way as on exact data, which ignores the fact that we do not know the actual locations of the data. We here propose models and methods to take aggregation into account. For this we focus on the problem of locating clusters in aggregated data. More specifically, we study the problem of locating clusters in spatially aggregated health data. The data is given as a subdivision into regions with two values per region, the number of cases and the size of the population at risk. We formulate the problem as finding a placement of a cluster window of a given shape such that a cluster function depending on the population at risk and the cases is maximized. We propose area-based models to calculate the cases (and the population at risk) within a cluster window. These models are based on the areas of intersection of the cluster window with the regions of the subdivision. We show how to compute a subdivision such that within each cell of the subdivision the areas of intersection are simple functions. We evaluate experimentally how taking aggregation into account influences the location of the clusters found.

This research was initiated during the GADGET Workshop on Geometric Algorithms and Spatial Data Mining, funded by the Netherlands Organization for Scientific Research (NWO) under BRICKS/FOCUS grant number 642.065.503, and has been also partially funded by NWO under the project GOGO.