Spatial Variation of Privacy Measured Through Individual Uniqueness Based on Simple US Demographics Data
Previous studies reveal that, using U.S. census data alone, over 60% of the population could be uniquely identified by the combination (gender, ZIP code, date of birth) in 1990 and 2000. This paper extends these studies to examine the spatial variation of uniqueness in 2010. We provide (1) a comparison of national-level uniqueness between 2000 and 2010, and (2) an investigation of the spatial variation of uniqueness in different regions and at different scales. The comparison between 2000 and 2010 reveals that, although overall uniqueness changed little, the uniqueness of the middle-aged group decreased significantly. The spatial analysis shows that age-group uniqueness has similar characteristics across regions. Finally, the analysis at different scales shows that overall uniqueness decreases, and the differences in uniqueness among age groups narrow, as the geographic scale focuses on the cores of urban areas. This study contributes to geographic information privacy and is particularly relevant to reverse geocoding and related spatial aggregation techniques.
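To make the notion of uniqueness concrete, the sketch below estimates the share of people who are alone in their (gender, ZIP code, date of birth) combination from aggregate counts. It is an illustration only and not necessarily the estimation procedure used in this paper: it assumes that census counts are available per (ZIP code, gender, birth year) cell and that birthdays are spread uniformly over 365 days within each cell. The function names and the sample counts are hypothetical.

```python
def expected_unique(n: int, bins: int = 365) -> float:
    """Expected number of people who are alone in their bin when n people
    fall independently and uniformly into `bins` bins (assumed model)."""
    if n <= 0:
        return 0.0
    # A given person is unique if none of the other n - 1 people share the bin.
    return n * (1.0 - 1.0 / bins) ** (n - 1)


def uniqueness_fraction(cell_counts, bins: int = 365) -> float:
    """Estimated share of the population that is unique, given the number of
    people in each (ZIP code, gender, birth year) cell."""
    total = sum(cell_counts)
    if total == 0:
        return 0.0
    return sum(expected_unique(n, bins) for n in cell_counts) / total


# Hypothetical cell sizes: sparsely populated ZIP codes vs. a dense urban core.
sparse_cells = [5, 12, 30, 40]
dense_cells = [800, 1200, 2000]

print(f"sparse: {uniqueness_fraction(sparse_cells):.3f}")
print(f"dense:  {uniqueness_fraction(dense_cells):.3f}")
```

Under these assumptions, larger cells yield a lower estimated uniqueness, which is in line with the reported decrease in uniqueness toward urban cores.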
Keywords: Spatial statistics, Census, Privacy, Spatial analysis, Accuracy
This research is partially supported by the Summer Research Scholarship of the Department of Geography, Environment and Society, University of Minnesota, Twin Cities, provided through the Abler Foundation.
- Bayardo RJ, Agrawal R (2005) Data privacy through optimal K-anonymization. In: ICDE 2005. Proceedings. 21st international conference on data engineering, pp 217–228
- Department of Labor (2010) Geographic practice cost index values by ZIP code US. Available at http://www.dol.gov/owcp/regs/feeschedule/fee/fee10/fs10gpci. Accessed 1 Feb 2014
- Golle P (2006) Revisiting the uniqueness of simple demographics in the US population. In: Proceedings of the 5th ACM workshop on privacy in electronic society. ACM, New York, pp 77–80
- Minnesota Population Center (2011) National historical geographic information system: version 2.0. https://www.nhgis.org/. Accessed 1 Feb 2014
- Pfitzmann A, Hansen M (2010) A terminology for talking about privacy by data minimization: anonymity, unlinkability, undetectability, unobservability, pseudonymity, and identity management. Available via http://dud.inf.tu-dresden.de/Anon_Terminology.shtml. Accessed 1 Nov 2014
- Sweeney L (2000) Uniqueness of simple demographics in the US population. In: LIDAPWP4. Carnegie Mellon University, laboratory for international data privacy, Pittsburgh, PA
- Sheppard E, McMaster RB (2008) Introduction: scale and geographic inquiry. In: Sheppard E, McMaster RB (eds) Scale and geographic inquiry: nature, society, and method. Wiley, New York
- U.S. Census Bureau (2010) Population distribution in the United States and Puerto Rico [map]. 1:7,500,000. https://www.census.gov/geo/maps-data/maps/2010popdistribution.html. Accessed 14 Sep 2014
- United States Census Bureau (2012) ZIP Code Tabulation Areas (ZCTAs). https://www.census.gov/geo/reference/zctas.html. Accessed 20 Feb 2014
- United States Census Bureau (2013) About data protection and privacy. Available at http://www.census.gov/privacy/. Accessed 1 Feb 2014
- Winkler W (2002) Using simulated annealing for K-anonymity. Research Report 2002–07, US Census Bureau Statistical Research Division