Spatial Variation of Privacy Measured Through Individual Uniqueness Based on Simple US Demographics Data

  • Allen Lin
  • Francis Harvey
Part of the Advances in Geographic Information Science book series (AGIS)


Previous studies have shown that, using U.S. census data alone, over 60% of the population could be uniquely identified by the combination of gender, ZIP code, and date of birth in 1990 and 2000. This paper extends these studies to examine the spatial variation of uniqueness in 2010. We provide (1) a comparison of national-level uniqueness between 2000 and 2010, and (2) an investigation of the spatial variation of uniqueness across different regions and at different scales. The comparison between 2000 and 2010 reveals that, although overall uniqueness changed little, the uniqueness of the middle-age group decreased significantly. The spatial variation analysis shows that similar characteristics in age-group uniqueness exist in different regions. Finally, the analysis at different scales shows that overall uniqueness decreases, and the differences in uniqueness between age groups shrink, as the geographic scale narrows to the cores of urban areas. This study contributes to research on geographic information privacy and is particularly relevant to reverse geocoding and related spatial aggregation techniques.
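To make the notion of uniqueness concrete, the following is a minimal sketch of one common way to estimate it from aggregate counts. It is not necessarily the procedure used in this chapter. It assumes census-style counts of people per cell (ZIP code, gender, year of birth) and assumes that birthdays within a cell are spread uniformly over 365 days, so the expected number of people who share their birthday with nobody else in a cell of size n is n(1 - 1/365)^(n - 1). The function names `expected_unique` and `uniqueness` are hypothetical.

```python
from typing import Iterable


def expected_unique(n: int, bins: int = 365) -> float:
    """Expected number of people alone on their birthday in a cell of size n.

    Assumption (not from the source): birthdays are uniform over `bins` days.
    """
    if n <= 0:
        return 0.0
    # Each person is unique if none of the other n - 1 people share the day.
    return n * (1.0 - 1.0 / bins) ** (n - 1)


def uniqueness(cell_counts: Iterable[int], bins: int = 365) -> float:
    """Estimated share of the population that is uniquely identifiable.

    `cell_counts` holds the number of people in each
    (ZIP code, gender, birth year) cell.
    """
    counts = [c for c in cell_counts if c > 0]
    total = sum(counts)
    if total == 0:
        return 0.0
    return sum(expected_unique(c, bins) for c in counts) / total


# Illustrative, made-up cell sizes: small cells contribute high uniqueness,
# large cells (for example, dense urban ZIP codes) contribute less.
example_cells = [12, 40, 150, 600]
share = uniqueness(example_cells)
```

Under this sketch, larger cells lower the estimate, which is consistent with the abstract's observation that uniqueness decreases toward urban cores.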


Keywords: Spatial statistics, Census, Privacy, Spatial analysis, Accuracy



This research was partially supported by the Summer Research Scholarship of the Department of Geography, Environment and Society, University of Minnesota, Twin Cities, provided through the Abler Foundation.



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Department of Geography, Environment and Society, University of Minnesota, Twin Cities, Minneapolis, USA
  2. Leibniz Institute for Regional Geography and Leipzig University, Leipzig, Germany
