Differential Privacy of Hierarchical Census Data: An Optimization Approach

  • Ferdinando Fioretto
  • Pascal Van Hentenryck
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11802)

Abstract

This paper is motivated by the needs of a Census Bureau interested in releasing aggregate socio-economic data about a large population without revealing sensitive information. The released information may include the number of individuals living alone, the number of cars they own, or their salary brackets. Recent events have highlighted some of the privacy challenges faced by these organizations. To address them, this paper presents a novel differential-privacy mechanism for releasing hierarchical counts of individuals satisfying a given property. The counts are reported at multiple granularities (e.g., the national, state, and county levels) and must be consistent across levels. The core of the mechanism is an optimization model that redistributes the noise introduced to attain privacy so as to meet the consistency constraints between the hierarchical levels. The key technical contribution of the paper shows that this optimization problem can be solved in polynomial time by exploiting the structure of its cost functions. Experimental results on very large, real datasets show that the proposed mechanism provides improvements of up to two orders of magnitude in computational efficiency and accuracy over other state-of-the-art techniques.
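To illustrate the general idea behind mechanisms of this kind (this is a hedged sketch, not the paper's algorithm), the snippet below adds Laplace noise to counts in a two-level hierarchy and then applies a closed-form least-squares post-processing step so that the parent count again equals the sum of its children, i.e., the consistency constraint mentioned in the abstract. The function names (`laplace_noise`, `private_consistent_counts`) and the simple residual-sharing projection are illustrative assumptions.

```python
import math
import random


def laplace_noise(scale: float) -> float:
    """Sample from Laplace(0, scale) via the inverse CDF."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))


def private_consistent_counts(children, epsilon):
    """Release noisy (parent, children) counts that sum consistently.

    Each count receives independent Laplace(1/epsilon) noise; the
    least-squares projection onto the constraint
    parent == sum(children) has a closed form: spread the residual
    evenly across all k + 1 counts.
    """
    parent = sum(children)
    noisy_parent = parent + laplace_noise(1.0 / epsilon)
    noisy_children = [c + laplace_noise(1.0 / epsilon) for c in children]
    k = len(children)
    residual = (noisy_parent - sum(noisy_children)) / (k + 1)
    consistent_parent = noisy_parent - residual
    consistent_children = [c + residual for c in noisy_children]
    return consistent_parent, consistent_children


random.seed(0)
state, counties = private_consistent_counts([120, 340, 95, 210], epsilon=0.5)
# After post-processing, the state count matches the county total exactly.
assert abs(state - sum(counties)) < 1e-9
```

The paper's actual optimization model handles deeper hierarchies and exploits the structure of the cost functions to stay polynomial-time; this two-level projection is only the simplest special case of that consistency repair.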

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Georgia Institute of Technology, Atlanta, USA