The VLDB Journal, Volume 22, Issue 6, pp 797–822

Differentially private histogram publication

  • Jia Xu
  • Zhenjie Zhang
  • Xiaokui Xiao
  • Yin Yang
  • Ge Yu
  • Marianne Winslett
Regular Paper

Abstract

Differential privacy (DP) is a promising scheme for releasing the results of statistical queries on sensitive data, with strong privacy guarantees against adversaries with arbitrary background knowledge. Existing studies on differential privacy mostly focus on simple aggregations such as counts. This paper investigates the publication of DP-compliant histograms, which are an important analytical tool for showing the distribution of a random variable, e.g., hospital bill size for certain patients. Compared to simple aggregations whose results are purely numerical, a histogram query is inherently more complex, since it must also determine its structure, i.e., the ranges of the bins. As we demonstrate in the paper, a DP-compliant histogram with finer bins may actually lead to significantly lower accuracy than a coarser one, since the former requires stronger perturbations in order to satisfy DP. Moreover, the histogram structure itself may reveal sensitive information, which further complicates the problem. Motivated by this, we propose two novel mechanisms, namely NoiseFirst and StructureFirst, for computing DP-compliant histograms. Their main difference lies in the relative order of the noise injection and the histogram structure computation steps. NoiseFirst has the additional benefit that it can improve the accuracy of an already published DP-compliant histogram computed using a naive method. For each of the proposed mechanisms, we design algorithms for computing the optimal histogram structure with two different objectives: minimizing the mean square error and minimizing the mean absolute error. Going one step further, we extend both mechanisms to answer arbitrary range queries. Extensive experiments, using several real datasets, confirm that our two proposals output highly accurate query answers and consistently outperform existing competitors.
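The trade-off between bin granularity and noise described in the abstract can be illustrated with a minimal Python sketch. It is not the NoiseFirst or StructureFirst algorithm. It assumes a fixed, data-independent structure that merges `width` adjacent unit bins, Laplace noise of scale 1/ε on each merged count (sensitivity 1, i.e., neighboring datasets differ by adding or removing one record), and an even spread of each noisy count over its unit bins. The names `laplace_noise`, `noisy_histogram`, and `expected_mse` are hypothetical and introduced only for this illustration.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_histogram(counts, width, epsilon, rng):
    """Merge `width` adjacent unit bins, add Laplace(1/epsilon) noise to each
    merged count, and spread the noisy count evenly over its unit bins.
    Assumption: the structure (width) is fixed independently of the data."""
    estimate = []
    for start in range(0, len(counts), width):
        group = counts[start:start + width]
        noisy_sum = sum(group) + laplace_noise(1.0 / epsilon, rng)
        estimate.extend([noisy_sum / len(group)] * len(group))
    return estimate


def expected_mse(counts, width, epsilon):
    """Analytic expected squared error per unit bin for the scheme above."""
    total = 0.0
    for start in range(0, len(counts), width):
        group = counts[start:start + width]
        mean = sum(group) / len(group)
        # Approximation error introduced by merging the bins.
        approx = sum((c - mean) ** 2 for c in group)
        # Laplace variance is 2 / epsilon^2; each of the w unit bins receives
        # noise / w, so the summed noise error over the group is 2 / (epsilon^2 * w).
        noise = 2.0 / (epsilon ** 2) / len(group)
        total += approx + noise
    return total / len(counts)


if __name__ == "__main__":
    smooth = [100, 102, 98, 101, 99, 103, 97, 100]
    for w in (1, 2, 4, 8):
        print(w, expected_mse(smooth, w, epsilon=0.1))
    print(noisy_histogram(smooth, 4, 0.1, random.Random(0)))
```

Under these assumptions, merging bins helps when neighboring counts are similar, because the noise term shrinks with the bin width while the approximation term stays small. It hurts when neighboring counts differ sharply. Choosing the structure from the data itself would consume part of the privacy budget, which this sketch does not model.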

Keywords

Differential privacy, Database query processing, Histogram

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Jia Xu (1)
  • Zhenjie Zhang (2)
  • Xiaokui Xiao (3)
  • Yin Yang (2)
  • Ge Yu (1)
  • Marianne Winslett (4)

  1. College of Information Science and Engineering, Northeastern University, Shenyang, China
  2. Advanced Digital Sciences Center, Illinois at Singapore Pte. Ltd, Singapore, Singapore
  3. School of Computer Engineering, Nanyang Technological University, Singapore, Singapore
  4. Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana-Champaign, USA
