Differentially private multidimensional data publishing

Al-Hussaeni, Khalil; Fung, Benjamin C. M.; Iqbal, Farkhund; Liu, Junqiang; Hung, Patrick C. K.

doi:10.1007/s10115-017-1132-3

Regular Paper
Published: 24 November 2017

Volume 56, pages 717–752, (2018)
Knowledge and Information Systems

Khalil Al-Hussaeni¹,
Benjamin C. M. Fung² (ORCID: orcid.org/0000-0001-8423-2906),
Farkhund Iqbal³,
Junqiang Liu⁴ &
Patrick C. K. Hung⁵

Abstract

Organizations collect data about individuals for many reasons, such as service improvement. To mine the collected data for useful information, data publishing has become common practice between those organizations and data analysts, research institutes, or simply the general public. The quality of published data significantly affects the accuracy of data analysis and, in turn, decision making at the corporate level. In this study, we explore the research area of privacy-preserving data publishing, i.e., publishing high-quality data without compromising the privacy of the individuals whose data are being published. Syntactic privacy models, such as k-anonymity, impose syntactic privacy requirements and make certain assumptions about an adversary’s background knowledge. To address this shortcoming, we adopt differential privacy, a rigorous privacy model that is independent of any adversary’s knowledge and insensitive to the underlying data. The published data should preserve individuals’ privacy yet remain useful for analysis. To maintain data utility, we propose DiffMulti, a workload-aware and differentially private algorithm that employs multidimensional generalization. We devise an efficient implementation of the proposed algorithm and use a real-life data set for experimental analysis. We evaluate the performance of our method in terms of data utility, efficiency, and scalability. Compared to closely related existing methods, DiffMulti significantly improves data utility, in some cases by orders of magnitude.

Notes

https://www.popdata.bc.ca/.
Unless it is chosen randomly, determining a fixed generalization function \(\phi\) is a non-trivial task. The domain of \(\phi\) is as large as the cardinality of the input data set. Moreover, the codomain of \(\phi\) is a set of \(d\)-dimensional regions, each bounded in every dimension by either an interval or a value from the generalization hierarchy. Our proposed algorithm effectively partitions the regions to maintain data utility.
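The role of a generalization function can be illustrated with a minimal sketch. It is not the DiffMulti algorithm: it assumes a fixed, data-independent partition (DiffMulti partitions in a workload-aware way) and releases a noisy count for every region using the standard Laplace mechanism. The attribute names, the hierarchy `JOB_HIERARCHY`, the interval width `AGE_WIDTH`, and all function names are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter

# Hypothetical generalization hierarchy for a categorical attribute.
JOB_HIERARCHY = {
    "engineer": "professional",
    "lawyer": "professional",
    "welder": "manual",
    "painter": "manual",
}
AGE_WIDTH = 20   # hypothetical interval width for the numerical attribute
AGE_MAX = 100    # hypothetical upper bound of the age domain (exclusive)


def phi(record):
    """Map an (age, job) record to a 2-dimensional region: the age is
    bounded by an interval, the job by a value from the hierarchy."""
    age, job = record
    low = (age // AGE_WIDTH) * AGE_WIDTH
    return ((low, low + AGE_WIDTH), JOB_HIERARCHY[job])


def all_regions():
    """Enumerate every region of the fixed partition, including regions
    that hold no records, so the output shape does not depend on the data."""
    ages = [(lo, lo + AGE_WIDTH) for lo in range(0, AGE_MAX, AGE_WIDTH)]
    jobs = sorted(set(JOB_HIERARCHY.values()))
    return [(a, j) for a in ages for j in jobs]


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two independent
    exponential variables, each with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_region_counts(records, epsilon, rng):
    """Release one noisy count per region. Adding or removing a single
    record changes exactly one count by 1, so the L1 sensitivity is 1
    and the Laplace scale is 1 / epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_counts = Counter(phi(r) for r in records)
    scale = 1.0 / epsilon
    return {
        region: true_counts.get(region, 0) + laplace_noise(scale, rng)
        for region in all_regions()
    }


if __name__ == "__main__":
    data = [(37, "lawyer"), (25, "engineer"), (61, "welder")]
    for region, count in noisy_region_counts(data, 1.0, random.Random()).items():
        print(region, round(count, 2))
```

A smaller `epsilon` gives stronger privacy and noisier counts. Because empty regions also receive noise, a fine partition spreads the same data over many noisy cells, which is one reason the choice of partition matters for utility.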

Acknowledgements

The research is supported in part by the Discovery Grants (356065-2013) from the Natural Sciences and Engineering Research Council of Canada (NSERC), Canada Research Chairs Program (950-230623), Research Incentive Funds (R15046 and R15048) from Zayed University, Research Grants (61272306) from the National Natural Science Foundation of China (NSFC), and Research Grants (LY17F020004) from the Zhejiang Natural Science Foundation of China (ZJNSF). The work was partially completed while Benjamin C. M. Fung was visiting the Department of Computer Science at Hong Kong Baptist University.

Author information

Authors and Affiliations

Department of Electrical and Computer Engineering, Concordia University, Montreal, QC, Canada
Khalil Al-Hussaeni
School of Information Studies, McGill University, Montreal, QC, Canada
Benjamin C. M. Fung
College of Technological Innovation, Zayed University, Abu Dhabi, UAE
Farkhund Iqbal
School of Information and Electronic Engineering, Zhejiang Gongshang University, Hangzhou, China
Junqiang Liu
Faculty of Business and Information Technology, University of Ontario Institute of Technology, Oshawa, ON, Canada
Patrick C. K. Hung

Corresponding author

Correspondence to Benjamin C. M. Fung.

About this article

Cite this article

Al-Hussaeni, K., Fung, B.C.M., Iqbal, F. et al. Differentially private multidimensional data publishing. Knowl Inf Syst 56, 717–752 (2018). https://doi.org/10.1007/s10115-017-1132-3

Received: 14 February 2016
Revised: 24 July 2017
Accepted: 08 November 2017
Published: 24 November 2017
Issue Date: September 2018
DOI: https://doi.org/10.1007/s10115-017-1132-3
