Discovering top-k patterns with differential privacy-an accurate approach

Zhang, Xiaojian; Meng, Xiaofeng

doi:10.1007/s11704-014-3230-7

Research Article
Published: 25 August 2014

Volume 8, pages 816–827, (2014)
Frontiers of Computer Science

Xiaojian Zhang¹,² & Xiaofeng Meng¹

Abstract

Frequent pattern mining discovers sets of items that frequently appear together in a transactional database; these can serve valuable economic and research purposes. However, if the database contains sensitive data (e.g., user behavior records, electronic health records), directly releasing the discovered frequent patterns with support counts will carry significant risk to the privacy of individuals. In this paper, we study the problem of how to accurately find the top-k frequent patterns with noisy support counts on transactional databases while satisfying differential privacy. We propose an algorithm, called differentially private frequent pattern (DFP-Growth), that integrates a Laplace mechanism and an exponential mechanism to avoid privacy leakage. We theoretically prove that the proposed method is (λ, δ)-useful and differentially private. To boost the accuracy of the returned noisy support counts, we take consistency constraints into account to conduct constrained inference in the post-processing step. Extensive experiments, using several real datasets, confirm that our algorithm generates highly accurate noisy support counts and top-k frequent patterns.
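
As a rough illustration of how an exponential mechanism and a Laplace mechanism can be combined for this task, the sketch below selects k patterns with the exponential mechanism and then perturbs their support counts with Laplace noise. It is a generic textbook-style sketch, not the DFP-Growth algorithm of the paper: the function names (`support_counts`, `exp_mech_top_k`, `private_top_k`), the even split of the privacy budget, and the `max_len` cap on pattern length are all assumptions made for the example. For brevity the candidate set is taken from the patterns observed in the data, which a rigorous mechanism would replace with a data-independent candidate universe, and the constrained-inference post-processing step is omitted.

```python
import math
import random
from collections import Counter
from itertools import combinations


def support_counts(transactions, max_len=2):
    """Count how many transactions contain each itemset of size <= max_len."""
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for size in range(1, max_len + 1):
            for pattern in combinations(items, size):
                counts[pattern] += 1
    return counts


def laplace_noise(scale, rng):
    """Laplace(0, scale) sample as the difference of two exponential samples."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def exp_mech_top_k(counts, k, epsilon, rng, sensitivity=1.0):
    """Pick k patterns without replacement; each round spends epsilon / k."""
    remaining = dict(counts)
    eps_round = epsilon / k
    chosen = []
    for _ in range(min(k, len(remaining))):
        patterns = list(remaining)
        top = max(remaining.values())  # shift scores for numerical stability
        weights = [
            math.exp(eps_round * (remaining[p] - top) / (2.0 * sensitivity))
            for p in patterns
        ]
        pick = rng.choices(patterns, weights=weights, k=1)[0]
        chosen.append(pick)
        del remaining[pick]
    return chosen


def private_top_k(transactions, k, epsilon, max_len=2, seed=None):
    """Assumed budget split: half for selection, half for noisy counts."""
    rng = random.Random(seed)
    counts = support_counts(transactions, max_len)
    chosen = exp_mech_top_k(counts, k, epsilon / 2.0, rng)
    # One transaction can change each of the released counts by 1,
    # so the L1 sensitivity of the released vector is len(chosen).
    scale = len(chosen) / (epsilon / 2.0)
    return {p: counts[p] + laplace_noise(scale, rng) for p in chosen}


if __name__ == "__main__":
    tx = [["a", "b"], ["a", "b", "c"], ["a", "c"], ["a"]]
    print(private_top_k(tx, k=2, epsilon=1.0, seed=0))
```

With a small ε the selected patterns and their counts are heavily randomized; as ε grows, the selection concentrates on the truly most frequent patterns and the noisy counts approach the true supports.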

Author information

Authors and Affiliations

School of Information, Renmin University of China, Beijing, 100872, China
Xiaojian Zhang & Xiaofeng Meng
School of Computer and Information Engineering, Henan University of Economics and Law, Zhengzhou, 450002, China
Xiaojian Zhang

Corresponding author

Correspondence to Xiaojian Zhang.

Additional information

Xiaojian Zhang received his MS degree in computer science from Yanshan University in 2007. He is now a PhD student at Renmin University of China. His main research interests include differential privacy and data mining.

Xiaofeng Meng received his PhD degree in computer science from the Institute of Computing Technology, Chinese Academy of Sciences. He is a professor and PhD supervisor at Renmin University of China. His research interests include cloud data management, web data management, native XML databases, flash-based databases, and privacy preservation.

About this article

Cite this article

Zhang, X., Meng, X. Discovering top-k patterns with differential privacy-an accurate approach. Front. Comput. Sci. 8, 816–827 (2014). https://doi.org/10.1007/s11704-014-3230-7

Received: 06 July 2013
Accepted: 06 May 2014
Published: 25 August 2014
Issue Date: October 2014
DOI: https://doi.org/10.1007/s11704-014-3230-7
