Privacy-Preserving Data Mining: A Survey

Aggarwal, Charu C.; Yu, Philip S.

doi:10.1007/978-0-387-48533-1_18

Charu C. Aggarwal³ &
Philip S. Yu³

1982 Accesses
14 Citations

Summary

In recent years, privacy-preserving data mining has been studied extensively, because of the wide proliferation of sensitive information on the internet. A number of algorithmic techniques have been designed for privacy-preserving data mining. In this paper, we provide a review of the state-of-the-art methods for privacy. We discuss methods for randomization, k-anonymization, and distributed privacy-preserving data mining. We also discuss cases in which the output of data mining applications needs to be sanitized for privacy-preservation purposes. We discuss the computational and theoretical limits associated with privacy-preservation over high dimensional data sets.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 89.00; Price excludes VAT (USA)

Softcover Book: USD 119.99; Price excludes VAT (USA)

Hardcover Book: USD 169.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

Adam N., Wortmann J. C.: Security-Control Methods for Statistical Databases: A Comparison Study. ACM Computing Surveys, 21(4), 1989.
Article Google Scholar
Agrawal R., Srikant R. Privacy-Preserving Data Mining. Proceedings of the ACM SIGMOD Conference, 2000.
Google Scholar
Agrawal R., Srikant R., Thomas D. Privacy-Preserving OLAP. Proceedings of the ACM SIGMOD Conference, 2005.
Google Scholar
Agrawal R., Bayardo R., Faloutsos C., Kiernan J., Rantzau R., Srikant R.: Auditing Compliance via a hippocratic database. VLDB Conference, 2004.
Google Scholar
Agrawal D. Aggarwal C. C. On the Design and Quantification of Privacy-Preserving Data Mining Algorithms. ACM PODS Conference, 2002.
Google Scholar
Aggarwal C., Pei J., Zhang B. A Framework for Privacy Preservation against Adversarial Data Mining. ACM KDD Conference, 2006.
Google Scholar
Aggarwal C. C. On k-anonymity and the curse of dimensionality. VLDB Conference, 2005.
Google Scholar
Aggarwal C. C., Yu P. S.: A Condensation approach to privacy preserving data mining. EDBT Conference, 2004.
Google Scholar
Aggarwal C. C., Yu P. S.: On Variable Constraints in Privacy-Preserving Data Mining. SIAM Conference, 2005.
Google Scholar
Aggarwal C. C.: On Randomization, Public Information and the Curse of Dimensionality. ICDE Conference, 2007.
Google Scholar
Aggarwal G., Feder T., Kenthapadi K., Motwani R., Panigrahy R., Thomas D., Zhu A.: Anonymizing Tables. ICDT Conference, 2005.
Google Scholar
Aggarwal G., Feder T., Kenthapadi K., Motwani R., Panigrahy R., Thomas D., Zhu A.: Approximation Algorithms for k-anonymity. Journal of Privacy Technology, paper 20051120001, 2005.
Google Scholar
Aggarwal G., Feder T., Kenthapadi K., Khuller S., Motwani R., Panigrahy R., Thomas D., Zhu A.: Achieving Anonymity via Clustering. ACM PODS Conference, 2006.
Google Scholar
Atallah, M., Elmagarmid, A., Ibrahim, M., Bertino, E., Verykios, V.: Disclosure limitation of sensitive rules, Workshop on Knowledge and Data Engineering Exchange, 1999.
Google Scholar
Bawa M., Bayardo R. J., Agrawal R.: Privacy-Preserving Indexing of Documents on the Network. VLDB Conference, 2003.
Google Scholar
Bayardo R. J., Agrawal R.: Data Privacy through Optimal k-Anonymization. Proceedings of the ICDE Conference, pp. 217–228, 2005.
Google Scholar
Bettini C., Wang X. S., Jajodia S.: Protecting Privacy against Location Based Personal Identification. Proc. of Secure Data Management Workshop, Trondheim, Norway, 2005.
Google Scholar
Biskup J., Bonatti P.: Controlled Query Evaluation for Known Policies by Combining Lying and Refusal. Annals of Mathematics and Artificial Intelligence, 40(1-2), 2004.
Article MathSciNet Google Scholar
Blum A., Dwork C., McSherry F., Nissim K.: Practical Privacy: The SuLQ Framework. ACM PODS Conference, 2005.
Google Scholar
Chang L., Moskowitz I.: An integrated framwork for database inference and privacy protection. Data and Applications Security. Kluwer, 2000.
Google Scholar
Chang L., Moskowitz I.: Parsimonious downgrading and decision trees applied to the inference problem. New Security Paradigms Workshop, 1998.
Google Scholar
Chaum D., Crepeau C., Damgard I.: Multiparty unconditionally secure protocols. ACM STOC Conference, 1988.
Google Scholar
Chawla S., Dwork C., McSherry F., Smith A., Wee H.: Towards Privacy in Public Databases, TCC, 2005.
Google Scholar
Chawla S., Dwork C., McSherry F., Talwar K.: On the Utility of Privacy-Preserving Histograms, UAI, 2005.
Google Scholar
Chen K., Liu L.: Privacy-preserving data classification with rotation perturbation. ICDM Conference, 2005.
Google Scholar
Chin F.: Security Problems on Inference Control for SUM, MAX, and MIN Queries. J. of the ACM, 33(3), 1986.
Google Scholar
Chin F., Ozsoyoglu G.: Auditing for Secure Statistical Databases. Proceedings of the ACM’81 Conference, 1981.
Google Scholar
Ciriani V., De Capitiani di Vimercati S., Foresti S., Samarati P.: k-Anonymity. Security in Decentralized Data Management, ed. Jajodia S., Yu T., Springer, 2006.
Google Scholar
Clifton C., Kantarcioglou M., Lin X., Zhu M.: Tools for privacy-preserving distributed data mining. ACM SIGKDD Explorations, 4(2), 2002.
Google Scholar
Clifton C., Marks D.: Security and Privacy Implications of Data Mining., Workshop on Data Mining and Knowledge Discovery, 1996.
Google Scholar
Dasseni E., Verykios V., Elmagarmid A., Bertino E.: Hiding Association Rules using Confidence and Support, 4th Information Hiding Workshop, 2001.
Google Scholar
Denning D.: Secure Statostical Databases with Random Sample Queries. ACM TODS Journal, 5(3), 1980.
Google Scholar
Dinur I., Nissim K.: Revealing Information while preserving privacy. ACM PODS Conference, 2003.
Google Scholar
Dobkin D., Jones A., Lipton R.: Secure Databases: Protection against User Influence. ACM Transactions on Databases Systems, 4(1), 1979.
Article Google Scholar
Domingo-Ferrer J, Mateo-Sanz J.: Practical data-oriented micro-aggregation for statistical disclosure control. IEEE TKDE, 14(1), 2002.
Google Scholar
Du W., Atallah M.: Secure Multi-party Computation: A Review and Open Problems.CERIAS Tech. Report 2001-51, Purdue University, 2001.
Google Scholar
Du W., Han Y. S., Chen S.: Privacy-Preserving Multivariate Statistical Analysis: Linear Regression and Classification, Proc. SIAM Conf. Data Mining, 2004.
Google Scholar
Du W., Atallah M.: Privacy-Preserving Cooperative Statistical Analysis, 17th Annual Computer Security Applications Conference, 2001.
Google Scholar
Dwork C., Nissim K.: Privacy-Preserving Data Mining on Vertically Partitioned Databases, CRYPTO, 2004.
Google Scholar
Dwork C., Kenthapadi K., McSherry F., Mironov I., Naor M.: Our Data, Ourselves: Privacy via Distributed Noise Generation. EUROCRYPT, 2006.
Google Scholar
Dwork C., McSherry F., Nissim K., Smith A.: Calibrating Noise to Sensitivity in Private Data Analysis, TCC, 2006.
Google Scholar
Even S., Goldreich O., Lempel A.: A Randomized Protocol for Signing Contracts. Communications of the ACM, vol 28, 1985.
Google Scholar
Evfimievski A., Gehrke J., Srikant R. Limiting Privacy Breaches in Privacy Preserving Data Mining. ACM PODS Conference, 2003.
Google Scholar
Evfimievski A., Srikant R., Agrawal R., Gehrke J.: Privacy-Preserving Mining of Association Rules. ACM KDD Conference, 2002.
Google Scholar
Fienberg S., McIntyre J.: Data Swapping: Variations on a Theme by Dalenius and Reiss. Technical Report, National Institute of Statistical Sciences, 2003.
Google Scholar
Fung B., Wang K., Yu P.: Top-Down Specialization for Information and Privacy Preservation. ICDE Conference, 2005.
Google Scholar
Gambs S., Kegl B., Aimeur E.: Privacy-Preserving Boosting. Knowledge Discovery and Data Mining Journal, to appear.
Google Scholar
Gedik B., Liu L.: A customizable k-anonymity model for protecting location privacy, ICDCS Conference, 2005.
Google Scholar
Goldreich O.: Secure Multi-Party Computation, Unpublished Manuscript, 2002.
Google Scholar
Huang Z., Du W., Chen B.: Deriving Private Information from Randomized Data. pp. 37–48, ACM SIGMOD Conference, 2005.
Google Scholar
Hore B., Mehrotra S., Tsudik B.: A Privacy-Preserving Index for Range Queries. VLDB Conference, 2004.
Google Scholar
Hughes D, Shmatikov V.: Information Hiding, Anonymity, and Privacy: A modular Approach. Journal of Computer Security, 12(1), 3–36, 2004.
Google Scholar
Inan A., Saygin Y., Savas E., Hintoglu A., Levi A.: Privacy-Preserving Clustering on Horizontally Partitioned Data. Data Engineering Workshops, 2006.
Google Scholar
Ioannidis I., Grama A., Atallah M.: A secure protocol for computing dot products in clustered and distributed environments, International Conference on Parallel Processing, 2002.
Google Scholar
Iyengar V. S.: Transforming Data to Satisfy Privacy Constraints. KDD Conference, 2002.
Google Scholar
Jiang W., Clifton C.: Privacy-preserving distributed k-Anonymity. Proceedings of the IFIP 11.3 Working Conference on Data and Applications Security, 2005.
Google Scholar
Johnson W., Lindenstrauss J.: Extensions of Lipshitz Mapping into Hilbert Space, Contemporary Math. vol. 26, pp. 189-206, 1984.
MATH MathSciNet Google Scholar
Jagannathan G., Wright R.: Privacy-Preserving Distributed k-means clustering over arbitrarily partitioned data. ACM KDD Conference, 2005.
Google Scholar
Jagannathan G., Pillaipakkamnatt K., Wright R.: A New Privacy-Preserving Distributed k-Clustering Algorithm. SIAM Conference on Data Mining, 2006.
Google Scholar
Kantarcioglu M., Clifton C.: Privacy-Preserving Distributed Mining of Association Rules on Horizontally Partitioned Data. IEEE TKDE Journal, 16(9), 2004.
Google Scholar
Kantarcioglu M., Vaidya J.: Privacy-Preserving Naive Bayes Classifier for Horizontally Partitioned Data. IEEE Workshop on Privacy-Preserving Data Mining, 2003.
Google Scholar
Kargupta H., Datta S., Wang Q., Sivakumar K.: On the Privacy Preserving Properties of Random Data Perturbation Techniques. ICDM Conference, pp. 99-106, 2003.
Google Scholar
Karn J., Ullman J.: A model of statistical databases and their security. ACM Transactions on Database Systems, 2(1):1–10, 1977.
Article Google Scholar
Kenthapadi K., Mishra N., Nissim K.: Simulatable Auditing, ACM PODS Conference, 2005.
Google Scholar
Kifer D., Gehrke J.: Injecting utility into anonymized datasets. SIGMOD Conference, pp. 217-228, 2006.
Google Scholar
Kim J., Winkler W.: Multiplicative Noise for Masking Continuous Data, Technical Report Statistics 2003-01, Statistical Research Division, US Bureau of the Census, Washington D.C., Apr. 2003.
Google Scholar
Kleinberg J., Papadimitriou C., Raghavan P.: Auditing Boolean Attributes. Journal of Computer and System Sciences, 6, 2003.
Google Scholar
Lakshmanan L., Ng R., Ramesh G. To Do or Not To Do: The Dilemma of Disclosing Anonymized Data. ACM SIGMOD Conference, 2005.
Google Scholar
Liew C. K., Choi U. J., Liew C. J. A data distortion by probability distribution. ACM TODS, 10(3):395-411, 1985.
Article MATH Google Scholar
LeFevre K., DeWitt D., Ramakrishnan R.: Incognito: Full Domain K-Anonymity. ACM SIGMOD Conference, 2005.
Google Scholar
LeFevre K., DeWitt D., Ramakrishnan R.: Mondrian Multidimensional K-Anonymity. ICDE Conference, 25, 2006.
Google Scholar
LeFevre K., DeWitt D., Ramakrishnan R.: Workload Aware Anonymization. KDD Conference, 2006.
Google Scholar
Li F., Sun J., Papadimitriou S. Mihaila G., Stanoi I.: Hiding in the Crowd: Privacy Preservation on Evolving Streams through Correlation Tracking. ICDE Conference, 2007.
Google Scholar
Lindell Y., Pinkas B.: Privacy-Preserving Data Mining. CRYPTO, 2000.
Google Scholar
Liu K., Kargupta H., Ryan J.: Random Projection Based Multiplicative Data Perturbation for Privacy Preserving Distributed Data Mining. IEEE Transactions on Knowledge and Data Engineering, 18(1), 2006.
Google Scholar
Liu K., Giannella C. Kargupta H.: An Attacker’s View of Distance Preserving Maps for Privacy-Preserving Data Mining. PKDD Conference, 2006.
Google Scholar
Machanavajjhala A., Gehrke J., Kifer D., and Venkitasubramaniam M.: l-Diversity: Privacy Beyond k-Anonymity. ICDE, 2006.
Google Scholar
Malin B, Sweeney L. Re-identification of DNA through an automated linkage process. American Medical Informatics Association, 423–427, 2001.
Google Scholar
Malin B. Why methods for genomic data privacy fail and what we can do to fix it, AAAS Annual Meeting, Seattle, WA, 2004.
Google Scholar
Meyerson A., Williams R. On the complexity of optimal k-anonymity. ACM PODS Conference, 2004.
Google Scholar
Mishra N., Sandler M.: Privacy vis Pseudorandom Sketches. ACM PODS Conference, 2006.
Google Scholar
Mukherjee S., Chen Z., Gangopadhyay S.: A privacy-preserving technique for Euclidean distance-based mining algorithms using Fourier based transforms, VLDB Journal, 2006.
Google Scholar
Moskowitz I., Chang L.: A decision theoretic system for information downgrading. Joint Conference on Information Sciences, 2000.
Google Scholar
Nabar S., Marthi B., Kenthapadi K., Mishra N., Motwani R.: Towards Robustness in Query Auditing. VLDB Conference, 2006.
Google Scholar
Naor M., Pinkas B.: Efficient Oblivious Transfer Protocols, SODA Conference, 2001.
Google Scholar
Natwichai J., Li X., Orlowska M.: A Reconstruction-based Algorithm for Classification Rules Hiding. Australasian Database Conference, 2006.
Google Scholar
Oliveira S. R. M., Zaane O.: Privacy Preserving Clustering by Data Transformation, Proc. 18th Brazilian Symp. Databases, pp. 304-318, Oct. 2003.
Google Scholar
Oliveira S. R. M., Zaiane O.: Data Perturbation by Rotation for Privacy-Preserving Clustering, Technical Report TR04-17, Department of Computing Science, University of Alberta, Edmonton, AB, Canada, August 2004.
Google Scholar
Oliveira S. R. M., Zaiane O., Saygin Y.: Secure Association-Rule Sharing. PAKDD Conference, 2004.
Google Scholar
Pinkas B.: Cryptographic Techniques for Privacy-Preserving Data Mining. ACM SIGKDD Explorations, 4(2), 2002.
Article Google Scholar
Polat H., Du W.: SVD-based collaborative filtering with privacy. ACM SAC Symposium, 2005.
Google Scholar
Polat H., Du W.: Privacy-Preserving Top-N Recommendations on Horizontally Partitioned Data. Web Intelligence, 2005.
Google Scholar
Rabin M. O.: How to exchange secrets by oblivious transfer, Technical Report TR-81, Aiken Corporation Laboratory, 1981.
Google Scholar
Reiss S.: Security in Databases: A combinatorial Study, Journal of ACM, 26(1), 1979.
Article MathSciNet Google Scholar
Rizvi S., Haritsa J.: Maintaining Data Privacy in Association Rule Mining. VLDB Conference, 2002.
Google Scholar
Saygin Y., Verykios V., Clifton C.: Using Unknowns to prevent discovery of Association Rules, ACM SIGMOD Record, 30(4), 2001.
Article Google Scholar
Saygin Y., Verykios V., Elmagarmid A.: Privacy-Preserving Association Rule Mining, 12th International Workshop on Research Issues in Data Engineering, 2002.
Google Scholar
Samarati P.: Protecting Respondents’ Identities in Microdata Release. IEEE Trans. Knowl. Data Eng. 13(6): 1010-1027 (2001).
Article Google Scholar
Shannon C. E.: The Mathematical Theory of Communication, University of Illinois Press, 1949.
Google Scholar
Silverman B. W.: Density Estimation for Statistics and Data Analysis. Chapman and Hall, 1986.
Google Scholar
Vaidya J., Clifton C.: Privacy-Preserving Association Rule Mining in Vertically Partitioned Databases. ACM KDD Conference, 2002.
Google Scholar
Vaidya J., Clifton C.: Privacy-Preserving k-means clustering over vertically partitioned Data. ACM KDD Conference, 2003.
Google Scholar
Vaidya J., Clifton C.: Privacy-Preserving Naive Bayes Classifier over vertically partitioned data. SIAM Conference, 2004.
Google Scholar
Vaidya J., Clifton C.: Privacy-Preserving Decision Trees over vertically partitioned data. Lecture Notes in Computer Science, Vol 3654, 2005.
Google Scholar
Verykios V. S., Bertino E., Fovino I. N., Provenza L. P., Saygin Y., Theodoridis Y.: State-of-the-art in privacy preserving data mining. ACM SIGMOD Record, v.33 n.1, 2004.
Google Scholar
Verykios V. S., Elmagarmid A., Bertino E., Saygin Y.,, Dasseni E.: Association Rule Hiding. IEEE Transactions on Knowledge and Data Engineering, 16(4), 2004.
Article Google Scholar
Wang K., Yu P., Chakraborty S.: Bottom-Up Generalization: A Data Mining Solution to Privacy Protection. ICDM Conference, 2004.
Google Scholar
Wang K., Fung B. C. M., Yu P. Template based Privacy -Preservation in classification problems. ICDM Conference, 2005.
Google Scholar
Wang K., Fung B. C. M.: Anonymization for Sequential Releases. ACM KDD Conference, 2006.
Google Scholar
Wang K., Fung B. C. M., Dong G.: Integarting Private Databases for Data Analysis. Lecture Notes in Computer Science, 3495, 2005.
Google Scholar
Warner S. L. Randomized Response: A survey technique for eliminating evasive answer bias. Journal of American Statistical Association, 60(309):63–69, March 1965.
Google Scholar
Winkler W.: Using simulated annealing for k-anonymity. Technical Report 7, US Census Bureau.
Google Scholar
Wu Y.-H., Chiang C.-M., Chen A. L. P.: Hiding Sensitive Association Rules with Limited Side Effects. IEEE Transactions on Knowledge and Data Engineering, 19(1), 2007.
Article Google Scholar
Xiao X., Tao Y.. Personalized Privacy Preservation. ACM SIGMOD Conference, 2006.
Google Scholar
Xiao X., Tao Y. Anatomy: Simple and Effective Privacy Preservation. VLDB Conference, pp. 139-150, 2006.
Google Scholar
Xu J., Wang W., Pei J., Wang X., Shi B., Fu A. W. C.: Utility Based Anonymization using Local Recoding. ACM KDD Conference, 2006.
Google Scholar
Xu S., Yung M.: k-anonymous secret handshakes with reusable credentials. ACM Conference on Computer and Communications Security, 2004.
Google Scholar
Yao A. C.: How to Generate and Exchange Secrets. FOCS Conferemce, 1986.
Google Scholar
Yao G., Feng D.: A new k-anonymous message transmission protocol. International Workshop on Information Security Applications, 2004.
Google Scholar
Yang Z., Zhong S., Wright R.: Privacy-Preserving Classification of Customer Data without Loss of Accuracy. SDM Conference, 2006.
Google Scholar
Yao C., Wang S., Jajodia S.: Checking for k-Anonymity Violation by views. ACM Conference on Computer and Communication Security, 2004.
Google Scholar
Yu H., Jiang X., Vaidya J.: Privacy-Preserving SVM using nonlinear Kernels on Horizontally Partitioned Data. SAC Conference, 2006.
Google Scholar
Yu H., Vaidya J., Jiang X.: Privacy-Preserving SVM Classification on Vertically Partitioned Data. PAKDD Conference, 2006.
Google Scholar
Zhang P., Tong Y., Tang S., Yang D.: Privacy-Preserving Naive Bayes Classifier. Lecture Notes in Computer Science, Vol 3584, 2005.
Google Scholar
Zhong S., Yang Z., Wright R.: Privacy-enhancing k-anonymization of customer data, In Proceedings of the ACM SIGMOD-SIGACT-SIGART Principles of Database Systems, Baltimore, MD. 2005.
Google Scholar
Zhu Y., Liu L. Optimal Randomization for Privacy- Preserving Data Mining. ACM KDD Conference, 2004.
Google Scholar

Download references

Author information

Authors and Affiliations

IBM T. J. Watson Research Center, 19 Skyline Drive, Hawthorne, NY 10532
Charu C. Aggarwal & Philip S. Yu

Authors

Charu C. Aggarwal
View author publications
You can also search for this author in PubMed Google Scholar
Philip S. Yu
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

Dept. of Computer Science, University of California at Davis, One Shields Avenue, Davis, CA 95616-8562
Michael Gertz
George Mason University Center for Secure Information Systems Research I, Suite 417, Fairfax, VA 22030-4444
Sushil Jajodia

Rights and permissions

Reprints and permissions

Copyright information

About this chapter

Cite this chapter

Aggarwal, C.C., Yu, P.S. (2008). Privacy-Preserving Data Mining: A Survey. In: Gertz, M., Jajodia, S. (eds) Handbook of Database Security. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-48533-1_18

Download citation

DOI: https://doi.org/10.1007/978-0-387-48533-1_18
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-48532-4
Online ISBN: 978-0-387-48533-1
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics