Summary
In recent years, privacy-preserving data mining has been studied extensively, because of the wide proliferation of sensitive information on the internet. A number of algorithmic techniques have been designed for privacy-preserving data mining. In this paper, we provide a review of the state-of-the-art methods for privacy. We discuss methods for randomization, k-anonymization, and distributed privacy-preserving data mining. We also discuss cases in which the output of data mining applications needs to be sanitized for privacy-preservation purposes. We discuss the computational and theoretical limits associated with privacy-preservation over high dimensional data sets.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
Adam N., Wortmann J. C.: Security-Control Methods for Statistical Databases: A Comparison Study. ACM Computing Surveys, 21(4), 1989.
Agrawal R., Srikant R. Privacy-Preserving Data Mining. Proceedings of the ACM SIGMOD Conference, 2000.
Agrawal R., Srikant R., Thomas D. Privacy-Preserving OLAP. Proceedings of the ACM SIGMOD Conference, 2005.
Agrawal R., Bayardo R., Faloutsos C., Kiernan J., Rantzau R., Srikant R.: Auditing Compliance via a hippocratic database. VLDB Conference, 2004.
Agrawal D. Aggarwal C. C. On the Design and Quantification of Privacy-Preserving Data Mining Algorithms. ACM PODS Conference, 2002.
Aggarwal C., Pei J., Zhang B. A Framework for Privacy Preservation against Adversarial Data Mining. ACM KDD Conference, 2006.
Aggarwal C. C. On k-anonymity and the curse of dimensionality. VLDB Conference, 2005.
Aggarwal C. C., Yu P. S.: A Condensation approach to privacy preserving data mining. EDBT Conference, 2004.
Aggarwal C. C., Yu P. S.: On Variable Constraints in Privacy-Preserving Data Mining. SIAM Conference, 2005.
Aggarwal C. C.: On Randomization, Public Information and the Curse of Dimensionality. ICDE Conference, 2007.
Aggarwal G., Feder T., Kenthapadi K., Motwani R., Panigrahy R., Thomas D., Zhu A.: Anonymizing Tables. ICDT Conference, 2005.
Aggarwal G., Feder T., Kenthapadi K., Motwani R., Panigrahy R., Thomas D., Zhu A.: Approximation Algorithms for k-anonymity. Journal of Privacy Technology, paper 20051120001, 2005.
Aggarwal G., Feder T., Kenthapadi K., Khuller S., Motwani R., Panigrahy R., Thomas D., Zhu A.: Achieving Anonymity via Clustering. ACM PODS Conference, 2006.
Atallah, M., Elmagarmid, A., Ibrahim, M., Bertino, E., Verykios, V.: Disclosure limitation of sensitive rules, Workshop on Knowledge and Data Engineering Exchange, 1999.
Bawa M., Bayardo R. J., Agrawal R.: Privacy-Preserving Indexing of Documents on the Network. VLDB Conference, 2003.
Bayardo R. J., Agrawal R.: Data Privacy through Optimal k-Anonymization. Proceedings of the ICDE Conference, pp. 217–228, 2005.
Bettini C., Wang X. S., Jajodia S.: Protecting Privacy against Location Based Personal Identification. Proc. of Secure Data Management Workshop, Trondheim, Norway, 2005.
Biskup J., Bonatti P.: Controlled Query Evaluation for Known Policies by Combining Lying and Refusal. Annals of Mathematics and Artificial Intelligence, 40(1-2), 2004.
Blum A., Dwork C., McSherry F., Nissim K.: Practical Privacy: The SuLQ Framework. ACM PODS Conference, 2005.
Chang L., Moskowitz I.: An integrated framwork for database inference and privacy protection. Data and Applications Security. Kluwer, 2000.
Chang L., Moskowitz I.: Parsimonious downgrading and decision trees applied to the inference problem. New Security Paradigms Workshop, 1998.
Chaum D., Crepeau C., Damgard I.: Multiparty unconditionally secure protocols. ACM STOC Conference, 1988.
Chawla S., Dwork C., McSherry F., Smith A., Wee H.: Towards Privacy in Public Databases, TCC, 2005.
Chawla S., Dwork C., McSherry F., Talwar K.: On the Utility of Privacy-Preserving Histograms, UAI, 2005.
Chen K., Liu L.: Privacy-preserving data classification with rotation perturbation. ICDM Conference, 2005.
Chin F.: Security Problems on Inference Control for SUM, MAX, and MIN Queries. J. of the ACM, 33(3), 1986.
Chin F., Ozsoyoglu G.: Auditing for Secure Statistical Databases. Proceedings of the ACM’81 Conference, 1981.
Ciriani V., De Capitiani di Vimercati S., Foresti S., Samarati P.: k-Anonymity. Security in Decentralized Data Management, ed. Jajodia S., Yu T., Springer, 2006.
Clifton C., Kantarcioglou M., Lin X., Zhu M.: Tools for privacy-preserving distributed data mining. ACM SIGKDD Explorations, 4(2), 2002.
Clifton C., Marks D.: Security and Privacy Implications of Data Mining., Workshop on Data Mining and Knowledge Discovery, 1996.
Dasseni E., Verykios V., Elmagarmid A., Bertino E.: Hiding Association Rules using Confidence and Support, 4th Information Hiding Workshop, 2001.
Denning D.: Secure Statostical Databases with Random Sample Queries. ACM TODS Journal, 5(3), 1980.
Dinur I., Nissim K.: Revealing Information while preserving privacy. ACM PODS Conference, 2003.
Dobkin D., Jones A., Lipton R.: Secure Databases: Protection against User Influence. ACM Transactions on Databases Systems, 4(1), 1979.
Domingo-Ferrer J, Mateo-Sanz J.: Practical data-oriented micro-aggregation for statistical disclosure control. IEEE TKDE, 14(1), 2002.
Du W., Atallah M.: Secure Multi-party Computation: A Review and Open Problems.CERIAS Tech. Report 2001-51, Purdue University, 2001.
Du W., Han Y. S., Chen S.: Privacy-Preserving Multivariate Statistical Analysis: Linear Regression and Classification, Proc. SIAM Conf. Data Mining, 2004.
Du W., Atallah M.: Privacy-Preserving Cooperative Statistical Analysis, 17th Annual Computer Security Applications Conference, 2001.
Dwork C., Nissim K.: Privacy-Preserving Data Mining on Vertically Partitioned Databases, CRYPTO, 2004.
Dwork C., Kenthapadi K., McSherry F., Mironov I., Naor M.: Our Data, Ourselves: Privacy via Distributed Noise Generation. EUROCRYPT, 2006.
Dwork C., McSherry F., Nissim K., Smith A.: Calibrating Noise to Sensitivity in Private Data Analysis, TCC, 2006.
Even S., Goldreich O., Lempel A.: A Randomized Protocol for Signing Contracts. Communications of the ACM, vol 28, 1985.
Evfimievski A., Gehrke J., Srikant R. Limiting Privacy Breaches in Privacy Preserving Data Mining. ACM PODS Conference, 2003.
Evfimievski A., Srikant R., Agrawal R., Gehrke J.: Privacy-Preserving Mining of Association Rules. ACM KDD Conference, 2002.
Fienberg S., McIntyre J.: Data Swapping: Variations on a Theme by Dalenius and Reiss. Technical Report, National Institute of Statistical Sciences, 2003.
Fung B., Wang K., Yu P.: Top-Down Specialization for Information and Privacy Preservation. ICDE Conference, 2005.
Gambs S., Kegl B., Aimeur E.: Privacy-Preserving Boosting. Knowledge Discovery and Data Mining Journal, to appear.
Gedik B., Liu L.: A customizable k-anonymity model for protecting location privacy, ICDCS Conference, 2005.
Goldreich O.: Secure Multi-Party Computation, Unpublished Manuscript, 2002.
Huang Z., Du W., Chen B.: Deriving Private Information from Randomized Data. pp. 37–48, ACM SIGMOD Conference, 2005.
Hore B., Mehrotra S., Tsudik B.: A Privacy-Preserving Index for Range Queries. VLDB Conference, 2004.
Hughes D, Shmatikov V.: Information Hiding, Anonymity, and Privacy: A modular Approach. Journal of Computer Security, 12(1), 3–36, 2004.
Inan A., Saygin Y., Savas E., Hintoglu A., Levi A.: Privacy-Preserving Clustering on Horizontally Partitioned Data. Data Engineering Workshops, 2006.
Ioannidis I., Grama A., Atallah M.: A secure protocol for computing dot products in clustered and distributed environments, International Conference on Parallel Processing, 2002.
Iyengar V. S.: Transforming Data to Satisfy Privacy Constraints. KDD Conference, 2002.
Jiang W., Clifton C.: Privacy-preserving distributed k-Anonymity. Proceedings of the IFIP 11.3 Working Conference on Data and Applications Security, 2005.
Johnson W., Lindenstrauss J.: Extensions of Lipshitz Mapping into Hilbert Space, Contemporary Math. vol. 26, pp. 189-206, 1984.
Jagannathan G., Wright R.: Privacy-Preserving Distributed k-means clustering over arbitrarily partitioned data. ACM KDD Conference, 2005.
Jagannathan G., Pillaipakkamnatt K., Wright R.: A New Privacy-Preserving Distributed k-Clustering Algorithm. SIAM Conference on Data Mining, 2006.
Kantarcioglu M., Clifton C.: Privacy-Preserving Distributed Mining of Association Rules on Horizontally Partitioned Data. IEEE TKDE Journal, 16(9), 2004.
Kantarcioglu M., Vaidya J.: Privacy-Preserving Naive Bayes Classifier for Horizontally Partitioned Data. IEEE Workshop on Privacy-Preserving Data Mining, 2003.
Kargupta H., Datta S., Wang Q., Sivakumar K.: On the Privacy Preserving Properties of Random Data Perturbation Techniques. ICDM Conference, pp. 99-106, 2003.
Karn J., Ullman J.: A model of statistical databases and their security. ACM Transactions on Database Systems, 2(1):1–10, 1977.
Kenthapadi K., Mishra N., Nissim K.: Simulatable Auditing, ACM PODS Conference, 2005.
Kifer D., Gehrke J.: Injecting utility into anonymized datasets. SIGMOD Conference, pp. 217-228, 2006.
Kim J., Winkler W.: Multiplicative Noise for Masking Continuous Data, Technical Report Statistics 2003-01, Statistical Research Division, US Bureau of the Census, Washington D.C., Apr. 2003.
Kleinberg J., Papadimitriou C., Raghavan P.: Auditing Boolean Attributes. Journal of Computer and System Sciences, 6, 2003.
Lakshmanan L., Ng R., Ramesh G. To Do or Not To Do: The Dilemma of Disclosing Anonymized Data. ACM SIGMOD Conference, 2005.
Liew C. K., Choi U. J., Liew C. J. A data distortion by probability distribution. ACM TODS, 10(3):395-411, 1985.
LeFevre K., DeWitt D., Ramakrishnan R.: Incognito: Full Domain K-Anonymity. ACM SIGMOD Conference, 2005.
LeFevre K., DeWitt D., Ramakrishnan R.: Mondrian Multidimensional K-Anonymity. ICDE Conference, 25, 2006.
LeFevre K., DeWitt D., Ramakrishnan R.: Workload Aware Anonymization. KDD Conference, 2006.
Li F., Sun J., Papadimitriou S. Mihaila G., Stanoi I.: Hiding in the Crowd: Privacy Preservation on Evolving Streams through Correlation Tracking. ICDE Conference, 2007.
Lindell Y., Pinkas B.: Privacy-Preserving Data Mining. CRYPTO, 2000.
Liu K., Kargupta H., Ryan J.: Random Projection Based Multiplicative Data Perturbation for Privacy Preserving Distributed Data Mining. IEEE Transactions on Knowledge and Data Engineering, 18(1), 2006.
Liu K., Giannella C. Kargupta H.: An Attacker’s View of Distance Preserving Maps for Privacy-Preserving Data Mining. PKDD Conference, 2006.
Machanavajjhala A., Gehrke J., Kifer D., and Venkitasubramaniam M.: l-Diversity: Privacy Beyond k-Anonymity. ICDE, 2006.
Malin B, Sweeney L. Re-identification of DNA through an automated linkage process. American Medical Informatics Association, 423–427, 2001.
Malin B. Why methods for genomic data privacy fail and what we can do to fix it, AAAS Annual Meeting, Seattle, WA, 2004.
Meyerson A., Williams R. On the complexity of optimal k-anonymity. ACM PODS Conference, 2004.
Mishra N., Sandler M.: Privacy vis Pseudorandom Sketches. ACM PODS Conference, 2006.
Mukherjee S., Chen Z., Gangopadhyay S.: A privacy-preserving technique for Euclidean distance-based mining algorithms using Fourier based transforms, VLDB Journal, 2006.
Moskowitz I., Chang L.: A decision theoretic system for information downgrading. Joint Conference on Information Sciences, 2000.
Nabar S., Marthi B., Kenthapadi K., Mishra N., Motwani R.: Towards Robustness in Query Auditing. VLDB Conference, 2006.
Naor M., Pinkas B.: Efficient Oblivious Transfer Protocols, SODA Conference, 2001.
Natwichai J., Li X., Orlowska M.: A Reconstruction-based Algorithm for Classification Rules Hiding. Australasian Database Conference, 2006.
Oliveira S. R. M., Zaane O.: Privacy Preserving Clustering by Data Transformation, Proc. 18th Brazilian Symp. Databases, pp. 304-318, Oct. 2003.
Oliveira S. R. M., Zaiane O.: Data Perturbation by Rotation for Privacy-Preserving Clustering, Technical Report TR04-17, Department of Computing Science, University of Alberta, Edmonton, AB, Canada, August 2004.
Oliveira S. R. M., Zaiane O., Saygin Y.: Secure Association-Rule Sharing. PAKDD Conference, 2004.
Pinkas B.: Cryptographic Techniques for Privacy-Preserving Data Mining. ACM SIGKDD Explorations, 4(2), 2002.
Polat H., Du W.: SVD-based collaborative filtering with privacy. ACM SAC Symposium, 2005.
Polat H., Du W.: Privacy-Preserving Top-N Recommendations on Horizontally Partitioned Data. Web Intelligence, 2005.
Rabin M. O.: How to exchange secrets by oblivious transfer, Technical Report TR-81, Aiken Corporation Laboratory, 1981.
Reiss S.: Security in Databases: A combinatorial Study, Journal of ACM, 26(1), 1979.
Rizvi S., Haritsa J.: Maintaining Data Privacy in Association Rule Mining. VLDB Conference, 2002.
Saygin Y., Verykios V., Clifton C.: Using Unknowns to prevent discovery of Association Rules, ACM SIGMOD Record, 30(4), 2001.
Saygin Y., Verykios V., Elmagarmid A.: Privacy-Preserving Association Rule Mining, 12th International Workshop on Research Issues in Data Engineering, 2002.
Samarati P.: Protecting Respondents’ Identities in Microdata Release. IEEE Trans. Knowl. Data Eng. 13(6): 1010-1027 (2001).
Shannon C. E.: The Mathematical Theory of Communication, University of Illinois Press, 1949.
Silverman B. W.: Density Estimation for Statistics and Data Analysis. Chapman and Hall, 1986.
Vaidya J., Clifton C.: Privacy-Preserving Association Rule Mining in Vertically Partitioned Databases. ACM KDD Conference, 2002.
Vaidya J., Clifton C.: Privacy-Preserving k-means clustering over vertically partitioned Data. ACM KDD Conference, 2003.
Vaidya J., Clifton C.: Privacy-Preserving Naive Bayes Classifier over vertically partitioned data. SIAM Conference, 2004.
Vaidya J., Clifton C.: Privacy-Preserving Decision Trees over vertically partitioned data. Lecture Notes in Computer Science, Vol 3654, 2005.
Verykios V. S., Bertino E., Fovino I. N., Provenza L. P., Saygin Y., Theodoridis Y.: State-of-the-art in privacy preserving data mining. ACM SIGMOD Record, v.33 n.1, 2004.
Verykios V. S., Elmagarmid A., Bertino E., Saygin Y.,, Dasseni E.: Association Rule Hiding. IEEE Transactions on Knowledge and Data Engineering, 16(4), 2004.
Wang K., Yu P., Chakraborty S.: Bottom-Up Generalization: A Data Mining Solution to Privacy Protection. ICDM Conference, 2004.
Wang K., Fung B. C. M., Yu P. Template based Privacy -Preservation in classification problems. ICDM Conference, 2005.
Wang K., Fung B. C. M.: Anonymization for Sequential Releases. ACM KDD Conference, 2006.
Wang K., Fung B. C. M., Dong G.: Integarting Private Databases for Data Analysis. Lecture Notes in Computer Science, 3495, 2005.
Warner S. L. Randomized Response: A survey technique for eliminating evasive answer bias. Journal of American Statistical Association, 60(309):63–69, March 1965.
Winkler W.: Using simulated annealing for k-anonymity. Technical Report 7, US Census Bureau.
Wu Y.-H., Chiang C.-M., Chen A. L. P.: Hiding Sensitive Association Rules with Limited Side Effects. IEEE Transactions on Knowledge and Data Engineering, 19(1), 2007.
Xiao X., Tao Y.. Personalized Privacy Preservation. ACM SIGMOD Conference, 2006.
Xiao X., Tao Y. Anatomy: Simple and Effective Privacy Preservation. VLDB Conference, pp. 139-150, 2006.
Xu J., Wang W., Pei J., Wang X., Shi B., Fu A. W. C.: Utility Based Anonymization using Local Recoding. ACM KDD Conference, 2006.
Xu S., Yung M.: k-anonymous secret handshakes with reusable credentials. ACM Conference on Computer and Communications Security, 2004.
Yao A. C.: How to Generate and Exchange Secrets. FOCS Conferemce, 1986.
Yao G., Feng D.: A new k-anonymous message transmission protocol. International Workshop on Information Security Applications, 2004.
Yang Z., Zhong S., Wright R.: Privacy-Preserving Classification of Customer Data without Loss of Accuracy. SDM Conference, 2006.
Yao C., Wang S., Jajodia S.: Checking for k-Anonymity Violation by views. ACM Conference on Computer and Communication Security, 2004.
Yu H., Jiang X., Vaidya J.: Privacy-Preserving SVM using nonlinear Kernels on Horizontally Partitioned Data. SAC Conference, 2006.
Yu H., Vaidya J., Jiang X.: Privacy-Preserving SVM Classification on Vertically Partitioned Data. PAKDD Conference, 2006.
Zhang P., Tong Y., Tang S., Yang D.: Privacy-Preserving Naive Bayes Classifier. Lecture Notes in Computer Science, Vol 3584, 2005.
Zhong S., Yang Z., Wright R.: Privacy-enhancing k-anonymization of customer data, In Proceedings of the ACM SIGMOD-SIGACT-SIGART Principles of Database Systems, Baltimore, MD. 2005.
Zhu Y., Liu L. Optimal Randomization for Privacy- Preserving Data Mining. ACM KDD Conference, 2004.
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2008 Springer Science+Business Media, LLC.
About this chapter
Cite this chapter
Aggarwal, C.C., Yu, P.S. (2008). Privacy-Preserving Data Mining: A Survey. In: Gertz, M., Jajodia, S. (eds) Handbook of Database Security. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-48533-1_18
Download citation
DOI: https://doi.org/10.1007/978-0-387-48533-1_18
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-48532-4
Online ISBN: 978-0-387-48533-1
eBook Packages: Computer ScienceComputer Science (R0)