Skip to main content

Privacy-Preserving Data Mining: A Survey

  • Chapter
Handbook of Database Security

Summary

In recent years, privacy-preserving data mining has been studied extensively, because of the wide proliferation of sensitive information on the internet. A number of algorithmic techniques have been designed for privacy-preserving data mining. In this paper, we provide a review of the state-of-the-art methods for privacy. We discuss methods for randomization, k-anonymization, and distributed privacy-preserving data mining. We also discuss cases in which the output of data mining applications needs to be sanitized for privacy-preservation purposes. We discuss the computational and theoretical limits associated with privacy-preservation over high dimensional data sets.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 89.00
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 119.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info
Hardcover Book
USD 169.99
Price excludes VAT (USA)
  • Durable hardcover edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Adam N., Wortmann J. C.: Security-Control Methods for Statistical Databases: A Comparison Study. ACM Computing Surveys, 21(4), 1989.

    Article  Google Scholar 

  2. Agrawal R., Srikant R. Privacy-Preserving Data Mining. Proceedings of the ACM SIGMOD Conference, 2000.

    Google Scholar 

  3. Agrawal R., Srikant R., Thomas D. Privacy-Preserving OLAP. Proceedings of the ACM SIGMOD Conference, 2005.

    Google Scholar 

  4. Agrawal R., Bayardo R., Faloutsos C., Kiernan J., Rantzau R., Srikant R.: Auditing Compliance via a hippocratic database. VLDB Conference, 2004.

    Google Scholar 

  5. Agrawal D. Aggarwal C. C. On the Design and Quantification of Privacy-Preserving Data Mining Algorithms. ACM PODS Conference, 2002.

    Google Scholar 

  6. Aggarwal C., Pei J., Zhang B. A Framework for Privacy Preservation against Adversarial Data Mining. ACM KDD Conference, 2006.

    Google Scholar 

  7. Aggarwal C. C. On k-anonymity and the curse of dimensionality. VLDB Conference, 2005.

    Google Scholar 

  8. Aggarwal C. C., Yu P. S.: A Condensation approach to privacy preserving data mining. EDBT Conference, 2004.

    Google Scholar 

  9. Aggarwal C. C., Yu P. S.: On Variable Constraints in Privacy-Preserving Data Mining. SIAM Conference, 2005.

    Google Scholar 

  10. Aggarwal C. C.: On Randomization, Public Information and the Curse of Dimensionality. ICDE Conference, 2007.

    Google Scholar 

  11. Aggarwal G., Feder T., Kenthapadi K., Motwani R., Panigrahy R., Thomas D., Zhu A.: Anonymizing Tables. ICDT Conference, 2005.

    Google Scholar 

  12. Aggarwal G., Feder T., Kenthapadi K., Motwani R., Panigrahy R., Thomas D., Zhu A.: Approximation Algorithms for k-anonymity. Journal of Privacy Technology, paper 20051120001, 2005.

    Google Scholar 

  13. Aggarwal G., Feder T., Kenthapadi K., Khuller S., Motwani R., Panigrahy R., Thomas D., Zhu A.: Achieving Anonymity via Clustering. ACM PODS Conference, 2006.

    Google Scholar 

  14. Atallah, M., Elmagarmid, A., Ibrahim, M., Bertino, E., Verykios, V.: Disclosure limitation of sensitive rules, Workshop on Knowledge and Data Engineering Exchange, 1999.

    Google Scholar 

  15. Bawa M., Bayardo R. J., Agrawal R.: Privacy-Preserving Indexing of Documents on the Network. VLDB Conference, 2003.

    Google Scholar 

  16. Bayardo R. J., Agrawal R.: Data Privacy through Optimal k-Anonymization. Proceedings of the ICDE Conference, pp. 217–228, 2005.

    Google Scholar 

  17. Bettini C., Wang X. S., Jajodia S.: Protecting Privacy against Location Based Personal Identification. Proc. of Secure Data Management Workshop, Trondheim, Norway, 2005.

    Google Scholar 

  18. Biskup J., Bonatti P.: Controlled Query Evaluation for Known Policies by Combining Lying and Refusal. Annals of Mathematics and Artificial Intelligence, 40(1-2), 2004.

    Article  MathSciNet  Google Scholar 

  19. Blum A., Dwork C., McSherry F., Nissim K.: Practical Privacy: The SuLQ Framework. ACM PODS Conference, 2005.

    Google Scholar 

  20. Chang L., Moskowitz I.: An integrated framwork for database inference and privacy protection. Data and Applications Security. Kluwer, 2000.

    Google Scholar 

  21. Chang L., Moskowitz I.: Parsimonious downgrading and decision trees applied to the inference problem. New Security Paradigms Workshop, 1998.

    Google Scholar 

  22. Chaum D., Crepeau C., Damgard I.: Multiparty unconditionally secure protocols. ACM STOC Conference, 1988.

    Google Scholar 

  23. Chawla S., Dwork C., McSherry F., Smith A., Wee H.: Towards Privacy in Public Databases, TCC, 2005.

    Google Scholar 

  24. Chawla S., Dwork C., McSherry F., Talwar K.: On the Utility of Privacy-Preserving Histograms, UAI, 2005.

    Google Scholar 

  25. Chen K., Liu L.: Privacy-preserving data classification with rotation perturbation. ICDM Conference, 2005.

    Google Scholar 

  26. Chin F.: Security Problems on Inference Control for SUM, MAX, and MIN Queries. J. of the ACM, 33(3), 1986.

    Google Scholar 

  27. Chin F., Ozsoyoglu G.: Auditing for Secure Statistical Databases. Proceedings of the ACM’81 Conference, 1981.

    Google Scholar 

  28. Ciriani V., De Capitiani di Vimercati S., Foresti S., Samarati P.: k-Anonymity. Security in Decentralized Data Management, ed. Jajodia S., Yu T., Springer, 2006.

    Google Scholar 

  29. Clifton C., Kantarcioglou M., Lin X., Zhu M.: Tools for privacy-preserving distributed data mining. ACM SIGKDD Explorations, 4(2), 2002.

    Google Scholar 

  30. Clifton C., Marks D.: Security and Privacy Implications of Data Mining., Workshop on Data Mining and Knowledge Discovery, 1996.

    Google Scholar 

  31. Dasseni E., Verykios V., Elmagarmid A., Bertino E.: Hiding Association Rules using Confidence and Support, 4th Information Hiding Workshop, 2001.

    Google Scholar 

  32. Denning D.: Secure Statostical Databases with Random Sample Queries. ACM TODS Journal, 5(3), 1980.

    Google Scholar 

  33. Dinur I., Nissim K.: Revealing Information while preserving privacy. ACM PODS Conference, 2003.

    Google Scholar 

  34. Dobkin D., Jones A., Lipton R.: Secure Databases: Protection against User Influence. ACM Transactions on Databases Systems, 4(1), 1979.

    Article  Google Scholar 

  35. Domingo-Ferrer J, Mateo-Sanz J.: Practical data-oriented micro-aggregation for statistical disclosure control. IEEE TKDE, 14(1), 2002.

    Google Scholar 

  36. Du W., Atallah M.: Secure Multi-party Computation: A Review and Open Problems.CERIAS Tech. Report 2001-51, Purdue University, 2001.

    Google Scholar 

  37. Du W., Han Y. S., Chen S.: Privacy-Preserving Multivariate Statistical Analysis: Linear Regression and Classification, Proc. SIAM Conf. Data Mining, 2004.

    Google Scholar 

  38. Du W., Atallah M.: Privacy-Preserving Cooperative Statistical Analysis, 17th Annual Computer Security Applications Conference, 2001.

    Google Scholar 

  39. Dwork C., Nissim K.: Privacy-Preserving Data Mining on Vertically Partitioned Databases, CRYPTO, 2004.

    Google Scholar 

  40. Dwork C., Kenthapadi K., McSherry F., Mironov I., Naor M.: Our Data, Ourselves: Privacy via Distributed Noise Generation. EUROCRYPT, 2006.

    Google Scholar 

  41. Dwork C., McSherry F., Nissim K., Smith A.: Calibrating Noise to Sensitivity in Private Data Analysis, TCC, 2006.

    Google Scholar 

  42. Even S., Goldreich O., Lempel A.: A Randomized Protocol for Signing Contracts. Communications of the ACM, vol 28, 1985.

    Google Scholar 

  43. Evfimievski A., Gehrke J., Srikant R. Limiting Privacy Breaches in Privacy Preserving Data Mining. ACM PODS Conference, 2003.

    Google Scholar 

  44. Evfimievski A., Srikant R., Agrawal R., Gehrke J.: Privacy-Preserving Mining of Association Rules. ACM KDD Conference, 2002.

    Google Scholar 

  45. Fienberg S., McIntyre J.: Data Swapping: Variations on a Theme by Dalenius and Reiss. Technical Report, National Institute of Statistical Sciences, 2003.

    Google Scholar 

  46. Fung B., Wang K., Yu P.: Top-Down Specialization for Information and Privacy Preservation. ICDE Conference, 2005.

    Google Scholar 

  47. Gambs S., Kegl B., Aimeur E.: Privacy-Preserving Boosting. Knowledge Discovery and Data Mining Journal, to appear.

    Google Scholar 

  48. Gedik B., Liu L.: A customizable k-anonymity model for protecting location privacy, ICDCS Conference, 2005.

    Google Scholar 

  49. Goldreich O.: Secure Multi-Party Computation, Unpublished Manuscript, 2002.

    Google Scholar 

  50. Huang Z., Du W., Chen B.: Deriving Private Information from Randomized Data. pp. 37–48, ACM SIGMOD Conference, 2005.

    Google Scholar 

  51. Hore B., Mehrotra S., Tsudik B.: A Privacy-Preserving Index for Range Queries. VLDB Conference, 2004.

    Google Scholar 

  52. Hughes D, Shmatikov V.: Information Hiding, Anonymity, and Privacy: A modular Approach. Journal of Computer Security, 12(1), 3–36, 2004.

    Google Scholar 

  53. Inan A., Saygin Y., Savas E., Hintoglu A., Levi A.: Privacy-Preserving Clustering on Horizontally Partitioned Data. Data Engineering Workshops, 2006.

    Google Scholar 

  54. Ioannidis I., Grama A., Atallah M.: A secure protocol for computing dot products in clustered and distributed environments, International Conference on Parallel Processing, 2002.

    Google Scholar 

  55. Iyengar V. S.: Transforming Data to Satisfy Privacy Constraints. KDD Conference, 2002.

    Google Scholar 

  56. Jiang W., Clifton C.: Privacy-preserving distributed k-Anonymity. Proceedings of the IFIP 11.3 Working Conference on Data and Applications Security, 2005.

    Google Scholar 

  57. Johnson W., Lindenstrauss J.: Extensions of Lipshitz Mapping into Hilbert Space, Contemporary Math. vol. 26, pp. 189-206, 1984.

    MATH  MathSciNet  Google Scholar 

  58. Jagannathan G., Wright R.: Privacy-Preserving Distributed k-means clustering over arbitrarily partitioned data. ACM KDD Conference, 2005.

    Google Scholar 

  59. Jagannathan G., Pillaipakkamnatt K., Wright R.: A New Privacy-Preserving Distributed k-Clustering Algorithm. SIAM Conference on Data Mining, 2006.

    Google Scholar 

  60. Kantarcioglu M., Clifton C.: Privacy-Preserving Distributed Mining of Association Rules on Horizontally Partitioned Data. IEEE TKDE Journal, 16(9), 2004.

    Google Scholar 

  61. Kantarcioglu M., Vaidya J.: Privacy-Preserving Naive Bayes Classifier for Horizontally Partitioned Data. IEEE Workshop on Privacy-Preserving Data Mining, 2003.

    Google Scholar 

  62. Kargupta H., Datta S., Wang Q., Sivakumar K.: On the Privacy Preserving Properties of Random Data Perturbation Techniques. ICDM Conference, pp. 99-106, 2003.

    Google Scholar 

  63. Karn J., Ullman J.: A model of statistical databases and their security. ACM Transactions on Database Systems, 2(1):1–10, 1977.

    Article  Google Scholar 

  64. Kenthapadi K., Mishra N., Nissim K.: Simulatable Auditing, ACM PODS Conference, 2005.

    Google Scholar 

  65. Kifer D., Gehrke J.: Injecting utility into anonymized datasets. SIGMOD Conference, pp. 217-228, 2006.

    Google Scholar 

  66. Kim J., Winkler W.: Multiplicative Noise for Masking Continuous Data, Technical Report Statistics 2003-01, Statistical Research Division, US Bureau of the Census, Washington D.C., Apr. 2003.

    Google Scholar 

  67. Kleinberg J., Papadimitriou C., Raghavan P.: Auditing Boolean Attributes. Journal of Computer and System Sciences, 6, 2003.

    Google Scholar 

  68. Lakshmanan L., Ng R., Ramesh G. To Do or Not To Do: The Dilemma of Disclosing Anonymized Data. ACM SIGMOD Conference, 2005.

    Google Scholar 

  69. Liew C. K., Choi U. J., Liew C. J. A data distortion by probability distribution. ACM TODS, 10(3):395-411, 1985.

    Article  MATH  Google Scholar 

  70. LeFevre K., DeWitt D., Ramakrishnan R.: Incognito: Full Domain K-Anonymity. ACM SIGMOD Conference, 2005.

    Google Scholar 

  71. LeFevre K., DeWitt D., Ramakrishnan R.: Mondrian Multidimensional K-Anonymity. ICDE Conference, 25, 2006.

    Google Scholar 

  72. LeFevre K., DeWitt D., Ramakrishnan R.: Workload Aware Anonymization. KDD Conference, 2006.

    Google Scholar 

  73. Li F., Sun J., Papadimitriou S. Mihaila G., Stanoi I.: Hiding in the Crowd: Privacy Preservation on Evolving Streams through Correlation Tracking. ICDE Conference, 2007.

    Google Scholar 

  74. Lindell Y., Pinkas B.: Privacy-Preserving Data Mining. CRYPTO, 2000.

    Google Scholar 

  75. Liu K., Kargupta H., Ryan J.: Random Projection Based Multiplicative Data Perturbation for Privacy Preserving Distributed Data Mining. IEEE Transactions on Knowledge and Data Engineering, 18(1), 2006.

    Google Scholar 

  76. Liu K., Giannella C. Kargupta H.: An Attacker’s View of Distance Preserving Maps for Privacy-Preserving Data Mining. PKDD Conference, 2006.

    Google Scholar 

  77. Machanavajjhala A., Gehrke J., Kifer D., and Venkitasubramaniam M.: l-Diversity: Privacy Beyond k-Anonymity. ICDE, 2006.

    Google Scholar 

  78. Malin B, Sweeney L. Re-identification of DNA through an automated linkage process. American Medical Informatics Association, 423–427, 2001.

    Google Scholar 

  79. Malin B. Why methods for genomic data privacy fail and what we can do to fix it, AAAS Annual Meeting, Seattle, WA, 2004.

    Google Scholar 

  80. Meyerson A., Williams R. On the complexity of optimal k-anonymity. ACM PODS Conference, 2004.

    Google Scholar 

  81. Mishra N., Sandler M.: Privacy vis Pseudorandom Sketches. ACM PODS Conference, 2006.

    Google Scholar 

  82. Mukherjee S., Chen Z., Gangopadhyay S.: A privacy-preserving technique for Euclidean distance-based mining algorithms using Fourier based transforms, VLDB Journal, 2006.

    Google Scholar 

  83. Moskowitz I., Chang L.: A decision theoretic system for information downgrading. Joint Conference on Information Sciences, 2000.

    Google Scholar 

  84. Nabar S., Marthi B., Kenthapadi K., Mishra N., Motwani R.: Towards Robustness in Query Auditing. VLDB Conference, 2006.

    Google Scholar 

  85. Naor M., Pinkas B.: Efficient Oblivious Transfer Protocols, SODA Conference, 2001.

    Google Scholar 

  86. Natwichai J., Li X., Orlowska M.: A Reconstruction-based Algorithm for Classification Rules Hiding. Australasian Database Conference, 2006.

    Google Scholar 

  87. Oliveira S. R. M., Zaane O.: Privacy Preserving Clustering by Data Transformation, Proc. 18th Brazilian Symp. Databases, pp. 304-318, Oct. 2003.

    Google Scholar 

  88. Oliveira S. R. M., Zaiane O.: Data Perturbation by Rotation for Privacy-Preserving Clustering, Technical Report TR04-17, Department of Computing Science, University of Alberta, Edmonton, AB, Canada, August 2004.

    Google Scholar 

  89. Oliveira S. R. M., Zaiane O., Saygin Y.: Secure Association-Rule Sharing. PAKDD Conference, 2004.

    Google Scholar 

  90. Pinkas B.: Cryptographic Techniques for Privacy-Preserving Data Mining. ACM SIGKDD Explorations, 4(2), 2002.

    Article  Google Scholar 

  91. Polat H., Du W.: SVD-based collaborative filtering with privacy. ACM SAC Symposium, 2005.

    Google Scholar 

  92. Polat H., Du W.: Privacy-Preserving Top-N Recommendations on Horizontally Partitioned Data. Web Intelligence, 2005.

    Google Scholar 

  93. Rabin M. O.: How to exchange secrets by oblivious transfer, Technical Report TR-81, Aiken Corporation Laboratory, 1981.

    Google Scholar 

  94. Reiss S.: Security in Databases: A combinatorial Study, Journal of ACM, 26(1), 1979.

    Article  MathSciNet  Google Scholar 

  95. Rizvi S., Haritsa J.: Maintaining Data Privacy in Association Rule Mining. VLDB Conference, 2002.

    Google Scholar 

  96. Saygin Y., Verykios V., Clifton C.: Using Unknowns to prevent discovery of Association Rules, ACM SIGMOD Record, 30(4), 2001.

    Article  Google Scholar 

  97. Saygin Y., Verykios V., Elmagarmid A.: Privacy-Preserving Association Rule Mining, 12th International Workshop on Research Issues in Data Engineering, 2002.

    Google Scholar 

  98. Samarati P.: Protecting Respondents’ Identities in Microdata Release. IEEE Trans. Knowl. Data Eng. 13(6): 1010-1027 (2001).

    Article  Google Scholar 

  99. Shannon C. E.: The Mathematical Theory of Communication, University of Illinois Press, 1949.

    Google Scholar 

  100. Silverman B. W.: Density Estimation for Statistics and Data Analysis. Chapman and Hall, 1986.

    Google Scholar 

  101. Vaidya J., Clifton C.: Privacy-Preserving Association Rule Mining in Vertically Partitioned Databases. ACM KDD Conference, 2002.

    Google Scholar 

  102. Vaidya J., Clifton C.: Privacy-Preserving k-means clustering over vertically partitioned Data. ACM KDD Conference, 2003.

    Google Scholar 

  103. Vaidya J., Clifton C.: Privacy-Preserving Naive Bayes Classifier over vertically partitioned data. SIAM Conference, 2004.

    Google Scholar 

  104. Vaidya J., Clifton C.: Privacy-Preserving Decision Trees over vertically partitioned data. Lecture Notes in Computer Science, Vol 3654, 2005.

    Google Scholar 

  105. Verykios V. S., Bertino E., Fovino I. N., Provenza L. P., Saygin Y., Theodoridis Y.: State-of-the-art in privacy preserving data mining. ACM SIGMOD Record, v.33 n.1, 2004.

    Google Scholar 

  106. Verykios V. S., Elmagarmid A., Bertino E., Saygin Y.,, Dasseni E.: Association Rule Hiding. IEEE Transactions on Knowledge and Data Engineering, 16(4), 2004.

    Article  Google Scholar 

  107. Wang K., Yu P., Chakraborty S.: Bottom-Up Generalization: A Data Mining Solution to Privacy Protection. ICDM Conference, 2004.

    Google Scholar 

  108. Wang K., Fung B. C. M., Yu P. Template based Privacy -Preservation in classification problems. ICDM Conference, 2005.

    Google Scholar 

  109. Wang K., Fung B. C. M.: Anonymization for Sequential Releases. ACM KDD Conference, 2006.

    Google Scholar 

  110. Wang K., Fung B. C. M., Dong G.: Integarting Private Databases for Data Analysis. Lecture Notes in Computer Science, 3495, 2005.

    Google Scholar 

  111. Warner S. L. Randomized Response: A survey technique for eliminating evasive answer bias. Journal of American Statistical Association, 60(309):63–69, March 1965.

    Google Scholar 

  112. Winkler W.: Using simulated annealing for k-anonymity. Technical Report 7, US Census Bureau.

    Google Scholar 

  113. Wu Y.-H., Chiang C.-M., Chen A. L. P.: Hiding Sensitive Association Rules with Limited Side Effects. IEEE Transactions on Knowledge and Data Engineering, 19(1), 2007.

    Article  Google Scholar 

  114. Xiao X., Tao Y.. Personalized Privacy Preservation. ACM SIGMOD Conference, 2006.

    Google Scholar 

  115. Xiao X., Tao Y. Anatomy: Simple and Effective Privacy Preservation. VLDB Conference, pp. 139-150, 2006.

    Google Scholar 

  116. Xu J., Wang W., Pei J., Wang X., Shi B., Fu A. W. C.: Utility Based Anonymization using Local Recoding. ACM KDD Conference, 2006.

    Google Scholar 

  117. Xu S., Yung M.: k-anonymous secret handshakes with reusable credentials. ACM Conference on Computer and Communications Security, 2004.

    Google Scholar 

  118. Yao A. C.: How to Generate and Exchange Secrets. FOCS Conferemce, 1986.

    Google Scholar 

  119. Yao G., Feng D.: A new k-anonymous message transmission protocol. International Workshop on Information Security Applications, 2004.

    Google Scholar 

  120. Yang Z., Zhong S., Wright R.: Privacy-Preserving Classification of Customer Data without Loss of Accuracy. SDM Conference, 2006.

    Google Scholar 

  121. Yao C., Wang S., Jajodia S.: Checking for k-Anonymity Violation by views. ACM Conference on Computer and Communication Security, 2004.

    Google Scholar 

  122. Yu H., Jiang X., Vaidya J.: Privacy-Preserving SVM using nonlinear Kernels on Horizontally Partitioned Data. SAC Conference, 2006.

    Google Scholar 

  123. Yu H., Vaidya J., Jiang X.: Privacy-Preserving SVM Classification on Vertically Partitioned Data. PAKDD Conference, 2006.

    Google Scholar 

  124. Zhang P., Tong Y., Tang S., Yang D.: Privacy-Preserving Naive Bayes Classifier. Lecture Notes in Computer Science, Vol 3584, 2005.

    Google Scholar 

  125. Zhong S., Yang Z., Wright R.: Privacy-enhancing k-anonymization of customer data, In Proceedings of the ACM SIGMOD-SIGACT-SIGART Principles of Database Systems, Baltimore, MD. 2005.

    Google Scholar 

  126. Zhu Y., Liu L. Optimal Randomization for Privacy- Preserving Data Mining. ACM KDD Conference, 2004.

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2008 Springer Science+Business Media, LLC.

About this chapter

Cite this chapter

Aggarwal, C.C., Yu, P.S. (2008). Privacy-Preserving Data Mining: A Survey. In: Gertz, M., Jajodia, S. (eds) Handbook of Database Security. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-48533-1_18

Download citation

  • DOI: https://doi.org/10.1007/978-0-387-48533-1_18

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-0-387-48532-4

  • Online ISBN: 978-0-387-48533-1

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics