The VLDB Journal, Volume 17, Issue 4, pp 703–727

Anonymity preserving pattern discovery

  • Maurizio Atzori
  • Francesco Bonchi
  • Fosca Giannotti
  • Dino Pedreschi
Regular Paper

Abstract

It is generally believed that data mining results do not violate the anonymity of the individuals recorded in the source database. In fact, data mining models and patterns, in order to ensure a required statistical significance, represent a large number of individuals and thus conceal individual identities: this is the case of the minimum support threshold in frequent pattern mining. In this paper we show that this belief is ill-founded. By shifting the concept of k-anonymity from the source data to the extracted patterns, we formally characterize the notion of a threat to anonymity in the context of pattern discovery, and provide a methodology to efficiently and effectively identify all such possible threats that arise from the disclosure of the set of extracted patterns. On this basis, we obtain a formal notion of privacy protection that allows the disclosure of the extracted knowledge while protecting the anonymity of the individuals in the source database. Moreover, in order to handle the cases where the threats to anonymity cannot be avoided, we study how to eliminate such threats by means of pattern (not data!) distortion performed in a controlled way.
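The abstract does not say how a threat to anonymity is detected. As a minimal sketch of one way such a threat can arise, assume the published frequent itemsets carry their exact support counts: inclusion-exclusion over those counts then yields the number of transactions that contain one itemset and none of the additional items of a larger one. If that derived count is positive but below k, the published patterns single out a group of fewer than k individuals. The function names, the toy supports, and the threshold k = 5 are illustrative assumptions, not taken from the paper.

```python
from itertools import combinations


def derived_support(supports, inner, outer):
    """Number of transactions that contain every item of `inner`
    and none of the items in `outer - inner`, computed by
    inclusion-exclusion over the published supports."""
    inner, outer = frozenset(inner), frozenset(outer)
    extra = sorted(outer - inner)
    total = 0
    for r in range(len(extra) + 1):
        for added in combinations(extra, r):
            # Alternate the sign with the number of added items.
            total += (-1) ** r * supports[inner | frozenset(added)]
    return total


def anonymity_threats(supports, k):
    """List (inner, outer, count) triples whose derived count is
    positive but smaller than k."""
    threats = []
    itemsets = list(supports)
    for inner in itemsets:
        for outer in itemsets:
            if not inner < outer:  # require a proper subset
                continue
            extra = sorted(outer - inner)
            needed = [
                inner | frozenset(added)
                for r in range(len(extra) + 1)
                for added in combinations(extra, r)
            ]
            # Skip pairs whose intermediate supports were not published.
            if any(itemset not in supports for itemset in needed):
                continue
            count = derived_support(supports, inner, outer)
            if 0 < count < k:
                threats.append((inner, outer, count))
    return threats


# Toy published patterns (hypothetical): every support is at least 9.
published = {
    frozenset("a"): 10,
    frozenset("b"): 12,
    frozenset("ab"): 9,
}

for inner, outer, count in anonymity_threats(published, k=5):
    print(sorted(inner), "without", sorted(outer - inner), "->", count)
```

In this toy input every published itemset is supported by at least nine transactions, yet the supports of {a} and {a, b} together reveal that exactly one transaction contains a without b, which is the kind of inference the abstract argues a minimum support threshold alone does not prevent.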

Keywords

  • Knowledge discovery
  • Privacy preserving data mining
  • Frequent pattern mining
  • Individual privacy
  • Anonymity

Copyright information

© Springer-Verlag 2006

Authors and Affiliations

  • Maurizio Atzori
    • 1
    • 2
  • Francesco Bonchi
    • 1
  • Fosca Giannotti
    • 1
  • Dino Pedreschi
    • 2
  1. Pisa KDD Laboratory, ISTI–CNR, Pisa, Italy
  2. Pisa KDD Laboratory, Computer Science Department, University of Pisa, Pisa, Italy