k^m-Anonymity for Continuous Data Using Dynamic Hierarchies

  • Olga Gkountouna
  • Sotiris Angeli
  • Athanasios Zigomitros
  • Manolis Terrovitis
  • Yannis Vassiliou
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8744)

Abstract

Many organizations, enterprises, and public services collect and manage personal data about individuals. These data contain knowledge of substantial value to scientists and market experts, but disseminating them carelessly can lead to significant privacy breaches, as they might reveal financial, medical, or other personal information. Several anonymization methods have been proposed to allow privacy-preserving sharing of datasets that contain personal information. Anonymization techniques trade off the strength of the privacy guarantee against the quality of the anonymized dataset. In this work we focus on the anonymization of sets of values from continuous domains, e.g., numerical data, and we provide a method for protecting the anonymized data against identity disclosure attacks. The main novelty of our approach is that, instead of using a fixed, given generalization hierarchy, we let the anonymization algorithm decide how different values will be generalized. The benefit of our approach is twofold: (a) we are able to generalize datasets without requiring an expert to define the hierarchy, and (b) we limit the information loss, since the proposed algorithm is able to limit the scope of the generalization. We provide a series of experiments that demonstrate the gains in information quality of our algorithm compared to the state of the art.
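As a rough illustration of the setting, the sketch below checks the usual definition of k^m-anonymity on set-valued numeric records (every combination of up to m items that occurs in some record must occur in at least k records) and generalizes values into ranges whose borders are not taken from a fixed hierarchy. It is not the algorithm proposed in the paper: the function names `is_km_anonymous` and `generalize`, the sample records, and the hand-picked cut points are all assumptions made for this example, whereas the paper's algorithm decides the ranges itself.

```python
from bisect import bisect_right
from collections import Counter
from itertools import combinations


def is_km_anonymous(records, k, m):
    """Return True if every combination of up to m items that occurs in
    some record occurs in at least k records."""
    counts = Counter()
    for record in records:
        items = sorted(set(record))
        for size in range(1, m + 1):
            for combo in combinations(items, size):
                counts[combo] += 1
    return all(count >= k for count in counts.values())


def generalize(records, cuts):
    """Replace each value by the half-open interval [lo, hi) containing it.

    `cuts` is a hypothetical list of interval borders; here it is chosen by
    hand, which is exactly the step a dynamic-hierarchy algorithm automates.
    """
    borders = [float("-inf")] + sorted(cuts) + [float("inf")]
    generalized = []
    for record in records:
        intervals = set()
        for value in record:
            i = bisect_right(borders, value)  # borders[i-1] <= value < borders[i]
            intervals.add((borders[i - 1], borders[i]))
        generalized.append(intervals)
    return generalized


# Hypothetical toy data: each record is a set of numeric values.
raw = [{10, 21}, {11, 25}, {12, 40}, {13, 41}]
anonymized = generalize(raw, cuts=[20, 30])

raw_ok = is_km_anonymous(raw, k=2, m=2)
anonymized_ok = is_km_anonymous(anonymized, k=2, m=2)
```

In this toy dataset every raw value is unique, so the raw records are not 2^2-anonymous, while the generalized records are. Narrower ranges that still satisfy the check would lose less information, which is the trade-off the abstract describes.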

Keywords

Privacy-Preserving Data Publishing; Privacy; k^m-anonymity; Continuous data

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Olga Gkountouna (1, 2)
  • Sotiris Angeli (1)
  • Athanasios Zigomitros (3, 2)
  • Manolis Terrovitis (2)
  • Yannis Vassiliou (1)
  1. National Technical University of Athens, Greece
  2. Institute for the Management of Information Systems (IMIS), Athens, Greece
  3. University of Piraeus, Greece
