kACTUS 2: Privacy Preserving in Classification Tasks Using k-Anonymity

  • Slava Kisilevich
  • Yuval Elovici
  • Bracha Shapira
  • Lior Rokach
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5661)

Abstract

k-anonymity is a method for masking sensitive data that addresses the problem of re-linking released data with an external source, making it difficult to re-identify an individual. k-anonymity operates on a set of quasi-identifiers (public sensitive attributes) that are anticipated to be available in, and linkable from, an external dataset, and it requires that the released dataset contain at least k records for every possible quasi-identifier value. Another aspect of k-anonymity is its ability to maintain the truthfulness of the released data, unlike other existing methods. This is achieved by generalization, a primary technique in k-anonymity, which substitutes attribute values with semantically consistent but less precise values. When the substituted value does not preserve semantic validity, the technique is called suppression, which is a special case of generalization. We present a hybrid approach called compensation, which is based on suppression and swapping, for achieving privacy. Since swapping decreases the truthfulness of attribute values, our algorithm incorporates a tradeoff between the level of swapping (information truthfulness) and the level of suppression (information loss).

We use k-anonymity to explore the issue of anonymity preservation. Since we do not use generalization, we do not need a priori knowledge of attribute semantics. We investigate data anonymization in the context of classification and use tree properties to satisfy k-anonymization. Our work improves on previous approaches by increasing classification accuracy.
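
The k-anonymity requirement and the idea of suppression described in the abstract can be illustrated with a minimal sketch. This is not the kACTUS 2 algorithm: the function names, the record layout, the `*` mask symbol, and the sample attributes (`age`, `zip`, `label`) are all assumptions made for illustration, and the check only covers quasi-identifier combinations that actually occur in the data.

```python
from collections import Counter


def group_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[a] for a in quasi_identifiers) for r in records)


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination present occurs at least k times."""
    return all(n >= k for n in group_sizes(records, quasi_identifiers).values())


def suppress_small_groups(records, quasi_identifiers, k, mask="*"):
    """Naive suppression (illustrative only): mask the quasi-identifiers of
    every record whose group has fewer than k members. Returns new records
    and leaves the input untouched."""
    sizes = group_sizes(records, quasi_identifiers)
    result = []
    for r in records:
        key = tuple(r[a] for a in quasi_identifiers)
        if sizes[key] < k:
            # Replace each quasi-identifier value with the mask symbol.
            r = {**r, **{a: mask for a in quasi_identifiers}}
        result.append(r)
    return result


# Hypothetical toy data: "age" and "zip" are quasi-identifiers, "label" is the class.
records = [
    {"age": 34, "zip": "84105", "label": "yes"},
    {"age": 34, "zip": "84105", "label": "no"},
    {"age": 51, "zip": "78464", "label": "yes"},
    {"age": 29, "zip": "84105", "label": "no"},
]
qi = ["age", "zip"]

before = is_k_anonymous(records, qi, k=2)
masked = suppress_small_groups(records, qi, k=2)
after = is_k_anonymous(masked, qi, k=2)
```

In this toy data the two singleton groups are masked and end up sharing one group of size two, so the result satisfies k = 2. In general, naive masking does not guarantee this: if only one record is masked, the masked group itself has fewer than k members and further handling is needed.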

Keywords

anonymity, privacy preserving, generalization, suppression, data mining

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Slava Kisilevich (1)
  • Yuval Elovici (2)
  • Bracha Shapira (2)
  • Lior Rokach (2)
  1. Department of Computer and Information Science, Konstanz University, Konstanz, Germany
  2. Department of Information System Engineering and Deutsche Telekom Laboratories at Ben-Gurion University, Ben Gurion University, Be’er Sheva, Israel
