kACTUS 2: Privacy Preserving in Classification Tasks Using k-Anonymity
k-anonymity is a method for masking sensitive data that mitigates the re-linking of released data with external sources, making it difficult to re-identify individuals. k-anonymity operates on a set of quasi-identifiers (publicly available attributes whose linkage with external datasets is anticipated) and demands that the released dataset contain at least k records for every combination of quasi-identifier values. Another advantage of k-anonymity is that, unlike many other methods, it maintains the truthfulness of the released data. This is achieved through generalization, a primary technique in k-anonymity, in which attribute values are replaced with semantically consistent but less precise values. When the substituted value does not preserve semantic validity, the technique is called suppression, a special case of generalization. We present a hybrid approach, called compensation, that combines suppression and swapping to achieve privacy. Since swapping decreases the truthfulness of attribute values, our algorithm incorporates a tradeoff between the level of swapping (information truthfulness) and the level of suppression (information loss).
We use k-anonymity to explore the issue of anonymity preservation. Since we do not use generalization, we do not need a priori knowledge of attribute semantics. We investigate data anonymization in the context of classification and exploit tree properties to satisfy k-anonymity. Our work improves on previous approaches by increasing classification accuracy.
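To make the k-anonymity requirement concrete, the following minimal Python sketch checks whether every quasi-identifier combination occurs in at least k records, and illustrates suppression by replacing an attribute's values with a placeholder. The function names, record layout, and placeholder symbol are illustrative assumptions, not part of the kACTUS 2 algorithm itself:

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    # Count how many records share each quasi-identifier value combination;
    # the dataset is k-anonymous if every group has at least k members.
    groups = Counter(
        tuple(r[qi] for qi in quasi_identifiers) for r in records
    )
    return all(count >= k for count in groups.values())

def suppress(records, attribute, placeholder="*"):
    # Suppression: replace an attribute's values with a placeholder,
    # a special case of generalization that discards the value entirely.
    return [{**r, attribute: placeholder} for r in records]

records = [
    {"age": "30-40", "zip": "021**", "disease": "flu"},
    {"age": "30-40", "zip": "021**", "disease": "cold"},
    {"age": "20-30", "zip": "019**", "disease": "flu"},
]

print(is_k_anonymous(records, ["age", "zip"], 2))  # False: one group has size 1
anonymized = suppress(suppress(records, "age"), "zip")
print(is_k_anonymous(anonymized, ["age", "zip"], 2))  # True: all records collapse into one group
```

Note that full suppression of both quasi-identifiers trivially satisfies k-anonymity at the cost of all information in those attributes, which is exactly the information-loss side of the tradeoff the abstract describes.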
Keywords: anonymity, privacy preserving, generalization, suppression, data mining