Abstract
k-Anonymity is a privacy preserving method for limiting disclosure of private information in data mining. The process of anonymizing a database table typically involves generalizing table entries and, consequently, it incurs loss of relevant information. This motivates the search for anonymization algorithms that achieve the required level of anonymization while incurring a minimal loss of information. The problem of k-anonymization with minimal loss of information is NP-hard. We present a practical approximation algorithm that enables solving the k-anonymization problem with an approximation guarantee of O(ln k). That algorithm improves an algorithm due to Aggarwal et al. (Proceedings of the international conference on database theory (ICDT), 2005) that offers an approximation guarantee of O(k), and generalizes that of Park and Shim (SIGMOD ’07: proceedings of the 2007 ACM SIGMOD international conference on management of data, 2007) that was limited to the case of generalization by suppression. Our algorithm uses techniques that we introduce herein for mining closed frequent generalized records. Our experiments show that the significance of our algorithm is not limited only to the theory of k-anonymization. The proposed algorithm achieves lower information losses than the leading approximation algorithm, as well as the leading heuristic algorithms. A modified version of our algorithm that issues ℓ-diverse k-anonymizations also achieves lower information losses than the corresponding modified versions of the leading algorithms.
Similar content being viewed by others
References
Aggarwal G, Feder T, Kenthapadi K, Motwani R, Panigrahy R, Thomas D, Zhu A (2005) Anonymizing tables. In: ICDT, pp 246–258
Agrawal R, Srikant R (1994) Fast algorithms for mining association rules. In: Proc. 20th int. conf. very large data bases, VLDB, pp 487–499
Agrawal R, Srikant R (2000) Privacy-preserving data mining. In: ACM-SIGMOD conference on management of data, pp 439–450
Bayardo R, Agrawal R (2005) Data privacy through optimal k-anonymization. In: International conference on data engineering (ICDE), pp 217–228
Byun J-W, Kamra A, Bertino E, Li N (2007) Efficient k-anonymization using clustering techniques. In: DASFAA, pp 188–200
Ghinita G, Karras P, Kalnis P, Mamoulis N (2009) A framework for efficient data anonymization under privacy and accuracy constraints. ACM Trans Database Syst 34(2)
Gionis A, Tassa T (2009) k-anonymization with minimal loss of information. IEEE Trans Knowl Data Eng 21(2): 206–219
Gionis A, Mazza A, Tassa T (2008) k-anonymization revisited. In: International conference on data engineering (ICDE), pp 744–753
Goldberger J, Tassa T (2010) Efficient anonymizations with enhanced utility. TDP 3: 149–175
Grahne G, Zhu J (2005) Fast algorithms for frequent itemset mining using fp-trees. IEEE Trans Knowl Data Eng 17(10): 1347–1362
Hsiao C-J (2005) Efficient algorithms for mining closed itemsets and their lattice structure. IEEE Trans Knowl Data Eng 17(4): 462–478
Iyengar V (2002) Transforming data to satisfy privacy constraints. In: KDD ’02: proceedings of the eighth ACM SIGKDD international conference on knowledge discovery and data mining, pp 279–288
LeFevre K, DeWitt D, Ramakrishnan R (2005) Incognito: efficient full-domain k-anonymity. In: ACM-SIGMOD conference on management of data, pp 49–60
LeFevre K, DeWitt DJ, Ramakrishnan R (2006) Mondrian multidimensional k-anonymity. In: ICDE, p 25
Li N, Li T, Venkatasubramanian S (2007) t-closeness: privacy beyond k-anonymity and ℓ-diversity. In: Proceedings of IEEE international conference on data engineering (ICDE) 2007, pp 106–115
Machanavajjhala A, Kifer D, Gehrke J, Venkitasubramaniam M (2007) ℓ-diversity: privacy beyond k-anonymity. ACM Trans Knowl Discov Data 1(1): 3
Meyerson A, Williams R (2004) On the complexity of optimal k-anonymity. In: PODS ’04: proceedings of the twenty-third ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems, pp 223–228
Nergiz M, Clifton C (2007) Thoughts on k-anonymization. Data Knowl Eng 63(3): 622–645
Park H, Shim K (2007) Approximate algorithms for k-anonymity. In: SIGMOD ’07: proceedings of the 2007 ACM SIGMOD international conference on management of data, pp 67–78
Pei J, Han J, Mao R (2000) CLOSET: an efficient algorithm for mining frequent closed itemsets. In: Workshop on research issues in data mining and knowledge discovery, DMKD, pp 21–30
Samarati P (2001) Protecting respondents’ identities in microdata release. IEEE Trans Knowl Data Eng 13(6): 1010–1027
Samarati P, Sweeney L (1998) Generalizing data to provide anonymity when disclosing information (abstract). In: PODS ’98: proceedings of the seventeenth ACM SIGACT-SIGMOD-SIGART symposium on principles of database systems, p 188
Srikant R, Agrawal R (1995) Minning generalized association rules. In: VLDB, pp 407–419
Sweeney L (2000) Uniqueness of simple demographics in the U.S. population. Laboratory for international Data Privacy LIDAP-WP4
Sweeney L (2002) k-anonymity: a model for protecting privacy. Int J Uncertain Fuzz Knowl Based Syst 10(5): 557–570
Truta T, Campan A, Meyer P (2007) Generating microdata with p-sensitive k-anonymity property. In: SDM, pp 124–141
Wong R, Li J, Fu A, Wang K (2006) (α, k)-anonymity: an enhanced k-anonymity model for privacy preserving data publishing. In: In ACM SIGKDD, pp 754–759
Xiao X, Tao Y (2006) Anatomy: simple and effective privacy preservation. In: VLDB, pp 139–150
Author information
Authors and Affiliations
Corresponding author
Additional information
Responsible editor: Johannes Gehrke.
Rights and permissions
About this article
Cite this article
Kenig, B., Tassa, T. A practical approximation algorithm for optimal k-anonymity. Data Min Knowl Disc 25, 134–168 (2012). https://doi.org/10.1007/s10618-011-0235-9
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10618-011-0235-9