Frontiers in Algorithmics and Algorithmic Aspects in Information and Management, pp. 350-361
Pattern-Guided k-Anonymity
Abstract
We suggest a user-oriented approach to combinatorial data anonymization. A data matrix is called k-anonymous if every row appears at least k times. The goal of the NP-hard k-Anonymity problem is to make a given matrix k-anonymous by suppressing (blanking out) as few entries as possible. We describe an enhanced k-anonymization problem called Pattern-Guided k-Anonymity, in which users can express the differing importance of various data features. We show that Pattern-Guided k-Anonymity remains NP-hard. We provide a fixed-parameter tractability result based on a data-driven parameterization and, building on it, develop an exact ILP-based solution method as well as a simple but very effective greedy heuristic. Experiments on several real-world datasets show that our heuristic matches the established “Mondrian” algorithm for k-Anonymity in anonymization quality and outperforms it in running time.
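To make the definition above concrete, here is a minimal Python sketch of the k-anonymity condition and of entry suppression. It is an illustration only, not the paper's ILP method or greedy heuristic. The `*` symbol for a suppressed entry, the function names, and the toy matrix are all assumptions introduced for this example.

```python
from collections import Counter

STAR = "*"  # assumed symbol for a suppressed (blanked-out) entry


def is_k_anonymous(matrix, k):
    """Return True if every row of the matrix occurs at least k times."""
    counts = Counter(tuple(row) for row in matrix)
    return all(count >= k for count in counts.values())


def suppress(matrix, positions):
    """Return a copy of the matrix with the given (row, column) entries suppressed."""
    result = [list(row) for row in matrix]
    for i, j in positions:
        result[i][j] = STAR
    return result


def count_suppressions(matrix):
    """Count suppressed entries, the cost that k-Anonymity asks to minimize."""
    return sum(entry == STAR for row in matrix for entry in row)


# Hypothetical toy input: rows 0 and 1 differ only in the second column.
data = [
    ["a", "x"],
    ["a", "y"],
    ["b", "z"],
    ["b", "z"],
]

# Suppressing the second entry of rows 0 and 1 makes those two rows identical.
anonymized = suppress(data, [(0, 1), (1, 1)])
```

In this toy instance the original matrix is not 2-anonymous, while the version with two suppressed entries is. Finding a minimum set of suppressions in general is the NP-hard part; the sketch only checks a given solution.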
Keywords
Integer Linear Program · Input Matrix · Greedy Heuristic · Integer Linear Program Formulation · Pattern Vector