Pattern-Guided k-Anonymity

  • Robert Bredereck
  • André Nichterlein
  • Rolf Niedermeier
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7924)

Abstract

We suggest a user-oriented approach to combinatorial data anonymization. A data matrix is called k-anonymous if every row appears at least k times. The goal of the NP-hard k-Anonymity problem is then to make a given matrix k-anonymous by suppressing (blanking out) as few entries as possible. We describe an enhanced k-anonymization problem called Pattern-Guided k-Anonymity, where users can express the differing importance of various data features. We show that Pattern-Guided k-Anonymity remains NP-hard. We provide a fixed-parameter tractability result based on a data-driven parameterization and, building on this, develop an exact ILP-based solution method as well as a simple but very effective greedy heuristic. Experiments on several real-world datasets show that our heuristic easily matches the established “Mondrian” algorithm for k-Anonymity in terms of anonymization quality and outperforms it in terms of running time.
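The definition of a k-anonymous matrix can be made concrete with a minimal Python sketch. It only checks the k-anonymity condition and applies a hand-picked set of suppressions. It is not the ILP method or the greedy heuristic from the paper, and the names `STAR`, `is_k_anonymous`, and `suppress`, as well as the toy data, are assumptions made for illustration.

```python
from collections import Counter

STAR = "*"  # hypothetical marker for a suppressed (blanked-out) entry


def is_k_anonymous(matrix, k):
    """Return True if every row of `matrix` occurs at least k times."""
    counts = Counter(tuple(row) for row in matrix)
    return all(count >= k for count in counts.values())


def suppress(matrix, positions):
    """Return a copy of `matrix` with the given (row, column) entries blanked out."""
    result = [list(row) for row in matrix]
    for i, j in positions:
        result[i][j] = STAR
    return result


# Toy data: the first two rows differ only in the second column.
rows = [
    ["a", "x"],
    ["a", "y"],
    ["b", "z"],
    ["b", "z"],
]

# Suppressing two entries makes the first two rows identical.
suppressed_positions = [(0, 1), (1, 1)]
anonymized = suppress(rows, suppressed_positions)

print(is_k_anonymous(rows, 2))        # the original matrix
print(is_k_anonymous(anonymized, 2))  # after suppression
print(len(suppressed_positions))      # cost: number of suppressed entries
```

In this toy instance the cost is the number of suppressed entries, which is the quantity the k-Anonymity problem asks to minimize. Finding a cheapest set of suppressions is the NP-hard part and is not attempted here.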

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Robert Bredereck (1)
  • André Nichterlein (1)
  • Rolf Niedermeier (1)
  1. Institut für Softwaretechnik und Theoretische Informatik, TU Berlin, Berlin, Germany
