Anonymization of Data Sets with NULL Values

  • Margareta Ciglic
  • Johann Eder
  • Christian Koncilia
Chapter
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9510)

Abstract

Releasing, publishing, or transferring microdata is restricted by the need to protect the privacy of data owners. k-anonymity is one of the most widespread concepts for anonymizing microdata, but it does not explicitly cover NULL values, which are nevertheless frequently found in microdata. We study in detail the problems that NULL values (missing values, non-applicable attributes, etc.) pose for anonymization, present a set of new definitions of k-anonymity that explicitly consider NULL values, and analyze which definition protects against which attacks. We show that an adequate treatment of missing values in microdata can easily be achieved by extending generalization algorithms. In particular, we show how the proposed treatment of NULL values was incorporated into the anonymization tool ANON, which implements generalization and tuple suppression with an application-specific definition of information loss. In a series of experiments, we show that NULL-aware generalization algorithms incur less information loss than standard algorithms.
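The abstract does not spell out the new k-anonymity definitions, so the sketch below does not reproduce them. It only illustrates, under stated assumptions, why the treatment of NULL matters: the same table can pass or fail a k-anonymity check depending on how NULL is read. The function `is_k_anonymous` and its two policies are hypothetical. Under `"as_value"`, NULL (Python `None`) is treated as an ordinary value, so tuples are grouped only when identical, much as SQL `GROUP BY` places all NULLs in one group. Under `"as_wildcard"`, NULL is assumed to possibly stand for any value, and each tuple must be compatible with at least k tuples, itself included.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple

# Hypothetical row type: the quasi-identifier values of one tuple; None stands for NULL.
Row = Tuple[Optional[str], ...]


def _compatible(a: Row, b: Row) -> bool:
    """Wildcard reading (assumption): attributes match if equal or if either side is NULL."""
    return all(x is None or y is None or x == y for x, y in zip(a, b))


def is_k_anonymous(rows: Sequence[Row], k: int, null_policy: str = "as_value") -> bool:
    """Hypothetical k-anonymity check with a selectable NULL interpretation."""
    if null_policy == "as_value":
        # NULL is an ordinary value: equivalence classes are groups of identical tuples.
        counts = Counter(rows)
        return all(c >= k for c in counts.values())
    if null_policy == "as_wildcard":
        # NULL may stand for any value: count, for each tuple, all tuples it could be
        # confused with (itself included) and require at least k of them.
        return all(sum(_compatible(r, s) for s in rows) >= k for r in rows)
    raise ValueError(f"unknown null_policy: {null_policy!r}")


# Toy quasi-identifiers (age range, gender); one tuple has a NULL gender.
rows = [("30-39", "F"), ("30-39", None), ("30-39", "F"), ("40-49", "M"), ("40-49", "M")]
print(is_k_anonymous(rows, 2))                 # False: ("30-39", None) forms a class of size 1
print(is_k_anonymous(rows, 2, "as_wildcard"))  # True: every tuple has at least 2 compatible tuples
```

Note that compatibility under the wildcard reading is not transitive, so it does not partition the table into equivalence classes. Whether either reading actually protects against a given attack is not settled by this sketch; that question is what the chapter's definitions address.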

Keywords

Privacy · k-anonymity · NULL values · Missing values

Copyright information

© Springer-Verlag Berlin Heidelberg 2016

Authors and Affiliations

  • Margareta Ciglic (1)
  • Johann Eder (1)
  • Christian Koncilia (1)
  1. Department of Informatics Systems, Alpen-Adria-Universität Klagenfurt, Klagenfurt, Austria