Secure Anonymization for Incremental Datasets

  • Ji-Won Byun
  • Yonglak Sohn
  • Elisa Bertino
  • Ninghui Li
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4165)

Abstract

Data anonymization techniques based on the k-anonymity model have been the focus of intense research in the last few years. Although the k-anonymity model and the related techniques provide valuable solutions to data privacy, current solutions are limited to static data release (i.e., the entire dataset is assumed to be available at the time of release). While this may be acceptable in some applications, today we see databases growing continuously every day and even every hour. In such dynamic environments, the current techniques may suffer from poor data quality and/or vulnerability to inference. In this paper, we analyze various inference channels that may exist in multiple anonymized datasets and discuss how to avoid such inferences. We then present an approach to securely anonymizing a continuously growing dataset in an efficient manner while assuring high data quality.

Keywords

Equivalence Class; Information Loss; High Data Quality; Sensitive Attribute; Inference Attack
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Ji-Won Byun (1)
  • Yonglak Sohn (2)
  • Elisa Bertino (1)
  • Ninghui Li (1)

  1. CERIAS and Computer Science, Purdue University, USA
  2. Computer Engineering, Seokyeong University, Korea
