Using Classification Methods to Evaluate Attribute Disclosure Risk

  • Jordi Nin
  • Javier Herranz
  • Vicenç Torra
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6408)

Abstract

Statistical Disclosure Control (SDC) protection methods perturb the non-confidential attributes of an original dataset and publish the perturbed results along with the values of the confidential attributes. Traditionally, such a method is considered to achieve a good privacy level if an attacker who tries to link an original record with its perturbed counterpart has a low probability of success. Lately, another view has been gaining popularity: protection methods should resist not only such record re-identification attacks, but also attacks that try to guess the true value of some confidential attribute of some original record(s). This is known as attribute disclosure risk.

In this paper we propose a simple strategy for estimating the attribute disclosure risk that a protection method suffers: a classifier, built from the protected (public) dataset, is used to predict the confidential attribute values of the original records. After defining this approach in detail, we describe experiments showing both its power and its danger: very popular protection methods turn out to suffer very high attribute disclosure risk.
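To make the strategy concrete, here is a minimal, self-contained sketch in Python with scikit-learn. It is not code from the paper: the function name, the choice of a decision tree as the intruder's classifier, and the toy noise-addition scenario are all illustrative assumptions. The intruder trains on the released dataset (perturbed non-confidential attributes plus the published confidential attribute) and then queries the classifier with the original non-confidential values he is assumed to know; the fraction of correct guesses estimates the attribute disclosure risk.

    # Sketch only: names and data are illustrative, not taken from the paper.
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def attribute_disclosure_risk(X_protected, y_confidential, X_original):
        """Predict confidential attribute values for original records
        using a classifier trained on the protected (public) dataset.

        X_protected    -- perturbed non-confidential attributes (released)
        y_confidential -- confidential attribute, released with the data
        X_original     -- original non-confidential values the intruder knows
        """
        # The intruder can only learn from what was published ...
        clf = DecisionTreeClassifier().fit(X_protected, y_confidential)
        # ... and then queries the model with the records he knows.
        return clf.predict(X_original)

    # Toy scenario: additive-noise perturbation of the non-confidential part.
    rng = np.random.default_rng(0)
    X_orig = rng.normal(size=(200, 3))               # original attributes
    y_conf = (X_orig[:, 0] > 0).astype(int)          # confidential attribute
    X_prot = X_orig + rng.normal(scale=0.1, size=X_orig.shape)

    guesses = attribute_disclosure_risk(X_prot, y_conf, X_orig)
    print("estimated attribute disclosure risk:", np.mean(guesses == y_conf))

When the perturbation is small relative to the signal, the classifier recovers most confidential values; this is the kind of high attribute disclosure risk the paper reports for popular protection methods.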

Keywords

Attribute Disclosure Control · Classification · Privacy-Preserving Data Perturbation



Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Jordi Nin (1)
  • Javier Herranz (2)
  • Vicenç Torra (3)

  1. LAAS-CNRS, Toulouse Cedex 4, France
  2. Dept. Matemàtica Aplicada IV, Universitat Politècnica de Catalunya, Barcelona, Spain
  3. CSIC (Spanish National Research Council), IIIA, Artificial Intelligence Research Institute, Bellaterra, Spain
