Using Classification Methods to Evaluate Attribute Disclosure Risk
Statistical Disclosure Control protection methods perturb the non-confidential attributes of an original dataset and publish the perturbed results along with the values of the confidential attributes. Traditionally, such a method is considered to achieve a good privacy level if attackers who try to link an original record with its perturbed counterpart have a low probability of success. A complementary view has lately been gaining acceptance: protection methods should resist not only record re-identification attacks, but also attacks that try to guess the true value of some confidential attribute of some original record(s). The latter threat is known as attribute disclosure risk.
In this paper we propose a simple strategy to estimate the attribute disclosure risk incurred by a protection method: build a classifier from the protected (public) dataset and use it to predict the confidential attribute values of original records. After defining this approach in detail, we describe experiments showing both the power and the danger of the approach: very popular protection methods suffer very high attribute disclosure risk.
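The attack described above can be sketched in a few lines. The snippet below is an illustrative toy, not the paper's actual experimental setup: the dataset, the additive-noise protection, and the choice of a decision tree as the attacker's classifier are all assumptions made here for concreteness. The attacker trains on the published (perturbed) quasi-identifiers paired with the published confidential attribute, then queries the model with the original records it wants to attack.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Hypothetical original microdata: two numerical non-confidential
# attributes and one binary confidential attribute correlated with them.
n = 500
X_original = rng.normal(size=(n, 2))
y_confidential = (X_original[:, 0] + X_original[:, 1] > 0).astype(int)

# A simple perturbative protection method (additive noise) applied to the
# non-confidential attributes; the confidential attribute is published as-is.
X_protected = X_original + rng.normal(scale=0.3, size=X_original.shape)

# Attack: learn a classifier from the protected (public) release...
attacker = DecisionTreeClassifier(random_state=0)
attacker.fit(X_protected, y_confidential)

# ...and use it to predict the confidential attribute of the original records.
guesses = attacker.predict(X_original)
attribute_disclosure_risk = (guesses == y_confidential).mean()
print(f"fraction of confidential values guessed correctly: "
      f"{attribute_disclosure_risk:.2f}")
```

The fraction of correct guesses serves as an empirical estimate of the attribute disclosure risk: the closer it is to 1, the less protection the perturbation actually provides against this attack.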
Keywords: Attribute Disclosure Control · Classification · Privacy-Preserving Data Perturbation