Breaching Privacy Using Data Mining: Removing Noise from Perturbed Data

  • Michal Sramka
Part of the Studies in Computational Intelligence book series (SCI, volume 394)


Data perturbation is a sanitization method that helps restrict the disclosure of sensitive information from published data. We present an attack on the privacy of published data that has been sanitized using data perturbation. The attack employs data mining and fusion to remove some of the noise from the perturbed sensitive values. Our attack is practical: it can be launched by non-expert adversaries who have no background knowledge about the perturbed data and no data mining expertise. Moreover, our attack model also accommodates informed and expert adversaries who have background knowledge and/or expertise in data mining and fusion. Extensive experiments were performed on four databases derived from UCI’s Adult and IPUMS census-based data sets, sanitized with noise addition that satisfies ε-differential privacy. The experimental results confirm that our attack presents a significant privacy risk to published perturbed data because the majority of the noise can be effectively removed. The results show that a naive adversary is able to remove around 90% of the noise added during perturbation using general-purpose data miners from the Weka software package, and an informed expert adversary is able to remove 91%–99.93% of the added noise. Interestingly, the higher the intended level of privacy, the higher the percentage of noise that can be removed. This suggests that adding more noise does not always increase the real privacy.
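The idea behind the attack can be illustrated with a minimal sketch. It is not the chapter's experimental setup: the synthetic records, the `sensitivity` value, the group-mean predictor standing in for a general-purpose data miner, and the `noise_removed` metric are all assumptions made for illustration. The sketch perturbs a sensitive numeric attribute with Laplace noise of scale sensitivity/ε, then estimates each sensitive value from the perturbed values of records that share the same non-sensitive attributes.

```python
import random
import statistics
from collections import defaultdict


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def perturb(values, sensitivity, epsilon, rng):
    """Add Laplace noise with scale sensitivity/epsilon to each value."""
    scale = sensitivity / epsilon
    return [v + laplace_noise(scale, rng) for v in values]


def group_mean_estimates(keys, perturbed):
    """Toy stand-in for a data miner (assumption): predict each sensitive
    value as the mean of the perturbed values of all records that share
    the same non-sensitive attributes."""
    groups = defaultdict(list)
    for key, value in zip(keys, perturbed):
        groups[key].append(value)
    means = {key: statistics.fmean(vals) for key, vals in groups.items()}
    return [means[key] for key in keys]


def noise_removed(true, perturbed, estimates):
    """Fraction of the total absolute noise removed (assumed metric)."""
    before = sum(abs(p - t) for p, t in zip(perturbed, true))
    after = sum(abs(e - t) for e, t in zip(estimates, true))
    return 1.0 - after / before


def run_demo(epsilon, n=5000, seed=0):
    """Perturb synthetic data and report the fraction of noise removed."""
    rng = random.Random(seed)
    # Hypothetical records: (education level, age band) -> income-like value.
    keys = [(rng.randrange(5), rng.randrange(4)) for _ in range(n)]
    true = [20.0 + 10.0 * e + 5.0 * a + rng.gauss(0.0, 2.0) for e, a in keys]
    perturbed = perturb(true, sensitivity=100.0, epsilon=epsilon, rng=rng)
    estimates = group_mean_estimates(keys, perturbed)
    return noise_removed(true, perturbed, estimates)


if __name__ == "__main__":
    for eps in (1.0, 0.1, 0.01):
        print(f"epsilon={eps}: removed {run_demo(eps):.1%} of the noise")
```

Because the added noise has zero mean, averaging over similar records cancels much of it, while the correlation between non-sensitive and sensitive attributes supplies the signal. The attack described above uses trained data miners and fusion of their outputs in place of this simple group mean.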


Keywords: Data Mining; External Knowledge; External Data; Fusion Algorithm; Data Owner
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  1. UNESCO Chair in Data Privacy, Department of Computer Engineering and Maths, Universitat Rovira i Virgili, Tarragona, Spain
  2. Department of Applied Informatics, Faculty of Electrical Engineering and Information Technology, Slovak University of Technology, Bratislava, Slovakia
