Suppressing Microdata to Prevent Probabilistic Classification Based Inference

  • Ayça Azgın Hintoğlu
  • Yücel Saygın
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3674)

Abstract

Enterprises collect data for many reasons, including better customer relationship management and high-level decision making. Public safety is another motivation for the large-scale data collection efforts initiated by government agencies. However, such widespread data collection, coupled with powerful data analysis tools, has raised concerns about privacy, because the collected data may contain confidential information or may be used to infer it. One method to ensure privacy is to selectively hide confidential data values in the data set to be disclosed. However, with data mining technology an adversary can now predict the hidden values, which is another threat to privacy. In this paper we concentrate on probabilistic classification, a data mining technique widely used for prediction, and propose methods for downgrading probabilistic classification models in order to block the inference of hidden microdata values.
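
The inference threat described in the abstract can be illustrated with a small sketch. It is not the authors' method: it assumes a naive Bayes classifier with Laplace smoothing as one example of a probabilistic classifier, and the records, attribute names (age, smoker, diagnosis), and function names are hypothetical. An adversary trains on records whose confidential value is disclosed and then estimates the value that was hidden in another record.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, target):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(row[target] for row in rows)
    value_counts = defaultdict(Counter)  # (attribute, class) -> value counts
    domains = defaultdict(set)           # attribute -> observed values
    for row in rows:
        for attr, value in row.items():
            if attr == target:
                continue
            value_counts[(attr, row[target])][value] += 1
            domains[attr].add(value)
    return class_counts, value_counts, domains

def posterior(model, known, alpha=1.0):
    """Return P(class | known attributes) under the naive Bayes assumption."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    scores = {}
    for cls, n_cls in class_counts.items():
        score = n_cls / total  # class prior
        for attr, value in known.items():
            count = value_counts[(attr, cls)][value]
            # Laplace smoothing (assumed here) avoids zero probabilities.
            score *= (count + alpha) / (n_cls + alpha * len(domains[attr]))
        scores[cls] = score
    norm = sum(scores.values())
    return {cls: s / norm for cls, s in scores.items()}

# Hypothetical disclosed records; "diagnosis" is the confidential attribute.
rows = [
    {"age": "young", "smoker": "no", "diagnosis": "benign"},
    {"age": "young", "smoker": "no", "diagnosis": "benign"},
    {"age": "old", "smoker": "yes", "diagnosis": "malignant"},
    {"age": "old", "smoker": "yes", "diagnosis": "malignant"},
    {"age": "old", "smoker": "no", "diagnosis": "benign"},
]
model = train_naive_bayes(rows, target="diagnosis")

# A record whose diagnosis was suppressed; the other attributes remain visible.
probs = posterior(model, {"age": "old", "smoker": "yes"})
guess = max(probs, key=probs.get)
print(guess, probs)
```

In this toy data the visible attributes alone make one diagnosis much more probable than the other, so hiding only the confidential value does not protect it. Blocking this kind of inference would presumably also require suppressing some of the remaining values until the classifier's prediction is no longer confident.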

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Ayça Azgın Hintoğlu (1)
  • Yücel Saygın (1)
  1. Faculty of Engineering and Natural Sciences, Tuzla, Sabancı University, Istanbul, Turkey
