Privacy Preserving Distributed Data Mining with Evolutionary Computing

  • Lambodar Jena
  • Narendra Ku. Kamila
  • Sushruta Mishra
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 247)

Abstract

Publishing data about individuals without revealing sensitive information about them is an important problem. Distributed data mining applications use sensitive data from distributed databases held by different parties. This conflicts directly with an individual’s need for, and right to, privacy. It is therefore important to develop adequate security techniques that protect the privacy of the individual values used for data mining. Here, we study how to maintain privacy in distributed data mining. That is, we study how two (or more) parties can find frequent itemsets in a distributed database without revealing each party’s portion of the data to the others. In this paper, we consider a privacy-preserving naive Bayes classifier for horizontally partitioned distributed data and propose the data mining privacy by decomposition (DMPD) method, which uses a genetic algorithm to search for an optimal feature set partitioning with respect to classification accuracy and k-anonymity constraints.
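To make the k-anonymity constraint on a feature set partitioning concrete, the following is a minimal Python sketch. It is not the DMPD procedure itself, whose details are not given in this abstract. It assumes that a partition satisfies the constraint when, for each feature subset taken separately, every combination of values occurs in at least k records. The function names, the toy records, and the attribute names are all hypothetical.

```python
from collections import Counter


def is_k_anonymous(rows, features, k):
    """Return True if every value combination over `features`
    occurs in at least k rows (trivially True for no rows)."""
    counts = Counter(tuple(row[f] for f in features) for row in rows)
    return all(c >= k for c in counts.values())


def partition_satisfies_k(rows, partition, k):
    """Assumed constraint: each feature subset in the partition
    must be k-anonymous when projected on its own."""
    return all(is_k_anonymous(rows, subset, k) for subset in partition)


# Hypothetical toy records with three quasi-identifier attributes.
rows = [
    {"age": "30-39", "zip": "751*", "sex": "F"},
    {"age": "30-39", "zip": "751*", "sex": "M"},
    {"age": "30-39", "zip": "752*", "sex": "F"},
    {"age": "30-39", "zip": "752*", "sex": "M"},
]

whole = [["age", "zip", "sex"]]           # all features in one subset
split = [["age", "zip"], ["age", "sex"]]  # features decomposed into two subsets

whole_ok = partition_satisfies_k(rows, whole, k=2)
split_ok = partition_satisfies_k(rows, split, k=2)
```

In this toy case the undecomposed feature set fails 2-anonymity because each full record is unique, while the two-subset decomposition passes. A genetic algorithm such as the one the abstract mentions could use a check of this kind as a feasibility test alongside a classification accuracy score, although how the paper combines the two is not stated here.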

Keywords

Distributed database, privacy, data mining, classification, k-anonymity

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Lambodar Jena (1, 2)
  • Narendra Ku. Kamila (3)
  • Sushruta Mishra (1)
  1. Department of Computer Science & Engineering, Gandhi Engineering College, Bhubaneswar, India
  2. Department of Computer Science & Engineering, Utkal University, Bhubaneswar, India
  3. Department of Computer Science & Engineering, C.V. Raman College of Engineering, Bhubaneswar, India
