Approximate Privacy-Preserving Data Mining on Vertically Partitioned Data

  • Robert Nix
  • Murat Kantarcioglu
  • Keesook J. Han
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7371)

Abstract

In today’s increasingly digital world, data privacy has become more and more important. Researchers have developed many privacy-preserving technologies, particularly for data mining and data sharing. These technologies can compute exact data mining models from private data without revealing that data, but they are generally slow. We therefore present a framework for implementing efficient, secure, privacy-preserving approximations of data mining tasks. In particular, we implement two sketching protocols for the scalar (dot) product of two vectors, which can be used as sub-protocols in larger data mining tasks. These protocols can yield approximations with high accuracy, low data leakage, and an efficiency improvement of one to two orders of magnitude. We demonstrate these accuracy and efficiency results through extensive experiments. We also analyze the security properties of these approximations under a security definition that, in contrast to previous definitions, allows for very efficient approximation protocols.
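The abstract does not spell out the two sketching protocols. As a rough illustration of the general idea only, and not the authors' construction, the following is a minimal Python/NumPy sketch of a generic random-projection estimate of a scalar product. It assumes both parties agree on a seed and regenerate the same Achlioptas-style matrix of ±1 entries, so the matrix is never transmitted. All names and parameters (`shared_projection`, `local_sketch`, `estimate_dot`, `seed`, `k`) are hypothetical. The sketch covers accuracy and compression only. It includes none of the paper's security analysis, and a party that knows the matrix and receives a raw sketch `R @ x` learns `k` linear equations in the other party's private vector.

```python
import numpy as np


def shared_projection(seed: int, k: int, n: int) -> np.ndarray:
    """Hypothetical helper: k x n matrix of independent +/-1 entries.

    Both parties regenerate the same matrix from an agreed seed, so the
    matrix itself never has to be sent.
    """
    rng = np.random.default_rng(seed)
    return rng.choice([-1.0, 1.0], size=(k, n))


def local_sketch(vec: np.ndarray, seed: int, k: int) -> np.ndarray:
    """Compress a private length-n vector to k numbers, computed by its owner."""
    R = shared_projection(seed, k, vec.shape[0])
    return R @ vec


def estimate_dot(sketch_a: np.ndarray, sketch_b: np.ndarray) -> float:
    """Unbiased estimate of a.b, since E[(Ra).(Rb)] = k * (a.b) for +/-1 entries."""
    k = sketch_a.shape[0]
    return float(sketch_a @ sketch_b) / k


if __name__ == "__main__":
    # Hypothetical vertically partitioned data: n transactions; party A holds
    # the 0/1 column for item A, party B holds the 0/1 column for item B.
    # The support count of the itemset {A, B} is exactly their dot product.
    rng = np.random.default_rng(0)
    n, k, seed = 2000, 1000, 42
    item_a = (rng.random(n) < 0.3).astype(float)
    item_b = item_a * (rng.random(n) < 0.9)  # item B mostly co-occurs with A

    exact = float(item_a @ item_b)
    approx = estimate_dot(local_sketch(item_a, seed, k),
                          local_sketch(item_b, seed, k))
    print(f"exact support = {exact:.0f}, sketch estimate = {approx:.1f}")
```

The estimate averages `k` independent terms, so its standard deviation shrinks roughly as 1/√k. The parameter `k` therefore trades accuracy against communication, because each party sends `k` numbers instead of `n`.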

Keywords

Data Mining Association Rule Mining Random Projection Privacy Preserve Data Mining Task 
(These keywords were added by machine, not by the authors.)

Copyright information

© IFIP International Federation for Information Processing 2012

Authors and Affiliations

  • Robert Nix (1)
  • Murat Kantarcioglu (1)
  • Keesook J. Han (2)
  1. Jonsson School of Engineering and Computer Science, The University of Texas at Dallas, Richardson, USA
  2. Air Force Research Laboratory, Information Directorate, Rome, USA

Personalised recommendations