Distributed and Parallel Databases, Volume 32, Issue 1, pp 5–35

A generic and distributed privacy preserving classification method with a worst-case privacy guarantee

  • Madhushri Banerjee (Email author)
  • Zhiyuan Chen
  • Aryya Gangopadhyay
Article

Abstract

Organizations often need to perform data mining tasks collaboratively without giving up their own data. This need has led to the development of privacy preserving distributed data mining. Several protocols exist for data mining in a distributed scenario, but most of them handle only a single data mining task. Therefore, if the participating parties are interested in more than one classification method, they must run a separate distributed protocol for each method, which increases the overhead substantially. A second significant drawback of existing methods is that they are often quite expensive because they rely on encryption operations. This paper proposes a method that addresses both issues and provides a generic approach to efficient privacy preserving classification analysis in a distributed setting with a worst-case privacy guarantee. The experimental results demonstrate the effectiveness of this method.

Keywords

Data mining · Privacy preserving data mining · Classification

Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  • Madhushri Banerjee (1), Email author
  • Zhiyuan Chen (1)
  • Aryya Gangopadhyay (1)
  1. Department of Information Systems, University of Maryland Baltimore County, Baltimore, USA
