Abstract
Privacy is a critical requirement in distributed data mining. Cryptography-based secure multiparty computation is a main approach to privacy preservation, but it performs poorly in large-scale distributed systems. Data perturbation techniques are comparatively efficient, but they are mainly used in centralized privacy-preserving data mining (PPDM). In this paper, we propose a lightweight anonymous data perturbation method for efficient privacy preservation in distributed data mining. We first define the privacy constraints for data-perturbation-based PPDM in a semi-honest distributed environment. To address these constraints and to protect the data statistics and the randomization process against collusion attacks, we propose two protocols: the adaptive privacy-preserving summary protocol and the anonymous exchange protocol. Finally, we propose a distributed data perturbation framework based on these protocols to realize distributed PPDM. Experimental results show that our approach achieves a high security level and is very efficient in a large-scale distributed environment.
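As a rough illustration of the general data perturbation idea referred to above, the following minimal sketch assumes plain additive zero-mean Gaussian noise applied independently at each site. It is not the adaptive privacy-preserving summary protocol or the anonymous exchange protocol proposed in the paper, and it offers no protection against collusion. The function names (`perturb`, `pooled_mean`, `demo`), the noise model, and all parameter values are hypothetical choices made only for illustration.

```python
import random
import statistics


def perturb(values, sigma, rng):
    """Return a copy of `values` with independent zero-mean Gaussian noise added.

    Assumption: simple additive randomization, chosen for illustration only.
    """
    return [v + rng.gauss(0.0, sigma) for v in values]


def pooled_mean(site_releases):
    """Mean over all records released by all sites."""
    pooled = [x for release in site_releases for x in release]
    return statistics.fmean(pooled)


def demo(seed=0, n_sites=5, records_per_site=2000, sigma=10.0):
    """Compare the mean of the raw data with the mean of the perturbed releases."""
    rng = random.Random(seed)
    # Hypothetical private data: each site holds numeric values in [20, 60].
    sites = [
        [rng.uniform(20, 60) for _ in range(records_per_site)]
        for _ in range(n_sites)
    ]
    # Each site perturbs its own records locally and releases only the noisy copy.
    releases = [perturb(site, sigma, rng) for site in sites]
    return pooled_mean(sites), pooled_mean(releases)


if __name__ == "__main__":
    true_mean, estimated_mean = demo()
    print(f"mean of raw data:       {true_mean:.3f}")
    print(f"mean of perturbed data: {estimated_mean:.3f}")
```

Because the noise has zero mean, its contribution to the pooled average shrinks as the number of records grows, so an aggregate statistic can be estimated from the released data even though each individual record is masked. This is the efficiency argument for perturbation in its simplest form; the privacy constraints and collusion resistance discussed in the abstract require the paper's own protocols.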
Additional information
Project supported by the National Natural Science Foundation of China (Nos. 60772098 and 60672068) and the New Century Excellent Talents in University of China (No. NCET-06-0393).
Cite this article
Li, F., Ma, J. & Li, Jh. Distributed anonymous data perturbation method for privacy-preserving data mining. J. Zhejiang Univ. Sci. A 10, 952–963 (2009). https://doi.org/10.1631/jzus.A0820320
DOI: https://doi.org/10.1631/jzus.A0820320