Abstract
This paper proposes a scalable, local, privacy-preserving algorithm for distributed Peer-to-Peer (P2P) data aggregation, a building block for many advanced data mining and analysis tasks such as average/sum computation, decision tree induction, and feature selection. Unlike most multi-party privacy-preserving data mining algorithms, the approach works asynchronously through local interactions and is highly scalable. Specifically, it addresses the distributed computation of the sum of a set of numbers stored at different peers in a P2P network, in the context of a P2P web mining application. The proposed optimization-based privacy-preserving technique for computing the sum allows different peers to specify different privacy requirements without having to adhere to a global set of parameters for the chosen privacy model. Since distributed sum computation is a frequently used primitive, the proposed approach is likely to have a significant impact on many data mining tasks such as multi-party privacy-preserving clustering, frequent itemset mining, and statistical aggregate computation.
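To make the sum primitive concrete, the following is a minimal sketch of the classic ring-based secure-sum idea, simulated in a single process. It is not the optimization-based algorithm proposed in the paper; it only illustrates how a sum can be computed without any peer revealing its raw value to its neighbor. The function name `ring_secure_sum`, the `modulus` parameter, and the example values are assumptions made for illustration, and the sketch assumes non-negative integer inputs whose total is smaller than `modulus`.

```python
import random


def ring_secure_sum(values, modulus=2**32, rng=None):
    """Simulate a ring-based secure sum over the peers' private values.

    Assumptions (illustrative, not from the paper): values are
    non-negative integers and their total is smaller than `modulus`.
    """
    if not values:
        raise ValueError("need at least one peer value")
    rng = rng or random.Random()

    # The initiating peer hides its value behind a random mask.
    mask = rng.randrange(modulus)
    running = (mask + values[0]) % modulus

    # Each remaining peer sees only the masked running total,
    # adds its own value, and forwards the result around the ring.
    for v in values[1:]:
        running = (running + v) % modulus

    # The initiator removes its mask to recover the true sum.
    return (running - mask) % modulus


if __name__ == "__main__":
    peer_values = [3, 5, 7]  # hypothetical private values, one per peer
    print(ring_secure_sum(peer_values))  # prints 15
```

In this simple scheme every peer follows the same global protocol, which is the kind of uniform parameterization the proposed approach is designed to relax by letting peers state their own privacy requirements.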
Additional information
A shorter version of this paper was published at the IEEE P2P’09 conference. This work was supported by AFOSR MURI grant 2009-11.
About this article
Cite this article
Das, K., Bhaduri, K. & Kargupta, H. Multi-objective optimization based privacy preserving distributed data mining in Peer-to-Peer networks. Peer-to-Peer Netw. Appl. 4, 192–209 (2011). https://doi.org/10.1007/s12083-010-0075-1
DOI: https://doi.org/10.1007/s12083-010-0075-1