Peer-to-Peer Networking and Applications, Volume 4, Issue 2, pp 192–209

Multi-objective optimization based privacy preserving distributed data mining in Peer-to-Peer networks

Abstract

This paper proposes a scalable, local privacy-preserving algorithm for distributed Peer-to-Peer (P2P) data aggregation useful for many advanced data mining/analysis tasks such as average/sum computation, decision tree induction, feature selection, and more. Unlike most multi-party privacy-preserving data mining algorithms, this approach works in an asynchronous manner through local interactions and it is highly scalable. It particularly deals with the distributed computation of the sum of a set of numbers stored at different peers in a P2P network in the context of a P2P web mining application. The proposed optimization-based privacy-preserving technique for computing the sum allows different peers to specify different privacy requirements without having to adhere to a global set of parameters for the chosen privacy model. Since distributed sum computation is a frequently used primitive, the proposed approach is likely to have significant impact on many data mining tasks such as multi-party privacy-preserving clustering, frequent itemset mining, and statistical aggregate computation.
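To make the sum primitive mentioned in the abstract concrete, the sketch below shows a classic ring-based secure-sum scheme. It is not the optimization-based algorithm proposed in this paper, and it does not model per-peer privacy requirements or asynchronous local interactions. The function name `ring_secure_sum`, the `modulus` parameter, and the assumption that peers hold non-negative integers whose total is below `modulus` are all illustrative choices, not taken from the article.

```python
import random


def ring_secure_sum(values, modulus=2**32, rng=None):
    """Sum the integers held by peers arranged in a ring (illustrative only).

    Assumptions (not from the article): each entry of `values` is one peer's
    private non-negative integer, and the true total is smaller than `modulus`.
    """
    if not values:
        raise ValueError("at least one peer is required")
    rng = rng or random.Random()

    # The initiating peer hides its own value behind a random mask.
    mask = rng.randrange(modulus)
    running = (mask + values[0]) % modulus

    # Each subsequent peer only sees a masked running total and adds its value.
    for value in values[1:]:
        running = (running + value) % modulus

    # The token returns to the initiator, which removes the mask.
    return (running - mask) % modulus


if __name__ == "__main__":
    print(ring_secure_sum([3, 5, 7, 11]))
```

In this simplified scheme no peer sees another peer's raw value, only a masked partial total. It assumes peers do not collude and that the ring is fixed, which is exactly the kind of global, synchronous assumption the abstract says the proposed approach avoids.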

Keywords

Privacy preserving, Data mining, Peer-to-Peer

Copyright information

© Springer Science + Business Media, LLC 2010

Authors and Affiliations

  • Kamalika Das (1)
  • Kanishka Bhaduri (2)
  • Hillol Kargupta (3, 4)

  1. Stinger Ghaffarian Technologies Inc., NASA Ames Research Center, Moffett Field, USA
  2. Mission Critical Technologies Inc., NASA Ames Research Center, Moffett Field, USA
  3. CSEE Dept., University of Maryland, Baltimore County, USA
  4. AGNIK LLC, Columbia, USA
