Asynchronous Peer-to-Peer Data Mining with Stochastic Gradient Descent

  • Róbert Ormándi
  • István Hegedűs
  • Márk Jelasity
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6852)

Abstract

Fully distributed data mining algorithms build global models over large amounts of data distributed over a large number of peers in a network, without moving the data itself. In the area of peer-to-peer (P2P) networks, such algorithms have various applications in P2P social networking and in trackerless BitTorrent communities. The difficulty lies in obtaining good-quality models with an affordable communication complexity while assuming as little as possible about the communication model. Here we describe a conceptually simple yet powerful generic approach for designing efficient, fully distributed, asynchronous, local algorithms for learning models of fully distributed data. The key idea is that many models perform a random walk over the network while being gradually adjusted to fit the data they encounter, using a stochastic gradient descent search. We demonstrate our approach by implementing the support vector machine (SVM) method and by experimentally evaluating its performance in various failure scenarios over different benchmark datasets. Our algorithm scheme can implement a wide range of machine learning methods in an extremely robust manner.
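
The key idea can be illustrated with a minimal sketch. It is not the implementation evaluated in the paper. It assumes a linear SVM trained with a Pegasos-style primal sub-gradient step, one labelled example per peer, a single walking model rather than many, and a peer sampling service that returns uniformly random peers. The names `sgd_step`, `random_walk_train`, `predict` and the parameter `lam` are hypothetical and chosen only for illustration.

```python
import random


def sgd_step(w, t, x, y, lam=0.01):
    """One Pegasos-style sub-gradient step on a single example (x, y).

    Assumption: linear SVM with hinge loss and L2 regularisation.
    Returns the updated weight vector and the updated step counter.
    """
    t += 1
    eta = 1.0 / (lam * t)  # decreasing learning rate
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    # Shrink the weights (gradient of the regularisation term).
    w = [(1.0 - eta * lam) * wi for wi in w]
    # If the example violates the margin, move towards classifying it.
    if margin < 1.0:
        w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w, t


def random_walk_train(peers, steps, lam=0.01, seed=0):
    """Let one model walk over the peers, updating it at each visit.

    `peers` is a list of (features, label) pairs, one per simulated peer
    (an assumption of this sketch). The raw data never leaves a peer;
    only the model (w, t) is passed along.
    """
    rng = random.Random(seed)
    w, t = [0.0] * len(peers[0][0]), 0
    current = rng.randrange(len(peers))
    for _ in range(steps):
        x, y = peers[current]
        w, t = sgd_step(w, t, x, y, lam)
        # Hop to a uniformly random peer (stand-in for peer sampling).
        current = rng.randrange(len(peers))
    return w


def predict(w, x):
    """Classify x as +1 or -1 with the linear model w."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1


if __name__ == "__main__":
    toy_peers = [([1.0, 2.0], 1), ([2.0, 1.5], 1),
                 ([-1.0, -2.0], -1), ([-2.0, -1.0], -1)]
    model = random_walk_train(toy_peers, steps=200)
    print([predict(model, x) for x, _ in toy_peers])
```

In this sketch only the weight vector and its step counter move between peers, which mirrors the requirement that the data itself is not moved. Running several such walks in parallel, and handling message loss or peer churn, is left out.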



Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Róbert Ormándi (1)
  • István Hegedűs (1)
  • Márk Jelasity (2)
  1. University of Szeged, Hungary
  2. University of Szeged and Hungarian Academy of Sciences, Hungary
