Compact Samples for Data Dissemination

  • Tova Milo
  • Assaf Sagi
  • Elad Verbin
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4353)


We consider data dissemination in a peer-to-peer network, where each user wishes to obtain some subset of the available information objects. In most modern algorithms for such data dissemination, the users periodically obtain samples of peer IDs (possibly with some summary of their content). They then use the samples to connect to other peers and download data pieces from them. For a set O of information objects, we call a sample of peers containing at least k possible providers for each object o ∈ O a k-sample.
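The k-sample definition above can be illustrated with a small membership check. This is a minimal sketch, not code from the paper: the function name `is_k_sample` and the `providers` mapping (peer ID to the set of objects that peer can provide) are hypothetical names introduced only for illustration.

```python
from typing import Dict, Hashable, Iterable, Set


def is_k_sample(
    sample: Iterable[Hashable],
    providers: Dict[Hashable, Set[Hashable]],
    objects: Iterable[Hashable],
    k: int,
) -> bool:
    """Return True if `sample` contains at least k providers of every object.

    `providers` maps a peer ID to the set of objects it can provide
    (a hypothetical representation chosen for this sketch).
    """
    chosen = set(sample)  # a peer listed twice still counts once
    for obj in objects:
        count = sum(1 for peer in chosen if obj in providers.get(peer, set()))
        if count < k:
            return False
    return True


# Toy example: three peers, two objects.
providers = {"p1": {"a", "b"}, "p2": {"a"}, "p3": {"b"}}
print(is_k_sample(["p1", "p2"], providers, {"a", "b"}, k=1))
print(is_k_sample(["p1", "p2"], providers, {"a", "b"}, k=2))
```

In the toy example, the sample {p1, p2} covers both objects once, but object "b" has only one provider in it, so it is a 1-sample and not a 2-sample.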

To balance the load, the k-samples should be fair, in the sense that for every object, its providers should appear in the sample with equal probability. Also, since most algorithms send fresh samples frequently, the k-samples should be as small as possible to minimize communication overhead. In this paper we describe two novel techniques for generating fair and small k-samples in a P2P setting. The first is based on a particular usage of uniform sampling and has the advantage that it can build on standard P2P uniform sampling tools. The second is based on non-uniform sampling and requires more care, but is guaranteed to generate the smallest possible fair k-sample. Both algorithms exploit available dependencies between information objects to reduce the sample size, and are shown, both theoretically and experimentally, to be extremely effective.
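The fairness requirement can be made concrete with a plain fixed-size uniform sample. This is only a baseline sketch and not either of the paper's two techniques: drawing m of n peers uniformly without replacement includes every peer with probability m/n, so all providers of any given object are equally likely to appear, but nothing guarantees that the result is a k-sample or that it is small. The function name `uniform_peer_sample` and its parameters are hypothetical.

```python
import random
from typing import Hashable, Iterable, List, Optional


def uniform_peer_sample(
    peers: Iterable[Hashable],
    m: int,
    rng: Optional[random.Random] = None,
) -> List[Hashable]:
    """Draw m distinct peers uniformly at random.

    Every peer is included with probability m / n, so the providers of
    any single object appear with equal probability (fair), but the
    sample may contain fewer than k providers for some object.
    """
    pool = list(peers)
    if m < 0 or m > len(pool):
        raise ValueError("m must be between 0 and the number of peers")
    if rng is None:
        rng = random.Random()
    return rng.sample(pool, m)


# Hypothetical usage: pick 2 of 4 peers with a fixed seed for repeatability.
print(uniform_peer_sample(["p1", "p2", "p3", "p4"], 2, random.Random(0)))
```

A caller would still need to check the drawn sample against the k-sample condition, and choosing m large enough to satisfy it for rare objects is exactly what makes naive uniform samples large.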


Keywords: Data Dissemination, Information Object, Uniform Sampling, Compact Sample, Object Node
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Tova Milo (1)
  • Assaf Sagi (1)
  • Elad Verbin (1)
  1. School of Computer Science, Tel Aviv University
