Dealing with network partitions in structured overlay networks

Abstract

Structured overlay networks form a major class of peer-to-peer systems, which are touted for their ability to scale, tolerate failures, and self-manage. Any long-lived Internet-scale distributed system is destined to face network partitions. Although the problem of network partitions and mergers is closely related to fault tolerance and self-management in large-scale systems, it has hardly been studied in the context of structured peer-to-peer systems. These systems have mainly been studied under churn (frequent joins and failures), which as a side effect covers network partitions, since a partition resembles a massive node failure. Yet the crucial aspect of network mergers has been ignored. In fact, it has been claimed that ring-based structured overlay networks, which constitute the majority of structured overlays, are intrinsically ill-suited for merging rings. In this paper, we present an algorithm for merging multiple similar ring-based overlays when the underlying network merges. We examine the solution in dynamic conditions, showing that it is resilient to churn during the merger, something widely believed to be difficult or impossible. We evaluate the algorithm in various scenarios and show that even when a merger is falsely detected, the algorithm terminates quickly and does not clutter the network with many messages. The algorithm is flexible, as the tradeoff between message complexity and time complexity can be adjusted by a parameter.

Keywords

DHTs, Network partitions, Network mergers, Structured overlay networks, Loopy rings, Distributed hash tables

Copyright information

© Springer Science + Business Media, LLC 2009

Authors and Affiliations

  1. KTH - Royal Institute of Technology, Kista, Sweden
  2. Swedish Institute of Computer Science (SICS), Kista, Sweden
