Efficient Synchronization of Replicated Data in Distributed Systems
We present nsync, a tool for synchronizing large replicated data sets in distributed systems. nsync computes near-optimal synchronization plans using a hierarchy of gossip algorithms that take the network topology into account. Our primary design goals were maximum performance and maximum scalability. We achieved these goals by exploiting parallelism in both the planning and the synchronization phases, by omitting the transfer of unnecessary metadata, by synchronizing at the block level rather than the file level, and by using sophisticated compression methods. With its relaxed consistency semantics, nsync needs neither a master copy nor a quorum for updating distributed replicas. Each replica is kept as an autonomous entity and can be modified with the usual tools.
Keywords: Replicate Data, Synchronization Process, Broadcast Tree, Proxy Node, Storage Resource Broker
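To illustrate the general idea behind synchronizing at the block level rather than the file level, the following minimal sketch compares two replicas block by block and copies only the blocks whose digests differ. It is not nsync's actual algorithm: the fixed block size, the use of SHA-256, and the function names `block_digests`, `changed_blocks`, and `sync_blocks` are all assumptions made for illustration.

```python
import hashlib
from typing import List

BLOCK_SIZE = 4  # hypothetical block size, kept tiny so the example is readable


def block_digests(data: bytes, block_size: int = BLOCK_SIZE) -> List[bytes]:
    """Split data into fixed-size blocks and return one SHA-256 digest per block."""
    return [
        hashlib.sha256(data[i:i + block_size]).digest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(source: bytes, target: bytes,
                   block_size: int = BLOCK_SIZE) -> List[int]:
    """Return indices of source blocks that are missing or different in target."""
    src = block_digests(source, block_size)
    dst = block_digests(target, block_size)
    return [i for i, d in enumerate(src) if i >= len(dst) or d != dst[i]]


def sync_blocks(source: bytes, target: bytes,
                block_size: int = BLOCK_SIZE) -> bytes:
    """Bring target up to date with source by copying only the changed blocks."""
    blocks = [target[i:i + block_size] for i in range(0, len(target), block_size)]
    # Indices are ascending, so missing trailing blocks are appended in order.
    for i in changed_blocks(source, target, block_size):
        chunk = source[i * block_size:(i + 1) * block_size]
        if i < len(blocks):
            blocks[i] = chunk
        else:
            blocks.append(chunk)
    # Drop trailing blocks that no longer exist in the source.
    n_source_blocks = (len(source) + block_size - 1) // block_size
    return b"".join(blocks[:n_source_blocks])


if __name__ == "__main__":
    old = b"aaaaXXXXcccc"
    new = b"aaaabbbbcccc"
    print(changed_blocks(new, old))   # only the middle block differs
    print(sync_blocks(new, old) == new)
```

In this sketch only one of three blocks would need to be transferred, whereas file-level synchronization would resend the whole file. A real tool would exchange the digests over the network rather than hold both replicas in one process.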