Disaster-Tolerant Storage with SDN

  • Vincent Gramoli
  • Guillaume Jourjon
  • Olivier Mehani
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9466)

Abstract

Cloud services are becoming centralized at several geo-replicated datacentres. These services replicate data within a single datacentre to tolerate isolated failures. Unfortunately, the effects of a disaster cannot be avoided, as existing approaches migrate a copy of data to backup datacentres only after data have been stored at a primary datacentre. Upon disaster, all data not yet migrated can be lost.

In this paper, we propose and implement SDN-KVS, a disaster-tolerant key-value store, which provides strong disaster resilience by replicating data before storing. To this end, SDN-KVS features a novel communication primitive, SDN-cast, that leverages software-defined networking (SDN) in two ways: it offers an SDN-multicast primitive to replicate critical update request flows and an SDN-anycast primitive to redirect request flows to the closest available datacentre. Our performance evaluation indicates that SDN-KVS ensures no data loss and that traffic gets redirected across long-distance key-value store replicas within 30 s after a datacentre outage.
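The two roles of SDN-cast described above can be illustrated with a small in-process model. This is only a sketch under stated assumptions, not the authors' implementation: it involves no SDN controller or switches, and the names `Datacentre`, `SdnCast`, `multicast_put`, `anycast_get`, and the `distance_ms` field are hypothetical. It shows an update being copied to every reachable datacentre before it is acknowledged, and a request being served by the closest datacentre that is still available.

```python
from dataclasses import dataclass, field


@dataclass
class Datacentre:
    """Hypothetical stand-in for one key-value store replica site."""
    name: str
    distance_ms: float  # assumed latency from the client, used to pick the closest site
    up: bool = True
    store: dict = field(default_factory=dict)


class SdnCast:
    """Toy model of the two primitives; no real network is involved."""

    def __init__(self, datacentres):
        self.datacentres = list(datacentres)

    def multicast_put(self, key, value):
        """Copy the update to every reachable datacentre, then acknowledge."""
        reached = [dc for dc in self.datacentres if dc.up]
        if not reached:
            raise RuntimeError("no datacentre available")
        for dc in reached:
            dc.store[key] = value
        return [dc.name for dc in reached]

    def anycast_get(self, key):
        """Serve the request from the closest datacentre that is still up."""
        available = [dc for dc in self.datacentres if dc.up]
        if not available:
            raise RuntimeError("no datacentre available")
        closest = min(available, key=lambda dc: dc.distance_ms)
        return closest.name, closest.store.get(key)


def demo():
    primary = Datacentre("primary", distance_ms=5.0)
    backup = Datacentre("backup", distance_ms=50.0)
    net = SdnCast([primary, backup])
    net.multicast_put("k", "v")    # replicated before being acknowledged
    before = net.anycast_get("k")  # the closest site answers
    primary.up = False             # simulated disaster at the closest site
    after = net.anycast_get("k")   # redirected to the remaining site
    return before, after
```

In this model the update written before the simulated outage is still readable afterwards because it reached both sites at write time, which is the property the abstract attributes to replicating before storing.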

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Vincent Gramoli (1, 2)
  • Guillaume Jourjon (1)
  • Olivier Mehani (1)

  1. NICTA, Sydney, Australia
  2. University of Sydney, Sydney, Australia
