FITCH: Supporting Adaptive Replicated Services in the Cloud

  • Vinicius V. Cogo
  • André Nogueira
  • João Sousa
  • Marcelo Pasin
  • Hans P. Reiser
  • Alysson Bessani
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7891)

Abstract

Although cloud computing offers highly dynamic resource provisioning, there is a general lack of support for managing dynamic adaptations of replicated services in the cloud. Where such support exists, it focuses mainly on elasticity through horizontal scalability. We analyse the benefits a replicated service may obtain from dynamic adaptations in the cloud and the requirements on the replication system. For example, adaptation can increase or decrease the capacity of a service, move service replicas closer to their clients, obtain diversity in the replication (for resilience), recover compromised replicas, or rejuvenate ageing replicas. We introduce FITCH, a novel infrastructure to support dynamic adaptation of replicated services in cloud environments. Two prototype services validate this architecture: a crash fault-tolerant Web service and a Byzantine fault-tolerant key-value store based on state machine replication.

Copyright information

© IFIP International Federation for Information Processing 2013

Authors and Affiliations

  • Vinicius V. Cogo (1)
  • André Nogueira (1)
  • João Sousa (1)
  • Marcelo Pasin (2)
  • Hans P. Reiser (3)
  • Alysson Bessani (1)

  1. Faculty of Sciences, University of Lisbon, Portugal
  2. Faculty of Science, University of Neuchatel, Switzerland
  3. Institute of IT-Security and Security Law, University of Passau, Germany
