Transparent State Machine Replication for Kubernetes

  • Felipe Borges
  • Luis Pacheco
  • Eduardo Alchieri
  • Marcos F. Caetano
  • Priscila Solis
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 926)


State Machine Replication (SMR) is an approach widely used to implement fault-tolerant systems. In this approach, servers are replicated and client requests are deterministically executed in the same order by all replicas. Virtualization can be seen as a technique that favors the development of fault-tolerant applications, since it provides an architecture that isolates virtual machines or containers. To support the development of fault-tolerant virtualized applications, this work proposes an architecture that provides SMR for applications virtualized in containers managed by Kubernetes. Transparency is the main design principle of the proposed architecture: applications are still developed as in the traditional non-replicated approach, and end users access the system in the traditional way. The open-source BFT-SMaRt SMR library was used to implement a prototype of the proposed architecture and a key-value store service. Experiments conducted with this service show the practical behavior of the proposed solutions.
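
The core SMR property stated above (every replica deterministically executes the same requests in the same order) can be illustrated with a minimal sketch. It is not the architecture proposed in the paper and does not use the BFT-SMaRt API: the names `KeyValueReplica` and `replicate` are hypothetical, and the total order that a real system obtains from an agreement protocol is simply assumed to be given as a Python list.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class KeyValueReplica:
    """One replica of a toy key-value store (hypothetical, not the BFT-SMaRt API)."""
    store: dict = field(default_factory=dict)

    def execute(self, command: tuple) -> Optional[str]:
        # Commands are deterministic: the reply depends only on the
        # current state and on the command itself.
        op, key, *rest = command
        if op == "put":
            previous = self.store.get(key)
            self.store[key] = rest[0]
            return previous
        if op == "get":
            return self.store.get(key)
        if op == "remove":
            return self.store.pop(key, None)
        raise ValueError(f"unknown operation: {op}")


def replicate(ordered_log, n_replicas=4):
    """Apply the same ordered log to every replica and return the replicas.

    Assumption: `ordered_log` is already totally ordered; a real SMR
    library would establish this order with an agreement protocol.
    """
    replicas = [KeyValueReplica() for _ in range(n_replicas)]
    for command in ordered_log:
        replies = [replica.execute(command) for replica in replicas]
        # Same state + same command + determinism => identical replies.
        assert all(reply == replies[0] for reply in replies)
    return replicas


log = [("put", "x", "1"), ("put", "y", "2"), ("put", "x", "3"), ("remove", "y")]
replicas = replicate(log)
final_states = [replica.store for replica in replicas]
```

Because each replica starts from the same state and applies the same deterministic commands in the same order, all entries of `final_states` are equal. A client can therefore accept a reply once enough replicas agree on it.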



This work was partially supported by RNP/CTIC (Brazil) through projects ATMOSPHERE and P4Sec.



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Felipe Borges (1)
  • Luis Pacheco (1)
  • Eduardo Alchieri (1)
  • Marcos F. Caetano (1)
  • Priscila Solis (1)
  1. Department of Computer Science, University of Brasilia, Brasília, Brazil
