Testing and Verifying Chain Repair Methods for Corfu Using Stateless Model Checking

  • Stavros Aronis
  • Scott Lystig Fritchie
  • Konstantinos Sagonas
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10510)

Abstract

Corfu is a distributed shared log that is designed to be scalable and reliable in the presence of failures and asynchrony. Internally, Corfu is fully replicated for fault tolerance, without sharding data or sacrificing strong consistency. In this case study, we present the modeling approaches we followed to test and verify, using Concuerror, the correctness of repair methods for the Chain Replication protocol suitable for Corfu. In the first two methods we tried, Concuerror located bugs quickly. In contrast, the tool found no bugs in the third method, but the time this exploration took motivated an improvement in the tool that reduces the number of traces explored. In addition to detailing the above, we present experiences and lessons learned from applying stateless model checking to verify complex protocols suitable for distributed programming.

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Stavros Aronis (1)
  • Scott Lystig Fritchie (2)
  • Konstantinos Sagonas (1)

  1. Department of Information Technology, Uppsala University, Uppsala, Sweden
  2. VMware, Cambridge, USA
