Scalable Dynamic Partial Order Reduction

  • Jiri Simsa
  • Randy Bryant
  • Garth Gibson
  • Jason Hickey
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7687)


Systematic testing, first demonstrated in small, specialized cases 15 years ago, has matured sufficiently for large-scale systems developers to begin to put it into practice. With actual deployment come new, pragmatic challenges to the usefulness of the techniques. In this paper we are concerned with scaling dynamic partial order reduction, a key technique for mitigating the state space explosion problem, to very large clusters. In particular, we present a new approach for distributed dynamic partial order reduction. Unlike previous work, our approach is based on a novel exploration algorithm that 1) enables trading space complexity for parallelism, 2) achieves efficient load-balancing through time-slicing, 3) provides for fault tolerance, which we consider a mandatory aspect of scalability, 4) scales to more than a thousand parallel workers, and 5) is guaranteed to avoid redundant exploration of overlapping portions of the state space.


Model Checker Fault Tolerance Time Budget Concurrent Program Symbolic Execution 
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.


  1. 1.
    Agha, G.: Actors: A Model of Concurrent Computation in Distributed Systems. MIT Press, Cambridge (1986)Google Scholar
  2. 2.
    Barnat, J., Brim, L., Češka, M., Ročkai, P.: DiVinE: Parallel Distributed Model Checker (Tool paper). In: HiBi/PDMC 2010, pp. 4–7. IEEE (2010)Google Scholar
  3. 3.
    Bucur, S., Ureche, V., Zamfir, C., Candea, G.: Parallel symbolic execution for automated real-world software testing. In: EuroSys 2011, pp. 183–198 (2011)Google Scholar
  4. 4.
    Chang, F., Dean, J., Ghemawat, S., Hsieh, W.C., Wallach, D.A., Burrows, M., Chandra, T., Fikes, A., Gruber, R.E.: BigTable: A distributed storage system for structured data. In: OSDI 2006, pp. 205–218 (2006)Google Scholar
  5. 5.
    Flanagan, C., Godefroid, P.: Dynamic Partial Order Reduction for Model Checking Software. SIGPLAN Not. 40(1), 110–121 (2005)CrossRefGoogle Scholar
  6. 6.
    Ghemawat, S., Gobioff, H., Leung, S.: The Google file system. SIGOPS Oper. Syst. Rev. 37(5), 29–43 (2003)CrossRefGoogle Scholar
  7. 7.
    Godefroid, P.: Partial-Order Methods for the Verification of Concurrent Systems - An Approach to the State-Explosion Problem. Springer (1996)Google Scholar
  8. 8.
    Godefroid, P.: Model Checking for Programming Languages using VeriSoft. In: POPL 1997, pp. 174–186. ACM (1997)Google Scholar
  9. 9.
    Gueta, G., Flanagan, C., Yahav, E., Sagiv, M.: Cartesian Partial-Order Reduction. In: Bošnački, D., Edelkamp, S. (eds.) SPIN 2007. LNCS, vol. 4595, pp. 95–112. Springer, Heidelberg (2007)CrossRefGoogle Scholar
  10. 10.
    Guo, H., Wu, M., Zhou, L., Hu, G., Yang, J., Zhang, L.: Practical software model checking via dynamic interface reduction. In: SOSP 2011, pp. 265–278. ACM, New York (2011)Google Scholar
  11. 11.
    Holzmann, G.J., Joshi, R., Groce, A.: Swarm Verification Techniques. IEEE Transactions on Software Engineering 37, 845–857 (2011)CrossRefGoogle Scholar
  12. 12.
    Killian, C.E., Anderson, J.W., Jhala, R., Vahdat, A.: Life, Death, and the Critical Transition: Finding Liveness Bugs in Systems Code. In: NSDI 2007 (2007)Google Scholar
  13. 13.
    Lamport, L.: Time, Clocks, and the Ordering of Events in a Distributed System. Commun. ACM 21(7), 558–565 (1978)zbMATHCrossRefGoogle Scholar
  14. 14.
    Musuvathi, M., Qadeer, S., Ball, T., Basler, G., Nainar, P.A., Neamtiu, I.: Finding and Reproducing Heisenbugs in Concurrent Programs. In: OSDI 2008, pp. 267–280 (2008)Google Scholar
  15. 15.
    Simsa, J., Bryant, R., Gibson, G., Hickey, J.: Efficient Exploratory Testing of Concurrent Systems. CMU-PDL Technical Report, 113 (November 2011)Google Scholar
  16. 16.
    Simsa, J., Gibson, G., Bryant, R.: dBug: Systematic Evaluation of Distributed Systems. In: SSV 2010 (2010)Google Scholar
  17. 17.
    Stern, U., Dill, D.L.: Parallelizing the MurPhi Verifier. Formal Methods in System Design 18(2), 117–129 (2001)CrossRefGoogle Scholar
  18. 18.
    Vakkalanka, S.S., Sharma, S., Gopalakrishnan, G., Kirby, R.M.: ISP: A tool for model checking MPI programs. In: PPoPP 2008, pp. 285–286 (2008)Google Scholar
  19. 19.
    Yang, J., Chen, T., Wu, M., Xu, Z., Liu, X., Lin, H., Yang, M., Long, F., Zhang, L., Zhou, L.: MoDist: Transparent Model Checking of Unmodified Distributed Systems. In: NSDI 2009, pp. 213–228 (April 2009)Google Scholar
  20. 20.
    Yang, Y., Chen, X., Gopalakrishnan, G.: Inspect: A Runtime Model Checker for Multithreaded C Programs. University of Utah Tech. Report, UUCS-08-004 (2008)Google Scholar
  21. 21.
    Yang, Y., Chen, X., Gopalakrishnan, G., Kirby, R.M.: Distributed Dynamic Partial Order Reduction Based Verification of Threaded Software. In: Bošnački, D., Edelkamp, S. (eds.) SPIN 2007. LNCS, vol. 4595, pp. 58–75. Springer, Heidelberg (2007)CrossRefGoogle Scholar
  22. 22.
    Yang, Y., Chen, X., Gopalakrishnan, G., Kirby, R.M.: Efficient Stateful Dynamic Partial Order Reduction. In: Havelund, K., Majumdar, R. (eds.) SPIN 2008. LNCS, vol. 5156, pp. 288–305. Springer, Heidelberg (2008)CrossRefGoogle Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Jiri Simsa
    • 1
  • Randy Bryant
    • 1
  • Garth Gibson
    • 1
  • Jason Hickey
    • 2
  1. 1.Carnegie Mellon UniversityPittsburghUSA
  2. 2.Google, Inc.Mountain ViewUSA

Personalised recommendations