Evaluation of model checkers by verifying message passing programs


Benchmarks and evaluation are important for the development of verification techniques and tools. However, few studies evaluate model checkers against large-scale benchmarks, mainly because existing model checkers accept different input languages and building models requires intensive manual labor. In this study, we present a large-scale benchmark for evaluating model checkers whose inputs are concurrent models. The benchmark consists of 2318 models generated automatically from real-world message passing interface (MPI) programs. We have inspected the complexities of the models and found them to be well distributed and thus suitable for evaluating model checkers. Based on the benchmark, we have evaluated five state-of-the-art model checkers, i.e., PAT, FDR, Spin, PRISM, and NuSMV, by verifying the deadlock freedom property. The evaluation results demonstrate the differences in ability and performance of these model checkers when verifying message passing programs.
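To illustrate the deadlock freedom property that the benchmark targets, the following is a minimal sketch in CSPm, the machine-readable CSP dialect accepted by FDR. The channel names and process structure are illustrative assumptions, not taken from the benchmark's generated models. Two processes each insist on sending before receiving, mirroring two MPI ranks that both call a blocking send before their matching receive:

```
-- Illustrative CSPm model (FDR input); not from the benchmark itself.
channel a, b : {0..1}

P = a!0 -> b?x -> SKIP   -- sends on a, then waits to receive on b
Q = b!1 -> a?y -> SKIP   -- sends on b, then waits to receive on a

-- The processes synchronise on both channels, so neither send can
-- complete and the composition deadlocks immediately.
SYSTEM = P [| {| a, b |} |] Q

-- FDR deadlock freedom check; this assertion fails, exposing the deadlock.
assert SYSTEM :[deadlock free [F]]
```

Reordering either process so that one receives first (e.g., Q = a?y -> b!1 -> SKIP) removes the circular wait and makes the assertion pass.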






This work was supported by the National Key R&D Program of China (Grant No. 2017YFB1001802) and the National Natural Science Foundation of China (Grant Nos. 61472440, 61632015, 61690203, 61532007).

Author information



Corresponding authors

Correspondence to Zhenbang Chen or Ji Wang.


Cite this article

Hong, W., Chen, Z., Yu, H. et al. Evaluation of model checkers by verifying message passing programs. Sci. China Inf. Sci. 62, 200101 (2019). https://doi.org/10.1007/s11432-018-9825-3



Keywords

  • model checker
  • evaluation
  • benchmark
  • MPI
  • symbolic execution