Software Verification: Testing vs. Model Checking

A Comparative Evaluation of the State of the Art
  • Dirk Beyer
  • Thomas Lemberger
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10629)

Abstract

In practice, software testing has long been the established method for finding bugs in programs. But in the last 15 years, software model checking has received a lot of attention, and many successful tools for software model checking exist today. We believe it is time for a careful comparative evaluation of automatic software testing against automatic software model checking. We chose six existing tools for automatic test-case generation, namely AFL-fuzz, CPATiger, Crest-ppc, FShell, Klee, and PRtest, and four tools for software model checking, namely Cbmc, CPA-Seq, Esbmc-incr, and Esbmc-kInd, and gave them the task of finding specification violations in a large benchmark suite of 5 693 C programs. To perform this evaluation, we implemented a framework for test-based falsification (tbf) that executes and validates the test cases produced by the test-case generation tools in order to find errors in the programs. The conclusion of our experiments is that software model checkers (i) find a substantially larger number of bugs, (ii) need less time to do so, and (iii) require less adjustment to the input programs.
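
As an illustration of the test-based falsification approach described in the abstract, the following is a minimal sketch, not the authors' tbf implementation. It assumes the SV-COMP convention that the program under test signals a specification violation by calling __VERIFIER_error(), and that a test-case generator has already produced test vectors as plain-text files. The file names (harness.c, *.txt), the stdin encoding of test inputs, and all helper functions are hypothetical.

#!/usr/bin/env python3
"""Minimal sketch of test-based falsification (illustrative, not the authors' tbf).

Hypothetical assumptions:
  * the program under test follows the SV-COMP convention and calls
    __VERIFIER_error() when the specification is violated;
  * harness.c defines __VERIFIER_error() as abort() and serves the values
    returned by __VERIFIER_nondet_*() from stdin;
  * each test vector produced by the test-case generator is a text file
    with one input value per line.
"""
import signal
import subprocess
import sys
from pathlib import Path
from typing import Optional


def compile_with_harness(program: Path, harness: Path, executable: Path) -> None:
    """Compile the program under test together with the test harness."""
    subprocess.run(["gcc", str(program), str(harness), "-o", str(executable)],
                   check=True)


def violates_spec(executable: Path, test_vector: Path) -> bool:
    """Run one test vector; the harness aborts iff __VERIFIER_error() is reached."""
    try:
        with test_vector.open("rb") as inputs:
            result = subprocess.run([str(executable)], stdin=inputs,
                                    capture_output=True, timeout=30)
    except subprocess.TimeoutExpired:
        return False  # a hanging test tells us nothing about the specification
    # A process killed by SIGABRT has a negative return code on POSIX systems.
    return result.returncode == -signal.SIGABRT


def falsify(program: Path, harness: Path, test_dir: Path) -> Optional[Path]:
    """Return the first test vector that exposes a specification violation."""
    executable = Path.cwd() / "program_under_test"
    compile_with_harness(program, harness, executable)
    for test_vector in sorted(test_dir.glob("*.txt")):
        if violates_spec(executable, test_vector):
            return test_vector
    return None


if __name__ == "__main__":
    # Usage: falsify.py <program.c> <harness.c> <directory with test vectors>
    witness = falsify(Path(sys.argv[1]), Path(sys.argv[2]), Path(sys.argv[3]))
    print(f"violation witnessed by {witness}" if witness else "no violation found")

In practice, such a framework also has to translate each generator's native test-case format into a common representation and provide a matching harness; the sketch sidesteps that step by assuming one fixed plain-text format for all test vectors.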

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. LMU Munich, Munich, Germany
