Abstract
Program verifiers are not exempt from the bugs that affect nearly every piece of software. In addition, they often exhibit brittle behavior: their performance changes considerably with details of how the input program is expressed—details that should be irrelevant, such as the order of independent declarations. Such a lack of robustness frustrates users, who have to spend considerable time discovering a tool’s idiosyncrasies before they can use it effectively. This paper introduces a lightweight, fully automated technique to detect lack of robustness in program verifiers, based on testing methods such as mutation testing and metamorphic testing. The key idea is to generate many simple variants of a program that initially passes verification. All variants are, by construction, equivalent to the original program; thus, any variant that fails verification indicates a lack of robustness in the verifier. We implemented our technique in a tool called \(\mu\)gie, which operates on programs written in the popular Boogie language for verification—used as an intermediate representation in numerous program verifiers. Experiments targeting 135 Boogie programs indicate that brittle behavior occurs fairly frequently (in 16 programs) and is not hard to trigger. Based on these results, the paper discusses the main sources of brittle behavior and suggests means of improving robustness.
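To make the abstract’s idea concrete, the following is a minimal sketch of our own (not one of the paper’s benchmark programs; all identifiers are hypothetical): a seed Boogie program that a verifier should accept, together with an equivalence-preserving mutant that merely reorders two independent top-level declarations. Since a Boogie program’s semantics does not depend on the order of its top-level declarations, any difference in verification outcome between the two versions signals brittleness.

// seed.bpl -- should verify, e.g., with: boogie seed.bpl
const N: int;
axiom N >= 0;

procedure Sum() returns (s: int)
  ensures s >= 0;
{
  var i: int;
  i := 0;
  s := 0;
  while (i < N)
    invariant 0 <= i && s >= 0;
  {
    s := s + i;
    i := i + 1;
  }
}

// mutant.bpl -- identical, except the two top-level declarations are
// swapped (axiom N >= 0; now precedes const N: int;), a change that is
// semantically irrelevant in Boogie. A robust verifier accepts both
// versions; a verdict that flips on the mutant indicates brittleness.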
Notes
- 1. In this paper, the term “verification” also designates validation techniques such as testing.
- 2.
- 3. By an anonymous reviewer of FM 2018.
- 4. [6] describes some experiments with seeds that fail verification. Unsurprisingly, random mutations are unlikely to turn an unverified program into a verified one—therefore, the main paper focuses on using verified programs as seeds.
- 5. See Sect. 5 for a discussion of how robustness testing differs from traditional mutation testing.
- 6. For clarity, we initially focus on Boogie 4.5.0, and later discuss differences with other versions.
- 7. Additionally, Why3 times out on 51 mutants of 2 seeds in group S; this seems to reflect an ineffective translation performed by b2w [1] rather than brittleness of Why3.
References
Ameri, M., Furia, C.A.: Why just Boogie? In: Ábrahám, E., Huisman, M. (eds.) IFM 2016. LNCS, vol. 9681, pp. 79–95. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-33693-0_6
AutoProof verified code repository. http://tiny.cc/autoproof-repo
Barr, E.T., Harman, M., McMinn, P., Shahbaz, M., Yoo, S.: The oracle problem in software testing: a survey. IEEE Trans. Softw. Eng. 41(5), 507–525 (2015)
Chen, T.Y., Cheung, S.C., Yiu, S.M.: Metamorphic testing: a new approach for generating next test cases. Technical Report HKUST-CS98-01, Department of Computer Science, Hong Kong University of Science and Technology (1998)
Chen, Y.T., Furia, C.A.: Triggerless happy. In: Polikarpova, N., Schneider, S. (eds.) IFM 2017. LNCS, vol. 10510, pp. 295–311. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66845-1_19
Chen, Y.T., Furia, C.A.: Robustness testing of intermediate verifiers. http://arxiv.org/abs/1805.03296 (2018)
Claessen, K., Hughes, J.: QuickCheck: a lightweight tool for random testing of Haskell programs. In: ICFP, pp. 268–279. ACM (2000)
Dafny examples and tests. https://github.com/Microsoft/dafny/tree/master/Test
Filliâtre, J.-C., Paskevich, A.: Why3—where programs meet provers. In: Felleisen, M., Gardner, P. (eds.) ESOP 2013. LNCS, vol. 7792, pp. 125–128. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-37036-6_8
Furia, C.A., Meyer, B., Velder, S.: Loop invariants: analysis, classification, and examples. ACM Comput. Surv. 46(3) (2014)
Furia, C.A., Nordio, M., Polikarpova, N., Tschannen, J.: AutoProof: auto-active functional verification of object-oriented programs. STTT 19(6), 697–716 (2016)
Godefroid, P., Levin, M.Y., Molnar, D.A.: SAGE: whitebox fuzzing for security testing. Commun. ACM 55(3), 40–44 (2012)
Hawblitzel, C., Howell, J., Kapritsos, M., Lorch, J.R., Parno, B., Roberts, M.L., Setty, S.T.V., Zill, B.: IronFleet: proving practical distributed systems correct. In: SOSP, pp. 1–17. ACM (2015)
Hawblitzel, C., Howell, J., Lorch, J.R., Narayan, A., Parno, B., Zhang, D., Zill, B.: Ironclad Apps: end-to-end security via automated full-system verification. In: USENIX OSDI, pp. 165–181. USENIX Association (2014)
Hierons, R.M., et al.: Using formal specifications to support testing. ACM Comput. Surv. 41(2), 9:1–9:76 (2009)
Jia, Y., Harman, M.: An analysis and survey of the development of mutation testing. IEEE Trans. Softw. Eng. 37(5), 649–678 (2011)
Leino, K.R.M.: This is Boogie 2 (2008). http://goo.gl/QsH6g
Leino, K.R.M.: Dafny: an automatic program verifier for functional correctness. In: Clarke, E.M., Voronkov, A. (eds.) LPAR 2010. LNCS (LNAI), vol. 6355, pp. 348–370. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-17511-4_20
Leino, K.R.M., Pit-Claudel, C.: Trigger selection strategies to stabilize program verifiers. In: CAV, pp. 361–381. Springer, Berlin (2016)
Leroy, X.: Formal verification of a realistic compiler. Commun. ACM 52(7), 107–115 (2009)
Liew, D., Cadar, C., Donaldson, A.F.: Symbooglix: a symbolic execution engine for Boogie programs. In: ICST, pp. 45–56. IEEE Computer Society (2016)
McKeeman, W.M.: Differential testing for software. Digit. Tech. J. 10(1), 100–107 (1998)
\(\mu\)gie repository. https://emptylambda.github.io/mu-gie/
Pacheco, C., Lahiri, S.K., Ernst, M.D., Ball, T.: Feedback-directed random test generation. In: ICSE, pp. 75–84. IEEE Computer Society (2007)
Polikarpova, N., Furia, C.A., West, S.: To run what no one has run before: executing an intermediate verification language. In: Legay, A., Bensalem, S. (eds.) RV 2013. LNCS, vol. 8174, pp. 251–268. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-40787-1_15
Segura, S., Fraser, G., Sanchez, A.B., Ruiz-Cortés, A.: A survey on metamorphic testing. IEEE Trans. Softw. Eng. 42(9), 805–824 (2016)
Tange, O.: GNU parallel—the command-line power tool. Login: USENIX Mag. 36, 42–47 (2011)
Yang, X., Chen, Y., Eide, E., Regehr, J.: Finding and understanding bugs in C compilers. In: PLDI, pp. 283–294. ACM (2011)
Zeller, A., Hildebrandt, R.: Simplifying and isolating failure-inducing input. IEEE Trans. Softw. Eng. 28(2), 183–200 (2002)
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Chen, Y., Furia, C.A. (2018). Robustness Testing of Intermediate Verifiers. In: Lahiri, S., Wang, C. (eds.) Automated Technology for Verification and Analysis. ATVA 2018. Lecture Notes in Computer Science, vol. 11138. Springer, Cham. https://doi.org/10.1007/978-3-030-01090-4_6
DOI: https://doi.org/10.1007/978-3-030-01090-4_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-01089-8
Online ISBN: 978-3-030-01090-4
eBook Packages: Computer Science, Computer Science (R0)