A framework for testing first-order logic axioms in program verification

Ki Yung Ahn & Ewen Denney

Software Quality Journal 21, 159–200 (2013)

Abstract

Program verification systems based on automated theorem provers rely on user-provided axioms to verify domain-specific properties of code. However, formulating axioms correctly (that is, formalizing properties of an intended mathematical interpretation) is non-trivial in practice, and unsoundness can be difficult to avoid or even to detect. Moreover, judging the soundness of axioms from the output of the provers themselves is not easy, since they do not typically give counterexamples. We adopt the idea of model-based testing to aid axiom authors in discovering errors in axiomatizations. To test the validity of axioms, users define a computational model of the axiomatized logic by giving interpretations to the function symbols and constants in a simple declarative programming language. We have developed an axiom testing framework that helps automate model definition and test generation using off-the-shelf tools for meta-programming, property-based random testing, and constraint solving. We have experimented with our tool to test the axioms used in AutoCert, a program verification system that has been applied to verify aerospace flight code using a first-order axiomatization of navigational concepts, and were able to find counterexamples for a number of axioms.
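
To illustrate the idea concretely (this sketch is ours and purely illustrative, not the framework's generated code), an axiom that does not hold in the intended model can be exposed by interpreting its symbols over a concrete Haskell type and testing the resulting property with QuickCheck:

    import Test.QuickCheck

    -- A deliberately wrong "axiom" about integer division, interpreted over Int:
    -- for all X, Y with Y /= 0, (X div Y) * Y = X.
    prop_divMul :: Int -> Int -> Property
    prop_divMul x y =
      y /= 0 ==> (x `div` y) * y == x

    -- quickCheck prop_divMul reports a counterexample such as x = 1, y = 2,
    -- pointing the axiom author at the missing divisibility assumption.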


Notes

  1. The converse is not always true, however: provers can time out or the domain theory might be incomplete.

  2. In fact, we could even choose to define the interpretation using the actual libraries that will be used in the production implementation, either by calling a Haskell library directly or by using Haskell's Foreign Function Interface to call external libraries written in other languages such as C.
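
     As a generic illustration of the FFI route (not an interpretation taken from the paper), a C math routine can back an interpretation function directly:

        {-# LANGUAGE ForeignFunctionInterface #-}
        module CInterp where

        import Foreign.C.Types (CDouble)

        -- Bind the C library's cosine so that the interpretation of a logical
        -- symbol can delegate to production code.
        foreign import ccall unsafe "math.h cos" c_cos :: CDouble -> CDouble

        cosInterp :: Double -> Double
        cosInterp = realToFrac . c_cos . realToFrac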

  3. We generate properties as first-class Haskell values, which is different from writing an interpreter or evaluator over the syntax trees of first-order logic formulae. We use Template Haskell (Sheard and Peyton Jones 2002) to generate these properties as Haskell code at compile time from the input axiom formulae and the user-provided interpretation.
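
     As a rough illustration of the flavour of this (the quoted shape and the name roundTripProp are ours, not the framework's actual output or API), a Template Haskell function can return the code of a property as a quotation, which the compiler splices in as an ordinary first-class function:

        {-# LANGUAGE TemplateHaskell #-}
        module PropGen where

        import Language.Haskell.TH

        -- Given the Names of the interpretations of two unary symbols f and g,
        -- build the code of the property  \x -> f (g x) == x  at compile time.
        roundTripProp :: Name -> Name -> Q Exp
        roundTripProp f g = [| \x -> $(varE f) ($(varE g) x) == x |]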

  4. Not all axiom formulae that can be written in TPTP FOF syntax have a natural computational interpretation. In practice, however, most axioms that are actually used in software verification, or as constructive axioms for other domains, do have a computational interpretation. In Sect. 5, we discuss the class of axioms that our framework can test.

  5. We will write TPTP syntax (see http://www.cs.miami.edu/~tptp/TPTP/SyntaxBNF.html) in typewriter font, e.g., ![X,Y,Z]: ((X=Y & ~(Y=Z)) => X=Z). We use this plain-text TPTP syntax when we quote axiom formulas verbatim. When we discuss first-order formulae more conceptually or mathematically, such as when we describe algorithms on formulae, we use conventional math symbols for logical connectives, e.g., \( \forall X,Y,Z.\ ((X=Y \land \neg(Y=Z)) \Rightarrow X=Z) \). We use italic math font for metavariables ranging over formulae (P, Q, R, A, B, C, D) and terms (T).

  6. We intentionally used the same Haskell names as the logical symbol names (e.g., mapping "hi" to hi) for the sake of readability, but they do not have to be the same.

  7. We specify type annotations in TPTP definition comments, which are comments that start with either %$ (for line comments) or /*$ (for block comments). A type annotation list is simply a comma-delimited list of identifiers starting with a lowercase letter, enclosed in square brackets. The length of a type annotation list must be the same as the length of the list of universally quantified variables being annotated.

  8. http://www.tptp.org/.

  9. It is not enough simply to try to prove false, since different provers exploit inconsistency in different ways. Moreover, a logic can be consistent yet still be unsound with respect to a model.

  10. ? is the symbol for existential quantification in TPTP FOF syntax.

  11. In the actual implementation, we attach additional code that prints a debugging message when p fails (i.e., evaluates to False for a generated test case), in order to alert the user to vacuously true test cases.
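
     A hedged sketch of the effect, using QuickCheck's classify to label vacuous cases instead of the framework's actual debugging output (all names and the toy axiom are illustrative):

        import Test.QuickCheck

        -- Label the test cases whose premise p fails, so a run that only ever
        -- exercises the vacuous branch of the implication becomes visible.
        propWithVacuityCheck :: Int -> Int -> Property
        propWithVacuityCheck x y =
          let p = x /= 0                  -- premise of the axiom's implication
              q = (y * x) `div` x == y    -- conclusion
          in  classify (not p) "vacuous (premise false)" $
                property (not p || q)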

  12. The axiom mat_symm_trans happens to have duplicate variable names only in negative positions, but the testing framework alpha-renames any inner quantifications (including those in positive positions) whenever there are duplicate variable names.
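
     For concreteness, a simplified sketch of such a renaming pass over a toy formula type (our illustration of the general technique, not the framework's implementation; it handles only the connectives shown):

        import qualified Data.Set as Set

        -- A toy formula type; atoms apply a predicate symbol to variable names.
        data Formula
          = Forall String Formula
          | Exists String Formula
          | Imply  Formula Formula
          | Atom   String [String]
          deriving Show

        -- All variable names occurring anywhere in a formula.
        allVars :: Formula -> Set.Set String
        allVars (Forall x p) = Set.insert x (allVars p)
        allVars (Exists x p) = Set.insert x (allVars p)
        allVars (Imply p q)  = allVars p `Set.union` allVars q
        allVars (Atom _ vs)  = Set.fromList vs

        -- Rename any inner quantification whose variable duplicates an enclosing one.
        alphaRename :: Formula -> Formula
        alphaRename f0 = go Set.empty f0
          where
            avoid = allVars f0
            go seen (Forall x p) = quant Forall seen x p
            go seen (Exists x p) = quant Exists seen x p
            go seen (Imply p q)  = Imply (go seen p) (go seen q)
            go _    a@Atom{}     = a
            quant k seen x p
              | x `Set.member` seen =
                  let x' = fresh x
                  in  k x' (go (Set.insert x' seen) (rename x x' p))
              | otherwise = k x (go (Set.insert x seen) p)
            fresh x = head [ n | i <- [(1 :: Int) ..]
                               , let n = x ++ show i
                               , n `Set.notMember` avoid ]
            -- Substitute x' for occurrences of x free with respect to this binder.
            rename x x' (Forall y p) | y == x    = Forall y p
                                     | otherwise = Forall y (rename x x' p)
            rename x x' (Exists y p) | y == x    = Exists y p
                                     | otherwise = Exists y (rename x x' p)
            rename x x' (Imply p q)  = Imply (rename x x' p) (rename x x' q)
            rename x x' (Atom g vs)  = Atom g [ if v == x then x' else v | v <- vs ]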

  13. The define commands derived from type annotation comments were omitted from Fig. 2 for simplicity.

  14. We express both plain formulae and tagged formulae with a common parametrized data structure, using the well-known functional programming idiom called indirect composite (see http://haskell.org/haskellwiki/Indirect_composite).
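
     A minimal sketch of the idiom (our own illustration; the constructors and the tag type are not the framework's actual definitions):

        -- The recursive structure of formulae is factored into a functor whose
        -- recursive positions are a parameter r; tying the knot in different
        -- ways yields plain or tagged formulae from the same shape.
        newtype Fix f = In (f (Fix f))

        data FormulaF r
          = ForallF String r
          | ImplyF  r r
          | NotF    r
          | AtomF   String [String]

        -- Plain formulae: recursive positions are just formulae again.
        type Formula = Fix FormulaF

        -- Tagged formulae: every node carries an extra annotation of type tag.
        data Tagged tag f r = Tagged tag (f r)
        type TaggedFormula tag = Fix (Tagged tag FormulaF)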

  15. Here, we have chosen the names to coincide with the axiom names specified in the axiom file for clarity, but this is not necessary.

  16. We use Template Haskell to plug the interpretations of symbols, given as Haskell code, into the automatically derived properties, which are themselves first-class Haskell functions. This gives us the ability to run tests interactively and to evaluate both the derived properties and the interpretation functions in the interactive environment of GHC. However, programming with Template Haskell is more complicated than writing plain Haskell code in several ways because of its compile-time stage restriction. We hide such details behind a preprocessing macro.
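
     A hypothetical sketch of the plumbing this involves (module and function names are ours; roundTripProp is the illustrative generator sketched under footnote 3, and encode/decode stand in for user-supplied interpretations):

        {-# LANGUAGE TemplateHaskell #-}
        module Props where

        import Test.QuickCheck
        import PropGen (roundTripProp)  -- must come from a separately compiled module

        -- Hypothetical interpretations of two logical symbols.
        encode :: Int -> Int
        encode = (+ 1)

        decode :: Int -> Int
        decode = subtract 1

        -- The splice runs at compile time; the stage restriction means it may only
        -- use code imported from other modules, which is the plumbing the
        -- framework hides behind a preprocessing macro.
        prop_roundTrip :: Int -> Bool
        prop_roundTrip = $(roundTripProp 'decode 'encode)

        -- ghci> quickCheck prop_roundTrip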

  17. http://hackage.haskell.org/package/logic-TPTP.

  18. To be more precise, it is not just logical equivalence, since we also perform an unrolling transformation, which is only equivalent under the given interpretation.

References

  • Becker, M., & Smith, D. R. (2005). Model validation in Planware. In Verification and validation of model-based planning and scheduling systems (VVPS 2005), Monterey, CA, USA.

  • Berghofer, S., & Nipkow, T. (2004). Random testing in Isabelle/HOL. In 2nd IEEE international conference on software engineering and formal methods (SEFM 2004), pp. 230–239.

  • Blaine, L., Gilham, L., Liu, J., Smith, D., & Westfold, S. (1998). Planware: Domain-specific synthesis of high-performance schedulers. In The 13th IEEE international conference on automated software engineering (ASE ’98). IEEE Computer Society, Honolulu, Hawaii, USA, pp. 270–280.

  • Bradley, A. R., Manna, Z., & Sipma, H. B. (2006). What’s decidable about arrays? In E. A. Emerson & K. S. Namjoshi (Eds.), VMCAI, Springer, Lecture Notes in Computer Science, 3855, 427–442. http://dx.doi.org/10.1007/11609773_28.

  • Carlier, M., Dubois, C. (2008). Functional testing in the Focal environment. In B. Beckert & R. Hähnle (Eds.), The 2nd international conference on tests and proofs (TAP 2008) (Vol. 4966, pp. 84–98). Springer, Lecture Notes in Computer Science. http://dx.doi.org/10.1007/978-3-540-79124-9_7.

  • Claessen, K., & Hughes, J. (2000). QuickCheck: A lightweight tool for random testing of Haskell programs. In Proceedings of the ACM SIGPLAN international conference on functional programming, pp. 268–279.

  • Claessen, K., & Sutcliffe, G. (2009). A simple type system for FOF. http://www.cs.miami.edu/~tptp/TPTP/Proposals/TypedFOF.html.

  • Claessen, K., & Svensson, H. (2008). Finding counter examples in induction proofs. In The 2nd international conference on tests and proofs (TAP 2008), pp. 48–65.

  • Denney, E., & Fischer, B. (2008). Generating customized verifiers for automatically generated code. In Proceedings of the conference on generative programming and component engineering (GPCE ’08) (pp. 77–87). Nashville, TN: ACM Press.

  • Denney, E., & Trac, S. (2008). A software safety certification tool for automatically generated guidance, navigation and control code. In: IEEE aerospace conference.

  • Dutertre, B., & de Moura, L. (2006). The YICES SMT solver. Tool paper at http://yices.csl.sri.com/tool-paper.pdf.

  • Dybjer, P., Haiyan, Q., & Takeyama, M. (2003). Combining testing and proving in dependent type theory. In 16th International conference on theorem proving in higher order logics (TPHOLs 2003) (pp. 188–203). New York: Springer.

  • Fontaine, P. (2007). Combinations of theories and the Bernays-Schönfinkel-Ramsey class. In B. Beckert (Ed.), VERIFY, CEUR-WS.org, CEUR Workshop Proceedings, Vol. 259. http://ceur-ws.org/Vol-259/paper06.pdf.

  • Green, C. (1969). The application of theorem proving to question-answering systems. PhD thesis, Stanford University.

  • Kuipers, J. B. (1999). Quaternions and rotation sequences. Princeton: Princeton University Press.

  • McCarthy, J., & Painter, J. (1967). Correctness of a compiler for arithmetic expressions. In J. T. Schwartz (Ed.), Mathematical aspects of computer science, Proceedings of Symposia in Applied Mathematics (Vol. 19, pp. 33–41). Providence, RI: American Mathematical Society.

  • Paulson, L., & Nipkow, T. (1994). Isabelle: A generic theorem prover. Lecture Notes in Computer Science (Vol. 828). New York: Springer.

  • Pérez, J. A. N., & Voronkov, A. (2007). Encodings of problems in effectively propositional logic. In J. Marques-Silva & K. A. Sakallah (Eds.), SAT, Springer, Lecture Notes in Computer Science (Vol. 4501, p. 3). http://dx.doi.org/10.1007/978-3-540-72788-0_2.

  • Piskac, R., de Moura, L. M., & Bjørner, N. (2010). Deciding effectively propositional logic using DPLL and substitution sets. Journal of Automated Reasoning, 44(4), 401–424. http://dx.doi.org/10.1007/s10817-009-9161-6.

  • Sheard, T., & Peyton Jones, S. (2002). Template metaprogramming for Haskell. In: ACM SIGPLAN Haskell workshop 02 (pp. 1–16). New York: ACM Press.

  • Sutcliffe, G. (2000). System description: SystemOnTPTP. In 17th international conference on automated deduction (CADE 2000) (Vol. 1831, pp. 406–410). Springer, Lecture Notes in Computer Science.

  • Sutcliffe, G. (2009). The TPTP problem library and associated infrastructure: The FOF and CNF parts, v3.5.0. Journal of Automated Reasoning, 43(4), 337–362.

  • Sutcliffe, G., Denney, E., & Fischer, B. (2005). Practical proof checking for program certification. In: Proceedings of the CADE-20 workshop on empirically successful classical automated reasoning (ESCAR ’05).

  • Vallado, D. A. (2001). Fundamentals of astrodynamics and applications (2nd ed.). Torrance: Space Technology Library, Microcosm Press and Kluwer Academic Publishers.

  • Weyhrauch, R. (1980). Prolegomena to a theory of mechanized formal reasoning. Artificial Intelligence, 13(1–2), 133–170.

Author information

Correspondence to Ki Yung Ahn.

About this article

Cite this article

Ahn, K.Y., Denney, E. A framework for testing first-order logic axioms in program verification. Software Qual J 21, 159–200 (2013). https://doi.org/10.1007/s11219-011-9168-1

Download citation

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11219-011-9168-1

Keywords

Navigation