A framework for testing first-order logic axioms in program verification

Ki Yung Ahn & Ewen Denney

Software Quality Journal 21, 159–200 (2013)

Abstract

Program verification systems based on automated theorem provers rely on user-provided axioms to verify domain-specific properties of code. However, formulating axioms correctly (that is, formalizing properties of an intended mathematical interpretation) is non-trivial in practice, and unsoundness can be difficult to avoid or even to detect. Moreover, judging the soundness of axioms from the output of the provers themselves is not easy, since they do not typically give counterexamples. We adopt the idea of model-based testing to aid axiom authors in discovering errors in axiomatizations. To test the validity of axioms, users define a computational model of the axiomatized logic by giving interpretations to the function symbols and constants in a simple declarative programming language. We have developed an axiom testing framework that helps automate model definition and test generation using off-the-shelf tools for meta-programming, property-based random testing, and constraint solving. We have experimented with our tool to test the axioms used in AutoCert, a program verification system that has been applied to verify aerospace flight code using a first-order axiomatization of navigational concepts, and were able to find counterexamples for a number of axioms.
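
To illustrate the idea concretely (this sketch is ours and purely illustrative, not the framework's generated code), an axiom that does not hold in the intended model can be exposed by interpreting its symbols over a concrete Haskell type and testing the resulting property with QuickCheck:

    import Test.QuickCheck

    -- A deliberately wrong "axiom" about integer division, interpreted over Int:
    -- for all X, Y with Y /= 0, (X div Y) * Y = X.
    prop_divMul :: Int -> Int -> Property
    prop_divMul x y =
      y /= 0 ==> (x `div` y) * y == x

    -- quickCheck prop_divMul reports a counterexample such as x = 1, y = 2,
    -- pointing the axiom author at the missing divisibility assumption.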


Notes

  1. The converse is not always true, however: provers can time out or the domain theory might be incomplete.

  2. In fact, we could even choose to define the interpretation using the actual libraries that will be used in the production implementation, either by calling a Haskell library directly or by using Haskell's Foreign Function Interface to call external libraries written in other languages such as C.
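
     As a generic illustration of the FFI route (not an interpretation taken from the paper), a C math routine can back an interpretation function directly:

        {-# LANGUAGE ForeignFunctionInterface #-}
        module CInterp where

        import Foreign.C.Types (CDouble)

        -- Bind the C library's cosine so that the interpretation of a logical
        -- symbol can delegate to production code.
        foreign import ccall unsafe "math.h cos" c_cos :: CDouble -> CDouble

        cosInterp :: Double -> Double
        cosInterp = realToFrac . c_cos . realToFrac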

  3. We generate properties as first-class Haskell values, which is different from writing an interpreter or evaluator over the syntax trees of first-order logic formulae. We use Template Haskell (Sheard and Peyton Jones 2002) to generate these properties as Haskell code at compile time from the input axiom formulae and the user-provided interpretation.
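
     As a rough illustration of the flavour of this (the quoted shape and the name roundTripProp are ours, not the framework's actual output or API), a Template Haskell function can return the code of a property as a quotation, which the compiler splices in as an ordinary first-class function:

        {-# LANGUAGE TemplateHaskell #-}
        module PropGen where

        import Language.Haskell.TH

        -- Given the Names of the interpretations of two unary symbols f and g,
        -- build the code of the property  \x -> f (g x) == x  at compile time.
        roundTripProp :: Name -> Name -> Q Exp
        roundTripProp f g = [| \x -> $(varE f) ($(varE g) x) == x |]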

  4. Not all axiom formulae that can be written in TPTP FOF syntax have a natural computational interpretation. In practice, however, most axioms that are actually used in software verification, or as constructive axioms for other domains, do have a computational interpretation. In Sect. 5, we discuss the class of axioms that our framework can test.

  5. We will write TPTP syntax (see http://www.cs.miami.edu/~tptp/TPTP/SyntaxBNF.html) in typewriter font, e.g., ![X,Y,Z]: ((X=Y & ~(Y=Z)) => X=Z). We use this plain-text TPTP syntax when we quote axiom formulas verbatim. When we discuss first-order formulae more conceptually or mathematically, such as when we describe algorithms on formulae, we use conventional math symbols for logical connectives, e.g., \( \forall X,Y,Z.\ ((X=Y \land \neg(Y=Z)) \Rightarrow X=Z) \). We use italic math font for metavariables ranging over formulae (P, Q, R, A, B, C, D) and terms (T).

  6. We intentionally used the same Haskell names as the logical symbol names (e.g., mapping "hi" to hi) for the sake of readability, but they do not have to be the same.

  7. We specify type annotations in TPTP definition comments, which are comments that start with either %$ (for line comments) or /*$ (for block comments). A type annotation list is simply a comma-delimited list of identifiers starting with a lowercase letter, enclosed in square brackets. The length of a type annotation list must be the same as the length of the list of universally quantified variables being annotated.

  8. http://www.tptp.org/.

  9. It is not enough simply to try to prove false, since different provers exploit inconsistency in different ways. Moreover, a logic can be consistent yet still be unsound with respect to a model.

  10. ? is the symbol for existential quantification in TPTP FOF syntax.

  11. In the actual implementation, we attach additional code that prints a debugging message when p fails (i.e., evaluates to False for a generated test case), in order to alert the user to vacuously true test cases.
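
     A hedged sketch of the effect, using QuickCheck's classify to label vacuous cases instead of the framework's actual debugging output (all names and the toy axiom are illustrative):

        import Test.QuickCheck

        -- Label the test cases whose premise p fails, so a run that only ever
        -- exercises the vacuous branch of the implication becomes visible.
        propWithVacuityCheck :: Int -> Int -> Property
        propWithVacuityCheck x y =
          let p = x /= 0                  -- premise of the axiom's implication
              q = (y * x) `div` x == y    -- conclusion
          in  classify (not p) "vacuous (premise false)" $
                property (not p || q)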

  12. The axiom mat_symm_trans happens to have duplicate variable names only in negative positions, but the testing framework alpha-renames any inner quantifications (including those in positive positions) whenever there are duplicate variable names.
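
     For concreteness, a simplified sketch of such a renaming pass over a toy formula type (our illustration of the general technique, not the framework's implementation; it handles only the connectives shown):

        import qualified Data.Set as Set

        -- A toy formula type; atoms apply a predicate symbol to variable names.
        data Formula
          = Forall String Formula
          | Exists String Formula
          | Imply  Formula Formula
          | Atom   String [String]
          deriving Show

        -- All variable names occurring anywhere in a formula.
        allVars :: Formula -> Set.Set String
        allVars (Forall x p) = Set.insert x (allVars p)
        allVars (Exists x p) = Set.insert x (allVars p)
        allVars (Imply p q)  = allVars p `Set.union` allVars q
        allVars (Atom _ vs)  = Set.fromList vs

        -- Rename any inner quantification whose variable duplicates an enclosing one.
        alphaRename :: Formula -> Formula
        alphaRename f0 = go Set.empty f0
          where
            avoid = allVars f0
            go seen (Forall x p) = quant Forall seen x p
            go seen (Exists x p) = quant Exists seen x p
            go seen (Imply p q)  = Imply (go seen p) (go seen q)
            go _    a@Atom{}     = a
            quant k seen x p
              | x `Set.member` seen =
                  let x' = fresh x
                  in  k x' (go (Set.insert x' seen) (rename x x' p))
              | otherwise = k x (go (Set.insert x seen) p)
            fresh x = head [ n | i <- [(1 :: Int) ..]
                               , let n = x ++ show i
                               , n `Set.notMember` avoid ]
            -- Substitute x' for occurrences of x free with respect to this binder.
            rename x x' (Forall y p) | y == x    = Forall y p
                                     | otherwise = Forall y (rename x x' p)
            rename x x' (Exists y p) | y == x    = Exists y p
                                     | otherwise = Exists y (rename x x' p)
            rename x x' (Imply p q)  = Imply (rename x x' p) (rename x x' q)
            rename x x' (Atom g vs)  = Atom g [ if v == x then x' else v | v <- vs ]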

  13. The define commands derived from type annotation comments were omitted from Fig. 2 for simplicity.

  14. We express both plain formulae and tagged formulae with a common parametrized data structure, using the well-known functional programming idiom called indirect composite (see http://haskell.org/haskellwiki/Indirect_composite).
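
     A minimal sketch of the idiom (our own illustration; the constructors and the tag type are not the framework's actual definitions):

        -- The recursive structure of formulae is factored into a functor whose
        -- recursive positions are a parameter r; tying the knot in different
        -- ways yields plain or tagged formulae from the same shape.
        newtype Fix f = In (f (Fix f))

        data FormulaF r
          = ForallF String r
          | ImplyF  r r
          | NotF    r
          | AtomF   String [String]

        -- Plain formulae: recursive positions are just formulae again.
        type Formula = Fix FormulaF

        -- Tagged formulae: every node carries an extra annotation of type tag.
        data Tagged tag f r = Tagged tag (f r)
        type TaggedFormula tag = Fix (Tagged tag FormulaF)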

  15. Here, we have chosen the names to coincide with the axiom names specified in the axiom file for clarity, but this is not necessary.

  16. We use Template Haskell to plug the interpretations of symbols, given as Haskell code, into the automatically derived properties, which are themselves first-class Haskell functions. This gives us the ability to run tests interactively and to evaluate both the derived properties and the interpretation functions in the interactive environment of GHC. However, programming with Template Haskell is more complicated than writing plain Haskell code in several ways because of its compile-time stage restriction. We hide such details behind a preprocessing macro.
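
     A hypothetical sketch of the plumbing this involves (module and function names are ours; roundTripProp is the illustrative generator sketched under footnote 3, and encode/decode stand in for user-supplied interpretations):

        {-# LANGUAGE TemplateHaskell #-}
        module Props where

        import Test.QuickCheck
        import PropGen (roundTripProp)  -- must come from a separately compiled module

        -- Hypothetical interpretations of two logical symbols.
        encode :: Int -> Int
        encode = (+ 1)

        decode :: Int -> Int
        decode = subtract 1

        -- The splice runs at compile time; the stage restriction means it may only
        -- use code imported from other modules, which is the plumbing the
        -- framework hides behind a preprocessing macro.
        prop_roundTrip :: Int -> Bool
        prop_roundTrip = $(roundTripProp 'decode 'encode)

        -- ghci> quickCheck prop_roundTrip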

  17. http://hackage.haskell.org/package/logic-TPTP.

  18. To be more precise, it is not just logical equivalence, since we also perform an unrolling transformation, which is only equivalent under the given interpretation.

References

  • Becker, M., & Smith, D. R. (2005). Model validation in Planware. In Verification and validation of model-based planning and scheduling systems (VVPS 2005), Monterey, CA, USA.

  • Berghofer, S., & Nipkow, T. (2004). Random testing in Isabelle/HOL. In 2nd IEEE international conference on software engineering and formal methods (SEFM 2004), pp. 230–239.

  • Blaine, L., Gilham, L., Liu, J., Smith, D., & Westfold, S. (1998). Planware: Domain-specific synthesis of high-performance schedulers. In The 13th IEEE international conference on automated software engineering (ASE ’98). IEEE Computer Society, Honolulu, Hawaii, USA, pp. 270–280.

  • Bradley, A. R., Manna, Z., & Sipma, H. B. (2006). What’s decidable about arrays? In E. A. Emerson & K. S. Namjoshi (Eds.), VMCAI, Springer, Lecture Notes in Computer Science, 3855, 427–442. http://dx.doi.org/10.1007/11609773_28.

  • Carlier, M., Dubois, C. (2008). Functional testing in the Focal environment. In B. Beckert & R. Hähnle (Eds.), The 2nd international conference on tests and proofs (TAP 2008) (Vol. 4966, pp. 84–98). Springer, Lecture Notes in Computer Science. http://dx.doi.org/10.1007/978-3-540-79124-9_7.

  • Claessen, K., & Hughes, J. (2000). QuickCheck: A lightweight tool for random testing of Haskell programs. In Proceedings of the ACM SIGPLAN international conference on functional programming, pp. 268–279.

  • Claessen, K., & Sutcliffe, G. (2009). A simple type system for FOF. http://www.cs.miami.edu/~tptp/TPTP/Proposals/TypedFOF.html.

  • Claessen, K., & Svensson, H. (2008). Finding counter examples in induction proofs. In The 2nd international conference on tests and proofs (TAP 2008), pp. 48–65.

  • Denney, E., & Fischer, B. (2008). Generating customized verifiers for automatically generated code. In Proceedings of the conference on generative programming and component engineering (GPCE ’08) (pp. 77–87). Nashville, TN: ACM Press.

  • Denney, E., & Trac, S. (2008). A software safety certification tool for automatically generated guidance, navigation and control code. In: IEEE aerospace conference.

  • Dutertre, B., & de Moura, L. (2006). The YICES SMT solver. Tool paper at http://yices.csl.sri.com/tool-paper.pdf.

  • Dybjer, P., Haiyan, Q., & Takeyama, M. (2003). Combining testing and proving in dependent type theory. In 16th International conference on theorem proving in higher order logics (TPHOLs 2003) (pp. 188–203). New York: Springer.

  • Fontaine, P. (2007). Combinations of theories and the Bernays-Schönfinkel-Ramsey class. In B. Beckert (Ed.), VERIFY, CEUR-WS.org, CEUR Workshop Proceedings, Vol. 259. http://ceur-ws.org/Vol-259/paper06.pdf.

  • Green, C. (1969). The application of theorem proving to question-answering systems. PhD thesis, Stanford University.

  • Kuipers, J. B. (1999). Quaternions and rotation sequences. Princeton: Princeton University Press.

  • McCarthy, J., & Painter, J. (1967). Correctness of a compiler for arithmetic expressions. In J. T. Schwartz (Ed.), Mathematical aspects of computer science, Proceedings of Symposia in Applied Mathematics (Vol. 19, pp. 33–41). Providence, RI: American Mathematical Society.

  • Paulson, L., & Nipkow, T. (1994). Isabelle: A generic theorem prover. Lecture Notes in Computer Science (Vol. 828). New York: Springer.

  • Pérez, J. A. N., & Voronkov, A. (2007). Encodings of problems in effectively propositional logic. In J. Marques-Silva & K. A. Sakallah (Eds.), SAT, Springer, Lecture Notes in Computer Science (Vol. 4501, p. 3). http://dx.doi.org/10.1007/978-3-540-72788-0_2.

  • Piskac, R., de Moura, L. M., & Bjørner, N. (2010). Deciding effectively propositional logic using DPLL and substitution sets. Journal of Automated Reasoning, 44(4), 401–424. http://dx.doi.org/10.1007/s10817-009-9161-6.

  • Sheard, T., & Peyton Jones, S. (2002). Template metaprogramming for Haskell. In: ACM SIGPLAN Haskell workshop 02 (pp. 1–16). New York: ACM Press.

  • Sutcliffe, G. (2000). System description: SystemOnTPTP. In 17th international conference on automated deduction (CADE 2000) (Vol. 1831, pp. 406–410). Springer, Lecture Notes in Computer Science.

  • Sutcliffe, G. (2009). The TPTP problem library and associated infrastructure: The FOF and CNF parts, v3.5.0. Journal of Automated Reasoning, 43(4), 337–362.

  • Sutcliffe, G., Denney, E., & Fischer, B. (2005). Practical proof checking for program certification. In: Proceedings of the CADE-20 workshop on empirically successful classical automated reasoning (ESCAR ’05).

  • Vallado, D. A. (2001). Fundamentals of astrodynamics and applications (2nd ed.). Torrance: Space Technology Library, Microcosm Press and Kluwer Academic Publishers.

  • Weyhrauch, R. (1980). Prolegomena to a theory of mechanized formal reasoning. Artificial Intelligence, 13(1–2), 133–170.

Author information

Correspondence to Ki Yung Ahn.

About this article

Cite this article

Ahn, K.Y., Denney, E. A framework for testing first-order logic axioms in program verification. Software Qual J 21, 159–200 (2013). https://doi.org/10.1007/s11219-011-9168-1

Download citation

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11219-011-9168-1

Keywords

Navigation