Testing a Saturation-Based Theorem Prover: Experiences and Challenges

  • Giles Reger
  • Martin SudaEmail author
  • Andrei Voronkov
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10375)


This paper attempts to address the question of how best to assure the correctness of saturation-based automated theorem provers using our experience with developing the theorem prover Vampire. We describe the techniques we currently employ to ensure that Vampire is correct and use this to motivate future challenges that need to be addressed to make this process more straightforward and to achieve better correctness guarantees.


Theorem Prover Software Product Line Incorrect Result Proof Step Proof Search 
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


  1. 1.
    Artho, C., Biere, A., Seidl, M.: Model-based testing for verification back-ends. In: Veanes, M., Viganò, L. (eds.) TAP 2013. LNCS, vol. 7942, pp. 39–55. Springer, Heidelberg (2013). doi: 10.1007/978-3-642-38916-0_3 CrossRefGoogle Scholar
  2. 2.
    Bachmair, L., Ganzinger, H.: Resolution theorem proving. In: Handbook of Automated Reasoning, vol. 1, chap. 2, pp. 19–99. Elsevier Science (2001)Google Scholar
  3. 3.
    Barrett, C., Conway, C., Deters, M., Hadarean, L., Jovanovic, D., King, T., Reynolds, A., Tinelli, C.: CVC4. In: Gopalakrishnan, G., Qadeer, S. (eds.) CAV 2011. LNCS, vol. 6806, pp. 171–177. Springer, Heidelberg (2011). doi: 10.1007/978-3-642-22110-1_14 CrossRefGoogle Scholar
  4. 4.
    Barrett, C., Stump, A., Tinelli, C.: The Satisfiability Modulo Theories Library (SMT-LIB) (2010).
  5. 5.
    Brummayer, R., Lonsing, F., Biere, A.: Automated testing and debugging of SAT and QBF solvers. In: Strichman, O., Szeider, S. (eds.) SAT 2010. LNCS, vol. 6175, pp. 44–57. Springer, Heidelberg (2010). doi: 10.1007/978-3-642-14186-7_6 Google Scholar
  6. 6.
    Creignou, N., Egly, U., Seidl, M.: A framework for the specification of random SAT and QSAT formulas. In: Brucker, A.D., Julliand, J. (eds.) TAP 2012. LNCS, vol. 7305, pp. 163–168. Springer, Heidelberg (2012). doi: 10.1007/978-3-642-30473-6_14 CrossRefGoogle Scholar
  7. 7.
    Dershowitz, N., Jayasimha, D.N., Park, S.: Bounded fairness. In: Dershowitz, N. (ed.) Verification: Theory and Practice. LNCS, vol. 2772, pp. 304–317. Springer, Heidelberg (2003). doi: 10.1007/978-3-540-39910-0_14 CrossRefGoogle Scholar
  8. 8.
    Diekert, V., Leucker, M.: Topology, monitorable properties and runtime verification. Theor. Comput. Sci. 537, 29–41 (2014). Theoretical Aspects of Computing (ICTAC 2011)MathSciNetCrossRefzbMATHGoogle Scholar
  9. 9.
    Falcone, Y., Fernandez, J.C., Mounier, L.: Runtime verification of safety-progress properties. In: Bensalem, S., Peled, D.A. (eds.) RV 2009. LNCS, vol. 5779, pp. 40–59. Springer, Heidelberg (2009). doi: 10.1007/978-3-642-04694-0_4 CrossRefGoogle Scholar
  10. 10.
    Hoder, K., Reger, G., Suda, M., Voronkov, A.: Selecting the selection. In: Olivetti, N., Tiwari, A. (eds.) IJCAR 2016. LNCS, vol. 9706, pp. 313–329. Springer, Cham (2016). doi: 10.1007/978-3-319-40229-1_22 Google Scholar
  11. 11.
    Hoder, K., Voronkov, A.: Sine qua non for large theory reasoning. In: Bjørner, N., Sofronie-Stokkermans, V. (eds.) CADE 2011. LNCS, vol. 6803, pp. 299–314. Springer, Heidelberg (2011). doi: 10.1007/978-3-642-22438-6_23 CrossRefGoogle Scholar
  12. 12.
    Kaliszyk, C., Urban, J.: Hol(y)hammer: online ATP service for HOL light. Math. Comput. Sci. 9(1), 5–22 (2015)CrossRefzbMATHGoogle Scholar
  13. 13.
    Korovin, K.: iProver - an instantiation-based theorem prover for first-order logic (system description). In: Armando, A., Baumgartner, P., Dowek, G. (eds.) IJCAR 2008. LNCS, vol. 5195, pp. 292–298. Springer, Heidelberg (2008). doi: 10.1007/978-3-540-71070-7_24 CrossRefGoogle Scholar
  14. 14.
    Kovács, L., Voronkov, A.: First-order theorem proving and vampire. In: Sharygina, N., Veith, H. (eds.) CAV 2013. LNCS, vol. 8044, pp. 1–35. Springer, Heidelberg (2013). doi: 10.1007/978-3-642-39799-8_1 CrossRefGoogle Scholar
  15. 15.
    Paulson, L.C., Blanchette, J.C.: Three years of experience with sledgehammer, a practical link between automatic and interactive theorem provers. In: The 8th International Workshop on the Implementation of Logics, IWIL 2010. EPiC Series in Computing, vol. 2, pp. 1–11. EasyChair (2012)Google Scholar
  16. 16.
    Perrouin, G., Sen, S., Klein, J., Baudry, B., Traon, Y.l.: Automated and scalable t-wise test case generation strategies for software product lines. In: Proceedings of the 2010 Third International Conference on Software Testing, Verification and Validation, ICST 2010, pp. 459–468. IEEE Computer Society (2010)Google Scholar
  17. 17.
    Reger, G.: Better proof output for Vampire. In: Proceedings of the 3rd Vampire Workshop, Vampire 2016. EPiC Series in Computing, vol. 44, pp. 46–60. EasyChair (2017)Google Scholar
  18. 18.
    Reger, G., Suda, M.: The uses of sat solvers in vampire. In: Proceedings of the 1st and 2nd Vampire Workshops. EPiC Series in Computing, vol. 38, pp. 63–69. EasyChair (2016)Google Scholar
  19. 19.
    Reger, G., Suda, M., Voronkov, A.: New techniques in clausal form generation. In: 2nd Global Conference on Artificial Intelligence, GCAI 2016. EPiC Series in Computing, vol. 41, pp. 11–23. EasyChair (2016)Google Scholar
  20. 20.
    Reger, G., Suda, M., Voronkov, A.: Testing a Saturation-Based Theorem Prover: Experiences and Challenges (Extended Version). ArXiv e-prints (2017)Google Scholar
  21. 21.
    Riazanov, A., Voronkov, A.: Limited resource strategy in resolution theorem proving. J. Symb. Comput. 36(1–2), 101–115 (2003)MathSciNetCrossRefzbMATHGoogle Scholar
  22. 22.
    Schulz, S.: E - a brainiac theorem prover. AI Commun. 15(2–3), 111–126 (2002)zbMATHGoogle Scholar
  23. 23.
    Sutcliffe, G.: Semantic derivation verification: techniques and implementation. Int. J. Artif. Intell. Tools 15(6), 1053–1070 (2006)CrossRefGoogle Scholar
  24. 24.
    Sutcliffe, G.: The TPTP problem library and associated infrastructure. J. Autom. Reason. 43(4), 337–362 (2009)CrossRefzbMATHGoogle Scholar
  25. 25.
    Sutcliffe, G.: The CADE ATP system competition - CASC. AI Mag. 37(2), 99–101 (2016)zbMATHGoogle Scholar
  26. 26.
    Weidenbach, C., Dimova, D., Fietzke, A., Kumar, R., Suda, M., Wischnewski, P.: SPASS version 3.5. In: Schmidt, R.A. (ed.) CADE 2009. LNCS, vol. 5663, pp. 140–145. Springer, Heidelberg (2009). doi: 10.1007/978-3-642-02959-2_10 CrossRefGoogle Scholar
  27. 27.
    Zeller, A.: Yesterday, my program worked. Today, it does not. Why? In: Nierstrasz, O., Lemoine, M. (eds.) ESEC/FSE 1999. LNCS, vol. 1687, pp. 253–267. Springer, Heidelberg (1999). doi: 10.1007/3-540-48166-4_16 Google Scholar

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. 1.University of ManchesterManchesterUK
  2. 2.TU WienViennaAustria
  3. 3.Chalmers University of TechnologyGothenburgSweden

Personalised recommendations