Journal of Automated Reasoning, Volume 53, Issue 2, pp 141–172

Machine Learning for First-Order Theorem Proving

Learning to Select a Good Heuristic
  • James P. Bridge
  • Sean B. Holden
  • Lawrence C. Paulson


We applied two state-of-the-art machine learning techniques to the problem of selecting a good heuristic in a first-order theorem prover. Our aim was to demonstrate that sufficient information is available from simple feature measurements of a conjecture and axioms to determine a good choice of heuristic, and that the choice process can be automatically learned. Selecting from a set of 5 heuristics, the learned results are better than any single heuristic. The same results are also comparable to the prover’s own heuristic selection method, which has access to 82 heuristics including the 5 used by our method, and which required additional human expertise to guide its design. One version of our system is able to decline proof attempts. This achieves a significant reduction in total time required, while at the same time causing only a moderate reduction in the number of theorems proved. To our knowledge no earlier system has had this capability.
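The selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes one trained scorer per heuristic that maps a feature vector to a real-valued score, where a larger score suggests the heuristic is more likely to find a proof. The names `select_heuristic`, `linear_scorer`, and `decline_threshold` are hypothetical, and the linear scorer is only a stand-in for a learned classifier such as a support vector machine or Gaussian process.

```python
from typing import Callable, Dict, Optional, Sequence

# Hypothetical type: a scorer maps a feature vector to a real-valued score,
# where a larger score means the heuristic looks more promising.
Scorer = Callable[[Sequence[float]], float]


def select_heuristic(
    features: Sequence[float],
    scorers: Dict[str, Scorer],
    decline_threshold: Optional[float] = None,
) -> Optional[str]:
    """Return the name of the best-scoring heuristic, or None to decline.

    If decline_threshold is given and even the best score falls below it,
    the proof attempt is declined (assumed behaviour for illustration).
    """
    if not scorers:
        raise ValueError("at least one heuristic scorer is required")
    scores = {name: scorer(features) for name, scorer in scorers.items()}
    best = max(scores, key=scores.get)
    if decline_threshold is not None and scores[best] < decline_threshold:
        return None
    return best


def linear_scorer(weights: Sequence[float], bias: float) -> Scorer:
    """Build a toy linear scorer; a stand-in for a trained classifier."""
    def score(features: Sequence[float]) -> float:
        return sum(w * x for w, x in zip(weights, features)) + bias
    return score


if __name__ == "__main__":
    # Two made-up heuristics and two made-up feature values.
    scorers = {
        "H1": linear_scorer([1.0, 0.0], -0.5),
        "H2": linear_scorer([0.0, 1.0], -0.5),
    }
    print(select_heuristic([0.9, 0.2], scorers))
    print(select_heuristic([0.1, 0.2], scorers, decline_threshold=0.0))
```

Without a threshold the function always returns some heuristic. Passing a threshold trades proofs for time: attempts that no scorer rates highly are skipped rather than run to a timeout.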


Keywords: Automatic theorem proving; Machine learning; First-order logic with equality; Feature selection; Support vector machines; Gaussian processes





Copyright information

© Springer Science+Business Media Dordrecht 2014

Authors and Affiliations

  • James P. Bridge (1)
  • Sean B. Holden (1)
  • Lawrence C. Paulson (1)
  1. Computer Laboratory, University of Cambridge, Cambridge, UK
