Online Machine Learning Techniques for Coq: A Comparison

Zhang, Liao; Blaauwbroek, Lasse; Piotrowski, Bartosz; Černý, Prokop; Kaliszyk, Cezary; Urban, Josef

doi:10.1007/978-3-030-81097-9_5

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12833)

Included in the following conference series:

International Conference on Intelligent Computer Mathematics

Abstract

We present a comparison of several online machine learning techniques for tactical learning and proving in the Coq proof assistant. This work builds on Tactician, a plugin for Coq that learns from proofs written by the user to synthesize new proofs. Learning happens in an online manner, meaning that Tactician’s machine learning model is updated immediately every time the user performs a step in an interactive proof. This has important advantages over the more widely studied offline learning systems: (1) it provides the user with a seamless, interactive experience with Tactician, and (2) it takes advantage of locality of proof similarity, which means that proofs similar to the current proof are likely to be found close by. We implement two online methods, namely approximate k-nearest neighbors based on locality-sensitive hashing forests and random decision forests. Additionally, we conduct experiments with gradient boosted trees in an offline setting using XGBoost. We compare the relative performance of Tactician using these three learning methods on Coq’s standard library.
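The online setting described in the abstract can be illustrated with a minimal Python sketch of an approximate k-nearest-neighbor learner that is updated after every proof step. It is not Tactician's implementation. The class name `OnlineLSHForest`, the MinHash-prefix indexing scheme, the Jaccard re-ranking, the recency tie-break, and the string feature names are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict


def _hash(seed, feature):
    """Deterministic 64-bit hash of a feature under a given seed."""
    data = f"{seed}:{feature}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def jaccard(a, b):
    """Jaccard similarity of two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


class OnlineLSHForest:
    """Hypothetical sketch: several prefix indexes over MinHash signatures."""

    def __init__(self, n_trees=8, depth=4):
        self.n_trees = n_trees
        self.depth = depth
        # One index per tree: signature prefix -> ids of stored examples.
        self.trees = [defaultdict(list) for _ in range(n_trees)]
        self.examples = []  # (feature set, tactic), in insertion order

    def _signature(self, tree, features):
        # One MinHash value per level; longer shared prefixes mean closer sets.
        return tuple(
            min((_hash((tree, level), f) for f in features), default=0)
            for level in range(self.depth)
        )

    def add(self, features, tactic):
        """Online update: index one (proof state features, tactic) pair."""
        features = frozenset(features)
        idx = len(self.examples)
        self.examples.append((features, tactic))
        for t in range(self.n_trees):
            sig = self._signature(t, features)
            for d in range(1, self.depth + 1):
                self.trees[t][sig[:d]].append(idx)

    def predict(self, features, k=3):
        """Return up to k tactics from approximately nearest stored examples."""
        features = frozenset(features)
        sigs = [self._signature(t, features) for t in range(self.n_trees)]
        candidates = set()
        # Start from the deepest (most specific) prefixes and relax as needed.
        for d in range(self.depth, 0, -1):
            for t in range(self.n_trees):
                candidates.update(self.trees[t].get(sigs[t][:d], ()))
            if len(candidates) >= k:
                break
        # Re-rank candidates exactly; ties go to the most recent example.
        ranked = sorted(
            candidates,
            key=lambda i: (-jaccard(features, self.examples[i][0]), -i),
        )
        return [self.examples[i][1] for i in ranked[:k]]


# Usage: the model is updated after each proof step, then queried.
model = OnlineLSHForest()
model.add({"forall", "nat", "plus"}, "induction n")
model.add({"list", "app", "nil"}, "simpl")
suggestions = model.predict({"forall", "nat", "plus"}, k=1)
```

Because `add` only appends to the indexes, the cost of an update does not depend on retraining over earlier examples, which is the property that makes such a learner usable interactively.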

This work was supported by the ERC grant no. 714034 SMART, by the European Regional Development Fund under the project AI&Reasoning (reg. no. CZ.02.1.01/0.0/0.0/15_003/0000466), and by the Ministry of Education, Youth and Sports within the dedicated program ERC CZ under the project POSTMAN no. LL1902.

Notes

1. If we have labels \(\{a, a, b, b, b\}\), ideally we would like to produce a split that passes all the examples with label a to one side and all the examples with label b to the other side.
2. Doing the splits in the leaves has quadratic time complexity with respect to the number of examples stored in the leaf, and the leaves of the trees sometimes store a large number of examples.
3. The results here are not directly comparable to those in Table 2, mainly due to the use of a non-indexed version of k-NN in contrast to the algorithm presented in 1.
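Note 1 can be made concrete with a small calculation. The sketch below assumes Gini impurity as the measure of split quality. That is a common choice for decision trees, but it is used here as an assumption rather than as a statement of the chapter's exact criterion. The function names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a multiset of labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_impurity(left, right):
    """Size-weighted Gini impurity of the two sides of a split."""
    n = len(left) + len(right)
    if n == 0:
        return 0.0
    return (len(left) * gini(left) + len(right) * gini(right)) / n


labels = ["a", "a", "b", "b", "b"]

# Impurity before splitting: 1 - (2/5)^2 - (3/5)^2, about 0.48.
parent = gini(labels)

# The ideal split from Note 1 separates the labels completely.
ideal = split_impurity(["a", "a"], ["b", "b", "b"])

# A split that mixes labels on both sides improves impurity only slightly.
mixed = split_impurity(["a", "b"], ["a", "b", "b"])
```

Under this measure the ideal split has impurity 0, while the mixed split stays close to the impurity of the unsplit labels.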

Author information

Authors and Affiliations

Czech Technical University, Prague, Czech Republic
Liao Zhang, Lasse Blaauwbroek, Bartosz Piotrowski, Prokop Černý & Josef Urban
Radboud University, Nijmegen, The Netherlands
Lasse Blaauwbroek
University of Innsbruck, Innsbruck, Austria
Liao Zhang & Cezary Kaliszyk
University of Warsaw, Warsaw, Poland
Bartosz Piotrowski & Cezary Kaliszyk

Editor information

Editors and Affiliations

Heriot-Watt University, Edinburgh, UK
Fairouz Kamareddine
University of Bologna, Bologna, Italy
Claudio Sacerdoti Coen

About this paper

Cite this paper

Zhang, L., Blaauwbroek, L., Piotrowski, B., Černý, P., Kaliszyk, C., Urban, J. (2021). Online Machine Learning Techniques for Coq: A Comparison. In: Kamareddine, F., Sacerdoti Coen, C. (eds) Intelligent Computer Mathematics. CICM 2021. Lecture Notes in Computer Science, vol 12833. Springer, Cham. https://doi.org/10.1007/978-3-030-81097-9_5

DOI: https://doi.org/10.1007/978-3-030-81097-9_5
Published: 20 July 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-81096-2
Online ISBN: 978-3-030-81097-9
eBook Packages: Computer Science, Computer Science (R0)
