Developing and Debugging Proof Strategies by Tinkering
Abstract
Previously, we have developed a graphical proof strategy language, called PSGraph [4], to support the development and maintenance of large and complex proof tactics for interactive theorem provers. By using labelled hierarchical graphs, this formalisation improves upon tactic composition, analysis and maintenance compared with traditional tactic languages. PSGraph has been implemented as the Tinker system, supporting the Isabelle and ProofPower theorem provers [5]. In this paper we present Tinker2, a new version of Tinker, which provides enhancements in user interaction and experience, together with: novel support for controlled inspection; debugging using breakpoints and a logging mechanism; and advanced recording, exporting and replay.
Keywords
Goal Node, Theorem Prover, Proof Strategy, Goal Type, Hierarchical Graph
1 PSGraph and Tinker
Most interactive theorem provers provide users with a tactic language in which they can encode common proof strategies in order to reduce user interaction. To encode proof strategies, these languages typically provide: a set of functions, called tactics, which reduce subgoals into smaller and simpler subgoals; and a set of combinators, called tacticals, which combine tactics in different ways.
Composition in most tacticals either relies on the number and order of subgoals, or tries all tactics on all subgoals. The former is brittle, as the number and order may change whenever any of the subtactics changes; the latter is hard to debug and maintain, since the actual point of failure is hard to locate when a proof fails. It is also difficult for others to see the intuition behind the tactic design.
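The brittleness of order-dependent composition can be illustrated with a toy model. The following is a minimal sketch, not the tactic language of Isabelle or ProofPower: goals are plain strings, and the names split_conj, close_true, keep, then_list and or_else are hypothetical stand-ins for prover tactics and tacticals.

```python
from typing import Callable, List

Goal = str
Tactic = Callable[[Goal], List[Goal]]  # a tactic maps one goal to its remaining subgoals


class TacticFailure(Exception):
    """Raised when a tactic does not apply to a goal."""


def split_conj(goal: Goal) -> List[Goal]:
    # Toy tactic: split "A & B" into the two subgoals "A" and "B".
    if " & " not in goal:
        raise TacticFailure(f"not a conjunction: {goal}")
    left, right = goal.split(" & ", 1)
    return [left, right]


def close_true(goal: Goal) -> List[Goal]:
    # Toy tactic: discharge the trivial goal "True".
    if goal != "True":
        raise TacticFailure(f"cannot close: {goal}")
    return []


def keep(goal: Goal) -> List[Goal]:
    # Toy tactic: leave the goal unchanged.
    return [goal]


def then_list(first: Tactic, rest: List[Tactic]) -> Tactic:
    # Tactical: apply `first`, then the i-th tactic in `rest` to the i-th subgoal.
    def combined(goal: Goal) -> List[Goal]:
        subgoals = first(goal)
        if len(subgoals) != len(rest):
            raise TacticFailure("subgoal count does not match tactic list")
        return [g for tac, sub in zip(rest, subgoals) for g in tac(sub)]
    return combined


def or_else(first: Tactic, second: Tactic) -> Tactic:
    # Tactical: try `first`, fall back to `second` on failure.
    def combined(goal: Goal) -> List[Goal]:
        try:
            return first(goal)
        except TacticFailure:
            return second(goal)
    return combined


strategy = then_list(split_conj, [close_true, keep])
print(strategy("True & P"))      # subgoals arrive in the order the strategy expects
try:
    strategy("P & True")         # same subgoals, different order
except TacticFailure as err:
    print("failed:", err)
```

The second call fails only because the subgoals arrive in a different order, which is the kind of fragility described above.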
The main advantages of PSGraph over more traditional tactic languages (e.g. as found in Isabelle and ProofPower) are the ability to inspect step by step how subgoals flow through the graph during evaluation, combined with features to debug and modify the graph. Such features are of great aid when debugging and maintaining proof strategies. PSGraph also provides a more intuitive representation of how a proof strategy works, including for non-developers (similar to graph visualisation of proofs in e.g. [7]). Low-level details can be hidden by using hierarchies to improve readability. Such features rely on good GUI support, which was only partially provided by the original Tinker tool [5]. Here, we introduce Tinker2, a new version of Tinker, which extends Tinker with new features, including support for: library and hierarchical graphs; richer tactic and debugging options; and recording and replay. Figure 2 shows the Tinker2 GUI and its layout.
We will use the ProofPower instance of Tinker2 in this paper, although we could just as well have used Isabelle, as the features are identical. In Sect. 2 we focus on how to develop proof strategies from scratch; in Sect. 3 we discuss advanced features for evaluating, debugging, recording and replaying proofs; and in Sect. 4 we conclude and briefly discuss related and further work.
2 Developing Proof Strategies
Atomic Tactics. An atomic tactic wraps a tactic of the underlying theorem prover, which by default has the same name as the node. Tinker2 will automatically use all available tactics from the underlying prover. New tactics can be defined in the tactic editor of the Tinker2 GUI. To illustrate, the tactic definition
$$\texttt{tactic all\_}\exists\texttt{\_uncurry := fn [] => conv\_tac all\_}\exists\texttt{\_uncurry\_conv;}$$
creates a tactic with no argument (fn []). This tactic will be parsed and stored by the CORE, so that it can be used.

Hierarchical Nodes. Modularity is achieved by hierarchies. This can also help to reduce the complexity and size of a PSGraph by hiding parts of it. We will illustrate the new hierarchy features below.

Identity Nodes. Identity nodes are used to fan out and join wires. As the name suggests, they do not change the subgoals.

Breakpoints. A novel feature of Tinker2 is the introduction of breakpoint nodes, which can be added to or removed from wires by a simple mouse click. We return to this in Sect. 3.

Goal Nodes. A goal node wraps a subgoal of a proof and cannot be modified by the user: goal nodes are introduced by the CORE when a new proof is started, and can only be changed through tactic applications.
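The node kinds above could be modelled roughly as follows. This is only a sketch for illustration, not the Tinker2 data model: the class names Kind, Node and Graph, the representation of wires as name pairs, and the method user_edit_tactic are all assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional, Tuple


class Kind(Enum):
    ATOMIC = "atomic"              # wraps a tactic of the underlying prover
    HIERARCHICAL = "hierarchical"  # hides a nested graph
    IDENTITY = "identity"          # fans out or joins wires, leaves goals unchanged
    BREAKPOINT = "breakpoint"      # pauses evaluation on a wire
    GOAL = "goal"                  # wraps a subgoal; managed by the core only


@dataclass
class Node:
    name: str
    kind: Kind
    tactic: Optional[str] = None       # prover tactic name, for ATOMIC nodes
    children: Optional["Graph"] = None  # nested graph, for HIERARCHICAL nodes

    def __post_init__(self) -> None:
        # By default an atomic node uses the tactic with the same name as the node.
        if self.kind is Kind.ATOMIC and self.tactic is None:
            self.tactic = self.name


@dataclass
class Graph:
    nodes: Dict[str, Node] = field(default_factory=dict)
    wires: List[Tuple[str, str]] = field(default_factory=list)  # (source, target)

    def add(self, node: Node) -> None:
        self.nodes[node.name] = node

    def user_edit_tactic(self, name: str, tactic: str) -> None:
        # User edits are rejected for goal nodes, mirroring the rule in the text.
        node = self.nodes[name]
        if node.kind is Kind.GOAL:
            raise PermissionError("goal nodes can only be changed by tactic application")
        node.tactic = tactic
```

Keeping the goal-node restriction inside the graph type, rather than in the GUI, means every editing path is subject to the same check.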
Reuse of PSGraphs is supported by a library. This feature is provided in the Library panel (see Fig. 2). The items in the library are PSGraphs. Therefore, the library can also be customised by simply copying PSGraph files into the library directory. When importing an item from the library to the current PSGraph, Tinker2 will copy it to the graph that the user is currently editing and merge all the required information, such as defined tactics and goal types.
3 Evaluating, Debugging, Exporting and Replaying
A PSGraph in Tinker2 can be applied as a normal tactic/method within an Isabelle or ProofPower proof script. This is the normal execution. However, if it fails, it can instead be run in an ‘interactive mode’ where the GUI is used to visualise and guide how the proof proceeds and identify where it failed. Compared with the first version of Tinker, users can now: (1) select which goal to apply; (2) choose between stepping into and stepping over the evaluation of hierarchical nodes; (3) apply and complete the current hierarchical tactic; (4) apply and finish the whole proof strategy; (5) insert a breakpoint and evaluate a graph automatically until the breakpoint is reached by a goal. These options are illustrated in the Drawing and evaluations controls panel of Fig. 2 (see also [9]), which also shows a breakpoint in the graph.
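Options (1), (4) and (5) can be sketched on a simplified model. The code below is an assumption-laden illustration, not the Tinker2 evaluator: the graph is reduced to a linear chain of node names, goals are strings, hierarchical nodes (and hence stepping into or over them) are not modelled, and the names Evaluator, BREAK and the toy tactics are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Goal = str
Tactic = Callable[[Goal], List[Goal]]

BREAK = "<breakpoint>"  # hypothetical marker for a breakpoint node


class Evaluator:
    """Moves goals along a linear chain of nodes (a simplification of a graph)."""

    def __init__(self, chain: List[str], tactics: Dict[str, Tactic], goal: Goal):
        self.chain = chain
        self.tactics = tactics
        # Each pending goal is paired with the index of the node it is waiting on.
        self.pending: List[Tuple[int, Goal]] = [(0, goal)]
        self.remaining: List[Goal] = []  # goals that left the end of the chain

    def step(self, which: int = 0) -> None:
        # Apply the next node to one selected pending goal (option 1).
        pos, goal = self.pending.pop(which)
        node = self.chain[pos]
        results = [goal] if node == BREAK else self.tactics[node](goal)
        for sub in results:
            if pos + 1 < len(self.chain):
                self.pending.append((pos + 1, sub))
            else:
                self.remaining.append(sub)

    def at_breakpoint(self) -> bool:
        return any(self.chain[pos] == BREAK for pos, _ in self.pending)

    def run_to_breakpoint(self) -> None:
        # Step automatically until a goal is waiting on a breakpoint (option 5).
        while self.pending and not self.at_breakpoint():
            self.step()

    def finish(self) -> None:
        # Run the whole strategy, passing through any breakpoints (option 4).
        while self.pending:
            self.step()


tactics: Dict[str, Tactic] = {
    "split": lambda g: g.split(" & "),
    "simp": lambda g: [] if g == "True" else [g],
}
ev = Evaluator(["split", BREAK, "simp"], tactics, "True & P")
ev.run_to_breakpoint()
print(ev.pending)    # both subgoals are now waiting on the breakpoint
ev.finish()
print(ev.remaining)  # goals left open by the strategy
```

Pausing at the breakpoint exposes the intermediate subgoals, which is where a user would inspect or select a goal before continuing.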
To support debugging, an evaluation log, which shows the details of the current proof status, can be displayed. The log entries carry tags, which can be used to filter the log down to the tags of interest. Tinker2 also contains a real-time development mode that allows users to develop proof strategies seamlessly during proof tasks. Here, a user can freely edit the PSGraph (except for the goal nodes), e.g. change a tactic node, and then submit the changes to continue the current evaluation with the updated PSGraph. This is achieved using a new communication protocol, with details available in the second author’s undergraduate thesis [1]. Note that this mode is currently not sufficiently constrained: one could edit paths that a subgoal has already passed, thus invalidating the proof status. We are now working on this (see Sect. 4).
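Tag-based filtering of a log can be sketched in a few lines. This is an illustrative assumption about how such a log might be structured, not the Tinker2 logging format: the LogEntry type, the filter_log function and the tag names are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class LogEntry:
    message: str
    tags: FrozenSet[str]


def filter_log(entries: Iterable[LogEntry], wanted: Iterable[str]) -> List[LogEntry]:
    # Keep every entry that carries at least one of the wanted tags.
    wanted_set = set(wanted)
    return [entry for entry in entries if entry.tags & wanted_set]


log = [
    LogEntry("applied tactic to goal g1", frozenset({"TACTIC", "GOAL"})),
    LogEntry("graph updated by user", frozenset({"EDIT"})),
    LogEntry("goal g2 reached breakpoint", frozenset({"GOAL", "BREAKPOINT"})),
]
for entry in filter_log(log, ["GOAL"]):
    print(entry.message)
```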
Tinker2 provides new features to export PSGraphs and record proofs. A PSGraph can be exported to the SVG format, e.g. to use in a paper; Fig. 6 illustrates this, as its SVG diagram was exported from Tinker2. The recording feature can be switched on/off to start/pause recording changes made to a graph. These changes could have been made by the user or by the tool during evaluation. Once completed, such a recording can be exported to a lightweight web application (written in HTML/CSS and JavaScript) via a generated JSON file. Figure 4 (right) shows a screenshot of this, while [9] shows an example of this together with several screencasts of the GUI.
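The recording mechanism can be pictured as a switchable list of change events that is serialised to JSON. The sketch below is hypothetical: the Recorder class and the JSON layout (a "frames" list with "step", "source" and "change" fields) are assumptions and do not describe the file format Tinker2 actually generates.

```python
import json
from typing import Any, Dict, List


class Recorder:
    def __init__(self) -> None:
        self.enabled = False
        self.frames: List[Dict[str, Any]] = []

    def toggle(self) -> None:
        # Switch recording on or off (start/pause).
        self.enabled = not self.enabled

    def record(self, source: str, change: str) -> None:
        # `source` says whether the user or the evaluation made the change.
        if self.enabled:
            self.frames.append(
                {"step": len(self.frames), "source": source, "change": change}
            )

    def export(self) -> str:
        # Serialise the recording so that a web viewer could replay it.
        return json.dumps({"frames": self.frames}, indent=2)


rec = Recorder()
rec.record("user", "ignored while recording is off")
rec.toggle()
rec.record("user", "added breakpoint on wire w1")
rec.record("evaluation", "goal g1 moved to wire w2")
rec.toggle()
print(rec.export())
```

Because the export is plain JSON, a replay viewer only needs to step through the frames in order.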
4 Conclusion, Related and Future Work
We have introduced a new version of the Tinker tool, called Tinker2, with a range of novel features to develop, debug, maintain, record and export hierarchical proof strategies. With Tinker2, users can easily reuse existing PSGraphs to develop and debug structured and intuitive hierarchical proof strategies. The most relevant work is the first version of the Tinker tool [5], with which we have compared Tinker2 throughout. It is also important to note that Tinker/Tinker2 is built on top of the Quantomatic graph rewriting engine [6], which is used internally as a library function. The second author has also developed a web-based version of Tinker, which supports a subset of the GUI features discussed here [1]. With the exception of simple proof visualisation (e.g. [7]), we are not familiar with any other graphical proof tools to support theorem provers. While there are tactic languages that support robust tactics (e.g. Ltac [3] for Coq), we believe that the development and debugging features of Tinker2 are novel.
With DRisQ (www.drisq.com) we are using Tinker2 to encode their highly complex Supertac proof strategy in ProofPower [8]. Several enhancements have been motivated by this work. In the future, we would like to improve static checking of PSGraphs, such as being able to validate a PSGraph before evaluation. We also plan to improve the layout algorithm, and to develop and implement a better framework for combining evaluation and user edits of PSGraphs.
References
 1.Le Bras, P.: Web based interface for graphical proof strategies. Undergraduate CS Honours Thesis (2015). https://goo.gl/LWG522
 2.Le Bras, P., Grov, G., Lin, Y.: Tinker: User guide. http://ggrov.github.io/tinker/userGuides.pdf
 3.Delahaye, D.: A proof dedicated metalanguage. Electron. Notes Theoret. Comput. Sci. 70(2), 96–109 (2002)
 4.Grov, G., Kissinger, A., Lin, Y.: A graphical language for proof strategies. In: McMillan, K., Middeldorp, A., Voronkov, A. (eds.) LPAR-19 2013. LNCS, vol. 8312, pp. 324–339. Springer, Heidelberg (2013)
 5.Grov, G., Kissinger, A., Lin, Y.: Tinker, tailor, solver, proof. In: UITP 2014. EPTCS, vol. 167, pp. 23–34. Open Publishing Association (2014)
 6.Kissinger, A., Zamdzhiev, V.: Quantomatic: a proof assistant for diagrammatic reasoning. In: Felty, A.P., Middeldorp, A. (eds.) CADE-25. LNCS, vol. 9195, pp. 326–336. Springer, New York (2015)
 7.Libal, T., Riener, M., Rukhaia, M.: Advanced proof viewing in ProofTool. In: UITP 2014. EPTCS, vol. 167, pp. 35–47. Open Publishing Association (2014)
 8.O’Halloran, C.: Automated verification of code automatically generated from Simulink. ASE 20(2), 237–264 (2013)
 9.Le Bras, P., Lin, Y., Grov, G.: Tinker2 - TACAS 16 paper resources. http://ggrov.github.io/tinker/tacas16/. Accessed 17 October 2015