Harpoon: Mechanizing Metatheory Interactively (System Description)

Beluga is a proof checker that provides sophisticated infrastructure for implementing formal systems with the logical framework LF and proving metatheoretic properties as total, recursive functions transforming LF derivations. In this paper, we describe Harpoon, an interactive proof engine built on top of Beluga. It allows users to develop proofs interactively using a small, fixed set of high-level actions that safely transform a subgoal. A sequence of actions elaborates into a (partial) proof script that serves as an intermediate representation describing an assertion-level proof. Last, a proof script translates into a Beluga program which can be type-checked independently. Harpoon is available on GitHub. We have used Harpoon to replay a wide array of examples covering all features supported by Beluga. In particular, we have used it for normalization proofs, including the recently proposed POPLMark reloaded challenge.


Introduction
Mechanizing formal systems and proofs about them plays an important role in establishing trust in programming languages and verifying software systems in general. Key questions in this setting are how to represent variables, (simultaneous) substitutions, assumptions, and derivations that depend on assumptions. Higher-order abstract syntax (HOAS) provides an elegant and unifying answer to these questions, relieving users from having to write boilerplate code.
Beluga is a proof checker with built-in support for HOAS encodings of formal systems based on the logical framework LF [13]. Metatheoretic inductive proofs are implemented as recursive, dependently-typed functions that manipulate and transform HOAS representations [21,4,25]. In this paper, we describe the interactive proof engine Harpoon which is built on top of Beluga. A Harpoon user modularly and incrementally develops a metatheoretic proof by solving independent subgoals via a fixed set of high-level actions. An action eliminates the subgoal on which it is executed, filling it with a proof that possibly contains new subgoals to be resolved. The actions we support are: introduction of assumptions, case-analysis, inductive reasoning, and both forward and backward reasoning styles.
While our fixed set of actions is largely inspired by similar systems such as Twelf [20,28,27] and Abella [11], Harpoon advances the state of the art in interactively developing mechanized proofs about HOAS representations in two ways: 1. We treat subgoals as first-class and characterize them using contextual types that pair their goal types together with the contexts in which they are meaningful; a contextual substitution property guarantees that each step of proof development correctly refines the partial proof under construction [8]. 2. Rather than simply record the sequence of actions given by the user, we elaborate this sequence into an assertion-level proof [15], represented as what we call a proof script. The proof script is what we record as output of an interactive session. It can be both typechecked directly and translated into a Beluga program.
We have used Harpoon (see https://beluga-lang.readthedocs.io/) on a wide range of representative examples from the Beluga library: normalization proofs for the simply-typed lambda calculus [6], benchmarks for reasoning about binders [9,10], and the recent POPLMark Reloaded challenge [1]. These examples involve numerous concerns that arise in proof development, and cover all the domain-specific abstractions that Beluga provides. Our experience shows that Harpoon lowers the entry barrier for users: they only need to understand how to represent formal systems and derivations using HOAS encodings and can then manipulate the HOAS representations directly via the high-level actions, which correspond closely to how proofs are developed on paper. As such, we believe that Harpoon eases the task of proving metatheoretic statements.

Proof Development in Harpoon
We introduce the main features of Harpoon by interactively developing the proof of two lemmas that play a central role in the proof of weak normalization of the simply-typed lambda calculus. For a more detailed description, see [6].

Initial setup: encoding the language
We begin by defining the simply-typed lambda-calculus in the logical framework LF [13] using an intrinsically typed encoding. In typical HOAS style, lambda abstraction takes an LF function representing the abstraction of a term over a variable. There is no case for variables, as they are treated implicitly. We remind the reader that this is a weak, representational function space: there is no case analysis or recursion, so only genuine lambda terms can be represented.
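The listing for this encoding is not reproduced here. A minimal sketch of such a signature in Beluga's concrete syntax might read as follows. The names tp, tm, unit, arr, and app are those used in the text; lam and one are names assumed for illustration.

```beluga
% Object-level types.
LF tp : type =
| unit : tp
| arr  : tp -> tp -> tp;

% Intrinsically typed terms: tm T contains only well-typed terms of type T.
LF tm : tp -> type =
| one : tm unit                               % assumed name for a constant of base type
| app : tm (arr T1 T2) -> tm T1 -> tm T2
| lam : (tm T1 -> tm T2) -> tm (arr T1 T2);   % assumed name; the body is an LF function (HOAS)
```

Under these assumptions, the identity function on unit is the LF object lam (\x. x) of type tm (arr unit unit); the bound variable is an LF variable, so no separate constructor for variables is needed.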
Next, we define a small-step operational semantics for the language. For simplicity, we use a call-by-name reduction strategy and do not reduce under lambda-abstractions. Note that we use LF application to encode the object-level substitution in the s_beta rule.
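A sketch of how these judgments could be declared is given below. It assumes that tp and tm, with constructors app and lam and a base-type constant one, are already in scope. The names step, steps, val, halts, s_beta, next, and halts/m appear in the text; s_app, refl, v_one, and v_lam are assumed for illustration.

```beluga
% Single-step, call-by-name reduction; there is no rule reducing under lam.
LF step : tm T -> tm T -> type =
| s_beta : step (app (lam M) N) (M N)               % M N is LF application
| s_app  : step M M' -> step (app M N) (app M' N);  % assumed name

% Multi-step reduction (zero or more steps).
LF steps : tm T -> tm T -> type =
| refl : steps M M                                  % assumed name
| next : step M M' -> steps M' M2 -> steps M M2;

% Values (assumed names and shape).
LF val : tm T -> type =
| v_one : val one
| v_lam : val (lam M);

% A term halts if it steps to a value.
LF halts : tm T -> type =
| halts/m : steps M M2 -> val M2 -> halts M;
```

In s_beta, M has LF function type tm T1 -> tm T2, so the object-level substitution of N into the body is simply the LF application M N.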

Termination Property: intros, split, unbox, and solve
As the first short lemma, we show the Termination property: if M steps to M' and M' is known to halt, then M also halts. We start our interactive proof session by loading the signature and defining the name of the theorem and the statement that we want to prove. We pair each LF object such as step M M' together with the LF context in which it is meaningful [21,26,19]. We refer to such an object as a contextual object and embed contextual types, written as (_ ⊢ _), into Beluga types using the "box" syntax. In this example, the LF context, written on the left of ⊢, is empty, as we consider closed LF objects. As before, the free variables M and M' are implicitly quantified at the outside. They themselves stand for contextual objects and have contextual type ( ⊢ tm T). The theorem statements are hence statements about contextual LF objects and directly correspond to Beluga types.
The proof begins with a single subgoal whose type is simply the statement of the theorem under no assumptions. Since this subgoal has a function type, Harpoon will automatically apply the intros action, which introduces assumptions as follows. First, the (implicitly) universally quantified variables M, M' are added to the meta-context. This context collects parameters introduced by universal quantifiers. This is in contrast with the computational context, which collects assumptions introduced by the simple function space. In particular, the second phase of the intros action adds the assumptions s : [ ⊢ step M M'] and h : [ ⊢ halts M'] to the computational context. Observe that since M and M' have type tm T, intros also adds T to the meta-context, although it is implicit in the definitions of step and halts and is not visible at all in the theorem statement (see the meta-context in Fig. 1, step 1).
The proof proceeds by inversion on h. Using the split action, we add the two new assumptions S : ( ⊢ steps M' M2) and V : ( ⊢ val M2) to the meta-context (see Fig. 1, step 1). To build a proof for [ ⊢ halts M], we need to show that M steps to some value M2. To build such a derivation, we first use the unbox action on the computation-level assumption s to obtain an assumption S' in the meta-context, which is accessible to the LF layer (inside a box) (see Fig. 1, step 2). Finally, we finish the proof by supplying the term [ ⊢ halts/m (next S' S) V] with the solve action (see Fig. 1, step 3). This is similar to the exact tactic in Coq.
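The three steps can be summarized as a sequence of actions. The sketch below spells the commands as in the action grammar given in the implementation section and writes ⊢ in ASCII as |-. The prompt rendering is omitted, and the assumption that split names the new variables S and V exactly as in the text is ours.

```beluga
intros
split h
unbox s as S'
solve [ |- halts/m (next S' S) V]
```

The intros line is shown only for completeness, since Harpoon applies it automatically to a subgoal of function type.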

The resulting proof script is given below. Assertions are written in boldface and curly braces denote new scopes, listing the full meta-context and the full computational context. Using an erasure, we can then generate a translated program in the external syntax, i.e. the syntax a user would use when implementing the proof directly, rather than the internal syntax. This program is hence much more compact than the actual proof script. It can be seamlessly combined with hand-written Beluga programs and can also be type-checked independently.
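For orientation, a hand-written Beluga program corresponding to the three actions above might look as follows. This is a sketch that assumes the LF signature for step, steps, val, and halts described earlier is in scope; it is not the literal output of Harpoon's translation.

```beluga
rec halts_step : [ |- step M M'] -> [ |- halts M'] -> [ |- halts M] =
fn s => fn h =>
  let [ |- halts/m S V] = h in    % split h: inversion on the halting derivation
  let [ |- S'] = s in             % unbox s as S'
  [ |- halts/m (next S' S) V];    % solve: prepend the step S' to the sequence S
```

Each line of the body mirrors one action: the pattern-matching let corresponds to split, the second let to unbox, and the final boxed object to the term supplied to solve.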

Setup continued: reducibility
We now consider one of the key lemmas in the weak normalization proof, called the backwards closed lemma, i.e. if M' is reducible at some type T and M steps to M', then M is also reducible at T. We begin by defining the set of terms reducible at a type T. All reducible terms are required to halt, and reducible terms at an arrow type are required to produce reducible output given reducible input. Concretely, a term M is reducible at type (arr T1 T2) if, for all terms N : tm T1 that are reducible at type T1, (app M N) is reducible at type T2. Reducibility cannot be directly encoded on the LF layer, as it is not merely describing the syntax of an expression or derivation. Instead, we encode the set of reducible terms using the stratified type Reduce, which is recursively defined on the type T in Beluga (see [16]). Note that we write { } for explicit universal quantification over contextual objects.
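A sketch of such a stratified definition is shown below, assuming the LF signature for tp, tm, and halts is in scope. Reduce and Unit are named in the text; the constructor name Arr is an assumption.

```beluga
stratified Reduce : {T : [ |- tp]} {M : [ |- tm T]} ctype =
| Unit : [ |- halts M] -> Reduce [ |- unit] [ |- M]
| Arr  : [ |- halts M]
         -> ({N : [ |- tm T1]} Reduce [ |- T1] [ |- N] -> Reduce [ |- T2] [ |- app M N])
         -> Reduce [ |- arr T1 T2] [ |- M];
```

The definition is stratified rather than inductive because Reduce occurs to the left of an arrow in the assumed Arr case; it is well-founded because those occurrences are at the smaller types T1 and T2.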

Backwards Closed Property: msplit, suffices, and by
We can now state the backwards closed lemma formally as follows: if M' is reducible at some type T and M steps to M', then M is also reducible at T. We prove this lemma by induction on T. This is specified by referring to the position of the induction variable in the statement. We use msplit T to split the proof into two cases (see Fig. 2, step 1). Whereas split performs case analysis on a Beluga type, msplit considers the cases for a (contextual) LF type. In fact, msplit is implemented in terms of the split action.
The case for T = unit is straightforward (see Fig. 2, steps 2 and 3). First, we use the split action to invert the premise r : Reduce [ ⊢ unit] [ ⊢ M']. Then, we use the by action to invoke the halts_step lemma (see Sec. 2.2) to obtain an assumption h : [ ⊢ halts M]. We solve this case by supplying the term Unit h (see Fig. 2, step 3).
In the case for T = arr T1 T2, we begin similarly by inversion on r using the split action (see Fig. 3, step 4). We observe that the goal type is Reduce [ ⊢ arr T1 T2] [ ⊢ M], so it suffices to establish the premises of the corresponding constructor of Reduce. Such backwards reasoning is accomplished via the suffices action. The user supplies a term representing an implication whose conclusion is compatible with the current goal and proceeds to prove its premises as specified (see Fig. 3, step 5).
To prove the first premise, we apply the halts_step lemma (see Fig. 3). Finally, we appeal to the induction hypothesis. Using the by action, we refer to the recursive call to complete the proof (see Fig. 3, step 7). The resulting proof script (of around 70 lines) can again be translated into a compact program.
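To give an idea of the shape of such a compact program, here is a hand-written sketch rather than Harpoon's actual output. It assumes the LF signature, the lemma halts_step, and the type Reduce are in scope. The names bwd_closed, Arr, and s_app, the order of the arguments, and the exact form of the totality annotation (which may differ between Beluga versions) are all assumptions.

```beluga
rec bwd_closed : {T : [ |- tp]} {M : [ |- tm T]} {M' : [ |- tm T]}
                 [ |- step M M'] -> Reduce [ |- T] [ |- M'] -> Reduce [ |- T] [ |- M] =
/ total t (bwd_closed t) /          % induction on the first argument T
mlam T => mlam M => mlam M' => fn s => fn r =>
case [ |- T] of                     % msplit T
| [ |- unit] =>
  let Unit h = r in                 % split r
  Unit (halts_step s h)             % by halts_step, then solve
| [ |- arr T1 T2] =>
  let Arr h f = r in                % split r
  let [ |- S] = s in                % unbox s
  Arr (halts_step s h)              % suffices: first premise via halts_step
      (mlam N => fn rn =>           % second premise via the induction hypothesis at T2
         bwd_closed [ |- T2] [ |- app M N] [ |- app M' N] [ |- s_app S] (f [ |- N] rn));
```

The recursive call is made at the smaller type T2, which is what justifies the appeal to the induction hypothesis. The explicit arguments [ |- app M N] and [ |- app M' N] are uniquely determined by the step derivation, so they are the kind of argument for which an underscore could be written instead.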
Note that Harpoon allows users to use underscores to stand for arguments that are uniquely determined (see Harpoon Proof 3 step 7). We enforce that these underscores stand for uniquely determined objects in order to guarantee that the contexts and the goal type of every subgoal are closed. This ensures modularity: solving one subgoal does not affect any other open subgoals. As a consequence, users are not restricted in their proof development. As they would on paper, users can work on goals in any order, mix forward and backward reasoning, erase wrong parts, and replace them by correct steps.
Using the actions explained above, one can now prove the fundamental lemma and the weak normalization theorem. For a more detailed description of this proof in Beluga, see [5,6].

Additional actions.
Harpoon supports some additional features not discussed in this paper; see https://beluga-lang.readthedocs.io/ for a complete list of actions. In general, these actions add no expressive power, but enable more precise expression of a user's intent. For example, the invert action splits on the type of a given term, ensuring that there is a unique case to consider. It is implemented simply as the split action followed by an additional check.

Implementation of Harpoon
Harpoon is a front end that allows users to construct a proof for a theorem statement represented as a Beluga type. Types in Beluga include universal quantification over contextual types (dependent function space, written with curly braces), implications (simple function space), boxed contextual types, and stratified/recursive types (written as c C⃗, where C stands for a contextual object). In addition, Beluga supports quantification over LF contexts and even LF substitutions relating two LF contexts. We omit these below for simplicity, although they are also supported in Harpoon. In essence, Beluga types correspond to statements in first-order logic over a domain consisting of contextual objects, LF contexts, and LF substitutions. We can view c C⃗ and [Ψ ⊢ A] as atomic propositions. Users construct a natural deduction proof for a theorem statement, where Γ, the computation context, contains hypotheses introduced from the simple function space and where ∆, the meta-context, holds parameters introduced from the universal quantifier (curly-brace syntax) or by lifting an assumption [Ψ ⊢ A] from Γ (box-elimination rule).
A subgoal in Harpoon is a typed hole in the proof that remains to be filled by the user. Such a hole is represented by a subgoal variable, the type of which is a contextual type (∆; Γ ⊢ τ) that captures the typechecking state at the point the variable occurs [19,3]: it remains to construct a proof for τ with the parameters from ∆ and the assumptions from Γ. Subgoal variables in the proof script are collected into a subgoal context, and substitution of subgoal variables is type-preserving [8]. Interactive actions are implemented with subgoal substitutions, so the correctness of interactive proof refinement is a consequence of the subgoal substitution property. Note that a subgoal's type cannot itself contain subgoals: the subgoal type must be fully determined, so solving one subgoal cannot affect any other subgoal. Furthermore, subgoal variables may be introduced only in positions where we must construct a normal term (written e); these are terms that we must check against a given type. This given type becomes part of the subgoal's type. Subgoal variables thus stand in contrast with ordinary variables, which are neutral terms (written i). (See [14,26,16] for examples of this so-called bi-directional characterization of normal and neutral proof terms in Beluga.)
An action is executed on a subgoal to eliminate it, while possibly introducing new subgoals. Actions emphasize the bi-directional nature of interactive proof construction: some demand normal terms e and others demand neutral terms i. To execute an action, the system synthesizes a proof script fragment from it, and substitutes that fragment for the current subgoal. Any subgoal variables present in the fragment become part of the subgoal context, and the user will have to solve them later. When no subgoals remain, the proof script is closed and can be translated straightforwardly to a Beluga program in internal (fully elaborated) syntax. We employ an erasure to display the program to the user.
These are the essential actions for proof development, omitting our so-called "administrative" actions (such as undo):
Actions α ::= intros | solve e | by i as x | unbox i as X | split i | suffices i by τ⃗
The action intros introduces all assumptions from function types in the current goal; solve closes the current subgoal with a given normal term, introducing no new subgoals. This action trivially makes Harpoon complete, as a full Beluga program could be given via solve to eliminate the initial subgoal of any proof. The action by enables introducing an intermediate result, often from a lemma or an induction hypothesis, demanding a neutral term i and binding it to a given name; unbox is the same as by, but it binds the result as a variable in the meta-context; split considers a covering set of cases for a neutral term (typically a variable) and generates possible induction hypotheses based on the specified induction order (for details on coverage, see [24]); suffices allows programmers to reason backwards by supplying a neutral term i of function type and the types τ⃗ of arguments to construct for this function.

Empirical evaluation of Harpoon
We give a summary of representative case studies that we replayed using Harpoon in the following table.
STLC weak normalization [6]: Case analysis on LF contexts, substitution variables, parameter variables, and inductive and stratified types.
STLC strong normalization [1]: Larger development (310 commands), all forms of case analysis as above.
STLC alg. equality completeness [6]: Larger development (180 commands), all forms of case analysis as above.
[18]. This evaluation gives us confidence in the robustness and expressive power of Harpoon.

Related work
There are several approaches to specify and reason about formal systems.
Beluga and hence Harpoon belong to the lineage of the Twelf system [20], which also implements the logical framework LF. Metatheoretic proofs in Twelf are implemented as relations. Totality checking then ensures that these relations correspond to actual proofs. As Twelf is limited to proving Π1 formulas ("forall-exists" statements), normalization proofs using logical relations cannot be directly encoded. Although Harpoon's actions are largely inspired by the internal actions of Twelf's (experimental) fully-automated metatheorem prover [28,27], Harpoon supports user interaction, more expressive theorem statements, and generation of proof witnesses, in the form of both the generated proof script and the Beluga program resulting from translation.
The Abella system [11] also provides an interactive theorem prover for reasoning about specifications using HOAS. First, its theoretical basis is quite different from Beluga's: Abella's reasoning logic extends first-order logic with a ∇ quantifier [12] that is used to express properties about variables. Second, Abella's interactive mode provides a fixed set of tactics, similar to the actions we describe in this paper. However, these tactics only loosely connect to the actual theoretical foundation of Abella and no proof terms are generated as witnesses by the Abella system.
We can also reason about formal systems in general purpose proof assistants such as Coq. The general philosophy in such systems is that users should be in the position of writing complex domain-specific tactics to facilitate proof construction using languages such as LTac [7] or MTac(2) [29,17]. Although this is an extremely flexible approach, we believe that the tactic-centric view often obscures the actual line of reasoning in the proof. The proofs themselves can often be illegible and incomprehensible. Further, strong static guarantees about interactive proof construction are lacking; for example, dynamic checks enforce variable dependencies. In contrast, our goal is to enable mechanized proof development in a style close to that of a proof on paper. Thus we provide a fixed set of tactics suitable for a wide array of proofs, so users can concentrate on proof development instead of tactic development. As such, our work draws inspiration from [2] where the authors describe high-level actions within the tutorial proof checker Tutch. Our work extends and adapts this view to the mechanization of inductive metatheoretic proofs based on HOAS representations.

Conclusion
We have presented Harpoon, an interactive command-driven front end of Beluga for mechanizing metatheoretic proofs based on high-level actions. Behind the scenes, the sequence of interactive actions is elaborated into a proof script that represents an assertion-level proof. Last, proof scripts can soundly be translated to Beluga programs. We have evaluated Harpoon on several case studies, ranging from purely syntactic arguments to proofs by logical relations. Our experience is that Harpoon lowers the entry barrier for users to develop metatheoretic proofs about HOAS encodings.
In the future, we aim to extend Harpoon with additional high-level actions that support further automation. A natural first step is to support an action trivial which would attempt to automatically close an open sub-goal.