Verifying HigherOrder Functions with Tree Automata
Abstract
This paper describes a fully automatic technique for verifying safety properties of higherorder functional programs. Tree automata are used to represent sets of reachable states and functional programs are modeled using term rewriting systems. From a tree automaton representing the initial state, a completion algorithm iteratively computes an automaton which overapproximates the output set of the program to verify. We identify a subclass of higherorder functional programs for which the completion is guaranteed to terminate. Precision and termination are obtained conjointly by a careful choice of equations between terms. The verification objective can be used to generate sets of equations automatically. Our experiments show that tree automata are sufficiently expressive to prove intricate safety properties and sufficiently simple for the verification result to be certified in Coq.
1 Introduction
Higherorder functions are an integral feature of modern programming languages such as Java, Scala or JavaScript, not to mention Haskell and Caml. Higherorder functions are useful for program structuring but pose a challenge when it comes to reasoning about the correctness of programs that employ them. To this end, the correctnessminded software engineer can opt for proving properties interactively with the help of a proof assistant such as Coq [13] or Isabelle/HOL [30], or write a specification in a formalism such as Liquid Types [31] or Bounded Refinement Types [33, 34] and ask an SMT solver whether it can prove the verification conditions generated from this specification. This approach requires expertise of the formal method used, and both the proof construction and the annotation phase can be time consuming.
Another approach is based on fully automated verification tools, where the proof is carried out automatically without annotations or intermediate lemmas. This approach is accessible to a larger class of programmers but applies to a more restricted class of program properties. The flow analysis of higherorder functions was studied by Jones [21] who proposed to model higherorder functions as term rewriting systems and use regular grammars to approximate the result. More recently, the breakthrough results of Ong [29] and Kobayashi [23, 24, 26] show that combining abstraction with model checking techniques can be used with success to analyse higherorder functions automatically. Their approach relies on abstraction for computing overapproximations of the set of reachable states, on which safety properties can then be verified.
In this paper, we pursue the goals of higherorder functional verification using an approach based on the original term rewriting models of Jones. We present a formal verification technique based on Tree Automata Completion (TAC) [20], capable of checking a class of properties, called regular properties, of higherorder programs in a fully automatic manner. In our approach, a program is represented as a term rewriting system \(\mathcal {R} \) and the set of (possibly infinite) inputs to this program as a tree automaton \(\mathcal {A} \). The TAC algorithm computes a new automaton \(\mathcal {A} ^{*}\), by completing \(\mathcal {A} \) with all terms reachable from \(\mathcal {A} \) by \(\mathcal {R} \)rewriting. This automaton representation of the reachable terms contains all intermediate states as well as the final output of the program. Checking correctness properties of the program is then reduced to checking properties of the computed automaton. Moreover, our completionbased approach permits to certify automatically \(\mathcal {A} ^{*}\) in Coq [6], i.e. given \(\mathcal {A} \), \(\mathcal {R} \) and \(\mathcal {A} ^{*}\), obtain the formal proof that \(\mathcal {A} ^{*}\) recognizes all terms reachable from \(\mathcal {A} \) by \(\mathcal {R} \)rewriting.
Example 1

We state and prove a general termination theorem for the Tree Automata Completion algorithm (Sect. 3);

From the conditions of the theorem we characterise a class of higherorder functional programs for which the completion algorithm terminates (Sect. 4). This class covers common usage of higherorder features in functional programming languages.

We define an algorithm that is able to automatically generate equations for enforcing convergence, thus avoiding any user intervention (Sect. 5).
All proofs missing in this paper can be found in the accompanying technical report [17]. The paper is organised as follow: We describe the completion algorithm and how to use equations to ensure termination in Sect. 2.1. The technical contributions as described above are developed in Sects. 3 to 5. In Sect. 6, we present a series of experiments validating our verification technique, and discuss the certification of results in Coq. We present related work in Sect. 7. Section 8 concludes the paper.
2 Background
This section introduces basic concepts used throughout the paper. We recall the usual definitions of term rewriting systems and tree automata, and present the completion algorithm which forms the basis of our verification technique.
2.1 Term Rewriting and Tree Automata
Term Rewriting Systems. [1] provide a flexible way of defining functional programs and their semantics. A rewriting system is a pair \(\langle \mathcal {F}, \mathcal {R} \rangle \), where \(\mathcal {F} \) is an alphabet and \(\mathcal {R} \) a set of rewriting rules of the form \(l \rightarrow r\), where \(l, r \in \mathcal {T} (\mathcal {F}, \mathcal {X}),\ l \not \in \mathcal {X} \) and \( Var (r) \subseteq Var (l)\). A TRS can be seen as a set of rules, each of them defining one step of computation. We write \(\mathcal {R} \) a rewriting system \(\langle \mathcal {F}, \mathcal {R} \rangle \) if there is no ambiguity on \(\mathcal {F} \). A rewriting rule \(l \rightarrow r\) is said to be leftlinear if the term l is linear. Example 1 shows a TRS representing a functional program, where each rule is leftlinear. In that case we say that the TRS \(\mathcal {R} \) is leftlinear.
Tree Automata. [12] are a convenient way to represent regular sets of terms. A tree automaton is a quadruple \(\langle \mathcal {F}, \mathcal {Q}, \mathcal {Q} _f,\varDelta \rangle \) where \(\mathcal {F} \) is an alphabet, \(\mathcal {Q} \) a finite set of states, \(\mathcal {Q} _f\) the set of final states, and \(\varDelta \) a rewriting system on \(\mathcal {F} \cup \mathcal {Q} \). Rules in \(\varDelta \), called transitions, are of the form \(l \rightarrow q\) where \(q \in \mathcal {Q} \) and l is either a state (\(\in \mathcal {Q} \)), or a configuration of the form \(f(q_1, \dots , q_n)\) with \(f \in \mathcal {F}, q_1 \dots q_n \in \mathcal {Q} \). A term t is recognised by a state \(q \in \mathcal {Q} \) if \(t \rightarrow _\varDelta ^{*} q\), which we also write \(t \rightarrow _\mathcal {A} ^{*} q\). We write \(L(\mathcal {A}, q)\) for the language of all terms recognised by q. A term t is recognised by \(\mathcal {A} \) if there exists \(q \in \mathcal {Q} _f\) s.t. \(t \in L(\mathcal {A}, q)\). In that case we write \(t \in L(\mathcal {A})\). E.g., the tree automaton \(\mathcal {A} = \langle \mathcal {F}, \mathcal {Q}, \mathcal {Q} _f, \varDelta \rangle \) with \(\mathcal {F} = \{0 : 0, s : 1\}\), \(\mathcal {Q} _f = \{q_{pair}\}\) and \(\varDelta = \{0 \rightarrow q_{pair}, s(q_{odd}) \rightarrow q_{pair}, s(q_{pair}) \rightarrow q_{odd}, nil \rightarrow q_{list}, cons(q_{pair}, q_{list}) \rightarrow q_{list}\}\) recognises all lists of even natural numbers.
An \(\epsilon \)transition is a transition \(q \rightarrow q'\) where \(q \in \mathcal {Q} \). A tree automaton \(\mathcal {A} \) is \(\epsilon \)free if it contains no \(\epsilon \)transitions. \(\mathcal {A} \) is deterministic if for all terms t there is at most one state q such that \(t \rightarrow _\varDelta ^{*} q\). \(\mathcal {A} \) is reduced if for all q there is at least one term t such that \(t \rightarrow _\varDelta ^{*} q\).
2.2 Tree Automata Completion Algorithm
Definition 1
(Critical pair). A critical pair is a triple \(\langle l \rightarrow r, \sigma , q \rangle \) where \(l \rightarrow r \in \mathcal {R} \), \(\sigma \) is a substitution, and \(q \in \mathcal {Q} \) such that \(l\sigma \rightarrow _{\mathcal {A} ^i}^{*} q\) and Open image in new window .
Completing a critical pair consists in adding the necessary transitions in \(\mathcal {A} ^{i+1}\) to have \(r\sigma \rightarrow _{\mathcal {A} ^{i+1}}^{*} q\), and hence \(r\sigma \in L(\mathcal {A} ^{i+1}, q)\).
Example 2
Let \(\mathcal {A} ^0\) be the previously defined tree automaton recognising all lists of even natural numbers. Let \(\mathcal {R} = \{s(s(x)) \rightarrow s(x)\}\). \(\mathcal {A} ^0\) has a critical pair \(\langle s(s(x)) \rightarrow s(x), \sigma , q_{pair} \rangle \) with \(\sigma (x) = q_{pair}\). To complete the automaton, we need to add transition such that \(s(q_{pair}) \rightarrow _{\mathcal {A} ^1}^{*} q_{pair}\). Since we already have the state \(q_{odd}\) recognising \(s(q_{pair})\), we only add the transition \(q_{odd} \rightarrow q_{pair}\). The formal definition of the completion step, including the procedure of choosing which new transition to introduce, can be found in [17].
Definition 2
(Simplification Relation). Let \(\mathcal {A} = \langle \mathcal {F}, \mathcal {Q}, \mathcal {Q} _f, \varDelta \rangle \) be a tree automaton and E be a set of equations. If \(s = t \in E, \sigma : \mathcal {X} \mapsto \mathcal {Q}, q, q' \in \mathcal {Q} \) such that \(s\sigma \rightarrow _\mathcal {A} ^{*} q, t\sigma \rightarrow _\mathcal {A} ^{*} q'\) and \(q \ne q'\) then \(\mathcal {A} \) can be simplified into \(\mathcal {A} ' = \mathcal {A} \{q' \mapsto q\}\) (where \(q'\) has been substitued by q), denoted by \(\mathcal {A} \leadsto _E \mathcal {A} '\).
Example 3
This example shows how using equations can lead to approximations in tree automata. Let \(\mathcal {A} \) be the tree automaton defined by the set of transitions \(\varDelta = \{0 \rightarrow q_0, s(q_0) \rightarrow q_1\}\). This automaton recognises the two terms 0 in \(q_0\) and s(0) (also known as 1) in \(q_1\). Let \(E = \{s(\underline{x}) = \underline{x}\}\) containing the equation that equates a number and its successor. For \(\sigma = \{\underline{x} \mapsto 0\}\) we have \(s(\underline{x})\sigma \rightarrow _\mathcal {A} q_1\), \(\underline{x}\sigma \rightarrow _\mathcal {A} q_0\) and \(s(\underline{x})\sigma =_E \underline{x}\sigma \). Then in \(\mathcal {S} _E(\mathcal {A})\), \(q_0\) and \(q_1\) are merged. The resulting automaton has transitions \(\{0 \rightarrow q_0, s(q_0) \rightarrow q_0\}\), which recognises \(\mathbb {N}\) in \(q_0\).
The idea behind the simplification is to overapproximate \(\mathcal {R} ^{*}(L(\mathcal {A}))\) when it is not regular. It has been shown in [19] that it is possible to tune the precision of the approximation. For a given TRS \(\mathcal {R} \), initial state automaton \(\mathcal {A} \) and set of equations E, the termination of the completion algorithm is undecidable in general, even with the use of equations. Our contribution in this paper consists in finding a class of TRS/programs and equations E for which the completion algorithm with equations terminates.
3 Termination of Tree Automata Completion
In this section, we show that termination of the completion algorithm with a set of equations E is ensured under the following conditions: if (i) \(\mathcal {A} ^k\) is reduced \(\epsilon \)free and deterministic (written REFD in the rest of the paper) for all k; (ii) every term of \(\mathcal {A} ^k\) can be rewritten into a term of a given language \(\mathcal {L} \subseteq \mathcal {T} (\mathcal {F})\) using \(\mathcal {R} \) (for instance if \(\mathcal {R} \) is terminating); (iii) \(\mathcal {L} \) has a finite number of equivalence classes w.r.t E. Completion is known to preserve \({\not \epsilon }\)reduceness and \({\not \epsilon }\)determinism if \(E \supseteq E_r \cup E_\mathcal {R} \) [19] where \(E_\mathcal {R} = \{s = t\ \ s \rightarrow t \in \mathcal {R} \}\) and \(E_r = \{f(x_1, \dots , x_n) = f(x_1, \dots , x_n)\ \ f \in \mathcal {F} ^n\}\). Condition (i) is ensured by showing that, in our verification setting, completion preserve REFD. The last condition is ensured by having \(E \supseteq E_\mathcal {L} ^c\) where \(E_\mathcal {L} ^c\) is a set of contracting equations.
Definition 3
(Contracting Equations). Let \(\mathcal {L} \subseteq \mathcal {T} (\mathcal {F})\). A set of equations is contracting for \(\mathcal {L} \), denoted by \(E_\mathcal {L} ^c\), if all equations of \(E_\mathcal {L} ^c\) are of the form \(u = u_p\) with u a linear term of \(\mathcal {T} (\mathcal {F},\mathcal {X})\), \(p \ne \lambda \) and if the set of normal forms of \(\mathcal {L} \) w.r.t the TRS \(\vec {E_\mathcal {L} ^c} = \{u \rightarrow u_p\ \ u = u_p \in E_\mathcal {L} ^c\}\) is finite.
Example 4
Assume that \(\mathcal {F} =\{0 : 0, s : 1\}\). The set \(E_\mathcal {L} ^c=\{s(x)=x\}\) is contracting for \(\mathcal {L} =\mathcal {T} (\mathcal {F})\) because the set of normal forms of \(\mathcal {T} (\mathcal {F})\) with respect to \(\vec {E_\mathcal {L} ^c} = \{s(x)\rightarrow x\}\) is the (finite) set \(\{0\}\). The set \(E_\mathcal {L} ^c=\{s(s(x))=x\}\) is contracting because the normal forms of \(\{s(s(x))\rightarrow x\}\) are \(\{0,s(0)\}\).
The contracting equations ensure that the completion algorithm will merge enough states during the simplification steps to terminate. Note that \(E_\mathcal {L} ^c\) cannot be empty, unless \(\mathcal {L} \) is finite. To prove termination of completion, we first prove that it is possible to bound the number of states needed in \(\mathcal {A} ^{*}\) to recognise a language \(\mathcal {L} \) by the number of normal forms of \(\mathcal {L} \) with respect to \(\vec {E_\mathcal {L} ^c}\). In our case \(\mathcal {L} \) will be the set of output terms of the program. Since \(\mathcal {A} ^{*}\) does not only recognises the output terms, we need additional states to recognise intermediate computation terms. In the proof of Theorem 1 we show that with \(E_\mathcal {R} \), the simplification steps will merge the states recognising the intermediate computation with the states recognising the outputs. If the latter set of states is finite then we can show that \(\mathcal {A} ^{*}\) is finite.
Theorem 1
Let \(\mathcal {A} \) be an REFD tree automaton, \(\mathcal {R} \) a leftlinear TRS, E a set of equations and \(\mathcal {L} \) a language closed by subterms such that for all \(k \in \mathbb {N}\) and for all \(s \in L_\unrhd (\mathcal {A} ^k)\), there exists \(t \in \mathcal {L} \) s.t. \(s \rightarrow _\mathcal {R} ^{*} t\). If \(E \supseteq E^r \cup E_\mathcal {L} ^c \cup E_\mathcal {R} \) then the completion of \(\mathcal {A} \) by R and E terminates with a REFD \(\mathcal {A} ^{*}\).
4 A Class of Analysable Programs
The next step is to identify a class of functional programs and a language \(\mathcal {L} \) for which Theorem 1 applies. By choosing \(\mathcal {L} = \mathcal {T} (\mathcal {F})\) and providing a set of contracting equations \(E_{\mathcal {T} (\mathcal {F})}^c\), the termination theorem above proves that the completion algorithm terminates on any functional program \(\mathcal {R} \). If this works in theory, in practice we want to avoid introducing equations over the application symbol (such as \(@(x, y) = y\)). Contracting equations on applications makes sense in certain cases, e.g., with idempotent functions (\(@(sort, @(sort, x)) = @(sort, x)\)), but in most cases, such equations dramatically lower the precision of the completion algorithm. Hence, we want to identify a language \(\mathcal {L} \) with no contracting equations over @ in \(E_\mathcal {L} ^c\). Since such a language \(\mathcal {L} \) still has to have a finite number of normal forms w.r.t. \(\vec {E_\mathcal {L} ^c}\) (Theorem 1), it cannot include terms containing an unbounded stack of applications. For instance, \(\mathcal {L} \) cannot contain all the terms of the form @(f, x), @(f, @(f, x)), @(f, @(f, @(f, x))), etc. The @ stack must be bounded, even if the applications symbols are interleaved with other symbols (e.g. @(f, s(@(f, s(@(f, s(x))))))). To do that we (i) define a set \(\mathcal {B}^d\) of all terms where such stack size is bounded by \(d \in \mathbb {N}\); (ii) define a set \(\mathcal {K} ^n\) and a class of TRS called \(\mathcal {K} \)TRS such that for any TRS \(\mathcal {R} \) in this class, \(\mathcal {K} ^n\) is closed by \(\mathcal {R} \) and \(\mathcal {K} ^n \cap IRR (\mathcal {R}) \subseteq \mathcal {B}^{\phi (n)}\) for some function \(\phi \). This is done by first introducing a type system over the terms; (iii) finally define \(\mathcal {L} = \mathcal {B}^{\phi (n)} \cap IRR (\mathcal {R})\) that can be used to instantiate Theorem 1.
Definition 4
In Sect. 5, we show how to produce \(E^c\) such that \(\mathcal {B}^d \cap IRR (\mathcal {R})\) has a finite number of normal forms w.r.t. \(\vec {E^c}\) with no equations on @. However we don’t have for all k, for all term \(t \in L_\unrhd (\mathcal {A} ^k)\) a term \(s \in \mathcal {B}^d \cap IRR (\mathcal {R})\) s.t. \(t \rightarrow _\mathcal {R} ^{*} s\) in general. Theorem 1 cannot be instantiated with \(\mathcal {L} = \mathcal {B}^d \cap IRR (\mathcal {R})\). Instead we define (i) a set \(\mathcal {K} ^n \subseteq \mathcal {T} (\mathcal {F})\) and \(\phi \) such that \(\mathcal {K} ^n \cap IRR (\mathcal {R}) \subseteq \mathcal {B}^{\phi (d)}\) and (ii) a class of TRS, called \(\mathcal {K} \)TRS for which \(L_\unrhd (\mathcal {A} ^k) \subseteq \mathcal {K} _\unrhd ^n\). In \(\mathcal {K} \)TRS, the right hand sides of TRS rules are contained in a set \(\mathcal {K} \) whose purpose is to forbid the construction of unbounded partial applications during rewriting. If the initial automaton satisfies \(L_\unrhd (\mathcal {A}) \subseteq \mathcal {K} _\unrhd ^n\) then we can instantiate Theorem 1 with \(\mathcal {L} = \mathcal {K} _\unrhd ^n \cap IRR (\mathcal {R})\) and prove termination.
4.1 Types
Definition 5
Example 5
Define two different types of lists list and \(list'\). The first defines lists of int with the constructor \(consA : int \rightarrow list \rightarrow list \in \mathcal {C} \), while the second defines lists of functions with the constructor \(consB : (int \rightarrow int) \rightarrow list' \rightarrow list' \in \mathcal {C} \). The importance of order becomes manifest here: in the first case a fully reduced term of type list cannot contain any @ whereas in the second case it can. \(ord(list) = 0\) and \(ord(list') = 1\).
Lemma 1
4.2 The Class \(\mathcal {K} \)TRS
Recall that we want to define (i) a set \(\mathcal {K} ^n \subseteq \mathcal {T} (\mathcal {F})\) and \(\phi \) such that \(\mathcal {K} _\unrhd ^n \cap IRR (\mathcal {R}) \subseteq \mathcal {B}^{\phi (n)}\) and (ii) a class of TRS \(\mathcal {K} \)TRS for which \(L_\unrhd (\mathcal {A} ^k) \subseteq \mathcal {K} _\unrhd ^n\). Assuming that \(L_\unrhd (\mathcal {A}) \subseteq \mathcal {K} _\unrhd ^n\) we instantiate Theorem 1 with \(\mathcal {L} = \mathcal {K} _\unrhd ^n \cap IRR (\mathcal {R})\) and prove termination.
Definition 6
By constraining the form of the right hand side of each rule of \(\mathcal {R} \), \(\mathcal {K} \) defines a set of TRS that cannot construct unbounded partial applications during rewriting. The definition of \(\mathcal {K} \) takes advantage of the type structure and Lemma 1. The rules (1) and (2) ensure that an application \(@(t_1, t_2)\) is either: (1) a total application, and the whole term can be rewritten; or (2) a partial application where \(t_2\) can be rewritten into a term of \(\mathcal {B} ^0\) (Lemma 1). In (1), \(\mathcal {Z} \) allows partial applications inside the total application of a multiparameter function.
Example 6
Consider the classical map function. A typical call to this function is @(@(map, f), l) of type list, where f is a mapping function, and l a list. The whole term belongs to \(\mathcal {K} \) because of rule (1): list is an algebraic type and its subterm \(@(map, f) : list \rightarrow list\) belongs to \(\mathcal {Z} \). This subterm is a partial application, but there is no risk of stacking partial applications as it is part of a complete call (to the map function).
Example 7

Prove that if \(\mathcal {A} \) recognises terms of \(\mathcal {K} _\unrhd ^n\), then it is preserved by completion using the notion of \(\mathcal {K} ^n\)coherence of \(\mathcal {A} \).

Prove that \(\mathcal {K} _\unrhd ^n \cap IRR (\mathcal {R}) \subseteq \mathcal {B}^{n+2B} \cap IRR (\mathcal {R})\) where \(B \in \mathbb {N}\) is a fixed upper bound of the arity of all the types of the program.

Prove that there is a finite number of normal form of \(\mathcal {B}^{n+2B} \cap IRR (\mathcal {R})\) w.r.t \(\vec {E_\mathcal {L} ^c}\).

Finally, we use those three properties combined, and instantiate Theorem 1 with \(\mathcal {L} = \mathcal {B}^{n+2B} \cap IRR (\mathcal {R})\) to prove Theorem 2, defined as follows.
Theorem 2
Let \(\mathcal {A} \) be a \(\mathcal {K} ^n\)coherent REFD tree automaton, \(\mathcal {R} \) a terminating functional TRS such that for all rule \(l \rightarrow r \in \mathcal {R}, r \in \mathcal {K} \) and E a set of equations. Let \(\mathcal {L} = \mathcal {B}^{n+2B} \cap IRR (\mathcal {R})\). If \(E = E^r \cup E_\mathcal {L} ^c \cup E_\mathcal {R} \) then the completion of \(\mathcal {A} \) by \(\mathcal {R} \) and E terminates.
To prove that after each step of completion, the recognised language stays in \(\mathcal {K} ^n\), we require the considered automaton to be \(\mathcal {K} ^n\)coherent.
Definition 7
If \(\mathcal {K} ^n\)coherence is not preserved during completion, then some states in the completed automaton may recognise terms outside of \(\mathcal {K} _\unrhd ^n\). Our goal is to show that it is preserved by \(\mathcal {C} _\mathcal {R} (\cdot )\) (Lemma 2) then by \(\mathcal {S} _E(\cdot )\) (Lemma 3).
Lemma 2
(\(\mathcal {C} _\mathcal {R} (\mathcal {A})\) preserves \(\mathcal {K} ^n\)coherence). Let \(\mathcal {A} \) be a REFD tree automaton. If \(\mathcal {A} \) is \(\mathcal {K} ^n\)coherent, then \(\mathcal {C} _\mathcal {R} (\mathcal {A})\) is \(\mathcal {K} ^n\)coherent.
Lemma 3
(\(\mathcal {S} _E(\mathcal {A})\) preserves \(\mathcal {K} ^n\)coherence). Let \(\mathcal {A} \) be a REFD tree automaton, \(\mathcal {R} \) a functional TRS and E a set of equations such that \(E = E^r \cup E_\mathcal {L} ^c \cup E_\mathcal {R} \) with \(\mathcal {L} =\mathcal {B}^{n+2B} \cap IRR (\mathcal {R})\). If \(\mathcal {A} \) is \(\mathcal {K} ^n\)coherent then \(\mathcal {S} _E(\mathcal {A})\) is \(\mathcal {K} ^n\)coherent.
By using Lemmas 2 and 3, we can prove that the completion algorithm, which is a composition of \(C_\mathcal {R} (\mathcal {A})\) and \(\mathcal {S} _E(\mathcal {A})\), preserves \(\mathcal {K} ^n\)coherence. The proofs of these two lemmas are based on a detailed analysis of the completion algorithm itself. The complete proofs are provided in [17].
Lemma 4
(Completion preserves \(\mathcal {K} ^n\)coherence). Let \(\mathcal {A} = \langle \mathcal {F}, \mathcal {Q}, \mathcal {Q} _f, \varDelta \rangle \) be a tree automaton, \(\mathcal {R} \) a functional TRS and E a set of equations. If \(E = E^r \cup E_\mathcal {L} ^c \cup E_\mathcal {R} \) with \(\mathcal {L} =\mathcal {B}^{n+2B} \cap IRR (\mathcal {R})\) and \(\mathcal {A} \) is \(\mathcal {K} ^n\)coherent then for all \(k \in \mathbb {N}\), \(\mathcal {A} ^k\) is \(\mathcal {K} ^n\)coherent. In particular, \(\mathcal {A} ^{*}\) is \(\mathcal {K} ^n\)coherent.
By construction we can prove that the depth of irreducible \(\mathcal {K} _\unrhd ^n\) terms is bounded, which correspond to the following lemma.
Lemma 5
For all \(t : T \in \mathcal {K} _\unrhd ^n\), \(t:T \in IRR (\mathcal {R}) \Rightarrow t : T \in \mathcal {B}^{n + 2B  arity(T)}\).
4.3 Proof of Theorem 2
Proof
According to Lemma 4, for all \(k \in \mathbb {N}\), the completed automaton \(\mathcal {A} ^k\) is \(\mathcal {K} ^n\)coherent. By definition this implies that \(\mathcal {L} _\unrhd (\mathcal {A} ^k) \subseteq \mathcal {K} _\unrhd ^n\). Moreover, we know that \( IRR (\mathcal {R}) \cap \mathcal {K} _\unrhd ^n \subseteq \mathcal {B}^{n + 2B}\) (Lemma 5). Let \(\mathcal {L} = \mathcal {B}^{n+2B} \cap IRR (\mathcal {R})\). \(\mathcal {R} \) is terminating, so for every term \(s \in \mathcal {L} _\unrhd (\mathcal {A} ^k)\) there exists \(t \in \mathcal {L} \) such that \(s \rightarrow _\mathcal {R} ^{*} t\). Since the number of normal form of \(\mathcal {L} \) is finite w.r.t \(\vec {E}\), Theorem 1 implies that the completion of \(\mathcal {A} \) by \(\mathcal {R} \) and E terminates.
5 Equation Generation

The initial automaton \(\mathcal {A} \) must be \(\mathcal {K} ^n\)coherent and REFD.

\(\mathcal {R} \) must be terminating.

All lefthand sides of rules of \(\mathcal {R} \) are in the set of terms \(\mathcal {K} \). This is a straightforward syntactic check. If it is not verified, we can reject the TRS before starting the completion.

The set of equations E must be of the form \(E^r \cup E_\mathcal {L} ^c \cup E_\mathcal {R} \). The equation sets \(E^r\) and \(E_\mathcal {R} \) are determined directly from the syntactic structure of \(\mathcal {R} \). However, there is no unique suitable set of contracting equations \(E_\mathcal {L} ^c\). This set must be generated carefully, because a bad choice of contracting equations (i.e., equations that equate too many terms) will have a severe negative impact on the precision of the analysis result.
In this section, we describe a method for generating all possible sets of contracting equations \(E_\mathcal {L} ^c\). To simplify the presentation, we only present the case where \(\mathcal {L} =\mathcal {W} (\mathcal {C})\) and \( IRR (\mathcal {R}) \subseteq \mathcal {W} (\mathcal {C})\) (i.e., all results are firstorder terms). Our approach looks for contracting equations for the set of closed terms \(\mathcal {W} (\mathcal {C})\) instead of the set \(\mathcal {B}^{n+2B}\) mentioned in Theorem 2. More precisely, we generate the set of equations iteratively, as a series of equation sets \(\mathbb {E}_c^{k}\) where the equations only equate terms of depth at most k. Recall that a contracting equation is of the form \(u = u_p\) with \(p \ne \lambda \), i.e., it equates a term with a strict subterm of the same type. A set of contracting equations over the set \(\mathcal {W} (\mathcal {C})\) is then generated as follows: (i) generate the set of lefthand side of equations as a covering set of terms [25], so that for each term \(t \in \mathcal {W} (\mathcal {C})\) there exists a lefthand side u of an equation and a substitution \(\sigma \) such that \(t = u\sigma \). (ii) for each lefthand side, generate all possible equations of the form \(u = u_p\), satisfying that both sides have the same type. (iii) from all those equations, we build all possible \(E_\mathcal {L} ^c\) (with \(\mathcal {L} =\mathcal {W} (\mathcal {C})\)) such that the set of normal forms of \(\mathcal {W} (\mathcal {C})\) w.r.t. \(\vec {E_\mathcal {L} ^c}\) is finite. Since \(\vec {E_\mathcal {L} ^c}\) is leftlinear and \(\mathcal {L} =\mathcal {W} (\mathcal {C})\), this can be decided efficiently [11].
Example 8
Assume that \(\mathcal {C} =\{0 : 0, s : 1\}\). For \(k=1\), the covering set is \(\{s(x),0\}\) and \(\mathbb {E}_c^{1}=\{\{s(x)=x\}\}\). For depth 2, the covering set is \(\{s(s(x)),s(0),0\}\) and \(\mathbb {E}_c^{2}=\mathbb {E}_c^{1} \cup \{\{s(s(x))=x\},\{s(s(x))=s(x)\},\{s(0)=0\},\{s(0)=0, s(s(x))=x\},\{s(0)=0,s(s(x))=s(x)\}\}\). All equation sets of \(\mathbb {E}_c^{1}\) and \(\mathbb {E}_c^{2}\) satisfy Definition 3 and lead to different approximations.
 1.
We first complete the tree automaton \(\mathcal {A} _k\) recognising the finite subset of \(\mathcal {L} (\mathcal {A})\) of terms of maximum depth k. Since \(\mathcal {L} (\mathcal {A} _k)\) is finite and \(\mathcal {R} \) is terminating, the set of reachable terms is finite, completion terminates without equations and computes an automaton \(\mathcal {A} _k^{*}\) recognising exactly the set \(\mathcal {R} ^{*}(L(\mathcal {A} _k))\) [20].
 2.
If \(\mathcal {L} (\mathcal {A} _k^{*})\) does not satisfy \(\varphi \) then verification fails: a counterexample is found.
 3.
Otherwise, we search for a suitable set \(E_c\). All \(E_c\) of \(\mathbb {E}_c^k\) that introduce a counterexample in the completion of \(\mathcal {A} _k\) with \(\mathcal {R} \) and \(E_c\) are filtered out.
 4.
Then for all remaining \(E_c\), we try to complete \(\mathcal {A} \) with \(\mathcal {R} \) and \(E=E_r \cup E_\mathcal {R} \cup E_c\) and check \(\varphi \) on the completed automaton. If \(\varphi \) is true on \(\mathcal {A} ^{*}\) then verification succeeds. Otherwise, we try the next \(E_c\).
 5.
If there remain no \(E_c\), we start again with \(k=k+1\).
If there exists a set of equations \(E_c\) able to verify the program, this algorithm will find it eventually, or find a counter example. However if there is no set of equations that can verify the program, this algorithm does not terminate.
6 Experiments
The verification technique described above has been integrated in the Timbuk library [16]. We implemented the naive equation generation where all possible equation sets \(E_c\) are enumerated. Despite the evident scalability issues of this simple version of the verification algorithm, we have been able to verify a series of properties of several classical higherorder functions: \( map \), \( filter \), \( exists \), \( forall \), \( foldRight \), \( foldLeft \) as well as higherorder sorting functions parameterised by an ordering function. Most examples are taken from or inspired by [26, 28] and have corresponding TRSs in the \(\mathcal {K} \) set defined above. The property \(\varphi \) consists in checking that a finite set of forbidden terms is not reachable (Patterns section of Timbuk specifications).
Given \(\mathcal {A} \), \(\mathcal {R} \) and \(\mathcal {A} ^{*}\), the correctness of the verification, i.e. the fact that \(\mathcal {L} (\mathcal {A} ^{*}) \supseteq \mathcal {R} ^{*}(\mathcal {L} (\mathcal {A}))\), can be checked in a proof assistant embedding a formalisation of rewriting and tree automata. It is enough to prove that (a) \(\mathcal {L} (\mathcal {A} ^{*})\supseteq \mathcal {L} (\mathcal {A})\) and that (b) for all critical pairs \(\langle l \rightarrow r, \sigma , q \rangle \) of \(\mathcal {A} ^{*}\) we have \(r\sigma \rightarrow _{\mathcal {A} ^{*}}^{*} q\). Property (a) can be checked using standard algorithms on tree automata. Property (b) can be checked by enumerating all critical pairs of \(\mathcal {A} ^{*}\) (there are finitely many) and by proving that all of them satisfy \(r\sigma \rightarrow _{\mathcal {A} ^{*}}^{*} q\). Since there exists algorithms for checking properties (a) and (b), the complete proof of correctness can automatically be built in the proof assistant. For instance, the automaton \(\mathcal {A} ^{*}\) can be used as a certificate to build the correctness proof in Coq [6] and in Isabelle/HOL [14]. It is also used to build unreachability proofs in Isabelle/HOL [14]. Besides, since verifying (a) and (b) is automatic, the correctness proof may be run outside of the proof assistant (in a more efficient way) using a formally verified external checker extracted from the formalisation. All our (successful) completion attempts output a comp.res file, containing \(\mathcal {A} \), \(\mathcal {R} \) and \(\mathcal {A} ^{*}\), which has been certified automatically using the external certified checker of [6]. Timbuk’s site http://people.irisa.fr/Thomas.Genet/timbuk/funExperiments/ lists those verification experiments. Nine of them are automatically proven. Two other examples show that correct counterexamples are generated when the property is not provable. On one example equation generation times out due to our Open image in new window enumeration of equations. For this last case, by providing the right set of equations in mapTree2NoGen the verification of the function succeeds.
7 Related Work
When it comes to verifying firstorder imperative programs, there exist several successful tools based on abstract interpretation such as ASTREE [3] and SLAM [2]. The use of abstract interpretation for verifying higherorder functional programs has comparatively received less attention. The tree automaton completion technique is one analysis technique able to verify firstorder Java programs [4]. Until now, the completion algorithm was guaranteed to terminate only in the case of firstorder functional programs [19].
Liquid Types [31], followed by Bounded Refinement Types [33, 34], and also SetTheoretic Types [8, 9], are all attempts to enrich the type system of functional languages to prove nontrivial properties on higherorder programs. However, these methods are not automatic. The user has to express the property he wants to prove using the type system, which can be tedious and/or difficult. In some cases, the user even has to specify straightforward intermediate lemmas to help the type checker.
The first attempt in verifying regular properties came with Jones [21] and Jones and Andersen [22]. Their technique computes a grammar overapproximating the set of states reachable by a rewriting systems. However, their approximation is fixed and too rough to prove programs like Example 1 (\( filter \) odd). Our program and property models are close to those of Jones and Andersen. However, the approximation in our analysis is not fixed and can be automatically adapted to the verification objective.
Ong et al. proposes one way of addressing the precision issue of Jones and Andersen’s approach using a model checking technique on Pattern Matching Recursion Schemes [28] (PMRS). This technique improves the precision but is still not able to verify functions such as Example 1 (see [32] page 85). As shown in our experiments, our technique handles this example.
Kobayashi et al. developed a tree automatabased technique [26] (but not relying on TRS and completion), able to verify regular properties (including safety properties on Example 1). We have verified a selection of examples coming from [26] and observed that we can verify the same regular properties as they can. Our prototype implementation is inferior in terms of execution time, due to the slow generation of equations. A strength of our approach is that our verification results are certifiable and that they can be used as certificates to build unreachability proofs in proof assistants (see Sect. 6).
Our verification framework is based on regular abstractions and uses a simple abstraction mechanism based on equations. Regular abstractions are less expressive than HigherOrder Recursion Schemes [23, 29] or Collapsible Pushdown Automata [7], and equationbased abstractions are a particular case of predicate abstraction [24]. However, the two restrictions imposed in this particular framework result in two strong benefits. First, the precision of the approximation is formally defined and precisely controlled using equations: \(\mathcal {L} (\mathcal {A} ^{*}) \subseteq (\mathcal {R}/E)^{*}(\mathcal {L} (\mathcal {A}))\) [20]. This precision property permits us to prove intricate properties with simple (regular) abstractions. Second, using tree automatabased models facilitates the certification of the verification results in a proof assistant. This significantly increases the confidence in the verification result compared e.g., to verdicts obtained by complex CEGARbased modelcheckers.
8 Conclusion and Future Work
This paper shows that tree automata completion is a simple yet powerful, fully automatic verification technique for higherorder functional programs, expressed as term rewriting systems. We have proved that the completion algorithm terminates on a subset of TRS encompassing common functional programs, and provided experimental evidence of the viability of the approach by verifying properties on fundamental higherorder functions including filtering and sorting.
One remaining question is whether this approach is complete: if there exists a regular approximation of the reachable terms of a functional program, can we build it using equations? We can already answered this question in the positive when \(\mathcal {L} =\mathcal {W} (\mathcal {C})\), i.e., all results are first order terms [15]. Extending this result to all kind of results, including higherorder ones, is a promising research topic.
The generation of the approximating equations is automatic but simpleminded, and too simple to turn the prototype into a full verification tool. Further work will look into how sets of contracting equations can be generated in a more efficient manner, notably by taking the structure of the TRS into account and using a CEGAR approach.
The present verification technique is agnostic to the evaluation strategy. An interesting research track would be to experiment completionbased verification techniques with different term rewriting semantics of functional programs such as outlined by Clemente et al. [10]. This would permit us to take a particular evaluation strategy into account, and in certain cases, improve the precision of the verification. We already experimented with this in [18]. This is in line with our longterm research goal of providing a lightweight verification tool to assist the working OCaml programmer.
Our work focuses on verifying regular properties represented by tree automata. Dealing with nonregular overapproximations of reachable terms would allow us to verify relational properties like comparing the length of the list before and after filter. This is one of the objective of techniques like [24]. Building nonregular overapproximations of reachable terms for TRS, using a form of completion, is possible [5]. However, up to now, adapting automatically the precision of such approximations to a given verification goal is not possible. Extending their approach with equations may provide a powerful verification tool worth pursuing.
References
 1.Baader, F., Nipkow, T.: Term Rewriting and All That. Cambridge University Press, New York (1998)CrossRefGoogle Scholar
 2.Ball, T., Rajamani, S.K.: The SLAM project: debugging system software via static analysis. In: Conference Record of POPL 2002: The 29th SIGPLANSIGACT Symposium on Principles of Programming Languages, Portland, OR, USA, 16–18 January 2002, pp. 1–3 (2002)Google Scholar
 3.Blanchet, B., Cousot, P., Cousot, R., Feret, J., Mauborgne, L., Miné, A., Monniaux, D., Rival, X.: A static analyzer for large safetycritical software. In: Proceedings of the ACM SIGPLAN 2003 Conference on Programming Language Design and Implementation 2003, San Diego, California, USA, 9–11 June 2003, pp. 196–207 (2003)Google Scholar
 4.Boichut, Y., Genet, T., Jensen, T., Le Roux, L.: Rewriting approximations for fast prototyping of static analyzers. In: Baader, F. (ed.) RTA 2007. LNCS, vol. 4533, pp. 48–62. Springer, Heidelberg (2007). https://doi.org/10.1007/9783540734499_6CrossRefzbMATHGoogle Scholar
 5.Boichut, Y., Chabin, J., Réty, P.: Towards more precise rewriting approximations. In: Dediu, A.H., Formenti, E., MartínVide, C., Truthe, B. (eds.) LATA 2015. LNCS, vol. 8977, pp. 652–663. Springer, Cham (2015). https://doi.org/10.1007/9783319155791_51CrossRefGoogle Scholar
 6.Boyer, B., Genet, T., Jensen, T.: Certifying a tree automata completion checker. In: Armando, A., Baumgartner, P., Dowek, G. (eds.) IJCAR 2008. LNCS (LNAI), vol. 5195, pp. 523–538. Springer, Heidelberg (2008). https://doi.org/10.1007/9783540710707_43CrossRefGoogle Scholar
 7.Broadbent, C.H., Carayol, A., Hague, M., Serre, O.: Cshore: a collapsible approach to higherorder verification. In: ICFP 2013. ACM (2013)Google Scholar
 8.Castagna, G., Nguyen, K., Xu, Z., Abate, P.: Polymorphic functions with settheoretic types: part 2: local type inference and type reconstruction. In: POPL 2015. ACM (2015)Google Scholar
 9.Castagna, G., Nguyen, K., Xu, Z., Im, H., Lenglet, S., Padovani, L.: Polymorphic functions with settheoretic types: part 1: syntax, semantics, and evaluation. In: POPL 2014. ACM (2014)Google Scholar
 10.Clemente, L., Parys, P., Salvati, S., Walukiewicz, I.: Ordered treepushdown systems. In: FSTTCS 2015. LIPIcs, Schloss Dagstuhl  LeibnizZentrum fuer Informatik, vol. 45, pp. 163–177 (2015)Google Scholar
 11.Comon, H.: Sequentiality, monadic secondorder logic and tree automata. Inf. Comput. 157(1–2), 25–51 (2000)MathSciNetCrossRefGoogle Scholar
 12.Comon, H., Dauchet, M., Gilleron, R., Jacquemard, F., Lugiez, D., Löding, C., Tison, S., Tommasi, M.: Tree Automata Techniques and Applications (2008). http://tata.gforge.inria.fr
 13.Coq: The Coq proof assistant reference manual: version 8.6 (2016)Google Scholar
 14.Felgenhauer, B., Thiemann, R.: Reachability, confluence, and termination analysis with statecompatible automata. Inf. Comput. 253, 467–483 (2017)MathSciNetCrossRefGoogle Scholar
 15.Genet, T.: Automata completion and regularity preservation. Technical report, Inria (2017). https://hal.inria.fr/hal01501744
 16.Genet, T., Boichut, Y., Boyer, B., Murat, V., Salmon, Y.: Reachability Analysis and Tree Automata Calculations. IRISA/Université de Rennes 1. http://people.irisa.fr/Thomas.Genet/timbuk/
 17.Genet, T., Haudebourg, T., Jensen, T.: Verifying HigherOrder Functional Programs With Tree Automata: Extended Version (2017). https://hal.inria.fr/hal01614380
 18.Genet, T., Salmon, Y.: Reachability analysis of innermost rewriting – extended version. Log. Methods Comput. Sci. 13(1), 1–35 (2017)MathSciNetzbMATHGoogle Scholar
 19.Genet, T.: Termination criteria for tree automata completion. J. Log. Algebr. Methods Program. 85(1), 3–33 (2016)MathSciNetCrossRefGoogle Scholar
 20.Genet, T., Rusu, V.: Equational approximations for tree automata completion. J. Symb. Comput. 45(5), 574–597 (2010)MathSciNetCrossRefGoogle Scholar
 21.Jones, N.D.: Flow analysis of lazy higherorder functional programs. In: Abramsky, S., Hankin, C. (eds.) Abstract Interpretation of Declarative Languages, pp. 103–122. Ellis Horwood, Chichester (1987)Google Scholar
 22.Jones, N.D., Andersen, N.: Flow analysis of lazy higherorder functional programs. Theor. Comput. Sci. 375(1–3), 120–136 (2007)MathSciNetCrossRefGoogle Scholar
 23.Kobayashi, N.: Types and higherorder recursion schemes for verification of higherorder programs. In: Proceedings of the 36th ACM SIGPLANSIGACT Symposium on Principles of Programming Languages, POPL 2009, Savannah, GA, USA, 21–23 January 2009, pp. 416–428 (2009)Google Scholar
 24.Kobayashi, N., Sato, R., Unno, H.: Predicate abstraction and CEGAR for higherorder model checking. In: Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2011, San Jose, CA, USA, 4–8 June 2011, pp. 222–233 (2011)Google Scholar
 25.Kounalis, E.: Testing for the ground (co)reducibility property in termrewriting systems. Theor. Comput. Sci. 106(1), 87–117 (1992)MathSciNetCrossRefGoogle Scholar
 26.Matsumoto, Y., Kobayashi, N., Unno, H.: Automatabased abstraction for automated verification of higherorder treeprocessing programs. In: Feng, X., Park, S. (eds.) APLAS 2015. LNCS, vol. 9458, pp. 295–312. Springer, Cham (2015). https://doi.org/10.1007/9783319265292_16CrossRefGoogle Scholar
 27.Meseguer, J., Palomino, M., MartíOliet, N.: Equational abstractions. TCS 403(2–3), 239–264 (2008)MathSciNetCrossRefGoogle Scholar
 28.Ong, C.L., Ramsay, S.J.: Verifying higherorder functional programs with patternmatching algebraic data types. In: Proceedings of the 38th ACM SIGPLANSIGACT Symposium on Principles of Programming Languages, POPL 2011, Austin, TX, USA, 26–28 January 2011, pp. 587–598 (2011)Google Scholar
 29.Ong, C.H.: On modelchecking trees generated by higherorder recursion schemes. In: 21st Annual IEEE Symposium on Logic in Computer Science (LICS 2006), pp. 81–90. IEEE (2006)Google Scholar
 30.Paulson, L.C., et al.: The Isabelle reference manual. Technical report, University of Cambridge, Computer Laboratory (1993)Google Scholar
 31.Rondon, P.M., Kawaguchi, M., Jhala, R.: Liquid types. In: Proceedings of the ACM SIGPLAN 2008 Conference on Programming Language Design and Implementation, Tucson, AZ, USA, 7–13 June 2008, pp. 159–169 (2008)Google Scholar
 32.Salmon, Y.: Analyse d’atteignabilité pour les programmes fonctionnels avec stratégie d’évaluation en profondeur. Ph.D. thesis, University of Rennes 1 (2015)Google Scholar
 33.Vazou, N., Bakst, A., Jhala, R.: Bounded refinement types. In: Proceedings of the 20th ACM SIGPLAN International Conference on Functional Programming, ICFP 2015, Vancouver, BC, Canada, 1–3 September 2015, pp. 48–61 (2015)Google Scholar
 34.Vazou, N., Rondon, P.M., Jhala, R.: Abstract refinement types. In: Felleisen, M., Gardner, P. (eds.) ESOP 2013. LNCS, vol. 7792, pp. 209–228. Springer, Heidelberg (2013). https://doi.org/10.1007/9783642370366_13CrossRefGoogle Scholar
Copyright information
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. The images or other third party material in this book are included in the book's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the book's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.