Abstract
We propose an extension of the automated theorem prover E by the weighted path ordering (WPO). WPO is theoretically stronger than all the orderings used in E Prover, however its parametrization is more involved than those normally used in automated reasoning. In particular, it depends on a term algebra. We integrate the ordering in E Prover and perform an evaluation on the standard theorem proving benchmarks. The ordering is complementary to the ones used in E prover so far. Furthermore, firsttime presented here, we propose a relaxed variant of the weighted path order as an approximation of the standard WPO definition. A theorem prover strategy with a relaxed order can be incomplete, which is, however, not an issue as completeness can be easily regained by switching to a complete strategy. We show that the relaxed weighted path order can have a huge impact on an improvement of a theorem prover strategy.
Introduction
In the last two decades the superposition calculus has become one of the main foundations of automated theorem provers for firstorder logic. Indeed the systems regularly winning the yearly CADE ATP Systems Competition, such as E Prover [9] and Vampire [4] are based on the superposition calculus. Also for the problems not previously solved by humans, superposition calculus based Prover9 has been most useful so far [7].
The use of powerful and efficient orderings is one of the major advantages of the superposition calculus for classical firstorder theorem proving. Orderings allow provers to avoid redundant clauses, namely clauses which only differ in the order of literals, as well as permit orienting equations and therefore rewriting the clauses only in one direction. The three predominantly used orderings in automated theorem proving are LPO, KBO, and RPO. In fact, for the former two optimized implementations are known [5, 6].
However, term rewriting research has shown that there exist more powerful orderings, for example the weighted path order (WPO) [12] is one of the strongest known orderings. With carefully selected parameters is can subsume most known orderings including LPO, KBO, and RPO [13]. There are however two reasons, why such stronger orderings have not been tried for automated reasoning so far. First, they often rely on complicated parameters. For example WPO relies on an algebra on terms as an argument. Second, the efficiency of KBO, LPO, or even RPO has been optimized for the most common cases, whereas the more advanced orderings have been stated in a general manner, without optimizing their efficiency.
This paper extends our previous research [2] where we attempt to overcome both of these obstacles and propose an efficient way to implement WPO as part of an automated reasoning system. We also propose parameters that allow WPO to function efficiently within a stateoftheart automated theorem prover and help with actual theorem proving problems. After discussing the preliminaries on term orderings in Sect. 2 and on their use in the superposition calculus in Sect. 3, the particular contributions of this paper are:

We propose algebras that can be used efficiently for firstorder theorem proving (Sect. 4).

Firsttime presented in this paper, we propose a relaxed version of WPO based on approximation of the standard WPO definition (Sect. 5).

We evaluate WPO against existing orderings in E Prover on parts of the TPTP library, the proofs stemming from the AIM conjecture [10], and on the CoqHammer proofs [1] in Sect. 6.

We show that relaxed WPO can provide a huge benefit for a theorem proving strategy (Sect. 6.2).

Additionally to our previous research, we provide an evaluation of effectiveness of our implementation based on real CPU time limits.
This work is an extended version of our ICMS 2018 paper [2]. In comparison with that work, the relaxed WPO, efficient implementation, and a more extensive evaluation including an evaluation based on CPU time limit are the main new contributions presented firsttime in this paper.
Term Orderings and Rewriting
We work in firstorder logic (FOL). A signature \(\Sigma \) is a collection of symbols with arities. The set of firstorder variables is denoted \(\mathcal {V}\), and \(\mathcal {T}_\Sigma \) stands for the terms over signature \(\Sigma \) and variables \(\mathcal {V}\). A literal is an atomic formula or its negation, and a clause is a disjunction of literals. In ATPs, clauses are used to describe both the input problem, and the knowledge inferred during the search. On occasion, unit equality clauses of the form \(s=t\) are inferred. Such equalities can be used to simplify other clauses using \(s\rightarrow t\) or \(t\rightarrow s\) as a rewriting rule.
Rewriting systems, described by finite sets of rewriting rules, are often used inside ATPs to keep a set of clauses in normal forms. A crucial property for ATPs is the termination of every rewriting chain on any term. The termination of system \(\mathcal {R}\) can be shown using a wellfounded term ordering \(>_\mathcal {T}\) on terms \(\mathcal {T}\), that orients every rule \((s\rightarrow t)\in \mathcal {R}\), meaning \(s>_\mathcal {T}t\). Terminating rewriting systems are called reduction orders. See [8, 13] for details.
Reduction orders are successfully used in many stateoftheart ATPs. Common orders [8, 13] are lexicographic path order (LPO) and KnuthBendix order (KBO). LPO extends a precedence \(>_\Sigma \) on symbols to a reduction order on \(\mathcal {T}_\Sigma \) by a variety of subterm comparisons. KBO is generated by a precedence and symbol weights. Terms in KBO are first compared by weights and the subterm comparisons are necessary only if the weights differ. WPO further abstracts the idea of symbol weight comparisons to comparisons in algebras.
In this section, we remind the theoretical definitions of the orderings LPO and KBO used in E Prover, and remind the theoretical definition of WPO. We mostly follow [8] for LPO and KBO and [13] for WPO, and we refer the reader there for further details.
All the orderings will be defined on firstorder terms \(\mathcal {T}_\Sigma \), and rely on a precedence \(>_\Sigma \), which needs to be a proper order on the symbols from signature \(\Sigma \).
Definition 2.1
(LPO [8]) Given a precedence on symbols \(>_\Sigma \), we define the lexicographic path order (LPO) \(>_{\text{ lpo }}\) as follows: \(s = f(s_1,\dots ,s_{n}) >_{\text{ lpo }}t\) iff one of the following conditions holds:

1.
\(t = f(t_1,\dots ,t_{n})\) and \(\exists i \in \{ 1, \dots , n \}\) such that

(i)
\(s_j = t_j\) for all j such that \(1 \leqslant j < i\),

(ii)
\(s_i >_{\text{ lpo }}t_i\), and

(iii)
\(s >_{\text{ lpo }}t_j\) for all j such that \(i < j \leqslant n\),

(i)

2.
\(t = g(t_1,\dots ,t_{m})\), \(f >_\Sigma g\), and \(s >_{\text{ lpo }}t_i\) for all i such that \(1 \leqslant i \leqslant m\), or

3.
\(s_i \ge _{\text{ lpo }}t\) for all i such that \(1 \leqslant i \leqslant n\).
Where \(\ge _{\text{ lpo }}\) is the reflexive closure of \(>_{\text{ lpo }}\).
In order to define KBO, we additionally need a weight function induced by a pair \((w,w_0)\) where w is symbol weight function and \(w_0\) is a constant variable weight. The constant \(w_0\) must be greater than zero, and the mapping w from the signature \(\Sigma \) to the natural numbers is defined such that \(w(c) \geqslant w_0\) for any constant \(c\in \Sigma \). The weight function w on symbols from \(\Sigma \) is naturally extended to the weight function on terms \(\mathcal {T}_\Sigma \) as follows.
Additionally if a unary function f has weight 0, than f is the greatest element wrt. the precedence. In the following, \(s_x\) denotes the number of occurrences of variable \(x\in \mathcal {V}\) in term \(s\in \mathcal {T}_\Sigma \).
Definition 2.2
(KBO [8]) Given a precedence \(>_\Sigma \) and the weight function induced by \((w,w_0)\), we define the KnuthBendix order (KBO) \(>_{\text{ kbo }}\) on \(\mathcal {T}_\Sigma \) as follows: \(s >_{\text{ kbo }}t\) iff \(s_x \geqslant t_x\) for all \(x \in \mathcal {V}\) and one of the following conditions holds:

1.
\(w(s) > w(t)\), or

2.
\(w(s) = w(t)\) and one of the following conditions holds:

(i)
\(t \in \mathcal {V}\) and \(s = f^n(t)\) for an unary function symbol f and \(n > 0\),

(ii)
\(s = f(s_1,\dots ,s_{n})\), \(t = f(t_1,\dots ,t_{n})\), and \(\exists i \in \{ 1, \dots , n \}\) such that \(s_j = t_j\) for all \(1 \leqslant j < i\) and \(s_i >_{\text{ kbo }}t_i\), or

(iii)
\(s = f(s_1,\dots ,s_{n})\), \(t = g(t_1,\dots ,t_{m})\), and \(f > g\).

(i)
Weighted path order (WPO) further abstracts the weight function to the notion of algebras on firstorder terms defined as follows.
Definition 2.3
An algebra \(\mathcal {A}\) over \(\Sigma \) consists of a wellordered carrier set and of an interpretation \(f_\mathcal {A}:\mathbb {N}^n\rightarrow \mathbb {N}\) for every nary function symbol f from \(\Sigma \). An algebra \(\mathcal {A}\) is weakly monotone iff \(a\ge b\) implies \(f(\ldots ,a,\ldots )\ge f(\ldots ,b,\ldots )\), and weakly simple iff \(f(\ldots ,a,\ldots )\ge a\) for every \(f\in \Sigma \).
In this work, we consider the carrier set always to be \(\mathbb {N}\) with the standard order on \(\mathbb {N}\). Given a variable assignment \(\sigma :\mathcal {V}\rightarrow \mathbb {N}\), we can structurally interpret every term \(t\in \mathcal {T}_\Sigma \) using interpretations from algebra \(\mathcal {A}\) as the number \(\sigma _\mathcal {A}(t)\in \mathbb {N}\), formally as follows.
Thus the algebra \(\mathcal {A}\) induces the following ordering \(>_\mathcal {A}\) on terms: \(s>_\mathcal {A}t\) iff \(\sigma _\mathcal {A}(s)>\sigma _\mathcal {A}(t)\) for every variable assignment \(\sigma \). Similarly, we write \(s\ge _\mathcal {A}t\) iff \(\sigma _\mathcal {A}(s)\ge \sigma _\mathcal {A}(t)\) for every \(\sigma \). The following defines WPO induced by \(\mathcal {A}\).
Definition 2.4
(WPO [13]) Given a precedence \(>_\Sigma \) and an algebra \(\mathcal {A}\) over \(\Sigma \), the weighted path order \(>_{\text{ wpo }}\) on \(\mathcal {T}_\Sigma \) is defined as follows: \(s=f(s_1,\ldots ,s_n)>_{\text{ wpo }}t\) iff (1) \(s >_\mathcal {A}t\), or (2) \(s \ge _\mathcal {A}t\) and one of the following holds:

2a.
\(\exists i\in \{1,\ldots ,n\}.\ s_i\ge _{\text{ wpo }}t\), or

2b.
\(t=g(t_1,\ldots ,t_m)\), \(\forall j\in \{1,\ldots ,m\}.\ s>_{\text{ wpo }}t_j\) and either

(i)
\(f>_\Sigma g\), or

(ii)
\(f=g\) and \((s_1,\ldots ,s_n)>_{\text{ wpo }}^{lex}(t_1,\ldots ,t_n)\).

(i)
Only terms comparable in \(\mathcal {A}\) are comparable in \(>_{\text{ wpo }}\). Strict order \(s>_\mathcal {A}t\) alone implies \(s>_{\text{ wpo }}t\). Otherwise \(s\ge _\mathcal {A}\) must hold and various subterm conditions are checked. In (2a), \(\ge _{\text{ wpo }}\) is the reflexive closure of \(>_{\text{ wpo }}\), while \(>_\mathcal {A}\) and \(\ge _\mathcal {A}\) are separately defined orders induced by \(\mathcal {A}\). In (2b/ii) the lexicographical extension \(>_{\text{ wpo }}^{lex}\) of \(>_{\text{ wpo }}\) to ntuples is used when the compared terms have the same head symbol.
If the WPO algebra \(\mathcal {A}\) is weakly monotone and weakly simple, then \(>_{\text{ wpo }}\) is a reduction order [13, Theorem 13]. With different algebras, WPO is known to behave like LPO [13, Theorem 19], or like KBO [13, Theorem 16], or to subsume both [13, Theorem 20]. Instantiations of WPO with different algebras are discussed in Sect. 4.
Orderings in Superposition Calculus
Saturation based automated theorem provers, like E Prover [9], attempt to prove a firstorder goal conjecture G in a theory T, that is, \(T \vdash G\). First, theory axioms with the negated conjecture \(T\cup \{\lnot G\}\) are translated to a logically equivalent set of clauses. Then, a saturation process is initiated, which selects an unprocessed clause C and computes all possible inferences of C with all the previously processed clauses. Clause C is then marked as processed and another unprocessed clause is selected. This process continues until an empty clause (contradiction) is derived, or there are no more unprocessed clauses (the set of processed clauses becomes saturated), or the prover runs out of resources.
The saturation process uses term orderings for various purposes depending on the selected inference rules. The classical resolution rule allows to infer the clause \((C_1\vee C_2)\sigma \) from clauses \((L_1\vee C_1)\) and \((\lnot L_2\vee C_2)\) provided \(L_1\) and \(L_2\) are unifiable with the unifier \(\sigma \). The ordered resolution restricts the classical resolution rule to literals maximal in each clause (w.r.t. a fixed term ordering \(>_\mathcal {T}\)). In paramodulation, inferred unit equality clauses of the form \(s=t\), which can be oriented using the ordering (either \(s>_\mathcal {T}t\) or \(t>_\mathcal {T}s\)), can be used as rewriting rules (\(s\rightarrow t\) or \(t\rightarrow s\), respectively). The processed clauses are then kept in their normal form with respect to the inferred rewriting rules (called demodulators). All these extensions restrict the number of possible inferences preserving completeness (that is, they do not prevent the inference of the empty clause). Clearly, the more terms are comparable, the more inferences are restricted, which leads to a more effective search space reduction.
E Prover implements LPO and KBO. The desired term ordering can be selected using a commandline option. E implements approximately ten signatureindependent methods to generate the precedence on the symbols. In this work, we shall consider the following.
 (arity/iarity).:

Symbols are sorted by arity or reverse arity. Symbols with higher arity are larger/smaller.
 (freq/ifreq).:

Symbols are sorted by the frequency of their occurrence in the input problem. Frequently occurring symbols are larger/smaller. In the case of the same frequency, symbols are sorted by arity.
 (ufirst).:

Same as arity but unary symbols are smaller. In the case of the same arity, symbols are sorted by frequency.
 (ufreq).:

Same as ifreq but unary symbols are always smaller.
KBO is additionally parametrized by a weight function \((w,w_0)\). E implements several ways of generating weights for a given problem. We shall consider the following. All of these set the variable weight \(w_0\) to 1 and only differ in w.
 (const).:

The weights of all the symbols are set to the constant 1.
 (arity/iarity).:

The weight of an nary function symbol is set to \(n+1\) (respectively to \(mn+1\), where m is the largest symbol arity).
 (prec/iprec).:

Given a symbol precedence <, the weight of symbol f is the number of symbols smaller/larger than f increased by 1.
 (fcount/ifcount).:

The weight of symbol f is the number of occurrences of f in the input problem (respectively m minus the number of occurrences, where m is the frequency of the most occurring symbol).
 (frank/ifrank).:

Sort all function symbols by frequency of occurrence (which induces a total quasiordering). The weight of a symbol is the rank of it’s equivalence class, with less frequent symbols getting lower/higher weights.
Additionally, E allows userdefined weights for all constant symbols, which override the weight assigned by the above weight generation schemes. Finally, E allows both a specific userdefined precedence and specific symbol weights. We do not, however, consider these specific settings as they depends on a signature. Our implementation of WPO in E Prover is described in the next Sect. 4.
Implementation of WPO in E Prover
This section describes our implementation of WPO in E Prover. We introduce two specific algebras from the literature [13]. Both algebras are weakly monotone and simple, and hence instantiate WPO to a reduction order. We discuss the implementation of algebra comparisons and provide several coefficient generation schemes for WPO. We conclude by a brief description of our main WPO comparison method. First we introduce \(\mathcal {S} um \)algebras which sum the arguments with a positive multiplier.
Definition 4.1
(\(\mathcal {S} um \)algebra) A \(\mathcal {S} um \)algebra \(\mathcal {A}\) over \(\Sigma \) induced by (w, c) is an algebra over \(\Sigma \) where an nary function symbol f is interpreted as
where \(w(f)>0\) is the weight of f and \(c(f,i)>0\) is the coefficient of the ith argument of f (called subterm coefficient).
Both the weights and subterm coefficients can be zero under certain additional conditions [13, Theorems 5 & 13]. All E weight generation schemes used in this work produce nonzero weights, and hence we consider only positive coefficients, mainly to simplify the implementation. Experimenting with nonzero values is left as future work. The carrier set of \(\mathcal {A}\) can be instantiated by a subset of \(\mathbb {N}\) (\(\{n\in \mathbb {N}:n\ge w_0\}\) for some \(w_0\in \mathbb {N}\)). Note, that a restriction of such a \(\mathcal {S} um \)algebra to \(w_0>0\) and \(c(f,i)=1\) is equivalent to KBO [13, Theorem 16].
Given a \(\mathcal {S} um \)algebra \(\mathcal {A}\) over \(\Sigma \), every term \(s\in \mathcal {T}_\Sigma \) can be interpreted in \(\mathcal {A}\) as an expression of the grammar “\(E::=\mathbb {N}\ \ \mathcal {V}\ \ (E+E)\ \ (\mathbb {N}*E) \)”. This expression contains variables \(vars (s)=\{x_1,\ldots ,x_n\}\). The expression can transformed to the equivalent expression \(s_\mathcal {A}\) of the following form, which we say interprets s in \(\mathcal {A}\) (for appropriate \(c_i\in \mathbb {N}\)).
Since the definitions of \(>_\mathcal {A}\) and \(\ge _\mathcal {A}\) involve an infinite number of variable assignments, it is necessary to provide an efficient algorithm to check the algebra comparisons in WPO. The following lemma helps us to achieve that. Note that, we take the liberty of reordering variables so that shared variables come first.
Lemma 4.1
Given \(\mathcal {S} um \)algebra \(\mathcal {A}\) over \(\Sigma \) and terms \(s,t\in \mathcal {T}_\Sigma \), let \(vars (t)\subseteq vars (s)=\{x_1,\ldots ,x_n\}\) and let \(vars (t)=\{x_1,\ldots ,x_m\}\) for some \(m\le n\). Let
be the interpretations of s and t in \(\mathcal {A}\). Then the following holds.
Clearly, \(s>_\mathcal {A}t\) (and also \(s\ge _\mathcal {A}t\)) implies \(vars (t)\subseteq vars (s)\), hence the variable requirement is not a limitation. WPO requires algebras to be weakly monotone to generate a reduction order. Similarly, the notion of strictly monotone algebras can be defined (using strict comparisons instead of weak ones). \(\mathcal {S} um \)algebras are strictly (and hence weakly) monotone. We next define the \(\mathcal {M} ax \)algebras, which use max instead of addition, making them weakly monotone.
Definition 4.2
(\(\mathcal {M} ax \)algebra) A \(\mathcal {M} ax \)algebra \(\mathcal {A}\) over \(\Sigma \) induced by (w, c) is an algebra over \(\Sigma \) where an nary function symbol f is interpreted as
where \(w(f)>0\) is the weight of f and \(c(f,i)>0\) is the coefficient of the ith argument of f (called subterm penalty).
Again, zero weights and penalties are allowed under certain conditions, which we omit in this presentation. For example, setting all the weights and penalty coefficients to zeros makes WPO behave like LPO [13, Theorem 19]. Similarly to \(\mathcal {S} um \)algebras, given a \(\mathcal {M} ax \)algebra \(\mathcal {A}\) over \(\Sigma \), every term \(s\in \mathcal {T}_\Sigma \) with \(vars (s)=\{x_1,\ldots ,x_n\}\) can be interpreted by an expression \(s_\mathcal {A}\) of the following form, which is said to interpret s in \(\mathcal {A}\).
The following allows efficiently comparing terms in \(\mathcal {M} ax \)algebras.
Lemma 4.2
Let \(\mathcal {M} ax \)algebra \(\mathcal {A}\) over \(\Sigma \) and terms \(s,t\in \mathcal {T}_\Sigma \) be given. Let \(vars (t)\subseteq vars (s)=\{x_1,\ldots ,x_n\}\) and \(vars (t)=\{x_1,\ldots ,x_m\}\) for some \(m\le n\). Let
interpret s and t in \(\mathcal {A}\). Let \(c_{\max }=\max (c_0,\ldots ,c_n)\) and \(d_{\max }=\max (d_0,\ldots ,d_m)\). Then the following holds.
Note that in \(s>_\mathcal {A}t\), as opposed to Lemma 4.1, we require all the coefficients to be strictly greater. Otherwise \(\max (x+2,y+1)\) would be strictly greater than \(\max (x+1,y+1)\). We do not compare the constant coefficients \(c_0\) and \(d_0\), because, for example, \(\max (1,x+3)\) is always greater than \(\max (2,x+2)\) even though the constant coefficients are not. The proof of Lemma 4.2 follows from the observation that \(c_0\) can be substituted by \(c_{\max }\) without affecting the value of \(s_\mathcal {A}\).
Inspired by precedence/weight generation schemes in E, we have implemented the following subterm coefficient generation schemes. These schemes generate coefficients c(f, i) to be used both in \(\mathcal {S} um \) and \(\mathcal {M} ax \)algebras.
 (constant).:

All coefficients are set to 1.
 (arity).:

For an nary function symbol f we set \(c(f,i)=n\).
 (firstmax).:

For all f, the first coefficient c(f, 1) is set 2 while the others to 1.
 (firstmin).:

For all f, the first coefficient c(f, 1) is set 1 while the others to 2.
 (asc/desc).:

Set up ascending/descending coefficients. For an nary function symbol f we set \(c(f,i)=i\) (respectively \(c(f,i)=ni+1\)).
To implement a new term ordering \(>_\mathcal {T}\) in E, a term comparison method is required. The method takes two terms s and t as input and returns whether \(s<_\mathcal {T}t\), or \(s>_\mathcal {T}t\), or \(s=t\), or the terms are incomparable. We have implemented the WPO comparison methods for \(\mathcal {S} um \) and \(\mathcal {M} ax \) algebras. Our implementation mostly follows Definition 2.4. At first we check strict algebra comparisons \(>_\mathcal {A}\). To do that, we compute coefficients \(\overline{c_i}\) and \(\overline{d_i}\) from Lemmas 4.1 or 4.2 by a traversal of s and t. If the coefficients are the same, we clearly have both \(s\ge _\mathcal {A}t\) and \(t\ge _\mathcal {A}s\). If \(s>_\mathcal {A}t\), we return \(s>_{\text{ wpo }}t\) (and vice versa). For terms incomparable with \(>_\mathcal {A}\), we proceed with the weak comparison \(\ge _\mathcal {A}\). If they are weakly comparable, we proceed with the subterm checks.
Term Rewriting with Relaxed Algebras
Algebras \(\mathcal {S} um \) and \(\mathcal {M} ax \) from Sect. 4 have nice theoretical properties, in particular, they instantiate WPO to a reduction order. In this section we try to address the question, whether forsaking some of these theoretical properties might give us an advancement in a practical use of a theorem prover. We shall introduce several relaxed algebras which might instantiate WPO to a nonterminating order, or even to a relation which is not an order at all. To avoid infinite loops when rewriting terms, we impose an upper bound on the length of every rewriting chain. Proof strategies with this modification of rewriting might not be complete, however, correctness is preserved. It is a known fact in theorem proving, that incomplete strategies can still be useful in practice. Moreover, any proof search can be made complete by switching to a complete strategy once incomplete strategies fail to find a proof.
All of our relaxed algebras, just like the standard complete algebras from Sect. 4, are induced by (w, c) where w is a symbol weight function and c is a coefficient function. We define four relaxed \(\mathcal {RS} um \)algebras and four relaxed \(\mathcal {RM} ax \)algebras. Each of these algebras assign a numeric weight to a term. In the case of the \(\mathcal {RS} um \)algebras, we denote the weight of term t by \(\mathcal {RS} um (t)\), and all of the four algebras use the following recursive formula to compute the weight of a nonvariable term.
The four relaxed \(\mathcal {RS} um \)algebras differ only in the value they assign to variables. Algebra \(\mathcal {RS} um _0\) simply assigns 0 to every variable, while in algebra \(\mathcal {RS} um _1\) we set \(\mathcal {RS} um _1(x)=1\) for every variable x. The remaining two algebras suppose that variables in a term are numbered (starting from 1) by their first occurrence in the term from left to right. Algebra \(\mathcal {RS} um _+\) assigns to each variable its number, while algebra \(\mathcal {RS} um _\) assigns the opposite number. For example, given a term f(g(x, y), x), we have \(\mathcal {RS} um _{+}(x)=1\) and \(\mathcal {RS} um _{+}(y)=2\), while \(\mathcal {RS} um _{}(x)=1\) and \(\mathcal {RS} um _{}(y)=2\).
Similarly, we define four relaxed \(\mathcal {RM} ax \)algebras. Again, each of them assigns a weight to each term t, denoted \(\mathcal {RM} ax (t)\). The following common formula is used to compute the weight of a nonvariable term.
The algebras differ in the value they assign to variables, and this gives us four \(\mathcal {RM} ax \) algebras: \(\mathcal {RM} ax _0\), \(\mathcal {RM} ax _1\), \(\mathcal {RM} ax _+\), and \(\mathcal {RM} ax _\). The variable weights are the same as in the case of the four \(\mathcal {RS} um \) algebras.
Our relaxed algebras can easily used with WPO from Definition 2.4. The terms are at first compared by their weights, and only in the case of equal weights, subterms conditions (2a) and (2b) are checked. As opposed to the standard complete algebras, every two terms are comparable in our relaxed algebras. Hence more terms are strictly comparable in our relaxed algebras, thus, the computationally expensive subterms checks should be executed less often. Hence our relaxed algebras can be expected to perform more effectively.
The relaxed algebras can be seen as an approximation of the complete algebras in the following way. With the complete algebras, terms are represented by expressions with variables, and the expressions are compared with respect to every possible variable assignment (see Sect. 2). In the relaxed algebras, we just evaluate the expressions with a single fixed variable assignment, for example, \(\sigma _0=\{x\mapsto 0:x\in \mathcal {V}\}\) in the case of \(\mathcal {RS} um _0\) or \(\mathcal {RM} ax _0\).
Experimental Evaluations
This section provides an evaluation of our experimental implementation of WPO in E Prover.^{Footnote 1} We use a single goodperforming E strategy with the different term orders. The strategy was randomly selected and is provided in “Appendix A”. Section 6.1 describes previously published [2] evaluation of standard WPO. Section 6.2 provides evaluation of WPO with relaxed algebras, first published here.
We evaluate our experimental implementation on four complementary benchmarks with around 200 problems each. Benchmark problems are from two TPTP [11] categories (LAT and REL), from the Abelian Inner Mappings project (AIM) [10], and from CoqHammer [1]. As we evaluate a large number of different ordering instances on all of the benchmark problems, it is important to limit the number of problems, so that the evaluation can be done in a reasonable time.^{Footnote 2} We, however, believe that our collection of about 800 benchmark problems is reasonably orthogonal to allow us to perform an objective evaluation. All the selected benchmark domains rely heavily on equational reasoning, and hence can be expected to benefit from improvements in term rewriting.
Evaluation of Standard WPO Implementation
We evaluate all instances of LPO, KBO, and WPO induced by the generation schemes described above, in order to estimate the value of WPO for E. Altogether we have 1410 instances to be evaluated on all the benchmark problems. The limit of 1000 processed clauses, instead of time limit, is used for an evaluation independent on implementation effectiveness. Section 6.2, however, contains evaluation conducted with a fixed CPU time limit per instance and problem.
We have 6 instances of LPO, 108 instances KBO, and 1296 of WPO. The results for each benchmark are in Table 1. For each ordering, the column “by” shows the least number of instances necessary to solve the number in the column solved. Number of problems solved by E’s automated term order selection is shown in column “Auto”. The “union” columns show a combined performance. Table 2 shows the bestperforming instance for every order type, measuring number problems solved and the number of problems solved additionally to Auto mode (column “E+”). The parameters of the instances select the generation schemes for precedence, weights, algebra, and coefficients.
WPO helped to solve more problems for each benchmark. It also solved problems unsolved by Auto. Furthermore, the strongest WPO is usually equal or better than the strongest version of LPO and KBO. LPO(arity) is often the best of LPOs. As for WPO, \(\mathcal {S} um \) often performs better than \(\mathcal {M} ax \) overall but \(\mathcal {M} ax \) can solve unique problems. The algebra coefficients generated by \(\mathbf{desc} \) often perform best.
As stated above, we used a limit on processed clauses rather than on runtime, in order to abstract from implementation details. In order to assess the effectiveness of our implementation, we have additionally evaluated the best performing ordering instances from Table 2 on the benchmark problems with runtime limit of 5 seconds. For each benchmark category (AIM, COQ, etc.) we have computed the average runtime on the problems solved by all the instances. The results vary on different categories but LPO is usually the fastest and KBO is in average from 10% The speed of WPO varies, but in average it is from 40% LPO. However, for example on TPTP/REL, our implementation of WPO is in average faster than both LPO and KPO. We conclude that our implementation can be definitely made more effective, but even in the current state, it can provide a valuable gain. Section 6.2 provides additional evaluation with fixed CPU time limit instead of an abstract time limit.
Evaluation of WPO with Relaxed Algebras
This section provides experimental evaluation of WPO with both standard and relaxed algebras. We evaluate all LPO, KBO, and WPO instances with fixed CPU time limit of 1 second per problem. In this way we shall be able to estimate whether there are some WPO instances which enrich standard E Prover implementation. As before, we have 6 LPO instances, 108 KBO instances, and 1296 standard WPO instances. Additionally, we introduce 5184 instances of relaxed WPO, generated by 8 relaxed algebras from Sect. 5. Altogether we have 6594 instances to be evaluated on all of the benchmark problems. Hence a relatively small time limit of 1 second per instance and problem was chosen in order to make this evaluation possible. This is, however, not a limitation as a reasonable correlation between results obtained with higher time limits can be expected.
The results for each benchmark are in Table 3. The table is as in the previous section, that is, the column “solved” shows the total number of problems solved by all the instances of LPO, KBO, WPO, and by WPO instances with relaxed algebras (denoted “RWPO”). Again, the column “by” shows the least number (more precisely, the size of a greedy coverage) of instances necessary to solve the number in the column “solved”. A full listing of instances in greedy coverage are presented in “Appendix B”, as these data might provide additional insight about useful ordering parameters. Number of problems solved by E’s automated term order selection (+tAuto+) is shown in column “Auto” as a reference. The “union” columns show a combined performance. Additionally, Table 4 shows the bestperforming instance for every ordering and benchmark, together with the number of problems solved.
We can see that WPO with relaxed algebras outperforms other ordering types. From the combined performance in the column “union” of Table 3 we can furthermore conclude that WPO with relaxed algebras can solve all the problems as other orderings (with the exception of one AIM problem) and more. With a relatively big number of possible WPO instances, it is, however, a question whether one can arrive at the right instantiations easily. This is further discussed in Sect. 7.
Conclusion
In this paper we proposed efficient implementations of algebras that allow integrating more powerful orderings in the superposition calculus. The resulting E strategies are more precise, resulting in complementary proofs on the various corpora and have a potential to benefit E Prover and superposition calculus ATPs in general. Furthermore, firsttime presented here, we proposed a relaxed version of WPO and experimentally evaluate its benefits, and thus also benefits of relaxed term orderings for ATPs in general.
We have experimentally evaluated our implementation of WPO with standard and relaxed algebras on a single goodperforming E Prover strategy. Stateoftheart theorem provers, however, are not based on a single strategy, but rather on a portfolio of complementary strategies. It is often the case, that even a large improvement of a single strategy from the portfolio has just a minimal effect on the overall portfolio performance. This is because the additionally solved problems are often already solved by another portfolio strategy. For example, we have shown that with the selected E strategy, there are problems solved only by WPO with relaxed algebras. Whether the same problems can be solved with another E strategy with LPO or KBO is not clear. We have experimentally tried to employ portfolio invention system BliStrTune [3] in order to invent two portfolios, one with and one without our WPO orderings (both standard and relaxed). So far we have been able to reach only a 1% Whether this behavior is caused by WPO, or by a wrong configuration or limitations of BliStrTune is left for further research.
As another future work, we would like to experiment with orderings that work modulo associativity and commutativity [14]. Additionally we would like to investigate other coefficient settings, and experiment with zero weights, as this might further reduce the number of derived clauses. We would also like to further optimize the efficiency of the algebra comparisons, as well as the computation of the ordering itself.
Notes
The evaluations took around 4 days employing 128 cores of Intel(R) Xeon(R) CPU E52698 v3 @ 2.30GHz with 128 GB memory in total.
References
Czajka, Ł., Kaliszyk, C.: Hammer for Coq: automation for dependent type theory. J. Autom. Reason. 61(1–4), 423–453 (2018)
Jakubuv, J., Kaliszyk, C.: Towards a unified ordering for superpositionbased automated reasoning. In: ICMS, Lecture Notes in Computer Science, vol. 10931, pp. 245–254. Springer, Berlin (2018)
Jakubuv, J., Urban, J.: Hierarchical invention of theorem proving strategies. AI Commun. 31(3), 237–250 (2018)
Kovács, L., Voronkov, A.: Firstorder theorem proving and Vampire. ComputerAided Verification (CAV 2013). LNCS, vol. 8044, pp. 1–35. Springer, Berlin (2013)
Löchner, B.: Things to know when implementing KBO. J. Autom. Reason. 36(4), 289–310 (2006)
Löchner, B.: Things to know when implementing LPO. Int. J. Artif. Intell. Tools 15(1), 53–80 (2006)
McCune, W.: Solution of the Robbins problem. J. Autom. Reason. 19(3), 263–276 (1997)
Middeldorp, A.: Term rewriting lecture notes. In: 9th International School on Rewriting (ISR 2017) (2017)
Schulz, S.: System description: E 1.8. Logic for programming. artificial intelligence (LPAR 2013), LNCS, vol. 8312, pp. 735–743. Springer, Berlin (2013)
Sutcliffe, G.: The 8th IJCAR automated theorem proving system competition: CASCJ8. AI Communications 29(5), 607–619 (2016)
Sutcliffe, G.: The TPTP problem library and associated infrastructure: from CNF to TH0, TPTP v6.4.0. J. Autom. Reason. 59(4), 483–502 (2017)
Yamada, A., Kusakari, K., Sakabe, T.: Unifying the KnuthBendix, recursive path and polynomial orders, pp. 181–192. PPDP, ACM (2013)
Yamada, A., Kusakari, K., Sakabe, T.: A unified ordering for termination proving. Sci. Comput. Program. 111, 110–134 (2015)
Yamada, A., Winkler, S., Hirokawa, N., Middeldorp, A.: ACKBO revisited. TPLP 16(2), 163–188 (2016)
Acknowledgements
Open access funding provided by University of Innsbruck and Medical University of Innsbruck.
Author information
Authors and Affiliations
Corresponding authors
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This work was kindly supported by the ERC Consolidator Grant No. 649043 AI4REASON, by ERC Starting Grant No. 714034 SMART, and by Cost Action CA15123 EUTypes.
Appendices
Appendix A: E Strategy Used for Experiments
The following is the E Prover strategy used in the experiments given as E Prover command line options. Additional command line options are introduced by the given term ordering selection. For example, KBO(ifreq,fcount,c1) adds
“tKBO6 Ginvfreq wfreqcount c1”.
 definitionalcnf=24  destructiveeraggressive –destructiveer
 preferinitialclauses F1  splitclauses=0
 forwardcontextsr WSelectComplexG  orientedsimulparamod
H’(1*ConjectureRelativeSymbolWeight(SimulateSOS,0.5,100,100,100,
100,1.5,1.5,1),4*ConjectureRelativeSymbolWeight(ConstPrio,0.1,100,
100,100,100,1.5,1.5,1.5),1*FIFOWeight(PreferProcessed),
1*ConjectureRelativeSymbolWeight(PreferNonGoals,0.5,100,100,100,100,
1.5,1.5,1),4*Refinedweight(SimulateSOS,3,2,2,1.5,2))’
Appendix B: Greedy Coverage
Table 5 gives a full list of order instances in greedy covers required to solve the number of problems listed in Table 3 for each benchmark. The first numeric column states how many new problems the corresponding instance adds to problems solved by the previous instances in the sequence (from top to down). The second column states how many problems the corresponding instance solves by itself. The following abbreviations are used: “fmin” stands for for “firstmin”, and “fmax” for “firstmax”. Other values might be abbreviated to a unique prefix (like “constant” to “con”) for space restrictions. Furthermore, \(\mathcal {S} um \) and \(\mathcal {M} ax \) are abbreviated to \(\mathcal {S}\) and \(\mathcal {M}\).
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Jakubův, J., Kaliszyk, C. Relaxed Weighted Path Order in Theorem Proving. Math.Comput.Sci. 14, 657–670 (2020). https://doi.org/10.1007/s11786020004740
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11786020004740
Keywords
 Automated theorem proving
 Firstorder logic
 Term orderings
 Term rewriting
 Superposition calculus
 Weighted path order
Mathematics Subject Classification
 03B10
 03B35
 03B70
 06A05
 06A06
 68T15
 6804