Relaxed Weighted Path Order in Theorem Proving

We propose an extension of the automated theorem prover E by the weighted path order (WPO). WPO is theoretically stronger than all the orderings used in E Prover; however, its parametrization is more involved than that of the orderings normally used in automated reasoning. In particular, it depends on a term algebra. We integrate the ordering in E Prover and perform an evaluation on standard theorem proving benchmarks. The ordering is complementary to the ones used in E Prover so far. Furthermore, presented here for the first time, we propose a relaxed variant of the weighted path order as an approximation of the standard WPO definition. A theorem prover strategy with a relaxed order can be incomplete, which is, however, not an issue, as completeness can easily be regained by switching to a complete strategy. We show that the relaxed weighted path order can substantially improve a theorem prover strategy.


Introduction
In the last two decades the superposition calculus has become one of the main foundations of automated theorem provers for first-order logic. Indeed the systems regularly winning the yearly CADE ATP Systems Competition, such as E Prover [9] and Vampire [4] are based on the superposition calculus. Also for the problems not previously solved by humans, superposition calculus based Prover9 has been most useful so far [7].
The use of powerful and efficient orderings is one of the major advantages of the superposition calculus for classical first-order theorem proving. Orderings allow provers to avoid redundant clauses, namely clauses which only differ in the order of literals, as well as permit orienting equations and therefore rewriting the clauses only in one direction. The three predominantly used orderings in automated theorem proving are LPO, KBO, and RPO. In fact, for the former two optimized implementations are known [5,6].
However, term rewriting research has shown that there exist more powerful orderings; for example, the weighted path order (WPO) [12] is one of the strongest known orderings. With carefully selected parameters it can subsume most known orderings, including LPO, KBO, and RPO [13]. There are, however, two reasons why such stronger orderings have not been tried for automated reasoning so far. First, they often rely on complicated parameters. For example, WPO relies on an algebra on terms as an argument. Second, the efficiency of KBO, LPO, or even RPO has been optimized for the most common cases, whereas the more advanced orderings have been stated in a general manner, without optimizing their efficiency.
This paper extends our previous research [2], where we attempt to overcome both of these obstacles and propose an efficient way to implement WPO as part of an automated reasoning system. We also propose parameters that allow WPO to function efficiently within a state-of-the-art automated theorem prover and help with actual theorem proving problems. After discussing the preliminaries on term orderings in Sect. 2 and on their use in the superposition calculus in Sect. 3, the particular contributions of this paper are:
• We propose algebras that can be used efficiently for first-order theorem proving (Sect. 4).
• Presented for the first time in this paper, we propose a relaxed version of WPO based on an approximation of the standard WPO definition (Sect. 5).
• We evaluate WPO against existing orderings in E Prover on parts of the TPTP library, the proofs stemming from the AIM conjecture [10], and on the CoqHammer proofs [1] (Sect. 6).
• We show that relaxed WPO can provide a huge benefit for a theorem proving strategy (Sect. 6.2).
• In addition to our previous research, we provide an evaluation of the efficiency of our implementation based on real CPU time limits.
This work is an extended version of our ICMS 2018 paper [2]. In comparison with that work, the relaxed WPO, the efficient implementation, and a more extensive evaluation, including an evaluation based on a CPU time limit, are the main new contributions presented for the first time in this paper.

Term Orderings and Rewriting
We work in first-order logic (FOL). A signature Σ is a collection of symbols with arities. The set of first-order variables is denoted V, and T stands for the terms over signature Σ and variables V. A literal is an atomic formula or its negation, and a clause is a disjunction of literals. In ATPs, clauses are used to describe both the input problem and the knowledge inferred during the search. On occasion, unit equality clauses of the form s = t are inferred. Such equalities can be used to simplify other clauses using s → t or t → s as a rewriting rule.
Rewriting systems, described by finite sets of rewriting rules, are often used inside ATPs to keep a set of clauses in normal form. A crucial property for ATPs is the termination of every rewriting chain on any term. The termination of a system R can be shown using a well-founded term ordering $>_T$ on terms T that orients every rule (s → t) ∈ R, meaning $s >_T t$. Well-founded term orderings that are additionally closed under contexts and substitutions are called reduction orders. See [8,13] for details.
Reduction orders are successfully used in many state-of-the-art ATPs. Common orders [8,13] are the lexicographic path order (LPO) and the Knuth-Bendix order (KBO). LPO extends a precedence > on symbols to a reduction order on T by a variety of subterm comparisons. KBO is generated by a precedence and symbol weights. Terms in KBO are first compared by weights, and the subterm comparisons are necessary only if the weights are equal. WPO further abstracts the idea of symbol weight comparisons to comparisons in algebras.
In this section, we recall the theoretical definitions of the orderings LPO and KBO used in E Prover, as well as the theoretical definition of WPO. We mostly follow [8] for LPO and KBO and [13] for WPO, and we refer the reader there for further details.
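All the orderings will be defined on first-order terms T, and rely on a precedence >, which needs to be a proper order on the symbols from the signature Σ.

Definition 2.1 (LPO [8]) Given a precedence > on symbols, we define the lexicographic path order (LPO) $>_{lpo}$ as follows: $s = f(s_1, \ldots, s_n) >_{lpo} t$ iff one of the following conditions holds:
1. $t = f(t_1, \ldots, t_n)$, $(s_1, \ldots, s_n) >^{lex}_{lpo} (t_1, \ldots, t_n)$, and $s >_{lpo} t_i$ for all $i$ such that $1 \le i \le n$, or
2. $t = g(t_1, \ldots, t_m)$, $f > g$, and $s >_{lpo} t_i$ for all $i$ such that $1 \le i \le m$, or
3. $s_i \geq_{lpo} t$ for some $i$ such that $1 \le i \le n$,
where $\geq_{lpo}$ is the reflexive closure of $>_{lpo}$ and $>^{lex}_{lpo}$ is the lexicographic extension of $>_{lpo}$ to tuples.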
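To make the recursive structure of LPO concrete, the following is a minimal Python sketch of the standard textbook comparison. The term encoding (variables as strings, applications as tuples), the function name `lpo_gt`, and the integer-valued precedence map are assumptions made for illustration only; this is not the optimized implementation used in E.

```python
# Terms: a variable is a str; an application is a tuple (symbol, arg1, ..., argn).
def is_var(t):
    return isinstance(t, str)

def lpo_gt(s, t, prec):
    """Return True iff s >_lpo t; prec maps symbols to ints (higher = greater)."""
    if is_var(s):
        return False  # a variable is never greater than another term
    f, s_args = s[0], s[1:]
    # Some argument of s is greater than or equal to t.
    if any(si == t or lpo_gt(si, t, prec) for si in s_args):
        return True
    if is_var(t):
        return False
    g, t_args = t[0], t[1:]
    # Both remaining cases require s to dominate every argument of t.
    if not all(lpo_gt(s, tj, prec) for tj in t_args):
        return False
    if prec[f] > prec[g]:
        return True  # head symbol of s is larger in the precedence
    if f == g:
        # Same head symbol: compare the argument tuples lexicographically.
        for si, ti in zip(s_args, t_args):
            if si != ti:
                return lpo_gt(si, ti, prec)
    return False
```

For instance, with a hypothetical precedence `plus > s`, the sketch orients the rule plus(s(x), y) → s(plus(x, y)) from left to right, and never orients it in the opposite direction.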
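In order to define KBO, we additionally need a weight function induced by a pair $(w, w_0)$, where w is a symbol weight function and $w_0$ is a constant variable weight. The constant $w_0$ must be greater than zero, and the mapping w from the signature Σ to the natural numbers must satisfy $w(c) \ge w_0$ for any constant c ∈ Σ. The weight function w on symbols from Σ is naturally extended to the weight function on terms T as follows.

\[ w(x) = w_0 \ \text{for } x \in V, \qquad w(f(t_1, \ldots, t_n)) = w(f) + \sum_{i=1}^{n} w(t_i) \]

Additionally, if a unary function f has weight 0, then f is the greatest element w.r.t. the precedence. In the following, $|s|_x$ denotes the number of occurrences of variable x ∈ V in term s ∈ T. The Knuth-Bendix order (KBO) $>_{kbo}$ is then defined in the standard way [8]: $s >_{kbo} t$ iff $|s|_x \ge |t|_x$ for all x ∈ V and either
1. $w(s) > w(t)$, or
2. $w(s) = w(t)$ and one of the following conditions holds:
(a) $t = x$ and $s = f^k(x)$ for some variable x, unary symbol f, and $k > 0$, or
(b) $s = f(s_1, \ldots, s_n)$, $t = g(t_1, \ldots, t_m)$, and $f > g$, or
(c) $s = f(s_1, \ldots, s_n)$, $t = f(t_1, \ldots, t_n)$, and $(s_1, \ldots, s_n) >^{lex}_{kbo} (t_1, \ldots, t_n)$.

Weighted path order (WPO) further abstracts the weight function to the notion of algebras on first-order terms, defined as follows. An algebra A over Σ consists of a carrier set together with an interpretation $f_A$ of every n-ary symbol f ∈ Σ as an n-ary function on the carrier set. In this work, we consider the carrier set always to be N with the standard order on N. Given a variable assignment σ : V → N, we can structurally interpret every term t ∈ T using interpretations from algebra A as the number $\sigma_A(t) \in N$, formally as follows.

\[ \sigma_A(x) = \sigma(x) \ \text{for } x \in V, \qquad \sigma_A(f(t_1, \ldots, t_n)) = f_A(\sigma_A(t_1), \ldots, \sigma_A(t_n)) \]

We write $s >_A t$ (respectively $s \geq_A t$) iff $\sigma_A(s) > \sigma_A(t)$ (respectively $\sigma_A(s) \ge \sigma_A(t)$) holds for every variable assignment σ.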
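The weight function on terms, the occurrence counts $|s|_x$, and the structural interpretation of a term in an algebra are all simple recursions over the term structure. The following Python sketch illustrates them; the term encoding (variables as strings, applications as tuples) and all function names are assumptions chosen for illustration.

```python
from collections import Counter

def weight(t, w, w0):
    """Weight of term t: w0 for a variable, w(f) plus the argument weights otherwise."""
    if isinstance(t, str):
        return w0
    return w[t[0]] + sum(weight(a, w, w0) for a in t[1:])

def var_counts(t):
    """Counter mapping each variable x to its number of occurrences in t."""
    if isinstance(t, str):
        return Counter([t])
    total = Counter()
    for a in t[1:]:
        total += var_counts(a)
    return total

def interpret(t, interp, sigma):
    """Value of t under assignment sigma; interp[f] is the function interpreting f."""
    if isinstance(t, str):
        return sigma[t]
    return interp[t[0]](*(interpret(a, interp, sigma) for a in t[1:]))
```

The variable condition of KBO then amounts to checking that `var_counts(s)[x] >= var_counts(t)[x]` for every variable x of t, while the comparisons in an algebra quantify over all assignments `sigma` and therefore need a symbolic treatment rather than repeated calls to `interpret`.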

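Definition 2.4 (WPO) Given an algebra A and a precedence >, the weighted path order (WPO) $>_{wpo}$ is defined as follows: $s = f(s_1, \ldots, s_n) >_{wpo} t$ iff
1. $s >_A t$, or
2. $s \geq_A t$ and
(a) $s_i \geq_{wpo} t$ for some $i$ such that $1 \le i \le n$, or
(b) $t = g(t_1, \ldots, t_m)$, $s >_{wpo} t_j$ for all $j$ such that $1 \le j \le m$, and either
(i) $f > g$, or
(ii) $f = g$ and $(s_1, \ldots, s_n) >^{lex}_{wpo} (t_1, \ldots, t_m)$.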
Only terms comparable in A are comparable in $>_{wpo}$. The strict order $s >_A t$ alone implies $s >_{wpo} t$. Otherwise $s \geq_A t$ must hold and various subterm conditions are checked. In (2a), $\geq_{wpo}$ is the reflexive closure of $>_{wpo}$, while $>_A$ and $\geq_A$ are separately defined orders induced by A. In (2b/ii), the lexicographic extension $>^{lex}_{wpo}$ of $>_{wpo}$ to n-tuples is used when the compared terms have the same head symbol.
If the WPO algebra A is weakly monotone and weakly simple, then > wpo is a reduction order [

Orderings in Superposition Calculus
Saturation-based automated theorem provers, like E Prover [9], attempt to prove a first-order goal conjecture G in a theory T, that is, T ⊢ G. First, theory axioms with the negated conjecture T ∪ {¬G} are translated to a logically equivalent set of clauses. Then, a saturation process is initiated, which selects an unprocessed clause C and computes all possible inferences of C with all the previously processed clauses. Clause C is then marked as processed and another unprocessed clause is selected. This process continues until the empty clause (contradiction) is derived, or there are no more unprocessed clauses (the set of processed clauses becomes saturated), or the prover runs out of resources.
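The saturation process described above can be illustrated by a small Python sketch. It is a deliberately simplified model and not E's actual loop: clauses are propositional (sets of nonzero integers, with negation as sign change), plain resolution stands in for the superposition inferences, and the clause selection heuristic (smallest clause first) as well as all names are assumptions made for illustration.

```python
def resolvents(c1, c2):
    """All propositional resolvents of two clauses (frozensets of nonzero ints)."""
    out = set()
    for lit in c1:
        if -lit in c2:
            res = (c1 - {lit}) | (c2 - {-lit})
            # Skip tautologies, which are redundant.
            if not any(-l in res for l in res):
                out.add(frozenset(res))
    return out

def saturate(clauses, max_processed=1000):
    """Return 'proof' if the empty clause is derived, 'saturated' if no
    unprocessed clause remains, and 'resource-out' if the limit is reached."""
    unprocessed = [frozenset(c) for c in clauses]
    processed = []
    seen = set(unprocessed)
    while unprocessed:
        if len(processed) >= max_processed:
            return "resource-out"
        # Clause selection (an assumed heuristic): smallest clause first.
        unprocessed.sort(key=len)
        given = unprocessed.pop(0)
        if not given:
            return "proof"
        # Compute all inferences of the given clause with processed clauses.
        for other in processed:
            for new in resolvents(given, other):
                if new not in seen:
                    seen.add(new)
                    unprocessed.append(new)
        processed.append(given)
    return "saturated"
```

The `max_processed` parameter plays the role of a resource limit expressed in processed clauses instead of time.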
The saturation process uses term orderings for various purposes depending on the selected inference rules. The classical resolution rule allows inferring the clause $(C_1 \lor C_2)\sigma$ from clauses $(L_1 \lor C_1)$ and $(\neg L_2 \lor C_2)$, provided $L_1$ and $L_2$ are unifiable with the unifier σ. Ordered resolution restricts the classical resolution rule to literals maximal in each clause (w.r.t. a fixed term ordering $>_T$). In paramodulation, inferred unit equality clauses of the form s = t that can be oriented using the ordering (either $s >_T t$ or $t >_T s$) can be used as rewriting rules (s → t or t → s, respectively). The processed clauses are then kept in their normal form with respect to the inferred rewriting rules (called demodulators). All these extensions restrict the number of possible inferences while preserving completeness (that is, they do not prevent the inference of the empty clause). Clearly, the more terms are comparable, the more inferences are restricted, which leads to a more effective search space reduction.
E Prover implements LPO and KBO. The desired term ordering can be selected using a command-line option. E implements approximately ten signature-independent methods to generate the precedence on the symbols. In this work, we shall consider the following.
(arity/iarity). Symbols are sorted by arity or reverse arity. Symbols with higher arity are larger/smaller.
(freq/ifreq). Symbols are sorted by the frequency of their occurrence in the input problem. Frequently occurring symbols are larger/smaller. In the case of the same frequency, symbols are sorted by arity.
(ufirst). Same as arity but unary symbols are smaller. In the case of the same arity, symbols are sorted by frequency.
(ufreq). Same as ifreq but unary symbols are always smaller.
KBO is additionally parametrized by a weight function $(w, w_0)$. E implements several ways of generating weights for a given problem. We shall consider the following. All of these set the variable weight $w_0$ to 1 and only differ in w.
(const). The weights of all the symbols are set to the constant 1.
(arity/iarity). The weight of an n-ary function symbol is set to n + 1 (respectively to m − n + 1, where m is the largest symbol arity).
(prec/iprec). Given a symbol precedence <, the weight of symbol f is the number of symbols smaller/larger than f increased by 1.
(fcount/ifcount). The weight of symbol f is the number of occurrences of f in the input problem (respectively m minus the number of occurrences, where m is the frequency of the most occurring symbol).
(frank/ifrank). Sort all function symbols by frequency of occurrence (which induces a total quasi-ordering). The weight of a symbol is the rank of its equivalence class, with less frequent symbols getting lower/higher weights.
Additionally, E allows user-defined weights for all constant symbols, which override the weight assigned by the above weight generation schemes. Finally, E allows both a specific user-defined precedence and specific symbol weights. We do not, however, consider these specific settings as they depend on a signature. Our implementation of WPO in E Prover is described in Sect. 4.
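Several of the weight generation schemes listed above are straightforward to state as code. The following Python sketch covers const, arity/iarity, prec/iprec, and fcount as they are described in the text; it is an illustration only and does not reproduce E's code. The function name, the argument layout, and the scheme strings are assumptions, and the ifcount and frank/ifrank schemes are left out for brevity.

```python
from collections import Counter

def generate_weights(scheme, arities, occurrences, precedence):
    """Return a dict symbol -> weight for one of the described schemes.

    arities:     dict symbol -> arity
    occurrences: list of symbols, one entry per occurrence in the input problem
    precedence:  list of all symbols ordered from smallest to largest
    """
    max_arity = max(arities.values())
    freq = Counter(occurrences)
    rank = {f: i for i, f in enumerate(precedence)}
    n = len(precedence)
    if scheme == "const":
        return {f: 1 for f in arities}
    if scheme == "arity":
        return {f: a + 1 for f, a in arities.items()}
    if scheme == "iarity":
        return {f: max_arity - a + 1 for f, a in arities.items()}
    if scheme == "prec":    # number of smaller symbols, increased by 1
        return {f: rank[f] + 1 for f in arities}
    if scheme == "iprec":   # number of larger symbols, increased by 1
        return {f: n - rank[f] for f in arities}
    if scheme == "fcount":  # number of occurrences in the input problem
        return {f: freq[f] for f in arities}
    raise ValueError(f"unknown scheme: {scheme}")
```

In all of these schemes the variable weight $w_0$ stays fixed at 1, so only the symbol weights are returned.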

Implementation of WPO in E Prover
This section describes our implementation of WPO in E Prover. We introduce two specific algebras from the literature [13]. Both algebras are weakly monotone and simple, and hence instantiate WPO to a reduction order. We discuss the implementation of algebra comparisons and provide several coefficient generation schemes for WPO. We conclude with a brief description of our main WPO comparison method. First we introduce Sum-algebras, which sum the arguments with a positive multiplier.

Definition 4.1 (Sum-algebra) A Sum-algebra A over Σ induced by (w, c) is an algebra over Σ where an n-ary function symbol f is interpreted as
\[ f_A(a_1, \ldots, a_n) = w(f) + \sum_{i=1}^{n} c(f, i) \cdot a_i \]
where w(f) > 0 is the weight of f and c(f, i) > 0 is the coefficient of the i-th argument of f (called subterm coefficient).
Both the weights and subterm coefficients can be zero under certain additional conditions [13, Theorems 5 & 13]. All E weight generation schemes used in this work produce non-zero weights, and hence we consider only positive coefficients, mainly to simplify the implementation. Experimenting with zero values is left as future work. The carrier set of A can be instantiated by a subset of N ($\{n \in N : n \ge w_0\}$ for some $w_0 \in N$). Note that a restriction of such a Sum-algebra to $w_0 > 0$ and c(f, i) = 1 is equivalent to KBO [13, Theorem 16].
Given a Sum-algebra A over Σ, every term s ∈ T can be interpreted in A as an arithmetic expression built from natural numbers, variables, addition, and multiplication by a natural number. This expression contains the variables vars(s) = $\{x_1, \ldots, x_n\}$. The expression can be transformed to the equivalent expression $s_A$ of the following form, which we say interprets s in A (for appropriate $c_i \in N$).
\[ s_A = c_0 + c_1 x_1 + \cdots + c_n x_n \]
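The interpretation of a term in a Sum-algebra as a linear expression can be computed by a single traversal of the term. The Python sketch below assumes that an n-ary symbol f is interpreted as w(f) plus the sum of its arguments multiplied by the coefficients c(f, i); the term encoding, the function name, and the dictionary layout of w and c are illustrative assumptions.

```python
from collections import Counter

def sum_interpret(t, w, c):
    """Interpret term t in a Sum-algebra as (c0, coeffs), which stands for
    the linear expression c0 + sum(coeffs[x] * x over the variables x)."""
    if isinstance(t, str):
        return 0, Counter({t: 1})          # a variable x is the expression 1 * x
    f, args = t[0], t[1:]
    const, coeffs = w[f], Counter()
    for i, arg in enumerate(args, start=1):
        a0, acoeffs = sum_interpret(arg, w, c)
        k = c[(f, i)]                      # coefficient of the i-th argument of f
        const += k * a0
        for x, v in acoeffs.items():
            coeffs[x] += k * v
    return const, coeffs
```

For example, with w(f) = 2, w(g) = 1, c(f, 1) = 2, c(g, 2) = 3, and the remaining coefficients equal to 1, the term f(g(x, y), x) is interpreted as 4 + 3x + 6y.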
Since the definitions of $>_A$ and $\geq_A$ involve an infinite number of variable assignments, it is necessary to provide an efficient algorithm to check the algebra comparisons in WPO. The following lemma helps us to achieve that. Note that we take the liberty of reordering variables so that shared variables come first.
be the interpretations of s and t in A. Then the following holds.
Clearly, $s >_A t$ (and also $s \geq_A t$) implies vars(t) ⊆ vars(s), hence the variable requirement is not a limitation. WPO requires algebras to be weakly monotone to generate a reduction order. Similarly, the notion of strictly monotone algebras can be defined (using strict comparisons instead of weak ones). Sum-algebras are strictly (and hence weakly) monotone. We next define the Max-algebras, which use max instead of addition, making them weakly (but not strictly) monotone.

Definition 4.2 (Max-algebra) A Max-algebra A over Σ induced by (w, c) is an algebra over Σ where an n-ary function symbol f is interpreted as
\[ f_A(a_1, \ldots, a_n) = \max(w(f), a_1 + c(f, 1), \ldots, a_n + c(f, n)) \]
where w(f) > 0 is the weight of f and c(f, i) > 0 is the coefficient of the i-th argument of f (called subterm penalty).
Again, zero weights and penalties are allowed under certain conditions, which we omit in this presentation. For example, setting all the weights and penalty coefficients to zero makes WPO behave like LPO [13, Theorem 19]. Similarly to Sum-algebras, given a Max-algebra A over Σ, every term s ∈ T with vars(s) = $\{x_1, \ldots, x_n\}$ can be interpreted by an expression $s_A$ of the following form, which is said to interpret s in A.
\[ s_A = \max(c_0, x_1 + c_1, \ldots, x_n + c_n) \]
The following allows efficiently comparing terms in Max-algebras.
Note that in $s >_A t$, as opposed to Lemma 4.1, we require all the coefficients to be strictly greater. Otherwise max(x + 2, y + 1) would be strictly greater than max(x + 1, y + 1). We do not compare the constant coefficients $c_0$ and $d_0$, because, for example, max(1, x + 3) is always greater than max(2, x + 2) even though the constant coefficients are not. The proof of Lemma 4.2 follows from the observation that $c_0$ can be substituted by $c_{max}$ without affecting the value of $s_A$.
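The two examples above can be checked mechanically. The following Python sketch computes an expression of the form max(c0, x1 + c1, ...) for a term and compares two such expressions over all assignments. It rests on stated assumptions and is not claimed to be the exact formulation of Lemma 4.2: an n-ary symbol f is assumed to be interpreted as the maximum of w(f) and the arguments increased by their penalties c(f, i), variables are assumed to range over the naturals n ≥ w0, and a missing constant part is encoded as negative infinity. All names are hypothetical.

```python
NEG_INF = float("-inf")

def max_interpret(t, w, c):
    """Interpret t as (c0, pen), standing for max(c0, x + pen[x] for each x)."""
    if isinstance(t, str):
        return NEG_INF, {t: 0}             # a variable has no constant part
    f, args = t[0], t[1:]
    const, pen = w[f], {}
    for i, arg in enumerate(args, start=1):
        a0, apen = max_interpret(arg, w, c)
        k = c[(f, i)]                      # penalty of the i-th argument of f
        const = max(const, a0 + k)
        for x, p in apen.items():
            pen[x] = max(pen.get(x, p + k), p + k)
    return const, pen

def max_greater(s_int, t_int, w0=0, strict=True):
    """Check s >_A t (strict) or s >=_A t for all assignments with values >= w0."""
    (c0, cp), (d0, dp) = s_int, t_int
    ok = (lambda a, b: a > b) if strict else (lambda a, b: a >= b)
    # Every variable of t must occur in s with a (strictly) greater penalty.
    if not all(x in cp and ok(cp[x], dp[x]) for x in dp):
        return False
    # Least value s can take: every variable set to the least element w0.
    s_min = max([c0] + [w0 + p for p in cp.values()])
    return ok(s_min, d0)
```

With this encoding, max(1, x + 3) is reported as strictly greater than max(2, x + 2), whereas max(x + 2, y + 1) is only weakly greater than max(x + 1, y + 1), in agreement with the discussion above.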
Inspired by precedence/weight generation schemes in E, we have implemented the following subterm coefficient generation schemes. These schemes generate coefficients c(f, i) to be used both in Sum- and Max-algebras.
To implement a new term ordering $>_T$ in E, a term comparison method is required. The method takes two terms s and t as input and returns whether $s <_T t$, or $s >_T t$, or s = t, or the terms are incomparable. We have implemented the WPO comparison methods for Sum- and Max-algebras. Our implementation mostly follows Definition 2.4. At first we check strict algebra comparisons $>_A$. To do that, we compute coefficients $c_i$ and $d_i$ from Lemmas 4.1 or 4.2 by a traversal of s and t. If the coefficients are the same, we clearly have both $s \geq_A t$ and $t \geq_A s$. If $s >_A t$, we return $s >_{wpo} t$ (and vice versa). For terms incomparable with $>_A$, we proceed with the weak comparison $\geq_A$. If they are weakly comparable, we proceed with the subterm checks.
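The comparison method described above can be sketched in Python for the Sum-algebra case. This is a simplified one-directional check (is s greater than t), not the four-valued method implemented in E. It assumes the usual case structure of WPO (strict algebra comparison first, then the weak comparison followed by subterm and precedence checks), variables ranging over the naturals n ≥ w0, and a linear-expression comparison derived directly from that assumption. All names and the term encoding are hypothetical.

```python
from collections import Counter

def interpret(t, w, c):
    """Sum-algebra value of t as (c0, coeffs): c0 + sum(coeffs[x] * x)."""
    if isinstance(t, str):
        return 0, Counter({t: 1})
    const, coeffs = w[t[0]], Counter()
    for i, arg in enumerate(t[1:], start=1):
        a0, acoeffs = interpret(arg, w, c)
        const += c[(t[0], i)] * a0
        for x, v in acoeffs.items():
            coeffs[x] += c[(t[0], i)] * v
    return const, coeffs

def alg_cmp(s, t, w, c, w0):
    """Return '>' if s >_A t, '>=' if only s >=_A t holds, and None otherwise."""
    (c0, cs), (d0, ds) = interpret(s, w, c), interpret(t, w, c)
    if any(cs[x] < ds[x] for x in ds):
        return None
    # With all coefficient differences non-negative, the minimum of
    # s_A - t_A is attained when every variable takes the least value w0.
    low = (c0 - d0) + w0 * (sum(cs.values()) - sum(ds.values()))
    return ">" if low > 0 else (">=" if low == 0 else None)

def wpo_gt(s, t, w, c, prec, w0=1):
    """Return True iff s >_wpo t for the Sum-algebra induced by (w, c)."""
    if isinstance(s, str):
        return False
    rel = alg_cmp(s, t, w, c, w0)
    if rel == ">":
        return True                       # strict algebra comparison suffices
    if rel is None:
        return False                      # not even weakly comparable in A
    if any(si == t or wpo_gt(si, t, w, c, prec, w0) for si in s[1:]):
        return True                       # some argument of s dominates t
    if isinstance(t, str):
        return False
    if not all(wpo_gt(s, tj, w, c, prec, w0) for tj in t[1:]):
        return False                      # s must dominate every argument of t
    if prec[s[0]] > prec[t[0]]:
        return True                       # precedence comparison of head symbols
    if s[0] == t[0]:                      # lexicographic comparison of arguments
        for si, ti in zip(s[1:], t[1:]):
            if si != ti:
                return wpo_gt(si, ti, w, c, prec, w0)
    return False
```

With all weights and coefficients equal to 1, the sketch behaves like KBO: f(g(x)) and g(f(x)) have the same interpretation and are separated only by the precedence. Raising c(f, 1) to 2 makes f(x) strictly greater than g(x) in the algebra, regardless of the precedence.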

Term Rewriting with Relaxed Algebras
Algebras Sum and Max from Sect. 4 have nice theoretical properties; in particular, they instantiate WPO to a reduction order. In this section we address the question whether forsaking some of these theoretical properties might yield an improvement in the practical use of a theorem prover. We shall introduce several relaxed algebras which might instantiate WPO to a non-terminating order, or even to a relation which is not an order at all. To avoid infinite loops when rewriting terms, we impose an upper bound on the length of every rewriting chain. Proof strategies with this modification of rewriting might not be complete; however, correctness is preserved. It is a known fact in theorem proving that incomplete strategies can still be useful in practice. Moreover, any proof search can be made complete by switching to a complete strategy once incomplete strategies fail to find a proof.
All of our relaxed algebras, just like the standard complete algebras from Sect. 4, are induced by (w, c), where w is a symbol weight function and c is a coefficient function. We define four relaxed RSum-algebras and four relaxed RMax-algebras. Each of these algebras assigns a numeric weight to a term. In the case of the RSum-algebras, we denote the weight of term t by RSum(t), and all four algebras use the following recursive formula to compute the weight of a non-variable term.
\[ RSum(f(t_1, \ldots, t_n)) = w(f) + \sum_{i=1}^{n} c(f, i) \cdot RSum(t_i) \]
The four relaxed RSum-algebras differ only in the value they assign to variables. Algebra $RSum_0$ simply assigns 0 to every variable, while in algebra $RSum_1$ we set $RSum_1(x) = 1$ for every variable x. The remaining two algebras suppose that variables in a term are numbered (starting from 1) by their first occurrence in the term from left to right. Algebra $RSum_+$ assigns to each variable its number, while algebra $RSum_-$ assigns the opposite number. For example, given a term f(g(x, y), x), we have $RSum_+(x) = 1$ and $RSum_+(y) = 2$, while $RSum_-(x) = -1$ and $RSum_-(y) = -2$.
Similarly, we define four relaxed RMax-algebras. Again, each of them assigns a weight to each term t, denoted RMax(t). The following common formula is used to compute the weight of a non-variable term.
\[ RMax(f(t_1, \ldots, t_n)) = \max(w(f), RMax(t_1) + c(f, 1), \ldots, RMax(t_n) + c(f, n)) \]
The algebras differ in the value they assign to variables, and this gives us four RMax-algebras: $RMax_0$, $RMax_1$, $RMax_+$, and $RMax_-$. The variable weights are the same as in the case of the four RSum-algebras. Our relaxed algebras can easily be used with WPO from Definition 2.4. The terms are at first compared by their weights, and only in the case of equal weights are the subterm conditions (2a) and (2b) checked. As opposed to the standard complete algebras, every two terms are comparable in our relaxed algebras. Hence more terms are strictly comparable, and the computationally expensive subterm checks should be executed less often. Our relaxed algebras can therefore be expected to perform more efficiently.
The relaxed algebras can be seen as an approximation of the complete algebras in the following way. With the complete algebras, terms are represented by expressions with variables, and the expressions are compared with respect to every possible variable assignment (see Sect. 2). In the relaxed algebras, we just evaluate the expressions with a single fixed variable assignment, for example, σ 0 = {x → 0 : x ∈ V} in the case of RSum 0 or RMax 0 .

Experimental Evaluations
This section provides an evaluation of our experimental implementation of WPO in E Prover. We use a single good-performing E strategy with the different term orders. The strategy was randomly selected and is provided in "Appendix A". Section 6.1 describes the previously published [2] evaluation of standard WPO. Section 6.2 provides an evaluation of WPO with relaxed algebras, first published here.
The bold values specify the best values for each domain (row). The values in italics specify reference values.
We evaluate our experimental implementation on four complementary benchmarks with around 200 problems each. Benchmark problems are from two TPTP [11] categories (LAT and REL), from the Abelian Inner Mappings project (AIM) [10], and from CoqHammer [1]. As we evaluate a large number of different ordering instances on all of the benchmark problems, it is important to limit the number of problems, so that the evaluation can be done in a reasonable time. We, however, believe that our collection of about 800 benchmark problems is reasonably orthogonal to allow us to perform an objective evaluation. All the selected benchmark domains rely heavily on equational reasoning, and hence can be expected to benefit from improvements in term rewriting.

Evaluation of Standard WPO Implementation
We evaluate all instances of LPO, KBO, and WPO induced by the generation schemes described above, in order to estimate the value of WPO for E. Altogether we have 1410 instances to be evaluated on all the benchmark problems. A limit of 1000 processed clauses, instead of a time limit, is used to make the evaluation independent of implementation efficiency. Section 6.2, however, contains an evaluation conducted with a fixed CPU time limit per instance and problem.
We have 6 instances of LPO, 108 instances of KBO, and 1296 instances of WPO. The results for each benchmark are in Table 1. For each ordering, the column "by" shows the least number of instances necessary to solve the number of problems in the column "solved". The number of problems solved by E's automated term order selection is shown in column "Auto". The "union" columns show a combined performance. Table 2 shows the best-performing instance for every order type, measuring the number of problems solved and the number of problems solved in addition to Auto mode (column "E+"). The parameters of the instances select the generation schemes for precedence, weights, algebra, and coefficients.
WPO helped to solve more problems for each benchmark. It also solved problems unsolved by Auto. Furthermore, the strongest WPO is usually equal to or better than the strongest version of LPO and KBO. LPO(arity) is often the best of LPOs. As for WPO, Sum often performs better than Max overall, but Max can solve unique problems. The algebra coefficients generated by desc often perform best.
As stated above, we used a limit on processed clauses rather than on runtime, in order to abstract from implementation details. In order to assess the efficiency of our implementation, we have additionally evaluated the best-performing ordering instances from Table 2 on the benchmark problems with a runtime limit of 5 seconds. For each benchmark category (AIM, COQ, etc.) we have computed the average runtime on the problems solved by all the instances. The results vary across categories, but LPO is usually the fastest and KBO is on average from 10% The speed of WPO varies, but on average it is from 40% LPO. However, for example on TPTP/REL, our implementation of WPO is on average faster than both LPO and KBO. We conclude that our implementation can definitely be made more efficient, but even in its current state, it can provide a valuable gain. Section 6.2 provides an additional evaluation with a fixed CPU time limit instead of an abstract time limit.

Evaluation of WPO with Relaxed Algebras
This section provides an experimental evaluation of WPO with both standard and relaxed algebras. We evaluate all LPO, KBO, and WPO instances with a fixed CPU time limit of 1 second per problem. In this way we shall be able to estimate whether there are some WPO instances which enrich the standard E Prover implementation. As before, we have 6 LPO instances, 108 KBO instances, and 1296 standard WPO instances. Additionally, we introduce 5184 instances of relaxed WPO, generated by the 8 relaxed algebras from Sect. 5. Altogether we have 6594 instances to be evaluated on all of the benchmark problems. Hence a relatively small time limit of 1 second per instance and problem was chosen in order to make this evaluation possible. This is, however, not a limitation, as a reasonable correlation with results obtained with higher time limits can be expected.
The results for each benchmark are in Table 3. The table is as in the previous section, that is, the column "solved" shows the total number of problems solved by all the instances of LPO, KBO, WPO, and by WPO instances with relaxed algebras (denoted "R-WPO"). Again, the column "by" shows the least number (more precisely, the size of a greedy coverage) of instances necessary to solve the number of problems in the column "solved". A full listing of the instances in the greedy coverage is presented in "Appendix B", as these data might provide additional insight about useful ordering parameters. The number of problems solved by E's automated term order selection (-tAuto) is shown in column "Auto" as a reference. The "union" columns show a combined performance. Additionally, Table 4 shows the best-performing instance for every ordering and benchmark, together with the number of problems solved.
The bold values specify the best values for each domain (row). The values in italics specify reference values.
We can see that WPO with relaxed algebras outperforms other ordering types. From the combined performance in the column "union" of Table 3 we can furthermore conclude that WPO with relaxed algebras can solve all the problems solved by the other orderings (with the exception of one AIM problem) and more. With a relatively big number of possible WPO instances, it is, however, a question whether one can arrive at the right instantiations easily. This is further discussed in Sect. 7.
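The greedy coverage mentioned above can be computed as in the following Python sketch. The function name, the input layout (a dictionary from instance names to sets of solved problems), and the tie-breaking by name are assumptions made for illustration; the sketch is not claimed to be the exact procedure used to produce the tables.

```python
def greedy_cover(solved_by):
    """solved_by: dict instance name -> set of solved problems.
    Returns a list of (instance, newly_added, solved_alone) in greedy order."""
    covered, cover = set(), []
    remaining = dict(solved_by)
    while remaining:
        # Pick the instance adding the most new problems; ties broken by name.
        best = max(sorted(remaining), key=lambda n: len(remaining[n] - covered))
        gain = remaining[best] - covered
        if not gain:
            break                          # nothing new can be added any more
        cover.append((best, len(gain), len(remaining[best])))
        covered |= gain
        del remaining[best]
    return cover
```

The length of the returned list corresponds to a "by" value, and the sum of the second components to the matching "solved" value. A greedy cover is not guaranteed to be a minimum cover, which is why the text speaks of the size of a greedy coverage.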

Conclusion
In this paper we proposed efficient implementations of algebras that allow integrating more powerful orderings in the superposition calculus. The resulting E strategies are more precise, resulting in complementary proofs on the various corpora, and have the potential to benefit E Prover and superposition calculus ATPs in general. Furthermore, presented here for the first time, we proposed a relaxed version of WPO and experimentally evaluated its benefits, and thus also the benefits of relaxed term orderings for ATPs in general.
We have experimentally evaluated our implementation of WPO with standard and relaxed algebras on a single good-performing E Prover strategy. State-of-the-art theorem provers, however, are not based on a single strategy, but rather on a portfolio of complementary strategies. It is often the case, that even a large improvement of a single strategy from the portfolio has just a minimal effect on the overall portfolio performance. This is because the additionally solved problems are often already solved by another portfolio strategy. For example, we have shown that with the selected E strategy, there are problems solved only by WPO with relaxed algebras. Whether the same problems can be solved with another E strategy with LPO or KBO is not clear. We have experimentally tried to employ portfolio invention system BliStrTune [3] in order to invent two portfolios, one with and one without our WPO orderings (both standard and relaxed). So far we have been able to reach only a 1% Whether this behavior is caused by WPO, or by a wrong configuration or limitations of BliStrTune is left for further research.
As further future work, we would like to experiment with orderings that work modulo associativity and commutativity [14]. Additionally, we would like to investigate other coefficient settings, and experiment with zero weights, as this might further reduce the number of derived clauses. We would also like to further optimize the efficiency of the algebra comparisons, as well as the computation of the ordering itself.
Table 5 gives a full list of order instances in greedy covers required to solve the number of problems listed in Table 3 for each benchmark. The first numeric column states how many new problems the corresponding instance adds to the problems solved by the previous instances in the sequence (from top to bottom). The second column states how many problems the corresponding instance solves by itself. The following abbreviations are used: "fmin" stands for "firstmin", and "fmax" for "firstmax". Other values might be abbreviated to a unique prefix (like "constant" to "con") due to space restrictions. Furthermore, Sum and Max are abbreviated to S and M.